OS-6639: bhyve memory allocations that can't succeed should fail or at least be interruptible

Details

Issue Type:Bug
Priority:4 - Normal
Status:Resolved
Created at:2018-02-16T18:33:27.466Z
Updated at:2018-03-16T17:00:32.332Z

People

Created by:Mike Gerdts
Reported by:Mike Gerdts
Assigned to:Jerry Jelinek

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2018-03-15T15:01:57.577Z)

Fix Versions

2018-03-29 Old Man's House (Release Date: 2018-03-29)

Labels

bhyve

Description

I have a machine with 96 GB of RAM.  I started one guest with 64 GB and then tried to start a second with 32 GB.  Predictably, the second guest could not immediately find enough RAM.  Unfortunately, instead of failing, it hangs until enough RAM becomes available.

> ffffd02929d6f018::walk thread | ::findstack -v
stack pointer for thread ffffd0287e333c20: ffffd000bc86a3d0
[ ffffd000bc86a3d0 _resume_from_idle+0x112() ]
ffffd000bc86a400 swtch+0x18a()
ffffd000bc86a430 cv_wait+0x89(ffffd0287e333e0e, ffffd0287e333e10)
ffffd000bc86a490 delay_common+0xb8(fa)
ffffd000bc86a4d0 delay+0x30(fa)
ffffd000bc86a500 page_resv+0x82(840000, 0)
ffffd000bc86a5a0 segkmem_xalloc+0x80(ffffd025e2ce4000, 0, 840000000, 0, 0, fffffffffb8a5640, fffffffffc0199b8)
ffffd000bc86a610 segkmem_alloc_vn+0x5e(ffffd025e2ce4000, 840000000, 0, fffffffffc0199b8)
ffffd000bc86a640 segkmem_zio_alloc+0x20(ffffd025e2ce4000, 840000000, 0)
ffffd000bc86a790 vmem_xalloc+0x629(ffffd025fef67000, 840000000, 1000, 0, 0, 0, 0, ffffd02400000000)
ffffd000bc86a810 vmem_alloc+0x145(ffffd025fef67000, 840000000, 0)
ffffd000bc86a850 vm_object_allocate+0x6c(0, 840000)
ffffd000bc86a8c0 vm_alloc_memseg+0xd9(ffffd02929e4b000, 0, 840000000, 1)
ffffd000bc86a900 vmmdev_alloc_memseg+0x36(ffffd0261e6eaf00, ffffd000bc86a950)
ffffd000bc86ac00 vmmdev_do_ioctl+0x615(ffffd0261e6eaf00, 8050760e, ffffbf7fffdffbe0, 202003, ffffd0291f373730, ffffd000bc86ae98)
ffffd000bc86acb0 vmm_ioctl+0x138(13000000004, 8050760e, ffffbf7fffdffbe0, 202003, ffffd0291f373730, ffffd000bc86ae98)
ffffd000bc86acf0 cdev_ioctl+0x39(13000000004, 8050760e, ffffbf7fffdffbe0, 202003, ffffd0291f373730, ffffd000bc86ae98)
ffffd000bc86ad40 spec_ioctl+0x60(ffffd02929d04e00, 8050760e, ffffbf7fffdffbe0, 202003, ffffd0291f373730, ffffd000bc86ae98, 0)
ffffd000bc86add0 fop_ioctl+0x55(ffffd02929d04e00, 8050760e, ffffbf7fffdffbe0, 202003, ffffd0291f373730, ffffd000bc86ae98, 0)
ffffd000bc86aef0 ioctl+0x9b(3, 8050760e, ffffbf7fffdffbe0)
ffffd000bc86af00 sys_syscall+0x290()

This is not interruptible.

[root@emy-17 /root]# pgrep -x zhyve
104238
104507
[root@emy-17 /root]# kill -9 104507
[root@emy-17 /root]# kill -9 104507

Likewise, zoneadm -z $zone halt has no effect: zoneadm hangs in a door call to zoneadmd.

Comments

Comment by Jerry Jelinek
Created at 2018-03-02T13:21:59.062Z
Updated at 2018-03-02T13:27:13.879Z

I can think of two approaches for this:
1) Add kernel support for an "interruptible" flag on KM_SLEEP allocations (starting by using cv_wait_sig in page_create_throttle), which we might want for other reasons.
2) Change vmm to use a KM_NOSLEEP allocation, handle the ENOMEM error in zhyve, and sleep/retry a few times in zhyve before giving up and dying.

Since the 2nd approach is less invasive, I'm going to try prototyping that and see how well it works.
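
As a rough userland sketch of the sleep/retry idea in approach 2 (not the actual zhyve change): the wrapper below retries an operation while it fails with ENOMEM and gives up after a fixed number of attempts. The names retry_on_enomem, alloc_op_t and fake_alloc are hypothetical and exist only for illustration; fake_alloc stands in for the memory-segment ioctl, and the retry count and delay are arbitrary. Because the waiting happens in nanosleep() in userland rather than in an uninterruptible kernel wait, the process can still be killed between attempts.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: returns 0 on success, -1 with errno set. */
typedef int (*alloc_op_t)(void *arg);

/*
 * Call op(arg) up to max_tries times, sleeping delay_ms milliseconds between
 * attempts for as long as it fails with ENOMEM.  Any other error is returned
 * immediately.  Returns 0 on success, or -1 with errno set.
 */
static int
retry_on_enomem(alloc_op_t op, void *arg, int max_tries, long delay_ms)
{
	struct timespec ts;
	int i;

	assert(op != NULL && max_tries > 0 && delay_ms >= 0);
	ts.tv_sec = delay_ms / 1000;
	ts.tv_nsec = (delay_ms % 1000) * 1000000L;

	for (i = 0; i < max_tries; i++) {
		if (op(arg) == 0)
			return (0);
		if (errno != ENOMEM)
			return (-1);
		/* Only sleep if another attempt will follow. */
		if (i + 1 < max_tries)
			(void) nanosleep(&ts, NULL);
	}

	/* nanosleep may have changed errno; report the real cause. */
	errno = ENOMEM;
	return (-1);
}

/*
 * Illustrative stand-in for the allocation ioctl: fails with ENOMEM until
 * the counter pointed to by arg reaches zero, then succeeds.
 */
static int
fake_alloc(void *arg)
{
	int *failures_left = arg;

	if (*failures_left > 0) {
		(*failures_left)--;
		errno = ENOMEM;
		return (-1);
	}
	return (0);
}
```

A caller would invoke something like retry_on_enomem(fake_alloc, &n, 5, 250) and exit with an error if it returns -1, which is the "give up and die" step described above.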


Comment by Jerry Jelinek
Created at 2018-03-02T14:43:29.606Z

For future reference, here is a zhyve stack at startup when the system is overcommitted on memory because another bhyve instance is running.

stack pointer for thread ffffff030c47b4c0: ffffff000e339420
[ ffffff000e339420 _resume_from_idle+0x112() ]
  ffffff000e339450 swtch+0x141()
  ffffff000e339490 cv_wait+0x70(ffffff030c47b6ae, ffffff030c47b6b0)
  ffffff000e3394f0 delay_common+0xb8(fa)
  ffffff000e339530 delay+0x30(fa)
  ffffff000e339560 page_resv+0x82(108000, 0)
  ffffff000e339600 segkmem_xalloc+0x72(ffffff030345c000, 0, 108000000, 0, 0, fffffffffb88c830, fffffffffbcf9078)
  ffffff000e339660 segkmem_alloc_vn+0x4a(ffffff030345c000, 108000000, 0, fffffffffbcf9078)
  ffffff000e339690 segkmem_zio_alloc+0x20(ffffff030345c000, 108000000, 0)
  ffffff000e3397c0 vmem_xalloc+0x5b1(ffffff030569b000, 108000000, 1000, 0, 0, 0, 0, fffffd7f00000000)
  ffffff000e339830 vmem_alloc+0x135(ffffff030569b000, 108000000, 0)
  ffffff000e339870 vm_object_allocate+0x6c(0, 108000)
  ffffff000e3398e0 vm_alloc_memseg+0xd9(ffffff0319c82000, 0, 108000000, 1)
  ffffff000e339920 vmmdev_alloc_memseg+0x36(ffffff0316df0340, ffffff000e339970)
  ffffff000e339c20 vmmdev_do_ioctl+0x615(ffffff0316df0340, 8050760e, fffffd7fffdffbe0, 202003, ffffff0332bd40d8, ffffff000e339ea8)
  ffffff000e339cc0 vmm_ioctl+0x12c(13000000021, 8050760e, fffffd7fffdffbe0, 202003, ffffff0332bd40d8, ffffff000e339ea8)
  ffffff000e339d00 cdev_ioctl+0x39(13000000021, 8050760e, fffffd7fffdffbe0, 202003, ffffff0332bd40d8, ffffff000e339ea8)
  ffffff000e339d50 spec_ioctl+0x60(ffffff031a2ae700, 8050760e, fffffd7fffdffbe0, 202003, ffffff0332bd40d8, ffffff000e339ea8, 0)
  ffffff000e339de0 fop_ioctl+0x55(ffffff031a2ae700, 8050760e, fffffd7fffdffbe0, 202003, ffffff0332bd40d8, ffffff000e339ea8, 0)
  ffffff000e339f00 ioctl+0x9b(3, 8050760e, fffffd7fffdffbe0)
  ffffff000e339f10 sys_syscall+0x19f()

Comment by Jerry Jelinek
Created at 2018-03-02T20:24:11.418Z
Updated at 2018-03-02T20:24:42.149Z

My prototype of the 2nd option (handling/retrying ENOMEM in zhyve) seems to be working well under all of the different memory conditions, and it allows me to halt the zone when the memory is never going to be available (until other zones halt).


Comment by Jira Bot
Created at 2018-03-15T15:01:24.909Z

illumos-joyent commit d8aa9216af8c1d23eaddba43fdd00ffe3f59da7a (branch master, by Jerry Jelinek)

OS-6639 bhyve memory allocations that can't succeed should fail or at least be interruptible
Reviewed by: Mike Gerdts <mike.gerdts@joyent.com>
Reviewed by: John Levon <john.levon@joyent.com>
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Approved by: John Levon <john.levon@joyent.com>