OS-6888: bhyve wedged on vioapic write

Details

Issue Type: Bug
Priority: 4 - Normal
Status: Resolved
Created at: 2018-04-11T18:28:48.319Z
Updated at: 2018-06-20T20:59:10.411Z

People

Created by: Patrick Mooney [X]
Reported by: Patrick Mooney [X]
Assigned to: Patrick Mooney [X]

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2018-06-20T20:59:10.395Z)

Fix Versions

2018-06-21 Underwater Reactor (Release Date: 2018-06-21)

Labels

bhyve

Description

While testing the joyent-retro image for OS-6871, I attempted to reboot a misbehaving instance. When the system complained that it was unable to shut down the zone, I looked at the in-kernel stacks for bhyve:

> 0t6434::pid2proc | ::walk thread | ::findstack -v
stack pointer for thread fffffe5b04e773c0: fffffe007b723d10
[ fffffe007b723d10 _resume_from_idle+0x126() ]
  fffffe007b723d40 swtch+0x141()
  fffffe007b723d80 cv_wait+0x70(fffffe5b1ddd50de, fffffe593f44c280)
  fffffe007b723dc0 exitlwps+0x13c(0)
  fffffe007b723e40 psig+0x47f()
  fffffe007b723f00 post_syscall+0x805(4, fffffe5afff46080)
  fffffe007b723f10 0xfffffffffb800ccb()
stack pointer for thread fffffe5b04e82140: fffffe008184f410
[ fffffe008184f410 _resume_from_idle+0x126() ]
  fffffe008184f440 swtch+0x141()
  fffffe008184f480 cv_wait+0x70(fffffe5942574108, fffffe5942574100)
  fffffe008184f4e0 vm_handle_rendezvous+0x93(fffffe5942574000, 0)
  fffffe008184f540 vm_smp_rendezvous+0x10b()
  fffffe008184f5f0 vioapic_write+0x153(fffffe5aeb65a800, 0, 12, 10000)
  fffffe008184f670 vioapic_mmio_rw+0xe6(fffffe5aeb65a800, 0, fec00010, fffffe008184f698, 4, 0)
  fffffe008184f6d0 vioapic_mmio_write+0x4e(fffffe5942574000, 0, fec00010, 10000, 4, fffffe008184f8df)
  fffffe008184f760 emulate_mov+0xb2(fffffe5942574000, 0, fec00010, fffffe59425742d8, fffffffff8440130, fffffffff8440190, fffffe008184f8df)
  fffffe008184f7b0 vmm_emulate_instruction+0x6c(fffffe5942574000, 0, fec00010, fffffe59425742d8, fffffe59425742c0, fffffffff8440130, fffffffff8440190, fffffe008184f8df)
  fffffe008184f850 vm_handle_inst_emul+0x185(fffffe5942574000, 0, fffffe008184f8df)
  fffffe008184f920 vm_run+0x3a7(fffffe5942574000, fffffe008184f970)
  fffffe008184fc20 vmmdev_do_ioctl+0xbce(fffffe5b0cf3ab00, c0907601, fffffc7febe0be10, 202003, fffffe5b033038b0, fffffe008184fea8)
  fffffe008184fcc0 vmm_ioctl+0x12c(13100000003, c0907601, fffffc7febe0be10, 202003, fffffe5b033038b0, fffffe008184fea8)
  fffffe008184fd00 cdev_ioctl+0x39(13100000003, c0907601, fffffc7febe0be10, 202003, fffffe5b033038b0, fffffe008184fea8)
  fffffe008184fd50 spec_ioctl+0x60(fffffe5afbf0e780, c0907601, fffffc7febe0be10, 202003, fffffe5b033038b0, fffffe008184fea8, 0)
  fffffe008184fde0 fop_ioctl+0x55(fffffe5afbf0e780, c0907601, fffffc7febe0be10, 202003, fffffe5b033038b0, fffffe008184fea8, 0)
  fffffe008184ff00 ioctl+0x9b(3, c0907601, fffffc7febe0be10)
  fffffe008184ff10 sys_syscall+0x19f()
stack pointer for thread fffffe5961428420: fffffe007b6f3780
[ fffffe007b6f3780 _resume_from_idle+0x126() ]
  fffffe007b6f37b0 swtch+0x141()
  fffffe007b6f37f0 cv_wait+0x70(fffffe5942574108, fffffe5942574100)
  fffffe007b6f3850 vm_handle_rendezvous+0x93(fffffe5942574000, 2)
  fffffe007b6f3920 vm_run+0x443(fffffe5942574000, fffffe007b6f3970)
  fffffe007b6f3c20 vmmdev_do_ioctl+0xbce(fffffe5b0cf3ab00, c0907601, fffffc7feba0de10, 202003, fffffe5b033038b0, fffffe007b6f3ea8)
  fffffe007b6f3cc0 vmm_ioctl+0x12c(13100000003, c0907601, fffffc7feba0de10, 202003, fffffe5b033038b0, fffffe007b6f3ea8)
  fffffe007b6f3d00 cdev_ioctl+0x39(13100000003, c0907601, fffffc7feba0de10, 202003, fffffe5b033038b0, fffffe007b6f3ea8)
  fffffe007b6f3d50 spec_ioctl+0x60(fffffe5afbf0e780, c0907601, fffffc7feba0de10, 202003, fffffe5b033038b0, fffffe007b6f3ea8, 0)
  fffffe007b6f3de0 fop_ioctl+0x55(fffffe5afbf0e780, c0907601, fffffc7feba0de10, 202003, fffffe5b033038b0, fffffe007b6f3ea8, 0)
  fffffe007b6f3f00 ioctl+0x9b(3, c0907601, fffffc7feba0de10)
  fffffe007b6f3f10 sys_syscall+0x19f()
stack pointer for thread fffffe5a92307520: fffffe007f2cd780
[ fffffe007f2cd780 _resume_from_idle+0x126() ]
  fffffe007f2cd7b0 swtch+0x141()
  fffffe007f2cd7f0 cv_wait+0x70(fffffe5942574108, fffffe5942574100)
  fffffe007f2cd850 vm_handle_rendezvous+0x93(fffffe5942574000, 3)
  fffffe007f2cd920 vm_run+0x443(fffffe5942574000, fffffe007f2cd970)
  fffffe007f2cdc20 vmmdev_do_ioctl+0xbce(fffffe5b0cf3ab00, c0907601, fffffc7feb80ee10, 202003, fffffe5b033038b0, fffffe007f2cdea8)
  fffffe007f2cdcc0 vmm_ioctl+0x12c(13100000003, c0907601, fffffc7feb80ee10, 202003, fffffe5b033038b0, fffffe007f2cdea8)
  fffffe007f2cdd00 cdev_ioctl+0x39(13100000003, c0907601, fffffc7feb80ee10, 202003, fffffe5b033038b0, fffffe007f2cdea8)
  fffffe007f2cdd50 spec_ioctl+0x60(fffffe5afbf0e780, c0907601, fffffc7feb80ee10, 202003, fffffe5b033038b0, fffffe007f2cdea8, 0)
  fffffe007f2cdde0 fop_ioctl+0x55(fffffe5afbf0e780, c0907601, fffffc7feb80ee10, 202003, fffffe5b033038b0, fffffe007f2cdea8, 0)
  fffffe007f2cdf00 ioctl+0x9b(3, c0907601, fffffc7feb80ee10)
  fffffe007f2cdf10 sys_syscall+0x19f()

Here, a write to the vioapic has resulted in a rendezvous operation:

> fffffe5942574000::print 'struct vm' rendezvous_func rendezvous_req_cpus rendezvous_done_cpus
rendezvous_func = vioapic_update_tmr
rendezvous_req_cpus = {
    rendezvous_req_cpus.cpub = [ 0xf, 0, 0, 0 ]
}
rendezvous_done_cpus = {
    rendezvous_done_cpus.cpub = [ 0xd, 0, 0, 0 ]
}

Since bhyve is attempting to exit during the rendezvous, the thread running the "missing" vCPU (vCPU 1, the only one requested in 0xf but absent from the completed mask 0xd) has already stopped, so the rendezvous can never complete. Any cv_wait calls which depend on vCPU threads re-entering the kernel like this should be changed to cv_wait_sig. That will require close examination of how the system should act in the face of interruption.
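
As a rough illustration of how the two masks identify the stalled vCPU, the following userland sketch computes the pending set from the first word of each mask. It assumes that bit N corresponds to vCPU N, which is consistent with vCPUs 0, 2 and 3 appearing in vm_handle_rendezvous in the stacks above. The helper names (vcpu_bit, rendezvous_pending, report_pending) are hypothetical and are not part of the vmm code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bit for a given vCPU id (assumption: bit N represents vCPU N). */
static uint64_t
vcpu_bit(int id)
{
	assert(id >= 0 && id < 64);
	return (1ULL << id);
}

/* vCPUs that were asked to rendezvous but have not yet checked in. */
static uint64_t
rendezvous_pending(uint64_t req_cpus, uint64_t done_cpus)
{
	return (req_cpus & ~done_cpus);
}

/* Print each pending vCPU id and return how many are still pending. */
static int
report_pending(uint64_t pending)
{
	int count = 0;

	for (int id = 0; id < 64; id++) {
		if (pending & vcpu_bit(id)) {
			printf("vCPU %d has not completed the rendezvous\n",
			    id);
			count++;
		}
	}
	return (count);
}
```

Calling report_pending(rendezvous_pending(0xf, 0xd)) with the values from the mdb output reports a single pending vCPU, id 1.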

Comments

Comment by Patrick Mooney [X]
Created at 2018-04-11T19:50:22.455Z

The crux of this issue is that these synchronization primitives are apparently not signal-aware in FreeBSD. As such, none of the contexts in which they are used are prepared for bail-outs caused by signal interruption.


Comment by Patrick Mooney [X]
Created at 2018-06-05T19:36:13.241Z

I spent a while digging into the details of vm_smp_rendezvous usage from vioapic, and possible solutions to this issue. As far as I can tell, there are quite a few possible fixes and improvements to how bhyve handles setup and routing of level-triggered interrupts. As a stopgap to prevent the vioapic_update_tmr case from hanging systems when the bhyve process is killed, @mike.gerdts proposed adding some sort of manual bail-out to the rendezvous process. While it treats the symptom, not the problem, it will buy us more time to address the root cause fully, with time for testing and verification.
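
To make the idea of a manual bail-out concrete, here is a minimal userland analogy using POSIX threads. It is not the change that was committed, and it does not use the kernel condvar interfaces; the rendezvous_t type, its bail flag, and the function names are all hypothetical. The point it sketches is that a waiter re-checks an explicit "give up" flag each time it wakes, so a missing participant no longer blocks it forever.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userland stand-in for the rendezvous state. */
typedef struct rendezvous {
	pthread_mutex_t	lock;
	pthread_cond_t	cv;
	uint64_t	req_cpus;	/* vCPUs asked to participate */
	uint64_t	done_cpus;	/* vCPUs that have checked in */
	bool		bail;		/* set when waiters should give up */
} rendezvous_t;

static void
rendezvous_init(rendezvous_t *r, uint64_t req_cpus)
{
	pthread_mutex_init(&r->lock, NULL);
	pthread_cond_init(&r->cv, NULL);
	r->req_cpus = req_cpus;
	r->done_cpus = 0;
	r->bail = false;
}

/* Record that one vCPU has completed its part and wake all waiters. */
static void
rendezvous_checkin(rendezvous_t *r, int vcpu)
{
	assert(vcpu >= 0 && vcpu < 64);
	pthread_mutex_lock(&r->lock);
	r->done_cpus |= (1ULL << vcpu);
	pthread_cond_broadcast(&r->cv);
	pthread_mutex_unlock(&r->lock);
}

/* Ask all waiters to abandon the rendezvous (e.g. process is exiting). */
static void
rendezvous_bail(rendezvous_t *r)
{
	pthread_mutex_lock(&r->lock);
	r->bail = true;
	pthread_cond_broadcast(&r->cv);
	pthread_mutex_unlock(&r->lock);
}

/*
 * Wait until every requested vCPU has checked in, or until a bail-out
 * is requested.  Returns 0 on completion and -1 on bail-out.
 */
static int
rendezvous_wait(rendezvous_t *r)
{
	int ret = 0;

	pthread_mutex_lock(&r->lock);
	while ((r->req_cpus & ~r->done_cpus) != 0) {
		if (r->bail) {
			ret = -1;
			break;
		}
		pthread_cond_wait(&r->cv, &r->lock);
	}
	pthread_mutex_unlock(&r->lock);
	return (ret);
}
```

In this sketch, a rendezvous requested for mask 0xf in which only vCPUs 0, 2 and 3 check in would block in rendezvous_wait indefinitely; once rendezvous_bail is called, the waiters return -1 instead. As the comment above notes, this only treats the symptom: the caller still has to decide what a bailed-out rendezvous means for guest state.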


Comment by Patrick Mooney [X]
Created at 2018-06-20T15:28:17.208Z

With the change applied, the normal rendezvous hang that smartos-retro would trigger is able to bail out when the process is killed. This allows vmadm stop -F to work again. The vm_suspend issue is not one that I've been able to reproduce locally, but the same mitigation should be adequate.


Comment by Jira Bot
Created at 2018-06-20T20:58:44.497Z

illumos-joyent commit dce228e4331f185347c3e0325cab8a3af72d6410 (branch master, by Patrick Mooney)

OS-6888 bhyve wedged on vioapic write
Reviewed by: Hans Rosenfeld <hans.rosenfeld@joyent.com>
Reviewed by: Mike Gerdts <mike.gerdts@joyent.com>
Approved by: Mike Gerdts <mike.gerdts@joyent.com>