Priority: 4 - Normal
Created by: Patrick Mooney
Reported by: Patrick Mooney
Assigned to: Patrick Mooney
Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2018-10-15T14:34:21.392Z)
As part of OS-6864, illumos-specific handlers (ctx ops) were added to bhyve in order to perform certain state management tasks when a vm_run thread is switched on or off cpu. Inside the critical section of vmx_run, these handlers take care of VMPTRLD-ing the VMCS should such a context switch occur. Much of the VMX-specific code is structured to avoid those situations, but the ctx ops ensure safety.
One overlooked aspect of the current logic is the VMLAUNCH distinction. Inside the main loop of vmx_run, a variable named launched keeps track of whether a VMCS has undergone its initial VMLAUNCH, so that subsequent guest entries (until the VMCS is VMCLEAR-ed while exiting vmx_run) will use VMRESUME. If the thread goes off-cpu inside this loop, the ctx operation is called to VMCLEAR the VMCS. The state in launched then becomes invalid, and a subsequent VMRESUME (when the thread comes back on-cpu) will fail.
To address this, the state kept by the ctx ops should also track whether the VMCS has been launched (and is therefore eligible for VMRESUME) to prevent such failures.
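The idea can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the code from the actual fix: the names vcpu_sim, ctx_save, ctx_restore, guest_enter and the VS_* flags are hypothetical, and the VMX instructions are modeled as plain flag updates. The point it shows is that the launched bit lives in the same state the ctx ops modify, so a VMCLEAR performed while off-cpu automatically forces the next entry to use VMLAUNCH.

```c
#include <assert.h>

/* Hypothetical per-vCPU VMCS bookkeeping shared with the ctx ops. */
enum vmcs_flags {
	VS_NONE     = 0,
	VS_LOADED   = 1 << 0,	/* VMCS is current on this CPU (VMPTRLD done) */
	VS_LAUNCHED = 1 << 1	/* VMLAUNCH done since the last VMCLEAR */
};

/* Which instruction a guest entry would use. */
enum entry_insn { ENTRY_VMLAUNCH, ENTRY_VMRESUME };

struct vcpu_sim {
	unsigned int flags;
};

/* Models the ctx op run when the thread goes off-cpu: VMCLEAR. */
static void
ctx_save(struct vcpu_sim *v)
{
	/* VMCLEAR drops both the "loaded" and the "launched" state. */
	v->flags = VS_NONE;
}

/* Models the ctx op run when the thread comes back on-cpu: VMPTRLD. */
static void
ctx_restore(struct vcpu_sim *v)
{
	v->flags |= VS_LOADED;
}

/*
 * Models one guest entry. The choice between VMLAUNCH and VMRESUME is
 * made from the shared flags rather than from a loop-local variable,
 * so it cannot go stale across a context switch.
 */
static enum entry_insn
guest_enter(struct vcpu_sim *v)
{
	assert(v->flags & VS_LOADED);
	if (v->flags & VS_LAUNCHED)
		return (ENTRY_VMRESUME);
	v->flags |= VS_LAUNCHED;
	return (ENTRY_VMLAUNCH);
}
```

In this model, the first entry after ctx_restore uses VMLAUNCH and later entries use VMRESUME. After a ctx_save/ctx_restore pair, the next entry goes back to VMLAUNCH instead of attempting a VMRESUME that would fail.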
I discovered this shortcoming when testing an initial fix for OS-7012.
On an Ivy Bridge system (with APICv) running this fix and the one for OS-7012, I see only occasional events where a thread in vmx_run goes off-cpu. The stack is always the same:
unix`swtch+0x141
genunix`turnstile_block+0x21a
unix`mutex_vector_enter+0x3a3
vmm`vcpu_notify_event+0x36
vmm`lapic_set_intr+0x8b
vmm`vlapic_icrlo_write_handler+0x114
vmm`vmx_handle_apic_write+0xd2
vmm`vmx_exit_process+0x909
vmm`vmx_run+0x6be
vmm`vm_run+0x224
vmm`vmmdev_do_ioctl+0x762
vmm`vmm_ioctl+0x12c
genunix`cdev_ioctl+0x39
specfs`spec_ioctl+0x60
genunix`fop_ioctl+0x55
genunix`ioctl+0x9b
unix`sys_syscall+0x19f
illumos-joyent commit 88b7f8b0d2ce2d1d224879a20f7d36427c5b7707 (branch master, by Patrick Mooney)
OS-7012 bhyve wedged in vlapic cyclics
OS-7016 vmx ctx ops should inform VMRESUME
OS-6957 clean up unused mutex type from bhyve
Reviewed by: John Levon <email@example.com>
Reviewed by: Hans Rosenfeld <firstname.lastname@example.org>
Approved by: Hans Rosenfeld <email@example.com>