OS-7016: vmx ctx ops should inform VMRESUME

Details

Issue Type:Improvement
Priority:4 - Normal
Status:Resolved
Created at:2018-06-13T14:36:26.477Z
Updated at:2018-10-15T14:34:21.406Z

People

Created by:Patrick Mooney [X]
Reported by:Patrick Mooney [X]
Assigned to:Patrick Mooney [X]

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2018-10-15T14:34:21.392Z)

Related Issues

Related Links

Labels

bhyve

Description

As part of OS-6864, illumos-specific handlers (ctx ops) were added to bhyve in order to perform certain state management tasks when a vm_run thread is switched on or off cpu. If such a context switch occurs inside the critical section of vmx_run, the handlers take care of VMCLEAR-ing and VMPTRLD-ing the VMCS. Much of the VMX-specific code is structured to avoid those situations, but the ctx ops ensure safety.

One overlooked aspect of the current logic is the VMRESUME/VMLAUNCH distinction. Inside the main loop of vmx_run, a variable named launched keeps track of whether a VMCS has undergone its initial VMLAUNCH, so that subsequent guest entries (until the VMCS is VMCLEAR-ed while exiting vmx_run) use VMRESUME. If the thread goes off-cpu inside this loop, triggering the ctx operation that VMCLEARs the VMCS, the state in launched becomes stale and a subsequent VMRESUME (when the thread comes back on-cpu) will fail.

To address this, the state kept by the ctx ops should also track whether the VMCS has been launched (and can therefore be resumed), so that such failures are prevented.
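The idea can be illustrated with a small user-space sketch. This is not the code from the fix: the type and function names (sketch_vmx_ctx_t, sketch_savectx, sketch_restorectx, sketch_guest_entry) are hypothetical, and the VMX instructions are only modelled as flag updates. It assumes the launched state lives alongside the state the ctx ops already maintain, so that a VMCLEAR performed while going off-cpu also resets it.

```c
#include <assert.h>
#include <stdbool.h>

/* Which instruction the next guest entry would use. */
typedef enum { ENTRY_VMLAUNCH, ENTRY_VMRESUME } entry_kind_t;

/* Hypothetical per-vCPU state shared by the run loop and the ctx ops. */
typedef struct sketch_vmx_ctx {
	bool vc_loaded;		/* VMCS is current on this CPU (VMPTRLD done) */
	bool vc_launched;	/* VMCS was launched since its last VMCLEAR */
} sketch_vmx_ctx_t;

/* Off-cpu ctx op: models VMCLEAR, which returns the VMCS to "clear". */
static void
sketch_savectx(sketch_vmx_ctx_t *ctx)
{
	if (ctx->vc_loaded) {
		ctx->vc_loaded = false;
		/* Key point: a cleared VMCS can no longer be VMRESUME-d. */
		ctx->vc_launched = false;
	}
}

/* On-cpu ctx op: models VMPTRLD; the launch state is left untouched. */
static void
sketch_restorectx(sketch_vmx_ctx_t *ctx)
{
	ctx->vc_loaded = true;
}

/* Guest entry: picks VMLAUNCH or VMRESUME from the shared state. */
static entry_kind_t
sketch_guest_entry(sketch_vmx_ctx_t *ctx)
{
	assert(ctx->vc_loaded);
	if (ctx->vc_launched)
		return (ENTRY_VMRESUME);
	ctx->vc_launched = true;
	return (ENTRY_VMLAUNCH);
}
```

In this model, a first entry uses VMLAUNCH and later entries use VMRESUME. Once sketch_savectx() and sketch_restorectx() have run in between, the next entry falls back to VMLAUNCH instead of attempting a VMRESUME on a cleared VMCS.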

Comments

Comment by Patrick Mooney [X]
Created at 2018-06-13T14:36:54.411Z

I discovered this shortcoming when testing an initial fix for OS-7012.


Comment by Patrick Mooney [X]
Created at 2018-06-13T15:18:15.051Z

On an Ivy Bridge system (with APICv) running this fix and the one for OS-7012, I see only occasional events where a thread in vmx_run goes off-cpu. The stack is always the same:

              unix`swtch+0x141
              genunix`turnstile_block+0x21a
              unix`mutex_vector_enter+0x3a3
              vmm`vcpu_notify_event+0x36
              vmm`lapic_set_intr+0x8b
              vmm`vlapic_icrlo_write_handler+0x114
              vmm`vmx_handle_apic_write+0xd2
              vmm`vmx_exit_process+0x909
              vmm`vmx_run+0x6be
              vmm`vm_run+0x224
              vmm`vmmdev_do_ioctl+0x762
              vmm`vmm_ioctl+0x12c
              genunix`cdev_ioctl+0x39
              specfs`spec_ioctl+0x60
              genunix`fop_ioctl+0x55
              genunix`ioctl+0x9b
              unix`sys_syscall+0x19f

Comment by Jira Bot
Created at 2018-09-13T15:02:15.633Z

illumos-joyent commit 88b7f8b0d2ce2d1d224879a20f7d36427c5b7707 (branch master, by Patrick Mooney)

OS-7012 bhyve wedged in vlapic cyclics
OS-7016 vmx ctx ops should inform VMRESUME
OS-6957 clean up unused mutex type from bhyve
Reviewed by: John Levon <john.levon@joyent.com>
Reviewed by: Hans Rosenfeld <hans.rosenfeld@joyent.com>
Approved by: Hans Rosenfeld <hans.rosenfeld@joyent.com>