OS-7078: NMI while in bhyve guest shouldn't panic()


Issue Type: Bug
Priority: 4 - Normal
Created at: 2018-07-16T14:25:27.319Z
Updated at: 2018-08-22T17:24:54.184Z


Created by: John Levon [X]
Reported by: John Levon [X]
Assigned to: John Levon [X]


Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2018-08-22T17:24:54.170Z)

Fix Versions

2018-08-30 Zolom Swamp (Release Date: 2018-08-30)

As the code says:

vmx_exit_handle_nmi(struct vmx *vmx, int vcpuid, struct vm_exit *vmexit)
{
	...
		panic("XXX vector to NMI handler");

It'd be nice (for apic_kmdb_on_nmi purposes) if this was implemented.


Comment by John Levon [X]
Created at 2018-08-17T12:33:48.937Z

We need to do the same thing for the MCE handling. Upstream, the NMI case is just a direct 'int $2'. That doesn't work for us: the interrupt goes through the KPTI trampoline, which replaces bhyve's %cr3 with the failsafe one, and because we're returning to the kernel, the right %cr3 is never restored. So instead, we'll set up a fake trap frame just like vmx_call_isr does, and use that for both NMIs and MCEs.
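
To make the "fake trap frame" idea concrete, here is a minimal user-space sketch in C. It is not the illumos code: every toy_* name and both selector values are hypothetical, and the "stack" is just an array. It only shows the five words the CPU pushes when it delivers an interrupt in 64-bit mode (SS, RSP, RFLAGS, CS, RIP), which is the layout a hand-built frame has to mimic so that a handler entered directly, presumably without going through the IDT and the trampoline, can still return with iretq.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical selector values, for illustration only. */
#define	TOY_KCS_SEL	0x30
#define	TOY_KDS_SEL	0x38
#define	TOY_STACK_SLOTS	16

/* The five words iretq pops in 64-bit mode, lowest address first. */
struct toy_iret_frame {
	uint64_t rip;
	uint64_t cs;
	uint64_t rflags;
	uint64_t rsp;
	uint64_t ss;
};

/* A toy downward-growing stack; top indexes the last pushed slot. */
struct toy_stack {
	uint64_t slots[TOY_STACK_SLOTS];
	int top;
};

void
toy_stack_init(struct toy_stack *s)
{
	memset(s->slots, 0, sizeof (s->slots));
	s->top = TOY_STACK_SLOTS;	/* empty */
}

void
toy_push(struct toy_stack *s, uint64_t v)
{
	assert(s->top > 0);
	s->slots[--s->top] = v;
}

/*
 * Push the words in the order the hardware would on interrupt
 * delivery: SS, RSP, RFLAGS, CS, RIP.  The frame is then copied out
 * from the top of the toy stack, so the lowest address holds RIP.
 */
struct toy_iret_frame
toy_build_fake_frame(struct toy_stack *s, uint64_t rsp, uint64_t rflags,
    uint64_t return_rip)
{
	struct toy_iret_frame f;

	toy_push(s, TOY_KDS_SEL);	/* %ss */
	toy_push(s, rsp);		/* interrupted %rsp */
	toy_push(s, rflags);		/* %rflags */
	toy_push(s, TOY_KCS_SEL);	/* %cs */
	toy_push(s, return_rip);	/* where iretq should resume */

	memcpy(&f, &s->slots[s->top], sizeof (f));
	return (f);
}
```

After toy_stack_init(), a call such as toy_build_fake_frame(&s, rsp, rflags, rip) leaves five slots on the toy stack and returns them in iretq pop order. The real thing is done in assembly with the kernel's own selectors.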

Comment by John Levon [X]
Created at 2018-08-22T08:57:40.120Z

To test this, I sent a bunch of NMIs via IPMI while bhyve guests were CPU busy. I confirmed that in kmdb the stack trace showed we were hitting this code path, and a continue from kmdb restored the system properly.

I also used bhyvectl to inject guest NMIs and verified they still behaved as expected, and didn't interfere with the host side.

I didn't test the MCE path (I don't know how to generate an MCE).

Comment by Jira Bot
Created at 2018-08-22T16:24:18.791Z

illumos-joyent commit 88787a9cd548438c7b4d63636df84c9c967cf9f2 (branch master, by John Levon)

OS-7078 NMI while in bhyve guest shouldn't panic()
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Approved by: Patrick Mooney <patrick.mooney@joyent.com>