OS-6860: bhyve requires APIC cyclic be localized

Details

Issue Type: Bug
Priority: 3 - Elevated
Status: Resolved
Created at: 2018-03-31T22:38:54.873Z
Updated at: 2019-06-25T19:46:45.165Z

People

Created by: Tomas Celaya
Reported by: Tomas Celaya
Assigned to: Patrick Mooney [X]

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2018-04-02T18:31:25.254Z)

Fix Versions

2018-04-12 Promised Land (Release Date: 2018-04-12)

Labels

bhyve

Description

After queueing up a set of load tests overnight and checking this morning, I discovered that several of the load generator instances had disappeared. Similar to OS-6843, the CNs hosting these instances were locked up: both SSH and the console were stuck.

Comments

Comment by Bryan Cantrill [X]
Created at 2018-04-01T03:25:10.118Z

The dumps differ in pathology, but it's clear that there is (1) an unresolved panic (the vmcs_write panic) and (2) an unresolved hang (the very strange e4c163cc0f987a0d440aa7a258a34053 dump, which seems to have been cryogenically frozen for several hours). To get further information on both of these, we should run the following script (despite its heavy hand):

#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option bufpolicy=ring	/* ring buffer: retain only the most recent records */
#pragma D option strsize=32

/*
 * Trace entry to the APIC timer/interrupt routines and the VMX/VMCS
 * paths; with ring buffering, the tail of this output survives in a
 * crash dump.
 */
api?_*time*:entry,
api?_*int*:entry,
vmx*:entry,
vmcs*:entry
{
	printf("ts=%d cpu=%d thr=%p %s:entry arg0=%x arg1=%x\n",
	    timestamp, cpu, curthread, probefunc, arg0, arg1);
}

/* ...and the corresponding returns */
api?_*time*:return,
api?_*int*:return,
vmx*:return,
vmcs*:return
{
	printf("ts=%d cpu=%d thr=%p %s:return arg0=%x arg1=%x\n",
	    timestamp, cpu, curthread, probefunc, arg0, arg1);
}

/* fired on each VM exit, recording the vCPU, guest %rip, and exit reason */
vmm-vexit
{
	printf("ts=%d cpu=%d thr=%p vmm-vexit vcpu=%d rip=%p reason=%d\n",
	    timestamp, cpu, curthread, arg0, arg1, arg2);
}

/* record every context switch so we can see who ran where, and when */
sched:::on-cpu
{
	printf("ts=%d cpu=%d thr=%p on-cpu\n", timestamp, cpu, curthread);
}

/* on the way off CPU, also record the thread state and wait channel */
sched:::off-cpu
{
	printf("ts=%d cpu=%d thr=%p off-cpu state=0x%x wchan=%p\n",
	    timestamp, cpu, curthread, curthread->t_state,
	    curthread->t_lwpchan.lc_wchan);
}

Comment by Bryan Cantrill [X]
Created at 2018-04-01T17:59:06.417Z

For thoth dump d2ee7fec33c3ab3c, we had the script enabled when it panicked. The script output (extracted from the dump) is here:

/thoth/stor/thoth/d2ee7fec33c3ab3c57d48ef712db6c88/dtrace.out

Here is where we died:

> $c
vpanic()
vmx_exit_process+0x26a(ffffd06c235a0000, 5, ffffd0631ce25790)
vmx_run+0x414(ffffd06c235a0000, 5, ffffffff81051586, ffffd063204f3c90, ffffd001ecf348c0)
vm_run+0x1be(ffffd0631ce25000, ffffd001ecf34970)
vmmdev_do_ioctl+0xbce(ffffd0666933b780, c0907601, ffffbf7feb410e10, 202003, ffffd0632b1e7718, ffffd001ecf34ea8)
vmm_ioctl+0x12c(13100000016, c0907601, ffffbf7feb410e10, 202003, ffffd0632b1e7718, ffffd001ecf34ea8)
cdev_ioctl+0x39(13100000016, c0907601, ffffbf7feb410e10, 202003, ffffd0632b1e7718, ffffd001ecf34ea8)
spec_ioctl+0x60(ffffd06320e0cd00, c0907601, ffffbf7feb410e10, 202003, ffffd0632b1e7718, ffffd001ecf34ea8)
fop_ioctl+0x55(ffffd06320e0cd00, c0907601, ffffbf7feb410e10, 202003, ffffd0632b1e7718, ffffd001ecf34ea8)
ioctl+0x9b(3, c0907601, ffffbf7feb410e10)
sys_syscall+0x19f()
> panic_thread/K
panic_thread:
panic_thread:   ffffd06663c02820 

Looking for that panic thread in the script output:

mls /thoth/stor/thoth/d2ee7fec33c3ab3c57d48ef712db6c88/
dtrace.out
info.json
thoth.44454c4c-5600-1051-8044-b8c04f524432.vmdump.1
vmcore.0
beakroot:~ bryancantrill$ mlogin /thoth/stor/thoth/d2ee7fec33c3ab3c57d48ef712db6c88/dtrace.out
 * created interactive job -- ae14ca64-47bc-6deb-84f0-b4c76a135e30
 * waiting for session... | established
bcantrill@manta # grep ffffd06663c02820 $MANTA_INPUT_FILE | tail -40
ts=508830818313305 cpu=21 thr=ffffd06663c02820 vmcs_read:entry arg0=810 arg1=1
ts=508830818314164 cpu=21 thr=ffffd06663c02820 vmcs_read:return arg0=32 arg1=0
ts=508830818314954 cpu=21 thr=ffffd06663c02820 vmx_exit_process:return arg0=bb arg1=0
ts=508830818315888 cpu=21 thr=ffffd06663c02820 vmm-vexit vcpu=5 rip=ffffffff81051585 reason=12
ts=508830818317231 cpu=21 thr=ffffd06663c02820 vmx_msr_guest_exit:entry arg0=ffffd06c235a0000 arg1=5
ts=508830818318513 cpu=21 thr=ffffd06663c02820 vmx_msr_guest_exit:return arg0=e5 arg1=40300
ts=508830818319334 cpu=21 thr=ffffd06663c02820 vmx_run:return arg0=2a7 arg1=0
ts=508830818321050 cpu=21 thr=ffffd06663c02820 vmx_pending_intr:entry arg0=ffffd0681d1d2440 arg1=0
ts=508830818322123 cpu=21 thr=ffffd06663c02820 vmx_pending_intr:return arg0=ac arg1=0
ts=508830818325603 cpu=21 thr=ffffd06663c02820 off-cpu state=0x1 wchan=ffffd0631ce25734
ts=508830818887494 cpu=21 thr=ffffd06663c02820 on-cpu
ts=508830818888654 cpu=21 thr=ffffd06663c02820 vmx_pending_intr:entry arg0=ffffd0681d1d2440 arg1=0
ts=508830818890451 cpu=21 thr=ffffd06663c02820 vmx_pending_intr:return arg0=7a arg1=1
ts=508830818893063 cpu=21 thr=ffffd06663c02820 vmx_run:entry arg0=ffffd06c235a0000 arg1=5
ts=508830818894137 cpu=21 thr=ffffd06663c02820 vmx_msr_guest_enter:entry arg0=ffffd06c235a0000 arg1=5
ts=508830818895472 cpu=21 thr=ffffd06663c02820 vmx_msr_guest_enter:return arg0=e3 arg1=0
ts=508830818896767 cpu=21 thr=ffffd06663c02820 vmx_set_pcpu_defaults:entry arg0=ffffd06c235a0000 arg1=5
ts=508830818897848 cpu=21 thr=ffffd06663c02820 vmx_set_pcpu_defaults:return arg0=1af arg1=ffffd062ea086000
ts=508830818898749 cpu=21 thr=ffffd06663c02820 vmx_inject_interrupts:entry arg0=ffffd06c235a0000 arg1=5
ts=508830818900062 cpu=21 thr=ffffd06663c02820 vmx_inject_pir:entry arg0=ffffd0681d1d2440 arg1=500
ts=508830818901352 cpu=21 thr=ffffd06663c02820 vmx_inject_pir:return arg0=ec arg1=ef
ts=508830818902129 cpu=21 thr=ffffd06663c02820 vmx_inject_interrupts:return arg0=201 arg1=ef
ts=508830818903033 cpu=21 thr=ffffd06663c02820 vmx_enter_guest:entry arg0=ffffd06c236027d0 arg1=ffffd06c235a0000
ts=508830818908907 cpu=21 thr=ffffd06663c02820 vmx_enter_guest:return arg0=24f arg1=0
ts=508830818909878 cpu=21 thr=ffffd06663c02820 vmx_exit_process:entry arg0=ffffd06c235a0000 arg1=5
ts=508830818910852 cpu=21 thr=ffffd06663c02820 vmx_handle_apic_write:entry arg0=ffffd06c235a0000 arg1=5
ts=508830818925842 cpu=21 thr=ffffd06663c02820 vmx_handle_apic_write:return arg0=17e arg1=1
ts=508830818927235 cpu=21 thr=ffffd06663c02820 vmx_exit_process:return arg0=bb arg1=1
ts=508830818928663 cpu=21 thr=ffffd06663c02820 vmm-vexit vcpu=5 rip=ffffffff8104ad6c reason=56
ts=508830818929885 cpu=21 thr=ffffd06663c02820 vmx_inject_interrupts:entry arg0=ffffd06c235a0000 arg1=5
ts=508830818931244 cpu=21 thr=ffffd06663c02820 vmx_inject_pir:entry arg0=ffffd0681d1d2440 arg1=500
ts=508830818932571 cpu=21 thr=ffffd06663c02820 vmx_inject_pir:return arg0=ec arg1=ffffd06c23601140
ts=508830818933680 cpu=21 thr=ffffd06663c02820 vmx_inject_interrupts:return arg0=201 arg1=ffffd06c23601140
ts=508830818934878 cpu=21 thr=ffffd06663c02820 vmx_enter_guest:entry arg0=ffffd06c236027d0 arg1=ffffd06c235a0000
ts=508830818937438 cpu=21 thr=ffffd06663c02820 vmx_enter_guest:return arg0=24f arg1=0
ts=508830818938292 cpu=21 thr=ffffd06663c02820 vmx_exit_process:entry arg0=ffffd06c235a0000 arg1=5
ts=508830818939214 cpu=21 thr=ffffd06663c02820 vmx_handle_apic_write:entry arg0=ffffd06c235a0000 arg1=5
ts=508830818948920 cpu=21 thr=ffffd06663c02820 off-cpu state=0x1 wchan=fffffffffbc28008
ts=508830819018374 cpu=21 thr=ffffd06663c02820 on-cpu
ts=508830819033740 cpu=21 thr=ffffd06663c02820 vmx_handle_apic_write:return arg0=17e arg1=1

That itself is telling -- we were in vmx_handle_apic_write when we went off CPU -- as is the wchan:

> fffffffffbc28008/p
cbe_xcall_lock:
cbe_xcall_lock: 0xffffd0681b643480

Looking at the trace records for cpu=21 completes the picture -- note that the value in cbe_xcall_lock, ffffd0681b643480, is the thread we are about to see come on-CPU:

bcantrill@manta # grep -w cpu=21 $MANTA_INPUT_FILE | tail -70
ts=508830818909878 cpu=21 thr=ffffd06663c02820 vmx_exit_process:entry arg0=ffffd06c235a0000 arg1=5
ts=508830818910852 cpu=21 thr=ffffd06663c02820 vmx_handle_apic_write:entry arg0=ffffd06c235a0000 arg1=5
ts=508830818925842 cpu=21 thr=ffffd06663c02820 vmx_handle_apic_write:return arg0=17e arg1=1
ts=508830818927235 cpu=21 thr=ffffd06663c02820 vmx_exit_process:return arg0=bb arg1=1
ts=508830818928663 cpu=21 thr=ffffd06663c02820 vmm-vexit vcpu=5 rip=ffffffff8104ad6c reason=56
ts=508830818929885 cpu=21 thr=ffffd06663c02820 vmx_inject_interrupts:entry arg0=ffffd06c235a0000 arg1=5
ts=508830818931244 cpu=21 thr=ffffd06663c02820 vmx_inject_pir:entry arg0=ffffd0681d1d2440 arg1=500
ts=508830818932571 cpu=21 thr=ffffd06663c02820 vmx_inject_pir:return arg0=ec arg1=ffffd06c23601140
ts=508830818933680 cpu=21 thr=ffffd06663c02820 vmx_inject_interrupts:return arg0=201 arg1=ffffd06c23601140
ts=508830818934878 cpu=21 thr=ffffd06663c02820 vmx_enter_guest:entry arg0=ffffd06c236027d0 arg1=ffffd06c235a0000
ts=508830818937438 cpu=21 thr=ffffd06663c02820 vmx_enter_guest:return arg0=24f arg1=0
ts=508830818938292 cpu=21 thr=ffffd06663c02820 vmx_exit_process:entry arg0=ffffd06c235a0000 arg1=5
ts=508830818939214 cpu=21 thr=ffffd06663c02820 vmx_handle_apic_write:entry arg0=ffffd06c235a0000 arg1=5
ts=508830818948920 cpu=21 thr=ffffd06663c02820 off-cpu state=0x1 wchan=fffffffffbc28008
ts=508830818952013 cpu=21 thr=ffffd0681b643480 on-cpu
ts=508830818953181 cpu=21 thr=ffffd0681b643480 vmx_pending_intr:entry arg0=ffffd0681d138e00 arg1=0
ts=508830818954670 cpu=21 thr=ffffd0681b643480 vmx_pending_intr:return arg0=7a arg1=1
ts=508830818956542 cpu=21 thr=ffffd0681b643480 vmx_run:entry arg0=ffffd070fbe93000 arg1=4
ts=508830818957536 cpu=21 thr=ffffd0681b643480 vmx_msr_guest_enter:entry arg0=ffffd070fbe93000 arg1=4
ts=508830818958760 cpu=21 thr=ffffd0681b643480 vmx_msr_guest_enter:return arg0=e3 arg1=0
ts=508830818959855 cpu=21 thr=ffffd0681b643480 vmx_set_pcpu_defaults:entry arg0=ffffd070fbe93000 arg1=4
ts=508830818960809 cpu=21 thr=ffffd0681b643480 vmx_set_pcpu_defaults:return arg0=1af arg1=ffffd062ea086000
ts=508830818961655 cpu=21 thr=ffffd0681b643480 vmx_inject_interrupts:entry arg0=ffffd070fbe93000 arg1=4
ts=508830818962665 cpu=21 thr=ffffd0681b643480 vmx_inject_pir:entry arg0=ffffd0681d138e00 arg1=400
ts=508830818963858 cpu=21 thr=ffffd0681b643480 vmx_inject_pir:return arg0=ec arg1=fd
ts=508830818964518 cpu=21 thr=ffffd0681b643480 vmx_inject_interrupts:return arg0=201 arg1=fd
ts=508830818965295 cpu=21 thr=ffffd0681b643480 vmx_enter_guest:entry arg0=ffffd070fbef5740 arg1=ffffd070fbe93000
ts=508830818969253 cpu=21 thr=ffffd0681b643480 vmx_enter_guest:return arg0=24f arg1=0
ts=508830818970150 cpu=21 thr=ffffd0681b643480 vmx_exit_process:entry arg0=ffffd070fbe93000 arg1=4
ts=508830818970929 cpu=21 thr=ffffd0681b643480 vmx_handle_apic_write:entry arg0=ffffd070fbe93000 arg1=4
ts=508830818972184 cpu=21 thr=ffffd0681b643480 vmx_handle_apic_write:return arg0=17e arg1=1
ts=508830818973032 cpu=21 thr=ffffd0681b643480 vmx_exit_process:return arg0=bb arg1=1
ts=508830818973936 cpu=21 thr=ffffd0681b643480 vmm-vexit vcpu=4 rip=ffffffff8104ad6c reason=56
ts=508830818974760 cpu=21 thr=ffffd0681b643480 vmx_inject_interrupts:entry arg0=ffffd070fbe93000 arg1=4
ts=508830818975596 cpu=21 thr=ffffd0681b643480 vmx_inject_pir:entry arg0=ffffd0681d138e00 arg1=400
ts=508830818976390 cpu=21 thr=ffffd0681b643480 vmx_inject_pir:return arg0=ec arg1=ffffd070fbef4100
ts=508830818977266 cpu=21 thr=ffffd0681b643480 vmx_inject_interrupts:return arg0=201 arg1=ffffd070fbef4100
ts=508830818978341 cpu=21 thr=ffffd0681b643480 vmx_enter_guest:entry arg0=ffffd070fbef5740 arg1=ffffd070fbe93000
ts=508830818979961 cpu=21 thr=ffffd0681b643480 vmx_enter_guest:return arg0=24f arg1=0
ts=508830818980824 cpu=21 thr=ffffd0681b643480 vmx_exit_process:entry arg0=ffffd070fbe93000 arg1=4
ts=508830818981678 cpu=21 thr=ffffd0681b643480 vmcs_read:entry arg0=4404 arg1=40
ts=508830818982806 cpu=21 thr=ffffd0681b643480 vmcs_read:return arg0=32 arg1=800000f2
ts=508830818983922 cpu=21 thr=ffffd0681b643480 vmx_trigger_hostintr:entry arg0=f2 arg1=40
ts=508830818984761 cpu=21 thr=ffffd0681b643480 vmx_call_isr:entry arg0=d2 arg1=40
ts=508830818985593 cpu=21 thr=ffffd0681b643480 apix_do_interrupt:entry arg0=ffffd001ed72b5f0 arg1=40
ts=508830818986935 cpu=21 thr=ffffd0681b643480 apix_intr_enter:entry arg0=0 arg1=ffffd001ed72b6a8
ts=508830818988135 cpu=21 thr=ffffd0681b643480 apix_intr_enter:return arg0=f2 arg1=b
ts=508830818989404 cpu=21 thr=ffffd0681b643480 apix_hilevel_intr_prolog:entry arg0=ffffd062ea086000 arg1=b
ts=508830818990392 cpu=21 thr=ffffd0681b643480 apix_hilevel_intr_prolog:return arg0=c8 arg1=0
ts=508830818991813 cpu=21 thr=ffffd0681b643480 apix_hilevel_intr_epilog:entry arg0=ffffd062ea086000 arg1=0
ts=508830818993310 cpu=21 thr=ffffd0681b643480 apix_intr_exit:entry arg0=0 arg1=0
ts=508830818994430 cpu=21 thr=ffffd0681b643480 apix_intr_exit:return arg0=a8 arg1=ffffd062a3f4f540
ts=508830818995426 cpu=21 thr=ffffd0681b643480 apix_hilevel_intr_epilog:return arg0=d9 arg1=0
ts=508830818996399 cpu=21 thr=ffffd0681b643480 apix_do_softint:entry arg0=ffffd001ed72b5f0 arg1=0
ts=508830818997301 cpu=21 thr=ffffd0681b643480 apix_do_softint:return arg0=9a arg1=0
ts=508830818998195 cpu=21 thr=ffffd0681b643480 apix_do_interrupt:return arg0=22d arg1=15
ts=508830818999290 cpu=21 thr=ffffd0681b643480 vmx_call_isr:return arg0=24 arg1=d2
ts=508830819000267 cpu=21 thr=ffffd0681b643480 vmx_trigger_hostintr:return arg0=44 arg1=d2
ts=508830819001090 cpu=21 thr=ffffd0681b643480 vmx_exit_process:return arg0=bb arg1=1
ts=508830819002226 cpu=21 thr=ffffd0681b643480 vmm-vexit vcpu=4 rip=ffffffff8104ad6c reason=1
ts=508830819002902 cpu=21 thr=ffffd0681b643480 vmx_inject_interrupts:entry arg0=ffffd070fbe93000 arg1=4
ts=508830819003840 cpu=21 thr=ffffd0681b643480 vmx_inject_pir:entry arg0=ffffd0681d138e00 arg1=400
ts=508830819004627 cpu=21 thr=ffffd0681b643480 vmx_inject_pir:return arg0=ec arg1=ffffd070fbef4100
ts=508830819005486 cpu=21 thr=ffffd0681b643480 vmx_inject_interrupts:return arg0=201 arg1=ffffd070fbef4100
ts=508830819006615 cpu=21 thr=ffffd0681b643480 vmx_msr_guest_exit:entry arg0=ffffd070fbe93000 arg1=4
ts=508830819007787 cpu=21 thr=ffffd0681b643480 vmx_msr_guest_exit:return arg0=e5 arg1=47700
ts=508830819008747 cpu=21 thr=ffffd0681b643480 vmx_run:return arg0=2a7 arg1=0
ts=508830819015129 cpu=21 thr=ffffd0681b643480 off-cpu state=0x2 wchan=0
ts=508830819018374 cpu=21 thr=ffffd06663c02820 on-cpu
ts=508830819033740 cpu=21 thr=ffffd06663c02820 vmx_handle_apic_write:return arg0=17e arg1=1

This shows us clearly what happened: we blocked on cbe_xcall_lock -- likely to make the cross-call to reprogram a cyclic on a different CPU -- and while we didn't migrate while blocked (that is, we stayed on CPU 21), another VM's context was loaded in the meantime. To reprogram a cyclic when it's on a different CPU (which should be the uncommon case), we are clearly going to need to do something more invasive -- perhaps as simple as indicating that the exit wasn't entirely handled and bouncing back out.
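
As a rough sketch of that "bounce back out" idea -- hypothetical names throughout, not the code that was actually committed -- the timer-reprogramming path could refuse the fast path whenever the cyclic lives on a remote CPU, forcing the vm_run() loop to localize it before re-entering the guest:

/*
 * Hypothetical sketch only: vlapic_timer_t, vt_cyclic, and vt_cpu are
 * illustrative names, not the actual vmm data structures.
 */
typedef struct vlapic_timer {
	cyclic_id_t	vt_cyclic;	/* cyclic backing the APIC timer */
	processorid_t	vt_cpu;		/* CPU the cyclic currently lives on */
} vlapic_timer_t;

static boolean_t
vlapic_timer_reprogram_try(vlapic_timer_t *vt, hrtime_t when)
{
	if (vt->vt_cpu != CPU->cpu_id) {
		/*
		 * The cyclic is remote: reprogramming it would require a
		 * cross-call (and thus cbe_xcall_lock), which we cannot
		 * safely take here.  Tell the caller the exit wasn't
		 * entirely handled so it bounces out to a context that
		 * can localize.
		 */
		return (B_FALSE);
	}
	(void) cyclic_reprogram(vt->vt_cyclic, when);	/* local: no xcall */
	return (B_TRUE);
}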


Comment by Jira Bot
Created at 2018-04-02T16:53:52.013Z

illumos-joyent commit ca73f140cd7d1061c0f3af071ab0a8dfa2bee3ad (branch master, by Patrick Mooney)

OS-6860 bhyve requires APIC cyclic be localized
Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Bryan Cantrill <bryan@joyent.com>


Comment by Patrick Mooney [X]
Created at 2018-04-02T17:46:12.958Z

Bryan noted a different pathology which shed more light on this issue:

In thoth dump a301f83067e6da0c7b9db38e2ab434d3, I'm seeing a VLAPIC_TIMER_LOCK/cbe_xcall_lock deadlock

We grab that thing from the callout

Which means that you can have cbe_xcall_lock held and get a timer interrupt that takes VLAPIC_TIMER_LOCK

While the order is normally VLAPIC_TIMER_LOCK before cbe_xcall_lock (if only implicitly)
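
To make the inversion concrete, here is the cycle sketched as comments (pseudocode, not the actual call paths):

/*
 * Normal (thread-context) order:
 *	mutex_enter(&VLAPIC_TIMER_LOCK);	vlapic timer manipulation
 *	  -> cyclic reprogram/juggle
 *	    -> cbe_xcall(): mutex_enter(&cbe_xcall_lock);
 *
 * Inverted (interrupt-context) order:
 *	mutex_enter(&cbe_xcall_lock);		cyclic juggle in progress
 *	  ... APIC timer cyclic fires on this CPU ...
 *	    -> vlapic_callout_handler(): mutex_enter(&VLAPIC_TIMER_LOCK);
 *
 * If the VLAPIC_TIMER_LOCK holder is itself waiting on cbe_xcall_lock,
 * neither thread can make progress: a classic AB/BA deadlock.
 */

The stack below shows the interrupt side of exactly this cycle: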

> ffffd001ea617c40::findstack
stack pointer for thread ffffd001ea617c40: ffffd001ea617590
  ffffd001ea617890 0xffffd0631c6eda80()
  ffffd001ea6179b0 mutex_delay_default+0xc()
  ffffd001ea617a20 lock_set_spl_spin+0xb2()
  ffffd001ea617a90 mutex_vector_enter+0x361()
  ffffd001ea617ad0 vlapic_callout_handler+0x2a()
  ffffd001ea617af0 vmm_glue_callout_handler+0x2f()
  ffffd001ea617b90 cyclic_softint+0xfd()
  ffffd001ea617ba0 cbe_low_level+0x14()
  ffffd001ea617bf0 av_dispatch_softvect+0x78()
  ffffd001ea617c20 apix_dispatch_softint+0x35()
  ffffd001ece384a0 switch_sp_and_call+0x13()
  ffffd001ece384f0 apix_do_softint+0x6c()
  ffffd001ece38550 apix_do_interrupt+0x362()
  ffffd001ece38560 _interrupt+0xba()
  ffffd001ece38690 cbe_xcall+0x88()
  ffffd001ece38710 cyclic_remove_here+0x57()
  ffffd001ece387a0 cyclic_juggle_one_to+0xf4()
  ffffd001ece387e0 cyclic_move_here+0x88()
  ffffd001ece38810 vmm_glue_callout_localize+0x27()
  ffffd001ece38830 vlapic_localize_resources+0x15()
  ffffd001ece38860 vm_localize_resources+0x55()
  ffffd001ece38920 vm_run+0x10b()
  ffffd001ece38c20 vmmdev_do_ioctl+0xbce()
  ffffd001ece38cc0 vmm_ioctl+0x12c()
  ffffd001ece38d00 cdev_ioctl+0x39()
  ffffd001ece38d50 spec_ioctl+0x60()
  ffffd001ece38de0 fop_ioctl+0x55()
  ffffd001ece38f00 ioctl+0x9b()
  ffffd001ece38f10 sys_syscall+0x19f()

In this case, while localizing the cyclics for one vCPU, the APIC cyclic for a different vCPU interrupted the localization. When that interrupt's callout routine attempted to run, it blocked on the VLAPIC_TIMER_LOCK for that vCPU -- held by a thread on a different host CPU while that thread attempted to xcall to reprogram the now-remote cyclic. This shows that the cyclic-reprogramming routines in the APIC logic depend on localization whenever they are called from the restricted context of vmx_run.
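
The fix named in the jira-bot commit above takes the localization approach: before entering the restricted vmx_run context, the APIC cyclic is moved onto the host CPU running the vCPU, so subsequent reprogramming is always local and never reaches cbe_xcall. A minimal sketch of that step, inferred from the stack frames above (vmm_glue_callout_localize -> cyclic_move_here); the callout field name and locking details are assumptions:

void
vmm_glue_callout_localize(struct callout *c)
{
	/*
	 * Sketch: juggle the cyclic backing this callout onto the current
	 * CPU.  c_cyc_id is an assumed field name; cpu_lock is taken here
	 * on the assumption that cyclic juggling requires it.
	 */
	mutex_enter(&cpu_lock);
	cyclic_move_here(c->c_cyc_id);
	mutex_exit(&cpu_lock);
}

With the cyclic guaranteed local, the vlapic timer code can reprogram it without ever taking cbe_xcall_lock, eliminating both the blocking seen in the first dump and the lock-order inversion seen here.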