OS-6684: cyclic reprogramming can race with removal

Details

Issue Type:Bug
Priority:4 - Normal
Status:Resolved
Created at:2018-02-23T16:24:43.400Z
Updated at:2019-07-18T22:22:57.429Z

People

Created by:Former user
Reported by:Former user
Assigned to:Former user

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2019-07-18T22:22:57.415Z)

Fix Versions

2019-08-01 Xerox, Cerie (Release Date: 2019-08-01)

Labels

bhyve

Description

While running vmadm start / stop -f in a tight loop, I hit the following:

> ::status
debugging crash dump vmcore.22 (64-bit) from dent
operating system: 5.11 joyent_20180216T202957Z (i86pc)
image uuid: (not set)
panic message: attempt to reprogram non-existent cyclic
dump content: kernel pages only
> $C
ffffff000f8328d0 vpanic()
ffffff000f832950 cyclic_reprogram_cyclic+0xb0(ffffff03eb144000, b, 2cc15105498)
ffffff000f832990 cyclic_reprogram+0x1d8(ffffff0437fa49d0, 2cc15105498)
ffffff000f832a00 vmm_glue_callout_reset_sbt+0x8b(ffffff03e8e0d788, c038ccb3532, e0ef1f1, fffffffff7b1fbb0, ffffff03e8e0d7b0, 200)
ffffff000f832a40 pit_timer_start_cntr0+0xab(ffffff03e8e0d740)
ffffff000f832a80 vatpit_callout_handler+0x98(ffffff03e8e0d7b0)
ffffff000f832aa0 vmm_glue_callout_handler+0x2f(ffffff03e8e0d788)
ffffff000f832b90 cyclic_softint+0x209(ffffff03eb0c5580, 0)
ffffff000f832ba0 cbe_low_level+0x14()
ffffff000f832bf0 av_dispatch_softvect+0x88(2)
ffffff000f832c20 apix_dispatch_softint+0x35(0, 0)
ffffff000f9d4a10 switch_sp_and_call+0x13()
ffffff000f9d4a60 apix_do_softint+0x44(ffffff000f9d4ae0)
ffffff000f9d4ad0 apix_do_interrupt+0x480(ffffff000f9d4ae0, ffffff03eb0daea0)
ffffff000f9d4ae0 _interrupt+0x1e9()
ffffff000f9d4bd0 mach_cpu_idle+6()
ffffff000f9d4c00 cpu_idle+0x11a()
ffffff000f9d4c20 idle+0xa7()
ffffff000f9d4c30 thread_start+8()

Comments

Comment by Former user
Created at 2018-02-23T17:17:30.991Z

thoth debug 464ddeb48e17b7faa63631c62d8890a5


Comment by Former user
Created at 2018-03-21T14:52:29.877Z

This is still current. Perhaps not a great surprise, but this doesn't seem to occur - or is certainly harder to trigger - if using "vmadm stop" rather than "vmadm stop -f".


Comment by Former user
Created at 2019-07-10T21:53:53.257Z

We died in the vhpet this time, in the f6d5 dump. This looks like a race in the cyclic subsystem, or something along those lines.


Comment by Former user
Created at 2019-07-12T20:42:15.614Z

It's now available here:

I fixed it up (was under /jlevon):

$ thoth debug 464ddeb48e17b7faa63631c62d8890a5
thoth: using database at 165.225.169.53:28015 (configured from Manta)
thoth: debugging 464ddeb48e17b7faa63631c62d8890a5


Comment by Former user
Created at 2019-07-12T22:19:52.309Z
Updated at 2019-07-15T17:11:53.201Z

After looking at the vhpet-related dump, and checking back against @accountid:62562f1fcdc24000704b0435's original, I've come up with a hypothesis regarding the issue.
In both cases, the panic occurs during an attempted timer reprogramming, done as part of a virtual device's running callout, at the same time that a reboot is occurring.

  fffffb0410bc9980 cyclic_reprogram_cyclic+0x97(fffffdc5496f0d40, c3, 370a98496f2f04)
  fffffb0410bc99b0 cyclic_reprogram+0x108(fffffdce41a295c0, 370a98496f2f04)
  fffffb0410bc9a00 vmm_glue_callout_reset_sbt+0x8b(fffffdc859b8ac90, ec66a4e704be61, 3ff6900, fffffffff8494aa0, fffffdc859b8acb8, 200)
  fffffb0410bc9a60 vhpet_start_timer+0x99(fffffdc859b8ac40, 0, 3e391951, ec66a4e3055561)
  fffffb0410bc9ab0 vhpet_handler+0x76(fffffdc859b8acb8)
  fffffb0410bc9ad0 vmm_glue_callout_handler+0x2f(fffffdc859b8ac90)
  fffffb0410bc9b70 cyclic_softint+0xfd(fffffdc544958000, 0)
  fffffb0410bc9b80 cbe_low_level+0x14()
  fffffb0410bc9bd0 av_dispatch_softvect+0x78(2)
  fffffb0410bc9c00 apix_dispatch_softint+0x35(0, 0)
  fffffb0410b789f0 switch_sp_and_call+0x13()
  fffffb0410b78a40 apix_do_softint+0x6c(fffffb0410b78ab0)
  fffffb0410b78aa0 apix_do_interrupt+0x382(fffffb0410b78ab0, 0)
  fffffb0410b78ab0 _interrupt+0xba()
  fffffb0410b78ba0 i86_mwait+0x14()
  fffffb0410b78be0 cpu_idle_mwait+0x109()
  fffffb0410b78c00 idle+0xa7()
  fffffb0410b78c10 thread_start+8()

In another thread, we can see the re-init (read: reboot) path in the process of removing that cyclic:

stack pointer for thread fffffe55d37b8080 (bhyve/1 [vcpu 0]): fffffb04249a9560
[ fffffb04249a9560 _resume_from_idle+0x12b() ]
  fffffb04249a9590 swtch+0x141()
  fffffb04249a95d0 sema_p+0x1d7(fffffdc5496f0de0)
  fffffb04249a9650 cyclic_remove_here+0xac(fffffdc5496f0d40, c3, 0, 0)
  fffffb04249a9690 cyclic_remove+0x38(fffffdce41a295c0)
  fffffb04249a96c0 vmm_glue_callout_drain+0x27(fffffdc859b8ac90)
  fffffb04249a96f0 vhpet_cleanup+0x40(fffffdc859b8ac40)
  fffffb04249a9730 vm_cleanup+0x66(fffffdca49aad000, 0)
  fffffb04249a9760 vm_reinit+0x38(fffffdca49aad000)
  fffffb04249a9c20 vmmdev_do_ioctl+0xc43(fffffdcba98b4dc0, 20007605, 0, 202003, fffffdcb461ef0e0, fffffb04249a9ea8)
  fffffb04249a9cc0 vmm_ioctl+0x12f(13100000017, 20007605, 0, 202003, fffffdcb461ef0e0, fffffb04249a9ea8)
  fffffb04249a9d00 cdev_ioctl+0x39(13100000017, 20007605, 0, 202003, fffffdcb461ef0e0, fffffb04249a9ea8)
  fffffb04249a9d50 spec_ioctl+0x60(fffffdcb12aa0a00, 20007605, 0, 202003, fffffdcb461ef0e0, fffffb04249a9ea8, 0)
  fffffb04249a9de0 fop_ioctl+0x55(fffffdcb12aa0a00, 20007605, 0, 202003, fffffdcb461ef0e0, fffffb04249a9ea8, 0)
  fffffb04249a9f00 ioctl+0x9b(3, 20007605, 0)
  fffffb04249a9f10 sys_syscall+0x19f()

Now cyclic_remove will wait for the pending handler to complete before it returns to the caller, as evidenced by the blocking sema_p call. The fact that the handler is trying to reprogram that very cyclic is where it gets tricky. I've theorized the following sequence of events leading to this condition:

1. The cyclic for vhpet_handler triggers, resulting in an interrupt on that CPU (at CBE_LOW_PIL of 2).
2. That handler executes some of its logic, but has not yet reached the point at which it reprograms the cyclic.
3. On a different CPU, a thread calls vm_reinit, which, as part of its cleanup, goes to remove the cyclic.
4. The cyclic_remove call results in a cyclic xcall to the CPU currently executing the handler, at CBE_HIGH_PIL, effectively preempting it.
5. In cyclic_remove_xcall, the cyclic is removed from the cyc_cpu_t heap and marked CYF_FREE.
6. Because the handler is still running at that moment (cy_pend != 0), the cyclic_remove_here call is forced to wait for its completion via sema_p.
7. With the CBE_HIGH_PIL interrupt handling complete, the cyclic handler continues and, as part of its duty, attempts to reprogram the now-removed cyclic.
8. vpanic ensues.

It seems that the cyclic_reprogram logic needs to be updated to detect when a cyclic has been removed out from under its running handler, and to report the failure back up to its caller rather than panicking the box.
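A minimal sketch of that shape (illustrative only, not the actual change: the allow_missing argument, the boolean_t return plumbing, and the surrounding code are assumptions; the heap search and the panic message follow the existing cyclic subsystem):

#include <sys/cyclic_impl.h>	/* cyc_cpu_t, cyc_index_t (sketch context) */

/*
 * Sketch: when the heap search comes up empty and the reprogram request
 * came from the cyclic's own running handler, report failure rather than
 * panicking.  Not the committed change.
 */
static boolean_t
cyclic_reprogram_cyclic(cyc_cpu_t *cpu, cyc_index_t ndx, hrtime_t expire,
    boolean_t allow_missing)
{
	cyc_index_t *heap = cpu->cyp_heap;
	cyc_index_t nelems = cpu->cyp_nelems;
	cyc_index_t i;

	/* Locate the cyclic's slot in the expiration heap. */
	for (i = 0; i < nelems; i++) {
		if (heap[i] == ndx)
			break;
	}

	if (i == nelems) {
		/*
		 * The cyclic is no longer in the heap.  If the reprogram
		 * came from the cyclic's own handler, it has raced with
		 * cyclic_remove(); tell the caller instead of panicking.
		 */
		if (allow_missing)
			return (B_FALSE);
		panic("attempt to reprogram non-existent cyclic");
	}

	/* ... otherwise update the expiration and re-heapify as before ... */
	return (B_TRUE);
}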


Comment by Former user
Created at 2019-07-15T15:58:03.384Z

As far as I can figure out, Patrick's explanation certainly seems possible. It sounds like a very simple test could verify that this is how it breaks?

Fixing up cyclic_reprogram() is one way to go, though removing the cyclic from the heap while its handler is still executing seems a bit dubious in the first place; then again, attempting to leave a CYF_FREE cyclic on the heap is perhaps a little too nasty. It's a shame that we can't easily tell in cyclic_reprogram() that the cyclic is CYF_FREE and hence not on the heap.


Comment by Former user
Created at 2019-07-15T19:20:22.045Z

Per @accountid:62562f1fcdc24000704b0435's suggestion, I went looking for consumers of cyclic_reprogram to see how they would handle the additional possibility of a failure result. (That is, if cyclic_reprogram were updated to return 0 when a running handler attempts to reprogram a cyclic that has been removed out from under it in the manner above.) It appears that besides bhyve, the SPA deadman cyclic in ZFS is the only consumer which remotely resembles the bhyve case. Given that the cyclic only fires for a pool under stress, and that the deadman panics the box anyway when enabled, discounting it from this analysis should be OK.

Updating cyclic_reprogram to return an error to the running handler when it discovers that its target has been removed seems like a reasonable fix under these circumstances. Other failed cyclic reprogramming attempts should still trigger the panic.
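For the consumer side, a rough sketch of the pattern (assumptions throughout: the vmm_callout_t structure and its c_cyc_id/c_dead fields are hypothetical illustrations, not bhyve's actual glue code; only the "return 0 when the cyclic was removed under the handler" behavior comes from the discussion above):

#include <sys/types.h>		/* boolean_t, hrtime_t */
#include <sys/cyclic.h>		/* cyclic_id_t, cyclic_reprogram() */

/* Hypothetical callout wrapper, for illustration only. */
typedef struct vmm_callout {
	cyclic_id_t	c_cyc_id;	/* cyclic backing this callout */
	boolean_t	c_dead;		/* set when the cyclic vanished */
} vmm_callout_t;

static void
vmm_callout_reset_hrtime(vmm_callout_t *c, hrtime_t target)
{
	if (cyclic_reprogram(c->c_cyc_id, target) == 0) {
		/*
		 * Only a handler racing with cyclic_remove() sees this
		 * return; any other reprogram failure still panics inside
		 * the cyclic subsystem.  Treat the callout as dead rather
		 * than pending.
		 */
		c->c_dead = B_TRUE;
		return;
	}
}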


Comment by Former user
Created at 2019-07-16T18:24:45.403Z
Updated at 2019-07-16T20:27:35.360Z

For testing, I drew up this dtrace script to widen the race window when using bhyve (it needs to be run with dtrace -w, since chill() is a destructive action):

BEGIN
{
        active = 0;
}

/*
 * Teardown has begun: stall vhpet_cleanup() for 20ms so the handler can
 * fire (and stall itself below) before the cyclic is removed.
 */
vhpet_cleanup:entry
{
        active = 1;
        trace("active");
        chill(20000000);
}

/*
 * Count handler firings; once cleanup is underway, stall inside the handler
 * so the removal overtakes it before it reprograms the cyclic.
 */
vhpet_handler:entry
{
        @ = count();

        if (active) {
                self->t = 1;
                chill(20000000);
        }
}

vhpet_handler:return
/self->t/
{
        self->t = 0;
}

/* Report the handler firing rate once per second. */
tick-1s
{
        printa(@);
        trunc(@);
}

/* arg1 is the return value: 0 means the reprogram reported failure. */
cyclic_reprogram_cyclic:return
/self->t && arg1 == 0/
{
        stack();
        trace("successful programming failure");
        exit(0);
}

My freshly booted victim VM (Windows, in this case, though never allowed to get far into start-up) produced about 64 cyclic reprogramming operations per second via the vhpet; that rate was used to derive the chill durations above (at 64 per second, firings are roughly 15.6ms apart, so a 20ms chill spans more than a full period). With the machine freshly started, I would force a reset with:

bhyvectl --vm=SYSbhyve-## --force-reset

On an unpatched machine, it resulted in the non-existent cyclic panic. On a patched machine, the "successful programming failure" trace message was printed.


Comment by Former user
Created at 2019-07-17T17:46:22.887Z

I also took care of another test case that John threw up: watching cyclics be juggled during CPU offlining. With dtrace, I looked for which CPU was home to the vhpet cyclics on an instance, and then took that CPU offline with psradm. With dtrace, I was able to see the cyclics successfully juggled elsewhere, without incurring any issues.


Comment by Jira Bot
Created at 2019-07-18T21:42:58.897Z

illumos-joyent commit 7cbfae1c5ff60dcbb4e007dcc2e6997eba63c207 (branch master, by Patrick Mooney)

OS-6684 cyclic reprogramming can race with removal
Reviewed by: John Levon <john.levon@joyent.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Jason King <jbk@joyent.com>