OS-6928: recursive mutex in pci_xhci

Details

Issue Type:Bug
Priority:4 - Normal
Status:Resolved
Created at:2018-05-03T19:25:10.895Z
Updated at:2018-06-05T12:47:19.454Z

People

Created by:Mike Gerdts [X]
Reported by:Mike Gerdts [X]
Assigned to:Mike Gerdts [X]

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2018-05-09T01:02:00.534Z)

Fix Versions

2018-05-10 Rocket Town (Release Date: 2018-05-10)

Related Issues

Related Links

Labels

bhyve

Description

When bhyve is started as:

bhyve -H -B '1,product=SmartOS HVM' -c 2 -m 1024 \
    -s 0,amd_hostbridge \
    -s 1,lpc -l com1,stdio -l bootrom,/zones/media/uefi-rom.bin \
    -s 3,ahci-cd,/zones/media/CentOS-7-x86_64-DVD-1708.iso \
    -s 4,ahci-hd,/dev/zvol/rdsk/zones/testvol \
    -s 29,fbuf,vga=off,tcp=0.0.0.0:5900,w=1024,h=768,wait \
    -s 30,xhci,tablet vm1

The bhyve process hangs with these two threads trying to take a lock:

 fffffc7fef2ae287 lwp_park (0, 0, 0)
 fffffc7fef2a6049 mutex_lock_impl (1082e88, 0) + 189
 fffffc7fef2a6133 mutex_lock (1082e88) + 13
 0000000000435339 pci_xhci_try_usb_xfer (718500, 718820, 718890, fffffc7fbb7b7060, 1, 3) + 69
 00000000004359d7 pci_xhci_device_doorbell (718500, 1, 3, 0) + 117
 00000000004365a2 pci_xhci_dev_intr (718c48, 81) + e2
 000000000043dd4d umouse_event (0, 1a4, 156, 718c80) + dd
 000000000041d53d console_ptr_event (0, 1a4, 156) + 2d
 00000000004393f7 rfb_recv_ptr_msg (6859a0, 7) + 47
 0000000000439802 rfb_handle (6859a0, 7) + 222
 000000000043990b rfb_thr (6859a0) + 6b
 fffffc7fef2adf2a _thrp_setup (fffffc7fef118240) + 8a
 fffffc7fef2ae240 _lwp_start ()
-----------------  lwp# 19 / thread# 19  --------------------
 fffffc7fef2ae287 lwp_park (0, 0, 0)
 fffffc7fef2a6049 mutex_lock_impl (1082e88, 0) + 189
 fffffc7fef2a6133 mutex_lock (1082e88) + 13
 0000000000435339 pci_xhci_try_usb_xfer (718500, 718820, 718890, fffffc7fbb7b7060, 1, 3) + 69
 0000000000435606 pci_xhci_handle_transfer (718500, 718820, 718890, fffffc7fbb7b7060, fffffc7fbb742000, 1, fffffc7f00000003, 3b342010, ...) + 1e6
 00000000004359ba pci_xhci_device_doorbell (718500, 1, 3, 0) + fa
 0000000000435a8a pci_xhci_dbregs_write (718500, 4a4, 3) + 4a
 0000000000435f76 pci_xhci_write (66bdd0, 0, 718290, 0, 4a4, 4, 3) + e6
 0000000000428dfd pci_emul_mem_handler (66bdd0, 0, 2, c20004a4, 4, fffffc7feab4fcc8, 718290, 0) + 19d
 000000000041f6b7 mem_write (66bdd0, 0, c20004a4, 3, 4, 685940) + 47
 0000000000441ae2 emulate_mov (66bdd0, 0, c20004a4, 461148, 41f620, 41f670, 685940) + b2
 000000000044364c vmm_emulate_instruction (66bdd0, 0, c20004a4, 461148, 461130, 41f620, 41f670, 685940) + 6c
 000000000041f7dc emulate_mem (66bdd0, 0, c20004a4, 461148, 461130) + 11c
 000000000041b118 vmexit_inst_emul (66bdd0, 461100, fffffc7feab4fefc) + 48
 000000000041b3af vm_loop (66bdd0, 0, fff0) + cf
 000000000041a90f fbsdrun_start_thread (665380) + 4f
 fffffc7fef2adf2a _thrp_setup (fffffc7fef118a40) + 8a
 fffffc7fef2ae240 _lwp_start ()

Triggering the hang likely requires some mouse and/or keyboard input.

The hang occurs because pci_xhci_handle_transfer() already holds the lock that its callee pci_xhci_try_usb_xfer() then tries to take.
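
The self-deadlock pattern described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not the bhyve code and not necessarily the fix that was committed: dev_mtx, dev_init(), try_xfer_locking(), try_xfer_locked(), handle_transfer_buggy() and handle_transfer_fixed() are all hypothetical names. It uses a PTHREAD_MUTEX_ERRORCHECK mutex so that a relock by the owning thread returns EDEADLK; with a default mutex the same relock would typically block forever, as in the stacks above.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the device lock; not the real pci_xhci state. */
static pthread_mutex_t dev_mtx;

/* Create an error-checking mutex so a relock reports EDEADLK, not a hang. */
static void dev_init(void)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&dev_mtx, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);
}

/* Callee that takes the lock itself, like the callee in the report. */
static int try_xfer_locking(void)
{
    int rc = pthread_mutex_lock(&dev_mtx);

    if (rc != 0)
        return rc;          /* EDEADLK when the caller already holds dev_mtx */
    /* ... transfer work would go here ... */
    pthread_mutex_unlock(&dev_mtx);
    return 0;
}

/* Assumed alternative: callee that requires the caller to hold dev_mtx. */
static int try_xfer_locked(void)
{
    /* ... transfer work would go here ... */
    return 0;
}

/* Buggy shape: the caller holds dev_mtx and the callee locks it again. */
static int handle_transfer_buggy(void)
{
    int rc;

    pthread_mutex_lock(&dev_mtx);
    rc = try_xfer_locking();
    pthread_mutex_unlock(&dev_mtx);
    return rc;
}

/* One possible repair: call the variant that does not relock. */
static int handle_transfer_fixed(void)
{
    int rc;

    pthread_mutex_lock(&dev_mtx);
    rc = try_xfer_locked();
    pthread_mutex_unlock(&dev_mtx);
    return rc;
}
```

After calling dev_init(), handle_transfer_buggy() returns EDEADLK while handle_transfer_fixed() returns 0. Splitting a function into a locking entry point and a lock-already-held variant is one common way to remove this kind of recursion; making the mutex recursive is another, at the cost of hiding lock-ordering mistakes.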

Comments

Comment by Patrick Mooney [X]
Created at 2018-05-04T03:06:04.356Z

I would double-check upstream to make sure it hasn't already been addressed. If not, it would be a good one to get upstream once we've addressed it.


Comment by Mike Gerdts [X]
Created at 2018-05-04T03:34:31.119Z

I have a fix and am working through the FreeBSD build process now.


Comment by Mike Gerdts [X]
Created at 2018-05-08T15:44:32.495Z

FreeBSD process is stalled on copyrights. Going ahead with this fix in SmartOS with normal #ifdefs to track delta.

Test of the fix involves this script:

bootrom=/zones/media/uefi-rom.bin
pfexec bhyve -H -B "1,product=OmniOS HVM" \
        -s 0,hostbridge \
        -s 31,lpc \
        -l bootrom,$bootrom \
        -l com1,stdio -c 2 -m 1G \
        -s 3:0,ahci-hd,/dev/zvol/rdsk/zones/hdd-windows \
        -s 4:0,ahci-cd,/zones/media/win2012-eval-20180501.iso \
        -s 28,fbuf,vga=off,tcp=0.0.0.0:5900,wait,w=1024,h=768 \
        -s 28:1,xhci,tablet \
        windows

I then connected to the VNC server on port 5900. Without this fix, a hang was observed before reading the CD. With the fix, it is able to boot from the CD. When it reaches the part of the installer where the mouse is usable, the VNC mouse and the guest mouse track quite well, showing that the xhci,tablet device is working. Without xhci,tablet, the mouse still works but tracks poorly.


Comment by Jira Bot
Created at 2018-05-09T01:01:54.333Z

illumos-joyent commit db9746572d6692af88b9f55827fcb63f1bcb3c46 (branch master, by Mike Gerdts)

OS-6928 recursive mutex in pci_xhci
Reviewed by: John Levon <john.levon@joyent.com>
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Approved by: Patrick Mooney <patrick.mooney@joyent.com>