OS-7175: BLOCKIF_IOV_MAX inadequate for Win2016 on bhyve

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2018-09-07T23:37:27.745Z)

Fix Versions

2018-09-13 Astronaut Mike Dexter (Release Date: 2018-09-13)

Description

2018-08-24 22:16:09, Info                         DeviceIDPresent:Found device with ID [PCI\VEN_1AF4&DEV_1001&SUBSYS_00021AF4&REV_00] in the list.
2018-08-24 22:16:09, Info                  IBS    LookupDeviceIDsInInjectedDriverPackage:Found the device ID in the injected driver list.
2018-08-24 22:16:09, Info                  IBS    IsDeviceSupported:Device [Red Hat VirtIO SCSI controller] is supported
2018-08-24 22:16:09, Info                  IBS    IsDeviceSupported:Device description is [PCI Bus]
2018-08-24 22:16:09, Info                  IBS    DumpDeviceIDs:      H/w    ID [ACPI\VEN_PNP&DEV_0A03]
2018-08-24 22:16:09, Info                  IBS    DumpDeviceIDs:      H/w    ID [ACPI\PNP0A03]
2018-08-24 22:16:09, Info                  IBS    DumpDeviceIDs:      H/w    ID [*PNP0A03]
2018-08-24 22:16:09, Info                  IBS    IsDeviceIDPresent:Found device ID [*PNP0A03] in hwcompat list
2018-08-24 22:16:09, Info                  IBS    IsDeviceSupported:Device [PCI Bus] is supported
2018-08-24 22:16:09, Info                  IBS    IsDeviceSupported:Device description is [Microsoft ACPI-Compliant System]
2018-08-24 22:16:09, Info                  IBS    DumpDeviceIDs:      H/w    ID [ACPI_HAL\PNP0C08]
2018-08-24 22:16:09, Info                  IBS    DumpDeviceIDs:      H/w    ID [*PNP0C08]
2018-08-24 22:16:09, Info                  IBS    IsDeviceIDPresent:Found device ID [*PNP0C08] in hwcompat list
2018-08-24 22:16:09, Info                  IBS    IsDeviceSupported:Device [Microsoft ACPI-Compliant System] is supported
2018-08-24 22:16:09, Info                  IBS    IsDeviceSupported:Device description is [ACPI x64-based PC]
2018-08-24 22:16:09, Info                  IBS    DumpDeviceIDs:      H/w    ID [acpiapic]
2018-08-24 22:16:09, Info                  IBS    DumpDeviceIDs:      Compat ID [DETECTEDInternal\ACPI_HAL]
2018-08-24 22:16:09, Info                  IBS    DumpDeviceIDs:      Compat ID [DETECTED\ACPI_HAL]
2018-08-24 22:16:09, Info                  IBS    IsDeviceIDPresent:Found device ID [acpiapic] in hwcompat list
2018-08-24 22:16:09, Info                  IBS    IsDeviceSupported:Device [ACPI x64-based PC] is supported
2018-08-24 22:16:09, Info                  IBS    DetermineDeviceSupport:Disk 0 has the necessary driver support
2018-08-24 22:16:17, Info                         PublishDiskInfoOnBlackboard: Successfully serialized disk info.
2018-08-24 22:16:17, Info       [0x0606cc] IBS    ApplyDiskOperationUsingService: Formatting partition on disk [0] at offset [0x1f500000]; DC FS type = 0x3.
2018-08-24 22:16:17, Info       [0x0606cc] IBS    GetDisk: Querying VDS providers...
2018-08-24 22:16:17, Info       [0x0606cc] IBS    GetDisk: Finished querying VDS providers.
2018-08-24 22:16:17, Info       [0x0606cc] IBS    GetDisk: Querying VDS providers...
2018-08-24 22:16:17, Info       [0x0606cc] IBS    GetDisk: Finished querying VDS providers.
2018-08-24 22:16:17, Info       [0x0606cc] IBS    GetDisk: Querying VDS providers...
2018-08-24 22:16:17, Info       [0x0606cc] IBS    GetDisk: Finished querying VDS providers.
2018-08-24 22:16:17, Info       [0x0606cc] IBS    GetDisk: Querying VDS providers...
2018-08-24 22:16:17, Info       [0x0606cc] IBS    GetDisk: Finished querying VDS providers.
2018-08-24 22:16:18, Info       [0x0606cc] IBS    GetDisk: Querying VDS providers...
2018-08-24 22:16:18, Info       [0x0606cc] IBS    GetDisk: Finished querying VDS providers.
Assertion failed: n >= 2 && n <= BLOCKIF_IOV_MAX + 2, file pci_virtio_block.c, line 239
viostor driver used: DriverVer=12/04/2014,62.71.104.9600

Thoth ID: f5ccb946a466e80c0456ae9c8b6dcc70

The autoinstall ISO (driver is at /drivers/disk/amd64) and bhyve corefile are at /marsell/public/OS-7175

Comments

Comment by Patrick Mooney
Created at 2018-08-25T04:20:36.549Z
There's a change upstream which fixes this.  I'll look into doing a sync next week.

Comment by Patrick Mooney
Created at 2018-08-29T20:09:41.801Z
Updated at 2018-08-29T20:09:53.800Z
Just to confirm the value which blew the assertion...
The source:
	n = vq_getchain(vq, &idx, iov, BLOCKIF_IOV_MAX + 2, flags);

	/*
	 * The first descriptor will be the read-only fixed header,
	 * and the last is for status (hence +2 above and below).
	 * The remaining iov's are the actual data I/O vectors.
	 *
	 * XXX - note - this fails on crash dump, which does a
	 * VIRTIO_BLK_T_FLUSH with a zero transfer length
	 */
	assert(n >= 2 && n <= BLOCKIF_IOV_MAX + 2);

Corresponds with these instructions in the dump:
pci_vtblk_proc+0x47:            call   +0x10ce4 <vq_getchain>
pci_vtblk_proc+0x4c:            movl   %eax,%r14d
pci_vtblk_proc+0x4f:            leal   -0x2(%r14),%eax
pci_vtblk_proc+0x53:            cmpl   $0x11,%eax
pci_vtblk_proc+0x56:            ja     +0x30c   <pci_vtblk_proc+0x368>
...
pci_vtblk_proc+0x368:           movl   $0xef,%edx
pci_vtblk_proc+0x36d:           movl   $0x4514df,%esi
pci_vtblk_proc+0x372:           movl   $0x44c7f8,%edi
pci_vtblk_proc+0x377:           call   -0x1c3d4 <PLT=libc.so.1`_assert>
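
As an aside, the leal/cmpl/ja sequence is the usual compiler trick of folding a two-sided range check into one unsigned comparison. The sketch below is only an illustration of that equivalence, not code from the tree. The names check_source, check_compiled, checks_agree and IOV_MAX_FROM_DUMP are made up for the example, and the value 0x11 is assumed to be BLOCKIF_IOV_MAX as compiled into this binary, read off the cmpl operand.

```c
#include <stdint.h>

/*
 * Assumption: 0x11 (17) is BLOCKIF_IOV_MAX as built into the crashing
 * binary, taken from the cmpl operand in the disassembly.
 */
#define	IOV_MAX_FROM_DUMP	0x11

/* The check as written in the source. */
static int
check_source(int n)
{
	return (n >= 2 && n <= IOV_MAX_FROM_DUMP + 2);
}

/*
 * The check as compiled:
 *   leal -0x2(%r14),%eax ; cmpl $0x11,%eax ; ja <assert path>
 * Subtracting 2 and comparing as unsigned makes any n < 2 wrap to a
 * very large value, so a single "above" test covers both bounds.
 */
static int
check_compiled(int n)
{
	uint32_t biased = (uint32_t)n - 2;

	return (biased <= IOV_MAX_FROM_DUMP);
}

/* Returns 1 when both forms agree for every n in [lo, hi]. */
static int
checks_agree(int lo, int hi)
{
	for (int n = lo; n <= hi; n++) {
		if (check_source(n) != check_compiled(n))
			return (0);
	}
	return (1);
}
```

Calling checks_agree(-64, 64) exercises both sides of the range, including negative chain lengths, and check_compiled(35) takes the assert path.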

The %eax value is lost, unfortunately, but it lives on through %r14, which we can extract from the _assert stack frame:
libc.so.1`_assert+0x33:         movq   %r14,-0x28(%rbp)
To find the value:
> $C
fffffc7feb0007b0 libc.so.1`_lwp_kill+0xa()
fffffc7feb0007e0 libc.so.1`raise+0x20(6)
fffffc7feb000830 libc.so.1`abort+0x98()
fffffc7feb000a80 0xfffffc7fef2217da()
fffffc7feb000c60 pci_vtblk_proc+0x37c(681c40, 681c90)
...
> fffffc7feb000a80-28/D
0xfffffc7feb000a58:             35
A chain of 35 descriptors means 33 data vectors, which is definitely beyond the existing limit of 17.
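
The arithmetic in the steps above can be double-checked with a small sketch. It assumes, as the mdb session does, that fffffc7feb000a80 is the frame pointer of the _assert frame and that %r14 was saved 0x28 bytes below it. The helper names and IOV_MAX_FROM_DUMP are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Assumption: 17 (0x11) is the limit read from the cmpl operand. */
#define	IOV_MAX_FROM_DUMP	17

/*
 * Address of the saved %r14 slot: the _assert frame pointer minus 0x28,
 * matching "movq %r14,-0x28(%rbp)".
 */
static uint64_t
saved_r14_addr(uint64_t frame_ptr)
{
	return (frame_ptr - 0x28);
}

/*
 * Data I/O vectors in a chain of n descriptors: the first descriptor is
 * the fixed header and the last is the status byte, hence the -2.
 */
static int
data_iovs(int n)
{
	return (n - 2);
}
```

With the frame pointer from the stack trace, saved_r14_addr() gives fffffc7feb000a58, the address mdb printed, and data_iovs(35) gives 33 against a limit of 17.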

Comment by Patrick Mooney
Created at 2018-08-29T20:11:26.893Z
I was incorrect in stating that upstream had fixed this.  The review is still pending:
https://reviews.freebsd.org/D9033

Since we already have this #ifdef-ed from the old port, we can simply use the extended value now.

Comment by Patrick Mooney
Created at 2018-09-07T21:29:44.859Z
I tested this by booting various guest OSes (Linux, Windows 2016, FreeBSD, SmartOS) and generating some disk I/O load in each once it was up. With all of the queue length changes in place (block_if, vtblk queue size, and seg_max), those guests were content with the change.

Comment by Jira Bot
Created at 2018-09-07T23:37:32.058Z
illumos-joyent commit 192e1e6405f98e4b0a12f9488793c5dd000f3f7e (branch master, by Patrick Mooney)

OS-7175 BLOCKIF_IOV_MAX inadequate for Win2016 on bhyve
Reviewed by: Jorge Schrauwen <jorge@blackdot.be>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Jason King <jbk@joyent.com>