OS-7300: bhyve should not sync IO unless necessary

Details

Issue Type:Bug
Priority:4 - Normal
Status:Resolved
Created at:2018-10-10T19:54:20.064Z
Updated at:2018-12-03T22:58:05.376Z

People

Created by:Patrick Mooney [X]
Reported by:Patrick Mooney [X]
Assigned to:Patrick Mooney [X]

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2018-12-03T22:57:51.577Z)

Fix Versions

2018-12-06 Grizz (Release Date: 2018-12-06)

Related Issues

Related Links

Labels

bhyve

Description

While investigating less-than-optimal performance, I was dtracing the virtio-block code paths in bhyve. Looking for cases where the thread when off-cpu for a long time, expecting such conditions to be related to potential IO waits, I saw this stack prevalent during the results:

              genunix`cv_wait+0x70
              zfs`zil_commit_waiter+0x8b
              zfs`zil_commit_impl+0x4b
              zfs`zil_commit+0x57
              zfs`zvol_write+0x35a
              genunix`cdev_write+0x2d
              specfs`spec_write+0x4c1
              genunix`fop_write+0xf3
              genunix`pwritev+0x499
              unix`sys_syscall+0x19f

With the FLUSH capability negotiated on the device, we would have expected that normal write requests be buffered while FLUSH would force them to be committed to stable storage via fsync() on the host. Seeing zil_commit calls in the common write path is evidence that syncing is occurring for every write. Looking at zvol_write, the reasons are more clear:

        sync = !(zv->zv_flags & ZVOL_WCE) ||
            (zv->zv_objset->os_sync == ZFS_SYNC_ALWAYS);
...
        if (sync)
                zil_commit(zv->zv_zilog, ZVOL_OBJ);

So if the WCE state isn't asserted on the zvol (and we lack logic in bhyve to make that happen) then all writes will be synchronous. Ideally we should wire up bhyve to use the appropriate DKIO ioctls to make this happen when FLUSH is negotiated.

Comments

Comment by Patrick Mooney [X]
Created at 2018-10-10T23:02:59.441Z

Testing an instance with both a zvol-backed disk (the root) and a file-backed disk (additional data), I dtrace WCE-related actions during boot-up. Both devices started with WCE off, and then transitioned on once FLUSH was negotiated:

0  87164      pci_vtblk_apply_feats:entry pci_vtblk_apply_feats(679990, 0)
  0  87161            blockif_set_wce:entry blockif_set_wce:entry 6bbd70 0
  0   7453                      fcntl:entry fcntl(5, 3, 0000)
  0   7453                      fcntl:entry
  0   7454                     fcntl:return fcntl = 2042
  0   7454                     fcntl:return              8258
  0   7453                      fcntl:entry fcntl(5, 4, 2042)
  0   7453                      fcntl:entry
  0   7454                     fcntl:return fcntl = 0
  0   7454                     fcntl:return                 0
  0   7445                     fdsync:entry
  0   7446                    fdsync:return                 0
  0  87163           blockif_set_wce:return blockif_set_wce:return

  0  87164      pci_vtblk_apply_feats:entry pci_vtblk_apply_feats(6bd950, 0)
 28  87161            blockif_set_wce:entry blockif_set_wce:entry 677db0 0
 28   7437                      ioctl:entry ioctl(4, 425, fffffc7fe8001c2c)
 28   7437                      ioctl:entry     ioctl fffffc7fe8001c2c = 0
 28   7437                      ioctl:entry
 28   7438                     ioctl:return ioctl = 0
 28   7438                     ioctl:return                 0
 28   7445                     fdsync:entry
 28   7446                    fdsync:return                 0
 28  87163           blockif_set_wce:return blockif_set_wce:return

 28  87161            blockif_set_wce:entry blockif_set_wce:entry 6bbd70 0
 28   7453                      fcntl:entry fcntl(5, 3, 0000)
 28   7453                      fcntl:entry
 28   7454                     fcntl:return fcntl = 2042
 28   7454                     fcntl:return              8258
 28   7453                      fcntl:entry fcntl(5, 4, 2042)
 28   7453                      fcntl:entry
 28   7454                     fcntl:return fcntl = 0
 28   7454                     fcntl:return                 0
 28   7445                     fdsync:entry
 28   7446                    fdsync:return                 0
 28  87163           blockif_set_wce:return blockif_set_wce:return

 39  87164      pci_vtblk_apply_feats:entry pci_vtblk_apply_feats(679990, 10000644)
 39  87161            blockif_set_wce:entry blockif_set_wce:entry 677db0 1
 39   7437                      ioctl:entry ioctl(4, 425, fffffc7fe8200c2c)
 39   7437                      ioctl:entry     ioctl fffffc7fe8200c2c = 1
 39   7437                      ioctl:entry
 39   7438                     ioctl:return ioctl = 0
 39   7438                     ioctl:return                 0
 39  87163           blockif_set_wce:return blockif_set_wce:return

 39  87164      pci_vtblk_apply_feats:entry pci_vtblk_apply_feats(6bd950, 10000644)
 39  87161            blockif_set_wce:entry blockif_set_wce:entry 6bbd70 1
 39   7453                      fcntl:entry fcntl(5, 3, 0000)
 39   7453                      fcntl:entry
 39   7454                     fcntl:return fcntl = 2042
 39   7454                     fcntl:return              8258
 39   7453                      fcntl:entry fcntl(5, 4, 2002)
 39   7453                      fcntl:entry
 39   7454                     fcntl:return fcntl = 0
 39   7454                     fcntl:return                 0
 39  87163           blockif_set_wce:return blockif_set_wce:return

Comment by Patrick Mooney [X]
Created at 2018-12-03T21:50:05.835Z

Using the updated wad, I repeated the above test to validate that the WCE toggle was occurring. From inside an instance, I engaged a cache-heavy workload (non-SYNC writes via fio) and forced an fsync afterward. The writes streamed to disk at high rate (thanks to a fixed OS-7314).


Comment by Jira Bot
Created at 2018-12-03T22:58:05.376Z

illumos-joyent commit 09d9ee89aec409242d50a5e168d8d883fc7539e8 (branch master, by Patrick Mooney)

OS-7300 bhyve should not sync IO unless necessary
OS-7416 bhyve BLOCKIF_NUMTHR should keep pace with MAXREQ
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Mike Gerdts <mike.gerdts@joyent.com>
Approved by: Mike Gerdts <mike.gerdts@joyent.com>