OS-8054: inotify watches lead to EBUSY during zfs mount


Issue Type: Bug
Priority: 4 - Normal
Created at: 2019-11-22T16:58:54.872Z
Updated at: 2020-06-23T16:33:52.831Z


Created by: Former user
Reported by: Former user
Assigned to: Former user


Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2020-06-03T20:04:18.780Z)

Fix Versions

2020-06-04 T Bone (Release Date: 2020-06-04)

lx images that use systemd have problems with delegated datasets whose mountpoint is a subdirectory of /. For instance, with centos 7 image 3dbbdcca-2eab-11e8-b925-23bf77789921, if you do:

# /native/sbin/zfs set mountpoint=/data zones/95cd540c-f6f9-c55e-c1a8-8fb17d6345b4/data
[root@95cd540c-f6f9-c55e-c1a8-8fb17d6345b4 ~]# df -h /data
Filesystem                                       Size  Used Avail Use% Mounted on
zones/95cd540c-f6f9-c55e-c1a8-8fb17d6345b4/data   10G   25K   10G   1% /data
# reboot

When the zone comes back up /data is not mounted. Trying to mount it fails due to EBUSY.

[root@95cd540c-f6f9-c55e-c1a8-8fb17d6345b4 ~]# df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/zfsds0      11G  753M   10G   7% /
[root@95cd540c-f6f9-c55e-c1a8-8fb17d6345b4 ~]# /native/sbin/zfs mount -a
cannot mount 'zones/95cd540c-f6f9-c55e-c1a8-8fb17d6345b4/data': mountpoint or dataset is busy
[root@95cd540c-f6f9-c55e-c1a8-8fb17d6345b4 ~]# find /data
[root@95cd540c-f6f9-c55e-c1a8-8fb17d6345b4 ~]# /native/usr/sbin/fuser /data
[root@95cd540c-f6f9-c55e-c1a8-8fb17d6345b4 ~]#

zfs_mount() fails with EBUSY when the mountpoint vnode's v_count is non-zero, unless the overlay (-O) mount option is used.

Going into mdb to look at the vnode, we can see that v_count and v_femhead->femh_list->feml_tos (the top of stack, i.e. the number of monitors on the list) are equal.

> fffffe0bec6f2580::print vnode_t v_count v_path
v_count = 0x5
v_path = 0xfffffe0be4025488 "/zones/95cd540c-f6f9-c55e-c1a8-8fb17d6345b4/root/foo"
> fffffe0bec6f2580::print vnode_t v_femhead->femh_list->feml_tos
v_femhead->femh_list->feml_tos = 0x5

The Linux inotify(7) man page anticipates that file systems may be mounted over a watched directory and says:

       If a filesystem is mounted on top of a monitored directory, no event
       is generated, and no events are generated for objects immediately
       under the new mount point.  If the filesystem is subsequently
       unmounted, events will subsequently be generated for the directory
       and the objects it contains.

Perhaps zfs_mount should consider a mountpoint not busy if all of its vnode holds are accounted for by the event monitors on v_femhead.


Comment by Former user
Created at 2019-11-22T17:08:26.363Z

This bug was discovered with mountpoint=/data, the critical part being that the mountpoint is a child of /. An effective workaround seems to be:

/native/sbin/zfs set mountpoint=/mnt/data zones/$(zonename)/data
rmdir /data
ln -s /mnt/data /

Comment by Former user
Created at 2019-12-02T14:08:42.596Z

Added the no-upstream flag because inotify is not upstream. If inotify is upstreamed, this should come along for the ride.

Comment by Former user
Created at 2019-12-02T15:30:11.549Z

These changes affect the behavior of mounts in the face of inotify watches only. inotify support has not been upstreamed to illumos.

I added new tests to the zfs test suite to verify that inotify and portfs watches do not interfere with mounting a file system. I ran the tests from my build against a baseline PI (joyent_20191121T115853Z) generated by Joyent automation and against a debug build of my own.

I experienced repeated "zfs on zfs" deadlocks running various zpool_upgrade tests, so I excluded those from the runs.

-bash-4.3$ diff -u /opt/zfs-tests/runfiles/smartos.run smartos-no-nested-zpool.run
--- /opt/zfs-tests/runfiles/smartos.run Tue Nov 26 17:17:29 2019
+++ smartos-no-nested-zpool.run Wed Nov 27 22:38:41 2019
@@ -348,12 +348,6 @@
 tags = ['functional', 'zpool_trim']

-tests = ['zpool_upgrade_001_pos', 'zpool_upgrade_002_pos',
-    'zpool_upgrade_003_pos', 'zpool_upgrade_004_pos', 'zpool_upgrade_005_neg',
-    'zpool_upgrade_006_neg', 'zpool_upgrade_007_pos', 'zpool_upgrade_008_pos',
-    'zpool_upgrade_009_neg']
 tests = ['zdb_001_neg', 'zfs_001_neg', 'zfs_allow_001_neg',
     'zfs_clone_001_neg', 'zfs_create_001_neg', 'zfs_destroy_001_neg',

The differences in the test results are as follows:

--- baseline    Thu Nov 28 12:20:41 2019
+++ fix Thu Nov 28 12:20:28 2019
@@ -224,7 +224,7 @@
 [PASS] /opt/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_mount_all_fail root)
 [PASS] /opt/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_mount_all_mountpoints root)
 [FAIL] /opt/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_mount_encrypted root)
-[FAIL] /opt/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_mount_watched_inotify root)
+[PASS] /opt/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_mount_watched_inotify root)
 [PASS] /opt/zfs-tests/tests/functional/cli_root/zfs_mount/zfs_mount_watched_portfs root)
 [PASS] /opt/zfs-tests/tests/functional/cli_root/zfs_mount/cleanup root)
 [PASS] /opt/zfs-tests/tests/functional/cli_root/zfs_program/setup root)
@@ -410,7 +410,7 @@
 [PASS] /opt/zfs-tests/tests/functional/cli_root/zpool_attach/attach-o_ashift root)
 [PASS] /opt/zfs-tests/tests/functional/cli_root/zpool_attach/cleanup root)
 [PASS] /opt/zfs-tests/tests/functional/cli_root/zpool_clear/setup root)
-[PASS] /opt/zfs-tests/tests/functional/cli_root/zpool_clear/zpool_clear_001_pos root)
+[KILLED] /opt/zfs-tests/tests/functional/cli_root/zpool_clear/zpool_clear_001_pos root)
 [PASS] /opt/zfs-tests/tests/functional/cli_root/zpool_clear/zpool_clear_002_neg root)
 [FAIL] /opt/zfs-tests/tests/functional/cli_root/zpool_clear/zpool_clear_003_neg root)
 [PASS] /opt/zfs-tests/tests/functional/cli_root/zpool_clear/zpool_clear_readonly root)
@@ -869,11 +871,11 @@
 [PASS] /opt/zfs-tests/tests/functional/rsend/rsend_007_pos root)
 [PASS] /opt/zfs-tests/tests/functional/rsend/rsend_013_pos root)
 [PASS] /opt/zfs-tests/tests/functional/rsend/rsend_014_pos root)
-[FAIL] /opt/zfs-tests/tests/functional/rsend/rsend_019_pos root)
+[PASS] /opt/zfs-tests/tests/functional/rsend/rsend_019_pos root)
 [PASS] /opt/zfs-tests/tests/functional/rsend/rsend_020_pos root)
 [FAIL] /opt/zfs-tests/tests/functional/rsend/rsend_021_pos root)
 [PASS] /opt/zfs-tests/tests/functional/rsend/rsend_022_pos root)
-[FAIL] /opt/zfs-tests/tests/functional/rsend/rsend_024_pos root)
+[PASS] /opt/zfs-tests/tests/functional/rsend/rsend_024_pos root)
 [PASS] /opt/zfs-tests/tests/functional/rsend/recv_secpolicy root)
 [FAIL] /opt/zfs-tests/tests/functional/rsend/send-c_verify_ratio root)
 [PASS] /opt/zfs-tests/tests/functional/rsend/send-c_verify_contents root)

That is, with the fix the new zfs_mount_watched_inotify test goes from failing to passing; the remaining differences are in tests unrelated to this change.

While setting up a CN to run the tests, I saw failures due to a missing truncate(1) command, which should have been caught earlier. I confirmed that the fix to zfstest.ksh notices this and gives an early warning. With all the required commands in place, zfstest.ksh works as expected.

Comment by Former user
Created at 2020-05-20T22:32:51.275Z

I've run the new zfs tests. The full zfs test suite is running now; it takes about 7-8 hours, and I will update with the results.

Additionally, I cd'ed into a watched directory and attempted a zfs mount, confirming that this still results in EBUSY (it did).

Comment by Former user
Created at 2020-05-29T22:43:46.600Z

After a few red herrings, the full zfs test suite ran successfully (only expected failures).

Comment by Former user
Created at 2020-05-29T22:46:59.863Z

I also made sure the MOUNTEDOVER portfs event was delivered by watching a directory and then doing a zfs create whose dataset mounts over the watched directory.

Comment by Jira Bot
Created at 2020-06-03T20:03:57.720Z

illumos-joyent commit 71b43f2a12f58ef8bc5a1965a3b742749bb49231 (branch master, by Jason King)

OS-8054 inotify watches lead to EBUSY during zfs mount (#305)

Portions contributed by: Mike Gerdts <mike.gerdts@joyent.com>
Reviewed by: John Levon <john.levon@joyent.com>
Reviewed by: Dan McDonald <danmcd@joyent.com>
Approved by: Dan McDonald <danmcd@joyent.com>