Issue Type: | Bug |
---|---|
Priority: | 4 - Normal |
Status: | Resolved |
Created at: | 2015-05-05T19:13:50.000Z |
Updated at: | 2016-12-09T18:15:48.000Z |
Created by: | Former user |
---|---|
Reported by: | Former user |
Assigned to: | Former user |
Duplicate: The problem is a duplicate of an existing issue.
(Resolution Date: 2016-12-09T18:15:48.000Z)
We've encountered a three-way deadlock here in dlmgmtd.
We have a thread that is trying to fork:

    feefd395 lwp_suspend (1e)
    feef521b suspend_fork (febd5240, 8059f2c, 4, 0, 0, fef6f400) + 94
    feeeb98b forkx (0, 0, fe261760, 8057965, 5) + 11e
    feeebab3 fork (0, 4, feecab96, fef6b000, 8059a48, fe262458) + 1e
    08057984 dlmgmt_zfop (fe261be4, 734, 8057c10, fe261784, 81cdb20, 0) + bf
    08057bb1 dlmgmt_zfopen (fe261be4, 8059c13, 734, fe261fe4, fe261be4, 400) + b8
    08058f35 dlmgmt_process_db_onereq (81cdb20, 0, 0, 0, 0, 0) + 54
    08059077 dlmgmt_process_db_req (81cdb20, 8059c8d, 80593eb, 0, 0, 0) + 4a
    0805923a dlmgmt_db_init (734, fe262928, 41, 400) + cc
    08054d51 dlmgmt_zone_init (734, fe262df8, fe262d50, 805543c, 806b5e0, fe262d80) + 12e
    0805544d dlmgmt_zoneboot (fe262df8, fe262d80, fe262dac, 0, 8134a90, fed44200) + 55
    08055220 dlmgmt_handler (0, fe262df8, 8, 0, 0, 805518a) + 96
    feefdc9b __door_return () + 4b

In this state, the forking thread is ultimately blocked trying to suspend thread 0x17. Thread 0x17 is itself in the kernel, performing the following VNIC deletion:

    [ ffffff01646c89e0 _resume_from_idle+0xf4() ]
    ffffff01646c8a10 swtch+0x141()
    ffffff01646c8ab0 turnstile_block+0x21a(ffffff2bf1d1eec0, 0, ffffffffc011dad0, fffffffffbc08cc0, 0, 0)
    ffffff01646c8b20 rw_enter_sleep+0x19b(ffffffffc011dad0, 0)
    ffffff01646c8b90 vnic_dev_delete+0x43(8c8, 0, ffffff2d7fb5fc98)
    ffffff01646c8bd0 vnic_ioc_delete+0x28(ffffff2a06278818, fd65fd04, 100003, ffffff2d7fb5fc98, ffffff01646c8e58)
    ffffff01646c8c70 drv_ioctl+0x1e4(1200000000, 1710002, fd65fd04, 100003, ffffff2d7fb5fc98, ffffff01646c8e58)
    ffffff01646c8cb0 cdev_ioctl+0x39(1200000000, 1710002, fd65fd04, 100003, ffffff2d7fb5fc98, ffffff01646c8e58)
    ffffff01646c8d00 spec_ioctl+0x60(ffffff2377ba5000, 1710002, fd65fd04, 100003, ffffff2d7fb5fc98, ffffff01646c8e58, 0)
    ffffff01646c8d90 fop_ioctl+0x55(ffffff2377ba5000, 1710002, fd65fd04, 100003, ffffff2d7fb5fc98, ffffff01646c8e58, 0)
    ffffff01646c8eb0 ioctl+0x9b(0, 1710002, fd65fd04)
    ffffff01646c8f10 _sys_sysenter_post_swapgs+0x153()

So who owns the rwlock that it's blocked on?

                ADDR      OWNER/COUNT FLAGS WAITERS
    ffffffffc011dad0 ffffff3c54548880  B111 ffffff241f117b80 (W)
                                        ||| ffffff3409922b80 (W)
                     WRITE_LOCKED ------+|| ffffff2a744c1b80 (W)
                     WRITE_WANTED -------+| ffffff39a478b860 (W)
                      HAS_WAITERS --------+

    stack pointer for thread ffffff3c54548880: ffffff016164e3c0
    [ ffffff016164e3c0 _resume_from_idle+0xf4() ]
    ffffff016164e400 swtch_to+0xb6(ffffff2dc01e9860)
    ffffff016164e450 shuttle_resume+0x2af(ffffff2dc01e9860, ffffffffc0015fd0)
    ffffff016164e500 door_upcall+0x212(ffffff237834e400, ffffff016164e5e0, ffffff2353e78e18, ffffffffffffffff, 0)
    ffffff016164e580 door_ki_upcall_limited+0x67(ffffff2377ae9f58, ffffff016164e5e0, ffffff2353e78e18, ffffffffffffffff, 0)
    ffffff016164e5c0 stubs_common_code+0x51()
    ffffff016164e660 i_dls_mgmt_upcall+0xbf(ffffff016164e6b0, 8, ffffff016164e680, 30)
    ffffff016164e720 dls_mgmt_get_linkinfo+0x66(1907, ffffff25cb84a1c8, 0, 0, 0)
    ffffff016164e760 stubs_common_code+0x51()
    ffffff016164e7e0 mac_client_open+0x201(ffffff43c921b168, ffffff420abbf470, 0, 10)
    ffffff016164e830 i_dls_link_create+0x83(ffffff43c921b180, ffffff016164e848)
    ffffff016164e890 dls_link_hold_common+0x73(ffffff43c921b180, ffffff016164e8e8, 1)
    ffffff016164e8b0 dls_link_hold_create+0x1a(ffffff43c921b180, ffffff016164e8e8)
    ffffff016164e920 dls_devnet_create+0x67(ffffff43c921b168, 1907, 0)
    ffffff016164eaf0 vnic_dev_create+0x5cb(1907, 0, ffffff016164eb74, ffffff016164eb7c, ffffff016164eb50, ffffff016164eb78, ffffff230000000c, ffffff2300000000, ffffff0100000000, ffffffff00000000, ffffff37cdeb2044, ffffff2300000002, ffffff016164eb70, ffffff23d2dc1640)
    ffffff016164ebd0 vnic_ioc_create+0xfd(ffffff37cdeb2000, 8041590, 100003, ffffff23d2dc1640, ffffff016164ee58)
    ffffff016164ec70 drv_ioctl+0x1e4(1200000000, 1710001, 8041590, 100003, ffffff23d2dc1640, ffffff016164ee58)
    ffffff016164ecb0 cdev_ioctl+0x39(1200000000, 1710001, 8041590, 100003, ffffff23d2dc1640, ffffff016164ee58)
    ffffff016164ed00 spec_ioctl+0x60(ffffff2377ba5000, 1710001, 8041590, 100003, ffffff23d2dc1640, ffffff016164ee58, 0)
    ffffff016164ed90 fop_ioctl+0x55(ffffff2377ba5000, 1710001, 8041590, 100003, ffffff23d2dc1640, ffffff016164ee58, 0)
    ffffff016164eeb0 ioctl+0x9b(3, 1710001, 8041590)
    ffffff016164ef10 _sys_sysenter_post_swapgs+0x153()

That owner is in a door upcall to dlmgmtd, and the dlmgmtd thread servicing the upcall is, of course, blocked trying to grab the table lock in userland. So we need to figure out the right way forward with the table lock: we need a better way to serialize our door upcalls and essentially quiesce them while we fork.
@accountid:62431b8f258562006fa2866a discovered that we hit this again on:
us-east-1 CN MS08214 (https://east1-adminui.joyent.us/servers/00000000-0000-0000-0000-00259094bf40): plat=7.0/20141226T032659Z, adminIps=10.0.129.138, traits={"internal": "Manta Node"}, comments="Manta Node"
He is injecting an NMI at this time (~2015-06-23T05:34:01.090Z).
This is the same as OS-5363.