This looks very similar to OS-3342, and is possibly related to OS-4260. However, I observed it on CN MS10210, which was running joyent_20141226T032659Z, a platform that I believe should already include the fix for OS-3342.
We found Marlin zone resets hung. The oldest process waiting for a zone to boot was this one:
[root@MS10210 (us-east-1) ~]# pargs 69394
69394: zoneadm -z eddb8c2a-6a5d-483b-a640-953a1be85089 boot
argv[0]: zoneadm
argv[1]: -z
argv[2]: eddb8c2a-6a5d-483b-a640-953a1be85089
argv[3]: boot
[root@MS10210 (us-east-1) ~]# pfiles 69394
69394: zoneadm -z eddb8c2a-6a5d-483b-a640-953a1be85089 boot
Current rlimit: 65536 file descriptors
0: S_IFSOCK mode:0666 dev:558,0 ino:52794 uid:0 gid:0 rdev:0,0
O_RDWR
SOCK_STREAM
SO_SNDBUF(16384),SO_RCVBUF(5120)
sockname: AF_UNIX
peer: node[38750] zone: global[0]
1: S_IFSOCK mode:0666 dev:558,0 ino:12377 uid:0 gid:0 rdev:0,0
O_RDWR
SOCK_STREAM
SO_SNDBUF(16384),SO_RCVBUF(5120)
sockname: AF_UNIX
peer: node[38750] zone: global[0]
2: S_IFSOCK mode:0666 dev:558,0 ino:24346 uid:0 gid:0 rdev:0,0
O_RDWR
SOCK_STREAM
SO_SNDBUF(16384),SO_RCVBUF(5120)
sockname: AF_UNIX
peer: node[38750] zone: global[0]
3: S_IFDOOR mode:0444 dev:560,0 ino:46 uid:0 gid:0 rdev:559,0
O_RDONLY|O_LARGEFILE FD_CLOEXEC door to nscd[5351]
/var/run/name_service_door
4: S_IFCHR mode:0666 dev:549,0 ino:47185924 uid:0 gid:3 rdev:90,0
O_RDWR
/devices/pseudo/zfs@0:zfs
offset:0
5: S_IFREG mode:0444 dev:553,1 ino:2 uid:0 gid:0 rdev:0,0
O_RDONLY
/etc/mnttab
offset:0
6: S_IFREG mode:0444 dev:557,1 ino:128 uid:0 gid:0 rdev:0,0
O_RDONLY
/etc/dfs/sharetab
offset:0
7: S_IFCHR mode:0666 dev:549,0 ino:47185924 uid:0 gid:3 rdev:90,0
O_RDWR
/devices/pseudo/zfs@0:zfs
offset:0
8: S_IFDOOR mode:0600 dev:560,0 ino:650 uid:0 gid:0 rdev:559,0
O_RDONLY door to zoneadmd[74318]
/var/run/zones/eddb8c2a-6a5d-483b-a640-953a1be85089.zoneadmd_door
It's making a door call to its zoneadmd, which is:
[root@MS10210 (us-east-1) ~]# pstack 74318
74318: zoneadmd -z eddb8c2a-6a5d-483b-a640-953a1be85089
----------------- lwp# 1 / thread# 1 --------------------
fffffd7fff2c52c7 lwp_park (0, 0, 0)
fffffd7fff2bd1d9 mutex_lock_impl (4340a0, 0) + 189
fffffd7fff2bd2c3 mutex_lock (4340a0) + 13
000000000041291e serve_console (437d40) + 16e
000000000041144f main (3, fffffd7fffdffce8) + 95f
000000000040d44c _start () + 6c
----------------- lwp# 2 / thread# 2 --------------------
fffffd7fff2cca61 door (3, ffbffeff, 1, 1ff, 0, 8)
fffffd7fff2c4f6a _thrp_setup (fffffd7fff050a40) + 8a
fffffd7fff2c5280 _lwp_start ()
----------------- lwp# 3 / thread# 3 --------------------
fffffd7fff2cc15a read (6, 4881f4, 1400)
fffffd7fff2957de _filbuf (435f40) + 6e
fffffd7fff298130 fgets (fffffd7ffe53ca80, 400, 435f40) + 160
000000000040ea80 do_subproc (fffffd7ffe53e7d0, fffffd7ffe53cf00, 0, 0) + a0
000000000040debc brand_poststatechg (fffffd7ffe53e7d0, 2, 0, 0) + ac
000000000040e037 zone_ready (fffffd7ffe53e7d0, 0, 2, 0) + d7
00000000004100f4 server (0, fffffd7ffe53e8e0, 518, 0, 0) + 7c4
fffffd7fff2ccac0 __door_return () + 50
----------------- lwp# 4 / thread# 4 --------------------
fffffd7fff2cca8d door (0, 0, 0, fffffd7ffe15ee00, 1edf00, a)
fffffd7fff2b087d door_return (0, 0, 0, 0) + cd
fffffd7fff2b0f3c door_create_func (0) + 2c
fffffd7fff2c4f6a _thrp_setup (fffffd7fff051a40) + 8a
fffffd7fff2c5280 _lwp_start ()
which is waiting for a child process to do something. That child process has been stuck for a while:
[root@MS10210 (us-east-1) ~]# ptree 74318
74318 zoneadmd -z eddb8c2a-6a5d-483b-a640-953a1be85089
85545 /bin/ksh -p /usr/lib/brand/joyent-minimal/poststate eddb8c2a-6a5d-483b-a640-
85564 /bin/ksh -p /usr/lib/brand/joyent-minimal/statechange post eddb8c2a-6a5d-4
22352 dladm create-vnic -t -l ixgbe1 -p zone=eddb8c2a-6a5d-483b-a640-953a1be85
[root@MS10210 (us-east-1) ~]# ps -ostime -p 22352
STIME
12:50:47
[root@MS10210 (us-east-1) ~]# ps -oetime -p 22352
ELAPSED
14:54:23
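To put the ELAPSED figure in more comparable terms, here is a minimal sketch that converts a ps etime value to seconds. It assumes the standard `[[dd-]hh:]mm:ss` layout; the function name `etime_to_seconds` is made up for illustration and is not part of any tool used above.

```python
def etime_to_seconds(etime: str) -> int:
    """Convert a ps 'etime' value ([[dd-]hh:]mm:ss) to whole seconds."""
    text = etime.strip()
    days = 0
    # An optional leading "dd-" carries the number of whole days.
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in text.split(":")]
    # Left-pad with zeros so we always have hours, minutes, seconds.
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


if __name__ == "__main__":
    # Value taken from the ps -oetime output for pid 22352 above.
    print(etime_to_seconds("14:54:23"))
```

For the value above this comes to 53663 seconds, i.e. just under 15 hours inside a single dladm invocation.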
That process is making a door call to dlmgmtd:
[root@MS10210 (us-east-1) ~]# pargs 22352
22352: dladm create-vnic -t -l ixgbe1 -p zone=eddb8c2a-6a5d-483b-a640-953a1be85089 -m
argv[0]: dladm
argv[1]: create-vnic
argv[2]: -t
argv[3]: -l
argv[4]: ixgbe1
argv[5]: -p
argv[6]: zone=eddb8c2a-6a5d-483b-a640-953a1be85089
argv[7]: -m
argv[8]: 90:b8:d0:23:dd:1a
argv[9]: -v
argv[10]: 1355
argv[11]: tmp855640
[root@MS10210 (us-east-1) ~]# pstack 22352
22352: dladm create-vnic -t -l ixgbe1 -p zone=eddb8c2a-6a5d-483b-a640-953a1be
feefba48 door (7, 8047608, 0, 0, 0, 3)
feee87d2 door_call (7, 8047608, 28, fed8c94e) + ed
fed8c9b6 dladm_door_call (807c1b0, 8047688, 28, 8047670, 804766c, 8047888) + 76
fed8d4c0 dladm_zname2info (807c1b0, 0, 8047888, 80478a8, 0, 0) + 6c
fed8d561 dladm_name2info (807c1b0, 8047888, 80478a8, 0, 0, 0) + 2c
0805d9c8 do_create_vnic (b) + 5b5
08058aff main (80478fc, fef73688, 8047938, 8056177, c, 8047944) + ac
08056177 _start (c, 8047a88, 8047a8e, 8047a9a, 8047a9d, 8047aa0) + 83
[root@MS10210 (us-east-1) ~]# pfiles 22352
22352: dladm create-vnic -t -l ixgbe1 -p zone=eddb8c2a-6a5d-483b-a640-953a1be
Current rlimit: 65536 file descriptors
0: S_IFIFO mode:0000 dev:547,0 ino:536229222 uid:0 gid:0 rdev:0,0
O_RDWR
1: S_IFIFO mode:0000 dev:547,0 ino:536498353 uid:0 gid:0 rdev:0,0
O_RDWR
2: S_IFIFO mode:0000 dev:547,0 ino:536498353 uid:0 gid:0 rdev:0,0
O_RDWR
3: S_IFCHR mode:0666 dev:549,0 ino:9437188 uid:0 gid:3 rdev:18,0
O_RDWR|O_LARGEFILE
/devices/pseudo/dld@0:ctl
offset:0
4: S_IFCHR mode:0666 dev:549,0 ino:9437188 uid:0 gid:3 rdev:18,0
O_RDWR
/devices/pseudo/dld@0:ctl
offset:0
5: S_IFDOOR mode:0644 dev:560,0 ino:63 uid:15 gid:65 rdev:559,0
O_RDONLY|O_LARGEFILE door to dlmgmtd[21]
/etc/svc/volatile/dladm/dlmgmt_door
6: S_IFCHR mode:0666 dev:549,0 ino:19922952 uid:0 gid:3 rdev:38,2
O_RDWR|O_LARGEFILE
/devices/pseudo/mm@0:null
offset:0
7: S_IFDOOR mode:0644 dev:560,0 ino:63 uid:15 gid:65 rdev:559,0
O_RDONLY door to dlmgmtd[21]
/etc/svc/volatile/dladm/dlmgmt_door
whose threads are here:
[root@MS10210 (us-east-1) ~]# mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci ufs ip hook neti sockfs arp usba stmf_sbd stmf zfs sd lofs idm mpt_sas crypto random cpc logindmux ptm kvm sppp nsmb smbsrv nfs ipc ]
> 0t21::pid2proc | ::walk thread | ::stacks
THREAD STATE SOBJ COUNT
ffffff431dbf8120 SLEEP SHUTTLE 2732
swtch+0x141
shuttle_swtch+0x203
door_return+0x214
doorfs32+0x180
sys_syscall32+0x109
ffffff431dba2840 STOPPED <NONE> 1029
swtch+0x141
stop+0x3f6
issig_forreal+0x36c
issig+0x25
door_return+0x3ef
doorfs32+0x180
sys_syscall32+0x109
ffffff43433013c0 SLEEP CV 178
swtch+0x141
cv_wait_sig_swap_core+0x1b9
cv_wait_sig_swap+0x17
cv_waituntil_sig+0xbd
lwp_park+0x15e
syslwp_park+0x63
sys_syscall32+0x109
ffffff43bbeba520 STOPPED <NONE> 9
swtch+0x141
stop+0x3f6
issig_forreal+0x36c
issig+0x25
cv_wait_sig_swap_core+0x303
cv_wait_sig_swap+0x17
cv_waituntil_sig+0xbd
lwp_park+0x15e
syslwp_park+0x63
sys_syscall32+0x109
ffffff44ffd547a0 SLEEP CV 1
swtch+0x141
cv_wait+0x70
i_mac_perim_enter+0x63
mac_perim_enter_by_mh+0x23
mac_perim_enter_by_macname+0x33
i_dls_devnet_setzid+0x6b
dls_devnet_unset+0x24a
dls_devnet_destroy+0x46
vnic_dev_delete+0x96
vnic_ioc_delete+0x28
drv_ioctl+0x1e4
cdev_ioctl+0x39
spec_ioctl+0x60
fop_ioctl+0x55
ioctl+0x9b
sys_syscall32+0x109
ffffff43f0430840 SLEEP CV 1
swtch+0x141
cv_wait_sig+0x185
lwp_suspend+0xa4
syslwp_suspend+0x48
sys_syscall32+0x109
ffffff431de61b60 STOPPED <NONE> 1
swtch+0x141
stop+0x3f6
issig_forreal+0x36c
issig+0x25
cv_wait_sig_swap_core+0x303
cv_wait_sig_swap+0x17
pause+0x45
sys_syscall32+0x109
ffffff474ad38820 STOPPED <NONE> 1
swtch+0x141
stop+0x3f6
issig_forreal+0x36c
issig+0x25
door_call+0x4d1
doorfs32+0xa7
sys_syscall32+0x109
This really looks a lot like OS-3342. I'm afraid we're going to wish we had the userland state of dlmgmtd, but I wasn't sure how to get it. (Last time we ran into this, most of the ptools hung on dlmgmtd, presumably because of the stuck thread.)
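The chain above (zoneadm to zoneadmd to dladm to dlmgmtd) was followed by hand from the "door to ..." lines in pfiles output. A rough sketch of how that step could be scripted is below. It assumes only the `door to name[pid]` text format seen in the output above; the helper name `door_peers` and the `SAMPLE` constant are hypothetical.

```python
import re

# Matches the "door to <command>[<pid>]" annotation that pfiles prints for doors.
DOOR_RE = re.compile(r"door to ([^\s\[]+)\[(\d+)\]")


def door_peers(pfiles_text):
    """Return sorted, de-duplicated (command, pid) pairs for door descriptors."""
    peers = set()
    for match in DOOR_RE.finditer(pfiles_text):
        peers.add((match.group(1), int(match.group(2))))
    return sorted(peers)


# Excerpt of the pfiles output for pid 69394 shown earlier in this ticket.
SAMPLE = """\
3: S_IFDOOR mode:0444 dev:560,0 ino:46 uid:0 gid:0 rdev:559,0
O_RDONLY|O_LARGEFILE FD_CLOEXEC door to nscd[5351]
/var/run/name_service_door
8: S_IFDOOR mode:0600 dev:560,0 ino:650 uid:0 gid:0 rdev:559,0
O_RDONLY door to zoneadmd[74318]
/var/run/zones/eddb8c2a-6a5d-483b-a640-953a1be85089.zoneadmd_door
"""

if __name__ == "__main__":
    for command, pid in door_peers(SAMPLE):
        print(command, pid)
```

This only lists the doors a process has open. Which door it is actually blocked in still has to come from pstack, as above.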
Former user commented on 2016-03-04T18:34:07.000-0500:
We saw this today on CN RM08212 in Manta. Customers reported jobs hung. The Marlin dashboard showed that this CN was resetting all zones and had 20x as much work queued on it as any of the others, which is consistent with stuck zone resets. I've found:
[root@RM08212 (us-east-3) ~]# svcs -p marlin-agent
STATE STIME FMRI
online Jan_22 svc:/smartdc/agent/marlin-agent:default
17:01:11 3705 zoneadm
17:01:31 5358 zoneadm
17:02:10 6618 zoneadm
17:02:34 6949 zoneadm
17:03:26 7735 zoneadm
17:04:35 8858 zoneadm
17:05:57 12492 zoneadm
17:06:00 12723 zoneadm
17:07:21 16987 zoneadm
17:07:27 17078 zoneadm
17:07:27 17079 zoneadm
17:08:38 17977 zoneadm
17:08:40 18056 zoneadm
17:11:11 23413 zoneadm
17:11:14 23655 zoneadm
17:12:05 26361 zoneadm
17:12:18 26559 zoneadm
17:12:50 27193 zoneadm
17:13:01 27413 zoneadm
17:14:00 28247 zoneadm
17:15:04 29092 zoneadm
Jan_22 38575 node
16:40:26 42377 zoneadm
16:40:28 42541 zoneadm
16:40:29 42710 zoneadm
16:40:31 42844 zoneadm
16:40:32 43009 zoneadm
16:40:32 43043 zoneadm
16:40:34 43138 zoneadm
16:40:35 43191 zoneadm
16:40:37 43452 zoneadm
16:40:42 43808 zoneadm
16:40:42 43812 zoneadm
16:40:43 43846 zoneadm
16:40:44 44068 zoneadm
16:40:44 44099 zoneadm
16:40:44 44132 zoneadm
16:40:44 44138 zoneadm
16:40:47 44530 zoneadm
16:40:48 44536 zoneadm
16:40:48 44630 zoneadm
16:40:48 44654 zoneadm
16:40:49 44711 zoneadm
16:40:49 44765 zoneadm
16:40:49 44804 zoneadm
16:40:50 44850 zoneadm
16:40:50 44871 zoneadm
16:40:50 44877 zoneadm
16:40:50 44927 zoneadm
16:40:51 44966 zoneadm
16:40:51 44992 zoneadm
16:40:53 45546 zoneadm
16:40:54 45869 zoneadm
16:40:54 45924 zoneadm
16:40:54 45946 zoneadm
17:25:19 48303 zoneadm
Jan_22 48446 node
16:41:19 50675 zoneadm
16:41:25 51288 zoneadm
16:41:29 51739 zoneadm
16:42:55 59932 zoneadm
16:43:08 61323 zoneadm
16:43:14 61887 zoneadm
16:43:20 62472 zoneadm
17:32:24 64704 zoneadm
17:32:24 64705 zoneadm
16:43:45 64902 zoneadm
16:43:48 65168 zoneadm
17:33:37 65704 zoneadm
16:44:00 66141 zoneadm
16:44:00 66146 zoneadm
16:44:02 66397 zoneadm
16:44:17 66854 zoneadm
16:44:57 67631 zoneadm
16:45:09 68087 zoneadm
16:45:21 68698 zoneadm
17:35:34 69076 zoneadm
16:45:37 70037 zoneadm
18:26:48 73760 zoneadm
18:26:54 73850 zoneadm
18:26:57 73943 zoneadm
16:47:58 76397 zoneadm
16:48:35 77021 zoneadm
16:48:46 77204 zoneadm
16:49:52 78139 zoneadm
16:53:18 86707 zoneadm
16:54:32 87696 zoneadm
16:56:32 93886 zoneadm
16:56:36 94190 zoneadm
16:56:48 94996 zoneadm
16:57:31 95750 zoneadm
[root@RM08212 (us-east-3) ~]# uname -v
joyent_20160121T174713Z
[root@RM08212 (us-east-3) ~]# pargs 42377
42377: zoneadm -z a70d4d06-182e-4bdb-8d57-66f04a6dd290 boot
argv[0]: zoneadm
argv[1]: -z
argv[2]: a70d4d06-182e-4bdb-8d57-66f04a6dd290
argv[3]: boot
[root@RM08212 (us-east-3) ~]# pstack 42377
42377: zoneadm -z a70d4d06-182e-4bdb-8d57-66f04a6dd290 boot
fef002d8 door (8, 8047318, 0, 0, 0, 3)
feeece72 door_call (8, 8047318, 400, fe9ed231) + ed
fe9ed3d1 zonecfg_call_zoneadmd (8047e93, 8047778, fee01a20, 1) + 1d1
0805cd35 boot_func (0, 8047ddc, 13d8, 8047dcc) + 22f
0805637e parse_and_run (1, 8047dd8, 100, 8047dcc) + 3c
080593f9 main (8047d8c, fef78728, 8047dc0, 8055b8b, 4, 8047dcc) + 3c1
08055b8b _start (4, 8047e88, 8047e90, 8047e93, 8047eb8, 0) + 83
[root@RM08212 (us-east-3) ~]# pfiles 42377
42377: zoneadm -z a70d4d06-182e-4bdb-8d57-66f04a6dd290 boot
Current rlimit: 65536 file descriptors
0: S_IFSOCK mode:0666 dev:561,0 ino:50828 uid:0 gid:0 rdev:0,0
O_RDWR
SOCK_STREAM
SO_SNDBUF(16384),SO_RCVBUF(5120)
sockname: AF_UNIX
peer: node[48446] zone: global[0]
1: S_IFSOCK mode:0666 dev:561,0 ino:52541 uid:0 gid:0 rdev:0,0
O_RDWR
SOCK_STREAM
SO_SNDBUF(16384),SO_RCVBUF(5120)
sockname: AF_UNIX
peer: node[48446] zone: global[0]
2: S_IFSOCK mode:0666 dev:561,0 ino:58266 uid:0 gid:0 rdev:0,0
O_RDWR
SOCK_STREAM
SO_SNDBUF(16384),SO_RCVBUF(5120)
sockname: AF_UNIX
peer: node[48446] zone: global[0]
3: S_IFDOOR mode:0444 dev:563,0 ino:1006 uid:0 gid:0 rdev:562,0
O_RDONLY|O_LARGEFILE FD_CLOEXEC door to nscd[45478]
4: S_IFCHR mode:0666 dev:552,0 ino:47185924 uid:0 gid:3 rdev:90,0
O_RDWR
/devices/pseudo/zfs@0:zfs
offset:0
5: S_IFREG mode:0444 dev:556,1 ino:2 uid:0 gid:0 rdev:0,0
O_RDONLY
/etc/mnttab
offset:0
6: S_IFREG mode:0444 dev:560,1 ino:128 uid:0 gid:0 rdev:0,0
O_RDONLY
/etc/dfs/sharetab
offset:0
7: S_IFCHR mode:0666 dev:552,0 ino:47185924 uid:0 gid:3 rdev:90,0
O_RDWR
/devices/pseudo/zfs@0:zfs
offset:0
8: S_IFDOOR mode:0600 dev:563,0 ino:840 uid:0 gid:0 rdev:562,0
O_RDONLY door to zoneadmd[44050]
[root@RM08212 (us-east-3) ~]# pstack 44050
44050: zoneadmd -z a70d4d06-182e-4bdb-8d57-66f04a6dd290
----------------- lwp# 1 / thread# 1 --------------------
fffffd7fff2b80d7 lwp_park (0, 0, 0)
fffffd7fff2afec9 mutex_lock_impl (4350c0, 0) + 189
fffffd7fff2affb3 mutex_lock (4350c0) + 13
0000000000412d5e serve_console (438d40) + 16e
00000000004117ff main (3, fffffd7fffdffce8) + 95f
000000000040d71c _start () + 6c
----------------- lwp# 2 / thread# 2 --------------------
fffffd7fff2bf8e1 door (3, ffbffeff, 1, 3ff, 0, 8)
fffffd7fff2b7d7a _thrp_setup (fffffd7fff030a40) + 8a
fffffd7fff2b8090 _lwp_start ()
----------------- lwp# 3 / thread# 3 --------------------
fffffd7fff2befda read (6, 4fd744, 1400)
fffffd7fff287f2e _filbuf (436f40) + 6e
fffffd7fff28a880 fgets (fffffd7ffe33ca80, 400, 436f40) + 160
000000000040ed50 do_subproc (fffffd7ffe33e7d0, fffffd7ffe33cf00, 0, 0) + a0
000000000040e18c brand_poststatechg (fffffd7ffe33e7d0, 2, 0, 0) + ac
000000000040e307 zone_ready (fffffd7ffe33e7d0, 0, 2, 0) + d7
00000000004104a4 server (0, fffffd7ffe33e8e0, 518, 0, 0) + 7c4
fffffd7fff2bf940 __door_return () + 50
----------------- lwp# 4 / thread# 4 --------------------
fffffd7fff2bf90d door (0, 0, 0, fffffd7ffe13fe00, 1edf00, a)
fffffd7fff2a331d door_return (0, 0, 0, 0) + cd
fffffd7fff2a39dc door_create_func (0) + 2c
fffffd7fff2b7d7a _thrp_setup (fffffd7fff031a40) + 8a
fffffd7fff2b8090 _lwp_start ()
[root@RM08212 (us-east-3) ~]# ptree 44050
44050 zoneadmd -z a70d4d06-182e-4bdb-8d57-66f04a6dd290
45333 /bin/ksh -p /usr/lib/brand/joyent-minimal/poststate a70d4d06-182e-4bdb-8d57-
45371 /bin/ksh -p /usr/lib/brand/joyent-minimal/statechange post a70d4d06-182e-4
77347 dladm create-vnic -t -l ixgbe1 -p mtu=1500,zone=a70d4d06-182e-4bdb-8d57-
[root@RM08212 (us-east-3) ~]# pstack 77347
77347: dladm create-vnic -t -l ixgbe1 -p mtu=1500,zone=a70d4d06-182e-4bdb-8d5
feec02d8 door (7, 80475b8, 0, 0, 0, 3)
feeace72 door_call (7, 80475b8, 50, fed6d22e) + ed
fed6d296 dladm_door_call (8105fd8, 8047638, 28, 8047620, 804761c, 8047838) + 76
fed6dda4 dladm_zname2info (8105fd8, 0, 8047838, 8047858, 0, 0) + 6c
fed6de45 dladm_name2info (8105fd8, 8047838, 8047858, 0, 0, 0) + 2c
0805f69f do_create_vnic (b) + 5b5
08058f2e main (80478ac, fef38728, 80478e0, 8056ac7, c, 80478ec) + b9
08056ac7 _start (c, 8047a38, 8047a3e, 8047a4a, 8047a4d, 8047a50) + 83
[root@RM08212 (us-east-3) ~]# pfiles 77347
77347: dladm create-vnic -t -l ixgbe1 -p mtu=1500,zone=a70d4d06-182e-4bdb-8d5
Current rlimit: 65536 file descriptors
0: S_IFIFO mode:0000 dev:550,0 ino:205840738 uid:0 gid:0 rdev:0,0
O_RDWR
1: S_IFIFO mode:0000 dev:550,0 ino:205871684 uid:0 gid:0 rdev:0,0
O_RDWR
2: S_IFIFO mode:0000 dev:550,0 ino:205871684 uid:0 gid:0 rdev:0,0
O_RDWR
3: S_IFCHR mode:0666 dev:552,0 ino:9437188 uid:0 gid:3 rdev:18,0
O_RDWR|O_LARGEFILE
/devices/pseudo/dld@0:ctl
offset:0
4: S_IFCHR mode:0666 dev:552,0 ino:9437188 uid:0 gid:3 rdev:18,0
O_RDWR
/devices/pseudo/dld@0:ctl
offset:0
5: S_IFDOOR mode:0644 dev:563,0 ino:63 uid:15 gid:65 rdev:562,0
O_RDONLY|O_LARGEFILE door to dlmgmtd[21]
6: S_IFCHR mode:0666 dev:552,0 ino:19922952 uid:0 gid:3 rdev:38,2
O_RDWR|O_LARGEFILE
/devices/pseudo/mm@0:null
offset:0
7: S_IFDOOR mode:0644 dev:563,0 ino:63 uid:15 gid:65 rdev:562,0
O_RDONLY door to dlmgmtd[21]
[root@RM08212 (us-east-3) ~]# mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci ufs ip hook neti sockfs arp usba stmf_sbd stmf zfs mm sd lofs idm mpt_sas sata crypto random cpc logindmux ptm kvm sppp nsmb smbsrv nfs ipc ]
> 0t21::pid2proc | ::walk thread | ::stacks
THREAD STATE SOBJ COUNT
ffffff431db7d3e0 STOPPED <NONE> 1360
swtch+0x141
stop+0x386
issig_forreal+0x3e4
issig+0x25
door_return+0x3ef
doorfs32+0x180
sys_syscall32+0x14a
ffffff42a9e33840 SLEEP CV 219
swtch+0x141
cv_wait_sig_swap_core+0x1b9
cv_wait_sig_swap+0x17
cv_waituntil_sig+0xbd
lwp_park+0x15e
syslwp_park+0x63
sys_syscall32+0x14a
ffffff43607d1c40 SLEEP SHUTTLE 104
swtch+0x141
shuttle_swtch+0x203
door_return+0x214
doorfs32+0x180
sys_syscall32+0x14a
ffffff4531383400 STOPPED <NONE> 2
swtch+0x141
stop+0x386
issig_forreal+0x3e4
issig+0x25
cv_wait_sig_swap_core+0x303
cv_wait_sig_swap+0x17
cv_waituntil_sig+0xbd
lwp_park+0x15e
syslwp_park+0x63
sys_syscall32+0x14a
ffffff46904e8ae0 SLEEP CV 1
swtch+0x141
cv_wait+0x70
i_mac_perim_enter+0x63
mac_perim_enter_by_mh+0x23
mac_perim_enter_by_macname+0x33
i_dls_devnet_setzid+0x6b
dls_devnet_unset+0x24a
dls_devnet_destroy+0x46
vnic_dev_delete+0x96
vnic_ioc_delete+0x28
drv_ioctl+0x1e4
cdev_ioctl+0x39
spec_ioctl+0x60
fop_ioctl+0x55
ioctl+0x9b
sys_syscall32+0x14a
ffffff616a3bf7e0 SLEEP CV 1
swtch+0x141
cv_wait_sig+0x185
lwp_suspend+0xa4
syslwp_suspend+0x48
sys_syscall32+0x14a
ffffff431de30b60 STOPPED <NONE> 1
swtch+0x141
stop+0x386
issig_forreal+0x3e4
issig+0x25
cv_wait_sig_swap_core+0x303
cv_wait_sig_swap+0x17
pause+0x45
sys_syscall32+0x14a
Former user commented on 2016-03-04T18:57:22.000-0500:
This may be an instance of OS-3506. We'll need to dig into the dump to see whether we have a similar deadlock.
Former user commented on 2016-12-08T17:29:28.000-0500:
This looks like OS-3506 to me, but I need the dump to confirm. I checked thoth but came up empty.
Former user commented on 2016-12-09T13:30:00.000-0500:
Thoth dump 5982fc0918e12882 is a crash dump from MS10210 and from within about 15 minutes of me filing this ticket, so I think that's probably it. The panic message is "BAD TRAP", which is probably confusing, but that's because I used "clock/W -1" to panic the system.
Former user commented on 2016-12-09T14:00:13.000-0500:
I've confirmed this is a duplicate of OS-3506.
We have a dlmgmtd thread trying to perform a VNIC destroy. The dls_devnet_destroy() path takes the DLS lock first and then tries to enter the MAC perimeter.
> 0t21::pid2proc | ::print proc_t p_tlist | ::list kthread_t t_forw | ::stacks
...
ffffff44ffd547a0 SLEEP CV 1
swtch+0x141
cv_wait+0x70
i_mac_perim_enter+0x63
mac_perim_enter_by_mh+0x23
mac_perim_enter_by_macname+0x33
i_dls_devnet_setzid+0x6b
dls_devnet_unset+0x24a
dls_devnet_destroy+0x46
vnic_dev_delete+0x96
vnic_ioc_delete+0x28
drv_ioctl+0x1e4
cdev_ioctl+0x39
spec_ioctl+0x60
fop_ioctl+0x55
ioctl+0x9b
sys_syscall32+0x109
> ffffff44ffd547a0::findstack -v
stack pointer for thread ffffff44ffd547a0: ffffff020421e880
[ ffffff020421e880 _resume_from_idle+0xf4() ]
ffffff020421e8b0 swtch+0x141()
ffffff020421e8f0 cv_wait+0x70(ffffff431f1fbc04, ffffff431f1fbbf0)
ffffff020421e930 i_mac_perim_enter+0x63(ffffff443bf780b8)
ffffff020421e960 mac_perim_enter_by_mh+0x23(ffffff443bf780b8, ffffff020421e9e8
)
ffffff020421e9b0 mac_perim_enter_by_macname+0x33(ffffff46bff780d4,
ffffff020421e9e8)
ffffff020421ea50 i_dls_devnet_setzid+0x6b(ffffff46bff780b0, 0, 0, 0)
ffffff020421eab0 dls_devnet_unset+0x24a(ffffff443bf780d0, ffffff020421eb4c, 1
)
ffffff020421eb20 dls_devnet_destroy+0x46(ffffff443bf780b8, ffffff020421eb4c, 1
)
ffffff020421eb90 vnic_dev_delete+0x96(4ff84, 0, ffffff8bb9c66c88)
ffffff020421ebd0 vnic_ioc_delete+0x28(ffffff480c879168, ec557d04, 100003,
ffffff8bb9c66c88, ffffff020421ee58)
ffffff020421ec70 drv_ioctl+0x1e4(1200000000, 1710002, ec557d04, 100003,
ffffff8bb9c66c88, ffffff020421ee58)
ffffff020421ecb0 cdev_ioctl+0x39(1200000000, 1710002, ec557d04, 100003,
ffffff8bb9c66c88, ffffff020421ee58)
ffffff020421ed00 spec_ioctl+0x60(ffffff431ddc2880, 1710002, ec557d04, 100003,
ffffff8bb9c66c88, ffffff020421ee58, 0)
ffffff020421ed90 fop_ioctl+0x55(ffffff431ddc2880, 1710002, ec557d04, 100003,
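The dls_devnet_hold_common() path, however, takes them in the opposite order: the MAC perimeter first, then the DLS lock. The thread that owns the MAC perimeter is blocked on the DLS rwlock, whose owner is the dlmgmtd thread above (ffffff44ffd547a0). Thus we have a deadlock.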
> ffffff443bf780b8::print mac_impl_t mi_driver | ::print vnic_t vn_mc_handles[0] | ::print mac_client_impl_t mci_mip | ::print mac_impl_t mi_perim_owner | ::findstack -v
stack pointer for thread fffffff3cfd23860: ffffff01ff363870
[ ffffff01ff363870 _resume_from_idle+0xf4() ]
ffffff01ff3638a0 swtch+0x141()
ffffff01ff363940 turnstile_block+0x21a(0, 0, fffffffffbd096e0,
fffffffffbc08cc0, 0, 0)
ffffff01ff3639b0 rw_enter_sleep+0x19b(fffffffffbd096e0, 0)
ffffff01ff363a20 dls_devnet_hold_common+0x57(4ffb5, ffffff01ff363a68, 0)
ffffff01ff363a40 dls_devnet_hold+0x17(4ffb5, ffffff01ff363a68)
ffffff01ff363ac0 dls_devnet_setzid+0x8e(ffffff4322abb238, 2042, 1)
ffffff01ff363b90 drv_ioc_prop_common+0x3f2(ffffff43aa9202c0, 807e6f8, 1,
ffffff88788d6258, 100003)
ffffff01ff363bd0 drv_ioc_setprop+0x29(ffffff43aa9202c0, 807e6f8, 100003,
ffffff88788d6258, ffffff01ff363e58)
ffffff01ff363c70 drv_ioctl+0x1e4(1200000000, d1d001b, 807e6f8, 100003,
ffffff88788d6258, ffffff01ff363e58)
ffffff01ff363cb0 cdev_ioctl+0x39(1200000000, d1d001b, 807e6f8, 100003,
ffffff88788d6258, ffffff01ff363e58)
ffffff01ff363d00 spec_ioctl+0x60(ffffff431ddc2880, d1d001b, 807e6f8, 100003,
ffffff88788d6258, ffffff01ff363e58, 0)
ffffff01ff363d90 fop_ioctl+0x55(ffffff431ddc2880, d1d001b, 807e6f8, 100003,
ffffff88788d6258, ffffff01ff363e58, 0)
ffffff01ff363eb0 ioctl+0x9b(4, d1d001b, 807e6f8)
ffffff01ff363f10 _sys_sysenter_post_swapgs+0x153()
> fffffffffbd096e0::rwlock
ADDR OWNER/COUNT FLAGS WAITERS
fffffffffbd096e0 ffffff44ffd547a0 B111 ffffff4ef951d140 (W)
||| ffffff43ae4ce4a0 (W)
WRITE_LOCKED ------+|| fffffff3cfd23860 (W)
WRITE_WANTED -------+| ffffff4e2958d3a0 (W)
HAS_WAITERS --------+ fffffffa5def3740 (W)
ffffff43bafc2140 (W)
ffffff5077d034a0 (W)
ffffff44abeff120 (W)
ffffff4491c9f4e0 (W)
fffffffb90bbf400 (W)
ffffff4dacc62160 (W)
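The lock-order inversion above can be written as a small wait-for graph. The sketch below is illustrative only: the thread and rwlock addresses are copied from the mdb output in this comment, while the resource labels, the dictionaries, and `find_cycle` are hypothetical names. It assumes each thread blocks on at most one resource and each resource has a single owner, which matches a writer-held rwlock and a MAC perimeter.

```python
def find_cycle(waits_for):
    """Follow thread -> thread wait edges; return the threads in a cycle, or None."""
    for start in waits_for:
        path = []
        seen = set()
        node = start
        while node in waits_for and node not in seen:
            seen.add(node)
            path.append(node)
            node = waits_for[node]
        if node in seen:
            # We came back to a thread already on the path: that tail is the cycle.
            return path[path.index(node):]
    return None


# Who owns each resource, per ::rwlock and mi_perim_owner in the dump.
OWNERS = {
    "dls rwlock fffffffffbd096e0": "ffffff44ffd547a0",
    "mac perimeter": "fffffff3cfd23860",
}

# What each thread is blocked on, per the two ::findstack outputs.
BLOCKED_ON = {
    "ffffff44ffd547a0": "mac perimeter",                # vnic delete path
    "fffffff3cfd23860": "dls rwlock fffffffffbd096e0",  # dls_devnet_hold_common
    "ffffff4ef951d140": "dls rwlock fffffffffbd096e0",  # one of the other waiters
}

# Collapse thread -> resource -> owner into thread -> thread edges.
WAITS_FOR = {t: OWNERS[r] for t, r in BLOCKED_ON.items() if r in OWNERS}

if __name__ == "__main__":
    print(find_cycle(WAITS_FOR))
```

The two threads from the stacks above form the cycle. The remaining rwlock waiters are not part of it; they are simply queued behind it.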