OS-4682

dlmgmtd deadlock in vnic_dev_delete

Status: Resolved
Resolution: Duplicate
Created: 2015-08-31T00:05:35.000-0400
Updated: 2016-12-09T14:00:47.000-0500

Description

This looks very similar to OS-3342, and possibly related to OS-4260. However, I observed this on CN MS10210, which was running joyent_20141226T032659Z, which I believe should have the fix for OS-3342.

We found Marlin zone resets hung. The oldest process waiting for a zone to boot was this one:

[root@MS10210 (us-east-1) ~]# pargs 69394
69394:  zoneadm -z eddb8c2a-6a5d-483b-a640-953a1be85089 boot
argv[0]: zoneadm
argv[1]: -z
argv[2]: eddb8c2a-6a5d-483b-a640-953a1be85089
argv[3]: boot

[root@MS10210 (us-east-1) ~]# pfiles 69394
69394:  zoneadm -z eddb8c2a-6a5d-483b-a640-953a1be85089 boot
  Current rlimit: 65536 file descriptors
   0: S_IFSOCK mode:0666 dev:558,0 ino:52794 uid:0 gid:0 rdev:0,0
      O_RDWR
        SOCK_STREAM
        SO_SNDBUF(16384),SO_RCVBUF(5120)
        sockname: AF_UNIX 
        peer: node[38750] zone: global[0]
   1: S_IFSOCK mode:0666 dev:558,0 ino:12377 uid:0 gid:0 rdev:0,0
      O_RDWR
        SOCK_STREAM
        SO_SNDBUF(16384),SO_RCVBUF(5120)
        sockname: AF_UNIX 
        peer: node[38750] zone: global[0]
   2: S_IFSOCK mode:0666 dev:558,0 ino:24346 uid:0 gid:0 rdev:0,0
      O_RDWR
        SOCK_STREAM
        SO_SNDBUF(16384),SO_RCVBUF(5120)
        sockname: AF_UNIX 
        peer: node[38750] zone: global[0]
   3: S_IFDOOR mode:0444 dev:560,0 ino:46 uid:0 gid:0 rdev:559,0
      O_RDONLY|O_LARGEFILE FD_CLOEXEC  door to nscd[5351]
      /var/run/name_service_door
   4: S_IFCHR mode:0666 dev:549,0 ino:47185924 uid:0 gid:3 rdev:90,0
      O_RDWR
      /devices/pseudo/zfs@0:zfs
      offset:0
   5: S_IFREG mode:0444 dev:553,1 ino:2 uid:0 gid:0 rdev:0,0
      O_RDONLY
      /etc/mnttab
      offset:0
   6: S_IFREG mode:0444 dev:557,1 ino:128 uid:0 gid:0 rdev:0,0
      O_RDONLY
      /etc/dfs/sharetab
      offset:0
   7: S_IFCHR mode:0666 dev:549,0 ino:47185924 uid:0 gid:3 rdev:90,0
      O_RDWR
      /devices/pseudo/zfs@0:zfs
      offset:0
   8: S_IFDOOR mode:0600 dev:560,0 ino:650 uid:0 gid:0 rdev:559,0
      O_RDONLY  door to zoneadmd[74318]
      /var/run/zones/eddb8c2a-6a5d-483b-a640-953a1be85089.zoneadmd_door

It's making a door call to its zoneadmd, which is:

[root@MS10210 (us-east-1) ~]# pstack 74318
74318:  zoneadmd -z eddb8c2a-6a5d-483b-a640-953a1be85089
-----------------  lwp# 1 / thread# 1  --------------------
 fffffd7fff2c52c7 lwp_park (0, 0, 0)
 fffffd7fff2bd1d9 mutex_lock_impl (4340a0, 0) + 189
 fffffd7fff2bd2c3 mutex_lock (4340a0) + 13
 000000000041291e serve_console (437d40) + 16e
 000000000041144f main (3, fffffd7fffdffce8) + 95f
 000000000040d44c _start () + 6c
-----------------  lwp# 2 / thread# 2  --------------------
 fffffd7fff2cca61 door     (3, ffbffeff, 1, 1ff, 0, 8)
 fffffd7fff2c4f6a _thrp_setup (fffffd7fff050a40) + 8a
 fffffd7fff2c5280 _lwp_start ()
-----------------  lwp# 3 / thread# 3  --------------------
 fffffd7fff2cc15a read     (6, 4881f4, 1400)
 fffffd7fff2957de _filbuf (435f40) + 6e
 fffffd7fff298130 fgets (fffffd7ffe53ca80, 400, 435f40) + 160
 000000000040ea80 do_subproc (fffffd7ffe53e7d0, fffffd7ffe53cf00, 0, 0) + a0
 000000000040debc brand_poststatechg (fffffd7ffe53e7d0, 2, 0, 0) + ac
 000000000040e037 zone_ready (fffffd7ffe53e7d0, 0, 2, 0) + d7
 00000000004100f4 server (0, fffffd7ffe53e8e0, 518, 0, 0) + 7c4
 fffffd7fff2ccac0 __door_return () + 50
-----------------  lwp# 4 / thread# 4  --------------------
 fffffd7fff2cca8d door     (0, 0, 0, fffffd7ffe15ee00, 1edf00, a)
 fffffd7fff2b087d door_return (0, 0, 0, 0) + cd
 fffffd7fff2b0f3c door_create_func (0) + 2c
 fffffd7fff2c4f6a _thrp_setup (fffffd7fff051a40) + 8a
 fffffd7fff2c5280 _lwp_start ()

which is waiting for a child process to do something. That child process has been stuck for a while:

[root@MS10210 (us-east-1) ~]# ptree 74318
74318 zoneadmd -z eddb8c2a-6a5d-483b-a640-953a1be85089
  85545 /bin/ksh -p /usr/lib/brand/joyent-minimal/poststate eddb8c2a-6a5d-483b-a640-
    85564 /bin/ksh -p /usr/lib/brand/joyent-minimal/statechange post eddb8c2a-6a5d-4
      22352 dladm create-vnic -t -l ixgbe1 -p zone=eddb8c2a-6a5d-483b-a640-953a1be85
[root@MS10210 (us-east-1) ~]# ps -ostime -p 22352
   STIME
12:50:47
[root@MS10210 (us-east-1) ~]# ps -oetime -p 22352
    ELAPSED
   14:54:23

That process is making a door call to dlmgmtd:

[root@MS10210 (us-east-1) ~]# pargs 22352
22352:  dladm create-vnic -t -l ixgbe1 -p zone=eddb8c2a-6a5d-483b-a640-953a1be85089 -m 
argv[0]: dladm
argv[1]: create-vnic
argv[2]: -t
argv[3]: -l
argv[4]: ixgbe1
argv[5]: -p
argv[6]: zone=eddb8c2a-6a5d-483b-a640-953a1be85089
argv[7]: -m
argv[8]: 90:b8:d0:23:dd:1a
argv[9]: -v
argv[10]: 1355
argv[11]: tmp855640
[root@MS10210 (us-east-1) ~]# pstack 22352
22352:  dladm create-vnic -t -l ixgbe1 -p zone=eddb8c2a-6a5d-483b-a640-953a1be
 feefba48 door     (7, 8047608, 0, 0, 0, 3)
 feee87d2 door_call (7, 8047608, 28, fed8c94e) + ed
 fed8c9b6 dladm_door_call (807c1b0, 8047688, 28, 8047670, 804766c, 8047888) + 76
 fed8d4c0 dladm_zname2info (807c1b0, 0, 8047888, 80478a8, 0, 0) + 6c
 fed8d561 dladm_name2info (807c1b0, 8047888, 80478a8, 0, 0, 0) + 2c
 0805d9c8 do_create_vnic (b) + 5b5
 08058aff main     (80478fc, fef73688, 8047938, 8056177, c, 8047944) + ac
 08056177 _start   (c, 8047a88, 8047a8e, 8047a9a, 8047a9d, 8047aa0) + 83
[root@MS10210 (us-east-1) ~]# pfiles 22352
22352:  dladm create-vnic -t -l ixgbe1 -p zone=eddb8c2a-6a5d-483b-a640-953a1be
  Current rlimit: 65536 file descriptors
   0: S_IFIFO mode:0000 dev:547,0 ino:536229222 uid:0 gid:0 rdev:0,0
      O_RDWR
   1: S_IFIFO mode:0000 dev:547,0 ino:536498353 uid:0 gid:0 rdev:0,0
      O_RDWR
   2: S_IFIFO mode:0000 dev:547,0 ino:536498353 uid:0 gid:0 rdev:0,0
      O_RDWR
   3: S_IFCHR mode:0666 dev:549,0 ino:9437188 uid:0 gid:3 rdev:18,0
      O_RDWR|O_LARGEFILE
      /devices/pseudo/dld@0:ctl
      offset:0
   4: S_IFCHR mode:0666 dev:549,0 ino:9437188 uid:0 gid:3 rdev:18,0
      O_RDWR
      /devices/pseudo/dld@0:ctl
      offset:0
   5: S_IFDOOR mode:0644 dev:560,0 ino:63 uid:15 gid:65 rdev:559,0
      O_RDONLY|O_LARGEFILE  door to dlmgmtd[21]
      /etc/svc/volatile/dladm/dlmgmt_door
   6: S_IFCHR mode:0666 dev:549,0 ino:19922952 uid:0 gid:3 rdev:38,2
      O_RDWR|O_LARGEFILE
      /devices/pseudo/mm@0:null
      offset:0
   7: S_IFDOOR mode:0644 dev:560,0 ino:63 uid:15 gid:65 rdev:559,0
      O_RDONLY  door to dlmgmtd[21]
      /etc/svc/volatile/dladm/dlmgmt_door

whose threads are here:

[root@MS10210 (us-east-1) ~]# mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci ufs ip hook neti sockfs arp usba stmf_sbd stmf zfs sd lofs idm mpt_sas crypto random cpc logindmux ptm kvm sppp nsmb smbsrv nfs ipc ]
> 0t21::pid2proc | ::walk thread | ::stacks
THREAD           STATE    SOBJ                COUNT
ffffff431dbf8120 SLEEP    SHUTTLE              2732
                 swtch+0x141
                 shuttle_swtch+0x203
                 door_return+0x214
                 doorfs32+0x180
                 sys_syscall32+0x109

ffffff431dba2840 STOPPED  <NONE>               1029
                 swtch+0x141
                 stop+0x3f6
                 issig_forreal+0x36c
                 issig+0x25
                 door_return+0x3ef
                 doorfs32+0x180
                 sys_syscall32+0x109

ffffff43433013c0 SLEEP    CV                    178
                 swtch+0x141
                 cv_wait_sig_swap_core+0x1b9
                 cv_wait_sig_swap+0x17
                 cv_waituntil_sig+0xbd
                 lwp_park+0x15e
                 syslwp_park+0x63
                 sys_syscall32+0x109

ffffff43bbeba520 STOPPED  <NONE>                  9
                 swtch+0x141
                 stop+0x3f6
                 issig_forreal+0x36c
                 issig+0x25
                 cv_wait_sig_swap_core+0x303
                 cv_wait_sig_swap+0x17
                 cv_waituntil_sig+0xbd
                 lwp_park+0x15e
                 syslwp_park+0x63
                 sys_syscall32+0x109

ffffff44ffd547a0 SLEEP    CV                      1
                 swtch+0x141
                 cv_wait+0x70
                 i_mac_perim_enter+0x63
                 mac_perim_enter_by_mh+0x23
                 mac_perim_enter_by_macname+0x33
                 i_dls_devnet_setzid+0x6b
                 dls_devnet_unset+0x24a
                 dls_devnet_destroy+0x46
                 vnic_dev_delete+0x96
                 vnic_ioc_delete+0x28
                 drv_ioctl+0x1e4
                 cdev_ioctl+0x39
                 spec_ioctl+0x60
                 fop_ioctl+0x55
                 ioctl+0x9b
                 sys_syscall32+0x109

ffffff43f0430840 SLEEP    CV                      1
                 swtch+0x141
                 cv_wait_sig+0x185    
                 lwp_suspend+0xa4
                 syslwp_suspend+0x48
                 sys_syscall32+0x109

ffffff431de61b60 STOPPED  <NONE>                  1
                 swtch+0x141
                 stop+0x3f6
                 issig_forreal+0x36c
                 issig+0x25
                 cv_wait_sig_swap_core+0x303
                 cv_wait_sig_swap+0x17
                 pause+0x45
                 sys_syscall32+0x109

ffffff474ad38820 STOPPED  <NONE>                  1
                 swtch+0x141
                 stop+0x3f6
                 issig_forreal+0x36c
                 issig+0x25
                 door_call+0x4d1
                 doorfs32+0xa7
                 sys_syscall32+0x109

This really looks a lot like OS-3342. I'm afraid we're going to wish we had the userland state of dlmgmtd, but I wasn't sure how to get it. (Last time we ran into this, most of the ptools hung on dlmgmtd, presumably because of the stuck thread.)

Comments (5)

Former user commented on 2016-03-04T18:34:07.000-0500:

We saw this today on CN RM08212 in Manta. Customers reported jobs hung. The Marlin dashboard showed that this CN was resetting all zones and had 20x as much work queued on it as any of the others, which is consistent with stuck zone resets. I've found:

[root@RM08212 (us-east-3) ~]# svcs -p marlin-agent
STATE          STIME    FMRI
online         Jan_22   svc:/smartdc/agent/marlin-agent:default
               17:01:11     3705 zoneadm
               17:01:31     5358 zoneadm
               17:02:10     6618 zoneadm
               17:02:34     6949 zoneadm
               17:03:26     7735 zoneadm
               17:04:35     8858 zoneadm
               17:05:57    12492 zoneadm
               17:06:00    12723 zoneadm
               17:07:21    16987 zoneadm
               17:07:27    17078 zoneadm
               17:07:27    17079 zoneadm
               17:08:38    17977 zoneadm
               17:08:40    18056 zoneadm
               17:11:11    23413 zoneadm
               17:11:14    23655 zoneadm
               17:12:05    26361 zoneadm
               17:12:18    26559 zoneadm
               17:12:50    27193 zoneadm
               17:13:01    27413 zoneadm
               17:14:00    28247 zoneadm
               17:15:04    29092 zoneadm
               Jan_22      38575 node
               16:40:26    42377 zoneadm
               16:40:28    42541 zoneadm
               16:40:29    42710 zoneadm
               16:40:31    42844 zoneadm
               16:40:32    43009 zoneadm
               16:40:32    43043 zoneadm
               16:40:34    43138 zoneadm
               16:40:35    43191 zoneadm
               16:40:37    43452 zoneadm
               16:40:42    43808 zoneadm
               16:40:42    43812 zoneadm
               16:40:43    43846 zoneadm
               16:40:44    44068 zoneadm
               16:40:44    44099 zoneadm
               16:40:44    44132 zoneadm
               16:40:44    44138 zoneadm
               16:40:47    44530 zoneadm
               16:40:48    44536 zoneadm
               16:40:48    44630 zoneadm
               16:40:48    44654 zoneadm
               16:40:49    44711 zoneadm
               16:40:49    44765 zoneadm
               16:40:49    44804 zoneadm
               16:40:50    44850 zoneadm
               16:40:50    44871 zoneadm
               16:40:50    44877 zoneadm
               16:40:50    44927 zoneadm
               16:40:51    44966 zoneadm
               16:40:51    44992 zoneadm
               16:40:53    45546 zoneadm
               16:40:54    45869 zoneadm
               16:40:54    45924 zoneadm
               16:40:54    45946 zoneadm
               17:25:19    48303 zoneadm
               Jan_22      48446 node
               16:41:19    50675 zoneadm
               16:41:25    51288 zoneadm
               16:41:29    51739 zoneadm
               16:42:55    59932 zoneadm
               16:43:08    61323 zoneadm
               16:43:14    61887 zoneadm
               16:43:20    62472 zoneadm
               17:32:24    64704 zoneadm
               17:32:24    64705 zoneadm
               16:43:45    64902 zoneadm
               16:43:48    65168 zoneadm
               17:33:37    65704 zoneadm
               16:44:00    66141 zoneadm
               16:44:00    66146 zoneadm
               16:44:02    66397 zoneadm
               16:44:17    66854 zoneadm
               16:44:57    67631 zoneadm
               16:45:09    68087 zoneadm
               16:45:21    68698 zoneadm
               17:35:34    69076 zoneadm
               16:45:37    70037 zoneadm
               18:26:48    73760 zoneadm
               18:26:54    73850 zoneadm
               18:26:57    73943 zoneadm
               16:47:58    76397 zoneadm
               16:48:35    77021 zoneadm
               16:48:46    77204 zoneadm
               16:49:52    78139 zoneadm
               16:53:18    86707 zoneadm
               16:54:32    87696 zoneadm
               16:56:32    93886 zoneadm
               16:56:36    94190 zoneadm
               16:56:48    94996 zoneadm
               16:57:31    95750 zoneadm
[root@RM08212 (us-east-3) ~]# uname -v
joyent_20160121T174713Z
[root@RM08212 (us-east-3) ~]# pargs 42377
42377:  zoneadm -z a70d4d06-182e-4bdb-8d57-66f04a6dd290 boot
argv[0]: zoneadm
argv[1]: -z
argv[2]: a70d4d06-182e-4bdb-8d57-66f04a6dd290
argv[3]: boot
[root@RM08212 (us-east-3) ~]# pstack 42377
42377:  zoneadm -z a70d4d06-182e-4bdb-8d57-66f04a6dd290 boot
 fef002d8 door     (8, 8047318, 0, 0, 0, 3)
 feeece72 door_call (8, 8047318, 400, fe9ed231) + ed
 fe9ed3d1 zonecfg_call_zoneadmd (8047e93, 8047778, fee01a20, 1) + 1d1
 0805cd35 boot_func (0, 8047ddc, 13d8, 8047dcc) + 22f
 0805637e parse_and_run (1, 8047dd8, 100, 8047dcc) + 3c
 080593f9 main     (8047d8c, fef78728, 8047dc0, 8055b8b, 4, 8047dcc) + 3c1
 08055b8b _start   (4, 8047e88, 8047e90, 8047e93, 8047eb8, 0) + 83
[root@RM08212 (us-east-3) ~]# pfiles 42377
42377:  zoneadm -z a70d4d06-182e-4bdb-8d57-66f04a6dd290 boot
  Current rlimit: 65536 file descriptors
   0: S_IFSOCK mode:0666 dev:561,0 ino:50828 uid:0 gid:0 rdev:0,0
      O_RDWR
        SOCK_STREAM
        SO_SNDBUF(16384),SO_RCVBUF(5120)
        sockname: AF_UNIX 
        peer: node[48446] zone: global[0]
   1: S_IFSOCK mode:0666 dev:561,0 ino:52541 uid:0 gid:0 rdev:0,0
      O_RDWR
        SOCK_STREAM
        SO_SNDBUF(16384),SO_RCVBUF(5120)
        sockname: AF_UNIX 
        peer: node[48446] zone: global[0]
   2: S_IFSOCK mode:0666 dev:561,0 ino:58266 uid:0 gid:0 rdev:0,0
      O_RDWR
        SOCK_STREAM
        SO_SNDBUF(16384),SO_RCVBUF(5120)
        sockname: AF_UNIX 
        peer: node[48446] zone: global[0]
   3: S_IFDOOR mode:0444 dev:563,0 ino:1006 uid:0 gid:0 rdev:562,0
      O_RDONLY|O_LARGEFILE FD_CLOEXEC  door to nscd[45478]
   4: S_IFCHR mode:0666 dev:552,0 ino:47185924 uid:0 gid:3 rdev:90,0
      O_RDWR
      /devices/pseudo/zfs@0:zfs
      offset:0
   5: S_IFREG mode:0444 dev:556,1 ino:2 uid:0 gid:0 rdev:0,0
      O_RDONLY
      /etc/mnttab
      offset:0
   6: S_IFREG mode:0444 dev:560,1 ino:128 uid:0 gid:0 rdev:0,0
      O_RDONLY
      /etc/dfs/sharetab
      offset:0
   7: S_IFCHR mode:0666 dev:552,0 ino:47185924 uid:0 gid:3 rdev:90,0
      O_RDWR
      /devices/pseudo/zfs@0:zfs
      offset:0
   8: S_IFDOOR mode:0600 dev:563,0 ino:840 uid:0 gid:0 rdev:562,0
      O_RDONLY  door to zoneadmd[44050]
[root@RM08212 (us-east-3) ~]# pstack 44050
44050:  zoneadmd -z a70d4d06-182e-4bdb-8d57-66f04a6dd290
-----------------  lwp# 1 / thread# 1  --------------------
 fffffd7fff2b80d7 lwp_park (0, 0, 0)
 fffffd7fff2afec9 mutex_lock_impl (4350c0, 0) + 189
 fffffd7fff2affb3 mutex_lock (4350c0) + 13
 0000000000412d5e serve_console (438d40) + 16e
 00000000004117ff main (3, fffffd7fffdffce8) + 95f
 000000000040d71c _start () + 6c
-----------------  lwp# 2 / thread# 2  --------------------
 fffffd7fff2bf8e1 door     (3, ffbffeff, 1, 3ff, 0, 8)
 fffffd7fff2b7d7a _thrp_setup (fffffd7fff030a40) + 8a
 fffffd7fff2b8090 _lwp_start ()
-----------------  lwp# 3 / thread# 3  --------------------
 fffffd7fff2befda read     (6, 4fd744, 1400)
 fffffd7fff287f2e _filbuf (436f40) + 6e
 fffffd7fff28a880 fgets (fffffd7ffe33ca80, 400, 436f40) + 160
 000000000040ed50 do_subproc (fffffd7ffe33e7d0, fffffd7ffe33cf00, 0, 0) + a0
 000000000040e18c brand_poststatechg (fffffd7ffe33e7d0, 2, 0, 0) + ac
 000000000040e307 zone_ready (fffffd7ffe33e7d0, 0, 2, 0) + d7
 00000000004104a4 server (0, fffffd7ffe33e8e0, 518, 0, 0) + 7c4
 fffffd7fff2bf940 __door_return () + 50
-----------------  lwp# 4 / thread# 4  --------------------
 fffffd7fff2bf90d door     (0, 0, 0, fffffd7ffe13fe00, 1edf00, a)
 fffffd7fff2a331d door_return (0, 0, 0, 0) + cd
 fffffd7fff2a39dc door_create_func (0) + 2c
 fffffd7fff2b7d7a _thrp_setup (fffffd7fff031a40) + 8a
 fffffd7fff2b8090 _lwp_start ()
[root@RM08212 (us-east-3) ~]# ptree 44050
44050 zoneadmd -z a70d4d06-182e-4bdb-8d57-66f04a6dd290
  45333 /bin/ksh -p /usr/lib/brand/joyent-minimal/poststate a70d4d06-182e-4bdb-8d57-
    45371 /bin/ksh -p /usr/lib/brand/joyent-minimal/statechange post a70d4d06-182e-4
      77347 dladm create-vnic -t -l ixgbe1 -p mtu=1500,zone=a70d4d06-182e-4bdb-8d57-
[root@RM08212 (us-east-3) ~]# pstack 77347
77347:  dladm create-vnic -t -l ixgbe1 -p mtu=1500,zone=a70d4d06-182e-4bdb-8d5
 feec02d8 door     (7, 80475b8, 0, 0, 0, 3)
 feeace72 door_call (7, 80475b8, 50, fed6d22e) + ed
 fed6d296 dladm_door_call (8105fd8, 8047638, 28, 8047620, 804761c, 8047838) + 76
 fed6dda4 dladm_zname2info (8105fd8, 0, 8047838, 8047858, 0, 0) + 6c
 fed6de45 dladm_name2info (8105fd8, 8047838, 8047858, 0, 0, 0) + 2c
 0805f69f do_create_vnic (b) + 5b5
 08058f2e main     (80478ac, fef38728, 80478e0, 8056ac7, c, 80478ec) + b9
 08056ac7 _start   (c, 8047a38, 8047a3e, 8047a4a, 8047a4d, 8047a50) + 83
[root@RM08212 (us-east-3) ~]# pfiles 77347
77347:  dladm create-vnic -t -l ixgbe1 -p mtu=1500,zone=a70d4d06-182e-4bdb-8d5
  Current rlimit: 65536 file descriptors
   0: S_IFIFO mode:0000 dev:550,0 ino:205840738 uid:0 gid:0 rdev:0,0
      O_RDWR
   1: S_IFIFO mode:0000 dev:550,0 ino:205871684 uid:0 gid:0 rdev:0,0
      O_RDWR
   2: S_IFIFO mode:0000 dev:550,0 ino:205871684 uid:0 gid:0 rdev:0,0
      O_RDWR
   3: S_IFCHR mode:0666 dev:552,0 ino:9437188 uid:0 gid:3 rdev:18,0
      O_RDWR|O_LARGEFILE
      /devices/pseudo/dld@0:ctl
      offset:0
   4: S_IFCHR mode:0666 dev:552,0 ino:9437188 uid:0 gid:3 rdev:18,0
      O_RDWR
      /devices/pseudo/dld@0:ctl
      offset:0
   5: S_IFDOOR mode:0644 dev:563,0 ino:63 uid:15 gid:65 rdev:562,0
      O_RDONLY|O_LARGEFILE  door to dlmgmtd[21]
   6: S_IFCHR mode:0666 dev:552,0 ino:19922952 uid:0 gid:3 rdev:38,2
      O_RDWR|O_LARGEFILE
      /devices/pseudo/mm@0:null
      offset:0
   7: S_IFDOOR mode:0644 dev:563,0 ino:63 uid:15 gid:65 rdev:562,0
      O_RDONLY  door to dlmgmtd[21]
[root@RM08212 (us-east-3) ~]# mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci ufs ip hook neti sockfs arp usba stmf_sbd stmf zfs mm sd lofs idm mpt_sas sata crypto random cpc logindmux ptm kvm sppp nsmb smbsrv nfs ipc ]
> 0t21::pid2proc | ::walk thread | ::stacks
THREAD           STATE    SOBJ                COUNT
ffffff431db7d3e0 STOPPED  <NONE>               1360
                 swtch+0x141
                 stop+0x386
                 issig_forreal+0x3e4
                 issig+0x25
                 door_return+0x3ef
                 doorfs32+0x180
                 sys_syscall32+0x14a

ffffff42a9e33840 SLEEP    CV                    219
                 swtch+0x141
                 cv_wait_sig_swap_core+0x1b9
                 cv_wait_sig_swap+0x17
                 cv_waituntil_sig+0xbd
                 lwp_park+0x15e
                 syslwp_park+0x63
                 sys_syscall32+0x14a

ffffff43607d1c40 SLEEP    SHUTTLE               104
                 swtch+0x141
                 shuttle_swtch+0x203
                 door_return+0x214
                 doorfs32+0x180
                 sys_syscall32+0x14a

ffffff4531383400 STOPPED  <NONE>                  2
                 swtch+0x141
                 stop+0x386
                 issig_forreal+0x3e4
                 issig+0x25
                 cv_wait_sig_swap_core+0x303
                 cv_wait_sig_swap+0x17
                 cv_waituntil_sig+0xbd
                 lwp_park+0x15e
                 syslwp_park+0x63
                 sys_syscall32+0x14a

ffffff46904e8ae0 SLEEP    CV                      1
                 swtch+0x141
                 cv_wait+0x70
                 i_mac_perim_enter+0x63
                 mac_perim_enter_by_mh+0x23
                 mac_perim_enter_by_macname+0x33
                 i_dls_devnet_setzid+0x6b
                 dls_devnet_unset+0x24a
                 dls_devnet_destroy+0x46
                 vnic_dev_delete+0x96
                 vnic_ioc_delete+0x28
                 drv_ioctl+0x1e4
                 cdev_ioctl+0x39
                 spec_ioctl+0x60
                 fop_ioctl+0x55
                 ioctl+0x9b
                 sys_syscall32+0x14a

ffffff616a3bf7e0 SLEEP    CV                      1
                 swtch+0x141
                 cv_wait_sig+0x185
                 lwp_suspend+0xa4
                 syslwp_suspend+0x48
                 sys_syscall32+0x14a

ffffff431de30b60 STOPPED  <NONE>                  1
                 swtch+0x141
                 stop+0x386
                 issig_forreal+0x3e4
                 issig+0x25
                 cv_wait_sig_swap_core+0x303
                 cv_wait_sig_swap+0x17
                 pause+0x45
                 sys_syscall32+0x14a

Former user commented on 2016-03-04T18:57:22.000-0500:

This may be an instance of OS-3506. We'll need to dig into the dump to see whether we have a similar deadlock.

Former user commented on 2016-12-08T17:29:28.000-0500:

This looks like OS-3506 to me, but I need the dump to confirm. I checked thoth but came up empty.

Former user commented on 2016-12-09T13:30:00.000-0500:

Thoth dump 5982fc0918e12882 is a crash dump from MS10210 and from within about 15 minutes of me filing this ticket, so I think that's probably it. The panic message is "BAD TRAP", which is probably confusing, but that's because I used "clock/W -1" to panic the system.

Former user commented on 2016-12-09T14:00:13.000-0500:

I've confirmed this is a duplicate of OS-3506.

We have a dlmgmtd thread trying to perform a VNIC destroy. The dls_devnet_destroy() path takes the DLS lock first and then tries to enter the MAC perimeter, which is where this thread is blocked.

> 0t21::pid2proc | ::print proc_t p_tlist | ::list kthread_t t_forw | ::stacks
...
ffffff44ffd547a0 SLEEP    CV                      1
                 swtch+0x141
                 cv_wait+0x70
                 i_mac_perim_enter+0x63
                 mac_perim_enter_by_mh+0x23
                 mac_perim_enter_by_macname+0x33
                 i_dls_devnet_setzid+0x6b
                 dls_devnet_unset+0x24a
                 dls_devnet_destroy+0x46
                 vnic_dev_delete+0x96
                 vnic_ioc_delete+0x28
                 drv_ioctl+0x1e4
                 cdev_ioctl+0x39
                 spec_ioctl+0x60
                 fop_ioctl+0x55
                 ioctl+0x9b
                 sys_syscall32+0x109


> ffffff44ffd547a0::findstack -v
stack pointer for thread ffffff44ffd547a0: ffffff020421e880
[ ffffff020421e880 _resume_from_idle+0xf4() ]
  ffffff020421e8b0 swtch+0x141()
  ffffff020421e8f0 cv_wait+0x70(ffffff431f1fbc04, ffffff431f1fbbf0)
  ffffff020421e930 i_mac_perim_enter+0x63(ffffff443bf780b8)
  ffffff020421e960 mac_perim_enter_by_mh+0x23(ffffff443bf780b8, ffffff020421e9e8
  )
  ffffff020421e9b0 mac_perim_enter_by_macname+0x33(ffffff46bff780d4,
  ffffff020421e9e8)
  ffffff020421ea50 i_dls_devnet_setzid+0x6b(ffffff46bff780b0, 0, 0, 0)
  ffffff020421eab0 dls_devnet_unset+0x24a(ffffff443bf780d0, ffffff020421eb4c, 1
  )
  ffffff020421eb20 dls_devnet_destroy+0x46(ffffff443bf780b8, ffffff020421eb4c, 1
  )
  ffffff020421eb90 vnic_dev_delete+0x96(4ff84, 0, ffffff8bb9c66c88)
  ffffff020421ebd0 vnic_ioc_delete+0x28(ffffff480c879168, ec557d04, 100003,
  ffffff8bb9c66c88, ffffff020421ee58)
  ffffff020421ec70 drv_ioctl+0x1e4(1200000000, 1710002, ec557d04, 100003,
  ffffff8bb9c66c88, ffffff020421ee58)
  ffffff020421ecb0 cdev_ioctl+0x39(1200000000, 1710002, ec557d04, 100003,
  ffffff8bb9c66c88, ffffff020421ee58)
  ffffff020421ed00 spec_ioctl+0x60(ffffff431ddc2880, 1710002, ec557d04, 100003,
  ffffff8bb9c66c88, ffffff020421ee58, 0)
  ffffff020421ed90 fop_ioctl+0x55(ffffff431ddc2880, 1710002, ec557d04, 100003,

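The dls_devnet_setzid() path, however, acquires them in the opposite order: MAC, then DLS. The thread that owns the MAC perimeter is blocked in dls_devnet_hold_common() waiting for the DLS lock, which the VNIC-destroy thread above holds as writer (see the ::rwlock output below). Thus we have a deadlock.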

> ffffff443bf780b8::print mac_impl_t mi_driver | ::print vnic_t vn_mc_handles[0] | ::print mac_client_impl_t mci_mip | ::print mac_impl_t mi_perim_owner | ::findstack -v
stack pointer for thread fffffff3cfd23860: ffffff01ff363870
[ ffffff01ff363870 _resume_from_idle+0xf4() ]
  ffffff01ff3638a0 swtch+0x141()
  ffffff01ff363940 turnstile_block+0x21a(0, 0, fffffffffbd096e0,
  fffffffffbc08cc0, 0, 0)
  ffffff01ff3639b0 rw_enter_sleep+0x19b(fffffffffbd096e0, 0)
  ffffff01ff363a20 dls_devnet_hold_common+0x57(4ffb5, ffffff01ff363a68, 0)
  ffffff01ff363a40 dls_devnet_hold+0x17(4ffb5, ffffff01ff363a68)
  ffffff01ff363ac0 dls_devnet_setzid+0x8e(ffffff4322abb238, 2042, 1)
  ffffff01ff363b90 drv_ioc_prop_common+0x3f2(ffffff43aa9202c0, 807e6f8, 1,
  ffffff88788d6258, 100003)
  ffffff01ff363bd0 drv_ioc_setprop+0x29(ffffff43aa9202c0, 807e6f8, 100003,
  ffffff88788d6258, ffffff01ff363e58)
  ffffff01ff363c70 drv_ioctl+0x1e4(1200000000, d1d001b, 807e6f8, 100003,
  ffffff88788d6258, ffffff01ff363e58)
  ffffff01ff363cb0 cdev_ioctl+0x39(1200000000, d1d001b, 807e6f8, 100003,
  ffffff88788d6258, ffffff01ff363e58)
  ffffff01ff363d00 spec_ioctl+0x60(ffffff431ddc2880, d1d001b, 807e6f8, 100003,
  ffffff88788d6258, ffffff01ff363e58, 0)
  ffffff01ff363d90 fop_ioctl+0x55(ffffff431ddc2880, d1d001b, 807e6f8, 100003,
  ffffff88788d6258, ffffff01ff363e58, 0)
  ffffff01ff363eb0 ioctl+0x9b(4, d1d001b, 807e6f8)
  ffffff01ff363f10 _sys_sysenter_post_swapgs+0x153()
> fffffffffbd096e0::rwlock

            ADDR      OWNER/COUNT FLAGS          WAITERS
fffffffffbd096e0 ffffff44ffd547a0  B111 ffffff4ef951d140 (W)
                                    ||| ffffff43ae4ce4a0 (W)
                 WRITE_LOCKED ------+|| fffffff3cfd23860 (W)
                 WRITE_WANTED -------+| ffffff4e2958d3a0 (W)
                  HAS_WAITERS --------+ fffffffa5def3740 (W)
                                        ffffff43bafc2140 (W)
                                        ffffff5077d034a0 (W)
                                        ffffff44abeff120 (W)
                                        ffffff4491c9f4e0 (W)
                                        fffffffb90bbf400 (W)
                                        ffffff4dacc62160 (W)
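
To make the cycle explicit, here is a minimal sketch in Python that models the two locks as a wait-for graph and walks it. The thread and lock addresses are copied from the mdb output above. The graph model, the resource labels, and the find_cycle() helper are illustrative assumptions and are not part of the original analysis or of any illumos tooling. Only one of the other rwlock waiters is included, to show a thread that is stuck behind the cycle without being part of it.

```python
from typing import Dict, List, Optional

# Labels are illustrative; the addresses come from the dump above.
DLS_LOCK = "DLS rwlock fffffffffbd096e0"
MAC_PERIM = "MAC perimeter reached via mac_impl_t ffffff443bf780b8"

# resource -> thread that currently owns it
OWNERS: Dict[str, str] = {
    DLS_LOCK: "ffffff44ffd547a0",   # ::rwlock shows this thread as writer
    MAC_PERIM: "fffffff3cfd23860",  # mi_perim_owner from the ::print pipeline
}

# thread -> resource it is blocked on
WAITING: Dict[str, str] = {
    "ffffff44ffd547a0": MAC_PERIM,  # vnic_dev_delete -> i_mac_perim_enter
    "fffffff3cfd23860": DLS_LOCK,   # dls_devnet_hold_common -> rw_enter_sleep
    "ffffff4ef951d140": DLS_LOCK,   # one of the other (W) waiters on the rwlock
}


def find_cycle(start: str,
               owners: Dict[str, str],
               waiting: Dict[str, str]) -> Optional[List[str]]:
    """Follow waits-on -> owned-by edges from `start`.

    Returns the threads that form a cycle, or None if the chain ends at a
    thread that is not blocked on a known resource.
    """
    path: List[str] = []
    index_of: Dict[str, int] = {}
    thread: Optional[str] = start
    while thread is not None and thread not in index_of:
        index_of[thread] = len(path)
        path.append(thread)
        resource = waiting.get(thread)
        thread = owners.get(resource) if resource is not None else None
    if thread is None:
        return None
    # Drop any lead-in threads that only wait on the cycle.
    return path[index_of[thread]:]


if __name__ == "__main__":
    for t in ("ffffff44ffd547a0", "ffffff4ef951d140"):
        print(t, "->", find_cycle(t, OWNERS, WAITING))
```

Starting from either the VNIC-destroy thread or an unrelated waiter, the walk ends in the same two-thread cycle, which matches the picture above: every other waiter on the DLS rwlock is blocked behind these two threads.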