OS-5363: dlmgmtd forking with write locks results in deadlock chaos

Details

Issue Type: Bug
Priority: 3 - Elevated
Status: Resolved
Created at: 2016-04-25T21:43:47.000Z
Updated at: 2017-02-03T05:48:23.000Z

People

Created by: Robert Mustacchi [X]
Reported by: Robert Mustacchi [X]
Assigned to: Ryan Zezeski [X]

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2017-01-30T20:38:16.000Z)

Fix Versions

2017-02-02 ICELAND ESCALATION (Release Date: 2017-02-02)

Related Issues

Description

Looking at RM08211:

We have a bunch of dladm processes which are in a bad state:

[root@RM08211 (us-east-2) ~]# pgrep dladm | wc -l
      50

Based on this let's see what's going on with dlmgmtd:

THREAD           STATE    SOBJ                COUNT
ffffff431dc74c60 SLEEP    SHUTTLE              1553
                 swtch+0x141
                 shuttle_swtch+0x203
                 door_return+0x214
                 doorfs32+0x180
                 sys_syscall32+0x14a

ffffff431dc7f8a0 STOPPED  <NONE>                474
                 swtch+0x141
                 stop+0x386
                 issig_forreal+0x3e4
                 issig+0x25
                 door_return+0x3ef
                 doorfs32+0x180
                 sys_syscall32+0x14a

ffffff4332614140 SLEEP    CV                    130
                 swtch+0x141
                 cv_wait_sig_swap_core+0x1b9
                 cv_wait_sig_swap+0x17
                 cv_waituntil_sig+0xbd
                 lwp_park+0x15e
                 syslwp_park+0x63
                 sys_syscall32+0x14a

ffffff43337024c0 STOPPED  <NONE>                 22
                 swtch+0x141
                 stop+0x386
                 issig_forreal+0x3e4
                 issig+0x25
                 cv_wait_sig_swap_core+0x303
                 cv_wait_sig_swap+0x17
                 cv_waituntil_sig+0xbd
                 lwp_park+0x15e
                 syslwp_park+0x63
                 sys_syscall32+0x14a

ffffff4333702c00 SLEEP    RWLOCK                  2
                 swtch+0x141
                 turnstile_block+0x21a
                 rw_enter_sleep+0x19b
                 vnic_dev_delete+0x43
                 vnic_ioc_delete+0x28
                 drv_ioctl+0x1e4
                 cdev_ioctl+0x39
                 spec_ioctl+0x60
                 fop_ioctl+0x55
                 ioctl+0x9b
                 sys_syscall32+0x14a

ffffff44d103e3a0 SLEEP    CV                      1
                 swtch+0x141
                 cv_timedwait_sig_hires+0x39d
                 cv_waituntil_sig+0xfa
                 nanosleep+0x19f
                 sys_syscall32+0x14a

ffffff4e433544c0 SLEEP    CV                      1
                 swtch+0x141
                 cv_wait_sig+0x185
                 lwp_suspend+0xa4
                 syslwp_suspend+0x48
                 sys_syscall32+0x14a  

ffffff431de23060 SLEEP    CV                      1
                 swtch+0x141
                 cv_wait_sig_swap_core+0x1b9
                 cv_wait_sig_swap+0x17
                 pause+0x45
                 sys_syscall32+0x14a

Let's look at the thread that's blocked on an rwlock (always a bad sign)
and see who holds that lock:

> ffffff4333702c00::findstack -v
stack pointer for thread ffffff4333702c00: ffffff01fc0bf9e0
[ ffffff01fc0bf9e0 _resume_from_idle+0x112() ]
  ffffff01fc0bfa10 swtch+0x141()
  ffffff01fc0bfab0 turnstile_block+0x21a(ffffff45f2578520, 0, ffffffffc011b270, fffffffffbc08ce0, 0, 0)
  ffffff01fc0bfb20 rw_enter_sleep+0x19b(ffffffffc011b270, 0)
  ffffff01fc0bfb90 vnic_dev_delete+0x43(24483, 0, ffffffefb9d39a28)
  ffffff01fc0bfbd0 vnic_ioc_delete+0x28(ffffff4562ecaec8, fd6cfd04, 100003, ffffffefb9d39a28, ffffff01fc0bfe58)
  ffffff01fc0bfc70 drv_ioctl+0x1e4(1200000000, 1710002, fd6cfd04, 100003, ffffffefb9d39a28, ffffff01fc0bfe58)
  ffffff01fc0bfcb0 cdev_ioctl+0x39(1200000000, 1710002, fd6cfd04, 100003, ffffffefb9d39a28, ffffff01fc0bfe58)
  ffffff01fc0bfd00 spec_ioctl+0x60(ffffff431dd93980, 1710002, fd6cfd04, 100003, ffffffefb9d39a28, ffffff01fc0bfe58, 0)
  ffffff01fc0bfd90 fop_ioctl+0x55(ffffff431dd93980, 1710002, fd6cfd04, 100003, ffffffefb9d39a28, ffffff01fc0bfe58, 0)
  ffffff01fc0bfeb0 ioctl+0x9b(0, 1710002, fd6cfd04)
  ffffff01fc0bff10 sys_syscall32+0x14a()
> ffffffffc011b270::rwlock
            ADDR      OWNER/COUNT FLAGS          WAITERS
ffffffffc011b270 ffffff448549dbe0  B111 ffffff462bd1d760 (W)
                                    ||| ffffff4333702c00 (W)
                 WRITE_LOCKED ------+|| ffffff433448b880 (W)
                 WRITE_WANTED -------+|
                  HAS_WAITERS --------+
> ffffffffc011b270=p
                vnic_lock       
> ffffff448549dbe0::findstack -v
stack pointer for thread ffffff448549dbe0: ffffff01eb656010
[ ffffff01eb656010 _resume_from_idle+0x112() ]
  ffffff01eb656050 swtch_to+0xb6(ffffff473202cb80)
  ffffff01eb6560a0 shuttle_resume+0x2af(ffffff473202cb80, ffffffffc0014ed0)
  ffffff01eb656150 door_upcall+0x212(ffffff431de42140, ffffff01eb656240, ffffff42e273ce18, ffffffffffffffff, 0)
  ffffff01eb6561d0 door_ki_upcall_limited+0x67(ffffff431dccc5d8, ffffff01eb656240, ffffff42e273ce18, ffffffffffffffff, 0)
  ffffff01eb656220 stubs_common_code+0x59()
  ffffff01eb6562c0 i_dls_mgmt_upcall+0xbf(ffffff01eb6566f0, 28, ffffff01eb6562e0, 410)
  ffffff01eb656760 i_dls_mgmt_get_linkattr+0x76(6, fffffffffbbec385, ffffff01eb656788, ffffff01eb656778)
  ffffff01eb6567c0 dls_mgmt_get_phydev+0x40(6, ffffff01eb6567e0)
  ffffff01eb656830 dls_devnet_hold_common+0x41(6, ffffff01eb6568c8, 1)
  ffffff01eb656850 dls_devnet_hold_tmp+0x1a(6, ffffff01eb6568c8)
  ffffff01eb6568a0 stubs_common_code+0x59()
  ffffff01eb6568f0 mac_open_by_linkid+0x20(6, ffffff45fd86c088)
  ffffff01eb656af0 vnic_dev_create+0x543(244b1, 6, ffffff01eb656b54, ffffff01eb656b5c, ffffff01eb656b60, ffffff01eb656b58, ffffff420000000c, 
  ffffff42000008fc, ffffff0100000000, ffffffff00000000, ffffff7fccaa6044, ffffff4200000000, ffffff01eb656b50, ffffffb7fdbf61a8)
  ffffff01eb656bd0 vnic_ioc_create+0x10d(ffffff7fccaa6000, 80401c0, 100003, ffffffb7fdbf61a8, ffffff01eb656e58)
  ffffff01eb656c70 drv_ioctl+0x1e4(1200000000, 1710001, 80401c0, 100003, ffffffb7fdbf61a8, ffffff01eb656e58)
  ffffff01eb656cb0 cdev_ioctl+0x39(1200000000, 1710001, 80401c0, 100003, ffffffb7fdbf61a8, ffffff01eb656e58)
  ffffff01eb656d00 spec_ioctl+0x60(ffffff431dd93980, 1710001, 80401c0, 100003, ffffffb7fdbf61a8, ffffff01eb656e58, 0)
  ffffff01eb656d90 fop_ioctl+0x55(ffffff431dd93980, 1710001, 80401c0, 100003, ffffffb7fdbf61a8, ffffff01eb656e58, 0)
  ffffff01eb656eb0 ioctl+0x9b(4, 1710001, 80401c0)
  ffffff01eb656f10 _sys_sysenter_post_swapgs+0x153()
> ffffff448549dbe0::print kthread_t t_procp->p_user.u_psargs
t_procp->p_user.u_psargs = [ "dladm create-vnic -t -l ixgbe1 -p mtu=1500,zone=42f6b9ae-20fa-4d74-a5eb-a0d55a9" ]

Okay, so a dlmgmtd thread is trying to grab vnic_lock, which it needs in
order to delete something. It is presumably holding a table lock while
this is happening, but we'll need to verify that. On the other hand, the
thread that owns vnic_lock is in the middle of creating a VNIC (a dladm
create-vnic, per its psargs) and is blocked in a door upcall back into
dlmgmtd.
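
To make the shape of that dependency concrete, here is a minimal
userland analogue of what the two stacks above imply. It is not the
illumos code; the names (vnic_lock_demo, create_path, delete_path) are
made up, and the condition variable stands in for the door upcall that
only dlmgmtd can answer. One thread takes the write lock and then blocks
waiting for an external reply; the other blocks behind the same lock.

/*
 * Minimal analogue only; not illumos source.  create_path() models
 * vnic_dev_create() holding vnic_lock across a blocking upcall, and
 * delete_path() models vnic_dev_delete() waiting for that lock.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t vnic_lock_demo = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t reply_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t reply_cv = PTHREAD_COND_INITIALIZER;
static int reply_ready = 0;	/* never set: the "upcall" never returns */

static void *
create_path(void *arg)
{
	(void) pthread_rwlock_wrlock(&vnic_lock_demo);
	(void) printf("create: lock held, waiting on upcall reply\n");
	(void) pthread_mutex_lock(&reply_mtx);
	while (!reply_ready)
		(void) pthread_cond_wait(&reply_cv, &reply_mtx);
	(void) pthread_mutex_unlock(&reply_mtx);
	(void) pthread_rwlock_unlock(&vnic_lock_demo);
	return (NULL);
}

static void *
delete_path(void *arg)
{
	(void) printf("delete: waiting for the lock\n");
	(void) pthread_rwlock_wrlock(&vnic_lock_demo);	/* never succeeds */
	(void) pthread_rwlock_unlock(&vnic_lock_demo);
	return (NULL);
}

int
main(void)
{
	pthread_t c, d;

	(void) pthread_create(&c, NULL, create_path, NULL);
	(void) sleep(1);		/* let create_path win the lock */
	(void) pthread_create(&d, NULL, delete_path, NULL);
	(void) sleep(2);
	(void) printf("both threads are stuck\n");
	return (0);
}

In the real system the only thing that can post the reply is dlmgmtd
itself, which is what the rest of this analysis chases down.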

So based on this, let's take a look and see what's going on with
dlmgmtd and its internal locks:

> ::walk thread | ::stacks
THREAD   STATE    SOBJ        COUNT
2        UNPARKED <NONE>       2008
         dlmgmt_handler+0xf4
         libc.so.1`__door_return+0x3d

13       PARKED   MX            138
         libc.so.1`mutex_lock_impl+0x22f
         libc.so.1`mutex_lock+0x10
         libc.so.1`malloc+0x41
         libc.so.1`_ucred_alloc+0x21
         libc.so.1`door_ucred+0x23
         dlmgmt_handler+0x3f
         libc.so.1`__door_return+0x3d

12       UNPARKED <NONE>         11
         dlmgmt_handler+0x10b
         libc.so.1`__door_return+0x3d

1a       PARKED   CV              9
         libc.so.1`cond_wait_queue+0x5e
         libc.so.1`__cond_wait+0x83
         libc.so.1`cond_wait+0x24
         libc.so.1`pthread_cond_wait+0x1e
         dlmgmt_table_lock+0x2b
         dlmgmt_getnext+0x22
         dlmgmt_handler+0x96
         libc.so.1`__door_return+0x3d

881      UNPARKED <NONE>          6
         libc.so.1`door_create_func+0x29
         libc.so.1`_thrp_setup+0x8c
         libc.so.1`_lwp_start

17       UNPARKED <NONE>          2

2a1      PARKED   CV              2
         libc.so.1`cond_wait_queue+0x5e
         libc.so.1`__cond_wait+0x83
         libc.so.1`cond_wait+0x24
         libc.so.1`pthread_cond_wait+0x1e
         dlmgmt_table_lock+0x2b
         dlmgmt_upcall_getattr+0x15
         dlmgmt_handler+0x96
         libc.so.1`__door_return+0x3d

40       PARKED   CV              2
         libc.so.1`cond_wait_queue+0x5e
         libc.so.1`__cond_wait+0x83
         libc.so.1`cond_wait+0x24
         libc.so.1`pthread_cond_wait+0x1e
         dlmgmt_table_lock+0x2b
         dlmgmt_zonehalt+0xb6
         dlmgmt_handler+0x96
         libc.so.1`__door_return+0x3d

322      UNPARKED <NONE>          1
         libc.so.1`cond_wait_queue+0x5e
         libc.so.1`__cond_wait+0x83
         libc.so.1`cond_wait+0x24
         libc.so.1`pthread_cond_wait+0x1e
         dlmgmt_table_lock+0x2b
         dlmgmt_getlinkid+0x2a
         dlmgmt_handler+0x96
         libc.so.1`__door_return+0x3d 

149      UNPARKED <NONE>          1
         libc.so.1`cond_wait_queue+0x5e
         libc.so.1`__cond_wait+0x83
         libc.so.1`cond_wait+0x24
         libc.so.1`pthread_cond_wait+0x1e
         dlmgmt_table_lock+0x2b
         dlmgmt_getnext+0x22
         dlmgmt_handler+0x96
         libc.so.1`__door_return+0x3d

500      UNPARKED <NONE>          1
         libc.so.1`sleep+0x2d
         dlmgmt_zonehalt+0x6f
         dlmgmt_handler+0x96
         libc.so.1`__door_return+0x3d

148      UNPARKED <NONE>          1
         libc.so.1`suspend_fork+0x97
         libc.so.1`forkx+0x121
         libc.so.1`fork+0x1a
         dlmgmt_zfop+0xbf
         dlmgmt_zfopen+0xb8
         dlmgmt_process_db_onereq+0x54
         dlmgmt_process_db_req+0x4a
         dlmgmt_db_init+0xcc
         dlmgmt_zone_init+0x12e
         dlmgmt_zoneboot+0x55
         dlmgmt_handler+0x96
         libc.so.1`__door_return+0x3d

1        UNPARKED <NONE>          1
         main+0xe2
         _start+0x83

18       PARKED   CV              1
         libc.so.1`cond_wait_queue+0x5e
         libc.so.1`__cond_wait+0x83
         libc.so.1`cond_wait+0x24
         libc.so.1`pthread_cond_wait+0x1e
         dlmgmt_table_lock+0x2b
         dlmgmt_getlinkid+0x2a
         dlmgmt_handler+0x96
         libc.so.1`__door_return+0x3d

It appears that tid 2a1 is the one servicing the VNIC creation, insofar
as it's currently asking for the physical device attributes. It is, of
course, trying to grab the table lock, which is an rwlock. Let's see
whether it wants a read lock or a write lock, and what's going on:

> 0x2a1::findstack -v
stack pointer for thread 2a1: d4cd9840
[ d4cd9840 libc.so.1`__lwp_park+0xb() ]
  d4cd9870 libc.so.1`cond_wait_queue+0x5e(806b6d8, 806b6c0, 0, feefbcf8)
  d4cd98b0 libc.so.1`__cond_wait+0x83(806b6d8, 806b6c0, d4cd98d0,
feefbda1, fef73000, d4cd9dd8)
  d4cd98d0 libc.so.1`cond_wait+0x24(806b6d8, 806b6c0, d4cd9e00, f5f00)
  d4cd98f0 libc.so.1`pthread_cond_wait+0x1e(806b6d8, 806b6c0, 0,
d4cd9e00)
  d4cd9910 dlmgmt_table_lock+0x2b(0, d4cda000)
  d4cd9930 dlmgmt_upcall_getattr+0x15(d4cd9dd8, d4cd9960, d4cd9d8c, 0,
8075730, 410)
  d4cd9db0 dlmgmt_handler+0x96(0, d4cd9dd8, 28, 0, 0, 8055196)
  00000000 libc.so.1`__door_return+0x3d()

So all it actually wants is a read lock. We're also in the midst of a
fork, so let's verify that the atfork handlers aren't on the scene:

> 0x2a1::tid2ulwp | ::print ulwp_t ul_uberdata->atforklist[]
ul_uberdata->atforklist = {
    ul_uberdata->atforklist->forw = 0xfee10300
    ul_uberdata->atforklist->back = 0xfee10300
    ul_uberdata->atforklist->prepare = libc.so.1`libc_prepare_atfork
    ul_uberdata->atforklist->parent = libc.so.1`libc_parent_atfork
    ul_uberdata->atforklist->child = libc.so.1`libc_child_atfork
}
> 0x2a1::tid2ulwp | ::print -at ulwp_t ul_uberdata->atforklist[]
fee10080 atfork_t ul_uberdata->atforklist = {
    fee10080 struct atfork *ul_uberdata->atforklist->forw = 0xfee10300
    fee10084 struct atfork *ul_uberdata->atforklist->back = 0xfee10300
    fee10088 void (*)() ul_uberdata->atforklist->prepare = libc.so.1`libc_prepare_atfork
    fee1008c void (*)() ul_uberdata->atforklist->parent = libc.so.1`libc_parent_atfork
    fee10090 void (*)() ul_uberdata->atforklist->child = libc.so.1`libc_child_atfork
}
> 0xfee10300::print struct atfork
{
    forw = 0xfee10080
    back = 0xfee10080
    prepare = libnsl.so.1`_libnsl_prefork
    parent = libnsl.so.1`_libnsl_parent_atfork
    child = libnsl.so.1`_libnsl_child_atfork
}

Okay, so the atfork handlers aren't the problem. Let's go and see who,
if anyone, is holding the lock and what its current state is:

> dlmgmt_avl_lock::print rwlock_t
{
    readers = 0x40000000
    type = 0
    magic = 0x5257
    mutex = {
        flags = {
            flag1 = 0
            flag2 = 0
            ceiling = 0
            mbcp_type_un = {
                bcptype = 0
                mtype_rcount = {
                    count_type1 = 0
                    count_type2 = 0
                }
            }
            magic = 0x4d58
        }
        lock = {
            lock64 = {
                pad = [ 0, 0, 0, 0, 0, 0, 0, 0 ]
            }
            lock32 = {
                ownerpid = 0
                lockword = 0
            }
            owner64 = 0
        }
        data = 0
    }
    readercv = {
        flags = {
            flag = [ 0, 0, 0, 0 ]
            type = 0
            magic = 0x4356
        }
        data = 0xf0ba3a40
    }
    writercv = {
        flags = {
            flag = [ 0, 0, 0, 0 ]
            type = 0
            magic = 0x4356
        }
        data = 0
    }
}

Okay, based on the readers value we know it's write-locked. From here,
we should be able to determine who the owner is. And... the owner is the
thread that's trying to fork!

> dlmgmt_avl_lock::print rwlock_t readercv.data
readercv.data = 0xf0ba3a40
> 0xf0ba3a40::print ulwp_t ul_lwpid
ul_lwpid = 0x148
> 0x148::findstack -v
stack pointer for thread 148: ea5cf5e0
[ ea5cf5e0 libc.so.1`___lwp_suspend+7() ]
  ea5cf630 libc.so.1`suspend_fork+0x97(f0ba3a40)
  ea5cf670 libc.so.1`forkx+0x121(0, 0, ea5cf760, 8057971)
  ea5cf690 libc.so.1`fork+0x1a(ea5d0558, 8059df1, 17, feec5474, 2336, ea5d0458)
  ea5cf760 dlmgmt_zfop+0xbf(ea5cfbe4, 2336, 8057c1c, ea5cf784, 806fb20, 0)
  ea5cf7b0 dlmgmt_zfopen+0xb8(ea5cfbe4, 8059fd3, 2336, ea5cffe4, ea5cfbe4, 400)
  ea5d0000 dlmgmt_process_db_onereq+0x54(806fb20, 0, 0, 0, 0, 0)
  ea5d0040 dlmgmt_process_db_req+0x4a(806fb20, 805a04d, 80593f7, 0, 0, 0)
  ea5d0480 dlmgmt_db_init+0xcc(2336, ea5d0928, 41, 400)
  ea5d0d30 dlmgmt_zone_init+0x12e(2336, ea5d0df8, ea5d0d50, 8055448, 806b5e0, ea5d0d80)
  ea5d0d50 dlmgmt_zoneboot+0x55(ea5d0df8, ea5d0d80, ea5d0dac, 0, 8073c60, 806b4d0)
  ea5d0dd0 dlmgmt_handler+0x96(0, ea5d0df8, 8, 0, 0, 8055196)
  00000000 libc.so.1`__door_return+0x3d()

Okay, so the dlmgmt_zoneboot() function grabs the table lock for write.
It then proceeds to try to fork and exec. So why can't that thread make
progress? It's trying to suspend a thread. What thread is it trying to
suspend? Let's go back to our earlier kernel stacks. There's one thread
calling lwp_suspend. Let's see what tid it's being called on and what
that tid corresponds to in our userland process:

> ffffff4e433544c0::findstack -v
stack pointer for thread ffffff4e433544c0: ffffff01f8c3ada0
[ ffffff01f8c3ada0 _resume_from_idle+0x112() ]
  ffffff01f8c3add0 swtch+0x141()
  ffffff01f8c3ae40 cv_wait_sig+0x185(ffffff431ded80d6, ffffff42e28c6380)
  ffffff01f8c3ae80 lwp_suspend+0xa4(ffffff433448b880)
  ffffff01f8c3aeb0 syslwp_suspend+0x48(339)
  ffffff01f8c3af10 sys_syscall32+0x14a()

So we see that it's tid 0x339, and in userland that thread is not
having a good day:

> 0x339::findstack -v
stack pointer for thread 339: 0

So, dear kernel, who is our mystery thread?

> ::pgrep dlmgmtd | ::walk thread | ::printf "%p %x\n" kthread_t . t_tid ! grep 339
ffffff477c339880 182
ffffff433448b880 339
ffffff43339f8180 35c
ffffff43339ffc00 518
ffffff43339f63a0 595
ffffff43339f6000 615
ffffff47943394a0 64b
> ffffff433448b880::findstack -v
stack pointer for thread ffffff433448b880: ffffff02051d19e0
[ ffffff02051d19e0 _resume_from_idle+0x112() ]
  ffffff02051d1a10 swtch+0x141()
  ffffff02051d1ab0 turnstile_block+0x21a(0, 0, ffffffffc011b270, fffffffffbc08ce0, 0, 0)
  ffffff02051d1b20 rw_enter_sleep+0x19b(ffffffffc011b270, 0)
  ffffff02051d1b90 vnic_dev_delete+0x43(24490, 0, ffffffefb9d39a28)
  ffffff02051d1bd0 vnic_ioc_delete+0x28(ffffff447c0447d0, cb563d04, 100003, ffffffefb9d39a28, ffffff02051d1e58)
  ffffff02051d1c70 drv_ioctl+0x1e4(1200000000, 1710002, cb563d04, 100003, ffffffefb9d39a28, ffffff02051d1e58)
  ffffff02051d1cb0 cdev_ioctl+0x39(1200000000, 1710002, cb563d04, 100003, ffffffefb9d39a28, ffffff02051d1e58)
  ffffff02051d1d00 spec_ioctl+0x60(ffffff431dd93980, 1710002, cb563d04, 100003, ffffffefb9d39a28, ffffff02051d1e58, 0)
  ffffff02051d1d90 fop_ioctl+0x55(ffffff431dd93980, 1710002, cb563d04, 100003, ffffffefb9d39a28, ffffff02051d1e58, 0)
  ffffff02051d1eb0 ioctl+0x9b(0, 1710002, cb563d04)
  ffffff02051d1f10 sys_syscall32+0x14a()

Aha, and there we have it; that's our problem. Let's see if we can
figure out anything from its userland stack to confirm that this thread
holds no locks and to see where it's coming from. Not surprisingly, it's
a door call:

> 0x339::tid2ulwp | ::print ulwp_t ul_startpc
ul_startpc = libc.so.1`door_create_func

So, let's go stack dumpster diving.

 
> ffffff02051d1eb0,30/nap
0xffffff02051d1eb0:             
0xffffff02051d1eb0:             0xffffff02051d1f10
0xffffff02051d1eb8:             sys_syscall32+0x14a
0xffffff02051d1ec0:             0x51d1f10       
0xffffff02051d1ec8:             0xfb800e6e      
0xffffff02051d1ed0:             0               
0xffffff02051d1ed8:             0x1710002       
0xffffff02051d1ee0:             0xcb563d04      
0xffffff02051d1ee8:             0xffffff433448b880
0xffffff02051d1ef0:             1               
0xffffff02051d1ef8:             0               
0xffffff02051d1f00:             0xffffff02051d1f10
0xffffff02051d1f08:             sys_syscall32+0xe3
0xffffff02051d1f10:             0               
0xffffff02051d1f18:             0               
0xffffff02051d1f20:             0x2316          
0xffffff02051d1f28:             0xcb563d04      
0xffffff02051d1f30:             0xfeefae1e      
0xffffff02051d1f38:             0xfed74980      
0xffffff02051d1f40:             0               
0xffffff02051d1f48:             0xffffff433448b880
0xffffff02051d1f50:             0x36            
0xffffff02051d1f58:             0x8071440       
0xffffff02051d1f60:             0xcb563d20      
0xffffff02051d1f68:             0               
0xffffff02051d1f70:             0               
0xffffff02051d1f78:             0               
0xffffff02051d1f80:             0               
0xffffff02051d1f88:             0xffffffa4f4144880
0xffffff02051d1f90:             0xffffff433448b880
0xffffff02051d1f98:             0               
0xffffff02051d1fa0:             0               
0xffffff02051d1fa8:             0x4b            
0xffffff02051d1fb0:             0x4b            
0xffffff02051d1fb8:             0               
0xffffff02051d1fc0:             0x1c3           
0xffffff02051d1fc8:             0xe             
0xffffff02051d1fd0:             sys_syscall_int+8
0xffffff02051d1fd8:             0xfef053c7      
0xffffff02051d1fe0:             0x43            
0xffffff02051d1fe8:             0x292           
0xffffff02051d1ff0:             0xcb563ce4      
0xffffff02051d1ff8:             0x4b            
mdb: failed to read data from target: no mapping for address
0xffffff02051d2000:             
mdb: failed to read data from target: no mapping for address
0xffffff02051d2000:             
mdb: failed to read data from target: no mapping for address
0xffffff02051d2000:             
mdb: failed to read data from target: no mapping for address
0xffffff02051d2000:             
mdb: failed to read data from target: no mapping for address
0xffffff02051d2000:             
mdb: failed to read data from target: no mapping for address
0xffffff02051d2000:             

If we try a few different things, we get something that looks like the
following:

> 0xcb563ce4$C            
8510c483 0xe81979c0()
> 0xcb563d20$C
cb563d50 dlmgmt_zonehalt+0xc1(cb563df8, cb563d80, cb563dac, 0, 80735f0, 8)
cb563dd0 dlmgmt_handler+0x96(0, cb563df8, 8, 0, 0, 8055196)
00000000 libc.so.1`__door_return+0x3d()
> 0xcb563d04
00024490 dlmgmtd`dlmgmt_avl_mutex()
> dlmgmt_zonehalt+0xc1::dis
dlmgmt_zonehalt+0x97:           pushl  $0x806b72c
<dlmgmtd`my_pid.8955>
dlmgmt_zonehalt+0x9c:           pushl  %eax
dlmgmt_zonehalt+0x9d:           call   -0x177c  <PLT=libc.so.1`write>
dlmgmt_zonehalt+0xa2:           movl   %ebx,(%esp)
dlmgmt_zonehalt+0xa5:           call   -0x1894  <PLT=libc.so.1`close>
dlmgmt_zonehalt+0xaa:           movl   $0x1,(%esp)
dlmgmt_zonehalt+0xb1:           call   +0x209d  <dlmgmt_table_lock>
dlmgmt_zonehalt+0xb6:           addl   $0x4,%esp
dlmgmt_zonehalt+0xb9:           pushl  0x4(%edi)
dlmgmt_zonehalt+0xbc:           call   +0x404f  <dlmgmt_db_fini>
dlmgmt_zonehalt+0xc1:           call   +0x2024  <dlmgmt_table_unlock>
dlmgmt_zonehalt+0xc6:           movl   $0x8059eaa,(%esp)
dlmgmt_zonehalt+0xcd:           call   -0x160c  <PLT=libc.so.1`unlink>
dlmgmt_zonehalt+0xd2:           addl   $0x10,%esp
dlmgmt_zonehalt+0xd5:           movl   0xc(%ebp),%eax
dlmgmt_zonehalt+0xd8:           movl   %esi,(%eax)
dlmgmt_zonehalt+0xda:           leal   -0xc(%ebp),%esp
dlmgmt_zonehalt+0xdd:           popl   %ebx
dlmgmt_zonehalt+0xde:           popl   %esi
dlmgmt_zonehalt+0xdf:           popl   %edi
dlmgmt_zonehalt+0xe0:           leave  

So this thread is trying to destroy the zone. It's not actively holding
the lock, which makes sense: we drop the lock around deletions due to
OS-3342. However, because the forking thread is trying to suspend us, we
end up in an even more twisted web. Basically it looks like this:

Thread 0x339 is trying to delete a VNIC. It holds no locks in dlmgmtd,
but it is blocked in the kernel on vnic_lock. Normally this is fine;
it's exactly the situation OS-3342 tries to address.

The door call thread we identified was 0x2a1, which is blocked trying to
get the table lock (it could also potentially be thread 0x149; in this
case it doesn't really matter). Normally it would eventually acquire the
lock without trouble. Here, it never can.

The table lock is held for write by our friend, thread 0x148, which is
trying to suspend every other thread so that it can fork. It's not clear
how this could ever work in general, given that it holds a write lock
across the fork. And that is what's going on.
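
For context, the pthread_atfork() machinery we checked on the atforklist
earlier exists precisely so a library can keep its own locks consistent
across fork(). A minimal sketch of that pattern follows (the table_lock
name is hypothetical); note that it would not untangle this case by
itself, because the piece that can't be suspended is a thread blocked in
the kernel whose progress depends on this very process answering door
calls.

/*
 * Sketch of the standard pthread_atfork() pattern, the same mechanism
 * behind the libc/libnsl handlers inspected above.  The lock name is
 * hypothetical; at the time of the dump above dlmgmtd registered no
 * handlers of its own (the atforklist shows only libc and libnsl).
 */
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

static void
prepare(void)
{
	/* Before fork(): acquire the lock so no other thread holds it. */
	(void) pthread_mutex_lock(&table_lock);
}

static void
parent(void)
{
	/* In the parent after fork(): release it again. */
	(void) pthread_mutex_unlock(&table_lock);
}

static void
child(void)
{
	/* In the child: start life with the lock released. */
	(void) pthread_mutex_unlock(&table_lock);
}

int
main(void)
{
	pid_t pid;

	(void) pthread_atfork(prepare, parent, child);

	if ((pid = fork()) == 0) {
		/* The child can safely take table_lock here. */
		(void) pthread_mutex_lock(&table_lock);
		(void) pthread_mutex_unlock(&table_lock);
		_exit(0);
	}
	if (pid > 0)
		(void) waitpid(pid, NULL, 0);
	return (0);
}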

Comments

Comment by Robert Mustacchi [X]
Created at 2016-06-02T18:10:56.000Z

See also OS-5441 which duplicates this.


Comment by Robert Mustacchi [X]
Created at 2016-06-30T23:05:24.000Z

We have a thread in dlmgmtd that's trying to delete a vnic. It's blocked
on the rwlock:

> ffffff453c117520::findstack -v
stack pointer for thread ffffff453c117520: ffffff01ed5b59e0
[ ffffff01ed5b59e0 _resume_from_idle+0x112() ]
  ffffff01ed5b5a10 swtch+0x141()
  ffffff01ed5b5ab0 turnstile_block+0x21a(ffffff42e2732048, 0, ffffffffc011b550, fffffffffbc08ce0, 0, 0)
  ffffff01ed5b5b20 rw_enter_sleep+0x19b(ffffffffc011b550, 0)
  ffffff01ed5b5b90 vnic_dev_delete+0x43(2c764, 0, ffffffcd24704d50)
  ffffff01ed5b5bd0 vnic_ioc_delete+0x28(ffffff5016e3ec50, 98d7dd04, 100003, ffffffcd24704d50, ffffff01ed5b5e58)
  ffffff01ed5b5c70 drv_ioctl+0x1e4(1200000000, 1710002, 98d7dd04, 100003, ffffffcd24704d50, ffffff01ed5b5e58)
  ffffff01ed5b5cb0 cdev_ioctl+0x39(1200000000, 1710002, 98d7dd04, 100003, ffffffcd24704d50, ffffff01ed5b5e58)
  ffffff01ed5b5d00 spec_ioctl+0x60(ffffff431deb2440, 1710002, 98d7dd04, 100003, ffffffcd24704d50, ffffff01ed5b5e58, 0)
  ffffff01ed5b5d90 fop_ioctl+0x55(ffffff431deb2440, 1710002, 98d7dd04, 100003, ffffffcd24704d50, ffffff01ed5b5e58, 0)
  ffffff01ed5b5eb0 ioctl+0x9b(0, 1710002, 98d7dd04)
  ffffff01ed5b5f10 sys_syscall32+0x14a()
> ffffffffc011b550::rwlock
            ADDR      OWNER/COUNT FLAGS          WAITERS
ffffffffc011b550 ffffffc2251173a0  B111 ffffff463aed4060 (W)
                                    ||| ffffff7bf9fbb420 (W)
                 WRITE_LOCKED ------+|| ffffff47c1dd1760 (W)
                 WRITE_WANTED -------+| ffffff453c117520 (W)
                  HAS_WAITERS --------+
> ffffffc2251173a0::findstack -v
stack pointer for thread ffffffc2251173a0: ffffff01f8fa3450
[ ffffff01f8fa3450 _resume_from_idle+0x112() ]
  ffffff01f8fa3490 swtch_to+0xb6(ffffff46bdf24400)
  ffffff01f8fa34e0 shuttle_resume+0x2af(ffffff46bdf24400, ffffffffc0014ed0)
  ffffff01f8fa3590 door_upcall+0x212(ffffff431deb2040, ffffff01f8fa3680, ffffff42e273ce18, ffffffffffffffff, 0)
  ffffff01f8fa3610 door_ki_upcall_limited+0x67(ffffff431ddaa7d8, ffffff01f8fa3680, ffffff42e273ce18, ffffffffffffffff, 0)
  ffffff01f8fa3660 stubs_common_code+0x59()
  ffffff01f8fa3700 i_dls_mgmt_upcall+0xbf(ffffff01f8fa3720, 8, ffffff01f8fa3730, 30)
  ffffff01f8fa37d0 dls_mgmt_get_linkinfo+0x7c(2c7bb, ffffff01f8fa3800, ffffff01f8fa37fc, 0, 0)
  ffffff01f8fa3880 dls_devnet_set+0xb0(ffffff44c05e6bd8, 2c7bb, 0, ffffff01f8fa38b0)
  ffffff01f8fa38f0 dls_devnet_create+0x4c(ffffff44c05e6bc0, 2c7bb, 0)
  ffffff01f8fa3af0 vnic_dev_create+0x6a0(2c7bb, 6, ffffff01f8fa3b54, ffffff01f8fa3b5c, ffffff01f8fa3b60, ffffff01f8fa3b58, ffffff420000000c, 
  ffffff420000054b, ffffff0100000000, ffffffff00000000, ffffff4618f59044, ffffff4200000000, ffffff01f8fa3b50, ffffffced7982400)
  ffffff01f8fa3bd0 vnic_ioc_create+0x10d(ffffff4618f59000, 80401b0, 100003, ffffffced7982400, ffffff01f8fa3e58)
  ffffff01f8fa3c70 drv_ioctl+0x1e4(1200000000, 1710001, 80401b0, 100003, ffffffced7982400, ffffff01f8fa3e58)
  ffffff01f8fa3cb0 cdev_ioctl+0x39(1200000000, 1710001, 80401b0, 100003, ffffffced7982400, ffffff01f8fa3e58)
  ffffff01f8fa3d00 spec_ioctl+0x60(ffffff431deb2440, 1710001, 80401b0, 100003, ffffffced7982400, ffffff01f8fa3e58, 0)
  ffffff01f8fa3d90 fop_ioctl+0x55(ffffff431deb2440, 1710001, 80401b0, 100003, ffffffced7982400, ffffff01f8fa3e58, 0)
  ffffff01f8fa3eb0 ioctl+0x9b(4, 1710001, 80401b0)
  ffffff01f8fa3f10 _sys_sysenter_post_swapgs+0x153()
> ffffff452f83a400::findstack -v
stack pointer for thread ffffff452f83a400: ffffff02070d1da0
[ ffffff02070d1da0 _resume_from_idle+0x112() ]
  ffffff02070d1dd0 swtch+0x141()
  ffffff02070d1e40 cv_wait_sig+0x185(ffffff431de960de, ffffff42e28c6380)
  ffffff02070d1e80 lwp_suspend+0xa4(ffffff453c117520)
  ffffff02070d1eb0 syslwp_suspend+0x48(661)
  ffffff02070d1f10 sys_syscall32+0x14a()
> ffffff453c117520::print kthread_t t_tid
t_tid = 0x661

So we're trying to suspend a thread that's blocked trying to grab a
lock, and the lock's holder is doing a door upcall back into dlmgmtd,
which likely can't be serviced. Classic OS-5363.


Comment by Ryan Zezeski [X]
Created at 2016-11-16T23:16:15.000Z

If we could avoid the fork, we could avoid this deadlock. In this particular case the fork is done to read the datalink.conf file in the zone:

[root@testsos ~]# dtrace -qn 'pid$target::zone_file_exists:entry { printf("zone_file_eixsts: %s%s\n", stringof(args[0]), stringof(args[1])); } pid$target::dlmgmt_zfopen:entry /arg2 != 0/ { printf("open %s with %s mode in zone %d\n", stringof(args[0]), stringof(args[1]), arg2); ustack(); self->t=1; } proc:::create /self->t/ { printf("dlmgmt forked\n"); } pid$target::dlmgmt_zfopen:return /self->t/ { self->t=0; }' -p 100027
zone_file_eixsts: /zones/e0b64261-4708-c1f3-971b-dd561161b3dc/root/native/etc/svc/volatile/dladm
zone_file_eixsts: /zones/e0b64261-4708-c1f3-971b-dd561161b3dc/root/etc/svc/volatile/dladm/network-datalink-management:default.cache
zone_file_eixsts: /zones/e0b64261-4708-c1f3-971b-dd561161b3dc/root/etc/dladm/datalink.conf
open /etc/dladm/datalink.conf with r mode in zone 18

              dlmgmtd`dlmgmt_zfopen
              dlmgmtd`dlmgmt_process_db_onereq+0x54
              dlmgmtd`dlmgmt_process_db_req+0x4a
              dlmgmtd`dlmgmt_db_init+0xcc
              dlmgmtd`dlmgmt_zone_init+0x12e
              dlmgmtd`dlmgmt_zoneboot+0x55
              dlmgmtd`dlmgmt_handler+0x96
              libc.so.1`__door_return+0x4b
dlmgmt forked

In our case this file is always empty since we aren't creating persistent links in a zone. Perhaps we could add a special case to the code to avoid this particular dlmgmt_zfopen(), similar to how we added the zone_file_exists() special case to avoid dlmgmt upcall/fork deadlocks:

/*
 * Attempt to mitigate one of the deadlocks in the dlmgmtd architecture.
 *
 * dlmgmt_db_init() calls dlmgmt_process_db_req() which eventually gets to
 * dlmgmt_zfop() which tries to fork, enter the zone and read the file.
 * Because of the upcall architecture of dlmgmtd this can lead to deadlock
 * with the following scenario:
 *    a) the thread preparing to fork will have acquired the malloc locks
 *       then attempt to suspend every thread in preparation to fork.
 *    b) all of the upcalls will be blocked in door_ucred() trying to malloc()
 *       and get the credentials of their caller.
 *    c) we can't suspend the in-kernel thread making the upcall.
 *
 * Thus, we cannot serve door requests because we're blocked in malloc()
 * which fork() owns, but fork() is in turn blocked on the in-kernel thread
 * making the door upcall.  This is a fundamental architectural problem with
 * any server handling upcalls and also trying to fork().
 *
 * To minimize the chance of this deadlock occurring, we check ahead of time to
 * see if the file we want to read actually exists in the zone (which it almost
 * never does), so we don't need to fork in that case (i.e. rarely to never).
 */
static boolean_t
zone_file_exists(char *zoneroot, char *filename)
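
A rough sketch of the special case proposed above might look like the
following. This is not the committed fix; the wrapper name, the use of
zone_getattr() to look up the zone root, and the exact dlmgmt_zfopen()
signature are assumptions based on the stacks and D output earlier in
this ticket.

#include <errno.h>
#include <stdio.h>
#include <sys/param.h>
#include <sys/types.h>
#include <zone.h>

/* Existing dlmgmtd routines: the helper quoted above and the forking open. */
extern boolean_t zone_file_exists(char *, char *);
extern FILE *dlmgmt_zfopen(const char *, const char *, zoneid_t, int *);

/*
 * Hypothetical wrapper: only fall back to the fork/exec path in
 * dlmgmt_zfopen() when the file actually exists inside the zone.
 */
static FILE *
dlmgmt_zfopen_if_exists(const char *filename, const char *modes,
    zoneid_t zoneid, int *err)
{
	char zoneroot[MAXPATHLEN];

	if (zone_getattr(zoneid, ZONE_ATTR_ROOT, zoneroot,
	    sizeof (zoneroot)) < 0) {
		*err = errno;
		return (NULL);
	}

	/*
	 * datalink.conf almost never exists inside a zone, so in the
	 * common case we report ENOENT without ever forking.
	 */
	if (!zone_file_exists(zoneroot, (char *)filename)) {
		*err = ENOENT;
		return (NULL);
	}

	return (dlmgmt_zfopen(filename, modes, zoneid, err));
}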

Comment by Bot Bot [X]
Created at 2017-01-30T17:22:30.000Z

illumos-joyent commit 33df115 (branch master, by Ryan Zezeski)

OS-5363 dlmgmtd forking with write locks results in deadlock chaos
OS-5919 remove zone lock file
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Approved by: Jerry Jelinek <jerry.jelinek@joyent.com>