OS-6137: illumos#8149 introduced deadlock between device detach and kstat read

Details

Issue Type: Bug
Priority: 4 - Normal
Status: Resolved
Created at: 2017-05-19T15:29:09.000Z
Updated at: 2017-08-17T01:15:30.000Z

People

Created by: Ryan Zezeski [X]
Reported by: Ryan Zezeski [X]
Assigned to: Ryan Zezeski [X]

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2017-06-05T16:18:13.000Z)

Fix Versions

2017-06-08 SUDAN SURPRISE (Release Date: 2017-06-08)

Related Links

Description

illumos#8149 made a change to dls_devnet_stat_update() that allows a concurrent device detach and a kstat read of the link module to deadlock.

First, the stacks indicative of the deadlock:

> ::stacks -m dls
THREAD           STATE    SOBJ                COUNT
ffffff0010e66c40 SLEEP    CV                      1
                 swtch+0x18a
                 cv_wait+0x89
                 kstat_hold+0x4d
                 kstat_hold_bykid+0x37
                 kstat_delete+0x52
                 dls_stat_delete+0x1e
                 dls_devnet_stat_destroy+0x59
                 dls_devnet_unset+0x263
                 dls_devnet_destroy+0x31
                 softmac_destroy+0x422
                 net_predetach+0x3a
                 dacf_op_invoke+0x26e
                 dacf_process_rsrvs+0x10b
                 dacfc_predetach+0x41
                 predetach_node+0x3b
                 i_ndi_unconfig_node+0x1eb
                 i_ddi_detachchild+0x59
                 devi_detach_node+0xac
                 unconfig_immediate_children+0x1b5
                 devi_unconfig_common+0x107
                 mt_config_thread+0x259
                 thread_start+8

ffffff03e3804bc0 SLEEP    CV                      1
                 swtch+0x18a
                 cv_wait+0x89
                 softmac_hold_device+0x203
                 stubs_common_code+0x59
                 dls_devnet_hold_common+0xc5
                 dls_devnet_hold_tmp+0x1a
                 dls_devnet_stat_update+0x2e
                 read_kstat_data+0xd0
                 kstat_ioctl+0x83
                 cdev_ioctl+0x39
                 spec_ioctl+0x60
                 fop_ioctl+0x55
                 ioctl+0x9b
                 _sys_sysenter_post_swapgs+0x241

Thread A (ffffff0010e66c40) is attempting a device detach. It has executed parts of softmac_destroy() and is now stuck in dls_devnet_unset() trying to destroy the kstat. There are two places where the kstat might be destroyed. One of them used to be inside the i_dls_devnet_lock, which I moved outside the lock as part of OS-6056. Let's make sure I didn't introduce this deadlock by examining the ddp to determine which dls_devnet_stat_destroy() was executed.

> ffffff0010e66c40::findstack -v
stack pointer for thread ffffff0010e66c40: ffffff0010e662e0
[ ffffff0010e662e0 _resume_from_idle+0x112() ]
  ffffff0010e66310 swtch+0x18a()
  ffffff0010e66340 cv_wait+0x89(ffffff14fa7e4e58, fffffffffbc29f60)
  ffffff0010e66380 kstat_hold+0x4d(fffffffffbc39fa0, ffffff0010e66390)
  ffffff0010e664b0 kstat_hold_bykid+0x37(157db, 0)
  ffffff0010e664f0 kstat_delete+0x52(ffffff14fa7e4d90)
  ffffff0010e66520 dls_stat_delete+0x1e(ffffff14fa7e4d90)
  ffffff0010e66550 dls_devnet_stat_destroy+0x59(ffffff030cfe8c48, 0)
  ffffff0010e665b0 dls_devnet_unset+0x263(ffffff030f5b29c8, ffffff0010e6663c, 0)
  ffffff0010e66610 dls_devnet_destroy+0x31(ffffff030f5b29c8, ffffff0010e6663c, 0)
  ffffff0010e66790 softmac_destroy+0x422(ffffff02ff3320a8, ba00000002)
  ffffff0010e667d0 net_predetach+0x3a(ffffff030d42a828, 0, 0)
  ffffff0010e66860 dacf_op_invoke+0x26e(ffffff0300534cf8, ffffff030d42a828, 0)
  ffffff0010e668d0 dacf_process_rsrvs+0x10b(ffffff02ff332208, 2, 1)
  ffffff0010e66910 dacfc_predetach+0x41(ffffff02ff3320a8)
  ffffff0010e66960 predetach_node+0x3b(ffffff02ff3320a8, 2000)
  ffffff0010e669c0 i_ndi_unconfig_node+0x1eb(ffffff02ff3320a8, 4, 2000)
  ffffff0010e669f0 i_ddi_detachchild+0x59(ffffff02ff3320a8, 2000)
  ffffff0010e66a60 devi_detach_node+0xac(ffffff02ff3320a8, 2000)
  ffffff0010e66af0 unconfig_immediate_children+0x1b5(ffffff02ff338610, 0, 2000, ffffffff)
  ffffff0010e66b80 devi_unconfig_common+0x107(ffffff02ff338610, 0, 2000, ffffffff, 0)
  ffffff0010e66c20 mt_config_thread+0x259(ffffff03b1a9c680)
  ffffff0010e66c30 thread_start+8()

> ffffff030cfe8c48::print dls_devnet_t
{
    dd_linkid = 0x2
    dd_linkname = [ "e1000g1" ]
    dd_mac = [ "e1000g1" ]
    dd_ksp = 0xffffff14fa7e4d90
    dd_zone_ksp = 0
    dd_ref = 0
    dd_mutex = {
        _opaque = [ 0 ]
    }
    dd_cv = {
        _opaque = 0
    }
    dd_tref = 0
    dd_flags = 0x1
    dd_owner_zid = 0
    dd_zid = 0
    dd_prop_loaded = 0x1 (B_TRUE)
    dd_prop_taskid = 0
    dd_transient = 0 (0)
}

The dd_zid is zero and dd_linkid is non-zero; therefore, we know the second call site was the one executed. I did not introduce this deadlock in OS-6056.

	if (ddp->dd_linkid != DATALINK_INVALID_LINKID) {
		/*
		 * See the earlier call in this function for an explanation.
		 */
		mutex_exit(&ddp->dd_mutex);
		dls_devnet_stat_destroy(ddp, ddp->dd_owner_zid);
		mutex_enter(&ddp->dd_mutex);
	}

Okay, so why is this deadlock happening? Let's look at Thread B (ffffff03e3804bc0) and see what it's waiting on. If it's in the first cv_wait case, it should be waiting on smac_global_cv:

		cv_wait(&smac_global_cv, &smac_global_lock);

Is that where Thread B is waiting?

> ffffff03e3804bc0::findstack -v
stack pointer for thread ffffff03e3804bc0: ffffff00131d1710
[ ffffff00131d1710 _resume_from_idle+0x112() ]
  ffffff00131d1740 swtch+0x18a()
  ffffff00131d1770 cv_wait+0x89(ffffff030d350a90, ffffff030d350a88)
  ffffff00131d18f0 softmac_hold_device+0x203(ba00000002, ffffff00131d1958)
  ffffff00131d1940 stubs_common_code+0x59()
  ffffff00131d19b0 dls_devnet_hold_common+0xc5(2, ffffff00131d19f8, 1)
  ffffff00131d19d0 dls_devnet_hold_tmp+0x1a(2, ffffff00131d19f8)
  ffffff00131d1a20 dls_devnet_stat_update+0x2e(ffffff14fa7e4d90, 0)
  ffffff00131d1c20 read_kstat_data+0xd0(ffffff00131d1e48, 8133158, 100001)
  ffffff00131d1c60 kstat_ioctl+0x83(200000000, 4b02, 8133158, 100001, ffffff05ebc389a0, ffffff00131d1e48)
  ffffff00131d1ca0 cdev_ioctl+0x39(200000000, 4b02, 8133158, 100001, ffffff05ebc389a0, ffffff00131d1e48)
  ffffff00131d1cf0 spec_ioctl+0x60(ffffff0324586a00, 4b02, 8133158, 100001, ffffff05ebc389a0, ffffff00131d1e48, 0)
  ffffff00131d1d80 fop_ioctl+0x55(ffffff0324586a00, 4b02, 8133158, 100001, ffffff05ebc389a0, ffffff00131d1e48, 0)
  ffffff00131d1ea0 ioctl+0x9b(5, 4b02, 8133158)
  ffffff00131d1f00 _sys_sysenter_post_swapgs+0x241()


> ffffff030d350a90/P
0xffffff030d350a90:             0x30000012d

No, the address doesn't map to smac_global_cv; it must be the second cv_wait call:

	while (softmac->smac_state != SOFTMAC_ATTACH_DONE)
		cv_wait(&softmac->smac_cv, &softmac->smac_mutex);

Thread B is waiting for smac_state to be SOFTMAC_ATTACH_DONE. Let's print the softmac state:

> ffffff030d350a90 - 118::print softmac_t
{
    smac_devname = [ "e1000g1" ]
    smac_umajor = 0xba
    smac_uppa = 0x1
    smac_cnt = 0x1
    smac_mutex = {
        _opaque = [ 0 ]
    }
    smac_cv = {
        _opaque = 0x12d
    }
    smac_state = 3 (SOFTMAC_DETACH_INPROG)
    smac_hold_cnt = 0x12d
    smac_flags = 0x1
    smac_attacherr = 0
    smac_mh = 0xffffff030f5b29c8
    smac_softmac = [ 0xffffff030d50a560, 0 ]
    smac_attachok_cnt = 0x1
    smac_attached_left = 0
    smac_notify_thread = 0
    smac_notify_head = 0
    smac_notify_tail = 0
    smac_media = 0x4
    smac_style = 0
    smac_dev = 0
    smac_saplen = 0
    smac_addrlen = 0
    smac_unicst_addr = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
    smac_min_sdu = 0
    smac_max_sdu = 0
    smac_margin = 0
    smac_notifications = 0
    smac_capab_flags = 0
    smac_hcksum_txflags = 0
    smac_no_capability_req = 0 (0)
    smac_mdt_capab = {
        mdt_version = 0
        mdt_flags = 0
        mdt_hdr_head = 0
        mdt_hdr_tail = 0
        mdt_max_pld = 0
        mdt_span_limit = 0
        mdt_mid = {
            mid = [ 0, 0, 0, 0 ]
        }
    }
    smac_mdt = 0 (0)
    smac_lower = 0
    smac_active_mutex = {
        _opaque = [ 0 ]
    }
    smac_active = 0 (0)
    smac_nactive = 0
    smac_fp_mutex = {
        _opaque = [ 0 ]
    }
    smac_fp_cv = {
        _opaque = 0
    }
    smac_fp_disable_clients = 0
    smac_fastpath_admin_disabled = 0 (0)
    smac_sup_list = {
        list_size = 0xe8
        list_offset = 0x30
        list_head = {
            list_next = 0xffffff030d350ba0
            list_prev = 0xffffff030d350ba0
        }
    }
}
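
An aside on the "- 118" above: the thread is sleeping on &softmac->smac_cv, and smac_cv sits at offset 0x118 within softmac_t on this build, so subtracting that offset from the CV address recovers the base of the containing softmac_t for ::print. The following small C program illustrates the same container-of arithmetic with stand-in types (not the real softmac_t, just a sketch of the math):

	#include <stdio.h>
	#include <stddef.h>

	/* Stand-in types; NOT the real softmac_t, only here to show the math. */
	typedef struct fake_cv { int _opaque; } fake_cv_t;
	typedef struct fake_softmac {
		char		smac_devname[32];
		/* ... many other members ... */
		fake_cv_t	smac_cv;
	} fake_softmac_t;

	int
	main(void)
	{
		fake_softmac_t sm;
		fake_cv_t *cvp = &sm.smac_cv;	/* the address a thread sleeps on */

		/* base = CV address - offset of the CV member in its container */
		fake_softmac_t *base = (fake_softmac_t *)
		    ((char *)cvp - offsetof(fake_softmac_t, smac_cv));

		(void) printf("recovered base %p, actual base %p\n",
		    (void *)base, (void *)&sm);
		return (0);
	}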

The state is SOFTMAC_DETACH_INPROG, but the smac_hold_cnt is non-zero. Obviously, this device is not going to finish detaching until there are no more holds.

Okay, so why are we actually deadlocked? It has to do with kstats.

Thread A has marked the device as SOFTMAC_DETACH_INPROG and then called dls_devnet_unset() to destroy the device -- but it's stuck waiting to hold the kstat so it can delete it.

Thread B holds the kstat but is waiting to hold the device (softmac_hold_device()) before it reaches into DLS and grabs a reference to the dls_devnet_t. That call will never return: it's waiting for the softmac to reach SOFTMAC_ATTACH_DONE, but Thread A has already marked it SOFTMAC_DETACH_INPROG and is itself stuck waiting for the kstat -- deadlock!
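
To make the required interleaving concrete, here is a toy userspace model of the cycle, written with pthreads. This is only a sketch, not kernel code and not the actual reproduction: the two mutex/CV pairs stand in for the kstat hold and the softmac state, and the sleeps force the interleaving captured in the dump above.

	#include <pthread.h>
	#include <stdio.h>
	#include <unistd.h>

	#define	ATTACH_DONE	1	/* stands in for SOFTMAC_ATTACH_DONE */
	#define	DETACH_INPROG	2	/* stands in for SOFTMAC_DETACH_INPROG */

	static pthread_mutex_t smac_mutex = PTHREAD_MUTEX_INITIALIZER;
	static pthread_cond_t smac_cv = PTHREAD_COND_INITIALIZER;
	static int smac_state = ATTACH_DONE;

	static pthread_mutex_t kstat_lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_cond_t kstat_cv = PTHREAD_COND_INITIALIZER;
	static int kstat_held = 0;

	/* Thread B: the kstat reader. */
	static void *
	reader(void *arg)
	{
		pthread_mutex_lock(&kstat_lock);
		kstat_held = 1;			/* read_kstat_data() takes its hold */
		pthread_mutex_unlock(&kstat_lock);

		sleep(2);			/* let the detach mark DETACH_INPROG */

		pthread_mutex_lock(&smac_mutex);
		while (smac_state != ATTACH_DONE)	/* softmac_hold_device() */
			pthread_cond_wait(&smac_cv, &smac_mutex);
		pthread_mutex_unlock(&smac_mutex);

		pthread_mutex_lock(&kstat_lock);	/* never reached */
		kstat_held = 0;
		pthread_cond_broadcast(&kstat_cv);
		pthread_mutex_unlock(&kstat_lock);
		return (NULL);
	}

	/* Thread A: the device detach. */
	static void *
	detach(void *arg)
	{
		pthread_mutex_lock(&smac_mutex);
		smac_state = DETACH_INPROG;	/* softmac_destroy() marks the softmac */
		pthread_mutex_unlock(&smac_mutex);

		pthread_mutex_lock(&kstat_lock);
		while (kstat_held)		/* kstat_delete() waits out the hold */
			pthread_cond_wait(&kstat_cv, &kstat_lock);
		pthread_mutex_unlock(&kstat_lock);
		return (NULL);
	}

	int
	main(void)
	{
		pthread_t a, b;

		(void) pthread_create(&b, NULL, reader, NULL);
		sleep(1);	/* make sure the reader takes its hold first */
		(void) pthread_create(&a, NULL, detach, NULL);

		(void) pthread_join(a, NULL);	/* hangs here: circular wait */
		(void) pthread_join(b, NULL);
		(void) printf("no deadlock\n");
		return (0);
	}

Built with pthreads, the program hangs in the first pthread_join(), mirroring the two stuck threads above: the reader owns the "kstat" and waits for the state to return to ATTACH_DONE, while the detach has set DETACH_INPROG and waits for the "kstat" hold to drop.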

This deadlock was introduced by illumos#8149 because it added a call to dls_devnet_hold_tmp() in dls_devnet_stat_update(). I still think illumos#8149 made the correct change, but apparently I'm the first one to exercise it enough on a debug kernel to hit this deadlock.

I find it odd that softmac allows consumers to continue adding holds to the device after it has been marked SOFTMAC_DETACH_INPROG, but this seems to have been a conscious design decision. If softmac_hold_device() checked for SOFTMAC_DETACH_INPROG before granting the hold, that would solve this issue, but perhaps it would introduce another one. That's okay; I think there are other ways to solve this.
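
For reference, such a check would look roughly like the sketch below. It is built only from the cv_wait loop quoted earlier and the fields visible in the softmac_t dump; it is not the fix that was actually taken, and the errno is a placeholder.

	/*
	 * Hypothetical guard in softmac_hold_device(); not the actual fix.
	 * Refuse new holds once a detach has begun rather than waiting for
	 * the softmac to return to SOFTMAC_ATTACH_DONE.
	 */
	mutex_enter(&softmac->smac_mutex);
	if (softmac->smac_state == SOFTMAC_DETACH_INPROG) {
		mutex_exit(&softmac->smac_mutex);
		return (ENOENT);	/* exact errno is a design choice */
	}
	while (softmac->smac_state != SOFTMAC_ATTACH_DONE)
		cv_wait(&softmac->smac_cv, &softmac->smac_mutex);
	softmac->smac_hold_cnt++;	/* grant the hold */
	mutex_exit(&softmac->smac_mutex);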

Comments

Comment by Ryan Zezeski [X]
Created at 2017-06-02T17:13:50.000Z

https://cr.joyent.us/#/c/2030/


Comment by Ryan Zezeski [X]
Created at 2017-06-02T17:27:32.000Z

I solved this by removing the softmac_hold_device() call from dls_devnet_hold_common() and friends. Here's a snippet of the old code:

	/*
	 * Hold this link to prevent it being detached in case of a
	 * physical link.
	 */
	if (dls_mgmt_get_phydev(linkid, &phydev) == 0)
		(void) softmac_hold_device(phydev, &ddh);

At one time this comment may have been true: in order to grab a devnet hold, we needed to make sure the device wasn't detached out from underneath us. However, the comment no longer holds water. In order to destroy the device, softmac_destroy() has to call dls_devnet_destroy():

		/*
		 * This is the first minor node that is being detached for this
		 * softmac.
		 */
		ASSERT(softmac->smac_attachok_cnt == softmac->smac_cnt);
		if (!(smac_flags & SOFTMAC_NOSUPP)) {
			if ((err = dls_devnet_destroy(smac_mh, &linkid,
			    B_FALSE)) != 0) {
				goto error;
			}
		}

The dls_devnet_destroy() function calls dls_devnet_unset(), which makes all DLS devnet hashtable modifications under the i_dls_devnet_lock -- a global DLS rwlock. The various DLS hold routines, like dls_devnet_hold_common(), also synchronize their hashtable lookups on this lock. Because dls_devnet_unset() grabs this lock as writer, there is no way for the device to detach while a hold routine is in the critical section; likewise, there is no way a hold routine will see the devnet after the detach/unset exits the critical section.

In summary: either the hold routine atomically grabs a hold and prevents completion of the detach routine, or the detach routine gets to the lock first and the hold routine returns ENOENT.
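
Paraphrased as code, the pattern looks roughly like this. It is a sketch of the synchronization described above, not verbatim dls_mgmt.c, and the hash-table name in particular is an assumption:

	/* The hold side (dls_devnet_hold_common() and friends): */
	rw_enter(&i_dls_devnet_lock, RW_READER);
	if (mod_hash_find(i_dls_devnet_id_hash, key,
	    (mod_hash_val_t *)&ddp) != 0) {
		rw_exit(&i_dls_devnet_lock);
		return (ENOENT);	/* unset got to the lock first */
	}
	/* ... take the dd_ref/dd_tref hold while the entry is still visible ... */
	rw_exit(&i_dls_devnet_lock);

	/* The unset/detach side (dls_devnet_unset()): */
	rw_enter(&i_dls_devnet_lock, RW_WRITER);
	(void) mod_hash_remove(i_dls_devnet_id_hash, key, &val);
	rw_exit(&i_dls_devnet_lock);
	/* ... then wait for any existing dd_ref holders to drain ... */

Either way, softmac_hold_device() adds nothing to this guarantee, which is why removing the call is safe.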


Comment by Ryan Zezeski [X]
Created at 2017-06-02T17:28:58.000Z

I tested this by running background kstat -p 'link:::' processes in a loop while running modunload -i 0. I used DTrace to verify that softmac_destroy() was actually being executed during the modunload.


Comment by Bot Bot [X]
Created at 2017-06-05T16:06:46.000Z

illumos-joyent commit c2fb439 (branch master, by Ryan Zezeski)

OS-6137 illumos#8149 introduced deadlock between device detach and kstat read
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Bryan Cantrill <bryan@joyent.com>
Approved by: Bryan Cantrill <bryan@joyent.com>