See 105c32e6a47285a85b802a0cf7bcac1c .
A simpler reproduction is:
1.) snoop -z ZONE -d net0 (Assuming net0 is a NIC in the zone)
2.) vmadm halt ZONE (which will hang)
3.) kill/exit the snoop process in #1.
Closing the /dev/net/ZONE/net0 file descriptor in step 3 panics the system, because the VERIFY() below fails:
void
dls_devnet_rele(dls_devnet_t *ddp)
{
	mutex_enter(&ddp->dd_mutex);
	VERIFY(ddp->dd_ref > 1);
	ddp->dd_ref--;
The first step is probably to find out where the zone shutdown is getting held up. After that, either make appropriate indicators available to dls_devnet_rele() for GZ processes that use dlpi_open_zone(), or make the GZ process better able to hold a `dd_ref`.
Dan McDonald commented on 2025-03-11T17:01:25.663-0400:
The problem is that the cleanup of the dls_devnet_t at zone shutdown time CANNOT POSSIBLY KNOW how many of its dd_ref counts are tied to global-zone snoop -z <ZONE> -d <transient-zone-link> processes.
To expedite zone shutdown, illumos#15167 and friends (which in turn are upstreams of various OS- tickets, notably OS-406) perform this shortcut:
/*
 * Make sure downcalls into softmac_create or softmac_destroy from
 * devfs don't cv_wait on any devfs related condition for fear of
 * deadlock. Return EBUSY if the asynchronous thread started for
 * property loading as part of the post attach hasn't yet completed.
 */
VERIFY(ddp->dd_ref != 0);
if ((ddp->dd_ref != 1) || (!wait &&
    (ddp->dd_tref != 0 || ddp->dd_prop_taskid != 0))) {
	int zstatus = 0;

	/*
	 * There are a couple of alternatives that might be going on
	 * here; a) the zone is shutting down and it has a transient
	 * link assigned, in which case we want to clean it up instead
	 * of moving it back to the global zone, or b) its possible
	 * that we're trying to clean up an orphaned vnic that was
	 * delegated to a zone and which wasn't cleaned up properly
	 * when the zone went away. Check for either of these cases
	 * before we simply return EBUSY.
	 *
	 * zstatus indicates which situation we are dealing with:
	 *  0 - means return EBUSY
	 *  1 - means case (a), cleanup transient link
	 * -1 - means case (b), orphaned VNIC
	 */
	if (ddp->dd_ref > 1 && ddp->dd_zid != GLOBAL_ZONEID) {
		zone_t *zp;

		if ((zp = zone_find_by_id(ddp->dd_zid)) == NULL) {
			zstatus = -1;
		} else {
			if (ddp->dd_transient) {
				zone_status_t s = zone_status_get(zp);

				if (s >= ZONE_IS_SHUTTING_DOWN)
					zstatus = 1;
			}
			zone_rele(zp);
		}
	}

	if (zstatus == 0) {
		mutex_exit(&ddp->dd_mutex);
		rw_exit(&i_dls_devnet_lock);
		return (EBUSY);
	}

	/*
	 * We want to delete the link, reset ref to 1;
	 */
	if (zstatus == -1) {
		/* Log a warning, but continue in this case */
		cmn_err(CE_WARN, "clear orphaned datalink: %s\n",
		    ddp->dd_linkname);
	}
	ddp->dd_ref = 1; /* XXX KEBE ASKS HOW MANY DROPPED REFS? */
}
Now until OS-2782, the only refs that would be reduced were ones from in-zone processes, which were getting killed off anyway, or the TWO (2) references instantiated by `zoneadmd` at zone boot time.
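To make the failure sequence concrete, here is a minimal user-space sketch of the reference counting involved. It is not the kernel code: `model_devnet_t`, `model_hold()`, `model_rele()`, `model_force_reset()`, and `model_replay()` are hypothetical names, the baseline count of 1 is an assumption inferred from the VERIFY() and the "reset ref to 1" comment, and the number of zoneadmd holds is left as a parameter. `model_rele()` returns false where the real VERIFY(ddp->dd_ref > 1) would panic.

```c
#include <stdbool.h>

/* Simplified stand-in for the dd_ref field of dls_devnet_t (hypothetical). */
typedef struct model_devnet {
	int dd_ref;
} model_devnet_t;

/* Take a hold, as an open of the link's /dev/net node is assumed to do. */
static void
model_hold(model_devnet_t *ddp)
{
	ddp->dd_ref++;
}

/*
 * Mirror of the check in dls_devnet_rele(). Returns false where the
 * kernel's VERIFY(ddp->dd_ref > 1) would fail and panic the system.
 */
static bool
model_rele(model_devnet_t *ddp)
{
	if (!(ddp->dd_ref > 1))
		return (false);
	ddp->dd_ref--;
	return (true);
}

/*
 * The zone-shutdown shortcut: force the count back to 1 regardless of
 * who still holds a reference. Returns how many holds were forgotten.
 */
static int
model_force_reset(model_devnet_t *ddp)
{
	int dropped = ddp->dd_ref - 1;

	ddp->dd_ref = 1;
	return (dropped);
}

/*
 * Replay the reproduction: zone boot holds, global-zone snoop holds,
 * forced reset at zone halt, then each snoop closes its descriptor.
 * Returns true only if every close would survive the VERIFY().
 */
static bool
model_replay(int zoneadmd_holds, int gz_snoop_holds)
{
	model_devnet_t dd = { .dd_ref = 1 };	/* assumed baseline */
	int i;

	for (i = 0; i < zoneadmd_holds; i++)
		model_hold(&dd);
	for (i = 0; i < gz_snoop_holds; i++)
		model_hold(&dd);

	(void) model_force_reset(&dd);

	for (i = 0; i < gz_snoop_holds; i++) {
		if (!model_rele(&dd))
			return (false);
	}
	return (true);
}
```

In this model, `model_replay(2, 0)` survives, while `model_replay(2, 1)` fails on the first close: the reset cannot distinguish the zoneadmd holds from the one still owned by the global-zone snoop.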
Determining the number 2 involved some DTrace of zone booting’s use of functions in dls_mgmt.c in the kernel, which is the only file that touches the dd_ref involved in the aforementioned panic. The DTrace script is attached, along with its annotated dtrace(8) output and an mdb -kw session that demonstrates how to escape a panic if you get into that situation before any fix for this bug is in place.
Dan McDonald commented on 2025-03-11T17:14:00.424-0400:
The next big question is: How do you solve this without sabotaging OS-406 and OS-2782, AND without requiring kernel mdb hacking?
A first guess might be to be quicker to return EBUSY unless the zone has reached ZONE_IS_EMPTY, which means merely changing the ZONE_IS_SHUTTING_DOWN check to ZONE_IS_EMPTY (or making the comparison > ZONE_IS_SHUTTING_DOWN). Doing that means you can guarantee the dd_ref is 2, EXCEPT when there are global zone processes using /dev/net/zone/... observability devices. The DTrace testing above was done with the ZONE_IS_EMPTY check in place (via a quick instruction hack in mdb -kw).
So IF checking for ZONE_IS_EMPTY helps and is safe (could it sabotage OS-406?), the question reduces to: Can we either kill snoop -z <zone> or safely wait for it?
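The effect of moving that threshold can be sketched in user space as follows. This is an illustration only: `model_zone_status_t` lists just a subset of zone states in the order I believe sys/zone.h defines them (worth checking against the header), the zone lookup and GLOBAL_ZONEID comparison are replaced by plain booleans, and `model_zstatus()` is a hypothetical condensation of the zstatus logic quoted earlier.

```c
#include <stdbool.h>

/*
 * Subset of zone states, in the order assumed from sys/zone.h.
 * Only the relative ordering matters for this sketch.
 */
typedef enum model_zone_status {
	MODEL_ZONE_IS_RUNNING = 0,
	MODEL_ZONE_IS_SHUTTING_DOWN,
	MODEL_ZONE_IS_EMPTY,
	MODEL_ZONE_IS_DOWN
} model_zone_status_t;

/*
 * Condensed zstatus decision with the status threshold as a parameter:
 *   0 - return EBUSY
 *   1 - case (a), clean up the transient link
 *  -1 - case (b), orphaned VNIC
 */
static int
model_zstatus(int dd_ref, bool in_global_zone, bool zone_found,
    bool transient, model_zone_status_t status,
    model_zone_status_t threshold)
{
	int zstatus = 0;

	if (dd_ref > 1 && !in_global_zone) {
		if (!zone_found)
			zstatus = -1;
		else if (transient && status >= threshold)
			zstatus = 1;
	}
	return (zstatus);
}
```

With the current threshold, a transient link in a zone that is merely SHUTTING_DOWN yields 1 and has its references reset; with the threshold at EMPTY, the same call yields 0 (EBUSY) until the zone has no processes left. Neither threshold accounts for holds owned by global-zone processes.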