OS-6435: cred reference count leak leads to zone livelock

Details

Issue Type: Bug
Priority: 2 - Critical
Status: Resolved
Created at: 2017-11-01T18:39:38.000Z
Updated at: 2023-06-09T14:25:49.589Z

People

Created by: Former user
Reported by: Former user
Assigned to: Former user

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2017-11-07T21:55:59.248Z)

Fix Versions

2017-11-09 Edge (Release Date: 2017-11-09)

Related Issues

Description

We encountered a system where a zone failed to be destroyed. According
to CNAPI, this was due to a task timeout. If we look at the system in
question, there are some interesting things about zoneadm:

[root@RA515435 (us-sw-1) /var/adm]# ptree $(pgrep -x zoneadm)
4047  /usr/bin/ctrun -l child -o noorphan /usr/vm/sbin/vmadmd
  4048  /usr/node/bin/node --abort_on_uncaught_exception /usr/vm/sbin/vmadmd
    12224 /usr/sbin/zoneadm -u 8d5bc9a7-838c-4b7a-c7da-941ec4a23f4d boot -X
15658 /usr/node/bin/node --abort_on_uncaught_exception /usr/sbin/vmadm start 8d5bc9a7
  15743 /usr/sbin/zoneadm -u 8d5bc9a7-838c-4b7a-c7da-941ec4a23f4d boot -X
16619 /usr/node/bin/node --abort_on_uncaught_exception /usr/sbin/vmadm start 8d5bc9a7
  16699 /usr/sbin/zoneadm -u 8d5bc9a7-838c-4b7a-c7da-941ec4a23f4d boot -X
23595 /usr/node/bin/node --abort_on_uncaught_exception /usr/sbin/vmadm delete 8d5bc9a
  23603 /usr/sbin/zoneadm -u 8d5bc9a7-838c-4b7a-c7da-941ec4a23f4d halt -X
27286 /usr/node/bin/node --abort_on_uncaught_exception /usr/sbin/vmadm delete 8d5bc9a
  27340 /usr/sbin/zoneadm -u 8d5bc9a7-838c-4b7a-c7da-941ec4a23f4d halt -X
28260 /usr/node/bin/node --abort_on_uncaught_exception /usr/sbin/vmadm delete 8d5bc9a
  28265 /usr/sbin/zoneadm -u 8d5bc9a7-838c-4b7a-c7da-941ec4a23f4d halt -X
42018 /usr/node/bin/node --abort_on_uncaught_exception /usr/sbin/vmadm delete 8d5bc9a
  42023 /usr/sbin/zoneadm -u 8d5bc9a7-838c-4b7a-c7da-941ec4a23f4d halt -X
98978 /usr/node/bin/node --abort_on_uncaught_exception /usr/sbin/vmadm delete 8d5bc9a
  98996 /usr/sbin/zoneadm -u 8d5bc9a7-838c-4b7a-c7da-941ec4a23f4d halt -X
30042 /usr/node/bin/node --abort_on_uncaught_exception /usr/sbin/vmadm start f8abe08e
  30128 /usr/sbin/zoneadm -u f8abe08e-d4c0-ccf2-96f3-c0908995be72 boot -X

Note how we have a combination of zone boots and halts. Let's take 23603,
for example. It's trying to halt the zone that 12224 is trying to boot.
The zoneadmd for this zone is 12226. Let's see what that zoneadmd is
actually doing:

> 0t12226::pid2proc | ::walk thread | ::stacks
THREAD           STATE    SOBJ                COUNT
fffffea39c80c820 SLEEP    CV                      9
                 swtch+0x141
                 cv_wait_sig_swap_core+0x1b9
                 cv_wait_sig_swap+0x17
                 cv_waituntil_sig+0xbd
                 lwp_park+0x15e
                 syslwp_park+0x63

fffffeebdc7c8800 SLEEP    CV                      1
                 swtch+0x141
                 cv_wait+0x70
                 vmem_nextfit_alloc+0x126
                 vmem_alloc+0x19e
                 id_alloc+0x1b
                 zone_create+0x102
                 zone+0x1d4

ffffff0a506e3c20 SLEEP    CV                      1
                 swtch+0x141
                 cv_wait_sig+0x185
                 door_unref+0x94
                 doorfs+0xe5

ffffff8bb82d7040 SLEEP    SHUTTLE                 1
                 swtch+0x141
                 shuttle_swtch+0x203
                 door_return+0x214
                 doorfs+0x16e

So, interestingly, it's blocked in zone_create() trying to allocate an
ID. If we look at the code, the first things to check are how many zones
there are and whether any IDs are still being used by netstacks.

> ::walk zone ! wc -l
      13
> *netstack_head::list netstack_t netstack_next ! wc -l
      13

So we have equal numbers of zones and netstacks, which means we're not
hitting the async netstack reference case. So the question is what else
is holding on to the zone IDs. While reading through the zone_create()
code, I saw an interesting thing mentioned: in some cases the zone is
freed by a final cred reference being freed. With that in mind, I decided
to walk the cred cache and group the zones that it references.

> ::walk cred_cache | ::print cred_t cr_zone ! sort | uniq | wc -l
   10000

Well, that's suspicious. I next put together a list of all of those
zones by running the following command, and then did some additional
analysis:

> ::walk cred_cache | ::printf "%p\n" cred_t cr_zone ! sort | uniq > /var/tmp/rm/zonelist
> ::cat /var/tmp/rm/zonelist | ::printf "%s\n" zone_t zone_name ! sort | uniq -c
mdb: failed to read pointer at 0: no mapping for address
mdb: failed to print member 'zone_name'
   1 00c5db23-bb1c-4a50-948c-362f165230dc
   1 05aa165f-9bc7-ca30-f2f3-d4e36cce97a8
   1 180bcac4-f73a-484b-ad43-a78d1476b290
   1 1b3f849d-fab4-c77b-cbf9-9555851c5377
   1 20f8c763-af35-c8d2-fb0a-e235254ae52d
   1 2a97f13a-7510-e87a-9b32-adcf9c18121a
   1 40526b4c-c796-6abd-9f52-9717a83069f0
   1 6ac565b0-75b5-cf29-e0a7-c4b747ddb2ce
   1 71a339c8-072c-4b37-8a27-bcd15e80694c
 110 8015613e-7656-651f-ace0-a6332dc5aa6d
   1 8807738d-fb0b-4aa9-969f-b79d1ad62de0
7486 8d5bc9a7-838c-4b7a-c7da-941ec4a23f4d
   1 96c824b5-0f51-47a1-96f4-d7f1ca030451
   1 d80db6ce-ab7a-465e-be9b-83413b871147
   1 dd091608-430f-c4ef-ce27-fd9dbd0be456
   1 ddf6eb2b-eb6a-6665-813c-bcf5706aa666
   1 e73c1dac-c1f8-e25c-de4f-aa6f4f7b8e24
   1 f1fa76d5-701c-ef6f-97dd-9ab059022c75
2384 f2c24edb-2a05-c6e0-b905-9b6ccaae5513
   1 f338d368-f226-eded-d2d4-c77417c2c923
   1 f9dd9f41-a675-ef76-ff0e-a5684bcae121
   1 global
> ::cat /var/tmp/rm/zonelist | ::print zone_t zone_brand ! sort | uniq -c
mdb: failed to read zone_brand pointer at 230: no mapping for address
9987 zone_brand = lx_brand
  12 zone_brand = native_brand

Now that's rather suspicious. So we had a few zones that ended up
leaking all of their connections. I also assembled a list of all the
cred_t structures that matched the three zones with the large counts
above (8015613e, 8d5bc9a7, and f2c24edb). With that in hand, I started
looking at the data in the cred_t. They all had the same basic ID set.
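
The per-name tallies above come from piping mdb output through
`sort | uniq -c`. The same grouping can be done offline on a saved list
of zone names. What follows is only a minimal sketch: the function names
and the sample data are hypothetical and are not taken from the system
in question. It assumes one zone_name string per distinct zone_t
pointer, so any name that appears more than once is backed by more than
one zone_t.

```python
from collections import Counter


def tally_zone_names(names):
    """Count how many distinct zone_t structures share each zone name.

    `names` holds one zone_name string per distinct zone_t pointer,
    which makes this the equivalent of `sort | uniq -c`.
    """
    counts = Counter(n.strip() for n in names if n.strip())
    # Most frequent first; ties broken by name for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def suspects(tallies, threshold=2):
    """Return the names backed by `threshold` or more zone_t structures."""
    return [name for name, count in tallies if count >= threshold]


# Illustrative input only: a scaled-down stand-in for the real list.
sample = ["global", "zone-a", "zone-b", "zone-a", "zone-a", "zone-b", "zone-c"]
tallies = tally_zone_names(sample)
leaky = suspects(tallies)
```

On the real data, the three names with counts of 110, 2384, and 7486
would be the ones returned by `suspects()`.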

Based on other work that folks did, we were able to find that these
zones had ended up restarting quite a lot. Based on this, there seems to
be something related to zones coming and going that's causing us to
leak a cred_t without detecting it.
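
The suspected failure mode can be illustrated with a toy model. This is
only a sketch under stated assumptions, not the kernel code: every class
and function name below is hypothetical, the ID space is deliberately
tiny, and the model raises an exception at the point where the real
id_alloc() would sleep. It assumes that a zone structure, and the ID it
holds, is only released when the last cred reference is dropped, as the
zone_create() comments suggest.

```python
class IdSpaceExhausted(Exception):
    """Raised where the real allocator would sleep waiting for a free ID."""


class Zone:
    def __init__(self, name, zone_id, on_free):
        self.name = name
        self.zone_id = zone_id
        self.cred_refs = 0
        self._on_free = on_free

    def cred_hold(self):
        self.cred_refs += 1

    def cred_rele(self):
        self.cred_refs -= 1
        if self.cred_refs == 0:
            # Last cred reference dropped: free the zone and its ID.
            self._on_free(self)


class ZoneTable:
    def __init__(self, max_ids):
        self.free_ids = list(range(max_ids))
        self.live = []  # zone structures that have not been freed yet

    def create(self, name):
        if not self.free_ids:
            raise IdSpaceExhausted(name)
        zone = Zone(name, self.free_ids.pop(0), self._free)
        self.live.append(zone)
        return zone

    def _free(self, zone):
        self.live.remove(zone)
        self.free_ids.append(zone.zone_id)


def boot_and_halt(table, name, leak):
    """One zone restart; with `leak` set, one cred hold is never released."""
    zone = table.create(name)
    zone.cred_hold()  # reference that is released correctly at halt
    zone.cred_hold()  # second reference; leaked when `leak` is true
    zone.cred_rele()
    if not leak:
        zone.cred_rele()


def cycles_until_stuck(max_ids, leak, limit=100):
    """Completed restarts before create() can no longer get an ID."""
    table = ZoneTable(max_ids)
    for n in range(limit):
        try:
            boot_and_halt(table, "zone-a", leak)
        except IdSpaceExhausted:
            return n
    return None  # never got stuck within `limit` restarts
```

With the leak enabled, every restart strands one zone structure and one
ID, so the number of restarts before creation stalls equals the size of
the ID space. Without the leak the IDs are recycled indefinitely. This
matches the observation that the zones holding the most stale entries
were the ones that restarted the most.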

Comments

Comment by Former user
Created at 2017-11-07T16:58:30.728Z

The leak is 100% reproducible for both lx and native zones. I tested the fix on my SmartOS VM. I rebooted both lx and native zones about 10 times each, then shut them down. At the end, only the global zone is referenced by creds in the cred_cache (as it should be).


Comment by Jira Bot
Created at 2017-11-07T21:43:32.683Z

illumos-joyent commit 7354012d871a98cfeba6ab962af30b16d0455e5f (branch master, by Jerry Jelinek)

OS-6435 cred reference count leak leads to zone livelock
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Approved by: Patrick Mooney <patrick.mooney@joyent.com>