| Issue Type: | Bug |
|---|---|
| Priority: | 4 - Normal |
| Status: | Resolved |
| Created at: | 2018-08-17T12:41:36.807Z |
| Updated at: | 2018-09-26T22:08:59.956Z |
| Created by: | Former user |
| Reported by: | Former user |
| Assigned to: | Former user |
Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2018-09-24T22:27:18.678Z)
2018-09-27 Buzz Aldrin (Release Date: 2018-09-27)
We had 4 different zones on 2 different CNs stuck shutting down in production. I took a look and all 4 were in the same state, so I'm only going to fully describe one. The zone's state was `empty`:

```
> ::zone
ADDR             ID  STATUS  NAME                 PATH
fffffee60bee6540 310 empty   8a500d3f-037c-eb...  /zones/8a500d3f-037c-...
```
The `zoneadmd` had one thread which was running constantly (note the accumulated CPU time), and the same thread was always onproc:

```
# ps -ef | grep 8a500d3f
...
    root 127032      1   2 02:31:19 ?     1182:37 zoneadmd -z 8a500d3f-037c-ebe9-9e5c-a554f35b2dbf

> 0t127032::pid2proc | ::ps -t
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R 127032      1 127032 127032      0 0x42000000 fffffe2801a680b0 zoneadmd
        T  0xfffffe23ff959c40 <TS_SLEEP>
        T  0xfffffe24979e9520 <TS_SLEEP>
        T  0xfffffe2813862760 <TS_SLEEP>
        T  0xfffffea69bc287e0 <TS_SLEEP>
        T  0xfffffe28ad1954c0 <TS_ONPROC>
        T  0xfffffedf2a164ae0 <TS_SLEEP>
        T  0xfffffef2b0438b00 <TS_SLEEP>
        T  0xfffffe28ad184460 <TS_SLEEP>
        T  0xfffffe27ec7b5080 <TS_SLEEP>
        T  0xfffffedf4955a100 <TS_SLEEP>
        T  0xfffffe2400f0bc40 <TS_SLEEP>
        T  0xfffffe281388b860 <TS_SLEEP>
        T  0xfffffe28ad19aba0 <TS_SLEEP>
        T  0xfffffe281c5a54a0 <TS_SLEEP>
```
The thread's stack showed nothing useful (just crediting CPU time), so I ran a little DTrace script and was able to determine that the thread was never making any syscalls but was getting a constant stream of `T_AST` traps. So the process is not actually stuck in the kernel; it is stuck in user-land and constantly trapping into the kernel. Here is the user-land state:

```
> ::stacks
THREAD   STATE    SOBJ                COUNT
4        PARKED   MX                      9
         libc.so.1`mutex_lock_impl+0x189
         libc.so.1`mutex_lock+0x13
         server+0x1bc
         libc.so.1`__door_return+0x50

5        UNPARKED <NONE>                  1
         libc.so.1`_ndoprnt+0x2fd2
         libc.so.1`snprintf+0xef
         escape_json+0x225
         logstream_write+0xcc
         do_zfd_io+0x283
         srvr+0x143
         libc.so.1`_thrp_setup+0x8a
         libc.so.1`_lwp_start

2        UNPARKED <NONE>                  1
         libc.so.1`_thrp_setup+0x8a
         libc.so.1`_lwp_start

e        UNPARKED <NONE>                  1
         libc.so.1`door_return+0xcd
         libc.so.1`door_create_func+0x2c
         libc.so.1`_thrp_setup+0x8a
         libc.so.1`_lwp_start

1        UNPARKED <NONE>                  1
         libc.so.1`poll+0x56
         do_console_io+0xba
         serve_console+0x197
         main+0x9ca
         _start_crt+0x83
         _start+0x18

3        PARKED   MX                      1
         libc.so.1`mutex_lock_impl+0x189
         libc.so.1`mutex_lock+0x13
         logstream_lock+0x22
         logstream_atfork_prepare+9
         libc.so.1`_prefork_handler+0x3b
         libc.so.1`forkx+0x295
         libc.so.1`fork+0xb
         do_subproc+0x7b
         brand_prestatechg+0xac
         zone_halt+0x110
         server+0x8de
         libc.so.1`__door_return+0x50
```
Thread 5 is the thread which keeps trapping into the kernel. This thread is holding the mutex which thread 3 needs to take before it can run the brand's halt hook.
```
> 3::findstack -v
stack pointer for thread 3: fffffbffee6ec930
[ fffffbffee6ec930 libc.so.1`__lwp_park+0x17() ]
  fffffbffee6ec970 libc.so.1`mutex_lock_impl+0x189(4396c0, 0)
  fffffbffee6ec990 libc.so.1`mutex_lock+0x13(4396c0)
  fffffbffee6ec9a0 logstream_lock+0x22()
  fffffbffee6ec9b0 logstream_atfork_prepare+9()
  fffffbffee6ec9d0 libc.so.1`_prefork_handler+0x3b()
  fffffbffee6eca20 libc.so.1`forkx+0x295(0)
  fffffbffee6eca30 libc.so.1`fork+0xb()
  fffffbffee6eceb0 do_subproc+0x7b(fffffbffee6ee7d0, fffffbffee6ecee0, 0, 1)
  fffffbffee6ed720 brand_prestatechg+0xac(fffffbffee6ee7d0, 5, 4, 1)
  fffffbffee6ed790 zone_halt+0x110(fffffbffee6ee7d0, 0, 0, 5, 1)
  fffffbffee6ee890 server+0x8de(0, fffffbffee6ee8e0, 518, 0, 0)
  0000000000000000 libc.so.1`__door_return+0x50()
> 4396c0::print mutex_t
{
    flags = {
        flag1 = 0x4
        flag2 = 0
        ceiling = 0
        mbcp_type_un = {
            bcptype = 0x2
            mtype_rcount = {
                count_type1 = 0x2
                count_type2 = 0
            }
        }
        magic = 0x4d58
    }
    lock = {
        lock64 = {
            pad = [ 0, 0, 0, 0, 0, 0, 0x1, 0x1 ]
        }
        lock32 = {
            ownerpid = 0
            lockword = 0x1010000
        }
        owner64 = 0x101000000000000
    }
    data = 0xfffffbffef072240
}
> fffffbffef072240::print ulwp_t ul_lwpid
ul_lwpid = 0x5
```
I confirmed that all of the other stuck zones were in this same state, with a thread blocked in the `escape_json` call path while holding the mutex and thus preventing shutdown. I was able to `kill -9` the `zoneadmd`, then re-issue the `halt` command (which starts up a new `zoneadmd`) to get the zones halted.
It turns out there are more than 4 zones stuck in this state. In case it is helpful, I grabbed a gcore of one of the `zoneadmd` processes that was still stuck and uploaded it to thoth, ID fc098dfb209498d3067df44ea8162c4e.
Here is the code around where the thread is blocked (``libc.so.1`_ndoprnt+0x2fd2``):

```
libc.so.1`_ndoprnt+0x2fb4:      jmp    -0x2ab9  <libc.so.1`_ndoprnt+0x500>
libc.so.1`_ndoprnt+0x2fb9:      movq   0xffffffffffffefb0(%rbp),%rcx
libc.so.1`_ndoprnt+0x2fc0:      movq   0xffffffffffffefe8(%rbp),%rdx
libc.so.1`_ndoprnt+0x2fc7:      movq   %r15,%rsi
libc.so.1`_ndoprnt+0x2fca:      movq   %rbx,%rdi
libc.so.1`_ndoprnt+0x2fcd:      call   -0x30c2  <libc.so.1`_dowrite>
libc.so.1`_ndoprnt+0x2fd2:      testq  %rax,%rax
libc.so.1`_ndoprnt+0x2fd5:      jne    -0x2cd3  <libc.so.1`_ndoprnt+0x308>
libc.so.1`_ndoprnt+0x2fdb:      jmp    -0x2da0  <libc.so.1`_ndoprnt+0x240>
libc.so.1`_ndoprnt+0x2fe0:      movq   0xfffffffffffff010(%rbp),%rdi
libc.so.1`_ndoprnt+0x2fe7:      incq   %rdi
libc.so.1`_ndoprnt+0x2fea:      call   +0x17d1  <libc.so.1`insert_decimal_point>
```
I was able to reproduce this by reconstructing the call to `escape_json()` from the core file and using it as input to a test program that uses `escape_json()` copied from `log.c`. A simplified version that does not include customer data is:

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <sys/types.h>

static void
escape_json(const char *sbuf, int slen, char *dbuf, int dlen,
    int *scntp, int *dcntp, boolean_t *flushp)
{
	/* copy function body from usr/src/cmd/zoneadmd/log.c */
}

int
main(int argc, char **argv)
{
	int scnt = 0, dcnt = 0;
	const char *sbuf = "\001\002\003\004\005\006";
	char dbuf[12];
	int slen;

	for (slen = strlen(sbuf); slen > 0; slen -= scnt, sbuf += scnt) {
		escape_json(sbuf, slen, dbuf, sizeof (dbuf), &scnt, &dcnt,
		    NULL);
		(void) printf("<%0.*s>\n", dcnt, dbuf);
	}
}
```
This program will get stuck in the `while` loop in `escape_json()`. The culprit is this:

```
584		while (i < (dlen - 1) && sbuf <= sbuf_end) {
...
619			default:
620				if (c >= 0x20 && c < 0x7f) {
621					append_buf[0] = c;
622					append_buf[1] = '\0';
623				} else {
624					len = snprintf(append_buf, sizeof (append_buf),
625					    "\\u%04x", (int)(0xff & c));
626					assert(len < sizeof (append_buf));
627				}
628				append = append_buf;
629				break;
630			}
631
632			len = strlcpy(&dbuf[i], append, dlen - i);
633			if (len >= dlen - i) {
634				if (flushp != NULL) {
635					*flushp = B_TRUE;
636				}
637			} else {
638				sbuf++;
639				i += len;
640			}
641			if (flushp != NULL && *flushp) {
642				break;
643			}
644		}
```
The code in lines 624 - 626 is executed, putting "\u0001" (a literal backslash followed by "u0001" - jira may be mucking with this) into `append_buf` and setting `len` to 6. The next time around, "\u0002" is put into `append_buf`. Now, at line 633, we determine that `dbuf` is too full to accommodate the content of `append_buf`, so we try to set the flush flag. But `flushp` is NULL, so `*flushp` doesn't get set, and because of this the `break` at line 642 is not executed. The condition in the `while` statement remains true, so we try again with the same state that already failed.
To fix this, a `break` statement needs to be inserted above line 637.
This test program can be used to reproduce the problem (note that `pre` must be at least 7 bytes long, since up to 7 bytes of it are written per iteration):

```c
#include <unistd.h>

int
main(int argc, char **argv)
{
	char buf[2048];
	int i;
	char *pre = "\n      ";

	for (i = 0; i < sizeof (buf); i++) {
		buf[i] = i % 8;
	}
	for (i = 1; i < 8; i++) {
		write(1, pre, i);
		write(1, buf, sizeof (buf));
		write(1, "\n", 1);
	}
}
```
In the global zone:
```
[root@buglets ~]# ./t2 > /dev/zfd/75b26a21-5276-6ec1-c7f2-a3e10d89fd3b/slave/0
[hangs]
```
With the fix described in the previous comment, the test program no longer hangs and the expected output is found in the zone's `stdio.log`.
This problem cannot happen on the console - it requires that `logstream_open()` be called without the `LS_LINE_BUFFERED` flag. Indeed, while trying to reproduce this bug with `./t2 > /dev/zconsole`, no zoneadmd problems were detected because `flushp` was non-NULL.
The log files other than `console.log` (`stdio.log` and `platform.log`) do not have this flag set. The test above reproduces the problem by writing to a log fd whose output will find its way into `stdio.log`.
illumos-joyent commit 7e59066e705035a23aa219d7693f61d200ae378c (branch master, by Mike Gerdts)
OS-7144 zones stuck shutting down due to zio logging
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Approved by: Jerry Jelinek <jerry.jelinek@joyent.com>