OS-7144: zones stuck shutting down due to zio logging

Details

Issue Type: Bug
Priority: 4 - Normal
Status: Resolved
Created at: 2018-08-17T12:41:36.807Z
Updated at: 2018-09-26T22:08:59.956Z

People

Created by: Former user
Reported by: Former user
Assigned to: Former user

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2018-09-24T22:27:18.678Z)

Fix Versions

2018-09-27 Buzz Aldrin (Release Date: 2018-09-27)

Description

We had 4 different zones on 2 different CNs stuck shutting down in production. I took a look and all 4 were in the same state, so I'm only going to fully describe one. The zone's status was empty.

> ::zone
            ADDR   ID STATUS        NAME                PATH
fffffee60bee6540  310 empty         8a500d3f-037c-eb... /zones/8a500d3f-037c-...

The zoneadmd process had one thread that was running constantly (note the accumulated CPU time), and the same thread was always onproc.

# ps -ef | grep 8a500d3f
...
    root 127032      1   2 02:31:19 ?        1182:37 zoneadmd -z 8a500d3f-037c-ebe9-9e5c-a554f35b2dbf

> 0t127032::pid2proc | ::ps -t
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R 127032      1 127032 127032      0 0x42000000 fffffe2801a680b0 zoneadmd
        T  0xfffffe23ff959c40 <TS_SLEEP>
        T  0xfffffe24979e9520 <TS_SLEEP>
        T  0xfffffe2813862760 <TS_SLEEP>
        T  0xfffffea69bc287e0 <TS_SLEEP>
        T  0xfffffe28ad1954c0 <TS_ONPROC>
        T  0xfffffedf2a164ae0 <TS_SLEEP>
        T  0xfffffef2b0438b00 <TS_SLEEP>
        T  0xfffffe28ad184460 <TS_SLEEP>
        T  0xfffffe27ec7b5080 <TS_SLEEP>
        T  0xfffffedf4955a100 <TS_SLEEP>
        T  0xfffffe2400f0bc40 <TS_SLEEP>
        T  0xfffffe281388b860 <TS_SLEEP>
        T  0xfffffe28ad19aba0 <TS_SLEEP>
        T  0xfffffe281c5a54a0 <TS_SLEEP>

The thread's kernel stack showed nothing useful (just crediting CPU time), so I ran a little DTrace script and determined that the thread was never making any syscalls but was taking a constant stream of T_AST traps. The process is not actually stuck in the kernel; it is stuck in user-land and constantly trapping into the kernel. Here is the user-land state:

> ::stacks
THREAD           STATE    SOBJ                COUNT
4                PARKED   MX                      9
                 libc.so.1`mutex_lock_impl+0x189
                 libc.so.1`mutex_lock+0x13
                 server+0x1bc
                 libc.so.1`__door_return+0x50

5                UNPARKED <NONE>                  1
                 libc.so.1`_ndoprnt+0x2fd2
                 libc.so.1`snprintf+0xef
                 escape_json+0x225
                 logstream_write+0xcc
                 do_zfd_io+0x283
                 srvr+0x143
                 libc.so.1`_thrp_setup+0x8a
                 libc.so.1`_lwp_start

2                UNPARKED <NONE>                  1
                 libc.so.1`_thrp_setup+0x8a
                 libc.so.1`_lwp_start

e                UNPARKED <NONE>                  1
                 libc.so.1`door_return+0xcd
                 libc.so.1`door_create_func+0x2c
                 libc.so.1`_thrp_setup+0x8a
                 libc.so.1`_lwp_start

1                UNPARKED <NONE>                  1
                 libc.so.1`poll+0x56
                 do_console_io+0xba
                 serve_console+0x197
                 main+0x9ca
                 _start_crt+0x83
                 _start+0x18

3                PARKED   MX                      1
                 libc.so.1`mutex_lock_impl+0x189
                 libc.so.1`mutex_lock+0x13
                 logstream_lock+0x22
                 logstream_atfork_prepare+9
                 libc.so.1`_prefork_handler+0x3b
                 libc.so.1`forkx+0x295
                 libc.so.1`fork+0xb
                 do_subproc+0x7b
                 brand_prestatechg+0xac
                 zone_halt+0x110
                 server+0x8de
                 libc.so.1`__door_return+0x50

Thread 5 is the thread that keeps trapping into the kernel, and it is holding the mutex that thread 3 needs to acquire (in the fork prepare handler) before it can run the brand's halt hook.

> 3::findstack -v
stack pointer for thread 3: fffffbffee6ec930
[ fffffbffee6ec930 libc.so.1`__lwp_park+0x17() ]
  fffffbffee6ec970 libc.so.1`mutex_lock_impl+0x189(4396c0, 0)
  fffffbffee6ec990 libc.so.1`mutex_lock+0x13(4396c0)
  fffffbffee6ec9a0 logstream_lock+0x22()
  fffffbffee6ec9b0 logstream_atfork_prepare+9()
  fffffbffee6ec9d0 libc.so.1`_prefork_handler+0x3b()
  fffffbffee6eca20 libc.so.1`forkx+0x295(0)
  fffffbffee6eca30 libc.so.1`fork+0xb()
  fffffbffee6eceb0 do_subproc+0x7b(fffffbffee6ee7d0, fffffbffee6ecee0, 0, 1)
  fffffbffee6ed720 brand_prestatechg+0xac(fffffbffee6ee7d0, 5, 4, 1)
  fffffbffee6ed790 zone_halt+0x110(fffffbffee6ee7d0, 0, 0, 5, 1)
  fffffbffee6ee890 server+0x8de(0, fffffbffee6ee8e0, 518, 0, 0)
  0000000000000000 libc.so.1`__door_return+0x50()
> 4396c0::print mutex_t
{
    flags = {
        flag1 = 0x4
        flag2 = 0
        ceiling = 0
        mbcp_type_un = {
            bcptype = 0x2
            mtype_rcount = {
                count_type1 = 0x2
                count_type2 = 0
            }
        }
        magic = 0x4d58
    }
    lock = {
        lock64 = {
            pad = [ 0, 0, 0, 0, 0, 0, 0x1, 0x1 ]
        }
        lock32 = {
            ownerpid = 0
            lockword = 0x1010000
        }
        owner64 = 0x101000000000000
    }
    data = 0xfffffbffef072240
}
> fffffbffef072240::print ulwp_t ul_lwpid
ul_lwpid = 0x5

I confirmed that all of the other stuck zones were in this same state: a thread spinning in the escape_json() code path while holding the mutex and thus preventing shutdown. I was able to kill -9 the zoneadmd, then re-issue the halt command (which starts up a new zoneadmd) to get the zones halted.

Comments

Comment by Former user
Created at 2018-08-17T15:40:39.369Z
Updated at 2018-08-17T15:45:20.578Z

It turns out there are more than 4 zones stuck in this state. In case it is helpful, I grabbed a gcore of one of the zoneadmd processes that was still stuck and uploaded it to thoth. ID fc098dfb209498d3067df44ea8162c4e.

Here is the disassembly around the address where the thread is executing (libc.so.1`_ndoprnt+0x2fd2).

libc.so.1`_ndoprnt+0x2fb4:      jmp    -0x2ab9  <libc.so.1`_ndoprnt+0x500>
libc.so.1`_ndoprnt+0x2fb9:      movq   0xffffffffffffefb0(%rbp),%rcx
libc.so.1`_ndoprnt+0x2fc0:      movq   0xffffffffffffefe8(%rbp),%rdx
libc.so.1`_ndoprnt+0x2fc7:      movq   %r15,%rsi
libc.so.1`_ndoprnt+0x2fca:      movq   %rbx,%rdi
libc.so.1`_ndoprnt+0x2fcd:      call   -0x30c2  <libc.so.1`_dowrite>
libc.so.1`_ndoprnt+0x2fd2:      testq  %rax,%rax
libc.so.1`_ndoprnt+0x2fd5:      jne    -0x2cd3  <libc.so.1`_ndoprnt+0x308>
libc.so.1`_ndoprnt+0x2fdb:      jmp    -0x2da0  <libc.so.1`_ndoprnt+0x240>
libc.so.1`_ndoprnt+0x2fe0:      movq   0xfffffffffffff010(%rbp),%rdi
libc.so.1`_ndoprnt+0x2fe7:      incq   %rdi
libc.so.1`_ndoprnt+0x2fea:      call   +0x17d1  <libc.so.1`insert_decimal_point>

Comment by Former user
Created at 2018-09-24T19:26:55.917Z
Updated at 2018-09-24T19:29:32.876Z

I was able to reproduce this by reconstructing the arguments to escape_json() from the core file and feeding them to a test program built around escape_json() copied from log.c. A simplified version that does not include customer data is:

#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <sys/types.h>

static void
escape_json(const char *sbuf, int slen, char *dbuf, int dlen, int *scntp,
    int *dcntp, boolean_t *flushp)
{
    /* copy function body from usr/src/cmd/zoneadmd/log.c */
}

int
main(int argc, char **argv)
{
        int scnt = 0, dcnt = 0;
        const char *sbuf = "\001\002\003\004\005\006";
        char dbuf[12];
        int slen;

        for (slen = strlen(sbuf); slen > 0; slen -= scnt, sbuf += scnt) {
                escape_json(sbuf, slen, dbuf, sizeof (dbuf), &scnt, &dcnt,
                    NULL);
                (void) printf("<%0.*s>\n", dcnt, dbuf);
        }
}

This program will get stuck in the while loop in escape_json(). The culprit is this:

584         while (i < (dlen - 1) && sbuf <= sbuf_end) {
    ...
619                 default:
620                         if (c >= 0x20 && c < 0x7f) {
621                                 append_buf[0] = c;
622                                 append_buf[1] = '\0';
623                         } else {
624                                 len = snprintf(append_buf, sizeof (append_buf),
625                                     "\\u%04x", (int)(0xff & c));
626                                 assert(len < sizeof (append_buf));
627                         }
628                         append = append_buf;
629                         break;
630                 }
631
632                 len = strlcpy(&dbuf[i], append, dlen - i);
633                 if (len >= dlen - i) {
634                         if (flushp != NULL) {
635                                 *flushp = B_TRUE;
636                         }
637                 } else {
638                         sbuf++;
639                         i += len;
640                 }
641                 if (flushp != NULL && *flushp) {
642                         break;
643                 }
644         }

The code at lines 624 - 626 is executed, putting "\u0001" (a literal backslash followed by "u0001" - jira may be mucking with this) into append_buf and setting len to 6. The next time around, "\u0002" is put into append_buf. Now, at line 633, we determine that dbuf is too full to accommodate the contents of append_buf and we try to set the flush flag. But flushp is NULL, so *flushp never gets set, and therefore the break at line 642 is not executed. The condition in the while statement remains true, so we retry with exactly the same state that just failed, forever.

To fix this, a break statement needs to be inserted above line 637 so the loop terminates when the destination buffer is full, whether or not there is a flush flag to set.


Comment by Former user
Created at 2018-09-24T20:23:05.268Z

This test program can be used to reproduce.

#include <unistd.h>

int
main(int argc, char **argv)
{
        char buf[2048];
        int i;
        char *pre = "\n               ";

        for (i = 0; i < sizeof (buf); i++) {
                buf[i] = i % 8;
        }
        for (i = 1; i < 8; i++) {
                write(1, pre, i);
                write(1, buf, sizeof (buf));
                write(1, "\n", 1);
        }
}

In the global zone:

[root@buglets ~]# ./t2 > /dev/zfd/75b26a21-5276-6ec1-c7f2-a3e10d89fd3b/slave/0
[hangs]

With the fix described in the previous comment, the test program no longer hangs and the expected output is found in the zone's stdio.log.


Comment by Former user
Created at 2018-09-24T22:22:01.826Z

This problem cannot happen on the console: it requires that logstream_open() be called without the LS_LINE_BUFFERED flag. Indeed, when trying to reproduce this bug with ./t2 >/dev/zconsole, no zoneadmd problems occurred because flushp was non-NULL.

The log files other than console.log (stdio.log and platform.log) do not have this flag set. The test above reproduces the problem by writing to a log fd whose output ends up in stdio.log.


Comment by Jira Bot
Created at 2018-09-24T22:24:24.455Z

illumos-joyent commit 7e59066e705035a23aa219d7693f61d200ae378c (branch master, by Mike Gerdts)

OS-7144 zones stuck shutting down due to zio logging
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Approved by: Jerry Jelinek <jerry.jelinek@joyent.com>


Comment by Former user
Created at 2018-09-26T22:08:59.956Z

grabbed cores of each of the stuck processes for further review. See SUP-684 for files.