OS-5637: lxbrand vdso busts golang stack budget


Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2016-09-26T18:09:13.000Z)

Fix Versions

2016-09-29 Ziggy (Release Date: 2016-09-29)

Related Issues


With your Docker environment vars set so that it points to a Triton Cloud data center:

docker run --label com.joyent.package=g4-general-4G -it holzi/quine-relay

This is the result on us-sw-1 and us-east-3b:

##  58: Go -> G-Portugol  ##

go run QR.go > QR.gpt
Segmentation fault (core dumped)
Makefile:513: recipe for target 'QR.gpt' failed
make: *** [QR.gpt] Error 139
make: *** Deleting file 'QR.gpt'


Comment by Casey Bisson
Created at 2016-08-29T15:23:31.000Z
On a subsequent run I got the following additional output:

##  58: Go -> G-Portugol  ##

go run QR.go > QR.gpt
unexpected fault address 0x7ffffe81dff8
fatal error: fault
[signal 0xb code=0x1 addr=0x7ffffe81dff8 pc=0x7fffff44d924]

goroutine 1 [running]:
        /usr/lib/go/src/pkg/runtime/panic.c:464 +0x69 fp=0x7ffffe837f88
        /usr/lib/go/src/pkg/runtime/os_linux.c:237 +0xe9 fp=0x7ffffe837fa0
----- stack segment boundary -----
runtime: unknown pc 0x7fffff44d924 after stack split
panic: invalid memory address or nil pointer dereference
panic during panic
Makefile:513: recipe for target 'QR.gpt' failed
make: *** [QR.gpt] Error 3
make: *** Deleting file 'QR.gpt'

Comment by Jerry Jelinek
Created at 2016-08-30T18:40:19.000Z
Updated at 2016-08-30T21:38:59.000Z
I wanted to see if I could focus on the go failure without having to build and run the whole quine relay in a zone on my laptop. Casey provided the QR.go program from his zone, which I have attached. However, this repeatedly builds just fine in my local Ubuntu 16.04 zone.
% go run QR.go > QR.gpt
% ls -l QR*
-rw-r--r-- 1 jerry jerry 24631 Aug 30 18:07 QR.go
-rw-rw-r-- 1 jerry jerry 24294 Aug 30 18:42 QR.gpt
FYI, this is the version of golang I have installed:
% go version
go version go1.6.2 linux/amd64

Comment by Jerry Jelinek
Created at 2016-08-30T22:18:27.000Z
As another observation, clearly Go works in the general case since containerpilot is written in Go and we run that a lot. Since the QR.go file is not inherently problematic, there must be something more obscure going on here, perhaps related to the docker image being used.

Comment by Casey Bisson
Created at 2016-08-30T22:22:07.000Z
In this image:

# go version 
go version go1.2.1 linux/amd64

The Dockerfile specifies Ubuntu 14.10: https://github.com/h0lzi/quine-relay/blob/master/Dockerfile

The image itself was built "10 months ago," https://hub.docker.com/r/holzi/quine-relay/tags/, it's using whatever was current then.

I'm calling this a regression because this same image used to work without problems.

Comment by Casey Bisson
Created at 2016-08-30T22:24:49.000Z
This can be run/tested interactively in the image:

$ docker run -it holzi/quine-relay bash
# cd /usr/local/share/quine-relay
# make CC=tcc

Comment by Jerry Jelinek
Created at 2016-08-30T22:49:16.000Z
I was able to cause a standalone test segfault on my Ubuntu 14.04 zone:
% go version
go version go1.2.1 linux/amd64
% go run QR.go >QR.gpt
Segmentation fault (core dumped)

Comment by Jerry Jelinek
Created at 2016-08-31T12:49:40.000Z
Updated at 2016-08-31T20:33:58.000Z
This always fails for me in the same way, however, when I have my lxsys DTrace enabled then it appears to slow down the process enough that it has time to install its own signal handler and not dump core. For the un-traced process here is what the core dump shows:
status: process terminated by SIGSEGV (Segmentation Fault), addr=7ffffe81dfe8
For the traced process, there is no core dump but here is the error msg:
unexpected fault address 0x7ffffe81dfe8
fatal error: fault
[signal 0xb code=0x1 addr=0x7ffffe81dfe8 pc=0x7fffff44d774]

goroutine 1 [running]:
        /usr/lib/go/src/pkg/runtime/panic.c:464 +0x69 fp=0x7ffffe837f88
        /usr/lib/go/src/pkg/runtime/os_linux.c:237 +0xe9 fp=0x7ffffe837fa0
----- stack segment boundary -----
runtime: unknown pc 0x7fffff44d774 after stack split
panic: invalid memory address or nil pointer dereference
panic during panic
Note that the address (7ffffe81dfe8) is always the same, so we know this is a deterministic problem in the code.

Comment by Jerry Jelinek
Created at 2016-08-31T20:31:22.000Z
Updated at 2016-09-01T11:50:06.000Z
I did more tracing and debugging on this. I am able to sometimes get an lxsyscall trace and a core dump on the same run. Looking at the mmap syscalls and a pmap from the core, it looks like this is the old Go issue where it it simply grows off the top of its (small) stack. I captured the user-level R_PC and R_SP values, and a ustack, when the process sends the segv:
golang-go 109646 1 send signal: 11 to: golang-go pid 109646, tid 1
send signal regs: 7fffff44d7b6 7ffffe81dff0

We can see the stack pointer is 0x7ffffe81dff0. I also disassembled the code at 0x7fffff44d7b6:
0x7fffff44d7b6:                 call   -0x11b   <0x7fffff44d6a0>
Here is the relevant mmap that setup the thread stack:
golang-go 109646 1 <- mmap(0x0 131072 3 34 -1 0x0)=0x7ffffe81e000 end 0x7ffffe83e000
Here is the relevant part of the pmap with my annotations:
00007FFFFE7FF000          8K rw---    [ anon ]
                                                <----- 7ffffe81dfe8 segv address
                                                <----- 7ffffe81dff0 R_SP
00007FFFFE81E000          8K rw---    [ anon ]  <----- base of mapping
00007FFFFE837000          4K rw---    [ anon ]
00007FFFFE839000          4K rw---    [ anon ]
00007FFFFE83B000          4K rw---    [ anon ]
00007FFFFE83D000          4K rw---    [ anon ]
                                                <---- end of mapping 7ffffe83e000
00007FFFFE83F000          4K rw---    [ anon ]
Because this this used to work, I decided to bisect the builds to see which change causes this failure. I tracked this down to
master-20160520T195114Z    works
master-20160521T023830Z    fails
All of the builds from master-20160521T023830Z in my bisection failed. All of the builds up to master-20160520T195114Z in my bisection worked. Looking at the build log for master-20160521T023830Z we see this build was for git commit e6e2cd9acea32fbe5d71f93e2ca4b306cafcd46e. This commit is:
commit e6e2cd9acea32fbe5d71f93e2ca4b306cafcd46e
Author: Patrick Mooney <pmooney@pfmooney.com>
Date:   Fri Apr 22 21:58:46 2016 +0000

    OS-5322 lxbrand use comm page for vdso
I am not certain if this commit is actually the root cause or we break just because the commit enables the use of the vdso for lx. It seems possible that there is a bad interaction introduced with the original vdso that might be causing the segv for Go.

Note that this problem is not unique to the quine relay input program. I hit the same segv running a Go "hello world" program and at the same address (7ffffe81dfe8).

Comment by Jerry Jelinek
Created at 2016-08-31T21:21:39.000Z
Updated at 2016-09-01T11:51:14.000Z
I added a stop to my Dtrace script when Go sends itself a segv. This allows me to see a useful stack:
golang-go 106160 1 send signal: 11 to: golang-go pid 106160, tid 1
send signal regs: 7fffff44d7b6 7ffffe81dff0

stop 106160
It looks like the vdso is using the thread's stack here and since we're right at the top of the stack we grow into unmapped memory and the Go segmented stack code never gets a chance to handle that.

My first thought for an approach to fix this is that we should inline all of the code in the vdso so we don't make function calls and use the caller's stack. We may also need to rewrite this in assembly since there are some stack variables in this code. I want to run all of this by Patrick first to get his input.

Comment by Patrick Mooney
Created at 2016-09-06T18:58:21.000Z
It appears we have a budget of 104 bytes:

Comment by Patrick Mooney
Created at 2016-09-12T19:44:02.000Z
Updated at 2016-09-13T18:09:54.000Z
Perf testing results on an emy lab machine.

#include <stdio.h>
#include <unistd.h>
#include <malloc.h>
#include <time.h>
#include <strings.h>

#define NANOSEC         1000000000LL
#define TESTSAMPLES     1000000

void difftotal(struct timespec s, struct timespec e)
        long long total = 0;
        if (e.tv_nsec > s.tv_nsec) {
                e.tv_nsec += NANOSEC;
        total = e.tv_nsec - s.tv_nsec;
        total += NANOSEC * (e.tv_sec - s.tv_sec);

        fprintf(stderr, "total: %lld per: %lld\n", total, total/TESTSAMPLES);

int main(int argc, char* argv[])
        int i;
        struct timespec *tp, start, end;

        tp = malloc(sizeof (struct timespec) * TESTSAMPLES);
        bzero(tp, sizeof (struct timespec) * TESTSAMPLES);

        clock_gettime(TESTTYPE, &tp[0]);
        clock_gettime(CLOCK_MONOTONIC, &start);
        for (i = 0; i < TESTSAMPLES; i++) {
                //gettimeofday(&tp[i], NULL);
                clock_gettime(TESTTYPE, &tp[i]);
                //cp_clock_gettime(TESTTYPE, &tp[i]);
        clock_gettime(CLOCK_MONOTONIC, &end);

        for (i = 1; i < TESTSAMPLES; i++) {
                time_t diff = tp[i].tv_nsec - tp[i-1].tv_nsec;
                if (tp[i].tv_sec > tp[i-1].tv_sec) {
                        diff += NANOSEC;
                printf("%ld\n", diff);
        difftotal(start, end);

total: 41176132 per: 41
total: 36938047 per: 36
total: 36048066 per: 36
total: 41159172 per: 41

total: 45641602 per: 45
total: 40474166 per: 40
total: 43831161 per: 43
total: 39585725 per: 39

total: 41449842 per: 41
total: 35745746 per: 35
total: 35870066 per: 35
total: 35846523 per: 35

total: 45449788 per: 45
total: 39125847 per: 39
total: 45475065 per: 45
total: 38987713 per: 38

Native CLOCK_REALTIME (32bit) Before:
total: 55594570 per: 55
total: 62064563 per: 62
total: 55534618 per: 55
total: 61911659 per: 61

Native CLOCK_MONOTONIC (32bit) Before:
total: 64365408 per: 64
total: 72379791 per: 72
total: 63818189 per: 63
total: 63693263 per: 63

total: 28642299 per: 28
total: 37405519 per: 37
total: 36490181 per: 36
total: 32927557 per: 32

total: 32451052 per: 32
total: 28566503 per: 28
total: 32589002 per: 32
total: 28737142 per: 28

total: 29815596 per: 29
total: 34286356 per: 34
total: 30476905 per: 30
total: 30259904 per: 30

total: 34180521 per: 34
total: 34207211 per: 34
total: 30454756 per: 30
total: 29364310 per: 29

Native CLOCK_REALTIME (32bit) After:
total: 57473238 per: 57
total: 52141892 per: 52
total: 51347860 per: 51
total: 58459890 per: 58

Native CLOCK_MONOTONIC (32bit) After:
total: 68183314 per: 68
total: 60859710 per: 60
total: 68229590 per: 68
total: 60670139 per: 60

Comment by Patrick Mooney
Created at 2016-09-12T19:51:19.000Z
Interestingly, this machine has registered a tsc_tick_delta:
> tsc_sync_tick_delta::print -l 20

0x54d75a0ed, 0x5514ec354, 0x555261a46, 0x558feeecc, 0x55cd79f91, 0x560aff943, 0x56488a9f7, 0x5685ffc0e, 0x56c7a22d7,
0x57053bb68, 0x5742ecad0, 0x5780878f9, 0x57be6adaf, 0x57fc186a7, 0x5839c5875, 0x5877872f3, 0x58b50b7b6, 0x58f398015,
0x593228061, 0x59715f284, 0x59af917da, 0x59ed637ab, 0x5a2c990c0, ... ]

This may be an anomaly in the sync detection routine.

Comment by Patrick Mooney
Created at 2016-09-13T18:10:41.000Z
The TSC offset is due to the sandy bridge errata handling routine which resets the TSCs during startup.

Comment by Patrick Mooney
Created at 2016-09-23T18:14:32.000Z
I wrote up a simple stress-test program:
#include <stdio.h>
#include <unistd.h>
#include <malloc.h>
#include <sys/time.h>
#include <strings.h>
#include <stdlib.h>

#define TESTSAMPLES     10000000

        printf("usage: cpstress <clocktype>\nWhere clocktype = real|mono\n");
        /* NOTREACHED */
        return (1);

int main(int argc, char* argv[])
        int i, type;
        const char *name = NULL;

        if (argc != 2) {
                return (usage());
        if (strcmp(argv[1], "real") == 0) {
                type = CLOCK_REALTIME;
                name = "CLOCK_REALTIME";
        } else if (strcmp(argv[1], "mono") == 0) {
                type = CLOCK_MONOTONIC;
                name = "CLOCK_MONOTONIC";
        } else {
                return (usage());

        while (1) {
                struct timespec start, end, temp;
                time_t diff;

                clock_gettime(type, &start);
                for (i = 0; i < TESTSAMPLES; i++) {
                        clock_gettime(type, &temp);
                clock_gettime(type, &end);

                if (end.tv_nsec < start.tv_nsec) {
                        end.tv_nsec += NANOSEC;
                diff = end.tv_nsec - start.tv_nsec;
                diff += (end.tv_sec - start.tv_sec) * NANOSEC;

                clock_gettime(CLOCK_REALTIME, &temp);
                printf("%ld.%03ld %s %ldns for %ld samples\n",
                     temp.tv_sec, temp.tv_nsec / MICROSEC,
                     name, diff / TESTSAMPLES, TESTSAMPLES);

Comment by Bot Bot
Created at 2016-09-23T23:27:45.000Z

illumos-joyent commit d21e830 (branch master, by Patrick Mooney)

OS-5637 lxbrand vdso busts golang stack budget
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Ryan Zezeski <rpz@joyent.com>
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Approved by: Jerry Jelinek <jerry.jelinek@joyent.com>