Originally reported in SWSUP-1041.
This simple Docker session works correctly in a regular Docker environment, but hangs under Triton:
$ docker run -ti node:8-alpine sh
echo "if (1) return" > play.js
node play.js
# Process will not exit - it is hung
Using a different node.js image (based on Ubuntu instead of Alpine), the same steps complete successfully under Triton:
$ docker run -ti node:8 sh
echo "if (1) return" > play.js
node play.js
# Process exits cleanly
Reproducible in both COAL and TPC (us-sw-1) environments.
Truss executable output included in SWSUP-1041.
Former user commented on 2018-03-07T17:57:15.285-0500:
Preliminary investigation; unfortunately, my kernel debugging skills are almost non-existent:
Running:
a198024f7f1c:~# uname -a
Linux a198024f7f1c 3.13.0 BrandZ virtual linux x86_64 Linux
a198024f7f1c:~# node -e ''
(it hangs here)
On the following platform:
[root@headnode (coal) ~]# uname -a
SunOS headnode 5.11 joyent_20180209T180849Z i86pc i386 i86pc
[root@headnode (coal) ~]#
In the container running `node -e`:
a198024f7f1c:~# /native/usr/bin/ptree
92416 /bin/sh
93149 <defunct>
93307 <defunct>
92475 /bin/login -h zone:global -f root
92478 -ash
12841 node -e
93188 /bin/login -h zone:global -f root
93190 -ash
12903 /native/usr/bin/ptree
12897 /bin/login -h zone:global -f root
12899 -ash
Attaching to the node process with `gdb`:
a198024f7f1c:~# gdb -p 12841
GNU gdb (GDB) 7.12.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-alpine-linux-musl".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 12841
[New LWP 12843]
[New LWP 12844]
[New LWP 12845]
[New LWP 12847]
0x00007fffed053a50 in __clone () from /lib/ld-musl-x86_64.so.1
(gdb) thread apply all bt
Thread 5 (LWP 12847):
#0 0x00007fffed053a50 in __clone () from /lib/ld-musl-x86_64.so.1
#1 0x00007fffed0511c3 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x00007fffed192b30 in ?? ()
#3 0x0000000000000000 in ?? ()
Thread 4 (LWP 12845):
#0 0x00007fffed053a50 in __clone () from /lib/ld-musl-x86_64.so.1
#1 0x00007fffed0511c3 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x00007fffe9c02b30 in ?? ()
#3 0x0000000000000000 in ?? ()
Thread 3 (LWP 12844):
#0 0x00007fffed053a50 in __clone () from /lib/ld-musl-x86_64.so.1
#1 0x00007fffed0511c3 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x00007fffeaa02b30 in ?? ()
#3 0x0000000000000000 in ?? ()
Thread 2 (LWP 12843):
#0 0x00007fffed053a50 in __clone () from /lib/ld-musl-x86_64.so.1
#1 0x00007fffed0511c3 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x00007fffeb802b30 in ?? ()
#3 0x0000000000000000 in ?? ()
Thread 1 (LWP 12841):
#0 0x00007fffed053a50 in __clone () from /lib/ld-musl-x86_64.so.1
#1 0x00007fffed0511c3 in ?? () from /lib/ld-musl-x86_64.so.1
#2 0x00007fffed28bb90 in ?? () from /lib/ld-musl-x86_64.so.1
#3 0x0000000000000000 in ?? ()
(gdb) info registers
rax 0xca 202
rbx 0x7fffed28bb48 140737172257608
rcx 0x0 0
rdx 0x322b 12843
rsi 0x0 0
rdi 0x7fffeb802b20 140737144433440
rbp 0xca 0xca
rsp 0x7fffffeffad8 0x7fffffeffad8
r8 0x0 0
r9 0x0 0
r10 0x0 0
r11 0x246 582
r12 0x0 0
r13 0x0 0
r14 0x7fffeb802b20 140737144433440
r15 0x4 4
rip 0x7fffed053a50 0x7fffed053a50 <__clone+98>
eflags 0x246 [ PF ZF IF ]
cs 0x33 51
ss 0x4b 75
ds 0x4b 75
es 0x4b 75
fs 0x0 0
gs 0x0 0
(gdb) thread 1
[Switching to thread 1 (LWP 12841)]
#0 0x00007fffed053a50 in __clone () from /lib/ld-musl-x86_64.so.1
(gdb) disassemble
Dump of assembler code for function __clone:
0x00007fffed0539ee <+0>: xor %eax,%eax
0x00007fffed0539f0 <+2>: mov $0x38,%al
0x00007fffed0539f2 <+4>: mov %rdi,%r11
0x00007fffed0539f5 <+7>: mov %rdx,%rdi
0x00007fffed0539f8 <+10>: mov %r8,%rdx
0x00007fffed0539fb <+13>: mov %r9,%r8
0x00007fffed0539fe <+16>: mov 0x8(%rsp),%r10
0x00007fffed053a03 <+21>: mov %r11,%r9
0x00007fffed053a06 <+24>: and $0xfffffffffffffff0,%rsi
0x00007fffed053a0a <+28>: sub $0x8,%rsi
0x00007fffed053a0e <+32>: mov %rcx,(%rsi)
0x00007fffed053a11 <+35>: syscall
0x00007fffed053a13 <+37>: test %eax,%eax
0x00007fffed053a15 <+39>: jne 0x7fffed053a26 <__clone+56>
0x00007fffed053a17 <+41>: xor %ebp,%ebp
0x00007fffed053a19 <+43>: pop %rdi
0x00007fffed053a1a <+44>: callq *%r9
0x00007fffed053a1d <+47>: mov %eax,%edi
0x00007fffed053a1f <+49>: xor %eax,%eax
0x00007fffed053a21 <+51>: mov $0x3c,%al
0x00007fffed053a23 <+53>: syscall
0x00007fffed053a25 <+55>: hlt
0x00007fffed053a26 <+56>: retq
0x00007fffed053a27 <+57>: mov (%rdi),%eax
0x00007fffed053a29 <+59>: test %eax,%eax
0x00007fffed053a2b <+61>: jne 0x7fffed053a51 <__clone+99>
0x00007fffed053a2d <+63>: mov %rdi,%r11
0x00007fffed053a30 <+66>: mov %rsi,%rax
0x00007fffed053a33 <+69>: mov %rdx,%rdi
0x00007fffed053a36 <+72>: mov %rcx,%rsi
0x00007fffed053a39 <+75>: mov %r8,%rdx
0x00007fffed053a3c <+78>: mov %r9,%r10
0x00007fffed053a3f <+81>: mov 0x8(%rsp),%r8
0x00007fffed053a44 <+86>: mov 0x10(%rsp),%r9
0x00007fffed053a49 <+91>: mov %r11,0x8(%rsp)
0x00007fffed053a4e <+96>: syscall
=> 0x00007fffed053a50 <+98>: retq
0x00007fffed053a51 <+99>: jmpq 0x7fffed051133
End of assembler dump.
(gdb)
Looking at what all threads are spending their time on:
a198024f7f1c:~# /native/usr/bin/prstat -mLc 1
Please wait...
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
12940 root 18 82 0.0 0.0 0.0 0.0 0.0 0.0 0 0 25K 0 prstat/1
93307 root 7.0 58 0.0 0.0 0.0 0.0 0.0 24 0 0 9K 0 /0
12841 root 0.1 0.1 0.0 0.1 0.1 100 0.0 0.0 1 0 9 0 node/1
93149 root 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 0 0 1 0 /0
12899 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 12 0 busybox/1
12897 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 5 0 busybox/1
92342 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 zsched/1
93188 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 busybox/1
12841 root 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 0 0 0 0 node/6
12841 root 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 0 0 0 0 node/5
12841 root 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 0 0 0 0 node/4
12841 root 0.0 0.0 0.0 0.0 0.0 100 0.0 0.0 0 0 0 0 node/3
92416 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 busybox/1
92475 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 busybox/1
93190 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 busybox/1
Total: 12 processes, 14 lwps, load averages: 0.00, 0.00, 0.00
All node threads are spending 100% of their time blocked on user locks (the LCK column).
[root@headnode (coal) ~]# mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci ufs ip hook neti sockfs arp usba fctl stmf_sbd stmf zfs mm lofs mpt idm fcp sd crypto random cpc logindmux ptm kvm sppp nsmb smbsrv nfs ipc ]
> 0t12841::pid2proc | ::walk thread | ::findstack -v
stack pointer for thread ffffff03e7300c40: ffffff001a53bb60
[ ffffff001a53bb60 _resume_from_idle+0x112() ]
ffffff001a53bb90 swtch+0x141()
ffffff001a53bc20 cv_wait_sig_swap_core+0x1b9(ffffff03dfc72be0, ffffffffc0167e60, 0)
ffffff001a53bc40 cv_wait_sig_swap+0x17(ffffff03dfc72be0, ffffffffc0167e60)
ffffff001a53bcd0 cv_waituntil_sig+0xbd(ffffff03dfc72be0, ffffffffc0167e60, 0, 6)
ffffff001a53bd70 futex_wait+0x19d(ffffff001a53bdd0, 7fffeb802b20, 322b, 0, ffffffff)
ffffff001a53be70 lx_futex+0x2c1(7fffeb802b20, 0, 322b, 0, 0, 0)
ffffff001a53bef0 lx_syscall_enter+0x16f()
ffffff001a53bf10 sys_syscall+0x142()
stack pointer for thread ffffff042dad2420: ffffff001a70fb60
[ ffffff001a70fb60 _resume_from_idle+0x112() ]
ffffff001a70fb90 swtch+0x141()
ffffff001a70fc20 cv_wait_sig_swap_core+0x1b9(ffffff03dfc72a20, ffffffffc0167be0, 0)
ffffff001a70fc40 cv_wait_sig_swap+0x17(ffffff03dfc72a20, ffffffffc0167be0)
ffffff001a70fcd0 cv_waituntil_sig+0xbd(ffffff03dfc72a20, ffffffffc0167be0, 0, 6)
ffffff001a70fd70 futex_wait+0x19d(ffffff001a70fdd0, 7fffeb8029e4, 2, 0, ffffffff)
ffffff001a70fe70 lx_futex+0x2c1(7fffeb8029e4, 80, 2, 0, 0, 0)
ffffff001a70fef0 lx_syscall_enter+0x16f()
ffffff001a70ff10 sys_syscall+0x142()
stack pointer for thread ffffff0465954c00: ffffff0019fe5b60
[ ffffff0019fe5b60 _resume_from_idle+0x112() ]
ffffff0019fe5b90 swtch+0x141()
ffffff0019fe5c20 cv_wait_sig_swap_core+0x1b9(ffffff04112a7a20, ffffffffc0165fe0, 0)
ffffff0019fe5c40 cv_wait_sig_swap+0x17(ffffff04112a7a20, ffffffffc0165fe0)
ffffff0019fe5cd0 cv_waituntil_sig+0xbd(ffffff04112a7a20, ffffffffc0165fe0, 0, 6)
ffffff0019fe5d70 futex_wait+0x19d(ffffff0019fe5dd0, 7fffeaa029e4, 2, 0, ffffffff)
ffffff0019fe5e70 lx_futex+0x2c1(7fffeaa029e4, 80, 2, 0, 0, 0)
ffffff0019fe5ef0 lx_syscall_enter+0x16f()
ffffff0019fe5f10 sys_syscall+0x142()
stack pointer for thread ffffff0408b457e0: ffffff0019323b60
[ ffffff0019323b60 _resume_from_idle+0x112() ]
ffffff0019323b90 swtch+0x141()
ffffff0019323c20 cv_wait_sig_swap_core+0x1b9(ffffff04112a7be0, ffffffffc01643e0, 0)
ffffff0019323c40 cv_wait_sig_swap+0x17(ffffff04112a7be0, ffffffffc01643e0)
ffffff0019323cd0 cv_waituntil_sig+0xbd(ffffff04112a7be0, ffffffffc01643e0, 0, 6)
ffffff0019323d70 futex_wait+0x19d(ffffff0019323dd0, 7fffe9c029e4, 2, 0, ffffffff)
ffffff0019323e70 lx_futex+0x2c1(7fffe9c029e4, 80, 2, 0, 0, 0)
ffffff0019323ef0 lx_syscall_enter+0x16f()
ffffff0019323f10 sys_syscall+0x142()
stack pointer for thread ffffff049e2e2b40: ffffff0019047b60
[ ffffff0019047b60 _resume_from_idle+0x112() ]
ffffff0019047b90 swtch+0x141()
ffffff0019047c20 cv_wait_sig_swap_core+0x1b9(ffffff04112a76a0, ffffffffc0161840, 0)
ffffff0019047c40 cv_wait_sig_swap+0x17(ffffff04112a76a0, ffffffffc0161840)
ffffff0019047cd0 cv_waituntil_sig+0xbd(ffffff04112a76a0, ffffffffc0161840, 0, 6)
ffffff0019047d70 futex_wait+0x19d(ffffff0019047dd0, 7fffef373ca0, ffffffff, 0, ffffffff)
ffffff0019047e70 lx_futex+0x2c1(7fffef373ca0, 80, ffffffffffffffff, 0, 0, 0)
ffffff0019047ef0 lx_syscall_enter+0x16f()
ffffff0019047f10 sys_syscall+0x142()
> 0t12841::pid2proc | ::walk thread | ::thread -b
ADDR WCHAN TS PITS SOBJ OPS
ffffff03e7300c40 ffffff03dfc72be0 ffffff05647b10d0 0 cv_sobj_ops
ffffff042dad2420 ffffff03dfc72a20 ffffff03d7f21118 0 cv_sobj_ops
ffffff0465954c00 ffffff04112a7a20 ffffff0465b0a5a8 0 cv_sobj_ops
ffffff0408b457e0 ffffff04112a7be0 ffffff049d3cd5b0 0 cv_sobj_ops
ffffff049e2e2b40 ffffff04112a76a0 ffffff04062d91d0 0 cv_sobj_ops
So it looks like all of the threads are waiting on a condition variable to be signalled so that they can exit, but nothing ever signals it to wake them.
Former user commented on 2018-10-05T16:42:40.536-0400:
While testing a fix for OS-7288, I hit this with the following three commands running in different windows:
while true; do strace -f iperf -s -B 10.88.88.43; done
ln /usr/bin/iperf /usr/bin/myperf; while true; do myperf -c 10.88.88.43 -t 1; done
while true; do sleep 0.5; pkill iperf; done
PI was gate bits from 2018-10-05 and the image was Ubuntu 14.04 7b5981c4-1889-11e7-b4c5-3f3bdfc9b88b.
Former user commented on 2018-10-08T10:30:59.661-0400:
This bug is also reproducible with an Ubuntu 14.04 lx zone on a PI built from master bits as of 2018-10-04.
> fffffe4c5d419ff8::walk thread | ::findstack -v
stack pointer for thread fffffe0bee9f3c40: fffffe0010367d20
[ fffffe0010367d20 _resume_from_idle+0x12b() ]
fffffe0010367d50 swtch+0x18a()
fffffe0010367dd0 stop+0x3d6(9, 1)
fffffe0010367e20 lx_ptrace_stop_common+0x71(fffffe4c5d419ff8, fffffe181afca800, 1)
fffffe0010367e60 lx_ptrace_stop+0x92(1)
fffffe0010367ee0 lx_syscall_enter+0x8e()
fffffe0010367f00 sys_syscall+0x242()
Reproduce with an Ubuntu 14.04 lx image with:
apt install -y iperf strace ksh
ln /usr/bin/iperf /usr/bin/myperf
Three windows, each with one of:
while true; do strace -f iperf -s 10.88.88.202; done
while true; do myperf -c 10.88.88.202 -t 1; done
./killit
killit is the following; ksh is used for its floating-point math:
#! /bin/ksh
while true; do
pkill strace &
pkill iperf
time=$(( $RANDOM % 1100 / 1000.0 ))
echo $time
sleep $time
done
Former user commented on 2019-11-08T12:39:57.386-0500:
Hello Mike,
It appears that another customer has run into this issue recently; see ZD-229434. I was wondering if I could present them with a workaround? I opened NSUP-3 to address it, though perhaps in error, or as a duplicate.
If you could advise that would be great.
Jeff W
Former user commented on 2019-11-11T10:43:13.838-0500:
@Former user The issue here is an lx issue, not a node issue. If a node bug exists when running on a Linux kernel, then it may be considered a node support issue. The NSUP issue should probably be converted to a SWSUP issue.
The only workaround that I know of at this point would be to use bhyve running a real linux kernel.
Former user commented on 2020-09-28T18:47:38.935-0400 (edited 2020-09-28T18:55:00.753-0400):
During a debugging session, we ran node, then called 'exit', and compared the system call behavior of a working Linux VM to that of the LX zone where it would hang (both using Alpine).
The hang appears to be related to the POSIX condvar implementation in musl, which differs from glibc's implementation even though both are built on futexes.
In the LX zone, we noticed this error:
[pid 3333] futex(0x7fffe98029b4, FUTEX_REQUEUE_PRIVATE, 0, 1, 0x12ecc) = -1 EAGAIN (Resource temporarily unavailable)
While in the working VM, we instead saw:
[pid 3333] futex(0x7f621e9809b4, FUTEX_REQUEUE_PRIVATE, 0, 1, 0x555e6cf7aecc) = 1
The futex(2) man page does not list EAGAIN as a valid error for FUTEX_REQUEUE (FUTEX_REQUEUE_PRIVATE is merely FUTEX_REQUEUE|FUTEX_PRIVATE_FLAG). For FUTEX_CMP_REQUEUE, however, it states that EAGAIN is returned when "the value pointed to by uaddr is not equal to the expected value val3".
Looking at the implementation in lx_futex.c, we see:
1403 case FUTEX_CMP_REQUEUE:
1404 case FUTEX_REQUEUE:
1405 rval = futex_requeue(&memid, &memid2, val,
1406 val2, (void *)addr2, &val3);
1407
1408 break;
(from lx_futex(), where val3 is the last argument to the function, declared as int val3). Note that this means the last parameter to futex_requeue() (int *cmpval) will never be NULL. In futex_requeue(), we see:
745 if (cmpval != NULL) {
746 if (fuword32(addr, (uint32_t *)&curval)) {
747 ret = -EFAULT;
748 goto out;
749 }
750 if (curval != *cmpval) {
751 ret = -EAGAIN;
752 goto out;
753 }
754 }
This is the only location that returns EAGAIN. From this it's fairly clear that the fix is to pass NULL for cmpval in the FUTEX_REQUEUE case, but &val3 in the FUTEX_CMP_REQUEUE case.
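The corrected dispatch would presumably look something like this (a sketch of the switch in lx_futex(), not the actual commit):

```c
case FUTEX_CMP_REQUEUE:
	/* CMP_REQUEUE must compare *addr against val3 first. */
	rval = futex_requeue(&memid, &memid2, val,
	    val2, (void *)addr2, &val3);
	break;
case FUTEX_REQUEUE:
	/* Plain REQUEUE performs no comparison: pass NULL for cmpval
	 * so futex_requeue() skips the curval check entirely. */
	rval = futex_requeue(&memid, &memid2, val,
	    val2, (void *)addr2, NULL);
	break;
```

Splitting the two cases means the existing `cmpval != NULL` guard in futex_requeue() does the right thing without further changes.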
Michael Zeller commented on 2020-09-30T13:29:36.918-0400:
I ran LTP, following the instructions we ship in illumos-joyent, on a platform with the fix:
-----------------------------------------------
Total Tests: 1184
Total Skipped Tests: 336
Total Failures: 11
Kernel Version: 4.3.0
Machine Architecture: x86_64
Hostname: 1d9eda80-a149-45de-9681-45f9f65440af
root@1d9eda80-a149-45de-9681-45f9f65440af:/opt/ltp# grep FAIL /opt/ltp/results/LTP_RUN_ON-2020_09_30-16h_38m_56s.log
fcntl34 FAIL 2
fcntl34_64 FAIL 2
fcntl36 FAIL 2
fcntl36_64 FAIL 2
io_setup01 FAIL 1
msgctl09 FAIL 1
msgctl10 FAIL 1
msgctl11 FAIL 2
recv01 FAIL 1
recvfrom01 FAIL 1
recvmsg01 FAIL 1
The test results look as expected. Another user confirmed that node.js now exits properly on musl-based distros.
Jira Bot commented on 2020-09-30T13:31:46.572-0400:
illumos-joyent commit a68418414cd381cf2cb7f9464533e83dce09beb9 (branch master, by Jason King)
OS-6580 LX fails to handle futex_requeue correctly (#333)
Reviewed by: Dan McDonald <danmcd@joyent.com>
Reviewed by: Mike Zeller <mike.zeller@joyent.com>
Approved by: Dan McDonald <danmcd@joyent.com>