OS-8200: LX: serialise parallel vfork()/clone(LX_CLONE_VFORK) invocations

Details

Issue Type:Bug
Priority:4 - Normal
Status:Resolved
Created at:2020-08-12T02:30:45.910Z
Updated at:2020-08-12T02:33:32.580Z

People

Created by:Former user
Reported by:Former user

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2020-08-12T02:33:32.562Z)

Fix Versions

2020-08-13 Yada Yada Yada (Release Date: 2020-08-13)

Labels

lxbrand

Description

From Andy Fiddaman:

I've been tracking down an intermittent problem with running Zimbra in lx zones since early 2018. It manifests with Zimbra in one of two ways and has been reported on both OmniOS and SmartOS.

  1. A defunct child of sudo, with sudo consuming significant CPU time.
19253  zsched
  19416  systemd
    19808  /opt/zimbra/common/lib/jvm/java/bin/java -XX:ErrorFile=/opt/zimbra/l
      19947  /bin/bash /opt/zimbra/bin/postconf -X policy_time_limit
        19950  sudo /opt/zimbra/common/sbin/postconf -X policy_time_limit
          19952  <defunct>
  1. A defunct child of java, with java consuming significant CPU time.
 
    27181  /opt/zimbra/common/lib/jvm/java/bin/java -XX:ErrorFile=/opt/zimbra/log -client
      27309  <defunct>

By significant, the increased CPU activity is enough to make one of the reporting users detect it via his UPS power reading..

 
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     29301  0.0  0.0      0     0 pts/3    Z    10:03   0:00 [postconf] <defunct>
root     29299 64.3  0.0  50096  3988 pts/3    R    10:03   1:42 sudo /opt/zimbra/common/sbin/postconf -e postscreen_dnsbl_min_ttl=60s
zimbra   29289  0.0  0.0  12624  3580 pts/3    S    10:03   0:00 /bin/bash /opt/zimbra/bin/postconf -e postscreen_dnsbl_min_ttl=60s

To cut a long debugging story short, the processes are stuck looping in poll_common(), called from the lx poll system call. They are looping there because they have all signals blocked via lx_proc_data_t->l_block_all_signals
(see https://smartos.org/bugview/OS-6467 ) and this causes cv_timedwait_sig_hrtime() to return immediately when there is a signal pending, such as the SIGCHLD from the defunct process.

I eventually tracked this down to a problem with vfork(), which openjdk uses when built for Linux. In Linux, it is safe to call vfork() in parallel in a multi-threaded application, whilst in illumos it is not. With the java components in Zimbra, vfork() gets called from different threads. When this happens in parallel, either the vfork child gets set running with signals blocked (the first failure case above) or the parent never gets a response to one of the vfork calls and therefore signals remain blocked (the second case).

The attached file, vfork.c is a test program that can easily trigger the problem. It takes a single argument which is the number of threads to run in parallel. Running with just 2 will usually show up the second issue, 3 will show up the first too. fork.d is a D script to watch what's happening. It's a race so it can be made more likely by capping the zone CPUs to 1. When running with LX_DEBUG=1 it almost never shows up.

vfork.c.txt
fork.d.txt

With this fix, I've been unable to reproduce the original faults over 200 lx zone boots in an lx zone with CPUs capped to 1 - previously about 1 in 5 boots would trigger the problem. The vfork test case runs fine with a large number of threads (tested up to 300).

dtrace shows that the solution using yield() works well with threads being punted only once or twice before successfully calling vfork when there is contention.

Comments

Comment by Jira Bot
Created at 2020-08-12T02:33:23.978Z

illumos-joyent commit 1fc24ed72de4ac8eab15db9b0af90b3e9654262b (branch master, by Andy Fiddaman)

OS-8200 LX: serialise parallel vfork()/clone(LX_CLONE_VFORK) invocations (#321)

Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Mike Zeller <mike.zeller@joyent.com>
Approved by: Jason King <jason.king@joyent.com>