Issue Type: | Bug |
---|---|
Priority: | 4 - Normal |
Status: | Resolved |
Created at: | 2021-02-12T18:21:29.640Z |
Updated at: | 2021-02-18T21:06:29.596Z |
Created by: | Michael Zeller |
---|---|
Reported by: | Michael Zeller |
Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2021-02-18T21:06:29.589Z)
2021-02-25 mark it X (Release Date: 2021-02-25)
Original issue reported here with attached PR:
https://github.com/joyent/illumos-joyent/pull/350
I've recently been troubleshooting some software that crashes in an lx zone which has neither the zone.max-processes nor the zone.max-lwps resource control set. In particular, ksh and zimbra were troublesome. This turned out to be due to prlimit(RLIMIT_NPROC) (or getconf CHILD_MAX) returning a very large value, INT_MAX in this case. The software was trying to allocate enough memory to hold every possible child process ID; the allocation failed, and the software did not handle the failure properly.

Looking further into it, RLIMIT_NPROC is documented as:

> The maximum number of processes that can be created for the real user ID of the calling process. Upon encountering this limit, fork(2) fails with the error EAGAIN.

This is roughly analogous to the illumos v.v_maxup parameter, which is what the native getconf CHILD_MAX returns, except that inside a zone it is further capped by any zone.max-processes rctl (either explicitly set or inferred from zone.max-lwps).

To test, install and run ksh93 in an lx zone with no max-lwps or max-processes cap. It will SEGV. Before the fix, prlimit -u shows:

```
root@lx:~# prlimit -u
RESOURCE DESCRIPTION SOFT HARD UNITS
NPROC max number of processes 2147483647 2147483647 processes
```

After the fix, it matches the native limit (which is what os/fork.c enforces anyway):

```
root@lx:~# prlimit -u
RESOURCE DESCRIPTION SOFT HARD UNITS
NPROC max number of processes 24581 24581 processes
```
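The corrected value described above (the native per-user limit, further capped by any zone process cap) can be modelled as a simple minimum. This is a minimal sketch, not code from the illumos-joyent patch: the function name `lx_nproc_limit` and its parameters are hypothetical stand-ins for v.v_maxup and the zone.max-processes rctl, with 0 assumed to mean "no zone cap set", and the caller is assumed to have already derived the process cap from zone.max-lwps when only that control is set.

```c
#include <stdint.h>

/*
 * Hypothetical model of the RLIMIT_NPROC value an lx zone should report.
 * maxup          - stand-in for the illumos v.v_maxup tunable
 * zone_max_procs - stand-in for zone.max-processes (0 = not set); assumed
 *                  to be pre-derived from zone.max-lwps if only that is set
 * Neither name comes from the actual illumos-joyent change.
 */
static uint64_t lx_nproc_limit(uint64_t maxup, uint64_t zone_max_procs)
{
    /* No zone cap: report the native per-user limit, not INT_MAX. */
    if (zone_max_procs == 0)
        return maxup;

    /* Otherwise report whichever bound fork() would hit first. */
    return zone_max_procs < maxup ? zone_max_procs : maxup;
}
```

With maxup set to 24581 and no zone cap, this reports 24581, matching the post-fix prlimit output rather than 2147483647.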
I built a new CentOS 8 lx image, created a new zone, removed max-lwps, and ran the same test Andy did.
Without the fix:
```
[root@11937bea-5b34-c292-9705-fa812a32c4b0 ~]# cat /etc/centos-release
CentOS Linux release 8.3.2011
[root@11937bea-5b34-c292-9705-fa812a32c4b0 ~]# ksh93
Segmentation fault (core dumped)
```
With the patch applied:

```
[root@11937bea-5b34-c292-9705-fa812a32c4b0 ~]# /native/usr/bin/uname -a
SunOS 11937bea-5b34-c292-9705-fa812a32c4b0 5.11 joyent_20210213T001313Z i86pc i386 i86pc
[root@11937bea-5b34-c292-9705-fa812a32c4b0 ~]# cat /etc/centos-release
CentOS Linux release 8.3.2011
[root@11937bea-5b34-c292-9705-fa812a32c4b0 ~]# ksh93
# echo it works!
it works!
```
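The ksh93 crash came from sizing a child-PID table from the reported limit and not handling the failed allocation. A minimal sketch of a more defensive approach follows, using sysconf(_SC_CHILD_MAX), the C-level counterpart of getconf CHILD_MAX. The ceiling `CHILD_TABLE_CAP` and the helpers `child_table_size` and `alloc_child_table` are hypothetical and are not taken from ksh93 or zimbra.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical upper bound on tracked children; an assumption, not ksh93's. */
#define CHILD_TABLE_CAP 32768L

/* Clamp CHILD_MAX so a huge (or indeterminate) value cannot blow up memory. */
static long child_table_size(long child_max)
{
    /* sysconf() may return -1 when the limit is indeterminate. */
    if (child_max <= 0 || child_max > CHILD_TABLE_CAP)
        return CHILD_TABLE_CAP;
    return child_max;
}

/* Allocate the table, reporting failure instead of using a NULL pointer. */
static pid_t *alloc_child_table(long *nslots)
{
    long n = child_table_size(sysconf(_SC_CHILD_MAX));
    pid_t *tbl = calloc((size_t)n, sizeof(*tbl));

    if (tbl == NULL) {
        perror("calloc");
        *nslots = 0;
        return NULL;
    }
    *nslots = n;
    return tbl;
}
```

Clamping keeps the table bounded even when RLIMIT_NPROC reports INT_MAX, and checking the calloc() result turns an out-of-memory condition into an error the caller can handle rather than a SEGV.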
illumos-joyent commit 007468eb6c693b6d901ecd630b8f1909e41100bf (branch master, by Andy Fiddaman)
OS-8268 Incorrect RLIMIT_NPROC in lx causing some software to fail (#350)
Reviewed by: Jason King <jbk@joyent.com>
Reviewed by: Mike Zeller <mike.zeller@joyent.com>
Approved by: Dan McDonald <danmcd@joyent.com>