OS-8268: Incorrect RLIMIT_NPROC in lx causing some software to fail

Details

Issue Type: Bug
Priority: 4 - Normal
Status: Resolved
Created at: 2021-02-12T18:21:29.640Z
Updated at: 2021-02-18T21:06:29.596Z

People

Created by: Michael Zeller
Reported by: Michael Zeller

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2021-02-18T21:06:29.589Z)

Fix Versions

2021-02-25 mark it X (Release Date: 2021-02-25)

Description

Originally reported on GitHub, with an attached pull request:
https://github.com/joyent/illumos-joyent/pull/350

I've recently been troubleshooting software that crashes in an lx zone that has neither the zone.max-processes nor the zone.max-lwps resource control set. ksh and Zimbra in particular were affected.

This turned out to be caused by RLIMIT_NPROC (as reported by prlimit, or by getconf CHILD_MAX) returning a very large value: INT_MAX (2147483647) in this case.

The software tried to allocate enough memory to hold every possible child process ID. That allocation failed, and the software did not handle the failure properly.
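A minimal C sketch of the defensive pattern the affected software lacked, assuming it sizes a PID table from sysconf(_SC_CHILD_MAX) (the value behind getconf CHILD_MAX). The function name alloc_child_table and the PID_TABLE_CAP bound are hypothetical and are not taken from ksh or Zimbra.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical upper bound on the number of child PIDs we are willing to track. */
#define PID_TABLE_CAP 65536L

/*
 * Allocate a zero-filled table with one slot per trackable child process.
 * child_max is normally sysconf(_SC_CHILD_MAX); it may be -1 (indeterminate)
 * or absurdly large (e.g. INT_MAX), so clamp it before allocating.
 * On success *nslots holds the table size; on failure NULL is returned and
 * *nslots is set to 0 so the caller can fall back gracefully.
 */
pid_t *alloc_child_table(long child_max, size_t *nslots)
{
    assert(nslots != NULL);

    if (child_max <= 0 || child_max > PID_TABLE_CAP)
        child_max = PID_TABLE_CAP;

    pid_t *table = calloc((size_t)child_max, sizeof (pid_t));
    if (table == NULL) {
        /* Always check the result instead of assuming the allocation worked. */
        *nslots = 0;
        return NULL;
    }
    *nslots = (size_t)child_max;
    return table;
}
```

A caller would pass sysconf(_SC_CHILD_MAX) (declared in <unistd.h>) and treat a NULL return as a recoverable error rather than dereferencing it. Clamping keeps the allocation bounded even when the reported limit is INT_MAX, as it was in the unpatched lx zone.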

Looking further into it, getrlimit(2) describes RLIMIT_NPROC as:

The maximum number of processes that can be created for the real user ID of the calling process.
Upon encountering this limit, fork(2) fails with the error EAGAIN.
This is essentially analogous to the illumos v.v_maxup parameter, which the native getconf CHILD_MAX returns, except that inside a zone it is further capped by any zone.max-processes rctl (either set explicitly or inferred from zone.max-lwps).
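The capping described above can be illustrated with a small C function. This is a hypothetical sketch, not the actual lx brand code from the fix: the names effective_nproc and RCTL_UNSET are invented here, and it assumes, as the description suggests, that an unset zone.max-processes is inferred from zone.max-lwps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "this resource control is not set". */
#define RCTL_UNSET UINT64_MAX

/*
 * Illustrative only: compute the NPROC limit an lx process should see.
 * Start from the per-user cap v_maxup, then apply zone.max-processes,
 * falling back to zone.max-lwps when max-processes is unset.
 * With neither rctl set, the result is simply v_maxup.
 */
uint64_t effective_nproc(uint64_t v_maxup, uint64_t zone_max_procs,
    uint64_t zone_max_lwps)
{
    uint64_t zone_cap = zone_max_procs;

    assert(v_maxup != RCTL_UNSET);  /* v_maxup is always a finite value */

    if (zone_cap == RCTL_UNSET)
        zone_cap = zone_max_lwps;   /* may itself be RCTL_UNSET */

    return (zone_cap < v_maxup) ? zone_cap : v_maxup;
}
```

The key point is the last case: with no rctl set, the limit falls back to v_maxup rather than to INT_MAX, which is the behaviour the prlimit output after the fix shows.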

To reproduce, install and run ksh93 in an lx zone with no max-lwps or max-processes cap; it will crash with SIGSEGV.
Before the fix, prlimit -u shows:

root@lx:~# prlimit -u
RESOURCE DESCRIPTION                   SOFT       HARD UNITS
NPROC    max number of processes 2147483647 2147483647 processes
After the fix, it matches the native limit (which is what os/fork.c enforces anyway):

root@lx:~# prlimit -u
RESOURCE DESCRIPTION              SOFT  HARD UNITS
NPROC    max number of processes 24581 24581 processes
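To check the value from a program rather than with prlimit, a small C sketch can read RLIMIT_NPROC through getrlimit(2), the same limit prlimit -u reports. The function names are hypothetical. Based on the output above, one would expect both values to be 2147483647 in an unpatched lx zone without a process cap, and to match the native limit once the fix is in place.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Fetch the soft and hard RLIMIT_NPROC values; returns 0 on success, -1 on error. */
int get_nproc_limit(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    assert(soft != NULL && hard != NULL);
    if (getrlimit(RLIMIT_NPROC, &rl) != 0) {
        perror("getrlimit(RLIMIT_NPROC)");
        return -1;
    }
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Print one limit value, using "unlimited" for RLIM_INFINITY. */
static void print_limit(FILE *out, rlim_t v)
{
    if (v == RLIM_INFINITY)
        fprintf(out, " %10s", "unlimited");
    else
        fprintf(out, " %10llu", (unsigned long long)v);
}

/* Print the soft and hard NPROC limits on one line, similar to prlimit -u. */
int print_nproc_limit(FILE *out)
{
    rlim_t soft, hard;

    if (get_nproc_limit(&soft, &hard) != 0)
        return -1;
    fprintf(out, "NPROC soft/hard:");
    print_limit(out, soft);
    print_limit(out, hard);
    fputc('\n', out);
    return 0;
}
```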

Comments

Comment by Michael Zeller
Created at 2021-02-18T20:59:56.371Z

I built a new CentOS 8 lx image, created a new zone, removed max-lwps, and ran the same test Andy did.

Without the fix:

[root@11937bea-5b34-c292-9705-fa812a32c4b0 ~]# cat /etc/centos-release
CentOS Linux release 8.3.2011
[root@11937bea-5b34-c292-9705-fa812a32c4b0 ~]# ksh93
Segmentation fault (core dumped)

With the patch applied:

[root@11937bea-5b34-c292-9705-fa812a32c4b0 ~]# /native/usr/bin/uname -a
SunOS 11937bea-5b34-c292-9705-fa812a32c4b0 5.11 joyent_20210213T001313Z i86pc i386 i86pc
[root@11937bea-5b34-c292-9705-fa812a32c4b0 ~]# cat /etc/centos-release
CentOS Linux release 8.3.2011
[root@11937bea-5b34-c292-9705-fa812a32c4b0 ~]# ksh93
# echo it works!
it works!

Comment by Jira Bot
Created at 2021-02-18T21:05:19.933Z

illumos-joyent commit 007468eb6c693b6d901ecd630b8f1909e41100bf (branch master, by Andy Fiddaman)

OS-8268 Incorrect RLIMIT_NPROC in lx causing some software to fail (#350)

Reviewed by: Jason King <jbk@joyent.com>
Reviewed by: Mike Zeller <mike.zeller@joyent.com>
Approved by: Dan McDonald <danmcd@joyent.com>