OS-5252: lx brand: mremap() can fail spuriously with ENOMEM

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2016-03-16T23:42:25.000Z)

Fix Versions

2016-03-17 Kenny Boy (Release Date: 2016-03-17)

Description

A user reported that a particular git invocation on Alpine Linux failed with "out of memory". Installing and running strace showed the origin of the problem, a failing mremap():

[pid 24185] poll([{fd=3, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=3, revents=POLLIN|POLLRDNORM}])
[pid 24185] read(3, "\27\3\3\5r", 5)    = 5
[pid 24185] read(3, "\"\250\240\345\273veQ\310\214\274\327\254Q;NSs\264\330hA\271a}\2\306\274\376\25\230\250"..., 1394) = 1394
[pid 24185] mremap(0x7ffffef00000, 258048, 389120, MREMAP_MAYMOVE) = -1 ENOMEM (Out of memory)
[pid 24185] mremap(0x7ffffef00000, 258048, 389120, MREMAP_MAYMOVE) = -1 ENOMEM (Out of memory)
[pid 24185] writev(2, [{"fatal: Out of memory, realloc fa"..., 37}, {NULL, 0}], 2fatal: Out of memory, realloc failed
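
For anyone wanting to exercise the same call pattern without git, here is a minimal sketch in C. It assumes a Linux userland (glibc or musl, hence _GNU_SOURCE); grow_anon_mapping is a hypothetical helper name, not anything from git or the lx brand. It only issues an mremap() with the same kind of arguments as in the strace output; it is not the code path that git itself takes.

```c
#define _GNU_SOURCE		/* needed for mremap() and MREMAP_MAYMOVE */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map old_size bytes of anonymous memory, fill it with
 * a known pattern, then grow it to new_size with mremap(MREMAP_MAYMOVE).
 * Returns 0 if the grow succeeded and the original contents survived the
 * (possible) move, -1 otherwise.
 */
int
grow_anon_mapping(size_t old_size, size_t new_size)
{
	unsigned char *old, *grown;
	size_t i;

	/* This sketch only models growth, as in the failing call. */
	assert(new_size >= old_size);

	old = mmap(NULL, old_size, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (old == MAP_FAILED)
		return (-1);

	(void) memset(old, 0xa5, old_size);

	grown = mremap(old, old_size, new_size, MREMAP_MAYMOVE);
	if (grown == MAP_FAILED) {
		/* The old mapping is left untouched when mremap() fails. */
		(void) munmap(old, old_size);
		return (-1);
	}

	/* The first old_size bytes must have been preserved. */
	for (i = 0; i < old_size; i++) {
		if (grown[i] != 0xa5) {
			(void) munmap(grown, new_size);
			return (-1);
		}
	}

	(void) munmap(grown, new_size);
	return (0);
}
```

Calling grow_anon_mapping(258048, 389120) uses the sizes from the strace output and should return 0 wherever mremap() behaves. Whether this exact sketch fails in an affected lx zone depends on the address space layout of the process, so treat it as a starting point rather than a guaranteed reproduction.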

This DTrace invocation was used to stop the process on its call to mremap:

dtrace -n lx-syscall::mremap:entry'{stop(); trace(pid); exit(0)}' -wq

And then this D script was used to investigate the flow from there:


#pragma D option destructive
#pragma D option flowindent

BEGIN
{
        system("prun %d", $1);
}

pid$1::lx_remap*:entry
{
        printf("%x %x %x %x %x", arg0, arg1, arg2, arg3, arg4);
}

pid$1::lx_remap*:return
{
        printf("%x %x", arg0, arg1);
}

syscall:::entry
/pid == $1/
{
        printf("%x %x %x %x %x", arg0, arg1, arg2, arg3, arg4);
}

syscall:::return
/pid == $1/
{
        printf("%x", arg1);
}

And here are the results; note the address hint passed to the final mmap, and the value that lx_remap returns:

  0  -> lx_remap                              7ffffeeed000 52000 7b000 1 0
  0    => open                                7fffff4e869e 0 0 0 a0b2c84c
  0    <= open                                4
  0    => fstat                               4 7fffff35eb10 7fffff25f85b a0b2c84c 7fffff35e600
  0    <= fstat                               0
  0    => mmap                                0 2000 3 102 ffffffff
  0    <= mmap                                7ffffeee0000
  0    => read                                4 7ffffeee0030 1cd8 7fffff25ea2a 0
  0    <= read                                1cd8
  0    => close                               4 1f 0 0 7ffffef3f000
  0    <= close                               0
  0    -> lx_remap_anon                       7ffffeee0cc8 7ffffeee0030 47 7b000 1
  0      => mmap                              7ffffef3f000 29000 7 142 ffffffff
  0      <= mmap                              7ffffeeb6000
  0      => munmap                            7ffffeeb6000 29000 52000 7ffffef3f000 ffffffff
  0      <= munmap                            0
  0      -> lx_remap_anoncache_evict          7ffffeee0cc8 29000 fffffffffbc326c0 7fffff25eaaa ffffffff
  0        -> lx_remap_anoncache_invalidate   7ffffeeed000 52000 fffffffffbc326c0 7fffff25eaaa ffffffff
  0        <- lx_remap_anoncache_invalidate   26 7fffff4fdba0
  0      <- lx_remap_anoncache_evict          31 7fffff4fdba0
  0      => mmap                              ffffffffff851000 7b000 7 142 ffffffff
  0      <= mmap                              ffffffffffffffff
  0    <- lx_remap_anon                       110 fffffffffffffff4
  0    => munmap                              7ffffeee0000 2000 7fffff360000 0 0
  0    <= munmap                              0
  0  <- lx_remap                              77 fffffffffffffff4
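
The last column on the lx_remap and lx_remap_anon return lines is arg1 of the pid provider's return probe, that is, the function's return value. As a quick check on how to read those values, here is a small sketch in C. The helper names are hypothetical, and it assumes the usual two's-complement conversion and that ENOMEM is 12 (as it is on both Linux and illumos).

```c
#include <errno.h>
#include <stdint.h>

/*
 * Reinterpret a 64-bit value, as printed in hex by the D script, as the
 * signed quantity that the function actually returned.  (Assumes the usual
 * two's-complement conversion from uint64_t to int64_t.)
 */
int64_t
as_signed(uint64_t raw)
{
	return ((int64_t)raw);
}

/* Returns 1 if the raw value is -ENOMEM, 0 otherwise. */
int
is_neg_enomem(uint64_t raw)
{
	return (as_signed(raw) == -(int64_t)ENOMEM);
}
```

With these, as_signed(0xfffffffffffffff4) is -12, so lx_remap_anon and lx_remap both returned -ENOMEM, and the ffffffffffffffff from the final mmap is -1, i.e. MAP_FAILED.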

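The problem is an entirely stupid bug (of mine) in lx_remap_anon: the code does not do what its comment says it does with respect to finding a starting hint. It seems a single character was elided at some point, turning a shift by 31 (2G) into a shift by 3, so the very first mapping in the address space passed the "above 2G" check and the hint was placed below it. The diff tells it all: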

diff --git a/usr/src/lib/brand/lx/lx_brand/common/mem.c b/usr/src/lib/brand/lx/l
index 98810cd..391d1a5 100644
--- a/usr/src/lib/brand/lx/lx_brand/common/mem.c
+++ b/usr/src/lib/brand/lx/lx_brand/common/mem.c
@@ -537,21 +537,21 @@ lx_remap_anon(prmap_t *map, prmap_t *maps, int nmap,
                /*
                 * We're going to start at the bottom of the address space;
                 * once we hit an address above 2G, we'll treat that as the
                 * bottom of the top of the address space, and set our address
                 * hint below that.  To give ourselves plenty of room for
                 * further mremap() expansion, we'll multiply our new size by
                 * 16 and leave that much room between our lowest high address
                 * and our hint.
                 */
                for (i = 0; i < nmap; i++) {
-                       if (maps[i].pr_vaddr < (uintptr_t)(1 << 3UL))
+                       if (maps[i].pr_vaddr < (uintptr_t)(1 << 31UL))
                                continue;
 
                        hint = (void *)(maps[i].pr_vaddr - (new_size << 4UL));
                        break;
                }
        }
 
        if ((addr = mmap(hint, new_size, prot, mflags, -1, 0)) == (void *)-1)
                return (-errno);
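
To make the effect of the missing character concrete, here is a simplified model of the hint search in C. This is a sketch, not the brand's code: pick_hint and the two-entry mapping list are hypothetical, the addresses are modelled as plain uint64_t, and the threshold is passed in as an unsigned 64-bit value (a choice of the sketch, not a statement about the committed expression). The 0x1000 base for the lowest mapping is not stated anywhere in this ticket; it is inferred from the trace, as the only base that yields a hint of ffffffffff851000 for a new size of 7b000.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model of the loop in lx_remap_anon(): walk mapping base
 * addresses in ascending order, skip those below `threshold`, and place the
 * hint (new_size * 16) bytes below the first mapping at or above it.
 * Returns 0 (no hint) if no mapping qualifies.
 */
uint64_t
pick_hint(const uint64_t *vaddrs, size_t nmap, uint64_t threshold,
    uint64_t new_size)
{
	size_t i;

	for (i = 0; i < nmap; i++) {
		if (vaddrs[i] < threshold)
			continue;

		/* Unsigned subtraction wraps if the mapping is too low. */
		return (vaddrs[i] - (new_size << 4));
	}

	return (0);
}
```

With mappings at { 0x1000, 0x7ffffeeed000 } and a new size of 0x7b000, a threshold of 1 << 3 selects the mapping at 0x1000 and the subtraction wraps to 0xffffffffff851000, which is the hint seen in the failing mmap above. With a threshold of 1 << 31 the first mapping is skipped and the hint becomes 0x7ffffe73d000, comfortably inside the address space.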
 

Comments

Comment by Bot Bot [X]
Created at 2016-03-16T22:06:08.000Z

illumos-joyent commit 28e4c4c (branch master, by Bryan Cantrill)

OS-5252 lx brand: mremap() can fail spuriously with ENOMEM
    Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>