Issue Type: | Bug |
---|---|
Priority: | 4 - Normal |
Status: | Resolved |
Created at: | 2019-01-11T20:46:47.922Z |
Updated at: | 2019-08-06T20:45:20.433Z |
Created by: | Former user |
---|---|
Reported by: | Former user |
Assigned to: | Former user |
Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2019-08-06T20:45:20.415Z)
In running LX on a gcc7-compiled platform, systemd was observed to be enagaged in periodic heavy work. Looking at systemctl revealed some failed units:
root@f184bb79-8aef-c588-c2ec-8a3e33a1ed68:~# systemctl list-units --state=failed UNIT LOAD ACTIVE SUB JOB DESCRIPTION ● dbus.service loaded failed failed start D-Bus System Message Bus ● systemd-random-seed.service loaded failed failed Load/Save Random Seed ● systemd-remount-fs.service loaded failed failed Remount Root and Kernel File Systems ● systemd-journald-dev-log.socket loaded failed failed Journal Socket (/dev/log) ● systemd-journald.socket loaded failed failed Journal Socket LOAD = Reflects whether the unit definition was properly loaded. ACTIVE = The high-level unit activation state, i.e. generalization of SUB. SUB = The low-level unit activation state, values depend on unit type. JOB = Pending job for the unit. 5 loaded units listed. Pass --all to see loaded but inactive units, too. To show all installed unit files use 'systemctl list-unit-files'.
Looking at D-Bus in particular:
root@f184bb79-8aef-c588-c2ec-8a3e33a1ed68:~# systemctl -l status dbus.service ● dbus.service - D-Bus System Message Bus Loaded: loaded (/lib/systemd/system/dbus.service; static; vendor preset: enabled) Active: failed (Result: core-dump) since Fri 2019-01-11 00:05:18 UTC; 9h ago Docs: man:dbus-daemon(1) Process: 13417 ExecStop=/bin/true (code=exited, status=0/SUCCESS) Process: 8474 ExecStart=/usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation (code=dumped, signal=SEGV) Main PID: 8474 (code=dumped, signal=SEGV)
And indeed, we have quite a few coredumps from dbus:
[root@headnode (east-bay-1) ~]# ls -1 /zones/f184bb79-8aef-c588-c2ec-8a3e33a1ed68/cores/*dbus* /zones/f184bb79-8aef-c588-c2ec-8a3e33a1ed68/cores/core.dbus-daemon.6363 /zones/f184bb79-8aef-c588-c2ec-8a3e33a1ed68/cores/core.dbus-daemon.6550 /zones/f184bb79-8aef-c588-c2ec-8a3e33a1ed68/cores/core.dbus-daemon.8474 /zones/f184bb79-8aef-c588-c2ec-8a3e33a1ed68/cores/core.dbus-daemon.84961
Looking at one of these:
[root@headnode (east-bay-1) ~]# mdb /zones/f184bb79-8aef-c588-c2ec-8a3e33a1ed68/cores/core.dbus-daemon.6363 Loading modules: [ libc.so.1 ld.so.1 libc.so.6 ] > $c LMfd`libc.so.1`ascii_strncasecmp+0x10(656e690036706475, 7fffef1d33d7, 5) LMfd`libc.so.1`strncasecmp_l+0x75(656e690036706475, 7fffef1d33d7, 5, 7fffef1ef8c0) LMfd`libc.so.1`strncasecmp+0x35(656e690036706475, 7fffef1d33d7, 5) LMfd`libc.so.1`__priv_getbyname+0x4a(7fffef2b0890, 656e690036706475) LMfd`libc.so.1`priv_getbyname+0x30(656e690036706475) LMfd`libc.so.1`priv_ismember+0x16(7fffef28eb48, 656e690036706475) LMfd`lx_brand.so.1`lx_cap_update_priv+0xb2(7fffef28eb48, 7fffef28eb98) LMfd`lx_brand.so.1`lx_cap_to_priv+0x2f(7fffef28eb90, 7fffef28eb20) LMfd`lx_brand.so.1`lx_capset+0xfb(7fffeefb0724, 7fffeefb072c) LMfd`lx_brand.so.1`lx_emulate+0xc3(7fffef28eca0, 7e, 7fffef28ec70)
Note that we are calling priv_ismember
from lx_cap_update_priv
with a clearly bogus string.
Here's the code for lx_cap_update_priv
:
static long lx_cap_update_priv(priv_set_t *priv, const uint32_t cap[]) { int i, j; boolean_t cap_set; boolean_t priv_set; boolean_t updated = B_FALSE; for (i = 0; i <= LX_CAP_MAX_CHECK; i++) { cap_set = LX_CAP_CAPISSET(i, cap); if (lx_cap_mapping[i] == NULL || i > LX_CAP_MAX_VALID) { /* don't allow setting unsupported caps */ if (cap_set) { /* * CAP_SETPCAP is a special capability, with * varying behavior, that can be used to * control if the process can change other * process's capabilities, or to control moving * capabilities between sets. For now we ignore * this if its passed in. */ if (i == LX_CAP_SETPCAP) { continue; } lx_unsupported("set unsupported capability %d", i); return (-1); } else { continue; } } for (j = 0; lx_cap_mapping[i][j] != NULL; j++) { priv_set = priv_ismember(priv, lx_cap_mapping[i][j]); if (priv_set && !cap_set) { VERIFY0(priv_delset(priv, lx_cap_mapping[i][j])); updated = B_TRUE; } else if (!priv_set && cap_set) { VERIFY0(priv_addset(priv, lx_cap_mapping[i][j])); updated = B_TRUE; } } } if (updated) return (1); else return (0); }
Investigation reveals that this string is sitting off the end of lx_cap_mapping
(in an adjacent symbol):
> LMfd`lx_brand.so.1`lx_cap_mapping,0t41/nap LMfd`lx_brand.so.1`lx_cap_mapping: LMfd`lx_brand.so.1`lx_cap_mapping: LMfd`lx_brand.so.1`lx_cap_mapping: LMfd`lx_brand.so.1`lx_cap_map_chown LMfd`lx_brand.so.1`lx_cap_mapping+8: LMfd`lx_brand.so.1`lx_cap_map_dac_override LMfd`lx_brand.so.1`lx_cap_mapping+0x10: LMfd`lx_brand.so.1`lx_cap_map_dac_read_search LMfd`lx_brand.so.1`lx_cap_mapping+0x18: LMfd`lx_brand.so.1`lx_cap_map_fowner LMfd`lx_brand.so.1`lx_cap_mapping+0x20: LMfd`lx_brand.so.1`lx_cap_map_fsetid LMfd`lx_brand.so.1`lx_cap_mapping+0x28: LMfd`lx_brand.so.1`lx_cap_map_kill LMfd`lx_brand.so.1`lx_cap_mapping+0x30: LMfd`lx_brand.so.1`lx_cap_map_setgid LMfd`lx_brand.so.1`lx_cap_mapping+0x38: LMfd`lx_brand.so.1`lx_cap_map_setuid LMfd`lx_brand.so.1`lx_cap_mapping+0x40: 0 LMfd`lx_brand.so.1`lx_cap_mapping+0x48: LMfd`lx_brand.so.1`lx_cap_map_linux_immutable LMfd`lx_brand.so.1`lx_cap_mapping+0x50: LMfd`lx_brand.so.1`lx_cap_map_bind_service LMfd`lx_brand.so.1`lx_cap_mapping+0x58: 0 LMfd`lx_brand.so.1`lx_cap_mapping+0x60: LMfd`lx_brand.so.1`lx_cap_map_net_admin LMfd`lx_brand.so.1`lx_cap_mapping+0x68: LMfd`lx_brand.so.1`lx_cap_map_net_raw LMfd`lx_brand.so.1`lx_cap_mapping+0x70: LMfd`lx_brand.so.1`lx_cap_map_ipc_lock LMfd`lx_brand.so.1`lx_cap_mapping+0x78: LMfd`lx_brand.so.1`lx_cap_map_ipc_owner LMfd`lx_brand.so.1`lx_cap_mapping+0x80: 0 LMfd`lx_brand.so.1`lx_cap_mapping+0x88: 0 LMfd`lx_brand.so.1`lx_cap_mapping+0x90: LMfd`lx_brand.so.1`lx_cap_map_sys_chroot LMfd`lx_brand.so.1`lx_cap_mapping+0x98: 0 LMfd`lx_brand.so.1`lx_cap_mapping+0xa0: 0 LMfd`lx_brand.so.1`lx_cap_mapping+0xa8: LMfd`lx_brand.so.1`lx_cap_map_sys_admin LMfd`lx_brand.so.1`lx_cap_mapping+0xb0: 0 LMfd`lx_brand.so.1`lx_cap_mapping+0xb8: LMfd`lx_brand.so.1`lx_cap_map_sys_nice LMfd`lx_brand.so.1`lx_cap_mapping+0xc0: LMfd`lx_brand.so.1`lx_cap_map_sys_resource LMfd`lx_brand.so.1`lx_cap_mapping+0xc8: 0 LMfd`lx_brand.so.1`lx_cap_mapping+0xd0: 0 LMfd`lx_brand.so.1`lx_cap_mapping+0xd8: 0 LMfd`lx_brand.so.1`lx_cap_mapping+0xe0: 0 LMfd`lx_brand.so.1`lx_cap_mapping+0xe8: LMfd`lx_brand.so.1`lx_cap_map_audit_write LMfd`lx_brand.so.1`lx_cap_mapping+0xf0: LMfd`lx_brand.so.1`lx_cap_map_audit_control LMfd`lx_brand.so.1`lx_cap_mapping+0xf8: 0 LMfd`lx_brand.so.1`lx_cap_mapping+0x100: 0 LMfd`lx_brand.so.1`lx_cap_mapping+0x108: 0 LMfd`lx_brand.so.1`lx_cap_mapping+0x110: 0 LMfd`lx_brand.so.1`lx_cap_mapping+0x118: 0 LMfd`lx_brand.so.1`lx_cap_mapping+0x120: 0 0x7fffef4fada8: 0 0x7fffef4fadb0: 0 0x7fffef4fadb8: 0 LMfd`lx_brand.so.1`nca: 0x7fffef4e87c1
And:
> 0x7fffef4e87c1/K 0x7fffef4e87c1: 656e690036706475
So we've somehow managed to walk off the end of lx_cap_mapping
, which doesn't seem possible from the code. Looking at the generated assembly it became clear that the generated assembly is incorrect -- which is in turn due to this undefined load:
... for (i = 0; i <= LX_CAP_MAX_CHECK; i++) { cap_set = LX_CAP_CAPISSET(i, cap); if (lx_cap_mapping[i] == NULL || i > LX_CAP_MAX_VALID) { /* don't allow setting unsupported caps */ ...
lx_cap_mapping
is LX_CAP_MAX_VALID + 1
in size; the load is of lx_cap_mapping[i]
is technically undefined for sizes between LX_CAP_MAX_VALID + 1
and LX_CAP_MAX_CHECK
. That said, this shouldn't affect the correctness of the code (much): assuming that lx_cap_mapping
is mapped from LX_CAP_MAX_VALID + 1
to LX_CAP_MAX_CHECK
, the load doesn't result in misbehavior because i
is also greater than LX_CAP_MAX_VALID
. But for some reason (read: compiler bug), the undefined behavior is causing the compiler to optimize away all other paths -- and it will iterate the loop well past LX_CAP_MAX_CHECK
and into adjacent memory!
To see this in a much smaller example, take this C program:
#define NULL ((void *)0) static char *arr[2] = { "nasal", "demons" }; long func() { int i; for (i = 0; i <= 2; i++) { if (arr[i] == NULL && i == 0) return (0xbad); } return (0xfad); }
Here is the code when generated with -O2
and GCC7:
% gcc --version gcc (GCC) 7.3.0 Copyright (C) 2017 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. % gcc -O2 -c ub.c % dis ub.o disassembly for ub.o section .text func() func: 55 pushl %ebp func+0x1: b8 ad 0b 00 00 movl $0xbad,%eax func+0x6: 89 e5 movl %esp,%ebp func+0x8: 5d popl %ebp func+0x9: c3 ret
Which definitely doesn't seem right at all! Fortunately, it's easy for us to rectify the undefined behavior here (by flipping the order in the conditional), which gets us out of the compiler issue.
We probably need to think about the no-aggressive option outside of illumos-joyent as well. e.g. it's entirely possible that our very old openssl isn't GCC7-friendly...
illumos-joyent commit 5b248548c9ca0108aa36e7e3db7fbc46077bcf4b (branch master, by Bryan Cantrill)
OS-7498 __ctype_mask[EOF] has been working by accident
OS-7511 undefined behavior in lx_cap_update_priv() turns gcc7 demonic
OS-7517 -faggressive-loop-optimizations is too aggressive
OS-7518 array over-read in has_saved_fp()
OS-7535 allow platform builds on base-64 image
OS-7536 stop shipping poold(1M) and its cohorts
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: John Levon <john.levon@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>