Issue Type: | Bug |
---|---|
Priority: | 4 - Normal |
Status: | Resolved |
Created at: | 2017-10-30T18:13:25.000Z |
Updated at: | 2019-05-03T13:29:13.901Z |
Created by: | Former user |
---|---|
Reported by: | Jonathan Perkin |
Assigned to: | Former user |
Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2019-04-02T10:35:04.982Z)
2019-04-11 Pete Hornberger (Release Date: 2019-04-11)
While implementing CTF across pkgsrc I ran into an issue where ctfconvert would abort attempting to convert libcrypto.so from OpenSSL.
$ LD_PRELOAD=/lib/libumem.so UMEM_DEBUG=default ctfconvert-altexec -k libcrypto.so Abort (core dumped)
Digging into the dump it appears that we are trying to free from within an allocation rather than the allocation itself:
$ mdb /cores/core.ctfconvert-altex.62959 Loading modules: [ libumem.so.1 libc.so.1 libavl.so.1 ld.so.1 ]
> $C 08046968 libc.so.1`_lwp_kill+0x15(1, 6, 0, 1, fead2000, feab065b) 08046988 libc.so.1`raise+0x2b(6, ffffffff, 80469b8, fead2000) 080469a8 libumem.so.1`umem_do_abort+0x2b(fead2000, 0, 80469e8, feaa67fb, feab065b, feab08dd) 080469b8 libumem.so.1`umem_err_recoverable+0x5a(feab065b, feab08dd, 81765ec, feab089b, a5, fef22814) 080469e8 libumem.so.1`process_free+0xbf(81765ec, 1, 0, 817662c) 08046a08 libumem.so.1`umem_malloc_free+0x1a(81765ec, 8046c55, 8047c33, 21) 08046a28 libctf.so.1`ctf_free+0x1b(81765ec, 74, fef3c610, 0, 8046b0c, 81765ec) 08046ab8 libctf.so.1`ctf_dwarf_init_die+0x22b(3, 8139e08, 81765ec, d, 8046c34, 1000) 08046b38 libctf.so.1`ctf_dwarf_convert+0x1ef(3, 8139e08, 4, 8047c3c, 8046b78, 8046c34) 08046b98 libctf.so.1`ctf_elfconvert+0xe3(3, 8139e08, 0, 4, 0, 8047c3c) 08046be8 libctf.so.1`ctf_fdconvert+0x5d(3, 0, 4, 0, 8047c3c, 8046c34) 08047c58 main+0x27f(8047c4c, feeda768, 8047c88, 805153b, 3, 8047c94) 08047c88 _start+0x83(3, 8047d8c, 8047dbf, 8047dc2, 0, 8047dd4)
> 81765ec::whatis 81765ec is 8176000+5ec, allocated from umem_alloc_73728: ADDR BUFADDR TIMESTAMP THREAD CACHE LASTLOG CONTENTS 8189f58 8176000 105289eac5b4db2 1 811d010 0 0 libumem.so.1`umem_cache_alloc_debug+0x1fe libumem.so.1`umem_cache_alloc+0x18f libumem.so.1`umem_alloc+0x50 libumem.so.1`umem_malloc+0x36 libctf.so.1`ctf_alloc+0x1b libctf.so.1`ctf_dwarf_convert+0x124 libctf.so.1`ctf_elfconvert+0xe3 libctf.so.1`ctf_fdconvert+0x5d main+0x27f _start+0x83
Reading the code backs up this theory, we first allocate an array of dies then iterate through them where ctf_dwarf_init_die()
tries to free an individual die:
ctf_conv_status_t ctf_dwarf_convert(int fd, Elf *elf, uint_t nthrs, int *errp, ctf_file_t **fpp, char *errmsg, size_t errlen) { ... cdies = ctf_alloc(sizeof (ctf_die_t) * ndies); ... for (i = 0; i < ndies; i++) { cdp = &cdies[i]; ... ret = ctf_dwarf_init_die(fd, elf, &cdies[i], i, errmsg, errlen);
The final piece of the puzzle is why this shared object triggers the failure when others do not. The code path that is followed is this one:
if (child == NULL) { (void) snprintf(errbuf, errlen, "file does not contain DWARF data\n"); avl_destroy(&cdp->cd_map); ctf_free(cdp, sizeof (ctf_die_t)); return (ECTF_CONVBKERR); }
Analysis of libcrypto.so shows this die's filename is x86_64cpuid.o which is a manually assembled object file from x86_64cpuid.s rather than going via the compiler driver like the other objects. Rather than bailing out of converting libcrypto.so due to one missing die it seems like it might be a better idea to just skip that die and process the rest, i.e. remove that test entirely.
I pushed a newer revision of this fix, the previous one was missing a few changes (the downside of trying to create separate bugs for the same code). Notably we now explicitly bzero()
the ctf_die_t
's first so that all the NULL
checks in ctf_dwarf_free_die
don't check against uninitialized pointers.
With the latest diff the output goes from:
$ ctfconvert-altexec -k libcrypto.so.1.0.0 Segmentation Fault (core dumped)
to
$ ctfconvert-altexec -k libcrypto.so.1.0.0 ctfconvert-altexec: CTF conversion failed: file does not contain DWARF data
I am in the process of also testing it in a full smartos-live build.
Just to note since it wasn't anywhere else (I noticed this during review):
The memory for all the ctf_die_t
's was allocated as one chunk, and each ctf_die_t
doesn't get fully initialized until it is processed. On failure, ctf_dwarf_convert()
attempts to free all the ctf_die_t
's, so it was possible any failure mid-way through processing the ctf_die_t
's would cause ctf_dwarf_free_die()
to go off into the weeds attempting to free a ctf_die_t
that hadn't been initialized yet – including a work queue error or (with OS-6485) a merge error.
The fix here should address all of these issues.
illumos-joyent commit 495dfd0ac49c1326af24c839ba967d1e8264782d (branch master, by John Levon)
OS-6428 bad free in ctf_dwarf_init_die
OS-6486 ctfconvert -i never converts
OS-6488 ctfconvert should handle empty dies
OS-6505 Improve ctfconvert error messages
OS-7639 ctfconvert -i option is mis-handled
OS-7663 ctf_dwarf_convert_type() relies on un-initialized id
OS-7688 shouldn't build gcore.c as part of kmdb
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>