OS-6428: bad free in ctf_dwarf_init_die

Details

Issue Type:Bug
Priority:4 - Normal
Status:Resolved
Created at:2017-10-30T18:13:25.000Z
Updated at:2019-05-03T13:29:13.901Z

People

Created by:Former user
Reported by:Jonathan Perkin
Assigned to:Former user

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2019-04-02T10:35:04.982Z)

Fix Versions

2019-04-11 Pete Hornberger (Release Date: 2019-04-11)

Description

While implementing CTF across pkgsrc I ran into an issue where ctfconvert would abort attempting to convert libcrypto.so from OpenSSL.

$ LD_PRELOAD=/lib/libumem.so UMEM_DEBUG=default ctfconvert-altexec -k libcrypto.so
Abort (core dumped)

Digging into the dump it appears that we are trying to free from within an allocation rather than the allocation itself:

$ mdb /cores/core.ctfconvert-altex.62959
Loading modules: [ libumem.so.1 libc.so.1 libavl.so.1 ld.so.1 ]
> $C
08046968 libc.so.1`_lwp_kill+0x15(1, 6, 0, 1, fead2000, feab065b)
08046988 libc.so.1`raise+0x2b(6, ffffffff, 80469b8, fead2000)
080469a8 libumem.so.1`umem_do_abort+0x2b(fead2000, 0, 80469e8, feaa67fb, feab065b, feab08dd)
080469b8 libumem.so.1`umem_err_recoverable+0x5a(feab065b, feab08dd, 81765ec, feab089b, a5, fef22814)
080469e8 libumem.so.1`process_free+0xbf(81765ec, 1, 0, 817662c)
08046a08 libumem.so.1`umem_malloc_free+0x1a(81765ec, 8046c55, 8047c33, 21)
08046a28 libctf.so.1`ctf_free+0x1b(81765ec, 74, fef3c610, 0, 8046b0c, 81765ec)
08046ab8 libctf.so.1`ctf_dwarf_init_die+0x22b(3, 8139e08, 81765ec, d, 8046c34, 1000)
08046b38 libctf.so.1`ctf_dwarf_convert+0x1ef(3, 8139e08, 4, 8047c3c, 8046b78, 8046c34)
08046b98 libctf.so.1`ctf_elfconvert+0xe3(3, 8139e08, 0, 4, 0, 8047c3c)
08046be8 libctf.so.1`ctf_fdconvert+0x5d(3, 0, 4, 0, 8047c3c, 8046c34)
08047c58 main+0x27f(8047c4c, feeda768, 8047c88, 805153b, 3, 8047c94)
08047c88 _start+0x83(3, 8047d8c, 8047dbf, 8047dc2, 0, 8047dd4)
> 81765ec::whatis
81765ec is 8176000+5ec, allocated from umem_alloc_73728:
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
         8189f58          8176000  105289eac5b4db2                1
                          811d010                0                0
                 libumem.so.1`umem_cache_alloc_debug+0x1fe
                 libumem.so.1`umem_cache_alloc+0x18f
                 libumem.so.1`umem_alloc+0x50
                 libumem.so.1`umem_malloc+0x36
                 libctf.so.1`ctf_alloc+0x1b
                 libctf.so.1`ctf_dwarf_convert+0x124
                 libctf.so.1`ctf_elfconvert+0xe3
                 libctf.so.1`ctf_fdconvert+0x5d
                 main+0x27f
                 _start+0x83

Reading the code backs up this theory, we first allocate an array of dies then iterate through them where ctf_dwarf_init_die() tries to free an individual die:

ctf_conv_status_t
ctf_dwarf_convert(int fd, Elf *elf, uint_t nthrs, int *errp, ctf_file_t **fpp,
    char *errmsg, size_t errlen)
{
...
        cdies = ctf_alloc(sizeof (ctf_die_t) * ndies);
...
        for (i = 0; i < ndies; i++) {
                cdp = &cdies[i];
...
                ret = ctf_dwarf_init_die(fd, elf, &cdies[i], i, errmsg, errlen);

The final piece of the puzzle is why this shared object triggers the failure when others do not. The code path that is followed is this one:

                if (child == NULL) {
                        (void) snprintf(errbuf, errlen,
                            "file does not contain DWARF data\n");
                        avl_destroy(&cdp->cd_map);
                        ctf_free(cdp, sizeof (ctf_die_t));
                        return (ECTF_CONVBKERR);
                }

Analysis of libcrypto.so shows this die's filename is x86_64cpuid.o which is a manually assembled object file from x86_64cpuid.s rather than going via the compiler driver like the other objects. Rather than bailing out of converting libcrypto.so due to one missing die it seems like it might be a better idea to just skip that die and process the rest, i.e. remove that test entirely.

Comments

Comment by Jonathan Perkin
Created at 2017-12-05T12:45:38.061Z

I pushed a newer revision of this fix, the previous one was missing a few changes (the downside of trying to create separate bugs for the same code).  Notably we now explicitly bzero() the ctf_die_t's first so that all the NULL checks in ctf_dwarf_free_die don't check against uninitialized pointers.

With the latest diff the output goes from:

$ ctfconvert-altexec -k libcrypto.so.1.0.0
Segmentation Fault (core dumped)

to

$ ctfconvert-altexec -k libcrypto.so.1.0.0
ctfconvert-altexec: CTF conversion failed: file does not contain DWARF data

I am in the process of also testing it in a full smartos-live build.


Comment by Former user
Created at 2017-12-08T16:45:36.281Z
Updated at 2017-12-08T16:46:18.981Z

Just to note since it wasn't anywhere else (I noticed this during review):

The memory for all the ctf_die_t's was allocated as one chunk, and each ctf_die_t doesn't get fully initialized until it is processed. On failure, ctf_dwarf_convert() attempts to free all the ctf_die_t's, so it was possible any failure mid-way through processing the ctf_die_t's would cause ctf_dwarf_free_die() to go off into the weeds attempting to free a ctf_die_t that hadn't been initialized yet – including a work queue error or (with OS-6485) a merge error.

The fix here should address all of these issues.


Comment by Former user
Created at 2019-04-01T18:32:37.883Z

See OS-7639 for testing notes


Comment by Jira Bot
Created at 2019-04-02T10:29:17.092Z

illumos-joyent commit 495dfd0ac49c1326af24c839ba967d1e8264782d (branch master, by John Levon)

OS-6428 bad free in ctf_dwarf_init_die
OS-6486 ctfconvert -i never converts
OS-6488 ctfconvert should handle empty dies
OS-6505 Improve ctfconvert error messages
OS-7639 ctfconvert -i option is mis-handled
OS-7663 ctf_dwarf_convert_type() relies on un-initialized id
OS-7688 shouldn't build gcore.c as part of kmdb
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>