OS-3103: lx_ldb_get_dyns32 doesn't properly handle NULL pointer inputs


Issue Type:Bug
Priority:4 - Normal
Created at:2014-06-12T21:18:13.000Z
Updated at:2019-08-29T00:28:13.934Z


Created by:Former user
Reported by:Former user
Assigned to:Former user


Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2014-06-23T17:32:15.000Z)

Fix Versions

2014-06-26 Robocop (Release Date: 2014-06-26)




In the lx brand, file usr/src/lib/brand/lx/librtld_db/common/lx_librtld_db.c has function lx_ldb_get_dyns32 used by mdb as part of collecting its data. Part of mdb's code path for lx brand cores passes through fake_elf32, which passes NULL into dynpp_sz, causing a null pointer at line 210.


Comment by Former user
Created at 2014-06-13T17:22:19.000Z

illumos-joyent commit 3d496fe (branch master, by DJ Hoffman)

OS-3103 lx_ldb_get_dyns32 doesn't properly handle NULL pointer inputs

Comment by Former user
Created at 2014-06-13T18:01:05.000Z

illumos-joyent commit 9c8327a (branch master, by DJ Hoffman)

OS-3103 lx_ldb_get_dyns32 doesn't properly handle NULL pointer inputs (back out)

Comment by Former user
Created at 2014-06-19T16:39:10.000Z

source file for code that was having file size limit exceeded problems in lx brand zones.

Comment by Former user
Created at 2014-06-19T16:45:13.000Z
Updated at 2014-06-19T17:26:20.000Z

This investigation started with the core dump of a test run in lx brand zones for preadv and pwritev. The attacked tests64.c file was compiled in a linux vm and run in an lx brand zone

root@ubuntu12:~/lx-syscall-tests/syscall-tests# ./tests64
File size limit exceeded (core dumped)

Interestingly, running mdb on this core dump caused mdb to segmentation fault:

[root@headnode (coal) /zones/ub12/cores]# mdb ./core.tests64.81667 
Segmentation Fault (core dumped)

Using mdb on mdb's core dump gave a stack trace. I also disassembled the function mdb crashed in:

[root@headnode (coal) /cores]# mdb core.mdb.81729 
mdb: debugger failed with error: fatal signal received
Loading modules: [ libumem.so.1 libc.so.1 libproc.so.1 ld.so.1 ]
> $c
lx_librtld_db.so.1`lx_ldb_get_dyns32+0xa3(8134b60, 8048000, 804565c, 0)
librtld_db.so.1`rd_get_dyns+0x31(81125b0, 8048000, 804565c, 0)
libproc.so.1`fake_elf32+0x47(8129008, 81c7688, 8048000, 80456b8, 8, 80456ec)
libproc.so.1`fake_elf+0xa0(8129008, 81c7688, 8129008, fef16000, 8129008, 8129008)
libproc.so.1`build_fake_elf+0x1f(8129008, 81c7688, 8045800, 80457e0, 80457e4, 8045840)
libproc.so.1`Pbuild_file_symtab+0x18a(8129008, 81c7688, 40, 400)
libproc.so.1`core_iter_mapping+0x462(8046518, 8129008, 8046524, 10)
lx_librtld_db.so.1`lx_ldb_loadobj_iter32+0x109(8134b60, fecd1969, 8129008, 80465ac)
librtld_db.so.1`_rd_loadobj_iter32+0x49(81125b0, fecd1969, 8129008, 390, fecd17cc, fecd1969)
librtld_db.so.1`rd_loadobj_iter+0x2d(81125b0, fecd1969, 8129008, 80467e0)
libproc.so.1`Pfgrab_core+0xc15(5, 80468f0, 8046b3c, 0, 6f63002e, 742e6572)
libproc.so.1`proc_grab_common+0x347(8047e3c, 80468f0, 3, 0, 8046b3c, 0)
libproc.so.1`proc_arg_xgrab+0x1f(8047e3c, 0, 3, 0, 8046b3c, 0)
mdb_proc_tgt_create+0x1ad(8112f10, 2, 8047030, 6f632f2e)
mdb_tgt_create+0x99(8090872, 0, 2, 8047030, 0, 8047e3c)
main+0x12a2(8047d1c, fef20888, 8047d5c, 8063db7, 2, 8047d68)
_start+0x83(2, 8047e38, 8047e3c, 0, 8047e51, 8047e5d)
> <rip::dis
mdb: variable 'rip' is not defined near "<rip"
> <eip::dis
lx_librtld_db.so.1`lx_ldb_get_dyns32+0x82:      pushl  %eax
lx_librtld_db.so.1`lx_ldb_get_dyns32+0x83:      call   -0x204   <PLT=ps_plog>
lx_librtld_db.so.1`lx_ldb_get_dyns32+0x88:      movl   $0x0,%eax
lx_librtld_db.so.1`lx_ldb_get_dyns32+0x8d:      addl   $0x10,%esp
lx_librtld_db.so.1`lx_ldb_get_dyns32+0x90:      jmp    +0x101   <lx_librtld_db.so.1`lx_ldb_get_dyns32+0x196>
lx_librtld_db.so.1`lx_ldb_get_dyns32+0x95:      movl   -0x50(%ebp),%edx
lx_librtld_db.so.1`lx_ldb_get_dyns32+0x98:      movl   0x10(%ebp),%eax
lx_librtld_db.so.1`lx_ldb_get_dyns32+0x9b:      movl   %edx,(%eax)
lx_librtld_db.so.1`lx_ldb_get_dyns32+0x9d:      movl   -0x54(%ebp),%edi
lx_librtld_db.so.1`lx_ldb_get_dyns32+0xa0:      movl   0x14(%ebp),%eax
lx_librtld_db.so.1`lx_ldb_get_dyns32+0xa3:      movl   %edi,(%eax)
> $?
no process
SIGSEGV: Segmentation Fault
%cs = 0x0043            %eax = 0x00000000 
%ds = 0x004b            %ebx = 0xfe793000 
%ss = 0x004b            %ecx = 0x0804533c 
%es = 0x004b            %edx = 0x08137648 
%fs = 0x0000            %esi = 0x08048000 
%gs = 0x01c3            %edi = 0x000000f0 

 %eip = 0xfe780d9f lx_librtld_db.so.1`lx_ldb_get_dyns32+0xa3
 %ebp = 0x08045588
%kesp = 0x00000000

%eflags = 0x00010246
  id=0 vip=0 vif=0 ac=0 vm=0 rf=1 nt=0 iopl=0x0

   %esp = 0x08045530
%trapno = 0xe
   %err = 0x6

From here I saw that the segmentation fault was caused by trying to move into the location pointed at by %eax, which was 0 at the time.

Orienting myself based on the location of calls in the assembly with the source code of the file usr/src/lib/brand/lx/librtld_db/common/lx_librtld_db.c it became clear that lines 208 and 209 of this file were the location where mdb had crashed. Looking at these lines:

 *dynpp = dynp;
 *dynpp_sz = dynp_sz;

It is clear that is dynpp or dynpp_sz is NULL, then this function will cause a segfault.

Those two values happen to be passed in as arguments to the function.

Tracing these back where they came from, it seems that rd_get_dyns is mostly a function that passes through its arguments through to other functions without processing (lines 117, 122, and 126)

112 rd_err_e
113 rd_get_dyns(rd_agent_t *rap, psaddr_t addr, void **dynpp, size_t *dynpp_sz)
114 {   
115   if (rap->rd_helper.rh_ops != NULL)
116     return (rap->rd_helper.rh_ops->rho_get_dyns(
117         rap->rd_helper.rh_data, addr, dynpp, dynpp_sz));
119 #ifdef _LP64
120   if (rap->rd_dmodel == PR_MODEL_LP64)
121     return (_rd_get_dyns64(rap,
122         addr, (Elf64_Dyn **)dynpp, dynpp_sz));
123   else
124 #endif
125     return (_rd_get_dyns32(rap,
126         addr, (Dyn **)dynpp, dynpp_sz));
127 }  

Because of that, I don't believe any changes are necessary in rd_get_dyns.

rd_get_dyns is only called in one place in fake_elf32

264     if (rd_get_dyns(P->rap, addr, (void **)&dp, NULL) != RD_OK)

As we can see, it explicitly puts NULL in as the final argument to rd_get_dyns, which means dynpp_sz will be NULL and cause a segfault in lx_ldb_get_dyns32 when that is called. The question here is, should that be a valid value to pass into rd_get_dyns?

Without looking at other similar functions, simply because I don't know where to look, I'd argue yes. Neither the code in rd_get_dyns nor in lx_ldb_get_dyns32 ever reads these values, they are purely return arguments. It functions should not be forced to allocate space for a pointer that they subsequently never use, so presenting the option of passing NULL to ignore that return argument seems like the best interface. This is common practice in other standard libraries I have familiarity with, specifically calls such as sigaction(2) where if, the third argument is not NULL, it is written to contain the previous sigaction struct. The point being that the third argument can be NULL, in which case this doesn't happen.

Comment by Former user
Created at 2014-06-19T17:57:22.000Z

Adding more info on whether the NULL argument itself makes sense:

In a separate case in rd_get_dyns, it calls the function _rd_get_dyns32 which contains the lines

447   *dynpp = dynp;
448   if (dynpp_sz != NULL)
449     *dynpp_sz = phdr.p_filesz;

Where dynpp, and dynpp_sz are the same arguments as before. This provides support for the idea that at least dynpp_sz should be allowed to be NULL.

Comment by Former user
Created at 2014-06-23T17:32:11.000Z

illumos-joyent commit 5eeac42 (branch master, by DJ Hoffman)

OS-3103 lx_ldb_get_dyns32 doesn't properly handle NULL pointer inputs
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>

Comment by Former user
Created at 2014-07-02T01:44:17.000Z

Turns out real time signals are just broken in the build I have. see OS-2810.