OS-6485: CTF conversion fails with large files

Details

Issue Type:	Bug
Priority:	4 - Normal
Status:	Open
Created at:	2017-12-05T09:55:19.274Z
Updated at:	2019-11-07T21:22:09.035Z

People

Created by:	Jonathan Perkin
Reported by:	Jonathan Perkin
Assigned to:	Jonathan Perkin

Description

During the work to CTF-convert pkgsrc, libicudata.so exposed memory-scaling issues in ctfconvert. It is a 29MB shared library containing 3,449 DIEs.

The current conversion process allocates memory as follows:

A new ctf_die_t is created for each DIE during the initialisation process.
Each ctf_die_t holds a DWARF handle open on the input file.
Each ctf_die_t includes a new ctf_file_t allocation, each of which mmap()s its own private copy of the CTF data, symtab, and strtab from the object.
During the merge process, for each DIE, an extra ctf_file_t is allocated, again with its own mmap()ed private copies, as the destination for the merge output.
These allocations are only freed at the end of the entire conversion process.
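To make the scaling concrete, here is a minimal C sketch that models the scheme above by counting the ctf_file_t allocations (each with its own private copy of the CTF data, symtab, and strtab) that are live at the same time. It is not ctfconvert code: alloc_model_t, model_alloc() and original_peak() are hypothetical, and the model ignores the per-DIE DWARF handles.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model, not ctfconvert code: count the ctf_file_t
 * allocations (each holding private copies of the CTF data, symtab
 * and strtab) that are live at the same time.
 */
typedef struct alloc_model {
	size_t am_live;		/* ctf_file_t allocations currently held */
	size_t am_peak;		/* high-water mark of am_live */
} alloc_model_t;

static void
model_alloc(alloc_model_t *m)
{
	m->am_live++;
	if (m->am_live > m->am_peak)
		m->am_peak = m->am_live;
}

/* Peak live allocations when converting ndies DIEs with the original scheme. */
static size_t
original_peak(size_t ndies)
{
	alloc_model_t m = { 0, 0 };

	/* Initialisation: every ctf_die_t gets its own ctf_file_t up-front. */
	for (size_t i = 0; i < ndies; i++)
		model_alloc(&m);

	/* Merge: one extra destination ctf_file_t per DIE. */
	for (size_t i = 0; i < ndies; i++)
		model_alloc(&m);

	/* Nothing has been freed yet, so everything is still live. */
	assert(m.am_live == m.am_peak);

	/* Only now, at the end of the whole conversion, is it all released. */
	m.am_live = 0;
	return (m.am_peak);
}
```

Under this model original_peak(3449) is 6898 for libicudata.so: two private copies per DIE, all held until the end, so peak memory grows linearly with the DIE count.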

With all these allocations, a 32-bit ctfconvert process runs out of available memory while processing libicudata.so.

While investigating solutions for this issue, we should bear in mind objects much larger than libicudata.so, such as libLLVM.so, which as of version 3.7 is 681MB and contains 43,149 DIEs. A fix should be able to handle files that large, even though ctfconvert cannot currently process libLLVM.so because it is C++.

Comments

Comment by Jonathan Perkin
Created at 2017-12-05T10:25:33.430Z
Updated at 2017-12-05T10:30:17.570Z

With the proposed patch the conversion process is changed to be as follows:

Instead of allocating a full ctf_die_t for every DIE up-front, we instead defer full initialisation to the main conversion process in ctf_dwarf_convert_one().
DIEs are processed in batches, defaulting to a batch size of 256 (configurable via -b on the command line if necessary).
The DIEs in each batch are converted and then merged down to a single ctf_file_t.
After processing a batch, the input DIEs for that batch are freed.
The merged ctf_file_t is added as an input to the next batch.
The process continues until we have processed all batches and end with a final merged ctf_file_t, or a failure.
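A minimal C sketch of the batched loop, again counting live ctf_file_t allocations, is below. It is a model rather than the patch itself: batched_peak() is hypothetical, and it assumes that each DIE still contributes one converted ctf_file_t and one merge destination, and that after each batch everything except the single merged result is freed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the batched scheme, not the patch itself.
 * Assumes each DIE contributes one converted ctf_file_t plus one merge
 * destination (as in the original scheme), and that once a batch has
 * been merged only the single merged result survives.
 */
static size_t
batched_peak(size_t ndies, size_t batchsize)
{
	size_t live = 0, peak = 0;
	int have_merged = 0;	/* merged result carried over from last batch */

	assert(batchsize > 0);

	for (size_t start = 0; start < ndies; start += batchsize) {
		size_t n = ndies - start;

		if (n > batchsize)
			n = batchsize;

		/* Deferred initialisation: only this batch's DIEs are converted. */
		live += n;
		/* Merge destinations for this batch's DIEs. */
		live += n;
		if (live > peak)
			peak = live;

		/* Keep one destination as the merged result; free the rest. */
		live -= 2 * n - 1;
		/* The previous merged result has been folded into the new one. */
		if (have_merged)
			live -= 1;
		have_merged = 1;
	}
	return (peak);
}
```

Under these assumptions batched_peak(3449, 256) is 513 (2 × 256 + 1, independent of the DIE count), whereas batched_peak(3449, 4096) is 6898, i.e. two allocations for every DIE, because all 3,449 DIEs then fall into a single batch.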

The default batchsize of 256 is based on a few constraints:

Processing DIEs in multiple batches means that ctf_id_t values will differ from those generated by a previous ctfconvert. If we choose a batchsize larger than the number of DIEs in most objects, we avoid changing ctf_id_t values for those objects. Whilst mostly cosmetic, it's still nice to avoid differences where possible.
Performance goes up as the batchsize increases, at least when using the default of 4 threads.
The batchsize needs to be well below the number of DIEs that can be processed without hitting memory limits.
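The first and third constraints pull in opposite directions, which the number of batches, ceil(ndies / batchsize), makes clear. A small C helper to compute it (nbatches() is hypothetical, not a ctfconvert function):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of batches needed to process ndies DIEs,
 * i.e. ceil(ndies / batchsize) computed in integer arithmetic.
 */
static size_t
nbatches(size_t ndies, size_t batchsize)
{
	assert(batchsize > 0);
	return ((ndies + batchsize - 1) / batchsize);
}
```

For libicudata.so, nbatches(3449, 256) is 14 and nbatches(3449, 4096) is 1. Any batchsize at or above an object's DIE count gives a single batch, which keeps ctf_id_t values unchanged but presumably gives none of the memory saving. libLLVM.so's 43,149 DIEs would need 169 batches at the default of 256.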

Timing a patched ctfconvert on libicudata.so gave the following conversion times:

Batchsize	Time (seconds)
16	45
32	30
64	23
128	19
256	17
512	16
1024	14
2048	13
4096	Failed (`ENOMEM`)

Further analysis across pkgsrc may be helpful to determine whether we should consider changing the default.

Comment by Jonathan Perkin
Created at 2017-12-05T13:15:33.726Z

For the record, because both tickets change similar code and are somewhat interdependent, this ticket must be rebased and rechecked after OS-6428 has been pushed. In particular, as this change pre-initialises cdp->cd_elf, the new check in OS-6428's ctf_dwarf_free_dies becomes useless, and we might want a cleaner way to avoid trying to free DIEs that have already been freed.

Comment by Former user
Created at 2017-12-05T22:00:59.230Z

Looking at this more, I think the way we're processing the symbol table is fine, but what we're doing with it isn't. We should probably try to convert it and ignore failures in those DIEs.

Comment by Jonathan Perkin
Created at 2017-12-06T12:37:29.995Z

I guess you meant this comment for a different ticket?
