Symbols, debuginfo and sources

A package is more than a binary - make it observable

'h' for help, '2' for inline notes, 'p' for presenter notes

Presenter Notes

HTML5 slides. Created in python-landslide. For more background info see presenter notes. Press 'h'.

Mark Wielaard

mark@klomp.org
mjw@gnu.org
mjw@redhat.com

https://gnu.wildebeest.org/blog/mjw/

Presenter Notes

Most important slide. Please contact me. Work upstream on elfutils and valgrind. elfutils used by systemtap, systemd, perf. Happy to work on gdb, gcc, binutils etc. Work for Red Hat (which might explain some of my biases).

Goal

Making sure rich symbols, debuginfo and sources are always available.
Get them wherever they are. Or have a standard way to get them.

Presenter Notes

Seamlessly go from binary back to code. Limited to "native" ELF code. Stuff you build with GCC. Some recommendations. Please do/use this. Hope useful for others. Some questions. How would you do this? Hope to get feedback. Again. Happy to help implement this to make things easy everywhere.

I know how most of this stuff works in Fedora, but need help with other distros.

Why?

Code is never perfect
Real issues always happen in production

Presenter Notes

What is happening? Every user really is a "debugger". You want to observe (trace, profile) the system as it is running. Optimized code really is different from "debug code". Get useful backtraces. Analyse a core file "off-line".

Having everything ready makes recreating a "debug environment" that precisely matches the production environment so much easier.

Presenter Notes

Context separation, sandboxing, pinning memory highly recommended. Useful security measures. Run as many things as possible in different contexts, with minimal privileges. Use selinux policies or syscall sandboxing to prevent programs from doing "evil" things. "deny ptrace" however breaks things. It isn't just "ptrace". It doesn't just block "debuggers". It blocks most inter-process synchronisation, signaling, debugging, tracing and profiling by normal (unprivileged) users. Both tried in Fedora and were disabled in the end. With ptrace deny/restricted scope you can no longer observe your own processes. So you also cannot drop privileges (debugging/profiling/tracing as root!). Programs are still running in the same user/security context, so they can manipulate files, insert preload code to disable, etc.

build-ids

Theoretically all we need
Everybody gets this right (mostly *)

*) Make it a hard failure when they are not there.

Presenter Notes

ELF note. SHA1 hash of code sections in ELF image. Put there by the linker. Unique identifier. In each ELF image, executable, shared library. Loaded in memory. Put in core files.

Fedora errors out when no build-id is found. Debian only warns.

If your build is reproducible, then the build-id will be the same. Stripping symbols or adding debuginfo should not change it.

If you know the build-ids and the addresses where the ELF images are/were loaded, then you have enough information to match any address to the original source.
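A minimal sketch of how a tool could pull the build-id out of raw ELF note data, assuming the standard note layout (three 32-bit words for name size, descriptor size and type, followed by the 4-byte-aligned name and descriptor). The function name and the hand-built example note are hypothetical and only for illustration; real tools such as elfutils do considerably more validation.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type used for GNU build-ids


def _align4(n):
    """Round n up to the next multiple of 4 (note fields are 4-byte aligned)."""
    return (n + 3) & ~3


def find_build_id(notes, byteorder="<"):
    """Return the build-id as a hex string, or None if no such note exists.

    `notes` is the raw content of a note section or PT_NOTE segment.
    """
    offset = 0
    while offset + 12 <= len(notes):
        namesz, descsz, ntype = struct.unpack_from(byteorder + "III", notes, offset)
        offset += 12
        name = notes[offset:offset + namesz]
        offset += _align4(namesz)
        desc = notes[offset:offset + descsz]
        offset += _align4(descsz)
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\0":
            return desc.hex()
    return None


# Hypothetical example: a hand-built note carrying a 20-byte (SHA1-sized) id.
example_id = bytes(range(20))
example_note = (
    struct.pack("<III", 4, len(example_id), NT_GNU_BUILD_ID) + b"GNU\0" + example_id
)
print(find_build_id(example_note))
```

The same note is present in the on-disk file, the loaded image and core files, which is why one small parser like this is enough to identify an ELF image in all three places.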

backtraces

The backbone of everything (tracing, profiling, debugging)
If you have any observability this should be it.
Provides you context for any observation.
Always use -fasynchronous-unwind-tables everywhere

Shameless plug: eu-stack.

Presenter Notes

If you don't adopt anything else, please do this! Having reliable stacktraces is so useful. Fedora enables this on all arches. Shameless plug: elfutils eu-stack ('2').

.eh_frame is loaded in memory and so can be accessed easily and fast. Frame pointers only get you so far. It is always the tricky code, signal handlers, fork/exec/clone, unexpected termination in the prologue/epilogue that manipulates the frame pointer, etc. Maintaining frame pointers bloats code and reduces optimization opportunities. GCC is really good at it. GLIBC now has CFI for all/most hand-written assembler.

Only exception might be the kernel. But then please put the unwind info in .debug_frame.

function symbols

mini-symtab (confusingly named .gnu_debugdata)
Compressed ELF image containing just the function symbols (.symtab + .strtab)
gdb and elfutils support this upstream.
But generated only by obscure function inside rpm find-debuginfo.sh script.
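A small sketch of what a consumer has to do with the mini-symtab, assuming the section contents are xz/LZMA compressed (as the GDB documentation for MiniDebugInfo describes) and have already been extracted from the binary into a bytes object. The function name and the dummy input are hypothetical.

```python
import lzma

ELF_MAGIC = b"\x7fELF"


def unpack_gnu_debugdata(section_bytes):
    """Decompress .gnu_debugdata contents and check the result looks like ELF."""
    image = lzma.decompress(section_bytes)
    if not image.startswith(ELF_MAGIC):
        raise ValueError("decompressed data is not an ELF image")
    return image


# Hypothetical stand-in for a real section: an xz-compressed dummy "ELF" blob.
fake_section = lzma.compress(ELF_MAGIC + b"\0" * 60)
print(len(unpack_gnu_debugdata(fake_section)))
```

The decompressed image would then be read like any other ELF file to get at its .symtab and .strtab.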

Presenter Notes

Normally only .dynsym symbols available. This provides parts of .symtab normally stripped away. You cannot see "inlines" but still very useful. Recommended if the alternative would be having just "raw" addresses (for which you then need build-id to look them up).

Fedora only. But upstream support in gdb and elfutils. Slightly primitive. Alternative might be new ELF compressed section support. Or just not strip the full .symtab out (valgrind needs it in some cases - although only really for ld.so).

Full debuginfo

Always use -g (-gdwarf-4 is the default in recent GCC)
Do NOT disable -fvar-tracking-assignments
gdb-add-index (.gdb_index)
Maybe use -g3 (adds macro definitions)

This is a lot of info and sadly somewhat entangled.

But please always generate it and then strip it into a separate .debug file.

Presenter Notes

This will give you inlines (program structure, which code ended up where). Arguments to functions and local variables, plus which value they have at which point in the program. The types and structures used by the program. Matching addresses to source lines. .gdb_index provides debuggers a quick way to navigate some of these structures, so they don't need to scan it all at startup, even if you only want to use a small portion. -g3 used to be very expensive, but recent GCC versions generate much denser data. Nobody really uses macro definitions much though, since nobody generates them... so chicken, egg. Both indexing and dense macros are proposed as DWARFv5 extensions.

Compressing

Please don't use compressed ELF sections (.zdebug or SHF_COMPRESSED)
Do use DWZ the DWARF optimization and duplicate removal tool

Presenter Notes

Full debuginfo is big! So yes, compress. But ELF section compression is the wrong level. It isn't supported by many programs (valgrind for example doesn't). There are two variants (if you use any, please not .zdebug). It prevents simply mmapping the data and using an index to only read/use what you need. It causes very slow startup. You can however use both if you really want to get the most compression. But I would recommend using DWZ only and then compressing the whole file(s) for storage (like in a package), but installing them uncompressed for direct usage.

Sources

Package the (post-processed) sources (as they were compiled).

DWARF references sources. Where they were built (!)

Can be adjusted in two ways:

rpm debugedit
gcc -fdebug-prefix-map=old=new

debugedit also provides source file list found.

Put them somewhere under /usr/src/debug/[package-version]/
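The effect of an old=new prefix mapping can be sketched as a plain string-prefix replacement. This is an illustration of the idea only, not how GCC or debugedit implement it; the build directory and the "hello-1.0" package name-version below are hypothetical.

```python
def remap_source_path(path, old_prefix, new_prefix):
    """Replace a leading old_prefix in path with new_prefix, else return path."""
    if path.startswith(old_prefix):
        return new_prefix + path[len(old_prefix):]
    return path


# Hypothetical build directory and package name-version, for illustration only.
build_dir = "/builddir/build/BUILD/hello-1.0"
install_dir = "/usr/src/debug/hello-1.0"

print(remap_source_path(build_dir + "/src/main.c", build_dir, install_dir))
```

Paths that do not start with the old prefix (system headers, for example) are left alone, so only the package's own sources end up pointing under /usr/src/debug.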

Presenter Notes

debugedit is both more and less flexible. It is more flexible because it provides you the actual source file list used in the DWARF describing the program. It is less flexible because it isn't a full DWARF rewriter. It adjusts the location/directories as long as the new ones are shorter... So set up a long enough build root path name. It is probably time to rewrite debugedit to support proper DWARF rewriting.

Separating and "linking"

There are two ways to "link" your binaries to their debug files.

.gnu_debuglink section in main file with name (and CRC) of .debug file
/usr/lib/debug/.build-id/XX/XXXXXXXXXXXXXXXXXXXXXX.debug
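The build-id layout above can be sketched as follows: the first two hex digits of the build-id name the directory and the remaining digits name the .debug file. The function name and the example id are hypothetical.

```python
import posixpath

HEX_DIGITS = set("0123456789abcdef")


def build_id_debug_path(build_id_hex, debug_root="/usr/lib/debug"):
    """Return the .build-id path of the .debug file for a hex build-id."""
    build_id_hex = build_id_hex.lower()
    if len(build_id_hex) < 3 or not set(build_id_hex) <= HEX_DIGITS:
        raise ValueError("build-id must be a hex string of at least 3 digits")
    return posixpath.join(
        debug_root, ".build-id", build_id_hex[:2], build_id_hex[2:] + ".debug"
    )


# Hypothetical 20-byte (SHA1-sized) build-id, for illustration only.
example_build_id = "ab" + "cd" * 19
print(build_id_debug_path(example_build_id))
```

Because the path depends only on the build-id, a tool needs nothing but the id from a process or core file to locate the matching debug file.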

Presenter Notes

The .gnu_debuglink name has to be searched under well known paths (/usr/lib/debug + original location and/or subdirs). This makes it fragile, but more tools support it and it is the fallback used for when there is no build-id. But it might be time to deprecate/remove it because the file names inherently conflict between package versions.
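A rough sketch of the .gnu_debuglink lookup, assuming the search order described in the GDB manual (next to the binary, in a .debug subdirectory, then under the global debug directory) and assuming the stored checksum is the common CRC-32 that zlib.crc32 computes. Other tools may search differently; the function names are hypothetical.

```python
import posixpath
import zlib


def debuglink_candidates(binary_path, debuglink_name, debug_root="/usr/lib/debug"):
    """List the places a .gnu_debuglink file name would be looked up.

    binary_path is expected to be an absolute path to the main file.
    """
    bindir = posixpath.dirname(binary_path)
    return [
        posixpath.join(bindir, debuglink_name),
        posixpath.join(bindir, ".debug", debuglink_name),
        posixpath.join(debug_root, bindir.lstrip("/"), debuglink_name),
    ]


def debuglink_crc_matches(debug_file_bytes, expected_crc):
    """Check a candidate .debug file against the CRC stored in the main file."""
    return (zlib.crc32(debug_file_bytes) & 0xFFFFFFFF) == expected_crc


for candidate in debuglink_candidates("/usr/bin/ls", "ls.debug"):
    print(candidate)
```

Every candidate depends on the file name and install location rather than on the contents, which is where the fragility and the conflicts between package versions come from.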

Fedora supports both, linking build-id.debug -> debuglink file. Fedora also throws in an extra link to the main exe under .build-id. But that link is in the debuginfo package, so it could mismatch...

Preventing conflict

Work in progress.

Want to install both 64-bit and 32-bit debug package.
Have older/newer version of a package installed (inspecting a core file).
Have debug files matching running container executables.

To get there:

Hash in full name-version-arch of package into build-id.
Get rid of .gnu_debuglink files?
No more build-id main file backlinks.
Put sources under full name-version-arch subdir

Presenter Notes

This is where I still have more questions than answers. build-ids can conflict for minor version updates (the files should be completely identical though). Dropping .gnu_debuglink (or changing install/renamed paths) will need program updates. Should the build-id-main-file backlinks move into main package?

Should we really package debug files?

Make /usr/lib/debug and /usr/src/debug "magic" fuse file systems?
Populate through something like darkserver? https://darkserver.fedoraproject.org/
Federated registry of build-ids?

Presenter Notes

Discuss!
