Tracing glibc Itself — From _dl_start to printf


The previous post built a GCC plugin that traces every function in a normal program. This post takes it to the logical extreme: instrumenting glibc itself, including ld.so — the dynamic linker that runs before main, before any libc function exists, before the process is even fully initialized.

The first line of the resulting trace is:

ev=entry fn=_dl_start ip=0x00007f4dc2f5668c tid=0x0000000000167448

_dl_start is the first C function to run in any dynamically linked ELF program: the Linux kernel transfers control to ld.so's assembly entry point, _start, which calls _dl_start almost immediately. Getting there requires solving a problem that breaks the architecture described in Part 1.

The ld.so problem

glibc is not a single shared library; at its core it is two: libc.so.6 and ld-linux-x86-64.so.2 (the dynamic linker, ld.so). When you run a program, the kernel maps ld.so as the program's interpreter and hands it control. ld.so then finds and loads libc.so.6 and everything else. It starts this work before it has even relocated itself, before thread-local storage is set up, and before the heap exists.

ld.so is built with a special partial link step:

gcc -nostdlib -nostartfiles -r -o librtld.os \
    dl-allobjs.os rtld-libc.a -lgcc

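The -r flag means “relocatable output”: a partial link, not a final executable. The -nostdlib flag means no standard libraries. A partial link does not pull definitions in from anywhere else, so any external reference stays undefined in librtld.os, and the following link of ld.so itself (done with -z defs) rejects every undefined reference. Anything ld.so calls must be defined within its own object files.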

In Part 1, __inject_trace_write calls __inject_raw_openat, __inject_raw_gettid, and __inject_format_and_write from tracer_helpers.o. These are external symbols. The moment the plugin instruments any ld.so source file, the link fails:

/usr/bin/ld: (.text+0x31): undefined reference to `__inject_raw_gettid'
/usr/bin/ld: (.text+0x4d): undefined reference to `__inject_format_and_write'
/usr/bin/ld: (.text+0x98): undefined reference to `__inject_raw_openat'
collect2: error: ld returned 1 exit status

The obvious fix — passing tracer_helpers.o via LDFLAGS — does not work. LDFLAGS reaches the final link steps but not the partial link that builds librtld.os.

The key insight: static locals

The solution is to move all state inside __inject_trace_write itself as static local variables. In C terms:

#include <stdint.h>   /* uint32_t, uint64_t */

__attribute__((weak))
void __inject_trace_write(const void *prefix, uint32_t len, uint64_t ip) {
    static long fd   = 0;
    static char path[] = "/tmp/glibc_trace.log";
    // ... open-once spinlock using fd ...
    // ... inline syscalls for openat, gettid, writev ...
}

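Static locals have static storage duration but no external linkage. The compiler reaches them with PC-relative addressing inside the object's own sections (.bss for the zero-initialized fd, .data for the writable path array), so nothing refers to an external symbol. PC-relative access also works before ld.so has relocated itself, which is what lets _dl_start be traced at all. Every .os file that gets instrumented carries a complete, self-sufficient copy of __inject_trace_write.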

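The function is marked __attribute__((weak)). Each instrumented TU contributes its own definition, and the weak attribute lets the linker accept the duplicates instead of reporting a multiple-definition error: within each link it keeps one and discards the rest. ld.so and libc.so.6 are produced by separate links, so each shared object ends up with its own copy and therefore its own fd. Every copy opens the same path with O_APPEND, so all events still land in one log file.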
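As a rough illustration of the state handling, here is a plain C sketch of an open-once guard built only from static locals and GCC's __atomic builtins. The name itw_sketch_get_fd, the path /tmp/itw_sketch.log, the three-state flag, and the use of libc's open() in place of the raw openat syscall are all assumptions made to keep the sketch short; the plugin's synthesized function is not written this way.

```c
#include <assert.h>
#include <fcntl.h>

/* C11 static_assert from <assert.h>: like the tracer, the sketch assumes LP64. */
static_assert(sizeof(long) == 8, "sketch assumes x86-64 Linux (LP64)");

/* Hypothetical stand-in for the fd handling in __inject_trace_write.
 * All state lives in static locals, so no external symbol is referenced. */
__attribute__((weak)) long itw_sketch_get_fd(void)
{
    static long fd = -1;                         /* the log fd once opened   */
    static int state = 0;                        /* 0 closed, 1 opening, 2 ready */
    static char path[] = "/tmp/itw_sketch.log";  /* hypothetical log path    */

    for (;;) {
        int s = __atomic_load_n(&state, __ATOMIC_ACQUIRE);
        if (s == 2)
            return fd;                           /* fast path: already open  */
        if (s == 0) {
            int expected = 0;
            /* Exactly one thread wins the 0 -> 1 transition and opens. */
            if (__atomic_compare_exchange_n(&state, &expected, 1, 0,
                                            __ATOMIC_ACQ_REL,
                                            __ATOMIC_ACQUIRE)) {
                /* Stand-in for the raw openat syscall the tracer uses. */
                fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
                __atomic_store_n(&state, 2, __ATOMIC_RELEASE); /* publish fd */
                return fd;                       /* -1 if the open failed    */
            }
        }
        /* Another thread is opening the file: spin until state becomes 2. */
    }
}
```

The release store on state after writing fd, paired with the acquire load on the fast path, is what guarantees that a thread seeing state == 2 also sees the opened fd.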

This is serial_full mode in the plugin.

Synthesizing it in GIMPLE

The plugin synthesizes __inject_trace_write in its PLUGIN_START_UNIT callback using GCC's GENERIC tree API; GCC then lowers it to GIMPLE like any other function. Static locals are VAR_DECL nodes with TREE_STATIC=1 and TREE_PUBLIC=0:

tree fd_static = build_decl(UNKNOWN_LOCATION, VAR_DECL,
                             get_identifier("__itw_fd"),
                             long_integer_type_node);
TREE_STATIC(fd_static)  = 1;   /* static storage duration */
TREE_PUBLIC(fd_static)  = 0;   /* no external symbol — no relocation */
DECL_INITIAL(fd_static) = build_int_cst(long_integer_type_node, 0);
layout_decl(fd_static, 0);
varpool_node::add(fd_static);   /* GCC 5+; older GCC: varpool_add_new_variable(fd_static) */

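The three syscalls, openat, gettid, and writev, are ASM_EXPR nodes in GENERIC. The openat uses the "S" register constraint to force the path pointer into rsi directly. With an "r" input and an explicit movq, the template can overwrite the pointer before copying it: for example, because the "=a" output is not marked early-clobber, GCC may assign the input to rax, which the first movl $257,%eax then destroys.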

const char *tmpl =
    "movl $257,%%eax\n\t"   /* SYS_openat        */
    "movl $-100,%%edi\n\t"  /* AT_FDCWD          */
    "movl $0x441,%%edx\n\t" /* O_WRONLY|O_CREAT|O_APPEND */
    "movl $0644,%%r10d\n\t" /* mode              */
    "syscall";
tree asm_expr = build5(ASM_EXPR, void_type_node,
    build_string(strlen(tmpl), tmpl),
    /* output: "=a"(newfd) */
    build_tree_list(build_tree_list(NULL_TREE, build_string(3, "=a")), newfd),
    /* input: "S"(path) forces path into rsi */
    build_tree_list(build_tree_list(NULL_TREE, build_string(2, "S")), path_addr),
    /* clobbers: rdi, rdx, r10, rcx, r11, memory */
    ...);
ASM_VOLATILE_P(asm_expr) = 1;
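For experimenting outside the plugin, the same syscall can be written as ordinary GNU C inline assembly. This is a sketch for x86-64 Linux only; raw_openat_append is a hypothetical name, and the operands mirror the ASM_EXPR above: the "=a" output, the "S" input, and the same clobber list.

```c
#include <assert.h>

static_assert(sizeof(long) == 8, "x86-64 Linux only");

/* Hypothetical helper mirroring the ASM_EXPR: openat(AT_FDCWD, path,
 * O_WRONLY|O_CREAT|O_APPEND, 0644) without any libc call.
 * Returns the new fd, or -errno on failure (raw syscall convention). */
static long raw_openat_append(const char *path)
{
    long ret;
    __asm__ volatile(
        "movl $257,%%eax\n\t"    /* SYS_openat                           */
        "movl $-100,%%edi\n\t"   /* AT_FDCWD; the kernel reads an int    */
        "movl $0x441,%%edx\n\t"  /* O_WRONLY|O_CREAT|O_APPEND            */
        "movl $0644,%%r10d\n\t"  /* mode; GAS reads a leading 0 as octal */
        "syscall"
        : "=a"(ret)
        /* "S" places path in rsi before the template runs. An "r" input
         * copied with "movq %1,%%rsi" could be assigned rax (the output
         * is not early-clobber) and be destroyed by the first movl. */
        : "S"(path)
        /* rdi, rdx, r10 are written by the template; syscall clobbers rcx, r11 */
        : "rdi", "rdx", "r10", "rcx", "r11", "memory");
    return ret;
}
```

Usage: raw_openat_append("/tmp/some.log") returns a file descriptor opened for appending, with no libc involvement.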

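The u64_to_hex conversion is fully unrolled in GENERIC: 16 straight-line nibble-to-digit steps, with no function call and no libc. The writev iovec is built on the stack as a long[10] array, which on x86-64 holds five struct iovec entries of one base pointer and one length each.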
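The formatting and the iovec layout can be sketched in C. The names u64_to_hex, raw_writev, and emit_event, and the assumption that the caller's prefix already ends in " ip=0x", are illustrative only; under that assumption one trace line needs exactly five iovecs, which fits in a long[10].

```c
#include <assert.h>
#include <stdint.h>
#include <sys/uio.h>   /* struct iovec, used only to check the layout */

/* long[10] works as five iovecs only if each iovec is {pointer, length}. */
static_assert(sizeof(struct iovec) == 2 * sizeof(long), "iovec = two longs");

/* Writes v as 16 lowercase hex digits, most significant first, no NUL.
 * The constant trip count lets GCC unroll it into straight-line code. */
static void u64_to_hex(uint64_t v, char out[16])
{
    for (int i = 15; i >= 0; i--) {
        unsigned nib = (unsigned)(v & 0xf);
        out[i] = (char)(nib < 10 ? '0' + nib : 'a' + nib - 10);
        v >>= 4;
    }
}

/* Raw writev (syscall 20 on x86-64); returns bytes written or -errno. */
static long raw_writev(long fd, const long *iov, long cnt)
{
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"(20L), "D"(fd), "S"(iov), "d"(cnt)
                     : "rcx", "r11", "memory");
    return ret;
}

/* Hypothetical event writer: prefix | ip hex | " tid=0x" | tid hex | "\n". */
static long emit_event(long fd, const char *prefix, uint32_t len,
                       uint64_t ip, uint64_t tid)
{
    static const char tid_label[] = " tid=0x";
    static const char nl[] = "\n";
    char ip_hex[16], tid_hex[16];
    long iov[10];                        /* five {base, length} pairs */

    u64_to_hex(ip, ip_hex);
    u64_to_hex(tid, tid_hex);
    iov[0] = (long)prefix;    iov[1] = (long)len;
    iov[2] = (long)ip_hex;    iov[3] = 16;
    iov[4] = (long)tid_label; iov[5] = (long)(sizeof tid_label - 1);
    iov[6] = (long)tid_hex;   iov[7] = 16;
    iov[8] = (long)nl;        iov[9] = 1;
    return raw_writev(fd, iov, 5);       /* one syscall per event */
}
```

Writing the whole line with a single writev on an O_APPEND fd keeps each event contiguous in the file even when several threads trace at once.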

The log path is baked in at compile time via a plugin argument:

-fplugin-arg-inject-logpath=/tmp/glibc_trace.log

Building instrumented glibc

mkdir ~/work/glibc_build ~/work/glibc_install
cd ~/work/glibc_build

~/work/glibc/configure \
  CC='/usr/bin/gcc -g -O2 -Wno-error=stringop-truncation -Wno-error=maybe-uninitialized -Wno-error=uninitialized -fplugin=/path/to/inject.so -fplugin-arg-inject-mode=serial_full -fplugin-arg-inject-logpath=/tmp/glibc_trace.log' \
  --prefix="$HOME/work/glibc_install"

make -j$(nproc)
make install

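The CC value must be a single unbroken string: do not split it with backslash line continuations inside the single quotes. Inside single quotes a backslash-newline is not a continuation; the shell keeps both characters in the value, and the embedded backslash and newline break glibc's configure compiler test. Likewise, pass --prefix as an absolute path: the shell does not expand a tilde after --prefix=, and configure rejects a relative prefix.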

No LDFLAGS needed. No tracer_helpers.o. Every object file carries its own complete __inject_trace_write.

Running it

Link a test program against the instrumented glibc using the full absolute path — tilde expansion does not work inside -Wl arguments:

gcc /tmp/hello.c -o /tmp/hello \
    -Wl,-rpath=/home/markus/work/glibc_install/lib \
    -Wl,--dynamic-linker=/home/markus/work/glibc_install/lib/ld-linux-x86-64.so.2

rm -f /tmp/glibc_trace.log
/tmp/hello
wc -l /tmp/glibc_trace.log
head -30 /tmp/glibc_trace.log

Output for printf("hello\n"):

ev=entry fn=_dl_start                ip=0x00007f4dc2f5668c tid=0x0000000000167448
ev=entry fn=__rtld_malloc_init_stubs ip=0x00007f4dc2f5197b tid=0x0000000000167448
ev=exit  fn=__rtld_malloc_init_stubs ip=0x00007f4dc2f519cb tid=0x0000000000167448
ev=entry fn=_dl_setup_hash           ip=0x00007f4dc2f3f04b tid=0x0000000000167448
ev=exit  fn=_dl_setup_hash           ip=0x00007f4dc2f3f0d1 tid=0x0000000000167448
ev=entry fn=_dl_sysdep_start         ip=0x00007f4dc2f54673 tid=0x0000000000167448
ev=entry fn=_dl_sysdep_parse_arguments ip=0x00007f4dc2f5331e tid=0x0000000000167448
ev=exit  fn=_dl_sysdep_parse_arguments ip=0x00007f4dc2f534e7 tid=0x0000000000167448
ev=entry fn=__tunables_init          ip=0x00007f4dc2f4326b tid=0x0000000000167448
ev=entry fn=get_next_env             ip=0x00007f4dc2f432c7 tid=0x0000000000167448
ev=exit  fn=get_next_env             ip=0x00007f4dc2f43346 tid=0x0000000000167448
...
ev=entry fn=__vfprintf_internal      ip=0x00007f4dc2e8b3a2 tid=0x0000000000167448
ev=entry fn=_IO_file_xsputn          ip=0x00007f4dc2ea2c10 tid=0x0000000000167448
...

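A simple printf produces around 1800 trace lines covering the complete bootstrap from _dl_start through the dynamic linker, __libc_start_main, and all printf internals. Every function compiled from C is traced, and every event records the thread ID and instruction pointer. Hand-written assembly, such as ld.so's _start entry stub and the optimized string routines, never passes through the plugin and does not appear in the trace.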

Correlating addresses to symbols

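With ASLR active the addresses change on each run, and addr2line expects an address relative to the object file rather than a runtime address. Subtract the load base of the object that contains the ip first. _dl_start lives in ld.so, not libc.so.6, and LD_SHOW_AUXV=1 prints ld.so's load address as AT_BASE. Capture it in the same run as the trace:

rm -f /tmp/glibc_trace.log
LD_SHOW_AUXV=1 /tmp/hello | grep AT_BASE
addr2line -f -e ~/work/glibc_install/lib/ld-linux-x86-64.so.2 0x<ip minus AT_BASE>

For addresses inside libc.so.6, subtract libc's own base instead; LD_DEBUG=files reports it when the library is loaded.

To find where a function sits within its object, use nm. _dl_start is a static function, so it appears only in the full symbol table, not in the dynamic symbol table that nm -D reads:

nm ~/work/glibc_install/lib/ld-linux-x86-64.so.2 | grep -w _dl_start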
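The subtraction is easy to script. A hypothetical C helper that pulls the ip out of one trace line and turns it into an object-relative address might look like this; the base must be the load address of the object containing the ip, taken from the same run as the trace.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extract "ip=0x..." from one trace line and store
 * ip - base, the object-relative address addr2line expects.
 * Returns 1 on success, 0 if the line has no ip field or ip < base. */
static int trace_ip_offset(const char *line, uint64_t base, uint64_t *off)
{
    const char *p = strstr(line, " ip=0x");
    if (p == NULL)
        return 0;
    char *end;
    uint64_t ip = strtoull(p + 6, &end, 16);   /* skip " ip=0x" */
    if (end == p + 6 || ip < base)
        return 0;                              /* no digits, or wrong base */
    *off = ip - base;
    return 1;
}
```

Print the result in hex and pass it to addr2line -f -e together with the object file whose base you subtracted.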

Visualizing the call graph

The trace is a flat log of entry/exit events. A Python script reconstructs the call stack per thread and emits a Graphviz dot file:

python3 tracelog2dot.py /tmp/glibc_trace.log \
    -o glibc.dot \
    --depth 8 \
    --min-calls 2 \
    --weight \
    -v

dot -Tsvg glibc.dot -o glibc.svg

The --depth 8 flag limits the graph to the first 8 call levels — without it the 1800-line trace produces an unreadable graph. The colour scheme runs from blue (shallow, early bootstrap) through green to red (deep, inside printf internals).
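The core of that reconstruction is small. Below is a hypothetical C sketch, not the tracelog2dot.py implementation: it keeps one call stack per tid, counts a caller-to-callee edge on every entry event, and prints edges called at least min_calls times as Graphviz dot, in the spirit of --min-calls. The fixed-size tables and output format are assumptions chosen to keep it short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_THREADS 16
#define MAX_DEPTH   256
#define MAX_EDGES   4096
#define NAME_LEN    64

struct thread_stack {
    unsigned long long tid;
    int depth;
    char fn[MAX_DEPTH][NAME_LEN];   /* function names, innermost last */
};
struct edge { char caller[NAME_LEN], callee[NAME_LEN]; int calls; };

static struct thread_stack threads[MAX_THREADS];
static int nthreads;
static struct edge edges[MAX_EDGES];
static int nedges;

/* Finds or creates the stack for one thread id. */
static struct thread_stack *stack_for(unsigned long long tid)
{
    for (int i = 0; i < nthreads; i++)
        if (threads[i].tid == tid)
            return &threads[i];
    if (nthreads == MAX_THREADS)
        return NULL;
    threads[nthreads].tid = tid;
    threads[nthreads].depth = 0;
    return &threads[nthreads++];
}

/* Increments the caller -> callee counter, adding the edge if new. */
static void count_edge(const char *caller, const char *callee)
{
    for (int i = 0; i < nedges; i++)
        if (strcmp(edges[i].caller, caller) == 0 &&
            strcmp(edges[i].callee, callee) == 0) {
            edges[i].calls++;
            return;
        }
    if (nedges == MAX_EDGES)
        return;
    snprintf(edges[nedges].caller, NAME_LEN, "%s", caller);
    snprintf(edges[nedges].callee, NAME_LEN, "%s", callee);
    edges[nedges++].calls = 1;
}

/* Applies one "ev=... fn=... ip=... tid=0x..." line. Returns 0 or -1. */
static int process_line(const char *line)
{
    char ev[8], fn[NAME_LEN];
    unsigned long long tid;
    const char *t = strstr(line, "tid=0x");

    if (t == NULL || sscanf(line, "ev=%7s fn=%63s", ev, fn) != 2 ||
        sscanf(t, "tid=0x%llx", &tid) != 1)
        return -1;
    struct thread_stack *s = stack_for(tid);
    if (s == NULL)
        return -1;
    if (strcmp(ev, "entry") == 0) {
        if (s->depth == MAX_DEPTH)
            return -1;
        if (s->depth > 0)
            count_edge(s->fn[s->depth - 1], fn);  /* parent calls fn */
        snprintf(s->fn[s->depth++], NAME_LEN, "%s", fn);
        return 0;
    }
    if (strcmp(ev, "exit") == 0) {
        if (s->depth > 0)
            s->depth--;                           /* pop innermost frame */
        return 0;
    }
    return -1;
}

/* Prints edges called at least min_calls times; returns how many. */
static int write_dot(FILE *out, int min_calls)
{
    int shown = 0;
    fprintf(out, "digraph trace {\n");
    for (int i = 0; i < nedges; i++)
        if (edges[i].calls >= min_calls) {
            fprintf(out, "  \"%s\" -> \"%s\" [label=\"%d\"];\n",
                    edges[i].caller, edges[i].callee, edges[i].calls);
            shown++;
        }
    fprintf(out, "}\n");
    return shown;
}
```

Keeping a separate stack per tid is what makes interleaved events from different threads attach to the right caller.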


What was not traced

With serial_full mode, rules.yml is optional. The configure above uses no exclusions at all: malloc, the allocator arenas, pthread internals, and every other glibc function are instrumented and traced. This works because __inject_trace_write never calls malloc or any other libc function, and the open-once spinlock uses only atomic operations. The tracer never calls back into instrumented code, so it cannot recurse into itself. It is genuinely libc-free.

You may still want exclusions for performance and readability. The allocator is called on almost every interesting operation and can dominate the trace, and startup helpers are just as noisy: get_next_env appearing 30 times in a row is expected, because __tunables_init calls it once per environment variable. A rules.yml that excludes noisy internal helpers keeps the log focused on the calls you care about.

What’s next

The third post adds USDT probes to the same plugin — zero-overhead probe sites readable by bpftrace, perf, and SystemTap. The probes are synthesized as raw ELF .note.stapsdt entries directly in GIMPLE, without sys/sdt.h or any external library.

Related Posts

Tracing Every Function Call with a GCC Plugin

Ever wanted to see every function call in a program — entry, exit, thread ID, and instruction pointer — without touching the source? This post builds exactly that: a GCC plugin that instruments functions at compile time and a libc-free runtime tracer that writes the log using raw syscalls.


Zero-Overhead USDT Probes Without sys/sdt.h

The previous two posts built a GCC plugin that writes a trace file on every function entry and exit. That works well for offline analysis but has a cost: every call executes a spinlock check, a gettid syscall, a hex formatting loop, and a writev. Even with the lock already open and O_APPEND in place, that is work happening on every function call in your program.
