Tracing Every Function Call with a GCC Plugin

Ever wanted to see every function call in a program — entry, exit, thread ID, and instruction pointer — without touching the source? This post builds exactly that: a GCC plugin that instruments functions at compile time and a libc-free runtime tracer that writes the log using raw syscalls.

The result works on any program compiled with GCC. The next two posts use it to trace glibc itself, including _dl_start — the first instruction the dynamic linker runs before main even exists.

The code is at github.com/fynq/inject.

Architecture

The system has two independent pieces compiled separately on purpose.

inject.so is the GCC plugin. It runs at compile time, walks the GIMPLE intermediate representation, and injects calls to dump_call_info() at every function entry and exit.

tracer_helpers.o is the runtime. It is compiled once without the plugin and linked into the target binary, where it provides __inject_trace_write() using only raw syscalls: no libc, no malloc, no pthread, no file descriptors from the standard library.

The separation is what makes glibc instrumentation safe: the tracer never calls back into the code being traced.

The GCC Plugin

GCC plugins hook into the compiler’s internal pass pipeline. Our plugin registers a GIMPLE pass that runs after SSA construction. GIMPLE is GCC’s language-independent, three-address intermediate representation (RTL is the lower-level one). At this point the code is already in SSA form but most optimizations have not run yet, so the function structure is still clear.

plugin_init() registers two things: the pass, and a PLUGIN_START_UNIT callback that fires once per translation unit before any functions are compiled.

start_unit: synthesizing dump_call_info

The PLUGIN_START_UNIT callback synthesizes a new function — dump_call_info(const void *str, uint32_t len, uint64_t ip) — directly in the compiler’s IR using build_decl, build_function_type_list, and cgraph_node::finalize_function. The body calls __inject_trace_write which is provided by tracer_helpers.o.

This indirection exists so the plugin can emit a simple GIMPLE call without inline asm or register constraints. The real work — file opening, hex formatting, writev — happens in tracer_helpers.c.

instrument_entry and instrument_exit

For every function that is neither inline nor external, the pass calls both instrument_entry() and instrument_exit().

instrument_entry() prepends to the entry basic block:

  1. A static string "ev=entry fn=<name> ip=0x" as a local variable.
  2. A lea (%%rip), %0 inline asm to capture the instruction pointer.
  3. A GIMPLE call to dump_call_info(string, len, ip).

instrument_exit() does the same at every edge leading to the exit block, correctly handling functions with multiple return paths.
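
To make the transformation concrete, here is roughly what the injected code corresponds to if it were written by hand in C. This is a sketch under assumptions, not the plugin's actual output: add() is a made-up function, CAPTURE_IP is a hypothetical macro, and this dump_call_info is a stand-in that records events in memory instead of forwarding to __inject_trace_write. It assumes x86-64 and GCC-style inline asm.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the synthesized dump_call_info():
 * it stores events in a small array so they can be inspected. */
#define MAX_EVENTS 8
struct event { char text[64]; uint64_t ip; };
static struct event events[MAX_EVENTS];
static int n_events;

static void dump_call_info(const void *str, uint32_t len, uint64_t ip)
{
    if (n_events >= MAX_EVENTS || len >= sizeof events[0].text)
        return;
    memcpy(events[n_events].text, str, len);
    events[n_events].text[len] = '\0';
    events[n_events].ip = ip;
    n_events++;
}

/* x86-64 only: with a zero displacement, RIP-relative lea yields the
 * address of the instruction that follows the lea. */
#define CAPTURE_IP(var) __asm__ volatile("lea (%%rip), %0" : "=r"(var))

/* A made-up function, instrumented by hand at entry and at its return. */
int add(int a, int b)
{
    static const char entry_msg[] = "ev=entry fn=add ip=0x";
    static const char exit_msg[]  = "ev=exit fn=add ip=0x";
    uint64_t ip;

    CAPTURE_IP(ip);
    dump_call_info(entry_msg, sizeof entry_msg - 1, ip);

    int result = a + b;

    CAPTURE_IP(ip);
    dump_call_info(exit_msg, sizeof exit_msg - 1, ip);
    return result;
}
```

A function with several return statements would get one exit hook per return path, which is what instrumenting every edge into the exit block achieves.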

Functions are skipped when DECL_DECLARED_INLINE_P is set, when DECL_EXTERNAL is set, or when the function matches an exclusion rule.

What gets skipped: inline functions

add1() in the test program is declared inline. The plugin sees DECL_DECLARED_INLINE_P and skips it entirely. An inlined function has no single entry point — it is invisible to the tracer. This is a real limitation worth knowing when reading traces.

The Tracer

tracer_helpers.c provides __inject_trace_write(). The constraints are strict: no libc calls, thread-safe from the first call, and a single writev per event so lines cannot be interleaved across threads.

Open-once with a compare-and-swap spin wait

The file descriptor starts at 0 meaning “not yet opened”. On the first call, a compare-exchange atomically transitions it from 0 to -1 (meaning “open in progress”). The winner opens the file with a raw openat syscall; everyone else spins with pause until the fd becomes positive:

/* Exactly one caller wins the 0 -> -1 transition and opens the file. */
if (__sync_val_compare_and_swap(&__inject_tracer_fd, 0, -1) == 0) {
    long newfd = raw_openat(__inject_tracer_path);
    if (newfd <= 0) newfd = 2;  /* fallback to stderr */
    __atomic_store_n(&__inject_tracer_fd, newfd, __ATOMIC_RELEASE);
} else {
    /* Everyone else waits until the winner publishes the real fd. */
    while (__atomic_load_n(&__inject_tracer_fd, __ATOMIC_ACQUIRE) == -1)
        __asm__ volatile("pause");
}

No mutex, no libc, no pthread — just lock cmpxchg in the generated asm.
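
raw_openat itself is not shown in this post. A minimal sketch of what such a helper could look like on x86-64 Linux follows. The flags (O_WRONLY | O_CREAT | O_APPEND), the 0644 mode, and the hand-written constant names are assumptions for illustration, not necessarily what tracer_helpers.c uses.

```c
/* Assumption: x86-64 Linux. The constants are spelled out by hand so the
 * helper needs no libc headers; the names are hypothetical. */
#define INJ_NR_OPENAT 257      /* openat syscall number on x86-64 */
#define INJ_AT_FDCWD  (-100)   /* resolve relative paths against the cwd */
#define INJ_O_WRONLY  01
#define INJ_O_CREAT   0100
#define INJ_O_APPEND  02000

/* Open (or create) path for appending via a raw syscall.
 * Returns the fd on success or a negative errno value on failure. */
static long raw_openat(const char *path)
{
    long ret;
    /* The fourth syscall argument travels in r10, not rcx. */
    register long mode __asm__("r10") = 0644;

    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"((long)INJ_NR_OPENAT),
                       "D"((long)INJ_AT_FDCWD),
                       "S"(path),
                       "d"((long)(INJ_O_WRONLY | INJ_O_CREAT | INJ_O_APPEND)),
                       "r"(mode)
                     : "rcx", "r11", "memory"); /* syscall clobbers rcx, r11 */
    return ret;
}
```

Because the raw syscall reports failure as a negative value rather than through errno, a check like newfd <= 0 in the caller is enough to trigger the stderr fallback.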

Atomic lines with writev

Each log line is five segments: the prefix string (ev=entry fn=foo ip=0x), 16-char hex IP, " tid=0x", 16-char hex TID, newline. A single writev syscall writes all five atomically:

struct t_iovec iov[5] = {
    { prefix,   (long)prefix_len },
    { ipbuf,    16               },
    { tid_pfx,  7                },
    { tidbuf,   16               },
    { nl,       1                },
};
raw_writev((int)fd, iov, 5);

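PIPE_BUF only governs writes to pipes and FIFOs, so it is not what protects these lines. For a regular file opened with O_APPEND, each write call moves to the end of the file and appends in one step. Because every event is emitted by exactly one writev, concurrent threads produce complete, non-interleaved lines in practice on common Linux filesystems. A short write could in principle truncate a line, but with lines well under 4096 bytes that is unlikely.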
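
Putting the pieces together, the sketch below shows one way the five segments could be formatted and written with a single raw writev. It is an illustration under assumptions: hex16, raw_syscall3, and emit_line are hypothetical names, the syscall numbers are the x86-64 Linux ones, and t_iovec is laid out to match the kernel's struct iovec on a 64-bit system.

```c
#include <stdint.h>

/* Assumption: x86-64 Linux syscall numbers. */
#define INJ_NR_WRITEV 20
#define INJ_NR_GETTID 186

/* Same layout as the kernel's struct iovec on 64-bit Linux. */
struct t_iovec { const void *base; long len; };

/* Format v as exactly 16 lowercase hex digits, without a terminator. */
static void hex16(uint64_t v, char out[16])
{
    static const char digits[] = "0123456789abcdef";
    for (int i = 15; i >= 0; i--) {
        out[i] = digits[v & 0xf];
        v >>= 4;
    }
}

/* Raw three-argument syscall; returns the result or a negative errno. */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                     : "rcx", "r11", "memory");
    return ret;
}

/* Write "<prefix><ip hex> tid=0x<tid hex>\n" with one writev call. */
static long emit_line(int fd, const char *prefix, uint32_t prefix_len,
                      uint64_t ip)
{
    char ipbuf[16], tidbuf[16];

    hex16(ip, ipbuf);
    hex16((uint64_t)raw_syscall3(INJ_NR_GETTID, 0, 0, 0), tidbuf);

    struct t_iovec iov[5] = {
        { prefix,    (long)prefix_len },
        { ipbuf,     16               },
        { " tid=0x", 7                },
        { tidbuf,    16               },
        { "\n",      1                },
    };
    return raw_syscall3(INJ_NR_WRITEV, fd, (long)iov, 5);
}
```

The return value is the number of bytes written, so a caller that cares about truncated lines can compare it with prefix_len + 40.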

Building

# Build the plugin
make

# Compile tracer_helpers.o WITHOUT the plugin
gcc -O2 -fPIC -c tracer_helpers.c -o tracer_helpers.o
gcc -O2 -fPIC -c inject_globals.c -o inject_globals.o

# Instrument test_plugin.c
make test_serial

The critical point: tracer_helpers.c and inject_globals.c must be compiled in a separate gcc invocation without -fplugin. If you pass them to the same invocation the plugin instruments __inject_trace_write, which calls dump_call_info, which calls __inject_trace_write — infinite recursion, immediate stack overflow.

Filtering with rules.yml

The plugin reads an optional YAML file to exclude functions and source files from instrumentation at compile time. Excluded functions have zero runtime overhead — the instrumentation is never injected:

exclude:
  functions:
    - malloc
    - free
  files:
    - generated.c
    - third_party.c

The rules file is passed to the plugin on the command line:

gcc -fplugin=./inject.so \
    -fplugin-arg-inject-mode=serial \
    -fplugin-arg-inject-rules=rules.yml \
    -O2 myprogram.c tracer_helpers.o inject_globals.o -o myprogram
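
The skip decision can be pictured as a simple lookup over the parsed rules. The sketch below is hypothetical: struct rules, in_list, and should_instrument are invented names, and it assumes exact string matches on function and file names, which may differ from how the plugin actually matches.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of rules.yml after parsing. */
struct rules {
    const char *const *functions;
    size_t n_functions;
    const char *const *files;
    size_t n_files;
};

/* Exact-match lookup; an assumption, the real matching may differ. */
static bool in_list(const char *const *list, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], name) == 0)
            return true;
    return false;
}

/* Mirrors the skip conditions: inline, external, or excluded by a rule. */
static bool should_instrument(const struct rules *r, const char *fn,
                              const char *file, bool declared_inline,
                              bool external)
{
    if (declared_inline || external)
        return false;
    if (in_list(r->functions, r->n_functions, fn))
        return false;
    if (in_list(r->files, r->n_files, file))
        return false;
    return true;
}
```

Since the check runs inside the compiler, a function that fails it simply never receives the injected calls, which is why exclusions cost nothing at runtime.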

Running it

rm -f /tmp/tracer.log
setarch -R ./test_plugin   # -R disables ASLR for stable addresses
cat /tmp/tracer.log
ev=entry fn=main        ip=0x00005555555550bc tid=0x000000000012bc53
ev=entry fn=fun_char    ip=0x00005555555558eb tid=0x000000000012bc53
ev=exit  fn=fun_char    ip=0x000055555555591b tid=0x000000000012bc53
ev=entry fn=write_ext2  ip=0x000055555555528b tid=0x000000000012bc53
ev=exit  fn=write_ext2  ip=0x00005555555552c0 tid=0x000000000012bc53
ev=entry fn=doSomeThing ip=0x00005555555552eb tid=0x000000000012bc54
ev=exit  fn=doSomeThing ip=0x0000555555555357 tid=0x000000000012bc54
ev=entry fn=doSomeThing ip=0x00005555555552eb tid=0x000000000012bc55
ev=exit  fn=doSomeThing ip=0x0000555555555357 tid=0x000000000012bc55
ev=exit  fn=main        ip=0x0000555555555159 tid=0x000000000012bc53

Several things are worth noting. write_ext2 appears even though it is called via a function pointer, because instrumentation is at the definition, not the call site. The two different TIDs for doSomeThing show two separate threads each running the function, with one writev per event keeping the lines intact. blub2 is absent despite being called twice: it was excluded in rules.yml at compile time.

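The IP addresses with ASLR disabled match the binary directly. 0x...50bc is the value captured by the lea (%rip) instruction injected at the top of main. Because RIP-relative addressing is computed from the end of the instruction, this is the address of the instruction immediately after the lea. Verify with objdump -d test_plugin | grep -A3 "<main>".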
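
For post-processing, each log line splits cleanly back into its fields. A minimal parsing sketch follows; struct trace_event and parse_trace_line are hypothetical names, and the field widths are assumptions chosen to fit the sample output above.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* One parsed log line. The buffer sizes are assumptions. */
struct trace_event {
    char     ev[8];    /* "entry" or "exit" */
    char     fn[64];   /* function name */
    uint64_t ip;       /* captured instruction pointer */
    uint64_t tid;      /* thread ID */
};

/* Parse "ev=<kind> fn=<name> ip=0x<hex> tid=0x<hex>".
 * A space in the format matches any run of whitespace, so padded
 * columns are handled. Returns 0 on success, -1 otherwise. */
static int parse_trace_line(const char *line, struct trace_event *out)
{
    int n = sscanf(line,
                   "ev=%7s fn=%63s ip=0x%" SCNx64 " tid=0x%" SCNx64,
                   out->ev, out->fn, &out->ip, &out->tid);
    return n == 4 ? 0 : -1;
}
```

With the events parsed, matching each entry to its exit per TID is enough to rebuild a call tree for every thread.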

What’s next

The next post covers the hard part: making this work inside glibc itself, tracing _dl_start and the entire dynamic linker bootstrap. The challenge is that ld.so is linked with -nostdlib -r and cannot reference any external symbols — which breaks the architecture described here entirely. The solution requires a different approach to where the tracer stores its state.
