Tracing Every Function Call with a GCC Plugin
Ever wanted to see every function call in a program — entry, exit, thread ID, and instruction pointer — without touching the source? This post builds exactly that: a GCC plugin that instruments functions at compile time and a libc-free runtime tracer that writes the log using raw syscalls.
The result works on any program compiled with GCC. The next two posts use it
to trace glibc itself, including _dl_start, the first C function the
dynamic linker runs, long before main is called.
The code is at github.com/fynq/inject.
Architecture
The system has two independent pieces compiled separately on purpose.
inject.so is the GCC plugin. It runs at compile time, walks the GIMPLE
intermediate representation, and injects calls to dump_call_info() at every
function entry and exit. tracer_helpers.o is the runtime — compiled once
without the plugin, linked into the target binary, it provides
__inject_trace_write() using only raw syscalls. No libc, no malloc, no
pthread, no file descriptors from the standard library.
The separation is what makes glibc instrumentation safe: the tracer never calls back into the code being traced.
The GCC Plugin
GCC plugins hook into the compiler’s internal pass pipeline. Our plugin registers a GIMPLE pass that runs after SSA construction. GIMPLE is GCC’s mid-level, language-independent IR (RTL is the low-level one). Right after SSA construction the code is already in SSA form but most optimizations have not run yet, so the function structure is still clear.
plugin_init() registers two things: the pass, and a PLUGIN_START_UNIT
callback that fires once per translation unit before any functions are compiled.
start_unit: synthesizing dump_call_info
The PLUGIN_START_UNIT callback synthesizes a new function —
dump_call_info(const void *str, uint32_t len, uint64_t ip) — directly in
the compiler’s IR using build_decl, build_function_type_list, and
cgraph_node::finalize_function. The body calls __inject_trace_write which
is provided by tracer_helpers.o.
This indirection exists so the plugin can emit a simple GIMPLE call without
inline asm or register constraints. The real work — file opening, hex
formatting, writev — happens in tracer_helpers.c.
instrument_entry and instrument_exit
For every function that is neither inline nor external, the pass calls both instrument_entry() and instrument_exit().
instrument_entry() prepends to the entry basic block:
- A static string "ev=entry fn=<name> ip=0x" as a local variable.
- A lea (%%rip), %0 inline asm to capture the instruction pointer.
- A GIMPLE call to dump_call_info(string, len, ip).
instrument_exit() does the same at every edge leading to the exit block,
correctly handling functions with multiple return paths.
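At source level, the effect of the two steps is roughly what the hand-written C below does. This is only an illustration: the plugin builds the equivalent GIMPLE directly, and everything named here apart from dump_call_info (trace_events, trace_count, current_ip, traced_add) is hypothetical. The dump_call_info shown is a recording stub standing in for the synthesized one, and the inline asm assumes x86-64.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical recording stub standing in for the synthesized dump_call_info. */
#define MAX_EVENTS 16
struct trace_event { char text[64]; uint64_t ip; };
static struct trace_event trace_events[MAX_EVENTS];
static int trace_count;

static void dump_call_info(const void *str, uint32_t len, uint64_t ip)
{
    if (trace_count >= MAX_EVENTS || len >= sizeof trace_events[0].text)
        return;                         /* drop events that do not fit */
    memcpy(trace_events[trace_count].text, str, len);
    trace_events[trace_count].text[len] = '\0';
    trace_events[trace_count].ip = ip;
    trace_count++;
}

/* Capture an address inside the calling function (0 on non-x86-64 targets). */
static inline __attribute__((always_inline)) uint64_t current_ip(void)
{
    uint64_t ip = 0;
#if defined(__x86_64__)
    /* A RIP-relative lea yields the address of the instruction after it. */
    __asm__ volatile("lea 0(%%rip), %0" : "=r"(ip));
#endif
    return ip;
}

/* Hand-written equivalent of an instrumented function with two return paths. */
int traced_add(int a, int b)
{
    static const char entry_msg[] = "ev=entry fn=traced_add ip=0x";
    static const char exit_msg[]  = "ev=exit fn=traced_add ip=0x";

    dump_call_info(entry_msg, sizeof entry_msg - 1, current_ip());

    if (a < 0) {
        /* first return path gets its own exit event */
        dump_call_info(exit_msg, sizeof exit_msg - 1, current_ip());
        return b;
    }

    int sum = a + b;
    /* second return path gets its own exit event */
    dump_call_info(exit_msg, sizeof exit_msg - 1, current_ip());
    return sum;
}
```

Calling traced_add(2, 3) records one entry and one exit event. The point of the sketch is that every return path carries its own exit call, which is what instrumenting each edge into the exit block achieves in the real pass.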
Functions are skipped when DECL_DECLARED_INLINE_P is set, when
DECL_EXTERNAL is set, or when the function matches an exclusion rule.
What gets skipped: inline functions
add1() in the test program is declared inline. The plugin sees
DECL_DECLARED_INLINE_P and skips it entirely. An inlined function has no
single entry point — it is invisible to the tracer. This is a real limitation
worth knowing when reading traces.
The Tracer
tracer_helpers.c provides __inject_trace_write(). The constraints are
strict: no libc calls, thread safe from the first call, and a single writev
per event so lines cannot be interleaved across threads.
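To make "no libc calls" concrete, here is a minimal sketch of what a raw syscall wrapper can look like on x86-64 Linux. The names raw_syscall3 and raw_write are hypothetical and are not taken from tracer_helpers.c, which may structure its wrappers differently. The sketch only covers write to keep it short.

```c
/* Sketch: issue a Linux syscall on x86-64 without going through libc.
   raw_syscall3 and raw_write are hypothetical names for illustration. */
#if !defined(__x86_64__) || !defined(__linux__)
#error "this sketch assumes x86-64 Linux"
#endif

#define NR_write 1   /* x86-64 Linux syscall number for write */

static long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    /* The syscall instruction takes the number in rax and the first three
       arguments in rdi, rsi, rdx. It clobbers rcx and r11 and returns
       the result in rax. */
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                     : "rcx", "r11", "memory");
    return ret;
}

/* Returns the number of bytes written, or a negative errno value.
   There is no errno variable here because that would be libc state. */
static long raw_write(int fd, const void *buf, unsigned long len)
{
    return raw_syscall3(NR_write, fd, (long)buf, (long)len);
}
```

The same pattern extends to the other calls the tracer needs by changing the syscall number (writev is 20 and openat is 257 on x86-64 Linux). Errors come back as negative values in the return register, so a caller checks for a result below zero instead of reading errno.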
Open-once with a lock-free spinlock
The file descriptor starts at 0 meaning “not yet opened”. On the first call, a
compare-exchange atomically transitions it from 0 to -1 (meaning “open in
progress”). The winner opens the file with a raw openat syscall; everyone
else spins with pause until the fd becomes positive:
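/* Try to claim the open: 0 (unopened) -> -1 (open in progress). */
if (__sync_val_compare_and_swap(&__inject_tracer_fd, 0, -1) == 0) {
    /* We won the race, so we are the only thread that opens the file. */
    long newfd = raw_openat(__inject_tracer_path);
    if (newfd <= 0) newfd = 2; /* fallback to stderr */
    /* Publish the fd; this release store pairs with the acquire load below. */
    __atomic_store_n(&__inject_tracer_fd, newfd, __ATOMIC_RELEASE);
} else {
    /* Another thread is opening (or has opened) the file: wait for a real fd. */
    while (__atomic_load_n(&__inject_tracer_fd, __ATOMIC_ACQUIRE) == -1)
        __asm__ volatile("pause");
}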
No mutex, no libc, no pthread — just lock cmpxchg in the generated asm.
Atomic lines with writev
Each log line is five segments: the prefix string (ev=entry fn=foo ip=0x),
16-char hex IP, " tid=0x", 16-char hex TID, newline. A single writev
syscall writes all five atomically:
struct t_iovec iov[5] = {
    { prefix,  (long)prefix_len },
    { ipbuf,   16 },
    { tid_pfx, 7 },
    { tidbuf,  16 },
    { nl,      1 },
};
raw_writev((int)fd, iov, 5);
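The PIPE_BUF atomicity guarantee covers pipes and FIFOs, not regular files,
so it is not what protects the log here. What does is O_APPEND: on Linux,
each writev to a regular file on a common local filesystem runs under the
inode lock, so the five segments of one call land contiguously at the end of
the file. In practice concurrent threads therefore produce complete,
non-interleaved lines, though this is weaker on other targets such as
network filesystems.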
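The 16-character hex fields need a formatter that does not call snprintf. The sketch below shows one way to do that and to assemble the five segments. fmt_hex16 and emit_line are hypothetical names, and for brevity the sketch calls libc's writev from <sys/uio.h> where the real tracer issues the raw syscall.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>   /* struct iovec, writev, ssize_t */

/* Write v as exactly 16 lowercase hex digits into out (no terminator). */
static void fmt_hex16(uint64_t v, char out[16])
{
    static const char digits[] = "0123456789abcdef";
    for (int i = 15; i >= 0; i--) {
        out[i] = digits[v & 0xf];   /* lowest nibble goes in the rightmost slot */
        v >>= 4;
    }
}

/* Emit one trace line to fd with a single writev call.
   Returns the writev result: bytes written, or -1 on error. */
static ssize_t emit_line(int fd, const char *prefix, size_t prefix_len,
                         uint64_t ip, uint64_t tid)
{
    char ipbuf[16], tidbuf[16];
    fmt_hex16(ip, ipbuf);
    fmt_hex16(tid, tidbuf);

    struct iovec iov[5] = {
        { (void *)prefix,    prefix_len },   /* "ev=entry fn=<name> ip=0x" */
        { ipbuf,             16 },
        { (void *)" tid=0x", 7 },
        { tidbuf,            16 },
        { (void *)"\n",      1 },
    };
    return writev(fd, iov, 5);
}
```

Because the buffers are fixed-size and live on the stack, the formatter needs no allocation and no shared state, which is what keeps it usable from any thread at any point in startup.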
Building
# Build the plugin
make
# Compile tracer_helpers.o WITHOUT the plugin
gcc -O2 -fPIC -c tracer_helpers.c -o tracer_helpers.o
gcc -O2 -fPIC -c inject_globals.c -o inject_globals.o
# Instrument test_plugin.c
make test_serial
The critical point: tracer_helpers.c and inject_globals.c must be compiled
in a separate gcc invocation without -fplugin. If you pass them to the
same invocation the plugin instruments __inject_trace_write, which calls
dump_call_info, which calls __inject_trace_write — infinite recursion,
immediate stack overflow.
Filtering with rules.yml
The plugin reads an optional YAML file to exclude functions and source files from instrumentation at compile time. Excluded functions have zero runtime overhead — the instrumentation is never injected:
exclude:
  functions:
    - malloc
    - free
  files:
    - generated.c
    - third_party.c
The rules file is passed to the plugin with -fplugin-arg-inject-rules:
gcc -fplugin=./inject.so \
    -fplugin-arg-inject-mode=serial \
    -fplugin-arg-inject-rules=rules.yml \
    -O2 myprogram.c tracer_helpers.o inject_globals.o -o myprogram
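This post does not show how the plugin matches names against the rules. One plausible shape is sketched below, assuming exact string matches on function names and a match on the last path component for files. Both assumptions are mine, the struct and function names are hypothetical, and the real plugin may support patterns or compare full paths.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of rules.yml after parsing. */
struct exclude_rules {
    const char *const *functions; size_t n_functions;
    const char *const *files;     size_t n_files;
};

/* Linear scan with exact comparison (assumption: no wildcards). */
static bool list_contains(const char *const *list, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], name) == 0)
            return true;
    return false;
}

/* Assumption: only the last path component is compared, so
   "src/generated.c" matches the rule "generated.c". */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* True when the function should be left uninstrumented. */
static bool is_excluded(const struct exclude_rules *r,
                        const char *fn_name, const char *src_file)
{
    return list_contains(r->functions, r->n_functions, fn_name)
        || list_contains(r->files, r->n_files, base_name(src_file));
}
```

A check like this would run once per function inside the pass, before any instrumentation is emitted, which is why an excluded function costs nothing at run time.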
Running it
rm -f /tmp/tracer.log
setarch -R ./test_plugin # -R disables ASLR for stable addresses
cat /tmp/tracer.log
ev=entry fn=main ip=0x00005555555550bc tid=0x000000000012bc53
ev=entry fn=fun_char ip=0x00005555555558eb tid=0x000000000012bc53
ev=exit fn=fun_char ip=0x000055555555591b tid=0x000000000012bc53
ev=entry fn=write_ext2 ip=0x000055555555528b tid=0x000000000012bc53
ev=exit fn=write_ext2 ip=0x00005555555552c0 tid=0x000000000012bc53
ev=entry fn=doSomeThing ip=0x00005555555552eb tid=0x000000000012bc54
ev=exit fn=doSomeThing ip=0x0000555555555357 tid=0x000000000012bc54
ev=entry fn=doSomeThing ip=0x00005555555552eb tid=0x000000000012bc55
ev=exit fn=doSomeThing ip=0x0000555555555357 tid=0x000000000012bc55
ev=exit fn=main ip=0x0000555555555159 tid=0x000000000012bc53
Several things are worth noting. write_ext2 appears even though it is called
via a function pointer — instrumentation is at the definition, not the call
site. The two different TIDs for doSomeThing show two threads running
concurrently, with writev atomicity keeping the lines intact. blub2 is
absent despite being called twice — it was excluded in rules.yml at compile
time.
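Since every line has the same fixed layout, the log is easy to post-process. A minimal parsing sketch, assuming the exact format shown above; trace_record and parse_trace_line are hypothetical names and not part of the project:

```c
#include <inttypes.h>
#include <stdio.h>

struct trace_record {
    char     ev[8];    /* "entry" or "exit" */
    char     fn[64];   /* function name */
    uint64_t ip;       /* captured instruction pointer */
    uint64_t tid;      /* thread ID */
};

/* Parse one line of the form
   "ev=<entry|exit> fn=<name> ip=0x<hex> tid=0x<hex>".
   Returns 1 on success, 0 if the line does not match. */
static int parse_trace_line(const char *line, struct trace_record *out)
{
    int n = sscanf(line,
                   "ev=%7s fn=%63s ip=0x%" SCNx64 " tid=0x%" SCNx64,
                   out->ev, out->fn, &out->ip, &out->tid);
    return n == 4;   /* all four fields must convert */
}
```

Grouping the parsed records by tid and pairing each entry with the next exit for the same fn is enough to rebuild a per-thread call tree from the log.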
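With ASLR disabled the IP values are stable and map back to the binary.
0x...50bc is the value captured by the lea (%%rip) injected at the top of
main, which is the address of the instruction right after that lea. The
0x5555... values show the binary is position-independent, so objdump lists
link-time offsets: subtract the load base (typically 0x555555554000 on x86-64
Linux with ASLR off) and look for 0x10bc.
Verify with objdump -d test_plugin | grep -A3 "<main>".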
What’s next
The next post covers the hard part: making this work inside glibc itself,
tracing _dl_start and the entire dynamic linker bootstrap. The challenge
is that ld.so is linked with -nostdlib -r and cannot reference any
external symbols — which breaks the architecture described here entirely.
The solution requires a different approach to where the tracer stores its
state.