Zero-Overhead USDT Probes Without sys/sdt.h
The previous two posts built a GCC plugin that writes a trace file on every
function entry and exit. That works well for offline analysis but has a cost:
every call executes a spinlock check, a gettid syscall, a hex formatting
loop, and a writev. Even with the lock already open and O_APPEND in place,
that is work happening on every function call in your program.
This post adds a third mode: USDT probes. Each probe is a single nop
instruction plus an ELF note. With no tracer attached, the only cost is
executing that one nop per probe site, which is negligible in practice.
When bpftrace or perf attaches, the kernel replaces the nop with a
breakpoint and runs your script each time it is hit.
The interesting part: USDT probes normally require #include <sys/sdt.h>,
whose STAP_PROBE macros expand into the appropriate asm. We skip the header
entirely and have the plugin synthesize the nop and the ELF note directly as
a GIMPLE asm statement. No external library, no header dependency, and it
works in glibc’s ld.so just like serial_full.
How USDT probes work
A USDT probe has two parts. First, a labeled nop at the probe site:
990:
nop
Second, an entry in the .note.stapsdt ELF section describing the probe:
.pushsection .note.stapsdt,"?","note"
.balign 4
.4byte 8 /* namesz: "stapsdt\0" */
.4byte 44 /* descsz: 24 + 7 ("inject\0") + 12 ("entry__main\0") + 1 ("\0") */
.4byte 3 /* NT_STAPSDT */
.asciz "stapsdt"
.balign 4
.8byte 990b /* probe PC — address of the nop */
.8byte 0 /* base address */
.8byte 0 /* semaphore address */
.asciz "inject" /* provider name */
.asciz "entry__main" /* probe name */
.asciz "" /* argument descriptors */
.balign 4
.popsection
When bpftrace reads the binary it finds all .note.stapsdt entries, extracts
the probe PC, and installs a kernel uprobe at that address. When the process
hits the nop, the kernel fires the uprobe handler, runs your bpftrace script,
and resumes execution. If no tracer is attached, the nop just executes and
the process continues normally.
The note format is well-documented in the SystemTap source and used identically by bpftrace, perf, and stap. No tool-specific format — one note format, all tools.
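To make the layout concrete, here is a minimal sketch of how a reader could decode one such note from a byte buffer. It is not how bpftrace or perf implement it. The names struct sdt_note and parse_sdt_note are hypothetical, and the sketch assumes the buffer uses the host's byte order and the 64-bit field widths shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One decoded .note.stapsdt record (hypothetical struct for illustration). */
struct sdt_note {
    uint64_t pc;          /* address of the nop */
    uint64_t base;        /* base address field */
    uint64_t semaphore;   /* semaphore address, 0 if unused */
    const char *provider; /* these three point into the caller's buffer */
    const char *probe;
    const char *args;
};

/* Round n up to a multiple of 4, matching .balign 4. */
static size_t align4(size_t n) { return (n + 3) & ~(size_t)3; }

/* Return a pointer just past the NUL of the string at p, or NULL if the
   string is not terminated before end. */
static const unsigned char *skip_str(const unsigned char *p,
                                     const unsigned char *end)
{
    const unsigned char *nul = memchr(p, 0, (size_t)(end - p));
    return nul ? nul + 1 : NULL;
}

/* Decode one note from buf[0..len). Returns the number of bytes consumed,
   or 0 if the bytes are not a well-formed stapsdt note. */
static size_t parse_sdt_note(const unsigned char *buf, size_t len,
                             struct sdt_note *out)
{
    uint32_t namesz, descsz, type;
    assert(buf != NULL && out != NULL);

    if (len < 12) return 0;
    memcpy(&namesz, buf, 4);
    memcpy(&descsz, buf + 4, 4);
    memcpy(&type, buf + 8, 4);
    if (namesz != 8 || type != 3) return 0;   /* "stapsdt\0", NT_STAPSDT */

    size_t desc_off = 12 + align4(namesz);
    size_t total = desc_off + align4(descsz);
    /* 24 bytes of addresses plus at least three NUL bytes */
    if (descsz < 27 || descsz > len || total > len) return 0;
    if (memcmp(buf + 12, "stapsdt", 8) != 0) return 0;

    const unsigned char *d = buf + desc_off;
    const unsigned char *end = d + descsz;
    memcpy(&out->pc, d, 8);
    memcpy(&out->base, d + 8, 8);
    memcpy(&out->semaphore, d + 16, 8);

    const unsigned char *s = d + 24;
    out->provider = (const char *)s;
    s = skip_str(s, end);
    if (s == NULL || s == end) return 0;
    out->probe = (const char *)s;
    s = skip_str(s, end);
    if (s == NULL || s == end) return 0;
    out->args = (const char *)s;
    if (skip_str(s, end) == NULL) return 0;
    return total;
}
```

A real tool would first locate the .note.stapsdt section in the ELF file and then call something like this in a loop, advancing by the returned size until the section is exhausted.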
Synthesizing the note in the plugin
The plugin emits both parts — the nop and the .note.stapsdt entry — as a
single gimple_build_asm_vec call. The asm string contains the nop and the
full assembler directives for the note:
static void emit_usdt_probe(gimple_seq *seq,
                            const char *provider,
                            const char *probe_name,
                            int counter) {
    /* descriptor = 3 x 8-byte fields (PC, base, semaphore)
       + three NUL-terminated strings */
    int descsz = 24 + (int)strlen(provider) + 1
                    + (int)strlen(probe_name) + 1
                    + 1; /* empty args */
    char *tmpl = NULL;
    if (asprintf(&tmpl,
        "%d:\n\t"
        "nop\n\t"
        ".pushsection .note.stapsdt,\"?\",\"note\"\n\t"
        ".balign 4\n\t"
        ".4byte 8\n\t"       /* namesz: "stapsdt\0" */
        ".4byte %d\n\t"      /* descsz */
        ".4byte 3\n\t"       /* NT_STAPSDT */
        ".asciz \"stapsdt\"\n\t"
        ".balign 4\n\t"
        ".8byte %db\n\t"     /* probe PC = address of the nop */
        ".8byte 0\n\t"       /* base address */
        ".8byte 0\n\t"       /* semaphore */
        ".asciz \"%s\"\n\t"  /* provider name */
        ".asciz \"%s\"\n\t"  /* probe name */
        ".asciz \"\"\n\t"    /* no typed arguments */
        ".balign 4\n\t"
        ".popsection",
        counter, descsz, counter, provider, probe_name) < 0)
        return; /* allocation failed: emit no probe */
    gimple g = gimple_build_asm_vec(tmpl, NULL, NULL, NULL, NULL);
    gimple_asm_set_volatile(as_a_gasm(g), true);
    gimple_seq_add_stmt(seq, g);
    free(tmpl); /* the asm statement keeps its own copy of the string */
}
The counter gives each probe its own numeric local label (990:, 991:, etc.).
The backward reference %db binds to the nearest preceding definition of that
label, so it always resolves to the nop emitted in the same asm statement.
The probe name encodes the event type and function name: entry__main,
exit__printf.
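The descsz arithmetic and the template can be checked outside GCC. The following is a standalone sketch, not the plugin's code: usdt_descsz and usdt_asm_text are hypothetical helper names, and snprintf stands in for asprintf so that no GNU extension is needed. It assumes provider and probe names contain no quote or backslash characters, which holds for C identifiers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Descriptor size: three 8-byte fields (PC, base, semaphore) plus three
   NUL-terminated strings (provider, probe name, argument descriptors). */
static size_t usdt_descsz(const char *provider, const char *probe,
                          const char *args)
{
    return 24 + strlen(provider) + 1 + strlen(probe) + 1 + strlen(args) + 1;
}

/* Same directives as the plugin's template, kept as a string literal so
   the compiler can still check the format arguments. */
#define USDT_FMT \
    "%d:\n\t" \
    "nop\n\t" \
    ".pushsection .note.stapsdt,\"?\",\"note\"\n\t" \
    ".balign 4\n\t" \
    ".4byte 8\n\t" \
    ".4byte %zu\n\t" \
    ".4byte 3\n\t" \
    ".asciz \"stapsdt\"\n\t" \
    ".balign 4\n\t" \
    ".8byte %db\n\t" \
    ".8byte 0\n\t" \
    ".8byte 0\n\t" \
    ".asciz \"%s\"\n\t" \
    ".asciz \"%s\"\n\t" \
    ".asciz \"\"\n\t" \
    ".balign 4\n\t" \
    ".popsection"

/* Build the assembler text for one probe with no typed arguments.
   Returns a malloc'd string the caller must free, or NULL on failure. */
static char *usdt_asm_text(const char *provider, const char *probe, int label)
{
    assert(provider != NULL && probe != NULL && label >= 0);
    size_t descsz = usdt_descsz(provider, probe, "");

    /* First pass measures the output, second pass writes it. */
    int n = snprintf(NULL, 0, USDT_FMT, label, descsz, label, provider, probe);
    if (n < 0) return NULL;
    char *text = malloc((size_t)n + 1);
    if (text == NULL) return NULL;
    snprintf(text, (size_t)n + 1, USDT_FMT,
             label, descsz, label, provider, probe);
    return text;
}
```

For provider inject and probe entry__main, usdt_descsz returns 24 + 7 + 12 + 1 = 44, which is the value the hand-written note needs in its descsz field.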
In instrument_entry and instrument_exit the GIMPLE call to
dump_call_info is replaced with a call to emit_usdt_probe:
if (inject_mode == 2) {
    char *probe_name = NULL;
    if (asprintf(&probe_name, "entry__%s", current_function_name()) < 0)
        return; /* allocation failed: leave the function uninstrumented */
    gimple_seq probe_seq = NULL;
    emit_usdt_probe(&probe_seq, inject_provider, probe_name, probe_counter++);
    gsi_insert_seq_before(&gsi, probe_seq, GSI_NEW_STMT);
    free(probe_name);
    return;
}
No __inject_trace_write is synthesized in USDT mode. No file is opened, no
spinlock runs, no syscalls execute until a tracer attaches.
Building with USDT probes
make test_usdt
# Verify the notes are present
readelf -n test_plugin | grep -c stapsdt
# List all probes
bpftrace -l 'usdt:./test_plugin:inject:*'
Output:
usdt:./test_plugin:inject:entry__main
usdt:./test_plugin:inject:exit__main
usdt:./test_plugin:inject:entry__fun_char
usdt:./test_plugin:inject:exit__fun_char
usdt:./test_plugin:inject:entry__write_ext2
usdt:./test_plugin:inject:exit__write_ext2
usdt:./test_plugin:inject:entry__doSomeThing
usdt:./test_plugin:inject:exit__doSomeThing
The provider name defaults to inject, overridable with
-fplugin-arg-inject-provider=myprov. Probe names are
entry__<funcname> and exit__<funcname> — double underscore since function
names can contain single underscores.
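Because the event prefixes entry and exit contain no underscores, a consumer can recover the function name by splitting at the first double underscore, even when the function name itself starts with underscores. A small sketch of that split, where split_probe_name is a hypothetical helper and not part of the plugin:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split a probe name such as "entry__fun_char" at the first "__".
   On success, stores the length of the event prefix in *event_len and
   returns a pointer to the function name inside probe.
   Returns NULL if there is no "__" or the event prefix is empty. */
static const char *split_probe_name(const char *probe, size_t *event_len)
{
    assert(probe != NULL && event_len != NULL);
    const char *sep = strstr(probe, "__");
    if (sep == NULL || sep == probe)
        return NULL;
    *event_len = (size_t)(sep - probe);
    return sep + 2;
}
```

With this rule, exit____libc_start_main splits into the event exit and the function __libc_start_main, since only the first separator is consumed.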
Attaching with bpftrace
# Trace all entry events
bpftrace -e '
usdt:./test_plugin:inject:entry__main {
printf("main entered tid=%d\n", tid);
}
usdt:./test_plugin:inject:entry__doSomeThing {
printf("thread fn tid=%d\n", tid);
}
' -c ./test_plugin
# Measure time in a function
bpftrace -e '
usdt:./test_plugin:inject:entry__fun_char { @start[tid] = nsecs; }
usdt:./test_plugin:inject:exit__fun_char /@start[tid]/ {
printf("fun_char: %d ns\n", nsecs - @start[tid]);
delete(@start[tid]);
}
' -c ./test_plugin
# Count calls per function
bpftrace -e '
usdt:./test_plugin:inject:entry__* {
@calls[probe] = count();
}
END { print(@calls); }
' -c ./test_plugin
USDT on instrumented glibc
Build glibc with USDT mode:
cd ~/work/glibc_build && rm -rf *
~/work/glibc/configure \
CC="/usr/bin/gcc -g -O2 \
-Wno-error=stringop-truncation \
-Wno-error=maybe-uninitialized \
-Wno-error=uninitialized \
-fplugin=/path/to/inject.so \
-fplugin-arg-inject-mode=usdt \
-fplugin-arg-inject-provider=glibc" \
--prefix="$HOME/work/glibc_install_usdt"
make -j$(nproc) && make install
# How many probes?
bpftrace -l \
"usdt:/home/markus/work/glibc_install_usdt/lib/libc.so.6:glibc:*" | wc -l
Thousands. Every non-excluded function in glibc has an entry and exit probe.
# Trace the printf call chain live
bpftrace -e '
usdt:/home/markus/work/glibc_install_usdt/lib/libc.so.6:glibc:entry__* {
printf("%s\n", probe);
}
' -c /tmp/hello
The multi-library problem
bpftrace requires each shared library to be named explicitly. When a program
loads several instrumented libraries you need all of them in the probe spec.
Use ldd to find which ones have stapsdt notes:
for lib in $(ldd /tmp/hello | awk '{print $3}' | grep '^/'); do
count=$(readelf -n "$lib" 2>/dev/null | grep -c stapsdt)
[ "$count" -gt 0 ] && echo "$lib"
done
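perf handles this more cleanly: it addresses probes by provider name instead
of file path and follows all loaded libraries automatically. Depending on the
perf version, the instrumented binaries may first need to be registered with
perf buildid-cache --add: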
perf record -e 'sdt_glibc:entry__*' /tmp/hello
perf script | head -30
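The readelf check in the shell loop above can also be done programmatically. The following is a rough sketch for 64-bit ELF files on Linux using the structures from <elf.h>. has_stapsdt_section is a hypothetical helper name, and the sketch ignores edge cases such as files with more than 65,279 sections.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the 64-bit ELF file at path has a section named
   .note.stapsdt, 0 if it has none, and -1 on I/O errors, non-ELF input,
   or files without section headers. */
static int has_stapsdt_section(const char *path)
{
    int result = -1;
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    char *names = NULL;
    size_t strsz = 0;
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    /* Validate the ELF header before trusting any offsets in it. */
    if (fread(&eh, sizeof eh, 1, f) != 1)
        goto out;
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        goto out;
    if (eh.e_shentsize != sizeof(Elf64_Shdr) || eh.e_shnum == 0 ||
        eh.e_shstrndx >= eh.e_shnum)
        goto out;

    /* Read the whole section header table. */
    sh = malloc((size_t)eh.e_shnum * sizeof *sh);
    if (sh == NULL)
        goto out;
    if (fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum)
        goto out;

    /* Read the section name string table. */
    strsz = (size_t)sh[eh.e_shstrndx].sh_size;
    if (strsz == 0)
        goto out;
    names = malloc(strsz);
    if (names == NULL)
        goto out;
    if (fseek(f, (long)sh[eh.e_shstrndx].sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, strsz, f) != strsz || names[strsz - 1] != '\0')
        goto out;

    /* Compare every section name against .note.stapsdt. */
    result = 0;
    for (size_t i = 0; i < eh.e_shnum; i++) {
        if (sh[i].sh_name < strsz &&
            strcmp(names + sh[i].sh_name, ".note.stapsdt") == 0) {
            result = 1;
            break;
        }
    }

out:
    free(names);
    free(sh);
    fclose(f);
    return result;
}
```

Calling this on each path printed by ldd gives the same list as the loop, without spawning readelf and grep per library.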
Comparing the three modes
| | serial | serial_full | usdt |
|---|---|---|---|
| Overhead | writev per call | writev per call | nop per call |
| Output | file | file | bpftrace/perf/stap |
| glibc/ld.so | no | yes | yes |
| External deps | tracer_helpers.o | none | none |
| Log path | configurable | plugin arg | — |
| Live attach | no | no | yes |
| Offline analysis | yes | yes | via perf record |
serial_full is for when you want a complete offline log: a full timeline of
every call, every thread, and every instruction pointer, written to a file.
usdt is for when you want live visibility into a running system with surgical
precision and no cost beyond a nop on the paths you are not watching.
Both modes work in ld.so. Both trace _dl_start. The choice is whether you
want to watch or record.
The complete plugin
All three modes live in a single inject.so:
# serial_full — complete offline trace, glibc-safe
gcc -fplugin=./inject.so \
-fplugin-arg-inject-mode=serial_full \
-fplugin-arg-inject-logpath=/tmp/trace.log \
-fplugin-arg-inject-rules=rules.yml \
myprogram.c -o myprogram
# usdt — live probes, zero overhead
gcc -fplugin=./inject.so \
-fplugin-arg-inject-mode=usdt \
-fplugin-arg-inject-provider=myapp \
-fplugin-arg-inject-rules=rules.yml \
myprogram.c -o myprogram
# serial — normal programs, external helpers
gcc -fplugin=./inject.so \
-fplugin-arg-inject-mode=serial \
-fplugin-arg-inject-rules=rules.yml \
myprogram.c tracer_helpers.o inject_globals.o -o myprogram
The code is at https://git.fynq.rocks/markus/asm_inject.