Zero-Overhead USDT Probes Without sys/sdt.h


The previous two posts built a GCC plugin that writes a trace file on every function entry and exit. That works well for offline analysis but has a cost: every call executes a spinlock check, a gettid syscall, a hex formatting loop, and a writev. Even with an uncontended lock and the file already opened with O_APPEND, that is work done on every function call in your program.

This post adds a third mode: USDT probes. Each probe is a single nop instruction plus an ELF note. When no tracer is attached, the overhead is literally one nop, too small to measure. When bpftrace or perf attaches, the kernel's uprobe machinery replaces the nop with a breakpoint and runs your script whenever it is hit.

USDT probes normally require #include <sys/sdt.h>, whose STAP_PROBE macros expand into the appropriate asm. The interesting part: we skip the header entirely and synthesize the probe and its ELF note directly in GIMPLE from the plugin. No external library, no header dependency, and it works in glibc’s ld.so just like serial_full.

How USDT probes work

A USDT probe has two parts. First, a labeled nop at the probe site:

990:
    nop

Second, an entry in the .note.stapsdt ELF section describing the probe:

.pushsection .note.stapsdt,"?","note"
.balign 4
.4byte 8              /* namesz: "stapsdt\0" */
.4byte 44             /* descsz: 3*8 + "inject\0" (7) + "entry__main\0" (12) + "\0" (1) */
.4byte 3              /* NT_STAPSDT */
.asciz "stapsdt"
.balign 4
.8byte 990b           /* probe PC — address of the nop */
.8byte 0              /* base address */
.8byte 0              /* semaphore address */
.asciz "inject"       /* provider name */
.asciz "entry__main"  /* probe name */
.asciz ""             /* argument descriptors */
.balign 4
.popsection
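To double-check the descsz arithmetic, the sketch below lays out the same note in a byte buffer, one memcpy per directive. It assumes a 64-bit target and writes fields in host byte order, as the assembler does for a native build; stapsdt_descsz and build_stapsdt_note are hypothetical helpers, not part of the plugin.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round up to the 4-byte boundary that ".balign 4" enforces. */
static size_t align4(size_t n) { return (n + 3) & ~(size_t)3; }

/* descsz: PC + base + semaphore (3 x 8 bytes) plus three NUL-terminated
   strings: provider, probe name and the (empty) argument string. */
static uint32_t stapsdt_descsz(const char *provider, const char *name) {
    return (uint32_t)(24 + strlen(provider) + 1 + strlen(name) + 1 + 1);
}

/* Hypothetical helper: writes one NT_STAPSDT note into buf (caller makes
   sure it is large enough) and returns the number of bytes used. */
static size_t build_stapsdt_note(unsigned char *buf, uint64_t pc,
                                 const char *provider, const char *name) {
    uint32_t namesz = 8;                        /* "stapsdt\0" */
    uint32_t descsz = stapsdt_descsz(provider, name);
    uint32_t type   = 3;                        /* NT_STAPSDT */
    uint64_t zero   = 0;
    size_t off = 0;

    memcpy(buf + off, &namesz, 4); off += 4;
    memcpy(buf + off, &descsz, 4); off += 4;
    memcpy(buf + off, &type,   4); off += 4;
    memcpy(buf + off, "stapsdt", 8); off += 8;  /* already 4-aligned */

    size_t desc_start = off;
    memcpy(buf + off, &pc,   8); off += 8;      /* probe PC */
    memcpy(buf + off, &zero, 8); off += 8;      /* base address */
    memcpy(buf + off, &zero, 8); off += 8;      /* semaphore */
    memcpy(buf + off, provider, strlen(provider) + 1);
    off += strlen(provider) + 1;
    memcpy(buf + off, name, strlen(name) + 1);
    off += strlen(name) + 1;
    buf[off++] = '\0';                          /* empty argument string */

    /* off - desc_start == descsz; zero-fill up to the trailing .balign 4 */
    size_t end = desc_start + align4(descsz);
    memset(buf + off, 0, end - off);
    return end;
}
```

For inject/entry__main this gives a descsz of 44 and a 64-byte note: 12 bytes of header, 8 for the owner name, 44 for the descriptor.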

When bpftrace reads the binary, it finds all .note.stapsdt entries, extracts each probe PC, and installs a kernel uprobe at that address. When the process reaches the probe, the kernel traps, runs the BPF program generated from your bpftrace script, and resumes execution. If no tracer is attached, the nop just executes and the process continues normally.
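Installing that uprobe involves one translation step: the kernel's uprobe interface identifies a probe by file and file offset, while the note stores a virtual address. A tracer therefore maps the probe PC through the binary's PT_LOAD program headers. The sketch below shows only that step, assuming a 64-bit ELF whose program headers are already in memory and no prelink base adjustment; pc_to_file_offset is a hypothetical name, not an API of bpftrace, perf, or libbpf.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the file offset backing vaddr, or -1 if no PT_LOAD segment
   maps that address from the file. */
static int64_t pc_to_file_offset(const Elf64_Phdr *ph, size_t n,
                                 uint64_t vaddr) {
    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type != PT_LOAD)
            continue;
        /* only bytes actually present in the file (p_filesz), not .bss */
        if (vaddr >= ph[i].p_vaddr && vaddr < ph[i].p_vaddr + ph[i].p_filesz)
            return (int64_t)(vaddr - ph[i].p_vaddr + ph[i].p_offset);
    }
    return -1;
}
```

With a text segment at p_vaddr 0x401000 loaded from file offset 0x1000, a probe PC of 0x401136 lands at file offset 0x1136.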

The note format is well-documented in the SystemTap source and used identically by bpftrace, perf, and stap. No tool-specific format — one note format, all tools.
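Because every tool reads the same layout, decoding a record takes only a few lines. Here is a sketch that parses one NT_STAPSDT note from a byte buffer, assuming it was written in host byte order by a 64-bit build; struct stapsdt_probe and parse_stapsdt_note are illustrative names, not the API of any of these tools. It returns the padded record length, so a caller can step to the next note in the section.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct stapsdt_probe {
    uint64_t    pc, base, semaphore;
    const char *provider, *name, *args;   /* point into the buffer */
};

/* Parses the note at buf[0..len). Returns bytes consumed (including
   padding) or 0 if the record is malformed or not an NT_STAPSDT note. */
static size_t parse_stapsdt_note(const unsigned char *buf, size_t len,
                                 struct stapsdt_probe *p) {
    uint32_t namesz, descsz, type;
    if (len < 12)
        return 0;
    memcpy(&namesz, buf, 4);
    memcpy(&descsz, buf + 4, 4);
    memcpy(&type,   buf + 8, 4);

    size_t name_off = 12;
    size_t desc_off = name_off + (((size_t)namesz + 3) & ~(size_t)3);
    size_t end      = desc_off + (((size_t)descsz + 3) & ~(size_t)3);
    if (type != 3 || namesz != 8 || end > len || descsz < 24 + 3 ||
        memcmp(buf + name_off, "stapsdt", 8) != 0)
        return 0;

    const unsigned char *d = buf + desc_off;
    memcpy(&p->pc,        d,      8);
    memcpy(&p->base,      d + 8,  8);
    memcpy(&p->semaphore, d + 16, 8);

    /* Provider, name and args: each must be NUL-terminated inside desc. */
    const char *s = (const char *)d + 24;
    const char *limit = (const char *)d + descsz;
    const char **fields[3] = { &p->provider, &p->name, &p->args };
    for (int i = 0; i < 3; i++) {
        const char *nul = memchr(s, '\0', (size_t)(limit - s));
        if (!nul)
            return 0;
        *fields[i] = s;
        s = nul + 1;
    }
    return end;
}
```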

Synthesizing the note in the plugin

The plugin emits both parts — the nop and the .note.stapsdt entry — as a single gimple_build_asm_vec call. The asm string contains the nop and the full assembler directives for the note:

static void emit_usdt_probe(gimple_seq *seq,
                            const char *provider,
                            const char *probe_name,
                            int         counter) {
    /* descsz = probe PC + base + semaphore (3 x 8 bytes)
               + provider, probe name and empty argument string,
               each NUL-terminated */
    int descsz = 24 + strlen(provider) + 1
                    + strlen(probe_name) + 1
                    + 1;

    char *tmpl = NULL;
    if (asprintf(&tmpl,
        "%d:\n\t"
        "nop\n\t"
        ".pushsection .note.stapsdt,\"?\",\"note\"\n\t"
        ".balign 4\n\t"
        ".4byte 8\n\t"
        ".4byte %d\n\t"
        ".4byte 3\n\t"
        ".asciz \"stapsdt\"\n\t"
        ".balign 4\n\t"
        ".8byte %db\n\t"   /* probe PC = address of the nop */
        ".8byte 0\n\t"     /* base address */
        ".8byte 0\n\t"     /* semaphore */
        ".asciz \"%s\"\n\t"
        ".asciz \"%s\"\n\t"
        ".asciz \"\"\n\t"  /* no typed arguments */
        ".balign 4\n\t"
        ".popsection",
        counter, descsz, counter, provider, probe_name) < 0)
        return;            /* out of memory: skip this probe */

    /* gimple_build_asm_vec returns a gasm * and copies tmpl into GC memory */
    gasm *g = gimple_build_asm_vec(tmpl, NULL, NULL, NULL, NULL);
    gimple_asm_set_volatile(g, true);
    gimple_seq_add_stmt(seq, g);
    free(tmpl);
}

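The counter gives each probe site its own numeric local label (990:, 991:, etc.), and %db refers back to that probe's nop. Strictly speaking, uniqueness is a convenience: GNU as allows numeric local labels to be redefined, and a backward reference always binds to the nearest preceding definition, which is why sys/sdt.h can use 990 for every probe. The probe name encodes the event type and function name: entry__main, exit__printf.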
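For comparison, here is the same kind of probe written by hand in a C file, a sketch assuming GCC with GNU as on x86-64 or AArch64 Linux; USDT_PROBE, work and the myapp provider are made-up names. It follows sys/sdt.h in two details where it differs from emit_usdt_probe: every probe reuses the label 990, and the assembler computes namesz and descsz from label differences, so nothing has to be counted in C.

```c
/* Hypothetical macro: one USDT probe with no arguments.
   provider and name must be string literals. */
#define USDT_PROBE(provider, name)                                        \
    __asm__ __volatile__(                                                 \
        "990: nop\n\t"                                                    \
        ".pushsection .note.stapsdt,\"?\",\"note\"\n\t"                   \
        ".balign 4\n\t"                                                   \
        ".4byte 992f-991f, 994f-993f, 3\n\t" /* namesz, descsz, type */   \
        "991: .asciz \"stapsdt\"\n\t"                                     \
        "992: .balign 4\n\t"                                              \
        "993: .8byte 990b\n\t"               /* probe PC */               \
        ".8byte 0, 0\n\t"                    /* base, semaphore */        \
        ".asciz \"" provider "\"\n\t"                                     \
        ".asciz \"" name "\"\n\t"                                         \
        ".asciz \"\"\n\t"                    /* no arguments */           \
        "994: .balign 4\n\t"                                              \
        ".popsection")

/* Example function with an entry and an exit probe; without a tracer
   both probes are plain nops. */
static int work(int x) {
    USDT_PROBE("myapp", "entry__work");
    x *= 2;
    USDT_PROBE("myapp", "exit__work");
    return x;
}
```

After building, readelf -n should list two NT_STAPSDT notes with provider myapp, named entry__work and exit__work. Letting the assembler measure the strings also removes a class of bugs: a hand-counted descsz that is off by a few bytes still assembles, but tools may then misparse the note and every note after it.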

In instrument_entry and instrument_exit, USDT mode skips building the GIMPLE call to dump_call_info and calls emit_usdt_probe instead:

if (inject_mode == 2) {
    char *probe_name;
    asprintf(&probe_name, "entry__%s", current_function_name());
    gimple_seq probe_seq = NULL;
    emit_usdt_probe(&probe_seq, inject_provider, probe_name, probe_counter++);
    gsi_insert_seq_before(&gsi, probe_seq, GSI_NEW_STMT);
    free(probe_name);
    return;
}

No __inject_trace_write is synthesized in USDT mode. No file is opened, no spinlock runs, and the program never issues a syscall for tracing, not even after a tracer attaches: probe hits are handled by the kernel.

Building with USDT probes

make test_usdt

# Verify the notes are present (readelf prints one NT_STAPSDT line per probe)
readelf -n test_plugin | grep -c NT_STAPSDT

# List all probes
bpftrace -l 'usdt:./test_plugin:inject:*'

Output:

usdt:./test_plugin:inject:entry__main
usdt:./test_plugin:inject:exit__main
usdt:./test_plugin:inject:entry__fun_char
usdt:./test_plugin:inject:exit__fun_char
usdt:./test_plugin:inject:entry__write_ext2
usdt:./test_plugin:inject:exit__write_ext2
usdt:./test_plugin:inject:entry__doSomeThing
usdt:./test_plugin:inject:exit__doSomeThing

The provider name defaults to inject, overridable with -fplugin-arg-inject-provider=myprov. Probe names are entry__<funcname> and exit__<funcname> — double underscore since function names can contain single underscores.

Attaching with bpftrace

# Print a line when main or doSomeThing is entered
bpftrace -e '
usdt:./test_plugin:inject:entry__main {
    printf("main entered tid=%d\n", tid);
}
usdt:./test_plugin:inject:entry__doSomeThing {
    printf("thread fn tid=%d\n", tid);
}
' -c ./test_plugin

# Measure time in a function
bpftrace -e '
usdt:./test_plugin:inject:entry__fun_char { @start[tid] = nsecs; }
usdt:./test_plugin:inject:exit__fun_char  {
    printf("fun_char: %d ns\n", nsecs - @start[tid]);
    delete(@start[tid]);
}
' -c ./test_plugin

# Count calls per function (bpftrace prints @calls when it exits)
bpftrace -e '
usdt:./test_plugin:inject:entry__* {
    @calls[probe] = count();
}
' -c ./test_plugin

USDT on instrumented glibc

Build glibc with USDT mode:

cd ~/work/glibc_build && rm -rf *

~/work/glibc/configure \
  CC='/usr/bin/gcc -g -O2 \
    -Wno-error=stringop-truncation \
    -Wno-error=maybe-uninitialized \
    -Wno-error=uninitialized \
    -fplugin=/path/to/inject.so \
    -fplugin-arg-inject-mode=usdt \
    -fplugin-arg-inject-provider=glibc' \
  --prefix=$HOME/work/glibc_install_usdt/

make -j$(nproc) && make install

# How many probes?
bpftrace -l \
  "usdt:/home/markus/work/glibc_install_usdt/lib/libc.so.6:glibc:*" | wc -l

Thousands. Every non-excluded function in glibc has an entry and exit probe.

# Trace the printf call chain live (/tmp/hello must run against the instrumented glibc, i.e. actually map this libc.so.6)
bpftrace -e '
usdt:/home/markus/work/glibc_install_usdt/lib/libc.so.6:glibc:entry__* {
    printf("%s\n", probe);
}
' -c /tmp/hello

The multi-library problem

bpftrace requires each shared library to be named explicitly, so when a program loads several instrumented libraries, you need one probe spec per library. Use ldd to find which ones have stapsdt notes. The awk filter prints every absolute path on a line, which also catches ld.so: ldd lists it without a => arrow.

for lib in $(ldd /tmp/hello | awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\//) print $i }'); do
    readelf -n "$lib" 2>/dev/null | grep -q NT_STAPSDT && echo "$lib"
done
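The same check can be done without shelling out to readelf. Below is a sketch in C that opens an ELF file, finds the .note.stapsdt section through the section headers, and counts the NT_STAPSDT records in it. It assumes a 64-bit ELF in host byte order and uses glibc's <elf.h> types; count_stapsdt_notes is a hypothetical helper, not part of the asm_inject repository.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads len bytes at file offset off into a fresh buffer; NULL on error. */
static void *read_at(FILE *f, long off, size_t len) {
    void *p = malloc(len ? len : 1);
    if (p && (fseek(f, off, SEEK_SET) != 0 || fread(p, 1, len, f) != len)) {
        free(p);
        p = NULL;
    }
    return p;
}

/* Counts NT_STAPSDT (type 3, owner "stapsdt") notes in .note.stapsdt.
   Returns 0 if there is no such section, -1 if the file can't be read
   as ELF64. */
static long count_stapsdt_notes(const char *path) {
    static const char want[] = ".note.stapsdt";
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    long count = -1;
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    const Elf64_Shdr *strtab;
    char *names = NULL;

    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shstrndx >= eh.e_shnum)
        goto out;

    sh = read_at(f, (long)eh.e_shoff, (size_t)eh.e_shnum * sizeof *sh);
    if (!sh)
        goto out;
    strtab = &sh[eh.e_shstrndx];            /* section name string table */
    names = read_at(f, (long)strtab->sh_offset, strtab->sh_size);
    if (!names)
        goto out;

    count = 0;
    for (size_t i = 0; i < eh.e_shnum; i++) {
        if (sh[i].sh_type != SHT_NOTE ||
            (size_t)sh[i].sh_name + sizeof want > strtab->sh_size ||
            memcmp(names + sh[i].sh_name, want, sizeof want) != 0)
            continue;

        unsigned char *d = read_at(f, (long)sh[i].sh_offset, sh[i].sh_size);
        if (!d) {
            count = -1;
            goto out;
        }
        size_t off = 0, size = sh[i].sh_size;
        while (off + sizeof(Elf64_Nhdr) <= size) {
            Elf64_Nhdr nh;
            memcpy(&nh, d + off, sizeof nh);
            size_t name_off = off + sizeof nh;
            if (nh.n_type == 3 && nh.n_namesz == 8 && name_off + 8 <= size &&
                memcmp(d + name_off, "stapsdt", 8) == 0)
                count++;
            /* owner name and descriptor are each padded to 4 bytes */
            off = name_off + (((size_t)nh.n_namesz + 3) & ~(size_t)3)
                           + (((size_t)nh.n_descsz + 3) & ~(size_t)3);
        }
        free(d);
    }
out:
    free(names);
    free(sh);
    fclose(f);
    return count;
}
```

A positive return means the library carries probes, so the function could stand in for the readelf | grep test in the loop above.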

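perf needs less bookkeeping, because its SDT event names (sdt_<provider>:<probe>) carry no library path. perf reads the notes from its build-id cache, though, so each instrumented library has to be added there once:

perf buildid-cache --add /home/markus/work/glibc_install_usdt/lib/libc.so.6
perf record -e 'sdt_glibc:entry__*' /tmp/hello
perf script | head -30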

Comparing the three modes

                  serial            serial_full       usdt
Overhead          writev per call   writev per call   nop per call
Output            file              file              bpftrace/perf/stap
glibc/ld.so       no                yes               yes
External deps     tracer_helpers.o  none              none
Log path          configurable      plugin arg        n/a
Live attach       no                no                yes
Offline analysis  yes               yes               via perf record

serial_full is for when you want a complete offline log: a full timeline of every call, every thread, and every IP address, written to a file. usdt is for when you want live visibility into a running system with surgical precision: the paths you are not watching cost one nop, and only the probes you attach trap into the kernel.

Both modes work in ld.so. Both trace _dl_start. The choice is whether you want to watch or record.

The complete plugin

All three modes live in a single inject.so:

# serial_full — complete offline trace, glibc-safe
gcc -fplugin=./inject.so \
    -fplugin-arg-inject-mode=serial_full \
    -fplugin-arg-inject-logpath=/tmp/trace.log \
    -fplugin-arg-inject-rules=rules.yml \
    myprogram.c -o myprogram

# usdt — live probes, zero overhead
gcc -fplugin=./inject.so \
    -fplugin-arg-inject-mode=usdt \
    -fplugin-arg-inject-provider=myapp \
    -fplugin-arg-inject-rules=rules.yml \
    myprogram.c -o myprogram

# serial — normal programs, external helpers
gcc -fplugin=./inject.so \
    -fplugin-arg-inject-mode=serial \
    -fplugin-arg-inject-rules=rules.yml \
    myprogram.c tracer_helpers.o inject_globals.o -o myprogram

The code is at https://git.fynq.rocks/markus/asm_inject.
