    selftests/bpf: Add BPF trampoline performance test · c4781e37
    Alexei Starovoitov authored
    Add a test that benchmarks different ways of attaching a BPF program to a kernel function.
    Here are the results for a 2.4GHz x86 CPU on a kernel without mitigations:
    $ ./test_progs -n 49 -v|grep events
    task_rename base	2743K events per sec
    task_rename kprobe	2419K events per sec
    task_rename kretprobe	1876K events per sec
    task_rename raw_tp	2578K events per sec
    task_rename fentry	2710K events per sec
    task_rename fexit	2685K events per sec
    
    On a kernel with retpoline:
    $ ./test_progs -n 49 -v|grep events
    task_rename base	2401K events per sec
    task_rename kprobe	1930K events per sec
    task_rename kretprobe	1485K events per sec
    task_rename raw_tp	2053K events per sec
    task_rename fentry	2351K events per sec
    task_rename fexit	2185K events per sec
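
To make the two tables easier to compare, the throughput figures can be turned into a slowdown relative to the unattached base case. The sketch below is only illustrative arithmetic on the numbers quoted above; the names `RESULTS` and `slowdown_pct` are hypothetical and are not part of the selftest.

```python
# Throughput in K events per second, copied from the two result tables above.
RESULTS = {
    "no mitigations": {
        "base": 2743, "kprobe": 2419, "kretprobe": 1876,
        "raw_tp": 2578, "fentry": 2710, "fexit": 2685,
    },
    "retpoline": {
        "base": 2401, "kprobe": 1930, "kretprobe": 1485,
        "raw_tp": 2053, "fentry": 2351, "fexit": 2185,
    },
}


def slowdown_pct(rates):
    """Return the throughput loss of each attach method relative to base, in percent."""
    base = rates["base"]
    return {
        name: 100.0 * (base - rate) / base
        for name, rate in rates.items()
        if name != "base"
    }


if __name__ == "__main__":
    for config, rates in RESULTS.items():
        print(config)
        for name, pct in slowdown_pct(rates).items():
            print(f"  {name:<10} {pct:5.1f}% below base")
```

With these figures, fentry has the smallest loss in both configurations and kretprobe the largest. The figures are single runs on one machine, so small differences should not be over-interpreted.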
    
    All 5 approaches:
    - kprobe/kretprobe in __set_task_comm()
    - raw tracepoint in trace_task_rename()
    - fentry/fexit in __set_task_comm()
    are roughly equivalent in functionality: each one fires when a task is renamed.
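
All of the hook points listed above sit on the task rename path, so a benchmark only needs a cheap way to rename a task in a tight loop. The following is a minimal user-space sketch, assuming the renames are triggered by writing to `/proc/self/comm` (which reaches `__set_task_comm()` on Linux). The helper names are hypothetical, this is not the selftest's code, and Python's interpreter overhead means its numbers will not match the ones reported here.

```python
import os
import time

COMM_PATH = "/proc/self/comm"  # Linux only: writing here renames the calling task


def k_events_per_sec(count, elapsed_ns):
    """Thousands of events per second, the unit used in the results above."""
    return count * 1_000_000 // elapsed_ns


def bench_task_rename(count, name=b"test_overhead"):
    """Rename the current task `count` times (count > 0); return K events per second."""
    with open(COMM_PATH, "rb") as f:
        saved = f.read().rstrip(b"\n")  # remember the original task name
    fd = os.open(COMM_PATH, os.O_WRONLY)
    try:
        start = time.monotonic_ns()
        for _ in range(count):
            os.write(fd, name)  # each write is one rename event
        elapsed = time.monotonic_ns() - start
        os.write(fd, saved)  # restore the original task name
    finally:
        os.close(fd)
    return k_events_per_sec(count, max(elapsed, 1))


if __name__ == "__main__":
    if os.path.exists(COMM_PATH):
        print(f"task_rename base\t{bench_task_rename(100_000)}K events per sec")
    else:
        print("no /proc/self/comm on this system; skipping")
```

Running the same loop once with no program attached and once per attach method gives one line per method, in the same shape as the output above.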
    
    __set_task_comm() by itself is quite fast, so any extra instructions add up.
    Until the BPF trampoline was introduced, the fastest mechanism was the raw
    tracepoint, and kprobe via ftrace was second best. kretprobe is slow due to
    the trap it takes. The new fentry/fexit methods via the BPF trampoline are
    clearly the fastest, and the difference is more pronounced with retpoline on,
    since the BPF trampoline doesn't use indirect jumps.
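
The claim that extra instructions add up is easier to see as time per event than as throughput. The sketch below converts the reported rates into nanoseconds per event and subtracts the base cost, for both kernel configurations. It is plain arithmetic on the figures in this message; `RATES`, `ns_per_event`, and `added_ns` are hypothetical names chosen for illustration.

```python
# (no mitigations, retpoline) throughput in K events per second, from the results above.
RATES = {
    "base": (2743, 2401),
    "kprobe": (2419, 1930),
    "kretprobe": (1876, 1485),
    "raw_tp": (2578, 2053),
    "fentry": (2710, 2351),
    "fexit": (2685, 2185),
}


def ns_per_event(k_events_per_sec):
    """Convert K events per second into nanoseconds per event."""
    return 1_000_000 / k_events_per_sec


def added_ns(method):
    """Extra nanoseconds per event over base, as (no mitigations, retpoline)."""
    return tuple(
        ns_per_event(rate) - ns_per_event(base)
        for rate, base in zip(RATES[method], RATES["base"])
    )


if __name__ == "__main__":
    for method in RATES:
        if method == "base":
            continue
        plain, retpoline = added_ns(method)
        print(f"{method:<10} +{plain:6.1f} ns  (retpoline: +{retpoline:6.1f} ns)")
```

On these figures, fentry adds roughly 4 ns per event without mitigations and roughly 9 ns with retpoline, while the raw tracepoint goes from roughly 23 ns to roughly 71 ns, which is consistent with the observation that the gap widens with retpoline on.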
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/bpf/20191122011515.255371-1-ast@kernel.org
test_overhead.c 3.85 KB