    selftest/bpf: Fmod_ret prog and implement test_overhead as part of bench · 4eaf0b5c
    Andrii Nakryiko authored
    Add an fmod_ret BPF program to the existing test_overhead selftest. Also
    re-implement the user-space benchmarking part in the benchmark runner to
    compare results. Results with ./bench are consistently somewhat lower than
    test_overhead's, but the relative performance of the various types of BPF
    programs stays consistent (e.g., kretprobe is noticeably slower). This
    slowdown seems to come from the fact that test_overhead is single-threaded,
    while the benchmark always spins off at least one thread for the producer.
    This has been confirmed by hacking up a multi-threaded test_overhead variant
    and a single-threaded bench variant. Results are below. The
    run_bench_rename.sh script from the benchs/ subdirectory was used to produce
    the results for ./bench.
    
    Single-threaded implementations
    ===============================
    
    /* bench: single-threaded, atomics */
    base      :    4.622 ± 0.049M/s
    kprobe    :    3.673 ± 0.052M/s
    kretprobe :    2.625 ± 0.052M/s
    rawtp     :    4.369 ± 0.089M/s
    fentry    :    4.201 ± 0.558M/s
    fexit     :    4.309 ± 0.148M/s
    fmodret   :    4.314 ± 0.203M/s
    
    /* selftest: single-threaded, no atomics */
    task_rename base        4555K events per sec
    task_rename kprobe      3643K events per sec
    task_rename kretprobe   2506K events per sec
    task_rename raw_tp      4303K events per sec
    task_rename fentry      4307K events per sec
    task_rename fexit       4010K events per sec
    task_rename fmod_ret    3984K events per sec
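
To make the "relative performance" comparison concrete, the single-threaded ./bench numbers above can be normalized against the base run. The snippet below is only an illustrative sketch and is not part of the commit: the dictionary and the `relative_to_base` helper are hypothetical names, and the values are copied from the table above.

```python
# Throughput in M/s, copied from the "bench: single-threaded, atomics" table.
bench_single = {
    "base": 4.622,
    "kprobe": 3.673,
    "kretprobe": 2.625,
    "rawtp": 4.369,
    "fentry": 4.201,
    "fexit": 4.309,
    "fmodret": 4.314,
}


def relative_to_base(results):
    """Return each program type's throughput as a fraction of the base run."""
    base = results["base"]
    return {name: rate / base for name, rate in results.items()}


ratios = relative_to_base(bench_single)
for name, ratio in ratios.items():
    print(f"{name:<10}: {ratio:6.1%} of base")
```

On these numbers kretprobe has the lowest ratio, at roughly 57% of base throughput, which matches the "noticeably slower" remark. The ± error margins are ignored here, so small differences between fentry, fexit and fmodret should not be over-interpreted.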
    
    Multi-threaded implementations
    ==============================
    
    /* bench: multi-threaded w/ atomics */
    base      :    3.910 ± 0.023M/s
    kprobe    :    3.048 ± 0.037M/s
    kretprobe :    2.300 ± 0.015M/s
    rawtp     :    3.687 ± 0.034M/s
    fentry    :    3.740 ± 0.087M/s
    fexit     :    3.510 ± 0.009M/s
    fmodret   :    3.485 ± 0.050M/s
    
    /* selftest: multi-threaded w/ atomics */
    task_rename base        3872K events per sec
    task_rename kprobe      3068K events per sec
    task_rename kretprobe   2350K events per sec
    task_rename raw_tp      3731K events per sec
    task_rename fentry      3639K events per sec
    task_rename fexit       3558K events per sec
    task_rename fmod_ret    3511K events per sec
    
    /* selftest: multi-threaded, no atomics */
    task_rename base        3945K events per sec
    task_rename kprobe      3298K events per sec
    task_rename kretprobe   2451K events per sec
    task_rename raw_tp      3718K events per sec
    task_rename fentry      3782K events per sec
    task_rename fexit       3543K events per sec
    task_rename fmod_ret    3526K events per sec
    
    Note that ./bench always uses atomic increments for counting, while
    test_overhead doesn't, but this has little influence on the test results.
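
As a rough check of the atomics remark, the two multi-threaded selftest tables can be compared directly. This is a sketch over the single run reported above, not a statistical analysis: `percent_change` is a hypothetical helper, and the values (K events per sec) are copied from the tables.

```python
# K events per sec, copied from the multi-threaded selftest tables.
with_atomics = {
    "base": 3872, "kprobe": 3068, "kretprobe": 2350, "raw_tp": 3731,
    "fentry": 3639, "fexit": 3558, "fmod_ret": 3511,
}
no_atomics = {
    "base": 3945, "kprobe": 3298, "kretprobe": 2451, "raw_tp": 3718,
    "fentry": 3782, "fexit": 3543, "fmod_ret": 3526,
}


def percent_change(before, after):
    """Percent change per program type when going from `before` to `after`."""
    return {
        name: (after[name] - before[name]) / before[name] * 100.0
        for name in before
    }


# Negative values mean the atomic-counter variant was slower in this run.
delta = percent_change(no_atomics, with_atomics)
for name, pct in delta.items():
    print(f"{name:<10}: {pct:+.2f}%")
```

In this run the largest gap is for kprobe (about 7% lower with atomics), while raw_tp and fexit come out marginally higher with atomics, so the differences are small and not uniform in direction.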
    Signed-off-by: Andrii Nakryiko <andriin@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20200512192445.2351848-4-andriin@fb.com
run_bench_rename.sh 225 Bytes