• Eduard Zingerman's avatar
    bpf: Inline calls to bpf_loop when callback is known · 1ade2371
    Eduard Zingerman authored
    Calls to `bpf_loop` are replaced with direct loops to avoid
    indirection. E.g. the following:
    
      bpf_loop(10, foo, NULL, 0);
    
    Is replaced by equivalent of the following:
    
      for (int i = 0; i < 10; ++i)
        foo(i, NULL);
    
    This transformation could be applied when:
    - callback is known and does not change during program execution;
    - flags passed to `bpf_loop` are always zero.
    
    Inlining logic works as follows:
    
    - During execution simulation function `update_loop_inline_state`
      tracks the following information for each `bpf_loop` call
      instruction:
      - is callback known and constant?
      - are flags constant and zero?
    - Function `optimize_bpf_loop` increases stack depth for functions
      where `bpf_loop` calls can be inlined and invokes `inline_bpf_loop`
      to apply the inlining. The additional stack space is used to spill
      registers R6, R7 and R8. These registers are used as loop counter,
      loop maximal bound and callback context parameter;
    
    Measurements using `benchs/run_bench_bpf_loop.sh` inside QEMU / KVM on
    i7-4710HQ CPU show a drop in latency from 14 ns/op to 2 ns/op.
    Signed-off-by: default avatarEduard Zingerman <eddyz87@gmail.com>
    Acked-by: default avatarSong Liu <songliubraving@fb.com>
    Link: https://lore.kernel.org/r/20220620235344.569325-4-eddyz87@gmail.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
    1ade2371
bpf_iter.c 17.1 KB