• Daniel Borkmann's avatar
    bpf, arm64: optimize 32/64 immediate emission · 6d2eea6f
    Daniel Borkmann authored
    Improve the JIT to emit 64 and 32 bit immediates, the current
    algorithm is not optimal and we often emit more instructions
    than actually needed. arm64 has movz, movn, movk variants but
    for the current 64 bit immediates we only use movz with a
    series of movk when needed.
    
    For example loading ffffffffffffabab emits the following 4
    instructions in the JIT today:
    
      * movz: abab, shift:  0, result: 000000000000abab
      * movk: ffff, shift: 16, result: 00000000ffffabab
      * movk: ffff, shift: 32, result: 0000ffffffffabab
      * movk: ffff, shift: 48, result: ffffffffffffabab
    
    Whereas after the patch the same load only needs a single
    instruction:
    
      * movn: 5454, shift:  0, result: ffffffffffffabab
    
    Another example where two extra instructions can be saved:
    
      * movz: abab, shift:  0, result: 000000000000abab
      * movk: 1f2f, shift: 16, result: 000000001f2fabab
      * movk: ffff, shift: 32, result: 0000ffff1f2fabab
      * movk: ffff, shift: 48, result: ffffffff1f2fabab
    
    After the patch:
    
      * movn: e0d0, shift: 16, result: ffffffff1f2fffff
      * movk: abab, shift:  0, result: ffffffff1f2fabab
    
    Another example with movz, before:
    
      * movz: 0000, shift:  0, result: 0000000000000000
      * movk: fea0, shift: 32, result: 0000fea000000000
    
    After:
    
      * movz: fea0, shift: 32, result: 0000fea000000000
    
    Moreover, reuse emit_a64_mov_i() for 32 bit immediates that
    are loaded via emit_a64_mov_i64() which is a similar optimization
    as done in 6fe8b9c1 ("bpf, x64: save several bytes by using
    mov over movabsq when possible"). On arm64, the latter allows to
    use a single instruction with movn due to zero extension where
    otherwise two would be needed. And last but not least add a
    missing optimization in emit_a64_mov_i() where movn is used but
    the subsequent movk not needed. With some of the Cilium programs
    in use, this shrinks the needed instructions by about three
    percent. Tested on Cavium ThunderX CN8890.
    Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
    6d2eea6f
bpf_jit_comp.c 23.6 KB