1. 25 Oct, 2022 17 commits
  2. 22 Oct, 2022 4 commits
  3. 21 Oct, 2022 15 commits
  4. 19 Oct, 2022 4 commits
    • Alexei Starovoitov's avatar
      Merge branch 'bpf,x64: Use BMI2 for shifts' · 04a8f9d7
      Alexei Starovoitov authored
      Jie Meng says:
      
      ====================
      
      With baseline x64 instruction set, shift count can only be an immediate
      or in %cl. The implicit dependency on %cl makes it necessary to shuffle
      registers around and/or add push/pop operations.
      
      BMI2 provides shift instructions that can use any general register as
      the shift count, saving us instructions and a few bytes in most cases.
      
      Suboptimal codegen when %ecx is source and/or destination is also
      addressed and unnecessary instructions are removed.
      
      test_progs: Summary: 267/1340 PASSED, 25 SKIPPED, 0 FAILED
      test_progs-no_alu32: Summary: 267/1333 PASSED, 26 SKIPPED, 0 FAILED
      test_verifier: Summary: 1367 PASSED, 636 SKIPPED, 0 FAILED (same result
       with or without BMI2)
      test_maps: OK, 0 SKIPPED
      lib/test_bpf:
        test_bpf: Summary: 1026 PASSED, 0 FAILED, [1014/1014 JIT'ed]
        test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]
        test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED
      ---
      v4 -> v5:
      - More comments regarding instruction encoding
      v3 -> v4:
      - Fixed a regression when BMI2 isn't available
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      04a8f9d7
    • Jie Meng's avatar
      bpf: add selftests for lsh, rsh, arsh with reg operand · 8662de23
      Jie Meng authored
      Current tests cover only shifts with an immediate as the source
      operand/shift counts; add a new test case to cover register operand.
      Signed-off-by: default avatarJie Meng <jmeng@fb.com>
      Link: https://lore.kernel.org/r/20221007202348.1118830-4-jmeng@fb.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      8662de23
    • Jie Meng's avatar
      bpf,x64: use shrx/sarx/shlx when available · 77d8f5d4
      Jie Meng authored
      BMI2 provides 3 shift instructions (shrx, sarx and shlx) that use VEX
      encoding but target general purpose registers [1]. They allow the shift
      count in any general purpose register and have the same performance as
      non BMI2 shift instructions [2].
      
      Instead of shr/sar/shl that implicitly use %cl (lowest 8 bit of %rcx),
      emit their more flexible alternatives provided in BMI2 when advantageous;
      keep using the non BMI2 instructions when shift count is already in
      BPF_REG_4/%rcx as non BMI2 instructions are shorter.
      
      To summarize, when BMI2 is available:
      -------------------------------------------------
                  |   arbitrary dst
      =================================================
      src == ecx  |   shl dst, cl
      -------------------------------------------------
      src != ecx  |   shlx dst, dst, src
      -------------------------------------------------
      
      And no additional register shuffling is needed.
      
      A concrete example between non BMI2 and BMI2 codegen.  To shift %rsi by
      %rdi:
      
      Without BMI2:
      
       ef3:   push   %rcx
              51
       ef4:   mov    %rdi,%rcx
              48 89 f9
       ef7:   shl    %cl,%rsi
              48 d3 e6
       efa:   pop    %rcx
              59
      
      With BMI2:
      
       f0b:   shlx   %rdi,%rsi,%rsi
              c4 e2 c1 f7 f6
      
      [1] https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
      [2] https://www.agner.org/optimize/instruction_tables.pdfSigned-off-by: default avatarJie Meng <jmeng@fb.com>
      Link: https://lore.kernel.org/r/20221007202348.1118830-3-jmeng@fb.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      77d8f5d4
    • Jie Meng's avatar
      bpf,x64: avoid unnecessary instructions when shift dest is ecx · 81b35e7c
      Jie Meng authored
      x64 JIT produces redundant instructions when a shift operation's
      destination register is BPF_REG_4/ecx and this patch removes them.
      
      Specifically, when dest reg is BPF_REG_4 but the src isn't, we
      needn't push and pop ecx around shift only to get it overwritten
      by r11 immediately afterwards.
      
      In the rare case when both dest and src registers are BPF_REG_4,
      a single shift instruction is sufficient and we don't need the
      two MOV instructions around the shift.
      
      To summarize using shift left as an example, without patch:
      -------------------------------------------------
                  |   dst == ecx     |    dst != ecx
      =================================================
      src == ecx  |   mov r11, ecx   |    shl dst, cl
                  |   shl r11, ecx   |
                  |   mov ecx, r11   |
      -------------------------------------------------
      src != ecx  |   mov r11, ecx   |    push ecx
                  |   push ecx       |    mov ecx, src
                  |   mov ecx, src   |    shl dst, cl
                  |   shl r11, cl    |    pop ecx
                  |   pop ecx        |
                  |   mov ecx, r11   |
      -------------------------------------------------
      
      With patch:
      -------------------------------------------------
                  |   dst == ecx     |    dst != ecx
      =================================================
      src == ecx  |   shl ecx, cl    |    shl dst, cl
      -------------------------------------------------
      src != ecx  |   mov r11, ecx   |    push ecx
                  |   mov ecx, src   |    mov ecx, src
                  |   shl r11, cl    |    shl dst, cl
                  |   mov ecx, r11   |    pop ecx
      -------------------------------------------------
      Signed-off-by: default avatarJie Meng <jmeng@fb.com>
      Link: https://lore.kernel.org/r/20221007202348.1118830-2-jmeng@fb.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      81b35e7c