1. 25 Oct, 2022 2 commits
  2. 22 Oct, 2022 4 commits
  3. 21 Oct, 2022 15 commits
  4. 19 Oct, 2022 19 commits
    • Merge branch 'bpf,x64: Use BMI2 for shifts' · 04a8f9d7
      Alexei Starovoitov authored
      Jie Meng says:
      
      ====================
      
      With baseline x64 instruction set, shift count can only be an immediate
      or in %cl. The implicit dependency on %cl makes it necessary to shuffle
      registers around and/or add push/pop operations.
      
      BMI2 provides shift instructions that can use any general register as
      the shift count, saving us instructions and a few bytes in most cases.
      
      Suboptimal codegen when %ecx is the source and/or destination is also
      addressed: unnecessary instructions are removed.
      
      test_progs: Summary: 267/1340 PASSED, 25 SKIPPED, 0 FAILED
      test_progs-no_alu32: Summary: 267/1333 PASSED, 26 SKIPPED, 0 FAILED
      test_verifier: Summary: 1367 PASSED, 636 SKIPPED, 0 FAILED (same result
       with or without BMI2)
      test_maps: OK, 0 SKIPPED
      lib/test_bpf:
        test_bpf: Summary: 1026 PASSED, 0 FAILED, [1014/1014 JIT'ed]
        test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]
        test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED
      ---
      v4 -> v5:
      - More comments regarding instruction encoding
      v3 -> v4:
      - Fixed a regression when BMI2 isn't available
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: add selftests for lsh, rsh, arsh with reg operand · 8662de23
      Jie Meng authored
      Current tests cover only shifts with an immediate as the source
      operand/shift count; add a new test case to cover a register operand.
      Signed-off-by: Jie Meng <jmeng@fb.com>
      Link: https://lore.kernel.org/r/20221007202348.1118830-4-jmeng@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf,x64: use shrx/sarx/shlx when available · 77d8f5d4
      Jie Meng authored
      BMI2 provides 3 shift instructions (shrx, sarx and shlx) that use VEX
      encoding but target general purpose registers [1]. They allow the shift
      count in any general purpose register and have the same performance as
      non-BMI2 shift instructions [2].
      
      Instead of shr/sar/shl that implicitly use %cl (lowest 8 bits of %rcx),
      emit their more flexible alternatives provided in BMI2 when advantageous;
      keep using the non-BMI2 instructions when the shift count is already in
      BPF_REG_4/%rcx, as non-BMI2 instructions are shorter.
      
      To summarize, when BMI2 is available:
      -------------------------------------------------
                  |   arbitrary dst
      =================================================
      src == ecx  |   shl dst, cl
      -------------------------------------------------
      src != ecx  |   shlx dst, dst, src
      -------------------------------------------------
      
      And no additional register shuffling is needed.
      
      A concrete comparison of non-BMI2 and BMI2 codegen, shifting %rsi by
      %rdi:
      
      Without BMI2:
      
       ef3:   push   %rcx
              51
       ef4:   mov    %rdi,%rcx
              48 89 f9
       ef7:   shl    %cl,%rsi
              48 d3 e6
       efa:   pop    %rcx
              59
      
      With BMI2:
      
       f0b:   shlx   %rdi,%rsi,%rsi
              c4 e2 c1 f7 f6
      
      [1] https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
      [2] https://www.agner.org/optimize/instruction_tables.pdf
      Signed-off-by: Jie Meng <jmeng@fb.com>
      Link: https://lore.kernel.org/r/20221007202348.1118830-3-jmeng@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf,x64: avoid unnecessary instructions when shift dest is ecx · 81b35e7c
      Jie Meng authored
      x64 JIT produces redundant instructions when a shift operation's
      destination register is BPF_REG_4/ecx and this patch removes them.
      
      Specifically, when dest reg is BPF_REG_4 but the src isn't, we
      needn't push and pop ecx around shift only to get it overwritten
      by r11 immediately afterwards.
      
      In the rare case when both dest and src registers are BPF_REG_4,
      a single shift instruction is sufficient and we don't need the
      two MOV instructions around the shift.
      
      To summarize using shift left as an example, without patch:
      -------------------------------------------------
                  |   dst == ecx     |    dst != ecx
      =================================================
      src == ecx  |   mov r11, ecx   |    shl dst, cl
                  |   shl r11, cl    |
                  |   mov ecx, r11   |
      -------------------------------------------------
      src != ecx  |   mov r11, ecx   |    push ecx
                  |   push ecx       |    mov ecx, src
                  |   mov ecx, src   |    shl dst, cl
                  |   shl r11, cl    |    pop ecx
                  |   pop ecx        |
                  |   mov ecx, r11   |
      -------------------------------------------------
      
      With patch:
      -------------------------------------------------
                  |   dst == ecx     |    dst != ecx
      =================================================
      src == ecx  |   shl ecx, cl    |    shl dst, cl
      -------------------------------------------------
      src != ecx  |   mov r11, ecx   |    push ecx
                  |   mov ecx, src   |    mov ecx, src
                  |   shl r11, cl    |    shl dst, cl
                  |   mov ecx, r11   |    pop ecx
      -------------------------------------------------
      Signed-off-by: Jie Meng <jmeng@fb.com>
      Link: https://lore.kernel.org/r/20221007202348.1118830-2-jmeng@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'libbpf: support non-mmap()'able data sections' · 7d8d5355
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      
      Make libbpf more conservative in using BPF_F_MMAPABLE flag with internal BPF
      array maps that are backing global data sections. See patch #2 for full
      description and justification.
      
      Changes in this patch set support having bpf_spinlock, kptr, rb_tree nodes and
      other "special" variables as global variables. Combining this with libbpf's
      existing support for multiple custom .data.* sections allows BPF programs to
      utilize multiple spinlock/rbtree_node/kptr variables in a pretty natural way
      by just putting all such variables into separate data sections (and thus ARRAY
      maps).
      
      v1->v2:
        - address Stanislav's feedback, add acks.
      ====================
      Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • libbpf: add non-mmapable data section selftest · 2f968e9f
      Andrii Nakryiko authored
      Add a non-mmapable data section to the test_skeleton selftest and make sure
      it really isn't mmapable by trying to mmap() it anyway.
      
      Also make sure that libbpf doesn't report BPF_F_MMAPABLE flag to users.
      
      Additionally, some manual testing was performed to confirm that this
      feature works as intended.
      
      Looking at the created map through bpftool shows that the flags passed to
      the kernel are indeed zero:
      
        $ bpftool map show
        ...
        1782: array  name .data.non_mmapa  flags 0x0
                key 4B  value 16B  max_entries 1  memlock 4096B
                btf_id 1169
                pids test_progs(8311)
        ...
      
      Checking the BTF uploaded to the kernel for this map shows that zero_key
      and zero_value are indeed marked as static, even though zero_key is
      actually an originally global (but STV_HIDDEN) variable:
      
        $ bpftool btf dump id 1169
        ...
        [51] VAR 'zero_key' type_id=2, linkage=static
        [52] VAR 'zero_value' type_id=7, linkage=static
        ...
        [62] DATASEC '.data.non_mmapable' size=16 vlen=2
                type_id=51 offset=0 size=4 (VAR 'zero_key')
                type_id=52 offset=4 size=12 (VAR 'zero_value')
        ...
      
      And the original BTF does have zero_key marked as linkage=global:
      
        $ bpftool btf dump file test_skeleton.bpf.linked3.o
        ...
        [51] VAR 'zero_key' type_id=2, linkage=global
        [52] VAR 'zero_value' type_id=7, linkage=static
        ...
        [62] DATASEC '.data.non_mmapable' size=16 vlen=2
                type_id=51 offset=0 size=4 (VAR 'zero_key')
                type_id=52 offset=4 size=12 (VAR 'zero_value')
      
      Bpftool didn't require any changes at all, because it already checks
      whether an internal map is mmapable. Double-checking the generated
      skeleton, we see that .data.non_mmapable neither sets the mmaped pointer
      nor has a corresponding field in the skeleton:
      
        $ grep non_mmapable test_skeleton.skel.h
                        struct bpf_map *data_non_mmapable;
                s->maps[7].name = ".data.non_mmapable";
                s->maps[7].map = &obj->maps.data_non_mmapable;
      
      But .data.read_mostly has all of those things:
      
        $ grep read_mostly test_skeleton.skel.h
                        struct bpf_map *data_read_mostly;
                struct test_skeleton__data_read_mostly {
                        int read_mostly_var;
                } *data_read_mostly;
                s->maps[6].name = ".data.read_mostly";
                s->maps[6].map = &obj->maps.data_read_mostly;
                s->maps[6].mmaped = (void **)&obj->data_read_mostly;
                _Static_assert(sizeof(s->data_read_mostly->read_mostly_var) == 4, "unexpected size of 'read_mostly_var'");
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221019002816.359650-4-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • libbpf: only add BPF_F_MMAPABLE flag for data maps with global vars · 4fcac46c
      Andrii Nakryiko authored
      Teach libbpf to not add BPF_F_MMAPABLE flag unnecessarily for ARRAY maps
      that are backing data sections, if such data sections don't expose any
      variables to user-space. Exposed variables are those that have
      STB_GLOBAL or STB_WEAK ELF binding and correspond to BTF VAR's
      BTF_VAR_GLOBAL_ALLOCATED linkage.
      
      The overall idea is that if some data section doesn't have any variable that
      is exposed through BPF skeleton, then there is no reason to make such
      BPF array mmapable. Making BPF array mmapable is not a free no-op
      action, because BPF verifier doesn't allow users to put special objects
      (such as BPF spin locks, RB tree nodes, linked list nodes, kptrs, etc;
      anything that has a sensitive internal state that should not be modified
      arbitrarily from user space) into mmapable arrays, as there is no way to
      prevent user space from corrupting such sensitive state through direct
      memory access through memory-mapped region.
      
      By making sure that libbpf doesn't add BPF_F_MMAPABLE flag to BPF array
      maps corresponding to data sections that only have static variables
      (which are not supposed to be visible to user space according to libbpf
      and BPF skeleton rules), users now can have spinlocks, kptrs, etc in
      either default .bss/.data sections or custom .data.* sections (assuming
      there are no global variables in such sections).
      
      The only possible hiccup with this approach is the need to use global
      variables during BPF static linking, even if they are not intended to be
      shared with user space through the BPF skeleton. To allow such scenarios,
      extend libbpf's STV_HIDDEN ELF visibility attribute handling to
      variables. Libbpf is already treating global hidden BPF subprograms as
      static subprograms and adjusts BTF accordingly to make BPF verifier
      verify such subprograms as static subprograms with preserving entire BPF
      verifier state between subprog calls. This patch teaches libbpf to treat
      global hidden variables as static ones and adjust BTF information
      accordingly as well. This allows sharing variables between multiple
      object files during static linking while still keeping them internal to
      the BPF program and not exposing them through the BPF skeleton.
      
      Note that if the user has some advanced scenario where they absolutely
      need BPF_F_MMAPABLE flag on .data/.bss/.rodata BPF array map despite
      only having static variables, they still can achieve this by forcing it
      through explicit bpf_map__set_map_flags() API.
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/r/20221019002816.359650-3-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • libbpf: clean up and refactor BTF fixup step · f33f742d
      Andrii Nakryiko authored
      Refactor libbpf's BTF fixup step during BPF object open phase. The only
      functional change is that we now ignore BTF_VAR_GLOBAL_EXTERN variables
      during fixup, not just BTF_VAR_STATIC ones, which shouldn't cause any
      change in behavior, as there shouldn't be any extern variables in data
      sections of a valid BPF object anyway.
      
      Otherwise it's just collapsing two functions that have no reason to be
      separate, and switching find_elf_var_offset() helper to return entire
      symbol pointer, not just its offset. This will be used by next patch to
      get ELF symbol visibility.
      
      While refactoring, also "normalize" debug messages inside
      btf_fixup_datasec() to follow general libbpf style and print out data
      section name consistently, where it's available.
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221019002816.359650-2-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf/docs: Summarize CI system and deny lists · 81bfcc3f
      Daniel Müller authored
      This change adds a brief summary of the BPF continuous integration (CI)
      to the BPF selftest documentation. The summary focuses not so much on
      actual workings of the CI, as it is maintained outside of the
      repository, but aims to document the few bits of it that are sourced
      from this repository and that developers may want to adjust as part of
      patch submissions: the BPF kernel configuration and the deny list
      file(s).
      
      Changelog:
      - v1->v2:
        - use s390x instead of s390 for consistency
      Signed-off-by: Daniel Müller <deso@posteo.net>
      Acked-by: David Vernet <void@manifault.com>
      Link: https://lore.kernel.org/r/20221018164015.1970862-1-deso@posteo.net
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • samples/bpf: Fix typos in README · 2c4d72d6
      Daniel Müller authored
      This change fixes some typos found in the BPF samples README file.
      Signed-off-by: Daniel Müller <deso@posteo.net>
      Acked-by: David Vernet <void@manifault.com>
      Link: https://lore.kernel.org/r/20221018163231.1926462-1-deso@posteo.net
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • 01dea954
      Shaomin Deng authored
    • samples/bpf: Fix MAC address swapping in xdp2_kern · 7a698edf
      Gerhard Engleder authored
      xdp2_kern rewrites and forwards packets out on the same interface.
      Forwarding still works, but the rewrite broke when xdp multibuffer
      support was added.
      
      With xdp multibuffer, a local copy of the packet was introduced. The
      MAC address is now swapped in the local copy, but the local copy is not
      written back.
      
      Fix MAC address swapping by adding a write-back of the modified packet.
      
      Fixes: 77225174 ("samples/bpf: fixup some tools to be able to support xdp multibuffer")
      Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
      Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
      Link: https://lore.kernel.org/r/20221015213050.65222-1-gerhard@engleder-embedded.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • samples/bpf: Fix map iteration in xdp1_user · 05ee658c
      Gerhard Engleder authored
      BPF map iteration in xdp1_user results in an endless loop without any
      output, because the return value of bpf_map_get_next_key() is checked
      against the wrong value.
      
      Other call sites of bpf_map_get_next_key() continue the iteration only
      while the return value equals 0, whereas xdp1_user checks for a value
      unequal to -1. This is wrong for a function that can return arbitrary
      negative errno values, because a return value of e.g. -2 results in an
      endless loop.
      
      With this fix, xdp1_user prints statistics again:
      proto 0:          1 pkt/s
      proto 0:          1 pkt/s
      proto 17:     107383 pkt/s
      proto 17:     881655 pkt/s
      proto 17:     882083 pkt/s
      proto 17:     881758 pkt/s
      
      Fixes: bd054102 ("libbpf: enforce strict libbpf 1.0 behaviors")
      Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
      Acked-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20221013200922.17167-1-gerhard@engleder-embedded.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • net: ethernet: adi: adin1110: Fix SPI transfers · a526a3cc
      Alexandru Tachici authored
      There is no need to use more than one SPI transfer for reads.
      Use only one from now on, as ADIN1110/2111 does not tolerate
      CS changes during reads.
      
      The BCM2711/2708 SPI controllers worked fine, but the NXP
      IMX8MM could not keep CS lowered during SPI bursts.
      
      This change aims to make the ADIN1110/2111 driver compatible
      with both SPI controllers, without any loss of bandwidth/other
      capabilities.
      
      Fixes: bc93e19d ("net: ethernet: adi: Add ADIN1110 support")
      Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-bridge-mc-cleanups' · ac3208fb
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      bridge: A few multicast cleanups
      
      Clean up a few issues spotted while working on the bridge multicast code
      and running its selftests.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: mcast: Simplify MDB entry creation · d1942cd4
      Ido Schimmel authored
      Before creating a new MDB entry, br_multicast_new_group() will call
      br_mdb_ip_get() to see if one exists and return it if so.
      
      Therefore, simply call br_multicast_new_group() and omit the call to
      br_mdb_ip_get().
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: mcast: Use spin_lock() instead of spin_lock_bh() · 262985fa
      Ido Schimmel authored
      IGMPv3 / MLDv2 Membership Reports are only processed from the data path
      with softIRQ disabled, so there is no need to call spin_lock_bh(). Use
      spin_lock() instead.
      
      This is consistent with how other IGMP / MLD packets are processed.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: bridge_igmp: Remove unnecessary address deletion · b526b2ea
      Ido Schimmel authored
      The test group address is added and removed in v2reportleave_test().
      There is no need to delete it again during cleanup as it results in the
      following error message:
      
       # bash -x ./bridge_igmp.sh
       [...]
       + cleanup
       + pre_cleanup
       [...]
       + ip address del dev swp4 239.10.10.10/32
       RTNETLINK answers: Cannot assign requested address
       + h2_destroy
      
      Solve by removing the unnecessary address deletion.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: bridge_vlan_mcast: Delete qdiscs during cleanup · 6fb1faa1
      Ido Schimmel authored
      The qdiscs are added during setup, but not deleted during cleanup,
      resulting in the following error messages:
      
       # ./bridge_vlan_mcast.sh
       [...]
       # ./bridge_vlan_mcast.sh
       Error: Exclusivity flag on, cannot modify.
       Error: Exclusivity flag on, cannot modify.
      
      Solve by deleting the qdiscs during cleanup.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>