1. 11 Mar, 2024 20 commits
  2. 09 Mar, 2024 1 commit
  3. 08 Mar, 2024 4 commits
    • Alexei Starovoitov's avatar
      Merge branch 'fix-hash-bucket-overflow-checks-for-32-bit-arches' · a27e8967
      Alexei Starovoitov authored
      Toke Høiland-Jørgensen says:
      
      ====================
      Fix hash bucket overflow checks for 32-bit arches
      
      Syzbot managed to trigger a crash by creating a DEVMAP_HASH map with a
      large number of buckets because the overflow check relies on
      well-defined behaviour that is only correct on 64-bit arches.
      
      Fix the overflow checks to happen before values are rounded up in all
      the affected map types.
      
      v3:
      - Keep the htab->n_buckets > U32_MAX / sizeof(struct bucket) check
      - Use 1UL << 31 instead of U32_MAX / 2 + 1 as the constant to check
        against
      - Add patch to fix stackmap.c
      v2:
      - Fix off-by-one error in overflow check
      - Apply the same fix to hashtab, where the devmap_hash code was copied
        from (John)
      
      Toke Høiland-Jørgensen (3):
        bpf: Fix DEVMAP_HASH overflow check on 32-bit arches
        bpf: Fix hashtab overflow check on 32-bit arches
        bpf: Fix stackmap overflow check on 32-bit arches
      
       kernel/bpf/devmap.c   | 11 ++++++-----
       kernel/bpf/hashtab.c  | 14 +++++++++-----
       kernel/bpf/stackmap.c |  9 ++++++---
       3 files changed, 21 insertions(+), 13 deletions(-)
      ====================
      
      Link: https://lore.kernel.org/r/20240307120340.99577-1-toke@redhat.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      a27e8967
    • Toke Høiland-Jørgensen's avatar
      bpf: Fix stackmap overflow check on 32-bit arches · 7a4b2125
      Toke Høiland-Jørgensen authored
      The stackmap code relies on roundup_pow_of_two() to compute the number
      of hash buckets, and contains an overflow check by checking if the
      resulting value is 0. However, on 32-bit arches, the roundup code itself
      can overflow by doing a 32-bit left-shift of an unsigned long value,
      which is undefined behaviour, so it is not guaranteed to truncate
      neatly. This was triggered by syzbot on the DEVMAP_HASH type, which
      contains the same check, copied from the hashtab code.
      
      The commit in the fixes tag actually attempted to fix this, but the fix
      did not account for the UB, so the fix only works on CPUs where an
      overflow does result in a neat truncation to zero, which is not
      guaranteed. Checking the value before rounding does not have this
      problem.
      
      Fixes: 6183f4d3 ("bpf: Check for integer overflow when using roundup_pow_of_two()")
      Signed-off-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Reviewed-by: default avatarBui Quang Minh <minhquangbui99@gmail.com>
      Message-ID: <20240307120340.99577-4-toke@redhat.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      7a4b2125
    • Toke Høiland-Jørgensen's avatar
      bpf: Fix hashtab overflow check on 32-bit arches · 6787d916
      Toke Høiland-Jørgensen authored
      The hashtab code relies on roundup_pow_of_two() to compute the number of
      hash buckets, and contains an overflow check by checking if the
      resulting value is 0. However, on 32-bit arches, the roundup code itself
      can overflow by doing a 32-bit left-shift of an unsigned long value,
      which is undefined behaviour, so it is not guaranteed to truncate
      neatly. This was triggered by syzbot on the DEVMAP_HASH type, which
      contains the same check, copied from the hashtab code. So apply the same
      fix to hashtab, by moving the overflow check to before the roundup.
      
      Fixes: daaf427c ("bpf: fix arraymap NULL deref and missing overflow and zero size checks")
      Signed-off-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Message-ID: <20240307120340.99577-3-toke@redhat.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      6787d916
    • Toke Høiland-Jørgensen's avatar
      bpf: Fix DEVMAP_HASH overflow check on 32-bit arches · 281d464a
      Toke Høiland-Jørgensen authored
      The devmap code allocates a number hash buckets equal to the next power
      of two of the max_entries value provided when creating the map. When
      rounding up to the next power of two, the 32-bit variable storing the
      number of buckets can overflow, and the code checks for overflow by
      checking if the truncated 32-bit value is equal to 0. However, on 32-bit
      arches the rounding up itself can overflow mid-way through, because it
      ends up doing a left-shift of 32 bits on an unsigned long value. If the
      size of an unsigned long is four bytes, this is undefined behaviour, so
      there is no guarantee that we'll end up with a nice and tidy 0-value at
      the end.
      
      Syzbot managed to turn this into a crash on arm32 by creating a
      DEVMAP_HASH with max_entries > 0x80000000 and then trying to update it.
      Fix this by moving the overflow check to before the rounding up
      operation.
      
      Fixes: 6f9d451a ("xdp: Add devmap_hash map type for looking up devices by hashed index")
      Link: https://lore.kernel.org/r/000000000000ed666a0611af6818@google.com
      Reported-and-tested-by: syzbot+8cd36f6b65f3cafd400a@syzkaller.appspotmail.com
      Signed-off-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Message-ID: <20240307120340.99577-2-toke@redhat.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      281d464a
  4. 07 Mar, 2024 7 commits
  5. 06 Mar, 2024 8 commits
    • Puranjay Mohan's avatar
      bpf, riscv64/cfi: Support kCFI + BPF on riscv64 · e63985ec
      Puranjay Mohan authored
      The riscv BPF JIT doesn't emit proper kCFI prologues for BPF programs
      and struct_ops trampolines when CONFIG_CFI_CLANG is enabled.
      
      This causes CFI failures when calling BPF programs and can even crash
      the kernel due to invalid memory accesses.
      
      Example crash:
      
      root@rv-selftester:~/bpf# ./test_progs -a dummy_st_ops
      
       Unable to handle kernel paging request at virtual address ffffffff78204ffc
       Oops [#1]
       Modules linked in: bpf_testmod(OE) [....]
       CPU: 3 PID: 356 Comm: test_progs Tainted: P           OE      6.8.0-rc1 #1
       Hardware name: riscv-virtio,qemu (DT)
       epc : bpf_struct_ops_test_run+0x28c/0x5fc
        ra : bpf_struct_ops_test_run+0x26c/0x5fc
       epc : ffffffff82958010 ra : ffffffff82957ff0 sp : ff200000007abc80
        gp : ffffffff868d6218 tp : ff6000008d87b840 t0 : 000000000000000f
        t1 : 0000000000000000 t2 : 000000002005793e s0 : ff200000007abcf0
        s1 : ff6000008a90fee0 a0 : 0000000000000000 a1 : 0000000000000000
        a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
        a5 : ffffffff868dba26 a6 : 0000000000000001 a7 : 0000000052464e43
        s2 : 00007ffffc0a95f0 s3 : ff6000008a90fe80 s4 : ff60000084c24c00
        s5 : ffffffff78205000 s6 : ff60000088750648 s7 : ff20000000035008
        s8 : fffffffffffffff4 s9 : ffffffff86200610 s10: 0000000000000000
        s11: 0000000000000000 t3 : ffffffff8483dc30 t4 : ffffffff8483dc10
        t5 : ffffffff8483dbf0 t6 : ffffffff8483dbd0
       status: 0000000200000120 badaddr: ffffffff78204ffc cause: 000000000000000d
       [<ffffffff82958010>] bpf_struct_ops_test_run+0x28c/0x5fc
       [<ffffffff805083ee>] bpf_prog_test_run+0x170/0x548
       [<ffffffff805029c8>] __sys_bpf+0x2d2/0x378
       [<ffffffff804ff570>] __riscv_sys_bpf+0x5c/0x120
       [<ffffffff8000e8fe>] syscall_handler+0x62/0xe4
       [<ffffffff83362df6>] do_trap_ecall_u+0xc6/0x27c
       [<ffffffff833822c4>] ret_from_exception+0x0/0x64
       Code: b603 0109 b683 0189 b703 0209 8493 0609 157d 8d65 (a303) ffca
       ---[ end trace 0000000000000000 ]---
       Kernel panic - not syncing: Fatal exception
       SMP: stopping secondary CPUs
      
      Implement proper kCFI prologues for the BPF programs and callbacks and
      drop __nocfi for riscv64. Fix the trampoline generation code to emit kCFI
      prologue when a struct_ops trampoline is being prepared.
      Signed-off-by: default avatarPuranjay Mohan <puranjay12@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Acked-by: default avatarBjörn Töpel <bjorn@kernel.org>
      Link: https://lore.kernel.org/bpf/20240303170207.82201-2-puranjay12@gmail.com
      e63985ec
    • Andrii Nakryiko's avatar
      Merge branch 'libbpf-type-suffixes-and-autocreate-flag-for-struct_ops-maps' · 516fca5a
      Andrii Nakryiko authored
      Eduard Zingerman says:
      
      ====================
      libbpf: type suffixes and autocreate flag for struct_ops maps
      
      Tweak struct_ops related APIs to allow the following features:
      - specify version suffixes for stuct_ops map types;
      - share same BPF program between several map definitions with
        different local BTF types, assuming only maps with same
        kernel BTF type would be selected for load;
      - toggle autocreate flag for struct_ops maps;
      - automatically toggle autoload for struct_ops programs referenced
        from struct_ops maps, depending on autocreate status of the
        corresponding map;
      - use SEC("?.struct_ops") and SEC("?.struct_ops.link")
        to define struct_ops maps with autocreate == false after object open.
      
      This would allow loading programs like below:
      
          SEC("struct_ops/foo") int BPF_PROG(foo) { ... }
          SEC("struct_ops/bar") int BPF_PROG(bar) { ... }
      
          struct bpf_testmod_ops___v1 {
              int (*foo)(void);
          };
      
          struct bpf_testmod_ops___v2 {
              int (*foo)(void);
              int (*bar)(void);
          };
      
          /* Assume kernel type name to be 'test_ops' */
          SEC(".struct_ops.link")
          struct test_ops___v1 map_v1 = {
              /* Program 'foo' shared by maps with
               * different local BTF type
               */
              .foo = (void *)foo
          };
      
          SEC(".struct_ops.link")
          struct test_ops___v2 map_v2 = {
              .foo = (void *)foo,
              .bar = (void *)bar
          };
      
      Assuming the following tweaks are done before loading:
      
          /* to load v1 */
          bpf_map__set_autocreate(skel->maps.map_v1, true);
          bpf_map__set_autocreate(skel->maps.map_v2, false);
      
          /* to load v2 */
          bpf_map__set_autocreate(skel->maps.map_v1, false);
          bpf_map__set_autocreate(skel->maps.map_v2, true);
      
      Patch #8 ties autocreate and autoload flags for struct_ops maps and
      programs.
      
      Changelog:
      - v3 [3] -> v4:
        - changes for multiple styling suggestions from Andrii;
        - patch #5: libbpf log capture now happens for LIBBPF_INFO and
          LIBBPF_WARN messages and does not depend on verbosity flags
          (Andrii);
        - patch #6: fixed runtime crash caused by conflict with newly added
          test case struct_ops_multi_pages;
        - patch #7: fixed free of possibly uninitialized pointer (Daniel)
        - patch #8: simpler algorithm to detect which programs to autoload
          (Andrii);
        - patch #9: added assertions for autoload flag after object load
          (Andrii);
        - patch #12: DATASEC name rewrite in libbpf is now done inplace, no
          new strings added to BTF (Andrii);
        - patch #14: allow any printable characters in DATASEC names when
          kernel validates BTF (Andrii)
      - v2 [2] -> v3:
        - moved patch #8 logic to be fully done on load
          (requested by Andrii in offlist discussion);
        - in patch #9 added test case for shadow vars and
          autocreate/autoload interaction.
      - v1 [1] -> v2:
        - fixed memory leak in patch #1 (Kui-Feng);
        - improved error messages in patch #2 (Martin, Andrii);
        - in bad_struct_ops selftest from patch #6 added .test_2
          map member setup (David);
        - added utility functions to capture libbpf log from selftests (David)
        - in selftests replaced usage of ...__open_and_load by separate
          calls to ..._open() and ..._load() (Andrii);
        - removed serial_... in selftest definitions (Andrii);
        - improved comments in selftest struct_ops_autocreate
          from patch #7 (David);
        - removed autoload toggling logic incompatible with shadow variables
          from bpf_map__set_autocreate(), instead struct_ops programs
          autoload property is computed at struct_ops maps load phase,
          see patch #8 (Kui-Feng, Martin, Andrii);
        - added support for SEC("?.struct_ops") and SEC("?.struct_ops.link")
          (Andrii).
      
      [1] https://lore.kernel.org/bpf/20240227204556.17524-1-eddyz87@gmail.com/
      [2] https://lore.kernel.org/bpf/20240302011920.15302-1-eddyz87@gmail.com/
      [3] https://lore.kernel.org/bpf/20240304225156.24765-1-eddyz87@gmail.com/
      ====================
      
      Link: https://lore.kernel.org/r/20240306104529.6453-1-eddyz87@gmail.comSigned-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      516fca5a
    • Eduard Zingerman's avatar
      selftests/bpf: Test cases for '?' in BTF names · 5208930a
      Eduard Zingerman authored
      Two test cases to verify that '?' and other printable characters are
      allowed in BTF DATASEC names:
      - DATASEC with name "?.foo bar:buz" should be accepted;
      - type with name "?foo" should be rejected.
      Signed-off-by: default avatarEduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240306104529.6453-16-eddyz87@gmail.com
      5208930a
    • Eduard Zingerman's avatar
      bpf: Allow all printable characters in BTF DATASEC names · bd70a8fb
      Eduard Zingerman authored
      The intent is to allow libbpf to use SEC("?.struct_ops") to identify
      struct_ops maps that are optional, e.g. like in the following BPF code:
      
          SEC("?.struct_ops")
          struct test_ops optional_map = { ... };
      
      Which yields the following BTF:
      
          ...
          [13] DATASEC '?.struct_ops' size=0 vlen=...
          ...
      
      To load such BTF libbpf rewrites DATASEC name before load.
      After this patch the rewrite won't be necessary.
      Signed-off-by: default avatarEduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240306104529.6453-15-eddyz87@gmail.com
      bd70a8fb
    • Eduard Zingerman's avatar
      selftests/bpf: Test case for SEC("?.struct_ops") · 733e5e87
      Eduard Zingerman authored
      Check that "?.struct_ops" and "?.struct_ops.link" section names define
      struct_ops maps with autocreate == false after open.
      Signed-off-by: default avatarEduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240306104529.6453-14-eddyz87@gmail.com
      733e5e87
    • Eduard Zingerman's avatar
      libbpf: Rewrite btf datasec names starting from '?' · 6ebaa3fb
      Eduard Zingerman authored
      Optional struct_ops maps are defined using question mark at the start
      of the section name, e.g.:
      
          SEC("?.struct_ops")
          struct test_ops optional_map = { ... };
      
      This commit teaches libbpf to detect if kernel allows '?' prefix
      in datasec names, and if it doesn't then to rewrite such names
      by replacing '?' with '_', e.g.:
      
          DATASEC ?.struct_ops -> DATASEC _.struct_ops
      Signed-off-by: default avatarEduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240306104529.6453-13-eddyz87@gmail.com
      6ebaa3fb
    • Eduard Zingerman's avatar
      libbpf: Struct_ops in SEC("?.struct_ops") / SEC("?.struct_ops.link") · 5ad0ecbe
      Eduard Zingerman authored
      Allow using two new section names for struct_ops maps:
      - SEC("?.struct_ops")
      - SEC("?.struct_ops.link")
      
      To specify maps that have bpf_map->autocreate == false after open.
      Signed-off-by: default avatarEduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240306104529.6453-12-eddyz87@gmail.com
      5ad0ecbe
    • Eduard Zingerman's avatar
      libbpf: Replace elf_state->st_ops_* fields with SEC_ST_OPS sec_type · 240bf8a5
      Eduard Zingerman authored
      The next patch would add two new section names for struct_ops maps.
      To make working with multiple struct_ops sections more convenient:
      - remove fields like elf_state->st_ops_{shndx,link_shndx};
      - mark section descriptions hosting struct_ops as
        elf_sec_desc->sec_type == SEC_ST_OPS;
      
      After these changes struct_ops sections could be processed uniformly
      by iterating bpf_object->efile.secs entries.
      Signed-off-by: default avatarEduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Acked-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240306104529.6453-11-eddyz87@gmail.com
      240bf8a5