1. 11 May, 2021 8 commits
    • bpf: Fix nested bpf_bprintf_prepare with more per-cpu buffers · e2d5b2bb
      Florent Revest authored
      The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
      per-cpu buffer that they use to store temporary data (arguments to
      bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
      by the end of their scope with bpf_bprintf_cleanup.
      
      If one of these helpers gets called within the scope of one of these
      helpers, for example: a first bpf program gets called, uses
      bpf_trace_printk which calls raw_spin_lock_irqsave which is traced by
      another bpf program that calls bpf_snprintf, then the second "get"
      fails. Essentially, these helpers are not re-entrant. They would return
      -EBUSY and print a warning message once.
      
      This patch triples the number of bprintf buffers to allow three levels
      of nesting. This is very similar to what was done for tracepoints in
      9594dc3c ("bpf: fix nested bpf tracepoints with per-cpu data").
      
      Fixes: d9c9e4db ("bpf: Factorize bpf_trace_printk and bpf_seq_printf")
      Reported-by: syzbot+63122d0bc347f18c1884@syzkaller.appspotmail.com
      Signed-off-by: Florent Revest <revest@chromium.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210511081054.2125874-1-revest@chromium.org
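
The nesting scheme described in this commit message can be pictured with a small userspace sketch. It is not the kernel code: `struct tmp_bufs`, `tmp_buf_get`, `tmp_buf_put`, and the buffer size are hypothetical names and values chosen for illustration, and the sketch leaves out the per-CPU access and preemption handling that real kernel code needs.

```c
#include <stddef.h>

#define MAX_NEST_LEVEL 3    /* three levels of nesting, as in the commit message */
#define TMP_BUF_LEN    512  /* assumed size, for illustration only */

/* Stand-in for the state that would exist once per CPU. */
struct tmp_bufs {
	char buf[MAX_NEST_LEVEL][TMP_BUF_LEN];
	int nest_level;         /* number of buffers currently handed out */
};

/* "Get": hand out the next free buffer, or NULL once every level is in use. */
static char *tmp_buf_get(struct tmp_bufs *b)
{
	if (b->nest_level >= MAX_NEST_LEVEL)
		return NULL;        /* the helpers described above report -EBUSY */
	return b->buf[b->nest_level++];
}

/* "Put": release the most recently handed-out buffer. */
static void tmp_buf_put(struct tmp_bufs *b)
{
	if (b->nest_level > 0)
		b->nest_level--;
}
```

With a single buffer, the second "get" in a nested call fails immediately; with three, only a fourth level of nesting does.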
    • bpf: Add deny list of btf ids check for tracing programs · 35e3815f
      Jiri Olsa authored
      The recursion check in __bpf_prog_enter and __bpf_prog_exit
      leaves some (not inlined) functions unprotected:
      
      In __bpf_prog_enter:
        - migrate_disable is called before prog->active is checked
      
      In __bpf_prog_exit:
        - migrate_enable and rcu_read_unlock_strict are called after
          prog->active is decreased
      
      Attaching a trampoline to any of them causes a panic like:
      
        traps: PANIC: double fault, error_code: 0x0
        double fault: 0000 [#1] SMP PTI
        RIP: 0010:__bpf_prog_enter+0x4/0x50
        ...
        Call Trace:
         <IRQ>
         bpf_trampoline_6442466513_0+0x18/0x1000
         migrate_disable+0x5/0x50
         __bpf_prog_enter+0x9/0x50
         bpf_trampoline_6442466513_0+0x18/0x1000
         migrate_disable+0x5/0x50
         __bpf_prog_enter+0x9/0x50
         bpf_trampoline_6442466513_0+0x18/0x1000
         migrate_disable+0x5/0x50
         __bpf_prog_enter+0x9/0x50
         bpf_trampoline_6442466513_0+0x18/0x1000
         migrate_disable+0x5/0x50
         ...
      
      Fix this by adding a deny list of BTF IDs for tracing programs and
      checking the BTF ID of the attach target during program verification.
      Add the functions above to this list.

      Suggested-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210429114712.43783-1-jolsa@kernel.org
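
The deny-list idea can be sketched in plain C as a sorted set of IDs that is consulted before an attach is accepted. Everything here is hypothetical: the ID values are made up, and `id_is_denied` and `check_attach_target` are illustrative names, not the verifier's functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical IDs of functions that must not be traced; kept sorted for bsearch(). */
static const uint32_t deny_ids[] = { 1021, 4477, 9310 };

/* Three-way comparison of two IDs without risking integer overflow. */
static int cmp_id(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a;
	uint32_t y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/* True if the given ID appears on the deny list. */
static bool id_is_denied(uint32_t id)
{
	return bsearch(&id, deny_ids, sizeof(deny_ids) / sizeof(deny_ids[0]),
		       sizeof(deny_ids[0]), cmp_id) != NULL;
}

/* Returns 0 if attaching is allowed, -1 if the target is on the deny list. */
static int check_attach_target(uint32_t target_id)
{
	return id_is_denied(target_id) ? -1 : 0;
}
```

Rejecting the attach at verification time means the recursive trampoline chain shown in the call trace can never be set up in the first place.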
    • bpf: Add kconfig knob for disabling unpriv bpf by default · 08389d88
      Daniel Borkmann authored
      Add a kconfig knob which allows unprivileged bpf to be disabled by default.
      If set, the knob sets /proc/sys/kernel/unprivileged_bpf_disabled to a value of 2.

      This still allows an admin to transition from 2 -> {0,1}. Similarly,
      this also keeps the 1 -> {1} behavior intact, so that once set to permanently
      disabled, it cannot be undone aside from a reboot.
      
      We've also added extra2 with max of 2 for the procfs handler, so that an admin
      still has a chance to toggle between 0 <-> 2.
      
      Either way, as an additional alternative, applications can make use of CAP_BPF
      that we added a while ago.

      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/74ec548079189e4e4dffaeb42b8987bb3c852eee.1620765074.git.daniel@iogearbox.net
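
The allowed transitions of the sysctl value can be modelled as a tiny state machine. This is a sketch of the rules stated in the message, not the procfs handler itself: `unpriv_bpf_write` is a hypothetical name, the error codes are assumptions, and the privilege check on the writer is omitted.

```c
#include <errno.h>

/*
 * Model of a write to unprivileged_bpf_disabled:
 *   0 = unprivileged bpf not disabled
 *   1 = disabled permanently (until reboot)
 *   2 = disabled, but an admin may still change the value
 * Returns 0 on success or a negative errno-style code (assumed values).
 */
static int unpriv_bpf_write(int *state, int new_value)
{
	if (new_value < 0 || new_value > 2)
		return -EINVAL;     /* outside the accepted range 0..2 */
	if (*state == 1 && new_value != 1)
		return -EPERM;      /* 1 -> {1} only: the setting is sticky */
	*state = new_value;
	return 0;
}
```

Starting from 2, writes of 0, 1, or 2 all succeed; once the value is 1, every write other than 1 is refused.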
    • bpf, kconfig: Add consolidated menu entry for bpf with core options · b24abcff
      Daniel Borkmann authored
      Right now, all core BPF related options are scattered in different Kconfig
      locations, mainly for historical reasons. Moving forward, let's add a proper
      subsystem entry under ...
      
        General setup  --->
          BPF subsystem  --->
      
      ... in order to have all knobs in a single location and thus ease BPF related
      configuration. Networking related bits such as sockmap are out of scope for
      the general setup and therefore better suited to remain in net/Kconfig.

      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/f23f58765a4d59244ebd8037da7b6a6b2fb58446.1620765074.git.daniel@iogearbox.net
    • bpf: Prevent writable memory-mapping of read-only ringbuf pages · 04ea3086
      Andrii Nakryiko authored
      Only the very first page of BPF ringbuf that contains consumer position
      counter is supposed to be mapped as writeable by user-space. Producer
      position is read-only and can be modified only by the kernel code. BPF ringbuf
      data pages are read-only as well and are not meant to be modified by
      user-code to maintain integrity of per-record headers.
      
      This patch allows only the consumer position page to be mapped as writeable;
      everything else is restricted to be read-only. remap_vmalloc_range()
      internally adds VM_DONTEXPAND, so all the established memory mappings can't be
      extended, which prevents any future violations through mremap()'ing.
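
The mapping policy described above boils down to one predicate on the requested mapping. The sketch below is a userspace illustration under stated assumptions: `ringbuf_mmap_allowed` is a hypothetical name, the page size is fixed at 4096 bytes, and the consumer position page is taken to be the first page of the mapping, as the message states.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumed page size */

/*
 * want_write: whether the mapping is requested as writable
 * pgoff:      offset of the mapping into the ringbuf, in pages
 * len:        length of the requested mapping, in bytes
 */
static bool ringbuf_mmap_allowed(bool want_write, unsigned long pgoff,
				 unsigned long len)
{
	if (!want_write)
		return true;    /* read-only mappings are always acceptable */

	/* Writable: exactly the first page (consumer position), nothing more. */
	return pgoff == 0 && len == SKETCH_PAGE_SIZE;
}
```

A writable request that starts at any other page, or that spans more than one page, is refused, so the producer position and the data pages can only be read.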
      
      Fixes: 457f4436 ("bpf: Implement BPF ring buffer and verifier support for it")
      Reported-by: Ryota Shiga (Flatt Security)
      Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, ringbuf: Deny reserve of buffers larger than ringbuf · 4b81cceb
      Thadeu Lima de Souza Cascardo authored
      A BPF program might try to reserve a buffer larger than the ringbuf size.
      If the consumer pointer is way ahead of the producer, that would be
      successfully reserved, allowing the BPF program to read or write out of
      the ringbuf allocated area.
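
A minimal sketch of the kind of bound check the title describes: refuse any reservation whose total record length exceeds the ring buffer size. The helper name `reserve_size_ok`, the 8-byte header, and the 8-byte alignment are assumptions made for illustration, not details taken from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_HDR_SZ 8u  /* assumed per-record header size, in bytes */

/* True if a record with the given payload can ever fit in the ring buffer. */
static bool reserve_size_ok(uint64_t ringbuf_size, uint32_t payload_size)
{
	/* Header plus payload, rounded up to a multiple of 8 (assumed alignment). */
	uint64_t len = ((uint64_t)payload_size + SKETCH_HDR_SZ + 7) & ~(uint64_t)7;

	return len <= ringbuf_size;
}
```

The check depends only on the requested size and the buffer size, so it holds regardless of where the consumer and producer positions currently are.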
      
      Reported-by: Ryota Shiga (Flatt Security)
      Fixes: 457f4436 ("bpf: Implement BPF ring buffer and verifier support for it")
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Fix alu32 const subreg bound tracking on bitwise operations · 049c4e13
      Daniel Borkmann authored
      Fix a bug in the verifier's scalar32_min_max_*() functions which leads to
      incorrect tracking of 32 bit bounds for the simulation of and/or/xor bitops.
      When both the src & dst subregs are known constants, the assumption is
      that scalar_min_max_*() will take care of updating the bounds correctly.
      However, this is not the case. For example, consider a register R2 which has
      a tnum mask of 0xffffffff00000000, meaning the lower 32 bits are a known
      constant, in this case of value 0x00000001. R2 is then and'ed with a register
      R3 which is a 64 bit known constant, here 0x100000002.
      
      What can be seen in line '10:' is that the 32 bit bounds reach an invalid state
      where {u,s}32_min_value > {u,s}32_max_value. The reason is that scalar32_min_max_*()
      delegates 32 bit bounds updates to scalar_min_max_*(); however, that really
      only takes place when both the 64 bit src & dst registers are known constants.
      Given scalar32_min_max_*() is intended to be designed as closely as possible
      to scalar_min_max_*(), update the 32 bit bounds in this situation through
      __mark_reg32_known() which will set all {u,s}32_{min,max}_value to the correct
      constant, which is 0x00000000 after the fix (given 0x00000001 & 0x00000002 in
      32 bit space). This is possible given var32_off already holds the final value
      as dst_reg->var_off is updated before calling scalar32_min_max_*().
      
      Before fix, invalid tracking of R2:
      
        [...]
        9: R0_w=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0,smin_value=-9223372036854775807 (0x8000000000000001),smax_value=9223372032559808513 (0x7fffffff00000001),umin_value=1,umax_value=0xffffffff00000001,var_off=(0x1; 0xffffffff00000000),s32_min_value=1,s32_max_value=1,u32_min_value=1,u32_max_value=1) R3_w=inv4294967298 R10=fp0
        9: (5f) r2 &= r3
        10: R0_w=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0,smin_value=0,smax_value=4294967296 (0x100000000),umin_value=0,umax_value=0x100000000,var_off=(0x0; 0x100000000),s32_min_value=1,s32_max_value=0,u32_min_value=1,u32_max_value=0) R3_w=inv4294967298 R10=fp0
        [...]
      
      After fix, correct tracking of R2:
      
        [...]
        9: R0_w=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0,smin_value=-9223372036854775807 (0x8000000000000001),smax_value=9223372032559808513 (0x7fffffff00000001),umin_value=1,umax_value=0xffffffff00000001,var_off=(0x1; 0xffffffff00000000),s32_min_value=1,s32_max_value=1,u32_min_value=1,u32_max_value=1) R3_w=inv4294967298 R10=fp0
        9: (5f) r2 &= r3
        10: R0_w=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0,smin_value=0,smax_value=4294967296 (0x100000000),umin_value=0,umax_value=0x100000000,var_off=(0x0; 0x100000000),s32_min_value=0,s32_max_value=0,u32_min_value=0,u32_max_value=0) R3_w=inv4294967298 R10=fp0
        [...]
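
The var_off values in the logs above can be reproduced with a small userspace model of tristate numbers (a value plus a mask of unknown bits). This is an illustrative re-creation rather than the verifier source; `subreg_is_const` and `subreg_value` are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Tristate number: bits set in mask are unknown, all others equal value. */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

/* Bitwise AND of two tristate numbers. */
static struct tnum tnum_and(struct tnum a, struct tnum b)
{
	uint64_t alpha = a.value | a.mask;  /* bits that may be 1 in a */
	uint64_t beta = b.value | b.mask;   /* bits that may be 1 in b */
	uint64_t v = a.value & b.value;     /* bits known to be 1 in the result */
	struct tnum r = { v, alpha & beta & ~v };

	return r;
}

/* True if the lower 32 bits contain no unknown bits. */
static bool subreg_is_const(struct tnum t)
{
	return (uint32_t)t.mask == 0;
}

/* Lower 32 bits of the known value. */
static uint32_t subreg_value(struct tnum t)
{
	return (uint32_t)t.value;
}
```

For R2 = (0x1; 0xffffffff00000000) and R3 = (0x100000002; 0x0), `tnum_and` gives (0x0; 0x100000000), matching var_off in line '10:'. The lower 32 bits of that result are fully known and equal to 0, which is why all four 32 bit bounds read 0 after the fix.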
      
      Fixes: 3f50f132 ("bpf: Verifier, do explicit ALU32 bounds tracking")
      Fixes: 2921c90d ("bpf: Fix a verifier failure with xor")
      Reported-by: Manfred Paul (@_manfp)
      Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
  2. 06 May, 2021 2 commits
  3. 05 May, 2021 1 commit
  4. 04 May, 2021 1 commit
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 1682d8df
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2021-05-04
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 5 non-merge commits during the last 4 day(s) which contain
      a total of 6 files changed, 52 insertions(+), 30 deletions(-).
      
      The main changes are:
      
      1) Fix libbpf overflow when processing BPF ring buffer in case of extreme
         application behavior, from Brendan Jackman.
      
      2) Fix potential data leakage of uninitialized BPF stack under speculative
         execution, from Daniel Borkmann.
      
      3) Fix off-by-one when validating xsk pool chunks, from Xuan Zhuo.
      
      4) Fix snprintf BPF selftest with a pid filter to avoid racing its output
         test buffer, from Florent Revest.
      ====================

      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 03 May, 2021 15 commits
  6. 30 Apr, 2021 13 commits