1. 23 Aug, 2019 3 commits
    • Daniel Borkmann's avatar
      bpf: fix use after free in prog symbol exposure · c751798a
      Daniel Borkmann authored
      syzkaller managed to trigger the warning in bpf_jit_free() which checks via
      bpf_prog_kallsyms_verify_off() for potentially unlinked JITed BPF progs
      in kallsyms, and subsequently trips over GPF when walking kallsyms entries:
      
        [...]
        8021q: adding VLAN 0 to HW filter on device batadv0
        8021q: adding VLAN 0 to HW filter on device batadv0
        WARNING: CPU: 0 PID: 9869 at kernel/bpf/core.c:810 bpf_jit_free+0x1e8/0x2a0
        Kernel panic - not syncing: panic_on_warn set ...
        CPU: 0 PID: 9869 Comm: kworker/0:7 Not tainted 5.0.0-rc8+ #1
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Workqueue: events bpf_prog_free_deferred
        Call Trace:
         __dump_stack lib/dump_stack.c:77 [inline]
         dump_stack+0x113/0x167 lib/dump_stack.c:113
         panic+0x212/0x40b kernel/panic.c:214
         __warn.cold.8+0x1b/0x38 kernel/panic.c:571
         report_bug+0x1a4/0x200 lib/bug.c:186
         fixup_bug arch/x86/kernel/traps.c:178 [inline]
         do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
         do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
         invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
        RIP: 0010:bpf_jit_free+0x1e8/0x2a0
        Code: 02 4c 89 e2 83 e2 07 38 d0 7f 08 84 c0 0f 85 86 00 00 00 48 ba 00 02 00 00 00 00 ad de 0f b6 43 02 49 39 d6 0f 84 5f fe ff ff <0f> 0b e9 58 fe ff ff 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1
        RSP: 0018:ffff888092f67cd8 EFLAGS: 00010202
        RAX: 0000000000000007 RBX: ffffc90001947000 RCX: ffffffff816e9d88
        RDX: dead000000000200 RSI: 0000000000000008 RDI: ffff88808769f7f0
        RBP: ffff888092f67d00 R08: fffffbfff1394059 R09: fffffbfff1394058
        R10: fffffbfff1394058 R11: ffffffff89ca02c7 R12: ffffc90001947002
        R13: ffffc90001947020 R14: ffffffff881eca80 R15: ffff88808769f7e8
        BUG: unable to handle kernel paging request at fffffbfff400d000
        #PF error: [normal kernel read fault]
        PGD 21ffee067 P4D 21ffee067 PUD 21ffed067 PMD 9f942067 PTE 0
        Oops: 0000 [#1] PREEMPT SMP KASAN
        CPU: 0 PID: 9869 Comm: kworker/0:7 Not tainted 5.0.0-rc8+ #1
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Workqueue: events bpf_prog_free_deferred
        RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:495 [inline]
        RIP: 0010:bpf_tree_comp kernel/bpf/core.c:558 [inline]
        RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline]
        RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline]
        RIP: 0010:bpf_prog_kallsyms_find+0x107/0x2e0 kernel/bpf/core.c:632
        Code: 00 f0 ff ff 44 38 c8 7f 08 84 c0 0f 85 fa 00 00 00 41 f6 45 02 01 75 02 0f 0b 48 39 da 0f 82 92 00 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 30 84 c0 74 08 3c 03 0f 8e 45 01 00 00 8b 03 48 c1 e0
        [...]
      
      Upon further debugging, it turns out that whenever we trigger this
      issue, the kallsyms removal in bpf_prog_ksym_node_del() was /skipped/
      but yet bpf_jit_free() reported that the entry is /in use/.
      
      Problem is that symbol exposure via bpf_prog_kallsyms_add() but also
      perf_event_bpf_event() were done /after/ bpf_prog_new_fd(). Once the
      fd is exposed to the public, a parallel close request came in right
      before we attempted to do the bpf_prog_kallsyms_add().
      
      Given at this time the prog reference count is one, we start to rip
      everything underneath us via bpf_prog_release() -> bpf_prog_put().
      The memory is eventually released via deferred free, so we're seeing
      that bpf_jit_free() has a kallsym entry because we added it from
      bpf_prog_load() but /after/ bpf_prog_put() from the remote CPU.
      
      Therefore, move both notifications /before/ we install the fd. The
      issue was never seen between bpf_prog_alloc_id() and bpf_prog_new_fd()
      because upon bpf_prog_get_fd_by_id() we'll take another reference to
      the BPF prog, so we're still holding the original reference from the
      bpf_prog_load().
      
      Fixes: 6ee52e2a ("perf, bpf: Introduce PERF_RECORD_BPF_EVENT")
      Fixes: 74451e66 ("bpf: make jited programs visible in traces")
      Reported-by: syzbot+bd3bba6ff3fcea7a6ec6@syzkaller.appspotmail.com
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Cc: Song Liu <songliubraving@fb.com>
      c751798a
    • Alexei Starovoitov's avatar
      bpf: fix precision tracking in presence of bpf2bpf calls · 6754172c
      Alexei Starovoitov authored
      While adding extra tests for precision tracking and extra infra
      to adjust verifier heuristics the existing test
      "calls: cross frame pruning - liveness propagation" started to fail.
      The root cause is the same as described in verifer.c comment:
      
       * Also if parent's curframe > frame where backtracking started,
       * the verifier need to mark registers in both frames, otherwise callees
       * may incorrectly prune callers. This is similar to
       * commit 7640ead9 ("bpf: verifier: make sure callees don't prune with caller differences")
       * For now backtracking falls back into conservative marking.
      
      Turned out though that returning -ENOTSUPP from backtrack_insn() and
      doing mark_all_scalars_precise() in the current parentage chain is not enough.
      Depending on how is_state_visited() heuristic is creating parentage chain
      it's possible that callee will incorrectly prune caller.
      Fix the issue by setting precise=true earlier and more aggressively.
      Before this fix the precision tracking _within_ functions that don't do
      bpf2bpf calls would still work. Whereas now precision tracking is completely
      disabled when bpf2bpf calls are present anywhere in the program.
      
      No difference in cilium tests (they don't have bpf2bpf calls).
      No difference in test_progs though some of them have bpf2bpf calls,
      but precision tracking wasn't effective there.
      
      Fixes: b5dc0163 ("bpf: precise scalar_value tracking")
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      6754172c
    • Jakub Sitnicki's avatar
      flow_dissector: Fix potential use-after-free on BPF_PROG_DETACH · db38de39
      Jakub Sitnicki authored
      Call to bpf_prog_put(), with help of call_rcu(), queues an RCU-callback to
      free the program once a grace period has elapsed. The callback can run
      together with new RCU readers that started after the last grace period.
      New RCU readers can potentially see the "old" to-be-freed or already-freed
      pointer to the program object before the RCU update-side NULLs it.
      
      Reorder the operations so that the RCU update-side resets the protected
      pointer before the end of the grace period after which the program will be
      freed.
      
      Fixes: d58e468b ("flow_dissector: implements flow dissector BPF hook")
      Reported-by: default avatarLorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
      Acked-by: default avatarPetar Penkov <ppenkov@google.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      db38de39
  2. 21 Aug, 2019 4 commits
  3. 20 Aug, 2019 1 commit
  4. 16 Aug, 2019 1 commit
  5. 14 Aug, 2019 1 commit
  6. 13 Aug, 2019 1 commit
  7. 12 Aug, 2019 8 commits
    • Ilya Leoshkevich's avatar
      s390/bpf: fix lcgr instruction encoding · bb2d267c
      Ilya Leoshkevich authored
      "masking, test in bounds 3" fails on s390, because
      BPF_ALU64_IMM(BPF_NEG, BPF_REG_2, 0) ignores the top 32 bits of
      BPF_REG_2. The reason is that JIT emits lcgfr instead of lcgr.
      The associated comment indicates that the code was intended to
      emit lcgr in the first place, it's just that the wrong opcode
      was used.
      
      Fix by using the correct opcode.
      
      Fixes: 05462310 ("s390/bpf: Add s390x eBPF JIT compiler backend")
      Signed-off-by: default avatarIlya Leoshkevich <iii@linux.ibm.com>
      Acked-by: default avatarVasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      bb2d267c
    • Nathan Chancellor's avatar
      net: tc35815: Explicitly check NET_IP_ALIGN is not zero in tc35815_rx · 125b7e09
      Nathan Chancellor authored
      clang warns:
      
      drivers/net/ethernet/toshiba/tc35815.c:1507:30: warning: use of logical
      '&&' with constant operand [-Wconstant-logical-operand]
                              if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN)
                                                        ^  ~~~~~~~~~~~~
      drivers/net/ethernet/toshiba/tc35815.c:1507:30: note: use '&' for a
      bitwise operation
                              if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN)
                                                        ^~
                                                        &
      drivers/net/ethernet/toshiba/tc35815.c:1507:30: note: remove constant to
      silence this warning
                              if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN)
                                                       ~^~~~~~~~~~~~~~~
      1 warning generated.
      
      Explicitly check that NET_IP_ALIGN is not zero, which matches how this
      is checked in other parts of the tree. Because NET_IP_ALIGN is a build
      time constant, this check will be constant folded away during
      optimization.
      
      Fixes: 82a9928d ("tc35815: Enable StripCRC feature")
      Link: https://github.com/ClangBuiltLinux/linux/issues/608Signed-off-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      125b7e09
    • Chris Packham's avatar
      tipc: initialise addr_trail_end when setting node addresses · 8874ecae
      Chris Packham authored
      We set the field 'addr_trial_end' to 'jiffies', instead of the current
      value 0, at the moment the node address is initialized. This guarantees
      we don't inadvertently enter an address trial period when the node
      address is explicitly set by the user.
      Signed-off-by: default avatarChris Packham <chris.packham@alliedtelesis.co.nz>
      Acked-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8874ecae
    • Chen-Yu Tsai's avatar
      net: dsa: Check existence of .port_mdb_add callback before calling it · 58799865
      Chen-Yu Tsai authored
      The dsa framework has optional .port_mdb_{prepare,add,del} callback fields
      for drivers to handle multicast database entries. When adding an entry, the
      framework goes through a prepare phase, then a commit phase. Drivers not
      providing these callbacks should be detected in the prepare phase.
      
      DSA core may still bypass the bridge layer and call the dsa_port_mdb_add
      function directly with no prepare phase or no switchdev trans object,
      and the framework ends up calling an undefined .port_mdb_add callback.
      This results in a NULL pointer dereference, as shown in the log below.
      
      The other functions seem to be properly guarded. Do the same for
      .port_mdb_add in dsa_switch_mdb_add_bitmap() as well.
      
          8<--- cut here ---
          Unable to handle kernel NULL pointer dereference at virtual address 00000000
          pgd = (ptrval)
          [00000000] *pgd=00000000
          Internal error: Oops: 80000005 [#1] SMP ARM
          Modules linked in: rtl8xxxu rtl8192cu rtl_usb rtl8192c_common rtlwifi mac80211 cfg80211
          CPU: 1 PID: 134 Comm: kworker/1:2 Not tainted 5.3.0-rc1-00247-gd3519030752a #1
          Hardware name: Allwinner sun7i (A20) Family
          Workqueue: events switchdev_deferred_process_work
          PC is at 0x0
          LR is at dsa_switch_event+0x570/0x620
          pc : [<00000000>]    lr : [<c08533ec>]    psr: 80070013
          sp : ee871db8  ip : 00000000  fp : ee98d0a4
          r10: 0000000c  r9 : 00000008  r8 : ee89f710
          r7 : ee98d040  r6 : ee98d088  r5 : c0f04c48  r4 : ee98d04c
          r3 : 00000000  r2 : ee89f710  r1 : 00000008  r0 : ee98d040
          Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
          Control: 10c5387d  Table: 6deb406a  DAC: 00000051
          Process kworker/1:2 (pid: 134, stack limit = 0x(ptrval))
          Stack: (0xee871db8 to 0xee872000)
          1da0:                                                       ee871e14 103ace2d
          1dc0: 00000000 ffffffff 00000000 ee871e14 00000005 00000000 c08524a0 00000000
          1de0: ffffe000 c014bdfc c0f04c48 ee871e98 c0f04c48 ee9e5000 c0851120 c014bef0
          1e00: 00000000 b643aea2 ee9b4068 c08509a8 ee2bf940 ee89f710 ee871ecb 00000000
          1e20: 00000008 103ace2d 00000000 c087e248 ee29c868 103ace2d 00000001 ffffffff
          1e40: 00000000 ee871e98 00000006 00000000 c0fb2a50 c087e2d0 ffffffff c08523c4
          1e60: ffffffff c014bdfc 00000006 c0fad2d0 ee871e98 ee89f710 00000000 c014c500
          1e80: 00000000 ee89f3c0 c0f04c48 00000000 ee9e5000 c087dfb4 ee9e5000 00000000
          1ea0: ee89f710 ee871ecb 00000001 103ace2d 00000000 c0f04c48 00000000 c087e0a8
          1ec0: 00000000 efd9a3e0 0089f3c0 103ace2d ee89f700 ee89f710 ee9e5000 00000122
          1ee0: 00000100 c087e130 ee89f700 c0fad2c8 c1003ef0 c087de4c 2e928000 c0fad2ec
          1f00: c0fad2ec ee839580 ef7a62c0 ef7a9400 00000000 c087def8 c0fad2ec c01447dc
          1f20: ef315640 ef7a62c0 00000008 ee839580 ee839594 ef7a62c0 00000008 c0f03d00
          1f40: ef7a62d8 ef7a62c0 ffffe000 c0145b84 ffffe000 c0fb2420 c0bfaa8c 00000000
          1f60: ffffe000 ee84b600 ee84b5c0 00000000 ee870000 ee839580 c0145b40 ef0e5ea4
          1f80: ee84b61c c014a6f8 00000001 ee84b5c0 c014a5b0 00000000 00000000 00000000
          1fa0: 00000000 00000000 00000000 c01010e8 00000000 00000000 00000000 00000000
          1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
          1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
          [<c08533ec>] (dsa_switch_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84)
          [<c014bdfc>] (notifier_call_chain) from [<c014bef0>] (raw_notifier_call_chain+0x18/0x20)
          [<c014bef0>] (raw_notifier_call_chain) from [<c08509a8>] (dsa_port_mdb_add+0x48/0x74)
          [<c08509a8>] (dsa_port_mdb_add) from [<c087e248>] (__switchdev_handle_port_obj_add+0x54/0xd4)
          [<c087e248>] (__switchdev_handle_port_obj_add) from [<c087e2d0>] (switchdev_handle_port_obj_add+0x8/0x14)
          [<c087e2d0>] (switchdev_handle_port_obj_add) from [<c08523c4>] (dsa_slave_switchdev_blocking_event+0x94/0xa4)
          [<c08523c4>] (dsa_slave_switchdev_blocking_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84)
          [<c014bdfc>] (notifier_call_chain) from [<c014c500>] (blocking_notifier_call_chain+0x50/0x68)
          [<c014c500>] (blocking_notifier_call_chain) from [<c087dfb4>] (switchdev_port_obj_notify+0x44/0xa8)
          [<c087dfb4>] (switchdev_port_obj_notify) from [<c087e0a8>] (switchdev_port_obj_add_now+0x90/0x104)
          [<c087e0a8>] (switchdev_port_obj_add_now) from [<c087e130>] (switchdev_port_obj_add_deferred+0x14/0x5c)
          [<c087e130>] (switchdev_port_obj_add_deferred) from [<c087de4c>] (switchdev_deferred_process+0x64/0x104)
          [<c087de4c>] (switchdev_deferred_process) from [<c087def8>] (switchdev_deferred_process_work+0xc/0x14)
          [<c087def8>] (switchdev_deferred_process_work) from [<c01447dc>] (process_one_work+0x218/0x50c)
          [<c01447dc>] (process_one_work) from [<c0145b84>] (worker_thread+0x44/0x5bc)
          [<c0145b84>] (worker_thread) from [<c014a6f8>] (kthread+0x148/0x150)
          [<c014a6f8>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
          Exception stack(0xee871fb0 to 0xee871ff8)
          1fa0:                                     00000000 00000000 00000000 00000000
          1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
          1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
          Code: bad PC value
          ---[ end trace 1292c61abd17b130 ]---
      
          [<c08533ec>] (dsa_switch_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84)
          corresponds to
      
      	$ arm-linux-gnueabihf-addr2line -C -i -e vmlinux c08533ec
      
      	linux/net/dsa/switch.c:156
      	linux/net/dsa/switch.c:178
      	linux/net/dsa/switch.c:328
      
      Fixes: e6db98db ("net: dsa: add switch mdb bitmap functions")
      Signed-off-by: default avatarChen-Yu Tsai <wens@csie.org>
      Reviewed-by: default avatarVivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      58799865
    • Petr Machata's avatar
      mlxsw: spectrum_ptp: Keep unmatched entries in a linked list · 8028ccda
      Petr Machata authored
      To identify timestamps for matching with their packets, Spectrum-1 uses a
      five-tuple of (port, direction, domain number, message type, sequence ID).
      If there are several clients from the same domain behind a single port
      sending Delay_Req's, the only thing differentiating these packets, as far
      as Spectrum-1 is concerned, is the sequence ID. Should sequence IDs between
      individual clients be similar, conflicts may arise. That is not a problem
      to hardware, which will simply deliver timestamps on a first comes, first
      served basis.
      
      However the driver uses a simple hash table to store the unmatched pieces.
      When a new conflicting piece arrives, it pushes out the previously stored
      one, which if it is a packet, is delivered without timestamp. Later on as
      the corresponding timestamps arrive, the first one is mismatched to the
      second packet, and the second one is never matched and eventually is GCd.
      
      To correct this issue, instead of using a simple rhashtable, use rhltable
      to keep the unmatched entries.
      
      Previously, a found unmatched entry would always be removed from the hash
      table. That is not the case anymore--an incompatible entry is left in the
      hash table. Therefore removal from the hash table cannot be used to confirm
      the validity of the looked-up pointer, instead the lookup would simply need
      to be redone. Therefore move it inside the critical section. This
      simplifies a lot of the code.
      
      Fixes: 87486427 ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls")
      Reported-by: default avatarAlex Veber <alexve@mellanox.com>
      Signed-off-by: default avatarPetr Machata <petrm@mellanox.com>
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8028ccda
    • Jonathan Neuschäfer's avatar
      net: nps_enet: Fix function names in doc comments · d81f4141
      Jonathan Neuschäfer authored
      Adjust the function names in two doc comments to match the corresponding
      functions.
      Signed-off-by: default avatarJonathan Neuschäfer <j.neuschaefer@gmx.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d81f4141
    • David Howells's avatar
      rxrpc: Fix local refcounting · 68553f1a
      David Howells authored
      Fix rxrpc_unuse_local() to handle a NULL local pointer as it can be called
      on an unbound socket on which rx->local is not yet set.
      
      The following reproduced (includes omitted):
      
      	int main(void)
      	{
      		socket(AF_RXRPC, SOCK_DGRAM, AF_INET);
      		return 0;
      	}
      
      causes the following oops to occur:
      
      	BUG: kernel NULL pointer dereference, address: 0000000000000010
      	...
      	RIP: 0010:rxrpc_unuse_local+0x8/0x1b
      	...
      	Call Trace:
      	 rxrpc_release+0x2b5/0x338
      	 __sock_release+0x37/0xa1
      	 sock_close+0x14/0x17
      	 __fput+0x115/0x1e9
      	 task_work_run+0x72/0x98
      	 do_exit+0x51b/0xa7a
      	 ? __context_tracking_exit+0x4e/0x10e
      	 do_group_exit+0xab/0xab
      	 __x64_sys_exit_group+0x14/0x17
      	 do_syscall_64+0x89/0x1d4
      	 entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Reported-by: syzbot+20dee719a2e090427b5f@syzkaller.appspotmail.com
      Fixes: 730c5fd4 ("rxrpc: Fix local endpoint refcounting")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Jeffrey Altman <jaltman@auristor.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      68553f1a
    • David Ahern's avatar
      netdevsim: Restore per-network namespace accounting for fib entries · 59c84b9f
      David Ahern authored
      Prior to the commit in the fixes tag, the resource controller in netdevsim
      tracked fib entries and rules per network namespace. Restore that behavior.
      
      Fixes: 5fc49422 ("netdevsim: create devlink instance per netdevsim instance")
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      59c84b9f
  8. 11 Aug, 2019 1 commit
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 9481382b
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2019-08-11
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) x64 JIT code generation fix for backward-jumps to 1st insn, from Alexei.
      
      2) Fix buggy multi-closing of BTF file descriptor in libbpf, from Andrii.
      
      3) Fix libbpf_num_possible_cpus() to make it thread safe, from Takshak.
      
      4) Fix bpftool to dump an error if pinning fails, from Jakub.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9481382b
  9. 10 Aug, 2019 1 commit
  10. 09 Aug, 2019 19 commits