1. 15 Dec, 2021 4 commits
    • bpf, selftests: Update test case for atomic cmpxchg on r0 with pointer · e523102c
      Daniel Borkmann authored
      Fix up the unprivileged test case results for the 'Dest pointer in r0'
      verifier tests, given that they now need to reject R0 containing a pointer
      value, and add a couple of new related tests with 32-bit cmpxchg as well.
      
        root@foo:~/bpf/tools/testing/selftests/bpf# ./test_verifier
        #0/u invalid and of negative number OK
        #0/p invalid and of negative number OK
        [...]
        #1268/p XDP pkt read, pkt_meta' <= pkt_data, bad access 1 OK
        #1269/p XDP pkt read, pkt_meta' <= pkt_data, bad access 2 OK
        #1270/p XDP pkt read, pkt_data <= pkt_meta', good access OK
        #1271/p XDP pkt read, pkt_data <= pkt_meta', bad access 1 OK
        #1272/p XDP pkt read, pkt_data <= pkt_meta', bad access 2 OK
        Summary: 1900 PASSED, 0 SKIPPED, 0 FAILED
      Acked-by: Brendan Jackman <jackmanb@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Fix kernel address leakage in atomic cmpxchg's r0 aux reg · a82fe085
      Daniel Borkmann authored
      The implementation of BPF_CMPXCHG on a high level has the following parameters:
      
        .-[old-val]                                          .-[new-val]
        BPF_R0 = cmpxchg{32,64}(DST_REG + insn->off, BPF_R0, SRC_REG)
                                `-[mem-loc]          `-[old-val]
      
      Given that a BPF insn can only encode two registers (dst, src), R0 is fixed
      and used as an auxiliary register for input (old value) as well as output
      (returning the old value from the memory location). While the verifier
      performs a number of safety checks, it fails to reject unprivileged programs
      where R0 contains a pointer as the old value.
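To make R0's dual role concrete, the 64-bit case can be modeled in plain C. This is a minimal user-space sketch. `model_cmpxchg64` is a hypothetical helper, not a kernel or BPF API, and the model ignores atomicity entirely.

```c
#include <stdint.h>

/*
 * Hypothetical user-space model of:
 *     R0 = cmpxchg64(DST_REG + off, R0, SRC_REG)
 * R0 is both an input (the expected old value) and the output (the value
 * actually found in memory). Atomicity is not modeled.
 */
static uint64_t model_cmpxchg64(uint64_t *mem, uint64_t r0, uint64_t src)
{
	uint64_t old = *mem;	/* current value at DST_REG + off */

	if (old == r0)		/* compare memory with the old value in R0 */
		*mem = src;	/* store SRC_REG only on a match */
	return old;		/* R0 always receives the previous value */
}
```

The store of SRC_REG happens only when memory equals R0. If R0 holds a pointer, a canary in SRC_REG therefore reveals whether a guessed scalar in memory equals that pointer.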
      
      Through brute force, it takes about 16 seconds on my machine to leak a kernel
      pointer with BPF_CMPXCHG. The PoC essentially probes for kernel addresses by
      storing the guessed address into the map slot as a scalar and using the map
      value pointer as R0, while SRC_REG holds a canary value to detect a matching
      address.
      
      Fix it by checking R0 for pointers and rejecting unprivileged programs if
      that's the case.
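A heavily simplified sketch of the added check follows. The point is that R0 is now validated the same way as SRC_REG. The types and names (`model_reg`, `model_check_cmpxchg`, and a plain `allow_ptr_leaks` flag) are hypothetical stand-ins for the verifier's real register state.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified register state. */
enum model_reg_type { MODEL_SCALAR, MODEL_PTR_TO_MAP_VALUE, MODEL_PTR_TO_STACK };

struct model_reg {
	enum model_reg_type type;
};

/* A pointer value may only be exposed when pointer leaks are allowed
 * (i.e. for privileged programs). */
static bool model_is_pointer_value(const struct model_reg *reg, bool allow_ptr_leaks)
{
	return !allow_ptr_leaks && reg->type != MODEL_SCALAR;
}

static int model_check_cmpxchg(const struct model_reg *regs, int src_regno,
			       bool allow_ptr_leaks)
{
	/* SRC_REG (the new value) was already checked before the fix. */
	if (model_is_pointer_value(&regs[src_regno], allow_ptr_leaks))
		return -EACCES;
	/* The fix: R0 (the old value) is an input too, so check it as well. */
	if (model_is_pointer_value(&regs[0], allow_ptr_leaks))
		return -EACCES;
	return 0;
}
```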
      
      Fixes: 5ffa2550 ("bpf: Add instructions for atomic_[cmp]xchg")
      Reported-by: Ryota Shiga (Flatt Security)
      Acked-by: Brendan Jackman <jackmanb@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, selftests: Add test case for atomic fetch on spilled pointer · 180486b4
      Daniel Borkmann authored
      Test whether unprivileged programs would be able to leak the spilled pointer,
      either by exporting the value returned from the atomic{32,64} operation or by
      reading and exporting the value from the stack after the atomic operation
      has taken place.
      
      Note that for unprivileged programs, the atomic cmpxchg test case below named
      "Dest pointer in r0 - succeed" fails. The reason is that the dst memory
      location (r10 -8) holds the spilled register r10:
      
        0: R1=ctx(id=0,off=0,imm=0) R10=fp0
        0: (bf) r0 = r10
        1: R0_w=fp0 R1=ctx(id=0,off=0,imm=0) R10=fp0
        1: (7b) *(u64 *)(r10 -8) = r0
        2: R0_w=fp0 R1=ctx(id=0,off=0,imm=0) R10=fp0 fp-8_w=fp
        2: (b7) r1 = 0
        3: R0_w=fp0 R1_w=invP0 R10=fp0 fp-8_w=fp
        3: (db) r0 = atomic64_cmpxchg((u64 *)(r10 -8), r0, r1)
        4: R0_w=fp0 R1_w=invP0 R10=fp0 fp-8_w=mmmmmmmm
        4: (79) r1 = *(u64 *)(r0 -8)
        5: R0_w=fp0 R1_w=invP(id=0) R10=fp0 fp-8_w=mmmmmmmm
        5: (b7) r0 = 0
        6: R0_w=invP0 R1_w=invP(id=0) R10=fp0 fp-8_w=mmmmmmmm
        6: (95) exit
      
      However, allowing this case for unprivileged programs would be of little use,
      given that an update with a new pointer fails anyway:
      
        0: R1=ctx(id=0,off=0,imm=0) R10=fp0
        0: (bf) r0 = r10
        1: R0_w=fp0 R1=ctx(id=0,off=0,imm=0) R10=fp0
        1: (7b) *(u64 *)(r10 -8) = r0
        2: R0_w=fp0 R1=ctx(id=0,off=0,imm=0) R10=fp0 fp-8_w=fp
        2: (db) r0 = atomic64_cmpxchg((u64 *)(r10 -8), r0, r10)
        R10 leaks addr into mem
      Acked-by: Brendan Jackman <jackmanb@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Fix kernel address leakage in atomic fetch · 7d3baf0a
      Daniel Borkmann authored
      The change to check_mem_access() handling in commit 37086bfd ("bpf:
      Propagate stack bounds to registers in atomics w/ BPF_FETCH") is buggy,
      since it allows unprivileged users to leak kernel pointers. For example,
      an atomic fetch/and with -1 on a stack destination that holds a spilled
      pointer will migrate the spilled register type into a scalar, which can
      then be exported out of the program (since scalar != pointer) by dumping
      it into a map value.
      
      The original implementation of XADD prevented this situation by calling
      check_mem_access() twice, first with BPF_READ and then with BPF_WRITE.
      In both cases it passed -1 as a placeholder value instead of a register,
      since XADD semantics did not include a value fetch. The BPF_READ path also
      included a check in check_stack_read_fixed_off() which rejects the program
      if the stack slot holds a pointer (__is_pointer_value()) and dst_regno < 0.
      The latter distinguishes a regular stack spill/fill from an arithmetic
      operation, which is disallowed on non-scalars. See also 6e7e63cb ("bpf:
      Forbid XADD on spilled pointers for unprivileged users") for more context
      on check_mem_access() and its handling of the placeholder value -1.
      
      One minimally intrusive option to fix the leak is for the BPF_FETCH case to
      initially check the BPF_READ case via check_mem_access() with -1 as register,
      followed by the actual load case with non-negative load_reg to propagate
      stack bounds to registers.
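The fixed ordering can be sketched as a small model of a single stack slot. All names here are hypothetical. `value_regno < 0` plays the role of the -1 placeholder described above. The model only captures the pointer-rejection rule, not bounds propagation.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one 8-byte stack slot. */
enum model_slot_type { MODEL_SLOT_MISC, MODEL_SLOT_SPILLED_SCALAR, MODEL_SLOT_SPILLED_PTR };

/* Read of a fixed stack offset. value_regno < 0 means "no destination
 * register": the read belongs to an arithmetic operation rather than a
 * plain fill, so a spilled pointer is rejected when leaks are not allowed. */
static int model_stack_read(enum model_slot_type slot, int value_regno,
			    bool allow_ptr_leaks)
{
	if (value_regno < 0 && slot == MODEL_SLOT_SPILLED_PTR && !allow_ptr_leaks)
		return -EACCES;
	return 0;
}

/* Fixed ordering for BPF_FETCH atomics:
 * 1. placeholder read with -1 to apply the "no arithmetic on pointers" rule;
 * 2. the real load into load_reg (where stack bounds would be propagated). */
static int model_check_atomic_fetch(enum model_slot_type slot, int load_reg,
				    bool allow_ptr_leaks)
{
	int err = model_stack_read(slot, -1, allow_ptr_leaks);

	if (err)
		return err;
	return model_stack_read(slot, load_reg, allow_ptr_leaks);
}
```

Step 1 is what the buggy version skipped. A load with a non-negative register alone treats the access as a plain fill and does not reject the spilled pointer.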
      
      Fixes: 37086bfd ("bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH")
      Reported-by: <n4ke4mry@gmail.com>
      Acked-by: Brendan Jackman <jackmanb@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  2. 14 Dec, 2021 2 commits
  3. 10 Dec, 2021 4 commits
    • selftests/bpf: Tests for state pruning with u32 spill/fill · 0be2516f
      Paul Chaignon authored
      This patch adds tests for the verifier's tracking of spilled, <8B
      registers. The first two test cases ensure the verifier doesn't
      incorrectly prune states in case of <8B spill/fills. The last one simply
      checks that a filled u64 register is marked unknown if the register
      spilled in the same stack slot was less than 8B.
      
      The map value access at the end of the first program is only incorrect
      for the path R6=32. If the precision bit for register R8 isn't
      backtracked through the u32 spill/fill, the R6=32 path is pruned at
      instruction 9 and the program is incorrectly accepted. The second
      program is a variation of the same with u32 spills and a u64 fill.
      
      The additional instructions to introduce the first pruning point may be
      a bit fragile as they depend on the heuristics for pruning points in the
      verifier (currently at least 8 instructions and 2 jumps). If the
      heuristics are changed, the pruning point may move (e.g., to the
      subsequent jump) or disappear, which would cause the test to always pass.
      Signed-off-by: Paul Chaignon <paul@isovalent.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Fix incorrect state pruning for <8B spill/fill · 345e004d
      Paul Chaignon authored
      Commit 354e8f19 ("bpf: Support <8-byte scalar spill and refill")
      introduced support in the verifier to track <8B spill/fills of scalars.
      The backtracking logic for the precision bit was however skipping
      spill/fills of less than 8B. That could cause state pruning to consider
      two states equivalent when they shouldn't be.
      
      As an example, consider the following bytecode snippet:
      
        0:  r7 = r1
        1:  call bpf_get_prandom_u32
        2:  r6 = 2
        3:  if r0 == 0 goto pc+1
        4:  r6 = 3
        ...
        8: [state pruning point]
        ...
        /* u32 spill/fill */
        10: *(u32 *)(r10 - 8) = r6
        11: r8 = *(u32 *)(r10 - 8)
        12: r0 = 0
        13: if r8 == 3 goto pc+1
        14: r0 = 1
        15: exit
      
      The verifier first walks the path with R6=3. Given the support for <8B
      spill/fills, at instruction 13, it knows the condition is true and skips
      instruction 14. At that point, the backtracking logic kicks in but stops
      at the fill instruction since it only propagates the precision bit for
      8B spill/fill. When the verifier then walks the path with R6=2, it will
      consider it safe at instruction 8 because R6 is not marked as needing
      precision. Instruction 14 is thus never walked and is then incorrectly
      removed as 'dead code'.
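The pruning decision at instruction 8 can be reduced to a toy rule. An explored state covers a new one when the register's exact value was never marked as mattering. The struct and function below are hypothetical and much simpler than the verifier's real state comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one tracked scalar register plus its precision mark. */
struct model_scalar {
	uint64_t value;	/* known constant value */
	bool precise;	/* set by backtracking when the value mattered */
};

/* Simplified pruning rule: if neither state marks the register as precise,
 * its exact value is treated as irrelevant and the explored state covers
 * the current one; otherwise the values must match. */
static bool model_state_covers(const struct model_scalar *old,
			       const struct model_scalar *cur)
{
	if (!old->precise && !cur->precise)
		return true;
	return old->value == cur->value;
}
```

With R6=3 explored but not marked precise, the R6=2 path is considered covered and pruned. That is the incorrect outcome described above.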
      
      It's also possible to lead the verifier to accept e.g. an out-of-bounds
      memory access instead of causing an incorrect dead code elimination.
      
      This regression was found via Cilium's bpf-next CI where it was causing
      a conntrack map update to be silently skipped because the code had been
      removed by the verifier.
      
      This commit fixes it by enabling support for <8B spill/fills in the
      backtracking logic. In case of a <8B spill/fill, the full 8B stack slot
      will be marked as needing precision. Then, in __mark_chain_precision,
      any tracked register spilled in a marked slot will itself be marked as
      needing precision, regardless of the spill size. This logic makes two
      assumptions: (1) only 8B-aligned spill/fill are tracked and (2) spilled
      registers are only tracked if the spill and fill sizes are equal. Commit
      ef979017 ("bpf: selftest: Add verifier tests for <8-byte scalar
      spill and refill") covers the first assumption and the next commit in
      this patchset covers the second.
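Under assumptions (1) and (2), the fix is easy to model in two steps. A fill of any size marks its whole 8-byte slot, and propagation then marks any register spilled there. The sketch uses hypothetical names. The offset-to-slot mapping assumes 8-byte slots counted down from the frame pointer (fp-8 is slot 0).

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_SLOTS 64	/* 512-byte BPF stack / 8-byte slots */

/* Hypothetical model of a stack slot that may hold a spilled scalar. */
struct model_spill {
	bool has_spilled_reg;	/* a tracked register was spilled here */
	int spill_size;		/* 1, 2, 4 or 8 bytes */
	bool precise;		/* precision mark of the spilled register */
};

/* Map a negative, fp-relative offset to its slot: fp-8 -> 0, fp-16 -> 1. */
static int model_slot_index(int off)
{
	return (-off - 1) / 8;
}

/* Backtracking over a fill of any size marks the whole 8-byte slot. */
static uint64_t model_backtrack_fill(uint64_t stack_mask, int off)
{
	return stack_mask | (1ULL << model_slot_index(off));
}

/* Any tracked register spilled into a marked slot becomes precise,
 * regardless of the spill size. */
static void model_mark_spills_precise(struct model_spill *slots, uint64_t stack_mask)
{
	for (int i = 0; i < MODEL_MAX_SLOTS; i++)
		if ((stack_mask & (1ULL << i)) && slots[i].has_spilled_reg)
			slots[i].precise = true;
}
```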
      
      Fixes: 354e8f19 ("bpf: Support <8-byte scalar spill and refill")
      Signed-off-by: Paul Chaignon <paul@isovalent.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • sch_cake: do not call cake_destroy() from cake_init() · ab443c53
      Eric Dumazet authored
      Qdiscs are not supposed to call their own destroy() method from init(),
      because the core stack already does that when init() fails.
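The ownership rule can be illustrated with a small user-space model in which the "core" calls destroy() when init() fails. Everything here (`model_qdisc`, `model_init`, `model_qdisc_create`) is hypothetical. It mirrors the qdisc_create() -> cake_destroy() path visible in the trace below, not the real sch_api.c code.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical qdisc with one resource allocated in init(). */
struct model_qdisc {
	int *block;		/* stands in for a tcf_block or similar */
	int destroy_calls;
};

/* destroy() must tolerate a partially initialized qdisc. */
static void model_destroy(struct model_qdisc *q)
{
	q->destroy_calls++;
	free(q->block);		/* free(NULL) is a no-op */
	q->block = NULL;
}

/* Correct init(): on failure, just return the error; do not call destroy(). */
static int model_init(struct model_qdisc *q, int fail_after_alloc)
{
	q->block = malloc(sizeof(*q->block));
	if (!q->block)
		return -ENOMEM;
	if (fail_after_alloc)
		return -EINVAL;
	return 0;
}

/* The core cleans up after a failed init() by calling destroy() itself. */
static int model_qdisc_create(struct model_qdisc *q, int fail_after_alloc)
{
	int err = model_init(q, fail_after_alloc);

	if (err)
		model_destroy(q);
	return err;
}
```

If `model_init` also called `model_destroy`, the teardown would run twice per failure. That is the double teardown the trace below catches in tcf_block_put().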
      
      syzbot was able to trigger a use-after-free:
      
      DEBUG_LOCKS_WARN_ON(lock->magic != lock)
      WARNING: CPU: 0 PID: 21902 at kernel/locking/mutex.c:586 __mutex_lock_common kernel/locking/mutex.c:586 [inline]
      WARNING: CPU: 0 PID: 21902 at kernel/locking/mutex.c:586 __mutex_lock+0x9ec/0x12f0 kernel/locking/mutex.c:740
      Modules linked in:
      CPU: 0 PID: 21902 Comm: syz-executor189 Not tainted 5.16.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__mutex_lock_common kernel/locking/mutex.c:586 [inline]
      RIP: 0010:__mutex_lock+0x9ec/0x12f0 kernel/locking/mutex.c:740
      Code: 08 84 d2 0f 85 19 08 00 00 8b 05 97 38 4b 04 85 c0 0f 85 27 f7 ff ff 48 c7 c6 20 00 ac 89 48 c7 c7 a0 fe ab 89 e8 bf 76 ba ff <0f> 0b e9 0d f7 ff ff 48 8b 44 24 40 48 8d b8 c8 08 00 00 48 89 f8
      RSP: 0018:ffffc9000627f290 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: ffff88802315d700 RSI: ffffffff815f1db8 RDI: fffff52000c4fe44
      RBP: ffff88818f28e000 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff815ebb5e R11: 0000000000000000 R12: 0000000000000000
      R13: dffffc0000000000 R14: ffffc9000627f458 R15: 0000000093c30000
      FS:  0000555556abc400(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fda689c3303 CR3: 000000001cfbb000 CR4: 0000000000350ef0
      Call Trace:
       <TASK>
       tcf_chain0_head_change_cb_del+0x2e/0x3d0 net/sched/cls_api.c:810
       tcf_block_put_ext net/sched/cls_api.c:1381 [inline]
       tcf_block_put_ext net/sched/cls_api.c:1376 [inline]
       tcf_block_put+0xbc/0x130 net/sched/cls_api.c:1394
       cake_destroy+0x3f/0x80 net/sched/sch_cake.c:2695
       qdisc_create.constprop.0+0x9da/0x10f0 net/sched/sch_api.c:1293
       tc_modify_qdisc+0x4c5/0x1980 net/sched/sch_api.c:1660
       rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5571
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2496
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x904/0xdf0 net/netlink/af_netlink.c:1921
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:724
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f1bb06badb9
      Code: Unable to access opcode bytes at RIP 0x7f1bb06bad8f.
      RSP: 002b:00007fff3012a658 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f1bb06badb9
      RDX: 0000000000000000 RSI: 00000000200007c0 RDI: 0000000000000003
      RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000003
      R10: 0000000000000003 R11: 0000000000000246 R12: 00007fff3012a688
      R13: 00007fff3012a6a0 R14: 00007fff3012a6e0 R15: 00000000000013c2
       </TASK>
      
      Fixes: 046f6fd5 ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
      Link: https://lore.kernel.org/r/20211210142046.698336-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: net: Correct ping6 expected rc from 2 to 1 · 92816e26
      Jie2x Zhou authored
      ./fcnal-test.sh -v -t ipv6_ping
      TEST: ping out, VRF bind - ns-B IPv6 LLA                                      [FAIL]
      TEST: ping out, VRF bind - multicast IP                                       [FAIL]
      
      ping6 is failing, as it should:
      COMMAND: ip netns exec ns-A /bin/ping6 -c1 -w1 fe80::7c4c:bcff:fe66:a63a%red
      strace of ping6 shows it exiting with status 1, so change the expected rc
      from 2 to 1.
      
      Fixes: c0644e71 ("selftests: Add ipv6 ping tests to fcnal-test")
      Reported-by: kernel test robot <lkp@intel.com>
      Suggested-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Jie2x Zhou <jie2x.zhou@intel.com>
      Link: https://lore.kernel.org/r/20211209020230.37270-1-jie2x.zhou@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. 09 Dec, 2021 29 commits
    • Merge tag 'net-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · ded746bf
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bpf, can and netfilter.
      
        Current release - regressions:
      
         - bpf, sockmap: re-evaluate proto ops when psock is removed from
           sockmap
      
        Current release - new code bugs:
      
         - bpf: fix bpf_check_mod_kfunc_call for built-in modules
      
         - ice: fixes for TC classifier offloads
      
         - vrf: don't run conntrack on vrf with !dflt qdisc
      
        Previous releases - regressions:
      
         - bpf: fix the off-by-two error in range markings
      
         - seg6: fix the iif in the IPv6 socket control block
      
         - devlink: fix netns refcount leak in devlink_nl_cmd_reload()
      
         - dsa: mv88e6xxx: fix "don't use PHY_DETECT on internal PHY's"
      
         - dsa: mv88e6xxx: allow use of PHYs on CPU and DSA ports
      
        Previous releases - always broken:
      
         - ethtool: do not perform operations on net devices being
           unregistered
      
         - udp: use datalen to cap max gso segments
      
         - ice: fix races in stats collection
      
         - fec: only clear interrupt of handling queue in fec_enet_rx_queue()
      
         - m_can: pci: fix incorrect reference clock rate
      
         - m_can: disable and ignore ELO interrupt
      
         - mvpp2: fix XDP rx queues registering
      
        Misc:
      
         - treewide: add missing includes masked by cgroup -> bpf.h
           dependency"
      
      * tag 'net-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (82 commits)
        net: dsa: mv88e6xxx: allow use of PHYs on CPU and DSA ports
        net: wwan: iosm: fixes unable to send AT command during mbim tx
        net: wwan: iosm: fixes net interface nonfunctional after fw flash
        net: wwan: iosm: fixes unnecessary doorbell send
        net: dsa: felix: Fix memory leak in felix_setup_mmio_filtering
        MAINTAINERS: s390/net: remove myself as maintainer
        net/sched: fq_pie: prevent dismantle issue
        net: mana: Fix memory leak in mana_hwc_create_wq
        seg6: fix the iif in the IPv6 socket control block
        nfp: Fix memory leak in nfp_cpp_area_cache_add()
        nfc: fix potential NULL pointer deref in nfc_genl_dump_ses_done
        nfc: fix segfault in nfc_genl_dump_devices_done
        udp: using datalen to cap max gso segments
        net: dsa: mv88e6xxx: error handling for serdes_power functions
        can: kvaser_usb: get CAN clock frequency from device
        can: kvaser_pciefd: kvaser_pciefd_rx_error_frame(): increase correct stats->{rx,tx}_errors counter
        net: mvpp2: fix XDP rx queues registering
        vmxnet3: fix minimum vectors alloc issue
        net, neigh: clear whole pneigh_entry at alloc time
        net: dsa: mv88e6xxx: fix "don't use PHY_DETECT on internal PHY's"
        ...
    • Merge tag 'mtd/fixes-for-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux · 27698cd2
      Linus Torvalds authored
      Pull mtd fixes from Miquel Raynal:
       "MTD fixes:
      
         - dataflash: Add device-tree SPI IDs to avoid new warnings
      
        Raw NAND fixes:
      
         - Fix nand_choose_best_timings() on unsupported interface
      
         - Fix nand_erase_op delay (wrong unit)
      
         - fsmc:
            - Fix timing computation
            - Take instruction delay into account
      
         - denali:
            - Add the dependency on HAS_IOMEM to silence robots"
      
      * tag 'mtd/fixes-for-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
        mtd: dataflash: Add device-tree SPI IDs
        mtd: rawnand: fsmc: Fix timing computation
        mtd: rawnand: fsmc: Take instruction delay into account
        mtd: rawnand: Fix nand_choose_best_timings() on unsupported interface
        mtd: rawnand: Fix nand_erase_op delay
        mtd: rawnand: denali: Add the dependency on HAS_IOMEM
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid · 03090cc7
      Linus Torvalds authored
      Pull HID fixes from Jiri Kosina:
      
       - fixes for various drivers which assume that a HID device is on USB
         transport, but that might not necessarily be the case, as the device
         can be faked by uhid. (Greg, Benjamin Tissoires)
      
       - fix for spurious wakeups on certain Lenovo notebooks (Thomas
         Weißschuh)
      
       - a few other device-specific quirks
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
        HID: Ignore battery for Elan touchscreen on Asus UX550VE
        HID: intel-ish-hid: ipc: only enable IRQ wakeup when requested
        HID: google: add eel USB id
        HID: add USB_HID dependancy to hid-prodikeys
        HID: add USB_HID dependancy to hid-chicony
        HID: bigbenff: prevent null pointer dereference
        HID: sony: fix error path in probe
        HID: add USB_HID dependancy on some USB HID drivers
        HID: check for valid USB device for many HID drivers
        HID: wacom: fix problems when device is not a valid USB device
        HID: add hid_is_usb() function to make it simpler for USB detection
        HID: quirks: Add quirk for the Microsoft Surface 3 type-cover
    • Merge tag 'netfs-fixes-20211207' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 2990c89d
      Linus Torvalds authored
      Pull netfslib fixes from David Howells:
      
       - Fix a lockdep warning and potential deadlock. This takes the
         simple approach of offloading the write-to-cache done from within a
         network filesystem read to a worker thread to avoid taking the
         sb_writer lock from the cache backing filesystem whilst holding the
         mmap lock on an inode from the network filesystem.
      
         Jan Kara posits a scenario whereby this can cause deadlock[1], though
         it's quite complex and I think requires someone in userspace to
         actually do I/O on the cache files. Matthew Wilcox isn't so certain,
         though[2].
      
         An alternative way to fix this, suggested by Darrick Wong, might be
         to allow cachefiles to prevent userspace from performing I/O upon the
         file - something like an exclusive open - but that's beyond the scope
         of a fix here if we do want to make such a facility in the future.
      
       - In some of the error handling paths where netfs_ops->cleanup() is
         called, the arguments are transposed[3]. gcc doesn't complain because
         one of the parameters is void* and one of the values is void*.
      
      Link: https://lore.kernel.org/r/20210922110420.GA21576@quack2.suse.cz/ [1]
      Link: https://lore.kernel.org/r/Ya9eDiFCE2fO7K/S@casper.infradead.org/ [2]
      Link: https://lore.kernel.org/r/20211207031449.100510-1-jefflexu@linux.alibaba.com/ [3]
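The transposed-arguments bug from the last item is easy to reproduce in isolation. The sketch below uses hypothetical types rather than the real netfs callback signature. When one parameter is `void *` and the value passed for the other is `void *`, swapping the arguments still compiles, because C converts implicitly to and from `void *`.

```c
/* Hypothetical stand-in for the typed parameter of the cleanup callback. */
struct model_mapping {
	int id;
};

static int model_cleanup_seen_id;

/* Callback shape: one typed pointer, one void pointer. */
static void model_cleanup(struct model_mapping *mapping, void *priv)
{
	model_cleanup_seen_id = mapping->id;
	(void)priv;
}

static void model_call_transposed(struct model_mapping *mapping, void *priv)
{
	/* Bug: arguments swapped. void * -> struct model_mapping * and
	 * struct model_mapping * -> void * are both implicit conversions in C,
	 * so the compiler accepts this call without a diagnostic. */
	model_cleanup(priv, mapping);
}
```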
      
      * tag 'netfs-fixes-20211207' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        netfs: fix parameter of cleanup()
        netfs: Fix lockdep warning from taking sb_writers whilst holding mmap_lock
    • tools/lib/lockdep: drop leftover liblockdep headers · 3a49cc22
      Sasha Levin authored
      Clean up remaining headers that are specific to liblockdep but lived in
      the shared header directory.  These are all unused after the liblockdep
      code was removed in commit 7246f4dc ("tools/lib/lockdep: drop
      liblockdep").
      
      Note that some headers originally created for liblockdep still contain
      liblockdep references, but they are now used by other tools/ code.
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • net: dsa: mv88e6xxx: allow use of PHYs on CPU and DSA ports · 04ec4e62
      Russell King (Oracle) authored
      Martyn Welch reports that his CPU port is unable to link when one of the
      switch ports with an internal PHY has to be used as the CPU port. The
      reason is that the port control register is left forcing the link down,
      preventing traffic flow.

      This occurs because during initialisation, phylink expects the link to
      be down, and DSA forces the link down by synthesising a call to the
      DSA driver's phylink_mac_link_down() method, but we don't touch the
      forced-link state when we later reconfigure the port.
      
      Resolve this by also unforcing the link state when we are operating in
      PHY mode and the PPU is set to poll the PHY to retrieve link status
      information.
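The interaction can be modeled as a port whose effective link state is either a software-forced value or the PHY status polled by the PPU. This is a hypothetical sketch. The field names are illustrative and do not reproduce the mv88e6xxx port control register layout.

```c
#include <stdbool.h>

/* Hypothetical model of a port's link control state. */
struct model_port {
	bool force_link;	/* link state is forced by software */
	bool forced_up;		/* forced value when force_link is set */
	bool phy_link_up;	/* status reported by the PHY via PPU polling */
};

/* Effective link: the forced value if forcing is enabled, else PHY status. */
static bool model_link_up(const struct model_port *p)
{
	return p->force_link ? p->forced_up : p->phy_link_up;
}

/* Init path: the synthesised phylink_mac_link_down() forces the link down. */
static void model_mac_link_down(struct model_port *p)
{
	p->force_link = true;
	p->forced_up = false;
}

/* Later reconfiguration in PHY mode with PPU polling. with_fix models the
 * change: unforce the link so the PHY status takes effect again. */
static void model_config_phy_mode(struct model_port *p, bool with_fix)
{
	if (with_fix)
		p->force_link = false;
}
```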
      Reported-by: Martyn Welch <martyn.welch@collabora.com>
      Tested-by: Martyn Welch <martyn.welch@collabora.com>
      Fixes: 3be98b2d ("net: dsa: Down cpu/dsa ports phylink will control")
      Cc: <stable@vger.kernel.org> # 5.7: 2b29cb9e: net: dsa: mv88e6xxx: fix "don't use PHY_DETECT on internal PHY's"
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/E1mvFhP-00F8Zb-Ul@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'net-wwan-iosm-bug-fixes' · 19961780
      Jakub Kicinski authored
      M Chetan Kumar says:
      
      ====================
      net: wwan: iosm: bug fixes
      
      This patch series brings in IOSM driver bug fixes. Patch details are
      explained below.
      
      PATCH1: stop sending unnecessary doorbell in IP tx flow.
      PATCH2: Restore the IP channel configuration after fw flash.
      PATCH3: Removed the unnecessary check around control port TX transfer.
      ====================
      
      Link: https://lore.kernel.org/r/20211209101629.2940877-1-m.chetan.kumar@linux.intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: wwan: iosm: fixes unable to send AT command during mbim tx · 383451ce
      M Chetan Kumar authored
      The ev_cdev_write_pending flag prevents posting a TX message on the
      AT port while an MBIM transfer is ongoing.

      Remove the unnecessary check around the control port TX transfer.
      Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
      Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: wwan: iosm: fixes net interface nonfunctional after fw flash · 07d3f274
      M Chetan Kumar authored
      The devlink initialization flow was overwriting the IP traffic
      channel configuration, which left the wwan0 network interface
      unusable after a fw flash.

      When the device boots into fully functional mode, restore the IP
      channel configuration.
      Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
      Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: wwan: iosm: fixes unnecessary doorbell send · 373f121a
      M Chetan Kumar authored
      In the TX packet accumulation flow, the transport layer rings the
      device doorbell even when there is no pending control TX transfer
      that needs immediate attention.

      Introduce a new hpda_ctrl_pending variable to keep track of pending
      control TX transfers, and ring the device doorbell only when a
      pending control TX transfer needs immediate attention.
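A minimal model of this gating follows. Only `hpda_ctrl_pending` comes from the commit text. The struct, the doorbell counter, and the choice to clear the flag after ringing are assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical model of the TX accumulation path. */
struct model_imem {
	bool hpda_ctrl_pending;	/* a control TX transfer needs immediate attention */
	int doorbells;		/* number of doorbells rung to the device */
};

/* Queue a control TX transfer that must reach the device promptly. */
static void model_queue_ctrl_tx(struct model_imem *m)
{
	m->hpda_ctrl_pending = true;
}

/* End of an accumulation round: ring the doorbell only if a control
 * transfer is pending (assumed here to clear the flag afterwards). */
static void model_tx_accumulate_done(struct model_imem *m)
{
	if (m->hpda_ctrl_pending) {
		m->hpda_ctrl_pending = false;
		m->doorbells++;
	}
}
```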
      Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
      Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: felix: Fix memory leak in felix_setup_mmio_filtering · e8b1d769
      José Expósito authored
      Avoid a memory leak if no CPU port is defined.
      
      Fixes: 8d5f7954 ("net: dsa: felix: break at first CPU port during init and teardown")
      Addresses-Coverity-ID: 1492897 ("Resource leak")
      Addresses-Coverity-ID: 1492899 ("Resource leak")
      Signed-off-by: José Expósito <jose.exposito89@gmail.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20211209110538.11585-1-jose.exposito89@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • MAINTAINERS: s390/net: remove myself as maintainer · 37ad4e2a
      Julian Wiedmann authored
      I won't have access to the relevant HW and docs much longer.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Link: https://lore.kernel.org/r/20211209153546.1152921-1-jwi@linux.ibm.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/sched: fq_pie: prevent dismantle issue · 61c24026
      Eric Dumazet authored
      For some reason, fq_pie_destroy() did not copy the
      working code from pie_destroy() and other qdiscs,
      thus causing an elusive bug.

      Before calling del_timer_sync(&q->adapt_timer),
      we need to ensure that the timer will not rearm itself.
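The teardown rule can be sketched with a single-threaded model of a self-rearming timer. As a simplification, the model assumes the timer rearms only while a non-zero update interval (`tupdate` here) is set. The names are illustrative rather than the actual sch_fq_pie.c fields, and real concurrency is not modeled.

```c
#include <stdbool.h>

/* Hypothetical model of a qdisc with a self-rearming timer. */
struct model_timer_q {
	unsigned long tupdate;	/* rearm interval; 0 disables rearming */
	bool armed;		/* timer is queued */
};

/* Timer callback: does its periodic work, then rearms itself while
 * tupdate is non-zero (models mod_timer(&q->adapt_timer, ...)). */
static void model_timer_fn(struct model_timer_q *q)
{
	q->armed = false;
	/* ... recompute drop probability ... */
	if (q->tupdate)
		q->armed = true;
}

/* Teardown: first stop future rearming, then cancel the timer
 * (models the del_timer_sync(&q->adapt_timer) call). */
static void model_teardown(struct model_timer_q *q)
{
	q->tupdate = 0;
	q->armed = false;
}
```

Clearing the interval first means any callback invocation that races with teardown leaves the timer unarmed, so the cancellation is final.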
      
      rcu: INFO: rcu_preempt self-detected stall on CPU
      rcu:    0-....: (4416 ticks this GP) idle=60d/1/0x4000000000000000 softirq=10433/10434 fqs=2579
              (t=10501 jiffies g=13085 q=3989)
      NMI backtrace for cpu 0
      CPU: 0 PID: 13 Comm: ksoftirqd/0 Not tainted 5.16.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111
       nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
       trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
       rcu_dump_cpu_stacks+0x25e/0x3f0 kernel/rcu/tree_stall.h:343
       print_cpu_stall kernel/rcu/tree_stall.h:627 [inline]
       check_cpu_stall kernel/rcu/tree_stall.h:711 [inline]
       rcu_pending kernel/rcu/tree.c:3878 [inline]
       rcu_sched_clock_irq.cold+0x9d/0x746 kernel/rcu/tree.c:2597
       update_process_times+0x16d/0x200 kernel/time/timer.c:1785
       tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226
       tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1428
       __run_hrtimer kernel/time/hrtimer.c:1685 [inline]
       __hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1749
       hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811
       local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline]
       __sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103
       sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097
       </IRQ>
       <TASK>
       asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
      RIP: 0010:write_comp_data kernel/kcov.c:221 [inline]
      RIP: 0010:__sanitizer_cov_trace_const_cmp1+0x1d/0x80 kernel/kcov.c:273
      Code: 54 c8 20 48 89 10 c3 66 0f 1f 44 00 00 53 41 89 fb 41 89 f1 bf 03 00 00 00 65 48 8b 0c 25 40 70 02 00 48 89 ce 4c 8b 54 24 08 <e8> 4e f7 ff ff 84 c0 74 51 48 8b 81 88 15 00 00 44 8b 81 84 15 00
      RSP: 0018:ffffc90000d27b28 EFLAGS: 00000246
      RAX: 0000000000000000 RBX: ffff888064bf1bf0 RCX: ffff888011928000
      RDX: ffff888011928000 RSI: ffff888011928000 RDI: 0000000000000003
      RBP: ffff888064bf1c28 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff875d8295 R11: 0000000000000000 R12: 0000000000000000
      R13: ffff8880783dd300 R14: 0000000000000000 R15: 0000000000000000
       pie_calculate_probability+0x405/0x7c0 net/sched/sch_pie.c:418
       fq_pie_timer+0x170/0x2a0 net/sched/sch_fq_pie.c:383
       call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1421
       expire_timers kernel/time/timer.c:1466 [inline]
       __run_timers.part.0+0x675/0xa20 kernel/time/timer.c:1734
       __run_timers kernel/time/timer.c:1715 [inline]
       run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1747
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
       run_ksoftirqd kernel/softirq.c:921 [inline]
       run_ksoftirqd+0x2d/0x60 kernel/softirq.c:913
       smpboot_thread_fn+0x645/0x9c0 kernel/smpboot.c:164
       kthread+0x405/0x4f0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
       </TASK>
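      The ordering that pie_destroy() relies on can be illustrated with a
      deliberately simplified, single-threaded userspace model. All sim_*
      names are hypothetical. The convention that a zero update interval
      means "do not rearm" is assumed here to mirror what pie_destroy()
      does with q->params.tupdate, and the racing callback is simulated by
      letting it run once after the cancel. This does not model the real
      semantics of del_timer_sync(); it only shows why a self-rearming
      timer must be told to stop before it is cancelled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a qdisc with a self-rearming timer. */
struct sim_timer {
	bool pending;              /* queued to fire again */
};

struct sim_qdisc {
	unsigned long tupdate;     /* rearm interval; 0 means "stop rearming" */
	struct sim_timer adapt_timer;
	int runs;                  /* how many times the callback ran */
};

/* Timer callback: do the periodic work, then rearm only while tupdate != 0. */
static void sim_timer_fn(struct sim_qdisc *q)
{
	q->adapt_timer.pending = false;   /* the timer has fired */
	q->runs++;
	if (q->tupdate)
		q->adapt_timer.pending = true;
}

/* Cancel the timer. To model the race, a callback that was already in
 * flight is allowed to finish after the pending entry was removed. */
static void sim_cancel(struct sim_qdisc *q, bool callback_in_flight)
{
	q->adapt_timer.pending = false;
	if (callback_in_flight)
		sim_timer_fn(q);
}

/* Buggy teardown: cancel without first stopping the rearm. */
static void sim_destroy_buggy(struct sim_qdisc *q)
{
	sim_cancel(q, true);
}

/* Fixed teardown: forbid rearming first, then cancel. */
static void sim_destroy_fixed(struct sim_qdisc *q)
{
	q->tupdate = 0;
	sim_cancel(q, true);
}
```

      With the buggy order the in-flight callback leaves the timer armed
      after teardown; with tupdate cleared first it stays disarmed.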
      
      Fixes: ec97ecf1 ("net: sched: add Flow Queue PIE packet scheduler")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Mohit P. Tahiliani <tahiliani@nitk.edu.in>
      Cc: Sachin D. Patil <sdp.sachin@gmail.com>
      Cc: V. Saicharan <vsaicharan1998@gmail.com>
      Cc: Mohit Bhasi <mohitbhasi1998@gmail.com>
      Cc: Leslie Monis <lesliemonis@gmail.com>
      Cc: Gautam Ramakrishnan <gautamramk@gmail.com>
      Link: https://lore.kernel.org/r/20211209084937.3500020-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      61c24026
    • José Expósito's avatar
      net: mana: Fix memory leak in mana_hwc_create_wq · 9acfc57f
      José Expósito authored
      If allocating the DMA buffer failed, mana_hwc_destroy_wq() was called
      before the pointer to the queue had been stored, so the queue leaked.
      
      To avoid leaking the queue, store the pointer to it as soon as it is
      allocated.
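      A userspace sketch of the "store it as soon as it is allocated"
      pattern follows. The sim_* structures and helpers are hypothetical
      stand-ins, not the mana driver's types: a wrapper owns an inner queue
      and a buffer, and a single destroy helper releases whatever the
      wrapper currently points to.

```c
#include <stdlib.h>

/* Hypothetical stand-ins: a wrapper that owns an inner queue and a buffer. */
struct sim_queue {
	int id;
};

struct sim_hwc_wq {
	struct sim_queue *gdma_wq;   /* inner queue, released by the destroy helper */
	void *msg_buf;               /* stands in for the DMA buffer */
};

static int sim_queues_freed;     /* how many inner queues were released */

/* Tear down whatever has been set up so far; NULL members are skipped. */
static void sim_destroy_wq(struct sim_hwc_wq *wq)
{
	if (!wq)
		return;
	free(wq->msg_buf);
	if (wq->gdma_wq) {
		free(wq->gdma_wq);
		sim_queues_freed++;
	}
	free(wq);
}

/* fail_buf simulates the DMA buffer allocation failing. */
static int sim_create_wq(struct sim_hwc_wq **out, int fail_buf)
{
	struct sim_hwc_wq *wq = calloc(1, sizeof(*wq));
	struct sim_queue *queue;

	if (!wq)
		return -1;

	queue = calloc(1, sizeof(*queue));
	if (!queue)
		goto err;
	/* Store the queue right away so the error path below can free it;
	 * storing it only after the buffer allocation is what leaked it. */
	wq->gdma_wq = queue;

	wq->msg_buf = fail_buf ? NULL : malloc(64);
	if (!wq->msg_buf)
		goto err;

	*out = wq;
	return 0;
err:
	sim_destroy_wq(wq);
	return -1;
}
```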
      
      Addresses-Coverity-ID: 1484720 ("Resource leak")
      Signed-off-by: José Expósito <jose.exposito89@gmail.com>
      Reviewed-by: Dexuan Cui <decui@microsoft.com>
      Link: https://lore.kernel.org/r/20211208223723.18520-1-jose.exposito89@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9acfc57f
    • Andrea Mayer's avatar
      seg6: fix the iif in the IPv6 socket control block · ae68d933
      Andrea Mayer authored
      When an IPv4 packet is received, ip_rcv_core() sets the receiving
      interface index into the IPv4 socket control block (v5.16-rc4,
      net/ipv4/ip_input.c line 510):
      
          IPCB(skb)->iif = skb->skb_iif;
      
      If that IPv4 packet is meant to be encapsulated in an outer IPv6+SRH
      header, seg6_do_srh_encap() performs the required encapsulation.
      In this case, the seg6_do_srh_encap function clears the IPv6 socket control
      block (v5.16-rc4 net/ipv6/seg6_iptunnel.c line 163):
      
          memset(IP6CB(skb), 0, sizeof(*IP6CB(skb)));
      
      The memset(...) was introduced in commit ef489749 ("ipv6: sr: clear
      IP6CB(skb) on SRH ip4ip6 encapsulation") a long time ago (2019-01-29).
      
      Since the IPv6 socket control block and the IPv4 socket control block share
      the same memory area (skb->cb), the receiving interface index info is lost
      (IP6CB(skb)->iif is set to zero).
      
      As a side effect, that condition triggers a NULL pointer dereference if
      commit 0857d6f8 ("ipv6: When forwarding count rx stats on the orig
      netdev") is applied.
      
      To fix that issue, we set IP6CB(skb)->iif to the index of the
      receiving interface once again.
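      The overlap can be reproduced with a small userspace model. The
      control-block structures below are hypothetical and much smaller than
      the real inet_skb_parm and inet6_skb_parm; the only thing taken from
      the text is that both views overlay the same scratch area, as they do
      in skb->cb (48 bytes). The restore_iif flag toggles the fix.

```c
#include <string.h>

/* Hypothetical, cut-down control blocks; the real ones have more fields. */
struct sim_ipcb  { int iif; unsigned short flags; };
struct sim_ip6cb { int iif; unsigned short nhoff; };

struct sim_skb {
	int skb_iif;                 /* index of the receiving interface */
	union {                      /* both views share one area, like skb->cb */
		char cb[48];
		struct sim_ipcb ip4;
		struct sim_ip6cb ip6;
	} u;
};

/* Encapsulation step: clear the IPv6 view, optionally restoring iif. */
static void sim_srh_encap(struct sim_skb *skb, int restore_iif)
{
	/* This also wipes the iif that the IPv4 receive path stored. */
	memset(&skb->u.ip6, 0, sizeof(skb->u.ip6));
	if (restore_iif)
		skb->u.ip6.iif = skb->skb_iif;   /* the fix */
}
```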
      
      Fixes: ef489749 ("ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation")
      Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20211208195409.12169-1-andrea.mayer@uniroma2.it
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ae68d933
    • Jianglei Nie's avatar
      nfp: Fix memory leak in nfp_cpp_area_cache_add() · c56c9630
      Jianglei Nie authored
      At line 800 (#1), nfp_cpp_area_alloc() allocates and initializes a
      CPP area structure. But at line 807 (#2), when allocating the cache
      fails, this CPP area structure is not freed, which results in a
      memory leak.
      
      We can fix it by freeing the CPP area when allocating the cache
      fails (#2).
      
      792 int nfp_cpp_area_cache_add(struct nfp_cpp *cpp, size_t size)
      793 {
      794 	struct nfp_cpp_area_cache *cache;
      795 	struct nfp_cpp_area *area;
      
      800 	area = nfp_cpp_area_alloc(cpp, NFP_CPP_ID(7, NFP_CPP_ACTION_RW, 0),
      801 				  0, size); // #1: allocates and initializes
      802 	if (!area)
      803 		return -ENOMEM;
      
      805 	cache = kzalloc(sizeof(*cache), GFP_KERNEL);
      806 	if (!cache)
      807 		return -ENOMEM; // #2: missing free
      
      817 	return 0;
      818 }
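      The shape of the fix can be sketched in userspace. The sim_* types and
      helpers are hypothetical stand-ins for the CPP area and cache objects
      and only count live allocations; the point is that the early return
      at #2 must first release what #1 allocated.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the CPP area and cache objects. */
struct sim_area  { size_t size; };
struct sim_cache { struct sim_area *area; };

static int sim_live_areas;   /* areas allocated and not yet freed */

static struct sim_area *sim_area_alloc(size_t size)
{
	struct sim_area *a = malloc(sizeof(*a));

	if (a) {
		a->size = size;
		sim_live_areas++;
	}
	return a;
}

static void sim_area_free(struct sim_area *a)
{
	free(a);
	sim_live_areas--;
}

/* fail_cache simulates the cache allocation failing (#2). */
static int sim_cache_add(size_t size, int fail_cache, struct sim_cache **out)
{
	struct sim_cache *cache;
	struct sim_area *area = sim_area_alloc(size);          /* #1 */

	if (!area)
		return -ENOMEM;

	cache = fail_cache ? NULL : calloc(1, sizeof(*cache));
	if (!cache) {
		sim_area_free(area);                            /* the missing free */
		return -ENOMEM;
	}
	cache->area = area;
	*out = cache;
	return 0;
}
```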
      
      Fixes: 4cb584e0 ("nfp: add CPP access core")
      Signed-off-by: Jianglei Nie <niejianglei2021@163.com>
      Acked-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20211209061511.122535-1-niejianglei2021@163.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c56c9630
    • Krzysztof Kozlowski's avatar
      nfc: fix potential NULL pointer deref in nfc_genl_dump_ses_done · 4cd8371a
      Krzysztof Kozlowski authored
      The done() netlink callback nfc_genl_dump_ses_done() should check
      whether the received argument is non-NULL, because its allocation
      could have failed earlier in dumpit() (nfc_genl_dump_ses()).
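      The dumpit()/done() contract behind this fix (and the
      nfc_genl_dump_devices_done() fix below) can be sketched in userspace.
      struct sim_netlink_cb is a hypothetical stand-in for struct
      netlink_callback that models only a scratch slot through which
      dumpit() hands its state to done(); the point is that done() may run
      even when dumpit() never managed to allocate that state.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct netlink_callback. */
struct sim_netlink_cb {
	void *state;             /* iterator allocated by dumpit() */
};

struct sim_iter {
	int pos;
};

static int sim_iters_freed;

/* dumpit(): allocate the iterator on the first call.
 * fail_alloc simulates kmalloc() failing. */
static int sim_dumpit(struct sim_netlink_cb *cb, int fail_alloc)
{
	if (!cb->state) {
		struct sim_iter *iter = fail_alloc ? NULL : calloc(1, sizeof(*iter));

		if (!iter)
			return -ENOMEM;
		cb->state = iter;
	}
	return 0;
}

/* done(): runs when the dump ends, even if dumpit() never allocated. */
static int sim_done(struct sim_netlink_cb *cb)
{
	struct sim_iter *iter = cb->state;

	if (iter) {              /* the NULL check; cleanup may dereference iter */
		free(iter);
		sim_iters_freed++;
		cb->state = NULL;
	}
	return 0;
}
```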
      
      Fixes: ac22ac46 ("NFC: Add a GET_SE netlink API")
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Link: https://lore.kernel.org/r/20211209081307.57337-1-krzysztof.kozlowski@canonical.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4cd8371a
    • Tadeusz Struk's avatar
      nfc: fix segfault in nfc_genl_dump_devices_done · fd79a0cb
      Tadeusz Struk authored
      When kmalloc() in nfc_genl_dump_devices() fails,
      nfc_genl_dump_devices_done() segfaults as shown below:
      
      KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
      CPU: 0 PID: 25 Comm: kworker/0:1 Not tainted 5.16.0-rc4-01180-g2a987e65-dirty #5
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-6.fc35 04/01/2014
      Workqueue: events netlink_sock_destruct_work
      RIP: 0010:klist_iter_exit+0x26/0x80
      Call Trace:
      <TASK>
      class_dev_iter_exit+0x15/0x20
      nfc_genl_dump_devices_done+0x3b/0x50
      genl_lock_done+0x84/0xd0
      netlink_sock_destruct+0x8f/0x270
      __sk_destruct+0x64/0x3b0
      sk_destruct+0xa8/0xd0
      __sk_free+0x2e8/0x3d0
      sk_free+0x51/0x90
      netlink_sock_destruct_work+0x1c/0x20
      process_one_work+0x411/0x710
      worker_thread+0x6fd/0xa80
      
      Link: https://syzkaller.appspot.com/bug?id=fc0fa5a53db9edd261d56e74325419faf18bd0df
      Reported-by: syzbot+f9f76f4a0766420b4a02@syzkaller.appspotmail.com
      Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
      Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Link: https://lore.kernel.org/r/20211208182742.340542-1-tadeusz.struk@linaro.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fd79a0cb
    • Jianguo Wu's avatar
      udp: using datalen to cap max gso segments · 158390e4
      Jianguo Wu authored
      The maximum number of UDP GSO segments is intended to be capped at
      UDP_MAX_SEGMENTS; this is checked in udp_send_skb():
      
          if (skb->len > cork->gso_size * UDP_MAX_SEGMENTS) {
              kfree_skb(skb);
              return -EINVAL;
          }
      
      At this point skb->len also includes the network and transport header
      lengths, but only the data length should be compared against the cap.
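      A worked example of why the headers matter: with gso_size = 1400 and
      a cap of 64 segments, a payload of exactly 64 * 1400 = 89600 bytes
      fits, but adding 28 bytes of IPv4 (20) and UDP (8) headers makes
      skb->len 89628, so the old check rejects it. The cap value of 64 is
      an assumption about UDP_MAX_SEGMENTS in v5.16, and the helpers are
      hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: mirrors UDP_MAX_SEGMENTS (1 << 6) as of v5.16. */
#define SIM_UDP_MAX_SEGMENTS 64u

/* Old check: skb_len still counts the network and UDP headers. */
static bool old_check_rejects(size_t skb_len, size_t gso_size)
{
	return skb_len > gso_size * SIM_UDP_MAX_SEGMENTS;
}

/* Fixed check: only the payload is compared against the cap. */
static bool new_check_rejects(size_t datalen, size_t gso_size)
{
	return datalen > gso_size * SIM_UDP_MAX_SEGMENTS;
}
```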
      
      Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
      Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/900742e5-81fb-30dc-6e0b-375c6cdd7982@163.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      158390e4
    • Ameer Hamza's avatar
      net: dsa: mv88e6xxx: error handling for serdes_power functions · 0416e7af
      Ameer Hamza authored
      Add a default case to handle an undefined cmode in the
      mv88e6390_serdes_power() and mv88e6393x_serdes_power() methods.
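      The Coverity finding ("Uninitialized scalar variable") is the classic
      switch-without-default pattern. A hypothetical sketch follows: the
      cmode values and power helpers are invented, and returning -EINVAL
      from the default case is an assumption about what the fix returns.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical cmode values; the real MV88E6XXX constants differ. */
enum sim_cmode { SIM_CMODE_SGMII, SIM_CMODE_1000BASEX, SIM_CMODE_10GBASER, SIM_CMODE_OTHER };

static int sim_power_sgmii(bool up) { (void)up; return 0; }
static int sim_power_10g(bool up)   { (void)up; return 0; }

static int sim_serdes_power(enum sim_cmode cmode, bool up)
{
	int err;

	switch (cmode) {
	case SIM_CMODE_SGMII:
	case SIM_CMODE_1000BASEX:
		err = sim_power_sgmii(up);
		break;
	case SIM_CMODE_10GBASER:
		err = sim_power_10g(up);
		break;
	default:
		/* Without this case, err would be returned uninitialized. */
		err = -EINVAL;
		break;
	}
	return err;
}
```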
      
      Addresses-Coverity: 1494644 ("Uninitialized scalar variable")
      Fixes: 21635d92 ("net: dsa: mv88e6xxx: Fix application of erratum 4.8 for 88E6393X")
      Reviewed-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: Ameer Hamza <amhamza.mgc@gmail.com>
      Link: https://lore.kernel.org/r/20211209041552.9810-1-amhamza.mgc@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0416e7af
    • Jakub Kicinski's avatar
      Merge tag 'linux-can-fixes-for-5.16-20211209' of... · 8d6b32aa
      Jakub Kicinski authored
      Merge tag 'linux-can-fixes-for-5.16-20211209' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      can 2021-12-09
      
      Both patches are by Jimmy Assarsson. The first one fixes the
      incrementing of the rx/tx error counters in the Kvaser PCIe FD driver.
      The second one fixes the Kvaser USB driver by using the CAN clock
      frequency provided by the device instead of a hard-coded value.
      
      * tag 'linux-can-fixes-for-5.16-20211209' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
        can: kvaser_usb: get CAN clock frequency from device
        can: kvaser_pciefd: kvaser_pciefd_rx_error_frame(): increase correct stats->{rx,tx}_errors counter
      ====================
      
      Link: https://lore.kernel.org/r/20211209081312.301036-1-mkl@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      8d6b32aa
    • Jimmy Assarsson's avatar
      can: kvaser_usb: get CAN clock frequency from device · fb12797a
      Jimmy Assarsson authored
      The CAN clock frequency is used when calculating the CAN bittiming
      parameters. When the wrong clock frequency is used, the device may
      end up with wrong bittiming parameters, depending on the bittiming
      parameters requested by the user.
      
      To avoid this, get the CAN clock frequency from the device. Various
      existing Kvaser Leaf products use different CAN clocks.
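      Why the clock matters follows from the standard CAN bit-timing
      relation: one time quantum lasts brp / f_clk, and a bit spans
      sync_seg + prop_seg + phase_seg1 + phase_seg2 quanta, so the bit rate
      is f_clk / (brp * quanta_per_bit). The sketch below uses hypothetical
      16 MHz (assumed) and 24 MHz (actual) clocks and ignores rounding:
      a prescaler computed for 500 kbit/s on the assumed clock yields
      750 kbit/s on the real one.

```c
#include <stdint.h>

/* Prescaler needed for a target bit rate, given quanta per bit. */
static uint32_t sim_prescaler(uint32_t clock_hz, uint32_t bitrate, uint32_t tq_per_bit)
{
	return clock_hz / (bitrate * tq_per_bit);
}

/* Bit rate the hardware actually produces with a given prescaler. */
static uint32_t sim_actual_bitrate(uint32_t clock_hz, uint32_t brp, uint32_t tq_per_bit)
{
	return clock_hz / (brp * tq_per_bit);
}
```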
      
      Fixes: 080f40a6 ("can: kvaser_usb: Add support for Kvaser CAN/USB devices")
      Link: https://lore.kernel.org/all/20211208152122.250852-2-extja@kvaser.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      fb12797a
    • Jimmy Assarsson's avatar
      can: kvaser_pciefd: kvaser_pciefd_rx_error_frame(): increase correct stats->{rx,tx}_errors counter · 36aea60f
      Jimmy Assarsson authored
      Check the direction bit in the error frame packet (EPACK) to determine
      which net_device_stats {rx,tx}_errors counter to increase.
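      A minimal sketch of the counter selection. The bit position and the
      stats structure are hypothetical, and it is assumed that a set
      direction bit means the error occurred while transmitting; the real
      EPACK layout is defined by the driver.

```c
#include <stdint.h>

/* Hypothetical direction bit; not the driver's real EPACK layout. */
#define SIM_EPACK_DIR_TX (1u << 24)

struct sim_stats {
	unsigned long rx_errors;
	unsigned long tx_errors;
};

/* Bump the counter that matches the direction of the error frame. */
static void sim_count_error_frame(struct sim_stats *stats, uint32_t epack_word)
{
	if (epack_word & SIM_EPACK_DIR_TX)
		stats->tx_errors++;
	else
		stats->rx_errors++;
}
```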
      
      Fixes: 26ad340e ("can: kvaser_pciefd: Add driver for Kvaser PCIEcan devices")
      Link: https://lore.kernel.org/all/20211208152122.250852-1-extja@kvaser.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      36aea60f
    • Louis Amas's avatar
      net: mvpp2: fix XDP rx queues registering · a50e659b
      Louis Amas authored
      The registration of XDP queue information is incorrect because the
      RX queue id we use is invalid. When port->id == 0 it appears to work
      as expected, but this is no longer the case when port->id != 0.
      
      The problem arose while using a recent kernel version on the
      MACCHIATOBin. This board has several ports:
       * eth0 and eth1 are 10Gbps interfaces; both ports have port->id == 0;
       * eth2 is a 1Gbps interface with port->id != 0.
      
      Code from xdp-tutorial (more specifically advanced03-AF_XDP) was used
      to test packet capture and injection on all these interfaces. The XDP
      kernel program was simplified to:
      
      	SEC("xdp_sock")
      	int xdp_sock_prog(struct xdp_md *ctx)
      	{
      		int index = ctx->rx_queue_index;
      
      		/* A set entry here means that the corresponding queue_id
      		 * has an active AF_XDP socket bound to it. */
      		if (bpf_map_lookup_elem(&xsks_map, &index))
      			return bpf_redirect_map(&xsks_map, index, 0);
      
      		return XDP_PASS;
      	}
      
      Starting the program using:
      
      	./af_xdp_user -d DEV
      
      Gives the following result:
      
       * eth0 : ok
       * eth1 : ok
       * eth2 : no capture, no injection
      
      Investigating the issue shows that XDP rx queues for eth2 are wrong:
      XDP expects their id to be in the range [0..3] but we found them to be
      in the range [32..35].
      
      Trying to force rx queue ids using:
      
      	./af_xdp_user -d eth2 -Q 32
      
      fails as expected (we shall not have more than 4 queues).
      
      When we register the XDP rx queue information (using
      xdp_rxq_info_reg() in function mvpp2_rxq_init()) we tell it to use
      rxq->id as the queue id. This value is computed as:
      
      	rxq->id = port->id * max_rxq_count + queue_id
      
      where max_rxq_count depends on the device version. In the MACCHIATOBin
      case, this value is 32, meaning that rx queues on eth2 are numbered
      from 32 to 35 - there are four of them.
      
      Clearly, this is not the per-port queue id that XDP is expecting:
      it wants a value in the range [0..3]. It shall directly use queue_id
      which is stored in rxq->logic_rxq -- so let's use that value instead.
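      The two ids can be compared with a small sketch. struct sim_rxq and
      its helpers are hypothetical; they only reproduce the formula quoted
      above and the choice of which value to hand to the XDP registration.
      With port->id == 1 and max_rxq_count == 32, per-port queue 2 has the
      global id 34, while XDP expects 2.

```c
/* Hypothetical, cut-down RX queue descriptor. */
struct sim_rxq {
	int id;          /* global index across all ports */
	int logic_rxq;   /* per-port index, what XDP expects */
};

static struct sim_rxq sim_rxq_init(int port_id, int max_rxq_count, int queue_id)
{
	struct sim_rxq rxq = {
		.id = port_id * max_rxq_count + queue_id,
		.logic_rxq = queue_id,
	};
	return rxq;
}

/* Queue index to pass when registering the XDP rxq info. */
static int sim_xdp_queue_index(const struct sim_rxq *rxq)
{
	return rxq->logic_rxq;   /* rxq->id was used before the fix */
}
```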
      
      rxq->id is left untouched; its value is indeed valid, but it should
      not be used in this context.
      
      This is consistent with the remaining part of the code in
      mvpp2_rxq_init().
      
      With this change, packet capture is working as expected on all the
      MACCHIATOBin ports.
      
      Fixes: b27db227 ("mvpp2: use page_pool allocator")
      Signed-off-by: Louis Amas <louis.amas@eho.link>
      Signed-off-by: Emmanuel Deloget <emmanuel.deloget@eho.link>
      Reviewed-by: Marcin Wojtas <mw@semihalf.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Link: https://lore.kernel.org/r/20211207143423.916334-1-louis.amas@eho.link
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a50e659b
    • Ronak Doshi's avatar
      vmxnet3: fix minimum vectors alloc issue · f71ef02f
      Ronak Doshi authored
      Commit 39f9895a ("vmxnet3: add support for 32 Tx/Rx queues")
      added support for 32 Tx/Rx queues. As part of that patch, the value
      of VMXNET3_LINUX_MIN_MSIX_VECT was updated.
      
      However, there is a case (numvcpus = 2) that requires exactly 3
      interrupts, which equals VMXNET3_LINUX_MIN_MSIX_VECT; this is then
      treated by the stack as a failure to allocate more vectors. This
      patch fixes this issue.
      
      Fixes: 39f9895a ("vmxnet3: add support for 32 Tx/Rx queues")
      Signed-off-by: Ronak Doshi <doshir@vmware.com>
      Acked-by: Guolin Yang <gyang@vmware.com>
      Link: https://lore.kernel.org/r/20211207081737.14000-1-doshir@vmware.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f71ef02f
    • Eric Dumazet's avatar
      net, neigh: clear whole pneigh_entry at alloc time · e195e9b5
      Eric Dumazet authored
      Commit 2c611ad9 ("net, neigh: Extend neigh->flags to 32 bit
      to allow for extensions") enables a new KMSAN warning [1].
      
      I think the bug is actually older, because the following instruction
      only occurred if ndm->ndm_flags had NTF_PROXY set.
      
      	pn->flags = ndm->ndm_flags;
      
      Let's clear all pneigh_entry fields at alloc time.
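      The idea of clearing every field at allocation time can be sketched
      in userspace, with calloc() playing the role a zeroing allocator such
      as kzalloc() would play in the kernel. struct sim_pneigh is
      hypothetical and much smaller than the real pneigh_entry; flags is
      the field that is only set on some paths.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down proxy neighbour entry. */
struct sim_pneigh {
	unsigned int flags;      /* only written on some code paths */
	int ifindex;
	unsigned char key[16];
};

/* Allocate with every field zeroed, so a later dump never reads garbage. */
static struct sim_pneigh *sim_pneigh_alloc(const unsigned char *key, size_t key_len)
{
	struct sim_pneigh *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	memcpy(n->key, key, key_len < sizeof(n->key) ? key_len : sizeof(n->key));
	return n;
}
```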
      
      [1]
      BUG: KMSAN: uninit-value in pneigh_fill_info+0x986/0xb30 net/core/neighbour.c:2593
       pneigh_fill_info+0x986/0xb30 net/core/neighbour.c:2593
       pneigh_dump_table net/core/neighbour.c:2715 [inline]
       neigh_dump_info+0x1e3f/0x2c60 net/core/neighbour.c:2832
       netlink_dump+0xaca/0x16a0 net/netlink/af_netlink.c:2265
       __netlink_dump_start+0xd1c/0xee0 net/netlink/af_netlink.c:2370
       netlink_dump_start include/linux/netlink.h:254 [inline]
       rtnetlink_rcv_msg+0x181b/0x18c0 net/core/rtnetlink.c:5534
       netlink_rcv_skb+0x447/0x800 net/netlink/af_netlink.c:2491
       rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5589
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x1095/0x1360 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x16f3/0x1870 net/netlink/af_netlink.c:1916
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg net/socket.c:724 [inline]
       sock_write_iter+0x594/0x690 net/socket.c:1057
       call_write_iter include/linux/fs.h:2162 [inline]
       new_sync_write fs/read_write.c:503 [inline]
       vfs_write+0x1318/0x2030 fs/read_write.c:590
       ksys_write+0x28c/0x520 fs/read_write.c:643
       __do_sys_write fs/read_write.c:655 [inline]
       __se_sys_write fs/read_write.c:652 [inline]
       __x64_sys_write+0xdb/0x120 fs/read_write.c:652
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Uninit was created at:
       slab_post_alloc_hook mm/slab.h:524 [inline]
       slab_alloc_node mm/slub.c:3251 [inline]
       slab_alloc mm/slub.c:3259 [inline]
       __kmalloc+0xc3c/0x12d0 mm/slub.c:4437
       kmalloc include/linux/slab.h:595 [inline]
       pneigh_lookup+0x60f/0xd70 net/core/neighbour.c:766
       arp_req_set_public net/ipv4/arp.c:1016 [inline]
       arp_req_set+0x430/0x10a0 net/ipv4/arp.c:1032
       arp_ioctl+0x8d4/0xb60 net/ipv4/arp.c:1232
       inet_ioctl+0x4ef/0x820 net/ipv4/af_inet.c:947
       sock_do_ioctl net/socket.c:1118 [inline]
       sock_ioctl+0xa3f/0x13e0 net/socket.c:1235
       vfs_ioctl fs/ioctl.c:51 [inline]
       __do_sys_ioctl fs/ioctl.c:874 [inline]
       __se_sys_ioctl+0x2df/0x4a0 fs/ioctl.c:860
       __x64_sys_ioctl+0xd8/0x110 fs/ioctl.c:860
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      CPU: 1 PID: 20001 Comm: syz-executor.0 Not tainted 5.16.0-rc3-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 62dd9318 ("[IPV6] NDISC: Set per-entry is_router flag in Proxy NA.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Roopa Prabhu <roopa@nvidia.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20211206165329.1049835-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      e195e9b5
    • Jakub Kicinski's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · fd31cb0c
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      1) Fix bogus compiler warning in nfnetlink_queue, from Florian Westphal.
      
      2) Don't run conntrack on vrf with !dflt qdisc, from Nicolas Dichtel.
      
      3) Fix nft_pipapo bucket load in AVX2 lookup routine for six 8-bit
         groups, from Stefano Brivio.
      
      4) Break rule evaluation on malformed TCP options.
      
      5) Use socat instead of nc in selftests/netfilter/nft_zones_many.sh,
         also from Florian.
      
      6) Fix KCSAN data-race in conntrack timeout updates, from Eric Dumazet.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
        netfilter: conntrack: annotate data-races around ct->timeout
        selftests: netfilter: switch zone stress to socat
        netfilter: nft_exthdr: break evaluation if setting TCP option fails
        selftests: netfilter: Add correctness test for mac,net set type
        nft_set_pipapo: Fix bucket load in AVX2 lookup routine for six 8-bit groups
        vrf: don't run conntrack on vrf with !dflt qdisc
        netfilter: nfnetlink_queue: silence bogus compiler warning
      ====================
      
      Link: https://lore.kernel.org/r/20211209000847.102598-1-pablo@netfilter.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fd31cb0c
    • Jakub Kicinski's avatar
      Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · b5b6b6ba
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2021-12-08
      
      Yahui adds re-initialization of Flow Director for VF reset.
      
      Paul restores interrupts when enabling VFs.
      
      Dave re-adds bandwidth check for DCBNL and moves DSCP mode check
      earlier in the function.
      
      Jesse prevents reporting of dropped packets that occur during
      initialization and fixes reporting of statistics which could occur with
      frequent reads.
      
      Michal corrects setting of protocol type for UDP header and fixes lack
      of differentiation when adding filters for tunnels.
      
      * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        ice: safer stats processing
        ice: fix adding different tunnels
        ice: fix choosing UDP header type
        ice: ignore dropped packets during init
        ice: Fix problems with DSCP QoS implementation
        ice: rearm other interrupt cause register after enabling VFs
        ice: fix FDIR init missing when reset VF
      ====================
      
      Link: https://lore.kernel.org/r/20211208211144.2629867-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b5b6b6ba
    • Jakub Kicinski's avatar
      Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 6efcdadc
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      bpf 2021-12-08
      
      We've added 12 non-merge commits during the last 22 day(s) which contain
      a total of 29 files changed, 659 insertions(+), 80 deletions(-).
      
      The main changes are:
      
      1) Fix an off-by-two error in packet range markings and also add a batch of
         new tests for coverage of these corner cases, from Maxim Mikityanskiy.
      
      2) Fix a compilation issue on MIPS JIT for R10000 CPUs, from Johan Almbladh.
      
      3) Fix two functional regressions and a build warning related to BTF kfunc
         for modules, from Kumar Kartikeya Dwivedi.
      
      4) Fix outdated code and docs regarding BPF's migrate_disable() use on non-
         PREEMPT_RT kernels, from Sebastian Andrzej Siewior.
      
      5) Add missing includes in order to be able to detangle cgroup vs bpf header
         dependencies, from Jakub Kicinski.
      
      6) Fix regression in BPF sockmap tests caused by missing detachment of progs
         from sockets when they are removed from the map, from John Fastabend.
      
      7) Fix a missing "no previous prototype" warning in x86 JIT caused by BPF
         dispatcher, from Björn Töpel.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        bpf: Add selftests to cover packet access corner cases
        bpf: Fix the off-by-two error in range markings
        treewide: Add missing includes masked by cgroup -> bpf dependency
        tools/resolve_btfids: Skip unresolved symbol warning for empty BTF sets
        bpf: Fix bpf_check_mod_kfunc_call for built-in modules
        bpf: Make CONFIG_DEBUG_INFO_BTF depend upon CONFIG_BPF_SYSCALL
        mips, bpf: Fix reference to non-existing Kconfig symbol
        bpf: Make sure bpf_disable_instrumentation() is safe vs preemption.
        Documentation/locking/locktypes: Update migrate_disable() bits.
        bpf, sockmap: Re-evaluate proto ops when psock is removed from sockmap
        bpf, sockmap: Attach map progs to psock early for feature probes
        bpf, x86: Fix "no previous prototype" warning
      ====================
      
      Link: https://lore.kernel.org/r/20211208155125.11826-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      6efcdadc
  5. 08 Dec, 2021 1 commit
    • Russell King (Oracle)'s avatar
      net: dsa: mv88e6xxx: fix "don't use PHY_DETECT on internal PHY's" · 2b29cb9e
      Russell King (Oracle) authored
      This commit fixes a misunderstanding in commit 4a3e0aed ("net: dsa:
      mv88e6xxx: don't use PHY_DETECT on internal PHY's").
      
      For Marvell DSA switches, the PHY_DETECT bit (for non-6250 family
      devices) controls whether the PPU polls the PHY to retrieve the link,
      speed, duplex and pause status to update the port configuration. This
      applies to both internal and external PHYs.
      
      For some switches such as 88E6352 and 88E6390X, PHY_DETECT has an
      additional function of enabling auto-media mode between the internal
      PHY and SERDES blocks depending on which first gains link.
      
      The original intention of commit 5d5b231d ("net: dsa: mv88e6xxx: use
      PHY_DETECT in mac_link_up/mac_link_down") was to allow this bit to be
      used to detect when this propagation is enabled, and allow software to
      update the port configuration. This has been found to be necessary for
      some switches which do not automatically propagate status from the
      SERDES to the port, which includes the 88E6390. However, commit
      4a3e0aed ("net: dsa: mv88e6xxx: don't use PHY_DETECT on internal
      PHY's") breaks this assumption.
      
      Maarten Zanders has confirmed that the issue he was addressing was for
      an 88E6250 switch, which does not have a PHY_DETECT bit in bit 12, but
      instead a link status bit. Therefore, mv88e6xxx_port_ppu_updates() does
      not report correctly.
      
      This patch resolves the above issues by reverting Maarten's change and
      instead making mv88e6xxx_port_ppu_updates() indicate whether the port
      is internal for the 88E6250 family of switches.
      
        Yes, you're right, I'm targeting the 6250 family. And yes, your
        suggestion would solve my case and is a better implementation for
        the other devices (as far as I can see).
      
      Fixes: 4a3e0aed ("net: dsa: mv88e6xxx: don't use PHY_DETECT on internal PHY's")
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Tested-by: Maarten Zanders <maarten.zanders@mind.be>
      Link: https://lore.kernel.org/r/E1muXm7-00EwJB-7n@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      2b29cb9e