1. 20 Apr, 2022 5 commits
    • selftests: mlxsw: vxlan_flooding_ipv6: Prevent flooding of unwanted packets · 5e624215
      Ido Schimmel authored
      The test verifies that packets are correctly flooded by the bridge and
      the VXLAN device by matching on the encapsulated packets at the other
      end. However, if packets other than those generated by the test also
      ingress the bridge (e.g., MLD packets), they will be flooded as well and
      interfere with the expected count.
      
      Make the test more robust by making sure that only the packets generated
      by the test can ingress the bridge. Drop all the rest using tc filters
      on the egress of 'br0' and 'h1'.
      
      In the software data path, the problem can be solved by matching on the
      inner destination MAC or dropping unwanted packets at the egress of the
      VXLAN device, but this is not currently supported by mlxsw.
      
      Fixes: d01724dd ("selftests: mlxsw: spectrum-2: Add a test for VxLAN flooding with IPv6")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mlxsw: vxlan_flooding: Prevent flooding of unwanted packets · 044011fd
      Ido Schimmel authored
      The test verifies that packets are correctly flooded by the bridge and
      the VXLAN device by matching on the encapsulated packets at the other
      end. However, if packets other than those generated by the test also
      ingress the bridge (e.g., MLD packets), they will be flooded as well and
      interfere with the expected count.
      
      Make the test more robust by making sure that only the packets generated
      by the test can ingress the bridge. Drop all the rest using tc filters
      on the egress of 'br0' and 'h1'.
      
      In the software data path, the problem can be solved by matching on the
      inner destination MAC or dropping unwanted packets at the egress of the
      VXLAN device, but this is not currently supported by mlxsw.
      
      Fixes: 94d302de ("selftests: mlxsw: Add a test for VxLAN flooding")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfc: MAINTAINERS: add Bug entry · c5d0fc54
      Krzysztof Kozlowski authored
      Add a Bug section, indicating preferred mailing method for bug reports,
      to NFC Subsystem entry.
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Use readl_poll_timeout_atomic() in atomic state · 234901de
      Kevin Hao authored
      The init_systime() may be invoked in atomic state. We have observed the
      following call trace when running "phc_ctl /dev/ptp0 set" on an Intel
      Agilex board.
        BUG: sleeping function called from invalid context at drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c:74
        in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 381, name: phc_ctl
        preempt_count: 1, expected: 0
        RCU nest depth: 0, expected: 0
        Preemption disabled at:
        [<ffff80000892ef78>] stmmac_set_time+0x34/0x8c
        CPU: 2 PID: 381 Comm: phc_ctl Not tainted 5.18.0-rc2-next-20220414-yocto-standard+ #567
        Hardware name: SoCFPGA Agilex SoCDK (DT)
        Call trace:
         dump_backtrace.part.0+0xc4/0xd0
         show_stack+0x24/0x40
         dump_stack_lvl+0x7c/0xa0
         dump_stack+0x18/0x34
         __might_resched+0x154/0x1c0
         __might_sleep+0x58/0x90
         init_systime+0x78/0x120
         stmmac_set_time+0x64/0x8c
         ptp_clock_settime+0x60/0x9c
         pc_clock_settime+0x6c/0xc0
         __arm64_sys_clock_settime+0x88/0xf0
         invoke_syscall+0x5c/0x130
         el0_svc_common.constprop.0+0x4c/0x100
         do_el0_svc+0x7c/0xa0
         el0_svc+0x58/0xcc
         el0t_64_sync_handler+0xa4/0x130
         el0t_64_sync+0x18c/0x190
      
      So we should use readl_poll_timeout_atomic() here instead of
      readl_poll_timeout().
      
      Also adjust the delay time to 10us to fix a "__bad_udelay" build error
      reported by "kernel test robot <lkp@intel.com>". I have tested this on
      Intel Agilex and NXP S32G boards; no delay is needed at all there.
      So the 10us delay should be long enough for most cases.
      
      Fixes: ff8ed737 ("net: stmmac: use readl_poll_timeout() function in init_systime()")
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
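The difference between a sleeping and a non-sleeping poll loop, as described in the stmmac commit above, can be sketched in user space. This is only an illustration of the idea, not the kernel macro: `poll_timeout_atomic`, `FakeRegister`, and the `TSINIT` bit value are hypothetical names chosen for this example.

```python
import time


def poll_timeout_atomic(read, done, delay_us, timeout_us):
    """Poll read() until done(value) is true, spinning between reads.

    Loosely mirrors the contract of a non-sleeping poll helper:
    the wait between reads is a busy loop, never a sleep.
    """
    deadline = time.perf_counter() + timeout_us / 1e6
    while True:
        value = read()
        if done(value):
            return value
        if time.perf_counter() >= deadline:
            raise TimeoutError("condition not met before timeout")
        spin_until = time.perf_counter() + delay_us / 1e6
        while time.perf_counter() < spin_until:
            pass  # busy-wait instead of time.sleep()


TSINIT = 0x4  # hypothetical "init in progress" bit, for illustration only


class FakeRegister:
    """Stand-in register whose busy bit clears after a few reads."""

    def __init__(self, busy_reads):
        self.busy_reads = busy_reads

    def read(self):
        if self.busy_reads > 0:
            self.busy_reads -= 1
            return TSINIT
        return 0


reg = FakeRegister(busy_reads=3)
# 10 us between reads, matching the delay mentioned in the commit message.
print(poll_timeout_atomic(reg.read, lambda v: not (v & TSINIT), 10, 100_000))
```

A sleeping variant would replace the inner spin loop with `time.sleep()`, which is the analogue of what is not allowed in atomic context.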
    • doc/ip-sysctl: add bc_forwarding · c6a4254c
      Nicolas Dichtel authored
      Let's describe this sysctl.
      
      Fixes: 5cbf777c ("route: add support for directed broadcast forwarding")
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 19 Apr, 2022 3 commits
    • netlink: reset network and mac headers in netlink_dump() · 99c07327
      Eric Dumazet authored
      netlink_dump() allocates an skb and reserves space in it,
      but forgets to reset the network header.
      
      This allows a BPF program, invoked later from sk_filter()
      to access uninitialized kernel memory from the reserved
      space.
      
      Theoretically, the mac header reset could be omitted, because
      it is set to a special initial value.
      bpf_internal_load_pointer_neg_helper calls skb_mac_header()
      without checking skb_mac_header_was_set().
      Relying on skb->len not being too big seems fragile.
      We also could add a sanity check in bpf_internal_load_pointer_neg_helper()
      to avoid surprises in the future.
      
      syzbot report was:
      
      BUG: KMSAN: uninit-value in ___bpf_prog_run+0xa22b/0xb420 kernel/bpf/core.c:1637
       ___bpf_prog_run+0xa22b/0xb420 kernel/bpf/core.c:1637
       __bpf_prog_run32+0x121/0x180 kernel/bpf/core.c:1796
       bpf_dispatcher_nop_func include/linux/bpf.h:784 [inline]
       __bpf_prog_run include/linux/filter.h:626 [inline]
       bpf_prog_run include/linux/filter.h:633 [inline]
       __bpf_prog_run_save_cb+0x168/0x580 include/linux/filter.h:756
       bpf_prog_run_save_cb include/linux/filter.h:770 [inline]
       sk_filter_trim_cap+0x3bc/0x8c0 net/core/filter.c:150
       sk_filter include/linux/filter.h:905 [inline]
       netlink_dump+0xe0c/0x16c0 net/netlink/af_netlink.c:2276
       netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002
       sock_recvmsg_nosec net/socket.c:948 [inline]
       sock_recvmsg net/socket.c:966 [inline]
       sock_read_iter+0x5a9/0x630 net/socket.c:1039
       do_iter_readv_writev+0xa7f/0xc70
       do_iter_read+0x52c/0x14c0 fs/read_write.c:786
       vfs_readv fs/read_write.c:906 [inline]
       do_readv+0x432/0x800 fs/read_write.c:943
       __do_sys_readv fs/read_write.c:1034 [inline]
       __se_sys_readv fs/read_write.c:1031 [inline]
       __x64_sys_readv+0xe5/0x120 fs/read_write.c:1031
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Uninit was stored to memory at:
       ___bpf_prog_run+0x96c/0xb420 kernel/bpf/core.c:1558
       __bpf_prog_run32+0x121/0x180 kernel/bpf/core.c:1796
       bpf_dispatcher_nop_func include/linux/bpf.h:784 [inline]
       __bpf_prog_run include/linux/filter.h:626 [inline]
       bpf_prog_run include/linux/filter.h:633 [inline]
       __bpf_prog_run_save_cb+0x168/0x580 include/linux/filter.h:756
       bpf_prog_run_save_cb include/linux/filter.h:770 [inline]
       sk_filter_trim_cap+0x3bc/0x8c0 net/core/filter.c:150
       sk_filter include/linux/filter.h:905 [inline]
       netlink_dump+0xe0c/0x16c0 net/netlink/af_netlink.c:2276
       netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002
       sock_recvmsg_nosec net/socket.c:948 [inline]
       sock_recvmsg net/socket.c:966 [inline]
       sock_read_iter+0x5a9/0x630 net/socket.c:1039
       do_iter_readv_writev+0xa7f/0xc70
       do_iter_read+0x52c/0x14c0 fs/read_write.c:786
       vfs_readv fs/read_write.c:906 [inline]
       do_readv+0x432/0x800 fs/read_write.c:943
       __do_sys_readv fs/read_write.c:1034 [inline]
       __se_sys_readv fs/read_write.c:1031 [inline]
       __x64_sys_readv+0xe5/0x120 fs/read_write.c:1031
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Uninit was created at:
       slab_post_alloc_hook mm/slab.h:737 [inline]
       slab_alloc_node mm/slub.c:3244 [inline]
       __kmalloc_node_track_caller+0xde3/0x14f0 mm/slub.c:4972
       kmalloc_reserve net/core/skbuff.c:354 [inline]
       __alloc_skb+0x545/0xf90 net/core/skbuff.c:426
       alloc_skb include/linux/skbuff.h:1158 [inline]
       netlink_dump+0x30f/0x16c0 net/netlink/af_netlink.c:2242
       netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002
       sock_recvmsg_nosec net/socket.c:948 [inline]
       sock_recvmsg net/socket.c:966 [inline]
       sock_read_iter+0x5a9/0x630 net/socket.c:1039
       do_iter_readv_writev+0xa7f/0xc70
       do_iter_read+0x52c/0x14c0 fs/read_write.c:786
       vfs_readv fs/read_write.c:906 [inline]
       do_readv+0x432/0x800 fs/read_write.c:943
       __do_sys_readv fs/read_write.c:1034 [inline]
       __se_sys_readv fs/read_write.c:1031 [inline]
       __x64_sys_readv+0xe5/0x120 fs/read_write.c:1031
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      CPU: 0 PID: 3470 Comm: syz-executor751 Not tainted 5.17.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: db65a3aa ("netlink: Trim skb to alloc size to avoid MSG_TRUNC")
      Fixes: 9063e21f ("netlink: autosize skb lengthes")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220415181442.551228-1-eric.dumazet@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: mscc: ocelot: fix broken IP multicast flooding · 4cf35a2b
      Vladimir Oltean authored
      When the user runs:
      bridge link set dev $br_port mcast_flood on
      
      this command should affect not only L2 multicast, but also IPv4 and IPv6
      multicast.
      
      In the Ocelot switch, unknown multicast gets flooded according to
      different PGIDs according to its type, and PGID_MC only handles L2
      multicast. Therefore, by leaving PGID_MCIPV4 and PGID_MCIPV6 at their
      default value of 0, unknown IP multicast traffic is never flooded.
      
      Fixes: 421741ea ("net: mscc: ocelot: offload bridge port flags to device")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20220415151950.219660-1-vladimir.oltean@nxp.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
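The ocelot commit above says that `mcast_flood` must act on the IPv4 and IPv6 multicast flood groups as well as the L2 one. A minimal model of that behaviour follows, assuming each PGID is a plain port bitmask; the dictionary layout and the `set_mcast_flood` helper are hypothetical and are not the driver's code.

```python
# PGID names come from the commit message; the bitmask model is an assumption.
MCAST_PGIDS = ("PGID_MC", "PGID_MCIPV4", "PGID_MCIPV6")


def set_mcast_flood(flood_masks, port, enabled):
    """Set or clear the port's bit in every multicast flood mask."""
    bit = 1 << port
    for pgid in MCAST_PGIDS:
        if enabled:
            flood_masks[pgid] |= bit
        else:
            flood_masks[pgid] &= ~bit
    return flood_masks


# All masks start at 0, the default value mentioned in the commit message.
masks = {pgid: 0 for pgid in MCAST_PGIDS}
set_mcast_flood(masks, port=3, enabled=True)
print(masks)
```

Updating only `PGID_MC` in this model would leave the two IP multicast masks at 0, which corresponds to the "never flooded" symptom the commit describes.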
    • net: dsa: hellcreek: Calculate checksums in tagger · 0763120b
      Kurt Kanzenbach authored
      In case the checksum calculation is offloaded to the DSA master network
      interface, it will include the switch trailing tag. As soon as the switch strips
      that tag on egress, the calculated checksum is wrong.
      
      Therefore, add the checksum calculation to the tagger (if required) before
      adding the switch tag. This way, the hellcreek code works with all DSA master
      interfaces regardless of their declared feature set.
      
      Fixes: 01ef09ca ("net: dsa: Add tag handling for Hirschmann Hellcreek switches")
      Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20220415103320.90657-1-kurt@linutronix.de
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
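Why a checksum computed over the frame plus trailing tag becomes wrong once the tag is stripped, as the hellcreek commit above explains, can be shown with a plain 16-bit one's-complement checksum. The payload and tag bytes below are made up for illustration, and the tagger itself is not modelled.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data (odd length zero-padded)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


payload = bytes(range(1, 21))  # hypothetical 20-byte payload
tag = b"\x02"                  # hypothetical one-byte trailing switch tag

# Checksum computed after tagging covers the tag as well.
covers_tag = internet_checksum(payload + tag)
# Checksum computed before tagging covers only what the receiver will see.
before_tag = internet_checksum(payload)

print(hex(covers_tag), hex(before_tag), covers_tag != before_tag)
```

Computing the checksum before the tag is appended, as the commit does in the tagger, yields the value that stays valid after the switch removes the tag.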
  3. 18 Apr, 2022 2 commits
    • net: atlantic: invert deep par in pm functions, preventing null derefs · cbe6c3a8
      Manuel Ullmann authored
      This will reset deeply on freeze and thaw instead of suspend and
      resume and prevent null pointer dereferences of the uninitialized ring
      0 buffer while thawing.
      
      The impact is an indefinitely hanging kernel. You can't switch
      consoles after this and the only possible user interaction is SysRq.
      
      BUG: kernel NULL pointer dereference
      RIP: 0010:aq_ring_rx_fill+0xcf/0x210 [atlantic]
      aq_vec_init+0x85/0xe0 [atlantic]
      aq_nic_init+0xf7/0x1d0 [atlantic]
      atl_resume_common+0x4f/0x100 [atlantic]
      pci_pm_thaw+0x42/0xa0
      
      resolves in aq_ring.o to
      
      ```
      0000000000000ae0 <aq_ring_rx_fill>:
      {
      /* ... */
       baf:	48 8b 43 08          	mov    0x8(%rbx),%rax
       		buff->flags = 0U; /* buff is NULL */
      ```
      
      The bug has been present since the introduction of the new pm code in
      8aaa112a ("net: atlantic: refactoring pm logic") and was hidden
      until 8ce84271 ("net: atlantic: changes for multi-TC support"),
      which refactored the aq_vec_{free,alloc} functions into
      aq_vec_{,ring}_{free,alloc}, but is technically not wrong. The
      original functions just always reinitialized the buffers on S3/S4. If
      the interface is down before freezing, the bug does not occur. It does
      not matter, whether the initrd contains and loads the module before
      thawing.
      
      So the fix is to invert the boolean parameter deep in all pm function
      calls, which was clearly intended to be set like that.
      
      First report was on Github [1], which you have to guess from the
      resume logs in the posted dmesg snippet. Recently I posted one on
      Bugzilla [2], since I did not have an AQC device so far.
      
      #regzbot introduced: 8ce84271
      #regzbot from: koo5 <kolman.jindrich@gmail.com>
      #regzbot monitor: https://github.com/Aquantia/AQtion/issues/32
      
      Fixes: 8aaa112a ("net: atlantic: refactoring pm logic")
      Link: https://github.com/Aquantia/AQtion/issues/32 [1]
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=215798 [2]
      Cc: stable@vger.kernel.org
      Reported-by: koo5 <kolman.jindrich@gmail.com>
      Signed-off-by: Manuel Ullmann <labre@posteo.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'linux-can-fixes-for-5.18-20220417' of... · d94ef51d
      David S. Miller authored
      Merge tag 'linux-can-fixes-for-5.18-20220417' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2022-04-17
      
      this is a pull request of 1 patch for net/master.
      
      The patch is by Oliver Hartkopp and fixes a timeout monitoring problem
      in the ISO TP protocol found by the syzbot.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 17 Apr, 2022 2 commits
    • can: isotp: stop timeout monitoring when no first frame was sent · d7349708
      Oliver Hartkopp authored
      The first attempt to fix the 'impossible' WARN_ON_ONCE(1) in
      isotp_tx_timer_handler() focussed on the identical CAN IDs created by
      the syzbot reproducer and led to upstream fix/commit 3ea56642
      ("can: isotp: sanitize CAN ID checks in isotp_bind()"). But this did
      not catch the root cause of the wrong tx.state in the tx_timer handler.
      
      In the isotp 'first frame' case, timeout monitoring needs to be started
      before the 'first frame' is sent. But when this sending fails, the timeout
      monitoring for this specific frame has to be disabled too.
      
      Otherwise the tx_timer is fired with the 'warn me' tx.state of ISOTP_IDLE.
      
      Fixes: e057dd3f ("can: add ISO 15765-2:2016 transport protocol")
      Link: https://lore.kernel.org/all/20220405175112.2682-1-socketcan@hartkopp.net
      Reported-by: syzbot+2339c27f5c66c652843e@syzkaller.appspotmail.com
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
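The ordering described in the isotp commit above (arm the timeout before sending, disarm it again if the send fails) can be sketched as follows. `TxTimer` and `send_first_frame` are hypothetical stand-ins for this illustration and do not reproduce the kernel's timer API.

```python
class TxTimer:
    """Minimal stand-in for a transmit timeout timer."""

    def __init__(self):
        self.armed = False

    def start(self):
        self.armed = True

    def cancel(self):
        self.armed = False


def send_first_frame(timer, send):
    """Arm the timeout, try to send, and disarm again if nothing went out."""
    timer.start()       # monitoring must be running before the frame is sent
    err = send()        # send() returns 0 on success, a negative value on error
    if err:
        timer.cancel()  # no transmission, so no timeout monitoring
    return err


failed = TxTimer()
print(send_first_frame(failed, lambda: -1), failed.armed)

ok = TxTimer()
print(send_first_frame(ok, lambda: 0), ok.armed)
```

Without the `cancel()` on the error path, the timer in this model would stay armed and later fire while the sender is idle, which is the situation the commit message describes.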
    • bonding: do not discard lowest hash bit for non layer3+4 hashing · 49aefd13
      suresh kumar authored
      Commit b5f86218 was introduced to discard the lowest hash bit for layer3+4 hashing,
      but it also removes the last bit from non-layer3+4 hashing.
      
      The script below shows that layer2+3 hashing results in the same slave being used with the above commit.
      $ cat hash.py
      #!/usr/bin/python3.6
      
      h_dests=[0xa0, 0xa1]
      h_source=0xe3
      hproto=0x8
      saddr=0x1e7aa8c0
      daddr=0x17aa8c0
      
      for h_dest in h_dests:
          hash = (h_dest ^ h_source ^ hproto ^ saddr ^ daddr)
          hash ^= hash >> 16
          hash ^= hash >> 8
          print(hash)
      
      print("with last bit removed")
      for h_dest in h_dests:
          hash = (h_dest ^ h_source ^ hproto ^ saddr ^ daddr)
          hash ^= hash >> 16
          hash ^= hash >> 8
          hash = hash >> 1
          print(hash)
      
      Output:
      $ python3.6 hash.py
      522133332
      522133333   <-------------- will result in both slaves being used
      
      with last bit removed
      261066666
      261066666   <-------------- only single slave used
      Signed-off-by: suresh kumar <suresh2514@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
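The behaviour the bonding commit above aims for, dropping the lowest bit only for layer3+4 hashing, can be sketched by extending the script from the commit message. The `xmit_hash` function and its `layer34` flag are hypothetical simplifications (a real layer3+4 hash would also mix in ports); the input values are taken from hash.py above, and two slaves are assumed.

```python
def xmit_hash(h_dest, h_source, hproto, saddr, daddr, layer34):
    """Fold the inputs as in hash.py; drop the low bit only for layer3+4."""
    h = h_dest ^ h_source ^ hproto ^ saddr ^ daddr
    h ^= h >> 16
    h ^= h >> 8
    return h >> 1 if layer34 else h


NUM_SLAVES = 2  # assumption for this example
COMMON = dict(h_source=0xE3, hproto=0x8, saddr=0x1E7AA8C0, daddr=0x17AA8C0)

# layer2+3 with the low bit kept: the two destinations use different slaves.
kept = [xmit_hash(d, layer34=False, **COMMON) % NUM_SLAVES for d in (0xA0, 0xA1)]

# Low bit dropped unconditionally: both destinations land on the same slave.
dropped = [xmit_hash(d, layer34=True, **COMMON) % NUM_SLAVES for d in (0xA0, 0xA1)]

print(kept, dropped)
```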
  5. 15 Apr, 2022 20 commits
    • net: lan966x: Make sure to release ptp interrupt · d08ed852
      Horatiu Vultur authored
      When the lan966x driver is removed make sure to remove also the ptp_irq
      IRQ.
      
      Fixes: e85a96e4 ("net: lan966x: Add support for ptp interrupts")
      Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Link: https://lore.kernel.org/r/20220413195716.3796467-1-horatiu.vultur@microchip.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv6: make ip6_rt_gc_expire an atomic_t · 9cb7c013
      Eric Dumazet authored
      Reads and writes to ip6_rt_gc_expire have always been racy,
      as syzbot reported lately [1].
      
      There is a possible risk of under-flow, leading
      to unexpected high value passed to fib6_run_gc(),
      although I have not observed this in the field.
      
      Hosts hitting ip6_dst_gc() very hard are under pretty bad
      state anyway.
      
      [1]
      BUG: KCSAN: data-race in ip6_dst_gc / ip6_dst_gc
      
      read-write to 0xffff888102110744 of 4 bytes by task 13165 on cpu 1:
       ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311
       dst_alloc+0x9b/0x160 net/core/dst.c:86
       ip6_dst_alloc net/ipv6/route.c:344 [inline]
       icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261
       mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807
       mld_send_cr net/ipv6/mcast.c:2119 [inline]
       mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651
       process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
       worker_thread+0x618/0xa70 kernel/workqueue.c:2436
       kthread+0x1a9/0x1e0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30
      
      read-write to 0xffff888102110744 of 4 bytes by task 11607 on cpu 0:
       ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311
       dst_alloc+0x9b/0x160 net/core/dst.c:86
       ip6_dst_alloc net/ipv6/route.c:344 [inline]
       icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261
       mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807
       mld_send_cr net/ipv6/mcast.c:2119 [inline]
       mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651
       process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
       worker_thread+0x618/0xa70 kernel/workqueue.c:2436
       kthread+0x1a9/0x1e0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30
      
      value changed: 0x00000bb3 -> 0x00000ba9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 11607 Comm: kworker/0:21 Not tainted 5.18.0-rc1-syzkaller-00037-g42e7a03d-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: mld mld_ifc_work
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20220413181333.649424-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'l3mdev-fix-ip-tunnel-case-after-recent-l3mdev-change' · 268b41b3
      Jakub Kicinski authored
      David Ahern says:
      
      ====================
      l3mdev: Fix ip tunnel case after recent l3mdev change
      
      Second patch provides a fix for ip tunnels after the recent l3mdev change
      that avoids touching the oif in the flow struct. First patch preemptively
      provides a fix to an existing function that the second patch uses.
      ====================
      
      Link: https://lore.kernel.org/r/20220413174320.28989-1-dsahern@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: Handle l3mdev in ip_tunnel_init_flow · db53cd3d
      David Ahern authored
      Ido reported that the commit referenced in the Fixes tag broke
      a gre use case with dummy devices. Add a check to ip_tunnel_init_flow
      to see if the oif is an l3mdev port and if so set the oif to 0 to
      avoid the oif comparison in fib_lookup_good_nhc.
      
      Fixes: 40867d74 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices")
      Reported-by: Ido Schimmel <idosch@idosch.org>
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
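The check described in the ip_tunnel_init_flow commit above (clear the oif when it belongs to an l3mdev) might look roughly like this in a simplified model. The `init_flow_oif` helper, the returned field names, and the ifindex-to-master mapping are assumptions made for illustration only.

```python
def init_flow_oif(oif, l3mdev_master_of):
    """Return (flow_oif, flow_l3mdev) for a tunnel's output interface.

    l3mdev_master_of maps a port's ifindex to the ifindex of its l3mdev
    master; ports without a master are simply absent (hypothetical model).
    """
    if not oif:
        return 0, 0
    l3mdev = l3mdev_master_of.get(oif, 0)
    # When the oif is an l3mdev port, zero it so that a later oif
    # comparison during the route lookup is skipped.
    return (0 if l3mdev else oif), l3mdev


masters = {5: 10}  # hypothetical: ifindex 5 is enslaved to l3mdev ifindex 10
print(init_flow_oif(5, masters))
print(init_flow_oif(7, masters))
```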
    • l3mdev: l3mdev_master_upper_ifindex_by_index_rcu should be using netdev_master_upper_dev_get_rcu · 83daab06
      David Ahern authored
      Next patch uses l3mdev_master_upper_ifindex_by_index_rcu which throws
      a splat with debug kernels:
      
      [13783.087570] ------------[ cut here ]------------
      [13783.093974] RTNL: assertion failed at net/core/dev.c (6702)
      [13783.100761] WARNING: CPU: 3 PID: 51132 at net/core/dev.c:6702 netdev_master_upper_dev_get+0x16a/0x1a0
      
      [13783.184226] CPU: 3 PID: 51132 Comm: kworker/3:3 Not tainted 5.17.0-custom-100090-g6f963aafb1cc #682
      [13783.194788] Hardware name: Mellanox Technologies Ltd. MSN2010/SA002610, BIOS 5.6.5 08/24/2017
      [13783.204755] Workqueue: mld mld_ifc_work [ipv6]
      [13783.210338] RIP: 0010:netdev_master_upper_dev_get+0x16a/0x1a0
      [13783.217209] Code: 0f 85 e3 fe ff ff e8 65 ac ec fe ba 2e 1a 00 00 48 c7 c6 60 6f 38 83 48 c7 c7 c0 70 38 83 c6 05 5e b5 d7 01 01 e8 c6 29 52 00 <0f> 0b e9 b8 fe ff ff e8 5a 6c 35 ff e9 1c ff ff ff 48 89 ef e8 7d
      [13783.238659] RSP: 0018:ffffc9000b37f5a8 EFLAGS: 00010286
      [13783.244995] RAX: 0000000000000000 RBX: ffff88812ee5c000 RCX: 0000000000000000
      [13783.253379] RDX: ffff88811ce09d40 RSI: ffffffff812d0fcd RDI: fffff5200166fea7
      [13783.261769] RBP: 0000000000000000 R08: 0000000000000001 R09: ffff8882375f4287
      [13783.270138] R10: ffffed1046ebe850 R11: 0000000000000001 R12: dffffc0000000000
      [13783.278510] R13: 0000000000000275 R14: ffffc9000b37f688 R15: ffff8881273b4af8
      [13783.286870] FS:  0000000000000000(0000) GS:ffff888237400000(0000) knlGS:0000000000000000
      [13783.296352] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [13783.303177] CR2: 00007ff25fc9b2e8 CR3: 0000000174d23000 CR4: 00000000001006e0
      [13783.311546] Call Trace:
      [13783.314660]  <TASK>
      [13783.317553]  l3mdev_master_upper_ifindex_by_index_rcu+0x43/0xe0
      ...
      
      Change l3mdev_master_upper_ifindex_by_index_rcu to use
      netdev_master_upper_dev_get_rcu.
      
      Fixes: 6a6d6681 ("l3mdev: add function to retreive upper master")
      Signed-off-by: Ido Schimmel <idosch@idosch.org>
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Cc: Alexis Bauvin <abauvin@scaleway.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'net-sched-two-fixes-for-cls_u32' · 0b9dcf37
      Jakub Kicinski authored
      Eric Dumazet says:
      
      ====================
      net/sched: two fixes for cls_u32
      
      One syzbot report brought my attention to cls_u32.
      
      This series addresses the syzbot report, and an additional
      issue discovered in code review.
      ====================
      
      Link: https://lore.kernel.org/r/20220413173542.533060-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/sched: cls_u32: fix possible leak in u32_init_knode() · ec5b0f60
      Eric Dumazet authored
      While investigating a related syzbot report,
      I found that whenever call to tcf_exts_init()
      from u32_init_knode() is failing, we end up
      with an elevated refcount on ht->refcnt
      
      To avoid that, only increase the refcount after
      all possible errors have been evaluated.
      
      Fixes: b9a24bb7 ("net_sched: properly handle failure case of tcf_exts_init()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
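The pattern named in the u32_init_knode() commit above, taking the reference only after every step that can fail, can be illustrated with a toy model. `HashTable`, `init_knode_early_ref`, and `init_knode_late_ref` are hypothetical names; the callable passed in stands in for an initialisation step that returns a negative value on failure.

```python
class HashTable:
    """Toy object carrying a reference count."""

    def __init__(self):
        self.refcnt = 0


def init_knode_early_ref(ht, exts_init):
    """Leaky ordering: the reference is taken before the fallible step."""
    ht.refcnt += 1
    if exts_init() < 0:
        return None  # error path forgets to drop the reference
    return {"ht": ht}


def init_knode_late_ref(ht, exts_init):
    """Safe ordering: the reference is taken once nothing can fail anymore."""
    if exts_init() < 0:
        return None
    ht.refcnt += 1
    return {"ht": ht}


def always_fails():
    return -1


leaky, safe = HashTable(), HashTable()
init_knode_early_ref(leaky, always_fails)
init_knode_late_ref(safe, always_fails)
print(leaky.refcnt, safe.refcnt)
```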
    • net/sched: cls_u32: fix netns refcount changes in u32_change() · 3db09e76
      Eric Dumazet authored
      We are now able to detect extra put_net() at the moment
      they happen, instead of much later in correct code paths.
      
      u32_init_knode() / tcf_exts_init() populates the ->exts.net
      pointer, but as mentioned in tcf_exts_init(),
      the refcount on netns has not been elevated yet.
      
      The refcount is taken only once tcf_exts_get_net()
      is called.
      
      So the two u32_destroy_key() calls from u32_change()
      are attempting to release an invalid reference on the netns.
      
      syzbot report:
      
      refcount_t: decrement hit 0; leaking memory.
      WARNING: CPU: 0 PID: 21708 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31
      Modules linked in:
      CPU: 0 PID: 21708 Comm: syz-executor.5 Not tainted 5.18.0-rc2-next-20220412-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31
      Code: 1d 14 b6 b2 09 31 ff 89 de e8 6d e9 89 fd 84 db 75 e0 e8 84 e5 89 fd 48 c7 c7 40 aa 26 8a c6 05 f4 b5 b2 09 01 e8 e5 81 2e 05 <0f> 0b eb c4 e8 68 e5 89 fd 0f b6 1d e3 b5 b2 09 31 ff 89 de e8 38
      RSP: 0018:ffffc900051af1b0 EFLAGS: 00010286
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000040000 RSI: ffffffff8160a0c8 RDI: fffff52000a35e28
      RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff81604a9e R11: 0000000000000000 R12: 1ffff92000a35e3b
      R13: 00000000ffffffef R14: ffff8880211a0194 R15: ffff8880577d0a00
      FS:  00007f25d183e700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f19c859c028 CR3: 0000000051009000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       __refcount_dec include/linux/refcount.h:344 [inline]
       refcount_dec include/linux/refcount.h:359 [inline]
       ref_tracker_free+0x535/0x6b0 lib/ref_tracker.c:118
       netns_tracker_free include/net/net_namespace.h:327 [inline]
       put_net_track include/net/net_namespace.h:341 [inline]
       tcf_exts_put_net include/net/pkt_cls.h:255 [inline]
       u32_destroy_key.isra.0+0xa7/0x2b0 net/sched/cls_u32.c:394
       u32_change+0xe01/0x3140 net/sched/cls_u32.c:909
       tc_new_tfilter+0x98d/0x2200 net/sched/cls_api.c:2148
       rtnetlink_rcv_msg+0x80d/0xb80 net/core/rtnetlink.c:6016
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2495
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1921
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:725
       ____sys_sendmsg+0x6e2/0x800 net/socket.c:2413
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f25d0689049
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f25d183e168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f25d079c030 RCX: 00007f25d0689049
      RDX: 0000000000000000 RSI: 0000000020000340 RDI: 0000000000000005
      RBP: 00007f25d06e308d R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007ffd0b752e3f R14: 00007f25d183e300 R15: 0000000000022000
       </TASK>
      
      Fixes: 35c55fc1 ("cls_u32: use tcf_exts_get_net() before call_rcu()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · f3226eed
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2022-04-13
      
      This series contains updates to igc and e1000e drivers.
      
      Sasha removes waiting for hardware semaphore as it could cause an
      infinite loop and changes usleep_range() calls done under atomic
      context to udelay() for igc. For e1000e, he changes some variables from
      u16 to u32 to prevent possible overflow of values.
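The overflow risk mentioned for e1000e can be illustrated with a small user-space sketch. It assumes the PCIe LTR layout (a 10-bit value and a 3-bit scale, decoded as value * 2^(5 * scale) nanoseconds); the macro and function names are hypothetical and are not the driver's own.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout: 10-bit latency value, 3-bit scale (hypothetical names). */
#define LTR_VALUE_MASK   0x03FFu
#define LTR_SCALE_MASK   0x1C00u
#define LTR_SCALE_SHIFT  10
#define LTR_SCALE_FACTOR 5

/* Decode into a 32-bit result: value * 2^(5 * scale). */
static uint32_t ltr_decode_u32(uint16_t enc)
{
    uint32_t value = enc & LTR_VALUE_MASK;
    uint32_t scale = (enc & LTR_SCALE_MASK) >> LTR_SCALE_SHIFT;

    /* Demo limit: with scale <= 4 the product stays below 2^30. */
    assert(scale <= 4);
    return value << (LTR_SCALE_FACTOR * scale);
}

/* Same computation stored in a 16-bit variable, which silently truncates. */
static uint16_t ltr_decode_u16(uint16_t enc)
{
    return (uint16_t)ltr_decode_u32(enc);
}
```

With value 1023 and scale 2 the decoded latency is 1023 * 1024, which no longer fits in 16 bits, so the narrow variant returns a different number.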
      
      Vinicius disables PTM when going to suspend as it is causing hang issues
      on some platforms for igc.
      
      * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        e1000e: Fix possible overflow in LTR decoding
        igc: Fix suspending when PTM is active
        igc: Fix BUG: scheduling while atomic
        igc: Fix infinite loop in release_swfw_sync
      ====================
      
      Link: https://lore.kernel.org/r/20220413170814.2066855-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f3226eed
    • Stephen Hemminger's avatar
      net: restore alpha order to Ethernet devices in config · da367ac7
      Stephen Hemminger authored
      The list of Ethernet devices displayed in make menuconfig
      has gotten out of order. This is mostly due to changes in vendor
      names etc., but also because the new Microsoft entry was added in
      the wrong place.
      
      This restores the ordering so that the display is in order even if
      the names of the subdirectories are not.
      
      Fixes: ca9c54d2 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      da367ac7
    • Paolo Valerio's avatar
      openvswitch: fix OOB access in reserve_sfa_size() · cefa91b2
      Paolo Valerio authored
      Given a sufficiently large number of actions, while copying and
      reserving memory for a new action of a new flow, if next_offset is
      greater than MAX_ACTIONS_BUFSIZE, the function reserve_sfa_size() does
      not return -EMSGSIZE as expected, but it allocates MAX_ACTIONS_BUFSIZE
      bytes increasing actions_len by req_size. This can then lead to an OOB
      write access, especially when further actions need to be copied.
      
      Fix it by rearranging the flow action size check.
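The size check described above can be sketched in user space as follows. This is a simplified model under stated assumptions, not the openvswitch code: `grow_for_request()` and `MAX_BUFSIZE` are hypothetical names, and the cap value is illustrative. The point is that the "does it still fit under the cap" test must reject an offset that is already past the cap, instead of relying on a subtraction that can wrap.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_BUFSIZE 32768u /* illustrative cap, not the kernel's constant */

/*
 * Decide how large the buffer must become to hold req_size more bytes
 * starting at next_offset. Returns 0 and stores the new capacity, or
 * -EMSGSIZE when the request cannot fit under the cap.
 */
static int grow_for_request(size_t cap, size_t next_offset, size_t req_size,
                            size_t *new_cap)
{
    size_t needed, grown;

    assert(new_cap != NULL);
    assert(cap <= MAX_BUFSIZE);

    /*
     * Check the offset first, so the subtraction below cannot wrap around
     * and hide an offset that is already beyond the cap.
     */
    if (next_offset > MAX_BUFSIZE || req_size > MAX_BUFSIZE - next_offset)
        return -EMSGSIZE;

    needed = next_offset + req_size;
    if (needed <= cap) {
        *new_cap = cap; /* enough room already */
        return 0;
    }

    /* Grow geometrically, but never past the cap. */
    grown = cap * 2 > needed ? cap * 2 : needed;
    if (grown > MAX_BUFSIZE)
        grown = MAX_BUFSIZE;
    *new_cap = grown;
    return 0;
}
```

A caller would only copy the new action after this returns 0, so a rejected request never advances the length of the stored actions.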
      
      KASAN splat below:
      
      ==================================================================
      BUG: KASAN: slab-out-of-bounds in reserve_sfa_size+0x1ba/0x380 [openvswitch]
      Write of size 65360 at addr ffff888147e4001c by task handler15/836
      
      CPU: 1 PID: 836 Comm: handler15 Not tainted 5.18.0-rc1+ #27
      ...
      Call Trace:
       <TASK>
       dump_stack_lvl+0x45/0x5a
       print_report.cold+0x5e/0x5db
       ? __lock_text_start+0x8/0x8
       ? reserve_sfa_size+0x1ba/0x380 [openvswitch]
       kasan_report+0xb5/0x130
       ? reserve_sfa_size+0x1ba/0x380 [openvswitch]
       kasan_check_range+0xf5/0x1d0
       memcpy+0x39/0x60
       reserve_sfa_size+0x1ba/0x380 [openvswitch]
       __add_action+0x24/0x120 [openvswitch]
       ovs_nla_add_action+0xe/0x20 [openvswitch]
       ovs_ct_copy_action+0x29d/0x1130 [openvswitch]
       ? __kernel_text_address+0xe/0x30
       ? unwind_get_return_address+0x56/0xa0
       ? create_prof_cpu_mask+0x20/0x20
       ? ovs_ct_verify+0xf0/0xf0 [openvswitch]
       ? prep_compound_page+0x198/0x2a0
       ? __kasan_check_byte+0x10/0x40
       ? kasan_unpoison+0x40/0x70
       ? ksize+0x44/0x60
       ? reserve_sfa_size+0x75/0x380 [openvswitch]
       __ovs_nla_copy_actions+0xc26/0x2070 [openvswitch]
       ? __zone_watermark_ok+0x420/0x420
       ? validate_set.constprop.0+0xc90/0xc90 [openvswitch]
       ? __alloc_pages+0x1a9/0x3e0
       ? __alloc_pages_slowpath.constprop.0+0x1da0/0x1da0
       ? unwind_next_frame+0x991/0x1e40
       ? __mod_node_page_state+0x99/0x120
       ? __mod_lruvec_page_state+0x2e3/0x470
       ? __kasan_kmalloc_large+0x90/0xe0
       ovs_nla_copy_actions+0x1b4/0x2c0 [openvswitch]
       ovs_flow_cmd_new+0x3cd/0xb10 [openvswitch]
       ...
      
      Cc: stable@vger.kernel.org
      Fixes: f28cd2af ("openvswitch: fix flow actions reallocation")
      Signed-off-by: Paolo Valerio <pvalerio@redhat.com>
      Acked-by: Eelco Chaudron <echaudro@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cefa91b2
    • Peilin Ye's avatar
      ip6_gre: Fix skb_under_panic in __gre6_xmit() · ab198e1d
      Peilin Ye authored
      Feng reported an skb_under_panic BUG triggered by running
      test_ip6gretap() in tools/testing/selftests/bpf/test_tunnel.sh:
      
      [   82.492551] skbuff: skb_under_panic: text:ffffffffb268bb8e len:403 put:12 head:ffff9997c5480000 data:ffff9997c547fff8 tail:0x18b end:0x2c0 dev:ip6gretap11
      <...>
      [   82.607380] Call Trace:
      [   82.609389]  <TASK>
      [   82.611136]  skb_push.cold.109+0x10/0x10
      [   82.614289]  __gre6_xmit+0x41e/0x590
      [   82.617169]  ip6gre_tunnel_xmit+0x344/0x3f0
      [   82.620526]  dev_hard_start_xmit+0xf1/0x330
      [   82.623882]  sch_direct_xmit+0xe4/0x250
      [   82.626961]  __dev_queue_xmit+0x720/0xfe0
      <...>
      [   82.633431]  packet_sendmsg+0x96a/0x1cb0
      [   82.636568]  sock_sendmsg+0x30/0x40
      <...>
      
      The following sequence of events caused the BUG:
      
      1. During ip6gretap device initialization, tunnel->tun_hlen (e.g. 4) is
         calculated based on old flags (see ip6gre_calc_hlen());
      2. packet_snd() reserves header room for skb A, assuming
         tunnel->tun_hlen is 4;
      3. Later (in clsact Qdisc), the eBPF program sets a new tunnel key for
         skb A using bpf_skb_set_tunnel_key() (see _ip6gretap_set_tunnel());
      4. __gre6_xmit() detects the new tunnel key, and recalculates
         "tun_hlen" (e.g. 12) based on new flags (e.g. TUNNEL_KEY and
         TUNNEL_SEQ);
      5. gre_build_header() calls skb_push() with insufficient reserved header
         room, triggering the BUG.
      
      As suggested by Cong, fix it by moving the call to skb_cow_head() after
      the recalculation of tun_hlen.
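The ordering problem in the sequence above can be modelled with a toy packet that only tracks its headroom. This is a sketch under assumptions: the flag names, `struct pkt` and the helper functions are hypothetical stand-ins (here `ensure_headroom()` plays the role of skb_cow_head() and `push_header()` the role of skb_push()), and the header length follows the usual GRE rule of 4 base bytes plus 4 per optional field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits for the optional GRE fields. */
#define FLAG_CSUM 0x1u
#define FLAG_KEY  0x2u
#define FLAG_SEQ  0x4u

/* GRE header length: 4 base bytes plus 4 for each optional field. */
static size_t gre_hdr_len(unsigned int flags)
{
    size_t len = 4;

    if (flags & FLAG_CSUM)
        len += 4;
    if (flags & FLAG_KEY)
        len += 4;
    if (flags & FLAG_SEQ)
        len += 4;
    return len;
}

struct pkt {
    size_t headroom; /* bytes available in front of the payload */
};

/* Stand-in for skb_cow_head(): guarantee at least `need` bytes. */
static void ensure_headroom(struct pkt *p, size_t need)
{
    if (p->headroom < need)
        p->headroom = need;
}

/* Stand-in for skb_push(): fails instead of panicking on underrun. */
static int push_header(struct pkt *p, size_t len)
{
    if (p->headroom < len)
        return -1;
    p->headroom -= len;
    return 0;
}

/* Buggy order: headroom sized from the stale, device-time flags. */
static int xmit_buggy(struct pkt *p, unsigned int dev_flags,
                      unsigned int key_flags)
{
    ensure_headroom(p, gre_hdr_len(dev_flags));
    return push_header(p, gre_hdr_len(key_flags));
}

/* Fixed order: recompute into a local first, then size the headroom. */
static int xmit_fixed(struct pkt *p, unsigned int key_flags)
{
    size_t hlen = gre_hdr_len(key_flags);

    ensure_headroom(p, hlen);
    return push_header(p, hlen);
}
```

Keeping the recomputed length in a local variable also means the per-device value is never modified on the transmit path.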
      
      Reproducer:
      
        OBJ=$LINUX/tools/testing/selftests/bpf/test_tunnel_kern.o
      
        ip netns add at_ns0
        ip link add veth0 type veth peer name veth1
        ip link set veth0 netns at_ns0
        ip netns exec at_ns0 ip addr add 172.16.1.100/24 dev veth0
        ip netns exec at_ns0 ip link set dev veth0 up
        ip link set dev veth1 up mtu 1500
        ip addr add dev veth1 172.16.1.200/24
      
        ip netns exec at_ns0 ip addr add ::11/96 dev veth0
        ip netns exec at_ns0 ip link set dev veth0 up
        ip addr add dev veth1 ::22/96
        ip link set dev veth1 up
      
        ip netns exec at_ns0 \
        	ip link add dev ip6gretap00 type ip6gretap seq flowlabel 0xbcdef key 2 \
        	local ::11 remote ::22
      
        ip netns exec at_ns0 ip addr add dev ip6gretap00 10.1.1.100/24
        ip netns exec at_ns0 ip addr add dev ip6gretap00 fc80::100/96
        ip netns exec at_ns0 ip link set dev ip6gretap00 up
      
        ip link add dev ip6gretap11 type ip6gretap external
        ip addr add dev ip6gretap11 10.1.1.200/24
        ip addr add dev ip6gretap11 fc80::200/24
        ip link set dev ip6gretap11 up
      
        tc qdisc add dev ip6gretap11 clsact
        tc filter add dev ip6gretap11 egress bpf da obj $OBJ sec ip6gretap_set_tunnel
        tc filter add dev ip6gretap11 ingress bpf da obj $OBJ sec ip6gretap_get_tunnel
      
        ping6 -c 3 -w 10 -q ::11
      
      Fixes: 6712abc1 ("ip6_gre: add ip6 gre and gretap collect_md mode")
      Reported-by: Feng Zhou <zhoufeng.zf@bytedance.com>
      Co-developed-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ab198e1d
    • Peilin Ye's avatar
      ip6_gre: Avoid updating tunnel->tun_hlen in __gre6_xmit() · f40c064e
      Peilin Ye authored
      Do not update tunnel->tun_hlen in data plane code.  Use a local variable
      instead, just like "tunnel_hlen" in net/ipv4/ip_gre.c:gre_fb_xmit().
      
      Co-developed-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f40c064e
    • David S. Miller's avatar
      Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 226c6024
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2022-04-14
      
      This series contains updates to ice driver only.
      
      Maciej adjusts implementation in __ice_alloc_rx_bufs_zc() for when
      ice_fill_rx_descs() does not return the entire buffer request and fixes a
      return value for !CONFIG_NET_SWITCHDEV configuration which was preventing
      VF creation.
      
      Wojciech prevents eswitch transmit when VFs are being removed which was
      causing NULL pointer dereference.
      
      Jianglei Nie fixes a memory leak on error path of getting OROM data.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      226c6024
    • David S. Miller's avatar
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 2cc7fb9d
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2022-04-14
      
      1) Fix the output interface for VRF cases in xfrm_dst_lookup.
         From David Ahern.
      
      2) Fix write out of bounds by doing COW on esp output when the
         packet size is larger than a page.
         From Sabrina Dubroca.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2cc7fb9d
    • Hangbin Liu's avatar
      net/packet: fix packet_sock xmit return value checking · 29e8e659
      Hangbin Liu authored
      packet_sock xmit could be dev_queue_xmit, which also returns negative
      errors. So only checking positive errors is not enough, or userspace
      sendmsg may return success while the packet is not sent out.
      
      Move the net_xmit_errno() assignment inside the braces, as checkpatch.pl
      warns against using an assignment in an if condition.
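A minimal user-space sketch of the two checks, assuming the usual kernel convention that positive NET_XMIT_* codes come from the queueing layer and negative values are errnos. The function names `check_old()` and `check_new()` are hypothetical; `xmit_errno()` mirrors the shape of net_xmit_errno(), where congestion notification is not treated as an error.

```c
#include <errno.h>

/* Positive queueing-layer return codes (values as used by the kernel). */
#define NET_XMIT_DROP 0x01
#define NET_XMIT_CN   0x02

/* Map a positive NET_XMIT_* code to an errno; CN is not an error. */
static int xmit_errno(int rc)
{
    return rc != NET_XMIT_CN ? -ENOBUFS : 0;
}

/* Old behaviour: only positive codes are examined. */
static int check_old(int err)
{
    if (err > 0 && (err = xmit_errno(err)) != 0)
        return err;
    return 0; /* a negative errno falls through and looks like success */
}

/* New behaviour: any non-zero result is examined. */
static int check_new(int err)
{
    if (err != 0) {
        if (err > 0)
            err = xmit_errno(err);
        if (err)
            return err; /* negative errnos are now reported */
    }
    return 0;
}
```

The second form also keeps the assignment out of the if condition, which is the style point the commit message mentions.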
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: Flavio Leitner <fbl@redhat.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      29e8e659
    • Tony Lu's avatar
      net/smc: Fix sock leak when release after smc_shutdown() · 1a74e993
      Tony Lu authored
      Since commit e5d5aadc ("net/smc: fix sk_refcnt underflow on linkdown
      and fallback"), for a fallback connection, __smc_release() does not call
      sock_put() if its state is already SMC_CLOSED.
      
      When smc_shutdown() is called after falling back, the state is set to
      SMC_CLOSED but sock_put() is not called, so this patch calls it.
      
      Reported-and-tested-by: syzbot+6e29a053eb165bd50de5@syzkaller.appspotmail.com
      Fixes: e5d5aadc ("net/smc: fix sk_refcnt underflow on linkdown and fallback")
      Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
      Acked-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1a74e993
    • David Howells's avatar
      rxrpc: Restore removed timer deletion · ee3b0826
      David Howells authored
      A recent patch[1] from Eric Dumazet flipped the order in which the
      keepalive timer and the keepalive worker were cancelled in order to fix a
      syzbot reported issue[2].  Unfortunately, this enables the mirror image bug
      whereby the timer races with rxrpc_exit_net(), restarting the worker after
      it has been cancelled:
      
      	CPU 1		CPU 2
      	===============	=====================
      			if (rxnet->live)
      			<INTERRUPT>
      	rxnet->live = false;
       	cancel_work_sync(&rxnet->peer_keepalive_work);
      			rxrpc_queue_work(&rxnet->peer_keepalive_work);
      	del_timer_sync(&rxnet->peer_keepalive_timer);
      
      Fix this by restoring the removed del_timer_sync() so that we try to remove
      the timer twice.  If the timer runs again, it should see ->live == false
      and not restart the worker.
      
      Fixes: 1946014c ("rxrpc: fix a race in rxrpc_exit_net()")
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Eric Dumazet <edumazet@google.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
      Link: https://lore.kernel.org/r/20220404183439.3537837-1-eric.dumazet@gmail.com/ [1]
      Link: https://syzkaller.appspot.com/bug?extid=724378c4bb58f703b09a [2]
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ee3b0826
    • Arun Ramadoss's avatar
      net: phy: LAN937x: added PHY_POLL_CABLE_TEST flag · 6f06aa6b
      Arun Ramadoss authored
      Added the phy_poll_cable_test flag for the lan937x phy driver.
      Tested using command -  ethtool --cable-test <dev>
      
      Fixes: 680baca5 ("net: phy: added the LAN937x phy support")
      Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6f06aa6b
  6. 14 Apr, 2022 8 commits
    • Linus Torvalds's avatar
      Merge tag 'net-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · d20339fa
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from wireless and netfilter.
      
        Current release - regressions:
      
         - smc: fix af_ops of child socket pointing to released memory
      
         - wifi: ath9k: fix usage of driver-private space in tx_info
      
        Previous releases - regressions:
      
         - ipv6: fix panic when forwarding a pkt with no in6 dev
      
         - sctp: use the correct skb for security_sctp_assoc_request
      
         - smc: fix NULL pointer dereference in smc_pnet_find_ib()
      
         - sched: fix initialization order when updating chain 0 head
      
         - phy: don't defer probe forever if PHY IRQ provider is missing
      
         - dsa: revert "net: dsa: setup master before ports"
      
         - dsa: felix: fix tagging protocol changes with multiple CPU ports
      
         - eth: ice:
            - fix use-after-free when freeing @rx_cpu_rmap
            - revert "iavf: fix deadlock occurrence during resetting VF
              interface"
      
         - eth: lan966x: stop processing the MAC entry if port is wrong
      
        Previous releases - always broken:
      
         - sched:
            - flower: fix parsing of ethertype following VLAN header
            - taprio: check if socket flags are valid
      
         - nfc: add flush_workqueue to prevent uaf
      
         - veth: ensure eth header is in skb's linear part
      
         - eth: stmmac: fix altr_tse_pcs function when using a fixed-link
      
         - eth: macb: restart tx only if queue pointer is lagging
      
         - eth: macvlan: fix leaking skb in source mode with nodst option"
      
      * tag 'net-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
        net: bcmgenet: Revert "Use stronger register read/writes to assure ordering"
        rtnetlink: Fix handling of disabled L3 stats in RTM_GETSTATS replies
        net: dsa: felix: fix tagging protocol changes with multiple CPU ports
        tun: annotate access to queue->trans_start
        nfc: nci: add flush_workqueue to prevent uaf
        net: dsa: realtek: don't parse compatible string for RTL8366S
        net: dsa: realtek: fix Kconfig to assure consistent driver linkage
        net: ftgmac100: access hardware register after clock ready
        Revert "net: dsa: setup master before ports"
        macvlan: Fix leaking skb in source mode with nodst option
        netfilter: nf_tables: nft_parse_register can return a negative value
        net: lan966x: Stop processing the MAC entry is port is wrong.
        net: lan966x: Fix when a port's upper is changed.
        net: lan966x: Fix IGMP snooping when frames have vlan tag
        net: lan966x: Update lan966x_ptp_get_nominal_value
        sctp: Initialize daddr on peeled off socket
        net/smc: Fix af_ops of child socket pointing to released memory
        net/smc: Fix NULL pointer dereference in smc_pnet_find_ib()
        net/smc: use memcpy instead of snprintf to avoid out of bounds read
        net: macb: Restart tx only if queue pointer is lagging
        ...
      d20339fa
    • Linus Torvalds's avatar
      Merge tag 'sound-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · b9b4c79e
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "This became an unexpectedly large pull request due to various
        regression fixes in the previous kernels.
      
        The majority of fixes are a series of patches to address the
        regression at probe errors in devres'ed drivers, while there are yet
        more fixes for the x86 SG allocations and for USB-audio buffer
        management. In addition, a few HD-audio quirks and other small fixes
        are found"
      
      * tag 'sound-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (52 commits)
        ALSA: usb-audio: Limit max buffer and period sizes per time
        ALSA: memalloc: Add fallback SG-buffer allocations for x86
        ALSA: nm256: Don't call card private_free at probe error path
        ALSA: mtpav: Don't call card private_free at probe error path
        ALSA: rme9652: Fix the missing snd_card_free() call at probe error
        ALSA: hdspm: Fix the missing snd_card_free() call at probe error
        ALSA: hdsp: Fix the missing snd_card_free() call at probe error
        ALSA: oxygen: Fix the missing snd_card_free() call at probe error
        ALSA: lx6464es: Fix the missing snd_card_free() call at probe error
        ALSA: cmipci: Fix the missing snd_card_free() call at probe error
        ALSA: aw2: Fix the missing snd_card_free() call at probe error
        ALSA: als300: Fix the missing snd_card_free() call at probe error
        ALSA: lola: Fix the missing snd_card_free() call at probe error
        ALSA: bt87x: Fix the missing snd_card_free() call at probe error
        ALSA: sis7019: Fix the missing error handling
        ALSA: intel_hdmi: Fix the missing snd_card_free() call at probe error
        ALSA: via82xx: Fix the missing snd_card_free() call at probe error
        ALSA: sonicvibes: Fix the missing snd_card_free() call at probe error
        ALSA: rme96: Fix the missing snd_card_free() call at probe error
        ALSA: rme32: Fix the missing snd_card_free() call at probe error
        ...
      b9b4c79e
    • Linus Torvalds's avatar
      Merge tag 'for-5.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 722985e2
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "A few more code and warning fixes.
      
        There's one feature ioctl removal patch slated for 5.18 that did not
        make it to the main pull request. It's just a one-liner and the ioctl
        has a v2 that's in use for a long time, no point to postpone it to
        5.19.
      
        Late update:
      
         - remove balance v1 ioctl, superseded by v2 in 2012
      
        Fixes:
      
         - add back cgroup attribution for compressed writes
      
         - add super block write start/end annotations to asynchronous balance
      
         - fix root reference count on an error handling path
      
         - in zoned mode, activate zone at the chunk allocation time to avoid
           ENOSPC due to timing issues
      
         - fix delayed allocation accounting for direct IO
      
        Warning fixes:
      
         - simplify assertion condition in zoned check
      
         - remove an unused variable"
      
      * tag 'for-5.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: fix btrfs_submit_compressed_write cgroup attribution
        btrfs: fix root ref counts in error handling in btrfs_get_root_ref
        btrfs: zoned: activate block group only for extent allocation
        btrfs: return allocated block group from do_chunk_alloc()
        btrfs: mark resumed async balance as writing
        btrfs: remove support of balance v1 ioctl
        btrfs: release correct delalloc amount in direct IO write path
        btrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups()
        btrfs: zoned: remove redundant condition in btrfs_run_delalloc_range
      722985e2
    • Linus Torvalds's avatar
      Merge tag 'fscache-fixes-20220413' of... · ec9c57a7
      Linus Torvalds authored
      Merge tag 'fscache-fixes-20220413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
      
      Pull fscache fixes from David Howells:
       "Here's a collection of fscache and cachefiles fixes and misc small
        cleanups. The two main fixes are:
      
         - Add a missing unmark of the inode in-use mark in an error path.
      
         - Fix a KASAN slab-out-of-bounds error when setting the xattr on a
           cachefiles volume due to the wrong length being given to memcpy().
      
        In addition, there's the removal of an unused parameter, removal of an
        unused Kconfig option, conditionalising a bit of procfs-related stuff
        and some doc fixes"
      
      * tag 'fscache-fixes-20220413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        fscache: remove FSCACHE_OLD_API Kconfig option
        fscache: Use wrapper fscache_set_cache_state() directly when relinquishing
        fscache: Move fscache_cookies_seq_ops specific code under CONFIG_PROC_FS
        fscache: Remove the cookie parameter from fscache_clear_page_bits()
        docs: filesystems: caching/backend-api.rst: fix an object withdrawn API
        docs: filesystems: caching/backend-api.rst: correct two relinquish APIs use
        cachefiles: Fix KASAN slab-out-of-bounds in cachefiles_set_volume_xattr
        cachefiles: unmark inode in use in error path
      ec9c57a7
    • Jianglei Nie's avatar
      ice: Fix memory leak in ice_get_orom_civd_data() · 7c8881b7
      Jianglei Nie authored
      A memory chunk was allocated for orom_data in ice_get_orom_civd_data()
      by vzalloc(). But when ice_read_flash_module() fails, the allocated
      memory is not freed, which will lead to a memory leak.
      
      We can fix it by freeing the orom_data when ice_read_flash_module() fails.
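The error-path pattern can be sketched in user space as below. This is an illustration under assumptions, not the driver code: `demo_alloc()`, `demo_free()`, `read_module()` and `get_orom_data()` are hypothetical names, and the `live_allocs` counter exists only so the leak is observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int live_allocs; /* demo-only counter of outstanding buffers */

static void *demo_alloc(size_t n)
{
    void *p = calloc(1, n);

    if (p)
        live_allocs++;
    return p;
}

static void demo_free(void *p)
{
    if (p) {
        live_allocs--;
        free(p);
    }
}

/* Stand-in for a flash read that can fail on request. */
static int read_module(unsigned char *buf, size_t len, int fail)
{
    size_t i;

    if (fail)
        return -EIO;
    for (i = 0; i < len; i++)
        buf[i] = 0xA5;
    return 0;
}

/* Allocate a buffer, fill it, and hand it to the caller on success. */
static int get_orom_data(size_t len, int fail, unsigned char **out)
{
    unsigned char *data;
    int err;

    assert(out != NULL);

    data = demo_alloc(len);
    if (!data)
        return -ENOMEM;

    err = read_module(data, len, fail);
    if (err) {
        demo_free(data); /* the step a leaking error path forgets */
        return err;
    }

    *out = data;
    return 0;
}
```

On success, ownership of the buffer passes to the caller, who frees it later; on failure nothing is left allocated.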
      
      Fixes: af18d886 ("ice: reduce time to read Option ROM CIVD data")
      Signed-off-by: Jianglei Nie <niejianglei2021@163.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      7c8881b7
    • Wojciech Drewek's avatar
      ice: fix crash in switchdev mode · d2016651
      Wojciech Drewek authored
      The steps below end in a crash:
      - modprobe ice
      - devlink dev eswitch set $PF1_PCI mode switchdev
      - echo 64 > /sys/class/net/$PF1/device/sriov_numvfs
      - rmmod ice
      
      Calling ice_eswitch_port_start_xmit while the process of removing
      VFs is in progress ends up with NULL pointer dereference.
      That's because PR netdev is not released but some resources
      are already freed. Fix it by checking if ICE_VF_DIS bit is set.
      
      Call trace:
      [ 1379.595146] BUG: kernel NULL pointer dereference, address: 0000000000000040
      [ 1379.595284] #PF: supervisor read access in kernel mode
      [ 1379.595410] #PF: error_code(0x0000) - not-present page
      [ 1379.595535] PGD 0 P4D 0
      [ 1379.595657] Oops: 0000 [#1] PREEMPT SMP PTI
      [ 1379.595783] CPU: 4 PID: 974 Comm: NetworkManager Kdump: loaded Tainted: G           OE     5.17.0-rc8_mrq_dev-queue+ #12
      [ 1379.595926] Hardware name: Intel Corporation S1200SP/S1200SP, BIOS S1200SP.86B.03.01.0042.013020190050 01/30/2019
      [ 1379.596063] RIP: 0010:ice_eswitch_port_start_xmit+0x46/0xd0 [ice]
      [ 1379.596292] Code: c7 c8 09 00 00 e8 9a c9 fc ff 84 c0 0f 85 82 00 00 00 4c 89 e7 e8 ca 70 fe ff 48 8b 7d 58 48 89 c3 48 85 ff 75 5e 48 8b 53 20 <8b> 42 40 85 c0 74 78 8d 48 01 f0 0f b1 4a 40 75 f2 0f b6 95 84 00
      [ 1379.596456] RSP: 0018:ffffaba0c0d7bad0 EFLAGS: 00010246
      [ 1379.596584] RAX: ffff969c14c71680 RBX: ffff969c14c71680 RCX: 000100107a0f0000
      [ 1379.596715] RDX: 0000000000000000 RSI: ffff969b9d631000 RDI: 0000000000000000
      [ 1379.596846] RBP: ffff969c07b46500 R08: ffff969becfca8ac R09: 0000000000000001
      [ 1379.596977] R10: 0000000000000004 R11: ffffaba0c0d7bbec R12: ffff969b9d631000
      [ 1379.597106] R13: ffffffffc08357a0 R14: ffff969c07b46500 R15: ffff969b9d631000
      [ 1379.597237] FS:  00007f72c0e25c80(0000) GS:ffff969f13500000(0000) knlGS:0000000000000000
      [ 1379.597414] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1379.597562] CR2: 0000000000000040 CR3: 000000012b316006 CR4: 00000000003706e0
      [ 1379.597713] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1379.597863] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 1379.598015] Call Trace:
      [ 1379.598153]  <TASK>
      [ 1379.598294]  dev_hard_start_xmit+0xd9/0x220
      [ 1379.598444]  sch_direct_xmit+0x8a/0x340
      [ 1379.598592]  __dev_queue_xmit+0xa3c/0xd30
      [ 1379.598739]  ? packet_parse_headers+0xb4/0xf0
      [ 1379.598890]  packet_sendmsg+0xa15/0x1620
      [ 1379.599038]  ? __check_object_size+0x46/0x140
      [ 1379.599186]  sock_sendmsg+0x5e/0x60
      [ 1379.599330]  ____sys_sendmsg+0x22c/0x270
      [ 1379.599474]  ? import_iovec+0x17/0x20
      [ 1379.599622]  ? sendmsg_copy_msghdr+0x59/0x90
      [ 1379.599771]  ___sys_sendmsg+0x81/0xc0
      [ 1379.599917]  ? __pollwait+0xd0/0xd0
      [ 1379.600061]  ? preempt_count_add+0x68/0xa0
      [ 1379.600210]  ? _raw_write_lock_irq+0x1a/0x40
      [ 1379.600369]  ? ep_done_scan+0xc9/0x110
      [ 1379.600494]  ? _raw_spin_unlock_irqrestore+0x25/0x40
      [ 1379.600622]  ? preempt_count_add+0x68/0xa0
      [ 1379.600747]  ? _raw_spin_lock_irq+0x1a/0x40
      [ 1379.600899]  ? __fget_light+0x8f/0x110
      [ 1379.601024]  __sys_sendmsg+0x49/0x80
      [ 1379.601148]  ? release_ds_buffers+0x50/0xe0
      [ 1379.601274]  do_syscall_64+0x3b/0x90
      [ 1379.601399]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [ 1379.601525] RIP: 0033:0x7f72c1e2e35d
      
      Fixes: f5396b8a ("ice: switchdev slow path")
      Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Reported-by: Marcin Szycik <marcin.szycik@linux.intel.com>
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      d2016651
    • Maciej Fijalkowski's avatar
      ice: allow creating VFs for !CONFIG_NET_SWITCHDEV · aacca7a8
      Maciej Fijalkowski authored
      Currently for !CONFIG_NET_SWITCHDEV kernel builds it is not possible to
      create VFs properly as call to ice_eswitch_configure() returns
      -EOPNOTSUPP for us. This is because CONFIG_ICE_SWITCHDEV depends on
      CONFIG_NET_SWITCHDEV.
      
      Change the ice_eswitch_configure() implementation for
      !CONFIG_ICE_SWITCHDEV to return 0 instead of -EOPNOTSUPP and let
      ice_ena_vfs() finish its work properly.
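The effect of the stub's return value can be shown with a small sketch. All names here (`DEMO_SWITCHDEV`, `struct demo_pf`, `demo_eswitch_configure()`, `demo_ena_vfs()`) are hypothetical stand-ins for the pattern, not the driver's definitions.

```c
struct demo_pf {
    int num_vfs;
};

#ifdef DEMO_SWITCHDEV
/* The real implementation would be provided when the feature is built in. */
int demo_eswitch_configure(struct demo_pf *pf);
#else
/*
 * Stub for builds without the feature: there is nothing to configure,
 * so report success. Returning an error here would make every caller
 * that treats non-zero as fatal give up.
 */
static inline int demo_eswitch_configure(struct demo_pf *pf)
{
    (void)pf;
    return 0;
}
#endif

/* Caller that aborts VF creation on any configuration error. */
static int demo_ena_vfs(struct demo_pf *pf, int num_vfs)
{
    int err = demo_eswitch_configure(pf);

    if (err)
        return err;

    pf->num_vfs = num_vfs;
    return 0;
}
```

With the stub returning 0, the caller proceeds to finish VF setup even when the optional feature is compiled out.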
      
      CC: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Fixes: 1a1c40df ("ice: set and release switchdev environment")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      aacca7a8
    • Maciej Fijalkowski's avatar
      ice: xsk: check if Rx ring was filled up to the end · d1fc4c6f
      Maciej Fijalkowski authored
      __ice_alloc_rx_bufs_zc() checks if a number of the descriptors to be
      allocated would cause the ring wrap. In that case, driver will issue two
      calls to xsk_buff_alloc_batch() - one that will fill the ring up to the
      end and the second one that will start with filling descriptors from the
      beginning of the ring.
      
      ice_fill_rx_descs() is a wrapper for taking care of what
      xsk_buff_alloc_batch() gave back to the driver. It works in a best
      effort approach, so for example when driver asks for 64 buffers,
      ice_fill_rx_descs() could assign only 32. Such case needs to be checked
      when ring is being filled up to the end, because in that situation ntu
      might not have reached the end of the ring.
      
      Fix the ring wrap by checking if nb_buffs_extra has the expected value.
      If not, bump ntu and go directly to tail update.
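The wrap handling described above can be modelled with a ring that only tracks its size and next-to-use index. This is a sketch under assumptions: `struct ring`, `fill_descs()` and `alloc_rx_bufs()` are hypothetical names, and `fill_descs()` stands in for a best-effort allocator that may hand back fewer buffers than requested.

```c
#include <assert.h>

struct ring {
    unsigned int count; /* number of descriptors in the ring */
    unsigned int ntu;   /* next descriptor to use */
};

/* Best-effort allocator stand-in: hands out at most *avail buffers. */
static unsigned int fill_descs(unsigned int want, unsigned int *avail)
{
    unsigned int got = want < *avail ? want : *avail;

    *avail -= got;
    return got;
}

/* Fill up to `count` descriptors; returns how many were actually filled. */
static unsigned int alloc_rx_bufs(struct ring *r, unsigned int count,
                                  unsigned int *avail)
{
    unsigned int ntu = r->ntu;
    unsigned int total = 0;
    unsigned int got;

    assert(count < r->count);

    if (ntu + count >= r->count) {
        unsigned int to_end = r->count - ntu;

        got = fill_descs(to_end, avail);
        total += got;
        if (got != to_end) {
            /* Short fill: the end was not reached, so do not wrap. */
            ntu += got;
            goto out;
        }
        ntu = 0; /* the end was reached, continue from the start */
        count -= got;
    }

    got = fill_descs(count, avail);
    total += got;
    ntu += got;
out:
    r->ntu = ntu;
    return total;
}
```

Without the short-fill check, the index would be reset to 0 even though descriptors before the end of the ring were left unfilled.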
      
      Fixes: 3876ff52 ("ice: xsk: Handle SW XDP ring wrap and bump tail more often")
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Shwetha Nagaraju <Shwetha.nagaraju@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      d1fc4c6f