1. 09 Feb, 2022 25 commits
    • Raju Rangoju's avatar
      net: amd-xgbe: disable interrupts during pci removal · 68c2d6af
      Raju Rangoju authored
      Hardware interrupts are enabled during the pci probe, however,
      they are not disabled during pci removal.
      
      Disable all hardware interrupts during pci removal to avoid any
      issues.
      
      Fixes: e7537740 ("amd-xgbe: Update PCI support to use new IRQ functions")
      Suggested-by: default avatarSelwin Sebastian <Selwin.Sebastian@amd.com>
      Signed-off-by: default avatarRaju Rangoju <Raju.Rangoju@amd.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      68c2d6af
    • Jon Maloy's avatar
      tipc: rate limit warning for received illegal binding update · c7223d68
      Jon Maloy authored
      It would be easy to craft a message containing an illegal binding table
      update operation. This is handled correctly by the code, but the
      corresponding warning printout is not rate limited as is should be.
      We fix this now.
      
      Fixes: b97bf3fd ("[TIPC] Initial merge")
      Signed-off-by: default avatarJon Maloy <jmaloy@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c7223d68
    • Joel Stanley's avatar
      net: mdio: aspeed: Add missing MODULE_DEVICE_TABLE · bc1c3c3b
      Joel Stanley authored
      Fix loading of the driver when built as a module.
      
      Fixes: f160e994 ("net: phy: Add mdio-aspeed")
      Signed-off-by: default avatarJoel Stanley <joel@jms.id.au>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Acked-by: default avatarAndrew Jeffery <andrew@aj.id.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bc1c3c3b
    • Eric Dumazet's avatar
      veth: fix races around rq->rx_notify_masked · 68468d8c
      Eric Dumazet authored
      veth being NETIF_F_LLTX enabled, we need to be more careful
      whenever we read/write rq->rx_notify_masked.
      
      BUG: KCSAN: data-race in veth_xmit / veth_xmit
      
      write to 0xffff888133d9a9f8 of 1 bytes by task 23552 on cpu 0:
       __veth_xdp_flush drivers/net/veth.c:269 [inline]
       veth_xmit+0x307/0x470 drivers/net/veth.c:350
       __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
       netdev_start_xmit include/linux/netdevice.h:4697 [inline]
       xmit_one+0x105/0x2f0 net/core/dev.c:3473
       dev_hard_start_xmit net/core/dev.c:3489 [inline]
       __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116
       dev_queue_xmit+0x13/0x20 net/core/dev.c:4149
       br_dev_queue_push_xmit+0x3ce/0x430 net/bridge/br_forward.c:53
       NF_HOOK include/linux/netfilter.h:307 [inline]
       br_forward_finish net/bridge/br_forward.c:66 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       __br_forward+0x2e4/0x400 net/bridge/br_forward.c:115
       br_flood+0x521/0x5c0 net/bridge/br_forward.c:242
       br_dev_xmit+0x8b6/0x960
       __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
       netdev_start_xmit include/linux/netdevice.h:4697 [inline]
       xmit_one+0x105/0x2f0 net/core/dev.c:3473
       dev_hard_start_xmit net/core/dev.c:3489 [inline]
       __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116
       dev_queue_xmit+0x13/0x20 net/core/dev.c:4149
       neigh_hh_output include/net/neighbour.h:525 [inline]
       neigh_output include/net/neighbour.h:539 [inline]
       ip_finish_output2+0x6f8/0xb70 net/ipv4/ip_output.c:228
       ip_finish_output+0xfb/0x240 net/ipv4/ip_output.c:316
       NF_HOOK_COND include/linux/netfilter.h:296 [inline]
       ip_output+0xf3/0x1a0 net/ipv4/ip_output.c:430
       dst_output include/net/dst.h:451 [inline]
       ip_local_out net/ipv4/ip_output.c:126 [inline]
       ip_send_skb+0x6e/0xe0 net/ipv4/ip_output.c:1570
       udp_send_skb+0x641/0x880 net/ipv4/udp.c:967
       udp_sendmsg+0x12ea/0x14c0 net/ipv4/udp.c:1254
       inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
       __do_sys_sendmmsg net/socket.c:2582 [inline]
       __se_sys_sendmmsg net/socket.c:2579 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff888133d9a9f8 of 1 bytes by task 23563 on cpu 1:
       __veth_xdp_flush drivers/net/veth.c:268 [inline]
       veth_xmit+0x2d6/0x470 drivers/net/veth.c:350
       __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
       netdev_start_xmit include/linux/netdevice.h:4697 [inline]
       xmit_one+0x105/0x2f0 net/core/dev.c:3473
       dev_hard_start_xmit net/core/dev.c:3489 [inline]
       __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116
       dev_queue_xmit+0x13/0x20 net/core/dev.c:4149
       br_dev_queue_push_xmit+0x3ce/0x430 net/bridge/br_forward.c:53
       NF_HOOK include/linux/netfilter.h:307 [inline]
       br_forward_finish net/bridge/br_forward.c:66 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       __br_forward+0x2e4/0x400 net/bridge/br_forward.c:115
       br_flood+0x521/0x5c0 net/bridge/br_forward.c:242
       br_dev_xmit+0x8b6/0x960
       __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
       netdev_start_xmit include/linux/netdevice.h:4697 [inline]
       xmit_one+0x105/0x2f0 net/core/dev.c:3473
       dev_hard_start_xmit net/core/dev.c:3489 [inline]
       __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116
       dev_queue_xmit+0x13/0x20 net/core/dev.c:4149
       neigh_hh_output include/net/neighbour.h:525 [inline]
       neigh_output include/net/neighbour.h:539 [inline]
       ip_finish_output2+0x6f8/0xb70 net/ipv4/ip_output.c:228
       ip_finish_output+0xfb/0x240 net/ipv4/ip_output.c:316
       NF_HOOK_COND include/linux/netfilter.h:296 [inline]
       ip_output+0xf3/0x1a0 net/ipv4/ip_output.c:430
       dst_output include/net/dst.h:451 [inline]
       ip_local_out net/ipv4/ip_output.c:126 [inline]
       ip_send_skb+0x6e/0xe0 net/ipv4/ip_output.c:1570
       udp_send_skb+0x641/0x880 net/ipv4/udp.c:967
       udp_sendmsg+0x12ea/0x14c0 net/ipv4/udp.c:1254
       inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
       __do_sys_sendmmsg net/socket.c:2582 [inline]
       __se_sys_sendmmsg net/socket.c:2579 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x00 -> 0x01
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 23563 Comm: syz-executor.5 Not tainted 5.17.0-rc2-syzkaller-00064-gc36c04c2 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 948d4f21 ("veth: Add driver XDP")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      68468d8c
    • David S. Miller's avatar
      Merge tag 'linux-can-fixes-for-5.17-20220209' of... · 6d072066
      David S. Miller authored
      Merge tag 'linux-can-fixes-for-5.17-20220209' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2022-02-09
      
      this is a pull request of 2 patches for net/master.
      
      Oliver Hartkopp contributes 2 fixes for the CAN ISOTP protocol.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6d072066
    • Duoming Zhou's avatar
      ax25: fix NPD bug in ax25_disconnect · 7ec02f5a
      Duoming Zhou authored
      The ax25_disconnect() in ax25_kill_by_device() is not
      protected by any locks, thus there is a race condition
      between ax25_disconnect() and ax25_destroy_socket().
      when ax25->sk is assigned as NULL by ax25_destroy_socket(),
      a NULL pointer dereference bug will occur if site (1) or (2)
      dereferences ax25->sk.
      
      ax25_kill_by_device()                | ax25_release()
        ax25_disconnect()                  |   ax25_destroy_socket()
          ...                              |
          if(ax25->sk != NULL)             |     ...
            ...                            |     ax25->sk = NULL;
            bh_lock_sock(ax25->sk); //(1)  |     ...
            ...                            |
            bh_unlock_sock(ax25->sk); //(2)|
      
      This patch moves ax25_disconnect() into lock_sock(), which can
      synchronize with ax25_destroy_socket() in ax25_release().
      
      Fail log:
      ===============================================================
      BUG: kernel NULL pointer dereference, address: 0000000000000088
      ...
      RIP: 0010:_raw_spin_lock+0x7e/0xd0
      ...
      Call Trace:
      ax25_disconnect+0xf6/0x220
      ax25_device_event+0x187/0x250
      raw_notifier_call_chain+0x5e/0x70
      dev_close_many+0x17d/0x230
      rollback_registered_many+0x1f1/0x950
      unregister_netdevice_queue+0x133/0x200
      unregister_netdev+0x13/0x20
      ...
      Signed-off-by: default avatarDuoming Zhou <duoming@zju.edu.cn>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7ec02f5a
    • David S. Miller's avatar
      Merge branch 'net-fix-skb-unclone-issues' · 676b4936
      David S. Miller authored
      Antoine Tenart says:
      
      ====================
      net: fix issues when uncloning an skb dst+metadata
      
      This fixes two issues when uncloning an skb dst+metadata in
      tun_dst_unclone; this was initially reported by Vlad Buslov[1]. Because
      of the memory leak fixed by patch 2, the issue in patch 1 never happened
      in practice.
      
      tun_dst_unclone is called from two different places, one in geneve/vxlan
      to handle PMTU and one in net/openvswitch/actions.c where it is used to
      retrieve tunnel information. While both Vlad and I tested the former, we
      could not for the latter. I did spend quite some time trying to, but
      that code path is not easy to trigger. Code inspection shows this should
      be fine, the tunnel information (dst+metadata) is uncloned and the skb
      it is referenced from is only consumed after all accesses to the tunnel
      information are done:
      
        do_execute_actions
          output_userspace
            dev_fill_metadata_dst         <- dst+metadata is uncloned
            ovs_dp_upcall
              queue_userspace_packet
                ovs_nla_put_tunnel_info   <- metadata (tunnel info) is accessed
          consume_skb                     <- dst+metadata is freed
      
      Thanks!
      Antoine
      
      [1] https://lore.kernel.org/all/ygnhh79yluw2.fsf@nvidia.com/T/#m2f814614a4f5424cea66bbff7297f692b59b69a0
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      676b4936
    • Antoine Tenart's avatar
      net: fix a memleak when uncloning an skb dst and its metadata · 9eeabdf1
      Antoine Tenart authored
      When uncloning an skb dst and its associated metadata, a new
      dst+metadata is allocated and later replaces the old one in the skb.
      This is helpful to have a non-shared dst+metadata attached to a specific
      skb.
      
      The issue is the uncloned dst+metadata is initialized with a refcount of
      1, which is increased to 2 before attaching it to the skb. When
      tun_dst_unclone returns, the dst+metadata is only referenced from a
      single place (the skb) while its refcount is 2. Its refcount will never
      drop to 0 (when the skb is consumed), leading to a memory leak.
      
      Fix this by removing the call to dst_hold in tun_dst_unclone, as the
      dst+metadata refcount is already 1.
      
      Fixes: fc4099f1 ("openvswitch: Fix egress tunnel info.")
      Cc: Pravin B Shelar <pshelar@ovn.org>
      Reported-by: default avatarVlad Buslov <vladbu@nvidia.com>
      Tested-by: default avatarVlad Buslov <vladbu@nvidia.com>
      Signed-off-by: default avatarAntoine Tenart <atenart@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9eeabdf1
    • Antoine Tenart's avatar
      net: do not keep the dst cache when uncloning an skb dst and its metadata · cfc56f85
      Antoine Tenart authored
      When uncloning an skb dst and its associated metadata a new dst+metadata
      is allocated and the tunnel information from the old metadata is copied
      over there.
      
      The issue is the tunnel metadata has references to cached dst, which are
      copied along the way. When a dst+metadata refcount drops to 0 the
      metadata is freed including the cached dst entries. As they are also
      referenced in the initial dst+metadata, this ends up in UaFs.
      
      In practice the above did not happen because of another issue, the
      dst+metadata was never freed because its refcount never dropped to 0
      (this will be fixed in a subsequent patch).
      
      Fix this by initializing the dst cache after copying the tunnel
      information from the old metadata to also unshare the dst cache.
      
      Fixes: d71785ff ("net: add dst_cache to ovs vxlan lwtunnel")
      Cc: Paolo Abeni <pabeni@redhat.com>
      Reported-by: default avatarVlad Buslov <vladbu@nvidia.com>
      Tested-by: default avatarVlad Buslov <vladbu@nvidia.com>
      Signed-off-by: default avatarAntoine Tenart <atenart@kernel.org>
      Acked-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cfc56f85
    • Oliver Hartkopp's avatar
      can: isotp: fix error path in isotp_sendmsg() to unlock wait queue · 8375dfac
      Oliver Hartkopp authored
      Commit 43a08c3b ("can: isotp: isotp_sendmsg(): fix TX buffer concurrent
      access in isotp_sendmsg()") introduced a new locking scheme that may render
      the userspace application in a locking state when an error is detected.
      This issue shows up under high load on simultaneously running isotp channels
      with identical configuration which is against the ISO specification and
      therefore breaks any reasonable PDU communication anyway.
      
      Fixes: 43a08c3b ("can: isotp: isotp_sendmsg(): fix TX buffer concurrent access in isotp_sendmsg()")
      Link: https://lore.kernel.org/all/20220209073601.25728-1-socketcan@hartkopp.net
      Cc: stable@vger.kernel.org
      Cc: Ziyang Xuan <william.xuanziyang@huawei.com>
      Signed-off-by: default avatarOliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      8375dfac
    • Oliver Hartkopp's avatar
      can: isotp: fix potential CAN frame reception race in isotp_rcv() · 7c759040
      Oliver Hartkopp authored
      When receiving a CAN frame the current code logic does not consider
      concurrently receiving processes which do not show up in real world
      usage.
      
      Ziyang Xuan writes:
      
      The following syz problem is one of the scenarios. so->rx.len is
      changed by isotp_rcv_ff() during isotp_rcv_cf(), so->rx.len equals
      0 before alloc_skb() and equals 4096 after alloc_skb(). That will
      trigger skb_over_panic() in skb_put().
      
      =======================================================
      CPU: 1 PID: 19 Comm: ksoftirqd/1 Not tainted 5.16.0-rc8-syzkaller #0
      RIP: 0010:skb_panic+0x16c/0x16e net/core/skbuff.c:113
      Call Trace:
       <TASK>
       skb_over_panic net/core/skbuff.c:118 [inline]
       skb_put.cold+0x24/0x24 net/core/skbuff.c:1990
       isotp_rcv_cf net/can/isotp.c:570 [inline]
       isotp_rcv+0xa38/0x1e30 net/can/isotp.c:668
       deliver net/can/af_can.c:574 [inline]
       can_rcv_filter+0x445/0x8d0 net/can/af_can.c:635
       can_receive+0x31d/0x580 net/can/af_can.c:665
       can_rcv+0x120/0x1c0 net/can/af_can.c:696
       __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5465
       __netif_receive_skb+0x24/0x1b0 net/core/dev.c:5579
      
      Therefore we make sure the state changes and data structures stay
      consistent at CAN frame reception time by adding a spin_lock in
      isotp_rcv(). This fixes the issue reported by syzkaller but does not
      affect real world operation.
      
      Fixes: e057dd3f ("can: add ISO 15765-2:2016 transport protocol")
      Link: https://lore.kernel.org/linux-can/d7e69278-d741-c706-65e1-e87623d9a8e8@huawei.com/T/
      Link: https://lore.kernel.org/all/20220208200026.13783-1-socketcan@hartkopp.net
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+4c63f36709a642f801c5@syzkaller.appspotmail.com
      Reported-by: default avatarZiyang Xuan <william.xuanziyang@huawei.com>
      Signed-off-by: default avatarOliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      7c759040
    • Louis Peens's avatar
      nfp: flower: fix ida_idx not being released · 7db788ad
      Louis Peens authored
      When looking for a global mac index the extra NFP_TUN_PRE_TUN_IDX_BIT
      that gets set if nfp_flower_is_supported_bridge is true is not taken
      into account. Consequently the path that should release the ida_index
      in cleanup is never triggered, causing messages like:
      
          nfp 0000:02:00.0: nfp: Failed to offload MAC on br-ex.
          nfp 0000:02:00.0: nfp: Failed to offload MAC on br-ex.
          nfp 0000:02:00.0: nfp: Failed to offload MAC on br-ex.
      
      after NFP_MAX_MAC_INDEX number of reconfigs. Ultimately this lead to
      new tunnel flows not being offloaded.
      
      Fix this by unsetting the NFP_TUN_PRE_TUN_IDX_BIT before checking if
      the port is of type OTHER.
      
      Fixes: 2e0bc7f3 ("nfp: flower: encode mac indexes with pre-tunnel rule check")
      Signed-off-by: default avatarLouis Peens <louis.peens@corigine.com>
      Signed-off-by: default avatarSimon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20220208101453.321949-1-simon.horman@corigine.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      7db788ad
    • Eric Dumazet's avatar
      ipmr,ip6mr: acquire RTNL before calling ip[6]mr_free_table() on failure path · 5611a006
      Eric Dumazet authored
      ip[6]mr_free_table() can only be called under RTNL lock.
      
      RTNL: assertion failed at net/core/dev.c (10367)
      WARNING: CPU: 1 PID: 5890 at net/core/dev.c:10367 unregister_netdevice_many+0x1246/0x1850 net/core/dev.c:10367
      Modules linked in:
      CPU: 1 PID: 5890 Comm: syz-executor.2 Not tainted 5.16.0-syzkaller-11627-g422ee58dc0ef #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:unregister_netdevice_many+0x1246/0x1850 net/core/dev.c:10367
      Code: 0f 85 9b ee ff ff e8 69 07 4b fa ba 7f 28 00 00 48 c7 c6 00 90 ae 8a 48 c7 c7 40 90 ae 8a c6 05 6d b1 51 06 01 e8 8c 90 d8 01 <0f> 0b e9 70 ee ff ff e8 3e 07 4b fa 4c 89 e7 e8 86 2a 59 fa e9 ee
      RSP: 0018:ffffc900046ff6e0 EFLAGS: 00010286
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: ffff888050f51d00 RSI: ffffffff815fa008 RDI: fffff520008dfece
      RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff815f3d6e R11: 0000000000000000 R12: 00000000fffffff4
      R13: dffffc0000000000 R14: ffffc900046ff750 R15: ffff88807b7dc000
      FS:  00007f4ab736e700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fee0b4f8990 CR3: 000000001e7d2000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       mroute_clean_tables+0x244/0xb40 net/ipv6/ip6mr.c:1509
       ip6mr_free_table net/ipv6/ip6mr.c:389 [inline]
       ip6mr_rules_init net/ipv6/ip6mr.c:246 [inline]
       ip6mr_net_init net/ipv6/ip6mr.c:1306 [inline]
       ip6mr_net_init+0x3f0/0x4e0 net/ipv6/ip6mr.c:1298
       ops_init+0xaf/0x470 net/core/net_namespace.c:140
       setup_net+0x54f/0xbb0 net/core/net_namespace.c:331
       copy_net_ns+0x318/0x760 net/core/net_namespace.c:475
       create_new_namespaces+0x3f6/0xb20 kernel/nsproxy.c:110
       copy_namespaces+0x391/0x450 kernel/nsproxy.c:178
       copy_process+0x2e0c/0x7300 kernel/fork.c:2167
       kernel_clone+0xe7/0xab0 kernel/fork.c:2555
       __do_sys_clone+0xc8/0x110 kernel/fork.c:2672
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f4ab89f9059
      Code: Unable to access opcode bytes at RIP 0x7f4ab89f902f.
      RSP: 002b:00007f4ab736e118 EFLAGS: 00000206 ORIG_RAX: 0000000000000038
      RAX: ffffffffffffffda RBX: 00007f4ab8b0bf60 RCX: 00007f4ab89f9059
      RDX: 0000000020000280 RSI: 0000000020000270 RDI: 0000000040200000
      RBP: 00007f4ab8a5308d R08: 0000000020000300 R09: 0000000020000300
      R10: 00000000200002c0 R11: 0000000000000206 R12: 0000000000000000
      R13: 00007ffc3977cc1f R14: 00007f4ab736e300 R15: 0000000000022000
       </TASK>
      
      Fixes: f243e5a7 ("ipmr,ip6mr: call ip6mr_free_table() on failure path")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Cong Wang <cong.wang@bytedance.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220208053451.2885398-1-eric.dumazet@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      5611a006
    • Cai Huoqing's avatar
      net: ethernet: litex: Add the dependency on HAS_IOMEM · 2427f03f
      Cai Huoqing authored
      The LiteX driver uses devm io function API which
      needs HAS_IOMEM enabled, so add the dependency on HAS_IOMEM.
      
      Fixes: ee7da21a ("net: Add driver for LiteX's LiteETH network interface")
      Signed-off-by: default avatarCai Huoqing <cai.huoqing@linux.dev>
      Link: https://lore.kernel.org/r/20220208013308.6563-1-cai.huoqing@linux.devSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      2427f03f
    • Sukadev Bhattiprolu's avatar
      ibmvnic: don't release napi in __ibmvnic_open() · 61772b09
      Sukadev Bhattiprolu authored
      If __ibmvnic_open() encounters an error such as when setting link state,
      it calls release_resources() which frees the napi structures needlessly.
      Instead, have __ibmvnic_open() only clean up the work it did so far (i.e.
      disable napi and irqs) and leave the rest to the callers.
      
      If caller of __ibmvnic_open() is ibmvnic_open(), it should release the
      resources immediately. If the caller is do_reset() or do_hard_reset(),
      they will release the resources on the next reset.
      
      This fixes following crash that occurred when running the drmgr command
      several times to add/remove a vnic interface:
      
      	[102056] ibmvnic 30000003 env3: Disabling rx_scrq[6] irq
      	[102056] ibmvnic 30000003 env3: Disabling rx_scrq[7] irq
      	[102056] ibmvnic 30000003 env3: Replenished 8 pools
      	Kernel attempted to read user page (10) - exploit attempt? (uid: 0)
      	BUG: Kernel NULL pointer dereference on read at 0x00000010
      	Faulting instruction address: 0xc000000000a3c840
      	Oops: Kernel access of bad area, sig: 11 [#1]
      	LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
      	...
      	CPU: 9 PID: 102056 Comm: kworker/9:2 Kdump: loaded Not tainted 5.16.0-rc5-autotest-g6441998e #1
      	Workqueue: events_long __ibmvnic_reset [ibmvnic]
      	NIP:  c000000000a3c840 LR: c0080000029b5378 CTR: c000000000a3c820
      	REGS: c0000000548e37e0 TRAP: 0300   Not tainted  (5.16.0-rc5-autotest-g6441998e)
      	MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28248484  XER: 00000004
      	CFAR: c0080000029bdd24 DAR: 0000000000000010 DSISR: 40000000 IRQMASK: 0
      	GPR00: c0080000029b55d0 c0000000548e3a80 c0000000028f0200 0000000000000000
      	...
      	NIP [c000000000a3c840] napi_enable+0x20/0xc0
      	LR [c0080000029b5378] __ibmvnic_open+0xf0/0x430 [ibmvnic]
      	Call Trace:
      	[c0000000548e3a80] [0000000000000006] 0x6 (unreliable)
      	[c0000000548e3ab0] [c0080000029b55d0] __ibmvnic_open+0x348/0x430 [ibmvnic]
      	[c0000000548e3b40] [c0080000029bcc28] __ibmvnic_reset+0x500/0xdf0 [ibmvnic]
      	[c0000000548e3c60] [c000000000176228] process_one_work+0x288/0x570
      	[c0000000548e3d00] [c000000000176588] worker_thread+0x78/0x660
      	[c0000000548e3da0] [c0000000001822f0] kthread+0x1c0/0x1d0
      	[c0000000548e3e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64
      	Instruction dump:
      	7d2948f8 792307e0 4e800020 60000000 3c4c01eb 384239e0 f821ffd1 39430010
      	38a0fff6 e92d1100 f9210028 39200000 <e9030010> f9010020 60420000 e9210020
      	---[ end trace 5f8033b08fd27706 ]---
      
      Fixes: ed651a10 ("ibmvnic: Updated reset handling")
      Reported-by: default avatarAbdul Haleem <abdhalee@linux.vnet.ibm.com>
      Signed-off-by: default avatarSukadev Bhattiprolu <sukadev@linux.ibm.com>
      Reviewed-by: default avatarDany Madden <drt@linux.ibm.com>
      Link: https://lore.kernel.org/r/20220208001918.900602-1-sukadev@linux.ibm.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      61772b09
    • Jakub Kicinski's avatar
      Merge branch 'more-dsa-fixes-for-devres-mdiobus_-alloc-register' · 1335648f
      Jakub Kicinski authored
      Vladimir Oltean says:
      
      ====================
      More DSA fixes for devres + mdiobus_{alloc,register}
      
      The initial patch series "[net,0/2] Fix mdiobus users with devres"
      https://patchwork.kernel.org/project/netdevbpf/cover/20210920214209.1733768-1-vladimir.oltean@nxp.com/
      fixed some instances where DSA drivers on slow buses (SPI, I2C) trigger
      a panic (changed since then to a warn) in mdiobus_free. That was due to
      devres calling mdiobus_free() with no prior mdiobus_unregister(), which
      again was due to commit ac3a68d5 ("net: phy: don't abuse devres in
      devm_mdiobus_register()") by Bartosz Golaszewski.
      
      Rafael Richter and Daniel Klauer report yet another variation on that
      theme, but this time it applies to any DSA switch driver, not just those
      on buses which have a "->shutdown() calls ->remove() which unregisters
      children" sequence.
      
      Their setup is that of an LX2160A DPAA2 SoC driving a Marvell DSA switch
      (MDIO). DPAA2 Ethernet drivers probe on the "fsl-mc" bus
      (drivers/bus/fsl-mc/fsl-mc-bus.c). This bus is meant to be the
      kernel-side representation of the networking objects kept by the
      Management Complex (MC) firmware.
      
      The fsl-mc bus driver has this pattern:
      
      static void fsl_mc_bus_shutdown(struct platform_device *pdev)
      {
      	fsl_mc_bus_remove(pdev);
      }
      
      which proceeds to remove the children on the bus. Among those children,
      the dpaa2-eth network driver.
      
      When dpaa2-eth is a DSA master, this removal of the master on shutdown
      trips up the device link created by dsa_master_setup(), and as such, the
      Marvell switch is also removed.
      
      From this point on, readers can revisit the description of commits
      74b6d7d1 ("net: dsa: realtek: register the MDIO bus under devres")
      5135e96a ("net: dsa: don't allocate the slave_mii_bus using devres")
      
      since the prerequisites for the BUG_ON in mdiobus_free() have been
      accomplished if there is a devres mismatch between mdiobus_alloc() and
      mdiobus_register().
      
      Most DSA drivers have this kind of mismatch, and upon my initial
      assessment I had not realized the possibility described above, so I
      didn't fix it. This patch series walks through all drivers and makes
      them use either fully devres, or no devres.
      
      I am aware that there are DSA drivers that are only known to be tested
      with a single DSA master, so some patches are probably overkill for
      them. But code is copy-pasted from so many sources without fully
      understanding the differences, that I think it's better to not leave an
      in-tree source of inspiration that may lead to subtle breakage if not
      adapted properly.
      ====================
      
      Link: https://lore.kernel.org/r/20220207161553.579933-1-vladimir.oltean@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      1335648f
    • Vladimir Oltean's avatar
      net: dsa: lantiq_gswip: don't use devres for mdiobus · 0d120dfb
      Vladimir Oltean authored
      As explained in commits:
      74b6d7d1 ("net: dsa: realtek: register the MDIO bus under devres")
      5135e96a ("net: dsa: don't allocate the slave_mii_bus using devres")
      
      mdiobus_free() will panic when called from devm_mdiobus_free() <-
      devres_release_all() <- __device_release_driver(), and that mdiobus was
      not previously unregistered.
      
      The GSWIP switch is a platform device, so the initial set of constraints
      that I thought would cause this (I2C or SPI buses which call ->remove on
      ->shutdown) do not apply. But there is one more which applies here.
      
      If the DSA master itself is on a bus that calls ->remove from ->shutdown
      (like dpaa2-eth, which is on the fsl-mc bus), there is a device link
      between the switch and the DSA master, and device_links_unbind_consumers()
      will unbind the GSWIP switch driver on shutdown.
      
      So the same treatment must be applied to all DSA switch drivers, which
      is: either use devres for both the mdiobus allocation and registration,
      or don't use devres at all.
      
      The gswip driver has the code structure in place for orderly mdiobus
      removal, so just replace devm_mdiobus_alloc() with the non-devres
      variant, and add manual free where necessary, to ensure that we don't
      let devres free a still-registered bus.
      
      Fixes: ac3a68d5 ("net: phy: don't abuse devres in devm_mdiobus_register()")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      0d120dfb
    • Vladimir Oltean's avatar
      net: dsa: mt7530: fix kernel bug in mdiobus_free() when unbinding · 9ffe3d09
      Vladimir Oltean authored
      Nobody in this driver calls mdiobus_unregister(), which is necessary if
      mdiobus_register() completes successfully. So if the devres callbacks
      that free the mdiobus get invoked (this is the case when unbinding the
      driver), mdiobus_free() will BUG if the mdiobus is still registered,
      which it is.
      
      My speculation is that this is due to the fact that prior to commit
      ac3a68d5 ("net: phy: don't abuse devres in devm_mdiobus_register()")
      from June 2020, _devm_mdiobus_free() used to call mdiobus_unregister().
      But at the time that the mt7530 support was introduced in May 2021, the
      API was already changed. It's therefore likely that the blamed patch was
      developed on an older tree, and incorrectly adapted to net-next. This
      makes the Fixes: tag correct.
      
      Fix the problem by using the devres variant of mdiobus_register.
      
      Fixes: ba751e28 ("net: dsa: mt7530: add interrupt support")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      9ffe3d09
    • Vladimir Oltean's avatar
      net: dsa: seville: register the mdiobus under devres · bd488afc
      Vladimir Oltean authored
      As explained in commits:
      74b6d7d1 ("net: dsa: realtek: register the MDIO bus under devres")
      5135e96a ("net: dsa: don't allocate the slave_mii_bus using devres")
      
      mdiobus_free() will panic when called from devm_mdiobus_free() <-
      devres_release_all() <- __device_release_driver(), and that mdiobus was
      not previously unregistered.
      
      The Seville VSC9959 switch is a platform device, so the initial set of
      constraints that I thought would cause this (I2C or SPI buses which call
      ->remove on ->shutdown) do not apply. But there is one more which
      applies here.
      
      If the DSA master itself is on a bus that calls ->remove from ->shutdown
      (like dpaa2-eth, which is on the fsl-mc bus), there is a device link
      between the switch and the DSA master, and device_links_unbind_consumers()
      will unbind the seville switch driver on shutdown.
      
      So the same treatment must be applied to all DSA switch drivers, which
      is: either use devres for both the mdiobus allocation and registration,
      or don't use devres at all.
      
      The seville driver has a code structure that could accommodate both the
      mdiobus_unregister and mdiobus_free calls, but it has an external
      dependency upon mscc_miim_setup() from mdio-mscc-miim.c, which calls
      devm_mdiobus_alloc_size() on its behalf. So rather than restructuring
      that, and exporting yet one more symbol mscc_miim_teardown(), let's work
      with devres and replace of_mdiobus_register with the devres variant.
      When we use all-devres, we can ensure that devres doesn't free a
      still-registered bus (it either runs both callbacks, or none).
      
      Fixes: ac3a68d5 ("net: phy: don't abuse devres in devm_mdiobus_register()")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      bd488afc
    • Vladimir Oltean's avatar
      net: dsa: felix: don't use devres for mdiobus · 209bdb7e
      Vladimir Oltean authored
      As explained in commits:
      74b6d7d1 ("net: dsa: realtek: register the MDIO bus under devres")
      5135e96a ("net: dsa: don't allocate the slave_mii_bus using devres")
      
      mdiobus_free() will panic when called from devm_mdiobus_free() <-
      devres_release_all() <- __device_release_driver(), and that mdiobus was
      not previously unregistered.
      
      The Felix VSC9959 switch is a PCI device, so the initial set of
      constraints that I thought would cause this (I2C or SPI buses which call
      ->remove on ->shutdown) do not apply. But there is one more which
      applies here.
      
      If the DSA master itself is on a bus that calls ->remove from ->shutdown
      (like dpaa2-eth, which is on the fsl-mc bus), there is a device link
      between the switch and the DSA master, and device_links_unbind_consumers()
      will unbind the felix switch driver on shutdown.
      
      So the same treatment must be applied to all DSA switch drivers, which
      is: either use devres for both the mdiobus allocation and registration,
      or don't use devres at all.
      
      The felix driver has the code structure in place for orderly mdiobus
      removal, so just replace devm_mdiobus_alloc_size() with the non-devres
      variant, and add manual free where necessary, to ensure that we don't
      let devres free a still-registered bus.
      
      Fixes: ac3a68d5 ("net: phy: don't abuse devres in devm_mdiobus_register()")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      209bdb7e
    • Vladimir Oltean's avatar
      net: dsa: bcm_sf2: don't use devres for mdiobus · 08f1a208
      Vladimir Oltean authored
      As explained in commits:
      74b6d7d1 ("net: dsa: realtek: register the MDIO bus under devres")
      5135e96a ("net: dsa: don't allocate the slave_mii_bus using devres")
      
      mdiobus_free() will panic when called from devm_mdiobus_free() <-
      devres_release_all() <- __device_release_driver(), and that mdiobus was
      not previously unregistered.
      
      The Starfighter 2 is a platform device, so the initial set of
      constraints that I thought would cause this (I2C or SPI buses which call
      ->remove on ->shutdown) do not apply. But there is one more which
      applies here.
      
      If the DSA master itself is on a bus that calls ->remove from ->shutdown
      (like dpaa2-eth, which is on the fsl-mc bus), there is a device link
      between the switch and the DSA master, and device_links_unbind_consumers()
      will unbind the bcm_sf2 switch driver on shutdown.
      
      So the same treatment must be applied to all DSA switch drivers, which
      is: either use devres for both the mdiobus allocation and registration,
      or don't use devres at all.
      
      The bcm_sf2 driver has the code structure in place for orderly mdiobus
      removal, so just replace devm_mdiobus_alloc() with the non-devres
      variant, and add manual free where necessary, to ensure that we don't
      let devres free a still-registered bus.
      
      Fixes: ac3a68d5 ("net: phy: don't abuse devres in devm_mdiobus_register()")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      08f1a208
    • Vladimir Oltean's avatar
      net: dsa: ar9331: register the mdiobus under devres · 50facd86
      Vladimir Oltean authored
      As explained in commits:
      74b6d7d1 ("net: dsa: realtek: register the MDIO bus under devres")
      5135e96a ("net: dsa: don't allocate the slave_mii_bus using devres")
      
      mdiobus_free() will panic when called from devm_mdiobus_free() <-
      devres_release_all() <- __device_release_driver(), and that mdiobus was
      not previously unregistered.
      
      The ar9331 is an MDIO device, so the initial set of constraints that I
      thought would cause this (I2C or SPI buses which call ->remove on
      ->shutdown) do not apply. But there is one more which applies here.
      
      If the DSA master itself is on a bus that calls ->remove from ->shutdown
      (like dpaa2-eth, which is on the fsl-mc bus), there is a device link
      between the switch and the DSA master, and device_links_unbind_consumers()
      will unbind the ar9331 switch driver on shutdown.
      
      So the same treatment must be applied to all DSA switch drivers, which
      is: either use devres for both the mdiobus allocation and registration,
      or don't use devres at all.
      
      The ar9331 driver doesn't have a complex code structure for mdiobus
      removal, so just replace of_mdiobus_register with the devres variant in
      order to be all-devres and ensure that we don't free a still-registered
      bus.
      
      Fixes: ac3a68d5 ("net: phy: don't abuse devres in devm_mdiobus_register()")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Tested-by: default avatarOleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      50facd86
    • Vladimir Oltean's avatar
      net: dsa: mv88e6xxx: don't use devres for mdiobus · f53a2ce8
      Vladimir Oltean authored
      As explained in commits:
      74b6d7d1 ("net: dsa: realtek: register the MDIO bus under devres")
      5135e96a ("net: dsa: don't allocate the slave_mii_bus using devres")
      
      mdiobus_free() will panic when called from devm_mdiobus_free() <-
      devres_release_all() <- __device_release_driver(), and that mdiobus was
      not previously unregistered.
      
      The mv88e6xxx is an MDIO device, so the initial set of constraints that
      I thought would cause this (I2C or SPI buses which call ->remove on
      ->shutdown) do not apply. But there is one more which applies here.
      
      If the DSA master itself is on a bus that calls ->remove from ->shutdown
      (like dpaa2-eth, which is on the fsl-mc bus), there is a device link
      between the switch and the DSA master, and device_links_unbind_consumers()
      will unbind the Marvell switch driver on shutdown.
      
      systemd-shutdown[1]: Powering off.
      mv88e6085 0x0000000008b96000:00 sw_gl0: Link is Down
      fsl-mc dpbp.9: Removing from iommu group 7
      fsl-mc dpbp.8: Removing from iommu group 7
      ------------[ cut here ]------------
      kernel BUG at drivers/net/phy/mdio_bus.c:677!
      Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 5.16.5-00040-gdc05f73788e5 #15
      pc : mdiobus_free+0x44/0x50
      lr : devm_mdiobus_free+0x10/0x20
      Call trace:
       mdiobus_free+0x44/0x50
       devm_mdiobus_free+0x10/0x20
       devres_release_all+0xa0/0x100
       __device_release_driver+0x190/0x220
       device_release_driver_internal+0xac/0xb0
       device_links_unbind_consumers+0xd4/0x100
       __device_release_driver+0x4c/0x220
       device_release_driver_internal+0xac/0xb0
       device_links_unbind_consumers+0xd4/0x100
       __device_release_driver+0x94/0x220
       device_release_driver+0x28/0x40
       bus_remove_device+0x118/0x124
       device_del+0x174/0x420
       fsl_mc_device_remove+0x24/0x40
       __fsl_mc_device_remove+0xc/0x20
       device_for_each_child+0x58/0xa0
       dprc_remove+0x90/0xb0
       fsl_mc_driver_remove+0x20/0x5c
       __device_release_driver+0x21c/0x220
       device_release_driver+0x28/0x40
       bus_remove_device+0x118/0x124
       device_del+0x174/0x420
       fsl_mc_bus_remove+0x80/0x100
       fsl_mc_bus_shutdown+0xc/0x1c
       platform_shutdown+0x20/0x30
       device_shutdown+0x154/0x330
       kernel_power_off+0x34/0x6c
       __do_sys_reboot+0x15c/0x250
       __arm64_sys_reboot+0x20/0x30
       invoke_syscall.constprop.0+0x4c/0xe0
       do_el0_svc+0x4c/0x150
       el0_svc+0x24/0xb0
       el0t_64_sync_handler+0xa8/0xb0
       el0t_64_sync+0x178/0x17c
      
      So the same treatment must be applied to all DSA switch drivers, which
      is: either use devres for both the mdiobus allocation and registration,
      or don't use devres at all.
      
      The Marvell driver already has a good structure for mdiobus removal, so
      just plug in mdiobus_free and get rid of devres.
      
      Fixes: ac3a68d5 ("net: phy: don't abuse devres in devm_mdiobus_register()")
      Reported-by: default avatarRafael Richter <Rafael.Richter@gin.de>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: default avatarDaniel Klauer <daniel.klauer@gin.de>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      f53a2ce8
    • Mahesh Bandewar's avatar
      bonding: pair enable_port with slave_arr_updates · 23de0d7b
      Mahesh Bandewar authored
      When 803.2ad mode enables a participating port, it should update
      the slave-array. I have observed that the member links are participating
      and are part of the active aggregator while the traffic is egressing via
      only one member link (in a case where two links are participating). Via
      kprobes I discovered that slave-arr has only one link added while
      the other participating link wasn't part of the slave-arr.
      
      I couldn't see what caused that situation but the simple code-walk
      through provided me hints that the enable_port wasn't always associated
      with the slave-array update.
      
      Fixes: ee637714 ("bonding: Simplify the xmit function for modes that use xmit_hash")
      Signed-off-by: default avatarMahesh Bandewar <maheshb@google.com>
      Acked-by: default avatarJay Vosburgh <jay.vosburgh@canonical.com>
      Link: https://lore.kernel.org/r/20220207222901.1795287-1-maheshb@google.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      23de0d7b
    • Tao Liu's avatar
      gve: Recording rx queue before sending to napi · 084cbb2e
      Tao Liu authored
      This caused a significant performance degredation when using generic XDP
      with multiple queues.
      
      Fixes: f5cedc84 ("gve: Add transmit and receive support")
      Signed-off-by: default avatarTao Liu <xliutaox@google.com>
      Link: https://lore.kernel.org/r/20220207175901.2486596-1-jeroendb@google.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      084cbb2e
  2. 08 Feb, 2022 1 commit
  3. 07 Feb, 2022 1 commit
  4. 06 Feb, 2022 2 commits
    • Eric Dumazet's avatar
      net/smc: fix ref_tracker issue in smc_pnet_add() · 28f92221
      Eric Dumazet authored
      I added the netdev_tracker_alloc() right after ndev was
      stored into the newly allocated object:
      
        new_pe->ndev = ndev;
        if (ndev)
            netdev_tracker_alloc(ndev, &new_pe->dev_tracker, GFP_KERNEL);
      
      But I missed that later, we could end up freeing new_pe,
      then calling dev_put(ndev) to release the reference on ndev.
      
      The new_pe->dev_tracker would not be freed.
      
      To solve this issue, move the netdev_tracker_alloc() call to
      the point we know for sure new_pe will be kept.
      
      syzbot report (on net-next tree, but the bug is present in net tree)
      WARNING: CPU: 0 PID: 6019 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31
      Modules linked in:
      CPU: 0 PID: 6019 Comm: syz-executor.3 Not tainted 5.17.0-rc2-syzkaller-00650-g5a8fb33e #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31
      Code: 1d f4 70 a0 09 31 ff 89 de e8 4d bc 99 fd 84 db 75 e0 e8 64 b8 99 fd 48 c7 c7 20 0c 06 8a c6 05 d4 70 a0 09 01 e8 9e 4e 28 05 <0f> 0b eb c4 e8 48 b8 99 fd 0f b6 1d c3 70 a0 09 31 ff 89 de e8 18
      RSP: 0018:ffffc900043b7400 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000040000 RSI: ffffffff815fb318 RDI: fffff52000876e72
      RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff815f507e R11: 0000000000000000 R12: 1ffff92000876e85
      R13: 0000000000000000 R14: ffff88805c1c6600 R15: 0000000000000000
      FS:  00007f1ef6feb700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b2d02b000 CR3: 00000000223f4000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       __refcount_dec include/linux/refcount.h:344 [inline]
       refcount_dec include/linux/refcount.h:359 [inline]
       ref_tracker_free+0x53f/0x6c0 lib/ref_tracker.c:119
       netdev_tracker_free include/linux/netdevice.h:3867 [inline]
       dev_put_track include/linux/netdevice.h:3884 [inline]
       dev_put_track include/linux/netdevice.h:3880 [inline]
       dev_put include/linux/netdevice.h:3910 [inline]
       smc_pnet_add_eth net/smc/smc_pnet.c:399 [inline]
       smc_pnet_enter net/smc/smc_pnet.c:493 [inline]
       smc_pnet_add+0x5fc/0x15f0 net/smc/smc_pnet.c:556
       genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:731
       genl_family_rcv_msg net/netlink/genetlink.c:775 [inline]
       genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:792
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
       genl_rcv+0x24/0x40 net/netlink/genetlink.c:803
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:725
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: b6064524 ("net/smc: add net device tracker to struct smc_pnetentry")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      28f92221
    • Pavel Parkhomenko's avatar
      net: phy: marvell: Fix MDI-x polarity setting in 88e1118-compatible PHYs · aec12836
      Pavel Parkhomenko authored
      When setting up autonegotiation for 88E1118R and compatible PHYs,
      a software reset of PHY is issued before setting up polarity.
      This is incorrect as changes of MDI Crossover Mode bits are
      disruptive to the normal operation and must be followed by a
      software reset to take effect. Let's patch m88e1118_config_aneg()
      to fix the issue mentioned before by invoking software reset
      of the PHY just after setting up MDI-x polarity.
      
      Fixes: 605f196e ("phy: Add support for Marvell 88E1118 PHY")
      Signed-off-by: default avatarPavel Parkhomenko <Pavel.Parkhomenko@baikalelectronics.ru>
      Reviewed-by: default avatarSerge Semin <fancer.lancer@gmail.com>
      Suggested-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      aec12836
  5. 05 Feb, 2022 2 commits
  6. 04 Feb, 2022 9 commits
    • Jakub Kicinski's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 40106e00
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      1) Don't refresh timeout for SCTP flows in CLOSED state.
      
      2) Don't allow access to transport header if fragment offset is set on.
      
      3) Reinitialize internal conntrack state for retransmitted TCP
         syn-ack packet.
      
      4) Update MAINTAINER file to add the Netfilter group tree. Moving
         forward, Florian Westphal has access to this tree so he can also
         send pull requests.
      
      5) Set on IPS_HELPER for entries created via ctnetlink, otherwise NAT
         might zap it.
      
      All patches from Florian Westphal.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: ctnetlink: disable helper autoassign
        MAINTAINERS: netfilter: update git links
        netfilter: conntrack: re-init state for retransmitted syn-ack
        netfilter: conntrack: move synack init code to helper
        netfilter: nft_payload: don't allow th access for fragments
        netfilter: conntrack: don't refresh sctp entries in closed state
      ====================
      
      Link: https://lore.kernel.org/r/20220204151903.320786-1-pablo@netfilter.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      40106e00
    • Samuel Mendoza-Jonas's avatar
      ixgbevf: Require large buffers for build_skb on 82599VF · fe68195d
      Samuel Mendoza-Jonas authored
      From 4.17 onwards the ixgbevf driver uses build_skb() to build an skb
      around new data in the page buffer shared with the ixgbe PF.
      This uses either a 2K or 3K buffer, and offsets the DMA mapping by
      NET_SKB_PAD + NET_IP_ALIGN. When using a smaller buffer RXDCTL is set to
      ensure the PF does not write a full 2K bytes into the buffer, which is
      actually 2K minus the offset.
      
      However on the 82599 virtual function, the RXDCTL mechanism is not
      available. The driver attempts to work around this by using the SET_LPE
      mailbox method to lower the maximm frame size, but the ixgbe PF driver
      ignores this in order to keep the PF and all VFs in sync[0].
      
      This means the PF will write up to the full 2K set in SRRCTL, causing it
      to write NET_SKB_PAD + NET_IP_ALIGN bytes past the end of the buffer.
      With 4K pages split into two buffers, this means it either writes
      NET_SKB_PAD + NET_IP_ALIGN bytes past the first buffer (and into the
      second), or NET_SKB_PAD + NET_IP_ALIGN bytes past the end of the DMA
      mapping.
      
      Avoid this by only enabling build_skb when using "large" buffers (3K).
      These are placed in each half of an order-1 page, preventing the PF from
      writing past the end of the mapping.
      
      [0]: Technically it only ever raises the max frame size, see
      ixgbe_set_vf_lpe() in ixgbe_sriov.c
      
      Fixes: f15c5ba5 ("ixgbevf: add support for using order 1 pages to receive large frames")
      Signed-off-by: default avatarSamuel Mendoza-Jonas <samjonas@amazon.com>
      Tested-by: default avatarKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fe68195d
    • Florian Westphal's avatar
      netfilter: ctnetlink: disable helper autoassign · d1ca60ef
      Florian Westphal authored
      When userspace, e.g. conntrackd, inserts an entry with a specified helper,
      its possible that the helper is lost immediately after its added:
      
      ctnetlink_create_conntrack
        -> nf_ct_helper_ext_add + assign helper
          -> ctnetlink_setup_nat
            -> ctnetlink_parse_nat_setup
               -> parse_nat_setup -> nfnetlink_parse_nat_setup
      	                       -> nf_nat_setup_info
                                       -> nf_conntrack_alter_reply
                                         -> __nf_ct_try_assign_helper
      
      ... and __nf_ct_try_assign_helper will zero the helper again.
      
      Set IPS_HELPER bit to bypass auto-assign logic, its unwanted, just like
      when helper is assigned via ruleset.
      
      Dropped old 'not strictly necessary' comment, it referred to use of
      rcu_assign_pointer() before it got replaced by RCU_INIT_POINTER().
      
      NB: Fixes tag intentionally incorrect, this extends the referenced commit,
      but this change won't build without IPS_HELPER introduced there.
      
      Fixes: 6714cf54 ("netfilter: nf_conntrack: fix explicit helper attachment and NAT")
      Reported-by: default avatarPham Thanh Tuyen <phamtyn@gmail.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      d1ca60ef
    • Florian Westphal's avatar
      MAINTAINERS: netfilter: update git links · 1f6339e0
      Florian Westphal authored
      nf and nf-next have a new location.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      1f6339e0
    • Florian Westphal's avatar
      netfilter: conntrack: re-init state for retransmitted syn-ack · 82b72cb9
      Florian Westphal authored
      TCP conntrack assumes that a syn-ack retransmit is identical to the
      previous syn-ack.  This isn't correct and causes stuck 3whs in some more
      esoteric scenarios.  tcpdump to illustrate the problem:
      
       client > server: Flags [S] seq 1365731894, win 29200, [mss 1460,sackOK,TS val 2083035583 ecr 0,wscale 7]
       server > client: Flags [S.] seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215367629 ecr 2082921663]
      
      Note the invalid/outdated synack ack number.
      Conntrack marks this syn-ack as out-of-window/invalid, but it did
      initialize the reply direction parameters based on this packets content.
      
       client > server: Flags [S] seq 1365731894, win 29200, [mss 1460,sackOK,TS val 2083036623 ecr 0,wscale 7]
      
      ... retransmit...
      
       server > client: Flags [S.], seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215368644 ecr 2082921663]
      
      and another bogus synack. This repeats, then client re-uses for a new
      attempt:
      
      client > server: Flags [S], seq 2375731741, win 29200, [mss 1460,sackOK,TS val 2083100223 ecr 0,wscale 7]
      server > client: Flags [S.], seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215430754 ecr 2082921663]
      
      ... but still gets a invalid syn-ack.
      
      This repeats until:
      
       server > client: Flags [S.], seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215437785 ecr 2082921663]
       server > client: Flags [R.], seq 145824454, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215443451 ecr 2082921663]
       client > server: Flags [S], seq 2375731741, win 29200, [mss 1460,sackOK,TS val 2083115583 ecr 0,wscale 7]
       server > client: Flags [S.], seq 162602410, ack 2375731742, win 65535, [mss 8952,wscale 5,TS val 3215445754 ecr 2083115583]
      
      This syn-ack has the correct ack number, but conntrack flags it as
      invalid: The internal state was created from the first syn-ack seen
      so the sequence number of the syn-ack is treated as being outside of
      the announced window.
      
      Don't assume that retransmitted syn-ack is identical to previous one.
      Treat it like the first syn-ack and reinit state.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Acked-by: default avatarJozsef Kadlecsik <kadlec@netfilter.org>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      82b72cb9
    • Florian Westphal's avatar
      netfilter: conntrack: move synack init code to helper · cc4f9d62
      Florian Westphal authored
      It seems more readable to use a common helper in the followup fix rather
      than copypaste or goto.
      
      No functional change intended.  The function is only called for syn-ack
      or syn in repy direction in case of simultaneous open.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Acked-by: default avatarJozsef Kadlecsik <kadlec@netfilter.org>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      cc4f9d62
    • Florian Westphal's avatar
      netfilter: nft_payload: don't allow th access for fragments · a9e8503d
      Florian Westphal authored
      Loads relative to ->thoff naturally expect that this points to the
      transport header, but this is only true if pkt->fragoff == 0.
      
      This has little effect for rulesets with connection tracking/nat because
      these enable ip defra. For other rulesets this prevents false matches.
      
      Fixes: 96518518 ("netfilter: add nftables")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      a9e8503d
    • Florian Westphal's avatar
      netfilter: conntrack: don't refresh sctp entries in closed state · 77b33719
      Florian Westphal authored
      Vivek Thrivikraman reported:
       An SCTP server application which is accessed continuously by client
       application.
       When the session disconnects the client retries to establish a connection.
       After restart of SCTP server application the session is not established
       because of stale conntrack entry with connection state CLOSED as below.
      
       (removing this entry manually established new connection):
      
       sctp 9 CLOSED src=10.141.189.233 [..]  [ASSURED]
      
      Just skip timeout update of closed entries, we don't want them to
      stay around forever.
      Reported-and-tested-by: default avatarVivek Thrivikraman <vivek.thrivikraman@est.tech>
      Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1579Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      77b33719
    • Steen Hegelund's avatar
      net: sparx5: Fix get_stat64 crash in tcpdump · ed14fc7a
      Steen Hegelund authored
      This problem was found with Sparx5 when the tcpdump tool requests the
      do_get_stats64 (sparx5_get_stats64) statistic.
      
      The portstats pointer was incorrectly incremented when fetching priority
      based statistics.
      
      Fixes: af4b1102 (net: sparx5: add ethtool configuration and statistics support)
      Signed-off-by: default avatarSteen Hegelund <steen.hegelund@microchip.com>
      Link: https://lore.kernel.org/r/20220203102900.528987-1-steen.hegelund@microchip.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      ed14fc7a