1. 28 Dec, 2022 13 commits
  2. 26 Dec, 2022 5 commits
    • net: ethernet: marvell: octeontx2: Fix uninitialized variable warning · d3805695
      Anuradha Weeraman authored
      Fix for uninitialized variable warning.
      
      Addresses-Coverity: ("Uninitialized scalar variable")
      Signed-off-by: Anuradha Weeraman <anuradha@debian.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfc: Fix potential resource leaks · df49908f
      Miaoqian Lin authored
      nfc_get_device() takes a reference on the device; add the missing
      nfc_put_device() calls to release it when it is no longer needed.
      Also fix the style warning by using the error code EOPNOTSUPP instead
      of ENOTSUPP.
      
      Fixes: 5ce3f32b ("NFC: netlink: SE API implementation")
      Fixes: 29e76924 ("nfc: netlink: Add capability to reply to vendor_cmd with data")
      Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
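The fix follows a simple rule: every successful get must be balanced by a put on every exit path, including early error returns. Below is a minimal userspace sketch that assumes a plain, non-atomic reference counter. `struct dev`, `dev_get()`, `dev_put()` and `do_vendor_cmd()` are hypothetical stand-ins for nfc_get_device()/nfc_put_device() and the netlink handlers, not the NFC API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device with a plain (not thread-safe) reference counter. */
struct dev {
	int refcnt;
	bool has_vendor_ops;
};

/* Take a reference; plays the role of nfc_get_device(). */
static struct dev *dev_get(struct dev *d)
{
	if (d)
		d->refcnt++;
	return d;
}

/* Drop a reference; plays the role of nfc_put_device(). */
static void dev_put(struct dev *d)
{
	if (!d)
		return;
	assert(d->refcnt > 0);           /* catch unbalanced puts */
	d->refcnt--;
}

/* Handler that releases its reference on every return path. */
static int do_vendor_cmd(struct dev *table_entry)
{
	struct dev *d = dev_get(table_entry);
	int rc = 0;

	if (!d)
		return -ENODEV;          /* nothing was taken, nothing to put */

	if (!d->has_vendor_ops) {
		rc = -EOPNOTSUPP;        /* EOPNOTSUPP rather than ENOTSUPP */
		goto out_put;            /* the error path still drops the ref */
	}

	/* ... the actual command would run here ... */

out_put:
	dev_put(d);
	return rc;
}
```

Routing every exit through a single `out_put` label keeps the get/put pairing visible in one place. This is the usual kernel idiom for avoiding this class of leak.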
    • net: dsa: mv88e6xxx: depend on PTP conditionally · 30e72553
      Johnny S. Lee authored
      PTP hardware timestamping related objects are not linked when PTP
      support for MV88E6xxx (NET_DSA_MV88E6XXX_PTP) is disabled; therefore,
      NET_DSA_MV88E6XXX should not depend on PTP_1588_CLOCK_OPTIONAL
      regardless of NET_DSA_MV88E6XXX_PTP.
      
      Instead, condition more strictly on how NET_DSA_MV88E6XXX_PTP's
      dependencies are met, making sure that it cannot be enabled when
      NET_DSA_MV88E6XXX=y and PTP_1588_CLOCK=m.
      
      In other words, this commit allows NET_DSA_MV88E6XXX to be built-in
      while PTP_1588_CLOCK is a module, as long as NET_DSA_MV88E6XXX_PTP is
      prevented from being enabled.
      
      Fixes: e5f31552 ("ethernet: fix PTP_1588_CLOCK dependencies")
      Signed-off-by: Johnny S. Lee <foss@jsl.io>
      Signed-off-by: David S. Miller <davem@davemloft.net>
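The intended constraint can be written as a predicate over Kconfig tristate values. This is a hypothetical C model of the rule described above (the enum and function name are illustrative), not the Kconfig syntax the patch actually uses. Under this model, the PTP sub-option needs both the driver and PTP_1588_CLOCK enabled, and it is forbidden when the driver is built in (y) while PTP_1588_CLOCK is a module (m).

```c
#include <assert.h>
#include <stdbool.h>

/* Kconfig tristate states, ordered n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/*
 * Hypothetical model: may NET_DSA_MV88E6XXX_PTP be enabled, given the
 * values of NET_DSA_MV88E6XXX (drv) and PTP_1588_CLOCK (ptp)?
 */
static bool mv88e6xxx_ptp_allowed(enum tristate drv, enum tristate ptp)
{
	assert(drv <= TRI_Y && ptp <= TRI_Y);
	if (drv == TRI_N || ptp == TRI_N)
		return false;          /* both must be enabled in some form */
	if (drv == TRI_Y && ptp == TRI_M)
		return false;          /* built-in code cannot call into a module */
	return true;
}
```

In this model, NET_DSA_MV88E6XXX=y with PTP_1588_CLOCK=m is still a valid configuration for the driver itself, provided the PTP sub-option stays off. That is the combination the commit makes possible.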
    • qlcnic: prevent ->dcb use-after-free on qlcnic_dcb_enable() failure · 13a7c896
      Daniil Tatianin authored
      adapter->dcb would be silently freed inside qlcnic_dcb_enable() if
      qlcnic_dcb_attach() returned an error, which always happens under OOM
      conditions. This led to a use-after-free because both existing callers
      invoke qlcnic_dcb_get_info() on the obtained pointer, which may already
      be freed at that point.
      
      Propagate errors from qlcnic_dcb_enable(), and instead free the dcb
      pointer at the call site using qlcnic_dcb_free(). This also removes the
      now-unused qlcnic_clear_dcb_ops() helper, which was a simple wrapper
      around kfree() that also caused memory leaks for a partially
      initialized dcb.
      
      Found by Linux Verification Center (linuxtesting.org) with the SVACE
      static analysis tool.
      
      Fixes: 3c44bba1 ("qlcnic: Disable DCB operations from SR-IOV VFs")
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
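The patch adopts this ownership rule: a helper that fails reports the error and leaves the object alone, and the caller, which still owns the pointer, frees it with the full destructor. Below is a minimal sketch with hypothetical names (`struct dcb`, `dcb_attach()`, `dcb_enable()`, `dcb_free()`, `adapter_setup()`). It assumes attach can fail after partially initializing the object, as it does under memory pressure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct dcb {
	int *cfg;                          /* state set up by attach */
};

struct adapter {
	struct dcb *dcb;
	bool simulate_oom;                 /* test knob: force attach to fail */
};

/* Full destructor: also frees a partially initialized dcb's members. */
static void dcb_free(struct dcb *dcb)
{
	if (!dcb)
		return;
	free(dcb->cfg);
	free(dcb);
}

static int dcb_attach(struct adapter *a)
{
	assert(a->dcb);
	a->dcb->cfg = malloc(sizeof(*a->dcb->cfg));
	if (!a->dcb->cfg || a->simulate_oom)
		return -ENOMEM;            /* dcb may now be partially initialized */
	*a->dcb->cfg = 0;
	return 0;
}

/* Fixed shape: report the error, never free a->dcb behind the caller. */
static int dcb_enable(struct adapter *a)
{
	return a->dcb ? dcb_attach(a) : 0;
}

/* The caller owns the pointer, so it frees and clears it on failure. */
static int adapter_setup(struct adapter *a)
{
	int err;

	a->dcb = calloc(1, sizeof(*a->dcb));
	if (!a->dcb)
		return -ENOMEM;

	err = dcb_enable(a);
	if (err) {
		dcb_free(a->dcb);
		a->dcb = NULL;             /* nothing dangling for later users */
		return err;
	}
	return 0;                          /* a->dcb is valid for get_info-style calls */
}
```

Clearing the pointer after freeing it means later code sees that DCB is unavailable instead of reading freed memory.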
    • net: sched: fix memory leak in tcindex_set_parms · 399ab7fe
      Hawkins Jiawei authored
      Syzkaller reports a memory leak as follows:
      ====================================
      BUG: memory leak
      unreferenced object 0xffff88810c287f00 (size 256):
        comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
          [<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline]
          [<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline]
          [<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline]
          [<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline]
          [<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342
          [<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553
          [<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147
          [<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082
          [<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540
          [<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
          [<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
          [<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
          [<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline]
          [<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734
          [<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482
          [<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
          [<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622
          [<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline]
          [<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline]
          [<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648
          [<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          [<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
          [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
      ====================================
      
      The kernel uses tcindex_change() to change the properties of an
      existing filter.
      
      The problem is that, during the change, if `old_r` is retrieved
      from `p->perfect`, the kernel uses tcindex_alloc_perfect_hash() to
      allocate new filter results and then uses
      tcindex_filter_result_init() to clear the old filter result without
      destroying its tcf_exts structure, which triggers the memory leak
      above.
      
      To be more specific, according to tcindex_lookup(), there are only
      two sources for `old_r`: it is retrieved either from `p->perfect` or
      from `p->h`.
      
        * If `old_r` is retrieved from `p->perfect`, the kernel uses
      tcindex_alloc_perfect_hash() to allocate new filter results. Then
      `r` is assigned `cp->perfect + handle`, which is newly allocated.
      So the condition `old_r && old_r != r` is true in this situation,
      and the kernel uses tcindex_filter_result_init() to clear the old
      filter result without destroying its tcf_exts structure.
      
        * If `old_r` is retrieved from `p->h`, then `p->perfect` is NULL
      according to tcindex_lookup(). Since `cp->h` is copied directly
      from `p->h` and `p->perfect` is NULL, `r` is assigned
      `tcindex_lookup(cp, handle)`, whose value should be the same as
      `old_r`. So the condition `old_r && old_r != r` is false in this
      situation, and the kernel skips the call to
      tcindex_filter_result_init() that clears the old filter result.
      
      So only when `old_r` is retrieved from `p->perfect` does the kernel
      use tcindex_filter_result_init() to clear the old filter result,
      which triggers the memory leak above.
      
      Since the old tcindex_data is already destroyed by
      tcindex_partial_destroy_work() on the tc_filter_wq workqueue at the
      end of tcindex_set_parms(), this patch fixes the leak by removing the
      code that clears the old filter result and delegating that work to
      the tc_filter_wq workqueue.
      
      Note that this patch does not introduce any other issues. If
      `old_r` is retrieved from `p->perfect`, this patch just delegates
      clearing of the old filter result to the tc_filter_wq workqueue; if
      `old_r` is retrieved from `p->h`, the kernel never reaches the code
      that clears the old filter result, so removing it has no effect.
      
      [Thanks to the suggestion from Jakub Kicinski, Cong Wang, Paolo Abeni
      and Dmitry Vyukov]
      
      Fixes: b9a24bb7 ("net_sched: properly handle failure case of tcf_exts_init()")
      Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/
      Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
      Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
      Cc: Cong Wang <cong.wang@bytedance.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
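The leak comes from reinitializing a structure in place while it still owns heap memory. Below is a minimal sketch of that failure mode next to the approach the fix takes, which is to leave the old object untouched and let a single destroy routine release it. The types `struct exts` and `struct result` and the live-allocation counter are hypothetical and do not reflect the real tcf_exts API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int live_allocs;                /* outstanding allocations */

struct exts {
	void **actions;                /* heap array, cf. kcalloc() in tcf_exts_init() */
};

struct result {
	struct exts exts;
	unsigned int classid;
};

static int exts_init(struct exts *e)
{
	e->actions = calloc(4, sizeof(*e->actions));
	if (!e->actions)
		return -1;
	live_allocs++;
	return 0;
}

static void exts_destroy(struct exts *e)
{
	if (!e->actions)
		return;
	assert(live_allocs > 0);
	free(e->actions);
	e->actions = NULL;
	live_allocs--;
}

/* Leaky pattern: zeroing drops the only pointer to e->actions. */
static void result_reinit_leaky(struct result *r)
{
	memset(r, 0, sizeof(*r));
}

/*
 * Fixed pattern: leave the old result intact and release it exactly
 * once from the deferred destroy path.
 */
static void result_destroy(struct result *r)
{
	exts_destroy(&r->exts);
}
```

In the patch itself, the second pattern corresponds to dropping the tcindex_filter_result_init() call and relying on tcindex_partial_destroy_work() running on tc_filter_wq.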
  3. 24 Dec, 2022 1 commit
    • Merge tag 'for-netdev' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · be1236fc
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 7 non-merge commits during the last 5 day(s) which contain
      a total of 11 files changed, 231 insertions(+), 3 deletions(-).
      
      The main changes are:
      
      1) Fix a splat in bpf_skb_generic_pop() under CHECKSUM_PARTIAL due to
         misuse of skb_postpull_rcsum(), from Jakub Kicinski with test case
         from Martin Lau.
      
      2) Fix BPF verifier's nullness propagation when registers are of
         type PTR_TO_BTF_ID, from Hao Sun.
      
      3) Fix bpftool build for JIT disassembler under statically built
         libllvm, from Anton Protopopov.
      
      4) Fix warnings reported by resolve_btfids when building vmlinux
         with CONFIG_SECURITY_NETWORK disabled, from Hou Tao.
      
      5) Minor fix up for BPF selftest gitignore, from Stanislav Fomichev.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 23 Dec, 2022 9 commits
  5. 22 Dec, 2022 12 commits
    • bpftool: Fix linkage with statically built libllvm · 55171f29
      Anton Protopopov authored
      Since commit eb9d1acf ("bpftool: Add LLVM as default library for
      disassembling JIT-ed programs"), bpftool may be linked with the libllvm
      library. This works fine when a shared libllvm library is available,
      but fails if we want to link bpftool with a statically built LLVM:
      
        [...]
        /usr/bin/ld: /usr/local/lib/libLLVMSupport.a(CrashRecoveryContext.cpp.o): in function `llvm::CrashRecoveryContextCleanup::~CrashRecoveryContextCleanup()':
        CrashRecoveryContext.cpp:(.text._ZN4llvm27CrashRecoveryContextCleanupD0Ev+0x17): undefined reference to `operator delete(void*, unsigned long)'
        /usr/bin/ld: /usr/local/lib/libLLVMSupport.a(CrashRecoveryContext.cpp.o): in function `llvm::CrashRecoveryContext::~CrashRecoveryContext()':
        CrashRecoveryContext.cpp:(.text._ZN4llvm20CrashRecoveryContextD2Ev+0xc8): undefined reference to `operator delete(void*, unsigned long)'
        [...]
      
      So in the case of static libllvm we need to explicitly link bpftool with
      required libraries, namely, libstdc++ and those provided by the `llvm-config
      --system-libs` command. We can distinguish between the shared and static cases
      by using the `llvm-config --shared-mode` command.
      
      Fixes: eb9d1acf ("bpftool: Add LLVM as default library for disassembling JIT-ed programs")
      Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/bpf/20221222102627.1643709-1-aspsk@isovalent.com
    • veth: Fix race with AF_XDP exposing old or uninitialized descriptors · fa349e39
      Shawn Bohrer authored
      When AF_XDP is used on a veth interface, the RX ring is updated in two
      steps.  veth_xdp_rcv() removes packet descriptors from the FILL ring,
      fills them, and places them in the RX ring, updating the cached_prod
      pointer.  Later, xdp_do_flush() syncs the RX ring prod pointer with the
      cached_prod pointer, allowing user-space to see the recently filled-in
      descriptors.  The rings are intended to be SPSC; however, the existing
      order in veth_poll allows xdp_do_flush() to run concurrently with
      another CPU, creating a race condition that allows user-space to see old
      or uninitialized descriptors in the RX ring.  This bug has been observed
      in production systems.
      
      To summarize, we are expecting this ordering:
      
      CPU 0 __xsk_rcv_zc()
      CPU 0 __xsk_map_flush()
      CPU 2 __xsk_rcv_zc()
      CPU 2 __xsk_map_flush()
      
      But we are seeing this order:
      
      CPU 0 __xsk_rcv_zc()
      CPU 2 __xsk_rcv_zc()
      CPU 0 __xsk_map_flush()
      CPU 2 __xsk_map_flush()
      
      This occurs because we rely on NAPI to ensure that only one napi_poll
      handler is running at a time for the given veth receive queue.
      napi_schedule_prep() will prevent multiple instances from getting
      scheduled. However, calling napi_complete_done() signals that this
      napi_poll is complete and allows subsequent calls to
      napi_schedule_prep() and __napi_schedule() to succeed in scheduling a
      concurrent napi_poll before xdp_do_flush() has been called.  For the
      veth driver, a concurrent call to napi_schedule_prep() and
      __napi_schedule() can occur on a different CPU because the veth xmit
      path can additionally schedule a napi_poll, creating the race.
      
      The fix, as suggested by Magnus Karlsson, is simply to move the
      xdp_do_flush() call before napi_complete_done().  This syncs the
      producer ring pointers before another instance of napi_poll can be
      scheduled on another CPU.  It will also slightly improve performance by
      moving the flush closer to when the descriptors were placed in the
      RX ring.
      
      Fixes: d1396004 ("veth: Add XDP TX and REDIRECT")
      Suggested-by: Magnus Karlsson <magnus.karlsson@gmail.com>
      Signed-off-by: Shawn Bohrer <sbohrer@cloudflare.com>
      Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
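The fix depends on the producer publishing its cached producer index before it signals that the poll is complete. Below is a simplified single-producer/single-consumer ring using C11 atomics, with hypothetical names (`struct ring`, `ring_produce()`, `ring_flush()`, `ring_consume()`, `poll_once()`). It models the cached_prod/prod split described above, not the actual xsk ring code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u                   /* power of two: indices are masked */

struct ring {
	uint64_t desc[RING_SIZE];
	uint32_t cached_prod;          /* producer-private write position */
	_Atomic uint32_t prod;         /* published position seen by the consumer */
	_Atomic uint32_t cons;         /* consumer position */
};

/* Write one descriptor privately; the consumer cannot see it yet. */
static bool ring_produce(struct ring *r, uint64_t d)
{
	uint32_t cons = atomic_load_explicit(&r->cons, memory_order_acquire);

	assert(r->cached_prod - cons <= RING_SIZE);
	if (r->cached_prod - cons == RING_SIZE)
		return false;                      /* ring full */
	r->desc[r->cached_prod & (RING_SIZE - 1)] = d;
	r->cached_prod++;
	return true;
}

/* Publish: the release store orders the descriptor writes before prod. */
static void ring_flush(struct ring *r)
{
	atomic_store_explicit(&r->prod, r->cached_prod, memory_order_release);
}

/* Consumer: reads only descriptors below the published prod. */
static bool ring_consume(struct ring *r, uint64_t *out)
{
	uint32_t cons = atomic_load_explicit(&r->cons, memory_order_relaxed);
	uint32_t prod = atomic_load_explicit(&r->prod, memory_order_acquire);

	if (cons == prod)
		return false;                      /* nothing published */
	*out = r->desc[cons & (RING_SIZE - 1)];
	atomic_store_explicit(&r->cons, cons + 1, memory_order_release);
	return true;
}

/*
 * One poll pass in the fixed order: fill, flush (cf. xdp_do_flush()),
 * and only then allow rescheduling (cf. napi_complete_done()).
 */
static int poll_once(struct ring *r, const uint64_t *pkts, int n,
		     _Atomic bool *scheduled)
{
	int done = 0;

	while (done < n && ring_produce(r, pkts[done]))
		done++;
	ring_flush(r);
	atomic_store_explicit(scheduled, false, memory_order_release);
	return done;
}
```

Because `scheduled` is cleared only after the flush, a poll that starts next on another CPU finds `prod` already synced with everything the previous poll wrote. Swapping the last two statements of poll_once() reopens the window described in the commit.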
    • net: lan966x: Fix configuration of the PCS · d717f947
      Horatiu Vultur authored
      When the PCS was taken out of reset, we were also mistakenly changing
      the speed to 100 Mbit. If the link went down, the link-up routine then
      set the link speed correctly. But if the link did not go down, the
      speed stayed forced at 100 Mbit even when it should have been
      something else.
      
      On lan966x, to set the link speed to 1G or 2.5G, a value of 1 needs to
      be written to DEV_CLOCK_CFG_LINK_SPEED. This is similar to the
      procedure in lan966x_port_init.
      
      The issue was reproduced with a 1000BASE-X SFP module using the
      commands:
      ip link set dev eth2 up
      ip addr add 10.97.10.2/24 dev eth2
      ethtool -s eth2 speed 1000 autoneg off
      
      Fixes: d28d6d2e ("net: lan966x: add port module support")
      Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
      Link: https://lore.kernel.org/r/20221221093315.939133-1-horatiu.vultur@microchip.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
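The root problem is a register update whose result for the speed field was wrong. Below is a minimal read-modify-write sketch over a simulated 32-bit register. The bit positions and masks (`PCS_RX_RST`, `PCS_TX_RST`, `LINK_SPEED_MASK`, etc.) are hypothetical and do not reflect the real DEV_CLOCK_CFG layout. Only the value 1 for 1G/2.5G comes from the text above, and programming the speed in the same update is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout; NOT the real lan966x DEV_CLOCK_CFG register. */
#define PCS_RX_RST          (1u << 4)
#define PCS_TX_RST          (1u << 5)
#define LINK_SPEED_SHIFT    0
#define LINK_SPEED_MASK     (0x3u << LINK_SPEED_SHIFT)
#define LINK_SPEED_1G_2G5   1u     /* value for 1G/2.5G, per the commit text */

/* Update only the bits in mask; all other bits keep their value. */
static uint32_t rmw(uint32_t reg, uint32_t mask, uint32_t val)
{
	return (reg & ~mask) | (val & mask);
}

/*
 * Take the PCS out of reset and, in the same update, program the speed
 * field explicitly to the 1G/2.5G value instead of leaving a 100 Mbit
 * encoding behind. Unrelated bits are preserved.
 */
static uint32_t pcs_out_of_reset(uint32_t reg)
{
	return rmw(reg, PCS_RX_RST | PCS_TX_RST | LINK_SPEED_MASK,
		   LINK_SPEED_1G_2G5 << LINK_SPEED_SHIFT);
}
```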
    • bonding: fix lockdep splat in bond_miimon_commit() · 42c7ded0
      Eric Dumazet authored
      bond_miimon_commit() is run while RTNL is held, not RCU.
      
      WARNING: suspicious RCU usage
      6.1.0-syzkaller-09671-g89529367 #0 Not tainted
      -----------------------------
      drivers/net/bonding/bond_main.c:2704 suspicious rcu_dereference_check() usage!
      
      Fixes: e95cc447 ("bonding: do failover when high prio link up")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Hangbin Liu <liuhangbin@gmail.com>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Link: https://lore.kernel.org/r/20221220130831.1480888-1-edumazet@google.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'mptcp-locking-fixes' · 43ae218f
      Jakub Kicinski authored
      Mat Martineau says:
      
      ====================
      mptcp: Locking fixes
      
      Two separate locking fixes for the networking tree:
      
      Patch 1 addresses an MPTCP fastopen error-path deadlock that was found
      with syzkaller.
      
      Patch 2 works around a lockdep false-positive between MPTCP listening and
      non-listening sockets at socket destruct time.
      ====================
      
      Link: https://lore.kernel.org/r/20221220195215.238353-1-mathew.j.martineau@linux.intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: fix lockdep false positive · fec3adfd
      Paolo Abeni authored
      MattB reported a lockdep splat in the mptcp listener code cleanup:
      
       WARNING: possible circular locking dependency detected
       packetdrill/14278 is trying to acquire lock:
       ffff888017d868f0 ((work_completion)(&msk->work)){+.+.}-{0:0}, at: __flush_work (kernel/workqueue.c:3069)
      
       but task is already holding lock:
       ffff888017d84130 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close (net/mptcp/protocol.c:2973)
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (sk_lock-AF_INET){+.+.}-{0:0}:
              __lock_acquire (kernel/locking/lockdep.c:5055)
              lock_acquire (kernel/locking/lockdep.c:466)
              lock_sock_nested (net/core/sock.c:3463)
              mptcp_worker (net/mptcp/protocol.c:2614)
              process_one_work (kernel/workqueue.c:2294)
              worker_thread (include/linux/list.h:292)
              kthread (kernel/kthread.c:376)
              ret_from_fork (arch/x86/entry/entry_64.S:312)
      
       -> #0 ((work_completion)(&msk->work)){+.+.}-{0:0}:
              check_prev_add (kernel/locking/lockdep.c:3098)
              validate_chain (kernel/locking/lockdep.c:3217)
              __lock_acquire (kernel/locking/lockdep.c:5055)
              lock_acquire (kernel/locking/lockdep.c:466)
              __flush_work (kernel/workqueue.c:3070)
              __cancel_work_timer (kernel/workqueue.c:3160)
              mptcp_cancel_work (net/mptcp/protocol.c:2758)
              mptcp_subflow_queue_clean (net/mptcp/subflow.c:1817)
              __mptcp_close_ssk (net/mptcp/protocol.c:2363)
              mptcp_destroy_common (net/mptcp/protocol.c:3170)
              mptcp_destroy (include/net/sock.h:1495)
              __mptcp_destroy_sock (net/mptcp/protocol.c:2886)
              __mptcp_close (net/mptcp/protocol.c:2959)
              mptcp_close (net/mptcp/protocol.c:2974)
              inet_release (net/ipv4/af_inet.c:432)
              __sock_release (net/socket.c:651)
              sock_close (net/socket.c:1367)
              __fput (fs/file_table.c:320)
              task_work_run (kernel/task_work.c:181 (discriminator 1))
              exit_to_user_mode_prepare (include/linux/resume_user_mode.h:49)
              syscall_exit_to_user_mode (kernel/entry/common.c:130)
              do_syscall_64 (arch/x86/entry/common.c:87)
              entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      
       other info that might help us debug this:
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(sk_lock-AF_INET);
                                      lock((work_completion)(&msk->work));
                                      lock(sk_lock-AF_INET);
         lock((work_completion)(&msk->work));
      
        *** DEADLOCK ***
      
      The report is actually a false positive, since the only existing lock
      nesting is the msk socket lock acquired by the mptcp work.
      cancel_work_sync() is invoked without the relevant socket lock being
      held, but under a different (the msk listener) socket lock.
      
      We could silence the splat by adding a per-workqueue dynamic lockdep
      key, but that looks like overkill. Instead, just tell lockdep that the
      msk socket lock is not held around cancel_work_sync().
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/322
      Fixes: 30e51b92 ("mptcp: fix unreleased socket in accept queue")
      Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: fix deadlock in fastopen error path · 7d803344
      Paolo Abeni authored
      MatM reported a deadlock at fastopening time:
      
      INFO: task syz-executor.0:11454 blocked for more than 143 seconds.
            Tainted: G S                 6.1.0-rc5-03226-gdb0157db5153 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      task:syz-executor.0  state:D stack:25104 pid:11454 ppid:424    flags:0x00004006
      Call Trace:
       <TASK>
       context_switch kernel/sched/core.c:5191 [inline]
       __schedule+0x5c2/0x1550 kernel/sched/core.c:6503
       schedule+0xe8/0x1c0 kernel/sched/core.c:6579
       __lock_sock+0x142/0x260 net/core/sock.c:2896
       lock_sock_nested+0xdb/0x100 net/core/sock.c:3466
       __mptcp_close_ssk+0x1a3/0x790 net/mptcp/protocol.c:2328
       mptcp_destroy_common+0x16a/0x650 net/mptcp/protocol.c:3171
       mptcp_disconnect+0xb8/0x450 net/mptcp/protocol.c:3019
       __inet_stream_connect+0x897/0xa40 net/ipv4/af_inet.c:720
       tcp_sendmsg_fastopen+0x3dd/0x740 net/ipv4/tcp.c:1200
       mptcp_sendmsg_fastopen net/mptcp/protocol.c:1682 [inline]
       mptcp_sendmsg+0x128a/0x1a50 net/mptcp/protocol.c:1721
       inet6_sendmsg+0x11f/0x150 net/ipv6/af_inet6.c:663
       sock_sendmsg_nosec net/socket.c:714 [inline]
       sock_sendmsg+0xf7/0x190 net/socket.c:734
       ____sys_sendmsg+0x336/0x970 net/socket.c:2476
       ___sys_sendmsg+0x122/0x1c0 net/socket.c:2530
       __sys_sendmmsg+0x18d/0x460 net/socket.c:2616
       __do_sys_sendmmsg net/socket.c:2645 [inline]
       __se_sys_sendmmsg net/socket.c:2642 [inline]
       __x64_sys_sendmmsg+0x9d/0x110 net/socket.c:2642
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7f5920a75e7d
      RSP: 002b:00007f59201e8028 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
      RAX: ffffffffffffffda RBX: 00007f5920bb4f80 RCX: 00007f5920a75e7d
      RDX: 0000000000000001 RSI: 0000000020002940 RDI: 0000000000000005
      RBP: 00007f5920ae7593 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000020004050 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000000000b R14: 00007f5920bb4f80 R15: 00007f59201c8000
       </TASK>
      
      In the error path, tcp_sendmsg_fastopen() ends up calling
      mptcp_disconnect(), and the latter tries to close each
      subflow, acquiring the socket lock on each of them.
      
      At fastopen time, we have a single subflow, and that subflow's
      socket lock is already held by the caller, causing the deadlock.
      
      We already track the 'fastopen in progress' status inside the msk
      socket. Use it to address the issue, making mptcp_disconnect() a
      no-op when invoked from the fastopen (error) path and doing the
      relevant cleanup after releasing the subflow socket lock.
      
      While at it, rename the fastopen status bit to something
      more meaningful.
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/321
      Fixes: fa9e5746 ("mptcp: fix abba deadlock on fastopen")
      Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
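This is a self-deadlock: the error path tries to take a lock that the same task already holds. Below is a minimal sketch of the shape of the fix, using hypothetical names (`struct msk`, `disconnect()`, `sendmsg_fastopen()`) and a boolean in place of the subflow socket lock. While the 'fastopen in progress' flag is set, disconnect() does nothing, and the caller runs the cleanup after it releases the lock.

```c
#include <assert.h>
#include <stdbool.h>

struct msk {
	bool subflow_locked;      /* stands in for the subflow socket lock */
	bool fastopening;         /* 'fastopen in progress' status bit */
	int cleanups;             /* how many times cleanup actually ran */
};

static void lock_subflow(struct msk *m)
{
	assert(!m->subflow_locked);   /* re-locking here would deadlock */
	m->subflow_locked = true;
}

static void unlock_subflow(struct msk *m)
{
	m->subflow_locked = false;
}

/* Close the subflow; a no-op while the fastopen path holds the lock. */
static void disconnect(struct msk *m)
{
	if (m->fastopening)
		return;
	lock_subflow(m);
	m->cleanups++;
	unlock_subflow(m);
}

/* Fastopen path: a failing connect calls disconnect() under the lock. */
static int sendmsg_fastopen(struct msk *m, bool connect_fails)
{
	int ret = 0;

	lock_subflow(m);
	m->fastopening = true;
	if (connect_fails) {
		disconnect(m);        /* skipped because of the flag */
		ret = -1;
	}
	m->fastopening = false;
	unlock_subflow(m);

	if (ret)
		disconnect(m);        /* deferred cleanup, lock now free */
	return ret;
}
```

Without the `fastopening` check, the first disconnect() call would trip the assertion. That assertion stands in for the task blocking forever in __lock_sock() in the trace above.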
    • nfp: fix schedule in atomic context when sync mc address · e20aa071
      Yinjun Zhang authored
      The `.ndo_set_rx_mode` callback is called in atomic context, so
      sleeping is not allowed in its implementation. Use a workqueue
      mechanism to avoid this issue.
      
      Fixes: de624864 ("nfp: add support for multicast filter")
      Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
      Reviewed-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20221220152100.1042774-1-simon.horman@corigine.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
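The general pattern keeps the atomic-context callback non-blocking. The callback only records what needs doing and schedules deferred work, and the work item, which runs in process context, performs the operations that may sleep. Below is a single-threaded userspace model with hypothetical names (`struct nic`, `set_rx_mode()`, `mc_work()`), where a pending flag stands in for schedule_work().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_MC 4

struct nic {
	unsigned char mc_list[MAX_MC][6];   /* addresses requested by the stack */
	int mc_count;
	bool work_pending;                  /* stands in for schedule_work() */
	int hw_syncs;                       /* blocking firmware updates done */
};

/* Atomic context: must not sleep, so only record state and defer. */
static void set_rx_mode(struct nic *n, const unsigned char addrs[][6], int count)
{
	assert(count >= 0);
	if (count > MAX_MC)
		count = MAX_MC;
	memcpy(n->mc_list, addrs, (size_t)count * sizeof(addrs[0]));
	n->mc_count = count;
	n->work_pending = true;             /* repeated calls coalesce */
}

/* Process context (the work item): sleeping operations are allowed here. */
static void mc_work(struct nic *n)
{
	if (!n->work_pending)
		return;
	n->work_pending = false;
	/* ... a blocking mailbox or firmware call would go here ... */
	n->hw_syncs++;
}
```

A real driver would also need locking between the two contexts, because the callback and the work item can run concurrently. The model leaves that out to keep the deferral itself visible.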
    • vmxnet3: correctly report csum_level for encapsulated packet · 3d8f2c42
      Ronak Doshi authored
      Commit dacce2be ("vmxnet3: add geneve and vxlan tunnel offload
      support") added support for encapsulation offload. However, the
      patch did not correctly report the csum_level for encapsulated
      packets.
      
      This patch fixes this issue by reporting the correct csum_level for
      encapsulated packets.
      
      Fixes: dacce2be ("vmxnet3: add geneve and vxlan tunnel offload support")
      Signed-off-by: Ronak Doshi <doshir@vmware.com>
      Acked-by: Peng Li <lpeng@vmware.com>
      Link: https://lore.kernel.org/r/20221220202556.24421-1-doshir@vmware.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: openvswitch: release vport resources on failure · 95637d91
      Aaron Conole authored
      A recent commit introducing upcall packet accounting failed to properly
      release the vport object when the per-cpu stats struct couldn't be
      allocated.  This can cause dangling pointers to dp objects long after
      they've been released.
      
      Cc: wangchuanlei <wangchuanlei@inspur.com>
      Fixes: 1933ea36 ("net: openvswitch: Add support to count upcall packets")
      Reported-by: syzbot+8f4e2dcfcb3209ac35f9@syzkaller.appspotmail.com
      Signed-off-by: Aaron Conole <aconole@redhat.com>
      Acked-by: Eelco Chaudron <echaudro@redhat.com>
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Link: https://lore.kernel.org/r/20221220212717.526780-1-aconole@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
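When a later allocation fails, everything acquired before it has to be released. The usual C idiom for this is goto-based unwinding. Below is a minimal sketch that assumes hypothetical `struct vport`, `vport_alloc()`, `vport_free()` and `vport_create()`, with the per-CPU stats allocation modelled as a plain calloc().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

static int live_vports;                    /* vports allocated, not yet freed */

struct vport {
	unsigned long *upcall_stats;       /* stands in for the per-CPU stats */
};

static struct vport *vport_alloc(void)
{
	struct vport *v = calloc(1, sizeof(*v));

	if (v)
		live_vports++;
	return v;
}

static void vport_free(struct vport *v)
{
	assert(live_vports > 0);
	free(v->upcall_stats);
	free(v);
	live_vports--;
}

/* On any failure after vport_alloc(), unwind what was already acquired. */
static int vport_create(struct vport **out, bool fail_stats)
{
	struct vport *v = vport_alloc();
	int err;

	if (!v)
		return -ENOMEM;

	v->upcall_stats = fail_stats ? NULL : calloc(4, sizeof(*v->upcall_stats));
	if (!v->upcall_stats) {
		err = -ENOMEM;
		goto err_free_vport;       /* the release step the bug omitted */
	}

	*out = v;
	return 0;

err_free_vport:
	vport_free(v);
	return err;
}
```

Each new resource gets its own label in reverse acquisition order. A failure at step N then jumps to the label that undoes steps N-1 down to 1.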
    • net: vrf: determine the dst using the original ifindex for multicast · f2575c8f
      Antoine Tenart authored
      Multicast packets received on an interface bound to a VRF are marked as
      belonging to the VRF, and the skb device is updated to point to the VRF
      device itself. This was fine even when a route was associated with a
      device because, when performing a fib table lookup, 'oif' in
      fib6_table_lookup (coming from 'skb->dev->ifindex' in ip6_route_input)
      was set to 0 when FLOWI_FLAG_SKIP_NH_OIF was set.
      
      With commit 40867d74 ("net: Add l3mdev index to flow struct and
      avoid oif reset for port devices"), this is no longer true, and multicast
      traffic is not received on the original interface.
      
      Instead of adding back a similar check in fib6_table_lookup, determine
      the dst using the original ifindex for multicast VRF traffic. To make
      things consistent across the function, do the above for all strict
      packets, which was the logic before commit 6f12fa77 ("vrf: mark skb
      for multicast or link-local as enslaved to VRF"). Note that reverting to
      this behavior should be fine, as the change was about marking packets
      as belonging to the VRF, not about their dst.
      
      Fixes: 40867d74 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices")
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Antoine Tenart <atenart@kernel.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20221220171825.1172237-1-atenart@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ice: xsk: do not use xdp_return_frame() on tx_buf->raw_buf · 53fc61be
      Maciej Fijalkowski authored
      Previously, the ice XDP xmit routine was changed so that it avoids the
      xdp_buff->xdp_frame conversion, as the conversion is simply not needed
      for handling the XDP_TX action and skipping it saves CPU cycles. This
      routine is reused in the ZC driver to handle the XDP_TX action.
      
      Although, for XDP_TX on Rx ZC, the xdp_buff that comes from
      xsk_buff_pool is converted to an xdp_frame, the xdp_frame itself is not
      stored inside ice_tx_buf; only the raw data pointer is stored. Casting
      this pointer to xdp_frame and calling xdp_return_frame() on it in
      ice_clean_xdp_tx_buf() results in undefined behavior.
      
      To fix this, simply call page_frag_free() on tx_buf->raw_buf.
      The later intention is to remove the buff->frame conversion in order
      to simplify the codebase and improve XDP_TX performance on ZC.
      
      Fixes: 126cdfe1 ("ice: xsk: Improve AF_XDP ZC Tx and use batching API")
      Reported-and-tested-by: Robin Cowley <robin.cowley@thehutgroup.com>
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Reviewed-by: Piotr Raczynski <piotr.raczynski@.intel.com>
      Link: https://lore.kernel.org/r/20221220175448.693999-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>