1. 22 Dec, 2022 9 commits
    • Merge branch 'mptcp-locking-fixes' · 43ae218f
      Jakub Kicinski authored
      Mat Martineau says:
      
      ====================
      mptcp: Locking fixes
      
      Two separate locking fixes for the networking tree:
      
      Patch 1 addresses an MPTCP fastopen error-path deadlock that was found
      with syzkaller.
      
      Patch 2 works around a lockdep false-positive between MPTCP listening and
      non-listening sockets at socket destruct time.
      ====================
      
      Link: https://lore.kernel.org/r/20221220195215.238353-1-mathew.j.martineau@linux.intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: fix lockdep false positive · fec3adfd
      Paolo Abeni authored
      MattB reported a lockdep splat in the mptcp listener code cleanup:
      
       WARNING: possible circular locking dependency detected
       packetdrill/14278 is trying to acquire lock:
       ffff888017d868f0 ((work_completion)(&msk->work)){+.+.}-{0:0}, at: __flush_work (kernel/workqueue.c:3069)
      
       but task is already holding lock:
       ffff888017d84130 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close (net/mptcp/protocol.c:2973)
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (sk_lock-AF_INET){+.+.}-{0:0}:
              __lock_acquire (kernel/locking/lockdep.c:5055)
              lock_acquire (kernel/locking/lockdep.c:466)
              lock_sock_nested (net/core/sock.c:3463)
              mptcp_worker (net/mptcp/protocol.c:2614)
              process_one_work (kernel/workqueue.c:2294)
              worker_thread (include/linux/list.h:292)
              kthread (kernel/kthread.c:376)
              ret_from_fork (arch/x86/entry/entry_64.S:312)
      
       -> #0 ((work_completion)(&msk->work)){+.+.}-{0:0}:
              check_prev_add (kernel/locking/lockdep.c:3098)
              validate_chain (kernel/locking/lockdep.c:3217)
              __lock_acquire (kernel/locking/lockdep.c:5055)
              lock_acquire (kernel/locking/lockdep.c:466)
              __flush_work (kernel/workqueue.c:3070)
              __cancel_work_timer (kernel/workqueue.c:3160)
              mptcp_cancel_work (net/mptcp/protocol.c:2758)
              mptcp_subflow_queue_clean (net/mptcp/subflow.c:1817)
              __mptcp_close_ssk (net/mptcp/protocol.c:2363)
              mptcp_destroy_common (net/mptcp/protocol.c:3170)
              mptcp_destroy (include/net/sock.h:1495)
              __mptcp_destroy_sock (net/mptcp/protocol.c:2886)
              __mptcp_close (net/mptcp/protocol.c:2959)
              mptcp_close (net/mptcp/protocol.c:2974)
              inet_release (net/ipv4/af_inet.c:432)
              __sock_release (net/socket.c:651)
              sock_close (net/socket.c:1367)
              __fput (fs/file_table.c:320)
              task_work_run (kernel/task_work.c:181 (discriminator 1))
              exit_to_user_mode_prepare (include/linux/resume_user_mode.h:49)
              syscall_exit_to_user_mode (kernel/entry/common.c:130)
              do_syscall_64 (arch/x86/entry/common.c:87)
              entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      
       other info that might help us debug this:
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(sk_lock-AF_INET);
                                      lock((work_completion)(&msk->work));
                                      lock(sk_lock-AF_INET);
         lock((work_completion)(&msk->work));
      
        *** DEADLOCK ***
      
      The report is actually a false positive, since the only existing lock
      nesting is the msk socket lock acquired by the mptcp work.
      cancel_work_sync() is invoked without the relevant socket lock being
      held, but under a different (the msk listener) socket lock.
      
      We could silence the splat by adding a per-workqueue dynamic lockdep key,
      but that looks like overkill. Instead, just tell lockdep the msk socket
      lock is not held around cancel_work_sync().
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/322
      Fixes: 30e51b92 ("mptcp: fix unreleased socket in accept queue")
      Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: fix deadlock in fastopen error path · 7d803344
      Paolo Abeni authored
      MatM reported a deadlock at fastopening time:
      
      INFO: task syz-executor.0:11454 blocked for more than 143 seconds.
            Tainted: G S                 6.1.0-rc5-03226-gdb0157db5153 #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      task:syz-executor.0  state:D stack:25104 pid:11454 ppid:424    flags:0x00004006
      Call Trace:
       <TASK>
       context_switch kernel/sched/core.c:5191 [inline]
       __schedule+0x5c2/0x1550 kernel/sched/core.c:6503
       schedule+0xe8/0x1c0 kernel/sched/core.c:6579
       __lock_sock+0x142/0x260 net/core/sock.c:2896
       lock_sock_nested+0xdb/0x100 net/core/sock.c:3466
       __mptcp_close_ssk+0x1a3/0x790 net/mptcp/protocol.c:2328
       mptcp_destroy_common+0x16a/0x650 net/mptcp/protocol.c:3171
       mptcp_disconnect+0xb8/0x450 net/mptcp/protocol.c:3019
       __inet_stream_connect+0x897/0xa40 net/ipv4/af_inet.c:720
       tcp_sendmsg_fastopen+0x3dd/0x740 net/ipv4/tcp.c:1200
       mptcp_sendmsg_fastopen net/mptcp/protocol.c:1682 [inline]
       mptcp_sendmsg+0x128a/0x1a50 net/mptcp/protocol.c:1721
       inet6_sendmsg+0x11f/0x150 net/ipv6/af_inet6.c:663
       sock_sendmsg_nosec net/socket.c:714 [inline]
       sock_sendmsg+0xf7/0x190 net/socket.c:734
       ____sys_sendmsg+0x336/0x970 net/socket.c:2476
       ___sys_sendmsg+0x122/0x1c0 net/socket.c:2530
       __sys_sendmmsg+0x18d/0x460 net/socket.c:2616
       __do_sys_sendmmsg net/socket.c:2645 [inline]
       __se_sys_sendmmsg net/socket.c:2642 [inline]
       __x64_sys_sendmmsg+0x9d/0x110 net/socket.c:2642
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7f5920a75e7d
      RSP: 002b:00007f59201e8028 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
      RAX: ffffffffffffffda RBX: 00007f5920bb4f80 RCX: 00007f5920a75e7d
      RDX: 0000000000000001 RSI: 0000000020002940 RDI: 0000000000000005
      RBP: 00007f5920ae7593 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000020004050 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000000000b R14: 00007f5920bb4f80 R15: 00007f59201c8000
       </TASK>
      
      In the error path, tcp_sendmsg_fastopen() ends up calling
      mptcp_disconnect(), and the latter tries to close each
      subflow, acquiring the socket lock on each of them.

      At fastopen time, we have a single subflow, and that subflow's
      socket lock is already held by the caller, causing the deadlock.

      We already track the 'fastopen in progress' status inside the msk
      socket. Use it to address the issue, making mptcp_disconnect() a
      no-op when invoked from the fastopen (error) path and doing the
      relevant cleanup after releasing the subflow socket lock.

      While at it, rename the fastopen status bit to something
      more meaningful.
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/321
      Fixes: fa9e5746 ("mptcp: fix abba deadlock on fastopen")
      Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
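The approach described in the fastopen commit above (turn the disconnect into a no-op while fastopen is in progress, then run the cleanup once the subflow lock has been dropped) can be illustrated with a single-threaded userspace sketch. This is not the kernel code: every `toy_*` name is hypothetical, the lock is a plain flag that reports failure where a real lock would block forever, and the `cleanup_pending` field is an assumption about how a skipped cleanup might be remembered.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy, single-threaded stand-ins; none of these names exist in the kernel. */
struct toy_lock { bool held; };

struct toy_subflow {
    struct toy_lock lock;
    bool closed;
};

struct toy_msk {
    struct toy_subflow subflow;  /* the single subflow present at fastopen time */
    bool fastopening;            /* models the "fastopen in progress" status */
    bool cleanup_pending;        /* assumption: remembers a skipped disconnect */
};

/* Reports failure where a real, non-recursive lock would block forever. */
static bool toy_lock_acquire(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Closing a subflow needs its lock, as in the reported call trace. */
static bool toy_close_subflow(struct toy_subflow *ssk)
{
    if (!toy_lock_acquire(&ssk->lock))
        return false;            /* would be a self-deadlock with a real lock */
    ssk->closed = true;
    toy_lock_release(&ssk->lock);
    return true;
}

/* Disconnect becomes a no-op while fastopen is in progress. */
static bool toy_disconnect(struct toy_msk *msk)
{
    if (msk->fastopening) {
        msk->cleanup_pending = true;
        return true;
    }
    return toy_close_subflow(&msk->subflow);
}

/* Fastopen send; `fail` forces the error path that calls disconnect. */
static bool toy_sendmsg_fastopen(struct toy_msk *msk, bool fail)
{
    bool ok = toy_lock_acquire(&msk->subflow.lock);

    if (!ok)
        return false;
    msk->fastopening = true;
    if (fail)
        ok = toy_disconnect(msk);        /* runs with the subflow lock held */
    msk->fastopening = false;
    toy_lock_release(&msk->subflow.lock);

    if (msk->cleanup_pending) {          /* deferred cleanup, lock now free */
        msk->cleanup_pending = false;
        ok = toy_disconnect(msk) && ok;
    }
    return ok;
}
```

Calling `toy_sendmsg_fastopen()` with `fail` set closes the subflow only after the lock is released, whereas calling `toy_close_subflow()` directly while the lock is held reports the would-be deadlock.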
    • nfp: fix schedule in atomic context when sync mc address · e20aa071
      Yinjun Zhang authored
      The `.ndo_set_rx_mode` callback is called in atomic context, so sleeping
      is not allowed in its implementation. Use the workqueue mechanism
      to avoid this issue.
      
      Fixes: de624864 ("nfp: add support for multicast filter")
      Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
      Reviewed-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20221220152100.1042774-1-simon.horman@corigine.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
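The deferral pattern named in the nfp commit above can be sketched in userspace C as follows. This is only an illustration under stated assumptions, not the driver's code: the `toy_*` names are hypothetical, "atomic context" is modeled by a boolean, and the workqueue is reduced to a single pending flag that a later call drains.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a NIC; not taken from the nfp driver. */
struct toy_nic {
    bool in_atomic;               /* simulates being in atomic context */
    bool work_pending;            /* one-slot stand-in for a workqueue */
    unsigned int wanted_filters;  /* snapshot taken by the callback */
    unsigned int hw_filters;      /* state after the sleeping sync ran */
};

/* Stand-in for a device/firmware call that may sleep. */
static void toy_sync_may_sleep(struct toy_nic *nic, unsigned int n)
{
    assert(!nic->in_atomic);      /* sleeping here would be the bug */
    nic->hw_filters = n;
}

/* Stand-in for the rx-mode callback: record the request, never sleep. */
static void toy_set_rx_mode(struct toy_nic *nic, unsigned int n)
{
    nic->wanted_filters = n;
    nic->work_pending = true;     /* analogous to queueing a work item */
}

/* Worker body: runs later, in a context where sleeping is allowed. */
static bool toy_run_pending_work(struct toy_nic *nic)
{
    if (!nic->work_pending)
        return false;
    nic->work_pending = false;
    toy_sync_may_sleep(nic, nic->wanted_filters);
    return true;
}
```

The callback only records what needs to happen; the potentially sleeping synchronization is performed by `toy_run_pending_work()` once the simulated atomic section has ended.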
    • vmxnet3: correctly report csum_level for encapsulated packet · 3d8f2c42
      Ronak Doshi authored
      Commit dacce2be ("vmxnet3: add geneve and vxlan tunnel offload
      support") added support for encapsulation offload. However, the
      patch did not correctly report the csum_level for encapsulated packets.

      This patch fixes the issue by reporting the correct csum_level for
      encapsulated packets.
      
      Fixes: dacce2be ("vmxnet3: add geneve and vxlan tunnel offload support")
      Signed-off-by: Ronak Doshi <doshir@vmware.com>
      Acked-by: Peng Li <lpeng@vmware.com>
      Link: https://lore.kernel.org/r/20221220202556.24421-1-doshir@vmware.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: openvswitch: release vport resources on failure · 95637d91
      Aaron Conole authored
      A recent commit introducing upcall packet accounting failed to properly
      release the vport object when the per-cpu stats struct couldn't be
      allocated.  This can cause dangling pointers to dp objects long after
      they've been released.
      
      Cc: wangchuanlei <wangchuanlei@inspur.com>
      Fixes: 1933ea36 ("net: openvswitch: Add support to count upcall packets")
      Reported-by: syzbot+8f4e2dcfcb3209ac35f9@syzkaller.appspotmail.com
      Signed-off-by: Aaron Conole <aconole@redhat.com>
      Acked-by: Eelco Chaudron <echaudro@redhat.com>
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Link: https://lore.kernel.org/r/20221220212717.526780-1-aconole@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
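The class of bug fixed in the openvswitch commit above, an object left allocated when a later allocation in the same constructor fails, can be shown with a small userspace sketch. The `toy_*` names and the `fail_stats` switch are hypothetical and exist only to force the failure; the real code deals with vport objects and per-cpu statistics, which are not modeled here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the openvswitch data structures. */
struct toy_stats { unsigned long upcall_success, upcall_fail; };
struct toy_vport { struct toy_stats *stats; };

static int toy_live_vports;       /* counts objects not yet released */

static struct toy_vport *toy_vport_alloc(void)
{
    struct toy_vport *vport = calloc(1, sizeof(*vport));

    if (vport)
        toy_live_vports++;
    return vport;
}

static void toy_vport_free(struct toy_vport *vport)
{
    assert(toy_live_vports > 0);
    free(vport->stats);           /* free(NULL) is a no-op */
    free(vport);
    toy_live_vports--;
}

/* `fail_stats` forces the stats allocation to fail, for illustration. */
static struct toy_vport *toy_vport_create(int fail_stats)
{
    struct toy_vport *vport = toy_vport_alloc();

    if (!vport)
        return NULL;

    vport->stats = fail_stats ? NULL : calloc(1, sizeof(*vport->stats));
    if (!vport->stats) {
        toy_vport_free(vport);    /* the release a buggy error path skips */
        return NULL;
    }
    return vport;
}
```

With the release in the error path, a failed `toy_vport_create()` leaves `toy_live_vports` unchanged, so nothing can keep pointing at a half-constructed object.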
    • net: vrf: determine the dst using the original ifindex for multicast · f2575c8f
      Antoine Tenart authored
      Multicast packets received on an interface bound to a VRF are marked as
      belonging to the VRF and the skb device is updated to point to the VRF
      device itself. This was fine even when a route was associated with a
      device because, when performing a fib table lookup, 'oif' in
      fib6_table_lookup (coming from 'skb->dev->ifindex' in ip6_route_input)
      was set to 0 when FLOWI_FLAG_SKIP_NH_OIF was set.

      With commit 40867d74 ("net: Add l3mdev index to flow struct and
      avoid oif reset for port devices") this is no longer true and multicast
      traffic is not received on the original interface.
      
      Instead of adding back a similar check in fib6_table_lookup, determine
      the dst using the original ifindex for multicast VRF traffic. To make
      things consistent across the function, do the above for all strict
      packets, which was the logic before commit 6f12fa77 ("vrf: mark skb
      for multicast or link-local as enslaved to VRF"). Note that reverting to
      this behavior should be fine as the change was about marking packets
      belonging to the VRF, not about their dst.
      
      Fixes: 40867d74 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices")
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Antoine Tenart <atenart@kernel.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20221220171825.1172237-1-atenart@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ice: xsk: do not use xdp_return_frame() on tx_buf->raw_buf · 53fc61be
      Maciej Fijalkowski authored
      Previously, the ice XDP xmit routine was changed so that it avoids the
      xdp_buff->xdp_frame conversion, as it is simply not needed for handling
      the XDP_TX action and, what is more, skipping it saves CPU cycles. This
      routine is re-used by the ZC driver to handle the XDP_TX action.

      Although for XDP_TX on Rx ZC the xdp_buff that comes from xsk_buff_pool
      is converted to an xdp_frame, the xdp_frame itself is not stored inside
      ice_tx_buf; only the raw data pointer is stored. Casting this pointer to
      xdp_frame and calling xdp_return_frame() on it in
      ice_clean_xdp_tx_buf() results in undefined behavior.

      To fix this, simply call page_frag_free() on tx_buf->raw_buf.
      The later intention is to remove the buff->frame conversion in order to
      simplify the codebase and improve XDP_TX performance on ZC.
      
      Fixes: 126cdfe1 ("ice: xsk: Improve AF_XDP ZC Tx and use batching API")
      Reported-and-tested-by: Robin Cowley <robin.cowley@thehutgroup.com>
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Reviewed-by: Piotr Raczynski <piotr.raczynski@.intel.com>
      Link: https://lore.kernel.org/r/20221220175448.693999-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'wireless-2022-12-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless · aa6c3961
      Jakub Kicinski authored
      Kalle Valo says:
      
      ====================
      wireless fixes for v6.2
      
      First set of fixes for v6.2. Fix for a link error in mt76, fix for an
      iwlwifi firmware crash and two cleanups.
      
      * tag 'wireless-2022-12-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
        wifi: ath9k: use proper statements in conditionals
        wifi: mt76: mt7996: select CONFIG_RELAY
        wifi: iwlwifi: fw: skip PPAG for JF
        wifi: ti: remove obsolete lines in the Makefile
      ====================
      
      Link: https://lore.kernel.org/r/20221221180808.96A8AC433EF@smtp.kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 21 Dec, 2022 4 commits
    • Merge tag 'net-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 609d3bc6
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bpf, netfilter and can.
      
        Current release - regressions:
      
         - bpf: synchronize dispatcher update with bpf_dispatcher_xdp_func
      
         - rxrpc:
            - fix security setting propagation
            - fix null-deref in rxrpc_unuse_local()
            - fix switched parameters in peer tracing
      
        Current release - new code bugs:
      
         - rxrpc:
            - fix I/O thread startup getting skipped
            - fix locking issues in rxrpc_put_peer_locked()
            - fix I/O thread stop
            - fix uninitialised variable in rxperf server
            - fix the return value of rxrpc_new_incoming_call()
      
         - microchip: vcap: fix initialization of value and mask
      
         - nfp: fix unaligned io read of capabilities word
      
        Previous releases - regressions:
      
         - stop in-kernel socket users from corrupting socket's task_frag
      
         - stream: purge sk_error_queue in sk_stream_kill_queues()
      
         - openvswitch: fix flow lookup to use unmasked key
      
         - dsa: mv88e6xxx: avoid reg_lock deadlock in mv88e6xxx_setup_port()
      
         - devlink:
            - hold region lock when flushing snapshots
            - protect devlink dump by the instance lock
      
        Previous releases - always broken:
      
         - bpf:
            - prevent leak of lsm program after failed attach
            - resolve fext program type when checking map compatibility
      
         - skbuff: account for tail adjustment during pull operations
      
         - macsec: fix net device access prior to holding a lock
      
         - bonding: switch back when high prio link up
      
         - netfilter: flowtable: really fix NAT IPv6 offload
      
         - enetc: avoid buffer leaks on xdp_do_redirect() failure
      
         - unix: fix race in SOCK_SEQPACKET's unix_dgram_sendmsg()
      
         - dsa: microchip: remove IRQF_TRIGGER_FALLING in
           request_threaded_irq"
      
      * tag 'net-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits)
        net: fec: check the return value of build_skb()
        net: simplify sk_page_frag
        Treewide: Stop corrupting socket's task_frag
        net: Introduce sk_use_task_frag in struct sock.
        mctp: Remove device type check at unregister
        net: dsa: microchip: remove IRQF_TRIGGER_FALLING in request_threaded_irq
        can: kvaser_usb: hydra: help gcc-13 to figure out cmd_len
        can: flexcan: avoid unbalanced pm_runtime_enable warning
        Documentation: devlink: add missing toc entry for etas_es58x devlink doc
        mctp: serial: Fix starting value for frame check sequence
        nfp: fix unaligned io read of capabilities word
        net: stream: purge sk_error_queue in sk_stream_kill_queues()
        myri10ge: Fix an error handling path in myri10ge_probe()
        net: microchip: vcap: Fix initialization of value and mask
        rxrpc: Fix the return value of rxrpc_new_incoming_call()
        rxrpc: rxperf: Fix uninitialised variable
        rxrpc: Fix I/O thread stop
        rxrpc: Fix switched parameters in peer tracing
        rxrpc: Fix locking issues in rxrpc_put_peer_locked()
        rxrpc: Fix I/O thread startup getting skipped
        ...
    • Merge tag 'fs.vfsuid.ima.v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping · 878cf96f
      Linus Torvalds authored
      Pull vfsuid cleanup from Christian Brauner:
       "This moves the ima specific vfs{g,u}id_t comparison helpers out of the
        header and into the one file in ima where they are used.
      
        We shouldn't incentivize people to use them by placing them into the
        header. As discussed and suggested by Linus in [1] let's just define
        them locally in the one file in ima where they are used"
      
      Link: https://lore.kernel.org/lkml/CAHk-=wj4BpEwUd=OkTv1F9uykvSrsBNZJVHMp+p_+e2kiV71_A@mail.gmail.com [1]
      
      * tag 'fs.vfsuid.ima.v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
        mnt_idmapping: move ima-only helpers to ima
    • Merge tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random · 222882c2
      Linus Torvalds authored
      Pull more random number generator updates from Jason Donenfeld:
       "Two remaining changes that are now possible after you merged a few
        other trees:
      
         - #include <asm/archrandom.h> can be removed from random.h now,
           making the direct use of the arch_random_* API more of a private
           implementation detail between the archs and random.c, rather than
           something for general consumers.
      
         - Two additional uses of prandom_u32_max() snuck in during the
           initial phase of pulls, so these have been converted to
           get_random_u32_below(), and now the deprecated prandom_u32_max()
           alias -- which was just a wrapper around get_random_u32_below() --
           can be removed.
      
        In addition, there is one fix:
      
         - Check efi_rt_services_supported() before attempting to use an EFI
           runtime function.
      
           This affected EFI systems that disable runtime services yet still
           boot via EFI (e.g. the reporter's Lenovo Thinkpad X13s laptop), as
           well as systems where EFI runtime services have been forcibly
           disabled, such as on PREEMPT_RT.
      
           On those machines, a very early and hard to diagnose crash would
           happen, preventing boot"
      
      * tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
        prandom: remove prandom_u32_max()
        efi: random: fix NULL-deref when refreshing seed
        random: do not include <asm/archrandom.h> from random.h
    • Merge tag 'rcu-urgent.2022.12.17a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · 19822e3e
      Linus Torvalds authored
      Pull RCU fix from Paul McKenney:
       "This fixes a lockdep false positive in synchronize_rcu() that can
        otherwise occur during early boot.
      
        The fix simply avoids invoking lockdep if the scheduler has not yet
        been initialized, that is, during that portion of boot when interrupts
        are disabled"
      
      * tag 'rcu-urgent.2022.12.17a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
        rcu: Don't assert interrupts enabled too early in boot
  3. 20 Dec, 2022 19 commits
  4. 19 Dec, 2022 8 commits