1. 11 Jul, 2024 5 commits
    • workqueue: Remove the argument @cpu_going_down from wq_calc_pod_cpumask() · 88a41b18
      Lai Jiangshan authored
      wq_calc_pod_cpumask() uses wq_online_cpumask, which excludes the cpu
      going down, so the argument cpu_going_down is unused and can be removed.
      Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      88a41b18
    • workqueue: Remove the unneeded cpumask empty check in wq_calc_pod_cpumask() · 2cb61f76
      Lai Jiangshan authored
      The cpumask empty check in wq_calc_pod_cpumask() has long been useless.
      It serves purely as documentation, stating that the cpumask cannot be
      empty after the function returns.
      
      Now the code above is even more explicit that the cpumask is not empty,
      so the documentation-only empty check can be removed.
      Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      2cb61f76
    • workqueue: Remove cpus_read_lock() from apply_wqattrs_lock() · 19af4575
      Lai Jiangshan authored
      1726a171 ("workqueue: Put PWQ allocation and WQ enlistment in the same
      lock C.S.") led to the following possible deadlock:
      
        WARNING: possible recursive locking detected
        6.10.0-rc5-00004-g1d4c6111406c #1 Not tainted
         --------------------------------------------
         swapper/0/1 is trying to acquire lock:
         c27760f4 (cpu_hotplug_lock){++++}-{0:0}, at: alloc_workqueue (kernel/workqueue.c:5152 kernel/workqueue.c:5730) 
        
         but task is already holding lock:
         c27760f4 (cpu_hotplug_lock){++++}-{0:0}, at: padata_alloc (kernel/padata.c:1007) 
         ...  
         stack backtrace:
         ...
         cpus_read_lock (include/linux/percpu-rwsem.h:53 kernel/cpu.c:488) 
         alloc_workqueue (kernel/workqueue.c:5152 kernel/workqueue.c:5730) 
         padata_alloc (kernel/padata.c:1007 (discriminator 1)) 
         pcrypt_init_padata (crypto/pcrypt.c:327 (discriminator 1)) 
         pcrypt_init (crypto/pcrypt.c:353) 
         do_one_initcall (init/main.c:1267) 
         do_initcalls (init/main.c:1328 (discriminator 1) init/main.c:1345 (discriminator 1)) 
         kernel_init_freeable (init/main.c:1364) 
         kernel_init (init/main.c:1469) 
         ret_from_fork (arch/x86/kernel/process.c:153) 
         ret_from_fork_asm (arch/x86/entry/entry_32.S:737) 
         entry_INT80_32 (arch/x86/entry/entry_32.S:944) 
      
      This is caused by pcrypt allocating a workqueue while holding
      cpus_read_lock(), so workqueue code can't do it again as that can lead to
      deadlocks if down_write starts after the first down_read.
      
      The pwq creations and installations have been reworked based on
      wq_online_cpumask rather than cpu_online_mask, making cpus_read_lock()
      unneeded during wqattrs changes. Fix the deadlock by removing
      cpus_read_lock() from apply_wqattrs_lock().
      
      tj: Updated changelog.
      Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
      Fixes: 1726a171 ("workqueue: Put PWQ allocation and WQ enlistment in the same lock C.S.")
      Link: http://lkml.kernel.org/r/202407081521.83b627c1-lkp@intel.com
      Signed-off-by: Tejun Heo <tj@kernel.org>
      19af4575
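The deadlock described in 19af4575 can be replayed in a small userspace model. This is only a sketch, assuming a reader-writer lock in which a waiting writer blocks new readers, as the changelog describes for cpu_hotplug_lock. The names `model_rwsem`, `model_read_acquire()`, `model_write_acquire()` and `model_nested_read_deadlocks()` are hypothetical and nothing here actually sleeps; the functions only report whether a caller would block.

```c
#include <stdbool.h>

/*
 * Hypothetical model of a reader-writer lock in which a waiting writer
 * blocks new readers. It only tracks state; nothing actually sleeps.
 */
struct model_rwsem {
	int active_readers;
	bool writer_waiting;
};

/* Returns true if the read lock is granted, false if the caller would block. */
static bool model_read_acquire(struct model_rwsem *sem)
{
	if (sem->writer_waiting)
		return false;
	sem->active_readers++;
	return true;
}

/* Returns true if the write lock is granted; otherwise the writer queues up. */
static bool model_write_acquire(struct model_rwsem *sem)
{
	if (sem->active_readers > 0) {
		sem->writer_waiting = true;
		return false;
	}
	return true;
}

/* Replays the reported sequence; returns true if it ends in a deadlock. */
static bool model_nested_read_deadlocks(void)
{
	struct model_rwsem sem = { 0, false };
	bool outer = model_read_acquire(&sem);   /* like padata_alloc() */
	bool writer = model_write_acquire(&sem); /* a writer arrives and must wait */
	bool inner = model_read_acquire(&sem);   /* like alloc_workqueue() */

	/* The writer waits for the outer read; the inner read waits for the writer. */
	return outer && !writer && !inner;
}
```

Without the writer in between, the second `model_read_acquire()` succeeds, which is why the problem only shows up as a lockdep warning until a writer starts at the wrong moment.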
    • workqueue: Simplify wq_calc_pod_cpumask() with wq_online_cpumask · fbb3d4c1
      Lai Jiangshan authored
      Avoid relying on cpu_online_mask for wqattrs changes so that
      cpus_read_lock() can be removed from apply_wqattrs_lock().
      
      And with wq_online_cpumask, attrs->__pod_cpumask doesn't need to be
      reused as temporary storage to calculate whether the pod has any online
      CPUs @attrs wants, since @cpu_going_down is not in wq_online_cpumask.
      Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      fbb3d4c1
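The calculation that fbb3d4c1 simplifies can be sketched in userspace as follows. This is a reading of the changelog, not the kernel code: each cpumask is modelled as a 64-bit bitmap, and the function name `model_calc_pod_cpumask()` and its parameters are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical model: one bit per CPU. @wq_online plays the role of
 * wq_online_cpumask, which already excludes a CPU that is going down.
 */
static uint64_t model_calc_pod_cpumask(uint64_t pod_cpus,
				       uint64_t attrs_cpumask,
				       uint64_t wq_online)
{
	/* Possible CPUs in the pod that the attrs want. */
	uint64_t pod_mask = pod_cpus & attrs_cpumask;

	/* No online CPU of interest in the pod: fall back to the attrs mask. */
	if (!(pod_mask & wq_online))
		return attrs_cpumask;

	return pod_mask;
}
```

Because the online check uses a mask that never contains the outgoing CPU, no temporary copy and no extra `cpu_going_down` argument are needed in this model.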
    • workqueue: Add wq_online_cpumask · 8d84baf7
      Lai Jiangshan authored
      The new wq_online_cpumask mirrors the cpu_online_mask except during
      hotplugging; specifically, it differs between the hotplugging stages
      of workqueue_offline_cpu() and workqueue_online_cpu(), during which
      the transitioning CPU is not represented in the mask.
      Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      8d84baf7
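The relationship between the two masks described in 8d84baf7 can be illustrated with a toy model. This sketch assumes at most 64 CPUs and splits each hotplug direction into two steps; the struct and function names are hypothetical stand-ins, and the real hotplug callbacks do much more than flip a bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, CPU numbers 0..63 only. */
struct model_masks {
	uint64_t cpu_online; /* stands in for cpu_online_mask */
	uint64_t wq_online;  /* stands in for wq_online_cpumask */
};

/* Onlining, step 1: the CPU becomes visible in the online mask. */
static void model_cpu_mark_online(struct model_masks *m, unsigned int cpu)
{
	m->cpu_online |= UINT64_C(1) << cpu;
}

/* Onlining, step 2: the workqueue online stage catches up. */
static void model_workqueue_online_cpu(struct model_masks *m, unsigned int cpu)
{
	m->wq_online |= UINT64_C(1) << cpu;
}

/* Offlining, step 1: the workqueue offline stage drops the CPU first. */
static void model_workqueue_offline_cpu(struct model_masks *m, unsigned int cpu)
{
	m->wq_online &= ~(UINT64_C(1) << cpu);
}

/* Offlining, step 2: the CPU leaves the online mask. */
static void model_cpu_mark_offline(struct model_masks *m, unsigned int cpu)
{
	m->cpu_online &= ~(UINT64_C(1) << cpu);
}

/* True when the two masks mirror each other. */
static bool model_masks_agree(const struct model_masks *m)
{
	return m->cpu_online == m->wq_online;
}
```

In this model the masks disagree only between step 1 and step 2 of either direction, and in both windows the transitioning CPU is the one missing from `wq_online`.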
  2. 05 Jul, 2024 5 commits
  3. 02 Jul, 2024 2 commits
  4. 25 Jun, 2024 2 commits
  5. 21 Jun, 2024 4 commits
  6. 19 Jun, 2024 1 commit
    • workqueue: Avoid nr_active manipulation in grabbing inactive items · b56c7207
      Lai Jiangshan authored
      Currently, try_to_grab_pending() activates the inactive item and
      subsequently treats it as though it were a standard activated item.
      
      This approach avoids duplicating the handling logic for active and
      inactive items, but the premature activation of an inactive item
      triggers trace_workqueue_activate_work(), an unintended side effect
      that is visible to user space.
      
      In addition, the unnecessary increment of nr_active, which is no
      longer a simple counter, followed by a counteracting decrement, is
      inefficient and complicates the code.
      
      Just remove the nr_active manipulation code in grabbing inactive items.
      Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      b56c7207
  7. 10 Jun, 2024 1 commit
    • workqueue: replace call_rcu by kfree_rcu for simple kmem_cache_free callback · 37c2277f
      Julia Lawall authored
      Since SLOB was removed, it is not necessary to use call_rcu
      when the callback only performs kmem_cache_free. Use
      kfree_rcu() directly.
      
      The changes were done using the following Coccinelle semantic patch.
      This semantic patch is designed to ignore cases where the callback
      function is used in another way.
      
      // <smpl>
      @r@
      expression e;
      local idexpression e2;
      identifier cb,f;
      position p;
      @@
      
      (
      call_rcu(...,e2)
      |
      call_rcu(&e->f,cb@p)
      )
      
      @r1@
      type T;
      identifier x,r.cb;
      @@
      
       cb(...) {
      (
         kmem_cache_free(...);
      |
         T x = ...;
         kmem_cache_free(...,x);
      |
         T x;
         x = ...;
         kmem_cache_free(...,x);
      )
       }
      
      @s depends on r1@
      position p != r.p;
      identifier r.cb;
      @@
      
       cb@p
      
      @script:ocaml@
      cb << r.cb;
      p << s.p;
      @@
      
      Printf.eprintf "Other use of %s at %s:%d\n"
         cb (List.hd p).file (List.hd p).line
      
      @depends on r1 && !s@
      expression e;
      identifier r.cb,f;
      position r.p;
      @@
      
      - call_rcu(&e->f,cb@p)
      + kfree_rcu(e,f)
      
      @r1a depends on !s@
      type T;
      identifier x,r.cb;
      @@
      
      - cb(...) {
      (
      -  kmem_cache_free(...);
      |
      -  T x = ...;
      -  kmem_cache_free(...,x);
      |
      -  T x;
      -  x = ...;
      -  kmem_cache_free(...,x);
      )
      - }
      // </smpl>
      Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
      Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      37c2277f
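The shape of the transformation in 37c2277f can be shown with userspace mocks. This is a sketch only: `mock_call_rcu()` runs the callback immediately instead of waiting for a grace period, `mock_kfree_rcu()` frees immediately, and `struct item` with its helpers is a hypothetical example type. None of it is the kernel's implementation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's struct rcu_head. */
struct rcu_head {
	void (*func)(struct rcu_head *head);
};

static int mock_free_count; /* lets callers observe how many objects were freed */

static void mock_free(void *p)
{
	mock_free_count++;
	free(p);
}

/* Mock: invoke the callback right away; real RCU defers it. */
static void mock_call_rcu(struct rcu_head *head,
			  void (*func)(struct rcu_head *))
{
	head->func = func;
	func(head);
}

/* Mock of kfree_rcu(ptr, field): checks that the field exists, then frees. */
#define mock_kfree_rcu(ptr, field) \
	do { (void)sizeof((ptr)->field); mock_free(ptr); } while (0)

struct item {
	int value;
	struct rcu_head rcu;
};

/* Before: a callback whose only job is to free the enclosing object. */
static void item_free_rcu(struct rcu_head *head)
{
	struct item *it =
		(struct item *)((char *)head - offsetof(struct item, rcu));

	mock_free(it);
}

static void item_release_old(struct item *it)
{
	mock_call_rcu(&it->rcu, item_free_rcu);
}

/* After: the callback disappears and the call site names the field. */
static void item_release_new(struct item *it)
{
	mock_kfree_rcu(it, rcu);
}
```

Both release helpers free exactly one object; the second form simply needs no dedicated callback, which is the kind of callback the semantic patch deletes.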
  8. 07 Jun, 2024 1 commit
  9. 06 Jun, 2024 19 commits
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 8a929806
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "The core change is to detect an unusually large number of VPD pages
        (caused by device manufacturers having an endianness issue) and reject
        them rather than trying to parse a huge non-existent array.
      
        The remaining fixes are in drivers, the most user-visible of which is
        the ALUA state transition recognition (leads to intermittent I/O
        errors in some situations otherwise)"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: ufs: mcq: Fix error output and clean up ufshcd_mcq_abort()
        scsi: core: Handle devices which return an unusually large VPD page count
        scsi: mpt3sas: Add missing kerneldoc parameter descriptions
        scsi: qedf: Set qed_slowpath_params to zero before use
        scsi: qedf: Wait for stag work during unload
        scsi: qedf: Don't process stag work during unload and recovery
        scsi: sr: Fix unintentional arithmetic wraparound
        scsi: core: alua: I/O errors for ALUA state transitions
        scsi: mpi3mr: Use proper format specifier in mpi3mr_sas_port_add()
      8a929806
    • Merge tag 'pci-v6.10-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci · d91e6562
      Linus Torvalds authored
      Pull pci fix from Bjorn Helgaas:
      
       - Revert lockdep checking on locking that protects device resets from
         user-space config accesses; it exposed issues for which fixes are in
         the works but are too risky for this cycle (Dan Williams)
      
      * tag 'pci-v6.10-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
        PCI: Revert the cfg_access_lock lockdep mechanism
      d91e6562
    • Merge tag 'net-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · d30d0e49
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from BPF and a big collection of fixes for WiFi core and
        drivers.
      
        Current release - regressions:
      
         - vxlan: fix regression when dropping packets due to invalid src
           addresses
      
         - bpf: fix a potential use-after-free in bpf_link_free()
      
         - xdp: revert support for redirect to any xsk socket bound to the
           same UMEM as it can result in a corruption
      
         - virtio_net:
            - add missing lock protection when reading return code from
              control_buf
            - fix false-positive lockdep splat in DIM
            - Revert "wifi: wilc1000: convert list management to RCU"
      
         - wifi: ath11k: fix error path in ath11k_pcic_ext_irq_config
      
        Previous releases - regressions:
      
         - rtnetlink: make the "split" NLM_DONE handling generic, restore the
           old behavior for two cases where we started coalescing those
           messages with normal messages, breaking sloppily-coded userspace
      
         - wifi:
            - cfg80211: validate HE operation element parsing
            - cfg80211: fix 6 GHz scan request building
            - mt76: mt7615: add missing chanctx ops
            - ath11k: move power type check to ASSOC stage, fix connecting to
              6 GHz AP
            - ath11k: fix WCN6750 firmware crash caused by 17 num_vdevs
            - rtlwifi: ignore IEEE80211_CONF_CHANGE_RETRY_LIMITS
            - iwlwifi: mvm: fix a crash on 7265
      
        Previous releases - always broken:
      
         - ncsi: prevent multi-threaded channel probing, a spec violation
      
         - vmxnet3: disable rx data ring on dma allocation failure
      
         - ethtool: init tsinfo stats if requested, prevent unintentionally
           reporting all-zero stats on devices which don't implement any
      
         - dst_cache: fix possible races in less common IPv6 features
      
         - tcp: auth: don't consider TCP_CLOSE to be in TCP_AO_ESTABLISHED
      
         - ax25: fix two refcounting bugs
      
         - eth: ionic: fix kernel panic in XDP_TX action
      
        Misc:
      
         - tcp: count CLOSE-WAIT sockets for TCP_MIB_CURRESTAB"
      
      * tag 'net-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (107 commits)
        selftests: net: lib: set 'i' as local
        selftests: net: lib: avoid error removing empty netns name
        selftests: net: lib: support errexit with busywait
        net: ethtool: fix the error condition in ethtool_get_phy_stats_ethtool()
        ipv6: fix possible race in __fib6_drop_pcpu_from()
        af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill().
        af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen().
        af_unix: Use skb_queue_empty_lockless() in unix_release_sock().
        af_unix: Use unix_recvq_full_lockless() in unix_stream_connect().
        af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen.
        af_unix: Annotate data-races around sk->sk_sndbuf.
        af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG.
        af_unix: Annotate data-race of sk->sk_state in unix_stream_read_skb().
        af_unix: Annotate data-races around sk->sk_state in sendmsg() and recvmsg().
        af_unix: Annotate data-race of sk->sk_state in unix_accept().
        af_unix: Annotate data-race of sk->sk_state in unix_stream_connect().
        af_unix: Annotate data-races around sk->sk_state in unix_write_space() and poll().
        af_unix: Annotate data-race of sk->sk_state in unix_inq_len().
        af_unix: Annodate data-races around sk->sk_state for writers.
        af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer.
        ...
      d30d0e49
    • Merge tag 'tomoyo-pr-20240606' of git://git.code.sf.net/p/tomoyo/tomoyo · 2faf6332
      Linus Torvalds authored
      Pull tomoyo fixlet from Tetsuo Handa:
       "Single patch to update project links, no behavior changes"
      
      * tag 'tomoyo-pr-20240606' of git://git.code.sf.net/p/tomoyo/tomoyo:
        tomoyo: update project links
      2faf6332
    • Merge tag 'efi-fixes-for-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · a34adf60
      Linus Torvalds authored
      Pull EFI fixes from Ard Biesheuvel:
      
       - Ensure that .discard sections are really discarded in the EFI zboot
         image build
      
       - Return proper error numbers from efi-pstore
      
       - Add __nocfi annotations to EFI runtime wrappers
      
      * tag 'efi-fixes-for-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
        efi: Add missing __nocfi annotations to runtime wrappers
        efi: pstore: Return proper errors on UEFI failures
        efi/libstub: zboot.lds: Discard .discard sections
      a34adf60
    • Merge branch 'selftests-net-lib-small-fixes' · 27bc8654
      Jakub Kicinski authored
      Matthieu Baerts says:
      
      ====================
      selftests: net: lib: small fixes
      
      While looking at using 'lib.sh' for the MPTCP selftests [1], we found
      some small issues with 'lib.sh'. Here they are:
      
      - Patch 1: fix 'errexit' (set -e) support with busywait. 'errexit' is
        supported in some functions, not all. A fix for v6.8+.
      
      - Patch 2: avoid confusing error messages linked to the cleaning part
        when the netns setup fails. A fix for v6.8+.
      
      - Patch 3: set a variable as local to avoid accidentally changing the
        value of another one with the same name on the caller side. A fix
        for v6.10-rc1+.
      
      Link: https://lore.kernel.org/mptcp/5f4615c3-0621-43c5-ad25-55747a4350ce@kernel.org/T/ [1]
      ====================
      
      Link: https://lore.kernel.org/r/20240605-upstream-net-20240605-selftests-net-lib-fixes-v1-0-b3afadd368c9@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      27bc8654
    • selftests: net: lib: set 'i' as local · 84a8bc3e
      Matthieu Baerts (NGI0) authored
      Without this, the 'i' variable declared before could be overridden by
      accident, e.g.
      
        for i in "${@}"; do
            __ksft_status_merge "${i}"  ## 'i' has been modified
            foo "${i}"                  ## using 'i' with an unexpected value
        done
      
      After a quick look, it looks like 'i' is currently not used after having
      been modified in __ksft_status_merge(), but still, better be safe than
      sorry. I saw this while modifying the same file, not because I suspected
      an issue somewhere.
      
      Fixes: 596c8819 ("selftests: forwarding: Have RET track kselftest framework constants")
      Acked-by: Geliang Tang <geliang@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
      Link: https://lore.kernel.org/r/20240605-upstream-net-20240605-selftests-net-lib-fixes-v1-3-b3afadd368c9@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      84a8bc3e
    • selftests: net: lib: avoid error removing empty netns name · 79322174
      Matthieu Baerts (NGI0) authored
      If creating the first netns with 'setup_ns()' fails,
      'cleanup_ns()' will be called with an empty string as first parameter.
      
      The consequence is that 'cleanup_ns()' will try to delete an invalid
      netns, and wait 20 seconds if the netns list is empty.
      
      Instead of just checking if the name is not empty, convert the string
      separated by spaces to an array. Manipulating the array is cleaner, and
      calling 'cleanup_ns()' with an empty array will be a no-op.
      
      Fixes: 25ae948b ("selftests/net: add lib.sh")
      Cc: stable@vger.kernel.org
      Acked-by: Geliang Tang <geliang@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
      Link: https://lore.kernel.org/r/20240605-upstream-net-20240605-selftests-net-lib-fixes-v1-2-b3afadd368c9@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      79322174
    • selftests: net: lib: support errexit with busywait · 41b02ea4
      Matthieu Baerts (NGI0) authored
      If errexit is enabled ('set -e'), loopy_wait -- or busywait and others
      using it -- will stop after the first failure.
      
      Note that if the returned status of loopy_wait is checked, and even if
      errexit is enabled, Bash will not stop at the first error.
      
      Fixes: 25ae948b ("selftests/net: add lib.sh")
      Cc: stable@vger.kernel.org
      Acked-by: Geliang Tang <geliang@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
      Link: https://lore.kernel.org/r/20240605-upstream-net-20240605-selftests-net-lib-fixes-v1-1-b3afadd368c9@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      41b02ea4
    • net: ethtool: fix the error condition in ethtool_get_phy_stats_ethtool() · 0dcc53ab
      Su Hui authored
      Clang static checker (scan-build) warning:
      net/ethtool/ioctl.c:line 2233, column 2
      Called function pointer is null (null dereference).
      
      Return '-EOPNOTSUPP' when 'ops->get_ethtool_phy_stats' is NULL to fix
      this typo.
      
      Fixes: 201ed315 ("net/ethtool/ioctl: split ethtool_get_phy_stats into multiple helpers")
      Signed-off-by: Su Hui <suhui@nfschina.com>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
      Link: https://lore.kernel.org/r/20240605034742.921751-1-suhui@nfschina.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      0dcc53ab
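The guard described in 0dcc53ab can be sketched in userspace like this. The ops table is a hypothetical, trimmed-down stand-in for the real one, and `model_get_phy_stats()`, `model_sset_count()` and `model_fill_stats()` are illustrative names, not kernel functions. The point is only that every callback is checked for NULL before it is invoked.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down ops table; the real one has many more members. */
struct model_ethtool_ops {
	int (*get_sset_count)(void);
	void (*get_ethtool_phy_stats)(unsigned long long *data);
};

/* Returns 0 on success or -EOPNOTSUPP when a required callback is missing. */
static int model_get_phy_stats(const struct model_ethtool_ops *ops,
			       unsigned long long *data)
{
	/* Each callback must be non-NULL before it may be called. */
	if (!ops || !ops->get_sset_count || !ops->get_ethtool_phy_stats)
		return -EOPNOTSUPP;

	if (ops->get_sset_count() <= 0)
		return -EOPNOTSUPP;

	ops->get_ethtool_phy_stats(data);
	return 0;
}

/* Sample callbacks for exercising the guard. */
static int model_sset_count(void)
{
	return 1;
}

static void model_fill_stats(unsigned long long *data)
{
	data[0] = 42;
}
```

Dropping the `!` in front of the last operand would reject drivers that do implement the callback and call through NULL for those that do not, which matches the static checker warning quoted in the changelog.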
    • ipv6: fix possible race in __fib6_drop_pcpu_from() · b01e1c03
      Eric Dumazet authored
      syzbot found a race in __fib6_drop_pcpu_from() [1]
      
      If the compiler reads *ppcpu_rt more than once, the
      second read could return NULL if another CPU clears
      the value in rt6_get_pcpu_route().
      
      Add a READ_ONCE() to prevent this race.
      
      Also add rcu_read_lock()/rcu_read_unlock() because
      we rely on RCU protection while dereferencing pcpu_rt.
      
      [1]
      
      Oops: general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] PREEMPT SMP KASAN PTI
      KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]
      CPU: 0 PID: 7543 Comm: kworker/u8:17 Not tainted 6.10.0-rc1-syzkaller-00013-g2bfcfd58 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
      Workqueue: netns cleanup_net
       RIP: 0010:__fib6_drop_pcpu_from.part.0+0x10a/0x370 net/ipv6/ip6_fib.c:984
      Code: f8 48 c1 e8 03 80 3c 28 00 0f 85 16 02 00 00 4d 8b 3f 4d 85 ff 74 31 e8 74 a7 fa f7 49 8d bf 90 00 00 00 48 89 f8 48 c1 e8 03 <80> 3c 28 00 0f 85 1e 02 00 00 49 8b 87 90 00 00 00 48 8b 0c 24 48
      RSP: 0018:ffffc900040df070 EFLAGS: 00010206
      RAX: 0000000000000012 RBX: 0000000000000001 RCX: ffffffff89932e16
      RDX: ffff888049dd1e00 RSI: ffffffff89932d7c RDI: 0000000000000091
      RBP: dffffc0000000000 R08: 0000000000000005 R09: 0000000000000007
      R10: 0000000000000001 R11: 0000000000000006 R12: ffff88807fa080b8
      R13: fffffbfff1a9a07d R14: ffffed100ff41022 R15: 0000000000000001
      FS:  0000000000000000(0000) GS:ffff8880b9200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b32c26000 CR3: 000000005d56e000 CR4: 00000000003526f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
        __fib6_drop_pcpu_from net/ipv6/ip6_fib.c:966 [inline]
        fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1027 [inline]
        fib6_purge_rt+0x7f2/0x9f0 net/ipv6/ip6_fib.c:1038
        fib6_del_route net/ipv6/ip6_fib.c:1998 [inline]
        fib6_del+0xa70/0x17b0 net/ipv6/ip6_fib.c:2043
        fib6_clean_node+0x426/0x5b0 net/ipv6/ip6_fib.c:2205
        fib6_walk_continue+0x44f/0x8d0 net/ipv6/ip6_fib.c:2127
        fib6_walk+0x182/0x370 net/ipv6/ip6_fib.c:2175
        fib6_clean_tree+0xd7/0x120 net/ipv6/ip6_fib.c:2255
        __fib6_clean_all+0x100/0x2d0 net/ipv6/ip6_fib.c:2271
        rt6_sync_down_dev net/ipv6/route.c:4906 [inline]
        rt6_disable_ip+0x7ed/0xa00 net/ipv6/route.c:4911
        addrconf_ifdown.isra.0+0x117/0x1b40 net/ipv6/addrconf.c:3855
        addrconf_notify+0x223/0x19e0 net/ipv6/addrconf.c:3778
        notifier_call_chain+0xb9/0x410 kernel/notifier.c:93
        call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1992
        call_netdevice_notifiers_extack net/core/dev.c:2030 [inline]
        call_netdevice_notifiers net/core/dev.c:2044 [inline]
        dev_close_many+0x333/0x6a0 net/core/dev.c:1585
        unregister_netdevice_many_notify+0x46d/0x19f0 net/core/dev.c:11193
        unregister_netdevice_many net/core/dev.c:11276 [inline]
        default_device_exit_batch+0x85b/0xae0 net/core/dev.c:11759
        ops_exit_list+0x128/0x180 net/core/net_namespace.c:178
        cleanup_net+0x5b7/0xbf0 net/core/net_namespace.c:640
        process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231
        process_scheduled_works kernel/workqueue.c:3312 [inline]
        worker_thread+0x6c8/0xf70 kernel/workqueue.c:3393
        kthread+0x2c1/0x3a0 kernel/kthread.c:389
        ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
        ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
      
      Fixes: d52d3997 ("ipv6: Create percpu rt6_info")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/r/20240604193549.981839-1-edumazet@google.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      b01e1c03
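The read-once pattern that b01e1c03 relies on can be approximated in userspace. This sketch assumes a GNU-compatible compiler for `__typeof__`; `MODEL_READ_ONCE()` is a simplified stand-in for the kernel macro, and `struct model_rt` and `model_peek_slot()` are hypothetical. It does not model the RCU protection the commit also adds.

```c
#include <stddef.h>

/*
 * Simplified userspace approximation of READ_ONCE(): the volatile access
 * forces exactly one load. Relies on the GNU __typeof__ extension.
 */
#define MODEL_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

struct model_rt {
	int refcount;
};

/* Returns the refcount of the route in @slot, or -1 if the slot is empty. */
static int model_peek_slot(struct model_rt **slot)
{
	/* Take one snapshot of the shared pointer. */
	struct model_rt *rt = MODEL_READ_ONCE(*slot);

	if (!rt)
		return -1;

	/* Use the snapshot, never *slot again, so a later clear cannot bite. */
	return rt->refcount;
}
```

Testing `*slot` and then dereferencing `*slot` a second time would leave a window in which another CPU clears the slot between the two loads.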
    • Merge branch 'af_unix-fix-lockless-access-of-sk-sk_state-and-others-fields' · 411c0ea6
      Paolo Abeni authored
      Kuniyuki Iwashima says:
      
      ====================
      af_unix: Fix lockless access of sk->sk_state and others fields.
      
      Patch 1 fixes a bug where SOCK_DGRAM's sk->sk_state is changed
      to TCP_CLOSE even if the socket is connect()ed to another socket.
      
      The rest of this series annotates lockless accesses to the following
      fields.
      
        * sk->sk_state
        * sk->sk_sndbuf
        * net->unx.sysctl_max_dgram_qlen
        * sk->sk_receive_queue.qlen
        * sk->sk_shutdown
      
      Note that with this series there is still an skb_queue_empty() call
      left in unix_dgram_disconnected() that needs to be changed to the
      lockless version, and the unix_peer(other) access there should be
      protected by unix_state_lock().
      
      This will require some refactoring, so another series will follow.
      
      Changes:
        v2:
          * Patch 1: Fix wrong double lock
      
        v1: https://lore.kernel.org/netdev/20240603143231.62085-1-kuniyu@amazon.com/
      ====================
      
      Link: https://lore.kernel.org/r/20240604165241.44758-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      411c0ea6
    • af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill(). · efaf24e3
      Kuniyuki Iwashima authored
      While dumping sockets via UNIX_DIAG, we do not hold unix_state_lock().
      
      Let's use READ_ONCE() to read sk->sk_shutdown.
      
      Fixes: e4e541a8 ("sock-diag: Report shutdown for inet and unix sockets (v2)")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      efaf24e3
    • af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen(). · 5d915e58
      Kuniyuki Iwashima authored
      We can dump the socket queue length via UNIX_DIAG by specifying
      UDIAG_SHOW_RQLEN.
      
      If sk->sk_state is TCP_LISTEN, we return the recv queue length,
      but here we do not hold recvq lock.
      
      Let's use skb_queue_len_lockless() in sk_diag_show_rqlen().
      
      Fixes: c9da99e6 ("unix_diag: Fixup RQLEN extension report")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      5d915e58
    • af_unix: Use skb_queue_empty_lockless() in unix_release_sock(). · 83690b82
      Kuniyuki Iwashima authored
      If the socket type is SOCK_STREAM or SOCK_SEQPACKET, unix_release_sock()
      checks the length of the peer socket's recvq under unix_state_lock().
      
      However, unix_stream_read_generic() calls skb_unlink() after releasing
      the lock.  Also, for SOCK_SEQPACKET, __skb_try_recv_datagram() unlinks
      skb without unix_state_lock().
      
      Thus, unix_state_lock() does not protect qlen.
      
      Let's use skb_queue_empty_lockless() in unix_release_sock().
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      83690b82
    • af_unix: Use unix_recvq_full_lockless() in unix_stream_connect(). · 45d872f0
      Kuniyuki Iwashima authored
      Once sk->sk_state is changed to TCP_LISTEN, it never changes.
      
      unix_accept() takes advantage of this characteristic; it does not
      hold the listener's unix_state_lock() and only acquires recvq lock
      to pop one skb.
      
      It means unix_state_lock() does not prevent the queue length from
      changing in unix_stream_connect().
      
      Thus, we need to use unix_recvq_full_lockless() to avoid data-race.
      
      Now we remove unix_recvq_full() as no one uses it.
      
      Note that we can remove READ_ONCE() for sk->sk_max_ack_backlog in
      unix_recvq_full_lockless() for the following reasons:
      
        (1) For SOCK_DGRAM, it is a written-once field in unix_create1()
      
        (2) For SOCK_STREAM and SOCK_SEQPACKET, it is changed under the
            listener's unix_state_lock() in unix_listen(), and we hold
            the lock in unix_stream_connect()
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      45d872f0
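The idea behind a lockless queue-full check, as used in 45d872f0, can be sketched with C11 atomics. This is a userspace analogue, not the kernel helper: `struct model_sock` and `model_recvq_full_lockless()` are hypothetical, and a relaxed atomic load is swapped in for the kernel's lockless queue-length accessor.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical socket model with only the two fields the check needs. */
struct model_sock {
	atomic_uint recvq_len;        /* updated under a different (queue) lock */
	unsigned int max_ack_backlog; /* assumed stable while the caller's lock is held */
};

/* True when the receive queue holds more entries than the backlog allows. */
static bool model_recvq_full_lockless(struct model_sock *sk)
{
	/* A single relaxed load: the value may be stale, but it is never torn. */
	unsigned int qlen = atomic_load_explicit(&sk->recvq_len,
						 memory_order_relaxed);

	/* Plain read: the changelog argues no annotation is needed here. */
	return qlen > sk->max_ack_backlog;
}
```

The result is a snapshot only; a concurrent accept can shrink the queue right after the check, which the commit treats as acceptable for this decision.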
    • af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen. · bd9f2d05
      Kuniyuki Iwashima authored
      net->unx.sysctl_max_dgram_qlen is exposed as a sysctl knob and can be
      changed concurrently.
      
      Let's use READ_ONCE() in unix_create1().
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      bd9f2d05
    • af_unix: Annotate data-races around sk->sk_sndbuf. · b0632e53
      Kuniyuki Iwashima authored
      sk_setsockopt() changes sk->sk_sndbuf under lock_sock(), but it's
      not used in af_unix.c.
      
      Let's use READ_ONCE() to read sk->sk_sndbuf in unix_writable(),
      unix_dgram_sendmsg(), and unix_stream_sendmsg().
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      b0632e53
    • af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG. · 0aa3be7b
      Kuniyuki Iwashima authored
      While dumping AF_UNIX sockets via UNIX_DIAG, sk->sk_state is read
      locklessly.
      
      Let's use READ_ONCE() there.
      
      Note that the result could be inconsistent if the socket is dumped
      during the state change.  This is common for other SOCK_DIAG and
      similar interfaces.
      
      Fixes: c9da99e6 ("unix_diag: Fixup RQLEN extension report")
      Fixes: 2aac7a2c ("unix_diag: Pending connections IDs NLA")
      Fixes: 45a96b9b ("unix_diag: Dumping all sockets core")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      0aa3be7b