1. 27 Apr, 2024 3 commits
    • virt: vmgenid: add support for devicetree bindings · 7b1bcd6b
      Sudan Landge authored
      Extend the vmgenid platform driver to support devicetree bindings. With
      this support, hypervisors can send vmgenid notifications to the virtual
      machine without the need to enable ACPI. The bindings are located at:
      Documentation/devicetree/bindings/rng/microsoft,vmgenid.yaml
      
      Since this is no longer ACPI-dependent, remove the dependency from
      Kconfig and protect the ACPI code with a single ifdef.
      Signed-off-by: Sudan Landge <sudanl@amazon.com>
      Reviewed-by: Alexander Graf <graf@amazon.com>
      Tested-by: Babis Chalios <bchalios@amazon.es>
      [Jason: - Small style cleanups and refactoring.
              - Re-work ACPI conditionalization. ]
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • dt-bindings: rng: Add vmgenid support · a4aded1f
      Sudan Landge authored
      The Virtual Machine Generation ID driver was introduced in commit
      af6b54e2 ("virt: vmgenid: notify RNG of VM fork and supply
      generation ID"), as an ACPI-only device.

      The VMGenID specification http://go.microsoft.com/fwlink/?LinkId=260709
      defines a mechanism for the BIOS/hypervisor to communicate to the
      virtual machine that it is executed with a different configuration (e.g.
      snapshot execution or creation from a template).  The guest operating
      system can use the notification for various purposes, such as
      re-initializing its random number generator.

      As per the spec, the hypervisor should provide a globally unique
      identifier, or GUID, via ACPI.
      
      This patch tries to mimic the mechanism to provide the same
      functionality which is for a hypervisor/BIOS to notify the virtual
      machine when it is executed with a different configuration.
      
      As part of this support, the devicetree bindings require the hypervisor
      or BIOS to provide a memory address which holds the GUID and an IRQ
      which is used to notify when there is a change in the GUID.  The memory
      exposed in the DT should follow the rules defined in the vmgenid spec
      mentioned above.
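
The mechanism described above (a memory region holding the GUID plus an interrupt that signals a change) can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the driver's code: `VmGenIdState`, `shared_memory`, `on_interrupt` and `reseed_count` are hypothetical names, and a `bytearray` stands in for the memory region the devicetree node would describe.

```python
import os

GUID_LEN = 16  # the generation ID is a 128-bit value


class VmGenIdState:
    """Hypothetical guest-side state: the last generation ID seen."""

    def __init__(self, shared_memory: bytearray):
        # 'shared_memory' stands in for the region named in the devicetree node.
        self.shared_memory = shared_memory
        self.last_id = bytes(shared_memory[:GUID_LEN])
        self.reseed_count = 0

    def on_interrupt(self) -> bool:
        """Handle the notification IRQ; return True if the ID changed."""
        new_id = bytes(self.shared_memory[:GUID_LEN])
        if new_id == self.last_id:
            return False  # ID unchanged, nothing to do
        self.last_id = new_id
        self.reseed_count += 1  # placeholder for reseeding the RNG
        return True


# Usage: the "hypervisor" rewrites the ID and then raises the interrupt.
memory = bytearray(os.urandom(GUID_LEN))
state = VmGenIdState(memory)
unchanged = state.on_interrupt()
memory[:GUID_LEN] = bytes(b ^ 0xFF for b in memory[:GUID_LEN])
changed = state.on_interrupt()
```

Comparing the new value against the stored one before acting keeps a notification that carries no actual change from triggering a reseed in this model.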
      
      Reason for this change: Choosing ACPI or devicetree is an intrinsic part
      of a hypervisor design.  Without going into details of why a hypervisor
      would choose DT over ACPI, we would like to highlight that the
      hypervisors that have chosen devicetree and now want to make use of the
      vmgenid functionality cannot do so today because vmgenid is an ACPI-only
      device.  This forces these hypervisors to change their design, which
      could have undesirable impacts on their use-cases, test-scenarios etc.
      
      The point of vmgenid is to provide a mechanism to discover a GUID when
      the execution state of a virtual machine changes, and the simplest way to
      do it is to pass a memory location and an interrupt via devicetree.  It
      would complicate things unnecessarily if, instead of using devicetree, we
      tried to implement a new protocol or modify other protocols to somehow
      provide the same functionality.
      
      We believe that adding a devicetree binding for vmgenid is a simpler,
      better alternative to provide the same functionality and will allow such
      hypervisors as mentioned above to continue using devicetree.
      
      More references to the vmgenid specs are found below.
      Signed-off-by: Sudan Landge <sudanl@amazon.com>
      Reviewed-by: Rob Herring <robh@kernel.org>
      Reviewed-by: Alexander Graf <graf@amazon.com>
      Link: https://www.qemu.org/docs/master/specs/vmgenid.html
      Link: https://learn.microsoft.com/en-us/windows/win32/hyperv_v2/virtual-machine-generation-identifier
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • virt: vmgenid: change implementation to use a platform driver · e0760671
      Sudan Landge authored
      Re-implement vmgenid as a platform driver in preparation for adding
      devicetree bindings support in the following commits.
      Signed-off-by: Sudan Landge <sudanl@amazon.com>
      Reviewed-by: Alexander Graf <graf@amazon.com>
      Tested-by: Babis Chalios <bchalios@amazon.es>
      [Jason: - Small style cleanups and refactoring.]
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  2. 26 Apr, 2024 25 commits
  3. 25 Apr, 2024 12 commits
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · c942a0cd
      Linus Torvalds authored
      Pull virtio fix from Michael Tsirkin:
       "enum renames for vdpa uapi - we better do this now before the names
        have been exposed in any releases"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        vDPA: code clean for vhost_vdpa uapi
    • Merge tag '9p-for-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs · dda89e2f
      Linus Torvalds authored
      Pull 9p fix from Eric Van Hensbergen:
       "This contains a single mitigation to help deal with an apparent race
        condition between client and server having to deal with inode number
        collisions"
      
      * tag '9p-for-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
        fs/9p: mitigate inode collisions
    • Merge tag 'acpi-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · a93289b8
      Linus Torvalds authored
      Pull ACPI fixes from Rafael Wysocki:
       "These fix three recent regressions, one introduced while enabling a
        new platform firmware feature for power management, and two introduced
        by a recent CPPC library update.
      
        Specifics:
      
         - Allow two overlapping Low-Power S0 Idle _DSM function sets to be
           used at the same time (Rafael Wysocki)
      
         - Fix bit offset computation in MASK_VAL() macro used for applying a
           bitmask to a new CPPC register value (Jarred White)
      
         - Fix access width field usage for PCC registers in CPPC (Vanshidhar
           Konda)"
      
      * tag 'acpi-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: PM: s2idle: Evaluate all Low-Power S0 Idle _DSM functions
        ACPI: CPPC: Fix access width used for PCC registers
        ACPI: CPPC: Fix bit_offset shift in MASK_VAL() macro
    • Merge tag 'net-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 52afb15e
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from netfilter, wireless and bluetooth.
      
        Nothing major, regression fixes are mostly in drivers, two more of
        those are flowing towards us thru various trees. I wish some of the
        changes went into -rc5, we'll try to keep an eye on frequency of PRs
        from sub-trees.
      
        Also disproportional number of fixes for bugs added in v6.4, strange
        coincidence.
      
        Current release - regressions:
      
         - igc: fix LED-related deadlock on driver unbind
      
         - wifi: mac80211: small fixes to recent clean up of the connection
           process
      
         - Revert "wifi: iwlwifi: bump FW API to 90 for BZ/SC devices", kernel
           doesn't have all the code to deal with that version, yet
      
         - Bluetooth:
             - set power_ctrl_enabled on NULL returned by gpiod_get_optional()
             - qca: fix invalid device address check, again
      
         - eth: ravb: fix registered interrupt names
      
        Current release - new code bugs:
      
         - wifi: mac80211: check EHT/TTLM action frame length
      
        Previous releases - regressions:
      
         - fix sk_memory_allocated_{add|sub} for architectures where
           __this_cpu_{add|sub}* are not IRQ-safe
      
         - dsa: mv88e6xx: fix link setup for 88E6250
      
        Previous releases - always broken:
      
         - ip: validate dev returned from __in_dev_get_rcu(), prevent possible
           null-derefs in a few places
      
         - switch number of for_each_rcu() loops using call_rcu() on the
           iterator to for_each_safe()
      
         - macsec: fix isolation of broadcast traffic in presence of offload
      
         - vxlan: drop packets from invalid source address
      
         - eth: mlxsw: trap and ACL programming fixes
      
         - eth: bnxt: PCIe error recovery fixes, fix counting dropped packets
      
         - Bluetooth:
             - lots of fixes for the command submission rework from v6.4
             - qca: fix NULL-deref on non-serdev suspend
      
        Misc:
      
         - tools: ynl: don't ignore errors in NLMSG_DONE messages"
      
      * tag 'net-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
        af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().
        net: b44: set pause params only when interface is up
        tls: fix lockless read of strp->msg_ready in ->poll
        dpll: fix dpll_pin_on_pin_register() for multiple parent pins
        net: ravb: Fix registered interrupt names
        octeontx2-af: fix the double free in rvu_npc_freemem()
        net: ethernet: ti: am65-cpts: Fix PTPv1 message type on TX packets
        ice: fix LAG and VF lock dependency in ice_reset_vf()
        iavf: Fix TC config comparison with existing adapter TC config
        i40e: Report MFS in decimal base instead of hex
        i40e: Do not use WQ_MEM_RECLAIM flag for workqueue
        net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()
        net/mlx5e: Advertise mlx5 ethernet driver updates sk_buff md_dst for MACsec
        macsec: Detect if Rx skb is macsec-related for offloading devices that update md_dst
        ethernet: Add helper for assigning packet type when dest address does not match device address
        macsec: Enable devices to advertise whether they update sk_buff md_dst during offloads
        net: phy: dp83869: Fix MII mode failure
        netfilter: nf_tables: honor table dormant flag from netdev release event path
        eth: bnxt: fix counting packets discarded due to OOM and netpoll
        igc: Fix LED-related deadlock on driver unbind
        ...
    • Merge branch 'acpi-cppc' · 2ad98467
      Rafael J. Wysocki authored
      * acpi-cppc:
        ACPI: CPPC: Fix access width used for PCC registers
        ACPI: CPPC: Fix bit_offset shift in MASK_VAL() macro
    • mm/hugetlb: fix DEBUG_LOCKS_WARN_ON(1) when dissolve_free_hugetlb_folio() · 52ccdde1
      Miaohe Lin authored
      When I ran memory failure tests recently, the warning below occurred:
      
      DEBUG_LOCKS_WARN_ON(1)
      WARNING: CPU: 8 PID: 1011 at kernel/locking/lockdep.c:232 __lock_acquire+0xccb/0x1ca0
      Modules linked in: mce_inject hwpoison_inject
      CPU: 8 PID: 1011 Comm: bash Kdump: loaded Not tainted 6.9.0-rc3-next-20240410-00012-gdb69f219f4be #3
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      RIP: 0010:__lock_acquire+0xccb/0x1ca0
      RSP: 0018:ffffa7a1c7fe3bd0 EFLAGS: 00000082
      RAX: 0000000000000000 RBX: eb851eb853975fcf RCX: ffffa1ce5fc1c9c8
      RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffffa1ce5fc1c9c0
      RBP: ffffa1c6865d3280 R08: ffffffffb0f570a8 R09: 0000000000009ffb
      R10: 0000000000000286 R11: ffffffffb0f2ad50 R12: ffffa1c6865d3d10
      R13: ffffa1c6865d3c70 R14: 0000000000000000 R15: 0000000000000004
      FS:  00007ff9f32aa740(0000) GS:ffffa1ce5fc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ff9f3134ba0 CR3: 00000008484e4000 CR4: 00000000000006f0
      Call Trace:
       <TASK>
       lock_acquire+0xbe/0x2d0
       _raw_spin_lock_irqsave+0x3a/0x60
       hugepage_subpool_put_pages.part.0+0xe/0xc0
       free_huge_folio+0x253/0x3f0
       dissolve_free_huge_page+0x147/0x210
       __page_handle_poison+0x9/0x70
       memory_failure+0x4e6/0x8c0
       hard_offline_page_store+0x55/0xa0
       kernfs_fop_write_iter+0x12c/0x1d0
       vfs_write+0x380/0x540
       ksys_write+0x64/0xe0
       do_syscall_64+0xbc/0x1d0
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      RIP: 0033:0x7ff9f3114887
      RSP: 002b:00007ffecbacb458 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007ff9f3114887
      RDX: 000000000000000c RSI: 0000564494164e10 RDI: 0000000000000001
      RBP: 0000564494164e10 R08: 00007ff9f31d1460 R09: 000000007fffffff
      R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
      R13: 00007ff9f321b780 R14: 00007ff9f3217600 R15: 00007ff9f3216a00
       </TASK>
      Kernel panic - not syncing: kernel: panic_on_warn set ...
      CPU: 8 PID: 1011 Comm: bash Kdump: loaded Not tainted 6.9.0-rc3-next-20240410-00012-gdb69f219f4be #3
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <TASK>
       panic+0x326/0x350
       check_panic_on_warn+0x4f/0x50
       __warn+0x98/0x190
       report_bug+0x18e/0x1a0
       handle_bug+0x3d/0x70
       exc_invalid_op+0x18/0x70
       asm_exc_invalid_op+0x1a/0x20
      RIP: 0010:__lock_acquire+0xccb/0x1ca0
      RSP: 0018:ffffa7a1c7fe3bd0 EFLAGS: 00000082
      RAX: 0000000000000000 RBX: eb851eb853975fcf RCX: ffffa1ce5fc1c9c8
      RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffffa1ce5fc1c9c0
      RBP: ffffa1c6865d3280 R08: ffffffffb0f570a8 R09: 0000000000009ffb
      R10: 0000000000000286 R11: ffffffffb0f2ad50 R12: ffffa1c6865d3d10
      R13: ffffa1c6865d3c70 R14: 0000000000000000 R15: 0000000000000004
       lock_acquire+0xbe/0x2d0
       _raw_spin_lock_irqsave+0x3a/0x60
       hugepage_subpool_put_pages.part.0+0xe/0xc0
       free_huge_folio+0x253/0x3f0
       dissolve_free_huge_page+0x147/0x210
       __page_handle_poison+0x9/0x70
       memory_failure+0x4e6/0x8c0
       hard_offline_page_store+0x55/0xa0
       kernfs_fop_write_iter+0x12c/0x1d0
       vfs_write+0x380/0x540
       ksys_write+0x64/0xe0
       do_syscall_64+0xbc/0x1d0
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      RIP: 0033:0x7ff9f3114887
      RSP: 002b:00007ffecbacb458 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007ff9f3114887
      RDX: 000000000000000c RSI: 0000564494164e10 RDI: 0000000000000001
      RBP: 0000564494164e10 R08: 00007ff9f31d1460 R09: 000000007fffffff
      R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
      R13: 00007ff9f321b780 R14: 00007ff9f3217600 R15: 00007ff9f3216a00
       </TASK>
      
      After git bisecting and digging into the code, I believe the root cause is
      that the _deferred_list field of a folio is unioned with the
      _hugetlb_subpool field.  In __update_and_free_hugetlb_folio(),
      folio->_deferred_list is initialized, which corrupts
      folio->_hugetlb_subpool when the folio is hugetlb.  Later,
      free_huge_folio() uses _hugetlb_subpool and the above warning is triggered.

      But the hugetlb flag is assumed to have been cleared by the time
      folio_put() is called in update_and_free_hugetlb_folio().  This assumption
      is broken by the race below:
      
      CPU1					CPU2
      dissolve_free_huge_page			update_and_free_pages_bulk
       update_and_free_hugetlb_folio		 hugetlb_vmemmap_restore_folios
      					  folio_clear_hugetlb_vmemmap_optimized
        clear_flag = folio_test_hugetlb_vmemmap_optimized
        if (clear_flag) <-- False, it's already cleared.
         __folio_clear_hugetlb(folio) <-- Hugetlb is not cleared.
        folio_put
         free_huge_folio <-- free_the_page is expected.
      					 list_for_each_entry()
      					  __folio_clear_hugetlb <-- Too late.
      
      Fix this issue by checking whether folio is hugetlb directly instead of
      checking clear_flag to close the race window.
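
The interleaving in the diagram above can be replayed deterministically in a small model. This is a sketch under stated assumptions rather than kernel code: `Folio`, `free_path`, `cpu1_release` and `run` are hypothetical names, locking is omitted, and the two flags are plain booleans.

```python
class Folio:
    """Hypothetical stand-in for a folio with two independent flags."""

    def __init__(self):
        self.hugetlb = True
        self.vmemmap_optimized = True


def free_path(folio):
    """Which free routine folio_put() would pick in this model."""
    return "free_huge_folio" if folio.hugetlb else "free_the_page"


def cpu1_release(folio, check_directly):
    # The old logic samples the vmemmap flag and reuses it later.
    clear_flag = folio.vmemmap_optimized
    if clear_flag:
        folio.vmemmap_optimized = False
    # The fix asks the folio itself whether it is still hugetlb.
    should_clear = folio.hugetlb if check_directly else clear_flag
    if should_clear:
        folio.hugetlb = False
    return free_path(folio)


def run(check_directly):
    folio = Folio()
    folio.vmemmap_optimized = False  # CPU2 cleared this flag first
    path = cpu1_release(folio, check_directly)
    folio.hugetlb = False  # CPU2 clears hugetlb afterwards, too late
    return path


buggy_path = run(check_directly=False)
fixed_path = run(check_directly=True)
```

In this model the sampled flag sends the folio down the hugetlb free path, while the direct check clears the hugetlb state before the folio is put.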
      
      Link: https://lkml.kernel.org/r/20240419085819.1901645-1-linmiaohe@huawei.com
      Fixes: 32c87719 ("hugetlb: do not clear hugetlb dtor until allocating vmemmap")
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests: mm: protection_keys: save/restore nr_hugepages value from launch script · ed74abcd
      Muhammad Usama Anjum authored
      The save/restore of nr_hugepages was added to the test itself by using the
      atexit() functionality.  But this is broken because the parent exits after
      creating its child, so the atexit() handler runs too early.  That is not
      all: the child in turn exits after creating its own child, and so on.

      The parent cannot wait for the termination status of its children, as
      it would keep holding the resources until the new pkey allocation fails.
      It is impossible to wait for the exit of all the grandchildren and
      great-grandchildren.  Hence restoring the nr_hugepages value from the
      parent is wrong.
      
      Let's save/restore the nr_hugepages settings in the launch script
      instead of doing it in the test.
      
      Link: https://lkml.kernel.org/r/20240419115027.3848958-1-usama.anjum@collabora.com
      Fixes: c52eb6db ("selftests: mm: restore settings from only parent process")
      Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Reported-by: Joey Gouly <joey.gouly@arm.com>
      Closes: https://lore.kernel.org/all/20240418125250.GA2941398@e124191.cambridge.arm.com
      Cc: Joey Gouly <joey.gouly@arm.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Merge tag 'nfsd-6.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · e33c4963
      Linus Torvalds authored
      Pull nfsd fixes from Chuck Lever:
      
       - Revert some backchannel fixes that went into v6.9-rc
      
      * tag 'nfsd-6.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
        Revert "NFSD: Convert the callback workqueue to use delayed_work"
        Revert "NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down"
    • Merge tag 'for-linus-2024042501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid · f9e02329
      Linus Torvalds authored
      Pull HID fixes from Benjamin Tissoires:
      
       - A couple of i2c-hid fixes (Kenny Levinsen & Nam Cao)
      
       - A config issue with mcp-2221 when CONFIG_IIO is not enabled
         (Abdelrahman Morsy)
      
       - A dev_err fix in intel-ish-hid (Zhang Lixu)
      
       - A couple of mouse fixes for both nintendo and Logitech-dj (Nuno
         Pereira and Yaraslau Furman)
      
       - I'm changing my main kernel email address as it's way simpler for me
         than the Red Hat one (Benjamin Tissoires)
      
      * tag 'for-linus-2024042501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
        HID: mcp-2221: cancel delayed_work only when CONFIG_IIO is enabled
        HID: logitech-dj: allow mice to use all types of reports
        HID: i2c-hid: Revert to await reset ACK before reading report descriptor
        HID: nintendo: Fix N64 controller being identified as mouse
        MAINTAINERS: update Benjamin's email address
        HID: intel-ish-hid: ipc: Fix dev_err usage with uninitialized dev->devc
        HID: i2c-hid: remove I2C_HID_READ_PENDING flag to prevent lock-up
    • mmc: moxart: fix handling of sgm->consumed, otherwise WARN_ON triggers · e027e72e
      Sergei Antonov authored
      When e.g. 8 bytes are to be read, sgm->consumed equals 8 immediately after
      sg_miter_next() call. The driver then increments it as bytes are read,
      so sgm->consumed becomes 16 and this warning triggers in sg_miter_stop():
      WARN_ON(miter->consumed > miter->length);
      
      WARNING: CPU: 0 PID: 28 at lib/scatterlist.c:925 sg_miter_stop+0x2c/0x10c
      CPU: 0 PID: 28 Comm: kworker/0:2 Tainted: G        W          6.9.0-rc5-dirty #249
      Hardware name: Generic DT based system
      Workqueue: events_freezable mmc_rescan
      Call trace:.
       unwind_backtrace from show_stack+0x10/0x14
       show_stack from dump_stack_lvl+0x44/0x5c
       dump_stack_lvl from __warn+0x78/0x16c
       __warn from warn_slowpath_fmt+0xb0/0x160
       warn_slowpath_fmt from sg_miter_stop+0x2c/0x10c
       sg_miter_stop from moxart_request+0xb0/0x468
       moxart_request from mmc_start_request+0x94/0xa8
       mmc_start_request from mmc_wait_for_req+0x60/0xa8
       mmc_wait_for_req from mmc_app_send_scr+0xf8/0x150
       mmc_app_send_scr from mmc_sd_setup_card+0x1c/0x420
       mmc_sd_setup_card from mmc_sd_init_card+0x12c/0x4dc
       mmc_sd_init_card from mmc_attach_sd+0xf0/0x16c
       mmc_attach_sd from mmc_rescan+0x1e0/0x298
       mmc_rescan from process_scheduled_works+0x2e4/0x4ec
       process_scheduled_works from worker_thread+0x1ec/0x24c
       worker_thread from kthread+0xd4/0xe0
       kthread from ret_from_fork+0x14/0x38
      
      This patch adds initial zeroing of sgm->consumed. It is then incremented
      as bytes are read or written.
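
The arithmetic behind the warning can be reproduced with a small model of the iterator's counters. This is a sketch under stated assumptions, not the driver or the scatterlist library: `Miter`, `pio_read` and `stop_would_warn` are hypothetical names, a single segment is assumed, and each FIFO access is assumed to move 4 bytes.

```python
class Miter:
    """Hypothetical model of the mapping iterator's counters."""

    def __init__(self, total):
        self.remaining = total
        self.length = 0
        self.consumed = 0

    def next(self):
        if self.remaining == 0:
            return False
        self.length = self.remaining
        self.remaining = 0
        # After next(), 'consumed' already equals 'length'.
        self.consumed = self.length
        return True

    def stop_would_warn(self):
        # Mirrors the quoted check: consumed > length.
        return self.consumed > self.length


def pio_read(miter, zero_consumed_first):
    miter.next()
    if zero_consumed_first:
        miter.consumed = 0  # the fix: start counting from zero
    for _ in range(miter.length // 4):
        miter.consumed += 4  # assumed: one 4-byte word per access
    return miter.consumed


buggy = Miter(8)
buggy_consumed = pio_read(buggy, zero_consumed_first=False)
fixed = Miter(8)
fixed_consumed = pio_read(fixed, zero_consumed_first=True)
```

With an 8-byte transfer the unfixed path ends at 16 consumed bytes against a length of 8, which is the condition the WARN_ON tests; zeroing first ends at exactly 8.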
      Signed-off-by: Sergei Antonov <saproj@gmail.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Fixes: 3ee0e7c3 ("mmc: moxart-mmc: Use sg_miter for PIO")
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Link: https://lore.kernel.org/r/20240422153607.963672-1-saproj@gmail.com
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    • Merge tag 'nf-24-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · e8baa63f
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS fixes for net
      
      The following patchset contains two Netfilter/IPVS fixes for net:
      
      Patch #1 fixes SCTP checksumming for IPVS with gso packets,
      	 from Ismael Luceno.
      
      Patch #2 honor dormant flag from netdev event path to fix a possible
      	 double hook unregistration.
      
      * tag 'nf-24-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: honor table dormant flag from netdev release event path
        ipvs: Fix checksumming on GSO of SCTP packets
      ====================
      
      Link: https://lore.kernel.org/r/20240425090149.1359547-1-pablo@netfilter.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc(). · 1971d13f
      Kuniyuki Iwashima authored
      syzbot reported a lockdep splat regarding unix_gc_lock and
      unix_state_lock().
      
      One is called from recvmsg() for a connected socket, and another
      is called from GC for TCP_LISTEN socket.
      
      So, the splat is false-positive.
      
      Let's add a dedicated lock class for the latter to suppress the splat.
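
Why a dedicated lock class removes the report can be seen in a toy lock-order graph. This is a sketch under stated assumptions and does not reproduce lockdep's algorithm: `has_cycle` is a hypothetical helper, the edges are simplified from the dependency chain quoted below, and the class name `u->lock/listener` is made up for illustration.

```python
def has_cycle(edges):
    """Return True if the 'held -> acquired' class graph contains a cycle."""
    graph = {}
    for held, acquired in edges:
        graph.setdefault(held, set()).add(acquired)

    def reachable(start, target, seen):
        for nxt in graph.get(start, ()):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reachable(nxt, target, seen):
                    return True
        return False

    return any(reachable(node, node, set()) for node in graph)


# One class for every u->lock: the two paths look like an ABBA inversion.
shared_class = [
    ("u->lock", "unix_gc_lock"),  # read path on a connected socket
    ("unix_gc_lock", "u->lock"),  # GC path on a listening socket
]

# Hypothetical separate class for the lock taken in the GC path.
split_class = [
    ("u->lock", "unix_gc_lock"),
    ("unix_gc_lock", "u->lock/listener"),
]

report_with_shared_class = has_cycle(shared_class)
report_with_split_class = has_cycle(split_class)
```

Because the two acquisitions involve different sockets, separating their classes removes the cycle from the graph without changing the actual locking.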
      
      Note that this change is not necessary for net-next.git, as the issue
      applies only to the old GC implementation.
      
      [0]:
      WARNING: possible circular locking dependency detected
      6.9.0-rc5-syzkaller-00007-g4d200843 #0 Not tainted
       -----------------------------------------------------
      kworker/u8:1/11 is trying to acquire lock:
      ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
      ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: __unix_gc+0x40e/0xf70 net/unix/garbage.c:302
      
      but task is already holding lock:
      ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
      ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
       -> #1 (unix_gc_lock){+.+.}-{2:2}:
             lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
             __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
             _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
             spin_lock include/linux/spinlock.h:351 [inline]
             unix_notinflight+0x13d/0x390 net/unix/garbage.c:140
             unix_detach_fds net/unix/af_unix.c:1819 [inline]
             unix_destruct_scm+0x221/0x350 net/unix/af_unix.c:1876
             skb_release_head_state+0x100/0x250 net/core/skbuff.c:1188
             skb_release_all net/core/skbuff.c:1200 [inline]
             __kfree_skb net/core/skbuff.c:1216 [inline]
             kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1252
             kfree_skb include/linux/skbuff.h:1262 [inline]
             manage_oob net/unix/af_unix.c:2672 [inline]
             unix_stream_read_generic+0x1125/0x2700 net/unix/af_unix.c:2749
             unix_stream_splice_read+0x239/0x320 net/unix/af_unix.c:2981
             do_splice_read fs/splice.c:985 [inline]
             splice_file_to_pipe+0x299/0x500 fs/splice.c:1295
             do_splice+0xf2d/0x1880 fs/splice.c:1379
             __do_splice fs/splice.c:1436 [inline]
             __do_sys_splice fs/splice.c:1652 [inline]
             __se_sys_splice+0x331/0x4a0 fs/splice.c:1634
             do_syscall_x64 arch/x86/entry/common.c:52 [inline]
             do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
             entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
       -> #0 (&u->lock){+.+.}-{2:2}:
             check_prev_add kernel/locking/lockdep.c:3134 [inline]
             check_prevs_add kernel/locking/lockdep.c:3253 [inline]
             validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
             __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
             lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
             __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
             _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
             spin_lock include/linux/spinlock.h:351 [inline]
             __unix_gc+0x40e/0xf70 net/unix/garbage.c:302
             process_one_work kernel/workqueue.c:3254 [inline]
             process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335
             worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
             kthread+0x2f0/0x390 kernel/kthread.c:388
             ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
             ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(unix_gc_lock);
                                     lock(&u->lock);
                                     lock(unix_gc_lock);
        lock(&u->lock);
      
       *** DEADLOCK ***
      
      3 locks held by kworker/u8:1/11:
       #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3229 [inline]
       #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_scheduled_works+0x8e0/0x17c0 kernel/workqueue.c:3335
       #1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3230 [inline]
       #1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_scheduled_works+0x91b/0x17c0 kernel/workqueue.c:3335
       #2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
       #2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261
      
      stack backtrace:
      CPU: 0 PID: 11 Comm: kworker/u8:1 Not tainted 6.9.0-rc5-syzkaller-00007-g4d200843 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
      Workqueue: events_unbound __unix_gc
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
       check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187
       check_prev_add kernel/locking/lockdep.c:3134 [inline]
       check_prevs_add kernel/locking/lockdep.c:3253 [inline]
       validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
       __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
       __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
       _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
       spin_lock include/linux/spinlock.h:351 [inline]
       __unix_gc+0x40e/0xf70 net/unix/garbage.c:302
       process_one_work kernel/workqueue.c:3254 [inline]
       process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335
       worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
       kthread+0x2f0/0x390 kernel/kthread.c:388
       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
       </TASK>
      
      Fixes: 47d8ac01 ("af_unix: Fix garbage collector racing against connect()")
      Reported-and-tested-by: syzbot+fa379358c28cc87cc307@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=fa379358c28cc87cc307
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240424170443.9832-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>