1. 12 Jul, 2024 12 commits
    • Merge tag 'arm-fixes-6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · e091caf9
      Linus Torvalds authored
      Pull ARM SoC fixes from Arnd Bergmann:
       "Most of these changes are Qualcomm SoC specific and came in just after
        I sent out the last set of fixes. This includes two regression fixes
        for SoC drivers, a defconfig change to ensure the Lenovo X13s is
        usable and 11 changes to DT files to fix regressions and minor
        platform specific issues.
      
        Tony and Chunyan step back from their respective maintainership roles
        on the omap and unisoc platforms, and Christophe in turn takes over
        maintaining some of the Freescale SoC drivers that he has been taking
        care of in practice already.
      
        Lastly, there are two trivial fixes for the davinci and sunxi
        platforms"
      
      * tag 'arm-fixes-6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
        MAINTAINERS: Update FREESCALE SOC DRIVERS and QUICC ENGINE LIBRARY
        MAINTAINERS: Add more maintainers for omaps
        ARM: davinci: Convert comma to semicolon
        MAINTAINERS: Move myself from SPRD Maintainer to Reviewer
        Revert "dt-bindings: cache: qcom,llcc: correct QDU1000 reg entries"
        arm64: dts: qcom: qdu1000: Fix LLCC reg property
        arm64: dts: qcom: sm6115: add iommu for sdhc_1
        arm64: dts: qcom: x1e80100-crd: fix DAI used for headset recording
        arm64: dts: qcom: x1e80100-crd: fix WCD audio codec TX port mapping
        soc: qcom: pmic_glink: disable UCSI on sc8280xp
        arm64: defconfig: enable Elan i2c-hid driver
        arm64: dts: qcom: sc8280xp-crd: use external pull up for touch reset
        arm64: dts: qcom: sc8280xp-x13s: fix touchscreen power on
        arm64: dts: qcom: x1e80100: Fix PCIe 6a reg offsets and add MHI
        arm64: dts: qcom: sa8775p: Correct IRQ number of EL2 non-secure physical timer
        arm64: dts: allwinner: Fix PMIC interrupt number
        arm64: dts: qcom: sc8280xp: Set status = "reserved" on PSHOLD
        arm64: dts: qcom: x1e80100-*: Allocate some CMA buffers
        arm64: dts: qcom: sc8180x: Fix LLCC reg property again
    • Merge tag 'char-misc-6.10-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · f469cf96
      Linus Torvalds authored
      Pull char / misc driver fixes from Greg KH:
       "Here are some small remaining driver fixes for 6.10-final that have
        all been in linux-next for a while and resolve reported issues.
        Included in here are:
      
         - mei driver fixes (and a spelling fix at the end just to be clean)
      
         - iio driver fixes for reported problems
      
         - fastrpc bugfixes
      
         - nvmem small fixes"
      
      * tag 'char-misc-6.10-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        mei: vsc: Fix spelling error
        mei: vsc: Enhance SPI transfer of IVSC ROM
        mei: vsc: Utilize the appropriate byte order swap function
        mei: vsc: Prevent timeout error with added delay post-firmware download
        mei: vsc: Enhance IVSC chipset stability during warm reboot
        nvmem: core: limit cell sysfs permissions to main attribute ones
        nvmem: core: only change name to fram for current attribute
        nvmem: meson-efuse: Fix return value of nvmem callbacks
        nvmem: rmem: Fix return value of rmem_read()
        misc: microchip: pci1xxxx: Fix return value of nvmem callbacks
        hpet: Support 32-bit userspace
        misc: fastrpc: Restrict untrusted app to attach to privileged PD
        misc: fastrpc: Fix ownership reassignment of remote heap
        misc: fastrpc: Fix memory leak in audio daemon attach operation
        misc: fastrpc: Avoid updating PD type for capability request
        misc: fastrpc: Copy the complete capability structure to user
        misc: fastrpc: Fix DSP capabilities request
        iio: light: apds9306: Fix error handing
        iio: trigger: Fix condition for own trigger
    • Merge tag 'tty-6.10-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 1cb67bcc
      Linus Torvalds authored
      Pull tty / serial fixes from Greg KH:
       "Here are some small serial driver fixes for 6.10-final. Included in
        here are:
      
         - qcom-geni fixes for a much much much discussed issue and everyone
           now seems to be agreed that this is the proper way forward to
           resolve the reported lockups
      
         - imx serial driver bugfixes
      
         - 8250_omap errata fix
      
         - ma35d1 serial driver bugfix
      
        All of these have been in linux-next for over a week with no reported
        issues"
      
      * tag 'tty-6.10-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        serial: qcom-geni: do not kill the machine on fifo underrun
        serial: qcom-geni: fix hard lockup on buffer flush
        serial: qcom-geni: fix soft lockup on sw flow control and suspend
        serial: imx: ensure RTS signal is not left active after shutdown
        tty: serial: ma35d1: Add a NULL check for of_node
        serial: 8250_omap: Fix Errata i2310 with RX FIFO level check
        serial: imx: only set receiver level if it is zero
    • Merge tag 'usb-6.10-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 1293147a
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some small USB driver fixes and new device ids for
        6.10-final. Included in here are:
      
         - new usb-serial device ids for reported devices
      
         - syzbot-triggered duplicate endpoint bugfix
      
         - gadget bugfix for configfs memory overwrite
      
         - xhci resume bugfix
      
         - new device quirk added
      
         - usb core error path bugfix
      
        All of these have been in linux-next (most for a while) with no
        reported issues"
      
      * tag 'usb-6.10-final' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        USB: serial: mos7840: fix crash on resume
        USB: serial: option: add Rolling RW350-GL variants
        USB: serial: option: add support for Foxconn T99W651
        USB: serial: option: add Netprisma LCUK54 series modules
        usb: gadget: configfs: Prevent OOB read/write in usb_string_copy()
        usb: dwc3: pci: add support for the Intel Panther Lake
        usb: core: add missing of_node_put() in usb_of_has_devices_or_graph
        USB: Add USB_QUIRK_NO_SET_INTF quirk for START BP-850k
        USB: core: Fix duplicate endpoint bug by clearing reserved bits in the descriptor
        xhci: always resume roothubs if xHC was reset during resume
        USB: serial: option: add Telit generic core-dump composition
        USB: serial: option: add Fibocom FM350-GL
        USB: serial: option: add Telit FN912 rmnet compositions
    • Merge tag 'sound-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 9b48104b
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "The majority of changes here are small device-specific fixes for ASoC
        SOF / Intel and usual HD-audio quirks.
      
        The only significant high LOC is found in the Cirrus firmware driver,
        but all those are for hardening against malicious firmware blobs, and
        they look fine for taking as a last minute fix, too"
      
      * tag 'sound-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda/realtek: Enable Mute LED on HP 250 G7
        firmware: cs_dsp: Use strnlen() on name fields in V1 wmfw files
        ALSA: hda/realtek: Limit mic boost on VAIO PRO PX
        ALSA: hda: cs35l41: Fix swapped l/r audio channels for Lenovo ThinBook 13x Gen4
        ASoC: SOF: Intel: hda-pcm: Limit the maximum number of periods by MAX_BDL_ENTRIES
        ASoC: rt711-sdw: add missing readable registers
        ASoC: SOF: Intel: hda: fix null deref on system suspend entry
        ALSA: hda/realtek: add quirk for Clevo V5[46]0TU
        firmware: cs_dsp: Prevent buffer overrun when processing V2 alg headers
        firmware: cs_dsp: Validate payload length before processing block
        firmware: cs_dsp: Return error if block header overflows file
        firmware: cs_dsp: Fix overflow checking of wmfw header
    • Merge tag 'bcachefs-2024-07-12' of https://evilpiepirate.org/git/bcachefs · 5d4c8513
      Linus Torvalds authored
      Pull more bcachefs fixes from Kent Overstreet:
      
       - revert the SLAB_ACCOUNT patch, something crazy is going on in memcg
         and someone forgot to test
      
       - minor fixes: missing rcu_read_lock(), scheduling while atomic (in an
         emergency shutdown path)
      
       - two lockdep fixes; these could have gone earlier, but were left to
         bake awhile
      
      * tag 'bcachefs-2024-07-12' of https://evilpiepirate.org/git/bcachefs:
        bcachefs: bch2_gc_btree() should not use btree_root_lock
        bcachefs: Set PF_MEMALLOC_NOFS when trans->locked
        bcachefs; Use trans_unlock_long() when waiting on allocator
        Revert "bcachefs: Mark bch_inode_info as SLAB_ACCOUNT"
        bcachefs: fix scheduling while atomic in break_cycle()
        bcachefs: Fix RCU splat
    • MAINTAINERS: Update FREESCALE SOC DRIVERS and QUICC ENGINE LIBRARY · 6fba5cbd
      Christophe Leroy authored
      FREESCALE SOC DRIVERS has been orphaned since
      commit eaac25d0 ("MAINTAINERS: Drop Li Yang as their email address
      stopped working").
      QUICC ENGINE LIBRARY has Qiang Zhao as maintainer but he hasn't
      responded for years and when Li Yang was still maintaining FREESCALE
      SOC DRIVERS he was also handling QUICC ENGINE LIBRARY directly.
      
      As a maintainer of LINUX FOR POWERPC EMBEDDED PPC8XX AND PPC83XX, I
      also need FREESCALE SOC DRIVERS to be actively maintained, so add
      myself as maintainer of FREESCALE SOC DRIVERS and QUICC ENGINE LIBRARY.
      
      See below link for more context.
      
      Link: https://lore.kernel.org/linuxppc-dev/20240219153016.ntltc76bphwrv6hn@skbuf/T/#mf6d4a5eef79e8eae7ae0456a2794c01e630a6756
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    • MAINTAINERS: Add more maintainers for omaps · dfd168e7
      Tony Lindgren authored
      There are many generations of omaps to maintain, and I will be only active
      as a hobbyist with time permitting. Let's add more maintainers to ensure
      continued Linux support.
      
      TI is interested in maintaining the active SoCs such as am3, am4 and
      dra7. And the hobbyists are interested in maintaining some of the older
      devices, mainly based on omap3 and 4 SoCs.
      
      Kevin and Roger have agreed to maintain the active TI parts. Both Kevin
      and Roger have been working on the omap variants for a long time, and
      have a good understanding of the hardware.
      
      Aaro and Andreas have agreed to maintain the community devices. Both Aaro
      and Andreas have long experience on working with the earlier TI SoCs.
      
      While at it, let's also change me to be a reviewer for the omap1, and
      drop the link to my old omap web page.
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      Acked-by: Kevin Hilman <khilman@baylibre.com>
      Acked-by: Aaro Koskinen <aaro.koskinen@iki.fi>
      Acked-by: Roger Quadros <rogerq@kernel.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    • bcachefs: bch2_gc_btree() should not use btree_root_lock · 1841027c
      Kent Overstreet authored
      btree_root_lock is for the root keys in btree_root, not the pointers to
      the nodes themselves; this fixes a lock ordering issue between
      btree_root_lock and btree node locks.
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • bcachefs: Set PF_MEMALLOC_NOFS when trans->locked · f236ea4b
      Kent Overstreet authored
      proper lock ordering is: fs_reclaim -> btree node locks
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • bcachefs; Use trans_unlock_long() when waiting on allocator · f0f3e511
      Kent Overstreet authored
      not using unlock_long() blocks key cache reclaim, and the allocator may
      take awhile
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • Revert "bcachefs: Mark bch_inode_info as SLAB_ACCOUNT" · aacd897d
      Kent Overstreet authored
      This reverts commit 86d81ec5.
      
      This wasn't tested with memcg enabled, it immediately hits a null ptr
      deref in list_lru_add().
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
  2. 11 Jul, 2024 20 commits
    • Merge tag 'for-6.10/dm-fixes-2' of... · 43db1e03
      Linus Torvalds authored
      Merge tag 'for-6.10/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper fix from Mikulas Patocka:
      
       - Fix broken discard for device mapper VDO target
      
      * tag 'for-6.10/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm vdo: replace max_discard_sectors with max_hw_discard_sectors
    • dm vdo: replace max_discard_sectors with max_hw_discard_sectors · d5cfecfe
      Bruce Johnston authored
      Commit 4f563a64 ("block: add a max_user_discard_sectors queue
      limit") changed block core to set max_discard_sectors to:
      min(lim->max_hw_discard_sectors, lim->max_user_discard_sectors)
      
      Commit 825d8bbd ("dm: always manage discard support in terms
      of max_hw_discard_sectors") fixed most dm targets to deal with
      this, by replacing max_discard_sectors with max_hw_discard_sectors.
      Unfortunately, dm-vdo did not get fixed at that time.
      
      Fixes: 825d8bbd ("dm: always manage discard support in terms of max_hw_discard_sectors")
      Signed-off-by: Bruce Johnston <bjohnsto@redhat.com>
      Signed-off-by: Matthew Sakai <msakai@redhat.com>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
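
The effect described in the dm vdo commit message can be sketched as a small userspace model. This is not the block layer's code: `struct limits_model`, `min_uint()` and `effective_discard()` are hypothetical names, and the example values are assumptions. Only the `min()` relation is taken from the commit message.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's queue limits. */
struct limits_model {
    unsigned int max_hw_discard_sectors;   /* set by the driver or target */
    unsigned int max_user_discard_sectors; /* set by the administrator */
    unsigned int max_discard_sectors;      /* derived by block core */
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/*
 * Model of the block core behaviour quoted above: the effective limit is
 * recomputed from the hw and user limits, so whatever a target wrote
 * directly into max_discard_sectors is overwritten.
 */
static unsigned int effective_discard(struct limits_model *lim)
{
    assert(lim != NULL);
    lim->max_discard_sectors = min_uint(lim->max_hw_discard_sectors,
                                        lim->max_user_discard_sectors);
    return lim->max_discard_sectors;
}
```

In this model, a target that only fills in `max_discard_sectors` and leaves `max_hw_discard_sectors` at zero ends up with an effective limit of zero, which matches the "broken discard" symptom; setting `max_hw_discard_sectors` instead keeps the intended value.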
    • Merge tag 'spi-fix-v6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · 8a18fda0
      Linus Torvalds authored
      Pull spi fixes from Mark Brown:
       "This fixes two regressions that have been bubbling along for a large
        part of this release.
      
        One is a revert of the multi mode support for the OMAP SPI controller,
        this introduced regressions on a number of systems and while there has
        been progress on fixing those we've not got something that works for
        everyone yet so let's just drop the change for now.
      
        The other is a series of fixes from David Lechner for his recent
        message optimisation work, this interacted badly with spi-mux which
        is altogether too clever with recursive use of the bus and creates
        situations that hadn't been considered.
      
        There are also a couple of small driver specific fixes, including one
        more patch from David for sleep duration calculations in the AXI
        driver"
      
      * tag 'spi-fix-v6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        spi: mux: set ctlr->bits_per_word_mask
        spi: add defer_optimize_message controller flag
        spi: don't unoptimize message in spi_async()
        spi: omap2-mcspi: Revert multi mode support
        spi: davinci: Unset POWERDOWN bit when releasing resources
        spi: axi-spi-engine: fix sleep calculation
        spi: imx: Don't expect DMA for i.MX{25,35,50,51,53} cspi devices
    • Merge tag 'net-6.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 51df8e0c
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from bpf and netfilter.
      
        Current release - regressions:
      
         - core: fix rc7's __skb_datagram_iter() regression
      
        Current release - new code bugs:
      
         - eth: bnxt: fix crashes when reducing ring count with active RSS
           contexts
      
        Previous releases - regressions:
      
         - sched: fix UAF when resolving a clash
      
         - skmsg: skip zero length skb in sk_msg_recvmsg2
      
         - sunrpc: fix kernel free on connection failure in
           xs_tcp_setup_socket
      
         - tcp: avoid too many retransmit packets
      
         - tcp: fix incorrect undo caused by DSACK of TLP retransmit
      
         - udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port().
      
         - eth: ks8851: fix deadlock with the SPI chip variant
      
         - eth: i40e: fix XDP program unloading while removing the driver
      
        Previous releases - always broken:
      
         - bpf:
             - fix too early release of tcx_entry
             - fail bpf_timer_cancel when callback is being cancelled
             - bpf: fix order of args in call to bpf_map_kvcalloc
      
         - netfilter: nf_tables: prefer nft_chain_validate
      
         - ppp: reject claimed-as-LCP but actually malformed packets
      
         - wireguard: avoid unaligned 64-bit memory accesses"
      
      * tag 'net-6.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (33 commits)
        net, sunrpc: Remap EPERM in case of connection failure in xs_tcp_setup_socket
        net/sched: Fix UAF when resolving a clash
        net: ks8851: Fix potential TX stall after interface reopen
        udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port().
        netfilter: nf_tables: prefer nft_chain_validate
        netfilter: nfnetlink_queue: drop bogus WARN_ON
        ethtool: netlink: do not return SQI value if link is down
        ppp: reject claimed-as-LCP but actually malformed packets
        selftests/bpf: Add timer lockup selftest
        net: ethernet: mtk-star-emac: set mac_managed_pm when probing
        e1000e: fix force smbus during suspend flow
        tcp: avoid too many retransmit packets
        bpf: Defer work in bpf_timer_cancel_and_free
        bpf: Fail bpf_timer_cancel when callback is being cancelled
        bpf: fix order of args in call to bpf_map_kvcalloc
        net: ethernet: lantiq_etop: fix double free in detach
        i40e: Fix XDP program unloading while removing the driver
        net: fix rc7's __skb_datagram_iter()
        net: ks8851: Fix deadlock with the SPI chip variant
        octeontx2-af: Fix incorrect value output on error path in rvu_check_rsrc_availability()
        ...
    • Merge tag 'vfs-6.10-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 83ab4b46
      Linus Torvalds authored
      Pull vfs fixes from Christian Brauner:
       "cachefiles:
      
         - Export an existing and add a new cachefile helper to be used in
           filesystems to fix reference count bugs
      
         - Use the newly added fscache_try_get_volume() helper to get a
           reference count on an fscache_volume to handle volumes that are
           about to be removed cleanly
      
         - After withdrawing a fscache_cache via FSCACHE_CACHE_IS_WITHDRAWN
           wait for all ongoing cookie lookups to complete and for the object
           count to reach zero
      
         - Propagate errors from vfs_getxattr() to avoid an infinite loop in
           cachefiles_check_volume_xattr() because it keeps seeing ESTALE
      
         - Don't send new requests when an object is dropped by raising
           CACHEFILES_ONDEMAND_OBJSTATE_DROPPING
      
         - Cancel all requests for an object that is about to be dropped
      
         - Wait for the ondemand_object_worker to finish before dropping a
           cachefiles object to prevent use-after-free
      
         - Use cyclic allocation for message ids to better handle id recycling
      
         - Add missing lock protection when iterating through the xarray when
           polling
      
        netfs:
      
         - Use standard logging helpers for debug logging
      
        VFS:
      
         - Fix potential use-after-free in file locks during
           trace_posix_lock_inode(). The tracepoint could fire while another
           task raced it and freed the lock that was requested to be traced
      
         - Only increment the nr_dentry_negative counter for dentries that are
           present on the superblock LRU. Currently, DCACHE_LRU_LIST list is
           used to detect this case. However, the flag is also raised in
           combination with DCACHE_SHRINK_LIST to indicate that dentry->d_lru
           is used. So checking only DCACHE_LRU_LIST will lead to wrong
           nr_dentry_negative count. Fix the check to not count dentries that
           are on a shrink related list
      
        Misc:
      
         - hfsplus: fix an uninitialized value issue in copy_name
      
         - minix: fix minixfs_rename with HIGHMEM. It still uses kunmap() even
           though we switched it to kmap_local_page() a while ago"
      
      * tag 'vfs-6.10-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        minixfs: Fix minixfs_rename with HIGHMEM
        hfsplus: fix uninit-value in copy_name
        vfs: don't mod negative dentry count when on shrinker list
        filelock: fix potential use-after-free in posix_lock_inode
        cachefiles: add missing lock protection when polling
        cachefiles: cyclic allocation of msg_id to avoid reuse
        cachefiles: wait for ondemand_object_worker to finish when dropping object
        cachefiles: cancel all requests for the object that is being dropped
        cachefiles: stop sending new request when dropping object
        cachefiles: propagate errors from vfs_getxattr() to avoid infinite loop
        cachefiles: fix slab-use-after-free in cachefiles_withdraw_cookie()
        cachefiles: fix slab-use-after-free in fscache_withdraw_volume()
        netfs, fscache: export fscache_put_volume() and add fscache_try_get_volume()
        netfs: Switch debug logging to pr_debug()
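
The negative dentry accounting rule from the VFS part of this pull message can be sketched as a flag test. This is a minimal illustration, not the kernel's code: the bit values are made up for the example and `counts_as_negative_lru()` is a hypothetical helper. Only the two flag names and the rule itself come from the text above.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the real values are not given in the text. */
#define DCACHE_LRU_LIST    (1u << 0)
#define DCACHE_SHRINK_LIST (1u << 1)

static_assert((DCACHE_LRU_LIST & DCACHE_SHRINK_LIST) == 0,
              "the two flags must be distinct bits");

/*
 * A dentry is on the superblock LRU only when LRU_LIST is set and
 * SHRINK_LIST is clear. Testing LRU_LIST alone would also match dentries
 * on a shrink list, because both flags are raised in that case.
 */
static bool counts_as_negative_lru(unsigned int d_flags)
{
    unsigned int mask = DCACHE_LRU_LIST | DCACHE_SHRINK_LIST;

    return (d_flags & mask) == DCACHE_LRU_LIST;
}
```

Masking both flags and comparing against `DCACHE_LRU_LIST` is one way to express "on the LRU but not on a shrink list" in a single test.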
    • Merge tag 'asoc-fix-v6.10-rc7' of... · f19e1027
      Takashi Iwai authored
      Merge tag 'asoc-fix-v6.10-rc7' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
      
      ASoC: Fixes for v6.10
      
      A few fairly small fixes for ASoC, there's a relatively large set of
      hardening changes for the cs_dsp firmware file parsing and a couple of
      other small device specific fixes.
    • Merge tag 'nf-24-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · d7c199e7
      Paolo Abeni authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following batch contains Netfilter fixes for net:
      
      Patch #1 fixes a bogus WARN_ON splat in nfnetlink_queue.
      
      Patch #2 fixes a crash due to stack overflow in chain loop detection
      	 by using the existing chain validation routines
      
      Both patches from Florian Westphal.
      
      netfilter pull request 24-07-11
      
      * tag 'nf-24-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: prefer nft_chain_validate
        netfilter: nfnetlink_queue: drop bogus WARN_ON
      ====================
      
      Link: https://patch.msgid.link/20240711093948.3816-1-pablo@netfilter.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · a819ff0c
      Paolo Abeni authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2024-07-11
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 4 non-merge commits during the last 2 day(s) which contain
      a total of 4 files changed, 262 insertions(+), 19 deletions(-).
      
      The main changes are:
      
      1) Fixes for a BPF timer lockup and a use-after-free scenario when timers
         are used concurrently, from Kumar Kartikeya Dwivedi.
      
      2) Fix the argument order in the call to bpf_map_kvcalloc() which could
         otherwise lead to a compilation error, from Mohammad Shehar Yaar Tausif.
      
      bpf-for-netdev
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        selftests/bpf: Add timer lockup selftest
        bpf: Defer work in bpf_timer_cancel_and_free
        bpf: Fail bpf_timer_cancel when callback is being cancelled
        bpf: fix order of args in call to bpf_map_kvcalloc
      ====================
      
      Link: https://patch.msgid.link/20240711084016.25757-1-daniel@iogearbox.net
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net, sunrpc: Remap EPERM in case of connection failure in xs_tcp_setup_socket · 626dfed5
      Daniel Borkmann authored
      When using a BPF program on kernel_connect(), the call can return -EPERM. This
      causes xs_tcp_setup_socket() to loop forever, filling up the syslog and causing
      the kernel to potentially freeze up.
      
      Neil suggested:
      
        This will propagate -EPERM up into other layers which might not be ready
        to handle it. It might be safer to map EPERM to an error we would be more
        likely to expect from the network system - such as ECONNREFUSED or ENETDOWN.
      
      ECONNREFUSED seems a reasonable choice. BPF programs may be unable to set
      a different error themselves (see the handling in 4fbac77d), in particular
      on kernels which do not have f10d0596 ("bpf: Make BPF_PROG_RUN_ARRAY return
      -err instead of allow boolean"), so it is better to simply remap the error
      for consistent behavior. UDP does handle EPERM in xs_udp_send_request().
      
      Fixes: d74bad4e ("bpf: Hooks for sys_connect")
      Fixes: 4fbac77d ("bpf: Hooks for sys_bind")
      Co-developed-by: Lex Siegel <usiegl00@gmail.com>
      Signed-off-by: Lex Siegel <usiegl00@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Trond Myklebust <trondmy@kernel.org>
      Cc: Anna Schumaker <anna@kernel.org>
      Link: https://github.com/cilium/cilium/issues/33395
      Link: https://lore.kernel.org/bpf/171374175513.12877.8993642908082014881@noble.neil.brown.name
      Link: https://patch.msgid.link/9069ec1d59e4b2129fc23433349fd5580ad43921.1720075070.git.daniel@iogearbox.net
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
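
The remapping idea from the sunrpc commit message can be sketched as a small helper. This is an assumption-laden illustration: `remap_connect_error()` is a hypothetical name, and the actual patch may place the remapping differently inside xs_tcp_setup_socket(). Only the -EPERM to -ECONNREFUSED mapping comes from the commit message.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: translate an error from the connect path into one
 * that the caller's retry logic is already prepared to handle.
 */
static int remap_connect_error(int err)
{
    assert(err <= 0); /* kernel-style convention: 0 or a negative errno */

    if (err == -EPERM)
        return -ECONNREFUSED;
    return err;
}
```

All other return values pass through unchanged, so only the case that caused the endless retry loop is affected.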
    • net/sched: Fix UAF when resolving a clash · 26488172
      Chengen Du authored
      KASAN reports the following UAF:
      
       BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct]
       Read of size 1 at addr ffff888c07603600 by task handler130/6469
      
       Call Trace:
        <IRQ>
        dump_stack_lvl+0x48/0x70
        print_address_description.constprop.0+0x33/0x3d0
        print_report+0xc0/0x2b0
        kasan_report+0xd0/0x120
        __asan_load1+0x6c/0x80
        tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct]
        tcf_ct_act+0x886/0x1350 [act_ct]
        tcf_action_exec+0xf8/0x1f0
        fl_classify+0x355/0x360 [cls_flower]
        __tcf_classify+0x1fd/0x330
        tcf_classify+0x21c/0x3c0
        sch_handle_ingress.constprop.0+0x2c5/0x500
        __netif_receive_skb_core.constprop.0+0xb25/0x1510
        __netif_receive_skb_list_core+0x220/0x4c0
        netif_receive_skb_list_internal+0x446/0x620
        napi_complete_done+0x157/0x3d0
        gro_cell_poll+0xcf/0x100
        __napi_poll+0x65/0x310
        net_rx_action+0x30c/0x5c0
        __do_softirq+0x14f/0x491
        __irq_exit_rcu+0x82/0xc0
        irq_exit_rcu+0xe/0x20
        common_interrupt+0xa1/0xb0
        </IRQ>
        <TASK>
        asm_common_interrupt+0x27/0x40
      
       Allocated by task 6469:
        kasan_save_stack+0x38/0x70
        kasan_set_track+0x25/0x40
        kasan_save_alloc_info+0x1e/0x40
        __kasan_krealloc+0x133/0x190
        krealloc+0xaa/0x130
        nf_ct_ext_add+0xed/0x230 [nf_conntrack]
        tcf_ct_act+0x1095/0x1350 [act_ct]
        tcf_action_exec+0xf8/0x1f0
        fl_classify+0x355/0x360 [cls_flower]
        __tcf_classify+0x1fd/0x330
        tcf_classify+0x21c/0x3c0
        sch_handle_ingress.constprop.0+0x2c5/0x500
        __netif_receive_skb_core.constprop.0+0xb25/0x1510
        __netif_receive_skb_list_core+0x220/0x4c0
        netif_receive_skb_list_internal+0x446/0x620
        napi_complete_done+0x157/0x3d0
        gro_cell_poll+0xcf/0x100
        __napi_poll+0x65/0x310
        net_rx_action+0x30c/0x5c0
        __do_softirq+0x14f/0x491
      
       Freed by task 6469:
        kasan_save_stack+0x38/0x70
        kasan_set_track+0x25/0x40
        kasan_save_free_info+0x2b/0x60
        ____kasan_slab_free+0x180/0x1f0
        __kasan_slab_free+0x12/0x30
        slab_free_freelist_hook+0xd2/0x1a0
        __kmem_cache_free+0x1a2/0x2f0
        kfree+0x78/0x120
        nf_conntrack_free+0x74/0x130 [nf_conntrack]
        nf_ct_destroy+0xb2/0x140 [nf_conntrack]
        __nf_ct_resolve_clash+0x529/0x5d0 [nf_conntrack]
        nf_ct_resolve_clash+0xf6/0x490 [nf_conntrack]
        __nf_conntrack_confirm+0x2c6/0x770 [nf_conntrack]
        tcf_ct_act+0x12ad/0x1350 [act_ct]
        tcf_action_exec+0xf8/0x1f0
        fl_classify+0x355/0x360 [cls_flower]
        __tcf_classify+0x1fd/0x330
        tcf_classify+0x21c/0x3c0
        sch_handle_ingress.constprop.0+0x2c5/0x500
        __netif_receive_skb_core.constprop.0+0xb25/0x1510
        __netif_receive_skb_list_core+0x220/0x4c0
        netif_receive_skb_list_internal+0x446/0x620
        napi_complete_done+0x157/0x3d0
        gro_cell_poll+0xcf/0x100
        __napi_poll+0x65/0x310
        net_rx_action+0x30c/0x5c0
        __do_softirq+0x14f/0x491
      
      The ct may be dropped if a clash has been resolved but is still passed to
      the tcf_ct_flow_table_process_conn function for further usage. This issue
      can be fixed by retrieving ct from skb again after confirming conntrack.
      
      Fixes: 0cc254e5 ("net/sched: act_ct: Offload connections with commit action")
      Co-developed-by: Gerald Yang <gerald.yang@canonical.com>
      Signed-off-by: Gerald Yang <gerald.yang@canonical.com>
      Signed-off-by: Chengen Du <chengen.du@canonical.com>
      Link: https://patch.msgid.link/20240710053747.13223-1-chengen.du@canonical.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
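
The fix pattern named in the net/sched commit message, fetching the pointer again after a call that may free it, can be sketched in userspace C. Everything here is a hypothetical model: `struct conn_model`, `struct pkt_model`, `confirm_with_clash()` and `process_packet()` are illustrative names and do not correspond to conntrack or act_ct functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a conntrack entry and a packet. */
struct conn_model { int id; };
struct pkt_model { struct conn_model *ct; };

/*
 * Stand-in for confirmation with clash resolution: the packet's entry
 * loses the clash, is freed, and the packet is pointed at the winner.
 */
static void confirm_with_clash(struct pkt_model *pkt, struct conn_model *winner)
{
    assert(pkt != NULL && winner != NULL);
    free(pkt->ct);
    pkt->ct = winner;
}

/*
 * Pattern from the fix: do not keep using a pointer fetched before
 * confirmation; read it from the packet again afterwards.
 */
static int process_packet(struct pkt_model *pkt, struct conn_model *winner)
{
    struct conn_model *ct = pkt->ct;  /* fetched before confirmation */

    confirm_with_clash(pkt, winner);  /* may free the old entry */
    ct = pkt->ct;                     /* re-fetch: the old pointer is stale */
    return ct->id;
}
```

Without the re-fetch, `ct->id` would read freed memory, which is the kind of access KASAN reported in the trace above.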
    • net: ks8851: Fix potential TX stall after interface reopen · 7a99afef
      Ronald Wahl authored
      The amount of TX space in the hardware buffer is tracked in the tx_space
      variable. The initial value is currently only set during driver probing.
      
      After closing the interface and reopening it, the tx_space variable still
      holds the last value it had before the close. If that value is smaller
      than the size of the first packet sent after reopening the interface, the
      queue will be stopped. The queue is woken up after receiving a TX
      interrupt, but this will never happen since we did not send anything.
      
      This commit moves the initialization of the tx_space variable to the
      ks8851_net_open function right before starting the TX queue. Also query
      the value from the hardware instead of using a hard coded value.
      
      Only the SPI chip variant is affected by this issue because only this
      driver variant actually depends on the tx_space variable in the xmit
      function.
      
      Fixes: 3dc5d445 ("net: ks8851: Fix TX stall caused by TX buffer overrun")
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Simon Horman <horms@kernel.org>
      Cc: netdev@vger.kernel.org
      Cc: stable@vger.kernel.org # 5.10+
      Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Link: https://patch.msgid.link/20240709195845.9089-1-rwahl@gmx.de
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      7a99afef
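      The tx_space fix described above can be illustrated with a small
      userspace sketch. It is not the driver's code: struct fake_ks8851 and
      the fake_* helpers are hypothetical stand-ins, and the simulated
      register read replaces the SPI access the real driver performs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver state; only tx_space is named in the commit. */
struct fake_ks8851 {
	uint16_t hw_tx_free;    /* free TX memory as reported by the simulated chip */
	uint16_t tx_space;      /* driver's cached view of the free TX memory */
	bool queue_stopped;     /* true once the TX queue has been stopped */
};

/* Simulated hardware query; the real driver would read a chip register. */
static uint16_t fake_read_tx_space(const struct fake_ks8851 *ks)
{
	return ks->hw_tx_free;
}

/* Open path: refresh tx_space from hardware right before starting the queue. */
static void fake_net_open(struct fake_ks8851 *ks)
{
	ks->tx_space = fake_read_tx_space(ks);
	ks->queue_stopped = false;
}

/* Xmit path: stop the queue when the packet does not fit the cached space. */
static bool fake_xmit(struct fake_ks8851 *ks, uint16_t len)
{
	if (len > ks->tx_space) {
		ks->queue_stopped = true;
		return false;
	}
	ks->tx_space -= len;
	return true;
}
```

      With a stale tx_space the first fake_xmit() stops the queue; after
      fake_net_open() refreshes the value, the same packet is accepted.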
    • Kuniyuki Iwashima's avatar
      udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port(). · 5c0b485a
      Kuniyuki Iwashima authored
      syzkaller triggered the warning [0] in udp_v4_early_demux().
      
      In udp_v[46]_early_demux() and sk_lookup(), we do not touch the refcount
      of the looked-up sk and use sock_pfree() as skb->destructor, so we check
      SOCK_RCU_FREE to ensure that the sk is safe to access during the RCU grace
      period.
      
      Currently, SOCK_RCU_FREE is flagged for a bound socket after being put
      into the hash table.  Moreover, the SOCK_RCU_FREE check is done too early
      in udp_v[46]_early_demux() and sk_lookup(), so there could be a small race
      window:
      
        CPU1                                 CPU2
        ----                                 ----
        udp_v4_early_demux()                 udp_lib_get_port()
        |                                    |- hlist_add_head_rcu()
        |- sk = __udp4_lib_demux_lookup()    |
        |- DEBUG_NET_WARN_ON_ONCE(sk_is_refcounted(sk));
                                             `- sock_set_flag(sk, SOCK_RCU_FREE)
      
      We had the same bug in TCP and fixed it in commit 871019b2 ("net:
      set SOCK_RCU_FREE before inserting socket into hashtable").
      
      Let's apply the same fix for UDP.
      
      [0]:
      WARNING: CPU: 0 PID: 11198 at net/ipv4/udp.c:2599 udp_v4_early_demux+0x481/0xb70 net/ipv4/udp.c:2599
      Modules linked in:
      CPU: 0 PID: 11198 Comm: syz-executor.1 Not tainted 6.9.0-g93bda330 #13
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      RIP: 0010:udp_v4_early_demux+0x481/0xb70 net/ipv4/udp.c:2599
      Code: c5 7a 15 fe bb 01 00 00 00 44 89 e9 31 ff d3 e3 81 e3 bf ef ff ff 89 de e8 2c 74 15 fe 85 db 0f 85 02 06 00 00 e8 9f 7a 15 fe <0f> 0b e8 98 7a 15 fe 49 8d 7e 60 e8 4f 39 2f fe 49 c7 46 60 20 52
      RSP: 0018:ffffc9000ce3fa58 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8318c92c
      RDX: ffff888036ccde00 RSI: ffffffff8318c2f1 RDI: 0000000000000001
      RBP: ffff88805a2dd6e0 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0001ffffffffffff R12: ffff88805a2dd680
      R13: 0000000000000007 R14: ffff88800923f900 R15: ffff88805456004e
      FS:  00007fc449127640(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fc449126e38 CR3: 000000003de4b002 CR4: 0000000000770ef0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
      PKRU: 55555554
      Call Trace:
       <TASK>
       ip_rcv_finish_core.constprop.0+0xbdd/0xd20 net/ipv4/ip_input.c:349
       ip_rcv_finish+0xda/0x150 net/ipv4/ip_input.c:447
       NF_HOOK include/linux/netfilter.h:314 [inline]
       NF_HOOK include/linux/netfilter.h:308 [inline]
       ip_rcv+0x16c/0x180 net/ipv4/ip_input.c:569
       __netif_receive_skb_one_core+0xb3/0xe0 net/core/dev.c:5624
       __netif_receive_skb+0x21/0xd0 net/core/dev.c:5738
       netif_receive_skb_internal net/core/dev.c:5824 [inline]
       netif_receive_skb+0x271/0x300 net/core/dev.c:5884
       tun_rx_batched drivers/net/tun.c:1549 [inline]
       tun_get_user+0x24db/0x2c50 drivers/net/tun.c:2002
       tun_chr_write_iter+0x107/0x1a0 drivers/net/tun.c:2048
       new_sync_write fs/read_write.c:497 [inline]
       vfs_write+0x76f/0x8d0 fs/read_write.c:590
       ksys_write+0xbf/0x190 fs/read_write.c:643
       __do_sys_write fs/read_write.c:655 [inline]
       __se_sys_write fs/read_write.c:652 [inline]
       __x64_sys_write+0x41/0x50 fs/read_write.c:652
       x64_sys_call+0xe66/0x1990 arch/x86/include/generated/asm/syscalls_64.h:2
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0x4b/0x110 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x4b/0x53
      RIP: 0033:0x7fc44a68bc1f
      Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 e9 cf f5 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 3c d0 f5 ff 48
      RSP: 002b:00007fc449126c90 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00000000004bc050 RCX: 00007fc44a68bc1f
      RDX: 0000000000000032 RSI: 00000000200000c0 RDI: 00000000000000c8
      RBP: 00000000004bc050 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000032 R11: 0000000000000293 R12: 0000000000000000
      R13: 000000000000000b R14: 00007fc44a5ec530 R15: 0000000000000000
       </TASK>
      
      Fixes: 6acc9b43 ("bpf: Add helper to retrieve socket in BPF")
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/20240709191356.24010-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      5c0b485a
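      The ordering change described above (flag the socket before it becomes
      visible in the hash table) can be sketched in plain C. This is a
      single-threaded illustration only: struct fake_sock, fake_hash_head
      and the fake_* functions are hypothetical, and the kernel additionally
      relies on the publication semantics of hlist_add_head_rcu(), which
      this sketch does not model.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical socket: rcu_free stands in for the SOCK_RCU_FREE flag. */
struct fake_sock {
	bool rcu_free;
	struct fake_sock *next;   /* stands in for the hash chain linkage */
};

/* Hypothetical one-bucket "hash table". */
static struct fake_sock *fake_hash_head;

/* Fixed ordering: set the flag first, then publish the socket. */
static void fake_get_port(struct fake_sock *sk)
{
	sk->rcu_free = true;          /* flag before insertion */
	sk->next = fake_hash_head;
	fake_hash_head = sk;          /* socket becomes visible to lookups here */
}

/* Lookup-side check: every socket a reader can find must carry the flag. */
static bool fake_all_visible_flagged(void)
{
	for (const struct fake_sock *sk = fake_hash_head; sk; sk = sk->next)
		if (!sk->rcu_free)
			return false;
	return true;
}
```

      Because the flag is written before the insertion, a reader that finds
      the socket through the table cannot observe it without the flag, which
      is the property the warning in the trace checks.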
    • Florian Westphal's avatar
      netfilter: nf_tables: prefer nft_chain_validate · cff3bd01
      Florian Westphal authored
      nft_chain_validate already performs loop detection because a cycle will
      result in a call stack overflow (ctx->level >= NFT_JUMP_STACK_SIZE).
      
      It also follows maps via ->validate callback in nft_lookup, so there
      appears no reason to iterate the maps again.
      
      nf_tables_check_loops() and all its helper functions can be removed.
      This improves ruleset load time significantly, from 23s down to 12s.
      
      This also fixes a crash bug. Old loop detection code can result in
      unbounded recursion:
      
      BUG: TASK stack guard page was hit at ....
      Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN
      CPU: 4 PID: 1539 Comm: nft Not tainted 6.10.0-rc5+ #1
      [..]
      
      with a suitable ruleset during validation of register stores.
      
      I can't see any actual reason to attempt to check for this from
      nft_validate_register_store(), at this point the transaction is still in
      progress, so we don't have a full picture of the rule graph.
      
      For nf-next it might make sense to either remove it or make this depend
      on table->validate_state in case we could catch an error earlier
      (for improved error reporting to userspace).
      
      Fixes: 20a69341 ("netfilter: nf_tables: add netlink set API")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      cff3bd01
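      The idea that a depth limit doubles as loop detection can be sketched
      as follows. This is not the nf_tables code: struct fake_chain, the
      fixed two-target jump array, FAKE_JUMP_STACK_SIZE (set to 16 here as
      an assumption) and the -1 return value are all illustrative.

```c
#include <stddef.h>

#define FAKE_JUMP_STACK_SIZE 16   /* assumed depth limit, for illustration */
#define FAKE_MAX_JUMPS 2          /* hypothetical: at most two jump targets per chain */

/* Hypothetical chain: unused jump slots are NULL. */
struct fake_chain {
	struct fake_chain *jumps[FAKE_MAX_JUMPS];
};

/*
 * Walk all jump targets recursively. A cycle can never terminate on its
 * own, so it always reaches the depth limit and is reported as an error.
 * Returns 0 when the graph is acceptable and -1 when the limit is hit.
 */
static int fake_chain_validate(const struct fake_chain *chain, unsigned int level)
{
	if (level >= FAKE_JUMP_STACK_SIZE)
		return -1;

	for (size_t i = 0; i < FAKE_MAX_JUMPS; i++) {
		if (!chain->jumps[i])
			continue;
		if (fake_chain_validate(chain->jumps[i], level + 1) < 0)
			return -1;
	}
	return 0;
}
```

      Since the recursion depth is bounded by the limit, the walk itself
      cannot recurse without bound, unlike the removed loop-detection code
      the commit message refers to.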
    • Florian Westphal's avatar
      netfilter: nfnetlink_queue: drop bogus WARN_ON · 631a4b3d
      Florian Westphal authored
      Happens when rules get flushed/deleted while packet is out, so remove
      this WARN_ON.
      
      This WARN exists in one form or another since v4.14, no need to backport
      this to older releases, hence use a more recent fixes tag.
      
      Fixes: 3f801968 ("netfilter: move nf_reinject into nfnetlink_queue modules")
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Closes: https://lore.kernel.org/oe-lkp/202407081453.11ac0f63-lkp@intel.com
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      631a4b3d
    • Oleksij Rempel's avatar
      ethtool: netlink: do not return SQI value if link is down · c184cf94
      Oleksij Rempel authored
      Do not attach SQI value if link is down. "SQI values are only valid if
      link-up condition is present" per OpenAlliance specification of
      100Base-T1 Interoperability Test suite [1]. The same rule would apply
      for other link types.
      
      [1] https://opensig.org/automotive-ethernet-specifications/#
      
      Fixes: 80660219 ("ethtool: provide UAPI for PHY Signal Quality Index (SQI)")
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Woojung Huh <woojung.huh@microchip.com>
      Link: https://patch.msgid.link/20240709061943.729381-1-o.rempel@pengutronix.de
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      c184cf94
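      A minimal sketch of the rule above, reporting SQI only while the link
      is up. struct fake_phy and fake_get_sqi() are hypothetical names; the
      real change lives in the ethtool netlink reply code, which is not
      reproduced here.

```c
#include <stdbool.h>

/* Hypothetical PHY state for illustration. */
struct fake_phy {
	bool link_up;
	int sqi;        /* last signal quality index read from the PHY */
};

/*
 * Store the SQI in *out and return true only when the link is up.
 * When the link is down the value is not valid, so nothing is reported
 * and *out is left untouched.
 */
static bool fake_get_sqi(const struct fake_phy *phy, int *out)
{
	if (!phy->link_up)
		return false;
	*out = phy->sqi;
	return true;
}
```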
    • Dmitry Antipov's avatar
      ppp: reject claimed-as-LCP but actually malformed packets · f2aeb730
      Dmitry Antipov authored
      Since 'ppp_async_encode()' assumes valid LCP packets (with code
      from 1 to 7 inclusive), add 'ppp_check_packet()' to ensure that
      LCP packet has an actual body beyond PPP_LCP header bytes, and
      reject claimed-as-LCP but actually malformed data otherwise.
      
      Reported-by: syzbot+ec0723ba9605678b14bf@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=ec0723ba9605678b14bf
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      f2aeb730
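      The check described above can be sketched in userspace C. The function
      below is an illustration, not the kernel's ppp_check_packet(): the
      FAKE_* constants assume a 2-byte protocol field followed by a 4-byte
      LCP header (code, identifier, 16-bit length), and the extra guard for
      buffers shorter than the protocol field is an addition of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_PPP_LCP     0xc021u  /* PPP protocol number for LCP */
#define FAKE_PROTO_LEN   2u       /* assumed size of the protocol field */
#define FAKE_LCP_HDRLEN  4u       /* assumed LCP header: code, id, 16-bit length */

/*
 * Accept every non-LCP packet. Accept an LCP packet only when the buffer
 * is long enough to hold the protocol field plus a complete LCP header.
 */
static bool fake_ppp_check_packet(const uint8_t *data, size_t count)
{
	uint16_t proto;

	if (count < FAKE_PROTO_LEN)   /* sketch-only guard: no protocol field */
		return false;

	proto = (uint16_t)((data[0] << 8) | data[1]);   /* big-endian read */
	return proto != FAKE_PPP_LCP ||
	       count >= FAKE_PROTO_LEN + FAKE_LCP_HDRLEN;
}
```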
    • Kumar Kartikeya Dwivedi's avatar
      selftests/bpf: Add timer lockup selftest · 50bd5a0c
      Kumar Kartikeya Dwivedi authored
      Add a selftest that tries to trigger a situation where two timer callbacks
      are attempting to cancel each other's timer. By running them continuously,
      we hit a condition where both run in parallel and cancel each other.
      
      Without the fix in the previous patch, this would cause a lockup as
      hrtimer_cancel on either side will wait for forward progress from the
      callback.
      
      Ensure that this situation leads to an EDEADLK error.
      
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20240711052709.2148616-1-memxor@gmail.com
      50bd5a0c
    • Jian Hui Lee's avatar
      net: ethernet: mtk-star-emac: set mac_managed_pm when probing · 8c6790b5
      Jian Hui Lee authored
      Commit 744d23c7 ("net: phy: Warn about incorrect mdio_bus_phy_resume()
      state") introduced a warning message that is printed when the phy state
      is not one of PHY_HALTED, PHY_READY, or PHY_UP.
      
      mtk-star-emac doesn't need mdiobus suspend/resume. To fix the warning
      message during resume, indicate the phy resume/suspend is managed by the
      mac when probing.
      
      Fixes: 744d23c7 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
      Signed-off-by: Jian Hui Lee <jianhui.lee@canonical.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Link: https://patch.msgid.link/20240708065210.4178980-1-jianhui.lee@canonical.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      8c6790b5
    • Vitaly Lifshits's avatar
      e1000e: fix force smbus during suspend flow · 76a0a3f9
      Vitaly Lifshits authored
      Commit 861e8086 ("e1000e: move force SMBUS from enable ulp function
      to avoid PHY loss issue") resolved a PHY access loss during suspend on
      Meteor Lake consumer platforms, but it affected corporate systems
      incorrectly.
      
      A better fix, working for both consumer and corporate systems, was
      proposed in commit bfd546a5 ("e1000e: move force SMBUS near the end
      of enable_ulp function"). However, it introduced a regression on older
      devices, such as [8086:15B8], [8086:15F9], [8086:15BE].
      
      This patch aims to fix the secondary regression, by limiting the scope of
      the changes to Meteor Lake platforms only.
      
      Fixes: bfd546a5 ("e1000e: move force SMBUS near the end of enable_ulp function")
      Reported-by: Todd Brandt <todd.e.brandt@intel.com>
      Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218940
      Reported-by: Dieter Mummenschanz <dmummenschanz@web.de>
      Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218936
      Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
      Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> (A Contingent Worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20240709203123.2103296-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      76a0a3f9
    • Eric Dumazet's avatar
      tcp: avoid too many retransmit packets · 97a90635
      Eric Dumazet authored
      If a TCP socket is using TCP_USER_TIMEOUT, and the other peer
      retracted its window to zero, tcp_retransmit_timer() can
      retransmit a packet every two jiffies (2 ms for HZ=1000),
      for about 4 minutes after TCP_USER_TIMEOUT has 'expired'.
      
      The fix is to make sure tcp_rtx_probe0_timed_out() takes
      icsk->icsk_user_timeout into account.
      
      Before blamed commit, the socket would not timeout after
      icsk->icsk_user_timeout, but would use standard exponential
      backoff for the retransmits.
      
      Also worth noting that before commit e89688e3 ("net: tcp:
      fix unexcepted socket die when snd_wnd is 0"), the issue
      would last 2 minutes instead of 4.
      
      Fixes: b701a99e ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
      Reviewed-by: Jon Maxwell <jmaxwell37@gmail.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://patch.msgid.link/20240710001402.2758273-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      97a90635
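      One way a zero-window timeout check could take the user timeout into
      account is sketched below. This is an assumption-laden illustration,
      not the kernel function: fake_probe0_timed_out(), its millisecond
      parameters and FAKE_RTO_MAX_MS (120 s, so the default limit of twice
      that matches the "about 4 minutes" mentioned above) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define FAKE_RTO_MAX_MS 120000u   /* assumed maximum RTO of 120 seconds */

/*
 * rtx_delta_ms:    time since the first transmission of the head packet
 * rcv_delta_ms:    time since anything was last received from the peer
 * user_timeout_ms: the TCP_USER_TIMEOUT value, 0 when unset
 */
static bool fake_probe0_timed_out(uint32_t rtx_delta_ms,
				  uint32_t rcv_delta_ms,
				  uint32_t user_timeout_ms)
{
	uint32_t timeout = 2 * FAKE_RTO_MAX_MS;

	if (user_timeout_ms) {
		/* The application's limit is exceeded: give up immediately. */
		if (rtx_delta_ms > user_timeout_ms)
			return true;
		/* Otherwise never wait longer than the user timeout. */
		if (user_timeout_ms < timeout)
			timeout = user_timeout_ms;
	}

	/* The peer was heard from recently: keep probing. */
	if (rcv_delta_ms <= timeout)
		return false;

	return rtx_delta_ms > timeout;
}
```

      In this sketch, zero-window packets from the peer no longer keep the
      connection alive once the user timeout has passed.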
  3. 10 Jul, 2024 8 commits
    • Alexei Starovoitov's avatar
      Merge branch 'fixes-for-bpf-timer-lockup-and-uaf' · 0c237341
      Alexei Starovoitov authored
      Kumar Kartikeya Dwivedi says:
      
      ====================
      Fixes for BPF timer lockup and UAF
      
      The following patches contain fixes for timer lockups and a
      use-after-free scenario.
      
      This set proposes to fix the following lockup situation for BPF timers.
      
      CPU 1					CPU 2
      
      bpf_timer_cb				bpf_timer_cb
        timer_cb1				  timer_cb2
          bpf_timer_cancel(timer_cb2)		    bpf_timer_cancel(timer_cb1)
            hrtimer_cancel			      hrtimer_cancel
      
      In this case, both callbacks will continue waiting for each other to
      finish synchronously, causing a lockup.
      
      The proposed fix adds support for tracking in-flight cancellations
      *begun by other timer callbacks* for a particular BPF timer.  Whenever
      preparing to call hrtimer_cancel, a callback will increment the target
      timer's counter, then inspect its in-flight cancellations, and if
      non-zero, return -EDEADLK to avoid situations where the target timer's
      callback is waiting for its completion.
      
      This does mean that in cases where a callback is fired and cancelled, it
      will be unable to cancel any timers in that execution. This can be
      alleviated by maintaining the list of waiting callbacks in bpf_hrtimer
      and searching through it to avoid interdependencies, but this may
      introduce additional delays in bpf_timer_cancel, in addition to
      requiring extra state at runtime which may need to be allocated or
      reused from bpf_hrtimer storage. Moreover, extra synchronization is
      needed to delete these elements from the list of waiting callbacks once
      hrtimer_cancel has finished.
      
      The second patch is for a deadlock situation similar to above in
      bpf_timer_cancel_and_free, but also a UAF scenario that can occur if
      timer is armed before entering it, if hrtimer_running check causes the
      hrtimer_cancel call to be skipped.
      
      As seen above, synchronous hrtimer_cancel would lead to deadlock (if
      same callback tries to free its timer, or two timers free each other),
      therefore we queue work onto the global workqueue to ensure outstanding
      timers are cancelled before bpf_hrtimer state is freed.
      
      Further details are in the patches.
      ====================
      
      Link: https://lore.kernel.org/r/20240709185440.1104957-1-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      0c237341
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Defer work in bpf_timer_cancel_and_free · a6fcd19d
      Kumar Kartikeya Dwivedi authored
      Currently, the same case as previous patch (two timer callbacks trying
      to cancel each other) can be invoked through bpf_map_update_elem as
      well, or more precisely, freeing map elements containing timers. Since
      this relies on hrtimer_cancel as well, it is prone to the same deadlock
      situation as the previous patch.
      
      It would be sufficient to use hrtimer_try_to_cancel to fix this problem,
      as the timer cannot be enqueued after async_cancel_and_free. Once
      async_cancel_and_free has been done, the timer must be reinitialized
      before it can be armed again. The callback running in parallel trying to
      arm the timer will fail, and freeing bpf_hrtimer without waiting is
      sufficient (given kfree_rcu), and bpf_timer_cb will return
      HRTIMER_NORESTART, preventing the timer from being rearmed again.
      
      However, there exists a UAF scenario where the callback arms the timer
      before entering this function and cancellation then fails (due to the
      timer callback invoking this routine, or the target timer callback
      running concurrently). In such a case, if the timer expiration is
      significantly far in the future, the RCU grace period expiring before
      it will free the bpf_hrtimer state and, along with it, the struct
      hrtimer that is still enqueued.
      
      Hence, it is clear cancellation needs to occur after
      async_cancel_and_free, and yet it cannot be done inline due to deadlock
      issues. We thus modify bpf_timer_cancel_and_free to defer work to the
      global workqueue, adding a work_struct alongside rcu_head (both used at
      _different_ points of time, so can share space).
      
      Update existing code comments to reflect the new state of affairs.
      
      Fixes: b00628b1 ("bpf: Introduce bpf timers.")
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20240709185440.1104957-3-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      a6fcd19d
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Fail bpf_timer_cancel when callback is being cancelled · d4523831
      Kumar Kartikeya Dwivedi authored
      Given a schedule:
      
      timer1 cb			timer2 cb
      
      bpf_timer_cancel(timer2);	bpf_timer_cancel(timer1);
      
      Both bpf_timer_cancel calls would wait for the other callback to finish
      executing, introducing a lockup.
      
      Add an atomic_t count named 'cancelling' in bpf_hrtimer. This keeps
      track of all in-flight cancellation requests for a given BPF timer.
      Whenever cancelling a BPF timer, we must check if we have outstanding
      cancellation requests, and if so, we must fail the operation with an
      error (-EDEADLK) since cancellation is synchronous and waits for the
      callback to finish executing. This implies that we can enter a deadlock
      situation involving two or more timer callbacks executing in parallel
      and attempting to cancel one another.
      
      Note that we avoid incrementing the cancelling counter for the target
      timer (the one being cancelled) if bpf_timer_cancel is not invoked from
      a callback, to avoid spurious errors. The whole point of detecting
      cur->cancelling and returning -EDEADLK is to not enter a busy wait loop
      (which may or may not lead to a lockup). This does not apply in case the
      caller is in a non-callback context, the other side can continue to
      cancel as it sees fit without running into errors.
      
      Background on prior attempts:
      
      Earlier versions of this patch used a bool 'cancelling' bit and used the
      following pattern under timer->lock to publish cancellation status.
      
      lock(t->lock);
      t->cancelling = true;
      mb();
      if (cur->cancelling)
      	return -EDEADLK;
      unlock(t->lock);
      hrtimer_cancel(t->timer);
      t->cancelling = false;
      
      The store outside the critical section could overwrite a parallel
      request's t->cancelling assignment to true, an assignment made to ensure
      that the callback executing in parallel observes its cancellation status.
      
      It would be necessary to clear this cancelling bit once hrtimer_cancel
      is done, but lack of serialization introduced races. Another option was
      explored where bpf_timer_start would clear the bit when (re)starting the
      timer under timer->lock. This would ensure serialized access to the
      cancelling bit, but may allow it to be cleared before in-flight
      hrtimer_cancel has finished executing, such that lockups can occur
      again.
      
      Thus, we choose an atomic counter to keep track of all outstanding
      cancellation requests and use it to prevent lockups in case callbacks
      attempt to cancel each other while executing in parallel.
      
      Reported-by: Dohyun Kim <dohyunkim@google.com>
      Reported-by: Neel Natu <neelnatu@google.com>
      Fixes: b00628b1 ("bpf: Introduce bpf timers.")
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20240709185440.1104957-2-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      d4523831
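      The counter-based scheme described above can be sketched with C11
      atomics. This is an illustration under assumptions, not the BPF
      implementation: struct fake_timer and the fake_cancel_begin() and
      fake_cancel_end() pair are hypothetical, and the blocking
      hrtimer_cancel() step is only indicated by the gap between the two.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical timer: counts in-flight cancellations that target it. */
struct fake_timer {
	atomic_int cancelling;
};

/*
 * cur is the timer whose callback is calling (NULL outside a callback),
 * t is the timer to cancel. On success the caller would go on to do the
 * synchronous cancel and then call fake_cancel_end().
 */
static int fake_cancel_begin(struct fake_timer *cur, struct fake_timer *t)
{
	if (!cur)
		return 0;   /* non-callback context: nothing to publish */

	/* Publish our intent first, then check whether we are a target. */
	atomic_fetch_add(&t->cancelling, 1);
	if (atomic_load(&cur->cancelling)) {
		/* Someone is waiting for our callback: back out, do not wait. */
		atomic_fetch_sub(&t->cancelling, 1);
		return -EDEADLK;
	}
	return 0;
}

/* Drop the in-flight marker once the synchronous cancel has returned. */
static void fake_cancel_end(struct fake_timer *cur, struct fake_timer *t)
{
	if (cur)
		atomic_fetch_sub(&t->cancelling, 1);
}
```

      If two callbacks try to cancel each other, whichever one checks after
      the other has incremented its counter sees a non-zero value and
      returns -EDEADLK instead of waiting.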
    • Mohammad Shehar Yaar Tausif's avatar
      bpf: fix order of args in call to bpf_map_kvcalloc · af253aef
      Mohammad Shehar Yaar Tausif authored
      The original function call passed size of smap->bucket before the number of
      buckets which raises the error 'calloc-transposed-args' on compilation.
      
      Vlastimil Babka added:
      
      The order of parameters can be traced back all the way to 6ac99e8f
      ("bpf: Introduce bpf sk local storage") across several refactorings,
      and that's why the commit is used as a Fixes: tag.
      
      In v6.10-rc1, a different commit 2c321f3f ("mm: change inlined
      allocation helpers to account at the call site") however exposed the
      order of args in a way that gcc-14 has enough visibility to start
      warning about it, because (in !CONFIG_MEMCG case) bpf_map_kvcalloc is
      then a macro alias for kvcalloc instead of a static inline wrapper.
      
      To sum up, the warning happens when the following conditions are all met:
      
      - gcc-14 is used (didn't see it with gcc-13)
      - commit 2c321f3f is present
      - CONFIG_MEMCG is not enabled in .config
      - CONFIG_WERROR turns this from a compiler warning to error
      
      Fixes: 6ac99e8f ("bpf: Introduce bpf sk local storage")
      Reviewed-by: Andrii Nakryiko <andrii@kernel.org>
      Tested-by: Christian Kujau <lists@nerdbynature.de>
      Signed-off-by: Mohammad Shehar Yaar Tausif <sheharyaar48@gmail.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Link: https://lore.kernel.org/r/20240710100521.15061-2-vbabka@suse.cz
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      af253aef
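      The argument order in question can be shown with the standard calloc(),
      which takes the element count first and the element size second. The
      sketch below is a userspace analogy: struct fake_bucket and
      fake_alloc_buckets() are hypothetical and stand in for the bucket
      array allocated through bpf_map_kvcalloc.

```c
#include <stdlib.h>

/* Hypothetical bucket layout, for illustration only. */
struct fake_bucket {
	void *head;
	int lock;
};

/*
 * Allocate a zeroed array of nbuckets buckets.
 * Correct order: number of elements first, size of one element second.
 * Writing calloc(sizeof(struct fake_bucket), nbuckets) would allocate the
 * same number of bytes, but is the transposed form that gcc-14 reports
 * as 'calloc-transposed-args'.
 */
static struct fake_bucket *fake_alloc_buckets(size_t nbuckets)
{
	return calloc(nbuckets, sizeof(struct fake_bucket));
}
```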
    • Linus Torvalds's avatar
      Merge tag 'mm-hotfixes-stable-2024-07-10-13-19' of... · 9d9a2f29
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2024-07-10-13-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull misc fixes from Andrew Morton:
       "21 hotfixes, 15 of which are cc:stable.
      
        No identifiable theme here - all are singleton patches, 19 are for MM"
      
      * tag 'mm-hotfixes-stable-2024-07-10-13-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
        mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio
        mm/hugetlb: fix potential race in __update_and_free_hugetlb_folio()
        filemap: replace pte_offset_map() with pte_offset_map_nolock()
        arch/xtensa: always_inline get_current() and current_thread_info()
        sched.h: always_inline alloc_tag_{save|restore} to fix modpost warnings
        MAINTAINERS: mailmap: update Lorenzo Stoakes's email address
        mm: fix crashes from deferred split racing folio migration
        lib/build_OID_registry: avoid non-destructive substitution for Perl < 5.13.2 compat
        mm: gup: stop abusing try_grab_folio
        nilfs2: fix kernel bug on rename operation of broken directory
        mm/hugetlb_vmemmap: fix race with speculative PFN walkers
        cachestat: do not flush stats in recency check
        mm/shmem: disable PMD-sized page cache if needed
        mm/filemap: skip to create PMD-sized page cache if needed
        mm/readahead: limit page cache size in page_cache_ra_order()
        mm/filemap: make MAX_PAGECACHE_ORDER acceptable to xarray
        mm/damon/core: merge regions aggressively when max_nr_regions is unmet
        Fix userfaultfd_api to return EINVAL as expected
        mm: vmalloc: check if a hash-index is in cpu_possible_mask
        mm: prevent derefencing NULL ptr in pfn_section_valid()
        ...
      9d9a2f29
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · ef2b7eb5
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "One core change that moves a disk start message to a location where it
        will only be printed once instead of twice plus a couple of error
        handling race fixes in the ufs driver"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: sd: Do not repeat the starting disk message
        scsi: ufs: core: Fix ufshcd_abort_one racing issue
        scsi: ufs: core: Fix ufshcd_clear_cmd racing issue
      ef2b7eb5
    • Linus Torvalds's avatar
      Merge tag 'vfio-v6.10' of https://github.com/awilliam/linux-vfio · d6e1712b
      Linus Torvalds authored
      Pull VFIO fix from Alex Williamson:
      
       - Recent stable backports are exposing a bug introduced in the v6.10
         development cycle where a counter value is uninitialized.  This leads
         to regressions in userspace drivers like QEMU where the kernel
         might ask for an arbitrary buffer size or return out of memory itself
         based on a bogus value.  Zero initialize the counter.  (Yi Liu)
      
      * tag 'vfio-v6.10' of https://github.com/awilliam/linux-vfio:
        vfio/pci: Init the count variable in collecting hot-reset devices
      d6e1712b
    • Linus Torvalds's avatar
      Merge tag 'bcachefs-2024-07-10' of https://evilpiepirate.org/git/bcachefs · f6963ab4
      Linus Torvalds authored
      Pull bcachefs fixes from Kent Overstreet:
      
       - Switch some asserts to WARN()
      
       - Fix a few "transaction not locked" asserts in the data read retry
         paths and backpointers gc
      
       - Fix a race that would cause the journal to get stuck on a flush
         commit
      
       - Add missing fsck checks for the fragmentation LRU
      
       - The usual assorted syzbot fixes
      
      * tag 'bcachefs-2024-07-10' of https://evilpiepirate.org/git/bcachefs: (22 commits)
        bcachefs: Add missing bch2_trans_begin()
        bcachefs: Fix missing error check in journal_entry_btree_keys_validate()
        bcachefs: Warn on attempting a move with no replicas
        bcachefs: bch2_data_update_to_text()
        bcachefs: Log mount failure error code
        bcachefs: Fix undefined behaviour in eytzinger1_first()
        bcachefs: Mark bch_inode_info as SLAB_ACCOUNT
        bcachefs: Fix bch2_inode_insert() race path for tmpfiles
        closures: fix closure_sync + closure debugging
        bcachefs: Fix journal getting stuck on a flush commit
        bcachefs: io clock: run timer fns under clock lock
        bcachefs: Repair fragmentation_lru in alloc_write_key()
        bcachefs: add check for missing fragmentation in check_alloc_to_lru_ref()
        bcachefs: bch2_btree_write_buffer_maybe_flush()
        bcachefs: Add missing printbuf_tabstops_reset() calls
        bcachefs: Fix loop restart in bch2_btree_transactions_read()
        bcachefs: Fix bch2_read_retry_nodecode()
        bcachefs: Don't use the new_fs() bucket alloc path on an initialized fs
        bcachefs: Fix shift greater than integer size
        bcachefs: Change bch2_fs_journal_stop() BUG_ON() to warning
        ...
      f6963ab4