1. 26 Jul, 2024 13 commits
    • Linus Torvalds's avatar
      Merge tag 'drm-next-2024-07-26' of https://gitlab.freedesktop.org/drm/kernel · 0ba9b155
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Fixes for rc1, mostly amdgpu, i915 and xe, with some other misc ones,
        doesn't seem to be anything too serious.
      
        amdgpu:
         - Bump driver version for GFX12 DCC
         - DC documentation warning fixes
         - VCN unified queue power fix
         - SMU fix
         - RAS fix
         - Display corruption fix
         - SDMA 5.2 workaround
         - GFX12 fixes
         - Uninitialized variable fix
         - VCN/JPEG 4.0.3 fixes
         - Misc display fixes
         - RAS fixes
         - VCN4/5 harvest fix
         - GPU reset fix
      
        i915:
         - Reset intel_dp->link_trained before retraining the link
         - Don't switch the LTTPR mode on an active link
         - Do not consider preemption during execlists_dequeue for gen8
         - Allow NULL memory region
      
        xe:
         - xe_exec ioctl minor fix on sync entry cleanup upon error
         - SRIOV: limit VF LMEM provisioning
         - Wedge mode fixes
      
        v3d:
         - fix indirect dispatch on newer v3d revs
      
        panel:
         - fix panel backlight bindings"
      
      * tag 'drm-next-2024-07-26' of https://gitlab.freedesktop.org/drm/kernel: (39 commits)
        drm/amdgpu: reset vm state machine after gpu reset(vram lost)
        drm/amdgpu: add missed harvest check for VCN IP v4/v5
        drm/amdgpu: Fix eeprom max record count
        drm/amdgpu: fix ras UE error injection failure issue
        drm/amd/display: Remove ASSERT if significance is zero in math_ceil2
        drm/amd/display: Check for NULL pointer
        drm/amdgpu/vcn: Use offsets local to VCN/JPEG in VF
        drm/amdgpu: Add empty HDP flush function to VCN v4.0.3
        drm/amdgpu: Add empty HDP flush function to JPEG v4.0.3
        drm/amd/amdgpu: Fix uninitialized variable warnings
        drm/amdgpu: Fix atomics on GFX12
        drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell
        drm/i915: Allow NULL memory region
        drm/i915/gt: Do not consider preemption during execlists_dequeue for gen8
        dt-bindings: display: panel: samsung,atna33xc20: Document ATNA45AF01
        drm/xe: Don't suspend device upon wedge
        drm/xe: Wedge the entire device
        drm/xe/pf: Limit fair VF LMEM provisioning
        drm/xe/exec: Fix minor bug related to xe_sync_entry_cleanup
        drm/amd/display: fix corruption with high refresh rates on DCN 3.0
        ...
      0ba9b155
    • Linus Torvalds's avatar
      Merge tag 's390-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 65ad409e
      Linus Torvalds authored
      Pull more s390 updates from Vasily Gorbik:
      
       - Fix KMSAN build breakage caused by the conflict between s390 and
         mm-stable trees
      
       - Add KMSAN page markers for ptdump
      
       - Add runtime constant support
      
       - Fix __pa/__va for modules under non-GPL licenses by exporting
         necessary vm_layout struct with EXPORT_SYMBOL to prevent linkage
         problems
      
       - Fix an endless loop in the CF_DIAG event stop in the CPU Measurement
         Counter Facility code when the counter set size is zero
      
       - Remove the PROTECTED_VIRTUALIZATION_GUEST config option and enable
         its functionality by default
      
       - Support allocation of multiple MSI interrupts per device and improve
         logging of architecture-specific limitations
      
       - Add support for lowcore relocation as a debugging feature to catch
         all null ptr dereferences in the kernel address space, improving
         detection beyond the current implementation's limited write access
         protection
      
       - Clean up and rework CPU alternatives to allow for callbacks and early
         patching for the lowcore relocation
      
      * tag 's390-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits)
        s390: Remove protvirt and kvm config guards for uv code
        s390/boot: Add cmdline option to relocate lowcore
        s390/kdump: Make kdump ready for lowcore relocation
        s390/entry: Make system_call() ready for lowcore relocation
        s390/entry: Make ret_from_fork() ready for lowcore relocation
        s390/entry: Make __switch_to() ready for lowcore relocation
        s390/entry: Make restart_int_handler() ready for lowcore relocation
        s390/entry: Make mchk_int_handler() ready for lowcore relocation
        s390/entry: Make int handlers ready for lowcore relocation
        s390/entry: Make pgm_check_handler() ready for lowcore relocation
        s390/entry: Add base register to CHECK_VMAP_STACK/CHECK_STACK macro
        s390/entry: Add base register to SIEEXIT macro
        s390/entry: Add base register to MBEAR macro
        s390/entry: Make __sie64a() ready for lowcore relocation
        s390/head64: Make startup code ready for lowcore relocation
        s390: Add infrastructure to patch lowcore accesses
        s390/atomic_ops: Disable flag outputs constraint for GCC versions below 14.2.0
        s390/entry: Move SIE indicator flag to thread info
        s390/nmi: Simplify ptregs setup
        s390/alternatives: Remove alternative facility list
        ...
      65ad409e
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · a6294b5b
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "The usual summary below, but the main fix is for the fast GUP lockless
        page-table walk when we have a combination of compile-time and
        run-time folding of the p4d and the pud respectively.
      
         - Remove some redundant Kconfig conditionals
      
         - Fix string output in ptrace selftest
      
         - Fix fast GUP crashes in some page-table configurations
      
         - Remove obsolete linker option when building the vDSO
      
         - Fix some sysreg field definitions for the GIC"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: mm: Fix lockless walks with static and dynamic page-table folding
        arm64/sysreg: Correct the values for GICv4.1
        arm64/vdso: Remove --hash-style=sysv
        kselftest: missing arg in ptrace.c
        arm64/Kconfig: Remove redundant 'if HAVE_FUNCTION_GRAPH_TRACER'
        arm64: remove redundant 'if HAVE_ARCH_KASAN' in Kconfig
      a6294b5b
    • Linus Torvalds's avatar
      Merge tag 'ceph-for-6.11-rc1' of https://github.com/ceph/ceph-client · 6467dfdf
      Linus Torvalds authored
      Pull ceph updates from Ilya Dryomov:
       "A small patchset to address bogus I/O errors and ultimately an
        assertion failure in the face of watch errors with -o exclusive
        mappings in RBD marked for stable and some assorted CephFS fixes"
      
      * tag 'ceph-for-6.11-rc1' of https://github.com/ceph/ceph-client:
        rbd: don't assume rbd_is_lock_owner() for exclusive mappings
        rbd: don't assume RBD_LOCK_STATE_LOCKED for exclusive mappings
        rbd: rename RBD_LOCK_STATE_RELEASING and releasing_wait
        ceph: fix incorrect kmalloc size of pagevec mempool
        ceph: periodically flush the cap releases
        ceph: convert comma to semicolon in __ceph_dentry_dir_lease_touch()
        ceph: use cap_wait_list only if debugfs is enabled
      6467dfdf
    • Linus Torvalds's avatar
      Merge tag 'erofs-for-6.11-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · 732c2753
      Linus Torvalds authored
      Pull more erofs updates from Gao Xiang:
      
       - Support STATX_DIOALIGN and FS_IOC_GETFSSYSFSPATH
      
       - Fix a race of LZ4 decompression due to recent refactoring
      
       - Another multi-page folio adaption in erofs_bread()
      
      * tag 'erofs-for-6.11-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
        erofs: convert comma to semicolon
        erofs: support multi-page folios for erofs_bread()
        erofs: add support for FS_IOC_GETFSSYSFSPATH
        erofs: fix race in z_erofs_get_gbuf()
        erofs: support STATX_DIOALIGN
      732c2753
    • Linus Torvalds's avatar
      Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · dd90ad50
      Linus Torvalds authored
      Pull struct file leak fixes from Al Viro:
       "a couple of leaks on failure exits missing fdput()"
      
      * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        lirc: rc_dev_get_from_fd(): fix file leak
        powerpc: fix a file leak in kvm_vcpu_ioctl_enable_cap()
      dd90ad50
    • Linus Torvalds's avatar
      arm64: allow installing compressed image by default · 4c7be57f
      Linus Torvalds authored
      On arm64 we build compressed images, but "make install" by default will
      install the old non-compressed one.  To actually get the compressed
      image installed, you need to use "make zinstall", which is not the usual
      way to install a kernel.
      
      Which may not sound like much of an issue, but when you deal with
      multiple architectures (and years of your fingers knowing the regular
      "make install" incantation), this inconsistency is pretty annoying.
      
      But as Will Deacon says:
       "Sadly, bootloaders being as top quality as you might expect, I don't
        think we're in a position to rely on decompressor support across the
        board. Our Image.gz is literally just that -- we don't have a built-in
        decompressor (nor do I think we want to rush into that again after the
        fun we had on arm32) and the recent EFI zboot support solves that
        problem for platforms using EFI.
      
        Changing the default 'install' target terrifies me. There are bound to
        be folks with embedded boards who've scripted this and we could really
        ruin their day if we quietly give them a compressed kernel that their
        bootloader doesn't know how to handle :/"
      
      So make this conditional on a new "COMPRESSED_INSTALL" option.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Will Deacon <will@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4c7be57f
    • Linus Torvalds's avatar
      Merge tag 'bitmap-6.11-rc1' of https://github.com:/norov/linux · 51c47675
      Linus Torvalds authored
      Pull bitmap updates from Yury Norov:
       "Random fixes"
      
      * tag 'bitmap-6.11-rc1' of https://github.com:/norov/linux:
        riscv: Remove unnecessary int cast in variable_fls()
        radix tree test suite: put definition of bitmap_clear() into lib/bitmap.c
        bitops: Add a comment explaining the double underscore macros
        lib: bitmap: add missing MODULE_DESCRIPTION() macros
        cpumask: introduce assign_cpu() macro
      51c47675
    • Chen Ni's avatar
      erofs: convert comma to semicolon · 14e9283f
      Chen Ni authored
      Replace a comma between expression statements by a semicolon.
      Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
      Link: https://lore.kernel.org/r/20240724020721.2389738-1-nichen@iscas.ac.cn
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
      14e9283f
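A minimal sketch of why an accidental comma between expression statements is worth cleaning up. This is not the erofs code touched by the commit; the function names `with_comma` and `with_semicolon` are made up for illustration. The comma operator fuses two expressions into a single statement, so under an unbraced `if` both assignments become conditional, while the semicolon form ends the conditional statement after the first assignment.

```c
/* Hypothetical illustration: comma operator versus statement separator. */

/* Both assignments form ONE expression statement controlled by the if. */
void with_comma(int cond, int *x, int *y)
{
    if (cond)
        *x = 1, *y = 2;
}

/* Only the first assignment is controlled by the if. */
void with_semicolon(int cond, int *x, int *y)
{
    if (cond)
        *x = 1;
    *y = 2;
}
```

When the two expressions are plain consecutive statements, as the commit message describes, the comma and the semicolon behave the same, which is why such a change is a pure cleanup.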
    • Gao Xiang's avatar
      erofs: support multi-page folios for erofs_bread() · 5d3bb77e
      Gao Xiang authored
      If the requested page is part of the previous multi-page folio, there
      is no need to call read_mapping_folio() again.
      
      Also, get rid of the remaining one of page->index [1] in our codebase.
      
      [1] https://lore.kernel.org/r/Zp8fgUSIBGQ1TN0D@casper.infradead.org
      
      Cc: Matthew Wilcox <willy@infradead.org>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20240723073024.875290-1-hsiangkao@linux.alibaba.com
      5d3bb77e
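The reuse check described in this commit can be sketched in userspace C. Everything here is a simplified stand-in: `struct sketch_folio`, `struct sketch_buf`, and `sketch_bread()` are hypothetical names, and the "read" below only pretends that folios are four pages long and aligned, in place of the kernel's read_mapping_folio().

```c
#include <stdbool.h>

/* Hypothetical stand-in for a multi-page folio: first page index and length. */
struct sketch_folio {
    unsigned long index;
    unsigned long nr_pages;
};

/* Hypothetical stand-in for the buffer that remembers the last folio. */
struct sketch_buf {
    struct sketch_folio folio;
    bool valid;
    unsigned int reads;   /* counts how often a real read was needed */
};

/* True if page `index` lies inside the folio's page range. */
bool sketch_folio_covers(const struct sketch_folio *folio, unsigned long index)
{
    return index >= folio->index && index - folio->index < folio->nr_pages;
}

void sketch_bread(struct sketch_buf *buf, unsigned long index)
{
    /* Requested page is part of the previous folio: nothing to read. */
    if (buf->valid && sketch_folio_covers(&buf->folio, index))
        return;

    /* Assumption for the sketch: a read yields an aligned 4-page folio. */
    buf->folio.index = index & ~3UL;
    buf->folio.nr_pages = 4;
    buf->valid = true;
    buf->reads++;
}
```

With this model, asking for pages 5 and then 6 costs one read, because both fall inside the folio covering pages 4 to 7.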
    • Huang Xiaojia's avatar
      erofs: add support for FS_IOC_GETFSSYSFSPATH · 684b290a
      Huang Xiaojia authored
      The FS_IOC_GETFSSYSFSPATH ioctl exposes the /sys/fs path of a given
      filesystem, potentially standardizing sysfs reporting. This patch adds
      support for FS_IOC_GETFSSYSFSPATH to erofs: "erofs/<dev>" is output for
      bdev cases and "erofs/[domain_id,]<fs_id>" for fscache cases.
      Signed-off-by: Huang Xiaojia <huangxiaojia2@huawei.com>
      Link: https://lore.kernel.org/r/20240720082335.441563-1-huangxiaojia2@huawei.com
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
      684b290a
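A hedged userspace sketch of querying this ioctl. It assumes kernel UAPI headers that define `FS_IOC_GETFSSYSFSPATH` and `struct fs_sysfs_path` in `<linux/fs.h>`; on older headers the guarded branch simply reports failure. The helper name `sketch_get_fs_sysfs_path` is made up for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

/*
 * Copy the sysfs path component of the filesystem holding `path`
 * into `out`. Returns 0 on success, -1 on failure with errno set.
 */
int sketch_get_fs_sysfs_path(const char *path, char *out, size_t outlen)
{
#ifdef FS_IOC_GETFSSYSFSPATH
    struct fs_sysfs_path p;
    size_t n;
    int fd, ret, saved;

    if (outlen == 0) {
        errno = EINVAL;
        return -1;
    }
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    memset(&p, 0, sizeof(p));
    ret = ioctl(fd, FS_IOC_GETFSSYSFSPATH, &p);
    saved = errno;
    close(fd);
    if (ret != 0) {
        errno = saved;   /* e.g. the filesystem does not implement it */
        return -1;
    }

    /* Clamp the reported length to both buffers before copying. */
    n = p.len;
    if (n > sizeof(p.name))
        n = sizeof(p.name);
    if (n >= outlen)
        n = outlen - 1;
    memcpy(out, p.name, n);
    out[n] = '\0';
    return 0;
#else
    /* Headers predate the ioctl: report it as unsupported. */
    (void)path; (void)out; (void)outlen;
    errno = ENOTSUP;
    return -1;
#endif
}
```

On a kernel with this patch, calling the helper on a file inside a block-device erofs mount would be expected to fill `out` with a string of the form "erofs/<dev>", as described in the commit message.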
    • Gao Xiang's avatar
      erofs: fix race in z_erofs_get_gbuf() · 7dc5537c
      Gao Xiang authored
      In z_erofs_get_gbuf(), the current task may be migrated to another
      CPU between `z_erofs_gbuf_id()` and `spin_lock(&gbuf->lock)`.
      
      Therefore, z_erofs_put_gbuf() will trigger the following issue
      which was found by stress test:
      
      <2>[772156.434168] kernel BUG at fs/erofs/zutil.c:58!
      ..
      <4>[772156.435007]
      <4>[772156.439237] CPU: 0 PID: 3078 Comm: stress Kdump: loaded Tainted: G            E      6.10.0-rc7+ #2
      <4>[772156.439239] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 1.0.0 01/01/2017
      <4>[772156.439241] pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
      <4>[772156.439243] pc : z_erofs_put_gbuf+0x64/0x70 [erofs]
      <4>[772156.439252] lr : z_erofs_lz4_decompress+0x600/0x6a0 [erofs]
      ..
      <6>[772156.445958] stress (3127): drop_caches: 1
      <4>[772156.446120] Call trace:
      <4>[772156.446121]  z_erofs_put_gbuf+0x64/0x70 [erofs]
      <4>[772156.446761]  z_erofs_lz4_decompress+0x600/0x6a0 [erofs]
      <4>[772156.446897]  z_erofs_decompress_queue+0x740/0xa10 [erofs]
      <4>[772156.447036]  z_erofs_runqueue+0x428/0x8c0 [erofs]
      <4>[772156.447160]  z_erofs_readahead+0x224/0x390 [erofs]
      ..
      
      Fixes: f36f3010 ("erofs: rename per-CPU buffers to global buffer pool and make it configurable")
      Cc: <stable@vger.kernel.org> # 6.10+
      Reviewed-by: Chunhai Guo <guochunhai@vivo.com>
      Reviewed-by: Sandeep Dhavale <dhavale@google.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20240722035110.3456740-1-hsiangkao@linux.alibaba.com
      7dc5537c
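The race window can be modeled in plain userspace C. This is a simulation, not the erofs code: `cur_cpu`, `pinned`, `try_migrate()`, and the `sketch_*` functions are hypothetical stand-ins, and pinning the task before the lookup is shown as one possible way to close the window. Whether the actual patch does exactly that is an assumption here.

```c
#include <stdbool.h>

#define SKETCH_NR_BUFS 2

struct sketch_gbuf { bool locked; };

struct sketch_gbuf sketch_pool[SKETCH_NR_BUFS];
unsigned int cur_cpu;   /* simulated CPU the task is running on */
bool pinned;            /* simulated "migration disabled" state */

unsigned int sketch_gbuf_id(void)
{
    return cur_cpu % SKETCH_NR_BUFS;
}

/* Simulated scheduler: moves the task unless it is pinned. */
void try_migrate(unsigned int cpu)
{
    if (!pinned)
        cur_cpu = cpu;
}

/* Racy variant: migration may happen between the id lookup and the lock. */
struct sketch_gbuf *sketch_get_racy(unsigned int migrate_to)
{
    struct sketch_gbuf *g = &sketch_pool[sketch_gbuf_id()];

    try_migrate(migrate_to);   /* the race window */
    g->locked = true;
    return g;
}

/* Pinned variant: the task cannot move while the id is in use. */
struct sketch_gbuf *sketch_get_pinned(unsigned int migrate_to)
{
    struct sketch_gbuf *g;

    pinned = true;
    g = &sketch_pool[sketch_gbuf_id()];
    try_migrate(migrate_to);   /* ignored while pinned */
    g->locked = true;
    return g;
}

/*
 * The release side re-derives the buffer from the current CPU.
 * Returns false on a mismatch, which models the kernel BUG above.
 */
bool sketch_put(struct sketch_gbuf *g)
{
    bool ok = (&sketch_pool[sketch_gbuf_id()] == g);

    g->locked = false;
    pinned = false;
    return ok;
}
```

In this model, `sketch_get_racy(1)` started on CPU 0 locks buffer 0 but releases while running on CPU 1, so `sketch_put()` reports the mismatch; the pinned variant stays on CPU 0 throughout.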
    • Hongbo Li's avatar
      erofs: support STATX_DIOALIGN · 9c421ef3
      Hongbo Li authored
      Add support for STATX_DIOALIGN to EROFS, so that direct I/O
      alignment restrictions are exposed to userspace in a generic
      way.
      
      [Before]
      ```
      ./statx_test /mnt/erofs/testfile
      statx(/mnt/erofs/testfile) = 0
      dio mem align:0
      dio offset align:0
      ```
      
      [After]
      ```
      ./statx_test /mnt/erofs/testfile
      statx(/mnt/erofs/testfile) = 0
      dio mem align:512
      dio offset align:512
      ```
      Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20240718083243.2485437-1-hsiangkao@linux.alibaba.com
      9c421ef3
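The `statx_test` program used in this commit message is not shown, so the following is only a sketch of how such a query might look. It assumes a glibc and kernel-header combination that provides `statx()` and `STATX_DIOALIGN` (the latter appeared in Linux 6.1); the helper name `query_dio_align` is made up for illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>      /* AT_FDCWD */
#include <sys/stat.h>   /* statx(), struct statx */

/*
 * Fetch the direct I/O alignment restrictions for `path`.
 * Returns 0 on success, -1 on failure with errno set.
 */
int query_dio_align(const char *path, unsigned int *mem_align,
                    unsigned int *offset_align)
{
#ifdef STATX_DIOALIGN
    struct statx stx;

    if (statx(AT_FDCWD, path, 0, STATX_DIOALIGN, &stx) != 0)
        return -1;

    /* The filesystem may not report the field at all. */
    if (!(stx.stx_mask & STATX_DIOALIGN)) {
        errno = ENOTSUP;
        return -1;
    }

    *mem_align = stx.stx_dio_mem_align;
    *offset_align = stx.stx_dio_offset_align;
    return 0;
#else
    /* Headers predate STATX_DIOALIGN: report it as unsupported. */
    (void)path; (void)mem_align; (void)offset_align;
    errno = ENOTSUP;
    return -1;
#endif
}
```

A caller would print the two values on success. The sketch treats a missing `STATX_DIOALIGN` bit in `stx_mask` as "not reported", which is a stricter reading than the [Before] output, where zeros are printed.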
  2. 25 Jul, 2024 27 commits
    • Dave Airlie's avatar
      Merge tag 'amd-drm-fixes-6.11-2024-07-25' of... · d4ef5d2b
      Dave Airlie authored
      Merge tag 'amd-drm-fixes-6.11-2024-07-25' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
      
      amd-drm-fixes-6.11-2024-07-25:
      
      amdgpu:
      - SDMA 5.2 workaround
      - GFX12 fixes
      - Uninitialized variable fix
      - VCN/JPEG 4.0.3 fixes
      - Misc display fixes
      - RAS fixes
      - VCN4/5 harvest fix
      - GPU reset fix
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Alex Deucher <alexander.deucher@amd.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240725202900.2155572-1-alexander.deucher@amd.com
      d4ef5d2b
    • Dave Airlie's avatar
      Merge tag 'drm-misc-next-fixes-2024-07-25' of... · 86f259cb
      Dave Airlie authored
      Merge tag 'drm-misc-next-fixes-2024-07-25' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
      
      A single fix for a panel compatible
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Maxime Ripard <mripard@redhat.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240725-frisky-wren-of-tact-f5f504@houat
      86f259cb
    • Dave Airlie's avatar
      Merge tag 'drm-intel-next-fixes-2024-07-25' of... · a37cd98c
      Dave Airlie authored
      Merge tag 'drm-intel-next-fixes-2024-07-25' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
      
      - Do not consider preemption during execlists_dequeue for gen8 [gt] (Nitin Gote)
      - Allow NULL memory region (Jonathan Cavitt)
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      From: Tvrtko Ursulin <tursulin@igalia.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/ZqICQzyzm/6hDWy4@linux
      a37cd98c
    • Linus Torvalds's avatar
      Merge tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 1722389b
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bpf and netfilter.
      
        A lot of networking people were at a conference last week, busy
        catching COVID, so relatively short PR.
      
        Current release - regressions:
      
         - tcp: process the 3rd ACK with sk_socket for TFO and MPTCP
      
        Current release - new code bugs:
      
         - l2tp: protect session IDR and tunnel session list with one lock,
           make sure the state is coherent to avoid a warning
      
         - eth: bnxt_en: update xdp_rxq_info in queue restart logic
      
         - eth: airoha: fix location of the MBI_RX_AGE_SEL_MASK field
      
        Previous releases - regressions:
      
         - xsk: require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len,
           the field reuses previously un-validated pad
      
        Previous releases - always broken:
      
         - tap/tun: drop short frames to prevent crashes later in the stack
      
         - eth: ice: add a per-VF limit on number of FDIR filters
      
         - af_unix: disable MSG_OOB handling for sockets in sockmap/sockhash"
      
      * tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits)
        tun: add missing verification for short frame
        tap: add missing verification for short frame
        mISDN: Fix a use after free in hfcmulti_tx()
        gve: Fix an edge case for TSO skb validity check
        bnxt_en: update xdp_rxq_info in queue restart logic
        tcp: process the 3rd ACK with sk_socket for TFO/MPTCP
        selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test
        xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len
        bpf: Fix a segment issue when downgrading gso_size
        net: mediatek: Fix potential NULL pointer dereference in dummy net_device handling
        MAINTAINERS: make Breno the netconsole maintainer
        MAINTAINERS: Update bonding entry
        net: nexthop: Initialize all fields in dumped nexthops
        net: stmmac: Correct byte order of perfect_match
        selftests: forwarding: skip if kernel not support setting bridge fdb learning limit
        tipc: Return non-zero value from tipc_udp_addr2str() on error
        netfilter: nft_set_pipapo_avx2: disable softinterrupts
        ice: Fix recipe read procedure
        ice: Add a per-VF limit on number of FDIR filters
        net: bonding: correctly annotate RCU in bond_should_notify_peers()
        ...
      1722389b
    • Linus Torvalds's avatar
      Merge tag 'printk-for-6.11-trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux · 8bf10009
      Linus Torvalds authored
      Pull printk updates from Petr Mladek:
      
       - trivial printk changes
      
      The bigger "real" printk work is still being discussed.
      
      * tag 'printk-for-6.11-trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
        vsprintf: add missing MODULE_DESCRIPTION() macro
        printk: Rename console_replay_all() and update context
      8bf10009
    • Linus Torvalds's avatar
      Merge tag 'constfy-sysctl-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl · b4856250
      Linus Torvalds authored
      Pull sysctl constification from Joel Granados:
       "Treewide constification of the ctl_table argument of proc_handlers
        using a coccinelle script and some manual code formatting fixups.
      
        This is a prerequisite to moving the static ctl_table structs into
        read-only data section which will ensure that proc_handler function
        pointers cannot be modified"
      
      * tag 'constfy-sysctl-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
        sysctl: treewide: constify the ctl_table argument of proc_handlers
      b4856250
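What a const table argument buys can be sketched with a simplified userspace model. `struct sketch_ctl_table` and `sketch_handle_int()` are hypothetical stand-ins, not the kernel's `struct ctl_table` or its proc_handler signature. The point is that a handler taking a pointer to const can still update the data the table points at, but cannot rewrite the table entry itself, which is what allows the table to live in read-only memory later.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a sysctl table entry. */
struct sketch_ctl_table {
    const char *procname;
    int *data;
    int (*proc_handler)(const struct sketch_ctl_table *table,
                        int write, int *value);
};

/* Handler receives the table as const: the entry itself is read-only. */
int sketch_handle_int(const struct sketch_ctl_table *table,
                      int write, int *value)
{
    if (write)
        *table->data = *value;   /* the pointed-to data stays writable */
    else
        *value = *table->data;

    /* table->proc_handler = NULL;  would fail to compile: table is const */
    return 0;
}

int sketch_max_files = 1024;

/* The table object can itself be const once every handler accepts const. */
const struct sketch_ctl_table sketch_table = {
    .procname = "max_files",
    .data = &sketch_max_files,
    .proc_handler = sketch_handle_int,
};
```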
    • Linus Torvalds's avatar
      Merge tag 'efi-fixes-for-v6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · bba959f4
      Linus Torvalds authored
      Pull EFI fixes from Ard Biesheuvel:
      
       - Wipe screen_info after allocating it from the heap - used by arm32
         and EFI zboot, other EFI architectures allocate it statically
      
       - Revert to allocating boot_params from the heap on x86 when entering
         via the native PE entrypoint, to work around a regression on older
         Dell hardware
      
      * tag 'efi-fixes-for-v6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
        x86/efistub: Revert to heap allocated boot_params for PE entrypoint
        efi/libstub: Zero initialize heap allocated struct screen_info
      bba959f4
    • Linus Torvalds's avatar
      Merge tag 'kgdb-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux · 9b219936
      Linus Torvalds authored
      Pull kgdb updates from Daniel Thompson:
       "Three small changes this cycle:
      
         - Clean up an architecture abstraction that is no longer needed
           because all the architectures have converged.
      
         - Actually use the prompt argument to kdb_position_cursor() instead
           of ignoring it (functionally this fix is a nop but that was due to
           luck rather than good judgement)
      
         - Fix a -Wformat-security warning"
      
      * tag 'kgdb-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
        kdb: Get rid of redundant kdb_curr_task()
        kdb: Use the passed prompt in kdb_position_cursor()
        kdb: address -Wformat-security warnings
      9b219936
    • Linus Torvalds's avatar
      Merge tag 'mips_6.11_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · 28e7241c
      Linus Torvalds authored
      Pull MIPS updates from Thomas Bogendoerfer:
      
       - Use improved timer sync for Loongson64
      
       - Fix address of GCR_ACCESS register
      
       - Add missing MODULE_DESCRIPTION
      
      * tag 'mips_6.11_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
        mips: sibyte: add missing MODULE_DESCRIPTION() macro
        MIPS: SMP-CPS: Fix address for GCR_ACCESS register for CM3 and later
        MIPS: Loongson64: Switch to SYNC_R4K
      28e7241c
    • Linus Torvalds's avatar
      Merge tag 'parisc-for-6.11-rc1' of... · f6464295
      Linus Torvalds authored
      Merge tag 'parisc-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
      
      Pull parisc updates from Helge Deller:
       "The gettimeofday() and clock_gettime() syscalls are now available as
        vDSO functions, and Dave added a patch which allows using NVMe cards
        in the PCI slots as a fast and easy alternative to SCSI discs.
      
        Summary:
      
         - add gettimeofday() and clock_gettime() vDSO functions
      
         - enable PCI_MSI_ARCH_FALLBACKS to allow PCI to PCIe bridge adaptor
           with PCIe NVME card to function in parisc machines
      
         - allow users to reduce kernel unaligned runtime warnings
      
         - minor code cleanups"
      
      * tag 'parisc-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Add support for CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN
        parisc: Use max() to calculate parisc_tlb_flush_threshold
        parisc: Fix warning at drivers/pci/msi/msi.h:121
        parisc: Add 64-bit gettimeofday() and clock_gettime() vDSO functions
        parisc: Add 32-bit gettimeofday() and clock_gettime() vDSO functions
        parisc: Clean up unistd.h file
      f6464295
    • Linus Torvalds's avatar
      Merge tag 'uml-for-linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux · f9bcc61a
      Linus Torvalds authored
      Pull UML updates from Richard Weinberger:
      
       - Support for preemption
      
       - i386 Rust support
      
       - Huge cleanup by Benjamin Berg
      
       - UBSAN support
      
       - Removal of dead code
      
      * tag 'uml-for-linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (41 commits)
        um: vector: always reset vp->opened
        um: vector: remove vp->lock
        um: register power-off handler
        um: line: always fill *error_out in setup_one_line()
        um: remove pcap driver from documentation
        um: Enable preemption in UML
        um: refactor TLB update handling
        um: simplify and consolidate TLB updates
        um: remove force_flush_all from fork_handler
        um: Do not flush MM in flush_thread
        um: Delay flushing syscalls until the thread is restarted
        um: remove copy_context_skas0
        um: remove LDT support
        um: compress memory related stub syscalls while adding them
        um: Rework syscall handling
        um: Add generic stub_syscall6 function
        um: Create signal stack memory assignment in stub_data
        um: Remove stub-data.h include from common-offsets.h
        um: time-travel: fix signal blocking race/hang
        um: time-travel: remove time_exit()
        ...
      f9bcc61a
    • Linus Torvalds's avatar
      Merge tag 'driver-core-6.11-rc1' of... · c2a96b7f
      Linus Torvalds authored
      Merge tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
      
      Pull driver core updates from Greg KH:
       "Here is the big set of driver core changes for 6.11-rc1.
      
        Lots of stuff in here, with not a huge diffstat, but apis are evolving
        which required lots of files to be touched. Highlights of the changes
        in here are:
      
         - platform remove callback api final fixups (Uwe took many releases
           to get here, finally!)
      
         - Rust bindings for basic firmware apis and initial driver-core
           interactions.
      
           It's not all that useful for a "write a whole driver in rust" type
           of thing, but the firmware bindings do help out the phy rust
           drivers, and the driver core bindings give a solid base on which
           others can start their work.
      
           There is still a long way to go here before we have a multitude of
           rust drivers being added, but it's a great first step.
      
         - driver core const api changes.
      
           This reached across all bus types, and there are some fix-ups for
           some not-common bus types that linux-next and 0-day testing shook
           out.
      
           This work is being done to help make the rust bindings more safe,
           as well as the C code, moving toward the end-goal of allowing us to
           put driver structures into read-only memory. We aren't there yet,
           but are getting closer.
      
         - minor devres cleanups and fixes found by code inspection
      
         - arch_topology minor changes
      
         - other minor driver core cleanups
      
        All of these have been in linux-next for a very long time with no
        reported problems"
      
      * tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits)
        ARM: sa1100: make match function take a const pointer
        sysfs/cpu: Make crash_hotplug attribute world-readable
        dio: Have dio_bus_match() callback take a const *
        zorro: make match function take a const pointer
        driver core: module: make module_[add|remove]_driver take a const *
        driver core: make driver_find_device() take a const *
        driver core: make driver_[create|remove]_file take a const *
        firmware_loader: fix soundness issue in `request_internal`
        firmware_loader: annotate doctests as `no_run`
        devres: Correct code style for functions that return a pointer type
        devres: Initialize an uninitialized struct member
        devres: Fix memory leakage caused by driver API devm_free_percpu()
        devres: Fix devm_krealloc() wasting memory
        driver core: platform: Switch to use kmemdup_array()
        driver core: have match() callback in struct bus_type take a const *
        MAINTAINERS: add Rust device abstractions to DRIVER CORE
        device: rust: improve safety comments
        MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer
        MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER
        firmware: rust: improve safety comments
        ...
      c2a96b7f
    • Linus Torvalds's avatar
      Merge tag 'linux-watchdog-6.11-rc1' of git://www.linux-watchdog.org/linux-watchdog · b2eed733
      Linus Torvalds authored
      Pull watchdog updates from Wim Van Sebroeck:
      
       - make watchdog_class const
      
       - rework of the rzg2l_wdt driver
      
       - other small fixes and improvements
      
      * tag 'linux-watchdog-6.11-rc1' of git://www.linux-watchdog.org/linux-watchdog:
        dt-bindings: watchdog: dlg,da9062-watchdog: Drop blank space
        watchdog: rzn1: Convert comma to semicolon
        watchdog: lenovo_se10_wdt: Convert comma to semicolon
        dt-bindings: watchdog: renesas,wdt: Document RZ/G3S support
        watchdog: rzg2l_wdt: Add suspend/resume support
        watchdog: rzg2l_wdt: Rely on the reset driver for doing proper reset
        watchdog: rzg2l_wdt: Remove comparison with zero
        watchdog: rzg2l_wdt: Remove reset de-assert from probe
        watchdog: rzg2l_wdt: Check return status of pm_runtime_put()
        watchdog: rzg2l_wdt: Use pm_runtime_resume_and_get()
        watchdog: rzg2l_wdt: Make the driver depend on PM
        watchdog: rzg2l_wdt: Restrict the driver to ARCH_RZG2L and ARCH_R9A09G011
        watchdog: imx7ulp_wdt: keep already running watchdog enabled
        watchdog: starfive: Add missing clk_disable_unprepare()
        watchdog: Make watchdog_class const
      b2eed733
    • Linus Torvalds's avatar
      Merge tag 'dma-mapping-6.11-2024-07-24' of git://git.infradead.org/users/hch/dma-mapping · 9cf601e8
      Linus Torvalds authored
      Pull dma-mapping fix from Christoph Hellwig:
      
       - fix the order of actions in dmam_free_coherent (Lance Richardson)
      
      * tag 'dma-mapping-6.11-2024-07-24' of git://git.infradead.org/users/hch/dma-mapping:
        dma: fix call order in dmam_free_coherent
      9cf601e8
    • Merge branch 'tap-tun-harden-by-dropping-short-frame' · af65ea42
      Jakub Kicinski authored
      Dongli Zhang says:
      
      ====================
      tap/tun: harden by dropping short frame
      
      This series hardens all of tap/tun by dropping any frame shorter than
      the Ethernet header (ETH_HLEN).
      
      While xen-netback already rejects frames shorter than ETH_HLEN ...
      
      static void xenvif_tx_build_gops(struct xenvif_queue *queue,
                                           int budget,
                                           unsigned *copy_ops,
                                           unsigned *map_ops)
      {
      ... ...
                      if (unlikely(txreq.size < ETH_HLEN)) {
                              netdev_dbg(queue->vif->dev,
                                         "Bad packet size: %d\n", txreq.size);
                              xenvif_tx_err(queue, &txreq, extra_count, idx);
                              break;
                      }
      
      ... the short frame may not be dropped by vhost-net/tap/tun.
      
      This fixes CVE-2024-41090 and CVE-2024-41091.
      ====================
      
      Link: https://patch.msgid.link/20240724170452.16837-1-dongli.zhang@oracle.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      af65ea42
    • tun: add missing verification for short frame · 04958480
      Dongli Zhang authored
      The cited commit did not validate the frame length in the
      tun_xdp_one() path, which could cause a corrupted skb to be sent
      downstack. Even before the skb is transmitted,
      tun_xdp_one()-->eth_type_trans() may access the Ethernet header even
      though the frame can be shorter than ETH_HLEN. Once transmitted, this
      could either cause an out-of-bounds access beyond the actual length, or
      confuse the lower layers with an incorrect or inconsistent header length
      in the skb metadata.
      
      In the alternative path, tun_get_user() already rejects frames shorter
      than the Ethernet header size for IFF_TAP.
      
      Drop any frame shorter than the Ethernet header size, just as
      tun_get_user() does.
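
The length check described above can be sketched in plain C roughly as follows. This is a minimal userspace illustration, not the actual patch: `struct sketch_frame`, `sketch_check_frame_len()` and `SKETCH_ETH_HLEN` are hypothetical names, and returning `-EINVAL` is an assumption made for the example. The only points taken from the commit message are that frames shorter than ETH_HLEN are dropped; 14 is the standard Ethernet header length.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Ethernet header length in bytes (6 dst + 6 src + 2 ethertype). */
#define SKETCH_ETH_HLEN 14

/* Hypothetical stand-in for a received buffer; not a kernel type. */
struct sketch_frame {
	const unsigned char *data;
	size_t len;
};

/*
 * Return 0 if the frame is long enough to hold an Ethernet header,
 * or -EINVAL (an assumed error code) if it must be dropped before
 * anything tries to parse the header.
 */
static int sketch_check_frame_len(const struct sketch_frame *f)
{
	assert(f != NULL);

	if (f->len < SKETCH_ETH_HLEN)
		return -EINVAL;

	return 0;
}
```

The point of checking first is ordering: the length is validated before any code reads header fields, so a 13-byte frame is rejected while a 14-byte frame passes.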
      
      CVE: CVE-2024-41091
      Inspired-by: https://lore.kernel.org/netdev/1717026141-25716-1-git-send-email-si-wei.liu@oracle.com/
      Fixes: 043d222f ("tuntap: accept an array of XDP buffs through sendmsg()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
      Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Jason Wang <jasowang@redhat.com>
      Link: https://patch.msgid.link/20240724170452.16837-3-dongli.zhang@oracle.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      04958480
    • tap: add missing verification for short frame · ed7f2afd
      Si-Wei Liu authored
      The cited commit did not validate the frame length in the
      tap_get_user_xdp() path, which could cause a corrupted skb to be sent
      downstack. Even before the skb is transmitted,
      tap_get_user_xdp()-->skb_set_network_header() may assume the size is
      more than ETH_HLEN. Once transmitted, this could either cause an
      out-of-bounds access beyond the actual length, or confuse the lower
      layers with an incorrect or inconsistent header length in the skb
      metadata.
      
      In the alternative path, tap_get_user() already rejects frames shorter
      than the Ethernet header size.
      
      Drop any frame shorter than the Ethernet header size, just as
      tap_get_user() does.
      
      CVE: CVE-2024-41090
      Link: https://lore.kernel.org/netdev/1717026141-25716-1-git-send-email-si-wei.liu@oracle.com/
      Fixes: 0efac277 ("tap: accept an array of XDP buffs through sendmsg()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
      Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Jason Wang <jasowang@redhat.com>
      Link: https://patch.msgid.link/20240724170452.16837-2-dongli.zhang@oracle.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ed7f2afd
    • mISDN: Fix a use after free in hfcmulti_tx() · 61ab7514
      Dan Carpenter authored
      Don't dereference *sp after calling dev_kfree_skb(*sp).
      
      Fixes: af69fb3a ("Add mISDN HFC multiport driver")
      Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/8be65f5a-c2dd-4ba0-8a10-bfe5980b8cfb@stanley.mountain
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      61ab7514
    • gve: Fix an edge case for TSO skb validity check · 36e3b949
      Bailey Forrest authored
      The NIC requires that each TSO segment span no more than 10
      descriptors. The NIC further requires that each descriptor not exceed
      16KB - 1 (GVE_TX_MAX_BUF_SIZE_DQO).
      
      The descriptors for an skb are generated by
      gve_tx_add_skb_no_copy_dqo() for DQO RDA queue format.
      gve_tx_add_skb_no_copy_dqo() loops through each skb frag and
      generates a descriptor for the entire frag if the frag size is
      not greater than GVE_TX_MAX_BUF_SIZE_DQO. If the frag size is
      greater than GVE_TX_MAX_BUF_SIZE_DQO, it is split into descriptor(s)
      of size GVE_TX_MAX_BUF_SIZE_DQO and a descriptor is generated for
      the remainder (frag size % GVE_TX_MAX_BUF_SIZE_DQO).
      
      gve_can_send_tso() checks if the descriptors thus generated for an
      skb would meet the requirement that each TSO-segment not span more
      than 10 descriptors. However, the current code misses an edge case
      when a TSO segment spans multiple descriptors within a large frag.
      This change fixes the edge case.
      
      gve_can_send_tso() relies on the assumption that max gso size (9728)
      is less than GVE_TX_MAX_BUF_SIZE_DQO and therefore within an skb
      fragment a TSO segment can never span more than 2 descriptors.
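
The descriptor arithmetic described above can be sketched as follows. This is a simplified userspace model, not the driver code: `SKETCH_MAX_BUF_SIZE`, `sketch_descs_for_frag()` and `sketch_descs_spanned()` are hypothetical names, and the model assumes descriptors are laid out back to back from the start of a frag, as the commit message describes. The constants 16KB - 1 and 9728 come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Per-descriptor limit quoted in the commit message: 16KB - 1. */
#define SKETCH_MAX_BUF_SIZE ((size_t)(16 * 1024 - 1))

/*
 * Number of descriptors generated for one frag: one per full
 * SKETCH_MAX_BUF_SIZE chunk, plus one for any remainder.
 */
static size_t sketch_descs_for_frag(size_t frag_size)
{
	size_t full = frag_size / SKETCH_MAX_BUF_SIZE;
	size_t rem = frag_size % SKETCH_MAX_BUF_SIZE;

	return full + (rem ? 1 : 0);
}

/*
 * Number of descriptors touched by a segment of seg_len bytes that
 * starts seg_off bytes into a frag.
 */
static size_t sketch_descs_spanned(size_t seg_off, size_t seg_len)
{
	size_t first, last;

	assert(seg_len > 0);

	first = seg_off / SKETCH_MAX_BUF_SIZE;
	last = (seg_off + seg_len - 1) / SKETCH_MAX_BUF_SIZE;

	return last - first + 1;
}
```

Under these assumptions, a 9728-byte segment starting at offset 0 stays within one descriptor, while the same segment starting at offset 16000 crosses a descriptor boundary and touches two. Because 9728 is smaller than the per-descriptor limit, a segment cannot touch three descriptors within a single frag in this model.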
      
      Fixes: a57e5de4 ("gve: DQO: Add TX path")
      Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
      Signed-off-by: Bailey Forrest <bcf@google.com>
      Reviewed-by: Jeroen de Borst <jeroendb@google.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Link: https://patch.msgid.link/20240724143431.3343722-1-pkaligineedi@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      36e3b949
    • bnxt_en: update xdp_rxq_info in queue restart logic · b537633c
      Taehee Yoo authored
      When netdev_rx_queue_restart() restarts queues, the bnxt_en driver
      updates (creates and deletes) a page_pool.
      But it doesn't update xdp_rxq_info, so the xdp_rxq_info is still
      connected to the old page_pool.
      So, bnxt_rx_ring_info->page_pool points to the new page_pool, but
      bnxt_rx_ring_info->xdp_rxq is still connected to the old page_pool.
      
      The old page_pool is no longer used, so it is supposed to be
      deleted by page_pool_destroy(), but it isn't.
      Because the xdp_rxq_info holds a reference to it and the
      xdp_rxq_info is not updated, the old page_pool is not deleted in
      the queue restart logic.
      
      Before restarting 1 queue:
      ./tools/net/ynl/samples/page-pool
      enp10s0f1np1[6] page pools: 4 (zombies: 0)
      	refs: 8192 bytes: 33554432 (refs: 0 bytes: 0)
      	recycling: 0.0% (alloc: 128:8048 recycle: 0:0)
      
      After restarting 1 queue:
      ./tools/net/ynl/samples/page-pool
      enp10s0f1np1[6] page pools: 5 (zombies: 0)
      	refs: 10240 bytes: 41943040 (refs: 0 bytes: 0)
      	recycling: 20.0% (alloc: 160:10080 recycle: 1920:128)
      
      Before restarting queues, the interface has 4 page_pools.
      After restarting one queue, the interface has 5 page_pools, but it
      should still have 4.
      The reason is that the queue restart logic creates a new page_pool and
      the old page_pool is not deleted because xdp_rxq_info is never
      updated.
      
      Fixes: 2d694c27 ("bnxt_en: implement netdev_queue_mgmt_ops")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Reviewed-by: David Wei <dw@davidwei.uk>
      Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Link: https://patch.msgid.link/20240721053554.1233549-1-ap420073@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b537633c
    • Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · f7578df9
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2024-07-25
      
      We've added 14 non-merge commits during the last 8 day(s) which contain
      a total of 19 files changed, 177 insertions(+), 70 deletions(-).
      
      The main changes are:
      
      1) Fix af_unix to disable MSG_OOB handling for sockets in BPF sockmap and
         BPF sockhash. Also add test coverage for this case, from Michal Luczaj.
      
      2) Fix a segmentation issue when downgrading gso_size in the BPF helper
         bpf_skb_adjust_room(), from Fred Li.
      
      3) Fix a compiler warning in resolve_btfids due to a missing type cast,
         from Liwei Song.
      
      4) Fix stack allocation for arm64 to align the stack pointer at a 16 byte
         boundary in the fexit_sleep BPF selftest, from Puranjay Mohan.
      
      5) Fix a xsk regression to require a flag when actuating tx_metadata_len,
         from Stanislav Fomichev.
      
      6) Fix function prototype BTF dumping in libbpf for prototypes that have
         no input arguments, from Andrii Nakryiko.
      
      7) Fix stacktrace symbol resolution in perf script for BPF programs
         containing subprograms, from Hou Tao.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test
        xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len
        bpf: Fix a segment issue when downgrading gso_size
        tools/resolve_btfids: Fix comparison of distinct pointer types warning in resolve_btfids
        bpf, events: Use prog to emit ksymbol event for main program
        selftests/bpf: Test sockmap redirect for AF_UNIX MSG_OOB
        selftests/bpf: Parametrize AF_UNIX redir functions to accept send() flags
        selftests/bpf: Support SOCK_STREAM in unix_inet_redir_to_connected()
        af_unix: Disable MSG_OOB handling for sockets in sockmap/sockhash
        bpftool: Fix typo in usage help
        libbpf: Fix no-args func prototype BTF dumping syntax
        MAINTAINERS: Update powerpc BPF JIT maintainers
        MAINTAINERS: Update email address of Naveen
        selftests/bpf: fexit_sleep: Fix stack allocation for arm64
      ====================
      
      Link: https://patch.msgid.link/20240725114312.32197-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f7578df9
    • arm64: mm: Fix lockless walks with static and dynamic page-table folding · 36639013
      Will Deacon authored
      Lina reports random oopsen originating from the fast GUP code when
      16K pages are used with 4-level page-tables, the fourth level being
      folded at runtime due to lack of LPA2.
      
      In this configuration, the generic implementation of
      p4d_offset_lockless() will return a 'p4d_t *' corresponding to the
      'pgd_t' allocated on the stack of the caller, gup_fast_pgd_range().
      This is normally fine, but when the fourth level of page-table is folded
      at runtime, pud_offset_lockless() will offset from the address of the
      'p4d_t' to calculate the address of the PUD in the same page-table page.
      This results in a stray stack read when the 'p4d_t' has been allocated
      on the stack and can send the walker into the weeds.
      
      Fix the problem by providing our own definition of p4d_offset_lockless()
      when CONFIG_PGTABLE_LEVELS <= 4 which returns the real page-table
      pointer rather than the address of the local stack variable.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/50360968-13fb-4e6f-8f52-1725b3177215@asahilina.net
      Fixes: 0dd4f60a ("arm64: mm: Add support for folding PUDs at runtime")
      Reported-by: Asahi Lina <lina@asahilina.net>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Link: https://lore.kernel.org/r/20240725090345.28461-1-will@kernel.org
      Signed-off-by: Will Deacon <will@kernel.org>
      36639013
    • tcp: process the 3rd ACK with sk_socket for TFO/MPTCP · c1668292
      Matthieu Baerts (NGI0) authored
      The 'Fixes' commit recently changed the behaviour of TCP by skipping the
      processing of the 3rd ACK when a sk->sk_socket is set. The goal was to
      skip tcp_ack_snd_check() in tcp_rcv_state_process() not to send an
      unnecessary ACK in case of simultaneous connect(). Unfortunately, that
      had an impact on TFO and MPTCP.
      
      I started to look at the impact on MPTCP, because the MPTCP CI found
      some issues with the MPTCP Packetdrill tests [1]. Then Paolo Abeni
      suggested that I look at the impact on TFO with "plain" TCP.
      
      For MPTCP, when receiving the 3rd ACK of a request adding a new path
      (MP_JOIN), sk->sk_socket will be set, and point to the MPTCP sock that
      has been created when the MPTCP connection got established before with
      the first path. The newly added 'goto' will then skip the processing of
      the segment text (step 7) and not go through tcp_data_queue() where the
      MPTCP options are validated, and some actions are triggered, e.g.
      sending the MPJ 4th ACK [2] as demonstrated by the new errors when
      running a packetdrill test [3] establishing a second subflow.
      
      This doesn't fully break MPTCP: mainly, the 4th MPJ ACK will be
      delayed. Still, we don't want this behaviour as it delays the
      switch to the fully established mode, and invalid MPTCP options in this
      3rd ACK will not be caught any more. This modification also affects the
      MPTCP + TFO feature, which is why the selftests have been unstable
      for the last few days [4].
      
      For TFO, the existing 'basic-cookie-not-reqd' test [5] was no longer
      passing: if the 3rd ACK contains data, and the connection is accept()ed
      before receiving them, these data would no longer be processed, and thus
      not ACKed.
      
      One last thing about MPTCP, in case of simultaneous connect(), a
      fallback to TCP will be done, which seems fine:
      
        `../common/defaults.sh`
      
         0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_MPTCP) = 3
        +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
      
        +0 > S  0:0(0)                 <mss 1460, sackOK, TS val 100 ecr 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
        +0 < S  0:0(0) win 1000        <mss 1460, sackOK, TS val 407 ecr 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
        +0 > S. 0:0(0) ack 1           <mss 1460, sackOK, TS val 330 ecr 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
        +0 < S. 0:0(0) ack 1 win 65535 <mss 1460, sackOK, TS val 700 ecr 100, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey=2]>
        +0 >  . 1:1(0) ack 1           <nop, nop, TS val 845707014 ecr 700, nop, nop, sack 0:1>
      
      Simultaneous SYN-data crossing is also not supported by TFO, see [6].
      
      Kuniyuki Iwashima suggested restricting the processing to SYN+ACK only:
      that's a more generic solution than the one initially proposed, and
      also enough to fix the issues described above.
      
      Later on, Eric Dumazet mentioned that an ACK should still be sent in
      reaction to the second SYN+ACK that is received: not sending a DUPACK
      here seems wrong and could hurt:
      
         0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_TCP) = 3
        +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
      
        +0 > S  0:0(0)                <mss 1460, sackOK, TS val 1000 ecr 0,nop,wscale 8>
        +0 < S  0:0(0)       win 1000 <mss 1000, sackOK, nop, nop>
        +0 > S. 0:0(0) ack 1          <mss 1460, sackOK, TS val 3308134035 ecr 0,nop,wscale 8>
        +0 < S. 0:0(0) ack 1 win 1000 <mss 1000, sackOK, nop, nop>
        +0 >  . 1:1(0) ack 1          <nop, nop, sack 0:1>  // <== Here
      
      So in this version, the 'goto consume' is dropped, to always send an ACK
      when switching from TCP_SYN_RECV to TCP_ESTABLISHED. This ACK will be
      seen as a DUPACK -- with DSACK if SACK has been negotiated -- in case of
      simultaneous SYN crossing: that's what is expected here.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/9936227696 [1]
      Link: https://datatracker.ietf.org/doc/html/rfc8684#fig_tokens [2]
      Link: https://github.com/multipath-tcp/packetdrill/blob/mptcp-net-next/gtests/net/mptcp/syscalls/accept.pkt#L28 [3]
      Link: https://netdev.bots.linux.dev/contest.html?executor=vmksft-mptcp-dbg&test=mptcp-connect-sh [4]
      Link: https://github.com/google/packetdrill/blob/master/gtests/net/tcp/fastopen/server/basic-cookie-not-reqd.pkt#L21 [5]
      Link: https://github.com/google/packetdrill/blob/master/gtests/net/tcp/fastopen/client/simultaneous-fast-open.pkt [6]
      Fixes: 23e89e8e ("tcp: Don't drop SYN+ACK for simultaneous connect().")
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/20240724-upstream-net-next-20240716-tcp-3rd-ack-consume-sk_socket-v3-1-d48339764ce9@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      c1668292
    • rbd: don't assume rbd_is_lock_owner() for exclusive mappings · 3ceccb14
      Ilya Dryomov authored
      Expanding on the previous commit, assuming that rbd_is_lock_owner()
      always returns true (i.e. that we are either in RBD_LOCK_STATE_LOCKED
      or RBD_LOCK_STATE_QUIESCING) if the mapping is exclusive is wrong too.
      In case ceph_cls_set_cookie() fails, the lock would be temporarily
      released even if the mapping is exclusive, meaning that we can end up
      even in RBD_LOCK_STATE_UNLOCKED.
      
      IOW, exclusive mappings are really "just" about disabling automatic
      lock transitions (as documented in the man page), not about grabbing
      the lock and holding on to it whatever it takes.
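
The state relationship described above can be modelled with a small sketch. This is a hypothetical userspace model, not the rbd driver code: the `sketch_*` names, the struct layout and the helper that simulates a failed cookie update are all assumptions for illustration. What is taken from the commit message is that lock ownership means being in the LOCKED or QUIESCING state, and that an exclusive mapping can still end up UNLOCKED when the cookie update fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the three lock states named in the message. */
enum sketch_lock_state {
	SKETCH_LOCK_STATE_UNLOCKED,
	SKETCH_LOCK_STATE_LOCKED,
	SKETCH_LOCK_STATE_QUIESCING,
};

/* Hypothetical device: only the fields this sketch needs. */
struct sketch_dev {
	enum sketch_lock_state lock_state;
	bool exclusive; /* mapping was created as exclusive */
};

/* Ownership depends on the lock state only, never on 'exclusive'. */
static bool sketch_is_lock_owner(const struct sketch_dev *dev)
{
	assert(dev != NULL);

	return dev->lock_state == SKETCH_LOCK_STATE_LOCKED ||
	       dev->lock_state == SKETCH_LOCK_STATE_QUIESCING;
}

/* Simulate a failed cookie update: the lock is temporarily released. */
static void sketch_cookie_update_failed(struct sketch_dev *dev)
{
	assert(dev != NULL);

	dev->lock_state = SKETCH_LOCK_STATE_UNLOCKED;
}
```

In this model, code that infers ownership from `exclusive` alone would be wrong after `sketch_cookie_update_failed()`: the flag is still set, but `sketch_is_lock_owner()` returns false.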
      
      Cc: stable@vger.kernel.org
      Fixes: 637cd060 ("rbd: new exclusive lock wait/wake code")
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
      3ceccb14
    • rbd: don't assume RBD_LOCK_STATE_LOCKED for exclusive mappings · 2237ceb7
      Ilya Dryomov authored
      Every time a watch is reestablished after getting lost, we need to
      update the cookie which involves quiescing exclusive lock.  For this,
      we transition from RBD_LOCK_STATE_LOCKED to RBD_LOCK_STATE_QUIESCING
      roughly for the duration of rbd_reacquire_lock() call.  If the mapping
      is exclusive and I/O happens to arrive in this time window, it's failed
      with EROFS (later translated to EIO) based on the wrong assumption in
      rbd_img_exclusive_lock() -- "lock got released?" check there stopped
      making sense with commit a2b1da09 ("rbd: lock should be quiesced on
      reacquire").
      
      To make it worse, any such I/O is added to the acquiring list before
      EROFS is returned and this sets up for violating rbd_lock_del_request()
      precondition that the request is either on the running list or not on
      any list at all -- see commit ded080c8 ("rbd: don't move requests
      to the running list on errors").  rbd_lock_del_request() ends up
      processing these requests as if they were on the running list which
      screws up quiescing_wait completion counter and ultimately leads to
      
          rbd_assert(!completion_done(&rbd_dev->quiescing_wait));
      
      being triggered on the next watch error.
      
      Cc: stable@vger.kernel.org # 06ef84c4e9c4: rbd: rename RBD_LOCK_STATE_RELEASING and releasing_wait
      Cc: stable@vger.kernel.org
      Fixes: 637cd060 ("rbd: new exclusive lock wait/wake code")
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
      2237ceb7
    • rbd: rename RBD_LOCK_STATE_RELEASING and releasing_wait · f5c466a0
      Ilya Dryomov authored
      ... to RBD_LOCK_STATE_QUIESCING and quiescing_wait to recognize that
      this state and the associated completion are backing rbd_quiesce_lock(),
      which isn't specific to releasing the lock.
      
      While exclusive lock does get quiesced before it's released, it also
      gets quiesced before an attempt to update the cookie is made and there
      the lock is not released as long as ceph_cls_set_cookie() succeeds.
      
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
      f5c466a0
    • selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test · 9b9969c4
      Stanislav Fomichev authored
      This flag is now required to use tx_metadata_len.
      
      Fixes: 40808a23 ("selftests/bpf: Add TX side to xdp_metadata")
      Reported-by: Julian Schindel <mail@arctic-alpaca.de>
      Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Link: https://lore.kernel.org/bpf/20240713015253.121248-3-sdf@fomichev.me
      9b9969c4