  1. 23 Jan, 2023 1 commit
  2. 19 Jan, 2023 1 commit
  3. 15 Jan, 2023 4 commits
  4. 14 Jan, 2023 7 commits
    • Linus Torvalds
      Merge tag 'iommu-fixes-v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 7c698440
      Linus Torvalds authored
      Pull iommu fixes from Joerg Roedel:
      
       - Core: Fix an iommu-group refcount leak
      
       - Fix overflow issue in IOVA alloc path
      
       - ARM-SMMU fixes from Will:
          - Fix VFIO regression on NXP SoCs by reporting IOMMU_CAP_CACHE_COHERENCY
          - Fix SMMU shutdown paths to avoid device unregistration race
      
       - Error handling fix for Mediatek IOMMU driver
      
      * tag 'iommu-fixes-v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/mediatek-v1: Fix an error handling path in mtk_iommu_v1_probe()
        iommu/iova: Fix alloc iova overflows issue
        iommu: Fix refcount leak in iommu_device_claim_dma_owner
        iommu/arm-smmu-v3: Don't unregister on shutdown
        iommu/arm-smmu: Don't unregister on shutdown
        iommu/arm-smmu: Report IOMMU_CAP_CACHE_COHERENCY even betterer
    • Linus Torvalds
      Merge tag 'fixes-2023-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock · 4f43ade4
      Linus Torvalds authored
      Pull memblock fix from Mike Rapoport:
       "memblock: always release pages to the buddy allocator in
        memblock_free_late()
      
        If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages()
        only releases pages to the buddy allocator if they are not in the
        deferred range. This is correct for free pages (as defined by
        for_each_free_mem_pfn_range_in_zone()) because free pages in the
        deferred range will be initialized and released as part of the
        deferred init process.
      
        memblock_free_pages() is called by memblock_free_late(), which is used
        to free reserved ranges after memblock_free_all() has run. All pages
        in reserved ranges have been initialized at that point, and
        accordingly, those pages are not touched by the deferred init process.
      
        This means that currently, if the pages that memblock_free_late()
        intends to release are in the deferred range, they will never be
        released to the buddy allocator. They will forever be reserved.
      
        In addition, memblock_free_pages() calls kmsan_memblock_free_pages(),
        which is also correct for free pages but is not correct for reserved
        pages. KMSAN metadata for reserved pages is initialized by
        kmsan_init_shadow(), which runs shortly before memblock_free_all().
      
        For both of these reasons, memblock_free_pages() should only be called
        for free pages, and memblock_free_late() should call
        __free_pages_core() directly instead.
      
        One case where this issue can occur in the wild is EFI boot on x86_64.
        The x86 EFI code reserves all EFI boot services memory ranges via
        memblock_reserve() and frees them later via memblock_free_late()
        (efi_reserve_boot_services() and efi_free_boot_services(),
        respectively).
      
        If any of those ranges happens to fall within the deferred init range,
        the pages will not be released and that memory will be unavailable.
      
        For example, on an Amazon EC2 t3.micro VM (1 GB) booting via EFI:
      
          v6.2-rc2:
          Node 0, zone      DMA
                spanned  4095
                present  3999
                managed  3840
          Node 0, zone    DMA32
                spanned  246652
                present  245868
                managed  178867
      
          v6.2-rc2 + patch:
          Node 0, zone      DMA
                spanned  4095
                present  3999
                managed  3840
          Node 0, zone    DMA32
                spanned  246652
                present  245868
                managed  222816   # +43,949 pages"
      
      * tag 'fixes-2023-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
        mm: Always release pages to the buddy allocator in memblock_free_late().
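The release decision described in the memblock pull above can be modeled in a few lines of C. This is a hypothetical sketch, not kernel code: struct zone_model, first_deferred_pfn and the released_* helpers are invented for illustration, and the deferred range is simplified to "every pfn at or above a cutoff". It only contrasts which pages of a freed reserved range reach the buddy allocator before and after the fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code: page frames at or above
 * first_deferred_pfn belong to the deferred init range. */
struct zone_model {
    unsigned long first_deferred_pfn;
};

static bool in_deferred_range(const struct zone_model *z, unsigned long pfn)
{
    return pfn >= z->first_deferred_pfn;
}

/* Before the fix: memblock_free_late() went through memblock_free_pages(),
 * which skips deferred-range pages because it expects deferred init to
 * release them. Deferred init never touches reserved pages, so they leak. */
static bool released_before_fix(const struct zone_model *z, unsigned long pfn)
{
    return !in_deferred_range(z, pfn);
}

/* After the fix: memblock_free_late() calls __free_pages_core() directly,
 * so every page of the freed reserved range reaches the buddy allocator. */
static bool released_after_fix(const struct zone_model *z, unsigned long pfn)
{
    (void)z;
    (void)pfn;
    return true;
}

/* Counts the pages of the reserved range [start, end) that get released. */
static unsigned long count_released(const struct zone_model *z,
                                    unsigned long start, unsigned long end,
                                    bool (*released)(const struct zone_model *,
                                                     unsigned long))
{
    unsigned long n = 0;

    assert(start <= end);
    for (unsigned long pfn = start; pfn < end; pfn++)
        if (released(z, pfn))
            n++;
    return n;
}
```

For example, with the cutoff at pfn 1000, freeing the reserved range [900, 1100) releases only 100 pages under the old logic and all 200 under the new one. For scale, the +43,949 managed pages in the EC2 example correspond to roughly 171.7 MiB at the 4 KiB page size x86_64 uses.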
    • Linus Torvalds
      Merge tag 'hardening-v6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 880ca43e
      Linus Torvalds authored
      Pull kernel hardening fixes from Kees Cook:
      
       - Fix CFI hash randomization with KASAN (Sami Tolvanen)
      
       - Check size of coreboot table entry and use flex-array
      
      * tag 'hardening-v6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        kbuild: Fix CFI hash randomization with KASAN
        firmware: coreboot: Check size of table entry and use flex-array
    • Linus Torvalds
      Merge tag 'modules-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux · 8b7be52f
      Linus Torvalds authored
      Pull module fix from Luis Chamberlain:
       "Just one fix for modules by Nick"
      
      * tag 'modules-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
        kallsyms: Fix scheduling with interrupts disabled in self-test
    • Linus Torvalds
      Merge tag '6.2-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · b35ad63e
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
      
       - memory leak and double free fix
      
       - two symlink fixes
      
       - minor cleanup fix
      
       - two smb1 fixes
      
      * tag '6.2-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: Fix uninitialized memory read for smb311 posix symlink create
        cifs: fix potential memory leaks in session setup
        cifs: do not query ifaces on smb1 mounts
        cifs: fix double free on failed kerberos auth
        cifs: remove redundant assignment to the variable match
        cifs: fix file info setting in cifs_open_file()
        cifs: fix file info setting in cifs_query_path_info()
    • Linus Torvalds
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 8e768130
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Two minor fixes in the hisi_sas driver which only impact enterprise
        style multi-expander and shared disk situations and no core changes"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: hisi_sas: Set a port invalid only if there are no devices attached when refreshing port id
        scsi: hisi_sas: Use abort task set to reset SAS disks when discovered
    • Linus Torvalds
      Merge tag 'ata-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata · 34cbf89a
      Linus Torvalds authored
      Pull ATA fix from Damien Le Moal:
       "A single fix to prevent building the pata_cs5535 driver with User Mode
        Linux as it uses MSR operations that are not defined with UML"
      
      * tag 'ata-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
        ata: pata_cs5535: Don't build on UML
  5. 13 Jan, 2023 27 commits
    • Linus Torvalds
      Merge tag 'block-6.2-2023-01-13' of git://git.kernel.dk/linux · 97ec4d55
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "Nothing major in here, just a collection of NVMe fixes and dropping a
        wrong might_sleep() that static checkers tripped over but which isn't
        valid"
      
      * tag 'block-6.2-2023-01-13' of git://git.kernel.dk/linux:
        MAINTAINERS: stop nvme matching for nvmem files
        nvme: don't allow unprivileged passthrough on partitions
        nvme: replace the "bool vec" arguments with flags in the ioctl path
        nvme: remove __nvme_ioctl
        nvme-pci: fix error handling in nvme_pci_enable()
        nvme-pci: add NVME_QUIRK_IDENTIFY_CNS quirk to Apple T2 controllers
        nvme-apple: add NVME_QUIRK_IDENTIFY_CNS quirk to fix regression
        block: Drop spurious might_sleep() from blk_put_queue()
    • Linus Torvalds
      Merge tag 'io_uring-6.2-2023-01-13' of git://git.kernel.dk/linux · 2ce7592d
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "A fix for a regression that happened last week, rest is fixes that
        will be headed to stable as well. In detail:
      
         - Fix for a regression added with the leak fix from last week (me)
      
         - In writing a test case for that leak, inadvertently discovered a
           case where a poll request can race. So fix that up and mark it
           for stable, and also ensure that fdinfo covers both the poll tables
           that we have. The latter was an oversight when the split poll tables
           were added (me)
      
         - Fix for a lockdep reported issue with IOPOLL (Pavel)"
      
      * tag 'io_uring-6.2-2023-01-13' of git://git.kernel.dk/linux:
        io_uring: lock overflowing for IOPOLL
        io_uring/poll: attempt request issue after racy poll wakeup
        io_uring/fdinfo: include locked hash table in fdinfo output
        io_uring/poll: add hash if ready poll request can't complete inline
        io_uring/io-wq: only free worker if it was allocated for creation
    • Linus Torvalds
      Merge tag 'pci-v6.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 9e058c29
      Linus Torvalds authored
      Pull pci fixes from Bjorn Helgaas:
      
       - Work around apparent firmware issue that made Linux reject MMCONFIG
         space, which broke PCI extended config space (Bjorn Helgaas)
      
       - Fix CONFIG_PCIE_BT1 dependency due to mid-air collision between a
         PCI_MSI_IRQ_DOMAIN -> PCI_MSI change and addition of PCIE_BT1 (Lukas
         Bulwahn)
      
      * tag 'pci-v6.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space
        x86/pci: Simplify is_mmconf_reserved() messages
        PCI: dwc: Adjust to recent removal of PCI_MSI_IRQ_DOMAIN
    • Sami Tolvanen
      kbuild: Fix CFI hash randomization with KASAN · 42633ed8
      Sami Tolvanen authored
      Clang emits an asan.module_ctor constructor into each object file
      when KASAN is enabled, and these functions are indirectly called
      in do_ctors. With CONFIG_CFI_CLANG, the compiler also emits a CFI
      type hash before each address-taken global function so they can
      pass indirect call checks.
      
      However, in commit 0c3e806e ("x86/cfi: Add boot time hash
      randomization"), x86 implemented boot time hash randomization,
      which relies on the .cfi_sites section generated by objtool. As
      objtool is run against vmlinux.o instead of individual object
      files with X86_KERNEL_IBT (enabled by default), CFI types in
      object files that are not part of vmlinux.o end up not being
      included in .cfi_sites, and thus won't get randomized and trip
      CFI when called.
      
      Only .vmlinux.export.o and init/version-timestamp.o are linked
      into vmlinux separately from vmlinux.o. As these files don't
      contain any functions, disable KASAN for both of them to avoid
      breaking hash randomization.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/1742
      Fixes: 0c3e806e ("x86/cfi: Add boot time hash randomization")
      Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20230112224948.1479453-2-samitolvanen@google.com
    • Kees Cook
      firmware: coreboot: Check size of table entry and use flex-array · 3b293487
      Kees Cook authored
      The memcpy() of the data following a coreboot_table_entry couldn't
      be evaluated by the compiler under CONFIG_FORTIFY_SOURCE. To make it
      easier to reason about, add an explicit flexible array member to struct
      coreboot_device so the entire entry can be copied at once. Additionally,
      validate the sizes before copying. Avoids this run-time false positive
      warning:
      
        memcpy: detected field-spanning write (size 168) of single field "&device->entry" at drivers/firmware/google/coreboot_table.c:103 (size 8)

      Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Link: https://lore.kernel.org/all/03ae2704-8c30-f9f0-215b-7cdf4ad35a9a@molgen.mpg.de/
      Cc: Jack Rosenthal <jrosenth@chromium.org>
      Cc: Guenter Roeck <groeck@chromium.org>
      Cc: Julius Werner <jwerner@chromium.org>
      Cc: Brian Norris <briannorris@chromium.org>
      Cc: Stephen Boyd <swboyd@chromium.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Julius Werner <jwerner@chromium.org>
      Reviewed-by: Guenter Roeck <groeck@chromium.org>
      Link: https://lore.kernel.org/r/20230107031406.gonna.761-kees@kernel.org
      Reviewed-by: Stephen Boyd <swboyd@chromium.org>
      Reviewed-by: Jack Rosenthal <jrosenth@chromium.org>
      Link: https://lore.kernel.org/r/20230112230312.give.446-kees@kernel.org
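The coreboot change combines two ideas: copy the whole table entry into a flexible array member with one memcpy(), and validate the entry's declared size before trusting it. The sketch below shows both in plain C under stated assumptions: struct entry_hdr (a tag/size header), struct entry_copy and copy_entry() are hypothetical names, not the driver's actual definitions, and the size field is assumed to count the header as well as the payload.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry header: a tag plus the size of the whole entry in bytes,
 * header included (an assumption of this sketch). */
struct entry_hdr {
    uint32_t tag;
    uint32_t size;
};

/* Hypothetical copy of one entry: bookkeeping fields followed by a flexible
 * array member that receives the entire entry, header and payload. */
struct entry_copy {
    uint32_t id;
    uint32_t len;     /* number of valid bytes in raw[] */
    uint8_t raw[];    /* flexible array member */
};

/* Copies the entry starting at table[off]. Returns NULL if the header does
 * not fit, if the declared size is smaller than the header, or if it runs
 * past the end of the table. The caller frees the result. */
static struct entry_copy *copy_entry(const uint8_t *table, size_t table_len,
                                     size_t off, uint32_t id)
{
    struct entry_hdr hdr;
    struct entry_copy *c;

    assert(table != NULL);
    if (off > table_len || table_len - off < sizeof(hdr))
        return NULL;
    memcpy(&hdr, table + off, sizeof(hdr));   /* no alignment assumptions */
    if (hdr.size < sizeof(hdr) || hdr.size > table_len - off)
        return NULL;

    c = malloc(sizeof(*c) + hdr.size);
    if (c == NULL)
        return NULL;
    c->id = id;
    c->len = hdr.size;
    /* One copy of the whole entry into raw[], instead of a write that
     * starts at a fixed-size header field and spans past it. */
    memcpy(c->raw, table + off, hdr.size);
    return c;
}
```

The size checks come first so that a corrupt or truncated table yields NULL rather than an out-of-bounds read; the single copy into raw[] mirrors the commit's idea of copying the entire entry at once.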
    • Nicholas Piggin
      kallsyms: Fix scheduling with interrupts disabled in self-test · da35048f
      Nicholas Piggin authored
      kallsyms_on_each* may schedule, so it must not be called with interrupts
      disabled. The iteration function could disable interrupts, but this
      also changes lookup_symbol() to match the change to the other timing
      code.

      Reported-by: Erhard F. <erhard_f@mailbox.org>
      Link: https://lore.kernel.org/all/bug-216902-206035@https.bugzilla.kernel.org%2F/
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Link: https://lore.kernel.org/oe-lkp/202212251728.8d0872ff-oliver.sang@intel.com
      Fixes: 30f3bb09 ("kallsyms: Add self-test facility")
      Tested-by: "Erhard F." <erhard_f@mailbox.org>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    • Peter Foley
      ata: pata_cs5535: Don't build on UML · 22eebaa6
      Peter Foley authored
      This driver uses MSR functions that aren't implemented under UML.
      Avoid building it to prevent tripping up allyesconfig.
      
      e.g.
      /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x3a3): undefined reference to `__tracepoint_read_msr'
      /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x3d2): undefined reference to `__tracepoint_write_msr'
      /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x457): undefined reference to `__tracepoint_write_msr'
      /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x481): undefined reference to `do_trace_write_msr'
      /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x4d5): undefined reference to `do_trace_write_msr'
      /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x4f5): undefined reference to `do_trace_read_msr'
      /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x51c): undefined reference to `do_trace_write_msr'

      Signed-off-by: Peter Foley <pefoley2@pefoley.com>
      Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    • Linus Torvalds
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 92783a90
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "ARM:
      
         - Fix the PMCR_EL0 reset value after the PMU rework
      
         - Correctly handle an S2 fault triggered by an S1 page table walk by
           not always classifying it as a write, as this breaks on R/O memslots
      
         - Document why we cannot exit with KVM_EXIT_MMIO when taking a write
           fault from a S1 PTW on a R/O memslot
      
         - Put the Apple M2 on the naughty list for not being able to
           correctly implement the vgic SEIS feature, just like the M1 before
           it
      
         - Reviewer updates: Alex is stepping down, replaced by Zenghui
      
        x86:
      
         - Fix various rare locking issues in Xen emulation and teach lockdep
           to detect them
      
         - Documentation improvements
      
         - Do not return host topology information from KVM_GET_SUPPORTED_CPUID"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86/xen: Avoid deadlock by adding kvm->arch.xen.xen_lock leaf node lock
        KVM: Ensure lockdep knows about kvm->lock vs. vcpu->mutex ordering rule
        KVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest()
        KVM: x86/xen: Fix lockdep warning on "recursive" gpc locking
        Documentation: kvm: fix SRCU locking order docs
        KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID
        KVM: nSVM: clarify recalc_intercepts() wrt CR8
        MAINTAINERS: Remove myself as a KVM/arm64 reviewer
        MAINTAINERS: Add Zenghui Yu as a KVM/arm64 reviewer
        KVM: arm64: vgic: Add Apple M2 cpus to the list of broken SEIS implementations
        KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_*
        KVM: arm64: Document the behaviour of S1PTW faults on RO memslots
        KVM: arm64: Fix S1PTW handling on RO memslots
        KVM: arm64: PMU: Fix PMCR_EL0 reset value
    • Mateusz Guzik
      lockref: stop doing cpu_relax in the cmpxchg loop · f5fe24ef
      Mateusz Guzik authored
      On the x86-64 architecture even a failing cmpxchg grants exclusive
      access to the cacheline, making it preferable to retry the failed op
      immediately instead of stalling with the pause instruction.
      
      To illustrate the impact, below are benchmark results obtained by
      running various will-it-scale tests on top of the 6.2-rc3 kernel and
      Cascade Lake (2 sockets * 24 cores * 2 threads) CPU.
      
      All results in ops/s.  Note there is some variance in re-runs, but the
      code is consistently faster when contention is present.
      
        open3 ("Same file open/close"):
        proc          stock       no-pause
           1         805603         814942       (+1%)
           2        1054980        1054781       (-0%)
           8        1544802        1822858      (+18%)
          24        1191064        2199665      (+84%)
          48         851582        1469860      (+72%)
          96         609481        1427170     (+134%)
      
        fstat2 ("Same file fstat"):
        proc          stock       no-pause
           1        3013872        3047636       (+1%)
           2        4284687        4400421       (+2%)
           8        3257721        5530156      (+69%)
          24        2239819        5466127     (+144%)
          48        1701072        5256609     (+209%)
          96        1269157        6649326     (+423%)
      
      Additionally, a kernel with a private patch to help access() scalability:
      access2 ("Same file access"):
      
        proc          stock        patched      patched
                                               +nopause
          24        2378041        2005501      5370335  (-15% / +125%)
      
      That is, fixing the problems in access() itself *reduces* scalability,
      because afterwards the cacheline ping-pong happens only in lockref,
      with the pause instruction still in the loop.
      
      Note that fstat and access benchmarks are not currently integrated into
      will-it-scale, but interested parties can find them in pull requests to
      said project.
      
      The code at hand has a rather tortured history.  The first modification
      showed up in commit d472d9d9 ("lockref: Relax in cmpxchg loop"), written
      with Itanium in mind.  Later it got patched up to use an arch-dependent
      macro to stop doing it on s390 where it caused a significant regression.
      Said macro had undergone revisions and was ultimately eliminated later,
      going back to cpu_relax.
      
      While I intended to only remove cpu_relax for x86-64, I got the
      following comment from Linus:
      
          I would actually prefer just removing it entirely and see if
          somebody else hollers. You have the numbers to prove it hurts on
          real hardware, and I don't think we have any numbers to the
          contrary.
      
          So I think it's better to trust the numbers and remove it as a
          failure, than say "let's just remove it on x86-64 and leave
          everybody else with the potentially broken code"
      
      Additionally, Will Deacon (maintainer of the arm64 port, one of the
      architectures previously benchmarked):
      
          So, from the arm64 side of the fence, I'm perfectly happy just
          removing the cpu_relax() calls from lockref.
      
      As such, come back full circle in history and whack it altogether.

      Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
      Link: https://lore.kernel.org/all/CAGudoHHx0Nqg6DE70zAVA75eV-HXfWyhVMWZ-aSeOofkA_=WdA@mail.gmail.com/
      Acked-by: Tony Luck <tony.luck@intel.com> # ia64
      Acked-by: Nicholas Piggin <npiggin@gmail.com> # powerpc
      Acked-by: Will Deacon <will@kernel.org> # arm64
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
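To make the lockref change concrete, here is a minimal C11 sketch of a lockless compare-and-swap fast path with no pause between attempts. It assumes a made-up layout (a 32-bit count in the low bits, a "lock held" flag in bit 32) and an assumed retry bound; the real struct lockref packs a spinlock next to the count and uses the kernel's own cmpxchg primitives, so get_lockless(), LOCK_HELD and MAX_RETRIES are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout, not struct lockref: a 64-bit word holding a 32-bit
 * reference count in the low bits and a "lock held" flag in bit 32.
 * Overflow of the 32-bit count is not handled in this sketch. */
#define LOCK_HELD   (UINT64_C(1) << 32)
#define MAX_RETRIES 100 /* assumed bound before falling back to the lock */

/* Lockless fast path: bump the count with compare-and-swap while the lock
 * is free. Returns false if the lock is held or retries run out; a caller
 * would then take the lock and update the count under it. */
static bool get_lockless(_Atomic uint64_t *word)
{
    uint64_t old = atomic_load_explicit(word, memory_order_relaxed);
    int retry = MAX_RETRIES;

    while (!(old & LOCK_HELD)) {
        /* On failure, 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak_explicit(word, &old, old + 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
            return true;
        if (--retry == 0)
            break;
        /* No cpu_relax()/PAUSE here: retry immediately, which is the
         * behaviour the commit switches to. */
    }
    return false;
}
```

The design point from the commit is the absence of a pause between a failed compare-and-swap and the next attempt: per the commit message, on x86-64 even a failing cmpxchg already grants exclusive access to the cacheline, so retrying at once is cheaper than stalling.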
    • Bjorn Helgaas
      x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space · fd3a8cff
      Bjorn Helgaas authored
      Normally we reject ECAM space unless it is reported as reserved in the E820
      table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2).
      
      Commit 07eab090 ("efi/x86: Remove EfiMemoryMappedIO from E820 map") removes
      E820 entries that correspond to EfiMemoryMappedIO regions because some
      other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the
      E820 entries prevent Linux from allocating BAR space for hot-added devices.
      
      Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does
      mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is
      normally converted to an E820 entry by a bootloader or EFI stub.  After
      07eab090, that E820 entry is removed, so we reject this ECAM space,
      which makes PCI extended config space (offsets 0x100-0xfff) inaccessible.
      
      The lack of extended config space breaks anything that relies on it,
      including perf, VSEC telemetry, EDAC, QAT, SR-IOV, etc.
      
      Allow use of ECAM for extended config space when the region is covered by
      an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02
      _CRS.
      
      Link: https://lore.kernel.org/r/ac2693d8-8ba3-72e0-5b66-b3ae008d539d@linux.intel.com
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216891
      Fixes: 07eab090 ("efi/x86: Remove EfiMemoryMappedIO from E820 map")
      Link: https://lore.kernel.org/r/20230110180243.1590045-3-helgaas@kernel.org
      Reported-by: Kan Liang <kan.liang@linux.intel.com>
      Reported-by: Tony Luck <tony.luck@intel.com>
      Reported-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
      Reported-by: Yunying Sun <yunying.sun@intel.com>
      Reported-by: Baowen Zheng <baowen.zheng@corigine.com>
      Reported-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
      Reported-by: Yang Lixiao <lixiao.yang@intel.com>
      Tested-by: Tony Luck <tony.luck@intel.com>
      Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
      Tested-by: Kan Liang <kan.liang@linux.intel.com>
      Tested-by: Yunying Sun <yunying.sun@intel.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
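For readers unfamiliar with ECAM: each PCI function gets a 4 KiB configuration window at a fixed offset from the ECAM base, which is what makes offsets 0x100-0xfff reachable at all. The sketch below computes that address and adds a simple containment check in the spirit of the fix (does the ECAM range lie inside a firmware-reported memory-mapped I/O region?). struct mem_region, covered_by_any() and the base address used in the example are hypothetical; the real code checks the EFI memory map for EfiMemoryMappedIO entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ECAM maps each function's 4 KiB config space at a fixed offset from the
 * base: bus in bits 27:20, device in 19:15, function in 14:12, and the
 * register offset in 11:0. */
static uint64_t ecam_addr(uint64_t base, unsigned bus, unsigned dev,
                          unsigned fn, unsigned reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 0x1000);
    return base + ((uint64_t)bus << 20) + ((uint64_t)dev << 15) +
           ((uint64_t)fn << 12) + reg;
}

/* Offsets 0x100-0xfff are extended config space; the legacy 0xCF8/0xCFC
 * port mechanism only reaches the first 256 bytes. */
static bool is_extended_cfg(unsigned reg)
{
    return reg >= 0x100 && reg <= 0xfff;
}

/* Hypothetical stand-in for a firmware memory-map entry: [start, end). */
struct mem_region {
    uint64_t start, end;
};

/* Returns true if [start, end) lies entirely inside one of the regions. */
static bool covered_by_any(const struct mem_region *r, size_t n,
                           uint64_t start, uint64_t end)
{
    for (size_t i = 0; i < n; i++)
        if (r[i].start <= start && end <= r[i].end)
            return true;
    return false;
}
```

For example, with a hypothetical base of 0xE0000000, register 0x100 of bus 1, device 2, function 3 sits at 0xE0113100, and an ECAM range covering all 256 buses spans 256 MiB.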
    • Linus Torvalds
      Merge tag 'efi-fixes-for-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · 0bf913e0
      Linus Torvalds authored
      Pull EFI fixes from Ard Biesheuvel:
      
       - avoid a potential crash on the efi_subsys_init() error path
      
       - use more appropriate error code for runtime services calls issued
         after a crash in the firmware occurred
      
       - avoid READ_ONCE() for accessing firmware tables that may appear
         misaligned in memory
      
      * tag 'efi-fixes-for-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
        efi: tpm: Avoid READ_ONCE() for accessing the event log
        efi: rt-wrapper: Add missing include
        efi: fix userspace infinite retry read efivars after EFI runtime services page fault
        efi: fix NULL-deref in init error path
    • Linus Torvalds
      Merge tag 'docs-6.2-fixes' of git://git.lwn.net/linux · 40d92fc4
      Linus Torvalds authored
      Pull documentation fixes from Jonathan Corbet:
       "Three documentation fixes (or rather two and one warning):
      
         - Sphinx 6.0 broke our configuration mechanism, so fix it
      
         - I broke our configuration for non-Alabaster themes; Akira fixed it
      
         - Deprecate Sphinx < 2.4 with an eye toward future removal"
      
      * tag 'docs-6.2-fixes' of git://git.lwn.net/linux:
        docs/conf.py: Use about.html only in sidebar of alabaster theme
        docs: Deprecate use of Sphinx < 2.4.x
        docs: Fix the docs build with Sphinx 6.0
    • Ard Biesheuvel
      efi: tpm: Avoid READ_ONCE() for accessing the event log · d3f45053
      Ard Biesheuvel authored
      Nathan reports that recent kernels built with LTO will crash when doing
      EFI boot using Fedora's GRUB and SHIM. The culprit turns out to be a
      misaligned load from the TPM event log, which is annotated with
      READ_ONCE(), and under LTO, this gets translated into a LDAR instruction
      which does not tolerate misaligned accesses.
      
      Interestingly, this does not happen when booting the same kernel
      straight from the UEFI shell, and so the fact that the event log may
      appear misaligned in memory may be caused by a bug in GRUB or SHIM.
      
      However, using READ_ONCE() to access firmware tables is slightly unusual
      in any case, and here, we only need to ensure that 'event' is not
      dereferenced again after it gets unmapped, but this is already taken
      care of by the implicit barrier() semantics of the early_memunmap()
      call.
      
      Cc: <stable@vger.kernel.org>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Jarkko Sakkinen <jarkko@kernel.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Reported-by: Nathan Chancellor <nathan@kernel.org>
      Tested-by: Nathan Chancellor <nathan@kernel.org>
      Link: https://github.com/ClangBuiltLinux/linux/issues/1782
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
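The TPM fix concerns a load that may be misaligned. In portable C, the usual way to read a value through a pointer of unknown alignment is memcpy() into a local, which lets the compiler choose an instruction sequence that is valid for any alignment. This is a general illustration, not what the patch does (the patch simply drops READ_ONCE()); load_u32_unaligned() is a hypothetical helper, and inside the kernel the get_unaligned() helpers serve this purpose.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: loads a 32-bit value from an address that need not
 * be 4-byte aligned. memcpy() into a local lets the compiler pick an access
 * that is valid for any alignment, unlike dereferencing a misaligned
 * (const uint32_t *) pointer, which is undefined behaviour in C. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof(v));
    return v;
}
```

Passing buf + 1 for a uint8_t buffer, for instance, reads the four bytes starting at an odd address without assuming anything about their alignment.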
    • Pavel Begunkov
      io_uring: lock overflowing for IOPOLL · 544d163d
      Pavel Begunkov authored
      syzbot reports an issue with overflow filling for IOPOLL:
      
      WARNING: CPU: 0 PID: 28 at io_uring/io_uring.c:734 io_cqring_event_overflow+0x1c0/0x230 io_uring/io_uring.c:734
      CPU: 0 PID: 28 Comm: kworker/u4:1 Not tainted 6.2.0-rc3-syzkaller-16369-g358a161a6a9e #0
      Workqueue: events_unbound io_ring_exit_work
      Call trace:
       io_cqring_event_overflow+0x1c0/0x230 io_uring/io_uring.c:734
       io_req_cqe_overflow+0x5c/0x70 io_uring/io_uring.c:773
       io_fill_cqe_req io_uring/io_uring.h:168 [inline]
       io_do_iopoll+0x474/0x62c io_uring/rw.c:1065
       io_iopoll_try_reap_events+0x6c/0x108 io_uring/io_uring.c:1513
       io_uring_try_cancel_requests+0x13c/0x258 io_uring/io_uring.c:3056
       io_ring_exit_work+0xec/0x390 io_uring/io_uring.c:2869
       process_one_work+0x2d8/0x504 kernel/workqueue.c:2289
       worker_thread+0x340/0x610 kernel/workqueue.c:2436
       kthread+0x12c/0x158 kernel/kthread.c:376
       ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:863
      
      There is no real problem for normal IOPOLL as flush is also called with
      uring_lock taken, but it's getting more complicated for IOPOLL|SQPOLL,
      for which __io_cqring_overflow_flush() happens from the CQ waiting path.
      
      Reported-and-tested-by: syzbot+6805087452d72929404e@syzkaller.appspotmail.com
      Cc: stable@vger.kernel.org # 5.10+
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
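The IOPOLL fix comes down to making every path that touches the overflow list hold the same lock, including the CQ-waiting flush path used with IOPOLL|SQPOLL. Below is a minimal model using a pthread mutex as a stand-in for io_uring's own locking; struct ovf_list, ovf_push() and ovf_flush() are hypothetical, and the model keeps only the list manipulation, not any CQ ring handling.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model, not io_uring code: completions that do not fit in the
 * CQ ring are parked on an overflow list touched by two paths, the one
 * posting completions and the one flushing them back to the ring. */
struct ovf_entry {
    uint64_t user_data;
    struct ovf_entry *next;
};

struct ovf_list {
    pthread_mutex_t lock;      /* protects head, tail and count */
    struct ovf_entry *head;
    struct ovf_entry **tail;
    unsigned count;
};

static void ovf_init(struct ovf_list *l)
{
    pthread_mutex_init(&l->lock, NULL);
    l->head = NULL;
    l->tail = &l->head;
    l->count = 0;
}

/* Posting side: appends under the lock. */
static bool ovf_push(struct ovf_list *l, uint64_t user_data)
{
    struct ovf_entry *e = malloc(sizeof(*e));

    if (e == NULL)
        return false;
    e->user_data = user_data;
    e->next = NULL;
    pthread_mutex_lock(&l->lock);
    *l->tail = e;
    l->tail = &e->next;
    l->count++;
    pthread_mutex_unlock(&l->lock);
    return true;
}

/* Flushing side: detaches the whole list under the same lock, then frees
 * the entries outside it. Returns the number of entries flushed. */
static unsigned ovf_flush(struct ovf_list *l)
{
    struct ovf_entry *e;
    unsigned n;

    pthread_mutex_lock(&l->lock);
    e = l->head;
    n = l->count;
    l->head = NULL;
    l->tail = &l->head;
    l->count = 0;
    pthread_mutex_unlock(&l->lock);

    while (e != NULL) {
        struct ovf_entry *next = e->next;
        free(e);
        e = next;
    }
    return n;
}
```

Detaching the list inside the critical section and freeing it afterwards keeps the locked region short; the property that matters for the bug is simply that ovf_push() and ovf_flush() never run on the list without the shared lock.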
    • Linus Torvalds
      Merge tag 'sound-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 689968db
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "This became a slightly big update, but it's more or less expected, as
        the first batch after holidays.
      
        All changes (but for the last two last-minute fixes) have been stewed
        in linux-next long enough, so it's fairly safe to take:
      
         - PCM UAF fix in 32bit compat layer
      
         - ASoC board-specific fixes for Intel, AMD, MediaTek, Qualcomm
      
         - SOF power management fixes
      
         - ASoC Intel link failure fixes
      
         - A series of fixes for USB-audio regressions
      
         - CS35L41 HD-audio codec regression fixes
      
         - HD-audio device-specific fixes / quirks
      
        Note that one SPI patch has been taken in ASoC subtree mistakenly, and
        the same fix is found in spi tree, but it should be OK to apply"
      
      * tag 'sound-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (39 commits)
        ALSA: pcm: Move rwsem lock inside snd_ctl_elem_read to prevent UAF
        ALSA: usb-audio: Fix possible NULL pointer dereference in snd_usb_pcm_has_fixed_rate()
        ALSA: hda/realtek: Enable mute/micmute LEDs on HP Spectre x360 13-aw0xxx
        ASoC: fsl-asoc-card: Fix naming of AC'97 CODEC widgets
        ASoC: fsl_ssi: Rename AC'97 streams to avoid collisions with AC'97 CODEC
        ALSA: hda/hdmi: Add a HP device 0x8715 to force connect list
        ALSA: control-led: use strscpy in set_led_id()
        ALSA: usb-audio: Always initialize fixed_rate in snd_usb_find_implicit_fb_sync_format()
        ASoC: dt-bindings: qcom,lpass-tx-macro: correct clocks on SC7280
        ASoC: dt-bindings: qcom,lpass-wsa-macro: correct clocks on SM8250
        ASoC: qcom: Fix building APQ8016 machine driver without SOUNDWIRE
        ALSA: hda: cs35l41: Check runtime suspend capability at runtime_idle
        ALSA: hda: cs35l41: Don't return -EINVAL from system suspend/resume
        ASoC: fsl_micfil: Correct the number of steps on SX controls
        ALSA: hda/realtek: fix mute/micmute LEDs don't work for a HP platform
        Revert "ALSA: usb-audio: Drop superfluous interface setup at parsing"
        ALSA: usb-audio: More refactoring of hw constraint rules
        ALSA: usb-audio: Relax hw constraints for implicit fb sync
        ALSA: usb-audio: Make sure to stop endpoints before closing EPs
        ALSA: hda - Enable headset mic on another Dell laptop with ALC3254
        ...
      689968db
    • Linus Torvalds
      Merge tag 'pm-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · d863f053
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "These fix assorted issues in the ARM cpufreq drivers and in the AMD
        P-state driver.
      
        Specifics:
      
         - Fix cpufreq policy reference counting in amd-pstate to prevent it
           from crashing on removal (Perry Yuan)
      
         - Fix double initialization and set suspend-freq for Apple's cpufreq
           driver (Arnd Bergmann, Hector Martin)
      
         - Fix reading of "reg" property, update cpufreq-dt's blocklist and
           update DT documentation for Qualcomm's cpufreq driver (Konrad
           Dybcio, Krzysztof Kozlowski)
      
         - Replace 0 with NULL in the Armada cpufreq driver (Miles Chen)
      
         - Fix potential overflows in the CPPC cpufreq driver (Pierre Gondois)
      
         - Update blocklist for the Tegra234 SoC cpufreq driver (Sumit Gupta)"
      
      * tag 'pm-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: amd-pstate: fix kernel hang issue while amd-pstate unregistering
        cpufreq: armada-37xx: stop using 0 as NULL pointer
        cpufreq: apple-soc: Switch to the lowest frequency on suspend
        dt-bindings: cpufreq: cpufreq-qcom-hw: document interrupts
        cpufreq: Add SM6375 to cpufreq-dt-platdev blocklist
        cpufreq: Add Tegra234 to cpufreq-dt-platdev blocklist
        cpufreq: qcom-hw: Fix reading "reg" with address/size-cells != 2
        cpufreq: CPPC: Add u64 casts to avoid overflowing
        cpufreq: apple: remove duplicate intializer
      d863f053
    • Linus Torvalds
      Merge tag 'acpi-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · cdbbca25
      Linus Torvalds authored
      Pull ACPI fixes from Rafael Wysocki:
       "These add one more ACPI IRQ override quirk, improve ACPI companion
        lookup for backlight devices and add missing kernel command line
        option values for backlight detection.
      
        Specifics:
      
         - Improve ACPI companion lookup for backlight devices in cases
           where there is more than one candidate ACPI device object (Hans
           de Goede)

         - Add missing support to the ACPI backlight driver for manually
           selecting the NVidia-WMI-EC or Apple GMUX backlight on the
           kernel command line (Hans de Goede)
      
         - Skip ACPI IRQ override on Asus Expertbook B2402CBA (Tamim Khan)"
      
      * tag 'acpi-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: Fix selecting wrong ACPI fwnode for the iGPU on some Dell laptops
        ACPI: video: Allow selecting NVidia-WMI-EC or Apple GMUX backlight from the cmdline
        ACPI: resource: Skip IRQ override on Asus Expertbook B2402CBA
      cdbbca25
    • Linus Torvalds
      Merge tag 'platform-drivers-x86-v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 · 0d0833e0
      Linus Torvalds authored
      Pull x86 platform driver fixes from Hans de Goede:
       "A set of assorted fixes and hardware-id additions"
      
      * tag 'platform-drivers-x86-v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
        platform/x86: thinkpad_acpi: Fix profile mode display in AMT mode
        platform/x86: int3472/discrete: Ensure the clk/power enable pins are in output mode
        platform/x86/amd: Fix refcount leak in amd_pmc_probe
        platform/x86: intel/pmc/core: Add Meteor Lake mobile support
        platform/x86: simatic-ipc: add another model
        platform/x86: simatic-ipc: correct name of a model
        platform/x86: dell-privacy: Only register SW_CAMERA_LENS_COVER if present
        platform/x86: dell-privacy: Fix SW_CAMERA_LENS_COVER reporting
        platform/x86: asus-wmi: Don't load fan curves without fan
        platform/x86: asus-wmi: Ignore fan on E410MA
        platform/x86: asus-wmi: Add quirk wmi_ignore_fan
        platform/x86: asus-nb-wmi: Add alternate mapping for KEY_SCREENLOCK
        platform/x86: asus-nb-wmi: Add alternate mapping for KEY_CAMERA
        platform/surface: aggregator: Add missing call to ssam_request_sync_free()
        platform/surface: aggregator: Ignore command messages not intended for us
        platform/x86: touchscreen_dmi: Add info for the CSL Panther Tab HD
        platform/x86: ideapad-laptop: Add Legion 5 15ARH05 DMI id to set_fn_lock_led_list[]
        platform/x86: sony-laptop: Don't turn off 0x153 keyboard backlight during probe
      0d0833e0
    • Linus Torvalds
      Merge tag 'drm-fixes-2023-01-13' of git://anongit.freedesktop.org/drm/drm · ff5ebafd
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "There is a bit of a post-holiday build up here I expect, small fixes
        across the board, with amdgpu and msm in the lead and the others
        having a few. There is also one code removal patch for nouveau.
      
        buddy:
         - benchmark regression fix for top-down buddy allocation
      
        panel:
         - add Lenovo panel orientation quirk
      
        ttm:
         - fix kernel oops regression
      
        amdgpu:
         - fix missing fence references
         - fix missing pipeline sync fencing
         - SMU13 fan speed fix
         - SMU13 fix power cap handling
         - SMU13 BACO fix
         - Fix a possible segfault in bo validation error case
         - Delay removal of firmware framebuffer
         - Fix error when unloading
      
        amdkfd:
         - SVM fix when clearing vram
         - GC11 fix for multi-GPU
      
        i915:
         - Reserve enough fence slot for i915_vma_unbind_async
         - Fix potential use after free
         - Reset engines twice in case of reset failure
         - Use multi-cast registers for SVG Unit registers
      
        msm:
         - display:
           - doc warning fixes
           - dt attribs cleanups
           - memory leak fix
           - error handling in hdmi probe fix
           - dp_aux_isr incorrect signalling fix
           - shutdown path fix
         - accel:
           - a5xx: fix quirks to be a bitmask
           - a6xx: fix gx halt to avoid 1s hang
           - kexec shutdown fix
           - fix potential double free
      
        vmwgfx:
         - drop rcu usage to make code more robust
      
        virtio:
         - fix use-after-free in gem handle code
      
        nouveau:
         - drop unused nouveau_fbcon.c"
      
      * tag 'drm-fixes-2023-01-13' of git://anongit.freedesktop.org/drm/drm: (35 commits)
        drm: Optimize drm buddy top-down allocation method
        drm/ttm: Fix a regression causing kernel oops'es
        drm/i915/gt: Cover rest of SVG unit MCR registers
        drm/nouveau: Remove file nouveau_fbcon.c
        drm/amdkfd: Fix NULL pointer error for GC 11.0.1 on mGPU
        drm/amd/pm/smu13: BACO is supported when it's in BACO state
        drm/amdkfd: Add sync after creating vram bo
        drm/i915/gt: Reset twice
        drm/amdgpu: fix pipeline sync v2
        drm/vmwgfx: Remove rcu locks from user resources
        drm/virtio: Fix GEM handle creation UAF
        drm/amdgpu: Fixed bug on error when unloading amdgpu
        drm/amd: Delay removal of the firmware framebuffer
        drm/amdgpu: Fix potential NULL dereference
        drm/i915: Fix potential context UAFs
        drm/i915: Reserve enough fence slot for i915_vma_unbind_async
        drm: Add orientation quirk for Lenovo ideapad D330-10IGL
        drm/msm/a6xx: Avoid gx gbit halt during rpm suspend
        drm/msm/adreno: Make adreno quirks not overwrite each other
        drm/msm: another fix for the headless Adreno GPU
        ...
      ff5ebafd
    • Clement Lecigne
      ALSA: pcm: Move rwsem lock inside snd_ctl_elem_read to prevent UAF · 56b88b50
      Clement Lecigne authored
      Take the rwsem lock inside snd_ctl_elem_read() instead of
      snd_ctl_elem_read_user(), as was done for write in commit 1fa4445f
      ("ALSA: control - introduce snd_ctl_notify_one() helper"). Doing it
      this way also fixes the following locking issue in the compat path,
      which can easily be triggered and turned into a use-after-free.
      
      64-bit:
      snd_ctl_ioctl
        snd_ctl_elem_read_user
          [takes controls_rwsem]
          snd_ctl_elem_read [lock properly held, all good]
          [drops controls_rwsem]
      
      32-bit:
      snd_ctl_ioctl_compat
        snd_ctl_elem_write_read_compat
          ctl_elem_write_read
            snd_ctl_elem_read [missing lock, not good]
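
      The locking change above can be illustrated with a toy model. This is
      a minimal sketch and not ALSA code. Every function name is
      hypothetical, and so is the rwsem_readers counter, which stands in
      for controls_rwsem. The model is single-threaded, so it only records
      whether the lock is held at the moment the shared control state is
      read.

```c
#include <stdbool.h>

/* Toy model (hypothetical): a reader count stands in for controls_rwsem. */
static int rwsem_readers;
static void down_read_model(void) { rwsem_readers++; }
static void up_read_model(void) { rwsem_readers--; }

static int control_value = 42;

/* Touches shared control state; reports whether the lock was held. */
static bool access_control(int *out)
{
    *out = control_value;
    return rwsem_readers > 0;
}

/* Before: the shared read function relies on its caller for locking. */
static bool old_elem_read(int *out) { return access_control(out); }

/* Native path took the lock around the call. */
static bool old_read_user(int *out)
{
    down_read_model();
    bool locked = old_elem_read(out);
    up_read_model();
    return locked;
}

/* Compat path called the shared function with no lock held. */
static bool old_compat_read(int *out) { return old_elem_read(out); }

/* After: the shared read function takes the lock itself. */
static bool new_elem_read(int *out)
{
    down_read_model();
    bool locked = access_control(out);
    up_read_model();
    return locked;
}

/* Both callers now simply delegate. */
static bool new_read_user(int *out) { return new_elem_read(out); }
static bool new_compat_read(int *out) { return new_elem_read(out); }
```

      Because the lock now lives in the shared callee, a caller such as the
      compat path can no longer reach the control state without it.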
      
      CVE-2023-0266 was assigned for this issue.
      
      Cc: stable@kernel.org # 5.13+
      Signed-off-by: Clement Lecigne <clecigne@google.com>
      Reviewed-by: Jaroslav Kysela <perex@perex.cz>
      Link: https://lore.kernel.org/r/20230113120745.25464-1-tiwai@suse.de
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      56b88b50
    • Linus Torvalds
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · d45b832d
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "Here's a sizeable batch of Friday the 13th arm64 fixes for -rc4. What
        could possibly go wrong?
      
        The obvious reason we have so much here is the holiday season right
        after the merge window, but we've also brought back an
        erratum workaround that was previously dropped at the last minute and
        there's an MTE coredumping fix that strays outside of the arch/arm64
        directory.
      
        Summary:
      
         - Fix PAGE_TABLE_CHECK failures on hugepage splitting path
      
         - Fix PSCI encoding of MEM_PROTECT_RANGE function in UAPI header
      
         - Fix NULL deref when accessing debugfs node if PSCI is not present
      
         - Fix MTE core dumping when VMA list is being updated concurrently
      
         - Fix SME signal frame handling when SVE is not implemented by the
           CPU
      
         - Fix asm constraints for cmpxchg_double() to hazard both words
      
         - Fix build failure with stack tracer and older versions of Clang
      
         - Bring back workaround for Cortex-A715 erratum 2645198"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: Fix build with CC=clang, CONFIG_FTRACE=y and CONFIG_STACK_TRACER=y
        arm64/mm: Define dummy pud_user_exec() when using 2-level page-table
        arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruption
        firmware/psci: Don't register with debugfs if PSCI isn't available
        firmware/psci: Fix MEM_PROTECT_RANGE function numbers
        arm64/signal: Always allocate SVE signal frames on SME only systems
        arm64/signal: Always accept SVE signal frames on SME only systems
        arm64/sme: Fix context switch for SME only systems
        arm64: cmpxchg_double*: hazard against entire exchange variable
        arm64/uprobes: change the uprobe_opcode_t typedef to fix the sparse warning
        arm64: mte: Avoid the racy walk of the vma list during core dump
        elfcore: Add a cprm parameter to elf_core_extra_{phdrs,data_size}
        arm64: mte: Fix double-freeing of the temporary tag storage during coredump
        arm64: ptrace: Use ARM64_SME to guard the SME register enumerations
        arm64/mm: add pud_user_exec() check in pud_user_accessible_page()
        arm64/mm: fix incorrect file_map_count for invalid pmd
      d45b832d
    • Christophe JAILLET
      iommu/mediatek-v1: Fix an error handling path in mtk_iommu_v1_probe() · 142e821f
      Christophe JAILLET authored
      A clk, prepared and enabled in mtk_iommu_v1_hw_init(), is not released in
      the error handling path of mtk_iommu_v1_probe().
      
      Add the corresponding clk_disable_unprepare(), as already done in the
      remove function.
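
      The fix follows the usual goto-based error-unwind pattern. This is a
      minimal sketch. The toy_clk counter is hypothetical and stands in for
      the real clk and clk_disable_unprepare(). toy_probe(), its failure
      flag and the label name are also illustrative only.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical clock: counts prepare+enable minus disable+unprepare. */
struct toy_clk { int enable_count; };

static void toy_clk_on(struct toy_clk *c) { c->enable_count++; }
static void toy_clk_off(struct toy_clk *c) { c->enable_count--; }

/* Toy probe: hw_init enables the clock, then a later step may fail. */
static int toy_probe(struct toy_clk *c, bool later_step_fails)
{
    int ret;

    toy_clk_on(c);             /* hw_init succeeded */

    if (later_step_fails) {
        ret = -ENODEV;         /* illustrative error code */
        goto out_clk_unprepare;
    }
    return 0;

out_clk_unprepare:
    toy_clk_off(c);            /* the fix: undo what hw_init acquired */
    return ret;
}
```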
      
      Fixes: b17336c5 ("iommu/mediatek: add support for mtk iommu generation one HW")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: Yong Wu <yong.wu@mediatek.com>
      Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
      Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
      Link: https://lore.kernel.org/r/593e7b7d97c6e064b29716b091a9d4fd122241fb.1671473163.git.christophe.jaillet@wanadoo.fr
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      142e821f
    • Yunfei Wang
      iommu/iova: Fix alloc iova overflows issue · dcdb3ba7
      Yunfei Wang authored
      In __alloc_and_insert_iova_range(), retry_pfn can overflow.
      iovad->anchor.pfn_hi is ~0UL, so when iovad->cached_node is
      iovad->anchor, curr_iova->pfn_hi + 1 wraps around to 0. As a result,
      if the retry logic runs, low_pfn is updated to 0, the check
      new_pfn < low_pfn evaluates to false, and the allocation succeeds
      even though it should fail.
      
      This issue occurs in the following two situations:
      1. The size of the first iova allocated exceeds the domain size.
      When the iova domain is initialized, iovad->cached_node is set to
      iovad->anchor. For example, with a 10M iova domain starting at
      0x1_F000_0000 (start_pfn 0x1f0000) and a first allocation of 11M,
      new->pfn_lo ends up below start_pfn, as the following log shows:

      [  223.798112][T1705487] sh: [name:iova&]__alloc_and_insert_iova_range
      start_pfn:0x1f0000,retry_pfn:0x0,size:0xb00,limit_pfn:0x1f0a00
      [  223.799590][T1705487] sh: [name:iova&]__alloc_and_insert_iova_range
      success start_pfn:0x1f0000,new->pfn_lo:0x1efe00,new->pfn_hi:0x1f08ff
      
      2. The node with the largest iova->pfn_lo value in the iova domain
      is deleted, so iovad->cached_node is updated to iovad->anchor, and
      a subsequent allocation is larger than the maximum iova size that
      can still be allocated in the domain.
      
      Fix the overflow by adding 1 to retry_pfn only after checking that
      retry_pfn is less than limit_pfn.
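
      The before/after arithmetic can be sketched in isolation. This is a
      minimal sketch reduced to the retry decision. old_retry() and
      new_retry() are hypothetical helpers. The rbtree walk and the other
      conditions in the real function are omitted.

```c
#include <stdbool.h>

/* Old logic: compute retry_pfn = pfn_hi + 1 before the bounds check. */
static bool old_retry(unsigned long cached_pfn_hi, unsigned long limit_pfn,
                      unsigned long *low_pfn)
{
    unsigned long retry_pfn = cached_pfn_hi + 1; /* wraps to 0 for ~0UL */

    if (retry_pfn < limit_pfn) {
        *low_pfn = retry_pfn;                    /* retry from here */
        return true;
    }
    return false;
}

/* Fixed logic: compare first, add 1 only once the check has passed. */
static bool new_retry(unsigned long cached_pfn_hi, unsigned long limit_pfn,
                      unsigned long *low_pfn)
{
    unsigned long retry_pfn = cached_pfn_hi;

    if (retry_pfn < limit_pfn) {
        *low_pfn = retry_pfn + 1;  /* cannot wrap: retry_pfn < limit_pfn */
        return true;
    }
    return false;
}
```

      The log above gives limit_pfn 0x1f0a00, and the anchor has pfn_hi
      ~0UL. With those values, the old logic retries with low_pfn 0. The
      later new_pfn < low_pfn check then can no longer reject 0x1efe00.
      The fixed logic declines to retry.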

      Signed-off-by: jianjiao zeng <jianjiao.zeng@mediatek.com>
      Signed-off-by: Yunfei Wang <yf.wang@mediatek.com>
      Cc: <stable@vger.kernel.org> # 5.15.*
      Fixes: 4e89dce7 ("iommu/iova: Retry from last rb tree node if iova search fails")
      Acked-by: Robin Murphy <robin.murphy@arm.com>
      Link: https://lore.kernel.org/r/20230111063801.25107-1-yf.wang@mediatek.com
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      dcdb3ba7
    • Miaoqian Lin
      iommu: Fix refcount leak in iommu_device_claim_dma_owner · a6a9a5da
      Miaoqian Lin authored
      iommu_group_get() returns the group with its reference count
      incremented, so returning early from the owner check afterwards
      leaked that reference. Move iommu_group_get() after the owner check
      to fix the refcount leak.
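
      The ordering problem can be shown with a toy refcount. This is a
      minimal sketch and not the IOMMU core code. struct toy_group,
      group_get() and group_put() are hypothetical stand-ins for struct
      iommu_group, iommu_group_get() and iommu_group_put(). The error codes
      are chosen for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical refcounted group. */
struct toy_group { int refcount; };

static struct toy_group *group_get(struct toy_group *g)
{
    if (g)
        g->refcount++;      /* caller now owns a reference */
    return g;
}

static void group_put(struct toy_group *g)
{
    if (g)
        g->refcount--;
}

/* Before: reference taken first; the owner check's early return leaks it. */
static int old_claim(struct toy_group *dev_group, void *owner)
{
    struct toy_group *group = group_get(dev_group);

    if (!group)
        return -ENODEV;
    if (!owner)
        return -EINVAL;     /* leak: group_put() is never called */
    /* ... claim DMA ownership ... */
    group_put(group);
    return 0;
}

/* After: check the owner first, then take the reference. */
static int new_claim(struct toy_group *dev_group, void *owner)
{
    struct toy_group *group;

    if (!owner)
        return -EINVAL;
    group = group_get(dev_group);
    if (!group)
        return -ENODEV;
    /* ... claim DMA ownership ... */
    group_put(group);
    return 0;
}
```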
      
      Fixes: 89395cce ("iommu: Add device-centric DMA ownership interfaces")
      Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
      Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Link: https://lore.kernel.org/r/20221230083100.1489569-1-linmq006@gmail.com
      [ joro: Remove *group = NULL initialization ]
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      a6a9a5da
    • Vladimir Oltean
      iommu/arm-smmu-v3: Don't unregister on shutdown · 32ea2c57
      Vladimir Oltean authored
      Similar to SMMUv2, this driver calls iommu_device_unregister() from the
      shutdown path, which removes the IOMMU groups with no coordination
      whatsoever with their users - shutdown methods are optional in device
      drivers. This can lead to NULL pointer dereferences in those drivers'
      DMA API calls, or worse.
      
      Instead of calling the full arm_smmu_device_remove() from
      arm_smmu_device_shutdown(), let's pick only the relevant function call -
      arm_smmu_device_disable() - more or less the reverse of
      arm_smmu_device_reset() - and call just that from the shutdown path.
      
      Fixes: 57365a04 ("iommu: Move bus setup to IOMMU device registration")
      Suggested-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20221215141251.3688780-2-vladimir.oltean@nxp.com
      Signed-off-by: Will Deacon <will@kernel.org>
      32ea2c57
    • Vladimir Oltean
      iommu/arm-smmu: Don't unregister on shutdown · ce31e6ca
      Vladimir Oltean authored
      Michael Walle says he noticed the following stack trace while performing
      a shutdown with "reboot -f". He suggests he got "lucky" and just hit the
      correct spot for the reboot while there was a packet transmission in
      flight.
      
      Unable to handle kernel NULL pointer dereference at virtual address 0000000000000098
      CPU: 0 PID: 23 Comm: kworker/0:1 Not tainted 6.1.0-rc5-00088-gf3600ff8e322 #1930
      Hardware name: Kontron KBox A-230-LS (DT)
      pc : iommu_get_dma_domain+0x14/0x20
      lr : iommu_dma_map_page+0x9c/0x254
      Call trace:
       iommu_get_dma_domain+0x14/0x20
       dma_map_page_attrs+0x1ec/0x250
       enetc_start_xmit+0x14c/0x10b0
       enetc_xmit+0x60/0xdc
       dev_hard_start_xmit+0xb8/0x210
       sch_direct_xmit+0x11c/0x420
       __dev_queue_xmit+0x354/0xb20
       ip6_finish_output2+0x280/0x5b0
       __ip6_finish_output+0x15c/0x270
       ip6_output+0x78/0x15c
       NF_HOOK.constprop.0+0x50/0xd0
       mld_sendpack+0x1bc/0x320
       mld_ifc_work+0x1d8/0x4dc
       process_one_work+0x1e8/0x460
       worker_thread+0x178/0x534
       kthread+0xe0/0xe4
       ret_from_fork+0x10/0x20
      Code: d503201f f9416800 d503233f d50323bf (f9404c00)
      ---[ end trace 0000000000000000 ]---
      Kernel panic - not syncing: Oops: Fatal exception in interrupt
      
      This appears to be reproducible when the board has a fixed IP address,
      is ping flooded from another host, and "reboot -f" is used.
      
      The following is one more manifestation of the issue:
      
      $ reboot -f
      kvm: exiting hardware virtualization
      cfg80211: failed to load regulatory.db
      arm-smmu 5000000.iommu: disabling translation
      sdhci-esdhc 2140000.mmc: Removing from iommu group 11
      sdhci-esdhc 2150000.mmc: Removing from iommu group 12
      fsl-edma 22c0000.dma-controller: Removing from iommu group 17
      dwc3 3100000.usb: Removing from iommu group 9
      dwc3 3110000.usb: Removing from iommu group 10
      ahci-qoriq 3200000.sata: Removing from iommu group 2
      fsl-qdma 8380000.dma-controller: Removing from iommu group 20
      platform f080000.display: Removing from iommu group 0
      etnaviv-gpu f0c0000.gpu: Removing from iommu group 1
      etnaviv etnaviv: Removing from iommu group 1
      caam_jr 8010000.jr: Removing from iommu group 13
      caam_jr 8020000.jr: Removing from iommu group 14
      caam_jr 8030000.jr: Removing from iommu group 15
      caam_jr 8040000.jr: Removing from iommu group 16
      fsl_enetc 0000:00:00.0: Removing from iommu group 4
      arm-smmu 5000000.iommu: Blocked unknown Stream ID 0x429; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
      arm-smmu 5000000.iommu:         GFSR 0x80000002, GFSYNR0 0x00000002, GFSYNR1 0x00000429, GFSYNR2 0x00000000
      fsl_enetc 0000:00:00.1: Removing from iommu group 5
      arm-smmu 5000000.iommu: Blocked unknown Stream ID 0x429; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
      arm-smmu 5000000.iommu:         GFSR 0x80000002, GFSYNR0 0x00000002, GFSYNR1 0x00000429, GFSYNR2 0x00000000
      arm-smmu 5000000.iommu: Blocked unknown Stream ID 0x429; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
      arm-smmu 5000000.iommu:         GFSR 0x80000002, GFSYNR0 0x00000000, GFSYNR1 0x00000429, GFSYNR2 0x00000000
      fsl_enetc 0000:00:00.2: Removing from iommu group 6
      fsl_enetc_mdio 0000:00:00.3: Removing from iommu group 8
      mscc_felix 0000:00:00.5: Removing from iommu group 3
      fsl_enetc 0000:00:00.6: Removing from iommu group 7
      pcieport 0001:00:00.0: Removing from iommu group 18
      arm-smmu 5000000.iommu: Blocked unknown Stream ID 0x429; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
      arm-smmu 5000000.iommu:         GFSR 0x00000002, GFSYNR0 0x00000000, GFSYNR1 0x00000429, GFSYNR2 0x00000000
      pcieport 0002:00:00.0: Removing from iommu group 19
      Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a8
      pc : iommu_get_dma_domain+0x14/0x20
      lr : iommu_dma_unmap_page+0x38/0xe0
      Call trace:
       iommu_get_dma_domain+0x14/0x20
       dma_unmap_page_attrs+0x38/0x1d0
       enetc_unmap_tx_buff.isra.0+0x6c/0x80
       enetc_poll+0x170/0x910
       __napi_poll+0x40/0x1e0
       net_rx_action+0x164/0x37c
       __do_softirq+0x128/0x368
       run_ksoftirqd+0x68/0x90
       smpboot_thread_fn+0x14c/0x190
      Code: d503201f f9416800 d503233f d50323bf (f9405400)
      ---[ end trace 0000000000000000 ]---
      Kernel panic - not syncing: Oops: Fatal exception in interrupt
      ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
      
      The problem seems to be that iommu_group_remove_device() is allowed to
      run with no coordination whatsoever with the shutdown procedure of the
      enetc PCI device. In fact, it almost seems as if it implies that the
      pci_driver :: shutdown() method is mandatory if DMA is used with an
      IOMMU, otherwise this is inevitable. That was never the case; shutdown
      methods are optional in device drivers.
      
      This is the call stack that leads to iommu_group_remove_device() during
      reboot:
      
      kernel_restart
      -> device_shutdown
         -> platform_shutdown
            -> arm_smmu_device_shutdown
               -> arm_smmu_device_remove
                  -> iommu_device_unregister
                     -> bus_for_each_dev
                        -> remove_iommu_group
                           -> iommu_release_device
                              -> iommu_group_remove_device
      
      I don't know much about the arm_smmu driver, but
      arm_smmu_device_shutdown() invoking arm_smmu_device_remove() looks
      suspicious, since it causes the IOMMU device to unregister and that's
      where everything starts to unravel. It forces all other devices which
      depend on IOMMU groups to also point their ->shutdown() to ->remove(),
      which will make reboot slower overall.
      
      Two commits are relevant to this behavior. The first is commit
      b06c076e ("Revert "iommu/arm-smmu: Make arm-smmu explicitly
      non-modular""), which made arm_smmu_device_shutdown() run exactly the
      same code as arm_smmu_device_remove(). Prior to that, there was no
      iommu_device_unregister() call in arm_smmu_device_shutdown(). However,
      that was benign until commit 57365a04 ("iommu: Move bus setup to
      IOMMU device registration"), which made iommu_device_unregister() call
      remove_iommu_group().
      
      Restore the old shutdown behavior by making remove() call shutdown(),
      while shutdown() no longer calls the remove()-specific bits.
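
      The resulting split between the two callbacks can be sketched as
      follows. This is a minimal sketch. struct toy_smmu and its two flags
      are hypothetical and only record which teardown steps ran. The real
      driver's remove() does more than shown here.

```c
#include <stdbool.h>

/* Hypothetical driver state. */
struct toy_smmu {
    bool translation_enabled;   /* hardware is translating */
    bool registered_with_core;  /* stands in for IOMMU core registration */
};

/* shutdown(): quiesce the hardware only; IOMMU groups stay in place, so
 * client drivers without a shutdown method keep working DMA ops. */
static void toy_shutdown(struct toy_smmu *s)
{
    s->translation_enabled = false;
}

/* remove(): the remove-specific bits (unregistering from the IOMMU core),
 * plus the shared hardware teardown via shutdown(). */
static void toy_remove(struct toy_smmu *s)
{
    s->registered_with_core = false;
    toy_shutdown(s);
}
```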
      
      Fixes: 57365a04 ("iommu: Move bus setup to IOMMU device registration")
      Reported-by: Michael Walle <michael@walle.cc>
      Tested-by: Michael Walle <michael@walle.cc> # on kontron-sl28
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20221215141251.3688780-1-vladimir.oltean@nxp.com
      Signed-off-by: Will Deacon <will@kernel.org>
      ce31e6ca
    • Robin Murphy
      iommu/arm-smmu: Report IOMMU_CAP_CACHE_COHERENCY even betterer · ac9c5e92
      Robin Murphy authored
      Although it's vanishingly unlikely that anyone would integrate an SMMU
      within a coherent interconnect without also making the pagetable walk
      interface coherent, the same effect happens if a coherent SMMU fails to
      advertise CTTW correctly. This turns out to be the case on some popular
      NXP SoCs, where VFIO started failing the IOMMU_CAP_CACHE_COHERENCY test,
      even though IOMMU_CACHE *was* previously achieving the desired effect
      anyway thanks to the underlying integration.
      
      While those SoCs stand to gain some more general benefits from a
      firmware update that overrides CTTW correctly in DT/ACPI, it's also
      easy to work around this in Linux to avoid imposing too much on
      affected users: since the upstream client devices *are* correctly
      marked as coherent, we can trivially infer that their paths through
      the SMMU are coherent as well.
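
      The inference above amounts to a simple fallback. This is a minimal
      sketch of that reasoning, not the patch itself. The struct and field
      names are hypothetical stand-ins for the SMMU's coherent-walk feature
      and the client device's DT/ACPI coherency attribute.

```c
#include <stdbool.h>

/* Hypothetical inputs. */
struct toy_smmu_caps { bool cttw_advertised; };  /* coherent table walk */
struct toy_master { bool dma_coherent; };        /* client marked coherent */

/* Report cache coherency if the SMMU advertises coherent walks or, as a
 * fallback, if the client device itself is described as coherent, since
 * its path through the SMMU must then be coherent too. */
static bool toy_capable_cache_coherency(const struct toy_smmu_caps *smmu,
                                        const struct toy_master *dev)
{
    return smmu->cttw_advertised || dev->dma_coherent;
}
```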

      Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Fixes: df198b37 ("iommu/arm-smmu: Report IOMMU_CAP_CACHE_COHERENCY better")
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/d6dc41952961e5c7b21acac08a8bf1eb0f69e124.1671123115.git.robin.murphy@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
      ac9c5e92