1. 01 Dec, 2022 5 commits
    • iommufd: Algorithms for PFN storage · 8d160cd4
      Jason Gunthorpe authored
      The iopt_pages represents a logical linear list of full PFNs held in
      different storage tiers. Each area points to a slice of exactly one
      iopt_pages, and each iopt_pages can have multiple areas and accesses.
      
      The three storage tiers are managed to meet these objectives:
      
       - If no iommu_domain or in-kernel access exists then minimal memory
         should be consumed by iommufd
       - If a page has been pinned then an iopt_pages will not pin it again
       - If an in-kernel access exists then the xarray must provide the backing
         storage to avoid allocations on domain removals
       - Otherwise any iommu_domain will be used for storage
      
      In a common configuration with only an iommu_domain the iopt_pages does
      not allocate significant memory itself.
      
      The external interface for pages has several logical operations:
      
        iopt_area_fill_domain() will load the PFNs from storage into a single
        domain. This is used when attaching a new domain to an existing IOAS.
      
        iopt_area_fill_domains() will load the PFNs from storage into multiple
        domains. This is used when creating a new IOVA map in an existing IOAS.
      
        iopt_pages_add_access() creates an iopt_pages_access that tracks an
        in-kernel access of PFNs. This is some external driver that might be
        accessing the IOVA using the CPU, or programming PFNs with the DMA
        API, e.g. a VFIO mdev.
      
        iopt_pages_rw_access() directly performs a memcpy on the PFNs, without
        the overhead of iopt_pages_add_access().
      
        iopt_pages_fill_xarray() will load PFNs into the xarray and return a
        'struct page *' array. It is used by iopt_pages_access's to extract PFNs
        for in-kernel use. iopt_pages_fill_from_xarray() is a fast path when it
        is known the xarray is already filled.
      
      As an iopt_pages can be referred to in slices by many areas and accesses
      it uses interval trees to keep track of which storage tiers currently hold
      the PFNs. On a page-by-page basis any request for a PFN will be satisfied
      from one of the storage tiers and the PFN copied to target domain/array.
      
      Unfill actions are similar: on a page-by-page basis domains are unmapped,
      xarray entries freed or struct pages fully put back.
      
      Significant complexity is required to fully optimize all of these data
      motions. The implementation calculates the largest consecutive range of
      same-storage indexes and operates in blocks. The accumulation of PFNs
      always generates the largest contiguous PFN range possible to optimize and
      this gathering can cross storage tier boundaries. For cases like 'fill
      domains' care is taken to avoid duplicated work and PFNs are read once and
      pushed into all domains.
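
      The "largest consecutive range of same-storage indexes" idea can be
      illustrated with a minimal userspace sketch. It is not the kernel
      implementation: the enum pfn_tier values and the functions
      same_tier_run() and count_blocks() are hypothetical names invented for
      this example, and a flat array stands in for the interval trees the
      commit describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical storage tiers, named after the commit text. */
enum pfn_tier { TIER_XARRAY, TIER_DOMAIN, TIER_USER };

/*
 * Length of the run of page indexes beginning at 'start' that are all held
 * in the same tier as tiers[start]. Returns 0 when 'start' is past the end.
 */
size_t same_tier_run(const enum pfn_tier *tiers, size_t npages, size_t start)
{
    size_t end;

    assert(tiers != NULL);
    if (start >= npages)
        return 0;
    end = start + 1;
    while (end < npages && tiers[end] == tiers[start])
        end++;
    return end - start;
}

/*
 * Walk the whole range block by block and return how many block operations
 * are needed, one per run of same-tier indexes.
 */
size_t count_blocks(const enum pfn_tier *tiers, size_t npages)
{
    size_t index = 0, blocks = 0;

    while (index < npages) {
        index += same_tier_run(tiers, npages, index);
        blocks++;
    }
    return blocks;
}
```

      In this sketch a range whose pages sit in three alternating tiers is
      handled in three block operations rather than one operation per page.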
      
      The map/unmap interaction with the iommu_domain always works in contiguous
      PFN blocks. The implementation does not require or benefit from any
      split/merge optimization in the iommu_domain driver.
      
      This design suggests several possible improvements in the IOMMU API that
      would greatly help performance, particularly a way for the driver to map
      and read the pfns lists instead of working with one driver call per page
      to read, and one driver call per contiguous range to store.
      
      Link: https://lore.kernel.org/r/9-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
      Reviewed-by: Kevin Tian <kevin.tian@intel.com>
      Tested-by: Nicolin Chen <nicolinc@nvidia.com>
      Tested-by: Yi Liu <yi.l.liu@intel.com>
      Tested-by: Lixiao Yang <lixiao.yang@intel.com>
      Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • iommufd: PFN handling for iopt_pages · f394576e
      Jason Gunthorpe authored
      The top of the data structure provides an IO Address Space (IOAS) that is
      similar to a VFIO container. The IOAS allows map/unmap of memory into
      ranges of IOVA called iopt_areas. Multiple IOMMU domains (IO page tables)
      and in-kernel accesses (like VFIO mdevs) can be attached to the IOAS to
      access the PFNs that those IOVA areas cover.
      
      The IO Address Space (IOAS) datastructure is composed of:
       - struct io_pagetable holding the IOVA map
       - struct iopt_areas representing populated portions of IOVA
       - struct iopt_pages representing the storage of PFNs
       - struct iommu_domain representing each IO page table in the system IOMMU
       - struct iopt_pages_access representing in-kernel accesses of PFNs (ie
         VFIO mdevs)
       - struct xarray pinned_pfns holding a list of pages pinned by in-kernel
         accesses
      
      This patch introduces the lowest part of the datastructure - the movement
      of PFNs in a tiered storage scheme:
       1) iopt_pages::pinned_pfns xarray
       2) Multiple iommu_domains
       3) The origin of the PFNs, i.e. the userspace pointer
      
      PFNs have to be copied between all combinations of tiers, depending on the
      configuration.
      
      The interface is an iterator called a 'pfn_reader' which determines the
      tier in which each PFN is stored and loads it into a list of PFNs held in
      a struct pfn_batch.
      
      Each step of the iterator will fill up the pfn_batch, then the caller can
      use the pfn_batch to send the PFNs to the required destination. Repeating
      this loop will read all the PFNs in an IOVA range.
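
      The fill-then-drain loop described above can be sketched in plain C.
      This is a simplified illustration under stated assumptions, not the
      kernel's pfn_reader or pfn_batch: struct pfn_run, struct
      pfn_batch_sketch, batch_add_pfn() and read_all_pfns() are hypothetical,
      the batch capacity of four runs is arbitrary, and a flat array of PFNs
      stands in for the storage tiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: one run of physically contiguous PFNs. */
struct pfn_run {
    unsigned long first_pfn;
    unsigned long npfns;
};

/* Hypothetical, fixed-capacity batch of contiguous runs. */
struct pfn_batch_sketch {
    struct pfn_run runs[4];
    size_t nruns;
};

/*
 * Add one PFN, extending the last run when the PFN is contiguous with it.
 * Returns false, without consuming the PFN, when the batch is full.
 */
bool batch_add_pfn(struct pfn_batch_sketch *batch, unsigned long pfn)
{
    assert(batch != NULL);
    if (batch->nruns) {
        struct pfn_run *last = &batch->runs[batch->nruns - 1];

        if (last->first_pfn + last->npfns == pfn) {
            last->npfns++;
            return true;
        }
    }
    if (batch->nruns == sizeof(batch->runs) / sizeof(batch->runs[0]))
        return false;
    batch->runs[batch->nruns].first_pfn = pfn;
    batch->runs[batch->nruns].npfns = 1;
    batch->nruns++;
    return true;
}

/*
 * Read every PFN in the range. Each outer iteration is one iterator step:
 * fill the batch, then hand each contiguous run to 'emit' (which may be
 * NULL). Returns the number of runs produced.
 */
size_t read_all_pfns(const unsigned long *pfns, size_t npfns,
                     void (*emit)(const struct pfn_run *run))
{
    struct pfn_batch_sketch batch = { .nruns = 0 };
    size_t i = 0, emitted = 0;

    while (i < npfns) {
        batch.nruns = 0;
        while (i < npfns && batch_add_pfn(&batch, pfns[i]))
            i++;
        for (size_t r = 0; r < batch.nruns; r++) {
            if (emit)
                emit(&batch.runs[r]);
            emitted++;
        }
    }
    return emitted;
}
```

      In this sketch the 'emit' callback plays the role of the destination,
      for example a map call on one contiguous range. Runs are not merged
      across batch boundaries here, which keeps the example short.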
      
      The pfn_reader and pfn_batch also keep track of the pinned page accounting.
      
      While PFNs are always stored and accessed as full PAGE_SIZE units the
      iommu_domain tier can store with a sub-page offset/length to support
      IOMMUs with a smaller IOPTE size than PAGE_SIZE.
      
      Link: https://lore.kernel.org/r/8-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
      Reviewed-by: Kevin Tian <kevin.tian@intel.com>
      Tested-by: Nicolin Chen <nicolinc@nvidia.com>
      Tested-by: Yi Liu <yi.l.liu@intel.com>
      Tested-by: Lixiao Yang <lixiao.yang@intel.com>
      Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • kernel/user: Allow user_struct::locked_vm to be usable for iommufd · ce5a23c8
      Jason Gunthorpe authored
      Following the pattern of io_uring, perf, skb, and bpf, iommufd will use
      user->locked_vm for accounting pinned pages. Ensure the value is included
      in the struct and export free_uid() as iommufd is modular.
      
      user->locked_vm is the right accounting to use for ulimit because it is
      per-user, and the security sandboxing of locked pages is not supposed to
      be per-process. Other places (vfio, vdpa and infiniband) have used
      mm->pinned_vm and/or mm->locked_vm for accounting pinned pages, but this
      is only per-process and inconsistent with the new FOLL_LONGTERM users in
      the kernel.
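
      Per-user accounting of pinned pages against a limit can be sketched
      with C11 atomics. This is a userspace illustration only: struct
      user_acct, charge_locked_vm() and uncharge_locked_vm() are hypothetical
      names, and the limit is passed in explicitly instead of being read from
      a resource limit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-user accounting state. */
struct user_acct {
    atomic_ulong locked_vm; /* pages currently charged to this user */
};

/*
 * Try to charge 'npages' to the user without exceeding 'limit_pages'.
 * A compare-and-exchange loop makes the check and the update one atomic
 * step, so concurrent callers cannot jointly overshoot the limit.
 */
bool charge_locked_vm(struct user_acct *user, unsigned long npages,
                      unsigned long limit_pages)
{
    unsigned long cur, next;

    assert(user != NULL);
    cur = atomic_load(&user->locked_vm);
    do {
        next = cur + npages;
        if (next < cur || next > limit_pages) /* overflow or over limit */
            return false;
    } while (!atomic_compare_exchange_weak(&user->locked_vm, &cur, next));
    return true;
}

/* Release a previous charge of 'npages'. */
void uncharge_locked_vm(struct user_acct *user, unsigned long npages)
{
    assert(user != NULL);
    atomic_fetch_sub(&user->locked_vm, npages);
}
```

      Because the counter belongs to the user rather than to a process, two
      processes of the same user draw from the same budget in this sketch.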
      
      Concurrent work is underway to try to put this in a cgroup, so everything
      can be consistent and the kernel can provide a FOLL_LONGTERM limit that
      actually provides security.
      
      Link: https://lore.kernel.org/r/7-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
      Reviewed-by: Kevin Tian <kevin.tian@intel.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Tested-by: Nicolin Chen <nicolinc@nvidia.com>
      Tested-by: Yi Liu <yi.l.liu@intel.com>
      Tested-by: Lixiao Yang <lixiao.yang@intel.com>
      Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • iommufd: File descriptor, context, kconfig and makefiles · 2ff4bed7
      Jason Gunthorpe authored
      This is the basic infrastructure of a new miscdevice to hold the iommufd
      IOCTL API.
      
      It provides:
       - A miscdevice to create file descriptors to run the IOCTL interface over
      
       - A table based ioctl dispatch and centralized extendable pre-validation
         step
      
       - An xarray mapping userspace IDs to kernel objects. The design has
         multiple inter-related objects held within a single IOMMUFD fd
      
       - A simple usage count to build a graph of object relations and protect
         against hostile userspace racing ioctls
      
      The only IOCTL provided in this patch is the generic 'destroy any object
      by handle' operation.
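
      A table-based dispatch with one shared pre-validation step can be
      sketched as follows. This is an illustration under stated assumptions,
      not the iommufd code: the command numbers, struct destroy_args, struct
      ioctl_op, do_destroy() and dispatch_ioctl() are all hypothetical, and
      the only command modelled is a "destroy by handle" stub.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical command numbering, for illustration only. */
enum { CMD_BASE = 0x80, CMD_DESTROY = CMD_BASE };

/* Hypothetical argument struct for the destroy command. */
struct destroy_args {
    unsigned int size;
    unsigned int id;
};

/* One table entry per command: its minimum argument size and handler. */
struct ioctl_op {
    size_t min_size;
    int (*execute)(void *args);
};

/* Stub handler: treats id 0 as "no such object". */
int do_destroy(void *args)
{
    struct destroy_args *cmd = args;

    return cmd->id ? 0 : -ENOENT;
}

static const struct ioctl_op ioctl_ops[] = {
    [CMD_DESTROY - CMD_BASE] = { sizeof(struct destroy_args), do_destroy },
};

/*
 * Centralized pre-validation: range-check the command number, make sure a
 * handler exists and that the caller supplied enough bytes, then dispatch.
 */
int dispatch_ioctl(unsigned int nr, void *args, size_t user_size)
{
    const struct ioctl_op *op;

    if (nr < CMD_BASE ||
        nr - CMD_BASE >= sizeof(ioctl_ops) / sizeof(ioctl_ops[0]))
        return -ENOTTY;
    op = &ioctl_ops[nr - CMD_BASE];
    if (!op->execute)
        return -ENOTTY;
    assert(op->min_size > 0);
    if (!args || user_size < op->min_size)
        return -EINVAL;
    return op->execute(args);
}
```

      Keeping the checks in dispatch_ioctl() means a new command only adds a
      table entry and a handler in this sketch; the validation is not
      repeated per command.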
      
      Link: https://lore.kernel.org/r/6-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
      Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
      Reviewed-by: Kevin Tian <kevin.tian@intel.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Tested-by: Nicolin Chen <nicolinc@nvidia.com>
      Tested-by: Yi Liu <yi.l.liu@intel.com>
      Tested-by: Lixiao Yang <lixiao.yang@intel.com>
      Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
      Signed-off-by: Yi Liu <yi.l.liu@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • iommufd: Document overview of iommufd · 658234de
      Kevin Tian authored
      Add iommufd into the documentation tree, and supply initial documentation.
      Much of this is linked from code comments by kdoc.
      
      Link: https://lore.kernel.org/r/5-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
      Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Kevin Tian <kevin.tian@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
  2. 29 Nov, 2022 4 commits
  3. 03 Nov, 2022 14 commits
  4. 01 Nov, 2022 5 commits
  5. 30 Oct, 2022 12 commits
    • Linux 6.1-rc3 · 30a0b95b
      Linus Torvalds authored
    • Merge tag 'fbdev-for-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev · b72018ab
      Linus Torvalds authored
      Pull fbdev fixes from Helge Deller:
       "A use-after-free bugfix in the smscufx driver and various minor error
        path fixes, smaller build fixes, sysfs fixes and typos in comments in
        the stifb, sisfb, da8xxfb, xilinxfb, sm501fb, gbefb and cyber2000fb
        drivers"
      
      * tag 'fbdev-for-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
        fbdev: cyber2000fb: fix missing pci_disable_device()
        fbdev: sisfb: use explicitly signed char
        fbdev: smscufx: Fix several use-after-free bugs
        fbdev: xilinxfb: Make xilinxfb_release() return void
        fbdev: sisfb: fix repeated word in comment
        fbdev: gbefb: Convert sysfs snprintf to sysfs_emit
        fbdev: sm501fb: Convert sysfs snprintf to sysfs_emit
        fbdev: stifb: Fall back to cfb_fillrect() on 32-bit HCRX cards
        fbdev: da8xx-fb: Fix error handling in .remove()
        fbdev: MIPS supports iomem addresses
    • Merge tag 'char-misc-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 9f127546
      Linus Torvalds authored
      Pull char/misc fixes from Greg KH:
       "Some small driver fixes for 6.1-rc3.  They include:
      
         - iio driver bugfixes
      
         - counter driver bugfixes
      
         - coresight bugfixes, including a revert and then a second fix to get
           it right.
      
        All of these have been in linux-next with no reported problems"
      
      * tag 'char-misc-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
        misc: sgi-gru: use explicitly signed char
        coresight: cti: Fix hang in cti_disable_hw()
        Revert "coresight: cti: Fix hang in cti_disable_hw()"
        counter: 104-quad-8: Fix race getting function mode and direction
        counter: microchip-tcb-capture: Handle Signal1 read and Synapse
        coresight: cti: Fix hang in cti_disable_hw()
        coresight: Fix possible deadlock with lock dependency
        counter: ti-ecap-capture: fix IS_ERR() vs NULL check
        counter: Reduce DEFINE_COUNTER_ARRAY_POLARITY() to defining counter_array
        iio: bmc150-accel-core: Fix unsafe buffer attributes
        iio: adxl367: Fix unsafe buffer attributes
        iio: adxl372: Fix unsafe buffer attributes
        iio: at91-sama5d2_adc: Fix unsafe buffer attributes
        iio: temperature: ltc2983: allocate iio channels once
        tools: iio: iio_utils: fix digit calculation
        iio: adc: stm32-adc: fix channel sampling time init
        iio: adc: mcp3911: mask out device ID in debug prints
        iio: adc: mcp3911: use correct id bits
        iio: adc: mcp3911: return proper error code on failure to allocate trigger
        iio: adc: mcp3911: fix sizeof() vs ARRAY_SIZE() bug
        ...
    • Merge tag 'usb-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · c4d25ce6
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "A few small USB fixes for 6.1-rc3. Included in here are:
      
         - MAINTAINERS update, including a big one for the USB gadget
           subsystem. Many thanks to Felipe for all of the years of hard work
           he has done on this codebase, it was greatly appreciated.
      
         - dwc3 driver fixes for reported problems.
      
         - xhci driver fixes for reported problems.
      
         - typec driver fixes for minor issues
      
         - uvc gadget driver change, and then revert as it wasn't relevant for
           6.1-final, as it is a new feature and people are still reviewing
           and modifying it.
      
        All of these have been in the linux-next tree with no reported issues"
      
      * tag 'usb-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: dwc3: gadget: Don't set IMI for no_interrupt
        usb: dwc3: gadget: Stop processing more requests on IMI
        Revert "usb: gadget: uvc: limit isoc_sg to super speed gadgets"
        xhci: Remove device endpoints from bandwidth list when freeing the device
        xhci-pci: Set runtime PM as default policy on all xHC 1.2 or later devices
        xhci: Add quirk to reset host back to default state at shutdown
        usb: xhci: add XHCI_SPURIOUS_SUCCESS to ASM1042 despite being a V0.96 controller
        usb: dwc3: st: Rely on child's compatible instead of name
        usb: gadget: uvc: limit isoc_sg to super speed gadgets
        usb: bdc: change state when port disconnected
        usb: typec: ucsi: acpi: Implement resume callback
        usb: typec: ucsi: Check the connection on resume
        usb: gadget: aspeed: Fix probe regression
        usb: gadget: uvc: fix sg handling during video encode
        usb: gadget: uvc: fix sg handling in error case
        usb: gadget: uvc: fix dropped frame after missed isoc
        usb: dwc3: gadget: Don't delay End Transfer on delayed_status
        usb: dwc3: Don't switch OTG -> peripheral if extcon is present
        MAINTAINERS: Update maintainers for broadcom USB
        MAINTAINERS: move USB gadget and phy entries under the main USB entry
    • Merge tag 'gpio-fixes-for-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · ef3c0949
      Linus Torvalds authored
      Pull gpio fixes from Bartosz Golaszewski:
      
       - convert gpio-tegra to using an immutable irqchip
      
       - MAINTAINERS update
      
      * tag 'gpio-fixes-for-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        MAINTAINERS: Change myself to a maintainer
        gpio: tegra: Convert to immutable irq chip
    • Merge tag 'perf_urgent_for_v6.1_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 43476605
      Linus Torvalds authored
      Pull perf fixes from Borislav Petkov:
      
       - Rename a perf memory level event define to denote it is of CXL type
      
       - Add Alder and Raptor Lakes support to RAPL
      
       - Make sure raw sample data is output with tracepoints
      
      * tag 'perf_urgent_for_v6.1_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to PERF_MEM_LVLNUM_CXL
        perf/x86/rapl: Add support for Intel Raptor Lake
        perf/x86/rapl: Add support for Intel AlderLake-N
        perf: Fix missing raw data on tracepoint events
    • Merge tag 'loongarch-fixes-6.1-1' of... · c96bb958
      Linus Torvalds authored
      Merge tag 'loongarch-fixes-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
      
      Pull LoongArch fixes from Huacai Chen:
       "Remove unused kernel stack padding, fix some build errors/warnings and
        two bugs in laptop platform driver"
      
      * tag 'loongarch-fixes-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
        platform/loongarch: laptop: Fix possible UAF and simplify generic_acpi_laptop_init()
        platform/loongarch: laptop: Adjust resume order for loongson_hotkey_resume()
        LoongArch: BPF: Avoid declare variables in switch-case
        LoongArch: Use flexible-array member instead of zero-length array
        LoongArch: Remove unused kernel stack padding
    • Merge tag '6.1-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 28b7bd4a
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
      
       - use after free fix for reconnect race
      
       - two memory leak fixes
      
      * tag '6.1-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: fix use-after-free caused by invalid pointer `hostname`
        cifs: Fix pages leak when writedata alloc failed in cifs_write_from_iter()
        cifs: Fix pages array leak when writedata alloc failed in cifs_writedata_alloc()
    • Merge tag 'random-6.1-rc3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random · 882ad2a2
      Linus Torvalds authored
      Pull random number generator fix from Jason Donenfeld:
       "One fix from Jean-Philippe Brucker, addressing a regression in which
        early boot code on ARM64 would use the non-_early variant of the
        arch_get_random family of functions, resulting in the architectural
        random number generator appearing unavailable during that early phase
        of boot.
      
        The fix simply changes arch_get_random*() to arch_get_random*_early().
      
        This distinction between these two functions is a bit of an old wart
        I'm not a fan of, and for 6.2 I'll see if I can make obsolete the
        _early variant, so that one function does the right thing in all
        contexts without overhead"
      
      * tag 'random-6.1-rc3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
        random: use arch_get_random*_early() in random_init()
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 83633ed7
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Various small fixes, all in drivers.
      
        Some of these arrived during the merge window and got held over to
        make sure of testing on the -rc tree.
      
        The biggest change is for standards conformance in the target driver,
        closely followed by a set of bug fixes in megaraid_sas"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (21 commits)
        scsi: ufs: core: Fix typo in comment
        scsi: mpi3mr: Select CONFIG_SCSI_SAS_ATTRS
        scsi: ufs: core: Fix typo for register name in comments
        scsi: pm80xx: Display proc_name in sysfs
        scsi: ufs: core: Fix the error log in ufshcd_query_flag_retry()
        scsi: ufs: core: Remove unneeded casts from void *
        scsi: lpfc: Fix spelling mistake "unsolicted" -> "unsolicited"
        scsi: qla2xxx: Use transport-defined speed mask for supported_speeds
        scsi: target: iblock: Fold iblock_emulate_read_cap_with_block_size() into iblock_get_blocks()
        scsi: qla2xxx: Fix serialization of DCBX TLV data request
        scsi: ufs: qcom: Remove redundant dev_err() call
        scsi: megaraid_sas: Move megasas_dbg_lvl init to megasas_init()
        scsi: megaraid_sas: Remove unnecessary memset()
        scsi: megaraid_sas: Simplify megasas_update_device_list
        scsi: megaraid_sas: Correct an error message
        scsi: megaraid_sas: Correct value passed to scsi_device_lookup()
        scsi: target: core: UA on all LUNs after reset
        scsi: target: core: New key must be used for moved PR
        scsi: target: core: Abort all preempted regs if requested
        scsi: target: core: Fix memory leak in preempt_and_abort
        ...
    • Merge tag 'block-6.1-2022-10-28' of git://git.kernel.dk/linux · c6e0e874
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - NVMe pull request via Christoph:
            - make the multipath dma alignment match the non-multipath one
              (Keith Busch)
            - fix a bogus use of sg_init_marker() (Nam Cao)
            - fix circular locking in nvme-tcp (Sagi Grimberg)
      
       - Initialization fix for requests allocated via the special hw queue
         allocator (John)
      
       - Fix for a regression added in this release with the batched
         completions of end_io backed requests (Ming)
      
       - Error handling leak fix for rbd (Yang)
      
       - Error handling leak fix for add_disk() failure (Yu)
      
      * tag 'block-6.1-2022-10-28' of git://git.kernel.dk/linux:
        blk-mq: Properly init requests from blk_mq_alloc_request_hctx()
        blk-mq: don't add non-pt request with ->end_io to batch
        rbd: fix possible memory leak in rbd_sysfs_init()
        nvme-multipath: set queue dma alignment to 3
        nvme-tcp: fix possible circular locking when deleting a controller under memory pressure
        nvme-tcp: replace sg_init_marker() with sg_init_table()
        block: fix memory leak for elevator on add_disk failure
    • Merge tag 'io_uring-6.1-2022-10-28' of git://git.kernel.dk/linux · 4d244327
      Linus Torvalds authored
      Pull io_uring fix from Jens Axboe:
       "Just a fix for a locking regression introduced with the deferred
        task_work running from this merge window"
      
      * tag 'io_uring-6.1-2022-10-28' of git://git.kernel.dk/linux:
        io_uring: unlock if __io_run_local_work locked inside
        io_uring: use io_run_local_work_locked helper