1. 23 Nov, 2019 4 commits
    • RDMA/odp: Use mmu_interval_notifier_insert() · f25a546e
      Jason Gunthorpe authored
      Replace the internal interval tree based mmu notifier with the new common
      mmu_interval_notifier_insert() API. This removes a lot of code and fixes a
      deadlock that can be triggered in ODP:
      
       zap_page_range()
        mmu_notifier_invalidate_range_start()
         [..]
          ib_umem_notifier_invalidate_range_start()
             down_read(&per_mm->umem_rwsem)
        unmap_single_vma()
          [..]
            __split_huge_page_pmd()
              mmu_notifier_invalidate_range_start()
              [..]
                 ib_umem_notifier_invalidate_range_start()
                    down_read(&per_mm->umem_rwsem)   // DEADLOCK
      
              mmu_notifier_invalidate_range_end()
                 up_read(&per_mm->umem_rwsem)
        mmu_notifier_invalidate_range_end()
           up_read(&per_mm->umem_rwsem)
      
      The umem_rwsem is held across the range_start/end as the ODP algorithm for
      invalidate_range_end cannot tolerate changes to the interval
      tree. However, due to the nested invalidation regions the second
      down_read() can deadlock if there are competing writers. The new core code
      provides an alternative scheme to solve this problem.
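      A fair read/write lock does not admit a new reader while a writer is
      already queued, because otherwise a steady stream of readers could
      starve the writer. That admission rule is what turns the nested
      down_read() in the trace into a deadlock. The C sketch below is a
      single-threaded model of the rule, assuming a simplified lock with only
      a reader count and a writer queue. The names (struct model_rwsem,
      model_down_read_try() and so on) are hypothetical, and this is not the
      kernel's rw_semaphore code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified read/write lock model (not the kernel's
 * rw_semaphore): new readers are refused while a writer is queued. */
struct model_rwsem {
	int readers;         /* current read holders */
	bool writer_active;  /* a writer holds the lock */
	int writers_waiting; /* writers queued behind the readers */
};

/* Returns true if a read hold is granted; false means a real
 * down_read() would sleep at this point. */
static bool model_down_read_try(struct model_rwsem *s)
{
	if (s->writer_active || s->writers_waiting > 0)
		return false;
	s->readers++;
	return true;
}

/* A writer takes the lock if it is free, otherwise it queues. */
static bool model_down_write_try(struct model_rwsem *s)
{
	if (s->readers == 0 && !s->writer_active) {
		s->writer_active = true;
		return true;
	}
	s->writers_waiting++;
	return false;
}

/* Replays the interleaving from the trace. Returns true when the nested
 * read is refused while the same thread still owns the outer read hold,
 * so the thread and the queued writer wait on each other. */
static bool model_nested_read_deadlocks(void)
{
	struct model_rwsem sem = {0};

	bool outer = model_down_read_try(&sem);   /* outer range_start */
	bool writer = model_down_write_try(&sem); /* competing writer queues */
	bool nested = model_down_read_try(&sem);  /* nested range_start */

	assert(outer && !writer);
	return !nested && sem.readers == 1 && sem.writers_waiting == 1;
}
```

      Without a queued writer the nested read would be granted, which is why
      the hang only shows up under contention.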
      
      Fixes: ca748c39 ("RDMA/umem: Get rid of per_mm->notifier_count")
      Link: https://lore.kernel.org/r/20191112202231.3856-6-jgg@ziepe.ca
      Tested-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • mm/hmm: define the pre-processor related parts of hmm.h even if disabled · 107e8998
      Jason Gunthorpe authored
      Only the function calls are stubbed out with static inlines that always
      fail. This is the standard way to write a header for an optional component
      and makes it easier for drivers that only optionally need HMM_MIRROR.
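      A minimal sketch of this header pattern, assuming a hypothetical
      optional component guarded by CONFIG_FOO_MIRROR (the names are
      illustrative, not the hmm.h API): types and flags are always defined,
      and only the function is replaced by a static inline stub that fails
      with -EOPNOTSUPP when the component is compiled out.

```c
#include <assert.h>
#include <errno.h>

/* Always defined, whether or not the component is built in. */
#define FOO_PFN_VALID (1UL << 0)

struct foo_range {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

#ifdef CONFIG_FOO_MIRROR
/* Provided by the optional component when it is enabled. */
long foo_range_fault(struct foo_range *range);
#else
/* Component compiled out: the call still builds but always fails. */
static inline long foo_range_fault(struct foo_range *range)
{
	(void)range;
	return -EOPNOTSUPP;
}
#endif

/* Hypothetical driver code: no #ifdef needed, it just handles the error. */
static long drv_fault(struct foo_range *r)
{
	assert(r->start < r->end);  /* caller passes a non-empty range */
	long ret = foo_range_fault(r);

	if (ret == -EOPNOTSUPP)
		return 0;  /* fall back to a non-mirrored path (hypothetical) */
	return ret;
}
```

      The driver can keep its structures and flag checks unconditional and
      confine the "feature missing" case to one error check.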
      
      Link: https://lore.kernel.org/r/20191112202231.3856-5-jgg@ziepe.ca
      Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
      Tested-by: Ralph Campbell <rcampbell@nvidia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • mm/hmm: allow hmm_range to be used with a mmu_interval_notifier or hmm_mirror · 04ec32fb
      Jason Gunthorpe authored
      hmm_mirror's handling of ranges does not use a sequence count which
      results in this bug:
      
               CPU0                                   CPU1
                                           hmm_range_wait_until_valid(range)
                                               valid == true
                                           hmm_range_fault(range)
      hmm_invalidate_range_start()
         range->valid = false
      hmm_invalidate_range_end()
         range->valid = true
                                           hmm_range_valid(range)
                                                valid == true
      
      Here, hmm_range_valid() should not have succeeded.
      
      Adding the required sequence count would make it nearly identical to the
      new mmu_interval_notifier. Instead replace the hmm_mirror stuff with
      mmu_interval_notifier.
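      The difference between the two schemes can be modeled in a few lines of
      C. This is a hypothetical single-threaded sketch, not the hmm or
      mmu_interval_notifier code: the boolean flag is set back to true by
      hmm_invalidate_range_end() and so cannot record that an invalidation
      happened in between, whereas a sequence count sampled before the fault
      and re-checked afterwards can. Like a seqlock, the real scheme must also
      detect an invalidation that is still in progress; that part is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Scheme 1: a validity flag like range->valid in the trace. */
struct flag_range {
	bool valid;
};

static void flag_invalidate_start(struct flag_range *r) { r->valid = false; }
static void flag_invalidate_end(struct flag_range *r) { r->valid = true; }

/* Scheme 2: a sequence count bumped by every invalidation. */
struct seq_range {
	unsigned long seq;
};

static unsigned long seq_read_begin(const struct seq_range *r) { return r->seq; }
static void seq_invalidate(struct seq_range *r) { r->seq++; }

static bool seq_read_retry(const struct seq_range *r, unsigned long snap)
{
	return r->seq != snap;  /* any invalidation since snap => retry */
}

/* Replays the CPU0/CPU1 interleaving with the flag: returns true when
 * the stale result of the fault would be accepted. */
static bool flag_accepts_stale_fault(void)
{
	struct flag_range r = { .valid = true };

	assert(r.valid);             /* CPU1: wait_until_valid sees valid */
	/* CPU1: the fault runs here while CPU0 invalidates */
	flag_invalidate_start(&r);   /* CPU0 */
	flag_invalidate_end(&r);     /* CPU0 */
	return r.valid;              /* CPU1: still true, stale data accepted */
}

/* Same interleaving with a sequence count. */
static bool seq_accepts_stale_fault(void)
{
	struct seq_range r = { .seq = 0 };
	unsigned long snap = seq_read_begin(&r);  /* CPU1, before the fault */

	seq_invalidate(&r);                       /* CPU0: start..end */
	return !seq_read_retry(&r, snap);         /* false: CPU1 must retry */
}
```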
      
      Co-existence of the two APIs is the first step.
      
      Link: https://lore.kernel.org/r/20191112202231.3856-4-jgg@ziepe.ca
      Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
      Tested-by: Philip Yang <Philip.Yang@amd.com>
      Tested-by: Ralph Campbell <rcampbell@nvidia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • mm/mmu_notifier: add an interval tree notifier · 99cb252f
      Jason Gunthorpe authored
      Of the 13 users of mmu_notifiers, 8 of them use only
      invalidate_range_start/end() and immediately intersect the
      mmu_notifier_range with some kind of internal list of VAs. 4 use an
      interval tree (i915_gem, radeon_mn, umem_odp, hfi1). 4 use a linked list
      of some kind (scif_dma, vhost, gntdev, hmm).
      
      And the remaining 5 either don't use invalidate_range_start() or do some
      special thing with it.
      
      It turns out that building a correct scheme with an interval tree is
      pretty complicated, particularly if the use case is synchronizing against
      another thread doing get_user_pages().  Many of these implementations have
      various subtle and difficult to fix races.
      
      This approach puts the interval tree as common code at the top of the mmu
      notifier call tree and implements a shareable locking scheme.
      
      It includes:
       - An interval tree tracking VA ranges, with per-range callbacks
       - A read/write locking scheme for the interval tree that avoids
         sleeping in the notifier path (for the OOM killer)
       - A sequence counter based collision-retry locking scheme to tell the
         device page fault path that a VA range is being concurrently
         invalidated.
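      The first item can be sketched with a plain array standing in for the
      interval tree. Each subscriber registers a half-open VA range
      [start, end) and a callback, and an invalidation runs only the
      callbacks whose ranges intersect the invalidated span. The struct and
      function names are hypothetical, and the linear scan replaces the
      interval tree purely to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

struct va_sub;
typedef void (*va_invalidate_fn)(struct va_sub *sub,
				 unsigned long start, unsigned long end);

/* One hypothetical subscription: a half-open VA range and its callback. */
struct va_sub {
	unsigned long start, end;      /* [start, end) */
	va_invalidate_fn invalidate;   /* per-range callback */
	int hits;                      /* times the callback has run */
};

static int ranges_overlap(unsigned long a_start, unsigned long a_end,
			  unsigned long b_start, unsigned long b_end)
{
	return a_start < b_end && b_start < a_end;
}

/* Run the callback of every subscription intersecting [start, end).
 * A linear scan stands in for the interval tree lookup. */
static size_t va_invalidate(struct va_sub **subs, size_t n,
			    unsigned long start, unsigned long end)
{
	size_t called = 0;

	assert(start < end);
	for (size_t i = 0; i < n; i++) {
		struct va_sub *s = subs[i];

		if (ranges_overlap(s->start, s->end, start, end)) {
			s->invalidate(s, start, end);
			called++;
		}
	}
	return called;
}

/* Example callback: just counts invocations. */
static void count_hit(struct va_sub *sub, unsigned long start,
		      unsigned long end)
{
	(void)start;
	(void)end;
	sub->hits++;
}
```

      With an interval tree the lookup cost depends on the number of
      overlapping ranges rather than on every registered range, which is the
      point of hoisting it into common code.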
      
      This is based on various ideas:
      - hmm accumulates invalidated VA ranges and releases them when all
        invalidates are done, via the active_invalidate_ranges count.
        This approach avoids having to intersect the interval tree twice (as
        umem_odp does) at the potential cost of a longer device page fault.
      
      - kvm/umem_odp use a sequence counter to drive the collision retry,
        via invalidate_seq.
      
      - a deferred work todo list that is processed on unlock, similar to
        RTNL, via deferred_list. This makes adding/removing interval tree
        members more deterministic.
      
      - seqlock, except this version makes the seqlock idea multi-holder on the
        write side by protecting it with active_invalidate_ranges and a spinlock.
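      The multi-holder write side and the deferred list can be modeled
      together. In this hypothetical single-threaded sketch the spinlock is
      omitted, the deferred list is reduced to a counter, and the reader
      reports "busy" instead of sleeping. The sequence number becomes odd when
      the first of several overlapping invalidations starts and even again
      only when the last one ends, at which point deferred insertions are
      applied. The field names follow the description above; the functions
      are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; a spinlock would protect these fields in real
 * code and is omitted in this single-threaded sketch. */
struct notif_state {
	unsigned long invalidate_seq;  /* odd while any invalidation runs */
	int active_invalidate_ranges;  /* overlapping invalidations in flight */
	size_t n_deferred;             /* deferred_list, reduced to a count */
	size_t n_registered;           /* ranges present in the tree */
};

static void inv_start(struct notif_state *s)
{
	if (s->active_invalidate_ranges++ == 0)
		s->invalidate_seq++;       /* even -> odd: first writer in */
}

static void inv_end(struct notif_state *s)
{
	assert(s->active_invalidate_ranges > 0);
	if (--s->active_invalidate_ranges == 0) {
		s->invalidate_seq++;       /* odd -> even: last writer out */
		s->n_registered += s->n_deferred;  /* apply deferred inserts */
		s->n_deferred = 0;
	}
}

/* Insert now if idle, otherwise defer until the last inv_end(). */
static bool range_insert(struct notif_state *s)
{
	if (s->active_invalidate_ranges == 0) {
		s->n_registered++;
		return true;
	}
	s->n_deferred++;
	return false;
}

/* Reader side: snapshot the sequence; *busy is set while it is odd. */
static unsigned long read_begin(const struct notif_state *s, bool *busy)
{
	*busy = (s->invalidate_seq & 1) != 0;
	return s->invalidate_seq;
}

static bool read_retry(const struct notif_state *s, unsigned long seq)
{
	return s->invalidate_seq != seq;
}
```

      Two overlapping invalidations advance the sequence only twice in total,
      so a reader that sampled it beforehand sees one change and retries once.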
      
      To minimize MM overhead when only the interval tree is being used, the
      entire SRCU and hlist overheads are dropped using some simple
      branches. Similarly the interval tree overhead is dropped when in hlist
      mode.
      
      The overhead from the mandatory spinlock is broadly the same as that of
      most of the existing users, which already had a lock (or two) of some
      sort on the invalidation path.
      
      Link: https://lore.kernel.org/r/20191112202231.3856-3-jgg@ziepe.ca
      Acked-by: Christian König <christian.koenig@amd.com>
      Tested-by: Philip Yang <Philip.Yang@amd.com>
      Tested-by: Ralph Campbell <rcampbell@nvidia.com>
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  2. 13 Nov, 2019 1 commit
  3. 01 Nov, 2019 1 commit
    • Merge branch 'odp_rework' into hmm.git · 0e64e5b3
      Jason Gunthorpe authored
      This branch is shared with the rdma.git for dependencies in following
      patches:
      
      ====================
      In order to hoist the interval tree code out of the drivers and into the
      mmu_notifiers it is necessary for the drivers to not use the interval tree
      for other things.
      
      This series replaces the interval tree with an xarray and along the way
      re-aligns all the locking to use a sensible SRCU model where the 'update'
      step is done by modifying an xarray.
      
      The result is overall much simpler and with less locking in the critical
      path. Many functions were reworked for clarity and small details like
      using 'imr' to refer to the implicit MR make the entire code flow here
      more readable.
      
      This also squashes at least two race bugs on its own, and quite possibly
      more that haven't been identified.
      ====================
      
      * branch 'odp_rework':
        RDMA/odp: Remove broken debugging call to invalidate_range
        RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy
        RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray
        RDMA/mlx5: Rework implicit ODP destroy
        RDMA/mlx5: Avoid double lookups on the pagefault path
        RDMA/mlx5: Reduce locking in implicit_mr_get_data()
        RDMA/mlx5: Use an xarray for the children of an implicit ODP
        RDMA/mlx5: Split implicit handling from pagefault_mr
        RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree
        RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it
        RDMA/mlx5: Rework implicit_mr_get_data
        RDMA/mlx5: Delete struct mlx5_priv->mkey_table
        RDMA/mlx5: Use a dedicated mkey xarray for ODP
        RDMA/mlx5: Split sig_err MR data into its own xarray
        RDMA/mlx5: Use SRCU properly in ODP prefetch
  4. 29 Oct, 2019 1 commit
  5. 28 Oct, 2019 15 commits
  6. 27 Oct, 2019 7 commits
    • Linux 5.4-rc5 · d6d5df1d
      Linus Torvalds authored
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 153a971f
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "Two fixes for the VMWare guest support:
      
         - Unbreak VMware platform detection, which was broken by converting
           an integer constant to a string constant.
      
         - Fix the clang build of the VMware hypercall by explicitly
           specifying the output register for INL instead of using the short
           form"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/cpu/vmware: Fix platform detection VMWARE_PORT macro
        x86/cpu/vmware: Use the full form of INL in VMWARE_HYPERCALL, for clang/llvm
    • Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2b776b54
      Linus Torvalds authored
      Pull timer fixes from Thomas Gleixner:
       "A small set of fixes for time(keeping):
      
         - Add a missing include to prevent compiler warnings.
      
         - Make the VDSO implementation of clock_getres() POSIX compliant
           again. A recent change dropped the NULL pointer guard which is
           required as NULL is a valid pointer value for this function.
      
         - Fix two function documentation typos"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        posix-cpu-timers: Fix two trivial comments
        timers/sched_clock: Include local timekeeping.h for missing declarations
        lib/vdso: Make clock_getres() POSIX compliant again
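      The clock_getres() item refers to the POSIX rule that res may be NULL,
      in which case the resolution is not returned and the call only reports
      whether the clock ID is valid. A minimal sketch of the guard, using a
      hypothetical model_clock_getres() that supports two clocks and pretends
      both have 1 ns resolution, next to a caller of the real libc function:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical clock_getres()-style helper showing the NULL guard. */
static int model_clock_getres(clockid_t clk, struct timespec *res)
{
	if (clk != CLOCK_MONOTONIC && clk != CLOCK_REALTIME)
		return -1;          /* real clock_getres() also sets errno = EINVAL */
	if (res != NULL) {      /* NULL is valid: just skip the store */
		res->tv_sec = 0;
		res->tv_nsec = 1;
	}
	return 0;
}

/* Typical NULL caller: only asks whether the clock ID is supported. */
static int clock_is_valid(clockid_t clk)
{
	return clock_getres(clk, NULL) == 0;
}
```

      Dropping the `res != NULL` check turns the valid NULL call into a
      write through a NULL pointer, which is the regression the fix undoes.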
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a8a31fdc
      Linus Torvalds authored
      Pull perf fixes from Thomas Gleixner:
       "A set of perf fixes:
      
        kernel:
      
         - Unbreak the tracking of auxiliary buffer allocations, which became
           imbalanced and caused resource limit failures.
      
         - Fix the fallout of splitting ToPA entries, which failed to shift
           the base entry PA correctly.
      
         - Use the correct context to look up the AUX event when unmapping the
           associated AUX buffer so the event can be stopped and the buffer
           reference dropped.
      
        tools:
      
         - Fix buildid-cache mode setting in copyfile_mode_ns() when copying
           /proc/kcore
      
         - Fix freeing id arrays in the event list so the correct event is
           closed.
      
         - Sync sched.h and kvm.h headers with the kernel sources.
      
         - Link jvmti against tools/lib/ctype.o to have weak strlcpy().
      
         - Fix multiple memory and file descriptor leaks, found by Coverity in
           perf annotate.
      
         - Fix leaks in error handling paths in 'perf c2c', 'perf kmem', found
           by a static analysis tool"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/aux: Fix AUX output stopping
        perf/aux: Fix tracking of auxiliary trace buffer allocation
        perf/x86/intel/pt: Fix base for single entry topa
        perf kmem: Fix memory leak in compact_gfp_flags()
        tools headers UAPI: Sync sched.h with the kernel
        tools headers kvm: Sync kvm.h headers with the kernel sources
        tools headers kvm: Sync kvm headers with the kernel sources
        tools headers kvm: Sync kvm headers with the kernel sources
        perf c2c: Fix memory leak in build_cl_output()
        perf tools: Fix mode setting in copyfile_mode_ns()
        perf annotate: Fix multiple memory and file descriptor leaks
        perf tools: Fix resource leak of closedir() on the error paths
        perf evlist: Fix fix for freed id arrays
        perf jvmti: Link against tools/lib/ctype.h to have weak strlcpy()
    • Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1e1ac1cb
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "Two fixes for interrupt controller drivers:
      
         - Skip IRQ_M_EXT entries in the device tree when initializing the
           RISCV PLIC controller to avoid a double init attempt.
      
         - Use the correct ITS list when issuing the VMOVP synchronization
           command so the operation works only on the ITS instances which are
           associated with a VM"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/sifive-plic: Skip contexts except supervisor in plic_init()
        irqchip/gic-v3-its: Use the exact ITSList for VMOVP
    • Merge tag '5.4-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 · c9a2e4a8
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Seven cifs/smb3 fixes, including three for stable"
      
      * tag '5.4-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: Fix cifsInodeInfo lock_sem deadlock when reconnect occurs
        CIFS: Fix use after free of file info structures
        CIFS: Fix retry mid list corruption on reconnects
        cifs: Fix missed free operations
        CIFS: avoid using MID 0xFFFF
        cifs: clarify comment about timestamp granularity for old servers
        cifs: Handle -EINPROGRESS only when noblockcnt is set
    • Merge tag 'riscv/for-v5.4-rc5-b' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 6995a6a5
      Linus Torvalds authored
      Pull RISC-V fixes from Paul Walmsley:
       "Several minor fixes and cleanups for v5.4-rc5:
      
         - Three build fixes for various SPARSEMEM-related kernel
           configurations
      
         - Two cleanup patches for the kernel bug and breakpoint trap handler
           code"
      
      * tag 'riscv/for-v5.4-rc5-b' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: cleanup do_trap_break
        riscv: cleanup <asm/bug.h>
        riscv: Fix undefined reference to vmemmap_populate_basepages
        riscv: Fix implicit declaration of 'page_to_section'
        riscv: fix fs/proc/kcore.c compilation with sparsemem enabled
  7. 26 Oct, 2019 11 commits