1. 17 Jun, 2022 4 commits
    • mm: avoid unnecessary page fault retires on shared memory types · d9272525
      Peter Xu authored
      I observed that for each of the shared file-backed page faults, we're very
      likely to retry one more time on the first write fault when no page is
      present.  That's because we need to release the mmap lock for dirty rate
      limiting with balance_dirty_pages_ratelimited() (in
      fault_dirty_shared_page()).
      
      Then after that throttling we return VM_FAULT_RETRY.
      
      We did that probably because VM_FAULT_RETRY is the only way we can return
      to the fault handler at that time telling it we've released the mmap lock.
      
      However, that's not ideal: the fault very likely does not need to be
      retried at all, since the pgtable entry was already installed before the
      throttling, so the follow-up fault (including taking the mmap read lock,
      walking the pgtable, etc.) is in most cases unnecessary.
      
      It not only slows down page faults for shared file-backed memory, but also
      adds mmap lock contention that is in most cases not needed at all.
      
      To observe this, write to some shmem page and look at the "pgfault" value
      in /proc/vmstat: we should expect 2 counts for each shmem write, simply
      because we retried and the vm event "pgfault" captures that.
      
      To make it more efficient, add a new VM_FAULT_COMPLETED return code just to
      show that we've completed the whole fault and released the lock.  It's also
      a hint that we will very likely not need another fault immediately on this
      page, because we've just completed it.
      
      This patch provides a ~12% perf boost on my aarch64 test VM with a simple
      program that sequentially dirties a 400MB mmap()ed shmem file.  These are
      the times it takes:
      
        Before: 650.980 ms (+-1.94%)
        After:  569.396 ms (+-1.38%)
      
      I believe it could help more than that.
      
      We need some special care on GUP and the s390 pgfault handler (for the gmap
      code before returning from pgfault); the rest of the changes in the page
      fault handlers should be relatively straightforward.
      
      Another thing to mention is that mm_account_fault() does take this new
      fault as a generic fault to be accounted, unlike VM_FAULT_RETRY.
      
      I explicitly didn't touch hmm_vma_fault() and break_ksm() because they do
      not handle VM_FAULT_RETRY even with existing code, so I'm literally keeping
      them as-is.
      
      Link: https://lkml.kernel.org/r/20220530183450.42886-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Vineet Gupta <vgupta@kernel.org>
      Acked-by: Guo Ren <guoren@kernel.org>
      Acked-by: Max Filippov <jcmvbkbc@gmail.com>
      Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Alistair Popple <apopple@nvidia.com>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>	[arm part]
      Acked-by: Heiko Carstens <hca@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Brian Cain <bcain@quicinc.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Janosch Frank <frankja@linux.ibm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Will Deacon <will@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dinh Nguyen <dinguyen@kernel.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Yoshinori Sato <ysato@users.osdn.me>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      d9272525
    • tools/vm/slabinfo: use alphabetic order when two values are equal · 4f5ceb88
      Yuanzheng Song authored
      When the number of partial slabs in each cache is the same (e.g., the
      values are all 0), the results of `slabinfo -X -N5` and `slabinfo -P -N5`
      are different.
      
      / # slabinfo -X -N5
      ...
      Slabs sorted by number of partial slabs
      ---------------------------------------
      Name                   Objects Objsize           Space Slabs/Part/Cpu  O/S O %Fr %Ef Flg
      inode_cache              15180     392         6217728        758/0/1   20 1   0  95 a
      kernfs_node_cache        22494      88         2002944        488/0/1   46 0   0  98
      shmem_inode_cache          663     464          319488         38/0/1   17 1   0  96
      biovec-max                  50    3072          163840          4/0/1   10 3   0  93 A
      dentry                   19050     136         2600960        633/0/2   30 0   0  99 a
      
      / # slabinfo -P -N5
      Name                   Objects Objsize           Space Slabs/Part/Cpu  O/S O %Fr %Ef Flg
      bdev_cache                  32     984           32.7K          1/0/1   16 2   0  96 Aa
      ext4_inode_cache            42     752           32.7K          1/0/1   21 2   0  96 a
      dentry                   19050     136            2.6M        633/0/2   30 0   0  99 a
      TCPv6                       17    1840           32.7K          0/0/1   17 3   0  95 A
      RAWv6                       18     856           16.3K          0/0/1   18 2   0  94 A
      
      This problem is caused by sort_slabs().  So let's use alphabetic order
      when two values are equal in sort_slabs().
      
      By the way, the content of the `slabinfo -h` is not aligned because the
      
      `-P|--partial Sort by number of partial slabs`
      
      uses tabs instead of spaces.  So let's use spaces instead of tabs to fix
      it.
      
      Link: https://lkml.kernel.org/r/20220528063117.935158-1-songyuanzheng@huawei.com
      Fixes: 1106b205 ("tools/vm/slabinfo: add partial slab listing to -X")
      Signed-off-by: Yuanzheng Song <songyuanzheng@huawei.com>
      Cc: "Tobin C. Harding" <tobin@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4f5ceb88
    • mm: use PAGE_ALIGNED instead of IS_ALIGNED · 0b82ade6
      Fanjun Kong authored
      <linux/mm.h> already provides the PAGE_ALIGNED macro.  Let's use this
      macro instead of IS_ALIGNED and passing PAGE_SIZE directly.
      
      Link: https://lkml.kernel.org/r/20220526140257.1568744-1-bh1scw@gmail.com
      Signed-off-by: Fanjun Kong <bh1scw@gmail.com>
      Acked-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      0b82ade6
    • mm/x86: remove dead code for hugetlbpage.c · cd16dd03
      Peter Xu authored
      This code seems to have existed since the old days without ever being
      used.  Remove it.
      
      Link: https://lkml.kernel.org/r/20220525195220.10241-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Acked-by: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      cd16dd03
  2. 12 Jun, 2022 10 commits
  3. 11 Jun, 2022 9 commits
    • Merge tag 'gpio-fixes-for-v5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 7a68065e
      Linus Torvalds authored
      Pull gpio fixes from Bartosz Golaszewski:
       "A set of fixes. Most address the new warning we emit at build time
        when irq chips are not immutable with some additional tweaks to
        gpio-crystalcove from Andy and a small tweak to gpio-dwapb.
      
         - make irq_chip structs immutable in several Diolan and Intel drivers
           to get rid of the new warning we emit when fiddling with irq chips
      
         - don't print error messages on probe deferral in gpio-dwapb"
      
      * tag 'gpio-fixes-for-v5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        gpio: dwapb: Don't print error on -EPROBE_DEFER
        gpio: dln2: make irq_chip immutable
        gpio: sch: make irq_chip immutable
        gpio: merrifield: make irq_chip immutable
        gpio: wcove: make irq_chip immutable
        gpio: crystalcove: Join function declarations and long lines
        gpio: crystalcove: Use specific type and API for IRQ number
        gpio: crystalcove: make irq_chip immutable
      7a68065e
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · cecb3540
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Driver fixes and one core patch.
      
        Nine of the driver patches are minor fixes and reworks to lpfc and the
        rest are trivial and minor fixes elsewhere"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: pmcraid: Fix missing resource cleanup in error case
        scsi: ipr: Fix missing/incorrect resource cleanup in error case
        scsi: mpt3sas: Fix out-of-bounds compiler warning
        scsi: lpfc: Update lpfc version to 14.2.0.4
        scsi: lpfc: Allow reduced polling rate for nvme_admin_async_event cmd completion
        scsi: lpfc: Add more logging of cmd and cqe information for aborted NVMe cmds
        scsi: lpfc: Fix port stuck in bypassed state after LIP in PT2PT topology
        scsi: lpfc: Resolve NULL ptr dereference after an ELS LOGO is aborted
        scsi: lpfc: Address NULL pointer dereference after starget_to_rport()
        scsi: lpfc: Resolve some cleanup issues following SLI path refactoring
        scsi: lpfc: Resolve some cleanup issues following abort path refactoring
        scsi: lpfc: Correct BDE type for XMIT_SEQ64_WQE in lpfc_ct_reject_event()
        scsi: vmw_pvscsi: Expand vcpuHint to 16 bits
        scsi: sd: Fix interpretation of VPD B9h length
      cecb3540
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · abe71eb3
      Linus Torvalds authored
      Pull virtio fixes from Michael Tsirkin:
       "Fixes all over the place, most notably fixes for latent bugs in
        drivers that got exposed by suppressing interrupts before DRIVER_OK,
        which in turn has been done by 8b4ec69d ("virtio: harden vring
        IRQ")"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        um: virt-pci: set device ready in probe()
        vdpa: make get_vq_group and set_group_asid optional
        virtio: Fix all occurences of the "the the" typo
        vduse: Fix NULL pointer dereference on sysfs access
        vringh: Fix loop descriptors check in the indirect cases
        vdpa/mlx5: clean up indenting in handle_ctrl_vlan()
        vdpa/mlx5: fix error code for deleting vlan
        virtio-mmio: fix missing put_device() when vm_cmdline_parent registration failed
        vdpa/mlx5: Fix syntax errors in comments
        virtio-rng: make device ready before making request
      abe71eb3
    • Merge tag 'loongarch-fixes-5.19-1' of... · 0678afa6
      Linus Torvalds authored
      Merge tag 'loongarch-fixes-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
      
      Pull LoongArch fixes from Huacai Chen:
       "Fix build errors and a stale comment"
      
      * tag 'loongarch-fixes-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
        LoongArch: Remove MIPS comment about cycle counter
        LoongArch: Fix copy_thread() build errors
        LoongArch: Fix the !CONFIG_SMP build
      0678afa6
    • iov_iter: fix build issue due to possible type mis-match · 1c27f1fc
      Linus Torvalds authored
      Commit 6c776766 ("iov_iter: Fix iter_xarray_get_pages{,_alloc}()")
      introduced a problem on some 32-bit architectures (at least arm, xtensa,
      csky, sparc and mips) that have a 'size_t' that is 'unsigned int'.
      
      The reason is that we now do
      
          min(nr * PAGE_SIZE - offset, maxsize);
      
      where 'nr' and 'offset' are both 'unsigned int', and PAGE_SIZE is
      'unsigned long'.  As a result, the normal C type rules mean that the
      first argument to 'min()' ends up being 'unsigned long'.
      
      In contrast, 'maxsize' is of type 'size_t'.
      
      Now, 'size_t' and 'unsigned long' are always the same physical type in
      the kernel, so you'd think this doesn't matter, and from an actual
      arithmetic standpoint it doesn't.
      
      But on 32-bit architectures 'size_t' is commonly 'unsigned int', even if
      it could also be 'unsigned long'.  In that situation, both are unsigned
      32-bit types, but they are not the *same* type.
      
      And as a result 'min()' will complain about the distinct types (ignore
      the "pointer types" part of the error message: that's an artifact of the
      way we have made 'min()' check types for being the same):
      
        lib/iov_iter.c: In function 'iter_xarray_get_pages':
        include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast [-Werror]
           20 |         (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
              |                                   ^~
        lib/iov_iter.c:1464:16: note: in expansion of macro 'min'
         1464 |         return min(nr * PAGE_SIZE - offset, maxsize);
              |                ^~~
      
      This was not visible on 64-bit architectures (where we always define
      'size_t' to be 'unsigned long').
      
      Force these cases to use 'min_t(size_t, x, y)' to make the type explicit
      and avoid the issue.
      
      [ Nit-picky note: technically 'size_t' doesn't have to match 'unsigned
        long' arithmetically. We've certainly historically seen environments
        with 16-bit address spaces and 32-bit 'unsigned long'.
      
        Similarly, even in 64-bit modern environments, 'size_t' could be its
        own type distinct from 'unsigned long', even if it were arithmetically
        identical.
      
        So the above type commentary is only really descriptive of the kernel
        environment, not some kind of universal truth for the kinds of wild
        and crazy situations that are allowed by the C standard ]
      Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Link: https://lore.kernel.org/all/YqRyL2sIqQNDfky2@debian/
      Cc: Jeff Layton <jlayton@kernel.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1c27f1fc
    • wireguard: selftests: use maximum cpu features and allow rng seeding · 17b0128a
      Jason A. Donenfeld authored
      By forcing the maximum CPU that QEMU has available, we expose additional
      capabilities, such as the RNDR instruction, which increases test
      coverage. This then allows the CI to skip the fake seeding step in some
      cases. Also enable STRICT_KERNEL_RWX to catch issues related to early
      jump labels when the RNG is initialized at boot.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      17b0128a
    • scripts/gdb: change kernel config dumping method · 1f7a6cf6
      Kuan-Ying Lee authored
      MAGIC_START("IKCFG_ST") and MAGIC_END("IKCFG_ED") have been moved out
      of the kernel_config_data variable.
      
      Thus, we parse kernel_config_data directly instead of accounting for
      the offsets of MAGIC_START and MAGIC_END.
      
      Fixes: 13610aa9 ("kernel/configs: use .incbin directive to embed config_data.gz")
      Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      1f7a6cf6
    • um: virt-pci: set device ready in probe() · eacea844
      Vincent Whitchurch authored
      Call virtio_device_ready() to make this driver work after commit
      8b4ec69d7e09 ("virtio: harden vring IRQ"), since the driver uses the
      virtqueues in the probe function.  (The virtio core sets the device
      ready when probe returns.)
      
      Fixes: 8b4ec69d ("virtio: harden vring IRQ")
      Fixes: 68f5d3f3 ("um: add PCI over virtio emulation driver")
      Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Message-Id: <20220610151203.3492541-1-vincent.whitchurch@axis.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Tested-by: Johannes Berg <johannes@sipsolutions.net>
      eacea844
    • Merge tag 'nfsd-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · 0885eacd
      Linus Torvalds authored
      Pull nfsd fixes from Chuck Lever:
       "Notable changes:
      
         - There is now a backup maintainer for NFSD
      
        Notable fixes:
      
         - Prevent array overruns in svc_rdma_build_writes()
      
         - Prevent buffer overruns when encoding NFSv3 READDIR results
      
         - Fix a potential UAF in nfsd_file_put()"
      
      * tag 'nfsd-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
        SUNRPC: Remove pointer type casts from xdr_get_next_encode_buffer()
        SUNRPC: Clean up xdr_get_next_encode_buffer()
        SUNRPC: Clean up xdr_commit_encode()
        SUNRPC: Optimize xdr_reserve_space()
        SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer()
        SUNRPC: Trap RDMA segment overflows
        NFSD: Fix potential use-after-free in nfsd_file_put()
        MAINTAINERS: reciprocal co-maintainership for file locking and nfsd
      0885eacd
  4. 10 Jun, 2022 17 commits