1. 13 May, 2022 19 commits
  2. 10 May, 2022 21 commits
    • NeilBrown's avatar
      VFS: add FMODE_CAN_ODIRECT file flag · a2ad63da
      NeilBrown authored
      Currently various places test if direct IO is possible on a file by
      checking for the existence of the direct_IO address space operation.
      This is a poor choice, as the direct_IO operation may not be used - it is
      only used if the generic_file_*_iter functions are called for direct IO
      and some filesystems - particularly NFS - don't do this.
      
      Instead, introduce a new f_mode flag: FMODE_CAN_ODIRECT and change the
      various places to check this (avoiding pointer dereferences).
      do_dentry_open() will set this flag if ->direct_IO is present, so
      filesystems do not need to be changed.
      
      NFS *is* changed, to set the flag explicitly and discard the direct_IO
      entry in the address_space_operations for files.
      
      Other filesystems which currently use noop_direct_IO could usefully be
      changed to set this flag instead.
      
      Link: https://lkml.kernel.org/r/164859778128.29473.15189737957277399416.stgit@noble.brownReviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Tested-by: default avatarDavid Howells <dhowells@redhat.com>
      Tested-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      a2ad63da
    • NeilBrown's avatar
      MM: handle THP in swap_*page_fs() - count_vm_events() · 6341a446
      NeilBrown authored
      We need to use count_swpout_vm_event() for sio_write_complete() to get
      correct counting.
      
      Note that THP swap in (if it ever happens) is current accounted 1 for each
      page, whether HUGE or normal.  This is different from swap-out accounting.
      
      This patch should be squashed into
          MM: handle THP in swap_*page_fs()
      
      Link: https://lkml.kernel.org/r/165146948934.24404.5909750610552745025@noble.neil.brown.nameSigned-off-by: default avatarNeilBrown <neilb@suse.de>
      Reported-by: default avatarMiaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: default avatarMiaohe Lin <linmiaohe@huawei.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      6341a446
    • NeilBrown's avatar
      mm: handle THP in swap_*page_fs() · a1a0dfd5
      NeilBrown authored
      Pages passed to swap_readpage()/swap_writepage() are not necessarily all
      the same size - there may be transparent-huge-pages involves.
      
      The BIO paths of swap_*page() handle this correctly, but the SWP_FS_OPS
      path does not.
      
      So we need to use thp_size() to find the size, not just assume PAGE_SIZE,
      and we need to track the total length of the request, not just assume it
      is "page * PAGE_SIZE".
      
      Link: https://lkml.kernel.org/r/165119301488.15698.9457662928942765453.stgit@noble.brownSigned-off-by: default avatarNeilBrown <neilb@suse.de>
      Reported-by: default avatarMiaohe Lin <linmiaohe@huawei.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      a1a0dfd5
    • NeilBrown's avatar
      mm: submit multipage write for SWP_FS_OPS swap-space · 2282679f
      NeilBrown authored
      swap_writepage() is given one page at a time, but may be called repeatedly
      in succession.
      
      For block-device swapspace, the blk_plug functionality allows the multiple
      pages to be combined together at lower layers.  That cannot be used for
      SWP_FS_OPS as blk_plug may not exist - it is only active when
      CONFIG_BLOCK=y.  Consequently all swap reads over NFS are single page
      reads.
      
      With this patch we pass a pointer-to-pointer via the wbc.  swap_writepage
      can store state between calls - much like the pointer passed explicitly to
      swap_readpage.  After calling swap_writepage() some number of times, the
      state will be passed to swap_write_unplug() which can submit the combined
      request.
      
      Link: https://lkml.kernel.org/r/164859778128.29473.5191868522654408537.stgit@noble.brownSigned-off-by: default avatarNeilBrown <neilb@suse.de>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Tested-by: default avatarDavid Howells <dhowells@redhat.com>
      Tested-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      2282679f
    • NeilBrown's avatar
      mm: submit multipage reads for SWP_FS_OPS swap-space · 5169b844
      NeilBrown authored
      swap_readpage() is given one page at a time, but may be called repeatedly
      in succession.
      
      For block-device swap-space, the blk_plug functionality allows the
      multiple pages to be combined together at lower layers.  That cannot be
      used for SWP_FS_OPS as blk_plug may not exist - it is only active when
      CONFIG_BLOCK=y.  Consequently all swap reads over NFS are single page
      reads.
      
      With this patch we pass in a pointer-to-pointer when swap_readpage can
      store state between calls - much like the effect of blk_plug.  After
      calling swap_readpage() some number of times, the state will be passed to
      swap_read_unplug() which can submit the combined request.
      
      Link: https://lkml.kernel.org/r/164859778127.29473.14059420492644907783.stgit@noble.brownSigned-off-by: default avatarNeilBrown <neilb@suse.de>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Tested-by: default avatarDavid Howells <dhowells@redhat.com>
      Tested-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      5169b844
    • NeilBrown's avatar
      doc: update documentation for swap_activate and swap_rw · cba738f6
      NeilBrown authored
      This documentation for ->swap_activate() has been out-of-date for a long
      time.  This patch updates it to match recent changes, and adds
      documentation for the associated ->swap_rw()
      
      Link: https://lkml.kernel.org/r/164859778126.29473.6778751233552859461.stgit@noble.brownSigned-off-by: default avatarNeilBrown <neilb@suse.de>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Tested-by: default avatarDavid Howells <dhowells@redhat.com>
      Tested-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      cba738f6
    • NeilBrown's avatar
      mm: perform async writes to SWP_FS_OPS swap-space using ->swap_rw · 7eadabc0
      NeilBrown authored
      This patch switches swap-out to SWP_FS_OPS swap-spaces to use ->swap_rw
      and makes the writes asynchronous, like they are for other swap spaces.
      
      To make it async we need to allocate the kiocb struct from a mempool. 
      This may block, but won't block as long as waiting for the write to
      complete.  At most it will wait for some previous swap IO to complete.
      
      Link: https://lkml.kernel.org/r/164859778126.29473.12399585304843922231.stgit@noble.brownSigned-off-by: default avatarNeilBrown <neilb@suse.de>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Tested-by: default avatarDavid Howells <dhowells@redhat.com>
      Tested-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      7eadabc0
    • NeilBrown's avatar
      nfs: rename nfs_direct_IO and use as ->swap_rw · eb79f3af
      NeilBrown authored
      The nfs_direct_IO() exists to support SWAP IO, but hasn't worked for a
      while.  We now need a ->swap_rw function which behaves slightly
      differently, returning zero for success rather than a byte count.
      
      So modify nfs_direct_IO accordingly, rename it, and use it as the
      ->swap_rw function.
      
      Link: https://lkml.kernel.org/r/165119301493.15698.7491285551903597618.stgit@noble.brownSigned-off-by: default avatarNeilBrown <neilb@suse.de>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> (on Renesas RSK+RZA1 with 32 MiB of SDRAM)
      Cc: David Howells <dhowells@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      eb79f3af
    • NeilBrown's avatar
      mm: introduce ->swap_rw and use it for reads from SWP_FS_OPS swap-space · e1209d3a
      NeilBrown authored
      swap currently uses ->readpage to read swap pages.  This can only request
      one page at a time from the filesystem, which is not most efficient.
      
      swap uses ->direct_IO for writes which while this is adequate is an
      inappropriate over-loading.  ->direct_IO may need to had handle allocate
      space for holes or other details that are not relevant for swap.
      
      So this patch introduces a new address_space operation: ->swap_rw.  In
      this patch it is used for reads, and a subsequent patch will switch writes
      to use it.
      
      No filesystem yet supports ->swap_rw, but that is not a problem because
      no filesystem actually works with filesystem-based swap.
      Only two filesystems set SWP_FS_OPS:
      - cifs sets the flag, but ->direct_IO always fails so swap cannot work.
      - nfs sets the flag, but ->direct_IO calls generic_write_checks()
        which has failed on swap files for several releases.
      
      To ensure that a NULL ->swap_rw isn't called, ->activate_swap() for both
      NFS and cifs are changed to fail if ->swap_rw is not set.  This can be
      removed if/when the function is added.
      
      Future patches will restore swap-over-NFS functionality.
      
      To submit an async read with ->swap_rw() we need to allocate a structure
      to hold the kiocb and other details.  swap_readpage() cannot handle
      transient failure, so we create a mempool to provide the structures.
      
      Link: https://lkml.kernel.org/r/164859778125.29473.13430559328221330589.stgit@noble.brownSigned-off-by: default avatarNeilBrown <neilb@suse.de>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Tested-by: default avatarDavid Howells <dhowells@redhat.com>
      Tested-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      e1209d3a
    • NeilBrown's avatar
      mm: reclaim mustn't enter FS for SWP_FS_OPS swap-space · d791ea67
      NeilBrown authored
      If swap-out is using filesystem operations (SWP_FS_OPS), then it is not
      safe to enter the FS for reclaim.  So only down-grade the requirement for
      swap pages to __GFP_IO after checking that SWP_FS_OPS are not being used.
      
      This makes the calculation of "may_enter_fs" slightly more complex, so
      move it into a separate function.  with that done, there is little value
      in maintaining the bool variable any more.  So replace the may_enter_fs
      variable with a may_enter_fs() function.  This removes any risk for the
      variable becoming out-of-date.
      
      Link: https://lkml.kernel.org/r/164859778124.29473.16176717935781721855.stgit@noble.brownSigned-off-by: default avatarNeilBrown <neilb@suse.de>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Tested-by: default avatarDavid Howells <dhowells@redhat.com>
      Tested-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      d791ea67
    • NeilBrown's avatar
      mm: move responsibility for setting SWP_FS_OPS to ->swap_activate · 4b60c0ff
      NeilBrown authored
      If a filesystem wishes to handle all swap IO itself (via ->direct_IO and
      ->readpage), rather than just providing devices addresses for
      submit_bio(), SWP_FS_OPS must be set.
      
      Currently the protocol for setting this it to have ->swap_activate return
      zero.  In that case SWP_FS_OPS is set, and add_swap_extent() is called for
      the entire file.
      
      This is a little clumsy as different return values for ->swap_activate
      have quite different meanings, and it makes it hard to search for which
      filesystems require SWP_FS_OPS to be set.
      
      So remove the special meaning of a zero return, and require the filesystem
      to set SWP_FS_OPS if it so desires, and to always call add_swap_extent()
      as required.
      
      Currently only NFS and CIFS return zero for add_swap_extent().
      
      Link: https://lkml.kernel.org/r/164859778123.29473.17908205846599043598.stgit@noble.brownSigned-off-by: default avatarNeilBrown <neilb@suse.de>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Tested-by: default avatarDavid Howells <dhowells@redhat.com>
      Tested-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      4b60c0ff
    • NeilBrown's avatar
      mm: drop swap_dirty_folio · 4c4a7634
      NeilBrown authored
      folios that are written to swap are owned by the MM subsystem - not any
      filesystem.
      
      When such a folio is passed to a filesystem to be written out to a
      swap-file, the filesystem handles the data, but the folio itself does not
      belong to the filesystem.  So calling the filesystem's ->dirty_folio()
      address_space operation makes no sense.  This is for folios in the given
      address space, and a folio to be written to swap does not exist in the
      given address space.
      
      So drop swap_dirty_folio() which calls the address-space's
      ->dirty_folio(), and always use noop_dirty_folio(), which is appropriate
      for folios being swapped out.
      
      Link: https://lkml.kernel.org/r/164859778123.29473.6900942583784889976.stgit@noble.brownSigned-off-by: default avatarNeilBrown <neilb@suse.de>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Tested-by: default avatarDavid Howells <dhowells@redhat.com>
      Tested-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      4c4a7634
    • NeilBrown's avatar
      mm: create new mm/swap.h header file · 014bb1de
      NeilBrown authored
      Patch series "MM changes to improve swap-over-NFS support".
      
      Assorted improvements for swap-via-filesystem.
      
      This is a resend of these patches, rebased on current HEAD.  The only
      substantial changes is that swap_dirty_folio has replaced
      swap_set_page_dirty.
      
      Currently swap-via-fs (SWP_FS_OPS) doesn't work for any filesystem.  It
      has previously worked for NFS but that broke a few releases back.  This
      series changes to use a new ->swap_rw rather than ->readpage and
      ->direct_IO.  It also makes other improvements.
      
      There is a companion series already in linux-next which fixes various
      issues with NFS.  Once both series land, a final patch is needed which
      changes NFS over to use ->swap_rw.
      
      
      This patch (of 10):
      
      Many functions declared in include/linux/swap.h are only used within mm/
      
      Create a new "mm/swap.h" and move some of these declarations there.
      Remove the redundant 'extern' from the function declarations.
      
      [akpm@linux-foundation.org: mm/memory-failure.c needs mm/swap.h]
      Link: https://lkml.kernel.org/r/164859751830.29473.5309689752169286816.stgit@noble.brown
      Link: https://lkml.kernel.org/r/164859778120.29473.11725907882296224053.stgit@noble.brownSigned-off-by: default avatarNeilBrown <neilb@suse.de>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Tested-by: default avatarDavid Howells <dhowells@redhat.com>
      Tested-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      014bb1de
    • Joel Savitz's avatar
      selftests: clarify common error when running gup_test · 17de1e55
      Joel Savitz authored
      The gup_test binary will fail showing only the output of perror("open") in
      the case that /sys/kernel/debug/gup_test is not found. This will almost
      always be due to CONFIG_GUP_TEST not being set, which enables
      compilation of a kernel that provides this file.
      
      Add a short error message to clarify this failure and point the user to
      the solution.
      
      Link: https://lkml.kernel.org/r/20220502224942.995427-1-jsavitz@redhat.comSigned-off-by: default avatarJoel Savitz <jsavitz@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Nico Pache <npache@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      17de1e55
    • Yury Norov's avatar
      mm/gup: fix comments to pin_user_pages_*() · 0768c8de
      Yury Norov authored
      pin_user_pages API forces FOLL_PIN in gup_flags, which means that the API
      requires struct page **pages to be provided (not NULL).  However, the
      comment to pin_user_pages() clearly allows for passing in a NULL @pages
      argument.
      
      Remove the incorrect comments, and add WARN_ON_ONCE(!pages) calls to
      enforce the API.
      
      It has been independently spotted by Minchan Kim and confirmed with
      John Hubbard:
      
      https://lore.kernel.org/all/YgWA0ghrrzHONehH@google.com/
      
      Link: https://lkml.kernel.org/r/20220422015839.1274328-1-yury.norov@gmail.comSigned-off-by: default avatarYury Norov (NVIDIA) <yury.norov@gmail.com>
      Reviewed-by: default avatarJohn Hubbard <jhubbard@nvidia.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      0768c8de
    • David Hildenbrand's avatar
      powerpc/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE for book3s · bff9beaa
      David Hildenbrand authored
      Right now, the last 5 bits (0x1f) of the swap entry are used for the type
      and the bit before that (0x20) is used for _PAGE_SWP_SOFT_DIRTY.  We
      cannot use 0x40, as that collides with _RPAGE_RSV1 -- contained in
      _PAGE_HPTEFLAGS.  The next candidate would be _RPAGE_SW3 (0x200) -- which
      is used for _PAGE_SOFT_DIRTY for !swp ptes.
      
      So let's just use _PAGE_SOFT_DIRTY for _PAGE_SWP_SOFT_DIRTY (to make it
      easier to grasp) and use 0x20 now for _PAGE_SWP_EXCLUSIVE.
      
      Link: https://lkml.kernel.org/r/20220329164329.208407-9-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Don Dutile <ddutile@redhat.com>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Liang Zhang <zhangliang5@huawei.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Oded Gabbay <oded.gabbay@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      bff9beaa
    • David Hildenbrand's avatar
      powerpc/pgtable: remove _PAGE_BIT_SWAP_TYPE for book3s · 03ac1b71
      David Hildenbrand authored
      The swap type is simply stored in bits 0x1f of the swap pte.  Let's
      simplify by just getting rid of _PAGE_BIT_SWAP_TYPE.  It's not like that
      we can simply change it: _PAGE_SWP_SOFT_DIRTY would suddenly fall into
      _RPAGE_RSV1, which isn't possible and would make the
      BUILD_BUG_ON(_PAGE_HPTEFLAGS & _PAGE_SWP_SOFT_DIRTY) angry.
      
      While at it, make it clearer which bit we're actually using for
      _PAGE_SWP_SOFT_DIRTY by just using the proper define and introduce and use
      SWP_TYPE_MASK.
      
      Link: https://lkml.kernel.org/r/20220329164329.208407-8-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Don Dutile <ddutile@redhat.com>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Liang Zhang <zhangliang5@huawei.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Oded Gabbay <oded.gabbay@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      03ac1b71
    • David Hildenbrand's avatar
      s390/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE · 92cd58bd
      David Hildenbrand authored
      Let's use bit 52, which is unused.
      
      Link: https://lkml.kernel.org/r/20220329164329.208407-7-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Don Dutile <ddutile@redhat.com>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Liang Zhang <zhangliang5@huawei.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Oded Gabbay <oded.gabbay@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      92cd58bd
    • David Hildenbrand's avatar
      s390/pgtable: cleanup description of swp pte layout · 8043d26c
      David Hildenbrand authored
      Bit 52 and bit 55 don't have to be zero: they only trigger a
      translation-specifiation exception if the PTE is marked as valid, which is
      not the case for swap ptes.
      
      Document which bits are used for what, and which ones are unused.
      
      Link: https://lkml.kernel.org/r/20220329164329.208407-6-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Don Dutile <ddutile@redhat.com>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Liang Zhang <zhangliang5@huawei.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Oded Gabbay <oded.gabbay@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      8043d26c
    • David Hildenbrand's avatar
      arm64/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE · 570ef363
      David Hildenbrand authored
      Let's use one of the type bits: core-mm only supports 5, so there is no
      need to consume 6.
      
      Note that we might be able to reuse bit 1, but reusing bit 1 turned out
      problematic in the past for PROT_NONE handling; so let's play safe and use
      another bit.
      
      Link: https://lkml.kernel.org/r/20220329164329.208407-5-david@redhat.comReviewed-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Don Dutile <ddutile@redhat.com>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Liang Zhang <zhangliang5@huawei.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Oded Gabbay <oded.gabbay@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      570ef363
    • David Hildenbrand's avatar
      x86/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE · 3e20889c
      David Hildenbrand authored
      Let's use bit 3 to remember PG_anon_exclusive in swap ptes.
      
      [david@redhat.com: fix 32-bit swap layout]
        Link: https://lkml.kernel.org/r/d875c292-46b3-f281-65ae-71d0b0c6f592@redhat.com
      Link: https://lkml.kernel.org/r/20220329164329.208407-4-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Don Dutile <ddutile@redhat.com>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Liang Zhang <zhangliang5@huawei.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Oded Gabbay <oded.gabbay@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      3e20889c