1. 25 Oct, 2023 37 commits
  2. 18 Oct, 2023 3 commits
    • mm: perform the mapping_map_writable() check after call_mmap() · 15897894
      Lorenzo Stoakes authored
      In order for a F_SEAL_WRITE sealed memfd mapping to have an opportunity to
      clear VM_MAYWRITE, we must be able to invoke the appropriate
      vm_ops->mmap() handler to do so.  We would otherwise fail the
      mapping_map_writable() check before we had the opportunity to avoid it.
      
      This patch moves this check after the call_mmap() invocation.  Only memfd
      actively denies write access causing a potential failure here (in
      memfd_add_seals()), so there should be no impact on non-memfd cases.
      
      This patch makes the userland-visible change that MAP_SHARED, PROT_READ
      mappings of an F_SEAL_WRITE sealed memfd will now succeed.
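
The userland-visible behaviour described above can be probed with a small program. The following is a minimal sketch, not part of the patch: the function name and the memfd name "sealed-example" are made up for illustration. It creates a memfd, applies F_SEAL_WRITE, and attempts a MAP_SHARED, PROT_READ mapping. On a kernel without this series the mmap() step is expected to fail with EPERM; with it, the mapping should succeed.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 0 if a MAP_SHARED, PROT_READ mapping of a
 * write-sealed memfd succeeds, otherwise the errno of the step that failed.
 */
int map_write_sealed_memfd_readonly(void)
{
	const size_t len = 4096;
	void *p;
	int err = 0;
	int fd;

	/* MFD_ALLOW_SEALING is required before any seal can be added. */
	fd = memfd_create("sealed-example", MFD_CLOEXEC | MFD_ALLOW_SEALING);
	if (fd < 0)
		return errno;

	/* Size the file, then forbid all further writes to it. */
	if (ftruncate(fd, len) < 0 || fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) < 0) {
		err = errno;
		goto out;
	}

	/* Shared but read-only: this is the case the patch is about. */
	p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		err = errno;
	else
		munmap(p, len);
out:
	close(fd);
	return err;
}
```

Calling map_write_sealed_memfd_readonly() from a test harness and comparing the result against 0 and EPERM distinguishes the two kernel behaviours; any other value points at an unrelated failure such as memfd_create() being unavailable.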
      
      There is a delicate situation with cleanup paths assuming that a writable
      mapping must have occurred in circumstances where it may now not have.  To
      ensure we do not accidentally mark a writable file unwritable, we
      explicitly track whether we have a writable mapping and unmap only if we
      do.
      
      [lstoakes@gmail.com: do not set writable_file_mapping in inappropriate case]
        Link: https://lkml.kernel.org/r/c9eb4cc6-7db4-4c2b-838d-43a0b319a4f0@lucifer.local
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=217238
      Link: https://lkml.kernel.org/r/55e413d20678a1bb4c7cce889062bbb07b0df892.1697116581.git.lstoakes@gmail.com
      Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: update memfd seal write check to include F_SEAL_WRITE · 28464bbb
      Lorenzo Stoakes authored
      The seal_check_future_write() function is called by shmem_mmap() or
      hugetlbfs_file_mmap() to disallow any future writable mappings of a memfd
      sealed this way.
      
      The F_SEAL_WRITE flag is not checked here, as that is handled via the
      mapping->i_mmap_writable mechanism and so any attempt at a mapping would
      fail before this could be run.
      
      However, we intend to change this, meaning this check can be performed for
      F_SEAL_WRITE mappings also.
      
      The logic here is equally applicable to both flags, so update this
      function to accommodate both and rename it accordingly.
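
As a rough illustration of the combined check, here is a userspace model rather than the kernel source. The constants, the toy_vma type, and the name toy_seal_check_write() are assumptions made for this sketch. It rejects a shared, writable mapping when either seal is present, and otherwise clears the may-write bit so that the mapping cannot later be made writable.

```c
#include <errno.h>

/* Illustrative constants for this sketch; assumptions, not kernel headers. */
#define SEAL_WRITE        0x08
#define SEAL_FUTURE_WRITE 0x10
#define VM_WRITE          0x02UL
#define VM_SHARED         0x08UL
#define VM_MAYWRITE       0x20UL

/* Toy stand-in for a VMA; only the flags matter here. */
struct toy_vma {
	unsigned long vm_flags;
};

/* Hypothetical model of a seal check covering both write seals. */
int toy_seal_check_write(int seals, struct toy_vma *vma)
{
	if (seals & (SEAL_WRITE | SEAL_FUTURE_WRITE)) {
		/* A shared mapping that is writable right now is refused. */
		if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE))
			return -EPERM;

		/* Read-only for now: make sure it can never become writable. */
		vma->vm_flags &= ~VM_MAYWRITE;
	}
	return 0;
}
```

In this model an unsealed file leaves the flags untouched, which matches the intent that the check only affects memfds carrying one of the two write seals.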
      
      Link: https://lkml.kernel.org/r/913628168ce6cce77df7d13a63970bae06a526e0.1697116581.git.lstoakes@gmail.com
      Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: drop the assumption that VM_SHARED always implies writable · e8e17ee9
      Lorenzo Stoakes authored
      Patch series "permit write-sealed memfd read-only shared mappings", v4.
      
      The man page for fcntl() describing memfd file seals states the following
      about F_SEAL_WRITE:
      
          Furthermore, trying to create new shared, writable memory-mappings via
          mmap(2) will also fail with EPERM.
      
      With emphasis on 'writable'.  It turns out that the kernel currently
      disallows all new shared memory mappings for a memfd with F_SEAL_WRITE
      applied, rendering this documentation inaccurate.
      
      This matters because users are unable to obtain any shared mapping of a
      memfd once it has been write-sealed, which limits the usefulness of such
      memfds.  This was reported in the discussion thread [1] originating from a
      bug report [2].
      
      This is a product of both using the struct address_space->i_mmap_writable
      atomic counter to determine whether writing may be permitted, and the
      kernel adjusting this counter when any VM_SHARED mapping is performed and
      more generally implicitly assuming VM_SHARED implies writable.
      
      It seems sensible that we should only update this counter if VM_MAYWRITE
      is specified, i.e. when it is possible that this mapping could at some
      point be written to.
      
      If we do so then all we need to do to permit write seals to function as
      documented is to clear VM_MAYWRITE when mapping read-only.  It turns out
      this functionality already exists for F_SEAL_FUTURE_WRITE - we can
      therefore simply adapt this logic to do the same for F_SEAL_WRITE.
      
      We then hit a chicken and egg situation in mmap_region() where the check
      for VM_MAYWRITE occurs before we are able to clear this flag.  To work
      around this, perform this check after we invoke call_mmap(), with careful
      consideration of error paths.
      
      Thanks to Andy Lutomirski for the suggestion!
      
      [1]: https://lore.kernel.org/all/20230324133646.16101dfa666f253c4715d965@linux-foundation.org/
      [2]: https://bugzilla.kernel.org/show_bug.cgi?id=217238
      
      
      This patch (of 3):
      
      There is a general assumption that VMAs with the VM_SHARED flag set are
      writable.  If the VM_MAYWRITE flag is not set, then this is simply not the
      case.
      
      Update those checks which affect the struct address_space->i_mmap_writable
      field to explicitly test for this by introducing
      [vma_]is_shared_maywrite() helper functions.
      
      This remains entirely conservative, as the lack of VM_MAYWRITE guarantees
      that the VMA cannot be written to.
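
The test these helpers perform can be sketched in userspace as follows. This is an illustrative model, not the kernel implementation: the flag values and the vm_flags_t typedef are assumptions for the sketch, and the function body is a guess at the shape of the helper based on the description above. The point is that both bits must be set; VM_SHARED alone no longer counts as writable.

```c
#include <stdbool.h>

/* Illustrative flag values; assumptions for this sketch. */
#define VM_SHARED   0x08UL
#define VM_MAYWRITE 0x20UL

typedef unsigned long vm_flags_t;

/* True only when the mapping is shared AND may ever be written to. */
bool is_shared_maywrite(vm_flags_t vm_flags)
{
	return (vm_flags & (VM_SHARED | VM_MAYWRITE)) ==
		(VM_SHARED | VM_MAYWRITE);
}
```

Under this model a caller would adjust the i_mmap_writable accounting only when is_shared_maywrite() returns true, so a shared mapping with VM_MAYWRITE cleared does not count against a write seal.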
      
      Link: https://lkml.kernel.org/r/cover.1697116581.git.lstoakes@gmail.com
      Link: https://lkml.kernel.org/r/d978aefefa83ec42d18dfa964ad180dbcde34795.1697116581.git.lstoakes@gmail.com
      Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Suggested-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>