    mseal: add mseal syscall · 8be7258a
    Jeff Xu authored
    The new mseal() is a syscall on 64-bit CPUs, with the following signature:
    
    int mseal(void *addr, size_t len, unsigned long flags)
    addr/len: memory range.
    flags: reserved.
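A minimal userspace sketch of calling the new syscall. It assumes a 64-bit Linux kernel that includes this patch and a libc without its own mseal() wrapper, so the call goes through syscall(2). The fallback number 462 and the helper names sys_mseal() and map_and_seal_page() are assumptions for illustration, not part of this patch. The reserved flags argument is passed as 0.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: older headers may not define the mseal syscall number.
 * 462 is the value assumed from the unified syscall table; verify locally. */
#ifndef __NR_mseal
#define __NR_mseal 462
#endif

/* Hypothetical thin wrapper: returns 0 on success, -1 with errno set. */
static int sys_mseal(void *addr, size_t len, unsigned long flags)
{
	return (int)syscall(__NR_mseal, addr, len, flags);
}

/* Hypothetical helper: map one read-only anonymous page and seal it.
 * Returns the sealed address, or NULL with errno set on failure. */
static void *map_and_seal_page(void)
{
	long page = sysconf(_SC_PAGESIZE);
	assert(page > 0);

	void *p = mmap(NULL, (size_t)page, PROT_READ,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	/* flags is reserved, so pass 0. */
	if (sys_mseal(p, (size_t)page, 0) != 0) {
		int saved = errno;
		munmap(p, (size_t)page); /* not sealed, so unmapping still works */
		errno = saved;
		return NULL;
	}
	return p;
}
```

On a kernel without mseal(), the call typically fails with ENOSYS (some seccomp filters report EPERM instead), and the helper then returns NULL with errno preserved.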
    
    mseal() blocks the following operations on the given memory range:
    
    1> Unmapping, moving to another location, and shrinking the size
       via munmap() and mremap(). These can leave an empty space that
       could then be filled by a VMA with a new set of attributes.
    
    2> Moving or expanding a different VMA into the current location,
       via mremap().
    
    3> Modifying a VMA via mmap(MAP_FIXED).
    
    4> Size expansion via mremap(). This does not appear to pose any
       specific risk to sealed VMAs, but it is blocked anyway because the
       use case is unclear. In any case, users can rely on merging to
       expand a sealed VMA.
    
    5> mprotect() and pkey_mprotect().
    
    6> Some destructive madvise() behaviors (e.g. MADV_DONTNEED) for anonymous
       memory, when users don't have write permission to the memory. Those
       behaviors can alter region contents by discarding pages, effectively a
       memset(0) for anonymous memory.
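The operations listed above can be exercised from userspace. The sketch below probes each one against a sealed two-page, read-only private anonymous mapping and reports whether the kernel refused it. It assumes a 64-bit kernel with this patch, that sealed operations fail with EPERM, and the same assumed syscall number 462; refused() and probe_sealed_ops() are hypothetical helpers. It checks mprotect() but not pkey_mprotect(), and it does not cover the merging case in 4>.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mseal
#define __NR_mseal 462 /* assumption: verify against local headers */
#endif

/* Report one probe; returns 1 if the call failed with EPERM (seal held). */
static int refused(int failed, const char *what)
{
	int blocked = failed && errno == EPERM; /* read errno before printf */
	printf("%-24s %s\n", what, blocked ? "blocked (EPERM)" : "NOT blocked");
	return blocked;
}

/* Returns 1 if every probe was refused, 0 if mseal() itself failed
 * (e.g. ENOSYS on older kernels), and -1 if any probe got through. */
static int probe_sealed_ops(void)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	assert(pagesz > 0);
	size_t page = (size_t)pagesz;
	size_t len = 2 * page;

	/* Read-only private anonymous memory, so rule 6> applies as well. */
	char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	if (syscall(__NR_mseal, p, len, 0UL) != 0) {
		munmap(p, len);
		return 0;
	}

	/* Unsealed scratch mapping, used as a different VMA to move around. */
	char *q = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (q == MAP_FAILED)
		return -1;

	int all = 1;
	/* 1> unmapping and shrinking the sealed range */
	all &= refused(munmap(p, page) == -1, "munmap");
	all &= refused(mremap(p, len, page, 0) == MAP_FAILED, "mremap shrink");
	/* 2> moving a different VMA onto the sealed range */
	all &= refused(mremap(q, page, page, MREMAP_MAYMOVE | MREMAP_FIXED,
			      (void *)p) == MAP_FAILED, "mremap onto sealed");
	/* 1> moving the sealed range to another location */
	all &= refused(mremap(p, len, len, MREMAP_MAYMOVE | MREMAP_FIXED,
			      (void *)q) == MAP_FAILED, "mremap move away");
	/* 3> replacing part of the range with mmap(MAP_FIXED) */
	all &= refused(mmap(p, page, PROT_READ | PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0)
		       == MAP_FAILED, "mmap(MAP_FIXED)");
	/* 4> size expansion */
	all &= refused(mremap(p, len, 2 * len, MREMAP_MAYMOVE) == MAP_FAILED,
		       "mremap expand");
	/* 5> permission changes */
	all &= refused(mprotect(p, len, PROT_READ | PROT_WRITE) == -1, "mprotect");
	/* 6> discarding pages of read-only anonymous memory */
	all &= refused(madvise(p, len, MADV_DONTNEED) == -1,
		       "madvise(MADV_DONTNEED)");

	munmap(q, len); /* q is not sealed, so this is allowed */
	return all ? 1 : -1;
}
```

Because a sealed mapping cannot be unmapped, the probe deliberately leaves its two-page mapping in place for the lifetime of the process.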
    
    The following input from the RFC has been incorporated into this patch:
    
    Jann Horn: raising awareness and providing valuable insights on the
    destructive madvise operations.
    Linus Torvalds: assisting in defining system call signature and scope.
    Liam R. Howlett: perf optimization.
    Theo de Raadt: sharing the experiences and insight gained from
      implementing mimmutable() in OpenBSD.
    
    Finally, the idea that inspired this patch comes from Stephen Röttger's
    work in Chrome V8 CFI.
    
    [jeffxu@chromium.org: add branch prediction hint, per Pedro]
      Link: https://lkml.kernel.org/r/20240423192825.1273679-2-jeffxu@chromium.org
    Link: https://lkml.kernel.org/r/20240415163527.626541-3-jeffxu@chromium.org
    Signed-off-by: Jeff Xu <jeffxu@chromium.org>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
    Cc: Pedro Falcato <pedro.falcato@gmail.com>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Guenter Roeck <groeck@chromium.org>
    Cc: Jann Horn <jannh@google.com>
    Cc: Jeff Xu <jeffxu@google.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
    Cc: Stephen Röttger <sroettger@google.com>
    Cc: Suren Baghdasaryan <surenb@google.com>
    Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
    Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>