commit b2770da6
Author: Ross Zwisler <ross.zwisler@linux.intel.com>

    mm: add vm_insert_mixed_mkwrite()

    When servicing mmap() reads from file holes, the current DAX code
    allocates a page cache page of all zeroes and places the struct page
    pointer in the mapping->page_tree radix tree.  This has three major
    drawbacks:
    
    1) It consumes memory unnecessarily. For every 4k page that is read via
       a DAX mmap() over a hole, we allocate a new page cache page. This
       means that if you read 1GiB worth of pages, you end up using 1GiB of
       zeroed memory.
    
    2) It is slower than using a common zero page because each page fault
       has more work to do. Instead of just inserting a common zero page we
       have to allocate a page cache page, zero it, and then insert it.
    
    3) The fact that we had to check for both DAX exceptional entries and
       for page cache pages in the radix tree made the DAX code more
       complex.
    
    This series solves these issues by following the lead of the DAX PMD
    code and using a common 4k zero page instead.  This reduces memory usage
    and decreases latencies for some workloads, and it simplifies the DAX
    code, removing over 100 lines in total.
    
    This patch (of 5):
    
    To be able to use the common 4k zero page in DAX we need to have our PTE
    fault path look more like our PMD fault path where a PTE entry can be
    marked as dirty and writeable as it is first inserted rather than
    waiting for a follow-up dax_pfn_mkwrite() => finish_mkwrite_fault()
    call.
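    
    As a rough sketch (not the final code), the new helper is assumed here
    to mirror the existing vm_insert_mixed() signature, and a DAX PTE write
    fault could then install a dirty, writeable mapping in a single call:
    
    	int vm_insert_mixed_mkwrite(struct vm_area_struct *vma,
    			unsigned long addr, pfn_t pfn);
    
    	/* illustrative DAX PTE fault usage, landing later in the series */
    	if (write)
    		error = vm_insert_mixed_mkwrite(vmf->vma, vmf->address, pfn);
    	else
    		error = vm_insert_mixed(vmf->vma, vmf->address, pfn);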
    
    Right now we can rely on having a dax_pfn_mkwrite() call because we can
    distinguish between these two cases in do_wp_page():
    
    	case 1: 4k zero page => writable DAX storage
    	case 2: read-only DAX storage => writeable DAX storage
    
    This distinction is made via vm_normal_page().  vm_normal_page()
    returns NULL for the common 4k zero page, though, just as it does for
    DAX PTEs, so once we switch to the common zero page do_wp_page() can no
    longer tell these two cases apart.  Instead of special-casing the
    DAX + 4k zero page combination we will
    simplify our DAX PTE page fault sequence so that it matches our DAX PMD
    sequence, and get rid of the dax_pfn_mkwrite() helper.  We will instead
    use dax_iomap_fault() to handle write-protection faults.
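    
    For context, the check in question has roughly this shape (simplified
    from do_wp_page(); the comment describes the ambiguity rather than the
    current code):
    
    	vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
    	if (!vmf->page) {
    		/*
    		 * No backing struct page: today this means a pfn/special
    		 * mapping such as a DAX pte.  The common 4k zero page
    		 * would also take this branch, so the two cases above
    		 * could no longer be told apart here.
    		 */
    	}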
    
    This means that insert_pfn() needs to follow the lead of
    insert_pfn_pmd() and allow us to pass in a 'mkwrite' flag.  If 'mkwrite'
    is set insert_pfn() will do the work that was previously done by
    wp_page_reuse() as part of the dax_pfn_mkwrite() call path.
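    
    Sketched out (locking, error paths and the handling of an existing pte
    elided; treat the exact code as illustrative), the mkwrite case in
    insert_pfn() does the pte manipulation that wp_page_reuse() used to do:
    
    	static int insert_pfn(struct vm_area_struct *vma, unsigned long addr,
    			pfn_t pfn, pgprot_t prot, bool mkwrite)
    	{
    		...
    		if (pfn_t_devmap(pfn))
    			entry = pte_mkdevmap(pfn_t_pte(pfn, prot));
    		else
    			entry = pte_mkspecial(pfn_t_pte(pfn, prot));
    
    		if (mkwrite) {
    			/* previously done for us by wp_page_reuse() */
    			entry = pte_mkyoung(entry);
    			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
    		}
    
    		set_pte_at(mm, addr, pte, entry);
    		update_mmu_cache(vma, addr, pte);
    		...
    	}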
    
    Link: http://lkml.kernel.org/r/20170724170616.25810-2-ross.zwisler@linux.intel.com
    Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
    Cc: "Theodore Ts'o" <tytso@mit.edu>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Andreas Dilger <adilger.kernel@dilger.ca>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Matthew Wilcox <mawilcox@microsoft.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>