• Michal Hocko's avatar
    mm: allow GFP_{FS,IO} for page_cache_read page cache allocation · 820ca577
    Michal Hocko authored
    commit c20cd45e upstream.
    
    page_cache_read has been historically using page_cache_alloc_cold to
    allocate a new page.  This means that mapping_gfp_mask is used as the
    base for the gfp_mask.  Many filesystems are setting this mask to
    GFP_NOFS to prevent from fs recursion issues.  page_cache_read is called
    from the vm_operations_struct::fault() context during the page fault.
    This context doesn't need the reclaim protection normally.
    
    ceph and ocfs2 which call filemap_fault from their fault handlers seem
    to be OK because they are not taking any fs lock before invoking generic
    implementation.  xfs which takes XFS_MMAPLOCK_SHARED is safe from the
    reclaim recursion POV because this lock serializes truncate and punch
    hole with the page faults and it doesn't get involved in the reclaim.
    
    There is simply no reason to deliberately use a weaker allocation
    context when a __GFP_FS | __GFP_IO can be used.  The GFP_NOFS protection
    might be even harmful.  There is a push to fail GFP_NOFS allocations
    rather than loop within allocator indefinitely with a very limited
    reclaim ability.  Once we start failing those requests the OOM killer
    might be triggered prematurely because the page cache allocation failure
    is propagated up the page fault path and end up in
    pagefault_out_of_memory.
    
    We cannot play with mapping_gfp_mask directly because that would be racy
    wrt.  parallel page faults and it might interfere with other users who
    really rely on NOFS semantic from the stored gfp_mask.  The mask is also
    inode proper so it would even be a layering violation.  What we can do
    instead is to push the gfp_mask into struct vm_fault and allow fs layer
    to overwrite it should the callback need to be called with a different
    allocation context.
    
    Initialize the default to (mapping_gfp_mask | __GFP_FS | __GFP_IO)
    because this should be safe from the page fault path normally.  Why do
    we care about mapping_gfp_mask at all then? Because this doesn't hold
    only reclaim protection flags but it also might contain zone and
    movability restrictions (GFP_DMA32, __GFP_MOVABLE and others) so we have
    to respect those.
    Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
    Reported-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Acked-by: default avatarJan Kara <jack@suse.com>
    Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
    Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Mark Fasheh <mfasheh@suse.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    820ca577
filemap.c 72.2 KB