• Dave Chinner's avatar
    xfs: prevent stack overflows from page cache allocation · ad22c7a0
    Dave Chinner authored
    Page cache allocation doesn't always go through ->begin_write and
    hence we don't always get the opportunity to set the allocation
    context to GFP_NOFS. Failing to do this means we open up the direct
    relcaim stack to recurse into the filesystem and consume a
    significant amount of stack.
    
    On RHEL6.4 kernels we are seeing ra_submit() and
    generic_file_splice_read() from an nfsd context recursing into the
    filesystem via the inode cache shrinker and evicting inodes. This is
    causing truncation to be run (e.g EOF block freeing) and causing
    bmap btree block merges and free space btree block splits to occur.
    These btree manipulations are occurring with the call chain already
    30 functions deep and hence there is not enough stack space to
    complete such operations.
    
    To avoid these specific overruns, we need to prevent the page cache
    allocation from recursing via direct reclaim. We can do that because
    the allocation functions take the allocation context from that which
    is stored in the mapping for the inode. We don't set that right now,
    so the default is GFP_HIGHUSER_MOVABLE, which is effectively a
    GFP_KERNEL context. We need it to be the equivalent of GFP_NOFS, so
    when we initialise an inode, set the mapping gfp mask appropriately.
    
    This makes the use of AOP_FLAG_NOFS redundant from other parts of
    the XFS IO path, so get rid of it.
    Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
    Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
    Signed-off-by: default avatarBen Myers <bpm@sgi.com>
    ad22c7a0
xfs_iops.c 30.8 KB