    [PATCH] don't allocate ratnodes under PF_MEMALLOC · 49c7ca7c
    Andrew Morton authored
    On the swap_out() path, the radix-tree pagecache is allocating its
    nodes with PF_MEMALLOC set, which allows it to completely exhaust the
    free page lists(*).  This is fairly easy to trigger with swap-intensive
    loads.
    
    It would be better to make those node allocations fail earlier.
    When they do, the radix-tree can still obtain nodes from its
    mempool, and we leave some memory available for the I/O layer.
    (Assuming that the I/O is being performed under PF_MEMALLOC, which it
    is).
    
    So the patch simply drops PF_MEMALLOC while adding nodes to the
    swapcache's tree.
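
As a rough illustration of that idea, here is a minimal userspace sketch, not the code from the patch. The names struct task_sketch, tree_insert_stub() and add_to_swap_cache_sketch() are hypothetical, and the flag value is only illustrative. It shows the save/clear/restore pattern: the flag is cleared for the duration of the tree insert and put back afterwards only if the caller had it set.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag value; assumed for this sketch only. */
#define PF_MEMALLOC 0x0800u

/* Hypothetical stand-in for the current task's flags word. */
struct task_sketch {
	unsigned int flags;
};

/* Records whether the stub insert ran with PF_MEMALLOC still set. */
static bool insert_saw_memalloc;

/* Hypothetical stand-in for the radix-tree insert and its node allocation. */
static int tree_insert_stub(const struct task_sketch *t)
{
	assert(t);
	insert_saw_memalloc = (t->flags & PF_MEMALLOC) != 0;
	return 0;
}

/*
 * Drop PF_MEMALLOC around the insert so the node allocation cannot
 * drain the reserves, then restore the caller's original setting.
 */
static int add_to_swap_cache_sketch(struct task_sketch *t)
{
	unsigned int saved = t->flags & PF_MEMALLOC;
	int err;

	t->flags &= ~PF_MEMALLOC;
	err = tree_insert_stub(t);
	t->flags |= saved;	/* only re-set if it was set on entry */
	return err;
}
```

Saving the old bit rather than unconditionally setting it afterwards matters for callers that enter without PF_MEMALLOC: they must not leave with it.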
    
    We're still performing atomic allocations, so the rat is still biting
    pretty deeply into the page reserves - under heavy load the amount of
    free memory is less than half of what it was pre-rat.
    
    It is unfortunate that the page allocator overloads !__GFP_WAIT to also
    mean "try harder".  It would be better to separate these concepts, and
    to allow the radix-tree code (at least) to perform atomic allocations,
    but to not go below pages_min.  It seems that __GFP_TRY_HARDER will be
    pretty straightforward to implement.  Later.
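
To make the distinction concrete, here is a hedged sketch of the two policies as a userspace model. Everything in it is an assumption for illustration: the SK_* flag names, struct zone_sketch, and the halving of the watermark are invented here and are not the page allocator's actual logic or values.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this sketch only. */
#define SK_GFP_WAIT       0x1u	/* caller may sleep */
#define SK_GFP_TRY_HARDER 0x2u	/* caller may dig below pages_min */

/* Hypothetical, simplified view of a zone's state. */
struct zone_sketch {
	unsigned long free_pages;
	unsigned long pages_min;
};

/* Overloaded policy: "cannot sleep" also means "may dig into reserves". */
static bool may_alloc_overloaded(const struct zone_sketch *z, unsigned int gfp)
{
	unsigned long min = z->pages_min;

	if (!(gfp & SK_GFP_WAIT))
		min /= 2;	/* illustrative amount */
	return z->free_pages > min;
}

/* Separated policy: only an explicit request lowers the watermark. */
static bool may_alloc_separated(const struct zone_sketch *z, unsigned int gfp)
{
	unsigned long min = z->pages_min;

	if (gfp & SK_GFP_TRY_HARDER)
		min /= 2;	/* illustrative amount */
	return z->free_pages > min;
}
```

Under the separated policy an atomic caller such as the radix-tree code would pass neither flag and would be refused once free_pages falls to pages_min, while callers that really need the reserves would ask for them explicitly.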
    
    The patch also implements a workaround for the mempool list_head
    problem, until that is sorted out.
    
    
    
    (*) The usual result is that the SCSI layer dies at scsi_merge.c:82.
    It would be nice to have a fix for that - it goes BUG if order-1
    allocations fail at interrupt time.  That happens pretty easily.
vmscan.c 19.1 KB