    Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements" · 8d80b5e1
    Michal Hocko authored
    commit 4e390b2b upstream.
    
    This reverts commit f9054c70 ("mm, mempool: only set __GFP_NOMEMALLOC
    if there are free elements").
    
    There has been a report of the OOM killer being invoked when swapping
    out to a dm-crypt device.  The primary reason seems to be that the
    swapout IO managed to completely deplete memory reserves.  Ondrej was
    able to bisect the problem and explained it by pointing to f9054c70
    ("mm, mempool: only set __GFP_NOMEMALLOC if there are free elements").
    
    The reason is that the swapout path is not throttled properly because
    the md-raid layer needs to allocate from the generic_make_request path
    which means it allocates from the PF_MEMALLOC context.  The dm layer
    uses mempool_alloc in order to guarantee forward progress, and
    mempool_alloc used to inhibit access to memory reserves when using
    the page allocator.  This was changed by f9054c70 ("mm, mempool: only
    set __GFP_NOMEMALLOC if there are free elements"), which dropped the
    __GFP_NOMEMALLOC protection when the memory pool is depleted.
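
    The difference between the two behaviours can be modelled in user
    space.  The following is only a rough sketch, not the kernel code:
    SKETCH_GFP_NOMEMALLOC is a made-up stand-in for __GFP_NOMEMALLOC, all
    sketch_* helpers are hypothetical, and the reserve check is reduced to
    "a PF_MEMALLOC caller may use reserves unless the flag forbids it".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for __GFP_NOMEMALLOC; the real value is kernel-defined. */
#define SKETCH_GFP_NOMEMALLOC 0x1u

/*
 * Behaviour of f9054c70 as described above (assumed shape): forbid access
 * to memory reserves only while the pool still has free elements.
 */
static unsigned int sketch_mask_f9054c70(unsigned int gfp_mask, int free_elements)
{
	if (free_elements > 0)
		gfp_mask |= SKETCH_GFP_NOMEMALLOC;
	return gfp_mask;
}

/* Behaviour restored by the revert: always forbid access to reserves. */
static unsigned int sketch_mask_reverted(unsigned int gfp_mask, int free_elements)
{
	(void)free_elements;	/* pool state no longer matters */
	return gfp_mask | SKETCH_GFP_NOMEMALLOC;
}

/* Simplified model of the page allocator's decision for a PF_MEMALLOC caller. */
static bool sketch_may_use_reserves(unsigned int gfp_mask, bool pf_memalloc)
{
	return pf_memalloc && !(gfp_mask & SKETCH_GFP_NOMEMALLOC);
}

/* Walk through the four combinations of policy and pool state. */
static void sketch_check(void)
{
	/* f9054c70 with a depleted pool: reserves become reachable. */
	assert(sketch_may_use_reserves(sketch_mask_f9054c70(0, 0), true));
	/* f9054c70 with free elements: reserves stay protected. */
	assert(!sketch_may_use_reserves(sketch_mask_f9054c70(0, 4), true));
	/* After the revert: reserves stay protected in both cases. */
	assert(!sketch_may_use_reserves(sketch_mask_reverted(0, 0), true));
	assert(!sketch_may_use_reserves(sketch_mask_reverted(0, 4), true));
}
```

    In this model only the depleted-pool case under f9054c70 reaches the
    reserves, which is the case the swapout path keeps hitting.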
    
    If we are running out of memory and the only way to free memory is to
    perform swapout, we keep consuming memory reserves rather than
    throttling the mempool allocations and allowing the pending IO to
    complete.  This continues until memory is depleted completely and
    there is no way forward but to invoke the OOM killer.  This is less
    than optimal.
    
    The original intention of f9054c70 was to help with OOM situations
    where the OOM victim depends on a mempool allocation to make forward
    progress.  David has mentioned the following backtrace:
    
      schedule
      schedule_timeout
      io_schedule_timeout
      mempool_alloc
      __split_and_process_bio
      dm_request
      generic_make_request
      submit_bio
      mpage_readpages
      ext4_readpages
      __do_page_cache_readahead
      ra_submit
      filemap_fault
      handle_mm_fault
      __do_page_fault
      do_page_fault
      page_fault
    
    We do not know more about why the mempool is depleted without being
    replenished in time, though.  In any case the dm layer shouldn't depend
    on any allocations outside of the dedicated pools, so forward progress
    should be guaranteed.  If this is not the case then dm should be
    fixed rather than papering over the problem and postponing it by
    accessing more memory reserves.
    
    Mempools are a mechanism to maintain dedicated memory reserves to
    guarantee forward progress.  Allowing them unbounded access to the
    page allocator memory reserves goes against the whole purpose of
    this mechanism.
    
    Bisected by Ondrej Kozina.
    
    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20160721145309.GR26379@dhcp22.suse.cz
    Signed-off-by: Michal Hocko <mhocko@suse.com>
    Reported-by: Ondrej Kozina <okozina@redhat.com>
    Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
    Acked-by: NeilBrown <neilb@suse.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Mikulas Patocka <mpatocka@redhat.com>
    Cc: Ondrej Kozina <okozina@redhat.com>
    Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
    Cc: Mel Gorman <mgorman@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>