    mm,mremap: bail out earlier in mremap_to under map pressure · ea2c3f6f
    Oscar Salvador authored
    When the mremap() syscall is used with the MREMAP_FIXED flag, mremap()
    calls mremap_to(), which does the following:
    
    1) Unmaps the destination region to which we are going to move the map
    2) If the new region is going to be smaller, unmaps the last part
       of the old region
    
    Then, we will eventually call move_vma() to do the actual move.
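
    For reference, the userspace side of this path can be sketched as
    below.  This is a minimal illustration, not part of the patch: the
    helper name move_mapping_fixed() is hypothetical, and the sketch only
    assumes the documented Linux mremap(2) interface, where MREMAP_FIXED
    must be combined with MREMAP_MAYMOVE.

```c
#define _GNU_SOURCE		/* needed for mremap() and MREMAP_* in glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, for illustration only: maps an anonymous region of
 * len bytes, reserves a second region of the same size, and moves the
 * first onto the second with MREMAP_FIXED.
 * Returns 0 on success, -1 on failure.
 */
static int move_mapping_fixed(size_t len)
{
	assert(len > 0);

	char *old = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (old == MAP_FAILED)
		return -1;

	/* Reserve a destination; mremap() unmaps it before the move. */
	char *dst = mmap(NULL, len, PROT_NONE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (dst == MAP_FAILED) {
		munmap(old, len);
		return -1;
	}

	memset(old, 0x5a, len);

	/* MREMAP_FIXED is only valid together with MREMAP_MAYMOVE. */
	char *moved = mremap(old, len, len,
			     MREMAP_MAYMOVE | MREMAP_FIXED, dst);
	if (moved == MAP_FAILED) {
		/*
		 * The caller cannot tell from the error alone which of the
		 * two regions are still mapped, so try to release both.
		 */
		munmap(old, len);
		munmap(dst, len);
		return -1;
	}

	/* The contents must have moved to the requested address. */
	int ok = (moved == dst) && moved[0] == 0x5a && moved[len - 1] == 0x5a;

	munmap(moved, len);
	return ok ? 0 : -1;
}
```

    The error branch is where the ambiguity discussed in this commit
    shows up: after a failed call, userspace has no direct way to know
    whether the old and destination regions were already unmapped.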
    
    move_vma() checks whether we are at least 4 maps below max_map_count
    before going further, otherwise it bails out with -ENOMEM.  The problem
    is that we might have already unmapped the vma's in steps 1) and 2), so
    it is not possible for userspace to figure out the state of the vmas
    after it gets -ENOMEM, and it gets tricky for userspace to clean up
    properly on the error path.
    
    While it is true that we can return -ENOMEM for more reasons (e.g. see
    may_expand_vm() or move_page_tables()), I think that we can avoid this
    scenario if we check early in mremap_to() whether the operation is
    likely to succeed map-wise.
    
    Should that not be the case, we can bail out before we even try to unmap
    anything, so we make sure the vma's are left untouched in case we are
    likely to be short of maps.
    
    The rule of thumb is now to rely on the worst-case scenario.  That is
    when both vma's (old region and new region) are going to be split in
    3, so we get two more maps on top of the ones we already hold (one for
    each).  If the current map count + 2 maps still leaves us at least 4
    maps below the threshold, we are going to pass the check in move_vma().
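
    The arithmetic of this rule can be sketched as a small userspace
    model.  The function names below are hypothetical and this is not
    the kernel code itself; the comparison is derived from the
    description above ("at least 4 maps below max_map_count"), so treat
    the exact form as an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the check described for move_vma(): the move is
 * refused unless we are at least 4 maps below the limit.
 */
static bool move_vma_would_bail(int map_count, int max_map_count)
{
	return map_count >= max_map_count - 3;
}

/*
 * Hypothetical model of the early check in mremap_to(): assume the worst
 * case, in which splitting the old and the new vma adds one map each
 * (2 in total), and bail out before anything has been unmapped.
 */
static int mremap_to_early_check(int map_count, int max_map_count)
{
	assert(map_count >= 0 && max_map_count >= 0);

	if (move_vma_would_bail(map_count + 2, max_map_count))
		return -ENOMEM;
	return 0;
}
```

    With a limit of 65530 in this model, a map count of 65524 passes the
    early check, while 65525 is rejected because 65525 + 2 is no longer
    4 maps below the limit.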
    
    Of course, this is not free, as it might generate false positives: we
    may really be tight map-wise, yet the unmap operations could have
    released several vma's and led us to a good state.
    
    Another approach was also investigated [1], but it may be too much
    hassle for what it brings.
    
    [1] https://lore.kernel.org/lkml/20190219155320.tkfkwvqk53tfdojt@d104.suse.de/
    
    Link: http://lkml.kernel.org/r/20190226091314.18446-1-osalvador@suse.de
    Signed-off-by: Oscar Salvador <osalvador@suse.de>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
    Cc: Yang Shi <yang.shi@linux.alibaba.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Cyril Hrubis <chrubis@suse.cz>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>