    mlock: only hold mmap_sem in shared mode when faulting in pages · fed067da
    Michel Lespinasse authored
    Currently mlock() holds mmap_sem in exclusive mode while the pages get
    faulted in.  In the case of a large mlock, this can potentially take a
    very long time, during which various commands such as 'ps auxw' will
    block.  This makes sysadmins unhappy:
    
    real    14m36.232s
    user    0m0.003s
    sys     0m0.015s
    (output from 'time ps auxw' while a 20GB file was being mlocked without
    being previously preloaded into page cache)
    
    I propose that mlock() could release mmap_sem after the VM_LOCKED bits
    have been set in all appropriate VMAs.  Then a second pass could be done
    to actually mlock the pages, in small batches, releasing mmap_sem when we
    block on disk access or when we detect some contention.
    
    This patch:
    
    Before this change, mlock() holds mmap_sem in exclusive mode while the
    pages get faulted in.  In the case of a large mlock, this can potentially
    take a very long time.  Anything that needs mmap_sem, even in shared
    mode, will block while it is held exclusively, including 'ps auxw'.
    This can make sysadmins angry.
    
    I propose that mlock() could release mmap_sem after the VM_LOCKED bits
    have been set in all appropriate VMAs.  Then a second pass could be done
    to actually mlock the pages with mmap_sem held in shared (read) mode
    only.  We need to recheck the vma flags after we re-acquire mmap_sem,
    since they may have changed while it was released, but this is easy.
    
    In the case where a vma has been munlocked before mlock completes, pages
    that were already marked as PageMlocked() are handled by the munlock()
    call, and mlock() is careful to not mark new page batches as PageMlocked()
    after the munlock() call has cleared the VM_LOCKED vma flags.  So, the end
    result will be identical to what would happen if munlock() had executed
    after the mlock() call.
    
    In a later change, I will allow the second pass to release mmap_sem when
    blocking on disk accesses or when it is otherwise contended, so that it
    won't be held for long periods of time even in shared mode.

    Signed-off-by: Michel Lespinasse <walken@google.com>
    Tested-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Nick Piggin <npiggin@kernel.dk>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: David Howells <dhowells@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>