Commit 88f306b6 authored by Kirill A. Shutemov, committed by Linus Torvalds

mm: fix locking order in mm_take_all_locks()

Dmitry Vyukov has reported[1] possible deadlock (triggered by his
syzkaller fuzzer):

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&hugetlbfs_i_mmap_rwsem_key);
                               lock(&mapping->i_mmap_rwsem);
                               lock(&hugetlbfs_i_mmap_rwsem_key);
  lock(&mapping->i_mmap_rwsem);

Both traces point to mm_take_all_locks() as the source of the problem.
It doesn't take care of the ordering of hugetlbfs_i_mmap_rwsem_key (aka
mapping->i_mmap_rwsem for hugetlb mappings) vs. i_mmap_rwsem.

huge_pmd_share() does a memory allocation under hugetlbfs_i_mmap_rwsem_key,
and the allocator can take i_mmap_rwsem if it hits reclaim.  So we need to
take i_mmap_rwsem for all hugetlb VMAs before taking i_mmap_rwsem for the
rest of the VMAs.
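
To make the required ordering concrete, here is a small user-space sketch
(not kernel code; the names hugetlb_rwsem, regular_rwsem,
huge_pmd_share_like() and take_all_locks_like() are invented for
illustration, with pthread rwlocks standing in for i_mmap_rwsem).  It shows
the rule the patch enforces: every hugetlb-class lock is taken before any
regular-class lock, matching the huge_pmd_share() -> reclaim dependency:

/* Built as a normal user-space program: cc -pthread order-demo.c */
#include <pthread.h>
#include <stdio.h>

#define NR_HUGETLB 2
#define NR_REGULAR 3

/* Two lock classes, standing in for hugetlbfs_i_mmap_rwsem_key and the
 * regular mapping->i_mmap_rwsem. */
static pthread_rwlock_t hugetlb_rwsem[NR_HUGETLB] = {
        PTHREAD_RWLOCK_INITIALIZER, PTHREAD_RWLOCK_INITIALIZER,
};
static pthread_rwlock_t regular_rwsem[NR_REGULAR] = {
        PTHREAD_RWLOCK_INITIALIZER, PTHREAD_RWLOCK_INITIALIZER,
        PTHREAD_RWLOCK_INITIALIZER,
};

/* Analogue of huge_pmd_share(): an allocation made while a hugetlb-class
 * lock is held may, via reclaim, take a regular-class lock.  The implied
 * order is therefore hugetlb class first, regular class second. */
static void huge_pmd_share_like(int h, int r)
{
        pthread_rwlock_wrlock(&hugetlb_rwsem[h]);
        pthread_rwlock_wrlock(&regular_rwsem[r]);       /* "reclaim" */
        pthread_rwlock_unlock(&regular_rwsem[r]);
        pthread_rwlock_unlock(&hugetlb_rwsem[h]);
}

/* Analogue of the fixed mm_take_all_locks(): take every hugetlb-class
 * lock before any regular-class lock, so both paths agree on the order. */
static void take_all_locks_like(void)
{
        int i;

        for (i = 0; i < NR_HUGETLB; i++)
                pthread_rwlock_wrlock(&hugetlb_rwsem[i]);
        for (i = 0; i < NR_REGULAR; i++)
                pthread_rwlock_wrlock(&regular_rwsem[i]);
}

static void drop_all_locks_like(void)
{
        int i;

        for (i = 0; i < NR_REGULAR; i++)
                pthread_rwlock_unlock(&regular_rwsem[i]);
        for (i = 0; i < NR_HUGETLB; i++)
                pthread_rwlock_unlock(&hugetlb_rwsem[i]);
}

int main(void)
{
        huge_pmd_share_like(0, 1);      /* hugetlb class -> regular class */
        take_all_locks_like();          /* same class order: no inversion */
        drop_all_locks_like();
        puts("both paths take the lock classes in the same order");
        return 0;
}

Before this patch, mm_take_all_locks() walked the VMA list in a single pass,
so it could take a regular mapping's i_mmap_rwsem before a hugetlb one and
invert the order above.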

The patch also documents locking order for hugetlbfs_i_mmap_rwsem_key.

[1] http://lkml.kernel.org/r/CACT4Y+Zu95tBs-0EvdiAKzUOsb4tczRRfCRTpLr4bg_OP9HuVg@mail.gmail.com

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent d645fc0e
@@ -708,7 +708,7 @@ static struct inode *hugetlbfs_get_root(struct super_block *sb,
 /*
  * Hugetlbfs is not reclaimable; therefore its i_mmap_rwsem will never
  * be taken from reclaim -- unlike regular filesystems. This needs an
- * annotation because huge_pmd_share() does an allocation under
+ * annotation because huge_pmd_share() does an allocation under hugetlb's
  * i_mmap_rwsem.
  */
 static struct lock_class_key hugetlbfs_i_mmap_rwsem_key;
...
@@ -3184,10 +3184,16 @@ static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping)
  * mapping->flags avoid to take the same lock twice, if more than one
  * vma in this mm is backed by the same anon_vma or address_space.
  *
- * We can take all the locks in random order because the VM code
- * taking i_mmap_rwsem or anon_vma->rwsem outside the mmap_sem never
- * takes more than one of them in a row. Secondly we're protected
- * against a concurrent mm_take_all_locks() by the mm_all_locks_mutex.
+ * We take locks in the following order, according to the comment at the
+ * beginning of mm/rmap.c:
+ *   - all hugetlbfs_i_mmap_rwsem_key locks (aka mapping->i_mmap_rwsem for
+ *     hugetlb mappings);
+ *   - all i_mmap_rwsem locks;
+ *   - all anon_vma->rwsem locks.
+ *
+ * We can take all locks within these types in any order because the VM code
+ * doesn't nest them and we are protected from parallel mm_take_all_locks()
+ * by mm_all_locks_mutex.
  *
  * mm_take_all_locks() and mm_drop_all_locks are expensive operations
  * that may have to take thousand of locks.
@@ -3206,7 +3212,16 @@ int mm_take_all_locks(struct mm_struct *mm)
 	for (vma = mm->mmap; vma; vma = vma->vm_next) {
 		if (signal_pending(current))
 			goto out_unlock;
-		if (vma->vm_file && vma->vm_file->f_mapping)
+		if (vma->vm_file && vma->vm_file->f_mapping &&
+				is_vm_hugetlb_page(vma))
+			vm_lock_mapping(mm, vma->vm_file->f_mapping);
+	}
+
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		if (signal_pending(current))
+			goto out_unlock;
+		if (vma->vm_file && vma->vm_file->f_mapping &&
+				!is_vm_hugetlb_page(vma))
 			vm_lock_mapping(mm, vma->vm_file->f_mapping);
 	}
 
...
@@ -23,6 +23,7 @@
  * inode->i_mutex	(while writing or truncating, not reading or faulting)
  *   mm->mmap_sem
  *     page->flags PG_locked (lock_page)
+ *       hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share)
  *       mapping->i_mmap_rwsem
  *         anon_vma->rwsem
  *           mm->page_table_lock or pte_lock
...