Commit 4779280d authored by Ying Han, committed by Linus Torvalds

mm: make get_user_pages() interruptible

The initial implementation checks TIF_MEMDIE, which covers only the OOM
killing case.  If the process has been OOM killed, TIF_MEMDIE is set and
get_user_pages() returns immediately.  This patch makes two changes:

1.  Add the case where the SIGKILL is sent by a user process.  The
   process can keep calling get_user_pages() for an unlimited amount of
   memory even after a user process has sent it a SIGKILL (for example, a
   monitor finds that the process exceeds its memory limit and tries to
   kill it).  In the old implementation, the SIGKILL is not handled until
   get_user_pages() returns.

2.  Change the return value to ERESTARTSYS.  It makes no sense to
   return ENOMEM when get_user_pages() returns because of a pending
   SIGKILL.  The general convention for a system call interrupted by a
   signal is ERESTARTSYS, so the new return value is consistent with
   that.
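
The return convention in the two points above can be sketched in user space. This is only an illustration, not the kernel code: struct sim_task, sim_fatal_signal_pending() and sim_get_user_pages() are hypothetical names invented for the sketch, "pinning" a page is reduced to incrementing a counter, and ERESTARTSYS is defined locally with the value the kernel's internal errno header is assumed to use.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: mirrors the kernel-internal restart code (not a userspace errno). */
#define ERESTARTSYS 512

/* Hypothetical stand-in for the task state that fatal_signal_pending() inspects. */
struct sim_task {
	bool sigkill_pending;	/* SIGKILL already pending on entry */
	int kill_after;		/* SIGKILL "arrives" once this many pages are pinned; < 0 means never */
};

static bool sim_fatal_signal_pending(const struct sim_task *t, int pinned)
{
	return t->sigkill_pending ||
	       (t->kill_after >= 0 && pinned >= t->kill_after);
}

/*
 * Simplified model of the per-page loop: before handling each page, stop
 * if a fatal signal is pending, unless the caller asked to ignore it.
 * Returns the number of pages handled so far, or -ERESTARTSYS if none.
 */
static int sim_get_user_pages(const struct sim_task *t, int len,
			      bool ignore_sigkill)
{
	int i = 0;

	assert(t);
	if (len <= 0)
		return 0;

	while (i < len) {
		if (!ignore_sigkill && sim_fatal_signal_pending(t, i))
			return i ? i : -ERESTARTSYS;
		i++;	/* pretend one page was faulted in and pinned */
	}
	return i;
}
```

A caller that has already made progress gets the partial count back, and only a caller that made no progress sees -ERESTARTSYS, which is the same "i ? i : error" shape the patch uses.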

Lee:

An unfortunate side effect of "make-get_user_pages-interruptible" is that
it prevents a SIGKILL'd task from munlock-ing pages that it had mlocked,
resulting in freeing of mlocked pages.  Freeing of mlocked pages, in
itself, is not so bad.  We just count them now, although I had hoped to
remove this stat and add PG_MLOCKED to the free pages flags check.

However, consider pages in shared libraries mapped by more than one task
that a task mlocked--e.g., via mlockall().  If the task that mlocked the
pages exits via SIGKILL, these pages would be left mlocked and
unevictable.

Proposed fix:

Add another GUP flag to ignore SIGKILL when calling get_user_pages() from
munlock(), similar to Kosaki Motohiro's IGNORE_VMA_PERMISSIONS flag for
the same purpose.  We are not actually allocating memory in this case,
which "make-get_user_pages-interruptible" intends to avoid.  We're just
munlocking pages that are already resident and mapped, and we're reusing
get_user_pages() to access those pages.

??  Maybe we should combine IGNORE_VMA_PERMISSIONS and IGNORE_SIGKILL
into a single flag: GUP_FLAGS_MUNLOCK ???
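
As a rough sketch of the combined-flag idea raised in the question above: GUP_FLAGS_MUNLOCK is not defined by this patch, and sketch_gup_flags() is a hypothetical helper that only mirrors the flag selection done on the munlock path. The four flag values are the ones shown in the header change in this commit.

```c
/* Flag values as they appear in this commit's header change. */
#define GUP_FLAGS_WRITE                  0x1
#define GUP_FLAGS_FORCE                  0x2
#define GUP_FLAGS_IGNORE_VMA_PERMISSIONS 0x4
#define GUP_FLAGS_IGNORE_SIGKILL         0x8

/* Hypothetical: a single name for the two bits that munlock needs. */
#define GUP_FLAGS_MUNLOCK \
	(GUP_FLAGS_IGNORE_VMA_PERMISSIONS | GUP_FLAGS_IGNORE_SIGKILL)

/*
 * Hypothetical helper mirroring the flag selection for a VMA range:
 * munlock (mlock == 0) ignores both VMA permissions and a pending
 * SIGKILL; a writable VMA additionally requests write access.
 */
static int sketch_gup_flags(int mlock, int vma_writable)
{
	int gup_flags = 0;

	if (!mlock)
		gup_flags |= GUP_FLAGS_MUNLOCK;
	if (vma_writable)
		gup_flags |= GUP_FLAGS_WRITE;
	return gup_flags;
}
```

Keeping the two bits separate, as the patch does, leaves room for other callers to use one without the other; a combined name would only be shorthand for the munlock case.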

[Lee.Schermerhorn@hp.com: ignore sigkill in get_user_pages during munlock]
Signed-off-by: Paul Menage <menage@google.com>
Signed-off-by: Ying Han <yinghan@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Rohit Seth <rohitseth@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent 91bf189c
@@ -276,6 +276,7 @@ static inline void mminit_validate_memmodel_limits(unsigned long *start_pfn,
 #define GUP_FLAGS_WRITE                  0x1
 #define GUP_FLAGS_FORCE                  0x2
 #define GUP_FLAGS_IGNORE_VMA_PERMISSIONS 0x4
+#define GUP_FLAGS_IGNORE_SIGKILL         0x8
 int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		     unsigned long start, int len, int flags,
......
@@ -1210,6 +1210,7 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 	int write = !!(flags & GUP_FLAGS_WRITE);
 	int force = !!(flags & GUP_FLAGS_FORCE);
 	int ignore = !!(flags & GUP_FLAGS_IGNORE_VMA_PERMISSIONS);
+	int ignore_sigkill = !!(flags & GUP_FLAGS_IGNORE_SIGKILL);
 	if (len <= 0)
 		return 0;
@@ -1288,12 +1289,15 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 			struct page *page;
 			/*
-			 * If tsk is ooming, cut off its access to large memory
-			 * allocations. It has a pending SIGKILL, but it can't
-			 * be processed until returning to user space.
+			 * If we have a pending SIGKILL, don't keep faulting
+			 * pages and potentially allocating memory, unless
+			 * current is handling munlock--e.g., on exit. In
+			 * that case, we are not allocating memory. Rather,
+			 * we're only unlocking already resident/mapped pages.
 			 */
-			if (unlikely(test_tsk_thread_flag(tsk, TIF_MEMDIE)))
-				return i ? i : -ENOMEM;
+			if (unlikely(!ignore_sigkill &&
+					fatal_signal_pending(current)))
+				return i ? i : -ERESTARTSYS;
 			if (write)
 				foll_flags |= FOLL_WRITE;
......
@@ -173,12 +173,13 @@ static long __mlock_vma_pages_range(struct vm_area_struct *vma,
 		  (atomic_read(&mm->mm_users) != 0));
 	/*
-	 * mlock:   don't page populate if page has PROT_NONE permission.
-	 * munlock: the pages always do munlock althrough
-	 *          its has PROT_NONE permission.
+	 * mlock:   don't page populate if vma has PROT_NONE permission.
+	 * munlock: always do munlock although the vma has PROT_NONE
+	 *          permission, or SIGKILL is pending.
 	 */
 	if (!mlock)
-		gup_flags |= GUP_FLAGS_IGNORE_VMA_PERMISSIONS;
+		gup_flags |= GUP_FLAGS_IGNORE_VMA_PERMISSIONS |
+			     GUP_FLAGS_IGNORE_SIGKILL;
 	if (vma->vm_flags & VM_WRITE)
 		gup_flags |= GUP_FLAGS_WRITE;
......