Commit ef982393 authored by Guillaume Morin's avatar Guillaume Morin Committed by Linus Torvalds

kernel/exit.c: call proc_exit_connector() after exit_state is set

The process events connector delivers a notification when a process
exits.  This is really convenient for a process that spawns and wants to
monitor its children through an epoll-able() interface.

Unfortunately, there is a small window between when the event is
delivered and the child become wait()-able.

This is creates a race if the parent wants to make sure that it knows
about the exit, e.g

pid_t pid = fork();
if (pid > 0) {
	register_interest_for_pid(pid);
	if (waitpid(pid, NULL, WNOHANG) > 0)
	{
	  /* We might have raced with exit() */
	}
	return;
}

/* Child */
execve(...)

register_interest_for_pid() would be telling the the connector socket
reader to pay attention to events related to pid.

Though this is not a bug, I think it would make the connector a bit more
usable if this race was closed by simply moving the call to
proc_exit_connector() from just before exit_notify() to right after.

Oleg said:

: Even with this patch the code above is still "racy" if the child is
: multi-threaded.  Plus it should obviously filter-out subthreads.  And
: afaics there is no way to make it reliable, even if you change the code
: above so that waitpid() is called only after the last thread exits WNOHANG
: still can fail.
Signed-off-by: default avatarGuillaume Morin <guillaume@morinfr.org>
Cc: Matt Helsley <matt.helsley@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
parent 4bcb8232
...@@ -802,13 +802,13 @@ void do_exit(long code) ...@@ -802,13 +802,13 @@ void do_exit(long code)
module_put(task_thread_info(tsk)->exec_domain->module); module_put(task_thread_info(tsk)->exec_domain->module);
proc_exit_connector(tsk);
/* /*
* FIXME: do that only when needed, using sched_exit tracepoint * FIXME: do that only when needed, using sched_exit tracepoint
*/ */
flush_ptrace_hw_breakpoint(tsk); flush_ptrace_hw_breakpoint(tsk);
exit_notify(tsk, group_dead); exit_notify(tsk, group_dead);
proc_exit_connector(tsk);
#ifdef CONFIG_NUMA #ifdef CONFIG_NUMA
task_lock(tsk); task_lock(tsk);
mpol_put(tsk->mempolicy); mpol_put(tsk->mempolicy);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment