    job control: Fix ptracer wait(2) hang and explain notask_error clearing · 9b84cca2
    Tejun Heo authored
    
    
    wait(2) and friends allow access to stopped/continued states through
    zombies, which is required as the states are process-wide and should
    be accessible whether the leader task is alive or undead.
    wait_consider_task() implements this by always clearing notask_error
    and going through wait_task_stopped/continued() for unreaped zombies.
    
    However, while ptraced, the stopped state is per-task.  Once the
    ptracee becomes a zombie, there is no further stopped event to wait
    for, so wait(2) and friends should return -ECHILD for the tracee.
    
    Fix it by clearing notask_error for ptraced zombies only if
    WCONTINUED | WEXITED is set.  While at it, document why clearing
    notask_error is safe in each case.
    
    Test case follows.
    
      #include <stdio.h>
      #include <unistd.h>
      #include <signal.h>
      #include <pthread.h>
      #include <time.h>
      #include <sys/types.h>
      #include <sys/ptrace.h>
      #include <sys/wait.h>
    
      static void *nooper(void *arg)
      {
    	  pause();
    	  return NULL;
      }
    
      int main(void)
      {
    	  const struct timespec ts1s = { .tv_sec = 1 };
    	  pid_t tracee, tracer;
    	  siginfo_t si;
    
    	  tracee = fork();
    	  if (tracee == 0) {
    		  pthread_t thr;
    
    		  pthread_create(&thr, NULL, nooper, NULL);
    		  nanosleep(&ts1s, NULL);
    		  printf("tracee exiting\n");
    		  pthread_exit(NULL);	/* let subthread run */
    	  }
    
    	  tracer = fork();
    	  if (tracer == 0) {
    		  ptrace(PTRACE_ATTACH, tracee, NULL, NULL);
    		  while (1) {
    			  if (waitid(P_PID, tracee, &si, WSTOPPED) < 0) {
    				  perror("waitid");
    				  break;
    			  }
    			  ptrace(PTRACE_CONT, tracee, NULL,
    				 (void *)(long)si.si_status);
    		  }
    		  return 0;
    	  }
    
    	  waitid(P_PID, tracer, &si, WEXITED);
    	  kill(tracee, SIGKILL);
    	  return 0;
      }
    
    Before the patch, after the tracee becomes a zombie, the tracer's
    waitid(WSTOPPED) never returns and the program doesn't terminate.
    
      tracee exiting
      ^C
    
    After the patch, the tracee's exit makes waitid() fail with -ECHILD.
    
      tracee exiting
      waitid: No child processes
    
    -v2: Oleg pointed out that an exited event, in addition to a
         continued one, can happen for a ptraced dead group leader.
         Clear notask_error for a ptraced child on WEXITED too.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Acked-by: Oleg Nesterov <oleg@redhat.com>