    rcu-tasks: Fix grace-period/unlock race in RCU Tasks Trace · ba3a86e4
    Paul E. McKenney authored
    The more intense grace-period processing resulting from the 50x RCU
    Tasks Trace grace-period speedups exposed the following race condition:
    
    o	Task A running on CPU 0 executes rcu_read_lock_trace(),
    	entering a read-side critical section.
    
    o	When Task A eventually invokes rcu_read_unlock_trace()
    	to exit its read-side critical section, this function
    	notes that the ->trc_reader_special.s flag is zero and
    	therefore will set ->trc_reader_nesting to zero
    	using WRITE_ONCE().  But before that happens...
    
    o	The RCU Tasks Trace grace-period kthread running on some other
    	CPU interrogates Task A, but this fails because this task is
    	currently running.  This kthread therefore sends an IPI to CPU 0.
    
    o	CPU 0 receives the IPI, and thus invokes trc_read_check_handler().
    	Because Task A has not yet cleared its ->trc_reader_nesting
    	counter, this function sees that Task A is still within its
    	read-side critical section.  This function therefore sets the
    	->trc_reader_special.b.need_qs flag, AKA the .need_qs flag.
    
    	Except that Task A has already checked the .need_qs flag, which
    	is part of the ->trc_reader_special.s flag.  The .need_qs flag
    	therefore remains set until Task A's next rcu_read_unlock_trace().
    
    o	Task A now invokes synchronize_rcu_tasks_trace(), which cannot
    	start a new grace period until the current grace period completes,
    	and thus cannot return until after that time.
    
    	But Task A's .need_qs flag is still set, which prevents the current
    	grace period from completing.  And because Task A is blocked, it
    	will never execute rcu_read_unlock_trace() until its call to
    	synchronize_rcu_tasks_trace() returns.
    
    	We are therefore deadlocked.
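The interleaving above can be replayed deterministically in user space. The sketch below is a hypothetical model, not kernel code: `struct model_task`, `model_ipi_handler()` and `replay_pre_fix_race()` are invented for illustration. `->trc_reader_special.b.need_qs` is reduced to a plain `bool`, and the IPI is modeled as an ordinary function call placed between the unlock path's check of the special flags and its store to `->trc_reader_nesting`, which is where the race above has it land.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the task_struct fields named in the commit
 * message; the kernel's real layout differs. */
struct model_task {
	int trc_reader_nesting;  /* models ->trc_reader_nesting */
	bool need_qs;            /* models ->trc_reader_special.b.need_qs */
};

/* Pre-fix IPI handler behavior as described above: any nonzero nesting
 * count is taken to mean "still in a read-side critical section". */
static void model_ipi_handler(struct model_task *t)
{
	if (t->trc_reader_nesting != 0)
		t->need_qs = true;  /* ask the reader to report at unlock time */
}

/* Replays the race step by step.  Returns true if Task A ends up outside
 * any critical section with .need_qs still set, so that nobody will ever
 * report the quiescent state the grace period is waiting for. */
static bool replay_pre_fix_race(void)
{
	struct model_task a = { .trc_reader_nesting = 0, .need_qs = false };
	int nesting;
	bool special;

	a.trc_reader_nesting++;           /* rcu_read_lock_trace() */

	/* rcu_read_unlock_trace(), first half: compute the new count and
	 * check the special flags, which are still clear. */
	nesting = a.trc_reader_nesting - 1;
	special = a.need_qs;

	model_ipi_handler(&a);            /* the IPI lands in the window */

	/* Second half: the fast path stores the new count and returns
	 * without looking at .need_qs again. */
	if (!special || nesting) {
		a.trc_reader_nesting = nesting;
	} else {
		a.need_qs = false;        /* slow path would report here */
		a.trc_reader_nesting = nesting;
	}

	return a.trc_reader_nesting == 0 && a.need_qs;
}
```

Calling `replay_pre_fix_race()` returns true: Task A's count is back to zero, yet `.need_qs` is still set, which is the stranded state that blocks the grace period.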
    
    This race is improbable, but 80 hours of rcutorture made it happen twice.
    The race was possible before the grace-period speedup, but roughly 50x
    less probable.  Several thousand hours of rcutorture would have been
    necessary to have a reasonable chance of making this happen before this
    50x speedup.
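These figures agree with a back-of-the-envelope estimate. The estimate assumes, purely for illustration, that failures arrive as a Poisson process; two events in 80 hours is a very small sample, so the rates are rough.

```latex
\begin{aligned}
\lambda_{\text{after}} &\approx \frac{2}{80\,\mathrm{h}} = \frac{1}{40\,\mathrm{h}}, \qquad
\lambda_{\text{before}} \approx \frac{\lambda_{\text{after}}}{50} = \frac{1}{2000\,\mathrm{h}} \\
P(\ge 1 \text{ failure in } T) &= 1 - e^{-\lambda_{\text{before}} T} \\
T = 2000\,\mathrm{h} &\Rightarrow 1 - e^{-1} \approx 0.63, \qquad
T = 4600\,\mathrm{h} \Rightarrow 1 - e^{-2.3} \approx 0.90
\end{aligned}
```

Under this model, a pre-speedup run of roughly two to five thousand hours would have been needed for a better-than-even to 90% chance of hitting the race, in line with "several thousand hours".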
    
    This commit therefore eliminates this deadlock by setting
    ->trc_reader_nesting to a large negative number before checking the
    .need_qs flag and then zeroing (or decrementing with respect to its
    initial value) ->trc_reader_nesting.  For its part, the IPI handler's
    trc_read_check_handler() function adds a check for negative values,
    deferring evaluation of the task in this case.  Taken together, these
    changes avoid this deadlock scenario.
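A user-space model can check the fix by delivering the IPI at every point in the unlock sequence. This is a sketch, not the kernel's code: `struct model_task`, `model_ipi_handler()` and `replay_unlock()` are hypothetical. The "large negative number" is assumed here to be `INT_MIN`. Memory ordering is ignored because the IPI runs on the task's own CPU, so the model only needs program order.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the task_struct fields named in the commit
 * message; the kernel's real layout differs. */
struct model_task {
	int trc_reader_nesting;  /* models ->trc_reader_nesting */
	bool need_qs;            /* models ->trc_reader_special.b.need_qs */
};

/* Models the fixed trc_read_check_handler(): zero means quiescent, and a
 * negative count means the task is partway through its unlock, so the
 * handler defers instead of setting .need_qs. */
static void model_ipi_handler(struct model_task *t)
{
	if (t->trc_reader_nesting <= 0)
		return;
	t->need_qs = true;
}

/* Runs the unlock of a depth-1 reader, delivering the IPI just before
 * step ipi_at (0..3) or after the unlock finishes (4).  with_fix selects
 * whether step 1 parks the count at a large negative value.  Returns true
 * if .need_qs is stranded: set although the task is no longer a reader. */
static bool replay_unlock(int ipi_at, bool with_fix)
{
	struct model_task a = { .trc_reader_nesting = 1, .need_qs = false };
	int nesting = 0;
	bool special = false;

	for (int step = 0; step <= 4; step++) {
		if (step == ipi_at)
			model_ipi_handler(&a);
		switch (step) {
		case 0:  /* compute the new nesting count */
			nesting = a.trc_reader_nesting - 1;
			break;
		case 1:  /* the fix: fence off the IPI handler */
			if (with_fix)
				a.trc_reader_nesting = INT_MIN;
			break;
		case 2:  /* check the special flags */
			special = a.need_qs;
			break;
		case 3:  /* slow path reports and clears; both paths store */
			if (special && nesting == 0)
				a.need_qs = false;
			a.trc_reader_nesting = nesting;
			break;
		default: /* step 4: unlock already complete */
			break;
		}
	}
	return a.trc_reader_nesting == 0 && a.need_qs;
}
```

With `with_fix` set, no value of `ipi_at` from 0 to 4 strands `.need_qs`. An IPI before step 1 sets the flag early enough for step 2 to see it, and one after step 1 finds a negative count and backs off. Without the fix, `replay_unlock(3, false)` (the IPI landing between the check and the store) reproduces the original race.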
    
    Fixes: 276c4104 ("rcu-tasks: Split ->trc_reader_need_end")
    Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
    Cc: Daniel Borkmann <daniel@iogearbox.net>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: <bpf@vger.kernel.org>
    Cc: <stable@vger.kernel.org> # 5.7.x
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>