    sched: prevent divide by zero error in cpu_avg_load_per_task · 4cd42620
    Author: Steven Rostedt
    Impact: fix divide by zero crash in scheduler rebalance irq
    
    While testing the branch profiler, I hit this crash:
    
    divide error: 0000 [#1] PREEMPT SMP
    [...]
    RIP: 0010:[<ffffffff8024a008>]  [<ffffffff8024a008>] cpu_avg_load_per_task+0x50/0x7f
    [...]
    Call Trace:
     <IRQ> <0> [<ffffffff8024fd43>] find_busiest_group+0x3e5/0xcaa
     [<ffffffff8025da75>] rebalance_domains+0x2da/0xa21
     [<ffffffff80478769>] ? find_next_bit+0x1b2/0x1e6
     [<ffffffff8025e2ce>] run_rebalance_domains+0x112/0x19f
     [<ffffffff8026d7c2>] __do_softirq+0xa8/0x232
     [<ffffffff8020ea7c>] call_softirq+0x1c/0x3e
     [<ffffffff8021047a>] do_softirq+0x94/0x1cd
     [<ffffffff8026d5eb>] irq_exit+0x6b/0x10e
     [<ffffffff8022e6ec>] smp_apic_timer_interrupt+0xd3/0xff
     [<ffffffff8020e4b3>] apic_timer_interrupt+0x13/0x20
    
    The code for cpu_avg_load_per_task has:
    
    	if (rq->nr_running)
    		rq->avg_load_per_task = rq->load.weight / rq->nr_running;
    
    The runqueue lock is not held here, so nothing prevents rq->nr_running
    from dropping to zero between the if check and the division.
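
    To make the window concrete, here is a minimal userspace sketch that
    replays the interleaving deterministically. The struct rq below,
    concurrent_dequeue() and replay_race() are hypothetical simplifications,
    not kernel code. In the kernel, the second load exists only if the
    compiled code re-reads rq->nr_running for the division instead of
    reusing the value from the test; the crash above indicates it did.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for the kernel's struct rq (not the real layout). */
struct rq {
	unsigned long load_weight; /* stands in for rq->load.weight */
	unsigned long nr_running;
};

/* Hypothetical: simulates another CPU dequeuing the last task while we hold no lock. */
static void concurrent_dequeue(struct rq *rq)
{
	rq->nr_running = 0;
}

/*
 * Replays the unlocked pattern step by step: the if () test and the
 * division each load rq->nr_running, and the simulated dequeue runs in
 * between. The value each load sees is reported instead of dividing,
 * so this demo itself never faults.
 */
static void replay_race(struct rq *rq, unsigned long *seen_by_test,
			unsigned long *seen_by_divide)
{
	*seen_by_test = rq->nr_running;   /* load 1: if (rq->nr_running) */
	concurrent_dequeue(rq);           /* the race window */
	*seen_by_divide = rq->nr_running; /* load 2: the divisor */
}
```

    Starting from nr_running == 1, the test sees 1 and the divisor load
    sees 0, which is exactly the state that raises the divide error.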
    
    The branch profiler did not cause the bug; it only widened the race window.
    
    This patch reads rq->nr_running once into a local variable and uses that
    copy for both the condition and the division.
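
    The fix can be sketched in the same simplified userspace model. The
    struct rq, the READ_ONCE_UL() macro (a volatile read standing in for a
    kernel-style single access such as ACCESS_ONCE()) and the pointer
    argument are assumptions for illustration; the kernel function takes a
    CPU number, and its exact body may differ.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for the kernel's struct rq (not the real layout). */
struct rq {
	unsigned long load_weight;       /* stands in for rq->load.weight */
	unsigned long nr_running;
	unsigned long avg_load_per_task;
};

/* Assumption: a volatile access forces exactly one load of the field. */
#define READ_ONCE_UL(x) (*(const volatile unsigned long *)&(x))

static unsigned long cpu_avg_load_per_task(struct rq *rq)
{
	/* Snapshot nr_running once; the check and the division use the same value. */
	unsigned long nr_running = READ_ONCE_UL(rq->nr_running);

	/* A zero snapshot skips the division and keeps the previous average. */
	if (nr_running)
		rq->avg_load_per_task = rq->load_weight / nr_running;

	return rq->avg_load_per_task;
}
```

    Because the condition and the divisor are the same local value, a
    concurrent dequeue can at worst make the average slightly stale; it can
    no longer make the divisor zero. When the snapshot is zero, this sketch
    leaves the stored average untouched, mirroring the quoted original code.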

    Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>