• Thomas Gleixner's avatar
    x86/irq: Cure live lock in fixup_irqs() · 551adc60
    Thomas Gleixner authored
    Harry reported, that he's able to trigger a system freeze with cpu hot
    unplug. The freeze turned out to be a live lock caused by recent changes in
    irq_force_complete_move().
    
    When fixup_irqs() and from there irq_force_complete_move() is called on the
    dying cpu, then all other cpus are in stop machine an wait for the dying cpu
    to complete the teardown. If there is a move of an interrupt pending then
    irq_force_complete_move() sends the cleanup IPI to the cpus in the old_domain
    mask and waits for them to clear the mask. That's obviously impossible as
    those cpus are firmly stuck in stop machine with interrupts disabled.
    
    I should have known that, but I completely overlooked it being concentrated on
    the locking issues around the vectors. And the existance of the call to
    __irq_complete_move() in the code, which actually sends the cleanup IPI made
    it reasonable to wait for that cleanup to complete. That call was bogus even
    before the recent changes as it was just a pointless distraction.
    
    We have to look at two cases:
    
    1) The move_in_progress flag of the interrupt is set
    
       This means the ioapic has been updated with the new vector, but it has not
       fired yet. In theory there is a race:
    
       set_ioapic(new_vector) <-- Interrupt is raised before update is effective,
       			      i.e. it's raised on the old vector. 
    
       So if the target cpu cannot handle that interrupt before the old vector is
       cleaned up, we get a spurious interrupt and in the worst case the ioapic
       irq line becomes stale, but my experiments so far have only resulted in
       spurious interrupts.
    
       But in case of cpu hotplug this should be a non issue because if the
       affinity update happens right before all cpus rendevouz in stop machine,
       there is no way that the interrupt can be blocked on the target cpu because
       all cpus loops first with interrupts enabled in stop machine, so the old
       vector is not yet cleaned up when the interrupt fires.
    
       So the only way to run into this issue is if the delivery of the interrupt
       on the apic/system bus would be delayed beyond the point where the target
       cpu disables interrupts in stop machine. I doubt that it can happen, but at
       least there is a theroretical chance. Virtualization might be able to
       expose this, but AFAICT the IOAPIC emulation is not as stupid as the real
       hardware.
    
       I've spent quite some time over the weekend to enforce that situation,
       though I was not able to trigger the delayed case.
    
    2) The move_in_progress flag is not set and the old_domain cpu mask is not
       empty.
    
       That means, that an interrupt was delivered after the change and the
       cleanup IPI has been sent to the cpus in old_domain, but not all CPUs have
       responded to it yet.
    
    In both cases we can assume that the next interrupt will arrive on the new
    vector, so we can cleanup the old vectors on the cpus in the old_domain cpu
    mask.
    
    Fixes: 98229aa3 "x86/irq: Plug vector cleanup race"
    Reported-by: default avatarHarry Junior <harryjr@outlook.fr>
    Tested-by: default avatarTony Luck <tony.luck@intel.com>
    Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Joe Lawrence <joe.lawrence@stratus.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Ben Hutchings <ben@decadent.org.uk>
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603140931430.3657@nanosSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
    551adc60
hw_irq.h 4.63 KB