• Serge Semin's avatar
    EDAC/synopsys: Fix ECC status and IRQ control race condition · 591c9466
    Serge Semin authored
    The race condition around the ECCCLR register access happens in the IRQ
    disable method called in the device remove() procedure and in the ECC IRQ
    handler:
    
      1. Enable IRQ:
         a. ECCCLR = EN_CE | EN_UE
      2. Disable IRQ:
         a. ECCCLR = 0
      3. IRQ handler:
         a. ECCCLR = CLR_CE | CLR_CE_CNT | CLR_CE | CLR_CE_CNT
         b. ECCCLR = 0
         c. ECCCLR = EN_CE | EN_UE
    
    So if the IRQ disabling procedure is called concurrently with the IRQ
    handler method the IRQ might be actually left enabled due to the
    statement 3c.
    
    The root cause of the problem is that ECCCLR register (which since
    v3.10a has been called as ECCCTL) has intermixed ECC status data clear
    flags and the IRQ enable/disable flags. Thus the IRQ disabling (clear EN
    flags) and handling (write 1 to clear ECC status data) procedures must
    be serialised around the ECCCTL register modification to prevent the
    race.
    
    So fix the problem described above by adding the spin-lock around the
    ECCCLR modifications and preventing the IRQ-handler from modifying the
    IRQs enable flags (there is no point in disabling the IRQ and then
    re-enabling it again within a single IRQ handler call, see the
    statements 3a/3b and 3c above).
    
    Fixes: f7824ded ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR")
    Signed-off-by: default avatarSerge Semin <fancer.lancer@gmail.com>
    Signed-off-by: default avatarBorislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20240222181324.28242-2-fancer.lancer@gmail.com
    591c9466
synopsys_edac.c 38.5 KB