    ACPI, APEI, GHES, Error records content based throttle · 152cef40
    Huang Ying authored
    GHES uses printk to report hardware errors.  The printk calls are
    ratelimited to avoid flooding the kernel log with hardware error
    reports, because there may be thousands or even millions of
    corrected hardware errors while the system is running.
    
    Currently, a simple scheme is used: the total number of hardware
    error reports is ratelimited.  This may cause some issues in
    practice.
    
    For example, suppose two kinds of hardware errors occur in the
    system.  One is a corrected memory error; because the faulty memory
    address is accessed frequently, there may be hundreds of error
    reports per second.  The other is a corrected PCIe AER error, which
    is reported once per second.  Because they share one ratelimit
    control structure, it is highly likely that only the memory error
    is reported.
    
    To avoid this issue, this patch implements a throttle algorithm
    based on error record content.  After the first successful report,
    all identical error records are throttled for some time, to give
    other kinds of error records the opportunity to be reported.
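    
    The idea can be sketched in plain user-space C as follows.  This is
    only an illustration under stated assumptions, not the code in
    ghes.c: the names throttle_cache and throttle_should_report, the
    slot count, the maximum record length and the 10-second window are
    all hypothetical, and the caller is assumed to pass a monotonic
    timestamp in nanoseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes and window, chosen only for illustration. */
#define CACHE_SLOTS     4
#define MAX_RECORD_LEN  64
#define THROTTLE_NSEC   (10ULL * 1000000000ULL)

struct cache_slot {
    bool used;
    uint64_t time_in;                    /* when this content was last reported */
    size_t len;
    unsigned char data[MAX_RECORD_LEN];  /* copy of the reported record */
};

struct throttle_cache {
    struct cache_slot slots[CACHE_SLOTS];
};

/*
 * Return true if the record should be reported now, false if an
 * identical record was already reported within the throttle window.
 * 'now' is assumed to be a monotonic timestamp in nanoseconds.
 */
static bool throttle_should_report(struct throttle_cache *c,
                                   const void *rec, size_t len, uint64_t now)
{
    struct cache_slot *victim = NULL;
    size_t i;

    assert(rec != NULL && len <= MAX_RECORD_LEN);

    for (i = 0; i < CACHE_SLOTS; i++) {
        struct cache_slot *s = &c->slots[i];

        /* Same content seen before: throttle it, or refresh if expired. */
        if (s->used && s->len == len && memcmp(s->data, rec, len) == 0) {
            if (now - s->time_in < THROTTLE_NSEC)
                return false;
            s->time_in = now;
            return true;
        }

        /* Track a slot to reuse: a free slot first, else the oldest one. */
        if (victim == NULL)
            victim = s;
        else if (victim->used && (!s->used || s->time_in < victim->time_in))
            victim = s;
    }

    /* New content: remember it and let the report through. */
    victim->used = true;
    victim->time_in = now;
    victim->len = len;
    memcpy(victim->data, rec, len);
    return true;
}
```

    With such a cache, a repeated memory error record is suppressed
    after its first report, while a different record, such as a PCIe
    AER error, occupies its own slot and is still reported.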
    
    In the above example, the memory errors will be throttled for some
    time after being printed.  Then the PCIe AER error will be printed
    successfully.
    
    Signed-off-by: Huang Ying <ying.huang@intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
ghes.c 27.2 KB