    x86/amd_nb: Add MI200 PCI IDs · e1588568
    Yazen Ghannam authored
    The AMD MI200 series accelerators are data center GPUs. They include
    unified memory controllers and a data fabric similar to those used in
    AMD x86 CPU products. The memory controllers report errors using MCA,
    though these errors are generally handled through GPU drivers that
    directly manage the accelerator device.
    
    In some configurations, memory errors from these devices will be
    reported through MCA and managed by x86 CPUs. The OS is expected to
    handle these errors in similar fashion to MCA errors originating from
    memory controllers on the CPUs. In Linux, this flow includes passing MCA
    errors to a notifier chain with handlers in the EDAC subsystem.
    
    The AMD64 EDAC module requires information from the memory controllers
    and data fabric in order to provide detailed decoding of memory errors.
    The information is read from hardware registers accessed through
    interfaces in the data fabric.
    
    The accelerator data fabrics are visible to the host x86 CPUs as PCI
    devices, just as x86 CPU data fabrics already are. However, the
    accelerator fabrics have new and unique PCI IDs.
    
    Add PCI IDs for the MI200 series of accelerator devices in order to
    enable EDAC support. The data fabrics of the accelerator devices will be
    enumerated like any other fabric already supported. System-specific
    implementation details will be handled within the AMD64 EDAC module.
    
      [ bp: Scrub off marketing speak. ]
    Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
    Co-developed-by: Muralidhara M K <muralidhara.mk@amd.com>
    Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20230515113537.1052146-2-muralimk@amd.com
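
The enumeration described in the commit message relies on matching each PCI device's vendor/device ID pair against a table of known data fabric IDs. The following is a minimal user-space sketch of that idea, not the kernel code itself. The struct, the table name `df_misc_ids`, the helper `id_table_match()`, and both `EXAMPLE_ID_*` device IDs are hypothetical placeholders introduced for illustration; only AMD's PCI vendor ID (0x1022) is a real value. The actual IDs added by this commit are not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

#define VENDOR_ID_AMD 0x1022 /* AMD's PCI vendor ID */

/* Hypothetical placeholder device IDs, NOT the values added by this commit. */
#define EXAMPLE_ID_CPU_DF_F3   0xAAA3 /* stands in for an already supported CPU fabric */
#define EXAMPLE_ID_ACCEL_DF_F3 0xBBB3 /* stands in for a newly added accelerator fabric */

/* Simplified stand-in for a PCI ID table entry (assumption, not the kernel type). */
struct pci_id {
    uint16_t vendor;
    uint16_t device;
};

/*
 * Table of data fabric IDs. Supporting a new fabric means appending
 * an entry; the matching logic below does not change.
 */
static const struct pci_id df_misc_ids[] = {
    { VENDOR_ID_AMD, EXAMPLE_ID_CPU_DF_F3 },
    { VENDOR_ID_AMD, EXAMPLE_ID_ACCEL_DF_F3 },
    { 0, 0 } /* terminator: a zero vendor ID ends the table */
};

/* Return 1 if (vendor, device) appears in the zero-terminated table, else 0. */
static int id_table_match(const struct pci_id *table,
                          uint16_t vendor, uint16_t device)
{
    for (; table->vendor != 0; table++) {
        if (table->vendor == vendor && table->device == device)
            return 1;
    }
    return 0;
}
```

With this layout, a call such as `id_table_match(df_misc_ids, VENDOR_ID_AMD, EXAMPLE_ID_ACCEL_DF_F3)` reports a match once the accelerator entry is in the table, which is why adding IDs alone can be enough for a fabric to be enumerated like those already supported.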
amd_nb.c 14.8 KB