    drm/i915: Add short HPD IRQ storm detection for non-MST systems · 9a64c650
    Lyude Paul authored
    Unfortunately, it seems that the HPD IRQ storm problem from the early
    days of Intel GPUs was never entirely solved, only mostly. Within the
    last couple of days, I got a bug report from one of our customers who
    had been having issues with their machine suddenly booting up very
    slowly after having updated. The amount of time it took to boot went
    from around 30 seconds to consistently over 6 minutes.
    
    After some investigation, I discovered that i915 was reporting massive
    amounts of short HPD IRQ spam on this system from the DisplayPort port,
    despite there not being anything actually connected. The symptoms would
    start with one "long" HPD IRQ being detected at boot:
    
    [    1.891398] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00440000, dig 0x00440000, pins 0x000000a0
    [    1.891436] [drm:intel_hpd_irq_handler [i915]] digital hpd port B - long
    [    1.891472] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on PIN 5 - cnt: 0
    [    1.891508] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - long
    [    1.891544] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on PIN 7 - cnt: 0
    [    1.891592] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port B - long
    [    1.891628] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port D - long
    …
    
    followed by constant short IRQs afterwards:
    
    [    1.895091] [drm:intel_encoder_hotplug [i915]] [CONNECTOR:66:DP-1] status updated from unknown to disconnected
    [    1.895129] [drm:i915_hotplug_work_func [i915]] Connector DP-3 (pin 7) received hotplug event.
    [    1.895165] [drm:intel_dp_detect [i915]] [CONNECTOR:72:DP-3]
    [    1.895275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
    [    1.895312] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
    [    1.895762] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
    [    1.895799] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
    [    1.896239] [drm:intel_dp_aux_xfer [i915]] dp_aux_ch timeout status 0x71450085
    [    1.896293] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
    [    1.896330] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
    [    1.896781] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
    [    1.896817] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
    [    1.897275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
    
    The customer's system in question has a GM45 GPU, which is apparently
    well known for hotplugging storms.
    
    So, work around this impressively broken hardware by changing the
    default HPD storm threshold from 5 to 50. Then, make long IRQs count
    for 10 and short IRQs count for 1. This means that 5 long IRQs will
    trigger an HPD storm and, on systems with short HPD storm detection
    enabled, 50 short IRQs will trigger one as well. 50 short IRQs amount
    to 100ms of constant pulsing, which seems like a good middle ground
    between being too sensitive and not being sensitive enough (which
    would cause visible stutters in userspace every time a storm occurs).
    
    And just to be extra safe: we don't enable this by default on systems
    with MST support. There's too high a chance of MST support triggering
    storm detection, and systems that are new enough to support MST are a
    lot less likely to have issues with IRQ storms anyway.
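
    The weighting scheme can be sketched in a few lines of standalone C.
    This is only an illustration of the counting described above, not the
    i915 implementation: every name here (hpd_pin_stats, hpd_storm_detect,
    the macros) is hypothetical, the detection time window is left out, and
    treating "count reaches the threshold" as a storm is an assumption
    chosen to match the "5 long / 50 short" description.

```c
#include <stdbool.h>

/* Hypothetical names throughout; this is not the i915 code. */
#define STORM_THRESHOLD  50	/* new default threshold from the text */
#define LONG_IRQ_WEIGHT  10	/* each long HPD IRQ counts for 10 */
#define SHORT_IRQ_WEIGHT  1	/* each short HPD IRQ counts for 1 */

struct hpd_pin_stats {
	int count;	/* weighted IRQ count for one pin (window handling omitted) */
};

/*
 * Account for one HPD IRQ on a pin and report whether the weighted count
 * has reached the storm threshold (assumed comparison: >=).
 */
static bool hpd_storm_detect(struct hpd_pin_stats *stats, bool long_hpd,
			     bool short_storm_enabled)
{
	/*
	 * Short pulses are not counted where short storm detection is
	 * disabled, e.g. on systems with MST support.
	 */
	if (!long_hpd && !short_storm_enabled)
		return false;

	stats->count += long_hpd ? LONG_IRQ_WEIGHT : SHORT_IRQ_WEIGHT;
	return stats->count >= STORM_THRESHOLD;
}
```

    With these weights a single counter serves both cases: five long IRQs
    or fifty short ones reach the threshold, and with short storm
    detection disabled the short pulses never move the counter at all.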
    
    As a note: this patch was tested using a ThinkPad T450s and a Chamelium
    to simulate the short IRQ storms.
    
    Changes since v1:
    - Don't use two separate thresholds, just make long IRQs count for 10
      each and short IRQs count for 1. This simplifies the code a bit
      - Ville Syrjälä
    Changes since v2:
    - Document @long_hpd in intel_hpd_irq_storm_detect, no functional
      changes
    Changes since v4:
    - Remove !! in long_hpd assignment - Ville Syrjälä
    - queue_hp = true - Ville Syrjälä
    Signed-off-by: Lyude Paul <lyude@redhat.com>
    Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20181106213017.14563-6-lyude@redhat.com
intel_hotplug.c 21 KB