1. 07 Nov, 2018 10 commits
    • drm/i915: Add short HPD IRQ storm detection for non-MST systems · 9a64c650
      Lyude Paul authored
      Unfortunately, it seems that the HPD IRQ storm problem from the early
      days of Intel GPUs was never entirely solved, only mostly. Within the
      last couple of days, I got a bug report from one of our customers who
      had been having issues with their machine suddenly booting up very
      slowly after having updated. The amount of time it took to boot went
      from around 30 seconds, to over 6 minutes consistently.
      
      After some investigation, I discovered that i915 was reporting massive
      amounts of short HPD IRQ spam on this system from the DisplayPort port,
      despite there not being anything actually connected. The symptoms would
      start with one "long" HPD IRQ being detected at boot:
      
      [    1.891398] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00440000, dig 0x00440000, pins 0x000000a0
      [    1.891436] [drm:intel_hpd_irq_handler [i915]] digital hpd port B - long
      [    1.891472] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on PIN 5 - cnt: 0
      [    1.891508] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - long
      [    1.891544] [drm:intel_hpd_irq_handler [i915]] Received HPD interrupt on PIN 7 - cnt: 0
      [    1.891592] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port B - long
      [    1.891628] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port D - long
      …
      
      followed by constant short IRQs afterwards:
      
      [    1.895091] [drm:intel_encoder_hotplug [i915]] [CONNECTOR:66:DP-1] status updated from unknown to disconnected
      [    1.895129] [drm:i915_hotplug_work_func [i915]] Connector DP-3 (pin 7) received hotplug event.
      [    1.895165] [drm:intel_dp_detect [i915]] [CONNECTOR:72:DP-3]
      [    1.895275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
      [    1.895312] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
      [    1.895762] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
      [    1.895799] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
      [    1.896239] [drm:intel_dp_aux_xfer [i915]] dp_aux_ch timeout status 0x71450085
      [    1.896293] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
      [    1.896330] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
      [    1.896781] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
      [    1.896817] [drm:intel_hpd_irq_handler [i915]] digital hpd port D - short
      [    1.897275] [drm:intel_get_hpd_pins [i915]] hotplug event received, stat 0x00200000, dig 0x00200000, pins 0x00000080
      
      The customer's system in question has a GM45 GPU, which is apparently
      well known for hotplugging storms.
      
      So, work around this impressively broken hardware by changing the default
      HPD storm threshold from 5 to 50. Then, make long IRQs count for 10, and
      short IRQs count for 1. This makes it so that 5 long IRQs will trigger
      an HPD storm, and on systems with short HPD storm detection 50 short
      IRQs will trigger an HPD storm. 50 short IRQs amounts to 100ms of
      constant pulsing, which seems like a good middle ground between being too
      sensitive and not being sensitive enough (which would cause visible
      stutters in userspace every time a storm occurs).
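
      The weighting described above can be sketched in a few lines of C. This
      is a minimal, standalone illustration and not the i915 code: the names
      hpd_storm_account(), irqs_until_storm() and struct hpd_pin_stats are
      hypothetical, and the >= comparison is an assumption chosen so that the
      arithmetic matches the figures quoted above (5 long or 50 short IRQs).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values taken from the commit message; the macro names are hypothetical. */
#define STORM_THRESHOLD  50
#define LONG_HPD_WEIGHT  10
#define SHORT_HPD_WEIGHT 1

/* Hypothetical per-pin bookkeeping for one detection window. */
struct hpd_pin_stats {
    int count; /* weighted sum of IRQs seen in the current window */
};

/*
 * Account for one HPD IRQ on a pin. Returns true once the weighted count
 * reaches the threshold (assumed >=, see the paragraph above).
 */
static bool hpd_storm_account(struct hpd_pin_stats *stats, bool long_hpd)
{
    assert(stats != NULL);
    stats->count += long_hpd ? LONG_HPD_WEIGHT : SHORT_HPD_WEIGHT;
    return stats->count >= STORM_THRESHOLD;
}

/* How many identical IRQs in a row are needed before a storm is flagged. */
static int irqs_until_storm(bool long_hpd)
{
    struct hpd_pin_stats stats = { 0 };
    int n = 1;

    while (!hpd_storm_account(&stats, long_hpd))
        n++;
    return n;
}
```

      With a single weighted counter, one threshold serves both IRQ types,
      which is the simplification noted in the v1 changelog.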
      
      And just to be extra safe: we don't enable this by default on systems
      with MST support. There's too high of a chance of MST support triggering
      storm detection, and systems that are new enough to support MST are a
      lot less likely to have issues with IRQ storms anyway.
      
      As a note: this patch was tested using a ThinkPad T450s and a Chamelium
      to simulate the short IRQ storms.
      
      Changes since v1:
      - Don't use two separate thresholds, just make long IRQs count for 10
        each and short IRQs count for 1. This simplifies the code a bit
        - Ville Syrjälä
      Changes since v2:
      - Document @long_hpd in intel_hpd_irq_storm_detect, no functional
        changes
      Changes since v4:
      - Remove !! in long_hpd assignment - Ville Syrjälä
      - queue_hp = true - Ville Syrjälä
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20181106213017.14563-6-lyude@redhat.com
    • drm/i915: Clarify flow for disabling IRQs on storms · 0759af9e
      Lyude Paul authored
      This is rather confusing to look at as-is:
      dev_priv->display.hpd_irq_setup(dev_priv); in intel_hpd_irq_handler()
      handles disabling the actual HPD IRQ, while
      intel_hpd_irq_storm_disable() handles moving the HPD pin state over from
      MARK_DISABLED to DISABLED along with enabling polling for it.
      
      Changes since v3:
      - Rename i915_hpd_irq_storm_disable() to
        i915_hpd_irq_storm_switch_to_polling() - Rodrigo Vivi
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20181106213017.14563-5-lyude@redhat.com
    • drm/i915: Fix threshold check in intel_hpd_irq_storm_detect() · a4af7889
      Lyude Paul authored
      Currently in intel_hpd_irq_storm_detect() when we detect that the last
      recorded hotplug wasn't within the period defined by
      HPD_STORM_DETECT_DELAY, we make the mistake of resetting the HPD count
      to 0 without incrementing it. This results in us only enabling storm
      detection when we go +2 above the threshold, e.g. an HPD threshold of 5
      would not trigger a storm until we reach a total of 7 hotplugs.
      
      So: rework the code a bit so we reset the HPD count when
      HPD_STORM_DETECT_DELAY has passed, then increment the count afterwards.
      Also, clean things up a bit to make it easier to understand.
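
      The reset-then-increment ordering described above can be sketched as
      follows. This is a simplified illustration under stated assumptions, not
      the driver code: struct hpd_window, hpd_count_event() and the
      millisecond clock passed in by the caller are hypothetical, and
      DETECT_PERIOD_MS is a placeholder value standing in for the period the
      message calls HPD_STORM_DETECT_DELAY.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder window length; the real period is not given in the message. */
#define DETECT_PERIOD_MS 1000UL

/* Hypothetical per-pin state for one detection window. */
struct hpd_window {
    unsigned long start_ms; /* when the current window began */
    int count;              /* hotplugs counted in this window */
};

/*
 * Record one hotplug at time now_ms and return the updated count.
 * If the window has expired, it is restarted first; the event is then
 * always counted, so the first hotplug of a new window yields 1, not 0.
 */
static int hpd_count_event(struct hpd_window *w, unsigned long now_ms)
{
    assert(w != NULL);
    if (now_ms - w->start_ms >= DETECT_PERIOD_MS) {
        w->start_ms = now_ms;
        w->count = 0;
    }
    return ++w->count;
}
```

      The caller would compare the returned count against the storm
      threshold; because the event that opens a window is no longer dropped,
      the count it sees matches the number of hotplugs actually received.
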
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20181106213017.14563-4-lyude@redhat.com
    • drm/i915: Fix NULL deref when re-enabling HPD IRQs on systems with MST · fee61dee
      Lyude Paul authored
      Turns out that if you trigger an HPD storm on a system that has an MST
      topology connected to it, you'll end up causing the kernel to eventually
      hit a NULL deref:
      
      [  332.339041] BUG: unable to handle kernel NULL pointer dereference at 00000000000000ec
      [  332.340906] PGD 0 P4D 0
      [  332.342750] Oops: 0000 [#1] SMP PTI
      [  332.344579] CPU: 2 PID: 25 Comm: kworker/2:0 Kdump: loaded Tainted: G           O      4.18.0-rc3short-hpd-storm+ #2
      [  332.346453] Hardware name: LENOVO 20BWS1KY00/20BWS1KY00, BIOS JBET71WW (1.35 ) 09/14/2018
      [  332.348361] Workqueue: events intel_hpd_irq_storm_reenable_work [i915]
      [  332.350301] RIP: 0010:intel_hpd_irq_storm_reenable_work.cold.3+0x2f/0x86 [i915]
      [  332.352213] Code: 00 00 ba e8 00 00 00 48 c7 c6 c0 aa 5f a0 48 c7 c7 d0 73 62 a0 4c 89 c1 4c 89 04 24 e8 7f f5 af e0 4c 8b 04 24 44 89 f8 29 e8 <41> 39 80 ec 00 00 00 0f 85 43 13 fc ff 41 0f b6 86 b8 04 00 00 41
      [  332.354286] RSP: 0018:ffffc90000147e48 EFLAGS: 00010006
      [  332.356344] RAX: 0000000000000005 RBX: ffff8802c226c9d4 RCX: 0000000000000006
      [  332.358404] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff88032dc95570
      [  332.360466] RBP: 0000000000000005 R08: 0000000000000000 R09: ffff88031b3dc840
      [  332.362528] R10: 0000000000000000 R11: 000000031a069602 R12: ffff8802c226ca20
      [  332.364575] R13: ffff8802c2268000 R14: ffff880310661000 R15: 000000000000000a
      [  332.366615] FS:  0000000000000000(0000) GS:ffff88032dc80000(0000) knlGS:0000000000000000
      [  332.368658] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  332.370690] CR2: 00000000000000ec CR3: 000000000200a003 CR4: 00000000003606e0
      [  332.372724] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  332.374773] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  332.376798] Call Trace:
      [  332.378809]  process_one_work+0x1a1/0x350
      [  332.380806]  worker_thread+0x30/0x380
      [  332.382777]  ? wq_update_unbound_numa+0x10/0x10
      [  332.384772]  kthread+0x112/0x130
      [  332.386740]  ? kthread_create_worker_on_cpu+0x70/0x70
      [  332.388706]  ret_from_fork+0x35/0x40
      [  332.390651] Modules linked in: i915(O) vfat fat joydev btusb btrtl btbcm btintel bluetooth ecdh_generic iTCO_wdt wmi_bmof i2c_algo_bit drm_kms_helper intel_rapl syscopyarea sysfillrect x86_pkg_temp_thermal sysimgblt coretemp fb_sys_fops crc32_pclmul drm psmouse pcspkr mei_me mei i2c_i801 lpc_ich mfd_core i2c_core tpm_tis tpm_tis_core thinkpad_acpi wmi tpm rfkill video crc32c_intel serio_raw ehci_pci xhci_pci ehci_hcd xhci_hcd [last unloaded: i915]
      [  332.394963] CR2: 00000000000000ec
      
      This appears to be due to the fact that with an MST topology, not all
      intel_connector structs will have ->encoder set. So, fix this by
      skipping connectors without encoders in
      intel_hpd_irq_storm_reenable_work().
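
      As a rough sketch of that kind of guard, assuming simplified stand-in
      types: struct encoder, struct connector and hpd_pins_to_reenable() below
      are hypothetical and only mirror the field names mentioned in this
      message (->encoder and ->mst_port). The loop checks both fields so that
      a missing encoder is never dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's structures. */
struct encoder {
    int hpd_pin; /* assumed to be in the range 0..31 */
};

struct connector {
    const struct encoder *encoder; /* may be NULL on an MST connector */
    const void *mst_port;          /* non-NULL marks an MST connector */
};

/*
 * Build a bitmask of HPD pins for the connectors that can be re-enabled,
 * skipping MST connectors so that a NULL encoder is never dereferenced.
 */
static unsigned int hpd_pins_to_reenable(const struct connector *connectors,
                                         size_t count)
{
    unsigned int pins = 0;
    size_t i;

    assert(connectors != NULL || count == 0);
    for (i = 0; i < count; i++) {
        const struct connector *c = &connectors[i];

        /* MST connector or no encoder: nothing to look up, skip it. */
        if (c->mst_port != NULL || c->encoder == NULL)
            continue;
        pins |= 1u << c->encoder->hpd_pin;
    }
    return pins;
}
```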
      
      For those wondering, this bug was found by accident while simulating HPD
      storms using a Chamelium connected to a ThinkPad T450s (Broadwell).
      
      Changes since v1:
      - Check intel_connector->mst_port instead of intel_connector->encoder
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: stable@vger.kernel.org
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20181106213017.14563-3-lyude@redhat.com
    • drm/i915: Fix possible race in intel_dp_add_mst_connector() · 66a5ab10
      Lyude Paul authored
      This hasn't caused any issues yet that I'm aware of, but as Ville
      Syrjälä pointed out - we need to make sure that
      intel_connector->mst_port is set before initializing MST connectors,
      since in theory we could potentially check intel_connector->mst_port in
      i915_hpd_poll_init_work() after registering the connector but before
      having written its value.
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://patchwork.freedesktop.org/patch/msgid/20181106213017.14563-2-lyude@redhat.com
    • drm/i915: Clean up skl_program_scaler() · d0105af9
      Ville Syrjälä authored
      Remove the "sizes are 0 based" stuff that is not even true for the
      scaler.
      
      v2: Rebase
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20181101151736.20522-1-ville.syrjala@linux.intel.com
      Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    • drm/i915: Nuke posting reads from plane update/disable funcs · e69b348a
      Ville Syrjälä authored
      No need for the posting reads in the plane update/disable hooks.
      If we need a posting read for something then a single one at the
      very end would be sufficient. We have that anyway in the form
      of eg. scanline/frame counter reads.
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20181101150605.18235-2-ville.syrjala@linux.intel.com
      Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    • Colin Ian King's avatar
    • drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5 · 55f99bf2
      Chris Wilson authored
      Exercising the gpu reloc path strenuously revealed an issue where the
      updated relocations (from MI_STORE_DWORD_IMM) were not being observed
      upon execution. After some experiments with adding pipecontrols (a lot
      of pipecontrols (32) as gen4/5 do not have a bit to wait on earlier pipe
      controls or even the current one), it was discovered that we merely
      needed to delay the EMIT_INVALIDATE by several flushes. It is important
      to note that it is the EMIT_INVALIDATE as opposed to the EMIT_FLUSH that
      needs the delay as opposed to what one might first expect -- that the
      delay is required for the TLB invalidation to take effect (one presumes
      to purge any CS buffers) as opposed to a delay after flushing to ensure
      the writes have landed before triggering invalidation.
      
      Testcase: igt/gem_tiled_fence_blits
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: stable@vger.kernel.org
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20181105094305.5767-1-chris@chris-wilson.co.uk
    • mm, drm/i915: mark pinned shmemfs pages as unevictable · 64e3d12f
      Kuo-Hsin Yang authored
      The i915 driver uses shmemfs to allocate backing storage for gem
      objects. These shmemfs pages can be pinned (increased ref count) by
      shmem_read_mapping_page_gfp(). When a lot of pages are pinned, vmscan
      wastes a lot of time scanning these pinned pages. In some extreme cases,
      when all pages in the inactive anon lru are pinned and only the inactive
      anon lru is scanned due to inactive_ratio, the system cannot swap and
      invokes the oom-killer. Mark these pinned pages as unevictable to speed
      up vmscan.
      
      Export pagevec API check_move_unevictable_pages().
      
      This patch was inspired by Chris Wilson's change [1].
      
      [1]: https://patchwork.kernel.org/patch/9768741/
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Kuo-Hsin Yang <vovoy@chromium.org>
      Acked-by: Michal Hocko <mhocko@suse.com> # mm part
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Acked-by: Dave Hansen <dave.hansen@intel.com>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20181106132324.17390-1-chris@chris-wilson.co.uk
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
  2. 06 Nov, 2018 4 commits
  3. 05 Nov, 2018 4 commits
  4. 03 Nov, 2018 2 commits
  5. 02 Nov, 2018 16 commits
  6. 01 Nov, 2018 4 commits