1. 27 Oct, 2021 6 commits
    • drm/i915/guc: Fix recursive lock in GuC submission · 9ca8bb7a
      Matthew Brost authored
      Use __release_guc_id() (expects the lock to be held) rather than
      release_guc_id() (acquires the lock), and add lockdep annotations.
      
      [ 213.280129] i915: Running i915_perf_live_selftests/live_noa_gpr
      [ 213.283459] ============================================
      [ 213.283462] WARNING: possible recursive locking detected
      [ 213.283466] 5.15.0-rc6+ #18 Tainted: G U W
      [ 213.283470] --------------------------------------------
      [ 213.283472] kworker/u24:0/8 is trying to acquire lock:
      [ 213.283475] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x2df/0x350 [i915]
      [ 213.283618]
      but task is already holding lock:
      [ 213.283621] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x4f/0x350 [i915]
      [ 213.283720]
      other info that might help us debug this:
      [ 213.283724] Possible unsafe locking scenario:
      [ 213.283727] CPU0
      [ 213.283728] ----
      [ 213.283730] lock(&guc->submission_state.lock);
      [ 213.283734] lock(&guc->submission_state.lock);
      [ 213.283737]
      *** DEADLOCK ***
      [ 213.283740] May be due to missing lock nesting notation
      [ 213.283744] 3 locks held by kworker/u24:0/8:
      [ 213.283747] #0: ffff8ffb80059d38 ((wq_completion)events_unbound){..}-{0:0}, at: process_one_work+0x1f3/0x550
      [ 213.283757] #1: ffffb509000e3e78 ((work_completion)(&guc->submission_state.destroyed_worker)){..}-{0:0}, at: process_one_work+0x1f3/0x550
      [ 213.283766] #2: ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x4f/0x350 [i915]
      [ 213.283860]
      stack backtrace:
      [ 213.283863] CPU: 8 PID: 8 Comm: kworker/u24:0 Tainted: G U W 5.15.0-rc6+ #18
      [ 213.283868] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021
      [ 213.283873] Workqueue: events_unbound destroyed_worker_func [i915]
      [ 213.283957] Call Trace:
      [ 213.283960] dump_stack_lvl+0x57/0x72
      [ 213.283966] __lock_acquire.cold+0x191/0x2d3
      [ 213.283972] lock_acquire+0xb5/0x2b0
      [ 213.283978] ? destroyed_worker_func+0x2df/0x350 [i915]
      [ 213.284059] ? destroyed_worker_func+0x2d7/0x350 [i915]
      [ 213.284139] ? lock_release+0xb9/0x280
      [ 213.284143] _raw_spin_lock_irqsave+0x48/0x60
      [ 213.284148] ? destroyed_worker_func+0x2df/0x350 [i915]
      [ 213.284226] destroyed_worker_func+0x2df/0x350 [i915]
      [ 213.284310] process_one_work+0x270/0x550
      [ 213.284315] worker_thread+0x52/0x3b0
      [ 213.284319] ? process_one_work+0x550/0x550
      [ 213.284322] kthread+0x135/0x160
      [ 213.284326] ? set_kthread_struct+0x40/0x40
      [ 213.284331] ret_from_fork+0x1f/0x30
      
      and a bit later in the trace:
      
      [ 227.499864] do_raw_spin_lock+0x94/0xa0
      [ 227.499868] _raw_spin_lock_irqsave+0x50/0x60
      [ 227.499871] ? guc_flush_destroyed_contexts+0x4f/0xf0 [i915]
      [ 227.499995] guc_flush_destroyed_contexts+0x4f/0xf0 [i915]
      [ 227.500104] intel_guc_submission_reset_prepare+0x99/0x4b0 [i915]
      [ 227.500209] ? mark_held_locks+0x49/0x70
      [ 227.500212] intel_uc_reset_prepare+0x46/0x50 [i915]
      [ 227.500320] reset_prepare+0x78/0x90 [i915]
      [ 227.500412] __intel_gt_set_wedged.part.0+0x13/0xe0 [i915]
      [ 227.500485] intel_gt_set_wedged.part.0+0x54/0x100 [i915]
      [ 227.500556] intel_gt_set_wedged_on_fini+0x1a/0x30 [i915]
      [ 227.500622] intel_gt_driver_unregister+0x1e/0x60 [i915]
      [ 227.500694] i915_driver_remove+0x4a/0xf0 [i915]
      [ 227.500767] i915_pci_probe+0x84/0x170 [i915]
      [ 227.500838] local_pci_probe+0x42/0x80
      [ 227.500842] pci_device_probe+0xd9/0x190
      [ 227.500844] really_probe+0x1f2/0x3f0
      [ 227.500847] __driver_probe_device+0xfe/0x180
      [ 227.500848] driver_probe_device+0x1e/0x90
      [ 227.500850] __driver_attach+0xc4/0x1d0
      [ 227.500851] ? __device_attach_driver+0xe0/0xe0
      [ 227.500853] ? __device_attach_driver+0xe0/0xe0
      [ 227.500854] bus_for_each_dev+0x64/0x90
      [ 227.500856] bus_add_driver+0x12e/0x1f0
      [ 227.500857] driver_register+0x8f/0xe0
      [ 227.500859] i915_init+0x1d/0x8f [i915]
      [ 227.500934] ? 0xffffffffc144a000
      [ 227.500936] do_one_initcall+0x58/0x2d0
      [ 227.500938] ? rcu_read_lock_sched_held+0x3f/0x80
      [ 227.500940] ? kmem_cache_alloc_trace+0x238/0x2d0
      [ 227.500944] do_init_module+0x5c/0x270
      [ 227.500946] __do_sys_finit_module+0x95/0xe0
      [ 227.500949] do_syscall_64+0x38/0x90
      [ 227.500951] entry_SYSCALL_64_after_hwframe+0x44/0xae
      [ 227.500953] RIP: 0033:0x7ffa59d2ae0d
      [ 227.500954] Code: c8 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 80 0c 00 f7 d8 64 89 01 48
      [ 227.500955] RSP: 002b:00007fff320bbf48 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      [ 227.500956] RAX: ffffffffffffffda RBX: 00000000022ea710 RCX: 00007ffa59d2ae0d
      [ 227.500957] RDX: 0000000000000000 RSI: 00000000022e1d90 RDI: 0000000000000004
      [ 227.500958] RBP: 0000000000000020 R08: 00007ffa59df3a60 R09: 0000000000000070
      [ 227.500958] R10: 00000000022e1d90 R11: 0000000000000246 R12: 00000000022e1d90
      [ 227.500959] R13: 00000000022e58e0 R14: 0000000000000043 R15: 00000000022e42c0
      
      v2:
       (CI build)
        - Fix build error
      
      Fixes: 1a52faed ("drm/i915/guc: Take GT PM ref when deregistering context")
      Signed-off-by: Matthew Brost <matthew.brost@intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211020192147.8048-1-matthew.brost@intel.com
      (cherry picked from commit 12a9917e)
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
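The GuC fix above relies on a common kernel naming convention: one variant of a function expects the caller to already hold the lock (`__release_guc_id()`), while the other takes the lock itself (`release_guc_id()`). The following is a minimal userspace sketch of that convention, assuming a pthread mutex plus a boolean flag as stand-ins for the kernel spinlock and `lockdep_assert_held()`. Every name in it is hypothetical and it is not the i915 code. A `_locked` suffix replaces the kernel's `__` prefix because identifiers starting with `__` are reserved in userspace C.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the GuC submission state. */
struct submission_state {
	pthread_mutex_t lock;
	bool lock_held;   /* poor man's lockdep: tracks whether lock is held */
	int free_ids;
};

static void state_init(struct submission_state *s)
{
	pthread_mutex_init(&s->lock, NULL);
	s->lock_held = false;
	s->free_ids = 0;
}

static void state_lock(struct submission_state *s)
{
	pthread_mutex_lock(&s->lock);
	s->lock_held = true;
}

static void state_unlock(struct submission_state *s)
{
	s->lock_held = false;
	pthread_mutex_unlock(&s->lock);
}

/* Caller must already hold s->lock (the kernel would use lockdep_assert_held()). */
static void release_id_locked(struct submission_state *s)
{
	assert(s->lock_held);
	s->free_ids++;
}

/* Takes s->lock itself, so it must NOT be called with s->lock held. */
static void release_id(struct submission_state *s)
{
	state_lock(s);
	release_id_locked(s);
	state_unlock(s);
}

/* A worker that already holds the lock must use the _locked variant. */
static void destroyed_worker(struct submission_state *s, int n)
{
	state_lock(s);
	for (int i = 0; i < n; i++)
		release_id_locked(s);
	state_unlock(s);
}
```

Calling `release_id()` from inside `destroyed_worker()` would repeat the recursive acquisition that lockdep reported above. With a default (non-recursive) pthread mutex, relocking is undefined behaviour and may deadlock.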
    • drm/i915/cdclk: put the cdclk vtables in const data · 8a30b871
      Jani Nikula authored
      Add the const that was accidentally left out from the vtables.
      
      Fixes: 6b4cd9cb ("drm/i915: constify the cdclk vtable")
      Cc: Dave Airlie <airlied@redhat.com>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211021133408.32166-1-jani.nikula@intel.com
      (cherry picked from commit 877d0749)
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
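Why a missing `const` on a vtable matters, and why it is easy to miss, can be shown with a small C sketch. The struct, functions and values below are hypothetical placeholders, not the i915 cdclk tables. A `static const` table of function pointers can be placed in read-only data, whereas without `const` it sits in writable memory. Code that only holds a pointer-to-const accepts either form, so the compiler does not complain when the qualifier is left off the definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table, loosely modelled on a per-platform cdclk vtable. */
struct cdclk_funcs {
	int (*get_cdclk)(void);
	int (*calc_voltage_level)(int cdclk);
};

/* Placeholder implementations; the numbers are illustrative only. */
static int demo_get_cdclk(void)
{
	return 100000;
}

static int demo_calc_voltage_level(int cdclk)
{
	return cdclk > 150000 ? 1 : 0;
}

/*
 * The 'const' on the definition lets the compiler place the table in
 * read-only data, so its function pointers cannot be overwritten at
 * runtime. Dropping it still compiles, because the consumer below only
 * stores a pointer-to-const.
 */
static const struct cdclk_funcs demo_cdclk_funcs = {
	.get_cdclk = demo_get_cdclk,
	.calc_voltage_level = demo_calc_voltage_level,
};

struct display_state {
	const struct cdclk_funcs *cdclk_funcs;
};

static int display_get_cdclk(const struct display_state *d)
{
	assert(d->cdclk_funcs != NULL);
	return d->cdclk_funcs->get_cdclk();
}
```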
    • Revert "drm/i915/bios: gracefully disable dual eDP for now" · c4d6da21
      Jani Nikula authored
      This reverts commit 05734ca2.
      
      It's not graceful; instead, it leads to boot-time warning splats in the
      very case it is supposed to handle gracefully. Apparently, the BIOS/GOP
      enabling the port that we end up skipping leads to state readout
      problems. Back to the drawing board.
      
      References: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21255/bat-adlp-4/boot0.txt
      Fixes: 05734ca2 ("drm/i915/bios: gracefully disable dual eDP for now")
      Cc: José Roberto de Souza <jose.souza@intel.com>
      Cc: Uma Shankar <uma.shankar@intel.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Swati Sharma <swati2.sharma@intel.com>
      Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211019114334.24643-1-jani.nikula@intel.com
      (cherry picked from commit 171c555c)
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    • drm/i915/dp: Ensure max link params are always valid · cc99bc62
      Imre Deak authored
      Currently, until the DPCD for a connector is read, the max link rate and
      lane count params are invalid. If the connector is modeset, in
      intel_dp_compute_config(), intel_dp_common_len_rate_limit(max_link_rate)
      will return 0, leading to an intel_dp->common_rates[-1] access.
      
      Fix the above by making sure the max link params are always valid.
      
      The above access is undefined behaviour by definition, though to the
      best of my knowledge it does not cause a user-visible problem (see the
      previous patch for why). Nevertheless, it is undefined behaviour and it
      triggers a BUG() in CONFIG_UBSAN builds, hence CC:stable.
      
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Imre Deak <imre.deak@intel.com>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Acked-by: Jani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211018094154.1407705-4-imre.deak@intel.com
      (cherry picked from commit 9ad87de4)
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
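The failure mode in the max-link-params fix can be reproduced in a few lines of C. In this sketch, `rate_limit_len()` is a hypothetical helper that returns how many entries of an ascending rate table are at or below a limit. An unset limit of 0 yields 0, and indexing with that count minus one reads `rates[-1]`. The fix idea is to give the limit a valid value as soon as the connector exists. This sketch simply defaults to the lowest table entry, which is an assumption; the actual patch may choose the value differently. All names are illustrative.

```c
#include <assert.h>

/*
 * Hypothetical helper: number of entries in an ascending rate table that
 * are <= max_rate. Returns 0 if max_rate is below every entry, e.g. when
 * an uninitialized max_link_rate of 0 is passed in.
 */
static int rate_limit_len(const int *rates, int len, int max_rate)
{
	for (int i = len; i > 0; i--) {
		if (rates[i - 1] <= max_rate)
			return i;
	}
	return 0;
}

/* Hypothetical per-connector limits; max_link_rate == 0 means "unset". */
struct link_limits {
	int max_link_rate;
};

/*
 * Give the limit a valid default (here: the lowest table entry) when the
 * connector is set up, instead of leaving it at 0 until the DPCD is read.
 */
static void init_link_limits(struct link_limits *l, const int *rates, int len)
{
	assert(len >= 1);
	l->max_link_rate = rates[0];
}

static int highest_allowed_rate(const int *rates, int len,
				const struct link_limits *l)
{
	int n = rate_limit_len(rates, len, l->max_link_rate);

	/* With n == 0, rates[n - 1] would be the rates[-1] access. */
	assert(n > 0);
	return rates[n - 1];
}
```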
    • drm/i915/dp: Ensure sink rate values are always valid · 6c34bd45
      Imre Deak authored
      Currently, there are no sink rate values set for DP (vs. eDP) sinks until
      the DPCD capabilities are successfully read from the sink. During this
      time intel_dp->num_common_rates is 0, which can lead to an
      
      intel_dp->common_rates[-1]    (*)
      
      access, which is undefined behaviour, in the following cases:
      
      - In intel_dp_sync_state(), if the encoder is enabled without a sink
        connected to the encoder's connector (BIOS enabled a monitor, but the
        user unplugged the monitor before the driver loaded).
      - In intel_dp_sync_state() if the encoder is enabled with a sink
        connected, but for some reason the DPCD read has failed.
      - In intel_dp_compute_link_config() if modesetting a connector without
        a sink connected on it.
      - In intel_dp_compute_link_config() if modesetting a connector with a
        sink connected on it, but before probing the connector first.
      
      To avoid the (*) access in all the above cases, make sure that the sink
      rate table - and hence the common rate table - is always valid, by
      setting a default minimum sink rate when registering the connector
      before anything could use it.
      
      I also considered setting all the DP link rates by default, so that
      modesetting with higher resolution modes also succeeds in the last two
      cases above. However, if a sink is not connected, that would stop
      working after the first modeset due to the link training (LT) fallback
      logic. So this would need more work, beyond the scope of this fix.
      
      As I mentioned in the previous patch, I don't think the issue this patch
      fixes is user visible; however, it is undefined behaviour by definition
      and triggers a BUG() in CONFIG_UBSAN builds, hence CC:stable.
      
      v2: Clear the default sink rates, before initializing these for eDP.
      
      Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4297
      References: https://gitlab.freedesktop.org/drm/intel/-/issues/4298
      Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Imre Deak <imre.deak@intel.com>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Acked-by: Jani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211018143417.1452632-1-imre.deak@intel.com
      (cherry picked from commit 3f61ef97)
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
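The default-sink-rate idea in the commit above can be sketched in C, assuming simplified, hypothetical structures. The common rate table is treated as the intersection of the source and sink tables, computed with a merge walk over two ascending arrays. Seeding the sink table with one minimum rate at registration time therefore keeps the common table non-empty. The 162000 minimum and all names are assumptions of this sketch, not the i915 code.

```c
#include <assert.h>
#include <string.h>

#define MAX_RATES 8

/* Hypothetical, simplified DP rate state. */
struct dp_rates {
	int sink_rates[MAX_RATES];
	int num_sink_rates;
	int common_rates[MAX_RATES];
	int num_common_rates;
};

/* Intersection of two ascending arrays; returns the number of entries in out. */
static int intersect_rates(const int *a, int na, const int *b, int nb, int *out)
{
	int i = 0, j = 0, k = 0;

	while (i < na && j < nb) {
		if (a[i] == b[j]) {
			out[k++] = a[i];
			i++;
			j++;
		} else if (a[i] < b[j]) {
			i++;
		} else {
			j++;
		}
	}
	return k;
}

/*
 * Called when the connector is registered, before any DPCD read: seed the
 * sink table with a single minimum rate (assumed to be 162000 here).
 */
static void set_default_sink_rates(struct dp_rates *dp)
{
	memset(dp->sink_rates, 0, sizeof(dp->sink_rates));
	dp->sink_rates[0] = 162000;
	dp->num_sink_rates = 1;
}

static void update_common_rates(struct dp_rates *dp,
				const int *source_rates, int num_source_rates)
{
	dp->num_common_rates = intersect_rates(source_rates, num_source_rates,
					       dp->sink_rates, dp->num_sink_rates,
					       dp->common_rates);
}
```

Without `set_default_sink_rates()`, `num_sink_rates` is 0, the intersection is empty, and reading `common_rates[num_common_rates - 1]` would be the (*) access described above.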
    • Merge tag 'amd-drm-next-5.16-2021-10-22' of... · 367fe8dc
      Dave Airlie authored
      Merge tag 'amd-drm-next-5.16-2021-10-22' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
      
      amd-drm-next-5.16-2021-10-22:
      
      amdgpu:
      - PSP fix for resume
      - XGMI fixes
      - Interrupt fix in device tear down
      - Renoir USB-C DP alt mode fix for resume
      - DP 2.0 fixes
      - Yellow Carp display fixes
      - Misc display fixes
      - RAS fixes
      - IP Discovery enumeration fixes
      - VGH fixes
      - SR-IOV fixes
      - Revert ChromeOS workaround in display code
      - Cyan Skillfish fixes
      
      amdkfd:
      - Fix error handling in gpu memory allocation
      - Fix build warnings with some configs
      - SVM fixes
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      From: Alex Deucher <alexander.deucher@amd.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211022183112.4574-1-alexander.deucher@amd.com
  2. 22 Oct, 2021 15 commits
  3. 21 Oct, 2021 2 commits
  4. 20 Oct, 2021 17 commits