1. 12 Feb, 2020 3 commits
    • drm/i915/gt: Acquire ce->active before ce->pin_count/ce->pin_mutex · 5b92415e
      Chris Wilson authored
      Similar to commit ac0e331a ("drm/i915: Tighten atomicity of
      i915_active_acquire vs i915_active_release") we have the same race of
      trying to pin the context underneath a mutex while allowing the
      decrement to be atomic outside of that mutex. This leads to the problem
      where two threads may simultaneously try to pin the context and the
      second does not notice that it needs to repin the context.
      
      <2> [198.669621] kernel BUG at drivers/gpu/drm/i915/gt/intel_timeline.c:387!
      <4> [198.669703] invalid opcode: 0000 [#1] PREEMPT SMP PTI
      <4> [198.669712] CPU: 0 PID: 1246 Comm: gem_exec_create Tainted: G     U  W         5.5.0-rc6-CI-CI_DRM_7755+ #1
      <4> [198.669723] Hardware name:  /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017
      <4> [198.669776] RIP: 0010:timeline_advance+0x7b/0xe0 [i915]
      <4> [198.669785] Code: 00 48 c7 c2 10 f1 46 a0 48 c7 c7 70 1b 32 a0 e8 bb dd e7 e0 bf 01 00 00 00 e8 d1 af e7 e0 31 f6 bf 09 00 00 00 e8 35 ef d8 e0 <0f> 0b 48 c7 c1 48 fa 49 a0 ba 84 01 00 00 48 c7 c6 10 f1 46 a0 48
      <4> [198.669803] RSP: 0018:ffffc900004c3a38 EFLAGS: 00010296
      <4> [198.669810] RAX: ffff888270b35140 RBX: ffff88826f32ee00 RCX: 0000000000000006
      <4> [198.669818] RDX: 00000000000017c5 RSI: 0000000000000000 RDI: 0000000000000009
      <4> [198.669826] RBP: ffffc900004c3a64 R08: 0000000000000000 R09: 0000000000000000
      <4> [198.669834] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88826f9b5980
      <4> [198.669841] R13: 0000000000000cc0 R14: ffffc900004c3dc0 R15: ffff888253610068
      <4> [198.669849] FS:  00007f63e663fe40(0000) GS:ffff888276c00000(0000) knlGS:0000000000000000
      <4> [198.669857] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4> [198.669864] CR2: 00007f171f8e39a8 CR3: 000000026b1f6005 CR4: 00000000003606f0
      <4> [198.669872] Call Trace:
      <4> [198.669924]  intel_timeline_get_seqno+0x12/0x40 [i915]
      <4> [198.669977]  __i915_request_create+0x76/0x5a0 [i915]
      <4> [198.670024]  i915_request_create+0x86/0x1c0 [i915]
      <4> [198.670068]  i915_gem_do_execbuffer+0xbf2/0x2500 [i915]
      <4> [198.670082]  ? __lock_acquire+0x460/0x15d0
      <4> [198.670128]  i915_gem_execbuffer2_ioctl+0x11f/0x470 [i915]
      <4> [198.670171]  ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915]
      <4> [198.670181]  drm_ioctl_kernel+0xa7/0xf0
      <4> [198.670188]  drm_ioctl+0x2e1/0x390
      <4> [198.670233]  ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915]
      
      Fixes: 84135022 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin")
      References: ac0e331a ("drm/i915: Tighten atomicity of i915_active_acquire vs i915_active_release")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200127152829.2842149-1-chris@chris-wilson.co.uk
      (cherry picked from commit e5429340)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    • drm/i915: Tighten atomicity of i915_active_acquire vs i915_active_release · 7c34bb03
      Chris Wilson authored
      As we use a mutex to serialise the first acquire (as it may be a lengthy
      operation), but only an atomic decrement for the release, we have to
      be careful in case a second thread races and completes both
      acquire/release as the first finishes its acquire.
      
      Thread A			Thread B
      i915_active_acquire		i915_active_acquire
        atomic_read() == 0		  atomic_read() == 0
        mutex_lock()			  mutex_lock()
      				  atomic_read() == 0
      				    ref->active();
      				  atomic_inc()
      				  mutex_unlock()
        atomic_read() == 1
      				i915_active_release
      				  atomic_dec_and_test() -> 0
      				    ref->retire()
        atomic_inc() -> 1
        mutex_unlock()
      
      So thread A has acquired the ref->active_count but since the ref was
      still active at the time, it did not initialise it. By switching the
      check inside the mutex to an atomic increment only if already active, we
      close the race.
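      The "increment only if already active" check described above can be
      sketched in userspace C11. This is a minimal illustration, not the i915
      code: struct active_ref, inc_if_active(), ref_acquire() and ref_release()
      are hypothetical names, the mutex is only marked by comments because the
      sketch is single-threaded, and the activations counter stands in for the
      ref->active() callback. The kernel offers helpers with similar semantics,
      such as atomic_inc_not_zero().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the tracked object; not the i915 structure. */
struct active_ref {
	atomic_int count;	/* number of current users */
	int activations;	/* how often the slow-path initialisation ran */
};

static void ref_init(struct active_ref *ref)
{
	atomic_init(&ref->count, 0);
	ref->activations = 0;
}

/* Take a reference only if the count is already non-zero. */
static bool inc_if_active(struct active_ref *ref)
{
	int old = atomic_load(&ref->count);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&ref->count, &old, old + 1))
			return true;
	}
	return false;
}

static void ref_acquire(struct active_ref *ref)
{
	/* Fast path: somebody else already holds the ref active. */
	if (inc_if_active(ref))
		return;

	/* A real implementation would take the mutex here. */
	if (!inc_if_active(ref)) {
		/*
		 * The count was observed as zero in the same atomic step
		 * that refused the increment, so initialisation cannot be
		 * skipped by a racing release.
		 */
		ref->activations++;	/* stands in for ref->active() */
		atomic_fetch_add(&ref->count, 1);
	}
	/* ... and drop the mutex here. */
}

static void ref_release(struct active_ref *ref)
{
	int old = atomic_fetch_sub(&ref->count, 1);

	assert(old > 0);	/* releasing an idle ref is a caller bug */
	/* When old == 1 the last user left; retire work would run here. */
}
```

      The point of the design is that the zero check and the increment happen
      as one atomic operation, so a release that drops the count to zero
      between them can no longer go unnoticed.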
      
      Fixes: c9ad602f ("drm/i915: Split i915_active.mutex into an irq-safe spinlock for the rbtree")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200126102346.1877661-3-chris@chris-wilson.co.uk
      (cherry picked from commit ac0e331a)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    • drm/i915: Stub out i915_gpu_coredump_put · 9556e5c7
      Chris Wilson authored
      i915_gpu_coredump_put is currently only defined if
      CONFIG_DRM_I915_CAPTURE_ERROR is enabled; provide a stub otherwise.
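      The usual shape of such a stub can be sketched as follows. This is an
      illustration under assumptions rather than the patch itself: struct
      gpu_coredump, coredump_put() and the CAPTURE_ERROR_ENABLED macro are
      hypothetical stand-ins for the driver's type, function and Kconfig
      symbol.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's coredump object. */
struct gpu_coredump {
	int refcount;
};

#ifdef CAPTURE_ERROR_ENABLED	/* hypothetical stand-in for the Kconfig symbol */
/* Feature built in: drop one reference. */
static inline void coredump_put(struct gpu_coredump *gpu)
{
	if (gpu) {
		assert(gpu->refcount > 0);
		gpu->refcount--;
	}
}
#else
/* Feature compiled out: callers still build, the call does nothing. */
static inline void coredump_put(struct gpu_coredump *gpu)
{
	(void)gpu;
}
#endif
```

      With the empty inline variant in place, callers do not need their own
      #ifdef around each call site.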
      Reported-by: Mike Lothian <mike@fireburn.co.uk>
      Fixes: 742379c0 ("drm/i915: Start chopping up the GPU error capture")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mike Lothian <mike@fireburn.co.uk>
      Cc: Andi Shyti <andi.shyti@intel.com>
      Reviewed-by: Andi Shyti <andi.shyti@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200124192255.541355-1-chris@chris-wilson.co.uk
      (cherry picked from commit 7e36505d)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
  2. 11 Feb, 2020 5 commits
  3. 10 Feb, 2020 5 commits
  4. 09 Feb, 2020 5 commits
  5. 07 Feb, 2020 3 commits
  6. 06 Feb, 2020 3 commits
  7. 05 Feb, 2020 2 commits
    • drm/amd/dm/mst: Ignore payload update failures · 58fe03d6
      Lyude Paul authored
      Disabling a display on MST can potentially happen after the entire MST
      topology has been removed, which means that we can't communicate with
      the topology at all in this scenario. Likewise, this also means that we
      can't properly update payloads on the topology and as such, it's a good
      idea to ignore payload update failures when disabling displays.
      Currently, amdgpu makes the mistake of halting the payload update
      process when any payload update failures occur, resulting in leaving
      DC's local copies of the payload tables out of date.
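      The intended behaviour, carrying on through every step even when one
      fails, can be sketched like this. All names here (struct payload_steps,
      update_part1(), update_part2(), disable_stream_payload()) are
      hypothetical and only model the control flow; they are not the amdgpu or
      DRM helper functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a two-part payload update. */
struct payload_steps {
	bool part1_ok;		/* would part 1 succeed? */
	bool part2_ok;		/* would part 2 succeed? */
	int table_refreshes;	/* how often the local table copy was refreshed */
};

static bool update_part1(const struct payload_steps *s)
{
	return s->part1_ok;
}

static bool update_part2(const struct payload_steps *s)
{
	return s->part2_ok;
}

/*
 * Disable path: the topology may already be gone, so a failure is
 * reported but never aborts the remaining steps.
 */
static void disable_stream_payload(struct payload_steps *s)
{
	if (!update_part1(s))
		fprintf(stderr, "payload update part 1 failed, continuing\n");

	/* Always refresh the local copy so it does not go stale. */
	s->table_refreshes++;

	if (!update_part2(s))
		fprintf(stderr, "payload update part 2 failed, continuing\n");
}
```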
      
      This ends up causing problems with hotplugging MST topologies, and
      causes modesets on the second hotplug to fail like so:
      
      [drm] Failed to updateMST allocation table forpipe idx:1
      ------------[ cut here ]------------
      WARNING: CPU: 5 PID: 1511 at
      drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link.c:2677
      update_mst_stream_alloc_table+0x11e/0x130 [amdgpu]
      Modules linked in: cdc_ether usbnet fuse xt_conntrack nf_conntrack
      nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4
      nft_counter nft_compat nf_tables nfnetlink tun bridge stp llc sunrpc
      vfat fat wmi_bmof uvcvideo snd_hda_codec_realtek snd_hda_codec_generic
      snd_hda_codec_hdmi videobuf2_vmalloc snd_hda_intel videobuf2_memops
      videobuf2_v4l2 snd_intel_dspcfg videobuf2_common crct10dif_pclmul
      snd_hda_codec videodev crc32_pclmul snd_hwdep snd_hda_core
      ghash_clmulni_intel snd_seq mc joydev pcspkr snd_seq_device snd_pcm
      sp5100_tco k10temp i2c_piix4 snd_timer thinkpad_acpi ledtrig_audio snd
      wmi soundcore video i2c_scmi acpi_cpufreq ip_tables amdgpu(O)
      rtsx_pci_sdmmc amd_iommu_v2 gpu_sched mmc_core i2c_algo_bit ttm
      drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm
      crc32c_intel serio_raw hid_multitouch r8152 mii nvme r8169 nvme_core
      rtsx_pci pinctrl_amd
      CPU: 5 PID: 1511 Comm: gnome-shell Tainted: G           O      5.5.0-rc7Lyude-Test+ #4
      Hardware name: LENOVO FA495SIT26/FA495SIT26, BIOS R12ET22W(0.22 ) 01/31/2019
      RIP: 0010:update_mst_stream_alloc_table+0x11e/0x130 [amdgpu]
      Code: 28 00 00 00 75 2b 48 8d 65 e0 5b 41 5c 41 5d 41 5e 5d c3 0f b6 06
      49 89 1c 24 41 88 44 24 08 0f b6 46 01 41 88 44 24 09 eb 93 <0f> 0b e9
      2f ff ff ff e8 a6 82 a3 c2 66 0f 1f 44 00 00 0f 1f 44 00
      RSP: 0018:ffffac428127f5b0 EFLAGS: 00010202
      RAX: 0000000000000002 RBX: ffff8d1e166eee80 RCX: 0000000000000000
      RDX: ffffac428127f668 RSI: ffff8d1e166eee80 RDI: ffffac428127f610
      RBP: ffffac428127f640 R08: ffffffffc03d94a8 R09: 0000000000000000
      R10: ffff8d1e24b02000 R11: ffffac428127f5b0 R12: ffff8d1e1b83d000
      R13: ffff8d1e1bea0b08 R14: 0000000000000002 R15: 0000000000000002
      FS:  00007fab23ffcd80(0000) GS:ffff8d1e28b40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f151f1711e8 CR3: 00000005997c0000 CR4: 00000000003406e0
      Call Trace:
       ? mutex_lock+0xe/0x30
       dc_link_allocate_mst_payload+0x9a/0x210 [amdgpu]
       ? dm_read_reg_func+0x39/0xb0 [amdgpu]
       ? core_link_enable_stream+0x656/0x730 [amdgpu]
       core_link_enable_stream+0x656/0x730 [amdgpu]
       dce110_apply_ctx_to_hw+0x58e/0x5d0 [amdgpu]
       ? dcn10_verify_allow_pstate_change_high+0x1d/0x280 [amdgpu]
       ? dcn10_wait_for_mpcc_disconnect+0x3c/0x130 [amdgpu]
       dc_commit_state+0x292/0x770 [amdgpu]
       ? add_timer+0x101/0x1f0
       ? ttm_bo_put+0x1a1/0x2f0 [ttm]
       amdgpu_dm_atomic_commit_tail+0xb59/0x1ff0 [amdgpu]
       ? amdgpu_move_blit.constprop.0+0xb8/0x1f0 [amdgpu]
       ? amdgpu_bo_move+0x16d/0x2b0 [amdgpu]
       ? ttm_bo_handle_move_mem+0x118/0x570 [ttm]
       ? ttm_bo_validate+0x134/0x150 [ttm]
       ? dm_plane_helper_prepare_fb+0x1b9/0x2a0 [amdgpu]
       ? _cond_resched+0x15/0x30
       ? wait_for_completion_timeout+0x38/0x160
       ? _cond_resched+0x15/0x30
       ? wait_for_completion_interruptible+0x33/0x190
       commit_tail+0x94/0x130 [drm_kms_helper]
       drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
       drm_atomic_helper_set_config+0x70/0xb0 [drm_kms_helper]
       drm_mode_setcrtc+0x194/0x6a0 [drm]
       ? _cond_resched+0x15/0x30
       ? mutex_lock+0xe/0x30
       ? drm_mode_getcrtc+0x180/0x180 [drm]
       drm_ioctl_kernel+0xaa/0xf0 [drm]
       drm_ioctl+0x208/0x390 [drm]
       ? drm_mode_getcrtc+0x180/0x180 [drm]
       amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
       do_vfs_ioctl+0x458/0x6d0
       ksys_ioctl+0x5e/0x90
       __x64_sys_ioctl+0x16/0x20
       do_syscall_64+0x55/0x1b0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7fab2121f87b
      Code: 0f 1e fa 48 8b 05 0d 96 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff
      ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01
      f0 ff ff 73 01 c3 48 8b 0d dd 95 2c 00 f7 d8 64 89 01 48
      RSP: 002b:00007ffd045f9068 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00007ffd045f90a0 RCX: 00007fab2121f87b
      RDX: 00007ffd045f90a0 RSI: 00000000c06864a2 RDI: 000000000000000b
      RBP: 00007ffd045f90a0 R08: 0000000000000000 R09: 000055dbd2985d10
      R10: 000055dbd2196280 R11: 0000000000000246 R12: 00000000c06864a2
      R13: 000000000000000b R14: 0000000000000000 R15: 000055dbd2196280
      ---[ end trace 6ea888c24d2059cd ]---
      
      Note as well, I have only been able to reproduce this on setups with 2
      MST displays.
      
      Changes since v1:
      * Don't return false when part 1 or part 2 of updating the payloads
        fails, we don't want to abort at any step of the process even if
        things fail
      Reviewed-by: Mikita Lipski <Mikita.Lipski@amd.com>
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Acked-by: Harry Wentland <harry.wentland@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: update default voltage for boot od table for navi1x · 7b913a76
      Alex Deucher authored
      It needed to be updated as well so it will show the proper values
      if you reset to the defaults.
      
      Bug: https://gitlab.freedesktop.org/drm/amd/issues/1020
      Reviewed-by: Evan Quan <evan.quan@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  8. 04 Feb, 2020 12 commits
  9. 03 Feb, 2020 2 commits
    • drm/nouveau/kms/gv100-: avoid sending a core update until the first modeset · 137c4ba7
      Ben Skeggs authored
      The OR routing logic in NVKM does not expect to receive supervisor
      interrupts until the DD has provided consistent information on the
      ORs it's using and the EVO/NVD assembly state to match.
      
      The combination of changing window ownership + core channel update
      during display init triggered a situation where we'd disconnect an
      OR from the pad it was meant to still be driving on some systems.
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
    • drm/nouveau/kms/gv100-: move window ownership setup into modesetting path · 5bb88d07
      Ben Skeggs authored
      For various complicated reasons, we need to avoid sending a core update
      method during display init.  Until now we've been required to send one
      on GV100 and up, because we've been assigning windows to heads there and
      the HW is rather picky about when that's allowed.
      
      This moves window assignment into the modesetting path at a point where
      it's much safer to send our first update methods to NVDisplay.
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>