1. 27 Apr, 2020 4 commits
    • Chris Wilson's avatar
      drm/i915/gt: Sanitize GT first · 4243cd53
      Chris Wilson authored
      We see that if the HW doesn't actually sleep, the HW may eat the poison
      we set in its write-only HWSP during sanitize:
      
        intel_gt_resume.part.8: 0000:00:02.0
        __gt_unpark: 0000:00:02.0
        gt_sanitize: 0000:00:02.0 force:yes
        process_csb: 0000:00:02.0 vcs0: cs-irq head=5, tail=90
        process_csb: 0000:00:02.0 vcs0: csb[0]: status=0x5a5a5a5a:0x5a5a5a5a
        assert_pending_valid: Nothing pending for promotion!
      
      The CS TAIL pointer should have been reset by reset_csb_pointers(), so
      in this case it is likely that we have read back from the CPU cache and
      so we must clflush our control over that page. In doing so, push the
      sanitisation to the start of the GT sequence so that our poisoning is
      assuredly before we start talking to the HW.
      
      References: https://gitlab.freedesktop.org/drm/intel/-/issues/1794Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200427084000.10999-1-chris@chris-wilson.co.uk
      4243cd53
    • Chris Wilson's avatar
      drm/i915/gt: Check cacheline is valid before acquiring · 2759e395
      Chris Wilson authored
      The hwsp_cacheline pointer from i915_request is very, very flimsy. The
      i915_request.timeline (and the hwsp_cacheline) are lost upon retiring
      (after an RCU grace). Therefore we need to confirm that once we have the
      right pointer for the cacheline, it is not in the process of being
      retired and disposed of before we attempt to acquire a reference to the
      cacheline.
      
      <3>[  547.208237] BUG: KASAN: use-after-free in active_debug_hint+0x6a/0x70 [i915]
      <3>[  547.208366] Read of size 8 at addr ffff88822a0d2710 by task gem_exec_parall/2536
      
      <4>[  547.208547] CPU: 3 PID: 2536 Comm: gem_exec_parall Tainted: G     U            5.7.0-rc2-ged7a286b5d02d-kasan_117+ #1
      <4>[  547.208556] Hardware name: Dell Inc. XPS 13 9350/, BIOS 1.4.12 11/30/2016
      <4>[  547.208564] Call Trace:
      <4>[  547.208579]  dump_stack+0x96/0xdb
      <4>[  547.208707]  ? active_debug_hint+0x6a/0x70 [i915]
      <4>[  547.208719]  print_address_description.constprop.6+0x16/0x310
      <4>[  547.208841]  ? active_debug_hint+0x6a/0x70 [i915]
      <4>[  547.208963]  ? active_debug_hint+0x6a/0x70 [i915]
      <4>[  547.208975]  __kasan_report+0x137/0x190
      <4>[  547.209106]  ? active_debug_hint+0x6a/0x70 [i915]
      <4>[  547.209127]  kasan_report+0x32/0x50
      <4>[  547.209257]  ? i915_gemfs_fini+0x40/0x40 [i915]
      <4>[  547.209376]  active_debug_hint+0x6a/0x70 [i915]
      <4>[  547.209389]  debug_print_object+0xa7/0x220
      <4>[  547.209405]  ? lockdep_hardirqs_on+0x348/0x5f0
      <4>[  547.209426]  debug_object_assert_init+0x297/0x430
      <4>[  547.209449]  ? debug_object_free+0x360/0x360
      <4>[  547.209472]  ? lock_acquire+0x1ac/0x8a0
      <4>[  547.209592]  ? intel_timeline_read_hwsp+0x4f/0x840 [i915]
      <4>[  547.209737]  ? i915_active_acquire_if_busy+0x66/0x120 [i915]
      <4>[  547.209861]  i915_active_acquire_if_busy+0x66/0x120 [i915]
      <4>[  547.209990]  ? __live_alloc.isra.15+0xc0/0xc0 [i915]
      <4>[  547.210005]  ? rcu_read_lock_sched_held+0xd0/0xd0
      <4>[  547.210017]  ? print_usage_bug+0x580/0x580
      <4>[  547.210153]  intel_timeline_read_hwsp+0xbc/0x840 [i915]
      <4>[  547.210284]  __emit_semaphore_wait+0xd5/0x480 [i915]
      <4>[  547.210415]  ? i915_fence_get_timeline_name+0x110/0x110 [i915]
      <4>[  547.210428]  ? lockdep_hardirqs_on+0x348/0x5f0
      <4>[  547.210442]  ? _raw_spin_unlock_irq+0x2a/0x40
      <4>[  547.210567]  ? __await_execution.constprop.51+0x2e0/0x570 [i915]
      <4>[  547.210706]  i915_request_await_dma_fence+0x8f7/0xc70 [i915]
      
      Fixes: 85bedbf1 ("drm/i915/gt: Eliminate the trylock for reading a timeline's hwsp")
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: <stable@vger.kernel.org> # v5.6+
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200427093038.29219-1-chris@chris-wilson.co.uk
      2759e395
    • Chris Wilson's avatar
      drm/i915/execlists: Check preempt-timeout target before submit_ports · 68ace460
      Chris Wilson authored
      We evaluate *active, which is a pointer into execlists->inflight[]
      during dequeue to decide how long a preempt-timeout we need to apply.
      However, as soon as we do the submit_ports, the HW may send its ACK
      interrupt causing us to promote execlists->pending[] tp
      execlists->inflight[], overwriting the value of *active. We know *active
      is only stable until we submit (as we only submit when there is no
      pending promotion).
      
      [   16.102328] BUG: KCSAN: data-race in execlists_dequeue+0x1449/0x1600 [i915]
      [   16.102356]
      [   16.102375] race at unknown origin, with read to 0xffff8881e9500488 of 8 bytes by task 429 on cpu 1:
      [   16.102780]  execlists_dequeue+0x1449/0x1600 [i915]
      [   16.103160]  __execlists_submission_tasklet+0x48/0x60 [i915]
      [   16.103540]  execlists_submit_request+0x38e/0x3c0 [i915]
      [   16.103940]  submit_notify+0x8f/0xc0 [i915]
      [   16.104308]  __i915_sw_fence_complete+0x61/0x420 [i915]
      [   16.104683]  i915_sw_fence_complete+0x58/0x80 [i915]
      [   16.105054]  i915_sw_fence_commit+0x16/0x20 [i915]
      [   16.105457]  __i915_request_queue+0x60/0x70 [i915]
      [   16.105843]  i915_gem_do_execbuffer+0x2d6b/0x4230 [i915]
      [   16.106227]  i915_gem_execbuffer2_ioctl+0x2b0/0x580 [i915]
      [   16.106257]  drm_ioctl_kernel+0xe9/0x130
      [   16.106279]  drm_ioctl+0x27d/0x45e
      [   16.106311]  ksys_ioctl+0x89/0xb0
      [   16.106336]  __x64_sys_ioctl+0x42/0x60
      [   16.106370]  do_syscall_64+0x6e/0x2c0
      [   16.106397]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200426094231.21995-1-chris@chris-wilson.co.uk
      68ace460
    • Nick Desaulniers's avatar
      drm/i915: re-disable -Wframe-address · 9f4069b0
      Nick Desaulniers authored
      The top level Makefile disables this warning. When building an
      i386_defconfig with Clang, this warning is triggered a whole bunch via
      includes of headers from perf.
      
      Link: https://github.com/ClangBuiltLinux/continuous-integration/pull/182Signed-off-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200426214215.139435-1-ndesaulniers@google.com
      9f4069b0
  2. 25 Apr, 2020 4 commits
  3. 24 Apr, 2020 15 commits
  4. 23 Apr, 2020 6 commits
  5. 22 Apr, 2020 4 commits
    • Chris Wilson's avatar
      drm/i915/execlists: Drop request-before-CS assertion · 15501287
      Chris Wilson authored
      When we migrated to execlists, one of the conditions we wanted to test
      for was whether the breadcrumb seqno was being written before the
      breadcumb interrupt was delivered. This was following on from issues
      observed on previous generations which were not so strongly ordered. With
      the removal of the missed interrupt detection, we have not reliable
      means of detecting the out-of-order seqno/interrupt but instead tried to
      assert that the relationship between the CS event interrupt and the
      breadwrite should be strongly ordered. However, Icelake proves it is
      possible for the HW implementation to forget about minor little details
      such as write ordering and so the order between *processing* the CS
      event and the breadcrumb is unreliable.
      
      Remove the unreliable assertion, but leave a debug telltale in case we
      have reason to suspect.
      
      Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1658Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: default avatarMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200422141749.28709-1-chris@chris-wilson.co.uk
      15501287
    • Chris Wilson's avatar
      drm/i915/gem: Hold obj->vma.lock over for_each_ggtt_vma() · cb593e5d
      Chris Wilson authored
      While the ggtt vma are protected by their object lifetime, the list
      continues until it hits a non-ggtt vma, and that vma is not protected
      and may be freed as we inspect it. Hence, we require the obj->vma.lock
      to protect the list as we iterate.
      
      An example of forgetting to hold the obj->vma.lock is
      
      [1642834.464973] general protection fault, probably for non-canonical address 0xdead000000000122: 0000 [#1] SMP PTI
      [1642834.464977] CPU: 3 PID: 1954 Comm: Xorg Not tainted 5.6.0-300.fc32.x86_64 #1
      [1642834.464979] Hardware name: LENOVO 20ARS25701/20ARS25701, BIOS GJET94WW (2.44 ) 09/14/2017
      [1642834.465021] RIP: 0010:i915_gem_object_set_tiling+0x2c0/0x3e0 [i915]
      [1642834.465024] Code: 8b 84 24 18 01 00 00 f6 c4 80 74 59 49 8b 94 24 a0 00 00 00 49 8b 84 24 e0 00 00 00 49 8b 74 24 10 48 8b 92 30 01 00 00 89 c7 <80> ba 0a 06 00 00 03 0f 87 86 00 00 00 ba 00 00 08 00 b9 00 00 10
      [1642834.465025] RSP: 0018:ffffa98780c77d60 EFLAGS: 00010282
      [1642834.465028] RAX: ffff8d232bfb2578 RBX: 0000000000000002 RCX: ffff8d25873a0000
      [1642834.465029] RDX: dead000000000122 RSI: fffff0af8ac6e408 RDI: 000000002bfb2578
      [1642834.465030] RBP: ffff8d25873a0000 R08: ffff8d252bfb5638 R09: 0000000000000000
      [1642834.465031] R10: 0000000000000000 R11: ffff8d252bfb5640 R12: ffffa987801cb8f8
      [1642834.465032] R13: 0000000000001000 R14: ffff8d233e972e50 R15: ffff8d233e972d00
      [1642834.465034] FS:  00007f6a3d327f00(0000) GS:ffff8d25926c0000(0000) knlGS:0000000000000000
      [1642834.465036] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [1642834.465037] CR2: 00007f6a2064d000 CR3: 00000002fb57c001 CR4: 00000000001606e0
      [1642834.465038] Call Trace:
      [1642834.465083]  i915_gem_set_tiling_ioctl+0x122/0x230 [i915]
      [1642834.465121]  ? i915_gem_object_set_tiling+0x3e0/0x3e0 [i915]
      [1642834.465151]  drm_ioctl_kernel+0x86/0xd0 [drm]
      [1642834.465156]  ? avc_has_perm+0x3b/0x160
      [1642834.465178]  drm_ioctl+0x206/0x390 [drm]
      [1642834.465216]  ? i915_gem_object_set_tiling+0x3e0/0x3e0 [i915]
      [1642834.465221]  ? selinux_file_ioctl+0x122/0x1c0
      [1642834.465226]  ? __do_munmap+0x24b/0x4d0
      [1642834.465231]  ksys_ioctl+0x82/0xc0
      [1642834.465235]  __x64_sys_ioctl+0x16/0x20
      [1642834.465238]  do_syscall_64+0x5b/0xf0
      [1642834.465243]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [1642834.465245] RIP: 0033:0x7f6a3d7b047b
      [1642834.465247] Code: 0f 1e fa 48 8b 05 1d aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ed a9 0c 00 f7 d8 64 89 01 48
      [1642834.465249] RSP: 002b:00007ffe71adba28 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [1642834.465251] RAX: ffffffffffffffda RBX: 000055f99048fa40 RCX: 00007f6a3d7b047b
      [1642834.465253] RDX: 00007ffe71adba30 RSI: 00000000c0106461 RDI: 000000000000000e
      [1642834.465254] RBP: 0000000000000002 R08: 000055f98f3f1798 R09: 0000000000000002
      [1642834.465255] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000080
      [1642834.465257] R13: 000055f98f3f1690 R14: 00000000c0106461 R15: 00007ffe71adba30
      
      Now to take the spinlock during the list iteration, we need to break it
      down into two phases. In the first phase under the lock, we cannot sleep
      and so must defer the actual work to a second list, protected by the
      ggtt->mutex.
      
      We also need to hold the spinlock during creation of a new vma to
      serialise with updates of the tiling on the object.
      Reported-by: default avatarDave Airlie <airlied@redhat.com>
      Fixes: 2850748e ("drm/i915: Pull i915_vma_pin under the vm->mutex")
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: <stable@vger.kernel.org> # v5.5+
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200422072805.17340-1-chris@chris-wilson.co.uk
      cb593e5d
    • Chris Wilson's avatar
      drm/i915/selftests: Try to detect rollback during batchbuffer preemption · c92724de
      Chris Wilson authored
      Since batch buffers dominant execution time, most preemption requests
      should naturally occur during execution of a batch buffer. We wish to
      verify that should a preemption occur within a batch buffer, when we
      come to restart that batch buffer, it occurs at the interrupted
      instruction and most importantly does not rollback to an earlier point.
      
      v2: Do not clear the GPR at the start of the batch, but rely on them
      being clear for new contexts.
      Suggested-by: default avatarMika Kuoppala <mika.kuoppala@linux.intel.com>
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: default avatarMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200422100903.25216-1-chris@chris-wilson.co.uk
      c92724de
    • Chris Wilson's avatar
      drm/i915/selftests: Disable heartbeat around RPS interrupt testing · cbb6f880
      Chris Wilson authored
      For verifying reciving the EI interrupts, we need to hold the GPU in
      very precise conditions (in terms of C0 cycles during the EI). If we
      preempt the busy load to handle the heartbeat, this may perturb the busy
      load causing us to miss the interrupt.
      
      The other tests, while not as time sensitive, may also be slightly
      perturbed, so apply the heartbeat protection across all the
      measurements.
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200422083855.26842-1-chris@chris-wilson.co.uk
      cbb6f880
  6. 21 Apr, 2020 7 commits