1. 04 Jul, 2017 2 commits
  2. 03 Jul, 2017 3 commits
  3. 30 Jun, 2017 1 commit
  4. 29 Jun, 2017 3 commits
  5. 28 Jun, 2017 3 commits
  6. 27 Jun, 2017 5 commits
  7. 23 Jun, 2017 4 commits
  8. 22 Jun, 2017 7 commits
  9. 21 Jun, 2017 5 commits
  10. 20 Jun, 2017 7 commits
    • Chris Wilson's avatar
      drm/i915: Pass the right flags to i915_vma_move_to_active() · 25ffaa67
      Chris Wilson authored
      i915_vma_move_to_active() takes the execobject flags and not a boolean!
      Instead of passing EXEC_OBJECT_WRITE we passed true [i.e.
      EXEC_OBJECT_NEEDS_FENCE] causing us to start tracking the
      vma->last_fence access and since we forgot to clear that on unbinding,
      we caused a use-after-free.
      
      [  321.263854] BUG: KASAN: use-after-free in i915_gem_request_retire+0x1728/0x1740 [i915]
      [  321.264001] Read of size 8 at addr ffff880100fc67d8 by task gem_exec_reloc/2868
      
      [  321.264181] CPU: 0 PID: 2868 Comm: gem_exec_reloc Not tainted 4.12.0-rc6-CI-Custom_2759+ #1
      [  321.264195] Hardware name: GIGABYTE GB-BXBT-1900/MZBAYAB-00, BIOS F6 02/17/2015
      [  321.264208] Call Trace:
      [  321.264234]  dump_stack+0x67/0x99
      [  321.264260]  print_address_description+0x77/0x290
      [  321.264437]  ? i915_gem_request_retire+0x1728/0x1740 [i915]
      [  321.264459]  kasan_report+0x269/0x350
      [  321.264487]  __asan_report_load8_noabort+0x14/0x20
      [  321.264660]  i915_gem_request_retire+0x1728/0x1740 [i915]
      [  321.264841]  ? intel_ring_context_pin+0x131/0x690 [i915]
      [  321.265021]  i915_gem_request_alloc+0x2c6/0x1220 [i915]
      [  321.265044]  ? _raw_spin_unlock_irqrestore+0x3d/0x60
      [  321.265226]  i915_gem_do_execbuffer+0xac0/0x2a20 [i915]
      [  321.265250]  ? __lock_acquire+0xceb/0x5450
      [  321.265269]  ? entry_SYSCALL_64_fastpath+0x1c/0xb1
      [  321.265291]  ? kvmalloc_node+0x6b/0x80
      [  321.265310]  ? kvmalloc_node+0x6b/0x80
      [  321.265489]  ? eb_relocate_slow+0xbe0/0xbe0 [i915]
      [  321.265520]  ? ___slab_alloc.constprop.28+0x2ab/0x3d0
      [  321.265549]  ? debug_check_no_locks_freed+0x280/0x280
      [  321.265591]  ? __might_fault+0xc6/0x1b0
      [  321.265782]  i915_gem_execbuffer2+0x14a/0x3f0 [i915]
      [  321.265815]  drm_ioctl+0x4ba/0xaa0
      [  321.265986]  ? i915_gem_execbuffer+0xde0/0xde0 [i915]
      [  321.266017]  ? drm_getunique+0x270/0x270
      [  321.266068]  do_vfs_ioctl+0x17f/0xfa0
      [  321.266091]  ? __fget+0x1ba/0x330
      [  321.266112]  ? lock_acquire+0x390/0x390
      [  321.266133]  ? ioctl_preallocate+0x1d0/0x1d0
      [  321.266164]  ? __fget+0x1db/0x330
      [  321.266194]  ? __fget_light+0x79/0x1f0
      [  321.266219]  SyS_ioctl+0x3c/0x70
      [  321.266247]  entry_SYSCALL_64_fastpath+0x1c/0xb1
      [  321.266265] RIP: 0033:0x7fcede207357
      [  321.266279] RSP: 002b:00007ffef0effe58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [  321.266307] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fcede207357
      [  321.266321] RDX: 00007ffef0effef0 RSI: 0000000040406469 RDI: 0000000000000004
      [  321.266335] RBP: ffffffff812097c6 R08: 0000000000000008 R09: 0000000000000000
      [  321.266349] R10: 0000000000000008 R11: 0000000000000246 R12: ffff880116bcff98
      [  321.266363] R13: ffffffff81cb7cb3 R14: ffff880116bcff70 R15: 0000000000000000
      [  321.266385]  ? __this_cpu_preempt_check+0x13/0x20
      [  321.266406]  ? trace_hardirqs_off_caller+0x1d6/0x2c0
      
      [  321.266487] Allocated by task 2868:
      [  321.266568]  save_stack_trace+0x16/0x20
      [  321.266586]  kasan_kmalloc+0xee/0x180
      [  321.266602]  kasan_slab_alloc+0x12/0x20
      [  321.266620]  kmem_cache_alloc+0xc7/0x2e0
      [  321.266795]  i915_vma_instance+0x28c/0x1540 [i915]
      [  321.266964]  eb_lookup_vmas+0x5a7/0x2250 [i915]
      [  321.267130]  i915_gem_do_execbuffer+0x69a/0x2a20 [i915]
      [  321.267296]  i915_gem_execbuffer2+0x14a/0x3f0 [i915]
      [  321.267315]  drm_ioctl+0x4ba/0xaa0
      [  321.267333]  do_vfs_ioctl+0x17f/0xfa0
      [  321.267350]  SyS_ioctl+0x3c/0x70
      [  321.267369]  entry_SYSCALL_64_fastpath+0x1c/0xb1
      
      [  321.267428] Freed by task 177:
      [  321.267502]  save_stack_trace+0x16/0x20
      [  321.267521]  kasan_slab_free+0xad/0x180
      [  321.267539]  kmem_cache_free+0xc5/0x340
      [  321.267710]  i915_vma_unbind+0x666/0x10a0 [i915]
      [  321.267880]  i915_vma_close+0x23a/0x2f0 [i915]
      [  321.268048]  __i915_gem_free_objects+0x17d/0xc70 [i915]
      [  321.268215]  __i915_gem_free_work+0x49/0x70 [i915]
      [  321.268234]  process_one_work+0x66f/0x1410
      [  321.268252]  worker_thread+0xe1/0xe90
      [  321.268269]  kthread+0x304/0x410
      [  321.268285]  ret_from_fork+0x27/0x40
      
      [  321.268346] The buggy address belongs to the object at ffff880100fc6640
                      which belongs to the cache i915_vma of size 656
      [  321.268550] The buggy address is located 408 bytes inside of
                      656-byte region [ffff880100fc6640, ffff880100fc68d0)
      [  321.268741] The buggy address belongs to the page:
      [  321.268837] page:ffffea000403f000 count:1 mapcount:0 mapping:          (null) index:0xffff880100fc5980 compound_mapcount: 0
      [  321.269045] flags: 0x8000000000008100(slab|head)
      [  321.269147] raw: 8000000000008100 0000000000000000 ffff880100fc5980 00000001001e001d
      [  321.269312] raw: ffffea0004038e20 ffff880116b46240 ffff88011646c640 0000000000000000
      [  321.269484] page dumped because: kasan: bad access detected
      
      [  321.269665] Memory state around the buggy address:
      [  321.269778]  ffff880100fc6680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  321.269949]  ffff880100fc6700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  321.270115] >ffff880100fc6780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  321.270279]                                                     ^
      [  321.270410]  ffff880100fc6800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  321.270576]  ffff880100fc6880: fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc fc
      [  321.270740] ==================================================================
      [  321.270903] Disabling lock debugging due to kernel taint
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101511
      Fixes: 7dd4f672 ("drm/i915: Async GPU relocation processing")
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170620124321.1108-2-chris@chris-wilson.co.ukReviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      25ffaa67
    • Michel Thierry's avatar
      drm/i915: Enable Engine reset and recovery support · d3d3765f
      Michel Thierry authored
      This feature is made available only from Gen8, for previous gen devices
      driver uses legacy full gpu reset.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Signed-off-by: default avatarTomas Elf <tomas.elf@intel.com>
      Signed-off-by: default avatarArun Siluvery <arun.siluvery@linux.intel.com>
      Signed-off-by: default avatarMichel Thierry <michel.thierry@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-10-michel.thierry@intel.comReviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-10-chris@chris-wilson.co.uk
      d3d3765f
    • Michel Thierry's avatar
      drm/i915/selftests: reset engine self tests · abeb4def
      Michel Thierry authored
      Check that we can reset specific engines, also check the fallback to
      full reset if something didn't work.
      
      v2: rebase.
      v3: use RESET_ENGINE_IN_PROGRESS flag.
      v4: use I915_RESET_ENGINE flag.
      Signed-off-by: default avatarMichel Thierry <michel.thierry@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-12-michel.thierry@intel.comReviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-9-chris@chris-wilson.co.uk
      abeb4def
    • Michel Thierry's avatar
      drm/i915: Export per-engine reset count info to debugfs · 061d06a2
      Michel Thierry authored
      A new variable is added to export the reset counts to debugfs, this
      includes full gpu reset and engine reset count. This is useful for tests
      where they are expected to trigger reset; these counts are checked before
      and after the test to ensure the same.
      
      v2: Include reset engine count in i915_engine_info too (Chris).
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Signed-off-by: default avatarArun Siluvery <arun.siluvery@linux.intel.com>
      Signed-off-by: default avatarMichel Thierry <michel.thierry@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-8-michel.thierry@intel.comReviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-8-chris@chris-wilson.co.uk
      061d06a2
    • Michel Thierry's avatar
      drm/i915: Add engine reset count to error state · 702c8f8e
      Michel Thierry authored
      Driver maintains count of how many times a given engine is reset, useful to
      capture this in error state also. It gives an idea of how engine is coping
      up with the workloads it is executing before this error state.
      
      A follow-up patch will provide this information in debugfs.
      
      v2: s/engine_reset/reset_engine/ (Chris)
          Define count as unsigned int (Tvrtko)
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Signed-off-by: default avatarArun Siluvery <arun.siluvery@linux.intel.com>
      Signed-off-by: default avatarMichel Thierry <michel.thierry@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-7-michel.thierry@intel.comReviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-7-chris@chris-wilson.co.uk
      702c8f8e
    • Michel Thierry's avatar
      drm/i915: Add support for per engine reset recovery · a1ef70e1
      Michel Thierry authored
      This change implements support for per-engine reset as an initial, less
      intrusive hang recovery option to be attempted before falling back to the
      legacy full GPU reset recovery mode if necessary. This is only supported
      from Gen8 onwards.
      
      Hangchecker determines which engines are hung and invokes error handler to
      recover from it. Error handler schedules recovery for each of those engines
      that are hung. The recovery procedure is as follows,
       - identifies the request that caused the hang and it is dropped
       - force engine to idle: this is done by issuing a reset request
       - reset the engine
       - re-init the engine to resume submissions.
      
      If engine reset fails then we fall back to heavy weight full gpu reset
      which resets all engines and reinitiazes complete state of HW and SW.
      
      v2: Rebase.
      v3: s/*engine_reset*/*reset_engine*/; freeze engine and irqs before
      calling i915_gem_reset_engine (Chris).
      v4: Rebase, modify i915_gem_reset_prepare to use a ring mask and
      reuse the function for reset_engine.
      v5: intel_reset_engine_start/cancel instead of request/unrequest_reset.
      v6: Clean up reset_engine function to not require mutex, i.e. no need to call
      revoke/restore_fences and _retire_requests (Chris).
      v7: Remove leftovers from v5, i.e. no need to disable irq, hold
      forcewake or wakeup the handoff bit (Chris).
      v8: engine_retire_requests should be (and it was) static; explain that
      we have to re-init the engine after reset, which is why the init_hw call
      is needed; check reset-in-progress flag (Chris).
      v9: Rebase, include code to pass the active request to gem_reset_engine
      (as it is already done in full reset). Remove unnecessary
      intel_reset_engine_start/cancel, these are executed as part of the
      reset.
      v10: Rebase, use the right I915_RESET_ENGINE flag.
      v11: Fixup to call reset_finish_engine even on error.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Signed-off-by: default avatarTomas Elf <tomas.elf@intel.com>
      Signed-off-by: default avatarArun Siluvery <arun.siluvery@linux.intel.com>
      Signed-off-by: default avatarMichel Thierry <michel.thierry@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-6-michel.thierry@intel.comReviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-6-chris@chris-wilson.co.uk
      a1ef70e1
    • Michel Thierry's avatar
      drm/i915: Modify error handler for per engine hang recovery · 142bc7d9
      Michel Thierry authored
      This is a preparatory patch which modifies error handler to do per engine
      hang recovery. The actual patch which implements this sequence follows
      later in the series. The aim is to prepare existing recovery function to
      adapt to this new function where applicable (which fails at this point
      because core implementation is lacking) and continue recovery using legacy
      full gpu reset.
      
      A helper function is also added to query the availability of engine
      reset. A subsequent patch will add the capability to query which type
      of reset is present (engine -> full -> no-reset) via the get-param
      ioctl.
      
      It has been decided that the error events that are used to notify user of
      reset will only be sent in case if full chip reset. In case of just
      single (or multiple) engine resets, userspace won't be notified by these
      events.
      
      Note that this implementation of engine reset is for i915 directly
      submitting to the ELSP, where the driver manages the hang detection,
      recovery and resubmission. With GuC submission these tasks are shared
      between driver and firmware; i915 will still responsible for detecting a
      hang, and when it does it will have to request GuC to reset that Engine and
      remind the firmware about the outstanding submissions. This will be
      added in different patch.
      
      v2: rebase, advertise engine reset availability in platform definition,
      add note about GuC submission.
      v3: s/*engine_reset*/*reset_engine*/. (Chris)
      Handle reset as 2 level resets, by first going to engine only and fall
      backing to full/chip reset as needed, i.e. reset_engine will need the
      struct_mutex.
      v4: Pass the engine mask to i915_reset. (Chris)
      v5: Rebase, update selftests.
      v6: Rebase, prepare for mutex-less reset engine.
      v7: Pass reset_engine mask as a function parameter, and iterate over the
      engine mask for reset_engine. (Chris)
      v8: Use i915.reset >=2 in has_reset_engine; remove redundant reset
      logging; add a reset-engine-in-progress flag to prevent concurrent
      resets, and avoid dual purposing of reset-backoff. (Chris)
      v9: Support reset of different engines in parallel (Chris)
      v10: Handle reset-engine flag locking better (Chris)
      v11: Squash in reporting of per-engine-reset availability.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Signed-off-by: default avatarIan Lister <ian.lister@intel.com>
      Signed-off-by: default avatarTomas Elf <tomas.elf@intel.com>
      Signed-off-by: default avatarArun Siluvery <arun.siluvery@linux.intel.com>
      Signed-off-by: default avatarMichel Thierry <michel.thierry@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-4-michel.thierry@intel.comReviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-5-chris@chris-wilson.co.uk
      142bc7d9