1. 03 Jul, 2017 1 commit
    • drm/i915: Disable MSI for all pre-gen5 · ce3f7163
      Ville Syrjälä authored
      We have pretty clear evidence that MSIs are getting lost on g4x and
      somehow the interrupt logic doesn't seem to recover from that state
      even if we try hard to clear the IIR.
      
      Disabling IER around the normal IIR clearing in the irq handler isn't
      sufficient to avoid this, so the problem really seems to be further
      up the interrupt chain. Toggling IER like that should guarantee that
      there's always an edge if any IIR bits are set after the interrupt
      handler is done, which should normally guarantee that the CPU
      interrupt is generated. That approach seems to work perfectly on
      VLV/CHV, but apparently not on g4x.
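
      The IER toggling described above can be illustrated with a toy model
      of an edge-triggered interrupt line. This is only a sketch: the
      struct, the function names and the "line is high while IIR & IER is
      non-zero" rule are assumptions made for illustration, not i915 code
      or a description of the real hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical): the CPU only sees rising edges of the line. */
struct irq_model {
	uint32_t iir;    /* latched interrupt identity bits */
	uint32_t ier;    /* interrupt enable bits */
	bool level;      /* current state of the output line */
	unsigned edges;  /* rising edges delivered to the CPU */
};

/* Recompute the line and count a rising edge if one occurred. */
void update_line(struct irq_model *m)
{
	bool level = (m->iir & m->ier) != 0;

	if (level && !m->level)
		m->edges++;
	m->level = level;
}

void raise_event(struct irq_model *m, uint32_t bits)
{
	m->iir |= bits;
	update_line(m);
}

void write_ier(struct irq_model *m, uint32_t val)
{
	m->ier = val;
	update_line(m);
}

void ack_iir(struct irq_model *m, uint32_t bits)
{
	m->iir &= ~bits;
	update_line(m);
}

/* Plain handler: an event arriving mid-handler keeps the line high,
 * so no new edge is produced for it. */
void handler_plain(struct irq_model *m, uint32_t late_event)
{
	uint32_t iir = m->iir;

	raise_event(m, late_event);  /* arrives after the read, before the ack */
	ack_iir(m, iir);
}

/* Handler that masks IER around the IIR clearing: restoring IER
 * re-raises the line if any IIR bits are still set. */
void handler_toggle_ier(struct irq_model *m, uint32_t late_event)
{
	uint32_t ier = m->ier;
	uint32_t iir;

	write_ier(m, 0);
	iir = m->iir;
	raise_event(m, late_event);  /* arrives after the read, before the ack */
	ack_iir(m, iir);
	write_ier(m, ier);
}
```

      In this model the plain handler leaves the late bit pending without a
      new edge, while the IER-toggling handler produces a second edge. The
      commit's point is that g4x loses interrupts even with this in place.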
      
      MSI is documented to be broken on 965gm at least. The chipset spec
      says MSI is defeatured because interrupts can be delayed or lost,
      which fits well with what we're seeing on g4x. Previously we've
      already disabled GMBUS interrupts on g4x because somehow GMBUS
      manages to raise legacy interrupts even when MSI is enabled.
      
      Since there's such widespread MSI breakage all over pre-gen5
      land, let's just give up on MSI on these platforms.
      
      Seqno reporting might be negatively affected by this since the legacy
      interrupts aren't guaranteed to be ordered with the seqno writes,
      whereas MSI interrupts may be? But an occasionally missed seqno
      seems like a small price to pay for generally working interrupts.
      
      Cc: stable@vger.kernel.org
      Cc: Diego Viola <diego.viola@gmail.com>
      Tested-by: Diego Viola <diego.viola@gmail.com>
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101261
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170626203051.28480-1-ville.syrjala@linux.intel.com
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      (cherry picked from commit e38c2da0)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
  2. 30 Jun, 2017 1 commit
  3. 29 Jun, 2017 1 commit
  4. 27 Jun, 2017 2 commits
    • drm/i915/gvt: Don't read ADPA_CRT_HOTPLUG_MONITOR from host · 75e64ff2
      Xiong Zhang authored
      When the host connects a CRT screen, a Linux guest detects two
      screens: CRT and DP. This is wrong, as the Linux guest has only
      one DP.
      
      To keep the guest from seeing the host CRT screen, we should set
      ADPA_CRT_HOTPLUG_MONITOR to none. But MMIO_RO(PCH_ADPA) prevents
      that, so MMIO_DH should be used instead of MMIO_RO.
      
      v2: Clear its status to none at initialization, so the guest doesn't
          get the host CRT. (Zhangyu)
      v3: SKL doesn't have this register, limit it to pre_skl. (xiong)
      Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
      Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
    • drm/i915/gvt: Set initial PORT_CLK_SEL vreg for BDW · 295a0d0b
      Xiong Zhang authored
      On BDW, when the host physical screen and the guest virtual screen
      aren't on the same DDI port, the guest i915 driver prints the
      following error and stops running.
      [    6.775873] BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
      [    6.775928] IP: intel_ddi_clock_get+0x81/0x430 [i915]
      [    6.776206] Call Trace:
      [    6.776233]  ? vgpu_read32+0x4f/0x100 [i915]
      [    6.776264]  intel_ddi_get_config+0x11c/0x230 [i915]
      [    6.776298]  intel_modeset_setup_hw_state+0x313/0xd40 [i915]
      [    6.776334]  intel_modeset_init+0xe49/0x18d0 [i915]
      [    6.776368]  ? vgpu_write32+0x53/0x100 [i915]
      [    6.776731]  ? intel_i2c_reset+0x42/0x50 [i915]
      [    6.777085]  ? intel_setup_gmbus+0x32a/0x350 [i915]
      [    6.777427]  i915_driver_load+0xabc/0x14d0 [i915]
      [    6.777768]  i915_pci_probe+0x4f/0x70 [i915]
      
      The NULL pointer is the guest's intel_crtc_state->shared_dpll, which
      is set in haswell_get_ddi_pll(). When the guest and host screens are
      on different DDI ports, the host driver won't set
      PORT_CLK_SEL(guest_port), so haswell_get_ddi_pll() returns NULL and
      doesn't set pipe_config->shared_dpll. Once later code references
      this structure, it triggers the above error.
      
      This patch sets the initial value of the guest's
      PORT_CLK_SEL(guest_port) to LCPLL_810. The guest i915 driver will
      then reset this value according to the guest screen mode.
      Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
      Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
  5. 26 Jun, 2017 5 commits
    • drm/i915: Clear execbuf's vma backpointer upon release · bdbbf7d6
      Chris Wilson authored
      commit 2889caa9 ("drm/i915: Eliminate lots of iterations over the
      execobjects array") jiggled around the error handling and replaced a test
      that we cleaned up properly after ourselves with an assertion. That
      assertion failed because in the release function (moments after the
      assertion) we were indeed forgetting to mark the vma as cleared. The
      consequence was when testing an invalid relocation address, we would try
      to release the vma twice (following the couple of attempts to verify the
      address) and on the second release notice that the first release was
      incomplete.
      
      Testcase: igt/gem_reloc_overflow/invalid-address
      Fixes: 2889caa9 ("drm/i915: Eliminate lots of iterations over the execobjects array")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170622104722.2583-1-chris@chris-wilson.co.uk
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      (cherry picked from commit 51d05e1b)
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Pass the right flags to i915_vma_move_to_active() · b88eb199
      Chris Wilson authored
      i915_vma_move_to_active() takes the execobject flags and not a boolean!
      Instead of passing EXEC_OBJECT_WRITE we passed true [i.e.
      EXEC_OBJECT_NEEDS_FENCE] causing us to start tracking the
      vma->last_fence access and since we forgot to clear that on unbinding,
      we caused a use-after-free.
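
      The flags-versus-boolean mix-up can be shown in isolation. The
      following is a minimal sketch, not the i915 implementation: the
      struct and function are hypothetical stand-ins, bit 0 for
      EXEC_OBJECT_NEEDS_FENCE follows from the text above, and the value
      used for EXEC_OBJECT_WRITE is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 matches the text above ("true" aliases NEEDS_FENCE); the
 * WRITE bit position is an assumption made for this sketch. */
#define EXEC_OBJECT_NEEDS_FENCE (1u << 0)
#define EXEC_OBJECT_WRITE       (1u << 2)

/* Hypothetical stand-in for the tracking state kept per vma. */
struct vma_model {
	bool tracks_write;
	bool tracks_fence;
};

/* Stand-in for a function that takes a flags word, not a boolean. */
void vma_move_to_active(struct vma_model *vma, unsigned int flags)
{
	vma->tracks_write = (flags & EXEC_OBJECT_WRITE) != 0;
	vma->tracks_fence = (flags & EXEC_OBJECT_NEEDS_FENCE) != 0;
}
```

      Passing true converts to 1, which sets only bit 0: the fence gets
      tracked and the write does not, the opposite of what was intended.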
      
      [  321.263854] BUG: KASAN: use-after-free in i915_gem_request_retire+0x1728/0x1740 [i915]
      [  321.264001] Read of size 8 at addr ffff880100fc67d8 by task gem_exec_reloc/2868
      
      [  321.264181] CPU: 0 PID: 2868 Comm: gem_exec_reloc Not tainted 4.12.0-rc6-CI-Custom_2759+ #1
      [  321.264195] Hardware name: GIGABYTE GB-BXBT-1900/MZBAYAB-00, BIOS F6 02/17/2015
      [  321.264208] Call Trace:
      [  321.264234]  dump_stack+0x67/0x99
      [  321.264260]  print_address_description+0x77/0x290
      [  321.264437]  ? i915_gem_request_retire+0x1728/0x1740 [i915]
      [  321.264459]  kasan_report+0x269/0x350
      [  321.264487]  __asan_report_load8_noabort+0x14/0x20
      [  321.264660]  i915_gem_request_retire+0x1728/0x1740 [i915]
      [  321.264841]  ? intel_ring_context_pin+0x131/0x690 [i915]
      [  321.265021]  i915_gem_request_alloc+0x2c6/0x1220 [i915]
      [  321.265044]  ? _raw_spin_unlock_irqrestore+0x3d/0x60
      [  321.265226]  i915_gem_do_execbuffer+0xac0/0x2a20 [i915]
      [  321.265250]  ? __lock_acquire+0xceb/0x5450
      [  321.265269]  ? entry_SYSCALL_64_fastpath+0x1c/0xb1
      [  321.265291]  ? kvmalloc_node+0x6b/0x80
      [  321.265310]  ? kvmalloc_node+0x6b/0x80
      [  321.265489]  ? eb_relocate_slow+0xbe0/0xbe0 [i915]
      [  321.265520]  ? ___slab_alloc.constprop.28+0x2ab/0x3d0
      [  321.265549]  ? debug_check_no_locks_freed+0x280/0x280
      [  321.265591]  ? __might_fault+0xc6/0x1b0
      [  321.265782]  i915_gem_execbuffer2+0x14a/0x3f0 [i915]
      [  321.265815]  drm_ioctl+0x4ba/0xaa0
      [  321.265986]  ? i915_gem_execbuffer+0xde0/0xde0 [i915]
      [  321.266017]  ? drm_getunique+0x270/0x270
      [  321.266068]  do_vfs_ioctl+0x17f/0xfa0
      [  321.266091]  ? __fget+0x1ba/0x330
      [  321.266112]  ? lock_acquire+0x390/0x390
      [  321.266133]  ? ioctl_preallocate+0x1d0/0x1d0
      [  321.266164]  ? __fget+0x1db/0x330
      [  321.266194]  ? __fget_light+0x79/0x1f0
      [  321.266219]  SyS_ioctl+0x3c/0x70
      [  321.266247]  entry_SYSCALL_64_fastpath+0x1c/0xb1
      [  321.266265] RIP: 0033:0x7fcede207357
      [  321.266279] RSP: 002b:00007ffef0effe58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [  321.266307] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fcede207357
      [  321.266321] RDX: 00007ffef0effef0 RSI: 0000000040406469 RDI: 0000000000000004
      [  321.266335] RBP: ffffffff812097c6 R08: 0000000000000008 R09: 0000000000000000
      [  321.266349] R10: 0000000000000008 R11: 0000000000000246 R12: ffff880116bcff98
      [  321.266363] R13: ffffffff81cb7cb3 R14: ffff880116bcff70 R15: 0000000000000000
      [  321.266385]  ? __this_cpu_preempt_check+0x13/0x20
      [  321.266406]  ? trace_hardirqs_off_caller+0x1d6/0x2c0
      
      [  321.266487] Allocated by task 2868:
      [  321.266568]  save_stack_trace+0x16/0x20
      [  321.266586]  kasan_kmalloc+0xee/0x180
      [  321.266602]  kasan_slab_alloc+0x12/0x20
      [  321.266620]  kmem_cache_alloc+0xc7/0x2e0
      [  321.266795]  i915_vma_instance+0x28c/0x1540 [i915]
      [  321.266964]  eb_lookup_vmas+0x5a7/0x2250 [i915]
      [  321.267130]  i915_gem_do_execbuffer+0x69a/0x2a20 [i915]
      [  321.267296]  i915_gem_execbuffer2+0x14a/0x3f0 [i915]
      [  321.267315]  drm_ioctl+0x4ba/0xaa0
      [  321.267333]  do_vfs_ioctl+0x17f/0xfa0
      [  321.267350]  SyS_ioctl+0x3c/0x70
      [  321.267369]  entry_SYSCALL_64_fastpath+0x1c/0xb1
      
      [  321.267428] Freed by task 177:
      [  321.267502]  save_stack_trace+0x16/0x20
      [  321.267521]  kasan_slab_free+0xad/0x180
      [  321.267539]  kmem_cache_free+0xc5/0x340
      [  321.267710]  i915_vma_unbind+0x666/0x10a0 [i915]
      [  321.267880]  i915_vma_close+0x23a/0x2f0 [i915]
      [  321.268048]  __i915_gem_free_objects+0x17d/0xc70 [i915]
      [  321.268215]  __i915_gem_free_work+0x49/0x70 [i915]
      [  321.268234]  process_one_work+0x66f/0x1410
      [  321.268252]  worker_thread+0xe1/0xe90
      [  321.268269]  kthread+0x304/0x410
      [  321.268285]  ret_from_fork+0x27/0x40
      
      [  321.268346] The buggy address belongs to the object at ffff880100fc6640
                      which belongs to the cache i915_vma of size 656
      [  321.268550] The buggy address is located 408 bytes inside of
                      656-byte region [ffff880100fc6640, ffff880100fc68d0)
      [  321.268741] The buggy address belongs to the page:
      [  321.268837] page:ffffea000403f000 count:1 mapcount:0 mapping:          (null) index:0xffff880100fc5980 compound_mapcount: 0
      [  321.269045] flags: 0x8000000000008100(slab|head)
      [  321.269147] raw: 8000000000008100 0000000000000000 ffff880100fc5980 00000001001e001d
      [  321.269312] raw: ffffea0004038e20 ffff880116b46240 ffff88011646c640 0000000000000000
      [  321.269484] page dumped because: kasan: bad access detected
      
      [  321.269665] Memory state around the buggy address:
      [  321.269778]  ffff880100fc6680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  321.269949]  ffff880100fc6700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  321.270115] >ffff880100fc6780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  321.270279]                                                     ^
      [  321.270410]  ffff880100fc6800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  321.270576]  ffff880100fc6880: fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc fc
      [  321.270740] ==================================================================
      [  321.270903] Disabling lock debugging due to kernel taint
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101511
      Fixes: 7dd4f672 ("drm/i915: Async GPU relocation processing")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170620124321.1108-2-chris@chris-wilson.co.uk
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      (cherry picked from commit 25ffaa67)
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915/cnl: Fix RMW on ddi vswing sequence. · 33b92c1e
      Rodrigo Vivi authored
      Paulo noticed that we were failing to clear a few bits
      before writing values back to the register in
      these RMW MMIO operations.
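
      The class of bug is easy to reproduce outside the driver. Below is a
      minimal sketch of a read-modify-write with and without the mask
      clear; the field name, its position and both helper functions are
      hypothetical and do not correspond to the actual CNL register layout.

```c
#include <stdint.h>

/* Hypothetical 4-bit field at bits 7:4 of a 32-bit register. */
#define SWING_SEL_MASK  (0xfu << 4)
#define SWING_SEL(x)    (((uint32_t)(x) << 4) & SWING_SEL_MASK)

/* Buggy RMW: ORs the new value on top of whatever was already there. */
uint32_t rmw_without_clear(uint32_t reg, uint32_t val)
{
	return reg | SWING_SEL(val);
}

/* Correct RMW: clear the field first, then set the new value. */
uint32_t rmw_with_clear(uint32_t reg, uint32_t val)
{
	reg &= ~SWING_SEL_MASK;
	return reg | SWING_SEL(val);
}
```

      With the field previously at 0xf, writing 0x2 without the clear
      leaves it at 0xf, while bits outside the field are preserved in
      both versions.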
      
      v2: Remove "POST_" from CURSOR_COEFF_MASK. (Paulo).
      v3: Remove unnecessary braces. (Jani).
      
      Fixes: cf54ca8b ("drm/i915/cnl: Implement voltage swing sequence.")
      Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Cc: Manasi Navare <manasi.d.navare@intel.com>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1497897572-22520-1-git-send-email-rodrigo.vivi@intel.com
      (cherry picked from commit 1f588aeb)
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915/gvt: Fix inconsistent locks holding sequence · f16bd3dd
      Chuanxiao Dong authored
      There are two different locking sequences.
      
      One is in the thread which is started by the vfio ioctl to do
      the iommu unmapping. The locking sequence is:
      	down_read(&group_lock) ----> mutex_lock(&cache_lock)
      
      The other is in the vfio release thread which will unpin all
      the cached pages. The locking sequence is:
      	mutex_lock(&cache_lock) ---> down_read(&group_lock)
      
      The cache_lock is used to protect the rb tree of cache nodes,
      and doing the vfio unpin doesn't require this lock. Move the
      vfio unpin out of the cache_lock protected region.
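
      The shape of the fix can be sketched with a small single-threaded
      model. Everything here is hypothetical: the locks are plain flags
      rather than kernel primitives, the cache is an array instead of an
      rb tree, and the names are invented for illustration. The point is
      only that each node is detached under cache_lock and unpinned after
      the lock is dropped, so group_lock is never taken inside cache_lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Flag-based stand-in for a lock; enough to observe nesting. */
struct lock_model {
	bool held;
};

void model_lock(struct lock_model *l)   { l->held = true; }
void model_unlock(struct lock_model *l) { l->held = false; }

#define CACHE_MAX 8

struct cache_model {
	struct lock_model cache_lock;  /* protects pfn[] and count */
	struct lock_model group_lock;  /* taken inside the unpin path */
	unsigned long pfn[CACHE_MAX];
	size_t count;
	size_t unpinned;
	bool inverted;  /* set if group_lock is taken under cache_lock */
};

/* Stand-in for the unpin call, which takes group_lock internally. */
void unpin_page(struct cache_model *c, unsigned long pfn)
{
	(void)pfn;
	if (c->cache_lock.held)
		c->inverted = true;
	model_lock(&c->group_lock);
	c->unpinned++;
	model_unlock(&c->group_lock);
}

/* Release path: detach one entry under cache_lock, then unpin it
 * with cache_lock dropped. */
void cache_release(struct cache_model *c)
{
	for (;;) {
		unsigned long pfn;

		model_lock(&c->cache_lock);
		if (c->count == 0) {
			model_unlock(&c->cache_lock);
			break;
		}
		pfn = c->pfn[--c->count];
		model_unlock(&c->cache_lock);

		unpin_page(c, pfn);
	}
}
```

      After cache_release() the cache is empty, every page has been
      unpinned, and the inverted flag stays clear because the two locks
      are never nested in the release path.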
      
      v2:
      - use for style instead of do{}while(1). (Zhenyu)
      
      Fixes: f30437c5 ("drm/i915/gvt: add KVMGT support")
      Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
      Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
      Cc: stable@vger.kernel.org # v4.10+
      Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
    • drm/i915/gvt: Fix possible recursive locking issue · 62d02fd1
      Chuanxiao Dong authored
      vfio_unpin_pages will take a read semaphore, but it is already held
      in the same thread by the vfio ioctl. This causes the warning below:
      
      [ 5102.127454] ============================================
      [ 5102.133379] WARNING: possible recursive locking detected
      [ 5102.139304] 4.12.0-rc4+ #3 Not tainted
      [ 5102.143483] --------------------------------------------
      [ 5102.149407] qemu-system-x86/1620 is trying to acquire lock:
      [ 5102.155624]  (&container->group_lock){++++++}, at: [<ffffffff817768c6>] vfio_unpin_pages+0x96/0xf0
      [ 5102.165626]
      but task is already holding lock:
      [ 5102.172134]  (&container->group_lock){++++++}, at: [<ffffffff8177728f>] vfio_fops_unl_ioctl+0x5f/0x280
      [ 5102.182522]
      other info that might help us debug this:
      [ 5102.189806]  Possible unsafe locking scenario:
      
      [ 5102.196411]        CPU0
      [ 5102.199136]        ----
      [ 5102.201861]   lock(&container->group_lock);
      [ 5102.206527]   lock(&container->group_lock);
      [ 5102.211191]
      *** DEADLOCK ***
      
      [ 5102.217796]  May be due to missing lock nesting notation
      
      [ 5102.225370] 3 locks held by qemu-system-x86/1620:
      [ 5102.230618]  #0:  (&container->group_lock){++++++}, at: [<ffffffff8177728f>] vfio_fops_unl_ioctl+0x5f/0x280
      [ 5102.241482]  #1:  (&(&iommu->notifier)->rwsem){++++..}, at: [<ffffffff810de775>] __blocking_notifier_call_chain+0x35/0x70
      [ 5102.253713]  #2:  (&vgpu->vdev.cache_lock){+.+...}, at: [<ffffffff8157b007>] intel_vgpu_iommu_notifier+0x77/0x120
      [ 5102.265163]
      stack backtrace:
      [ 5102.270022] CPU: 5 PID: 1620 Comm: qemu-system-x86 Not tainted 4.12.0-rc4+ #3
      [ 5102.277991] Hardware name: Intel Corporation S1200RP/S1200RP, BIOS S1200RP.86B.03.01.APER.061220151418 06/12/2015
      [ 5102.289445] Call Trace:
      [ 5102.292175]  dump_stack+0x85/0xc7
      [ 5102.295871]  validate_chain.isra.21+0x9da/0xaf0
      [ 5102.300925]  __lock_acquire+0x405/0x820
      [ 5102.305202]  lock_acquire+0xc7/0x220
      [ 5102.309191]  ? vfio_unpin_pages+0x96/0xf0
      [ 5102.313666]  down_read+0x2b/0x50
      [ 5102.317259]  ? vfio_unpin_pages+0x96/0xf0
      [ 5102.321732]  vfio_unpin_pages+0x96/0xf0
      [ 5102.326024]  intel_vgpu_iommu_notifier+0xe5/0x120
      [ 5102.331283]  notifier_call_chain+0x4a/0x70
      [ 5102.335851]  __blocking_notifier_call_chain+0x4d/0x70
      [ 5102.341490]  blocking_notifier_call_chain+0x16/0x20
      [ 5102.346935]  vfio_iommu_type1_ioctl+0x87b/0x920
      [ 5102.351994]  vfio_fops_unl_ioctl+0x81/0x280
      [ 5102.356660]  ? __fget+0xf0/0x210
      [ 5102.360261]  do_vfs_ioctl+0x93/0x6a0
      [ 5102.364247]  ? __fget+0x111/0x210
      [ 5102.367942]  SyS_ioctl+0x41/0x70
      [ 5102.371542]  entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Putting the vfio_unpin_pages call in a workqueue fixes this.
      
      v2:
      - use for style instead of do{}while(1). (Zhenyu)
      v3:
      - rename gvt_cache_mark to gvt_cache_mark_remove. (Zhenyu)
      
      Fixes: 659643f7 ("drm/i915/gvt/kvmgt: add vfio/mdev support to KVMGT")
      Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
      Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
      Cc: stable@vger.kernel.org # v4.10+
      Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
  6. 20 Jun, 2017 8 commits
    • Merge tag 'drm-misc-next-2017-06-19_0' of git://anongit.freedesktop.org/git/drm-misc into drm-next · 047b8e21
      Dave Airlie authored
      UAPI Changes:
      - vc4: Add get/set tiling format ioctls (Eric)
      
      Driver Changes:
      - vc4: Add tiling T-format support for scanout (Eric)
      - vc4: Use atomic helpers in commit (Boris)
      
      Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
      Cc: Eric Anholt <eric@anholt.net>
      
      * tag 'drm-misc-next-2017-06-19_0' of git://anongit.freedesktop.org/git/drm-misc:
        drm/vc4: Mimic drm_atomic_helper_commit() behavior
        drm/vc4: Add get/set tiling ioctls.
        drm/vc4: Add T-format scanout support.
    • drm/i915: remove rate_to_index, messed up merge. · 296923e1
      Dave Airlie authored
      This was from a merge I did incorrectly.
      Signed-off-by: Dave Airlie <airlied@redhat.com>
    • Merge tag 'drm-intel-next-2017-06-19' of git://anongit.freedesktop.org/git/drm-intel into drm-next · 305b9edd
      Dave Airlie authored
      Final pile of features for 4.13
      
      New uabi:
      - batch bo in first slot, for faster execbuf assembly in userspace
        (Chris Wilson)
      - (sub)slice getparam, needed for mesa perf support (Robert Bragg)
      
      First pile of patches for cnl/cfl support, maintained by Rodrigo but
      with lots of contributions from others. Still incomplete since public
      review is still ongoing.
      
      Features/refactoring:
      - Make execbuf faster (Chris Wilson), a pile of series to make execbuf
        buffer handling have fewer passes, use less list walking, postpone
        more work to async workers and shuffle buffers less, all to make the
        common case much faster (in some cases at least).
      - cold boot support for glk dsi (Madhav Chauhan)
      - Clean up pipe A quirk and related old platform hacks (Ville)
      - perf sampling support for kbl/glk (Lionel)
      - perf cleanups (Robert Bragg)
      - wire atomic state to backlight code, to avoid pipe lookup hacks
        (Maarten)
      - reduce request waiting latency/overhead to remove the spinning and
        associated cpu cycle wasting (Chris)
      - fix 90/270 rotation wm computation (Ville)
      - new ddb allocation algo for skl (Kumar Mahesh)
      - fix regression due to system suspend optimization (Imre)
      - the usual pile of small cleanups and refactors all over
      
      GVT updates contained in this tag:
      - optimization for per-VM mmio save/restore (Changbin)
      - optimization for mmio hash table (Changbin)
      - scheduler optimization with event (Ping)
      - vGPU reset refinement (Fred)
      - other misc refactor and cleanups, etc.
      
      * tag 'drm-intel-next-2017-06-19' of git://anongit.freedesktop.org/git/drm-intel: (170 commits)
        drm/i915: Update DRIVER_DATE to 20170619
        drm/i915/cfl: Introduce Coffee Lake workarounds.
        drm/i915: Store 9 bits of PCI Device ID for platforms with a LP PCH
        drm/i915: Stash a pointer to the obj's resv in the vma
        drm/i915: Async GPU relocation processing
        drm/i915: Allow execbuffer to use the first object as the batch
        drm/i915: Wait upon userptr get-user-pages within execbuffer
        drm/i915: First try the previous execbuffer location
        drm/i915: Store a persistent reference for an object in the execbuffer cache
        drm/i915: Eliminate lots of iterations over the execobjects array
        drm/i915: Disable EXEC_OBJECT_ASYNC when doing relocations
        drm/i915: Pass vma to relocate entry
        drm/i915: Store a direct lookup from object handle to vma
        drm/i915: Fix retrieval of hangcheck stats
        drm/i915: Store i915_gem_object_is_coherent() as a bit next to cache-dirty
        drm/i915: Mark CPU cache as dirty on every transition for CPU writes
        drm/i915: Make i915_vma_destroy() static
        drm/i915: Actually attach the tv_format property to the SDVO connector
        Revert "drm/i915/skl: New ddb allocation algorithm"
        drm/i915/glk: Add cold boot sequence for GLK DSI
        ...
    • Merge tag 'drm-msm-next-2017-06-20' of git://people.freedesktop.org/~robclark/linux into drm-next · eafae133
      Dave Airlie authored
      This time around, the biggest thing is a bunch of GEM rework for more
      fine grained locking and prep work to handle multiple address spaces
      (ie. per-process pagetables).  Also some HDMI fixes for 8x96
      (snapdragon 820).
      
      One unrelated bus patch, for something that seems to get merged
      through whatever random tree (and has all the right ack's).
      
      * tag 'drm-msm-next-2017-06-20' of git://people.freedesktop.org/~robclark/linux:
        drm/msm: Fix potential buffer overflow issue
        bus: SIMPLE_PM_BUS does not depend on ARCH_RENESAS
        drm/msm: Separate locking of buffer resources from struct_mutex
        drm/msm/hdmi: Fix HDMI pink strip issue seen on 8x96
        drm/msm/hdmi: 8996 PLL: Populate unprepare
        drm/msm/hdmi: Use bitwise operators when building register values
        drm/msm: update generated headers
        drm/msm: remove address-space id
        drm/msm: support for an arbitrary number of address spaces
        drm/msm: refactor how we handle vram carveout buffers
        drm/msm: pass address-space to _get_iova() and friends
        drm/msm/mdp4+5: move aspace/id to base class
        drm/msm/mdp5: kill pipe_lock
        drm/msm: fix locking inconsistency for gpu->hw_init()
        drm/msm: Remove memptrs->wptr
        drm/msm: Add a struct to pass configuration to msm_gpu_init()
        drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA
        drm/msm: Remove idle function hook
        drm/msm: Remove DRM_MSM_NUM_IOCTLS
        drm/msm: gpu: Enable zap shader for A5XX
    • Merge branch 'drm-next-4.13' of git://people.freedesktop.org/~agd5f/linux into drm-next · 8c52f364
      Dave Airlie authored
      A few more things for 4.13:
      - Semaphore support using sync objects
      - Drop fb location programming
      - Optimize bo list ioctl
      
      * 'drm-next-4.13' of git://people.freedesktop.org/~agd5f/linux:
        drm/amdgpu: Optimize mutex usage (v4)
        drm/amdgpu: Optimization of AMDGPU_BO_LIST_OP_CREATE (v2)
        amdgpu: use drm sync objects for shared semaphores (v6)
        amdgpu/cs: split out fence dependency checking (v2)
        drm/amdgpu: don't check the default value for vm size
    • Merge branch 'for-upstream/mali-dp' of git://linux-arm.org/linux-ld into drm-next · 3aaf4d95
      Dave Airlie authored
      Here are the Mali DP driver changes. They include the mali-dp specific
      changes from Jose Abreu on crtc->mode_valid() as well as a couple of
      patches for fixing the sharing of IRQ lines and use of DRM CMA helper
      for framebuffer physical address calculation. Please pull!
      
      * 'for-upstream/mali-dp' of git://linux-arm.org/linux-ld:
        drm/arm: mali-dp: Use CMA helper for plane buffer address calculation
        drm/mali-dp: Check PM status when sharing interrupt lines
        drm/arm: malidp: Use crtc->mode_valid() callback
    • Merge branch 'linux-4.13' of git://github.com/skeggsb/linux into drm-next · d02b0ffb
      Dave Airlie authored
      - HDMI stereoscopic support
      - Rework of display code to properly support SOR pad macro routing on
      >=GM20x GPUs
      - Various other fixes/improvements.
      
      * 'linux-4.13' of git://github.com/skeggsb/linux: (73 commits)
        drm/nouveau/disp/nv50-: avoid creating ORs that aren't present on HW
        drm/nouveau: use proper prototype in nouveau_pmops_runtime() definition
        drm/nouveau: Skip vga_fini on non-PCI device
        drm/nouveau/tegra: Don't leave GPU in reset
        drm/nouveau/tegra: Skip manual unpowergating when not necessary
        drm/nouveau/hwmon: Change permissions to numeric
        drm/nouveau/hwmon: expose the auto_point and pwm_min/max attrs
        drm/nouveau/hwmon: Remove old code, add .write/.read operations
        drm/nouveau/hwmon: Add nouveau_hwmon_ops structure with .is_visible/.read_string
        drm/nouveau/hwmon: Add config for all sensors and their settings
        drm/nouveau/disp/gm200-: allow non-identity mapping of SOR <-> macro links
        drm/nouveau/disp/nv50-: implement a common supervisor 3.0
        drm/nouveau/disp/nv50-: implement a common supervisor 2.2
        drm/nouveau/disp/nv50-: implement a common supervisor 2.1
        drm/nouveau/disp/nv50-: implement a common supervisor 2.0
        drm/nouveau/disp/nv50-: implement a common supervisor 1.0
        drm/nouveau/disp/nv50-gt21x: remove workaround for dp->tmds hotplug issues
        drm/nouveau/disp/dp: use new devinit script interpreter entry-point
        drm/nouveau/disp/dp: determine link bandwidth requirements from head state
        drm/nouveau/disp: introduce acquire/release display path methods
        ...
    • Merge tag 'drm/tegra/for-4.13-rc1' of git://anongit.freedesktop.org/tegra/linux into drm-next · 4a525bad
      Dave Airlie authored
      drm/tegra: Changes for v4.13-rc1
      
      This starts off with the addition of more documentation for the host1x
      and DRM drivers and finishes with a slew of fixes and enhancements for
      the staging IOCTLs as a result of the awesome work done by Dmitry and
      Erik on the grate reverse-engineering effort.
      
      * tag 'drm/tegra/for-4.13-rc1' of git://anongit.freedesktop.org/tegra/linux:
        gpu: host1x: At first try a non-blocking allocation for the gather copy
        gpu: host1x: Refactor channel allocation code
        gpu: host1x: Remove unused host1x_cdma_stop() definition
        gpu: host1x: Remove unused 'struct host1x_cmdbuf'
        gpu: host1x: Check waits in the firewall
        gpu: host1x: Correct swapped arguments in the is_addr_reg() definition
        gpu: host1x: Forbid unrelated SETCLASS opcode in the firewall
        gpu: host1x: Forbid RESTART opcode in the firewall
        gpu: host1x: Forbid relocation address shifting in the firewall
        gpu: host1x: Do not leak BO's phys address to userspace
        gpu: host1x: Correct host1x_job_pin() error handling
        gpu: host1x: Initialize firewall class to the job's one
        drm/tegra: dc: Disable plane if it is invisible
        drm/tegra: dc: Apply clipping to the plane
        drm/tegra: dc: Avoid reset asserts on Tegra20
        drm/tegra: Check syncpoint ID in the 'submit' IOCTL
        drm/tegra: Correct copying of waitchecks and disable them in the 'submit' IOCTL
        drm/tegra: Check for malformed offsets and sizes in the 'submit' IOCTL
        drm/tegra: Add driver documentation
        gpu: host1x: Flesh out kerneldoc
  7. 19 Jun, 2017 4 commits
  8. 17 Jun, 2017 3 commits
  9. 16 Jun, 2017 15 commits
    • drm/i915/cfl: Introduce Coffee Lake workarounds. · 46c26662
      Rodrigo Vivi authored
      Coffee Lake inherits most of the Kabylake production
      workarounds.
      
      v2: Fix typo on commit message and remove
          WaDisableKillLogic and GEN9_DISABLE_OCL_OOB_SUPPRESS_LOGIC,
          since as Mika pointed out they shouldn't be here for cfl
          according to BSpec.
      
      Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1497653398-15722-1-git-send-email-rodrigo.vivi@intel.com
    • amdgpu: use drm sync objects for shared semaphores (v6) · 660e8558
      Dave Airlie authored
      This creates a new command submission chunk for amdgpu
      to add in and out sync objects around the submission.
      
      Sync objects are managed via the drm syncobj ioctls.
      
      The command submission interface is enhanced with two new
      chunks, one for syncobj pre submission dependencies,
      and one for post submission sync obj signalling,
      and just takes a list of handles for each.
      
      This is based on work originally done by David Zhou at AMD,
      with input from Christian König on what things should look like.
      
      In theory VkFences could be backed with sync objects and
      just get passed into the cs as syncobj handles as well.
      
      NOTE: this interface addition needs a version bump to expose
      it to userspace.
      
      TODO: update to dep_sync when rebasing onto amdgpu master.
      (with this - r-b from Christian)
      
      v1.1: keep file reference on import.
      v2: move to using syncobjs
      v2.1: change some APIs to just use p pointer.
      v3: make more robust against CS failures, we now add the
      wait sems but only remove them once the CS job has been
      submitted.
      v4: rewrite names of API and base on new syncobj code.
      v5: move post deps earlier, rename some apis
      v6: lookup post deps earlier, and just replace fences
      in post deps stage (Christian)
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      660e8558
    • Dave Airlie's avatar
      amdgpu/cs: split out fence dependency checking (v2) · 6f0308eb
      Dave Airlie authored
      This just splits out the fence dependency checking into its
      own function to make it easier to add semaphore dependencies.
      
      v2: rebase onto other changes.
      
      v1-Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      6f0308eb
    • Alex Deucher's avatar
      drm/amdgpu: don't check the default value for vm size · 64dab074
      Alex Deucher authored
      Avoids printing spurious messages like this:
      [    3.102059] amdgpu 0000:01:00.0: VM size (-1) must be a power of 2
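
      A rough sketch of the idea, assuming -1 is the "not set by the user"
      default seen in the message above. VM_SIZE_DEFAULT and
      vm_size_needs_warning() are hypothetical names, not the driver's own.

```c
#include <assert.h>
#include <stdbool.h>

#define VM_SIZE_DEFAULT (-1)  /* assumed "not set by the user" sentinel */

static bool is_power_of_two(int v)
{
    return v > 0 && (v & (v - 1)) == 0;
}

/* Returns true when a warning should be printed for this vm_size. */
static bool vm_size_needs_warning(int vm_size)
{
    if (vm_size == VM_SIZE_DEFAULT)
        return false;  /* default value: nothing to validate */
    return !is_power_of_two(vm_size);
}
```
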
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      64dab074
    • Dhinakaran Pandiyan's avatar
      drm/i915: Store 9 bits of PCI Device ID for platforms with a LP PCH · 28e0f4ee
      Dhinakaran Pandiyan authored
      Although we use 9 bits of the Device ID to identify the PCH, only 8 bits
      are stored in dev_priv->pch_id. This makes HAS_PCH_CNP_LP() and
      HAS_PCH_SPT_LP() incorrect. Fix this by storing all 9 bits for the
      platforms with an LP PCH.
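
      To illustrate why the ninth bit matters, here is a small sketch. The
      mask and type values are illustrative assumptions, and the macro and
      helper names are hypothetical rather than copied from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values; treat them as assumptions, not a copy of i915_drv.h. */
#define PCH_ID_MASK      0xff00u  /* keeps 8 significant bits */
#define PCH_ID_MASK_EXT  0xff80u  /* keeps 9 significant bits */
#define PCH_SPT_LP_TYPE  0x9d00u
#define PCH_CNP_LP_TYPE  0x9d80u

/* Both checks compare against whatever id was stored at probe time. */
static bool is_spt_lp(uint16_t stored_id)
{
    return stored_id == PCH_SPT_LP_TYPE;
}

static bool is_cnp_lp(uint16_t stored_id)
{
    return stored_id == PCH_CNP_LP_TYPE;
}
```

      With a device id such as 0x9d84 (a made-up example), masking with
      PCH_ID_MASK stores 0x9d00, so the two LP types cannot be told apart;
      masking with PCH_ID_MASK_EXT stores 0x9d80 and the checks behave.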
      
      v2: Drop PCH_LPT_LP change (Imre)
      
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Imre Deak <imre.deak@intel.com>
      Fixes: commit ec7e0bb3 ("drm/i915/cnp: Add PCI ID for Cannonpoint LP PCH")
      Reported-by: Imre Deak <imre.deak@intel.com>
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
      Signed-off-by: Imre Deak <imre.deak@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1497641774-29104-1-git-send-email-dhinakaran.pandiyan@intel.com
      28e0f4ee
    • Chris Wilson's avatar
      drm/i915: Stash a pointer to the obj's resv in the vma · 95ff7c7d
      Chris Wilson authored
      During execbuf, a mandatory step is that we add this request (this
      fence) to each object's reservation_object. Inside execbuf, we track the
      vma, and to add the fence to the reservation_object then means having to
      first chase the obj, incurring another cache miss. We can reduce the
      number of cache misses by stashing a pointer to the reservation_object
      in the vma itself.
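
      A simplified sketch of the pointer stash, using stand-in types (resv,
      gem_object, vma) and hypothetical helpers rather than the real i915
      structures:

```c
#include <assert.h>

struct resv { int fence_count; };          /* stand-in for reservation_object */
struct gem_object { struct resv *resv; };

struct vma {
    struct gem_object *obj;
    struct resv *resv;  /* cached copy of obj->resv */
};

static void vma_init(struct vma *vma, struct gem_object *obj)
{
    vma->obj = obj;
    vma->resv = obj->resv;  /* stash once, at creation */
}

/* Hot path: reaches the reservation object without touching vma->obj. */
static void vma_add_fence(struct vma *vma)
{
    vma->resv->fence_count++;
}
```
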
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170616140525.6394-1-chris@chris-wilson.co.uk
      95ff7c7d
    • Chris Wilson's avatar
      drm/i915: Async GPU relocation processing · 7dd4f672
      Chris Wilson authored
      If the user requires patching of their batch or auxiliary buffers, we
      currently make the alterations on the CPU. If they are active on the GPU
      at the time, we wait under the struct_mutex for them to finish executing
      before we rewrite the contents. This happens if shared relocation trees
      are used between different contexts with separate address spaces (the
      buffers then have different addresses in each), as the 3D state will
      need to be adjusted between execution on each context. However, we don't
      need to use the CPU to do the relocation patching, as we could queue
      commands to the GPU to perform it and use fences to serialise the
      operation with the current and future activity, so the operation on the
      GPU appears just as atomic as performing it immediately. Performing the
      relocation rewrites on the GPU is not free: in terms of pure throughput,
      the number of relocations/s is about halved, but more importantly so is
      the time under the struct_mutex.
      
      v2: Break out the request/batch allocation for clearer error flow.
      v3: A few asserts to ensure rq ordering is maintained
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      7dd4f672
    • Chris Wilson's avatar
      drm/i915: Allow execbuffer to use the first object as the batch · 1a71cf2f
      Chris Wilson authored
      Currently, the last object in the execlist is always the batch.
      However, when building the batch buffer we often know the batch object
      first and if we can use the first slot in the execlist we can emit
      relocation instructions relative to it immediately and avoid a separate
      pass to adjust the relocations to point to the last execlist slot.
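
      The selection itself can be sketched in a few lines. EXEC_BATCH_FIRST
      and batch_index() are hypothetical names used for illustration; the
      sketch assumes a non-empty object list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXEC_BATCH_FIRST (1u << 0)  /* hypothetical opt-in flag */

/* Index of the batch object within an object list of `count` entries
 * (count must be > 0). */
static size_t batch_index(uint32_t flags, size_t count)
{
    return (flags & EXEC_BATCH_FIRST) ? 0 : count - 1;
}
```
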
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      1a71cf2f
    • Chris Wilson's avatar
      drm/i915: Wait upon userptr get-user-pages within execbuffer · 8a2421bd
      Chris Wilson authored
      This simply hides the EAGAIN caused by userptr when userspace causes
      resource contention. However, it is quite beneficial with highly
      contended userptr users as we avoid repeating the setup costs and
      kernel-user context switches.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
      8a2421bd
    • Chris Wilson's avatar
      drm/i915: First try the previous execbuffer location · 616d9cee
      Chris Wilson authored
      When choosing a slot for an execbuffer, we ideally want to use the same
      address as last time (so that we don't have to rebind it) and the same
      address as expected by the user (so that we don't have to fix up any
      relocations pointing to it). If we first try to bind the incoming
      execbuffer->offset from the user, or the currently bound offset, that
      should hopefully achieve the goal of avoiding the rebind cost and the
      relocation penalty. However, if the object is not currently bound there,
      we don't want to arbitrarily unbind an object in our chosen position and
      so choose to rebind/relocate the incoming object instead. After we
      report the new position back to the user, on the next pass the
      relocations should have settled down.
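
      A toy model of that policy, with the address space reduced to a small
      array of slots. All names here (NUM_SLOTS, place_object) are
      hypothetical, and the sketch assumes the object is not already bound
      at some other slot.

```c
#include <assert.h>

#define NUM_SLOTS 8
#define SLOT_FREE 0u  /* object ids are non-zero */

/* Try the preferred slot first; never evict its current owner.
 * Returns the slot used, or -1 if the toy address space is full. */
static int place_object(unsigned slots[NUM_SLOTS], unsigned obj, int preferred)
{
    if (preferred >= 0 && preferred < NUM_SLOTS &&
        (slots[preferred] == SLOT_FREE || slots[preferred] == obj)) {
        slots[preferred] = obj;
        return preferred;  /* no rebind, no relocation fixup needed */
    }

    /* Preferred slot belongs to someone else: move the incoming object. */
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (slots[i] == SLOT_FREE) {
            slots[i] = obj;
            return i;  /* caller reports this new position to the user */
        }
    }
    return -1;
}
```
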
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtien@linux.intel.com>
      616d9cee
    • Chris Wilson's avatar
      drm/i915: Store a persistent reference for an object in the execbuffer cache · dade2a61
      Chris Wilson authored
      If we take a reference to the object/vma when it is first used in an
      execbuf, we can keep that reference until the object's file-local handle
      is closed, thereby saving a frequent ref/unref pair.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      dade2a61
    • Chris Wilson's avatar
      drm/i915: Eliminate lots of iterations over the execobjects array · 2889caa9
      Chris Wilson authored
      The major scaling bottleneck in execbuffer is the processing of the
      execobjects. Creating an auxiliary list is inefficient when compared to
      using the execobject array we already have allocated.
      
      Reservation is then split into phases. As we look up the VMA, we
      try to bind it back into its active location. Only if that fails do we add
      it to the unbound list for phase 2. In phase 2, we try to add all those
      objects that could not fit into their previous location, with a fallback
      to retrying all objects and evicting the VM in case of severe
      fragmentation. (This is the same as before, except that phase 1 is now
      done inline with looking up the VMA to avoid an iteration over the
      execobject array. In the ideal case, we eliminate the separate reservation
      phase.) During the reservation phase, we only evict from the VM between
      passes (rather than, as currently, each time we try to fit a new VMA). In
      testing with Unreal Engine's Atlantis demo, which stresses the eviction
      logic on gen7 class hardware, this sped up the framerate by a factor of
      2.
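
      The two phases can be modelled roughly as follows. This is a toy with
      a slot array standing in for the VM; exec_obj, reserve_all and the
      intrusive unbound list are hypothetical, and the eviction fallback is
      reduced to an error return.

```c
#include <assert.h>

#define NUM_SLOTS 4  /* toy address space; slot value 0 means free */

struct exec_obj {
    unsigned id;       /* non-zero object id */
    int prev_slot;     /* where it was bound last time, or -1 */
    int slot;          /* result of the reservation */
    int next_unbound;  /* intrusive list of objects deferred to phase 2 */
};

static int reserve_all(unsigned slots[NUM_SLOTS], struct exec_obj *objs, int n)
{
    int unbound_head = -1;

    /* Phase 1: done inline with the lookup, retry the previous location. */
    for (int i = 0; i < n; i++) {
        int s = objs[i].prev_slot;

        if (s >= 0 && s < NUM_SLOTS &&
            (slots[s] == 0 || slots[s] == objs[i].id)) {
            slots[s] = objs[i].id;
            objs[i].slot = s;
        } else {
            objs[i].slot = -1;
            objs[i].next_unbound = unbound_head;
            unbound_head = i;
        }
    }

    /* Phase 2: visit only the objects that did not fit in phase 1. */
    for (int i = unbound_head; i >= 0; i = objs[i].next_unbound) {
        int s = 0;

        while (s < NUM_SLOTS && slots[s] != 0)
            s++;
        if (s == NUM_SLOTS)
            return -1;  /* a real implementation would evict and retry */
        slots[s] = objs[i].id;
        objs[i].slot = s;
    }
    return 0;
}
```

      When every object still fits at its previous location, the phase 2
      loop does no work at all, which is the ideal case described above.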
      
      The second loop amalgamation is between move_to_gpu and move_to_active.
      As we always submit the request, even if incomplete, we can use the
      current request to track active VMA as we perform the flushes and
      synchronisation required.
      
      The next big advancement is to avoid copying back to the user any
      execobjects and relocations that are not changed.
      
      v2: Add a Theory of Operation spiel.
      v3: Fall back to slow relocations in preparation for flushing userptrs.
      v4: Document struct members, factor out eb_validate_vma(), add a few
      more comments to explain some magic and hide other magic behind macros.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      2889caa9
    • Chris Wilson's avatar
      drm/i915: Disable EXEC_OBJECT_ASYNC when doing relocations · 071750e5
      Chris Wilson authored
      If we write a relocation into the buffer, we require our own implicit
      synchronisation added after the start of the execbuf, outside of the
      user's control. As we may end up clflushing, or doing the patch itself
      on the GPU, asynchronously, we need to look at the implicit serialisation
      on obj->resv and hence need to disable EXEC_OBJECT_ASYNC for this
      object.
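
      In sketch form, with a hypothetical OBJ_ASYNC bit and helper standing
      in for the real flag handling:

```c
#include <assert.h>
#include <stdint.h>

#define OBJ_ASYNC (1u << 0)  /* hypothetical "skip implicit sync" flag */
#define OBJ_WRITE (1u << 1)  /* unrelated flag, kept untouched */

/* If any relocation was written into the object, drop the async opt-out
 * so that implicit synchronisation applies to it again. */
static uint32_t effective_flags(uint32_t flags, unsigned relocs_written)
{
    if (relocs_written)
        flags &= ~OBJ_ASYNC;
    return flags;
}
```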
      
      If the user does trigger a stall for relocations, we make sure the stall
      is complete enough so that the batch is not submitted before we complete
      those relocations.
      
      Fixes: 77ae9957 ("drm/i915: Enable userspace to opt-out of implicit fencing")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      071750e5
    • Chris Wilson's avatar
      drm/i915: Pass vma to relocate entry · 507d977f
      Chris Wilson authored
      We can simplify our tracking of pending writes in an execbuf to the
      single bit in the vma->exec_entry->flags, but that requires the
      relocation function knowing the object's vma. Pass it along.
      
      Note we have only been using a single bit to track flushing since
      
      commit cc889e0f
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Wed Jun 13 20:45:19 2012 +0200
      
          drm/i915: disable flushing_list/gpu_write_list
      
      unconditionally flushed all render caches before the breadcrumb and
      
      commit 6ac42f41
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Sat Jul 21 12:25:01 2012 +0200
      
          drm/i915: Replace the complex flushing logic with simple invalidate/flush all
      
      did away with the explicit GPU domain tracking. This was then codified
      into the ABI with NO_RELOC in
      
      commit ed5982e6
      Author: Daniel Vetter <daniel.vetter@ffwll.ch> # Oi! Patch stealer!
      Date:   Thu Jan 17 22:23:36 2013 +0100
      
          drm/i915: Allow userspace to hint that the relocations were known
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      507d977f
    • Chris Wilson's avatar
      drm/i915: Store a direct lookup from object handle to vma · 4ff4b44c
      Chris Wilson authored
      The advent of full-ppgtt led to an extra indirection between the object
      and its binding. That extra indirection has a noticeable impact on how
      fast we can convert from the user handles to our internal vma for
      execbuffer. In order to bypass the extra indirection, we use a
      resizable hashtable to jump from the object to the per-ctx vma.
      rhashtable was considered but we don't need the online resizing feature
      and the extra complexity proved to undermine its usefulness. Instead, we
      simply reallocate the hashtable on demand in a background task and
      serialize it before iterating.
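
      As an illustration of a handle-to-vma table that is reallocated on
      demand, here is a minimal open-addressing sketch. It is not the i915
      implementation: all names are hypothetical, the hash is an arbitrary
      multiplicative one, and the resize happens synchronously inside the
      insert rather than in a background task.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct vma { int id; };  /* stand-in for the real vma */

struct entry { uint32_t handle; struct vma *vma; };  /* vma == NULL: empty */

struct handle_table {
    struct entry *slots;
    size_t size;   /* zero or a power of two */
    size_t count;
};

static size_t slot_for(uint32_t handle, size_t size)
{
    return (handle * 2654435761u) & (size - 1);
}

/* Insert with linear probing; the caller guarantees a free slot exists. */
static void raw_insert(struct entry *slots, size_t size,
                       uint32_t handle, struct vma *vma)
{
    size_t i = slot_for(handle, size);

    while (slots[i].vma && slots[i].handle != handle)
        i = (i + 1) & (size - 1);
    slots[i].handle = handle;
    slots[i].vma = vma;
}

static struct vma *table_lookup(const struct handle_table *t, uint32_t handle)
{
    if (!t->size)
        return NULL;
    for (size_t i = slot_for(handle, t->size); t->slots[i].vma;
         i = (i + 1) & (t->size - 1))
        if (t->slots[i].handle == handle)
            return t->slots[i].vma;
    return NULL;
}

/* vma must be non-NULL. Returns 0 on success, -1 on allocation failure. */
static int table_insert(struct handle_table *t, uint32_t handle,
                        struct vma *vma)
{
    if ((t->count + 1) * 4 > t->size * 3) {  /* keep load at or below 3/4 */
        size_t new_size = t->size ? t->size * 2 : 8;
        struct entry *slots = calloc(new_size, sizeof(*slots));

        if (!slots)
            return -1;
        for (size_t i = 0; i < t->size; i++)
            if (t->slots[i].vma)
                raw_insert(slots, new_size, t->slots[i].handle,
                           t->slots[i].vma);
        free(t->slots);
        t->slots = slots;
        t->size = new_size;
    }
    if (!table_lookup(t, handle))
        t->count++;
    raw_insert(t->slots, t->size, handle, vma);
    return 0;
}
```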
      
      In non-full-ppgtt modes, multiple files and multiple contexts can share
      the same vma. This leads to having multiple possible handle->vma links,
      so we only use the first to establish the fast path. The majority of
      buffers are not shared and so we should still be able to realise
      speedups with multiple clients.
      
      v2: Prettier names, more magic.
      v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      4ff4b44c