1. 09 Sep, 2013 1 commit
    • drm/i915: fix wait_for_pending_flips vs gpu hang deadlock · 17e1df07
      Daniel Vetter authored
      My g33 here seems to be shockingly good at hitting them all. This time
      around kms_flip/flip-vs-panning-vs-hang blows up:
      
      intel_crtc_wait_for_pending_flips correctly checks for gpu hangs and
      if a gpu hang is pending aborts the wait for outstanding flips so that
      the setcrtc call will succeed and release the crtc mutex. And the gpu
      hang handler needs that lock in intel_display_handle_reset to be able
      to complete outstanding flips.
      
      The problem is that we can race in two ways:
      - Waiters on the dev_priv->pending_flip_queue aren't woken up after
        we've marked the reset as pending, but before we actually start the reset
        work. This means that the waiter doesn't notice the pending reset
        and hence will keep on hogging the locks.
      
        As with dev->struct_mutex and the ring->irq_queue wait queues, we
        therefore need to wake up everyone who potentially holds a lock that
        the reset handler needs.
      
      - intel_display_handle_reset was called _after_ we've already
        signalled the completion of the reset work. Which means a waiter
        could sneak in, grab the lock and never release it (since the
        pageflips won't ever get released).
      
        Similar to resetting the gem state all the reset work must complete
        before we update the reset counter. Contrary to the gem reset we
        don't need to have a second explicit wake up call since that will
        have happened already when completing the pageflips. We also don't
        have any issues that the completion happens while the reset state is
        still pending - wait_for_pending_flips is only there to ensure we
        display the right frame. After a gpu hang and reset such
        guarantees are out the window anyway. This is in contrast to the gem
        code where too-early wake-up would result in unnecessary restarting
        of ioctls.
      
      Also, since we've gotten these various deadlocks and ordering
      constraints wrong so often, throw copious amounts of comments at the
      code.
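
      The ordering described above can be sketched as a toy model. This is
      a user-space illustration only, not the driver code: ToyDevice,
      flip_queue, crtc_mutex, setcrtc, handle_gpu_hang and run_demo are
      hypothetical names, and a Python Condition stands in for the
      pending_flip_queue. It shows the three steps: mark the reset pending
      and wake the waiters, finish the reset work (including the stuck
      flips), and only then signal completion.

```python
import threading


class ToyDevice:
    """Hypothetical stand-in for the driver state named in the commit message."""

    def __init__(self):
        self.flip_queue = threading.Condition()  # models pending_flip_queue
        self.crtc_mutex = threading.Lock()       # models the crtc mutex
        self.pending_flips = 1                   # one flip stuck behind the hang
        self.reset_pending = False
        self.reset_counter = 0


def wait_for_pending_flips(dev):
    """Sleep until the flips are done or a reset is pending."""
    with dev.flip_queue:
        dev.flip_queue.wait_for(
            lambda: dev.pending_flips == 0 or dev.reset_pending)


def setcrtc(dev):
    """Modeset path: waits for flips while holding the crtc mutex."""
    with dev.crtc_mutex:
        wait_for_pending_flips(dev)


def handle_gpu_hang(dev):
    # 1. Mark the reset as pending and wake every waiter right away, so a
    #    sleeper holding crtc_mutex notices the reset and backs off.
    with dev.flip_queue:
        dev.reset_pending = True
        dev.flip_queue.notify_all()
    # 2. Do all of the reset work, including completing the stuck flips,
    #    which needs the mutex the waiter was holding.
    with dev.crtc_mutex:
        with dev.flip_queue:
            dev.pending_flips = 0
            dev.flip_queue.notify_all()
    # 3. Only now signal completion of the reset.
    with dev.flip_queue:
        dev.reset_pending = False
        dev.reset_counter += 1


def run_demo(timeout=5.0):
    dev = ToyDevice()
    waiter = threading.Thread(target=setcrtc, args=(dev,), daemon=True)
    waiter.start()
    handle_gpu_hang(dev)
    waiter.join(timeout)
    return (not waiter.is_alive()), dev.pending_flips, dev.reset_counter
```

      In this model, dropping the notify_all() in step 1 lets a waiter that
      already holds crtc_mutex sleep forever while step 2 blocks on the
      same mutex, which is the first race listed above.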
      
      This deadlock regression has been introduced in the commit which added
      the pageflip reset logic to the gpu hang work:
      
      commit 96a02917
      Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Date:   Mon Feb 18 19:08:49 2013 +0200
      
          drm/i915: Finish page flips and update primary planes after a GPU reset
      
      v2:
      - Add comments to explain how the wake_up serves as memory barriers
        for the atomic_t reset counter.
      - Improve the comments a bit as suggested by Chris Wilson.
      - Extract the wake_up calls before/after the reset into a little
        i915_error_wake_up and unconditionally wake up the
        pending_flip_queue waiters, again as suggested by Chris Wilson.
      
      v3: Throw copious amounts of comments at i915_error_wake_up as
      suggested by Chris Wilson.
      
      Cc: stable@vger.kernel.org
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  2. 08 Sep, 2013 1 commit
  3. 06 Sep, 2013 3 commits
  4. 05 Sep, 2013 3 commits
    • drm/i915: Skip stolen region initialisation if none is reserved · 6644a4e9
      Chris Wilson authored
      Paulo reported that if he set the amount of reserved memory to 0, then
      we emitted a warning about a conflict before disabling our use of stolen
      memory. This was introduced with
      
      commit eaba1b8f
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Thu Jul 4 12:28:35 2013 +0100
      
          drm/i915: Verify that our stolen memory doesn't conflict
      
      and is simply fixed by checking for an empty reservation first.
      Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: fix gpu hang vs. flip stall deadlocks · 122f46ba
      Daniel Vetter authored
      Since we've started to clean up pending flips when the gpu hangs in
      
      commit 96a02917
      Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Date:   Mon Feb 18 19:08:49 2013 +0200
      
          drm/i915: Finish page flips and update primary planes after a GPU reset
      
      the gpu reset work now also grabs modeset locks. But work items on
      our private workqueue are not allowed to do that, due to the
      flush_workqueue from the pageflip code, so this results in a neat
      deadlock:
      
      INFO: task kms_flip:14676 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      kms_flip        D ffff88019283a5c0     0 14676  13344 0x00000004
       ffff88018e62dbf8 0000000000000046 ffff88013bdb12e0 ffff88018e62dfd8
       ffff88018e62dfd8 00000000001d3b00 ffff88019283a5c0 ffff88018ec21000
       ffff88018f693f00 ffff88018eece000 ffff88018e62dd60 ffff88018eece898
      Call Trace:
       [<ffffffff8138ee7b>] schedule+0x60/0x62
       [<ffffffffa046c0dd>] intel_crtc_wait_for_pending_flips+0xb2/0x114 [i915]
       [<ffffffff81050ff4>] ? finish_wait+0x60/0x60
       [<ffffffffa0478041>] intel_crtc_set_config+0x7f3/0x81e [i915]
       [<ffffffffa031780a>] drm_mode_set_config_internal+0x4f/0xc6 [drm]
       [<ffffffffa0319cf3>] drm_mode_setcrtc+0x44d/0x4f9 [drm]
       [<ffffffff810e44da>] ? might_fault+0x38/0x86
       [<ffffffffa030d51f>] drm_ioctl+0x2f9/0x447 [drm]
       [<ffffffff8107a722>] ? trace_hardirqs_off+0xd/0xf
       [<ffffffffa03198a6>] ? drm_mode_setplane+0x343/0x343 [drm]
       [<ffffffff8112222f>] ? mntput_no_expire+0x3e/0x13d
       [<ffffffff81117f33>] vfs_ioctl+0x18/0x34
       [<ffffffff81118776>] do_vfs_ioctl+0x396/0x454
       [<ffffffff81396b37>] ? sysret_check+0x1b/0x56
       [<ffffffff81118886>] SyS_ioctl+0x52/0x7d
       [<ffffffff81396b12>] system_call_fastpath+0x16/0x1b
      2 locks held by kms_flip/14676:
       #0:  (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa0316545>] drm_modeset_lock_all+0x22/0x59 [drm]
       #1:  (&crtc->mutex){+.+.+.}, at: [<ffffffffa031656b>] drm_modeset_lock_all+0x48/0x59 [drm]
      INFO: task kworker/u8:4:175 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      kworker/u8:4    D ffff88018de9a5c0     0   175      2 0x00000000
      Workqueue: i915 i915_error_work_func [i915]
       ffff88018e37dc30 0000000000000046 ffff8801938ab8a0 ffff88018e37dfd8
       ffff88018e37dfd8 00000000001d3b00 ffff88018de9a5c0 ffff88018ec21018
       0000000000000246 ffff88018e37dca0 000000005a865a86 ffff88018de9a5c0
      Call Trace:
       [<ffffffff8138ee7b>] schedule+0x60/0x62
       [<ffffffff8138f23d>] schedule_preempt_disabled+0x9/0xb
       [<ffffffff8138d0cd>] mutex_lock_nested+0x205/0x3b1
       [<ffffffffa0477094>] ? intel_display_handle_reset+0x7e/0xbd [i915]
       [<ffffffffa0477094>] ? intel_display_handle_reset+0x7e/0xbd [i915]
       [<ffffffffa0477094>] intel_display_handle_reset+0x7e/0xbd [i915]
       [<ffffffffa044e0a2>] i915_error_work_func+0x128/0x147 [i915]
       [<ffffffff8104a89a>] process_one_work+0x1d4/0x35a
       [<ffffffff8104a821>] ? process_one_work+0x15b/0x35a
       [<ffffffff8104b4a5>] worker_thread+0x144/0x1f0
       [<ffffffff8104b361>] ? rescuer_thread+0x275/0x275
       [<ffffffff8105076d>] kthread+0xac/0xb4
       [<ffffffff81059d30>] ? finish_task_switch+0x3b/0xc0
       [<ffffffff810506c1>] ? __kthread_parkme+0x60/0x60
       [<ffffffff81396a6c>] ret_from_fork+0x7c/0xb0
       [<ffffffff810506c1>] ? __kthread_parkme+0x60/0x60
      3 locks held by kworker/u8:4/175:
       #0:  (i915){.+.+.+}, at: [<ffffffff8104a821>] process_one_work+0x15b/0x35a
       #1:  ((&dev_priv->gpu_error.work)){+.+.+.}, at: [<ffffffff8104a821>] process_one_work+0x15b/0x35a
       #2:  (&crtc->mutex){+.+.+.}, at: [<ffffffffa0477094>] intel_display_handle_reset+0x7e/0xbd [i915]
      
      This blew up while running kms_flip/flip-vs-panning-vs-hang-interruptible
      on one of my older machines.
      
      Unfortunately (despite the proper lockdep annotations for
      flush_workqueue) lockdep still doesn't detect this correctly, so we
      need to rely on chance to discover these bugs.
      
      Apply the usual bugfix and schedule the reset work on the system
      workqueue to keep our own driver workqueue free of any modeset lock
      grabbing.
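
      Why moving the reset work off the driver workqueue breaks the
      deadlock can be sketched with a small wait-for graph. This is an
      illustration only: the node names and edges are one reading of the
      description above, and has_deadlock is a hypothetical helper, not
      something lockdep or the driver provides.

```python
def has_deadlock(waits_for):
    """Return True if the wait-for graph contains a cycle."""
    in_progress, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in in_progress:
            return True  # reached a node that is still on the current path
        in_progress.add(node)
        hit = any(visit(nxt) for nxt in waits_for.get(node, ()))
        in_progress.discard(node)
        done.add(node)
        return hit

    return any(visit(node) for node in waits_for)


# Reset work queued on the driver's private workqueue.
PRIVATE_WQ = {
    "flip ioctl": ["driver workqueue flush"],  # flushes while holding modeset locks
    "driver workqueue flush": ["reset work"],  # a flush waits for every queued item
    "reset work": ["modeset locks"],           # the reset work wants the modeset locks
    "modeset locks": ["flip ioctl"],           # which the flip ioctl is holding
}

# Reset work queued on the system workqueue instead.
SYSTEM_WQ = {
    "flip ioctl": ["driver workqueue flush"],
    "driver workqueue flush": [],              # the reset work is no longer queued here
    "reset work": ["modeset locks"],
    "modeset locks": ["flip ioctl"],
}
```

      has_deadlock(PRIVATE_WQ) finds the cycle, while has_deadlock(SYSTEM_WQ)
      does not: the reset work still waits for the modeset locks, but the
      flush no longer waits for the reset work.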
      
      Note that this is not a terribly serious regression since before the
      offending commit we'd simply have stalled userspace forever due to
      failing to abort all outstanding pageflips.
      
      v2: Add a comment as requested by Chris.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Hold an object reference whilst we shrink it · 57094f82
      Chris Wilson authored
      Whilst running the shrinker, we need to hold a reference as we unbind
      the objects, or else we may end up waiting for and retiring requests,
      which in turn may result in this object being freed.
      
      This is very similar to the eviction code which also has to be very
      careful to keep a reference to its objects as it retires and unbinds
      them.
      
      Another similarity, which Ben pointed out, is that since we may call
      retire-requests, the unbound_list is outside of our control. We must
      only process a single element of that list at a time; that is, we cannot
      rely on the "safe" next pointer being valid after a call to
      i915_vma_unbind().
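
      The two rules above (hold a reference across the unbind, and take one
      element at a time instead of trusting a cached next pointer) can be
      sketched in a toy model. Everything here is hypothetical: ToyObject,
      shrink and the unbind callback only imitate the shape of the problem
      and are not the driver's data structures.

```python
class ToyObject:
    """Hypothetical refcounted buffer object; not the driver's structure."""

    def __init__(self, name, pages):
        self.name = name
        self.pages = pages
        self.refcount = 1      # reference owned by the "retire" side below
        self.freed = False

    def get(self):
        assert not self.freed, "use after free: " + self.name
        self.refcount += 1

    def put(self):
        assert not self.freed, "double free: " + self.name
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True


def shrink(unbound, target, unbind):
    """Release pages from the head of `unbound` until `target` pages are freed.

    `unbind(obj)` may drop references and remove any entries from `unbound`.
    """
    released = 0
    while unbound and released < target:
        obj = unbound.pop(0)   # take one element; never cache a "next" entry
        obj.get()              # keep obj alive across the unbind
        unbind(obj)
        assert not obj.freed   # our reference kept it alive
        released += obj.pages
        obj.pages = 0
        obj.put()              # may free obj; it is not touched afterwards
    return released


def run_demo():
    a, b, c = ToyObject("a", 4), ToyObject("b", 2), ToyObject("c", 3)
    unbound = [a, b, c]

    def unbind(obj):
        # Model retire-requests running as a side effect of unbinding "a":
        # it drops the last outside reference on "a" and frees "b" entirely.
        if obj is a:
            a.put()
            unbound.remove(b)
            b.put()

    released = shrink(unbound, 10, unbind)
    return released, a.freed, b.freed, c.freed
```

      In run_demo(), a loop that had saved "b" as the next entry before
      unbinding "a" would go on to touch a freed object; re-reading the
      head of the list after every unbind skips straight to "c".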
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
        IP: [<ffffffffa0082892>] i915_gem_gtt_finish_object+0x68/0xbd [i915]
        PGD 758d3067 PUD ac0d6067 PMD 0
        Oops: 0000 [#1] SMP
        Modules linked in: dm_mod snd_hda_codec_realtek iTCO_wdt iTCO_vendor_support pcspkr snd_hda_intel i2c_i801 snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd lpc_ich mfd_core soundcore battery ac option usb_wwan usbserial uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev i915 video button drm_kms_helper drm acpi_cpufreq mperf freq_table
        CPU: 1 PID: 16835 Comm: fbo-maxsize Not tainted 3.11.0-rc7_nightlytop_8fdad4_20130902_+ #7977
        task: ffff8800712106d0 ti: ffff880028e4a000 task.ti: ffff880028e4a000
        RIP: 0010:[<ffffffffa0082892>]  [<ffffffffa0082892>] i915_gem_gtt_finish_object+0x68/0xbd [i915]
        RSP: 0018:ffff880028e4b9e8  EFLAGS: 00010246
        RAX: 0000000000000000 RBX: ffff880145734000 RCX: ffff880145735328
        RDX: ffff8801457353fc RSI: 0000000000000000 RDI: ffff88007597cc00
        RBP: ffff88007597cc00 R08: 0000000000000001 R09: ffff88014f257f00
        R10: ffffea0001d65f00 R11: 0000000000bba60b R12: ffff880149e5b000
        R13: ffff880145734001 R14: ffff88007597ccc8 R15: ffff88007597cc00
        FS:  00007ff5bc919740(0000) GS:ffff88014f240000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000008 CR3: 0000000028f4c000 CR4: 00000000001407e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Stack:
         0000000000000000 ffff88007597cc00 ffff8801440d6840 0000000000000000
         ffff880145734000 ffffffffa007c854 0000000000000010 ffff88007597c900
         0000000000018000 00000000004a1201 ffff88007597cc60 ffffffffa007d183
        Call Trace:
         [<ffffffffa007c854>] ? i915_vma_unbind+0xe2/0x1d1 [i915]
         [<ffffffffa007d183>] ? __i915_gem_shrink+0xf1/0x162 [i915]
         [<ffffffffa007d2ee>] ? i915_gem_object_get_pages_gtt+0xfa/0x303 [i915]
         [<ffffffffa00795f4>] ? i915_gem_object_get_pages+0x54/0x89 [i915]
         [<ffffffffa007cbda>] ? i915_gem_object_pin+0x238/0x5ce [i915]
         [<ffffffff812cba5f>] ? __sg_page_iter_next+0x2b/0x58
         [<ffffffffa0082056>] ? gen6_ppgtt_insert_entries+0xf2/0x114 [i915]
         [<ffffffffa007fe4b>] ? i915_gem_execbuffer_reserve_vma.isra.13+0x79/0x18d [i915]
         [<ffffffffa008017c>] ? i915_gem_execbuffer_reserve+0x21d/0x347 [i915]
         [<ffffffffa0080bfb>] ? i915_gem_do_execbuffer.isra.17+0x4f3/0xe61 [i915]
         [<ffffffffa00795f4>] ? i915_gem_object_get_pages+0x54/0x89 [i915]
         [<ffffffffa007e405>] ? i915_gem_pwrite_ioctl+0x743/0x7a5 [i915]
         [<ffffffffa0081a46>] ? i915_gem_execbuffer2+0x15e/0x1e4 [i915]
         [<ffffffffa000e20d>] ? drm_ioctl+0x2a5/0x3c4 [drm]
         [<ffffffffa00818e8>] ? i915_gem_execbuffer+0x37f/0x37f [i915]
         [<ffffffff816f64c0>] ? __do_page_fault+0x3ab/0x449
         [<ffffffff810be3da>] ? do_mmap_pgoff+0x2b2/0x341
         [<ffffffff810e49be>] ? vfs_ioctl+0x1e/0x31
         [<ffffffff810e5194>] ? do_vfs_ioctl+0x3ad/0x3ef
         [<ffffffff810e5224>] ? SyS_ioctl+0x4e/0x7e
         [<ffffffff816f88d2>] ? system_call_fastpath+0x16/0x1b
        Code: 52 0c a0 48 c7 c6 22 30 0d a0 31 c0 e8 ef 00 f9 ff bf c6 a7 00 00 e8 90 5d 24 e1 f6 85 13 01 00 00 10 75 44 48 8b 85 18 01 00 00 <8b> 50 08 48 8b 30 49 8b 84 24 88 02 00 00 48 89 c7 48 81 c7 98
        RIP  [<ffffffffa0082892>] i915_gem_gtt_finish_object+0x68/0xbd [i915]
        RSP <ffff880028e4b9e8>
        CR2: 0000000000000008
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68171
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: stable@vger.kernel.org
      [danvet: Bikeshed the comments a bit as discussed with Chris.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  5. 04 Sep, 2013 3 commits
    • drm/i915: fix i9xx_crtc_clock_get for multiplied pixels · a2dc53e7
      Daniel Vetter authored
      The dpll actually runs at the port clock so we don't need
      to multiply it again with the pixel multiplier to get the
      adjusted_mode.clock. This is in contrast to the ironlake
      pixel clock readout code which uses the fdi dotclock: That
      one does _not_ run with multiplied pixels.
      
      This issue goes back to the original clock readout code added
      in
      
      commit f1f644dc
      Author: Jesse Barnes <jbarnes@virtuousgeek.org>
      Date:   Thu Jun 27 00:39:25 2013 +0300
      
          drm/i915: get mode clock when reading the pipe config v9
      
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: handle sdvo input pixel multiplier correctly again · eeb47937
      Daniel Vetter authored
      The sdvo input timing needs to be the actual mode, the sdvo
      encoder automatically adjusts for the need of pixel doubling or
      quadrupling. This was lost in the pipe config conversion of the
      pixel multiplier in
      
      commit 6cc5f341
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Wed Mar 27 00:44:53 2013 +0100
      
          drm/i915: add pipe_config->pixel_multiplier
      
      While at it ditch the intel_ prefix from the crtc in
      intel_sdvo_mode_set.
      
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: fix hpd work vs. flush_work in the pageflip code deadlock · 645416f5
      Daniel Vetter authored
      Historically we've run our own driver hotplug handling in our own
      work-queue, which then launched the drm core hotplug handling in the
      system workqueue. This is important since we flush our own driver
      workqueue in the pageflip code while holding modeset locks, and only
      the drm hotplug code grabbed these locks. But with
      
      commit 69787f7d
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Tue Oct 23 18:23:34 2012 +0000
      
          drm: run the hpd irq event code directly
      
      this was changed and now we could deadlock in our flip handler if
      there's a hotplug work blocking the progress of the crucial unpin
      works. So this broke the careful deadlock avoidance implemented in
      
      commit b4a98e57
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Thu Nov 1 09:26:26 2012 +0000
      
          drm/i915: Flush outstanding unpin tasks before pageflipping
      
      Since the rule thus far has been that work items on our own workqueue
      may never grab modeset locks, simply restore that rule again.
      
      v2: Add a comment to the declaration of dev_priv->wq to warn readers
      about the tricky implications of using it. Suggested by Chris Wilson.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Stuart Abercrombie <sabercrombie@chromium.org>
      Reported-by: Stuart Abercrombie <sabercrombie@chromium.org>
      References: http://permalink.gmane.org/gmane.comp.freedesktop.xorg.drivers.intel/26239
      Cc: stable@vger.kernel.org
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      [danvet: Squash in a comment at the place where we schedule the work.
      Requested after-the-fact by Chris on irc since the hpd work isn't the
      only place we botch this.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  6. 03 Sep, 2013 20 commits
  7. 02 Sep, 2013 6 commits
  8. 01 Sep, 2013 3 commits
    • drm/nouveau: fix up 32-bit ioctls and device wake up. · 2254f637
      Dave Airlie authored
      Noticed by kbuild test robot.
      Signed-off-by: Dave Airlie <airlied@redhat.com>
    • drm/tegra: fix up page flip flags. · a5b6f74e
      Dave Airlie authored
      This was one level away from where I'd grepped.
      Signed-off-by: Dave Airlie <airlied@redhat.com>
    • Merge branch 'drm-next-3.12' of git://people.freedesktop.org/~agd5f/linux into drm-next · 9c725e5b
      Dave Airlie authored
      Alex writes:
      This is the radeon drm-next request.  Big changes include:
      - support for dpm on CIK parts
      - support for ASPM on CIK parts
      - support for berlin GPUs
      - major ring handling cleanup
      - remove the old 3D blit code for bo moves in favor of CP DMA or sDMA
      - lots of bug fixes
      
      [airlied: fix up a bunch of conflicts from drm_order removal]
      
      * 'drm-next-3.12' of git://people.freedesktop.org/~agd5f/linux: (898 commits)
        drm/radeon/dpm: make sure dc performance level limits are valid (CI)
        drm/radeon/dpm: make sure dc performance level limits are valid (BTC-SI) (v2)
        drm/radeon: gcc fixes for extended dpm tables
        drm/radeon: gcc fixes for kb/kv dpm
        drm/radeon: gcc fixes for ci dpm
        drm/radeon: gcc fixes for si dpm
        drm/radeon: gcc fixes for ni dpm
        drm/radeon: gcc fixes for trinity dpm
        drm/radeon: gcc fixes for sumo dpm
        drm/radeon: gcc fixes for rv7xx/eg/btc dpm
        drm/radeon: gcc fixes for rv6xx dpm
        drm/radeon: gcc fixes for radeon_atombios.c
        drm/radeon: enable UVD interrupts on CIK
        drm/radeon: fix init ordering for r600+
        drm/radeon/dpm: only need to reprogram uvd if uvd pg is enabled
        drm/radeon: check the return value of uvd_v1_0_start in uvd_v1_0_init
        drm/radeon: split out radeon_uvd_resume from uvd_v4_2_resume
        radeon kms: fix uninitialised hotplug work usage in r100_irq_process()
        drm/radeon/audio: set up the sads on DCE3.2 asics
        drm/radeon: fix handling of variable sized arrays for router objects
        ...
      
      Conflicts:
      	drivers/gpu/drm/i915/i915_dma.c
      	drivers/gpu/drm/i915/i915_gem_dmabuf.c
      	drivers/gpu/drm/i915/intel_pm.c
      	drivers/gpu/drm/radeon/cik.c
      	drivers/gpu/drm/radeon/ni.c
      	drivers/gpu/drm/radeon/r600.c