1. 08 Sep, 2014 4 commits
    • drm/i915: Evict CS TLBs between batches · c4d69da1
      Chris Wilson authored
      Running igt, I was encountering the invalid TLB bug on my 845g, despite
      it using the CS workaround. Examining the w/a buffer in the
      error state showed that the copy from the user batch into the
      workaround itself was suffering from the invalid TLB bug (the first
      cacheline was broken with the first two words reversed). Time to try a
      fresh approach. This extends the workaround to write into each page of
      our scratch buffer in order to overflow the TLB and evict the invalid
      entries. This could be refined to only do so after we update the GTT,
      but for simplicity, we do it before each batch.
      
      I suspect this supersedes our current workaround, but for safety keep
      doing both.
      
      v2: The magic number shall be 2.
      
      This doesn't conclusively prove that it is the mythical TLB bug we've
      been trying to work around for so long. That it requires touching a number
      of pages to prevent the corruption indicates to me that it is TLB
      related, but the corruption (the reversed cacheline) is more subtle than
      a TLB bug, where we would expect it to read the wrong page entirely.
      
      Oh well, it prevents a reliable hang for me and so probably for others
      as well.
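
The "write into each page of our scratch buffer" step can be pictured with a small CPU-side model. This is only a sketch: in the driver the writes are issued by the GPU command streamer, and the names `sketch_touch_each_page` and `SKETCH_PAGE_SIZE`, as well as the 4096-byte page size, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size for this model */

/*
 * Store one 32-bit word at the start of every page of the scratch buffer,
 * so that each page is referenced once. Returns the number of pages touched.
 */
size_t sketch_touch_each_page(uint8_t *scratch, size_t size, uint32_t value)
{
	size_t touched = 0;

	assert(scratch != NULL || size == 0);

	for (size_t offset = 0; offset + sizeof(value) <= size;
	     offset += SKETCH_PAGE_SIZE) {
		memcpy(scratch + offset, &value, sizeof(value));
		touched++;
	}
	return touched;
}
```

Touching every page, rather than only the first, is what forces enough distinct translations through the TLB to push out older entries.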
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    • drm/i915: Fix irq enable tracking in driver load · 4868b45d
      Daniel Vetter authored
      A bunch of warnings fire on some ->irq_postinstall hooks since those
      can enable interrupts (e.g. rps interrupts). And then our ordering
      self-checks fire and complain.
      
      To fix that, set the tracking boolean before enabling the irqs with
      drm_irq_install. Quoting the discussion with Jesse why that's safe:
      
      On Tue, Aug 26, 2014 at 11:18 PM, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
      > Yes, it might work, but if you look through the history, we set this
      > field carefully; first to true in the irq_init code, then to false only
      > after the irq_install completes.  So I think your fragility arguments
      > apply to this change too.
      
      Well we've done it in 4 commits or so, but currently we have:
      
      - Set irqs_disabled to true early in driver load to make sure the
      checks are active. That's done in irq_init, which is totally not the function that
      enables interrupts, only the function that initializes all the vtables
      and similar things. We actually have a fairly sane naming scheme
      nowadays (not fully consistent ofc): _init is sw setup,
      _enable/_hw_init is the actual hw setup. That is done in
      95f25bed
      
      - Set irqs_disabled to false right after the irqs are actually
      enabled. This is done in ed2e6df1
      
      So my change should only move the flag change over the ->preinstall
      and ->postinstall hooks. I've done a little audit and didn't spot
      anything amiss. Furthermore the runtime pm setup already clears
      irqs_disabled _before_ calling these two hooks.
      
      This regression has been introduced in
      
      commit ed2e6df1
      Author: Jesse Barnes <jbarnes@virtuousgeek.org>
      Date:   Fri Jun 20 09:39:36 2014 -0700
      
          drm/i915: clear pm._irqs_disabled field after installing IRQs
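
The ordering change described in this message can be modelled in a few lines. This is a minimal sketch, not the driver code: `struct sketch_dev`, the warning counter, and all `sketch_*` functions are hypothetical stand-ins for the tracking flag, the self-checks, and the install path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_dev {
	bool irqs_disabled; /* tracking flag consulted by the self-checks */
	int warnings;       /* number of times a self-check complained */
};

/* Self-check: enabling an interrupt while the flag says "disabled" warns. */
void sketch_enable_rps_irq(struct sketch_dev *dev)
{
	if (dev->irqs_disabled)
		dev->warnings++;
}

/* Stand-in for the install step: its postinstall hook may enable irqs. */
void sketch_irq_install(struct sketch_dev *dev)
{
	sketch_enable_rps_irq(dev);
}

/* Old ordering: the flag is cleared only after the install completes. */
int sketch_load_old(struct sketch_dev *dev)
{
	assert(dev != NULL);
	dev->irqs_disabled = true;
	dev->warnings = 0;
	sketch_irq_install(dev);
	dev->irqs_disabled = false;
	return dev->warnings;
}

/* Fixed ordering: the flag is cleared before the install hooks run. */
int sketch_load_fixed(struct sketch_dev *dev)
{
	assert(dev != NULL);
	dev->irqs_disabled = true;
	dev->warnings = 0;
	dev->irqs_disabled = false;
	sketch_irq_install(dev);
	return dev->warnings;
}
```

In this model the old ordering produces one warning from the postinstall hook, while the fixed ordering produces none.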
      
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: Oliver Hartkopp <socketcan@hartkopp.net>
      Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Tested-by: Chris Wilson <chris@chris-wilson.co.uk> # gm45, ilk
      Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    • drm/i915: Fix EIO/wedged handling in gem fault handler · 2232f031
      Daniel Vetter authored
      In
      
      commit 1f83fee0
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Thu Nov 15 17:17:22 2012 +0100
      
          drm/i915: clear up wedged transitions
      
      I've accidentally inverted the EIO/wedged handling in the fault
      handler: We want to return the EIO as a SIGBUS only if it's not
      because of the gpu having died, to prevent userspace from unduly
      dying.
      
      In my defence the comment right above is completely misleading, so fix
      both.
      
      v2: Drop the WARN_ON, it's not actually a bug to e.g. receive an -EIO
      when swap-in fails.
      
      v3: Don't remove too much ... oops.
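
The intended EIO handling can be sketched as a small decision function. The enum, the function name `sketch_fault_result`, and the `gpu_wedged` parameter are hypothetical; the actual fault handler distinguishes more error codes than this model does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum sketch_fault {
	SKETCH_FAULT_NOPAGE, /* fault is swallowed, userspace carries on */
	SKETCH_FAULT_SIGBUS  /* fault is reported to userspace as SIGBUS */
};

/*
 * Map an error from the fault path to a fault result.
 * err is 0 or a negative errno value.
 */
enum sketch_fault sketch_fault_result(int err, bool gpu_wedged)
{
	assert(err <= 0);

	switch (err) {
	case 0:
		return SKETCH_FAULT_NOPAGE;
	case -EIO:
		/*
		 * Swallow the error only when the GPU has died. Any other
		 * -EIO (for example a failed swap-in) is reported.
		 */
		return gpu_wedged ? SKETCH_FAULT_NOPAGE : SKETCH_FAULT_SIGBUS;
	default:
		/* Other error codes are not modelled in this sketch. */
		return SKETCH_FAULT_SIGBUS;
	}
}
```

The inverted version would have returned SIGBUS exactly when the GPU was wedged, which is the case the handler is meant to hide from userspace.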
      Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: stable@vger.kernel.org
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    • drm/i915: Prevent recursive deadlock on releasing a busy userptr · ad46cb53
      Chris Wilson authored
      During release of the GEM object we hold the struct_mutex. As the
      object may be holding onto the last reference for the task->mm,
      calling mmput() may trigger exit_mmap(), which closes the vma,
      which will call drm_gem_vm_close() and attempt to reacquire
      the struct_mutex. In order to avoid that recursion, we have
      to defer the mmput() until after we drop the struct_mutex,
      i.e. we need to schedule a worker to do the clean up. A further issue
      spotted by Tvrtko was caused when we took a GTT mmapping of a userptr
      buffer object. In that case, we would never call mmput as the object
      would be cyclically referenced by the GTT mmapping and not freed upon
      process exit - keeping the entire process mm alive after the process
      task was reaped. The fix employed is to replace the mm_users/mmput()
      reference handling with mm_count/mmdrop() for the shared i915_mm_struct.
      
         INFO: task test_surfaces:1632 blocked for more than 120 seconds.
               Tainted: GF          O 3.14.5+ #1
         "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
         test_surfaces   D 0000000000000000     0  1632   1590 0x00000082
          ffff88014914baa8 0000000000000046 0000000000000000 ffff88014914a010
          0000000000012c40 0000000000012c40 ffff8800a0058210 ffff88014784b010
          ffff88014914a010 ffff880037b1c820 ffff8800a0058210 ffff880037b1c824
         Call Trace:
          [<ffffffff81582499>] schedule+0x29/0x70
          [<ffffffff815825fe>] schedule_preempt_disabled+0xe/0x10
          [<ffffffff81583b93>] __mutex_lock_slowpath+0x183/0x220
          [<ffffffff81583c53>] mutex_lock+0x23/0x40
          [<ffffffffa005c2a3>] drm_gem_vm_close+0x33/0x70 [drm]
          [<ffffffff8115a483>] remove_vma+0x33/0x70
          [<ffffffff8115a5dc>] exit_mmap+0x11c/0x170
          [<ffffffff8104d6eb>] mmput+0x6b/0x100
          [<ffffffffa00f44b9>] i915_gem_userptr_release+0x89/0xc0 [i915]
          [<ffffffffa00e6706>] i915_gem_free_object+0x126/0x250 [i915]
          [<ffffffffa005c06a>] drm_gem_object_free+0x2a/0x40 [drm]
          [<ffffffffa005cc32>] drm_gem_object_handle_unreference_unlocked+0xe2/0x120 [drm]
          [<ffffffffa005ccd4>] drm_gem_object_release_handle+0x64/0x90 [drm]
          [<ffffffff8127ffeb>] idr_for_each+0xab/0x100
          [<ffffffffa005cc70>] ?  drm_gem_object_handle_unreference_unlocked+0x120/0x120 [drm]
          [<ffffffff81583c46>] ? mutex_lock+0x16/0x40
          [<ffffffffa005c354>] drm_gem_release+0x24/0x40 [drm]
          [<ffffffffa005b82b>] drm_release+0x3fb/0x480 [drm]
          [<ffffffff8118d482>] __fput+0xb2/0x260
          [<ffffffff8118d6de>] ____fput+0xe/0x10
          [<ffffffff8106f27f>] task_work_run+0x8f/0xf0
          [<ffffffff81052228>] do_exit+0x1a8/0x480
          [<ffffffff81052551>] do_group_exit+0x51/0xc0
          [<ffffffff810525d7>] SyS_exit_group+0x17/0x20
          [<ffffffff8158e092>] system_call_fastpath+0x16/0x1b
      
      v2: Incorporate feedback from Tvrtko and remove the unnecessary mm
      referencing when creating the i915_mm_struct and improve some of the
      function names and comments.
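
The deferral described in this message can be illustrated with a single-threaded model. It is a sketch under stated assumptions: a boolean stands in for struct_mutex, the "worker" simply runs inline after the lock is dropped, and every `sketch_*` name is hypothetical. The mm_count/mmdrop() part of the fix is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_state {
	bool lock_held;         /* models the mutex being held */
	bool cleanup_pending;   /* models a queued work item */
	int recursive_attempts; /* times cleanup tried to retake a held lock */
	int cleanups_done;      /* times cleanup completed */
};

/* Final cleanup needs the lock, like the vma close path in the trace. */
void sketch_final_cleanup(struct sketch_state *s)
{
	if (s->lock_held) {
		s->recursive_attempts++; /* a real mutex would deadlock here */
		return;
	}
	s->lock_held = true;
	s->cleanups_done++;
	s->lock_held = false;
}

/* Problematic shape: cleanup runs while the lock is still held. */
void sketch_release_direct(struct sketch_state *s)
{
	assert(s != NULL);
	s->lock_held = true;
	sketch_final_cleanup(s);
	s->lock_held = false;
}

/* Deferred shape: only queue under the lock, run the work after unlocking. */
void sketch_release_deferred(struct sketch_state *s)
{
	assert(s != NULL);
	s->lock_held = true;
	s->cleanup_pending = true;
	s->lock_held = false;

	/* Stand-in for the worker running later, outside the lock. */
	if (s->cleanup_pending) {
		s->cleanup_pending = false;
		sketch_final_cleanup(s);
	}
}
```

The direct variant records a recursive lock attempt and never completes the cleanup; the deferred variant completes it once, with the lock free.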
      Reported-by: Jacek Danecki <jacek.danecki@intel.com>
      Test-case: igt/gem_userptr_blits/process-exit*
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Tested-by: "Gong, Zhipeng" <zhipeng.gong@intel.com>
      Cc: Jacek Danecki <jacek.danecki@intel.com>
      Cc: "Ursulin, Tvrtko" <tvrtko.ursulin@intel.com>
      Reviewed-by: "Ursulin, Tvrtko" <tvrtko.ursulin@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: stable@vger.kernel.org # hold off until 3.17 ships for additional testing
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
  2. 07 Sep, 2014 11 commits
  3. 06 Sep, 2014 6 commits
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 2b12164b
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "A smattering of bug fixes across most architectures"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        powerpc/kvm/cma: Fix panic introduces by signed shift operation
        KVM: s390/mm: Fix guest storage key corruption in ptep_set_access_flags
        KVM: s390/mm: Fix storage key corruption during swapping
        arm/arm64: KVM: Complete WFI/WFE instructions
        ARM/ARM64: KVM: Nuke Hyp-mode tlbs before enabling MMU
        KVM: s390/mm: try a cow on read only pages for key ops
        KVM: s390: Fix user triggerable bug in dead code
    • Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 56c22854
      Linus Torvalds authored
      Pull ARM SoC fixes from Kevin Hilman:
       "Another round of fixes from arm-soc land, which are mostly DT fixes
        for:
      
         - OMAP: handful of DT fixes for devices on newly supported hardware
         - davinci: fix 2nd EDMA channel
         - ux500: extend previous pinctrl fix to another board
         - at91: clock registration fixes, compatibility string precision
      
        And one more fix for event cleanup in drivers/bus/arm-ccn"
      
      * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        bus: arm-ccn: Move event cleanup routine
        ARM: at91/dt: rm9200: fix usb clock definition
        ARM: at91: rm9200: fix clock registration
        ARM: at91/dt: sam9g20: set at91sam9g20 pllb driver
        ARM: dts: dra7-evm: Add vtt regulator support
        ARM: dts: dra7-evm: Fix spi1 mux documentation
        ARM: dts: am43x-epos-evm: Disable QSPI to prevent conflict with GPMC-NAND
        ARM: OMAP2+: gpmc: Don't complain if wait pin is used without r/w monitoring
        ARM: dts: am43xx-epos-evm: Don't use read/write wait monitoring
        ARM: dts: am437x-gp-evm: Don't use read/write wait monitoring
        ARM: dts: am437x-gp-evm: Use BCH16 ECC scheme instead of BCH8
        ARM: dts: am43x-epos-evm: Use BCH16 ECC scheme instead of BCH8
        ARM: dts: am4372: fix USB regs size
        ARM: dts: am437x-gp: switch i2c0 to 100KHz
        ARM: dts: dra7-evm: Fix 8th NAND partition's name
        ARM: dts: dra7-evm: Fix i2c3 pinmux and frequency
        ARM: ux500: disable msp2 node on Snowball
        ARM: edma: Fix configuration parsing for SoCs with multiple eDMA3 CC
        ARM: dts: set 'ti,set-rate-parent' for dpll4_m5x2 clock
    • Merge tag 'xfs-for-linus-3.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs · 11e97398
      Linus Torvalds authored
      Pull xfs fixes from Dave Chinner:
       "The fixes all address recently discovered data corruption issues.
      
        The original Direct IO issue was discovered by Chris Mason @ Facebook
        on a production workload which mixed buffered reads with direct read
        and write IO to the same file.  The fix for that exposed other issues
        with page invalidation (exposed by millions of fsx operations) failing
        due to dirty buffers beyond EOF.
      
        Finally, the collapse_range code could also cause problems due to
        racing writeback changing the extent map while it was being shifted
        around.  The commits for that problem are simple mitigation fixes that
        prevent the problem from occurring.  A more robust fix for 3.18 that
        addresses the underlying problem is currently being worked on by
        Brian.
      
        Summary of fixes:
         - a direct IO read/buffered read data corruption
         - the associated fallout from the DIO data corruption fix
         - collapse range bugs that are potential data corruption issues"
      
      * tag 'xfs-for-linus-3.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
        xfs: trim eofblocks before collapse range
        xfs: xfs_file_collapse_range is delalloc challenged
        xfs: don't log inode unless extent shift makes extent modifications
        xfs: use ranged writeback and invalidation for direct IO
        xfs: don't zero partial page cache pages during O_DIRECT writes
        xfs: don't zero partial page cache pages during O_DIRECT writes
        xfs: don't dirty buffers beyond EOF
    • Merge tag 'for-linus-20140905' of git://git.infradead.org/linux-mtd · 925e0ea4
      Linus Torvalds authored
      Pull mtd fixes from Brian Norris:
       "Two trivial MTD updates for 3.17-rc4:
      
         - a tiny comment tweak, to kill a bunch of DocBook warnings added
           during the merge window
      
         - a small fixup to the OTP routines' error handling"
      
      * tag 'for-linus-20140905' of git://git.infradead.org/linux-mtd:
        mtd: nand: fix DocBook warnings on nand_sdr_timings doc
        mtd: cfi_cmdset_0002: check return code for get_chip()
    • timekeeping: Update timekeeper before updating vsyscall and pvclock · 9bf2419f
      Thomas Gleixner authored
      The update_walltime() code works on the shadow timekeeper to make the
      seqcount protected region as short as possible. But that update to the
      shadow timekeeper does not update all timekeeper fields because it's
      sufficient to do that once before it becomes live. One of these fields
      is tkr.base_mono. That stays stale in the shadow timekeeper unless an
      operation happens which copies the real timekeeper to the shadow.
      
      The update function is called after the update calls to vsyscall and
      pvclock. While not correct, it did not cause any problems because none
      of the invoked update functions used base_mono.
      
      commit cbcf2dd3 (x86: kvm: Make kvm_get_time_and_clockread()
      nanoseconds based) changed that in the kvm pvclock update function, so
      the stale base_mono value got used and caused kvm-clock to malfunction.
      
      Put the update where it belongs and fix the issue.
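
The ordering problem described above can be shown with a toy model. This is a sketch only: `struct sketch_timekeeper`, the rule that `base_mono` simply mirrors the wall value, and all `sketch_*` functions are assumptions for illustration and do not reflect the real timekeeper layout.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_timekeeper {
	long long wall_ns;   /* advanced on every update */
	long long base_mono; /* derived field, refreshed separately */
};

/* Refresh the derived field from the freshly advanced state. */
void sketch_update_derived(struct sketch_timekeeper *tk)
{
	tk->base_mono = tk->wall_ns; /* simplification for this model */
}

/* Stand-in for a consumer such as the pvclock update: reads base_mono. */
long long sketch_notify(const struct sketch_timekeeper *tk)
{
	return tk->base_mono;
}

/* Old ordering: consumers are notified before the derived field is fresh. */
long long sketch_update_old(struct sketch_timekeeper *tk, long long now_ns)
{
	long long seen;

	assert(tk != NULL);
	tk->wall_ns = now_ns;
	seen = sketch_notify(tk);
	sketch_update_derived(tk);
	return seen;
}

/* Fixed ordering: refresh the derived field first, then notify. */
long long sketch_update_fixed(struct sketch_timekeeper *tk, long long now_ns)
{
	assert(tk != NULL);
	tk->wall_ns = now_ns;
	sketch_update_derived(tk);
	return sketch_notify(tk);
}
```

With the old ordering the consumer observes the value from the previous update; with the fixed ordering it observes the current one.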
      Reported-by: Chris J Arges <chris.j.arges@canonical.com>
      Reported-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409050000570.3333@nanos
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • compat: nanosleep: Clarify error handling · 849151dd
      Thomas Gleixner authored
      The error handling in compat_sys_nanosleep() is correct, but
      completely non-obvious. Document it and restrict it to the
      -ERESTART_RESTARTBLOCK return value for clarity.
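
One way to picture "restrict it to the -ERESTART_RESTARTBLOCK return value" is the following sketch. It is not the kernel function: the struct, the function name, and the boolean parameters are hypothetical, and the constant is defined locally because the kernel-internal value is not exposed through the userspace `<errno.h>`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Kernel-internal restart code, defined locally for this sketch. */
#define SKETCH_ERESTART_RESTARTBLOCK 516

struct sketch_result {
	int ret;              /* value handed back to the caller */
	bool wrote_remaining; /* whether the remaining time was copied out */
};

/*
 * Tail of a nanosleep-style call. sleep_ret is the result of the underlying
 * sleep; have_rmtp says whether the caller asked for the remaining time;
 * copy_ok models whether copying it out to the caller would succeed.
 */
struct sketch_result sketch_nanosleep_tail(int sleep_ret, bool have_rmtp,
					   bool copy_ok)
{
	struct sketch_result r = { sleep_ret, false };

	assert(sleep_ret <= 0);

	/* Only the restart case needs the extra handling. */
	if (sleep_ret == -SKETCH_ERESTART_RESTARTBLOCK && have_rmtp) {
		if (!copy_ok) {
			r.ret = -EFAULT;
			return r;
		}
		r.wrote_remaining = true;
	}
	return r;
}
```

Testing for the one specific return value, rather than for any non-zero result, makes it explicit which path the extra handling belongs to.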
      Reported-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  4. 05 Sep, 2014 17 commits
  5. 04 Sep, 2014 2 commits