  1. 10 Oct, 2013 2 commits
  2. 09 Oct, 2013 1 commit
  3. 03 Oct, 2013 3 commits
    • drm/i915: Tweak RPS thresholds to more aggressively downclock · dd75fdc8
      Chris Wilson authored
      After applying wait-boost we often find ourselves stuck at higher clocks
      than required. The current threshold value requires the GPU to be
      continuously and completely idle for 313ms before it is dropped by one
      bin. Conversely, we require the GPU to be busy for an average of 90% over
      an 84ms period before we upclock. So the current thresholds almost never
      downclock the GPU, and respond very slowly to sudden demands for more
      power. It is easy to observe that we currently lock into the wrong bin
      and both underperform in benchmarks and consume more power than optimal
      (just by repeating the task and measuring the different results).
      
      An alternative approach, as discussed in the bspec, is to use a
      continuous threshold for upclocking, and an average value for downclocking.
      This is good for quickly detecting and reacting to state changes within a
      frame, however it fails with the common throttling method of waiting
      upon the outstanding frame - at least it is difficult to choose a
      threshold that works well at 15,000fps and at 60fps. So continue to use
      average busy/idle loads to determine frequency change.
      
      v2: Use 3 power zones to keep frequencies low in steady-state mostly
      idle (e.g. scrolling, interactive 2D drawing), and frequencies high
      for demanding games. In between those end-states, we use a
      fast-reclocking algorithm to converge more quickly on the desired bin.
      
      v3: Bug fixes - make sure we reset adj after switching power zones.
      
      v4: Tune - drop the continuous busy thresholds as they prevent us from
      choosing the right frequency for glxgears style swap benchmarks. Instead
      the goal is to be able to find the right clocks irrespective of the
      wait-boost.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Kenneth Graunke <kenneth@whitecape.org>
      Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
      Cc: Owen Taylor <otaylor@redhat.com>
      Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
      Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
      Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
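The averaged-load approach with fast reclocking described above can be sketched roughly as follows. This is a minimal illustration, not the driver's code: the struct, the function names and the 60% downclock threshold are hypothetical (only the 90% upclock figure appears in the message), and the three power zones are left out.

```c
#include <assert.h>

/* Hypothetical per-GPU state; frequencies are expressed in bins. */
struct rps_state {
    int cur;  /* currently requested bin */
    int min;  /* lowest allowed bin */
    int max;  /* highest allowed bin */
    int adj;  /* last step; doubles while the trend continues */
};

/* Assumed thresholds on the average busyness of one interval. */
enum { UP_BUSY_PCT = 90, DOWN_BUSY_PCT = 60 };

static int clamp_bin(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Called once per evaluation interval with the average busy percentage. */
static int rps_evaluate(struct rps_state *s, int avg_busy_pct)
{
    assert(s->min <= s->max);
    assert(avg_busy_pct >= 0 && avg_busy_pct <= 100);

    if (avg_busy_pct >= UP_BUSY_PCT)
        s->adj = s->adj > 0 ? s->adj * 2 : 1;    /* keep climbing, faster */
    else if (avg_busy_pct <= DOWN_BUSY_PCT)
        s->adj = s->adj < 0 ? s->adj * 2 : -1;   /* keep dropping, faster */
    else
        s->adj = 0;                              /* in band: hold the bin */

    s->cur = clamp_bin(s->cur + s->adj, s->min, s->max);

    /* Forget the trend once a limit is reached so adj cannot grow unbounded. */
    if (s->cur == s->min || s->cur == s->max)
        s->adj = 0;

    return s->cur;
}
```

Doubling the step while the load stays on the same side of a threshold is what lets the requested bin converge in a few intervals instead of one bin at a time.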
    • drm/i915: Boost RPS frequency for CPU stalls · b29c19b6
      Chris Wilson authored
      If we encounter a situation where the CPU blocks waiting for results
      from the GPU, give the GPU a kick to boost its frequency.
      
      This should work to reduce user interface stalls and to quickly promote
      mesa to high frequencies - but the cost is that our requested frequency
      stalls high (as we do not idle for long enough before rc6 to start
      reducing frequencies, nor are we aggressive at down clocking an
      underused GPU). However, this should be mitigated by rc6 itself powering
      off the GPU when idle, and that energy use is dependent upon the workload
      of the GPU in addition to its frequency (e.g. the math or sampler
      functions only consume power when used). Still, this is likely to
      adversely affect light workloads.
      
      In particular, this nearly eliminates the highly noticeable wake-up lag
      in animations from idle, for example expose or workspace transitions.
      (However, given the situation where we fail to downclock, our requested
      frequency is almost always the maximum, except for Baytrail where we
      manually downclock upon idling. This often masks the latency of
      upclocking after being idle, so animations are typically smooth - at the
      cost of increased power consumption.)
      
      Stéphane raised the concern that this will punish good applications and
      reward bad applications - but due to the nature of how mesa performs its
      client throttling, I believe all mesa applications will be roughly
      equally affected. To address this concern, and to prevent applications
      like compositors from permanently boosting the RPS state, we ratelimit the
      frequency of the wait-boosts each client receives.
      
      Unfortunately, this technique is ineffective with Ironlake - which also
      has dynamic render power states and suffers just as dramatically. For
      Ironlake, the thermal/power headroom is shared with the CPU through
      Intelligent Power Sharing and the intel-ips module. This leaves us with
      no GPU boost frequencies available when coming out of idle, and due to
      hardware limitations we cannot change the arbitration between the CPU and
      GPU quickly enough to be effective.
      
      v2: Limit each client to receiving a single boost for each active period.
          Tested by QA to only marginally increase power, and to demonstrably
          increase throughput in games. No latency measurements yet.
      
      v3: Cater for front-buffer rendering with manual throttling.
      
      v4: Tidy up.
      
      v5: Sadly the compositor needs frequent boosts as it may never idle, but
      due to its picking mechanism (using ReadPixels) may require frequent
      waits. Those waits, along with the waits for the vrefresh swap, conspire
      to keep the GPU at low frequencies despite the interactive latency. To
      overcome this we ditch the one-boost-per-active-period and just ratelimit
      the number of wait-boosts each client can receive.
      Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Kenneth Graunke <kenneth@whitecape.org>
      Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
      Cc: Owen Taylor <otaylor@redhat.com>
      Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
      Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
      Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      [danvet: No extern for function prototypes in headers.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
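The per-client ratelimit mentioned in v5 might look something like the sketch below. The message does not say how the limit is implemented or what interval is used, so the one-second window, the struct and the function name are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed minimum spacing between two boosts for the same client. */
#define BOOST_INTERVAL_MS 1000u

/* Hypothetical per-client bookkeeping. */
struct client {
    uint64_t last_boost_ms;  /* time of the most recent boost */
    bool boosted_once;       /* false until the first boost is granted */
};

/*
 * Returns true if a client that is about to block on the GPU may boost the
 * frequency now, and records the boost. Time is passed in so the logic can
 * be exercised without a real clock.
 */
static bool client_may_boost(struct client *c, uint64_t now_ms)
{
    assert(!c->boosted_once || now_ms >= c->last_boost_ms);

    if (c->boosted_once && now_ms - c->last_boost_ms < BOOST_INTERVAL_MS)
        return false;  /* boosted too recently: wait without boosting */

    c->boosted_once = true;
    c->last_boost_ms = now_ms;
    return true;
}
```

Because the limit is tracked per client, a compositor that waits often still gets a boost at regular intervals, while no single client can pin the GPU at high clocks by waiting in a tight loop.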
    • drm/i915: Clean up the ring scaling calculations · f6aca45c
      Ben Widawsky authored
      This patch attempts to clean up the ring/IA scaling programming in the
      following ways.
      1. Fix the comment about the DDR frequency. The math is 266MHz, not
      133MHz. Formula was right, docs are wrong.
      
      2. Mask the DCLK register since I don't know how it is defined on future
      platforms.
      
      3. Use mult_frac instead of magic math.
      
      This helps for future platform enabling.
      
      v2: Actually use the right patch. The v1 was a mix of things, none of
      which was right. Note that due to rounding, we actually get different
      values (slightly higher) for the effective ring frequency.
      
      v3: Use 1.25 instead of 1.33 as the original code did. (Jesse)
      
      CC: Jesse Barnes <jbarnes@virtuousgeek.org>
      CC: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
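For readers unfamiliar with mult_frac, the sketch below is a userspace function that mirrors how the kernel macro computes x * numer / denom, followed by the 1.25 scaling from v3 written as 5/4. The names mult_frac_u and ring_freq_for are hypothetical, and the units of the frequency argument are left unspecified because the message does not give them.

```c
#include <assert.h>

/*
 * Compute x * numer / denom by splitting x into quotient and remainder,
 * so the intermediate product stays smaller than a plain x * numer.
 */
static unsigned int mult_frac_u(unsigned int x, unsigned int numer,
                                unsigned int denom)
{
    assert(denom != 0);

    unsigned int quot = x / denom;
    unsigned int rem = x % denom;

    return quot * numer + (rem * numer) / denom;
}

/* Scale a GPU frequency by 1.25 (5/4), the ratio named in v3. */
static unsigned int ring_freq_for(unsigned int gpu_freq)
{
    return mult_frac_u(gpu_freq, 5, 4);
}
```

Spelling the ratio out as an explicit numerator and denominator is what replaces the "magic math": the intent (multiply by 5/4) is visible at the call site, and the result is the same truncated value as the plain integer expression whenever that expression does not overflow.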
  4. 01 Oct, 2013 5 commits
  5. 20 Sep, 2013 5 commits
  6. 16 Sep, 2013 5 commits
  7. 10 Sep, 2013 4 commits
  8. 08 Sep, 2013 1 commit
  9. 03 Sep, 2013 2 commits
  10. 23 Aug, 2013 3 commits
    • drm/i915: allow package C8+ states on Haswell (disabled) · c67a470b
      Paulo Zanoni authored
      This patch allows PC8+ states on Haswell. These states can only be
      reached when all the display outputs are disabled, and they allow some
      more power savings.
      
      The fact that the graphics device is allowing PC8+ doesn't mean that
      the machine will actually enter PC8+: all the other devices also need
      to allow PC8+.
      
      For now this option is disabled by default. You need i915.allow_pc8=1
      if you want it.
      
      This patch adds a big comment inside i915_drv.h explaining how it
      works and how it tracks things. Read it.
      
      v2: (this is not really v2, many previous versions were already sent,
           but they had different names)
          - Use the new functions to enable/disable GTIMR and GEN6_PMIMR
          - Rename almost all variables and functions to names suggested by
            Chris
          - More WARNs on the IRQ handling code
          - Also disable PC8 when there's GPU work to do (thanks to Ben for
            the help on this), so apps can run faster
          - Enable PC8 on a delayed work function that is delayed for 5
            seconds. This makes sure we only enable PC8+ if we're really
            idle
          - Make sure we're not in PC8+ when suspending
      v3: - WARN if IRQs are disabled on __wait_seqno
          - Replace some DRM_ERRORs with WARNs
          - Fix calls to restore GT and PM interrupts
          - Use intel_mark_busy instead of intel_ring_advance to disable PC8
      v4: - Use the force_wake, Luke!
      v5: - Remove the "IIR is not zero" WARNs
          - Move the force_wake chunk to its own patch
          - Only restore what's missing from RC6, not everything
      Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
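The delayed-enable behaviour listed under v2 can be modelled as below. This is only a sketch under assumptions: the real tracking is documented in the i915_drv.h comment the message refers to, and the struct, the function names and the explicit tick function (standing in for a delayed work item) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PC8_DELAY_MS 5000u  /* the 5 second delay named in the message */

/* Hypothetical tracking state. */
struct pc8_state {
    int disable_count;       /* number of reasons PC8+ must stay off */
    bool enabled;            /* PC8+ currently allowed */
    bool enable_pending;     /* a delayed enable is scheduled */
    uint64_t enable_at_ms;   /* when the delayed enable fires */
};

/* Something (an active output, GPU work) needs PC8+ disallowed. */
static void pc8_get(struct pc8_state *p)
{
    p->disable_count++;
    p->enable_pending = false;  /* cancel any scheduled enable */
    p->enabled = false;
}

/* One reason went away; schedule the enable once none remain. */
static void pc8_put(struct pc8_state *p, uint64_t now_ms)
{
    assert(p->disable_count > 0);

    if (--p->disable_count == 0) {
        p->enable_pending = true;
        p->enable_at_ms = now_ms + PC8_DELAY_MS;
    }
}

/* Stand-in for the delayed work function. */
static void pc8_tick(struct pc8_state *p, uint64_t now_ms)
{
    if (p->enable_pending && now_ms >= p->enable_at_ms) {
        p->enabled = true;
        p->enable_pending = false;
    }
}
```

Any new activity inside the five-second window cancels the pending enable, which is how the delay ensures PC8+ is only allowed when the device is really idle.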
    • drm/i915: drop WaMbcDriverBootEnable workaround · 3414caf6
      Jesse Barnes authored
      Turns out the BIOS will do this for us as needed, and if we try to do it
      again we risk hangs or other bad behavior.
      
      Note that this seems to break libva on ChromeOS after resumes (but
      strangely _not_ after booting up).
      
      This essentially reverts
      
      commit b4ae3f22
      Author: Jesse Barnes <jbarnes@virtuousgeek.org>
      Date:   Thu Jun 14 11:04:48 2012 -0700
      
          drm/i915: load boot context at driver init time
      
      and
      
      commit b3bf0766
      Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Date:   Tue Nov 20 13:27:44 2012 -0200
      
          drm/i915: implement WaMbcDriverBootEnable on Haswell
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Reported-and-Tested-by: Stéphane Marchesin <marcheu@chromium.org>
      [danvet: Add note about impact and regression citation.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: wrap GEN6_PMIMR changes · edbfdb45
      Paulo Zanoni authored
      Just like we're doing with the other IMR changes.
      
      One of the functional changes is that not every caller was doing the
      POSTING_READ.
      Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  11. 22 Aug, 2013 3 commits
    • drm/i915: Only unmask required PM interrupts · fd547d25
      Vinit Azad authored
      Un-masking all PM interrupts causes hardware to generate
      interrupts regardless of whether the interrupts are enabled
      on the DE side. Since turbo only needs the up/down threshold and
      rc6 timeout interrupts, mask all other interrupt bits to avoid
      unnecessary overhead/wake up.
      
      Note that our interrupt handler isn't being fired since we do set the
      IER bits properly (IIR bits aren't set). The overhead isn't because
      our driver is reacting to these interrupts, but because hardware keeps
      generating internal messages when PMINTRMSK doesn't mask out the
      up/down EI interrupts (which happen periodically).
      
      Change-Id: I6c947df6fd5f60584d39b9e8b8c89faa51a5e827
      Signed-off-by: Vinit Azad <vinit.azad@intel.com>
      [danvet: Add follow-up explanation of the precise effects from Vinit
      as a note to the commit message.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
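The idea of unmasking only what turbo needs can be illustrated with the sketch below. The bit names and positions are invented for the example and do not match the hardware register layout; the sketch also assumes the usual mask-register convention where a set bit suppresses the corresponding interrupt.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define PM_UP_THRESHOLD   ((uint32_t)1 << 0)
#define PM_DOWN_THRESHOLD ((uint32_t)1 << 1)
#define PM_RC6_TIMEOUT    ((uint32_t)1 << 2)
#define PM_UP_EI          ((uint32_t)1 << 3)  /* periodic, not needed */
#define PM_DOWN_EI        ((uint32_t)1 << 4)  /* periodic, not needed */

#define PM_KNOWN_BITS (PM_UP_THRESHOLD | PM_DOWN_THRESHOLD | \
                       PM_RC6_TIMEOUT | PM_UP_EI | PM_DOWN_EI)

/*
 * Build a mask-register value that leaves only the wanted interrupts
 * unmasked (bit clear) and masks everything else (bit set).
 */
static uint32_t pm_intr_mask(uint32_t wanted)
{
    assert((wanted & ~PM_KNOWN_BITS) == 0);

    return ~wanted;
}
```

With a value built this way, the periodic up/down EI events stay masked, so the hardware has no reason to generate internal messages for them even though the handler would never have seen them anyway.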
    • drm/i915: clarify Haswell power well bit names · 6aedd1f5
      Paulo Zanoni authored
      Whenever I need to work with the HSW_PWER_WELL_* register bits I have
      to look at the documentation to find out which bit is to request the
      power well and which one shows its current state. Rename the bits so I
      won't need to look at the docs every time.
      Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: tune the RC6 threshold for stability · 351aa566
      Stéphane Marchesin authored
      It's basically the same deal as the RC6+ issues on ivy bridge
      except this time with RC6 on sandy bridge. Like last time the
      core of the issue is that the timings don't work 100% with our
      voltage regulator. So from time to time, the kernel will print
      a warning message about the GPU not getting out of RC6. In
      particular, I found this fairly easy to reproduce during
      suspend/resume.
      
      Changing the threshold to 125000 instead of 50000 seems to fix
      the issue. The previous patch used 150000 but as it turns out
      this doesn't work everywhere. After getting such a machine, I
      bisected the highest value which works, which is 125000, so here
      it is.
      
      I also measured the idle power usage before/after this patch and
      didn't see a difference on a sandy bridge laptop. On haswell and
      up, it makes a big difference, so we want to keep it at 50k
      there. It also seems like haswell doesn't have the RC6 issues
      that sandy bridge has so the 50k value is fine.
      Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
      Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
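The platform split described above amounts to a small selection function, sketched here. The enum and function name are hypothetical, and only the two platforms whose values the message states are modelled; what other generations should use is not covered by this sketch.

```c
#include <assert.h>

/* Hypothetical platform identifiers for this example. */
enum platform {
    PLATFORM_SANDYBRIDGE,
    PLATFORM_HASWELL,
};

/* RC6 threshold per platform, using the values quoted in the message. */
static unsigned int rc6_threshold(enum platform p)
{
    switch (p) {
    case PLATFORM_SANDYBRIDGE:
        return 125000;  /* raised for stability with the voltage regulator */
    case PLATFORM_HASWELL:
        return 50000;   /* kept low: no RC6 issues, large power benefit */
    }

    assert(!"platform not covered by this sketch");
    return 50000;
}
```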
  12. 09 Aug, 2013 2 commits
  13. 08 Aug, 2013 4 commits