1. 13 May, 2019 2 commits
  2. 07 May, 2019 3 commits
    • Merge tag 'gvt-next-fixes-2019-05-07' of https://github.com/intel/gvt-linux... · 23372cce
      Joonas Lahtinen authored
      Merge tag 'gvt-next-fixes-2019-05-07' of https://github.com/intel/gvt-linux into drm-intel-next-fixes
      
      gvt-next-fixes-2019-05-07
      
      - Revert MCHBAR save range change for BXT regression (Yakui)
      - Align display dmabuf size for bytes instead of error-prone pages (Xiong)
      - Fix one context MMIO save/restore after RCS0 name change (Colin)
      - Misc klocwork warning/errors fixes (Aleksei)
      Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      From: Zhenyu Wang <zhenyu.z.wang@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190507090558.GE12913@zhen-hp.sh.intel.com
    • drm/i915: Disable semaphore busywaits on saturated systems · 2564fe70
      Chris Wilson authored
      Asking the GPU to busywait on a memory address, perhaps not unexpectedly
      in hindsight for a shared system, leads to bus contention that affects
      CPU programs trying to concurrently access memory. This can manifest as
      a drop in transcode throughput on highly over-saturated workloads.
      
      The only clue offered by perf is that the bus-cycles (perf stat -e
      bus-cycles) jumped by 50% when enabling semaphores. This corresponds
      with extra CPU active cycles being attributed to intel_idle's mwait.
      
      This patch introduces a heuristic to try and detect when more than one
      client is submitting to the GPU pushing it into an oversaturated state.
      As we already keep track of when the semaphores are signaled, we can
      inspect their state on submitting the busywait batch and if we planned
      to use a semaphore but were too late, conclude that the GPU is
      overloaded and not try to use semaphores in future requests. In
      practice, this means we optimistically try to use semaphores for the
      first frame of a transcode job split over multiple engines, and fail if
      there are multiple clients active and continue not to use semaphores for
      the subsequent frames in the sequence. Periodically, we try to
      optimistically switch semaphores back on whenever the client waits to
      catch up with the transcode results.
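
The heuristic described above can be modelled as a single sticky flag per client. The following is a minimal sketch in Python, not the driver code: the `Client` class, its `saturated` flag, and the method names are hypothetical and exist only to show the plan / submit / wait cycle the message describes.

```python
class Client:
    """Toy model of one GPU client; all names here are illustrative."""

    def __init__(self):
        # Hypothetical flag: set once semaphores are judged to be harmful.
        self.saturated = False

    def plan(self) -> bool:
        """At request construction, decide whether to use a semaphore busywait."""
        return not self.saturated

    def submit(self, planned_semaphore: bool, already_signaled: bool) -> None:
        """On submitting the busywait, check whether we arrived too late.

        If a semaphore was planned but it had already been signaled, the
        request sat in the queue too long, so treat the GPU as overloaded.
        """
        if planned_semaphore and already_signaled:
            self.saturated = True

    def wait(self) -> None:
        """The client blocks to catch up; optimistically retry semaphores."""
        self.saturated = False


client = Client()
history = []
# Frame 1 is on time; frames 2 and 3 find their semaphore already signaled.
for already_signaled in [False, True, True]:
    use_semaphore = client.plan()
    history.append(use_semaphore)
    client.submit(use_semaphore, already_signaled)

client.wait()                  # client waits for the transcode results
history.append(client.plan())  # semaphores are tried again afterwards
print(history)
```

In this model the semaphore is used for the first two frames, dropped for the third once the late submission is detected, and re-enabled after the client waits.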
      
      With 1 client, on Broxton J3455, with the relative fps normalized by %cpu:
      
      x no semaphores
      + drm-tip
      * patched
      +------------------------------------------------------------------------+
      |                                                    *                   |
      |                                                    *+                  |
      |                                                    **+                 |
      |                                                    **+  x              |
      |                                x               *  +**+  x              |
      |                                x  x       *    *  +***x xx             |
      |                                x  x       *    * *+***x *x             |
      |                                x  x*   +  *    * *****x *x x           |
      |                         +    x xx+x*   + ***   * ********* x   *       |
      |                         +    x xx+x*   * *** +** ********* xx  *       |
      |    *   +         ++++*  +    x*x****+*+* ***+*************+x*  *       |
      |*+ +** *+ + +* + *++****** *xxx**********x***+*****************+*++    *|
      |                                   |__________A_____M_____|             |
      |                           |_______________A____M_________|             |
      |                                 |____________A___M________|            |
      +------------------------------------------------------------------------+
          N           Min           Max        Median           Avg        Stddev
      x 120       2.60475       3.50941       3.31123     3.2143953    0.21117399
      + 120        2.3826       3.57077       3.25101     3.1414161    0.28146407
      Difference at 95.0% confidence
      	-0.0729792 +/- 0.0629585
      	-2.27039% +/- 1.95864%
      	(Student's t, pooled s = 0.248814)
      * 120       2.35536       3.66713        3.2849     3.2059917    0.24618565
      No difference proven at 95.0% confidence
      
      With 10 clients over-saturating the pipeline:
      
      x no semaphores
      + drm-tip
      * patched
      +------------------------------------------------------------------------+
      |                     ++                                        **       |
      |                     ++                                        **       |
      |                     ++                                        **       |
      |                     ++                                        **       |
      |                     ++                                    xx ***       |
      |                     ++                                    xx ***       |
      |                     ++                                    xxx***       |
      |                     ++                                    xxx***       |
      |                    +++                                    xxx***       |
      |                    +++                                    xx****       |
      |                    +++                                    xx****       |
      |                    +++                                    xx****       |
      |                    +++                                    xx****       |
      |                    ++++                                   xx****       |
      |                   +++++                                   xx****       |
      |                   +++++                                 x x******      |
      |                  ++++++                                 xxx*******     |
      |                  ++++++                                 xxx*******     |
      |                  ++++++                                 xxx*******     |
      |                  ++++++                                 xx********     |
      |                  ++++++                               xxxx********     |
      |                  ++++++                               xxxx********     |
      |                ++++++++                             xxxxx*********     |
      |+ +  +        + ++++++++                           xxx*xx**********x*  *|
      |                                                         |__A__|        |
      |                 |__AM__|                                               |
      |                                                            |__A_|      |
      +------------------------------------------------------------------------+
          N           Min           Max        Median           Avg        Stddev
      x 120       2.47855        2.8972       2.72376     2.7193402   0.074604933
      + 120       1.17367       1.77459       1.71977     1.6966782   0.085850697
      Difference at 95.0% confidence
      	-1.02266 +/- 0.0203502
      	-37.607% +/- 0.748352%
      	(Student's t, pooled s = 0.0804246)
      * 120       2.57868       3.00821       2.80142     2.7923878   0.058646477
      Difference at 95.0% confidence
      	0.0730476 +/- 0.0169791
      	2.68622% +/- 0.624383%
      	(Student's t, pooled s = 0.0671018)
      
      Indicating that we've recovered the regression from enabling semaphores
      on this saturated setup, with a hint towards an overall improvement.
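
The confidence intervals quoted for the 10-client run can be reproduced from the summary rows alone. The sketch below assumes equal-variance pooling and uses 1.96 as the 95% critical value (an assumption that is reasonable for 238 degrees of freedom; the source only says "Student's t"). The function name `pooled_difference` is illustrative.

```python
import math


def pooled_difference(n1, mean1, sd1, n2, mean2, sd2, t_crit=1.96):
    """Return (difference, half-width of the interval, pooled s).

    t_crit=1.96 is an assumed large-sample critical value for 95% confidence.
    """
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    s = math.sqrt(pooled_var)
    diff = mean2 - mean1
    half_width = t_crit * s * math.sqrt(1.0 / n1 + 1.0 / n2)
    return diff, half_width, s


# Summary rows copied from the 10-client table above.
no_sema = (120, 2.7193402, 0.074604933)
drm_tip = (120, 1.6966782, 0.085850697)
patched = (120, 2.7923878, 0.058646477)

tip_diff, tip_half, tip_s = pooled_difference(*no_sema, *drm_tip)
pat_diff, pat_half, pat_s = pooled_difference(*no_sema, *patched)

print(f"drm-tip: {tip_diff:.5f} +/- {tip_half:.7f} (pooled s = {tip_s:.7f})")
print(f"patched: {pat_diff:.7f} +/- {pat_half:.7f} (pooled s = {pat_s:.7f})")
print(f"drm-tip relative change: {100 * tip_diff / no_sema[1]:.3f}%")
```

Under these assumptions the drm-tip row works out to roughly -1.0227 +/- 0.0204 with a pooled s of about 0.0804, and the patched row to roughly +0.0730 +/- 0.0170, in line with the figures reported above.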
      
      Very similar results, but of smaller magnitude, are observed on both
      Skylake(gt2) and Kabylake(gt4). This may be due to the reduced impact on
      bus-cycles: where we see a 50% hit on Broxton, it is only 10% on the big
      core in this particular test.
      
      One observation to make here is that for a greedy client trying to
      maximise its own throughput, using semaphores is the right choice. It is
      only from the holistic system-wide view, where the semaphores of one
      client impact another and reduce the overall throughput, that we would
      choose to disable semaphores.
      
      The most noticeable negative impact this has is on the no-op
      microbenchmarks, which are also very notable for having no cpu bus load.
      In particular, this increases the runtime and energy consumption of
      gem_exec_whisper.
      
      Fixes: e8861964 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190504070707.30902-1-chris@chris-wilson.co.uk
      (cherry picked from commit ca6e56f6)
      Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    • drm/i915: Delay semaphore submission until the start of the signaler · e766fde6
      Chris Wilson authored
      Currently we submit the semaphore busywait as soon as the signaler is
      submitted to HW. However, we may submit the signaler as the tail of a
      batch of requests, and not even as the first context in the HW list,
      so the busywait may start spinning far in advance of the signaler even
      starting.

      If we instead wait until the request preceding the signaler has completed
      before submitting the busywait, we prevent the busywait from starting too
      early when the signaler is not first in its submission port.
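
The effect of delaying the busywait can be illustrated with a toy timeline. This is a sketch under simplifying assumptions that are not from the source: one signaling engine runs its requests back to back starting at time 0, the waiting engine is otherwise idle, and durations are in arbitrary units. The function `spin_time` is hypothetical.

```python
def spin_time(durations, signaler_index, delayed):
    """Time the busywait spends spinning before the signaler completes.

    durations      -- run time of each request queued on the signaling engine
    signaler_index -- position of the signaler within that queue
    delayed        -- True: busywait is submitted when the preceding request
                      completes; False: submitted as soon as the signaler is
                      handed to HW (time 0 in this model)
    """
    signaler_start = sum(durations[:signaler_index])
    signaler_end = signaler_start + durations[signaler_index]
    busywait_start = signaler_start if delayed else 0
    return signaler_end - busywait_start


queue = [5, 3, 2]  # the signaler is the tail of a batch of three requests
eager = spin_time(queue, signaler_index=2, delayed=False)
late = spin_time(queue, signaler_index=2, delayed=True)
print(eager, late)
```

In this model the eager busywait spins for the whole batch, while the delayed one spins only for the duration of the signaler itself.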
      
      To handle the case where the signaler is at the start of the second (or
      later) submission port, we will need to delay the execution callback
      until we know the context is promoted to port0. A challenge for later.
      
      Fixes: e8861964 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190501114541.10077-9-chris@chris-wilson.co.uk
      (cherry picked from commit 0d90ccb7)
      [Joonas: edited Fixes: tag into single line.]
      Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
  3. 05 May, 2019 1 commit
    • drm/i915/gvt: Add in context mmio 0x20D8 to gen9 mmio list · 75fdb811
      Colin Xu authored
      Depending on the GEN family and I915_PARAM_HAS_CONTEXT_ISOLATION, the Mesa
      driver decides whether the constant buffer 0 address is relative or
      absolute, and loads the GPU initial state by lri to the context mmio
      INSTPM (GEN8) or 0x20D8 (>=GEN9).
      See Mesa commit fa8a764b62
      ("i965: Use absolute addressing for constant buffer 0 on Kernel 4.16+.")
      
      INSTPM is already in gen8_engine_mmio_list, but 0x20D8 is missing from
      gen9_engine_mmio_list. From the GVT point of view, different guests can
      have different contexts, so these mmio registers should be switched
      accordingly.
      
      v2: Update fixes commit ID.
      
      Fixes: 17865713 ("drm/i915/gvt: vGPU context switch")
      Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
      Signed-off-by: Colin Xu <colin.xu@intel.com>
      Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
      (cherry picked from commit 1e8b15a1)
  4. 30 Apr, 2019 1 commit
  5. 29 Apr, 2019 2 commits
  6. 25 Apr, 2019 5 commits
  7. 24 Apr, 2019 17 commits
  8. 23 Apr, 2019 1 commit
  9. 21 Apr, 2019 5 commits
  10. 19 Apr, 2019 3 commits