1. 01 Jul, 2022 1 commit
    • Merge tag 'drm-intel-gt-next-2022-06-29' of... · c6a3d735
      Dave Airlie authored
      Merge tag 'drm-intel-gt-next-2022-06-29' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
      
      UAPI Changes:
      
      - Expose per tile media freq factor in sysfs (Ashutosh Dixit, Dale B Stimson)
      - Document memory residency and Flat-CCS capability of obj (Ramalingam C)
      - Disable GETPARAM lookups of I915_PARAM_[SUB]SLICE_MASK on Xe_HP+ (Matt Roper)
      
      Cross-subsystem Changes:
      
      - Rename intel-gtt symbols (Lucas De Marchi)
      
      Core Changes:
      
      Driver Changes:
      
      - Support programming the EU priority in the GuC descriptor (DG2) (Matthew Brost)
      - DG2 HuC loading support (Daniele Ceraolo Spurio)
      - Fix build error without CONFIG_PM (YueHaibing)
      - Enable THP on Icelake and beyond (Tvrtko Ursulin)
      - Only setup private tmpfs mount when needed and fix logging (Tvrtko Ursulin)
      - Make __guc_reset_context aware of guilty engines (Umesh Nerlige Ramappa)
      - DG2 small bar memory probing fixes (Nirmoy Das)
      - Remove unnecessary GuC err capture noise (Alan Previn)
      - Fix i915_gem_object_ggtt_pin_ww regression on old platforms (Maarten Lankhorst)
      - Fix undefined behavior in GuC backend due to shift overflowing the constant (Borislav Petkov)
      - New DG2 workarounds (Swathi Dhanavanthri, Anshuman Gupta)
      - Report no hwconfig support on ADL-N (Balasubramani Vivekanandan)
      - Fix error_state_read ptr + offset use (Alan Previn)
      - Expose per tile media freq factor in sysfs (Ashutosh Dixit, Dale B Stimson)
      - Fix memory leaks in per-gt sysfs (Ashutosh Dixit)
      - Fix dma_resv fence handling in multi-batch execbuf (Nirmoy Das)
      - Add extra registers to GPU error dump on Gen11+ (Stuart Summers)
      - More PVC+DG2 workarounds (Matt Roper)
      - Improve user experience and driver robustness under SIGINT or similar (Tvrtko Ursulin)
      - Don't show engine classes not present (Tvrtko Ursulin)
      - Improve on suspend / resume time with VT-d enabled (Thomas Hellström)
      - Add missing else (katrinzhou)
      - Don't leak lmem mapping in vma_evict (Juha-Pekka Heikkila)
      - Add smem fallback allocation for dpt (Juha-Pekka Heikkila)
      - Tweak the ordering in cpu_write_needs_clflush (Matthew Auld)
      - Do not access rq->engine without a reference (Niranjana Vishwanathapura)
      - Revert "drm/i915: Hold reference to intel_context over life of i915_request" (Niranjana Vishwanathapura)
      - Don't update engine busyness stats too frequently (Alan Previn)
      - Add additional steps for Wa_22011802037 for execlist backend (Umesh Nerlige Ramappa)
      - Fix a lockdep warning at error capture (Nirmoy Das)
      
      - Ponte Vecchio prep work and new blitter engines (Matt Roper, John Harrison, Lucas De Marchi)
      - Read correct RP_STATE_CAP register (PVC) (Matt Roper)
      - Define MOCS table for PVC (Ayaz A Siddiqui)
      - Driver refactor and support Ponte Vecchio forcewake handling (Matt Roper)
      - Remove additional 3D flags from PIPE_CONTROL (Ponte Vecchio) (Stuart Summers)
      - XEHPSDV and PVC do not use HuC (Daniele Ceraolo Spurio)
      - Extract stepping information from PCI revid (Ponte Vecchio) (Matt Roper)
      - Add initial PVC workarounds (Stuart Summers)
      - SSEU handling driver refactor and Ponte Vecchio support (Matt Roper)
      - GuC depriv applies to PVC (Matt Roper)
      - Add register steering (Ponte Vecchio) (Matt Roper)
      - Add recommended MMIO setting (Ponte Vecchio) (Matt Roper)
      
      - Move multicast register handling to a dedicated file (Matt Roper)
      - Cleanup interface for MCR operations (Matt Roper)
      - Extend i915_vma_pin_iomap() (CQ Tang)
      - Re-do the intel-gtt split (Lucas De Marchi)
      - Correct duplicated/misplaced GT register definitions (Matt Roper)
      - Prefer "XEHP_" prefix for registers (Matt Roper)
      
      - Don't use DRM_DEBUG_WARN_ON for unexpected l3bank/mslice config (Tvrtko Ursulin)
      - Don't use DRM_DEBUG_WARN_ON for ring unexpectedly not idle (Tvrtko Ursulin)
      - Make drop_pages() return bool (Lucas De Marchi)
      - Fix CFI violation with show_dynamic_id() (Nathan Chancellor)
      - Use i915_probe_error instead of drm_error in GuC code (Vinay Belgaumkar)
      - Fix use of static in macro mismatch (Andi Shyti)
      - Update tiled blits selftest (Bommu Krishnaiah)
      - Future-proof platform checks (Matt Roper)
      - Only include what's needed (Jani Nikula)
      - remove accidental static from a local variable (Jani Nikula)
      - Add global forcewake request to drpc (Vinay Belgaumkar)
      - Fix spelling typo in comment (pengfuyuan)
      - Increase timeout for live_parallel_switch selftest (Akeem G Abodunrin)
      - Use non-blocking H2G for waitboost (Vinay Belgaumkar)
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/YrwtLM081SQUG1Dc@tursulin-desk
      c6a3d735
  2. 30 Jun, 2022 3 commits
    • Merge tag 'drm-misc-next-2022-06-30' of git://anongit.freedesktop.org/drm/drm-misc into drm-next · f9292174
      Dave Airlie authored
      drm-misc-next for v5.20:
      
      UAPI Changes:
      
       * fourcc: Update documentation
      
      Cross-subsystem Changes:
      
       * iosys-map: Rework iosys_map_{rd,wr} for improved performance
      
       * vfio: Use aperture helpers
      
      Core Changes:
      
       * aperture: Export for use with other subsystems
      
       * connector: Remove deprecated ida_simple_get()
      
       * crtc: Add helper with general state checks, convert drivers
      
       * format-helper: Add Kunit tests for RGB32 to RGB8
      
      Driver Changes:
      
       * ast: Fix black screen on resume
      
       * bridge: tc358767: Simplify DSI lane handling
      
       * mcde: Fix ref-count leak
      
       * mxsfb/lcdif: Support i.MX8MP LCD controller
      
       * stm/ltdc: Support dynamic Z order; Support mirroring; Fixes and cleanups
      
       * vc4: Many small fixes throughout the driver
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      # -----BEGIN PGP SIGNATURE-----
      #
      # iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmK9TnYACgkQaA3BHVML
      # eiMJcgf+JsGWlFutkxlJCEUDKTXk6BYHQL4czyskDvpBoLrdU1tyrAfKKtqP5k+0
      # SMvS6h1CFa/fSUCYpbdpJ6ER1fZ9r19WdgoPTBc4b97/uQTOJDzd5zuHDiJZquwC
      # O6HD/rptUzPFe6HJuY2cYVtwMlWb2NhITMHfctgyeQJSMK8TwoU1bDVFftwxaWFt
      # ISscTz0enn38sCjEarSpyKkBCinuaWDcpe5BI2Dp3imkDWR3ktzuh4B11QWS0DKs
      # Q/FLGTEl1sDrV7r93WiA5BIAPVwNMm1Pl0syd1p42SNLNnv0gcap4GL6qni4h9Ev
      # P/3fIInor/Sht8fyhlFsOUA8k2x7MA==
      # =6NoJ
      # -----END PGP SIGNATURE-----
      # gpg: Signature made Thu 30 Jun 2022 17:19:18 AEST
      # gpg:                using RSA key 7217FBAC8CE9CF6344A168E5680DC11D530B7A23
      # gpg: Can't check signature: No public key
      From: Thomas Zimmermann <tzimmermann@suse.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/Yr1On+eT1mCvIMzW@linux-uq9g
      f9292174
    • iosys-map: Add per-word write · 6fb5ee7c
      Lucas De Marchi authored
      As was done for read, provide the equivalent for write. Even if
      current users are not in the hot path, this should future-proof it.
      
      v2:
        - Remove default from _Generic() - callers wanting to write more
          than u64 should use iosys_map_memcpy_to()
        - Add WRITE_ONCE() cases dereferencing the pointer when using system
          memory
      v3:
        - Fix precedence issue when casting inside WRITE_ONCE(). By not using ()
          around vaddr__ the offset was not part of the cast, but rather added
          to it, producing a wrong address
        - Remove compiletime_assert() as WRITE_ONCE() already contains it
      Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
      Reviewed-by: Christian König <christian.koenig@amd.com> # v1
      Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220628191016.3899428-2-lucas.demarchi@intel.com
      6fb5ee7c
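
The v3 precedence fix described above can be illustrated with a small user-space sketch. This is not the kernel macro: SKETCH_ADDR_BUGGY, SKETCH_ADDR_FIXED, sketch_byte_delta() and sketch_write32() are hypothetical names, and a plain volatile store stands in for WRITE_ONCE(). It only shows why a macro argument such as "base + offset" must be parenthesized before it is cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Buggy form: the argument is pasted in without parentheses, so
 * "(type *)base + offset" casts base first and then advances by
 * offset elements of type, not by offset bytes. */
#define SKETCH_ADDR_BUGGY(type, vaddr) ((type *)vaddr)

/* Fixed form: the whole address expression is cast. */
#define SKETCH_ADDR_FIXED(type, vaddr) ((type *)(vaddr))

/* Distance in bytes from base to p. */
static inline ptrdiff_t sketch_byte_delta(const void *base, const void *p)
{
	return (const unsigned char *)p - (const unsigned char *)base;
}

/* Per-word 32-bit store at a byte offset, using the fixed form.
 * Assumes base itself is suitably aligned for uint32_t. */
static inline void sketch_write32(void *base, size_t offset, uint32_t val)
{
	assert(offset % sizeof(uint32_t) == 0); /* keep the store aligned */
	*(volatile uint32_t *)SKETCH_ADDR_FIXED(uint32_t,
			(unsigned char *)base + offset) = val;
}
```

With a byte pointer base and a byte offset of 8, the fixed form lands 8 bytes past base, while the buggy form lands 8 * sizeof(uint32_t) bytes past it.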
    • iosys-map: Add per-word read · 5f278dbd
      Lucas De Marchi authored
      Instead of always falling back to memcpy_fromio() for any size, prefer
      using read{b,w,l}(). When reading struct members, it is common to read
      integer fields one at a time. Going through memcpy_fromio() for each
      of them imposes a high penalty.
      
      Employ a similar trick as __seqprop() by using _Generic() to generate
      only the specific call based on a type-compatible variable.
      
      For a particular i915 workload producing GPU context switches,
      __get_engine_usage_record() is particularly hot since the engine usage
      is read from device local memory with dgfx, possibly multiple times
      since it's racy. Execution time for this test shows a ~12.5%
      improvement with DG2:
      
      Before:
      	nrepeats = 1000; min = 7.63243e+06; max = 1.01817e+07;
      	median = 9.52548e+06; var = 526149;
      After:
      	nrepeats = 1000; min = 7.03402e+06; max = 8.8832e+06;
      	median = 8.33955e+06; var = 333113;
      
      Other things attempted that didn't prove very useful:
      1) Change the _Generic() on x86 to just dereference the memory address
      2) Change __get_engine_usage_record() to do just 1 read per loop,
         comparing with the previous value read
      3) Change __get_engine_usage_record() to access the fields directly as it
         was before the conversion to iosys-map
      
      (3) did give a small improvement (~3%), but doesn't seem to scale well
      to other similar cases in the driver.
      
      An additional test by Chris Wilson used gem_create from igt, with some
      changes to track object creation time. This happens to stress this
      code path:
      
      	Pre iosys_map conversion of engine busyness:
      	lmem0: Creating    262144 4KiB objects took 59274.2ms
      
      	Unpatched:
      	lmem0: Creating    262144 4KiB objects took 108830.2ms
      
      	With readl (this patch):
      	lmem0: Creating    262144 4KiB objects took 61348.6ms
      
      	s/readl/READ_ONCE/
      	lmem0: Creating    262144 4KiB objects took 61333.2ms
      
      So we do take a little bit more time than before the conversion, but
      that is due to other factors: bringing the READ_ONCE back would be as
      good as just doing this conversion.
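
The percentages quoted in this message can be rechecked from the raw numbers with a short calculation. The helper below is a hypothetical sketch (sketch_percent_reduction() is not part of any kernel or igt API); it computes the relative reduction of "after" with respect to "before", so a negative result means a slowdown.

```c
#include <assert.h>

/* Relative reduction, in percent, going from "before" to "after".
 * A positive result means "after" is smaller (faster). */
static inline double sketch_percent_reduction(double before, double after)
{
	assert(before > 0.0); /* a zero baseline has no relative change */
	return (before - after) / before * 100.0;
}
```

Applied to the medians above (9.52548e+06 before, 8.33955e+06 after) this gives the roughly 12.5% improvement quoted. Applied to the gem_create timings, going from 108830.2ms unpatched to 61348.6ms with this patch is a reduction of about 44%, and comparing the 59274.2ms pre-conversion time with 61348.6ms gives a slowdown of about 3.5%.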
      
      v2:
        - Remove default from _Generic() - callers wanting to read more
          than u64 should use iosys_map_memcpy_from()
        - Add READ_ONCE() cases dereferencing the pointer when using system
          memory
      v3:
        - Fix precedence issue when casting inside READ_ONCE(). By not using ()
          around vaddr__ the offset was not part of the cast, but rather added
          to it, producing a wrong address
        - Remove compiletime_assert() as READ_ONCE() already contains it
      Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
      Reviewed-by: Christian König <christian.koenig@amd.com> # v1
      Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220628191016.3899428-1-lucas.demarchi@intel.com
      5f278dbd
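
The _Generic() dispatch described in the per-word read commit above can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel implementation: sketch_read8() through sketch_read64() are hypothetical stand-ins for readb()/readw()/readl()/readq(), and SKETCH_READER, SKETCH_RD, SKETCH_RD_FIELD and struct sketch_usage are made-up names. As in the v2 notes, there is no default branch, so a type wider than 64 bits fails to compile instead of silently falling back to a copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for readb()/readw()/readl()/readq(). */
static inline uint8_t sketch_read8(const volatile void *p)
{
	return *(const volatile uint8_t *)p;
}

static inline uint16_t sketch_read16(const volatile void *p)
{
	return *(const volatile uint16_t *)p;
}

static inline uint32_t sketch_read32(const volatile void *p)
{
	return *(const volatile uint32_t *)p;
}

static inline uint64_t sketch_read64(const volatile void *p)
{
	return *(const volatile uint64_t *)p;
}

/*
 * Select exactly one accessor from the type of the controlling expression.
 * There is deliberately no default branch: any other type fails to compile.
 */
#define SKETCH_READER(expr) _Generic((expr), \
	uint8_t:  sketch_read8, \
	uint16_t: sketch_read16, \
	uint32_t: sketch_read32, \
	uint64_t: sketch_read64)

/* Read one integer of the given type at a byte offset from base. */
#define SKETCH_RD(base, offset, type) \
	SKETCH_READER((type)0)((const unsigned char *)(base) + (offset))

/* Read one struct member; the member's own type selects the accessor. */
#define SKETCH_RD_FIELD(base, struct_type, field) \
	SKETCH_READER(((struct_type *)0)->field)( \
		(const unsigned char *)(base) + offsetof(struct_type, field))

/* Hypothetical record layout, used only for the usage example. */
struct sketch_usage {
	uint32_t ticks;
	uint16_t engine_class;
	uint8_t  flags;
	uint64_t total_runtime;
};

/* Usage example: each field is read with a single access of its own width. */
static inline void sketch_selftest(void)
{
	struct sketch_usage rec = {
		.ticks = 1000, .engine_class = 3,
		.flags = 0x21, .total_runtime = 1ULL << 40,
	};

	assert(SKETCH_RD_FIELD(&rec, struct sketch_usage, ticks) == 1000);
	assert(SKETCH_RD_FIELD(&rec, struct sketch_usage, engine_class) == 3);
	assert(SKETCH_RD_FIELD(&rec, struct sketch_usage, flags) == 0x21);
	assert(SKETCH_RD_FIELD(&rec, struct sketch_usage, total_runtime) == 1ULL << 40);
}
```

Because the selection happens at compile time, each call site contains only the one accessor that matches the field type, which is the property the commit message relies on to avoid a generic copy for every integer read.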
  3. 29 Jun, 2022 1 commit
  4. 28 Jun, 2022 35 commits