19 Aug, 2024 1 commit
      drm/xe/lnl: Offload system clear page activity to GPU · 23683061
      Nirmoy Das authored
      On LNL, because of flat CCS, the driver creates migration jobs to clear
      CCS metadata. Extend that to also clear system pages using the GPU.
      Inform TTM to allocate pages without __GFP_ZERO, to avoid clearing
      pages twice, by clearing the TTM_TT_FLAG_ZERO_ALLOC flag. Also set
      TTM_TT_FLAG_CLEARED_ON_FREE while freeing to skip the TTM pool's
      clear-on-free, as XE now takes care of clearing pages. If a BO is in
      system placement, such as a BO created with
      DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING, and has a CPU map, the GPU clear
      is avoided for that BO. At that moment the BO has no DMA mapping from
      which to create migration jobs.
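      The flag handling described above can be modeled in a few lines of C.
      This is a minimal userspace sketch, not the xe driver code. The
      following are hypothetical stand-ins for the TTM flags named in the
      message: struct sketch_bo, the sketch_* helpers, and the
      SKETCH_TT_FLAG_* bit values. Reducing the DEFER_BACKING case to a
      single has_dma_mapping check is also a simplifying assumption.

```c
#include <stdbool.h>

/*
 * Illustrative stand-ins for TTM_TT_FLAG_ZERO_ALLOC and
 * TTM_TT_FLAG_CLEARED_ON_FREE. The bit values are assumptions for this
 * sketch and are not the kernel's definitions.
 */
#define SKETCH_TT_FLAG_ZERO_ALLOC      (1u << 0)
#define SKETCH_TT_FLAG_CLEARED_ON_FREE (1u << 1)

/* Hypothetical per-BO state; not the real struct xe_bo. */
struct sketch_bo {
	bool has_dma_mapping; /* false e.g. for a DEFER_BACKING BO with only a CPU map */
	unsigned int tt_flags;
};

/* True when this BO's pages can be cleared by a GPU migration job. */
bool sketch_use_gpu_clear(bool gpu_page_clear_sys, const struct sketch_bo *bo)
{
	return gpu_page_clear_sys && bo->has_dma_mapping;
}

/* Allocation: drop ZERO_ALLOC when the GPU will clear, so pages are not zeroed twice. */
void sketch_tt_create(bool gpu_page_clear_sys, struct sketch_bo *bo)
{
	bo->tt_flags |= SKETCH_TT_FLAG_ZERO_ALLOC;
	if (sketch_use_gpu_clear(gpu_page_clear_sys, bo))
		bo->tt_flags &= ~SKETCH_TT_FLAG_ZERO_ALLOC;
}

/* Free: with GPU clearing enabled, mark pages so the pool skips its clear-on-free. */
void sketch_tt_free(bool gpu_page_clear_sys, struct sketch_bo *bo)
{
	if (gpu_page_clear_sys)
		bo->tt_flags |= SKETCH_TT_FLAG_CLEARED_ON_FREE;
}
```

      In this model, with the device-wide switch on, a BO with a DMA mapping
      loses ZERO_ALLOC and the GPU clears it. A BO without a DMA mapping
      keeps ZERO_ALLOC, so its pages are still zeroed by the CPU.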
      
      Tested this patch with api_overhead_benchmark_l0 from
      https://github.com/intel/compute-benchmarks
      
      Without the patch:
      api_overhead_benchmark_l0 --testFilter=UsmMemoryAllocation:
      UsmMemoryAllocation(api=l0 type=Host size=4KB) 84.206 us
      UsmMemoryAllocation(api=l0 type=Host size=1GB) 105775.56 us
      Perf tool top 5 entries:
      71.44% api_overhead_be  [kernel.kallsyms]   [k] clear_page_erms
      6.34%  api_overhead_be  [kernel.kallsyms]   [k] __pageblock_pfn_to_page
      2.24%  api_overhead_be  [kernel.kallsyms]   [k] cpa_flush
      2.15%  api_overhead_be  [kernel.kallsyms]   [k] pages_are_mergeable
      1.94%  api_overhead_be  [kernel.kallsyms]   [k] find_next_iomem_res
      
      With the patch:
      api_overhead_benchmark_l0 --testFilter=UsmMemoryAllocation:
      UsmMemoryAllocation(api=l0 type=Host size=4KB) 79.439 us
      UsmMemoryAllocation(api=l0 type=Host size=1GB) 98677.75 us
      Perf tool top 5 entries:
      11.16% api_overhead_be  [kernel.kallsyms]   [k] __pageblock_pfn_to_page
      7.85%  api_overhead_be  [kernel.kallsyms]   [k] cpa_flush
      7.59%  api_overhead_be  [kernel.kallsyms]   [k] find_next_iomem_res
      7.24%  api_overhead_be  [kernel.kallsyms]   [k] pages_are_mergeable
      5.53%  api_overhead_be  [kernel.kallsyms]   [k] lookup_address_in_pgd_attr
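      The latency change between the two runs above can be expressed as a
      relative reduction. The small C helper below computes it from the
      reported numbers; the function and variable names are illustrative.

```c
#include <stdio.h>

/* Relative latency reduction, in percent, going from before_us to after_us. */
double latency_reduction_pct(double before_us, double after_us)
{
	return (before_us - after_us) / before_us * 100.0;
}

/* UsmMemoryAllocation(type=Host) results reported above, in microseconds. */
static const double host_4k_before = 84.206, host_4k_after = 79.439;
static const double host_1g_before = 105775.56, host_1g_after = 98677.75;

/* Print the reduction for both allocation sizes. */
void print_reductions(void)
{
	printf("4KB: %.2f%%\n", latency_reduction_pct(host_4k_before, host_4k_after));
	printf("1GB: %.2f%%\n", latency_reduction_pct(host_1g_before, host_1g_after));
}
```

      For these inputs the reduction works out to roughly 5.7% for 4KB and
      6.7% for 1GB host allocations.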
      
      Without this patch, clear_page_erms() dominates execution time, and
      this CPU-side clearing is not pipelined with migration jobs. With this
      patch, page clearing is pipelined with the migration jobs, which frees
      the CPU for other work.
      
      v2: Handle regression on dgfx (Himal)
          Update commit message, as no TTM API changes are needed.
      v3: Fix KUnit test.
      v4: Handle data leak on CPU mmap (Thomas)
      v5: s/gpu_page_clear/gpu_page_clear_sys and move setting
          it to xe_ttm_sys_mgr_init() and other nits (Matt Auld)
      v6: Disable it when init_on_alloc and/or init_on_free is active (Matt)
          Use compute-benchmarks, as the reporter used it to report this
          allocation latency issue, and it is a more appropriate test
          application than mine.
          In v5, the test showed a significant reduction in allocation
          latency, but that is no longer the case. I think this was mostly
          because the previous test was run on an IFWI with low CPU memory
          bandwidth.
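      The v6 gating can be sketched as a single predicate. This is a
      hypothetical model. struct sketch_mem_init stands in for the kernel's
      init_on_alloc/init_on_free state. Limiting the feature to integrated
      parts with flat CCS is an assumption based on the LNL scope and the v2
      dgfx fix; it is not a statement of the driver's exact condition.

```c
#include <stdbool.h>

/*
 * Hypothetical snapshot of the kernel's memory-initialization hardening,
 * corresponding to the init_on_alloc= and init_on_free= boot parameters.
 */
struct sketch_mem_init {
	bool init_on_alloc;
	bool init_on_free;
};

/*
 * Decide whether system-page clearing is offloaded to the GPU
 * (the gpu_page_clear_sys switch). When the kernel already zeroes pages
 * on allocation or on free, the offload is disabled, as in v6.
 */
bool sketch_gpu_page_clear_sys(bool igpu_with_flat_ccs, const struct sketch_mem_init *mi)
{
	return igpu_with_flat_ccs && !mi->init_on_alloc && !mi->init_on_free;
}
```

      Keeping the decision in one device-wide flag mirrors the v5 change
      that computes gpu_page_clear_sys once in xe_ttm_sys_mgr_init(), rather
      than re-evaluating it per allocation.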
      
      Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Matthew Brost <matthew.brost@intel.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Reviewed-by: Matthew Auld <matthew.auld@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20240816135154.19678-2-nirmoy.das@intel.com
      Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>