1. 22 Mar, 2023 6 commits
    • drm/amdgpu: add mes resume when do gfx post soft reset · 3234fac0
      Tong Liu01 authored
      [why]
      When GFX does a soft reset, MES is reset as well. If MES is not
      resumed during recovery from the soft reset, it cannot respond
      in the later sequence.

      [how]
      Resume MES in the GFX post soft reset step.
      Signed-off-by: Tong Liu01 <Tong.Liu01@amd.com>
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: skip ASIC reset for APUs when go to S4 · af1f2985
      Tim Huang authored
      For GC IP v11.0.4/11, the PSP TMR needs to be reserved
      for ASIC mode2 reset. But for S4, PSP suspend destroys
      the TMR, which causes the ASIC reset to fail.
      
      [  96.006101] amdgpu 0000:62:00.0: amdgpu: MODE2 reset
      [  100.409717] amdgpu 0000:62:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000011 SMN_C2PMSG_82:0x00000002
      [  100.411593] amdgpu 0000:62:00.0: amdgpu: Mode2 reset failed!
      [  100.412470] amdgpu 0000:62:00.0: PM: pci_pm_freeze(): amdgpu_pmops_freeze+0x0/0x50 [amdgpu] returns -62
      [  100.414020] amdgpu 0000:62:00.0: PM: dpm_run_callback(): pci_pm_freeze+0x0/0xd0 returns -62
      [  100.415311] amdgpu 0000:62:00.0: PM: pci_pm_freeze+0x0/0xd0 returned -62 after 4623202 usecs
      [  100.416608] amdgpu 0000:62:00.0: PM: failed to freeze async: error -62
      
      We can skip the reset on APUs, assuming we can resume them
      properly. Verified on some GFX11, GFX10 and old GFX9 APUs.
      Signed-off-by: Tim Huang <tim.huang@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
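The failure log in the commit above reports a return code of -62 after roughly 4.6 seconds. As a reading aid, here is a minimal Python sketch that pulls the callback name, return code, and elapsed time out of a "returned ... after ... usecs" line. The regular expression, the function name `parse_pm_failure`, and the one-entry errno table are illustrative assumptions, not part of any kernel tooling; the table hard-codes the Linux meaning of 62 (ETIME, "Timer expired") rather than relying on the host platform's errno values.

```python
import re

# Linux-specific errno name for the code seen in the log (assumption: Linux numbering).
LINUX_ERRNO_NAMES = {62: "ETIME"}

# Hypothetical pattern for PM core lines of the form:
#   [  100.415311] amdgpu 0000:62:00.0: PM: <callback> returned <rc> after <n> usecs
RETURNED_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<drv>\S+)\s+(?P<bdf>[0-9a-f:.]+):\s+PM:\s+"
    r"(?P<cb>\S+)\s+returned\s+(?P<rc>-?\d+)\s+after\s+(?P<usecs>\d+)\s+usecs$"
)


def parse_pm_failure(line):
    """Return a dict describing a PM callback result line, or None if it does not match."""
    m = RETURNED_RE.match(line.strip())
    if m is None:
        return None
    rc = int(m.group("rc"))
    return {
        "timestamp": float(m.group("ts")),
        "device": m.group("bdf"),
        "callback": m.group("cb"),
        "rc": rc,
        # Kernel callbacks return negative errno values, so look up the magnitude.
        "errno_name": LINUX_ERRNO_NAMES.get(-rc, "unknown"),
        "seconds": int(m.group("usecs")) / 1_000_000,
    }


SAMPLE = (
    "[  100.415311] amdgpu 0000:62:00.0: PM: pci_pm_freeze+0x0/0xd0 "
    "returned -62 after 4623202 usecs"
)
result = parse_pm_failure(SAMPLE)
```

Reading the code as ETIME matches the "SMU: I'm not done with your previous command" line earlier in the log: the reset request timed out rather than being rejected outright.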
    • drm/amdgpu: reposition the gpu reset checking for reuse · d24eae4d
      Tim Huang authored
      Move amdgpu_acpi_should_gpu_reset() out from under
      CONFIG_SUSPEND so that the hibernate case can share it.
      Signed-off-by: Tim Huang <tim.huang@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: drop the extra sign extension · aef98f2e
      Alex Deucher authored
      amdgpu_bo_gpu_offset_no_check() already calls
      amdgpu_gmc_sign_extend(), so there is no need to call it again.
      Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
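Dropping the second call is safe because sign extension is idempotent: applying it to an already extended address changes nothing. The Python sketch below illustrates this for a generic 48-bit address space. The constants and the function name `sign_extend_48` are assumptions chosen for illustration; they are not taken from the amdgpu source, and the driver's real helper may differ.

```python
MASK_64 = 0xFFFF_FFFF_FFFF_FFFF

# Assumed layout: 48-bit virtual addresses, so bit 47 is the sign bit.
SIGN_BIT_START = 0x0000_8000_0000_0000   # lowest address with bit 47 set
UPPER_BITS = 0xFFFF_0000_0000_0000       # bits 48..63, filled with copies of bit 47


def sign_extend_48(addr):
    """Extend bit 47 into bits 48..63 of a 64-bit address (illustrative helper)."""
    addr &= MASK_64
    if addr & SIGN_BIT_START:
        addr |= UPPER_BITS
    return addr


low = 0x0000_0000_1234_5000    # bit 47 clear: left unchanged
high = 0x0000_8000_0000_1000   # bit 47 set: upper 16 bits become ones

once = sign_extend_48(high)
twice = sign_extend_48(once)   # second application is a no-op
```

Since `twice == once`, a caller that sign-extends a value that was already extended only does redundant work, which is the cleanup the commit describes.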
    • Merge tag 'drm-habanalabs-next-2023-03-20' of... · d36d68fd
      Dave Airlie authored
      Merge tag 'drm-habanalabs-next-2023-03-20' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into drm-next
      
      This tag contains habanalabs driver and accel changes for v6.4:
      
      - uAPI changes:
      
        - Add opcodes to the CS ioctl to allow the user to stall/resume specific engines
          inside Gaudi2. This is to allow the user to perform power
          testing/measurements when training different topologies.
      
        - Expose in the INFO ioctl the amount of device memory that the driver
          and f/w reserve for themselves.
      
        - Expose in the INFO ioctl a bit-mask of the available rotator engines
          in Gaudi2. This is to align with other engines that are already exposed.
      
        - Expose in the INFO ioctl the register's address of the f/w that should
          be used to trigger interrupts from within the user's code running in the
          compute engines.
      
        - Add a critical-event bit in the eventfd bitmask so the user will know the
          event that was received was critical, and a reset will now occur.
      
        - Expose in the INFO ioctl two new opcodes to fetch information on h/w and
          f/w events. The events recorded are the events that were reported in the
          eventfd.
      
      - New features and improvements:
      
        - Add a dedicated MSI-X interrupt ID in the device for notification of
          an unexpected user-related event in Gaudi2. Handle it in the driver by
          reporting this event.
      
        - Allow the user to fetch the device memory current usage even when the
          device is undergoing compute-reset (a reset type that only clears the
          compute engines).
      
        - Enable graceful reset mechanism for compute-reset. This will give the
          user a few seconds before the device is reset. For example, the user can,
          during that time, perform certain device operations (dump data for debug)
          or close the device in an orderly fashion.
      
        - Align the decoder with the rest of the engines in regard to notification
          to the user about interrupts and in regard to performing graceful reset
          when needed (instead of immediate reset).
      
        - Add support for assert interrupt from the TPC engine.
      
        - Get the reset type that is necessary to perform per event from the
          auto-generated irq_map array.
      
        - Print the specific reason why a device is still in use when notifying
          the user about it (after the user closed the device's FD).
      
        - Move to threaded IRQ when handling interrupts of workload completions.
      
      - Firmware related fixes:
      
        - Fix RAZWI event handler to match newest f/w version.
      
        - Read error cause register in dma core events because the f/w doesn't
          do that.
      
        - Increase maximum time to wait for completion of Gaudi2 reset due to f/w
          bug.
      
        - Align to the latest firmware specs.
      
      - Enforce the release order of the compute device and dma-buf,
        i.e., increment the device file refcount for any dma-buf that was exported
        for that device. This will make sure the compute device release function
        won't be called until the user closes all the FDs of the relevant
        dma-bufs. Without this change, closing the device's FD before/without
        closing the dma-buf's FD would always lead to hard-reset of the device.
      
      - Fix a link in the drm documentation to correctly point to the accel section.
      
      - Compilation warnings cleanups
      
      - Misc bug fixes and code cleanups
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      # -----BEGIN PGP SIGNATURE-----
      #
      # iQEzBAABCgAdFiEE7TEboABC71LctBLFZR1NuKta54AFAmQYfcAACgkQZR1NuKta
      # 54DB4Af/SuiHZkVXwr+yHPv9El726rz9ZQD7mQtzNmehWGonwAvz15yqocNMUSbF
      # JbqE/vrZjvbXrP1Uv5UrlRVdnFHSPV18VnHU4BMS/WOm19SsR6vZ0QOXOoa6/AUb
      # w+kF3D//DbFI4/mTGfpH5/pzwu51ti8aVktosPFlHIa8iI8CB4/4IV+ivQ8UW4oK
      # HyDRkIvHdRmER7vGOfhwhsr4zdqSlJBYrv3C3Z1dkSYBPW/5ICbiM1UlKycwdYKI
      # cajQBSdUQwUCWnI+i8RmSy3kjNO6OE4XRUvTv89F2bQeyK/1rJLG2m2xZR/Ml/o5
      # 7Cgvbn0hWZyeqe7OObYiBlSOBSehCA==
      # =wclm
      # -----END PGP SIGNATURE-----
      # gpg: Signature made Tue 21 Mar 2023 01:37:36 AEST
      # gpg:                using RSA key ED311BA00042EF52DCB412C5651D4DB8AB5AE780
      # gpg: Can't check signature: No public key
      From: Oded Gabbay <ogabbay@kernel.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230320154026.GA766126@ogabbay-vm-u20.habana-labs.com
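The release-order item in the merge message above (each exported dma-buf holds a reference on the device file) can be modeled with a small reference-counting sketch. This is a toy Python model under stated assumptions: the class names `ComputeDevice` and `ExportedDmaBuf` and their methods are hypothetical and do not correspond to the habanalabs driver's structures. It only shows why the device release cannot run until every dma-buf FD is closed.

```python
class ComputeDevice:
    """Toy stand-in for a device file with a reference count (hypothetical)."""

    def __init__(self):
        self.file_refcount = 1   # reference held by the user's open device FD
        self.released = False

    def get(self):
        self.file_refcount += 1

    def put(self):
        self.file_refcount -= 1
        if self.file_refcount == 0:
            # Release runs only when the last reference goes away.
            self.released = True


class ExportedDmaBuf:
    """Toy stand-in for a dma-buf exported for a device (hypothetical)."""

    def __init__(self, device):
        self.device = device
        self.closed = False
        device.get()             # exporting pins the device file

    def close(self):
        if not self.closed:
            self.closed = True
            self.device.put()    # dropping the dma-buf unpins the device file


def close_device_before_dmabuf():
    """Close the device FD first, then the dma-buf FD; record release state after each."""
    dev = ComputeDevice()
    buf = ExportedDmaBuf(dev)
    dev.put()                    # user closes the device FD
    released_after_device_close = dev.released
    buf.close()                  # user closes the dma-buf FD
    return released_after_device_close, dev.released
```

In this model, closing the device FD first leaves the device alive until the dma-buf is closed, which is the ordering the merge message says the change enforces.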
    • Merge tag 'drm-intel-gt-next-2023-03-16' of... · d240daa2
      Dave Airlie authored
      Merge tag 'drm-intel-gt-next-2023-03-16' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
      
      Driver Changes:
      
      - Fix issue #6333: "list_add corruption" and full system lockup from
        performance monitoring (Janusz)
      - Give the punit time to settle before fatally failing (Aravind, Chris)
      - Don't use stolen memory or BAR for ring buffers on LLC platforms (John)
      - Add missing ecodes and correct timeline seqno on GuC error captures (John)
      - Make sure DSM size has correct 1MiB granularity on Gen12+ (Nirmoy,
        Lucas)
      - Fix potential SSEU max_subslices array-index-out-of-bounds access on Gen11 (Andrea)
      - Whitelist COMMON_SLICE_CHICKEN3 for UMD access on Gen12+ (Matt R.)
      - Apply Wa_1408615072/Wa_1407596294 correctly on Gen11 (Matt R)
      - Apply LNCF/LBCF workarounds correctly on XeHP SDV/PVC/DG2 (Matt R)
      - Implement Wa_1606376872 for Xe_LP (Gustavo)
      - Consider GSI offset when doing MCR lookups on Meteorlake+ (Matt R.)
      - Add engine TLB invalidation for Meteorlake (Matt R.)
      - Fix GSC Driver-FLR completion on Meteorlake (Alan)
      - Fix GSC races on driver load/unload on Meteorlake+ (Daniele)
      - Disable MC6 for MTL A step (Badal)
      
      - Consolidate TLB invalidation flow (Tvrtko)
      - Improve GuC/HuC debug messages (Michal Wa., John)
      - Move fd_install after last use of fence (Rob)
      - Initialize the obj flags for shmem objects (Aravind)
      - Fix missing debug object activation (Nirmoy)
      - Probe lmem before the stolen portion (Matt A)
      - Improve clean up of GuC busyness stats worker (John)
      - Fix missing return code checks in GuC submission init (John)
      - Annotate two more workaround/tuning registers as MCR on PVC (Matt R)
      - Fix GEN8_MISCCPCTL definition and remove unused INF_UNIT_LEVEL_CLKGATE (Lucas)
      - Use sysfs_emit() and sysfs_emit_at() (Nirmoy)
      - Make kobj_type structures constant (Thomas W.)
      - Make kobj attributes const on gt/ (Jani)
      - Remove the unused virtualized start hack on buddy allocator (Matt A)
      - Remove redundant check for DG1 (Lucas)
      - Move DG2 tuning to the right function (Lucas)
      - Rename dev_priv to i915 for private data naming consistency in gt/ (Andi)
      - Remove unnecessary whitelisting of CS_CTX_TIMESTAMP on Xe_HP platforms (Matt R.)
      
      - Escape wildcard in method names in kerneldoc (Bagas)
      - Selftest improvements (Chris, Jonathan, Tvrtko, Anshuman, Tejas)
      - Fix sparse warnings (Jani)
      
      [airlied: fix unused variable in intel_workarounds]
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/ZBMSb42yjjzczRhj@jlahtine-mobl.ger.corp.intel.com
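One item in the merge message above concerns keeping the DSM size at 1MiB granularity. The message does not say how the driver enforces this, so the following is only a generic Python sketch of power-of-two alignment arithmetic; the helper names `align_down` and `is_aligned` and the sample value are illustrative assumptions.

```python
SZ_1M = 1 << 20  # 1 MiB


def align_down(value, granularity):
    """Round value down to a multiple of granularity (must be a power of two)."""
    if granularity <= 0 or granularity & (granularity - 1):
        raise ValueError("granularity must be a power of two")
    return value & ~(granularity - 1)


def is_aligned(value, granularity):
    """True when value is an exact multiple of granularity."""
    return align_down(value, granularity) == value


# A size whose low bits carry stray data (hypothetical example value).
raw_size = (64 << 20) | 0x7
clean_size = align_down(raw_size, SZ_1M)
```

Masking off the low 20 bits yields a size that is an exact multiple of 1 MiB, which is the property the changelog entry describes.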
  2. 21 Mar, 2023 1 commit
  3. 20 Mar, 2023 22 commits
  4. 16 Mar, 2023 3 commits
  5. 15 Mar, 2023 8 commits