Commits · a499b68cce3c535531432c805682f4350a90f150 · Kirill Smelkov / linux

22 Jan, 2024 34 commits

drm/amd/display: Promote DAL to 3.2.269 · a499b68c

Aric Cyr authored Jan 15, 2024

- FW Release 0.0.201.0
- Fix resizing video window for dcn321
- Fix timing bandwidth calculation for HDMI
- Fix null-deref in dml2 assigned pipe search
- Add GART memory support for dmcub
- Add power_state and pme_pending flag
- Add usb4_bw_alloc_support flag
- Revert "Rework DC Z10 restore
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a499b68c

drm/amd/display: [FW Promotion] Release 0.0.201.0 · a125206c

Anthony Koo authored Jan 13, 2024

 - Add debug flag for Replay IPS visual confirm
 - Remove unused debug flags that should not
   be controlled inside Replay FSM
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Anthony Koo <anthony.koo@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a125206c

drm/amd/display: Replay + IPS + ABM in Full Screen VPB · f980579c

ChunTao Tso authored Jan 08, 2024

[Why]
Because ABM will wait VStart to start getting histogram data,
 it will cause we can't enter IPS while full screnn video playing.

[How]
Modify the panel refresh rate to the maximun multiple of current
 refresh rate.
Reviewed-by: Dennis Chan <dennis.chan@amd.com>
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: ChunTao Tso <chuntao.tso@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f980579c

drm/amd/display: turn off windowed Mpo ODM feature for dcn321 · 4516a793

Wenjing Liu authored Jan 12, 2024

[why]
It has been found a regression caused by enabling this feature during ODM to
MPC combine switch when user is resizing video window. The transition is
only needed when the feature is enabled. During the transition driver will
temporary switch to use max dppclk level through SMU set hard min interface.
The interface times out and fail to configure the max dpp clock level, which caused
system issue as the desired clock can't be set. We will continue investigating
the issue and root cause the issue where max dppclk level can't be reached.
But for now we have to disable this feature as this feature will cause us to hit this
problem in common use cases during video playback unfortunately. The issue
is dcn321 specific so it won't impact other dcn revisions.
Reviewed-by: Martin Leung <martin.leung@amd.com>
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4516a793

drm/amd/display: Add GART memory support for dmcub · 624e0d7f

Fudongwang authored Dec 19, 2023

[Why]
In dump file, GART memory can be accessed while frame buffer cannot.

[How]
Add GART memory support for dmcub.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Fudongwang <fudong.wang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

624e0d7f

drm/amd/display: Revert "Rework DC Z10 restore" · 8457bddc

Charlene Liu authored Jan 11, 2024

This reverts commit e6f82bd4.

It caused intermittent hangs when enabling IPS on static screen.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Charlene Liu <charlene.liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

8457bddc

drm/amd/display: add power_state and pme_pending flag · 2a8e918f

Muhammad Ahmed authored Dec 06, 2023

[what]
Adding power_state to dc.h and pme_pending flag to clk_mgr_internal.h
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Muhammad Ahmed <ahmed.ahmed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2a8e918f

drm/amd/display: Add IPS checks before dcn register access · 60818ed7

Roman Li authored Jan 09, 2024

[Why]
With IPS enabled a system hangs once PSR is active.
PSR active triggers transition to IPS2 state.
While in IPS2 an access to dcn registers results in hard hang.
Existing check doesn't cover for PSR sequence.

[How]
Safeguard register access by disabling idle optimization in atomic commit
and crtc scanout. It will be re-enabled on next vblank.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

60818ed7

drm/amd/display: Add NULL-checks in dml2 assigned pipe search · b8f22348

Allen Pan authored Jan 09, 2024

[Why]
NULL-deref regression after:
"drm/amd/display: Fix dml2 assigned pipe search"

[How]
Add verification for potential NULLs

Fixes: d451b534 ("drm/amd/display: Fix dml2 assigned pipe search")
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Gabe Teeger <gabe.teeger@amd.com>
Signed-off-by: Allen Pan <allen.pan@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b8f22348

drm/amd/display: Add usb4_bw_alloc_support flag · 855f42ba

Peichen Huang authored Jan 09, 2024

[Why]
dc should have a flag for DM to enable usb4_bw_alloc in dptx

[How]
- Add usb4_bw_alloc_support flag in dc_config
Reviewed-by: Wayne Lin <wayne.lin@amd.com>
Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Peichen Huang <peichen.huang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

855f42ba

drm/amd/display: Promote DAL to 3.2.268 · 9feaa4c0

Aric Cyr authored Jan 08, 2024

Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

9feaa4c0

drm/amd/display: create DCN3-specific log for MPC state · 63484694

Melissa Wen authored Nov 28, 2023

Logging DCN3 MPC state was following DCN1 implementation that doesn't
consider new DCN3 MPC color blocks. Create new elements according to
DCN3 MPC color caps and a new DCN3-specific function for reading MPC
data.

v3:
- remove gamut remap reg reading in favor of fixed31_32 matrix data
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

63484694

drm/amd/display: Fix timing bandwidth calculation for HDMI · c597479f

Leo (Hanghong) Ma authored Jan 04, 2024

[Why && How]
The current bandwidth calculation for timing doesn't account for
certain HDMI modes overhead which leads to DSC can't be enabled.
Add support to calculate the actual bandwidth for these HDMI modes.
Reviewed-by: Chris Park <chris.park@amd.com>
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Leo (Hanghong) Ma <hanghong.ma@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c597479f

drm/amd/display: add get_gamut_remap helper for MPC3 · aa708057

Melissa Wen authored Nov 28, 2023

We want to be able to read the MPC's gamut remap matrix similar to
what we do with .dpp_get_gamut_remap functions. On the other hand, we
don't need a hook here because only DCN3+ has the MPC gamut remap
block, being absent in previous families.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

aa708057

drm/amd/display: fill up DCN3 DPP color state · e808825c

Melissa Wen authored Nov 28, 2023

DCN3 DPP color state was uncollected and some state elements from DCN1
doesn't fit DCN3. Create new elements according to DCN3 color caps and
fill them up for DTN log output.

rfc-v2:
- fix reading of gamcor and blnd gamma states
- remove gamut remap register in favor of gamut remap matrix reading
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e808825c

drm/amd/display: read gamut remap matrix in fixed-point 31.32 format · f2640756

Melissa Wen authored Nov 28, 2023

Instead of read gamut remap data from hw values, convert HW register
values (S2D13) into a fixed-point 31.32 matrix for color state log.
Change DCN10 log to print data in the format of the gamut remap matrix.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f2640756

drm/amd/display: Add dpp_get_gamut_remap functions · 07b2483e

Harry Wentland authored Nov 28, 2023

We want to be able to read the DPP's gamut remap matrix.

v2:
- code-style and doc comments clean-up (Melissa)
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

07b2483e

drm/amd/display: decouple color state from hw state log · efbfc987

Melissa Wen authored Nov 28, 2023

Prepare to hook up color state log according to the DCN version.

v3:
- put functions in single line (Siqueira)
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

efbfc987

drm/amd/display: Fix uninitialized variable usage in core_link_ 'read_dpcd() &... · 3295580d

Srinivasan Shanmugam authored Jan 17, 2024

drm/amd/display: Fix uninitialized variable usage in core_link_ 'read_dpcd() & write_dpcd()' functions

The 'status' variable in 'core_link_read_dpcd()' &
'core_link_write_dpcd()' was uninitialized.

Thus, initializing 'status' variable to 'DC_ERROR_UNEXPECTED' by default.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dpcd.c:226 core_link_read_dpcd() error: uninitialized symbol 'status'.
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dpcd.c:248 core_link_write_dpcd() error: uninitialized symbol 'status'.

Cc: stable@vger.kernel.org
Cc: Jerry Zuo <jerry.zuo@amd.com>
Cc: Jun Lei <Jun.Lei@amd.com>
Cc: Wayne Lin <Wayne.Lin@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3295580d

drm/amdgpu: update check condition of query for ras page retire · 1757bb7d

Tao Zhou authored Jan 18, 2024

Support page retirement handling in debug mode.

v2: revert smu_v13_0_6_get_ecc_info directly.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1757bb7d

Revert "drm/amd/pm: smu v13_0_6 supports ecc info by default" · 18d71047

Tao Zhou authored Jan 18, 2024

This reverts commit 6fe08f56.
We use debug mode flag instead of this interface.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

18d71047

drm/amd/display: Drop kdoc markers for some Panel Replay functions · e8cc57a9

Srinivasan Shanmugam authored Jan 18, 2024

Fixes the below gcc with W=1:
drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dmub_replay.c:262: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Set REPLAY power optimization flags and coasting vtotal.
drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dmub_replay.c:284: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * send Replay general cmd to DMUB.

Fixes: e379787c ("drm/amd/display: Add some functions for Panel Replay")
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Cc: Leo Li <sunpeng.li@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e8cc57a9

drm/amdgpu: Cleanup inconsistent indenting in 'amdgpu_gfx_enable_kcq()' · be91a828

Srinivasan Shanmugam authored Jan 18, 2024

Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:645 amdgpu_gfx_enable_kcq() warn: inconsistent indenting

Cc: Le Ma <Le.Ma@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

be91a828

drm/amd/pm: udpate smu v13.0.6 message permission · 0d50f404

Yang Wang authored Jan 19, 2024

update smu v13.0.6 message to allow guest driver set gfx clock.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0d50f404

drm/amdgpu:Support retiring multiple MCA error address pages · 0795b5d2

YiPeng Chai authored Jan 15, 2024

Support retiring multiple MCA error address pages in
one in-band query for umc v12_0.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0795b5d2

drm/amdgpu: add interface to check mca umc status · afb617f3

YiPeng Chai authored Jan 15, 2024

Add interface to check mca umc status.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

afb617f3

drm/amdgpu: Use asynchronous polling to handle umc_v12_0 poisoning · 6c23f3d1

YiPeng Chai authored Jan 15, 2024

Use asynchronous polling to handle umc_v12_0 poisoning.

v2:
  1. Change function name.
  2. Change the debugging information content.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

6c23f3d1

drm/amdgpu: Fix ras features value calltrace · ee9c3031

Stanley.Yang authored Jan 17, 2024

The high three bits of ras features mask indicate socket
id, it should skip to check high three bits of ras features
mask before disable all ras features.
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ee9c3031

drm/amdgpu: Prepare for asynchronous processing of umc page retirement · 3fdcd0a3

YiPeng Chai authored Jan 18, 2024

Preparing for asynchronous processing of umc page retirement.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3fdcd0a3

drm/amdgpu: Add log info for umc_v12_0 · 22f6e3e1

YiPeng Chai authored Jan 15, 2024

Add log info for umc_v12_0.

v2:
 Delete redundant logs.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

22f6e3e1

drm/amdgpu: fix wrong sizeof argument · 3e221746

Samasth Norway Ananda authored Jan 17, 2024

voltage_parameters is a point to a struct of type
SET_VOLTAGE_PARAMETERS_V1_3. Passing just voltage_parameters would
not print the right size of the struct variable. So we need to pass
*voltage_parameters to sizeof().

Fixes: 4630d503 ("drm/amdgpu: check PS, WS index")
Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3e221746

drm/amdgpu: Enable seq64 manager and fix bugs · 00a11f97

Arunpravin Paneer Selvam authored Jan 11, 2024

- Enable the seq64 mapping sequence.
- Fix wflinfo va conflict and other bugs.

v1:
  - The seq64 area needs to be included in the AMDGPU_VA_RESERVED_SIZE
    otherwise the areas will conflict with user space allocations (Alex)

  - It needs to be mapped read only in the user VM (Alex)

v2:
  - Instead of just one define for TOP/BOTTOM
    reserved space separate them into two (Christian)

  - Fix the CPU and VA calculations and while at it
    also cleanup error handling and kerneldoc (Christian)
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

00a11f97

drm/radeon/ni_dpm: remove redundant NULL check · 059e7c6b

Nikita Zhandarovich authored Jan 17, 2024

'leakage_table' will always be successfully initialized as a pointer
to '&rdev->pm.dpm.dyn_state.cac_leakage_table'.

Remove unnecessary check if only to silence static checkers.

Found by Linux Verification Center (linuxtesting.org) with static
analysis tool Svace.

Fixes: 69e0b57a ("drm/radeon/kms: add dpm support for cayman (v5)")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

059e7c6b

drm/radeon: remove dead code in ni_mc_load_microcode() · bf38a4e4

Nikita Zhandarovich authored Jan 17, 2024

Inside the if block with (running == 0), the checks for 'running'
possibly being non-zero are redundant. Remove them altogether.

This change is similar to the one authored by Heinrich Schuchardt
<xypron.glpk@gmx.de> in commit
ddbbd3be ("drm/radeon: remove dead code, si_mc_load_microcode (v2)")

Found by Linux Verification Center (linuxtesting.org) with static
analysis tool Svace.

Fixes: 0af62b01 ("drm/radeon/kms: add ucode loader for NI")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

bf38a4e4

18 Jan, 2024 6 commits

drm/amd/pm: enable amdgpu smu send message log · 0cd2bc06

Yang Wang authored Apr 26, 2023

v1:
enable amdgpu smu driver message log.

v2:
add smu/pmfw response value into debug log.
Signed-off-by: Yang Wang <KevinYang.Wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0cd2bc06

drm/amdgpu: update error condition check for umc_v12_0_query_error_address · a9e4f61d

Tao Zhou authored Jan 17, 2024

Deferred error is also taken into account.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a9e4f61d

drm/amdgpu: Skip do PCI error slot reset during RAS recovery · 601429cc

Stanley.Yang authored Jan 10, 2024

Why:
    The PCI error slot reset maybe triggered after inject ue to UMC multi times, this
    caused system hang.
    [  557.371857] amdgpu 0000:af:00.0: amdgpu: GPU reset succeeded, trying to resume
    [  557.373718] [drm] PCIE GART of 512M enabled.
    [  557.373722] [drm] PTB located at 0x0000031FED700000
    [  557.373788] [drm] VRAM is lost due to GPU reset!
    [  557.373789] [drm] PSP is resuming...
    [  557.547012] mlx5_core 0000:55:00.0: mlx5_pci_err_detected Device state = 1 pci_status: 0. Exit, result = 3, need reset
    [  557.547067] [drm] PCI error: detected callback, state(1)!!
    [  557.547069] [drm] No support for XGMI hive yet...
    [  557.548125] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 0. Enter
    [  557.607763] mlx5_core 0000:55:00.0: wait vital counter value 0x16b5b after 1 iterations
    [  557.607777] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 1. Exit, err = 0, result = 5, recovered
    [  557.610492] [drm] PCI error: slot reset callback!!
    ...
    [  560.689382] amdgpu 0000:3f:00.0: amdgpu: GPU reset(2) succeeded!
    [  560.689546] amdgpu 0000:5a:00.0: amdgpu: GPU reset(2) succeeded!
    [  560.689562] general protection fault, probably for non-canonical address 0x5f080b54534f611f: 0000 [#1] SMP NOPTI
    [  560.701008] CPU: 16 PID: 2361 Comm: kworker/u448:9 Tainted: G           OE     5.15.0-91-generic #101-Ubuntu
    [  560.712057] Hardware name: Microsoft C278A/C278A, BIOS C2789.5.BS.1C11.AG.1 11/08/2023
    [  560.720959] Workqueue: amdgpu-reset-hive amdgpu_ras_do_recovery [amdgpu]
    [  560.728887] RIP: 0010:amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu]
    [  560.736891] Code: ff 41 89 c6 e9 1b ff ff ff 44 0f b6 45 b0 e9 4f ff ff ff be 01 00 00 00 4c 89 e7 e8 76 c9 8b ff 44 0f b6 45 b0 e9 3c fd ff ff <48> 83 ba 18 02 00 00 00 0f 84 6a f8 ff ff 48 8d 7a 78 be 01 00 00
    [  560.757967] RSP: 0018:ffa0000032e53d80 EFLAGS: 00010202
    [  560.763848] RAX: ffa00000001dfd10 RBX: ffa0000000197090 RCX: ffa0000032e53db0
    [  560.771856] RDX: 5f080b54534f5f07 RSI: 0000000000000000 RDI: ff11000128100010
    [  560.779867] RBP: ffa0000032e53df0 R08: 0000000000000000 R09: ffffffffffe77f08
    [  560.787879] R10: 0000000000ffff0a R11: 0000000000000001 R12: 0000000000000000
    [  560.795889] R13: ffa0000032e53e00 R14: 0000000000000000 R15: 0000000000000000
    [  560.803889] FS:  0000000000000000(0000) GS:ff11007e7e800000(0000) knlGS:0000000000000000
    [  560.812973] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  560.819422] CR2: 000055a04c118e68 CR3: 0000000007410005 CR4: 0000000000771ee0
    [  560.827433] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  560.835433] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
    [  560.843444] PKRU: 55555554
    [  560.846480] Call Trace:
    [  560.849225]  <TASK>
    [  560.851580]  ? show_trace_log_lvl+0x1d6/0x2ea
    [  560.856488]  ? show_trace_log_lvl+0x1d6/0x2ea
    [  560.861379]  ? amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu]
    [  560.867778]  ? show_regs.part.0+0x23/0x29
    [  560.872293]  ? __die_body.cold+0x8/0xd
    [  560.876502]  ? die_addr+0x3e/0x60
    [  560.880238]  ? exc_general_protection+0x1c5/0x410
    [  560.885532]  ? asm_exc_general_protection+0x27/0x30
    [  560.891025]  ? amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu]
    [  560.898323]  amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu]
    [  560.904520]  process_one_work+0x228/0x3d0
How:
    In RAS recovery, mode-1 reset is issued from RAS fatal error handling and expected
    all the nodes in a hive to be reset. no need to issue another mode-1 during this procedure.
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

601429cc

drm/amdgpu: Show deferred error count for UMC · 2c7a1560

Stanley.Yang authored Jan 17, 2024

Show deferred error count for UMC syfs node
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2c7a1560

drm/amdgpu: Enable GFXOFF for Compute on GFX11 · 776b0953

Ori Messinger authored Nov 22, 2023

On GFX version 11, GFXOFF was disabled due to a MES KIQ firmware
issue, which has since been fixed after version 64.
This patch only re-enables GFXOFF for GFX version 11 if the GPU's
MES KIQ firmware version is newer than version 64.

V2: Keep GFXOFF disabled on GFX11 if MES KIQ is below version 64.
V3: Add parentheses to avoid GCC warning for parentheses:
"suggest parentheses around comparison in operand of ‘&’"
V4: Remove "V3" from commit title
V5: Change commit description and insert 'Acked-by'
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

776b0953

drm/amdgpu: fix UBSAN array-index-out-of-bounds for ras_block_string[] · 7ed97155

Yang Wang authored Jan 16, 2024

fix array index out of bounds issue for ras_block_string[] array.

Fixes: 30df05fb ("drm/amdgpu: Align ras block enum with firmware")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

7ed97155