- 22 Jan, 2024 15 commits
-
-
Tao Zhou authored
Support page retirement handling in debug mode. v2: revert smu_v13_0_6_get_ecc_info directly. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Tao Zhou authored
This reverts commit 6fe08f56. We use debug mode flag instead of this interface. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Srinivasan Shanmugam authored
Fixes the below gcc warnings with W=1: drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dmub_replay.c:262: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Set REPLAY power optimization flags and coasting vtotal. drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dmub_replay.c:284: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * send Replay general cmd to DMUB. Fixes: e379787c ("drm/amd/display: Add some functions for Panel Replay") Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Cc: Leo Li <sunpeng.li@amd.com> Cc: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
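The warning above fires when a comment opens with `/**` but does not follow the kernel-doc layout. A minimal sketch of the two comment forms follows; the function `example_set_coasting_vtotal()` and its parameter are hypothetical names chosen for illustration and are not taken from dmub_replay.c.

```c
#include <stdint.h>

/* Plain comment: opens with a single asterisk, so kernel-doc skips it. */
static uint32_t example_last_vtotal;

/**
 * example_set_coasting_vtotal() - Record a coasting vtotal value.
 * @vtotal: value to record (hypothetical parameter, for illustration only).
 *
 * A comment that opens with two asterisks is expected to have this shape:
 * a name line, one line per parameter, then a free-form description.
 *
 * Return: the previously recorded value.
 */
uint32_t example_set_coasting_vtotal(uint32_t vtotal)
{
	uint32_t prev = example_last_vtotal;

	example_last_vtotal = vtotal;
	return prev;
}
```

Either form silences the warning: complete the kernel-doc block, or demote the opener to `/*` when the comment is not meant as API documentation.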
-
Srinivasan Shanmugam authored
Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:645 amdgpu_gfx_enable_kcq() warn: inconsistent indenting Cc: Le Ma <Le.Ma@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Yang Wang authored
Update the smu v13.0.6 message to allow the guest driver to set the gfx clock. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
YiPeng Chai authored
Support retiring multiple MCA error address pages in one in-band query for umc v12_0. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
YiPeng Chai authored
Add interface to check mca umc status. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
YiPeng Chai authored
Use asynchronous polling to handle umc_v12_0 poisoning. v2: 1. Change function name. 2. Change the debugging information content. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Stanley.Yang authored
The high three bits of the RAS features mask indicate the socket ID, so the check of the mask should skip those three bits before disabling all RAS features. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
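The masking described above can be sketched as follows. This is an illustration only: it assumes a 32-bit mask with the socket ID in bits 31:29, and every macro and function name here is hypothetical rather than taken from the amdgpu driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 32-bit mask, socket ID carried in the top three bits. */
#define EXAMPLE_SOCKET_ID_SHIFT 29u
#define EXAMPLE_SOCKET_ID_MASK  (0x7u << EXAMPLE_SOCKET_ID_SHIFT)

/* Feature bits only, with the socket ID field cleared. */
uint32_t example_ras_feature_bits(uint32_t mask)
{
	return mask & ~EXAMPLE_SOCKET_ID_MASK;
}

/* Socket ID extracted from the top three bits. */
uint32_t example_ras_socket_id(uint32_t mask)
{
	return (mask & EXAMPLE_SOCKET_ID_MASK) >> EXAMPLE_SOCKET_ID_SHIFT;
}

/* True only if a real feature bit is set; a non-zero socket ID alone
 * must not make the mask look like it has features enabled. */
bool example_ras_any_feature_enabled(uint32_t mask)
{
	return example_ras_feature_bits(mask) != 0;
}
```

With this split, a mask such as `0xA0000000` reports socket 5 and no enabled features, whereas testing the raw mask against zero would wrongly treat it as having features set.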
-
YiPeng Chai authored
Preparing for asynchronous processing of umc page retirement. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
YiPeng Chai authored
Add log info for umc_v12_0. v2: Delete redundant logs. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Samasth Norway Ananda authored
voltage_parameters is a pointer to a struct of type SET_VOLTAGE_PARAMETERS_V1_3. Passing just voltage_parameters to sizeof() yields the size of the pointer, not the size of the struct. So we need to pass *voltage_parameters to sizeof(). Fixes: 4630d503 ("drm/amdgpu: check PS, WS index") Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
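The difference between the two sizeof() operands can be shown in isolation. The struct below is a hypothetical stand-in and does not reproduce the real SET_VOLTAGE_PARAMETERS_V1_3 layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in struct, for illustration only. */
struct example_voltage_params {
	uint8_t  type;
	uint8_t  mode;
	uint16_t level;
	uint32_t reserved[4];
};

/* sizeof applied to the pointer: the size of a pointer on this platform. */
size_t example_size_of_pointer(const struct example_voltage_params *p)
{
	return sizeof(p);
}

/* sizeof applied to the pointee: the size of the struct. The operand of
 * sizeof is not evaluated here, so this is safe even for a NULL pointer. */
size_t example_size_of_struct(const struct example_voltage_params *p)
{
	return sizeof(*p);
}
```

Any size-based bounds check that uses the first form compares against the pointer width instead of the structure size, which is the class of bug the commit addresses.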
-
Arunpravin Paneer Selvam authored
- Enable the seq64 mapping sequence. - Fix wflinfo va conflict and other bugs. v1: - The seq64 area needs to be included in the AMDGPU_VA_RESERVED_SIZE otherwise the areas will conflict with user space allocations (Alex) - It needs to be mapped read only in the user VM (Alex) v2: - Instead of just one define for TOP/BOTTOM reserved space separate them into two (Christian) - Fix the CPU and VA calculations and while at it also cleanup error handling and kerneldoc (Christian) Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>
-
Nikita Zhandarovich authored
'leakage_table' will always be successfully initialized as a pointer to '&rdev->pm.dpm.dyn_state.cac_leakage_table'. Remove unnecessary check if only to silence static checkers. Found by Linux Verification Center (linuxtesting.org) with static analysis tool Svace. Fixes: 69e0b57a ("drm/radeon/kms: add dpm support for cayman (v5)") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Nikita Zhandarovich authored
Inside the if block with (running == 0), the checks for 'running' possibly being non-zero are redundant. Remove them altogether. This change is similar to the one authored by Heinrich Schuchardt <xypron.glpk@gmx.de> in commit ddbbd3be ("drm/radeon: remove dead code, si_mc_load_microcode (v2)") Found by Linux Verification Center (linuxtesting.org) with static analysis tool Svace. Fixes: 0af62b01 ("drm/radeon/kms: add ucode loader for NI") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 18 Jan, 2024 25 commits
-
-
Yang Wang authored
v1: enable amdgpu smu driver message log. v2: add smu/pmfw response value into debug log. Signed-off-by: Yang Wang <KevinYang.Wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Tao Zhou authored
Deferred error is also taken into account. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Stanley.Yang authored
Why: The PCI error slot reset may be triggered after injecting uncorrectable errors (UE) into the UMC multiple times, which caused a system hang.
[ 557.371857] amdgpu 0000:af:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 557.373718] [drm] PCIE GART of 512M enabled.
[ 557.373722] [drm] PTB located at 0x0000031FED700000
[ 557.373788] [drm] VRAM is lost due to GPU reset!
[ 557.373789] [drm] PSP is resuming...
[ 557.547012] mlx5_core 0000:55:00.0: mlx5_pci_err_detected Device state = 1 pci_status: 0. Exit, result = 3, need reset
[ 557.547067] [drm] PCI error: detected callback, state(1)!!
[ 557.547069] [drm] No support for XGMI hive yet...
[ 557.548125] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 0. Enter
[ 557.607763] mlx5_core 0000:55:00.0: wait vital counter value 0x16b5b after 1 iterations
[ 557.607777] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 1. Exit, err = 0, result = 5, recovered
[ 557.610492] [drm] PCI error: slot reset callback!!
...
[ 560.689382] amdgpu 0000:3f:00.0: amdgpu: GPU reset(2) succeeded!
[ 560.689546] amdgpu 0000:5a:00.0: amdgpu: GPU reset(2) succeeded!
[ 560.689562] general protection fault, probably for non-canonical address 0x5f080b54534f611f: 0000 [#1] SMP NOPTI
[ 560.701008] CPU: 16 PID: 2361 Comm: kworker/u448:9 Tainted: G OE 5.15.0-91-generic #101-Ubuntu
[ 560.712057] Hardware name: Microsoft C278A/C278A, BIOS C2789.5.BS.1C11.AG.1 11/08/2023
[ 560.720959] Workqueue: amdgpu-reset-hive amdgpu_ras_do_recovery [amdgpu]
[ 560.728887] RIP: 0010:amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu]
[ 560.736891] Code: ff 41 89 c6 e9 1b ff ff ff 44 0f b6 45 b0 e9 4f ff ff ff be 01 00 00 00 4c 89 e7 e8 76 c9 8b ff 44 0f b6 45 b0 e9 3c fd ff ff <48> 83 ba 18 02 00 00 00 0f 84 6a f8 ff ff 48 8d 7a 78 be 01 00 00
[ 560.757967] RSP: 0018:ffa0000032e53d80 EFLAGS: 00010202
[ 560.763848] RAX: ffa00000001dfd10 RBX: ffa0000000197090 RCX: ffa0000032e53db0
[ 560.771856] RDX: 5f080b54534f5f07 RSI: 0000000000000000 RDI: ff11000128100010
[ 560.779867] RBP: ffa0000032e53df0 R08: 0000000000000000 R09: ffffffffffe77f08
[ 560.787879] R10: 0000000000ffff0a R11: 0000000000000001 R12: 0000000000000000
[ 560.795889] R13: ffa0000032e53e00 R14: 0000000000000000 R15: 0000000000000000
[ 560.803889] FS: 0000000000000000(0000) GS:ff11007e7e800000(0000) knlGS:0000000000000000
[ 560.812973] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 560.819422] CR2: 000055a04c118e68 CR3: 0000000007410005 CR4: 0000000000771ee0
[ 560.827433] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 560.835433] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 560.843444] PKRU: 55555554
[ 560.846480] Call Trace:
[ 560.849225] <TASK>
[ 560.851580] ? show_trace_log_lvl+0x1d6/0x2ea
[ 560.856488] ? show_trace_log_lvl+0x1d6/0x2ea
[ 560.861379] ? amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu]
[ 560.867778] ? show_regs.part.0+0x23/0x29
[ 560.872293] ? __die_body.cold+0x8/0xd
[ 560.876502] ? die_addr+0x3e/0x60
[ 560.880238] ? exc_general_protection+0x1c5/0x410
[ 560.885532] ? asm_exc_general_protection+0x27/0x30
[ 560.891025] ? amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu]
[ 560.898323] amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu]
[ 560.904520] process_one_work+0x228/0x3d0
How: In RAS recovery, a mode-1 reset is issued from RAS fatal error handling and is expected to reset all the nodes in a hive, so there is no need to issue another mode-1 reset during this procedure. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Stanley.Yang authored
Show deferred error count for the UMC sysfs node. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Ori Messinger authored
On GFX version 11, GFXOFF was disabled due to a MES KIQ firmware issue, which has since been fixed after version 64. This patch only re-enables GFXOFF for GFX version 11 if the GPU's MES KIQ firmware version is newer than version 64. V2: Keep GFXOFF disabled on GFX11 if MES KIQ is below version 64. V3: Add parentheses to avoid GCC warning for parentheses: "suggest parentheses around comparison in operand of ‘&’" V4: Remove "V3" from commit title V5: Change commit description and insert 'Acked-by' Signed-off-by: Ori Messinger <Ori.Messinger@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
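The GCC warning quoted in V3 exists because `&` binds less tightly than comparison operators in C. A minimal sketch of the pitfall follows; the mask value and function names are hypothetical, and the minimum version is left as a parameter because the exact threshold comparison is not spelled out here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask selecting a firmware version field. */
#define EXAMPLE_FW_VERSION_MASK 0x0000ffffu

/* How `raw & EXAMPLE_FW_VERSION_MASK >= min` is parsed when the
 * parentheses are left out: the comparison runs first and yields 0 or 1,
 * which is then ANDed with the raw value. */
bool example_version_check_misparsed(uint32_t raw, uint32_t min)
{
	return raw & (EXAMPLE_FW_VERSION_MASK >= min);
}

/* Intended form: extract the version field, then compare it. */
bool example_version_check(uint32_t raw, uint32_t min)
{
	return (raw & EXAMPLE_FW_VERSION_MASK) >= min;
}
```

For a raw value of `0x00010003` and a minimum of 64, the intended check fails (version field 3) while the misparsed form succeeds, because it only tests the lowest bit of the raw value.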
-
Yang Wang authored
Fix an array index out of bounds issue for the ras_block_string[] array. Fixes: 30df05fb ("drm/amdgpu: Align ras block enum with firmware") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
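An out-of-bounds read of this kind typically appears when an enum gains entries that its name table does not. The sketch below shows one defensive lookup under that assumption; the enum, table, and function are hypothetical and do not mirror the actual ras_block_string[] contents or the fix that was applied.

```c
#include <stddef.h>

#define EXAMPLE_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical block IDs; the last one has no entry in the name table. */
enum example_ras_block {
	EXAMPLE_BLOCK_UMC,
	EXAMPLE_BLOCK_SDMA,
	EXAMPLE_BLOCK_GFX,
	EXAMPLE_BLOCK_NEWLY_ADDED,
};

static const char *const example_block_names[] = {
	"umc",
	"sdma",
	"gfx",
};

/* Bounds-checked lookup: never indexes past the end of the table. */
const char *example_block_name(unsigned int block)
{
	if (block >= EXAMPLE_ARRAY_SIZE(example_block_names))
		return "<unknown>";
	return example_block_names[block];
}
```

Checking against the table size rather than the enum's last value keeps the lookup safe even when the two drift apart again.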
-
YuanShang authored
Submit a wreg command on the GFX and COMPUTE rings to update RLC_SPM_MC_CNT in the guest machine during runtime. Signed-off-by: YuanShang <YuanShang.Mao@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Srinivasan Shanmugam authored
The return value of 'to_amdgpu_crtc', which is container_of(...), can't be NULL, so the NULL check of 'acrtc' is dropped. Fixes the below: drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:9302 amdgpu_dm_atomic_commit_tail() error: we previously assumed 'acrtc' could be null (see line 9299) Also adds a NULL check of 'new_crtc_state' for the function 'drm_atomic_get_new_crtc_state', which retrieves the new state for a CRTC, while enabling writeback requests. Cc: stable@vger.kernel.org Cc: Alex Hung <alex.hung@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
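Why a container_of() result needs no NULL check can be seen from a userspace approximation of the macro. The macro and struct names below are hypothetical simplifications, not the kernel definitions.

```c
#include <stddef.h>

/* Simplified userspace approximation of the kernel's container_of(). */
#define example_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct example_base {
	int id;
};

/* The base object is embedded at a non-zero offset in the wrapper. */
struct example_crtc {
	int flags;
	struct example_base base;
};

/* Pure pointer arithmetic: given a valid pointer to the embedded member,
 * the result is the address of the enclosing object, never NULL. */
struct example_crtc *example_to_crtc(struct example_base *base)
{
	return example_container_of(base, struct example_crtc, base);
}
```

Because the conversion only subtracts a constant offset, a NULL test on its result is dead code; any validity check belongs on the member pointer before the conversion.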
-
Felix Kuehling authored
A static checker pointed out that bo_va->base.bo was already dereferenced earlier in the same scope. Therefore this check is unnecessary here. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: 50661eb1 ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs") Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christian König authored
Calling amdgpu_device_ip_resume_phase1() during shutdown leaves the HW in an active state and is an unbalanced use of the IP callbacks. Using the IP callbacks like this can lead to memory leaks, double frees and imbalanced reference counters. Leaving the HW in an active state can lead to DMA accesses to memory now freed by the driver. Both are a complete no-go for driver unload, so completely revert the workaround for now. This reverts commit f5c7e779. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christophe JAILLET authored
ida_alloc() and ida_free() should be preferred to the deprecated ida_simple_get() and ida_simple_remove(). Note that the upper limit of ida_simple_get() is exclusive, but the one of ida_alloc_range() is inclusive. So a -1 has been added when needed. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
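The "-1" mentioned above is the usual conversion from an exclusive to an inclusive upper bound. The toy allocator below illustrates only that conversion in userspace; all functions are hypothetical stand-ins and none of them is the kernel IDA API.

```c
#include <stdbool.h>

#define EXAMPLE_ID_SPACE 64

static bool example_used[EXAMPLE_ID_SPACE];

/* Inclusive upper bound: max itself may be handed out. */
int example_alloc_range(unsigned int min, unsigned int max)
{
	for (unsigned int id = min; id <= max && id < EXAMPLE_ID_SPACE; id++) {
		if (!example_used[id]) {
			example_used[id] = true;
			return (int)id;
		}
	}
	return -1; /* no free ID in [min, max] */
}

/* Exclusive upper bound: end is never handed out, so the call into the
 * inclusive allocator passes end - 1. */
int example_alloc_below(unsigned int start, unsigned int end)
{
	if (end <= start)
		return -1;
	return example_alloc_range(start, end - 1);
}

/* Release an ID so that it can be handed out again. */
void example_free(int id)
{
	if (id >= 0 && id < EXAMPLE_ID_SPACE)
		example_used[id] = false;
}
```

Forgetting the `- 1` when switching conventions silently widens the range by one ID, which is why conversions of this kind adjust the bound at each call site that passes a limit.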
-
Flora Cui authored
otherwise drm_client_dev_unregister() would try to kfree(&adev->kfd.client). Fixes: 18192001 ("drm/amdkfd: Export DMABufs from KFD using GEM handles") Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christophe JAILLET authored
It is likely that the statement related to 'dml_edp' is misplaced. So move it in the correct "case SIGNAL_TYPE_EDP". Fixes: 7966f319 ("drm/amd/display: Introduce DML2") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
XueBing Chen authored
Fix the following errors reported by checkpatch: ERROR: open brace '{' following struct go on the same line Signed-off-by: XueBing Chen <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
XueBing Chen authored
Fix the following errors reported by checkpatch: ERROR: space required after that ',' (ctx:VxV) ERROR: spaces required around that '>' (ctx:VxV) ERROR: spaces required around that '<' (ctx:VxV) Signed-off-by: XueBing Chen <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
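For reference, the spacing rules named in these checkpatch errors look as follows on a small function. The function is a hypothetical example and is not taken from the patched files.

```c
/*
 * Forms that checkpatch flags (shown in a comment only):
 *   if (value<low)        spaces required around that '<'
 *   if (value>high)       spaces required around that '>'
 *   example_clamp(v,0,9)  space required after that ','
 */
int example_clamp(int value, int low, int high)
{
	if (value < low)
		return low;
	if (value > high)
		return high;
	return value;
}

/* Call site written with a space after each comma. */
int example_clamp_digit(int value)
{
	return example_clamp(value, 0, 9);
}
```

These changes are whitespace-only, so the generated code is expected to be unchanged.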
-
XueBing Chen authored
Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line ERROR: trailing statements should be on next line Signed-off-by: XueBing Chen <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
XueBing Chen authored
Fix the following errors reported by checkpatch: ERROR: open brace '{' following struct go on the same line Signed-off-by: XueBing Chen <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
XueBing Chen authored
Fix the following errors reported by checkpatch: ERROR: open brace '{' following struct go on the same line Signed-off-by: XueBing Chen <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
XueBing Chen authored
Fix the following errors reported by checkpatch: ERROR: open brace '{' following struct go on the same line Signed-off-by: XueBing Chen <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
XueBing Chen authored
Fix the following errors reported by checkpatch: ERROR: open brace '{' following struct go on the same line Signed-off-by: XueBing Chen <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
XueBing Chen authored
Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line Signed-off-by: XueBing Chen <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
XueBing Chen authored
Fix the following errors reported by checkpatch: ERROR: space required before the open parenthesis '(' Signed-off-by: XueBing Chen <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
GuoHua Chen authored
Fix the following errors reported by checkpatch: ERROR: open brace '{' following struct go on the same line Signed-off-by: GuoHua Chen <chenguohua_716@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
GuoHua Chen authored
Fix the following errors reported by checkpatch: ERROR: open brace '{' following struct go on the same line ERROR: space prohibited before open square bracket '[' Signed-off-by: GuoHua Chen <chenguohua_716@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
GuoHua Chen authored
Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line Signed-off-by: GuoHua Chen <chenguohua_716@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-