- 18 Jan, 2024 40 commits
-
Tao Zhou authored
Deferred error is also taken into account. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Stanley.Yang authored
Why: The PCI error slot reset may be triggered after injecting a UE into UMC multiple times, which caused a system hang.
[ 557.371857] amdgpu 0000:af:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 557.373718] [drm] PCIE GART of 512M enabled.
[ 557.373722] [drm] PTB located at 0x0000031FED700000
[ 557.373788] [drm] VRAM is lost due to GPU reset!
[ 557.373789] [drm] PSP is resuming...
[ 557.547012] mlx5_core 0000:55:00.0: mlx5_pci_err_detected Device state = 1 pci_status: 0. Exit, result = 3, need reset
[ 557.547067] [drm] PCI error: detected callback, state(1)!!
[ 557.547069] [drm] No support for XGMI hive yet...
[ 557.548125] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 0. Enter
[ 557.607763] mlx5_core 0000:55:00.0: wait vital counter value 0x16b5b after 1 iterations
[ 557.607777] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 1. Exit, err = 0, result = 5, recovered
[ 557.610492] [drm] PCI error: slot reset callback!!
...
[ 560.689382] amdgpu 0000:3f:00.0: amdgpu: GPU reset(2) succeeded!
[ 560.689546] amdgpu 0000:5a:00.0: amdgpu: GPU reset(2) succeeded!
[ 560.689562] general protection fault, probably for non-canonical address 0x5f080b54534f611f: 0000 [#1] SMP NOPTI
[ 560.701008] CPU: 16 PID: 2361 Comm: kworker/u448:9 Tainted: G OE 5.15.0-91-generic #101-Ubuntu
[ 560.712057] Hardware name: Microsoft C278A/C278A, BIOS C2789.5.BS.1C11.AG.1 11/08/2023
[ 560.720959] Workqueue: amdgpu-reset-hive amdgpu_ras_do_recovery [amdgpu]
[ 560.728887] RIP: 0010:amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu]
[ 560.736891] Code: ff 41 89 c6 e9 1b ff ff ff 44 0f b6 45 b0 e9 4f ff ff ff be 01 00 00 00 4c 89 e7 e8 76 c9 8b ff 44 0f b6 45 b0 e9 3c fd ff ff <48> 83 ba 18 02 00 00 00 0f 84 6a f8 ff ff 48 8d 7a 78 be 01 00 00
[ 560.757967] RSP: 0018:ffa0000032e53d80 EFLAGS: 00010202
[ 560.763848] RAX: ffa00000001dfd10 RBX: ffa0000000197090 RCX: ffa0000032e53db0
[ 560.771856] RDX: 5f080b54534f5f07 RSI: 0000000000000000 RDI: ff11000128100010
[ 560.779867] RBP: ffa0000032e53df0 R08: 0000000000000000 R09: ffffffffffe77f08
[ 560.787879] R10: 0000000000ffff0a R11: 0000000000000001 R12: 0000000000000000
[ 560.795889] R13: ffa0000032e53e00 R14: 0000000000000000 R15: 0000000000000000
[ 560.803889] FS: 0000000000000000(0000) GS:ff11007e7e800000(0000) knlGS:0000000000000000
[ 560.812973] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 560.819422] CR2: 000055a04c118e68 CR3: 0000000007410005 CR4: 0000000000771ee0
[ 560.827433] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 560.835433] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 560.843444] PKRU: 55555554
[ 560.846480] Call Trace:
[ 560.849225] <TASK>
[ 560.851580] ? show_trace_log_lvl+0x1d6/0x2ea
[ 560.856488] ? show_trace_log_lvl+0x1d6/0x2ea
[ 560.861379] ? amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu]
[ 560.867778] ? show_regs.part.0+0x23/0x29
[ 560.872293] ? __die_body.cold+0x8/0xd
[ 560.876502] ? die_addr+0x3e/0x60
[ 560.880238] ? exc_general_protection+0x1c5/0x410
[ 560.885532] ? asm_exc_general_protection+0x27/0x30
[ 560.891025] ? amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu]
[ 560.898323] amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu]
[ 560.904520] process_one_work+0x228/0x3d0
How: In RAS recovery, a mode-1 reset is issued from RAS fatal error handling and is expected to reset all the nodes in a hive, so there is no need to issue another mode-1 reset during this procedure.
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Stanley.Yang authored
Show the deferred error count in the UMC sysfs node. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Ori Messinger authored
On GFX version 11, GFXOFF was disabled due to a MES KIQ firmware issue, which has since been fixed after version 64. This patch only re-enables GFXOFF for GFX version 11 if the GPU's MES KIQ firmware version is newer than version 64. V2: Keep GFXOFF disabled on GFX11 if MES KIQ is below version 64. V3: Add parentheses to avoid GCC warning for parentheses: "suggest parentheses around comparison in operand of ‘&’" V4: Remove "V3" from commit title V5: Change commit description and insert 'Acked-by' Signed-off-by: Ori Messinger <Ori.Messinger@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
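The version gate described above, and the precedence issue behind the V3 note, might be sketched as follows. This is a minimal user-space illustration, not the driver code: the function name, the mask value, and the choice to treat version 64 itself as fixed (following the V2 wording "below version 64") are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: the low 12 bits of the firmware word hold the version. */
#define EXAMPLE_MES_VERSION_MASK 0x00000fffu
/* Threshold taken from the commit text; treating 64 as already fixed is an assumption. */
#define EXAMPLE_MES_KIQ_MIN_VERSION 64u

/* Returns true when this sketch would leave GFXOFF enabled. */
static bool example_gfxoff_allowed(unsigned int gfx_major, uint32_t kiq_fw_word)
{
	/* Other GFX generations are outside the scope of this sketch. */
	if (gfx_major != 11)
		return true;

	/*
	 * The parentheses around the masked value are required. Without
	 * them, "word & MASK >= 64" parses as "word & (MASK >= 64)" because
	 * relational operators bind tighter than '&'. That is the pattern
	 * GCC flags with "suggest parentheses around comparison in operand
	 * of '&'".
	 */
	return (kiq_fw_word & EXAMPLE_MES_VERSION_MASK) >= EXAMPLE_MES_KIQ_MIN_VERSION;
}
```

With these assumed values, a firmware word whose masked version is 63 keeps GFXOFF disabled on GFX11, while 64 and above re-enable it.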
-
Yang Wang authored
Fix an array index out-of-bounds issue in the ras_block_string[] array. Fixes: 30df05fb ("drm/amdgpu: Align ras block enum with firmware") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
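The commit text does not say how the out-of-bounds access was closed. One common guard for an enum-indexed string table is sketched below; the enum, table contents, and helper name are hypothetical and are not taken from the driver.

```c
#include <stddef.h>

#define EXAMPLE_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical block enum: it has grown past the string table below. */
enum example_ras_block {
	EXAMPLE_BLOCK_UMC,
	EXAMPLE_BLOCK_SDMA,
	EXAMPLE_BLOCK_GFX,
	EXAMPLE_BLOCK_NEW,	/* added later, no string entry yet */
	EXAMPLE_BLOCK_LAST
};

static const char *const example_block_string[] = { "umc", "sdma", "gfx" };

/* Look up a block name, falling back instead of reading past the table. */
static const char *example_block_name(unsigned int block)
{
	if (block >= EXAMPLE_ARRAY_SIZE(example_block_string))
		return "unknown";
	return example_block_string[block];
}
```

Checking against the table size rather than the enum's last value keeps the lookup safe even when the enum and the table drift apart.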
-
YuanShang authored
Submit a wreg command on the GFX and COMPUTE rings to update RLC_SPM_MC_CNT in the guest machine at runtime. Signed-off-by: YuanShang <YuanShang.Mao@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Srinivasan Shanmugam authored
The return value of 'to_amdgpu_crtc', which is container_of(...), can't be NULL, so the NULL check on 'acrtc' is dropped. This fixes the following: drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:9302 amdgpu_dm_atomic_commit_tail() error: we previously assumed 'acrtc' could be null (see line 9299) Also add a NULL check on 'new_crtc_state', the result of 'drm_atomic_get_new_crtc_state', which retrieves the new state for a CRTC, while enabling writeback requests. Cc: stable@vger.kernel.org Cc: Alex Hung <alex.hung@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
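Why a container_of() result is not a useful thing to NULL-check can be seen from a user-space sketch of the macro. The struct layouts and names below are hypothetical stand-ins, not the real drm or amdgpu types.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of(): pure pointer arithmetic. */
#define example_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical base object embedded in a larger driver object. */
struct example_crtc {
	int index;
};

struct example_amdgpu_crtc {
	int otg_inst;
	struct example_crtc base;	/* embedded, not a pointer */
};

/* Recover the enclosing object from a pointer to its embedded member. */
static struct example_amdgpu_crtc *example_to_amdgpu_crtc(struct example_crtc *crtc)
{
	return example_container_of(crtc, struct example_amdgpu_crtc, base);
}
```

Given a valid pointer to the embedded member, the result is the address of the enclosing object, so it is never NULL. The macro only subtracts a constant offset and performs no validity check of its own, so any NULL test has to be applied to the input pointer rather than to the result.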
-
Felix Kuehling authored
A static checker pointed out that bo_va->base.bo was already dereferenced earlier in the same scope. Therefore this check is unnecessary here. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: 50661eb1 ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs") Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christian König authored
Calling amdgpu_device_ip_resume_phase1() during shutdown leaves the HW in an active state and is an unbalanced use of the IP callbacks. Using the IP callbacks like this can lead to memory leaks, double frees and imbalanced reference counters. Leaving the HW in an active state can lead to DMA accesses to memory now freed by the driver. Both are a complete no-go for driver unload, so completely revert the workaround for now. This reverts commit f5c7e779. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christophe JAILLET authored
ida_alloc() and ida_free() should be preferred to the deprecated ida_simple_get() and ida_simple_remove(). Note that the upper limit of ida_simple_get() is exclusive, but the one of ida_alloc_range() is inclusive. So a -1 has been added when needed. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
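The off-by-one noted above can be illustrated with a small user-space stand-in for an ID allocator. This is only a sketch of the bound semantics: the helper names, the fixed-size table, and the -1 error return are assumptions, and the real kernel helpers take an ida pointer and GFP flags and return negative errno values.

```c
#include <assert.h>
#include <stdbool.h>

#define EXAMPLE_MAX_IDS 64

static bool example_used[EXAMPLE_MAX_IDS];

/* New-style helper: 'max' is INCLUSIVE, as with ida_alloc_range(). */
static int example_alloc_range(unsigned int min, unsigned int max)
{
	for (unsigned int id = min; id <= max && id < EXAMPLE_MAX_IDS; id++) {
		if (!example_used[id]) {
			example_used[id] = true;
			return (int)id;
		}
	}
	return -1;	/* no free ID in [min, max] */
}

/* Old-style helper: 'end' is EXCLUSIVE, as with ida_simple_get(). */
static int example_simple_get(unsigned int start, unsigned int end)
{
	assert(end > start);	/* this sketch ignores the "no limit" case */
	/* The conversion described in the commit: subtract one from the bound. */
	return example_alloc_range(start, end - 1);
}
```

With an exclusive bound of 2, the old-style call can hand out only IDs 0 and 1. Passing the same 2 to the inclusive helper would also allow ID 2, which is why the conversion needs the -1.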
-
Flora Cui authored
otherwise drm_client_dev_unregister() would try to kfree(&adev->kfd.client). Fixes: 18192001 ("drm/amdkfd: Export DMABufs from KFD using GEM handles") Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christophe JAILLET authored
It is likely that the statement related to 'dml_edp' is misplaced, so move it into the correct "case SIGNAL_TYPE_EDP". Fixes: 7966f319 ("drm/amd/display: Introduce DML2") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
XueBing Chen authored
Fix the following errors reported by checkpatch: ERROR: open brace '{' following struct go on the same line Signed-off-by: XueBing Chen <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
XueBing Chen authored
Fix the following errors reported by checkpatch: ERROR: space required after that ',' (ctx:VxV) ERROR: spaces required around that '>' (ctx:VxV) ERROR: spaces required around that '<' (ctx:VxV) Signed-off-by: XueBing Chen <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
XueBing Chen authored
Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line ERROR: trailing statements should be on next line Signed-off-by: XueBing Chen <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
XueBing Chen authored
Fix the following errors reported by checkpatch: ERROR: open brace '{' following struct go on the same line Signed-off-by: XueBing Chen <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
XueBing Chen authored
Fix the following errors reported by checkpatch: ERROR: open brace '{' following struct go on the same line Signed-off-by: XueBing Chen <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
XueBing Chen authored
Fix the following errors reported by checkpatch: ERROR: open brace '{' following struct go on the same line Signed-off-by: XueBing Chen <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
XueBing Chen authored
Fix the following errors reported by checkpatch: ERROR: open brace '{' following struct go on the same line Signed-off-by: XueBing Chen <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
XueBing Chen authored
Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line Signed-off-by: XueBing Chen <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
XueBing Chen authored
Fix the following errors reported by checkpatch: ERROR: space required before the open parenthesis '(' Signed-off-by: XueBing Chen <chenxb_99091@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
GuoHua Chen authored
Fix the following errors reported by checkpatch: ERROR: open brace '{' following struct go on the same line Signed-off-by: GuoHua Chen <chenguohua_716@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
GuoHua Chen authored
Fix the following errors reported by checkpatch: ERROR: open brace '{' following struct go on the same line ERROR: space prohibited before open square bracket '[' Signed-off-by: GuoHua Chen <chenguohua_716@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
GuoHua Chen authored
Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line Signed-off-by: GuoHua Chen <chenguohua_716@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
GuoHua Chen authored
Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line Signed-off-by: GuoHua Chen <chenguohua_716@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
GuoHua Chen authored
Fix the following errors reported by checkpatch: ERROR: open brace '{' following struct go on the same line Signed-off-by: GuoHua Chen <chenguohua_716@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
GuoHua Chen authored
Fix the following errors reported by checkpatch: ERROR: space prohibited before that close parenthesis ')' ERROR: need consistent spacing around '<<' (ctx:WxV) ERROR: need consistent spacing around '-' (ctx:WxV) Signed-off-by: GuoHua Chen <chenguohua_716@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
GuoHua Chen authored
Fix the following errors reported by checkpatch: ERROR: spaces required around that '=' (ctx:VxW) Signed-off-by: GuoHua Chen <chenguohua_716@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
GuoHua Chen authored
Fix the following errors reported by checkpatch: ERROR: space prohibited after that open parenthesis '(' ERROR: space prohibited before that close parenthesis ')' Signed-off-by: GuoHua Chen <chenguohua_716@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
GuoHua Chen authored
Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line ERROR: spaces required around that '&=' (ctx:WxO) ERROR: space required before that '~' (ctx:OxV) ERROR: space prohibited before that close parenthesis ')' ERROR: space required after that ',' (ctx:WxO) ERROR: space required before that '&' (ctx:OxV) ERROR: need consistent spacing around '*' (ctx:VxW) Signed-off-by: GuoHua Chen <chenguohua_716@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
GuoHua Chen authored
Fix the following errors reported by checkpatch: ERROR: open brace '{' following struct go on the same line ERROR: open brace '{' following union go on the same line Signed-off-by: GuoHua Chen <chenguohua_716@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
GuoHua Chen authored
Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line ERROR: need consistent spacing around '-' (ctx:WxV) ERROR: space required before the open parenthesis '(' ERROR: "foo* bar" should be "foo *bar" Signed-off-by: GuoHua Chen <chenguohua_716@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
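For readers unfamiliar with these checkpatch messages, the four errors listed above correspond to the style points marked in the comments below. The function is a hypothetical example written in the conforming style; it is not code from the patched files.

```c
/* "foo* bar" should be "foo *bar": the '*' attaches to the name, not the type. */
static int example_sum_all_but_last(const int *values, int count)
{
	int sum = 0;

	/*
	 * "space required before the open parenthesis '('": write "for (",
	 * not "for(". "that open brace { should be on the previous line":
	 * the '{' ends the "for" line instead of starting a new one.
	 * "need consistent spacing around '-'": write "count - 1", not
	 * "count -1".
	 */
	for (int i = 0; i < count - 1; i++) {
		if (values[i] < 0)
			continue;
		sum += values[i];
	}

	return sum;
}
```

None of these changes alter behavior; they only bring the layout in line with the kernel coding style that checkpatch enforces.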
-
GuoHua Chen authored
Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line Signed-off-by: GuoHua Chen <chenguohua_716@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
GuoHua Chen authored
Fix the following errors reported by checkpatch: ERROR: that open brace { should be on the previous line Signed-off-by: GuoHua Chen <chenguohua_716@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
GuoHua Chen authored
Fix the following errors reported by checkpatch: ERROR: open brace '{' following struct go on the same line ERROR: open brace '{' following enum go on the same line Signed-off-by: GuoHua Chen <chenguohua_716@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
GuoHua Chen authored
Fix the following errors reported by checkpatch: ERROR: "foo* bar" should be "foo *bar" ERROR: that open brace { should be on the previous line Signed-off-by: GuoHua Chen <chenguohua_716@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
GuoHua Chen authored
Fix the following errors reported by checkpatch: ERROR: space required before the open parenthesis '(' Signed-off-by: GuoHua Chen <chenguohua_716@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
GuoHua Chen authored
Fix the following errors reported by checkpatch: ERROR: open brace '{' following struct go on the same line Signed-off-by: GuoHua Chen <chenguohua_716@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
GuoHua Chen authored
Fix the following errors reported by checkpatch: ERROR: "(foo*)" should be "(foo *)" Signed-off-by: GuoHua Chen <chenguohua_716@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
GuoHua Chen authored
Fix the following errors reported by checkpatch: ERROR: spaces required around that '||' (ctx:VxE) Signed-off-by: GuoHua Chen <chenguohua_716@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-