- 23 Aug, 2024 16 commits
-
-
Candice Li authored
Drop unsupported features on smu v14_0_2. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Xiaogang Chen authored
When app unmap vm ranges(munmap) kfd/svm starts drain pending page fault and not handle any incoming pages fault of this process until a deferred work item got executed by default system wq. The time period of "not handle page fault" can be long and is unpredicable. That is advese to kfd performance on page faults recovery. This patch uses time stamp of incoming page fault to decide to drop or recover page fault. When app unmap vm ranges kfd records each gpu device's ih ring current time stamp. These time stamps are used at kfd page fault recovery routine. Any page fault happened on unmapped ranges after unmap events is application bug that accesses vm range after unmap. It is not driver work to cover that. By using time stamp of page fault do not need drain page faults at deferred work. So, the time period that kfd does not handle page faults is reduced and can be controlled. Signed-off-by: Xiaogang.Chen <Xiaogang.Chen@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Lijo Lazar authored
Add p2s table support for a new revision of SMUv13.0.6. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Likun Gao authored
Add gc_info table v1.3 for IP discovery. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Ma Ke authored
Instead of using state->fb->obj[0] directly, get object from framebuffer by calling drm_gem_fb_get_obj() and return error code when object is null to avoid using null object of framebuffer. Fixes: 5d945cbc ("drm/amd/display: Create a file dedicated to planes") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Yang Wang authored
Add list empty check to avoid null pointer issues in some corner cases. - list_for_each_entry_safe() Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Jinjie Ruan authored
The sparse tool complains as follows: drivers/gpu/drm/amd/amdgpu/../display/dc/dsc/dcn401/dcn401_dsc.c:30:24: warning: symbol 'dcn401_dsc_funcs' was not declared. Should it be static? This symbol is not used outside of dcn401_dsc.c, so marks it static. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Jinjie Ruan authored
The sparse tool complains as follows: drivers/gpu/drm/amd/amdgpu/../display/dc/hubp/dcn35/dcn35_hubp.c:191:19: warning: symbol 'dcn35_hubp_funcs' was not declared. Should it be static? This symbol is not used outside of dcn35_hubp.c, so marks it static. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Jinjie Ruan authored
The sparse tool complains as follows: drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4.c:12:28: warning: symbol 'core_dcn4_ip_caps_base' was not declared. Should it be static? This symbol is not used outside of dcn35_hubp.c, so marks it static. And do not want to change it, so mark it const. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Jinjie Ruan authored
The sparse tool complains as follows: drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.c:6853:56: warning: symbol 'core_dcn4_g6_temp_read_blackout_table' was not declared. Should it be static? This symbol is not used outside of dml2_core_dcn4_calcs.c, so marks it static. And not want to change it, so mark it const. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Alex Deucher authored
This needs to be set to 1 to avoid a potential deadlock in the GC 10.x and newer. On GC 9.x and older, this needs to be set to 0. This can lead to hangs in some mixed graphics and compute workloads. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3575Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Hawking Zhang authored
Driver switches to interrupt source id to identify utcl2 poison event. polling interface is not needed. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Rahul Jain authored
when trying to enable p2p the amdgpu_device_is_peer_accessible() checks the condition where address_mask overlaps the aper_base and hence returns 0, due to which the p2p disables for this platform IOMMU should remap the BAR addresses so the device can access them. Hence check if peer_adev is remapping DMA v5: (Felix, Alex) - fixing comment as per Alex feedback - refactor code as per Felix v4: (Alex) - fix the comment and description v3: - remove iommu_remap variable v2: (Alex) - Fix as per review comments - add new function amdgpu_device_check_iommu_remap to check if iommu remap Signed-off-by: Rahul Jain <Rahul.Jain@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Kenneth Feng authored
update message interface for smu v14.0.2/3 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Hawking Zhang authored
Not supported. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Hawking Zhang authored
Traditional utcl2 fault_status polling does not work in SRIOV environment. The polling of fault status register from guest side will be dropped by hardware. Driver should switch to check utcl2 interrupt source id to identify utcl2 poison event. It is set to 1 when poisoned data interrupts are signaled. v2: drop the unused local variable (Tao) Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 21 Aug, 2024 24 commits
-
-
Alex Deucher authored
Otherwise we can fail to drop the software mutex when we fail to take the hardware mutex. Fixes: 76acba7b ("drm/amdgpu/gfx11: add a mutex for the gfx semaphore") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Tim Huang authored
This resolves the dereference null return value warning reported by Coverity. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Victor Zhao authored
when use cpu to do page table update under sriov runtime, since mmio access is blocked, kiq has to be used to flush hdp. change WREG32_NO_KIQ to WREG32 to allow kiq. Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Alex Deucher authored
The driver needs to wait for the on board firmware to finish its initialization before probing the card. Commit 95905698 ("drm/amdgpu: Fix discovery initialization failure during pci rescan") switched from using msleep() to using usleep_range() which seems to have caused init failures on some navi1x boards. Switch back to msleep(). Fixes: 95905698 ("drm/amdgpu: Fix discovery initialization failure during pci rescan") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3559 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3500Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Ma Jun <Jun.Ma2@amd.com>
-
Martin Leung authored
- Various DML 2.1 fixes - Fix module unload - Fix construct_phy with MXM connector - Support UHBR10 link rate on eDP - Revert updated DCCG wrappers Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Martin Leung <Martin.Leung@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Austin Zheng authored
[Why and How] DML2.1 reintegration for several fixes and updates to the DML code. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Austin Zheng <Austin.Zheng@amd.com> Signed-off-by: Roman Li <roman.li@amd Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Tim Huang authored
Flexible endpoints use DIGs from available inflexible endpoints, so only the encoders of inflexible links need to be freed. Otherwise, a double free issue may occur when unloading the amdgpu module. [ 279.190523] RIP: 0010:__slab_free+0x152/0x2f0 [ 279.190577] Call Trace: [ 279.190580] <TASK> [ 279.190582] ? show_regs+0x69/0x80 [ 279.190590] ? die+0x3b/0x90 [ 279.190595] ? do_trap+0xc8/0xe0 [ 279.190601] ? do_error_trap+0x73/0xa0 [ 279.190605] ? __slab_free+0x152/0x2f0 [ 279.190609] ? exc_invalid_op+0x56/0x70 [ 279.190616] ? __slab_free+0x152/0x2f0 [ 279.190642] ? asm_exc_invalid_op+0x1f/0x30 [ 279.190648] ? dcn10_link_encoder_destroy+0x19/0x30 [amdgpu] [ 279.191096] ? __slab_free+0x152/0x2f0 [ 279.191102] ? dcn10_link_encoder_destroy+0x19/0x30 [amdgpu] [ 279.191469] kfree+0x260/0x2b0 [ 279.191474] dcn10_link_encoder_destroy+0x19/0x30 [amdgpu] [ 279.191821] link_destroy+0xd7/0x130 [amdgpu] [ 279.192248] dc_destruct+0x90/0x270 [amdgpu] [ 279.192666] dc_destroy+0x19/0x40 [amdgpu] [ 279.193020] amdgpu_dm_fini+0x16e/0x200 [amdgpu] [ 279.193432] dm_hw_fini+0x26/0x40 [amdgpu] [ 279.193795] amdgpu_device_fini_hw+0x24c/0x400 [amdgpu] [ 279.194108] amdgpu_driver_unload_kms+0x4f/0x70 [amdgpu] [ 279.194436] amdgpu_pci_remove+0x40/0x80 [amdgpu] [ 279.194632] pci_device_remove+0x3a/0xa0 [ 279.194638] device_remove+0x40/0x70 [ 279.194642] device_release_driver_internal+0x1ad/0x210 [ 279.194647] driver_detach+0x4e/0xa0 [ 279.194650] bus_remove_driver+0x6f/0xf0 [ 279.194653] driver_unregister+0x33/0x60 [ 279.194657] pci_unregister_driver+0x44/0x90 [ 279.194662] amdgpu_exit+0x19/0x1f0 [amdgpu] [ 279.194939] __do_sys_delete_module.isra.0+0x198/0x2f0 [ 279.194946] __x64_sys_delete_module+0x16/0x20 [ 279.194950] do_syscall_64+0x58/0x120 [ 279.194954] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 279.194980] </TASK> Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Nicholas Susanto authored
[Why] Causes hard hangs when resuming after display off on extended/duplicate modes [How] Set the min dispclk to 50Mhz for DCN35 Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Nicholas Susanto <Nicholas.Susanto@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Ilya Bakoulin authored
[Why/How] The call to construct_phy will fail in cases where connector type is MXM, and the dc_link won't be properly created/initialized. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Sung Joon Kim authored
[why] Supporting UHBR10 link rate on eDP leverages the existing DP2.0 code but need to add some small adjustments in code. [how] Acknowledge the given DPCD caps for UHBR10 link rate support and allow DP2.0 programming sequence and link training for eDP. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Sung Joon Kim <Sungjoon.Kim@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Nevenko Stupar authored
[Why & How] DCN4 Cursor has separate degamma block and should always do Cursor degamma for Cursor color modes. Reviewed-by: Chris Park <chris.park@amd.com> Signed-off-by: Nevenko Stupar <Nevenko.Stupar@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Michael Strauss authored
[WHY] eDP 2.0 is introducing support for UHBR link rates, however current eDP ILR link optimization does not account for UHBR capabilities. Either UHBR capabilities will be provided via the same 128b/132b rate DPCD caps that are currently used on DP2.1, or Table 4-13 may be updated to include UHBR rates. [HOW] Add extra Supported Link Rates table translations for UHBR10/13.5/20. Update eDP link setting optimization search to be aware of 128b/132b DPCD rate caps in order to unblock UHBR on panels with Supported Link Rates table. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Michael Strauss <michael.strauss@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Nicholas Susanto authored
Removing redundant condition. Reviewed-by: Hansen Dsouza <Hansen.Dsouza@amd.com> Signed-off-by: Nicholas Susanto <Nicholas.Susanto@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Aurabindo Pillai authored
when removing the amdgpu module and reinserting it, a call trace is triggered: [ 334.230602] RIP: 0010:hubbub2_get_dchub_ref_freq+0xbb/0xe0 [amdgpu] [ 334.230807] Code: 25 28 00 00 00 75 3c 48 8d 65 f0 5b 41 5c 5d 31 c0 31 d2 31 c9 31 f6 31 ff 45 31 c0 45 31 c9 45 31 d2 45 31 db e9 55 a1 ca de <0f> 0b eb c6 0f 0b eb c2 d1 eb 8d 83 c0 63 ff ff 3d 20 4e 00 00 76 [ 334.230809] RSP: 0018:ffffbc8b823fb540 EFLAGS: 00010246 [ 334.230811] RAX: 0000000000001000 RBX: 00000000000186a0 RCX: 0000000000000000 [ 334.230812] RDX: ffffbc8b823fb544 RSI: 0000000000000000 RDI: 0000000000000000 [ 334.230813] RBP: ffffbc8b823fb560 R08: 0000000000000000 R09: 0000000000000000 [ 334.230814] R10: 0000000000000000 R11: 000000000000000f R12: ffff9e644f1f2bb0 [ 334.230815] R13: ffff9e6451361300 R14: 0000000000000000 R15: ffff9e6452c00000 [ 334.230816] FS: 00007af7c8519000(0000) GS:ffff9e737dd00000(0000) knlGS:0000000000000000 [ 334.230817] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 334.230818] CR2: 0000703576b9cbd0 CR3: 00000001095a2000 CR4: 0000000000750ee0 [ 334.230819] PKRU: 55555554 [ 334.230820] Call Trace: [ 334.230822] <TASK> [ 334.230824] ? show_regs+0x6d/0x80 [ 334.230828] ? __warn+0x89/0x160 [ 334.230832] ? hubbub2_get_dchub_ref_freq+0xbb/0xe0 [amdgpu] [ 334.231024] ? report_bug+0x17e/0x1b0 [ 334.231028] ? handle_bug+0x46/0x90 [ 334.231030] ? exc_invalid_op+0x18/0x80 [ 334.231032] ? asm_exc_invalid_op+0x1b/0x20 [ 334.231036] ? hubbub2_get_dchub_ref_freq+0xbb/0xe0 [amdgpu] [ 334.231217] dc_create_resource_pool+0xfd/0x320 [amdgpu] [ 334.231408] dc_create+0x256/0x700 [amdgpu] [ 334.231588] ? srso_alias_return_thunk+0x5/0x7f [ 334.231590] ? dmi_matches+0xa0/0x230 [ 334.231594] amdgpu_dm_init+0x28c/0x25f0 [amdgpu] [ 334.231791] ? prb_read_valid+0x1c/0x30 [ 334.231795] ? __irq_work_queue_local+0x43/0xf0 [ 334.231798] ? srso_alias_return_thunk+0x5/0x7f [ 334.231800] ? irq_work_queue+0x2f/0x70 [ 334.231802] ? srso_alias_return_thunk+0x5/0x7f [ 334.231803] ? __wake_up_klogd.part.0+0x40/0x70 [ 334.231805] ? srso_alias_return_thunk+0x5/0x7f [ 334.231807] ? vprintk_emit+0xd9/0x210 [ 334.231809] ? set_dev_info+0x130/0x1c0 [ 334.231812] ? srso_alias_return_thunk+0x5/0x7f [ 334.231813] ? dev_printk_emit+0xa1/0xe0 [ 334.231819] dm_hw_init+0x14/0x30 [amdgpu] [ 334.231993] amdgpu_device_init+0x23c7/0x2fc0 [amdgpu] [ 334.232134] ? pci_read_config_word+0x25/0x50 [ 334.232139] amdgpu_driver_load_kms+0x1a/0xd0 [amdgpu] [ 334.232284] amdgpu_pci_probe+0x1f9/0x620 [amdgpu] On DCN401, get_dchub_ref_freq() hook is called before init_hw() hook. Hence, it is expected to trigger an assert. Remove the extraneous call to get_dchub_ref_freq() to suppress the call trace Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Michael Strauss authored
[WHY] Previous multi-display HPO fix moved where HPO I/O enable/disable is performed. The codepath now taken to enable/disable HPO I/O is not used for compliance test automation, meaning that if a compliance box being driven at a DP1 rate requests retrain at UHBR, HPO I/O will remain off if it was previously off. [HOW] Explicitly update HPO I/O after allocating encoders for test request. Reviewed-by: Charlene Liu <charlene.liu@amd.com> Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Michael Strauss <michael.strauss@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Hansen Dsouza authored
[Why] Revert updated DCCG wrappers due to regression [How] This reverts commit 680458d4. Reviewed-by: Chris Park <chris.park@amd.com> Signed-off-by: Hansen Dsouza <Hansen.Dsouza@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Candice Li authored
Add TA binary size validation to avoid OOB write. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Mukul Joshi authored
Based on the recommendation of MEC FW, update BadOpcode interrupt handling by unmapping all queues, removing the queue that got the interrupt from scheduling and remapping rest of the queues back when using MES scheduler. This is done to prevent the case where unmapping of the bad queue can fail thereby causing a GPU reset. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Acked-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Mukul Joshi authored
MEC FW expects MES to unmap all queues when a VM fault is observed on a queue and then resumed once the affected process is terminated. Use the MES Suspend and Resume APIs to achieve this. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Mukul Joshi authored
Add implementation for MES Suspend and Resume APIs to unmap/map all queues for GFX11. Support for GFX12 will be added when the corresponding firmware support is in place. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Amber Lin authored
When amdgpu enable enforce_isolation, KFD enables single-process mode in HWS and sets exec_cleaner_shader bit in MAP_PROCESS. Signed-off-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Srinivasan Shanmugam authored
This commit applies isolation enforcement to the GFX and Compute rings in the gfx_v9_4_3 module. The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be called when a ring begins and ends its use, respectively. `amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring begins its use. This function cancels any scheduled `enforce_isolation_work` and, if necessary, signals the Kernel Fusion Driver (KFD) to stop the runqueue. `amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends its use. This function schedules `enforce_isolation_work` to be run after a delay. These functions are part of the Enforce Isolation Handler, which enforces shader isolation on AMD GPUs to prevent data leakage between different processes. The commit also includes a check for the type of the ring. If the type of the ring is `AMDGPU_RING_TYPE_COMPUTE`, the `xcp_id` of the `enforce_isolation` structure in the `gfx` structure of the `amdgpu_device` is set to the `xcp_id` of the ring. This ensures that the correct `xcp_id` is used when enforcing isolation on compute rings. The `xcp_id` is an identifier for an XCP partition, and different rings can be associated with different XCP partitions. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
-
Srinivasan Shanmugam authored
This commit applies isolation enforcement to the GFX and Compute rings in the gfx_v9_0 module. The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be called when a ring begins and ends its use, respectively. `amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring begins its use. This function cancels any scheduled `enforce_isolation_work` and, if necessary, signals the Kernel Fusion Driver (KFD) to stop the runqueue. `amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends its use. This function schedules `enforce_isolation_work` to be run after a delay. These functions are part of the Enforce Isolation Handler, which enforces shader isolation on AMD GPUs to prevent data leakage between different processes. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Christian König <christian.koenig@amd.com>
-
Srinivasan Shanmugam authored
This commit introduces the Enforce Isolation Handler designed to enforce shader isolation on AMD GPUs, which helps to prevent data leakage between different processes. The handler counts the number of emitted fences for each GFX and compute ring. If there are any fences, it schedules the `enforce_isolation_work` to be run after a delay of `GFX_SLICE_PERIOD`. If there are no fences, it signals the Kernel Fusion Driver (KFD) to resume the runqueue. The function is synchronized using the `enforce_isolation_mutex`. This commit also introduces a reference count mechanism (kfd_sch_req_count) to keep track of the number of requests to enable the KFD scheduler. When a request to enable the KFD scheduler is made, the reference count is decremented. When the reference count reaches zero, a delayed work is scheduled to enforce isolation after a delay of GFX_SLICE_PERIOD. When a request to disable the KFD scheduler is made, the function first checks if the reference count is zero. If it is, it cancels the delayed work for enforcing isolation and checks if the KFD scheduler is active. If the KFD scheduler is active, it sends a request to stop the KFD scheduler and sets the KFD scheduler state to inactive. Then, it increments the reference count. The function is synchronized using the kfd_sch_mutex to ensure that the KFD scheduler state and reference count are updated atomically. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com>
-