- 17 Nov, 2022 24 commits
-
-
Ramesh Errabolu authored
Allow user to know number of compute units (CU) that are in use at any given moment. Enable access to the method kgd_gfx_v9_get_cu_occupancy that computes CU occupancy. Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Tao Zhou authored
Register related irq handler. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Tao Zhou authored
Register irq handler. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Tao Zhou authored
Add interrupt source id macros. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Tao Zhou authored
Initialize JPEG RAS structure and add error query interface. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Tao Zhou authored
Initialize VCN RAS structure and add RAS status query function. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Tao Zhou authored
Make the code reusable. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Tao Zhou authored
So the code can be reused. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Tao Zhou authored
Set support flag for VCN/JPEG 4.0. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
YiPeng Chai authored
The patch is enabling mode-1 reset for RAS recovery in fatal error mode. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Luben Tuikov authored
Add support for RAS table at I2C EEPROM address of 0x40000, since on some ASICs it is not at 0, but at 0x40000. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Kent Russell <kent.russell@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Tested-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Luben Tuikov authored
Don't assume FRU MCU memory locations for the FRU data fields, or their sizes, instead read and interpret the IPMI data, as stipulated in the IPMI spec version 1.0 rev 1.2. Extract the Product Name, Product Part/Model Number, and the Product Serial Number by interpreting the IPMI data. Check the checksums of the stored IPMI data to make sure we don't read and give corrupted data back the the user. Eliminate small I2C reads, and instead read the whole Product Info Area in one go, and then extract the information we're seeking from it. Eliminates a whole function, making this file smaller. v2: Clarify changes in the commit message. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Kent Russell <kent.russell@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Tested-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Luben Tuikov authored
Set the new correct default FRU MCU I2C address for newer ASICs, so that we can correctly read the Product Name, Product Part/Model Number and Serial Number. On newer ASICs, the FRU MCU was moved to I2C address 0x58. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Kent Russell <kent.russell@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Tested-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Luben Tuikov authored
Allow non-standard EEPROM I2C address of 0x58, where the Device Type Identifier is 1011b, where we form 1011000b = 0x58 I2C address, as on some ASICs the FRU data lives there. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Kent Russell <kent.russell@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Tested-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christian König authored
Remove unused parameters and cleanup dead code. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christian König authored
Clean that up a bit, no functional change. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christian König authored
The basic problem here is that it's not allowed to page fault while holding the reservation lock. So it can happen that multiple processes try to validate an userptr at the same time. Work around that by putting the HMM range object into the mutex protected bo list for now. v2: make sure range is set to NULL in case of an error Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> CC: stable@vger.kernel.org Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christian König authored
Since switching to HMM we always need that because we no longer grab references to the pages. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> CC: stable@vger.kernel.org Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Lyude Paul authored
Now that we've fixed the issue with using the incorrect topology manager, we're actually grabbing the topology manager's lock - and consequently deadlocking. Luckily for us though, there's actually nothing in AMD's DSC state computation code that really should need this lock. The one exception is the mutex_lock() in dm_dp_mst_is_port_support_mode(), however we grab no locks beneath &mgr->lock there so that should be fine to leave be. Gitlab issue: https://gitlab.freedesktop.org/drm/amd/-/issues/2171Signed-off-by: Lyude Paul <lyude@redhat.com> Fixes: 8c20a1ed ("drm/amd/display: MST DSC compute fair share") Cc: <stable@vger.kernel.org> # v5.6+ Reviewed-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Lyude Paul authored
This bug hurt me. Basically, it appears that we've been grabbing the entirely wrong mutex in the MST DSC computation code for amdgpu! While we've been grabbing: amdgpu_dm_connector->mst_mgr That's zero-initialized memory, because the only connectors we'll ever actually be doing DSC computations for are MST ports. Which have mst_mgr zero-initialized, and instead have the correct topology mgr pointer located at: amdgpu_dm_connector->mst_port->mgr; I'm a bit impressed that until now, this code has managed not to crash anyone's systems! It does seem to cause a warning in LOCKDEP though: [ 66.637670] DEBUG_LOCKS_WARN_ON(lock->magic != lock) This was causing the problems that appeared to have been introduced by: commit 4d07b0bc ("drm/display/dp_mst: Move all payload info into the atomic state") This wasn't actually where they came from though. Presumably, before the only thing we were doing with the topology mgr pointer was attempting to grab mst_mgr->lock. Since the above commit however, we grab much more information from mst_mgr including the atomic MST state and respective modesetting locks. This patch also implies that up until now, it's quite likely we could be susceptible to race conditions when going through the MST topology state for DSC computations since we technically will not have grabbed any lock when going through it. So, let's fix this by adjusting all the respective code paths to look at the right pointer and skip things that aren't actual MST connectors from a topology. Gitlab issue: https://gitlab.freedesktop.org/drm/amd/-/issues/2171Signed-off-by: Lyude Paul <lyude@redhat.com> Fixes: 8c20a1ed ("drm/amd/display: MST DSC compute fair share") Cc: <stable@vger.kernel.org> # v5.6+ Reviewed-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Lyude Paul authored
Looks like that we're accidentally dropping a pretty important return code here. For some reason, we just return -EINVAL if we fail to get the MST topology state. This is wrong: error codes are important and should never be squashed without being handled, which here seems to have the potential to cause a deadlock. Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Wayne Lin <Wayne.Lin@amd.com> Fixes: 8ec04671 ("drm/dp_mst: Add helper to trigger modeset on affected DSC MST CRTCs") Cc: <stable@vger.kernel.org> # v5.6+ Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Lyude Paul authored
It appears that amdgpu makes the mistake of completely ignoring the return values from the DP MST helpers, and instead just returns a simple true/false. In this case, it seems to have come back to bite us because as a result of simply returning false from compute_mst_dsc_configs_for_state(), amdgpu had no way of telling when a deadlock happened from these helpers. This could definitely result in some kernel splats. V2: * Address Wayne's comments (fix another bunch of spots where we weren't passing down return codes) Signed-off-by: Lyude Paul <lyude@redhat.com> Fixes: 8c20a1ed ("drm/amd/display: MST DSC compute fair share") Cc: Harry Wentland <harry.wentland@amd.com> Cc: <stable@vger.kernel.org> # v5.6+ Reviewed-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Roman Li authored
[Why] Assert on non-OK response from SMU is unnecessary. It was replaced with respective log message on other asics in the past with commit: "drm/amd/display: Removing assert statements for Linux" [How] Remove assert and add dbg logging as on other DCNs. Signed-off-by: Roman Li <roman.li@amd.com> Reviewed-by: Saaem Rizvi <SyedSaaem.Rizvi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Alex Deucher authored
Move the new callback outside of the guard. Fixes: dc55b106 ("drm/amd/display: Disable phantom OTG after enable for plane disable") CC: Alvin Lee <Alvin.Lee2@amd.com> CC: Alan Liu <HaoPing.Liu@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 15 Nov, 2022 16 commits
-
-
Christian König authored
The state of VRAM is unreliable due to a PCI event like AER, link reset or DPC. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christian König authored
Re-submitting IBs by the kernel has many problems because pre- requisite state is not automatically re-created as well. In other words neither binary semaphores nor things like ring buffer pointers are in the state they should be when the hardware starts to work on the IBs again. Additional to that even after more than 5 years of developing this feature it is still not stable and we have massively problems getting the reference counts right. As discussed with user space developers this behavior is not helpful in the first place. For graphics and multimedia workloads it makes much more sense to either completely re-create the context or at least re-submitting the IBs from userspace. For compute use cases re-submitting is also not very helpful since userspace must rely on the accuracy of the result. Because of this we stop this practice and instead just properly note that the fence submission was canceled. The only use case we keep the re-submission for now is SRIOV and function level resets. v2: as suggested by Sshaoyun stop resubmitting jobs even for SRIOV Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Christian König authored
This reverts commit e6c6338f. This feature basically re-submits one job after another to figure out which one was the one causing a hang. This is obviously incompatible with gang-submit which requires that multiple jobs run at the same time. It's also absolutely not helpful to crash the hardware multiple times if a clean recovery is desired. For testing and debugging environments we should rather disable recovery alltogether to be able to inspect the state with a hw debugger. Additional to that the sw implementation is clearly buggy and causes reference count issues for the hardware fence. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Dan Carpenter authored
In the PP_OD_EDIT_VDDC_CURVE case the "input_index" variable is capped at 2 but not checked for negative values so it results in an out of bounds read. This value comes from the user via sysfs. Fixes: d5bf2653 ("drm/amd/powerplay: added vega20 overdrive support V3") Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Paulo Miguel Almeida authored
One-element arrays are deprecated, and we are replacing them with flexible array members instead. So, replace one-element array with flexible-array member in structs ATOM_I2C_VOLTAGE_OBJECT_V3, ATOM_ASIC_INTERNAL_SS_INFO_V2, ATOM_ASIC_INTERNAL_SS_INFO_V3, and refactor the rest of the code accordingly. Important to mention is that doing a build before/after this patch results in no functional binary output differences. This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines on memcpy() and help us make progress towards globally enabling -fstrict-flex-arrays=3 [1]. Link: https://github.com/KSPP/linux/issues/79 Link: https://github.com/KSPP/linux/issues/238 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836 [1] Signed-off-by: Paulo Miguel Almeida <paulo.miguel.almeida.rodenas@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Evan Quan authored
The workaround designed for some specific ASICs is wrongly applied to SMU13 ASICs. That leads to some runpm hang. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Evan Quan authored
Enable SMU13.0.7 runpm support. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Evan Quan authored
Enable SMU13.0.0 runpm support. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
YiPeng Chai authored
Amdgpu_ras_set_error_query_ready is called at the start of amdgpu_device_gpu_recover to disable query ras error, but the code behind only enables query ras error in full reset path, but not in soft reset path, emergency restart path and skip the hardware reset path. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Alex Deucher authored
If we enable virtual display functionality on parts with no display hardware we can end up trying to check for and reserve the vbios FB area on devices where it doesn't exist. Check if display hardware is actually present on the hardware before trying to reserve the memory. v2: move the check into common code Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Eric Huang authored
It is to resolve a regression, which fails to allocate VRAM due to no free memory in application, the reason is we add check of vram_pin_size for memory limit, and application is pinning the memory for Peerdirect, KFD should not count it in memory limit. So removing vram_pin_size will resolve it. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Guchun Chen authored
Otherwise, some unexpected PCIE AER errors will be observed in runtime suspend/resume cycle. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
YiPeng Chai authored
Add umc channel index mapping table for umc_v8_10. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Perry Yuan authored
change the vangogh family to use IP discovery path to initialize IP list, this needs to remove the DID from the PCI ID list to allow the IP discovery path to set all the IP versions correctly. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Perry Yuan <Perry.Yuan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Perry Yuan authored
Use ip versions (10,3,1) to match the GPU after Vangogh switched to use IP discovery path. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Perry Yuan <Perry.Yuan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Perry Yuan authored
Add the missing apu flag for Vangogh when using IP discovery code path to initialize IPs Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Perry Yuan <Perry.Yuan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-