Commits · 73490d26588443ba95cfcca00b6ac2267718fcdd · Kirill Smelkov / linux

23 Sep, 2021 40 commits

drm/amdgpu: Consolidate RAS cmd warning messages · 73490d26

John Clements authored Sep 23, 2021

Explicity post warning if cmd is issued against unsupported IP

Update to latest RAS TA interface
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

73490d26

drm/amdkfd: fix svm_migrate_fini warning · 22f4f4fa

Philip Yang authored Sep 20, 2021

Device manager releases device-specific resources when a driver
disconnects from a device, devm_memunmap_pages and
devm_release_mem_region calls in svm_migrate_fini are redundant.

It causes below warning trace after patch "drm/amdgpu: Split
amdgpu_device_fini into early and late", so remove function
svm_migrate_fini.

BUG: https://gitlab.freedesktop.org/drm/amd/-/issues/1718

WARNING: CPU: 1 PID: 3646 at drivers/base/devres.c:795
devm_release_action+0x51/0x60
Call Trace:
    ? memunmap_pages+0x360/0x360
    svm_migrate_fini+0x2d/0x60 [amdgpu]
    kgd2kfd_device_exit+0x23/0xa0 [amdgpu]
    amdgpu_amdkfd_device_fini_sw+0x1d/0x30 [amdgpu]
    amdgpu_device_fini_sw+0x45/0x290 [amdgpu]
    amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
    drm_dev_release+0x20/0x40 [drm]
    release_nodes+0x196/0x1e0
    device_release_driver_internal+0x104/0x1d0
    driver_detach+0x47/0x90
    bus_remove_driver+0x7a/0xd0
    pci_unregister_driver+0x3d/0x90
    amdgpu_exit+0x11/0x20 [amdgpu]
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

22f4f4fa

drm/amdkfd: handle svm migrate init error · 586d71a4

Philip Yang authored Sep 17, 2021

If svm migration init failed to create pgmap for device memory, set
pgmap type to 0 to disable device SVM support capability.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

586d71a4

drm/amdgpu: Updated RAS infrastructure · 640ae42e

John Clements authored Sep 22, 2021

Update RAS infrastructure to support RAS query for MCA subblocks
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

640ae42e

drm/amdgpu: move amdgpu_virt_release_full_gpu to fini_early stage · 6effad8a

Guchun Chen authored Sep 18, 2021

adev->rmmio is set to be NULL in amdgpu_device_unmap_mmio to prevent
access after pci_remove, however, in SRIOV case, amdgpu_virt_release_full_gpu
will still use adev->rmmio for access after amdgpu_device_unmap_mmio.
The patch is to move such SRIOV calling earlier to fini_early stage.

Fixes: 07775fc1 ("drm/amdgpu: Unmap all MMIO mappings")
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

6effad8a

drm/amd/display: Fix wrong format specifier in amdgpu_dm.c · 655c167e

Hayden Goodfellow authored Sep 12, 2021

[Why]
Currently, the 32bit kernel build fails due to an incorrect string
format specifier. ARRAY_SIZE() returns size_t type as it uses sizeof().
However, we specify it in a string as %ld. This causes a compiler error
and causes the 32bit build to fail.

[How]
Change the %ld to %zu as size_t (which sizeof() returns) is an unsigned
integer data type. We use 'z' to ensure it also works with 64bit build.
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Hayden Goodfellow <Hayden.Goodfellow@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

655c167e

drm/amd/display: 3.2.154 · c719b0cd

Aric Cyr authored Sep 13, 2021

This new DC version brings improvements in the following areas:
- New firmware version
- Fix HPD problems on DCN2
- Fix generic encoder problems and null deferences
- Adjust DCN301 watermark
- Rework dynamic bpp for DCN3x
- Improve link training fallback logic
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c719b0cd

drm/amd/display: [FW Promotion] Release 0.0.84 · 2800ff0e

Anthony Koo authored Sep 12, 2021

Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Anthony Koo <Anthony.Koo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2800ff0e

drm/amd/display: Fix null pointer dereference for encoders · 60f39edd

Jimmy Kizito authored Sep 12, 2021

[Why]
Links which are dynamically assigned link encoders have their link
encoder set to NULL.

[How]
Check that a pointer to a link_encoder object is non-NULL before using
it.
Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Jimmy Kizito <Jimmy.Kizito@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

60f39edd

drm/amd/display: Creating a fw boot options bit for an upcoming feature · 39371f7d

Meenakshikumar Somasundaram authored Sep 10, 2021

[Why]
Need a bit for x86 driver to enable a FW boot option for an upcoming
feature.

[How]
Added a bit in dmub_fw_boot_options for an upcoming feature.
Reviewed-by: Jimmy Kizito <jimmy.kizito@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

39371f7d

drm/amd/display: DIG mapping change is causing a blocker · 05408f24

Liu, Zhan authored Sep 10, 2021

[Why]
DIG mapping change is causing a blocker

[How]
Revert the change for now. We will re-implement it later.
Reviewed-by: Jimmy Kizito <jimmy.kizito@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Zhan Liu <Zhan.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

05408f24

drm/amd/display: Fix B0 USB-C DP Alt mode · bdd1a21b

Liu, Zhan authored Sep 09, 2021

[Why]
Starting from B0, along with RDPCSTX, RDPCSPIPE registers are also used.

[How]
Make sure RDPCSPIPE registers are programmed correctly.
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Zhan Liu <Zhan.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

bdd1a21b

drm/amd/display: Disable mem low power for CM HW block on DCN3.1 · 5d694266

Michael Strauss authored Sep 08, 2021

[WHY]
Currently causes visible flicker in some scenarios on OLED eDPs
Reviewed-by: Haonan Wang <haonan.wang2@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5d694266

drm/amd/display: Fix issue with dynamic bpp change for DCN3x · 253a5591

Guo, Bing authored Aug 24, 2021

Why:
Screen sometimes would have artifacts or blink once at the time when bpp
is dynamically changed.

How:
1. Changed to update PPS infopacket in frame mode instead of immediate mode
   since other updates for bpp change are double-buffered.
2. Changed double-buffering enablement programming for DCN30 as advised by
ASIC team
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Bing Guo <Bing.Guo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

253a5591

drm/amd/display: Use adjusted DCN301 watermarks · 808643ea

Nikola Cornij authored Sep 07, 2021

[why]
If DCN30 watermark calc is used for DCN301, the calculated values are
wrong due to the data structure mismatch between DCN30 and DCN301.
However, using the original DCN301 watermark values causes underflow.

[how]
- Add DCN21-style watermark calculations
- Adjust DCN301 watermark values to remove the underflow
Reviewed-by: Zhan Liu <zhan.liu@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Nikola Cornij <nikola.cornij@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

808643ea

drm/amd/display: Added power down on boot for DCN3 · f777bb9a

Lai, Derek authored Sep 03, 2021

[Why]
The change of setting a timer callback on boot for 10 seconds is still
working, just lost power down on boot and power down for DCN3.

[How]
Added power down on boot and power down for DCN3.
Reviewed-by: Anthony Koo <anthony.koo@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Derek Lai <Derek.Lai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f777bb9a

drm/amd/display: Fix dynamic encoder reassignment · 0d4b4253

Jimmy Kizito authored Sep 02, 2021

[Why]
Incorrect encoder assignments were being used while applying a new state
to hardware.

(1) When committing a new state to hardware requires resetting the
back-end, the encoder assignments of the current or old state should be
used when disabling the back-end; and the encoder assignments for the
next or new state should be used when re-enabling the back-end.

(2) Link training on hot plug could take over an encoder already in use
by another stream without first disabling it.

[How]

(1) Introduce a resource context 'link_enc_cfg_context' which includes:
- a mode to indicate when transitioning from current to next state.
- transient encoder assignments to use during this state transition.

Update the encoder configuration interface to respond to queries about
encoder assignment based on the mode of operation.

(2) Check if an encoder is already in use before attempting to perform
link training on hot plug.
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Jimmy Kizito <Jimmy.Kizito@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0d4b4253

drm/amd/display: Fix concurrent dynamic encoder assignment · b3492ed1

Jimmy Kizito authored Aug 28, 2021

[Why]
Trying to enable multiple displays simultaneously exposed shortcomings
with the algorithm for dynamic link encoder assignment.

The main problems were:
- Assuming stream order remained constant across states would sometimes
lead to invalid DIG encoder assignment.
- Incorrect logic for deciding whether or not a DIG could support a
stream would also sometimes lead to invalid DIG encoder assignment.
- Changes in encoder assignment were wholesale while updating of the
pipe backend is incremental. This would lead to the hardware state not
matching the software state even with valid encoder assignments.

[How]

The following changes fix the identified problems.
- Use stream pointer rather than stream index to track streams across
states.
- Fix DIG compatibility check by examining the link signal type rather
than the stream signal type.
- Modify assignment algorithm to make incremental updates so software
and hardware states remain coherent.

Additionally:
- Add assertions and an encoder assignment validation function
link_enc_cfg_validate() to detect potential problems with encoder
assignment closer to their root cause.
- Reduce the frequency with which the assignment algorithm is executed.
It should not be necessary for fast state validation.
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Jimmy Kizito <Jimmy.Kizito@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b3492ed1

drm/amd/display: Fix link training fallback logic · 4de0bfe6

Jimmy Kizito authored Aug 25, 2021

[Why]
Link training should fail if stream bandwidth exceeds link bandwidth.

[How]
Correct fallback logic and use named variables to make intention clear.
Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Jimmy Kizito <Jimmy.Kizito@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4de0bfe6

drm/amd/display: Fix DCN3 B0 DP Alt Mapping · 4b7786d8

Liu, Zhan authored Sep 02, 2021

[Why]
DCN3 B0 has a mux, which redirects PHYC and PHYD to PHYF and PHYG.

[How]
Fix DIG mapping.
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Zhan Liu <Zhan.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4b7786d8

drm/amd/display: 3.2.153 · d51fc42a

Aric Cyr authored Sep 06, 2021

Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d51fc42a

drm/amd/display: [FW Promotion] Release 0.0.83 · 13d463ec

Anthony Koo authored Sep 04, 2021

Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Anthony Koo <Anthony.Koo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

13d463ec

drm/amd/display: Extend w/a for hard hang on HPD to dcn20 · 1bd3bc74

Qingqing Zhuo authored Aug 25, 2021

[Why]
HPD disable and enable sequences are not mutually exclusive on Linux.
For HPDs that spans under 1s (i.e. HPD low = 1s), part of the disable
sequence (specifically, a request to SMU to lower refclk) could come
right before the call to PHY enablement, causing DMUB to access an
irresponsive PHY and thus a hard hang on the system.

[How]
Disable 48mhz refclk off when there is any HPD status in connected state
for dcn20.
Reviewed-by: Hersen Wu <hersenxs.wu@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1bd3bc74

drm/amd/display: Reduce stack size for dml21_ModeSupportAndSystemConfigurationFull · a62427ef

Harry Wentland authored Sep 13, 2021

[Why & How]
With Werror enabled in the kernel we were failing the clang build since
dml21_ModeSupportAndSystemConfigurationFull's stack frame is 1064 when
building with clang, and exceeding the default 1024 stack frame limit.

The culprit seems to be the Pipe struct, so pull the relevant block
out into its own sub-function.
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a62427ef

drm/amd/display: Allocate structs needed by dcn_bw_calc_rq_dlg_ttu in pipe_ctx · 1f2fcc81

Harry Wentland authored Sep 07, 2021

[Why & How]
dcn_bw_calc_rq_dlg_ttu uses a stack frame great than 1024. To solve this
we could allocate the rq_param, dlg_sys_param, and input structs
dynamically. Since this function is inside a kernel_fpu_begin()/end()
call we want to avoid memory allocation. Instead it's much
safer to pre-allocate these on the pipe_ctx.
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Fixes: 3fe617cc ("Enable '-Werror' by default for all kernel builds")
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: amd-gfx@lists.freedesktop.org
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Leo Li <sunpeng.li@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Xinhui Pan <Xinhui.Pan@amd.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: llvm@lists.linux.dev
Acked-by: Christian König <christian.koenig@amd.com>
Build-tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1f2fcc81

drm/amd/display: Fix rest of pass-by-value structs in DML · 757af27b

Harry Wentland authored Sep 08, 2021

Passing structs adds a lot of overhead. We don't ever want to pass
anything bigger than primitives by value.

This patch fixes these Coverity IDs:
Addresses-Coverity-ID: 1424031: ("Big parameter passed by value")
Addresses-Coverity-ID: 1424055: ("Big parameter passed by value")
Addresses-Coverity-ID: 1424072: ("Big parameter passed by value")
Addresses-Coverity-ID: 1423779: ("Big parameter passed by value")
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: amd-gfx@lists.freedesktop.org
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Leo Li <sunpeng.li@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Xinhui Pan <Xinhui.Pan@amd.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: llvm@lists.linux.dev
Acked-by: Christian König <christian.koenig@amd.com>
Build-tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

757af27b

drm/amd/display: Pass all structs in display_rq_dlg_helpers by pointer · 4768349e

Harry Wentland authored Sep 08, 2021

Passing structs adds a lot of overhead. We don't ever want to pass
anything bigger than primitives by value.

This patch fixes these Coverity IDs:
Addresses-Coverity-ID: 1423868: ("Big parameter passed by value")
Addresses-Coverity-ID: 1423870: ("Big parameter passed by value")
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: amd-gfx@lists.freedesktop.org
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Leo Li <sunpeng.li@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Xinhui Pan <Xinhui.Pan@amd.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: llvm@lists.linux.dev
Acked-by: Christian König <christian.koenig@amd.com>
Build-tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4768349e

drm/amd/display: Pass display_pipe_params_st as const in DML · 22667e6e

Harry Wentland authored Sep 07, 2021

[Why]
This neither needs to be on the stack nor passed by value
to each function call. In fact, when building with clang
it seems to break the Linux's default 1024 byte stack
frame limit.

[How]
We can simply pass this as a const pointer.

This patch fixes these Coverity IDs
Addresses-Coverity-ID: 1424031: ("Big parameter passed by value")
Addresses-Coverity-ID: 1423970: ("Big parameter passed by value")
Addresses-Coverity-ID: 1423941: ("Big parameter passed by value")
Addresses-Coverity-ID: 1451742: ("Big parameter passed by value")
Addresses-Coverity-ID: 1451887: ("Big parameter passed by value")
Addresses-Coverity-ID: 1454146: ("Big parameter passed by value")
Addresses-Coverity-ID: 1454152: ("Big parameter passed by value")
Addresses-Coverity-ID: 1454413: ("Big parameter passed by value")
Addresses-Coverity-ID: 1466144: ("Big parameter passed by value")
Addresses-Coverity-ID: 1487237: ("Big parameter passed by value")
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Fixes: 3fe617cc ("Enable '-Werror' by default for all kernel builds")
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: amd-gfx@lists.freedesktop.org
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Leo Li <sunpeng.li@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Xinhui Pan <Xinhui.Pan@amd.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: llvm@lists.linux.dev
Acked-by: Christian König <christian.koenig@amd.com>
Build-tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

22667e6e

drm/amdkfd: fix dma mapping leaking warning · e7eb2137

Philip Yang authored Sep 14, 2021

For xnack off, restore work dma unmap previous system memory page, and
dma map the updated system memory page to update GPU mapping, this is
not dma mapping leaking, remove the WARN_ONCE for dma mapping leaking.

prange->dma_addr store the VRAM page pfn after the range migrated to
VRAM, should not dma unmap VRAM page when updating GPU mapping or
remove prange. Add helper svm_is_valid_dma_mapping_addr to check VRAM
page and error cases.

Mask out SVM_RANGE_VRAM_DOMAIN flag in dma_addr before calling amdgpu vm
update to avoid BUG_ON(*addr & 0xFFFF00000000003FULL), and set it again
immediately after. This flag is used to know the type of page later to
dma unmapping system memory page.

Fixes: 1d5dbfe6 ("drm/amdkfd: classify and map mixed svm range pages in GPU")
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e7eb2137

drm/amdkfd: SVM map to gpus check vma boundary · 1aed4828

Philip Yang authored Sep 13, 2021

SVM range may includes multiple VMAs with different vm_flags, if prange
page index is the last page of the VMA offset + npages, update GPU
mapping to create GPU page table with same VMA access permission.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1aed4828

MAINTAINERS: fix up entry for AMD Powerplay · 5ff560cb

Alex Deucher authored Sep 17, 2021

Fix the path to cover both the older powerplay infrastructure
and the newer SwSMU infrastructure.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5ff560cb

drm/amd/display: fix empty debug macros · 7ac80532

Arnd Bergmann authored Sep 20, 2021

Using an empty macro expansion as a conditional expression
produces a W=1 warning:

drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_aux.c: In function 'dce_aux_transfer_with_retries':
drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_aux.c:775:156: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
  775 |                                                                 "dce_aux_transfer_with_retries: AUX_RET_SUCCESS: AUX_TRANSACTION_REPLY_I2C_OVER_AUX_DEFER");
      |                                                                                                                                                            ^
drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_aux.c:783:155: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
  783 |                                                                 "dce_aux_transfer_with_retries: AUX_RET_SUCCESS: AUX_TRANSACTION_REPLY_I2C_OVER_AUX_NACK");
      |                                                                                                                                                           ^

Expand it to "do { } while (0)" instead to make the expression
more robust and avoid the warning.

Fixes: 56aca230 ("drm/amd/display: Add AUX I2C tracing.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

7ac80532

drm/amdgpu: Fix resume failures when device is gone · ebe86a57

Andrey Grodzovsky authored Sep 17, 2021

Problem:
When device goes into suspend and unplugged during it
then all HW programming during resume fails leading
to a bad SW during pci remove handling which follows.
Because device is first resumed and only later removed
we cannot rely on drm_dev_enter/exit here.

Fix:
Use a flag we use for PCIe error recovery to avoid
accessing registres. This allows to successfully complete
pm resume sequence and finish pci remove.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ebe86a57

drm/amdgpu: Fix MMIO access page fault · c03509cb

Andrey Grodzovsky authored Sep 16, 2021

Add more guards to MMIO access post device
unbind/unplug

Bug: https://bugs.archlinux.org/task/72092?project=1&order=dateopened&sort=desc&pagenum=1Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c03509cb

drm/amdgpu: Fix crash on device remove/driver unload · d82e2c24

Andrey Grodzovsky authored Sep 15, 2021

Crash:
BUG: unable to handle page fault for address: 00000000000010e1
RIP: 0010:vega10_power_gate_vce+0x26/0x50 [amdgpu]
Call Trace:
pp_set_powergating_by_smu+0x16a/0x2b0 [amdgpu]
amdgpu_dpm_set_powergating_by_smu+0x92/0xf0 [amdgpu]
amdgpu_dpm_enable_vce+0x2e/0xc0 [amdgpu]
vce_v4_0_hw_fini+0x95/0xa0 [amdgpu]
amdgpu_device_fini_hw+0x232/0x30d [amdgpu]
amdgpu_driver_unload_kms+0x5c/0x80 [amdgpu]
amdgpu_pci_remove+0x27/0x40 [amdgpu]
pci_device_remove+0x3e/0xb0
device_release_driver_internal+0x103/0x1d0
device_release_driver+0x12/0x20
pci_stop_bus_device+0x79/0xa0
pci_stop_and_remove_bus_device_locked+0x1b/0x30
remove_store+0x7b/0x90
dev_attr_store+0x17/0x30
sysfs_kf_write+0x4b/0x60
kernfs_fop_write_iter+0x151/0x1e0

Why:
VCE/UVD had dependency on SMC block for their suspend but
SMC block is the first to do HW fini due to some constraints

How:
Since the original patch was dealing with suspend issues
move the SMC block dependency back into suspend hooks as
was done in V1 of the original patches.
Keep flushing idle work both in suspend and HW fini seuqnces
since it's essential in both cases.

Fixes: 859e4659 ("drm/amdgpu: add missing cleanups for more ASICs on UVD/VCE suspend")
Fixes: bf756fb8 ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d82e2c24

drm/amdgpu: Fix uvd ib test timeout when use pre-allocated BO · 0a226780

xinhui pan authored Sep 16, 2021

Now we use same BO for create/destroy msg. So destroy will wait for the
fence returned from create to be signaled. The default timeout value in
destroy is 10ms which is too short.

Lets wait both fences with the specific timeout.
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0a226780

drm/amdgpu: Put drm_dev_enter/exit outside hot codepath · b2fe31cf

xinhui pan authored Sep 15, 2021

We hit soft hang while doing memory pressure test on one numa system.
After a qucik look, this is because kfd invalid/valid userptr memory
frequently with process_info lock hold.
Looks like update page table mapping use too much cpu time.

perf top says below,
75.81%  [kernel]       [k] __srcu_read_unlock
 6.19%  [amdgpu]       [k] amdgpu_gmc_set_pte_pde
 3.56%  [kernel]       [k] __srcu_read_lock
 2.20%  [amdgpu]       [k] amdgpu_vm_cpu_update
 2.20%  [kernel]       [k] __sg_page_iter_dma_next
 2.15%  [drm]          [k] drm_dev_enter
 1.70%  [drm]          [k] drm_prime_sg_to_dma_addr_array
 1.18%  [kernel]       [k] __sg_alloc_table_from_pages
 1.09%  [drm]          [k] drm_dev_exit

So move drm_dev_enter/exit outside gmc code, instead let caller do it.
They are gart_unbind, gart_map, vm_clear_bo, vm_update_pdes and
gmc_init_pdb0. vm_bo_update_mapping already calls it.
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-and-tested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b2fe31cf

drm/amd/display: Fix crash on device remove/driver unload · 006c26a0

Andrey Grodzovsky authored Sep 15, 2021

Why:
DC core is being released from DM before it's referenced
from hpd_rx wq destruction code.

How: Move hpd_rx destruction before DC core destruction.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

006c26a0

drm/amd/display: Add modifiers capable of DCC image stores for gfx10_3 · 7f6ab50a

Joshua Ashton authored Sep 15, 2021

Some games, ie. Doom Eternal, present from compute following compute
post-fx and would benefit from having DCC image stores available.

DCN on gfx10_3 doesn't need INDEPENDENT_128B_BLOCKS = 0 so we can expose
these modifiers capable of DCC image stores.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

7f6ab50a

drm/amd/display: Handle GFX10_RBPLUS modifiers for dcc_ind_blk · a86396c3

Joshua Ashton authored Sep 15, 2021

Adds the missing logic to set the correct value of dcc_ind_blk for this tiling version.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a86396c3