Commits · bd200d190f45b62c006d1ad0a63eeffd87db7a47 · nexedi / linux

02 Aug, 2019 24 commits

drm/amd/display: Don't replace the dc_state for fast updates · bd200d19

Nicholas Kazlauskas authored Jul 31, 2019

[Why]
DRM private objects have no hw_done/flip_done fencing mechanism on their
own and cannot be used to sequence commits accordingly.

When issuing commits that don't touch the same set of hardware resources
like page-flips on different CRTCs we can run into the issue below
because of this:

1. Client requests non-blocking Commit #1, has a new dc_state #1,
state is swapped, commit tail is deferred to work queue

2. Client requests non-blocking Commit #2, has a new dc_state #2,
state is swapped, commit tail is deferred to work queue

3. Commit #2 work starts, commit tail finishes,
atomic state is cleared, dc_state #1 is freed

4. Commit #1 work starts,
commit tail encounters null pointer deref on dc_state #1

In order to change the DC state as in the private object we need to
ensure that we wait for all outstanding commits to finish and that
any other pending commits must wait for the current one to finish as
well.

We do this for MEDIUM and FULL updates. But not for FAST updates, nor
would we want to since it would cause stuttering from the delays.

FAST updates that go through dm_determine_update_type_for_commit always
create a new dc_state and lock the DRM private object if there are
any changed planes.

We need the old state to validate, but we don't actually need the new
state here.

[How]
If the commit isn't a full update then the use after free can be
resolved by simply discarding the new state entirely and retaining
the existing one instead.

With this change the sequence above can be reexamined. Commit #2 will
still free Commit #1's reference, but before this happens we actually
added an additional reference as part of Commit #2.

If an update comes in during this that needs to change the dc_state
it will need to wait on Commit #1 and Commit #2 to finish. Then it'll
swap the state, finish the work in commit tail and drop the last
reference on Commit #2's dc_state.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204181
Fixes: 004b3938 ("drm/amd/display: Check scaling info when determing update type")
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: David Francis <david.francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

bd200d19

drm/amd/display: Skip determining update type for async updates · 43d10d30

Nicholas Kazlauskas authored Jul 31, 2019

[Why]
By passing through the dm_determine_update_type_for_commit for atomic
commits that can be done asynchronously we are incurring a
performance penalty by locking access to the global private object
and holding that access until the end of the programming sequence.

This is also allocating a new large dc_state on every access in addition
to retaining all the references on each stream and plane until the end
of the programming sequence.

[How]
Shift the determination for async update before validation. Return early
if it's going to be an async update.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: David Francis <david.francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

43d10d30

drm/amd/display: Allow cursor async updates for framebuffer swaps · e16e37ef

Nicholas Kazlauskas authored Jun 10, 2019

[Why]
We previously allowed framebuffer swaps as async updates for cursor
planes but had to disable them due to a bug in DRM with async update
handling and incorrect ref counting. The check to block framebuffer
swaps has been added to DRM for a while now, so this check is redundant.

The real fix that allows this to properly in DRM has also finally been
merged and is getting backported into stable branches, so dropping
this now seems to be the right time to do so.

[How]
Drop the redundant check for old_fb != new_fb.

With the proper fix in DRM, this should also fix some cursor stuttering
issues with xf86-video-amdgpu since it double buffers the cursor.

IGT tests that swap framebuffers (-varying-size for example) should
also pass again.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: David Francis <david.francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e16e37ef

drm/amdgpu: fix unsigned variable instance compared to less than zero · ac4bf4a1

Colin Ian King authored Aug 01, 2019

Currenly the error check on variable instance is always false because
it is a uint32_t type and this is never less than zero. Fix this by
making it an int type.

Addresses-Coverity: ("Unsigned compared against 0")
Fixes: 7d0e6329 ("drm/amdgpu: update more sdma instances irq support")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ac4bf4a1

drm/amd/powerplay: Allow changing of fan_control in smu_v11_0 · f0ced3f6

Matt Coffin authored Jul 31, 2019

[Why]
Before this change, the fan control state on smu_v11 was not able to be
changed because the capability check for checking if the fan control
capability existed was inverted.

[How]
The capability check for fan control in smu_v11_0_auto_fan_control was
inverted, to correctly check for the absence, instead of presence of fan
control capabilities.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Matt Coffin <mcoffin13@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f0ced3f6

drm/amd/powerplay: fix a few spelling mistakes · ab631311

Colin Ian King authored Aug 01, 2019

There are a few spelling mistakes "unknow" -> "unknown" and
"enabeld" -> "enabled". Fix these.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ab631311

gpu: drm: radeon: Fix a possible null-pointer dereference in radeon_connector_set_property() · f3eb9b8f

Jia-Ju Bai authored Jul 29, 2019

In radeon_connector_set_property(), there is an if statement on line 743
to check whether connector->encoder is NULL:
    if (connector->encoder)

When connector->encoder is NULL, it is used on line 755:
    if (connector->encoder->crtc)

Thus, a possible null-pointer dereference may occur.

To fix this bug, connector->encoder is checked before being used.

This bug is found by a static analysis tool STCheck written by us.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f3eb9b8f

drm/amd/powerplay: fix off-by-one upper bounds limit checks · e3bf125b

Colin Ian King authored Aug 01, 2019

There are two occurrances of off-by-one upper bound checking of indexes
causing potential out-of-bounds array reads. Fix these.

Addresses-Coverity: ("Out-of-bounds read")
Fixes: cb33363d ("drm/amd/powerplay: add smu feature name support")
Fixes: 6b294793 ("drm/amd/powerplay: add smu message name support")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e3bf125b

drm/radeon: Fix EEH during kexec · 6f7fe9a9

KyleMahlkuch authored Jul 31, 2019

During kexec some adapters hit an EEH since they are not properly
shut down in the radeon_pci_shutdown() function. Adding
radeon_suspend_kms() fixes this issue.
Signed-off-by: KyleMahlkuch <kmahlkuc@linux.vnet.ibm.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

6f7fe9a9

drm/amdkfd: Extend CU mask to 8 SEs (v3) · 5145d57e

Jay Cornwall authored Jul 18, 2019

Following bitmap layout logic introduced by:
"drm/amdgpu: support get_cu_info for Arcturus".

v2: squash in fixup for gfx_v9_0.c (Alex)
v3: squash in debug print output fix
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5145d57e

drm/amdgpu: support get_cu_info for Arcturus · 857b82d0

Le Ma authored Jul 08, 2019

This change is because SE/SH layout on Arcturus is 8*1, different from
4*2(or 4*1) on Vega ASICs.

Currently the cu bitmap array is 4x4 size, and besides the bitmap is used widely
across SW stack. To mostly reduce the scale of impact, we make the cu bitmap
array compatible with SE/SH layout on Arcturus. Then the store of cu bits of
each shader array for Arcturus will be like below:
    SE0,SH0 --> bitmap[0][0]
    SE1,SH0 --> bitmap[1][0]
    SE2,SH0 --> bitmap[2][0]
    SE3,SH0 --> bitmap[3][0]
    SE4,SH0 --> bitmap[0][1]
    SE5,SH0 --> bitmap[1][1]
    SE6,SH0 --> bitmap[2][1]
    SE7,SH0 --> bitmap[3][1]
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

857b82d0

drm/amdgpu: Fix pcie_bw on Vega20 · 612e4ed9

Kent Russell authored Jul 31, 2019

The registers used for VG20 are different in that certain performance
counters were split off to TXCLK3/4. Vega10/12 doesn't have this, so add
a new vg20_get_pcie_usage to reflect this change.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

612e4ed9

drm/amdgpu: Update NBIO headers to add TXCLK3/4 · 57d352f7

Kent Russell authored Jul 31, 2019

These are added for VG20, and are needed for PCIe bandwidth.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

57d352f7

drm/amdgpu: Add amdgpu_asic_funcs.reset_method for Vega20 · 19ed70ff

Andrey Grodzovsky authored Aug 01, 2019

Fixes GPU reset crash.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

19ed70ff

drm/amdgpu: Mark KFD VRAM allocations for wipe on release · 6856e4b6

Felix Kuehling authored Jul 08, 2019

Memory used by KFD applications can contain sensitive information that
should not be leaked to other processes. The current approach to prevent
leaks is to clear VRAM at allocation time. This is not effective because
memory can be reused in other ways without being cleared. Synchronously
clearing memory on the allocation path also carries a significant
performance penalty.

Stop clearing memory at allocation time. Instead mark the memory for
wipe on release.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

6856e4b6

drm/amdgpu: Implement VRAM wipe on release · ab2f7a5c

Felix Kuehling authored Jul 09, 2019

Wipe VRAM memory containing sensitive data when moving or releasing
BOs. Clearing the memory is pipelined to minimize any impact on
subsequent memory allocation latency. Use of a poison value should
help debug future use-after-free bugs.

When moving BOs, the existing ttm_bo_pipelined_move ensures that the
memory won't be reused before being wiped.

When releasing BOs, the BO is fenced with the memory fill operation,
which results in queuing the BO for a delayed delete.

v2: Move amdgpu_amdkfd_unreserve_memory_limit into
amdgpu_bo_release_notify so that KFD can use memory that's still
being cleared in the background
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ab2f7a5c

drm/amdgpu: Add flag to wipe VRAM on release · d8f4981e

Felix Kuehling authored Jul 08, 2019

This memory allocation flag will be used to indicate BOs containing
sensitive data that should not be leaked to other processes.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d8f4981e

drm/ttm: Add release_notify callback to ttm_bo_driver · 274840e5

Felix Kuehling authored Jul 09, 2019

This notifies the driver that a BO is about to be released.

Releasing a BO also invokes the move_notify callback from
ttm_bo_cleanup_memtype_use, but that happens too late for anything
that would add fences to the BO and require a delayed delete.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

274840e5

drm/amd/display: Use switch table for dc_to_smu_clock_type · d9ec5cfd

Leo Li authored Jul 25, 2019

Using a static int array will cause errors if the given dm_pp_clk_type
is out-of-bounds. For robustness, use a switch table, with a default
case to handle all invalid values.

v2: 0 is a valid clock type for smu_clk_type. Return SMU_CLK_COUNT
    instead on invalid mapping.
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d9ec5cfd

drm/amd/display: Use proper enum conversion functions · d196bbbc

Nathan Chancellor authored Jul 03, 2019

clang warns:

drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_pp_smu.c:336:8:
warning: implicit conversion from enumeration type 'enum smu_clk_type'
to different enumeration type 'enum amd_pp_clock_type'
[-Wenum-conversion]
                                        dc_to_smu_clock_type(clk_type),
                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_pp_smu.c:421:14:
warning: implicit conversion from enumeration type 'enum
amd_pp_clock_type' to different enumeration type 'enum smu_clk_type'
[-Wenum-conversion]
                                        dc_to_pp_clock_type(clk_type),
                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are functions to properly convert between all of these types, use
them so there are no longer any warnings.

Fixes: a43913ea ("drm/amd/powerplay: add function get_clock_by_type_with_latency for navi10")
Fixes: e5e4e223 ("drm/amd/powerplay: add interface to get clock by type with latency for display (v2)")
Link: https://github.com/ClangBuiltLinux/linux/issues/586Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d196bbbc

drm/amdgpu: fix double ucode load by PSP(v3) · 482f0e53

Monk Liu authored Jul 31, 2019

previously the ucode loading of PSP was repreated, one executed in
phase_1 init/re-init/resume and the other in fw_loading routine

Avoid this double loading by clearing ip_blocks.status.hw in suspend or reset
prior to the FW loading and any block's hw_init/resume

v2:
still do the smu fw loading since it is needed by bare-metal

v3:
drop the change in reinit_early_sriov, just clear all block's status.hw
in the head place and set the status.hw after hw_init done is enough
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

482f0e53

drm/amdgpu: fix incorrect judge on sos fw version · 9244d3a6

Monk Liu authored Jul 30, 2019

for SRIOV the SOS fw of PSP is loaded in hypervisor thus
guest won't tell the version of it, and judging feature by
reading the sos fw version in guest side is completely wrong
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

9244d3a6

drm/amdgpu: cleanup vega10 SRIOV code path · 4cd4c5c0

Monk Liu authored Jul 30, 2019

we can simplify all those unnecessary function under
SRIOV for vega10 since:
1) PSP L1 policy is by force enabled in SRIOV
2) original logic always set all flags which make itself
   a dummy step

besides,
1) the ih_doorbell_range set should also be skipped
for VEGA10 SRIOV.
2) the gfx_common registers should also be skipped
for VEGA10 SRIOV.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4cd4c5c0

drm/amd/powerplay: sort feature status index by asic feature id for smu · 67194518

Kevin Wang authored Jul 31, 2019

before this change, the pp_feature sysfs show feature enable state by
logic feature id, it is not easy to read.
this change will sort pp_features show index by asic feature id.

before:
features high: 0x00000623 low: 0xb3cdaffb
00. DPM_PREFETCHER       ( 0) : enabeld
01. DPM_GFXCLK           ( 1) : enabeld
02. DPM_UCLK             ( 3) : enabeld
03. DPM_SOCCLK           ( 4) : enabeld
04. DPM_MP0CLK           ( 5) : enabeld
05. DPM_LINK             ( 6) : enabeld
06. DPM_DCEFCLK          ( 7) : enabeld
07. DS_GFXCLK            (10) : enabeld
08. DS_SOCCLK            (11) : enabeld
09. DS_LCLK              (12) : disabled
10. PPT                  (23) : enabeld
11. TDC                  (24) : enabeld
12. THERMAL              (33) : enabeld
13. RM                   (35) : disabled
......

after:
features high: 0x00000623 low: 0xb3cdaffb
00. DPM_PREFETCHER       ( 0) : enabeld
01. DPM_GFXCLK           ( 1) : enabeld
02. DPM_GFX_PACE         ( 2) : disabled
03. DPM_UCLK             ( 3) : enabeld
04. DPM_SOCCLK           ( 4) : enabeld
05. DPM_MP0CLK           ( 5) : enabeld
06. DPM_LINK             ( 6) : enabeld
07. DPM_DCEFCLK          ( 7) : enabeld
08. MEM_VDDCI_SCALING    ( 8) : enabeld
09. MEM_MVDD_SCALING     ( 9) : enabeld
10. DS_GFXCLK            (10) : enabeld
11. DS_SOCCLK            (11) : enabeld
12. DS_LCLK              (12) : disabled
13. DS_DCEFCLK           (13) : enabeld
......
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

67194518

31 Jul, 2019 16 commits

drm/amdkfd: enable KFD support for navi14 · 9475a77b

Alex Deucher authored Jul 26, 2019

Same as navi10.
Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

9475a77b

drm/amdgpu: disable inject for failed subblocks of gfx · dc4d716d

Dennis Li authored Jul 23, 2019

some subblocks of gfx fail in inject test, disable them
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

dc4d716d

drm/amdgpu: support gfx ras error injection and err_cnt query · 83b0582c

Dennis Li authored Jul 31, 2019

check gfx error count in both ras querry function and
ras interrupt handler.

gfx ras is still disabled by default due to known stability
issue found in gpu reset.
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

83b0582c

drm/amdgpu: add RAS callback for gfx · 2c960ea0

Dennis Li authored Jul 31, 2019

Add functions for RAS error inject and query error counter
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2c960ea0

drm/amdgpu: add define for gfx ras subblock · dc23a08f

Dennis Li authored Jul 19, 2019

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

dc23a08f

drm/amd/include: add define of TCP_EDC_CNT_NEW · 4bb6b8c7

Dennis Li authored Jul 19, 2019

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4bb6b8c7

drm/amd/include: add bitfield define for EDC registers · ca3f422f

Dennis Li authored Jul 19, 2019

Add EDC registers to support VEGA20 RAS
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ca3f422f

drm/amdgpu: remove ras_reserve_vram in ras injection · 7cdc2ee3

Tao Zhou authored Jul 24, 2019

error injection address is not in gpu address space
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Dennis Li <dennis.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

7cdc2ee3

drm/amdgpu: add check for ras error type · e1063493

Tao Zhou authored Jul 23, 2019

only ue and ce errors are supported
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Dennis Li <dennis.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e1063493

drm/amdgpu: update interrupt callback for all ras clients · 81e02619

Tao Zhou authored Jul 22, 2019

add err_data parameter in interrupt cb for ras clients
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Dennis Li <dennis.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

81e02619

drm/amdgpu: allow ras interrupt callback to return error data · cf04dfd0

Tao Zhou authored Jul 22, 2019

add error data as parameter for ras interrupt cb and process it
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Dennis Li <dennis.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

cf04dfd0

drm/amdgpu: query umc ras error address · 8c948103

Tao Zhou authored Jul 24, 2019

query umc ras error address, translate it to gpu 4k page view
and save it.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <dennis.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

8c948103

drm/amdgpu: add structures for umc error address translation · c2742aef

Tao Zhou authored Jul 22, 2019

add related registers, callback function and channel index table
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c2742aef

drm/amdgpu: add support for recording ras error address · 6f102dba

Tao Zhou authored Jul 22, 2019

more than one error address may be recorded in one query
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Dennis Li <dennis.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

6f102dba

drm/amdgpu: update algorithm of umc uncorrectable error counting · f1ed4afa

Tao Zhou authored Jul 23, 2019

remove the check of ErrorCodeExt

v2: refine the if condition for ue counting
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Dennis Li <dennis.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f1ed4afa

drm/amdgpu: switch to amdgpu_umc structure · 045c0216

Tao Zhou authored Jul 23, 2019

create new amdgpu_umc structure to for more umc
settings in future and switch to the new structure
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <dennis.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

045c0216