Commits · cfb83b1d9c38c29c3c89e8d242b8e7f0148d6c09 · nexedi / linux

04 Dec, 2017 40 commits

drm/amdgpu:fix gpu recover missing skipping(v2) · cfb83b1d

Monk Liu authored Nov 08, 2017

If an app closes its CTX right after IB submission, gpu recover
fails to find the entity behind the guilty job, so the guilty
job is never skipped.

To fix this corner case, move the increment of
job->karma out of the entity iteration.

v2:
only increment karma if bad->s_priority != KERNEL,
because we always consider a KERNEL job to be correct and always
want to recover an unfinished kernel job (sometimes a kernel
job is interrupted by VF FLR or another GPU hang event)
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-By: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
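
The ordering described in this commit can be sketched roughly as below. This is a simplified illustration, not the driver code: the `sketch_*` types, the `hang_limit` parameter, and matching entities by fence context are assumptions made for the example. The point is only that the karma increment happens before, and independently of, the entity lookup, and that kernel-priority jobs are never penalised.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the scheduler types. */
enum sketch_priority { SKETCH_PRIO_NORMAL, SKETCH_PRIO_KERNEL };

struct sketch_entity {
	unsigned long fence_context;
	bool guilty;
};

struct sketch_job {
	unsigned long fence_context;
	enum sketch_priority s_priority;
	int karma;
};

/*
 * Handle a hung job. Returns true if the job's karma now exceeds
 * hang_limit, i.e. the job should be skipped on resubmission.
 */
static bool sketch_handle_hang(struct sketch_job *bad,
			       struct sketch_entity *entities, size_t n,
			       int hang_limit)
{
	size_t i;

	/* Kernel jobs are always considered correct: leave karma alone. */
	if (bad->s_priority == SKETCH_PRIO_KERNEL)
		return false;

	/* Increment outside the entity loop, so it cannot be missed. */
	bad->karma++;
	if (bad->karma <= hang_limit)
		return false;

	/* The lookup may find nothing if the context was already closed. */
	for (i = 0; i < n; i++) {
		if (entities[i].fence_context == bad->fence_context) {
			entities[i].guilty = true;
			break;
		}
	}
	return true;
}
```

With an empty entity list (the "context already closed" case), a normal-priority job still has its karma raised, while a kernel-priority job is left untouched.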

drm/amdgpu:read VRAMLOST from gim · 75bc6099

Monk Liu authored Oct 30, 2017

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: bypass FB resizing for SRIOV VF · 0c03b912

pding authored Nov 07, 2017

It introduces 900ms of latency in exclusive mode, which causes driver
loading to fail. The host can resize the BAR before the guest starts,
so the resizing is not necessary here.
Signed-off-by: Pixel Ding <Pixel.Ding@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: release exclusive mode after hw_init · c6332b97

pding authored Nov 06, 2017

Signed-off-by: pding <Pixel.Ding@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: initialise kfd inside amdgpu_device_init · 1884734a

pding authored Nov 06, 2017

Also finalize kfd inside amdgpu_device_fini. kfd device_init needs
SRIOV exclusive access. Gather the exclusive accesses together to
reduce the time consumed.
Signed-off-by: pding <Pixel.Ding@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: don't use ttm_bo_move_ttm in amdgpu_ttm_bind v2 · 40575732

Christian König authored Oct 26, 2017

Just allocate the GART space and fill it.

This prevents forcing the BO to be idle.

v2: don't unbind/bind at all, just fill the allocated GART space
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: rename amdgpu_ttm_bind to amdgpu_ttm_alloc_gart · c5835bbb

Christian König authored Oct 27, 2017

We actually don't bind here, but rather allocate GART space if necessary.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: switch to use new SOC15 reg read/write macros for soc15 ih · b2b7e457

Hawking Zhang authored Nov 02, 2017

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: resize VRAM BAR for CPU access v6 · d6895ad3

Christian König authored Feb 28, 2017

Try to resize BAR0 to let CPU access all of VRAM.

v2: rebased, style cleanups, disable mem decode before resize,
    handle gmc_v9 as well, round size up to power of two.
v3: handle gmc_v6 as well, release and reassign all BARs in the driver.
v4: rename new function to amdgpu_device_resize_fb_bar,
    reenable mem decoding only if all resources are assigned.
v5: reorder resource release, return -ENODEV instead of BUG_ON().
v6: squash in rebase fix
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
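
The "round size up to power of two" step from v2 can be illustrated with a small standalone sketch. The helper names are hypothetical and this is not the driver's code; the size-code mapping assumes the PCI Resizable BAR convention in which code 0 means 1 MB and each increment doubles the size.

```c
#include <stdint.h>

/*
 * Round a byte count up to the next power of two.
 * Returns 0 if the result would not fit in 64 bits.
 */
static uint64_t sketch_roundup_pow2(uint64_t size)
{
	uint64_t p = 1;

	while (p < size && p != 0)
		p <<= 1;	/* shifting past bit 63 leaves p == 0 */
	return p;
}

/*
 * Map a power-of-two BAR size in bytes to a resizable-BAR size code,
 * assuming code 0 == 1 MB, code 1 == 2 MB, and so on.
 */
static int sketch_rebar_size_code(uint64_t pow2_bytes)
{
	int code = 0;

	while ((UINT64_C(1) << (code + 20)) < pow2_bytes)
		code++;
	return code;
}
```

For example, a card with 3 GB of VRAM would need a 4 GB BAR, which under the assumed encoding corresponds to size code 12.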

drm/amdgpu: refine SR-IOV firmware VRAM reservation to protect data · 3c738893

Horace Chen authored Nov 01, 2017

The previous solution creates a zero-filled buffer in the system
domain and then moves the zeroes to VRAM. This destroys the
original data in VRAM.

Refine the code to create the bo in the VRAM domain directly and then remove
and re-create the mem node at the exact position before bo_pin. This
avoids destroying the data and does not cause eviction.
Signed-off-by: Horace Chen <horace.chen@amd.com>
Reviewed-by: monk liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: retry init if exclusive mode request is failed · 5ffa61c1

pding authored Oct 30, 2017

This happens when the hypervisor fails to handle the request; one known
issue is an MMIO unblocking timeout. In theory we can retry init here.
Signed-off-by: pding <Pixel.Ding@amd.com>
Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
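
A bounded retry of this kind might look like the sketch below. Everything here is an assumption for illustration: the callback type, the use of -EAGAIN as the "request failed, try again" signal, and the retry limit are not taken from the commit.

```c
#include <errno.h>

/* Hypothetical init callback: returns 0 on success, negative errno on error. */
typedef int (*sketch_init_fn)(void *ctx);

/*
 * Call init() and retry while it reports -EAGAIN, at most max_retries
 * additional times. Any other result is returned immediately.
 */
static int sketch_init_with_retry(sketch_init_fn init, void *ctx,
				  int max_retries)
{
	int ret;
	int attempt = 0;

	do {
		ret = init(ctx);
	} while (ret == -EAGAIN && ++attempt <= max_retries);

	return ret;
}

/* Demo init: fails with -EAGAIN until the counter in ctx reaches zero. */
static int sketch_flaky_init(void *ctx)
{
	int *remaining_failures = ctx;

	if (*remaining_failures > 0) {
		(*remaining_failures)--;
		return -EAGAIN;
	}
	return 0;
}
```

Bounding the number of retries keeps a persistently failing hypervisor request from stalling driver load indefinitely.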

drm/amdgpu: return error when sriov access requests get timeout · f4711033

pding authored Oct 30, 2017

Reported-by: Sun Gary <Gary.Sun@amd.com>
Signed-off-by: pding <Pixel.Ding@amd.com>
Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Remove fb_location parameter from get_fb_info · 9817d5f5

Michel Dänzer authored Oct 26, 2017

It's dead code.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

amdgpu: Remove AMDGPU_{HPD,CRTC_IRQ,PAGEFLIP_IRQ}_LAST · 8fb0450c

Michel Dänzer authored Oct 24, 2017

Not used anymore.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Use real number of CRTCs and HPDs in set_irq_funcs · c8dd5715

Michel Dänzer authored Oct 24, 2017

Corresponding to the previous non-DC change.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

amdgpu/dce: Use actual number of CRTCs and HPDs in set_irq_funcs · d794b9f8

Michel Dänzer authored Oct 24, 2017

Hardcoding the maximum numbers could result in spurious error messages
from the IRQ state callbacks, e.g. on Polaris 11/12:

[drm:dce_v11_0_set_pageflip_irq_state [amdgpu]] *ERROR* invalid pageflip crtc 5
[drm:amdgpu_irq_disable_all [amdgpu]] *ERROR* error disabling interrupt (-22)
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
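
The failure mode can be reproduced in a small standalone sketch. The names below are hypothetical and the structures are heavily simplified; the sketch only shows why registering the hardcoded maximum number of IRQ types produces an -EINVAL (-22) for a CRTC index the ASIC does not have, while registering the actual count does not.

```c
#include <errno.h>

#define SKETCH_MAX_CRTCS 6	/* hypothetical hardcoded maximum */

struct sketch_irq_src {
	unsigned int num_types;	/* number of IRQ types registered */
};

/* State callback: rejects CRTC indices the ASIC does not have. */
static int sketch_set_pageflip_irq_state(unsigned int num_crtc,
					 unsigned int type)
{
	if (type >= num_crtc)
		return -EINVAL;	/* "invalid pageflip crtc" */
	return 0;
}

/* Disable every registered type and count how many calls failed. */
static int sketch_disable_all(const struct sketch_irq_src *src,
			      unsigned int num_crtc)
{
	unsigned int type;
	int errors = 0;

	for (type = 0; type < src->num_types; type++) {
		if (sketch_set_pageflip_irq_state(num_crtc, type) < 0)
			errors++;
	}
	return errors;
}
```

On an ASIC with five CRTCs, a source registered with `SKETCH_MAX_CRTCS` types fails once (for index 5), which matches the index in the log message quoted in the commit; a source registered with five types does not fail.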

drm/amd/display: Move conn_state to header · b3734397

Harry Wentland authored Oct 19, 2017

We'll need it in amdgpu_dm_mst_types.c as well.
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: dal 3.1.10 · 2d7d273d

Tony Cheng authored Oct 25, 2017

Signed-off-by: Tony Cheng <tony.cheng@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: correct DP is always in full range or bt609 · 603b83ba

Charlene Liu authored Oct 24, 2017

Signed-off-by: Charlene Liu <charlene.liu@amd.com>
Reviewed-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: fix bug from last commit for hubbub · 75dbba34

Yue Hin Lau authored Oct 24, 2017

fix memory leak
Signed-off-by: Yue Hin Lau <Yuehin.Lau@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Move hdr_metadata from plane to stream · 56ef6ed9

Anthony Koo authored Oct 23, 2017

Need to move HDR Metadata from Surface to Stream since there is only one
infoframe possible per stream.

Also cleaning up some duplicate definitions.
Signed-off-by: Anthony Koo <anthony.koo@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Apply VQ adjustments in MPO case · de4a2967

SivapiriyanKumarasamy authored Oct 19, 2017

Signed-off-by: SivapiriyanKumarasamy <sivapiriyan.kumarasamy@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: create new structure for hubbub · c9ef081d

Yue Hin Lau authored Oct 23, 2017

instantiating new structure hubbub in resource.c
Signed-off-by: Yue Hin Lau <Yuehin.Lau@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: dal 3.1.09 · bcb40a67

Tony Cheng authored Oct 21, 2017

Signed-off-by: Tony Cheng <tony.cheng@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Added disconnect dchub. · 1dbac201

Yongqiang Sun authored Oct 21, 2017

Add a disable-ttu interface to dcn10; when removing
mpc, disable ttu as well.
Signed-off-by: Yongqiang Sun <yongqiang.sun@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: dal 3.1.08 · d75aee4b

Tony Cheng authored Oct 20, 2017

Signed-off-by: Tony Cheng <tony.cheng@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Not reset front end when program back end. · 74707de3

Yongqiang Sun authored Oct 17, 2017

Since front end is programmed before back end programming,
no need to reset front end in back end programming.
Signed-off-by: Yongqiang Sun <yongqiang.sun@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Power down front end in init_hw. · 7a5086a7

Yongqiang Sun authored Oct 20, 2017

The front end is initialized during init_hw, but not
power gated. Some leftover values remain and
cause some diags tests to fail. Power gating all front
end pipes makes sure every test has the same starting
point.
Signed-off-by: Yongqiang Sun <yongqiang.sun@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Reject PPLib clock values if they are invalid · 00893681

Andrew Jiang authored Oct 19, 2017

We should be sticking with the default clock values if the values
obtained from PPLib are bogus.
Signed-off-by: Andrew Jiang <Andrew.Jiang@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: create new files for hubbub functions · 62d591a8

Yue Hin Lau authored Oct 18, 2017

moving hubbub functions to new file
Signed-off-by: Yue Hin Lau <Yuehin.Lau@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Complete TODO item: use new DRM iterator · e1fc2dca

Leo (Sunpeng) Li authored Oct 18, 2017

Abandon new_crtcs array and use for_each_new iterator to acquire new
crtcs.
Signed-off-by: Leo (Sunpeng) Li <sunpeng.li@amd.com>
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix styling of freesync code in commit_tail · 8b8f27f9

Leo (Sunpeng) Li authored Oct 18, 2017

For better readability.
Signed-off-by: Leo (Sunpeng) Li <sunpeng.li@amd.com>
Reviewed-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: move GART recovery into GTT manager v2 · c1c7ce8f

Christian König authored Oct 16, 2017

The GTT manager handles the GART address space anyway, so it is
completely pointless to keep the same information around twice.

v2: rebased
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: nuke amdgpu_ttm_is_bound() v2 · 3da917b6

Christian König authored Oct 27, 2017

Rename amdgpu_gtt_mgr_is_allocated() to amdgpu_gtt_mgr_has_gart_addr() and use
that instead.

v2: rename the function as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu:fix random missing of FLR NOTIFY · 34a4d2bf

Monk Liu authored Oct 24, 2017

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/sriov:fix memory leak in psp_load_fw · 77a3c96b

Monk Liu authored Sep 19, 2017

For SR-IOV, this routine shouldn't allocate resources
during gpu reset; otherwise memory is leaked.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu:cleanup ucode_init_bo · 503846e0

Monk Liu authored Oct 17, 2017

1, no sriov check since gpu recover is unified
2, need the CPU_ACCESS_REQUIRED flag for VRAM if SRIOV,
because otherwise after the following PIN the first allocated
VRAM bo is wasted due to some TTM mgr reason.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu:cleanup in_sriov_reset and lock_reset · 13a752e3

Monk Liu authored Oct 17, 2017

Since gpu reset is now unified with gpu_recover
for both bare-metal and SR-IOV:

1) rename in_sriov_reset to in_gpu_reset
2) move lock_reset from adev->virt to adev
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu:implement new GPU recover(v3) · 5740682e

Monk Liu authored Oct 25, 2017

1, the new implementation is named amdgpu_gpu_recover, which gives more hint
on what it does compared with gpu_reset

2, gpu_recover unifies bare-metal and SR-IOV; only the asic reset
part is implemented differently

3, gpu_recover will increase the hung job's karma and mark its entity/context
as guilty if it exceeds the limit

V2:

4, in the scheduler main routine a job from a guilty context will be immediately
fake signaled after it is popped from the queue, and its fence will be set with
the "-ECANCELED" error

5, in the scheduler recovery routine all jobs from the guilty entity will be
dropped

6, in the run_job() routine the real IB submission will be skipped if the @skip parameter
equals true or a VRAM lost occurred.

V3:

7, replace the deprecated gpu reset with the new gpu recover
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

amd/scheduler:imple job skip feature(v3) · 48f05f29

Monk Liu authored Oct 25, 2017

jobs are skipped in two cases
1) when the entity behind the job is marked guilty, the job
popped from this entity's queue will be dropped in the sched_main loop.

2) in job_recovery(), skip scheduling the job if its karma is detected
above the limit, and also skip other jobs sharing the
same fence context. This approach is used because job_recovery() cannot
access job->entity, since the entity may already be dead.

v2:
some logic fixes

v3:
when an entity is detected guilty, don't drop the job in the popping
stage; instead set its fence error to -ECANCELED

in run_job(), skip the scheduling if either: 1) fence->error < 0
or 2) a VRAM LOST occurred on this job.
This way we can unify the job skipping logic.

With this feature we can introduce the new gpu recover feature.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
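
The unified skip decision described in v3 can be sketched as follows. This is a simplified model with hypothetical types, not the scheduler code. In particular, detecting VRAM loss by comparing a counter snapshot taken at submission against the current value is an assumption made for the example.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the fence and job types. */
struct sketch_fence {
	int error;
};

struct sketch_job {
	struct sketch_fence finished;
	unsigned int vram_lost_counter;	/* assumed snapshot at submission */
	bool submitted;
};

/* Popping stage: do not drop the job, only tag its fence. */
static void sketch_pop_job(struct sketch_job *job, bool entity_guilty)
{
	if (entity_guilty)
		job->finished.error = -ECANCELED;
}

/*
 * run_job stage: a single place that decides whether the real
 * submission happens. Returns true if the job was submitted.
 */
static bool sketch_run_job(struct sketch_job *job,
			   unsigned int current_vram_lost_counter)
{
	bool skip = job->finished.error < 0 ||
		    job->vram_lost_counter != current_vram_lost_counter;

	job->submitted = !skip;
	return job->submitted;
}
```

Keeping the decision in one place means that both the guilty-entity path and the VRAM-lost path end up skipping the job the same way.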
