Commits · c7b0f71237af7e4ceb7bf723cf96b87178c00e54 · Kirill Smelkov / linux

05 Mar, 2019 2 commits

drm/amd/display: Add disable triple buffering DC debug option · c7b0f712

Charlene Liu authored Jan 31, 2019

Added a "disable_tri_buf" DC debug option. When set to 1  feature will
be off.
Signed-off-by: Charlene Liu <charlene.liu@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c7b0f712

drm/amd/display: Use vrr friendly pageflip throttling in DC. · 7b19bba5

Mario Kleiner authored Feb 09, 2019

In VRR mode, keep track of the vblank count of the last
completed pageflip in amdgpu_crtc->last_flip_vblank, as
recorded in the pageflip completion handler after each
completed flip.

Use that count to prevent mmio programming a new pageflip
within the same vblank in which the last pageflip completed,
iow. to throttle pageflips to at most one flip per video
frame, while at the same time allowing to request a flip
not only before start of vblank, but also anywhere within
vblank.

The old logic did the same, and made sense for regular fixed
refresh rate flipping, but in vrr mode it prevents requesting
a flip anywhere inside the possibly huge vblank, thereby
reducing framerate in vrr mode instead of improving it, by
delaying a slightly delayed flip requests up to a maximum
vblank duration + 1 scanout duration. This would limit VRR
usefulness to only help applications with a very high GPU
demand, which can submit the flip request before start of
vblank, but then have to wait long for fences to complete.

With this method a flip can be both requested and - after
fences have completed - executed, ie. it doesn't matter if
the request (amdgpu_dm_do_flip()) gets delayed until deep
into the extended vblank due to cpu execution delays. This
also allows clients which want to regulate framerate within
the vrr range a much more fine-grained control of flip timing,
a feature that might be useful for video playback, and is
very useful for neuroscience/vision research applications.

In regular non-VRR mode, retain the old flip submission
behavior. This to keep flip scheduling for fullscreen X11/GLX
OpenGL clients intact, if they use the GLX_OML_sync_control
extensions glXSwapBufferMscOML(, ..., target_msc,...) function
with a specific target_msc target vblank count.

glXSwapBuffersMscOML() or DRI3/Present PresentPixmap() will
not flip at the proper target_msc for a non-zero target_msc
if VRR mode is active with this patch. They'd often flip one
frame too early. However, this limitation should not matter
much in VRR mode, as scheduling based on vblank counts is
pretty futile/unusable under variable refresh duration
anyway, so no real extra harm is done.

According to some testing already done with this patch by
Nicholas on top of my tests, IGT tests didn't report any
problems. If fixes stuttering and flickering when flipping
at rates below the minimum vrr refresh rate.

Fixes: bb47de73 ("drm/amdgpu: Set FreeSync state using drm VRR
properties")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Michel Dänzer <michel@daenzer.net>
Tested-by: Bruno Filipe <bmilreu@gmail.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

7b19bba5

28 Feb, 2019 17 commits

drm/amdgpu: clear PDs/PTs only after initializing them · 1e293037

Christian König authored Jan 30, 2019

Clear the VM PDs/PTs only after initializing all the structures.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1e293037

drm/amd/display: Pass app_tf by value rather than by reference · 672e78ca

Nathan Chancellor authored Dec 10, 2018

Clang warns when an expression that equals zero is used as a null
pointer constant (in lieu of NULL):

drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:4435:3:
warning: expression which evaluates to zero treated as a null pointer
constant of type 'const enum color_transfer_func *'
[-Wnon-literal-null-conversion]
                TRANSFER_FUNC_UNKNOWN,
                ^~~~~~~~~~~~~~~~~~~~~
1 warning generated.

This warning is caused by commit bb47de73 ("drm/amdgpu: Set FreeSync
state using drm VRR properties") and it could be solved by using NULL
instead of TRANSFER_FUNC_UNKNOWN or casting TRANSFER_FUNC_UNKNOWN as a
pointer. However, after looking into it, there doesn't appear to be a
good reason to pass app_tf by reference as it is never mutated along the
way. This is the only code path in which app_tf is used:

mod_freesync_build_vrr_infopacket ->
    build_vrr_infopacket_v2 ->
        build_vrr_infopacket_fs2_data

Neither mod_freesync_build_vrr_infopacket or build_vrr_infopacket_v2
modify app_tf's value and build_vrr_infopacket_fs2_data expects just
the value so we can avoid dereferencing anything by just passing in
app_tf's value to mod_freesync_build_vrr_infopacket and
build_vrr_infopacket_v2.

There is no functional change because build_vrr_infopacket_fs2_data
doesn't do anything if TRANSFER_FUNC_UNKNOWN is passed to it, the same
as not calling build_vrr_infopacket_fs2_data at all like before this
change when NULL was used for app_tf.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

672e78ca

Revert "drm/amdgpu: use BACO reset on vega20 if platform support" · 7db329e5

Candice Li authored Feb 25, 2019

This reverts commit 2172b89e.
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

7db329e5

drm/amd/powerplay: show the right override pcie parameters · 084a56c7

Evan Quan authored Feb 21, 2019

Instead of the hard-coded ones from VBIOS.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

084a56c7

drm/amd/powerplay: honor the OD settings · 65543b28

Evan Quan authored Feb 20, 2019

Set the soft/hard max settings as max possible to
not violate the OD settings.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

65543b28

drm/amd/powerplay: set default fclk for no fclk dpm support case · f5e79735

Evan Quan authored Feb 20, 2019

Set the default fclk as what we got from VBIOS.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f5e79735

drm/amd/powerplay: support retrieving clock information from other sysplls · 2e41a874

Evan Quan authored Feb 20, 2019

There will be some needs to retrieve clock information from other
sysplls also except default 0.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2e41a874

drm/amd/powerplay: overwrite ODSettingsMin for UCLK_FMAX feature · 3a301bc5

Evan Quan authored Feb 20, 2019

For UCLK_FMAX OD feature, SMU overwrites the highest UCLK DPM level freq.
Therefore it can only take values that are greater than the second highest
DPM level freq.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3a301bc5

drm/amd/powerplay: force FCLK to highest also for 5K or higher displays · 971e7ac1

Evan Quan authored Feb 20, 2019

This can fix possible screen freeze on high resolution displays.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

971e7ac1

drm/amd/powerplay: need to reapply the dpm level settings · d19e9233

Evan Quan authored Feb 20, 2019

As these settings got reset during above phm_apply_clock_adjust_rules.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d19e9233

drm/amd/powerplay: drop redundant soft min/max settings · fe1331a2

Evan Quan authored Feb 20, 2019

As these are already set during apply_clocks_adjust_rules.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

fe1331a2

drm/amdkfd: use init_mqd function to allocate object for hid_mqd (CI) · cac734c2

Kevin Wang authored Feb 22, 2019

if use the legacy method to allocate object, when mqd_hiq need to run
uninit code, it will be cause WARNING call trace.

eg: (s3 suspend test)
[   34.918944] Call Trace:
[   34.918948]  [<ffffffff92961dc1>] dump_stack+0x19/0x1b
[   34.918950]  [<ffffffff92297648>] __warn+0xd8/0x100
[   34.918951]  [<ffffffff9229778d>] warn_slowpath_null+0x1d/0x20
[   34.918991]  [<ffffffffc03ce1fe>] uninit_mqd_hiq_sdma+0x4e/0x50 [amdgpu]
[   34.919028]  [<ffffffffc03d0ef7>] uninitialize+0x37/0xe0 [amdgpu]
[   34.919064]  [<ffffffffc03d15a6>] kernel_queue_uninit+0x16/0x30 [amdgpu]
[   34.919086]  [<ffffffffc03d26c2>] pm_uninit+0x12/0x20 [amdgpu]
[   34.919107]  [<ffffffffc03d4915>] stop_nocpsch+0x15/0x20 [amdgpu]
[   34.919129]  [<ffffffffc03c1dce>] kgd2kfd_suspend.part.4+0x2e/0x50 [amdgpu]
[   34.919150]  [<ffffffffc03c2667>] kgd2kfd_suspend+0x17/0x20 [amdgpu]
[   34.919171]  [<ffffffffc03c103a>] amdgpu_amdkfd_suspend+0x1a/0x20 [amdgpu]
[   34.919187]  [<ffffffffc02ec428>] amdgpu_device_suspend+0x88/0x3a0 [amdgpu]
[   34.919189]  [<ffffffff922e22cf>] ? enqueue_entity+0x2ef/0xbe0
[   34.919205]  [<ffffffffc02e8220>] amdgpu_pmops_suspend+0x20/0x30 [amdgpu]
[   34.919207]  [<ffffffff925c56ff>] pci_pm_suspend+0x6f/0x150
[   34.919208]  [<ffffffff925c5690>] ? pci_pm_freeze+0xf0/0xf0
[   34.919210]  [<ffffffff926b45c6>] dpm_run_callback+0x46/0x90
[   34.919212]  [<ffffffff926b49db>] __device_suspend+0xfb/0x2a0
[   34.919213]  [<ffffffff926b4b9f>] async_suspend+0x1f/0xa0
[   34.919214]  [<ffffffff922c918f>] async_run_entry_fn+0x3f/0x130
[   34.919216]  [<ffffffff922b9d4f>] process_one_work+0x17f/0x440
[   34.919217]  [<ffffffff922bade6>] worker_thread+0x126/0x3c0
[   34.919218]  [<ffffffff922bacc0>] ? manage_workers.isra.25+0x2a0/0x2a0
[   34.919220]  [<ffffffff922c1c31>] kthread+0xd1/0xe0
[   34.919221]  [<ffffffff922c1b60>] ? insert_kthread_work+0x40/0x40
[   34.919222]  [<ffffffff92974c1d>] ret_from_fork_nospec_begin+0x7/0x21
[   34.919224]  [<ffffffff922c1b60>] ? insert_kthread_work+0x40/0x40
[   34.919224] ---[ end trace 38cd9f65c963adad ]---
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

cac734c2

drm/amdgpu: use REG32_PCIE wrapper instead for psp · 76f8f699

Huang Rui authored Feb 25, 2019

This patch uses REG32_PCIE wrapper instead of writting pci_index2 and reading
pci_data2 for psp. This sequence should be protected by pcie_idx_lock.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

76f8f699

drm/amd/powerplay: use REG32_PCIE wrapper instead for powerplay · 5307db85

Huang Rui authored Feb 25, 2019

This patch uses REG32_PCIE wrapper instead of writting pci_index2 and reading
pci_data2 for powerplay. This sequence should be protected by pcie_idx_lock.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5307db85

drm/amd/display: Fix issue with link_active state not correct for MST · 293b9160

Anthony Koo authored Feb 07, 2019

[Why]
For MST, link not disabled until all streams disabled

[How]
Add check for stream_count before setting link_active = false for MST
Signed-off-by: Anthony Koo <Anthony.Koo@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

293b9160

drm/amd/display: Fix reference counting for struct dc_sink. · dcd5fb82

Mathias Fröhlich authored Feb 10, 2019

Reference counting in amdgpu_dm_connector for amdgpu_dm_connector::dc_sink
and amdgpu_dm_connector::dc_em_sink as well as in dc_link::local_sink seems
to be out of shape. Thus make reference counting consistent for these
members and just plain increment the reference count when the variable
gets assigned and decrement when the pointer is set to zero or replaced.
Also simplify reference counting in selected function sopes to be sure the
reference is released in any case. In some cases add NULL pointer check
before dereferencing.
At a hand full of places a comment is placed to stat that the reference
increment happened already somewhere else.

This actually fixes the following kernel bug on my system when enabling
display core in amdgpu. There are some more similar bug reports around,
so it probably helps at more places.

   kernel BUG at mm/slub.c:294!
   invalid opcode: 0000 [#1] SMP PTI
   CPU: 9 PID: 1180 Comm: Xorg Not tainted 5.0.0-rc1+ #2
   Hardware name: Supermicro X10DAi/X10DAI, BIOS 3.0a 02/05/2018
   RIP: 0010:__slab_free+0x1e2/0x3d0
   Code: 8b 54 24 30 48 89 4c 24 28 e8 da fb ff ff 4c 8b 54 24 28 85 c0 0f 85 67 fe ff ff 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 49 3b 5c 24 28 75 ab 48 8b 44 24 30 49 89 4c 24 28 49 89 44
   RSP: 0018:ffffb0978589fa90 EFLAGS: 00010246
   RAX: ffff92f12806c400 RBX: 0000000080200019 RCX: ffff92f12806c400
   RDX: ffff92f12806c400 RSI: ffffdd6421a01a00 RDI: ffff92ed2f406e80
   RBP: ffffb0978589fb40 R08: 0000000000000001 R09: ffffffffc0ee4748
   R10: ffff92f12806c400 R11: 0000000000000001 R12: ffffdd6421a01a00
   R13: ffff92f12806c400 R14: ffff92ed2f406e80 R15: ffffdd6421a01a20
   FS:  00007f4170be0ac0(0000) GS:ffff92ed2fb40000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: 0000562818aaa000 CR3: 000000045745a002 CR4: 00000000003606e0
   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
   Call Trace:
    ? drm_dbg+0x87/0x90 [drm]
    dc_stream_release+0x28/0x50 [amdgpu]
    amdgpu_dm_connector_mode_valid+0xb4/0x1f0 [amdgpu]
    drm_helper_probe_single_connector_modes+0x492/0x6b0 [drm_kms_helper]
    drm_mode_getconnector+0x457/0x490 [drm]
    ? drm_connector_property_set_ioctl+0x60/0x60 [drm]
    drm_ioctl_kernel+0xa9/0xf0 [drm]
    drm_ioctl+0x201/0x3a0 [drm]
    ? drm_connector_property_set_ioctl+0x60/0x60 [drm]
    amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
    do_vfs_ioctl+0xa4/0x630
    ? __sys_recvmsg+0x83/0xa0
    ksys_ioctl+0x60/0x90
    __x64_sys_ioctl+0x16/0x20
    do_syscall_64+0x5b/0x160
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
   RIP: 0033:0x7f417110809b
   Code: 0f 1e fa 48 8b 05 ed bd 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd bd 0c 00 f7 d8 64 89 01 48
   RSP: 002b:00007ffdd8d1c268 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
   RAX: ffffffffffffffda RBX: 0000562818a8ebc0 RCX: 00007f417110809b
   RDX: 00007ffdd8d1c2a0 RSI: 00000000c05064a7 RDI: 0000000000000012
   RBP: 00007ffdd8d1c2a0 R08: 0000562819012280 R09: 0000000000000007
   R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c05064a7
   R13: 0000000000000012 R14: 0000000000000012 R15: 00007ffdd8d1c2a0
   Modules linked in: nfsv4 dns_resolver nfs lockd grace fscache fuse vfat fat amdgpu intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul chash gpu_sched crc32_pclmul snd_hda_codec_realtek ghash_clmulni_intel amd_iommu_v2 iTCO_wdt iTCO_vendor_support ttm snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel drm_kms_helper snd_hda_codec intel_cstate snd_hda_core drm snd_hwdep snd_seq snd_seq_device intel_uncore snd_pcm intel_rapl_perf snd_timer snd soundcore ioatdma pcspkr intel_wmi_thunderbolt mxm_wmi i2c_i801 lpc_ich pcc_cpufreq auth_rpcgss sunrpc igb crc32c_intel i2c_algo_bit dca wmi hid_cherry analog gameport joydev

This patch is based on agd5f/drm-next-5.1-wip. This patch does not require
all of that, but agd5f/drm-next-5.1-wip contains at least one more dc_sink
counting fix that I could spot.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

dcd5fb82

drm/amdgpu/powerplay: add missing breaks in polaris10_smumgr · 6feaa419

Alex Deucher authored Feb 18, 2019

This was noticed by Gustavo and his -Wimplicit-fallthrough
patches.  However, in this case, I believe we should have breaks
rather than falling though, that said, in practice we should
never fall through in the first place so there should be no
change in behavior.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

6feaa419

22 Feb, 2019 3 commits

drm/amd/powerplay: fix the confusing ppfeature mask calculations · b7d485df

Evan Quan authored Feb 19, 2019

Simplify the ppfeature mask calculations.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b7d485df

drm/powerplay: print current clock level when dpm is disabled on vg20 · 6a7a20ed

shaoyunl authored Feb 19, 2019

When DPM for the specific clock is disabled, driver should still print out
current clock info for rocm-smi support on vega20
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Eric Huang <JinhuiEric.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

6a7a20ed

Merge branch 'drm-next-5.1' of git://people.freedesktop.org/~agd5f/linux into drm-next · fbac3c48

Dave Airlie authored Feb 22, 2019

Fixes for 5.1:
amdgpu:
- Fix missing fw declaration after dropping old CI DPM code
- Fix debugfs access to registers beyond the MMIO bar size
- Fix context priority handling
- Add missing license on some new files
- Various cleanups and bug fixes

radeon:
- Fix missing break in CS parser for evergreen
- Various cleanups and bug fixes

sched:
- Fix entities with 0 run queues
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190221214134.3308-1-alexander.deucher@amd.com

fbac3c48

21 Feb, 2019 4 commits

drm/amdgpu: Bump amdgpu version for context priority override. · 767e06a9

Bas Nieuwenhuizen authored Feb 03, 2019

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

767e06a9

drm/amdgpu/powerplay: fix typo in BACO header guards · f1b4ac96

Alex Deucher authored Feb 15, 2019

s/BOCO/BACO/g
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f1b4ac96

drm/amdgpu/powerplay: fix return codes in BACO code · 41d3ae4b

Alex Deucher authored Feb 15, 2019

Use a proper return code rather than -1.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

41d3ae4b

drm/amdgpu: add missing license on baco files · 94b94438

Alex Deucher authored Feb 10, 2019

Trivial.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

94b94438

20 Feb, 2019 2 commits

Merge https://gitlab.freedesktop.org/drm/msm into drm-next · a5f2fafe

Dave Airlie authored Feb 20, 2019

On the display side, cleanups and fixes to enabled modifiers
(QCOM_COMPRESSED).  And otherwise mostly misc fixes all around.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGuZ5uBKpf=fHvKpTiD10nychuEY8rnE+HeRz0QMvtY5_A@mail.gmail.com

a5f2fafe

Merge branch 'linux-5.1' of git://github.com/skeggsb/linux into drm-next · 71f4e45a

Dave Airlie authored Feb 20, 2019

Various fixes/cleanups, along with initial support for SVM features
utilising HMM address-space mirroring and device memory migration.
There's a lot more work to do in these areas, both in terms of
features and efficiency, but these can slowly trickle in later down
the track.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Ben Skeggs <skeggsb@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CACAvsv5bsB4rRY1Gqa_Bp_KAd-v_q1rGZ4nYmOAQhceL0Nr-Xg@mail.gmail.com

71f4e45a

19 Feb, 2019 12 commits

drm/nouveau/dmem: use dma addresses during migration copies · a788ade4
Ben Skeggs authored Feb 15, 2019
```
Removes the need for temporary VMM mappings.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
```
a788ade4
drm/nouveau/dmem: use physical vram addresses during migration copies · fd5e9856
Ben Skeggs authored Feb 15, 2019
```
Removes the need for temporary VMM mappings.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
```
fd5e9856
drm/nouveau/dmem: extend copy function to allow direct use of physical addresses · 6c762d1b
Ben Skeggs authored Feb 15, 2019
```
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
```
6c762d1b

drm/nouveau/svm: new ioctl to migrate process memory to GPU memory · f180bf12

Jérôme Glisse authored Aug 07, 2018

This add an ioctl to migrate a range of process address space to the
device memory. On platform without cache coherent bus (x86, ARM, ...)
this means that CPU can not access that range directly, instead CPU
will fault which will migrate the memory back to system memory.

This is behind a staging flag so that we can evolve the API.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>

f180bf12

drm/nouveau/dmem: device memory helpers for SVM · 5be73b69

Jérôme Glisse authored Jul 26, 2018

Device memory can be use in SVM, in which case we do not have any of
the existing buffer object. This commit add infrastructure to allow
use of device memory without nouveau_bo. Again this is a temporary
solution until a rework of GPU memory management.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>

5be73b69

drm/nouveau/svm: initial support for shared virtual memory · eeaf06ac

Ben Skeggs authored Jul 05, 2018

This uses HMM to mirror a process' CPU page tables into a channel's page
tables, and keep them synchronised so that both the CPU and GPU are able
to access the same memory at the same virtual address.

While this code also supports Volta/Turing, it's only enabled for Pascal
GPUs currently due to channel recovery being unreliable right now on the
later GPUs.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

eeaf06ac

drm/nouveau: prepare for enabling svm with existing userspace interfaces · bfe91afa

Ben Skeggs authored Feb 19, 2019

For a channel to make use of SVM features, it requires a different GPU MMU
configuration than we would normally use, which is not desirable to switch
to unless a client is actively going to use SVM.

In order to supporting SVM without more extensive changes to the userspace
interfaces, the SVM_INIT ioctl needs to replace the previous configuration
safely.

The only way we can currently do this safely, accounting for some unlikely
failure conditions, is to allocate the new VMM without destroying the last
one, and prioritising the SVM-enabled configuration in the code that cares.

This will get cleaned up again further down the track.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

bfe91afa

drm/nouveau/fault/gv100-: expose VoltaFaultBufferA · a261a20c

Ben Skeggs authored May 08, 2018

This nvclass exposes the replayable fault buffer, which will be used
by SVM to manage GPU page faults.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

a261a20c

drm/nouveau/fault/gp100: expose MaxwellFaultBufferA · 13e95729

Ben Skeggs authored May 08, 2018

This nvclass exposes the replayable fault buffer, which will be used
by SVM to manage GPU page faults.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

13e95729

drm/nouveau/mmu/gp100-: support vmms with gcc/tex replayable faults enabled · ab2ee9ff

Ben Skeggs authored May 08, 2018

Some GPU units are capable of supporting "replayable" page faults, where
the execution unit will wait for SW to fixup GPU page tables rather than
triggering a channel-fatal fault.

This feature isn't useful (it's harmful, even) unless something like HMM
is being used to manage events appearing in the replayable fault buffer,
so, it's disabled by default.

This commit allows a client to request it be enabled.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

ab2ee9ff

drm/nouveau/mmu/gp100-: add privileged methods for fault replay/cancel · 71871aa6

Ben Skeggs authored Jul 09, 2018

Host methods exist to do at least some of what we need, but we are not
currently pushing replay/cancels through a channel like UVM does as it's
not clear whether it's necessary in our case (UVM also updates PTEs with
the GPU).

UVM also pushes a software method for fault cancels on Pascal, seemingly
because the host methods don't appear to be sufficient.  If/when we want
to push the replay/cancel on the GPU, we can re-purpose the cancellation
code here to implement that swmthd.

Keep it simple for now, until we figure out exactly what we need here.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

71871aa6

drm/nouveau/mmu: add a privileged method to directly manage PTEs · a5ff307f

Ben Skeggs authored Jul 07, 2018

This provides a somewhat more direct method of manipulating the GPU page
tables, which will be required to support SVM.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

a5ff307f