- 21 Apr, 2021 40 commits
-
-
Philip Yang authored
amdgpu_gmc_get_vm_pte use bo_va->is_xgmi same hive information to set pte flags to update GPU mapping. Add local structure variable bo_va, and update bo_va.is_xgmi, pass it to mapping->bo_va while mapping to GPU. Assuming xgmi pstate is hi after boot. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Felix Kuehling authored
If svm range perfetch location is not zero, use TTM to alloc amdgpu_bo vram nodes to validate svm range, then map vram nodes to GPUs. Use offset to sub allocate from the same amdgpu_bo to handle overlap vram range while adding new range or unmapping range. svm_bo has ref count to trace the shared ranges. If all ranges of shared amdgpu_bo are migrated to ram, ref count becomes 0, then amdgpu_bo is released, all ranges svm_bo is set to NULL. To migrate range from ram back to vram, allocate the same amdgpu_bo with previous offset if the range has svm_bo. If prange migrate to VRAM, no CPU mapping exist, then process exit will not have unmap callback for this prange to free prange and svm bo. Free outstanding pranges from svms list before process is freed in svm_range_list_fini. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Philip Yang authored
HMM migration alloc sizeof(struct page) on system memory for each VRAM page, it is 1GB system memory reserved for 64GB VRAM. To avoid application OOM, increase system memory used size based on VRAM size of all GPUs, then application alloc memory will fail if system memory usage reach the limit. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Philip Yang authored
Register vram memory as MEMORY_DEVICE_PRIVATE type resource, to allocate vram backing pages for page migration. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Alex Sierra authored
Xnack retries are used for page fault recovery. Some AMD chip families support continuously retry while page table entries are invalid. The driver must handle the page fault interrupt and fill in a valid entry for the GPU to continue. This ioctl allows to enable/disable XNACK retries per KFD process. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Alex Sierra authored
XNACK mode controls the SQ RETRY_DISABLE setting that determines, whether recoverable page faults can be supported on GFXv9 hardware. Only on Aldebaran we can support different processes running with different XNACK modes. On older chips all processes must use the same RETRY_DISABLE setting. However, processes not relying on recoverable page faults can work with RETRY enabled. This means XNACK off is always available as a fallback so we can use the same mode on all GPUs in a process. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Felix Kuehling authored
This is needed to allow per-process XNACK mode selection in the SQ when booting with XNACK off by default. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Tested-by: Alex Sierra <alex.sierra@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Felix Kuehling authored
HMM interval notifier callback notify CPU page table will be updated, stop process queues if the updated address belongs to svm range registered in process svms objects tree. Scheduled restore work to update GPU page table using new pages address in the updated svm range. The restore worker flushes any deferred work to make sure it restores an up-to-date svm_range_list. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Felix Kuehling authored
Use amdgpu_vm_bo_update_mapping to update GPU page table to map or unmap svm range system memory pages address to GPUs. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Philip Yang authored
It will be used by kfd to map svm range to GPU, because svm range does not have amdgpu_bo and bo_va, cannot use amdgpu_bo_update interface, use amdgpu vm update interface directly. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Philip Yang authored
When application explicitly call unmap or unmap from mmput when application exit, driver will receive MMU_NOTIFY_UNMAP event to remove svm range from process svms object tree and list first, unmap from GPUs (in the following patch). Split the svm ranges to handle partial unmapping of svm ranges. To avoid deadlocks, updating MMU notifiers, range lists and interval trees is done in a deferred worker. New child ranges are attached to their parent range's child_list until the worker can update the svm_range_list. svm_range_set_attr flushes deferred work and takes the mmap_write_lock to guarantee that it has an up-to-date svm_range_list. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Philip Yang authored
Use HMM to get system memory pages address, which will be used to map to GPUs or migrate to vram. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Philip Yang authored
For larger range allocation, if hmm_range_fault return -EBUSY, set retry timeout based on 1 second for every 512MB, this is safe timeout value. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Philip Yang authored
Move the HMM get pages function from amdgpu_ttm and to amdgpu_mn. This common function will be used by new svm APIs. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Philip Yang authored
Get the intersection of attributes over all memory in the given range Signed-off-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Philip Yang authored
svm range structure stores the range start address, size, attributes, flags, prefetch location and gpu bitmap which indicates which GPU this range maps to. Same virtual address is shared by CPU and GPUs. Process has svm range list which uses both interval tree and list to store all svm ranges registered by the process. Interval tree is used by GPU vm fault handler and CPU page fault handler to get svm range structure from the specific address. List is used to scan all ranges in eviction restore work. No overlap range interval [start, last] exist in svms object interval tree. If process registers new range which has overlap with old range, the old range split into 2 ranges depending on the overlap happens at head or tail part of old range. Apply attributes preferred location, prefetch location, mapping flags, migration granularity to svm range, store mapping gpu index into bitmap. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Philip Yang authored
Add svm (shared virtual memory) ioctl data structure and API definition. The svm ioctl API is designed to be extensible in the future. All operations are provided by a single IOCTL to preserve ioctl number space. The arguments structure ends with a variable size array of attributes that can be used to set or get one or multiple attributes. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Alex Sierra authored
svm range uses gpu bitmap to store which GPU svm range maps to. Application pass driver gpu id to specify GPU, the helper is needed to convert gpu id to gpu bitmap idx. Access through kfd_process_device pointers array from kfd_process. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Felix Kuehling authored
This shortcut is no longer needed with access managed properly by KFD. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Felix Kuehling authored
DRM render node file handles are used for CPU mapping of BOs using mmap by the Thunk. It uses the DRM render node of the GPU where the BO was allocated. DRM allows mmap access automatically when it creates a GEM handle for a BO. KFD BOs don't have GEM handles, so KFD needs to manage access manually. Use drm_vma_node_allow to allow user mode to mmap BOs allocated with kfd_ioctl_alloc_memory_of_gpu through the DRM render node that was used in the kfd_ioctl_acquire_vm call for the same GPU. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Philip Yang <philip.yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Felix Kuehling authored
amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu needs the drm_priv to allow mmap to access the BO through the corresponding file descriptor. The VM can also be extracted from drm_priv, so drm_priv can replace the vm parameter in the kfd2kgd interface. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <philip.yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Alex Deucher authored
Aldebaran has a hw fix so no longer requires the workaround. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Jinzhou Su authored
The buffer of SA bo will be used by many cases. So it's better to invalidate the cache of indirect buffer allocated by SA before commit the IB. Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Mukul Joshi authored
Fix the following issues with SDMA RAS error reporting: 1. Read the EDC_COUNTER2 register also to fetch error counts for all sub-blocks in SDMA. 2. SDMA RAS on Aldebaran suports single-bit uncorrectable errors only. So, report error count in UE count instead of CE count. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-By: John Clements <John.Clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Mukul Joshi authored
Reset the RAS error count and error status registers after reading to prevent over reporting error counts on Aldebaran. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-By: John Clements <John.Clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Oak Zeng authored
This reverts commit 2f055097. 2f055097 was a driver workaround when PSP firmware was not ready. Now the PSP fw is ready so we revert this driver workaround. Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Aric Cyr authored
Signed-off-by: Aric Cyr <aric.cyr@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Anthony Koo authored
Signed-off-by: Anthony Koo <Anthony.Koo@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Bing Guo authored
[Why] Some MST devices uses different method to enable mst specific stream features. [How] Add dm_helpers_mst_enable_stream features. This can be modified later when we are ready to implement those features. Signed-off-by: Bing Guo <bing.guo@amd.com> Reviewed-by: Anthony Koo <Anthony.Koo@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Dillon Varone authored
[Why?] When a monitor does not set both QS and QY bits, DC does not set Q0, Q1, QY0 and QY1 bits in AVI infoframe. Setting RGB bits should be separate from setting YCC bits. [How?] Separate logic for setting RGB and YCC quantization range bits in the AVI infoframe. Signed-off-by: Dillon Varone <dillon.varone@amd.com> Reviewed-by: Chris Park <Chris.Park@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Dillon Varone authored
[Why & How?] Call to dc_dsc_compute_bandwidth_range should have min and max bpp in 16ths of a bit. Multiply min and max bpp from policy. Signed-off-by: Dillon Varone <dillon.varone@amd.com> Reviewed-by: Eryk Brol <Eryk.Brol@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
David Galiffi authored
[How & Why] Changed "prsent" to "present". Signed-off-by: David Galiffi <David.Galiffi@amd.com> Reviewed-by: Chris Park <Chris.Park@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Nicholas Kazlauskas authored
[Why] Requirement from the spec - we shouldn't be potentially exiting out early based on encryption status. [How] Drop the calls from HDCP1 and HDCP2 execution that exit out early based on link encryption status. Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Michael Strauss authored
[Why&How] Add logs to verify ILR optimization behaviour on boot Signed-off-by: Michael Strauss <michael.strauss@amd.com> Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Wesley Chalmers authored
[WHY] While Link Training is being performed, and the LTTPRs are in Non-LTTPR or LTTPR Transparent mode, any DPCD registers besides those used for Link Training are not to be accessed. The spec defines the link training registers as DP_TRAINING_PATTERN_SET (102h) to DP_TRAINING_LANE3_SET (106h), and DP_LANE0_1_STATUS (202h) to DP_ADJUST_REQUEST_LANE2_3 (207h). [HOW] Move the current write to DPCD Address DP_LINK_TRAINING_PATTERN_SET out of its conditional block. Signed-off-by: Wesley Chalmers <Wesley.Chalmers@amd.com> Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Aric Cyr authored
[WHY] We should skip programming manual trigger on non-primary planes when MPO is enabled. [HOW] Implement an explicit mechanism for skipping manual trigger programming for planes that shouldn't cause the frame to end. Signed-off-by: Aric Cyr <aric.cyr@amd.com> Reviewed-by: Anthony Koo <Anthony.Koo@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Acked-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Hugo Hu authored
Previous change had been reverted since it caused hang. Remake change to avoid defect. [Why] 1. Driver use umachannelnumber to calculate watermarks for stutter. In asymmetric memory config, the actual bandwidth is less than dual-channel. The bandwidth should be the same as single-channel. 2. We found single rank dimm need additional delay time for stutter. [How] Get information from each DIMM. Treat memory config as a single-channel for asymmetric memory in bandwidth calculating. Add additional delay time for single rank dimm. Fixes: b8720ed0 ("drm/amd/display: System black screen hangs on driver load") Signed-off-by: Hugo Hu <hugo.hu@amd.com> Reviewed-by: Sung Lee <Sung.Lee@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Robin Singh authored
[Why] Found that dc_link_reallocate_mst_payload is not used anymore in any of the use case scenario. [How] removed dc_link_reallocate_mst_payload function definition and declaration. Signed-off-by: Robin Singh <robin.singh@amd.com> Reviewed-by: Harry Wentland <Harry.Wentland@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Anthony Wang authored
[Why] Primary feature use case is with eDP panels. [How] Fail seamless boot validation if display is not an eDP panel. Signed-off-by: Anthony Wang <anthony1.wang@amd.com> Reviewed-by: Martin Leung <Martin.Leung@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Dingchen (David) Zhang authored
[why] the current implementation of hdcp2 rx id list validation does not have handler/checker for invalid message status, e.g. HMAC, the V parameter calculated from PSP not matching the V prime from Rx. [how] return a generic FAILURE for any message status not SUCCESS or REVOKED. Signed-off-by: Dingchen (David) Zhang <dingchen.zhang@amd.com> Reviewed-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-