1. 15 Sep, 2020 1 commit
    • Dennis Li's avatar
      drm/kfd: fix a system crash issue during GPU recovery · 66a5710b
      Dennis Li authored
      The crash log as the below:
      
      [Thu Aug 20 23:18:14 2020] general protection fault: 0000 [#1] SMP NOPTI
      [Thu Aug 20 23:18:14 2020] CPU: 152 PID: 1837 Comm: kworker/152:1 Tainted: G           OE     5.4.0-42-generic #46~18.04.1-Ubuntu
      [Thu Aug 20 23:18:14 2020] Hardware name: GIGABYTE G482-Z53-YF/MZ52-G40-00, BIOS R12 05/13/2020
      [Thu Aug 20 23:18:14 2020] Workqueue: events amdgpu_ras_do_recovery [amdgpu]
      [Thu Aug 20 23:18:14 2020] RIP: 0010:evict_process_queues_cpsch+0xc9/0x130 [amdgpu]
      [Thu Aug 20 23:18:14 2020] Code: 49 8d 4d 10 48 39 c8 75 21 eb 44 83 fa 03 74 36 80 78 72 00 74 0c 83 ab 68 01 00 00 01 41 c6 45 41 00 48 8b 00 48 39 c8 74 25 <80> 78 70 00 c6 40 6d 01 74 ee 8b 50 28 c6 40 70 00 83 ab 60 01 00
      [Thu Aug 20 23:18:14 2020] RSP: 0018:ffffb29b52f6fc90 EFLAGS: 00010213
      [Thu Aug 20 23:18:14 2020] RAX: 1c884edb0a118914 RBX: ffff8a0d45ff3c00 RCX: ffff8a2d83e41038
      [Thu Aug 20 23:18:14 2020] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff8a0e2e4178c0
      [Thu Aug 20 23:18:14 2020] RBP: ffffb29b52f6fcb0 R08: 0000000000001b64 R09: 0000000000000004
      [Thu Aug 20 23:18:14 2020] R10: ffffb29b52f6fb78 R11: 0000000000000001 R12: ffff8a0d45ff3d28
      [Thu Aug 20 23:18:14 2020] R13: ffff8a2d83e41028 R14: 0000000000000000 R15: 0000000000000000
      [Thu Aug 20 23:18:14 2020] FS:  0000000000000000(0000) GS:ffff8a0e2e400000(0000) knlGS:0000000000000000
      [Thu Aug 20 23:18:14 2020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [Thu Aug 20 23:18:14 2020] CR2: 000055c783c0e6a8 CR3: 00000034a1284000 CR4: 0000000000340ee0
      [Thu Aug 20 23:18:14 2020] Call Trace:
      [Thu Aug 20 23:18:14 2020]  kfd_process_evict_queues+0x43/0xd0 [amdgpu]
      [Thu Aug 20 23:18:14 2020]  kfd_suspend_all_processes+0x60/0xf0 [amdgpu]
      [Thu Aug 20 23:18:14 2020]  kgd2kfd_suspend.part.7+0x43/0x50 [amdgpu]
      [Thu Aug 20 23:18:14 2020]  kgd2kfd_pre_reset+0x46/0x60 [amdgpu]
      [Thu Aug 20 23:18:14 2020]  amdgpu_amdkfd_pre_reset+0x1a/0x20 [amdgpu]
      [Thu Aug 20 23:18:14 2020]  amdgpu_device_gpu_recover+0x377/0xf90 [amdgpu]
      [Thu Aug 20 23:18:14 2020]  ? amdgpu_ras_error_query+0x1b8/0x2a0 [amdgpu]
      [Thu Aug 20 23:18:14 2020]  amdgpu_ras_do_recovery+0x159/0x190 [amdgpu]
      [Thu Aug 20 23:18:14 2020]  process_one_work+0x20f/0x400
      [Thu Aug 20 23:18:14 2020]  worker_thread+0x34/0x410
      
      When GPU hang, user process will fail to create a compute queue whose
      struct object will be freed later, but driver wrongly add this queue to
      queue list of the proccess. And then kfd_process_evict_queues will
      access a freed memory, which cause a system crash.
      
      v2:
      The failure to execute_queues should probably not be reported to
      the caller of create_queue, because the queue was already created.
      Therefore change to ignore the return value from execute_queues.
      Reviewed-by: default avatarFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: default avatarDennis Li <Dennis.Li@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      66a5710b
  2. 10 Sep, 2020 2 commits
  3. 09 Sep, 2020 1 commit
  4. 08 Sep, 2020 5 commits
  5. 07 Sep, 2020 1 commit
  6. 06 Sep, 2020 4 commits
    • Linus Torvalds's avatar
      Merge tag 'io_uring-5.9-2020-09-06' of git://git.kernel.dk/linux-block · a8205e31
      Linus Torvalds authored
      Pull more io_uring fixes from Jens Axboe:
       "Two followup fixes. One is fixing a regression from this merge window,
        the other is two commits fixing cancelation of deferred requests.
      
        Both have gone through full testing, and both spawned a few new
        regression test additions to liburing.
      
         - Don't play games with const, properly store the output iovec and
           assign it as needed.
      
         - Deferred request cancelation fix (Pavel)"
      
      * tag 'io_uring-5.9-2020-09-06' of git://git.kernel.dk/linux-block:
        io_uring: fix linked deferred ->files cancellation
        io_uring: fix cancel of deferred reqs with ->files
        io_uring: fix explicit async read/write mapping for large segments
      a8205e31
    • Linus Torvalds's avatar
      Merge tag 'iommu-fixes-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 2ccdd9f8
      Linus Torvalds authored
      Pull iommu fixes from Joerg Roedel:
      
       - three Intel VT-d fixes to fix address handling on 32bit, fix a NULL
         pointer dereference bug and serialize a hardware register access as
         required by the VT-d spec.
      
       - two patches for AMD IOMMU to force AMD GPUs into translation mode
         when memory encryption is active and disallow using IOMMUv2
         functionality.  This makes the AMDGPU driver work when memory
         encryption is active.
      
       - two more fixes for AMD IOMMU to fix updating the Interrupt Remapping
         Table Entries.
      
       - MAINTAINERS file update for the Qualcom IOMMU driver.
      
      * tag 'iommu-fixes-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/vt-d: Handle 36bit addressing for x86-32
        iommu/amd: Do not use IOMMUv2 functionality when SME is active
        iommu/amd: Do not force direct mapping when SME is active
        iommu/amd: Use cmpxchg_double() when updating 128-bit IRTE
        iommu/amd: Restore IRTE.RemapEn bit after programming IRTE
        iommu/vt-d: Fix NULL pointer dereference in dev_iommu_priv_set()
        iommu/vt-d: Serialize IOMMU GCMD register modifications
        MAINTAINERS: Update QUALCOMM IOMMU after Arm SMMU drivers move
      2ccdd9f8
    • Linus Torvalds's avatar
      Merge tag 'x86-urgent-2020-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 015b3155
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
      
       - more generic entry code ABI fallout
      
       - debug register handling bugfixes
      
       - fix vmalloc mappings on 32-bit kernels
      
       - kprobes instrumentation output fix on 32-bit kernels
      
       - fix over-eager WARN_ON_ONCE() on !SMAP hardware
      
       - NUMA debugging fix
      
       - fix Clang related crash on !RETPOLINE kernels
      
      * tag 'x86-urgent-2020-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/entry: Unbreak 32bit fast syscall
        x86/debug: Allow a single level of #DB recursion
        x86/entry: Fix AC assertion
        tracing/kprobes, x86/ptrace: Fix regs argument order for i386
        x86, fakenuma: Fix invalid starting node ID
        x86/mm/32: Bring back vmalloc faulting on x86_32
        x86/cmdline: Disable jump tables for cmdline.c
      015b3155
    • Linus Torvalds's avatar
      Merge tag 'for-linus-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 68beef57
      Linus Torvalds authored
      Pull xen updates from Juergen Gross:
       "A small series for fixing a problem with Xen PVH guests when running
        as backends (e.g. as dom0).
      
        Mapping other guests' memory is now working via ZONE_DEVICE, thus not
        requiring to abuse the memory hotplug functionality for that purpose"
      
      * tag 'for-linus-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen: add helpers to allocate unpopulated memory
        memremap: rename MEMORY_DEVICE_DEVDAX to MEMORY_DEVICE_GENERIC
        xen/balloon: add header guard
      68beef57
  7. 05 Sep, 2020 26 commits