1. 16 Sep, 2019 16 commits
    • drm/amdgpu: check if nbio->ras_if exist · cde85ac2
      Philip Yang authored
      To avoid a NULL function pointer access. This happens on VG10: the
      reboot command hangs, and the machine has to be powered off and on to
      reboot. This is the serial console log:
      
      [  OK  ] Reached target Unmount All Filesystems.
      [  OK  ] Reached target Final Step.
               Starting Reboot...
      [  305.696271] systemd-shutdown[1]: Syncing filesystems and block
      devices.
      [  306.947328] systemd-shutdown[1]: Sending SIGTERM to remaining
      processes...
      [  306.963920] systemd-journald[1722]: Received SIGTERM from PID 1
      (systemd-shutdow).
      [  307.322717] systemd-shutdown[1]: Sending SIGKILL to remaining
      processes...
      [  307.336472] systemd-shutdown[1]: Unmounting file systems.
      [  307.454202] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
      [  307.480523] systemd-shutdown[1]: All filesystems unmounted.
      [  307.486537] systemd-shutdown[1]: Deactivating swaps.
      [  307.491962] systemd-shutdown[1]: All swaps deactivated.
      [  307.497624] systemd-shutdown[1]: Detaching loop devices.
      [  307.504418] systemd-shutdown[1]: All loop devices detached.
      [  307.510418] systemd-shutdown[1]: Detaching DM devices.
      [  307.565907] sd 2:0:0:0: [sda] Synchronizing SCSI cache
      [  307.731313] BUG: kernel NULL pointer dereference, address:
      0000000000000000
      [  307.738802] #PF: supervisor read access in kernel mode
      [  307.744326] #PF: error_code(0x0000) - not-present page
      [  307.749850] PGD 0 P4D 0
      [  307.752568] Oops: 0000 [#1] SMP PTI
      [  307.756314] CPU: 3 PID: 1 Comm: systemd-shutdow Not tainted
      5.2.0-rc1-kfd-yangp #453
      [  307.764644] Hardware name: ASUS All Series/Z97-PRO(Wi-Fi ac)/USB 3.1,
      BIOS 9001 03/07/2016
      [  307.773580] RIP: 0010:soc15_common_hw_fini+0x33/0xc0 [amdgpu]
      [  307.779760] Code: 89 fb e8 60 f5 ff ff f6 83 50 df 01 00 04 75 3d 48
      8b b3 90 7d 00 00 48 c7 c7 17 b8 530
      [  307.799967] RSP: 0018:ffffac9483153d40 EFLAGS: 00010286
      [  307.805585] RAX: 0000000000000000 RBX: ffff9eb299da0000 RCX:
      0000000000000006
      [  307.813261] RDX: 0000000000000000 RSI: ffff9eb29e3508a0 RDI:
      ffff9eb29e350000
      [  307.820935] RBP: ffff9eb299da0000 R08: 0000000000000000 R09:
      0000000000000000
      [  307.828609] R10: 0000000000000000 R11: 0000000000000000 R12:
      ffff9eb299dbd1f8
      [  307.836284] R13: ffffffffc04f8368 R14: ffff9eb29cebd130 R15:
      0000000000000000
      [  307.843959] FS:  00007f06721c9940(0000) GS:ffff9eb2a18c0000(0000)
      knlGS:0000000000000000
      [  307.852663] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  307.858842] CR2: 0000000000000000 CR3: 000000081d798005 CR4:
      00000000001606e0
      [  307.866516] Call Trace:
      [  307.869169]  amdgpu_device_ip_suspend_phase2+0x80/0x110 [amdgpu]
      [  307.875654]  ? amdgpu_device_ip_suspend_phase1+0x4d/0xd0 [amdgpu]
      [  307.882230]  amdgpu_device_ip_suspend+0x2e/0x60 [amdgpu]
      [  307.887966]  amdgpu_pci_shutdown+0x2f/0x40 [amdgpu]
      [  307.893211]  pci_device_shutdown+0x31/0x60
      [  307.897613]  device_shutdown+0x14c/0x1f0
      [  307.901829]  kernel_restart+0xe/0x50
      [  307.905669]  __do_sys_reboot+0x1df/0x210
      [  307.909884]  ? task_work_run+0x73/0xb0
      [  307.913914]  ? trace_hardirqs_off_thunk+0x1a/0x1c
      [  307.918970]  do_syscall_64+0x4a/0x1c0
      [  307.922904]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  307.928336] RIP: 0033:0x7f0671cf8373
      [  307.932176] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00
      00 0f 1f 44 00 00 89 fa be 69 19 128
      [  307.952384] RSP: 002b:00007ffdd1723d68 EFLAGS: 00000202 ORIG_RAX:
      00000000000000a9
      [  307.960527] RAX: ffffffffffffffda RBX: 0000000001234567 RCX:
      00007f0671cf8373
      [  307.968201] RDX: 0000000001234567 RSI: 0000000028121969 RDI:
      00000000fee1dead
      [  307.975875] RBP: 00007ffdd1723dd0 R08: 0000000000000000 R09:
      0000000000000000
      [  307.983550] R10: 0000000000000002 R11: 0000000000000202 R12:
      00007ffdd1723dd8
      [  307.991224] R13: 0000000000000000 R14: 0000001b00000004 R15:
      00007ffdd17240c8
      [  307.998901] Modules linked in: xt_MASQUERADE nfnetlink iptable_nat
      xt_addrtype xt_conntrack nf_nat nf_cos
      [  308.026505] CR2: 0000000000000000
      [  308.039998] RIP: 0010:soc15_common_hw_fini+0x33/0xc0 [amdgpu]
      [  308.046180] Code: 89 fb e8 60 f5 ff ff f6 83 50 df 01 00 04 75 3d 48
      8b b3 90 7d 00 00 48 c7 c7 17 b8 530
      [  308.066392] RSP: 0018:ffffac9483153d40 EFLAGS: 00010286
      [  308.072013] RAX: 0000000000000000 RBX: ffff9eb299da0000 RCX:
      0000000000000006
      [  308.079689] RDX: 0000000000000000 RSI: ffff9eb29e3508a0 RDI:
      ffff9eb29e350000
      [  308.087366] RBP: ffff9eb299da0000 R08: 0000000000000000 R09:
      0000000000000000
      [  308.095042] R10: 0000000000000000 R11: 0000000000000000 R12:
      ffff9eb299dbd1f8
      [  308.102717] R13: ffffffffc04f8368 R14: ffff9eb29cebd130 R15:
      0000000000000000
      [  308.110394] FS:  00007f06721c9940(0000) GS:ffff9eb2a18c0000(0000)
      knlGS:0000000000000000
      [  308.119099] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  308.125280] CR2: 0000000000000000 CR3: 000000081d798005 CR4:
      00000000001606e0
      [  308.135304] printk: systemd-shutdow: 3 output lines suppressed due to
      ratelimiting
      [  308.143518] Kernel panic - not syncing: Attempted to kill init!
      exitcode=0x00000009
      [  308.151798] Kernel Offset: 0x15000000 from 0xffffffff81000000
      (relocation range: 0xffffffff80000000-0xff)
      [  308.171775] ---[ end Kernel panic - not syncing: Attempted to kill
      init! exitcode=0x00000009 ]---
      Signed-off-by: Philip Yang <Philip.Yang@amd.com>
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: fix null pointer deref in firmware header printing · bfa603aa
      Xiaojie Yuan authored
      v2: declare as (struct common_firmware_header *) type because
          struct xxx_firmware_header inherits from it
      
      When CE's ucode_id(8) is used to get sdma_hdr, we will be accessing an
      unallocated amdgpu_firmware_info instance.
      
      This issue appears on rhel7.7 with gcc 4.8.5. Newer compilers might
      optimize out such a 'defined but not referenced' variable.
      
      [ 1120.798564] BUG: unable to handle kernel NULL pointer dereference at 000000000000000a
      [ 1120.806703] IP: [<ffffffffc0e3c9b3>] psp_np_fw_load+0x1e3/0x390 [amdgpu]
      [ 1120.813693] PGD 80000002603ff067 PUD 271b8d067 PMD 0
      [ 1120.818931] Oops: 0000 [#1] SMP
      [ 1120.822245] Modules linked in: amdgpu(OE+) amdkcl(OE) amd_iommu_v2 amdttm(OE) amd_sched(OE) xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun bridge stp llc devlink ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc dm_mirror dm_region_hash dm_log dm_mod intel_pmc_core intel_powerclamp coretemp intel_rapl joydev kvm_intel eeepc_wmi asus_wmi kvm sparse_keymap iTCO_wdt irqbypass rfkill crc32_pclmul snd_hda_codec_realtek mxm_wmi ghash_clmulni_intel intel_wmi_thunderbolt iTCO_vendor_support snd_hda_codec_generic snd_hda_codec_hdmi aesni_intel lrw gf128mul glue_helper ablk_helper sg cryptd pcspkr snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd pinctrl_sunrisepoint pinctrl_intel soundcore acpi_pad mei_me wmi mei i2c_i801 pcc_cpufreq ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic i915 i2c_algo_bit iosf_mbi drm_kms_helper e1000e syscopyarea sysfillrect sysimgblt fb_sys_fops ahci libahci drm ptp libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw pps_core drm_panel_orientation_quirks video i2c_hid
      [ 1120.954136] CPU: 4 PID: 2426 Comm: modprobe Tainted: G           OE  ------------   3.10.0-1062.el7.x86_64 #1
      [ 1120.964390] Hardware name: System manufacturer System Product Name/Z170-A, BIOS 1302 11/09/2015
      [ 1120.973321] task: ffff991ef1e3c1c0 ti: ffff991ee625c000 task.ti: ffff991ee625c000
      [ 1120.981020] RIP: 0010:[<ffffffffc0e3c9b3>]  [<ffffffffc0e3c9b3>] psp_np_fw_load+0x1e3/0x390 [amdgpu]
      [ 1120.990483] RSP: 0018:ffff991ee625f950  EFLAGS: 00010202
      [ 1120.995935] RAX: 0000000000000002 RBX: ffff991edf6b2d38 RCX: ffff991edf6a0000
      [ 1121.003391] RDX: 0000000000000000 RSI: ffff991f01d13898 RDI: ffffffffc110afb3
      [ 1121.010706] RBP: ffff991ee625f9b0 R08: 0000000000000000 R09: 0000000000000000
      [ 1121.018029] R10: 00000000000004c4 R11: ffff991ee625f64e R12: ffff991edf6b3220
      [ 1121.025353] R13: ffff991edf6a0000 R14: 0000000000000008 R15: ffff991edf6b2d30
      [ 1121.032666] FS:  00007f97b0c0b740(0000) GS:ffff991f01d00000(0000) knlGS:0000000000000000
      [ 1121.041000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1121.046880] CR2: 000000000000000a CR3: 000000025e604000 CR4: 00000000003607e0
      [ 1121.054239] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1121.061631] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 1121.068938] Call Trace:
      [ 1121.071494]  [<ffffffffc0e3dba8>] psp_hw_init+0x218/0x270 [amdgpu]
      [ 1121.077886]  [<ffffffffc0da3188>] amdgpu_device_fw_loading+0xe8/0x160 [amdgpu]
      [ 1121.085296]  [<ffffffffc0e3b34c>] ? vega10_ih_irq_init+0x4bc/0x730 [amdgpu]
      [ 1121.092534]  [<ffffffffc0da5c75>] amdgpu_device_init+0x1495/0x1c90 [amdgpu]
      [ 1121.099675]  [<ffffffffc0da9cab>] amdgpu_driver_load_kms+0x8b/0x2f0 [amdgpu]
      [ 1121.106888]  [<ffffffffc01b25cf>] drm_dev_register+0x12f/0x1d0 [drm]
      [ 1121.113419]  [<ffffffffa4dcdfd8>] ? pci_enable_device_flags+0xe8/0x140
      [ 1121.120183]  [<ffffffffc0da260a>] amdgpu_pci_probe+0xca/0x170 [amdgpu]
      [ 1121.126919]  [<ffffffffa4dcf97a>] local_pci_probe+0x4a/0xb0
      [ 1121.132622]  [<ffffffffa4dd10c9>] pci_device_probe+0x109/0x160
      [ 1121.138607]  [<ffffffffa4eb4205>] driver_probe_device+0xc5/0x3e0
      [ 1121.144766]  [<ffffffffa4eb4603>] __driver_attach+0x93/0xa0
      [ 1121.150507]  [<ffffffffa4eb4570>] ? __device_attach+0x50/0x50
      [ 1121.156422]  [<ffffffffa4eb1da5>] bus_for_each_dev+0x75/0xc0
      [ 1121.162213]  [<ffffffffa4eb3b7e>] driver_attach+0x1e/0x20
      [ 1121.167771]  [<ffffffffa4eb3620>] bus_add_driver+0x200/0x2d0
      [ 1121.173590]  [<ffffffffa4eb4c94>] driver_register+0x64/0xf0
      [ 1121.179345]  [<ffffffffa4dd0905>] __pci_register_driver+0xa5/0xc0
      [ 1121.185593]  [<ffffffffc099f000>] ? 0xffffffffc099efff
      [ 1121.190914]  [<ffffffffc099f0a4>] amdgpu_init+0xa4/0xb0 [amdgpu]
      [ 1121.197101]  [<ffffffffa4a0210a>] do_one_initcall+0xba/0x240
      [ 1121.202901]  [<ffffffffa4b1c90a>] load_module+0x271a/0x2bb0
      [ 1121.208598]  [<ffffffffa4dad740>] ? ddebug_proc_write+0x100/0x100
      [ 1121.214894]  [<ffffffffa4b1ce8f>] SyS_init_module+0xef/0x140
      [ 1121.220698]  [<ffffffffa518bede>] system_call_fastpath+0x25/0x2a
      [ 1121.226870] Code: b4 01 60 a2 00 00 31 c0 e8 83 60 33 e4 41 8b 47 08 48 8b 4d d0 48 c7 c7 b3 af 10 c1 48 69 c0 68 07 00 00 48 8b 84 01 60 a2 00 00 <48> 8b 70 08 31 c0 48 89 75 c8 e8 56 60 33 e4 48 8b 4d d0 48 c7
      [ 1121.247422] RIP  [<ffffffffc0e3c9b3>] psp_np_fw_load+0x1e3/0x390 [amdgpu]
      [ 1121.254432]  RSP <ffff991ee625f950>
      [ 1121.258017] CR2: 000000000000000a
      [ 1121.261427] ---[ end trace e98b35387ede75bd ]---
      Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
      Fixes: c5fb9126 ("drm/amdgpu: add firmware header printing for psp fw loading (v2)")
      Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
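The failure mode described in this commit (using one firmware's ucode ID to index another engine's firmware table) can be illustrated with a minimal sketch. All type and function names below are hypothetical simplifications, not the real amdgpu definitions; the sketch only assumes that each header starts with a common header and that the lookup should happen per ID rather than eagerly for every engine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the driver's firmware types. */
struct common_firmware_header {
    uint32_t size_bytes;
    uint32_t ucode_version;
};

struct firmware {
    const void *data;   /* starts with a common_firmware_header */
};

enum ucode_id { UCODE_ID_SDMA0, UCODE_ID_SDMA1, UCODE_ID_CE, UCODE_ID_MAX };

struct device_sketch {
    const struct firmware *sdma_fw[2];  /* one entry per SDMA instance */
    const struct firmware *ce_fw;
};

/*
 * Resolve the header only in the branch that matches the ucode ID, so a
 * CE ID is never used as an index into the SDMA table. Returns NULL when
 * no firmware is loaded for that ID.
 */
const struct common_firmware_header *
lookup_fw_header(const struct device_sketch *dev, enum ucode_id id)
{
    const struct firmware *fw = NULL;

    assert(dev != NULL);

    switch (id) {
    case UCODE_ID_SDMA0:
    case UCODE_ID_SDMA1:
        fw = dev->sdma_fw[id - UCODE_ID_SDMA0];
        break;
    case UCODE_ID_CE:
        fw = dev->ce_fw;
        break;
    default:
        break;
    }

    if (!fw || !fw->data)
        return NULL;

    /* Every specific header is assumed to begin with the common header. */
    return (const struct common_firmware_header *)fw->data;
}
```

A caller would print header fields only when the returned pointer is non-NULL.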
    • drm/amdkfd: enable renoir while device probes · 4042a188
      Huang Rui authored
      This patch adds the asic flag to enable device probe during kfd init.
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: disable gfxoff while use no H/W scheduling policy · aa978594
      Huang Rui authored
      While gfxoff is enabled, the mmVM_XXX registers read 0xfffffff while GFX
      is in the "off" state. KFD queue creation doesn't use the ring-based
      method, so it will trigger a VM fault.
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
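The policy decision in this commit can be sketched as a small predicate. The enum and function names are hypothetical and are not the identifiers used in amdgpu or KFD; the sketch only assumes that gfxoff has to be turned off whenever the no-H/W-scheduling policy is selected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only. */
enum sched_policy_sketch {
    SCHED_POLICY_HWS,     /* hardware scheduling in use */
    SCHED_POLICY_NO_HWS   /* queue creation does not use the ring-based method */
};

/* Decide whether gfxoff may stay enabled for the given scheduling policy. */
bool gfxoff_allowed(enum sched_policy_sketch policy, bool gfxoff_requested)
{
    assert(policy == SCHED_POLICY_HWS || policy == SCHED_POLICY_NO_HWS);

    /* Without H/W scheduling, registers may be accessed while GFX is off. */
    if (policy == SCHED_POLICY_NO_HWS)
        return false;

    return gfxoff_requested;
}
```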
    • drm/amdkfd: add renoir kfd topology · f5d843d4
      Huang Rui authored
      This patch adds the renoir kfd topology, which is the same as Raven's.
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdkfd: add package manager for renoir · 444d4f5f
      Huang Rui authored
      Renoir uses GFX v9, so add the v9 package manager.
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdkfd: init kernel queue for renoir · 59a6fc1a
      Huang Rui authored
      Renoir is GFX v9, so init v9 kernel queue.
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdkfd: init kfd apertures v9 for renoir · 4d85488c
      Huang Rui authored
      Renoir is GMC v9, so init v9 kfd apertures.
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdkfd: add renoir type for the workaround of iommu v2 (v2) · 514e5e7e
      Huang Rui authored
      Renoir is the same as Raven; the iommu event will be enabled in the future.
      
      v2: fix the checking (Thong)
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdkfd: enable kfd device queue manager v9 for renoir · 5a959a89
      Huang Rui authored
      Renoir is GFX9, so enable v9 device queue manager.
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdkfd: add renoir kfd device info (v2) · 2b9c2211
      Huang Rui authored
      This patch inits renoir kfd device info, so we treat renoir as "dgpu"
      (bypass iommu v2). needs_iommu_device will be enabled once renoir iommu
      is ready.
      
      v2: rebase and align with drm-next
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdkfd: add renoir cache info for CRAT (v2) · a8d42f17
      Huang Rui authored
      Renoir's cache info should be the same as raven's and carrizo's.
      
      v2: fix missing "break"
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdkfd: Support Navi14 in KFD · 8099ae40
      Yong Zhao authored
      Initial support of Navi14 in KFD. The device IDs will be added later.
      Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: Disable retry faults in VMID0 · 7cae7061
      Felix Kuehling authored
      There is no point retrying page faults in VMID0. Those faults are
      always fatal.
      Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Reviewed-and-Tested-by: Huang Rui <ray.huang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: Add a kernel parameter for specifying the asic type · 4e66d7d2
      Yong Zhao authored
      As more and more new asics start to reuse the old device IDs before
      launch, there is a need to quickly override the existing asic type
      corresponding to the reused device ID through a kernel parameter. With
      this, engineers no longer need to rely on local hack patches,
      facilitating cooperation across teams.
      Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
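The override described in this commit amounts to preferring a user-supplied value over the one derived from the PCI device ID. A minimal sketch follows; the enum values, the function name, and the convention that a negative value means "no override" are assumptions made for illustration, not details taken from the patch.

```c
#include <assert.h>

/* Hypothetical asic list; the real driver's enum is much longer. */
enum asic_type_sketch { ASIC_RAVEN, ASIC_NAVI14, ASIC_RENOIR, ASIC_LAST };

/*
 * Pick the asic type to use. `override` models a module parameter:
 * a negative value (assumed default) means "not set", and out-of-range
 * values are ignored so a typo cannot select a nonexistent asic.
 */
int resolve_asic_type(int type_from_device_id, int override)
{
    assert(type_from_device_id >= 0 && type_from_device_id < ASIC_LAST);

    if (override >= 0 && override < ASIC_LAST)
        return override;

    return type_from_device_id;
}
```

With this shape, a pre-launch board that reuses an older device ID can be driven as the newer asic without patching the ID table locally.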
    • drm/amdgpu/irq: check if nbio funcs exist · bb42eda2
      Alex Deucher authored
      We need to check if the nbio funcs exist before
      checking the individual pointers.
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
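The two-level check described in this commit can be sketched as follows. The struct layout and the callback name are hypothetical, not the real amdgpu nbio definitions; the point is only the order of the tests: the function table first, then the individual pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function table with one optional callback. */
struct nbio_funcs_sketch {
    void (*ras_irq_handler)(int *counter);
};

struct nbio_sketch {
    const struct nbio_funcs_sketch *funcs;   /* assumed to be NULL on some asics */
};

/* Example callback used to observe whether the handler ran. */
void count_irq(int *counter)
{
    (*counter)++;
}

/* Returns 1 if the callback ran, 0 if it was skipped. */
int handle_irq_sketch(const struct nbio_sketch *nbio, int *counter)
{
    assert(nbio != NULL);

    /* Test the table before dereferencing it to test the callback. */
    if (nbio->funcs && nbio->funcs->ras_irq_handler) {
        nbio->funcs->ras_irq_handler(counter);
        return 1;
    }
    return 0;
}
```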
  2. 13 Sep, 2019 24 commits