1. 09 Sep, 2021 5 commits
    • Robin Murphy's avatar
      iommu: Clarify default domain Kconfig · 8cc63319
      Robin Murphy authored
      Although strictly it is the AMD and Intel drivers which have an existing
      expectation of lazy behaviour by default, it ends up being rather
      unintuitive to describe this literally in Kconfig. Express it instead as
      an architecture dependency, to clarify that it is a valid config-time
      decision. The end result is the same since virtio-iommu doesn't support
      lazy mode and thus falls back to strict at runtime regardless.
      
      The per-architecture disparity is a matter of historical expectations:
      the AMD and Intel drivers have been lazy by default since 2008, and
      changing that gets noticed by people asking where their I/O throughput
      has gone. Conversely, Arm-based systems with their wider assortment of
      IOMMU drivers mostly only support strict mode anyway; only the Arm SMMU
      drivers have later grown support for passthrough and lazy mode, for
      users who wanted to explicitly trade off isolation for performance.
      These days, reducing the default level of isolation in a way which may
      go unnoticed by users who expect otherwise hardly seems worth risking
      for the sake of one line of Kconfig, so here's where we are.
      Reported-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarRobin Murphy <robin.murphy@arm.com>
      Link: https://lore.kernel.org/r/69a0c6f17b000b54b8333ee42b3124c1d5a869e2.1631105737.git.robin.murphy@arm.comSigned-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      8cc63319
    • Fenghua Yu's avatar
      iommu/vt-d: Fix a deadlock in intel_svm_drain_prq() · 6ef05051
      Fenghua Yu authored
      pasid_mutex and dev->iommu->param->lock are held while unbinding mm is
      flushing IO page fault workqueue and waiting for all page fault works to
      finish. But an in-flight page fault work also need to hold the two locks
      while unbinding mm are holding them and waiting for the work to finish.
      This may cause an ABBA deadlock issue as shown below:
      
      	idxd 0000:00:0a.0: unbind PASID 2
      	======================================================
      	WARNING: possible circular locking dependency detected
      	5.14.0-rc7+ #549 Not tainted [  186.615245] ----------
      	dsa_test/898 is trying to acquire lock:
      	ffff888100d854e8 (&param->lock){+.+.}-{3:3}, at:
      	iopf_queue_flush_dev+0x29/0x60
      	but task is already holding lock:
      	ffffffff82b2f7c8 (pasid_mutex){+.+.}-{3:3}, at:
      	intel_svm_unbind+0x34/0x1e0
      	which lock already depends on the new lock.
      
      	the existing dependency chain (in reverse order) is:
      
      	-> #2 (pasid_mutex){+.+.}-{3:3}:
      	       __mutex_lock+0x75/0x730
      	       mutex_lock_nested+0x1b/0x20
      	       intel_svm_page_response+0x8e/0x260
      	       iommu_page_response+0x122/0x200
      	       iopf_handle_group+0x1c2/0x240
      	       process_one_work+0x2a5/0x5a0
      	       worker_thread+0x55/0x400
      	       kthread+0x13b/0x160
      	       ret_from_fork+0x22/0x30
      
      	-> #1 (&param->fault_param->lock){+.+.}-{3:3}:
      	       __mutex_lock+0x75/0x730
      	       mutex_lock_nested+0x1b/0x20
      	       iommu_report_device_fault+0xc2/0x170
      	       prq_event_thread+0x28a/0x580
      	       irq_thread_fn+0x28/0x60
      	       irq_thread+0xcf/0x180
      	       kthread+0x13b/0x160
      	       ret_from_fork+0x22/0x30
      
      	-> #0 (&param->lock){+.+.}-{3:3}:
      	       __lock_acquire+0x1134/0x1d60
      	       lock_acquire+0xc6/0x2e0
      	       __mutex_lock+0x75/0x730
      	       mutex_lock_nested+0x1b/0x20
      	       iopf_queue_flush_dev+0x29/0x60
      	       intel_svm_drain_prq+0x127/0x210
      	       intel_svm_unbind+0xc5/0x1e0
      	       iommu_sva_unbind_device+0x62/0x80
      	       idxd_cdev_release+0x15a/0x200 [idxd]
      	       __fput+0x9c/0x250
      	       ____fput+0xe/0x10
      	       task_work_run+0x64/0xa0
      	       exit_to_user_mode_prepare+0x227/0x230
      	       syscall_exit_to_user_mode+0x2c/0x60
      	       do_syscall_64+0x48/0x90
      	       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      	other info that might help us debug this:
      
      	Chain exists of:
      	  &param->lock --> &param->fault_param->lock --> pasid_mutex
      
      	 Possible unsafe locking scenario:
      
      	       CPU0                    CPU1
      	       ----                    ----
      	  lock(pasid_mutex);
      				       lock(&param->fault_param->lock);
      				       lock(pasid_mutex);
      	  lock(&param->lock);
      
      	 *** DEADLOCK ***
      
      	2 locks held by dsa_test/898:
      	 #0: ffff888100cc1cc0 (&group->mutex){+.+.}-{3:3}, at:
      	 iommu_sva_unbind_device+0x53/0x80
      	 #1: ffffffff82b2f7c8 (pasid_mutex){+.+.}-{3:3}, at:
      	 intel_svm_unbind+0x34/0x1e0
      
      	stack backtrace:
      	CPU: 2 PID: 898 Comm: dsa_test Not tainted 5.14.0-rc7+ #549
      	Hardware name: Intel Corporation Kabylake Client platform/KBL S
      	DDR4 UD IMM CRB, BIOS KBLSE2R1.R00.X050.P01.1608011715 08/01/2016
      	Call Trace:
      	 dump_stack_lvl+0x5b/0x74
      	 dump_stack+0x10/0x12
      	 print_circular_bug.cold+0x13d/0x142
      	 check_noncircular+0xf1/0x110
      	 __lock_acquire+0x1134/0x1d60
      	 lock_acquire+0xc6/0x2e0
      	 ? iopf_queue_flush_dev+0x29/0x60
      	 ? pci_mmcfg_read+0xde/0x240
      	 __mutex_lock+0x75/0x730
      	 ? iopf_queue_flush_dev+0x29/0x60
      	 ? pci_mmcfg_read+0xfd/0x240
      	 ? iopf_queue_flush_dev+0x29/0x60
      	 mutex_lock_nested+0x1b/0x20
      	 iopf_queue_flush_dev+0x29/0x60
      	 intel_svm_drain_prq+0x127/0x210
      	 ? intel_pasid_tear_down_entry+0x22e/0x240
      	 intel_svm_unbind+0xc5/0x1e0
      	 iommu_sva_unbind_device+0x62/0x80
      	 idxd_cdev_release+0x15a/0x200
      
      pasid_mutex protects pasid and svm data mapping data. It's unnecessary
      to hold pasid_mutex while flushing the workqueue. To fix the deadlock
      issue, unlock pasid_pasid during flushing the workqueue to allow the works
      to be handled.
      
      Fixes: d5b9e4bf ("iommu/vt-d: Report prq to io-pgfault framework")
      Reported-and-tested-by: default avatarDave Jiang <dave.jiang@intel.com>
      Signed-off-by: default avatarFenghua Yu <fenghua.yu@intel.com>
      Link: https://lore.kernel.org/r/20210826215918.4073446-1-fenghua.yu@intel.comSigned-off-by: default avatarLu Baolu <baolu.lu@linux.intel.com>
      Link: https://lore.kernel.org/r/20210828070622.2437559-3-baolu.lu@linux.intel.com
      [joro: Removed timing information from kernel log messages]
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      6ef05051
    • Fenghua Yu's avatar
      iommu/vt-d: Fix PASID leak in intel_svm_unbind_mm() · a21518cb
      Fenghua Yu authored
      The mm->pasid will be used in intel_svm_free_pasid() after load_pasid()
      during unbinding mm. Clearing it in load_pasid() will cause PASID cannot
      be freed in intel_svm_free_pasid().
      
      Additionally mm->pasid was updated already before load_pasid() during pasid
      allocation. No need to update it again in load_pasid() during binding mm.
      Don't update mm->pasid to avoid the issues in both binding mm and unbinding
      mm.
      
      Fixes: 40483774 ("iommu/vt-d: Use iommu_sva_alloc(free)_pasid() helpers")
      Reported-and-tested-by: default avatarDave Jiang <dave.jiang@intel.com>
      Co-developed-by: default avatarJacob Pan <jacob.jun.pan@linux.intel.com>
      Signed-off-by: default avatarJacob Pan <jacob.jun.pan@linux.intel.com>
      Signed-off-by: default avatarFenghua Yu <fenghua.yu@intel.com>
      Link: https://lore.kernel.org/r/20210826215918.4073446-1-fenghua.yu@intel.comSigned-off-by: default avatarLu Baolu <baolu.lu@linux.intel.com>
      Link: https://lore.kernel.org/r/20210828070622.2437559-2-baolu.lu@linux.intel.comSigned-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      a21518cb
    • Suravee Suthikulpanit's avatar
      iommu/amd: Remove iommu_init_ga() · eb03f2d2
      Suravee Suthikulpanit authored
      Since the function has been simplified and only call iommu_init_ga_log(),
      remove the function and replace with iommu_init_ga_log() instead.
      Signed-off-by: default avatarSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Link: https://lore.kernel.org/r/20210820202957.187572-4-suravee.suthikulpanit@amd.com
      Fixes: 8bda0cfb ("iommu/amd: Detect and initialize guest vAPIC log")
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      eb03f2d2
    • Wei Huang's avatar
      iommu/amd: Relocate GAMSup check to early_enable_iommus · c3811a50
      Wei Huang authored
      Currently, iommu_init_ga() checks and disables IOMMU VAPIC support
      (i.e. AMD AVIC support in IOMMU) when GAMSup feature bit is not set.
      However it forgets to clear IRQ_POSTING_CAP from the previously set
      amd_iommu_irq_ops.capability.
      
      This triggers an invalid page fault bug during guest VM warm reboot
      if AVIC is enabled since the irq_remapping_cap(IRQ_POSTING_CAP) is
      incorrectly set, and crash the system with the following kernel trace.
      
          BUG: unable to handle page fault for address: 0000000000400dd8
          RIP: 0010:amd_iommu_deactivate_guest_mode+0x19/0xbc
          Call Trace:
           svm_set_pi_irte_mode+0x8a/0xc0 [kvm_amd]
           ? kvm_make_all_cpus_request_except+0x50/0x70 [kvm]
           kvm_request_apicv_update+0x10c/0x150 [kvm]
           svm_toggle_avic_for_irq_window+0x52/0x90 [kvm_amd]
           svm_enable_irq_window+0x26/0xa0 [kvm_amd]
           vcpu_enter_guest+0xbbe/0x1560 [kvm]
           ? avic_vcpu_load+0xd5/0x120 [kvm_amd]
           ? kvm_arch_vcpu_load+0x76/0x240 [kvm]
           ? svm_get_segment_base+0xa/0x10 [kvm_amd]
           kvm_arch_vcpu_ioctl_run+0x103/0x590 [kvm]
           kvm_vcpu_ioctl+0x22a/0x5d0 [kvm]
           __x64_sys_ioctl+0x84/0xc0
           do_syscall_64+0x33/0x40
           entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes by moving the initializing of AMD IOMMU interrupt remapping mode
      (amd_iommu_guest_ir) earlier before setting up the
      amd_iommu_irq_ops.capability with appropriate IRQ_POSTING_CAP flag.
      
      [joro:	Squashed the two patches and limited
      	check_features_on_all_iommus() to CONFIG_IRQ_REMAP
      	to fix a compile warning.]
      Signed-off-by: default avatarWei Huang <wei.huang2@amd.com>
      Co-developed-by: default avatarSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Signed-off-by: default avatarSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Link: https://lore.kernel.org/r/20210820202957.187572-2-suravee.suthikulpanit@amd.com
      Link: https://lore.kernel.org/r/20210820202957.187572-3-suravee.suthikulpanit@amd.com
      Fixes: 8bda0cfb ("iommu/amd: Detect and initialize guest vAPIC log")
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      c3811a50
  2. 20 Aug, 2021 3 commits
  3. 19 Aug, 2021 9 commits
  4. 18 Aug, 2021 23 commits