    KVM: PPC: Fix TCE handling for VFIO · 26a62b75
    Alexey Kardashevskiy authored
    The LoPAPR spec defines a guest visible IOMMU with a variable page size.
    Currently QEMU advertises 4K, 64K, 2MB and 16MB pages, and a Linux VM picks
    the biggest (16MB). In the case of a passed-through PCI device, there is
    a hardware IOMMU which does not support all of the above page sizes:
    P8 cannot do 2MB and P9 cannot do 16MB. So for each emulated
    16MB IOMMU page we may create several smaller mappings ("TCEs") in
    the hardware IOMMU.
    
    The code wrongly uses the emulated TCE index instead of hardware TCE
    index in error handling. The problem is easier to see on POWER8 with
    multi-level TCE tables (when only the first level is preallocated)
    as hash mode uses real mode TCE hypercall handlers.
    The kernel starts using indirect tables when VMs get bigger than 128GB
    (depends on the max page order).
    The very first real mode hcall is going to fail with H_TOO_HARD, as
    in real mode we cannot allocate memory for TCEs (we can in virtual
    mode). On the way out, however, the code attempts to clear hardware TCEs
    using emulated TCE indexes. This corrupts random kernel memory because
    it_offset==1<<59 is subtracted from those indexes and the resulting index
    is out of the TCE table bounds.
    
    This fixes kvmppc_clear_tce() to use the correct TCE indexes.
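
The index conversion described above can be sketched as a small userspace model. This is not the kernel's kvmppc_clear_tce(); struct hw_tce_table and clear_emulated_tce() are hypothetical names, and the table layout is a simplifying assumption made only to show how one emulated TCE index expands into a range of hardware TCE indexes that must be bounds-checked against the hardware table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a hardware TCE table (assumption, not a kernel struct). */
struct hw_tce_table {
    uint64_t it_offset;   /* first hardware TCE index covered by the table */
    uint64_t it_size;     /* number of hardware TCEs in the table */
    unsigned page_shift;  /* hardware IOMMU page shift, e.g. 16 for 64K */
    uint64_t *tces;       /* it_size entries */
};

/*
 * Clear every hardware TCE backing one emulated TCE.
 * emu_entry is an index in units of emulated pages (1 << emu_shift bytes).
 * Returns the number of hardware TCEs cleared, or -1 if the range falls
 * outside the hardware table.
 */
static long clear_emulated_tce(struct hw_tce_table *tbl, unsigned emu_shift,
                               uint64_t emu_entry)
{
    assert(emu_shift < 64 && emu_shift >= tbl->page_shift);

    unsigned delta = emu_shift - tbl->page_shift;
    uint64_t subpages = 1ULL << delta;       /* hardware TCEs per emulated TCE */
    uint64_t io_entry = emu_entry << delta;  /* first hardware TCE index */

    /* Using emu_entry here instead of io_entry is the class of bug described. */
    if (io_entry < tbl->it_offset)
        return -1;

    uint64_t first = io_entry - tbl->it_offset;
    if (first >= tbl->it_size || subpages > tbl->it_size - first)
        return -1;

    for (uint64_t i = 0; i < subpages; i++)
        tbl->tces[first + i] = 0;

    return (long)subpages;
}
```

With a 16MB emulated page (shift 24) backed by 64K hardware pages (shift 16), each call in this model clears 256 consecutive hardware entries, and an index that lands outside the table is rejected rather than written.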
    
    While at it, this fixes TCE cache invalidation, which uses emulated TCE
    indexes instead of the hardware ones. This went unnoticed because 64bit
    DMA is used these days: VMs map all RAM in one go and only then do DMA,
    which is when the TCE cache gets populated.
    
    Potentially this could slow down mapping. However, normally 16MB
    emulated pages are backed by 64K hardware pages, so it is one write to
    "TCE Kill" per 256 updates, which is not that bad considering the size
    of the cache (1024 TCEs or so).
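
The arithmetic behind the "one write per 256 updates" estimate can be checked with a short sketch. struct map_cost and cost_of_mapping() are hypothetical names used only for illustration, and the model assumes exactly one invalidation write per emulated page, as the paragraph above describes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tally of the work done when mapping emulated IOMMU pages. */
struct map_cost {
    uint64_t tce_updates;  /* hardware TCE entries written */
    uint64_t kill_writes;  /* invalidation ("TCE Kill") writes */
};

/*
 * Assumes one invalidation write per emulated page, covering all of the
 * hardware TCEs that back it.
 */
static struct map_cost cost_of_mapping(uint64_t emu_pages, unsigned emu_shift,
                                       unsigned hw_shift)
{
    assert(emu_shift < 64 && emu_shift >= hw_shift);

    uint64_t subpages = 1ULL << (emu_shift - hw_shift);
    struct map_cost cost = {
        .tce_updates = emu_pages * subpages,
        .kill_writes = emu_pages,
    };
    return cost;
}
```

In this model, cost_of_mapping(4, 24, 16) gives 1024 hardware TCE updates for 4 invalidation writes, so four 16MB emulated pages backed by 64K hardware pages already account for about as many TCEs as the cache size quoted above.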
    
    Fixes: ca1fc489 ("KVM: PPC: Book3S: Allow backing bigger guest IOMMU pages with smaller physical pages")
    Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
    Tested-by: David Gibson <david@gibson.dropbear.id.au>
    Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
    Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20220420050840.328223-1-aik@ozlabs.ru
book3s_64_vio.c 17.6 KB