    KVM: PPC: Book3S HV: XIVE: Make VP block size configurable · 062cfab7
    Greg Kurz authored
    The XIVE VP is an internal structure which allows the XIVE interrupt
    controller to maintain the interrupt context state of vCPUs that are
    not dispatched on HW threads.
    
    When a guest is started, the XIVE KVM device allocates a block of
    XIVE VPs in OPAL, enough to accommodate the highest possible vCPU
    id KVM_MAX_VCPU_ID (16384) packed down to KVM_MAX_VCPUS (2048).
    With a guest's core stride of 8 and a threading mode of 1 (QEMU's
    default), a VM must run at least 256 vCPUs to actually need such a
    range of VPs.
    
    A POWER9 system has a limited XIVE VP space (512k), and KVM is
    currently wasting this HW resource with large VP allocations,
    especially since a typical VM likely runs with far fewer vCPUs.
    
    Make the size of the VP block configurable. Add an nr_servers
    field to the XIVE structure and a function to set it for this
    purpose.
    
    Split VP allocation out of the device create function. Since the
    VP block isn't used before the first vCPU connects to the XIVE KVM
    device, allocation is now performed by kvmppc_xive_connect_vcpu().
    This gives the opportunity to set nr_servers in between:
    
              kvmppc_xive_create() / kvmppc_xive_native_create()
                                   .
                                   .
                         kvmppc_xive_set_nr_servers()
                                   .
                                   .
        kvmppc_xive_connect_vcpu() / kvmppc_xive_native_connect_vcpu()
    
    The connect_vcpu() functions check that the vCPU id is below nr_servers
    and, if it is the first vCPU, they allocate the VP block. The xive->lock
    mutex protects this against a concurrent update of nr_servers by
    kvmppc_xive_set_nr_servers().
    
    Also, the block is allocated once for the device lifetime: nr_servers
    must stay constant, otherwise connect_vcpu() could generate a bogus
    VP id and likely crash OPAL. It is thus forbidden to update nr_servers
    once the block is allocated.
    
    If the VP allocation fails, return ENOSPC, which seems more appropriate
    to report the depletion of a system-wide HW resource than ENOMEM or ENXIO.
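    
    The ordering and locking rules above can be sketched as a small
    userspace model. This is only an illustration, not the kernel code:
    every model_* name is hypothetical, a pthread mutex stands in for
    xive->lock, the VP pool is a plain counter instead of an OPAL call,
    the VP id ignores vCPU id packing, and the EINVAL and EBUSY return
    values are assumptions. Only ENOSPC is taken from this message.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

#define MODEL_MAX_VCPUS   2048u       /* mirrors KVM_MAX_VCPUS from the text */
#define MODEL_NO_VP_BLOCK UINT32_MAX  /* "no block allocated yet" marker */

struct model_xive {
    pthread_mutex_t lock;   /* stands in for xive->lock */
    uint32_t nr_servers;    /* size of the VP block to allocate */
    uint32_t vp_base;       /* MODEL_NO_VP_BLOCK until the first connect */
};

/* Hypothetical system-wide VP pool (512k entries, as on POWER9). */
static uint32_t model_vp_pool_free = 512u * 1024u;
static uint32_t model_vp_pool_next;

static uint32_t model_alloc_vp_block(uint32_t count)
{
    uint32_t base;

    if (count > model_vp_pool_free)
        return MODEL_NO_VP_BLOCK;
    base = model_vp_pool_next;
    model_vp_pool_next += count;
    model_vp_pool_free -= count;
    return base;
}

/* Device creation: no VP block is allocated here anymore. */
void model_xive_init(struct model_xive *xive)
{
    assert(xive != NULL);
    pthread_mutex_init(&xive->lock, NULL);
    xive->nr_servers = MODEL_MAX_VCPUS;   /* assumed default */
    xive->vp_base = MODEL_NO_VP_BLOCK;
}

/* Allowed only between creation and the first vCPU connection. */
int model_set_nr_servers(struct model_xive *xive, uint32_t nr_servers)
{
    int rc = 0;

    if (nr_servers == 0 || nr_servers > MODEL_MAX_VCPUS)
        return -EINVAL;

    pthread_mutex_lock(&xive->lock);
    if (xive->vp_base != MODEL_NO_VP_BLOCK)
        rc = -EBUSY;                      /* block already allocated */
    else
        xive->nr_servers = nr_servers;
    pthread_mutex_unlock(&xive->lock);
    return rc;
}

/* Checks the vCPU id and allocates the block on the first connection. */
int model_connect_vcpu(struct model_xive *xive, uint32_t vcpu_id,
                       uint32_t *vp_id)
{
    int rc = 0;

    pthread_mutex_lock(&xive->lock);
    if (vcpu_id >= xive->nr_servers) {
        rc = -EINVAL;
        goto out;
    }
    if (xive->vp_base == MODEL_NO_VP_BLOCK) {
        xive->vp_base = model_alloc_vp_block(xive->nr_servers);
        if (xive->vp_base == MODEL_NO_VP_BLOCK) {
            rc = -ENOSPC;                 /* system-wide pool depleted */
            goto out;
        }
    }
    *vp_id = xive->vp_base + vcpu_id;     /* simplified: no id packing */
out:
    pthread_mutex_unlock(&xive->lock);
    return rc;
}
```

    Because both the nr_servers update and the first allocation take the
    same lock, a connecting vCPU sees either the old or the new value,
    never a block sized for one and an id check made against the other.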
    
    A VM using a stride of 8 and 1 thread per core with 32 vCPUs would hence
    only need 256 VPs instead of 2048. If the stride is set to match the number
    of threads per core, this goes further down to 32.
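    
    The figures above follow from rounding the vCPU count up to whole
    cores and multiplying by the core stride. The helper below is a
    hypothetical sketch of that arithmetic, assuming vCPU ids are laid
    out as core index times stride plus thread index. It is not a
    function from this patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of VPs needed to cover every vCPU id of a
 * guest, counting whole cores as the figures in the commit message do.
 */
uint32_t vps_needed(uint32_t nr_vcpus, uint32_t threads_per_core,
                    uint32_t core_stride)
{
    uint32_t nr_cores;

    assert(nr_vcpus > 0 && threads_per_core > 0);
    assert(core_stride >= threads_per_core);

    /* Round up so a partially populated core still counts as one core. */
    nr_cores = (nr_vcpus + threads_per_core - 1) / threads_per_core;
    return nr_cores * core_stride;
}
```

    With these assumptions, vps_needed(32, 1, 8) gives 256 and
    vps_needed(32, 1, 1) gives 32, while vps_needed(256, 1, 8) reaches
    the full 2048.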
    
    This will be exposed to userspace by a subsequent patch.
    
    Signed-off-by: Greg Kurz <groug@kaod.org>
    Reviewed-by: Cédric Le Goater <clg@kaod.org>
    Signed-off-by: Paul Mackerras <paulus@ozlabs.org>