    x86,kvm: move qemu/guest FPU switching out to vcpu_run · f775b13e
    Rik van Riel authored
    Currently, every time a VCPU is scheduled out, the host kernel will
    first save the guest FPU/xstate context, then load the qemu userspace
    FPU context, only to then immediately save the qemu userspace FPU
    context back to memory. When scheduling in a VCPU, the same extraneous
    FPU loads and saves are done.
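
    Concretely, the old sched-out path did roughly the following (a
    simplified call sequence based on the pre-patch code, not the exact
    source):

        kvm_arch_vcpu_put(vcpu)
            kvm_put_guest_fpu(vcpu)
                /* save guest FPU/xstate to memory */
                copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu);
                /* reload qemu's user FPU into the registers */
                __kernel_fpu_end();
        /* ...which the context switch then immediately saves back: */
        switch_fpu_prepare(&current->thread.fpu, cpu);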
    
    This could be avoided by moving from a model where the guest FPU is
    loaded and stored with preemption disabled, to a model where the
    qemu userspace FPU is swapped out for the guest FPU context for
    the duration of the KVM_RUN ioctl.
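
    The shape of the change, condensed from the patch (PKRU handling,
    stats, and tracepoints omitted here):

        /* new in struct kvm_vcpu_arch: a slot to park qemu's FPU state */
        struct fpu user_fpu;

        /* Swap (qemu) user FPU context for the guest FPU context. */
        static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
        {
                preempt_disable();
                copy_fpregs_to_fpstate(&vcpu->arch.user_fpu);
                copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state);
                preempt_enable();
        }

        /* When vcpu_run ends, restore the user space FPU context. */
        static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
        {
                preempt_disable();
                copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu);
                copy_kernel_to_fpregs(&vcpu->arch.user_fpu.state);
                preempt_enable();
        }

        /* in kvm_arch_vcpu_ioctl_run(): the swap brackets the run loop */
        kvm_load_guest_fpu(vcpu);
        r = vcpu_run(vcpu);
        kvm_put_guest_fpu(vcpu);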
    
    This is done under the VCPU mutex, which is also taken when other
    tasks inspect the VCPU FPU context, so the code should already be
    safe for this change. That should come as no surprise, given that
    s390 already has this optimization.
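
    For reference, the generic KVM ioctl path of this era already
    serializes all per-VCPU ioctls on that mutex (simplified sketch;
    vcpu_load() is what takes vcpu->mutex):

        static long kvm_vcpu_ioctl(struct file *filp,
                                   unsigned int ioctl, unsigned long arg)
        {
                struct kvm_vcpu *vcpu = filp->private_data;
                int r;

                r = vcpu_load(vcpu);  /* mutex_lock_killable(&vcpu->mutex) */
                if (r)
                        return r;
                switch (ioctl) {
                case KVM_RUN:         /* FPU swap now happens in here */
                case KVM_GET_FPU:     /* other FPU readers/writers... */
                case KVM_SET_FPU:     /* ...already hold the same mutex */
                        ...
                }
                vcpu_put(vcpu);       /* releases vcpu->mutex */
                return r;
        }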
    
    This also fixes a bug where KVM calls get_user_pages while owning the
    FPU, and the file system ends up requesting the FPU again:
    
        [258270.527947]  __warn+0xcb/0xf0
        [258270.527948]  warn_slowpath_null+0x1d/0x20
        [258270.527951]  kernel_fpu_disable+0x3f/0x50
        [258270.527953]  __kernel_fpu_begin+0x49/0x100
        [258270.527955]  kernel_fpu_begin+0xe/0x10
        [258270.527958]  crc32c_pcl_intel_update+0x84/0xb0
        [258270.527961]  crypto_shash_update+0x3f/0x110
        [258270.527968]  crc32c+0x63/0x8a [libcrc32c]
        [258270.527975]  dm_bm_checksum+0x1b/0x20 [dm_persistent_data]
        [258270.527978]  node_prepare_for_write+0x44/0x70 [dm_persistent_data]
        [258270.527985]  dm_block_manager_write_callback+0x41/0x50 [dm_persistent_data]
        [258270.527988]  submit_io+0x170/0x1b0 [dm_bufio]
        [258270.527992]  __write_dirty_buffer+0x89/0x90 [dm_bufio]
        [258270.527994]  __make_buffer_clean+0x4f/0x80 [dm_bufio]
        [258270.527996]  __try_evict_buffer+0x42/0x60 [dm_bufio]
        [258270.527998]  dm_bufio_shrink_scan+0xc0/0x130 [dm_bufio]
        [258270.528002]  shrink_slab.part.40+0x1f5/0x420
        [258270.528004]  shrink_node+0x22c/0x320
        [258270.528006]  do_try_to_free_pages+0xf5/0x330
        [258270.528008]  try_to_free_pages+0xe9/0x190
        [258270.528009]  __alloc_pages_slowpath+0x40f/0xba0
        [258270.528011]  __alloc_pages_nodemask+0x209/0x260
        [258270.528014]  alloc_pages_vma+0x1f1/0x250
        [258270.528017]  do_huge_pmd_anonymous_page+0x123/0x660
        [258270.528021]  handle_mm_fault+0xfd3/0x1330
        [258270.528025]  __get_user_pages+0x113/0x640
        [258270.528027]  get_user_pages+0x4f/0x60
        [258270.528063]  __gfn_to_pfn_memslot+0x120/0x3f0 [kvm]
        [258270.528108]  try_async_pf+0x66/0x230 [kvm]
        [258270.528135]  tdp_page_fault+0x130/0x280 [kvm]
        [258270.528149]  kvm_mmu_page_fault+0x60/0x120 [kvm]
        [258270.528158]  handle_ept_violation+0x91/0x170 [kvm_intel]
        [258270.528162]  vmx_handle_exit+0x1ca/0x1400 [kvm_intel]
    
    No performance changes were detected in quick ping-pong tests on
    my 4-socket system, which is expected: an FPU+xstate load is on the
    order of 0.1us, while ping-ponging between CPUs is on the order of
    20us, and somewhat noisy.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Rik van Riel <riel@redhat.com>
    Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    [Fixed a bug where reset_vcpu called put_fpu without a preceding
     load_fpu, which happened from inside the KVM_CREATE_VCPU ioctl. - Radim]
    Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>