    KVM: x86: Suppress pending MMIO write exits if emulator detects exception · 0dc90226
    Sean Christopherson authored
    Clear vcpu->mmio_needed when injecting an exception from the emulator to
    squash a (legitimate) warning about vcpu->mmio_needed being true at the
    start of KVM_RUN without a callback being registered to complete the
    userspace MMIO exit.  Suppressing the MMIO write exit is inarguably wrong
    from an architectural perspective, but it is the least awful hack-a-fix
    due to shortcomings in KVM's uAPI, not to mention that KVM already
    suppresses MMIO writes in this scenario.
    
    Outside of REP string instructions, KVM doesn't provide a way to resume
    an instruction at the exact point where it was "interrupted" if said
    instruction partially completed before encountering an MMIO access.  For
    MMIO reads, KVM immediately exits to userspace upon detecting MMIO as
    userspace provides the to-be-read value in a buffer, and so KVM can safely
    (more or less) restart the instruction from the beginning.  When the
    emulator re-encounters the MMIO read, KVM will service the MMIO by getting
    the value from the buffer instead of exiting to userspace, i.e. KVM won't
    put the vCPU into an infinite loop.
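    
    The restart-on-read behavior described above can be sketched with a toy
    userspace model.  This is only an illustration, not KVM's code: the names
    toy_vcpu, toy_emulate_mmio_read() and toy_userspace_complete_read() are
    hypothetical, and the real emulator tracks far more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-vCPU MMIO read state. */
struct toy_vcpu {
	bool     read_completed;  /* userspace has filled read_buf */
	uint32_t read_buf;        /* value supplied by userspace */
	int      exits;           /* number of exits to userspace */
};

enum toy_result { TOY_DONE, TOY_EXIT_TO_USERSPACE };

/*
 * Emulate one instruction that performs a single MMIO read.  Emulation
 * always starts from the beginning of the instruction.
 */
enum toy_result toy_emulate_mmio_read(struct toy_vcpu *vcpu, uint32_t *dst)
{
	assert(vcpu && dst);

	if (!vcpu->read_completed) {
		/* First pass: no data yet, exit so userspace can provide it. */
		vcpu->exits++;
		return TOY_EXIT_TO_USERSPACE;
	}

	/* Restarted pass: consume the buffered value instead of exiting. */
	*dst = vcpu->read_buf;
	vcpu->read_completed = false;
	return TOY_DONE;
}

/* What userspace does between the two passes (assumed interface). */
void toy_userspace_complete_read(struct toy_vcpu *vcpu, uint32_t val)
{
	assert(vcpu);
	vcpu->read_buf = val;
	vcpu->read_completed = true;
}
```

    In this model the second call completes without a further exit, which is
    why restarting a read from the top of the instruction does not loop.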
    
    On an emulated MMIO write, KVM finishes the instruction before exiting to
    userspace, as exiting immediately would ultimately hang the vCPU due to
    the aforementioned shortcoming of KVM not being able to resume emulation
    in the middle of an instruction.
    
    For the vast majority of _emulated_ instructions, deferring the userspace
    exit doesn't cause problems as very few x86 instructions (again ignoring
    string operations) generate multiple writes.  But for instructions that
    generate multiple writes, e.g. PUSHA (multiple pushes onto the stack),
    deferring the exit effectively results in only the final write triggering
    an exit to userspace.  KVM does support multiple MMIO "fragments", but
    only for page splits; if an instruction performs multiple distinct MMIO
    writes, the number of fragments gets reset when the next MMIO write comes
    along and any previous MMIO writes are dropped.
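    
    A minimal sketch of the dropped-write behavior, under stated assumptions:
    toy_vcpu, toy_queue_mmio_write(), toy_emulate_pusha() and
    TOY_MAX_FRAGMENTS are hypothetical names, page-split fragments are not
    modeled, and the stack is assumed to sit entirely in MMIO space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_FRAGMENTS 2  /* assumed: room for one page-split access */

struct toy_fragment {
	uint64_t gpa;
	uint32_t data;
};

struct toy_vcpu {
	bool mmio_needed;
	int  nr_fragments;
	struct toy_fragment frags[TOY_MAX_FRAGMENTS];
};

/* Queue a deferred MMIO write; a new write discards whatever was pending. */
void toy_queue_mmio_write(struct toy_vcpu *vcpu, uint64_t gpa, uint32_t data)
{
	assert(vcpu);
	vcpu->nr_fragments = 0;  /* reset: earlier writes are lost */
	vcpu->frags[vcpu->nr_fragments++] = (struct toy_fragment){ gpa, data };
	vcpu->mmio_needed = true;
}

/* PUSHA-like instruction: eight 4-byte pushes onto a stack in MMIO space. */
void toy_emulate_pusha(struct toy_vcpu *vcpu, uint64_t sp,
		       const uint32_t regs[8])
{
	for (int i = 0; i < 8; i++) {
		sp -= 4;
		toy_queue_mmio_write(vcpu, sp, regs[i]);
	}
}
```

    After toy_emulate_pusha() returns, only the eighth push is still queued,
    so userspace in this model would observe a single write.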
    
    Circling back to the warning, if a deferred MMIO write coincides with an
    exception, e.g. in this case a #SS due to PUSHA underflowing the stack
    after queueing a write to an MMIO page on a previous push, KVM injects
    the exception and leaves the deferred MMIO pending without registering a
    callback, thus triggering the splat.
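    
    The splat and the suppression can be modeled roughly as follows.  This is
    a sketch under assumptions, not the actual change to x86.c: toy_vcpu,
    toy_run_would_warn(), toy_finish_emulation() and toy_complete_mmio() are
    hypothetical names, and the 'suppress' flag exists only to contrast the
    behavior with and without clearing mmio_needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_vcpu {
	bool mmio_needed;         /* deferred MMIO write is pending */
	bool exception_injected;
	int (*complete_userspace_io)(struct toy_vcpu *vcpu);
};

/* Callback that completes the userspace MMIO exit on the next run. */
int toy_complete_mmio(struct toy_vcpu *vcpu)
{
	vcpu->mmio_needed = false;
	return 1;
}

/* True if the sanity check at the start of a run would fire. */
bool toy_run_would_warn(const struct toy_vcpu *vcpu)
{
	assert(vcpu);
	return vcpu->mmio_needed && vcpu->complete_userspace_io == NULL;
}

/*
 * End of emulation for an instruction that may have queued an MMIO write.
 * With 'suppress' set, a pending write is dropped when an exception is
 * injected, which is the behavior this commit describes.
 */
void toy_finish_emulation(struct toy_vcpu *vcpu, bool have_exception,
			  bool suppress)
{
	assert(vcpu);

	if (have_exception) {
		if (suppress)
			vcpu->mmio_needed = false;  /* drop the deferred write */
		vcpu->exception_injected = true;    /* no callback registered */
		return;
	}

	if (vcpu->mmio_needed)
		vcpu->complete_userspace_io = toy_complete_mmio;
}
```

    Without suppression the model ends up with mmio_needed set and no
    callback, which is the state the warning complains about.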
    
    Sweep the problem under the proverbial rug as dropping MMIO writes is not
    unique to the exception scenario (see above), i.e. instructions like PUSHA
    are fundamentally broken with respect to MMIO, and have been since KVM's
    inception.
    
    Reported-by: zhangjianguo <zhangjianguo18@huawei.com>
    Reported-by: syzbot+760a73552f47a8cd0fd9@syzkaller.appspotmail.com
    Reported-by: syzbot+8accb43ddc6bd1f5713a@syzkaller.appspotmail.com
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20230322141220.2206241-1-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
x86.c 360 KB