• David Hildenbrand's avatar
    virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM · 5a6b4cc5
    David Hildenbrand authored
    Commit 71994620 ("virtio_balloon: replace oom notifier with shrinker")
    changed the behavior when deflation happens automatically. Instead of
    deflating when called by the OOM handler, the shrinker is used.
    
    However, the balloon is not simply some slab cache that should be
    shrunk when under memory pressure. The shrinker does not have a concept of
    priorities, so this behavior cannot be configured.
    
    There was a report that this results in undesired side effects when
    inflating the balloon to shrink the page cache. [1]
    	"When inflating the balloon against page cache (i.e. no free memory
    	 remains) vmscan.c will both shrink page cache, but also invoke the
    	 shrinkers -- including the balloon's shrinker. So the balloon
    	 driver allocates memory which requires reclaim, vmscan gets this
    	 memory by shrinking the balloon, and then the driver adds the
    	 memory back to the balloon. Basically a busy no-op."
    
    The name "deflate on OOM" makes it pretty clear when deflation should
    happen - after other approaches to reclaim memory failed, not while
    reclaiming. This allows to minimize the footprint of a guest - memory
    will only be taken out of the balloon when really needed.
    
    Especially, a drop_slab() will result in the whole balloon getting
    deflated - undesired. While handling it via the OOM handler might not be
    perfect, it keeps existing behavior. If we want a different behavior, then
    we need a new feature bit and document it properly (although, there should
    be a clear use case and the intended effects should be well described).
    
    Keep using the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT, because
    this has no such side effects. Always register the shrinker with
    VIRTIO_BALLOON_F_FREE_PAGE_HINT now. We are always allowed to reuse free
    pages that are still to be processed by the guest. The hypervisor takes
    care of identifying and resolving possible races between processing a
    hinting request and the guest reusing a page.
    
    In contrast to pre commit 71994620 ("virtio_balloon: replace oom
    notifier with shrinker"), don't add a moodule parameter to configure the
    number of pages to deflate on OOM. Can be re-added if really needed.
    Also, pay attention that leak_balloon() returns the number of 4k pages -
    convert it properly in virtio_balloon_oom_notify().
    
    Note1: using the OOM handler is frowned upon, but it really is what we
           need for this feature.
    
    Note2: without VIRTIO_BALLOON_F_MUST_TELL_HOST (iow, always with QEMU) we
           could actually skip sending deflation requests to our hypervisor,
           making the OOM path *very* simple. Besically freeing pages and
           updating the balloon. If the communication with the host ever
           becomes a problem on this call path.
    
    [1] https://www.spinics.net/lists/linux-virtualization/msg40863.html
    
    Test report by Tyler Sanderson:
    
    Test setup: VM with 16 CPU, 64GB RAM. Running Debian 10. We have a 42
    GB file full of random bytes that we continually cat to /dev/null.
    This fills the page cache as the file is read. Meanwhile we trigger
    the balloon to inflate, with a target size of 53 GB. This setup causes
    the balloon inflation to pressure the page cache as the page cache is
    also trying to grow. Afterwards we shrink the balloon back to zero (so
    total deflate = total inflate).
    
    Without patch (kernel 4.19.0-5):
    Inflation never reaches the target until we stop the "cat file >
    /dev/null" process. Total inflation time was 542 seconds. The longest
    period that made no net forward progress was 315 seconds (see attached
    graph).
    Result of "grep balloon /proc/vmstat" after the test:
    balloon_inflate 154828377
    balloon_deflate 154828377
    
    With patch (kernel 5.6.0-rc4+):
    Total inflation duration was 63 seconds. No deflate-queue activity
    occurs when pressuring the page-cache.
    Result of "grep balloon /proc/vmstat" after the test:
    balloon_inflate 12968539
    balloon_deflate 12968539
    
    Conclusion: This patch fixes the issue. In the test it reduced
    inflate/deflate activity by 12x, and reduced inflation time by 8.6x.
    But more importantly, if we hadn't killed the "grep balloon
    /proc/vmstat" process then, without the patch, the inflation process
    would never reach the target.
    
    Attached [1] is a png of a graph showing the problematic behavior without
    this patch. It shows deflate-queue activity increasing linearly while
    balloon size stays constant over the course of more than 8 minutes of
    the test.
    
    [1] https://lore.kernel.org/linux-mm/CAJuQAmphPcfew1v_EOgAdSFiprzjiZjmOf3iJDmFX0gD6b9TYQ@mail.gmail.com/2-without_patch.png
    
    Full test report and discussion [2]:
    
    [2] https://lore.kernel.org/r/CAJuQAmphPcfew1v_EOgAdSFiprzjiZjmOf3iJDmFX0gD6b9TYQ@mail.gmail.comTested-by: default avatarTyler Sanderson <tysand@google.com>
    Reported-by: default avatarTyler Sanderson <tysand@google.com>
    Cc: Michael S. Tsirkin <mst@redhat.com>
    Cc: Wei Wang <wei.w.wang@intel.com>
    Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Nadav Amit <namit@vmware.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
    Link: https://lore.kernel.org/r/20200205163402.42627-4-david@redhat.comSigned-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
    5a6b4cc5
virtio_balloon.c 29.9 KB