    mm/zsmalloc.c: fix race condition in zs_destroy_pool · 701d6785
    Henry Burns authored
    In zs_destroy_pool() we call flush_work(&pool->free_work).  However, we
    have no guarantee that migration isn't happening in the background at
    that time.
    
    Since migration can't directly free pages, it relies on free_work being
    scheduled to free the pages.  But there's nothing preventing an
    in-progress migration from queuing the work *after*
    zs_unregister_migration() has called flush_work().  That would leave
    pages still pointing at the inode when we free it.
    
    At destroy time all objects should already be free, so no new
    migrations can come in (zs_page_isolate() fails for fully-free
    zspages).  This means it is sufficient to track a "# isolated zspages"
    count by class, and have the destroy logic ensure all such pages have
    drained before proceeding.  Keeping that state under the class spinlock
    keeps the logic straightforward.
    
    The resulting memory leak could eventually lead to a crash if
    compaction hits the leaked page.  This crash can only occur if the
    zswap backend is changed at runtime (which eventually starts pool
    destruction).
    
    Link: http://lkml.kernel.org/r/20190809181751.219326-2-henryburns@google.com
    Fixes: 48b4800a ("zsmalloc: page migration support")
    Signed-off-by: Henry Burns <henryburns@google.com>
    Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
    Cc: Henry Burns <henrywolfeburns@gmail.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Jonathan Adams <jwadams@google.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>