    zsmalloc: consolidate zs_pool's migrate_lock and size_class's locks · c0547d0b
    Nhat Pham authored
    Currently, zsmalloc has a hierarchy of locks: a pool-level migrate_lock
    and a lock for each size class.  In most cases we have to obtain both
    locks on the hot path anyway; the only exception is zs_malloc.  That
    exception will no longer exist once we introduce an LRU into the zs_pool
    for the new writeback functionality, because we will then need to take a
    pool-level lock to synchronize LRU handling even in zs_malloc.
    
    In preparation for zsmalloc writeback, consolidate these locks into a
    single pool-level lock, which drastically reduces the complexity of
    synchronization in zsmalloc.
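    
    A minimal sketch of the consolidated scheme (again simplified, with the
    same illustrative obj_free_path() stand-in as above):
    
    /* After consolidation: one pool-level lock protects everything. */
    struct zs_pool {
            spinlock_t lock;        /* replaces migrate_lock + class locks */
            /* ... */
    };
    
    static void obj_free_path(struct zs_pool *pool, unsigned long handle)
    {
            spin_lock(&pool->lock);  /* covers migration, class free lists,
                                      * and, later, the writeback LRU */
            /* ... actually free the object ... */
            spin_unlock(&pool->lock);
    }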
    
    We have also benchmarked the lock consolidation to see the performance
    effect of this change on zram.
    
    First, we ran a synthetic FS workload on a server machine with 36 cores
    (same machine for all runs), using
    
    fs_mark -d ../zram1mnt -s 100000 -n 2500 -t 32 -k
    
    before and after the change, for btrfs and ext4 on zram (FS usage is 80%).
    
    Here is the result (unit is file/second):
    
    With lock consolidation (btrfs):
    Average: 13520.2, Median: 13531.0, Stddev: 137.5961482019028
    
    Without lock consolidation (btrfs):
    Average: 13487.2, Median: 13575.0, Stddev: 309.08283679298665
    
    With lock consolidation (ext4):
    Average: 16824.4, Median: 16839.0, Stddev: 89.97388510006668
    
    Without lock consolidation (ext4):
    Average: 16958.0, Median: 16986.0, Stddev: 194.7370021336469
    
    As you can see, we observe a 0.3% regression for btrfs and a 0.9%
    regression for ext4 (comparing medians).  In my opinion, this is a
    small, barely measurable difference, well within the observed standard
    deviations.
    
    For a more realistic scenario, we also tried building the kernel on zram.
    Here is the time it takes (in seconds):
    
    With lock consolidation (btrfs):
    real
    Average: 319.6, Median: 320.0, Stddev: 0.8944271909999159
    user
    Average: 6894.2, Median: 6895.0, Stddev: 25.528415540334656
    sys
    Average: 521.4, Median: 522.0, Stddev: 1.51657508881031
    
    Without lock consolidation (btrfs):
    real
    Average: 319.8, Median: 320.0, Stddev: 0.8366600265340756
    user
    Average: 6896.6, Median: 6899.0, Stddev: 16.04057355583023
    sys
    Average: 520.6, Median: 521.0, Stddev: 1.140175425099138
    
    With lock consolidation (ext4):
    real
    Average: 320.0, Median: 319.0, Stddev: 1.4142135623730951
    user
    Average: 6896.8, Median: 6878.0, Stddev: 28.621670111997307
    sys
    Average: 521.2, Median: 521.0, Stddev: 1.7888543819998317
    
    Without lock consolidation (ext4):
    real
    Average: 319.6, Median: 319.0, Stddev: 0.8944271909999159
    user
    Average: 6886.2, Median: 6887.0, Stddev: 16.93221781102523
    sys
    Average: 520.4, Median: 520.0, Stddev: 1.140175425099138
    
    The differences are entirely within the noise of a typical run on zram.
    Such parity hardly justifies the complexity of maintaining both the pool
    lock and the class lock.  In fact, for writeback, we would need to
    introduce yet another lock to prevent data races on the pool's LRU,
    further complicating the lock handling logic.  IMHO, it is just better
    to collapse all of these into a single pool-level lock.
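    
    For instance, once the writeback LRU exists, zs_malloc can update it
    under the same pool lock.  The sketch below is a hypothetical
    illustration of that follow-up: the lru fields and the alloc_obj() and
    handle_to_zspage() helpers are assumptions for illustration, not part of
    this patch:
    
    #include <linux/list.h>
    
    /* Hypothetical: the writeback series adds a pool-level LRU. */
    static unsigned long zs_malloc_path(struct zs_pool *pool, size_t size)
    {
            struct zspage *zspage;
            unsigned long handle;
    
            spin_lock(&pool->lock);
            handle = alloc_obj(pool, size);         /* illustrative helper */
            zspage = handle_to_zspage(handle);      /* illustrative helper */
            /* the same lock now also covers LRU maintenance */
            list_move(&zspage->lru, &pool->lru);    /* assumed lru fields */
            spin_unlock(&pool->lock);
            return handle;
    }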
    
    Link: https://lkml.kernel.org/r/20221128191616.1261026-4-nphamcs@gmail.com
    Signed-off-by: Nhat Pham <nphamcs@gmail.com>
    Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
    Cc: Dan Streetman <ddstreet@ieee.org>
    Cc: Nitin Gupta <ngupta@vflare.org>
    Cc: Seth Jennings <sjenning@redhat.com>
    Cc: Vitaly Wool <vitaly.wool@konsulko.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>