    rcu/kvfree: Use a polled API to speedup a reclaim process · cc37d520
    Uladzislau Rezki (Sony) authored
    Currently all objects placed into a batch wait for a full grace period
    to elapse after that batch is ready to send to RCU.  However, this
    can unnecessarily delay freeing of the first objects that were added
    to the batch.  After all, several RCU grace periods might have elapsed
    since those objects were added, and if so, there is no point in further
    deferring their freeing.
    
    This commit therefore adds per-page grace-period snapshots which are
    obtained from get_state_synchronize_rcu().  When the batch is ready
    to be passed to call_rcu(), each page's snapshot is checked by passing
    it to poll_state_synchronize_rcu().  If a given page's RCU grace period
    has already elapsed, its objects are freed immediately by kvfree_rcu_bulk().
    Otherwise, these objects are freed after a call to synchronize_rcu().
    
    This approach requires that the pages be traversed in reverse order,
    that is, the oldest ones first.
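The snapshot-and-poll flow described above can be illustrated with a small userspace simulation. This is only a sketch under stated assumptions, not the kernel code: every `sim_*` name is hypothetical, a plain counter stands in for RCU grace-period state, and an array ordered newest-first stands in for the page list. The real get_state_synchronize_rcu() and poll_state_synchronize_rcu() cookies are more involved than the "completed + 1" simplification used here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for RCU state: count of completed grace periods. */
static unsigned long sim_gp_completed;

/* Simplified analogue of get_state_synchronize_rcu(): the returned cookie
 * is satisfied once one more grace period has completed. */
static unsigned long sim_get_state(void)
{
    return sim_gp_completed + 1;
}

/* Simplified analogue of poll_state_synchronize_rcu(): true if the grace
 * period named by the cookie has already elapsed. */
static bool sim_poll_state(unsigned long snap)
{
    return sim_gp_completed >= snap;
}

/* Simplified analogue of synchronize_rcu(): one grace period elapses. */
static void sim_synchronize(void)
{
    sim_gp_completed++;
}

/* A page of pointers awaiting freeing, with its own snapshot. */
struct sim_page {
    unsigned long gp_snap; /* taken when the page was set up */
    int nr_objs;           /* objects still held by the page */
};

struct sim_result {
    int freed_immediately; /* grace period had already elapsed when checked */
    int freed_after_wait;  /* needed an explicit wait first */
    int waits;             /* number of explicit waits performed */
};

/* Record a snapshot for a newly set up page. */
static void sim_page_init(struct sim_page *p, int nr_objs)
{
    p->gp_snap = sim_get_state();
    p->nr_objs = nr_objs;
}

/* pages[0] is the newest page, so walk the array backwards in order to
 * visit the oldest page first. */
static struct sim_result sim_drain(struct sim_page *pages, size_t n)
{
    struct sim_result r = {0, 0, 0};

    for (size_t i = n; i-- > 0; ) {
        struct sim_page *p = &pages[i];

        if (sim_poll_state(p->gp_snap)) {
            r.freed_immediately += p->nr_objs;
        } else {
            sim_synchronize();
            r.waits++;
            r.freed_after_wait += p->nr_objs;
        }
        p->nr_objs = 0; /* the page's objects are now released */
    }
    return r;
}
```

In this model, a wait performed for an older page also satisfies the snapshot of any newer page taken during the same grace period, which is one reason for visiting the oldest pages first.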
    
    Test example:
    
    kvm.sh --memory 10G --torture rcuscale --allcpus --duration 1 \
      --kconfig CONFIG_NR_CPUS=64 \
      --kconfig CONFIG_RCU_NOCB_CPU=y \
      --kconfig CONFIG_RCU_NOCB_CPU_DEFAULT_ALL=y \
      --kconfig CONFIG_RCU_LAZY=n \
      --bootargs "rcuscale.kfree_rcu_test=1 rcuscale.kfree_nthreads=16 \
      rcuscale.holdoff=20 rcuscale.kfree_loops=10000 \
      torture.disable_onoff_at_boot" --trust-make
    
    Before this commit:
    
    Total time taken by all kfree'ers: 8535693700 ns, loops: 10000, batches: 1188, memory footprint: 2248MB
    Total time taken by all kfree'ers: 8466933582 ns, loops: 10000, batches: 1157, memory footprint: 2820MB
    Total time taken by all kfree'ers: 5375602446 ns, loops: 10000, batches: 1130, memory footprint: 6502MB
    Total time taken by all kfree'ers: 7523283832 ns, loops: 10000, batches: 1006, memory footprint: 3343MB
    Total time taken by all kfree'ers: 6459171956 ns, loops: 10000, batches: 1150, memory footprint: 6549MB
    
    After this commit:
    
    Total time taken by all kfree'ers: 8560060176 ns, loops: 10000, batches: 1787, memory footprint: 61MB
    Total time taken by all kfree'ers: 8573885501 ns, loops: 10000, batches: 1777, memory footprint: 93MB
    Total time taken by all kfree'ers: 8320000202 ns, loops: 10000, batches: 1727, memory footprint: 66MB
    Total time taken by all kfree'ers: 8552718794 ns, loops: 10000, batches: 1790, memory footprint: 75MB
    Total time taken by all kfree'ers: 8601368792 ns, loops: 10000, batches: 1724, memory footprint: 62MB
    
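    The reduction in memory footprint is well in excess of an order of
    magnitude: averaged over the five runs shown, it drops from roughly
    4292MB to roughly 71MB, about a 60-fold reduction.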
    
    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
tree.c 160 KB