    tmpfs: make tmpfs scalable with percpu_counter for used blocks · 7e496299
    Tim Chen authored
    The current implementation of tmpfs is not scalable.  We found that
    stat_lock is contended by multiple threads whenever a new page is needed,
    causing wasted spinning on this spinlock.
    
    This patch uses the percpu_counter library to maintain per-CPU local
    counts of used blocks, which speeds up getting and returning pages.  As a
    result, stat_lock no longer has to be acquired to get or return blocks,
    improving tmpfs performance on systems with a large number of CPUs.
    On a 4-socket, 32-core NHM-EX system, we saw a 270% improvement.
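    The batching idea can be made concrete with a small userspace sketch of a per-CPU counter.  It is modeled loosely on how percpu_counter works: each CPU accumulates a local delta and folds it into the shared total only once the delta reaches a batch threshold.  The struct pcpu_counter type, the pcpu_add()/pcpu_read()/pcpu_sum() helpers, and the NR_CPUS and BATCH values are hypothetical names chosen for illustration.  This is not the kernel API.  The sketch also simulates CPUs by index in a single thread, rather than using real per-CPU storage or a spinlock.

```c
#include <assert.h>

#define NR_CPUS 4   /* hypothetical CPU count for the model */
#define BATCH   32  /* hypothetical fold threshold; the kernel chooses its own */

/* Hypothetical userspace model, not the kernel's struct percpu_counter. */
struct pcpu_counter {
	long global;          /* shared total; a spinlock would guard it on SMP */
	long local[NR_CPUS];  /* per-CPU deltas not yet folded into global */
	long folds;           /* how many times the shared total was touched */
};

/* Add delta on behalf of one CPU; shared state is touched once per BATCH. */
static void pcpu_add(struct pcpu_counter *c, int cpu, long delta)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	long v = c->local[cpu] + delta;

	if (v >= BATCH || v <= -BATCH) {
		/* slow path: fold this CPU's delta into the shared total */
		c->global += v;
		c->local[cpu] = 0;
		c->folds++;
	} else {
		/* fast path: only this CPU's slot is written, no shared lock */
		c->local[cpu] = v;
	}
}

/* Cheap, approximate read: may be off by up to NR_CPUS * (BATCH - 1). */
static long pcpu_read(const struct pcpu_counter *c)
{
	return c->global;
}

/* Exact read: walks every CPU slot, so it is the expensive operation. */
static long pcpu_sum(const struct pcpu_counter *c)
{
	long s = c->global;

	for (int i = 0; i < NR_CPUS; i++)
		s += c->local[i];
	return s;
}
```

    Suppose the model allocates 1000 blocks round-robin across the four modeled CPUs and then frees 400.  That is 1400 updates, but the shared total is touched only 36 times.  pcpu_sum() returns the exact 600, while the cheap pcpu_read() returns 640.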
    
    The implementation below has a slight chance of a race between threads
    that overshoots the configured maximum number of blocks.  However, any
    overshoot is small and bounded by the number of CPUs.  It happens when
    one thread checks the used block count while it is slightly below the
    configured maximum, and another thread allocates the last block before
    the current thread does.  This should not be a problem for tmpfs, as the
    overshoot is most likely only a few blocks and is bounded.  If a strict
    limit is really required, configure the maximum blocks to be the limit
    minus the number of CPUs in the system.
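    The overshoot bound can be illustrated with a small model of the check-then-increment sequence under its worst interleaving, in which every CPU reads the used count before any of them increments it.  The model assumes at most one allocation in flight per CPU and an exact read of the count.  racy_alloc_round() and strict_max_blocks() are hypothetical helpers, not code from shmem.c.

```c
#include <assert.h>

/*
 * Hypothetical model of the check-then-increment race, not the shmem code.
 * Worst interleaving: every CPU samples `used` before any of them increments
 * it, assuming at most one allocation in flight per CPU.
 */
static long racy_alloc_round(long used, long max_blocks, int ncpus)
{
	int passed = 0;

	assert(ncpus > 0);
	for (int cpu = 0; cpu < ncpus; cpu++)  /* check phase: same stale value */
		if (used < max_blocks)
			passed++;
	return used + passed;                  /* increment phase */
}

/* The suggested workaround: leave one block of headroom per CPU. */
static long strict_max_blocks(long limit, int ncpus)
{
	return limit - ncpus;
}
```

    Take 4 CPUs, max_blocks = 100 and 99 blocks already used.  All four checks pass, and the count ends at 103, an overshoot of 3 blocks.  Setting max_blocks to strict_max_blocks(100, 4), i.e. 96, keeps the final count at or below 100 for every starting value.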
    Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
    Cc: Hugh Dickins <hughd@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>