    tmpfs: avoid a little creat and stat slowdown · d0424c42
    Hugh Dickins authored
    LKP reports that v4.2 commit afa2db2f ("tmpfs: truncate prealloc
    blocks past i_size") causes a 14.5% slowdown in the AIM9 creat-clo
    benchmark.
    
    creat-clo does just what you'd expect from the name, and creat's O_TRUNC
    on a 0-length file does indeed incur more overhead now that
    shmem_setattr() tests "0 <= 0" instead of "0 < 0".
    
    I'm not sure how much we care, but I think it would not be too VW-like to
    add in a check for whether any pages (or swap) are allocated: if none are
    allocated, there's none to remove from the radix_tree.  At first I thought
    that check would be good enough for the unmaps too, but no: we should not
    skip the unlikely case of unmapping pages beyond the new EOF, which were
    COWed from holes which have now been reclaimed, leaving none.
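
    The decision described above can be sketched as a small userspace
    model. This is not the kernel code: struct model_inode, struct
    trunc_plan, plan_setsize() and MODEL_PAGE_SIZE are hypothetical names
    invented for illustration, and the size comparison guarding the unmap
    is an assumption of the sketch. It only shows which of the two
    expensive steps would run for a given size change.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096LL  /* assumed page size for this model */

/* Hypothetical stand-in for the per-inode state tmpfs keeps. */
struct model_inode {
    int64_t size;     /* current file size (i_size) */
    long    alloced;  /* pages (or swap entries) charged to the file */
};

/* Which expensive steps a size change would need, in this model. */
struct trunc_plan {
    bool unmap;       /* unmap user mappings beyond the new EOF */
    bool truncate;    /* walk the page cache tree and remove pages */
};

static int64_t model_round_up(int64_t x, int64_t align)
{
    return (x + align - 1) / align * align;
}

static struct trunc_plan plan_setsize(const struct model_inode *inode,
                                      int64_t newsize)
{
    struct trunc_plan plan = { false, false };
    int64_t oldsize = inode->size;

    /* "<=" rather than "<": a 0 -> 0 truncation still enters here. */
    if (newsize <= oldsize) {
        int64_t holebegin = model_round_up(newsize, MODEL_PAGE_SIZE);

        /*
         * Unmapping cannot be skipped just because nothing is
         * allocated: privately COWed pages may still be mapped
         * beyond the new EOF. Assumption: it is only needed when
         * the old size reaches past holebegin.
         */
        if (oldsize > holebegin)
            plan.unmap = true;

        /* Nothing allocated means nothing to remove from the tree. */
        if (inode->alloced > 0)
            plan.truncate = true;
    }
    return plan;
}
```

    In this model, creat's O_TRUNC on an empty file with nothing
    allocated needs neither step, while an empty file that still holds
    preallocated pages past i_size needs only the truncate step.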
    
    This gives me an 8.5% speedup, measured on Haswell instead of LKP's
    Westmere and with a debug config before and after; I hope those
    differences account for the lesser speedup.
    
    And probably someone has a benchmark where a thousand threads keep on
    stat'ing the same file repeatedly: forestall that report by adjusting v4.3
    commit 44a30220 ("shmem: recalculate file inode when fstat") not to
    take the spinlock in shmem_getattr() when there's no work to do.
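
    The stat-side idea, checking whether there is any work before taking
    the lock, can be sketched in the same userspace style. Again this is
    a model under stated assumptions rather than the kernel code: struct
    model_info, model_recalc() and model_getattr() are hypothetical
    names, and a counter stands in for the spinlock so the effect is
    observable.

```c
/* Hypothetical stand-in for the per-inode accounting in this model. */
struct model_info {
    long alloced;     /* pages charged to the file */
    long swapped;     /* of those, how many are out on swap */
    long nrpages;     /* pages currently in the page cache */
    long lock_taken;  /* counts how often the model "lock" was taken */
};

/* Drop the charge for pages that have gone away since the last look. */
static void model_recalc(struct model_info *info)
{
    long freed = info->alloced - info->swapped - info->nrpages;

    if (freed > 0)
        info->alloced -= freed;
}

static void model_getattr(struct model_info *info)
{
    /*
     * Unlocked check first: when the counters already agree there is
     * nothing to recalculate, so the lock is never touched.
     */
    if (info->alloced - info->swapped != info->nrpages) {
        info->lock_taken++;      /* stands in for taking the spinlock */
        model_recalc(info);
        /* the real lock would be released here */
    }
}
```

    Calling model_getattr() repeatedly on an unchanged file bumps
    lock_taken at most once in this model, which is the behaviour wanted
    for many threads stat'ing the same file.
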

    Signed-off-by: Hugh Dickins <hughd@google.com>
    Reported-by: Ying Huang <ying.huang@linux.intel.com>
    Tested-by: Ying Huang <ying.huang@linux.intel.com>
    Cc: Josef Bacik <jbacik@fb.com>
    Cc: Yu Zhao <yuzhao@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
shmem.c 89.8 KB