• Wang Shilong's avatar
    ext4: reduce lock contention in __ext4_new_inode · 901ed070
    Wang Shilong authored
    While running number of creating file threads concurrently,
    we found heavy lock contention on group spinlock:
    
    FUNC                           TOTAL_TIME(us)       COUNT        AVG(us)
    ext4_create                    1707443399           1440000      1185.72
    _raw_spin_lock                 1317641501           180899929    7.28
    jbd2__journal_start            287821030            1453950      197.96
    jbd2_journal_get_write_access  33441470             73077185     0.46
    ext4_add_nondir                29435963             1440000      20.44
    ext4_add_entry                 26015166             1440049      18.07
    ext4_dx_add_entry              25729337             1432814      17.96
    ext4_mark_inode_dirty          12302433             5774407      2.13
    
    most of cpu time blames to _raw_spin_lock, here is some testing
    numbers with/without patch.
    
    Test environment:
    Server : SuperMicro Sever (2 x E5-2690 v3@2.60GHz, 128GB 2133MHz
             DDR4 Memory, 8GbFC)
    Storage : 2 x RAID1 (DDN SFA7700X, 4 x Toshiba PX02SMU020 200GB
              Read Intensive SSD)
    
    format command:
            mkfs.ext4 -J size=4096
    
    test command:
            mpirun -np 48 mdtest -n 30000 -d /ext4/mdtest.out -F -C \
                    -r -i 1 -v -p 10 -u #first run to load inode
    
            mpirun -np 48 mdtest -n 30000 -d /ext4/mdtest.out -F -C \
                    -r -i 3 -v -p 10 -u
    
    Kernel version: 4.13.0-rc3
    
    Test  1,440,000 files with 48 directories by 48 processes:
    
    Without patch:
    
    File Creation   File removal
    79,033          289,569 ops/per second
    81,463          285,359
    79,875          288,475
    
    With patch:
    File Creation   File removal
    810669		301694
    812805		302711
    813965		297670
    
    Creation performance is improved more than 10X with large
    journal size. The main problem here is we test bitmap
    and do some check and journal operations which could be
    slept, then we test and set with lock hold, this could
    be racy, and make 'inode' steal by other process.
    
    However, after first try, we could confirm handle has
    been started and inode bitmap journaled too, then
    we could find and set bit with lock hold directly, this
    will mostly gurateee success with second try.
    Tested-by: default avatarShuichi Ihara <sihara@ddn.com>
    Signed-off-by: default avatarWang Shilong <wshilong@ddn.com>
    Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
    Reviewed-by: default avatarJan Kara <jack@suse.cz>
    901ed070
ialloc.c 40 KB