    bcache: fix high CPU occupancy during journal · c4dc2497
    Tang Junhui authored
    After small-write I/O had been running for a long time, we found that CPU
    occupancy was very high and I/O performance had dropped by about half:
    
    [root@ceph151 internal]# top
    top - 15:51:05 up 1 day,2:43,  4 users,  load average: 16.89, 15.15, 16.53
    Tasks: 2063 total,   4 running, 2059 sleeping,   0 stopped,   0 zombie
    %Cpu(s):4.3 us, 17.1 sy 0.0 ni, 66.1 id, 12.0 wa,  0.0 hi,  0.5 si,  0.0 st
    KiB Mem : 65450044 total, 24586420 free, 38909008 used,  1954616 buff/cache
    KiB Swap: 65667068 total, 65667068 free,        0 used. 25136812 avail Mem
    
      PID USER PR NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
     2023 root 20  0       0      0      0 S 55.1  0.0   0:04.42 kworker/11:191
    14126 root 20  0       0      0      0 S 42.9  0.0   0:08.72 kworker/10:3
     9292 root 20  0       0      0      0 S 30.4  0.0   1:10.99 kworker/6:1
     8553 ceph 20  0 4242492 1.805g  18804 S 30.0  2.9 410:07.04 ceph-osd
    12287 root 20  0       0      0      0 S 26.7  0.0   0:28.13 kworker/7:85
    31019 root 20  0       0      0      0 S 26.1  0.0   1:30.79 kworker/22:1
     1787 root 20  0       0      0      0 R 25.7  0.0   5:18.45 kworker/8:7
    32169 root 20  0       0      0      0 S 14.5  0.0   1:01.92 kworker/23:1
    21476 root 20  0       0      0      0 S 13.9  0.0   0:05.09 kworker/1:54
     2204 root 20  0       0      0      0 S 12.5  0.0   1:25.17 kworker/9:10
    16994 root 20  0       0      0      0 S 12.2  0.0   0:06.27 kworker/5:106
    15714 root 20  0       0      0      0 R 10.9  0.0   0:01.85 kworker/19:2
     9661 ceph 20  0 4246876 1.731g  18800 S 10.6  2.8 403:00.80 ceph-osd
    11460 ceph 20  0 4164692 2.206g  18876 S 10.6  3.5 360:27.19 ceph-osd
     9960 root 20  0       0      0      0 S 10.2  0.0   0:02.75 kworker/2:139
    11699 ceph 20  0 4169244 1.920g  18920 S 10.2  3.1 355:23.67 ceph-osd
     6843 ceph 20  0 4197632 1.810g  18900 S  9.6  2.9 380:08.30 ceph-osd
    
    The kernel workers consumed a lot of CPU, and I found that they were
    running journal work: the journal was reclaiming resources and flushing
    btree nodes with surprising frequency.
    
    Further analysis showed that btree_flush_write() picks the btree node with
    the smallest fifo index to flush by traversing every btree node in
    c->bucket_hash. Because no lock protects the node once it has been
    selected, another worker may already have written it to the cache device;
    when that happens, we traverse c->bucket_hash again to pick another btree
    node. When the problem occurred, the number of retries was very high, and a
    lot of CPU was spent looking for an appropriate btree node.
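The cost of the retry pattern described above can be illustrated with a small user-space sketch. It is not the bcache code: `struct node`, `scan_oldest()` and `flush_one_linear()` are hypothetical names, a plain array stands in for c->bucket_hash, and a lost race is simulated by clearing the chosen node's `dirty` flag before it can be written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cached btree node; not the bcache struct. */
struct node {
	uint64_t fifo_idx; /* journal fifo index this dirty node pins */
	bool dirty;        /* cleared once any worker has written the node */
};

/* Full traversal: return the dirty node with the smallest fifo_idx, or NULL. */
static struct node *scan_oldest(struct node *nodes, size_t n, size_t *visits)
{
	struct node *best = NULL;

	assert(visits != NULL);
	for (size_t i = 0; i < n; i++) {
		(*visits)++;
		if (nodes[i].dirty &&
		    (!best || nodes[i].fifo_idx < best->fifo_idx))
			best = &nodes[i];
	}
	return best;
}

/*
 * Flush one node with the scan-and-retry pattern. `races` simulates how many
 * times another worker writes the chosen node before we get to it; every lost
 * race throws away the scan result and traverses all n nodes again.
 */
static struct node *flush_one_linear(struct node *nodes, size_t n,
				     unsigned races, size_t *visits)
{
	for (;;) {
		struct node *b = scan_oldest(nodes, n, visits);

		if (!b)
			return NULL;
		if (races > 0) {
			b->dirty = false; /* another worker wrote it first */
			races--;
			continue;         /* retry: full traversal again */
		}
		b->dirty = false; /* stand-in for writing the node */
		return b;
	}
}
```

With n cached nodes and r lost races, a single flush examines (r + 1) * n nodes; for example, 4 nodes and 2 lost races cost 12 node visits for one flush.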
    
    This patch records the 128 btree nodes with the smallest fifo index in a
    heap and pops them one by one whenever a btree node needs to be flushed.
    This greatly reduces the time spent looping to find an appropriate btree
    node, and also reduces CPU occupancy.
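A minimal user-space sketch of the batching idea, under stated assumptions: it is not the kernel patch, and `FLUSH_NR`, `struct node`, `struct flush_heap`, `refill()` and `flush_one()` are hypothetical names (only the batch size of 128 comes from the text above). One full pass keeps the 128 smallest fifo indices in a bounded max-heap, the batch is then re-heapified as a min-heap, and later flushes just pop from it; a node that another worker has already written is skipped by popping the next entry instead of rescanning everything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Batch size taken from the commit message; the macro name is hypothetical. */
#define FLUSH_NR 128

/* Hypothetical stand-in for a cached btree node; not the bcache struct. */
struct node {
	uint64_t fifo_idx; /* journal fifo index this dirty node pins */
	bool dirty;        /* cleared once any worker has written the node */
};

struct flush_heap {
	struct node *data[FLUSH_NR];
	size_t used;
};

/* True if a belongs above b: smaller index wins in a min-heap, larger in a max-heap. */
static bool above(const struct node *a, const struct node *b, bool min_heap)
{
	return min_heap ? a->fifo_idx < b->fifo_idx : a->fifo_idx > b->fifo_idx;
}

static void swap(struct flush_heap *h, size_t i, size_t j)
{
	struct node *tmp = h->data[i];
	h->data[i] = h->data[j];
	h->data[j] = tmp;
}

static void sift_down(struct flush_heap *h, size_t i, bool min_heap)
{
	for (;;) {
		size_t l = 2 * i + 1, r = l + 1, top = i;
		if (l < h->used && above(h->data[l], h->data[top], min_heap))
			top = l;
		if (r < h->used && above(h->data[r], h->data[top], min_heap))
			top = r;
		if (top == i)
			return;
		swap(h, i, top);
		i = top;
	}
}

/* One full pass: keep the FLUSH_NR dirty nodes with the smallest fifo_idx. */
static void refill(struct flush_heap *h, struct node *nodes, size_t n)
{
	h->used = 0;
	for (size_t i = 0; i < n; i++) {
		struct node *b = &nodes[i];
		if (!b->dirty)
			continue;
		if (h->used < FLUSH_NR) {
			/* Max-heap insert: sift the new entry up. */
			size_t c = h->used++;
			h->data[c] = b;
			while (c > 0 && above(h->data[c], h->data[(c - 1) / 2], false)) {
				swap(h, c, (c - 1) / 2);
				c = (c - 1) / 2;
			}
		} else if (b->fifo_idx < h->data[0]->fifo_idx) {
			h->data[0] = b; /* evict the largest index kept so far */
			sift_down(h, 0, false);
		}
	}
	/* Rebuild as a min-heap so pops return the oldest entry first. */
	for (size_t i = h->used / 2; i-- > 0;)
		sift_down(h, i, true);
}

/* Pop from the batch; rescan only when it is empty; skip raced nodes. */
static struct node *flush_one(struct flush_heap *h, struct node *nodes, size_t n)
{
	for (;;) {
		if (h->used == 0) {
			refill(h, nodes, n);
			if (h->used == 0)
				return NULL; /* nothing dirty left */
		}
		assert(h->used <= FLUSH_NR);
		struct node *b = h->data[0];
		h->data[0] = h->data[--h->used];
		sift_down(h, 0, true);
		if (!b->dirty)
			continue; /* raced: another worker wrote it; pop the next */
		b->dirty = false; /* stand-in for writing the node */
		return b;
	}
}
```

The design trade-off is that one O(n log 128) traversal is amortized over up to 128 flushes, and a lost race costs one O(log 128) pop rather than another full traversal. The batch can go stale while it is being consumed, which this sketch tolerates by skipping entries that are no longer dirty.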
    
    [note by mpl: this triggers a checkpatch error because of adjacent,
    pre-existing style violations]
    Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
    Reviewed-by: Michael Lyle <mlyle@lyle.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>