bpf: avoid grabbing spin_locks of all cpus when no free elems (commit 54a9c3a4)
    Feng Zhou authored
    This patch uses head->first in pcpu_freelist_head to check whether the
    freelist has any free elements. If it does, grab the spin_lock;
    otherwise, move on and check the next cpu's freelist.
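
    A hedged sketch of the patched pop path, following the shape of
    kernel/bpf/percpu_freelist.c (the struct and helper names match that
    file, but details may differ from the actual diff):

    static struct pcpu_freelist_node *___pcpu_freelist_pop(struct pcpu_freelist *s)
    {
    	struct pcpu_freelist_head *head;
    	struct pcpu_freelist_node *node;
    	int cpu, orig_cpu;

    	orig_cpu = cpu = raw_smp_processor_id();
    	while (1) {
    		head = per_cpu_ptr(s->freelist, cpu);
    		/* peek first: skip empty per-cpu lists without taking their locks */
    		if (!READ_ONCE(head->first))
    			goto next_cpu;
    		raw_spin_lock(&head->lock);
    		node = head->first;
    		if (node) {
    			WRITE_ONCE(head->first, node->next);
    			raw_spin_unlock(&head->lock);
    			return node;
    		}
    		raw_spin_unlock(&head->lock);
    next_cpu:
    		cpu = cpumask_next(cpu, cpu_possible_mask);
    		if (cpu >= nr_cpu_ids)
    			cpu = 0;
    		if (cpu == orig_cpu)
    			break;
    	}

    	/* all per-cpu lists were empty, fall back to the extralist */
    	if (!READ_ONCE(s->extralist.first))
    		return NULL;
    	raw_spin_lock(&s->extralist.lock);
    	node = s->extralist.first;
    	if (node)
    		WRITE_ONCE(s->extralist.first, node->next);
    	raw_spin_unlock(&s->extralist.lock);
    	return node;
    }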
    
    Before patch: hash_map performance
    ./map_perf_test 1
    0:hash_map_perf pre-alloc 1043397 events per sec
    ...
    The average of the test results is around 1050000 events per sec.
    
    hash_map worst case: no free elements
    ./run_bench_bpf_hashmap_full_update.sh
    Setting up benchmark 'bpf-hashmap-ful-update'...
    Benchmark 'bpf-hashmap-ful-update' started.
    1:hash_map_full_perf 15687 events per sec
    ...
    The average of the test results is around 16000 events per sec.
    
    ftrace trace:
    0)               |  htab_map_update_elem() {
    0)               |      __pcpu_freelist_pop() {
    0)               |        _raw_spin_lock()
    0)               |        _raw_spin_unlock()
    0)               |        ...
    0) + 25.188 us   |      }
    0) + 28.439 us   |  }
    
    The test machine has 16 CPUs, so each pop attempt on an empty map tries
    to take the spin_lock 17 times: once for each of the 16 per-cpu
    freelists, plus once for the extralist.
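
    For contrast, a rough sketch of the pre-patch loop body (illustrative,
    not the exact removed code): the lock is taken before the emptiness
    check, so an empty list still pays for the lock/unlock pair.

    	while (1) {
    		head = per_cpu_ptr(s->freelist, cpu);
    		raw_spin_lock(&head->lock);	/* taken even when the list is empty */
    		node = head->first;
    		if (node) {
    			WRITE_ONCE(head->first, node->next);
    			raw_spin_unlock(&head->lock);
    			return node;
    		}
    		raw_spin_unlock(&head->lock);
    		cpu = cpumask_next(cpu, cpu_possible_mask);
    		if (cpu >= nr_cpu_ids)
    			cpu = 0;
    		if (cpu == orig_cpu)
    			break;
    	}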
    
    After patch: hash_map performance
    ./map_perf_test 1
    0:hash_map_perf pre-alloc 1053298 events per sec
    ...
    The average of the test results is around 1050000 events per sec.
    
    hash_map worst case: no free elements
    ./run_bench_bpf_hashmap_full_update.sh
    Setting up benchmark 'bpf-hashmap-ful-update'...
    Benchmark 'bpf-hashmap-ful-update' started.
    1:hash_map_full_perf 555830 events per sec
    ...
    The average of the test results is around 550000 events per sec.
    
    ftrace trace:
    0)               |  htab_map_update_elem() {
    0)               |    alloc_htab_elem() {
    0)   0.586 us    |      __pcpu_freelist_pop();
    0)   0.945 us    |    }
    0)   8.669 us    |  }
    
    As the numbers show, map performance is essentially unchanged with this
    patch, and in the no-free-element case head->first is checked first
    instead of acquiring the spin_lock directly.
    Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com>
    Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
    Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
    Link: https://lore.kernel.org/r/20220610023308.93798-2-zhoufeng.zf@bytedance.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>