    objpool: cache nr_possible_cpus() and avoid caching nr_cpu_ids · 78d0b161
    Andrii Nakryiko authored
    Profiling shows that calling nr_possible_cpus() in objpool_pop() takes
    a noticeable amount of CPU time (when profiled on an 80-core machine),
    as we need to recalculate the number of set bits in a CPU bit mask. This
    number can't change, so there is no point in paying the price for
    recalculating it. As such, cache this value in struct objpool_head and
    use it in objpool_pop().
    
    On the other hand, the cached pool->nr_cpus isn't necessary, as it's not
    used in the hot path and is also a pretty trivial value to retrieve. So drop
    pool->nr_cpus in favor of using nr_cpu_ids everywhere. This way the size
    of struct objpool_head remains the same, which is a nice bonus.
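
    The difference between the two values can be sketched in plain
    userspace C. This is only an illustration under stated assumptions,
    not the kernel code: struct cpu_mask, mask_weight(), mask_nr_ids(),
    struct pool_head and pool_init() are hypothetical stand-ins, and the
    mask is limited to 64 CPUs. Counting set bits is the work worth
    caching, while the "highest possible CPU id plus one" bound stands in
    for nr_cpu_ids, which the kernel already keeps as a ready-made value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the possible-CPU bit mask (at most 64 CPUs). */
struct cpu_mask {
	uint64_t bits;
};

/* Count the set bits in the mask. This is the per-call cost that the
 * commit message describes as showing up in profiles. */
static unsigned int mask_weight(const struct cpu_mask *m)
{
	uint64_t v = m->bits;
	unsigned int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Highest set bit index plus one: an analogue of nr_cpu_ids. */
static unsigned int mask_nr_ids(const struct cpu_mask *m)
{
	uint64_t v = m->bits;
	unsigned int n = 0;

	while (v) {
		v >>= 1;
		n++;
	}
	return n;
}

/* Hypothetical pool header: only the bit count is cached. */
struct pool_head {
	unsigned int nr_possible_cpus;	/* computed once at init time */
};

/* Compute the bit count once, so the hot path only reads a field. */
static void pool_init(struct pool_head *pool, const struct cpu_mask *possible)
{
	assert(pool != NULL && possible != NULL);
	pool->nr_possible_cpus = mask_weight(possible);
}
```

    With a sparse mask such as 0xB (CPUs 0, 1 and 3), the cached count
    is 3 while the id bound is 4, which is why the two values are not
    interchangeable even though both are fixed after boot.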
    
    The same BPF selftests benchmarks were used to evaluate the effect. Using
    the changes in the previous patch (inlining of objpool_pop/objpool_push)
    as the baseline, here are the differences:
    
    BASELINE
    ========
    kretprobe      :    9.937 ± 0.174M/s
    kretprobe-multi:   10.440 ± 0.108M/s
    
    AFTER
    =====
    kretprobe      :   10.106 ± 0.120M/s (+1.7%)
    kretprobe-multi:   10.515 ± 0.180M/s (+0.7%)
    
    Link: https://lore.kernel.org/all/20240424215214.3956041-3-andrii@kernel.org/
    
    Cc: Matt (Qiang) Wu <wuqiang.matt@bytedance.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>