arch/x86/kernel/apic/x2apic_cluster.c · 7d20dd3294b31c11a5f642a3e342174ef8da7c73 · Kirill Smelkov / linux

x86/apic: Reduce cache line misses in __x2apic_send_IPI_mask() · cc95a07f

Eric Dumazet authored Oct 07, 2021

Using per-cpu storage for @x86_cpu_to_logical_apicid is not optimal.

A broadcast IPI needs at least one cache line per CPU to access this
field.

__x2apic_send_IPI_mask() uses standard bitmask operators.

By converting x86_cpu_to_logical_apicid to an array, we divide the
number of needed cache lines by 16, because 16 values fit in each
cache line. The CPU prefetcher can also kick in nicely.
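The cache-line arithmetic above can be checked with a small user-space sketch. It assumes 64-byte cache lines, 4-byte logical APIC IDs and a cache-line-aligned base address. PERCPU_STRIDE is a hypothetical stand-in for the distance between per-cpu areas (the real offsets differ, but each CPU's copy lands on its own line). lines_touched() is an illustrative helper, not a kernel symbol.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on current x86 parts. */
#define CACHE_LINE_SIZE 64u

/*
 * Hypothetical model of the old per-cpu layout: each CPU's copy of the
 * variable sits in its own per-cpu area, far enough apart that no two
 * CPUs share a cache line. Not the kernel's real per-cpu area size.
 */
#define PERCPU_STRIDE 4096u

/*
 * Count the distinct cache lines touched when reading one value for
 * each of CPUs 0..ncpus-1, where CPU n's value lives at n * stride
 * from a cache-line-aligned base. Addresses only increase, so a new
 * line starts whenever the line index changes.
 */
static size_t lines_touched(size_t ncpus, size_t stride)
{
    size_t lines = 0;
    size_t prev = SIZE_MAX;

    for (size_t cpu = 0; cpu < ncpus; cpu++) {
        size_t line = (cpu * stride) / CACHE_LINE_SIZE;

        if (line != prev) {
            lines++;
            prev = line;
        }
    }
    return lines;
}
```

With 256 CPUs, lines_touched(256, PERCPU_STRIDE) counts 256 lines, while a flat u32 array, lines_touched(256, sizeof(uint32_t)), counts 16: the 16x reduction described above.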

Also move @cluster_masks to the READ_MOSTLY section to avoid false sharing.
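The per-cluster collapse that __x2apic_send_IPI_mask() performs with these two structures can be modelled in user space. This is a hedged sketch, not the kernel code. It assumes at most 64 CPUs, so a uint64_t stands in for struct cpumask, and that CPU n has x2APIC ID n. It also assumes the x2APIC cluster-mode logical ID format: the cluster number in bits 31:16 and a one-hot bit for the CPU within its 16-CPU cluster in bits 15:0. logical_apicid[], cluster_mask_of() and collapse_to_clusters() are hypothetical names. OR-ing the logical IDs of every targeted CPU in a cluster yields one destination per cluster. Because the IDs come from a flat array, under these assumptions all 16 entries of a cluster are read from a single cache line.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_SKETCH   64u  /* hypothetical: fits in one uint64_t mask */
#define CPUS_PER_CLUSTER 16u  /* x2APIC cluster mode: 16 CPUs per cluster */

/* Flat array, as in the patch: with 64-byte lines, 16 u32 entries share one line. */
static uint32_t logical_apicid[NR_CPUS_SKETCH];

/* Assumption: CPU n has x2APIC ID n. */
static void init_logical_ids(void)
{
    for (uint32_t cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
        uint32_t cluster = cpu / CPUS_PER_CLUSTER;

        logical_apicid[cpu] = (cluster << 16) | (1u << (cpu % CPUS_PER_CLUSTER));
    }
}

/* Bitmask of all CPUs that share a cluster with @cpu. */
static uint64_t cluster_mask_of(uint32_t cpu)
{
    uint32_t first = (cpu / CPUS_PER_CLUSTER) * CPUS_PER_CLUSTER;

    return 0xFFFFull << first;
}

/*
 * Collapse the CPUs in @mask into one logical destination per cluster.
 * Stores up to @max destinations in @dests and returns how many IPIs
 * would be sent.
 */
static size_t collapse_to_clusters(uint64_t mask, uint32_t *dests, size_t max)
{
    size_t n = 0;

    for (uint32_t cpu = 0; cpu < NR_CPUS_SKETCH && mask; cpu++) {
        if (!(mask & (1ull << cpu)))
            continue;

        uint64_t cmask = cluster_mask_of(cpu);
        uint32_t dest = 0;

        /* OR together the logical IDs of the targeted CPUs in this cluster. */
        for (uint32_t c = cpu; c < NR_CPUS_SKETCH; c++)
            if (mask & cmask & (1ull << c))
                dest |= logical_apicid[c];

        if (n < max)
            dests[n] = dest;
        n++;

        /* Cluster handled: drop its CPUs from the remaining mask. */
        mask &= ~cmask;
    }
    return n;
}
```

After init_logical_ids(), passing a mask with all 64 bits set yields four destinations, one per cluster, rather than 64 separate IPIs.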

Tested on a dual-socket host with 256 CPUs: the cost of a full broadcast
is now 11 usec instead of 33 usec.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211007143556.574911-1-eric.dumazet@gmail.com
