    x86/percpu: Rewrite arch_raw_cpu_ptr() to be easier for compilers to optimize · a048d3ab
    Uros Bizjak authored
    Implement arch_raw_cpu_ptr() as a load from this_cpu_off, followed by
    adding the ptr value to the loaded base. This way, the compiler can
    propagate the addend into the following instruction and simplify the
    address calculation.
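
A minimal user-space sketch of the load-then-add shape described above. It is an illustration, not the kernel macro: model_cpu_off, model_load_cpu_off(), model_raw_cpu_ptr() and model_check() are hypothetical names, and the plain load stands in for the %gs-relative mov from this_cpu_off. The point it tries to show is that the addition happens in ordinary C, where the compiler can see it and fold it together with a later constant displacement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for this_cpu_off (the current CPU's base offset). */
static uintptr_t model_cpu_off;

/* Models the load only; in the kernel this would be the %gs-relative mov. */
static uintptr_t model_load_cpu_off(void)
{
	return model_cpu_off;
}

/*
 * Load the base first, then add the pointer value in plain C.
 * Because the addition is visible to the compiler, it can be merged
 * with a constant field offset applied to the result.
 */
static void *model_raw_cpu_ptr(const void *ptr)
{
	uintptr_t addr = model_load_cpu_off();

	addr += (uintptr_t)ptr;
	return (void *)addr;
}

/* Small self-check: base + pointer value gives the expected address. */
static void model_check(void)
{
	model_cpu_off = 0x1000;
	assert((uintptr_t)model_raw_cpu_ptr((const void *)0x20) == 0x1020);
}
```

Calling model_check() exercises the helper once. Keeping only the load opaque and doing the add in C is the design choice the sketch is meant to highlight; how the real macro expresses the load is not shown here.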
    
    E.g.: address calculation in amd_pmu_enable_virt() improves from:
    
        48 c7 c0 00 00 00 00	mov    $0x0,%rax
    	87b7: R_X86_64_32S	cpu_hw_events
    
        65 48 03 05 00 00 00	add    %gs:0x0(%rip),%rax
        00
    	87bf: R_X86_64_PC32	this_cpu_off-0x4
    
        48 c7 80 28 13 00 00	movq   $0x0,0x1328(%rax)
        00 00 00 00
    
    to:
    
        65 48 8b 05 00 00 00	mov    %gs:0x0(%rip),%rax
        00
    	8798: R_X86_64_PC32	this_cpu_off-0x4
        48 c7 80 00 00 00 00	movq   $0x0,0x0(%rax)
        00 00 00 00
    	87a6: R_X86_64_32S	cpu_hw_events+0x1328
    
    The compiler also eliminates additional redundant loads from
    this_cpu_off, reducing the number of percpu offset reads
    from 1668 to 1646 on a test build, a 1.3% reduction.

    Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Brian Gerst <brgerst@gmail.com>
    Cc: Denys Vlasenko <dvlasenk@redhat.com>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Josh Poimboeuf <jpoimboe@redhat.com>
    Cc: Uros Bizjak <ubizjak@gmail.com>
    Cc: Sean Christopherson <seanjc@google.com>
    Link: https://lore.kernel.org/r/20231015202523.189168-1-ubizjak@gmail.com