    arm64: atomics: remove redundant static branch · 16860a20
    Mark Rutland authored
    Due to a historical oversight, we emit a redundant static branch for
    each atomic/atomic64 operation when CONFIG_ARM64_LSE_ATOMICS is
    selected. We can safely remove this, shrinking a defconfig kernel Image
    by 128KiB.
    
    When CONFIG_ARM64_LSE_ATOMICS is selected, every LSE atomic operation
    has two preceding static branches with the same target, e.g.
    
    	b	f7c <kernel_init_freeable+0xa4>
    	b	f7c <kernel_init_freeable+0xa4>
    	mov	w0, #0x1                   	// #1
    	ldadd	w0, w0, [x19]
    
    This is because the __lse_ll_sc_body() wrapper uses
    system_uses_lse_atomics(), which checks both `arm64_const_caps_ready`
    and `cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]`, each of which emits a
    static branch. This has been the case since commit:
    
      addfc386 ("arm64: atomics: avoid out-of-line ll/sc atomics")
    
    However, there was never a need to check `arm64_const_caps_ready`, which
    was itself introduced in commit:
    
      63a1e1c9 ("arm64/cpufeature: don't use mutex in bringup path")
    
    ... so that cpus_have_const_cap() could fall back to checking the
    `cpu_hwcaps` bitmap prior to the static keys for individual caps
    becoming enabled. As system_uses_lse_atomics() doesn't check
    `cpu_hwcaps`, and doesn't need to as we can safely use the LL/SC atomics
    prior to enabling the `ARM64_HAS_LSE_ATOMICS` static key, it doesn't
    need to check `arm64_const_caps_ready`.
    
    This patch removes the `arm64_const_caps_ready` check from
    system_uses_lse_atomics(). As the arch_atomic_* routines are meant to be
    safely usable in noinstr code, I've also marked
    system_uses_lse_atomics() as __always_inline.
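
As a rough user-space illustration of the change described above, the sketch below models the two static keys as plain booleans. This is an assumption made so the example compiles outside the kernel: the real code uses static branches, and the helper names (uses_lse_before, uses_lse_after, model_add_return and the two add_return_* stand-ins) are hypothetical, not the kernel's.

```c
#include <stdbool.h>

/* Plain flags standing in for the kernel's static keys (assumption). */
static bool arm64_const_caps_ready;  /* models `arm64_const_caps_ready` */
static bool has_lse_atomics_key;     /* models cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS] */

/* Before: two conditions, i.e. two static branches per call site. */
static inline bool uses_lse_before(void)
{
	return arm64_const_caps_ready && has_lse_atomics_key;
}

/* After: only the per-capability key is consulted. */
static inline bool uses_lse_after(void)
{
	return has_lse_atomics_key;
}

/* Hypothetical stand-ins for the LL/SC and LSE implementations. */
static int add_return_ll_sc(int i, int *v) { *v += i; return *v; }
static int add_return_lse(int i, int *v)   { *v += i; return *v; }

/* Hypothetical dispatcher in the spirit of __lse_ll_sc_body(). */
static int model_add_return(int i, int *v)
{
	return uses_lse_after() ? add_return_lse(i, v)
				: add_return_ll_sc(i, v);
}
```

In this model the two predicates differ only when the per-capability flag is set while `arm64_const_caps_ready` is still clear. The caller sees the same result in that state, since both implementations are functionally equivalent.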
    
    This results in one fewer static branch per atomic operation, with the
    prior example becoming:
    
    	b	f78 <kernel_init_freeable+0xa0>
    	mov	w0, #0x1                   	// #1
    	ldadd	w0, w0, [x19]
    
    Each static branch consists of the branch itself and an associated
    __jump_table entry. Removing these has a reasonable impact on the Image
    size, with a GCC 11.1.0 defconfig v5.17-rc2 Image being reduced by
    128KiB:
    
    | [mark@lakrids:~/src/linux]% ls -al Image*
    | -rw-r--r-- 1 mark mark 34619904 Feb  3 18:24 Image.baseline
    | -rw-r--r-- 1 mark mark 34488832 Feb  3 18:33 Image.onebranch
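
The 128KiB figure can be checked directly from the two file sizes in the listing above. The sketch below only redoes that subtraction; the constant and function names are hypothetical.

```c
#include <stdint.h>

/* Sizes in bytes, copied from the ls output above. */
static const int64_t image_baseline  = 34619904;  /* Image.baseline */
static const int64_t image_onebranch = 34488832;  /* Image.onebranch */

/* Bytes saved by dropping the redundant static branch. */
static int64_t image_saving_bytes(void)
{
	return image_baseline - image_onebranch;
}

/* The same saving expressed in KiB (1 KiB = 1024 bytes). */
static int64_t image_saving_kib(void)
{
	return image_saving_bytes() / 1024;
}
```

The difference is 131072 bytes, which is exactly 128KiB.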
    
    Signed-off-by: Mark Rutland <mark.rutland@arm.com>
    Cc: Ard Biesheuvel <ardb@kernel.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Suzuki Poulose <suzuki.poulose@arm.com>
    Cc: Will Deacon <will@kernel.org>
    Link: https://lore.kernel.org/r/20220204104439.270567-1-mark.rutland@arm.com
    
    Signed-off-by: Will Deacon <will@kernel.org>