    arm64: atomics: lse: improve cmpxchg implementation · e5cacb54
    Mark Rutland authored
    For historical reasons, the LSE implementation of cmpxchg*() hard-codes
    the GPRs to use, and shuffles registers around with MOVs. This is no
    longer necessary, and can be simplified.
    
    When the LSE cmpxchg implementation was added in commit:
    
      c342f782 ("arm64: cmpxchg: patch in lse instructions when supported by the CPU")
    
    ... the LL/SC implementation of cmpxchg() would be placed out-of-line,
    and the in-line assembly for cmpxchg would default to:
    
    	NOP
    	BL	<ll_sc_cmpxchg*_implementation>
    	NOP
    
    The LL/SC implementation of each cmpxchg() function accepted arguments
    as per AAPCS64 rules, so it was necessary to place the pointer in x0,
    the old value in x1, and the new value in x2, and acquire the return
    value from x0. The LL/SC implementation required a temporary register
    (e.g. for the STXR status value). As the LL/SC implementation preserved
    the old value, the LSE implementation does likewise.
    
    Since commit:
    
      addfc386 ("arm64: atomics: avoid out-of-line ll/sc atomics")
    
    ... the LSE and LL/SC implementations of cmpxchg are inlined as separate
    asm blocks, with another branch choosing between the two. Due to this,
    it is no longer necessary for the LSE implementation to match the
    register constraints of the LL/SC implementation. This was partially
    dealt with by removing the hard-coded use of x30 in commit:
    
      3337cb5a ("arm64: avoid using hard-coded registers for LSE atomics")
    
    ... but we didn't clean up the hard-coding of x0, x1, and x2.
    
    This patch simplifies the LSE implementation of cmpxchg, removing the
    register shuffling and directly clobbering the 'old' argument. This
    gives the compiler greater freedom for register allocation, and avoids
    redundant work.
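
As a rough illustration of the change described above, the sketch below contrasts the two shapes for a single 64-bit, fully-ordered case. It is not the kernel's macro-generated code: the function names, the use of `uint64_t`, the `HAVE_LSE_SKETCH` guard, and the `__atomic_compare_exchange_n` fallback for builds without LSE are all assumptions added so that the example compiles on any GCC or Clang target.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: only emit CAS when the compiler itself targets LSE. */
#if defined(__aarch64__) && defined(__ARM_FEATURE_ATOMICS)
#define HAVE_LSE_SKETCH 1
#else
#define HAVE_LSE_SKETCH 0
#endif

/* Portable stand-in (an assumption of this sketch, not kernel code). */
static inline uint64_t cmpxchg64_fallback_sketch(volatile uint64_t *ptr,
						 uint64_t old, uint64_t new_val)
{
	/* On failure the builtin writes the value it found back into 'old'. */
	__atomic_compare_exchange_n(ptr, &old, new_val, 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return old;
}

/*
 * "Before" shape: operands pinned to x0/x1/x2 to mirror the old
 * out-of-line calling convention, plus a temporary and two MOVs.
 */
static inline uint64_t cmpxchg64_fixed_regs_sketch(volatile uint64_t *ptr,
						   uint64_t old, uint64_t new_val)
{
#if HAVE_LSE_SKETCH
	register uint64_t x0 __asm__("x0") = (uint64_t)(uintptr_t)ptr;
	register uint64_t x1 __asm__("x1") = old;
	register uint64_t x2 __asm__("x2") = new_val;
	uint64_t tmp;

	__asm__ __volatile__(
	"	mov	%x[tmp], %x[old]\n"
	"	casal	%x[tmp], %x[new], %[v]\n"
	"	mov	%x[ret], %x[tmp]"
	: [ret] "+r" (x0), [v] "+Q" (*ptr), [tmp] "=&r" (tmp)
	: [old] "r" (x1), [new] "r" (x2)
	: "memory");

	return x0;
#else
	return cmpxchg64_fallback_sketch(ptr, old, new_val);
#endif
}

/*
 * "After" shape: no pinned registers and no MOVs. 'old' is an in/out
 * operand that CAS overwrites with the value found in memory, and "rZ"
 * lets the compiler pick any register (or the zero register) for 'new'.
 */
static inline uint64_t cmpxchg64_simplified_sketch(volatile uint64_t *ptr,
						   uint64_t old, uint64_t new_val)
{
#if HAVE_LSE_SKETCH
	__asm__ __volatile__(
	"	casal	%x[old], %x[new], %[v]\n"
	: [v] "+Q" (*ptr), [old] "+r" (old)
	: [new] "rZ" (new_val)
	: "memory");

	return old;
#else
	return cmpxchg64_fallback_sketch(ptr, old, new_val);
#endif
}
```

Both variants return the value that was found in memory, so a caller checks for success by comparing the return value against the 'old' value it passed in.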
    
    The new constraints permit 'old' (Rs) and 'new' (Rt) to be allocated to
    the same register when the initial values of the two are the same, e.g.
    resulting in:
    
    	CAS	X0, X0, [X1]
    
    This is safe as Rs is only written back after the initial values of Rs
    and Rt are consumed, and there are no UNPREDICTABLE behaviours to avoid
    when Rs == Rt.
    
    The new constraints also permit 'new' to be allocated to the zero
    register, avoiding a MOV in a few cases. The same cannot be done for
    'old' as it is both an input and output, and any caller of cmpxchg()
    should care about the output value. Note that for CAS* the use of the
    zero register never affects the ordering (while for SWP* the use of the
    zero register for the 'old' value drops any ACQUIRE semantic).
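
The two allocation cases above correspond to call sites of roughly the following kind. This is a minimal sketch that uses the GCC/Clang `__atomic_compare_exchange_n` builtin as a stand-in for cmpxchg(); the helper names are hypothetical. Whether a compiler actually emits `CAS X0, X0, [X1]` or uses the zero register depends on the constraints of the real inline assembly and on register allocation, so the sketch only illustrates the semantics a caller relies on.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for cmpxchg(): returns the value found at *ptr. */
static inline uint64_t cmpxchg_sketch(uint64_t *ptr, uint64_t old,
				      uint64_t new_val)
{
	/* On failure the builtin writes the value it found back into 'old'. */
	__atomic_compare_exchange_n(ptr, &old, new_val, 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return old;
}

/*
 * 'old' and 'new' start with the same value, so one register can hold
 * both. Memory never changes to a different value, but the caller still
 * receives the current contents through the return value.
 */
static inline uint64_t read_via_cmpxchg_sketch(uint64_t *ptr, uint64_t guess)
{
	return cmpxchg_sketch(ptr, guess, guess);
}

/*
 * 'new' is the constant zero, which is a candidate for the zero register.
 * 'old' cannot be treated the same way because its output is consumed here.
 * Returns 1 if *ptr held 'expected' and was cleared, 0 otherwise.
 */
static inline int clear_if_equal_sketch(uint64_t *ptr, uint64_t expected)
{
	return cmpxchg_sketch(ptr, expected, 0) == expected;
}
```

In both helpers the return value of the compare-and-swap is used, which is why only 'new' is a candidate for the zero register.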
    
    Compared to v6.2-rc4, a defconfig vmlinux is ~116KiB smaller, though the
    resulting Image is the same size due to internal alignment and padding:
    
      [mark@lakrids:~/src/linux]% ls -al vmlinux-*
      -rwxr-xr-x 1 mark mark 137269304 Jan 16 11:59 vmlinux-after
      -rwxr-xr-x 1 mark mark 137387936 Jan 16 10:54 vmlinux-before
      [mark@lakrids:~/src/linux]% ls -al Image-*
      -rw-r--r-- 1 mark mark 38711808 Jan 16 11:59 Image-after
      -rw-r--r-- 1 mark mark 38711808 Jan 16 10:54 Image-before
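
The ~116KiB figure follows directly from the two vmlinux sizes listed above. As a quick check (the identifiers here are illustrative only):

```c
#include <assert.h>
#include <stdio.h>

/* Sizes in bytes, copied from the `ls -al` output above. */
static const long vmlinux_before = 137387936L;
static const long vmlinux_after  = 137269304L;

/* Difference between the two vmlinux files, in bytes. */
static long vmlinux_saving_bytes(void)
{
	return vmlinux_before - vmlinux_after;
}

/* Prints the saving in bytes and in KiB (1 KiB = 1024 bytes). */
static void print_vmlinux_saving(void)
{
	long bytes = vmlinux_saving_bytes();

	printf("%ld bytes (~%.1f KiB)\n", bytes, bytes / 1024.0);
}
```

Calling `print_vmlinux_saving()` reports 118632 bytes, or about 115.9 KiB, which rounds to the ~116KiB quoted above.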
    
    This patch does not touch cmpxchg_double*() as that requires contiguous
    register pairs, and separate patches will replace it with cmpxchg128*().
    
    There should be no functional change as a result of this patch.
    
    Signed-off-by: Mark Rutland <mark.rutland@arm.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Robin Murphy <robin.murphy@arm.com>
    Cc: Will Deacon <will@kernel.org>
    Link: https://lore.kernel.org/r/20230314153700.787701-2-mark.rutland@arm.com
    Signed-off-by: Will Deacon <will@kernel.org>