    sync/atomic, runtime/internal/atomic: improve ppc64x atomics · eeca3ba9
    Lynn Boger authored
    The following performance improvements have been made to the
    low-level atomic functions for ppc64le & ppc64:
    
    - For sequences built around lwarx and stwcx (or their other-size
    variants), the old pattern was:
    sync, lwarx, maybe something, stwcx, loop to sync, sync, isync
    The leading sync is moved before (outside) the lwarx/stwcx loop,
    and the sync after the loop is removed, so it becomes:
    sync, lwarx, maybe something, stwcx, loop to lwarx, isync
    
    - For Or8 and And8, the shifting and manipulation of the address
    to its word-aligned version were removed; the functions now use
    lbarx, stbcx instead of register shifting, xor, then lwarx, stwcx.
    
    - New instructions LWSYNC, LBAR, STBCC were tested and added.
    runtime/atomic_ppc64x.s was changed to use the LWSYNC opcode
    instead of the WORD encoding.
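
To illustrate what the Or8/And8 item removes, here is a minimal Go sketch of the older word-based approach: find the byte's position inside its aligned 32-bit word, shift the operand into place, and retry a 32-bit update until it sticks. The names byteShift, or8ViaWord and and8ViaWord are hypothetical and do not come from the runtime. The compare-and-swap loop stands in for the lwarx/stwcx loop, and reading the "xor" step as a byte-index flip on big-endian is an assumption. Go cannot express where the sync and isync barriers go, so that part of the change is not modelled.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// byteShift returns the bit offset of byte byteIndex (0..3) inside its
// aligned 32-bit word. Assumption: on big-endian the index is flipped
// with XOR 3, which is how this sketch reads the "xor" step.
func byteShift(byteIndex uint, bigEndian bool) uint {
	if bigEndian {
		byteIndex ^= 3
	}
	return (byteIndex & 3) * 8
}

// or8ViaWord ORs val into one byte of *word using a 32-bit retry loop.
// Hypothetical helper: the CAS loop stands in for lwarx/stwcx.
func or8ViaWord(word *uint32, byteIndex uint, val uint8, bigEndian bool) {
	mask := uint32(val) << byteShift(byteIndex, bigEndian)
	for {
		old := atomic.LoadUint32(word)
		if atomic.CompareAndSwapUint32(word, old, old|mask) {
			return
		}
	}
}

// and8ViaWord ANDs val into one byte of *word. The other three bytes of
// the mask are all ones so that they are left unchanged.
func and8ViaWord(word *uint32, byteIndex uint, val uint8, bigEndian bool) {
	shift := byteShift(byteIndex, bigEndian)
	mask := uint32(val)<<shift | ^(uint32(0xFF) << shift)
	for {
		old := atomic.LoadUint32(word)
		if atomic.CompareAndSwapUint32(word, old, old&mask) {
			return
		}
	}
}

func main() {
	var w uint32
	or8ViaWord(&w, 1, 0x80, false) // little-endian, byte 1
	fmt.Printf("%#x\n", w)         // prints 0x8000

	w = 0xFFFFFFFF
	and8ViaWord(&w, 1, 0x0F, false)
	fmt.Printf("%#x\n", w) // prints 0xffff0fff
}
```

With byte-sized lbarx and stbcx, the index flip, shift and mask construction are no longer needed, because the reservation is taken on the byte address itself.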
    
    Fixes #15469
    
    Ran some of the benchmarks in the runtime and sync directories.
    Some results varied from run to run but the trend was improvement
    based on best times for base and new:
    
    ru...
asm9.go 84 KB