    s390/spinlock: optimize spin_unlock code · 44230282
    Heiko Carstens authored
    Use a memory barrier + store sequence instead of a load + compare and swap
    sequence to unlock a spinlock and an rw lock.
    For the spinlock case this saves two memory reads and the unneeded CPU
    serialization that follows once the compare and swap instruction has
    stored the new value.
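
The difference between the two unlock sequences can be sketched in portable C11 atomics. This is only an illustration under stated assumptions: `demo_spinlock`, `demo_trylock`, `demo_unlock_cas`, `demo_unlock_store` and `demo_selfcheck` are hypothetical names, and the C11 operations stand in for the s390 instructions; it is not the kernel's code.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock type for illustration; not the kernel's arch_spinlock_t. */
typedef struct {
    atomic_uint owner;   /* 0 = free, otherwise an id of the holder */
} demo_spinlock;

/* Try to take the lock once; returns nonzero on success. */
static inline int demo_trylock(demo_spinlock *l, unsigned int id)
{
    unsigned int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->owner, &expected, id,
        memory_order_acquire, memory_order_relaxed);
}

/* Old pattern: load the current value, then compare and swap it to 0. */
static inline void demo_unlock_cas(demo_spinlock *l)
{
    unsigned int old = atomic_load_explicit(&l->owner, memory_order_relaxed);
    atomic_compare_exchange_strong_explicit(
        &l->owner, &old, 0,
        memory_order_release, memory_order_relaxed);
}

/* New pattern: a memory barrier followed by a plain store of 0. */
static inline void demo_unlock_store(demo_spinlock *l)
{
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&l->owner, 0, memory_order_relaxed);
}

/* Small self-check: both unlock variants leave the lock free. */
static inline void demo_selfcheck(void)
{
    demo_spinlock l = { 0 };

    assert(demo_trylock(&l, 1));
    demo_unlock_cas(&l);
    assert(atomic_load(&l.owner) == 0);

    assert(demo_trylock(&l, 2));
    demo_unlock_store(&l);
    assert(atomic_load(&l.owner) == 0);
}
```

Only the lock holder may call either unlock function, which is why the store variant does not need to read the lock word first.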
    
    The kernel size (performance_defconfig) gets reduced by ~14k.
    
    Average execution time of a tight inlined spin_unlock loop drops from
    5.8ns to 0.7ns on a zEC12 machine.
    
    An artificial stress test in which several counters are protected by
    a single spinlock and are incremented only while holding that spinlock
    shows ~30% improvement on a 4 CPU machine.
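
A user-space approximation of such a stress test might look like the sketch below. The commit does not include the test itself, so everything here is an assumption: the counter count (`NCOUNTERS`), the names `stress_worker` and `run_stress`, and the use of POSIX threads with a C11 `atomic_flag` spinlock instead of the kernel's spinlock. It checks correctness only and measures nothing.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NCOUNTERS   4    /* assumed; the commit only says "several" */
#define MAX_THREADS 16   /* arbitrary upper bound for this sketch */

static atomic_flag stress_lock = ATOMIC_FLAG_INIT;
static unsigned long stress_counters[NCOUNTERS];
static unsigned long stress_iterations;

/* Each worker increments every counter while holding the single lock. */
static void *stress_worker(void *arg)
{
    (void)arg;
    for (unsigned long i = 0; i < stress_iterations; i++) {
        while (atomic_flag_test_and_set_explicit(&stress_lock,
                                                 memory_order_acquire))
            ;  /* spin until the lock is free */
        for (int c = 0; c < NCOUNTERS; c++)
            stress_counters[c]++;
        atomic_flag_clear_explicit(&stress_lock, memory_order_release);
    }
    return NULL;
}

/* Returns 1 if every counter ends at nthreads * iters, otherwise 0. */
static int run_stress(int nthreads, unsigned long iters)
{
    pthread_t tid[MAX_THREADS];
    int started = 0;
    int ok = 1;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return 0;

    stress_iterations = iters;
    for (int c = 0; c < NCOUNTERS; c++)
        stress_counters[c] = 0;

    for (; started < nthreads; started++) {
        if (pthread_create(&tid[started], NULL, stress_worker, NULL) != 0) {
            ok = 0;  /* could not start all threads */
            break;
        }
    }
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);

    for (int c = 0; c < NCOUNTERS; c++)
        if (stress_counters[c] != (unsigned long)nthreads * iters)
            ok = 0;
    return ok;
}
```

Calling `run_stress(4, 100000)` would mirror the 4 CPU setup mentioned above; timing the call around both unlock variants is left out because the original measurement method is not described.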
    Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
    Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
barrier.h 1.22 KB