ARM: 7983/1: atomics: implement a better __atomic_add_unless for v6+
Looking at perf profiles of multi-threaded hackbench runs, a significant performance hit appears to manifest from the cmpxchg loop used to implement the 32-bit atomic_add_unless function. This can be mitigated by writing a direct implementation of __atomic_add_unless which doesn't require iteration outside of the atomic operation. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Showing
Please register or sign in to comment