Commit d0dd066a authored by Christoph Lameter (Ampere), committed by Linus Torvalds

seqcount: replace smp_rmb() in read_seqcount() with load acquire

Many architectures support load acquire which can replace a memory
barrier and save some cycles.

A typical sequence

	do {
		seq = read_seqcount_begin(&s);
		<something>
	} while (read_seqcount_retry(&s, seq));

requires 13 cycles on a Neoverse N1 arm64 core (an Ampere Altra, to be
specific) for an empty loop.  Two read memory barriers are needed, one
for each of the seqcount_*() functions.
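
Expanded, and ignoring the odd/even spin and the lockdep hook, the two
barriers in that loop come from roughly the following (a simplified
sketch, not the exact macro expansion):

	/* read_seqcount_begin(&s), minus the odd/even spin */
	seq = READ_ONCE(s.sequence);
	smp_rmb();			/* first barrier: order the data loads after the count load */

	<something>

	/* read_seqcount_retry(&s, seq) */
	smp_rmb();			/* second barrier: order the data loads before the re-read */
	retry = (READ_ONCE(s.sequence) != seq);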

We can replace the first read barrier with a load acquire of the
seqcount which saves us one barrier.

On the Altra doing so reduces the cycle count from 13 to 8.
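
With the acquire-based helper, the begin side becomes a single acquire
load while the retry side is unchanged (again a simplified sketch):

	/* read_seqcount_begin(&s) with this patch */
	seq = smp_load_acquire(&s.sequence);	/* orders all later loads */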

According to ARM, this is a general improvement for the ARM64
architecture and not specific to a certain processor.

See

  https://developer.arm.com/documentation/102336/0100/Load-Acquire-and-Store-Release-instructions

 "Weaker ordering requirements that are imposed by Load-Acquire and
  Store-Release instructions allow for micro-architectural
  optimizations, which could reduce some of the performance impacts that
  are otherwise imposed by an explicit memory barrier.

  If the ordering requirement is satisfied using either a Load-Acquire
  or Store-Release, then it would be preferable to use these
  instructions instead of a DMB"
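
As a rough illustration of what this means for the generated arm64
code (assumed typical code generation; the exact instructions depend
on compiler and kernel configuration):

	seq = READ_ONCE(s.sequence);		/* LDR       */
	smp_rmb();				/* DMB ISHLD */

versus

	seq = smp_load_acquire(&s.sequence);	/* a single LDAR load */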

[ NOTE! This is my original minimal patch that unconditionally switches
  over to using smp_load_acquire(), instead of the much more involved
  and subtle patch that Christoph Lameter wrote that made it
  conditional.

  But Christoph gets authorship credit because I had initially thought
  that we needed the more complex model, and Christoph ran with it and
  did the work. Only after looking at code generation for all the
  relevant architectures did I come to the conclusion that nobody
  actually really needs the old "smp_rmb()" model.

  Even architectures without load-acquire support generally do as well
  or better with smp_load_acquire().

  So credit to Christoph, but if this then causes issues on other
  architectures, put the blame solidly on me.

  Also note as part of the ruthless simplification, this gets rid of the
  overly subtle optimization where some code uses a non-barrier version
  of the sequence count (see the __read_seqcount_begin() users in
  fs/namei.c). They then play games with their own barriers and/or with
  nested sequence counts.

  Those optimizations are literally meaningless on x86, and questionable
  elsewhere. If somebody can show that they matter, we need to re-do
  them more cleanly than "use an internal helper".       - Linus ]
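
For context, the removed __read_seqcount_begin() kerneldoc described a
contract roughly like the following at a call site (a hypothetical
illustration, not the actual fs/namei.c code):

	seq = __read_seqcount_begin(&s);	/* no barrier in the old model */
	...
	smp_rmb();		/* caller had to supply the ordering itself */
	<something>
	...
	retry = read_seqcount_retry(&s, seq);

With this patch __read_seqcount_begin() itself does an acquire load,
so such caller-supplied barriers are no longer needed.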
Signed-off-by: Christoph Lameter (Ampere) <cl@gentwo.org>
Link: https://lore.kernel.org/all/20240912-seq_optimize-v3-1-8ee25e04dffa@gentwo.org/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent de5cb0dc
@@ -157,7 +157,7 @@ __seqprop_##lockname##_const_ptr(const seqcount_##lockname##_t *s)	\
 static __always_inline unsigned					\
 __seqprop_##lockname##_sequence(const seqcount_##lockname##_t *s)	\
 {									\
-	unsigned seq = READ_ONCE(s->seqcount.sequence);			\
+	unsigned seq = smp_load_acquire(&s->seqcount.sequence);	\
 									\
 	if (!IS_ENABLED(CONFIG_PREEMPT_RT))				\
 		return seq;						\
@@ -170,7 +170,7 @@ __seqprop_##lockname##_sequence(const seqcount_##lockname##_t *s)	\
 		 * Re-read the sequence counter since the (possibly	\
 		 * preempted) writer made progress.			\
 		 */							\
-		seq = READ_ONCE(s->seqcount.sequence);			\
+		seq = smp_load_acquire(&s->seqcount.sequence);		\
 	}								\
 									\
 	return seq;							\
@@ -208,7 +208,7 @@ static inline const seqcount_t *__seqprop_const_ptr(const seqcount_t *s)
 
 static inline unsigned __seqprop_sequence(const seqcount_t *s)
 {
-	return READ_ONCE(s->sequence);
+	return smp_load_acquire(&s->sequence);
 }
 
 static inline bool __seqprop_preemptible(const seqcount_t *s)
@@ -263,17 +263,9 @@ SEQCOUNT_LOCKNAME(mutex, struct mutex, true, mutex)
 #define seqprop_assert(s)		__seqprop(s, assert)(s)
 
 /**
- * __read_seqcount_begin() - begin a seqcount_t read section w/o barrier
+ * __read_seqcount_begin() - begin a seqcount_t read section
  * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants
  *
- * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
- * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
- * provided before actually loading any of the variables that are to be
- * protected in this critical section.
- *
- * Use carefully, only in critical code, and comment how the barrier is
- * provided.
- *
  * Return: count to be passed to read_seqcount_retry()
  */
 #define __read_seqcount_begin(s)					\
@@ -293,13 +285,7 @@ SEQCOUNT_LOCKNAME(mutex, struct mutex, true, mutex)
  *
  * Return: count to be passed to read_seqcount_retry()
  */
-#define raw_read_seqcount_begin(s)					\
-({									\
-	unsigned _seq = __read_seqcount_begin(s);			\
-									\
-	smp_rmb();							\
-	_seq;								\
-})
+#define raw_read_seqcount_begin(s)	__read_seqcount_begin(s)
 
 /**
  * read_seqcount_begin() - begin a seqcount_t read critical section
@@ -328,7 +314,6 @@ SEQCOUNT_LOCKNAME(mutex, struct mutex, true, mutex)
 ({									\
 	unsigned __seq = seqprop_sequence(s);				\
 									\
-	smp_rmb();							\
 	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);			\
 	__seq;								\
 })