Commit 2ecf8101 authored by Paul E. McKenney, committed by Ingo Molnar

Documentation/memory-barriers.txt: Add needed ACCESS_ONCE() calls to memory-barriers.txt

The Documentation/memory-barriers.txt file was written before
the need for ACCESS_ONCE() was fully appreciated.  It therefore
contains no ACCESS_ONCE() calls, which can be a problem when
people lift examples from it.  This commit therefore adds
ACCESS_ONCE() calls.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <linux-arch@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1386799151-2219-1-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
parent 962d9c57
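
For readers unfamiliar with the macro this commit adds throughout, here is a minimal userspace sketch. It assumes a GCC- or Clang-compatible compiler (for `__typeof__`); the kernel of that era defined ACCESS_ONCE() in include/linux/compiler.h as a volatile cast of this shape. The helper `spin_while_set` and its `max_spins` bound are hypothetical, added only so the example terminates.

```c
/* Userspace approximation of the kernel's ACCESS_ONCE().  __typeof__ is a
 * GCC/Clang extension, so this is not portable ISO C. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical helper: spins while *flag is non-zero, up to max_spins times,
 * and returns the number of spins performed.  The volatile access makes the
 * compiler re-read *flag on every iteration; with a plain "*flag" it would
 * be entitled to load the value once and reuse it. */
static int spin_while_set(int *flag, int max_spins)
{
	int spins = 0;

	while (ACCESS_ONCE(*flag) && spins < max_spins)
		spins++;
	return spins;
}
```

Note that the cast only constrains the compiler. Ordering between CPUs still needs the barriers discussed in the diff below.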
@@ -194,18 +194,22 @@ There are some minimal guarantees that may be expected of a CPU:
  (*) On any given CPU, dependent memory accesses will be issued in order, with
      respect to itself. This means that for:

-        Q = P; D = *Q;
+        ACCESS_ONCE(Q) = P; smp_read_barrier_depends(); D = ACCESS_ONCE(*Q);

      the CPU will issue the following memory operations:

         Q = LOAD P, D = LOAD *Q

-     and always in that order.
+     and always in that order. On most systems, smp_read_barrier_depends()
+     does nothing, but it is required for DEC Alpha. The ACCESS_ONCE()
+     is required to prevent compiler mischief. Please note that you
+     should normally use something like rcu_dereference() instead of
+     open-coding smp_read_barrier_depends().

  (*) Overlapping loads and stores within a particular CPU will appear to be
      ordered within that CPU. This means that for:

-        a = *X; *X = b;
+        a = ACCESS_ONCE(*X); ACCESS_ONCE(*X) = b;

      the CPU will only issue the following sequence of memory operations:

@@ -213,7 +217,7 @@ There are some minimal guarantees that may be expected of a CPU:

      And for:

-        *X = c; d = *X;
+        ACCESS_ONCE(*X) = c; d = ACCESS_ONCE(*X);

      the CPU will only issue:

@@ -224,6 +228,41 @@ There are some minimal guarantees that may be expected of a CPU:

 And there are a number of things that _must_ or _must_not_ be assumed:

+ (*) It _must_not_ be assumed that the compiler will do what you want with
+     memory references that are not protected by ACCESS_ONCE(). Without
+     ACCESS_ONCE(), the compiler is within its rights to do all sorts
+     of "creative" transformations:
+
+     (-) Repeat the load, possibly getting a different value on the second
+         and subsequent loads. This is especially prone to happen when
+         register pressure is high.
+
+     (-) Merge adjacent loads and stores to the same location. The most
+         familiar example is the transformation from:
+
+                while (a)
+                        do_something();
+
+         to something like:
+
+                if (a)
+                        for (;;)
+                                do_something();
+
+         Using ACCESS_ONCE() as follows prevents this sort of optimization:
+
+                while (ACCESS_ONCE(a))
+                        do_something();
+
+     (-) "Store tearing", where a single store in the source code is split
+         into smaller stores in the object code. Note that gcc really
+         will do this on some architectures when storing certain constants.
+         It can be cheaper to do a series of immediate stores than to
+         form the constant in a register and then to store that register.
+
+     (-) "Load tearing", which splits loads in a manner analogous to
+         store tearing.
+
  (*) It _must_not_ be assumed that independent loads and stores will be issued
      in the order given. This means that for:

@@ -450,14 +489,14 @@ The usage requirements of data dependency barriers are a little subtle, and
 it's not always obvious that they're needed. To illustrate, consider the
 following sequence of events:

         CPU 1                   CPU 2
         ===============         ===============
         { A == 1, B == 2, C = 3, P == &A, Q == &C }
         B = 4;
         <write barrier>
-        P = &B
-                                Q = P;
+        ACCESS_ONCE(P) = &B
+                                Q = ACCESS_ONCE(P);
                                 D = *Q;

 There's a clear data dependency here, and it would seem that by the end of the
 sequence, Q must be either &A or &B, and that:
@@ -477,15 +516,15 @@ Alpha).
 To deal with this, a data dependency barrier or better must be inserted
 between the address load and the data load:

         CPU 1                   CPU 2
         ===============         ===============
         { A == 1, B == 2, C = 3, P == &A, Q == &C }
         B = 4;
         <write barrier>
-        P = &B
-                                Q = P;
+        ACCESS_ONCE(P) = &B
+                                Q = ACCESS_ONCE(P);
                                 <data dependency barrier>
                                 D = *Q;

 This enforces the occurrence of one of the two implications, and prevents the
 third possibility from arising.
@@ -504,21 +543,22 @@ Another example of where data dependency barriers might be required is where a
 number is read from memory and then used to calculate the index for an array
 access:

         CPU 1                   CPU 2
         ===============         ===============
         { M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 }
         M[1] = 4;
         <write barrier>
-        P = 1
-                                Q = P;
+        ACCESS_ONCE(P) = 1
+                                Q = ACCESS_ONCE(P);
                                 <data dependency barrier>
                                 D = M[Q];


-The data dependency barrier is very important to the RCU system, for example.
-See rcu_dereference() in include/linux/rcupdate.h. This permits the current
-target of an RCU'd pointer to be replaced with a new modified target, without
-the replacement target appearing to be incompletely initialised.
+The data dependency barrier is very important to the RCU system,
+for example. See rcu_assign_pointer() and rcu_dereference() in
+include/linux/rcupdate.h. This permits the current target of an RCU'd
+pointer to be replaced with a new modified target, without the replacement
+target appearing to be incompletely initialised.

 See also the subsection on "Cache Coherency" for a more thorough example.

@@ -530,22 +570,23 @@ A control dependency requires a full read memory barrier, not simply a data
 dependency barrier to make it work correctly. Consider the following bit of
 code:

-        q = &a;
+        q = ACCESS_ONCE(a);
         if (p) {
                 <data dependency barrier>
-                q = &b;
+                q = ACCESS_ONCE(b);
         }
         x = *q;

 This will not have the desired effect because there is no actual data
-dependency, but rather a control dependency that the CPU may short-circuit by
-attempting to predict the outcome in advance. In such a case what's actually
-required is:
+dependency, but rather a control dependency that the CPU may short-circuit
+by attempting to predict the outcome in advance, so that other CPUs see
+the load from b as having happened before the load from a. In such a
+case what's actually required is:

-        q = &a;
+        q = ACCESS_ONCE(a);
         if (p) {
                 <read barrier>
-                q = &b;
+                q = ACCESS_ONCE(b);
         }
         x = *q;

@@ -561,23 +602,23 @@ barrier, though a general barrier would also be viable. Similarly a read
 barrier or a data dependency barrier should always be paired with at least an
 write barrier, though, again, a general barrier is viable:

         CPU 1                   CPU 2
         ===============         ===============
-        a = 1;
+        ACCESS_ONCE(a) = 1;
         <write barrier>
-        b = 2;                  x = b;
+        ACCESS_ONCE(b) = 2;     x = ACCESS_ONCE(b);
                                 <read barrier>
-                                y = a;
+                                y = ACCESS_ONCE(a);

 Or:

         CPU 1                   CPU 2
         ===============         ===============================
         a = 1;
         <write barrier>
-        b = &a;                 x = b;
+        ACCESS_ONCE(b) = &a;    x = ACCESS_ONCE(b);
                                 <data dependency barrier>
                                 y = *x;

 Basically, the read barrier always has to be there, even though it can be of
 the "weaker" type.
@@ -586,13 +627,13 @@ the "weaker" type.
 match the loads after the read barrier or the data dependency barrier, and vice
 versa:

         CPU 1                               CPU 2
-        ===============                     ===============
-        a = 1;               }----   --->{  v = c
-        b = 2;               }    \ /    {  w = d
-        <write barrier>            \        <read barrier>
-        c = 3;               }    / \    {  x = a;
-        d = 4;               }----   --->{  y = b;
+        ===================                 ===================
+        ACCESS_ONCE(a) = 1;  }----   --->{  v = ACCESS_ONCE(c);
+        ACCESS_ONCE(b) = 2;  }    \ /    {  w = ACCESS_ONCE(d);
+        <write barrier>            \        <read barrier>
+        ACCESS_ONCE(c) = 3;  }    / \    {  x = ACCESS_ONCE(a);
+        ACCESS_ONCE(d) = 4;  }----   --->{  y = ACCESS_ONCE(b);


 EXAMPLES OF MEMORY BARRIER SEQUENCES
@@ -1435,12 +1476,12 @@ three CPUs; then should the following sequence of events occur:

         CPU 1                           CPU 2
         =============================== ===============================
-        *A = a;                         *E = e;
+        ACCESS_ONCE(*A) = a;            ACCESS_ONCE(*E) = e;
         LOCK M                          LOCK Q
-        *B = b;                         *F = f;
-        *C = c;                         *G = g;
+        ACCESS_ONCE(*B) = b;            ACCESS_ONCE(*F) = f;
+        ACCESS_ONCE(*C) = c;            ACCESS_ONCE(*G) = g;
         UNLOCK M                        UNLOCK Q
-        *D = d;                         *H = h;
+        ACCESS_ONCE(*D) = d;            ACCESS_ONCE(*H) = h;

 Then there is no guarantee as to what order CPU 3 will see the accesses to *A
 through *H occur in, other than the constraints imposed by the separate locks
@@ -1460,17 +1501,17 @@ However, if the following occurs:

         CPU 1                           CPU 2
         =============================== ===============================
-        *A = a;
+        ACCESS_ONCE(*A) = a;
         LOCK M          [1]
-        *B = b;
-        *C = c;
+        ACCESS_ONCE(*B) = b;
+        ACCESS_ONCE(*C) = c;
         UNLOCK M        [1]
-        *D = d;                         *E = e;
+        ACCESS_ONCE(*D) = d;            ACCESS_ONCE(*E) = e;
                                         LOCK M          [2]
-                                        *F = f;
-                                        *G = g;
+                                        ACCESS_ONCE(*F) = f;
+                                        ACCESS_ONCE(*G) = g;
                                         UNLOCK M        [2]
-                                        *H = h;
+                                        ACCESS_ONCE(*H) = h;

 CPU 3 might see:

@@ -2177,11 +2218,11 @@ A programmer might take it for granted that the CPU will perform memory
 operations in exactly the order specified, so that if the CPU is, for example,
 given the following piece of code to execute:

-        a = *A;
-        *B = b;
-        c = *C;
-        d = *D;
-        *E = e;
+        a = ACCESS_ONCE(*A);
+        ACCESS_ONCE(*B) = b;
+        c = ACCESS_ONCE(*C);
+        d = ACCESS_ONCE(*D);
+        ACCESS_ONCE(*E) = e;

 they would then expect that the CPU will complete the memory operation for each
 instruction before moving on to the next one, leading to a definite sequence of
@@ -2228,12 +2269,12 @@ However, it is guaranteed that a CPU will be self-consistent: it will see its
 _own_ accesses appear to be correctly ordered, without the need for a memory
 barrier. For instance with the following code:

-        U = *A;
-        *A = V;
-        *A = W;
-        X = *A;
-        *A = Y;
-        Z = *A;
+        U = ACCESS_ONCE(*A);
+        ACCESS_ONCE(*A) = V;
+        ACCESS_ONCE(*A) = W;
+        X = ACCESS_ONCE(*A);
+        ACCESS_ONCE(*A) = Y;
+        Z = ACCESS_ONCE(*A);

 and assuming no intervention by an external influence, it can be assumed that
 the final result will appear to be:
@@ -2250,7 +2291,12 @@ accesses:

 in that order, but, without intervention, the sequence may have almost any
 combination of elements combined or discarded, provided the program's view of
-the world remains consistent.
+the world remains consistent. Note that ACCESS_ONCE() is -not- optional
+in the above example, as there are architectures where a given CPU might
+interchange successive loads to the same location. On such architectures,
+ACCESS_ONCE() does whatever is necessary to prevent this, for example, on
+Itanium the volatile casts used by ACCESS_ONCE() cause GCC to emit the
+special ld.acq and st.rel instructions that prevent such reordering.

 The compiler may also combine, discard or defer elements of the sequence before
 the CPU even sees them.
@@ -2264,13 +2310,13 @@ may be reduced to:

         *A = W;

-since, without a write barrier, it can be assumed that the effect of the
-storage of V to *A is lost. Similarly:
+since, without either a write barrier or an ACCESS_ONCE(), it can be
+assumed that the effect of the storage of V to *A is lost. Similarly:

         *A = Y;
         Z = *A;

-may, without a memory barrier, be reduced to:
+may, without a memory barrier or an ACCESS_ONCE(), be reduced to:

         *A = Y;
         Z = Y;