Commit f00c313b authored by Mauro Carvalho Chehab, committed by Jonathan Corbet

docs: trace: ring-buffer-design.txt: convert to ReST format

- Just like some media documents, this file is dual licensed
  with GPL and GFDL. As right now the GFDL SPDX definition is
  bogus (as it doesn't tell anything about invariant parts),
  let's not use SPDX here. Let's use, instead, the same text
  as we have on media.
- Convert title to ReST format;
- use :field:  markup;
- Properly mark literal blocks as such;
- Add it to trace/index.rst file.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/d350be9b666ca0de441b684b2282ddd76bd7b397.1592918949.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
parent 691462f2
...@@ -22,6 +22,7 @@ Linux Tracing Technologies ...@@ -22,6 +22,7 @@ Linux Tracing Technologies
boottime-trace boottime-trace
hwlat_detector hwlat_detector
intel_th intel_th
ring-buffer-design
stm stm
sys-t sys-t
coresight/index coresight/index
Lockless Ring Buffer Design .. This file is dual-licensed: you can use it either under the terms
=========================== .. of the GPL 2.0 or the GFDL 1.2 license, at your option. Note that this
.. dual licensing only applies to this file, and not this project as a
.. whole.
..
.. a) This file is free software; you can redistribute it and/or
.. modify it under the terms of the GNU General Public License as
.. published by the Free Software Foundation version 2 of
.. the License.
..
.. This file is distributed in the hope that it will be useful,
.. but WITHOUT ANY WARRANTY; without even the implied warranty of
.. MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
.. GNU General Public License for more details.
..
.. Or, alternatively,
..
.. b) Permission is granted to copy, distribute and/or modify this
.. document under the terms of the GNU Free Documentation License,
.. Version 1.2 version published by the Free Software
.. Foundation, with no Invariant Sections, no Front-Cover Texts
.. and no Back-Cover Texts. A copy of the license is included at
.. Documentation/userspace-api/media/fdl-appendix.rst.
..
.. TODO: replace it to GPL-2.0 OR GFDL-1.2 WITH no-invariant-sections
===========================
Lockless Ring Buffer Design
===========================
Copyright 2009 Red Hat Inc. Copyright 2009 Red Hat Inc.
Author: Steven Rostedt <srostedt@redhat.com>
License: The GNU Free Documentation License, Version 1.2 :Author: Steven Rostedt <srostedt@redhat.com>
:License: The GNU Free Documentation License, Version 1.2
(dual licensed under the GPL v2) (dual licensed under the GPL v2)
Reviewers: Mathieu Desnoyers, Huang Ying, Hidetoshi Seto, :Reviewers: Mathieu Desnoyers, Huang Ying, Hidetoshi Seto,
and Frederic Weisbecker. and Frederic Weisbecker.
...@@ -14,37 +42,50 @@ Written for: 2.6.31 ...@@ -14,37 +42,50 @@ Written for: 2.6.31
Terminology used in this Document Terminology used in this Document
--------------------------------- ---------------------------------
tail - where new writes happen in the ring buffer. tail
- where new writes happen in the ring buffer.
head - where new reads happen in the ring buffer. head
- where new reads happen in the ring buffer.
producer - the task that writes into the ring buffer (same as writer) producer
- the task that writes into the ring buffer (same as writer)
writer - same as producer writer
- same as producer
consumer - the task that reads from the buffer (same as reader) consumer
- the task that reads from the buffer (same as reader)
reader - same as consumer. reader
- same as consumer.
reader_page - A page outside the ring buffer used solely (for the most part) reader_page
- A page outside the ring buffer used solely (for the most part)
by the reader. by the reader.
head_page - a pointer to the page that the reader will use next head_page
- a pointer to the page that the reader will use next
tail_page - a pointer to the page that will be written to next tail_page
- a pointer to the page that will be written to next
commit_page - a pointer to the page with the last finished non-nested write. commit_page
- a pointer to the page with the last finished non-nested write.
cmpxchg - hardware-assisted atomic transaction that performs the following: cmpxchg
- hardware-assisted atomic transaction that performs the following::
A = B if previous A == C A = B if previous A == C
R = cmpxchg(A, C, B) is saying that we replace A with B if and only if R = cmpxchg(A, C, B) is saying that we replace A with B if and only
current A is equal to C, and we put the old (current) A into R if current A is equal to C, and we put the old (current)
A into R
R gets the previous A regardless if A is updated with B or not. R gets the previous A regardless if A is updated with B or not.
To see if the update was successful a compare of R == C may be used. To see if the update was successful a compare of ``R == C``
may be used.
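
The cmpxchg definition above can be sketched in portable C. This is only an illustration built on C11 `<stdatomic.h>`, not the kernel's own `cmpxchg()` macro; the helper name `cmpxchg_sketch` is hypothetical. It returns the previous value of A whether or not the store happened, so a caller can test `R == C` to learn if the update succeeded.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Emulates R = cmpxchg(A, C, B): store b into *a only if *a currently
 * equals c, and always return the value *a held before the call.
 * (Hypothetical helper name; the kernel provides its own primitive.)
 */
static uintptr_t cmpxchg_sketch(_Atomic uintptr_t *a, uintptr_t c, uintptr_t b)
{
    uintptr_t expected = c;

    /* On failure, C11 writes the value it observed in *a into expected. */
    atomic_compare_exchange_strong(a, &expected, b);

    /* On success expected is still c, which was the old value of *a. */
    return expected;
}
```

A caller would compare the return value with the expected value `c`; equality means the swap took place.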
The Generic Ring Buffer The Generic Ring Buffer
----------------------- -----------------------
...@@ -64,7 +105,7 @@ No two writers can write at the same time (on the same per-cpu buffer), ...@@ -64,7 +105,7 @@ No two writers can write at the same time (on the same per-cpu buffer),
but a writer may interrupt another writer, but it must finish writing but a writer may interrupt another writer, but it must finish writing
before the previous writer may continue. This is very important to the before the previous writer may continue. This is very important to the
algorithm. The writers act like a "stack". The way interrupts works algorithm. The writers act like a "stack". The way interrupts works
enforces this behavior. enforces this behavior::
writer1 start writer1 start
...@@ -115,6 +156,8 @@ A sample of how the reader page is swapped: Note this does not ...@@ -115,6 +156,8 @@ A sample of how the reader page is swapped: Note this does not
show the head page in the buffer, it is for demonstrating a swap show the head page in the buffer, it is for demonstrating a swap
only. only.
::
+------+ +------+
|reader| RING BUFFER |reader| RING BUFFER
|page | |page |
...@@ -172,6 +215,7 @@ only. ...@@ -172,6 +215,7 @@ only.
It is possible that the page swapped is the commit page and the tail page, It is possible that the page swapped is the commit page and the tail page,
if what is in the ring buffer is less than what is held in a buffer page. if what is in the ring buffer is less than what is held in a buffer page.
::
reader page commit page tail page reader page commit page tail page
| | | | | |
...@@ -184,8 +228,8 @@ if what is in the ring buffer is less than what is held in a buffer page. ...@@ -184,8 +228,8 @@ if what is in the ring buffer is less than what is held in a buffer page.
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |---> <---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
This case is still valid for this algorithm. This case is still valid for this algorithm.
...@@ -196,15 +240,19 @@ buffer. ...@@ -196,15 +240,19 @@ buffer.
The main pointers: The main pointers:
reader page - The page used solely by the reader and is not part reader page
- The page used solely by the reader and is not part
of the ring buffer (may be swapped in) of the ring buffer (may be swapped in)
head page - the next page in the ring buffer that will be swapped head page
- the next page in the ring buffer that will be swapped
with the reader page. with the reader page.
tail page - the page where the next write will take place. tail page
- the page where the next write will take place.
commit page - the page that last finished a write. commit page
- the page that last finished a write.
The commit page only is updated by the outermost writer in the The commit page only is updated by the outermost writer in the
writer stack. A writer that preempts another writer will not move the writer stack. A writer that preempts another writer will not move the
...@@ -219,7 +267,7 @@ transaction. If another write happens it must finish before continuing ...@@ -219,7 +267,7 @@ transaction. If another write happens it must finish before continuing
with the previous write. with the previous write.
Write reserve: Write reserve::
Buffer page Buffer page
+---------+ +---------+
...@@ -230,7 +278,7 @@ with the previous write. ...@@ -230,7 +278,7 @@ with the previous write.
| empty | | empty |
+---------+ +---------+
Write commit: Write commit::
Buffer page Buffer page
+---------+ +---------+
...@@ -242,7 +290,7 @@ with the previous write. ...@@ -242,7 +290,7 @@ with the previous write.
+---------+ +---------+
If a write happens after the first reserve: If a write happens after the first reserve::
Buffer page Buffer page
+---------+ +---------+
...@@ -253,7 +301,7 @@ with the previous write. ...@@ -253,7 +301,7 @@ with the previous write.
|reserved | |reserved |
+---------+ <--- tail pointer +---------+ <--- tail pointer
After second writer commits: After second writer commits::
Buffer page Buffer page
...@@ -266,7 +314,7 @@ with the previous write. ...@@ -266,7 +314,7 @@ with the previous write.
|commit | |commit |
+---------+ <--- tail pointer +---------+ <--- tail pointer
When the first writer commits: When the first writer commits::
Buffer page Buffer page
+---------+ +---------+
...@@ -292,20 +340,21 @@ be several pages ahead. If the tail page catches up to the commit ...@@ -292,20 +340,21 @@ be several pages ahead. If the tail page catches up to the commit
page then no more writes may take place (regardless of the mode page then no more writes may take place (regardless of the mode
of the ring buffer: overwrite and produce/consumer). of the ring buffer: overwrite and produce/consumer).
The order of pages is: The order of pages is::
head page head page
commit page commit page
tail page tail page
Possible scenario: Possible scenario::
tail page tail page
head page commit page | head page commit page |
| | | | | |
v v v v v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |---> <---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
There is a special case that the head page is after either the commit page There is a special case that the head page is after either the commit page
...@@ -315,6 +364,7 @@ part of the ring buffer, but the reader page is not. Whenever there ...@@ -315,6 +364,7 @@ part of the ring buffer, but the reader page is not. Whenever there
has been less than a full page that has been committed inside the ring buffer, has been less than a full page that has been committed inside the ring buffer,
and a reader swaps out a page, it will be swapping out the commit page. and a reader swaps out a page, it will be swapping out the commit page.
::
reader page commit page tail page reader page commit page tail page
| | | | | |
...@@ -327,8 +377,8 @@ and a reader swaps out a page, it will be swapping out the commit page. ...@@ -327,8 +377,8 @@ and a reader swaps out a page, it will be swapping out the commit page.
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |---> <---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
^ ^
| |
...@@ -347,14 +397,14 @@ When the tail meets the head page, if the buffer is in overwrite mode, ...@@ -347,14 +397,14 @@ When the tail meets the head page, if the buffer is in overwrite mode,
the head page will be pushed ahead one. If the buffer is in producer/consumer the head page will be pushed ahead one. If the buffer is in producer/consumer
mode, the write will fail. mode, the write will fail.
Overwrite mode: Overwrite mode::
tail page tail page
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |---> <---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
^ ^
| |
...@@ -365,8 +415,8 @@ Overwrite mode: ...@@ -365,8 +415,8 @@ Overwrite mode:
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |---> <---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
^ ^
| |
...@@ -377,8 +427,8 @@ Overwrite mode: ...@@ -377,8 +427,8 @@ Overwrite mode:
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |---> <---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
^ ^
| |
...@@ -397,7 +447,7 @@ State flags are placed inside the pointer to the page. To do this, ...@@ -397,7 +447,7 @@ State flags are placed inside the pointer to the page. To do this,
each page must be aligned in memory by 4 bytes. This will allow the 2 each page must be aligned in memory by 4 bytes. This will allow the 2
least significant bits of the address to be used as flags, since least significant bits of the address to be used as flags, since
they will always be zero for the address. To get the address, they will always be zero for the address. To get the address,
simply mask out the flags. simply mask out the flags::
MASK = ~3 MASK = ~3
...@@ -405,11 +455,14 @@ simply mask out the flags. ...@@ -405,11 +455,14 @@ simply mask out the flags.
Two flags will be kept by these two bits: Two flags will be kept by these two bits:
HEADER - the page being pointed to is a head page HEADER
- the page being pointed to is a head page
UPDATE - the page being pointed to is being updated by a writer UPDATE
- the page being pointed to is being updated by a writer
and was or is about to be a head page. and was or is about to be a head page.
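
A minimal sketch of how two flag bits might be packed into a page pointer, assuming pages are aligned to at least 4 bytes as the text requires. The names `FLAG_HEADER`, `FLAG_UPDATE`, `FLAG_MASK`, `struct page_sketch` and the helpers are hypothetical, and the numeric flag values are an assumption made for illustration.

```c
#include <stdint.h>

/* Assumed encodings for the two low bits of a page link. */
#define FLAG_NORMAL ((uintptr_t)0)
#define FLAG_HEADER ((uintptr_t)1)  /* link points at the head page */
#define FLAG_UPDATE ((uintptr_t)2)  /* a writer is moving the head page */
#define FLAG_MASK   ((uintptr_t)3)

/* Two pointers give this struct at least 4-byte alignment. */
struct page_sketch {
    struct page_sketch *next;
    struct page_sketch *prev;
};

/* Combine a page address with a flag into one word. */
static uintptr_t make_link(struct page_sketch *page, uintptr_t flag)
{
    return (uintptr_t)page | flag;
}

/* Recover the address by masking out the flags (the MASK = ~3 step). */
static struct page_sketch *link_page(uintptr_t link)
{
    return (struct page_sketch *)(link & ~FLAG_MASK);
}

/* Recover the flag bits only. */
static uintptr_t link_flag(uintptr_t link)
{
    return link & FLAG_MASK;
}
```

Because the flags live in the same word as the address, a single cmpxchg can check and change both at once.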
::
reader page reader page
| |
...@@ -420,8 +473,8 @@ Two flags will be kept by these two bits: ...@@ -420,8 +473,8 @@ Two flags will be kept by these two bits:
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |---> <---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
...@@ -430,23 +483,23 @@ the next page is the next page to be swapped out by the reader. ...@@ -430,23 +483,23 @@ the next page is the next page to be swapped out by the reader.
This pointer means the next page is the head page. This pointer means the next page is the head page.
When the tail page meets the head pointer, it will use cmpxchg to When the tail page meets the head pointer, it will use cmpxchg to
change the pointer to the UPDATE state: change the pointer to the UPDATE state::
tail page tail page
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |---> <---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
tail page tail page
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |---> <---| |--->| |-U->| |--->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
"-U->" represents a pointer in the UPDATE state. "-U->" represents a pointer in the UPDATE state.
...@@ -462,7 +515,7 @@ head page does not have the HEADER flag set, the compare will fail ...@@ -462,7 +515,7 @@ head page does not have the HEADER flag set, the compare will fail
and the reader will need to look for the new head page and try again. and the reader will need to look for the new head page and try again.
Note, the flags UPDATE and HEADER are never set at the same time. Note, the flags UPDATE and HEADER are never set at the same time.
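
The interaction between a writer moving the head page and a reader trying to swap it out can be sketched as two compare-and-exchange operations on the same link word. This is a simplified model using C11 atomics, with page addresses passed as plain integers; the function names and flag values are hypothetical and not taken from the kernel source.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define FLAG_NORMAL ((uintptr_t)0)
#define FLAG_HEADER ((uintptr_t)1)
#define FLAG_UPDATE ((uintptr_t)2)

/* Writer: flip the link that points at the head page from HEADER to UPDATE. */
static bool writer_claim_head(_Atomic uintptr_t *link, uintptr_t head_addr)
{
    uintptr_t expected = head_addr | FLAG_HEADER;

    return atomic_compare_exchange_strong(link, &expected,
                                          head_addr | FLAG_UPDATE);
}

/*
 * Reader: make the link point at the reader page instead of the head page.
 * The compare only matches while the HEADER flag is still set, so it fails
 * once a writer has moved the link to UPDATE. The new link carries no
 * HEADER flag, as the text notes.
 */
static bool reader_swap_head(_Atomic uintptr_t *link, uintptr_t head_addr,
                             uintptr_t reader_addr)
{
    uintptr_t expected = head_addr | FLAG_HEADER;

    return atomic_compare_exchange_strong(link, &expected,
                                          reader_addr | FLAG_NORMAL);
}
```

When `reader_swap_head()` returns false in this model, the reader would have to locate the new head page and retry.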
The reader swaps the reader page as follows: The reader swaps the reader page as follows::
+------+ +------+
|reader| RING BUFFER |reader| RING BUFFER
...@@ -477,7 +530,7 @@ The reader swaps the reader page as follows: ...@@ -477,7 +530,7 @@ The reader swaps the reader page as follows:
+-----H-------------+ +-----H-------------+
The reader sets the reader page next pointer as HEADER to the page after The reader sets the reader page next pointer as HEADER to the page after
the head page. the head page::
+------+ +------+
...@@ -495,7 +548,7 @@ the head page. ...@@ -495,7 +548,7 @@ the head page.
It does a cmpxchg with the pointer to the previous head page to make it It does a cmpxchg with the pointer to the previous head page to make it
point to the reader page. Note that the new pointer does not have the HEADER point to the reader page. Note that the new pointer does not have the HEADER
flag set. This action atomically moves the head page forward. flag set. This action atomically moves the head page forward::
+------+ +------+
|reader| RING BUFFER |reader| RING BUFFER
...@@ -511,7 +564,7 @@ flag set. This action atomically moves the head page forward. ...@@ -511,7 +564,7 @@ flag set. This action atomically moves the head page forward.
+------------------------------------+ +------------------------------------+
After the new head page is set, the previous pointer of the head page is After the new head page is set, the previous pointer of the head page is
updated to the reader page. updated to the reader page::
+------+ +------+
|reader| RING BUFFER |reader| RING BUFFER
...@@ -548,7 +601,7 @@ prev pointers may not. ...@@ -548,7 +601,7 @@ prev pointers may not.
Note, the way to determine a reader page is simply by examining the previous Note, the way to determine a reader page is simply by examining the previous
pointer of the page. If the next pointer of the previous page does not pointer of the page. If the next pointer of the previous page does not
point back to the original page, then the original page is a reader page: point back to the original page, then the original page is a reader page::
+--------+ +--------+
...@@ -572,53 +625,53 @@ not be able to swap the head page from the buffer, nor will it be able to ...@@ -572,53 +625,53 @@ not be able to swap the head page from the buffer, nor will it be able to
move the head page, until the writer is finished with the move. move the head page, until the writer is finished with the move.
This eliminates any races that the reader can have on the writer. The reader This eliminates any races that the reader can have on the writer. The reader
must spin, and this is why the reader cannot preempt the writer. must spin, and this is why the reader cannot preempt the writer::
tail page tail page
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |---> <---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
tail page tail page
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |---> <---| |--->| |-U->| |--->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
The following page will be made into the new head page. The following page will be made into the new head page::
tail page tail page
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |---> <---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
After the new head page has been set, we can set the old head page After the new head page has been set, we can set the old head page
pointer back to NORMAL. pointer back to NORMAL::
tail page tail page
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |--->| |-H->| |---> <---| |--->| |--->| |-H->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
After the head page has been moved, the tail page may now move forward. After the head page has been moved, the tail page may now move forward::
tail page tail page
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |--->| |-H->| |---> <---| |--->| |--->| |-H->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
...@@ -630,7 +683,7 @@ tail page may make it all the way around the buffer and meet the commit ...@@ -630,7 +683,7 @@ tail page may make it all the way around the buffer and meet the commit
page. At this time, we must start dropping writes (usually with some kind page. At this time, we must start dropping writes (usually with some kind
of warning to the user). But what happens if the commit was still on the of warning to the user). But what happens if the commit was still on the
reader page? The commit page is not part of the ring buffer. The tail page reader page? The commit page is not part of the ring buffer. The tail page
must account for this. must account for this::
reader page commit page reader page commit page
...@@ -644,8 +697,8 @@ must account for this. ...@@ -644,8 +697,8 @@ must account for this.
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |---> <---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
^ ^
| |
...@@ -676,7 +729,7 @@ the head page if the head page is the next page. If the head page ...@@ -676,7 +729,7 @@ the head page if the head page is the next page. If the head page
is not the next page, the tail page is simply updated with a cmpxchg. is not the next page, the tail page is simply updated with a cmpxchg.
Only writers move the tail page. This must be done atomically to protect Only writers move the tail page. This must be done atomically to protect
against nested writers. against nested writers::
temp_page = tail_page temp_page = tail_page
next_page = temp_page->next next_page = temp_page->next
...@@ -684,7 +737,7 @@ against nested writers. ...@@ -684,7 +737,7 @@ against nested writers.
The above will update the tail page if it is still pointing to the expected The above will update the tail page if it is still pointing to the expected
page. If this fails, a nested write pushed it forward, the current write page. If this fails, a nested write pushed it forward, the current write
does not need to push it. does not need to push it::
temp page temp page
...@@ -694,43 +747,43 @@ does not need to push it. ...@@ -694,43 +747,43 @@ does not need to push it.
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |---> <---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
Nested write comes in and moves the tail page forward: Nested write comes in and moves the tail page forward::
tail page (moved by nested writer) tail page (moved by nested writer)
temp page | temp page |
| | | |
v v v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |---> <---| |--->| |--->| |--->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
The above would fail the cmpxchg, but since the tail page has already The above would fail the cmpxchg, but since the tail page has already
been moved forward, the writer will just try again to reserve storage been moved forward, the writer will just try again to reserve storage
on the new tail page. on the new tail page.
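
The tail page update described above can be sketched with C11 atomics. The writer takes a snapshot of the tail page (`temp_page`), and the compare-and-exchange only advances the tail if nobody moved it in the meantime. `struct buf_page` and `move_tail` are hypothetical names for illustration; the head page handling is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified page with only the forward link. */
struct buf_page {
    struct buf_page *next;
};

/*
 * Advance *tail_page from temp_page to temp_page->next.
 * Returns false if the tail no longer equals temp_page, which in this
 * model means a nested writer already pushed it forward.
 */
static bool move_tail(struct buf_page *_Atomic *tail_page,
                      struct buf_page *temp_page)
{
    struct buf_page *expected = temp_page;

    return atomic_compare_exchange_strong(tail_page, &expected,
                                          temp_page->next);
}
```

A failed `move_tail()` is not an error here: the outer writer simply reloads the tail page and tries to reserve space on it.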
But the moving of the head page is a bit more complex. But the moving of the head page is a bit more complex::
tail page tail page
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |---> <---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
The write converts the head page pointer to UPDATE. The write converts the head page pointer to UPDATE::
tail page tail page
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |---> <---| |--->| |-U->| |--->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
But if a nested writer preempts here, it will see that the next But if a nested writer preempts here, it will see that the next
...@@ -739,217 +792,216 @@ it is nested and will save that information. The detection is the ...@@ -739,217 +792,216 @@ it is nested and will save that information. The detection is the
fact that it sees the UPDATE flag instead of a HEADER or NORMAL fact that it sees the UPDATE flag instead of a HEADER or NORMAL
pointer. pointer.
The nested writer will set the new head page pointer. The nested writer will set the new head page pointer::
tail page tail page
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |---> <---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
But it will not reset the update back to normal. Only the writer But it will not reset the update back to normal. Only the writer
that converted a pointer from HEAD to UPDATE will convert it back that converted a pointer from HEAD to UPDATE will convert it back
to NORMAL. to NORMAL::
tail page tail page
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |---> <---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
After the nested writer finishes, the outermost writer will convert After the nested writer finishes, the outermost writer will convert
the UPDATE pointer to NORMAL. the UPDATE pointer to NORMAL::
tail page tail page
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |--->| |-H->| |---> <---| |--->| |--->| |-H->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
It can be even more complex if several nested writes came in and moved It can be even more complex if several nested writes came in and moved
the tail page ahead several pages: the tail page ahead several pages::
(first writer) (first writer)
tail page tail page
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |-H->| |--->| |---> <---| |--->| |-H->| |--->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
The write converts the head page pointer to UPDATE. The write converts the head page pointer to UPDATE::
tail page tail page
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |---> <---| |--->| |-U->| |--->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
Next writer comes in, and sees the update and sets up the new Next writer comes in, and sees the update and sets up the new
head page. head page::
(second writer) (second writer)
tail page tail page
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |---> <---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
The nested writer moves the tail page forward. But does not set the old The nested writer moves the tail page forward. But does not set the old
update page to NORMAL because it is not the outermost writer. update page to NORMAL because it is not the outermost writer::
tail page tail page
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |---> <---| |--->| |-U->| |-H->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
Another writer preempts and sees the page after the tail page is a head page. Another writer preempts and sees the page after the tail page is a head page.
It changes it from HEAD to UPDATE. It changes it from HEAD to UPDATE::
(third writer) (third writer)
tail page tail page
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |-U->| |-U->| |---> <---| |--->| |-U->| |-U->| |--->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
The writer will move the head page forward: The writer will move the head page forward::
(third writer) (third writer)
tail page tail page
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |-U->| |-U->| |-H-> <---| |--->| |-U->| |-U->| |-H->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
But now that the third writer did change the HEAD flag to UPDATE it But now that the third writer did change the HEAD flag to UPDATE it
will convert it to normal: will convert it to normal::
(third writer) (third writer)
tail page tail page
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |-H-> <---| |--->| |-U->| |--->| |-H->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
Then it will move the tail page, and return back to the second writer. Then it will move the tail page, and return back to the second writer::
(second writer) (second writer)
tail page tail page
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |-H-> <---| |--->| |-U->| |--->| |-H->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
The second writer will fail to move the tail page because it was already The second writer will fail to move the tail page because it was already
moved, so it will try again and add its data to the new tail page. moved, so it will try again and add its data to the new tail page.
It will return to the first writer. It will return to the first writer::
(first writer) (first writer)
tail page tail page
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |-H-> <---| |--->| |-U->| |--->| |-H->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
The first writer cannot know atomically if the tail page moved The first writer cannot know atomically if the tail page moved
while it updates the HEAD page. It will then update the head page to while it updates the HEAD page. It will then update the head page to
what it thinks is the new head page. what it thinks is the new head page::
(first writer) (first writer)
tail page tail page
| |
v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |-H-> <---| |--->| |-U->| |-H->| |-H->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
Since the cmpxchg returns the old value of the pointer the first writer Since the cmpxchg returns the old value of the pointer the first writer
will see it succeeded in updating the pointer from NORMAL to HEAD. will see it succeeded in updating the pointer from NORMAL to HEAD.
But as we can see, this is not good enough. It must also check to see But as we can see, this is not good enough. It must also check to see
if the tail page is either where it use to be or on the next page: if the tail page is either where it use to be or on the next page::
(first writer) (first writer)
A B tail page A B tail page
| | | | | |
v v v v v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |-U->| |-H->| |-H-> <---| |--->| |-U->| |-H->| |-H->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
If tail page != A and tail page != B, then it must reset the pointer If tail page != A and tail page != B, then it must reset the pointer
back to NORMAL. The fact that it only needs to worry about nested back to NORMAL. The fact that it only needs to worry about nested
writers means that it only needs to check this after setting the HEAD page. writers means that it only needs to check this after setting the HEAD page::
(first writer) (first writer)
A B tail page A B tail page
| | | | | |
v v v v v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |-U->| |--->| |-H-> <---| |--->| |-U->| |--->| |-H->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
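
The check on A and B can be sketched as follows. After the first writer sets HEADER on what it believes is the new head link, it compares the current tail page with A (where the tail used to be) and B (the page after A); if the tail is on neither, it resets the link to NORMAL. Addresses are modelled as plain integers, and `confirm_new_head` and the flag values are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define FLAG_NORMAL ((uintptr_t)0)
#define FLAG_HEADER ((uintptr_t)1)

/*
 * new_head_link: link word the writer just changed from NORMAL to HEADER.
 * target:        address of the page that link points at.
 * tail_now:      current tail page; a and b are the two acceptable values.
 * Returns true if the HEADER flag is kept, false if it was reset.
 */
static bool confirm_new_head(_Atomic uintptr_t *new_head_link,
                             uintptr_t target, uintptr_t tail_now,
                             uintptr_t a, uintptr_t b)
{
    if (tail_now == a || tail_now == b)
        return true;

    /* Nested writers moved the tail further: undo the HEADER flag. */
    uintptr_t expected = target | FLAG_HEADER;
    atomic_compare_exchange_strong(new_head_link, &expected,
                                   target | FLAG_NORMAL);
    return false;
}
```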
Now the writer can update the head page. This is also why the head page must Now the writer can update the head page. This is also why the head page must
remain in UPDATE and only reset by the outermost writer. This prevents remain in UPDATE and only reset by the outermost writer. This prevents
the reader from seeing the incorrect head page. the reader from seeing the incorrect head page::
(first writer) (first writer)
A B tail page A B tail page
| | | | | |
v v v v v v
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
<---| |--->| |--->| |--->| |-H-> <---| |--->| |--->| |--->| |-H->
--->| |<---| |<---| |<---| |<--- --->| |<---| |<---| |<---| |<---
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+