Commit b0a4aa95 authored by Mauro Carvalho Chehab

docs: nvdimm: convert to ReST

Rename the nvdimm documentation files to ReST, add an
index for them and adjust in order to produce a nice html
output via the Sphinx build system.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
parent 6e58e2d8
+=============================
 BTT - Block Translation Table
 =============================
 1. Introduction
----------------
+===============
 Persistent memory based storage is able to perform IO at byte (or more
 accurately, cache line) granularity. However, we often want to expose such
@@ -25,7 +26,7 @@ provides atomic sector updates.
 2. Static Layout
-----------------
+================
 The underlying storage on which a BTT can be laid out is not limited in any way.
 The BTT, however, splits the available space into chunks of up to 512 GiB,
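The split of the backing store into arenas of at most 512 GiB comes down to a ceiling division. The following is a minimal C sketch only: the names `BTT_ARENA_MAX`, `btt_num_arenas` and `btt_arena_of` are hypothetical, and it ignores per-arena metadata overhead and any minimum arena size the real driver may impose.

```c
#include <stdint.h>

/* Upper bound on an arena's size, as stated in the text: 512 GiB. */
#define BTT_ARENA_MAX (512ULL << 30)

/* Number of arenas needed to cover 'size' bytes when every arena except
 * possibly the last is exactly BTT_ARENA_MAX bytes (hypothetical helper).
 * Assumes 'size' is far below UINT64_MAX so the addition cannot overflow. */
static inline uint64_t btt_num_arenas(uint64_t size)
{
    return (size + BTT_ARENA_MAX - 1) / BTT_ARENA_MAX;
}

/* Index of the arena that a byte offset into the backing store falls in. */
static inline uint64_t btt_arena_of(uint64_t offset)
{
    return offset / BTT_ARENA_MAX;
}
```

Under these assumptions a 1 TiB backing store is covered by two arenas, and byte offset 768 GiB lands in arena 1.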
@@ -33,27 +34,27 @@ called "Arenas".
 Each arena follows the same layout for its metadata, and all references in an
 arena are internal to it (with the exception of one field that points to the
-next arena). The following depicts the "On-disk" metadata layout:
+next arena). The following depicts the "On-disk" metadata layout::
       Backing Store     +------->  Arena
     +---------------+   |   +------------------+
     |               |   |   | Arena info block |
     |    Arena 0    +---+   |        4K        |
     |     512G      |       +------------------+
     |               |       |                  |
     +---------------+       |                  |
     |               |       |                  |
     |    Arena 1    |       |   Data Blocks    |
     |     512G      |       |                  |
     |               |       |                  |
     +---------------+       |                  |
     |       .       |       |                  |
     |       .       |       |                  |
     |       .       |       |                  |
     |               |       |                  |
     |               |       |                  |
     +---------------+       +------------------+
                             |                  |
                             |     BTT Map      |
                             |                  |
@@ -69,7 +70,7 @@ next arena). The following depicts the "On-disk" metadata layout:
 3. Theory of Operation
-----------------------
+======================
 a. The BTT Map
@@ -79,31 +80,37 @@ The map is a simple lookup/indirection table that maps an LBA to an internal
 block. Each map entry is 32 bits. The two most significant bits are special
 flags, and the remaining form the internal block number.
+======== =============================================================
 Bit      Description
-31 - 30  : Error and Zero flags - Used in the following way:
-           Bit      Description
-           31 30
-           -----------------------------------------------------------------------
-           00       Initial state. Reads return zeroes; Premap = Postmap
-           01       Zero state: Reads return zeroes
-           10       Error state: Reads fail; Writes clear 'E' bit
-           11       Normal Block – has valid postmap
+======== =============================================================
+31 - 30  Error and Zero flags - Used in the following way:
+         == == ====================================================
+         31 30 Description
+         == == ====================================================
+         0  0  Initial state. Reads return zeroes; Premap = Postmap
+         0  1  Zero state: Reads return zeroes
+         1  0  Error state: Reads fail; Writes clear 'E' bit
+         1  1  Normal Block – has valid postmap
+         == == ====================================================
-29 - 0   : Mappings to internal 'postmap' blocks
+29 - 0   Mappings to internal 'postmap' blocks
+======== =============================================================
 Some of the terminology that will be subsequently used:
-External LBA : LBA as made visible to upper layers.
-ABA          : Arena Block Address - Block offset/number within an arena
-Premap ABA   : The block offset into an arena, which was decided upon by range
+============ ================================================================
+External LBA LBA as made visible to upper layers.
+ABA          Arena Block Address - Block offset/number within an arena
+Premap ABA   The block offset into an arena, which was decided upon by range
              checking the External LBA
-Postmap ABA  : The block number in the "Data Blocks" area obtained after
+Postmap ABA  The block number in the "Data Blocks" area obtained after
              indirection from the map
-nfree        : The number of free blocks that are maintained at any given time.
+nfree        The number of free blocks that are maintained at any given time.
              This is the number of concurrent writes that can happen to the
              arena.
+============ ================================================================
 For example, after adding a BTT, we surface a disk of 1024G. We get a read for
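The 32-bit map-entry layout in the table can be decoded with a shift and a mask. The following minimal C sketch follows the bit assignments exactly as the table lists them (bits 31-30 are flags, bits 29-0 are the postmap ABA) and assumes the entry is already in host byte order. The enum and function names are hypothetical, not the driver's own.

```c
#include <stdint.h>

/* Flag states held in bits 31-30, in table order (bit 31 is the high bit). */
enum btt_map_state {
    BTT_MAP_INITIAL = 0x0,  /* 00: reads return zeroes; premap = postmap */
    BTT_MAP_ZERO    = 0x1,  /* 01: reads return zeroes */
    BTT_MAP_ERROR   = 0x2,  /* 10: reads fail; writes clear 'E' */
    BTT_MAP_NORMAL  = 0x3,  /* 11: has a valid postmap */
};

#define BTT_MAP_FLAG_SHIFT 30
#define BTT_MAP_ABA_MASK   0x3FFFFFFFu   /* bits 29-0 */

/* Extract the two flag bits as one of the states above. */
static inline enum btt_map_state btt_map_get_state(uint32_t entry)
{
    return (enum btt_map_state)(entry >> BTT_MAP_FLAG_SHIFT);
}

/* Extract the internal 'postmap' block number. */
static inline uint32_t btt_map_postmap(uint32_t entry)
{
    return entry & BTT_MAP_ABA_MASK;
}

/* Build an entry from a state and a postmap ABA. */
static inline uint32_t btt_map_make(enum btt_map_state state, uint32_t postmap)
{
    return ((uint32_t)state << BTT_MAP_FLAG_SHIFT) | (postmap & BTT_MAP_ABA_MASK);
}

/* Postmap ABA for 'premap': in the initial state the table gives
 * Premap = Postmap, so the identity mapping applies. */
static inline uint32_t btt_map_resolve(uint32_t premap, uint32_t entry)
{
    if (btt_map_get_state(entry) == BTT_MAP_INITIAL)
        return premap;
    return btt_map_postmap(entry);
}
```

For example, `btt_map_make(BTT_MAP_NORMAL, 7)` yields `0xC0000007`. An all-zero entry resolves to the premap ABA itself.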
@@ -121,19 +128,21 @@ i.e. Every write goes to a "free" block. A running list of free blocks is
 maintained in the form of the BTT flog. 'Flog' is a combination of the words
 "free list" and "log". The flog contains 'nfree' entries, and an entry contains:
-lba      : The premap ABA that is being written to
-old_map  : The old postmap ABA - after 'this' write completes, this will be a
+======== =====================================================================
+lba      The premap ABA that is being written to
+old_map  The old postmap ABA - after 'this' write completes, this will be a
          free block.
-new_map  : The new postmap ABA. The map will up updated to reflect this
+new_map  The new postmap ABA. The map will up updated to reflect this
          lba->postmap_aba mapping, but we log it here in case we have to
          recover.
-seq      : Sequence number to mark which of the 2 sections of this flog entry is
+seq      Sequence number to mark which of the 2 sections of this flog entry is
          valid/newest. It cycles between 01->10->11->01 (binary) under normal
          operation, with 00 indicating an uninitialized state.
-lba'     : alternate lba entry
-old_map' : alternate old postmap entry
-new_map' : alternate new postmap entry
-seq'     : alternate sequence number.
+lba'     alternate lba entry
+old_map' alternate old postmap entry
+new_map' alternate new postmap entry
+seq'     alternate sequence number.
+======== =====================================================================
 Each of the above fields is 32-bit, making one entry 32 bytes. Entries are also
 padded to 64 bytes to avoid cache line sharing or aliasing. Flog updates are
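The flog entry described in this hunk (eight 32-bit fields in two sections, padded to 64 bytes) maps naturally onto a C struct. This is a sketch under stated assumptions: the struct and helper names are hypothetical, and on-disk byte order is ignored. Picking the newer section by treating 01->10->11->01 as a 3-cycle is one reading of the `seq` description, not the driver's exact logic.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* One section of a flog entry: four 32-bit fields. */
struct flog_section {
    uint32_t lba;      /* premap ABA being written */
    uint32_t old_map;  /* old postmap ABA; free once the write completes */
    uint32_t new_map;  /* new postmap ABA */
    uint32_t seq;      /* 01 -> 10 -> 11 -> 01; 00 means uninitialized */
};

/* A full entry: two sections (8 x 32 bits = 32 bytes) padded to 64 bytes. */
struct flog_entry {
    struct flog_section sec[2];   /* sec[1] holds lba', old_map', new_map', seq' */
    uint8_t pad[32];
};

static_assert(sizeof(struct flog_section) == 16, "four 32-bit fields");
static_assert(sizeof(struct flog_entry) == 64, "entry padded to 64 bytes");

/* Next value in the 01 -> 10 -> 11 -> 01 cycle; 00 advances to 01.
 * Only meaningful for inputs 0-3. */
static inline uint32_t flog_seq_next(uint32_t seq)
{
    return (seq % 3) + 1;
}

/* Index (0 or 1) of the newer section, or -1 if both are uninitialized. */
static inline int flog_newest(const struct flog_entry *e)
{
    uint32_t a = e->sec[0].seq, b = e->sec[1].seq;

    if (a == 0 && b == 0)
        return -1;
    if (a == 0)
        return 1;
    if (b == 0)
        return 0;
    /* Of two distinct values in the cycle, the newer one follows the other. */
    return flog_seq_next(a) == b ? 1 : 0;
}
```

With sequence numbers 11 and 01 in the two sections, the section holding 01 is treated as newer because it follows 11 in the cycle.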
@@ -147,8 +156,10 @@ c. The concept of lanes
 While 'nfree' describes the number of concurrent IOs an arena can process
 concurrently, 'nlanes' is the number of IOs the BTT device as a whole can
-process.
+process::
         nlanes = min(nfree, num_cpus)
 A lane number is obtained at the start of any IO, and is used for indexing into
 all the on-disk and in-memory data structures for the duration of the IO. If
 there are more CPUs than the max number of available lanes, than lanes are
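The lane formula and the condition under which lanes must be shared can be written down directly. In this sketch `btt_nlanes` is the formula from the text. Assigning CPU `i` to lane `i % nlanes` is an assumption made for illustration, and every function name is hypothetical.

```c
#include <stdbool.h>

/* nlanes = min(nfree, num_cpus), the formula from the text. */
static inline unsigned int btt_nlanes(unsigned int nfree, unsigned int num_cpus)
{
    return nfree < num_cpus ? nfree : num_cpus;
}

/* Hypothetical CPU-to-lane assignment: CPU i uses lane i % nlanes.
 * Requires nlanes > 0. */
static inline unsigned int btt_lane_for_cpu(unsigned int cpu, unsigned int nlanes)
{
    return cpu % nlanes;
}

/* Lanes are shared between CPUs, and so need a lock, exactly when
 * there are more CPUs than lanes. */
static inline bool btt_lanes_shared(unsigned int nfree, unsigned int num_cpus)
{
    return num_cpus > btt_nlanes(nfree, num_cpus);
}
```

For instance, with nfree = 256 on an 8-CPU machine this gives 8 lanes, one per CPU, and no lane is shared.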
@@ -180,10 +191,10 @@ e. In-memory data structure: map locks
 --------------------------------------
 Consider a case where two writer threads are writing to the same LBA. There can
-be a race in the following sequence of steps:
+be a race in the following sequence of steps::
         free[lane] = map[premap_aba]
         map[premap_aba] = postmap_aba
 Both threads can update their respective free[lane] with the same old, freed
 postmap_aba. This has made the layout inconsistent by losing a free entry, and
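One way to close this race is to make the two-step update atomic with respect to other writers of the same premap ABA. The sketch below assumes POSIX threads and hashes the premap ABA onto a fixed array of mutexes. The remedy is an assumption, not taken from this excerpt. The array sizes, the toy in-memory `map` and `free_blk` arrays, and all names are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

#define NLOCKS   256   /* hypothetical number of map locks */
#define NBLOCKS 1024   /* hypothetical number of premap ABAs in the toy map */
#define NLANES     8   /* hypothetical number of lanes */

static uint32_t map[NBLOCKS];       /* premap ABA -> postmap ABA */
static uint32_t free_blk[NLANES];   /* free[lane] from the text */
static pthread_mutex_t map_locks[NLOCKS];

static void map_locks_init(void)
{
    for (int i = 0; i < NLOCKS; i++)
        pthread_mutex_init(&map_locks[i], NULL);
}

/* Commit a write for 'lane' whose data already sits in free_blk[lane].
 * Both steps from the text run under one lock chosen by the premap ABA,
 * so two writers to the same LBA cannot read the same old mapping. */
static void btt_map_commit(unsigned int lane, uint32_t premap_aba)
{
    uint32_t postmap_aba = free_blk[lane];
    pthread_mutex_t *lock = &map_locks[premap_aba % NLOCKS];

    pthread_mutex_lock(lock);
    free_blk[lane] = map[premap_aba];   /* old block becomes this lane's free block */
    map[premap_aba] = postmap_aba;      /* LBA now points at the freshly written block */
    pthread_mutex_unlock(lock);
}
```

Because the read of the old mapping and the map update happen under the same lock, each old block ends up in exactly one lane's `free_blk` slot, and no free entry is lost.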
@@ -202,6 +213,7 @@ On startup, we analyze the BTT flog to create our list of free blocks. We walk
 through all the entries, and for each lane, of the set of two possible
 'sections', we always look at the most recent one only (based on the sequence
 number). The reconstruction rules/steps are simple:
+
 - Read map[log_entry.lba].
 - If log_entry.new matches the map entry, then log_entry.old is free.
 - If log_entry.new does not match the map entry, then log_entry.new is free.
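The three reconstruction rules translate almost line for line into C. This sketch assumes the newest flog section has already been selected, and that map entries carry the two flag bits, so only bits 29-0 are compared. The struct and function names are hypothetical.

```c
#include <stdint.h>

#define MAP_ABA_MASK 0x3FFFFFFFu /* bits 29-0: postmap ABA without the flag bits */

/* Newest section of one flog entry (fields as in the flog description). */
struct flog_view {
    uint32_t lba;      /* premap ABA that was being written */
    uint32_t old_map;  /* postmap ABA before the write */
    uint32_t new_map;  /* postmap ABA the write was directed to */
};

/* Return the postmap ABA that is free for this lane after a restart. */
static inline uint32_t flog_free_block(const struct flog_view *ent, const uint32_t *map)
{
    uint32_t cur = map[ent->lba] & MAP_ABA_MASK;   /* Read map[log_entry.lba] */

    if (cur == ent->new_map)
        return ent->old_map;   /* map already points at new: old is free */
    return ent->new_map;       /* map does not point at new: new is free */
}
```

In the second branch the map does not reference `new_map`, so that block holds no live data and can be handed out again.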
@@ -245,6 +257,7 @@ Write:
 An arena would be in an error state if any of the metadata is corrupted
 irrecoverably, either due to a bug or a media error. The following conditions
 indicate an error:
+
 - Info block checksum does not match (and recovering from the copy also fails)
 - All internal available blocks are not uniquely and entirely addressed by the
   sum of mapped blocks and free blocks (from the BTT flog).
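The second error condition says mapped blocks plus free blocks must cover every internal block exactly once. That can be checked with a byte-per-block bitmap. This is a minimal sketch that assumes the map has already been resolved to plain postmap ABAs and the free list has been rebuilt from the flog. The function name and parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* True when the mapped postmap ABAs plus the free blocks reference every
 * internal block in [0, internal_nblocks) exactly once. */
static bool btt_blocks_consistent(uint32_t internal_nblocks,
                                  const uint32_t *postmap, size_t nmapped,
                                  const uint32_t *free_list, size_t nfree)
{
    /* With exactly internal_nblocks references, all distinct and in range,
     * every internal block must be covered (pigeonhole). */
    if ((uint64_t)nmapped + nfree != internal_nblocks)
        return false;

    unsigned char *seen = calloc(internal_nblocks ? internal_nblocks : 1, 1);
    if (seen == NULL)
        return false;   /* cannot verify without memory; report failure */

    bool ok = true;
    for (size_t i = 0; ok && i < nmapped + nfree; i++) {
        uint32_t blk = i < nmapped ? postmap[i] : free_list[i - nmapped];
        if (blk >= internal_nblocks || seen[blk])
            ok = false;          /* out of range, or referenced twice */
        else
            seen[blk] = 1;
    }
    free(seen);
    return ok;
}
```

Checking the count first means the loop only has to detect duplicates and out-of-range values. Any block left uncovered would force a duplicate somewhere else.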
@@ -263,11 +276,10 @@ The BTT can be set up on any disk (namespace) exposed by the libnvdimm subsystem
 (pmem, or blk mode). The easiest way to set up such a namespace is using the
 'ndctl' utility [1]:
-For example, the ndctl command line to setup a btt with a 4k sector size is:
+For example, the ndctl command line to setup a btt with a 4k sector size is::
     ndctl create-namespace -f -e namespace0.0 -m sector -l 4k
 See ndctl create-namespace --help for more options.
 [1]: https://github.com/pmem/ndctl
+:orphan:
+
+===================================
+Non-Volatile Memory Device (NVDIMM)
+===================================
+
+.. toctree::
+   :maxdepth: 1
+
+   nvdimm
+   btt
+   security
-NVDIMM SECURITY
+===============
+NVDIMM Security
 ===============
 1. Introduction
@@ -138,4 +139,5 @@ This command is only available when the master security is enabled, indicated
 by the extended security status.
 [1]: http://pmem.io/documents/NVDIMM_DSM_Interface-V1.8.pdf
 [2]: http://www.t13.org/documents/UploadedDocuments/docs2006/e05179r4-ACS-SecurityClarifications.pdf
@@ -33,7 +33,7 @@ config BLK_DEV_PMEM
           Documentation/admin-guide/kernel-parameters.rst). This driver converts
           these persistent memory ranges into block devices that are
           capable of DAX (direct-access) file system mappings. See
-          Documentation/nvdimm/nvdimm.txt for more details.
+          Documentation/nvdimm/nvdimm.rst for more details.
           Say Y if you want to use an NVDIMM