Commit 90caa781 authored by Tobin C. Harding's avatar Tobin C. Harding Committed by Jonathan Corbet

docs: filesystems: vfs: Use 72 character column width

In preparation for conversion to RST format use the kernels favoured
documentation column width.  If we are going to do this we might as well
do it thoroughly.  Just do the paragraphs (not the indented stuff), the
rest will be done during indentation fix up patch.

This patch is whitespace only, no textual changes.

Use 72 character column width for all paragraph sections.
Tested-by: default avatarRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: default avatarTobin C. Harding <tobin@kernel.org>
Signed-off-by: default avatarJonathan Corbet <corbet@lwn.net>
parent 4ee33ea4
......@@ -12,15 +12,14 @@
Introduction
============
The Virtual File System (also known as the Virtual Filesystem Switch)
is the software layer in the kernel that provides the filesystem
interface to userspace programs. It also provides an abstraction
within the kernel which allows different filesystem implementations to
coexist.
The Virtual File System (also known as the Virtual Filesystem Switch) is
the software layer in the kernel that provides the filesystem interface
to userspace programs. It also provides an abstraction within the
kernel which allows different filesystem implementations to coexist.
VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so
on are called from a process context. Filesystem locking is described
in the document Documentation/filesystems/Locking.
VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so on
are called from a process context. Filesystem locking is described in
the document Documentation/filesystems/Locking.
Directory Entry Cache (dcache)
......@@ -34,11 +33,10 @@ translate a pathname (filename) into a specific dentry. Dentries live
in RAM and are never saved to disc: they exist only for performance.
The dentry cache is meant to be a view into your entire filespace. As
most computers cannot fit all dentries in the RAM at the same time,
some bits of the cache are missing. In order to resolve your pathname
into a dentry, the VFS may have to resort to creating dentries along
the way, and then loading the inode. This is done by looking up the
inode.
most computers cannot fit all dentries in the RAM at the same time, some
bits of the cache are missing. In order to resolve your pathname into a
dentry, the VFS may have to resort to creating dentries along the way,
and then loading the inode. This is done by looking up the inode.
The Inode Object
......@@ -46,33 +44,32 @@ The Inode Object
An individual dentry usually has a pointer to an inode. Inodes are
filesystem objects such as regular files, directories, FIFOs and other
beasts. They live either on the disc (for block device filesystems)
or in the memory (for pseudo filesystems). Inodes that live on the
disc are copied into the memory when required and changes to the inode
are written back to disc. A single inode can be pointed to by multiple
beasts. They live either on the disc (for block device filesystems) or
in the memory (for pseudo filesystems). Inodes that live on the disc
are copied into the memory when required and changes to the inode are
written back to disc. A single inode can be pointed to by multiple
dentries (hard links, for example, do this).
To look up an inode requires that the VFS calls the lookup() method of
the parent directory inode. This method is installed by the specific
filesystem implementation that the inode lives in. Once the VFS has
the required dentry (and hence the inode), we can do all those boring
things like open(2) the file, or stat(2) it to peek at the inode
data. The stat(2) operation is fairly simple: once the VFS has the
dentry, it peeks at the inode data and passes some of it back to
userspace.
filesystem implementation that the inode lives in. Once the VFS has the
required dentry (and hence the inode), we can do all those boring things
like open(2) the file, or stat(2) it to peek at the inode data. The
stat(2) operation is fairly simple: once the VFS has the dentry, it
peeks at the inode data and passes some of it back to userspace.
The File Object
---------------
Opening a file requires another operation: allocation of a file
structure (this is the kernel-side implementation of file
descriptors). The freshly allocated file structure is initialized with
a pointer to the dentry and a set of file operation member functions.
These are taken from the inode data. The open() file method is then
called so the specific filesystem implementation can do its work. You
can see that this is another switch performed by the VFS. The file
structure is placed into the file descriptor table for the process.
structure (this is the kernel-side implementation of file descriptors).
The freshly allocated file structure is initialized with a pointer to
the dentry and a set of file operation member functions. These are
taken from the inode data. The open() file method is then called so the
specific filesystem implementation can do its work. You can see that
this is another switch performed by the VFS. The file structure is
placed into the file descriptor table for the process.
Reading, writing and closing files (and other assorted VFS operations)
is done by using the userspace file descriptor to grab the appropriate
......@@ -93,11 +90,12 @@ functions:
extern int unregister_filesystem(struct file_system_type *);
The passed struct file_system_type describes your filesystem. When a
request is made to mount a filesystem onto a directory in your namespace,
the VFS will call the appropriate mount() method for the specific
filesystem. New vfsmount referring to the tree returned by ->mount()
will be attached to the mountpoint, so that when pathname resolution
reaches the mountpoint it will jump into the root of that vfsmount.
request is made to mount a filesystem onto a directory in your
namespace, the VFS will call the appropriate mount() method for the
specific filesystem. New vfsmount referring to the tree returned by
->mount() will be attached to the mountpoint, so that when pathname
resolution reaches the mountpoint it will jump into the root of that
vfsmount.
You can see all filesystems that are registered to the kernel in the
file /proc/filesystems.
......@@ -156,21 +154,21 @@ The mount() method must return the root dentry of the tree requested by
caller. An active reference to its superblock must be grabbed and the
superblock must be locked. On failure it should return ERR_PTR(error).
The arguments match those of mount(2) and their interpretation
depends on filesystem type. E.g. for block filesystems, dev_name is
interpreted as block device name, that device is opened and if it
contains a suitable filesystem image the method creates and initializes
struct super_block accordingly, returning its root dentry to caller.
The arguments match those of mount(2) and their interpretation depends
on filesystem type. E.g. for block filesystems, dev_name is interpreted
as block device name, that device is opened and if it contains a
suitable filesystem image the method creates and initializes struct
super_block accordingly, returning its root dentry to caller.
->mount() may choose to return a subtree of existing filesystem - it
doesn't have to create a new one. The main result from the caller's
point of view is a reference to dentry at the root of (sub)tree to
be attached; creation of new superblock is a common side effect.
point of view is a reference to dentry at the root of (sub)tree to be
attached; creation of new superblock is a common side effect.
The most interesting member of the superblock structure that the
mount() method fills in is the "s_op" field. This is a pointer to
a "struct super_operations" which describes the next level of the
filesystem implementation.
The most interesting member of the superblock structure that the mount()
method fills in is the "s_op" field. This is a pointer to a "struct
super_operations" which describes the next level of the filesystem
implementation.
Usually, a filesystem uses one of the generic mount() implementations
and provides a fill_super() callback instead. The generic variants are:
......@@ -317,16 +315,16 @@ or bottom half).
implementations will cause holdoff problems due to large scan batch
sizes.
Whoever sets up the inode is responsible for filling in the "i_op" field. This
is a pointer to a "struct inode_operations" which describes the methods that
can be performed on individual inodes.
Whoever sets up the inode is responsible for filling in the "i_op"
field. This is a pointer to a "struct inode_operations" which describes
the methods that can be performed on individual inodes.
struct xattr_handlers
---------------------
On filesystems that support extended attributes (xattrs), the s_xattr
superblock field points to a NULL-terminated array of xattr handlers. Extended
attributes are name:value pairs.
superblock field points to a NULL-terminated array of xattr handlers.
Extended attributes are name:value pairs.
name: Indicates that the handler matches attributes with the specified name
(such as "system.posix_acl_access"); the prefix field must be NULL.
......@@ -346,9 +344,9 @@ attributes are name:value pairs.
attribute. This method is called by the the setxattr(2) and
removexattr(2) system calls.
When none of the xattr handlers of a filesystem match the specified attribute
name or when a filesystem doesn't support extended attributes, the various
*xattr(2) system calls return -EOPNOTSUPP.
When none of the xattr handlers of a filesystem match the specified
attribute name or when a filesystem doesn't support extended attributes,
the various *xattr(2) system calls return -EOPNOTSUPP.
The Inode Object
......@@ -360,8 +358,8 @@ An inode object represents an object within the filesystem.
struct inode_operations
-----------------------
This describes how the VFS can manipulate an inode in your
filesystem. As of kernel 2.6.22, the following members are defined:
This describes how the VFS can manipulate an inode in your filesystem.
As of kernel 2.6.22, the following members are defined:
struct inode_operations {
int (*create) (struct inode *,struct dentry *, umode_t, bool);
......@@ -517,42 +515,40 @@ The Address Space Object
========================
The address space object is used to group and manage pages in the page
cache. It can be used to keep track of the pages in a file (or
anything else) and also track the mapping of sections of the file into
process address spaces.
cache. It can be used to keep track of the pages in a file (or anything
else) and also track the mapping of sections of the file into process
address spaces.
There are a number of distinct yet related services that an
address-space can provide. These include communicating memory
pressure, page lookup by address, and keeping track of pages tagged as
Dirty or Writeback.
address-space can provide. These include communicating memory pressure,
page lookup by address, and keeping track of pages tagged as Dirty or
Writeback.
The first can be used independently to the others. The VM can try to
either write dirty pages in order to clean them, or release clean
pages in order to reuse them. To do this it can call the ->writepage
method on dirty pages, and ->releasepage on clean pages with
PagePrivate set. Clean pages without PagePrivate and with no external
references will be released without notice being given to the
address_space.
either write dirty pages in order to clean them, or release clean pages
in order to reuse them. To do this it can call the ->writepage method
on dirty pages, and ->releasepage on clean pages with PagePrivate set.
Clean pages without PagePrivate and with no external references will be
released without notice being given to the address_space.
To achieve this functionality, pages need to be placed on an LRU with
lru_cache_add and mark_page_active needs to be called whenever the
page is used.
lru_cache_add and mark_page_active needs to be called whenever the page
is used.
Pages are normally kept in a radix tree index by ->index. This tree
maintains information about the PG_Dirty and PG_Writeback status of
each page, so that pages with either of these flags can be found
quickly.
maintains information about the PG_Dirty and PG_Writeback status of each
page, so that pages with either of these flags can be found quickly.
The Dirty tag is primarily used by mpage_writepages - the default
->writepages method. It uses the tag to find dirty pages to call
->writepage on. If mpage_writepages is not used (i.e. the address
provides its own ->writepages) , the PAGECACHE_TAG_DIRTY tag is
almost unused. write_inode_now and sync_inode do use it (through
provides its own ->writepages) , the PAGECACHE_TAG_DIRTY tag is almost
unused. write_inode_now and sync_inode do use it (through
__sync_single_inode) to check if ->writepages has been successful in
writing out the whole address_space.
The Writeback tag is used by filemap*wait* and sync_page* functions,
via filemap_fdatawait_range, to wait for all writeback to complete.
The Writeback tag is used by filemap*wait* and sync_page* functions, via
filemap_fdatawait_range, to wait for all writeback to complete.
An address_space handler may attach extra information to a page,
typically using the 'private' field in the 'struct page'. If such
......@@ -562,25 +558,24 @@ handler to deal with that data.
An address space acts as an intermediate between storage and
application. Data is read into the address space a whole page at a
time, and provided to the application either by copying of the page,
or by memory-mapping the page.
Data is written into the address space by the application, and then
written-back to storage typically in whole pages, however the
address_space has finer control of write sizes.
time, and provided to the application either by copying of the page, or
by memory-mapping the page. Data is written into the address space by
the application, and then written-back to storage typically in whole
pages, however the address_space has finer control of write sizes.
The read process essentially only requires 'readpage'. The write
process is more complicated and uses write_begin/write_end or
set_page_dirty to write data into the address_space, and writepage
and writepages to writeback data to storage.
set_page_dirty to write data into the address_space, and writepage and
writepages to writeback data to storage.
Adding and removing pages to/from an address_space is protected by the
inode's i_mutex.
When data is written to a page, the PG_Dirty flag should be set. It
typically remains set until writepage asks for it to be written. This
should clear PG_Dirty and set PG_Writeback. It can be actually
written at any point after PG_Dirty is clear. Once it is known to be
safe, PG_Writeback is cleared.
should clear PG_Dirty and set PG_Writeback. It can be actually written
at any point after PG_Dirty is clear. Once it is known to be safe,
PG_Writeback is cleared.
Writeback makes use of a writeback_control structure to direct the
operations. This gives the the writepage and writepages operations some
......@@ -609,9 +604,10 @@ file descriptors should get back an error is not possible.
Instead, the generic writeback error tracking infrastructure in the
kernel settles for reporting errors to fsync on all file descriptions
that were open at the time that the error occurred. In a situation with
multiple writers, all of them will get back an error on a subsequent fsync,
even if all of the writes done through that particular file descriptor
succeeded (or even if there were no writes on that file descriptor at all).
multiple writers, all of them will get back an error on a subsequent
fsync, even if all of the writes done through that particular file
descriptor succeeded (or even if there were no writes on that file
descriptor at all).
Filesystems that wish to use this infrastructure should call
mapping_set_error to record the error in the address_space when it
......@@ -623,8 +619,8 @@ point in the stream of errors emitted by the backing device(s).
struct address_space_operations
-------------------------------
This describes how the VFS can manipulate mapping of a file to page cache in
your filesystem. The following members are defined:
This describes how the VFS can manipulate mapping of a file to page
cache in your filesystem. The following members are defined:
struct address_space_operations {
int (*writepage)(struct page *page, struct writeback_control *wbc);
......@@ -1231,8 +1227,8 @@ filesystems.
Showing options
---------------
If a filesystem accepts mount options, it must define show_options()
to show all the currently active options. The rules are:
If a filesystem accepts mount options, it must define show_options() to
show all the currently active options. The rules are:
- options MUST be shown which are not default or their values differ
from the default
......@@ -1240,14 +1236,14 @@ to show all the currently active options. The rules are:
- options MAY be shown which are enabled by default or have their
default value
Options used only internally between a mount helper and the kernel
(such as file descriptors), or which only have an effect during the
mounting (such as ones controlling the creation of a journal) are exempt
from the above rules.
Options used only internally between a mount helper and the kernel (such
as file descriptors), or which only have an effect during the mounting
(such as ones controlling the creation of a journal) are exempt from the
above rules.
The underlying reason for the above rules is to make sure, that a
mount can be accurately replicated (e.g. umounting and mounting again)
based on the information found in /proc/mounts.
The underlying reason for the above rules is to make sure, that a mount
can be accurately replicated (e.g. umounting and mounting again) based
on the information found in /proc/mounts.
Resources
=========
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment