Commit 18ccb223 authored by Mauro Carvalho Chehab's avatar Mauro Carvalho Chehab Committed by Jonathan Corbet

docs: filesystems: convert orangefs.txt to ReST

- Add a SPDX header;
- Adjust document and section titles;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.
Signed-off-by: default avatarMauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/6f438eeff5b029d229197a602bd9b74004fe9b63.1581955849.git.mchehab+huawei@kernel.orgSigned-off-by: default avatarJonathan Corbet <corbet@lwn.net>
parent 7cbb468f
......@@ -79,6 +79,7 @@ Documentation for filesystem implementations.
ocfs2
ocfs2-online-filecheck
omfs
orangefs
overlayfs
virtiofs
vfat
.. SPDX-License-Identifier: GPL-2.0
========
ORANGEFS
========
......@@ -21,25 +24,25 @@ Orangefs features include:
* Stateless
MAILING LIST ARCHIVES
Mailing List Archives
=====================
http://lists.orangefs.org/pipermail/devel_lists.orangefs.org/
MAILING LIST SUBMISSIONS
Mailing List Submissions
========================
devel@lists.orangefs.org
DOCUMENTATION
Documentation
=============
http://www.orangefs.org/documentation/
USERSPACE FILESYSTEM SOURCE
Userspace Filesystem Source
===========================
http://www.orangefs.org/download
......@@ -48,16 +51,16 @@ Orangefs versions prior to 2.9.3 would not be compatible with the
upstream version of the kernel client.
RUNNING ORANGEFS ON A SINGLE SERVER
Running ORANGEFS On a Single Server
===================================
OrangeFS is usually run in large installations with multiple servers and
clients, but a complete filesystem can be run on a single machine for
development and testing.
On Fedora, install orangefs and orangefs-server.
On Fedora, install orangefs and orangefs-server::
dnf -y install orangefs orangefs-server
dnf -y install orangefs orangefs-server
There is an example server configuration file in
/etc/orangefs/orangefs.conf. Change localhost to your hostname if
......@@ -70,29 +73,29 @@ single line. Uncomment it and change the hostname if necessary. This
controls clients which use libpvfs2. This does not control the
pvfs2-client-core.
Create the filesystem.
Create the filesystem::
pvfs2-server -f /etc/orangefs/orangefs.conf
pvfs2-server -f /etc/orangefs/orangefs.conf
Start the server.
Start the server::
systemctl start orangefs-server
systemctl start orangefs-server
Test the server.
Test the server::
pvfs2-ping -m /pvfsmnt
pvfs2-ping -m /pvfsmnt
Start the client. The module must be compiled in or loaded before this
point.
point::
systemctl start orangefs-client
systemctl start orangefs-client
Mount the filesystem.
Mount the filesystem::
mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
BUILDING ORANGEFS ON A SINGLE SERVER
Building ORANGEFS on a Single Server
====================================
Where OrangeFS cannot be installed from distribution packages, it may be
......@@ -102,49 +105,51 @@ You can omit --prefix if you don't care that things are sprinkled around
in /usr/local. As of version 2.9.6, OrangeFS uses Berkeley DB by
default, we will probably be changing the default to LMDB soon.
./configure --prefix=/opt/ofs --with-db-backend=lmdb
::
make
./configure --prefix=/opt/ofs --with-db-backend=lmdb
make install
make
Create an orangefs config file.
make install
/opt/ofs/bin/pvfs2-genconfig /etc/pvfs2.conf
Create an orangefs config file::
Create an /etc/pvfs2tab file.
/opt/ofs/bin/pvfs2-genconfig /etc/pvfs2.conf
echo tcp://localhost:3334/orangefs /pvfsmnt pvfs2 defaults,noauto 0 0 > \
/etc/pvfs2tab
Create an /etc/pvfs2tab file::
Create the mount point you specified in the tab file if needed.
echo tcp://localhost:3334/orangefs /pvfsmnt pvfs2 defaults,noauto 0 0 > \
/etc/pvfs2tab
mkdir /pvfsmnt
Create the mount point you specified in the tab file if needed::
Bootstrap the server.
mkdir /pvfsmnt
/opt/ofs/sbin/pvfs2-server -f /etc/pvfs2.conf
Bootstrap the server::
Start the server.
/opt/ofs/sbin/pvfs2-server -f /etc/pvfs2.conf
/opt/osf/sbin/pvfs2-server /etc/pvfs2.conf
Start the server::
/opt/osf/sbin/pvfs2-server /etc/pvfs2.conf
Now the server should be running. Pvfs2-ls is a simple
test to verify that the server is running.
test to verify that the server is running::
/opt/ofs/bin/pvfs2-ls /pvfsmnt
/opt/ofs/bin/pvfs2-ls /pvfsmnt
If stuff seems to be working, load the kernel module and
turn on the client core.
turn on the client core::
/opt/ofs/sbin/pvfs2-client -p /opt/osf/sbin/pvfs2-client-core
/opt/ofs/sbin/pvfs2-client -p /opt/osf/sbin/pvfs2-client-core
Mount your filesystem.
Mount your filesystem::
mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
RUNNING XFSTESTS
Running xfstests
================
It is useful to use a scratch filesystem with xfstests. This can be
......@@ -159,21 +164,23 @@ Then there are two FileSystem sections: orangefs and scratch.
This change should be made before creating the filesystem.
pvfs2-server -f /etc/orangefs/orangefs.conf
::
pvfs2-server -f /etc/orangefs/orangefs.conf
To run xfstests, create /etc/xfsqa.config.
To run xfstests, create /etc/xfsqa.config::
TEST_DIR=/orangefs
TEST_DEV=tcp://localhost:3334/orangefs
SCRATCH_MNT=/scratch
SCRATCH_DEV=tcp://localhost:3334/scratch
TEST_DIR=/orangefs
TEST_DEV=tcp://localhost:3334/orangefs
SCRATCH_MNT=/scratch
SCRATCH_DEV=tcp://localhost:3334/scratch
Then xfstests can be run
Then xfstests can be run::
./check -pvfs2
./check -pvfs2
OPTIONS
Options
=======
The following mount options are accepted:
......@@ -193,32 +200,32 @@ The following mount options are accepted:
Distributed locking is being worked on for the future.
DEBUGGING
Debugging
=========
If you want the debug (GOSSIP) statements in a particular
source file (inode.c for example) go to syslog:
source file (inode.c for example) go to syslog::
echo inode > /sys/kernel/debug/orangefs/kernel-debug
No debugging (the default):
No debugging (the default)::
echo none > /sys/kernel/debug/orangefs/kernel-debug
Debugging from several source files:
Debugging from several source files::
echo inode,dir > /sys/kernel/debug/orangefs/kernel-debug
All debugging:
All debugging::
echo all > /sys/kernel/debug/orangefs/kernel-debug
Get a list of all debugging keywords:
Get a list of all debugging keywords::
cat /sys/kernel/debug/orangefs/debug-help
PROTOCOL BETWEEN KERNEL MODULE AND USERSPACE
Protocol between Kernel Module and Userspace
============================================
Orangefs is a user space filesystem and an associated kernel module.
......@@ -234,7 +241,8 @@ The kernel module implements a pseudo device that userspace
can read from and write to. Userspace can also manipulate the
kernel module through the pseudo device with ioctl.
THE BUFMAP:
The Bufmap
----------
At startup userspace allocates two page-size-aligned (posix_memalign)
mlocked memory buffers, one is used for IO and one is used for readdir
......@@ -250,7 +258,8 @@ copied from user space to kernel space with copy_from_user and is used
to initialize the kernel module's "bufmap" (struct orangefs_bufmap), which
then contains:
* refcnt - a reference counter
* refcnt
- a reference counter
* desc_size - PVFS2_BUFMAP_DEFAULT_DESC_SIZE (4194304) - the IO buffer's
partition size, which represents the filesystem's block size and
is used for s_blocksize in super blocks.
......@@ -259,17 +268,19 @@ then contains:
* desc_shift - log2(desc_size), used for s_blocksize_bits in super blocks.
* total_size - the total size of the IO buffer.
* page_count - the number of 4096 byte pages in the IO buffer.
* page_array - a pointer to page_count * (sizeof(struct page*)) bytes
* page_array - a pointer to ``page_count * (sizeof(struct page*))`` bytes
of kcalloced memory. This memory is used as an array of pointers
to each of the pages in the IO buffer through a call to get_user_pages.
* desc_array - a pointer to desc_count * (sizeof(struct orangefs_bufmap_desc))
* desc_array - a pointer to ``desc_count * (sizeof(struct orangefs_bufmap_desc))``
bytes of kcalloced memory. This memory is further intialized:
user_desc is the kernel's copy of the IO buffer's ORANGEFS_dev_map_desc
structure. user_desc->ptr points to the IO buffer.
pages_per_desc = bufmap->desc_size / PAGE_SIZE
offset = 0
::
pages_per_desc = bufmap->desc_size / PAGE_SIZE
offset = 0
bufmap->desc_array[0].page_array = &bufmap->page_array[offset]
bufmap->desc_array[0].array_count = pages_per_desc = 1024
......@@ -293,7 +304,8 @@ then contains:
* readdir_index_lock - a spinlock to protect readdir_index_array during
update.
OPERATIONS:
Operations
----------
The kernel module builds an "op" (struct orangefs_kernel_op_s) when it
needs to communicate with userspace. Part of the op contains the "upcall"
......@@ -308,13 +320,19 @@ in flight at any given time.
Ops are stateful:
* unknown - op was just initialized
* waiting - op is on request_list (upward bound)
* inprogr - op is in progress (waiting for downcall)
* serviced - op has matching downcall; ok
* purged - op has to start a timer since client-core
* unknown
- op was just initialized
* waiting
- op is on request_list (upward bound)
* inprogr
- op is in progress (waiting for downcall)
* serviced
- op has matching downcall; ok
* purged
- op has to start a timer since client-core
exited uncleanly before servicing op
* given up - submitter has given up waiting for it
* given up
- submitter has given up waiting for it
When some arbitrary userspace program needs to perform a
filesystem operation on Orangefs (readdir, I/O, create, whatever)
......@@ -389,10 +407,15 @@ union of structs, each of which is associated with a particular
response type.
The several members outside of the union are:
- int32_t type - type of operation.
- int32_t status - return code for the operation.
- int64_t trailer_size - 0 unless readdir operation.
- char *trailer_buf - initialized to NULL, used during readdir operations.
``int32_t type``
- type of operation.
``int32_t status``
- return code for the operation.
``int64_t trailer_size``
- 0 unless readdir operation.
``char *trailer_buf``
- initialized to NULL, used during readdir operations.
The appropriate member inside the union is filled out for any
particular response.
......@@ -449,18 +472,20 @@ Userspace uses writev() on /dev/pvfs2-req to pass responses to the requests
made by the kernel side.
A buffer_list containing:
- a pointer to the prepared response to the request from the
kernel (struct pvfs2_downcall_t).
- and also, in the case of a readdir request, a pointer to a
buffer containing descriptors for the objects in the target
directory.
... is sent to the function (PINT_dev_write_list) which performs
the writev.
PINT_dev_write_list has a local iovec array: struct iovec io_array[10];
The first four elements of io_array are initialized like this for all
responses:
responses::
io_array[0].iov_base = address of local variable "proto_ver" (int32_t)
io_array[0].iov_len = sizeof(int32_t)
......@@ -475,7 +500,7 @@ responses:
of global variable vfs_request (vfs_request_t)
io_array[3].iov_len = sizeof(pvfs2_downcall_t)
Readdir responses initialize the fifth element io_array like this:
Readdir responses initialize the fifth element io_array like this::
io_array[4].iov_base = contents of member trailer_buf (char *)
from out_downcall member of global variable
......@@ -517,13 +542,13 @@ from a dentry is cheap, obtaining it from userspace is relatively expensive,
hence the motivation to use the dentry when possible.
The timeout values d_time and getattr_time are jiffy based, and the
code is designed to avoid the jiffy-wrap problem:
code is designed to avoid the jiffy-wrap problem::
"In general, if the clock may have wrapped around more than once, there
is no way to tell how much time has elapsed. However, if the times t1
and t2 are known to be fairly close, we can reliably compute the
difference in a way that takes into account the possibility that the
clock may have wrapped between times."
"In general, if the clock may have wrapped around more than once, there
is no way to tell how much time has elapsed. However, if the times t1
and t2 are known to be fairly close, we can reliably compute the
difference in a way that takes into account the possibility that the
clock may have wrapped between times."
from course notes by instructor Andy Wang
from course notes by instructor Andy Wang
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment