Commit 61f00590 authored by Jonathan Corbet's avatar Jonathan Corbet

Merge branch 'nfs' into docs-next

Daniel W. S. Almeida writes:

  This series converts a few docs in Documentation/filesystems/nfs to RST.
  The docs were also moved into admin-guide because they contain
  information that might be useful for system administrators

  Most changes are related to aesthetics and presentation, i.e. the content
  itself remains mostly untouched. The use of markup was limited in order
  not to negatively impact the plain-text reading experience.
parents a1986433 6996e8ca
...@@ -76,6 +76,7 @@ configure specific aspects of kernel behavior to your liking. ...@@ -76,6 +76,7 @@ configure specific aspects of kernel behavior to your liking.
device-mapper/index device-mapper/index
efi-stub efi-stub
ext4 ext4
nfs/index
gpio/index gpio/index
highuid highuid
hw_random hw_random
......
===================
NFS Fault Injection
===================
Fault Injection
===============
Fault injection is a method for forcing errors that may not normally occur, or Fault injection is a method for forcing errors that may not normally occur, or
may be difficult to reproduce. Forcing these errors in a controlled environment may be difficult to reproduce. Forcing these errors in a controlled environment
can help the developer find and fix bugs before their code is shipped in a can help the developer find and fix bugs before their code is shipped in a
......
=============
NFS
=============
.. toctree::
:maxdepth: 1
nfs-client
nfsroot
nfs-rdma
nfsd-admin-interfaces
nfs-idmapper
pnfs-block-server
pnfs-scsi-server
fault_injection
==========
NFS Client
==========
The NFS client The NFS client
============== ==============
...@@ -59,10 +62,11 @@ The DNS resolver ...@@ -59,10 +62,11 @@ The DNS resolver
NFSv4 allows for one server to refer the NFS client to data that has been NFSv4 allows for one server to refer the NFS client to data that has been
migrated onto another server by means of the special "fs_locations" migrated onto another server by means of the special "fs_locations"
attribute. See attribute. See `RFC3530 Section 6: Filesystem Migration and Replication`_ and
http://tools.ietf.org/html/rfc3530#section-6 `Implementation Guide for Referrals in NFSv4`_.
and
http://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00 .. _RFC3530 Section 6\: Filesystem Migration and Replication: http://tools.ietf.org/html/rfc3530#section-6
.. _Implementation Guide for Referrals in NFSv4: http://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00
The fs_locations information can take the form of either an ip address and The fs_locations information can take the form of either an ip address and
a path, or a DNS hostname and a path. The latter requires the NFS client to a path, or a DNS hostname and a path. The latter requires the NFS client to
...@@ -78,8 +82,8 @@ Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual ...@@ -78,8 +82,8 @@ Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual
(2) If no valid entry exists, the helper script '/sbin/nfs_cache_getent' (2) If no valid entry exists, the helper script '/sbin/nfs_cache_getent'
(may be changed using the 'nfs.cache_getent' kernel boot parameter) (may be changed using the 'nfs.cache_getent' kernel boot parameter)
is run, with two arguments: is run, with two arguments:
- the cache name, "dns_resolve" - the cache name, "dns_resolve"
- the hostname to resolve - the hostname to resolve
(3) After looking up the corresponding ip address, the helper script (3) After looking up the corresponding ip address, the helper script
writes the result into the rpc_pipefs pseudo-file writes the result into the rpc_pipefs pseudo-file
...@@ -94,43 +98,44 @@ Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual ...@@ -94,43 +98,44 @@ Assuming that the user has the 'rpc_pipefs' filesystem mounted in the usual
script, and <ttl> is the 'time to live' of this cache entry (in script, and <ttl> is the 'time to live' of this cache entry (in
units of seconds). units of seconds).
Note: If <ip address> is invalid, say the string "0", then a negative .. note::
entry is created, which will cause the kernel to treat the hostname If <ip address> is invalid, say the string "0", then a negative
as having no valid DNS translation. entry is created, which will cause the kernel to treat the hostname
as having no valid DNS translation.
A basic sample /sbin/nfs_cache_getent A basic sample /sbin/nfs_cache_getent
===================================== =====================================
.. code-block:: sh
#!/bin/bash
# #!/bin/bash
ttl=600 #
# ttl=600
cut=/usr/bin/cut #
getent=/usr/bin/getent cut=/usr/bin/cut
rpc_pipefs=/var/lib/nfs/rpc_pipefs getent=/usr/bin/getent
# rpc_pipefs=/var/lib/nfs/rpc_pipefs
die() #
{ die()
echo "Usage: $0 cache_name entry_name" {
exit 1 echo "Usage: $0 cache_name entry_name"
} exit 1
}
[ $# -lt 2 ] && die
cachename="$1" [ $# -lt 2 ] && die
cache_path=${rpc_pipefs}/cache/${cachename}/channel cachename="$1"
cache_path=${rpc_pipefs}/cache/${cachename}/channel
case "${cachename}" in
dns_resolve) case "${cachename}" in
name="$2" dns_resolve)
result="$(${getent} hosts ${name} | ${cut} -f1 -d\ )" name="$2"
[ -z "${result}" ] && result="0" result="$(${getent} hosts ${name} | ${cut} -f1 -d\ )"
;; [ -z "${result}" ] && result="0"
*) ;;
die *)
;; die
esac ;;
echo "${result} ${name} ${ttl}" >${cache_path} esac
echo "${result} ${name} ${ttl}" >${cache_path}
=============
NFS ID Mapper
=============
=========
ID Mapper
=========
Id mapper is used by NFS to translate user and group ids into names, and to Id mapper is used by NFS to translate user and group ids into names, and to
translate user and group names into ids. Part of this translation involves translate user and group names into ids. Part of this translation involves
performing an upcall to userspace to request the information. There are two performing an upcall to userspace to request the information. There are two
...@@ -20,22 +20,24 @@ legacy rpc.idmap daemon for the id mapping. This result will be stored ...@@ -20,22 +20,24 @@ legacy rpc.idmap daemon for the id mapping. This result will be stored
in a custom NFS idmap cache. in a custom NFS idmap cache.
===========
Configuring Configuring
=========== ===========
The file /etc/request-key.conf will need to be modified so /sbin/request-key can The file /etc/request-key.conf will need to be modified so /sbin/request-key can
direct the upcall. The following line should be added: direct the upcall. The following line should be added:
#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ... ``#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ...``
#====== ======= =============== =============== =============================== ``#====== ======= =============== =============== ===============================``
create id_resolver * * /usr/sbin/nfs.idmap %k %d 600 ``create id_resolver * * /usr/sbin/nfs.idmap %k %d 600``
This will direct all id_resolver requests to the program /usr/sbin/nfs.idmap. This will direct all id_resolver requests to the program /usr/sbin/nfs.idmap.
The last parameter, 600, defines how many seconds into the future the key will The last parameter, 600, defines how many seconds into the future the key will
expire. This parameter is optional for /usr/sbin/nfs.idmap. When the timeout expire. This parameter is optional for /usr/sbin/nfs.idmap. When the timeout
is not specified, nfs.idmap will default to 600 seconds. is not specified, nfs.idmap will default to 600 seconds.
id mapper uses for key descriptions: id mapper uses for key descriptions::
uid: Find the UID for the given user uid: Find the UID for the given user
gid: Find the GID for the given group gid: Find the GID for the given group
user: Find the user name for the given UID user: Find the user name for the given UID
...@@ -45,23 +47,24 @@ You can handle any of these individually, rather than using the generic upcall ...@@ -45,23 +47,24 @@ You can handle any of these individually, rather than using the generic upcall
program. If you would like to use your own program for a uid lookup then you program. If you would like to use your own program for a uid lookup then you
would edit your request-key.conf so it look similar to this: would edit your request-key.conf so it look similar to this:
#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ... ``#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ...``
#====== ======= =============== =============== =============================== ``#====== ======= =============== =============== ===============================``
create id_resolver uid:* * /some/other/program %k %d 600 ``create id_resolver uid:* * /some/other/program %k %d 600``
create id_resolver * * /usr/sbin/nfs.idmap %k %d 600 ``create id_resolver * * /usr/sbin/nfs.idmap %k %d 600``
Notice that the new line was added above the line for the generic program. Notice that the new line was added above the line for the generic program.
request-key will find the first matching line and corresponding program. In request-key will find the first matching line and corresponding program. In
this case, /some/other/program will handle all uid lookups and this case, /some/other/program will handle all uid lookups and
/usr/sbin/nfs.idmap will handle gid, user, and group lookups. /usr/sbin/nfs.idmap will handle gid, user, and group lookups.
See <file:Documentation/security/keys/request-key.rst> for more information See Documentation/security/keys/request-key.rst for more information
about the request-key function. about the request-key function.
=========
nfs.idmap nfs.idmap
========= =========
nfs.idmap is designed to be called by request-key, and should not be run "by nfs.idmap is designed to be called by request-key, and should not be run "by
hand". This program takes two arguments, a serialized key and a key hand". This program takes two arguments, a serialized key and a key
description. The serialized key is first converted into a key_serial_t, and description. The serialized key is first converted into a key_serial_t, and
......
===================
Setting up NFS/RDMA
===================
:Author:
NetApp and Open Grid Computing (May 29, 2008)
.. warning::
This document is probably obsolete.
Overview
========
This document describes how to install and setup the Linux NFS/RDMA client
and server software.
The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server
was first included in the following release, Linux 2.6.25.
In our testing, we have obtained excellent performance results (full 10Gbit
wire bandwidth at minimal client CPU) under many workloads. The code passes
the full Connectathon test suite and operates over both Infiniband and iWARP
RDMA adapters.
Getting Help
============
If you get stuck, you can ask questions on the
nfs-rdma-devel@lists.sourceforge.net mailing list.
Installation
============
These instructions are a step by step guide to building a machine for
use with NFS/RDMA.
- Install an RDMA device
Any device supported by the drivers in drivers/infiniband/hw is acceptable.
Testing has been performed using several Mellanox-based IB cards, the
Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter.
- Install a Linux distribution and tools
The first kernel release to contain both the NFS/RDMA client and server was
Linux 2.6.25 Therefore, a distribution compatible with this and subsequent
Linux kernel release should be installed.
The procedures described in this document have been tested with
distributions from Red Hat's Fedora Project (http://fedora.redhat.com/).
- Install nfs-utils-1.1.2 or greater on the client
An NFS/RDMA mount point can be obtained by using the mount.nfs command in
nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils
version with support for NFS/RDMA mounts, but for various reasons we
recommend using nfs-utils-1.1.2 or greater). To see which version of
mount.nfs you are using, type:
.. code-block:: sh
$ /sbin/mount.nfs -V
If the version is less than 1.1.2 or the command does not exist,
you should install the latest version of nfs-utils.
Download the latest package from: http://www.kernel.org/pub/linux/utils/nfs
Uncompress the package and follow the installation instructions.
If you will not need the idmapper and gssd executables (you do not need
these to create an NFS/RDMA enabled mount command), the installation
process can be simplified by disabling these features when running
configure:
.. code-block:: sh
$ ./configure --disable-gss --disable-nfsv4
To build nfs-utils you will need the tcp_wrappers package installed. For
more information on this see the package's README and INSTALL files.
After building the nfs-utils package, there will be a mount.nfs binary in
the utils/mount directory. This binary can be used to initiate NFS v2, v3,
or v4 mounts. To initiate a v4 mount, the binary must be called
mount.nfs4. The standard technique is to create a symlink called
mount.nfs4 to mount.nfs.
This mount.nfs binary should be installed at /sbin/mount.nfs as follows:
.. code-block:: sh
$ sudo cp utils/mount/mount.nfs /sbin/mount.nfs
In this location, mount.nfs will be invoked automatically for NFS mounts
by the system mount command.
.. note::
mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed
on the NFS client machine. You do not need this specific version of
nfs-utils on the server. Furthermore, only the mount.nfs command from
nfs-utils-1.1.2 is needed on the client.
- Install a Linux kernel with NFS/RDMA
The NFS/RDMA client and server are both included in the mainline Linux
kernel version 2.6.25 and later. This and other versions of the Linux
kernel can be found at: https://www.kernel.org/pub/linux/kernel/
Download the sources and place them in an appropriate location.
- Configure the RDMA stack
Make sure your kernel configuration has RDMA support enabled. Under
Device Drivers -> InfiniBand support, update the kernel configuration
to enable InfiniBand support [NOTE: the option name is misleading. Enabling
InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)].
Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or
iWARP adapter support (amso, cxgb3, etc.).
If you are using InfiniBand, be sure to enable IP-over-InfiniBand support.
- Configure the NFS client and server
Your kernel configuration must also have NFS file system support and/or
NFS server support enabled. These and other NFS related configuration
options can be found under File Systems -> Network File Systems.
- Build, install, reboot
The NFS/RDMA code will be enabled automatically if NFS and RDMA
are turned on. The NFS/RDMA client and server are configured via the hidden
SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The
value of SUNRPC_XPRT_RDMA will be:
#. N if either SUNRPC or INFINIBAND are N, in this case the NFS/RDMA client
and server will not be built
#. M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M,
in this case the NFS/RDMA client and server will be built as modules
#. Y if both SUNRPC and INFINIBAND are Y, in this case the NFS/RDMA client
and server will be built into the kernel
Therefore, if you have followed the steps above and turned no NFS and RDMA,
the NFS/RDMA client and server will be built.
Build a new kernel, install it, boot it.
Check RDMA and NFS Setup
========================
Before configuring the NFS/RDMA software, it is a good idea to test
your new kernel to ensure that the kernel is working correctly.
In particular, it is a good idea to verify that the RDMA stack
is functioning as expected and standard NFS over TCP/IP and/or UDP/IP
is working properly.
- Check RDMA Setup
If you built the RDMA components as modules, load them at
this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel
card:
.. code-block:: sh
$ modprobe ib_mthca
$ modprobe ib_ipoib
If you are using InfiniBand, make sure there is a Subnet Manager (SM)
running on the network. If your IB switch has an embedded SM, you can
use it. Otherwise, you will need to run an SM, such as OpenSM, on one
of your end nodes.
If an SM is running on your network, you should see the following:
.. code-block:: sh
$ cat /sys/class/infiniband/driverX/ports/1/state
4: ACTIVE
where driverX is mthca0, ipath5, ehca3, etc.
To further test the InfiniBand software stack, use IPoIB (this
assumes you have two IB hosts named host1 and host2):
.. code-block:: sh
host1$ ip link set dev ib0 up
host1$ ip address add dev ib0 a.b.c.x
host2$ ip link set dev ib0 up
host2$ ip address add dev ib0 a.b.c.y
host1$ ping a.b.c.y
host2$ ping a.b.c.x
For other device types, follow the appropriate procedures.
- Check NFS Setup
For the NFS components enabled above (client and/or server),
test their functionality over standard Ethernet using TCP/IP or UDP/IP.
NFS/RDMA Setup
==============
We recommend that you use two machines, one to act as the client and
one to act as the server.
One time configuration:
-----------------------
- On the server system, configure the /etc/exports file and start the NFS/RDMA server.
Exports entries with the following formats have been tested::
/vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
/vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)
The IP address(es) is(are) the client's IPoIB address for an InfiniBand
HCA or the client's iWARP address(es) for an RNIC.
.. note::
The "insecure" option must be used because the NFS/RDMA client does
not use a reserved port.
Each time a machine boots:
--------------------------
- Load and configure the RDMA drivers
For InfiniBand using a Mellanox adapter:
.. code-block:: sh
$ modprobe ib_mthca
$ modprobe ib_ipoib
$ ip li set dev ib0 up
$ ip addr add dev ib0 a.b.c.d
.. note::
Please use unique addresses for the client and server!
- Start the NFS server
If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
kernel config), load the RDMA transport module:
.. code-block:: sh
$ modprobe svcrdma
Regardless of how the server was built (module or built-in), start the
server:
.. code-block:: sh
$ /etc/init.d/nfs start
or
.. code-block:: sh
$ service nfs start
Instruct the server to listen on the RDMA transport:
.. code-block:: sh
$ echo rdma 20049 > /proc/fs/nfsd/portlist
- On the client system
If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
kernel config), load the RDMA client module:
.. code-block:: sh
$ modprobe xprtrdma.ko
Regardless of how the client was built (module or built-in), use this
command to mount the NFS/RDMA server:
.. code-block:: sh
$ mount -o rdma,port=20049 <IPoIB-server-name-or-address>:/<export> /mnt
To verify that the mount is using RDMA, run "cat /proc/mounts" and check
the "proto" field for the given mount.
Congratulations! You're using NFS/RDMA!
==================================
Administrative interfaces for nfsd Administrative interfaces for nfsd
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ==================================
Note that normally these interfaces are used only by the utilities in Note that normally these interfaces are used only by the utilities in
nfs-utils. nfs-utils.
...@@ -13,18 +14,16 @@ nfsd/threads. ...@@ -13,18 +14,16 @@ nfsd/threads.
Before doing that, NFSD can be told which sockets to listen on by Before doing that, NFSD can be told which sockets to listen on by
writing to nfsd/portlist; that write may be: writing to nfsd/portlist; that write may be:
- an ascii-encoded file descriptor, which should refer to a - an ascii-encoded file descriptor, which should refer to a
bound (and listening, for tcp) socket, or bound (and listening, for tcp) socket, or
- "transportname port", where transportname is currently either - "transportname port", where transportname is currently either
"udp", "tcp", or "rdma". "udp", "tcp", or "rdma".
If nfsd is started without doing any of these, then it will create one If nfsd is started without doing any of these, then it will create one
udp and one tcp listener at port 2049 (see nfsd_init_socks). udp and one tcp listener at port 2049 (see nfsd_init_socks).
On startup, nfsd and lockd grace periods start. On startup, nfsd and lockd grace periods start. nfsd is shut down by a write of
0 to nfsd/threads. All locks and state are thrown away at that point.
nfsd is shut down by a write of 0 to nfsd/threads. All locks and state
are thrown away at that point.
Between startup and shutdown, the number of threads may be adjusted up Between startup and shutdown, the number of threads may be adjusted up
or down by additional writes to nfsd/threads or by writes to or down by additional writes to nfsd/threads or by writes to
...@@ -34,7 +33,7 @@ For more detail about files under nfsd/ and what they control, see ...@@ -34,7 +33,7 @@ For more detail about files under nfsd/ and what they control, see
fs/nfsd/nfsctl.c; most of them have detailed comments. fs/nfsd/nfsctl.c; most of them have detailed comments.
Implementation notes Implementation notes
^^^^^^^^^^^^^^^^^^^^ ====================
Note that the rpc server requires the caller to serialize addition and Note that the rpc server requires the caller to serialize addition and
removal of listening sockets, and startup and shutdown of the server. removal of listening sockets, and startup and shutdown of the server.
......
===================================
pNFS block layout server user guide pNFS block layout server user guide
===================================
The Linux NFS server now supports the pNFS block layout extension. In this The Linux NFS server now supports the pNFS block layout extension. In this
case the NFS server acts as Metadata Server (MDS) for pNFS, which in addition case the NFS server acts as Metadata Server (MDS) for pNFS, which in addition
...@@ -22,16 +24,19 @@ If the nfsd server needs to fence a non-responding client it calls ...@@ -22,16 +24,19 @@ If the nfsd server needs to fence a non-responding client it calls
/sbin/nfsd-recall-failed with the first argument set to the IP address of /sbin/nfsd-recall-failed with the first argument set to the IP address of
the client, and the second argument set to the device node without the /dev the client, and the second argument set to the device node without the /dev
prefix for the file system to be fenced. Below is an example file that shows prefix for the file system to be fenced. Below is an example file that shows
how to translate the device into a serial number from SCSI EVPD 0x80: how to translate the device into a serial number from SCSI EVPD 0x80::
cat > /sbin/nfsd-recall-failed << EOF cat > /sbin/nfsd-recall-failed << EOF
#!/bin/sh
CLIENT="$1" .. code-block:: sh
DEV="/dev/$2"
EVPD=`sg_inq --page=0x80 ${DEV} | \
grep "Unit serial number:" | \
awk -F ': ' '{print $2}'`
echo "fencing client ${CLIENT} serial ${EVPD}" >> /var/log/pnfsd-fence.log #!/bin/sh
EOF
CLIENT="$1"
DEV="/dev/$2"
EVPD=`sg_inq --page=0x80 ${DEV} | \
grep "Unit serial number:" | \
awk -F ': ' '{print $2}'`
echo "fencing client ${CLIENT} serial ${EVPD}" >> /var/log/pnfsd-fence.log
EOF
==================================
pNFS SCSI layout server user guide pNFS SCSI layout server user guide
================================== ==================================
......
################################################################################
# #
# NFS/RDMA README #
# #
################################################################################
Author: NetApp and Open Grid Computing
Date: May 29, 2008
Table of Contents
~~~~~~~~~~~~~~~~~
- Overview
- Getting Help
- Installation
- Check RDMA and NFS Setup
- NFS/RDMA Setup
Overview
~~~~~~~~
This document describes how to install and setup the Linux NFS/RDMA client
and server software.
The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server
was first included in the following release, Linux 2.6.25.
In our testing, we have obtained excellent performance results (full 10Gbit
wire bandwidth at minimal client CPU) under many workloads. The code passes
the full Connectathon test suite and operates over both Infiniband and iWARP
RDMA adapters.
Getting Help
~~~~~~~~~~~~
If you get stuck, you can ask questions on the
nfs-rdma-devel@lists.sourceforge.net
mailing list.
Installation
~~~~~~~~~~~~
These instructions are a step by step guide to building a machine for
use with NFS/RDMA.
- Install an RDMA device
Any device supported by the drivers in drivers/infiniband/hw is acceptable.
Testing has been performed using several Mellanox-based IB cards, the
Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter.
- Install a Linux distribution and tools
The first kernel release to contain both the NFS/RDMA client and server was
Linux 2.6.25 Therefore, a distribution compatible with this and subsequent
Linux kernel release should be installed.
The procedures described in this document have been tested with
distributions from Red Hat's Fedora Project (http://fedora.redhat.com/).
- Install nfs-utils-1.1.2 or greater on the client
An NFS/RDMA mount point can be obtained by using the mount.nfs command in
nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils
version with support for NFS/RDMA mounts, but for various reasons we
recommend using nfs-utils-1.1.2 or greater). To see which version of
mount.nfs you are using, type:
$ /sbin/mount.nfs -V
If the version is less than 1.1.2 or the command does not exist,
you should install the latest version of nfs-utils.
Download the latest package from:
http://www.kernel.org/pub/linux/utils/nfs
Uncompress the package and follow the installation instructions.
If you will not need the idmapper and gssd executables (you do not need
these to create an NFS/RDMA enabled mount command), the installation
process can be simplified by disabling these features when running
configure:
$ ./configure --disable-gss --disable-nfsv4
To build nfs-utils you will need the tcp_wrappers package installed. For
more information on this see the package's README and INSTALL files.
After building the nfs-utils package, there will be a mount.nfs binary in
the utils/mount directory. This binary can be used to initiate NFS v2, v3,
or v4 mounts. To initiate a v4 mount, the binary must be called
mount.nfs4. The standard technique is to create a symlink called
mount.nfs4 to mount.nfs.
This mount.nfs binary should be installed at /sbin/mount.nfs as follows:
$ sudo cp utils/mount/mount.nfs /sbin/mount.nfs
In this location, mount.nfs will be invoked automatically for NFS mounts
by the system mount command.
NOTE: mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed
on the NFS client machine. You do not need this specific version of
nfs-utils on the server. Furthermore, only the mount.nfs command from
nfs-utils-1.1.2 is needed on the client.
- Install a Linux kernel with NFS/RDMA
The NFS/RDMA client and server are both included in the mainline Linux
kernel version 2.6.25 and later. This and other versions of the Linux
kernel can be found at:
https://www.kernel.org/pub/linux/kernel/
Download the sources and place them in an appropriate location.
- Configure the RDMA stack
Make sure your kernel configuration has RDMA support enabled. Under
Device Drivers -> InfiniBand support, update the kernel configuration
to enable InfiniBand support [NOTE: the option name is misleading. Enabling
InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)].
Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or
iWARP adapter support (amso, cxgb3, etc.).
If you are using InfiniBand, be sure to enable IP-over-InfiniBand support.
- Configure the NFS client and server
Your kernel configuration must also have NFS file system support and/or
NFS server support enabled. These and other NFS related configuration
options can be found under File Systems -> Network File Systems.
- Build, install, reboot
The NFS/RDMA code will be enabled automatically if NFS and RDMA
are turned on. The NFS/RDMA client and server are configured via the hidden
SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The
value of SUNRPC_XPRT_RDMA will be:
- N if either SUNRPC or INFINIBAND are N, in this case the NFS/RDMA client
and server will not be built
- M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M,
in this case the NFS/RDMA client and server will be built as modules
- Y if both SUNRPC and INFINIBAND are Y, in this case the NFS/RDMA client
and server will be built into the kernel
Therefore, if you have followed the steps above and turned no NFS and RDMA,
the NFS/RDMA client and server will be built.
Build a new kernel, install it, boot it.
Check RDMA and NFS Setup
~~~~~~~~~~~~~~~~~~~~~~~~
Before configuring the NFS/RDMA software, it is a good idea to test
your new kernel to ensure that the kernel is working correctly.
In particular, it is a good idea to verify that the RDMA stack
is functioning as expected and standard NFS over TCP/IP and/or UDP/IP
is working properly.
- Check RDMA Setup
If you built the RDMA components as modules, load them at
this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel
card:
$ modprobe ib_mthca
$ modprobe ib_ipoib
If you are using InfiniBand, make sure there is a Subnet Manager (SM)
running on the network. If your IB switch has an embedded SM, you can
use it. Otherwise, you will need to run an SM, such as OpenSM, on one
of your end nodes.
If an SM is running on your network, you should see the following:
$ cat /sys/class/infiniband/driverX/ports/1/state
4: ACTIVE
where driverX is mthca0, ipath5, ehca3, etc.
To further test the InfiniBand software stack, use IPoIB (this
assumes you have two IB hosts named host1 and host2):
host1$ ip link set dev ib0 up
host1$ ip address add dev ib0 a.b.c.x
host2$ ip link set dev ib0 up
host2$ ip address add dev ib0 a.b.c.y
host1$ ping a.b.c.y
host2$ ping a.b.c.x
For other device types, follow the appropriate procedures.
- Check NFS Setup
For the NFS components enabled above (client and/or server),
test their functionality over standard Ethernet using TCP/IP or UDP/IP.
NFS/RDMA Setup
~~~~~~~~~~~~~~
We recommend that you use two machines, one to act as the client and
one to act as the server.
One time configuration:
- On the server system, configure the /etc/exports file and
start the NFS/RDMA server.
Exports entries with the following formats have been tested:
/vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
/vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)
The IP address(es) is(are) the client's IPoIB address for an InfiniBand
HCA or the client's iWARP address(es) for an RNIC.
NOTE: The "insecure" option must be used because the NFS/RDMA client does
not use a reserved port.
Each time a machine boots:
- Load and configure the RDMA drivers
For InfiniBand using a Mellanox adapter:
$ modprobe ib_mthca
$ modprobe ib_ipoib
$ ip li set dev ib0 up
$ ip addr add dev ib0 a.b.c.d
NOTE: use unique addresses for the client and server
- Start the NFS server
If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
kernel config), load the RDMA transport module:
$ modprobe svcrdma
Regardless of how the server was built (module or built-in), start the
server:
$ /etc/init.d/nfs start
or
$ service nfs start
Instruct the server to listen on the RDMA transport:
$ echo rdma 20049 > /proc/fs/nfsd/portlist
- On the client system
If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
kernel config), load the RDMA client module:
$ modprobe xprtrdma.ko
Regardless of how the client was built (module or built-in), use this
command to mount the NFS/RDMA server:
$ mount -o rdma,port=20049 <IPoIB-server-name-or-address>:/<export> /mnt
To verify that the mount is using RDMA, run "cat /proc/mounts" and check
the "proto" field for the given mount.
Congratulations! You're using NFS/RDMA!
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment