Commits · 93b7f7ad2018d2037559b1d0892417864c78b371 · Kirill Smelkov / linux

12 Jun, 2018 1 commit

skip LAYOUTRETURN if layout is invalid · 93b7f7ad

Olga Kornievskaia authored Jun 11, 2018

Currently, when I/O to the DS fails, the client returns the layout and
retries against the MDS. However, on unmount (inode eviction) it then
returns the layout again.

This is because pnfs_return_layout() was changed in
commit d78471d3 ("pnfs/blocklayout: set PNFS_LAYOUTRETURN_ON_ERROR")
to always set NFS_LAYOUT_RETURN_REQUESTED so even if we returned
the layout, it will be returned again. Instead, let's also check
if we have already marked the layout invalid.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
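
The check described in this commit can be sketched as follows. This is a minimal illustration under stated assumptions, not the kernel code: the flag names `LAYOUT_RETURN_REQUESTED` and `LAYOUT_INVALID` and the helper `should_send_layoutreturn()` are hypothetical stand-ins for the real layout state.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the kernel keeps layout state in its own flags word. */
#define LAYOUT_RETURN_REQUESTED	(1UL << 0)
#define LAYOUT_INVALID		(1UL << 1)

/*
 * Send LAYOUTRETURN only if a return was requested and the layout has not
 * already been marked invalid (assumed here to mean it was returned earlier).
 */
static bool should_send_layoutreturn(unsigned long flags)
{
	if (flags & LAYOUT_INVALID)
		return false;
	return (flags & LAYOUT_RETURN_REQUESTED) != 0;
}
```

With both bits set, as happens at inode eviction after a failed DS I/O, the helper reports that no second LAYOUTRETURN is needed.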


09 Jun, 2018 3 commits

NFSv4.1: Fix the client behaviour on NFS4ERR_SEQ_FALSE_RETRY · f9312a54

Trond Myklebust authored Jun 09, 2018

If the server returns NFS4ERR_SEQ_FALSE_RETRY or NFS4ERR_RETRY_UNCACHED_REP,
then it thinks we're trying to replay an existing request. If so, then
let's just bump the sequence ID and retry the operation.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
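
A rough sketch of the "bump the sequence ID and retry" behaviour follows. The struct `slot_sketch` and the function `handle_sequence_status()` are hypothetical simplifications; only the two error codes (numbered as in RFC 5661) come from the protocol.

```c
#include <stdbool.h>
#include <stdint.h>

/* Error codes as numbered in RFC 5661. */
enum {
	NFS4ERR_RETRY_UNCACHED_REP = 10068,
	NFS4ERR_SEQ_FALSE_RETRY    = 10076,
};

/* Hypothetical, simplified stand-in for a session slot. */
struct slot_sketch {
	uint32_t seq_nr;
};

/* Returns true if the caller should resend the operation on this slot. */
static bool handle_sequence_status(struct slot_sketch *slot, int status)
{
	switch (status) {
	case NFS4ERR_SEQ_FALSE_RETRY:
	case NFS4ERR_RETRY_UNCACHED_REP:
		slot->seq_nr++;		/* present the retry as a new request */
		return true;
	default:
		return false;
	}
}
```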


NFSv4: Fix a typo in nfs41_sequence_process · 99589100

Trond Myklebust authored Jun 09, 2018

We want to compare the slot_id to the highest slot number advertised by the
server.

Fixes: 3be0f80b ("NFSv4.1: Fix up replays of interrupted requests")
Cc: stable@vger.kernel.org # 4.15+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFSv4: Revert commit ("NFSv4.x: Fix wraparound issues..") · fc40724f

Trond Myklebust authored Jun 09, 2018

The correct behaviour for NFSv4 sequence IDs is to wrap around
to the value 0 after 0xffffffff.
See https://tools.ietf.org/html/rfc5661#section-2.10.6.1

Fixes: 5f83d86c ("NFSv4.x: Fix wraparound issues when validing...")
Cc: stable@vger.kernel.org # 4.6+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
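
The wraparound rule is what plain 32-bit unsigned arithmetic already gives. A small sketch, with hypothetical helper names `next_seqid()` and `is_next_seqid()`:

```c
#include <stdint.h>

/* Hypothetical helper: the sequence ID that follows `seqid`. */
static uint32_t next_seqid(uint32_t seqid)
{
	/* The conversion back to uint32_t makes 0xffffffff + 1 wrap to 0. */
	return (uint32_t)(seqid + 1u);
}

/* True if `received` is the direct successor of `last`, wraparound included. */
static int is_next_seqid(uint32_t last, uint32_t received)
{
	return received == next_seqid(last);
}
```

Under this rule the value 0 is a valid successor of 0xffffffff, so no special case that skips 0 is needed.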


08 Jun, 2018 2 commits

NFSv4: Return NFS4ERR_DELAY when a layout recall fails due to igrab() · ce5624f7

Trond Myklebust authored Jun 07, 2018

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFSv4: Return NFS4ERR_DELAY when a delegation recall fails due to igrab() · 6c342655

Trond Myklebust authored Jun 07, 2018

If the attempt to recall the delegation fails because the inode is
in the process of being evicted from cache, then use NFS4ERR_DELAY
to ask the server to retry later.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
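
The pattern in this commit can be sketched as below. Everything here is a simplified model: `inode_sketch`, `grab_inode()` (standing in for igrab()) and `handle_recall()` are hypothetical names, and only the NFS4ERR_DELAY value is taken from the protocol.

```c
#include <stddef.h>

enum {
	NFS4_OK       = 0,
	NFS4ERR_DELAY = 10008,
};

/* Hypothetical stand-in for an inode; `evicting` models eviction in progress. */
struct inode_sketch {
	int evicting;
};

/* Models igrab(): returns NULL when a reference can no longer be taken. */
static struct inode_sketch *grab_inode(struct inode_sketch *inode)
{
	return inode->evicting ? NULL : inode;
}

/* Callback-side handling of a recall for `inode`. */
static int handle_recall(struct inode_sketch *inode)
{
	if (grab_inode(inode) == NULL)
		return NFS4ERR_DELAY;	/* ask the server to retry later */
	/* ... the actual recall work would happen here ... */
	return NFS4_OK;
}
```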


06 Jun, 2018 2 commits

NFSv4.0: Remove transport protocol name from non-UCS client ID · 025bb9f8

Chuck Lever authored Jun 04, 2018

Commit 69dd716c ("NFSv4: Add socket proto argument to
setclientid") (2007) added the transport protocol name to the client
ID string, but the patch description doesn't explain why this was
necessary.

At that time, the only transport protocol name that would have been
used is "tcp" (for both IPv4 and IPv6), resulting in no additional
distinctiveness of the client ID string.

Since there is one client instance, the server should recognize its
state whether the client is connecting via TCP or RDMA. Same client,
same lease.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFSv4.0: Remove cl_ipaddr from non-UCS client ID · 848a4eb2

Chuck Lever authored Jun 04, 2018

It is possible for two distinct clients to have the same cl_ipaddr:

 - if the client admin disables callback with clientaddr=0.0.0.0 on
   more than one client

 - if two clients behind separate NATs use the same private subnet
   number

 - if the client admin specifies the same address via clientaddr=
   mount option (pointing the server at the same NAT box, for
   example)

Because of the way the Linux NFSv4.0 client constructs its client
ID string by default, such clients could interfere with each other's
lease state when mounting the same server:

	scnprintf(str, len, "Linux NFSv4.0 %s/%s %s",
		clp->cl_ipaddr,
		rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR),
		rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_PROTO));

cl_ipaddr is set to the value of the clientaddr= mount option. Two
clients whose addresses are 192.168.3.77 that mount the same server
(whose public IP address is, say, 3.4.5.6) would both generate the
same client ID string when sending a SETCLIENTID:

  Linux NFSv4.0 192.168.3.77/3.4.5.6 tcp

and thus the server would not be able to distinguish the clients'
leases. If both clients are using AUTH_SYS when sending SETCLIENTID
then the server could possibly permit the two clients to interfere
with or purge each other's leases.

To better ensure that Linux's NFSv4.0 client ID strings are distinct
in these cases, remove cl_ipaddr from the client ID string and
replace it with something more likely to be unique. Note that the
replacement looks a lot like the uniform client ID string.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
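
The collision described in this commit can be reproduced in user space with the same format string. This is only a sketch: `old_client_id()` and `ids_collide()` are hypothetical helpers, snprintf() stands in for the kernel's scnprintf(), and the addresses are the example values from the commit message.

```c
#include <stdio.h>
#include <string.h>

/* Formats a client ID with the format string quoted in the commit message. */
static int old_client_id(char *str, size_t len, const char *cl_ipaddr,
			 const char *server_addr, const char *proto)
{
	return snprintf(str, len, "Linux NFSv4.0 %s/%s %s",
			cl_ipaddr, server_addr, proto);
}

/* Returns 1 if two clients with these addresses get the same ID string. */
static int ids_collide(const char *addr_a, const char *addr_b,
		       const char *server_addr, const char *proto)
{
	char a[128], b[128];

	old_client_id(a, sizeof(a), addr_a, server_addr, proto);
	old_client_id(b, sizeof(b), addr_b, server_addr, proto);
	return strcmp(a, b) == 0;
}
```

Two clients that both report 192.168.3.77 and mount 3.4.5.6 over TCP make `ids_collide()` return 1, because nothing else in the string distinguishes them.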


05 Jun, 2018 1 commit

NFSv4: Fix a compiler warning when CONFIG_NFS_V4_1 is undefined · 977294c7

Trond Myklebust authored Jun 05, 2018

Fix a compiler warning:
fs/nfs/nfs4proc.c:910:13: warning: 'nfs4_layoutget_release' defined but not used [-Wunused-function]
 static void nfs4_layoutget_release(void *calldata)
             ^~~~~~~~~~~~~~~~~~~~~~
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


04 Jun, 2018 12 commits

Merge tag 'nfs-rdma-for-4.18-1' of git://git.linux-nfs.org/projects/anna/linux-nfs · fcda3d5d

Trond Myklebust authored Jun 04, 2018

NFS-over-RDMA client updates for Linux 4.18

Stable patches:
- xprtrdma: Return -ENOBUFS when no pages are available

New features:
- Add ->alloc_slot() and ->free_slot() functions

Bugfixes and cleanups:
- Add missing SPDX tags to some files
- Try to fail mount quickly if client has no RDMA devices
- Create transport IDs in the correct network namespace
- Fix max_send_wr computation
- Clean up receive tracepoints
- Refactor receive handling
- Remove unused functions


NFS: Filter cache invalidation when holding a delegation · 3f0b3cf4

Trond Myklebust authored Jun 03, 2018

If the client holds a delegation, then ensure we filter out attempts
to invalidate the size, owner, group owner, or mode unless we made the
change, in which case, check that NFS_INO_REVAL_FORCED is set by the
caller.
Always filter out attempts to invalidate the change attribute and
size, since we are authoritative for those.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFS: Ignore NFS_INO_REVAL_FORCED in nfs_check_inode_attributes() · 4ebe83af

Trond Myklebust authored Jun 03, 2018

If we hold a delegation, we should not need to call
nfs_check_inode_attributes() since we already know which attributes
are valid, and which ones may still need revalidation. The state
of the NFS_INO_REVAL_FORCED flag is therefore irrelevant.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFS: Improve caching while holding a delegation · c80d17c5

Trond Myklebust authored Jun 03, 2018

Make sure that the client completely ignores change attribute and size
changes on the server when it holds a delegation.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFS: Fix attribute revalidation · 0b467264

Trond Myklebust authored Jun 03, 2018

Don't mark attributes as invalid just because they have changed. Instead,
for the purposes of adjusting the attribute cache timeout, keep a
separate variable that tracks whether or not a change occurred.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFS: fix up nfs_setattr_update_inode · 6a97d02d

Trond Myklebust authored Apr 08, 2018

Always try to set the attributes, even if we don't have a valid struct
nfs_fattr.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFSv4: Ensure the inode is clean when we set a delegation · 97c2c17a

Trond Myklebust authored Apr 07, 2018

If there are attributes that are still invalid when we set a delegation,
then we need to set the NFS_INO_REVAL_FORCED flag.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFSv4: Ignore NFS_INO_REVAL_FORCED in nfs4_proc_access · 7c672654

Trond Myklebust authored Jun 04, 2018

If we hold a delegation, we don't need to care about whether or not
the inode attributes are up to date. We know we can cache the results
of this call regardless.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFSv4: Don't ask for delegated attributes when adding a hard link · 2f28dc38

Trond Myklebust authored Apr 08, 2018

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFSv4: Don't ask for delegated attributes when revalidating the inode · 771734f2

Trond Myklebust authored Apr 07, 2018

Again, when revalidating the inode, we don't need to ask for attributes
for which we are authoritative.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFS: Pass the inode down to the getattr() callback · a841b54d

Trond Myklebust authored Apr 07, 2018

Allow the getattr() callback to check things like whether or not we hold
a delegation so that it can adjust the attributes that it is asking for.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFSv4: Don't request size+change attribute if they are delegated to us · 30846df0

Trond Myklebust authored Apr 07, 2018

When we hold a delegation, we should not need to request attributes such
as the file size or the change attribute. For some servers, avoiding
asking for these unneeded attributes can improve the overall system
performance.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
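
One way to picture the change is as trimming a request bitmask. The sketch below is an assumption-laden illustration: `ATTR_CHANGE`, `ATTR_SIZE` and `getattr_request_mask()` are hypothetical names, with the bit positions chosen to match the NFSv4 attribute numbers for change (3) and size (4).

```c
#include <stdint.h>

/* Attribute bits; change is attribute 3 and size is attribute 4 in NFSv4. */
#define ATTR_CHANGE	(1u << 3)
#define ATTR_SIZE	(1u << 4)

/* Hypothetical helper: trims the GETATTR request mask under a delegation. */
static uint32_t getattr_request_mask(uint32_t wanted, int have_delegation)
{
	/* The client is authoritative for these while it holds a delegation. */
	if (have_delegation)
		wanted &= ~(ATTR_CHANGE | ATTR_SIZE);
	return wanted;
}
```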


01 Jun, 2018 5 commits

xprtrdma: Remove transfertypes array · 11d0ac16

Chuck Lever authored May 04, 2018

Clean up: This array was used in a dprintk that was replaced by a
trace point in commit ab03eff5 ("xprtrdma: Add trace points in
RPC Call transmit paths").
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


xprtrdma: Add trace_xprtrdma_dma_map(mr) · 8335640c

Chuck Lever authored May 04, 2018

Matches trace_xprtrdma_dma_unmap(mr).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


xprtrdma: Wait on empty sendctx queue · 2fad6592

Chuck Lever authored May 04, 2018

Currently, when the sendctx queue is exhausted during marshaling, the
RPC/RDMA transport places the RPC task on the delayq, which forces a
wait for HZ >> 2 before the marshal and send is retried.

With this change, the transport now places such an RPC task on the
pending queue, and wakes it just as soon as more sendctxs become
available. This typically takes less than a millisecond, and the
write_space waking mechanism is less deadlock-prone.

Moreover, the waiting RPC task is holding the transport's write
lock, which blocks the transport from sending RPCs. Therefore faster
recovery from sendctx queue exhaustion is desirable.

Cf. commit 5804891455d5 ("xprtrdma: ->send_request returns -EAGAIN
when there are no free MRs").
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


xprtrdma: Move common wait_for_buffer_space call to parent function · ed3aa742

Chuck Lever authored May 04, 2018

Clean up: The logic to wait for write space is common to a bunch of
the encoding helper functions. Lift it out and put it in the tail
of rpcrdma_marshal_req().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>


xprtrdma: Return -ENOBUFS when no pages are available · a8f688ec

Chuck Lever authored May 04, 2018

The use of -EAGAIN in rpcrdma_convert_iovs() is a latent bug: the
transport never calls xprt_write_space() when more pages become
available. -ENOBUFS will trigger the correct "delay briefly and call
again" logic.

Fixes: 7a89f9c6 ("xprtrdma: Honor ->send_request API contract")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org # 4.8+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
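
The difference between the two error codes, as this commit message describes it, can be modelled as a small dispatch. The enum `retry_action` and the function `classify_send_error()` are hypothetical; they only illustrate why -EAGAIN stalls when nobody signals write space, while -ENOBUFS leads to a timed retry.

```c
#include <errno.h>

/* Hypothetical model of what the caller does after a failed send attempt. */
enum retry_action {
	RETRY_NONE,		/* hard failure, no retry */
	RETRY_ON_WRITE_SPACE,	/* sleep until write space is signalled */
	RETRY_AFTER_DELAY,	/* delay briefly, then call again */
};

static enum retry_action classify_send_error(int err)
{
	switch (err) {
	case -EAGAIN:
		/* Relies on a wake-up that never comes for page shortages. */
		return RETRY_ON_WRITE_SPACE;
	case -ENOBUFS:
		return RETRY_AFTER_DELAY;
	default:
		return RETRY_NONE;
	}
}
```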


31 May, 2018 14 commits

pnfs: Don't release the sequence slot until we've processed layoutget on open · ae55e59d

Trond Myklebust authored May 22, 2018

If the server recalls the layout that was just handed out, we risk hitting
a race as described in RFC5661 Section 2.10.6.3 unless we ensure that we
release the sequence slot after processing the LAYOUTGET operation that
was sent as part of the OPEN compound.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


pnfs: Don't call commit on failed layoutget-on-open · 32f1c28f

Trond Myklebust authored May 22, 2018

If the layoutget on open call failed, we can't really commit the inode,
so don't bother calling it.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


pNFS: Don't send LAYOUTGET on OPEN for read, if we already have cached data · 64294b08

Trond Myklebust authored Feb 02, 2017

If we're only opening the file for reading, and the file is empty and/or
we already have cached data, then heuristically optimise away the
LAYOUTGET.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFSv4/pnfs: Don't switch off layoutget-on-open for transient errors · 8dc96566

Trond Myklebust authored Feb 01, 2017

Ensure that we only switch off the LAYOUTGET operation in the OPEN
compound when the server is truly broken, and/or it is complaining
that the compound is too large.
Currently, we end up turning off the functionality permanently,
even for transient errors such as EACCES or ENOSPC.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFSv4/pnfs: Ensure pnfs_parse_lgopen() won't try to parse uninitialised data · d49e0d5b

Trond Myklebust authored Feb 01, 2017

We need to ensure that pnfs_parse_lgopen() doesn't try to parse a
struct nfs4_layoutget_res that was not filled by a successful call
to decode_layoutget(). This can happen if we performed a cached open,
or if either the OP_ACCESS or OP_GETATTR operations preceding the
OP_LAYOUTGET in the compound returned an error.

By initialising the 'status' field to NFS4ERR_DELAY, we ensure that
pnfs_parse_lgopen() won't try to interpret the structure.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
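
The guard works like a sentinel value: the result starts out in an error state and only a successful decode overwrites it. A minimal sketch, assuming a simplified result struct (`layoutget_res_sketch`) and hypothetical helper names:

```c
#include <stdbool.h>

enum {
	NFS4_OK       = 0,
	NFS4ERR_DELAY = 10008,
};

/* Hypothetical, cut-down stand-in for the LAYOUTGET result structure. */
struct layoutget_res_sketch {
	int status;
	int layout_len;		/* only meaningful when status == NFS4_OK */
};

/* Called before the compound is sent: mark the result as "not filled in". */
static void layoutget_res_init(struct layoutget_res_sketch *res)
{
	res->status = NFS4ERR_DELAY;
	res->layout_len = 0;
}

/* Returns true only if a successful decode has overwritten the sentinel. */
static bool layoutget_res_usable(const struct layoutget_res_sketch *res)
{
	return res->status == NFS4_OK;
}
```

If the decoder never runs (cached open, or an earlier operation in the compound failed), the status stays at NFS4ERR_DELAY and the parser skips the structure.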


pnfs: Fix manipulation of NFS_LAYOUT_FIRST_LAYOUTGET · 30ae2412

Fred Isaman authored Oct 18, 2016

The flag was not always being cleared after LAYOUTGET on OPEN.
Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


pnfs: Add barrier to prevent lgopen using LAYOUTGET during recall · c49b5209

Fred Isaman authored Oct 05, 2016

Since the LAYOUTGET on OPEN can be sent without prior inode information,
existing methods to prevent LAYOUTGET from being sent while processing
CB_LAYOUTRECALL don't work. Track if a recall occurred while LAYOUTGET
was being sent, and if so ignore the results.
Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
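
One common way to "track if a recall occurred while LAYOUTGET was being sent" is a generation counter that is sampled before the request and compared afterwards. The sketch below assumes that approach; the commit message does not say how the kernel implements the barrier, and `recall_tracker`, `layoutget_begin()`, `recall_received()` and `layoutget_result_usable()` are all hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical per-server state: counts layout recalls seen so far. */
struct recall_tracker {
	unsigned int recall_gen;
};

/* Sample the counter just before sending LAYOUTGET as part of OPEN. */
static unsigned int layoutget_begin(const struct recall_tracker *t)
{
	return t->recall_gen;
}

/* Called when a CB_LAYOUTRECALL is processed. */
static void recall_received(struct recall_tracker *t)
{
	t->recall_gen++;
}

/* The LAYOUTGET result is used only if no recall arrived in between. */
static bool layoutget_result_usable(const struct recall_tracker *t,
				    unsigned int snapshot)
{
	return t->recall_gen == snapshot;
}
```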


pnfs: Stop attempting LAYOUTGET on OPEN on failure · 6e01260c

Fred Isaman authored Oct 04, 2016

Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


pnfs: Add LAYOUTGET to OPEN of an existing file · 78746a38

Fred Isaman authored Sep 22, 2016

Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


pNFS: Refactor nfs4_layoutget_release() · 29a8bfe5

Trond Myklebust authored May 30, 2018

Move the actual freeing of the struct nfs4_layoutget into fs/nfs/pnfs.c
where it can be reused by the layoutget on open code.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


pnfs: Add LAYOUTGET to OPEN of a new file · 2409a976

Fred Isaman authored Oct 06, 2016

This triggers when we have no pre-existing inode to attach to.
The pre-existing case is saved for later.
Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


pnfs: Change pnfs_alloc_init_layoutget_args call signature · 5e36e2a9

Fred Isaman authored Oct 06, 2016

Don't send in a layout, instead use the (possibly NULL) inode.

This is needed for LAYOUTGET attached to an OPEN where the inode is not
yet set.
Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


pnfs: Move nfs4_opendata into nfs4_fs.h · 1b146fcf

Fred Isaman authored Sep 21, 2016

It will be needed now by the pnfs code.
Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


pnfs: Add conditional encode/decode of LAYOUTGET within OPEN compound · 56f487f8

Fred Isaman authored Sep 21, 2016

Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
