Commits · 35156bfff3c0cd44d0e2e674530e0817fd22b313 · nexedi / linux

10 Apr, 2018 40 commits

NFSv4: Fix the nfs_inode_set_delegation() arguments · 35156bff

Trond Myklebust authored Mar 20, 2018

Neither nfs_inode_set_delegation() nor nfs_inode_reclaim_delegation() is
generic code. They have no business delving into NFSv4 OPEN XDR structures,
so let's replace the "struct nfs_openres" parameter.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFSv4: Clean up CB_GETATTR encoding · 8b064946

Trond Myklebust authored Mar 20, 2018

Replace the open-coded bitmap implementation with a generic one.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
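
A bitmap4 is an XDR variable-length array of 32-bit words: a word count followed by the words in big-endian order. Below is a minimal sketch of a generic encoder of that kind, assuming trailing all-zero words may be dropped. encode_bitmap4() and its signature are hypothetical, not the helper this commit adds.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value in XDR (big-endian) byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/*
 * Hypothetical helper: encode a bitmap4 (uint32_t bitmap4<>) into buf.
 * Trailing all-zero words are dropped, then the word count is written,
 * followed by the words themselves. Returns the number of bytes written,
 * or 0 if buf is too small (a valid encoding is always at least 4 bytes).
 */
size_t encode_bitmap4(uint8_t *buf, size_t buflen,
		      const uint32_t *bitmap, size_t nwords)
{
	size_t i, need;

	while (nwords > 0 && bitmap[nwords - 1] == 0)
		nwords--;
	need = 4 * (nwords + 1);
	if (buflen < need)
		return 0;
	put_be32(buf, (uint32_t)nwords);
	for (i = 0; i < nwords; i++)
		put_be32(buf + 4 * (i + 1), bitmap[i]);
	return need;
}
```

For example, a three-word bitmap whose last word is zero encodes as 12 bytes: a count of 2 followed by the two remaining words.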

NFSv4: Don't ask for attributes when ACCESS is protected by a delegation · 8bcbe7d9

Trond Myklebust authored Mar 20, 2018

If we hold a delegation, then the results of the ACCESS call are protected
anyway.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFSv4: Add a helper to encode/decode struct timespec · 36b3743f

Trond Myklebust authored Mar 20, 2018

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFSv4: Clean up encode_attrs · 40a3426c

Trond Myklebust authored Mar 20, 2018

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFSv4: Clean up XDR encoding of type bitmap4 · 37c88763

Trond Myklebust authored Mar 20, 2018

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFSv4: Allow GFP_NOIO sleeps in decode_attr_owner/decode_attr_group · e8d8aa46

Trond Myklebust authored Mar 20, 2018

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: Add a helper for encoding opaque data inline · 85e3dd44

Trond Myklebust authored Mar 20, 2018

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: Add helpers for decoding opaque and string types · 0e779aa7

Trond Myklebust authored Mar 20, 2018

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFSv4: Ignore change attribute invalidations if we hold a delegation · d943f2dd

Trond Myklebust authored Mar 20, 2018

Don't even bother recording an invalid change attribute if we hold a
delegation, since we already know the state of our attribute cache.
We can rely on the fact that we will pick up a copy from the server
when we return the delegation.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
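
The rule can be expressed as a small guard. This is a sketch only; INV_CHANGE and mark_change_invalid() are hypothetical names, not the kernel's.

```c
#include <stdbool.h>

/* Hypothetical "change attribute is stale" bit; not the kernel's value. */
#define INV_CHANGE (1UL << 0)

/*
 * While a delegation is held, a request to mark the change attribute
 * invalid is dropped, because the client already knows the state of
 * its attribute cache. Otherwise the bit is set as usual.
 */
unsigned long mark_change_invalid(unsigned long validity, bool have_delegation)
{
	if (have_delegation)
		return validity;
	return validity | INV_CHANGE;
}
```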

NFS: More fine grained attribute tracking · 16e14375

Trond Myklebust authored Mar 20, 2018

Currently, if the NFS_INO_INVALID_ATTR flag is set, for instance by
a call to nfs_post_op_update_inode_locked(), then it will not be cleared
until all the attributes have been revalidated. This means, for instance,
that NFSv4 writes will always force a full attribute revalidation.

Track the ctime, mtime, size and change attribute separately from the
other attributes so that we can have nfs_post_op_update_inode_locked()
set them correctly, and later have the cache consistency bitmask be
able to clear them.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
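
A rough sketch of the idea, with one invalidation bit per tracked attribute instead of a single catch-all flag. The INV_* names and helper functions are hypothetical and do not reproduce the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical per-attribute invalidation bits. */
#define INV_CHANGE (1UL << 0)
#define INV_CTIME  (1UL << 1)
#define INV_MTIME  (1UL << 2)
#define INV_SIZE   (1UL << 3)
#define INV_OTHER  (1UL << 4)   /* everything not tracked separately */
#define INV_ALL    (INV_CHANGE | INV_CTIME | INV_MTIME | INV_SIZE | INV_OTHER)

/* In this model, a write makes only these four attributes stale. */
unsigned long after_write(unsigned long validity)
{
	return validity | INV_CHANGE | INV_CTIME | INV_MTIME | INV_SIZE;
}

/* Post-op attributes clear exactly the bits they revalidated. */
unsigned long apply_post_op(unsigned long validity, unsigned long revalidated)
{
	return validity & ~revalidated;
}

/* A full attribute revalidation is needed only if "other" is stale. */
bool needs_full_revalidate(unsigned long validity)
{
	return (validity & INV_OTHER) != 0;
}
```

In this model a write no longer forces a full revalidation, and post-op attributes returned with the reply clear only the bits they cover.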

NFS: Don't force unnecessary cache invalidation in nfs_update_inode() · cac88f94

Trond Myklebust authored Mar 20, 2018

If we managed to revalidate all the attributes, then there is no reason
to mark them as invalid again. We do, however, want to ensure that we
set nfsi->attrtimeo correctly.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFS: Don't redirty the attribute cache in nfs_wcc_update_inode() · 783b194c

Trond Myklebust authored Mar 20, 2018

If we received weak cache consistency data from the server, then those
attributes are up to date, and there is no reason to mark them as
dirty in the attribute cache.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFS: Don't force a revalidation of all attributes if change is missing · 8619ddd0

Trond Myklebust authored Mar 20, 2018

Even if the change attribute is missing, it is still OK to mark the other
attributes as being up to date.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFS: Convert NFS_INO_INVALID flags to unsigned long · 90972882

Trond Myklebust authored Mar 20, 2018

The cache validity attribute is unsigned long, so make sure that
the flags are too.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
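
As a general illustration (not taken from the commit) of why flag constants should share the width of the word they modify: complementing a narrower unsigned mask zero-extends it, so clearing one bit also wipes the upper bits. Fixed-width types are used so the behavior does not depend on the platform's long size.

```c
#include <stdint.h>

/*
 * ~flag is computed at 32 bits, then zero-extended to 64 bits,
 * so bits 32..63 of word are cleared along with the intended bit.
 */
uint64_t clear_with_u32_mask(uint64_t word, uint32_t flag)
{
	return word & ~flag;
}

/* Mask and word have the same width: only the intended bit is cleared. */
uint64_t clear_with_u64_mask(uint64_t word, uint64_t flag)
{
	return word & ~flag;
}
```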

NFSv4: Don't return the delegation when not needed by NFSv4.x (x>0) · c01d3645

Trond Myklebust authored Mar 20, 2018

Starting with NFSv4.1, the server is able to deduce the client id from
the SEQUENCE op which means it can always figure out whether or not
the client is holding a delegation on a file that is being changed.
For that reason, RFC5661 does not require a delegation to be unconditionally
recalled on operations such as SETATTR, RENAME, or REMOVE.

Note that for now, we continue to return READ delegations since that is
still expected by the Linux knfsd server.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
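
The policy described above fits in a few lines. A sketch, with a hypothetical must_return_delegation() helper and delegation enum:

```c
#include <stdbool.h>

/* Hypothetical delegation types. */
enum deleg_type { DELEG_READ, DELEG_WRITE };

/*
 * NFSv4.0: always return the delegation before SETATTR, RENAME or REMOVE.
 * NFSv4.1+: the server can identify the client from SEQUENCE, so write
 * delegations are kept; read delegations are still returned for now.
 */
bool must_return_delegation(unsigned int minor_version, enum deleg_type type)
{
	if (minor_version == 0)
		return true;
	return type == DELEG_READ;
}
```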

NFS: Remove the unused return_delegation() callback · c135cb39

Trond Myklebust authored Mar 20, 2018

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFS: Move the delegation return down into _nfs4_do_setattr() · 199366f0

Trond Myklebust authored Mar 20, 2018

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFS: Add a delegation return into nfs4_proc_unlink_setup() · 977fcc2b

Trond Myklebust authored Mar 20, 2018

Ensure that we return the delegation when we finally delete the
file.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFS: Move delegation recall into the NFSv4 callback for rename_setup() · f2c2c552

Trond Myklebust authored Mar 20, 2018

Move the delegation recall out of the generic code, and into the NFSv4
specific callback.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFS: Move the delegation return down into nfs4_proc_remove() · 912678db

Trond Myklebust authored Mar 20, 2018

Move the delegation return out of generic code and down into the
NFSv4 specific unlink code.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFS: Move the delegation return down into nfs4_proc_link() · 9f768272

Trond Myklebust authored Mar 20, 2018

Move the delegation return out of generic code.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFSv4: Fix nfs4_return_incompatible_delegation · f5086242

Trond Myklebust authored Mar 20, 2018

The 'fmode' argument can take an FMODE_EXEC value, which we want to
filter out before comparing to the delegation type.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
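
A sketch of the corrected comparison. MODE_READ, MODE_WRITE and MODE_EXEC are local stand-ins for the kernel's FMODE_* bits (their values here are assumptions), and delegation_incompatible() is a hypothetical name.

```c
#include <stdbool.h>

/* Local stand-ins for fmode_t bits; values assumed for illustration. */
#define MODE_READ  0x1u
#define MODE_WRITE 0x2u
#define MODE_EXEC  0x20u

/*
 * Returns true if a delegation of type deleg_type does not cover the
 * requested open mode. Bits other than read/write (such as exec) are
 * masked off first, so an exec open does not look incompatible with
 * a read delegation.
 */
bool delegation_incompatible(unsigned int deleg_type, unsigned int fmode)
{
	fmode &= MODE_READ | MODE_WRITE;
	return (deleg_type & fmode) != fmode;
}
```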

xprtrdma: Fix corner cases when handling device removal · 25524288

Chuck Lever authored Mar 19, 2018

Michal Kalderon has found some corner cases around device unload
with active NFS mounts that I didn't have the imagination to test
when xprtrdma device removal was added last year.

- The ULP device removal handler is responsible for deallocating
  the PD. That wasn't clear to me initially, and my own testing
  suggested it was not necessary, but that is incorrect.

- The transport destruction path can no longer assume that there
  is a valid ID.

- When destroying a transport, ensure that ib_free_cq() is not
  invoked on a CQ that was already released.
Reported-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Fixes: bebd0318 ("xprtrdma: Support unplugging an HCA from ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
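
The second and third points amount to making transport teardown safe to run after an earlier device-removal cleanup. A minimal sketch with hypothetical fake_* types standing in for the real xprtrdma structures; the PD handling from the first point is not modeled.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the xprtrdma structures. */
struct fake_cq {
	int id;
};

struct fake_ep {
	struct fake_cq *send_cq;
	struct fake_cq *recv_cq;
	void *cm_id;      /* may already be gone after device removal */
	int cq_frees;     /* counts real frees, for demonstration */
};

/* Free a CQ only if it has not already been released. */
static void free_cq_once(struct fake_ep *ep, struct fake_cq **cqp)
{
	if (*cqp == NULL)
		return;
	free(*cqp);
	*cqp = NULL;
	ep->cq_frees++;
}

/*
 * Teardown that tolerates a prior device-removal cleanup: it checks
 * the ID before using it and never frees the same CQ twice.
 */
void ep_destroy(struct fake_ep *ep)
{
	if (ep->cm_id != NULL) {
		/* A real transport would disconnect and destroy the ID here. */
		ep->cm_id = NULL;
	}
	free_cq_once(ep, &ep->send_cq);
	free_cq_once(ep, &ep->recv_cq);
}
```

Calling ep_destroy() a second time is harmless: both CQ pointers are already NULL, so nothing is freed again.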

nfs4: wake any lock waiters on successful RECLAIM_COMPLETE · 57174593

Jeff Layton authored Mar 18, 2018

If we have a RECLAIM_COMPLETE with a populated cl_lock_waitq, then
that implies that a reconnect has occurred. Since we can't expect a
CB_NOTIFY_LOCK callback at that point, just wake up the entire queue
so that all the tasks can re-poll for their locks.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

nfs4: don't compare clientid in nfs4_wake_lock_waiter · 56566103

Jeff Layton authored Mar 18, 2018

The task is expected to sleep for a while here, and it's possible that
a new EXCHANGE_ID has occurred in the interim, and we were assigned a
new clientid. Since this is a per-client list, there isn't a lot of
value in vetting the clientid on the incoming request.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

nfs4: always reset notified flag to false before repolling for lock · 41a74620

Jeff Layton authored Mar 18, 2018

We may get a notification and lose the race to another client. Ensure
that we wait again for a notification in that case.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
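
A single-threaded model of the waiter loop, with hypothetical names. The flag is cleared before each attempt, so a notification that was consumed by a lost race does not let the waiter skip the next sleep.

```c
#include <stdbool.h>

/* Hypothetical lock-waiter state. */
struct lock_waiter {
	bool notified;   /* set by the (modeled) CB_NOTIFY_LOCK handler */
};

/* Called when a lock notification arrives for this waiter. */
void notify(struct lock_waiter *w)
{
	w->notified = true;
}

/*
 * Attempt the lock once. The flag is reset before polling, so a
 * notification that arrives during the attempt is not lost, and a
 * stale one from an earlier round is not mistaken for a new one.
 */
bool repoll(struct lock_waiter *w, bool lock_available)
{
	w->notified = false;
	return lock_available;
}

/* After a failed attempt, sleep until a fresh notification arrives. */
bool must_sleep(const struct lock_waiter *w)
{
	return !w->notified;
}
```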

sunrpc: Add static trace point to report result of RPC ping · a25a4cb3

Chuck Lever authored Mar 16, 2018

This information can help track down local misconfiguration issues
as well as network partitions and unresponsive servers.

There are several ways to send a ping, and with transport
multiplexing, the exact rpc_xprt that is used is sometimes not known by
the upper layer. The rpc_xprt pointer passed to the trace point
call also has to be RCU-safe.

I found a spot inside the client FSM where an rpc_xprt pointer is
always available and safe to use.
Suggested-by: Bill Baker <Bill.Baker@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

sunrpc: Add static trace point to report RPC latency stats · 40bf7eb3

Chuck Lever authored Mar 16, 2018

Introduce a low-overhead mechanism to report information about
latencies of individual RPCs. The goal is to enable user space to
filter the trace record for latency outliers, or build histograms,
etc.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
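
One way a user-space consumer might bucket latency records into a log2 histogram. This is a sketch: it assumes latencies arrive as integer microseconds and does not depend on the actual fields of the new trace point.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 32

/* Bucket i counts latencies in [2^i, 2^(i+1)); bucket 0 also takes 0. */
struct latency_hist {
	uint64_t bucket[NBUCKETS];
};

static unsigned int log2_floor(uint64_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

void hist_add(struct latency_hist *h, uint64_t usec)
{
	unsigned int b = usec ? log2_floor(usec) : 0;

	if (b >= NBUCKETS)
		b = NBUCKETS - 1;    /* clamp very large outliers */
	h->bucket[b]++;
}
```

Outliers then stand out as isolated counts in the high buckets; a latency of 1000 us, for instance, lands in bucket 9.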

sunrpc: Simplify synopsis of some trace points · e671edb9

Chuck Lever authored Mar 16, 2018

Clean up: struct rpc_task carries a pointer to a struct rpc_clnt,
and in fact task->tk_client is always what is passed into trace
points that are already passing @task.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: Make num_reqs a non-atomic integer · ff699ea8

Chuck Lever authored Mar 05, 2018

If recording xprt->stat.max_slots is moved into xprt_alloc_slot,
then xprt->num_reqs is never manipulated outside
xprt->reserve_lock. There's no longer a need for xprt->num_reqs to
be atomic.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: Make RTT measurement more precise (Send) · 78215759

Chuck Lever authored Mar 05, 2018

Some RPC transports have more overhead in their send_request
callouts than others. For example, for RPC-over-RDMA:

- Marshaling an RPC often has to DMA map the RPC arguments

- Registration methods perform memory registration as part of
  marshaling

To capture just server and network latencies more precisely: when
sending a Call, capture the rq_xtime timestamp _after_ the transport
header has been marshaled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: Make RTT measurement more precise (Receive) · 0b87a46b

Chuck Lever authored Mar 05, 2018

Some RPC transports have more overhead in their reply handlers
than others. For example, for RPC-over-RDMA:

- RPC completion has to wait for memory invalidation, which is
  not a part of the server/network round trip

- Recently a context switch was introduced into the reply handler,
  which further artificially inflates the measure of RPC RTT

To capture just server and network latencies more precisely: when
receiving a reply, compute the RTT as soon as the XID is recognized
rather than at RPC completion time.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
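
Taken together with the Send-side change above, the measurement window shrinks to the span between the two points that bracket the server and the network. A sketch with a hypothetical rq_times record (nanosecond timestamps) contrasting the two windows:

```c
#include <stdint.h>

/* Hypothetical per-request timestamps, in nanoseconds. */
struct rq_times {
	uint64_t marshal_start;  /* transport starts marshaling the Call */
	uint64_t xmit;           /* header marshaled, Call about to be sent */
	uint64_t xid_matched;    /* reply's XID recognized */
	uint64_t completed;      /* RPC completion (after invalidation etc.) */
};

/* RTT covering only the server and the network. */
uint64_t rtt_precise(const struct rq_times *t)
{
	return t->xid_matched - t->xmit;
}

/* RTT that also includes local marshaling and reply-handling overhead. */
uint64_t rtt_inflated(const struct rq_times *t)
{
	return t->completed - t->marshal_start;
}
```

With timestamps of 100, 250, 1250 and 4000 ns, the precise RTT is 1000 ns while the inflated one is 3900 ns.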

SUNRPC: Move xprt_update_rtt callsite · ecd465ee

Chuck Lever authored Mar 05, 2018

Since commit 33849792 ("xprtrdma: Detect unreachable NFS/RDMA
servers more reliably"), the xprtrdma transport now has a ->timer
callout. But xprtrdma does not need to compute RTT data, only UDP
needs that. Move the xprt_update_rtt call into the UDP transport
implementation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
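
For context, RTT samples on UDP feed a retransmit-timeout estimator. The sketch below is the textbook Jacobson/Karels estimator (gains 1/8 and 1/4, RTO = SRTT + 4*RTTVAR, ignoring RFC 6298's clock-granularity term and minimum), kept in scaled integers. It illustrates the technique and is not the sunrpc code; as a simplification it starts from a zero state instead of seeding from the first sample.

```c
/* Scaled state: srtt8 = 8 * SRTT, rttvar4 = 4 * RTTVAR. */
struct rtt_est {
	long srtt8;
	long rttvar4;
};

void rtt_update(struct rtt_est *e, long m)
{
	if (m <= 0)
		m = 1;                 /* clamp non-positive samples to 1 tick */
	m -= e->srtt8 >> 3;        /* m = sample - SRTT */
	e->srtt8 += m;             /* SRTT += (sample - SRTT) / 8 */
	if (m < 0)
		m = -m;
	m -= e->rttvar4 >> 2;      /* m = |error| - RTTVAR */
	e->rttvar4 += m;           /* RTTVAR += (|error| - RTTVAR) / 4 */
}

/* RTO = SRTT + 4 * RTTVAR, which is simply srtt8 / 8 + rttvar4. */
long rtt_rto(const struct rtt_est *e)
{
	return (e->srtt8 >> 3) + e->rttvar4;
}
```

Feeding a steady stream of equal samples drives SRTT to the sample value while RTTVAR decays toward zero, so the RTO settles just above the observed RTT.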

xprtrdma: Move creation of rl_rdmabuf to rpcrdma_create_req · 2dd4a012

Chuck Lever authored Feb 28, 2018

Refactor: Both rpcrdma_create_req call sites have to allocate the
buffer where the transport header is built, so just move that
allocation into rpcrdma_create_req.

This buffer is a fixed size. call_allocate has no information needed
for it that is not already available when the transport is
created.

The original purpose for allocating these buffers on demand was to
reduce the possibility that an allocation failure during transport
creation will hork the mount operation during low memory scenarios.
Some relief for this rare possibility is coming up in the next few
patches.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
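
A minimal sketch of the refactor, with hypothetical fake_req, create_req() and HDR_BUF_SIZE names: the fixed-size header buffer is allocated together with the request, and a failure unwinds both allocations.

```c
#include <stdlib.h>

#define HDR_BUF_SIZE 1024   /* hypothetical fixed header-buffer size */

struct fake_req {
	void *rdmabuf;   /* transport header buffer, allocated with the request */
};

/* Allocate a request and its header buffer in one place. */
struct fake_req *create_req(void)
{
	struct fake_req *req = malloc(sizeof(*req));

	if (!req)
		return NULL;
	req->rdmabuf = malloc(HDR_BUF_SIZE);
	if (!req->rdmabuf) {
		free(req);
		return NULL;
	}
	return req;
}

void destroy_req(struct fake_req *req)
{
	if (!req)
		return;
	free(req->rdmabuf);
	free(req);
}
```

Because every request owns its buffer from creation onward, neither call site needs an on-demand allocation path.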

xprtrdma: Chain Send to FastReg WRs · f2877623

Chuck Lever authored Feb 28, 2018

With FRWR, the client transport can perform memory registration and
post a Send with just a single ib_post_send.

This reduces contention between the send_request path and the Send
Completion handlers, and reduces the overhead of registering a chunk
that has multiple segments.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
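
In the kernel verbs API, send work requests are chained through the next field of struct ib_send_wr, so one post can carry several of them. The sketch below models that with a hypothetical fake_wr type and a counting post_send() stand-in, not the real ib_post_send().

```c
#include <stddef.h>

/* Hypothetical stand-in for a send work request. */
struct fake_wr {
	const char *opcode;       /* e.g. "REG_MR" or "SEND" */
	struct fake_wr *next;     /* chained WRs are posted together */
};

static int posts;             /* number of post calls made */

/* Stand-in for a post call: walks the chain and returns its length. */
int post_send(struct fake_wr *wr)
{
	int n = 0;

	posts++;
	for (; wr; wr = wr->next)
		n++;
	return n;
}

/* Register-then-send with one post: the FastReg WR precedes the Send. */
int post_reg_and_send(struct fake_wr *reg, struct fake_wr *send)
{
	reg->next = send;
	send->next = NULL;
	return post_send(reg);
}
```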

xprtrdma: "Support" call-only RPCs · fb14ae88

Chuck Lever authored Feb 28, 2018

RPC-over-RDMA version 1 credit accounting relies on there being a
response message for every RPC Call. This means that RPC procedures
that have no reply disrupt credit accounting in the same
way as a retransmit would (since it is sent because no reply has
arrived). Deal with the "no reply" case the same way.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Reduce number of MRs created by rpcrdma_mrs_create · ae741a85

Chuck Lever authored Feb 28, 2018

Create fewer MRs on average. Many workloads don't need as many as
32 MRs, and the transport can now quickly restock the MR free list.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: ->send_request returns -EAGAIN when there are no free MRs · 9e679d5e

Chuck Lever authored Feb 28, 2018

Currently, when the MR free list is exhausted during marshaling, the
RPC/RDMA transport places the RPC task on the delayq, which forces a
wait for HZ >> 2 before the marshal and send is retried.

With this change, the transport now places such an RPC task on the
pending queue, and wakes it just as soon as more MRs have been
created. Creating more MRs typically takes less than a millisecond,
and this waking mechanism is less deadlock-prone.

Moreover, the waiting RPC task is holding the transport's write
lock, which blocks the transport from sending RPCs. Therefore faster
recovery from MR exhaustion is desirable.

This is the same mechanism that the TCP transport utilizes when
handling write buffer space exhaustion.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
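
A single-threaded model of the new behavior, with hypothetical names: the marshal path parks the task and returns -EAGAIN instead of scheduling a timed retry, and creating MRs wakes the parked tasks immediately.

```c
#include <errno.h>

/* Hypothetical model of the MR free list and its pending queue. */
struct mr_pool {
	int free_mrs;
	int waiters;          /* tasks parked on the pending queue */
};

/* Marshal path: take an MR, or park the task and report -EAGAIN. */
int send_request(struct mr_pool *p)
{
	if (p->free_mrs == 0) {
		p->waiters++;     /* park on the pending queue, no fixed delay */
		return -EAGAIN;
	}
	p->free_mrs--;
	return 0;
}

/* Refill path: add MRs and wake every parked task right away.
 * Returns how many tasks were woken. */
int mrs_created(struct mr_pool *p, int n)
{
	int woken = p->waiters;

	p->free_mrs += n;
	p->waiters = 0;
	return woken;
}
```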

xprtrdma: Remove xprt-specific connect cookie · 8a14793e

Chuck Lever authored Feb 28, 2018

Clean up: The generic rq_connect_cookie is sufficient to detect RPC
Call retransmission.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
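
A sketch of how a per-request connect cookie can identify a retransmission on the same connection. The fake_xprt and fake_rqst types are hypothetical; only the comparison mirrors the idea behind rq_connect_cookie.

```c
#include <stdbool.h>

/* The transport bumps its cookie on every (re)connect. */
struct fake_xprt {
	unsigned int connect_cookie;
};

/* Each request records the cookie current when it was sent. */
struct fake_rqst {
	unsigned int connect_cookie;
	bool sent;
};

void record_send(struct fake_rqst *rq, const struct fake_xprt *x)
{
	rq->sent = true;
	rq->connect_cookie = x->connect_cookie;
}

/* True if this request was already sent on the current connection. */
bool is_retransmit_same_conn(const struct fake_rqst *rq,
			     const struct fake_xprt *x)
{
	return rq->sent && rq->connect_cookie == x->connect_cookie;
}
```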
