Commits · 79f687a3de9e3ba2518b4ea33f38ca6cbe9133eb · Kirill Smelkov / linux

02 Dec, 2016 1 commit

NFS: Fix a performance regression in readdir · 79f687a3

Trond Myklebust authored Nov 19, 2016

Ben Coddington reports that commit 311324ad, by adding the function
nfs_dir_mapping_need_revalidate() that checks page cache validity on
each call to nfs_readdir(), causes a performance regression when
the directory is being modified.

If the directory is changing while we're iterating through the directory,
POSIX does not require us to invalidate the page cache unless the user
calls rewinddir(). However, we still do want to ensure that we use
readdirplus in order to avoid a load of stat() calls when the user
is doing an 'ls -l' workload.

The fix should be to invalidate the page cache immediately when we're
setting the NFS_INO_ADVISE_RDPLUS bit.
Reported-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: 311324ad ("NFS: Be more aggressive in using readdirplus...")
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

79f687a3
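
The idea in the commit message above can be illustrated with a small userspace model. This is only a sketch: `struct dir_cache` and the three functions are hypothetical stand-ins, not the kernel's data structures or the actual patch. It assumes the cache is dropped at the moment the readdirplus hint is set, so the readdir path itself does not need a validity check on every call.

```c
#include <stdbool.h>

/* Hypothetical model of a directory's cached state (not a kernel struct). */
struct dir_cache {
    bool advise_rdplus; /* models the NFS_INO_ADVISE_RDPLUS bit */
    bool pages_valid;   /* models whether cached readdir pages may be reused */
};

/* An 'ls -l'-style workload was detected: set the hint and drop the
 * cached pages right away, so the next readdir refetches with readdirplus. */
void advise_use_readdirplus(struct dir_cache *dc)
{
    dc->advise_rdplus = true;
    dc->pages_valid = false;
}

/* The readdir path only consults the cache; it does no revalidation. */
bool readdir_can_use_cache(const struct dir_cache *dc)
{
    return dc->pages_valid;
}

/* Models a successful READDIR/READDIRPLUS reply filling the cache. */
void readdir_fill_cache(struct dir_cache *dc)
{
    dc->pages_valid = true;
}
```

In this model, a directory that changes during iteration never forces an invalidation by itself; only setting the hint does.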

01 Dec, 2016 39 commits

NFS: fix typo in parameter description · f36ab161

Wei Yongjun authored Oct 28, 2016

Fix typo in parameter description.

Fixes: 5405fc44 ("NFSv4.x: Add kernel parameter to control the
callback server")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

f36ab161

NFS: discard nfs_lockowner structure. · d51fdb87

NeilBrown authored Oct 13, 2016

It now has only one field and is only used in one structure.
So replace it in that structure with the field it contains.
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

d51fdb87

NFSv4: enhance nfs4_copy_lock_stateid to use a flock stateid if there is one · 8d424431

NeilBrown authored Oct 13, 2016

A process can have two possible lock owners for a given open file:
a per-process Posix lock owner and a per-open-file flock owner.
Use both of these when searching for a suitable stateid to use.

With this patch, READ/WRITE requests will use the correct stateid
if a flock lock is active.
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

8d424431
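
As a rough illustration of searching with both owners, the sketch below scans a list of lock states and accepts a match on either owner. The names `struct lock_state` and `find_lock_state` are hypothetical, and preferring the Posix owner over the flock owner is an assumption of this sketch, not something the commit message states.

```c
#include <stddef.h>

/* Hypothetical lock state: an opaque owner key plus a stateid value. */
struct lock_state {
    const void *owner;
    unsigned stateid;
};

/* Return the lock state held by the Posix owner if there is one,
 * otherwise the first one held by the flock owner, otherwise NULL.
 * The preference order is an assumption made for this sketch. */
const struct lock_state *find_lock_state(const struct lock_state *states,
                                         size_t count,
                                         const void *posix_owner,
                                         const void *flock_owner)
{
    const struct lock_state *fallback = NULL;

    for (size_t i = 0; i < count; i++) {
        if (states[i].owner == posix_owner)
            return &states[i];
        if (fallback == NULL && states[i].owner == flock_owner)
            fallback = &states[i];
    }
    return fallback;
}
```

With only a flock lock active, a lookup keyed on the Posix owner alone would find nothing; adding the second key is what lets READ/WRITE pick up the flock stateid.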

NFSv4: change nfs4_select_rw_stateid to take a lock_context inplace of lock_owner · 17393475

NeilBrown authored Oct 13, 2016

The only time that a lock_context is not immediately available is in
setattr, and now that it has an open_context, it can easily find one
with nfs_get_lock_context.
This removes the need for the on-stack nfs_lockowner.

This change is preparation for correctly supporting flock stateids.
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

17393475

NFSv4: change nfs4_do_setattr to take an open_context instead of a nfs4_state. · 29b59f94

NeilBrown authored Oct 13, 2016

The open_context can always lead directly to the state, and is always easily
available, so this is a straightforward change.
Doing this makes more information available to _nfs4_do_setattr() for use
in the next patch.
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

29b59f94

NFSv4: add flock_owner to open context · 532d4def

NeilBrown authored Oct 13, 2016

An open file description (struct file) in a given process can be
associated with two different lock owners.

It can have a Posix lock owner which will be different in each process
that has a fd on the file.
It can have a Flock owner which will be the same in all processes.

When searching for a lock stateid to use, we need to consider both of these
owners.

So add a new "flock_owner" to the "nfs_open_context" (of which there
is one for each open file description).

This flock_owner does not need to be reference-counted as there is a
1-1 relation between 'struct file' and nfs open contexts,
and it will never be part of a list of contexts.  So there is no need
for a 'flock_context' - just the owner is enough.

The io_count included in the (Posix) lock_context provides no
guarantee that all read-aheads that could use the state have
completed, so not supporting it for flock locks is not a serious
problem.  Synchronization between flock and read-ahead can be added
later if needed.

When creating an open_context for a non-opening create call, we don't have
a 'struct file' to pass in, so the lock context gets initialized with
a NULL owner, but this will never be used.

The flock_owner is not used at all in this patch; that will come later.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

532d4def

NFS: remove l_pid field from nfs_lockowner · b184b5c3

NeilBrown authored Oct 13, 2016

This field is not used in any important way and probably should
have been removed by

Commit: 8003d3c4 ("nfs4: treat lock owners as opaque values")

which removed the pid argument from nfs4_get_lock_state.

Except in unusual and uninteresting cases, two threads with the same
->tgid will have the same ->files pointer, so keeping them both
for comparison brings no benefit.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

b184b5c3

NFS: Remove unused argument from nfs_direct_write_complete() · 4d3b55d3

Anna Schumaker authored Nov 23, 2016

This parameter hasn't been used since 2a009ec9 (Linux 3.13-rc3), so
let's remove it from this function.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

4d3b55d3

NFS: Remove unused authflavour parameter from nfs_get_client() · 7d38de3f

Anna Schumaker authored Nov 17, 2016

This parameter hasn't been used since f8407299 (Linux 3.11-rc2), so
let's remove it from this function and callers.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

7d38de3f

nfs: fix false positives in nfs40_walk_client_list() · ced85a75

J. Bruce Fields authored Nov 28, 2016

It's possible that two different servers can return the same (clientid,
verifier) pair purely by coincidence.  Both are 64-bit values, but
depending on the server implementation, they can be highly predictable
and collisions may be quite likely, especially when there are lots of
servers.

So, check for this case.  If the clientid and verifier both match, then
we actually know they *can't* be the same server, since a new
SETCLIENTID to an already-known server should have changed the verifier.

This helps fix a bug that could cause the client to mount a filesystem
from the wrong server.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Yongcheng Yang <yoyang@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

ced85a75
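
The check described above can be sketched as a small predicate. This is a simplified model, not the code in nfs40_walk_client_list(): `struct client_rec` and `may_be_same_server` are hypothetical names, and the real function compares more fields than these two.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of what a server returned from SETCLIENTID. */
struct client_rec {
    uint64_t clientid;
    uint64_t verifier;
};

/* Decide whether an already-known client record could belong to the
 * server that just answered a fresh SETCLIENTID. */
bool may_be_same_server(const struct client_rec *known,
                        const struct client_rec *fresh)
{
    /* Different clientid: not a candidate at all. */
    if (known->clientid != fresh->clientid)
        return false;
    /* Same clientid and same verifier: a known server would have issued
     * a new verifier, so this must be a different server that produced
     * the same pair by coincidence. */
    if (known->verifier == fresh->verifier)
        return false;
    return true;
}
```

The counterintuitive part is the second test: an exact match on both values is treated as evidence against identity rather than for it.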

sunrpc: Don't engage exponential backoff when connection attempt is rejected. · 2c2ee6d2

NeilBrown authored Nov 23, 2016

xs_connect() contains an exponential backoff mechanism so the repeated
connection attempts are delayed by longer and longer amounts.

This is appropriate when the connection failed due to a timeout, but
it is not appropriate when a definitive "no" answer is received.  In such
cases, call_connect_status() imposes a minimum 3-second back-off, so
not having the exponential back-off will never result in immediate
retries.

The current situation is a problem when the NFS server tries to
register with rpcbind but rpcbind isn't running.  All connection
attempts are made on the same "xprt" and as the connection is never
"closed", the exponential back-off delays successive attempts to register,
or de-register, different protocols.  This results in a multi-minute
delay with no benefit.

So, when call_connect_status() receives a definitive "no", use
xprt_conditional_disconnect() to cancel the previous connection attempt.
This will set XPRT_CLOSE_WAIT so that xprt->ops->close() calls xs_close()
which resets the reestablish_timeout.

To ensure xprt_conditional_disconnect() does the right thing, we
ensure that rq_connect_cookie is set before a connection attempt, and
allow xprt_conditional_disconnect() to complete even when the
transport is not fully connected.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

2c2ee6d2
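
The intended behaviour can be modelled in a few lines. This is a sketch under stated assumptions, not the sunrpc code: `struct transport`, `update_retry_delay` and the 300-second cap are hypothetical, while the 3-second floor mirrors the minimum back-off mentioned in the message.

```c
/* Outcome of a connection attempt, as seen by the retry logic. */
enum connect_result {
    CONNECT_TIMED_OUT, /* no answer: keep backing off */
    CONNECT_REJECTED   /* definitive "no": start over */
};

#define INIT_DELAY_SECS 3u   /* floor, as described in the message */
#define MAX_DELAY_SECS  300u /* cap chosen for this sketch (assumption) */

/* Hypothetical per-transport state holding the current retry delay. */
struct transport {
    unsigned delay_secs;
};

/* Update and return the delay to apply before the next attempt. */
unsigned update_retry_delay(struct transport *t, enum connect_result result)
{
    if (result == CONNECT_REJECTED) {
        /* A rejection resets the back-off instead of growing it. */
        t->delay_secs = INIT_DELAY_SECS;
    } else {
        unsigned doubled = t->delay_secs * 2u;
        t->delay_secs = doubled > MAX_DELAY_SECS ? MAX_DELAY_SECS : doubled;
    }
    return t->delay_secs;
}
```

In the rpcbind scenario, every registration attempt is rejected, so in this model each one waits only the initial delay rather than an ever-growing one.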

pNFS: Skip invalid stateids when doing a bulk destroy · b85f5620

Trond Myklebust authored Nov 30, 2016

If the layout stateid is already invalid, we have no work to do.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

b85f5620

pNFS: Wait on outstanding layoutreturns to complete in pnfs_roc() · 29ade5db

Trond Myklebust authored Nov 30, 2016

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

29ade5db

pNFS: Don't mark the layout as freed if the last lseg is marked for return · abb3e1c8

Trond Myklebust authored Nov 30, 2016

Address another memory leak.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

abb3e1c8

pNFS: Sync the layout state bits in pnfs_cache_lseg_for_layoutreturn · 4aab9732

Trond Myklebust authored Nov 30, 2016

Ensure that the layout state bits are synced when we cache a layout
segment for layoutreturn using an appropriate call to
pnfs_set_plh_return_info.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

4aab9732

pNFS: Fix bugs in _pnfs_return_layout · 24408f52

Trond Myklebust authored Nov 30, 2016

We need to honour the NFS_LAYOUT_RETURN_REQUESTED bit regardless of
whether or not there are layout segments pending.
Furthermore, we should ensure that we leave the plh_return_segs list
empty.

This patch fixes a memory leak of the layout segments on plh_return_segs.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

24408f52

pNFS: Clear all layout segment state in pnfs_mark_layout_stateid_invalid · fe1cf946

Trond Myklebust authored Nov 30, 2016

When the layout state is invalidated, then so is the layout segment
state, and hence we do need to clean up the state bits.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

fe1cf946

pNFS: Prevent unnecessary layoutreturns after delegreturn · 53e6fc86

Trond Myklebust authored Nov 19, 2016

If we cannot grab the inode or superblock, then we cannot pin the
layout header, and so we cannot send a layoutreturn as part of an
async delegreturn call. In this case, we currently end up sending
an extra layoutreturn after the delegreturn. Since the layout was
implicitly returned by the delegreturn, that just gets a BAD_STATEID.

The fix is to simply complete the return-on-close immediately.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

53e6fc86

pNFS: Enable layoutreturn operation for return-on-close · 1c5bd76d

Trond Myklebust authored Nov 16, 2016

Amend the pnfs return on close helper functions to enable sending the
layoutreturn op in CLOSE/DELEGRETURN. This closes a potential race between
CLOSE/DELEGRETURN and parallel OPEN calls to the same file, and allows the
client and the server to agree on whether or not there is an outstanding
layout.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

1c5bd76d

pNFS: Clean up - add a helper to initialise struct layoutreturn_args · 828ed9ec

Trond Myklebust authored Nov 15, 2016

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

828ed9ec

NFSv4: Add encode/decode of the layoutreturn op in DELEGRETURN · 586f1c39

Trond Myklebust authored Nov 15, 2016

Add XDR encoding for the layoutreturn op, and storage for the layoutreturn
arguments to the DELEGRETURN compound.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

586f1c39

NFSv4: Add encode/decode of the layoutreturn op in CLOSE · cf805165

Trond Myklebust authored Nov 15, 2016

Add XDR encoding for the layoutreturn op, and storage for the layoutreturn
arguments to the CLOSE compound.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

cf805165

NFSv4: Fix missing operation accounting in NFS4_dec_delegreturn_sz · d8434d4c

Trond Myklebust authored Nov 16, 2016

We need to account for the reply to the PUTFH operation in the
DELEGRETURN compound.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

d8434d4c

pNFS: Don't mark layout segments invalid on layoutreturn in pnfs_roc · 69820d22

Trond Myklebust authored Nov 15, 2016

The layoutreturn call will take care of invalidating the layout segments
once the call is successful.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

69820d22

pNFS: Get rid of unnecessary layout parameter in encode_layoutreturn callback · 94e5c571

Trond Myklebust authored Sep 15, 2016

The parameter is already present in the "args" structure.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

94e5c571

pNFS: Skip checking for return-on-close if the layout is invalid · 0cdc329e

Trond Myklebust authored Nov 21, 2016

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

0cdc329e

pNFS: Remove spurious wake up in pnfs_layout_remove_lseg() · e685d237

Trond Myklebust authored Nov 18, 2016

There is no change to the value of NFS_LAYOUT_RETURN, so we should
not be waking up the RPC call.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

e685d237

NFSv4: Ignore LAYOUTRETURN result if the layout doesn't match or is invalid · 2a974425

Trond Myklebust authored Nov 20, 2016

Fix a potential race with CB_LAYOUTRECALL in which the server recalls the
remaining layout segments while our LAYOUTRETURN is still in transit.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

2a974425

pNFS: Do not free layout segments that are marked for return · 68f74479

Trond Myklebust authored Oct 12, 2016

We may want to process and transmit layout stat information for the
layout segments that are being returned, so we should defer freeing
them until after the layoutreturn has completed.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

68f74479

pNFS: Delay getting the layout header in CB_LAYOUTRECALL handlers · 7b410d9c

Trond Myklebust authored Oct 31, 2016

Instead of grabbing the layout, we want to get the inode so that we
can reduce races between layoutget and layoutrecall when the server
does not support call referring.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

7b410d9c

pNFS: consolidate the different range intersection tests · 17822b20

Trond Myklebust authored Oct 25, 2016

Both pnfs.c and the flexfiles code have their own versions of the
range intersection testing, and the "end_offset" helper.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

17822b20
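
For readers unfamiliar with these helpers, here is a minimal sketch of what a shared "end offset" helper and range intersection test might look like. The function names are hypothetical, and treating `UINT64_MAX` as "extends to the end of the file" is an assumption of this sketch rather than a quote from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* End of the range [start, start + len), saturating at UINT64_MAX so
 * that an overflowing or "whole file" length means "no upper bound". */
uint64_t range_end_offset(uint64_t start, uint64_t len)
{
    if (UINT64_MAX - start <= len)
        return UINT64_MAX;
    return start + len;
}

/* True if [start1, end1) and [start2, end2) share at least one byte.
 * An end of UINT64_MAX is treated as unbounded. */
bool ranges_intersect(uint64_t start1, uint64_t end1,
                      uint64_t start2, uint64_t end2)
{
    return (end1 == UINT64_MAX || start2 < end1) &&
           (end2 == UINT64_MAX || start1 < end2);
}
```

Ranges that merely touch, such as [0, 10) and [10, 20), do not intersect under this half-open convention.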

pNFS: Fix race in pnfs_wait_on_layoutreturn · ee284e35

Trond Myklebust authored Nov 18, 2016

We must put the task to sleep while holding the inode->i_lock in order
to ensure atomicity with the test for NFS_LAYOUT_RETURN.

Fixes: 500d701f ("NFS41: make close wait for layoutreturn")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

ee284e35

pNFS: On error, do not send LAYOUTGET until the LAYOUTRETURN has completed · 6604b203

Trond Myklebust authored Oct 17, 2016

If there is an I/O error, we should not call LAYOUTGET until the
LAYOUTRETURN that reports the error is complete.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.8+

6604b203

pNFS: Force a retry of LAYOUTGET if the stateid doesn't match our cache · 9888d837

Trond Myklebust authored Nov 23, 2016

If the server sends us a completely new stateid, and the client thinks
it already holds a layout, then force a retry of the LAYOUTGET after
invalidating the existing layout in order to avoid corruption due to
races.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

9888d837

pNFS: Clear NFS_LAYOUT_RETURN_REQUESTED when invalidating the layout stateid · ae5a459d

Trond Myklebust authored Nov 14, 2016

We must ensure that we don't schedule a layoutreturn if the layout stateid
has been marked as invalid.

Fixes: 2a59a041 ("pNFS: Fix pnfs_set_layout_stateid() to clear...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.8+

ae5a459d

pNFS: Don't clear the layout stateid if a layout return is outstanding · 7b650994

Trond Myklebust authored Nov 14, 2016

If we no longer hold any layout segments, we're normally expected to
consider the layout stateid to be invalid. However we cannot assume this
if we're about to, or in the process of sending a layoutreturn.

Fixes: 334a8f37 ("pNFS: Don't forget the layout stateid if...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.8+

7b650994

pNFS: Fix a deadlock between read resends and layoutreturn · 54e4a0df

Trond Myklebust authored Nov 27, 2016

We must not call nfs_pageio_init_read() on a new nfs_pageio_descriptor
while holding a reference to a layout segment, as that can deadlock
pnfs_update_layout().

Fixes: d67ae825 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.0+

54e4a0df

NFSv4.1: Fix regression in callback retry handling · 9a837856

Fred Isaman authored Sep 27, 2016

When initializing a freshly created slot for the callback channel,
the seq_nr needs to be 0, not 1.  Otherwise validate_seqid
and nfs4_slot_wait_on_seqid get confused and believe that the
empty slot corresponds to a previously sent reply.
Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

9a837856
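
To see why the initial value matters, consider a simplified model of per-slot sequence checking. This is not the kernel's validate_seqid(): the names below are hypothetical, and the sketch assumes that `seq_nr` records the sequence ID of the last request handled on the slot and that the first request on a fresh slot carries sequence ID 1.

```c
#include <stdint.h>

enum seq_status {
    SEQ_NEW,       /* next expected request */
    SEQ_REPLAY,    /* retransmission of the last request */
    SEQ_MISORDERED /* anything else */
};

/* Hypothetical slot: seq_nr is the last sequence ID handled (0 = none). */
struct slot {
    uint32_t seq_nr;
};

/* Classify an incoming sequence ID against the slot's recorded state. */
enum seq_status classify_seqid(const struct slot *s, uint32_t seqid)
{
    if (seqid == (uint32_t)(s->seq_nr + 1u))
        return SEQ_NEW;
    if (seqid == s->seq_nr)
        return SEQ_REPLAY;
    return SEQ_MISORDERED;
}
```

Under these assumptions, a fresh slot initialised to 1 would classify the very first request (sequence ID 1) as a replay of a reply that was never sent, whereas a slot initialised to 0 classifies it as new.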

NFSv4: Optimise away forced revalidation when we know the attributes are OK · 1ad13dbc

Trond Myklebust authored Oct 27, 2016

The NFS_INO_REVAL_FORCED flag needs to be set if we just got a delegation,
and we see that there might still be some ambiguity as to whether or not
our attribute or data cache are valid.
In practice, this means that a call to nfs_check_inode_attributes() will
have noticed a discrepancy between cached attributes and measured ones,
so let's move the setting of NFS_INO_REVAL_FORCED to there.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

1ad13dbc