Commits · 362745268ce119c473952b30f57d947bdede7f7a · nexedi / linux

24 Jul, 2016 3 commits
Merge branch 'writeback' · 36274526

Trond Myklebust authored Jul 24, 2016

36274526

Merge branch 'sunrpc' · 7f94ed24

Trond Myklebust authored Jul 24, 2016

7f94ed24

SUNRPC: Fix a compiler warning in fs/nfs/clnt.c · ce272302

Trond Myklebust authored Jul 24, 2016

Fix the report:

net/sunrpc/clnt.c:2580:1: warning: ‘static’ is not at beginning of declaration [-Wold-style-declaration]
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

ce272302

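
For readers unfamiliar with the warning quoted above, the following is a minimal sketch of the kind of declaration GCC reports under -Wold-style-declaration and the reordered form that avoids it. The table `retry_delays` and both helper functions are hypothetical and are not taken from net/sunrpc/clnt.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table for illustration; not taken from net/sunrpc/clnt.c. */

/* Old style, reported by -Wold-style-declaration because the storage
 * class specifier follows the qualifier:
 *
 *     const static int retry_delays[] = { 1, 2, 4 };
 */

/* Preferred form: the storage class specifier comes first. */
static const int retry_delays[] = { 1, 2, 4 };

/* Number of entries in the table. */
size_t retry_delay_count(void)
{
    return sizeof(retry_delays) / sizeof(retry_delays[0]);
}

/* Look up one entry; idx must be a valid index. */
int retry_delay(size_t idx)
{
    assert(idx < retry_delay_count());
    return retry_delays[idx];
}
```

Both orderings declare the same object; only the position of `static` differs, which is why such a fix is purely cosmetic.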
22 Jul, 2016 1 commit

nfs: don't create zero-length requests · 149a4fdd

Benjamin Coddington authored Jul 18, 2016

NFS doesn't expect requests with wb_bytes set to zero and may make
unexpected decisions about how to handle that request at the page IO layer.
Skip request creation if we won't have any wb_bytes in the request.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Weston Andros Adamson <dros@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

149a4fdd

21 Jul, 2016 1 commit

pNFS/files: filelayout_write_done_cb must call nfs_writeback_update_inode() · e033fb51

Trond Myklebust authored Jul 21, 2016

All write callbacks are required to call nfs_writeback_update_inode() upon
success to ensure that file size changes are recorded, and the attribute
cache is invalidated.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

e033fb51

19 Jul, 2016 13 commits

sunrpc: Prevent resvport min/max inversion via sysfs and module parameter · ffb6ca33

Frank Sorenson authored Jul 08, 2016

The current min/max resvport settings are independently limited
by the entire range of allowed ports, so max_resvport can be
set to a port lower than min_resvport.

Prevent inversion of min/max values when set through sysfs and
module parameter by setting the limits dependent on each other.
Signed-off-by: Frank Sorenson <sorenson@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

ffb6ca33

sunrpc: Prevent resvport min/max inversion via sysctl · e08ea3a9

Frank Sorenson authored Jul 08, 2016

The current min/max resvport settings are independently limited
by the entire range of allowed ports, so max_resvport can be
set to a port lower than min_resvport.

Prevent inversion of min/max values when set through sysctl by
setting the limits dependent on each other.
Signed-off-by: Frank Sorenson <sorenson@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

e08ea3a9
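
The idea shared by the two commits above, making each limit depend on the other, can be sketched in user-space C as follows. This is an illustration only: `struct resvport_cfg`, the setter names, and the `RESVPORT_LOW`/`RESVPORT_HIGH` bounds are assumptions and do not reproduce the kernel's sysfs, sysctl, or module-parameter handlers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bounds standing in for the overall allowed port range. */
#define RESVPORT_LOW   1u
#define RESVPORT_HIGH  1023u

struct resvport_cfg {
    unsigned int min;
    unsigned int max;
};

/* Accept a new minimum only if it lies in [RESVPORT_LOW, cfg->max]. */
bool resvport_set_min(struct resvport_cfg *cfg, unsigned int val)
{
    assert(cfg != NULL);
    if (val < RESVPORT_LOW || val > cfg->max)
        return false;
    cfg->min = val;
    return true;
}

/* Accept a new maximum only if it lies in [cfg->min, RESVPORT_HIGH]. */
bool resvport_set_max(struct resvport_cfg *cfg, unsigned int val)
{
    assert(cfg != NULL);
    if (val < cfg->min || val > RESVPORT_HIGH)
        return false;
    cfg->max = val;
    return true;
}
```

With limits coupled this way, a request such as setting the maximum to 600 while the minimum is 665 is rejected instead of leaving the pair inverted.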

sunrpc: Fix reserved port range calculation · 5d71899a

Frank Sorenson authored Jul 08, 2016

The range calculation for choosing the random reserved port will panic
with divide-by-zero when min_resvport == max_resvport, a range of one
port, not zero.

Fix the reserved port range calculation by adding one to the difference.
Signed-off-by: Frank Sorenson <sorenson@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

5d71899a
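
The arithmetic behind this fix can be sketched as below. The function names are hypothetical and the random value is passed in as a parameter, so this is not the kernel's code, only an illustration of why an inclusive range needs the extra one.

```c
#include <assert.h>

/* Number of ports in the inclusive range [min, max].
 * Without the "+ 1", min == max would give 0 and the modulo
 * in resvport_pick() would divide by zero. */
unsigned int resvport_range(unsigned int min, unsigned int max)
{
    assert(min <= max);
    return max - min + 1;
}

/* Map an arbitrary random value onto a port in [min, max]. */
unsigned int resvport_pick(unsigned int rnd, unsigned int min,
                           unsigned int max)
{
    return min + rnd % resvport_range(min, max);
}
```

For example, with min and max both 800 the range is 1 and every random value maps to port 800.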

sunrpc: Fix bit count when setting hashtable size to power-of-two · 34ae685c

Frank Sorenson authored Jun 27, 2016

Author: Frank Sorenson <sorenson@redhat.com>
Date:   2016-06-27 13:55:48 -0500

    sunrpc: Fix bit count when setting hashtable size to power-of-two

    The hashtable size is incorrectly calculated as the next higher
    power-of-two when being set to a power-of-two.  fls() returns the
    bit number of the most significant set bit, with the least
    significant bit being numbered '1'.  For a power-of-two, fls()
    will return a bit number which is one higher than the number of bits
    required, leading to a hashtable which is twice the requested size.

    In addition, the value of (1 << nbits) will always be at least num,
    so the test will never be true.

    Fix the hash table size calculation to correctly set hashtable
    size, and eliminate the unnecessary check.
Signed-off-by: Frank Sorenson <sorenson@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

34ae685c
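
The off-by-one described above can be seen in a small user-space sketch. `fls_sketch()` is a portable stand-in for the kernel's fls(), and `hashtable_bits()` shows one common way to get the right bit count, by applying fls to num - 1; the commit message does not spell out the exact expression used in the patch, so treat that part as an assumption.

```c
#include <assert.h>

/* Portable stand-in for the kernel's fls(): position of the most
 * significant set bit, counting the least significant bit as 1.
 * Returns 0 when x is 0. */
int fls_sketch(unsigned int x)
{
    int bit = 0;

    while (x != 0) {
        bit++;
        x >>= 1;
    }
    return bit;
}

/* Smallest nbits such that (1 << nbits) >= num, for num >= 1.
 * Using fls_sketch(num) directly would return one bit too many
 * whenever num is already a power of two. */
int hashtable_bits(unsigned int num)
{
    assert(num >= 1);
    return fls_sketch(num - 1);
}
```

For a requested size of 16, `fls_sketch(16)` is 5, which would give a 32-entry table, while `hashtable_bits(16)` is 4 and gives the requested 16.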

nfs4: flexfiles: respect noresvport when establishing connections to DSes · b224f7cb

Tigran Mkrtchyan authored Jun 13, 2016

Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

b224f7cb

nfs4: clnt: respect noresvport when establishing connections to DSes · 3fc75f12

Tigran Mkrtchyan authored Jun 13, 2016

result:

$ mount -o vers=4.1 dcache-lab007:/ /pnfs
$ cp /etc/profile /pnfs
tcp        0      0 131.169.185.68:1005     131.169.191.141:32049   ESTABLISHED
tcp        0      0 131.169.185.68:751      131.169.191.144:2049    ESTABLISHED
$

$ mount -o vers=4.1,noresvport dcache-lab007:/ /pnfs
$ cp /etc/profile /pnfs
tcp        0      0 131.169.185.68:34894    131.169.191.141:32049   ESTABLISHED
tcp        0      0 131.169.185.68:35722    131.169.191.144:2049    ESTABLISHED
$
Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

3fc75f12

pnfs/blocklayout: put deviceid node after releasing bl_ext_lock · d9c0ce0e

Benjamin Coddington authored Jun 10, 2016

The last put of deviceid nodes for SCSI layouts may sleep, so we shouldn't
hold any spinlocks. Make sure we put them outside the bl_ext_lock.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

d9c0ce0e

sunrpc: move NO_CRKEY_TIMEOUT to the auth->au_flags · ce52914e

Scott Mayhew authored Jun 07, 2016

A generic_cred can be used to look up a unx_cred or a gss_cred, so it's
not really safe to use the generic_cred->acred->ac_flags to store
the NO_CRKEY_TIMEOUT flag.  A lookup for a unx_cred triggered while the
KEY_EXPIRE_SOON flag is already set will cause both NO_CRKEY_TIMEOUT and
KEY_EXPIRE_SOON to be set in the ac_flags, leaving the user associated
with the auth_cred in a state where they're perpetually doing 4K
NFS_FILE_SYNC writes.

This can be reproduced as follows:

1. Mount two NFS filesystems, one with sec=krb5 and one with sec=sys.
They do not need to be the same export, nor do they even need to be from
the same NFS server.  Also, v3 is fine.
$ sudo mount -o v3,sec=krb5 server1:/export /mnt/krb5
$ sudo mount -o v3,sec=sys server2:/export /mnt/sys

2. As the normal user, before accessing the kerberized mount, kinit with
a short lifetime (but not so short that renewing the ticket would leave
you within the 4-minute window again by the time the original ticket
expires), e.g.
$ kinit -l 10m -r 60m

3. Do some I/O to the kerberized mount and verify that the writes are
wsize, UNSTABLE:
$ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1

4. Wait until you're within 4 minutes of key expiry, then do some more
I/O to the kerberized mount to ensure that RPC_CRED_KEY_EXPIRE_SOON gets
set.  Verify that the writes are 4K, FILE_SYNC:
$ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1

5. Now do some I/O to the sec=sys mount.  This will cause
RPC_CRED_NO_CRKEY_TIMEOUT to be set:
$ dd if=/dev/zero of=/mnt/sys/file bs=1M count=1

6. Writes for that user will now be permanently 4K, FILE_SYNC,
regardless of which mount is being written to, until you reboot
the client.  Renewing the kerberos ticket (assuming it hasn't already
expired) will have no effect.  Grabbing a new kerberos ticket at this
point will have no effect either.

Move the flag to the auth->au_flags field (which is currently unused)
and rename it slightly to reflect that it's no longer associated with
the auth_cred->ac_flags.  Add the rpc_auth to the arg list of
rpcauth_cred_key_to_expire and check the au_flags there too.  Finally,
add the inode to the arg list of nfs_ctx_key_to_expire so we can
determine the rpc_auth to pass to rpcauth_cred_key_to_expire.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

ce52914e

mount: use sec= that was specified on the command line · e68fd7c8

Steve Dickson authored May 25, 2016

When older servers return RPC_AUTH_NULL, it means the
rpc creds will be ignored. In that case use the sec=
that was specified instead of setting sec=null

Fixes Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1112983
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

e68fd7c8

pNFS: Fix LAYOUTGET handling of NFS4ERR_BAD_STATEID and NFS4ERR_EXPIRED · f7db0b28

Trond Myklebust authored Jul 14, 2016

We want to recover the open stateid if there is no layout stateid
and/or the stateid argument matches an open stateid.
Otherwise throw out the existing layout and recover from scratch, as
the layout stateid is bad.

Fixes: 183d9e7b ("pnfs: rework LAYOUTGET retry handling")
Cc: stable@vger.kernel.org # 4.7
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>

f7db0b28

pNFS: Handle NFS4ERR_RECALLCONFLICT correctly in LAYOUTGET · 66b53f32

Trond Myklebust authored Jul 14, 2016

Instead of giving up altogether and falling back to doing I/O
through the MDS, which may make the situation worse, wait for
2 lease periods for the callback to resolve itself, and then
try destroying the existing layout.

Only if this was an attempt at getting a first layout, do we
give up altogether, as the server is clearly crazy.

Fixes: 183d9e7b ("pnfs: rework LAYOUTGET retry handling")
Cc: stable@vger.kernel.org # 4.7
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>

66b53f32

pNFS: Separate handling of NFS4ERR_LAYOUTTRYLATER and RECALLCONFLICT · e85d7ee4

Trond Myklebust authored Jul 14, 2016

They are not the same error, and need to be handled differently.

Fixes: 183d9e7b ("pnfs: rework LAYOUTGET retry handling")
Cc: stable@vger.kernel.org # 4.7
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>

e85d7ee4

pNFS: Fix post-layoutget error handling in pnfs_update_layout() · 56b38a1f

Trond Myklebust authored Jul 14, 2016

The non-retry error path is currently broken and ends up releasing the
reference to the layout twice. It also can end up clearing the
NFS_LAYOUT_FIRST_LAYOUTGET flag twice, causing a race.

In addition, the retry path will fail to decrement the plh_outstanding
counter.

Fixes: 183d9e7b ("pnfs: rework LAYOUTGET retry handling")
Cc: stable@vger.kernel.org # 4.7
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>

56b38a1f

18 Jul, 2016 1 commit

pNFS: Don't mark the inode as revalidated if a LAYOUTCOMMIT is outstanding · 10b7e9ad

Trond Myklebust authored Jul 18, 2016

We know that the attributes will need updating if there is still a
LAYOUTCOMMIT outstanding.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

10b7e9ad

16 Jul, 2016 1 commit

SUNRPC: Fix infinite looping in rpc_clnt_iterate_for_each_xprt · bdc54d8e

Trond Myklebust authored Jul 16, 2016

If there were fewer than 2 entries in the multipath list, then
xprt_iter_next_entry_multiple() would never advance beyond the
first entry, which is correct for round robin behaviour, but not
for the list iteration.

The end result would be infinite looping in rpc_clnt_iterate_for_each_xprt()
as we would never see the xprt == NULL condition fulfilled.
Reported-by: Oleg Drokin <green@linuxhacker.ru>
Fixes: 80b14d5e ("SUNRPC: Add a structure to track multiple transports")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

bdc54d8e
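
The difference between the two behaviours can be sketched with a plain singly linked list. The types and function names below are hypothetical and much simpler than the SUNRPC transport iterators; the sketch only shows why a round-robin step can never terminate a walk that waits for NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; next is NULL at the tail. */
struct xprt_node {
    int id;
    struct xprt_node *next;
};

/* Round-robin step: wraps back to the head after the tail, so it
 * never returns NULL for a non-empty list. */
const struct xprt_node *next_round_robin(const struct xprt_node *head,
                                         const struct xprt_node *cur)
{
    assert(head != NULL);
    if (cur == NULL || cur->next == NULL)
        return head;
    return cur->next;
}

/* Iteration step: starts at the head and returns NULL after the tail. */
const struct xprt_node *next_linear(const struct xprt_node *head,
                                    const struct xprt_node *cur)
{
    if (cur == NULL)
        return head;
    return cur->next;
}

/* Walk the list until NULL; this terminates only with next_linear(). */
size_t count_entries(const struct xprt_node *head)
{
    size_t n = 0;
    const struct xprt_node *cur = NULL;

    while ((cur = next_linear(head, cur)) != NULL)
        n++;
    return n;
}
```

On a one-entry list, `next_round_robin()` keeps returning the same node, whereas `next_linear()` returns NULL after the first step, so a loop that waits for NULL finishes.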

14 Jul, 2016 1 commit

NFSv4: Revert "Truncating file opens should also sync O_DIRECT writes" · 8b7d9d09

Trond Myklebust authored Jul 14, 2016

We're not holding any locks, so both nfs_wb_all() and inode_dio_wait()
are unenforceable and have livelock potential. Just limit ourselves to
flushing out the data.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

8b7d9d09

05 Jul, 2016 19 commits

NFS nfs_vm_page_mkwrite: Don't freeze me, Bro... · 9a773e7c

Trond Myklebust authored Jun 23, 2016

Prevent filesystem freezes while handling the write page fault.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

9a773e7c

NFSv4.2: llseek(SEEK_HOLE) and llseek(SEEK_DATA) don't require data sync · e95fc4a0

Trond Myklebust authored Jun 25, 2016

We want to ensure that we write the cached data to the server, but
don't require it be synced to disk. If the server reboots, we will
get a stateid error, which will cause us to retry anyway.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

e95fc4a0

NFSv4.2: Fix writeback races in nfs4_copy_file_range · 837bb1d7

Trond Myklebust authored Jun 25, 2016

We need to ensure that any writes to the destination file are serialised
with the copy, meaning that the writeback has to occur under the inode lock.

Also relax the writeback requirement on the source, and rely on the
stateid checking to tell us if the source rebooted. Add the helper
nfs_filemap_write_and_wait_range() to call pnfs_sync_inode() as
is appropriate for pNFS servers that may need a layoutcommit.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

837bb1d7

NFSv4.2: Fix a race in nfs42_proc_deallocate() · 1e564d3d

Trond Myklebust authored Jun 25, 2016

When punching holes in a file, we want to ensure the operation is
serialised w.r.t. other writes, meaning that we want to call
nfs_sync_inode() while holding the inode lock.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

1e564d3d

NFS: Getattr doesn't require data sync semantics · 79566ef0

Trond Myklebust authored Jun 25, 2016

When retrieving stat() information, NFS unfortunately does require us to
sync writes to disk in order to ensure that mtime and ctime are up to
date. However we shouldn't have to ensure that those writes are persisted.

Relaxing that requirement does mean that we may see an mtime/ctime change
if the server reboots and forces us to replay all writes.

The exception to this rule is pNFS clients that are required to send
layoutcommit; however, that is dealt with by the call to pnfs_sync_inode()
in _nfs_revalidate_inode().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

79566ef0

NFS: Do not aggressively cache file attributes in the case of O_DIRECT · 651b0e70

Trond Myklebust authored Jun 25, 2016

A file that is open for O_DIRECT is by definition not obeying
close-to-open cache consistency semantics, so let's not cache
the attributes too aggressively either.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

651b0e70

NFS: Remove unused function nfs_revalidate_mapping_protected() · be527494

Trond Myklebust authored Jun 22, 2016

Clean up...
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

be527494

NFS: Remove redundant waits for O_DIRECT in fsync() and write_begin() · f508d46a

Trond Myklebust authored Jun 23, 2016

We're now waiting immediately after taking the locks, so waiting
in fsync() and write_begin() is either redundant or potentially
subject to livelock (if not holding the lock).
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

f508d46a

NFS: Cleanup nfs_direct_complete() · f7b5c340

Trond Myklebust authored Jun 23, 2016

There is only one caller that sets the "write" argument to true,
so just move the call to nfs_zap_mapping() and get rid of the
now redundant argument.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

f7b5c340

NFS: Do not serialise O_DIRECT reads and writes · a5864c99

Trond Myklebust authored Jun 03, 2016

Allow dio requests to be scheduled in parallel, while ensuring that they
do not conflict with buffered I/O.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

a5864c99

NFS: Move buffered I/O locking into nfs_file_write() · 18290650

Trond Myklebust authored Jun 23, 2016

Preparation for the patch that de-serialises O_DIRECT reads and
writes.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

18290650

NFS Cleanup: move call to generic_write_checks() into fs/nfs/direct.c · 89698b24

Trond Myklebust authored Jun 23, 2016

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

89698b24

NFS: Remove racy size manipulations in O_DIRECT · 2f3c7d87

Trond Myklebust authored Jun 22, 2016

On success, the RPC callbacks will ensure that we make the appropriate calls
to nfs_writeback_update_inode()
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

2f3c7d87

NFS: Ensure we reset the write verifier 'committed' value on resend. · a5314a74

Trond Myklebust authored Jun 01, 2016

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

a5314a74

NFS: Fix O_DIRECT verifier problems · 8fc3c386

Trond Myklebust authored Jun 01, 2016

We should not be interested in looking at the value of the stable field,
since that could take any value.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

8fc3c386

pNFS: pnfs_layoutcommit_outstanding() is no longer used when !CONFIG_NFS_V4_1 · 67120077

Trond Myklebust authored Jul 05, 2016

Cleanup...
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

67120077

pNFS: Ensure we layoutcommit before revalidating attributes · ac46bd37

Trond Myklebust authored Jul 05, 2016

If we need to update the cached attributes, then we'd better make
sure that we also layoutcommit first. Otherwise, the server may have stale
attributes.

Prior to this patch, the revalidation code tried to "fix" this problem by
simply disabling attributes that would be affected by the layoutcommit.
That approach breaks nfs_writeback_check_extend(), leading to a file size
corruption.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

ac46bd37

pNFS: Files and flexfiles always need to commit before layoutcommit · 2e18d4d8

Trond Myklebust authored Jun 26, 2016

So ensure that we mark the layout for commit once the write is done,
and then ensure that the commit to ds is finished before sending
layoutcommit.

Note that by doing this, we're able to optimise away the commit
for the case of servers that don't need layoutcommit in order to
return updated attributes.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

2e18d4d8

pNFS/flexfiles: Clean up calls to pnfs_set_layoutcommit() · bc28e1c2

Trond Myklebust authored Jun 26, 2016

Let's just have one place where we check ff_layout_need_layoutcommit().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

bc28e1c2