Commits · 8154ef2776aa512a3eaa0e7db030dc4803354d61 · Kirill Smelkov / linux

03 Apr, 2018 19 commits

NFSD: Clean up legacy NFS WRITE argument XDR decoders · 8154ef27

Chuck Lever authored Mar 27, 2018

Move common code in NFSD's legacy NFS WRITE decoders into a helper.
The immediate benefit is reduction of code duplication and some nice
micro-optimizations (see below).

In the long term, this helper can perform a per-transport call-out
to fill the rq_vec (say, using RDMA Reads).

The legacy WRITE decoders and procs are changed to work like NFSv4,
which constructs the rq_vec just before it is about to call
vfs_writev.

Why? Calling a transport call-out from the proc instead of the XDR
decoder means that the incoming FH can be resolved to a particular
filesystem and file. This would allow pages from the backing file to
be presented to the transport to be filled, rather than presenting
anonymous pages and copying or flipping them into the file's page
cache later.

I also prefer using the pages in rq_arg.pages, instead of pulling
the data pages directly out of the rqstp::rq_pages array. This is
currently the way the NFSv3 write decoder works, but the other two
do not seem to take this approach. Fixing this removes the only
reference to rq_pages found in NFSD, eliminating an NFSD assumption
about how transports use the pages in rq_pages.

Lastly, avoid setting up the first element of rq_vec as a zero-
length buffer. This happens with an RDMA transport when a normal
Read chunk is present because the data payload is in rq_arg's
page list (none of it is in the head buffer).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

8154ef27

nfsd: Trace NFSv4 COMPOUND execution · fff4080b

Chuck Lever authored Mar 27, 2018

This helps record the identity and timing of the ops in each NFSv4
COMPOUND, replacing dprintk calls that did much the same thing.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

fff4080b

nfsd: Add I/O trace points in the NFSv4 read proc · 87c5942e

Chuck Lever authored Mar 28, 2018

NFSv4 read compound processing invokes nfsd_splice_read and
nfs_readv directly, so the trace points currently in nfsd_read are
not invoked for NFSv4 reads.

Move the NFSD READ trace points to common helpers so that NFSv4
reads are captured.

Also, record any local I/O error that occurs, the total count of
bytes that were actually returned, and whether splice or vectored
read was used.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

87c5942e

nfsd: Add I/O trace points in the NFSv4 write path · d890be15

Chuck Lever authored Mar 27, 2018

NFSv4 write compound processing invokes nfsd_vfs_write directly. The
trace points currently in nfsd_write are not effective for NFSv4
writes.

Move the trace points into the shared nfsd_vfs_write() helper.

After the I/O, we also want to record any local I/O error that
might have occurred, and the total count of bytes that were actually
moved (rather than the requested number).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

d890be15

nfsd: Add "nfsd_" to trace point names · f394b62b

Chuck Lever authored Mar 27, 2018

Follow naming convention used in client and in sunrpc layers.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

f394b62b

nfsd: Record request byte count, not count of vectors · 79e0b4e2

Chuck Lever authored Mar 27, 2018

Byte count is more helpful to know than vector count.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

79e0b4e2

nfsd: Fix NFSD trace points · afa720a0

Chuck Lever authored Mar 27, 2018

nfsd-1915  [003] 77915.780959: write_opened:
	[FAILED TO PARSE] xid=3286130958 fh=0 offset=154624 len=1
nfsd-1915  [003] 77915.780960: write_io_done:
	[FAILED TO PARSE] xid=3286130958 fh=0 offset=154624 len=1
nfsd-1915  [003] 77915.780964: write_done:
	[FAILED TO PARSE] xid=3286130958 fh=0 offset=154624 len=1

Byte swapping and knfsd_fh_hash() are not available in "trace-cmd
report", where the print format string is actually used. These
data transformations have to be done during the TP_fast_assign step.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

afa720a0

svc: Report xprt dequeue latency · 55f5088c

Chuck Lever authored Mar 27, 2018

Record the time between when a rqstp is enqueued on a transport
and when it is dequeued. This includes how long the rqstp waits on
the queue and how long it takes the kernel scheduler to wake a
nfsd thread to service it.

The svc_xprt_dequeue trace point is altered to include the number
of microseconds between xprt_enqueue and xprt_dequeue.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

55f5088c

sunrpc: Report per-RPC execution stats · aaba72cd

Chuck Lever authored Mar 27, 2018

Introduce a mechanism to report the server-side execution latency of
each RPC. The goal is to enable user space to filter the trace
record for latency outliers, build histograms, etc.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

aaba72cd

sunrpc: Re-purpose trace_svc_process · 0b9547bf

Chuck Lever authored Mar 27, 2018

Currently, trace_svc_process has two call sites:

1. Just after a call to svc_send. svc_send already invokes
   trace_svc_send with the same arguments just before returning

2. Just before a call to svc_drop. svc_drop already invokes
   trace_svc_drop with the same arguments just after it is called

Therefore trace_svc_process does not provide any additional
information not already provided by these other trace points.

However, it would be useful to record the incoming RPC procedure.
So reuse trace_svc_process for this purpose.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

0b9547bf

sunrpc: Save remote presentation address in svc_xprt for trace events · ece200dd

Chuck Lever authored Mar 27, 2018

TP_printk defines a format string that is passed to user space for
converting raw trace event records to something human-readable.

My user space's printf (Oracle Linux 7), however, does not have a
%pI format specifier. The result is that what is supposed to be an
IP address in the output of "trace-cmd report" is just a string that
says the field couldn't be displayed.

To fix this, adopt the same approach as the client: maintain a pre-
formated presentation address for occasions when %pI is not
available.

The location of the trace_svc_send trace point is adjusted so that
rqst->rq_xprt is not NULL when the trace event is recorded.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

ece200dd

sunrpc: Simplify trace_svc_recv · 41f306d0

Chuck Lever authored Mar 27, 2018

There doesn't seem to be a lot of value in calling trace_svc_recv
in the failing case.

1. There are two very common cases: one is the transport is not
ready, and the other is shutdown. Neither is terribly interesting.

2. The trace record for the failing case contains nothing but
the status code.

Therefore the trace point call site in the error exit is removed.
Since the trace point is now recording a length instead of a
status, rename the status field and remove the case that records a
zero XID.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

41f306d0

sunrpc: Simplify do_enqueue tracing · 7dbb53ba

Chuck Lever authored Mar 27, 2018

There are three cases where svc_xprt_do_enqueue() returns without
waking an nfsd thread:

1. There is no work to do

2. The transport is already busy

3. There are no available nfsd threads

Only 3. is truly interesting. Move the trace point so it records
that there was work to do and either an nfsd thread was awoken, or
a free one could not found.

As an additional clean up, remove a redundant comment and a couple
of dprintk call sites.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

7dbb53ba

sunrpc: Move trace_svc_xprt_dequeue() · caa3e106

Chuck Lever authored Mar 27, 2018

Reduce the amount of noise generated by trace_svc_xprt_dequeue by
moving it to the end of svc_get_next_xprt. This generates exactly
one trace event when a ready xprt is found, rather than spurious
events when there is no work to do. The empty events contain no
information that can't be obtained simply by tracing function calls
to svc_xprt_dequeue.

A small additional benefit is simplification of the svc_xprt_event
trace class, which no longer has to handle the case when the @xprt
parameter is NULL.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

caa3e106

sunrpc: Update show_svc_xprt_flags() to include recently added flags · 03edb90f

Chuck Lever authored Mar 27, 2018

XPT_KILL_TEMP was added by commit 546125d1 ("sunrpc: don't call
sleeping functions from the notifier block callbacks"), and
XPT_CONG_CTRL was added by commit 362142b2 ("sunrpc: flag
transports as having congestion control") .
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

03edb90f

svc: Simplify ->xpo_secure_port · 989f881e

Chuck Lever authored Mar 27, 2018

Clean up: Instead of returning a value that is used to set or clear
a bit, just make ->xpo_secure_port mangle that bit, and return void.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

989f881e

sunrpc: Remove unneeded pointer dereference · 63a1b156

Chuck Lever authored Mar 27, 2018

Clean up: Noticed during code inspection that there is already a
local automatic variable "xprt" so dereferencing rqst->rq_xprt
again is unnecessary.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

63a1b156

nfsd: use correct enum type in decode_cb_op_status · 47299f79

Stefan Agner authored Mar 19, 2018

Use enum nfs_cb_opnum4 in decode_cb_op_status. This fixes warnings
seen with clang:
  fs/nfsd/nfs4callback.c:451:36: warning: implicit conversion from
      enumeration type 'enum nfs_cb_opnum4' to different enumeration
      type 'enum nfs_opnum4' [-Wenum-conversion]
        status = decode_cb_op_status(xdr, OP_CB_SEQUENCE, &cb->cb_seq_status);
                 ~~~~~~~~~~~~~~~~~~~      ^~~~~~~~~~~~~~
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

47299f79

nfsd: fix boolreturn.cocci warnings · 51d87bc2

Fengguang Wu authored Mar 22, 2018

fs/nfsd/nfs4state.c:926:8-9: WARNING: return of 0/1 in function 'nfs4_delegation_exists' with return type bool
fs/nfsd/nfs4state.c:2955:9-10: WARNING: return of 0/1 in function 'nfsd4_compound_in_session' with return type bool

 Return statements in functions returning bool should use
 true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci

Fixes: 68b18f52 ("nfsd: make nfs4_get_existing_delegation less confusing")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
[bfields: also fix -EAGAIN]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

51d87bc2

20 Mar, 2018 11 commits

nfsd: create a separate lease for each delegation · 353601e7

J. Bruce Fields authored Feb 16, 2018

Currently we only take one vfs-level delegation (lease) for each file,
no matter how many clients hold delegations on that file.

Let's instead keep a one-to-one mapping between NFSv4 delegations and
VFS delegations. This turns out to be simpler.

There is still a many-to-one mapping of NFS opens to NFS files, and the
delegations on one file are all associated with one struct file. The
VFS can still distinguish between these delegations since we're setting
fl_owner to the struct nfs4_delegation now, not to the shared file.

I'm replacing at least one complicated function wholesale, which I don't
like to do, but I haven't figured out how to do this more incrementally.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

353601e7

nfsd: move sc_file assignment into alloc_init_deleg · 86d29b10

J. Bruce Fields authored Feb 21, 2018

Take an easy chance to simplify the caller a little.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

86d29b10

nfsd: factor out common delegation-destruction code · 0af6e690

J. Bruce Fields authored Feb 21, 2018

Pull some duplicated code into a common helper.

This changes the order in destroy_delegation a little, but it looks to
me like that shouldn't matter.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

0af6e690

nfsd: make nfs4_get_existing_delegation less confusing · 68b18f52

J. Bruce Fields authored Feb 19, 2018

This doesn't "get" anything.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

68b18f52

nfsd4: dp->dl_stid.sc_file doesn't need locking · 0c911f54

J. Bruce Fields authored Feb 16, 2018

The delegation isn't visible to anyone yet.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

0c911f54

nfsd4: set fl_owner to delegation, not file pointer · 653e514e

J. Bruce Fields authored Feb 15, 2018

For now this makes no difference, as for files having delegations,
there's a one-to-one relationship between an nfs4_file and its
nfs4_delegation.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

653e514e

nfsd: simplify nfs4_put_deleg_lease calls · cba7b3d1

J. Bruce Fields authored Feb 15, 2018

Every single caller gets the file out of the delegation, so let's do
that once in nfs4_put_deleg_lease.

Plus we'll need it there for other reasons.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

cba7b3d1

nfsd: simplify put of fi_deleg_file · b8232d33

J. Bruce Fields authored Feb 15, 2018

fi_delegees is basically just a reference count on users of
fi_deleg_file, which is cleared when fi_delegees goes to zero.  The
fi_deleg_file check here is redundant.  Also add an assertion to make
sure we don't have unbalanced puts.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

b8232d33

svcrdma: Clean up rdma_build_arg_xdr · 6f29d07c

Chuck Lever authored Mar 20, 2018

Clean up: The value of the byte_count parameter is already passed
to rdma_build_arg_xdr as part of the svc_rdma_op_ctxt structure.

Further, without the parameter called "byte_count" there is no need
to have the abbreviated "bc" automatic variable. "bc" can now be
called something more intuitive.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

6f29d07c

svcrdma: Consult max_qp_init_rd_atom when accepting connections · 97cc3264

Chuck Lever authored Mar 20, 2018

The target needs to return the lesser of the client's Inbound RDMA
Read Queue Depth (IRD), provided in the connection parameters, and
the local device's Outbound RDMA Read Queue Depth (ORD). The latter
limit is max_qp_init_rd_atom, not max_qp_rd_atom.

The svcrdma_ord value caps the ORD value for iWARP transports, which
do not exchange ORD/IRD values at connection time. Since no other
Linux kernel RDMA-enabled storage target sees fit to provide this
cap, I'm removing it here too.

initiator_depth is a u8, so ensure the computed ORD value does not
overflow that field.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

97cc3264

svcrdma: Use pr_err to report Receive errors · 0c4398ff

Chuck Lever authored Mar 20, 2018

Clean up: Other completion handlers use pr_err, not pr_warn.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

0c4398ff

19 Mar, 2018 8 commits

nfsd: move nfs4_client allocation to dedicated slabcache · 9258a2d5

Jeff Layton authored Mar 16, 2018

On x86_64, it's 1152 bytes, so we can avoid wasting 896 bytes each.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

9258a2d5

nfsd: don't require low ports for gss requests · 9d7ed135

J. Bruce Fields authored Mar 08, 2018

In a traditional NFS deployment using auth_unix, the clients are trusted
to correctly report the credentials of their logged-in users. The
server assumes that only root on client machines is allowed to send
requests from low-numbered ports, so it can use the originating port
number to distinguish "real" NFS clients from NFS clients run by
ordinary users, to prevent ordinary users from spoofing credentials.

The originating port number on a gss-authenticated request is less
important. The authentication ties the request to a user, and we take
it as proof that that user authorized the request. The low port number
check no longer adds much.

So, don't enforce low port numbers in the auth_gss case.
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

9d7ed135

nfsd: remove unsused "cp_consecutive" field · edcc8452
J. Bruce Fields authored Mar 07, 2018
```
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
```
edcc8452

lockd: make nlm_ntf_refcnt and nlm_ntf_wq static · 554faf28

Colin Ian King authored Feb 07, 2018

The variables nlm_ntf_refcnt and nlm_ntf_wq are local to the source and
do not need to be in global scope, so make them static.

Cleans up sparse warnings:
fs/lockd/svc.c:60:10: warning: symbol 'nlm_ntf_refcnt' was not declared.
Should it be static?
fs/lockd/svc.c:61:1: warning: symbol 'nlm_ntf_wq' was not declared.
Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

554faf28

nfsd4: send the special close_stateid in v4.0 replies as well · bd2decac

Jeff Layton authored Feb 15, 2018

We already send it for v4.1, but RFC7530 also notes that the stateid in
the close reply is bogus.

Always send the special close stateid, even in v4.0 responses. No client
should put any meaning on it whatsoever. For now, we continue to
increment the stateid value, though that might not be necessary either.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

bd2decac

SUNRPC: cache: ignore timestamp written to 'flush' file. · 3b68e6ee

NeilBrown authored Feb 14, 2018

The interface for flushing the sunrpc auth cache was poorly
designed and has caused problems a number of times.

The design is that you write a timestamp, and all entries
created before that time are discarded.
The most obvious problem is that this is not what people
actually want.  They want to just flush the whole cache.
The 1-second granularity can be a problem, as can the use
of wall-clock time.

A current problem is that code will write the current time to
this file - expecting it to clear everything - and if the
seconds number ticks over before this timestamp is checked,
the test "then >= now" fails, and a full flush isn't forced.

So lets just drop the subtleties and always flush the whole
cache.  The worst this could do is impose an extra cost
refilling it, but that would require someone to be using
non-standard tools.

We still report an error if the string written is not a number,
but we cause any valid number to flush the whole cache.
Reported-by: "Wang, Alan 1. (NSB - CN/Hangzhou)" <alan.1.wang@nokia-sbell.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

3b68e6ee

sunrpc: Fix unaligned access on sparc64 · 90a9b147

James Ettle authored Jan 28, 2018

Fix unaligned access in gss_{get,verify}_mic_v2() on sparc64
Signed-off-by: James Ettle <james@ettle.org.uk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

90a9b147

nfsd: remove blocked locks on client teardown · 68ef3bc3

Jeff Layton authored Mar 16, 2018

We had some reports of panics in nfsd4_lm_notify, and that showed a
nfs4_lockowner that had outlived its so_client.

Ensure that we walk any leftover lockowners after tearing down all of
the stateids, and remove any blocked locks that they hold.

With this change, we also don't need to walk the nbl_lru on nfsd_net
shutdown, as that will happen naturally when we tear down the clients.

Fixes: 76d348fa (nfsd: have nfsd4_lock use blocking locks for v4.1+ locks)
Reported-by: Frank Sorenson <fsorenso@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # 4.9
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

68ef3bc3

19 Feb, 2018 1 commit
- Linux 4.16-rc2 · 91ab883e
  Linus Torvalds authored Feb 18, 2018
  
  91ab883e
18 Feb, 2018 1 commit

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0e06fb5b

Linus Torvalds authored Feb 18, 2018

Pull x86 Kconfig fixes from Thomas Gleixner:
 "Three patchlets to correct HIGHMEM64G and CMPXCHG64 dependencies in
  Kconfig when CPU selections are explicitely set to M586 or M686"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/Kconfig: Explicitly enumerate i686-class CPUs in Kconfig
  x86/Kconfig: Exclude i586-class CPUs lacking PAE support from the HIGHMEM64G Kconfig group
  x86/Kconfig: Add missing i586-class CPUs to the X86_CMPXCHG64 Kconfig group

0e06fb5b