Commits · b0e35fda827e72cf4b065b52c4c472c28c004fca · nexedi / linux

30 May, 2014 24 commits

nfsd4: turn off zero-copy-read in exotic cases · b0e35fda

J. Bruce Fields authored Feb 04, 2014

We currently allow only one read per compound, with operations before
and after whose responses will require no more than about a page to
encode.

While we don't expect clients to violate those limits any time soon,
this limitation isn't really condoned by the spec, so to future-proof
the server we should lift the limitation.

At the same time we'd like to continue to support zero-copy reads.

Supporting multiple zero-copy-reads per compound would require a new
data structure to replace struct xdr_buf, which can represent only one
set of included pages.

So for now we plan to modify encode_read() to support either zero-copy
or non-zero-copy reads, and use some heuristics at the start of the
compound processing to decide whether a zero-copy read will work.
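
As a rough illustration of what such a start-of-compound heuristic could look like, here is a minimal userspace sketch in C. It is not the kernel code: struct op_estimate, zero_copy_ok() and SKETCH_PAGE_SIZE are hypothetical names, and the rule applied (exactly one read, everything else fitting in about a page) is just the current limit described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed page size for this sketch only. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical, simplified view of one operation in a compound. */
enum op_kind { OP_OTHER, OP_READ };

struct op_estimate {
    enum op_kind kind;
    size_t max_reply;   /* estimated worst-case encoded size, in bytes */
};

/*
 * Return true if the compound looks like the "normal case": exactly one
 * read, and the estimated replies of all other operations together fit
 * in about a page.  Anything else falls back to a non-zero-copy read.
 */
bool zero_copy_ok(const struct op_estimate *ops, size_t nops)
{
    size_t reads = 0, other_bytes = 0;
    size_t i;

    for (i = 0; i < nops; i++) {
        if (ops[i].kind == OP_READ)
            reads++;
        else
            other_bytes += ops[i].max_reply;
    }
    return reads == 1 && other_bytes <= SKETCH_PAGE_SIZE;
}
```

The decision is made once, before any operation is encoded, so encode_read() only has to pick one of its two paths.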

This will allow us to support more exotic compounds without introducing
a performance regression in the normal case.

Later patches handle those "exotic compounds"; this one just makes sure
zero-copy is turned off in those cases.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

b0e35fda

nfsd4: estimate sequence response size · ccae70a9

J. Bruce Fields authored Mar 23, 2014

Otherwise a following patch would turn off all 4.1 zero-copy reads.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

ccae70a9

nfsd4: better estimate of getattr response size · b86cef60

J. Bruce Fields authored Mar 23, 2014

We plan to use this estimate to decide whether or not to allow zero-copy
reads. Currently we're assuming all getattr's are a page, which can be
both too small (ACLs e.g. may be arbitrarily long) and too large (after
an upcoming read patch this will unnecessarily prevent zero copy reads
in any read compound also containing a getattr).
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

b86cef60

nfsd4: don't treat readlink like a zero-copy operation · 476a7b1f

J. Bruce Fields authored Jan 20, 2014

There's no advantage to this zero-copy-style readlink encoding, and it
unnecessarily limits the kinds of compounds we can handle. (In practice
I can't see why a client would want e.g. multiple readlink calls in a
compound, but it's probably a spec violation for us not to handle it.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

476a7b1f

nfsd4: enforce rd_dircount · 3b299709

J. Bruce Fields authored Mar 20, 2014

As long as we're here, let's enforce the protocol's limit on the number
of directory entries to return in a readdir.

I don't think anyone's ever noticed our lack of enforcement, but maybe
there's more of a chance they will now that we allow larger readdirs.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

3b299709

nfsd4: allow large readdirs · 561f0ed4

J. Bruce Fields authored Jan 20, 2014

Currently we limit readdir results to a single page.  This can result in
a performance regression compared to NFSv3 when reading large
directories.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

561f0ed4

nfsd4: use session limits to release send buffer reservation · 32aaa62e

J. Bruce Fields authored Mar 20, 2014

Once we know the limits the session places on the size of the rpc, we
can also use that information to release any unnecessary reserved reply
buffer space.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

32aaa62e

nfsd4: adjust buflen to session channel limit · 47ee5298

J. Bruce Fields authored Mar 12, 2014

We can simplify session limit enforcement by restricting the xdr buflen
to the session size.

Also fix a preexisting bug: we should really have been taking into
account the auth-required space when comparing against session limits,
which are limits on the size of the entire rpc reply, including any krb5
overhead.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
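
The arithmetic behind this can be sketched in userspace C as below. This is only an illustration under assumptions: struct toy_buf and clamp_to_session() are hypothetical names, and auth_slack stands in for whatever space the security flavor needs.

```c
#include <stddef.h>

/* Toy stand-in for the encode buffer; only the field this sketch needs. */
struct toy_buf {
    size_t buflen;   /* maximum number of bytes the encoder may produce */
};

/*
 * Clamp buflen so that the NFS-level reply plus the auth overhead stays
 * within the session's limit on the size of the whole rpc reply.
 */
void clamp_to_session(struct toy_buf *buf, size_t session_max, size_t auth_slack)
{
    /* Space left for NFS-level data once auth overhead is set aside. */
    size_t limit = session_max > auth_slack ? session_max - auth_slack : 0;

    if (buf->buflen > limit)
        buf->buflen = limit;
}
```

Once buflen is clamped this way, an ordinary out-of-space failure during encoding doubles as session limit enforcement.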

47ee5298

rpc: define xdr_restrict_buflen · db3f58a9

J. Bruce Fields authored Mar 06, 2014

With this, xdr_reserve_space can help us enforce various limits.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

db3f58a9

nfsd4: fix buflen calculation after read encoding · 30596768

J. Bruce Fields authored May 19, 2014

We don't necessarily want to assume that the buflen is the same
as the number of bytes available in the pages.  We may have some reason
to set it to something less (for example, later patches will use a
smaller buflen to enforce session limits).

So, calculate the buflen relative to the previous buflen instead of
recalculating it from scratch.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

30596768

nfsd4: nfsd4_check_resp_size should check against whole buffer · 89ff884e

J. Bruce Fields authored Mar 11, 2014

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

89ff884e

nfsd4: minor encode_read cleanup · 6ff9897d

J. Bruce Fields authored Mar 11, 2014

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

6ff9897d

nfsd4: more precise nfsd4_max_reply · 4f0cefbf

J. Bruce Fields authored Mar 11, 2014

It will turn out to be useful to have a more accurate estimate of reply
size; so, piggyback on the existing op reply-size estimators.

Also move nfsd4_max_reply to nfs4proc.c to get easier access to struct
nfsd4_operation and friends.  (Thanks to Christoph Hellwig for pointing
out that simplification.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

4f0cefbf

nfsd4: don't try to encode conflicting owner if low on space · 8c7424cf

J. Bruce Fields authored Mar 10, 2014

I ran into this corner case in testing: in theory clients can provide
state owners up to 1024 bytes long.  In the sessions case there might be
a risk of this pushing us over the DRC slot size.

The conflicting owner isn't really that important, so let's humor a
client that provides a small maxresponsesize_cached by allowing ourselves
to return without the conflicting owner instead of outright failing the
operation.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

8c7424cf

nfsd4: convert 4.1 replay encoding · f5236013

J. Bruce Fields authored Mar 21, 2014

Limits on maxresp_sz mean that we only ever need to replay rpc's that
are contained entirely in the head.

The one exception is very small zero-copy reads.  That's an odd corner
case as clients wouldn't normally ask those to be cached.

In any case, this seems a little more robust.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

f5236013

nfsd4: allow encoding across page boundaries · 2825a7f9

J. Bruce Fields authored Aug 26, 2013

After this we can handle for example getattr of very large ACLs.

Read, readdir, readlink are still special cases with their own limits.

Also we can't handle a new operation starting close to the end of a
page.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

2825a7f9

nfsd4: size-checking cleanup · a8095f7e

J. Bruce Fields authored Mar 11, 2014

Better variable name, some comments, etc.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

a8095f7e

nfsd4: remove redundant encode buffer size checking · ea8d7720

J. Bruce Fields authored Mar 08, 2014

Now that all op encoders can handle running out of space, we no longer
need to check the remaining size for every operation; only nonidempotent
operations need that check, and that can be done by
nfsd4_check_resp_size.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

ea8d7720

nfsd4: nfsd4_check_resp_size needn't recalculate length · 67492c99

J. Bruce Fields authored Mar 08, 2014

We're keeping the length updated as we go now, so there's no need for
the extra calculation here.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

67492c99

nfsd4: reserve space before inlining 0-copy pages · 4e21ac4b

J. Bruce Fields authored Mar 22, 2014

Once we've included page-cache pages in the encoding it's difficult to
remove them and restart encoding.  (xdr_truncate_encode doesn't handle
that case.)  So, make sure we'll have adequate space to finish the
operation first.

For now COMPOUND_SLACK_SPACE checks should prevent this case happening,
but we want to remove those checks.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

4e21ac4b

nfsd4: teach encoders to handle reserve_space failures · d0a381dd

J. Bruce Fields authored Jan 30, 2014

We've tried to prevent running out of space with COMPOUND_SLACK_SPACE
and special checking in those operations (getattr) whose result can vary
enormously.

However:
	- COMPOUND_SLACK_SPACE may be difficult to maintain as we add
	  more protocol.
	- BUG_ON or page faulting on failure seems overly fragile.
	- Especially in the 4.1 case, we prefer not to fail compounds
	  just because the returned result came *close* to session
	  limits.  (Though perfect enforcement here may be difficult.)
	- I'd prefer encoding to be uniform for all encoders instead of
	  having special exceptions for encoders containing, for
	  example, attributes.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

d0a381dd

nfsd4: "backfill" using write_bytes_to_xdr_buf · 082d4bd7

J. Bruce Fields authored Aug 29, 2013

Normally xdr encoding proceeds in a single pass from start of a buffer
to end, but sometimes we have to write a few bytes to an earlier
position.

Use write_bytes_to_xdr_buf for these cases rather than saving a pointer
to write to.  We plan to rewrite xdr_reserve_space to handle encoding
across page boundaries using a scratch buffer, and don't want to risk
writing to a pointer that was contained in a scratch buffer.

Also it will no longer be safe to calculate lengths by subtracting two
pointers, so use xdr_buf offsets instead.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
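
A minimal userspace sketch of the backfill pattern follows. It does not use the kernel API: struct toy_stream, toy_write_at() and encode_counted() are hypothetical, with toy_write_at() playing the role that write_bytes_to_xdr_buf plays in the patch. The point is that the earlier position is remembered as an offset, and the length is computed from offsets rather than from two pointers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy encode stream; invariant: len <= sizeof(data). */
struct toy_stream {
    unsigned char data[64];
    size_t len;              /* bytes encoded so far */
};

/* Overwrite n bytes at an earlier offset (the "backfill"). */
void toy_write_at(struct toy_stream *s, size_t off, const void *src, size_t n)
{
    memcpy(s->data + off, src, n);
}

/* Encode a 4-byte big-endian count followed by the blob; 0 or -1. */
int encode_counted(struct toy_stream *s, const void *blob, size_t n)
{
    size_t space = sizeof(s->data) - s->len;
    size_t count_off, start;
    uint32_t count;
    unsigned char be[4];

    if (n > space || space - n < 4)
        return -1;

    count_off = s->len;      /* remember an offset, not a pointer */
    s->len += 4;             /* leave room for the count */
    start = s->len;
    memcpy(s->data + s->len, blob, n);
    s->len += n;

    count = (uint32_t)(s->len - start);   /* length from offsets */
    be[0] = (unsigned char)(count >> 24);
    be[1] = (unsigned char)(count >> 16);
    be[2] = (unsigned char)(count >> 8);
    be[3] = (unsigned char)count;
    toy_write_at(s, count_off, be, sizeof(be));
    return 0;
}
```

Because nothing holds on to a raw pointer between the reservation and the backfill, it would not matter if the bytes in between had been staged through a scratch buffer.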

082d4bd7

nfsd4: use xdr_truncate_encode · 1fcea5b2

J. Bruce Fields authored Feb 26, 2014

Now that lengths are reliable, we can use xdr_truncate_encode instead of
open-coding it everywhere.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

1fcea5b2

rpc: xdr_truncate_encode · 3e19ce76

J. Bruce Fields authored Feb 25, 2014

This will be used in the server side in a few cases:
	- when certain operations (read, readdir, readlink) fail after
	  encoding a partial response.
	- when we run out of space after encoding a partial response.
	- in readlink, where we initially reserve PAGE_SIZE bytes for
	  data, then truncate to the actual size.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
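
The readlink-style use (reserve the maximum, then truncate to the actual size) and the failure case can be sketched in userspace C as follows. None of this is the sunrpc API: struct toy_enc, toy_reserve(), toy_truncate(), encode_link() and TOY_MAX_LINK are hypothetical, and XDR padding is ignored to keep the example short.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_LINK 64u     /* stands in for PAGE_SIZE in this sketch */

/* Toy encoder; invariant: len <= sizeof(data). */
struct toy_enc {
    unsigned char data[128];
    size_t len;
};

/* Reserve n bytes at the end; NULL if they do not fit. */
unsigned char *toy_reserve(struct toy_enc *e, size_t n)
{
    unsigned char *p;

    if (n > sizeof(e->data) - e->len)
        return NULL;
    p = e->data + e->len;
    e->len += n;
    return p;
}

/* Shrink the encoding back to len bytes; never grows it. */
void toy_truncate(struct toy_enc *e, size_t len)
{
    if (len < e->len)
        e->len = len;
}

/* Encode a link target: reserve the maximum, then trim; 0 or -1. */
int encode_link(struct toy_enc *e, const char *target)
{
    size_t start = e->len;
    size_t n = strlen(target);
    unsigned char *p = toy_reserve(e, TOY_MAX_LINK);

    if (!p)
        return -1;
    if (n > TOY_MAX_LINK) {
        toy_truncate(e, start);      /* failure: undo the reservation */
        return -1;
    }
    memcpy(p, target, n);
    toy_truncate(e, start + n);      /* success: keep only what was used */
    return 0;
}
```

The same truncate call covers both the error path and the trim-to-size path, which is what makes a shared helper worthwhile.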

3e19ce76

28 May, 2014 5 commits

nfsd4: keep xdr buf length updated · 6ac90391

J. Bruce Fields authored Feb 26, 2014

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

6ac90391

nfsd4: no need for encode_compoundres to adjust lengths · dd97fdde

J. Bruce Fields authored Feb 26, 2014

xdr_reserve_space should now be calculating the length correctly as we
go, so there's no longer any need to fix it up here.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

dd97fdde

nfsd4: remove ADJUST_ARGS · f46d382a

J. Bruce Fields authored Jan 31, 2014

It's just uninteresting debugging code at this point.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

f46d382a

nfsd4: use xdr_stream throughout compound encoding · d3f627c8

J. Bruce Fields authored Feb 26, 2014

Note this makes ADJUST_ARGS useless; we'll remove it in the following
patch.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

d3f627c8

nfsd4: use xdr_reserve_space in attribute encoding · ddd1ea56

J. Bruce Fields authored Aug 27, 2013

This is a cosmetic change for now; no change in behavior.

Note we're just depending on xdr_reserve_space to do the bounds checking
for us; we're not really depending on its adjustment of iovec or xdr_buf
lengths yet, as those are fixed up as necessary after the fact by
read-like operations and by nfs4svc_encode_compoundres.  However we do
have to update xdr->iov on read-like operations to prevent
xdr_reserve_space from messing with the already-fixed-up length of the
head.

When the attribute encoding fails partway through we have to undo the
length adjustments made so far.  We do it manually for now, but later
patches will add an xdr_truncate_encode() helper to handle cases like
this.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

ddd1ea56

27 May, 2014 2 commits

nfsd4: allow space for final error return · 5f4ab945

J. Bruce Fields authored Mar 07, 2014

This post-encoding check should be taking into account the need to
encode at least an out-of-space error to the following op (if any).
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

5f4ab945

nfsd4: fix encoding of out-of-space replies · 07d1f802

J. Bruce Fields authored Mar 06, 2014

If nfsd4_check_resp_size() returns an error then we should really be
truncating the reply here, otherwise we may leave extra garbage at the
end of the rpc reply.

Also add a warning to catch any cases where our reply-size estimates may
be wrong in the case of a non-idempotent operation.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

07d1f802

23 May, 2014 9 commits

nfsd4: reserve head space for krb5 integ/priv info · 1802a678

J. Bruce Fields authored Jan 21, 2014

Currently if the nfs-level part of a reply would be too large, we'll
return an error to the client.  But if the nfs-level part fits and
leaves no room for krb5p or krb5i stuff, then we just drop the request
entirely.

That's no good.  Instead, reserve some slack space at the end of the
buffer and make sure we fail outright if we'd come close.

The slack space here is a massive overestimate of what's required; we
should probably try for a tighter limit at some point.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
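
The idea of failing the NFS-level encoding early, rather than dropping the request later, can be sketched in userspace C like this. The names struct toy_reply, reserve_nfs() and TOY_AUTH_SLACK are hypothetical, and the slack value is arbitrary rather than anything the kernel uses.

```c
#include <stddef.h>

#define TOY_AUTH_SLACK 64u   /* hypothetical tail space kept for krb5i/krb5p */

struct toy_reply {
    size_t len;    /* NFS-level bytes encoded so far */
    size_t size;   /* total size of the reply buffer */
};

/*
 * Reserve n bytes for NFS-level data, refusing to eat into the slack at
 * the end of the buffer.  Returns 0 on success, -1 if the reply would
 * come too close to the end.
 */
int reserve_nfs(struct toy_reply *r, size_t n)
{
    size_t usable = r->size > TOY_AUTH_SLACK ? r->size - TOY_AUTH_SLACK : 0;

    if (r->len > usable || n > usable - r->len)
        return -1;
    r->len += n;
    return 0;
}
```

A -1 here can be turned into an error the client actually sees, which is the behavior the commit is after.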

1802a678

nfsd4: move proc_compound xdr encode init to helper · 2d124dfa

J. Bruce Fields authored Jan 15, 2014

Mechanical transformation with no change of behavior.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

2d124dfa

nfsd4: tweak nfsd4_encode_getattr to take xdr_stream · d5184658

J. Bruce Fields authored Aug 26, 2013

Just change the nfsd4_encode_getattr api.  Not changing any code or
adding any new functionality yet.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

d5184658

nfsd4: embed xdr_stream in nfsd4_compoundres · 4aea24b2

J. Bruce Fields authored Jan 15, 2014

This is a mechanical transformation with no change in behavior.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

4aea24b2

nfsd4: decoding errors can still be cached and require space · e372ba60

J. Bruce Fields authored May 19, 2014

Currently a non-idempotent op reply may be cached if it fails in the
proc code but not if it fails at xdr decoding.  I doubt there are any
xdr-decoding-time errors that would make this a problem in practice, so
this probably isn't a serious bug.

The space estimates should also take into account space required for
encoding of error returns.  Again, not a practical problem, though it
would become one after future patches which will tighten the space
estimates.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

e372ba60

nfsd4: fix write reply size estimate · f34e432b

J. Bruce Fields authored May 16, 2014

The write reply also includes count and stable_how.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

f34e432b

nfsd4: read size estimate should include padding · 622f560e

J. Bruce Fields authored May 16, 2014

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

622f560e

nfsd4: allow larger 4.1 session drc slots · 24906f32

J. Bruce Fields authored Mar 12, 2014

The client is actually asking for 2532 bytes. I suspect that's a
mistake. But maybe we can allow some more. In theory lock needs more
if it might return a maximum-length lockowner in the denied case.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

24906f32

nfsd4: READ, READDIR, etc., are idempotent · 5b648699

J. Bruce Fields authored Mar 07, 2014

OP_MODIFIES_SOMETHING flags operations that we should be careful not to
initiate without being sure we have the buffer space to encode a reply.

None of these ops fall into that category.

We could probably remove a few more, but this isn't a very important
problem at least for ops whose reply size is easy to estimate.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
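
What the flag buys us can be sketched in userspace C as below. This is a simplified model, not the nfsd code: TOY_MODIFIES is a hypothetical flag modeled on OP_MODIFIES_SOMETHING, and struct toy_op and may_start_op() are made up for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MODIFIES 0x1u    /* hypothetical; modeled on OP_MODIFIES_SOMETHING */

struct toy_op {
    unsigned int flags;
    size_t max_reply;        /* estimated worst-case reply size, in bytes */
};

/*
 * Decide whether it is safe to start an operation given the space left
 * in the reply buffer.  Only operations that modify something need the
 * up-front check; for the rest, running out of space during encoding
 * can simply be reported as an error.
 */
bool may_start_op(const struct toy_op *op, size_t space_left)
{
    if (!(op->flags & TOY_MODIFIES))
        return true;
    return op->max_reply <= space_left;
}
```

With READ and READDIR unflagged, a large read near the end of the buffer is attempted and fails cleanly if it does not fit, instead of being refused on the strength of an estimate.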

5b648699