Commits · 0a93a47f042c459f0f46942c3a920e3c81878031 · nexedi / linux

01 Jun, 2009 2 commits

NFSv4: kill off complicated macro 'PROC' · 0a93a47f

Yu Zhiguo authored May 19, 2009

J. Bruce Fields wrote:
...
> (This is extremely confusing code to track down: note that
> proc->pc_decode is set to nfs4svc_decode_compoundargs() by the PROC()
> macro at the end of fs/nfsd/nfs4proc.c.  Which means, for example, that
> grepping for nfs4svc_decode_compoundargs() gets you nowhere.  Patches to
> kill off that macro would be welcomed....)

the macro 'PROC' is complicated and obscure, it had better
be killed off in order to make the code more clear.
Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

0a93a47f

NFSv4: do exact check about attribute specified · 3c8e0316

Yu Zhiguo authored May 16, 2009

Server should return NFS4ERR_ATTRNOTSUPP if an attribute specified is
not supported in current environment.
Operations CREATE, NVERIFY, OPEN, SETATTR and VERIFY should do this check.

This bug is found when do newpynfs tests. The names of the tests that failed
are following:
  CR12 NVF7a NVF7b NVF7c NVF7d NVF7f NVF7r NVF7s
  OPEN15 VF7a VF7b VF7c VF7d VF7f VF7r VF7s

Add function do_check_fattr() to do exact check:
1, Check attribute specified is supported by the NFSv4 server or not.
2, Check FATTR4_WORD0_ACL & FATTR4_WORD0_FS_LOCATIONS are supported
   in current environment or not.
3, Check attribute specified is writable or not.

step 1 and 3 are done in function nfsd4_decode_fattr() but removed
to this function now.
Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

3c8e0316

27 May, 2009 3 commits

knfsd: remove unreported filehandle stats counters · 1dbd0d53

Greg Banks authored Apr 01, 2009

The file nfsfh.c contains two static variables nfsd_nr_verified and
nfsd_nr_put.  These are counters which are incremented as a side
effect of the fh_verify() fh_compose() and fh_put() operations,
i.e. at least twice per NFS call for any non-trivial workload.
Needless to say this makes the cacheline that contains them (and any
other innocent victims) a very hot contention point indeed under high
call-rate workloads on multiprocessor NFS server.  It also turns out
that these counters are not used anywhere.  They're not reported to
userspace, they're not used in logic, they're not even exported from
the object file (let alone the module).  All they do is waste CPU time.

So this patch removes them.

Tests on a 16 CPU Altix A4700 with 2 10gige Myricom cards, configured
separately (no bonding).  Workload is 640 client threads doing directory
traverals with random small reads, from server RAM.

Before
======

Kernel profile:

  %   cumulative   self              self     total
 time   samples   samples    calls   1/call   1/call  name
  6.05   2716.00  2716.00    30406     0.09     1.02  svc_process
  4.44   4706.00  1990.00     1975     1.01     1.01  spin_unlock_irqrestore
  3.72   6376.00  1670.00     1666     1.00     1.00  svc_export_put
  3.41   7907.00  1531.00     1786     0.86     1.02  nfsd_ofcache_lookup
  3.25   9363.00  1456.00    10965     0.13     1.01  nfsd_dispatch
  3.10  10752.00  1389.00     1376     1.01     1.01  nfsd_cache_lookup
  2.57  11907.00  1155.00     4517     0.26     1.03  svc_tcp_recvfrom
  ...
  2.21  15352.00  1003.00     1081     0.93     1.00  nfsd_choose_ofc  <----
  ^^^^

Here the function nfsd_choose_ofc() reads a global variable
which by accident happened to be located in the same cacheline as
nfsd_nr_verified.

Call rate:

nullarbor:~ # pmdumptext nfs3.server.calls
...
Thu Dec 13 00:15:27     184780.663
Thu Dec 13 00:15:28     184885.881
Thu Dec 13 00:15:29     184449.215
Thu Dec 13 00:15:30     184971.058
Thu Dec 13 00:15:31     185036.052
Thu Dec 13 00:15:32     185250.475
Thu Dec 13 00:15:33     184481.319
Thu Dec 13 00:15:34     185225.737
Thu Dec 13 00:15:35     185408.018
Thu Dec 13 00:15:36     185335.764

After
=====

kernel profile:

  %   cumulative   self              self     total
 time   samples   samples    calls   1/call   1/call  name
  6.33   2813.00  2813.00    29979     0.09     1.01  svc_process
  4.66   4883.00  2070.00     2065     1.00     1.00  spin_unlock_irqrestore
  4.06   6687.00  1804.00     2182     0.83     1.00  nfsd_ofcache_lookup
  3.20   8110.00  1423.00    10932     0.13     1.00  nfsd_dispatch
  3.03   9456.00  1346.00     1343     1.00     1.00  nfsd_cache_lookup
  2.62  10622.00  1166.00     4645     0.25     1.01  svc_tcp_recvfrom
[...]
  0.10  42586.00    44.00       74     0.59     1.00  nfsd_choose_ofc  <--- HA!!
  ^^^^

Call rate:

nullarbor:~ # pmdumptext nfs3.server.calls
...
Thu Dec 13 01:45:28     194677.118
Thu Dec 13 01:45:29     193932.692
Thu Dec 13 01:45:30     194294.364
Thu Dec 13 01:45:31     194971.276
Thu Dec 13 01:45:32     194111.207
Thu Dec 13 01:45:33     194999.635
Thu Dec 13 01:45:34     195312.594
Thu Dec 13 01:45:35     195707.293
Thu Dec 13 01:45:36     194610.353
Thu Dec 13 01:45:37     195913.662
Thu Dec 13 01:45:38     194808.675

i.e. about a 5.3% improvement in call rate.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Reviewed-by: David Chinner <dgc@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

1dbd0d53

knfsd: fix reply cache memory corruption · cf0a586c

Greg Banks authored Apr 01, 2009

Fix a regression in the reply cache introduced when the code was
converted to use proper Linux lists.  When a new entry needs to be
inserted, the case where all the entries are currently being used
by threads is not correctly detected.  This can result in memory
corruption and a crash.  In the current code this is an extremely
unlikely corner case; it would require the machine to have 1024
nfsd threads and all of them to be busy at the same time.  However,
upcoming reply cache changes make this more likely; a crash due to
this problem was actually observed in field.
Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

cf0a586c

knfsd: reply cache cleanups · fca4217c

Greg Banks authored Apr 01, 2009

Make REQHASH() an inline function.  Rename hash_list to cache_hash.
Fix an obsolete comment.
Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

fca4217c

13 May, 2009 1 commit

lockd: fix FILE_LOCKING=n build error · dd4dc82d

Randy Dunlap authored May 12, 2009

lockd/svclock.c is missing a header file <linux/fs.h>.

<linux/fs.h> is missing a definition of locks_release_private()
for the config case of FILE_LOCKING=n, causing a build error:

fs/lockd/svclock.c:330: error: implicit declaration of function 'locks_release_private'

lockd without FILE_LOCKING doesn't make sense, so make LOCKD and LOCKD_V4
depend on FILE_LOCKING, and make NFS depend on FILE_LOCKING.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

dd4dc82d

06 May, 2009 1 commit

nfsd: nfs4_stat_init cleanup · 02cb2858

Wang Chen authored Apr 24, 2009

Save some loop time.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

02cb2858

03 May, 2009 2 commits

nfsd: use C99 struct initializers · 9064caae

Randy Dunlap authored Apr 28, 2009

Eliminate 56 sparse warnings like this one:

fs/nfsd/nfs4xdr.c:1331:15: warning: obsolete array initializer, use C99 syntax
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

9064caae

nfsd4: make recall callback an asynchronous rpc · 63e4863f

J. Bruce Fields authored May 01, 2009

As with the probe, this removes the need for another kthread.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

63e4863f

02 May, 2009 1 commit

nfsd4: track recall retries in nfs4_delegation · 3aea09dc

J. Bruce Fields authored May 01, 2009

Move this out of a local variable into the nfs4_delegation object in
preparation for making this an async rpc call (at which point we'll need
any state like this in a common object that's preserved across function
calls).
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

3aea09dc

01 May, 2009 3 commits

nfsd4: remove unused dl_trunc · 6707bd3d

J. Bruce Fields authored May 01, 2009

There's no point in keeping this field around--it's always zero.

(Background: the protocol allows you to tell the client that the file is
about to be truncated, as an optimization to save the client from
writing back dirty pages that will just be discarded.  We don't
implement this hint.  If we do some day, adding this field back in will
be the least of the work involved.)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

6707bd3d

nfsd4: eliminate struct nfs4_cb_recall · b53d40c5

J. Bruce Fields authored May 01, 2009

The nfs4_cb_recall struct is used only in nfs4_delegation, so its
pointer to the containing delegation is unnecessary--we could just use
container_of().

But there's no real reason to have this a separate struct at all--just
move these fields to nfs4_delegation.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

b53d40c5

nfsd4: rename callback struct to cb_conn · c237dc03

J. Bruce Fields authored Apr 29, 2009

I want to use the name for a struct that actually does represent a
single callback.

(Actually, I've never been sure it helps to a separate struct for the
callback information.  Some day maybe those fields could just be dumped
into struct nfs4_client.  I don't know.)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

c237dc03

29 Apr, 2009 10 commits

nfsd4: replace callback thread by asynchronous rpc · e300a63c

J. Bruce Fields authored Mar 05, 2009

We don't really need a synchronous rpc, and moving to an asynchronous
rpc allows us to do without this extra kthread.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

e300a63c

nfsd4: lookup up callback cred only once · 3cef9ab2

J. Bruce Fields authored Feb 23, 2009

Lookup the callback cred once and then use it for all subsequent
callbacks.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

3cef9ab2

nfsd4: create rpc callback client from server thread · ecdd03b7

J. Bruce Fields authored Feb 23, 2009

The code is a little simpler, and it should be easier to avoid races, if
we just do all rpc client creation/destruction from nfsd or laundromat
threads and do only the rpc calls themselves asynchronously. The rpc
creation doesn't involve any significant waiting (it doesn't call the
client, for example), so there's no reason not to do this.

Also don't bother destroying the client on failure of the rpc null
probe. We may want to retry the probe later anyway.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

ecdd03b7

nfsd4: set cb_client inside setup_callback_client · e1cab5a5
J. Bruce Fields authored Feb 23, 2009
```
This is just a minor code simplification.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
```
e1cab5a5

nfsd4: set shorter timeout · 595947ac

J. Bruce Fields authored Mar 05, 2009

We tried to do something overly complicated with the callback rpc
timeouts here.  And they're wrong--the result is that by the time a
single callback times out, it's already too late to tell the client
(using the cb_path_down return to RENEW) that the callback is down.

Use a much shorter, simpler timeout.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

595947ac

nfsd4: setclientid_confirm callback-change fixes · f64f79ea

J. Bruce Fields authored Apr 29, 2009

This setclientid_confirm case should allow the client to change
callbacks, but it currently has a dummy implementation that just turns
off callbacks completely.  That dummy implementation isn't completely
correct either, though:

	- There's no need to remove any client recovery directory in
	  this case.
	- New clientid confirm verifiers should be generated (and
	  returned) in setclientid; there's no need to generate a new
	  one here.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

f64f79ea

nfsd: quiet compile warning · b8fd47ae

J. Bruce Fields authored Apr 29, 2009

Stephen Rothwell said:
"Today's linux-next build (powerpc ppc64_defconfig) produced this new
warning:

fs/nfsd/nfs4state.c: In function 'EXPIRED_STATEID':
fs/nfsd/nfs4state.c:2757: warning: comparison of distinct pointer types lacks a cast

Caused by commit 78155ed7 ("nfsd4:
distinguish expired from stale stateids")."
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>

b8fd47ae

nfsd: support ext4 i_version · c654b8a9

J. Bruce Fields authored Apr 16, 2009

ext4 supports a real NFSv4 change attribute, which is bumped whenever
the ctime would be updated, including times when two updates arrive
within a jiffy of each other.  (Note that although ext4 has space for
nanosecond-precision ctime, the real resolution is lower: it actually
uses jiffies as the time-source.)  This ensures clients will invalidate
their caches when they need to.

There is some fear that keeping the i_version up-to-date could have
performance drawbacks, so for now it's turned on only by a mount option.
We hope to do something better eventually.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Theodore Tso <tytso@mit.edu>

c654b8a9

nfsd4: delete obsolete xdr comments · 3352d2c2

J. Bruce Fields authored Apr 07, 2009

We don't need comments to tell us these macros are ugly.  And we're long
past trying to share any of this code with the BSD's.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

3352d2c2

nfsd: eliminate ENCODE_HEAD macro · bc749ca4

J. Bruce Fields authored Apr 07, 2009

This macro doesn't serve any useful purpose.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

bc749ca4

28 Apr, 2009 17 commits

NFSD: Stricter buffer size checking in fs/nfsd/nfsctl.c · e06b6405

Chuck Lever authored Apr 23, 2009

Clean up: For consistency, handle output buffer size checking in a
other nfsctl functions the same way it's done for write_versions().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

e06b6405

NFSD: Stricter buffer size checking in write_versions() · 261758b5

Chuck Lever authored Apr 23, 2009

While it's not likely today that there are enough NFS versions to
overflow the output buffer in write_versions(), we should be more
careful about detecting the end of the buffer.

The number of NFS versions will only increase as NFSv4 minor versions
are added.

Note that this API doesn't behave the same as portlist.  Here we
attempt to display as many versions as will fit in the buffer, and do
not provide any indication that an overflow would have occurred.  I
don't have any good rationale for that.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

261758b5

NFSD: Stricter buffer size checking in write_recoverydir() · 3d72ab8f

Chuck Lever authored Apr 23, 2009

While it's not likely a pathname will be longer than
SIMPLE_TRANSACTION_SIZE, we should be more careful about just
plopping it into the output buffer without bounds checking.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

3d72ab8f

SUNRPC: Clean up one_sock_name() · 017cb47f

Chuck Lever authored Apr 23, 2009

Clean up svc_one_sock_name() by setting up automatic variables for
frequently used expressions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

017cb47f

SUNRPC: Support PF_INET6 in one_sock_name() · 58de2f86

Chuck Lever authored Apr 23, 2009

Add an arm to the switch statement in svc_one_sock_name() so it can
construct the name of PF_INET6 sockets properly.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aime Le Rouzic <aime.le-rouzic@bull.net>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

58de2f86

SUNRPC: Switch one_sock_name() to use snprintf() · e7942b9f

Chuck Lever authored Apr 23, 2009

Use snprintf() in one_sock_name() to prevent overflowing the output
buffer.  If the name doesn't fit in the buffer, the buffer is filled
in with an empty string, and -ENAMETOOLONG is returned.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

e7942b9f

SUNRPC: pass buffer size to svc_sock_names() · 8435d34d

Chuck Lever authored Apr 23, 2009

Adjust the synopsis of svc_sock_names() to pass in the size of the
output buffer.  Add a documenting comment.

This is a cosmetic change for now.  A subsequent patch will make sure
the buffer length is passed to one_sock_name(), where the length will
actually be useful.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

8435d34d

SUNRPC: pass buffer size to svc_addsock() · bfba9ab4

Chuck Lever authored Apr 23, 2009

Adjust the synopsis of svc_addsock() to pass in the size of the output
buffer.  Add a documenting comment.

This is a cosmetic change for now.  A subsequent patch will make sure
the buffer length is passed to one_sock_name(), where the length will
actually be useful.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

bfba9ab4

NFSD: Prevent a buffer overflow in svc_xprt_names() · 335c54bd

Chuck Lever authored Apr 23, 2009

The svc_xprt_names() function can overflow its buffer if it's so near
the end of the passed in buffer that the "name too long" string still
doesn't fit.  Of course, it could never tell if it was near the end
of the passed in buffer, since its only caller passes in zero as the
buffer length.

Let's make this API a little safer.

Change svc_xprt_names() so it *always* checks for a buffer overflow,
and change its only caller to pass in the correct buffer length.

If svc_xprt_names() does overflow its buffer, it now fails with an
ENAMETOOLONG errno, instead of trying to write a message at the end
of the buffer.  I don't like this much, but I can't figure out a clean
way that's always safe to return some of the names, *and* an
indication that the buffer was not long enough.

The displayed error when doing a 'cat /proc/fs/nfsd/portlist' is
"File name too long".
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

335c54bd

NFSD: move lockd_up() before svc_addsock() · ea068bad

Chuck Lever authored Apr 23, 2009

Clean up.

A couple of years ago, a series of commits, finishing with commit
5680c446, swapped the order of the lockd_up() and svc_addsock() calls
in __write_ports().  At that time lockd_up() needed to know the
transport protocol of the passed-in socket to start a listener on the
same transport protocol.

These days, lockd_up() doesn't take a protocol argument; it always
starts both a UDP and TCP listener.  It's now more straightforward to
try the lockd_up() first, then do a lockd_down() if the svc_addsock()
fails.

Careful review of this code shows that the svc_sock_names() call is
used only to close the just-opened socket in case lockd_up() fails.
So it is no longer needed if lockd_up() is done first.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

ea068bad

NFSD: Finish refactoring __write_ports() · 0a5372d8

Chuck Lever authored Apr 23, 2009

Clean up: Refactor transport name listing out of __write_ports() to
make it easier to understand and maintain.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

0a5372d8

NFSD: Note an additional requirement when passing TCP sockets to portlist · c71206a7

Chuck Lever authored Apr 23, 2009

User space must call listen(3) on SOCK_STREAM sockets passed into
/proc/fs/nfsd/portlist, otherwise that listener is ignored.  Document
this.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

c71206a7

NFSD: Refactor socket creation out of __write_ports() · 0b7c2f6f

Chuck Lever authored Apr 23, 2009

Clean up: Refactor the socket creation logic out of __write_ports() to
make it easier to understand and maintain.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

0b7c2f6f

NFSD: Refactor portlist socket closing into a helper · 82d56591

Chuck Lever authored Apr 23, 2009

Clean up: Refactor the socket closing logic out of __write_ports() to
make it easier to understand and maintain.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

82d56591

NFSD: Refactor transport addition out of __write_ports() · 4eb68c26

Chuck Lever authored Apr 23, 2009

Clean up: Refactor transport addition out of __write_ports() to make
it easier to understand and maintain.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

4eb68c26

NFSD: Refactor transport removal out of __write_ports() · 4cd5dc75

Chuck Lever authored Apr 23, 2009

Clean up: Refactor transport removal out of __write_ports() to make it
easier to understand and maintain.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

4cd5dc75

SUNRPC: Fix error return value of svc_addr_len() · abc5c44d

Chuck Lever authored Apr 23, 2009

The svc_addr_len() helper function returns -EAFNOSUPPORT if it doesn't
recognize the address family of the passed-in socket address. However,
the return type of this function is size_t, which means -EAFNOSUPPORT
is turned into a very large positive value in this case.

The check in svc_udp_recvfrom() to see if the return value is less
than zero therefore won't work at all.

Additionally, handle_connect_req() passes this value directly to
memset(). This could cause memset() to clobber a large chunk of memory
if svc_addr_len() has returned an error. Currently the address family
of these addresses, however, is known to be supported long before
handle_connect_req() is called, so this isn't a real risk.

Change the error return value of svc_addr_len() to zero, which fits in
the range of size_t, and is safer to pass to memset() directly.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

abc5c44d