Commits · ec28e02ca5f2a4287c19c585f8be2d9b3ba123ea · Kirill Smelkov / linux

28 Nov, 2012 2 commits

nfsd4: remove state lock from nfs4_state_shutdown · ec28e02c

Stanislav Kinsbursky authored Nov 21, 2012

Protection of __nfs4_state_shutdown() with nfs4_lock_state() looks redundant.

This function is called by the last NFSd thread on its exit, and the state lock
actually protects only two functions (del_recall_lru is protected by recall_lock):
1) nfsd4_client_tracking_exit
2) __nfs4_state_shutdown_net

"nfsd4_client_tracking_exit" doesn't require state lock protection, because its
state can be modified only by tracker callbacks.
Here they are:
1) create: is called only from nfsd4_proc_compound.
2) remove: is called from either nfsd4_proc_compound or nfs4_laundromat.
3) check: is called only from nfsd4_proc_compound.
4) grace_done: is called only from nfs4_laundromat.

nfsd4_proc_compound is called only by the NFSd kthread, which is exiting right
now.
nfs4_laundromat is called by laundry_wq. But laundromat_work was canceled
already.

"__nfs4_state_shutdown_net" also doesn't require state lock protection,
because all NFSd kthreads are dead, and no race can happen with NFSd start,
because "nfsd_up" flag is still set.
Moreover, the whole NFSd shutdown is protected by the global nfsd_mutex.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

ec28e02c

nfsd4: remove state lock from nfsd4_load_reboot_recovery_data · dba88ba5

J. Bruce Fields authored Nov 16, 2012

That function is only called under nfsd_mutex: we know that because the
only caller is nfsd_svc, via

        nfsd_svc
          nfsd_startup
            nfs4_state_start
              nfsd4_client_tracking_init
                client_tracking_ops->init == nfsd4_load_reboot_recovery_data

The shared state accessed here includes:

        - user_recovery_dirname: used here, modified only by
          nfs4_reset_recoverydir, which can be verified to only be
          called under nfsd_mutex.
        - filesystem state, protected by i_mutex (handwaving slightly
          here)
        - rec_file, reclaim_str_hashtbl, reclaim_str_hashtbl_size: other
          than here, used only from code called from nfsd or laundromat
          threads, both of which should be started only after this runs
          (see nfsd_svc) and stopped before this could run again (see
          nfsd_shutdown, called from nfsd_last_thread).
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

dba88ba5

27 Nov, 2012 1 commit

nfsd4: return badname, not inval, on "." or "..", or "/" · a36b1725

J. Bruce Fields authored Nov 25, 2012

The spec requires badname, not inval, in these cases.

Some callers want us to return enoent, but I can see no justification
for that.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

a36b1725
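
The check described in the commit above can be sketched in userspace C as follows. This is a minimal illustration, not the kernel's code: the function and enum names are hypothetical, the numeric values are the NFSv4 protocol numbers for these errors, and returning "inval" for an empty name is an assumption that the commit message does not cover.

```c
#include <stddef.h>

/* Illustrative status codes; the names are hypothetical. */
enum nfs4_status_sketch {
    NFS4_OK_SKETCH         = 0,
    NFS4ERR_INVAL_SKETCH   = 22,
    NFS4ERR_BADNAME_SKETCH = 10041
};

/* Return nonzero if the component is exactly "." or "..". */
static int is_dot_entry(const char *name, size_t len)
{
    return (len == 1 && name[0] == '.') ||
           (len == 2 && name[0] == '.' && name[1] == '.');
}

/* Hypothetical validator for a single path component of length len. */
static int check_component_name(const char *name, size_t len)
{
    size_t i;

    if (len == 0)
        return NFS4ERR_INVAL_SKETCH;    /* assumption: empty name stays "inval" */
    if (is_dot_entry(name, len))
        return NFS4ERR_BADNAME_SKETCH;  /* "." and ".." are bad names */
    for (i = 0; i < len; i++)
        if (name[i] == '/')
            return NFS4ERR_BADNAME_SKETCH;  /* a component may not contain "/" */
    return NFS4_OK_SKETCH;
}
```

With this sketch, `check_component_name("..", 2)` and `check_component_name("a/b", 3)` both yield the badname code, while an ordinary name such as `"file"` yields zero.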

26 Nov, 2012 7 commits

nfsd4: downgrade some fs/nfsd/nfs4state.c BUG's · 063b0fb9

J. Bruce Fields authored Nov 25, 2012

Linus has pointed out that indiscriminate use of BUG's can make it
harder to diagnose bugs because they can bring a machine down, often
before we manage to get any useful debugging information to the logs.
(Consider, for example, a BUG() that fires in a workqueue, or while
holding a spinlock).

Most of these BUG's won't do much more than kill an nfsd thread, but it
would still probably be safer to get out the warning without dying.

There's still more of this to do in nfsd/.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

063b0fb9
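
The idea of warning without dying can be illustrated with a small userspace sketch. Everything here is hypothetical: `WARN_ON_SKETCH` merely imitates a warn-and-continue check, and the refcounted structure is invented for the example. The point is that the caller logs the problem and returns an error instead of bringing the whole process down.

```c
#include <stdio.h>

/* Counts how many warnings fired; useful for inspection in this sketch. */
static int warn_count;

/*
 * Hypothetical stand-in for a warn-and-continue check: logs the failed
 * condition and evaluates to 1 if it held, 0 otherwise.
 */
#define WARN_ON_SKETCH(cond)                                          \
    ((cond) ? (warn_count++,                                          \
               fprintf(stderr, "WARNING: %s at %s:%d\n",              \
                       #cond, __FILE__, __LINE__),                    \
               1)                                                     \
            : 0)

/* Invented structure for the example. */
struct stateid_sketch {
    int refcount;
};

/* Instead of aborting on a bad refcount, warn and report failure. */
static int put_stateid_sketch(struct stateid_sketch *s)
{
    if (WARN_ON_SKETCH(s->refcount <= 0))
        return -1;
    s->refcount--;
    return 0;
}
```

A caller that sees `-1` can drop the request and keep the thread alive, which leaves the warning in the log for later diagnosis.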

nfsd4: delay filling in write iovec array till after xdr decoding · ffe1137b

J. Bruce Fields authored Nov 15, 2012

Our server rejects compounds containing more than one write operation.
It's unclear whether this is really permitted by the spec; with 4.0,
it's possibly OK, with 4.1 (which has clearer limits on compound
parameters), it's probably not OK. No client that we're aware of has
ever done this, but in theory it could be useful.

The source of the limitation: we need an array of iovecs to pass to the
write operation. In the worst case that array of iovecs could have
hundreds of elements (the maximum rwsize divided by the page size), so
it's too big to put on the stack, or in each compound op. So we instead
keep a single such array in the compound argument.

We fill in that array at the time we decode the xdr operation.

But we decode every op in the compound before executing any of them. So
once we've used that array we can't decode another write.

If we instead delay filling in that array till the time we actually
perform the write, we can reuse it.

Another option might be to switch to decoding compound ops one at a
time. I considered doing that, but it has a number of other side
effects, and I'd rather fix just this one problem for now.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

ffe1137b
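
The approach in the commit above can be sketched roughly as follows, in userspace C. The structure and function names are hypothetical, and the page size and array length are arbitrary choices for the example. Decoding records only where the payload lives; the single shared iovec array is filled at the moment a write executes, so a later write in the same compound can reuse it.

```c
#include <stddef.h>
#include <sys/uio.h>

#define PAGE_SIZE_SKETCH 4096u   /* assumed page size for the example */
#define MAX_VECS_SKETCH  256     /* assumed upper bound on iovec entries */

/* What decoding records per write op: the payload location, not an iovec. */
struct write_args_sketch {
    char  *data;   /* start of the payload in the request buffer */
    size_t len;    /* payload length in bytes */
};

/* One iovec array shared by every write in the compound. */
struct compound_sketch {
    struct iovec vec[MAX_VECS_SKETCH];
};

/*
 * Called at execute time: (re)fill the shared array for this write.
 * Returns the number of iovec entries used, or -1 if the write does not fit.
 */
static int fill_write_vec(struct compound_sketch *c,
                          const struct write_args_sketch *w)
{
    size_t off = 0;
    int n = 0;

    while (off < w->len) {
        size_t chunk = w->len - off;

        if (chunk > PAGE_SIZE_SKETCH)
            chunk = PAGE_SIZE_SKETCH;
        if (n == MAX_VECS_SKETCH)
            return -1;
        c->vec[n].iov_base = w->data + off;
        c->vec[n].iov_len  = chunk;
        n++;
        off += chunk;
    }
    return n;
}
```

Because `fill_write_vec` overwrites the array on each call, two write ops decoded up front can be executed one after the other against the same `compound_sketch`.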

nfsd4: move more write parameters into xdr argument · 70cc7f75

J. Bruce Fields authored Nov 16, 2012

In preparation for moving some of this elsewhere.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

70cc7f75

nfsd4: reorganize write decoding · 5a80a54d

J. Bruce Fields authored Nov 16, 2012

In preparation for moving some of it elsewhere.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

5a80a54d

nfsd4: simplify reading of opnum · 8a61b18c

J. Bruce Fields authored Nov 16, 2012

The comment here is totally bogus:
	- OP_WRITE + 1 is RELEASE_LOCKOWNER.  Maybe there was some older
	  version of the spec in which that served as a sort of
	  OP_ILLEGAL?  No idea, but it's clearly wrong now.
	- In any case, I can't see that the spec says anything about
	  what to do if the client sends us fewer ops than promised.
	  It's clearly nutty client behavior, and we should do
	  whatever's easiest: returning an xdr error (even though it
	  won't be consistent with the error on the last op returned)
	  seems fine to me.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

8a61b18c
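
The simplified behavior argued for above, treating a short compound as a plain XDR error, might look like this in a userspace sketch. The cursor type, the function names and the error value are all hypothetical; only the big-endian 32-bit encoding of each opnum is taken from XDR itself.

```c
#include <stddef.h>
#include <stdint.h>

#define XDR_ERR_SKETCH (-1)   /* hypothetical error value */

/* Hypothetical read cursor over a received request buffer. */
struct xdr_cursor_sketch {
    const unsigned char *p;     /* next unread byte */
    const unsigned char *end;   /* one past the last byte */
};

/* Read one big-endian 32-bit word; fail if fewer than 4 bytes remain. */
static int read_u32(struct xdr_cursor_sketch *x, uint32_t *out)
{
    if ((size_t)(x->end - x->p) < 4)
        return XDR_ERR_SKETCH;
    *out = ((uint32_t)x->p[0] << 24) | ((uint32_t)x->p[1] << 16) |
           ((uint32_t)x->p[2] << 8)  |  (uint32_t)x->p[3];
    x->p += 4;
    return 0;
}

/*
 * Read `promised` opnums. If the client sent fewer than it promised,
 * simply report an XDR error; there is no special-case opnum.
 */
static int read_opnums(struct xdr_cursor_sketch *x, uint32_t promised,
                       uint32_t *ops)
{
    uint32_t i;

    for (i = 0; i < promised; i++)
        if (read_u32(x, &ops[i]) != 0)
            return XDR_ERR_SKETCH;
    return 0;
}
```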

nfsd4: no, we're not going to check tags for utf8 · 447bfcc9

J. Bruce Fields authored Nov 16, 2012

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

447bfcc9

nfsd: fix v4 reply caching · 57d276d7

J. Bruce Fields authored Nov 16, 2012

Very embarrassing: 1091006c "nfsd: turn
on reply cache for NFSv4" missed a line, effectively leaving the reply
cache off in the v4 case.  I thought I'd tested that, but I guess not.

This time, wrote a pynfs test to confirm it works.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

57d276d7

15 Nov, 2012 16 commits

nfsd: make laundromat network namespace aware · 09121281

Stanislav Kinsbursky authored Nov 14, 2012

This patch moves laundromat_work to the nfsd per-net context, thus allowing
multiple laundries to run.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

09121281

nfsd: pass nfsd_net instead of net to grace enders · 12760c66

Stanislav Kinsbursky authored Nov 14, 2012

Passing the net context looks like overkill.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

12760c66

nfsd: use service net instead of hard-coded init_net · 3320fef1

Stanislav Kinsbursky authored Nov 14, 2012

This patch replaces init_net with SVC_NET() where possible, and also passes
the proper context to nested functions where required.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

3320fef1

nfsd: make close_lru list per net · 73758fed

Stanislav Kinsbursky authored Nov 14, 2012

This list holds the queue of nfs4 client (open) stateowners kept for last-close
replay, which are network namespace aware. So let's make this list per network
namespace too.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

73758fed

nfsd: make client_lru list per net · 5ed58bb2

Stanislav Kinsbursky authored Nov 14, 2012

This list holds the queue of nfs4 clients awaiting lease renewal, which are
network namespace aware. So let's make this list per network namespace too.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

5ed58bb2

nfsd: make sessionid_hashtbl allocated per net · 1872de0e

Stanislav Kinsbursky authored Nov 14, 2012

This hash holds established sessions state and is closely associated with
nfs4_clients info, which are network namespace aware. So let's make it
allocated per network namespace too.

Note: this hash could be allocated in per-net operations, but it looks
better to allocate it on nfsd state start and thus avoid wasting resources
if the server is not running.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

1872de0e
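
The allocation policy from the note above (allocate per namespace, but only when the server actually starts there) can be sketched like this. All names and the table size are hypothetical, and `malloc`/`free` stand in for whatever allocator the real code uses.

```c
#include <stdlib.h>

#define SESSION_HASH_SIZE_SKETCH 512   /* arbitrary size for the example */

/* Minimal doubly linked list head, initialised to point at itself. */
struct list_head_sketch {
    struct list_head_sketch *next, *prev;
};

/* Hypothetical, heavily trimmed per-namespace server state. */
struct nfsd_net_sketch {
    struct list_head_sketch *sessionid_hashtbl;  /* NULL while the server is down */
};

/* Allocate the table only when the server starts in this namespace. */
static int state_start_net(struct nfsd_net_sketch *nn)
{
    int i;

    nn->sessionid_hashtbl =
        malloc(sizeof(*nn->sessionid_hashtbl) * SESSION_HASH_SIZE_SKETCH);
    if (!nn->sessionid_hashtbl)
        return -1;
    for (i = 0; i < SESSION_HASH_SIZE_SKETCH; i++) {
        nn->sessionid_hashtbl[i].next = &nn->sessionid_hashtbl[i];
        nn->sessionid_hashtbl[i].prev = &nn->sessionid_hashtbl[i];
    }
    return 0;
}

/* Release the table again when the server stops in this namespace. */
static void state_shutdown_net(struct nfsd_net_sketch *nn)
{
    free(nn->sessionid_hashtbl);
    nn->sessionid_hashtbl = NULL;
}
```

A namespace in which the server never starts keeps a NULL pointer and costs no table memory, which is the trade-off the note describes.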

nfsd: make lockowner_ino_hashtbl allocated per net · 20e9e2bc

Stanislav Kinsbursky authored Nov 14, 2012

This hash holds file lock owners and is closely associated with nfs4_clients info,
which are network namespace aware. So let's make it allocated per network
namespace too.

Note: this hash could be allocated in per-net operations, but it looks
better to allocate it on nfsd state start and thus avoid wasting resources
if the server is not running.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

20e9e2bc

nfsd: make ownerstr_hashtbl allocated per net · 9b531137

Stanislav Kinsbursky authored Nov 14, 2012

This hash holds open owner state and is closely associated with nfs4_clients
info, which are network namespace aware. So let's make it allocated per
network namespace too.

Note: this hash could be allocated in per-net operations, but it looks
better to allocate it on nfsd state start and thus avoid wasting resources
if the server is not running.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

9b531137

nfsd: make unconf_name_tree per net · a99454aa

Stanislav Kinsbursky authored Nov 14, 2012

This tree holds nfs4_clients info, which are network namespace aware.
So let's make it per network namespace.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

a99454aa

nfsd: make unconf_id_hashtbl allocated per net · 0a7ec377

Stanislav Kinsbursky authored Nov 14, 2012

This hash holds nfs4_clients info, which are network namespace aware.
So let's make it allocated per network namespace.

Note: this hash could be allocated in per-net operations, but it looks
better to allocate it on nfsd state start and thus avoid wasting resources
if the server is not running.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

0a7ec377

nfsd: make conf_name_tree per net · 382a62e7

Stanislav Kinsbursky authored Nov 14, 2012

This tree holds nfs4_clients info, which are network namespace aware.
So let's make it per network namespace.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

382a62e7

nfsd: make conf_id_hashtbl allocated per net · 8daae4dc

Stanislav Kinsbursky authored Nov 14, 2012

This hash holds nfs4_clients info, which are network namespace aware.
So let's make it allocated per network namespace.

Note: this hash could be allocated in per-net operations, but it looks
better to allocate it on nfsd state start and thus avoid wasting resources
if the server is not running.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

8daae4dc

nfsd: make reclaim_str_hashtbl allocated per net · 52e19c09

Stanislav Kinsbursky authored Nov 14, 2012

This hash holds nfs4_clients info, which are network namespace aware.
So let's make it allocated per network namespace.

Note: this hash is used only by the legacy tracker. So let's allocate the hash
in tracker init.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

52e19c09

nfsd: make nfs4_client network namespace dependent · c212cecf

Stanislav Kinsbursky authored Nov 14, 2012

And use its net where possible.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

c212cecf

nfsd: use service net instead of hard-coded net where possible · 7f2210fa

Stanislav Kinsbursky authored Nov 14, 2012

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

7f2210fa

svcrpc: Revert "sunrpc/cache.h: replace simple_strtoul" · 621eb19c

J. Bruce Fields authored Nov 14, 2012

Commit bbf43dc8 "sunrpc/cache.h: replace
simple_strtoul" introduced new range-checking which could cause get_int
to fail on unsigned integers too large to be represented as an int.

We could parse them as unsigned instead--but it turns out svcgssd is
actually passing down "-1" in some cases.  Which is perhaps stupid, but
there's nothing we can do about it now.

So just revert back to the previous "sloppy" behavior that accepts
either representation.

Cc: stable@vger.kernel.org
Reported-by: Sven Geggus <lists@fuchsschwanzdomain.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

621eb19c
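
The difference between the range-checked parse and the "sloppy" one described above can be shown with a small userspace sketch. These are not the kernel's helpers: both function names are hypothetical, and `strtoll` is used here only to illustrate the two policies. The sloppy variant keeps the low 32 bits, so "-1" and "4294967295" yield the same int.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Strict variant: reject anything that does not fit in an int. */
static int parse_int_strict(const char *s, int *out)
{
    char *end;
    long long v;

    errno = 0;
    v = strtoll(s, &end, 0);
    if (errno || end == s || *end != '\0' || v < INT_MIN || v > INT_MAX)
        return -EINVAL;
    *out = (int)v;
    return 0;
}

/* Sloppy variant: accept either the signed or the unsigned spelling. */
static int parse_int_sloppy(const char *s, int *out)
{
    char *end;
    long long v;
    uint32_t u;

    v = strtoll(s, &end, 0);
    if (end == s || *end != '\0')
        return -EINVAL;
    u = (uint32_t)v;   /* conversion to unsigned wraps modulo 2^32 */
    /* Map back to int without an implementation-defined cast
     * (assumes a 32-bit two's-complement int). */
    *out = (u <= (uint32_t)INT_MAX) ? (int)u : -(int)(UINT32_MAX - u) - 1;
    return 0;
}
```

Under the strict policy a daemon that writes "4294967295" is rejected; under the sloppy policy both spellings are accepted and produce -1.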

14 Nov, 2012 2 commits

nfsd4: get_backchannel_cred should be static · 2b4cf668

Fengguang Wu authored Nov 13, 2012

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

2b4cf668

nfsd4: init_session should be declared static · 135ae827

Fengguang Wu authored Nov 10, 2012

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

135ae827

12 Nov, 2012 11 commits

nfsd: release the legacy reclaimable clients list in grace_done · 7e4f015d

Jeff Layton authored Nov 12, 2012

The current code holds on to this list until nfsd is shut down, but it's
never touched once the grace period ends. Release that memory back into
the wild when the grace period ends.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

7e4f015d

nfsd: get rid of cl_recdir field · 2216d449

Jeff Layton authored Nov 12, 2012

Remove the cl_recdir field from the nfs4_client struct. Instead, just
compute it on the fly when and if it's needed, which is now only when
the legacy client tracking code is in effect.

The error handling in the legacy client tracker is also changed to
handle the case where md5 is unavailable. In that case, we'll warn
the admin with a KERN_ERR message and disable the client tracking.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

2216d449

nfsd: move the confirmed and unconfirmed hlists to a rbtree · ac55fdc4

Jeff Layton authored Nov 12, 2012

The current code requires that we md5 hash the name in order to store
the client in the confirmed and unconfirmed trees. Change it instead
to store the clients in a pair of rbtrees, and simply compare the
cl_names directly instead of hashing them. This also necessitates that
we add a new flag to the clp->cl_flags field to indicate which tree
the client is currently in.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

ac55fdc4
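
Keying a tree on the raw client name, rather than on a hash of it, comes down to having a total order on names. The sketch below uses a plain unbalanced binary search tree in place of a red-black tree to keep it short; the ordering rule (length first, then bytes) and all names are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Opaque, length-counted client name. */
struct name_sketch {
    const unsigned char *data;
    size_t len;
};

/* Hypothetical client record linked into an (unbalanced) search tree. */
struct client_sketch {
    struct name_sketch name;
    struct client_sketch *left, *right;
};

/* Order by length first, then by bytes; no hashing involved. */
static int compare_names(const struct name_sketch *a,
                         const struct name_sketch *b)
{
    if (a->len != b->len)
        return a->len < b->len ? -1 : 1;
    return memcmp(a->data, b->data, a->len);
}

/* Insert a client; a real rbtree would also rebalance here. */
static void tree_insert(struct client_sketch **root, struct client_sketch *clp)
{
    while (*root) {
        if (compare_names(&clp->name, &(*root)->name) < 0)
            root = &(*root)->left;
        else
            root = &(*root)->right;
    }
    clp->left = clp->right = NULL;
    *root = clp;
}

/* Look a client up by name, comparing names directly. */
static struct client_sketch *tree_find(struct client_sketch *node,
                                       const struct name_sketch *name)
{
    while (node) {
        int c = compare_names(name, &node->name);

        if (c == 0)
            return node;
        node = c < 0 ? node->left : node->right;
    }
    return NULL;
}
```

Keeping one such tree for confirmed clients and one for unconfirmed clients is why a per-client flag is needed to say which tree a client currently lives in.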

nfsd: don't search for client by hash on legacy reboot recovery gracedone · 0ce0c2b5

Jeff Layton authored Nov 12, 2012

When nfsd starts, the legacy reboot recovery code creates a tracking
struct for each directory in the v4recoverydir. When the grace period
ends, it basically does a "readdir" on the directory again, and matches
each dentry in there to an existing client id to see if it should be
removed or not. If the matching client doesn't exist, or hasn't
reclaimed its state then it will remove that dentry.

This is pretty inefficient since it involves doing a lot of hash-bucket
searching. It also means that we have to keep relying on being able to
search for a nfs4_client by md5 hashed cl_recdir name.

Instead, add a pointer to the nfs4_client that indicates the association
between the nfs4_client_reclaim and nfs4_client. When a reclaim operation
comes in, we set the pointer to make that association. On gracedone, the
legacy client tracker will keep the recdir around iff:

1/ there is a reclaim record for the directory

...and...

2/ there's an association between the reclaim record and a client record
-- that is, a create or check operation was performed on the client that
matches that directory.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

0ce0c2b5
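
The two-part condition above reduces to a pointer check once each reclaim record can point at its client. A minimal sketch, with hypothetical type and function names:

```c
#include <stddef.h>

/* Hypothetical client record. */
struct nfs4_client_sk {
    int id;
};

/* Hypothetical reclaim record, one per directory found at startup. */
struct reclaim_record_sk {
    const char *recdir;              /* directory name */
    struct nfs4_client_sk *client;   /* set when a create or check matches */
};

/* Called when a create or check operation matches this record. */
static void associate_client(struct reclaim_record_sk *r,
                             struct nfs4_client_sk *c)
{
    r->client = c;
}

/*
 * On gracedone: keep the directory only if a reclaim record exists
 * for it and that record has been associated with a client.
 */
static int keep_recdir(const struct reclaim_record_sk *r)
{
    return r != NULL && r->client != NULL;
}
```

No search by hashed name is needed at gracedone time; walking the records and testing the pointer is enough.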

nfsd: make nfs4_client_to_reclaim return a pointer to the reclaim record · 772a9bbb

Jeff Layton authored Nov 12, 2012

Later callers will need to make changes to the record.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

772a9bbb

nfsd: break out reclaim record removal into separate function · ce30e539

Jeff Layton authored Nov 12, 2012

We'll need to be able to call this from nfs4recover.c eventually.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

ce30e539

nfsd: have nfsd4_find_reclaim_client take a char * argument · 278c931c

Jeff Layton authored Nov 12, 2012

Currently, it takes a client pointer, but later we're going to need to
search for these records without knowing whether a matching client even
exists.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

278c931c

nfsd: warn about impending removal of nfsdcld upcall · 8b0554e9

Jeff Layton authored Nov 12, 2012

Let's shoot for removing the nfsdcld upcall in 3.10. Most likely,
no one is actually using it so I don't expect this warning to
fire often (except maybe on misconfigured systems).
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

8b0554e9

nfsd: pass info about the legacy recoverydir in environment variables · f3aa7e24

Jeff Layton authored Nov 12, 2012

The usermodehelper upcall program can then decide to use this info as
a (one-way) transition mechanism to the new scheme. When a "check"
upcall occurs and the client doesn't exist in the database, we can
look to see whether the directory exists. If it does, then we'd add
the client to the database, remove the legacy recdir, and return
success to the kernel to allow the recovery to proceed.

For gracedone, we simply pass the v4recovery "topdir" so that the
upcall can clean it out prior to returning to the kernel.

A module parm is also added to disable the legacy conversion if
the admin chooses.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

f3aa7e24

nfsd: change heuristic for selecting the client_tracking_ops · 2d77bf0a

Jeff Layton authored Nov 12, 2012

First, try to use the new usermodehelper upcall. It should succeed or
fail quickly, so there's little cost to doing so.

If it fails, and the legacy tracking dir exists, use that. If it
doesn't exist then fall back to using nfsdcld.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

2d77bf0a
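
The selection order described above can be sketched as a small function. The ops structure, the stub init functions and the `legacy_dir_exists` flag are all hypothetical; the sketch only shows the order of the fallbacks and leaves initialising the chosen fallback to the caller.

```c
#include <stddef.h>

/* Hypothetical tracker operations table. */
struct tracking_ops_sk {
    const char *name;
    int (*init)(void);   /* returns 0 on success */
};

/* Stub init functions, used only to exercise the sketch. */
static int init_ok(void)   { return 0; }
static int init_fail(void) { return -1; }

/*
 * Try the usermodehelper tracker first. If it fails, prefer the legacy
 * tracker when its directory exists; otherwise fall back to nfsdcld.
 */
static const struct tracking_ops_sk *
select_tracker(const struct tracking_ops_sk *umh,
               const struct tracking_ops_sk *legacy,
               const struct tracking_ops_sk *cld,
               int legacy_dir_exists)
{
    if (umh->init() == 0)
        return umh;
    if (legacy_dir_exists)
        return legacy;
    return cld;
}
```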

nfsd: add a usermodehelper upcall for NFSv4 client ID tracking · 2873d214

Jeff Layton authored Nov 12, 2012

Add a new client tracker upcall type that uses call_usermodehelper to
call out to a program. This seems to be the preferred method of
calling out to usermode these days for seldom-called upcalls. It's
simple and doesn't require a running daemon, so it should "just work"
as long as the binary is installed.

The client tracking exit operation is also changed to check for a
NULL pointer before running. The UMH upcall doesn't need to do anything
at module teardown time.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

2873d214
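
The NULL check on the exit operation mentioned above amounts to treating the callback as optional. A minimal sketch with hypothetical names:

```c
#include <stddef.h>

/* Hypothetical tracker ops table; exit may be NULL if nothing is needed. */
struct tracker_exit_ops_sk {
    void (*exit)(void);
};

/* Counts calls to the example exit handler. */
static int exit_calls;

/* Example exit handler for a tracker that does have teardown work. */
static void legacy_exit_sk(void)
{
    exit_calls++;
}

/* Run the exit operation only if the tracker provides one. */
static void tracking_exit_sk(const struct tracker_exit_ops_sk *ops)
{
    if (ops && ops->exit)
        ops->exit();
}
```

A tracker with no teardown work simply leaves `exit` unset instead of supplying an empty function.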

10 Nov, 2012 1 commit

nfsd: remove unused argument to nfs4_has_reclaimed_state · a0af710a

Jeff Layton authored Nov 09, 2012

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

a0af710a