Commits · 998db52c03cd293d16a457f1b396cea932244147 · nexedi / linux

07 Aug, 2010 1 commit

nfsd4: fix file open accounting for RDWR opens · 998db52c

J. Bruce Fields authored Aug 07, 2010

Commit f9d7562f "nfsd4: share file descriptors between stateid's" didn't
correctly account for O_RDWR opens.  Symptoms include leaked files,
resulting in failures to unmount and/or warnings about orphaned inodes
on reboot.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

998db52c

06 Aug, 2010 5 commits

nfsd: don't allow setting maxblksize after svc created · 7fa53cc8

J. Bruce Fields authored Aug 06, 2010

It's harmless to set this after the server is created, but also
ineffective, since the value is only used at the time of
svc_create_pooled().  So fail the attempt, in keeping with the pattern
set by write_versions, write_{lease,grace}time and write_recoverydir.

(This could break userspace that tried to write to nfsd/max_block_size
between setting up sockets and starting the server.  However, such code
wouldn't have worked anyway, and I don't know of any examples--rpc.nfsd
in nfs-utils, probably the only user of the interface, doesn't do that.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

7fa53cc8

nfsd: initialize nfsd versions before creating svc · e844a7b9

J. Bruce Fields authored Aug 06, 2010

Commit 59db4a0c "nfsd: move more into nfsd_startup()" inadvertently
moved nfsd_versions after nfsd_create_svc().  On older distributions
using an rpc.nfsd that does not explicitly set the list of nfsd
versions, this results in svc_create_pooled() being called with an
empty versions array.  The resulting incomplete initialization leads
to a NULL dereference in svc_process_common() the first time a client
accesses the server.

Move nfsd_reset_versions() back before the svc_create_pooled(); this
time, put it closer to the svc_create_pooled() call, to make this
mistake more difficult in the future.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

e844a7b9

net: sunrpc: removed duplicated #include · e2aa7f83

Andrea Gelmini authored Aug 05, 2010

Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

e2aa7f83

nfsd41: Fix a crash when a callback is retried · c18c821f

Boaz Harrosh authored Jun 29, 2010

If a callback is retried at nfsd4_cb_recall_done() due to
some error, the returned rpc reply crashes here:

@@ -514,6 +514,7 @@ decode_cb_sequence(struct xdr_stream *xdr, struct nfsd4_cb_sequence *res,
 	u32 dummy;
 	__be32 *p;

+	BUG_ON(!res);
 	if (res->cbs_minorversion == 0)
 		return 0;

[BUG_ON added for demonstration]

This is because the nfsd4_cb_done_sequence() has NULLed out
the task->tk_msg.rpc_resp pointer.

Also eventually the rpc would use the new slot without making
sure it is free by calling nfsd41_cb_setup_sequence().

This problem was introduced by a 4.1 protocol addition patch:
	[0421b5c5] nfsd41: Backchannel: Implement cb_recall over NFSv4.1

That patch overlooked the possibility of RPC callback retries.
For the non-4.1 case, redoing the _prepare is harmless.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

c18c821f

nfsd: fix startup/shutdown order bug · 774f8bbd

J. Bruce Fields authored Aug 02, 2010

We must create the server before we can call init_socks or check the
number of threads.

Symptoms were a NULL pointer dereference in nfsd_svc().  Problem
identified by Jeff Layton.

Also fix a minor cleanup-on-error case in nfsd_startup().
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

774f8bbd

30 Jul, 2010 1 commit

nfsd: minor nfsd read api cleanup · 039a87ca

J. Bruce Fields authored Jul 30, 2010

Christoph points out that the NFSv2/v3 callers know which case they want
here, so we may as well just call the file=NULL case directly instead of
making this conditional.

Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

039a87ca

29 Jul, 2010 5 commits

gcc-4.6: nfsd: fix initialized but not read warnings · 69049961

Andi Kleen authored Jul 20, 2010

Fixes at least one real minor bug: the nfs4 recovery dir sysctl
would not return its status properly.

Also I finished Al's 1e41568d ("Take ima_path_check() in nfsd
past dentry_open() in nfsd_open()") commit: it moved the IMA
code, but left the old path initializer in there.

The rest is just dead code removal, I think, although I was not
fully sure about the "is_borc" stuff. Some more review
would still be good.

Found by gcc 4.6's new warnings.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

69049961

nfsd4: share file descriptors between stateid's · f9d7562f

J. Bruce Fields authored Jul 08, 2010

The vfs doesn't really allow us to "upgrade" a file descriptor from
read-only to read-write, and our attempt to do so in nfs4_upgrade_open
is ugly and incomplete.

Move to a different scheme where we keep multiple opens, shared between
open stateid's, in the nfs4_file struct.  Each file will be opened at
most 3 times (for read, write, and read-write), and those opens will be
shared between all clients and openers.  On upgrade we will do another
open if necessary instead of attempting to upgrade an existing open.
We keep count of the number of readers and writers so we know when to
close the shared files.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

f9d7562f
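
The reader/writer counting described in this message can be pictured with a small userspace model. This is only a sketch: `struct shared_file` and the `shared_file_*` helpers are hypothetical names, not the kernel's, and counting an O_RDWR opener as both a reader and a writer is an assumption about what correct accounting looks like.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical userspace model; these names are not the kernel's. */
struct shared_file {
	int readers;	/* openers holding read access */
	int writers;	/* openers holding write access */
};

/* Record one opener. O_RDWR is assumed to count as a reader AND a writer. */
static void shared_file_get(struct shared_file *f, int oflag)
{
	if (oflag == O_RDWR || oflag == O_RDONLY)
		f->readers++;
	if (oflag == O_RDWR || oflag == O_WRONLY)
		f->writers++;
}

/* Drop one opener, mirroring shared_file_get() exactly. */
static void shared_file_put(struct shared_file *f, int oflag)
{
	if (oflag == O_RDWR || oflag == O_RDONLY)
		f->readers--;
	if (oflag == O_RDWR || oflag == O_WRONLY)
		f->writers--;
}

/* True once no opener is left, i.e. the shared opens could be closed. */
static bool shared_file_idle(const struct shared_file *f)
{
	return f->readers == 0 && f->writers == 0;
}
```

If the get and put sides disagree about how an O_RDWR opener is counted, one of the counters never returns to zero and the shared open is never closed, which is the kind of leak this scheme has to avoid.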

nfsd4: fix openmode checking on IO using lock stateid · 02921914

J. Bruce Fields authored Jul 29, 2010

It is legal to perform a write using the lock stateid that was
originally associated with a read lock, or with a file that was
originally opened for read, but has since been upgraded.

So, when checking the openmode, check the mode associated with the
open stateid from which the lock was derived.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

02921914
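
A minimal sketch of the check described above, assuming a lock state that keeps a pointer to the open state it was derived from. The structures, the `ACCESS_*` bits, and `lock_state_allows()` are hypothetical and only illustrate the idea of consulting the parent open's current mode.

```c
#include <stdbool.h>

#define ACCESS_READ  0x1u	/* illustrative access bits */
#define ACCESS_WRITE 0x2u

/* Hypothetical model of an open stateid and a lock stateid derived from it. */
struct open_state {
	unsigned int access;	/* current mode; may grow on upgrade */
};

struct lock_state {
	const struct open_state *parent;	/* open the lock came from */
};

/* Check IO through a lock state against the parent open's current mode. */
static bool lock_state_allows(const struct lock_state *ls, unsigned int need)
{
	return (ls->parent->access & need) == need;
}
```

Because the check follows the pointer at IO time, an open that has since been upgraded to read-write permits writes through a lock state that was created while the file was still read-only.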

nfsd4: miscellaneous process_open2 cleanup · 21fb4016

J. Bruce Fields authored Jul 28, 2010

Move more work into helper functions.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

21fb4016

nfsd4: don't pretend to support write delegations · c3e48080

J. Bruce Fields authored Jul 28, 2010

The delegation code mostly pretends to support either read or write
delegations.  However, correct support for write delegations would
require, for example, breaking of delegations (and/or implementation of
cb_getattr) on stat.  Currently all that stops us from handing out
delegations is a subtle reference-counting issue.

Avoid confusion by adding an earlier check that explicitly refuses write
delegations.

For now, though, I'm not going so far as to rip out existing
half-support for write delegations, in case we get around to using that
soon.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

c3e48080

27 Jul, 2010 1 commit

nfsd: bypass readahead cache when have struct file · fa0a2126

J. Bruce Fields authored Jul 27, 2010

The readahead cache compensates for the fact that the NFS server
currently does an open and close on every IO operation in the NFSv2 and
NFSv3 case.

In the NFSv4 case we have long-lived struct files associated with client
opens, so there's no need for this.  In fact, concurrent IOs trying to
modify the same file->f_ra may cause problems.

So, don't bother with the readahead cache in that case.

Note eventually we'll likely do this in the v2/v3 case as well by
keeping a cache of struct files instead of struct file_ra_state's.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

fa0a2126

23 Jul, 2010 8 commits

nfsd: minor nfsd_svc() cleanup · af4718f3

J. Bruce Fields authored Jul 21, 2010

More idiomatic to put the error case in the if clause.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

af4718f3

nfsd: move more into nfsd_startup() · 59db4a0c

J. Bruce Fields authored Jul 21, 2010

This is just cleanup--it's harmless to call nfsd_racache_init,
nfsd_init_socks, and nfsd_reset_versions more than once.  But there's no
point to it.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

59db4a0c

nfsd: just keep single lockd reference for nfsd · ac77efbe

Jeff Layton authored Jul 20, 2010

Right now, nfsd keeps a lockd reference for each socket that it has
open. This is unnecessary and complicates the error handling on
startup and shutdown. Change it to just do a lockd_up when starting
the first nfsd thread and a single lockd_down when taking down the
last nfsd thread. Because of the strange way the sv_count is handled
this requires an extra flag to tell whether the nfsd_serv holds a
reference for lockd or not.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

ac77efbe

nfsd: clean up nfsd_create_serv error handling · 628b3687

Jeff Layton authored Jul 21, 2010

There doesn't seem to be any need to reset the nfssvc_boot time if the
nfsd startup failed.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

628b3687

nfsd: fix error handling in __write_ports_addxprt · 0cd14a06

Jeff Layton authored Jul 19, 2010

__write_ports_addxprt calls nfsd_create_serv. That increases the
refcount of nfsd_serv (which is tracked in sv_nrthreads). The service
only decrements the thread count on error, not on success like
__write_ports_addfd does, so using this interface leaves the nfsd
thread count high.

Fix this by having this function call svc_destroy() on error to release
the reference (and possibly to tear down the service) and simply
decrement the refcount without tearing down the service on success.

This makes the sv_nrthreads handling work basically the same in both
__write_ports_addxprt and __write_ports_addfd.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

0cd14a06
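
The two exit paths described in this message can be modelled in userspace as follows. This is a sketch under stated assumptions: `struct svc_model` and its helpers are hypothetical stand-ins and do not reproduce the real svc_destroy() or sv_nrthreads logic.

```c
#include <stdbool.h>

/* Hypothetical model of a service whose lifetime is tied to a refcount. */
struct svc_model {
	bool exists;	/* service has been created and not torn down */
	int refs;	/* references held (the thread count in the text) */
};

/* Create the service if needed and take a reference on it. */
static void svc_create_or_get(struct svc_model *s)
{
	s->exists = true;
	s->refs++;
}

/* Error path: drop the reference and tear down if it was the last one. */
static void svc_destroy_ref(struct svc_model *s)
{
	if (--s->refs == 0)
		s->exists = false;
}

/* Success path: drop the reference but leave the service in place. */
static void svc_drop_ref_keep(struct svc_model *s)
{
	s->refs--;
}

/* Add a transport; transport_ok stands in for the real setup result. */
static int add_transport(struct svc_model *s, bool transport_ok)
{
	svc_create_or_get(s);
	if (!transport_ok) {
		svc_destroy_ref(s);
		return -1;
	}
	svc_drop_ref_keep(s);
	return 0;
}
```

Either way the count returns to its previous value; the only difference is whether the service itself survives the call.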

nfsd: fix error handling when starting nfsd with rpcbind down · 78a8d7c8

Jeff Layton authored Jul 19, 2010

The refcounting for nfsd is a little goofy. What happens is that we
create the nfsd RPC service, attach sockets to it but don't actually
start the threads until someone writes to the "threads" procfile. To do
this, __write_ports_addfd will create the nfsd service and then will
decrement the refcount when exiting but won't actually destroy the
service.

This is fine when there aren't errors, but when there are this can
cause later attempts to start nfsd to fail. nfsd_serv will be set,
and that causes __write_versions to return EBUSY.

Fix this by calling svc_destroy on nfsd_serv when this function is
going to return error.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

78a8d7c8

nfsd4: fix v4 state shutdown error paths · 4ad9a344

Jeff Layton authored Jul 19, 2010

If someone tries to shut down the laundry_wq while it isn't up it'll
cause an oops.

This can happen because write_ports can create a nfsd_svc before we
really start the nfs server, and we may fail before the server is ever
started.

Also make sure state is shut down on error paths in nfsd_svc().

Use a common global nfsd_up flag instead of nfs4_init, and create common
helper functions for nfsd start/shutdown, as there will be other work
that we want done only when the number of nfsd threads transitions
between zero and nonzero.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

4ad9a344
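
As an illustration of guarding startup and shutdown work with a single "up" flag, here is a minimal userspace sketch. `struct nfsd_model` and the `model_*` functions are hypothetical; the counters exist only to make the behaviour observable.

```c
#include <stdbool.h>

/* Hypothetical model of a server guarded by one "up" flag. */
struct nfsd_model {
	bool up;	/* startup work has been done */
	int nthreads;	/* current number of threads */
	int startups;	/* how many times startup work ran */
	int shutdowns;	/* how many times shutdown work ran */
};

/* Run startup work only if the server is not already up. */
static void model_startup(struct nfsd_model *m)
{
	if (m->up)
		return;
	m->startups++;
	m->up = true;
}

/* Run shutdown work only if startup actually happened. */
static void model_shutdown(struct nfsd_model *m)
{
	if (!m->up)
		return;
	m->shutdowns++;
	m->up = false;
}

/* Change the thread count; work runs only on zero/nonzero transitions. */
static void model_set_threads(struct nfsd_model *m, int n)
{
	if (n > 0)
		model_startup(m);
	m->nthreads = n;
	if (n == 0)
		model_shutdown(m);
}
```

With the flag in place, a shutdown request that arrives before the server was ever started becomes a no-op instead of tearing down state that was never set up.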

nfsd: remove unused assignment from nfsd_link · 55b13354

J. Bruce Fields authored Jul 19, 2010

Trivial cleanup, since "dest" is never used.
Reported-by: Anshul Madan <Anshul.Madan@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

55b13354

07 Jul, 2010 1 commit

NFSD: Fill in WCC data for REMOVE, RMDIR, MKNOD, and MKDIR · 43a9aa64

Chuck Lever authored Jul 06, 2010

Some well-known NFSv3 clients drop their directory entry caches when
they receive replies with no WCC data.  Without this data, they
employ extra READ, LOOKUP, and GETATTR requests to ensure their
directory entry caches are up to date, causing performance to suffer
needlessly.

In order to return WCC data, our server has to have both the pre-op
and the post-op attribute data on hand when a reply is XDR encoded.
The pre-op data is filled in when the incoming fh is locked, and the
post-op data is filled in when the fh is unlocked.

Unfortunately, for REMOVE, RMDIR, MKNOD, and MKDIR, the directory fh
is not unlocked until well after the reply has been XDR encoded.  This
means that encode_wcc_data() does not have wcc_data for the parent
directory, so none is returned to the client after these operations
complete.

By unlocking the parent directory fh immediately after the internal
operations for each NFS procedure are complete, the post-op data is
filled in before XDR encoding starts, so it can be returned to the
client properly.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

43a9aa64
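
The ordering constraint above (pre-op attributes captured at lock time, post-op attributes at unlock time, both needed before encoding) can be sketched with a toy model. `struct dir_fh_model` and the `model_*` helpers are hypothetical and stand in for the real file handle locking and encode_wcc_data() paths.

```c
#include <stdbool.h>

/* Hypothetical model of a directory file handle carrying WCC state. */
struct dir_fh_model {
	bool locked;
	bool pre_saved;		/* pre-op attributes captured */
	bool post_saved;	/* post-op attributes captured */
};

/* Locking the handle captures the pre-op attributes. */
static void model_fh_lock(struct dir_fh_model *fh)
{
	fh->locked = true;
	fh->pre_saved = true;
}

/* Unlocking the handle captures the post-op attributes. */
static void model_fh_unlock(struct dir_fh_model *fh)
{
	if (!fh->locked)
		return;
	fh->post_saved = true;
	fh->locked = false;
}

/* WCC data can be encoded only once both halves are available. */
static bool model_can_encode_wcc(const struct dir_fh_model *fh)
{
	return fh->pre_saved && fh->post_saved;
}
```

In this model, encoding while the handle is still locked finds no post-op data, which corresponds to the empty WCC replies described for REMOVE, RMDIR, MKNOD, and MKDIR.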

06 Jul, 2010 2 commits

nfsd4: comment nitpick · 6a85d6c7

J. Bruce Fields authored Jul 06, 2010

Reported-by: "Madan, Anshul" <Anshul.Madan@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

6a85d6c7

sunrpc: make the cache cleaner workqueue deferrable · 8eab945c

Artem Bityutskiy authored Jul 01, 2010

This patch makes the cache_cleaner workqueue deferrable, to prevent
unnecessary system wake-ups, which is very important for embedded
battery-powered devices.

do_cache_clean() is called every 30 seconds at the moment, and often
makes the system wake up from its power-save sleep state. With this
change, when the workqueue uses a deferrable timer, the
do_cache_clean() invocation will be delayed and combined with the
closest "real" wake-up. This improves the power consumption situation.

Note, I tried to create a DECLARE_DELAYED_WORK_DEFERRABLE() helper
macro, similar to DECLARE_DELAYED_WORK(), but failed because of the
way the timer wheel core stores the deferrable flag (it is the
LSBit in the timer->base pointer). My attempt to define a static
variable with this bit set ended up with the "initializer element is
not constant" error.

Thus, I have to use run-time initialization, so I created a new
cache_initialize() function which is called once when sunrpc is
being initialized.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

8eab945c
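
The idea of keeping a flag in the least significant bit of a pointer can be sketched in plain C. Everything here is illustrative: `struct base`, `tagged_base`, and the helper names are hypothetical and not the timer code's own. The tagged value is stored as a `uintptr_t` so that the sketch never forms a misaligned pointer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a timer base; its alignment leaves bit 0 free. */
struct base {
	long dummy;
};

/* A base address with the flag folded into its lowest bit. */
typedef uintptr_t tagged_base;

#define DEFERRABLE_FLAG ((uintptr_t)1)

/* Combine a base pointer and the flag; this has to happen at run time. */
static tagged_base make_tagged(struct base *b, bool deferrable)
{
	uintptr_t v = (uintptr_t)b;

	return deferrable ? (v | DEFERRABLE_FLAG) : v;
}

/* Read the flag back out of the low bit. */
static bool is_deferrable(tagged_base t)
{
	return (t & DEFERRABLE_FLAG) != 0;
}

/* Recover the original pointer by masking the flag off. */
static struct base *base_of(tagged_base t)
{
	return (struct base *)(t & ~DEFERRABLE_FLAG);
}
```

An expression such as `(uintptr_t)&some_base | 1` is generally not accepted as a constant initializer for a static object in C, which is consistent with the compiler error quoted in the message and with the move to run-time initialization.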

24 Jun, 2010 2 commits

nfsd4: fix delegation recall race use-after-free · cba9ba4b

J. Bruce Fields authored Jun 01, 2010

When the rarely-used callback-connection-changing setclientid occurs
simultaneously with a delegation recall, we rerun the recall by
requeueing it on a workqueue. But we also need to take a reference on
the delegation in that case, since the delegation held by the rpc itself
will be released by the rpc_release callback.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

cba9ba4b

nfsd4: fix deleg leak on callback error · ac94bf58

J. Bruce Fields authored May 31, 2010

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

ac94bf58

23 Jun, 2010 1 commit

nfsd4: remove some debugging code · ec8acac8

J. Bruce Fields authored Jun 16, 2010

This is overkill.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

ec8acac8

22 Jun, 2010 3 commits

nfsd: nfs4callback encode_stateid helper function · 9303bbd3

Benny Halevy authored May 25, 2010

To be used also for the pnfs cb_layoutrecall callback
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd4: fix cb_recall encoding]
    "nfsd: nfs4callback encode_stateid helper function" forgot to reserve
    more space after return from the new helper.
Reported-by: Michael Groshans <groshans@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

9303bbd3

nfsd4: translate memory errors to delay, not serverfault · 4731030d

J. Bruce Fields authored Jun 22, 2010

If the server is out of memory, it is better for clients to back off and
retry than to just error out.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

4731030d
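
A minimal sketch of such a translation, assuming kernel-style negative errno values as input. The status numbers are the NFSv4 protocol values for NFS4ERR_DELAY and NFS4ERR_SERVERFAULT; the function name and the catch-all default are hypothetical and much simpler than a real errno mapping table.

```c
#include <errno.h>
#include <stdint.h>

/* NFSv4 status codes as defined by the protocol. */
enum {
	NFS4_OK = 0,
	NFS4ERR_SERVERFAULT = 10006,
	NFS4ERR_DELAY = 10008
};

/* Hypothetical mapping from a negative errno to an NFSv4 status. */
static uint32_t status_from_errno(int err)
{
	switch (err) {
	case 0:
		return NFS4_OK;
	case -ENOMEM:
		/* Transient condition: ask the client to retry later. */
		return NFS4ERR_DELAY;
	default:
		/* Assumption: anything unrecognised is a server fault. */
		return NFS4ERR_SERVERFAULT;
	}
}
```

A client that receives NFS4ERR_DELAY is expected to wait and resend the request, whereas NFS4ERR_SERVERFAULT is typically surfaced to the application as an error.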

nfsd4; fix session reference count leak · 76407f76

J. Bruce Fields authored Jun 22, 2010

Note the session has to be put() here regardless of what happens to the
client.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

76407f76

31 May, 2010 4 commits

nfsd4: don't bother storing callback reply tag · 68a4b48c

J. Bruce Fields authored May 27, 2010

We don't use this, and probably never will.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

68a4b48c

nfsd4: fix use of op_share_access · 24a0111e

J. Bruce Fields authored May 18, 2010

NFSv4.1 adds additional flags to the share_access argument of the open
call.  These flags need to be masked out in some of the existing code,
but current code does that inconsistently.
Tested-by: Michael Groshans <groshans@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

24a0111e
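
Masking the extra flags before interpreting the access mode might look like the sketch below. The macro names and `base_access()` are hypothetical; the bit values follow the share_access layout of the NFSv4.1 protocol (access mode in the low bits, "want" flags in the 0xFF00 range), but treat them as assumptions of this example rather than a copy of the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define SHARE_ACCESS_READ	0x0001u
#define SHARE_ACCESS_WRITE	0x0002u
#define SHARE_ACCESS_BOTH	0x0003u
#define SHARE_ACCESS_MASK	0x000Fu	/* assumed: bits holding the mode */
#define SHARE_WANT_READ_DELEG	0x0100u	/* example of a 4.1 "want" flag */

/* Strip any 4.1 flags so only the basic access mode remains. */
static uint32_t base_access(uint32_t share_access)
{
	return share_access & SHARE_ACCESS_MASK;
}

/* Decide whether an open asks for write access, ignoring extra flags. */
static bool wants_write(uint32_t share_access)
{
	return (base_access(share_access) & SHARE_ACCESS_WRITE) != 0;
}
```

Comparing the raw argument directly against `SHARE_ACCESS_BOTH` would fail as soon as a client sets one of the additional flags, so every comparison goes through the masked value.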

nfsd4: treat more recall errors as failures · 172c85dd

J. Bruce Fields authored May 30, 2010

If a recall fails for some unexpected reason, instead of ignoring it and
treating it like a success, it's safer to treat it as a failure,
preventing further delegation grants and returning CB_PATH_DOWN.

Also put the switches in a (to me) more logical order, with normal case
first.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

172c85dd

nfsd4: remove extra put() on callback errors · 378b7d37

J. Bruce Fields authored May 25, 2010

Since rpc_call_async() guarantees that the release method will be called
even on failure, this put is wrong.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

378b7d37

30 May, 2010 6 commits

Linux 2.6.35-rc1 · 67a3e12b

Linus Torvalds authored May 30, 2010

.. and thus endeth the merge window.

67a3e12b

Merge branch 'slub/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 · 3b03117c

Linus Torvalds authored May 30, 2010

* 'slub/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
  SLUB: Allow full duplication of kmalloc array for 390
  slub: move kmem_cache_node into it's own cacheline

3b03117c

Merge branch 'core-fixes-for-linus' of... · fa7eadab

Linus Torvalds authored May 30, 2010

Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  mutex: Fix optimistic spinning vs. BKL

fa7eadab

Merge branch 'perf-fixes-for-linus' of... · bc7d352c

Linus Torvalds authored May 30, 2010

Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf tui: Fix last use_browser problem related to .perfconfig
  perf symbols: Add the build id cache to the vmlinux path
  perf tui: Reset use_browser if stdout is not a tty
  ring-buffer: Move zeroing out excess in page to ring buffer code
  ring-buffer: Reset "real_end" when page is filled

bc7d352c

ia64: revert __node_random addition · b3f2f6cd

Linus Torvalds authored May 30, 2010

This partially reverts commit 4ec37de8 ("[IA64] Fix build breakage"),
since the commit that made it necessary got reverted earlier (see commit
35926ff5, 'Revert "cpusets: randomize node rotor used in
cpuset_mem_spread_node()"')

Even if we ever re-introduce this, there is no reason to make
__node_random be some architecture-specific function.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b3f2f6cd

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse · 003386ff

Linus Torvalds authored May 30, 2010

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  mm: export generic_pipe_buf_*() to modules
  fuse: support splice() reading from fuse device
  fuse: allow splice to move pages
  mm: export remove_from_page_cache() to modules
  mm: export lru_cache_add_*() to modules
  fuse: support splice() writing to fuse device
  fuse: get page reference for readpages
  fuse: use get_user_pages_fast()
  fuse: remove unneeded variable

003386ff