Commits · 27ac0ffeac80ba6b9580529568d06144df044366 · Kirill Smelkov / linux

09 Nov, 2013 40 commits

locks: break delegations on any attribute modification · 27ac0ffe

J. Bruce Fields authored Sep 20, 2011

NFSv4 uses leases to guarantee that clients can cache metadata as well
as data.

Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: David Howells <dhowells@redhat.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

27ac0ffe

locks: break delegations on link · 146a8595

J. Bruce Fields authored Sep 20, 2011

Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

146a8595

locks: break delegations on rename · 8e6d782c

J. Bruce Fields authored Sep 20, 2011

Cc: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

8e6d782c

locks: helper functions for delegation breaking · 5a14696c

J. Bruce Fields authored Aug 28, 2012

We'll need the same logic for rename and link.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5a14696c

locks: break delegations on unlink · b21996e3

J. Bruce Fields authored Sep 20, 2011

We need to break delegations on any operation that changes the set of
links pointing to an inode.  Start with unlink.

Such operations also hold the i_mutex on a parent directory.  Breaking a
delegation may require waiting for a timeout (by default 90 seconds) in
the case of a unresponsive NFS client.  To avoid blocking all directory
operations, we therefore drop locks before waiting for the delegation.
The logic then looks like:

	acquire locks
	...
	test for delegation; if found:
		take reference on inode
		release locks
		wait for delegation break
		drop reference on inode
		retry

It is possible this could never terminate.  (Even if we take precautions
to prevent another delegation being acquired on the same inode, we could
get a different inode on each retry.)  But this seems very unlikely.

The initial test for a delegation happens after the lock on the target
inode is acquired, but the directory inode may have been acquired
further up the call stack.  We therefore add a "struct inode **"
argument to any intervening functions, which we use to pass the inode
back up to the caller in the case it needs a delegation synchronously
broken.

Cc: David Howells <dhowells@redhat.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b21996e3

namei: minor vfs_unlink cleanup · 9accbb97

J. Bruce Fields authored Aug 28, 2012

We'll be using dentry->d_inode in one more place.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9accbb97

locks: implement delegations · df4e8d2c

J. Bruce Fields authored Mar 05, 2012

Implement NFSv4 delegations at the vfs level using the new FL_DELEG lock
type.

Note nfsd is the only delegation user and is only using read
delegations.  Warn on any attempt to set a write delegation for now.
We'll come back to that case later.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

df4e8d2c

locks: introduce new FL_DELEG lock flag · 617588d5

J. Bruce Fields authored Jul 01, 2011

For now FL_DELEG is just a synonym for FL_LEASE.  So this patch doesn't
change behavior.

Next we'll modify break_lease to treat FL_DELEG leases differently, to
account for the fact that NFSv4 delegations should be broken in more
situations than Windows oplocks.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

617588d5

vfs: take i_mutex on renamed file · 6cedba89

J. Bruce Fields authored Mar 05, 2012

A read delegation is used by NFSv4 as a guarantee that a client can
perform local read opens without informing the server.

The open operation takes the last component of the pathname as an
argument, thus is also a lookup operation, and giving the client the
above guarantee means informing the client before we allow anything that
would change the set of names pointing to the inode.

Therefore, we need to break delegations on rename, link, and unlink.

We also need to prevent new delegations from being acquired while one of
these operations is in progress.

We could add some completely new locking for that purpose, but it's
simpler to use the i_mutex, since that's already taken by all the
operations we care about.

The single exception is rename.  So, modify rename to take the i_mutex
on the file that is being renamed.

Also fix up lockdep and Documentation/filesystems/directory-locking to
reflect the change.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6cedba89

vfs: rename I_MUTEX_QUOTA now that it's not used for quotas · 40bd22c9

J. Bruce Fields authored Apr 18, 2012

I_MUTEX_QUOTA is now just being used whenever we want to lock two
non-directories.  So the name isn't right.  I_MUTEX_NONDIR2 isn't
especially elegant but it's the best I could think of.

Also fix some outdated documentation.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

40bd22c9

vfs: don't use PARENT/CHILD lock classes for non-directories · 27555516

J. Bruce Fields authored Apr 25, 2012

Reserve I_MUTEX_PARENT and I_MUTEX_CHILD for locking of actual
directories.

(Also I_MUTEX_QUOTA isn't really a meaningful name for this locking
class any more; fixed in a later patch.)
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

27555516

vfs: pull ext4's double-i_mutex-locking into common code · 375e289e

J. Bruce Fields authored Apr 18, 2012

We want to do this elsewhere as well.

Also catch any attempts to use it for directories (where this ordering
would conflict with ancestor-first directory ordering in lock_rename).

Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Dave Chinner <david@fromorbit.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

375e289e

exportfs: fix quadratic behavior in filehandle lookup · f27c9298

J. Bruce Fields authored Oct 17, 2013

Suppose we're given the filehandle for a directory whose closest
ancestor in the dcache is its Nth ancestor.

The main loop in reconnect_path searches for an IS_ROOT ancestor of
target_dir, reconnects that ancestor to its parent, then recommences the
search for an IS_ROOT ancestor from target_dir.

This behavior is quadratic in N.  And there's really no need to restart
the search from target_dir each time: once a directory has been looked
up, it won't become IS_ROOT again.  So instead of starting from
target_dir each time, we can continue where we left off.

This simplifies the code and improves performance on very deep directory
heirachies.  (I can't think of any reason anyone should need heirarchies
a hundred or more deep, but the performance improvement may be valuable
if only to limit damage in case of abuse.)
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f27c9298

exportfs: better variable name · efbf201f

J. Bruce Fields authored Oct 17, 2013

Replace another unhelpful acronym.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

efbf201f

exportfs: move most of reconnect_path to helper function · bbf7a8a3

J. Bruce Fields authored Oct 17, 2013

Also replace 3 easily-confused three-letter acronyms by more helpful
variable names.

Just cleanup, no change in functionality, with one exception: the
dentry_connected() check in the "out_reconnected" case will now only
check the ancestors of the current dentry instead of checking all the
way from target_dir.  Since we've already verified connectivity up to
this dentry, that should be sufficient.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

bbf7a8a3

exportfs: eliminate unused "noprogress" counter · e4b70ebe

J. Bruce Fields authored Oct 16, 2013

Note this counter is now being set to 0 on every pass through the loop,
so it no longer serves any useful purpose.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e4b70ebe

exportfs: stop retrying once we race with rename/remove · a056cc89

J. Bruce Fields authored Oct 16, 2013

There are two places here where we could race with a rename or remove:

	- We could find the parent, but then be removed or renamed away
	  from that parent directory before finding our name in that
	  directory.
	- We could find the parent, and find our name in that parent,
	  but then be renamed or removed before we look ourselves up by
	  that name in that parent.

In both cases the concurrent rename or remove will take care of
reconnecting the directory that we're currently examining.  Our target
directory should then also be connected.  Check this and clear
DISCONNECTED in these cases instead of looping around again.

Note: we *do* need to check that this actually happened if we want to be
robust in the face of corrupted filesystems: a corrupted filesystem
could just return a completely wrong parent, and we want to fail with an
error in that case before starting to clear DISCONNECTED on
non-DISCONNECTED filesystems.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a056cc89

exportfs: clear DISCONNECTED on all parents sooner · 0dbc018a

J. Bruce Fields authored Sep 09, 2013

Once we've found any connected parent, we know all our parents are
connected--that's true even if there's a concurrent rename.  May as well
clear them all at once and be done with it.
Reviewed-by: Cristoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0dbc018a

exportfs: more detailed comment for path_reconnect · 78cee9a8

J. Bruce Fields authored Oct 22, 2013

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

78cee9a8

exportfs: BUG_ON in crazy corner case · 854ff5ca

Christoph Hellwig authored Oct 16, 2013

This would indicate a nasty bug in the dcache and has never triggered in
the past 10 years as far as I know.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

854ff5ca

dcache: fix outdated DCACHE_NEED_LOOKUP comment · 13a2c3be

J. Bruce Fields authored Oct 23, 2013

The DCACHE_NEED_LOOKUP case referred to here was removed with
39e3c955 "vfs: remove
DCACHE_NEED_LOOKUP".

There are only four real_lookup() callers and all of them pass in an
unhashed dentry just returned from d_alloc.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

13a2c3be

dcache: don't clear DCACHE_DISCONNECTED too early · f80de2cd

J. Bruce Fields authored Jul 18, 2012

DCACHE_DISCONNECTED should not be cleared until we're sure the dentry is
connected all the way up to the root of the filesystem.  It *shouldn't*
be cleared as soon as the dentry is connected to a parent.  That will
cause bugs at least on exportable filesystems.
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f80de2cd

dcache: Don't set DISCONNECTED on "pseudo filesystem" dentries · e1a24bb0

J. Bruce Fields authored Jun 29, 2012

I can't for the life of me see any reason why anyone should care whether
a dentry that is never hooked into the dentry cache would need
DCACHE_DISCONNECTED set.

This originates from 4b936885 "fs:
improve scalability of pseudo filesystems", which probably just made the
false assumption the DCACHE_DISCONNECTED was meant to be set on anything
not connected to a parent somehow.

So this is just confusing.  Ideally the only uses of DCACHE_DISCONNECTED
would be in the filehandle-lookup code, which needs it to ensure
dentries are connected into the dentry tree before use.

I left d_alloc_pseudo there even though it's now equivalent to
__d_alloc(), just on the theory the name is better documentation of its
intended use outside dcache.c.

Cc: Nick Piggin <npiggin@kernel.dk>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e1a24bb0

dcache: use IS_ROOT to decide where dentry is hashed · 7632e465

J. Bruce Fields authored Jun 28, 2012

Every hashed dentry is either hashed in the dentry_hashtable, or a
superblock's s_anon list.

__d_drop() assumes it can determine which is the case by checking
DCACHE_DISCONNECTED; this is not true.

It is true that when DCACHE_DISCONNECTED is cleared, the dentry is not
only hashed on dentry_hashtable, but is fully connected to its parents
back to the root.

But the converse is *not* true: fs/exportfs/expfs.c:reconnect_path()
attempts to connect a directory (found by filehandle lookup) back to
root by ascending to parents and performing lookups one at a time.  It
does not clear DCACHE_DISCONNECTED until it's done, and that is not at
all an atomic process.

In particular, it is possible for DCACHE_DISCONNECTED to be set on a
dentry which is hashed on the dentry_hashtable.

Instead, use IS_ROOT() to check which hash chain a dentry is on.  This
*does* work:

Dentries are hashed only by:

	- d_obtain_alias, which adds an IS_ROOT() dentry to sb_anon.

	- __d_rehash, called by _d_rehash: hashes to the dentry's
	  parent, and all callers of _d_rehash appear to have d_parent
	  set to a "real" parent.
	- __d_rehash, called by __d_move: rehashes the moved dentry to
	  hash chain determined by target, and assigns target's d_parent
	  to its d_parent, before dropping the dentry's d_lock.

Therefore I believe it's safe for a holder of a dentry's d_lock to
assume that it is hashed on sb_anon if and only if IS_ROOT(dentry) is
true.

I believe the incorrect assumption about DCACHE_DISCONNECTED was
originally introduced by ceb5bdc2 "fs: dcache per-bucket dcache hash
locking".

Also add a comment while we're here.

Cc: Nick Piggin <npiggin@kernel.dk>
Acked-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7632e465

ocfs2: get rid of impossible checks · b19f1336
Al Viro authored Nov 03, 2013
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
b19f1336
qnx4: i_sb is never NULL · fbad2bd1
Al Viro authored Nov 03, 2013
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
fbad2bd1

exportfs: fix 32-bit nfsd handling of 64-bit inode numbers · 950ee956

J. Bruce Fields authored Sep 10, 2013

Symptoms were spurious -ENOENTs on stat of an NFS filesystem from a
32-bit NFS server exporting a very large XFS filesystem, when the
server's cache is cold (so the inodes in question are not in cache).
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reported-by: Trevor Cordes <trevor@tecnopolis.ca>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

950ee956

vfs: split out vfs_getattr_nosec · b7a6ec52

J. Bruce Fields authored Oct 02, 2013

The filehandle lookup code wants this version of getattr.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b7a6ec52

iget/iget5: don't bother with ->i_lock until we find a match · 5a3cd992
Al Viro authored Nov 06, 2013
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
5a3cd992

VFS: Put a small type field into struct dentry::d_flags · b18825a7

David Howells authored Sep 12, 2013

Put a type field into struct dentry::d_flags to indicate if the dentry is one
of the following types that relate particularly to pathwalk:

	Miss (negative dentry)
	Directory
	"Automount" directory (defective - no i_op->lookup())
	Symlink
	Other (regular, socket, fifo, device)

The type field is set to one of the first five types on a dentry by calls to
__d_instantiate() and d_obtain_alias() from information in the inode (if one is
given).

The type is cleared by dentry_unlink_inode() when it reconstitutes an existing
dentry as a negative dentry.

Accessors provided are:

	d_set_type(dentry, type)
	d_is_directory(dentry)
	d_is_autodir(dentry)
	d_is_symlink(dentry)
	d_is_file(dentry)
	d_is_negative(dentry)
	d_is_positive(dentry)

A bunch of checks in pathname resolution switched to those.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b18825a7

elf{,_fdpic} coredump: get rid of pointless if (siginfo->si_signo) · afabada9

Al Viro authored Oct 14, 2013

we can't get to do_coredump() if that condition isn't satisfied...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

afabada9

constify do_coredump() argument · ec57941e
Al Viro authored Oct 13, 2013
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
ec57941e
constify copy_siginfo_to_user{,32}() · ce395960
Al Viro authored Oct 13, 2013
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
ce395960

... and kill anon_inode_getfile_private() · 078d8e62

Al Viro authored Oct 09, 2013

it's a seriously misguided API, now fortunately without users.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

078d8e62

rework aio migrate pages to use aio fs · 71ad7490

Benjamin LaHaise authored Sep 17, 2013

Don't abuse anon_inodes.c to host private files needed by aio;
we can bloody well declare a mini-fs of our own instead of
patching up what anon_inodes can create for us.
Tested-by: Benjamin LaHaise <bcrl@kvack.org>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

71ad7490

take anon inode allocation to libfs.c · 6987843f
Al Viro authored Oct 02, 2013
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
6987843f

new helper: dump_align() · 22a8cb82

Al Viro authored Oct 08, 2013

dump_skip to given alignment...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

22a8cb82

spufs: get rid of dump_emit() wrappers · 7b1f4020
Al Viro authored Oct 08, 2013
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
7b1f4020
dump_skip(): dump_seek() replacement taking coredump_params · 9b56d543
Al Viro authored Oct 08, 2013
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
9b56d543

make dump_emit() use vfs_write() instead of banging at ->f_op->write directly · 2507a4fb

Al Viro authored Oct 08, 2013

... and deal with short writes properly - the output might be to pipe, after
all; as it is, e.g. no-MMU case of elf_fdpic coredump can write a whole lot
more than a page worth of data at one call.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

2507a4fb