Commits · c51da20c48b76ef1114d14b6b6ff190e11afab0e · Kirill Smelkov / linux

09 May, 2016 5 commits

more trivial ->iterate_shared conversions · c51da20c
Al Viro authored Apr 30, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
c51da20c

lustre: don't need to lock inode in directory lseek · 060ff688

Al Viro authored Apr 20, 2016

Note that lustre has its private mutex protecting directory pagecache;
if they ever remove it, they'll need to be careful with PageChecked()
use.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

060ff688

kernfs: no point locking directory around that generic_file_llseek() · 8cb0d2c1
Al Viro authored Apr 20, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
8cb0d2c1

configfs_readdir(): make safe under shared lock · a01b3007
Al Viro authored Apr 20, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
a01b3007

nfs: per-name sillyunlink exclusion · 884be175

Al Viro authored Apr 28, 2016

use d_alloc_parallel() for sillyunlink/lookup exclusion and
explicit rwsem (nfs_rmdir() being a writer and nfs_call_unlink() -
a reader) for rmdir/sillyunlink one.

That ought to make lookup/readdir/!O_CREAT atomic_open really
parallel on NFS.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

884be175

02 May, 2016 35 commits

nfs: switch to ->iterate_shared() · 9ac3d3e8

Al Viro authored Apr 28, 2016

aside from the usual care about seeding dcache from readdir, we need
to be careful about the pagecache evictions here.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9ac3d3e8

lookup_open(): lock the parent shared unless O_CREAT is given · 9cf843e3
Al Viro authored Apr 28, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
9cf843e3

lookup_open(): put the dentry fed to ->lookup() or ->atomic_open() into in-lookup hash · 6fbd0714
Al Viro authored Apr 28, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
6fbd0714

lookup_open(): expand the call of real_lookup() · 12fa5e24

Al Viro authored Apr 28, 2016

... and lose the duplicate IS_DEADDIR() - we'd already checked that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

12fa5e24

atomic_open(): reorder and clean up a bit · 384f26e2
Al Viro authored Apr 28, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
384f26e2

lookup_open(): lift the "fallback to !O_CREAT" logics from atomic_open() · 1643b43f
Al Viro authored Apr 27, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
1643b43f

atomic_open(): be paranoid about may_open() return value · b3d58eaf

Al Viro authored Apr 27, 2016

It should never return positives; however, with Linux S&M crowd
involved, no bogosity is impossible.  Results would be unpleasant...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b3d58eaf

atomic_open(): delay open_to_namei_flags() until the method call · 0fb1ea09
Al Viro authored Apr 27, 2016
```
nobody else needs that transformation.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
0fb1ea09

do_last(): take fput() on error after opening to out: · fe9ec829

Al Viro authored Apr 27, 2016

make it conditional on *opened & FILE_OPENED; in addition to getting
rid of exit_fput: thing, it simplifies atomic_open() cleanup on
may_open() failure.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fe9ec829

do_last(): get rid of duplicate ELOOP check · 47f9dbd3
Al Viro authored Apr 27, 2016
```
may_open() will catch it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
47f9dbd3

atomic_open(): massage the create_error logics a bit · 55db2fd9
Al Viro authored Apr 27, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
55db2fd9

atomic_open(): consolidate "overridden ENOENT" in open-yourself cases · 9d0728e1
Al Viro authored Apr 27, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
9d0728e1

atomic_open(): don't bother with EEXIST check - it's done in do_last() · 5249e411
Al Viro authored Apr 27, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
5249e411

Merge branch 'for-linus' into work.lookups · df889b36
Al Viro authored May 02, 2016

df889b36

lookup_open(): expand the call of vfs_create() · ce8644fc

Al Viro authored Apr 26, 2016

Lift IS_DEADDIR handling up into the part common with atomic_open(),
remove it from the latter.  Collapse permission checks into the
call of may_o_create(), getting it closer to atomic_open() case.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ce8644fc

path_openat(): take O_PATH handling out of do_last() · 6ac08709

Al Viro authored Apr 26, 2016

do_last() and lookup_open() are simpler that way and so is O_PATH
itself.  As it bloody well should be: we find what the pathname
resolves to, same way as in stat() et al., and associate it with
an FMODE_PATH struct file.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6ac08709

simple local filesystems: switch to ->iterate_shared() · 3b0a3c1a

Al Viro authored Apr 20, 2016

no changes needed (XFS isn't simple, but it has the same parallelism
in the interesting parts exercised from CXFS).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3b0a3c1a

dcache_{readdir,dir_lseek}() users: switch to ->iterate_shared · 4e82901c

Al Viro authored Apr 20, 2016

no need to lock directory in dcache_dir_lseek(), while we are
at it - per-struct file exclusion is enough.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

4e82901c

cifs: switch to ->iterate_shared() · 3125d265
Al Viro authored Apr 20, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
3125d265

fuse: switch to ->iterate_shared() · d9b3dbdc

Al Viro authored Apr 20, 2016

Switch dcache pre-seeding on readdir to d_alloc_parallel();
nothing else is needed.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d9b3dbdc

switch all procfs directories ->iterate_shared() · f50752ea
Al Viro authored Apr 20, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
f50752ea

proc_sys_fill_cache(): switch to d_alloc_parallel() · 76aab3ab
Al Viro authored Apr 20, 2016
```
make it usable with directory locked shared
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
76aab3ab

proc_fill_cache(): switch to d_alloc_parallel() · 3781764b

Al Viro authored Apr 20, 2016

... making it usable with directory locked shared
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3781764b

introduce a parallel variant of ->iterate() · 61922694

Al Viro authored Apr 20, 2016

New method: ->iterate_shared().  Same arguments as in ->iterate(),
called with the directory locked only shared.  Once all filesystems
switch, the old one will be gone.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

61922694
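
A minimal userspace sketch of the idea in the commit above, assuming the caller prefers the shared method whenever a filesystem provides it and falls back to the old one under an exclusive lock otherwise. All `toy_*` names, the `lock_mode` enum and the two example ops tables are hypothetical stand-ins, not kernel definitions; the "lock" only records its mode so the callback can show what it ran under.

```c
#include <assert.h>
#include <stddef.h>

enum lock_mode { UNLOCKED, LOCKED_SHARED, LOCKED_EXCLUSIVE };

struct toy_dir;

/* Both methods take the same arguments; only the locking contract differs. */
struct toy_dir_ops {
	int (*iterate)(struct toy_dir *dir);        /* old: directory locked exclusive */
	int (*iterate_shared)(struct toy_dir *dir); /* new: directory locked shared */
};

struct toy_dir {
	enum lock_mode mode;	/* stands in for the directory lock */
	enum lock_mode seen;	/* mode observed by the last iterate call */
	int nr_entries;
	const struct toy_dir_ops *ops;
};

/* Caller side: take the lock shared only if the filesystem has converted. */
static int toy_readdir(struct toy_dir *dir)
{
	int ret;

	assert(dir->mode == UNLOCKED);
	if (dir->ops->iterate_shared) {
		dir->mode = LOCKED_SHARED;
		ret = dir->ops->iterate_shared(dir);
	} else {
		dir->mode = LOCKED_EXCLUSIVE;
		ret = dir->ops->iterate(dir);
	}
	dir->mode = UNLOCKED;
	return ret;
}

/* One body serves both slots here, since the arguments are identical. */
static int toy_iterate(struct toy_dir *dir)
{
	dir->seen = dir->mode;
	return dir->nr_entries;
}

static const struct toy_dir_ops converted_fs = { .iterate_shared = toy_iterate };
static const struct toy_dir_ops legacy_fs = { .iterate = toy_iterate };
```

With a directory using `converted_fs`, `toy_readdir()` runs the callback in `LOCKED_SHARED` mode; with `legacy_fs` it runs in `LOCKED_EXCLUSIVE` mode. Once every ops table fills the shared slot, the fallback branch has no users left.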

give readdir(2)/getdents(2)/etc. uniform exclusion with lseek() · 63b6df14

Al Viro authored Apr 20, 2016

same as read() on regular files has, and for the same reason.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

63b6df14

parallel lookups: actual switch to rwsem · 9902af79

Al Viro authored Apr 15, 2016

ta-da!

The main issue is the lack of down_write_killable(), so the places
like readdir.c switched to plain inode_lock(); once killable
variants of rwsem primitives appear, that'll be dealt with.

lockdep side also might need more work
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9902af79

parallel lookups machinery, part 4 (and last) · d9171b93

Al Viro authored Apr 15, 2016

If we *do* run into an in-lookup match, we need to wait for it to
cease being in-lookup.  Fortunately, we do have unused space in
in-lookup dentries - d_lru is never looked at until it stops being
in-lookup.

So we can stash a pointer to wait_queue_head from stack frame of
the caller of ->lookup().  Some precautions are needed while
waiting, but it's not that hard - we do hold a reference to dentry
we are waiting for, so it can't go away.  If it's found to be
in-lookup the wait_queue_head is still alive and will remain so
at least while ->d_lock is held.  Moreover, the condition we
are waiting for becomes true at the same point where everything
on that wq gets woken up, so we can just add ourselves to the
queue once.

d_alloc_parallel() gets a pointer to wait_queue_head_t from its
caller; lookup_slow() adjusted, d_add_ci() taught to use
d_alloc_parallel() if the dentry passed to it happens to be
in-lookup one (i.e. if it's been called from the parallel lookup).

That's pretty much it - all that remains is to switch ->i_mutex
to rwsem and have lookup_slow() take it shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d9171b93

parallel lookups machinery, part 3 · 94bdd655

Al Viro authored Apr 15, 2016

We will need to be able to check if there is an in-lookup
dentry with matching parent/name.  Right now it's impossible,
but as soon as we start locking directories shared such beasts
will appear.

Add a secondary hash for locating those.  Hash chains go through
the same space where d_alias will be once it's not in-lookup anymore.
Search is done under the same bitlock we use for modifications -
with the primary hash we can rely on d_rehash() into the wrong
chain being the worst that could happen, but here the pointers are
buggered once it's removed from the chain.  On the other hand,
the chains are not going to be long and normally we'll end up
adding to the chain anyway.  That allows us to avoid bothering with
->d_lock when doing the comparisons - everything is stable until
removed from chain.

New helper: d_alloc_parallel().  Right now it allocates, verifies
that no hashed and in-lookup matches exist and adds to in-lookup
hash.

Returns ERR_PTR() for error, hashed match (in the unlikely case it's
been found) or new dentry.  In-lookup matches trigger BUG() for
now; that will change in the next commit when we introduce waiting
for ongoing lookup to finish.  Note that in-lookup matches won't be
possible until we actually go for shared locking.

lookup_slow() switched to use of d_alloc_parallel().

Again, these commits are separated only for making it easier to
review.  All this machinery will start doing something useful only
when we go for shared locking; it's just that the combination is
too large for my taste.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

94bdd655
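
The three possible results described above (error pointer, hashed match, new in-lookup dentry) can be told apart by the caller roughly as follows. This is a userspace sketch under stated assumptions: `toy_dentry`, `toy_alloc_parallel()`, `classify()` and the `outcome` enum are hypothetical, the error-pointer helpers are re-implemented locally in the style of the kernel's, and the in-lookup hash itself is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ERRNO 4095

/* Local re-implementations: small negative errno values encoded as pointers. */
static void *ERR_PTR(long error) { return (void *)error; }
static long PTR_ERR(const void *ptr) { return (long)ptr; }
static bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct toy_dentry {
	const char *name;
	bool in_lookup;		/* set only on freshly allocated dentries */
};

/*
 * Allocate first, then check for an existing hashed match.
 * "hashed" is the single cached entry of this toy directory (or NULL).
 */
static struct toy_dentry *toy_alloc_parallel(struct toy_dentry *hashed,
					     const char *name)
{
	struct toy_dentry *new = calloc(1, sizeof(*new));

	if (!new)
		return ERR_PTR(-ENOMEM);
	if (hashed && strcmp(hashed->name, name) == 0) {
		free(new);		/* somebody already did the lookup */
		return hashed;
	}
	new->name = name;
	new->in_lookup = true;		/* caller must now do the real lookup */
	return new;
}

enum outcome { FAILED, FOUND_HASHED, NEEDS_LOOKUP };

/* Caller side: one return value, three cases. */
static enum outcome classify(struct toy_dentry *dentry)
{
	if (IS_ERR(dentry))
		return FAILED;
	return dentry->in_lookup ? NEEDS_LOOKUP : FOUND_HASHED;
}
```

Only the `NEEDS_LOOKUP` case would go on to call the filesystem's lookup method; a hashed match can be used as it is.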

parallel lookups machinery, part 2 · 84e710da

Al Viro authored Apr 15, 2016

We'll need to verify that there's neither a hashed nor in-lookup
dentry with desired parent/name before adding to in-lookup set.

One possible solution would be to hold the parent's ->d_lock through
both checks, but while the in-lookup set is relatively small at any
time, dcache is not.  And holding the parent's ->d_lock through
something like __d_lookup_rcu() would suck too badly.

So we leave the parent's ->d_lock alone, which means that we watch
out for the following scenario:
	* we verify that there's no hashed match
	* existing in-lookup match gets hashed by another process
	* we verify that there's no in-lookup matches and decide
	  that everything's fine.

Solution: per-directory kinda-sorta seqlock, bumped around the times
we hash something that used to be in-lookup or move (and hash)
something in place of in-lookup.  Then the above would turn into
	* read the counter
	* do dcache lookup
	* if no matches found, check for in-lookup matches
	* if there had been none of those either, check if the
	  counter has changed; repeat if it has.

The "kinda-sorta" part is due to the fact that we don't have much spare
space in inode.  There is a spare word (shared with i_bdev/i_cdev/i_pipe),
so the counter part is not a problem, but spinlock is a different story.

We could use the parent's ->d_lock, and it would be less painful in
terms of contention, but for __d_add() it would be rather inconvenient
to grab; we could do that (using lock_parent()), but...

Fortunately, we can get serialization on the counter itself, and it
might be a good idea in general; we can use cmpxchg() in a loop to
get from even to odd and smp_store_release() from odd to even.

This commit adds the counter and updating logics; the readers will be
added in the next commit.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

84e710da
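
The counter scheme described above can be sketched in userspace with C11 atomics: a compare-and-exchange loop takes the counter from even to odd, a release store takes it from odd to the next even value, and the reader repeats its two checks if the counter moved. This is an illustration, not the kernel code: `toy_dir`, the helper names, the `inject_race` switch and the plain `bool` fields standing in for the dcache and the in-lookup set are all assumptions, and the demonstration is single-threaded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_dir {
	atomic_uint seq;	/* even: stable; odd: an update is in flight */
	bool hashed;		/* is "our" name in the dcache? */
	bool in_lookup;		/* is "our" name in the in-lookup set? */
	int passes;		/* how many times the reader ran its checks */
};

/* Writer entry: spin until we are the one moving even n to n + 1. */
static unsigned start_dir_add(struct toy_dir *dir)
{
	for (;;) {
		unsigned n = atomic_load_explicit(&dir->seq, memory_order_relaxed);

		if (!(n & 1) &&
		    atomic_compare_exchange_weak_explicit(&dir->seq, &n, n + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return n;
	}
}

/* Writer exit: publish the update by moving odd n + 1 to even n + 2. */
static void end_dir_add(struct toy_dir *dir, unsigned n)
{
	assert(!(n & 1));
	atomic_store_explicit(&dir->seq, n + 2, memory_order_release);
}

/* The "other process": an in-lookup dentry becomes hashed. */
static void hash_in_lookup_dentry(struct toy_dir *dir)
{
	unsigned n = start_dir_add(dir);

	dir->in_lookup = false;
	dir->hashed = true;
	end_dir_add(dir, n);
}

/* Reader: read counter, check dcache, check in-lookup set, recheck counter. */
static bool name_is_taken(struct toy_dir *dir, bool inject_race)
{
	for (;;) {
		/* Rounding down to even makes a sample taken mid-update retry. */
		unsigned seq = atomic_load_explicit(&dir->seq,
						    memory_order_acquire) & ~1u;
		bool hashed, in_lookup;

		dir->passes++;
		hashed = dir->hashed;
		if (inject_race && dir->passes == 1)
			hash_in_lookup_dentry(dir);	/* lands between the checks */
		in_lookup = dir->in_lookup;
		if (hashed || in_lookup)
			return true;
		if (atomic_load_explicit(&dir->seq, memory_order_acquire) == seq)
			return false;
	}
}
```

With `inject_race` set and the name initially in the in-lookup set, the first pass sees neither a hashed nor an in-lookup match, which is exactly the bad scenario from the message; the changed counter forces a second pass, which finds the hashed entry.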

beginning of transition to parallel lookups - marking in-lookup dentries · 85c7f810

Al Viro authored Apr 14, 2016

marked as such when (would be) parallel lookup is about to pass them
to actual ->lookup(); unmarked when
	* __d_add() is about to make it hashed, positive or not.
	* __d_move() (from d_splice_alias(), directly or via
	  __d_unalias()) puts a preexisting dentry in its place
	* in caller of ->lookup() if it has escaped all of the
	  above.  Bug (WARN_ON, actually) if it reaches the final dput()
	  or d_instantiate() while still marked such.

As the result, we are guaranteed that for as long as the flag is
set, dentry will
	* remain negative unhashed with positive refcount
	* never have its ->d_alias looked at
	* never have its ->d_lru looked at
	* never have its ->d_parent and ->d_name changed

Right now we have at most one such for any given parent directory.
With parallel lookups that restriction will weaken to
	* only exist when parent is locked shared
	* at most one with given (parent,name) pair (comparison of
	  names is according to ->d_compare())
	* only exist when there's no hashed dentry with the same
	  (parent,name)

Transition will take the next several commits; unfortunately, we'll
only be able to switch to rwsem at the end of this series.  The
reason for not making it a single patch is to simplify review.

New primitives: d_in_lookup() (a predicate checking if dentry is in
the in-lookup state) and d_lookup_done() (tells the system that
we are done with lookup and if it's still marked as in-lookup, it
should cease to be such).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

85c7f810
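
A toy model of the in-lookup mark and the two primitives named above, assuming the mark is a single flag bit on the dentry. Only the names `d_in_lookup()` and `d_lookup_done()` come from the commit message; `toy_dentry`, `TOY_PAR_LOOKUP`, `toy_start_lookup()` and `toy_d_add()` are hypothetical, and the real primitives need locking that is left out here.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAR_LOOKUP 0x1u	/* illustrative flag bit, not the kernel's value */

struct toy_dentry {
	unsigned flags;
	bool hashed;
};

/* Mark the dentry just before it is handed to the lookup method. */
static void toy_start_lookup(struct toy_dentry *dentry)
{
	assert(!dentry->hashed);	/* in-lookup dentries are unhashed */
	dentry->flags |= TOY_PAR_LOOKUP;
}

/* Predicate: is the dentry still in the in-lookup state? */
static bool d_in_lookup(const struct toy_dentry *dentry)
{
	return dentry->flags & TOY_PAR_LOOKUP;
}

/* Lookup is over; drop the mark if it is still there.  Safe to repeat. */
static void d_lookup_done(struct toy_dentry *dentry)
{
	dentry->flags &= ~TOY_PAR_LOOKUP;
}

/* Hashing a dentry is one of the points where the mark goes away. */
static void toy_d_add(struct toy_dentry *dentry)
{
	d_lookup_done(dentry);
	dentry->hashed = true;
}
```

The caller of the lookup method can call `d_lookup_done()` unconditionally afterwards: if hashing already cleared the mark, the second call changes nothing.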

__d_add(): don't drop/regain ->d_lock · 0568d705
Al Viro authored Apr 14, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
0568d705

lookup_slow(): bugger off on IS_DEADDIR() from the very beginning · 1936386e
Al Viro authored Apr 14, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
1936386e

nfs: missing wakeup in nfs_unblock_sillyrename() · d2caaa0a

Al Viro authored Apr 30, 2016

will be needed as soon as lookups are not serialized by ->i_mutex
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d2caaa0a

make ext2_get_page() and friends work without external serialization · be5b82db

Al Viro authored Apr 22, 2016

Right now ext2_get_page() (and its analogues in a bunch of other filesystems)
relies upon the directory being locked - the way it sets and tests Checked and
Error bits would be racy without that.  Switch to a slightly different scheme,
_not_ setting Checked in case of failure.  That way the logics becomes
	if Checked => OK
	else if Error => fail
	else if !validate => fail
	else => OK
with validation setting Checked or Error on success and failure resp. and
returning which one had happened.  Equivalent to the current logics, but unlike
the current logics not sensitive to the order of set_bit, test_bit getting
reordered by CPU, etc.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

be5b82db
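
The four-way decision spelled out above translates almost line for line into code. A minimal sketch, assuming a page with two independent flag bits; `toy_page`, `toy_check_page()`, `toy_page_usable()`, the bit values and the `validations` counter (there only to make the behaviour observable) are hypothetical and not taken from ext2.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { PG_CHECKED = 1u << 0, PG_ERROR = 1u << 1 };	/* illustrative bit values */

struct toy_page {
	atomic_uint flags;
	bool corrupt;		/* what validation will discover */
	int validations;	/* how often validation actually ran */
};

/*
 * Validation sets Checked on success or Error on failure, never both,
 * and reports which one happened.
 */
static bool toy_check_page(struct toy_page *page)
{
	page->validations++;
	if (page->corrupt) {
		atomic_fetch_or(&page->flags, PG_ERROR);
		return false;
	}
	atomic_fetch_or(&page->flags, PG_CHECKED);
	return true;
}

/* if Checked => OK; else if Error => fail; else the validation result. */
static bool toy_page_usable(struct toy_page *page)
{
	unsigned flags = atomic_load(&page->flags);

	if (flags & PG_CHECKED)
		return true;
	if (flags & PG_ERROR)
		return false;
	return toy_check_page(page);
}
```

Because Checked is never set on a failed page, a caller that sees Checked can trust the page without also looking at Error, so the outcome does not depend on the order in which the two bits become visible.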

ovl_lookup_real(): use lookup_one_len_unlocked() · b9e1d435
Al Viro authored Apr 14, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
b9e1d435