Commit 1b3d7c93 authored by Alexander Viro, committed by Linus Torvalds

[PATCH] (2.5.4) death of ->i_zombie

	Rediffed to 2.5.4, documentation added.  This variant grabs
->s_vfs_rename_sem only for cross-directory renames.
parent 9c73428c
...@@ -48,28 +48,30 @@ prototypes: ...@@ -48,28 +48,30 @@ prototypes:
locking rules: locking rules:
all may block all may block
BKL i_sem(inode) i_zombie(inode) BKL i_sem(inode)
lookup: yes yes no lookup: yes yes
create: yes yes yes create: yes yes
link: yes yes yes link: yes yes
mknod: yes yes yes mknod: yes yes
mkdir: yes yes yes mkdir: yes yes
unlink: yes yes yes unlink: yes yes (both)
rmdir: yes yes yes (see below) rmdir: yes yes (both) (see below)
rename: yes yes (both) yes (both) (see below) rename: yes yes (all) (see below)
readlink: no no no readlink: no no
follow_link: no no no follow_link: no no
truncate: yes yes no (see below) truncate: yes yes (see below)
setattr: yes if ATTR_SIZE no setattr: yes if ATTR_SIZE
permission: yes no no permission: yes no
getattr: (see below) getattr: (see below)
revalidate: no (see below) revalidate: no (see below)
Additionally, ->rmdir() has i_zombie on victim and so does ->rename() setxattr: DOCUMENT_ME
in case when target exists and is a directory. getxattr: DOCUMENT_ME
->rename() on directories has (per-superblock) ->s_vfs_rename_sem. removexattr: DOCUMENT_ME
Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_sem on
victim.
cross-directory ->rename() has (per-superblock) ->s_vfs_rename_sem.
->revalidate(), it may be called both with and without the i_sem ->revalidate(), it may be called both with and without the i_sem
on dentry->d_inode. VFS never calls it with i_zombie on dentry->d_inode, on dentry->d_inode.
but watch for other methods directly calling this one...
->truncate() is never called directly - it's a callback, not a ->truncate() is never called directly - it's a callback, not a
method. It's called by vmtruncate() - library function normally used by method. It's called by vmtruncate() - library function normally used by
->setattr(). Locking information above applies to that call (i.e. is ->setattr(). Locking information above applies to that call (i.e. is
...@@ -77,6 +79,9 @@ inherited from ->setattr() - vmtruncate() is used when ATTR_SIZE had been ...@@ -77,6 +79,9 @@ inherited from ->setattr() - vmtruncate() is used when ATTR_SIZE had been
passed). passed).
->getattr() is currently unused. ->getattr() is currently unused.
See Documentation/filesystems/directory-locking for more detailed discussion
of the locking scheme for directory operations.
--------------------------- super_operations --------------------------- --------------------------- super_operations ---------------------------
prototypes: prototypes:
void (*read_inode) (struct inode *); void (*read_inode) (struct inode *);
......
Locking scheme used for directory operations is based on two
kinds of locks - per-inode (->i_sem) and per-filesystem (->s_vfs_rename_sem).

For our purposes all operations fall in 5 classes:

1) read access. Locking rules: caller locks directory we are accessing.

2) object creation. Locking rules: same as above.

3) object removal. Locking rules: caller locks parent, finds victim,
locks victim and calls the method.

4) rename() that is _not_ cross-directory. Locking rules: caller locks
the parent, finds source and target, if target already exists - locks it
and then calls the method.

5) cross-directory rename. The trickiest in the whole bunch. Locking
rules:
	* lock the filesystem
	* lock parents in "ancestors first" order.
	* find source and target.
	* if old parent is equal to or is a descendent of target
	  fail with -ENOTEMPTY
	* if new parent is equal to or is a descendent of source
	  fail with -ELOOP
	* if target exists - lock it.
	* call the method.

The rules above obviously guarantee that all directories that are going to be
read, modified or removed by method will be locked by caller.
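
The "ancestors first" rule for class 5 can be modelled in a few lines of
user-space C. This is only a single-threaded sketch, not kernel code: the
names struct dir, take(), trace and lock_two_parents() are hypothetical and
invented for this illustration, a flag stands in for ->i_sem, and a plain
int stands in for ->s_vfs_rename_sem. It shows nothing more than the order
in which the two parents would be locked.

```c
#include <assert.h>
#include <stddef.h>

/* Toy directory: the root is its own parent, as with d_parent. */
struct dir {
	struct dir *parent;
	int locked;		/* stands in for ->i_sem */
};

static int fs_locked;		/* stands in for ->s_vfs_rename_sem */
static struct dir *trace[4];	/* order in which directories were locked */
static int ntrace;

/* Record that d has been locked (a real semaphore would block instead). */
void take(struct dir *d)
{
	assert(!d->locked);
	d->locked = 1;
	if (ntrace < 4)
		trace[ntrace++] = d;
}

/* Returns 1 if a is a proper ancestor of d, 0 otherwise. */
int is_ancestor(const struct dir *a, const struct dir *d)
{
	while (d->parent != d) {
		d = d->parent;
		if (d == a)
			return 1;
	}
	return 0;
}

/* Lock both parents of a rename, ancestors first. */
void lock_two_parents(struct dir *p1, struct dir *p2)
{
	if (p1 == p2) {		/* same directory: class 4, no fs lock */
		take(p1);
		return;
	}
	fs_locked = 1;		/* cross-directory: lock the filesystem first */
	if (is_ancestor(p2, p1)) {
		take(p2);
		take(p1);
	} else {
		/* p1 is an ancestor of p2, or the two are unrelated */
		take(p1);
		take(p2);
	}
}
```

With a chain root -> a -> b, calling lock_two_parents(&b, &root) in this
model takes the filesystem lock and then locks root before b, whichever
order the arguments are passed in.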

If no directory is its own ancestor, the scheme above is deadlock-free.
Proof:

First of all, at any moment we have a partial ordering of the
objects - A < B iff A is an ancestor of B.

That ordering can change. However, the following is true:

(1) if operation different from cross-directory rename holds lock on A and
    attempts to acquire lock on B, A will remain the parent of B until we
    acquire the lock on B. (Proof: only cross-directory rename can change
    the parent of object and it would have to lock the parent).

(2) if cross-directory rename holds the lock on filesystem, order will not
    change until rename acquires all locks. (Proof: other cross-directory
    renames will be blocked on filesystem lock and we don't start changing
    the order until we had acquired all locks).

Now consider the minimal deadlock. Each process is blocked on
attempt to acquire some lock and already holds at least one lock. Let's
consider the set of contended locks. First of all, filesystem lock is
not contended, since any process blocked on it is not holding any locks.
Thus all processes are blocked on ->i_sem.

Any contended object is either held by cross-directory rename or
has a child that is also contended. Indeed, suppose that it is held by
operation other than cross-directory rename. Then the lock this operation
is blocked on belongs to child of that object due to (1).

It means that one of the operations is cross-directory rename.
Otherwise the set of contended objects would be infinite - each of them
would have a contended child and we had assumed that no object is its
own descendent. Moreover, there is exactly one cross-directory rename
(see above).

Consider the object blocking the cross-directory rename. One of
its descendents is locked by cross-directory rename (otherwise we would again
have an infinite set of contended objects). But that means that
cross-directory rename is taking locks out of order. Due to (2) the
order hadn't changed since we had acquired filesystem lock. But locking
rules for cross-directory rename guarantee that we do not try to acquire
lock on descendent before the lock on ancestor. Contradiction. I.e.
deadlock is impossible. Q.E.D.

These operations are guaranteed to avoid loop creation. Indeed,
the only operation that could introduce loops is cross-directory rename.
Since the only new (parent, child) pair added by rename() is (new parent,
source), such loop would have to contain these objects and the rest of it
would have to exist before rename(). I.e. at the moment of loop creation
rename() responsible for that would be holding filesystem lock and new parent
would have to be equal to or a descendent of source. But that means that
new parent had been equal to or a descendent of source since the moment when
we had acquired filesystem lock and rename() would fail with -ELOOP in that
case.

While this locking scheme works for arbitrary DAGs, it relies on
ability to check that directory is a descendent of another object. Current
implementation assumes that directory graph is a tree. This assumption is
also preserved by all operations (cross-directory rename on a tree that would
not introduce a cycle will leave it a tree and link() fails for directories).

Notice that "directory" in the above == "anything that might have
children", so if we are going to introduce hybrid objects we will need
either to make sure that link(2) doesn't work for them or to make changes
in is_subdir() that would make it work even in presence of such beasts.
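
The loop-avoidance argument and the descendant check it relies on can be
sketched in user-space C as follows. The names struct node,
is_self_or_descendant() and toy_move() are hypothetical, invented for this
illustration and only loosely modelled on the is_subdir() idea mentioned
above. The sketch assumes the directory graph is a tree whose root is its
own parent, and it ignores locking entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy directory tree node; the root points to itself. */
struct node {
	struct node *parent;
};

/* Returns 1 if d is anc itself or lies somewhere below anc, else 0. */
int is_self_or_descendant(const struct node *d, const struct node *anc)
{
	for (;;) {
		if (d == anc)
			return 1;
		if (d->parent == d)	/* reached the root */
			return 0;
		d = d->parent;
	}
}

/*
 * Move src under new_parent.  The only new (parent, child) pair is
 * (new_parent, src), so a loop can appear only if new_parent is
 * already src or one of its descendants; that case is refused.
 */
int toy_move(struct node *src, struct node *new_parent)
{
	assert(src != NULL && new_parent != NULL);
	if (is_self_or_descendant(new_parent, src))
		return -ELOOP;
	src->parent = new_parent;
	return 0;
}
```

In this model, moving a directory under its own child is rejected and the
tree is left untouched, while a move between unrelated subtrees simply
updates the parent pointer.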
...@@ -472,11 +472,9 @@ static ssize_t bm_entry_write(struct file *file, const char *buffer, ...@@ -472,11 +472,9 @@ static ssize_t bm_entry_write(struct file *file, const char *buffer,
break; break;
case 3: root = dget(file->f_vfsmnt->mnt_sb->s_root); case 3: root = dget(file->f_vfsmnt->mnt_sb->s_root);
down(&root->d_inode->i_sem); down(&root->d_inode->i_sem);
down(&root->d_inode->i_zombie);
kill_node(e); kill_node(e);
up(&root->d_inode->i_zombie);
up(&root->d_inode->i_sem); up(&root->d_inode->i_sem);
dput(root); dput(root);
break; break;
...@@ -516,8 +514,6 @@ static ssize_t bm_register_write(struct file *file, const char *buffer, ...@@ -516,8 +514,6 @@ static ssize_t bm_register_write(struct file *file, const char *buffer,
if (IS_ERR(dentry)) if (IS_ERR(dentry))
goto out; goto out;
down(&root->d_inode->i_zombie);
err = -EEXIST; err = -EEXIST;
if (dentry->d_inode) if (dentry->d_inode)
goto out2; goto out2;
...@@ -556,7 +552,6 @@ static ssize_t bm_register_write(struct file *file, const char *buffer, ...@@ -556,7 +552,6 @@ static ssize_t bm_register_write(struct file *file, const char *buffer,
mntput(mnt); mntput(mnt);
err = 0; err = 0;
out2: out2:
up(&root->d_inode->i_zombie);
dput(dentry); dput(dentry);
out: out:
up(&root->d_inode->i_sem); up(&root->d_inode->i_sem);
...@@ -605,12 +600,10 @@ static ssize_t bm_status_write(struct file * file, const char * buffer, ...@@ -605,12 +600,10 @@ static ssize_t bm_status_write(struct file * file, const char * buffer,
case 2: enabled = 1; break; case 2: enabled = 1; break;
case 3: root = dget(file->f_vfsmnt->mnt_sb->s_root); case 3: root = dget(file->f_vfsmnt->mnt_sb->s_root);
down(&root->d_inode->i_sem); down(&root->d_inode->i_sem);
down(&root->d_inode->i_zombie);
while (!list_empty(&entries)) while (!list_empty(&entries))
kill_node(list_entry(entries.next, Node, list)); kill_node(list_entry(entries.next, Node, list));
up(&root->d_inode->i_zombie);
up(&root->d_inode->i_sem); up(&root->d_inode->i_sem);
dput(root); dput(root);
default: return res; default: return res;
......
...@@ -143,7 +143,6 @@ void inode_init_once(struct inode *inode) ...@@ -143,7 +143,6 @@ void inode_init_once(struct inode *inode)
INIT_LIST_HEAD(&inode->i_dirty_data_buffers); INIT_LIST_HEAD(&inode->i_dirty_data_buffers);
INIT_LIST_HEAD(&inode->i_devices); INIT_LIST_HEAD(&inode->i_devices);
sema_init(&inode->i_sem, 1); sema_init(&inode->i_sem, 1);
sema_init(&inode->i_zombie, 1);
spin_lock_init(&inode->i_data.i_shared_lock); spin_lock_init(&inode->i_data.i_shared_lock);
} }
......
...@@ -93,6 +93,11 @@ ...@@ -93,6 +93,11 @@
* hopefully we will be able to get rid of that wart in 2.5. So far only * hopefully we will be able to get rid of that wart in 2.5. So far only
* XEmacs seems to be relying on it... * XEmacs seems to be relying on it...
*/ */
/*
* [Sep 2001 AV] Single-semaphore locking scheme (kudos to David Holland)
* implemented. Let's see if raised priority of ->s_vfs_rename_sem gives
* any extra contention...
*/
/* In order to reduce some races, while at the same time doing additional /* In order to reduce some races, while at the same time doing additional
* checking and hopefully speeding things up, we copy filenames to the * checking and hopefully speeding things up, we copy filenames to the
...@@ -931,28 +936,67 @@ static inline int lookup_flags(unsigned int f) ...@@ -931,28 +936,67 @@ static inline int lookup_flags(unsigned int f)
return retval; return retval;
} }
int vfs_create(struct inode *dir, struct dentry *dentry, int mode) /*
* p1 and p2 should be directories on the same fs.
*/
struct dentry *lock_rename(struct dentry *p1, struct dentry *p2)
{ {
int error; struct dentry *p;
mode &= S_IALLUGO; if (p1 == p2) {
mode |= S_IFREG; down(&p1->d_inode->i_sem);
return NULL;
}
down(&p1->d_inode->i_sb->s_vfs_rename_sem);
for (p = p1; p->d_parent != p; p = p->d_parent) {
if (p->d_parent == p2) {
down(&p2->d_inode->i_sem);
down(&p1->d_inode->i_sem);
return p;
}
}
for (p = p2; p->d_parent != p; p = p->d_parent) {
if (p->d_parent == p1) {
down(&p1->d_inode->i_sem);
down(&p2->d_inode->i_sem);
return p;
}
}
down(&p1->d_inode->i_sem);
down(&p2->d_inode->i_sem);
return NULL;
}
void unlock_rename(struct dentry *p1, struct dentry *p2)
{
up(&p1->d_inode->i_sem);
if (p1 != p2) {
up(&p2->d_inode->i_sem);
up(&p1->d_inode->i_sb->s_vfs_rename_sem);
}
}
int vfs_create(struct inode *dir, struct dentry *dentry, int mode)
{
int error = may_create(dir, dentry);
down(&dir->i_zombie);
error = may_create(dir, dentry);
if (error) if (error)
goto exit_lock; return error;
error = -EACCES; /* shouldn't it be ENOSYS? */
if (!dir->i_op || !dir->i_op->create) if (!dir->i_op || !dir->i_op->create)
goto exit_lock; return -EACCES; /* shouldn't it be ENOSYS? */
DQUOT_INIT(dir); DQUOT_INIT(dir);
mode &= S_IALLUGO;
mode |= S_IFREG;
lock_kernel(); lock_kernel();
error = dir->i_op->create(dir, dentry, mode); error = dir->i_op->create(dir, dentry, mode);
unlock_kernel(); unlock_kernel();
exit_lock:
up(&dir->i_zombie);
if (!error) if (!error)
inode_dir_notify(dir, DN_CREATE); inode_dir_notify(dir, DN_CREATE);
return error; return error;
...@@ -1212,26 +1256,21 @@ static struct dentry *lookup_create(struct nameidata *nd, int is_dir) ...@@ -1212,26 +1256,21 @@ static struct dentry *lookup_create(struct nameidata *nd, int is_dir)
int vfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev) int vfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
{ {
int error = -EPERM; int error = may_create(dir, dentry);
down(&dir->i_zombie);
if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
goto exit_lock;
error = may_create(dir, dentry);
if (error) if (error)
goto exit_lock; return error;
if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
return -EPERM;
error = -EPERM;
if (!dir->i_op || !dir->i_op->mknod) if (!dir->i_op || !dir->i_op->mknod)
goto exit_lock; return -EPERM;
DQUOT_INIT(dir); DQUOT_INIT(dir);
lock_kernel(); lock_kernel();
error = dir->i_op->mknod(dir, dentry, mode, dev); error = dir->i_op->mknod(dir, dentry, mode, dev);
unlock_kernel(); unlock_kernel();
exit_lock:
up(&dir->i_zombie);
if (!error) if (!error)
inode_dir_notify(dir, DN_CREATE); inode_dir_notify(dir, DN_CREATE);
return error; return error;
...@@ -1284,25 +1323,19 @@ asmlinkage long sys_mknod(const char * filename, int mode, dev_t dev) ...@@ -1284,25 +1323,19 @@ asmlinkage long sys_mknod(const char * filename, int mode, dev_t dev)
int vfs_mkdir(struct inode *dir, struct dentry *dentry, int mode) int vfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
{ {
int error; int error = may_create(dir, dentry);
down(&dir->i_zombie);
error = may_create(dir, dentry);
if (error) if (error)
goto exit_lock; return error;
error = -EPERM;
if (!dir->i_op || !dir->i_op->mkdir) if (!dir->i_op || !dir->i_op->mkdir)
goto exit_lock; return -EPERM;
DQUOT_INIT(dir); DQUOT_INIT(dir);
mode &= (S_IRWXUGO|S_ISVTX); mode &= (S_IRWXUGO|S_ISVTX);
lock_kernel(); lock_kernel();
error = dir->i_op->mkdir(dir, dentry, mode); error = dir->i_op->mkdir(dir, dentry, mode);
unlock_kernel(); unlock_kernel();
exit_lock:
up(&dir->i_zombie);
if (!error) if (!error)
inode_dir_notify(dir, DN_CREATE); inode_dir_notify(dir, DN_CREATE);
return error; return error;
...@@ -1369,9 +1402,8 @@ static void d_unhash(struct dentry *dentry) ...@@ -1369,9 +1402,8 @@ static void d_unhash(struct dentry *dentry)
int vfs_rmdir(struct inode *dir, struct dentry *dentry) int vfs_rmdir(struct inode *dir, struct dentry *dentry)
{ {
int error; int error = may_delete(dir, dentry, 1);
error = may_delete(dir, dentry, 1);
if (error) if (error)
return error; return error;
...@@ -1380,7 +1412,7 @@ int vfs_rmdir(struct inode *dir, struct dentry *dentry) ...@@ -1380,7 +1412,7 @@ int vfs_rmdir(struct inode *dir, struct dentry *dentry)
DQUOT_INIT(dir); DQUOT_INIT(dir);
double_down(&dir->i_zombie, &dentry->d_inode->i_zombie); down(&dentry->d_inode->i_sem);
d_unhash(dentry); d_unhash(dentry);
if (IS_DEADDIR(dir)) if (IS_DEADDIR(dir))
error = -ENOENT; error = -ENOENT;
...@@ -1393,7 +1425,7 @@ int vfs_rmdir(struct inode *dir, struct dentry *dentry) ...@@ -1393,7 +1425,7 @@ int vfs_rmdir(struct inode *dir, struct dentry *dentry)
if (!error) if (!error)
dentry->d_inode->i_flags |= S_DEAD; dentry->d_inode->i_flags |= S_DEAD;
} }
double_up(&dir->i_zombie, &dentry->d_inode->i_zombie); up(&dentry->d_inode->i_sem);
if (!error) { if (!error) {
inode_dir_notify(dir, DN_DELETE); inode_dir_notify(dir, DN_DELETE);
d_delete(dentry); d_delete(dentry);
...@@ -1447,14 +1479,18 @@ asmlinkage long sys_rmdir(const char * pathname) ...@@ -1447,14 +1479,18 @@ asmlinkage long sys_rmdir(const char * pathname)
int vfs_unlink(struct inode *dir, struct dentry *dentry) int vfs_unlink(struct inode *dir, struct dentry *dentry)
{ {
int error; int error = may_delete(dir, dentry, 0);
if (error)
return error;
if (!dir->i_op || !dir->i_op->unlink)
return -EPERM;
down(&dir->i_zombie);
error = may_delete(dir, dentry, 0);
if (!error) {
error = -EPERM;
if (dir->i_op && dir->i_op->unlink) {
DQUOT_INIT(dir); DQUOT_INIT(dir);
dget(dentry);
down(&dentry->d_inode->i_sem);
if (d_mountpoint(dentry)) if (d_mountpoint(dentry))
error = -EBUSY; error = -EBUSY;
else { else {
...@@ -1464,11 +1500,12 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry) ...@@ -1464,11 +1500,12 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry)
if (!error) if (!error)
d_delete(dentry); d_delete(dentry);
} }
} up(&dentry->d_inode->i_sem);
} dput(dentry);
up(&dir->i_zombie);
if (!error) if (!error)
inode_dir_notify(dir, DN_DELETE); inode_dir_notify(dir, DN_DELETE);
return error; return error;
} }
...@@ -1517,24 +1554,18 @@ asmlinkage long sys_unlink(const char * pathname) ...@@ -1517,24 +1554,18 @@ asmlinkage long sys_unlink(const char * pathname)
int vfs_symlink(struct inode *dir, struct dentry *dentry, const char *oldname) int vfs_symlink(struct inode *dir, struct dentry *dentry, const char *oldname)
{ {
int error; int error = may_create(dir, dentry);
down(&dir->i_zombie);
error = may_create(dir, dentry);
if (error) if (error)
goto exit_lock; return error;
error = -EPERM;
if (!dir->i_op || !dir->i_op->symlink) if (!dir->i_op || !dir->i_op->symlink)
goto exit_lock; return -EPERM;
DQUOT_INIT(dir); DQUOT_INIT(dir);
lock_kernel(); lock_kernel();
error = dir->i_op->symlink(dir, dentry, oldname); error = dir->i_op->symlink(dir, dentry, oldname);
unlock_kernel(); unlock_kernel();
exit_lock:
up(&dir->i_zombie);
if (!error) if (!error)
inode_dir_notify(dir, DN_CREATE); inode_dir_notify(dir, DN_CREATE);
return error; return error;
...@@ -1576,39 +1607,31 @@ asmlinkage long sys_symlink(const char * oldname, const char * newname) ...@@ -1576,39 +1607,31 @@ asmlinkage long sys_symlink(const char * oldname, const char * newname)
int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry) int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry)
{ {
struct inode *inode; struct inode *inode = old_dentry->d_inode;
int error; int error;
down(&dir->i_zombie);
error = -ENOENT;
inode = old_dentry->d_inode;
if (!inode) if (!inode)
goto exit_lock; return -ENOENT;
error = may_create(dir, new_dentry); error = may_create(dir, new_dentry);
if (error) if (error)
goto exit_lock; return error;
error = -EXDEV;
if (dir->i_sb != inode->i_sb) if (dir->i_sb != inode->i_sb)
goto exit_lock; return -EXDEV;
/* /*
* A link to an append-only or immutable file cannot be created. * A link to an append-only or immutable file cannot be created.
*/ */
error = -EPERM;
if (IS_APPEND(inode) || IS_IMMUTABLE(inode)) if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
goto exit_lock; return -EPERM;
if (!dir->i_op || !dir->i_op->link) if (!dir->i_op || !dir->i_op->link)
goto exit_lock; return -EPERM;
DQUOT_INIT(dir); DQUOT_INIT(dir);
lock_kernel(); lock_kernel();
error = dir->i_op->link(old_dentry, dir, new_dentry); error = dir->i_op->link(old_dentry, dir, new_dentry);
unlock_kernel(); unlock_kernel();
exit_lock:
up(&dir->i_zombie);
if (!error) if (!error)
inode_dir_notify(dir, DN_CREATE); inode_dir_notify(dir, DN_CREATE);
return error; return error;
...@@ -1680,17 +1703,23 @@ asmlinkage long sys_link(const char * oldname, const char * newname) ...@@ -1680,17 +1703,23 @@ asmlinkage long sys_link(const char * oldname, const char * newname)
* story. * story.
* c) we have to lock _three_ objects - parents and victim (if it exists). * c) we have to lock _three_ objects - parents and victim (if it exists).
* And that - after we got ->i_sem on parents (until then we don't know * And that - after we got ->i_sem on parents (until then we don't know
* whether the target exists at all, let alone whether it is a directory * whether the target exists). Solution: try to be smart with locking
* or not). Solution: ->i_zombie. Taken only after ->i_sem. Always taken * order for inodes. We rely on the fact that tree topology may change
* on link creation/removal of any kind. And taken (without ->i_sem) on * only under ->s_vfs_rename_sem _and_ that parent of the object we
* directory that will be removed (both in rmdir() and here). * move will be locked. Thus we can rank directories by the tree
* (ancestors first) and rank all non-directories after them.
* That works since everybody except rename does "lock parent, lookup,
* lock child" and rename is under ->s_vfs_rename_sem.
* HOWEVER, it relies on the assumption that any object with ->lookup()
* has no more than 1 dentry. If "hybrid" objects will ever appear,
* we'd better make sure that there's no link(2) for them.
* d) some filesystems don't support opened-but-unlinked directories, * d) some filesystems don't support opened-but-unlinked directories,
* either because of layout or because they are not ready to deal with * either because of layout or because they are not ready to deal with
* all cases correctly. The latter will be fixed (taking this sort of * all cases correctly. The latter will be fixed (taking this sort of
* stuff into VFS), but the former is not going away. Solution: the same * stuff into VFS), but the former is not going away. Solution: the same
* trick as in rmdir(). * trick as in rmdir().
* e) conversion from fhandle to dentry may come in the wrong moment - when * e) conversion from fhandle to dentry may come in the wrong moment - when
* we are removing the target. Solution: we will have to grab ->i_zombie * we are removing the target. Solution: we will have to grab ->i_sem
* in the fhandle_to_dentry code. [FIXME - current nfsfh.c relies on * in the fhandle_to_dentry code. [FIXME - current nfsfh.c relies on
* ->i_sem on parents, which works but leads to some truly excessive * ->i_sem on parents, which works but leads to some truly excessive
* locking]. * locking].
...@@ -1698,131 +1727,96 @@ asmlinkage long sys_link(const char * oldname, const char * newname) ...@@ -1698,131 +1727,96 @@ asmlinkage long sys_link(const char * oldname, const char * newname)
int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry, int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry) struct inode *new_dir, struct dentry *new_dentry)
{ {
int error; int error = 0;
struct inode *target; struct inode *target;
if (old_dentry->d_inode == new_dentry->d_inode)
return 0;
error = may_delete(old_dir, old_dentry, 1);
if (error)
return error;
if (new_dir->i_sb != old_dir->i_sb)
return -EXDEV;
if (!new_dentry->d_inode)
error = may_create(new_dir, new_dentry);
else
error = may_delete(new_dir, new_dentry, 1);
if (error)
return error;
if (!old_dir->i_op || !old_dir->i_op->rename)
return -EPERM;
/* /*
* If we are going to change the parent - check write permissions, * If we are going to change the parent - check write permissions,
* we'll need to flip '..'. * we'll need to flip '..'.
*/ */
if (new_dir != old_dir) { if (new_dir != old_dir)
error = permission(old_dentry->d_inode, MAY_WRITE); error = permission(old_dentry->d_inode, MAY_WRITE);
}
if (error) if (error)
return error; return error;
DQUOT_INIT(old_dir);
DQUOT_INIT(new_dir);
down(&old_dir->i_sb->s_vfs_rename_sem);
error = -EINVAL;
if (is_subdir(new_dentry, old_dentry))
goto out_unlock;
/* Don't eat your daddy, dear... */
/* This also avoids locking issues */
if (old_dentry->d_parent == new_dentry)
goto out_unlock;
target = new_dentry->d_inode; target = new_dentry->d_inode;
if (target) { /* Hastur! Hastur! Hastur! */ if (target) {
triple_down(&old_dir->i_zombie, down(&target->i_sem);
&new_dir->i_zombie,
&target->i_zombie);
d_unhash(new_dentry); d_unhash(new_dentry);
} else }
double_down(&old_dir->i_zombie, if (d_mountpoint(old_dentry)||d_mountpoint(new_dentry))
&new_dir->i_zombie);
if (IS_DEADDIR(old_dir)||IS_DEADDIR(new_dir))
error = -ENOENT;
else if (d_mountpoint(old_dentry)||d_mountpoint(new_dentry))
error = -EBUSY; error = -EBUSY;
else else
error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry); error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
if (target) { if (target) {
if (!error) if (!error)
target->i_flags |= S_DEAD; target->i_flags |= S_DEAD;
triple_up(&old_dir->i_zombie, up(&target->i_sem);
&new_dir->i_zombie,
&target->i_zombie);
if (d_unhashed(new_dentry)) if (d_unhashed(new_dentry))
d_rehash(new_dentry); d_rehash(new_dentry);
dput(new_dentry); dput(new_dentry);
} else }
double_up(&old_dir->i_zombie,
&new_dir->i_zombie);
if (!error) if (!error)
d_move(old_dentry,new_dentry); d_move(old_dentry,new_dentry);
out_unlock:
up(&old_dir->i_sb->s_vfs_rename_sem);
return error; return error;
} }
int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry, int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry) struct inode *new_dir, struct dentry *new_dentry)
{ {
struct inode *target;
int error; int error;
dget(new_dentry);
target = new_dentry->d_inode;
if (target)
down(&target->i_sem);
if (d_mountpoint(old_dentry)||d_mountpoint(new_dentry))
error = -EBUSY;
else
error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
if (!error) {
/* The following d_move() should become unconditional */
if (!(old_dir->i_sb->s_type->fs_flags & FS_ODD_RENAME)) {
d_move(old_dentry, new_dentry);
}
}
if (target)
up(&target->i_sem);
dput(new_dentry);
return error;
}
int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry)
{
int error;
int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
if (old_dentry->d_inode == new_dentry->d_inode) if (old_dentry->d_inode == new_dentry->d_inode)
return 0; return 0;
error = may_delete(old_dir, old_dentry, 0); error = may_delete(old_dir, old_dentry, is_dir);
if (error) if (error)
return error; return error;
if (new_dir->i_sb != old_dir->i_sb)
return -EXDEV;
if (!new_dentry->d_inode) if (!new_dentry->d_inode)
error = may_create(new_dir, new_dentry); error = may_create(new_dir, new_dentry);
else else
error = may_delete(new_dir, new_dentry, 0); error = may_delete(new_dir, new_dentry, is_dir);
if (error) if (error)
return error; return error;
if (!old_dir->i_op || !old_dir->i_op->rename) if (!old_dir->i_op || !old_dir->i_op->rename)
return -EPERM; return -EPERM;
if (IS_DEADDIR(old_dir)||IS_DEADDIR(new_dir))
return -ENOENT;
DQUOT_INIT(old_dir); DQUOT_INIT(old_dir);
DQUOT_INIT(new_dir); DQUOT_INIT(new_dir);
double_down(&old_dir->i_zombie, &new_dir->i_zombie);
if (d_mountpoint(old_dentry)||d_mountpoint(new_dentry))
error = -EBUSY;
else
error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
double_up(&old_dir->i_zombie, &new_dir->i_zombie);
if (error)
return error;
/* The following d_move() should become unconditional */
if (!(old_dir->i_sb->s_type->fs_flags & FS_ODD_RENAME)) {
d_move(old_dentry, new_dentry);
}
return 0;
}
int vfs_rename(struct inode *old_dir, struct dentry *old_dentry, if (is_dir)
struct inode *new_dir, struct dentry *new_dentry)
{
int error;
if (S_ISDIR(old_dentry->d_inode->i_mode))
error = vfs_rename_dir(old_dir,old_dentry,new_dir,new_dentry); error = vfs_rename_dir(old_dir,old_dentry,new_dir,new_dentry);
else else
error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry); error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry);
...@@ -1842,6 +1836,7 @@ static inline int do_rename(const char * oldname, const char * newname) ...@@ -1842,6 +1836,7 @@ static inline int do_rename(const char * oldname, const char * newname)
int error = 0; int error = 0;
struct dentry * old_dir, * new_dir; struct dentry * old_dir, * new_dir;
struct dentry * old_dentry, *new_dentry; struct dentry * old_dentry, *new_dentry;
struct dentry * trap;
struct nameidata oldnd, newnd; struct nameidata oldnd, newnd;
if (path_init(oldname, LOOKUP_PARENT, &oldnd)) if (path_init(oldname, LOOKUP_PARENT, &oldnd))
...@@ -1868,7 +1863,7 @@ static inline int do_rename(const char * oldname, const char * newname) ...@@ -1868,7 +1863,7 @@ static inline int do_rename(const char * oldname, const char * newname)
if (newnd.last_type != LAST_NORM) if (newnd.last_type != LAST_NORM)
goto exit2; goto exit2;
double_lock(new_dir, old_dir); trap = lock_rename(new_dir, old_dir);
old_dentry = lookup_hash(&oldnd.last, old_dir); old_dentry = lookup_hash(&oldnd.last, old_dir);
error = PTR_ERR(old_dentry); error = PTR_ERR(old_dentry);
...@@ -1886,21 +1881,30 @@ static inline int do_rename(const char * oldname, const char * newname) ...@@ -1886,21 +1881,30 @@ static inline int do_rename(const char * oldname, const char * newname)
if (newnd.last.name[newnd.last.len]) if (newnd.last.name[newnd.last.len])
goto exit4; goto exit4;
} }
/* source should not be ancestor of target */
error = -EINVAL;
if (old_dentry == trap)
goto exit4;
new_dentry = lookup_hash(&newnd.last, new_dir); new_dentry = lookup_hash(&newnd.last, new_dir);
error = PTR_ERR(new_dentry); error = PTR_ERR(new_dentry);
if (IS_ERR(new_dentry)) if (IS_ERR(new_dentry))
goto exit4; goto exit4;
/* target should not be an ancestor of source */
error = -ENOTEMPTY;
if (new_dentry == trap)
goto exit5;
lock_kernel(); lock_kernel();
error = vfs_rename(old_dir->d_inode, old_dentry, error = vfs_rename(old_dir->d_inode, old_dentry,
new_dir->d_inode, new_dentry); new_dir->d_inode, new_dentry);
unlock_kernel(); unlock_kernel();
exit5:
dput(new_dentry); dput(new_dentry);
exit4: exit4:
dput(old_dentry); dput(old_dentry);
exit3: exit3:
double_up(&new_dir->d_inode->i_sem, &old_dir->d_inode->i_sem); unlock_rename(new_dir, old_dir);
exit2: exit2:
path_release(&newnd); path_release(&newnd);
exit1: exit1:
......
@@ -466,7 +466,7 @@ static int graft_tree(struct vfsmount *mnt, struct nameidata *nd)
 		return -ENOTDIR;
 	err = -ENOENT;
-	down(&nd->dentry->d_inode->i_zombie);
+	down(&nd->dentry->d_inode->i_sem);
 	if (IS_DEADDIR(nd->dentry->d_inode))
 		goto out_unlock;
@@ -481,7 +481,7 @@ static int graft_tree(struct vfsmount *mnt, struct nameidata *nd)
 	}
 	spin_unlock(&dcache_lock);
 out_unlock:
-	up(&nd->dentry->d_inode->i_zombie);
+	up(&nd->dentry->d_inode->i_sem);
 	return err;
 }
@@ -577,7 +577,7 @@ static int do_move_mount(struct nameidata *nd, char *old_name)
 		goto out;
 	err = -ENOENT;
-	down(&nd->dentry->d_inode->i_zombie);
+	down(&nd->dentry->d_inode->i_sem);
 	if (IS_DEADDIR(nd->dentry->d_inode))
 		goto out1;
@@ -607,7 +607,7 @@ static int do_move_mount(struct nameidata *nd, char *old_name)
 out2:
 	spin_unlock(&dcache_lock);
 out1:
-	up(&nd->dentry->d_inode->i_zombie);
+	up(&nd->dentry->d_inode->i_sem);
 out:
 	up_write(&current->namespace->sem);
 	if (!err)
@@ -949,7 +949,7 @@ asmlinkage long sys_pivot_root(const char *new_root, const char *put_old)
 	user_nd.dentry = dget(current->fs->root);
 	read_unlock(&current->fs->lock);
 	down_write(&current->namespace->sem);
-	down(&old_nd.dentry->d_inode->i_zombie);
+	down(&old_nd.dentry->d_inode->i_sem);
 	error = -EINVAL;
 	if (!check_mnt(user_nd.mnt))
 		goto out2;
@@ -992,7 +992,7 @@ asmlinkage long sys_pivot_root(const char *new_root, const char *put_old)
 	path_release(&root_parent);
 	path_release(&parent_nd);
 out2:
-	up(&old_nd.dentry->d_inode->i_zombie);
+	up(&old_nd.dentry->d_inode->i_sem);
 	up_write(&current->namespace->sem);
 	path_release(&user_nd);
 	path_release(&old_nd);
......
@@ -1226,7 +1226,7 @@ int
 nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	struct svc_fh *tfhp, char *tname, int tlen)
 {
-	struct dentry *fdentry, *tdentry, *odentry, *ndentry;
+	struct dentry *fdentry, *tdentry, *odentry, *ndentry, *trap;
 	struct inode *fdir, *tdir;
 	int err;
@@ -1253,7 +1253,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	/* cannot use fh_lock as we need deadlock protective ordering
 	 * so do it by hand */
-	double_down(&tdir->i_sem, &fdir->i_sem);
+	trap = lock_rename(tdentry, fdentry);
 	ffhp->fh_locked = tfhp->fh_locked = 1;
 	fill_pre_wcc(ffhp);
 	fill_pre_wcc(tfhp);
@@ -1266,12 +1266,17 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	err = -ENOENT;
 	if (!odentry->d_inode)
 		goto out_dput_old;
+	err = -EINVAL;
+	if (odentry == trap)
+		goto out_dput_old;
 	ndentry = lookup_one_len(tname, tdentry, tlen);
 	err = PTR_ERR(ndentry);
 	if (IS_ERR(ndentry))
 		goto out_dput_old;
+	err = -ENOTEMPTY;
+	if (ndentry == trap)
+		goto out_dput_new;
 #ifdef MSNFS
 	if ((ffhp->fh_export->ex_flags & NFSEXP_MSNFS) &&
@@ -1287,6 +1292,8 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	}
 	dput(ndentry);
+out_dput_new:
+	dput(ndentry);
 out_dput_old:
 	dput(odentry);
 out_nfserr:
@@ -1299,7 +1306,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	 */
 	fill_post_wcc(ffhp);
 	fill_post_wcc(tfhp);
-	double_up(&tdir->i_sem, &fdir->i_sem);
+	unlock_rename(tdentry, fdentry);
 	ffhp->fh_locked = tfhp->fh_locked = 0;
 out:
......
@@ -21,14 +21,12 @@ int vfs_readdir(struct file *file, filldir_t filler, void *buf)
 	if (!file->f_op || !file->f_op->readdir)
 		goto out;
 	down(&inode->i_sem);
-	down(&inode->i_zombie);
 	res = -ENOENT;
 	if (!IS_DEADDIR(inode)) {
 		lock_kernel();
 		res = file->f_op->readdir(file, buf, filler);
 		unlock_kernel();
 	}
-	up(&inode->i_zombie);
 	up(&inode->i_sem);
 out:
 	return res;
......
@@ -425,7 +425,6 @@ struct inode {
 	unsigned long i_blocks;
 	unsigned long i_version;
 	struct semaphore i_sem;
-	struct semaphore i_zombie;
 	struct inode_operations *i_op;
 	struct file_operations *i_fop; /* former ->i_op->default_file_ops */
 	struct super_block *i_sb;
@@ -759,6 +758,9 @@ extern int vfs_rmdir(struct inode *, struct dentry *);
 extern int vfs_unlink(struct inode *, struct dentry *);
 extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *);
+extern struct dentry *lock_rename(struct dentry *, struct dentry *);
+extern void unlock_rename(struct dentry *, struct dentry *);
 /*
  * File types
  */
@@ -1505,131 +1507,6 @@ extern int generic_osync_inode(struct inode *, int);
 extern int inode_change_ok(struct inode *, struct iattr *);
 extern int inode_setattr(struct inode *, struct iattr *);
-/*
- * Common dentry functions for inclusion in the VFS
- * or in other stackable file systems. Some of these
- * functions were in linux/fs/ C (VFS) files.
- *
- */
-/*
- * Locking the parent is needed to:
- * - serialize directory operations
- * - make sure the parent doesn't change from
- * under us in the middle of an operation.
- *
- * NOTE! Right now we'd rather use a "struct inode"
- * for this, but as I expect things to move toward
- * using dentries instead for most things it is
- * probably better to start with the conceptually
- * better interface of relying on a path of dentries.
- */
-static inline struct dentry *lock_parent(struct dentry *dentry)
-{
-	struct dentry *dir = dget(dentry->d_parent);
-	down(&dir->d_inode->i_sem);
-	return dir;
-}
-static inline struct dentry *get_parent(struct dentry *dentry)
-{
-	return dget(dentry->d_parent);
-}
-static inline void unlock_dir(struct dentry *dir)
-{
-	up(&dir->d_inode->i_sem);
-	dput(dir);
-}
-/*
- * Whee.. Deadlock country. Happily there are only two VFS
- * operations that does this..
- */
-static inline void double_down(struct semaphore *s1, struct semaphore *s2)
-{
-	if (s1 != s2) {
-		if ((unsigned long) s1 < (unsigned long) s2) {
-			struct semaphore *tmp = s2;
-			s2 = s1; s1 = tmp;
-		}
-		down(s1);
-	}
-	down(s2);
-}
-/*
- * Ewwwwwwww... _triple_ lock. We are guaranteed that the 3rd argument is
- * not equal to 1st and not equal to 2nd - the first case (target is parent of
- * source) would be already caught, the second is plain impossible (target is
- * its own parent and that case would be caught even earlier). Very messy.
- * I _think_ that it works, but no warranties - please, look it through.
- * Pox on bloody lusers who mandated overwriting rename() for directories...
- */
-static inline void triple_down(struct semaphore *s1,
-	struct semaphore *s2,
-	struct semaphore *s3)
-{
-	if (s1 != s2) {
-		if ((unsigned long) s1 < (unsigned long) s2) {
-			if ((unsigned long) s1 < (unsigned long) s3) {
-				struct semaphore *tmp = s3;
-				s3 = s1; s1 = tmp;
-			}
-			if ((unsigned long) s1 < (unsigned long) s2) {
-				struct semaphore *tmp = s2;
-				s2 = s1; s1 = tmp;
-			}
-		} else {
-			if ((unsigned long) s1 < (unsigned long) s3) {
-				struct semaphore *tmp = s3;
-				s3 = s1; s1 = tmp;
-			}
-			if ((unsigned long) s2 < (unsigned long) s3) {
-				struct semaphore *tmp = s3;
-				s3 = s2; s2 = tmp;
-			}
-		}
-		down(s1);
-	} else if ((unsigned long) s2 < (unsigned long) s3) {
-		struct semaphore *tmp = s3;
-		s3 = s2; s2 = tmp;
-	}
-	down(s2);
-	down(s3);
-}
-static inline void double_up(struct semaphore *s1, struct semaphore *s2)
-{
-	up(s1);
-	if (s1 != s2)
-		up(s2);
-}
-static inline void triple_up(struct semaphore *s1,
-	struct semaphore *s2,
-	struct semaphore *s3)
-{
-	up(s1);
-	if (s1 != s2)
-		up(s2);
-	up(s3);
-}
-static inline void double_lock(struct dentry *d1, struct dentry *d2)
-{
-	double_down(&d1->d_inode->i_sem, &d2->d_inode->i_sem);
-}
-static inline void double_unlock(struct dentry *d1, struct dentry *d2)
-{
-	double_up(&d1->d_inode->i_sem,&d2->d_inode->i_sem);
-	dput(d1);
-	dput(d2);
-}
 #endif /* __KERNEL__ */
 #endif /* _LINUX_FS_H */
@@ -253,6 +253,8 @@ EXPORT_SYMBOL(vfs_statfs);
 EXPORT_SYMBOL(vfs_fstat);
 EXPORT_SYMBOL(vfs_stat);
 EXPORT_SYMBOL(vfs_lstat);
+EXPORT_SYMBOL(lock_rename);
+EXPORT_SYMBOL(unlock_rename);
 EXPORT_SYMBOL(generic_read_dir);
 EXPORT_SYMBOL(generic_file_llseek);
 EXPORT_SYMBOL(remote_llseek);
......