1. 16 Jun, 2011 4 commits
    • Jan Kara's avatar
      vfs: Fix data corruption after failed write in __block_write_begin() · f9f07b6c
      Jan Kara authored
      I've got a report of a file corruption from fsxlinux on ext3. The important
      operations to the page were:
      mapwrite to a hole
      partial write to the page
      read - found the page zeroed from the end of the normal write
      
      The culprit seems to be that if get_block() fails in __block_write_begin()
      (e.g. transient ENOSPC in ext3), the function does ClearPageUptodate(page).
      Thus when we retry the write, the logic in __block_write_begin() thinks zeroing
      of the page is needed and overwrites old data.  In fact, I don't see why we
      should ever need to zero the uptodate bit here - either the page was uptodate
      when we entered __block_write_begin() and it should stay so when we leave it,
      or it was not uptodate and noone had right to set it uptodate during
      __block_write_begin() so it remains !uptodate when we leave as well. So just
      remove clearing of the bit.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      f9f07b6c
    • Anton Blanchard's avatar
      afs: afs_fill_page reads too much, or wrong data · 5e7f2337
      Anton Blanchard authored
      afs_fill_page should read the page that is about to be written but
      the current implementation has a number of issues. If we aren't
      extending the file we always read PAGE_CACHE_SIZE at offset 0. If we
      are extending the file we try to read the entire file.
      
      Change afs_fill_page to read PAGE_CACHE_SIZE at the right offset,
      clamped to i_size.
      
      While here, avoid calling afs_fill_page when we are doing a
      PAGE_CACHE_SIZE write.
      Signed-off-by: default avatarAnton Blanchard <anton@samba.org>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      5e7f2337
    • Al Viro's avatar
      VFS: Fix vfsmount overput on simultaneous automount · 8aef1884
      Al Viro authored
      [Kudos to dhowells for tracking that crap down]
      
      If two processes attempt to cause automounting on the same mountpoint at the
      same time, the vfsmount holding the mountpoint will be left with one too few
      references on it, causing a BUG when the kernel tries to clean up.
      
      The problem is that lock_mount() drops the caller's reference to the
      mountpoint's vfsmount in the case where it finds something already mounted on
      the mountpoint as it transits to the mounted filesystem and replaces path->mnt
      with the new mountpoint vfsmount.
      
      During a pathwalk, however, we don't take a reference on the vfsmount if it is
      the same as the one in the nameidata struct, but do_add_mount() doesn't know
      this.
      
      The fix is to make sure we have a ref on the vfsmount of the mountpoint before
      calling do_add_mount().  However, if lock_mount() doesn't transit, we're then
      left with an extra ref on the mountpoint vfsmount which needs releasing.
      We can handle that in follow_managed() by not making assumptions about what
      we can and what we cannot get from lookup_mnt() as the current code does.
      
      The callers of follow_managed() expect that reference to path->mnt will be
      grabbed iff path->mnt has been changed.  follow_managed() and follow_automount()
      keep track of whether such reference has been grabbed and assume that it'll
      happen in those and only those cases that'll have us return with changed
      path->mnt.  That assumption is almost correct - it breaks in case of
      racing automounts and in even harder to hit race between following a mountpoint
      and a couple of mount --move.  The thing is, we don't need to make that
      assumption at all - after the end of loop in follow_manage() we can check
      if path->mnt has ended up unchanged and do mntput() if needed.
      
      The BUG can be reproduced with the following test program:
      
      	#include <stdio.h>
      	#include <sys/types.h>
      	#include <sys/stat.h>
      	#include <unistd.h>
      	#include <sys/wait.h>
      	int main(int argc, char **argv)
      	{
      		int pid, ws;
      		struct stat buf;
      		pid = fork();
      		stat(argv[1], &buf);
      		if (pid > 0) wait(&ws);
      		return 0;
      	}
      
      and the following procedure:
      
       (1) Mount an NFS volume that on the server has something else mounted on a
           subdirectory.  For instance, I can mount / from my server:
      
      	mount warthog:/ /mnt -t nfs4 -r
      
           On the server /data has another filesystem mounted on it, so NFS will see
           a change in FSID as it walks down the path, and will mark /mnt/data as
           being a mountpoint.  This will cause the automount code to be triggered.
      
           !!! Do not look inside the mounted fs at this point !!!
      
       (2) Run the above program on a file within the submount to generate two
           simultaneous automount requests:
      
      	/tmp/forkstat /mnt/data/testfile
      
       (3) Unmount the automounted submount:
      
      	umount /mnt/data
      
       (4) Unmount the original mount:
      
      	umount /mnt
      
           At this point the kernel should throw a BUG with something like the
           following:
      
      	BUG: Dentry ffff880032e3c5c0{i=2,n=} still in use (1) [unmount of nfs4 0:12]
      
      Note that the bug appears on the root dentry of the original mount, not the
      mountpoint and not the submount because sys_umount() hasn't got to its final
      mntput_no_expire() yet, but this isn't so obvious from the call trace:
      
       [<ffffffff8117cd82>] shrink_dcache_for_umount+0x69/0x82
       [<ffffffff8116160e>] generic_shutdown_super+0x37/0x15b
       [<ffffffffa00fae56>] ? nfs_super_return_all_delegations+0x2e/0x1b1 [nfs]
       [<ffffffff811617f3>] kill_anon_super+0x1d/0x7e
       [<ffffffffa00d0be1>] nfs4_kill_super+0x60/0xb6 [nfs]
       [<ffffffff81161c17>] deactivate_locked_super+0x34/0x83
       [<ffffffff811629ff>] deactivate_super+0x6f/0x7b
       [<ffffffff81186261>] mntput_no_expire+0x18d/0x199
       [<ffffffff811862a8>] mntput+0x3b/0x44
       [<ffffffff81186d87>] release_mounts+0xa2/0xbf
       [<ffffffff811876af>] sys_umount+0x47a/0x4ba
       [<ffffffff8109e1ca>] ? trace_hardirqs_on_caller+0x1fd/0x22f
       [<ffffffff816ea86b>] system_call_fastpath+0x16/0x1b
      
      as do_umount() is inlined.  However, you can see release_mounts() in there.
      
      Note also that it may be necessary to have multiple CPU cores to be able to
      trigger this bug.
      Tested-by: default avatarJeff Layton <jlayton@redhat.com>
      Tested-by: default avatarIan Kent <raven@themaw.net>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      8aef1884
    • Török Edwin's avatar
      fix wrong iput on d_inode introduced by e6bc45d6 · 50338b88
      Török Edwin authored
      Git bisection shows that commit e6bc45d6 causes
      BUG_ONs under high I/O load:
      
      kernel BUG at fs/inode.c:1368!
      [ 2862.501007] Call Trace:
      [ 2862.501007]  [<ffffffff811691d8>] d_kill+0xf8/0x140
      [ 2862.501007]  [<ffffffff81169c19>] dput+0xc9/0x190
      [ 2862.501007]  [<ffffffff8115577f>] fput+0x15f/0x210
      [ 2862.501007]  [<ffffffff81152171>] filp_close+0x61/0x90
      [ 2862.501007]  [<ffffffff81152251>] sys_close+0xb1/0x110
      [ 2862.501007]  [<ffffffff814c14fb>] system_call_fastpath+0x16/0x1b
      
      A reliable way to reproduce this bug is:
      Login to KDE, run 'rsnapshot sync', and apt-get install openjdk-6-jdk,
      and apt-get remove openjdk-6-jdk.
      
      The buggy part of the patch is this:
      	struct inode *inode = NULL;
      .....
      -               if (nd.last.name[nd.last.len])
      -                       goto slashes;
                      inode = dentry->d_inode;
      -               if (inode)
      -                       ihold(inode);
      +               if (nd.last.name[nd.last.len] || !inode)
      +                       goto slashes;
      +               ihold(inode)
      ...
      	if (inode)
      		iput(inode);	/* truncate the inode here */
      
      If nd.last.name[nd.last.len] is nonzero (and thus goto slashes branch is taken),
      and dentry->d_inode is non-NULL, then this code now does an additional iput on
      the inode, which is wrong.
      
      Fix this by only setting the inode variable if nd.last.name[nd.last.len] is 0.
      
      Reference: https://lkml.org/lkml/2011/6/15/50Reported-by: default avatarNorbert Preining <preining@logic.at>
      Reported-by: default avatarTörök Edwin <edwintorok@gmail.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarTörök Edwin <edwintorok@gmail.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      50338b88
  2. 12 Jun, 2011 6 commits
    • Al Viro's avatar
      Delay struct net freeing while there's a sysfs instance refering to it · a685e089
      Al Viro authored
      	* new refcount in struct net, controlling actual freeing of the memory
      	* new method in kobj_ns_type_operations (->drop_ns())
      	* ->current_ns() semantics change - it's supposed to be followed by
      corresponding ->drop_ns().  For struct net in case of CONFIG_NET_NS it bumps
      the new refcount; net_drop_ns() decrements it and calls net_free() if the
      last reference has been dropped.  Method renamed to ->grab_current_ns().
      	* old net_free() callers call net_drop_ns() instead.
      	* sysfs_exit_ns() is gone, along with a large part of callchain
      leading to it; now that the references stored in ->ns[...] stay valid we
      do not need to hunt them down and replace them with NULL.  That fixes
      problems in sysfs_lookup() and sysfs_readdir(), along with getting rid
      of sb->s_instances abuse.
      
      	Note that struct net *shutdown* logics has not changed - net_cleanup()
      is called exactly when it used to be called.  The only thing postponed by
      having a sysfs instance refering to that struct net is actual freeing of
      memory occupied by struct net.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      a685e089
    • Al Viro's avatar
      afs: fix sget() races, close leak on umount · dde194a6
      Al Viro authored
      * set ->s_fs_info in set() callback passed to sget()
      * allocate the thing and set it up enough for afs_test_super() before
      making it visible
      * have it freed in ->kill_sb() (current tree simply leaks it)
      * have ->put_super() leave ->s_fs_info->volume alone; it's too early for
      dropping it; do that from ->kill_sb() after having called kill_anon_super().
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      dde194a6
    • Al Viro's avatar
      ubifs: fix sget races · d251ed27
      Al Viro authored
      * allocate ubifs_info in ->mount(), fill it enough for sb_test() and
      set ->s_fs_info to it in set() callback passed to sget().
      * do *not* free it in ->put_super(); do that in ->kill_sb() after we'd
      done kill_anon_super().
      * don't free it in ubifs_fill_super() either - deactivate_locked_super()
      done by caller when ubifs_fill_super() returns an error will take care
      of that sucker.
      * get rid of kludge with passing ubi to ubifs_fill_super() in ->s_fs_info;
      we only need it in alloc_ubifs_info(), so ubifs_fill_super() will need
      only ubifs_info.  Which it will find in ->s_fs_info just fine, no need to
      reassign anything...
      
      As the result, sb_test() becomes safe to apply to all superblocks that
      can be found by sget() (and a kludge with temporary use of ->s_fs_info
      to store a pointer to very different structure goes away).
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      d251ed27
    • Al Viro's avatar
      ubifs: split allocation of ubifs_info into a separate function · b1c27ab3
      Al Viro authored
      preparation to ubifs sget() race fixes
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      b1c27ab3
    • Al Viro's avatar
      fix leak in proc_set_super() · ff78fca2
      Al Viro authored
      set_anon_super() can fail...
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      ff78fca2
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of... · b99ca60c
      Linus Torvalds authored
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
        AppArmor: Fix sleep in invalid context from task_setrlimit
      b99ca60c
  3. 11 Jun, 2011 1 commit
  4. 10 Jun, 2011 1 commit
  5. 09 Jun, 2011 28 commits