1. 20 Nov, 2009 1 commit
• kill-the-bkl/reiserfs: turn GFP_ATOMIC flag to GFP_NOFS in reiserfs_get_block() · 1d2c6cfd
      Frederic Weisbecker authored
GFP_ATOMIC was used in reiserfs_get_block() so that the allocation could
not sleep and lose the BKL, ensuring nobody could modify the tree in the
middle of its work. Now that we have kicked out the BKL, we can use a
friendlier flag. We use GFP_NOFS here because we already hold the
reiserfs lock.
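A minimal sketch of the kind of call site involved (the surrounding code
in reiserfs_get_block() is abridged and the variable name is illustrative):

/* Before: GFP_ATOMIC never sleeps, so the BKL could not be lost here. */
item = kmalloc(size, GFP_ATOMIC);

/* After: GFP_NOFS may sleep, but it never recurses into the filesystem,
 * which is what matters while holding the reiserfs lock. */
item = kmalloc(size, GFP_NOFS);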
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Laurent Riffard <laurent.riffard@free.fr>
      Cc: Thomas Gleixner <tglx@linutronix.de>
  2. 14 Oct, 2009 3 commits
• kill-the-bkl/reiserfs: drop the fs race watchdog from _get_block_create_0() · 27b3a5c5
      Frederic Weisbecker authored
We had a watchdog in _get_block_create_0() that jumped to a fixup retry
path in case the BKL got relaxed while calling kmap().
This is no longer necessary since we now have a reiserfs lock that is
not implicitly relaxed while sleeping.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Laurent Riffard <laurent.riffard@free.fr>
      Cc: Thomas Gleixner <tglx@linutronix.de>
• kill-the-bkl/reiserfs: definitely drop the bkl from reiserfs_ioctl() · 205cb37b
      Frederic Weisbecker authored
The reiserfs ioctl path doesn't need the big kernel lock anymore, now
that the filesystem synchronizes through its own lock.
      
      We can then turn reiserfs_ioctl() into an unlocked_ioctl callback.
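A sketch of what that conversion looks like in the file operations table
(layout abridged; treat the exact neighbouring fields as assumptions):

const struct file_operations reiserfs_file_operations = {
	/* ... */
	.unlocked_ioctl = reiserfs_ioctl,	/* was: .ioctl = reiserfs_ioctl */
	.compat_ioctl	= reiserfs_compat_ioctl,
	/* ... */
};

The callback then takes the standard unlocked_ioctl prototype:
long reiserfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);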
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Laurent Riffard <laurent.riffard@free.fr>
      Cc: Thomas Gleixner <tglx@linutronix.de>
• kill-the-bkl/reiserfs: always lock the ioctl path · ac78a078
      Frederic Weisbecker authored
Reiserfs uses the ioctl callback for its file operations, which means
that its ioctl path was still locked by the BKL; this was synchronizing
it with the rest of the filesystem operations. We have changed that by
locking it with the new reiserfs lock, but we did so only from the
compat_ioctl callback.

Fix that by locking reiserfs_ioctl() every time.
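A minimal sketch of the fix, assuming the reiserfs lock helpers from this
series (the function body is abridged into a hypothetical helper):

int reiserfs_ioctl(struct inode *inode, struct file *filp,
		   unsigned int cmd, unsigned long arg)
{
	int ret;

	reiserfs_write_lock(inode->i_sb);	/* on every entry, not only compat */
	ret = __reiserfs_ioctl(inode, filp, cmd, arg);	/* hypothetical helper */
	reiserfs_write_unlock(inode->i_sb);
	return ret;
}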
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Laurent Riffard <laurent.riffard@free.fr>
      Cc: Thomas Gleixner <tglx@linutronix.de>
  3. 05 Oct, 2009 1 commit
• kill-the-bkl/reiserfs: fix reiserfs lock to cpu_add_remove_lock dependency · 48f6ba5e
      Frederic Weisbecker authored
While creating the reiserfs workqueue during journal
initialization, we are holding the reiserfs lock, but
create_workqueue() also takes the cpu_add_remove_lock, thus
creating the following dependency:
      
      - reiserfs lock -> cpu_add_remove_lock
      
      But we also have the following existing dependencies:
      
      - mm->mmap_sem -> reiserfs lock
      - cpu_add_remove_lock -> cpu_hotplug.lock -> slub_lock -> sysfs_mutex
      
      The merged dependency chain then becomes:
      
      - mm->mmap_sem -> reiserfs lock -> cpu_add_remove_lock ->
      	cpu_hotplug.lock -> slub_lock -> sysfs_mutex
      
But when we fill a dir entry in sysfs_readdir(), we are holding the
sysfs_mutex, and we might also fault while copying the directory entry
to the user, leading to the following dependency:
      
      - sysfs_mutex -> mm->mmap_sem
      
      The end result is then a lock inversion between sysfs_mutex and
      mm->mmap_sem, as reported in the following lockdep warning:
      
      [ INFO: possible circular locking dependency detected ]
      2.6.31-07095-g25a3912 #4
      -------------------------------------------------------
      udevadm/790 is trying to acquire lock:
       (&mm->mmap_sem){++++++}, at: [<c1098942>] might_fault+0x72/0xc0
      
      but task is already holding lock:
       (sysfs_mutex){+.+.+.}, at: [<c110813c>] sysfs_readdir+0x7c/0x260
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #5 (sysfs_mutex){+.+.+.}:
            [...]
      
      -> #4 (slub_lock){+++++.}:
            [...]
      
      -> #3 (cpu_hotplug.lock){+.+.+.}:
            [...]
      
      -> #2 (cpu_add_remove_lock){+.+.+.}:
            [...]
      
      -> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
            [...]
      
      -> #0 (&mm->mmap_sem){++++++}:
            [...]
      
This can be fixed by relaxing the reiserfs lock while creating the
workqueue. It is fine to relax the lock here: we just keep it around
to satisfy the reiserfs lock checks and for paranoid reasons.
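A sketch of the fix in the journal-init path (the variable and workqueue
name are assumptions):

/* Don't chain the reiserfs lock onto cpu_add_remove_lock. */
reiserfs_write_unlock(sb);
commit_wq = create_workqueue("reiserfs");
reiserfs_write_lock(sb);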
Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Tested-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Laurent Riffard <laurent.riffard@free.fr>
  4. 17 Sep, 2009 1 commit
• kill-the-bkl/reiserfs: Fix induced mm->mmap_sem to sysfs_mutex dependency · 193be0ee
      Frederic Weisbecker authored
      Alexander Beregalov reported the following warning:
      
      	=======================================================
      	[ INFO: possible circular locking dependency detected ]
      	2.6.31-03149-gdcc030a #1
      	-------------------------------------------------------
      	udevadm/716 is trying to acquire lock:
      	 (&mm->mmap_sem){++++++}, at: [<c107249a>] might_fault+0x4a/0xa0
      
      	but task is already holding lock:
      	 (sysfs_mutex){+.+.+.}, at: [<c10cb9aa>] sysfs_readdir+0x5a/0x200
      
      	which lock already depends on the new lock.
      
      	the existing dependency chain (in reverse order) is:
      
      	-> #3 (sysfs_mutex){+.+.+.}:
      	       [...]
      
      	-> #2 (&bdev->bd_mutex){+.+.+.}:
      	       [...]
      
      	-> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
      	       [...]
      
      	-> #0 (&mm->mmap_sem){++++++}:
      	       [...]
      
On the reiserfs mount path, we take the reiserfs lock and, while
initializing the journal, we open the device, taking
bdev->bd_mutex. Then rescan_partition() may signal the change
to sysfs.
      
      We have then the following dependency:
      
      	reiserfs_lock -> bd_mutex -> sysfs_mutex
      
Later, while entering reiserfs_readpage() after a page fault in an
mmapped reiserfs file, we are holding mm->mmap_sem, and we are going
to take the reiserfs lock too.
      We have then the following dependency:
      
      	mm->mmap_sem -> reiserfs_lock
      
      which, expanded with the previous dependency gives us:
      
      	mm->mmap_sem -> reiserfs_lock -> bd_mutex -> sysfs_mutex
      
Now, entering the sysfs readdir path, we are holding the
sysfs_mutex. When we copy a directory entry to the user buffer, we
might fault and then take mm->mmap_sem, which leads to the circular
locking dependency reported.
      
      We can fix that by relaxing the reiserfs lock during the call to
      journal_init_dev(), which is the place where we open the mounted
      device.
      
It is fine to relax the lock here because we are at the beginning of
the reiserfs mount path and there is nothing to protect at this time:
the journal is not initialized yet.
We just keep this lock around for paranoid reasons.
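A sketch of the fix (the arguments to journal_init_dev() and the error
handling are abridged assumptions):

reiserfs_write_unlock(sb);
ret = journal_init_dev(sb, journal, j_dev_name);	/* takes bd_mutex */
reiserfs_write_lock(sb);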
Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Tested-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Laurent Riffard <laurent.riffard@free.fr>
  5. 14 Sep, 2009 25 commits
• kill-the-bkl/reiserfs: panic in case of lock imbalance · 80503185
      Frederic Weisbecker authored
Until now, trying to unlock the reiserfs write lock when the current
task doesn't hold it led to a simple warning. We should actually warn
and panic in this case, to avoid letting the user's data reach an
unstable state.
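A minimal sketch of the check, assuming the lock_owner bookkeeping
introduced by this series (the real code presumably panics with a
descriptive message rather than a bare BUG()):

void reiserfs_write_unlock(struct super_block *s)
{
	struct reiserfs_sb_info *sb_i = REISERFS_SB(s);

	/* Unbalanced unlock: continuing could corrupt user data. */
	if (sb_i->lock_owner != current)
		BUG();
	/* ... */
}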
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Laurent Riffard <laurent.riffard@free.fr>
• kill-the-bkl/reiserfs: fix recursive reiserfs write lock in reiserfs_commit_write() · 7e942770
      Frederic Weisbecker authored
reiserfs_commit_write() is always called with the write lock held.
Thus the current calls to reiserfs_write_lock() in this function are
acquiring the lock recursively.
We can safely drop them.

This also upholds the assumption, made elsewhere, that the lock is
really released by a call to reiserfs_write_unlock().
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Laurent Riffard <laurent.riffard@free.fr>
• kill-the-bkl/reiserfs: fix recursive reiserfs lock in reiserfs_mkdir() · b10ab4c3
      Frederic Weisbecker authored
reiserfs_mkdir() acquires the reiserfs lock, assuming it has been called
from the directory inode callbacks, without the lock held.

But it can also be called from other internal sites such as
reiserfs_xattr_init(), which already holds the lock. This recursive
locking leads to further wrong assumptions. For example, later calls
to reiserfs_mutex_lock_safe() won't actually release the reiserfs lock
for the time we acquire a given mutex, creating unexpected lock inversions.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Laurent Riffard <laurent.riffard@free.fr>
• kill-the-bkl/reiserfs: fix "reiserfs lock" / "inode mutex" lock inversion dependency · ae635c0b
      Frederic Weisbecker authored
reiserfs_xattr_init() is called with the reiserfs write lock held, but
if the ".reiserfs_priv" entry is not created, we take the superblock
root directory inode mutex until .reiserfs_priv is created.

This creates a lock dependency inversion against other sites such as
reiserfs_file_release(), which takes an inode mutex and then the
reiserfs lock.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Laurent Riffard <laurent.riffard@free.fr>
• kill-the-bkl/reiserfs: move the concurrent tree accesses checks per superblock · 08f14fc8
      Frederic Weisbecker authored
When do_balance() balances the tree, a trick is performed to give
other tree writers/readers a way to check whether do_balance() is
executing concurrently (requires CONFIG_REISERFS_CHECK).

This is done to protect against concurrent accesses to the tree. The
trick is the following:
      
When do_balance() is called, a unique global variable called cur_tb
takes a pointer to the current tree to be rebalanced.
Once do_balance() finishes its work, cur_tb takes the NULL value.

Then, concurrent tree readers/writers just have to check the value
of cur_tb to ensure do_balance() isn't executing concurrently.
If it is, it proves that schedule() occurred in do_balance(),
which relaxed the BKL that protected the tree.
      
Now that the BKL has been turned into a mutex, this check is still
fine even though do_balance() becomes preemptible: the write lock
will not be automatically released on schedule(), so the tree is
still protected.
      
But this is only fine if we have a single reiserfs mount point.
Indeed, because the BKL is a global lock, it didn't allow
concurrent executions between a tree reader/writer in one mount
point and a do_balance() on the tree of another mount point.

So, assuming none of these readers/writers were supposed to be
reentrant, the check now sometimes detects false positives with
the current per-superblock mutex, which allows this reentrancy.
      
This patch keeps the concurrent tree accesses check but makes it
per superblock, so that only trees from the same mount point are
checked for concurrent access.
      
      [ Impact: fix spurious panic while running several reiserfs mount-points ]
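A sketch of the move (struct layout abridged):

/* Before: one global, shared by every mount (CONFIG_REISERFS_CHECK) */
struct tree_balance *cur_tb;

/* After: one per superblock, so mounts no longer see each other's
 * rebalancing */
struct reiserfs_sb_info {
	/* ... */
	struct tree_balance *cur_tb;	/* non-NULL while do_balance() runs */
};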
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
• kill-the-bkl/reiserfs: acquire the inode mutex safely · c72e0575
      Frederic Weisbecker authored
While searching a pathname, an inode mutex can be acquired
in do_lookup(), which calls reiserfs_lookup(), which in turn
acquires the write lock.

On the other side, reiserfs_fill_super() can acquire the write lock
and then call reiserfs_lookup_privroot(), which can acquire an
inode mutex (the root of the mount point).

So we theoretically risk an AB-BA lock inversion that could lead
to a deadlock.

As with other lock dependencies found since the BKL-to-mutex
conversion, the fix is to use reiserfs_mutex_lock_safe(), which
drops the dependency on the write lock.
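A one-line sketch of the idea at this call site (the exact dentry/inode
expression is an assumption):

/* take the mount-point root inode mutex without pinning the write lock */
reiserfs_mutex_lock_safe(&s->s_root->d_inode->i_mutex, s);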
      
      [ Impact: fix a possible deadlock with reiserfs ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
• kill-the-bkl/reiserfs: unlock only when needed in search_by_key · 2ac62695
      Frederic Weisbecker authored
search_by_key() is the site which most requires the lock.
This is mostly because it is a very central function, and also
because it releases/reacquires the write lock at least once each
time it is called.

Each release/reacquire creates a lot of contention here and
also widens the window that lets another thread change the tree.
When that happens, the current path searching over the tree must be
retried from the beginning (the root), which is a wasteful and
time-consuming recovery.
      
This patch factorizes two release/reacquire sequences:

- reading leaf-node blocks
- reading the current block

The latter immediately follows the former.

The whole sequence is safe as a single unlocked section because
we check just afterwards whether the tree has changed during these
operations.
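A sketch of the shape of the factorized section (fs_gen and the research
label follow the usual search_by_key() retry idiom; treat both as
assumptions):

reiserfs_write_unlock(sb);
/* read the leaf-node blocks and the current block in one unlocked span */
bh = sb_bread(sb, block_number);
reiserfs_write_lock(sb);
if (fs_changed(fs_gen, sb)) {
	brelse(bh);
	goto research;	/* the tree moved under us: retry from the root */
}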
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
• kill-the-bkl/reiserfs: use mutex_lock in reiserfs_mutex_lock_safe · c63e3c0b
      Frederic Weisbecker authored
reiserfs_mutex_lock_safe() is a hack to avoid any dependency between
an internal reiserfs mutex and the write lock. It was proposed in
order to follow the old BKL logic.
      
      The code does the following:
      
      while (!mutex_trylock(m)) {
      	reiserfs_write_unlock(s);
      	schedule();
      	reiserfs_write_lock(s);
      }
      
This imitates the implicit behaviour the lock had when it was
the BKL and had no such dependency:
      
      mutex_lock(m) {
      	if (fastpath)
      		let's go
      	else {
      		wait_for_mutex() {
      			schedule() {
      				unlock_kernel()
      				reacquire_lock_kernel()
      			}
      		}
      	}
      }
      
The problem is that by using such an explicit schedule(), we don't
benefit from the adaptive mutex spinning on the owner.
      
      The logic in use now is:
      
      reiserfs_write_unlock(s);
      mutex_lock(m); // -> possible adaptive spinning
      reiserfs_write_lock(s);
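Wrapped up as a helper, a minimal sketch of what
reiserfs_mutex_lock_safe() presumably becomes:

static inline void
reiserfs_mutex_lock_safe(struct mutex *m, struct super_block *s)
{
	reiserfs_write_unlock(s);
	mutex_lock(m);		/* can now spin adaptively on the owner */
	reiserfs_write_lock(s);
}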
      
      [ Impact: restore the use of adaptive spinning mutexes in reiserfs ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
• kill-the-bkl/reiserfs: factorize the locking in reiserfs_write_end() · d6f5b0aa
      Frederic Weisbecker authored
reiserfs_write_end() is a hot path in reiserfs.
It contains two wasteful write lock acquire/release pairs that can be
merged without changing the code logic.

This patch factorizes them into a single protected section, reducing
the contention inside.
      
      [ Impact: reduce lock contention in a reiserfs hotpath ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
• kill-the-bkl/reiserfs: reduce number of contentions in search_by_key() · 09eb47a7
      Frederic Weisbecker authored
search_by_key() is a central function in reiserfs which searches
the path in the fs tree from the root to a node, given its key.

It is the function that takes the write lock most often,
because it is used so frequently.

We also forgot to release the lock while reading the next tree node,
holding the lock in a wasteful way.

We now release the lock while reading the current node and its
children, all in one go. This should be safe because we keep a
reference to these blocks, and even if we read a block that is
concurrently changed, the fs_changed check afterwards will make us
retry the path from the root.
      
      [ Impact: release the write lock while unused in a hot path ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
• kill-the-bkl/reiserfs: don't hold the write lock recursively in reiserfs_lookup() · b1c839bb
      Frederic Weisbecker authored
The write lock can be acquired recursively in reiserfs_lookup(). But we
may want to *really* release the lock before a possible rescheduling
from a reiserfs_lookup() callee.

Hence we want to acquire the lock only once (i.e. not recursively).
      
[ Impact: prevent a possibly unreleased write lock when sleeping ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
• kill-the-bkl/reiserfs: lock only once on reiserfs_get_block() · 26931309
      Frederic Weisbecker authored
reiserfs_get_block() is one of those sites where the write lock might
be acquired recursively.

That's a particular problem because this function is called very
often. It's a hot spot which needs to reschedule() periodically while
converting direct items to indirect ones, because that can take some
time.
      
So if we apply the write lock release/reacquire pattern on
schedule() here, it may not produce the desired effect, since we may
hold the lock at more than one depth.
      
      The solution is to use reiserfs_write_lock_once() which won't try
      to reacquire the lock recursively. Then the lock will be *really*
      released before schedule().
      
Also, we only release the lock if TIF_NEED_RESCHED is set, to avoid
creating numerous wasteful contentions.
      
[ Impact: fix a lock held for too long in reiserfs_get_block() ]
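A sketch of the resulting shape, following the lock-depth idiom of
reiserfs_write_lock_once() (details abridged):

int depth = reiserfs_write_lock_once(inode->i_sb);
/* ... long direct-to-indirect conversion loop ... */
if (need_resched()) {
	/* with a single lock depth, this really releases the mutex */
	reiserfs_write_unlock_once(inode->i_sb, depth);
	schedule();
	depth = reiserfs_write_lock_once(inode->i_sb);
}
/* ... */
reiserfs_write_unlock_once(inode->i_sb, depth);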
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
• kill-the-bkl/reiserfs: conditionally release the write lock on fs_changed() · d663af80
      Frederic Weisbecker authored
      The goal of fs_changed() is to check whether the tree changed during a
      schedule(). This is a BKL legacy.
      
      A recent patch added an explicit unconditional release/reacquire of the
      write lock around the cond_resched() called inside fs_changed.
      
But it's wasteful to do that unconditionally: we create superfluous
lock contention in the !TIF_NEED_RESCHED case.
      
This patch manages that by calling reiserfs_cond_resched() from
fs_changed(), which only releases the lock if we are going to
reschedule.
      
[ Impact: inject less lock contention and fewer tree job retries ]
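A sketch of the resulting macro, assuming __fs_changed() carries the
actual generation check:

#define fs_changed(gen, s)			\
({						\
	reiserfs_cond_resched(s);		\
	__fs_changed(gen, s);			\
})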
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
• kill-the-BKL/reiserfs: add reiserfs_cond_resched() · e43d3f21
      Frederic Weisbecker authored
Usually, when we call cond_resched(), we want the write lock
to be released and then reacquired once we return from scheduling.
Not only does this follow the previous BKL-based locking scheme, it
also lets other waiters take the lock.

But if we aren't going to reschedule, as in the !TIF_NEED_RESCHED
case, it's useless to release the lock. Worse, if we release and
reacquire the lock when it is not needed, we create useless
contention. Also, if someone takes the lock while we are modifying
or reading the tree, there is a good chance we'll have to retry our
operation, e.g. if the block we were seeking has moved.

So this patch introduces a helper which only unlocks the write lock
if we are going to schedule.
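A minimal sketch of that helper:

static inline void reiserfs_cond_resched(struct super_block *s)
{
	if (need_resched()) {
		reiserfs_write_unlock(s);
		schedule();
		reiserfs_write_lock(s);
	}
}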
      
[ Impact: prepare to inject less lock contention and fewer tree operation retries ]
Reported-by: Andi Kleen <andi@firstfloor.org>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
• kill-the-BKL/reiserfs: release the write lock on flush_commit_list() · 6e3647ac
      Frederic Weisbecker authored
flush_commit_list() uses ll_rw_block() to commit the pending log blocks.
ll_rw_block() might sleep, and the BKL used to be released at this
point. So we can also relax the write lock here.
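This is the recurring pattern in this part of the series, sketched
around the ll_rw_block() call (arguments abridged):

reiserfs_write_unlock(s);
ll_rw_block(WRITE, 1, &bh);	/* may sleep waiting on the block layer */
reiserfs_write_lock(s);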
      
      [ Impact: release the reiserfs write lock when it is not needed ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
• kill-the-BKL/reiserfs: release the write lock inside reiserfs_read_bitmap_block() · 4c5eface
      Frederic Weisbecker authored
reiserfs_read_bitmap_block() uses sb_bread() to read the bitmap block.
This helper might sleep.

When the BKL was used, it was released at this point, so we can
relax the write lock here too.
      
      [ Impact: release the reiserfs write lock when it is not needed ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
• kill-the-BKL/reiserfs: release the write lock inside get_neighbors() · 148d3504
      Frederic Weisbecker authored
get_neighbors() is used to get the left and/or right blocks
neighboring a given one, in order to balance a tree.

sb_bread() is used to read the buffers of these neighbor blocks,
and it might sleep while waiting for this I/O.

The BKL used to be released at this point, so we can also release
the write lock before calling sb_bread().

This is safe because if the filesystem is changed after this
lock release, the function returns REPEAT_SEARCH (aka SCHEDULE_OCCURRED
in the function's header comments) in order to repeat the neighbor
search.
      
      [ Impact: release the reiserfs write lock when it is not needed ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
• kill-the-BKL/reiserfs: release write lock while rescheduling on prepare_for_delete_or_cut() · 5e69e3a4
      Frederic Weisbecker authored
prepare_for_delete_or_cut() can process several types of items,
including indirect items, i.e. items which contain no file data but
pointers to the unformatted nodes holding the file's scattered data.

In this case it has to zero out these pointers to the block numbers
of unformatted nodes and release those block numbers in the bitmap.

This can take some time, so we reschedule between each block
processed. We can safely release the write lock while rescheduling,
like the BKL used to be, because the code checks just afterwards
whether the item has moved during the sleep.
      
      [ Impact: release the reiserfs write lock when it is not needed ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
• kill-the-BKL/reiserfs: release the write lock before rescheduling on do_journal_end() · e6950a4d
      Frederic Weisbecker authored
When do_journal_end() copies data to the journal block buffers in
memory, it reschedules if needed between each block copied and dirtied.

We can also release the write lock at this rescheduling stage,
like the BKL did implicitly.
      
      [ Impact: release the reiserfs write lock when it is not needed ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
• kill-the-BKL/reiserfs: release write lock on fs_changed() · f32049dc
      Frederic Weisbecker authored
fs_changed() is a macro used by reiserfs to check whether its tree has
been rebalanced. It was designed to detect parallel changes to the tree
after calling a sleeping function, which used to release the BKL.

fs_changed() also calls cond_resched(), so that if rescheduling is
needed, we are in the best place to do it, since we check whether the
tree has changed just afterwards (because of the BKL release on
schedule()).

Even though we are no longer using the BKL, we still want to release
the lock while we reschedule, so that other waiters can acquire it
safely, given the __fs_changed() check that follows.
      
      [ Impact: release the reiserfs write lock when it is not needed ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
• kill-the-BKL/reiserfs: only acquire the write lock once in reiserfs_dirty_inode · dc8f6d89
      Frederic Weisbecker authored
      Impact: fix a deadlock
      
reiserfs_dirty_inode() is the super_operations::dirty_inode() callback
of reiserfs. It can be called from different contexts where the write
lock may already be held.

But this function also grabs the write lock (possibly recursively).
A subsequent release of the lock before sleeping will then not actually
release it if the caller of mark_inode_dirty() (which in turn calls
reiserfs_dirty_inode()) already owns the lock.
      
      A typical case:
      
      reiserfs_write_end() {
      	acquire_write_lock()
      	mark_inode_dirty() {
      		reiserfs_dirty_inode() {
      			reacquire_write_lock() {
      				journal_begin() {
      					do_journal_begin_r() {
      						/*
      						 * fail to release, still
      						 * one depth of lock
      						 */
      						release_write_lock()
      						reiserfs_wait_on_write_block() {
      							wait_event()
      
The event is usually provided by something which needs the write lock,
but the lock hasn't been released.

We use reiserfs_write_lock_once() here to ensure we grab the write
lock at only a single level of depth.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@texware.it>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      LKML-Reference: <1239680065-25013-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
• kill-the-BKL/reiserfs: lock only once in reiserfs_truncate_file · 22c963ad
      Frederic Weisbecker authored
      Impact: fix a deadlock
      
reiserfs_truncate_file() can be called from multiple contexts where
the write lock may or may not already be held.

This function also acquires (possibly recursively) the write
lock. Subsequent releases before sleeping will not actually release
the lock, because we may be more than one lock depth deep.
      
      A typical case is:
      
      reiserfs_file_release {
      	acquire_the_lock()
      	reiserfs_truncate_file()
      		reacquire_the_lock()
      		journal_begin() {
      			do_journal_begin_r() {
      				reiserfs_wait_on_write_block() {
      					/*
      					 * Not released because still one
      					 * depth owned
      					 */
      					release_lock()
      					wait_for_event()
      
At this stage the event never happens, because whoever provides it
needs the write lock.

We use reiserfs_write_lock_once() here to ensure that we don't acquire
the write lock recursively.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@texware.it>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      LKML-Reference: <1239680065-25013-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
• kill-the-BKL/reiserfs: provide a tool to lock only once the write lock · daf88c89
      Frederic Weisbecker authored
Sometimes we don't want to hold the per-superblock write lock
recursively, because we want to be sure it is actually released when
we come to sleep.

This patch introduces the necessary tools for that.

reiserfs_write_lock_once() does the same job as reiserfs_write_lock(),
except that it won't try to acquire the lock recursively if the
current task already owns it. It also returns the lock depth as it
was before the call.

reiserfs_write_unlock_once() unlocks only if reiserfs_write_lock_once()
returned a depth equal to -1, i.e. only if it actually took the lock.
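A sketch of the pair, assuming the lock_owner/lock_depth bookkeeping
this series adds to the superblock info:

int reiserfs_write_lock_once(struct super_block *s)
{
	struct reiserfs_sb_info *sb_i = REISERFS_SB(s);

	if (sb_i->lock_owner != current) {
		mutex_lock(&sb_i->lock);
		sb_i->lock_owner = current;
	}
	/* -1 means we actually took the mutex just now */
	return sb_i->lock_depth;
}

void reiserfs_write_unlock_once(struct super_block *s, int lock_depth)
{
	if (lock_depth == -1)	/* only unlock if we really locked */
		reiserfs_write_unlock(s);
}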
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@texware.it>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      LKML-Reference: <1239680065-25013-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
• reiserfs, kill-the-BKL: fix unsafe j_flush_mutex lock · a412f9ef
      Frederic Weisbecker authored
      Impact: fix a deadlock
      
The j_flush_mutex is acquired safely in journal.c:
if we can't take it, we release the reiserfs per-superblock lock
and wait a bit.

But one place remains in kupdate_transactions() where the
j_flush_mutex is still acquired the traditional way. Thus the
following scenario (reported by lockdep) can happen:
      
A						B

mutex_lock(&write_lock)
	mutex_lock(&j_flush_mutex)
	mutex_unlock(&write_lock)
	sleep...
						mutex_lock(&write_lock)
						mutex_lock(&j_flush_mutex) //block
	mutex_lock(&write_lock) //deadlock
      
      Fix this by using reiserfs_mutex_lock_safe() in kupdate_transactions().
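A one-line sketch of the change at that call site:

/* drops the write lock while waiting for j_flush_mutex */
reiserfs_mutex_lock_safe(&journal->j_flush_mutex, sb);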
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@texware.it>
      Cc: Jeff Mahoney <jeffm@suse.com>
      LKML-Reference: <1239660635-12940-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
• reiserfs: kill-the-BKL · 8ebc4232
      Frederic Weisbecker authored
This patch is an attempt to remove the BKL-based locking scheme from
reiserfs.
      
It is somewhat inspired by an old attempt from Peter Zijlstra:
      
         http://lkml.indiana.edu/hypermail/linux/kernel/0704.2/2174.html
      
The BKL is heavily used in this filesystem to prevent concurrent
write accesses to the filesystem.

Reiserfs makes deep use of the specific properties of the BKL:

- It can be acquired recursively by the same task.
- It is released across schedule() calls and reacquired when schedule() returns.
      
The two properties above are a roadmap for the reiserfs write locking,
so it's very hard to simply replace the BKL with a common mutex.

- We need recursion-capable locking, unless we want to restructure
  several blocks of the code.
- We need to identify the sites where the BKL was implicitly relaxed
  (schedule, wait, sync, etc...) so that we can in turn release and
  reacquire our new lock explicitly.
  Such implicit releases of the lock are often required to let other
  resource producers/consumers do their job, or we can suffer unexpected
  starvations or deadlocks.
      
So the new lock that replaces the BKL here is a per-superblock mutex
with a specific property: it can be acquired recursively by the same
task, like the BKL.

For that purpose, we integrate a lock owner and a lock depth field
into the superblock info structure.
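A sketch of the added state (layout abridged; treat the exact field
names as assumptions):

struct reiserfs_sb_info {
	/* ... */
	struct mutex lock;		/* the per-superblock BKL replacement */
	struct task_struct *lock_owner;	/* task currently holding it, or NULL */
	int lock_depth;			/* recursion depth, -1 when unlocked */
};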
      
The first axis of this patch is to turn the reiserfs_write_(un)lock()
functions into wrappers that manage this mutex. Some explicit calls to
lock_kernel() have also been converted to reiserfs_write_lock() helpers.

The second axis is to find the important blocking sites (schedule...(),
wait_on_buffer(), sync_dirty_buffer(), etc...) and then apply an explicit
release of the write lock at these locations before blocking. Then we can
safely wait for those who can give us resources or those who need some.
Typically this is a fight between the current writer, the reiserfs workqueue
(aka the async committer) and the pdflush threads.
      
The third axis is a consequence of the second. The write lock is usually
at the top of a lock dependency chain which can include the journal lock,
the flush lock or the commit lock. So it's dangerous to release the write
lock and then try to reacquire it while we still hold other locks.
      
      This is fine with the bkl:
      
            T1                       T2
      
      lock_kernel()
          mutex_lock(A)
          unlock_kernel()
          // do something
                                  lock_kernel()
                                      mutex_lock(A) -> already locked by T1
                                      schedule() (and then unlock_kernel())
          lock_kernel()
          mutex_unlock(A)
          ....
      
      This is not fine with a mutex:
      
            T1                       T2
      
      mutex_lock(write)
          mutex_lock(A)
          mutex_unlock(write)
          // do something
                                 mutex_lock(write)
                                    mutex_lock(A) -> already locked by T1
                                    schedule()
      
          mutex_lock(write) -> already locked by T2
          deadlock
      
The solution in this patch is to provide a helper which releases the
write lock and sleeps a bit if we can't lock a mutex that depends on it.
It's another simulation of the BKL behaviour.
      
The last axis is to locate the fs callbacks that are called with the BKL
held, according to Documentation/filesystems/Locking.
      
      Those are:
      
      - reiserfs_remount
      - reiserfs_fill_super
      - reiserfs_put_super
      
Reiserfs didn't need to lock explicitly there because of the context in
which these callbacks run. But now we must take care of that with the
new locking.
      
After this patch, reiserfs suffers from a slight performance regression
(for now). On UP, a high-volume write with dd reports an average of
27 MB/s, instead of 30 MB/s without the patch applied.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Bron Gondwana <brong@fastmail.fm>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      LKML-Reference: <1239070789-13354-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
  6. 09 Sep, 2009 3 commits
• Linux 2.6.31 · 74fca6a4
      Linus Torvalds authored
• aoe: allocate unused request_queue for sysfs · 7135a71b
      Ed Cashin authored
      Andy Whitcroft reported an oops in aoe triggered by use of an
      incorrectly initialised request_queue object:
      
        [ 2645.959090] kobject '<NULL>' (ffff880059ca22c0): tried to add
      		an uninitialized object, something is seriously wrong.
        [ 2645.959104] Pid: 6, comm: events/0 Not tainted 2.6.31-5-generic #24-Ubuntu
        [ 2645.959107] Call Trace:
        [ 2645.959139] [<ffffffff8126ca2f>] kobject_add+0x5f/0x70
        [ 2645.959151] [<ffffffff8125b4ab>] blk_register_queue+0x8b/0xf0
        [ 2645.959155] [<ffffffff8126043f>] add_disk+0x8f/0x160
        [ 2645.959161] [<ffffffffa01673c4>] aoeblk_gdalloc+0x164/0x1c0 [aoe]
      
      The request queue of an aoe device is not used but can be allocated in
      code that does not sleep.
      
      Bruno bisected this regression down to
      
        cd43e26f
      
        block: Expose stacked device queues in sysfs
      
      "This seems to generate /sys/block/$device/queue and its contents for
       everyone who is using queues, not just for those queues that have a
       non-NULL queue->request_fn."
      
      Addresses http://bugs.launchpad.net/bugs/410198
      Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13942
      
      Note that embedding a queue inside another object has always been
      an illegal construct, since the queues are reference counted and
      must persist until the last reference is dropped. So aoe was
      always buggy in this respect (Jens).
Signed-off-by: Ed Cashin <ecashin@coraid.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Bruno Premont <bonbons@linux-vserver.org>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
• i915: disable interrupts before tearing down GEM state · e6890f6f
      Linus Torvalds authored
      Reinette Chatre reports a frozen system (with blinking keyboard LEDs)
      when switching from graphics mode to the text console, or when
      suspending (which does the same thing). With netconsole, the oops
      turned out to be
      
      	BUG: unable to handle kernel NULL pointer dereference at 0000000000000084
      	IP: [<ffffffffa03ecaab>] i915_driver_irq_handler+0x26b/0xd20 [i915]
      
      and it's due to the i915_gem.c code doing drm_irq_uninstall() after
      having done i915_gem_idle(). And the i915_gem_idle() path will do
      
        i915_gem_idle() ->
          i915_gem_cleanup_ringbuffer() ->
            i915_gem_cleanup_hws() ->
              dev_priv->hw_status_page = NULL;
      
      but if an i915 interrupt comes in after this stage, it may want to
      access that hw_status_page, and gets the above NULL pointer dereference.
      
      And since the NULL pointer dereference happens from within an interrupt,
      and with the screen still in graphics mode, the common end result is
      simply a silently hung machine.
      
      Fix it by simply uninstalling the irq handler before idling rather than
      after. Fixes
      
http://bugzilla.kernel.org/show_bug.cgi?id=13819

Reported-and-tested-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 08 Sep, 2009 1 commit
  8. 07 Sep, 2009 5 commits