1. 14 Sep, 2009 14 commits
    • Frederic Weisbecker's avatar
      kill-the-bkl/reiserfs: lock only once on reiserfs_get_block() · 26931309
      Frederic Weisbecker authored
      reiserfs_get_block() is one of these sites where the write lock might
      be acquired recursively.
      
      It's a particular problem because this function is called very often.
      It's a hot spot which needs to reschedule() periodically while converting
      direct items to indirect ones because it can take some time.
      
      Then if we are applying the write lock release/reacquire pattern on
      schedule() here, it may not produce the desired effect since we may have
      locked in more than one depth.
      
      The solution is to use reiserfs_write_lock_once() which won't try
      to reacquire the lock recursively. Then the lock will be *really*
      released before schedule().
      
      Also, we only release the lock if TIF_NEED_RESCHED is set to not
      create wasteful numerous contentions.
      
      [ Impact: fix a too long holded lock case in reiserfs_get_block() ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      26931309
    • Frederic Weisbecker's avatar
      kill-the-bkl/reiserfs: conditionaly release the write lock on fs_changed() · d663af80
      Frederic Weisbecker authored
      The goal of fs_changed() is to check whether the tree changed during a
      schedule(). This is a BKL legacy.
      
      A recent patch added an explicit unconditional release/reacquire of the
      write lock around the cond_resched() called inside fs_changed.
      
      But it's wasteful to unconditionally do that, we are creating superfluous
      lock contention in !TIF_NEED_RESCHED case.
      
      This patch manage that by calling reiserfs_cond_resched() from fs_changed()
      which only releases the lock if we are going to reschedule.
      
      [ Impact: inject less lock contention and tree job retries ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      d663af80
    • Frederic Weisbecker's avatar
      kill-the-BKL/reiserfs: add reiserfs_cond_resched() · e43d3f21
      Frederic Weisbecker authored
      Usually, when we call cond_resched(), we want the write lock
      to be released and then reacquired once we return from scheduling.
      Not only does it follow the previous bkl based locking scheme, but
      it also let other waiters to get the lock.
      
      But if we aren't going to reschedule(), such as in !TIF_NEED_RESCHED
      case, it's useless to release the lock. Worse, if we release and reacquire
      the lock whereas it is not needed, we create useless contentions. Also
      if someone takes the lock while we are modifying or reading the tree,
      there are good chances we'll have to retry our operation, eg if the
      block we were seeeking has moved.
      
      So this patch introduces a helper which only unlock the write lock
      if we are going to schedule.
      
      [ Impact: prepare to inject less lock contention and less tree operation attempts ]
      Reported-by: default avatarAndi Kleen <andi@firstfloor.org>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      e43d3f21
    • Frederic Weisbecker's avatar
      kill-the-BKL/reiserfs: release the write lock on flush_commit_list() · 6e3647ac
      Frederic Weisbecker authored
      flush_commit_list() uses ll_rw_block() to commit the pending log blocks.
      ll_rw_block() might sleep, and the bkl was released at this point. Then
      we can also relax the write lock at this point.
      
      [ Impact: release the reiserfs write lock when it is not needed ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      6e3647ac
    • Frederic Weisbecker's avatar
      kill-the-BKL/reiserfs: release the write lock inside reiserfs_read_bitmap_block() · 4c5eface
      Frederic Weisbecker authored
      reiserfs_read_bitmap_block() uses sb_bread() to read the bitmap block. This
      helper might sleep.
      
      Then, when the bkl was used, it was released at this point. We can then
      relax the write lock too here.
      
      [ Impact: release the reiserfs write lock when it is not needed ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      4c5eface
    • Frederic Weisbecker's avatar
      kill-the-BKL/reiserfs: release the write lock inside get_neighbors() · 148d3504
      Frederic Weisbecker authored
      get_neighbors() is used to get the left and/or right blocks
      against a given one in order to balance a tree.
      
      sb_bread() is used to read the buffer of these neighors blocks and
      while it waits for this operation, it might sleep.
      
      The bkl was released at this point, and then we can also release
      the write lock before calling sb_bread().
      
      This is safe because if the filesystem is changed after this
      lock release, the function returns REPEAT_SEARCH (aka SCHEDULE_OCCURRED
      in the function header comments) in order to repeat the neighbhor
      research.
      
      [ Impact: release the reiserfs write lock when it is not needed ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      148d3504
    • Frederic Weisbecker's avatar
      kill-the-BKL/reiserfs: release write lock while rescheduling on prepare_for_delete_or_cut() · 5e69e3a4
      Frederic Weisbecker authored
      prepare_for_delete_or_cut() can process several types of items, including
      indirect items, ie: items which contain no file data but pointers to
      unformatted nodes scattering the datas of a file.
      
      In this case it has to zero out these pointers to block numbers of
      unformatted nodes and release the bitmap from these block numbers.
      
      It can take some time, so a rescheduling() is performed between each
      block processed. We can safely release the write lock while
      rescheduling(), like the bkl did, because the code checks just after
      if the item has moved after sleeping.
      
      [ Impact: release the reiserfs write lock when it is not needed ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      5e69e3a4
    • Frederic Weisbecker's avatar
      kill-the-BKL/reiserfs: release the write lock before rescheduling on do_journal_end() · e6950a4d
      Frederic Weisbecker authored
      When do_journal_end() copies data to the journal blocks buffers in memory,
      it reschedules if needed between each block copied and dirtyfied.
      
      We can also release the write lock at this rescheduling stage,
      like did the bkl implicitly.
      
      [ Impact: release the reiserfs write lock when it is not needed ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      e6950a4d
    • Frederic Weisbecker's avatar
      kill-the-BKL/reiserfs: release write lock on fs_changed() · f32049dc
      Frederic Weisbecker authored
      fs_changed() is a macro used by reiserfs to check whether its tree has been
      rebalanced. It has been designed to check parallel changes on the tree after
      calling a sleeping function, which released the Bkl.
      
      fs_changed() also calls cond_resched(), so that if rescheduling is needed,
      we are in the best place to do that, since we check if the tree has changed
      just after (because of the bkl release on schedule()).
      
      Even if we are not anymore using the Bkl, we still want to release the lock
      while we reschedule, so that other waiters for the lock can acquire it safely,
      because of the following __fs_changed() check.
      
      [ Impact: release the reiserfs write lock when it is not needed ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      f32049dc
    • Frederic Weisbecker's avatar
      kill-the-BKL/reiserfs: only acquire the write lock once in reiserfs_dirty_inode · dc8f6d89
      Frederic Weisbecker authored
      Impact: fix a deadlock
      
      reiserfs_dirty_inode() is the super_operations::dirty_inode() callback
      of reiserfs. It can be called from different contexts where the write
      lock can be already held.
      
      But this function also grab the write lock (possibly recursively).
      Subsequent release of the lock before sleep will actually not release
      the lock if the caller of mark_inode_dirty() (which in turn calls
      reiserfs_dirty_inode()) already owns the lock.
      
      A typical case:
      
      reiserfs_write_end() {
      	acquire_write_lock()
      	mark_inode_dirty() {
      		reiserfs_dirty_inode() {
      			reacquire_write_lock() {
      				journal_begin() {
      					do_journal_begin_r() {
      						/*
      						 * fail to release, still
      						 * one depth of lock
      						 */
      						release_write_lock()
      						reiserfs_wait_on_write_block() {
      							wait_event()
      
      The event is usually provided by something which needs the write lock but
      it hasn't been released.
      
      We use reiserfs_write_lock_once() here to ensure we only grab the
      write lock in one level.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@texware.it>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      LKML-Reference: <1239680065-25013-4-git-send-email-fweisbec@gmail.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      dc8f6d89
    • Frederic Weisbecker's avatar
      kill-the-BKL/reiserfs: lock only once in reiserfs_truncate_file · 22c963ad
      Frederic Weisbecker authored
      Impact: fix a deadlock
      
      reiserfs_truncate_file() can be called from multiple context where
      the write lock can be already hold or not.
      
      This function also acquire (possibly recursively) the write
      lock. Subsequent releases before sleeping will not actually release
      the lock because we may be in more than one lock depth degree.
      
      A typical case is:
      
      reiserfs_file_release {
      	acquire_the_lock()
      	reiserfs_truncate_file()
      		reacquire_the_lock()
      		journal_begin() {
      			do_journal_begin_r() {
      				reiserfs_wait_on_write_block() {
      					/*
      					 * Not released because still one
      					 * depth owned
      					 */
      					release_lock()
      					wait_for_event()
      
      At this stage the event never happen because the one which provides
      it needs the write lock.
      
      We use reiserfs_write_lock_once() here to ensure that we don't acquire the
      write lock recursively.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@texware.it>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      LKML-Reference: <1239680065-25013-3-git-send-email-fweisbec@gmail.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      22c963ad
    • Frederic Weisbecker's avatar
      kill-the-BKL/reiserfs: provide a tool to lock only once the write lock · daf88c89
      Frederic Weisbecker authored
      Sometimes we don't want to recursively hold the per superblock write
      lock because we want to be sure it is actually released when we come
      to sleep.
      
      This patch introduces the necessary tools for that.
      
      reiserfs_write_lock_once() does the same job than reiserfs_write_lock()
      except that it won't try to acquire recursively the lock if the current
      task already owns it. Also the lock_depth before the call of this function
      is returned.
      
      reiserfs_write_unlock_once() unlock only if reiserfs_write_lock_once()
      returned a depth equal to -1, ie: only if it actually locked.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@texware.it>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      LKML-Reference: <1239680065-25013-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      daf88c89
    • Frederic Weisbecker's avatar
      reiserfs, kill-the-BKL: fix unsafe j_flush_mutex lock · a412f9ef
      Frederic Weisbecker authored
      Impact: fix a deadlock
      
      The j_flush_mutex is acquired safely in journal.c:
      if we can't take it, we free the reiserfs per superblock lock
      and wait a bit.
      
      But we have a remaining place in kupdate_transactions() where
      j_flush_mutex is still acquired traditionnaly. Thus the following
      scenario (warned by lockdep) can happen:
      
      A						B
      
      mutex_lock(&write_lock)			mutex_lock(&write_lock)
      	mutex_lock(&j_flush_mutex)	mutex_lock(&j_flush_mutex) //block
      	mutex_unlock(&write_lock)
      	sleep...
      	mutex_lock(&write_lock) //deadlock
      
      Fix this by using reiserfs_mutex_lock_safe() in kupdate_transactions().
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@texware.it>
      Cc: Jeff Mahoney <jeffm@suse.com>
      LKML-Reference: <1239660635-12940-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      a412f9ef
    • Frederic Weisbecker's avatar
      reiserfs: kill-the-BKL · 8ebc4232
      Frederic Weisbecker authored
      This patch is an attempt to remove the Bkl based locking scheme from
      reiserfs and is intended.
      
      It is a bit inspired from an old attempt by Peter Zijlstra:
      
         http://lkml.indiana.edu/hypermail/linux/kernel/0704.2/2174.html
      
      The bkl is heavily used in this filesystem to prevent from
      concurrent write accesses on the filesystem.
      
      Reiserfs makes a deep use of the specific properties of the Bkl:
      
      - It can be acqquired recursively by a same task
      - It is released on the schedule() calls and reacquired when schedule() returns
      
      The two properties above are a roadmap for the reiserfs write locking so it's
      very hard to simply replace it with a common mutex.
      
      - We need a recursive-able locking unless we want to restructure several blocks
        of the code.
      - We need to identify the sites where the bkl was implictly relaxed
        (schedule, wait, sync, etc...) so that we can in turn release and
        reacquire our new lock explicitly.
        Such implicit releases of the lock are often required to let other
        resources producer/consumer do their job or we can suffer unexpected
        starvations or deadlocks.
      
      So the new lock that replaces the bkl here is a per superblock mutex with a
      specific property: it can be acquired recursively by a same task, like the
      bkl.
      
      For such purpose, we integrate a lock owner and a lock depth field on the
      superblock information structure.
      
      The first axis on this patch is to turn reiserfs_write_(un)lock() function
      into a wrapper to manage this mutex. Also some explicit calls to
      lock_kernel() have been converted to reiserfs_write_lock() helpers.
      
      The second axis is to find the important blocking sites (schedule...(),
      wait_on_buffer(), sync_dirty_buffer(), etc...) and then apply an explicit
      release of the write lock on these locations before blocking. Then we can
      safely wait for those who can give us resources or those who need some.
      Typically this is a fight between the current writer, the reiserfs workqueue
      (aka the async commiter) and the pdflush threads.
      
      The third axis is a consequence of the second. The write lock is usually
      on top of a lock dependency chain which can include the journal lock, the
      flush lock or the commit lock. So it's dangerous to release and trying to
      reacquire the write lock while we still hold other locks.
      
      This is fine with the bkl:
      
            T1                       T2
      
      lock_kernel()
          mutex_lock(A)
          unlock_kernel()
          // do something
                                  lock_kernel()
                                      mutex_lock(A) -> already locked by T1
                                      schedule() (and then unlock_kernel())
          lock_kernel()
          mutex_unlock(A)
          ....
      
      This is not fine with a mutex:
      
            T1                       T2
      
      mutex_lock(write)
          mutex_lock(A)
          mutex_unlock(write)
          // do something
                                 mutex_lock(write)
                                    mutex_lock(A) -> already locked by T1
                                    schedule()
      
          mutex_lock(write) -> already locked by T2
          deadlock
      
      The solution in this patch is to provide a helper which releases the write
      lock and sleep a bit if we can't lock a mutex that depend on it. It's another
      simulation of the bkl behaviour.
      
      The last axis is to locate the fs callbacks that are called with the bkl held,
      according to Documentation/filesystem/Locking.
      
      Those are:
      
      - reiserfs_remount
      - reiserfs_fill_super
      - reiserfs_put_super
      
      Reiserfs didn't need to explicitly lock because of the context of these callbacks.
      But now we must take care of that with the new locking.
      
      After this patch, reiserfs suffers from a slight performance regression (for now).
      On UP, a high volume write with dd reports an average of 27 MB/s instead
      of 30 MB/s without the patch applied.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: default avatarIngo Molnar <mingo@elte.hu>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Bron Gondwana <brong@fastmail.fm>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      LKML-Reference: <1239070789-13354-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      8ebc4232
  2. 09 Sep, 2009 3 commits
    • Linus Torvalds's avatar
      Linux 2.6.31 · 74fca6a4
      Linus Torvalds authored
      74fca6a4
    • Ed Cashin's avatar
      aoe: allocate unused request_queue for sysfs · 7135a71b
      Ed Cashin authored
      Andy Whitcroft reported an oops in aoe triggered by use of an
      incorrectly initialised request_queue object:
      
        [ 2645.959090] kobject '<NULL>' (ffff880059ca22c0): tried to add
      		an uninitialized object, something is seriously wrong.
        [ 2645.959104] Pid: 6, comm: events/0 Not tainted 2.6.31-5-generic #24-Ubuntu
        [ 2645.959107] Call Trace:
        [ 2645.959139] [<ffffffff8126ca2f>] kobject_add+0x5f/0x70
        [ 2645.959151] [<ffffffff8125b4ab>] blk_register_queue+0x8b/0xf0
        [ 2645.959155] [<ffffffff8126043f>] add_disk+0x8f/0x160
        [ 2645.959161] [<ffffffffa01673c4>] aoeblk_gdalloc+0x164/0x1c0 [aoe]
      
      The request queue of an aoe device is not used but can be allocated in
      code that does not sleep.
      
      Bruno bisected this regression down to
      
        cd43e26f
      
        block: Expose stacked device queues in sysfs
      
      "This seems to generate /sys/block/$device/queue and its contents for
       everyone who is using queues, not just for those queues that have a
       non-NULL queue->request_fn."
      
      Addresses http://bugs.launchpad.net/bugs/410198
      Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13942
      
      Note that embedding a queue inside another object has always been
      an illegal construct, since the queues are reference counted and
      must persist until the last reference is dropped. So aoe was
      always buggy in this respect (Jens).
      Signed-off-by: default avatarEd Cashin <ecashin@coraid.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Bruno Premont <bonbons@linux-vserver.org>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      7135a71b
    • Linus Torvalds's avatar
      i915: disable interrupts before tearing down GEM state · e6890f6f
      Linus Torvalds authored
      Reinette Chatre reports a frozen system (with blinking keyboard LEDs)
      when switching from graphics mode to the text console, or when
      suspending (which does the same thing). With netconsole, the oops
      turned out to be
      
      	BUG: unable to handle kernel NULL pointer dereference at 0000000000000084
      	IP: [<ffffffffa03ecaab>] i915_driver_irq_handler+0x26b/0xd20 [i915]
      
      and it's due to the i915_gem.c code doing drm_irq_uninstall() after
      having done i915_gem_idle(). And the i915_gem_idle() path will do
      
        i915_gem_idle() ->
          i915_gem_cleanup_ringbuffer() ->
            i915_gem_cleanup_hws() ->
              dev_priv->hw_status_page = NULL;
      
      but if an i915 interrupt comes in after this stage, it may want to
      access that hw_status_page, and gets the above NULL pointer dereference.
      
      And since the NULL pointer dereference happens from within an interrupt,
      and with the screen still in graphics mode, the common end result is
      simply a silently hung machine.
      
      Fix it by simply uninstalling the irq handler before idling rather than
      after. Fixes
      
          http://bugzilla.kernel.org/show_bug.cgi?id=13819Reported-and-tested-by: default avatarReinette Chatre <reinette.chatre@intel.com>
      Acked-by: default avatarJesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e6890f6f
  3. 08 Sep, 2009 1 commit
  4. 07 Sep, 2009 7 commits
  5. 06 Sep, 2009 1 commit
    • David S. Miller's avatar
      gianfar: Fix build. · d9d8e041
      David S. Miller authored
      Reported by Michael Guntsche <mike@it-loops.com>
      
      --------------------
      Commit
      38bddf04 gianfar: gfar_remove needs to call unregister_netdev()
      
      breaks the build of the gianfar driver because "dev" is undefined in
      this function. To quickly test rc9 I changed this to priv->ndev but I do
      not know if this is the correct one.
      --------------------
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d9d8e041
  6. 05 Sep, 2009 14 commits