1. 11 Aug, 2008 15 commits
    • Merge branch 'core/locking' into core/urgent · 23a0ee90
      Ingo Molnar authored
    • lockdep: fix debug_lock_alloc · 0f2bc27b
      Peter Zijlstra authored
      When we enable DEBUG_LOCK_ALLOC but do not enable PROVE_LOCKING and/or
      LOCK_STAT, lock_acquire() and lock_release() turn into nops, even though
      we should be doing hlock checking (check=1).
      
      This causes a false warning and a lockdep self-disable.
      
      Rectify this.
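      A minimal sketch of the kind of header-level guard this implies (illustrative
      only, not the literal diff; the exact conditionals and prototypes are assumptions):

       /*
        * Illustrative sketch: keep the real lock_acquire()/lock_release()
        * prototypes visible when DEBUG_LOCK_ALLOC is set, instead of turning
        * them into nops whenever PROVE_LOCKING and LOCK_STAT are both off.
        */
       #if defined(CONFIG_PROVE_LOCKING) || defined(CONFIG_LOCK_STAT) || \
           defined(CONFIG_DEBUG_LOCK_ALLOC)
       extern void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
                                int trylock, int read, int check,
                                struct lockdep_map *nest_lock, unsigned long ip);
       extern void lock_release(struct lockdep_map *lock, int nested,
                                unsigned long ip);
       #else
       # define lock_acquire(l, s, t, r, c, n, i)      do { } while (0)
       # define lock_release(l, n, i)                  do { } while (0)
       #endif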
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • lockdep: increase MAX_LOCKDEP_KEYS · e5f363e3
      Ingo Molnar authored
      Certain configs produce:

       [   70.076229] BUG: MAX_LOCKDEP_KEYS too low!
       [   70.080230] turning off the locking correctness validator.

      Tune them up.
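      For context, a hedged sketch of what such a bump looks like in
      include/linux/lockdep.h; the concrete values below are assumptions for
      illustration, not the commit's actual numbers:

       /* Illustrative values only -- the real commit's numbers may differ. */
       #define MAX_LOCKDEP_KEYS_BITS   13
       #define MAX_LOCKDEP_KEYS        ((1UL << MAX_LOCKDEP_KEYS_BITS) - 1)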
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask() · cc7a486c
      Nick Piggin authored
      * Venki Pallipadi <venkatesh.pallipadi@intel.com> wrote:
      
      > Found an OOPS on a big SMP box during an overnight reboot test with
      > upstream git.
      >
      > Suresh and I looked at the oops and looks like the root cause is in
      > generic_smp_call_function_interrupt() and smp_call_function_mask() with
      > wait parameter.
      >
      > The actual oops looked like
      >
      > [   11.277260] BUG: unable to handle kernel paging request at ffff8802ffffffff
      > [   11.277815] IP: [<ffff8802ffffffff>] 0xffff8802ffffffff
      > [   11.278155] PGD 202063 PUD 0
      > [   11.278576] Oops: 0010 [1] SMP
      > [   11.279006] CPU 5
      > [   11.279336] Modules linked in:
      > [   11.279752] Pid: 0, comm: swapper Not tainted 2.6.27-rc2-00020-g685d87f7 #290
      > [   11.280039] RIP: 0010:[<ffff8802ffffffff>]  [<ffff8802ffffffff>] 0xffff8802ffffffff
      > [   11.280692] RSP: 0018:ffff88027f1f7f70  EFLAGS: 00010086
      > [   11.280976] RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 0000000000000000
      > [   11.281264] RDX: 0000000000004f4e RSI: 0000000000000001 RDI: 0000000000000000
      > [   11.281624] RBP: ffff88027f1f7f98 R08: 0000000000000001 R09: ffffffff802509af
      > [   11.281925] R10: ffff8800280c2780 R11: 0000000000000000 R12: ffff88027f097d48
      > [   11.282214] R13: ffff88027f097d70 R14: 0000000000000005 R15: ffff88027e571000
      > [   11.282502] FS:  0000000000000000(0000) GS:ffff88027f1c3340(0000) knlGS:0000000000000000
      > [   11.283096] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      > [   11.283382] CR2: ffff8802ffffffff CR3: 0000000000201000 CR4: 00000000000006e0
      > [   11.283760] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      > [   11.284048] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      > [   11.284337] Process swapper (pid: 0, threadinfo ffff88027f1f2000, task ffff88027f1f0640)
      > [   11.284936] Stack:  ffffffff80250963 0000000000000212 0000000000ee8c78 0000000000ee8a66
      > [   11.285802]  ffff88027e571550 ffff88027f1f7fa8 ffffffff8021adb5 ffff88027f1f3e40
      > [   11.286599]  ffffffff8020bdd6 ffff88027f1f3e40 <EOI>  ffff88027f1f3ef8 0000000000000000
      > [   11.287120] Call Trace:
      > [   11.287768]  <IRQ>  [<ffffffff80250963>] ? generic_smp_call_function_interrupt+0x61/0x12c
      > [   11.288354]  [<ffffffff8021adb5>] smp_call_function_interrupt+0x17/0x27
      > [   11.288744]  [<ffffffff8020bdd6>] call_function_interrupt+0x66/0x70
      > [   11.289030]  <EOI>  [<ffffffff8024ab3b>] ? clockevents_notify+0x19/0x73
      > [   11.289380]  [<ffffffff803b9b75>] ? acpi_idle_enter_simple+0x18b/0x1fa
      > [   11.289760]  [<ffffffff803b9b6b>] ? acpi_idle_enter_simple+0x181/0x1fa
      > [   11.290051]  [<ffffffff8053aeca>] ? cpuidle_idle_call+0x70/0xa2
      > [   11.290338]  [<ffffffff80209f61>] ? cpu_idle+0x5f/0x7d
      > [   11.290723]  [<ffffffff8060224a>] ? start_secondary+0x14d/0x152
      > [   11.291010]
      > [   11.291287]
      > [   11.291654] Code:  Bad RIP value.
      > [   11.292041] RIP  [<ffff8802ffffffff>] 0xffff8802ffffffff
      > [   11.292380]  RSP <ffff88027f1f7f70>
      > [   11.292741] CR2: ffff8802ffffffff
      > [   11.310951] ---[ end trace 137c54d525305f1c ]---
      >
      > The problem is with the following sequence of events:
      >
      > - CPU A calls smp_call_function_mask() for CPU B with wait parameter
      > - CPU A sets up the call_function_data on the stack and does an rcu add to
      >   call_function_queue
      > - CPU A waits until the WAIT flag is cleared
      > - CPU B gets the call function interrupt and starts going through the
      >   call_function_queue
      > - CPU C also gets some other call function interrupt and starts going through
      >   the call_function_queue
      > - CPU C, which is also going through the call_function_queue, starts referencing
      >   CPU A's stack, as that element is still in call_function_queue
      > - CPU B finishes the function call that CPU A set up and as there are no other
      >   references to it, rcu deletes the call_function_data (which was from CPU A
      >   stack)
      > - CPU B sees the wait flag and just clears the flag (no call_rcu to free)
      > - CPU A which was waiting on the flag continues executing and the stack
      >   contents change
      >
      > - CPU C, still in its rcu_read section and accessing CPU A's stack, sees
      >   inconsistent call_function_data and can try to execute
      >   the function with some random pointer, causing stack corruption for A
      >   (by clearing the bits in the mask field) and the oops.
      
      Nice debugging work.
      
      I'd suggest something like the attached (boot tested) patch as the simple
      fix for now.
      
      I expect the benefits from the less synchronized, multiple-in-flight-data
      global queue will still outweigh the costs of dynamic allocations. But
      if worst comes to worst then we just go back to a globally synchronous
      one-at-a-time implementation, but that would be pretty sad!
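      A simplified sketch of the approach described above (not the verbatim patch;
      the csd.flags field and the CSD_FLAG_* names are assumptions): allocate the
      call_function_data so it can outlive the caller and be freed via RCU, and only
      fall back to the stack, with a forced wait, when the allocation fails:

       /* Inside smp_call_function_mask(), simplified: */
       struct call_function_data d;
       struct call_function_data *data;

       data = kmalloc(sizeof(*data), GFP_ATOMIC);
       if (data) {
               data->csd.flags = CSD_FLAG_ALLOC;  /* freed via RCU when done */
       } else {
               data = &d;                         /* slow path: on our stack */
               data->csd.flags = CSD_FLAG_WAIT;
               wait = 1;   /* must not return while others may reference it */
       }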
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • lockdep: fix overflow in the hlock shrinkage code · b42e737e
      Peter Zijlstra authored
      There is an overflow-by-one case in the new shrunken hlock code.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • lockdep: rename map_[acquire|release]() => lock_map_[acquire|release]() · 3295f0ef
      Ingo Molnar authored
      The names were too generic and collided with existing identifiers:
      
       drivers/uio/uio.c:87: error: expected identifier or '(' before 'do'
       drivers/uio/uio.c:87: error: expected identifier or '(' before 'while'
       drivers/uio/uio.c:113: error: 'map_release' undeclared here (not in a function)
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • lockdep: handle chains involving classes defined in modules · 8bfe0298
      Rabin Vincent authored
      Lock chains can end up referencing classes defined in modules that have since
      been unloaded. Solve this by marking such classes as unused and not printing
      information about them.
      Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
      Signed-off-by: Rabin Vincent <rabin@rab.in>
      Acked-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • mm: fix mm_take_all_locks() locking order · 7cd5a02f
      Peter Zijlstra authored
      Lockdep spotted:
      
      =======================================================
      [ INFO: possible circular locking dependency detected ]
      2.6.27-rc1 #270
      -------------------------------------------------------
      qemu-kvm/2033 is trying to acquire lock:
       (&inode->i_data.i_mmap_lock){----}, at: [<ffffffff802996cc>] mm_take_all_locks+0xc2/0xea
      
      but task is already holding lock:
       (&anon_vma->lock){----}, at: [<ffffffff8029967a>] mm_take_all_locks+0x70/0xea
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (&anon_vma->lock){----}:
             [<ffffffff8025cd37>] __lock_acquire+0x11be/0x14d2
             [<ffffffff8025d0a9>] lock_acquire+0x5e/0x7a
             [<ffffffff804c655b>] _spin_lock+0x3b/0x47
             [<ffffffff8029a2ef>] vma_adjust+0x200/0x444
             [<ffffffff8029a662>] split_vma+0x12f/0x146
             [<ffffffff8029bc60>] mprotect_fixup+0x13c/0x536
             [<ffffffff8029c203>] sys_mprotect+0x1a9/0x21e
             [<ffffffff8020c0db>] system_call_fastpath+0x16/0x1b
             [<ffffffffffffffff>] 0xffffffffffffffff
      
      -> #0 (&inode->i_data.i_mmap_lock){----}:
             [<ffffffff8025ca54>] __lock_acquire+0xedb/0x14d2
             [<ffffffff8025d397>] lock_release_non_nested+0x1c2/0x219
             [<ffffffff8025d515>] lock_release+0x127/0x14a
             [<ffffffff804c6403>] _spin_unlock+0x1e/0x50
             [<ffffffff802995d9>] mm_drop_all_locks+0x7f/0xb0
             [<ffffffff802a965d>] do_mmu_notifier_register+0xe2/0x112
             [<ffffffff802a96a8>] mmu_notifier_register+0xe/0x10
             [<ffffffffa0043b6b>] kvm_dev_ioctl+0x11e/0x287 [kvm]
             [<ffffffff802bd0ca>] vfs_ioctl+0x2a/0x78
             [<ffffffff802bd36f>] do_vfs_ioctl+0x257/0x274
             [<ffffffff802bd3e1>] sys_ioctl+0x55/0x78
             [<ffffffff8020c0db>] system_call_fastpath+0x16/0x1b
             [<ffffffffffffffff>] 0xffffffffffffffff
      
      other info that might help us debug this:
      
      5 locks held by qemu-kvm/2033:
       #0:  (&mm->mmap_sem){----}, at: [<ffffffff802a95d0>] do_mmu_notifier_register+0x55/0x112
       #1:  (mm_all_locks_mutex){--..}, at: [<ffffffff8029963e>] mm_take_all_locks+0x34/0xea
       #2:  (&anon_vma->lock){----}, at: [<ffffffff8029967a>] mm_take_all_locks+0x70/0xea
       #3:  (&anon_vma->lock){----}, at: [<ffffffff8029967a>] mm_take_all_locks+0x70/0xea
       #4:  (&anon_vma->lock){----}, at: [<ffffffff8029967a>] mm_take_all_locks+0x70/0xea
      
      stack backtrace:
      Pid: 2033, comm: qemu-kvm Not tainted 2.6.27-rc1 #270
      
      Call Trace:
       [<ffffffff8025b7c7>] print_circular_bug_tail+0xb8/0xc3
       [<ffffffff8025ca54>] __lock_acquire+0xedb/0x14d2
       [<ffffffff80259bb1>] ? add_lock_to_list+0x7e/0xad
       [<ffffffff8029967a>] ? mm_take_all_locks+0x70/0xea
       [<ffffffff8029967a>] ? mm_take_all_locks+0x70/0xea
       [<ffffffff8025d397>] lock_release_non_nested+0x1c2/0x219
       [<ffffffff802996cc>] ? mm_take_all_locks+0xc2/0xea
       [<ffffffff802996cc>] ? mm_take_all_locks+0xc2/0xea
       [<ffffffff8025b202>] ? trace_hardirqs_on_caller+0x4d/0x115
       [<ffffffff802995d9>] ? mm_drop_all_locks+0x7f/0xb0
       [<ffffffff8025d515>] lock_release+0x127/0x14a
       [<ffffffff804c6403>] _spin_unlock+0x1e/0x50
       [<ffffffff802995d9>] mm_drop_all_locks+0x7f/0xb0
       [<ffffffff802a965d>] do_mmu_notifier_register+0xe2/0x112
       [<ffffffff802a96a8>] mmu_notifier_register+0xe/0x10
       [<ffffffffa0043b6b>] kvm_dev_ioctl+0x11e/0x287 [kvm]
       [<ffffffff8033f9f2>] ? file_has_perm+0x83/0x8e
       [<ffffffff802bd0ca>] vfs_ioctl+0x2a/0x78
       [<ffffffff802bd36f>] do_vfs_ioctl+0x257/0x274
       [<ffffffff802bd3e1>] sys_ioctl+0x55/0x78
       [<ffffffff8020c0db>] system_call_fastpath+0x16/0x1b
      
      The locking hierarchy documented in mm/rmap.c confirms this dependency as valid.

      Fix this by first taking all the mapping->i_mmap_lock instances and then
      taking all the anon_vma->lock instances.
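      A rough sketch of that ordering (the vm_lock_mapping()/vm_lock_anon_vma()
      helper names are assumptions about mm/mmap.c): two passes over the vma list,
      file mappings first, anon_vmas second:

       /* In mm_take_all_locks(), simplified: i_mmap_locks before anon_vma locks. */
       for (vma = mm->mmap; vma; vma = vma->vm_next) {
               if (vma->vm_file && vma->vm_file->f_mapping)
                       vm_lock_mapping(mm, vma->vm_file->f_mapping);
       }
       for (vma = mm->mmap; vma; vma = vma->vm_next) {
               if (vma->anon_vma)
                       vm_lock_anon_vma(mm, vma->anon_vma);
       }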
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • lockdep: annotate mm_take_all_locks() · 454ed842
      Peter Zijlstra authored
      The nesting is correct because mmap_sem is held throughout; use the new
      nest_lock annotation to express this.
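      A minimal sketch of what that annotation can look like (the helper shape is
      assumed): with mmap_sem held for the whole operation, it acts as the nest lock
      for every same-class per-vma lock:

       static void vm_lock_anon_vma(struct mm_struct *mm, struct anon_vma *anon_vma)
       {
               /* every anon_vma->lock taken here nests under mm->mmap_sem */
               spin_lock_nest_lock(&anon_vma->lock, &mm->mmap_sem);
       }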
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • lockdep: spin_lock_nest_lock() · b7d39aff
      Peter Zijlstra authored
      Expose the new lock protection lock.
      
      This can be used to annotate places where we take multiple locks of the
      same class and avoid deadlocks by always taking another (top-level) lock
      first.
      
      NOTE: we're still bound to the MAX_LOCK_DEPTH (48) limit.
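      A usage sketch under those rules (the pool/node names are made up for
      illustration): take the top-level protection lock first, then annotate each
      same-class lock as nested under it:

       spin_lock(&pool->all_nodes_lock);               /* the protection lock */
       list_for_each_entry(node, &pool->nodes, list)
               spin_lock_nest_lock(&node->lock, &pool->all_nodes_lock);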
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • lockdep: lock protection locks · 7531e2f3
      Peter Zijlstra authored
      On Fri, 2008-08-01 at 16:26 -0700, Linus Torvalds wrote:
      
      > On Fri, 1 Aug 2008, David Miller wrote:
      > >
      > > Taking more than a few locks of the same class at once is bad
      > > news and it's better to find an alternative method.
      >
      > It's not always wrong.
      >
      > If you can guarantee that anybody that takes more than one lock of a
      > particular class will always take a single top-level lock _first_, then
      > that's all good. You can obviously screw up and take the same lock _twice_
      > (which will deadlock), but at least you cannot get into ABBA situations.
      >
      > So maybe the right thing to do is to just teach lockdep about "lock
      > protection locks". That would have solved the multi-queue issues for
      > networking too - all the actual network drivers would still have taken
      > just their single queue lock, but the one case that needs to take all of
      > them would have taken a separate top-level lock first.
      >
      > Never mind that the multi-queue locks were always taken in the same order:
      > it's never wrong to just have some top-level serialization, and anybody
      > who needs to take <n> locks might as well do <n+1>, because they sure as
      > hell aren't going to be on _any_ fastpaths.
      >
      > So the simplest solution really sounds like just teaching lockdep about
      > that one special case. It's not "nesting" exactly, although it's obviously
      > related to it.
      
      Do as Linus suggested. The lock protection lock is called nest_lock.

      Note that we still have the MAX_LOCK_DEPTH (48) limit to consider, so anything
      that spills over it is still up shit creek.
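      At the lockdep level this shows up as an extra nest_lock argument on
      lock_acquire(); a hedged sketch of the resulting prototype (the exact
      parameter list is an assumption):

       extern void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
                                int trylock, int read, int check,
                                struct lockdep_map *nest_lock, unsigned long ip);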
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • lockdep: map_acquire · 4f3e7524
      Peter Zijlstra authored
      Most of the free-standing lock_acquire() usages look remarkably similar;
      sweep them into a new helper.
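      A hedged sketch of what the helper pair can look like (the exact
      lock_acquire()/lock_release() arguments are assumptions):

       #define map_acquire(l)  lock_acquire(l, 0, 0, 0, 2, NULL, _THIS_IP_)
       #define map_release(l)  lock_release(l, 1, _THIS_IP_)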
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • lockdep: shrink held_lock structure · f82b217e
      Dave Jones authored
      Before:

      struct held_lock {
              u64                        prev_chain_key;       /*     0     8 */
              struct lock_class *        class;                /*     8     8 */
              long unsigned int          acquire_ip;           /*    16     8 */
              struct lockdep_map *       instance;             /*    24     8 */
              int                        irq_context;          /*    32     4 */
              int                        trylock;              /*    36     4 */
              int                        read;                 /*    40     4 */
              int                        check;                /*    44     4 */
              int                        hardirqs_off;         /*    48     4 */
      
              /* size: 56, cachelines: 1 */
              /* padding: 4 */
              /* last cacheline: 56 bytes */
      };
      
      After:

      struct held_lock {
              u64                        prev_chain_key;       /*     0     8 */
              long unsigned int          acquire_ip;           /*     8     8 */
              struct lockdep_map *       instance;             /*    16     8 */
              unsigned int               class_idx:11;         /*    24:21  4 */
              unsigned int               irq_context:2;        /*    24:19  4 */
              unsigned int               trylock:1;            /*    24:18  4 */
              unsigned int               read:2;               /*    24:16  4 */
              unsigned int               check:2;              /*    24:14  4 */
              unsigned int               hardirqs_off:1;       /*    24:13  4 */
      
              /* size: 32, cachelines: 1 */
              /* padding: 4 */
              /* bit_padding: 13 bits */
              /* last cacheline: 32 bytes */
      };
      
      [mingo@elte.hu: shrunk hlock->class too]
      [peterz@infradead.org: fixup bit sizes]
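      One consequence of dropping the class pointer is that lookups go through an
      index into lockdep's static lock_classes[] array; a hedged sketch of such an
      accessor (the exact shape is an assumption):

       static inline struct lock_class *hlock_class(struct held_lock *hlock)
       {
               /* class_idx 0 is reserved, hence the -1 */
               return lock_classes + hlock->class_idx - 1;
       }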
      Signed-off-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    • lockdep: re-annotate scheduler runqueues · 1b12bbc7
      Peter Zijlstra authored
      Instead of using a per-rq lock class, use the regular nesting operations.
      
      However, take extra care with double_lock_balance() as it can release the
      already held rq->lock (and therefore change its nesting class).
      
      So what can happen is:
      
       spin_lock(rq->lock);	// this rq subclass 0
      
       double_lock_balance(rq, other_rq);
         // release rq
         // acquire other_rq->lock subclass 0
         // acquire rq->lock subclass 1
      
       spin_unlock(other_rq->lock);
      
      leaving you with rq->lock in subclass 1
      
      So a subsequent double_lock_balance() call can try to nest a subclass 1
      lock while already holding a subclass 1 lock.
      
      Fix this by introducing double_unlock_balance() which releases the other
      rq's lock, but also re-sets the subclass for this rq's lock to 0.
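      A sketch of the new helper under that description (treat the exact form as an
      assumption):

       static void double_unlock_balance(struct rq *this_rq, struct rq *busiest)
               __releases(busiest->lock)
       {
               spin_unlock(&busiest->lock);
               /* put this rq's lock back into subclass 0 for the next caller */
               lock_set_subclass(&this_rq->lock.dep_map, 0, _RET_IP_);
       }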
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • lockdep: lock_set_subclass - reset a held lock's subclass · 64aa348e
      Peter Zijlstra authored
      This can be used to reset a held lock's subclass, for arbitrary-depth
      iterated data structures such as trees or lists which have per-node
      locks.
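      A usage sketch for such hand-over-hand traversal (the node/list names are
      illustrative): each newly acquired per-node lock is taken as subclass 1 and
      reset to subclass 0 once its predecessor is released, so the held-lock
      nesting stays bounded for lockdep:

       spin_lock(&node->lock);                                 /* subclass 0 */
       next = node->next;
       spin_lock_nested(&next->lock, SINGLE_DEPTH_NESTING);    /* subclass 1 */
       spin_unlock(&node->lock);
       lock_set_subclass(&next->lock.dep_map, 0, _THIS_IP_);   /* back to 0 */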
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 08 Aug, 2008 25 commits