1. 12 Nov, 2009 1 commit
    • percpu: restructure pcpu_extend_area_map() to fix bugs and improve readability · 833af842
      Tejun Heo authored
      pcpu_extend_area_map() had the following two bugs.
      
      * It should return 1 if pcpu_lock was dropped and reacquired but it
        returned 0.  This could lead to oops if free_percpu() races with
        area map extension.
      
      * pcpu_mem_free() was called under pcpu_lock.  pcpu_mem_free() might
        end up calling vfree() which isn't IRQ safe.  This could lead to
        deadlock through lock order inversion via IRQ.
      
      In addition, Linus pointed out that the temporary lock dropping and
      subtle three-way return value of pcpu_extend_area_map() were very ugly
      and suggested splitting the function into two: pcpu_need_to_extend()
      and pcpu_extend_area_map().
      
      This patch restructures pcpu_extend_area_map() as suggested and fixes
      the two bugs.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Ingo Molnar <mingo@elte.hu>
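The split described in this commit can be sketched in userspace C. This is a minimal illustration, not the kernel code: `struct chunk`, the pthread mutex standing in for pcpu_lock, the default length of 16, and the helper `ensure_map_room()` are all assumptions made for the example. The point it tries to show is that the "need to extend?" check never drops the lock, while allocation and freeing happen entirely outside it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a percpu chunk; not the kernel's struct. */
struct chunk {
    int *map;       /* area map */
    int map_used;   /* entries in use */
    int map_alloc;  /* entries allocated */
};

/* Plays the role of pcpu_lock in this sketch. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Call with the lock held. Returns the map length required, or 0 if the
 * current map is large enough. Never drops the lock. */
static int need_to_extend(const struct chunk *c)
{
    int new_alloc = 16;  /* assumed default map length */

    if (c->map_alloc >= c->map_used + 2)
        return 0;
    while (new_alloc < c->map_used + 2)
        new_alloc *= 2;
    return new_alloc;
}

/* Call WITHOUT the lock held. The lock is taken only to swap the maps;
 * allocation and freeing happen outside it. Returns 0 or -1. */
static int extend_area_map(struct chunk *c, int new_alloc)
{
    int *old = NULL;
    int *new_map = calloc((size_t)new_alloc, sizeof(*new_map));

    if (!new_map)
        return -1;

    pthread_mutex_lock(&lock);
    if (new_alloc > c->map_alloc) {  /* nobody extended it meanwhile */
        if (c->map_alloc)
            memcpy(new_map, c->map, (size_t)c->map_alloc * sizeof(*new_map));
        old = c->map;
        c->map = new_map;
        c->map_alloc = new_alloc;
        new_map = NULL;
    }
    pthread_mutex_unlock(&lock);

    free(old);      /* old map is released outside the lock */
    free(new_map);  /* non-NULL only if another thread won the race */
    return 0;
}

/* Caller pattern: re-check after every point where the lock was dropped.
 * A real caller would keep the lock and continue; this one just unlocks. */
static int ensure_map_room(struct chunk *c)
{
    int new_alloc;

    pthread_mutex_lock(&lock);
    while ((new_alloc = need_to_extend(c)) != 0) {
        pthread_mutex_unlock(&lock);
        if (extend_area_map(c, new_alloc) < 0)
            return -1;
        pthread_mutex_lock(&lock);
    }
    pthread_mutex_unlock(&lock);
    return 0;
}
```

Because the caller loops on the check, there is no return value that has to encode "the lock was dropped", which is the subtlety the commit message calls out.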
  2. 28 Oct, 2009 2 commits
    • sched: move rq_weight data array out of .percpu · 4a6cc4bd
      Jiri Kosina authored
      Commit 34d76c41 introduced the percpu array update_shares_data, whose
      size is proportional to NR_CPUS. Unfortunately this blows up ia64 for
      large NR_CPUS configurations, as ia64 allows only 64k for the .percpu
      section.
      
      Fix this by allocating the array dynamically and keeping only a pointer
      to it in percpu.
      
      The per-cpu handling doesn't impose a significant performance penalty
      on the potentially contended path in tg_shares_up().
      
      Before:
      
      ...
      ffffffff8104337c:       65 48 8b 14 25 20 cd    mov    %gs:0xcd20,%rdx
      ffffffff81043383:       00 00
      ffffffff81043385:       48 c7 c0 00 e1 00 00    mov    $0xe100,%rax
      ffffffff8104338c:       48 c7 45 a0 00 00 00    movq   $0x0,-0x60(%rbp)
      ffffffff81043393:       00
      ffffffff81043394:       48 c7 45 a8 00 00 00    movq   $0x0,-0x58(%rbp)
      ffffffff8104339b:       00
      ffffffff8104339c:       48 01 d0                add    %rdx,%rax
      ffffffff8104339f:       49 8d 94 24 08 01 00    lea    0x108(%r12),%rdx
      ffffffff810433a6:       00
      ffffffff810433a7:       b9 ff ff ff ff          mov    $0xffffffff,%ecx
      ffffffff810433ac:       48 89 45 b0             mov    %rax,-0x50(%rbp)
      ffffffff810433b0:       bb 00 04 00 00          mov    $0x400,%ebx
      ffffffff810433b5:       48 89 55 c0             mov    %rdx,-0x40(%rbp)
      ...
      
      After:
      
      ...
      ffffffff8104337c:       65 8b 04 25 28 cd 00    mov    %gs:0xcd28,%eax
      ffffffff81043383:       00
      ffffffff81043384:       48 98                   cltq
      ffffffff81043386:       49 8d bc 24 08 01 00    lea    0x108(%r12),%rdi
      ffffffff8104338d:       00
      ffffffff8104338e:       48 8b 15 d3 7f 76 00    mov    0x767fd3(%rip),%rdx        # ffffffff817ab368 <update_shares_data>
      ffffffff81043395:       48 8b 34 c5 00 ee 6d    mov    -0x7e921200(,%rax,8),%rsi
      ffffffff8104339c:       81
      ffffffff8104339d:       48 c7 45 a0 00 00 00    movq   $0x0,-0x60(%rbp)
      ffffffff810433a4:       00
      ffffffff810433a5:       b9 ff ff ff ff          mov    $0xffffffff,%ecx
      ffffffff810433aa:       48 89 7d c0             mov    %rdi,-0x40(%rbp)
      ffffffff810433ae:       48 c7 45 a8 00 00 00    movq   $0x0,-0x58(%rbp)
      ffffffff810433b5:       00
      ffffffff810433b6:       bb 00 04 00 00          mov    $0x400,%ebx
      ffffffff810433bb:       48 01 f2                add    %rsi,%rdx
      ffffffff810433be:       48 89 55 b0             mov    %rdx,-0x50(%rbp)
      ...
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: allow pcpu_alloc() to be called with IRQs off · 403a91b1
      Jiri Kosina authored
      pcpu_alloc() and pcpu_extend_area_map() perform a series of
      spin_lock_irq()/spin_unlock_irq() calls, which makes them unsafe
      to call from contexts which have IRQs off.
      
      This patch converts the code to save and restore flags instead,
      allowing pcpu_alloc() (or __alloc_percpu(), respectively) to be called
      from the early kernel startup stage, where IRQs are off.
      
      This is needed for proper initialization of per-cpu rq_weight data from
      sched_init().
      
      tj: added comment explaining why irqsave/restore is used in alloc path.
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Tejun Heo <tj@kernel.org>
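The difference between the two locking styles can be illustrated with a toy userspace model. Everything here is hypothetical: `irqs_enabled` stands in for the CPU interrupt flag, and the `toy_*` helpers only mimic the IRQ side effects of spin_lock_irq()/spin_unlock_irq() and spin_lock_irqsave()/spin_unlock_irqrestore(); no actual locking is performed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the CPU interrupt flag (assumption for illustration). */
static bool irqs_enabled = true;

/* Mimics the IRQ effect of spin_lock_irq()/spin_unlock_irq(). */
static void toy_lock_irq(void)   { irqs_enabled = false; }
static void toy_unlock_irq(void) { irqs_enabled = true; } /* unconditional */

/* Mimics the IRQ effect of spin_lock_irqsave()/spin_unlock_irqrestore(). */
static void toy_lock_irqsave(bool *flags)
{
    *flags = irqs_enabled;  /* remember the caller's state */
    irqs_enabled = false;
}

static void toy_unlock_irqrestore(const bool *flags)
{
    irqs_enabled = *flags;  /* put back exactly what the caller had */
}

/* An allocation path using the plain _irq variants. Returns the IRQ
 * state the caller is left with. */
static bool alloc_with_lock_irq(void)
{
    toy_lock_irq();
    /* ... allocator work under the lock ... */
    toy_unlock_irq();
    return irqs_enabled;
}

/* The same path using save/restore. */
static bool alloc_with_irqsave(void)
{
    bool flags;

    toy_lock_irqsave(&flags);
    /* ... allocator work under the lock ... */
    toy_unlock_irqrestore(&flags);
    return irqs_enabled;
}
```

In this model, a caller that enters with IRQs off gets them switched on behind its back by the `_irq` variant, while the save/restore variant leaves the caller's state untouched. That is the property an early-boot caller such as sched_init() relies on.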
  3. 12 Oct, 2009 1 commit
  4. 29 Sep, 2009 6 commits
    • percpu: make allocation failures more verbose · f2badb0c
      Tejun Heo authored
      Warn and dump stack when percpu allocation fails.  The percpu allocator
      is still young, and unchecked use of a NULL percpu pointer can result in
      random memory corruption when combined with the pointer shifting in the
      access macros.  Allocation failures should be rare, and the warning
      message is disabled after it has been printed a certain number of times.
      Signed-off-by: Tejun Heo <tj@kernel.org>
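A rate-limited failure warning of the kind described here might look like the following userspace sketch. The budget of 10, the function name `report_alloc_failure()`, and the message wording are assumptions for illustration; the commit message only says the warning is disabled after a certain number of times.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed warning budget; the commit message does not state the number. */
static int warn_limit = 10;

/* Reports one allocation failure. Returns true if a warning was emitted,
 * false once the budget has been used up. */
static bool report_alloc_failure(size_t size, size_t align, const char *err)
{
    if (!warn_limit)
        return false;

    fprintf(stderr, "PERCPU: allocation failed, size=%zu align=%zu, %s\n",
            size, align, err);
    /* A kernel implementation would also dump the stack at this point. */

    if (!--warn_limit)
        fprintf(stderr, "PERCPU: limit reached, disable warning\n");
    return true;
}
```

Capping the output keeps a misbehaving caller from flooding the log while still making the first failures loud enough to notice.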
    • percpu: make pcpu_setup_first_chunk() failures more verbose · 635b75fc
      Tejun Heo authored
      The parameters to pcpu_setup_first_chunk() come from different sources
      depending on architecture and can be quite complex.  The function runs
      various sanity checks on the parameters and triggers BUG() if
      something isn't right.  However, this is very early during the boot
      and not reporting exactly what the problem is makes debugging even
      harder.
      
      Add PCPU_SETUP_BUG() macro which prints out enough information about
      the parameters.  As the macro still emits a separate BUG() for each
      check, no information is lost even in situations where only the
      program counter can be retrieved.
      
      While at it, also bump pcpu_dump_alloc_info() message to KERN_INFO so
      that it's visible on the console if boot fails to complete.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: make embedding first chunk allocator check vmalloc space size · 6ea529a2
      Tejun Heo authored
      The embedding first chunk allocator maintains the distances between
      units in the vmalloc area and thus needs the vmalloc space to be larger
      than the maximum distance between units; otherwise, it wouldn't be able
      to create any dynamic chunks.  This patch makes the embedding first
      chunk allocator check the vmalloc space size and, if the maximum
      distance between units is larger than 75% of it, print a warning and,
      if the page mapping allocator is available, fail initialization so
      that the system falls back onto it.
      
      This should work around percpu allocation failure problems on certain
      sparc64 configurations where distances between NUMA nodes are larger
      than the vmalloc area, and make the percpu allocator more robust for
      future configurations.
      Signed-off-by: Tejun Heo <tj@kernel.org>
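The 75% rule from this commit can be written down as a small decision function. This is a sketch under assumptions: the enum, the function name `check_vmalloc_fit()`, and passing the sizes as plain `size_t` values are all invented for the example and do not mirror the kernel's interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome codes for the embedding allocator's size check. */
enum embed_result {
    EMBED_OK,         /* units fit comfortably, keep embedding */
    EMBED_WARN_ONLY,  /* too spread out, but no fallback exists */
    EMBED_FALL_BACK   /* too spread out, fail so page mapping is used */
};

static enum embed_result check_vmalloc_fit(size_t max_distance,
                                           size_t vmalloc_size,
                                           bool have_page_allocator)
{
    /* 75% threshold. Dividing before multiplying avoids overflow for
     * very large sizes, at the cost of rounding down slightly. */
    size_t limit = vmalloc_size / 4 * 3;

    if (max_distance <= limit)
        return EMBED_OK;

    /* A warning would be printed in both remaining cases. */
    return have_page_allocator ? EMBED_FALL_BACK : EMBED_WARN_ONLY;
}
```

With a vmalloc space of 1000 units, a maximum distance of 750 still passes, while 800 triggers the warning and, when the page mapping allocator exists, the fallback.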
    • sparc64: implement page mapping percpu first chunk allocator · a70c6913
      Tejun Heo authored
      Implement page mapping percpu first chunk allocator as a fallback to
      the embedding allocator.  The next patch will make the embedding
      allocator check distances between units to determine whether it fits
      within the vmalloc area so that this fallback can be used when it
      does not fit.
      
      sparc64 currently has a relatively small vmalloc area, which makes it
      impossible to create any dynamic chunks on certain configurations,
      leading to percpu allocation failures.  This and the next patch should
      allow those configurations to keep working until a proper solution is
      found.
      
      While at it, mark pcpu_cpu_distance() with __init.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David S. Miller <davem@davemloft.net>
    • percpu: make pcpu_build_alloc_info() clear static buffers · fb59e72e
      Tejun Heo authored
      pcpu_build_alloc_info() may be called multiple times when percpu is
      falling back to a different first chunk allocator.  Make it clear its
      static buffers so that they don't contain values from previous runs.
      Signed-off-by: Tejun Heo <tj@kernel.org>
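The hazard being fixed is generic to functions that accumulate into static buffers and may run more than once. A minimal sketch, assuming a hypothetical helper `build_group_counts()` and a toy `MAX_CPUS` of 8 (neither is taken from the kernel source):

```c
#include <assert.h>
#include <string.h>

#define MAX_CPUS 8  /* toy limit for this sketch */

/* Counts how many CPUs fall into each group. The result lives in a
 * static buffer, so it must be cleared on every call; otherwise counts
 * from an earlier run would leak into a later one. */
static const int *build_group_counts(const int *cpu_group, int nr_cpus)
{
    static int group_cnt[MAX_CPUS];

    memset(group_cnt, 0, sizeof(group_cnt));  /* forget previous runs */

    for (int cpu = 0; cpu < nr_cpus; cpu++) {
        assert(cpu_group[cpu] >= 0 && cpu_group[cpu] < MAX_CPUS);
        group_cnt[cpu_group[cpu]]++;
    }
    return group_cnt;
}
```

Without the memset(), a second call made after falling back to another allocator would start from the first call's counts and report inflated group sizes.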
    • percpu: fix unit_map[] verification in pcpu_setup_first_chunk() · ffe0d5a5
      Tejun Heo authored
      pcpu_setup_first_chunk() incorrectly used NR_CPUS as the impossible
      unit number, while the unit number can equal or exceed NR_CPUS with a
      sparse unit map.  This triggers BUG_ON() spuriously on machines which
      have a non-power-of-two number of cpus.  Use UINT_MAX instead.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-and-tested-by: Tony Vroon <tony@linx.net>
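Why NR_CPUS is a poor "unassigned" marker can be shown with a toy check. The sketch below assumes a made-up `NR_CPUS` of 4 and a hypothetical helper `all_cpus_mapped()`; it only models the idea of initializing the map with a sentinel, filling it in, and then verifying that no sentinel remains.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NR_CPUS 4  /* toy value for this sketch */

/* Fills a unit map from assigned_unit[] and reports whether every CPU
 * appears to have a unit, using `sentinel` as the "unassigned" marker. */
static bool all_cpus_mapped(const unsigned int *assigned_unit,
                            unsigned int sentinel)
{
    unsigned int unit_map[NR_CPUS];

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        unit_map[cpu] = sentinel;            /* mark as unassigned */

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        unit_map[cpu] = assigned_unit[cpu];  /* may be sparse */

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (unit_map[cpu] == sentinel)       /* looks unassigned */
            return false;
    return true;
}
```

With a sparse assignment such as units 0, 1, 4, 5 for four CPUs, unit 4 collides with an NR_CPUS sentinel and the check fails even though every CPU has a unit. UINT_MAX cannot collide with any unit number in this toy setup.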
  5. 27 Sep, 2009 14 commits
  6. 26 Sep, 2009 16 commits