1. 07 Mar, 2009 1 commit
    • Tejun Heo's avatar
      percpu: finer grained locking to break deadlock and allow atomic free · ccea34b5
      Tejun Heo authored
      Impact: fix deadlock and allow atomic free
      
      Percpu allocation always uses GFP_KERNEL and whole alloc/free paths
      were protected by single mutex.  All percpu allocations have been from
      GFP_KERNEL-safe context and the original allocator had this assumption
      too.  However, by protecting both alloc and free paths with the same
      mutex, the new allocator creates free -> alloc -> GFP_KERNEL
      dependency which the original allocator didn't have.  This can lead to
      deadlock if free is called from FS or IO paths.  Also, in general,
      allocators are expected to allow free to be called from atomic
      context.
      
      This patch implements finer grained locking to break the deadlock and
      allow atomic free.  For details, please read the "Synchronization
      rules" comment.
      
      While at it, also add CONTEXT: to function comments to describe which
      context they expect to be called from and what they do to it.
      
      This problem was reported by Thomas Gleixner and Peter Zijlstra.
      
        http://thread.gmane.org/gmane.linux.kernel/802384Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reported-by: default avatarPeter Zijlstra <peterz@infradead.org>
      ccea34b5
  2. 06 Mar, 2009 11 commits
    • Tejun Heo's avatar
      percpu: move fully free chunk reclamation into a work · a56dbddf
      Tejun Heo authored
      Impact: code reorganization for later changes
      
      Do fully free chunk reclamation using a work.  This change is to
      prepare for locking changes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      a56dbddf
    • Tejun Heo's avatar
      percpu: move chunk area map extension out of area allocation · 9f7dcf22
      Tejun Heo authored
      Impact: code reorganization for later changes
      
      Separate out chunk area map extension into a separate function -
      pcpu_extend_area_map() - and call it directly from pcpu_alloc() such
      that pcpu_alloc_area() is guaranteed to have enough area map slots on
      invocation.
      
      With this change, pcpu_alloc_area() does only area allocation and the
      only failure mode is when the chunk doens't have enough room, so
      there's no need to distinguish it from memory allocation failures.
      Make it return -1 on such cases instead of hacky -ENOSPC.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      9f7dcf22
    • Tejun Heo's avatar
      percpu: replace pcpu_realloc() with pcpu_mem_alloc() and pcpu_mem_free() · 1880d93b
      Tejun Heo authored
      Impact: code reorganization for later changes
      
      With static map handling moved to pcpu_split_block(), pcpu_realloc()
      only clutters the code and it's also unsuitable for scheduled locking
      changes.  Implement and use pcpu_mem_alloc/free() instead.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      1880d93b
    • Tejun Heo's avatar
      x86, percpu: setup reserved percpu area for x86_64 · 6b19b0c2
      Tejun Heo authored
      Impact: fix relocation overflow during module load
      
      x86_64 uses 32bit relocations for symbol access and static percpu
      symbols whether in core or modules must be inside 2GB of the percpu
      segement base which the dynamic percpu allocator doesn't guarantee.
      This patch makes x86_64 reserve PERCPU_MODULE_RESERVE bytes in the
      first chunk so that module percpu areas are always allocated from the
      first chunk which is always inside the relocatable range.
      
      This problem exists for any percpu allocator but is easily triggered
      when using the embedding allocator because the second chunk is located
      beyond 2GB on it.
      
      This patch also changes the meaning of PERCPU_DYNAMIC_RESERVE such
      that it only indicates the size of the area to reserve for dynamic
      allocation as static and dynamic areas can be separate.  New
      PERCPU_DYNAMIC_RESERVED is increased by 4k for both 32 and 64bits as
      the reserved area separation eats away some allocatable space and
      having slightly more headroom (currently between 4 and 8k after
      minimal boot sans module area) makes sense for common case
      performance.
      
      x86_32 can address anywhere from anywhere and doesn't need reserving.
      
      Mike Galbraith first reported the problem first and bisected it to the
      embedding percpu allocator commit.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarMike Galbraith <efault@gmx.de>
      Reported-by: default avatarJaswinder Singh Rajput <jaswinder@kernel.org>
      6b19b0c2
    • Tejun Heo's avatar
      percpu, module: implement reserved allocation and use it for module percpu variables · edcb4639
      Tejun Heo authored
      Impact: add reserved allocation functionality and use it for module
      	percpu variables
      
      This patch implements reserved allocation from the first chunk.  When
      setting up the first chunk, arch can ask to set aside certain number
      of bytes right after the core static area which is available only
      through a separate reserved allocator.  This will be used primarily
      for module static percpu variables on architectures with limited
      relocation range to ensure that the module perpcu symbols are inside
      the relocatable range.
      
      If reserved area is requested, the first chunk becomes reserved and
      isn't available for regular allocation.  If the first chunk also
      includes piggy-back dynamic allocation area, a separate chunk mapping
      the same region is created to serve dynamic allocation.  The first one
      is called static first chunk and the second dynamic first chunk.
      Although they share the page map, their different area map
      initializations guarantee they serve disjoint areas according to their
      purposes.
      
      If arch doesn't setup reserved area, reserved allocation is handled
      like any other allocation.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      edcb4639
    • Tejun Heo's avatar
      percpu: add an indirection ptr for chunk page map access · 3e24aa58
      Tejun Heo authored
      Impact: allow sharing page map, no functional difference yet
      
      Make chunk->page access indirect by adding a pointer and renaming the
      actual array to page_ar.  This will be used by future changes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      3e24aa58
    • Tejun Heo's avatar
      x86: make embedding percpu allocator return excessive free space · 9a4f8a87
      Tejun Heo authored
      Impact: reduce unnecessary memory usage on certain configurations
      
      Embedding percpu allocator allocates unit_size *
      smp_num_possible_cpus() bytes consecutively and use it for the first
      chunk.  However, if the static area is small, this can result in
      excessive prellocated free space in the first chunk due to
      PCPU_MIN_UNIT_SIZE restriction.
      
      This patch makes embedding percpu allocator preallocate only what's
      necessary as described by PERPCU_DYNAMIC_RESERVE and return the
      leftover to the bootmem allocator.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      9a4f8a87
    • Tejun Heo's avatar
      percpu: use negative for auto for pcpu_setup_first_chunk() arguments · cafe8816
      Tejun Heo authored
      Impact: argument semantic cleanup
      
      In pcpu_setup_first_chunk(), zero @unit_size and @dyn_size meant
      auto-sizing.  It's okay for @unit_size as 0 doesn't make sense but 0
      dynamic reserve size is valid.  Alos, if arch @dyn_size is calculated
      from other parameters, it might end up passing in 0 @dyn_size and
      malfunction when the size is automatically adjusted.
      
      This patch makes both @unit_size and @dyn_size ssize_t and use -1 for
      auto sizing.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      cafe8816
    • Tejun Heo's avatar
      percpu: improve first chunk initial area map handling · 61ace7fa
      Tejun Heo authored
      Impact: no functional change
      
      When the first chunk is created, its initial area map is not allocated
      because kmalloc isn't online yet.  The map is allocated and
      initialized on the first allocation request on the chunk.  This works
      fine but the scattering of initialization logic between the init
      function and allocation path is a bit confusing.
      
      This patch makes the first chunk initialize and use minimal statically
      allocated map from pcpu_setpu_first_chunk().  The map resizing path
      still needs to handle this specially but it's more straight-forward
      and gives more latitude to the init path.  This will ease future
      changes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      61ace7fa
    • Tejun Heo's avatar
      percpu: cosmetic renames in pcpu_setup_first_chunk() · 2441d15c
      Tejun Heo authored
      Impact: cosmetic, preparation for future changes
      
      Make the following renames in pcpur_setup_first_chunk() in preparation
      for future changes.
      
      * s/free_size/dyn_size/
      * s/static_vm/first_vm/
      * s/static_chunk/schunk/
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      2441d15c
    • Tejun Heo's avatar
      percpu: clean up percpu constants · 6a242909
      Tejun Heo authored
      Impact: cleaup
      
      Make the following cleanups.
      
      * There isn't much arch-specific about PERCPU_MODULE_RESERVE.  Always
        define it whether arch overrides PERCPU_ENOUGH_ROOM or not.
      
      * blackfin overrides PERCPU_ENOUGH_ROOM to align static area size.  Do
        it by default.
      
      * percpu allocation sizes doesn't have much to do with the page size.
        Don't use PAGE_SHIFT in their definition.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Bryan Wu <cooloney@kernel.org>
      6a242909
  3. 04 Mar, 2009 4 commits
  4. 03 Mar, 2009 12 commits
  5. 02 Mar, 2009 12 commits