    [PATCH] slab: updates for per-arch alignments · b9e55f3d
    Andrew Morton authored
    From: Manfred Spraul <manfred@colorfullife.com>
    
    Description:
    
    Right now kmem_cache_create decides the alignment of allocated objects
    automatically.  These automatic decisions are sometimes wrong:
    
    - for some objects, it's better to keep them as small as possible to
      reduce the memory usage.  Ingo already added a parameter to
      kmem_cache_create for the sigqueue cache, but it wasn't implemented.
    
    - for s390, normal kmalloc must be 8-byte aligned.  With debugging
      enabled, the default alignment was 4 bytes.  This means that s390 cannot
      enable slab debugging.
    
    - arm26 needs 1 kB aligned objects.  Previously this was impossible to
      generate, therefore arm26 has its own allocator in
      arm26/machine/small_page.c.
    
    - most objects should be cache line aligned, to avoid false sharing.  But
      the cache line size was set at compile time, often to 128 bytes for
      generic kernels.  This wastes memory.  The new code uses the runtime
      determined cache line size instead.
    
    - some caches want an explicit alignment.  One example is the pte_chain
      objects: they must find the start of the object with addr&mask.  Right
      now pte_chain objects are scaled to the cache line size, because that was
      the only alignment that could be generated reliably.
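
The addr&mask lookup mentioned for pte_chain objects can be illustrated in userspace. This is a minimal sketch, not the kernel code: `struct chain_obj`, `OBJ_ALIGN`, `obj_start` and `obj_alloc` are hypothetical names, and it assumes the object size equals a power-of-two alignment of 64 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed power-of-two alignment; the real pte_chain value is not given here. */
#define OBJ_ALIGN 64u

/* Hypothetical stand-in for a pte_chain-like object, not the kernel's struct. */
struct chain_obj {
    unsigned long slots[OBJ_ALIGN / sizeof(unsigned long)];
};

/* The mask trick only works if one object fills exactly one aligned slot. */
static_assert(sizeof(struct chain_obj) == OBJ_ALIGN,
              "object must fill exactly one aligned slot");

/* Return the start of the object that contains addr by clearing the low bits. */
static struct chain_obj *obj_start(const void *addr)
{
    uintptr_t mask = ~((uintptr_t)OBJ_ALIGN - 1);
    return (struct chain_obj *)((uintptr_t)addr & mask);
}

/* Allocate one object at the assumed alignment (C11 aligned_alloc). */
static struct chain_obj *obj_alloc(void)
{
    return aligned_alloc(OBJ_ALIGN, sizeof(struct chain_obj));
}
```

Any pointer into the object, such as `&obj->slots[3]`, maps back to the object start, which is why the allocator has to guarantee the requested alignment rather than merely a cache line multiple.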
    
    The implementation reuses the "offset" parameter of kmem_cache_create and
    now uses it to pass in the requested alignment.  offset was ignored by the
    current implementation, and the only user I found is sigqueue, which
    intended to set the alignment.
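
One way to picture how a requested alignment and the runtime cache line size could combine is the userspace model below. It is a sketch under stated assumptions, not the patch's implementation: `pick_align` and `SKETCH_HWCACHE_ALIGN` are hypothetical names, and the rule of halving the cache line alignment for small objects is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag meaning "align to the hardware cache line". */
#define SKETCH_HWCACHE_ALIGN 0x1u

/*
 * Choose an alignment for objects of `size` bytes.
 *   requested:  explicit alignment from the caller, 0 if none (power of two)
 *   cache_line: cache line size detected at runtime (power of two)
 */
static size_t pick_align(size_t size, size_t requested,
                         unsigned flags, size_t cache_line)
{
    size_t align = sizeof(void *);  /* assumed minimum: word alignment */

    assert(size > 0);
    assert(requested == 0 || (requested & (requested - 1)) == 0);
    assert(cache_line != 0 && (cache_line & (cache_line - 1)) == 0);

    if (flags & SKETCH_HWCACHE_ALIGN) {
        size_t a = cache_line;
        /* Assumed policy: halve while two objects would fit in one slot. */
        while (a > align && size <= a / 2)
            a /= 2;
        if (a > align)
            align = a;
    }

    /* An explicit request can only raise the alignment in this model. */
    if (requested > align)
        align = requested;

    return align;
}
```

In this model a 200-byte object with the cache line flag gets 128-byte alignment on a machine with 128-byte lines but only 32-byte alignment on one with 32-byte lines, and an explicit request of 1024 overrides both.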
    
    In the long run, it might be interesting for the main tree: with 128-byte
    alignment, only 7 inodes fit into one page; with 64-byte alignment, 9
    inodes fit - 20% memory recovered for Athlon systems.
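
The effect of alignment on objects per page can be checked with a little arithmetic. The sketch below is an illustration only: the 4096-byte page, the 64-byte per-page overhead, and the helper names `round_up` and `objs_per_page` are assumptions, and the message does not state the real inode size.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES    4096u  /* assumed page size */
#define SLAB_OVERHEAD   64u  /* hypothetical per-page bookkeeping, in bytes */

/* Round size up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Number of aligned objects that fit into one page after the overhead. */
static size_t objs_per_page(size_t size, size_t align)
{
    return (PAGE_BYTES - SLAB_OVERHEAD) / round_up(size, align);
}
```

With a hypothetical 440-byte object, `objs_per_page(440, 128)` yields 7 because each object is padded to 512 bytes, while `objs_per_page(440, 64)` yields 9 because the padded size drops to 448 bytes.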
    
    
    
    For generic kernels running on P6 CPUs (i.e. 32-byte cache lines), the
    number of objects per page changes as follows:
    
     ext2_inode_cache: 8 instead of 7
     ext3_inode_cache: 8 instead of 7
     fat_inode_cache: 9 instead of 7
     rpc_tasks: 24 instead of 15
     tcp_tw_bucket: 40 instead of 30
     arp_cache: 40 instead of 30
     nfs_write_data: 9 instead of 7