    [PATCH] Permit inode & dentry hash tables to be allocated > MAX_ORDER size · 17e14bef
    David Howells authored
    Here's a patch to allocate memory for big system hash tables with the
    bootmem allocator rather than with the main page allocator.
    
    It is needed for three reasons:
    
    (1) So that the size can be bigger than MAX_ORDER.  IBM have done some
        testing on their big PPC64 systems (64GB of RAM) with linux-2.4 and found
        that they get better performance if the sizes of the inode cache hash,
        dentry cache hash, buffer head hash and page cache hash are increased
        beyond MAX_ORDER (order 11).
    
        Now the main allocator can't allocate anything larger than MAX_ORDER, but
        the bootmem allocator can.
    
        In 2.6 it appears that only the inode and dentry hashes remain of those
        four, but there are other hash tables that could use this service.
    
    (2) Changing MAX_ORDER appears to have a number of effects beyond just
        limiting the maximum size that can be allocated in one go.
    
    (3) Should someone want a hash table in which each bucket isn't a power of
        two in size, memory will be wasted as the chunk of memory allocated will
        be a power of two in size (to hold a power of two number of buckets).
    
        On the other hand, using the bootmem allocator means the allocation
        will only take up sufficient pages to hold it, rather than the next power
        of two up.
    
        Admittedly, this point doesn't apply to the dentry and inode hashes,
        but it might to another hash table that might want to use this service.
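
To put numbers on point (3), here is a small user-space sketch (not code from the patch) comparing the two allocation styles. The helper names, the 24-byte bucket and the 4096-byte page size are illustrative assumptions.

```c
#include <stddef.h>

/* Round n up to the next power of two (n >= 1). */
static size_t round_up_pow2(size_t n)
{
    size_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Whole pages needed to hold `bytes`. */
static size_t pages_needed(size_t bytes, size_t page_size)
{
    return (bytes + page_size - 1) / page_size;
}

/* Page-allocator style: the chunk is a power-of-two number of pages. */
static size_t pow2_alloc_pages(size_t bucket_size, size_t nbuckets,
                               size_t page_size)
{
    return round_up_pow2(pages_needed(bucket_size * nbuckets, page_size));
}

/* Bootmem style: only as many pages as the table actually occupies. */
static size_t exact_alloc_pages(size_t bucket_size, size_t nbuckets,
                                size_t page_size)
{
    return pages_needed(bucket_size * nbuckets, page_size);
}
```

With a hypothetical 24-byte bucket and 2^20 buckets, the table occupies 6144 pages, whereas a power-of-two chunk would take 8192 pages, so a quarter of the chunk would go unused. With an 8-byte bucket both helpers return 2048 pages, which is why the point does not apply to tables whose buckets are a power of two in size.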
    
    
    I've coalesced the meat of the inode and dentry allocation routines into
    one such routine in mm/page_alloc.c that the respective initialisation
    functions now call before mem_init() is called.
    
    This routine gets its approximation of memory size by counting up the
    ZONE_NORMAL and ZONE_DMA pages (and ZONE_HIGHMEM if requested) in all the
    nodes passed to the main allocator by paging_init() (or wherever the arch
    does it).  It does not use max_low_pfn as that doesn't seem to be available
    on all archs, and it doesn't use num_physpages since that includes highmem
    pages not available to the kernel for allocating data structures upon -
    which may not be appropriate when calculating hash table size.
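
The page-counting step described above can be modelled roughly as follows. This is a simplified user-space model, not the kernel's data structures: `struct node_model`, its `zone_pages` array and `count_table_pages()` are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Zone indices, named after the zones mentioned in the text. */
enum zone_model { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, NR_ZONES_MODEL };

/* Hypothetical stand-in for a memory node: pages per zone. */
struct node_model {
    unsigned long zone_pages[NR_ZONES_MODEL];
};

/*
 * Sum the ZONE_DMA and ZONE_NORMAL pages of every node, adding
 * ZONE_HIGHMEM pages only when the caller asks for them.
 */
static unsigned long count_table_pages(const struct node_model *nodes,
                                       size_t nr_nodes, int include_highmem)
{
    unsigned long total = 0;
    size_t i;

    for (i = 0; i < nr_nodes; i++) {
        total += nodes[i].zone_pages[ZONE_DMA];
        total += nodes[i].zone_pages[ZONE_NORMAL];
        if (include_highmem)
            total += nodes[i].zone_pages[ZONE_HIGHMEM];
    }
    return total;
}
```

Leaving highmem out by default mirrors the reasoning given for not using num_physpages: those pages cannot hold kernel data structures directly, so they may distort the size estimate.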
    
    On the off chance that the size of each hash bucket may not be exactly a
    power of two, the routine will only allocate as many pages as is necessary
    to ensure that the number of buckets is exactly a power of two, rather than
    allocating the smallest power-of-two sized chunk of memory that will hold
    the same array of buckets.
    
    The maximum size of any single hash table is given by
    MAX_SYS_HASH_TABLE_ORDER, which is now defined in linux/mmzone.h.
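
As a rough illustration of how such a cap could interact with the power-of-two bucket count, here is a user-space sketch. It is not the routine from the patch: `hash_buckets()`, the round-down policy and the order value passed by the caller are all assumptions made for the example.

```c
#include <stddef.h>

/* Largest power of two that is <= n (n >= 1). */
static unsigned long round_down_pow2(unsigned long n)
{
    unsigned long p = 1;

    while (p <= n / 2)
        p <<= 1;
    return p;
}

/*
 * Pick a power-of-two bucket count no larger than `wanted` (>= 1),
 * then clamp it so the table fits in (page_size << max_order) bytes.
 * max_order plays the role of MAX_SYS_HASH_TABLE_ORDER here.
 */
static unsigned long hash_buckets(unsigned long wanted, size_t bucket_size,
                                  size_t page_size, unsigned int max_order)
{
    unsigned long max_bytes = (unsigned long)page_size << max_order;
    unsigned long max_buckets = round_down_pow2(max_bytes / bucket_size);
    unsigned long n = round_down_pow2(wanted);

    return n < max_buckets ? n : max_buckets;
}
```

For example, with 4096-byte pages, 8-byte buckets and an assumed order of 13, the cap is 32 MiB or 2^22 buckets: a request for 3,000,000 buckets is rounded down to 2^21, while a request for 100,000,000 is clamped to 2^22.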
    
    Signed-off-by: Paul Mackerras <paulus@samba.org>
    Signed-off-by: David Howells <dhowells@redhat.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
inode.c 35.6 KB