    futexes: Increase hash table size for better performance · a52b89eb
    Davidlohr Bueso authored
    Currently, the futex global hash table suffers from its fixed,
    smallish (by today's standards) size of 256 entries, as well as
    its lack of NUMA awareness. Large systems that use many futexes
    are prone to a high number of collisions, where these futexes
    hash to the same bucket and cause extra contention on the same
    hb->lock. Furthermore, cacheline bouncing occurs when multiple
    hb->locks reside on the same cacheline and different futexes
    hash to adjacent buckets.
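
To illustrate how two futexes end up sharing a bucket, here is a minimal
user-space sketch. It assumes the table size is a power of two and that the
bucket is chosen by masking the low bits of a 32-bit hash value; the helper
name bucket_index() is hypothetical and is not taken from futex.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a 32-bit hash value to a bucket index.
 * Assumes nbuckets is a power of two, so masking keeps the low bits.
 */
static inline uint32_t bucket_index(uint32_t hash, uint32_t nbuckets)
{
	assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);
	return hash & (nbuckets - 1);
}
```

With 256 buckets, the hash values 0x1234 and 0x5234 both map to bucket 0x34
and would therefore contend on the same lock. With 32768 buckets they map to
different buckets, because more of the low bits are kept.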
    
    This patch keeps the current static size of 16 entries for small
    systems (CONFIG_BASE_SMALL); otherwise it uses 256 * ncpus,
    rounded up to the next power of 2. Note that this number of CPUs
    accounts for all CPUs that can ever be available in the system,
    taking into consideration things like hotplugging. While the
    larger hash table imposes extra overhead at bootup, this is a
    one-time cost and does not overshadow the benefits of this
    patch.
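
The sizing rule can be sketched in user space as follows. This is only an
illustration of the arithmetic described above: futex_hashsize_for() and
roundup_pow_of_two_ul() are hypothetical names, and the small_system flag
stands in for the small-system configuration.

```c
#include <assert.h>
#include <limits.h>

/* User-space stand-in for rounding up to the next power of two. */
static inline unsigned long roundup_pow_of_two_ul(unsigned long n)
{
	unsigned long p = 1;

	/* Reject 0 and values whose next power of two would overflow. */
	assert(n > 0 && n <= (ULONG_MAX >> 1) + 1);
	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Hypothetical helper: number of hash buckets for a given count of
 * possible CPUs. small_system models the small-system configuration.
 */
static inline unsigned long futex_hashsize_for(unsigned int possible_cpus,
					       int small_system)
{
	if (small_system)
		return 16;
	return roundup_pow_of_two_ul(256UL * possible_cpus);
}
```

For example, 80 possible CPUs give 256 * 80 = 20480, which rounds up to
32768 buckets, while 64 possible CPUs give exactly 16384.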
    
    Furthermore, as suggested by tglx, by cache aligning the hash
    buckets we can avoid accesses across cacheline boundaries and also
    avoid massive cacheline bouncing if multiple CPUs are hammering
    away at different hash buckets which happen to reside in the
    same cacheline.
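
A minimal C11 sketch of the alignment idea, under stated assumptions: a
64-byte cacheline, and a simplified struct bucket whose fields are stand-ins
rather than the kernel's bucket definition. Aligning the first member to the
cacheline size pads each bucket so that neighbouring buckets never share a
line.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define CACHELINE_BYTES 64 /* assumption: typical cacheline size */

/* Simplified, hypothetical bucket; not the kernel's structure. */
struct bucket {
	alignas(CACHELINE_BYTES) int lock; /* stand-in for the bucket lock */
	void *chain;                       /* stand-in for the waiter list */
};

/* Each bucket starts on, and fully occupies, whole cachelines. */
static_assert(alignof(struct bucket) == CACHELINE_BYTES, "bucket alignment");
static_assert(sizeof(struct bucket) % CACHELINE_BYTES == 0, "bucket padding");

/* Returns nonzero if both addresses fall within the same cacheline. */
static inline int same_cacheline(const void *a, const void *b)
{
	return (uintptr_t)a / CACHELINE_BYTES == (uintptr_t)b / CACHELINE_BYTES;
}
```

The trade-off is memory: every bucket now takes at least one full cacheline,
even though its fields need far less.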
    
    Also, similar to other core kernel components (pid, dcache,
    tcp), by using alloc_large_system_hash() we benefit from its
    NUMA awareness and thus the table is distributed among the nodes
    instead of in a single one.
    
    For a custom microbenchmark that pounds on the uaddr hashing --
    making the wait path fail at futex_wait_setup() returning
    -EWOULDBLOCK for large numbers of futexes -- we can see the
    following benefits on an 80-core, 8-socket, 1 TB server:
    
     +---------+--------------------+------------------------+-----------------------+-------------------------------+
     | threads | baseline (ops/sec) | aligned-only (ops/sec) | large table (ops/sec) | large table+aligned (ops/sec) |
     +---------+--------------------+------------------------+-----------------------+-------------------------------+
     |     512 |              32426 | 50531  (+55.8%)        | 255274  (+687.2%)     | 292553  (+802.2%)             |
     |     256 |              65360 | 99588  (+52.3%)        | 443563  (+578.6%)     | 508088  (+677.3%)             |
     |     128 |             125635 | 200075 (+59.2%)        | 742613  (+491.1%)     | 835452  (+564.9%)             |
     |      80 |             193559 | 323425 (+67.1%)        | 1028147 (+431.1%)     | 1130304 (+483.9%)             |
     |      64 |             247667 | 443740 (+79.1%)        | 997300  (+302.6%)     | 1145494 (+362.5%)             |
     |      32 |             628412 | 721401 (+14.7%)        | 965996  (+53.7%)      | 1122115 (+78.5%)              |
     +---------+--------------------+------------------------+-----------------------+-------------------------------+
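
The percentages in the table are relative changes against the baseline
column. As a small sketch of that arithmetic (gain_percent() is a
hypothetical helper, not part of the benchmark):

```c
/* Relative change of value over baseline, in percent. */
static inline double gain_percent(double baseline, double value)
{
	return (value - baseline) / baseline * 100.0;
}
```

For the 512-thread row, gain_percent(32426, 292553) gives roughly 802.2,
matching the large table+aligned column.
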
    Reviewed-by: Darren Hart <dvhart@linux.intel.com>
    Reviewed-by: Peter Zijlstra <peterz@infradead.org>
    Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Reviewed-by: Waiman Long <Waiman.Long@hp.com>
    Reviewed-and-tested-by: Jason Low <jason.low2@hp.com>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Jeff Mahoney <jeffm@suse.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Scott Norton <scott.norton@hp.com>
    Cc: Tom Vaden <tom.vaden@hp.com>
    Cc: Aswin Chandramouleeswaran <aswin@hp.com>
    Link: http://lkml.kernel.org/r/1389569486-25487-3-git-send-email-davidlohr@hp.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>