    epoll: do not take global 'epmutex' for simple topologies · 67347fe4
    Jason Baron authored
    When calling EPOLL_CTL_ADD for an epoll file descriptor that is attached
    directly to a wakeup source, we do not need to take the global 'epmutex',
    unless the epoll file descriptor is nested.  The purpose of taking the
    'epmutex' on add is to prevent complex topologies such as loops and deep
    wakeup paths from forming in parallel through multiple EPOLL_CTL_ADD
    operations.  However, for the simple case of an epoll file descriptor
    attached directly to a wakeup source (with no nesting), we do not need to
    hold the 'epmutex'.
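
    The locking decision described above can be illustrated with a small
    user-space model. This is a hypothetical sketch, not the kernel code:
    `struct model_file`, `model_ctl_add()` and `global_epmutex` are invented
    names, pthread mutexes stand in for the global 'epmutex' and the
    per-instance lock, and an add is treated as needing the global lock
    (roughly as this patch does) when either the epoll instance is itself
    watched by another epoll instance or the target being added is an epoll
    file descriptor. The loop and wakeup-path checks that the global lock
    protects are elided.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of an epoll instance or a watched file. */
struct model_file {
    bool is_epoll;         /* this file is itself an epoll descriptor */
    int watched_by;        /* number of epoll instances watching this file */
    pthread_mutex_t mtx;   /* per-instance lock (stands in for ep->mtx) */
};

/* Stands in for the global 'epmutex'. */
static pthread_mutex_t global_epmutex = PTHREAD_MUTEX_INITIALIZER;
/* Counts how often the slow (globally locked) path ran, for illustration. */
static int global_lock_count;

static void model_file_init(struct model_file *f, bool is_epoll)
{
    f->is_epoll = is_epoll;
    f->watched_by = 0;
    pthread_mutex_init(&f->mtx, NULL);
}

/* True if the add could build a nested (epoll-in-epoll) topology. */
static bool needs_full_check(const struct model_file *ep,
                             const struct model_file *target)
{
    return ep->watched_by > 0 || target->is_epoll;
}

/* Sketch of EPOLL_CTL_ADD: take the global lock only for nested cases. */
static void model_ctl_add(struct model_file *ep, struct model_file *target)
{
    assert(ep != NULL && ep->is_epoll && target != NULL);
    bool full = needs_full_check(ep, target);

    if (full) {
        /* Slow path: global lock first, then the per-instance lock. */
        pthread_mutex_lock(&global_epmutex);
        global_lock_count++;
        /* Loop and wakeup-path depth checks would run here. */
    }

    /* Fast path for simple topologies: only the per-instance lock. */
    pthread_mutex_lock(&ep->mtx);
    target->watched_by++;
    pthread_mutex_unlock(&ep->mtx);

    if (full)
        pthread_mutex_unlock(&global_epmutex);
}
```

    In this model, adding a plain file (for example a socket) to an epoll
    instance that nobody else watches never touches the global lock, so such
    adds from many CPUs do not serialize on it; only adds that could extend an
    epoll-in-epoll chain do. Always taking the global lock before the
    per-instance lock keeps a single lock order on the slow path.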
    
    This patch, along with 'epoll: optimize EPOLL_CTL_DEL using rcu', improves
    scalability on larger systems.  Quoting Nathan Zimmer's mail on SPECjbb
    performance:
    
    "On the 16 socket run the performance went from 35k jOPS to 125k jOPS.  In
    addition the benchmark went from scaling well on 10 sockets to scaling
    well on just over 40 sockets.
    
    ...
    
    Currently the benchmark stops scaling at around 40-44 sockets but it seems like
    I found a second unrelated bottleneck."
    
    [akpm@linux-foundation.org: use `bool' for boolean variables, remove unneeded/undesirable cast of void*, add missed ep_scan_ready_list() kerneldoc]
    Signed-off-by: Jason Baron <jbaron@akamai.com>
    Tested-by: Nathan Zimmer <nzimmer@sgi.com>
    Cc: Eric Wong <normalperson@yhbt.net>
    Cc: Nelson Elhage <nelhage@nelhage.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Davide Libenzi <davidel@xmailserver.org>
    Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>