    net: Add a bhash2 table hashed by port and address · 28044fc1
    Joanne Koong authored
    The current bind hashtable (bhash) is hashed by port only.
    In the socket bind path, we have to check for bind conflicts by
    traversing the specified port's inet_bind_bucket while holding the
    hashbucket's spinlock (see inet_csk_get_port() and
    inet_csk_bind_conflict()). When many sockets are bound to the same
    port at different addresses, this conflict check is time-intensive:
    it can cause softirq CPU lockups, and it also stalls new TCP
    connections, since __inet_inherit_port() contends for the same
    spinlock.
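    A minimal userspace sketch of that cost, assuming a toy IPv4-only model
    rather than the kernel's structures: struct toy_sock, bhash_fn() and
    toy_bhash_conflict() are hypothetical names, sockets are chained
    directly in the port's bucket (the kernel chains inet_bind_buckets and
    their owner lists), and two sockets are assumed to conflict when they
    share a port and either their addresses match or one of them is the
    wildcard (0). The visited counter shows that the walk examines every
    socket bound to the port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BHASH_SIZE 64 /* toy table size (hypothetical) */

/* Hypothetical stand-in for a bound IPv4 socket. */
struct toy_sock {
	uint16_t port;
	uint32_t rcv_saddr;     /* 0 means INADDR_ANY */
	struct toy_sock *next;  /* link in the port-hashed chain */
};

static struct toy_sock *bhash[BHASH_SIZE];

/* bhash is keyed by port only. */
static unsigned int bhash_fn(uint16_t port)
{
	return port % BHASH_SIZE;
}

static void toy_bhash_add(struct toy_sock *sk)
{
	unsigned int b = bhash_fn(sk->port);

	sk->next = bhash[b];
	bhash[b] = sk;
}

/*
 * Simplified conflict check: walk the port's chain and compare addresses.
 * *visited counts how many sockets on this port were examined.
 */
static bool toy_bhash_conflict(uint16_t port, uint32_t addr, size_t *visited)
{
	*visited = 0;
	for (struct toy_sock *sk = bhash[bhash_fn(port)]; sk; sk = sk->next) {
		if (sk->port != port)
			continue;	/* another port sharing this bucket */
		(*visited)++;
		if (sk->rcv_saddr == addr || sk->rcv_saddr == 0 || addr == 0)
			return true;
	}
	return false;
}
```

    With 1000 sockets bound to port 8080 at distinct addresses, checking a
    new address walks all 1000 entries before reporting no conflict.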
    
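    This patch adds a second bind table, bhash2, that hashes by
    port and sk->sk_rcv_saddr (ipv4) and sk->sk_v6_rcv_saddr (ipv6).
    Because a bhash2 lookup only has to examine the sockets bound to that
    port and address (plus any hash collisions), rather than every socket
    bound to the port, it resolves conflicts significantly faster and
    holds the hashbucket spinlock for less time.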
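    The bhash2 side can be sketched the same way. This is again a toy,
    assuming IPv4 only and a hypothetical hash, (addr ^ port) % BHASH2_SIZE,
    chosen for readability; it is not the kernel's hash function and it
    ignores network namespaces. Only exact (port, address) lookup is shown,
    and the visited counter includes entries that merely collide in the
    bucket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BHASH2_SIZE 4096 /* toy table size (hypothetical) */

/* Hypothetical stand-in for a bound IPv4 socket. */
struct toy_sock {
	uint16_t port;
	uint32_t rcv_saddr;      /* 0 means INADDR_ANY */
	struct toy_sock *next2;  /* link in the (port, address)-hashed chain */
};

static struct toy_sock *bhash2[BHASH2_SIZE];

/* Toy hash over both port and address (not the kernel's hash). */
static unsigned int bhash2_fn(uint16_t port, uint32_t addr)
{
	return (addr ^ port) % BHASH2_SIZE;
}

static void toy_bhash2_add(struct toy_sock *sk)
{
	unsigned int b = bhash2_fn(sk->port, sk->rcv_saddr);

	sk->next2 = bhash2[b];
	bhash2[b] = sk;
}

/*
 * Look for a socket bound to exactly (port, addr). Only the chain for that
 * key is walked; *visited counts every entry examined, collisions included.
 */
static bool toy_bhash2_has(uint16_t port, uint32_t addr, size_t *visited)
{
	*visited = 0;
	for (struct toy_sock *sk = bhash2[bhash2_fn(port, addr)]; sk; sk = sk->next2) {
		(*visited)++;
		if (sk->port == port && sk->rcv_saddr == addr)
			return true;
	}
	return false;
}
```

    With 1000 sockets bound to port 8080 at distinct addresses, a lookup for
    one more address examines at most the single entry sharing its bucket,
    instead of every socket on the port.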
    
    Please note a few things:
    * A socket's address can change after it has been bound. This happens
    in two cases:
    
      1) The socket is bound to INADDR_ANY (ipv4) or IPV6_ADDR_ANY (ipv6)
      and then connect() is called. The kernel assigns the socket an
      address when it handles the connect().
    
      2) In inet_sk_reselect_saddr(), which is called when rebuilding the
      sk header and a few preconditions are met (e.g. rerouting fails).
    
    In both cases, we need to update the bhash2 table by removing the
    entry for the old address and adding a new entry for the updated
    address.
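    The remove-then-add update can be sketched on the same kind of toy
    table, assuming IPv4 only and a hypothetical helper,
    toy_bhash2_update_saddr(); the commit message does not name the kernel
    function that does this, and locking is left out here. The ordering
    detail that matters is that the entry is unlinked before rcv_saddr is
    overwritten, since the old bucket is found by hashing the old address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BHASH2_SIZE 4096 /* toy table size (hypothetical) */

struct toy_sock {
	uint16_t port;
	uint32_t rcv_saddr;      /* 0 means INADDR_ANY */
	struct toy_sock *next2;  /* link in the (port, address)-hashed chain */
};

static struct toy_sock *bhash2[BHASH2_SIZE];

/* Toy hash over both port and address (not the kernel's hash). */
static unsigned int bhash2_fn(uint16_t port, uint32_t addr)
{
	return (addr ^ port) % BHASH2_SIZE;
}

static void toy_bhash2_add(struct toy_sock *sk)
{
	unsigned int b = bhash2_fn(sk->port, sk->rcv_saddr);

	sk->next2 = bhash2[b];
	bhash2[b] = sk;
}

/* Unlink sk from the chain selected by its *current* rcv_saddr. */
static void toy_bhash2_remove(struct toy_sock *sk)
{
	struct toy_sock **pp = &bhash2[bhash2_fn(sk->port, sk->rcv_saddr)];

	while (*pp && *pp != sk)
		pp = &(*pp)->next2;
	if (*pp)
		*pp = sk->next2;
	sk->next2 = NULL;
}

static struct toy_sock *toy_bhash2_lookup(uint16_t port, uint32_t addr)
{
	for (struct toy_sock *sk = bhash2[bhash2_fn(port, addr)]; sk; sk = sk->next2)
		if (sk->port == port && sk->rcv_saddr == addr)
			return sk;
	return NULL;
}

/*
 * Hypothetical helper: rehash sk after its source address changes.
 * Remove first (the old bucket is derived from the old address), then
 * store the new address and insert into the new bucket.
 */
static void toy_bhash2_update_saddr(struct toy_sock *sk, uint32_t new_addr)
{
	toy_bhash2_remove(sk);
	sk->rcv_saddr = new_addr;
	toy_bhash2_add(sk);
}
```

    For example, a socket added on port 5000 with rcv_saddr 0 (INADDR_ANY)
    and then passed to toy_bhash2_update_saddr(&sk, 0xc0a80001) is found
    only under 192.168.0.1 afterwards, which mirrors case 1 above.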
    
    * The bhash2 table must have its own lock, even though concurrent
    accesses on the same port are already serialized by the bhash lock.
    Sockets on different ports can hash to different bhash hashbuckets
    but to the same bhash2 hashbucket, and their bhash locks alone would
    not serialize access to that shared bhash2 hashbucket.
    
    This brings up two stipulations:
      1) When both the bhash and the bhash2 lock are needed, the bhash2
      lock is always acquired after the bhash lock and released before the
      bhash lock is released.

      2) There are no nested bhash2 hashbucket locks: a bhash2 lock is
      always released before another bhash2 lock is acquired.
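    Both stipulations can be expressed as assertions in a small userspace
    model. Everything here is hypothetical: toy_spinlock stands in for a
    kernel spinlock, the table sizes are tiny so that a cross-port
    collision is easy to show, and toy_update_saddr() only records lock
    events (the list manipulation is elided). The assertions in
    lock_bhash(), unlock_bhash() and lock_bhash2() encode rules 1 and 2.
    With these toy hashes, ports 80 and 81 map to different bhash locks
    while (80, 10.0.0.1) and (81, 10.0.0.0) share bhash2 bucket 1, which is
    the case that needs the separate bhash2 lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BHASH_SIZE 64  /* toy sizes (hypothetical) */
#define BHASH2_SIZE 16

/* Minimal spinlock; static zero-initialization leaves it unlocked. */
struct toy_spinlock {
	atomic_bool locked;
};

static struct toy_spinlock bhash_lock[BHASH_SIZE];
static struct toy_spinlock bhash2_lock[BHASH2_SIZE];
static int bhash2_held;  /* bhash2 locks held by the (single) caller */
static char trace[128];  /* log of lock events, for inspection */

static unsigned int bhash_fn(uint16_t port) { return port % BHASH_SIZE; }
static unsigned int bhash2_fn(uint16_t port, uint32_t addr) { return (addr ^ port) % BHASH2_SIZE; }

static void note(const char *event)
{
	size_t len = strlen(trace);

	snprintf(trace + len, sizeof(trace) - len, "%s ", event);
}

static void toy_lock(struct toy_spinlock *l)
{
	while (atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
		; /* spin until the previous holder releases it */
}

static void toy_unlock(struct toy_spinlock *l)
{
	atomic_store_explicit(&l->locked, false, memory_order_release);
}

static void lock_bhash(unsigned int b)
{
	assert(bhash2_held == 0); /* rule 1: bhash is taken before bhash2 */
	toy_lock(&bhash_lock[b]);
	note("bhash+");
}

static void unlock_bhash(unsigned int b)
{
	assert(bhash2_held == 0); /* rule 1: bhash2 is released first */
	toy_unlock(&bhash_lock[b]);
	note("bhash-");
}

static void lock_bhash2(unsigned int b)
{
	assert(bhash2_held == 0); /* rule 2: no nested bhash2 locks */
	toy_lock(&bhash2_lock[b]);
	bhash2_held++;
	note("bhash2+");
}

static void unlock_bhash2(unsigned int b)
{
	toy_unlock(&bhash2_lock[b]);
	bhash2_held--;
	note("bhash2-");
}

/* Hypothetical address update: only the locking pattern is shown. */
static void toy_update_saddr(uint16_t port, uint32_t old_addr, uint32_t new_addr)
{
	lock_bhash(bhash_fn(port));
	lock_bhash2(bhash2_fn(port, old_addr));   /* unlink from old bucket */
	unlock_bhash2(bhash2_fn(port, old_addr));
	lock_bhash2(bhash2_fn(port, new_addr));   /* link into new bucket */
	unlock_bhash2(bhash2_fn(port, new_addr));
	unlock_bhash(bhash_fn(port));
}
```

    Calling toy_update_saddr(5000, 0, 0xc0a80001) leaves the trace
    "bhash+ bhash2+ bhash2- bhash2+ bhash2- bhash- ". Because the old and
    new bhash2 buckets are locked one at a time inside the bhash critical
    section, the sequence cannot self-deadlock even when both addresses
    hash to the same bhash2 bucket.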
    
    * The bhash table cannot be superseded by the bhash2 table. A bind
    request on INADDR_ANY (ipv4) or IPV6_ADDR_ANY (ipv6) must be checked
    for conflicts against every socket bound to that port, whatever its
    address, and the bhash table is the only source of port->socket
    associations.
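    A sketch of how a conflict check might choose between the two tables,
    under a deliberately simplified rule (IPv4 only, no SO_REUSEADDR or
    SO_REUSEPORT, no network namespaces; all names hypothetical). A
    wildcard bind has to walk the port-only bhash chain. For a specific
    address, the sketch assumes that looking up both (port, addr) and
    (port, wildcard) in bhash2 is enough, since a wildcard-bound socket is
    hashed under address 0; the commit message does not spell this part
    out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BHASH_SIZE 64    /* toy sizes (hypothetical) */
#define BHASH2_SIZE 4096

struct toy_sock {
	uint16_t port;
	uint32_t rcv_saddr;      /* 0 means INADDR_ANY */
	struct toy_sock *next;   /* bhash chain (port only) */
	struct toy_sock *next2;  /* bhash2 chain (port, address) */
};

static struct toy_sock *bhash[BHASH_SIZE];
static struct toy_sock *bhash2[BHASH2_SIZE];

static unsigned int bhash_fn(uint16_t port) { return port % BHASH_SIZE; }
static unsigned int bhash2_fn(uint16_t port, uint32_t addr) { return (addr ^ port) % BHASH2_SIZE; }

/* Every bound socket is linked into both tables. */
static void toy_bind_add(struct toy_sock *sk)
{
	unsigned int b = bhash_fn(sk->port);
	unsigned int b2 = bhash2_fn(sk->port, sk->rcv_saddr);

	sk->next = bhash[b];
	bhash[b] = sk;
	sk->next2 = bhash2[b2];
	bhash2[b2] = sk;
}

static bool bhash2_has(uint16_t port, uint32_t addr)
{
	for (struct toy_sock *sk = bhash2[bhash2_fn(port, addr)]; sk; sk = sk->next2)
		if (sk->port == port && sk->rcv_saddr == addr)
			return true;
	return false;
}

static bool toy_bind_conflict(uint16_t port, uint32_t addr)
{
	if (addr == 0) {
		/* Wildcard: any socket on this port conflicts, whatever its
		 * address, so the port-only bhash chain must be walked. */
		for (struct toy_sock *sk = bhash[bhash_fn(port)]; sk; sk = sk->next)
			if (sk->port == port)
				return true;
		return false;
	}
	/* Specific address: only (port, addr) and (port, wildcard) matter. */
	return bhash2_has(port, addr) || bhash2_has(port, 0);
}
```

    For example, after binding 10.0.0.1:443, a second bind to 10.0.0.2:443
    passes this check, while a wildcard bind on port 443 is rejected.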

    Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>