    packet: packet fanout rollover during socket overload · 77f65ebd
    Willem de Bruijn authored
    Changes:
      v3->v2: rebase (no other changes)
              passes selftest
      v2->v1: read f->num_members only once
              fix bug: test rollover mode + flag
    
    Minimize packet drop in a fanout group. If one socket is full,
    roll over packets to another from the group. Maintain flow
    affinity during normal load using an rxhash fanout policy, while
    dispersing unexpected traffic storms that hit a single cpu, such
    as spoofed-source DoS flows. Rollover breaks affinity for flows
    arriving at saturated sockets during those conditions.
    
    The patch adds a fanout policy ROLLOVER that rotates between sockets,
    filling each socket before moving to the next. It also adds a fanout
    flag ROLLOVER. If passed along with any other fanout policy, the
    primary policy is applied until the chosen socket is full. Then,
    rollover selects another socket, to delay packet drop until the
    entire system is saturated.
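
    From userspace, the policy and the flag would both be requested through
    the existing PACKET_FANOUT socket option. The sketch below is illustrative
    and not part of the patch. It assumes the constants PACKET_FANOUT_ROLLOVER
    and PACKET_FANOUT_FLAG_ROLLOVER from <linux/if_packet.h>, and the helper
    names fanout_arg() and join_fanout() are hypothetical. The group id goes
    in the low 16 bits of the option value, the policy and flags in the high
    16 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

#ifndef SOL_PACKET
#define SOL_PACKET 263	/* fallback for libcs that do not expose it */
#endif

/* Pack the group id (low 16 bits) and policy|flags (high 16 bits). */
int fanout_arg(uint16_t group_id, uint16_t type_and_flags)
{
	return (int)((uint32_t)group_id | ((uint32_t)type_and_flags << 16));
}

/*
 * Join an already-bound packet socket to a fanout group.
 * Returns the result of setsockopt(): 0 on success, -1 on error.
 */
int join_fanout(int fd, uint16_t group_id, uint16_t type_and_flags)
{
	int arg = fanout_arg(group_id, type_and_flags);

	assert(fd >= 0);
	return setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg));
}
```

    For example, join_fanout(fd, 42, PACKET_FANOUT_HASH |
    PACKET_FANOUT_FLAG_ROLLOVER) would keep rxhash flow affinity until the
    chosen socket is full, while join_fanout(fd, 42, PACKET_FANOUT_ROLLOVER)
    would select the pure rollover policy.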
    
    Probing sockets is not free. Selecting the last used socket, as
    rollover does, is a greedy approach that maximizes chance of
    success, at the cost of extreme load imbalance. In practice, with
    sufficiently long queues to absorb bursts, sockets are drained in
    parallel and load balance looks uniform in `top`.
    
    To avoid contention, the patch scales counters with the number of
    sockets and accesses them lock-free. Values are bounds-checked to
    ensure correctness.
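
    The probing and the per-socket counters described above can be sketched
    as a small userspace simulation. This is not the kernel code: struct
    sim_group, sim_has_room() and sim_select() are hypothetical names, and
    queue depth stands in for the real receive-buffer check. Each socket
    keeps its own "last rollover target" hint, which is read without a lock
    and clamped before use in case it is stale.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SOCKS 8	/* hypothetical cap for this sketch */

struct sim_group {
	unsigned int num;		/* sockets currently in the group */
	unsigned int queued[MAX_SOCKS];	/* current queue depth per socket */
	unsigned int limit[MAX_SOCKS];	/* queue capacity per socket */
	unsigned int next[MAX_SOCKS];	/* per-socket hint: last rollover target */
};

bool sim_has_room(const struct sim_group *g, unsigned int i)
{
	return g->queued[i] < g->limit[i];
}

/*
 * idx is the socket chosen by the primary policy. If it has room, keep it.
 * Otherwise probe the other sockets, starting at the last one that worked.
 * Returns idx unchanged when every socket is full.
 */
unsigned int sim_select(struct sim_group *g, unsigned int idx)
{
	unsigned int num = g->num;	/* read the member count only once */
	unsigned int i, start;

	assert(num > 0 && num <= MAX_SOCKS && idx < num);

	if (sim_has_room(g, idx))
		return idx;

	/* The hint is unlocked, so bounds-check it before indexing. */
	start = g->next[idx];
	if (start >= num)
		start = num - 1;

	i = start;
	do {
		if (i != idx && sim_has_room(g, i)) {
			g->next[idx] = i;	/* remember for the next packet */
			return i;
		}
		if (++i == num)
			i = 0;
	} while (i != start);

	return idx;
}
```

    Starting the probe at the remembered socket is what makes the approach
    greedy: consecutive overflow packets keep landing on the same socket
    until it fills, and only then does the hint advance.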
    
    Tested using an application with 9 threads pinned to CPUs, one socket
    per thread, and sufficient busywork per packet operation to limit each
    thread to handling 32 Kpps. When sent a single UDP stream at 500 Kpps,
    a FANOUT_CPU setup processes 32 Kpps in total without this patch and
    270 Kpps with the patch. Tested with read() and with a packet ring
    (V1).
    
    Also, passes psock_fanout.c unit test added to selftests.
    Signed-off-by: Willem de Bruijn <willemb@google.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>