    net: add additional lock to qdisc to increase throughput · 79640a4c
    Eric Dumazet authored
    When many cpus compete for sending frames on a given qdisc, the qdisc
    spinlock suffers from very high contention.
    
    The cpu owning the __QDISC_STATE_RUNNING bit has the same priority as
    every other cpu when acquiring the lock, and cannot dequeue packets fast
    enough, since it must wait for this lock for each dequeued packet.
    
    One solution to this problem is to force all cpus to spin on a second
    lock before trying to get the main lock, when/if they see
    __QDISC_STATE_RUNNING already set.
    
    The owning cpu then competes with at most one other cpu for the main
    lock, allowing for a higher dequeueing rate.
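The two-lock scheme described above can be sketched in user space. This is only an illustration under stated assumptions, not the kernel code from this patch: `struct toy_qdisc`, `toy_enqueue()` and `toy_dequeue()` are hypothetical names, pthread mutexes stand in for kernel spinlocks, and an `atomic_bool` stands in for the `__QDISC_STATE_RUNNING` bit.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a qdisc; all names are illustrative. */
struct toy_qdisc {
    pthread_mutex_t root_lock;  /* main lock protecting the queue */
    pthread_mutex_t busylock;   /* second lock, taken only under contention */
    atomic_bool running;        /* stand-in for __QDISC_STATE_RUNNING */
    int qlen;                   /* number of queued packets */
};

static void toy_qdisc_init(struct toy_qdisc *q)
{
    pthread_mutex_init(&q->root_lock, NULL);
    pthread_mutex_init(&q->busylock, NULL);
    atomic_init(&q->running, false);
    q->qlen = 0;
}

/* Producer path: queue one packet. */
static void toy_enqueue(struct toy_qdisc *q)
{
    /* Plain load, no read-modify-write: when nobody is dequeuing,
     * the fast path never touches busylock. */
    bool contended = atomic_load_explicit(&q->running, memory_order_relaxed);

    if (contended)
        pthread_mutex_lock(&q->busylock);   /* contended producers queue up here */

    /* Among producers that took busylock, only one at a time gets here. */
    pthread_mutex_lock(&q->root_lock);
    q->qlen++;
    pthread_mutex_unlock(&q->root_lock);

    if (contended)
        pthread_mutex_unlock(&q->busylock); /* released as late as possible */
}

/* Owner path: dequeue one packet, taking only the main lock. */
static bool toy_dequeue(struct toy_qdisc *q)
{
    bool got = false;

    pthread_mutex_lock(&q->root_lock);
    if (q->qlen > 0) {
        q->qlen--;
        got = true;
    }
    pthread_mutex_unlock(&q->root_lock);
    return got;
}
```

In this sketch the thread that sets `running` and calls `toy_dequeue()` in a loop never takes `busylock`, so it contends for `root_lock` with at most one producer that saw the flag set, plus any producer that sampled the flag before it was set.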
    
    Based on a previous patch from Alexander Duyck. I added the heuristic to
    avoid the atomic in the fast path, and put the new lock far away from the
    cache line used by the dequeue worker. Also try to release the busylock
    as late as possible.
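The cache-line placement mentioned above can be illustrated with a small layout sketch. It assumes a 64-byte cache line and uses a hypothetical `struct toy_layout` with plain integers in place of real locks; the kernel uses its own alignment annotations, which are not reproduced here.

```c
#include <stdalign.h>
#include <stddef.h>

#define TOY_CACHE_LINE 64  /* assumed cache line size, not queried from hardware */

/* Hypothetical layout; field names and types are illustrative only. */
struct toy_layout {
    /* Fields the dequeue worker touches for every packet. */
    int root_lock;
    int qlen;
    unsigned long state;

    /* Contended producers spin on this field, so it is pushed onto
     * its own cache line, away from the fields above. */
    alignas(TOY_CACHE_LINE) int busylock;
};

/* Compile-time check that busylock and state sit on different lines. */
_Static_assert(offsetof(struct toy_layout, busylock) / TOY_CACHE_LINE !=
               offsetof(struct toy_layout, state) / TOY_CACHE_LINE,
               "busylock shares a cache line with dequeue-side fields");
```

With this layout, producers spinning on `busylock` do not repeatedly invalidate the line holding the fields the dequeue worker reads and writes.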
    
    Tests with the following script gave a boost from ~50.000 pps to ~600.000
    pps on a dual quad core machine (E5450 @3.00GHz), tg3 driver.
    (A single netperf flow can reach ~800.000 pps on this platform)
    
    for j in `seq 0 3`; do
      for i in `seq 0 7`; do
        netperf -H 192.168.0.1 -t UDP_STREAM -l 60 -N -T $i -- -m 6 &
      done
    done

    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>