    tg3: Improve small packet performance · f65aac16
    Matt Carlson authored
    tg3_tx_avail() is called twice in the normal tg3_start_xmit() path
    (see illustration below), and each call executes smp_mb().  The full
    memory barrier is only necessary when racing with tx completion.
    We can speed up the tx path by replacing smp_mb() in tg3_tx_avail()
    with a compiler barrier, which forces the compiler to reload
    tx_prod and tx_cons from memory.
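
As a rough userspace illustration of that idea (not the driver code itself), the sketch below computes free ring slots behind a compiler-only barrier. The struct layout, RING_SIZE, and the compiler_barrier() macro are assumptions made for this example; the macro uses GCC/Clang inline-asm syntax.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 512u /* hypothetical ring size; must be a power of two */

/* Hypothetical stand-in for the driver's per-queue state. */
struct tx_ring {
    uint32_t tx_prod;    /* next slot the transmit path will fill */
    uint32_t tx_cons;    /* next slot the completion path will reclaim */
    uint32_t tx_pending; /* maximum descriptors allowed in flight */
};

/*
 * Compiler-only barrier (GCC/Clang syntax). It emits no CPU instruction;
 * it only stops the compiler from reusing values it cached in registers,
 * so the loads below are issued again on every call.
 */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

static inline uint32_t tx_avail(const struct tx_ring *r)
{
    assert(r->tx_pending <= RING_SIZE);
    compiler_barrier();
    /* In-flight count is (prod - cons) modulo the ring size. */
    return r->tx_pending - ((r->tx_prod - r->tx_cons) & (RING_SIZE - 1));
}
```

The masked subtraction keeps the in-flight count correct when tx_prod has wrapped past the end of the ring and tx_cons has not.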
    
    In the race condition between tg3_start_xmit() and tg3_tx(),
    we have the following situation:
    
    tg3_start_xmit()                       tg3_tx()
        if (!tg3_tx_avail())
            BUG();
    
        ...
    
        if (!tg3_tx_avail())
            netif_tx_stop_queue();         update_tx_index();
            smp_mb();                      smp_mb();
            if (tg3_tx_avail())            if (netif_tx_queue_stopped() &&
                netif_tx_wake_queue();         tg3_tx_avail())
    
    With smp_mb() removed from tg3_tx_avail(), we need to add smp_mb() to
    tg3_start_xmit() as shown above so that netif_tx_stop_queue() is
    ordered before the tg3_tx_avail() recheck of the ring indices.  If the
    two are not strictly ordered, the tx queue can be stopped forever.
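
The stop-then-recheck pattern in the illustration can be sketched in portable C11, with atomic_thread_fence(memory_order_seq_cst) standing in for smp_mb(). This is a simplified model under stated assumptions, not the driver's implementation: struct txq, xmit_post(), tx_complete(), and the stop-at-zero threshold are all hypothetical names and choices made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 8u /* hypothetical; power of two, capacity RING_SIZE - 1 */

/* Hypothetical queue state shared by the transmit and completion paths. */
struct txq {
    atomic_uint prod;    /* advanced by the transmit path */
    atomic_uint cons;    /* advanced by the completion path */
    atomic_bool stopped; /* models the stopped state of the tx queue */
};

static unsigned txq_avail(struct txq *q)
{
    unsigned p = atomic_load_explicit(&q->prod, memory_order_relaxed);
    unsigned c = atomic_load_explicit(&q->cons, memory_order_relaxed);
    return (RING_SIZE - 1) - ((p - c) & (RING_SIZE - 1));
}

/* Transmit side: queue one packet, then stop the queue if the ring is full. */
static void xmit_post(struct txq *q)
{
    assert(txq_avail(q) > 0); /* mirrors the BUG() check on entry */
    atomic_fetch_add_explicit(&q->prod, 1, memory_order_relaxed);

    if (txq_avail(q) == 0) {
        atomic_store_explicit(&q->stopped, true, memory_order_relaxed);
        /* Full fence: the stop must be visible before the indices are reread. */
        atomic_thread_fence(memory_order_seq_cst);
        if (txq_avail(q) != 0) /* completion ran in between: undo the stop */
            atomic_store_explicit(&q->stopped, false, memory_order_relaxed);
    }
}

/* Completion side: reclaim n slots, then wake the queue if it was stopped. */
static void tx_complete(struct txq *q, unsigned n)
{
    atomic_fetch_add_explicit(&q->cons, n, memory_order_relaxed);
    /* Full fence: the index update must be visible before 'stopped' is read. */
    atomic_thread_fence(memory_order_seq_cst);
    if (atomic_load_explicit(&q->stopped, memory_order_relaxed) &&
        txq_avail(q) != 0)
        atomic_store_explicit(&q->stopped, false, memory_order_relaxed);
}
```

With a fence on both sides, at least one path is guaranteed to observe the other's write: either the transmit path sees the freed slots on its recheck, or the completion path sees the stopped flag and wakes the queue. Dropping either fence allows both to miss, which is the stuck-queue case described above.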
    
    This improves performance by about 3% with 2 ports running
    bi-directional 64-byte packets.
    Reviewed-by: Benjamin Li <benli@broadcom.com>
    Signed-off-by: Michael Chan <mchan@broadcom.com>
    Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
tg3.c 395 KB