[NET]: Close NETIF_F_LLTX race conditions.
When drivers other than loopback were using the LLTX
feature, a race window was present. While sending
queued packets, the packet scheduler layer drops the
queue lock, then calls directly into the driver's xmit
handler. The driver then grabs its private TX lock
and goes to work.
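For reference, the LLTX fast path in the packet scheduler looks
roughly like this (a hedged sketch of the 2.6-era qdisc_restart(),
not the exact code):

    spin_unlock(&dev->queue_lock);         /* race window opens */
    ret = dev->hard_start_xmit(skb, dev);  /* driver takes priv->tx_lock */
    spin_lock(&dev->queue_lock);           /* race window closes */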
However, as soon as we've dropped the queue lock another
thread doing TX processing for that card can execute
a netif_stop_queue() due to the TX queue filling up.
This race window causes problems because a properly coded
driver should never end up in its ->hard_start_xmit()
handler if the queue on the device has been stopped and
we even BUG() trap for this condition in all of the device
drivers. That is how this race window was discovered
by Roland and the Infiniband folks.
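Concretely, the trap that fires looks something like this sketch
of an LLTX driver's xmit handler (the foo_* names and the tx_lock
field are illustrative, not from any particular driver):

    static int foo_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
    {
            struct foo_private *fp = netdev_priv(dev);

            spin_lock(&fp->tx_lock);        /* LLTX: driver-private TX lock */

            /* The queue lock was dropped before we were called, so
             * another CPU may have run netif_stop_queue() in the
             * meantime; that makes this trap reachable even in a
             * correctly written driver. */
            BUG_ON(netif_queue_stopped(dev));

            foo_queue_to_hardware(fp, skb); /* hypothetical helper */
            if (foo_tx_ring_full(fp))       /* hypothetical helper */
                    netif_stop_queue(dev);

            spin_unlock(&fp->tx_lock);
            return 0;
    }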
Various suggestions were made to close this race. One of
them involved holding the queue lock all the way into the
->hard_start_xmit() routine, then having the driver drop
that lock only after taking its private TX lock. This
solution was deemed grotty because it is not wise to put
queueing discipline internals into the device drivers.
The solution taken here, which is based upon ideas from
Stephen Hemminger, is twofold:
1) Leave LLTX around for purely software devices that
need no locking at all for TX processing. The existing
example is loopback, although all tunnel devices could
be converted in this way too (see the flag sketch after
this list).
2) Stop trying to use LLTX for the other devices. Instead
achieve the same goal using a different mechanism.
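For a device in category #1, keeping LLTX is just a matter of
advertising the feature flag at setup time, e.g. (sketch):

    dev->features |= NETIF_F_LLTX;  /* core takes no TX lock for us */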
For #2, the thing we were trying to achieve with LLTX
was to eliminate excess locking. We accomplish that
now by letting the device driver use dev->xmit_lock directly
instead of a separate priv->tx_lock of some sort.
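As a sketch of what #2 looks like in a driver (hypothetical foo_*
names), a TX-completion interrupt handler can now synchronize with
the core's transmit path on dev->xmit_lock directly:

    static irqreturn_t foo_interrupt(int irq, void *dev_id, struct pt_regs *regs)
    {
            struct net_device *dev = dev_id;
            unsigned long flags;

            /* Same lock the core holds around ->hard_start_xmit(),
             * so no separate priv->tx_lock is needed any more. */
            spin_lock_irqsave(&dev->xmit_lock, flags);
            foo_reclaim_tx_descriptors(dev);        /* hypothetical */
            if (netif_queue_stopped(dev) && foo_tx_ring_has_room(dev))
                    netif_wake_queue(dev);
            spin_unlock_irqrestore(&dev->xmit_lock, flags);

            return IRQ_HANDLED;
    }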
In order to allow that, we had to turn dev->xmit_lock into
a hardware IRQ disabling lock instead of a BH disabling one.
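In practice that change amounts to replacing the BH-disabling
acquisition with an IRQ-disabling one wherever the lock is taken
from process or softirq context, roughly:

    /* before: safe against softirq TX paths only */
    spin_lock_bh(&dev->xmit_lock);
    /* ... transmit path work ... */
    spin_unlock_bh(&dev->xmit_lock);

    /* after: also safe against the driver's hard IRQ handler */
    spin_lock_irqsave(&dev->xmit_lock, flags);
    /* ... transmit path work ... */
    spin_unlock_irqrestore(&dev->xmit_lock, flags);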
Signed-off-by: David S. Miller <davem@davemloft.net>