• Johannes Berg's avatar
    iwlwifi: fix TX queue race · 3995bd93
    Johannes Berg authored
    I had a problem on 4965 hardware (well, probably other hardware too,
    but others don't survive my stress testing right now, unfortunately)
    where the driver was sending invalid commands to the device, but no
    such thing could be seen from the driver's point of view. I could
    reproduce this fairly easily by sending multiple TCP streams with
    iperf on different TIDs, though sometimes a single iperf stream was
    sufficient. It even happened with a single core, but I have forced
    preemption turned on.
    
    The culprit was a queue overrun, where we advanced the queue's write
    pointer over the read pointer. After careful analysis I've come to
    the conclusion that the cause is a race condition between iwlwifi
    and mac80211.
    
    mac80211, of course, checks whether the queue is stopped, before
    transmitting a frame. This effectively looks like this:
    
            lock(queues)
            if (stopped(queue)) {
                    unlock(queues)
                    return busy;
    	}
            unlock(queues)
            ...             <-- this place will be important
    			    there is some more code here
            drv_tx(frame)
    
    The driver, on the other hand, can stop and start queues, which does
    
            lock(queues)
            mark_running/stopped(queue)
            unlock(queues)
    
    	[if marked running: wake up tasklet to send pending frames]
    
    Now, however, once the driver starts the queue, mac80211 can see that
    and end up at the marked place above, at which point for some reason the
    driver seems to stop the queue again (I don't understand that) and then
    we end up transmitting while the queue is actually full.
    
    Now, this shouldn't actually matter much, but for some reason I've seen
    it happen multiple times in a row and the queue actually overflows, at
    which point the queue bites itself in the tail and things go completely
    wrong.
    
    This patch fixes this by just dropping the packet should this have
    happened, and making the lock in iwlwifi cover everything so iwlwifi
    can't race against itself (dropping the lock there might make it more
    likely, but it did seem to happen without that too).
    
    Since we can't hold the lock across drv_tx() above, I see no way to fix
    this in mac80211, but I also don't understand why I haven't seen this
    before -- maybe I just never stress tested it this badly.
    
    With this patch, the device has survived many minutes of simultanously
    sending two iperf streams on different TIDs with combined throughput
    of about 60 Mbps.
    Signed-off-by: default avatarJohannes Berg <johannes@sipsolutions.net>
    Signed-off-by: default avatarReinette Chatre <reinette.chatre@intel.com>
    Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
    3995bd93
iwl-tx.c 41.8 KB