• Neil Horman's avatar
    netpoll: fix race on poll_list resulting in garbage entry · 7b363e44
    Neil Horman authored
    	A few months back a race was discused between the netpoll napi service
    path, and the fast path through net_rx_action:
    http://kerneltrap.org/mailarchive/linux-netdev/2007/10/16/345470
    
    A patch was submitted for that bug, but I think we missed a case.
    
    Consider the following scenario:
    
    INITIAL STATE
    CPU0 has one napi_struct A on its poll_list
    CPU1 is calling netpoll_send_skb and needs to call poll_napi on the same
    napi_struct A that CPU0 has on its list
    
    
    
    CPU0						CPU1
    net_rx_action					poll_napi
    !list_empty (returns true)			locks poll_lock for A
    						 poll_one_napi
    						  napi->poll
    						   netif_rx_complete
    						    __napi_complete
    						    (removes A from poll_list)
    list_entry(list->next)
    
    
    In the above scenario, net_rx_action assumes that the per-cpu poll_list is
    exclusive to that cpu.  netpoll of course violates that, and because the netpoll
    path can dequeue from the poll list, its possible for CPU0 to detect a non-empty
    list at the top of the while loop in net_rx_action, but have it become empty by
    the time it calls list_entry.  Since the poll_list isn't surrounded by any other
    structure, the returned data from that list_entry call in this situation is
    garbage, and any number of crashes can result based on what exactly that garbage
    is.
    
    Given that its not fasible for performance reasons to place exclusive locks
    arround each cpus poll list to provide that mutal exclusion, I think the best
    solution is modify the netpoll path in such a way that we continue to guarantee
    that the poll_list for a cpu is in fact exclusive to that cpu.  To do this I've
    implemented the patch below.  It adds an additional bit to the state field in
    the napi_struct.  When executing napi->poll from the netpoll_path, this bit will
    be set. When a driver calls netif_rx_complete, if that bit is set, it will not
    remove the napi_struct from the poll_list.  That work will be saved for the next
    iteration of net_rx_action.
    
    I've tested this and it seems to work well.  About the biggest drawback I can
    see to it is the fact that it might result in an extra loop through
    net_rx_action in the event that the device is actually contended for (i.e. the
    netpoll path actually preforms all the needed work no the device, and the call
    to net_rx_action winds up doing nothing, except removing the napi_struct from
    the poll_list.  However I think this is probably a small price to pay, given
    that the alternative is a crash.
    Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    7b363e44
netpoll.c 19.3 KB