• Daniel Borkmann's avatar
    veth: Do not record rx queue hint in veth_xmit · 710ad98c
    Daniel Borkmann authored
    Laurent reported that they have seen a significant amount of TCP retransmissions
    at high throughput from applications residing in network namespaces talking to
    the outside world via veths. The drops were seen on the qdisc layer (fq_codel,
    as per systemd default) of the phys device such as ena or virtio_net due to all
    traffic hitting a _single_ TX queue _despite_ multi-queue device. (Note that the
    setup was _not_ using XDP on veths as the issue is generic.)
    
    More specifically, after edbea922 ("veth: Store queue_mapping independently
    of XDP prog presence") which made it all the way back to v4.19.184+,
    skb_record_rx_queue() would set skb->queue_mapping to 1 (given 1 RX and 1 TX
    queue by default for veths) instead of leaving at 0.
    
    This is eventually retained and callbacks like ena_select_queue() will also pick
    single queue via netdev_core_pick_tx()'s ndo_select_queue() once all the traffic
    is forwarded to that device via upper stack or other means. Similarly, for others
    not implementing ndo_select_queue() if XPS is disabled, netdev_pick_tx() might
    call into the skb_tx_hash() and check for prior skb_rx_queue_recorded() as well.
    
    In general, it is a _bad_ idea for virtual devices like veth to mess around with
    queue selection [by default]. Given dev->real_num_tx_queues is by default 1,
    the skb->queue_mapping was left untouched, and so prior to edbea922 the
    netdev_core_pick_tx() could do its job upon __dev_queue_xmit() on the phys device.
    
    Unbreak this and restore prior behavior by removing the skb_record_rx_queue()
    from veth_xmit() altogether.
    
    If the veth peer has an XDP program attached, then it would return the first RX
    queue index in xdp_md->rx_queue_index (unless configured in non-default manner).
    However, this is still better than breaking the generic case.
    
    Fixes: edbea922 ("veth: Store queue_mapping independently of XDP prog presence")
    Fixes: 638264dc
    
     ("veth: Support per queue XDP ring")
    Reported-by: default avatarLaurent Bernaille <laurent.bernaille@datadoghq.com>
    Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
    Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Cc: Toshiaki Makita <toshiaki.makita1@gmail.com>
    Cc: Eric Dumazet <eric.dumazet@gmail.com>
    Cc: Paolo Abeni <pabeni@redhat.com>
    Cc: John Fastabend <john.fastabend@gmail.com>
    Cc: Willem de Bruijn <willemb@google.com>
    Acked-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
    Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
    Acked-by: default avatarToshiaki Makita <toshiaki.makita1@gmail.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    710ad98c
veth.c 42.5 KB