    mvpp2: recycle buffers

    Use the new recycling API for page_pool.
    In a drop rate test, the packet rate is almost doubled,
    from 1110 Kpps to 2128 Kpps.
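
    The driver-side change boils down to marking skbs whose data pages come
    from a page_pool, so the stack returns those pages to the pool instead of
    releasing them to the page allocator. The snippet below is only an
    illustrative sketch (the helpers around skb_mark_for_recycle() are
    simplified and its exact signature has changed across kernel versions),
    not the actual mvpp2_rx() code:

    /* Illustrative sketch of a page_pool-backed RX path, not the actual
     * mvpp2_rx() code.  skb_mark_for_recycle()'s signature differs between
     * kernel versions.
     */
    #include <linux/mm.h>
    #include <linux/skbuff.h>
    #include <net/page_pool.h>

    static struct sk_buff *rx_build_skb(struct page_pool *pp, void *data,
                                        unsigned int frag_size)
    {
            struct sk_buff *skb;

            skb = build_skb(data, frag_size);
            if (!skb) {
                    /* No skb: hand the buffer straight back to the pool. */
                    page_pool_put_full_page(pp, virt_to_page(data), true);
                    return NULL;
            }

            /* Tag the skb so skb_release_data() recycles the page into the
             * pool instead of unmapping it via page_pool_release_page() and
             * freeing it through the page allocator.
             */
            skb_mark_for_recycle(skb);

            return skb;
    }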
    
    perf top on a stock system shows:
    
    Overhead  Shared Object     Symbol
      34.88%  [kernel]          [k] page_pool_release_page
       8.06%  [kernel]          [k] free_unref_page
       6.42%  [mvpp2]           [k] mvpp2_rx
       6.07%  [kernel]          [k] eth_type_trans
       5.18%  [kernel]          [k] __netif_receive_skb_core
       4.95%  [kernel]          [k] build_skb
       4.88%  [kernel]          [k] kmem_cache_free
       3.97%  [kernel]          [k] kmem_cache_alloc
       3.45%  [kernel]          [k] dev_gro_receive
       2.73%  [kernel]          [k] page_frag_free
       2.07%  [kernel]          [k] __alloc_pages_bulk
       1.99%  [kernel]          [k] arch_local_irq_save
       1.84%  [kernel]          [k] skb_release_data
       1.20%  [kernel]          [k] netif_receive_skb_list_internal
    
    With packet rate stable at around 1110 Kpps:
    
    tx: 0 bps 0 pps rx: 532.7 Mbps 1110 Kpps
    tx: 0 bps 0 pps rx: 532.6 Mbps 1110 Kpps
    tx: 0 bps 0 pps rx: 532.4 Mbps 1109 Kpps
    tx: 0 bps 0 pps rx: 532.1 Mbps 1109 Kpps
    tx: 0 bps 0 pps rx: 531.9 Mbps 1108 Kpps
    tx: 0 bps 0 pps rx: 531.9 Mbps 1108 Kpps
    
    And this is the same output with recycling enabled:
    
    Overhead  Shared Object     Symbol
      12.91%  [kernel]          [k] eth_type_trans
      12.54%  [mvpp2]           [k] mvpp2_rx
       9.67%  [kernel]          [k] build_skb
       9.63%  [kernel]          [k] __netif_receive_skb_core
       8.44%  [kernel]          [k] page_pool_put_page
       8.07%  [kernel]          [k] kmem_cache_free
       7.79%  [kernel]          [k] kmem_cache_alloc
       6.86%  [kernel]          [k] dev_gro_receive
       3.19%  [kernel]          [k] skb_release_data
       2.41%  [kernel]          [k] netif_receive_skb_list_internal
       2.18%  [kernel]          [k] page_pool_refill_alloc_cache
       1.76%  [kernel]          [k] napi_gro_receive
       1.61%  [kernel]          [k] kfree_skb
       1.20%  [kernel]          [k] dma_sync_single_for_device
       1.16%  [mvpp2]           [k] mvpp2_poll
       1.12%  [mvpp2]           [k] mvpp2_read
    
    With packet rate above 2100 Kpps:
    
    tx: 0 bps 0 pps rx: 1021 Mbps 2128 Kpps
    tx: 0 bps 0 pps rx: 1021 Mbps 2127 Kpps
    tx: 0 bps 0 pps rx: 1021 Mbps 2128 Kpps
    tx: 0 bps 0 pps rx: 1021 Mbps 2128 Kpps
    tx: 0 bps 0 pps rx: 1022 Mbps 2128 Kpps
    tx: 0 bps 0 pps rx: 1022 Mbps 2129 Kpps
    
    The major performance increase is explained by the fact that the most
    CPU-consuming functions (page_pool_release_page, page_frag_free and
    free_unref_page) are no longer called on a per-packet basis.
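
    In other words, with the recycle bit set on the skb, the free path hands
    the data page back to the pool's cache (the page_pool_put_page entry in
    the second profile) instead of unmapping it and returning it to the page
    allocator for every packet (the page_pool_release_page, page_frag_free
    and free_unref_page entries in the first). A rough sketch of the two
    paths, built only around the symbols visible above and not the actual
    skbuff.c code:

    /* Rough, illustrative sketch of the skb data-page free path. */
    #include <linux/mm.h>
    #include <linux/skbuff.h>
    #include <net/page_pool.h>

    static void free_rx_page(struct page_pool *pp, struct sk_buff *skb,
                             struct page *page)
    {
            if (skb->pp_recycle) {
                    /* Recycling: the page goes back to the pool cache and is
                     * reused on the next RX refill (page_pool_put_page in
                     * the profile above).
                     */
                    page_pool_put_full_page(pp, page, false);
                    return;
            }

            /* Stock path: unmap the page and release it to the page
             * allocator on every packet (page_pool_release_page,
             * page_frag_free, free_unref_page in the profile).
             */
            page_pool_release_page(pp, page);
            page_frag_free(page_address(page));
    }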
    
    The test was done by sending 64-byte Ethernet frames with an invalid
    ethertype to the macchiatobin, so the packets are dropped early in the
    RX path.
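
    The commit does not name the traffic generator used; one minimal way to
    produce such frames is a userspace AF_PACKET sender along these lines
    (the interface name and the ethertype value are arbitrary placeholders):

    /* Minimal illustrative sender: floods 60-byte frames (64 bytes on the
     * wire with FCS) carrying an ethertype no protocol handler claims, so
     * the receiver drops them early in its RX path.  Needs CAP_NET_RAW.
     */
    #include <linux/if_packet.h>
    #include <net/ethernet.h>
    #include <net/if.h>
    #include <string.h>
    #include <sys/socket.h>

    int main(void)
    {
            unsigned char frame[60] = { 0 };
            struct sockaddr_ll sll = { 0 };
            int fd = socket(AF_PACKET, SOCK_RAW, 0);

            if (fd < 0)
                    return 1;

            sll.sll_family = AF_PACKET;
            sll.sll_ifindex = if_nametoindex("eth0");  /* TX interface, adjust */
            sll.sll_halen = ETH_ALEN;

            memset(frame, 0xff, ETH_ALEN);             /* broadcast destination */
            frame[12] = 0xde;                          /* arbitrary, unhandled */
            frame[13] = 0xad;                          /* ethertype 0xdead     */

            for (;;)
                    sendto(fd, frame, sizeof(frame), 0,
                           (struct sockaddr *)&sll, sizeof(sll));
    }
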
    Signed-off-by: Matteo Croce <mcroce@microsoft.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>