    ath11k: optimize RX path latency · 293cb583
    John Crispin authored
    This patch drops ath11k_hal_rx_parse_dst_ring_desc(). This function was
    creating a huge amount of load, which led to significant latency when
    processing data in the RX path.
    
    Pinning the processing to a specific core and running perf top, we get
    the following output when running HE80 at a fixed bandwidth of 1 Gbit/s.
    
    with patch
        19.19%  [ath11k]       [k] ath11k_dp_process_rx
         5.02%  [ath11k]       [k] ath11k_dp_rx_tid_del_func
         4.39%  [kernel]       [k] v7_dma_inv_range
         4.15%  [kernel]       [k] __slab_alloc.constprop.1
         4.03%  [kernel]       [k] dev_gro_receive
         3.86%  [kernel]       [k] tcp_gro_receive
         3.07%  [ip_tables]    [k] ipt_do_table
         2.96%  [kernel]       [k] dma_cache_maint_page
    
    without patch
        21.64%  [ath11k]       [k] ath11k_hal_rx_parse_dst_ring_desc
        10.80%  [ath11k]       [k] ath11k_dp_process_rx
         3.77%  [kernel]       [k] v7_dma_inv_range
         3.48%  [kernel]       [k] dev_gro_receive
         3.32%  [ath11k]       [k] ath11k_dp_rx_tid_del_func
         3.17%  [mac80211]     [k] ieee80211_rx_napi
         2.70%  [kernel]       [k] dma_cache_maint_page
         2.65%  [mac80211]     [k] ieee80211_sta_ps_transition
    
    When removing the bandwidth limit and rerunning the test, we see an
    overall throughput improvement of 300-400 Mbit/s when running 4x4 HE80.
    
    Signed-off-by: Shashidhar Lakkavalli <slakkavalli@datto.com>
    Signed-off-by: John Crispin <john@phrozen.org>
    Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
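
The commit message does not show the code that replaces ath11k_hal_rx_parse_dst_ring_desc(), so the following is only a rough sketch of the general idea behind this kind of optimization: instead of eagerly decoding every field of a ring descriptor into an intermediate structure for each packet, the hot path reads just the fields it needs. The descriptor layout, masks, and every demo_* name below are hypothetical and do not correspond to the real ath11k HAL definitions.

```c
#include <stdint.h>

/* Hypothetical descriptor layout, invented for illustration only.
 * It is NOT the real ath11k destination ring descriptor. */
struct demo_rx_desc {
    uint32_t info0;        /* bits 0-15: MSDU length, bits 16-19: TID,
                            * bit 20: first MSDU, bit 21: last MSDU */
    uint32_t info1;        /* bits 0-15: peer id */
    uint32_t buf_addr_lo;  /* buffer address, low 32 bits */
    uint32_t buf_addr_hi;  /* buffer address, high bits */
};

#define DEMO_INFO0_MSDU_LEN_MASK  0x0000ffffu
#define DEMO_INFO0_TID_MASK       0x000f0000u
#define DEMO_INFO0_TID_SHIFT      16
#define DEMO_INFO0_FIRST_MSDU     0x00100000u
#define DEMO_INFO0_LAST_MSDU      0x00200000u
#define DEMO_INFO1_PEER_ID_MASK   0x0000ffffu

/* Intermediate structure filled by the eager variant (hypothetical). */
struct demo_rx_meta {
    uint16_t msdu_len;
    uint8_t  tid;
    uint8_t  first_msdu;
    uint8_t  last_msdu;
    uint16_t peer_id;
    uint64_t buf_addr;
};

/* Eager variant: decodes every field on every packet, whether or not
 * the caller ends up using it. */
void demo_parse_full(const struct demo_rx_desc *desc,
                     struct demo_rx_meta *meta)
{
    meta->msdu_len   = (uint16_t)(desc->info0 & DEMO_INFO0_MSDU_LEN_MASK);
    meta->tid        = (uint8_t)((desc->info0 & DEMO_INFO0_TID_MASK) >>
                                 DEMO_INFO0_TID_SHIFT);
    meta->first_msdu = (desc->info0 & DEMO_INFO0_FIRST_MSDU) != 0;
    meta->last_msdu  = (desc->info0 & DEMO_INFO0_LAST_MSDU) != 0;
    meta->peer_id    = (uint16_t)(desc->info1 & DEMO_INFO1_PEER_ID_MASK);
    meta->buf_addr   = ((uint64_t)desc->buf_addr_hi << 32) |
                       desc->buf_addr_lo;
}

/* Lean variant: small accessors so the hot path extracts only the
 * fields it actually needs, with no intermediate copy. */
static inline uint16_t demo_msdu_len(const struct demo_rx_desc *desc)
{
    return (uint16_t)(desc->info0 & DEMO_INFO0_MSDU_LEN_MASK);
}

static inline uint64_t demo_buf_addr(const struct demo_rx_desc *desc)
{
    return ((uint64_t)desc->buf_addr_hi << 32) | desc->buf_addr_lo;
}
```

Whether the actual patch follows this exact pattern cannot be confirmed from the message alone. The perf output above shows only that the parsing function accounted for roughly a fifth of the samples before it was dropped.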