ath11k: optimize RX path latency
This patch drops ath11k_hal_rx_parse_dst_ring_desc(). This function was creating a huge amount of load, which lead to a signifcant latency delay when processing data in the RX path. Pegging the processing on a specific core and running perf --top we get the following output when running HE80 at a fixed bandwidth of 1gbit. with patch 19.19% [ath11k] [k] ath11k_dp_process_rx 5.02% [ath11k] [k] ath11k_dp_rx_tid_del_func 4.39% [kernel] [k] v7_dma_inv_range 4.15% [kernel] [k] __slab_alloc.constprop.1 4.03% [kernel] [k] dev_gro_receive 3.86% [kernel] [k] tcp_gro_receive 3.07% [ip_tables] [k] ipt_do_table 2.96% [kernel] [k] dma_cache_maint_page without patch 21.64% [ath11k] [k] ath11k_hal_rx_parse_dst_ring_desc 10.80% [ath11k] [k] ath11k_dp_process_rx 3.77% [kernel] [k] v7_dma_inv_range 3.48% [kernel] [k] dev_gro_receive 3.32% [ath11k] [k] ath11k_dp_rx_tid_del_func 3.17% [mac80211] [k] ieee80211_rx_napi 2.70% [kernel] [k] dma_cache_maint_page 2.65% [mac80211] [k] ieee80211_sta_ps_transition When removing the the bandwidth limit and rerunning the test we see an overall throughput improvement of 3-400mbit when running 4x4 HE80. Signed-off-by: Shashidhar Lakkavalli <slakkavalli@datto.com> Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Showing
Please register or sign in to comment