-
Tariq Toukan authored
Several small performance improvements in RX datapath, including: - Compiler branch predictor hints. - Replace a multiplication with a shift operation. - Minimize variables scope. - Write-prefetch for packet header. - Avoid trinary-operator ("?") when value can be preset in a matching branch. - Save a branch by updating RX ring doorbell within mlx4_en_refill_rx_buffers(), which now returns void. Performance tests: Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Single queue no-RSS optimization ON (enable by ethtool -L <interface> rx 1). XDP_DROP packet rate: Same (28.1 Mpps), lower CPU utilization (from ~100% to ~92%). Drop packets in TC: ------------------------------------- | Before | After | Gain | IPv4 | 4.14 Mpps | 4.18 Mpps | 1% | ------------------------------------- XDP_TX packet rate: ------------------------------------- | Before | After | Gain | IPv4 | 10.1 Mpps | 10.3 Mpps | 2% | IPv6 | 10.1 Mpps | 10.3 Mpps | 2% | ------------------------------------- Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
9bcee89a