arch/powerpc/lib/checksum_32.S · f867d556dd8525fe6ff0d22a34249528e590f994 · Kirill Smelkov / linux

powerpc32: optimise csum_partial() loop · f867d556

Christophe Leroy authored Sep 22, 2015

On the 8xx, load latency is 2 cycles and a taken branch also costs
2 cycles. So let's unroll the loop, so that each branch is amortised
over several loads.
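
As a rough illustration of the unrolling idea, here is a minimal C sketch. It is not the kernel's PowerPC assembly: the function names (fold64, sum_words_simple, sum_words_unrolled), the unroll factor of four and the 64-bit accumulator are all assumptions made for this example. It sums 32-bit words with end-around carry, first one word per iteration, then four words per iteration so that one loop branch covers four loads.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fold a 64-bit accumulator into 32 bits,
 * adding the carries back in (end-around carry). */
static uint32_t fold64(uint64_t acc)
{
    acc = (acc & 0xffffffffu) + (acc >> 32);
    acc = (acc & 0xffffffffu) + (acc >> 32);
    return (uint32_t)acc;
}

/* Baseline: one load, one add and one loop branch per word. */
static uint32_t sum_words_simple(const uint32_t *p, size_t nwords, uint32_t sum)
{
    uint64_t acc = sum;

    for (size_t i = 0; i < nwords; i++)
        acc += p[i];
    return fold64(acc);
}

/* Unrolled by four (an assumed factor): the four loads are independent
 * of each other, and the loop branch is taken once per four words. */
static uint32_t sum_words_unrolled(const uint32_t *p, size_t nwords, uint32_t sum)
{
    uint64_t acc = sum;
    size_t i = 0;

    for (; i + 4 <= nwords; i += 4) {
        uint32_t a = p[i];
        uint32_t b = p[i + 1];
        uint32_t c = p[i + 2];
        uint32_t d = p[i + 3];

        acc += a;
        acc += b;
        acc += c;
        acc += d;
    }
    /* Remaining 0..3 words that do not fill a whole unrolled iteration. */
    for (; i < nwords; i++)
        acc += p[i];
    return fold64(acc);
}
```

Both functions return the same value for the same input. Whether a compiler schedules the C version the way the hand-written assembly does is not guaranteed, so treat this only as a picture of the loop structure.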

This patch improves csum_partial() speed by around 10% on both:
* 8xx (single issue processor with parallel execution)
* 83xx (superscalar 6xx processor with dual instruction fetch
  and parallel execution)

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>

checksum_32.S 6.08 KB