    arm64: Implement optimised checksum routine · 5777eaed
    Robin Murphy authored
    Apparently there exist certain workloads which rely heavily on software
    checksumming, for which the generic do_csum() implementation becomes a
    significant bottleneck. Therefore let's give arm64 its own optimised
    version - for ease of maintenance this foregoes assembly or intrinsics,
    and is thus not actually arm64-specific, but does rely heavily on C
    idioms that translate well to the A64 ISA and the typical load/store
    capabilities of most ARMv8 CPU cores.
    
    The resulting increase in checksum throughput scales nicely with buffer
    size, tending towards 4x for a small in-order core (Cortex-A53), and up
    to 6x or more for an aggressive big core (Ampere eMAG).
    Reported-by: Lingyan Huang <huanglingyan2@huawei.com>
    Tested-by: Lingyan Huang <huanglingyan2@huawei.com>
    Signed-off-by: Robin Murphy <robin.murphy@arm.com>
    Signed-off-by: Will Deacon <will@kernel.org>
csum.c 3.31 KB
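
To illustrate the kind of portable C the commit message describes, here is a minimal sketch of a ones' complement checksum that accumulates 64-bit words with end-around carry and folds the result to 16 bits at the end. It is not the code in csum.c: the names `csum_sketch`, `accumulate` and `fold64` are hypothetical, and the real routine may handle alignment, word width and tail bytes quite differently.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Ones' complement addition: add, then wrap any carry back into bit 0. */
static uint64_t accumulate(uint64_t sum, uint64_t data)
{
    sum += data;
    return sum + (sum < data);  /* (sum < data) is 1 exactly when the add overflowed */
}

/* Fold a 64-bit ones' complement sum down to 16 bits. */
static uint16_t fold64(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffffffffu) + (sum >> 32);  /* absorb the carry from the first fold */
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);       /* absorb the carry again */
    return (uint16_t)sum;
}

/*
 * Hypothetical helper (an assumption for illustration): sum a buffer as
 * native-endian 16-bit words, zero-padding the tail. The result is the
 * un-inverted sum in native byte order.
 */
uint16_t csum_sketch(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t sum = 0, word;

    while (len >= 8) {
        memcpy(&word, p, 8);    /* alignment-safe 64-bit load */
        sum = accumulate(sum, word);
        p += 8;
        len -= 8;
    }
    if (len) {
        word = 0;               /* zero-pad the final partial word */
        memcpy(&word, p, len);
        sum = accumulate(sum, word);
    }
    return fold64(sum);
}
```

Because a ones' complement sum is independent of byte order, storing the returned value to memory in native order yields the same two bytes on little-endian and big-endian machines. Processing eight bytes per addition rather than two is the basic reason a wide-word loop can outrun a naive 16-bit one.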