    riscv: Optimize crc32 with Zbc extension · a43fe27d
    Xiao Wang authored
    As suggested by the B-ext spec, the Zbc (carry-less multiplication)
    instructions can be used to accelerate CRC calculations. crc32 is
    currently the most widely used CRC function in the kernel, so this patch
    focuses on optimizing just the crc32 APIs.
    
    Like the current table-lookup based optimization, the Zbc based
    optimization processes a large stride per iteration of the CRC
    calculation loop. In addition, it avoids the memory access latency of
    the table lookups and reduces the memory footprint.
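
    For readers unfamiliar with carry-less multiplication, the following is
    a minimal portable sketch of what the rv64 clmul and clmulh instructions
    compute, written as plain C loops. The helper names clmul64 and clmulh64
    are illustrative only and are not taken from this patch, which uses the
    instructions directly.

```c
#include <stdint.h>

/*
 * Low 64 bits of the carry-less (GF(2) polynomial) product of a and b.
 * Software model of the rv64 Zbc `clmul` instruction; name is illustrative.
 */
static uint64_t clmul64(uint64_t a, uint64_t b)
{
	uint64_t r = 0;

	for (int i = 0; i < 64; i++)
		if ((b >> i) & 1)
			r ^= a << i;	/* XOR instead of add: no carries */
	return r;
}

/*
 * High 64 bits of the same 128-bit carry-less product.
 * Software model of the rv64 Zbc `clmulh` instruction; name is illustrative.
 */
static uint64_t clmulh64(uint64_t a, uint64_t b)
{
	uint64_t r = 0;

	/* i starts at 1: bit 0 of b cannot contribute to the high half */
	for (int i = 1; i < 64; i++)
		if ((b >> i) & 1)
			r ^= a >> (64 - i);
	return r;
}
```

    For example, clmul64(3, 3) yields 5, because (x + 1)^2 = x^2 + 1 when
    coefficients are added modulo 2.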
    
    If the Zbc extension is not supported in the runtime environment, the
    table-lookup based implementation serves as the fallback via the
    alternatives mechanism.
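
    The fallback idea can be sketched in user-space C as a runtime check.
    This is only an analogy under stated assumptions: the kernel patches
    code at boot through the alternatives mechanism rather than testing a
    flag on every call, and cpu_has_zbc, crc32_le_generic, crc32_le_zbc and
    crc32_le_dispatch are hypothetical names. The "Zbc" path here is a
    stand-in that reuses the generic bitwise loop, which also stands in for
    the table-lookup code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature flag; the kernel uses boot-time code patching. */
static bool cpu_has_zbc;

/* Bitwise reference for the reflected CRC32 polynomial 0xEDB88320. */
static uint32_t crc32_le_generic(uint32_t crc, const unsigned char *p,
				 size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1));
	}
	return crc;
}

/* Stand-in for an accelerated path; must return identical results. */
static uint32_t crc32_le_zbc(uint32_t crc, const unsigned char *p,
			     size_t len)
{
	return crc32_le_generic(crc, p, len);
}

/* Pick the fast path when available, otherwise fall back. */
static uint32_t crc32_le_dispatch(uint32_t crc, const unsigned char *p,
				  size_t len)
{
	if (cpu_has_zbc)
		return crc32_le_zbc(crc, p, len);
	return crc32_le_generic(crc, p, len);
}
```

    Whichever path is chosen, callers must see the same CRC value; only
    the cost of computing it differs.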
    
    Inspecting the vmlinux built by gcc v12.2.0 with the default optimization
    level (-O2) shows the following instruction count changes for each 8-byte
    stride in the CRC32 loop:
    
    rv64: crc32_be (54->31), crc32_le (54->13), __crc32c_le (54->13)
    rv32: crc32_be (50->32), crc32_le (50->16), __crc32c_le (50->16)
    
    The compile target CPU is little endian, so the crc32_be API needs extra
    byte swapping. As a result, its instruction count reduction is not as
    significant as in the *_le cases.
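
    The extra byte-swapping work can be illustrated with a small portable
    sketch: to consume 8 bytes most-significant-byte first on a little
    endian host, the loaded word has to be reversed. The helpers bswap64
    and load_be64 are illustrative names, not code from this patch.

```c
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 64-bit word in three swap stages. */
static uint64_t bswap64(uint64_t x)
{
	x = ((x & 0x00ff00ff00ff00ffULL) << 8) |
	    ((x >> 8) & 0x00ff00ff00ff00ffULL);
	x = ((x & 0x0000ffff0000ffffULL) << 16) |
	    ((x >> 16) & 0x0000ffff0000ffffULL);
	return (x << 32) | (x >> 32);
}

/* Load 8 bytes so that p[0] ends up in the most significant byte. */
static uint64_t load_be64(const unsigned char *p)
{
	const uint16_t probe = 1;
	uint64_t v;

	memcpy(&v, p, sizeof(v));	/* alignment-safe native load */
	/* Little endian hosts pay for the swap; big endian hosts do not. */
	if (*(const unsigned char *)&probe)
		v = bswap64(v);
	return v;
}
```

    The *_le variants can use the native load as-is on a little endian
    CPU, which is one reason their loops end up shorter.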
    
    This patch was tested on a QEMU VM with the kernel CRC32 selftest for both
    rv64 and rv32. Running the CRC32 selftest on real hardware (SpacemiT K1)
    with the Zbc extension shows 65% and 125% performance improvements on
    crc32_test() and crc32c_test(), respectively.
    
    Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
    Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
    Link: https://lore.kernel.org/r/20240621054707.1847548-1-xiao.w.wang@intel.com
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
crc32.c 7.1 KB