    LoongArch: Add SIMD-optimized XOR routines · 75ded18a
    WANG Xuerui authored
    Add LSX and LASX implementations of xor operations, operating on 64
    bytes (one L1 cache line) at a time, for a balance between memory
    utilization and instruction mix. Huacai confirmed that all future
    LoongArch implementations by Loongson (that we care about) will likely also
    feature 64-byte cache lines, and experiments show no throughput
    improvement with further unrolling.
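
To illustrate the 64-bytes-per-iteration structure described above, here is a minimal portable C sketch. It is not the code from this patch: the function name `xor_64b_blocks` is hypothetical, and plain 64-bit words stand in for the vector registers. With 128-bit LSX registers one such block would take four vector loads per source, and with 256-bit LASX registers two. The sketch assumes the length is a multiple of 64 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): XOR the contents of p2 into p1,
 * processing one 64-byte block (one L1 cache line on the 3A5000) per
 * loop iteration.
 */
static void xor_64b_blocks(size_t bytes, unsigned char *p1,
                           const unsigned char *p2)
{
    /* Assumption of this sketch: callers pass whole 64-byte blocks. */
    assert(bytes % 64 == 0);

    for (size_t off = 0; off < bytes; off += 64) {
        uint64_t a[8], b[8];

        /* memcpy avoids alignment and aliasing issues in portable C. */
        memcpy(a, p1 + off, 64);
        memcpy(b, p2 + off, 64);

        /* Eight 64-bit XORs cover the full 64-byte block. */
        for (int i = 0; i < 8; i++)
            a[i] ^= b[i];

        memcpy(p1 + off, a, 64);
    }
}
```

A real SIMD version would replace the inner loop with vector loads, XORs, and stores. The commit message reports that unrolling beyond one cache line per iteration gave no measurable throughput gain.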
    
    Performance numbers measured during system boot on a 3A5000 @ 2.5GHz:
    
    > 8regs           : 12702 MB/sec
    > 8regs_prefetch  : 10920 MB/sec
    > 32regs          : 12686 MB/sec
    > 32regs_prefetch : 10918 MB/sec
    > lsx             : 17589 MB/sec
    > lasx            : 26116 MB/sec

    Acked-by: Song Liu <song@kernel.org>
    Signed-off-by: WANG Xuerui <git@xen0n.name>
    Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
xor_simd.h 1.68 KB