    crypto: arm64/aes-ce - implement 5 way interleave for ECB, CBC and CTR · 7367bfeb
    Ard Biesheuvel authored
    This implements 5-way interleaving for ECB, CBC decryption and CTR,
    resulting in a speedup of ~11% on Marvell ThunderX2, which has a
    very deep pipeline and therefore a high issue latency for NEON
    instructions operating on the same registers.
    
    Note that XTS is left alone: implementing 5-way interleave there
    would either involve spilling of the calculated tweaks to the
    stack, or recalculating them after the encryption operation, and
    doing either of those would most likely penalize low end cores.
    
    For ECB, this is not a concern at all, given that we have plenty
    of spare registers. For CTR and CBC decryption, we take advantage
    of the fact that v16 is not used by the CE version of the code
    (which is the only one targeted by the optimization), and so we
    can reshuffle the code a bit and avoid having to spill to memory
    (with the exception of one extra reload in the CBC routine).
    Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
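
    The commit message above relies on the fact that CBC decryption, unlike
    CBC encryption, has no dependency between the block cipher invocations
    of neighbouring blocks: each plaintext block is the decryption of its
    ciphertext block XORed with the previous ciphertext block. The following
    is a minimal C sketch of that data flow with five blocks in flight. It
    is not the arm64 assembly from this commit. The toy_encrypt and
    toy_decrypt transforms are a hypothetical stand-in for AES with no
    security value, and all function names are assumptions made for
    illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK  16  /* block size in bytes */
#define WAYS 5   /* blocks kept in flight per iteration */

/* Toy invertible transform standing in for AES. NOT a real cipher. */
static void toy_encrypt(uint8_t b[BLK], uint8_t key)
{
    for (int i = 0; i < BLK; i++)
        b[i] = (uint8_t)((b[i] ^ key) + i);
}

static void toy_decrypt(uint8_t b[BLK], uint8_t key)
{
    for (int i = 0; i < BLK; i++)
        b[i] = (uint8_t)((uint8_t)(b[i] - i) ^ key);
}

static void xor_block(uint8_t *dst, const uint8_t *src)
{
    for (int i = 0; i < BLK; i++)
        dst[i] ^= src[i];
}

/* Reference CBC encryption: serial, each block needs the previous output. */
static void cbc_encrypt(uint8_t *buf, size_t nblocks,
                        const uint8_t iv_in[BLK], uint8_t key)
{
    uint8_t prev[BLK];

    memcpy(prev, iv_in, BLK);
    for (size_t n = 0; n < nblocks; n++, buf += BLK) {
        xor_block(buf, prev);
        toy_encrypt(buf, key);
        memcpy(prev, buf, BLK);
    }
}

/* In-place CBC decryption, WAYS blocks per iteration, then a serial tail. */
static void cbc_decrypt_5way(uint8_t *buf, size_t nblocks,
                             const uint8_t iv_in[BLK], uint8_t key)
{
    uint8_t iv[BLK];

    assert(buf != NULL && iv_in != NULL);
    memcpy(iv, iv_in, BLK);

    while (nblocks >= WAYS) {
        uint8_t ct[WAYS][BLK];

        /* Keep the ciphertext: it is still needed for chaining. */
        memcpy(ct, buf, sizeof ct);

        /* These WAYS decryptions are mutually independent. */
        for (int w = 0; w < WAYS; w++)
            toy_decrypt(buf + w * BLK, key);

        /* Chain: XOR each result with the previous ciphertext (or IV). */
        for (int w = 0; w < WAYS; w++)
            xor_block(buf + w * BLK, w ? ct[w - 1] : iv);

        memcpy(iv, ct[WAYS - 1], BLK);
        buf += WAYS * BLK;
        nblocks -= WAYS;
    }

    for (; nblocks > 0; nblocks--, buf += BLK) {
        uint8_t ct[BLK];

        memcpy(ct, buf, BLK);
        toy_decrypt(buf, key);
        xor_block(buf, iv);
        memcpy(iv, ct, BLK);
    }
}

/* Round-trips 12 blocks (two 5-way groups plus a 2-block tail). */
static int cbc_roundtrip_selftest(void)
{
    enum { NBLOCKS = 12 };
    uint8_t iv[BLK], plain[NBLOCKS * BLK], buf[NBLOCKS * BLK];

    for (int i = 0; i < BLK; i++)
        iv[i] = (uint8_t)(0xA0 + i);
    for (size_t i = 0; i < sizeof plain; i++)
        plain[i] = (uint8_t)(i * 7 + 3);

    memcpy(buf, plain, sizeof buf);
    cbc_encrypt(buf, NBLOCKS, iv, 0x5A);
    cbc_decrypt_5way(buf, NBLOCKS, iv, 0x5A);
    return memcmp(buf, plain, sizeof buf) == 0;
}
```

    In the sketch, the ct array plays the role of the state that must
    survive the decryption step. Holding both the in-flight blocks and
    their original ciphertext is what puts pressure on the register file
    when the same idea is expressed with NEON registers, which is loosely
    analogous to the register shuffling and extra reload described in the
    commit message.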
aes-neon.S 10.9 KB