    crypto: arm64/sm4 - refactor and simplify NEON implementation · 62508017
    Tianjia Zhang authored
    This patch adds no new features. It refactors and simplifies the SM4
    NEON implementation in the following respects:
    
    The accelerated implementation now supports an arbitrary number of
    blocks, not just multiples of 8. This simplifies the implementation and
    speeds up inputs whose block count is not a multiple of 8.
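
As a rough illustration of handling an arbitrary block count, the following C sketch splits a request into passes of at most 8 blocks. It is not taken from sm4-neon-core.S: the function name `sm4_split_blocks` and its parameters are hypothetical, and the assembly may divide the tail differently (for example into a 4-block step plus a remainder).

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch: record the size of each
 * pass when nblocks is processed in batches of up to 8 blocks.
 * Returns the number of passes written to chunks[].
 */
static size_t sm4_split_blocks(size_t nblocks, size_t *chunks,
                               size_t max_chunks)
{
    size_t n = 0;

    while (nblocks > 0 && n < max_chunks) {
        /* Take a full 8-block batch if available, else the remainder. */
        size_t take = nblocks >= 8 ? 8 : nblocks;

        chunks[n++] = take;
        nblocks -= take;
    }
    return n;
}
```

With this split, a 19-block request becomes passes of 8, 8 and 3 blocks, so the caller no longer has to route the trailing 3 blocks through a separate slow path.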
    
    When loading input data, use the ld4 instruction in place of the
    original ld1 instruction wherever possible, which saves the cost of
    transposing the input matrix.
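
To see why ld4 can stand in for ld1 plus a transpose, the C sketch below models both loads on four 16-byte blocks viewed as 32-bit words. The function names are hypothetical and the arrays only model register lanes; the real code works on NEON registers and also handles byte order, which is not modeled here.

```c
#include <stdint.h>

/* Model of "ld1 {v0.4s-v3.4s}": register r holds all four words of block r. */
static void model_ld1x4(const uint32_t src[16], uint32_t regs[4][4])
{
    for (int r = 0; r < 4; r++)
        for (int lane = 0; lane < 4; lane++)
            regs[r][lane] = src[4 * r + lane];
}

/*
 * Model of "ld4 {v0.4s-v3.4s}": the load de-interleaves, so register r
 * holds word r of each of the four blocks.
 */
static void model_ld4(const uint32_t src[16], uint32_t regs[4][4])
{
    for (int r = 0; r < 4; r++)
        for (int lane = 0; lane < 4; lane++)
            regs[r][lane] = src[4 * lane + r];
}

/* In-place 4x4 transpose, the step that ld1-based loading needs afterwards. */
static void transpose4x4(uint32_t regs[4][4])
{
    for (int r = 0; r < 4; r++) {
        for (int c = r + 1; c < 4; c++) {
            uint32_t tmp = regs[r][c];

            regs[r][c] = regs[c][r];
            regs[c][r] = tmp;
        }
    }
}
```

In this model, `model_ld1x4` followed by `transpose4x4` yields the same lane layout as `model_ld4` alone, which is the saving the paragraph above describes.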
    
    Use 8-block parallelism wherever possible, rather than at most 4-block
    parallelism, to speed up the matrix transpose and rotation operations.
    
    Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
sm4-neon-core.S 19 KB