    crypto: arm64/gcm-ce - implement 4 way interleave · 11031c0d
    Ard Biesheuvel authored
    To improve performance on cores with deep pipelines such as ThunderX2,
    reimplement gcm(aes) using a 4-way interleave rather than the 2-way
    interleave we use currently.
    
    This comes down to a complete rewrite of the GCM part of the combined
    GCM/GHASH driver, and instead of interleaving two invocations of AES
    with the GHASH handling at the instruction level, the new version
    uses a coarser-grained approach in which each 64-byte chunk is
    encrypted first and then ghashed (or ghashed and then decrypted in
    the converse case).
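    The ordering described above can be sketched in C under stated
    assumptions. `toy_block_encrypt()` is a hypothetical, non-cryptographic
    placeholder for AES. `gcm_chunk()`, `ghash_blocks()`, `ctr_inc32()` and
    `gf128_mul()` are illustrative names, not functions from
    ghash-ce-core.S or the C glue code. The GF(2^128) multiply is the
    bit-serial reference algorithm from NIST SP 800-38D, not a PMULL-based
    one, and GHASH is applied one block at a time. The sketch shows the
    per-chunk order. Four keystream blocks are generated and the output is
    hashed afterwards on encryption. On decryption the ciphertext is hashed
    before it is decrypted, which also keeps in-place operation correct.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WAYS  4
#define CHUNK (WAYS * 16) /* 64 bytes per iteration */

/* HYPOTHETICAL placeholder for AES: not a real cipher, it only lets
 * the control flow compile and run. */
static void toy_block_encrypt(const uint8_t key[16], const uint8_t in[16],
                              uint8_t out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = (uint8_t)((in[i] ^ key[i]) * 167u + 13u);
}

/* x = x * y in GF(2^128), bit-serial reference algorithm (SP 800-38D). */
static void gf128_mul(uint8_t x[16], const uint8_t y[16])
{
    uint8_t z[16] = {0}, v[16];
    memcpy(v, y, 16);
    for (int i = 0; i < 128; i++) {
        if (x[i / 8] & (0x80 >> (i % 8)))
            for (int j = 0; j < 16; j++)
                z[j] ^= v[j];
        int lsb = v[15] & 1;
        for (int j = 15; j > 0; j--)          /* V = V >> 1 */
            v[j] = (uint8_t)((v[j] >> 1) | (v[j - 1] << 7));
        v[0] >>= 1;
        if (lsb)
            v[0] ^= 0xe1;                     /* reduce by R = 0xe1 || 0^120 */
    }
    memcpy(x, z, 16);
}

/* GCM counter: increment the low 32 bits, big-endian. */
static void ctr_inc32(uint8_t ctr[16])
{
    for (int i = 15; i >= 12; i--)
        if (++ctr[i] != 0)
            break;
}

/* Fold nblocks 16-byte blocks into the GHASH accumulator. */
static void ghash_blocks(uint8_t acc[16], const uint8_t h[16],
                         const uint8_t *p, int nblocks)
{
    for (int b = 0; b < nblocks; b++) {
        for (int j = 0; j < 16; j++)
            acc[j] ^= p[16 * b + j];
        gf128_mul(acc, h);
    }
}

/* One 64-byte chunk. Encrypt: CTR first, then GHASH the output.
 * Decrypt: GHASH the input first, then CTR, so src may equal dst. */
static void gcm_chunk(const uint8_t key[16], const uint8_t h[16],
                      uint8_t ctr[16], uint8_t acc[16],
                      const uint8_t *src, uint8_t *dst, int encrypt)
{
    uint8_t ks[WAYS][16];

    /* Four independent block encryptions: the work a 4-way
     * interleave can keep in flight at the same time. */
    for (int w = 0; w < WAYS; w++) {
        toy_block_encrypt(key, ctr, ks[w]);
        ctr_inc32(ctr);
    }
    if (!encrypt)
        ghash_blocks(acc, h, src, WAYS);      /* hash ciphertext input */
    for (int i = 0; i < CHUNK; i++)
        dst[i] = (uint8_t)(src[i] ^ ks[i / 16][i % 16]);
    if (encrypt)
        ghash_blocks(acc, h, dst, WAYS);      /* hash ciphertext output */
}
```

    The four counter blocks in a chunk do not depend on each other. A
    vector implementation can therefore overlap their AES rounds, which is
    where a deep pipeline benefits from a wider interleave.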
    
    The core NEON routine is now able to consume inputs of any size,
    and tail blocks of less than 64 bytes are handled using overlapping
    loads and stores, and processed by the same 4-way encryption and
    hashing routines. This gets rid of most of the branches, and avoids
    having to return to the C code to handle the tail block using a
    stack buffer.
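    The overlapping tail technique can be sketched in C under stated
    assumptions. `toy_keystream()` is a hypothetical stand-in for the 4-way
    AES-CTR keystream. The sketch covers only the CTR XOR and not GHASH,
    and it requires at least 64 bytes of input. The blend is a plain
    byte-wise select, not whatever vector permute/select sequence the
    assembly actually uses. The final partial chunk is processed as a full
    64-byte window that ends exactly at the end of the buffer. The leading
    bytes of that window were already handled, so they are taken from the
    existing output, and the overlapping store rewrites them unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 64

/* HYPOTHETICAL keystream for chunk n, standing in for 4-way AES-CTR.
 * Not cryptographic. */
static void toy_keystream(uint64_t n, uint8_t ks[CHUNK])
{
    for (int i = 0; i < CHUNK; i++)
        ks[i] = (uint8_t)(n * 131u + (uint64_t)i * 29u + 7u);
}

/* dst = src XOR keystream. Assumes len >= CHUNK; src may equal dst. */
static void ctr_xor(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint8_t ks[CHUNK], win[CHUNK];
    size_t full = len / CHUNK, rem = len % CHUNK;

    assert(len >= CHUNK);
    for (size_t n = 0; n < full; n++) {
        toy_keystream(n, ks);
        for (size_t i = 0; i < CHUNK; i++)
            dst[n * CHUNK + i] = (uint8_t)(src[n * CHUNK + i] ^ ks[i]);
    }
    if (rem == 0)
        return;

    size_t off = len - CHUNK;   /* window ends exactly at src + len */
    size_t skip = CHUNK - rem;  /* leading bytes already processed */

    memcpy(win, src + off, CHUNK);  /* overlapping load */
    toy_keystream(full, ks);
    for (size_t i = 0; i < CHUNK; i++)
        win[i] = i < skip ? dst[off + i]                     /* keep output */
                          : (uint8_t)(win[i] ^ ks[i - skip]); /* new bytes */
    memcpy(dst + off, win, CHUNK);  /* overlapping store */
}
```

    With this arrangement every chunk goes through the same 64-byte code
    path. No byte-at-a-time loop or stack bounce buffer is needed for the
    tail. In GCM, the GHASH step for such a tail still has to cover only
    the `rem` new bytes, zero-padded to a block boundary.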
    
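    The table below shows the speedup of the new driver over the old one on
    various micro-architectures (ThunderX2, eMAG, Cortex-A72 and
    Cortex-A53), for each AES key size and for inputs of 512, 1500 and 4k
    bytes. Negative values indicate a regression.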
    
            |     AES-128      |     AES-192      |     AES-256      |
     #bytes | 512 | 1500 |  4k | 512 | 1500 |  4k | 512 | 1500 |  4k |
     -------+-----+------+-----+-----+------+-----+-----+------+-----+
        TX2 | 35% |  23% | 11% | 34% |  20% |  9% | 38% |  25% | 16% |
       EMAG | 11% |   6% |  3% | 12% |   4% |  2% | 11% |   4% |  2% |
        A72 |  8% |   5% | -4% |  9% |   4% | -5% |  7% |   4% | -5% |
        A53 | 11% |   6% | -1% | 10% |   8% | -1% | 10% |   8% | -2% |

    Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
ghash-ce-core.S 16.5 KB