1. 12 Apr, 2024: 32 commits
2. 05 Apr, 2024: 8 commits
    • crypto: jitter - Replace http with https · 4ad27a8b
      Thorsten Blum authored
      The PDF is also available via https.
      Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • 8fa5f4f0
      Thorsten Blum authored
    • crypto: x86/aes-xts - wire up VAES + AVX10/512 implementation · aa2197f5
      Eric Biggers authored
      Add an AES-XTS implementation "xts-aes-vaes-avx10_512" for x86_64 CPUs
      with the VAES, VPCLMULQDQ, and either AVX10/512 or AVX512BW + AVX512VL
      extensions.  This implementation uses zmm registers to operate on four
      AES blocks at a time.  The assembly code is instantiated using a macro
      so that most of the source code is shared with other implementations.
      
      To avoid downclocking on older Intel CPU models, an exclusion list is
      used to prevent this 512-bit implementation from being used by default
      on some CPU models.  They will use xts-aes-vaes-avx10_256 instead.  For
      now, this exclusion list is simply coded into aesni-intel_glue.c.  It
      may make sense to eventually move it into a more central location.
      
      xts-aes-vaes-avx10_512 is slightly faster than xts-aes-vaes-avx10_256 on
      some current CPUs.  E.g., on AMD Zen 4, AES-256-XTS decryption
      throughput increases by 13% with 4096-byte inputs, or 14% with 512-byte
      inputs.  On Intel Sapphire Rapids, AES-256-XTS decryption throughput
      increases by 2% with 4096-byte inputs, or 3% with 512-byte inputs.
      
      Future CPUs may provide stronger 512-bit support, in which case a larger
      benefit should be seen.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
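The exclusion-list behaviour described in this commit can be pictured as a table lookup that decides which of the two AVX10-class variants becomes the default. This is a minimal sketch, not the code in aesni-intel_glue.c: the `struct cpu_model` type, the table contents (placeholder model numbers, not the kernel's actual list), and `pick_avx10_xts_impl()` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical (family, model) pair identifying an Intel CPU model. */
struct cpu_model {
	unsigned int family;
	unsigned int model;
};

/*
 * Hypothetical exclusion list: CPU models on which the 512-bit variant
 * should not be the default because zmm usage may cause downclocking.
 * The model numbers are PLACEHOLDERS, not the kernel's real entries.
 */
static const struct cpu_model zmm_exclusion_list[] = {
	{ 6, 0x01 },	/* placeholder entry */
	{ 6, 0x02 },	/* placeholder entry */
};

static bool cpu_is_zmm_excluded(unsigned int family, unsigned int model)
{
	size_t n = sizeof(zmm_exclusion_list) / sizeof(zmm_exclusion_list[0]);

	for (size_t i = 0; i < n; i++) {
		if (zmm_exclusion_list[i].family == family &&
		    zmm_exclusion_list[i].model == model)
			return true;
	}
	return false;
}

enum avx10_xts_impl {
	XTS_IMPL_AVX10_256,	/* "xts-aes-vaes-avx10_256" */
	XTS_IMPL_AVX10_512,	/* "xts-aes-vaes-avx10_512" */
};

/*
 * Choose the default: the 512-bit variant only if the CPU can run it and
 * it is not an excluded Intel model; otherwise fall back to 256-bit.
 */
static enum avx10_xts_impl pick_avx10_xts_impl(bool is_intel,
					       unsigned int family,
					       unsigned int model,
					       bool has_512bit_support)
{
	if (has_512bit_support &&
	    !(is_intel && cpu_is_zmm_excluded(family, model)))
		return XTS_IMPL_AVX10_512;
	return XTS_IMPL_AVX10_256;
}
```

Keeping the list as a plain table means adding or removing a CPU model is a one-line change, which is in line with the commit's note that the list might later move to a more central location.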
    • crypto: x86/aes-xts - wire up VAES + AVX10/256 implementation · ee63fea0
      Eric Biggers authored
      Add an AES-XTS implementation "xts-aes-vaes-avx10_256" for x86_64 CPUs
      with the VAES, VPCLMULQDQ, and either AVX10/256 or AVX512BW + AVX512VL
      extensions.  This implementation avoids using zmm registers, instead
      using ymm registers to operate on two AES blocks at a time.  The
      assembly code is instantiated using a macro so that most of the source
      code is shared with other implementations.
      
      This is the optimal implementation on CPUs that support VAES and AVX512
      but where the zmm registers should not be used due to downclocking
      effects, for example Intel's Ice Lake.  It should also be the optimal
      implementation on future CPUs that support AVX10/256 but not AVX10/512.
      
      The performance is slightly better than that of xts-aes-vaes-avx2, which
      uses the same 256-bit vector length, due to factors such as being able
      to use ymm16-ymm31 to cache the AES round keys, and being able to use
      the vpternlogd instruction to do XORs more efficiently.  For example, on
      Ice Lake, the throughput of decrypting 4096-byte messages with
      AES-256-XTS is 6.6% higher with xts-aes-vaes-avx10_256 than with
      xts-aes-vaes-avx2.  While this is a small improvement, it is
      straightforward to provide this implementation (xts-aes-vaes-avx10_256)
      as long as we are providing xts-aes-vaes-avx2 and xts-aes-vaes-avx10_512
      anyway, due to the way the _aes_xts_crypt macro is structured.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
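This commit credits part of the gain to vpternlogd, which evaluates any three-input bitwise function encoded as an 8-bit truth table. The C sketch below models that rule on a single 32-bit lane so the truth-table encoding can be checked; `ternlog32()` and `TERNLOG_XOR3` are illustrative names, and the real instruction applies the same rule to every bit of a ymm or zmm register in one operation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bitwise model of vpternlogd on one 32-bit lane.  For each bit position,
 * the bits of a, b and c form a 3-bit index (a is the most significant
 * bit, matching the destination operand), and the result bit is the
 * imm8 bit selected by that index.
 */
static uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm8)
{
	uint32_t result = 0;

	for (int bit = 0; bit < 32; bit++) {
		unsigned int idx = (((a >> bit) & 1u) << 2) |
				   (((b >> bit) & 1u) << 1) |
				   ((c >> bit) & 1u);
		result |= (uint32_t)((imm8 >> idx) & 1u) << bit;
	}
	return result;
}

/* Truth table of a ^ b ^ c: 1 exactly when an odd number of inputs is 1. */
#define TERNLOG_XOR3 0x96
```

With `imm8 = 0x96`, a three-way XOR (for instance combining a data block, a tweak, and a round key; an illustrative case, not taken from the patch) takes one instruction instead of two vpxor instructions.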
    • crypto: x86/aes-xts - wire up VAES + AVX2 implementation · e787060b
      Eric Biggers authored
      Add an AES-XTS implementation "xts-aes-vaes-avx2" for x86_64 CPUs with
      the VAES, VPCLMULQDQ, and AVX2 extensions, but not AVX512 or AVX10.
      This implementation uses ymm registers to operate on two AES blocks at a
      time.  The assembly code is instantiated using a macro so that most of
      the source code is shared with other implementations.
      
      This is the optimal implementation on AMD Zen 3.  It should also be the
      optimal implementation on Intel Alder Lake, which similarly supports
      VAES but not AVX512.  Compared to xts-aes-aesni-avx on Zen 3,
      xts-aes-vaes-avx2 provides 70% higher AES-256-XTS decryption throughput
      with 4096-byte messages, or 23% higher with 512-byte messages.
      
      A large improvement is also seen with CPUs that do support AVX512 (e.g.,
      98% higher AES-256-XTS decryption throughput on Ice Lake with 4096-byte
      messages), though the following patches add AVX512-optimized
      implementations to get a bit more performance on those CPUs.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
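Taken together, the CPU-feature requirements stated across this series imply a preference order among the implementations. The sketch below restates that order in C, assuming a 64-bit kernel; the `struct xts_cpu_features` flags, the `zmm_excluded` parameter, and `pick_xts_impl()` are hypothetical, and this is not how aesni-intel_glue.c is structured.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical CPU feature flags, one per extension named in the commits. */
struct xts_cpu_features {
	bool aesni, avx, avx2, vaes, vpclmulqdq;
	bool avx512bw, avx512vl;
	bool avx10_256, avx10_512;	/* AVX10/512 implies AVX10/256: set both */
};

enum xts_impl {
	XTS_NONE,		/* no AES-NI: not covered by these commits */
	XTS_AESNI,		/* xts-aes-aesni (pre-existing) */
	XTS_AESNI_AVX,		/* xts-aes-aesni-avx */
	XTS_VAES_AVX2,		/* xts-aes-vaes-avx2 */
	XTS_VAES_AVX10_256,	/* xts-aes-vaes-avx10_256 */
	XTS_VAES_AVX10_512,	/* xts-aes-vaes-avx10_512 */
};

/* Return the most capable variant the CPU supports, best first. */
static enum xts_impl pick_xts_impl(const struct xts_cpu_features *f,
				   bool zmm_excluded)
{
	bool vaes_base = f->vaes && f->vpclmulqdq;
	bool avx512_pair = f->avx512bw && f->avx512vl;

	if (!f->aesni)
		return XTS_NONE;
	if (vaes_base && (f->avx10_512 || avx512_pair) && !zmm_excluded)
		return XTS_VAES_AVX10_512;
	if (vaes_base && (f->avx10_256 || avx512_pair))
		return XTS_VAES_AVX10_256;
	if (vaes_base && f->avx2)
		return XTS_VAES_AVX2;
	if (f->avx)
		return XTS_AESNI_AVX;
	return XTS_AESNI;
}
```

For example, a CPU reporting AES-NI, AVX, AVX2, VAES, and VPCLMULQDQ but no AVX512 or AVX10 lands on `XTS_VAES_AVX2`, the case this commit describes for Zen 3 and Alder Lake.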
    • crypto: x86/aes-xts - wire up AESNI + AVX implementation · 996f4dcb
      Eric Biggers authored
      Add an AES-XTS implementation "xts-aes-aesni-avx" for x86_64 CPUs that
      have the AES-NI and AVX extensions but not VAES.  It's similar to the
      existing xts-aes-aesni in that it uses xmm registers to operate on one AES
      block at a time.  It differs from xts-aes-aesni in the following ways:
      
      - It uses the VEX-coded (non-destructive) instructions from AVX.
        This improves performance slightly.
      - It incorporates some additional optimizations such as interleaving the
        tweak computation with AES en/decryption, handling single-page
        messages more efficiently, and caching the first round key.
      - It supports only 64-bit (x86_64).
      - It's generated by an assembly macro that will also be used to generate
        VAES-based implementations.
      
      The performance improvement over xts-aes-aesni varies from small to
      large, depending on the CPU and other factors such as the size of the
      messages en/decrypted.  For example, the following increases in
      AES-256-XTS decryption throughput are seen on the following CPUs:
      
                                | 4096-byte messages | 512-byte messages |
          ----------------------+--------------------+-------------------+
          Intel Skylake         |        6%          |       31%         |
          Intel Cascade Lake    |        4%          |       26%         |
          AMD Zen 1             |        61%         |       73%         |
          AMD Zen 2             |        36%         |       59%         |
      
      (The above CPUs don't support VAES, so a VAES-based implementation isn't
      an option on them.)
      
      While this isn't as large an improvement as what VAES provides, this
      still seems worthwhile.  This implementation is fairly easy to provide
      based on the assembly macro that's needed for VAES anyway, and it will
      be the best implementation on a large number of CPUs (very roughly, the
      CPUs launched by Intel and AMD from 2011 to 2018).
      
      This makes the existing xts-aes-aesni *mostly* obsolete.  For now, leave
      it in place to support 32-bit kernels and also CPUs like Intel Westmere
      that support AES-NI but not AVX.  (We could potentially remove it anyway
      and just rely on the indirect acceleration via ecb-aes-aesni in those
      cases, but that change will need to be considered separately.)
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
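The "tweak computation" that this implementation interleaves with AES en/decryption is, in XTS as specified by IEEE 1619, a multiplication of the 128-bit tweak by x in GF(2^128) for each successive block. A scalar C sketch of that single step follows; `xts_mul_x()` is a hypothetical name, and the assembly computes the same value with vector instructions rather than with this code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Advance an XTS tweak by one block: multiply it by x in GF(2^128) modulo
 * x^128 + x^7 + x^2 + x + 1, using the IEEE 1619 little-endian convention.
 * t[0] holds the low 64 bits of the tweak, t[1] the high 64 bits.
 */
static void xts_mul_x(uint64_t t[2])
{
	uint64_t carry = t[1] >> 63;	/* bit 127, about to be shifted out */

	t[1] = (t[1] << 1) | (t[0] >> 63);
	/* Reduce: x^128 == x^7 + x^2 + x + 1, i.e. XOR 0x87 into the low byte. */
	t[0] = (t[0] << 1) ^ (carry * 0x87);
}
```

On a little-endian machine such as x86_64, copying the 16 tweak bytes into `t` with `memcpy` gives `t[0]` = bytes 0 to 7 and `t[1]` = bytes 8 to 15, which is the layout the function assumes.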
    • crypto: x86/aes-xts - add AES-XTS assembly macro for modern CPUs · d6371688
      Eric Biggers authored
      Add an assembly file aes-xts-avx-x86_64.S which contains a macro that
      expands into AES-XTS implementations for x86_64 CPUs that support at
      least AES-NI and AVX, optionally also taking advantage of VAES,
      VPCLMULQDQ, and AVX512 or AVX10.
      
      This patch doesn't expand the macro at all.  Later patches will do so,
      adding each implementation individually so that the motivation and use
      case for each individual implementation can be fully presented.
      
      The file also provides a function aes_xts_encrypt_iv() which handles the
      encryption of the IV (tweak), using AES-NI and AVX.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
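Conceptually, aes_xts_encrypt_iv() produces the first tweak of the standard XTS construction (IEEE 1619), from which every later tweak is derived. The equations below restate that construction; the notation is chosen for this sketch: K1 is the data key, K2 the tweak key, E and D denote AES encryption and decryption, blocks are indexed from j = 0, and ciphertext stealing for a partial final block is left out.

```latex
\begin{aligned}
T_0     &= E_{K_2}(\mathit{IV}) \\
T_{j+1} &= T_j \cdot x \quad \text{in } \mathrm{GF}(2^{128}) \text{ modulo } x^{128} + x^7 + x^2 + x + 1 \\
C_j     &= E_{K_1}(P_j \oplus T_j) \oplus T_j \\
P_j     &= D_{K_1}(C_j \oplus T_j) \oplus T_j
\end{aligned}
```

Because only T_0 requires an AES operation with K2, encrypting the IV can sit in one small shared helper while the per-block work lives in the macro-generated implementations.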
    • x86: add kconfig symbols for assembler VAES and VPCLMULQDQ support · 7d4700d1
      Eric Biggers authored
      Add config symbols AS_VAES and AS_VPCLMULQDQ that expose whether the
      assembler supports the vector AES and carryless multiplication
      cryptographic extensions.
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
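Kconfig symbols reach C and assembly sources as `CONFIG_`-prefixed macros, so these two would appear as CONFIG_AS_VAES and CONFIG_AS_VPCLMULQDQ. A minimal sketch of gating VAES-dependent code on them follows; `vaes_xts_code_built()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * In a kernel build, CONFIG_AS_VAES and CONFIG_AS_VPCLMULQDQ would be
 * defined by the generated config header when the assembler supports the
 * instructions.  Outside such a build neither is defined, so this returns
 * false unless compiled with -DCONFIG_AS_VAES -DCONFIG_AS_VPCLMULQDQ.
 */
static bool vaes_xts_code_built(void)
{
#if defined(CONFIG_AS_VAES) && defined(CONFIG_AS_VPCLMULQDQ)
	return true;	/* assembler can emit VAES and VPCLMULQDQ */
#else
	return false;	/* VAES-based variants must be left out of the build */
#endif
}
```

These symbols describe the toolchain, not the processor: a kernel built with them still has to check at run time whether the CPU actually supports VAES and VPCLMULQDQ before using such code.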