    crypto: x86/aria - implement aria-avx512 · c970d420
    Taehee Yoo authored
    The aria-avx512 implementation uses AVX512 and GFNI.
    It supports 64-way parallel processing.
    So, the byteslicing code is changed to support 64-way parallelism.
    It also exports some aria-avx2 functions, such as encrypt() and decrypt().
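
    As an illustration of what 64-way byteslicing means, here is a minimal
    portable C sketch. It is not the code from this commit: the function names
    byteslice_64way() and unbyteslice_64way() are hypothetical, and the real
    implementation presumably does this with vector instructions and may order
    lanes differently. The sketch assumes 64 ARIA blocks of 16 bytes each and
    gathers byte i of every block into row i, so each 64-byte row is the size
    of one 512-bit register.

```c
#include <stddef.h>
#include <stdint.h>

#define ARIA_BLOCK_SIZE 16 /* ARIA block size in bytes (128 bits) */
#define PARALLEL_BLOCKS 64 /* 64 byte lanes fit in one 512-bit register */

/*
 * Hypothetical helper: transpose 64 blocks x 16 bytes into 16 rows x 64 bytes.
 * Row i of the output holds byte i of each of the 64 input blocks.
 */
static void byteslice_64way(const uint8_t *in, uint8_t *out)
{
    for (size_t blk = 0; blk < PARALLEL_BLOCKS; blk++)
        for (size_t i = 0; i < ARIA_BLOCK_SIZE; i++)
            out[i * PARALLEL_BLOCKS + blk] = in[blk * ARIA_BLOCK_SIZE + i];
}

/* Hypothetical inverse: restore the original block-by-block layout. */
static void unbyteslice_64way(const uint8_t *in, uint8_t *out)
{
    for (size_t blk = 0; blk < PARALLEL_BLOCKS; blk++)
        for (size_t i = 0; i < ARIA_BLOCK_SIZE; i++)
            out[blk * ARIA_BLOCK_SIZE + i] = in[i * PARALLEL_BLOCKS + blk];
}
```

    With this layout, one byte-wise vector operation on a row applies the same
    step to the same byte position of all 64 blocks at once.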
    
    AVX and AVX2 have 16 registers.
    They must use memory to store/load state because they lack registers.
    But AVX512 supports 32 registers.
    So, it doesn't require store/load in the s-box layer.
    This removes the store/load overhead in the s-box layer.
    Also, the code becomes much simpler.
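
    The register argument can be written out as simple arithmetic. The sketch
    below is only an illustration, and it assumes that a byte-sliced 128-bit
    ARIA state occupies one vector register per byte position, that is, 16
    registers. The name spare_regs() is hypothetical.

```c
/* Architectural vector register counts in 64-bit mode. */
enum { AVX2_VECTOR_REGS = 16, AVX512_VECTOR_REGS = 32 };

/* Assumption: one register per byte position of the 16-byte ARIA state. */
enum { ARIA_STATE_REGS = 16 };

/* Registers left for temporaries once the whole state is held in registers. */
static int spare_regs(int total_regs)
{
    return total_regs - ARIA_STATE_REGS;
}
```

    Under that assumption, AVX/AVX2 have no registers left for temporaries, so
    part of the state has to be kept in memory, while AVX512 has 16 to spare.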
    
    Benchmark with modprobe tcrypt mode=610 num_mb=8192, i3-12100:
    
    ARIA-AVX512(128bit and 256bit)
        testing speed of multibuffer ecb(aria) (ecb-aria-avx512) encryption
    tcrypt: 1 operation in 1504 cycles (1024 bytes)
    tcrypt: 1 operation in 4595 cycles (4096 bytes)
    tcrypt: 1 operation in 1763 cycles (1024 bytes)
    tcrypt: 1 operation in 5540 cycles (4096 bytes)
        testing speed of multibuffer ecb(aria) (ecb-aria-avx512) decryption
    tcrypt: 1 operation in 1502 cyc...
aria_gfni_avx512_glue.c 7.57 KB