-
Tianjia Zhang authored
This adds ARMv8 NEON implementations of SM4 in ECB, CBC, CFB and CTR modes. This implementation uses the plain NEON instruction set, All S-BOX substitutions uses the tbl/tbx instructions of ARMv8, combined with the out-of-order execution in CPU, this optimization supports encryption of up to 8 blocks at the same time. The performance of encrypting one block is not as good as software implementation, so the encryption operations of CBC and CFB still use pure software algorithms. Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 218 mode of tcrypt. The abscissas are blocks of different lengths. The data is tabulated and the unit is Mb/s: sm4-generic | 16 64 128 256 1024 1420 4096 ECB enc | 80.05 91.42 93.66 94.77 95.69 95.77 95.86 ECB dec | 79.98 91.41 93.64 94.76 95.66 95.77 95.85 CBC enc | 78.55 86.50 88.02 88.77 89.36 89.42 89.48 CBC dec | 76.82 89.06 91.52 92.77 93.75 93.83 93.96 CFB enc | 77.64 86.13 87.62 88.42 89.08 88.83 89.18 CFB dec | 77.57 88.34 90.36 91.45 92.34 92.00 92.44 CTR enc | 77.80 88.28 90.23 91.22 92.11 91.81 92.25 CTR dec | 77.83 88.22 90.22 91.22 92.04 91.82 92.28 sm4-neon ECB enc | 28.31 112.77 203.03 209.89 215.49 202.11 210.59 ECB dec | 28.36 113.45 203.23 210.00 215.52 202.13 210.65 CBC enc | 79.32 87.02 88.51 89.28 89.85 89.89 89.97 CBC dec | 28.29 112.20 203.30 209.82 214.99 201.51 209.95 CFB enc | 79.59 87.16 88.54 89.30 89.83 89.62 89.92 CFB dec | 28.12 111.05 202.47 209.02 214.21 210.90 209.12 CTR enc | 28.04 108.81 200.62 206.65 211.78 208.78 206.74 CTR dec | 28.02 108.82 200.45 206.62 211.78 208.74 206.70 Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
4f1aef9b