- 04 Nov, 2022 17 commits
-
-
Tianjia Zhang authored
This patch is a CE-optimized assembly implementation for GCM mode. Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 224 and 224 modes of tcrypt, and compared the performance before and after this patch (the driver used before this patch is gcm_base(ctr-sm4-ce,ghash-generic)). The abscissas are blocks of different lengths. The data is tabulated and the unit is Mb/s: Before (gcm_base(ctr-sm4-ce,ghash-generic)): gcm(sm4) | 16 64 256 512 1024 1420 4096 8192 -------------+--------------------------------------------------------------------- GCM enc | 25.24 64.65 104.66 116.69 123.81 125.12 129.67 130.62 GCM dec | 25.40 64.80 104.74 116.70 123.81 125.21 129.68 130.59 GCM mb enc | 24.95 64.06 104.20 116.38 123.55 124.97 129.63 130.61 GCM mb dec | 24.92 64.00 104.13 116.34 123.55 124.98 129.56 130.48 After: gcm-sm4-ce | 16 64 256 512 1024 1420 4096 8192 -------------+--------------------------------------------------------------------- GCM enc | 108.62 397.18 971.60 1283.92 1522.77 1513.39 1777.00 1806.96 GCM dec | 116.36 398.14 1004.27 1319.11 1624.21 1635.43 1932.54 1974.20 GCM mb enc | 107.13 391.79 962.05 1274.94 1514.76 1508.57 1769.07 1801.58 GCM mb dec | 113.40 389.36 988.51 1307.68 1619.10 1631.55 1931.70 1970.86 Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Tianjia Zhang authored
This patch is a CE-optimized assembly implementation for CCM mode. Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 223 and 225 modes of tcrypt, and compared the performance before and after this patch (the driver used before this patch is ccm_base(ctr-sm4-ce,cbcmac-sm4-ce)). The abscissas are blocks of different lengths. The data is tabulated and the unit is Mb/s: Before (rfc4309(ccm_base(ctr-sm4-ce,cbcmac-sm4-ce))): ccm(sm4) | 16 64 256 512 1024 1420 4096 8192 -------------+--------------------------------------------------------------- CCM enc | 35.07 125.40 336.47 468.17 581.97 619.18 712.56 736.01 CCM dec | 34.87 124.40 335.08 466.75 581.04 618.81 712.25 735.89 CCM mb enc | 34.71 123.96 333.92 465.39 579.91 617.49 711.45 734.92 CCM mb dec | 34.42 122.80 331.02 462.81 578.28 616.42 709.88 734.19 After (rfc4309(ccm-sm4-ce)): ccm-sm4-ce | 16 64 256 512 1024 1420 4096 8192 -------------+--------------------------------------------------------------- CCM enc | 77.12 249.82 569.94 725.17 839.27 867.71 952.87 969.89 CCM dec | 75.90 247.26 566.29 722.12 836.90 865.95 951.74 968.57 CCM mb enc | 75.98 245.25 562.91 718.99 834.76 864.70 950.17 967.90 CCM mb dec | 75.06 243.78 560.58 717.13 833.68 862.70 949.35 967.11 Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Tianjia Zhang authored
This patch is a CE-optimized assembly implementation for cmac/xcbc/cbcmac. Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 300 mode of tcrypt, and compared the performance before and after this patch (the driver used before this patch is XXXmac(sm4-ce)). The abscissas are blocks of different lengths. The data is tabulated and the unit is Mb/s: Before: update-size | 16 64 256 1024 2048 4096 8192 ---------------+-------------------------------------------------------- cmac(sm4-ce) | 293.33 403.69 503.76 527.78 531.10 535.46 535.81 xcbc(sm4-ce) | 292.83 402.50 504.02 529.08 529.87 536.55 538.24 cbcmac(sm4-ce) | 318.42 415.79 497.12 515.05 523.15 521.19 523.01 After: update-size | 16 64 256 1024 2048 4096 8192 ---------------+-------------------------------------------------------- cmac-sm4-ce | 371.99 675.28 903.56 971.65 980.57 990.40 991.04 xcbc-sm4-ce | 372.11 674.55 903.47 971.61 980.96 990.42 991.10 cbcmac-sm4-ce | 371.63 675.33 903.23 972.07 981.42 990.93 991.45 Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Tianjia Zhang authored
This patch is a CE-optimized assembly implementation for XTS mode. Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 218 mode of tcrypt, and compared the performance before and after this patch (the driver used before this patch is xts(ecb-sm4-ce)). The abscissas are blocks of different lengths. The data is tabulated and the unit is Mb/s: Before: xts(ecb-sm4-ce) | 16 64 128 256 1024 1420 4096 ----------------+-------------------------------------------------------------- XTS enc | 117.17 430.56 732.92 1134.98 2007.03 2136.23 2347.20 XTS dec | 116.89 429.02 733.40 1132.96 2006.13 2130.50 2347.92 After: xts-sm4-ce | 16 64 128 256 1024 1420 4096 ----------------+-------------------------------------------------------------- XTS enc | 224.68 798.91 1248.08 1714.60 2413.73 2467.84 2612.62 XTS dec | 229.85 791.34 1237.79 1720.00 2413.30 2473.84 2611.95 Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Tianjia Zhang authored
This patch is a CE-optimized assembly implementation for CTS-CBC mode. Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 218 mode of tcrypt, and compared the performance before and after this patch (the driver used before this patch is cts(cbc-sm4-ce)). The abscissas are blocks of different lengths. The data is tabulated and the unit is Mb/s: Before: cts(cbc-sm4-ce) | 16 64 128 256 1024 1420 4096 ----------------+-------------------------------------------------------------- CTS-CBC enc | 286.09 297.17 457.97 627.75 868.58 900.80 957.69 CTS-CBC dec | 286.67 285.63 538.35 947.08 2241.03 2577.32 3391.14 After: cts-cbc-sm4-ce | 16 64 128 256 1024 1420 4096 ----------------+-------------------------------------------------------------- CTS-CBC enc | 288.19 428.80 593.57 741.04 911.73 931.80 950.00 CTS-CBC dec | 292.22 468.99 838.23 1380.76 2741.17 3036.42 3409.62 Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Tianjia Zhang authored
In the accelerated implementation of the SM4 algorithm using the Crypto Extension instructions, there are some functions that can be reused in the upcoming accelerated implementation of the GCM/CCM mode, and the CBC/CFB encryption is reused in the optimized implementation of SVESM4. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Tianjia Zhang authored
Use a 128-bit swap mask and tbl instruction to simplify the implementation for generating SM4 rkey_dec. Also fixed the issue of not being wrapped by kernel_neon_begin/end() when using the sm4_ce_expand_key() function. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Tianjia Zhang authored
This patch does not add new features, but only refactors and simplifies the implementation of the Crypto Extension acceleration of the SM4 algorithm: Extract the macro optimized by SM4 Crypto Extension for reuse in the subsequent optimization of CCM/GCM modes. Encryption in CBC and CFB modes processes four blocks at a time instead of one, allowing the ld1 instruction to load 64 bytes of data at a time, which will reduces unnecessary memory accesses. CBC/CFB/CTR makes full use of free registers to reduce redundant memory accesses, and rearranges some instructions to improve out-of-order execution capabilities. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Tianjia Zhang authored
Added CTS-CBC/XTS/XCBC tests for SM4 algorithms, as well as corresponding speed tests, this is to test performance-optimized implementations of these modes. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Tianjia Zhang authored
This patch newly adds the test vectors of CTS-CBC/XTS/XCBC modes of the SM4 algorithm, and also added some test vectors for SM4 GCM/CCM. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Tianjia Zhang authored
This patch does not add new features. The main work is to refactor and simplify the implementation of SM4 NEON, which is reflected in the following aspects: The accelerated implementation supports the arbitrary number of blocks, not just multiples of 8, which simplifies the implementation and brings some optimization acceleration for data that is not aligned by 8 blocks. When loading the input data, use the ld4 instruction to replace the original ld1 instruction as much as possible, which will save the cost of matrix transposition of the input data. Use 8-block parallelism whenever possible to speed up matrix transpose and rotation operations, instead of up to 4-block parallelism. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Tianjia Zhang authored
This patch adds the NEON acceleration implementation of the SM3 hash algorithm. The main algorithm is based on SM3 NEON accelerated work of the libgcrypt project. Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 326 mode of tcrypt, and compares the performance data of sm3-generic and sm3-ce. The abscissas are blocks of different lengths. The data is tabulated and the unit is Mb/s: update-size | 16 64 256 1024 2048 4096 8192 ---------------+-------------------------------------------------------- sm3-generic | 185.24 221.28 301.26 307.43 300.83 308.82 308.91 sm3-neon | 171.81 220.20 322.94 339.28 334.09 343.61 343.87 sm3-ce | 227.48 333.48 502.62 527.87 520.45 534.91 535.40 Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Tianjia Zhang authored
Raise the priority of the sm3-ce algorithm from 200 to 400, this is to make room for the implementation of sm3-neon. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Anirudh Venkataramanan authored
The top level print banners have a leading newline. It's not entirely clear why this exists, but it makes it harder to parse tcrypt test output using a script. Drop said newlines. tcrypt output before this patch: [...] testing speed of rfc4106(gcm(aes)) (rfc4106-gcm-aesni) encryption [...] test 0 (160 bit key, 16 byte blocks): 1 operation in 2320 cycles (16 bytes) tcrypt output with this patch: [...] testing speed of rfc4106(gcm(aes)) (rfc4106-gcm-aesni) encryption [...] test 0 (160 bit key, 16 byte blocks): 1 operation in 2320 cycles (16 bytes) Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Anirudh Venkataramanan authored
The pr_fmt() define includes KBUILD_MODNAME, and so there's no need for pr_err() to also print it. Drop module name from the print string. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Anirudh Venkataramanan authored
Currently, there's mixed use of printk() and pr_info()/pr_err(). The latter prints the module name (because pr_fmt() is defined so) but the former does not. As a result there's inconsistency in the printed output. For example: modprobe mode=211: [...] test 0 (160 bit key, 16 byte blocks): 1 operation in 2320 cycles (16 bytes) [...] test 1 (160 bit key, 64 byte blocks): 1 operation in 2336 cycles (64 bytes) modprobe mode=215: [...] tcrypt: test 0 (160 bit key, 16 byte blocks): 1 operation in 2173 cycles (16 bytes) [...] tcrypt: test 1 (160 bit key, 64 byte blocks): 1 operation in 2241 cycles (64 bytes) Replace all instances of printk() with pr_info()/pr_err() so that the module name is printed consistently. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Anirudh Venkataramanan authored
For some test cases, a line break gets inserted between the test banner and the results. For example, with mode=211 this is the output: [...] testing speed of rfc4106(gcm(aes)) (rfc4106-gcm-aesni) encryption [...] test 0 (160 bit key, 16 byte blocks): [...] 1 operation in 2373 cycles (16 bytes) --snip-- [...] testing speed of gcm(aes) (generic-gcm-aesni) encryption [...] test 0 (128 bit key, 16 byte blocks): [...] 1 operation in 2338 cycles (16 bytes) Similar behavior is seen in the following cases as well: modprobe tcrypt mode=212 modprobe tcrypt mode=213 modprobe tcrypt mode=221 modprobe tcrypt mode=300 sec=1 modprobe tcrypt mode=400 sec=1 This doesn't happen with mode=215: [...] tcrypt: testing speed of multibuffer rfc4106(gcm(aes)) (rfc4106-gcm-aesni) encryption [...] tcrypt: test 0 (160 bit key, 16 byte blocks): 1 operation in 2215 cycles (16 bytes) --snip-- [...] tcrypt: testing speed of multibuffer gcm(aes) (generic-gcm-aesni) encryption [...] tcrypt: test 0 (128 bit key, 16 byte blocks): 1 operation in 2191 cycles (16 bytes) This print inconsistency is because printk() is used instead of pr_cont() in a few places. Change these to be pr_cont(). checkpatch warns that pr_cont() shouldn't be used. This can be ignored in this context as tcrypt already uses pr_cont(). Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
- 28 Oct, 2022 23 commits
-
-
wangjianli authored
Delete the redundant word 'the'. Signed-off-by: wangjianli <wangjianli@cdjrlc.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Kai Ye authored
Because the permission on the VF debugfs file is "0444". So the VF function checking is redundant in qos writing api. Signed-off-by: Kai Ye <yekai13@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Kai Ye authored
The pci bdf number check is added for qos written by using the pci api. Directly get the devfn by pci_dev, so delete some redundant code. And use the kstrtoul instead of sscanf to simplify code. Signed-off-by: Kai Ye <yekai13@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Kai Ye authored
Increase the buffer to prevent stack overflow by fuzz test. The maximum length of the qos configuration buffer is 256 bytes. Currently, the value of the 'val buffer' is only 32 bytes. The sscanf does not check the dest memory length. So the 'val buffer' may stack overflow. Signed-off-by: Kai Ye <yekai13@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Frederick Lawler authored
We want to leverage keyring to store sensitive keys, and then use those keys for symmetric encryption via the crypto API. Among the key types we wish to support are: user, logon, encrypted, and trusted. User key types are already able to have their data copied to user space, but logon does not support this. Further, trusted and encrypted keys will return their encrypted data back to user space on read, which does not make them ideal for symmetric encryption. To support symmetric encryption for these key types, add a new ALG_SET_KEY_BY_KEY_SERIAL setsockopt() option to the crypto API. This allows users to pass a key_serial_t to the crypto API to perform symmetric encryption. The behavior is the same as ALG_SET_KEY, but the crypto key data is copied in kernel space from a keyring key, which allows for the support of logon, encrypted, and trusted key types. Keyring keys must have the KEY_(POS|USR|GRP|OTH)_SEARCH permission set to leverage this feature. This follows the asymmetric_key type where key lookup calls eventually lead to keyring_search_rcu() without the KEYRING_SEARCH_NO_CHECK_PERM flag set. Signed-off-by: Frederick Lawler <fred@cloudflare.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Corentin Labbe authored
The RK3399 has 2 rk3288 compatible crypto device named crypto0 and crypto1. The only difference is lack of RSA in crypto1. We need to add driver support for 2 parallel instance as only one need to register crypto algorithms. Then the driver will round robin each request on each device. For avoiding complexity (device bringup after a TFM is created), PM is modified to be handled per request. Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Corentin Labbe authored
The RK3399 has 3 resets, so the driver to handle multiple resets. This is done by using devm_reset_control_array_get_exclusive(). Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Corentin Labbe authored
rk_ahash_reg_init() use crypto_info from TFM context, since we will remove it, let's take if from parameters. Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Corentin Labbe authored
Add the number of clocks needed for each compatible. Rockchip's datasheet give maximum frequencies for some clocks, so add checks for verifying they are within limits. Let's start with rk3288 for clock frequency check, other will came later. Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Corentin Labbe authored
The crypto_info to use must be stored in the request context. This will help when 2 crypto_info will be available on rk3399. Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Corentin Labbe authored
Since driver support new compatible, we need to update the driver bindings. Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Corentin Labbe authored
Convert rockchip-crypto to YAML. Reviewed-by: John Keeping <john@metanate.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Corentin Labbe authored
Instead of using the crypto_info from TFM ctx, use the one given as parameter. Reviewed-by: John Keeping <john@metanate.com> Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Corentin Labbe authored
Instead of using lot of ctx->dev->xx indirections, use an intermediate variable for rk_crypto_info. This will help later, when 2 different rk_crypto_info would be used. Reviewed-by: John Keeping <john@metanate.com> Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Corentin Labbe authored
This patch rework the rk_handle_req(), simply removing the rk_crypto_info parameter. Reviewed-by: John Keeping <john@metanate.com> Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Corentin Labbe authored
Some functions have still ablk in their name even if there are not handling ablk_cipher anymore. So let's rename them. Reviewed-by: John Keeping <john@metanate.com> Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Corentin Labbe authored
The rk3328 could be used as-is by the rockchip driver. Reviewed-by: John Keeping <john@metanate.com> Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Corentin Labbe authored
This patch fixes some warning reported by checkpatch Reviewed-by: John Keeping <john@metanate.com> Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Corentin Labbe authored
Use read_poll_timeout instead of open coding it. In the same time, fix indentation of related comment. Reviewed-by: John Keeping <john@metanate.com> Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Corentin Labbe authored
Nobody is set as maintainer of rockchip crypto, I propose to do it as I have already reworked lot of this code. Reviewed-by: John Keeping <john@metanate.com> Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Corentin Labbe authored
rk3328 does not have the same clock names than rk3288, instead of using a complex clock management, let's use clk_bulk to simplify their handling. Reviewed-by: John Keeping <john@metanate.com> Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Corentin Labbe authored
reset could be handled by PM functions. We keep the initial reset pulse to be sure the hw is a know device state after probe. Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Corentin Labbe authored
Add runtime PM support for rockchip crypto. Reviewed-by: John Keeping <john@metanate.com> Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-