1. 25 Nov, 2022 4 commits
  2. 22 Nov, 2022 1 commit
  3. 18 Nov, 2022 13 commits
  4. 14 Nov, 2022 1 commit
  5. 11 Nov, 2022 5 commits
    • crypto: qat - remove ADF_STATUS_PF_RUNNING flag from probe · 557ffd5a
      Shashank Gupta authored
      The ADF_STATUS_PF_RUNNING bit is set in adf_vf2pf_notify_init() after
      communication between the VF and the PF has been initialized
      successfully, so there is no need to set it after adf_dev_init() has
      executed.
      Signed-off-by: Shashank Gupta <shashank.gupta@intel.com>
      Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
      Reviewed-by: Wojciech Ziemba <wojciech.ziemba@intel.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: rockchip - Remove surplus dev_err() when using platform_get_irq() · fb11cddf
      Yang Li authored
      There is no need to call the dev_err() function directly to print a
      custom message when handling an error from either the platform_get_irq()
      or platform_get_irq_byname() functions as both are going to display an
      appropriate error message in case of a failure.
      
      ./drivers/crypto/rockchip/rk3288_crypto.c:351:2-9: line 351 is
      redundant because platform_get_irq() already prints an error
      
      Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2677
      Reported-by: Abaci Robot <abaci@linux.alibaba.com>
      Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
      Acked-by: Corentin Labbe <clabbe@baylibre.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
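The reasoning in the rockchip cleanup can be modelled in userspace. This is only a sketch: `mock_platform_get_irq()`, `mock_probe()` and `messages_logged` are hypothetical stand-ins, not kernel code, and the error text and `-ENXIO` return value are assumptions about how the real helper behaves on failure. The point is that the helper logs the failure itself, so the caller only propagates the error.

```c
#include <errno.h>
#include <stdio.h>

static int messages_logged; /* counts error lines; exists only for this sketch */

/*
 * Hypothetical stand-in for platform_get_irq(): on failure it prints its
 * own error message and returns a negative errno.
 */
static int mock_platform_get_irq(int have_irq)
{
	if (!have_irq) {
		fprintf(stderr, "mock-dev: IRQ index 0 not found\n");
		messages_logged++;
		return -ENXIO;
	}
	return 42; /* arbitrary IRQ number */
}

/* Probe path after the cleanup: propagate the error, print nothing extra. */
static int mock_probe(int have_irq)
{
	int irq = mock_platform_get_irq(have_irq);

	if (irq < 0)
		return irq; /* no additional dev_err(): the helper already logged */
	return 0;
}
```

Calling `mock_probe(0)` produces exactly one error line, which is the behaviour the patch relies on when it drops the extra dev_err().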
    • crypto: lib/aesgcm - Provide minimal library implementation · 520af5da
      Ard Biesheuvel authored
      Implement a minimal library version of AES-GCM based on the existing
      library implementations of AES and multiplication in GF(2^128). Using
      these primitives, GCM can be implemented in a straightforward manner.
      
      GCM has a couple of sharp edges, i.e., the amount of input data
      processed with the same initialization vector (IV) should be capped to
      protect the counter from 32-bit rollover (or carry), and the size of the
      authentication tag should be fixed for a given key. [0]
      
      The former concern is addressed trivially, given that the function call
      API uses 32-bit signed types for the input lengths. It is still up to
      the caller to avoid IV reuse in general, but this is not something we
      can police at the implementation level.
      
      As for the latter concern, let's make the authentication tag size part
      of the key schedule, and only permit it to be configured as part of the
      key expansion routine.
      
      Note that table based AES implementations are susceptible to known
      plaintext timing attacks on the encryption key. The AES library already
      attempts to mitigate this to some extent, but given that the counter
      mode encryption used by GCM operates exclusively on known plaintext by
      construction (the IV and therefore the initial counter value are known
      to an attacker), let's take some extra care to mitigate this, by calling
      the AES library with interrupts disabled.
      
      [0] https://nvlpubs.nist.gov/nistpubs/legacy/sp/nistspecialpublication800-38d.pdf
      
      Link: https://lore.kernel.org/all/c6fb9b25-a4b6-2e4a-2dd1-63adda055a49@amd.com/
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: Nikunj A Dadhania <nikunj@amd.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
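The claim that 32-bit signed lengths keep the GCM counter from rolling over can be checked with a little arithmetic. The sketch below assumes a 96-bit IV as in NIST SP 800-38D, where the pre-counter block uses counter value 1 and the data blocks use 2, 3, and so on. The helper names `gcm_blocks()` and `gcm_last_counter()` are hypothetical and not part of the library.

```c
#include <stdint.h>

#define AES_BLOCK_SIZE 16u

/* Number of 16-byte counter blocks needed to cover len bytes of data. */
static uint64_t gcm_blocks(int32_t len)
{
	return ((uint64_t)len + AES_BLOCK_SIZE - 1) / AES_BLOCK_SIZE;
}

/*
 * Highest 32-bit counter value consumed for len bytes, assuming the
 * counter is 1 for the tag mask and starts at 2 for the first data block.
 */
static uint64_t gcm_last_counter(int32_t len)
{
	return 1 + gcm_blocks(len);
}
```

For the largest signed 32-bit length this gives 2^27 data blocks, so the last counter value is 2^27 + 1, far below the 2^32 wrap point.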
    • crypto: lib/gf128mul - make gf128mul_lle time invariant · b67ce439
      Ard Biesheuvel authored
      The gf128mul library has different variants with different
      memory/performance tradeoffs, where the faster ones use 4k or 64k lookup
      tables precomputed at runtime, which are based on one of the
      multiplication factors, which is commonly the key for keyed hash
      algorithms such as GHASH.
      
      The slowest variant is gf128mul_lle() [and its bbe/ble counterparts],
      which does not use precomputed lookup tables, but it still relies on a
      single u16[256] lookup table which is input independent. The use of such
      a table may cause the execution time of gf128mul_lle() to correlate
      with the value of the inputs, which is generally something that must be
      avoided for cryptographic algorithms. On top of that, the function uses
      a sequence of if () statements that conditionally invoke be128_xor()
      based on which bits are set in the second argument of the function,
      which is usually a pointer to the multiplication factor that represents
      the key.
      
      In order to remove the correlation between the execution time of
      gf128mul_lle() and the value of its inputs, let's address the
      identified shortcomings:
      - add a time invariant version of gf128mul_x8_lle() that replaces the
        table lookup with the expression that is used at compile time to
        populate the lookup table;
      - make the invocations of be128_xor() unconditional, but pass a zero
        vector as the third argument if the associated bit in the key is
        cleared.
      
      The resulting code is likely to be significantly slower. However, given
      that this is the slowest version already, making it even slower in order
      to make it more secure is assumed to be justified.
      
      The bbe and ble counterparts could receive the same treatment, but the
      former is never used anywhere in the kernel, and the latter is only
      used in the driver for an asynchronous crypto h/w accelerator (Chelsio),
      where timing variances are unlikely to matter.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
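The idea of making the XOR unconditional can be illustrated outside the kernel. The sketch below is not the patch itself: it uses a bit mask rather than the zero-vector argument described in the commit message, and `struct u128`, `xor_if_branch()` and `xor_if_masked()` are hypothetical names. Both functions compute the same result, but only the second one avoids a branch that depends on a key bit.

```c
#include <stdint.h>

/* Toy 128-bit value; the kernel uses be128, this layout is an assumption. */
struct u128 {
	uint64_t hi;
	uint64_t lo;
};

/* Data-dependent branch: execution time may vary with the value of bit. */
static void xor_if_branch(struct u128 *r, const struct u128 *p, unsigned int bit)
{
	if (bit & 1) {
		r->hi ^= p->hi;
		r->lo ^= p->lo;
	}
}

/* Branch-free: always XOR, with p masked to all zeros when bit is clear. */
static void xor_if_masked(struct u128 *r, const struct u128 *p, unsigned int bit)
{
	uint64_t mask = 0 - (uint64_t)(bit & 1); /* all ones or all zeros */

	r->hi ^= p->hi & mask;
	r->lo ^= p->lo & mask;
}
```

Whether a compiler keeps the masked form branch-free is not guaranteed by the C language, so real constant-time code is usually checked at the assembly level as well.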
    • crypto: move gf128mul library into lib/crypto · 61c581a4
      Ard Biesheuvel authored
      The gf128mul library does not depend on the crypto API at all, so it can
      be moved into lib/crypto. This will allow us to use it in other library
      code in a subsequent patch without having to depend on CONFIG_CRYPTO.
      
      While at it, change the Kconfig symbol name to align with other crypto
      library implementations. However, the source file name is retained, as
      it is reflected in the module .ko filename, and changing this might
      break things for users.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  6. 04 Nov, 2022 16 commits
    • crypto: doc - use correct function name · 329cfa42
      Ralph Siemsen authored
      The hashing API does not have a function called .finish(); the
      corresponding operation is .final().
      Signed-off-by: Ralph Siemsen <ralph.siemsen@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: arm64/sm4 - add CE implementation for GCM mode · ae1b83c7
      Tianjia Zhang authored
      This patch is a CE-optimized assembly implementation for GCM mode.
      
      Benchmarked on a T-Head Yitian-710 at 2.75 GHz. The data comes from the
      224 and 224 modes of tcrypt and compares performance before and after
      this patch (the driver used before this patch is
      gcm_base(ctr-sm4-ce,ghash-generic)). The columns are block lengths, and
      the unit is Mb/s:
      
      Before (gcm_base(ctr-sm4-ce,ghash-generic)):
      
      gcm(sm4)     |     16      64      256      512     1024     1420     4096     8192
      -------------+---------------------------------------------------------------------
        GCM enc    |  25.24   64.65   104.66   116.69   123.81   125.12   129.67   130.62
        GCM dec    |  25.40   64.80   104.74   116.70   123.81   125.21   129.68   130.59
        GCM mb enc |  24.95   64.06   104.20   116.38   123.55   124.97   129.63   130.61
        GCM mb dec |  24.92   64.00   104.13   116.34   123.55   124.98   129.56   130.48
      
      After:
      
      gcm-sm4-ce   |     16      64      256      512     1024     1420     4096     8192
      -------------+---------------------------------------------------------------------
        GCM enc    | 108.62  397.18   971.60  1283.92  1522.77  1513.39  1777.00  1806.96
        GCM dec    | 116.36  398.14  1004.27  1319.11  1624.21  1635.43  1932.54  1974.20
        GCM mb enc | 107.13  391.79   962.05  1274.94  1514.76  1508.57  1769.07  1801.58
        GCM mb dec | 113.40  389.36   988.51  1307.68  1619.10  1631.55  1931.70  1970.86
      Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
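The GCM tables are easier to compare as speedup factors. The sketch below copies the "GCM enc" rows from the before and after tables and divides them column by column. The array and function names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* "GCM enc" rows copied from the before/after tables, in Mb/s. */
static const double gcm_enc_before[] = {
	25.24, 64.65, 104.66, 116.69, 123.81, 125.12, 129.67, 130.62,
};
static const double gcm_enc_after[] = {
	108.62, 397.18, 971.60, 1283.92, 1522.77, 1513.39, 1777.00, 1806.96,
};

#define NUM_SIZES (sizeof(gcm_enc_before) / sizeof(gcm_enc_before[0]))

/* Speedup factor for column i (0 is the 16 column, 7 is the 8192 column). */
static double gcm_enc_speedup(size_t i)
{
	return gcm_enc_after[i] / gcm_enc_before[i];
}
```

With these numbers the speedup grows from roughly 4.3x in the 16 column to roughly 13.8x in the 8192 column, which is consistent with the old driver being limited by ghash-generic.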
    • crypto: arm64/sm4 - add CE implementation for CCM mode · 67fa3a7f
      Tianjia Zhang authored
      This patch is a CE-optimized assembly implementation for CCM mode.
      
      Benchmarked on a T-Head Yitian-710 at 2.75 GHz. The data comes from the
      223 and 225 modes of tcrypt and compares performance before and after
      this patch (the driver used before this patch is
      ccm_base(ctr-sm4-ce,cbcmac-sm4-ce)). The columns are block lengths, and
      the unit is Mb/s:
      
      Before (rfc4309(ccm_base(ctr-sm4-ce,cbcmac-sm4-ce))):
      
      ccm(sm4)     |     16      64     256     512    1024    1420    4096    8192
      -------------+---------------------------------------------------------------
        CCM enc    |  35.07  125.40  336.47  468.17  581.97  619.18  712.56  736.01
        CCM dec    |  34.87  124.40  335.08  466.75  581.04  618.81  712.25  735.89
        CCM mb enc |  34.71  123.96  333.92  465.39  579.91  617.49  711.45  734.92
        CCM mb dec |  34.42  122.80  331.02  462.81  578.28  616.42  709.88  734.19
      
      After (rfc4309(ccm-sm4-ce)):
      
      ccm-sm4-ce   |     16      64     256     512    1024    1420    4096    8192
      -------------+---------------------------------------------------------------
        CCM enc    |  77.12  249.82  569.94  725.17  839.27  867.71  952.87  969.89
        CCM dec    |  75.90  247.26  566.29  722.12  836.90  865.95  951.74  968.57
        CCM mb enc |  75.98  245.25  562.91  718.99  834.76  864.70  950.17  967.90
        CCM mb dec |  75.06  243.78  560.58  717.13  833.68  862.70  949.35  967.11
      Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: arm64/sm4 - add CE implementation for cmac/xcbc/cbcmac · 6b5360a5
      Tianjia Zhang authored
      This patch is a CE-optimized assembly implementation for cmac/xcbc/cbcmac.
      
      Benchmarked on a T-Head Yitian-710 at 2.75 GHz. The data comes from the
      300 mode of tcrypt and compares performance before and after this patch
      (the driver used before this patch is XXXmac(sm4-ce)). The columns are
      update sizes, and the unit is Mb/s:
      
      Before:
      
      update-size    |      16      64     256    1024    2048    4096    8192
      ---------------+--------------------------------------------------------
      cmac(sm4-ce)   |  293.33  403.69  503.76  527.78  531.10  535.46  535.81
      xcbc(sm4-ce)   |  292.83  402.50  504.02  529.08  529.87  536.55  538.24
      cbcmac(sm4-ce) |  318.42  415.79  497.12  515.05  523.15  521.19  523.01
      
      After:
      
      update-size    |      16      64     256    1024    2048    4096    8192
      ---------------+--------------------------------------------------------
      cmac-sm4-ce    |  371.99  675.28  903.56  971.65  980.57  990.40  991.04
      xcbc-sm4-ce    |  372.11  674.55  903.47  971.61  980.96  990.42  991.10
      cbcmac-sm4-ce  |  371.63  675.33  903.23  972.07  981.42  990.93  991.45
      Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: arm64/sm4 - add CE implementation for XTS mode · 01f63311
      Tianjia Zhang authored
      This patch is a CE-optimized assembly implementation for XTS mode.
      
      Benchmarked on a T-Head Yitian-710 at 2.75 GHz. The data comes from the
      218 mode of tcrypt and compares performance before and after this patch
      (the driver used before this patch is xts(ecb-sm4-ce)). The columns are
      block lengths, and the unit is Mb/s:
      
      Before:
      
      xts(ecb-sm4-ce) |      16       64      128      256     1024     1420     4096
      ----------------+--------------------------------------------------------------
              XTS enc |  117.17   430.56   732.92  1134.98  2007.03  2136.23  2347.20
              XTS dec |  116.89   429.02   733.40  1132.96  2006.13  2130.50  2347.92
      
      After:
      
      xts-sm4-ce      |      16       64      128      256     1024     1420     4096
      ----------------+--------------------------------------------------------------
              XTS enc |  224.68   798.91  1248.08  1714.60  2413.73  2467.84  2612.62
              XTS dec |  229.85   791.34  1237.79  1720.00  2413.30  2473.84  2611.95
      Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: arm64/sm4 - add CE implementation for CTS-CBC mode · b1863fd0
      Tianjia Zhang authored
      This patch is a CE-optimized assembly implementation for CTS-CBC mode.
      
      Benchmarked on a T-Head Yitian-710 at 2.75 GHz. The data comes from the
      218 mode of tcrypt and compares performance before and after this patch
      (the driver used before this patch is cts(cbc-sm4-ce)). The columns are
      block lengths, and the unit is Mb/s:
      
      Before:
      
      cts(cbc-sm4-ce) |      16       64      128      256     1024     1420     4096
      ----------------+--------------------------------------------------------------
          CTS-CBC enc |  286.09   297.17   457.97   627.75   868.58   900.80   957.69
          CTS-CBC dec |  286.67   285.63   538.35   947.08  2241.03  2577.32  3391.14
      
      After:
      
      cts-cbc-sm4-ce  |      16       64      128      256     1024     1420     4096
      ----------------+--------------------------------------------------------------
          CTS-CBC enc |  288.19   428.80   593.57   741.04   911.73   931.80   950.00
          CTS-CBC dec |  292.22   468.99   838.23  1380.76  2741.17  3036.42  3409.62
      Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: arm64/sm4 - export reusable CE acceleration functions · 45089dbe
      Tianjia Zhang authored
      Some functions in the SM4 implementation that uses the Crypto Extension
      instructions can be reused in the upcoming accelerated implementations
      of the GCM/CCM modes, and the CBC/CFB encryption is reused in the
      optimized implementation of SVESM4.
      Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: arm64/sm4 - simplify sm4_ce_expand_key() of CE implementation · cb9ba02b
      Tianjia Zhang authored
      Use a 128-bit swap mask and tbl instruction to simplify the implementation
      for generating SM4 rkey_dec.
      
      Also fix sm4_ce_expand_key() being called without being wrapped in
      kernel_neon_begin()/kernel_neon_end().
      Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: arm64/sm4 - refactor and simplify CE implementation · ce41fefd
      Tianjia Zhang authored
      This patch does not add new features, but only refactors and simplifies the
      implementation of the Crypto Extension acceleration of the SM4 algorithm:
      
      Extract the macro optimized by SM4 Crypto Extension for reuse in the
      subsequent optimization of CCM/GCM modes.
      
      Encryption in CBC and CFB modes processes four blocks at a time instead of
      one, allowing the ld1 instruction to load 64 bytes of data at a time, which
      reduces unnecessary memory accesses.
      
      CBC/CFB/CTR makes full use of free registers to reduce redundant memory
      accesses, and rearranges some instructions to improve out-of-order execution
      capabilities.
      Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: tcrypt - add SM4 cts-cbc/xts/xcbc test · 3c383637
      Tianjia Zhang authored
      Add CTS-CBC/XTS/XCBC tests for the SM4 algorithm, along with the
      corresponding speed tests, in order to test performance-optimized
      implementations of these modes.
      Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: testmgr - add SM4 cts-cbc/xts/xcbc test vectors · c24ee936
      Tianjia Zhang authored
      This patch adds test vectors for the CTS-CBC/XTS/XCBC modes of the SM4
      algorithm, as well as some additional test vectors for SM4 GCM/CCM.
      Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: arm64/sm4 - refactor and simplify NEON implementation · 62508017
      Tianjia Zhang authored
      This patch does not add new features. The main work is to refactor and
      simplify the implementation of SM4 NEON, which is reflected in the
      following aspects:
      
      The accelerated implementation supports an arbitrary number of blocks,
      not just multiples of 8, which simplifies the implementation and speeds
      up processing of data that is not a multiple of 8 blocks.
      
      When loading the input data, use the ld4 instruction to replace the
      original ld1 instruction as much as possible, which will save the cost
      of matrix transposition of the input data.
      
      Use 8-block parallelism whenever possible to speed up matrix transpose
      and rotation operations, instead of up to 4-block parallelism.
      Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: arm64/sm3 - add NEON assembly implementation · a41b2129
      Tianjia Zhang authored
      This patch adds the NEON acceleration implementation of the SM3 hash
      algorithm. The main algorithm is based on SM3 NEON accelerated work of
      the libgcrypt project.
      
      Benchmarked on a T-Head Yitian-710 at 2.75 GHz. The data comes from the
      326 mode of tcrypt and compares the performance of sm3-generic,
      sm3-neon and sm3-ce. The columns are update sizes, and the unit is
      Mb/s:
      
      update-size    |      16      64     256    1024    2048    4096    8192
      ---------------+--------------------------------------------------------
      sm3-generic    |  185.24  221.28  301.26  307.43  300.83  308.82  308.91
      sm3-neon       |  171.81  220.20  322.94  339.28  334.09  343.61  343.87
      sm3-ce         |  227.48  333.48  502.62  527.87  520.45  534.91  535.40
      Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: arm64/sm3 - raise the priority of the CE implementation · e1fa51aa
      Tianjia Zhang authored
      Raise the priority of the sm3-ce algorithm from 200 to 400 to make room
      for the sm3-neon implementation.
      Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
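The priority matters because a request for a generic algorithm name such as "sm3" is resolved to the registered implementation with the highest priority. A minimal model of that selection is sketched below. `struct impl` and `pick_best()` are hypothetical, and the priorities shown for sm3-generic (100) and sm3-neon (200) are assumptions for illustration. Only the 400 for sm3-ce comes from the commit message.

```c
#include <stddef.h>

/* Hypothetical model of a registered implementation. */
struct impl {
	const char *driver;
	int priority;
};

/* Example registrations; the generic and NEON values are assumed. */
static const struct impl sm3_impls[] = {
	{ "sm3-generic", 100 },
	{ "sm3-neon",    200 },
	{ "sm3-ce",      400 },
};

/* Return the highest-priority entry; on a tie the earlier entry wins. */
static const struct impl *pick_best(const struct impl *v, size_t n)
{
	const struct impl *best = NULL;

	for (size_t i = 0; i < n; i++)
		if (!best || v[i].priority > best->priority)
			best = &v[i];
	return best;
}
```

If sm3-ce had stayed at 200, it would tie with a NEON implementation registered at the same value in this model, and the choice between them would no longer be determined by priority alone.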
    • crypto: tcrypt - Drop leading newlines from prints · 3513828c
      Anirudh Venkataramanan authored
      The top level print banners have a leading newline. It's not entirely
      clear why this exists, but it makes it harder to parse tcrypt test output
      using a script. Drop said newlines.
      
      tcrypt output before this patch:
      
      [...]
            testing speed of rfc4106(gcm(aes)) (rfc4106-gcm-aesni) encryption
      [...] test 0 (160 bit key, 16 byte blocks): 1 operation in 2320 cycles (16 bytes)
      
      tcrypt output with this patch:
      
      [...] testing speed of rfc4106(gcm(aes)) (rfc4106-gcm-aesni) encryption
      [...] test 0 (160 bit key, 16 byte blocks): 1 operation in 2320 cycles (16 bytes)
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: tcrypt - Drop module name from print string · a2ef5630
      Anirudh Venkataramanan authored
      The pr_fmt() define includes KBUILD_MODNAME, and so there's no need
      for pr_err() to also print it. Drop module name from the print string.
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
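The effect of pr_fmt() can be reproduced in userspace. This is a sketch rather than kernel code: `pr_err()` is re-implemented on top of fprintf(), `fmt_msg()` is a hypothetical helper that writes into a buffer so the result can be inspected, the message text is made up, and KBUILD_MODNAME is defined by hand here although the kernel build system normally supplies it. The `##__VA_ARGS__` form is a GNU extension supported by gcc and clang.

```c
#include <stdio.h>

#define KBUILD_MODNAME "tcrypt" /* normally provided by the build system */

/* Every format string passed through pr_fmt() gains the module prefix. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/* Userspace stand-in for pr_err(): applies pr_fmt() to the format string. */
#define pr_err(fmt, ...) fprintf(stderr, pr_fmt(fmt), ##__VA_ARGS__)

/* Same prefixing, but into a caller-supplied buffer (hypothetical helper). */
#define fmt_msg(buf, len, fmt, ...) \
	snprintf(buf, len, pr_fmt(fmt), ##__VA_ARGS__)
```

Because the prefix is added by the macro, writing the module name into the message string as well would print it twice, which is the duplication the patch removes.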