1. 04 Nov, 2022 7 commits
    • crypto: arm64/sm4 - refactor and simplify NEON implementation · 62508017
      Tianjia Zhang authored
      This patch does not add new features. It refactors and simplifies the
      SM4 NEON implementation in the following ways:
      
      The accelerated implementation now supports an arbitrary number of
      blocks, not just multiples of 8. This simplifies the code and speeds up
      inputs whose length is not a multiple of 8 blocks.
      
      When loading input data, the ld4 instruction replaces the original ld1
      instruction wherever possible, which saves the cost of transposing the
      input matrix.
      
      8-block parallelism is used whenever possible to speed up the matrix
      transpose and rotation operations, instead of at most 4-block
      parallelism.
      Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
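The block-count handling described in the SM4 commit can be pictured with a small userspace sketch. It only shows how a request might be divided into full 8-block passes plus one partial pass; `struct sm4_split` and `sm4_split_blocks()` are hypothetical names for illustration, and the real assembly may organize its tail handling differently.

```c
#include <assert.h>
#include <stddef.h>

#define SM4_BLOCK_SIZE 16 /* SM4 uses 128-bit (16-byte) blocks */

/* Hypothetical bookkeeping: records how a request would be split. */
struct sm4_split {
    size_t full_batches; /* passes through the 8-block path */
    size_t tail_blocks;  /* 0..7 blocks left for one final partial pass */
};

static struct sm4_split sm4_split_blocks(size_t nbytes)
{
    struct sm4_split s = { 0, 0 };
    size_t nblocks;

    /* Only whole blocks are considered in this sketch. */
    assert(nbytes % SM4_BLOCK_SIZE == 0);
    nblocks = nbytes / SM4_BLOCK_SIZE;

    /* Use the widest path for as long as 8 blocks remain. */
    while (nblocks >= 8) {
        s.full_batches++;
        nblocks -= 8;
    }

    /* Whatever remains no longer needs to be a multiple of 8. */
    s.tail_blocks = nblocks;
    return s;
}
```

For example, an 11-block request would take one full 8-block pass and leave a 3-block tail for the accelerated code, rather than falling back to a slower path.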
    • crypto: arm64/sm3 - add NEON assembly implementation · a41b2129
      Tianjia Zhang authored
      This patch adds the NEON acceleration implementation of the SM3 hash
      algorithm. The main algorithm is based on SM3 NEON accelerated work of
      the libgcrypt project.
      
      Benchmarked on a T-Head Yitian-710 at 2.75 GHz. The data comes from
      tcrypt mode 326 and compares sm3-neon against sm3-generic and sm3-ce.
      The columns are different update sizes, and throughput is given in
      Mb/s:
      
      update-size    |      16      64     256    1024    2048    4096    8192
      ---------------+--------------------------------------------------------
      sm3-generic    |  185.24  221.28  301.26  307.43  300.83  308.82  308.91
      sm3-neon       |  171.81  220.20  322.94  339.28  334.09  343.61  343.87
      sm3-ce         |  227.48  333.48  502.62  527.87  520.45  534.91  535.40
      Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
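To read the SM3 table as relative speedups, the figures can be divided column by column. The sketch below copies the numbers from the table above; the array names and the `speedup()` helper are illustrative only and are not part of tcrypt.

```c
#include <stddef.h>

/* Throughput figures copied from the benchmark table (same column order:
 * update sizes 16, 64, 256, 1024, 2048, 4096, 8192). */
static const double sm3_generic[] = { 185.24, 221.28, 301.26, 307.43, 300.83, 308.82, 308.91 };
static const double sm3_neon[]    = { 171.81, 220.20, 322.94, 339.28, 334.09, 343.61, 343.87 };
static const double sm3_ce[]      = { 227.48, 333.48, 502.62, 527.87, 520.45, 534.91, 535.40 };

#define SM3_NCOLS (sizeof(sm3_generic) / sizeof(sm3_generic[0]))

/* Ratio of one implementation's throughput to a baseline, for column i. */
static double speedup(const double *impl, const double *base, size_t i)
{
    return impl[i] / base[i];
}
```

With these numbers, `speedup(sm3_neon, sm3_generic, 0)` is about 0.93 (NEON is slightly slower for 16-byte updates), `speedup(sm3_neon, sm3_generic, 6)` is about 1.11, and `speedup(sm3_ce, sm3_generic, 6)` is about 1.73.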
    • crypto: arm64/sm3 - raise the priority of the CE implementation · e1fa51aa
      Tianjia Zhang authored
      Raise the priority of the sm3-ce algorithm from 200 to 400 to make
      room for the sm3-neon implementation.
      Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
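Why the priority matters can be shown with a simplified model of priority-based selection: among implementations registered under the same algorithm name, the one with the highest priority wins. The struct below is a reduced stand-in, not the kernel's `struct crypto_alg`, and `pick_alg()` is a hypothetical helper. Only the value 400 for sm3-ce comes from the commit message; the values 100 for sm3-generic and 200 for sm3-neon are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Reduced stand-in for an algorithm registration entry. */
struct alg_entry {
    const char *cra_name;        /* generic algorithm name, e.g. "sm3" */
    const char *cra_driver_name; /* specific implementation */
    int cra_priority;            /* higher value is preferred */
};

static const struct alg_entry sm3_algs[] = {
    { "sm3", "sm3-generic", 100 }, /* assumed priority */
    { "sm3", "sm3-neon",    200 }, /* assumed priority */
    { "sm3", "sm3-ce",      400 }, /* value stated in the commit message */
};

/* Return the highest-priority entry matching name, or NULL if none. */
static const struct alg_entry *pick_alg(const struct alg_entry *algs,
                                        size_t n, const char *name)
{
    const struct alg_entry *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(algs[i].cra_name, name) != 0)
            continue;
        if (best == NULL || algs[i].cra_priority > best->cra_priority)
            best = &algs[i];
    }
    return best;
}
```

Under these assumed values, a lookup for "sm3" resolves to sm3-ce. Had sm3-ce stayed at 200, it would have tied with the assumed sm3-neon priority, which is the collision the commit avoids.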
    • crypto: tcrypt - Drop leading newlines from prints · 3513828c
      Anirudh Venkataramanan authored
      The top level print banners have a leading newline. It's not entirely
      clear why this exists, but it makes it harder to parse tcrypt test output
      using a script. Drop said newlines.
      
      tcrypt output before this patch:
      
      [...]
            testing speed of rfc4106(gcm(aes)) (rfc4106-gcm-aesni) encryption
      [...] test 0 (160 bit key, 16 byte blocks): 1 operation in 2320 cycles (16 bytes)
      
      tcrypt output with this patch:
      
      [...] testing speed of rfc4106(gcm(aes)) (rfc4106-gcm-aesni) encryption
      [...] test 0 (160 bit key, 16 byte blocks): 1 operation in 2320 cycles (16 bytes)
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: tcrypt - Drop module name from print string · a2ef5630
      Anirudh Venkataramanan authored
      The pr_fmt() define includes KBUILD_MODNAME, and so there's no need
      for pr_err() to also print it. Drop module name from the print string.
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: tcrypt - Use pr_info/pr_err · 837a99f5
      Anirudh Venkataramanan authored
      Currently, tcrypt mixes printk() with pr_info()/pr_err(). The latter
      print the module name (because pr_fmt() is defined to include it) but the
      former does not, so the printed output is inconsistent. For example:
      
      modprobe tcrypt mode=211:
      
      [...] test 0 (160 bit key, 16 byte blocks): 1 operation in 2320 cycles (16 bytes)
      [...] test 1 (160 bit key, 64 byte blocks): 1 operation in 2336 cycles (64 bytes)
      
      modprobe tcrypt mode=215:
      
      [...] tcrypt: test 0 (160 bit key, 16 byte blocks): 1 operation in 2173 cycles (16 bytes)
      [...] tcrypt: test 1 (160 bit key, 64 byte blocks): 1 operation in 2241 cycles (64 bytes)
      
      Replace all instances of printk() with pr_info()/pr_err() so that the
      module name is printed consistently.
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
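The prefix difference between printk() and pr_info() comes from pr_fmt() being pasted onto the format string at compile time. The userspace sketch below imitates that mechanism with snprintf(); `demo_printk()` and `demo_pr_info()` are hypothetical stand-ins and do not call any kernel API. In the kernel, KBUILD_MODNAME is supplied by the build system rather than defined by hand.

```c
#include <stdio.h>

/* Defined by hand here; the kernel build system normally provides it. */
#define KBUILD_MODNAME "tcrypt"

/* Same shape as the common kernel idiom: prefix every format string. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

static char demo_line[128];

/* Stand-in for a bare printk(): the format string is used as written. */
static const char *demo_printk(unsigned int i)
{
    snprintf(demo_line, sizeof(demo_line), "test %u\n", i);
    return demo_line;
}

/* Stand-in for pr_info(): the format string goes through pr_fmt() first,
 * so adjacent string literals concatenate into "tcrypt: test %u\n". */
static const char *demo_pr_info(unsigned int i)
{
    snprintf(demo_line, sizeof(demo_line), pr_fmt("test %u\n"), i);
    return demo_line;
}
```

Calling `demo_printk(0)` yields `test 0`, while `demo_pr_info(0)` yields `tcrypt: test 0`, which mirrors the difference between the mode=211 and mode=215 output quoted in the commit message.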
    • crypto: tcrypt - Use pr_cont to print test results · fdaeb224
      Anirudh Venkataramanan authored
      For some test cases, a line break gets inserted between the test banner
      and the results. For example, with mode=211 this is the output:
      
      [...]
            testing speed of rfc4106(gcm(aes)) (rfc4106-gcm-aesni) encryption
      [...] test 0 (160 bit key, 16 byte blocks):
      [...] 1 operation in 2373 cycles (16 bytes)
      
      --snip--
      
      [...]
            testing speed of gcm(aes) (generic-gcm-aesni) encryption
      [...] test 0 (128 bit key, 16 byte blocks):
      [...] 1 operation in 2338 cycles (16 bytes)
      
      Similar behavior is seen in the following cases as well:
      
        modprobe tcrypt mode=212
        modprobe tcrypt mode=213
        modprobe tcrypt mode=221
        modprobe tcrypt mode=300 sec=1
        modprobe tcrypt mode=400 sec=1
      
      This doesn't happen with mode=215:
      
      [...] tcrypt:
                    testing speed of multibuffer rfc4106(gcm(aes)) (rfc4106-gcm-aesni) encryption
      [...] tcrypt: test 0 (160 bit key, 16 byte blocks): 1 operation in 2215 cycles (16 bytes)
      
      --snip--
      
      [...] tcrypt:
                    testing speed of multibuffer gcm(aes) (generic-gcm-aesni) encryption
      [...] tcrypt: test 0 (128 bit key, 16 byte blocks): 1 operation in 2191 cycles (16 bytes)
      
      This print inconsistency is because printk() is used instead of pr_cont()
      in a few places. Change these to be pr_cont().
      
      checkpatch warns that pr_cont() shouldn't be used. This can be ignored in
      this context as tcrypt already uses pr_cont().
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
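The line-break behavior described in the pr_cont commit can be approximated with a toy log model: a message that is not marked as a continuation starts a new record, which closes any partial line left open before it. This is a deliberately simplified userspace sketch; `struct fake_log` and `fake_emit()` are hypothetical, and the real printk record handling is more involved.

```c
#include <stddef.h>
#include <string.h>

/* Toy log: accumulated text plus a flag for an unterminated last line. */
struct fake_log {
    char text[256];
    int open; /* nonzero if the last message did not end in '\n' */
};

static void fake_append(struct fake_log *log, const char *s)
{
    /* Bounded append; silently truncates if the buffer is full. */
    strncat(log->text, s, sizeof(log->text) - strlen(log->text) - 1);
}

/* cont == 0 models a plain printk(), cont != 0 models pr_cont(). */
static void fake_emit(struct fake_log *log, const char *msg, int cont)
{
    size_t len = strlen(msg);

    /* A new record closes the partial line that came before it. */
    if (!cont && log->open)
        fake_append(log, "\n");

    fake_append(log, msg);
    if (len > 0)
        log->open = (msg[len - 1] != '\n');
}
```

Emitting `"test 0: "` followed by `"1 operation\n"` with `cont == 0` produces two lines, as in the mode=211 output above. Emitting the second message with `cont != 0` keeps the result on the banner's line.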
  2. 28 Oct, 2022 33 commits