Commit 3e1a29b3 authored by Linus Torvalds

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "API:

   - Decryption test vectors are now automatically generated from
     encryption test vectors.

  Algorithms:

   - Fix unaligned access issues in crc32/crc32c.

   - Add zstd compression algorithm.

   - Add AEGIS.

   - Add MORUS.

  Drivers:

   - Add accelerated AEGIS/MORUS on x86.

   - Add accelerated SM4 on arm64.

   - Remove the x86 assembly salsa20 implementations, as they are slower
     than the generic C code.

   - Add authenc(hmac(sha*), cbc(aes)) support in inside-secure.

   - Add ctr(aes) support in crypto4xx.

   - Add hardware key support in ccree.

   - Add support for new Centaur CPU in via-rng"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (112 commits)
  crypto: chtls - free beyond end rspq_skb_cache
  crypto: chtls - kbuild warnings
  crypto: chtls - dereference null variable
  crypto: chtls - wait for memory sendmsg, sendpage
  crypto: chtls - key len correction
  crypto: salsa20 - Revert "crypto: salsa20 - export generic helpers"
  crypto: x86/salsa20 - remove x86 salsa20 implementations
  crypto: ccp - Add GET_ID SEV command
  crypto: ccp - Add DOWNLOAD_FIRMWARE SEV command
  crypto: qat - Add MODULE_FIRMWARE for all qat drivers
  crypto: ccree - silence debug prints
  crypto: ccree - better clock handling
  crypto: ccree - correct host regs offset
  crypto: chelsio - Remove separate buffer used for DMA map B0 block in CCM
  crypt: chelsio - Send IV as Immediate for cipher algo
  crypto: chelsio - Return -ENOSPC for transient busy indication.
  crypto: caam/qi - fix warning in init_cgr()
  crypto: caam - fix rfc4543 descriptors
  crypto: caam - fix MC firmware detection
  crypto: clarify licensing of OpenSSL asm code
  ...
parents fd59ccc5 b268b350
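As a quick kernel-side orientation (a minimal sketch, not code from this merge): the newly added AEAD algorithms such as MORUS-640 are driven through the generic AEAD API, and the crypto core resolves the algorithm name to the highest-priority registered implementation, so the accelerated x86 drivers added here are picked automatically on capable CPUs. The zero key/nonce, the 16-byte message and the helper name demo_morus640_encrypt below are illustrative assumptions only.

#include <crypto/aead.h>
#include <linux/crypto.h>
#include <linux/scatterlist.h>
#include <linux/err.h>

static int demo_morus640_encrypt(void)
{
	struct crypto_aead *tfm;
	struct aead_request *req;
	DECLARE_CRYPTO_WAIT(wait);
	struct scatterlist sg;
	u8 key[16] = { 0 };			/* MORUS-640: 128-bit key ... */
	u8 iv[16]  = { 0 };			/* ... and 128-bit nonce (illustrative zeros) */
	u8 buf[32] = "sixteen byte msg";	/* 16 B plaintext + room for the 16 B tag */
	int err;

	tfm = crypto_alloc_aead("morus640", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_aead_setkey(tfm, key, sizeof(key));
	if (!err)
		err = crypto_aead_setauthsize(tfm, 16);
	if (err)
		goto out_free_tfm;

	req = aead_request_alloc(tfm, GFP_KERNEL);
	if (!req) {
		err = -ENOMEM;
		goto out_free_tfm;
	}

	sg_init_one(&sg, buf, sizeof(buf));
	aead_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
				  crypto_req_done, &wait);
	aead_request_set_ad(req, 0);			/* no associated data */
	aead_request_set_crypt(req, &sg, &sg, 16, iv);	/* encrypt 16 bytes in place */

	/* wait for async implementations (e.g. the SIMD/cryptd-backed ones) */
	err = crypto_wait_req(crypto_aead_encrypt(req), &wait);

	aead_request_free(req);
out_free_tfm:
	crypto_free_aead(tfm);
	return err;			/* on success, buf holds ciphertext || tag */
}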
#define __ARM_ARCH__ __LINUX_ARM_ARCH__ #define __ARM_ARCH__ __LINUX_ARM_ARCH__
@ SPDX-License-Identifier: GPL-2.0
@ This code is taken from the OpenSSL project but the author (Andy Polyakov)
@ has relicensed it under the GPLv2. Therefore this program is free software;
@ you can redistribute it and/or modify it under the terms of the GNU General
@ Public License version 2 as published by the Free Software Foundation.
@
@ The original headers, including the original license headers, are
@ included below for completeness.
@ ==================================================================== @ ====================================================================
@ Written by Andy Polyakov <appro@fy.chalmers.se> for the OpenSSL @ Written by Andy Polyakov <appro@fy.chalmers.se> for the OpenSSL
@ project. The module is, however, dual licensed under OpenSSL and @ project. The module is, however, dual licensed under OpenSSL and
......
#!/usr/bin/env perl #!/usr/bin/env perl
# SPDX-License-Identifier: GPL-2.0
# This code is taken from the OpenSSL project but the author (Andy Polyakov)
# has relicensed it under the GPLv2. Therefore this program is free software;
# you can redistribute it and/or modify it under the terms of the GNU General
# Public License version 2 as published by the Free Software Foundation.
#
# The original headers, including the original license headers, are
# included below for completeness.
# ==================================================================== # ====================================================================
# Written by Andy Polyakov <appro@openssl.org> for the OpenSSL # Written by Andy Polyakov <appro@openssl.org> for the OpenSSL
# project. The module is, however, dual licensed under OpenSSL and # project. The module is, however, dual licensed under OpenSSL and
# CRYPTOGAMS licenses depending on where you obtain it. For further # CRYPTOGAMS licenses depending on where you obtain it. For further
# details see http://www.openssl.org/~appro/cryptogams/. # details see http://www.openssl.org/~appro/cryptogams/.
#
# Permission to use under GPL terms is granted.
# ==================================================================== # ====================================================================
# SHA256 block procedure for ARMv4. May 2007. # SHA256 block procedure for ARMv4. May 2007.
......
@ SPDX-License-Identifier: GPL-2.0
@ This code is taken from the OpenSSL project but the author (Andy Polyakov)
@ has relicensed it under the GPLv2. Therefore this program is free software;
@ you can redistribute it and/or modify it under the terms of the GNU General
@ Public License version 2 as published by the Free Software Foundation.
@
@ The original headers, including the original license headers, are
@ included below for completeness.
@ ==================================================================== @ ====================================================================
@ Written by Andy Polyakov <appro@openssl.org> for the OpenSSL @ Written by Andy Polyakov <appro@openssl.org> for the OpenSSL
@ project. The module is, however, dual licensed under OpenSSL and @ project. The module is, however, dual licensed under OpenSSL and
@ CRYPTOGAMS licenses depending on where you obtain it. For further @ CRYPTOGAMS licenses depending on where you obtain it. For further
@ details see http://www.openssl.org/~appro/cryptogams/. @ details see http://www.openssl.org/~appro/cryptogams/.
@
@ Permission to use under GPL terms is granted.
@ ==================================================================== @ ====================================================================
@ SHA256 block procedure for ARMv4. May 2007. @ SHA256 block procedure for ARMv4. May 2007.
......
#!/usr/bin/env perl #!/usr/bin/env perl
# SPDX-License-Identifier: GPL-2.0
# This code is taken from the OpenSSL project but the author (Andy Polyakov)
# has relicensed it under the GPLv2. Therefore this program is free software;
# you can redistribute it and/or modify it under the terms of the GNU General
# Public License version 2 as published by the Free Software Foundation.
#
# The original headers, including the original license headers, are
# included below for completeness.
# ==================================================================== # ====================================================================
# Written by Andy Polyakov <appro@openssl.org> for the OpenSSL # Written by Andy Polyakov <appro@openssl.org> for the OpenSSL
# project. The module is, however, dual licensed under OpenSSL and # project. The module is, however, dual licensed under OpenSSL and
# CRYPTOGAMS licenses depending on where you obtain it. For further # CRYPTOGAMS licenses depending on where you obtain it. For further
# details see http://www.openssl.org/~appro/cryptogams/. # details see http://www.openssl.org/~appro/cryptogams/.
#
# Permission to use under GPL terms is granted.
# ==================================================================== # ====================================================================
# SHA512 block procedure for ARMv4. September 2007. # SHA512 block procedure for ARMv4. September 2007.
......
@ SPDX-License-Identifier: GPL-2.0
@ This code is taken from the OpenSSL project but the author (Andy Polyakov)
@ has relicensed it under the GPLv2. Therefore this program is free software;
@ you can redistribute it and/or modify it under the terms of the GNU General
@ Public License version 2 as published by the Free Software Foundation.
@
@ The original headers, including the original license headers, are
@ included below for completeness.
@ ==================================================================== @ ====================================================================
@ Written by Andy Polyakov <appro@openssl.org> for the OpenSSL @ Written by Andy Polyakov <appro@openssl.org> for the OpenSSL
@ project. The module is, however, dual licensed under OpenSSL and @ project. The module is, however, dual licensed under OpenSSL and
@ CRYPTOGAMS licenses depending on where you obtain it. For further @ CRYPTOGAMS licenses depending on where you obtain it. For further
@ details see http://www.openssl.org/~appro/cryptogams/. @ details see http://www.openssl.org/~appro/cryptogams/.
@
@ Permission to use under GPL terms is granted.
@ ==================================================================== @ ====================================================================
@ SHA512 block procedure for ARMv4. September 2007. @ SHA512 block procedure for ARMv4. September 2007.
......
...@@ -47,6 +47,12 @@ config CRYPTO_SM3_ARM64_CE ...@@ -47,6 +47,12 @@ config CRYPTO_SM3_ARM64_CE
select CRYPTO_HASH select CRYPTO_HASH
select CRYPTO_SM3 select CRYPTO_SM3
config CRYPTO_SM4_ARM64_CE
tristate "SM4 symmetric cipher (ARMv8.2 Crypto Extensions)"
depends on KERNEL_MODE_NEON
select CRYPTO_ALGAPI
select CRYPTO_SM4
config CRYPTO_GHASH_ARM64_CE config CRYPTO_GHASH_ARM64_CE
tristate "GHASH/AES-GCM using ARMv8 Crypto Extensions" tristate "GHASH/AES-GCM using ARMv8 Crypto Extensions"
depends on KERNEL_MODE_NEON depends on KERNEL_MODE_NEON
......
...@@ -23,6 +23,9 @@ sha3-ce-y := sha3-ce-glue.o sha3-ce-core.o ...@@ -23,6 +23,9 @@ sha3-ce-y := sha3-ce-glue.o sha3-ce-core.o
obj-$(CONFIG_CRYPTO_SM3_ARM64_CE) += sm3-ce.o obj-$(CONFIG_CRYPTO_SM3_ARM64_CE) += sm3-ce.o
sm3-ce-y := sm3-ce-glue.o sm3-ce-core.o sm3-ce-y := sm3-ce-glue.o sm3-ce-core.o
obj-$(CONFIG_CRYPTO_SM4_ARM64_CE) += sm4-ce.o
sm4-ce-y := sm4-ce-glue.o sm4-ce-core.o
obj-$(CONFIG_CRYPTO_GHASH_ARM64_CE) += ghash-ce.o obj-$(CONFIG_CRYPTO_GHASH_ARM64_CE) += ghash-ce.o
ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o ghash-ce-y := ghash-ce-glue.o ghash-ce-core.o
......
...@@ -19,24 +19,33 @@ ...@@ -19,24 +19,33 @@
* u32 *macp, u8 const rk[], u32 rounds); * u32 *macp, u8 const rk[], u32 rounds);
*/ */
ENTRY(ce_aes_ccm_auth_data) ENTRY(ce_aes_ccm_auth_data)
ldr w8, [x3] /* leftover from prev round? */ frame_push 7
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x5
ldr w25, [x22] /* leftover from prev round? */
ld1 {v0.16b}, [x0] /* load mac */ ld1 {v0.16b}, [x0] /* load mac */
cbz w8, 1f cbz w25, 1f
sub w8, w8, #16 sub w25, w25, #16
eor v1.16b, v1.16b, v1.16b eor v1.16b, v1.16b, v1.16b
0: ldrb w7, [x1], #1 /* get 1 byte of input */ 0: ldrb w7, [x20], #1 /* get 1 byte of input */
subs w2, w2, #1 subs w21, w21, #1
add w8, w8, #1 add w25, w25, #1
ins v1.b[0], w7 ins v1.b[0], w7
ext v1.16b, v1.16b, v1.16b, #1 /* rotate in the input bytes */ ext v1.16b, v1.16b, v1.16b, #1 /* rotate in the input bytes */
beq 8f /* out of input? */ beq 8f /* out of input? */
cbnz w8, 0b cbnz w25, 0b
eor v0.16b, v0.16b, v1.16b eor v0.16b, v0.16b, v1.16b
1: ld1 {v3.4s}, [x4] /* load first round key */ 1: ld1 {v3.4s}, [x23] /* load first round key */
prfm pldl1strm, [x1] prfm pldl1strm, [x20]
cmp w5, #12 /* which key size? */ cmp w24, #12 /* which key size? */
add x6, x4, #16 add x6, x23, #16
sub w7, w5, #2 /* modified # of rounds */ sub w7, w24, #2 /* modified # of rounds */
bmi 2f bmi 2f
bne 5f bne 5f
mov v5.16b, v3.16b mov v5.16b, v3.16b
...@@ -55,33 +64,43 @@ ENTRY(ce_aes_ccm_auth_data) ...@@ -55,33 +64,43 @@ ENTRY(ce_aes_ccm_auth_data)
ld1 {v5.4s}, [x6], #16 /* load next round key */ ld1 {v5.4s}, [x6], #16 /* load next round key */
bpl 3b bpl 3b
aese v0.16b, v4.16b aese v0.16b, v4.16b
subs w2, w2, #16 /* last data? */ subs w21, w21, #16 /* last data? */
eor v0.16b, v0.16b, v5.16b /* final round */ eor v0.16b, v0.16b, v5.16b /* final round */
bmi 6f bmi 6f
ld1 {v1.16b}, [x1], #16 /* load next input block */ ld1 {v1.16b}, [x20], #16 /* load next input block */
eor v0.16b, v0.16b, v1.16b /* xor with mac */ eor v0.16b, v0.16b, v1.16b /* xor with mac */
bne 1b beq 6f
6: st1 {v0.16b}, [x0] /* store mac */
if_will_cond_yield_neon
st1 {v0.16b}, [x19] /* store mac */
do_cond_yield_neon
ld1 {v0.16b}, [x19] /* reload mac */
endif_yield_neon
b 1b
6: st1 {v0.16b}, [x19] /* store mac */
beq 10f beq 10f
adds w2, w2, #16 adds w21, w21, #16
beq 10f beq 10f
mov w8, w2 mov w25, w21
7: ldrb w7, [x1], #1 7: ldrb w7, [x20], #1
umov w6, v0.b[0] umov w6, v0.b[0]
eor w6, w6, w7 eor w6, w6, w7
strb w6, [x0], #1 strb w6, [x19], #1
subs w2, w2, #1 subs w21, w21, #1
beq 10f beq 10f
ext v0.16b, v0.16b, v0.16b, #1 /* rotate out the mac bytes */ ext v0.16b, v0.16b, v0.16b, #1 /* rotate out the mac bytes */
b 7b b 7b
8: mov w7, w8 8: mov w7, w25
add w8, w8, #16 add w25, w25, #16
9: ext v1.16b, v1.16b, v1.16b, #1 9: ext v1.16b, v1.16b, v1.16b, #1
adds w7, w7, #1 adds w7, w7, #1
bne 9b bne 9b
eor v0.16b, v0.16b, v1.16b eor v0.16b, v0.16b, v1.16b
st1 {v0.16b}, [x0] st1 {v0.16b}, [x19]
10: str w8, [x3] 10: str w25, [x22]
frame_pop
ret ret
ENDPROC(ce_aes_ccm_auth_data) ENDPROC(ce_aes_ccm_auth_data)
...@@ -126,19 +145,29 @@ ENTRY(ce_aes_ccm_final) ...@@ -126,19 +145,29 @@ ENTRY(ce_aes_ccm_final)
ENDPROC(ce_aes_ccm_final) ENDPROC(ce_aes_ccm_final)
.macro aes_ccm_do_crypt,enc .macro aes_ccm_do_crypt,enc
ldr x8, [x6, #8] /* load lower ctr */ frame_push 8
ld1 {v0.16b}, [x5] /* load mac */
CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */ mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x5
mov x25, x6
ldr x26, [x25, #8] /* load lower ctr */
ld1 {v0.16b}, [x24] /* load mac */
CPU_LE( rev x26, x26 ) /* keep swabbed ctr in reg */
0: /* outer loop */ 0: /* outer loop */
ld1 {v1.8b}, [x6] /* load upper ctr */ ld1 {v1.8b}, [x25] /* load upper ctr */
prfm pldl1strm, [x1] prfm pldl1strm, [x20]
add x8, x8, #1 add x26, x26, #1
rev x9, x8 rev x9, x26
cmp w4, #12 /* which key size? */ cmp w23, #12 /* which key size? */
sub w7, w4, #2 /* get modified # of rounds */ sub w7, w23, #2 /* get modified # of rounds */
ins v1.d[1], x9 /* no carry in lower ctr */ ins v1.d[1], x9 /* no carry in lower ctr */
ld1 {v3.4s}, [x3] /* load first round key */ ld1 {v3.4s}, [x22] /* load first round key */
add x10, x3, #16 add x10, x22, #16
bmi 1f bmi 1f
bne 4f bne 4f
mov v5.16b, v3.16b mov v5.16b, v3.16b
...@@ -165,9 +194,9 @@ CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */ ...@@ -165,9 +194,9 @@ CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */
bpl 2b bpl 2b
aese v0.16b, v4.16b aese v0.16b, v4.16b
aese v1.16b, v4.16b aese v1.16b, v4.16b
subs w2, w2, #16 subs w21, w21, #16
bmi 6f /* partial block? */ bmi 7f /* partial block? */
ld1 {v2.16b}, [x1], #16 /* load next input block */ ld1 {v2.16b}, [x20], #16 /* load next input block */
.if \enc == 1 .if \enc == 1
eor v2.16b, v2.16b, v5.16b /* final round enc+mac */ eor v2.16b, v2.16b, v5.16b /* final round enc+mac */
eor v1.16b, v1.16b, v2.16b /* xor with crypted ctr */ eor v1.16b, v1.16b, v2.16b /* xor with crypted ctr */
...@@ -176,18 +205,29 @@ CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */ ...@@ -176,18 +205,29 @@ CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */
eor v1.16b, v2.16b, v5.16b /* final round enc */ eor v1.16b, v2.16b, v5.16b /* final round enc */
.endif .endif
eor v0.16b, v0.16b, v2.16b /* xor mac with pt ^ rk[last] */ eor v0.16b, v0.16b, v2.16b /* xor mac with pt ^ rk[last] */
st1 {v1.16b}, [x0], #16 /* write output block */ st1 {v1.16b}, [x19], #16 /* write output block */
bne 0b beq 5f
CPU_LE( rev x8, x8 )
st1 {v0.16b}, [x5] /* store mac */ if_will_cond_yield_neon
str x8, [x6, #8] /* store lsb end of ctr (BE) */ st1 {v0.16b}, [x24] /* store mac */
5: ret do_cond_yield_neon
ld1 {v0.16b}, [x24] /* reload mac */
6: eor v0.16b, v0.16b, v5.16b /* final round mac */ endif_yield_neon
b 0b
5:
CPU_LE( rev x26, x26 )
st1 {v0.16b}, [x24] /* store mac */
str x26, [x25, #8] /* store lsb end of ctr (BE) */
6: frame_pop
ret
7: eor v0.16b, v0.16b, v5.16b /* final round mac */
eor v1.16b, v1.16b, v5.16b /* final round enc */ eor v1.16b, v1.16b, v5.16b /* final round enc */
st1 {v0.16b}, [x5] /* store mac */ st1 {v0.16b}, [x24] /* store mac */
add w2, w2, #16 /* process partial tail block */ add w21, w21, #16 /* process partial tail block */
7: ldrb w9, [x1], #1 /* get 1 byte of input */ 8: ldrb w9, [x20], #1 /* get 1 byte of input */
umov w6, v1.b[0] /* get top crypted ctr byte */ umov w6, v1.b[0] /* get top crypted ctr byte */
umov w7, v0.b[0] /* get top mac byte */ umov w7, v0.b[0] /* get top mac byte */
.if \enc == 1 .if \enc == 1
...@@ -197,13 +237,13 @@ CPU_LE( rev x8, x8 ) ...@@ -197,13 +237,13 @@ CPU_LE( rev x8, x8 )
eor w9, w9, w6 eor w9, w9, w6
eor w7, w7, w9 eor w7, w7, w9
.endif .endif
strb w9, [x0], #1 /* store out byte */ strb w9, [x19], #1 /* store out byte */
strb w7, [x5], #1 /* store mac byte */ strb w7, [x24], #1 /* store mac byte */
subs w2, w2, #1 subs w21, w21, #1
beq 5b beq 6b
ext v0.16b, v0.16b, v0.16b, #1 /* shift out mac byte */ ext v0.16b, v0.16b, v0.16b, #1 /* shift out mac byte */
ext v1.16b, v1.16b, v1.16b, #1 /* shift out ctr byte */ ext v1.16b, v1.16b, v1.16b, #1 /* shift out ctr byte */
b 7b b 8b
.endm .endm
/* /*
......
...@@ -30,18 +30,21 @@ ...@@ -30,18 +30,21 @@
.endm .endm
/* prepare for encryption with key in rk[] */ /* prepare for encryption with key in rk[] */
.macro enc_prepare, rounds, rk, ignore .macro enc_prepare, rounds, rk, temp
load_round_keys \rounds, \rk mov \temp, \rk
load_round_keys \rounds, \temp
.endm .endm
/* prepare for encryption (again) but with new key in rk[] */ /* prepare for encryption (again) but with new key in rk[] */
.macro enc_switch_key, rounds, rk, ignore .macro enc_switch_key, rounds, rk, temp
load_round_keys \rounds, \rk mov \temp, \rk
load_round_keys \rounds, \temp
.endm .endm
/* prepare for decryption with key in rk[] */ /* prepare for decryption with key in rk[] */
.macro dec_prepare, rounds, rk, ignore .macro dec_prepare, rounds, rk, temp
load_round_keys \rounds, \rk mov \temp, \rk
load_round_keys \rounds, \temp
.endm .endm
.macro do_enc_Nx, de, mc, k, i0, i1, i2, i3 .macro do_enc_Nx, de, mc, k, i0, i1, i2, i3
......
...@@ -100,9 +100,10 @@ ...@@ -100,9 +100,10 @@
dCONSTANT .req d0 dCONSTANT .req d0
qCONSTANT .req q0 qCONSTANT .req q0
BUF .req x0 BUF .req x19
LEN .req x1 LEN .req x20
CRC .req x2 CRC .req x21
CONST .req x22
vzr .req v9 vzr .req v9
...@@ -123,7 +124,14 @@ ENTRY(crc32_pmull_le) ...@@ -123,7 +124,14 @@ ENTRY(crc32_pmull_le)
ENTRY(crc32c_pmull_le) ENTRY(crc32c_pmull_le)
adr_l x3, .Lcrc32c_constants adr_l x3, .Lcrc32c_constants
0: bic LEN, LEN, #15 0: frame_push 4, 64
mov BUF, x0
mov LEN, x1
mov CRC, x2
mov CONST, x3
bic LEN, LEN, #15
ld1 {v1.16b-v4.16b}, [BUF], #0x40 ld1 {v1.16b-v4.16b}, [BUF], #0x40
movi vzr.16b, #0 movi vzr.16b, #0
fmov dCONSTANT, CRC fmov dCONSTANT, CRC
...@@ -132,7 +140,7 @@ ENTRY(crc32c_pmull_le) ...@@ -132,7 +140,7 @@ ENTRY(crc32c_pmull_le)
cmp LEN, #0x40 cmp LEN, #0x40
b.lt less_64 b.lt less_64
ldr qCONSTANT, [x3] ldr qCONSTANT, [CONST]
loop_64: /* 64 bytes Full cache line folding */ loop_64: /* 64 bytes Full cache line folding */
sub LEN, LEN, #0x40 sub LEN, LEN, #0x40
...@@ -162,10 +170,21 @@ loop_64: /* 64 bytes Full cache line folding */ ...@@ -162,10 +170,21 @@ loop_64: /* 64 bytes Full cache line folding */
eor v4.16b, v4.16b, v8.16b eor v4.16b, v4.16b, v8.16b
cmp LEN, #0x40 cmp LEN, #0x40
b.ge loop_64 b.lt less_64
if_will_cond_yield_neon
stp q1, q2, [sp, #.Lframe_local_offset]
stp q3, q4, [sp, #.Lframe_local_offset + 32]
do_cond_yield_neon
ldp q1, q2, [sp, #.Lframe_local_offset]
ldp q3, q4, [sp, #.Lframe_local_offset + 32]
ldr qCONSTANT, [CONST]
movi vzr.16b, #0
endif_yield_neon
b loop_64
less_64: /* Folding cache line into 128bit */ less_64: /* Folding cache line into 128bit */
ldr qCONSTANT, [x3, #16] ldr qCONSTANT, [CONST, #16]
pmull2 v5.1q, v1.2d, vCONSTANT.2d pmull2 v5.1q, v1.2d, vCONSTANT.2d
pmull v1.1q, v1.1d, vCONSTANT.1d pmull v1.1q, v1.1d, vCONSTANT.1d
...@@ -204,8 +223,8 @@ fold_64: ...@@ -204,8 +223,8 @@ fold_64:
eor v1.16b, v1.16b, v2.16b eor v1.16b, v1.16b, v2.16b
/* final 32-bit fold */ /* final 32-bit fold */
ldr dCONSTANT, [x3, #32] ldr dCONSTANT, [CONST, #32]
ldr d3, [x3, #40] ldr d3, [CONST, #40]
ext v2.16b, v1.16b, vzr.16b, #4 ext v2.16b, v1.16b, vzr.16b, #4
and v1.16b, v1.16b, v3.16b and v1.16b, v1.16b, v3.16b
...@@ -213,7 +232,7 @@ fold_64: ...@@ -213,7 +232,7 @@ fold_64:
eor v1.16b, v1.16b, v2.16b eor v1.16b, v1.16b, v2.16b
/* Finish up with the bit-reversed barrett reduction 64 ==> 32 bits */ /* Finish up with the bit-reversed barrett reduction 64 ==> 32 bits */
ldr qCONSTANT, [x3, #48] ldr qCONSTANT, [CONST, #48]
and v2.16b, v1.16b, v3.16b and v2.16b, v1.16b, v3.16b
ext v2.16b, vzr.16b, v2.16b, #8 ext v2.16b, vzr.16b, v2.16b, #8
...@@ -223,6 +242,7 @@ fold_64: ...@@ -223,6 +242,7 @@ fold_64:
eor v1.16b, v1.16b, v2.16b eor v1.16b, v1.16b, v2.16b
mov w0, v1.s[1] mov w0, v1.s[1]
frame_pop
ret ret
ENDPROC(crc32_pmull_le) ENDPROC(crc32_pmull_le)
ENDPROC(crc32c_pmull_le) ENDPROC(crc32c_pmull_le)
......
...@@ -74,13 +74,19 @@ ...@@ -74,13 +74,19 @@
.text .text
.cpu generic+crypto .cpu generic+crypto
arg1_low32 .req w0 arg1_low32 .req w19
arg2 .req x1 arg2 .req x20
arg3 .req x2 arg3 .req x21
vzr .req v13 vzr .req v13
ENTRY(crc_t10dif_pmull) ENTRY(crc_t10dif_pmull)
frame_push 3, 128
mov arg1_low32, w0
mov arg2, x1
mov arg3, x2
movi vzr.16b, #0 // init zero register movi vzr.16b, #0 // init zero register
// adjust the 16-bit initial_crc value, scale it to 32 bits // adjust the 16-bit initial_crc value, scale it to 32 bits
...@@ -175,8 +181,25 @@ CPU_LE( ext v12.16b, v12.16b, v12.16b, #8 ) ...@@ -175,8 +181,25 @@ CPU_LE( ext v12.16b, v12.16b, v12.16b, #8 )
subs arg3, arg3, #128 subs arg3, arg3, #128
// check if there is another 64B in the buffer to be able to fold // check if there is another 64B in the buffer to be able to fold
b.ge _fold_64_B_loop b.lt _fold_64_B_end
if_will_cond_yield_neon
stp q0, q1, [sp, #.Lframe_local_offset]
stp q2, q3, [sp, #.Lframe_local_offset + 32]
stp q4, q5, [sp, #.Lframe_local_offset + 64]
stp q6, q7, [sp, #.Lframe_local_offset + 96]
do_cond_yield_neon
ldp q0, q1, [sp, #.Lframe_local_offset]
ldp q2, q3, [sp, #.Lframe_local_offset + 32]
ldp q4, q5, [sp, #.Lframe_local_offset + 64]
ldp q6, q7, [sp, #.Lframe_local_offset + 96]
ldr_l q10, rk3, x8
movi vzr.16b, #0 // init zero register
endif_yield_neon
b _fold_64_B_loop
_fold_64_B_end:
// at this point, the buffer pointer is pointing at the last y Bytes // at this point, the buffer pointer is pointing at the last y Bytes
// of the buffer the 64B of folded data is in 4 of the vector // of the buffer the 64B of folded data is in 4 of the vector
// registers: v0, v1, v2, v3 // registers: v0, v1, v2, v3
...@@ -304,6 +327,7 @@ _barrett: ...@@ -304,6 +327,7 @@ _barrett:
_cleanup: _cleanup:
// scale the result back to 16 bits // scale the result back to 16 bits
lsr x0, x0, #16 lsr x0, x0, #16
frame_pop
ret ret
_less_than_128: _less_than_128:
......
...@@ -213,22 +213,31 @@ ...@@ -213,22 +213,31 @@
.endm .endm
.macro __pmull_ghash, pn .macro __pmull_ghash, pn
ld1 {SHASH.2d}, [x3] frame_push 5
ld1 {XL.2d}, [x1]
mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
0: ld1 {SHASH.2d}, [x22]
ld1 {XL.2d}, [x20]
ext SHASH2.16b, SHASH.16b, SHASH.16b, #8 ext SHASH2.16b, SHASH.16b, SHASH.16b, #8
eor SHASH2.16b, SHASH2.16b, SHASH.16b eor SHASH2.16b, SHASH2.16b, SHASH.16b
__pmull_pre_\pn __pmull_pre_\pn
/* do the head block first, if supplied */ /* do the head block first, if supplied */
cbz x4, 0f cbz x23, 1f
ld1 {T1.2d}, [x4] ld1 {T1.2d}, [x23]
b 1f mov x23, xzr
b 2f
0: ld1 {T1.2d}, [x2], #16 1: ld1 {T1.2d}, [x21], #16
sub w0, w0, #1 sub w19, w19, #1
1: /* multiply XL by SHASH in GF(2^128) */ 2: /* multiply XL by SHASH in GF(2^128) */
CPU_LE( rev64 T1.16b, T1.16b ) CPU_LE( rev64 T1.16b, T1.16b )
ext T2.16b, XL.16b, XL.16b, #8 ext T2.16b, XL.16b, XL.16b, #8
...@@ -250,9 +259,18 @@ CPU_LE( rev64 T1.16b, T1.16b ) ...@@ -250,9 +259,18 @@ CPU_LE( rev64 T1.16b, T1.16b )
eor T2.16b, T2.16b, XH.16b eor T2.16b, T2.16b, XH.16b
eor XL.16b, XL.16b, T2.16b eor XL.16b, XL.16b, T2.16b
cbnz w0, 0b cbz w19, 3f
if_will_cond_yield_neon
st1 {XL.2d}, [x20]
do_cond_yield_neon
b 0b
endif_yield_neon
b 1b
st1 {XL.2d}, [x1] 3: st1 {XL.2d}, [x20]
frame_pop
ret ret
.endm .endm
...@@ -304,38 +322,55 @@ ENDPROC(pmull_ghash_update_p8) ...@@ -304,38 +322,55 @@ ENDPROC(pmull_ghash_update_p8)
.endm .endm
.macro pmull_gcm_do_crypt, enc .macro pmull_gcm_do_crypt, enc
ld1 {SHASH.2d}, [x4] frame_push 10
ld1 {XL.2d}, [x1]
ldr x8, [x5, #8] // load lower counter mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
mov x23, x4
mov x24, x5
mov x25, x6
mov x26, x7
.if \enc == 1
ldr x27, [sp, #96] // first stacked arg
.endif
ldr x28, [x24, #8] // load lower counter
CPU_LE( rev x28, x28 )
0: mov x0, x25
load_round_keys w26, x0
ld1 {SHASH.2d}, [x23]
ld1 {XL.2d}, [x20]
movi MASK.16b, #0xe1 movi MASK.16b, #0xe1
ext SHASH2.16b, SHASH.16b, SHASH.16b, #8 ext SHASH2.16b, SHASH.16b, SHASH.16b, #8
CPU_LE( rev x8, x8 )
shl MASK.2d, MASK.2d, #57 shl MASK.2d, MASK.2d, #57
eor SHASH2.16b, SHASH2.16b, SHASH.16b eor SHASH2.16b, SHASH2.16b, SHASH.16b
.if \enc == 1 .if \enc == 1
ld1 {KS.16b}, [x7] ld1 {KS.16b}, [x27]
.endif .endif
0: ld1 {CTR.8b}, [x5] // load upper counter 1: ld1 {CTR.8b}, [x24] // load upper counter
ld1 {INP.16b}, [x3], #16 ld1 {INP.16b}, [x22], #16
rev x9, x8 rev x9, x28
add x8, x8, #1 add x28, x28, #1
sub w0, w0, #1 sub w19, w19, #1
ins CTR.d[1], x9 // set lower counter ins CTR.d[1], x9 // set lower counter
.if \enc == 1 .if \enc == 1
eor INP.16b, INP.16b, KS.16b // encrypt input eor INP.16b, INP.16b, KS.16b // encrypt input
st1 {INP.16b}, [x2], #16 st1 {INP.16b}, [x21], #16
.endif .endif
rev64 T1.16b, INP.16b rev64 T1.16b, INP.16b
cmp w6, #12 cmp w26, #12
b.ge 2f // AES-192/256? b.ge 4f // AES-192/256?
1: enc_round CTR, v21 2: enc_round CTR, v21
ext T2.16b, XL.16b, XL.16b, #8 ext T2.16b, XL.16b, XL.16b, #8
ext IN1.16b, T1.16b, T1.16b, #8 ext IN1.16b, T1.16b, T1.16b, #8
...@@ -390,27 +425,39 @@ CPU_LE( rev x8, x8 ) ...@@ -390,27 +425,39 @@ CPU_LE( rev x8, x8 )
.if \enc == 0 .if \enc == 0
eor INP.16b, INP.16b, KS.16b eor INP.16b, INP.16b, KS.16b
st1 {INP.16b}, [x2], #16 st1 {INP.16b}, [x21], #16
.endif .endif
cbnz w0, 0b cbz w19, 3f
CPU_LE( rev x8, x8 ) if_will_cond_yield_neon
st1 {XL.2d}, [x1] st1 {XL.2d}, [x20]
str x8, [x5, #8] // store lower counter .if \enc == 1
st1 {KS.16b}, [x27]
.endif
do_cond_yield_neon
b 0b
endif_yield_neon
b 1b
3: st1 {XL.2d}, [x20]
.if \enc == 1 .if \enc == 1
st1 {KS.16b}, [x7] st1 {KS.16b}, [x27]
.endif .endif
CPU_LE( rev x28, x28 )
str x28, [x24, #8] // store lower counter
frame_pop
ret ret
2: b.eq 3f // AES-192? 4: b.eq 5f // AES-192?
enc_round CTR, v17 enc_round CTR, v17
enc_round CTR, v18 enc_round CTR, v18
3: enc_round CTR, v19 5: enc_round CTR, v19
enc_round CTR, v20 enc_round CTR, v20
b 1b b 2b
.endm .endm
/* /*
......
...@@ -63,11 +63,12 @@ static void (*pmull_ghash_update)(int blocks, u64 dg[], const char *src, ...@@ -63,11 +63,12 @@ static void (*pmull_ghash_update)(int blocks, u64 dg[], const char *src,
asmlinkage void pmull_gcm_encrypt(int blocks, u64 dg[], u8 dst[], asmlinkage void pmull_gcm_encrypt(int blocks, u64 dg[], u8 dst[],
const u8 src[], struct ghash_key const *k, const u8 src[], struct ghash_key const *k,
u8 ctr[], int rounds, u8 ks[]); u8 ctr[], u32 const rk[], int rounds,
u8 ks[]);
asmlinkage void pmull_gcm_decrypt(int blocks, u64 dg[], u8 dst[], asmlinkage void pmull_gcm_decrypt(int blocks, u64 dg[], u8 dst[],
const u8 src[], struct ghash_key const *k, const u8 src[], struct ghash_key const *k,
u8 ctr[], int rounds); u8 ctr[], u32 const rk[], int rounds);
asmlinkage void pmull_gcm_encrypt_block(u8 dst[], u8 const src[], asmlinkage void pmull_gcm_encrypt_block(u8 dst[], u8 const src[],
u32 const rk[], int rounds); u32 const rk[], int rounds);
...@@ -368,26 +369,29 @@ static int gcm_encrypt(struct aead_request *req) ...@@ -368,26 +369,29 @@ static int gcm_encrypt(struct aead_request *req)
pmull_gcm_encrypt_block(ks, iv, NULL, pmull_gcm_encrypt_block(ks, iv, NULL,
num_rounds(&ctx->aes_key)); num_rounds(&ctx->aes_key));
put_unaligned_be32(3, iv + GCM_IV_SIZE); put_unaligned_be32(3, iv + GCM_IV_SIZE);
kernel_neon_end();
err = skcipher_walk_aead_encrypt(&walk, req, true); err = skcipher_walk_aead_encrypt(&walk, req, false);
while (walk.nbytes >= AES_BLOCK_SIZE) { while (walk.nbytes >= AES_BLOCK_SIZE) {
int blocks = walk.nbytes / AES_BLOCK_SIZE; int blocks = walk.nbytes / AES_BLOCK_SIZE;
kernel_neon_begin();
pmull_gcm_encrypt(blocks, dg, walk.dst.virt.addr, pmull_gcm_encrypt(blocks, dg, walk.dst.virt.addr,
walk.src.virt.addr, &ctx->ghash_key, walk.src.virt.addr, &ctx->ghash_key,
iv, num_rounds(&ctx->aes_key), ks); iv, ctx->aes_key.key_enc,
num_rounds(&ctx->aes_key), ks);
kernel_neon_end();
err = skcipher_walk_done(&walk, err = skcipher_walk_done(&walk,
walk.nbytes % AES_BLOCK_SIZE); walk.nbytes % AES_BLOCK_SIZE);
} }
kernel_neon_end();
} else { } else {
__aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv, __aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv,
num_rounds(&ctx->aes_key)); num_rounds(&ctx->aes_key));
put_unaligned_be32(2, iv + GCM_IV_SIZE); put_unaligned_be32(2, iv + GCM_IV_SIZE);
err = skcipher_walk_aead_encrypt(&walk, req, true); err = skcipher_walk_aead_encrypt(&walk, req, false);
while (walk.nbytes >= AES_BLOCK_SIZE) { while (walk.nbytes >= AES_BLOCK_SIZE) {
int blocks = walk.nbytes / AES_BLOCK_SIZE; int blocks = walk.nbytes / AES_BLOCK_SIZE;
...@@ -467,15 +471,19 @@ static int gcm_decrypt(struct aead_request *req) ...@@ -467,15 +471,19 @@ static int gcm_decrypt(struct aead_request *req)
pmull_gcm_encrypt_block(tag, iv, ctx->aes_key.key_enc, pmull_gcm_encrypt_block(tag, iv, ctx->aes_key.key_enc,
num_rounds(&ctx->aes_key)); num_rounds(&ctx->aes_key));
put_unaligned_be32(2, iv + GCM_IV_SIZE); put_unaligned_be32(2, iv + GCM_IV_SIZE);
kernel_neon_end();
err = skcipher_walk_aead_decrypt(&walk, req, true); err = skcipher_walk_aead_decrypt(&walk, req, false);
while (walk.nbytes >= AES_BLOCK_SIZE) { while (walk.nbytes >= AES_BLOCK_SIZE) {
int blocks = walk.nbytes / AES_BLOCK_SIZE; int blocks = walk.nbytes / AES_BLOCK_SIZE;
kernel_neon_begin();
pmull_gcm_decrypt(blocks, dg, walk.dst.virt.addr, pmull_gcm_decrypt(blocks, dg, walk.dst.virt.addr,
walk.src.virt.addr, &ctx->ghash_key, walk.src.virt.addr, &ctx->ghash_key,
iv, num_rounds(&ctx->aes_key)); iv, ctx->aes_key.key_enc,
num_rounds(&ctx->aes_key));
kernel_neon_end();
err = skcipher_walk_done(&walk, err = skcipher_walk_done(&walk,
walk.nbytes % AES_BLOCK_SIZE); walk.nbytes % AES_BLOCK_SIZE);
...@@ -483,14 +491,12 @@ static int gcm_decrypt(struct aead_request *req) ...@@ -483,14 +491,12 @@ static int gcm_decrypt(struct aead_request *req)
if (walk.nbytes) if (walk.nbytes)
pmull_gcm_encrypt_block(iv, iv, NULL, pmull_gcm_encrypt_block(iv, iv, NULL,
num_rounds(&ctx->aes_key)); num_rounds(&ctx->aes_key));
kernel_neon_end();
} else { } else {
__aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv, __aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv,
num_rounds(&ctx->aes_key)); num_rounds(&ctx->aes_key));
put_unaligned_be32(2, iv + GCM_IV_SIZE); put_unaligned_be32(2, iv + GCM_IV_SIZE);
err = skcipher_walk_aead_decrypt(&walk, req, true); err = skcipher_walk_aead_decrypt(&walk, req, false);
while (walk.nbytes >= AES_BLOCK_SIZE) { while (walk.nbytes >= AES_BLOCK_SIZE) {
int blocks = walk.nbytes / AES_BLOCK_SIZE; int blocks = walk.nbytes / AES_BLOCK_SIZE;
......
...@@ -69,30 +69,36 @@ ...@@ -69,30 +69,36 @@
* int blocks) * int blocks)
*/ */
ENTRY(sha1_ce_transform) ENTRY(sha1_ce_transform)
frame_push 3
mov x19, x0
mov x20, x1
mov x21, x2
/* load round constants */ /* load round constants */
loadrc k0.4s, 0x5a827999, w6 0: loadrc k0.4s, 0x5a827999, w6
loadrc k1.4s, 0x6ed9eba1, w6 loadrc k1.4s, 0x6ed9eba1, w6
loadrc k2.4s, 0x8f1bbcdc, w6 loadrc k2.4s, 0x8f1bbcdc, w6
loadrc k3.4s, 0xca62c1d6, w6 loadrc k3.4s, 0xca62c1d6, w6
/* load state */ /* load state */
ld1 {dgav.4s}, [x0] ld1 {dgav.4s}, [x19]
ldr dgb, [x0, #16] ldr dgb, [x19, #16]
/* load sha1_ce_state::finalize */ /* load sha1_ce_state::finalize */
ldr_l w4, sha1_ce_offsetof_finalize, x4 ldr_l w4, sha1_ce_offsetof_finalize, x4
ldr w4, [x0, x4] ldr w4, [x19, x4]
/* load input */ /* load input */
0: ld1 {v8.4s-v11.4s}, [x1], #64 1: ld1 {v8.4s-v11.4s}, [x20], #64
sub w2, w2, #1 sub w21, w21, #1
CPU_LE( rev32 v8.16b, v8.16b ) CPU_LE( rev32 v8.16b, v8.16b )
CPU_LE( rev32 v9.16b, v9.16b ) CPU_LE( rev32 v9.16b, v9.16b )
CPU_LE( rev32 v10.16b, v10.16b ) CPU_LE( rev32 v10.16b, v10.16b )
CPU_LE( rev32 v11.16b, v11.16b ) CPU_LE( rev32 v11.16b, v11.16b )
1: add t0.4s, v8.4s, k0.4s 2: add t0.4s, v8.4s, k0.4s
mov dg0v.16b, dgav.16b mov dg0v.16b, dgav.16b
add_update c, ev, k0, 8, 9, 10, 11, dgb add_update c, ev, k0, 8, 9, 10, 11, dgb
...@@ -123,16 +129,25 @@ CPU_LE( rev32 v11.16b, v11.16b ) ...@@ -123,16 +129,25 @@ CPU_LE( rev32 v11.16b, v11.16b )
add dgbv.2s, dgbv.2s, dg1v.2s add dgbv.2s, dgbv.2s, dg1v.2s
add dgav.4s, dgav.4s, dg0v.4s add dgav.4s, dgav.4s, dg0v.4s
cbnz w2, 0b cbz w21, 3f
if_will_cond_yield_neon
st1 {dgav.4s}, [x19]
str dgb, [x19, #16]
do_cond_yield_neon
b 0b
endif_yield_neon
b 1b
/* /*
* Final block: add padding and total bit count. * Final block: add padding and total bit count.
* Skip if the input size was not a round multiple of the block size, * Skip if the input size was not a round multiple of the block size,
* the padding is handled by the C code in that case. * the padding is handled by the C code in that case.
*/ */
cbz x4, 3f 3: cbz x4, 4f
ldr_l w4, sha1_ce_offsetof_count, x4 ldr_l w4, sha1_ce_offsetof_count, x4
ldr x4, [x0, x4] ldr x4, [x19, x4]
movi v9.2d, #0 movi v9.2d, #0
mov x8, #0x80000000 mov x8, #0x80000000
movi v10.2d, #0 movi v10.2d, #0
...@@ -141,10 +156,11 @@ CPU_LE( rev32 v11.16b, v11.16b ) ...@@ -141,10 +156,11 @@ CPU_LE( rev32 v11.16b, v11.16b )
mov x4, #0 mov x4, #0
mov v11.d[0], xzr mov v11.d[0], xzr
mov v11.d[1], x7 mov v11.d[1], x7
b 1b b 2b
/* store new state */ /* store new state */
3: st1 {dgav.4s}, [x0] 4: st1 {dgav.4s}, [x19]
str dgb, [x0, #16] str dgb, [x19, #16]
frame_pop
ret ret
ENDPROC(sha1_ce_transform) ENDPROC(sha1_ce_transform)
...@@ -79,30 +79,36 @@ ...@@ -79,30 +79,36 @@
*/ */
.text .text
ENTRY(sha2_ce_transform) ENTRY(sha2_ce_transform)
frame_push 3
mov x19, x0
mov x20, x1
mov x21, x2
/* load round constants */ /* load round constants */
adr_l x8, .Lsha2_rcon 0: adr_l x8, .Lsha2_rcon
ld1 { v0.4s- v3.4s}, [x8], #64 ld1 { v0.4s- v3.4s}, [x8], #64
ld1 { v4.4s- v7.4s}, [x8], #64 ld1 { v4.4s- v7.4s}, [x8], #64
ld1 { v8.4s-v11.4s}, [x8], #64 ld1 { v8.4s-v11.4s}, [x8], #64
ld1 {v12.4s-v15.4s}, [x8] ld1 {v12.4s-v15.4s}, [x8]
/* load state */ /* load state */
ld1 {dgav.4s, dgbv.4s}, [x0] ld1 {dgav.4s, dgbv.4s}, [x19]
/* load sha256_ce_state::finalize */ /* load sha256_ce_state::finalize */
ldr_l w4, sha256_ce_offsetof_finalize, x4 ldr_l w4, sha256_ce_offsetof_finalize, x4
ldr w4, [x0, x4] ldr w4, [x19, x4]
/* load input */ /* load input */
0: ld1 {v16.4s-v19.4s}, [x1], #64 1: ld1 {v16.4s-v19.4s}, [x20], #64
sub w2, w2, #1 sub w21, w21, #1
CPU_LE( rev32 v16.16b, v16.16b ) CPU_LE( rev32 v16.16b, v16.16b )
CPU_LE( rev32 v17.16b, v17.16b ) CPU_LE( rev32 v17.16b, v17.16b )
CPU_LE( rev32 v18.16b, v18.16b ) CPU_LE( rev32 v18.16b, v18.16b )
CPU_LE( rev32 v19.16b, v19.16b ) CPU_LE( rev32 v19.16b, v19.16b )
1: add t0.4s, v16.4s, v0.4s 2: add t0.4s, v16.4s, v0.4s
mov dg0v.16b, dgav.16b mov dg0v.16b, dgav.16b
mov dg1v.16b, dgbv.16b mov dg1v.16b, dgbv.16b
...@@ -131,16 +137,24 @@ CPU_LE( rev32 v19.16b, v19.16b ) ...@@ -131,16 +137,24 @@ CPU_LE( rev32 v19.16b, v19.16b )
add dgbv.4s, dgbv.4s, dg1v.4s add dgbv.4s, dgbv.4s, dg1v.4s
/* handled all input blocks? */ /* handled all input blocks? */
cbnz w2, 0b cbz w21, 3f
if_will_cond_yield_neon
st1 {dgav.4s, dgbv.4s}, [x19]
do_cond_yield_neon
b 0b
endif_yield_neon
b 1b
/* /*
* Final block: add padding and total bit count. * Final block: add padding and total bit count.
* Skip if the input size was not a round multiple of the block size, * Skip if the input size was not a round multiple of the block size,
* the padding is handled by the C code in that case. * the padding is handled by the C code in that case.
*/ */
cbz x4, 3f 3: cbz x4, 4f
ldr_l w4, sha256_ce_offsetof_count, x4 ldr_l w4, sha256_ce_offsetof_count, x4
ldr x4, [x0, x4] ldr x4, [x19, x4]
movi v17.2d, #0 movi v17.2d, #0
mov x8, #0x80000000 mov x8, #0x80000000
movi v18.2d, #0 movi v18.2d, #0
...@@ -149,9 +163,10 @@ CPU_LE( rev32 v19.16b, v19.16b ) ...@@ -149,9 +163,10 @@ CPU_LE( rev32 v19.16b, v19.16b )
mov x4, #0 mov x4, #0
mov v19.d[0], xzr mov v19.d[0], xzr
mov v19.d[1], x7 mov v19.d[1], x7
b 1b b 2b
/* store new state */ /* store new state */
3: st1 {dgav.4s, dgbv.4s}, [x0] 4: st1 {dgav.4s, dgbv.4s}, [x19]
frame_pop
ret ret
ENDPROC(sha2_ce_transform) ENDPROC(sha2_ce_transform)
// SPDX-License-Identifier: GPL-2.0
// This code is taken from the OpenSSL project but the author (Andy Polyakov)
// has relicensed it under the GPLv2. Therefore this program is free software;
// you can redistribute it and/or modify it under the terms of the GNU General
// Public License version 2 as published by the Free Software Foundation.
//
// The original headers, including the original license headers, are
// included below for completeness.
// Copyright 2014-2016 The OpenSSL Project Authors. All Rights Reserved. // Copyright 2014-2016 The OpenSSL Project Authors. All Rights Reserved.
// //
// Licensed under the OpenSSL license (the "License"). You may not use // Licensed under the OpenSSL license (the "License"). You may not use
...@@ -10,8 +20,6 @@ ...@@ -10,8 +20,6 @@
// project. The module is, however, dual licensed under OpenSSL and // project. The module is, however, dual licensed under OpenSSL and
// CRYPTOGAMS licenses depending on where you obtain it. For further // CRYPTOGAMS licenses depending on where you obtain it. For further
// details see http://www.openssl.org/~appro/cryptogams/. // details see http://www.openssl.org/~appro/cryptogams/.
//
// Permission to use under GPLv2 terms is granted.
// ==================================================================== // ====================================================================
// //
// SHA256/512 for ARMv8. // SHA256/512 for ARMv8.
......
...@@ -41,9 +41,16 @@ ...@@ -41,9 +41,16 @@
*/ */
.text .text
ENTRY(sha3_ce_transform) ENTRY(sha3_ce_transform)
/* load state */ frame_push 4
add x8, x0, #32
ld1 { v0.1d- v3.1d}, [x0] mov x19, x0
mov x20, x1
mov x21, x2
mov x22, x3
0: /* load state */
add x8, x19, #32
ld1 { v0.1d- v3.1d}, [x19]
ld1 { v4.1d- v7.1d}, [x8], #32 ld1 { v4.1d- v7.1d}, [x8], #32
ld1 { v8.1d-v11.1d}, [x8], #32 ld1 { v8.1d-v11.1d}, [x8], #32
ld1 {v12.1d-v15.1d}, [x8], #32 ld1 {v12.1d-v15.1d}, [x8], #32
...@@ -51,13 +58,13 @@ ENTRY(sha3_ce_transform) ...@@ -51,13 +58,13 @@ ENTRY(sha3_ce_transform)
ld1 {v20.1d-v23.1d}, [x8], #32 ld1 {v20.1d-v23.1d}, [x8], #32
ld1 {v24.1d}, [x8] ld1 {v24.1d}, [x8]
0: sub w2, w2, #1 1: sub w21, w21, #1
mov w8, #24 mov w8, #24
adr_l x9, .Lsha3_rcon adr_l x9, .Lsha3_rcon
/* load input */ /* load input */
ld1 {v25.8b-v28.8b}, [x1], #32 ld1 {v25.8b-v28.8b}, [x20], #32
ld1 {v29.8b-v31.8b}, [x1], #24 ld1 {v29.8b-v31.8b}, [x20], #24
eor v0.8b, v0.8b, v25.8b eor v0.8b, v0.8b, v25.8b
eor v1.8b, v1.8b, v26.8b eor v1.8b, v1.8b, v26.8b
eor v2.8b, v2.8b, v27.8b eor v2.8b, v2.8b, v27.8b
...@@ -66,10 +73,10 @@ ENTRY(sha3_ce_transform) ...@@ -66,10 +73,10 @@ ENTRY(sha3_ce_transform)
eor v5.8b, v5.8b, v30.8b eor v5.8b, v5.8b, v30.8b
eor v6.8b, v6.8b, v31.8b eor v6.8b, v6.8b, v31.8b
tbnz x3, #6, 2f // SHA3-512 tbnz x22, #6, 3f // SHA3-512
ld1 {v25.8b-v28.8b}, [x1], #32 ld1 {v25.8b-v28.8b}, [x20], #32
ld1 {v29.8b-v30.8b}, [x1], #16 ld1 {v29.8b-v30.8b}, [x20], #16
eor v7.8b, v7.8b, v25.8b eor v7.8b, v7.8b, v25.8b
eor v8.8b, v8.8b, v26.8b eor v8.8b, v8.8b, v26.8b
eor v9.8b, v9.8b, v27.8b eor v9.8b, v9.8b, v27.8b
...@@ -77,34 +84,34 @@ ENTRY(sha3_ce_transform) ...@@ -77,34 +84,34 @@ ENTRY(sha3_ce_transform)
eor v11.8b, v11.8b, v29.8b eor v11.8b, v11.8b, v29.8b
eor v12.8b, v12.8b, v30.8b eor v12.8b, v12.8b, v30.8b
tbnz x3, #4, 1f // SHA3-384 or SHA3-224 tbnz x22, #4, 2f // SHA3-384 or SHA3-224
// SHA3-256 // SHA3-256
ld1 {v25.8b-v28.8b}, [x1], #32 ld1 {v25.8b-v28.8b}, [x20], #32
eor v13.8b, v13.8b, v25.8b eor v13.8b, v13.8b, v25.8b
eor v14.8b, v14.8b, v26.8b eor v14.8b, v14.8b, v26.8b
eor v15.8b, v15.8b, v27.8b eor v15.8b, v15.8b, v27.8b
eor v16.8b, v16.8b, v28.8b eor v16.8b, v16.8b, v28.8b
b 3f b 4f
1: tbz x3, #2, 3f // bit 2 cleared? SHA-384 2: tbz x22, #2, 4f // bit 2 cleared? SHA-384
// SHA3-224 // SHA3-224
ld1 {v25.8b-v28.8b}, [x1], #32 ld1 {v25.8b-v28.8b}, [x20], #32
ld1 {v29.8b}, [x1], #8 ld1 {v29.8b}, [x20], #8
eor v13.8b, v13.8b, v25.8b eor v13.8b, v13.8b, v25.8b
eor v14.8b, v14.8b, v26.8b eor v14.8b, v14.8b, v26.8b
eor v15.8b, v15.8b, v27.8b eor v15.8b, v15.8b, v27.8b
eor v16.8b, v16.8b, v28.8b eor v16.8b, v16.8b, v28.8b
eor v17.8b, v17.8b, v29.8b eor v17.8b, v17.8b, v29.8b
b 3f b 4f
// SHA3-512 // SHA3-512
2: ld1 {v25.8b-v26.8b}, [x1], #16 3: ld1 {v25.8b-v26.8b}, [x20], #16
eor v7.8b, v7.8b, v25.8b eor v7.8b, v7.8b, v25.8b
eor v8.8b, v8.8b, v26.8b eor v8.8b, v8.8b, v26.8b
3: sub w8, w8, #1 4: sub w8, w8, #1
eor3 v29.16b, v4.16b, v9.16b, v14.16b eor3 v29.16b, v4.16b, v9.16b, v14.16b
eor3 v26.16b, v1.16b, v6.16b, v11.16b eor3 v26.16b, v1.16b, v6.16b, v11.16b
...@@ -183,17 +190,33 @@ ENTRY(sha3_ce_transform) ...@@ -183,17 +190,33 @@ ENTRY(sha3_ce_transform)
eor v0.16b, v0.16b, v31.16b eor v0.16b, v0.16b, v31.16b
cbnz w8, 3b cbnz w8, 4b
cbnz w2, 0b cbz w21, 5f
if_will_cond_yield_neon
add x8, x19, #32
st1 { v0.1d- v3.1d}, [x19]
st1 { v4.1d- v7.1d}, [x8], #32
st1 { v8.1d-v11.1d}, [x8], #32
st1 {v12.1d-v15.1d}, [x8], #32
st1 {v16.1d-v19.1d}, [x8], #32
st1 {v20.1d-v23.1d}, [x8], #32
st1 {v24.1d}, [x8]
do_cond_yield_neon
b 0b
endif_yield_neon
b 1b
/* save state */ /* save state */
st1 { v0.1d- v3.1d}, [x0], #32 5: st1 { v0.1d- v3.1d}, [x19], #32
st1 { v4.1d- v7.1d}, [x0], #32 st1 { v4.1d- v7.1d}, [x19], #32
st1 { v8.1d-v11.1d}, [x0], #32 st1 { v8.1d-v11.1d}, [x19], #32
st1 {v12.1d-v15.1d}, [x0], #32 st1 {v12.1d-v15.1d}, [x19], #32
st1 {v16.1d-v19.1d}, [x0], #32 st1 {v16.1d-v19.1d}, [x19], #32
st1 {v20.1d-v23.1d}, [x0], #32 st1 {v20.1d-v23.1d}, [x19], #32
st1 {v24.1d}, [x0] st1 {v24.1d}, [x19]
frame_pop
ret ret
ENDPROC(sha3_ce_transform) ENDPROC(sha3_ce_transform)
......
#! /usr/bin/env perl #! /usr/bin/env perl
# SPDX-License-Identifier: GPL-2.0
# This code is taken from the OpenSSL project but the author (Andy Polyakov)
# has relicensed it under the GPLv2. Therefore this program is free software;
# you can redistribute it and/or modify it under the terms of the GNU General
# Public License version 2 as published by the Free Software Foundation.
#
# The original headers, including the original license headers, are
# included below for completeness.
# Copyright 2014-2016 The OpenSSL Project Authors. All Rights Reserved. # Copyright 2014-2016 The OpenSSL Project Authors. All Rights Reserved.
# #
# Licensed under the OpenSSL license (the "License"). You may not use # Licensed under the OpenSSL license (the "License"). You may not use
...@@ -11,8 +21,6 @@ ...@@ -11,8 +21,6 @@
# project. The module is, however, dual licensed under OpenSSL and # project. The module is, however, dual licensed under OpenSSL and
# CRYPTOGAMS licenses depending on where you obtain it. For further # CRYPTOGAMS licenses depending on where you obtain it. For further
# details see http://www.openssl.org/~appro/cryptogams/. # details see http://www.openssl.org/~appro/cryptogams/.
#
# Permission to use under GPLv2 terms is granted.
# ==================================================================== # ====================================================================
# #
# SHA256/512 for ARMv8. # SHA256/512 for ARMv8.
......
...@@ -107,17 +107,23 @@ ...@@ -107,17 +107,23 @@
*/ */
.text .text
ENTRY(sha512_ce_transform) ENTRY(sha512_ce_transform)
frame_push 3
mov x19, x0
mov x20, x1
mov x21, x2
/* load state */ /* load state */
ld1 {v8.2d-v11.2d}, [x0] 0: ld1 {v8.2d-v11.2d}, [x19]
/* load first 4 round constants */ /* load first 4 round constants */
adr_l x3, .Lsha512_rcon adr_l x3, .Lsha512_rcon
ld1 {v20.2d-v23.2d}, [x3], #64 ld1 {v20.2d-v23.2d}, [x3], #64
/* load input */ /* load input */
0: ld1 {v12.2d-v15.2d}, [x1], #64 1: ld1 {v12.2d-v15.2d}, [x20], #64
ld1 {v16.2d-v19.2d}, [x1], #64 ld1 {v16.2d-v19.2d}, [x20], #64
sub w2, w2, #1 sub w21, w21, #1
CPU_LE( rev64 v12.16b, v12.16b ) CPU_LE( rev64 v12.16b, v12.16b )
CPU_LE( rev64 v13.16b, v13.16b ) CPU_LE( rev64 v13.16b, v13.16b )
...@@ -196,9 +202,18 @@ CPU_LE( rev64 v19.16b, v19.16b ) ...@@ -196,9 +202,18 @@ CPU_LE( rev64 v19.16b, v19.16b )
add v11.2d, v11.2d, v3.2d add v11.2d, v11.2d, v3.2d
/* handled all input blocks? */ /* handled all input blocks? */
cbnz w2, 0b cbz w21, 3f
if_will_cond_yield_neon
st1 {v8.2d-v11.2d}, [x19]
do_cond_yield_neon
b 0b
endif_yield_neon
b 1b
/* store new state */ /* store new state */
3: st1 {v8.2d-v11.2d}, [x0] 3: st1 {v8.2d-v11.2d}, [x19]
frame_pop
ret ret
ENDPROC(sha512_ce_transform) ENDPROC(sha512_ce_transform)
// SPDX-License-Identifier: GPL-2.0
// This code is taken from the OpenSSL project but the author (Andy Polyakov)
// has relicensed it under the GPLv2. Therefore this program is free software;
// you can redistribute it and/or modify it under the terms of the GNU General
// Public License version 2 as published by the Free Software Foundation.
//
// The original headers, including the original license headers, are
// included below for completeness.
// Copyright 2014-2016 The OpenSSL Project Authors. All Rights Reserved. // Copyright 2014-2016 The OpenSSL Project Authors. All Rights Reserved.
// //
// Licensed under the OpenSSL license (the "License"). You may not use // Licensed under the OpenSSL license (the "License"). You may not use
...@@ -10,8 +20,6 @@ ...@@ -10,8 +20,6 @@
// project. The module is, however, dual licensed under OpenSSL and // project. The module is, however, dual licensed under OpenSSL and
// CRYPTOGAMS licenses depending on where you obtain it. For further // CRYPTOGAMS licenses depending on where you obtain it. For further
// details see http://www.openssl.org/~appro/cryptogams/. // details see http://www.openssl.org/~appro/cryptogams/.
//
// Permission to use under GPLv2 terms is granted.
// ==================================================================== // ====================================================================
// //
// SHA256/512 for ARMv8. // SHA256/512 for ARMv8.
......
// SPDX-License-Identifier: GPL-2.0
#include <linux/linkage.h>
#include <asm/assembler.h>
.irp b, 0, 1, 2, 3, 4, 5, 6, 7, 8
.set .Lv\b\().4s, \b
.endr
.macro sm4e, rd, rn
.inst 0xcec08400 | .L\rd | (.L\rn << 5)
.endm
/*
* void sm4_ce_do_crypt(const u32 *rk, u32 *out, const u32 *in);
*/
.text
ENTRY(sm4_ce_do_crypt)
ld1 {v8.4s}, [x2]
ld1 {v0.4s-v3.4s}, [x0], #64
CPU_LE( rev32 v8.16b, v8.16b )
ld1 {v4.4s-v7.4s}, [x0]
sm4e v8.4s, v0.4s
sm4e v8.4s, v1.4s
sm4e v8.4s, v2.4s
sm4e v8.4s, v3.4s
sm4e v8.4s, v4.4s
sm4e v8.4s, v5.4s
sm4e v8.4s, v6.4s
sm4e v8.4s, v7.4s
rev64 v8.4s, v8.4s
ext v8.16b, v8.16b, v8.16b, #8
CPU_LE( rev32 v8.16b, v8.16b )
st1 {v8.4s}, [x1]
ret
ENDPROC(sm4_ce_do_crypt)
// SPDX-License-Identifier: GPL-2.0
#include <asm/neon.h>
#include <asm/simd.h>
#include <crypto/sm4.h>
#include <linux/module.h>
#include <linux/cpufeature.h>
#include <linux/crypto.h>
#include <linux/types.h>
MODULE_ALIAS_CRYPTO("sm4");
MODULE_ALIAS_CRYPTO("sm4-ce");
MODULE_DESCRIPTION("SM4 symmetric cipher using ARMv8 Crypto Extensions");
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
asmlinkage void sm4_ce_do_crypt(const u32 *rk, void *out, const void *in);
static void sm4_ce_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
{
const struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
if (!may_use_simd()) {
crypto_sm4_encrypt(tfm, out, in);
} else {
kernel_neon_begin();
sm4_ce_do_crypt(ctx->rkey_enc, out, in);
kernel_neon_end();
}
}
static void sm4_ce_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
{
const struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
if (!may_use_simd()) {
crypto_sm4_decrypt(tfm, out, in);
} else {
kernel_neon_begin();
sm4_ce_do_crypt(ctx->rkey_dec, out, in);
kernel_neon_end();
}
}
static struct crypto_alg sm4_ce_alg = {
.cra_name = "sm4",
.cra_driver_name = "sm4-ce",
.cra_priority = 200,
.cra_flags = CRYPTO_ALG_TYPE_CIPHER,
.cra_blocksize = SM4_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct crypto_sm4_ctx),
.cra_module = THIS_MODULE,
.cra_u.cipher = {
.cia_min_keysize = SM4_KEY_SIZE,
.cia_max_keysize = SM4_KEY_SIZE,
.cia_setkey = crypto_sm4_set_key,
.cia_encrypt = sm4_ce_encrypt,
.cia_decrypt = sm4_ce_decrypt
}
};
static int __init sm4_ce_mod_init(void)
{
return crypto_register_alg(&sm4_ce_alg);
}
static void __exit sm4_ce_mod_fini(void)
{
crypto_unregister_alg(&sm4_ce_alg);
}
module_cpu_feature_match(SM4, sm4_ce_mod_init);
module_exit(sm4_ce_mod_fini);
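For context, a minimal sketch (an illustration under stated assumptions, not code from this merge) of how the driver registered above is reached from other kernel code: requesting the bare "sm4" name resolves to the highest-priority implementation, i.e. this "sm4-ce" driver (priority 200) on CPUs with the SM4 instructions, and the generic C cipher otherwise. The zero key/plaintext and the helper name demo_sm4_one_block are illustrative.

#include <crypto/sm4.h>
#include <linux/crypto.h>
#include <linux/err.h>

static int demo_sm4_one_block(void)
{
	struct crypto_cipher *tfm;
	u8 key[SM4_KEY_SIZE] = { 0 };	/* 128-bit key, illustrative zeros */
	u8 in[SM4_BLOCK_SIZE] = { 0 };	/* one 16-byte block */
	u8 out[SM4_BLOCK_SIZE];
	int err;

	/* "sm4" picks sm4-ce when the CPU feature is present, sm4-generic otherwise */
	tfm = crypto_alloc_cipher("sm4", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_cipher_setkey(tfm, key, sizeof(key));
	if (!err)
		crypto_cipher_encrypt_one(tfm, out, in);

	crypto_free_cipher(tfm);
	return err;
}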
...@@ -15,7 +15,6 @@ obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o ...@@ -15,7 +15,6 @@ obj-$(CONFIG_CRYPTO_GLUE_HELPER_X86) += glue_helper.o
obj-$(CONFIG_CRYPTO_AES_586) += aes-i586.o obj-$(CONFIG_CRYPTO_AES_586) += aes-i586.o
obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o
obj-$(CONFIG_CRYPTO_SALSA20_586) += salsa20-i586.o
obj-$(CONFIG_CRYPTO_SERPENT_SSE2_586) += serpent-sse2-i586.o obj-$(CONFIG_CRYPTO_SERPENT_SSE2_586) += serpent-sse2-i586.o
obj-$(CONFIG_CRYPTO_AES_X86_64) += aes-x86_64.o obj-$(CONFIG_CRYPTO_AES_X86_64) += aes-x86_64.o
...@@ -24,7 +23,6 @@ obj-$(CONFIG_CRYPTO_CAMELLIA_X86_64) += camellia-x86_64.o ...@@ -24,7 +23,6 @@ obj-$(CONFIG_CRYPTO_CAMELLIA_X86_64) += camellia-x86_64.o
obj-$(CONFIG_CRYPTO_BLOWFISH_X86_64) += blowfish-x86_64.o obj-$(CONFIG_CRYPTO_BLOWFISH_X86_64) += blowfish-x86_64.o
obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o
obj-$(CONFIG_CRYPTO_TWOFISH_X86_64_3WAY) += twofish-x86_64-3way.o obj-$(CONFIG_CRYPTO_TWOFISH_X86_64_3WAY) += twofish-x86_64-3way.o
obj-$(CONFIG_CRYPTO_SALSA20_X86_64) += salsa20-x86_64.o
obj-$(CONFIG_CRYPTO_CHACHA20_X86_64) += chacha20-x86_64.o obj-$(CONFIG_CRYPTO_CHACHA20_X86_64) += chacha20-x86_64.o
obj-$(CONFIG_CRYPTO_SERPENT_SSE2_X86_64) += serpent-sse2-x86_64.o obj-$(CONFIG_CRYPTO_SERPENT_SSE2_X86_64) += serpent-sse2-x86_64.o
obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
...@@ -38,6 +36,16 @@ obj-$(CONFIG_CRYPTO_SHA512_SSSE3) += sha512-ssse3.o ...@@ -38,6 +36,16 @@ obj-$(CONFIG_CRYPTO_SHA512_SSSE3) += sha512-ssse3.o
obj-$(CONFIG_CRYPTO_CRCT10DIF_PCLMUL) += crct10dif-pclmul.o obj-$(CONFIG_CRYPTO_CRCT10DIF_PCLMUL) += crct10dif-pclmul.o
obj-$(CONFIG_CRYPTO_POLY1305_X86_64) += poly1305-x86_64.o obj-$(CONFIG_CRYPTO_POLY1305_X86_64) += poly1305-x86_64.o
obj-$(CONFIG_CRYPTO_AEGIS128_AESNI_SSE2) += aegis128-aesni.o
obj-$(CONFIG_CRYPTO_AEGIS128L_AESNI_SSE2) += aegis128l-aesni.o
obj-$(CONFIG_CRYPTO_AEGIS256_AESNI_SSE2) += aegis256-aesni.o
obj-$(CONFIG_CRYPTO_MORUS640_GLUE) += morus640_glue.o
obj-$(CONFIG_CRYPTO_MORUS1280_GLUE) += morus1280_glue.o
obj-$(CONFIG_CRYPTO_MORUS640_SSE2) += morus640-sse2.o
obj-$(CONFIG_CRYPTO_MORUS1280_SSE2) += morus1280-sse2.o
# These modules require assembler to support AVX. # These modules require assembler to support AVX.
ifeq ($(avx_supported),yes) ifeq ($(avx_supported),yes)
obj-$(CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64) += \ obj-$(CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64) += \
...@@ -55,11 +63,12 @@ ifeq ($(avx2_supported),yes) ...@@ -55,11 +63,12 @@ ifeq ($(avx2_supported),yes)
obj-$(CONFIG_CRYPTO_SHA1_MB) += sha1-mb/ obj-$(CONFIG_CRYPTO_SHA1_MB) += sha1-mb/
obj-$(CONFIG_CRYPTO_SHA256_MB) += sha256-mb/ obj-$(CONFIG_CRYPTO_SHA256_MB) += sha256-mb/
obj-$(CONFIG_CRYPTO_SHA512_MB) += sha512-mb/ obj-$(CONFIG_CRYPTO_SHA512_MB) += sha512-mb/
obj-$(CONFIG_CRYPTO_MORUS1280_AVX2) += morus1280-avx2.o
endif endif
aes-i586-y := aes-i586-asm_32.o aes_glue.o aes-i586-y := aes-i586-asm_32.o aes_glue.o
twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
salsa20-i586-y := salsa20-i586-asm_32.o salsa20_glue.o
serpent-sse2-i586-y := serpent-sse2-i586-asm_32.o serpent_sse2_glue.o serpent-sse2-i586-y := serpent-sse2-i586-asm_32.o serpent_sse2_glue.o
aes-x86_64-y := aes-x86_64-asm_64.o aes_glue.o aes-x86_64-y := aes-x86_64-asm_64.o aes_glue.o
...@@ -68,10 +77,16 @@ camellia-x86_64-y := camellia-x86_64-asm_64.o camellia_glue.o ...@@ -68,10 +77,16 @@ camellia-x86_64-y := camellia-x86_64-asm_64.o camellia_glue.o
blowfish-x86_64-y := blowfish-x86_64-asm_64.o blowfish_glue.o blowfish-x86_64-y := blowfish-x86_64-asm_64.o blowfish_glue.o
twofish-x86_64-y := twofish-x86_64-asm_64.o twofish_glue.o twofish-x86_64-y := twofish-x86_64-asm_64.o twofish_glue.o
twofish-x86_64-3way-y := twofish-x86_64-asm_64-3way.o twofish_glue_3way.o twofish-x86_64-3way-y := twofish-x86_64-asm_64-3way.o twofish_glue_3way.o
salsa20-x86_64-y := salsa20-x86_64-asm_64.o salsa20_glue.o
chacha20-x86_64-y := chacha20-ssse3-x86_64.o chacha20_glue.o chacha20-x86_64-y := chacha20-ssse3-x86_64.o chacha20_glue.o
serpent-sse2-x86_64-y := serpent-sse2-x86_64-asm_64.o serpent_sse2_glue.o serpent-sse2-x86_64-y := serpent-sse2-x86_64-asm_64.o serpent_sse2_glue.o
aegis128-aesni-y := aegis128-aesni-asm.o aegis128-aesni-glue.o
aegis128l-aesni-y := aegis128l-aesni-asm.o aegis128l-aesni-glue.o
aegis256-aesni-y := aegis256-aesni-asm.o aegis256-aesni-glue.o
morus640-sse2-y := morus640-sse2-asm.o morus640-sse2-glue.o
morus1280-sse2-y := morus1280-sse2-asm.o morus1280-sse2-glue.o
ifeq ($(avx_supported),yes) ifeq ($(avx_supported),yes)
camellia-aesni-avx-x86_64-y := camellia-aesni-avx-asm_64.o \ camellia-aesni-avx-x86_64-y := camellia-aesni-avx-asm_64.o \
camellia_aesni_avx_glue.o camellia_aesni_avx_glue.o
...@@ -87,6 +102,8 @@ ifeq ($(avx2_supported),yes) ...@@ -87,6 +102,8 @@ ifeq ($(avx2_supported),yes)
camellia-aesni-avx2-y := camellia-aesni-avx2-asm_64.o camellia_aesni_avx2_glue.o camellia-aesni-avx2-y := camellia-aesni-avx2-asm_64.o camellia_aesni_avx2_glue.o
chacha20-x86_64-y += chacha20-avx2-x86_64.o chacha20-x86_64-y += chacha20-avx2-x86_64.o
serpent-avx2-y := serpent-avx2-asm_64.o serpent_avx2_glue.o serpent-avx2-y := serpent-avx2-asm_64.o serpent_avx2_glue.o
morus1280-avx2-y := morus1280-avx2-asm.o morus1280-avx2-glue.o
endif endif
aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
......
...@@ -364,5 +364,5 @@ module_exit(ghash_pclmulqdqni_mod_exit); ...@@ -364,5 +364,5 @@ module_exit(ghash_pclmulqdqni_mod_exit);
MODULE_LICENSE("GPL"); MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("GHASH Message Digest Algorithm, " MODULE_DESCRIPTION("GHASH Message Digest Algorithm, "
"acclerated by PCLMULQDQ-NI"); "accelerated by PCLMULQDQ-NI");
MODULE_ALIAS_CRYPTO("ghash"); MODULE_ALIAS_CRYPTO("ghash");
/*
 * The MORUS-1280 Authenticated-Encryption Algorithm
 *   Glue for AVX2 implementation
 *
 * Copyright (c) 2016-2018 Ondrej Mosnacek <omosnacek@gmail.com>
 * Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
 *
 * This program is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License as published by the Free
 * Software Foundation; either version 2 of the License, or (at your option)
 * any later version.
 */

#include <crypto/internal/aead.h>
#include <crypto/morus1280_glue.h>
#include <linux/module.h>
#include <asm/fpu/api.h>
#include <asm/cpu_device_id.h>

asmlinkage void crypto_morus1280_avx2_init(void *state, const void *key,
					   const void *iv);
asmlinkage void crypto_morus1280_avx2_ad(void *state, const void *data,
					 unsigned int length);

asmlinkage void crypto_morus1280_avx2_enc(void *state, const void *src,
					  void *dst, unsigned int length);
asmlinkage void crypto_morus1280_avx2_dec(void *state, const void *src,
					  void *dst, unsigned int length);

asmlinkage void crypto_morus1280_avx2_enc_tail(void *state, const void *src,
					       void *dst, unsigned int length);
asmlinkage void crypto_morus1280_avx2_dec_tail(void *state, const void *src,
					       void *dst, unsigned int length);

asmlinkage void crypto_morus1280_avx2_final(void *state, void *tag_xor,
					    u64 assoclen, u64 cryptlen);

MORUS1280_DECLARE_ALGS(avx2, "morus1280-avx2", 400);

static const struct x86_cpu_id avx2_cpu_id[] = {
	X86_FEATURE_MATCH(X86_FEATURE_AVX2),
	{}
};
MODULE_DEVICE_TABLE(x86cpu, avx2_cpu_id);

static int __init crypto_morus1280_avx2_module_init(void)
{
	if (!x86_match_cpu(avx2_cpu_id))
		return -ENODEV;

	return crypto_register_aeads(crypto_morus1280_avx2_algs,
				     ARRAY_SIZE(crypto_morus1280_avx2_algs));
}

static void __exit crypto_morus1280_avx2_module_exit(void)
{
	crypto_unregister_aeads(crypto_morus1280_avx2_algs,
				ARRAY_SIZE(crypto_morus1280_avx2_algs));
}

module_init(crypto_morus1280_avx2_module_init);
module_exit(crypto_morus1280_avx2_module_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Ondrej Mosnacek <omosnacek@gmail.com>");
MODULE_DESCRIPTION("MORUS-1280 AEAD algorithm -- AVX2 implementation");
MODULE_ALIAS_CRYPTO("morus1280");
MODULE_ALIAS_CRYPTO("morus1280-avx2");
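For context, a minimal userspace sketch of how the registered "morus1280" AEAD can be exercised through AF_ALG. This sketch is not part of the commit; the 16-byte key, 16-byte nonce and 16-byte tag are assumptions based on the MORUS-1280 specification, and error handling plus the actual data transfer are omitted.

/* Hedged sketch: drive the registered "morus1280" AEAD from userspace via
 * AF_ALG.  Key/nonce/tag sizes are assumptions (MORUS-1280 takes a 16- or
 * 32-byte key, a 16-byte nonce and up to a 16-byte tag).
 */
#include <unistd.h>
#include <sys/socket.h>
#include <linux/if_alg.h>

int main(void)
{
	struct sockaddr_alg sa = {
		.salg_family = AF_ALG,
		.salg_type   = "aead",
		.salg_name   = "morus1280",	/* resolves to the highest-priority impl */
	};
	unsigned char key[16] = { 0 };		/* assumed 128-bit key */
	int tfm, op;

	tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
	bind(tfm, (struct sockaddr *)&sa, sizeof(sa));
	setsockopt(tfm, SOL_ALG, ALG_SET_KEY, key, sizeof(key));
	setsockopt(tfm, SOL_ALG, ALG_SET_AEAD_AUTHSIZE, NULL, 16);

	op = accept(tfm, NULL, 0);
	/* A real caller would now pass ALG_SET_OP/ALG_SET_IV control messages
	 * with sendmsg() and read back ciphertext followed by the tag. */
	close(op);
	close(tfm);
	return 0;
}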
/*
 * The MORUS-1280 Authenticated-Encryption Algorithm
 *   Glue for SSE2 implementation
 *
 * Copyright (c) 2016-2018 Ondrej Mosnacek <omosnacek@gmail.com>
 * Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
 *
 * This program is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License as published by the Free
 * Software Foundation; either version 2 of the License, or (at your option)
 * any later version.
 */

#include <crypto/internal/aead.h>
#include <crypto/morus1280_glue.h>
#include <linux/module.h>
#include <asm/fpu/api.h>
#include <asm/cpu_device_id.h>

asmlinkage void crypto_morus1280_sse2_init(void *state, const void *key,
					   const void *iv);
asmlinkage void crypto_morus1280_sse2_ad(void *state, const void *data,
					 unsigned int length);

asmlinkage void crypto_morus1280_sse2_enc(void *state, const void *src,
					  void *dst, unsigned int length);
asmlinkage void crypto_morus1280_sse2_dec(void *state, const void *src,
					  void *dst, unsigned int length);

asmlinkage void crypto_morus1280_sse2_enc_tail(void *state, const void *src,
					       void *dst, unsigned int length);
asmlinkage void crypto_morus1280_sse2_dec_tail(void *state, const void *src,
					       void *dst, unsigned int length);

asmlinkage void crypto_morus1280_sse2_final(void *state, void *tag_xor,
					    u64 assoclen, u64 cryptlen);

MORUS1280_DECLARE_ALGS(sse2, "morus1280-sse2", 350);

static const struct x86_cpu_id sse2_cpu_id[] = {
	X86_FEATURE_MATCH(X86_FEATURE_XMM2),
	{}
};
MODULE_DEVICE_TABLE(x86cpu, sse2_cpu_id);

static int __init crypto_morus1280_sse2_module_init(void)
{
	if (!x86_match_cpu(sse2_cpu_id))
		return -ENODEV;

	return crypto_register_aeads(crypto_morus1280_sse2_algs,
				     ARRAY_SIZE(crypto_morus1280_sse2_algs));
}

static void __exit crypto_morus1280_sse2_module_exit(void)
{
	crypto_unregister_aeads(crypto_morus1280_sse2_algs,
				ARRAY_SIZE(crypto_morus1280_sse2_algs));
}

module_init(crypto_morus1280_sse2_module_init);
module_exit(crypto_morus1280_sse2_module_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Ondrej Mosnacek <omosnacek@gmail.com>");
MODULE_DESCRIPTION("MORUS-1280 AEAD algorithm -- SSE2 implementation");
MODULE_ALIAS_CRYPTO("morus1280");
MODULE_ALIAS_CRYPTO("morus1280-sse2");
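Both x86 glue modules register algorithms under the same "morus1280" name, so which one a caller gets comes down to the priorities passed to MORUS1280_DECLARE_ALGS above (400 for AVX2, 350 for SSE2) versus the generic C implementation. A rough in-kernel sketch of checking which driver wins follows; this is a hypothetical test module, not part of the commit.

/* Hypothetical test module (not in this commit): ask the crypto API for
 * "morus1280" and report which registered implementation it resolved to.
 * With both glue modules loaded on AVX2-capable hardware, the priority-400
 * AVX2 driver is expected to win over the priority-350 SSE2 one and the
 * generic fallback.
 */
#include <crypto/aead.h>
#include <linux/err.h>
#include <linux/module.h>

static int __init morus_probe_init(void)
{
	struct crypto_aead *tfm = crypto_alloc_aead("morus1280", 0, 0);

	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	pr_info("morus1280 resolved to %s\n",
		crypto_tfm_alg_driver_name(crypto_aead_tfm(tfm)));
	crypto_free_aead(tfm);
	return 0;
}

static void __exit morus_probe_exit(void)
{
}

module_init(morus_probe_init);
module_exit(morus_probe_exit);
MODULE_LICENSE("GPL");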
@@ -86,6 +86,11 @@ obj-$(CONFIG_CRYPTO_KEYWRAP) += keywrap.o
obj-$(CONFIG_CRYPTO_GCM) += gcm.o
obj-$(CONFIG_CRYPTO_CCM) += ccm.o
obj-$(CONFIG_CRYPTO_CHACHA20POLY1305) += chacha20poly1305.o
obj-$(CONFIG_CRYPTO_AEGIS128) += aegis128.o
obj-$(CONFIG_CRYPTO_AEGIS128L) += aegis128l.o
obj-$(CONFIG_CRYPTO_AEGIS256) += aegis256.o
obj-$(CONFIG_CRYPTO_MORUS640) += morus640.o
obj-$(CONFIG_CRYPTO_MORUS1280) += morus1280.o
obj-$(CONFIG_CRYPTO_PCRYPT) += pcrypt.o
obj-$(CONFIG_CRYPTO_CRYPTD) += cryptd.o
obj-$(CONFIG_CRYPTO_MCRYPTD) += mcryptd.o
@@ -137,6 +142,7 @@ obj-$(CONFIG_CRYPTO_USER_API_HASH) += algif_hash.o
obj-$(CONFIG_CRYPTO_USER_API_SKCIPHER) += algif_skcipher.o
obj-$(CONFIG_CRYPTO_USER_API_RNG) += algif_rng.o
obj-$(CONFIG_CRYPTO_USER_API_AEAD) += algif_aead.o
obj-$(CONFIG_CRYPTO_ZSTD) += zstd.o
ecdh_generic-y := ecc.o
ecdh_generic-y += ecdh.o
...
@@ -10,6 +10,7 @@
 *
 */

#include <crypto/algapi.h>
#include <linux/err.h>
#include <linux/errno.h>
#include <linux/fips.h>
@@ -59,6 +60,15 @@ static int crypto_check_alg(struct crypto_alg *alg)
	if (alg->cra_blocksize > PAGE_SIZE / 8)
		return -EINVAL;

	if (!alg->cra_type && (alg->cra_flags & CRYPTO_ALG_TYPE_MASK) ==
			       CRYPTO_ALG_TYPE_CIPHER) {
		if (alg->cra_alignmask > MAX_CIPHER_ALIGNMASK)
			return -EINVAL;

		if (alg->cra_blocksize > MAX_CIPHER_BLOCKSIZE)
			return -EINVAL;
	}

	if (alg->cra_priority < 0)
		return -EINVAL;
...
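The new check bounds plain ciphers by MAX_CIPHER_ALIGNMASK and MAX_CIPHER_BLOCKSIZE (15 and 16 bytes respectively in this series, as far as I can tell). A hypothetical registration that would now fail with -EINVAL is sketched below for illustration; cipher ops are omitted because crypto_check_alg() runs before they are ever used.

/* Hypothetical module (not in this commit): register a plain cipher that
 * claims a 64-byte block.  crypto_register_alg() is expected to return
 * -EINVAL once the new bounds check is in place.
 */
#include <crypto/algapi.h>
#include <linux/module.h>

static struct crypto_alg bogus_cipher_alg = {
	.cra_name	 = "bogus-cipher",
	.cra_driver_name = "bogus-cipher-generic",
	.cra_flags	 = CRYPTO_ALG_TYPE_CIPHER,
	.cra_blocksize	 = 64,	/* > MAX_CIPHER_BLOCKSIZE -> rejected */
	.cra_priority	 = 100,
	.cra_module	 = THIS_MODULE,
	/* .cra_u.cipher ops omitted; this sketch only exercises the bound. */
};

static int __init bogus_cipher_init(void)
{
	return crypto_register_alg(&bogus_cipher_alg);	/* expect -EINVAL */
}

static void __exit bogus_cipher_exit(void)
{
	crypto_unregister_alg(&bogus_cipher_alg);
}

module_init(bogus_cipher_init);
module_exit(bogus_cipher_exit);
MODULE_LICENSE("GPL");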
@@ -108,6 +108,7 @@ static int crypto_authenc_setkey(struct crypto_aead *authenc, const u8 *key,
				       CRYPTO_TFM_RES_MASK);
out:
	memzero_explicit(&keys, sizeof(keys));
	return err;

badkey:
...
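The added memzero_explicit() wipes the parsed authentication/encryption keys from the on-stack crypto_authenc_keys before the function returns; a plain memset() there would be a dead store the compiler may delete. A small illustrative sketch of the pattern follows (not part of the commit); the same wipe is applied in authenc_esn below.

/* Illustrative only: why the wipe uses memzero_explicit() rather than
 * memset().  "keys" holds the split keys on the stack; memzero_explicit()
 * contains a barrier so the clear survives optimization.
 */
#include <crypto/authenc.h>
#include <linux/string.h>
#include <linux/types.h>

static void use_and_wipe(const u8 *key, unsigned int keylen)
{
	struct crypto_authenc_keys keys;

	if (crypto_authenc_extractkeys(&keys, key, keylen))
		return;

	/* ... hand keys.authkey / keys.enckey to the underlying tfms ... */

	memzero_explicit(&keys, sizeof(keys));	/* not elided as a dead store */
}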
@@ -90,6 +90,7 @@ static int crypto_authenc_esn_setkey(struct crypto_aead *authenc_esn, const u8 *
				       CRYPTO_TFM_RES_MASK);
out:
	memzero_explicit(&keys, sizeof(keys));
	return err;

badkey:
...