Commit 36289a03 authored by Linus Torvalds

Merge tag 'v6.3-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto update from Herbert Xu:
 "API:
   - Use kmap_local instead of kmap_atomic
   - Change request callback to take void pointer
   - Print FIPS status in /proc/crypto (when enabled)

  Algorithms:
   - Add rfc4106/gcm support on arm64
   - Add ARIA AVX2/512 support on x86

  Drivers:
   - Add TRNG driver for StarFive SoC
   - Delete ux500/hash driver (subsumed by stm32/hash)
   - Add zlib support in qat
   - Add RSA support in aspeed"

* tag 'v6.3-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (156 commits)
  crypto: x86/aria-avx - Do not use avx2 instructions
  crypto: aspeed - Fix modular aspeed-acry
  crypto: hisilicon/qm - fix coding style issues
  crypto: hisilicon/qm - update comments to match function
  crypto: hisilicon/qm - change function names
  crypto: hisilicon/qm - use min() instead of min_t()
  crypto: hisilicon/qm - remove some unused defines
  crypto: proc - Print fips status
  crypto: crypto4xx - Call dma_unmap_page when done
  crypto: octeontx2 - Fix objects shared between several modules
  crypto: nx - Fix sparse warnings
  crypto: ecc - Silence sparse warning
  tls: Pass rec instead of aead_req into tls_encrypt_done
  crypto: api - Remove completion function scaffolding
  tls: Remove completion function scaffolding
  tipc: Remove completion function scaffolding
  net: ipv6: Remove completion function scaffolding
  net: ipv4: Remove completion function scaffolding
  net: macsec: Remove completion function scaffolding
  dm: Remove completion function scaffolding
  ...
parents 69308402 8b844753
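The API note above about request callbacks taking a void pointer refers to the crypto completion-callback signature change that the "Remove completion function scaffolding" commits in the shortlog finish off. A minimal caller-side sketch of the new convention follows; it is illustrative only and not taken from this series (the names my_crypt_ctx, my_crypt_done and my_submit are invented here), assuming the callback now receives the opaque data pointer registered with the request instead of a struct crypto_async_request *.

/*
 * Hedged sketch of an async AEAD user's completion callback under the
 * new "callback takes void pointer" convention.
 */
#include <crypto/aead.h>
#include <linux/completion.h>

struct my_crypt_ctx {
	struct completion done;
	int err;
};

/* New-style callback: data is whatever was passed to *_set_callback(). */
static void my_crypt_done(void *data, int err)
{
	struct my_crypt_ctx *ctx = data;

	if (err == -EINPROGRESS)
		return;		/* backlogged request was resubmitted */

	ctx->err = err;
	complete(&ctx->done);
}

static void my_submit(struct aead_request *req, struct my_crypt_ctx *ctx)
{
	init_completion(&ctx->done);
	aead_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
				  my_crypt_done, ctx);
	/* ... aead_request_set_crypt()/crypto_aead_encrypt() as usual ... */
}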
 What:		/sys/bus/pci/devices/<BDF>/qat/state
 Date:		June 2022
-KernelVersion:	5.20
+KernelVersion:	6.0
 Contact:	qat-linux@intel.com
 Description:	(RW) Reports the current state of the QAT device. Write to
 		the file to start or stop the device.
@@ -18,7 +18,7 @@ Description:	(RW) Reports the current state of the QAT device. Write to
 What:		/sys/bus/pci/devices/<BDF>/qat/cfg_services
 Date:		June 2022
-KernelVersion:	5.20
+KernelVersion:	6.0
 Contact:	qat-linux@intel.com
 Description:	(RW) Reports the current configuration of the QAT device.
 		Write to the file to change the configured services.
......
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/bus/aspeed,ast2600-ahbc.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#

title: ASPEED Advanced High-Performance Bus Controller (AHBC)

maintainers:
  - Neal Liu <neal_liu@aspeedtech.com>
  - Chia-Wei Wang <chiawei_wang@aspeedtech.com>

description: |
  Advanced High-performance Bus Controller (AHBC) supports plenty of mechanisms
  including a priority arbiter, an address decoder and a data multiplexer
  to control the overall operations of Advanced High-performance Bus (AHB).

properties:
  compatible:
    enum:
      - aspeed,ast2600-ahbc

  reg:
    maxItems: 1

required:
  - compatible
  - reg

additionalProperties: false

examples:
  - |
    ahbc@1e600000 {
        compatible = "aspeed,ast2600-ahbc";
        reg = <0x1e600000 0x100>;
    };
@@ -14,6 +14,7 @@ properties:
     enum:
       - allwinner,sun8i-h3-crypto
       - allwinner,sun8i-r40-crypto
+      - allwinner,sun20i-d1-crypto
       - allwinner,sun50i-a64-crypto
       - allwinner,sun50i-h5-crypto
       - allwinner,sun50i-h6-crypto
@@ -29,6 +30,7 @@ properties:
       - description: Bus clock
       - description: Module clock
       - description: MBus clock
+      - description: TRNG clock (RC oscillator)
     minItems: 2

   clock-names:
@@ -36,6 +38,7 @@ properties:
       - const: bus
       - const: mod
       - const: ram
+      - const: trng
     minItems: 2

   resets:
@@ -44,19 +47,33 @@ properties:
 if:
   properties:
     compatible:
-      const: allwinner,sun50i-h6-crypto
+      enum:
+        - allwinner,sun20i-d1-crypto
 then:
   properties:
     clocks:
-      minItems: 3
+      minItems: 4
     clock-names:
-      minItems: 3
+      minItems: 4
 else:
-  properties:
-    clocks:
-      maxItems: 2
-    clock-names:
-      maxItems: 2
+  if:
+    properties:
+      compatible:
+        const: allwinner,sun50i-h6-crypto
+  then:
+    properties:
+      clocks:
+        minItems: 3
+        maxItems: 3
+      clock-names:
+        minItems: 3
+        maxItems: 3
+  else:
+    properties:
+      clocks:
+        maxItems: 2
+      clock-names:
+        maxItems: 2

 required:
   - compatible
......
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/crypto/aspeed,ast2600-acry.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#

title: ASPEED ACRY ECDSA/RSA Hardware Accelerator Engines

maintainers:
  - Neal Liu <neal_liu@aspeedtech.com>

description:
  The ACRY ECDSA/RSA engines is designed to accelerate the throughput
  of ECDSA/RSA signature and verification. Basically, ACRY can be
  divided into two independent engines - ECC Engine and RSA Engine.

properties:
  compatible:
    enum:
      - aspeed,ast2600-acry

  reg:
    items:
      - description: acry base address & size
      - description: acry sram base address & size

  clocks:
    maxItems: 1

  interrupts:
    maxItems: 1

required:
  - compatible
  - reg
  - clocks
  - interrupts

additionalProperties: false

examples:
  - |
    #include <dt-bindings/clock/ast2600-clock.h>
    acry: crypto@1e6fa000 {
        compatible = "aspeed,ast2600-acry";
        reg = <0x1e6fa000 0x400>, <0x1e710000 0x1800>;
        interrupts = <160>;
        clocks = <&syscon ASPEED_CLK_GATE_RSACLK>;
    };
@@ -6,12 +6,18 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#

 title: STMicroelectronics STM32 HASH

+description: The STM32 HASH block is built on the HASH block found in
+  the STn8820 SoC introduced in 2007, and subsequently used in the U8500
+  SoC in 2010.
+
 maintainers:
   - Lionel Debieve <lionel.debieve@foss.st.com>

 properties:
   compatible:
     enum:
+      - st,stn8820-hash
+      - stericsson,ux500-hash
       - st,stm32f456-hash
       - st,stm32f756-hash
@@ -41,11 +47,26 @@ properties:
     maximum: 2
     default: 0

+  power-domains:
+    maxItems: 1
+
 required:
   - compatible
   - reg
   - clocks
-  - interrupts
+
+allOf:
+  - if:
+      properties:
+        compatible:
+          items:
+            const: stericsson,ux500-hash
+    then:
+      properties:
+        interrupts: false
+    else:
+      required:
+        - interrupts

 additionalProperties: false
......
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/rng/starfive,jh7110-trng.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#

title: StarFive SoC TRNG Module

maintainers:
  - Jia Jie Ho <jiajie.ho@starfivetech.com>

properties:
  compatible:
    const: starfive,jh7110-trng

  reg:
    maxItems: 1

  clocks:
    items:
      - description: Hardware reference clock
      - description: AHB reference clock

  clock-names:
    items:
      - const: hclk
      - const: ahb

  resets:
    maxItems: 1

  interrupts:
    maxItems: 1

required:
  - compatible
  - reg
  - clocks
  - clock-names
  - resets
  - interrupts

additionalProperties: false

examples:
  - |
    rng: rng@1600C000 {
        compatible = "starfive,jh7110-trng";
        reg = <0x1600C000 0x4000>;
        clocks = <&clk 15>, <&clk 16>;
        clock-names = "hclk", "ahb";
        resets = <&reset 3>;
        interrupts = <30>;
    };
...
@@ -3149,7 +3149,7 @@ ASPEED CRYPTO DRIVER
 M:	Neal Liu <neal_liu@aspeedtech.com>
 L:	linux-aspeed@lists.ozlabs.org (moderated for non-subscribers)
 S:	Maintained
-F:	Documentation/devicetree/bindings/crypto/aspeed,ast2500-hace.yaml
+F:	Documentation/devicetree/bindings/crypto/aspeed,*
 F:	drivers/crypto/aspeed/

 ASUS NOTEBOOKS AND EEEPC ACPI/WMI EXTRAS DRIVERS
@@ -19769,6 +19769,12 @@ F:	Documentation/devicetree/bindings/reset/starfive,jh7100-reset.yaml
 F:	drivers/reset/reset-starfive-jh7100.c
 F:	include/dt-bindings/reset/starfive-jh7100.h

+STARFIVE TRNG DRIVER
+M:	Jia Jie Ho <jiajie.ho@starfivetech.com>
+S:	Supported
+F:	Documentation/devicetree/bindings/rng/starfive*
+F:	drivers/char/hw_random/jh7110-trng.c
+
 STATIC BRANCH/CALL
 M:	Peter Zijlstra <peterz@infradead.org>
 M:	Josh Poimboeuf <jpoimboe@kernel.org>
......
@@ -98,6 +98,11 @@ gic: interrupt-controller@40461000 {
 			      <0x40466000 0x2000>;
 		};

+		ahbc: bus@1e600000 {
+			compatible = "aspeed,ast2600-ahbc", "syscon";
+			reg = <0x1e600000 0x100>;
+		};
+
 		fmc: spi@1e620000 {
 			reg = <0x1e620000 0xc4>, <0x20000000 0x10000000>;
 			#address-cells = <1>;
@@ -431,6 +436,14 @@ sbc: secure-boot-controller@1e6f2000 {
 			reg = <0x1e6f2000 0x1000>;
 		};

+		acry: crypto@1e6fa000 {
+			compatible = "aspeed,ast2600-acry";
+			reg = <0x1e6fa000 0x400>, <0x1e710000 0x1800>;
+			interrupts = <GIC_SPI 160 IRQ_TYPE_LEVEL_HIGH>;
+			clocks = <&syscon ASPEED_CLK_GATE_RSACLK>;
+			aspeed,ahbc = <&ahbc>;
+		};
+
 		video: video@1e700000 {
 			compatible = "aspeed,ast2600-video-engine";
 			reg = <0x1e700000 0x1000>;
......
@@ -21,31 +21,29 @@

 #include "sha1.h"

-asmlinkage void sha1_block_data_order(u32 *digest,
-		const unsigned char *data, unsigned int rounds);
+asmlinkage void sha1_block_data_order(struct sha1_state *digest,
+		const u8 *data, int rounds);

 int sha1_update_arm(struct shash_desc *desc, const u8 *data,
 		    unsigned int len)
 {
-	/* make sure casting to sha1_block_fn() is safe */
+	/* make sure signature matches sha1_block_fn() */
 	BUILD_BUG_ON(offsetof(struct sha1_state, state) != 0);

-	return sha1_base_do_update(desc, data, len,
-				   (sha1_block_fn *)sha1_block_data_order);
+	return sha1_base_do_update(desc, data, len, sha1_block_data_order);
 }
 EXPORT_SYMBOL_GPL(sha1_update_arm);

 static int sha1_final(struct shash_desc *desc, u8 *out)
 {
-	sha1_base_do_finalize(desc, (sha1_block_fn *)sha1_block_data_order);
+	sha1_base_do_finalize(desc, sha1_block_data_order);
 	return sha1_base_finish(desc, out);
 }

 int sha1_finup_arm(struct shash_desc *desc, const u8 *data,
 		   unsigned int len, u8 *out)
 {
-	sha1_base_do_update(desc, data, len,
-			    (sha1_block_fn *)sha1_block_data_order);
+	sha1_base_do_update(desc, data, len, sha1_block_data_order);
 	return sha1_final(desc, out);
 }
 EXPORT_SYMBOL_GPL(sha1_finup_arm);
......
...@@ -161,43 +161,39 @@ static int ccm_encrypt(struct aead_request *req) ...@@ -161,43 +161,39 @@ static int ccm_encrypt(struct aead_request *req)
memcpy(buf, req->iv, AES_BLOCK_SIZE); memcpy(buf, req->iv, AES_BLOCK_SIZE);
err = skcipher_walk_aead_encrypt(&walk, req, false); err = skcipher_walk_aead_encrypt(&walk, req, false);
if (unlikely(err))
return err;
kernel_neon_begin(); kernel_neon_begin();
if (req->assoclen) if (req->assoclen)
ccm_calculate_auth_mac(req, mac); ccm_calculate_auth_mac(req, mac);
do { while (walk.nbytes) {
u32 tail = walk.nbytes % AES_BLOCK_SIZE; u32 tail = walk.nbytes % AES_BLOCK_SIZE;
bool final = walk.nbytes == walk.total;
if (walk.nbytes == walk.total) if (final)
tail = 0; tail = 0;
ce_aes_ccm_encrypt(walk.dst.virt.addr, walk.src.virt.addr, ce_aes_ccm_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
walk.nbytes - tail, ctx->key_enc, walk.nbytes - tail, ctx->key_enc,
num_rounds(ctx), mac, walk.iv); num_rounds(ctx), mac, walk.iv);
if (walk.nbytes == walk.total) if (!final)
ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx)); kernel_neon_end();
err = skcipher_walk_done(&walk, tail);
if (!final)
kernel_neon_begin();
}
kernel_neon_end(); ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
if (walk.nbytes) { kernel_neon_end();
err = skcipher_walk_done(&walk, tail);
if (unlikely(err))
return err;
if (unlikely(walk.nbytes))
kernel_neon_begin();
}
} while (walk.nbytes);
/* copy authtag to end of dst */ /* copy authtag to end of dst */
scatterwalk_map_and_copy(mac, req->dst, req->assoclen + req->cryptlen, scatterwalk_map_and_copy(mac, req->dst, req->assoclen + req->cryptlen,
crypto_aead_authsize(aead), 1); crypto_aead_authsize(aead), 1);
return 0; return err;
} }
static int ccm_decrypt(struct aead_request *req) static int ccm_decrypt(struct aead_request *req)
...@@ -219,37 +215,36 @@ static int ccm_decrypt(struct aead_request *req) ...@@ -219,37 +215,36 @@ static int ccm_decrypt(struct aead_request *req)
memcpy(buf, req->iv, AES_BLOCK_SIZE); memcpy(buf, req->iv, AES_BLOCK_SIZE);
err = skcipher_walk_aead_decrypt(&walk, req, false); err = skcipher_walk_aead_decrypt(&walk, req, false);
if (unlikely(err))
return err;
kernel_neon_begin(); kernel_neon_begin();
if (req->assoclen) if (req->assoclen)
ccm_calculate_auth_mac(req, mac); ccm_calculate_auth_mac(req, mac);
do { while (walk.nbytes) {
u32 tail = walk.nbytes % AES_BLOCK_SIZE; u32 tail = walk.nbytes % AES_BLOCK_SIZE;
bool final = walk.nbytes == walk.total;
if (walk.nbytes == walk.total) if (final)
tail = 0; tail = 0;
ce_aes_ccm_decrypt(walk.dst.virt.addr, walk.src.virt.addr, ce_aes_ccm_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
walk.nbytes - tail, ctx->key_enc, walk.nbytes - tail, ctx->key_enc,
num_rounds(ctx), mac, walk.iv); num_rounds(ctx), mac, walk.iv);
if (walk.nbytes == walk.total) if (!final)
ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx)); kernel_neon_end();
err = skcipher_walk_done(&walk, tail);
if (!final)
kernel_neon_begin();
}
kernel_neon_end(); ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
if (walk.nbytes) { kernel_neon_end();
err = skcipher_walk_done(&walk, tail);
if (unlikely(err)) if (unlikely(err))
return err; return err;
if (unlikely(walk.nbytes))
kernel_neon_begin();
}
} while (walk.nbytes);
/* compare calculated auth tag with the stored one */ /* compare calculated auth tag with the stored one */
scatterwalk_map_and_copy(buf, req->src, scatterwalk_map_and_copy(buf, req->src,
......
...@@ -9,6 +9,7 @@ ...@@ -9,6 +9,7 @@
#include <asm/simd.h> #include <asm/simd.h>
#include <asm/unaligned.h> #include <asm/unaligned.h>
#include <crypto/aes.h> #include <crypto/aes.h>
#include <crypto/gcm.h>
#include <crypto/algapi.h> #include <crypto/algapi.h>
#include <crypto/b128ops.h> #include <crypto/b128ops.h>
#include <crypto/gf128mul.h> #include <crypto/gf128mul.h>
...@@ -28,7 +29,8 @@ MODULE_ALIAS_CRYPTO("ghash"); ...@@ -28,7 +29,8 @@ MODULE_ALIAS_CRYPTO("ghash");
#define GHASH_BLOCK_SIZE 16 #define GHASH_BLOCK_SIZE 16
#define GHASH_DIGEST_SIZE 16 #define GHASH_DIGEST_SIZE 16
#define GCM_IV_SIZE 12
#define RFC4106_NONCE_SIZE 4
struct ghash_key { struct ghash_key {
be128 k; be128 k;
...@@ -43,6 +45,7 @@ struct ghash_desc_ctx { ...@@ -43,6 +45,7 @@ struct ghash_desc_ctx {
struct gcm_aes_ctx { struct gcm_aes_ctx {
struct crypto_aes_ctx aes_key; struct crypto_aes_ctx aes_key;
u8 nonce[RFC4106_NONCE_SIZE];
struct ghash_key ghash_key; struct ghash_key ghash_key;
}; };
...@@ -226,8 +229,8 @@ static int num_rounds(struct crypto_aes_ctx *ctx) ...@@ -226,8 +229,8 @@ static int num_rounds(struct crypto_aes_ctx *ctx)
return 6 + ctx->key_length / 4; return 6 + ctx->key_length / 4;
} }
static int gcm_setkey(struct crypto_aead *tfm, const u8 *inkey, static int gcm_aes_setkey(struct crypto_aead *tfm, const u8 *inkey,
unsigned int keylen) unsigned int keylen)
{ {
struct gcm_aes_ctx *ctx = crypto_aead_ctx(tfm); struct gcm_aes_ctx *ctx = crypto_aead_ctx(tfm);
u8 key[GHASH_BLOCK_SIZE]; u8 key[GHASH_BLOCK_SIZE];
...@@ -258,17 +261,9 @@ static int gcm_setkey(struct crypto_aead *tfm, const u8 *inkey, ...@@ -258,17 +261,9 @@ static int gcm_setkey(struct crypto_aead *tfm, const u8 *inkey,
return 0; return 0;
} }
static int gcm_setauthsize(struct crypto_aead *tfm, unsigned int authsize) static int gcm_aes_setauthsize(struct crypto_aead *tfm, unsigned int authsize)
{ {
switch (authsize) { return crypto_gcm_check_authsize(authsize);
case 4:
case 8:
case 12 ... 16:
break;
default:
return -EINVAL;
}
return 0;
} }
static void gcm_update_mac(u64 dg[], const u8 *src, int count, u8 buf[], static void gcm_update_mac(u64 dg[], const u8 *src, int count, u8 buf[],
...@@ -302,13 +297,12 @@ static void gcm_update_mac(u64 dg[], const u8 *src, int count, u8 buf[], ...@@ -302,13 +297,12 @@ static void gcm_update_mac(u64 dg[], const u8 *src, int count, u8 buf[],
} }
} }
static void gcm_calculate_auth_mac(struct aead_request *req, u64 dg[]) static void gcm_calculate_auth_mac(struct aead_request *req, u64 dg[], u32 len)
{ {
struct crypto_aead *aead = crypto_aead_reqtfm(req); struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct gcm_aes_ctx *ctx = crypto_aead_ctx(aead); struct gcm_aes_ctx *ctx = crypto_aead_ctx(aead);
u8 buf[GHASH_BLOCK_SIZE]; u8 buf[GHASH_BLOCK_SIZE];
struct scatter_walk walk; struct scatter_walk walk;
u32 len = req->assoclen;
int buf_count = 0; int buf_count = 0;
scatterwalk_start(&walk, req->src); scatterwalk_start(&walk, req->src);
...@@ -338,27 +332,25 @@ static void gcm_calculate_auth_mac(struct aead_request *req, u64 dg[]) ...@@ -338,27 +332,25 @@ static void gcm_calculate_auth_mac(struct aead_request *req, u64 dg[])
} }
} }
static int gcm_encrypt(struct aead_request *req) static int gcm_encrypt(struct aead_request *req, char *iv, int assoclen)
{ {
struct crypto_aead *aead = crypto_aead_reqtfm(req); struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct gcm_aes_ctx *ctx = crypto_aead_ctx(aead); struct gcm_aes_ctx *ctx = crypto_aead_ctx(aead);
int nrounds = num_rounds(&ctx->aes_key); int nrounds = num_rounds(&ctx->aes_key);
struct skcipher_walk walk; struct skcipher_walk walk;
u8 buf[AES_BLOCK_SIZE]; u8 buf[AES_BLOCK_SIZE];
u8 iv[AES_BLOCK_SIZE];
u64 dg[2] = {}; u64 dg[2] = {};
be128 lengths; be128 lengths;
u8 *tag; u8 *tag;
int err; int err;
lengths.a = cpu_to_be64(req->assoclen * 8); lengths.a = cpu_to_be64(assoclen * 8);
lengths.b = cpu_to_be64(req->cryptlen * 8); lengths.b = cpu_to_be64(req->cryptlen * 8);
if (req->assoclen) if (assoclen)
gcm_calculate_auth_mac(req, dg); gcm_calculate_auth_mac(req, dg, assoclen);
memcpy(iv, req->iv, GCM_IV_SIZE); put_unaligned_be32(2, iv + GCM_AES_IV_SIZE);
put_unaligned_be32(2, iv + GCM_IV_SIZE);
err = skcipher_walk_aead_encrypt(&walk, req, false); err = skcipher_walk_aead_encrypt(&walk, req, false);
...@@ -403,7 +395,7 @@ static int gcm_encrypt(struct aead_request *req) ...@@ -403,7 +395,7 @@ static int gcm_encrypt(struct aead_request *req)
return 0; return 0;
} }
static int gcm_decrypt(struct aead_request *req) static int gcm_decrypt(struct aead_request *req, char *iv, int assoclen)
{ {
struct crypto_aead *aead = crypto_aead_reqtfm(req); struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct gcm_aes_ctx *ctx = crypto_aead_ctx(aead); struct gcm_aes_ctx *ctx = crypto_aead_ctx(aead);
...@@ -412,21 +404,19 @@ static int gcm_decrypt(struct aead_request *req) ...@@ -412,21 +404,19 @@ static int gcm_decrypt(struct aead_request *req)
struct skcipher_walk walk; struct skcipher_walk walk;
u8 otag[AES_BLOCK_SIZE]; u8 otag[AES_BLOCK_SIZE];
u8 buf[AES_BLOCK_SIZE]; u8 buf[AES_BLOCK_SIZE];
u8 iv[AES_BLOCK_SIZE];
u64 dg[2] = {}; u64 dg[2] = {};
be128 lengths; be128 lengths;
u8 *tag; u8 *tag;
int ret; int ret;
int err; int err;
lengths.a = cpu_to_be64(req->assoclen * 8); lengths.a = cpu_to_be64(assoclen * 8);
lengths.b = cpu_to_be64((req->cryptlen - authsize) * 8); lengths.b = cpu_to_be64((req->cryptlen - authsize) * 8);
if (req->assoclen) if (assoclen)
gcm_calculate_auth_mac(req, dg); gcm_calculate_auth_mac(req, dg, assoclen);
memcpy(iv, req->iv, GCM_IV_SIZE); put_unaligned_be32(2, iv + GCM_AES_IV_SIZE);
put_unaligned_be32(2, iv + GCM_IV_SIZE);
scatterwalk_map_and_copy(otag, req->src, scatterwalk_map_and_copy(otag, req->src,
req->assoclen + req->cryptlen - authsize, req->assoclen + req->cryptlen - authsize,
...@@ -471,14 +461,76 @@ static int gcm_decrypt(struct aead_request *req) ...@@ -471,14 +461,76 @@ static int gcm_decrypt(struct aead_request *req)
return ret ? -EBADMSG : 0; return ret ? -EBADMSG : 0;
} }
static struct aead_alg gcm_aes_alg = { static int gcm_aes_encrypt(struct aead_request *req)
.ivsize = GCM_IV_SIZE, {
u8 iv[AES_BLOCK_SIZE];
memcpy(iv, req->iv, GCM_AES_IV_SIZE);
return gcm_encrypt(req, iv, req->assoclen);
}
static int gcm_aes_decrypt(struct aead_request *req)
{
u8 iv[AES_BLOCK_SIZE];
memcpy(iv, req->iv, GCM_AES_IV_SIZE);
return gcm_decrypt(req, iv, req->assoclen);
}
static int rfc4106_setkey(struct crypto_aead *tfm, const u8 *inkey,
unsigned int keylen)
{
struct gcm_aes_ctx *ctx = crypto_aead_ctx(tfm);
int err;
keylen -= RFC4106_NONCE_SIZE;
err = gcm_aes_setkey(tfm, inkey, keylen);
if (err)
return err;
memcpy(ctx->nonce, inkey + keylen, RFC4106_NONCE_SIZE);
return 0;
}
static int rfc4106_setauthsize(struct crypto_aead *tfm, unsigned int authsize)
{
return crypto_rfc4106_check_authsize(authsize);
}
static int rfc4106_encrypt(struct aead_request *req)
{
struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct gcm_aes_ctx *ctx = crypto_aead_ctx(aead);
u8 iv[AES_BLOCK_SIZE];
memcpy(iv, ctx->nonce, RFC4106_NONCE_SIZE);
memcpy(iv + RFC4106_NONCE_SIZE, req->iv, GCM_RFC4106_IV_SIZE);
return crypto_ipsec_check_assoclen(req->assoclen) ?:
gcm_encrypt(req, iv, req->assoclen - GCM_RFC4106_IV_SIZE);
}
static int rfc4106_decrypt(struct aead_request *req)
{
struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct gcm_aes_ctx *ctx = crypto_aead_ctx(aead);
u8 iv[AES_BLOCK_SIZE];
memcpy(iv, ctx->nonce, RFC4106_NONCE_SIZE);
memcpy(iv + RFC4106_NONCE_SIZE, req->iv, GCM_RFC4106_IV_SIZE);
return crypto_ipsec_check_assoclen(req->assoclen) ?:
gcm_decrypt(req, iv, req->assoclen - GCM_RFC4106_IV_SIZE);
}
static struct aead_alg gcm_aes_algs[] = {{
.ivsize = GCM_AES_IV_SIZE,
.chunksize = AES_BLOCK_SIZE, .chunksize = AES_BLOCK_SIZE,
.maxauthsize = AES_BLOCK_SIZE, .maxauthsize = AES_BLOCK_SIZE,
.setkey = gcm_setkey, .setkey = gcm_aes_setkey,
.setauthsize = gcm_setauthsize, .setauthsize = gcm_aes_setauthsize,
.encrypt = gcm_encrypt, .encrypt = gcm_aes_encrypt,
.decrypt = gcm_decrypt, .decrypt = gcm_aes_decrypt,
.base.cra_name = "gcm(aes)", .base.cra_name = "gcm(aes)",
.base.cra_driver_name = "gcm-aes-ce", .base.cra_driver_name = "gcm-aes-ce",
...@@ -487,7 +539,23 @@ static struct aead_alg gcm_aes_alg = { ...@@ -487,7 +539,23 @@ static struct aead_alg gcm_aes_alg = {
.base.cra_ctxsize = sizeof(struct gcm_aes_ctx) + .base.cra_ctxsize = sizeof(struct gcm_aes_ctx) +
4 * sizeof(u64[2]), 4 * sizeof(u64[2]),
.base.cra_module = THIS_MODULE, .base.cra_module = THIS_MODULE,
}; }, {
.ivsize = GCM_RFC4106_IV_SIZE,
.chunksize = AES_BLOCK_SIZE,
.maxauthsize = AES_BLOCK_SIZE,
.setkey = rfc4106_setkey,
.setauthsize = rfc4106_setauthsize,
.encrypt = rfc4106_encrypt,
.decrypt = rfc4106_decrypt,
.base.cra_name = "rfc4106(gcm(aes))",
.base.cra_driver_name = "rfc4106-gcm-aes-ce",
.base.cra_priority = 300,
.base.cra_blocksize = 1,
.base.cra_ctxsize = sizeof(struct gcm_aes_ctx) +
4 * sizeof(u64[2]),
.base.cra_module = THIS_MODULE,
}};
static int __init ghash_ce_mod_init(void) static int __init ghash_ce_mod_init(void)
{ {
...@@ -495,7 +563,8 @@ static int __init ghash_ce_mod_init(void) ...@@ -495,7 +563,8 @@ static int __init ghash_ce_mod_init(void)
return -ENODEV; return -ENODEV;
if (cpu_have_named_feature(PMULL)) if (cpu_have_named_feature(PMULL))
return crypto_register_aead(&gcm_aes_alg); return crypto_register_aeads(gcm_aes_algs,
ARRAY_SIZE(gcm_aes_algs));
return crypto_register_shash(&ghash_alg); return crypto_register_shash(&ghash_alg);
} }
...@@ -503,7 +572,7 @@ static int __init ghash_ce_mod_init(void) ...@@ -503,7 +572,7 @@ static int __init ghash_ce_mod_init(void)
static void __exit ghash_ce_mod_exit(void) static void __exit ghash_ce_mod_exit(void)
{ {
if (cpu_have_named_feature(PMULL)) if (cpu_have_named_feature(PMULL))
crypto_unregister_aead(&gcm_aes_alg); crypto_unregister_aeads(gcm_aes_algs, ARRAY_SIZE(gcm_aes_algs));
else else
crypto_unregister_shash(&ghash_alg); crypto_unregister_shash(&ghash_alg);
} }
......
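The ghash-ce diff above registers an "rfc4106(gcm(aes))" AEAD next to plain GCM. A hedged usage sketch, not part of the diff: allocating that transform from kernel code and programming a key, assuming the standard AEAD API and the RFC 4106 convention (visible in rfc4106_setkey() above) that the last four bytes of the key material are the salt/nonce. The function name try_rfc4106 is invented here for illustration.

/*
 * Hedged sketch: exercising the newly registered rfc4106(gcm(aes)) AEAD.
 */
#include <crypto/aead.h>
#include <linux/err.h>

static int try_rfc4106(const u8 *key_and_salt, unsigned int keylen)
{
	struct crypto_aead *tfm;
	int err;

	tfm = crypto_alloc_aead("rfc4106(gcm(aes))", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	/* keylen is the AES key length plus 4 bytes of salt, e.g. 16 + 4 */
	err = crypto_aead_setkey(tfm, key_and_salt, keylen);
	if (!err)
		err = crypto_aead_setauthsize(tfm, 16);

	crypto_free_aead(tfm);
	return err;
}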
...@@ -166,7 +166,7 @@ static int ccm_crypt(struct aead_request *req, struct skcipher_walk *walk, ...@@ -166,7 +166,7 @@ static int ccm_crypt(struct aead_request *req, struct skcipher_walk *walk,
unsigned int nbytes, u8 *mac)) unsigned int nbytes, u8 *mac))
{ {
u8 __aligned(8) ctr0[SM4_BLOCK_SIZE]; u8 __aligned(8) ctr0[SM4_BLOCK_SIZE];
int err; int err = 0;
/* preserve the initial ctr0 for the TAG */ /* preserve the initial ctr0 for the TAG */
memcpy(ctr0, walk->iv, SM4_BLOCK_SIZE); memcpy(ctr0, walk->iv, SM4_BLOCK_SIZE);
...@@ -177,33 +177,37 @@ static int ccm_crypt(struct aead_request *req, struct skcipher_walk *walk, ...@@ -177,33 +177,37 @@ static int ccm_crypt(struct aead_request *req, struct skcipher_walk *walk,
if (req->assoclen) if (req->assoclen)
ccm_calculate_auth_mac(req, mac); ccm_calculate_auth_mac(req, mac);
do { while (walk->nbytes && walk->nbytes != walk->total) {
unsigned int tail = walk->nbytes % SM4_BLOCK_SIZE; unsigned int tail = walk->nbytes % SM4_BLOCK_SIZE;
const u8 *src = walk->src.virt.addr;
u8 *dst = walk->dst.virt.addr;
if (walk->nbytes == walk->total) sm4_ce_ccm_crypt(rkey_enc, walk->dst.virt.addr,
tail = 0; walk->src.virt.addr, walk->iv,
walk->nbytes - tail, mac);
kernel_neon_end();
err = skcipher_walk_done(walk, tail);
kernel_neon_begin();
}
if (walk->nbytes - tail) if (walk->nbytes) {
sm4_ce_ccm_crypt(rkey_enc, dst, src, walk->iv, sm4_ce_ccm_crypt(rkey_enc, walk->dst.virt.addr,
walk->nbytes - tail, mac); walk->src.virt.addr, walk->iv,
walk->nbytes, mac);
if (walk->nbytes == walk->total) sm4_ce_ccm_final(rkey_enc, ctr0, mac);
sm4_ce_ccm_final(rkey_enc, ctr0, mac);
kernel_neon_end(); kernel_neon_end();
if (walk->nbytes) { err = skcipher_walk_done(walk, 0);
err = skcipher_walk_done(walk, tail); } else {
if (err) sm4_ce_ccm_final(rkey_enc, ctr0, mac);
return err;
if (walk->nbytes)
kernel_neon_begin();
}
} while (walk->nbytes > 0);
return 0; kernel_neon_end();
}
return err;
} }
static int ccm_encrypt(struct aead_request *req) static int ccm_encrypt(struct aead_request *req)
......
...@@ -135,22 +135,23 @@ static void gcm_calculate_auth_mac(struct aead_request *req, u8 ghash[]) ...@@ -135,22 +135,23 @@ static void gcm_calculate_auth_mac(struct aead_request *req, u8 ghash[])
} }
static int gcm_crypt(struct aead_request *req, struct skcipher_walk *walk, static int gcm_crypt(struct aead_request *req, struct skcipher_walk *walk,
struct sm4_gcm_ctx *ctx, u8 ghash[], u8 ghash[], int err,
void (*sm4_ce_pmull_gcm_crypt)(const u32 *rkey_enc, void (*sm4_ce_pmull_gcm_crypt)(const u32 *rkey_enc,
u8 *dst, const u8 *src, u8 *iv, u8 *dst, const u8 *src, u8 *iv,
unsigned int nbytes, u8 *ghash, unsigned int nbytes, u8 *ghash,
const u8 *ghash_table, const u8 *lengths)) const u8 *ghash_table, const u8 *lengths))
{ {
struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct sm4_gcm_ctx *ctx = crypto_aead_ctx(aead);
u8 __aligned(8) iv[SM4_BLOCK_SIZE]; u8 __aligned(8) iv[SM4_BLOCK_SIZE];
be128 __aligned(8) lengths; be128 __aligned(8) lengths;
int err;
memset(ghash, 0, SM4_BLOCK_SIZE); memset(ghash, 0, SM4_BLOCK_SIZE);
lengths.a = cpu_to_be64(req->assoclen * 8); lengths.a = cpu_to_be64(req->assoclen * 8);
lengths.b = cpu_to_be64(walk->total * 8); lengths.b = cpu_to_be64(walk->total * 8);
memcpy(iv, walk->iv, GCM_IV_SIZE); memcpy(iv, req->iv, GCM_IV_SIZE);
put_unaligned_be32(2, iv + GCM_IV_SIZE); put_unaligned_be32(2, iv + GCM_IV_SIZE);
kernel_neon_begin(); kernel_neon_begin();
...@@ -158,49 +159,51 @@ static int gcm_crypt(struct aead_request *req, struct skcipher_walk *walk, ...@@ -158,49 +159,51 @@ static int gcm_crypt(struct aead_request *req, struct skcipher_walk *walk,
if (req->assoclen) if (req->assoclen)
gcm_calculate_auth_mac(req, ghash); gcm_calculate_auth_mac(req, ghash);
do { while (walk->nbytes) {
unsigned int tail = walk->nbytes % SM4_BLOCK_SIZE; unsigned int tail = walk->nbytes % SM4_BLOCK_SIZE;
const u8 *src = walk->src.virt.addr; const u8 *src = walk->src.virt.addr;
u8 *dst = walk->dst.virt.addr; u8 *dst = walk->dst.virt.addr;
if (walk->nbytes == walk->total) { if (walk->nbytes == walk->total) {
tail = 0;
sm4_ce_pmull_gcm_crypt(ctx->key.rkey_enc, dst, src, iv, sm4_ce_pmull_gcm_crypt(ctx->key.rkey_enc, dst, src, iv,
walk->nbytes, ghash, walk->nbytes, ghash,
ctx->ghash_table, ctx->ghash_table,
(const u8 *)&lengths); (const u8 *)&lengths);
} else if (walk->nbytes - tail) {
sm4_ce_pmull_gcm_crypt(ctx->key.rkey_enc, dst, src, iv, kernel_neon_end();
walk->nbytes - tail, ghash,
ctx->ghash_table, NULL); return skcipher_walk_done(walk, 0);
} }
sm4_ce_pmull_gcm_crypt(ctx->key.rkey_enc, dst, src, iv,
walk->nbytes - tail, ghash,
ctx->ghash_table, NULL);
kernel_neon_end(); kernel_neon_end();
err = skcipher_walk_done(walk, tail); err = skcipher_walk_done(walk, tail);
if (err)
return err;
if (walk->nbytes)
kernel_neon_begin();
} while (walk->nbytes > 0);
return 0; kernel_neon_begin();
}
sm4_ce_pmull_gcm_crypt(ctx->key.rkey_enc, NULL, NULL, iv,
walk->nbytes, ghash, ctx->ghash_table,
(const u8 *)&lengths);
kernel_neon_end();
return err;
} }
static int gcm_encrypt(struct aead_request *req) static int gcm_encrypt(struct aead_request *req)
{ {
struct crypto_aead *aead = crypto_aead_reqtfm(req); struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct sm4_gcm_ctx *ctx = crypto_aead_ctx(aead);
u8 __aligned(8) ghash[SM4_BLOCK_SIZE]; u8 __aligned(8) ghash[SM4_BLOCK_SIZE];
struct skcipher_walk walk; struct skcipher_walk walk;
int err; int err;
err = skcipher_walk_aead_encrypt(&walk, req, false); err = skcipher_walk_aead_encrypt(&walk, req, false);
if (err) err = gcm_crypt(req, &walk, ghash, err, sm4_ce_pmull_gcm_enc);
return err;
err = gcm_crypt(req, &walk, ctx, ghash, sm4_ce_pmull_gcm_enc);
if (err) if (err)
return err; return err;
...@@ -215,17 +218,13 @@ static int gcm_decrypt(struct aead_request *req) ...@@ -215,17 +218,13 @@ static int gcm_decrypt(struct aead_request *req)
{ {
struct crypto_aead *aead = crypto_aead_reqtfm(req); struct crypto_aead *aead = crypto_aead_reqtfm(req);
unsigned int authsize = crypto_aead_authsize(aead); unsigned int authsize = crypto_aead_authsize(aead);
struct sm4_gcm_ctx *ctx = crypto_aead_ctx(aead);
u8 __aligned(8) ghash[SM4_BLOCK_SIZE]; u8 __aligned(8) ghash[SM4_BLOCK_SIZE];
u8 authtag[SM4_BLOCK_SIZE]; u8 authtag[SM4_BLOCK_SIZE];
struct skcipher_walk walk; struct skcipher_walk walk;
int err; int err;
err = skcipher_walk_aead_decrypt(&walk, req, false); err = skcipher_walk_aead_decrypt(&walk, req, false);
if (err) err = gcm_crypt(req, &walk, ghash, err, sm4_ce_pmull_gcm_dec);
return err;
err = gcm_crypt(req, &walk, ctx, ghash, sm4_ce_pmull_gcm_dec);
if (err) if (err)
return err; return err;
......
@@ -398,10 +398,6 @@ static int xts_aes_set_key(struct crypto_skcipher *tfm, const u8 *in_key,
 	if (err)
 		return err;

-	/* In fips mode only 128 bit or 256 bit keys are valid */
-	if (fips_enabled && key_len != 32 && key_len != 64)
-		return -EINVAL;
-
 	/* Pick the correct function code based on the key length */
 	fc = (key_len == 32) ? CPACF_KM_XTS_128 :
 	     (key_len == 64) ? CPACF_KM_XTS_256 : 0;
......
@@ -474,7 +474,7 @@ static int xts_paes_set_key(struct crypto_skcipher *tfm, const u8 *in_key,
 		return rc;

 	/*
-	 * xts_check_key verifies the key length is not odd and makes
+	 * xts_verify_key verifies the key length is not odd and makes
 	 * sure that the two keys are not the same. This can be done
 	 * on the two protected keys as well
 	 */
......
@@ -19,3 +19,8 @@ config AS_TPAUSE
 	def_bool $(as-instr,tpause %ecx)
 	help
 	  Supported by binutils >= 2.31.1 and LLVM integrated assembler >= V7
+
+config AS_GFNI
+	def_bool $(as-instr,vgf2p8mulb %xmm0$(comma)%xmm1$(comma)%xmm2)
+	help
+	  Supported by binutils >= 2.30 and LLVM integrated assembler
@@ -304,6 +304,44 @@ config CRYPTO_ARIA_AESNI_AVX_X86_64

 	  Processes 16 blocks in parallel.

+config CRYPTO_ARIA_AESNI_AVX2_X86_64
+	tristate "Ciphers: ARIA with modes: ECB, CTR (AES-NI/AVX2/GFNI)"
+	depends on X86 && 64BIT
+	select CRYPTO_SKCIPHER
+	select CRYPTO_SIMD
+	select CRYPTO_ALGAPI
+	select CRYPTO_ARIA
+	select CRYPTO_ARIA_AESNI_AVX_X86_64
+	help
+	  Length-preserving cipher: ARIA cipher algorithms
+	  (RFC 5794) with ECB and CTR modes
+
+	  Architecture: x86_64 using:
+	  - AES-NI (AES New Instructions)
+	  - AVX2 (Advanced Vector Extensions)
+	  - GFNI (Galois Field New Instructions)
+
+	  Processes 32 blocks in parallel.
+
+config CRYPTO_ARIA_GFNI_AVX512_X86_64
+	tristate "Ciphers: ARIA with modes: ECB, CTR (AVX512/GFNI)"
+	depends on X86 && 64BIT && AS_AVX512 && AS_GFNI
+	select CRYPTO_SKCIPHER
+	select CRYPTO_SIMD
+	select CRYPTO_ALGAPI
+	select CRYPTO_ARIA
+	select CRYPTO_ARIA_AESNI_AVX_X86_64
+	select CRYPTO_ARIA_AESNI_AVX2_X86_64
+	help
+	  Length-preserving cipher: ARIA cipher algorithms
+	  (RFC 5794) with ECB and CTR modes
+
+	  Architecture: x86_64 using:
+	  - AVX512 (Advanced Vector Extensions)
+	  - GFNI (Galois Field New Instructions)
+
+	  Processes 64 blocks in parallel.
+
 config CRYPTO_CHACHA20_X86_64
 	tristate "Ciphers: ChaCha20, XChaCha20, XChaCha12 (SSSE3/AVX2/AVX-512VL)"
 	depends on X86 && 64BIT
......
@@ -103,6 +103,12 @@ sm4-aesni-avx2-x86_64-y := sm4-aesni-avx2-asm_64.o sm4_aesni_avx2_glue.o
 obj-$(CONFIG_CRYPTO_ARIA_AESNI_AVX_X86_64) += aria-aesni-avx-x86_64.o
 aria-aesni-avx-x86_64-y := aria-aesni-avx-asm_64.o aria_aesni_avx_glue.o

+obj-$(CONFIG_CRYPTO_ARIA_AESNI_AVX2_X86_64) += aria-aesni-avx2-x86_64.o
+aria-aesni-avx2-x86_64-y := aria-aesni-avx2-asm_64.o aria_aesni_avx2_glue.o
+
+obj-$(CONFIG_CRYPTO_ARIA_GFNI_AVX512_X86_64) += aria-gfni-avx512-x86_64.o
+aria-gfni-avx512-x86_64-y := aria-gfni-avx512-asm_64.o aria_gfni_avx512_glue.o
+
 quiet_cmd_perlasm = PERLASM $@
       cmd_perlasm = $(PERL) $< > $@
 $(obj)/%.S: $(src)/%.pl FORCE
......
@@ -5,12 +5,58 @@
 #include <linux/types.h>

 #define ARIA_AESNI_PARALLEL_BLOCKS 16
-#define ARIA_AESNI_PARALLEL_BLOCK_SIZE  (ARIA_BLOCK_SIZE * 16)
+#define ARIA_AESNI_PARALLEL_BLOCK_SIZE  (ARIA_BLOCK_SIZE * ARIA_AESNI_PARALLEL_BLOCKS)
+
+#define ARIA_AESNI_AVX2_PARALLEL_BLOCKS 32
+#define ARIA_AESNI_AVX2_PARALLEL_BLOCK_SIZE  (ARIA_BLOCK_SIZE * ARIA_AESNI_AVX2_PARALLEL_BLOCKS)
+
+#define ARIA_GFNI_AVX512_PARALLEL_BLOCKS 64
+#define ARIA_GFNI_AVX512_PARALLEL_BLOCK_SIZE  (ARIA_BLOCK_SIZE * ARIA_GFNI_AVX512_PARALLEL_BLOCKS)
+
+asmlinkage void aria_aesni_avx_encrypt_16way(const void *ctx, u8 *dst,
+					     const u8 *src);
+asmlinkage void aria_aesni_avx_decrypt_16way(const void *ctx, u8 *dst,
+					     const u8 *src);
+asmlinkage void aria_aesni_avx_ctr_crypt_16way(const void *ctx, u8 *dst,
+					       const u8 *src,
+					       u8 *keystream, u8 *iv);
+asmlinkage void aria_aesni_avx_gfni_encrypt_16way(const void *ctx, u8 *dst,
+						  const u8 *src);
+asmlinkage void aria_aesni_avx_gfni_decrypt_16way(const void *ctx, u8 *dst,
+						  const u8 *src);
+asmlinkage void aria_aesni_avx_gfni_ctr_crypt_16way(const void *ctx, u8 *dst,
+						    const u8 *src,
+						    u8 *keystream, u8 *iv);
+
+asmlinkage void aria_aesni_avx2_encrypt_32way(const void *ctx, u8 *dst,
+					      const u8 *src);
+asmlinkage void aria_aesni_avx2_decrypt_32way(const void *ctx, u8 *dst,
+					      const u8 *src);
+asmlinkage void aria_aesni_avx2_ctr_crypt_32way(const void *ctx, u8 *dst,
+						const u8 *src,
+						u8 *keystream, u8 *iv);
+asmlinkage void aria_aesni_avx2_gfni_encrypt_32way(const void *ctx, u8 *dst,
+						   const u8 *src);
+asmlinkage void aria_aesni_avx2_gfni_decrypt_32way(const void *ctx, u8 *dst,
+						   const u8 *src);
+asmlinkage void aria_aesni_avx2_gfni_ctr_crypt_32way(const void *ctx, u8 *dst,
+						     const u8 *src,
+						     u8 *keystream, u8 *iv);

 struct aria_avx_ops {
 	void (*aria_encrypt_16way)(const void *ctx, u8 *dst, const u8 *src);
 	void (*aria_decrypt_16way)(const void *ctx, u8 *dst, const u8 *src);
 	void (*aria_ctr_crypt_16way)(const void *ctx, u8 *dst, const u8 *src,
 				     u8 *keystream, u8 *iv);
+	void (*aria_encrypt_32way)(const void *ctx, u8 *dst, const u8 *src);
+	void (*aria_decrypt_32way)(const void *ctx, u8 *dst, const u8 *src);
+	void (*aria_ctr_crypt_32way)(const void *ctx, u8 *dst, const u8 *src,
+				     u8 *keystream, u8 *iv);
+	void (*aria_encrypt_64way)(const void *ctx, u8 *dst, const u8 *src);
+	void (*aria_decrypt_64way)(const void *ctx, u8 *dst, const u8 *src);
+	void (*aria_ctr_crypt_64way)(const void *ctx, u8 *dst, const u8 *src,
+				     u8 *keystream, u8 *iv);
 };

 #endif
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
* Glue Code for the AVX2/AES-NI/GFNI assembler implementation of the ARIA Cipher
*
* Copyright (c) 2022 Taehee Yoo <ap420073@gmail.com>
*/
#include <crypto/algapi.h>
#include <crypto/internal/simd.h>
#include <crypto/aria.h>
#include <linux/crypto.h>
#include <linux/err.h>
#include <linux/module.h>
#include <linux/types.h>
#include "ecb_cbc_helpers.h"
#include "aria-avx.h"
asmlinkage void aria_aesni_avx2_encrypt_32way(const void *ctx, u8 *dst,
const u8 *src);
EXPORT_SYMBOL_GPL(aria_aesni_avx2_encrypt_32way);
asmlinkage void aria_aesni_avx2_decrypt_32way(const void *ctx, u8 *dst,
const u8 *src);
EXPORT_SYMBOL_GPL(aria_aesni_avx2_decrypt_32way);
asmlinkage void aria_aesni_avx2_ctr_crypt_32way(const void *ctx, u8 *dst,
const u8 *src,
u8 *keystream, u8 *iv);
EXPORT_SYMBOL_GPL(aria_aesni_avx2_ctr_crypt_32way);
#ifdef CONFIG_AS_GFNI
asmlinkage void aria_aesni_avx2_gfni_encrypt_32way(const void *ctx, u8 *dst,
const u8 *src);
EXPORT_SYMBOL_GPL(aria_aesni_avx2_gfni_encrypt_32way);
asmlinkage void aria_aesni_avx2_gfni_decrypt_32way(const void *ctx, u8 *dst,
const u8 *src);
EXPORT_SYMBOL_GPL(aria_aesni_avx2_gfni_decrypt_32way);
asmlinkage void aria_aesni_avx2_gfni_ctr_crypt_32way(const void *ctx, u8 *dst,
const u8 *src,
u8 *keystream, u8 *iv);
EXPORT_SYMBOL_GPL(aria_aesni_avx2_gfni_ctr_crypt_32way);
#endif /* CONFIG_AS_GFNI */
static struct aria_avx_ops aria_ops;
struct aria_avx2_request_ctx {
u8 keystream[ARIA_AESNI_AVX2_PARALLEL_BLOCK_SIZE];
};
static int ecb_do_encrypt(struct skcipher_request *req, const u32 *rkey)
{
ECB_WALK_START(req, ARIA_BLOCK_SIZE, ARIA_AESNI_PARALLEL_BLOCKS);
ECB_BLOCK(ARIA_AESNI_AVX2_PARALLEL_BLOCKS, aria_ops.aria_encrypt_32way);
ECB_BLOCK(ARIA_AESNI_PARALLEL_BLOCKS, aria_ops.aria_encrypt_16way);
ECB_BLOCK(1, aria_encrypt);
ECB_WALK_END();
}
static int ecb_do_decrypt(struct skcipher_request *req, const u32 *rkey)
{
ECB_WALK_START(req, ARIA_BLOCK_SIZE, ARIA_AESNI_PARALLEL_BLOCKS);
ECB_BLOCK(ARIA_AESNI_AVX2_PARALLEL_BLOCKS, aria_ops.aria_decrypt_32way);
ECB_BLOCK(ARIA_AESNI_PARALLEL_BLOCKS, aria_ops.aria_decrypt_16way);
ECB_BLOCK(1, aria_decrypt);
ECB_WALK_END();
}
static int aria_avx2_ecb_encrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct aria_ctx *ctx = crypto_skcipher_ctx(tfm);
return ecb_do_encrypt(req, ctx->enc_key[0]);
}
static int aria_avx2_ecb_decrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct aria_ctx *ctx = crypto_skcipher_ctx(tfm);
return ecb_do_decrypt(req, ctx->dec_key[0]);
}
static int aria_avx2_set_key(struct crypto_skcipher *tfm, const u8 *key,
unsigned int keylen)
{
return aria_set_key(&tfm->base, key, keylen);
}
static int aria_avx2_ctr_encrypt(struct skcipher_request *req)
{
struct aria_avx2_request_ctx *req_ctx = skcipher_request_ctx(req);
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct aria_ctx *ctx = crypto_skcipher_ctx(tfm);
struct skcipher_walk walk;
unsigned int nbytes;
int err;
err = skcipher_walk_virt(&walk, req, false);
while ((nbytes = walk.nbytes) > 0) {
const u8 *src = walk.src.virt.addr;
u8 *dst = walk.dst.virt.addr;
while (nbytes >= ARIA_AESNI_AVX2_PARALLEL_BLOCK_SIZE) {
kernel_fpu_begin();
aria_ops.aria_ctr_crypt_32way(ctx, dst, src,
&req_ctx->keystream[0],
walk.iv);
kernel_fpu_end();
dst += ARIA_AESNI_AVX2_PARALLEL_BLOCK_SIZE;
src += ARIA_AESNI_AVX2_PARALLEL_BLOCK_SIZE;
nbytes -= ARIA_AESNI_AVX2_PARALLEL_BLOCK_SIZE;
}
while (nbytes >= ARIA_AESNI_PARALLEL_BLOCK_SIZE) {
kernel_fpu_begin();
aria_ops.aria_ctr_crypt_16way(ctx, dst, src,
&req_ctx->keystream[0],
walk.iv);
kernel_fpu_end();
dst += ARIA_AESNI_PARALLEL_BLOCK_SIZE;
src += ARIA_AESNI_PARALLEL_BLOCK_SIZE;
nbytes -= ARIA_AESNI_PARALLEL_BLOCK_SIZE;
}
while (nbytes >= ARIA_BLOCK_SIZE) {
memcpy(&req_ctx->keystream[0], walk.iv, ARIA_BLOCK_SIZE);
crypto_inc(walk.iv, ARIA_BLOCK_SIZE);
aria_encrypt(ctx, &req_ctx->keystream[0],
&req_ctx->keystream[0]);
crypto_xor_cpy(dst, src, &req_ctx->keystream[0],
ARIA_BLOCK_SIZE);
dst += ARIA_BLOCK_SIZE;
src += ARIA_BLOCK_SIZE;
nbytes -= ARIA_BLOCK_SIZE;
}
if (walk.nbytes == walk.total && nbytes > 0) {
memcpy(&req_ctx->keystream[0], walk.iv,
ARIA_BLOCK_SIZE);
crypto_inc(walk.iv, ARIA_BLOCK_SIZE);
aria_encrypt(ctx, &req_ctx->keystream[0],
&req_ctx->keystream[0]);
crypto_xor_cpy(dst, src, &req_ctx->keystream[0],
nbytes);
dst += nbytes;
src += nbytes;
nbytes = 0;
}
err = skcipher_walk_done(&walk, nbytes);
}
return err;
}
static int aria_avx2_init_tfm(struct crypto_skcipher *tfm)
{
crypto_skcipher_set_reqsize(tfm, sizeof(struct aria_avx2_request_ctx));
return 0;
}
static struct skcipher_alg aria_algs[] = {
{
.base.cra_name = "__ecb(aria)",
.base.cra_driver_name = "__ecb-aria-avx2",
.base.cra_priority = 500,
.base.cra_flags = CRYPTO_ALG_INTERNAL,
.base.cra_blocksize = ARIA_BLOCK_SIZE,
.base.cra_ctxsize = sizeof(struct aria_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = ARIA_MIN_KEY_SIZE,
.max_keysize = ARIA_MAX_KEY_SIZE,
.setkey = aria_avx2_set_key,
.encrypt = aria_avx2_ecb_encrypt,
.decrypt = aria_avx2_ecb_decrypt,
}, {
.base.cra_name = "__ctr(aria)",
.base.cra_driver_name = "__ctr-aria-avx2",
.base.cra_priority = 500,
.base.cra_flags = CRYPTO_ALG_INTERNAL |
CRYPTO_ALG_SKCIPHER_REQSIZE_LARGE,
.base.cra_blocksize = 1,
.base.cra_ctxsize = sizeof(struct aria_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = ARIA_MIN_KEY_SIZE,
.max_keysize = ARIA_MAX_KEY_SIZE,
.ivsize = ARIA_BLOCK_SIZE,
.chunksize = ARIA_BLOCK_SIZE,
.setkey = aria_avx2_set_key,
.encrypt = aria_avx2_ctr_encrypt,
.decrypt = aria_avx2_ctr_encrypt,
.init = aria_avx2_init_tfm,
}
};
static struct simd_skcipher_alg *aria_simd_algs[ARRAY_SIZE(aria_algs)];
static int __init aria_avx2_init(void)
{
const char *feature_name;
if (!boot_cpu_has(X86_FEATURE_AVX) ||
!boot_cpu_has(X86_FEATURE_AVX2) ||
!boot_cpu_has(X86_FEATURE_AES) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
pr_info("AVX2 or AES-NI instructions are not detected.\n");
return -ENODEV;
}
if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
&feature_name)) {
pr_info("CPU feature '%s' is not supported.\n", feature_name);
return -ENODEV;
}
if (boot_cpu_has(X86_FEATURE_GFNI) && IS_ENABLED(CONFIG_AS_GFNI)) {
aria_ops.aria_encrypt_16way = aria_aesni_avx_gfni_encrypt_16way;
aria_ops.aria_decrypt_16way = aria_aesni_avx_gfni_decrypt_16way;
aria_ops.aria_ctr_crypt_16way = aria_aesni_avx_gfni_ctr_crypt_16way;
aria_ops.aria_encrypt_32way = aria_aesni_avx2_gfni_encrypt_32way;
aria_ops.aria_decrypt_32way = aria_aesni_avx2_gfni_decrypt_32way;
aria_ops.aria_ctr_crypt_32way = aria_aesni_avx2_gfni_ctr_crypt_32way;
} else {
aria_ops.aria_encrypt_16way = aria_aesni_avx_encrypt_16way;
aria_ops.aria_decrypt_16way = aria_aesni_avx_decrypt_16way;
aria_ops.aria_ctr_crypt_16way = aria_aesni_avx_ctr_crypt_16way;
aria_ops.aria_encrypt_32way = aria_aesni_avx2_encrypt_32way;
aria_ops.aria_decrypt_32way = aria_aesni_avx2_decrypt_32way;
aria_ops.aria_ctr_crypt_32way = aria_aesni_avx2_ctr_crypt_32way;
}
return simd_register_skciphers_compat(aria_algs,
ARRAY_SIZE(aria_algs),
aria_simd_algs);
}
static void __exit aria_avx2_exit(void)
{
simd_unregister_skciphers(aria_algs, ARRAY_SIZE(aria_algs),
aria_simd_algs);
}
module_init(aria_avx2_init);
module_exit(aria_avx2_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Taehee Yoo <ap420073@gmail.com>");
MODULE_DESCRIPTION("ARIA Cipher Algorithm, AVX2/AES-NI/GFNI optimized");
MODULE_ALIAS_CRYPTO("aria");
MODULE_ALIAS_CRYPTO("aria-aesni-avx2");
...@@ -18,21 +18,33 @@ ...@@ -18,21 +18,33 @@
asmlinkage void aria_aesni_avx_encrypt_16way(const void *ctx, u8 *dst, asmlinkage void aria_aesni_avx_encrypt_16way(const void *ctx, u8 *dst,
const u8 *src); const u8 *src);
EXPORT_SYMBOL_GPL(aria_aesni_avx_encrypt_16way);
asmlinkage void aria_aesni_avx_decrypt_16way(const void *ctx, u8 *dst, asmlinkage void aria_aesni_avx_decrypt_16way(const void *ctx, u8 *dst,
const u8 *src); const u8 *src);
EXPORT_SYMBOL_GPL(aria_aesni_avx_decrypt_16way);
asmlinkage void aria_aesni_avx_ctr_crypt_16way(const void *ctx, u8 *dst, asmlinkage void aria_aesni_avx_ctr_crypt_16way(const void *ctx, u8 *dst,
const u8 *src, const u8 *src,
u8 *keystream, u8 *iv); u8 *keystream, u8 *iv);
EXPORT_SYMBOL_GPL(aria_aesni_avx_ctr_crypt_16way);
#ifdef CONFIG_AS_GFNI
asmlinkage void aria_aesni_avx_gfni_encrypt_16way(const void *ctx, u8 *dst, asmlinkage void aria_aesni_avx_gfni_encrypt_16way(const void *ctx, u8 *dst,
const u8 *src); const u8 *src);
EXPORT_SYMBOL_GPL(aria_aesni_avx_gfni_encrypt_16way);
asmlinkage void aria_aesni_avx_gfni_decrypt_16way(const void *ctx, u8 *dst, asmlinkage void aria_aesni_avx_gfni_decrypt_16way(const void *ctx, u8 *dst,
const u8 *src); const u8 *src);
EXPORT_SYMBOL_GPL(aria_aesni_avx_gfni_decrypt_16way);
asmlinkage void aria_aesni_avx_gfni_ctr_crypt_16way(const void *ctx, u8 *dst, asmlinkage void aria_aesni_avx_gfni_ctr_crypt_16way(const void *ctx, u8 *dst,
const u8 *src, const u8 *src,
u8 *keystream, u8 *iv); u8 *keystream, u8 *iv);
EXPORT_SYMBOL_GPL(aria_aesni_avx_gfni_ctr_crypt_16way);
#endif /* CONFIG_AS_GFNI */
static struct aria_avx_ops aria_ops; static struct aria_avx_ops aria_ops;
struct aria_avx_request_ctx {
u8 keystream[ARIA_AESNI_PARALLEL_BLOCK_SIZE];
};
static int ecb_do_encrypt(struct skcipher_request *req, const u32 *rkey) static int ecb_do_encrypt(struct skcipher_request *req, const u32 *rkey)
{ {
ECB_WALK_START(req, ARIA_BLOCK_SIZE, ARIA_AESNI_PARALLEL_BLOCKS); ECB_WALK_START(req, ARIA_BLOCK_SIZE, ARIA_AESNI_PARALLEL_BLOCKS);
...@@ -73,6 +85,7 @@ static int aria_avx_set_key(struct crypto_skcipher *tfm, const u8 *key, ...@@ -73,6 +85,7 @@ static int aria_avx_set_key(struct crypto_skcipher *tfm, const u8 *key,
static int aria_avx_ctr_encrypt(struct skcipher_request *req) static int aria_avx_ctr_encrypt(struct skcipher_request *req)
{ {
struct aria_avx_request_ctx *req_ctx = skcipher_request_ctx(req);
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct aria_ctx *ctx = crypto_skcipher_ctx(tfm); struct aria_ctx *ctx = crypto_skcipher_ctx(tfm);
struct skcipher_walk walk; struct skcipher_walk walk;
...@@ -86,10 +99,9 @@ static int aria_avx_ctr_encrypt(struct skcipher_request *req) ...@@ -86,10 +99,9 @@ static int aria_avx_ctr_encrypt(struct skcipher_request *req)
u8 *dst = walk.dst.virt.addr; u8 *dst = walk.dst.virt.addr;
while (nbytes >= ARIA_AESNI_PARALLEL_BLOCK_SIZE) { while (nbytes >= ARIA_AESNI_PARALLEL_BLOCK_SIZE) {
u8 keystream[ARIA_AESNI_PARALLEL_BLOCK_SIZE];
kernel_fpu_begin(); kernel_fpu_begin();
aria_ops.aria_ctr_crypt_16way(ctx, dst, src, keystream, aria_ops.aria_ctr_crypt_16way(ctx, dst, src,
&req_ctx->keystream[0],
walk.iv); walk.iv);
kernel_fpu_end(); kernel_fpu_end();
dst += ARIA_AESNI_PARALLEL_BLOCK_SIZE; dst += ARIA_AESNI_PARALLEL_BLOCK_SIZE;
...@@ -98,28 +110,29 @@ static int aria_avx_ctr_encrypt(struct skcipher_request *req) ...@@ -98,28 +110,29 @@ static int aria_avx_ctr_encrypt(struct skcipher_request *req)
} }
while (nbytes >= ARIA_BLOCK_SIZE) { while (nbytes >= ARIA_BLOCK_SIZE) {
u8 keystream[ARIA_BLOCK_SIZE]; memcpy(&req_ctx->keystream[0], walk.iv, ARIA_BLOCK_SIZE);
memcpy(keystream, walk.iv, ARIA_BLOCK_SIZE);
crypto_inc(walk.iv, ARIA_BLOCK_SIZE); crypto_inc(walk.iv, ARIA_BLOCK_SIZE);
aria_encrypt(ctx, keystream, keystream); aria_encrypt(ctx, &req_ctx->keystream[0],
&req_ctx->keystream[0]);
crypto_xor_cpy(dst, src, keystream, ARIA_BLOCK_SIZE); crypto_xor_cpy(dst, src, &req_ctx->keystream[0],
ARIA_BLOCK_SIZE);
dst += ARIA_BLOCK_SIZE; dst += ARIA_BLOCK_SIZE;
src += ARIA_BLOCK_SIZE; src += ARIA_BLOCK_SIZE;
nbytes -= ARIA_BLOCK_SIZE; nbytes -= ARIA_BLOCK_SIZE;
} }
if (walk.nbytes == walk.total && nbytes > 0) { if (walk.nbytes == walk.total && nbytes > 0) {
u8 keystream[ARIA_BLOCK_SIZE]; memcpy(&req_ctx->keystream[0], walk.iv,
ARIA_BLOCK_SIZE);
memcpy(keystream, walk.iv, ARIA_BLOCK_SIZE);
crypto_inc(walk.iv, ARIA_BLOCK_SIZE); crypto_inc(walk.iv, ARIA_BLOCK_SIZE);
aria_encrypt(ctx, keystream, keystream); aria_encrypt(ctx, &req_ctx->keystream[0],
&req_ctx->keystream[0]);
crypto_xor_cpy(dst, src, keystream, nbytes); crypto_xor_cpy(dst, src, &req_ctx->keystream[0],
nbytes);
dst += nbytes; dst += nbytes;
src += nbytes; src += nbytes;
nbytes = 0; nbytes = 0;
...@@ -130,6 +143,13 @@ static int aria_avx_ctr_encrypt(struct skcipher_request *req) ...@@ -130,6 +143,13 @@ static int aria_avx_ctr_encrypt(struct skcipher_request *req)
return err; return err;
} }
static int aria_avx_init_tfm(struct crypto_skcipher *tfm)
{
crypto_skcipher_set_reqsize(tfm, sizeof(struct aria_avx_request_ctx));
return 0;
}
static struct skcipher_alg aria_algs[] = { static struct skcipher_alg aria_algs[] = {
{ {
.base.cra_name = "__ecb(aria)", .base.cra_name = "__ecb(aria)",
...@@ -160,6 +180,7 @@ static struct skcipher_alg aria_algs[] = { ...@@ -160,6 +180,7 @@ static struct skcipher_alg aria_algs[] = {
.setkey = aria_avx_set_key, .setkey = aria_avx_set_key,
.encrypt = aria_avx_ctr_encrypt, .encrypt = aria_avx_ctr_encrypt,
.decrypt = aria_avx_ctr_encrypt, .decrypt = aria_avx_ctr_encrypt,
.init = aria_avx_init_tfm,
} }
}; };
...@@ -182,7 +203,7 @@ static int __init aria_avx_init(void) ...@@ -182,7 +203,7 @@ static int __init aria_avx_init(void)
return -ENODEV; return -ENODEV;
} }
if (boot_cpu_has(X86_FEATURE_GFNI)) { if (boot_cpu_has(X86_FEATURE_GFNI) && IS_ENABLED(CONFIG_AS_GFNI)) {
aria_ops.aria_encrypt_16way = aria_aesni_avx_gfni_encrypt_16way; aria_ops.aria_encrypt_16way = aria_aesni_avx_gfni_encrypt_16way;
aria_ops.aria_decrypt_16way = aria_aesni_avx_gfni_decrypt_16way; aria_ops.aria_decrypt_16way = aria_aesni_avx_gfni_decrypt_16way;
aria_ops.aria_ctr_crypt_16way = aria_aesni_avx_gfni_ctr_crypt_16way; aria_ops.aria_ctr_crypt_16way = aria_aesni_avx_gfni_ctr_crypt_16way;
......
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
* Glue Code for the AVX512/GFNI assembler implementation of the ARIA Cipher
*
* Copyright (c) 2022 Taehee Yoo <ap420073@gmail.com>
*/
#include <crypto/algapi.h>
#include <crypto/internal/simd.h>
#include <crypto/aria.h>
#include <linux/crypto.h>
#include <linux/err.h>
#include <linux/module.h>
#include <linux/types.h>
#include "ecb_cbc_helpers.h"
#include "aria-avx.h"
asmlinkage void aria_gfni_avx512_encrypt_64way(const void *ctx, u8 *dst,
const u8 *src);
asmlinkage void aria_gfni_avx512_decrypt_64way(const void *ctx, u8 *dst,
const u8 *src);
asmlinkage void aria_gfni_avx512_ctr_crypt_64way(const void *ctx, u8 *dst,
const u8 *src,
u8 *keystream, u8 *iv);
static struct aria_avx_ops aria_ops;
struct aria_avx512_request_ctx {
u8 keystream[ARIA_GFNI_AVX512_PARALLEL_BLOCK_SIZE];
};
static int ecb_do_encrypt(struct skcipher_request *req, const u32 *rkey)
{
ECB_WALK_START(req, ARIA_BLOCK_SIZE, ARIA_AESNI_PARALLEL_BLOCKS);
ECB_BLOCK(ARIA_GFNI_AVX512_PARALLEL_BLOCKS, aria_ops.aria_encrypt_64way);
ECB_BLOCK(ARIA_AESNI_AVX2_PARALLEL_BLOCKS, aria_ops.aria_encrypt_32way);
ECB_BLOCK(ARIA_AESNI_PARALLEL_BLOCKS, aria_ops.aria_encrypt_16way);
ECB_BLOCK(1, aria_encrypt);
ECB_WALK_END();
}
static int ecb_do_decrypt(struct skcipher_request *req, const u32 *rkey)
{
ECB_WALK_START(req, ARIA_BLOCK_SIZE, ARIA_AESNI_PARALLEL_BLOCKS);
ECB_BLOCK(ARIA_GFNI_AVX512_PARALLEL_BLOCKS, aria_ops.aria_decrypt_64way);
ECB_BLOCK(ARIA_AESNI_AVX2_PARALLEL_BLOCKS, aria_ops.aria_decrypt_32way);
ECB_BLOCK(ARIA_AESNI_PARALLEL_BLOCKS, aria_ops.aria_decrypt_16way);
ECB_BLOCK(1, aria_decrypt);
ECB_WALK_END();
}
static int aria_avx512_ecb_encrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct aria_ctx *ctx = crypto_skcipher_ctx(tfm);
return ecb_do_encrypt(req, ctx->enc_key[0]);
}
static int aria_avx512_ecb_decrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct aria_ctx *ctx = crypto_skcipher_ctx(tfm);
return ecb_do_decrypt(req, ctx->dec_key[0]);
}
static int aria_avx512_set_key(struct crypto_skcipher *tfm, const u8 *key,
unsigned int keylen)
{
return aria_set_key(&tfm->base, key, keylen);
}
static int aria_avx512_ctr_encrypt(struct skcipher_request *req)
{
struct aria_avx512_request_ctx *req_ctx = skcipher_request_ctx(req);
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct aria_ctx *ctx = crypto_skcipher_ctx(tfm);
struct skcipher_walk walk;
unsigned int nbytes;
int err;
err = skcipher_walk_virt(&walk, req, false);
while ((nbytes = walk.nbytes) > 0) {
const u8 *src = walk.src.virt.addr;
u8 *dst = walk.dst.virt.addr;
while (nbytes >= ARIA_GFNI_AVX512_PARALLEL_BLOCK_SIZE) {
kernel_fpu_begin();
aria_ops.aria_ctr_crypt_64way(ctx, dst, src,
&req_ctx->keystream[0],
walk.iv);
kernel_fpu_end();
dst += ARIA_GFNI_AVX512_PARALLEL_BLOCK_SIZE;
src += ARIA_GFNI_AVX512_PARALLEL_BLOCK_SIZE;
nbytes -= ARIA_GFNI_AVX512_PARALLEL_BLOCK_SIZE;
}
while (nbytes >= ARIA_AESNI_AVX2_PARALLEL_BLOCK_SIZE) {
kernel_fpu_begin();
aria_ops.aria_ctr_crypt_32way(ctx, dst, src,
&req_ctx->keystream[0],
walk.iv);
kernel_fpu_end();
dst += ARIA_AESNI_AVX2_PARALLEL_BLOCK_SIZE;
src += ARIA_AESNI_AVX2_PARALLEL_BLOCK_SIZE;
nbytes -= ARIA_AESNI_AVX2_PARALLEL_BLOCK_SIZE;
}
while (nbytes >= ARIA_AESNI_PARALLEL_BLOCK_SIZE) {
kernel_fpu_begin();
aria_ops.aria_ctr_crypt_16way(ctx, dst, src,
&req_ctx->keystream[0],
walk.iv);
kernel_fpu_end();
dst += ARIA_AESNI_PARALLEL_BLOCK_SIZE;
src += ARIA_AESNI_PARALLEL_BLOCK_SIZE;
nbytes -= ARIA_AESNI_PARALLEL_BLOCK_SIZE;
}
while (nbytes >= ARIA_BLOCK_SIZE) {
memcpy(&req_ctx->keystream[0], walk.iv,
ARIA_BLOCK_SIZE);
crypto_inc(walk.iv, ARIA_BLOCK_SIZE);
aria_encrypt(ctx, &req_ctx->keystream[0],
&req_ctx->keystream[0]);
crypto_xor_cpy(dst, src, &req_ctx->keystream[0],
ARIA_BLOCK_SIZE);
dst += ARIA_BLOCK_SIZE;
src += ARIA_BLOCK_SIZE;
nbytes -= ARIA_BLOCK_SIZE;
}
if (walk.nbytes == walk.total && nbytes > 0) {
memcpy(&req_ctx->keystream[0], walk.iv,
ARIA_BLOCK_SIZE);
crypto_inc(walk.iv, ARIA_BLOCK_SIZE);
aria_encrypt(ctx, &req_ctx->keystream[0],
&req_ctx->keystream[0]);
crypto_xor_cpy(dst, src, &req_ctx->keystream[0],
nbytes);
dst += nbytes;
src += nbytes;
nbytes = 0;
}
err = skcipher_walk_done(&walk, nbytes);
}
return err;
}
static int aria_avx512_init_tfm(struct crypto_skcipher *tfm)
{
crypto_skcipher_set_reqsize(tfm,
sizeof(struct aria_avx512_request_ctx));
return 0;
}
static struct skcipher_alg aria_algs[] = {
{
.base.cra_name = "__ecb(aria)",
.base.cra_driver_name = "__ecb-aria-avx512",
.base.cra_priority = 600,
.base.cra_flags = CRYPTO_ALG_INTERNAL,
.base.cra_blocksize = ARIA_BLOCK_SIZE,
.base.cra_ctxsize = sizeof(struct aria_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = ARIA_MIN_KEY_SIZE,
.max_keysize = ARIA_MAX_KEY_SIZE,
.setkey = aria_avx512_set_key,
.encrypt = aria_avx512_ecb_encrypt,
.decrypt = aria_avx512_ecb_decrypt,
}, {
.base.cra_name = "__ctr(aria)",
.base.cra_driver_name = "__ctr-aria-avx512",
.base.cra_priority = 600,
.base.cra_flags = CRYPTO_ALG_INTERNAL |
CRYPTO_ALG_SKCIPHER_REQSIZE_LARGE,
.base.cra_blocksize = 1,
.base.cra_ctxsize = sizeof(struct aria_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = ARIA_MIN_KEY_SIZE,
.max_keysize = ARIA_MAX_KEY_SIZE,
.ivsize = ARIA_BLOCK_SIZE,
.chunksize = ARIA_BLOCK_SIZE,
.setkey = aria_avx512_set_key,
.encrypt = aria_avx512_ctr_encrypt,
.decrypt = aria_avx512_ctr_encrypt,
.init = aria_avx512_init_tfm,
}
};
static struct simd_skcipher_alg *aria_simd_algs[ARRAY_SIZE(aria_algs)];
static int __init aria_avx512_init(void)
{
const char *feature_name;
if (!boot_cpu_has(X86_FEATURE_AVX) ||
!boot_cpu_has(X86_FEATURE_AVX2) ||
!boot_cpu_has(X86_FEATURE_AVX512F) ||
!boot_cpu_has(X86_FEATURE_AVX512VL) ||
!boot_cpu_has(X86_FEATURE_GFNI) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE)) {
pr_info("AVX512/GFNI instructions are not detected.\n");
return -ENODEV;
}
if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM |
XFEATURE_MASK_AVX512, &feature_name)) {
pr_info("CPU feature '%s' is not supported.\n", feature_name);
return -ENODEV;
}
aria_ops.aria_encrypt_16way = aria_aesni_avx_gfni_encrypt_16way;
aria_ops.aria_decrypt_16way = aria_aesni_avx_gfni_decrypt_16way;
aria_ops.aria_ctr_crypt_16way = aria_aesni_avx_gfni_ctr_crypt_16way;
aria_ops.aria_encrypt_32way = aria_aesni_avx2_gfni_encrypt_32way;
aria_ops.aria_decrypt_32way = aria_aesni_avx2_gfni_decrypt_32way;
aria_ops.aria_ctr_crypt_32way = aria_aesni_avx2_gfni_ctr_crypt_32way;
aria_ops.aria_encrypt_64way = aria_gfni_avx512_encrypt_64way;
aria_ops.aria_decrypt_64way = aria_gfni_avx512_decrypt_64way;
aria_ops.aria_ctr_crypt_64way = aria_gfni_avx512_ctr_crypt_64way;
return simd_register_skciphers_compat(aria_algs,
ARRAY_SIZE(aria_algs),
aria_simd_algs);
}
static void __exit aria_avx512_exit(void)
{
simd_unregister_skciphers(aria_algs, ARRAY_SIZE(aria_algs),
aria_simd_algs);
}
module_init(aria_avx512_init);
module_exit(aria_avx512_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Taehee Yoo <ap420073@gmail.com>");
MODULE_DESCRIPTION("ARIA Cipher Algorithm, AVX512/GFNI optimized");
MODULE_ALIAS_CRYPTO("aria");
MODULE_ALIAS_CRYPTO("aria-gfni-avx512");
@@ -6,7 +6,6 @@
 */
 #include <linux/linkage.h>
-#include <linux/cfi_types.h>
 .file "blowfish-x86_64-asm.S"
 .text
@@ -100,16 +99,11 @@
 bswapq RX0; \
 movq RX0, (RIO);
-#define xor_block() \
-bswapq RX0; \
-xorq RX0, (RIO);
-SYM_FUNC_START(__blowfish_enc_blk)
+SYM_FUNC_START(blowfish_enc_blk)
 /* input:
 * %rdi: ctx
 * %rsi: dst
 * %rdx: src
-* %rcx: bool, if true: xor output
 */
 movq %r12, %r11;
@@ -130,19 +124,13 @@ SYM_FUNC_START(__blowfish_enc_blk)
 add_roundkey_enc(16);
 movq %r11, %r12;
 movq %r10, RIO;
-test %cl, %cl;
-jnz .L__enc_xor;
 write_block();
 RET;
-.L__enc_xor:
-xor_block();
-RET;
-SYM_FUNC_END(__blowfish_enc_blk)
-SYM_TYPED_FUNC_START(blowfish_dec_blk)
+SYM_FUNC_END(blowfish_enc_blk)
+SYM_FUNC_START(blowfish_dec_blk)
 /* input:
 * %rdi: ctx
 * %rsi: dst
@@ -272,28 +260,26 @@ SYM_FUNC_END(blowfish_dec_blk)
 movq RX3, 24(RIO);
 #define xor_block4() \
-bswapq RX0; \
-xorq RX0, (RIO); \
+movq (RIO), RT0; \
+bswapq RT0; \
+xorq RT0, RX1; \
 \
-bswapq RX1; \
-xorq RX1, 8(RIO); \
+movq 8(RIO), RT2; \
+bswapq RT2; \
+xorq RT2, RX2; \
 \
-bswapq RX2; \
-xorq RX2, 16(RIO); \
-\
-bswapq RX3; \
-xorq RX3, 24(RIO);
+movq 16(RIO), RT3; \
+bswapq RT3; \
+xorq RT3, RX3;
-SYM_FUNC_START(__blowfish_enc_blk_4way)
+SYM_FUNC_START(blowfish_enc_blk_4way)
 /* input:
 * %rdi: ctx
 * %rsi: dst
 * %rdx: src
-* %rcx: bool, if true: xor output
 */
 pushq %r12;
 pushq %rbx;
-pushq %rcx;
 movq %rdi, CTX
 movq %rsi, %r11;
@@ -313,37 +299,28 @@ SYM_FUNC_START(__blowfish_enc_blk_4way)
 round_enc4(14);
 add_preloaded_roundkey4();
-popq %r12;
 movq %r11, RIO;
-test %r12b, %r12b;
-jnz .L__enc_xor4;
 write_block4();
 popq %rbx;
 popq %r12;
 RET;
-.L__enc_xor4:
-xor_block4();
-popq %rbx;
-popq %r12;
-RET;
-SYM_FUNC_END(__blowfish_enc_blk_4way)
-SYM_TYPED_FUNC_START(blowfish_dec_blk_4way)
+SYM_FUNC_END(blowfish_enc_blk_4way)
+SYM_FUNC_START(__blowfish_dec_blk_4way)
 /* input:
 * %rdi: ctx
 * %rsi: dst
 * %rdx: src
+* %rcx: cbc (bool)
 */
 pushq %r12;
 pushq %rbx;
+pushq %rcx;
+pushq %rdx;
 movq %rdi, CTX;
-movq %rsi, %r11
+movq %rsi, %r11;
 movq %rdx, RIO;
 preload_roundkey_dec(17);
@@ -359,6 +336,14 @@ SYM_TYPED_FUNC_START(blowfish_dec_blk_4way)
 round_dec4(3);
 add_preloaded_roundkey4();
+popq RIO;
+popq %r12;
+testq %r12, %r12;
+jz .L_no_cbc_xor;
+xor_block4();
+.L_no_cbc_xor:
 movq %r11, RIO;
 write_block4();
@@ -366,4 +351,4 @@ SYM_TYPED_FUNC_START(blowfish_dec_blk_4way)
 popq %r12;
 RET;
-SYM_FUNC_END(blowfish_dec_blk_4way)
+SYM_FUNC_END(__blowfish_dec_blk_4way)
@@ -16,26 +16,28 @@
 #include <linux/module.h>
 #include <linux/types.h>
+#include "ecb_cbc_helpers.h"
 /* regular block cipher functions */
-asmlinkage void __blowfish_enc_blk(struct bf_ctx *ctx, u8 *dst, const u8 *src,
-bool xor);
+asmlinkage void blowfish_enc_blk(struct bf_ctx *ctx, u8 *dst, const u8 *src);
 asmlinkage void blowfish_dec_blk(struct bf_ctx *ctx, u8 *dst, const u8 *src);
 /* 4-way parallel cipher functions */
-asmlinkage void __blowfish_enc_blk_4way(struct bf_ctx *ctx, u8 *dst,
-const u8 *src, bool xor);
-asmlinkage void blowfish_dec_blk_4way(struct bf_ctx *ctx, u8 *dst,
-const u8 *src);
+asmlinkage void blowfish_enc_blk_4way(struct bf_ctx *ctx, u8 *dst,
+const u8 *src);
+asmlinkage void __blowfish_dec_blk_4way(struct bf_ctx *ctx, u8 *dst,
+const u8 *src, bool cbc);
-static inline void blowfish_enc_blk(struct bf_ctx *ctx, u8 *dst, const u8 *src)
+static inline void blowfish_dec_ecb_4way(struct bf_ctx *ctx, u8 *dst,
+const u8 *src)
 {
-__blowfish_enc_blk(ctx, dst, src, false);
+return __blowfish_dec_blk_4way(ctx, dst, src, false);
 }
-static inline void blowfish_enc_blk_4way(struct bf_ctx *ctx, u8 *dst,
+static inline void blowfish_dec_cbc_4way(struct bf_ctx *ctx, u8 *dst,
 const u8 *src)
 {
-__blowfish_enc_blk_4way(ctx, dst, src, false);
+return __blowfish_dec_blk_4way(ctx, dst, src, true);
 }
 static void blowfish_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
@@ -54,183 +56,35 @@ static int blowfish_setkey_skcipher(struct crypto_skcipher *tfm,
 return blowfish_setkey(&tfm->base, key, keylen);
 }
-static int ecb_crypt(struct skcipher_request *req,
-void (*fn)(struct bf_ctx *, u8 *, const u8 *),
-void (*fn_4way)(struct bf_ctx *, u8 *, const u8 *))
-{
-unsigned int bsize = BF_BLOCK_SIZE;
-struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-struct bf_ctx *ctx = crypto_skcipher_ctx(tfm);
-struct skcipher_walk walk;
-unsigned int nbytes;
-int err;
-err = skcipher_walk_virt(&walk, req, false);
-while ((nbytes = walk.nbytes)) {
-u8 *wsrc = walk.src.virt.addr;
-u8 *wdst = walk.dst.virt.addr;
-/* Process four block batch */
-if (nbytes >= bsize * 4) {
-do {
-fn_4way(ctx, wdst, wsrc);
-wsrc += bsize * 4;
-wdst += bsize * 4;
-nbytes -= bsize * 4;
-} while (nbytes >= bsize * 4);
-if (nbytes < bsize)
-goto done;
-}
-/* Handle leftovers */
-do {
-fn(ctx, wdst, wsrc);
-wsrc += bsize;
-wdst += bsize;
-nbytes -= bsize;
-} while (nbytes >= bsize);
-done:
-err = skcipher_walk_done(&walk, nbytes);
-}
-return err;
-}
 static int ecb_encrypt(struct skcipher_request *req)
 {
-return ecb_crypt(req, blowfish_enc_blk, blowfish_enc_blk_4way);
+ECB_WALK_START(req, BF_BLOCK_SIZE, -1);
+ECB_BLOCK(4, blowfish_enc_blk_4way);
+ECB_BLOCK(1, blowfish_enc_blk);
+ECB_WALK_END();
 }
 static int ecb_decrypt(struct skcipher_request *req)
 {
-return ecb_crypt(req, blowfish_dec_blk, blowfish_dec_blk_4way);
+ECB_WALK_START(req, BF_BLOCK_SIZE, -1);
+ECB_BLOCK(4, blowfish_dec_ecb_4way);
+ECB_BLOCK(1, blowfish_dec_blk);
+ECB_WALK_END();
 }
-static unsigned int __cbc_encrypt(struct bf_ctx *ctx,
-struct skcipher_walk *walk)
-{
-unsigned int bsize = BF_BLOCK_SIZE;
-unsigned int nbytes = walk->nbytes;
-u64 *src = (u64 *)walk->src.virt.addr;
-u64 *dst = (u64 *)walk->dst.virt.addr;
-u64 *iv = (u64 *)walk->iv;
-do {
-*dst = *src ^ *iv;
-blowfish_enc_blk(ctx, (u8 *)dst, (u8 *)dst);
-iv = dst;
-src += 1;
-dst += 1;
-nbytes -= bsize;
-} while (nbytes >= bsize);
-*(u64 *)walk->iv = *iv;
-return nbytes;
-}
 static int cbc_encrypt(struct skcipher_request *req)
 {
-struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-struct bf_ctx *ctx = crypto_skcipher_ctx(tfm);
-struct skcipher_walk walk;
-unsigned int nbytes;
-int err;
-err = skcipher_walk_virt(&walk, req, false);
-while (walk.nbytes) {
-nbytes = __cbc_encrypt(ctx, &walk);
-err = skcipher_walk_done(&walk, nbytes);
-}
-return err;
+CBC_WALK_START(req, BF_BLOCK_SIZE, -1);
+CBC_ENC_BLOCK(blowfish_enc_blk);
+CBC_WALK_END();
 }
-static unsigned int __cbc_decrypt(struct bf_ctx *ctx,
-struct skcipher_walk *walk)
-{
-unsigned int bsize = BF_BLOCK_SIZE;
-unsigned int nbytes = walk->nbytes;
-u64 *src = (u64 *)walk->src.virt.addr;
-u64 *dst = (u64 *)walk->dst.virt.addr;
-u64 ivs[4 - 1];
-u64 last_iv;
-/* Start of the last block. */
-src += nbytes / bsize - 1;
-dst += nbytes / bsize - 1;
-last_iv = *src;
-/* Process four block batch */
-if (nbytes >= bsize * 4) {
-do {
-nbytes -= bsize * 4 - bsize;
-src -= 4 - 1;
-dst -= 4 - 1;
-ivs[0] = src[0];
-ivs[1] = src[1];
-ivs[2] = src[2];
-blowfish_dec_blk_4way(ctx, (u8 *)dst, (u8 *)src);
-dst[1] ^= ivs[0];
-dst[2] ^= ivs[1];
-dst[3] ^= ivs[2];
-nbytes -= bsize;
-if (nbytes < bsize)
-goto done;
-*dst ^= *(src - 1);
-src -= 1;
-dst -= 1;
-} while (nbytes >= bsize * 4);
-}
-/* Handle leftovers */
-for (;;) {
-blowfish_dec_blk(ctx, (u8 *)dst, (u8 *)src);
-nbytes -= bsize;
-if (nbytes < bsize)
-break;
-*dst ^= *(src - 1);
-src -= 1;
-dst -= 1;
-}
-done:
-*dst ^= *(u64 *)walk->iv;
-*(u64 *)walk->iv = last_iv;
-return nbytes;
-}
 static int cbc_decrypt(struct skcipher_request *req)
 {
-struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-struct bf_ctx *ctx = crypto_skcipher_ctx(tfm);
-struct skcipher_walk walk;
-unsigned int nbytes;
-int err;
-err = skcipher_walk_virt(&walk, req, false);
-while (walk.nbytes) {
-nbytes = __cbc_decrypt(ctx, &walk);
-err = skcipher_walk_done(&walk, nbytes);
-}
-return err;
+CBC_WALK_START(req, BF_BLOCK_SIZE, -1);
+CBC_DEC_BLOCK(4, blowfish_dec_cbc_4way);
+CBC_DEC_BLOCK(1, blowfish_dec_blk);
+CBC_WALK_END();
 }
 static struct crypto_alg bf_cipher_alg = {
@@ -13,13 +13,14 @@
 #define ECB_WALK_START(req, bsize, fpu_blocks) do { \
 void *ctx = crypto_skcipher_ctx(crypto_skcipher_reqtfm(req)); \
+const int __fpu_blocks = (fpu_blocks); \
 const int __bsize = (bsize); \
 struct skcipher_walk walk; \
 int err = skcipher_walk_virt(&walk, (req), false); \
 while (walk.nbytes > 0) { \
 unsigned int nbytes = walk.nbytes; \
-bool do_fpu = (fpu_blocks) != -1 && \
-nbytes >= (fpu_blocks) * __bsize; \
+bool do_fpu = __fpu_blocks != -1 && \
+nbytes >= __fpu_blocks * __bsize; \
 const u8 *src = walk.src.virt.addr; \
 u8 *dst = walk.dst.virt.addr; \
 u8 __maybe_unused buf[(bsize)]; \
@@ -35,7 +36,12 @@
 } while (0)
 #define ECB_BLOCK(blocks, func) do { \
-while (nbytes >= (blocks) * __bsize) { \
+const int __blocks = (blocks); \
+if (do_fpu && __blocks < __fpu_blocks) { \
+kernel_fpu_end(); \
+do_fpu = false; \
+} \
+while (nbytes >= __blocks * __bsize) { \
 (func)(ctx, dst, src); \
 ECB_WALK_ADVANCE(blocks); \
 } \
@@ -53,7 +59,12 @@
 } while (0)
 #define CBC_DEC_BLOCK(blocks, func) do { \
-while (nbytes >= (blocks) * __bsize) { \
+const int __blocks = (blocks); \
+if (do_fpu && __blocks < __fpu_blocks) { \
+kernel_fpu_end(); \
+do_fpu = false; \
+} \
+while (nbytes >= __blocks * __bsize) { \
 const u8 *__iv = src + ((blocks) - 1) * __bsize; \
 if (dst == src) \
 __iv = memcpy(buf, __iv, __bsize); \
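Note (illustration, not part of the commit): to make the effect of the new __fpu_blocks test concrete, the function below is a rough hand-expansion of ECB_WALK_START(req, EX_BLOCK_SIZE, 16); ECB_BLOCK(16, f16); ECB_BLOCK(1, f1); ECB_WALK_END(); for a hypothetical 16-byte block cipher. It is a simplified sketch rather than the literal preprocessor output: the FPU section is entered once per walk chunk when at least 16 blocks are pending, and with the change above it is now left as soon as processing falls back to the scalar one-block helper instead of staying active for the whole chunk.

#include <crypto/internal/skcipher.h>
#include <asm/fpu/api.h>
#include <linux/types.h>

#define EX_BLOCK_SIZE 16	/* hypothetical block size for the example */

/* Illustrative only: simplified expansion of the ECB helper macros. */
static int example_ecb_crypt(struct skcipher_request *req,
			     void (*f16)(void *ctx, u8 *dst, const u8 *src),
			     void (*f1)(void *ctx, u8 *dst, const u8 *src))
{
	void *ctx = crypto_skcipher_ctx(crypto_skcipher_reqtfm(req));
	struct skcipher_walk walk;
	int err = skcipher_walk_virt(&walk, req, false);

	while (walk.nbytes > 0) {
		unsigned int nbytes = walk.nbytes;
		const u8 *src = walk.src.virt.addr;
		u8 *dst = walk.dst.virt.addr;
		bool do_fpu = nbytes >= 16 * EX_BLOCK_SIZE;

		if (do_fpu)
			kernel_fpu_begin();

		/* ECB_BLOCK(16, f16): wide SIMD path */
		while (nbytes >= 16 * EX_BLOCK_SIZE) {
			f16(ctx, dst, src);
			dst += 16 * EX_BLOCK_SIZE;
			src += 16 * EX_BLOCK_SIZE;
			nbytes -= 16 * EX_BLOCK_SIZE;
		}

		/*
		 * ECB_BLOCK(1, f1): 1 < 16, so the new code drops the FPU
		 * before running the scalar tail.
		 */
		if (do_fpu) {
			kernel_fpu_end();
			do_fpu = false;
		}
		while (nbytes >= EX_BLOCK_SIZE) {
			f1(ctx, dst, src);
			dst += EX_BLOCK_SIZE;
			src += EX_BLOCK_SIZE;
			nbytes -= EX_BLOCK_SIZE;
		}

		/* ECB_WALK_END(): do_fpu is already false at this point. */
		err = skcipher_walk_done(&walk, nbytes);
	}
	return err;
}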
@@ -4,7 +4,7 @@
 * instructions. This file contains accelerated part of ghash
 * implementation. More information about PCLMULQDQ can be found at:
 *
-* http://software.intel.com/en-us/articles/carry-less-multiplication-and-its-usage-for-computing-the-gcm-mode/
+* https://www.intel.com/content/dam/develop/external/us/en/documents/clmul-wp-rev-2-02-2014-04-20.pdf
 *
 * Copyright (c) 2009 Intel Corp.
 * Author: Huang Ying <ying.huang@intel.com>
@@ -88,7 +88,7 @@ SYM_FUNC_START_LOCAL(__clmul_gf128mul_ble)
 RET
 SYM_FUNC_END(__clmul_gf128mul_ble)
-/* void clmul_ghash_mul(char *dst, const u128 *shash) */
+/* void clmul_ghash_mul(char *dst, const le128 *shash) */
 SYM_FUNC_START(clmul_ghash_mul)
 FRAME_BEGIN
 movups (%rdi), DATA
@@ -104,7 +104,7 @@ SYM_FUNC_END(clmul_ghash_mul)
 /*
 * void clmul_ghash_update(char *dst, const char *src, unsigned int srclen,
-* const u128 *shash);
+* const le128 *shash);
 */
 SYM_FUNC_START(clmul_ghash_update)
 FRAME_BEGIN
@@ -19,21 +19,22 @@
 #include <crypto/internal/simd.h>
 #include <asm/cpu_device_id.h>
 #include <asm/simd.h>
+#include <asm/unaligned.h>
 #define GHASH_BLOCK_SIZE 16
 #define GHASH_DIGEST_SIZE 16
-void clmul_ghash_mul(char *dst, const u128 *shash);
+void clmul_ghash_mul(char *dst, const le128 *shash);
 void clmul_ghash_update(char *dst, const char *src, unsigned int srclen,
-const u128 *shash);
+const le128 *shash);
 struct ghash_async_ctx {
 struct cryptd_ahash *cryptd_tfm;
 };
 struct ghash_ctx {
-u128 shash;
+le128 shash;
 };
 struct ghash_desc_ctx {
@@ -54,22 +55,40 @@ static int ghash_setkey(struct crypto_shash *tfm,
 const u8 *key, unsigned int keylen)
 {
 struct ghash_ctx *ctx = crypto_shash_ctx(tfm);
-be128 *x = (be128 *)key;
 u64 a, b;
 if (keylen != GHASH_BLOCK_SIZE)
 return -EINVAL;
-/* perform multiplication by 'x' in GF(2^128) */
-a = be64_to_cpu(x->a);
-b = be64_to_cpu(x->b);
-ctx->shash.a = (b << 1) | (a >> 63);
-ctx->shash.b = (a << 1) | (b >> 63);
+/*
+* GHASH maps bits to polynomial coefficients backwards, which makes it
+* hard to implement. But it can be shown that the GHASH multiplication
+*
+* D * K (mod x^128 + x^7 + x^2 + x + 1)
+*
+* (where D is a data block and K is the key) is equivalent to:
+*
+* bitreflect(D) * bitreflect(K) * x^(-127)
+* (mod x^128 + x^127 + x^126 + x^121 + 1)
+*
+* So, the code below precomputes:
+*
+* bitreflect(K) * x^(-127) (mod x^128 + x^127 + x^126 + x^121 + 1)
+*
+* ... but in Montgomery form (so that Montgomery multiplication can be
+* used), i.e. with an extra x^128 factor, which means actually:
+*
+* bitreflect(K) * x (mod x^128 + x^127 + x^126 + x^121 + 1)
+*
+* The within-a-byte part of bitreflect() cancels out GHASH's built-in
+* reflection, and thus bitreflect() is actually a byteswap.
+*/
+a = get_unaligned_be64(key);
+b = get_unaligned_be64(key + 8);
+ctx->shash.a = cpu_to_le64((a << 1) | (b >> 63));
+ctx->shash.b = cpu_to_le64((b << 1) | (a >> 63));
 if (a >> 63)
-ctx->shash.b ^= ((u64)0xc2) << 56;
+ctx->shash.a ^= cpu_to_le64((u64)0xc2 << 56);
 return 0;
 }
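Note (illustration, not part of the commit): the key transformation that the new comment describes can be restated outside the kernel. The helper below is a user-space style sketch with invented names; it mirrors the arithmetic in ghash_setkey() above, i.e. it multiplies the byte-swapped key by x and folds the carried-out top bit back in with the 0xc2 constant of the reduction polynomial x^128 + x^127 + x^126 + x^121 + 1, producing the two 64-bit halves that the driver stores little-endian in ctx->shash.

#include <stdint.h>

/* Illustrative only: standalone restatement of the ghash_setkey() math. */
static void ghash_reflect_key(const uint8_t key[16],
			      uint64_t *shash_a, uint64_t *shash_b)
{
	uint64_t a = 0, b = 0;
	int i;

	/* get_unaligned_be64(key) and get_unaligned_be64(key + 8) */
	for (i = 0; i < 8; i++) {
		a = (a << 8) | key[i];
		b = (b << 8) | key[8 + i];
	}

	/* Multiply by x: shift left by one, wrapping the top bit around... */
	*shash_a = (a << 1) | (b >> 63);
	*shash_b = (b << 1) | (a >> 63);

	/* ...and, if that top bit was set, also xor in 0xc2 << 56. */
	if (a >> 63)
		*shash_a ^= (uint64_t)0xc2 << 56;
}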
[The remaining 133 file diffs in this commit are collapsed and not shown.]