- 23 Dec, 2018 18 commits
-
-
Harsh Jain authored
Send the DMA address to functions as a value argument instead of as a pointer. Signed-off-by: Harsh Jain <harsh@chelsio.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
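The change is a calling-convention cleanup. A minimal sketch of the pattern, with illustrative function names rather than the driver's actual symbols:

    #include <linux/types.h>

    /* Before: the DMA address was passed through a pointer for no benefit. */
    static void fill_wr_before(void *wr, dma_addr_t *addr)
    {
        /* ... reads *addr ... */
    }

    /* After: dma_addr_t is a plain integer type, so pass it by value. */
    static void fill_wr_after(void *wr, dma_addr_t addr)
    {
        /* ... reads addr directly ... */
    }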
-
Harsh Jain authored
Use tx_channel_id instead of rx_channel_id. Signed-off-by: Harsh Jain <harsh@chelsio.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Harsh Jain authored
Send the input as IV | AAD | Data. This allows the IV to be sent as immediate data and creates space in the work request for more DMA-mapped entries. Signed-off-by: Harsh Jain <harsh@chelsio.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
YueHaibing authored
Fixes gcc '-Wunused-but-set-variable' warning:

    drivers/crypto/chelsio/chcr_ipsec.c: In function 'chcr_ipsec_xmit':
    drivers/crypto/chelsio/chcr_ipsec.c:674:33: warning: variable 'kctx_len' set but not used [-Wunused-but-set-variable]
      unsigned int flits = 0, ndesc, kctx_len;

The variable has been unused since commit 8362ea16 ("crypto: chcr - ESN for Inline IPSec Tx"). Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Nathan Chancellor authored
Clang warns when one enumerated type is implicitly converted to another:

    drivers/crypto/ux500/hash/hash_core.c:169:4: warning: implicit conversion from enumeration type 'enum dma_data_direction' to different enumeration type 'enum dma_transfer_direction' [-Wenum-conversion]
        direction, DMA_CTRL_ACK | DMA_PREP_INTERRUPT);
        ^~~~~~~~~
    1 warning generated.

dmaengine_prep_slave_sg expects an enum from dma_transfer_direction. We know that the only direction supported by this function is DMA_TO_DEVICE because of the check at the top of the function, so we can just use the equivalent value from dma_transfer_direction: DMA_TO_DEVICE = DMA_MEM_TO_DEV = 1. Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Nathan Chancellor authored
Clang warns when one enumerated type is implicitly converted to another:

    drivers/crypto/ux500/cryp/cryp_core.c:559:5: warning: implicit conversion from enumeration type 'enum dma_data_direction' to different enumeration type 'enum dma_transfer_direction' [-Wenum-conversion]
        direction, DMA_CTRL_ACK);
        ^~~~~~~~~
    drivers/crypto/ux500/cryp/cryp_core.c:583:5: warning: implicit conversion from enumeration type 'enum dma_data_direction' to different enumeration type 'enum dma_transfer_direction' [-Wenum-conversion]
        direction,
        ^~~~~~~~~
    2 warnings generated.

dmaengine_prep_slave_sg expects an enum from dma_transfer_direction. Because we know the value of the dma_data_direction enum from the switch statement, we can just use the proper value from dma_transfer_direction so there is no more conversion: DMA_TO_DEVICE = DMA_MEM_TO_DEV = 1, DMA_FROM_DEVICE = DMA_DEV_TO_MEM = 2. Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
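Both ux500 fixes follow the same pattern. A condensed sketch of the hash-core case (channel and scatterlist setup omitted):

    #include <linux/dmaengine.h>

    static struct dma_async_tx_descriptor *
    prep_hash_tx(struct dma_chan *chan, struct scatterlist *sg,
                 unsigned int sg_len)
    {
        /*
         * dmaengine_prep_slave_sg() takes an enum dma_transfer_direction,
         * so pass DMA_MEM_TO_DEV (== 1) rather than the numerically equal
         * but differently typed DMA_TO_DEVICE from enum dma_data_direction.
         */
        return dmaengine_prep_slave_sg(chan, sg, sg_len, DMA_MEM_TO_DEV,
                                       DMA_CTRL_ACK | DMA_PREP_INTERRUPT);
    }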
-
Dave Watson authored
Add the appropriate scatter/gather stubs to the avx asm. In the C code, we can now always use crypt_by_sg, since both the sse and avx code now support scatter/gather. Introduce a new struct, aesni_gcm_tfm, that is initialized on startup to point to either the SSE, AVX, or AVX2 versions of the four necessary encryption/decryption routines. GENX_OPTSIZE is still checked at the start of crypt_by_sg against the total size of the data, since the additional overhead of the AVX versions is in the init function, which calculates the additional HashKeys. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
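A rough sketch of the dispatch idea (prototypes simplified; the real entry points take the full AES key and gcm_context_data state, and the instance names here are illustrative):

    #include <linux/types.h>
    #include <asm/cpufeature.h>

    struct aesni_gcm_tfm {
        void (*init)(void *ctx, const u8 *iv, const u8 *aad,
                     unsigned long aad_len);
        void (*enc_update)(void *ctx, u8 *out, const u8 *in,
                           unsigned long len);
        void (*dec_update)(void *ctx, u8 *out, const u8 *in,
                           unsigned long len);
        void (*finalize)(void *ctx, u8 *auth_tag, unsigned long tag_len);
    };

    /* Assumed to be defined alongside the asm stubs. */
    extern const struct aesni_gcm_tfm gcm_tfm_sse, gcm_tfm_avx, gcm_tfm_avx2;

    static const struct aesni_gcm_tfm *gcm_tfm; /* chosen once at startup */

    static void pick_gcm_impl(void)
    {
        if (boot_cpu_has(X86_FEATURE_AVX2))
            gcm_tfm = &gcm_tfm_avx2;
        else if (boot_cpu_has(X86_FEATURE_AVX))
            gcm_tfm = &gcm_tfm_avx;
        else
            gcm_tfm = &gcm_tfm_sse;
    }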
-
Dave Watson authored
Before this diff, multiple calls to GCM_ENC_DEC would succeed, but only if every call processed a multiple of 16 bytes. Handle partial blocks at the start of GCM_ENC_DEC, and update the aadhash as appropriate. The data offset in %r11 is also updated after the partial block. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
Introduce a READ_PARTIAL_BLOCK macro, and use it in the two existing partial-block cases: AAD and the end of ENC_DEC. In particular, the ENC_DEC case should be faster, since we now read 8 or 4 bytes at a time where possible. This macro will also be used to read partial blocks between enc_update and dec_update calls. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
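The macro itself is x86 asm; a C analogue of its read strategy for a tail shorter than 16 bytes:

    #include <linux/string.h>
    #include <linux/types.h>

    /* Zero-pad a partial (< 16 byte) tail into a full block, using the
     * widest copies first - one 8-byte chunk, then a 4-byte chunk, then
     * single bytes - mirroring the macro's 8/4/1 load strategy. */
    static void read_partial_block(u8 block[16], const u8 *src,
                                   unsigned int len)
    {
        unsigned int off = 0;

        memset(block, 0, 16);
        if (len >= 8) {
            memcpy(block, src, 8);
            off = 8;
        }
        if (len - off >= 4) {
            memcpy(block + off, src + off, 4);
            off += 4;
        }
        while (off < len) {
            block[off] = src[off];
            off++;
        }
    }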
-
Dave Watson authored
Prepare to handle partial blocks between scatter/gather calls. For the last partial block, we only want to calculate the aadhash in GCM_COMPLETE, and a new partial block macro will handle both aadhash update and encrypting partial blocks between calls. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
Fill in aadhash, aadlen, pblocklen, curcount with appropriate values. pblocklen, aadhash, and pblockenckey are also updated at the end of each scatter/gather operation, to be carried over to the next operation. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
The precompute functions differ only by the sub-macros they call; merge them into a single macro. Later diffs add more code to fill in the gcm_context_data structure, and this allows those changes to be made in a single place. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
AAD hash only needs to be calculated once for each scatter/gather operation. Move it to its own macro, and call it from GCM_INIT instead of INITIAL_BLOCKS. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
Merge the encode and decode tag calculations into the GCM_COMPLETE macro. The scatter/gather routines will call it once at the end of encryption or decryption. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
Add support for 192/256-bit keys using the avx gcm/aes routines. The SSE routines were previously updated in e31ac32d ("Add support for 192 & 256 bit keys to AESNI RFC4106"). Instead of adding an additional loop in the hot path as in e31ac32d, this diff generates separate versions of the code using macros, and the entry routines choose which version once. This results in a 5% performance improvement vs. adding a loop to the hot path. It is the same strategy chosen by the Intel isa-l_crypto library. The key size checks are removed from the C code where appropriate. Note that this diff depends on using gcm_context_data: 256-bit keys require 16 HashKeys + 15 expanded keys, which is larger than struct crypto_aes_ctx, so they are stored in struct gcm_context_data. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
Macro-ify function save and restore. These will be used in new functions added for scatter/gather update operations. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
Add the gcm_context_data structure to the avx asm routines. This will be necessary to support both 256-bit keys and scatter/gather. The pre-computed HashKeys are now stored in the gcm_context_data struct, which is expanded to hold the greater number of HashKeys necessary for avx. Loads and stores to the new struct are always done unaligned to avoid compiler issues; see e5b954e8 ("Use unaligned loads from gcm_context_data"). Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
The GCM_ENC_DEC routines for AVX and AVX2 are identical, except they call separate sub-macros. Pass the macros as arguments, and merge them. This facilitates additional refactoring, by requiring changes in only one place. The GCM_ENC_DEC macro was moved above the CONFIG_AS_AVX* ifdefs, since it will be used by both AVX and AVX2. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
- 13 Dec, 2018 20 commits
-
-
Eric Biggers authored
crypto_alg_mod_lookup() takes a reference to the hash algorithm but crypto_init_shash_spawn() doesn't take ownership of it, hence the reference needs to be dropped in adiantum_create(). Fixes: 059c2a4d ("crypto: adiantum - add Adiantum support") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
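A condensed sketch of the fix's shape (not the exact diff; it folds the lookup and spawn setup into one illustrative helper using the internal crypto API):

    #include <linux/err.h>
    #include <crypto/internal/hash.h>

    static int bind_hash_sketch(struct crypto_instance *inst,
                                struct crypto_shash_spawn *spawn,
                                const char *hash_name)
    {
        struct crypto_alg *alg;
        int err;

        alg = crypto_alg_mod_lookup(hash_name, CRYPTO_ALG_TYPE_SHASH,
                                    CRYPTO_ALG_TYPE_MASK);
        if (IS_ERR(alg))
            return PTR_ERR(alg);

        err = crypto_init_shash_spawn(spawn, __crypto_shash_alg(alg), inst);

        /* The reference taken by crypto_alg_mod_lookup() must be dropped
         * here; this put is what the original code was missing. */
        crypto_mod_put(alg);
        return err;
    }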
-
Eric Biggers authored
CRYPTO_MSG_GETALG in NLM_F_DUMP mode sometimes doesn't return all registered crypto algorithms, because it doesn't support incremental dumps. crypto_dump_report() only permits itself to be called once, yet the netlink subsystem allocates at most ~64 KiB for the skb being dumped to. Thus only the first recvmsg() returns data, and it may only include a subset of the crypto algorithms even if the user buffer passed to recvmsg() is large enough to hold all of them. Fix this by using one of the arguments in the netlink_callback structure to keep track of the current position in the algorithm list. Then userspace can do multiple recvmsg() on the socket after sending the dump request. This is the way netlink dumps work elsewhere in the kernel; it's unclear why this was different (probably just an oversight). Also fix an integer overflow when calculating the dump buffer size hint. Fixes: a38f7907 ("crypto: Add userspace configuration API") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
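The resumable-dump pattern, sketched (emit_one_alg() is a hypothetical stand-in for the real per-algorithm reporting; holding crypto_alg_sem for the list walk is assumed):

    #include <linux/netlink.h>
    #include <linux/list.h>
    #include <linux/crypto.h>

    /* Hypothetical helper: returns non-zero when the skb has no room. */
    static int emit_one_alg(struct sk_buff *skb, struct crypto_alg *alg);

    static int crypto_dump_report_sketch(struct sk_buff *skb,
                                         struct netlink_callback *cb)
    {
        unsigned long pos = 0, start = cb->args[0];
        struct crypto_alg *alg;

        list_for_each_entry(alg, &crypto_alg_list, cra_list) {
            if (pos++ < start)
                continue;               /* emitted on an earlier call */
            if (emit_one_alg(skb, alg)) /* skb full: stop for now */
                break;
            cb->args[0] = pos;          /* resume point for next call */
        }
        return skb->len; /* non-zero => netlink invokes us again */
    }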
-
Eric Biggers authored
The 2018-11-28 revision of the Adiantum paper has revised some notation:

- 'M' was replaced with 'L' (meaning "Left", for the left-hand part of the message) in the definition of Adiantum hashing, to avoid confusion with the full message
- ε-almost-∆-universal is now abbreviated as ε-∆U instead of εA∆U
- "block" is now used only to mean block cipher and Poly1305 blocks

Also, Adiantum hashing was moved from the appendix to the main paper. To avoid confusion, update relevant comments in the code to match. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Eric Biggers authored
The kernel's ChaCha20 uses the RFC7539 convention of the nonce being 12 bytes rather than 8, so actually I only appended 12 random bytes (not 16) to its test vectors to form 24-byte nonces for the XChaCha20 test vectors. The other 4 bytes were just from zero-padding the stream position to 8 bytes. Fix the comments above the test vectors. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Eric Biggers authored
There is a draft specification for XChaCha20 being worked on. Add the XChaCha20 test vector from the appendix so that we can be extra sure the kernel's implementation is compatible. I also recomputed the ciphertext with XChaCha12 and added it there too, to keep the tests for XChaCha20 and XChaCha12 in sync. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Eric Biggers authored
To improve responsiveness, yield the FPU (temporarily re-enabling preemption) every 4 KiB encrypted/decrypted, rather than keeping preemption disabled during the entire encryption/decryption operation. Alternatively we could do this for every skcipher_walk step, but steps may be small in some cases, and yielding the FPU is expensive on x86. Suggested-by: Martin Willi <martin@strongswan.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
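In outline (the chunking is shown against a flat buffer; the real code works through the skcipher walk, and chacha_do_simd() is an assumed helper standing in for the SIMD core):

    #include <linux/kernel.h>
    #include <linux/types.h>
    #include <asm/fpu/api.h>

    static void chacha_do_simd(u8 *dst, const u8 *src, unsigned int bytes);

    static void chacha_crypt_in_chunks(u8 *dst, const u8 *src,
                                       unsigned int bytes)
    {
        while (bytes) {
            unsigned int todo = min(bytes, 4096U);

            kernel_fpu_begin();
            chacha_do_simd(dst, src, todo);
            kernel_fpu_end(); /* preemption point every 4 KiB */

            dst += todo;
            src += todo;
            bytes -= todo;
        }
    }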
-
Eric Biggers authored
Now that the x86_64 SIMD implementations of ChaCha20 and XChaCha20 have been refactored to support varying the number of rounds, add support for XChaCha12. This is identical to XChaCha20 except for the number of rounds, which is 12 instead of 20. This can be used by Adiantum. Reviewed-by: Martin Willi <martin@strongswan.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Eric Biggers authored
In preparation for adding XChaCha12 support, rename/refactor the x86_64 SIMD implementations of ChaCha20 to support different numbers of rounds. Reviewed-by: Martin Willi <martin@strongswan.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
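The refactor itself is in the asm, but what it parameterizes is the standard ChaCha permutation with a variable round count. A generic C version for reference:

    #include <linux/bitops.h>
    #include <linux/types.h>

    #define QR(a, b, c, d) do {             \
        a += b; d = rol32(d ^ a, 16);       \
        c += d; b = rol32(b ^ c, 12);       \
        a += b; d = rol32(d ^ a, 8);        \
        c += d; b = rol32(b ^ c, 7);        \
    } while (0)

    /* nrounds = 20 for ChaCha20, 12 for ChaCha12. */
    static void chacha_permute(u32 x[16], int nrounds)
    {
        int i;

        for (i = 0; i < nrounds; i += 2) {
            /* column round */
            QR(x[0], x[4], x[8],  x[12]);
            QR(x[1], x[5], x[9],  x[13]);
            QR(x[2], x[6], x[10], x[14]);
            QR(x[3], x[7], x[11], x[15]);
            /* diagonal round */
            QR(x[0], x[5], x[10], x[15]);
            QR(x[1], x[6], x[11], x[12]);
            QR(x[2], x[7], x[8],  x[13]);
            QR(x[3], x[4], x[9],  x[14]);
        }
    }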
-
Eric Biggers authored
Add an XChaCha20 implementation that is hooked up to the x86_64 SIMD implementations of ChaCha20. This can be used by Adiantum. An SSSE3 implementation of single-block HChaCha20 is also added so that XChaCha20 can use it rather than the generic implementation. This required refactoring the ChaCha permutation into its own function. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
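The construction in outline (hchacha20() and chacha20_crypt() are assumed helpers wrapping the permutation; the kernel's actual plumbing goes through the skcipher walk):

    #include <linux/string.h>
    #include <linux/types.h>

    void hchacha20(const u8 key[32], const u8 nonce[16], u32 subkey[8]);
    void chacha20_crypt(u8 *dst, const u8 *src, unsigned int len,
                        const u32 key[8], const u8 iv[12]);

    static void xchacha20(u8 *dst, const u8 *src, unsigned int len,
                          const u8 key[32], const u8 nonce[24])
    {
        u32 subkey[8];
        u8 iv[12] = { 0 }; /* 4 zero bytes || nonce[16..23] */

        /* Derive a one-time key from the first 16 nonce bytes. */
        hchacha20(key, nonce, subkey);
        memcpy(iv + 4, nonce + 16, 8);

        /* Then run ordinary (RFC 7539) ChaCha20 under the derived key. */
        chacha20_crypt(dst, src, len, subkey, iv);
    }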
-
Eric Biggers authored
Add a 64-bit AVX2 implementation of NHPoly1305, an ε-almost-∆-universal hash function used in the Adiantum encryption mode. For now, only the NH portion is actually AVX2-accelerated; the Poly1305 part is less performance-critical so is just implemented in C. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Eric Biggers authored
Add a 64-bit SSE2 implementation of NHPoly1305, an ε-almost-∆-universal hash function used in the Adiantum encryption mode. For now, only the NH portion is actually SSE2-accelerated; the Poly1305 part is less performance-critical so is just implemented in C. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
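Both of these commits accelerate only the NH pass. One pass of NH, simplified to scalar C (Adiantum's NH actually runs four such passes over shifted key words to produce a 256-bit result, which Poly1305 then compresses):

    #include <linux/types.h>

    /* Per four 32-bit message words: add key words mod 2^32, then
     * accumulate the two 32x32->64-bit pair products mod 2^64.
     * nwords must be a multiple of 4. */
    static u64 nh_pass(const u32 *key, const u32 *msg, unsigned int nwords)
    {
        u64 sum = 0;
        unsigned int i;

        for (i = 0; i < nwords; i += 4) {
            sum += (u64)(u32)(msg[i + 0] + key[i + 0]) *
                   (u32)(msg[i + 2] + key[i + 2]);
            sum += (u64)(u32)(msg[i + 1] + key[i + 1]) *
                   (u32)(msg[i + 3] + key[i + 3]);
        }
        return sum;
    }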
-
Eric Biggers authored
If the stream cipher implementation is asynchronous, then the Adiantum instance must be flagged as asynchronous as well. Otherwise someone asking for a synchronous algorithm can get an asynchronous algorithm. There are no asynchronous xchacha12 or xchacha20 implementations yet which makes this largely a theoretical issue, but it should be fixed. Fixes: 059c2a4d ("crypto: adiantum - add Adiantum support") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
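The essence of the fix, approximately (the helper name is illustrative):

    #include <crypto/internal/skcipher.h>

    static void propagate_async_flag(struct skcipher_instance *inst,
                                     const struct skcipher_alg *streamcipher)
    {
        /* If the underlying stream cipher is asynchronous, mark the
         * Adiantum instance asynchronous as well. */
        inst->alg.base.cra_flags |= streamcipher->base.cra_flags &
                                    CRYPTO_ALG_ASYNC;
    }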
-
Ard Biesheuvel authored
To some degree, most known AArch64 micro-architectures appear to be able to issue ALU instructions in parallel to SIMD instructions without affecting the SIMD throughput. This means we can use the ALU to process a fifth ChaCha block while the SIMD is processing four blocks in parallel. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Ard Biesheuvel authored
Update the 4-way NEON ChaCha routine so it can handle input of any length >64 bytes in its entirety, rather than having to call into the 1-way routine and/or memcpy()s via temp buffers to handle the tail of a ChaCha invocation that is not a multiple of 256 bytes. On inputs that are a multiple of 256 bytes (and thus in tcrypt benchmarks), performance drops by around 1% on Cortex-A57, while performance for inputs drawn randomly from the range [64, 1024) increases by around 30%. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Ard Biesheuvel authored
In order to have better coverage of algorithms operating on block sizes that are in the ballpark of a VPN packet, add 1472 to the block_sizes array. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
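What the change amounts to in tcrypt's size table (the surrounding entries are approximated, not copied from the source; 0 terminates the list):

    static const unsigned int block_sizes[] = {
        16, 64, 256, 1024, 1472 /* ~VPN packet payload */, 8192, 0
    };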
-
Srikanth, Jampala authored
Enable PF->VF mailbox support. Mailbox messages are interpreted as {type, opcode, data}. Supported message types are REQ, ACK and NACK. Signed-off-by: Srikanth Jampala <Jampala.Srikanth@cavium.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
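A plausible shape for such a message (field widths are illustrative, not taken from the hardware documentation):

    #include <linux/types.h>

    enum mbx_msg_type { /* the three types named above */
        MBX_MSG_TYPE_REQ,
        MBX_MSG_TYPE_ACK,
        MBX_MSG_TYPE_NACK,
    };

    union mbx_msg {
        u64 value;
        struct {
            u64 type:2;   /* REQ / ACK / NACK */
            u64 opcode:6; /* what is being requested or answered */
            u64 data:56;  /* opcode-specific payload */
        };
    };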
-
Eric Biggers authored
Now that the ARM64 NEON implementation of ChaCha20 and XChaCha20 has been refactored to support varying the number of rounds, add support for XChaCha12. This is identical to XChaCha20 except for the number of rounds, which is 12 instead of 20. This can be used by Adiantum. Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Eric Biggers authored
In preparation for adding XChaCha12 support, rename/refactor the ARM64 NEON implementation of ChaCha20 to support different numbers of rounds. Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Eric Biggers authored
Add an XChaCha20 implementation that is hooked up to the ARM64 NEON implementation of ChaCha20. This can be used by Adiantum. A NEON implementation of single-block HChaCha20 is also added so that XChaCha20 can use it rather than the generic implementation. This required refactoring the ChaCha20 permutation into its own function. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Eric Biggers authored
Add an ARM64 NEON implementation of NHPoly1305, an ε-almost-∆-universal hash function used in the Adiantum encryption mode. For now, only the NH portion is actually NEON-accelerated; the Poly1305 part is less performance-critical so is just implemented in C. Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> # big-endian Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
- 07 Dec, 2018 2 commits
-
-
Yangtao Li authored
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
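For reference, the pattern the macro replaces ('foo' is a placeholder): given a foo_show(), DEFINE_SHOW_ATTRIBUTE(foo) generates foo_open() and foo_fops, so the hand-written open function and file_operations go away.

    #include <linux/debugfs.h>
    #include <linux/seq_file.h>

    static int foo_show(struct seq_file *s, void *unused)
    {
        seq_puts(s, "example state\n");
        return 0;
    }
    DEFINE_SHOW_ATTRIBUTE(foo); /* emits foo_open() and foo_fops */

    static void foo_debugfs_init(struct dentry *dir)
    {
        debugfs_create_file("foo", 0444, dir, NULL, &foo_fops);
    }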
-
Atul Gupta authored
Send the SPI, 64-bit sequence numbers and 64-bit IV, with the aadiv dropped, for inline crypto. This information is added to the outgoing packet after the CPL_TX_PKT_XT and removed by the hardware. The aad, auth and cipher offsets are then adjusted for an ESN-enabled tunnel. Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-