Commits · b299362ee48db8eab34208302ee9730ff9d6091c · nexedi / linux

13 Dec, 2018 9 commits

crypto: adiantum - propagate CRYPTO_ALG_ASYNC flag to instance · b299362e

Eric Biggers authored Dec 04, 2018

If the stream cipher implementation is asynchronous, then the Adiantum
instance must be flagged as asynchronous as well. Otherwise someone
asking for a synchronous algorithm can get an asynchronous algorithm.

There are no asynchronous xchacha12 or xchacha20 implementations yet
which makes this largely a theoretical issue, but it should be fixed.

Fixes: 059c2a4d ("crypto: adiantum - add Adiantum support")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

b299362e

crypto: arm64/chacha - use combined SIMD/ALU routine for more speed · 2fe55987

Ard Biesheuvel authored Dec 04, 2018

To some degree, most known AArch64 micro-architectures appear to be
able to issue ALU instructions in parellel to SIMD instructions
without affecting the SIMD throughput. This means we can use the ALU
to process a fifth ChaCha block while the SIMD is processing four
blocks in parallel.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2fe55987

crypto: arm64/chacha - optimize for arbitrary length inputs · f2ca1cbd

Ard Biesheuvel authored Dec 04, 2018

Update the 4-way NEON ChaCha routine so it can handle input of any
length >64 bytes in its entirety, rather than having to call into
the 1-way routine and/or memcpy()s via temp buffers to handle the
tail of a ChaCha invocation that is not a multiple of 256 bytes.

On inputs that are a multiple of 256 bytes (and thus in tcrypt
benchmarks), performance drops by around 1% on Cortex-A57, while
performance for inputs drawn randomly from the range [64, 1024)
increases by around 30%.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

f2ca1cbd

crypto: tcrypt - add block size of 1472 to skcipher template · ee5bbc9f

Ard Biesheuvel authored Dec 04, 2018

In order to have better coverage of algorithms operating on block
sizes that are in the ballpark of a VPN  packet, add 1472 to the
block_sizes array.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

ee5bbc9f

crypto: cavium/nitrox - Enabled Mailbox support · cf718eaa

Srikanth, Jampala authored Dec 04, 2018

Enabled the PF->VF Mailbox support. Mailbox message are interpreted
as {type, opcode, data}. Supported message types are REQ, ACK and NACK.
Signed-off-by: Srikanth Jampala <Jampala.Srikanth@cavium.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

cf718eaa

crypto: arm64/chacha - add XChaCha12 support · 19c11c97

Eric Biggers authored Dec 03, 2018

Now that the ARM64 NEON implementation of ChaCha20 and XChaCha20 has
been refactored to support varying the number of rounds, add support for
XChaCha12. This is identical to XChaCha20 except for the number of
rounds, which is 12 instead of 20. This can be used by Adiantum.
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

19c11c97

crypto: arm64/chacha20 - refactor to allow varying number of rounds · 95a34b77

Eric Biggers authored Dec 03, 2018

In preparation for adding XChaCha12 support, rename/refactor the ARM64
NEON implementation of ChaCha20 to support different numbers of rounds.
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

95a34b77

crypto: arm64/chacha20 - add XChaCha20 support · cc7cf991

Eric Biggers authored Dec 03, 2018

Add an XChaCha20 implementation that is hooked up to the ARM64 NEON
implementation of ChaCha20.  This can be used by Adiantum.

A NEON implementation of single-block HChaCha20 is also added so that
XChaCha20 can use it rather than the generic implementation.  This
required refactoring the ChaCha20 permutation into its own function.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

cc7cf991

crypto: arm64/nhpoly1305 - add NEON-accelerated NHPoly1305 · a00fa0c8

Eric Biggers authored Dec 03, 2018

Add an ARM64 NEON implementation of NHPoly1305, an ε-almost-∆-universal
hash function used in the Adiantum encryption mode.  For now, only the
NH portion is actually NEON-accelerated; the Poly1305 part is less
performance-critical so is just implemented in C.
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> # big-endian
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

a00fa0c8

07 Dec, 2018 20 commits

crypto: cavium/nitrox - convert to DEFINE_SHOW_ATTRIBUTE · 88d905e2

Yangtao Li authored Dec 01, 2018

Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

88d905e2

crypto: chcr - ESN for Inline IPSec Tx · 8362ea16

Atul Gupta authored Nov 30, 2018

Send SPI, 64b seq nos and 64b IV with aadiv drop for inline crypto.
This information is added in outgoing packet after the CPL TX PKT XT
and removed by hardware.
The aad, auth and cipher offsets are then adjusted for ESN enabled tunnel.
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

8362ea16

crypto: chcr - small packet Tx stalls the queue · c35828ea

Atul Gupta authored Nov 30, 2018

Immediate packets sent to hardware should include the work
request length in calculating the flits. WR occupy one flit and
if not accounted result in invalid request which stalls the HW
queue.

Cc: stable@vger.kernel.org
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

c35828ea

crypto: user - Add crypto_stats_init · 1f6669b9

Corentin Labbe authored Nov 29, 2018

This patch add the crypto_stats_init() function.
This will permit to remove some ifdef from __crypto_register_alg().
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

1f6669b9

crypto: user - rename err_cnt parameter · 44f13133

Corentin Labbe authored Nov 29, 2018

Since now all crypto stats are on their own structures, it is now
useless to have the algorithm name in the err_cnt member.
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

44f13133

crypto: user - Split stats in multiple structures · 17c18f9e

Corentin Labbe authored Nov 29, 2018

Like for userspace, this patch splits stats into multiple structures,
one for each algorithm class.
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

17c18f9e

crypto: user - remove intermediate variable · 5fff8172

Corentin Labbe authored Nov 29, 2018

The use of the v64 intermediate variable is useless, and removing it
bring to much readable code.
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

5fff8172

crypto: user - Fix invalid stat reporting · b0af91c1

Corentin Labbe authored Nov 29, 2018

Some error count use the wrong name for getting this data.
But this had not caused any reporting problem, since all error count are shared in the same
union.
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

b0af91c1

crypto: user - fix use_after_free of struct xxx_request · f7d76e05

Corentin Labbe authored Nov 29, 2018

All crypto_stats functions use the struct xxx_request for feeding stats,
but in some case this structure could already be freed.

For fixing this, the needed parameters (len and alg) will be stored
before the request being executed.
Fixes: cac5818c ("crypto: user - Implement a generic crypto statistics")
Reported-by: syzbot <syzbot+6939a606a5305e9e9799@syzkaller.appspotmail.com>
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

f7d76e05

crypto: tool: getstat: convert user space example to the new crypto_user_stat uapi · 76d09ea7

Corentin Labbe authored Nov 29, 2018

This patch converts the getstat example tool to the recent changes done in crypto_user_stat
- changed all stats to u64
- separated struct stats for each crypto alg
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

76d09ea7

crypto: user - split user space crypto stat structures · 7f0a9d5c

Corentin Labbe authored Nov 29, 2018

It is cleaner to have each stat in their own structures.
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

7f0a9d5c

crypto: user - convert all stats from u32 to u64 · 6e8e72cd

Corentin Labbe authored Nov 29, 2018

All the 32-bit fields need to be 64-bit.  In some cases, UINT32_MAX crypto
operations can be done in seconds.
Reported-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

6e8e72cd

crypto: user - CRYPTO_STATS should depend on CRYPTO_USER · a6a31385

Corentin Labbe authored Nov 29, 2018

CRYPTO_STATS is using CRYPTO_USER stuff, so it should depends on it.
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

a6a31385

crypto: user - made crypto_user_stat optional · 2ced2607

Corentin Labbe authored Nov 29, 2018

Even if CRYPTO_STATS is set to n, some part of CRYPTO_STATS are
compiled.
This patch made all part of crypto_user_stat uncompiled in that case.
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2ced2607

MAINTAINERS: change NX/VMX maintainers · c97e4df5

Paulo Flabiano Smorigo authored Nov 27, 2018

Add Breno and Nayna as NX/VMX crypto driver maintainers. Also change my
email address to my personal account and remove Leonidas since he's not
working with the driver anymore.
Signed-off-by: Paulo Flabiano Smorigo <pfsmorigo@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

c97e4df5

MAINTAINERS: ccree: add co-maintainer · 18596781

Gilad Ben-Yossef authored Nov 13, 2018

Add Yael Chemla as co-maintainer of Arm TrustZone CryptoCell REE
driver.
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

18596781

dt-bindings: crypto: ccree: add dt bindings for ccree 703 · fefbc0b4

Gilad Ben-Yossef authored Nov 13, 2018

Add device tree bindings associating Arm TrustZone CryptoCell 703 with the
ccree driver.
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

fefbc0b4

crypto: ccree - add support for CryptoCell 703 · 1c876a90

Gilad Ben-Yossef authored Nov 13, 2018

Add support for Arm TrustZone CryptoCell 703.
The 703 is a variant of the CryptoCell 713 that supports only
algorithms certified by the Chinesse Office of the State Commercial
Cryptography Administration (OSCCA).
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

1c876a90

Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 946dca8f
Herbert Xu authored Dec 07, 2018
```
Merge crypto tree to pick up crypto stats API revert.
```
946dca8f

crypto: user - Disable statistics interface · e61efff4

Herbert Xu authored Dec 07, 2018

Since this user-space API is still undergoing significant changes,
this patch disables it for the current merge window.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

e61efff4

29 Nov, 2018 6 commits

crypto: cavium/nitrox - Enable interrups for PF in SR-IOV mode. · 7a027b57

Srikanth, Jampala authored Nov 21, 2018

Enable the available interrupt vectors for PF in SR-IOV Mode.
Only single vector entry 192 is valid of PF. This is used to
notify any hardware errors and mailbox messages from VF(s).
Signed-off-by: Srikanth Jampala <Jampala.Srikanth@cavium.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

7a027b57

crypto: cavium/nitrox - crypto request format changes · 4bede34c

Nagadheeraj, Rottela authored Nov 21, 2018

nitrox_skcipher_crypt() will do the necessary formatting/ordering of
input and output sglists based on the algorithm requirements.
It will also accommodate the mandatory output buffers required for
NITROX hardware like Output request headers (ORH) and Completion headers.
Signed-off-by: Nagadheeraj Rottela <rottela.nagadheeraj@cavium.com>
Reviewed-by: Srikanth Jampala <Jampala.Srikanth@cavium.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

4bede34c

crypto: x86/chacha20 - Add a 4-block AVX-512VL variant · 180def6c

Martin Willi authored Nov 20, 2018

This version uses the same principle as the AVX2 version by scheduling the
operations for two block pairs in parallel. It benefits from the AVX-512VL
rotate instructions and the more efficient partial block handling using
"vmovdqu8", resulting in a speedup of the raw block function of ~20%.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

180def6c

crypto: x86/chacha20 - Add a 2-block AVX-512VL variant · 29a47b54

Martin Willi authored Nov 20, 2018

This version uses the same principle as the AVX2 version. It benefits
from the AVX-512VL rotate instructions and the more efficient partial
block handling using "vmovdqu8", resulting in a speedup of ~20%.

Unlike the AVX2 version, it is faster than the single block SSSE3 version
to process a single block. Hence we engage that function for (partial)
single block lengths as well.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

29a47b54

crypto: x86/chacha20 - Add a 8-block AVX-512VL variant · cee7a36e

Martin Willi authored Nov 20, 2018

This variant is similar to the AVX2 version, but benefits from the AVX-512
rotate instructions and the additional registers, so it can operate without
any data on the stack. It uses ymm registers only to avoid the massive core
throttling on Skylake-X platforms. Nontheless does it bring a ~30% speed
improvement compared to the AVX2 variant for random encryption lengths.

The AVX2 version uses "rep movsb" for partial block XORing via the stack.
With AVX-512, the new "vmovdqu8" can do this much more efficiently. The
associated "kmov" instructions to work with dynamic masks is not part of
the AVX-512VL instruction set, hence we depend on AVX-512BW as well. Given
that the major AVX-512VL architectures provide AVX-512BW and this extension
does not affect core clocking, this seems to be no problem at least for
now.
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

cee7a36e

crypto: do not free algorithm before using · e5bde04c

Pan Bian authored Nov 22, 2018

In multiple functions, the algorithm fields are read after its reference
is dropped through crypto_mod_put. In this case, the algorithm memory
may be freed, resulting in use-after-free bugs. This patch delays the
put operation until the algorithm is never used.

Fixes: 79c65d17 ("crypto: cbc - Convert to skcipher")
Fixes: a7d85e06 ("crypto: cfb - add support for Cipher FeedBack mode")
Fixes: 043a4400 ("crypto: pcbc - Convert to skcipher")
Cc: <stable@vger.kernel.org>
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

e5bde04c

20 Nov, 2018 5 commits

crypto: adiantum - add Adiantum support · 059c2a4d

Eric Biggers authored Nov 16, 2018

Add support for the Adiantum encryption mode. Adiantum was designed by
Paul Crowley and is specified by our paper:

Adiantum: length-preserving encryption for entry-level processors
(https://eprint.iacr.org/2018/720.pdf)

See our paper for full details; this patch only provides an overview.

Adiantum is a tweakable, length-preserving encryption mode designed for
fast and secure disk encryption, especially on CPUs without dedicated
crypto instructions. Adiantum encrypts each sector using the XChaCha12
stream cipher, two passes of an ε-almost-∆-universal (εA∆U) hash
function, and an invocation of the AES-256 block cipher on a single
16-byte block. On CPUs without AES instructions, Adiantum is much
faster than AES-XTS; for example, on ARM Cortex-A7, on 4096-byte sectors
Adiantum encryption is about 4 times faster than AES-256-XTS encryption,
and decryption about 5 times faster.

Adiantum is a specialization of the more general HBSH construction. Our
earlier proposal, HPolyC, was also a HBSH specialization, but it used a
different εA∆U hash function, one based on Poly1305 only. Adiantum's
εA∆U hash function, which is based primarily on the "NH" hash function
like that used in UMAC (RFC4418), is about twice as fast as HPolyC's;
consequently, Adiantum is about 20% faster than HPolyC.

This speed comes with no loss of security: Adiantum is provably just as
secure as HPolyC, in fact slightly *more* secure. Like HPolyC,
Adiantum's security is reducible to that of XChaCha12 and AES-256,
subject to a security bound. XChaCha12 itself has a security reduction
to ChaCha12. Therefore, one need not "trust" Adiantum; one need only
trust ChaCha12 and AES-256. Note that the εA∆U hash function is only
used for its proven combinatorical properties so cannot be "broken".

Adiantum is also a true wide-block encryption mode, so flipping any
plaintext bit in the sector scrambles the entire ciphertext, and vice
versa. No other such mode is available in the kernel currently; doing
the same with XTS scrambles only 16 bytes. Adiantum also supports
arbitrary-length tweaks and naturally supports any length input >= 16
bytes without needing "ciphertext stealing".

For the stream cipher, Adiantum uses XChaCha12 rather than XChaCha20 in
order to make encryption feasible on the widest range of devices.
Although the 20-round variant is quite popular, the best known attacks
on ChaCha are on only 7 rounds, so ChaCha12 still has a substantial
security margin; in fact, larger than AES-256's. 12-round Salsa20 is
also the eSTREAM recommendation. For the block cipher, Adiantum uses
AES-256, despite it having a lower security margin than XChaCha12 and
needing table lookups, due to AES's extensive adoption and analysis
making it the obvious first choice. Nevertheless, for flexibility this
patch also permits the "adiantum" template to be instantiated with
XChaCha20 and/or with an alternate block cipher.

We need Adiantum support in the kernel for use in dm-crypt and fscrypt,
where currently the only other suitable options are block cipher modes
such as AES-XTS. A big problem with this is that many low-end mobile
devices (e.g. Android Go phones sold primarily in developing countries,
as well as some smartwatches) still have CPUs that lack AES
instructions, e.g. ARM Cortex-A7. Sadly, AES-XTS encryption is much too
slow to be viable on these devices. We did find that some "lightweight"
block ciphers are fast enough, but these suffer from problems such as
not having much cryptanalysis or being too controversial.

The ChaCha stream cipher has excellent performance but is insecure to
use directly for disk encryption, since each sector's IV is reused each
time it is overwritten. Even restricting the threat model to offline
attacks only isn't enough, since modern flash storage devices don't
guarantee that "overwrites" are really overwrites, due to wear-leveling.
Adiantum avoids this problem by constructing a
"tweakable super-pseudorandom permutation"; this is the strongest
possible security model for length-preserving encryption.

Of course, storing random nonces along with the ciphertext would be the
ideal solution. But doing that with existing hardware and filesystems
runs into major practical problems; in most cases it would require data
journaling (like dm-integrity) which severely degrades performance.
Thus, for now length-preserving encryption is still needed.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

059c2a4d

crypto: arm/nhpoly1305 - add NEON-accelerated NHPoly1305 · 16aae359

Eric Biggers authored Nov 16, 2018

Add an ARM NEON implementation of NHPoly1305, an ε-almost-∆-universal
hash function used in the Adiantum encryption mode.  For now, only the
NH portion is actually NEON-accelerated; the Poly1305 part is less
performance-critical so is just implemented in C.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

16aae359

crypto: nhpoly1305 - add NHPoly1305 support · 26609a21

Eric Biggers authored Nov 16, 2018

Add a generic implementation of NHPoly1305, an ε-almost-∆-universal hash
function used in the Adiantum encryption mode.

CONFIG_NHPOLY1305 is not selectable by itself since there won't be any
real reason to enable it without also enabling Adiantum support.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

26609a21

crypto: poly1305 - add Poly1305 core API · 1b6fd3d5

Eric Biggers authored Nov 16, 2018

Expose a low-level Poly1305 API which implements the
ε-almost-∆-universal (εA∆U) hash function underlying the Poly1305 MAC
and supports block-aligned inputs only.

This is needed for Adiantum hashing, which builds an εA∆U hash function
from NH and a polynomial evaluation in GF(2^{130}-5); this polynomial
evaluation is identical to the one the Poly1305 MAC does.  However, the
crypto_shash Poly1305 API isn't very appropriate for this because its
calling convention assumes it is used as a MAC, with a 32-byte "one-time
key" provided for every digest.

But by design, in Adiantum hashing the performance of the polynomial
evaluation isn't nearly as critical as NH.  So it suffices to just have
some C helper functions.  Thus, this patch adds such functions.
Acked-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

1b6fd3d5

crypto: poly1305 - use structures for key and accumulator · 878afc35

Eric Biggers authored Nov 16, 2018

In preparation for exposing a low-level Poly1305 API which implements
the ε-almost-∆-universal (εA∆U) hash function underlying the Poly1305
MAC and supports block-aligned inputs only, create structures
poly1305_key and poly1305_state which hold the limbs of the Poly1305
"r" key and accumulator, respectively.

These structures could actually have the same type (e.g. poly1305_val),
but different types are preferable, to prevent misuse.
Acked-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

878afc35