    random32: use real rng for non-deterministic randomness · d4150779
    Jason A. Donenfeld authored
    random32.c has two random number generators in it: one that is meant to
    be used deterministically, with some predefined seed, and one that does
    the same exact thing as random.c, except does it poorly. The first one
    has some use cases. The second one no longer does and can be replaced
    with calls to random.c's proper random number generator.
    
    The relatively recent siphash-based bad random32.c code was added in
    response to concerns that the prior random32.c was too deterministic.
    Out of fears that random.c was (at the time) too slow, this code was
    anonymously contributed. Then out of that emerged a kind of shadow
    entropy gathering system, with its own tentacles throughout various net
    code, added willy-nilly.
    
    Stop👏making👏bespoke👏random👏number👏generators👏.
    
    Fortunately, recent advances in random.c mean that we can stop playing
    with this sketchiness, and just use get_random_u32(), which is now fast
    enough. In micro benchmarks using RDPMC, I'm seeing the same median
    cycle count between the two functions, with the mean being _slightly_
    higher due to batches refilling (which we can optimize further if need be).
    However, when doing *real* benchmarks of the net functions that actually
    use these random numbers, the mean cycles actually *decreased* slightly
    (with the median still staying the same), likely because the additional
    prandom code means icache misses and complexity, whereas random.c is
    generally already being used by something else nearby.
    
    The biggest benefit of this is that there are many users of prandom who
    probably should be using cryptographically secure random numbers. This
    makes all of those accidental cases become secure by just flipping a
    switch. Later on, we can do a tree-wide cleanup to remove the static
    inline wrapper functions that this commit adds.
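
    As a rough userspace illustration of the wrapper approach described
    above (not the kernel code itself): the old prandom entry points become
    thin static inline functions that simply forward to the real RNG. The
    get_random_bytes() and get_random_u32() below are stand-ins built on
    Linux's getrandom(2), since the kernel functions are not available
    outside the kernel, and the wrapper names prandom_u32() and
    prandom_bytes() are assumed here for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>
#include <sys/types.h>

/*
 * Userspace stand-in for the kernel's get_random_bytes(): fill buf from
 * the OS RNG. Assumption: Linux with a libc that provides getrandom(2).
 */
static void get_random_bytes(void *buf, size_t len)
{
	unsigned char *p = buf;

	while (len > 0) {
		ssize_t n = getrandom(p, len, 0);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: retry */
			abort();		/* sketch only: no recovery */
		}
		p += n;
		len -= (size_t)n;
	}
}

/* Userspace stand-in for the kernel's get_random_u32(). */
static uint32_t get_random_u32(void)
{
	uint32_t v;

	get_random_bytes(&v, sizeof(v));
	return v;
}

/*
 * The wrappers: existing callers keep their old names but now receive
 * output from the real RNG. No separate generator state remains behind
 * these names.
 */
static inline uint32_t prandom_u32(void)
{
	return get_random_u32();
}

static inline void prandom_bytes(void *buf, size_t len)
{
	get_random_bytes(buf, len);
}
```

    Because the wrappers carry no state of their own, a later tree-wide
    cleanup can replace each call with the function it forwards to and
    then delete the wrappers.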
    
    There are also some low-ish hanging fruits for making this even faster
    in the future: a get_random_u16() function for use in the networking
    stack will give a 2x performance boost there, using SIMD for ChaCha20
    will let us compute 4 or 8 or 16 blocks of output in parallel, instead
    of just one, giving us large buffers for cheap, and introducing a
    get_random_*_bh() function that assumes irqs are already disabled will
    shave off a few cycles for ordinary calls. These are things we can chip
    away at down the road.
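
    To make the get_random_u16() point concrete, here is a minimal
    userspace sketch of why narrower reads stretch a batch further. It is
    not the random.c implementation: the 64-byte batch size, the struct,
    and the sketch_* names are all hypothetical, and getrandom(2) stands in
    for the ChaCha20 refill. The sketch counts refills rather than cycles,
    so it only illustrates the amortization and does not reproduce the
    performance figure above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/random.h>
#include <sys/types.h>

#define BATCH_BYTES 64	/* assumed batch size for this sketch */

struct batch {
	unsigned char buf[BATCH_BYTES];
	size_t pos;		/* next unread byte; BATCH_BYTES means empty */
	unsigned long refills;	/* number of times the batch was regenerated */
};

static void batch_init(struct batch *b)
{
	b->pos = BATCH_BYTES;	/* start empty so the first read refills */
	b->refills = 0;
}

/* Copy len bytes out of the batch, regenerating it first if it runs short. */
static void batch_take(struct batch *b, void *out, size_t len)
{
	if (b->pos + len > BATCH_BYTES) {
		size_t done = 0;

		while (done < BATCH_BYTES) {
			ssize_t n = getrandom(b->buf + done,
					      BATCH_BYTES - done, 0);

			if (n < 0) {
				if (errno == EINTR)
					continue;	/* interrupted: retry */
				abort();		/* sketch only */
			}
			done += (size_t)n;
		}
		b->pos = 0;
		b->refills++;
	}
	memcpy(out, b->buf + b->pos, len);
	b->pos += len;
}

/* Hypothetical 32-bit reader: consumes 4 bytes of the batch per call. */
static uint32_t sketch_get_u32(struct batch *b)
{
	uint32_t v;

	batch_take(b, &v, sizeof(v));
	return v;
}

/* Hypothetical 16-bit reader: consumes only 2 bytes of the batch per call. */
static uint16_t sketch_get_u16(struct batch *b)
{
	uint16_t v;

	batch_take(b, &v, sizeof(v));
	return v;
}
```

    With these assumptions, a caller that only needs 16 bits per call
    triggers half as many refills through sketch_get_u16() as it would
    through sketch_get_u32() for the same number of calls.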
    
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Acked-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>