    dm-verity: hash blocks with shash import+finup when possible · b76ad884
    Eric Biggers authored
    Currently dm-verity computes the hash of each block by using multiple
    calls to the "ahash" crypto API.  While the exact sequence depends on
    the chosen dm-verity settings, in the vast majority of cases it is:
    
        1. crypto_ahash_init()
        2. crypto_ahash_update() [salt]
        3. crypto_ahash_update() [data]
        4. crypto_ahash_final()
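
    The per-block sequence above can be mimicked in userspace to show where
    the indirect calls come from.  This is a minimal sketch, not kernel
    code: a toy 64-bit FNV-1a hash stands in for SHA-256, and the
    hypothetical toy_hash_ops table stands in for the crypto API's
    dispatch, so each of the four steps is one call through a function
    pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy 64-bit FNV-1a hash standing in for SHA-256 (NOT cryptographically secure). */
struct toy_state {
	uint64_t h;
};

/* Counts how many times the algorithm's functions are entered. */
static unsigned int toy_calls;

static void toy_init(struct toy_state *s)
{
	toy_calls++;
	s->h = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
}

static void toy_update(struct toy_state *s, const uint8_t *p, size_t n)
{
	toy_calls++;
	for (size_t i = 0; i < n; i++) {
		s->h ^= p[i];
		s->h *= 0x100000001b3ULL; /* FNV-1a 64-bit prime */
	}
}

static void toy_final(struct toy_state *s, uint64_t *out)
{
	toy_calls++;
	*out = s->h; /* FNV-1a has no extra finalization step */
}

/* Hypothetical ops table: each step is reached through a function pointer. */
struct toy_hash_ops {
	void (*init)(struct toy_state *s);
	void (*update)(struct toy_state *s, const uint8_t *p, size_t n);
	void (*final)(struct toy_state *s, uint64_t *out);
};

static const struct toy_hash_ops toy_ops = { toy_init, toy_update, toy_final };

/* Steps 1-4, performed for every data block: four indirect calls. */
static uint64_t hash_block_4step(const struct toy_hash_ops *ops,
				 const uint8_t *salt, size_t salt_len,
				 const uint8_t *data, size_t data_len)
{
	struct toy_state st;
	uint64_t digest;

	assert(salt != NULL || salt_len == 0);
	ops->init(&st);
	ops->update(&st, salt, salt_len);
	ops->update(&st, data, data_len);
	ops->final(&st, &digest);
	return digest;
}
```

    Hashing N blocks this way makes 4N calls through the ops table, and
    the salt is rehashed every time even though it never changes.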
    
    This is inefficient for two main reasons:
    
    - It makes multiple indirect calls, which is expensive on modern CPUs,
      especially when mitigations for CPU vulnerabilities are enabled.
    
      Since the salt is the same across all blocks on a given dm-verity
      device, a much more efficient sequence would be to do an import of the
      pre-salted state, then a finup.
    
    - It uses the ahash (asynchronous hash) API even though CPU-based
      hashing is almost always used in practice, so it incurs the overhead
      of the ahash wrapper around shash.
    
      Because dm-verity was intentionally converted to ahash to support
      off-CPU crypto accelerators, a full reversion to shash might not be
      acceptable.  Yet, we should still provide a fast path for shash with
      the most common dm-verity settings.
    
      Another reason for shash over ahash is that the upcoming multibuffer
      hashing support, which is specific to CPU-based hashing, is much
      better suited for shash than for ahash.  Supporting it via ahash would
      add significant complexity and overhead.  And it's not possible for
      the "same" code to properly support both multibuffer hashing and HW
      accelerators at the same time anyway, given the different computation
      models.  Unfortunately, users who want to support both models will
      always need some code specific to each one.
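
    The import+finup sequence described in the first item can be modeled
    in userspace.  This is a toy sketch, not the kernel's shash API: a
    64-bit FNV-1a hash stands in for SHA-256, a struct copy plays the role
    of import, and toy_finup() and struct toy_verity are hypothetical
    names.  The salt is hashed once per device; each block then costs one
    state copy plus one finup call, while producing the same digest as
    init, update(salt), update(data), final.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy 64-bit FNV-1a hash standing in for SHA-256 (NOT cryptographically secure). */
struct toy_state {
	uint64_t h;
};

static void toy_init(struct toy_state *s)
{
	s->h = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
}

static void toy_update(struct toy_state *s, const uint8_t *p, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		s->h ^= p[i];
		s->h *= 0x100000001b3ULL; /* FNV-1a 64-bit prime */
	}
}

/* finup: absorb the final chunk of input and return the digest in one call. */
static uint64_t toy_finup(struct toy_state *s, const uint8_t *p, size_t n)
{
	toy_update(s, p, n);
	return s->h;
}

/* Hypothetical per-device context holding the exported pre-salted state. */
struct toy_verity {
	struct toy_state salted;
};

/* Once per device: hash the salt and keep ("export") the resulting state. */
static void toy_verity_init(struct toy_verity *v, const uint8_t *salt,
			    size_t salt_len)
{
	assert(salt != NULL || salt_len == 0);
	toy_init(&v->salted);
	toy_update(&v->salted, salt, salt_len);
}

/* Per data block: "import" the pre-salted state, then finup with the data. */
static uint64_t toy_verity_hash_block(const struct toy_verity *v,
				      const uint8_t *data, size_t data_len)
{
	struct toy_state st;

	memcpy(&st, &v->salted, sizeof(st)); /* import; v->salted is left untouched */
	return toy_finup(&st, data, data_len);
}
```

    The kernel's shash API provides crypto_shash_export(),
    crypto_shash_import() and crypto_shash_finup() for the corresponding
    real operations; the sketch only mirrors their roles.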
    
    Therefore, this patch adds a new shash import+finup based fast path to
    dm-verity.  It is used automatically when appropriate.  This optimizes
    dm-verity for what the vast majority of users want: CPU-based hashing
    with the most common settings, while still retaining support for rarer
    settings and off-CPU crypto accelerators.
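
    How the fast path might be selected can be sketched as a small
    dispatch function.  The two conditions are assumptions for
    illustration, not the patch's exact checks: the hash must run
    synchronously on the CPU, and the salt must be hashed before the data,
    since a salt appended after the data cannot be folded into a
    precomputed state.  All names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical per-device settings relevant to choosing a hashing path. */
struct verity_hash_cfg {
	bool sync_cpu_hash; /* hash implementation runs synchronously on the CPU */
	bool salt_first;    /* salt is empty or hashed before the data */
};

enum verity_hash_path {
	VERITY_PATH_SHASH_IMPORT_FINUP, /* fast path: import pre-salted state, then finup */
	VERITY_PATH_AHASH,              /* general path: init, update(s), final via ahash */
};

static enum verity_hash_path
verity_choose_path(const struct verity_hash_cfg *cfg)
{
	/*
	 * A pre-salted state can only be precomputed when the salt comes
	 * first; an appended salt must be hashed after every block's data.
	 */
	if (cfg->sync_cpu_hash && cfg->salt_first)
		return VERITY_PATH_SHASH_IMPORT_FINUP;
	return VERITY_PATH_AHASH;
}
```

    Everything else falls back to the general ahash sequence, which is
    what keeps rarer settings and off-CPU accelerators working.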
    
    In benchmarks with veritysetup's default parameters (SHA-256, 4K data
    and hash block sizes, 32-byte salt), which also match the parameters
    that Android currently uses, this patch improves block hashing
    performance by about 15% on x86_64 using the SHA-NI instructions, or by
    about 5% on arm64 using the ARMv8 SHA2 instructions.  On x86_64 roughly
    two-thirds of the improvement comes from the use of import and finup,
    while the remaining third comes from the switch from ahash to shash.
    
    Note that another benefit of using "import" to handle the salt is that
    if the salt size is equal to the input size of the hash algorithm's
    compression function, e.g. 64 bytes for SHA-256, then the performance is
    exactly the same as with no salt.  This doesn't seem to be much better than
    veritysetup's current default of 32-byte salts, due to the way SHA-256's
    finalization padding works, but it should be marginally better.
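
    The padding argument can be checked with a little arithmetic.  SHA-256
    processes 64-byte blocks and pads every message with at least 9 bytes
    (a 0x80 byte plus the 64-bit message length), so a message of L bytes
    costs ceil((L + 9) / 64) compression-function calls.  The sketch below
    assumes that, with import, only the salt bytes left over after its
    full 64-byte blocks are processed again per data block; the function
    name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define SHA256_BLOCK_SIZE 64 /* input size of the compression function */
#define SHA256_MIN_PAD    9  /* 0x80 byte + 8-byte message length */

/*
 * Hypothetical helper: SHA-256 compression-function calls needed to hash
 * one data block of data_len bytes with a salt of salt_len bytes.
 * With use_import, the salt's full 64-byte blocks are assumed to be
 * absorbed once into the precomputed state; only the leftover
 * (salt_len % 64) bytes stay buffered in that state and are processed
 * again with every data block.
 */
static size_t sha256_compressions_per_block(size_t salt_len, size_t data_len,
					    bool use_import)
{
	size_t salt_bytes = use_import ? salt_len % SHA256_BLOCK_SIZE : salt_len;
	size_t min_len = salt_bytes + data_len + SHA256_MIN_PAD;

	/* Round up to whole 64-byte blocks. */
	return (min_len + SHA256_BLOCK_SIZE - 1) / SHA256_BLOCK_SIZE;
}
```

    For 4096-byte data blocks this gives 65 compressions with no salt, 65
    with a 64-byte salt via import (versus 66 without import), and 65 with
    a 32-byte salt either way.  The count does not model other costs, such
    as carrying buffered salt bytes in the state, that could account for
    the marginal difference mentioned above.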

    Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
    Acked-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Eric Biggers <ebiggers@google.com>
    Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>