    crypto: talitos - chain in buffered data for ahash on SEC1 · 37b5e889
    LEROY Christophe authored
    SEC1 doesn't support S/G in descriptors so for hash operations,
    the CPU has to build a buffer containing the buffered block and
    the incoming data. This generates a lot of memory copies, which
    account for more than 50% of the CPU time of an md5sum operation,
    as shown below with a 'perf record'.
    
    |--86.24%-- kcapi_md_digest
    |          |
    |          |--86.18%-- _kcapi_common_vmsplice_chunk_fd
    |          |          |
    |          |          |--83.68%-- splice
    |          |          |          |
    |          |          |          |--83.59%-- ret_from_syscall
    |          |          |          |          |
    |          |          |          |          |--83.52%-- sys_splice
    |          |          |          |          |          |
    |          |          |          |          |          |--83.49%-- splice_from_pipe
    |          |          |          |          |          |          |
    |          |          |          |          |          |          |--83.04%-- __splice_from_pipe
    |          |          |          |          |          |          |          |
    |          |          |          |          |          |          |          |--80.67%-- pipe_to_sendpage
    |          |          |          |          |          |          |          |          |
    |          |          |          |          |          |          |          |          |--78.25%-- hash_sendpage
    |          |          |          |          |          |          |          |          |          |
    |          |          |          |          |          |          |          |          |          |--60.08%-- ahash_process_req
    |          |          |          |          |          |          |          |          |          |          |
    |          |          |          |          |          |          |          |          |          |          |--56.36%-- sg_copy_buffer
    |          |          |          |          |          |          |          |          |          |          |          |
    |          |          |          |          |          |          |          |          |          |          |          |--55.29%-- memcpy
    |          |          |          |          |          |          |          |          |          |          |          |
    
    However, unlike SEC2+, SEC1 offers the possibility to chain
    descriptors. It is therefore possible to build a first descriptor
    pointing to the buffered data and a second descriptor pointing to
    the incoming data, hence avoiding the memory copy to a single
    buffer.
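
    The chaining idea above can be sketched in plain C. This is only
    an illustration under stated assumptions: struct demo_desc,
    demo_chain() and demo_chain_len() are hypothetical names and do
    not reflect the real talitos descriptor layout or driver code.
    The point is that each descriptor refers to one contiguous region
    and the link between them replaces the copy into a single buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor: NOT the real talitos layout. */
struct demo_desc {
    const uint8_t *data;     /* one contiguous region (no S/G list) */
    size_t len;              /* number of bytes in that region */
    struct demo_desc *next;  /* chain link; NULL ends the chain */
};

/*
 * Chain two descriptors: the first covers the previously buffered
 * bytes, the second covers the incoming data where it already lives.
 * No payload bytes are copied.
 */
static struct demo_desc *demo_chain(struct demo_desc *first,
                                    const uint8_t *buffered, size_t nbuf,
                                    struct demo_desc *second,
                                    const uint8_t *incoming, size_t nin)
{
    assert(first && second);

    first->data = buffered;
    first->len = nbuf;
    first->next = second;

    second->data = incoming;
    second->len = nin;
    second->next = NULL;

    return first;
}

/* Stand-in for the engine walking the chain: counts the bytes covered. */
static size_t demo_chain_len(const struct demo_desc *d)
{
    size_t total = 0;

    for (; d; d = d->next)
        total += d->len;
    return total;
}
```

    In this sketch the only per-request work is filling in two small
    descriptors, whereas the single-buffer approach has to memcpy both
    the buffered block and the incoming data.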
    
    With this patch, an md5sum on a 90 Mbyte file takes approximately
    3 seconds. Without the patch it takes 6 seconds.
    
    Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
talitos.h 15.5 KB