    crypto: cast6-avx - tune assembler code for more performance · c09220e1
    Jussi Kivilinna authored
    This patch replaces 'movb' instructions with 'movzbl' to break false register
    dependencies, interleaves instructions better for out-of-order scheduling,
    and merges the constant 16-bit rotation with the round-key variable rotation.
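The rotation merge can be illustrated in plain C. This is only a sketch of the underlying identity, not the assembler from the patch: the helper names (`rotl32`, `rotate_two_step`, `rotate_merged`, `check_merged_rotation`) are hypothetical, and it assumes 32-bit words with a 5-bit rotation count. Rotating by a variable amount and then by a constant 16 equals a single rotation by the sum taken modulo 32, and for a 5-bit count adding 16 modulo 32 is the same as XOR with 16.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left; only the low 5 bits of n are used. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* Two-step form: variable rotation by kr, then a constant 16-bit rotation. */
static uint32_t rotate_two_step(uint32_t x, unsigned kr)
{
    return rotl32(rotl32(x, kr), 16);
}

/* Merged form: fold the constant into the count, so one rotation suffices. */
static uint32_t rotate_merged(uint32_t x, unsigned kr)
{
    return rotl32(x, (kr + 16) & 31);
}

/* Check every 5-bit rotation count against a few sample words. */
static void check_merged_rotation(void)
{
    static const uint32_t samples[] = {
        0x00000001u, 0x80000000u, 0xdeadbeefu, 0x01234567u
    };
    for (unsigned kr = 0; kr < 32; kr++) {
        /* For a 5-bit count, adding 16 mod 32 just flips bit 4. */
        assert(((kr + 16) & 31) == (kr ^ 16));
        for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
            assert(rotate_two_step(samples[i], kr) ==
                   rotate_merged(samples[i], kr));
    }
}
```

Calling `check_merged_rotation()` exercises all 32 counts. The merged form does the count adjustment once, which can be done ahead of the rotation itself rather than as a second rotate per use.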
    
    tcrypt ECB results:
    
    Intel Core i5-2450M:
    
    size    old-vs-new      new-vs-generic  old-vs-generic
            enc     dec     enc     dec     enc     dec
    256     1.13x   1.19x   2.05x   2.17x   1.82x   1.82x
    1k      1.18x   1.21x   2.26x   2.33x   1.93x   1.93x
    8k      1.19x   1.19x   2.32x   2.33x   1.95x   1.95x
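As a sanity check on the table, the three column groups should be mutually consistent if each is read as a throughput ratio (an assumption about how the columns were derived): old-vs-new times old-vs-generic should roughly equal new-vs-generic. The sketch below uses only the numbers from the table; the struct and function names are hypothetical, and since the published values are rounded to two decimals the products only agree approximately.

```c
#include <stddef.h>

/* One table row; index 0 is enc, index 1 is dec. */
struct tcrypt_row {
    const char *size;
    double old_vs_new[2];
    double new_vs_generic[2];
    double old_vs_generic[2];
};

/* Values copied from the Intel Core i5-2450M table above. */
static const struct tcrypt_row rows[] = {
    { "256", {1.13, 1.19}, {2.05, 2.17}, {1.82, 1.82} },
    { "1k",  {1.18, 1.21}, {2.26, 2.33}, {1.93, 1.93} },
    { "8k",  {1.19, 1.19}, {2.32, 2.33}, {1.95, 1.95} },
};

/* Largest absolute gap between (old-vs-new * old-vs-generic)
 * and the reported new-vs-generic figure. */
static double max_ratio_gap(void)
{
    double worst = 0.0;
    for (size_t r = 0; r < sizeof(rows) / sizeof(rows[0]); r++) {
        for (int d = 0; d < 2; d++) {
            double gap = rows[r].old_vs_new[d] * rows[r].old_vs_generic[d]
                       - rows[r].new_vs_generic[d];
            if (gap < 0)
                gap = -gap;
            if (gap > worst)
                worst = gap;
        }
    }
    return worst;
}
```

With these inputs `max_ratio_gap()` stays below 0.02, which is within what two-decimal rounding of the individual ratios would explain.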
    
    [v2]
     - Do instruction interleaving another way to avoid adding new FPU<=>CPU
       register moves, as these cause a performance drop on Bulldozer.
     - Improvements to round-key variable rotation handling.
     - Further interleaving improvements for better out-of-order scheduling.
    
    Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
cast6-avx-x86_64-asm_64.S 9.01 KB