• Jussi Kivilinna's avatar
    crypto: twofish-avx - tune assembler code for more performance · f94a73f8
    Jussi Kivilinna authored
    Patch replaces 'movb' instructions with 'movzbl' to break false register
    dependencies and interleaves instructions better for out-of-order scheduling.
    
    Tested on Intel Core i5-2450M and AMD FX-8100.
    
    tcrypt ECB results:
    
    Intel Core i5-2450M:
    
    size    old-vs-new      new-vs-3way     old-vs-3way
            enc     dec     enc     dec     enc     dec
    256     1.12x   1.13x   1.36x   1.37x   1.21x   1.22x
    1k      1.14x   1.14x   1.48x   1.49x   1.29x   1.31x
    8k      1.14x   1.14x   1.50x   1.52x   1.32x   1.33x
    
    AMD FX-8100:
    
    size    old-vs-new      new-vs-3way     old-vs-3way
            enc     dec     enc     dec     enc     dec
    256     1.10x   1.11x   1.01x   1.01x   0.92x   0.91x
    1k      1.11x   1.12x   1.08x   1.07x   0.97x   0.96x
    8k      1.11x   1.13x   1.10x   1.08x   0.99x   0.97x
    
    [v2]
     - Do instruction interleaving another way to avoid adding new FPU<=>CPU
       register moves as these cause performance drop on Bulldozer.
     - Further interleaving improvements for better out-of-order scheduling.
    Tested-by: default avatarBorislav Petkov <bp@alien8.de>
    Cc: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
    Signed-off-by: default avatarJussi Kivilinna <jussi.kivilinna@mbnet.fi>
    Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
    f94a73f8
twofish-avx-x86_64-asm_64.S 9.15 KB