    string: improve default out-of-line memcmp() implementation · 291d47cc
    Linus Torvalds authored
    This just does the "if the architecture does efficient unaligned
    handling, start the memcmp using 'unsigned long' accesses", since
    Nikolay Borisov found a load that cares.
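
As an illustration of the idea described above, here is a minimal user-space sketch. It is not the code from this commit: the name sketch_memcmp is hypothetical, and memcpy() stands in for whatever unaligned-load helper the kernel uses. It assumes a target where unaligned 'unsigned long' loads are cheap, which is the condition the commit guards with HAVE_EFFICIENT_UNALIGNED_ACCESS.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space sketch; not the kernel's lib/string.c code. */
static int sketch_memcmp(const void *cs, const void *ct, size_t count)
{
	const unsigned char *p1 = cs;
	const unsigned char *p2 = ct;

	/* Word-at-a-time phase: skip over leading words that are equal. */
	while (count >= sizeof(unsigned long)) {
		unsigned long w1, w2;

		/* memcpy() is a portable way to express an unaligned load. */
		memcpy(&w1, p1, sizeof(w1));
		memcpy(&w2, p2, sizeof(w2));
		if (w1 != w2)
			break;	/* the byte loop below locates the difference */
		p1 += sizeof(w1);
		p2 += sizeof(w2);
		count -= sizeof(w1);
	}

	/* Byte-at-a-time phase: the tail, or the first word that differed. */
	for (; count > 0; p1++, p2++, count--) {
		int res = *p1 - *p2;

		if (res != 0)
			return res;
	}
	return 0;
}
```

The word loop only tests for equality, so it never has to care about byte order. When two words differ, the byte loop rescans that one word and produces the usual signed result.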
    
    This is basically the minimal patch, and limited to architectures that
    are known to not have slow unaligned handling.  We've had the stupid
    byte-at-a-time version forever, and nobody has ever even noticed before,
    so let's keep the fix minimal.
    
    A potential further improvement would be to align one of the sources in
    order to at least minimize unaligned cases, but the only real case of
    bigger memcmp() users seems to be the FIDEDUPERANGE ioctl().  As David
    Sterba says, the dedupe ioctl is typically called on ranges spanning
    many pages so the common case will all be page-aligned anyway.
    
    All the relevant architectures select HAVE_EFFICIENT_UNALIGNED_ACCESS,
    so I'm not going to worry about the combination of a very rare use-case
    and a rare architecture until somebody actually hits it.  Particularly
    since Nikolay also tested the more complex patch with extra alignment
    handling code, and it only added overhead.
    
    Link: https://lore.kernel.org/lkml/20210721135926.602840-1-nborisov@suse.com/
    Reported-by: Nikolay Borisov <nborisov@suse.com>
    Cc: David Sterba <dsterba@suse.cz>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
string.c 26.1 KB