Commit 291d47cc authored by Linus Torvalds

string: improve default out-of-line memcmp() implementation

This just does the "if the architecture does efficient unaligned
handling, start the memcmp using 'unsigned long' accesses", since
Nikolay Borisov found a load that cares.

This is basically the minimal patch, and limited to architectures that
are known to not have slow unaligned handling.  We've had the stupid
byte-at-a-time version forever, and nobody has ever even noticed before,
so let's keep the fix minimal.

A potential further improvement would be to align one of the sources in
order to at least minimize unaligned cases, but the only real case of
bigger memcmp() users seems to be the FIDEDUPERANGE ioctl().  As David
Sterba says, the dedupe ioctl is typically called on ranges spanning
many pages so the common case will all be page-aligned anyway.

All the relevant architectures select HAVE_EFFICIENT_UNALIGNED_ACCESS,
so I'm not going to worry about the combination of a very rare use-case
and a rare architecture until somebody actually hits it.  Particularly
since Nikolay also tested the more complex patch with extra alignment
handling code, and it only added overhead.
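
For illustration only, the "extra alignment handling" idea could look roughly like the userspace sketch below. This is not the patch Nikolay tested. The function name `memcmp_align_first` is hypothetical, and `memcpy()` stands in for the kernel's `get_unaligned()`. The byte-wise prologue aligns only the first source, so the second may still be unaligned, and the prologue adds branches to every call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: align the first source, then compare by words. */
int memcmp_align_first(const void *cs, const void *ct, size_t count)
{
	const unsigned char *su1 = cs, *su2 = ct;
	int res = 0;

	/* Prologue: compare bytes until su1 sits on a word boundary. */
	while (count > 0 && ((uintptr_t)su1 % sizeof(unsigned long)) != 0) {
		if ((res = *su1 - *su2) != 0)
			return res;
		++su1;
		++su2;
		count--;
	}

	/* Word loop: su1 is aligned here, su2 may still be unaligned. */
	while (count >= sizeof(unsigned long)) {
		unsigned long a, b;

		memcpy(&a, su1, sizeof(a));
		memcpy(&b, su2, sizeof(b));
		if (a != b)
			break;
		su1 += sizeof(unsigned long);
		su2 += sizeof(unsigned long);
		count -= sizeof(unsigned long);
	}

	/* Tail, or the bytes of the first mismatching word. */
	for (; count > 0; ++su1, ++su2, count--)
		if ((res = *su1 - *su2) != 0)
			break;
	return res;
}
```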

Link: https://lore.kernel.org/lkml/20210721135926.602840-1-nborisov@suse.com/
Reported-by: Nikolay Borisov <nborisov@suse.com>
Cc: David Sterba <dsterba@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent 7d2a07b7
@@ -29,6 +29,7 @@
 #include <linux/errno.h>
 #include <linux/slab.h>
 
+#include <asm/unaligned.h>
 #include <asm/byteorder.h>
 #include <asm/word-at-a-time.h>
 #include <asm/page.h>
@@ -935,6 +936,21 @@ __visible int memcmp(const void *cs, const void *ct, size_t count)
 	const unsigned char *su1, *su2;
 	int res = 0;
 
+#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+	if (count >= sizeof(unsigned long)) {
+		const unsigned long *u1 = cs;
+		const unsigned long *u2 = ct;
+		do {
+			if (get_unaligned(u1) != get_unaligned(u2))
+				break;
+			u1++;
+			u2++;
+			count -= sizeof(unsigned long);
+		} while (count >= sizeof(unsigned long));
+		cs = u1;
+		ct = u2;
+	}
+#endif
 	for (su1 = cs, su2 = ct; 0 < count; ++su1, ++su2, count--)
 		if ((res = *su1 - *su2) != 0)
 			break;