[PATCH] faster copy_*_user for bad alignments on intel ia32
This patch speeds up copy_*_user for some Intel ia32 processors. It is based on work by Mala Anand. It is a good win: around 30% for all src/dest alignments except 32/32.

In this test, a fully-cached one-gigabyte file was read into an 8192-byte userspace buffer using read(fd, buf, 8192). The alignment of the user-side buffer was altered between runs (a sketch of such a harness appears at the end of this changelog). This is a PIII. Times are in seconds.

User buffer     2.5.41    2.5.41+patch
0x804c000        4.373    4.343
0x804c001       10.024    6.401
0x804c002       10.002    6.347
0x804c003       10.013    6.328
0x804c004       10.105    6.273
0x804c005       10.184    6.323
0x804c006       10.179    6.322
0x804c007       10.185    6.319
0x804c008        9.725    6.347
0x804c009        9.780    6.275
0x804c00a        9.779    6.355
0x804c00b        9.778    6.350
0x804c00c        9.723    6.351
0x804c00d        9.790    6.307
0x804c00e        9.790    6.289
0x804c00f        9.785    6.294
0x804c010        9.727    6.277
0x804c011        9.779    6.251
0x804c012        9.783    6.246
0x804c013        9.786    6.245
0x804c014        9.772    6.063
0x804c015        9.919    6.237
0x804c016        9.920    6.234
0x804c017        9.918    6.237
0x804c018        9.846    6.372
0x804c019       10.060    6.294
0x804c01a       10.049    6.328
0x804c01b       10.041    6.337
0x804c01c        9.931    6.347
0x804c01d       10.013    6.273
0x804c01e       10.020    6.346
0x804c01f       10.016    6.356
0x804c020        4.442    4.366

So `rep;movsl' is slower at all non-cache-aligned offsets.

The PII uses the PIII alignment. I don't have a PII any more, but I do recall that it demonstrated the same behaviour as the PIII.

The patch contains an enhancement (based on careful testing) from Hirokazu Takahashi <taka@valinux.co.jp>: in cases where source and dest have the same alignment, but that alignment is poor, we do a short copy of a few bytes to bring the two pointers onto a favourable boundary and then do the big copy (see the first sketch below). It also contains a bugfix from Hirokazu Takahashi.

As an added bonus, this patch decreases the kernel text by 28 kbytes: 22k of this is in .text and the rest in __ex_table. I'm not really sure why .text shrank so much.

These copy routines have no special case for constant-sized copies, so a lot of uaccess.h becomes dead code with this patch. The next patch, which uninlines the copy_*_user functions, cleans all that up and saves an additional 5k.
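For illustration only, here is a minimal userspace C sketch of the same-alignment fix-up described above. It is not the patch's assembly; the function name aligned_copy and the 4-byte boundary are assumptions made for this example.

	#include <stddef.h>
	#include <string.h>

	/* Hypothetical sketch: when src and dst share the same misalignment
	 * within a word, a short byte copy brings both onto a word boundary,
	 * after which the bulk (rep;movsl-style) copy runs on favourable
	 * addresses. */
	static void *aligned_copy(void *dst, const void *src, size_t n)
	{
		unsigned char *d = dst;
		const unsigned char *s = src;

		if (((size_t)d & 3) == ((size_t)s & 3)) {
			size_t head = (4 - ((size_t)d & 3)) & 3;

			if (head > n)
				head = n;
			n -= head;
			while (head--)		/* short copy of a few bytes... */
				*d++ = *s++;
		}
		memcpy(d, s, n);		/* ...then the big copy */
		return dst;
	}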
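And a sketch of the kind of harness behind the table above, assuming a pre-created, fully-cached test file; the file path, the offset handling and the timing method are assumptions for this example, not part of the patch:

	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	/* Hypothetical benchmark: read a cached file into an 8192-byte buffer
	 * whose start is offset by argv[1] bytes, mimicking the
	 * 0x804c000..0x804c020 sweep in the table above. */
	int main(int argc, char **argv)
	{
		static char buf[8192 + 32];	/* room for a 0..32 byte offset */
		long off = (argc > 1) ? atol(argv[1]) : 0;
		int fd = open("/tmp/one-gig-file", O_RDONLY);	/* assumed path */

		if (fd < 0) {
			perror("open");
			return 1;
		}
		if (off < 0 || off > 32)
			off = 0;
		while (read(fd, buf + off, 8192) > 0)
			;	/* time this loop externally, e.g. with time(1) */
		close(fd);
		return 0;
	}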