    tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()` · 553845ee
    Ammar Faizi authored
    Simplify memcpy() and memmove() on the x86-64 arch.
    
    The x86-64 arch has a 'rep movsb' instruction, which can perform
    memcpy() using only a single instruction, given:
    
        %rdi = destination
        %rsi = source
        %rcx = length
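
    As a rough illustration only (this is not the patch's code; the message
    shows just its disassembly), the forward case can be written with
    GCC/Clang extended inline asm, where the "D", "S" and "c" constraints
    bind the pointers and the length to %rdi, %rsi and %rcx. The name
    movsb_memcpy() is hypothetical, and the #else branch is only a portable
    fallback so the sketch also builds on non-x86-64 hosts.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical name, chosen to avoid clashing with libc's memcpy(). */
    static void *movsb_memcpy(void *dst, const void *src, size_t len)
    {
        void *ret = dst;

    #if defined(__x86_64__)
        /*
         * "+D", "+S" and "+c" place dst, src and len in %rdi, %rsi and
         * %rcx. rep movsb copies %rcx bytes forward (DF=0 on entry per
         * the SysV ABI), updating all three registers as it goes.
         */
        __asm__ volatile ("rep movsb"
                          : "+D"(dst), "+S"(src), "+c"(len)
                          :
                          : "memory");
    #else
        /* Portable byte loop so the sketch also builds off x86-64. */
        unsigned char *d = dst;
        const unsigned char *s = src;

        while (len--)
            *d++ = *s++;
    #endif
        return ret;
    }
    ```

    No 'cld' is needed here: the SysV x86-64 ABI guarantees DF=0 on
    function entry, so the copy already runs forward.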
    
    It can also handle overlapping buffers: with DF=1 (set by 'std'), the
    copy runs backward from the last byte, which makes it usable as the
    memmove() implementation.
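
    A minimal sketch of the backward case, under the same assumptions
    (GCC/Clang inline asm, hypothetical function name): both pointers must
    first be moved to the last byte, because with DF=1 'rep movsb'
    decrements %rsi and %rdi after each byte. DF is cleared again with
    'cld' because the ABI expects it clear on return.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical helper: copy len bytes, starting from the last one. */
    static void *movsb_copy_backward(void *dst, const void *src, size_t len)
    {
        unsigned char *d = dst;
        const unsigned char *s = src;

        if (len == 0)
            return dst;

    #if defined(__x86_64__)
        /* With DF=1, rep movsb decrements %rsi/%rdi after each byte,
         * so both must start at the last byte of their buffer. */
        d += len - 1;
        s += len - 1;
        __asm__ volatile ("std\n\t"
                          "rep movsb\n\t"
                          "cld" /* the ABI requires DF=0 again on return */
                          : "+D"(d), "+S"(s), "+c"(len)
                          :
                          : "memory");
    #else
        /* Portable fallback: index from len - 1 down to 0. */
        while (len--)
            d[len] = s[len];
    #endif
        return dst;
    }
    ```

    Shifting "abcdef" two bytes to the right with this helper
    (dst = buf + 2, src = buf, len = 4) yields "ababcd"; a forward copy of
    the same overlapping range would produce "ababab" instead.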
    
    Before this patch:
    ```
      00000000000010ab <memmove>:
        10ab: 48 89 f8              mov    %rdi,%rax
        10ae: 31 c9                 xor    %ecx,%ecx
        10b0: 48 39 f7              cmp    %rsi,%rdi
        10b3: 48 83 d1 ff           adc    $0xffffffffffffffff,%rcx
        10b7: 48 85 d2              test   %rdx,%rdx
        10ba: 74 25                 je     10e1 <memmove+0x36>
        10bc: 48 83 c9 01           or     $0x1,%rcx
        10c0: 48 39 f0              cmp    %rsi,%rax
        10c3: 48 c7 c7 ff ff ff ff  mov    $0xffffffffffffffff,%rdi
        10ca: 48 0f 43 fa           cmovae %rdx,%rdi
        10ce: 48 01 cf              add    %rcx,%rdi
        10d1: 44 8a 04 3e           mov    (%rsi,%rdi,1),%r8b
        10d5: 44 88 04 38           mov    %r8b,(%rax,%rdi,1)
        10d9: 48 01 cf              add    %rcx,%rdi
        10dc: 48 ff ca              dec    %rdx
        10df: 75 f0                 jne    10d1 <memmove+0x26>
        10e1: c3                    ret
    
      00000000000010e2 <memcpy>:
        10e2: 48 89 f8              mov    %rdi,%rax
        10e5: 48 85 d2              test   %rdx,%rdx
        10e8: 74 12                 je     10fc <memcpy+0x1a>
        10ea: 31 c9                 xor    %ecx,%ecx
        10ec: 40 8a 3c 0e           mov    (%rsi,%rcx,1),%dil
        10f0: 40 88 3c 08           mov    %dil,(%rax,%rcx,1)
        10f4: 48 ff c1              inc    %rcx
        10f7: 48 39 ca              cmp    %rcx,%rdx
        10fa: 75 f0                 jne    10ec <memcpy+0xa>
        10fc: c3                    ret
    ```
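
    For comparison, the "before" memmove loop corresponds roughly to the C
    below. This is a reconstruction from the listing, not the verbatim
    nolibc source, and the function name is hypothetical: the 'adc'/'or'
    pair picks the step (+1 or -1) and 'cmovae' picks the starting index.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical name; reconstructed from the "before" listing. */
    static void *bytewise_memmove(void *dst, const void *src, size_t len)
    {
        size_t pos = len;         /* dst >= src: start past the end ... */
        size_t dir = (size_t)-1;  /* ... and step backward              */

        /* Compare addresses as integers, like the listing's cmp/adc. */
        if ((uintptr_t)dst < (uintptr_t)src) {
            pos = (size_t)-1;     /* dst < src: start before index 0 ... */
            dir = 1;              /* ... and step forward                */
        }

        while (len) {
            pos += dir;           /* unsigned wraparound: (size_t)-1 + 1 == 0 */
            ((unsigned char *)dst)[pos] = ((const unsigned char *)src)[pos];
            len--;
        }
        return dst;
    }
    ```

    The "before" memcpy is the same loop with the direction fixed to
    forward: one byte per iteration and no overlap check.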
    
    After this patch:
    ```
      // memmove is an alias for memcpy
      000000000040133b <memcpy>:
        40133b: 48 89 d1              mov    %rdx,%rcx
        40133e: 48 89 f8              mov    %rdi,%rax
        401341: 48 89 fa              mov    %rdi,%rdx
        401344: 48 29 f2              sub    %rsi,%rdx
        401347: 48 39 ca              cmp    %rcx,%rdx
        40134a: 72 03                 jb     40134f <memcpy+0x14>
        40134c: f3 a4                 rep movsb %ds:(%rsi),%es:(%rdi)
        40134e: c3                    ret
        40134f: 48 8d 7c 0f ff        lea    -0x1(%rdi,%rcx,1),%rdi
        401354: 48 8d 74 0e ff        lea    -0x1(%rsi,%rcx,1),%rsi
        401359: fd                    std
        40135a: f3 a4                 rep movsb %ds:(%rsi),%es:(%rdi)
        40135c: fc                    cld
        40135d: c3                    ret
    ```
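
    The new function chooses the direction with a single unsigned
    comparison: %rdx = dst - src, and 'jb' takes the backward path when that
    difference is below the length. A minimal C sketch of the same test
    (the helper name is hypothetical) shows why one comparison is enough:

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /*
     * Hypothetical helper mirroring "sub %rsi,%rdx; cmp %rcx,%rdx; jb".
     * Returns nonzero when a forward copy could overwrite source bytes
     * before they are read, i.e. when dst lies in [src, src + len).
     */
    static int needs_backward_copy(const void *dst, const void *src, size_t len)
    {
        /*
         * Unsigned arithmetic folds both cases into one comparison:
         *  - dst < src: the difference wraps to a value larger than
         *    any valid len, so the forward path is taken;
         *  - dst >= src: the difference is the byte distance, which is
         *    below len only when the buffers overlap.
         */
        return (uintptr_t)dst - (uintptr_t)src < len;
    }
    ```

    When dst == src the test also selects the backward path; copying a
    buffer onto itself is harmless in either direction.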
    
    v3:
      - Make memmove an alias for memcpy (Willy).
      - Make the forward copy the likely case (Alviro).
    
    v2:
      - Fix the broken memmove implementation (David).
    
    Link: https://lore.kernel.org/lkml/20230902062237.GA23141@1wt.eu
    Link: https://lore.kernel.org/lkml/5a821292d96a4dbc84c96ccdc6b5b666@AcuMS.aculab.com
    Suggested-by: David Laight <David.Laight@aculab.com>
    Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
    Signed-off-by: Willy Tarreau <w@1wt.eu>
    Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>