• Mark Rutland's avatar
    locking/atomic: scripts: fix fallback ifdeffery · 6d2779ec
    Mark Rutland authored
    Since commit:
    
      9257959a ("locking/atomic: scripts: restructure fallback ifdeffery")
    
    The ordering fallbacks for atomic*_read_acquire() and
    atomic*_set_release() erroneously fall back to the implictly relaxed
    atomic*_read() and atomic*_set() variants respectively, without any
    additional barriers. This loses the ACQUIRE and RELEASE ordering
    semantics, which can result in a wide variety of problems, even on
    strongly-ordered architectures where the implementation of
    atomic*_read() and/or atomic*_set() allows the compiler to reorder those
    relative to other accesses.
    
    In practice this has been observed to break bit spinlocks on arm64,
    resulting in dentry cache corruption.
    
    The fallback logic was intended to allow ACQUIRE/RELEASE/RELAXED ops to
    be defined in terms of FULL ops, but where an op had RELAXED ordering by
    default, this unintentionally permitted the ACQUIRE/RELEASE ops to be
    defined in terms of the implicitly RELAXED default.
    
    This patch corrects the logic to avoid falling back to implicitly
    RELAXED ops, resulting in the same behaviour as prior to commit
    9257959a.
    
    I've verified the resulting assembly on arm64 by generating outlined
    wrappers of the atomics. Prior to this patch the compiler generates
    sequences using relaxed load (LDR) and store (STR) instructions, e.g.
    
    | <outlined_atomic64_read_acquire>:
    |         ldr     x0, [x0]
    |         ret
    |
    | <outlined_atomic64_set_release>:
    |         str     x1, [x0]
    |         ret
    
    With this patch applied the compiler generates sequences using the
    intended load-acquire (LDAR) and store-release (STLR) instructions, e.g.
    
    | <outlined_atomic64_read_acquire>:
    |         ldar    x0, [x0]
    |         ret
    |
    | <outlined_atomic64_set_release>:
    |         stlr    x1, [x0]
    |         ret
    
    To make sure that there were no other victims of the ifdeffery rewrite,
    I generated outlined copies of all of the {atomic,atomic64,atomic_long}
    atomic operations before and after commit 9257959a. A diff of
    the generated assembly on arm64 shows that only the read_acquire() and
    set_release() operations were changed, and only lost their intended
    ordering:
    
    | [mark@lakrids:~/src/linux]% diff -u \
    | 	<(aarch64-linux-gnu-objdump -d before-9257959a.o)
    | 	<(aarch64-linux-gnu-objdump -d after-9257959a.o)
    | --- /proc/self/fd/11    2023-09-19 16:51:51.114779415 +0100
    | +++ /proc/self/fd/16    2023-09-19 16:51:51.114779415 +0100
    | @@ -1,5 +1,5 @@
    |
    | -before-9257959a.o:     file format elf64-littleaarch64
    | +after-9257959a.o:     file format elf64-littleaarch64
    |
    |
    |  Disassembly of section .text:
    | @@ -9,7 +9,7 @@
    |         4:      d65f03c0        ret
    |
    |  0000000000000008 <outlined_atomic_read_acquire>:
    | -       8:      88dffc00        ldar    w0, [x0]
    | +       8:      b9400000        ldr     w0, [x0]
    |         c:      d65f03c0        ret
    |
    |  0000000000000010 <outlined_atomic_set>:
    | @@ -17,7 +17,7 @@
    |        14:      d65f03c0        ret
    |
    |  0000000000000018 <outlined_atomic_set_release>:
    | -      18:      889ffc01        stlr    w1, [x0]
    | +      18:      b9000001        str     w1, [x0]
    |        1c:      d65f03c0        ret
    |
    |  0000000000000020 <outlined_atomic_add>:
    | @@ -1230,7 +1230,7 @@
    |      1070:      d65f03c0        ret
    |
    |  0000000000001074 <outlined_atomic64_read_acquire>:
    | -    1074:      c8dffc00        ldar    x0, [x0]
    | +    1074:      f9400000        ldr     x0, [x0]
    |      1078:      d65f03c0        ret
    |
    |  000000000000107c <outlined_atomic64_set>:
    | @@ -1238,7 +1238,7 @@
    |      1080:      d65f03c0        ret
    |
    |  0000000000001084 <outlined_atomic64_set_release>:
    | -    1084:      c89ffc01        stlr    x1, [x0]
    | +    1084:      f9000001        str     x1, [x0]
    |      1088:      d65f03c0        ret
    |
    |  000000000000108c <outlined_atomic64_add>:
    | @@ -2427,7 +2427,7 @@
    |      207c:      d65f03c0        ret
    |
    |  0000000000002080 <outlined_atomic_long_read_acquire>:
    | -    2080:      c8dffc00        ldar    x0, [x0]
    | +    2080:      f9400000        ldr     x0, [x0]
    |      2084:      d65f03c0        ret
    |
    |  0000000000002088 <outlined_atomic_long_set>:
    | @@ -2435,7 +2435,7 @@
    |      208c:      d65f03c0        ret
    |
    |  0000000000002090 <outlined_atomic_long_set_release>:
    | -    2090:      c89ffc01        stlr    x1, [x0]
    | +    2090:      f9000001        str     x1, [x0]
    |      2094:      d65f03c0        ret
    |
    |  0000000000002098 <outlined_atomic_long_add>:
    
    I've build tested this with a variety of configs for alpha, arm, arm64,
    csky, i386, m68k, microblaze, mips, nios2, openrisc, powerpc, riscv,
    s390, sh, sparc, x86_64, and xtensa, for which I've seen no issues. I
    was unable to build test for ia64 and parisc due to existing build
    breakage in v6.6-rc2.
    
    Fixes: 9257959a ("locking/atomic: scripts: restructure fallback ifdeffery")
    Reported-by: default avatarMing Lei <ming.lei@redhat.com>
    Reported-by: default avatarDarrick J. Wong <djwong@kernel.org>
    Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
    Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
    Tested-by: default avatarBaokun Li <libaokun1@huawei.com>
    Link: https://lkml.kernel.org/r/20230919171430.2697727-1-mark.rutland@arm.com
    6d2779ec
gen-atomic-fallback.sh 8.34 KB