    MIPS: Loongson: Introduce and use loongson_llsc_mb() · e02e07e3
    Huacai Chen authored
    On the Loongson-2G/2H/3A/3B there is a hardware flaw: ll/sc and
    lld/scd are very weakly ordered. As a workaround we must add sync
    instructions "before each ll/lld" and "at the branch-target between
    ll/sc". Otherwise, this flaw will occasionally cause deadlock (e.g.
    when doing heavy load testing with LTP).
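
    The two barrier positions described above can be sketched in portable
    C. This is a minimal illustration, not the patch itself:
    sketch_llsc_mb(), sketch_cmpxchg() and the SKETCH_LOONGSON3_WORKAROUNDS
    switch are hypothetical names, and the C11 compare-exchange stands in
    for a hand-written ll/sc loop, so the barriers bracket the loop rather
    than sitting inside it.

```c
#include <stdatomic.h>

/*
 * Hypothetical switch standing in for a kernel config option; the barrier
 * is only emitted when building for MIPS with the switch defined.
 */
#if defined(__mips__) && defined(SKETCH_LOONGSON3_WORKAROUNDS)
/* Full SYNC; the "memory" clobber also stops compiler reordering. */
#define sketch_llsc_mb()	__asm__ __volatile__("sync" : : : "memory")
#else
/* Workaround disabled: expands to nothing at all. */
#define sketch_llsc_mb()	do { } while (0)
#endif

/*
 * Compare-and-exchange sketch. Returns the value observed in *v; the
 * store took place if and only if that value equals `old`.
 */
static int sketch_cmpxchg(atomic_int *v, int old, int new_val)
{
	int seen = old;

	/* Position 1: before the ll, so earlier accesses cannot drift in. */
	sketch_llsc_mb();

	/* Stand-in for the ll / compare / sc retry loop. */
	atomic_compare_exchange_strong(v, &seen, new_val);

	/* Position 2: where the "values differ" branch would land. */
	sketch_llsc_mb();

	return seen;
}
```

    On other architectures, or with the switch undefined, the macro
    expands to nothing, so the sketch adds no cost there.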
    
    Below is the explanation from the CPU designer:
    
    "For Loongson 3 family, when a memory access instruction (load, store,
    or prefetch)'s executing occurs between the execution of LL and SC, the
    success or failure of SC is not predictable. Although programmer would
    not insert memory access instructions between LL and SC, the memory
    instructions before LL in program-order, may dynamically executed
    between the execution of LL/SC, so a memory fence (SYNC) is needed
    before LL/LLD to avoid this situation.
    
    Since Loongson-3A R2 (3A2000), we have improved our hardware design to
    handle this case. But we later deduce a rarely circumstance that some
    speculatively executed memory instructions due to branch misprediction
    between LL/SC still fall into the above case, so a memory fence (SYNC)
    at branch-target (if its target is not between LL/SC) is needed for
    Loongson 3A1000, 3B1500, 3A2000 and 3A3000.
    
    Our processor is continually evolving and we aim to remove all these
    workaround-SYNCs around LL/SC for new-come processor."
    
    Here is an example:
    
    Both cpu1 and cpu2 simultaneously run atomic_add by 1 on the same atomic
    var. This bug sometimes causes the 'sc' run by both cpus (in atomic_add)
    to succeed at the same time ('sc' returns 1), so the variable is only
    *added by 1*, which is wrong and unacceptable (it should be added by 2).
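
    The lost update can be illustrated with a toy software model of ll/sc.
    This is a sketch under stated assumptions, not a description of the
    hardware: toy_ll(), toy_sc() and the `flawed` switch are hypothetical,
    and the flaw is modelled simply as a store-conditional that fails to
    break the other CPU's link.

```c
#include <stdbool.h>

/* Toy single-word memory with one link flag per simulated CPU. */
struct toy_llsc {
	int  value;
	bool linked[2];
};

/* ll: read the value and set this CPU's link flag. */
static int toy_ll(struct toy_llsc *m, int cpu)
{
	m->linked[cpu] = true;
	return m->value;
}

/*
 * sc: store only if this CPU's link is still held; return 1 on success
 * and 0 on failure. A successful store breaks the other CPU's link,
 * unless `flawed` is set, in which case that link wrongly survives.
 */
static int toy_sc(struct toy_llsc *m, int cpu, int newval, bool flawed)
{
	if (!m->linked[cpu])
		return 0;
	m->value = newval;
	m->linked[cpu] = false;
	if (!flawed)
		m->linked[1 - cpu] = false;
	return 1;
}

/*
 * Both CPUs load-link the same word first, then each tries to store
 * old + 1, retrying from ll whenever its sc fails.
 */
static int toy_two_cpu_add(bool flawed)
{
	struct toy_llsc m = { .value = 0, .linked = { false, false } };
	int old0 = toy_ll(&m, 0);
	int old1 = toy_ll(&m, 1);

	while (!toy_sc(&m, 0, old0 + 1, flawed))
		old0 = toy_ll(&m, 0);
	while (!toy_sc(&m, 1, old1 + 1, flawed))
		old1 = toy_ll(&m, 1);

	return m.value;
}
```

    With flawed set to false the second CPU's first sc fails and it
    retries, so the result is 2; with flawed set to true both sc
    operations succeed on the first attempt and the result is 1.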
    
    Why disable fix-loongson3-llsc in the compiler?
    Because the compiler fix will cause problems in the kernel's __ex_table
    section.
    
    This patch fixes all the cases in the kernel, but:
    
    +. the fix at the end of futex_atomic_cmpxchg_inatomic is for the
    branch-target of 'bne'. There are other cases in which
    smp_mb__before_llsc() and smp_llsc_mb() coincidentally fix the ll and
    the branch-target, such as atomic_sub_if_positive/cmpxchg/xchg, just
    like this one.
    
    +. Loongson 3 does not support CONFIG_EDAC_ATOMIC_SCRUB, so there is no
    need to touch edac.h
    
    +. local_ops and cmpxchg_local should not be affected by this bug since
    only the owner can write.
    
    +. mips_atomic_set for syscall.c is deprecated and rarely used, so it
    is left untouched
    
    Signed-off-by: Huacai Chen <chenhc@lemote.com>
    Signed-off-by: Huang Pei <huangpei@loongson.cn>
    [paul.burton@mips.com:
      - Simplify the addition of -mno-fix-loongson3-llsc to cflags, and add
        a comment describing why it's there.
      - Make loongson_llsc_mb() a no-op when
        CONFIG_CPU_LOONGSON3_WORKAROUNDS=n, rather than a compiler memory
        barrier.
      - Add a comment describing the bug & how loongson_llsc_mb() helps
        in asm/barrier.h.]
    Signed-off-by: Paul Burton <paul.burton@mips.com>
    Cc: Ralf Baechle <ralf@linux-mips.org>
    Cc: ambrosehua@gmail.com
    Cc: Steven J . Hill <Steven.Hill@cavium.com>
    Cc: linux-mips@linux-mips.org
    Cc: Fuxin Zhang <zhangfx@lemote.com>
    Cc: Zhangjin Wu <wuzhangjin@gmail.com>
    Cc: Li Xuefeng <lixuefeng@loongson.cn>
    Cc: Xu Chenghua <xuchenghua@loongson.cn>
bitops.h 15.8 KB