1. 29 Jun, 2023 17 commits
    • LoongArch: Support dbar with different hints · e031a5f3
      Huacai Chen authored
      Traditionally, LoongArch uses "dbar 0" (full completion barrier) for
      everything. But the full completion barrier is a performance killer, so
      Loongson-3A6000 and newer processors have made finer granularity hints
      available:
      
      Bit4: ordering or completion (0: completion, 1: ordering)
      Bit3: barrier for previous read (0: true, 1: false)
      Bit2: barrier for previous write (0: true, 1: false)
      Bit1: barrier for succeeding read (0: true, 1: false)
      Bit0: barrier for succeeding write (0: true, 1: false)
      
      Hint 0x700: barrier for "read after read" from the same address, which
      is needed by LL-SC loops on old models (dbar 0x700 behaves the same as
      nop if such reordering is disabled on new models).
      
      This patch makes use of the various new hints for different kinds of
      memory barriers. It brings performance improvements on Loongson-3A6000
      series, while not affecting the existing models because all variants are
      treated as 'dbar 0' there.
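
The bit layout listed above can be illustrated with a small helper that composes a hint value from the five properties. This is only a sketch: the macro and function names below are hypothetical and are not the identifiers used by the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit layout as listed in the commit message (names are illustrative). */
#define DEMO_DBAR_ORDERING    (1u << 4) /* 0: completion, 1: ordering */
#define DEMO_DBAR_NO_PREV_RD  (1u << 3) /* set: previous reads are NOT covered */
#define DEMO_DBAR_NO_PREV_WR  (1u << 2) /* set: previous writes are NOT covered */
#define DEMO_DBAR_NO_SUCC_RD  (1u << 1) /* set: succeeding reads are NOT covered */
#define DEMO_DBAR_NO_SUCC_WR  (1u << 0) /* set: succeeding writes are NOT covered */

/*
 * Hypothetical helper: build a dbar hint from "what should be covered".
 * Note the inverted sense of bits 3..0: a cleared bit means the barrier
 * applies to that class of access.
 */
static unsigned int demo_dbar_hint(bool ordering, bool prev_rd, bool prev_wr,
				   bool succ_rd, bool succ_wr)
{
	unsigned int hint = 0;

	if (ordering)
		hint |= DEMO_DBAR_ORDERING;
	if (!prev_rd)
		hint |= DEMO_DBAR_NO_PREV_RD;
	if (!prev_wr)
		hint |= DEMO_DBAR_NO_PREV_WR;
	if (!succ_rd)
		hint |= DEMO_DBAR_NO_SUCC_RD;
	if (!succ_wr)
		hint |= DEMO_DBAR_NO_SUCC_WR;

	assert(hint <= 0x1f); /* only bits 4..0 are used here */
	return hint;
}
```

With this encoding, covering every access class as a completion barrier yields 0 (the traditional "dbar 0"), while an ordering barrier between previous reads and succeeding reads yields 0b10101.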
      
      Why override queued_spin_unlock()?
      After commit 01e3b958 ("drivers: Remove explicit invocations
      of mmiowb()") we need a completion barrier in queued_spin_unlock(), but
      the generic implementation uses smp_store_release(), which only provides an
      ordering barrier.
      Signed-off-by: Jun Yi <yijun@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Add SMT (Simultaneous Multi-Threading) support · f6f0c9a7
      Huacai Chen authored
      Loongson-3A6000 has SMT (Simultaneous Multi-Threading) support: each
      physical core has two logical cores (threads). This patch adds SMT probing
      and scheduler support via ACPI PPTT.
      
      If SCHED_SMT is enabled, Loongson-3A6000 is treated as 4 cores, 8 threads;
      if SCHED_SMT is disabled, Loongson-3A6000 is treated as 8 cores, 8 threads.
      
      Remove smp_num_siblings to support HMP (Heterogeneous Multi-Processing).
      Signed-off-by: Liupu Wang <wangliupu@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Add vector extensions support · 61650023
      Huacai Chen authored
      Add support for LoongArch's vector extensions, which include 128-bit LSX
      (i.e., Loongson SIMD eXtension) and 256-bit LASX (i.e., Loongson Advanced
      SIMD eXtension).
      
      The Linux kernel doesn't use vector instructions itself; it only handles
      exceptions and context save/restore. So it only needs a subset of these
      instructions:
      
      * Vector load/store:   vld vst vldx vstx xvld xvst xvldx xvstx
      * 8bit-elements move:  vpickve2gr.b xvpickve2gr.b vinsgr2vr.b xvinsgr2vr.b
      * 16bit-elements move: vpickve2gr.h xvpickve2gr.h vinsgr2vr.h xvinsgr2vr.h
      * 32bit-elements move: vpickve2gr.w xvpickve2gr.w vinsgr2vr.w xvinsgr2vr.w
      * 64bit-elements move: vpickve2gr.d xvpickve2gr.d vinsgr2vr.d xvinsgr2vr.d
      * Elements permute:    vpermi.w vpermi.d xvpermi.w xvpermi.d xvpermi.q
      
      Introduce AS_HAS_LSX_EXTENSION and AS_HAS_LASX_EXTENSION so that
      non-vector toolchains do not complain about unsupported instructions.
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Add support to clone a time namespace · aa5e65dc
      Tiezhu Yang authored
      We can see that "Time namespaces are not supported" on LoongArch:
      
      (1) clone3 test
        # cd tools/testing/selftests/clone3 && make && ./clone3
        ...
        # Time namespaces are not supported
        ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
        # Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0
      
      (2) timens test
        # cd tools/testing/selftests/timens && make && ./timens
        ...
        1..0 # SKIP Time namespaces are not supported
      
      On LoongArch the current kernel does not support CONFIG_TIME_NS, which
      depends on GENERIC_VDSO_TIME_NS. Select GENERIC_VDSO_TIME_NS so that
      CONFIG_TIME_NS can be enabled and kernel/time/namespace.c is built.
      
      Additionally, it needs to define some arch-dependent functions for the
      timens, such as __arch_get_timens_vdso_data(), arch_get_vdso_data() and
      vdso_join_timens().
      
      At the same time, modify the layout of vvar: use one page for generic
      vdso data, add another page for timens vdso data, and assign
      LOONGARCH_VDSO_DATA_SIZE (which may exceed one page if it is expanded in
      the future) for LoongArch vdso data. Finally, add the callback function
      vvar_fault() and modify stack_top().
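
The vvar layout described above can be pictured as an enumeration of page indices. The following is a minimal sketch under stated assumptions: every DEMO_ name is hypothetical, and the 16 KiB page size and 4 KiB arch data size are placeholder values chosen for illustration, not values taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE            16384 /* assumption: 16 KiB pages */
#define DEMO_LOONGARCH_DATA_SIZE  4096  /* assumption: arch data fits in one page */

/* Round the arch-specific data size up to a whole number of pages. */
#define DEMO_LOONGARCH_DATA_PAGES \
	((DEMO_LOONGARCH_DATA_SIZE + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE)

/* Page indices inside the vvar area, in the order the text describes. */
enum demo_vvar_pages {
	DEMO_VVAR_GENERIC_PAGE,          /* generic vdso data: one page */
	DEMO_VVAR_TIMENS_PAGE,           /* timens vdso data: one page */
	DEMO_VVAR_LOONGARCH_PAGES_START, /* first page of arch vdso data */
	DEMO_VVAR_LOONGARCH_PAGES_END =
		DEMO_VVAR_LOONGARCH_PAGES_START + DEMO_LOONGARCH_DATA_PAGES - 1,
	DEMO_VVAR_NR_PAGES,
};

/* Byte offset of a given page from the start of the vvar area. */
static size_t demo_vvar_offset(enum demo_vvar_pages page)
{
	assert(page < DEMO_VVAR_NR_PAGES);
	return (size_t)page * DEMO_PAGE_SIZE;
}

/* Total size of the vvar area in bytes. */
static size_t demo_vvar_size(void)
{
	return (size_t)DEMO_VVAR_NR_PAGES * DEMO_PAGE_SIZE;
}
```

Because the last arch page is derived from the data size, the total page count grows automatically if the arch data ever exceeds one page.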
      
      With this patch under CONFIG_TIME_NS:
      
      (1) clone3 test
        # cd tools/testing/selftests/clone3 && make && ./clone3
        ...
        ok 18 [739] Result (0) matches expectation (0)
        # Totals: pass:18 fail:0 xfail:0 xpass:0 skip:0 error:0
      
      (2) timens test
        # cd tools/testing/selftests/timens && make && ./timens
        ...
        # Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • Makefile: Add loongarch target flag for Clang compilation · 65eea6b4
      WANG Xuerui authored
      The LoongArch kernel is 64-bit and built with the soft-float ABI,
      hence the loongarch64-linux-gnusf target. (The "libc" part can affect
      the codegen of libcalls: other arches do not use a bare-metal target,
      and currently the only fully supported libc on LoongArch is glibc
      anyway.)
      
      See: https://lore.kernel.org/loongarch/CAKwvOdnimxv8oJ4mVY74zqtt1x7KTMrWvn2_T9x22SFDbU6rHQ@mail.gmail.com/
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: WANG Xuerui <git@xen0n.name>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Mark Clang LTO as working · 5a31ed46
      WANG Xuerui authored
      Confirmed working with QEMU system emulation.
      Acked-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: WANG Xuerui <git@xen0n.name>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Include KBUILD_CPPFLAGS in CHECKFLAGS invocation · 5ddc7a37
      WANG Xuerui authored
      This is a port of commit 08f6554f ("mips: Include KBUILD_CPPFLAGS in
      CHECKFLAGS invocation") to arch/loongarch, for fixing cross-compilation
      of Linux/LoongArch with Clang, where previously the `--target` flag
      would no longer be present for the CHECKFLAGS cc invocation leading to
      build failure.
      Reported-by: Nathan Chancellor <nathan@kernel.org>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Link: https://github.com/ClangBuiltLinux/linux/issues/1787#issuecomment-1608306002
      Signed-off-by: WANG Xuerui <git@xen0n.name>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: vDSO: Use CLANG_FLAGS instead of filtering out '--target=' · b89673a9
      WANG Xuerui authored
      This is a port of commit 76d7fff2 ("MIPS: VDSO: Use CLANG_FLAGS
      instead of filtering out '--target='") to arch/loongarch, for fixing
      cross-compilation with Clang.
      Reported-by: Nathan Chancellor <nathan@kernel.org>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Link: https://github.com/ClangBuiltLinux/linux/issues/1787#issuecomment-1608306002
      Signed-off-by: WANG Xuerui <git@xen0n.name>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Tweak CFLAGS for Clang compatibility · 38b10b26
      WANG Xuerui authored
      Now that the arch code is mostly ready for LLVM/Clang consumption, it is time
      to re-organize the CFLAGS a little to actually enable the LLVM build.
      Namely, all -G0 switches from CFLAGS are removed, and -mexplicit-relocs
      and -mdirect-extern-access are now wrapped with cc-option (with the
      related asm/percpu.h definition guarded against toolchain combos that
      are known to not work).
      
      A build with !RELOCATABLE && !MODULE is confirmed working within a QEMU
      environment; support for the two features is currently blocked on
      LLVM/Clang, and will come later.
      
      Why -G0 can be removed:
      
      In GCC, -G stands for "small data threshold", which instructs the
      compiler to put data smaller than the specified threshold in a dedicated
      "small data" section (called .sdata on LoongArch and several other
      arches).
      
      However, benefiting from this would require ABI cooperation, which is
      not the case for LoongArch; and current GCC behaves the same whether -G0
      (equal to disabling this optimization) is given or not. So, remove -G0
      from CFLAGS altogether for one less thing to care about. This also
      benefits LLVM/Clang compatibility where the -G switch is not supported.
      
      Why -mexplicit-relocs can now be conditionally applied without
      regressions:
      
      Originally, -mexplicit-relocs was unconditionally added to CFLAGS in case
      of CONFIG_AS_HAS_EXPLICIT_RELOCS, because not having it (i.e. old GCC +
      new binutils) would not work: modules would have R_LARCH_ABS_* relocs
      inside. Given the rarity of such a toolchain combo in the wild, it may
      not be worthwhile to support it, so support for such relocs in modules
      was not added back when explicit relocs support was upstreamed, and
      -mexplicit-relocs was unconditionally added to fail the build early.
      
      Now that Clang compatibility is desired, given Clang is behaving like
      -mexplicit-relocs from day one but without support for the CLI flag, we
      must ensure the flag is not passed in case of Clang. However, explicit
      compiler flavor checks can be more brittle than feature detection: in
      this case what actually matters is support for __attribute__((model))
      when building modules. Given that neither older GCC nor current Clang
      supports this attribute, probing for the attribute support and #error'ing
      out would allow proper UX without checking for Clang, and also
      automatically work when Clang support for the attribute is added in the
      future.
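
The feature-detection approach described above (probe for the attribute, fail the build only where it is actually required) might look roughly like the following. This is a sketch, not the patch's asm/percpu.h code: DEMO_BUILDING_MODULE and the other DEMO_ names are hypothetical stand-ins for the real configuration macros.

```c
/* Fallback for compilers that lack the __has_attribute() probe itself. */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/* Probe for the feature, not for the compiler flavor. */
#if __has_attribute(model)
# define DEMO_HAS_MODEL_ATTR 1
#else
# define DEMO_HAS_MODEL_ATTR 0
#endif

/*
 * Only error out in the configuration that really needs the attribute.
 * DEMO_BUILDING_MODULE is a hypothetical macro standing in for "this
 * translation unit is part of a module".
 */
#if defined(DEMO_BUILDING_MODULE) && !DEMO_HAS_MODEL_ATTR
# error "compiler support for the model attribute is required for modules"
#endif

/* Report the probe result at run time, for illustration only. */
static int demo_compiler_has_model_attr(void)
{
	return DEMO_HAS_MODEL_ATTR;
}
```

A check written this way needs no update when a compiler gains the attribute later: the probe simply starts returning 1.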
      
      Why -mdirect-extern-access is now conditionally applied:
      
      This is actually a nice-to-have optimization that can reduce GOT
      accesses, but not having it is harmless too. Because Clang does not
      support the option currently, but might do so in the future, conditional
      application via cc-option ensures compatibility with both current and
      future Clang versions.
      
      Suggested-by: Xi Ruoyao <xry111@xry111.site> # cc-option changes
      Signed-off-by: WANG Xuerui <git@xen0n.name>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Simplify the invtlb wrappers · 83d8b389
      WANG Xuerui authored
      The invtlb instruction has been supported by upstream LoongArch
      toolchains from day one, so ditch the raw opcode trickery and just use
      plain inline asm for it.
      
      While at it, also make the invtlb asm statements barriers, for proper
      modeling of the side effects. The functions are also marked as
      __always_inline instead of just "inline", because they cannot work at
      all if not inlined: the op argument will not be compile-time const in
      that case, thus failing to satisfy the "i" constraint.
      
      The signatures of the other, more specific invtlb wrappers contain unused
      arguments right now, but these are not removed right away in order to
      keep the patch focused. In the meantime, assertions are added to ensure
      no accidental misuse happens before the refactor. (The more specific
      wrappers cannot re-use the generic invtlb wrapper, because the ISA
      manual says $zero shall be used in case a particular op does not take
      the respective argument: re-using the generic wrapper would mean losing
      control over the register usage.)
      Signed-off-by: WANG Xuerui <git@xen0n.name>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Make the CPUCFG&CSR ops simple aliases of compiler built-ins · 53a4858c
      WANG Xuerui authored
      In addition to less visual clutter, this also makes Clang happy
      regarding the const-ness of arguments. In the original approach, all
      Clang gets to see is the incoming arguments whose const-ness cannot be
      proven without first being inlined; so Clang errors out here while GCC
      is fine.
      
      While at it, tweak several printk format strings because the return type
      of csr_read64 becomes effectively unsigned long, instead of unsigned
      long long.
      Signed-off-by: WANG Xuerui <git@xen0n.name>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Prepare for assemblers with proper FCSR class support · 38bb46f9
      WANG Xuerui authored
      The GNU assembler (as of 2.40) mis-treats FCSR operands as GPRs, but
      the LLVM IAS does not. Probe for this and refer to FCSRs as "$fcsrNN"
      if support is present.
      Signed-off-by: WANG Xuerui <git@xen0n.name>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: extable: Also recognize ABI names of registers · 24da0249
      WANG Rui authored
      When the kernel is compiled with LLVM, the register names being handled
      during exception fixup building are ABI names instead of bare $rNN
      style. Add mapping for the ABI names for LLVM compatibility.
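
The name-to-number mapping can be illustrated in C, although the patch itself works at the assembler level. This is a sketch only: the function and table names are hypothetical, and r21 is listed as "u0" on the assumption that this is the name the kernel uses for that reserved register.

```c
#include <stdlib.h>
#include <string.h>

/* ABI names of the 32 general-purpose registers, indexed by number. */
static const char *const demo_abi_names[32] = {
	"zero", "ra", "tp", "sp",
	"a0", "a1", "a2", "a3", "a4", "a5", "a6", "a7",
	"t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8",
	"u0", /* r21: assumed name, reserved in the psABI */
	"fp",
	"s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8",
};

/*
 * Hypothetical helper: map "$rNN" or an ABI name such as "$a0" to the
 * register number, or return -1 if the name is not recognized.
 */
static int demo_gpr_num(const char *name)
{
	char *end;
	long n;
	int i;

	if (name == NULL || name[0] != '$')
		return -1;
	name++;

	/* Bare numeric style: $r0 .. $r31 ("$ra" is not caught here). */
	if (name[0] == 'r' && name[1] >= '0' && name[1] <= '9') {
		n = strtol(name + 1, &end, 10);
		if (*end == '\0' && n >= 0 && n <= 31)
			return (int)n;
		return -1;
	}

	/* ABI style: look the name up in the table. */
	for (i = 0; i < 32; i++)
		if (strcmp(name, demo_abi_names[i]) == 0)
			return i;

	return -1;
}
```

Both spellings of the same register resolve to the same number, so "$r4" and "$a0" both give 4.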
      Signed-off-by: WANG Rui <wangrui@loongson.cn>
      Signed-off-by: WANG Xuerui <git@xen0n.name>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Calculate various sizes in the linker script · 414cefc7
      WANG Rui authored
      Taking the address delta between symbols in different sections is not
      supported by the LLVM IAS. Instead, do this in the linker script, so
      the same data can be properly referenced in assembly.
      Signed-off-by: WANG Rui <wangrui@loongson.cn>
      Signed-off-by: WANG Xuerui <git@xen0n.name>
      [chenhuacai: Fix build with !CONFIG_EFI_STUB]
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Add guard for the larch_insn_gen_xxx functions · 0d03e9dc
      WANG Rui authored
      Add guards to the larch_insn_gen_xxx functions to verify that the
      immediate operand is within the acceptable range.
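
A range guard of this kind can be sketched for a single instruction with a 20-bit signed immediate (lu12i.w). Everything below is an illustration under assumptions: the DEMO_ names are hypothetical, the opcode constants reflect my reading of the ISA encoding, and returning a break instruction on a bad operand is an assumed failure policy, not something the commit message states.

```c
#include <stdint.h>

#define DEMO_INSN_BREAK  0x002a0000u /* assumed encoding of "break 0" */
#define DEMO_LU12IW_OP   0x14000000u /* assumed opcode bits of lu12i.w */

/* A 20-bit signed immediate covers [-2^19, 2^19 - 1]. */
#define DEMO_SI20_MIN    (-(1L << 19))
#define DEMO_SI20_MAX    ((1L << 19) - 1)

/*
 * Hypothetical generator: encode "lu12i.w rd, imm", or fall back to a
 * break instruction if an operand does not fit its field.
 */
static uint32_t demo_gen_lu12iw(unsigned int rd, long imm)
{
	/* Guard: reject operands that would be silently truncated. */
	if (rd > 31 || imm < DEMO_SI20_MIN || imm > DEMO_SI20_MAX)
		return DEMO_INSN_BREAK;

	/* si20 occupies bits 24..5, rd occupies bits 4..0. */
	return DEMO_LU12IW_OP | (((uint32_t)imm & 0xfffffu) << 5) | rd;
}
```

Without the guard, an out-of-range immediate would be masked to 20 bits and produce a valid-looking but wrong instruction, which is much harder to debug than an explicit failure.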
      Signed-off-by: WANG Rui <wangrui@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Delete unnecessary debugfs checking · d7c24960
      Dan Carpenter authored
      Debugfs functions are not supposed to be checked for errors.  This
      is sort of unusual but it is described in the comments for the
      debugfs_create_dir() function.  Also debugfs_create_dir() can never
      return NULL.
      Reviewed-by: WANG Xuerui <git@xen0n.name>
      Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: Set CPU#0 as the io master for FDT · 872b368b
      Huacai Chen authored
      ACPI systems set io masters by parsing the ACPI MADT; FDT systems have no
      MADT, so we explicitly set CPU#0 as the io master. Otherwise CPU#0 would
      be considered hotpluggable.
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
  2. 27 Jun, 2023 1 commit
  3. 25 Jun, 2023 5 commits
  4. 23 Jun, 2023 17 commits