Commit f2091321 authored by WANG Xuerui's avatar WANG Xuerui Committed by Huacai Chen

raid6: Add LoongArch SIMD recovery implementation

Similar to the syndrome calculation, the recovery algorithms also work
on 64 bytes at a time to align with the L1 cache line size of current
and future LoongArch cores (that we care about). Which means
unrolled-by-4 LSX and unrolled-by-2 LASX code.

The assembly is originally based on the x86 SSSE3/AVX2 ports, but
register allocation has been redone to take advantage of LSX/LASX's 32
vector registers, and instruction sequence has been optimized to suit
(e.g. LoongArch can perform per-byte srl and andi on vectors, but x86
cannot).

Performance numbers measured by instrumenting the raid6test code, on a
3A5000 system clocked at 2.5GHz:

> lasx  2data: 354.987 MiB/s
> lasx  datap: 350.430 MiB/s
> lsx   2data: 340.026 MiB/s
> lsx   datap: 337.318 MiB/s
> intx1 2data: 164.280 MiB/s
> intx1 datap: 187.966 MiB/s

Because recovery algorithms are chosen solely based on priority and
availability, lasx is marked as priority 2 and lsx priority 1. At least
for the current generation of LoongArch micro-architectures, LASX should
always be faster than LSX whenever supported, and have similar power
consumption characteristics (because the only known LASX-capable uarch,
the LA464, always compute the full 256-bit result for vector ops).
Acked-by: default avatarSong Liu <song@kernel.org>
Signed-off-by: default avatarWANG Xuerui <git@xen0n.name>
Signed-off-by: default avatarHuacai Chen <chenhuacai@loongson.cn>
parent 8f3f06df
......@@ -125,6 +125,8 @@ extern const struct raid6_recov_calls raid6_recov_avx2;
extern const struct raid6_recov_calls raid6_recov_avx512;
extern const struct raid6_recov_calls raid6_recov_s390xc;
extern const struct raid6_recov_calls raid6_recov_neon;
extern const struct raid6_recov_calls raid6_recov_lsx;
extern const struct raid6_recov_calls raid6_recov_lasx;
extern const struct raid6_calls raid6_neonx1;
extern const struct raid6_calls raid6_neonx2;
......
......@@ -9,7 +9,7 @@ raid6_pq-$(CONFIG_ALTIVEC) += altivec1.o altivec2.o altivec4.o altivec8.o \
vpermxor1.o vpermxor2.o vpermxor4.o vpermxor8.o
raid6_pq-$(CONFIG_KERNEL_MODE_NEON) += neon.o neon1.o neon2.o neon4.o neon8.o recov_neon.o recov_neon_inner.o
raid6_pq-$(CONFIG_S390) += s390vx8.o recov_s390xc.o
raid6_pq-$(CONFIG_LOONGARCH) += loongarch_simd.o
raid6_pq-$(CONFIG_LOONGARCH) += loongarch_simd.o recov_loongarch_simd.o
hostprogs += mktables
......
......@@ -111,6 +111,14 @@ const struct raid6_recov_calls *const raid6_recov_algos[] = {
#endif
#if defined(CONFIG_KERNEL_MODE_NEON)
&raid6_recov_neon,
#endif
#ifdef CONFIG_LOONGARCH
#ifdef CONFIG_CPU_HAS_LASX
&raid6_recov_lasx,
#endif
#ifdef CONFIG_CPU_HAS_LSX
&raid6_recov_lsx,
#endif
#endif
&raid6_recov_intx1,
NULL
......
This diff is collapsed.
......@@ -65,7 +65,7 @@ else ifeq ($(HAS_ALTIVEC),yes)
OBJS += altivec1.o altivec2.o altivec4.o altivec8.o \
vpermxor1.o vpermxor2.o vpermxor4.o vpermxor8.o
else ifeq ($(ARCH),loongarch64)
OBJS += loongarch_simd.o
OBJS += loongarch_simd.o recov_loongarch_simd.o
endif
.c.o:
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment