  10 Jul, 2024 (3 commits)
    • riscv: Improve sbi_ecall() code generation by reordering arguments · 16badacd
      Alexandre Ghiti authored
      The sbi_ecall() function arguments are not in the same order as the
      ecall arguments, so we end up re-ordering the registers before the
      ecall, which is useless and costly.

      So simply reorder the arguments to match what ecall expects. Rather
      than changing the order of sbi_ecall()'s own arguments, which is the
      more natural ordering for callers, do the reordering through a proxy
      macro; a sketch of this approach follows the disassembly below.
      
      Before:
      
      Dump of assembler code for function sbi_ecall:
         0xffffffff800085e0 <+0>: add sp,sp,-32
         0xffffffff800085e2 <+2>: sd s0,24(sp)
         0xffffffff800085e4 <+4>: mv t1,a0
         0xffffffff800085e6 <+6>: add s0,sp,32
         0xffffffff800085e8 <+8>: mv t3,a1
         0xffffffff800085ea <+10>: mv a0,a2
         0xffffffff800085ec <+12>: mv a1,a3
         0xffffffff800085ee <+14>: mv a2,a4
         0xffffffff800085f0 <+16>: mv a3,a5
         0xffffffff800085f2 <+18>: mv a4,a6
         0xffffffff800085f4 <+20>: mv a5,a7
         0xffffffff800085f6 <+22>: mv a6,t3
         0xffffffff800085f8 <+24>: mv a7,t1
         0xffffffff800085fa <+26>: ecall
         0xffffffff800085fe <+30>: ld s0,24(sp)
         0xffffffff80008600 <+32>: add sp,sp,32
         0xffffffff80008602 <+34>: ret
      
      After:
      
      Dump of assembler code for function __sbi_ecall:
         0xffffffff8000b6b2 <+0>:	add	sp,sp,-32
         0xffffffff8000b6b4 <+2>:	sd	s0,24(sp)
         0xffffffff8000b6b6 <+4>:	add	s0,sp,32
         0xffffffff8000b6b8 <+6>:	ecall
         0xffffffff8000b6bc <+10>:	ld	s0,24(sp)
         0xffffffff8000b6be <+12>:	add	sp,sp,32
         0xffffffff8000b6c0 <+14>:	ret
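
      A minimal sketch of the proxy-macro approach (struct sbiret and the
      __sbi_ecall name follow the disassembly above; the exact prototype and
      placement in the kernel may differ from this sketch):

         /*
          * The macro keeps the natural (ext, fid, arg0..arg5) ordering for
          * callers, while the real function takes its parameters in the
          * order the SBI calling convention loads them into registers
          * (a0-a5 = args, a6 = fid, a7 = ext), so no mv instructions are
          * needed before the ecall.
          */
         struct sbiret __sbi_ecall(unsigned long arg0, unsigned long arg1,
                                   unsigned long arg2, unsigned long arg3,
                                   unsigned long arg4, unsigned long arg5,
                                   int fid, int ext);

         #define sbi_ecall(e, f, a0, a1, a2, a3, a4, a5) \
                 __sbi_ecall(a0, a1, a2, a3, a4, a5, f, e)
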
      Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
      Reviewed-by: Atish Patra <atishp@rivosinc.com>
      Reviewed-by: Yunhui Cui <cuiyunhui@bytedance.com>
      Link: https://lore.kernel.org/r/20240322112629.68170-1-alexghiti@rivosinc.com
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    • riscv: Add tracepoints for SBI calls and returns · 56c1c1a0
      Samuel Holland authored
      These are useful for measuring the latency of SBI calls. The SBI HSM
      extension is excluded because those functions are called from contexts
      such as cpuidle where instrumentation is not allowed.
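
      A minimal sketch of what such a tracepoint definition could look like
      (the event name and fields here are illustrative and omit the usual
      trace-header boilerplate, so they may differ from what the patch
      actually adds):

         #include <linux/tracepoint.h>

         TRACE_EVENT(sbi_call,
                 TP_PROTO(int ext, int fid),
                 TP_ARGS(ext, fid),

                 /* Record which SBI extension and function were invoked. */
                 TP_STRUCT__entry(
                         __field(int, ext)
                         __field(int, fid)
                 ),
                 TP_fast_assign(
                         __entry->ext = ext;
                         __entry->fid = fid;
                 ),
                 TP_printk("ext=0x%x fid=%d", __entry->ext, __entry->fid)
         );
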
      Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
      Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
      Link: https://lore.kernel.org/r/20240321230131.1838105-1-samuel.holland@sifive.com
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    • riscv: Optimize crc32 with Zbc extension · a43fe27d
      Xiao Wang authored
      As suggested by the B-ext spec, the Zbc (carry-less multiplication)
      instructions can be used to accelerate CRC calculations. Currently,
      crc32 is the most widely used CRC function inside the kernel, so this
      patch focuses on optimizing just the crc32 APIs.
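
      For reference, a carry-less multiply computes a polynomial product over
      GF(2); the following is a simple software model of what the Zbc clmul
      instruction provides in hardware (the kernel code uses the instruction
      itself, not this loop):

         #include <stdint.h>

         /* Low 64 bits of the carry-less product: partial products are
          * combined with XOR instead of addition. CRC folding is built
          * out of exactly this operation. */
         static uint64_t clmul64_low(uint64_t a, uint64_t b)
         {
                 uint64_t r = 0;

                 for (int i = 0; i < 64; i++)
                         if (b & (1ULL << i))
                                 r ^= a << i;
                 return r;
         }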
      
      Compared with the current table-lookup based optimization, the Zbc
      based optimization can also achieve a large stride in the CRC
      calculation loop; at the same time, it avoids the memory access
      latency of the table-lookup implementation and reduces the memory
      footprint.

      If the Zbc extension is not supported at runtime, the table-lookup
      based implementation serves as the fallback via the alternatives
      mechanism.
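
      A conceptual sketch of that dispatch; a plain runtime capability check
      is shown here for clarity, whereas the patch wires the fallback through
      the alternatives mechanism, and the crc32_le_zbc()/crc32_le_base()
      helper names are illustrative:

         #include <linux/types.h>
         #include <asm/cpufeature.h>

         u32 crc32_le(u32 crc, const u8 *p, size_t len)
         {
                 /* Use the clmul-based routine only when Zbc is present. */
                 if (riscv_has_extension_likely(RISCV_ISA_EXT_ZBC))
                         return crc32_le_zbc(crc, p, len);

                 /* Otherwise fall back to the table-lookup implementation. */
                 return crc32_le_base(crc, p, len);
         }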
      
      By inspecting the vmlinux built by gcc v12.2.0 with the default
      optimization level (-O2), we can see the following instruction count
      change for each 8-byte stride in the CRC32 loop:
      
      rv64: crc32_be (54->31), crc32_le (54->13), __crc32c_le (54->13)
      rv32: crc32_be (50->32), crc32_le (50->16), __crc32c_le (50->16)
      
      The compile target CPU is little endian, so extra byte-swapping effort
      is needed for the crc32_be API; thus, the instruction count change is
      not as significant as in the *_le cases.
      
      This patch is tested on a QEMU VM with the kernel CRC32 selftest for
      both rv64 and rv32. Running the CRC32 selftest on real hardware
      (SpacemiT K1) with the Zbc extension shows 65% and 125% performance
      improvements on crc32_test() and crc32c_test() respectively.
      Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
      Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
      Link: https://lore.kernel.org/r/20240621054707.1847548-1-xiao.w.wang@intel.com
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>