    Merge patch series "riscv: ASID-related and UP-related TLB flush enhancements" · 4f16345d
    Palmer Dabbelt authored
    Samuel Holland <samuel.holland@sifive.com> says:
    
    This series converts uniprocessor kernel builds to use the same TLB
    flushing code as SMP builds, to take advantage of batching and existing
    range- and ASID-based TLB flush optimizations. It optimizes out IPIs and
    SBI calls based on the online CPU count, which also covers the scenario
    where SMP was enabled at build time but only one CPU is present/online.
    A final optimization is to use single-ASID flushes wherever possible, to
    avoid unnecessary TLB misses for kernel mappings.
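
As a rough illustration of the online-CPU-count optimization described above, the user-space sketch below models only the decision itself. The names `flush_path` and `choose_flush_path` are hypothetical and do not come from the series; in the kernel the count would come from something like `num_online_cpus()`.

```c
/* Hypothetical model of the decision; none of these names come from the series. */
enum flush_path {
	FLUSH_LOCAL_ONLY,       /* fence on this hart only: no IPI, no SBI call */
	FLUSH_LOCAL_AND_REMOTE, /* also notify the other online harts */
};

/*
 * With fewer than two CPUs online there is nobody else to notify, so the
 * remote step can be skipped whether or not SMP was enabled at build time.
 */
static enum flush_path choose_flush_path(unsigned int online_cpus)
{
	return online_cpus < 2 ? FLUSH_LOCAL_ONLY : FLUSH_LOCAL_AND_REMOTE;
}
```

Keying the check on the runtime count rather than on a build-time option is what lets the same path cover both UP builds and SMP builds running on a single CPU.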
    
    This series has a semantic conflict with the AIA patches that are in
    linux-next due to the removal of the third parameter of
    riscv_ipi_set_virq_range(), which is called from imsic_ipi_domain_init()
    in drivers/irqchip/irq-riscv-imsic-early.c. The resolution is to remove
    the extra argument from the call site.
    
    Here are some numbers from D1 which show the performance impact:
    
    v6.9-rc1:
     System Benchmarks Partial Index              BASELINE       RESULT    INDEX
     Execl Throughput                                 43.0        198.5     46.2
     File Copy 1024 bufsize 2000 maxblocks          3960.0      73934.4    186.7
     File Copy 256 bufsize 500 maxblocks            1655.0      20242.6    122.3
     File Copy 4096 bufsize 8000 maxblocks          5800.0     197706.4    340.9
     Pipe Throughput                               12440.0     176974.2    142.3
     Pipe-based Context Switching                   4000.0      23626.8     59.1
     Process Creation                                126.0        449.9     35.7
     Shell Scripts (1 concurrent)                     42.4        544.4    128.4
     Shell Scripts (16 concurrent)                     ---         35.3      ---
     Shell Scripts (8 concurrent)                      6.0         71.6    119.3
     System Call Overhead                          15000.0     248072.6    165.4
                                                                        ========
     System Benchmarks Index Score (Partial Only)                          110.6
    
    v6.9-rc1 + this patch series:
     System Benchmarks Partial Index              BASELINE       RESULT    INDEX
     Execl Throughput                                 43.0        196.8     45.8
     File Copy 1024 bufsize 2000 maxblocks          3960.0      71782.2    181.3
     File Copy 256 bufsize 500 maxblocks            1655.0      21269.4    128.5
     File Copy 4096 bufsize 8000 maxblocks          5800.0     199424.0    343.8
     Pipe Throughput                               12440.0     196468.6    157.9
     Pipe-based Context Switching                   4000.0      24261.8     60.7
     Process Creation                                126.0        459.0     36.4
     Shell Scripts (1 concurrent)                     42.4        543.8    128.2
     Shell Scripts (16 concurrent)                     ---         35.5      ---
     Shell Scripts (8 concurrent)                      6.0         71.7    119.6
     System Call Overhead                          15000.0     259415.2    172.9
                                                                        ========
     System Benchmarks Index Score (Partial Only)                          113.0
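
For reading the two tables, the INDEX column is consistent with RESULT / BASELINE * 10, which is assumed here to be the usual UnixBench convention. The helper names below are hypothetical and only restate that arithmetic, plus a relative-change calculation for comparing the two runs.

```c
/* Per-test index, assuming the convention RESULT / BASELINE * 10. */
static double ub_index(double baseline, double result)
{
	return result / baseline * 10.0;
}

/* Relative change from the "before" value to the "after" value, in percent. */
static double percent_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

With the figures above, `ub_index(43.0, 198.5)` is about 46.2, matching the first Execl Throughput row, and `percent_change(110.6, 113.0)` is about 2.2, the overall change in the partial index score. Pipe Throughput shows the largest single gain at roughly 11 percent.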
    
    * b4-shazam-lts:
      riscv: mm: Always use an ASID to flush mm contexts
      riscv: mm: Preserve global TLB entries when switching contexts
      riscv: mm: Make asid_bits a local variable
      riscv: mm: Use a fixed layout for the MM context ID
      riscv: mm: Introduce cntx2asid/cntx2version helper macros
      riscv: Avoid TLB flush loops when affected by SiFive CIP-1200
      riscv: Apply SiFive CIP-1200 workaround to single-ASID sfence.vma
      riscv: mm: Combine the SMP and UP TLB flush code
      riscv: Only send remote fences when some other CPU is online
      riscv: mm: Broadcast kernel TLB flushes only when needed
      riscv: Use IPIs for remote cache/TLB flushes by default
      riscv: Factor out page table TLB synchronization
      riscv: Flush the instruction cache during SMP bringup
    
    Link: https://lore.kernel.org/r/20240327045035.368512-1-samuel.holland@sifive.com
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
cacheflush.c 7.63 KB