    arm64/fpsimd: Don't disable softirq when touching FPSIMD/SVE state · 6dcdefcd
    Julien Grall authored
    When the kernel is compiled with CONFIG_KERNEL_MODE_NEON, some parts of
    the kernel may use FPSIMD/SVE. This is the case, for instance, for
    crypto code.
    
    Any use of FPSIMD/SVE in the kernel is clearly marked by using the
    functions kernel_neon_{begin, end}. Furthermore, these can only be used
    when may_use_simd() returns true.
    
    The current implementation of may_use_simd() allows softirq to use
    FPSIMD/SVE unless it is currently in use (i.e. kernel_neon_busy is true).
    When in use, softirqs usually fall back to a software method.
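
    The check-then-fall-back pattern described above can be sketched as a
    small userspace model. This is only an illustration, not kernel code:
    every sim_* name, sim_last_path and the byte-sum routines are
    hypothetical, and the model checks nothing but the busy flag, whereas
    the real may_use_simd() also takes the execution context into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU "FPSIMD/SVE in use" flag. */
static bool sim_simd_busy;

/* Records which path ran last: 0 = none, 1 = SIMD, 2 = scalar fallback. */
static int sim_last_path;

/* Simplified model of may_use_simd(): only the busy flag is checked. */
static bool sim_may_use_simd(void)
{
	return !sim_simd_busy;
}

/* Software fallback used when the SIMD unit is not available. */
static uint32_t sum_scalar(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += buf[i];
	sim_last_path = 2;
	return sum;
}

/*
 * Stand-in for an accelerated routine. In the kernel, the body would sit
 * between kernel_neon_begin() and kernel_neon_end(); here the flag is
 * simply set and cleared around a plain loop.
 */
static uint32_t sum_simd(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	sim_simd_busy = true;
	for (size_t i = 0; i < len; i++)
		sum += buf[i];
	sim_simd_busy = false;
	sim_last_path = 1;
	return sum;
}

/* Caller-side pattern: check first, fall back to software otherwise. */
static uint32_t sum_bytes(const uint8_t *buf, size_t len)
{
	if (sim_may_use_simd())
		return sum_simd(buf, len);
	return sum_scalar(buf, len);
}
```

    Calling sum_bytes() while sim_simd_busy is set takes the scalar path and
    still returns the same result, which is the behaviour the message
    attributes to softirqs that find FPSIMD/SVE in use.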
    
    At the moment, as a softirq may use FPSIMD/SVE, softirqs are disabled
    when touching the FPSIMD/SVE context. This has the drawback of disabling
    all softirqs, even those that do not use FPSIMD/SVE.
    
    Since a softirq is supposed to check may_use_simd() anyway before
    attempting to use FPSIMD/SVE, there is limited reason to keep softirq
    disabled when touching the FPSIMD/SVE context. Instead, we can simply
    disable preemption and mark the FPSIMD/SVE context as in use by setting
    the CPU's fpsimd_context_busy flag.
    
    Two new helpers {get, put}_cpu_fpsimd_context are introduced to mark
    the area using FPSIMD/SVE context and they are used to replace
    local_bh_{disable, enable}. The functions kernel_neon_{begin, end} are
    also re-implemented to use the new helpers.
    
    Additionally, double-underscored versions of the helpers are provided to
    be called when preemption is already disabled. These are only relevant on
    paths where irqs are disabled anyway, so they are not needed for
    correctness in the current code. Let's use them anyway though: this
    marks critical sections clearly and will help to avoid mistakes during
    future maintenance.
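
    As a rough illustration of how the helpers fit together, here is a
    single-CPU userspace model. It is a sketch under stated assumptions
    rather than the patch itself: all sim_* and raw_sim_* names are
    hypothetical, the raw_ prefix stands in for the double-underscored
    variants (leading double underscores are reserved in userspace C), and
    assert() takes the place of the kernel's warning macros.

```c
#include <assert.h>
#include <stdbool.h>

/* Models the preemption-disable depth of the (single) simulated CPU. */
static int sim_preempt_count;

/* Models the per-CPU fpsimd_context_busy flag. */
static bool sim_fpsimd_context_busy;

static void sim_preempt_disable(void)
{
	sim_preempt_count++;
}

static void sim_preempt_enable(void)
{
	assert(sim_preempt_count > 0);
	sim_preempt_count--;
}

/* Models the double-underscored "get": preemption is already disabled. */
static void raw_sim_get_cpu_fpsimd_context(void)
{
	bool was_busy = sim_fpsimd_context_busy;

	assert(sim_preempt_count > 0);	/* caller must have disabled preemption */
	sim_fpsimd_context_busy = true;
	assert(!was_busy);		/* claiming the context twice is a bug */
}

/* Models the double-underscored "put": leaves preemption untouched. */
static void raw_sim_put_cpu_fpsimd_context(void)
{
	bool was_busy = sim_fpsimd_context_busy;

	sim_fpsimd_context_busy = false;
	assert(was_busy);		/* releasing an unclaimed context is a bug */
}

/* Full "get": disable preemption, then mark the context as in use. */
static void sim_get_cpu_fpsimd_context(void)
{
	sim_preempt_disable();
	raw_sim_get_cpu_fpsimd_context();
}

/* Full "put": clear the flag, then re-enable preemption. */
static void sim_put_cpu_fpsimd_context(void)
{
	raw_sim_put_cpu_fpsimd_context();
	sim_preempt_enable();
}

/* What a softirq would observe in this model: only the busy flag. */
static bool sim_may_use_simd(void)
{
	return !sim_fpsimd_context_busy;
}
```

    Between a get and the matching put, sim_may_use_simd() returns false, so
    a softirq arriving in that window would take its software fallback
    instead of being blocked outright.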
    
    The change has been benchmarked on Linux 5.1-rc4 with defconfig.
    
    On Juno2:
        * hackbench 100 process 1000 (10 times)
        * 0.7% quicker
    
    On ThunderX 2:
        * hackbench 1000 process 1000 (20 times)
        * 3.4% quicker
    Reviewed-by: Dave Martin <dave.martin@arm.com>
    Acked-by: Marc Zyngier <marc.zyngier@arm.com>
    Signed-off-by: Julien Grall <julien.grall@arm.com>
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
simd.h 1.4 KB