    arm64: jump_label: Ensure patched jump_labels are visible to all CPUs · cfb00a35
    Will Deacon authored
    Although the Arm architecture permits concurrent modification and
    execution of NOP and branch instructions, it still requires some
    synchronisation to ensure that other CPUs consistently execute the newly
    written instruction:
    
     >  When the modified instructions are observable, each PE that is
     >  executing the modified instructions must execute an ISB or perform a
     >  context synchronizing event to ensure execution of the modified
     >  instructions
    
    Prior to commit f6cc0c50 ("arm64: Avoid calling stop_machine() when
    patching jump labels"), the arm64 jump_label patching machinery
    performed synchronisation using stop_machine() after each modification.
    However, this was problematic when flipping static keys from atomic
    contexts (namely, the arm_arch_timer CPU hotplug startup notifier), so
    we switched to the _nosync() patching routines to avoid "scheduling
    while atomic" BUG()s during boot.
    
    In hindsight, the analysis of the issue in f6cc0c50 isn't quite
    right: it cites the use of IPIs in the default patching routines as the
    cause of the lockup, whereas stop_machine() does not rely on IPIs and
    the I-cache invalidation is performed using __flush_icache_range(),
    which elides the call to kick_all_cpus_sync(). In fact, the blocking
    wait for other CPUs is what triggers the BUG() and the problem remains
    even after f6cc0c50, for example because we could block on the
    jump_label_mutex. Eventually, the arm_arch_timer driver was fixed to
    avoid the static key entirely in commit a862fc22
    ("clocksource/arm_arch_timer: Remove use of workaround static key").
    
    This all leaves the jump_label patching code in a funny situation on
    arm64: we skip synchronisation with other CPUs in order to reduce the
    likelihood of a bug which no longer exists. Consequently, toggling a static key on
    one CPU cannot be assumed to take effect on other CPUs, leading to
    potential issues, for example with missing preempt notifiers.
    
    Rather than revert f6cc0c50 and go back to stop_machine() for each
    patch site, implement arch_jump_label_transform_apply() and kick all
    the other CPUs with an IPI at the end of patching.
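
    The batching idea can be sketched as a small userspace model. This is
    an illustration under stated assumptions, not the kernel change itself:
    every model_* name is hypothetical, model_patch_nosync() merely stands
    in for the _nosync() patching routines, and model_kick_all_cpus() only
    counts how often an IPI-based synchronisation would be requested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum model_insn { MODEL_NOP, MODEL_BRANCH };

struct model_site {
	enum model_insn insn;	/* instruction currently at the patch site */
};

/* Number of times a cross-CPU synchronisation would have been requested. */
static unsigned int model_sync_count;

/* Stand-in for a _nosync() patch routine: write the instruction only. */
static void model_patch_nosync(struct model_site *site, enum model_insn insn)
{
	site->insn = insn;
}

/* Stand-in for kicking all other CPUs with an IPI. */
static void model_kick_all_cpus(void)
{
	model_sync_count++;
}

/* Per-site step: patch without synchronising; true means "accepted". */
static bool model_transform_queue(struct model_site *site, bool enable)
{
	model_patch_nosync(site, enable ? MODEL_BRANCH : MODEL_NOP);
	return true;
}

/* Final step: a single synchronisation covering every site patched so far. */
static void model_transform_apply(void)
{
	model_kick_all_cpus();
}

/* Flip every site of one key, then synchronise once at the end. */
static void model_update_key(struct model_site *sites, size_t n, bool enable)
{
	size_t i;

	assert(sites != NULL || n == 0);

	for (i = 0; i < n; i++)
		model_transform_queue(&sites[i], enable);

	model_transform_apply();
}
```

    In this model, flipping a key with three patch sites through
    model_update_key() rewrites all three sites and increments
    model_sync_count once, whereas synchronising after each modification
    would increment it three times.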
    
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Marc Zyngier <maz@kernel.org>
    Fixes: f6cc0c50 ("arm64: Avoid calling stop_machine() when patching jump labels")
    Signed-off-by: Will Deacon <will@kernel.org>
    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
    Reviewed-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20240731133601.3073-1-will@kernel.org
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
jump_label.h 1.14 KB