    powerpc/code-patching: Use temporary mm for Radix MMU · c28c15b6
    Christopher M. Riedl authored
    x86 supports the notion of a temporary mm which restricts access to
    temporary PTEs to a single CPU. A temporary mm is useful for situations
    where a CPU needs to perform sensitive operations (such as patching a
    STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
    said mappings to other CPUs. Another benefit is that other CPU TLBs do
    not need to be flushed when the temporary mm is torn down.
    
    Mappings in the temporary mm can be set in the userspace portion of the
    address-space.
    
    Interrupts must be disabled while the temporary mm is in use. HW
    breakpoints, which may have been set by userspace as watchpoints on
    addresses now within the temporary mm, are saved and disabled when
    loading the temporary mm. The HW breakpoints are restored when unloading
    the temporary mm. All HW breakpoints are indiscriminately disabled while
    the temporary mm is in use - this may include breakpoints set by perf.
    
    Use the `poking_init` init hook to prepare a temporary mm and patching
    address. Initialize the temporary mm using mm_alloc(). Choose a
    randomized patching address inside the temporary mm userspace address
    space. The patching address is randomized between PAGE_SIZE and
    DEFAULT_MAP_WINDOW-PAGE_SIZE.
    
    Bits of entropy with 64K page size on BOOK3S_64:
    
    	bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)
    
    	PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
    	bits of entropy = log2(128TB / 64K)
    	bits of entropy = 31
    
    The upper limit is DEFAULT_MAP_WINDOW due to how the Book3s64 Hash MMU
    operates - by default the space above DEFAULT_MAP_WINDOW is not
    available. Currently the Hash MMU does not use a temporary mm so
    technically this upper limit isn't necessary; however, a larger
    randomization range does not further "harden" this overall approach and
    future work may introduce patching with a temporary mm on Hash as well.
    
    Randomization occurs only once during initialization for each CPU as it
    comes online.
    
    The patching page is mapped with PAGE_KERNEL to set EAA[0] for the PTE
    which ignores the AMR (so no need to unlock/lock KUAP) according to
    PowerISA v3.0b Figure 35 on Radix.
    
    Based on x86 implementation:
    
    commit 4fc19708
    ("x86/alternatives: Initialize temporary mm for patching")
    
    and:
    
    commit b3fd8e83
    ("x86/alternatives: Use temporary mm for text poking")
    
    From: Benjamin Gray <bgray@linux.ibm.com>
    
    Synchronisation is done according to ISA 3.1B Book 3 Chapter 13
    "Synchronization Requirements for Context Alterations". Switching the mm
    is a change to the PID, which requires a CSI before and after the change,
    and a hwsync between the change and the last prior instruction that
    performs address translation for an associated storage access.
    
    Instruction fetch is an associated storage access, but the instruction
    address mappings are not being changed, so it should not matter which
    context they use. We must still perform a hwsync to guard arbitrary
    prior code that may have accessed a userspace address.
    
    TLB invalidation is local and VA specific. Local because only this core
    used the patching mm, and VA specific because we only care that the
    writable mapping is purged. Leaving the other mappings intact is more
    efficient, especially when performing many code patches in a row (e.g.,
    as ftrace would).

    Signed-off-by: Christopher M. Riedl <cmr@bluescreens.de>
    Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
    [mpe: Use mm_alloc() per 107b6828a7cd ("x86/mm: Use mm_alloc() in poking_init()")]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20221109045112.187069-9-bgray@linux.ibm.com