    mm, kmsan: fix infinite recursion due to RCU critical section · f6564fce
    Marco Elver authored
    Alexander Potapenko writes in [1]: "For every memory access in the code
    instrumented by KMSAN we call kmsan_get_metadata() to obtain the metadata
    for the memory being accessed.  For virtual memory the metadata pointers
    are stored in the corresponding `struct page`, therefore we need to call
    virt_to_page() to get them.
    
    According to the comment in arch/x86/include/asm/page.h,
    virt_to_page(kaddr) returns a valid pointer iff virt_addr_valid(kaddr) is
    true, so KMSAN needs to call virt_addr_valid() as well.
    
    To avoid recursion, kmsan_get_metadata() must not call instrumented code,
    therefore ./arch/x86/include/asm/kmsan.h forks parts of
    arch/x86/mm/physaddr.c to check whether a virtual address is valid or not.
    
    But the introduction of rcu_read_lock() to pfn_valid() added instrumented
    RCU API calls to virt_to_page_or_null(), which is called by
    kmsan_get_metadata(), so there is an infinite recursion now.  I do not
    think it is correct to stop that recursion by doing
    kmsan_enter_runtime()/kmsan_exit_runtime() in kmsan_get_metadata(): that
    would prevent instrumented functions called from within the runtime from
    tracking the shadow values, which might introduce false positives."
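    The call chain described above can be illustrated with a small userspace
    model. This is only a sketch: every toy_* name and TOY_DEPTH_CAP are
    hypothetical stand-ins, not kernel symbols, and the depth cap exists only
    so the model terminates instead of overflowing the stack.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the recursion; none of this is kernel code. */
#define TOY_DEPTH_CAP 8	/* stop the model instead of overflowing the stack */

static int toy_depth;		/* current nesting of metadata lookups */
static int toy_max_depth;	/* deepest nesting seen in one lookup */

static void toy_get_metadata(bool rcu_is_instrumented);

/*
 * Stand-in for an RCU API call. If it is instrumented, each memory access
 * inside it asks for metadata again.
 */
static void toy_rcu_read_lock(bool instrumented)
{
	if (instrumented)
		toy_get_metadata(instrumented);
}

/* Stand-in for pfn_valid() after it gained an RCU read-side section. */
static bool toy_pfn_valid(bool rcu_is_instrumented)
{
	toy_rcu_read_lock(rcu_is_instrumented);
	return true;
}

/* Stand-in for kmsan_get_metadata(): it must check the address first. */
static void toy_get_metadata(bool rcu_is_instrumented)
{
	if (toy_depth >= TOY_DEPTH_CAP)
		return;		/* the real code has no such cap */
	toy_depth++;
	if (toy_depth > toy_max_depth)
		toy_max_depth = toy_depth;
	toy_pfn_valid(rcu_is_instrumented);
	toy_depth--;
}

/* Run one lookup and report how deep the nesting went. */
static int toy_lookup_depth(bool rcu_is_instrumented)
{
	toy_depth = 0;
	toy_max_depth = 0;
	toy_get_metadata(rcu_is_instrumented);
	assert(toy_depth == 0);	/* every level unwound */
	return toy_max_depth;
}
```

    In this model, toy_lookup_depth(true) runs until it hits the cap, while
    toy_lookup_depth(false) stays at depth 1, which is the behaviour a
    non-instrumented validity check is meant to restore.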
    
    Fix the issue by switching pfn_valid() to the _sched() variant of
    rcu_read_lock/unlock(), which does not require calling into RCU.  Given
    the critical section in pfn_valid() is very small, this is a reasonable
    trade-off (with preemptible RCU).
    
    KMSAN further needs to be careful to suppress calls into the scheduler,
    which would be another source of recursion.  This can be done by wrapping
    the call to pfn_valid() into preempt_disable/enable_no_resched().  The
    downside is that this sacrifices scheduling guarantees; however,
    a kernel compiled with KMSAN has already given up any performance
    guarantees due to being heavily instrumented.
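    The interaction between the two changes can be sketched in userspace as
    follows. This is a simplified model under stated assumptions, not the
    patch itself: all model_* names and MODEL_MAX_PFN are hypothetical, the
    _sched() RCU variants are modeled as plain preemption-count changes, and
    "calling into the scheduler" is modeled as incrementing a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; all names and the PFN limit are illustrative. */
#define MODEL_MAX_PFN 1024UL

static int model_preempt_count;		/* nesting of preemption-disabled regions */
static bool model_need_resched;		/* a reschedule is pending */
static int model_schedule_calls;	/* times the "scheduler" was entered */

static void model_preempt_disable(void)
{
	model_preempt_count++;
}

/* Re-enabling preemption may enter the scheduler when the count drops to 0. */
static void model_preempt_enable(void)
{
	assert(model_preempt_count > 0);
	if (--model_preempt_count == 0 && model_need_resched)
		model_schedule_calls++;
}

/* Same as above, but never enters the scheduler. */
static void model_preempt_enable_no_resched(void)
{
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

/* pfn_valid() with the _sched() variants, modeled as preempt count changes. */
static bool model_pfn_valid(unsigned long pfn)
{
	bool ret;

	model_preempt_disable();	/* models rcu_read_lock_sched() */
	ret = pfn < MODEL_MAX_PFN;	/* placeholder for the section checks */
	model_preempt_enable();		/* models rcu_read_unlock_sched() */
	return ret;
}

/* KMSAN-side wrapper: keep the count above 0 across the whole call. */
static bool model_kmsan_virt_addr_valid(unsigned long pfn)
{
	bool ret;

	model_preempt_disable();
	ret = model_pfn_valid(pfn);	/* inner enable cannot reach 0 here */
	model_preempt_enable_no_resched();
	return ret;
}
```

    With a reschedule pending, a bare model_pfn_valid() call enters the
    modeled scheduler once, whereas model_kmsan_virt_addr_valid() never does,
    at the cost of leaving the pending reschedule unserviced on exit.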
    
    Note, KMSAN code already disables tracing via Makefile, and since mmzone.h
    is included, it is not necessary to use the notrace variant, which is
    generally preferred in all other cases.
    
    Link: https://lkml.kernel.org/r/20240115184430.2710652-1-glider@google.com [1]
    Link: https://lkml.kernel.org/r/20240118110022.2538350-1-elver@google.com
    Fixes: 5ec8e8ea ("mm/sparsemem: fix race in accessing memory_section->usage")
    Signed-off-by: Marco Elver <elver@google.com>
    Reported-by: Alexander Potapenko <glider@google.com>
    Reported-by: syzbot+93a9e8a3dea8d6085e12@syzkaller.appspotmail.com
    Reviewed-by: Alexander Potapenko <glider@google.com>
    Tested-by: Alexander Potapenko <glider@google.com>
    Cc: Charan Teja Kalla <quic_charante@quicinc.com>
    Cc: Borislav Petkov (AMD) <bp@alien8.de>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>