    x86/mm: Fix kern_addr_valid() to cope with existing but not present entries · 34b1999d
    Mike Rapoport authored
    Jiri Olsa reported a fault when running:
    
      # cat /proc/kallsyms | grep ksys_read
      ffffffff8136d580 T ksys_read
      # objdump -d --start-address=0xffffffff8136d580 --stop-address=0xffffffff8136d590 /proc/kcore
    
      /proc/kcore:     file format elf64-x86-64
    
      Segmentation fault
    
      general protection fault, probably for non-canonical address 0xf887ffcbff000: 0000 [#1] SMP PTI
      CPU: 12 PID: 1079 Comm: objdump Not tainted 5.14.0-rc5qemu+ #508
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-4.fc34 04/01/2014
      RIP: 0010:kern_addr_valid
      Call Trace:
       read_kcore
       ? rcu_read_lock_sched_held
       ? rcu_read_lock_sched_held
       ? rcu_read_lock_sched_held
       ? trace_hardirqs_on
       ? rcu_read_lock_sched_held
       ? lock_acquire
       ? lock_acquire
       ? rcu_read_lock_sched_held
       ? lock_acquire
       ? rcu_read_lock_sched_held
       ? rcu_read_lock_sched_held
       ? rcu_read_lock_sched_held
       ? lock_release
       ? _raw_spin_unlock
       ? __handle_mm_fault
       ? rcu_read_lock_sched_held
       ? lock_acquire
       ? rcu_read_lock_sched_held
       ? lock_release
       proc_reg_read
       ? vfs_read
       vfs_read
       ksys_read
       do_syscall_64
       entry_SYSCALL_64_after_hwframe
    
    The fault happens because kern_addr_valid() dereferences a PMD in the high
    kernel mappings that exists but is not present.
    
    Such PMDs are created when free_kernel_image_pages() frees regions larger
    than 2 MB. In this case, part of the freed memory is mapped with PMDs, and
    the set_memory_np_noalias() -> ... -> __change_page_attr() sequence marks
    the PMD as not present rather than wiping it completely.
    
    Have kern_addr_valid() check whether higher level page table entries are
    present before trying to dereference them to fix this issue and to avoid
    similar issues in the future.
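
    The difference between a "none" check and a "present" check can be
    illustrated with a small userspace toy model. This is only a sketch and
    not the kernel code: every toy_* name, the two-level layout, and the
    single-bit present test are assumptions made for illustration (the real
    x86 accessors look at more flag bits). The stale entry stands in for a
    PMD whose present bit was cleared but whose other bits were left behind.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names exist in the kernel. */
#define TOY_PRESENT   0x1ULL  /* bit 0 set: entry may be followed */
#define TOY_SHIFT     12      /* bits above the flags hold a table index */
#define TOY_NR_PMDS   4
#define TOY_NR_TABLES 2
#define TOY_NR_PTES   4

typedef uint64_t toy_entry_t;

enum toy_result { TOY_INVALID, TOY_VALID, TOY_WOULD_FAULT };

struct toy_tables {
    toy_entry_t pmd[TOY_NR_PMDS];
    toy_entry_t pte[TOY_NR_TABLES][TOY_NR_PTES];
};

/* "none": the entry was never populated or was wiped completely. */
static bool toy_none(toy_entry_t e)
{
    return e == 0;
}

/* "present": the entry is populated and safe to follow. */
static bool toy_present(toy_entry_t e)
{
    return (e & TOY_PRESENT) != 0;
}

/*
 * Walk the toy tables. With check_present == false the upper level is
 * only tested for "none", so a non-zero, non-present entry is followed.
 * The toy reports TOY_WOULD_FAULT where a real walk would dereference
 * a bogus pointer.
 */
static enum toy_result toy_addr_valid(const struct toy_tables *t,
                                      size_t pmd_idx, size_t pte_idx,
                                      bool check_present)
{
    toy_entry_t pmd = t->pmd[pmd_idx];
    bool skip = check_present ? !toy_present(pmd) : toy_none(pmd);

    if (skip)
        return TOY_INVALID;

    uint64_t table = pmd >> TOY_SHIFT;
    if (table >= TOY_NR_TABLES)
        return TOY_WOULD_FAULT;

    return toy_present(t->pte[table][pte_idx]) ? TOY_VALID : TOY_INVALID;
}

/* Build one populated entry, one empty entry, and one stale entry. */
static struct toy_tables toy_example(void)
{
    struct toy_tables t = {0};

    t.pmd[0] = (0ULL << TOY_SHIFT) | TOY_PRESENT; /* points at pte[0] */
    t.pte[0][1] = TOY_PRESENT;                    /* one mapped page */
    /* t.pmd[1] stays 0: a genuinely empty ("none") entry */
    t.pmd[2] = 0xf887ffULL << TOY_SHIFT;          /* stale, not present */
    return t;
}
```

    With the tables from toy_example(), both variants agree on the populated
    and the empty entry. Only for the stale entry at index 2 do they differ:
    the "none" variant reports TOY_WOULD_FAULT, while the "present" variant
    returns TOY_INVALID without following the entry.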
    
    Stable backporting note:
    ------------------------
    
    Note that the stable marking is for all active stable branches because
    there could be cases where pagetable entries exist but are not valid -
    see 9a14aefc ("x86: cpa, fix lookup_address"), for example. So make
    sure to be on the safe side here and use pXY_present() accessors rather
    than pXY_none() which could #GP when accessing pages in the direct map.
    
    Also see:
    
      c40a56a7 ("x86/mm/init: Remove freed kernel image areas from alias mapping")
    
    for more info.

    Reported-by: Jiri Olsa <jolsa@redhat.com>
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Acked-by: Dave Hansen <dave.hansen@intel.com>
    Tested-by: Jiri Olsa <jolsa@redhat.com>
    Cc: <stable@vger.kernel.org>	# 4.4+
    Link: https://lkml.kernel.org/r/20210819132717.19358-1-rppt@kernel.org
init_64.c 42.7 KB