    x86/resctrl: Fix uninitialized memory read when last CPU of domain goes offline · c3eeb1ff
    Reinette Chatre authored
    Tony encountered this OOPS when the last CPU of a domain went
    offline while running a kernel built with CONFIG_NO_HZ_FULL:
    
        BUG: kernel NULL pointer dereference, address: 0000000000000000
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 0
        Oops: 0000 [#1] PREEMPT SMP NOPTI
        ...
        RIP: 0010:__find_nth_andnot_bit+0x66/0x110
        ...
        Call Trace:
         <TASK>
         ? __die()
         ? page_fault_oops()
         ? exc_page_fault()
         ? asm_exc_page_fault()
         cpumask_any_housekeeping()
         mbm_setup_overflow_handler()
         resctrl_offline_cpu()
         resctrl_arch_offline_cpu()
         cpuhp_invoke_callback()
         cpuhp_thread_fun()
         smpboot_thread_fn()
         kthread()
         ret_from_fork()
         ret_from_fork_asm()
         </TASK>
    
    The NULL pointer dereference is encountered while searching for another
    online CPU in the domain (of which there are none) that can be used to
    run the MBM overflow handler.
    
    Because the kernel is configured with CONFIG_NO_HZ_FULL, the search for
    another CPU (which prefers CPUs that aren't marked nohz_full) consults
    the mask representing the nohz_full CPUs, tick_nohz_full_mask. On a
    kernel with CONFIG_CPUMASK_OFFSTACK=y, tick_nohz_full_mask is not
    allocated unless the kernel is booted with the "nohz_full=" parameter,
    so any access to tick_nohz_full_mask needs to be guarded with
    tick_nohz_full_enabled().
    
    Replace the IS_ENABLED(CONFIG_NO_HZ_FULL) check with
    tick_nohz_full_enabled(). The latter ensures tick_nohz_full_mask can be
    accessed safely and can be used whether the kernel is built with
    CONFIG_NO_HZ_FULL enabled or not.
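
The guard described above can be illustrated with a small user-space model. This is only a sketch, not the kernel code: the names nohz_full_mask, nohz_full_enabled(), boot_with_nohz_full(), first_cpu_but() and pick_housekeeping_cpu() are hypothetical stand-ins for tick_nohz_full_mask, tick_nohz_full_enabled() and cpumask_any_housekeeping(), a 64-bit word stands in for a cpumask, and NO_CPU plays the role of an out-of-range CPU number. It assumes the mask pointer stays NULL unless "nohz_full=" was given at boot, as in the CONFIG_CPUMASK_OFFSTACK=y case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 64
#define NO_CPU  NR_CPUS  /* hypothetical "no CPU found" value */

/* Model of an off-stack mask: stays NULL unless "nohz_full=" was used. */
static const uint64_t *nohz_full_mask;
static bool nohz_full_running;

/* Hypothetical stand-in for tick_nohz_full_enabled(). */
static bool nohz_full_enabled(void)
{
	return nohz_full_running;
}

/* Models booting with "nohz_full=": the mask is set up and marked usable. */
static void boot_with_nohz_full(const uint64_t *mask)
{
	nohz_full_mask = mask;
	nohz_full_running = true;
}

/* Lowest CPU set in @mask other than @exclude_cpu, or NO_CPU if none. */
static unsigned int first_cpu_but(uint64_t mask, unsigned int exclude_cpu)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu != exclude_cpu && (mask & (UINT64_C(1) << cpu)))
			return cpu;
	}
	return NO_CPU;
}

/* Pick a CPU from @domain_mask, preferring one that is not nohz_full. */
static unsigned int pick_housekeeping_cpu(uint64_t domain_mask,
					  unsigned int exclude_cpu)
{
	unsigned int cpu = first_cpu_but(domain_mask, exclude_cpu);
	unsigned int hk_cpu;

	/* Runtime guard: never touch the mask unless it was set up. */
	if (!nohz_full_enabled())
		return cpu;

	assert(nohz_full_mask != NULL);

	/* Look for a CPU in the domain that is not marked nohz_full. */
	hk_cpu = first_cpu_but(domain_mask & ~*nohz_full_mask, exclude_cpu);
	if (hk_cpu < NR_CPUS)
		cpu = hk_cpu;

	return cpu;
}
```

In this model, calling pick_housekeeping_cpu(UINT64_C(1) << 3, 3) before boot_with_nohz_full() returns NO_CPU without dereferencing the NULL mask pointer, which corresponds to the "last CPU of the domain goes offline" case. A compile-time check alone would instead go on to read the unallocated mask.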
    
    [ Use Ingo's suggestion that combines the two NO_HZ checks into one. ]
    
    Fixes: a4846aaf ("x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow")
    Reported-by: Tony Luck <tony.luck@intel.com>
    Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Reviewed-by: Babu Moger <babu.moger@amd.com>
    Link: https://lore.kernel.org/r/ff8dfc8d3dcb04b236d523d1e0de13d2ef585223.1711993956.git.reinette.chatre@intel.com
    Closes: https://lore.kernel.org/lkml/ZgIFT5gZgIQ9A9G7@agluck-desk3/
internal.h 18 KB