    perf/x86/amd: Fix crash due to race between amd_pmu_enable_all, perf NMI and throttling · baa014b9
    Ravi Bangoria authored
    amd_pmu_enable_all() does:
    
          if (!test_bit(idx, cpuc->active_mask))
                  continue;
    
          amd_pmu_enable_event(cpuc->events[idx]);
    
    A perf NMI for another event can arrive between these two steps. The perf
    NMI handler internally disables and re-enables _all_ events, including the
    one that the interrupted amd_pmu_enable_all() was in the process of
    enabling. If that unintentionally enabled event has a very low sampling
    period, it triggers further NMIs in immediate succession and gets
    throttled, at which point x86_pmu_stop() clears cpuc->events[idx] and the
    corresponding bit in cpuc->active_mask. When amd_pmu_enable_all() resumes
    after the NMIs have been handled, it calls amd_pmu_enable_event() with
    event=NULL, which crashes the kernel:
    
      BUG: kernel NULL pointer dereference, address: 0000000000000198
      #PF: supervisor read access in kernel mode
      #PF: error_code(0x0000) - not-present page
      [...]
      Call Trace:
       <TASK>
       amd_pmu_enable_all+0x68/0xb0
       ctx_resched+0xd9/0x150
       event_function+0xb8/0x130
       ? hrtimer_start_range_ns+0x141/0x4a0
       ? perf_duration_warn+0x30/0x30
       remote_function+0x4d/0x60
       __flush_smp_call_function_queue+0xc4/0x500
       flush_smp_call_function_queue+0x11d/0x1b0
       do_idle+0x18f/0x2d0
       cpu_startup_entry+0x19/0x20
       start_secondary+0x121/0x160
       secondary_startup_64_no_verify+0xe5/0xeb
       </TASK>
    
    The amd_pmu_disable_all()/amd_pmu_enable_all() calls inside the perf NMI
    handler were added recently as part of BRS enablement, but I'm not sure
    we really need them. We can just disable BRS at the beginning of the NMI
    handler and re-enable it on return. This solves the issue because the
    handler no longer enables events whose active_mask bits are set but which
    are not yet enabled in the hardware PMU.
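
    The race and the effect of the proposed change can be illustrated with a
    small user-space model. This is only a sketch under stated assumptions,
    not the kernel code: struct event, struct cpu_state, throttle_stop(),
    nmi_old(), nmi_new(), enable_all() and run_scenario() are hypothetical
    stand-ins, the NMI is injected by a plain function call between the
    active_mask test and the enable, and the model counts a NULL event
    instead of dereferencing it.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_COUNTERS 4

/* Simplified stand-ins for the kernel structures; not the real layouts. */
struct event {
	bool hw_enabled;			/* counter is live in "hardware" */
};

struct cpu_state {
	unsigned long active_mask;		/* models cpuc->active_mask */
	struct event *events[NUM_COUNTERS];	/* models cpuc->events[] */
	bool brs_enabled;
};

typedef void (*nmi_fn)(struct cpu_state *c, int victim);

/* Models the throttle path: the slot and its active bit are cleared. */
static void throttle_stop(struct cpu_state *c, int idx)
{
	c->events[idx]->hw_enabled = false;
	c->events[idx] = NULL;
	c->active_mask &= ~(1UL << idx);
}

/* Old behaviour: the handler re-enables every active event. */
static void nmi_old(struct cpu_state *c, int victim)
{
	for (int idx = 0; idx < NUM_COUNTERS; idx++)
		if ((c->active_mask & (1UL << idx)) && c->events[idx])
			c->events[idx]->hw_enabled = true;

	/* The victim is now live with a tiny period: it overflows at once
	 * and is throttled before the interrupted loop resumes. */
	if (c->events[victim]->hw_enabled)
		throttle_stop(c, victim);
}

/* Proposed behaviour: only BRS is toggled around the handler. */
static void nmi_new(struct cpu_state *c, int victim)
{
	(void)victim;	/* never enabled in hardware, so it cannot overflow */
	c->brs_enabled = false;
	/* ... overflowed counters would be handled here ... */
	c->brs_enabled = true;
}

/* Models amd_pmu_enable_all() with an NMI injected at slot nmi_at.
 * Returns how many times a NULL event would have been dereferenced. */
static int enable_all(struct cpu_state *c, nmi_fn nmi, int nmi_at)
{
	int null_hits = 0;

	for (int idx = 0; idx < NUM_COUNTERS; idx++) {
		if (!(c->active_mask & (1UL << idx)))
			continue;

		if (idx == nmi_at)
			nmi(c, idx);	/* NMI lands between test and enable */

		if (!c->events[idx]) {	/* the kernel would oops here */
			null_hits++;
			continue;
		}
		c->events[idx]->hw_enabled = true;
	}
	return null_hits;
}

/* Two active events: ev[0] is already counting, ev[1] is mid-enable. */
static int run_scenario(bool use_old_handler)
{
	struct event ev[2] = { { .hw_enabled = true }, { .hw_enabled = false } };
	struct cpu_state c = {
		.active_mask = 0x3,
		.events = { &ev[0], &ev[1] },
		.brs_enabled = true,
	};

	/* ev[0] raises the NMI while ev[1] is about to be enabled. */
	return enable_all(&c, use_old_handler ? nmi_old : nmi_new, 1);
}
```

    In this model run_scenario(true) returns 1, because the interrupted loop
    finds a NULL slot once, and run_scenario(false) returns 0, because the
    event that was never enabled in hardware cannot overflow and therefore
    is never throttled.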
    
    Fixes: ada54345 ("perf/x86/amd: Add AMD Fam19h Branch Sampling support")
    Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
    Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20221114044029.373-1-ravi.bangoria@amd.com