    perf c2c: Add report option to show false sharing in adjacent cachelines · 1470a108
    Feng Tang authored
    Many platforms support adjacent cacheline prefetch. When it is enabled,
    RAM is effectively handled in pairs of cachelines (2N and 2N+1): if one
    line of a pair is fetched into the cache, the other is likely to be
    fetched too. This in effect doubles the cacheline size, so false
    sharing can also happen between adjacent cachelines.
    
    0Day has captured a performance change related to this [1], and some
    commercial software explicitly aligns its hot global variables to 128
    bytes (2 cache lines) to avoid this kind of extended false sharing.
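
    As an illustration of that workaround, here is a minimal C11 sketch. It
    assumes 64-byte cachelines, and every name in it (SKETCH_DOUBLE_CL,
    struct sketch_hot_counter, sketch_counters) is hypothetical and not
    taken from any real code base:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Assumption: 64-byte cachelines, so an adjacent pair spans 128 bytes. */
#define SKETCH_DOUBLE_CL 128

/*
 * Hypothetical hot counter. Aligning the member to 128 bytes also raises
 * the alignment and the size of the whole struct to 128 bytes, so each
 * counter occupies its own 2N/2N+1 cacheline pair.
 */
struct sketch_hot_counter {
	alignas(SKETCH_DOUBLE_CL) uint64_t value;
};

/* Two counters that are meant to be written by different CPUs. */
static struct sketch_hot_counter sketch_counters[2];

/* Compile-time check that the padding really is in place. */
static_assert(sizeof(struct sketch_hot_counter) == SKETCH_DOUBLE_CL,
	      "each counter should fill one adjacent-cacheline pair");
```

    With this layout the two counters can never land in the same pair of
    adjacent cachelines, at the cost of 120 bytes of padding per counter.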
    
    So add an option "--double-cl" for 'perf c2c report' to show false
    sharing at double cache line granularity, as if the cacheline size
    were doubled. There is no change to c2c record. The hardware events
    for shared cachelines are still recorded per cacheline, and this
    option only changes the granularity at which events are grouped and
    displayed.
    
    In the 'perf c2c report' output below (will-it-scale's 'pagefault2' case
    on an old kernel):
    
      ----------------------------------------------------------------------
         26       31        2        0        0        0  0xffff888103ec6000
      ----------------------------------------------------------------------
       35.48%   50.00%    0.00%    0.00%    0.00%   0x10     0       1  0xffffffff8133148b   1153   66    971   3748   74  [k] get_mem_cgroup_from_mm
        6.45%    0.00%    0.00%    0.00%    0.00%   0x10     0       1  0xffffffff813396e4    570    0   1531    879   75  [k] mem_cgroup_charge
       25.81%   50.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff81331472    949   70    593   3359   74  [k] get_mem_cgroup_from_mm
       19.35%    0.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff81339686   1352    0   1073   1022   74  [k] mem_cgroup_charge
        9.68%    0.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff813396d6   1401    0    863    768   74  [k] mem_cgroup_charge
        3.23%    0.00%    0.00%    0.00%    0.00%   0x54     0       1  0xffffffff81333106    618    0    804     11    9  [k] uncharge_batch
    
    The offsets 0x10 and 0x54 used to be displayed in 2 groups, and now they
    are listed together to give users a hint of extended false sharing.
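
    The regrouping above can be sketched in C as follows. This is only an
    illustration: it assumes a fixed 64-byte cacheline (perf determines the
    real size at runtime), and the sketch_* names are hypothetical rather
    than the identifiers used in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64-byte cachelines. */
#define SKETCH_CL_SIZE 64ULL

/* Base address of the single or double cacheline that contains addr. */
static inline uint64_t sketch_cl_address(uint64_t addr, bool double_cl)
{
	uint64_t size = double_cl ? 2 * SKETCH_CL_SIZE : SKETCH_CL_SIZE;

	/* size is a power of two, so clearing the low bits rounds down */
	return addr & ~(size - 1);
}

/* Offset of addr within that single or double cacheline. */
static inline uint64_t sketch_cl_offset(uint64_t addr, bool double_cl)
{
	return addr - sketch_cl_address(addr, double_cl);
}

/* Self-check using the base address and offsets from the report above. */
static void sketch_check(void)
{
	const uint64_t base = 0xffff888103ec6000ULL;

	/* Per-cacheline view: 0x10 and 0x54 fall into different lines. */
	assert(sketch_cl_address(base + 0x10, false) == base);
	assert(sketch_cl_address(base + 0x54, false) == base + 0x40);

	/* Double-cacheline view: both map to the same group. */
	assert(sketch_cl_address(base + 0x10, true) == base);
	assert(sketch_cl_address(base + 0x54, true) == base);
	assert(sketch_cl_offset(base + 0x54, true) == 0x54);
}
```

    In the per-cacheline view, 0x54 belongs to the line starting at 0x40,
    which is why the two offsets were reported separately. With the doubled
    granularity both share the base 0xffff888103ec6000.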
    
    [1]. https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/
    
    Committer notes:
    
    Link: https://lore.kernel.org/r/Y+wvVNWqXb70l4uy@feng-clx
    
    Removed -a, leaving just --double-cl, as this is probably not used that
    frequently and may even be auto-detected if we manage to record the MSR
    where this is configured.

    Reviewed-by: Andi Kleen <ak@linux.intel.com>
    Reviewed-by: Leo Yan <leo.yan@linaro.org>
    Signed-off-by: Feng Tang <feng.tang@intel.com>
    Tested-by: Leo Yan <leo.yan@linaro.org>
    Acked-by: Joe Mario <jmario@redhat.com>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Kan Liang <kan.liang@linux.intel.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Tim Chen <tim.c.chen@intel.com>
    Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
    Link: https://lore.kernel.org/r/20230214075823.246414-1-feng.tang@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
cacheline.h 708 Bytes