Commits · 16817ad7e8b31728b44ff9f17d8d894ed8a450d0 · Kirill Smelkov / linux

13 Sep, 2022 2 commits

perf/bpf: Always use perf callchains if exist · 16817ad7

Namhyung Kim authored Sep 08, 2022

If the perf_event has PERF_SAMPLE_CALLCHAIN, BPF can use it for the stack trace.
The problematic cases like PEBS and IBS are already handled in the PMU driver,
which fills the callchain info in the sample data. For others, we can call
perf_callchain() before the BPF handler.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220908214104.3851807-2-namhyung@kernel.org


perf: Use sample_flags for callchain · 3749d33e

Namhyung Kim authored Sep 08, 2022

Use sample_flags so that perf_callchain() is called only if needed.
Historically this relied on __PERF_SAMPLE_CALLCHAIN_EARLY, but the same can be
done with sample_flags in struct perf_sample_data.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220908214104.3851807-1-namhyung@kernel.org
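
The "only if needed" check described above can be sketched in user-space C.
This is a minimal illustration, not the kernel code: struct sample_data,
SAMPLE_CALLCHAIN, expensive_unwind() and prepare_callchain() are hypothetical
stand-ins for struct perf_sample_data, PERF_SAMPLE_CALLCHAIN and
perf_callchain(), and the bit value is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PERF_SAMPLE_CALLCHAIN; bit value is assumed. */
#define SAMPLE_CALLCHAIN (1u << 5)

/* Hypothetical, heavily reduced version of the per-sample data. */
struct sample_data {
    uint64_t sample_flags;  /* which fields have already been filled in */
    const char *callchain;  /* stand-in for the real callchain entries */
};

/* Counts how often the costly unwind ran, so the effect is observable. */
static unsigned int unwind_calls;

/* Stand-in for the expensive callchain unwind. */
static const char *expensive_unwind(void)
{
    unwind_calls++;
    return "unwound";
}

/* Fill the callchain only if it was requested and is not there yet. */
static void prepare_callchain(struct sample_data *data, uint64_t sample_type)
{
    assert(data != NULL);

    if (!(sample_type & SAMPLE_CALLCHAIN))
        return;                 /* the event did not ask for a callchain */
    if (data->sample_flags & SAMPLE_CALLCHAIN)
        return;                 /* a driver already filled it in */

    data->callchain = expensive_unwind();
    data->sample_flags |= SAMPLE_CALLCHAIN;
}
```

With this shape, a driver that has already set the flag in sample_flags makes
the later call a no-op, so the unwind runs at most once per sample.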


07 Sep, 2022 14 commits

perf/x86/intel: Optimize FIXED_CTR_CTRL access · fae9ebde

Kan Liang authored Aug 04, 2022

All the fixed counters share a fixed control register. Currently, perf
reads and rewrites the fixed control register for each fixed counter
disable/enable, which is unnecessary.

When changing the fixed control register, the entire PMU must be
disabled via the global control register. The change cannot take
effect until the entire PMU is re-enabled. Updating the fixed control
register once, right before the entire PMU is re-enabled, is enough.

The read of the fixed control register is not necessary either. The
value can be cached in the per-CPU cpu_hw_events.

Test results:

Count all the fixed counters while running perf bench sched pipe on an
SPR machine, as below.

 $ perf stat -e cycles,instructions,ref-cycles,slots --no-inherit -- \
     taskset -c 1 perf bench sched pipe

The total elapsed time drops from 5.36s (without the patch) to 4.99s
(with the patch), which is a ~6.9% improvement.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220804140729.2951259-1-kan.liang@linux.intel.com
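
The caching scheme described in this message can be sketched in user-space C.
This is a simplified model under stated assumptions, not the driver code:
struct fake_pmu and all function names are hypothetical, the 4-bit field per
fixed counter is assumed for illustration, and skipping the write when nothing
changed is a design choice of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the shared register plus a per-CPU software copy. */
struct fake_pmu {
    uint64_t fixed_ctrl_hw;      /* simulated fixed control register */
    uint64_t fixed_ctrl_cached;  /* cached value, changed without hw access */
    unsigned int hw_writes;      /* number of simulated register writes */
};

/* Assumed layout: each fixed counter owns a 4-bit field in the register. */
static uint64_t fixed_field(unsigned int idx, uint64_t bits)
{
    assert(idx < 16);
    return (bits & 0xfULL) << (idx * 4);
}

/* Enabling a counter only updates the cached value. */
static void fixed_enable(struct fake_pmu *pmu, unsigned int idx, uint64_t bits)
{
    pmu->fixed_ctrl_cached &= ~fixed_field(idx, 0xf);
    pmu->fixed_ctrl_cached |= fixed_field(idx, bits);
}

/* Disabling a counter also only updates the cached value. */
static void fixed_disable(struct fake_pmu *pmu, unsigned int idx)
{
    pmu->fixed_ctrl_cached &= ~fixed_field(idx, 0xf);
}

/* Called once, right before the whole PMU is re-enabled. */
static void pmu_enable_all(struct fake_pmu *pmu)
{
    if (pmu->fixed_ctrl_hw != pmu->fixed_ctrl_cached) {
        pmu->fixed_ctrl_hw = pmu->fixed_ctrl_cached;  /* the single write */
        pmu->hw_writes++;
    }
}
```

However many counters are enabled or disabled while the PMU is off, the
simulated register is written at most once per pmu_enable_all() call and is
never read back.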


perf/x86/p4: Remove perfctr_second_write quirk · dbf4e792

Peter Zijlstra authored May 20, 2022

Now that we have an x86_pmu::set_period() method, use it to remove the
perfctr_second_write quirk from the generic code.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220829101321.839502514@infradead.org


perf/x86/intel: Remove x86_pmu::update_topdown_event · 1acab2e0

Peter Zijlstra authored May 11, 2022

Now that it is all internal to the intel driver, remove
x86_pmu::update_topdown_event.

This assumes that is_topdown_count(event) can only be true when the
hardware has topdown support and the function is set.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220829101321.771635301@infradead.org


perf/x86/intel: Remove x86_pmu::set_topdown_event_period · 23685167

Peter Zijlstra authored May 11, 2022

Now that it is all internal to the intel driver, remove
x86_pmu::set_topdown_event_period.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220829101321.706354189@infradead.org


perf/x86: Add a x86_pmu::limit_period static_call · 08b3068f

Peter Zijlstra authored May 10, 2022

Avoid a branch and indirect call.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220829101321.640658334@infradead.org


perf/x86: Change x86_pmu::limit_period signature · 28f0f3c4

Peter Zijlstra authored May 10, 2022

In preparation for making it a static_call, change the signature.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220829101321.573713839@infradead.org


perf/x86/intel: Move the topdown stuff into the intel driver · e577bb17

Peter Zijlstra authored May 10, 2022

Use the new x86_pmu::{set_period,update}() methods to push the topdown
stuff into the Intel driver, where it belongs.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220829101321.505933457@infradead.org


perf/x86: Add two more x86_pmu methods · 73759c34

Peter Zijlstra authored May 10, 2022

In order to clean up x86_perf_event_{set_period,update}(), start by
adding them as x86_pmu methods.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220829101321.440196408@infradead.org


perf: Add a few assertions · f3c0eba2

Peter Zijlstra authored Sep 02, 2022

While auditing 6b959ba2 ("perf/core: Fix reentry problem in
perf_output_read_group()") a few spots were found that wanted
assertions.

Notably, for_each_sibling_event() relies on exclusion from
modification. This would normally mean holding either ctx->lock or
ctx->mutex; however, due to how things are constructed, disabling IRQs
is a valid and sufficient substitute for ctx->lock.

Another possible site to add assertions would be the various
pmu::{add,del,read,..}() methods, but that's not trivially expressible
in C -- the best option is wrappers, but those are easy enough to
forget.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>


x86/perf: Assert all platform event flags are within PERF_EVENT_FLAG_ARCH · 88081cfb

Anshuman Khandual authored Sep 07, 2022

Ensure all platform specific event flags are within PERF_EVENT_FLAG_ARCH.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: James Clark <james.clark@arm.com>
Link: https://lkml.kernel.org/r/20220907091924.439193-5-anshuman.khandual@arm.com


arm64/perf: Assert all platform event flags are within PERF_EVENT_FLAG_ARCH · 91207f62

Anshuman Khandual authored Sep 07, 2022


perf/core: Assert PERF_EVENT_FLAG_ARCH does not overlap with generic flags · f67dd218

Anshuman Khandual authored Sep 07, 2022

This just ensures that PERF_EVENT_FLAG_ARCH does not overlap with generic
hardware event flags.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: James Clark <james.clark@arm.com>
Link: https://lkml.kernel.org/r/20220907091924.439193-3-anshuman.khandual@arm.com


perf/core: Expand PERF_EVENT_FLAG_ARCH · 7517f08b

Anshuman Khandual authored Sep 07, 2022

Two hardware event flags on the x86 platform have overshot PERF_EVENT_FLAG_ARCH
(0x0000ffff). These flags are PERF_X86_EVENT_PEBS_LAT_HYBRID (0x20000) and
PERF_X86_EVENT_AMD_BRS (0x10000). Let's expand the PERF_EVENT_FLAG_ARCH mask to
accommodate those flags, and also create room for two more in the future.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: James Clark <james.clark@arm.com>
Link: https://lkml.kernel.org/r/20220907091924.439193-2-anshuman.khandual@arm.com
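
The overshoot can be checked with a little mask arithmetic. The sketch below
uses the flag values quoted in the message; the widened mask 0x000fffff is an
assumption inferred from "room for two more" (bits 16 to 19), and the macro
and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define OLD_FLAG_ARCH 0x0000ffffu  /* mask value quoted in the message */
#define NEW_FLAG_ARCH 0x000fffffu  /* assumed: adds bits 16..19 */

#define X86_EVENT_AMD_BRS         0x10000u  /* value from the message */
#define X86_EVENT_PEBS_LAT_HYBRID 0x20000u  /* value from the message */

/* Returns nonzero when every bit of flag lies inside mask. */
static int flag_within(uint32_t flag, uint32_t mask)
{
    return (flag & ~mask) == 0;
}

/* Compile-time versions of the same check against the assumed new mask. */
_Static_assert((X86_EVENT_AMD_BRS & ~NEW_FLAG_ARCH) == 0,
               "AMD_BRS must fit in the arch mask");
_Static_assert((X86_EVENT_PEBS_LAT_HYBRID & ~NEW_FLAG_ARCH) == 0,
               "PEBS_LAT_HYBRID must fit in the arch mask");
```

Both flags fail flag_within() against the old mask and pass against the wider
one, which still leaves bits 18 and 19 (0x40000 and 0x80000) unused.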


perf: Consolidate branch sample filter helpers · 03b02db9

Anshuman Khandual authored Sep 06, 2022

Besides the branch type filtering requests, 'event.attr.branch_sample_type'
also contains various flags indicating which additional information should
be captured, along with the base branch record. These flags help configure
the underlying hardware, and capture the branch records appropriately when
required, e.g. after a PMU interrupt. But first, this moves an existing helper
perf_sample_save_hw_index() into the header before adding some more helpers
for other branch sample filter flags.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220906084414.396220-1-anshuman.khandual@arm.com
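
A helper of the kind described above is essentially a one-line bit test on
branch_sample_type. The following is a reduced user-space sketch: the structs
are hypothetical miniatures of the kernel types, the bit positions are
assumptions, and sample_save_hw_index() only mirrors the role of
perf_sample_save_hw_index().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions, used only for this illustration. */
#define BRANCH_ANY      (1ull << 3)   /* a branch type filter request */
#define BRANCH_HW_INDEX (1ull << 17)  /* an "extra information" flag */

/* Hypothetical miniature versions of the event attribute and event. */
struct event_attr {
    uint64_t branch_sample_type;
};

struct event {
    struct event_attr attr;
};

/* True when the event asked for the hardware index to be saved. */
static inline bool sample_save_hw_index(const struct event *event)
{
    assert(event != NULL);
    return (event->attr.branch_sample_type & BRANCH_HW_INDEX) != 0;
}
```

Keeping such predicates as static inline functions in a shared header lets
every PMU driver test the same flag in the same way.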


06 Sep, 2022 6 commits

perf: Use sample_flags for txn · ee9db0e1