- 27 Oct, 2021 35 commits
-
Suzuki K Poulose authored
With the TRBE driver workaround available, enable the config symbols to be built without COMPILE_TEST. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-16-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Suzuki K Poulose authored
With the workaround enabled in TRBE, enable the config entries to be built without COMPILE_TEST. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-15-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Suzuki K Poulose authored
TRBE implementations affected by Arm erratum 2253138 or 2224489 could write to the next address after TRBLIMITR.LIMIT, instead of wrapping to TRBBASER. This implies that the TRBE could potentially corrupt:
 - A page used by the rest of the kernel/user (if the LIMIT = end of perf ring buffer).
 - A page within the ring buffer, but outside the driver's range [head, head + size]. This page may contain trace data that may be consumed by userspace.
We work around this erratum by:
 - Making sure that there is at least one extra PAGE_SIZE of space left in the TRBE's range beyond what we normally assign. This is in addition to other restrictions (e.g., the TRBE alignment for working around TRBE_WORKAROUND_OVERWRITE_IN_FILL_MODE, where there is a minimum of PAGE_SIZE; thus we would have 2 * PAGE_SIZE).
 - Adjusting the LIMIT to leave the last PAGE_SIZE out of the TRBE's allowed range (i.e., TRBBASER...TRBLIMITR.LIMIT), by: TRBLIMITR.LIMIT -= PAGE_SIZE
Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-14-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
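The limit adjustment described in this commit can be sketched in plain C. This is only an illustration under stated assumptions: `sketch_adjust_limit()` and `SKETCH_PAGE_SIZE` are hypothetical names, the page size is assumed to be 4 KiB, and the real driver's checks may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumed page size, for illustration only */

/*
 * Hypothetical helper: on an affected CPU, pull the limit in by one page so
 * that a stray write just past LIMIT still lands in memory the driver owns.
 * Returns base (i.e. "no usable space") if the range cannot spare a page.
 */
static uint64_t sketch_adjust_limit(uint64_t base, uint64_t limit,
				    bool may_write_out_of_range)
{
	assert(limit >= base);

	if (!may_write_out_of_range)
		return limit;
	if (limit - base <= SKETCH_PAGE_SIZE)
		return base;
	return limit - SKETCH_PAGE_SIZE;
}
```

Returning `base` when the range is too small is a design choice of this sketch; it mirrors the "no space" outcome that the series handles elsewhere.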
-
Suzuki K Poulose authored
The TRBE driver makes sure that there is enough space for a meaningful run; otherwise it pads the given space and restarts the offset calculation once. But a single retry does not guarantee that we either find enough space or hit "no space". Make sure that we repeat the step until either:
 - We have the minimum space, OR
 - There is NO space at all.
Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-13-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
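A rough model of the "repeat until minimum space or no space" loop is given below. It is a sketch, not the driver code: the ring buffer is reduced to a hypothetical list of contiguous gaps seen on successive attempts, and all `sketch_*` names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the perf AUX ring buffer state. */
struct sketch_ring {
	const uint64_t *gaps;	/* contiguous space seen on each attempt */
	size_t nr_gaps;
	size_t attempt;
	uint64_t padded;	/* bytes filled with padding so far */
};

/* Space available at the current head; 0 means "no space at all". */
static uint64_t sketch_space_at_head(const struct sketch_ring *r)
{
	return r->attempt < r->nr_gaps ? r->gaps[r->attempt] : 0;
}

/* Keep padding undersized gaps until we get min_space or run out. */
static uint64_t sketch_find_space(struct sketch_ring *r, uint64_t min_space)
{
	assert(min_space > 0);

	for (;;) {
		uint64_t space = sketch_space_at_head(r);

		if (space == 0 || space >= min_space)
			return space;
		/* Too small to be useful: pad it out and recompute. */
		r->padded += space;
		r->attempt++;
	}
}

/* Usage: gaps of 16, 32 and 4096 bytes are offered in turn. */
static uint64_t sketch_find_space_example(uint64_t min_space)
{
	static const uint64_t gaps[] = { 16, 32, 4096 };
	struct sketch_ring r = { gaps, 3, 0, 0 };

	return sketch_find_space(&r, min_space);
}
```

The loop terminates because every iteration either returns or consumes one gap, so the "no space" case is always reached eventually.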
-
Suzuki K Poulose authored
For the TRBE to operate, we need a minimum amount of space available to collect a meaningful trace session. This is currently a few bytes, but we may need to extend this for working around errata. So, abstract this into a helper function. Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-12-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Suzuki K Poulose authored
ARM Neoverse-N2 (#2139208) and Cortex-A710 (#2119858) suffer from an erratum which, when triggered, might cause the TRBE to overwrite the trace data already collected in FILL mode, in the event of a WRAP. That is, the TRBE doesn't stop writing the data; instead it wraps to the base and could write up to 3 cache lines' worth of trace. Thus, this could corrupt the trace at the "BASE" pointer. The workaround is to program the write pointer 256 bytes from the base, such that if the erratum is triggered, it doesn't overwrite the trace data that was captured. This skipped region could be padded with ignore packets at the end of the session, so that the decoder sees a continuous buffer with some padding at the beginning. The trace data written at the base is considered lost, as the limit could have been in the middle of the perf ring buffer, and jumping to the "base" is not acceptable. We already set the flags to indicate that some amount of trace was lost during the FILL event IRQ. So this is fine. One important change with the workaround is that we program the TRBBASER_EL1 to the current page where we are allowed to write. Otherwise, it could overwrite a region that may be consumed by perf. Towards this, we always make sure that "handle->head", and thus trbe_write, is PAGE_SIZE aligned, so that we can set the BASE to the PAGE base and move the TRBPTR to the 256-byte offset. Cc: Mike Leach <mike.leach@linaro.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-11-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
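The pointer programming described here (BASE at the page-aligned head, write pointer 256 bytes in) might look roughly like the following. The struct and function names are hypothetical, the page size is assumed, and no real register access is performed.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096ULL	/* assumed page size */
#define SKETCH_SKIP_BYTES	256ULL	/* offset named in the commit message */

/* Hypothetical container for the values the driver would program. */
struct sketch_trbe_regs {
	uint64_t base;	/* value intended for TRBBASER_EL1 */
	uint64_t write;	/* value intended for TRBPTR_EL1 */
};

/*
 * With head page aligned, BASE is set to that page and the write pointer
 * starts 256 bytes in, so a wrap-and-overwrite triggered by the erratum
 * only hits the skipped region [base, base + 256).
 */
static struct sketch_trbe_regs sketch_program_fill_mode(uint64_t head)
{
	struct sketch_trbe_regs regs;

	assert((head & (SKETCH_PAGE_SIZE - 1)) == 0);
	regs.base = head;
	regs.write = head + SKETCH_SKIP_BYTES;
	return regs;
}
```

At the end of the session, the skipped 256 bytes at `base` would be the region to fill with ignore packets, per the description above.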
-
Suzuki K Poulose authored
Add a minimal infrastructure to keep track of the errata affecting the given TRBE instance. Given that we have heterogeneous CPUs, we have to manage the list per-TRBE instance to be able to apply the work around as needed. Thus we will need to check if individual CPUs are affected by the erratum. We rely on the arm64 errata framework for the actual description and the discovery of a given erratum, to keep the erratum work around at a central place and benefit from the code and the advertisement from the kernel. Though we could reuse "this_cpu_has_cap()" to apply an erratum work around, it is a bit of a heavy operation, as it must go through the "erratum" detection check on the CPU every time it is called (e.g., scanning through a table of affected MIDRs). Since we need to do this check for every session, possibly multiple times (depending on the work around), we could save the cycles by caching the affected errata per-CPU instance in the per-CPU struct trbe_cpudata. Since we are only interested in the errata affecting the TRBE driver, we only need to track very few of them per-CPU. Thus we use a local mapping of the CPUCAP for the erratum to avoid bloating up a bitmap for trbe_cpudata, i.e., each arm64 TRBE erratum bit is assigned an "index" within the driver to track. Each TRBE instance updates the list of affected errata at probe time on the CPU. This makes sure that we can easily access the list of errata on a given TRBE instance without much overhead. Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-10-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
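The caching scheme (a small driver-local index per erratum, filled once at probe time) can be illustrated with a userspace sketch. Everything here is hypothetical: the capability numbers are made up, and `sketch_slow_cpu_has_cap()` merely stands in for an expensive check such as this_cpu_has_cap().

```c
#include <assert.h>
#include <stdbool.h>

/* Driver-local indices: only the errata this driver cares about. */
enum sketch_erratum {
	SKETCH_OVERWRITE_FILL_MODE,
	SKETCH_WRITE_OUT_OF_RANGE,
	SKETCH_ERRATA_MAX,
};

/* Local index -> global capability number (numbers are made up). */
static const int sketch_cap_of[SKETCH_ERRATA_MAX] = {
	[SKETCH_OVERWRITE_FILL_MODE] = 57,
	[SKETCH_WRITE_OUT_OF_RANGE] = 58,
};

/* Stand-in for the expensive per-call check; pretends cap 58 is set. */
static bool sketch_slow_cpu_has_cap(int cap)
{
	return cap == 58;
}

/* Run once at probe time: cache the results in a tiny bitmap. */
static unsigned long sketch_probe_errata(void)
{
	unsigned long errata = 0;
	int i;

	for (i = 0; i < SKETCH_ERRATA_MAX; i++)
		if (sketch_slow_cpu_has_cap(sketch_cap_of[i]))
			errata |= 1UL << i;
	return errata;
}

/* Cheap per-session query against the cached bitmap. */
static bool sketch_has_erratum(unsigned long errata, enum sketch_erratum e)
{
	assert(e < SKETCH_ERRATA_MAX);
	return errata & (1UL << e);
}
```

Because the local index space is tiny, a single word is enough for the cache, which is the point of not reusing the global capability bitmap per CPU.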
-
Suzuki K Poulose authored
The TRBE hardware mandates a minimum alignment for the TRBPTR_EL1, advertised via the TRBIDR_EL1. This is used by the driver to align the buffer write head. This patch allows the driver to choose a different alignment from that of the hardware, by decoupling the alignment tracking. This will be useful for working around errata. Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-9-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Suzuki K Poulose authored
We always set the TRBBASER_EL1 to the base of the virtual ring buffer. We are about to change this for working around an erratum. So, in preparation for that, allow the driver to choose a different base for the TRBBASER_EL1 (which is within the buffer range). Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-8-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Suzuki K Poulose authored
Refactor the helper to pad a given AUX buffer area to allow "filling" ignore packets, without moving any handle pointers. This will be useful in working around errata, where we may have to fill the buffer after a session. Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-7-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Suzuki K Poulose authored
We collect the trace from the TRBE on FILL event from IRQ context and via update_buffer(), when the event is stopped. Let us consolidate how we calculate the trace generated into a helper. Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-6-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Suzuki K Poulose authored
If a CPU is offline during the driver init, we could end up causing a kernel crash trying to register the coresight device for the TRBE instance. The trbe_cpudata for the TRBE instance is initialized only when it is probed. Otherwise, we could end up dereferencing a NULL cpudata->drvdata. e.g:
 [ 0.149999] coresight ete0: CPU0: ete v1.1 initialized
 [ 0.149999] coresight-etm4x ete_1: ETM arch init failed
 [ 0.149999] coresight-etm4x: probe of ete_1 failed with error -22
 [ 0.150085] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000050
 [ 0.150085] Mem abort info:
 [ 0.150085] ESR = 0x96000005
 [ 0.150085] EC = 0x25: DABT (current EL), IL = 32 bits
 [ 0.150085] SET = 0, FnV = 0
 [ 0.150085] EA = 0, S1PTW = 0
 [ 0.150085] Data abort info:
 [ 0.150085] ISV = 0, ISS = 0x00000005
 [ 0.150085] CM = 0, WnR = 0
 [ 0.150085] [0000000000000050] user address but active_mm is swapper
 [ 0.150085] Internal error: Oops: 96000005 [#1] PREEMPT SMP
 [ 0.150085] Modules linked in:
 [ 0.150085] Hardware name: FVP Base RevC (DT)
 [ 0.150085] pstate: 00800009 (nzcv daif -PAN +UAO -TCO BTYPE=--)
 [ 0.150155] pc : arm_trbe_register_coresight_cpu+0x74/0x144
 [ 0.150155] lr : arm_trbe_register_coresight_cpu+0x48/0x144
 ...
 [ 0.150237] Call trace:
 [ 0.150237] arm_trbe_register_coresight_cpu+0x74/0x144
 [ 0.150237] arm_trbe_device_probe+0x1c0/0x2d8
 [ 0.150259] platform_drv_probe+0x94/0xbc
 [ 0.150259] really_probe+0x1bc/0x4a8
 [ 0.150266] driver_probe_device+0x7c/0xb8
 [ 0.150266] device_driver_attach+0x6c/0xac
 [ 0.150266] __driver_attach+0xc4/0x148
 [ 0.150266] bus_for_each_dev+0x7c/0xc8
 [ 0.150266] driver_attach+0x24/0x30
 [ 0.150266] bus_add_driver+0x100/0x1e0
 [ 0.150266] driver_register+0x78/0x110
 [ 0.150266] __platform_driver_register+0x44/0x50
 [ 0.150266] arm_trbe_init+0x28/0x84
 [ 0.150266] do_one_initcall+0x94/0x2bc
 [ 0.150266] do_initcall_level+0xa4/0x158
 [ 0.150266] do_initcalls+0x54/0x94
 [ 0.150319] do_basic_setup+0x24/0x30
 [ 0.150319] kernel_init_freeable+0xe8/0x14c
 [ 0.150319] kernel_init+0x14/0x18c
 [ 0.150319] ret_from_fork+0x10/0x30
 [ 0.150319] Code: f94012c8 b0004ce2 9134a442 52819801 (f9402917)
 [ 0.150319] ---[ end trace d23e0cfe5098535e ]---
 [ 0.150346] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
Fix this by skipping the step, if we are unable to probe the CPU. Fixes: 3fbf7f01 ("coresight: sink: Add TRBE driver") Reported-by: Branislav Rankov <branislav.rankov@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: stable <stable@vger.kernel.org> Tested-by: Branislav Rankov <branislav.rankov@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20211014142238.2221248-1-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Suzuki K Poulose authored
The TRBE driver wrongly treats the aux private data as the TRBE driver specific buffer for a given perf handle, while it is the ETM PMU's event specific data. Fix this by correcting the instance to use appropriate helper. Cc: stable <stable@vger.kernel.org> Fixes: 3fbf7f01 ("coresight: sink: Add TRBE driver") Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20210921134121.2423546-2-suzuki.poulose@arm.com [Fixed 13 character SHA down to 12] Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Tao Zhang authored
Add ETM PID for Kryo-5XX to the list of supported ETMs. Otherwise, Kryo-5XX ETMs will not be initialized successfully. For example, this change can be verified on the qrb5165-rb5 board: ETM4-ETM7 nodes will not be visible without this change. Signed-off-by: Tao Zhang <quic_taozha@quicinc.com> Link: https://lore.kernel.org/r/1632477981-13632-2-git-send-email-quic_taozha@quicinc.com Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Suzuki K Poulose authored
When the TRBE generates an IRQ, we stop the TRBE, collect the trace and then reprogram the TRBE with the updated buffer pointers, whenever possible. We might also leave the TRBE disabled, if there is not enough space left in the buffer. However, we do not touch the ETE at all during all of this. This means the ETE is only disabled when the event is disabled later (via irq_work). This is incorrect, as the ETE trace is still ON without actually being captured and may be routed to the ATB (even if it is for a short duration). So, we always move the CPU into the trace prohibited state before disabling the TRBE, upon entering the IRQ handler. The state is restored if the TRBE is enabled back. Otherwise the trace remains prohibited. Since the ETM/ETE driver now controls the TRFCR_EL1 per session, the tracing can be restored/enabled back when the event is rescheduled in. Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210923143919.2944311-6-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Suzuki K Poulose authored
When we detect that there isn't enough space left to start a meaningful session, we disable the TRBE, marking the buffer as TRUNCATED. But we delay the notification to the perf layer by perf_aux_output_end() until the event is scheduled out, triggered from the kernel perf layer. This will cause significant blackouts in the trace. Now that the CoreSight PMU layer can handle a closed "AUX" handle properly, we can close the handle as soon as we detect the case, allowing the userspace to collect and re-enable the event. Also, while in the IRQ handler, move the irq_work_run() after we have updated the handle, to make sure the "TRUNCATED" flag causes the event to be disabled as soon as possible. Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210923143919.2944311-5-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Suzuki K Poulose authored
The TRBE driver marks the AUX buffer as TRUNCATED when we get an IRQ on FILL event. This has the rather unwanted side-effect of the event being disabled when there may be more space in the ring buffer. So, instead of TRUNCATED we need a different flag to indicate that the trace may have lost a few bytes (i.e., from the point of generating the FILL event until the IRQ is consumed). In any case, the userspace must use the size from RECORD_AUX headers to restrict the "trace" decoding. Using the PARTIAL flag causes the perf tool to generate the following warning:
  Warning: AUX data had gaps in it XX times out of YY!
  Are you running a KVM guest in the background?
which is pointlessly scary for a user. The other remaining options are:
 - COLLISION - Used by SPE to indicate that samples collided
 - Add a new flag - Specifically for CoreSight, which doesn't sound so good if we can re-use something.
Given that we don't already use the "COLLISION" flag, the above behavior can be notified using this flag for CoreSight. Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: James Clark <james.clark@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Leo Yan <leo.yan@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210923143919.2944311-4-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Suzuki K Poulose authored
On a spurious IRQ, right now we disable the TRBE and then re-enable it, resetting the "buffer" pointers (i.e., BASE, LIMIT and more importantly WRITE) to the original pointers from the AUX handle. This implies that we overwrite any trace that was written so far (by overwriting TRBPTR), while we should have ignored the IRQ. On detecting a spurious IRQ after examining the TRBSR, we simply re-enable the TRBE without touching the other parameters. Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210923143919.2944311-3-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Suzuki K Poulose authored
The IRQ handler of the TRBE driver could race against update_buffer() in consuming the IRQ. So, if update_buffer() gets to processing the TRBE IRQ first, the TRBSR will be cleared. Thus, by the time the IRQ handler is triggered, there is nothing to do there. Handle these cases and do not disable the TRBE unnecessarily. Since the TRBSR can be read without stopping the TRBE, we can check that before disabling the TRBE. Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210923143919.2944311-2-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Suzuki K Poulose authored
Unify the sequence of enabling the TRBE. We do this from event_start and also from the TRBE IRQ handler. Let's move this to a common helper. The only minor functional change is returning an error when we fail to enable the TRBE. This should be handled already. Since we now have a unique entry point for trying to enable the TRBE, move the format flag setting to the central place. Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210914102641.1852544-9-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Suzuki K Poulose authored
We mark the buffer as TRUNCATED when there is no space left in the buffer. But we do it at different points. __trbe_normal_offset() and also, at all the callers of the above function via compute_trbe_buffer_limit(), when the limit == base (i.e offset = 0 as returned by the __trbe_normal_offset()). So, given that the callers already mark the buffer as TRUNCATED drop the caller inside the __trbe_normal_offset(). This is in preparation to moving the handling of TRUNCATED into a central place. Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20210914102641.1852544-6-suzuki.poulose@arm.com [Moved comment as Anshuman requested] Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Suzuki K Poulose authored
When the TRBE is stopped on truncating an event, we may not set the FORMAT flag, even though the size of the record is 0. Let us be consistent and not confuse the user. To ensure that the format flag is always set on all the records generated by TRBE, set the flag when we have a new handle. Rather than deferring to the "end" operation, which makes it clear. So, we can do this from - arm_trbe_enable() -> When a new handle is provided by the CoreSight PMU, triggered via etm_event_start() - trbe_handle_overflow() -> When we begin a new handle after closing the previous on overflow. Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20210914102641.1852544-5-suzuki.poulose@arm.com [Fixed inverted words in title] Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Suzuki K Poulose authored
The ETM perf infrastructure closes out a handle during event_stop or on an error in starting the event. In either case, it is possible for a "sink" to update/close the handle, under certain circumstances (e.g., no space in ring buffer). So, ensure that we handle this gracefully in the PMU driver by verifying the handle is still valid. Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210914102641.1852544-4-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Suzuki K Poulose authored
The Trace Filtering support (FEAT_TRF) ensures that the ETM can be prohibited from generating any trace for a given EL. This is a much stricter knob than the TRCVICTLR exception level masks, which don't prevent the ETM from generating Context packets for an "excluded" EL. At the moment, we do a one-time enable of trace at user and kernel and leave it untouched for the kernel lifetime. This implies that the ETM could potentially generate trace packets containing kernel addresses, thus leaking kernel virtual addresses in the trace. This patch makes the switch dynamic, by honoring the filters set by the user and enforcing them in the TRFCR controls. We also rename cpu_enable_tracing() appropriately to cpu_detect_trace_filtering() and the drvdata member trfc => trfcr to indicate the "value" of the TRFCR_EL1. Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Al Grant <al.grant@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20210914102641.1852544-3-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
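Honoring the user's filters in the TRFCR value could be sketched as below. This is an assumption-laden illustration: the macro and function names are hypothetical, and the bit positions (E0TRE at bit 0, E1TRE at bit 1) reflect my reading of the architecture rather than anything stated in this commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions for the EL0/EL1 trace enable controls. */
#define SKETCH_TRFCR_E0TRE	(1ULL << 0)	/* allow trace at EL0 (user) */
#define SKETCH_TRFCR_E1TRE	(1ULL << 1)	/* allow trace at EL1 (kernel) */

/*
 * Hypothetical helper: derive the TRFCR value for a session from the
 * exclude_user/exclude_kernel filters, leaving all other bits alone.
 */
static uint64_t sketch_trfcr_for_session(uint64_t trfcr, bool exclude_user,
					 bool exclude_kernel)
{
	trfcr &= ~(SKETCH_TRFCR_E0TRE | SKETCH_TRFCR_E1TRE);
	if (!exclude_user)
		trfcr |= SKETCH_TRFCR_E0TRE;
	if (!exclude_kernel)
		trfcr |= SKETCH_TRFCR_E1TRE;
	return trfcr;
}
```

With both filters set, the result prohibits trace at both levels, which is also the state the TRBE IRQ handler relies on when it stops tracing.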
-
Suzuki K Poulose authored
When the CPU enters a low power mode, the TRFCR_EL1 contents could be reset. Thus we need to save/restore the TRFCR_EL1 along with the ETM4x registers to allow the tracing. The TRFCR related helpers are in a new header file, as we need to use them for TRBE in the later patches. Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210914102641.1852544-2-suzuki.poulose@arm.com [Fixed cosmetic details] Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
James Clark authored
When a traced process runs on a CPU that can't reach the selected sink, the event will be stopped with PERF_HES_STOPPED. This means that even if the process migrates to a valid CPU, tracing will not resume. This can be reproduced (on N1SDP) by using taskset to start the process on CPU 0, and then switching it to CPU 2 (ETF 1 is only reachable from CPU 2):
  taskset --cpu-list 0 ./perf record -e cs_etm/@tmc_etf1/ --per-thread -- taskset --cpu-list 2 ls
This produces a single 0 length AUX record, and then no more trace:
  0x3c8 [0x30]: PERF_RECORD_AUX offset: 0 size: 0 flags: 0x1 [T]
After the fix, the same command produces normal AUX records. The perf self test "89: Check Arm CoreSight trace data recording and synthesized samples" no longer fails intermittently. This was because the taskset in the test is after the fork, so there is a period where the task is scheduled on a random CPU rather than forced to a valid one. Specifically selecting an invalid CPU will still result in a failure to open the event because it will never produce trace:
  ./perf record -C 2 -e cs_etm/@tmc_etf0/
  failed to mmap with 12 (Cannot allocate memory)
The only scenario that has changed is if the CPU mask has a valid CPU sink combo in it.
Testing
=======
 * Coresight self test passes consistently:
   ./perf test Coresight
 * CPU wide mode still produces trace:
   ./perf record -e cs_etm// -a
 * Invalid -C options still fail to open:
   ./perf record -C 2,3 -e cs_etm/@tmc_etf0/
   failed to mmap with 12 (Cannot allocate memory)
 * Migrating a task to a valid sink/CPU now produces trace:
   taskset --cpu-list 0 ./perf record -e cs_etm/@tmc_etf1/ --per-thread -- taskset --cpu-list 2 ls
 * If the task remains on an invalid CPU, no trace is emitted:
   taskset --cpu-list 0 ./perf record -e cs_etm/@tmc_etf1/ --per-thread -- ls
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20210922125144.133872-2-james.clark@arm.com Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Leo Yan authored
The AUX bounce buffer is allocated with the API dma_alloc_coherent(). In the low-level architecture code, e.g. for Arm64, the memory is mapped with the attribute "Normal non-cacheable"; this can be concluded from the definition of pgprot_dmacoherent() in arch/arm64/include/asm/pgtable.h. Later, accesses to the AUX bounce buffer are inefficient because the mapping is non-cacheable, so every load instruction must reach out to DRAM. This patch changes the driver to allocate pages with dma_alloc_noncoherent(), so that it can access the memory via a cacheable mapping; load instructions can then fetch data from cache lines rather than always reading from DRAM, which improves memory performance. With the cacheable mapping, the driver uses dma_sync_single_for_cpu() to invalidate cache lines prior to reading the bounce buffer, so that it avoids reading stale trace data. Measuring the duration of the function tmc_update_etr_buffer() with the ftrace function_graph tracer shows a significant performance improvement for copying 4MiB of data from the bounce buffer:
  # echo tmc_etr_get_data_flat_buf > set_graph_notrace // avoid noise
  # echo tmc_update_etr_buffer > set_graph_function
  # echo function_graph > current_tracer
before:
  # CPU DURATION FUNCTION CALLS
  # | | | | | | |
  2) | tmc_update_etr_buffer() {
  ...
  2) # 8148.320 us | }
after:
  # CPU DURATION FUNCTION CALLS
  # | | | | | | |
  2) | tmc_update_etr_buffer() {
  ...
  2) # 2525.420 us | }
Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210905032144.966766-1-leo.yan@linaro.org Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Leo Yan authored
Commit 2f01c200 ("perf cs-etm: Remove callback cs_etm_find_snapshot()") has removed the function cs_etm_find_snapshot() from the perf tool in user space; CoreSight trace now directly uses the perf common function __auxtrace_mmap__read() to calculate the head and size for AUX trace data in snapshot mode. This patch updates the comments in the drivers to make them generic and not tied to any specific function from the perf tool. Signed-off-by: Leo Yan <leo.yan@linaro.org> Link: https://lore.kernel.org/r/20210912125748.2816606-3-leo.yan@linaro.org Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Leo Yan authored
When the Arm CoreSight PMU event is enabled, the context for the AUX ring buffer is prepared in the structure perf_output_handle; its field "head" points to the head of the AUX ring buffer and is updated after AUX trace data is filled into the buffer. The current code uses an extra field etr_perf_buffer::head to maintain the head of the AUX ring buffer, which is not necessary; it's better to directly use perf_output_handle::head. This patch removes the field etr_perf_buffer::head and directly uses perf_output_handle::head for the head of the AUX ring buffer. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210912125748.2816606-2-leo.yan@linaro.org Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Leo Yan authored
Since the function CS_LOCK() already contains the memory barrier mb(), it ensures the visibility of the AUX trace data before the aux_head is updated; thus there is no need to add any explicit barrier. Add a comment to make the barrier usage clear for ETF. Signed-off-by: Leo Yan <leo.yan@linaro.org> Link: https://lore.kernel.org/r/20210809111407.596077-4-leo.yan@linaro.org Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Leo Yan authored
Since a memory barrier is required between the AUX trace data store and the aux_head store, and the AUX trace data is filled with memcpy(), it's sufficient to use smp_wmb() to ensure that the trace data is visible prior to updating aux_head. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20210809111407.596077-3-leo.yan@linaro.org Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
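The ordering requirement (data stores visible before the head store) can be shown with a userspace analogue. This sketch uses C11 atomics instead of the kernel's smp_wmb(); a release fence is a stronger substitute chosen here for portability, and every `sketch_*` name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the AUX buffer and its published head. */
struct sketch_aux {
	unsigned char data[64];
	atomic_size_t head;
};

static struct sketch_aux sketch_aux_buf;

/* Copy the trace data, then publish the new head after a barrier. */
static void sketch_publish(struct sketch_aux *aux, const void *src, size_t len)
{
	assert(len <= sizeof(aux->data));

	memcpy(aux->data, src, len);
	/* Userspace analogue of smp_wmb(): order data stores before head. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&aux->head, len, memory_order_relaxed);
}

/* Usage: publish five bytes and read the head back as a consumer would. */
static size_t sketch_publish_example(void)
{
	sketch_publish(&sketch_aux_buf, "trace", 5);
	return atomic_load_explicit(&sketch_aux_buf.head, memory_order_acquire);
}
```

A consumer that observes the new head with acquire semantics is then guaranteed to see the bytes copied before the fence.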
-
Tanmay Jagdale authored
The current driver sets the write burst size initiated by TMC-ETR on the AXI bus to a fixed value of 16. Make this configurable by reading the value specified in the fwnode. If not specified, default to 16. Introduce a "max_burst_size" variable in the tmc_drvdata structure to facilitate this change. Signed-off-by: Tanmay Jagdale <tanmay@marvell.com> Reviewed-by: Mike Leach <mike.leach@linaro.org> Link: https://lore.kernel.org/r/20210901131049.1365367-3-tanmay@marvell.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
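The "use the firmware value if present, otherwise default to 16" behaviour described above can be sketched as follows. This is a user-space illustration under stated assumptions: struct fw_prop and pick_max_burst_size() are hypothetical stand-ins for the firmware-node lookup, and the sketch does not model how the value is encoded in the AXI control register or any range checking the driver may perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_WR_BURST 16u /* the fixed value named in the commit message */

/* Hypothetical stand-in for an optional firmware property. */
struct fw_prop {
    bool present;   /* was the property specified in the firmware node? */
    uint32_t value; /* value read from it, meaningful only if present */
};

/* Return the burst size to use: the firmware value, else the default. */
static uint32_t pick_max_burst_size(const struct fw_prop *prop)
{
    assert(prop != NULL);

    if (!prop->present)
        return DEFAULT_WR_BURST;
    return prop->value;
}
```

Keeping the default in one place means platforms without the property behave exactly as they did when the value was hard-coded.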
-
Tanmay Jagdale authored
Add "arm,max-burst-size" optional property for TMC ETR. If specified, this value indicates the maximum burst size that can be initiated by TMC on the AXI bus. Signed-off-by: Tanmay Jagdale <tanmay@marvell.com> Reviewed-by: Mike Leach <mike.leach@linaro.org> Acked-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20210901131049.1365367-2-tanmay@marvell.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Brian Norris authored
Debugfs is nice and so are module parameters, but: * debugfs doesn't take effect early (e.g., if drivers are locking up before user space gets anywhere); * module parameters either add a lot to the kernel command line, or else take effect late as well (if you build =m and configure in /etc/modprobe.d/). So, in the same spirit as CONFIG_PANIC_ON_OOPS (also available via cmdline or modparam) and CONFIG_INTEL_IOMMU_DEFAULT_ON (also available via cmdline), add a new Kconfig option. Module parameters and debugfs can still override it. Signed-off-by: Brian Norris <briannorris@chromium.org> Reviewed-by: Leo Yan <leo.yan@linaro.org> [Fixed missing double quote in Kconfig title] Link: https://lore.kernel.org/r/20210903182839.1.I20856983f2841b78936134dcf9cdf6ecafe632b9@changeid Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
Tao Zhang authored
The parameter passed to pm_runtime_put() should be the same in cti_enable_hw() and cti_disable_hw(). The correct parameter to use here is dev->parent. Signed-off-by: Tao Zhang <quic_taozha@quicinc.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Fixes: 835d722b ("coresight: cti: Initial CoreSight CTI Driver") Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/1629365377-5937-1-git-send-email-quic_taozha@quicinc.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
-
- 21 Oct, 2021 4 commits
-
-
Suzuki K Poulose authored
Arm Neoverse-N2 and Cortex-A710 cores are affected by an erratum where the TRBE, under some circumstances, might write up to 64 bytes to an address after the limit programmed in TRBLIMITR_EL1.LIMIT. This might: - Corrupt a page in the ring buffer, which may corrupt trace from a previous session, consumed by userspace. - Hit the guard page at the end of the vmalloc area and raise a fault. To keep the handling simple, we always leave the last page out of the range that the TRBE is allowed to write. This can be achieved by ensuring that we always have more than a PAGE worth of space in the range when calculating the LIMIT for the TRBE. The LIMIT pointer can then be adjusted to leave that PAGE out of the TRBE range (TRBLIMITR.LIMIT -= PAGE_SIZE) while enabling it. This makes sure that the TRBE will only write to an area within its allowed limit (i.e., [head, head + size]) and we do not have to handle address faults within the driver. Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-5-suzuki.poulose@arm.com Signed-off-by: Will Deacon <will@kernel.org>
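The limit adjustment described in this message can be sketched as plain address arithmetic. This is a user-space illustration only, not the TRBE driver: trbe_adjust_limit() and SKETCH_PAGE_SIZE are hypothetical names, a 4 KiB page is assumed, and returning 0 for "range too small" is a convention of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size for this sketch */

/*
 * Given the range [head, limit) that would normally be handed to the TRBE,
 * return the limit to program so that the last page is left out of the
 * range. Returns 0 if the range does not hold more than one page, in which
 * case the workaround cannot be applied to it.
 */
static uint64_t trbe_adjust_limit(uint64_t head, uint64_t limit)
{
    assert(limit >= head);

    /* The message requires more than a PAGE worth of space in the range. */
    if (limit - head <= SKETCH_PAGE_SIZE)
        return 0;

    /* TRBLIMITR.LIMIT -= PAGE_SIZE */
    return limit - SKETCH_PAGE_SIZE;
}
```

With the last page excluded, an overshoot of up to 64 bytes past the programmed limit still lands inside memory reserved for this trace range rather than in a neighbouring page.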
-
Suzuki K Poulose authored
Arm Neoverse-N2 (#2067961) and Cortex-A710 (#2054223) suffer from an erratum where a TSB (trace synchronization barrier) fails to flush the trace data completely when executed from a trace prohibited region. In Linux we always execute it after we have moved the PE to a trace prohibited region, so we can apply the workaround every time a TSB is executed. The workaround is to issue two TSBs consecutively. NOTE: This erratum is defined as LOCAL_CPU_ERRATUM, implying that a late CPU could be blocked from booting if it is the first CPU that requires the workaround. This is because we do not allow setting a cpu_hwcaps after the SMP boot. The alternative is to use "this_cpu_has_cap()" instead of the faster system-wide check, which may be a bit of an overhead, given we may have to do this in the nvhe KVM host before a guest entry. Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-4-suzuki.poulose@arm.com Signed-off-by: Will Deacon <will@kernel.org>
-
Suzuki K Poulose authored
Arm Neoverse-N2 and Cortex-A710 cores are affected by a CPU erratum where the TRBE will overwrite the trace buffer in FILL mode. The TRBE doesn't stop (as expected in FILL mode) when it reaches the limit; instead it wraps to the base and continues writing up to 3 cache lines. This will overwrite any trace that was written previously. Add the Neoverse-N2 erratum (#2139208) and Cortex-A710 erratum (#2119858) to the detection logic. This will be used by the TRBE driver in later patches to work around the issue. The detection has been kept with the core arm64 errata framework list to make sure: - We don't duplicate the framework in the TRBE driver. - The errata detection is advertised like the rest of the CPU errata. Note that the Kconfig entries are not fully active until the TRBE driver implements the workaround. Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-3-suzuki.poulose@arm.com Signed-off-by: Will Deacon <will@kernel.org>
-
Suzuki K Poulose authored
Add the CPU part numbers for the new Arm designs. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20211019163153.3692640-2-suzuki.poulose@arm.com Signed-off-by: Will Deacon <will@kernel.org>
-
- 26 Sep, 2021 1 commit
-
-
Linus Torvalds authored
-