Commit 100ee7c3 authored by Ian Rogers, committed by Arnaldo Carvalho de Melo

perf vendor events intel: Refresh skylakex metrics

Update the skylakex events from 1.28 to 1.29 and metrics to TMA
version 4.5. Generation was done using
https://github.com/intel/perfmon.

Notable changes: TMA info metrics are renamed from their node name
to lower case and prefixed by tma_info_; MetricThreshold
expressions are added; "Sample with" documentation is added to many
TMA metrics; and smi_cost and transaction metric groups are added,
replicating existing hard-coded metrics in stat-shadow.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Link: https://lore.kernel.org/r/20230219092848.639226-30-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
parent 9d9675bb
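
The renaming rule stated in the commit message (lower-case the node name, then prefix it with tma_info_) can be sketched as below. This is only an illustration of the stated rule, not the perfmon generator's actual code; the helper name `tma_info_name` and the sample node names are hypothetical and do not come from this diff.

```python
def tma_info_name(node_name: str) -> str:
    """Lower-case a TMA info node name and add the tma_info_ prefix.

    Mirrors the rule described in the commit message; the real generator
    in the perfmon repository may apply further normalisation.
    """
    return "tma_info_" + node_name.lower()


if __name__ == "__main__":
    # Hypothetical node names, chosen only to show the transformation.
    for name in ("CoreIPC", "DSB_Coverage"):
        print(name, "->", tma_info_name(name))
```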
...@@ -25,7 +25,7 @@ GenuineIntel-6-2A,v18,sandybridge,core ...@@ -25,7 +25,7 @@ GenuineIntel-6-2A,v18,sandybridge,core
GenuineIntel-6-(8F|CF),v1.11,sapphirerapids,core GenuineIntel-6-(8F|CF),v1.11,sapphirerapids,core
GenuineIntel-6-(37|4A|4C|4D|5A),v15,silvermont,core GenuineIntel-6-(37|4A|4C|4D|5A),v15,silvermont,core
GenuineIntel-6-(4E|5E|8E|9E|A5|A6),v54,skylake,core GenuineIntel-6-(4E|5E|8E|9E|A5|A6),v54,skylake,core
GenuineIntel-6-55-[01234],v1.28,skylakex,core GenuineIntel-6-55-[01234],v1.29,skylakex,core
GenuineIntel-6-86,v1.20,snowridgex,core GenuineIntel-6-86,v1.20,snowridgex,core
GenuineIntel-6-8[CD],v1.08,tigerlake,core GenuineIntel-6-8[CD],v1.08,tigerlake,core
GenuineIntel-6-2C,v3,westmereep-dp,core GenuineIntel-6-2C,v3,westmereep-dp,core
......
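
To make the mapfile row above easier to read, here is a minimal sketch of how such a row could be split and matched against a CPU identifier. It assumes the first column is a regular expression that must match the whole vendor-family-model-stepping string; the helper names and the sample CPUID strings are hypothetical.

```python
import re

# Row as it appears on the new side of the mapfile hunk.
ROW = "GenuineIntel-6-55-[01234],v1.29,skylakex,core"


def parse_row(row: str):
    """Split a mapfile row into (cpuid pattern, version, directory, type)."""
    pattern, version, directory, kind = row.split(",")
    return pattern, version, directory, kind


def row_matches(row: str, cpuid: str) -> bool:
    """Return True if the row's pattern matches the whole cpuid string.

    Requiring a full match is an assumption made for this sketch.
    """
    pattern = parse_row(row)[0]
    return re.fullmatch(pattern, cpuid) is not None


if __name__ == "__main__":
    print(parse_row(ROW))
    # Hypothetical CPUID strings: stepping 4 is covered, stepping 7 is not.
    print(row_matches(ROW, "GenuineIntel-6-55-4"))
    print(row_matches(ROW, "GenuineIntel-6-55-7"))
```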
...@@ -234,20 +234,22 @@ ...@@ -234,20 +234,22 @@
"UMask": "0x4f" "UMask": "0x4f"
}, },
{ {
"BriefDescription": "All retired load instructions.", "BriefDescription": "Retired load instructions.",
"Data_LA": "1", "Data_LA": "1",
"EventCode": "0xD0", "EventCode": "0xD0",
"EventName": "MEM_INST_RETIRED.ALL_LOADS", "EventName": "MEM_INST_RETIRED.ALL_LOADS",
"PEBS": "1", "PEBS": "1",
"PublicDescription": "Counts all retired load instructions. This event accounts for SW prefetch instructions of PREFETCHNTA or PREFETCHT0/1/2 or PREFETCHW.",
"SampleAfterValue": "2000003", "SampleAfterValue": "2000003",
"UMask": "0x81" "UMask": "0x81"
}, },
{ {
"BriefDescription": "All retired store instructions.", "BriefDescription": "Retired store instructions.",
"Data_LA": "1", "Data_LA": "1",
"EventCode": "0xD0", "EventCode": "0xD0",
"EventName": "MEM_INST_RETIRED.ALL_STORES", "EventName": "MEM_INST_RETIRED.ALL_STORES",
"PEBS": "1", "PEBS": "1",
"PublicDescription": "Counts all retired store instructions.",
"SampleAfterValue": "2000003", "SampleAfterValue": "2000003",
"UMask": "0x82" "UMask": "0x82"
}, },
...@@ -484,7 +486,7 @@ ...@@ -484,7 +486,7 @@
"UMask": "0x80" "UMask": "0x80"
}, },
{ {
"BriefDescription": "Cacheable and noncachaeble code read requests", "BriefDescription": "Cacheable and non-cacheable code read requests",
"EventCode": "0xB0", "EventCode": "0xB0",
"EventName": "OFFCORE_REQUESTS.DEMAND_CODE_RD", "EventName": "OFFCORE_REQUESTS.DEMAND_CODE_RD",
"PublicDescription": "Counts both cacheable and non-cacheable code read requests.", "PublicDescription": "Counts both cacheable and non-cacheable code read requests.",
......
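
The JSON entries above pair an EventCode with a UMask. As a rough sketch, those two fields can be turned into a perf raw event string of the form `cpu/event=...,umask=.../`. The helper name `raw_event` is hypothetical, the PMU name `cpu` is an assumption, and fields such as PEBS and SampleAfterValue are deliberately ignored here.

```python
def raw_event(entry: dict, pmu: str = "cpu") -> str:
    """Build a perf raw event string from an event's EventCode and UMask.

    Only these two fields are encoded; everything else in the JSON entry
    is ignored in this sketch.
    """
    event = int(entry["EventCode"], 16)
    umask = int(entry["UMask"], 16)
    return f"{pmu}/event={event:#x},umask={umask:#x}/"


if __name__ == "__main__":
    # Field values copied from the MEM_INST_RETIRED entries in the hunk above.
    all_loads = {"EventName": "MEM_INST_RETIRED.ALL_LOADS",
                 "EventCode": "0xD0", "UMask": "0x81"}
    all_stores = {"EventName": "MEM_INST_RETIRED.ALL_STORES",
                  "EventCode": "0xD0", "UMask": "0x82"}
    for entry in (all_loads, all_stores):
        print(entry["EventName"], "->", raw_event(entry))
```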
...@@ -322,7 +322,7 @@ ...@@ -322,7 +322,7 @@
"UMask": "0x4" "UMask": "0x4"
}, },
{ {
"BriefDescription": "Cycles when uops are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy", "BriefDescription": "Cycles when uops are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy",
"CounterMask": "1", "CounterMask": "1",
"EventCode": "0x79", "EventCode": "0x79",
"EventName": "IDQ.MS_CYCLES", "EventName": "IDQ.MS_CYCLES",
...@@ -331,7 +331,7 @@ ...@@ -331,7 +331,7 @@
"UMask": "0x30" "UMask": "0x30"
}, },
{ {
"BriefDescription": "Cycles when uops initiated by Decode Stream Buffer (DSB) are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy", "BriefDescription": "Cycles when uops initiated by Decode Stream Buffer (DSB) are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy",
"CounterMask": "1", "CounterMask": "1",
"EventCode": "0x79", "EventCode": "0x79",
"EventName": "IDQ.MS_DSB_CYCLES", "EventName": "IDQ.MS_DSB_CYCLES",
...@@ -340,7 +340,7 @@ ...@@ -340,7 +340,7 @@
"UMask": "0x10" "UMask": "0x10"
}, },
{ {
"BriefDescription": "Uops initiated by MITE and delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy", "BriefDescription": "Uops initiated by MITE and delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy",
"EventCode": "0x79", "EventCode": "0x79",
"EventName": "IDQ.MS_MITE_UOPS", "EventName": "IDQ.MS_MITE_UOPS",
"PublicDescription": "Counts the number of uops initiated by MITE and delivered to Instruction Decode Queue (IDQ) while the Microcode Sequencer (MS) is busy. Counting includes uops that may 'bypass' the IDQ.", "PublicDescription": "Counts the number of uops initiated by MITE and delivered to Instruction Decode Queue (IDQ) while the Microcode Sequencer (MS) is busy. Counting includes uops that may 'bypass' the IDQ.",
...@@ -358,7 +358,7 @@ ...@@ -358,7 +358,7 @@
"UMask": "0x30" "UMask": "0x30"
}, },
{ {
"BriefDescription": "Uops delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy", "BriefDescription": "Uops delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy",
"EventCode": "0x79", "EventCode": "0x79",
"EventName": "IDQ.MS_UOPS", "EventName": "IDQ.MS_UOPS",
"PublicDescription": "Counts the total number of uops delivered by the Microcode Sequencer (MS). Any instruction over 4 uops will be delivered by the MS. Some instructions such as transcendentals may additionally generate uops from the MS.", "PublicDescription": "Counts the total number of uops delivered by the Microcode Sequencer (MS). Any instruction over 4 uops will be delivered by the MS. Some instructions such as transcendentals may additionally generate uops from the MS.",
......
...@@ -93,6 +93,22 @@ ...@@ -93,6 +93,22 @@
"SampleAfterValue": "400009", "SampleAfterValue": "400009",
"UMask": "0x10" "UMask": "0x10"
}, },
{
"BriefDescription": "Speculative and retired mispredicted macro conditional branches",
"EventCode": "0x89",
"EventName": "BR_MISP_EXEC.ALL_BRANCHES",
"PublicDescription": "This event counts both taken and not taken speculative and retired mispredicted branch instructions.",
"SampleAfterValue": "200003",
"UMask": "0xff"
},
{
"BriefDescription": "Speculative mispredicted indirect branches",
"EventCode": "0x89",
"EventName": "BR_MISP_EXEC.INDIRECT",
"PublicDescription": "Counts speculatively miss-predicted indirect branches at execution time. Counts for indirect near CALL or JMP instructions (RET excluded).",
"SampleAfterValue": "200003",
"UMask": "0xe4"
},
{ {
"BriefDescription": "All mispredicted macro branch instructions retired.", "BriefDescription": "All mispredicted macro branch instructions retired.",
"EventCode": "0xC5", "EventCode": "0xC5",
......
(One file's diff is omitted here because it is too large to display.)
...@@ -1952,7 +1952,7 @@ ...@@ -1952,7 +1952,7 @@
"EventCode": "0x81", "EventCode": "0x81",
"EventName": "UNC_M_WPQ_OCCUPANCY", "EventName": "UNC_M_WPQ_OCCUPANCY",
"PerPkg": "1", "PerPkg": "1",
"PublicDescription": "Counts the number of entries in the Write Pending Queue (WPQ) at each cycle. This can then be used to calculate both the average queue occupancy (in conjunction with the number of cycles not empty) and the average latency (in conjunction with the number of allocations). The WPQ is used to schedule writes out to the memory controller and to track the requests. Requests allocate into the WPQ soon after they enter the memory controller, and need credits for an entry in this buffer before being sent from the CHA to the iMC (memory controller). They deallocate after being issued to DRAM. Write requests themselves are able to complete (from the perspective of the rest of the system) as soon they have 'posted' to the iMC. This is not to be confused with actually performing the write to DRAM. Therefore, the average latency for this queue is actually not useful for deconstruction intermediate write latencies. So, we provide filtering based on if the request has posted or not. By using the 'not posted' filter, we can track how long writes spent in the iMC before completions were sent to the HA. The 'posted' filter, on the other hand, provides information about how much queueing is actually happenning in the iMC for writes before they are actually issued to memory. High average occupancies will generally coincide with high write major mode counts. Is there a filter of sorts???", "PublicDescription": "Counts the number of entries in the Write Pending Queue (WPQ) at each cycle. This can then be used to calculate both the average queue occupancy (in conjunction with the number of cycles not empty) and the average latency (in conjunction with the number of allocations). The WPQ is used to schedule writes out to the memory controller and to track the requests. Requests allocate into the WPQ soon after they enter the memory controller, and need credits for an entry in this buffer before being sent from the CHA to the iMC (memory controller). 
They deallocate after being issued to DRAM. Write requests themselves are able to complete (from the perspective of the rest of the system) as soon they have 'posted' to the iMC. This is not to be confused with actually performing the write to DRAM. Therefore, the average latency for this queue is actually not useful for deconstruction intermediate write latencies. So, we provide filtering based on if the request has posted or not. By using the 'not posted' filter, we can track how long writes spent in the iMC before completions were sent to the HA. The 'posted' filter, on the other hand, provides information about how much queueing is actually happening in the iMC for writes before they are actually issued to memory. High average occupancies will generally coincide with high write major mode counts. Is there a filter of sorts???",
"Unit": "iMC" "Unit": "iMC"
}, },
{ {
......
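
The UNC_M_WPQ_OCCUPANCY description says the per-cycle occupancy count can be combined with the number of non-empty cycles to get an average occupancy, and with the number of allocations to get an average latency. A minimal sketch of that arithmetic follows; the function names and the sample counter values are hypothetical, and the latency is expressed in memory-controller cycles.

```python
def average_occupancy(occupancy_sum: int, cycles_not_empty: int) -> float:
    """Average number of WPQ entries over the cycles the queue was not empty."""
    if cycles_not_empty == 0:
        return 0.0
    return occupancy_sum / cycles_not_empty


def average_latency_cycles(occupancy_sum: int, allocations: int) -> float:
    """Average cycles an entry spent in the WPQ (occupancy per allocation)."""
    if allocations == 0:
        return 0.0
    return occupancy_sum / allocations


if __name__ == "__main__":
    # Made-up counter readings, used only to show the calculation.
    occupancy_sum = 1_200_000     # summed occupancy over the interval
    cycles_not_empty = 400_000    # cycles with at least one WPQ entry
    allocations = 60_000          # entries allocated into the WPQ
    print(average_occupancy(occupancy_sum, cycles_not_empty))
    print(average_latency_cycles(occupancy_sum, allocations))
```

As the description itself cautions, write requests complete once posted to the iMC, so this queue latency should not be read as an end-to-end write latency.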
...@@ -143,7 +143,7 @@ ...@@ -143,7 +143,7 @@
"EventCode": "0x80", "EventCode": "0x80",
"EventName": "UNC_P_POWER_STATE_OCCUPANCY.CORES_C0", "EventName": "UNC_P_POWER_STATE_OCCUPANCY.CORES_C0",
"PerPkg": "1", "PerPkg": "1",
"PublicDescription": "This is an occupancy event that tracks the number of cores that are in the chosen C-State. It can be used by itself to get the average number of cores in that C-state with threshholding to generate histograms, or with other PCU events and occupancy triggering to capture other details.", "PublicDescription": "This is an occupancy event that tracks the number of cores that are in the chosen C-State. It can be used by itself to get the average number of cores in that C-state with thresholding to generate histograms, or with other PCU events and occupancy triggering to capture other details.",
"Unit": "PCU" "Unit": "PCU"
}, },
{ {
...@@ -151,7 +151,7 @@ ...@@ -151,7 +151,7 @@
"EventCode": "0x80", "EventCode": "0x80",
"EventName": "UNC_P_POWER_STATE_OCCUPANCY.CORES_C3", "EventName": "UNC_P_POWER_STATE_OCCUPANCY.CORES_C3",
"PerPkg": "1", "PerPkg": "1",
"PublicDescription": "This is an occupancy event that tracks the number of cores that are in the chosen C-State. It can be used by itself to get the average number of cores in that C-state with threshholding to generate histograms, or with other PCU events and occupancy triggering to capture other details.", "PublicDescription": "This is an occupancy event that tracks the number of cores that are in the chosen C-State. It can be used by itself to get the average number of cores in that C-state with thresholding to generate histograms, or with other PCU events and occupancy triggering to capture other details.",
"Unit": "PCU" "Unit": "PCU"
}, },
{ {
...@@ -159,7 +159,7 @@ ...@@ -159,7 +159,7 @@
"EventCode": "0x80", "EventCode": "0x80",
"EventName": "UNC_P_POWER_STATE_OCCUPANCY.CORES_C6", "EventName": "UNC_P_POWER_STATE_OCCUPANCY.CORES_C6",
"PerPkg": "1", "PerPkg": "1",
"PublicDescription": "This is an occupancy event that tracks the number of cores that are in the chosen C-State. It can be used by itself to get the average number of cores in that C-state with threshholding to generate histograms, or with other PCU events and occupancy triggering to capture other details.", "PublicDescription": "This is an occupancy event that tracks the number of cores that are in the chosen C-State. It can be used by itself to get the average number of cores in that C-state with thresholding to generate histograms, or with other PCU events and occupancy triggering to capture other details.",
"Unit": "PCU" "Unit": "PCU"
}, },
{ {
......