- 16 Feb, 2024 11 commits
-
-
James Morse authored
When a CPU is taken offline resctrl may need to move the overflow or limbo handlers to run on a different CPU. Once the offline callbacks have been split, cqm_setup_limbo_handler() will be called while the CPU that is going offline is still present in the CPU mask. Pass the CPU to exclude to cqm_setup_limbo_handler() and mbm_setup_overflow_handler(). These functions can use a variant of cpumask_any_but() when selecting the CPU. -1 is used to indicate no CPUs need excluding. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Babu Moger <babu.moger@amd.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Peter Newman <peternewman@google.com> Tested-by:
Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-22-james.morse@arm.com Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de>
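A minimal user-space sketch of the selection rule described in this commit message, assuming a 64-bit word stands in for the kernel's cpumask. The function and macro names here are hypothetical and are not the kernel's API; only the convention that -1 means "exclude nothing" comes from the message.

```c
#include <assert.h>
#include <stdint.h>

#define EXCLUDE_NONE (-1) /* -1 indicates no CPU needs excluding */
#define NO_CPU       (-1) /* hypothetical result when no candidate is left */

/*
 * Return the lowest-numbered CPU set in mask, skipping exclude_cpu.
 * The mask may still contain the CPU that is going offline, which is
 * why the caller names it explicitly.
 */
int pick_cpu_excluding(uint64_t mask, int exclude_cpu)
{
    assert(exclude_cpu >= EXCLUDE_NONE && exclude_cpu < 64);

    for (int cpu = 0; cpu < 64; cpu++) {
        if (cpu == exclude_cpu)
            continue;
        if (mask & (UINT64_C(1) << cpu))
            return cpu;
    }
    return NO_CPU;
}
```

With CPUs 1 and 2 in the mask, excluding CPU 1 selects CPU 2; if the excluded CPU is the only one left, the sketch reports that no CPU is available.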
-
James Morse authored
resctrl reads rdt_alloc_capable or rdt_mon_capable to determine whether any of the resources support the corresponding features. resctrl also uses the static keys that affect the architecture's context-switch code to determine the same thing. This forces another architecture to have the same static keys. As the static key is enabled based on the capable flag, and none of the filesystem uses of these are in the scheduler path, move the capable flags behind helpers, and use these in the filesystem code instead of the static key. After this change, only the architecture code manages and uses the static keys to ensure __resctrl_sched_in() does not need runtime checks. This avoids multiple architectures having to define the same static keys. Cases where the static key implicitly tested if the resctrl filesystem was mounted all have an explicit check now. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Reviewed-by:
Babu Moger <babu.moger@amd.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Peter Newman <peternewman@google.com> Tested-by:
Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-20-james.morse@arm.com Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de>
-
James Morse authored
resctrl enables three static keys depending on the features it has enabled. Another architecture's context switch code may look different; any static keys that control it should be buried behind helpers. Move the alloc/mon logic into arch-specific helpers as a preparatory step for making the rdt_enable_key's status something the arch code decides. This means other architectures don't have to mirror the static keys. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Reviewed-by:
Babu Moger <babu.moger@amd.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Peter Newman <peternewman@google.com> Tested-by:
Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-18-james.morse@arm.com Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de>
-
James Morse authored
The rdt_enable_key is switched when resctrl is mounted, and used to prevent a second mount of the filesystem. It also enables the architecture's context switch code. This requires another architecture to have the same set of static keys, as resctrl depends on them too. The existing users of these static keys are implicitly also checking if the filesystem is mounted. Make the resctrl_mounted checks explicit: resctrl can keep track of whether it has been mounted once. This doesn't need to be combined with whether the arch code is context switching the CLOSID. rdt_mon_enable_key is never used just to test that resctrl is mounted, but does also have this implication. Add a resctrl_mounted to all uses of rdt_mon_enable_key. This will allow the static key changing to be moved behind resctrl_arch_ calls. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Reviewed-by:
Babu Moger <babu.moger@amd.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Peter Newman <peternewman@google.com> Tested-by:
Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-17-james.morse@arm.com Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de>
-
James Morse authored
Depending on the number of monitors available, Arm's MPAM may need to allocate a monitor prior to reading the counter value. Allocating a contended resource may involve sleeping. __check_limbo() and mon_event_count() each make multiple calls to resctrl_arch_rmid_read(), to avoid extra work on contended systems, the allocation should be valid for multiple invocations of resctrl_arch_rmid_read(). The memory or hardware allocated is not specific to a domain. Add arch hooks for this allocation, which need calling before resctrl_arch_rmid_read(). The allocated monitor is passed to resctrl_arch_rmid_read(), then freed again afterwards. The helper can be called on any CPU, and can sleep. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Reviewed-by:
Babu Moger <babu.moger@amd.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Peter Newman <peternewman@google.com> Tested-by:
Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-16-james.morse@arm.com Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de>
-
James Morse authored
The limbo and overflow code picks a CPU to use from the domain's list of online CPUs. Work is then scheduled on these CPUs to maintain the limbo list and any counters that may overflow. cpumask_any() may pick a CPU that is marked nohz_full, which will either penalise the work that CPU was dedicated to, or delay the processing of limbo list or counters that may overflow. Perhaps indefinitely. Delaying the overflow handling will skew the bandwidth values calculated by mba_sc, which expects to be called once a second. Add cpumask_any_housekeeping() as a replacement for cpumask_any() that prefers housekeeping CPUs. This helper will still return a nohz_full CPU if that is the only option. The CPU to use is re-evaluated each time the limbo/overflow work runs. This ensures the work will move off a nohz_full CPU once a housekeeping CPU is available. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Reviewed-by:
Babu Moger <babu.moger@amd.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Peter Newman <peternewman@google.com> Tested-by:
Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-13-james.morse@arm.com Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de>
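The preference order described in this commit message can be sketched as follows. This is an illustration only: the 64-bit masks and both function names are assumptions made for the example, not the kernel's cpumask_any_housekeeping() implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Lowest set bit as a CPU number, or -1 if the mask is empty. */
int first_cpu_in(uint64_t mask)
{
    for (int cpu = 0; cpu < 64; cpu++)
        if (mask & (UINT64_C(1) << cpu))
            return cpu;
    return -1;
}

/*
 * Prefer an online CPU that is not nohz_full. Fall back to any online
 * CPU, which may be nohz_full, only when no housekeeping CPU exists.
 */
int pick_housekeeping_cpu(uint64_t online, uint64_t nohz_full)
{
    int cpu = first_cpu_in(online & ~nohz_full);

    if (cpu < 0)
        cpu = first_cpu_in(online);

    /* Whatever was chosen must be an online CPU. */
    assert(cpu < 0 || (online & (UINT64_C(1) << cpu)));
    return cpu;
}
```

Because the choice is recomputed from the current masks on every call, calling it each time the work runs gives the "move off a nohz_full CPU once a housekeeping CPU is available" behaviour the message describes.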
-
James Morse authored
MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be used for different control groups. This means once a CLOSID is allocated, all its monitoring ids may still be dirty, and held in limbo. Instead of allocating the first free CLOSID, on architectures where CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID is enabled, search closid_num_dirty_rmid[] to find the cleanest CLOSID. The CLOSID found is returned to closid_alloc() for the free list to be updated. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Reviewed-by:
Babu Moger <babu.moger@amd.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Peter Newman <peternewman@google.com> Tested-by:
Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-11-james.morse@arm.com Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de>
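A hedged sketch of the "cleanest CLOSID" search. Only the idea of a per-CLOSID dirty counter (closid_num_dirty_rmid[]) comes from the commit message; the function name, the boolean free array and the tie-breaking rule (lowest CLOSID wins) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return the free CLOSID whose RMIDs are least dirty, or -1 if no CLOSID
 * is free. Ties go to the lowest-numbered CLOSID.
 */
int find_cleanest_closid(const bool *closid_free,
                         const unsigned int *num_dirty_rmid,
                         size_t num_closid)
{
    int best = -1;

    assert(closid_free != NULL && num_dirty_rmid != NULL);

    for (size_t i = 0; i < num_closid; i++) {
        if (!closid_free[i])
            continue;
        if (best < 0 || num_dirty_rmid[i] < num_dirty_rmid[best])
            best = (int)i;
    }
    return best;
}
```

The caller would still be responsible for marking the returned CLOSID as allocated, matching the message's note that the free list is updated in closid_alloc().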
-
James Morse authored
MPAM's RMID values are not unique unless the CLOSID is considered as well. alloc_rmid() expects the RMID to be an independent number. Pass the CLOSID in to alloc_rmid(). Use this to compare indexes when allocating. If the CLOSID is not relevant to the index, this ends up comparing the free RMID with itself, and the first free entry will be used. With MPAM the CLOSID is included in the index, so this becomes a walk of the free RMID entries, until one that matches the supplied CLOSID is found. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Reviewed-by:
Babu Moger <babu.moger@amd.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Peter Newman <peternewman@google.com> Tested-by:
Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-8-james.morse@arm.com Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de>
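The index comparison described above can be sketched like this. The struct layout, the function-pointer type and both encoders are hypothetical stand-ins chosen for the example; the one-bit RMID field in the CLOSID-aware encoder is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rmid_entry {
    uint32_t closid;
    uint32_t rmid;
};

typedef uint32_t (*encode_fn)(uint32_t closid, uint32_t rmid);

/* MPAM-like: the CLOSID is part of the index (1 RMID bit assumed). */
uint32_t encode_with_closid(uint32_t closid, uint32_t rmid)
{
    return (closid << 1) | (rmid & 1u);
}

/* x86-like: the CLOSID is ignored, the index is just the RMID. */
uint32_t encode_rmid_only(uint32_t closid, uint32_t rmid)
{
    (void)closid;
    return rmid;
}

/*
 * Walk the free entries and return the position of the first one whose
 * own index equals the index built from the requested CLOSID and that
 * entry's RMID. Returns -1 if nothing matches.
 */
int find_free_rmid(const struct rmid_entry *free_list, size_t n,
                   uint32_t closid, encode_fn encode)
{
    assert(encode != NULL);

    for (size_t i = 0; i < n; i++) {
        uint32_t itr_idx = encode(free_list[i].closid, free_list[i].rmid);
        uint32_t cmp_idx = encode(closid, free_list[i].rmid);

        if (itr_idx == cmp_idx)
            return (int)i;
    }
    return -1;
}
```

With the RMID-only encoder every entry compares equal to itself, so the first free entry is taken; with the CLOSID-aware encoder the walk continues until an entry belonging to the requested CLOSID is found.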
-
James Morse authored
x86 systems identify traffic using the CLOSID and RMID. The CLOSID is used to lookup the control policy, the RMID is used for monitoring. For x86 these are independent numbers. Arm's MPAM has equivalent features PARTID and PMG, where the PARTID is used to lookup the control policy. The PMG in contrast is a small number of bits that are used to subdivide PARTID when monitoring. The cache-occupancy monitors require the PARTID to be specified when monitoring. This means MPAM's PMG field is not unique. There are multiple PMG-0, one per allocated CLOSID/PARTID. If PMG is treated as equivalent to RMID, it cannot be allocated as an independent number. Bitmaps like rmid_busy_llc need to be sized by the number of unique entries for this resource. Treat the combined CLOSID and RMID as an index, and provide architecture helpers to pack and unpack an index. This makes the MPAM values unique. The domain's rmid_busy_llc and rmid_ptrs[] are then sized by index, as are domain mbm_local[] and mbm_total[]. x86 can ignore the CLOSID field when packing and unpacking an index, and report as many indexes as RMID. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Babu Moger <babu.moger@amd.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Peter Newman <peternewman@google.com> Tested-by:
Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-7-james.morse@arm.com Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de>
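To make the pack/unpack idea concrete, here is a small sketch under stated assumptions: PMG_BITS is a made-up constant (one PMG bit), and the helper names are illustrative rather than the architecture helpers the series adds.

```c
#include <assert.h>
#include <stdint.h>

#define PMG_BITS 1u /* hypothetical: one PMG bit, two monitor groups per PARTID */

/* MPAM-like packing: the CLOSID/PARTID occupies the bits above the PMG. */
uint32_t mpam_like_encode(uint32_t closid, uint32_t rmid)
{
    assert(rmid < (1u << PMG_BITS));
    return (closid << PMG_BITS) | rmid;
}

void mpam_like_decode(uint32_t idx, uint32_t *closid, uint32_t *rmid)
{
    *closid = idx >> PMG_BITS;
    *rmid = idx & ((1u << PMG_BITS) - 1u);
}

/* x86-like packing: the RMID is already unique, so the CLOSID is ignored. */
uint32_t x86_like_encode(uint32_t closid, uint32_t rmid)
{
    (void)closid;
    return rmid;
}
```

In the MPAM-like scheme, PMG 0 of CLOSID 0 and PMG 0 of CLOSID 1 map to different indexes, which is what lets arrays such as rmid_busy_llc be sized and addressed by index.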
-
James Morse authored
x86's RMID are independent of the CLOSID. An RMID can be allocated, used and freed without considering the CLOSID. MPAM's equivalent feature is PMG, which is not an independent number, it extends the CLOSID/PARTID space. For MPAM, only PMG-bits worth of 'RMID' can be allocated for a single CLOSID. i.e. if there is 1 bit of PMG space, then each CLOSID can have two monitor groups. To allow resctrl to disambiguate RMID values for different CLOSID, everything in resctrl that keeps an RMID value needs to know the CLOSID too. This will always be ignored on x86. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Xin Hao <xhao@linux.alibaba.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Peter Newman <peternewman@google.com> Tested-by:
Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-6-james.morse@arm.com Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de>
-
James Morse authored
rmid_ptrs[] is allocated from dom_data_init() but never free()d. While the exit text ends up in the linker script's DISCARD section, the direction of travel is for resctrl to be/have loadable modules. Add resctrl_put_mon_l3_config() to cleanup any memory allocated by rdt_get_mon_l3_config(). There is no reason to backport this to a stable kernel. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Babu Moger <babu.moger@amd.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Tested-by:
Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Link: https://lore.kernel.org/r/20240213184438.16675-3-james.morse@arm.com Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de>
-
- 24 Jan, 2024 1 commit
-
-
Tony Luck authored
The mba_MBps feedback loop increases throttling when a group is using more bandwidth than the target set by the user in the schemata file, and decreases throttling when below target. To avoid possibly stepping throttling up and down on every poll a flag "delta_comp" is set whenever throttling is changed to indicate that the actual change in bandwidth should be recorded on the next poll in "delta_bw". Throttling is only reduced if the current bandwidth plus delta_bw is below the user target. This algorithm works well if the workload has steady bandwidth needs. But it can go badly wrong if the workload moves to a different phase just as the throttling level changed. E.g. if the workload becomes essentially idle right as throttling level is increased, the value calculated for delta_bw will be more or less the old bandwidth level. If the workload then resumes, Linux may never reduce throttling because current bandwidth plus delta_bw is above the target set by the user. Implement a simpler heuristic by assuming that in the worst case the currently measured bandwidth is being controlled by the current level of throttling. Compute how much it may increase if throttling is relaxed to the next higher level. If that is still below the user target, then it is ok to reduce the amount of throttling. Fixes: ba0f26d8 ("x86/intel_rdt/mba_sc: Prepare for feedback loop") Reported-by:
Xiaochen Shen <xiaochen.shen@intel.com> Signed-off-by:
Tony Luck <tony.luck@intel.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Tested-by:
Xiaochen Shen <xiaochen.shen@intel.com> Link: https://lore.kernel.org/r/20240122180807.70518-1-tony.luck@intel.com
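A rough sketch of the simpler heuristic, assuming bandwidth scales linearly with the throttle level (expressed here as a percentage where a higher value means less throttling). That scaling model, the parameter names and both functions are assumptions made for illustration; the actual kernel code may compute the projection differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Worst-case bandwidth if throttling is relaxed from cur_level to
 * cur_level + step, assuming the measured bandwidth is entirely limited
 * by the current level.
 */
uint64_t projected_bw(uint64_t cur_bw, uint32_t cur_level, uint32_t step)
{
    assert(cur_level > 0);
    return cur_bw * (cur_level + step) / cur_level;
}

/* Relax throttling only if the projection stays below the user target. */
bool may_relax_throttling(uint64_t user_bw, uint64_t cur_bw,
                          uint32_t cur_level, uint32_t step,
                          uint32_t max_level)
{
    if (cur_level == 0 || cur_level >= max_level)
        return false;
    return projected_bw(cur_bw, cur_level, step) < user_bw;
}
```

Note that an idle workload (cur_bw of 0) projects to 0 and therefore always permits relaxing, so the "stuck at high throttling" case from the message cannot occur in this model.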
-
- 23 Jan, 2024 2 commits
-
-
Babu Moger authored
If the BMEC (Bandwidth Monitoring Event Configuration) feature is supported, the bandwidth events can be configured. The maximum supported bandwidth bitmask can be read from CPUID:

  CPUID_Fn80000020_ECX_x03 [Platform QoS Monitoring Bandwidth Event Configuration]
  Bits    Description
  31:7    Reserved
   6:0    Identifies the bandwidth sources that can be tracked.

While at it, move the mask checking to mon_config_write() before iterating over all the domains. Also, print the valid bitmask when the user tries to configure invalid event configuration value.

The CPUID details are documented in the Processor Programming Reference (PPR) Vol 1.1 for AMD Family 19h Model 11h B1 - 55901 Rev 0.25 in the Link tag.

Fixes: dc2a3e85 ("x86/resctrl: Add interface to read mbm_total_bytes_config") Signed-off-by:
Babu Moger <babu.moger@amd.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Link: https://lore.kernel.org/r/669896fa512c7451319fa5ca2fdb6f7e015b5635.1705359148.git.babu.moger@amd.com
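The mask check can be illustrated with a short sketch. The bit range 6:0 comes from the CPUID description in the commit message; the function names are hypothetical, and real code would obtain the ECX value from the CPUID instruction rather than take it as a parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BMEC_MASK_BITS 0x7fu /* CPUID_Fn80000020_ECX_x03 bits 6:0 */

/* Extract the supported bandwidth-source bitmask from the ECX value. */
uint32_t bmec_supported_mask(uint32_t cpuid_ecx)
{
    return cpuid_ecx & BMEC_MASK_BITS;
}

/* A configuration is valid only if it sets no unsupported bits. */
bool bmec_config_valid(uint32_t val, uint32_t supported)
{
    assert((supported & ~BMEC_MASK_BITS) == 0);
    return (val & supported) == val;
}
```

Performing this check once, before iterating over the domains, means an invalid value is rejected without touching any domain's configuration.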
-
Babu Moger authored
The QOS Memory Bandwidth Enforcement Limit is reported by CPUID_Fn80000020_EAX_x01 and CPUID_Fn80000020_EAX_x02:

  Bits    Description
  31:0    BW_LEN: Size of the QOS Memory Bandwidth Enforcement Limit.

Newer processors can support higher bandwidth limit than the current hard-coded value. Remove latter and detect using CPUID instead. Also, update the register variables eax and edx to match the AMD CPUID definition.

The CPUID details are documented in the Processor Programming Reference (PPR) Vol 1.1 for AMD Family 19h Model 11h B1 - 55901 Rev 0.25 in the Link tag below.

Fixes: 4d05bf71 ("x86/resctrl: Introduce AMD QOS feature") Signed-off-by:
Babu Moger <babu.moger@amd.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Link: https://lore.kernel.org/r/c26a8ca79d399ed076cf8bf2e9fbc58048808289.1705359148.git.babu.moger@amd.com
-
- 17 Oct, 2023 4 commits
-
-
Babu Moger authored
In x86, hardware uses RMID to identify a monitoring group. When a user creates a monitor group these details are not visible. These details can help resctrl debugging.

Add RMID(mon_hw_id) to the monitor groups display in the resctrl interface. Users can see these details when resctrl is mounted with "-o debug" option.

Add RFTYPE_MON_BASE that complements existing RFTYPE_CTRL_BASE and represents files belonging to monitoring groups.

Other architectures do not use "RMID". Use the name mon_hw_id to refer to "RMID" in an effort to keep the naming generic.

For example:

  $cat /sys/fs/resctrl/mon_groups/mon_grp1/mon_hw_id
  3

Signed-off-by:
Babu Moger <babu.moger@amd.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Peter Newman <peternewman@google.com> Reviewed-by:
Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Reviewed-by:
Fenghua Yu <fenghua.yu@intel.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Reviewed-by:
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by:
Peter Newman <peternewman@google.com> Tested-by:
Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Link: https://lore.kernel.org/r/20231017002308.134480-10-babu.moger@amd.com
-
Babu Moger authored
Add "-o debug" option to mount resctrl filesystem in debug mode. When in debug mode resctrl displays files that have the new RFTYPE_DEBUG flag to help resctrl debugging. Signed-off-by:
Babu Moger <babu.moger@amd.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Peter Newman <peternewman@google.com> Reviewed-by:
Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Reviewed-by:
Fenghua Yu <fenghua.yu@intel.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Reviewed-by:
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by:
Peter Newman <peternewman@google.com> Tested-by:
Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Link: https://lore.kernel.org/r/20231017002308.134480-7-babu.moger@amd.com
-
Babu Moger authored
resctrl associates rftype flags with its files so that files can be chosen based on the resource, whether it is info or base, and if it is control or monitor type file. These flags use the RF_ as well as RFTYPE_ prefixes. Change the prefix to RFTYPE_ for all these flags to be consistent. Signed-off-by:
Babu Moger <babu.moger@amd.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Peter Newman <peternewman@google.com> Reviewed-by:
Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Reviewed-by:
Fenghua Yu <fenghua.yu@intel.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Reviewed-by:
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by:
Peter Newman <peternewman@google.com> Tested-by:
Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Link: https://lore.kernel.org/r/20231017002308.134480-4-babu.moger@amd.com
-
Babu Moger authored
The rftype flags are bitmaps used for adding files under the resctrl filesystem. Some of these bitmap defines have one extra level of indirection which is not necessary. Drop the RF_* defines and simplify the macros. [ bp: Massage commit message. ] Signed-off-by:
Babu Moger <babu.moger@amd.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Peter Newman <peternewman@google.com> Reviewed-by:
Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Reviewed-by:
Fenghua Yu <fenghua.yu@intel.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Reviewed-by:
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by:
Peter Newman <peternewman@google.com> Tested-by:
Tan Shaopeng <tan.shaopeng@jp.fujitsu.com> Link: https://lore.kernel.org/r/20231017002308.134480-3-babu.moger@amd.com
-
- 11 Oct, 2023 1 commit
-
-
Maciej Wieczor-Retman authored
The setting for non-contiguous 1s support in Intel CAT is hardcoded to false. On these systems, writing non-contiguous 1s into the schemata file will fail before resctrl passes the value to the hardware. In Intel CAT CPUID.0x10.1:ECX[3] and CPUID.0x10.2:ECX[3] stopped being reserved and now carry information about non-contiguous 1s value support for L3 and L2 cache respectively. The CAT capacity bitmask (CBM) supports a non-contiguous 1s value if the bit is set. The exception are Haswell systems where non-contiguous 1s value support needs to stay disabled since they can't make use of CPUID for Cache allocation. Originally-by:
Fenghua Yu <fenghua.yu@intel.com> Signed-off-by:
Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by:
Peter Newman <peternewman@google.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Reviewed-by:
Babu Moger <babu.moger@amd.com> Tested-by:
Peter Newman <peternewman@google.com> Link: https://lore.kernel.org/r/1849b487256fe4de40b30f88450cba3d9abc9171.1696934091.git.maciej.wieczor-retman@intel.com
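A simplified sketch of how such a validity check might look. The ECX bit position comes from the commit message; everything else (function names, the cbm_len parameter, omitting any minimum-bits rule) is an assumption for illustration and not the kernel's validation routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.0x10.1:ECX[3] (L3) or CPUID.0x10.2:ECX[3] (L2). */
bool sparse_masks_supported(uint32_t cpuid_ecx)
{
    return (cpuid_ecx >> 3) & 1u;
}

/* True if all set bits in cbm form one unbroken run. */
bool cbm_is_contiguous(uint32_t cbm)
{
    if (cbm == 0)
        return false;

    uint32_t low = cbm & (~cbm + 1u); /* isolate the lowest set bit */
    uint32_t filled = cbm + low;      /* carry ripples through the lowest run */

    return (filled & cbm) == 0;       /* any surviving bit means a second run */
}

/* Accept a non-empty mask within cbm_len bits; require one run unless sparse. */
bool cbm_valid(uint32_t cbm, unsigned int cbm_len, bool sparse_ok)
{
    assert(cbm_len >= 1 && cbm_len <= 32);

    uint32_t full = (cbm_len == 32) ? UINT32_MAX : ((1u << cbm_len) - 1u);

    if (cbm == 0 || (cbm & ~full))
        return false;
    return sparse_ok || cbm_is_contiguous(cbm);
}
```

On a system like the Haswell case in the message, a caller would pass sparse_ok as false regardless of what CPUID reports.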
-
- 15 Mar, 2023 1 commit
-
-
Shawn Wang authored
As a temporary storage, staged_config[] in rdt_domain should be cleared before and after it is used. The stale value in staged_config[] could cause an MSR access error.

Here is a reproducer on a system with 16 usable CLOSIDs for a 15-way L3 Cache (MBA should be disabled if the number of CLOSIDs for MB is less than 16.) :

    mount -t resctrl resctrl -o cdp /sys/fs/resctrl
    mkdir /sys/fs/resctrl/p{1..7}
    umount /sys/fs/resctrl/
    mount -t resctrl resctrl /sys/fs/resctrl
    mkdir /sys/fs/resctrl/p{1..8}

An error occurs when creating resource group named p8:

    unchecked MSR access error: WRMSR to 0xca0 (tried to write 0x00000000000007ff) at rIP: 0xffffffff82249142 (cat_wrmsr+0x32/0x60)
    Call Trace:
     <IRQ>
     __flush_smp_call_function_queue+0x11d/0x170
     __sysvec_call_function+0x24/0xd0
     sysvec_call_function+0x89/0xc0
     </IRQ>
     <TASK>
     asm_sysvec_call_function+0x16/0x20

When creating a new resource control group, hardware will be configured by the following process:

    rdtgroup_mkdir()
      rdtgroup_mkdir_ctrl_mon()
        rdtgroup_init_alloc()
          resctrl_arch_update_domains()

resctrl_arch_update_domains() iterates and updates all resctrl_conf_type whose have_new_ctrl is true. Since staged_config[] holds the same values as when CDP was enabled, it will continue to update the CDP_CODE and CDP_DATA configurations. When group p8 is created, get_config_index() called in resctrl_arch_update_domains() will return 16 and 17 as the CLOSIDs for CDP_CODE and CDP_DATA, which will be translated to an invalid register - 0xca0 in this scenario.

Fix it by clearing staged_config[] before and after it is used.

[reinette: re-order commit tags]

Fixes: 75408e43 ("x86/resctrl: Allow different CODE/DATA configurations to be staged") Suggested-by:
Xin Hao <xhao@linux.alibaba.com> Signed-off-by:
Shawn Wang <shawnwang@linux.alibaba.com> Signed-off-by:
Reinette Chatre <reinette.chatre@intel.com> Signed-off-by:
Dave Hansen <dave.hansen@linux.intel.com> Tested-by:
Reinette Chatre <reinette.chatre@intel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/2fad13f49fbe89687fc40e9a5a61f23a28d1507a.1673988935.git.reinette.chatre%40intel.com
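The arithmetic behind the invalid register can be sketched as follows. The index values 16 and 17 and the register 0xca0 come from the commit message; the base address 0xc90, the code/data ordering and all function names are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum conf_type { CONF_NONE, CONF_CODE, CONF_DATA };

#define L3_CBM_MSR_BASE 0xc90u /* assumed base of the L3 mask registers */

/* With CDP each CLOSID uses two slots (data, then code, assumed order). */
uint32_t config_index(uint32_t closid, enum conf_type type)
{
    switch (type) {
    case CONF_CODE:
        return closid * 2 + 1;
    case CONF_DATA:
        return closid * 2;
    case CONF_NONE:
    default:
        return closid;
    }
}

uint32_t cbm_msr(uint32_t index)
{
    return L3_CBM_MSR_BASE + index;
}

/* A register is valid only if its slot is below the CLOSID count. */
bool cbm_msr_in_range(uint32_t msr, uint32_t num_closid)
{
    assert(num_closid > 0);
    return msr >= L3_CBM_MSR_BASE && msr < L3_CBM_MSR_BASE + num_closid;
}
```

For CLOSID 8 with CDP disabled the slot is 8 and the register is in range, but a stale CDP entry doubles the index to 16, which lands on 0xca0, one past the last of the 16 valid registers.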
-
- 23 Jan, 2023 4 commits
-
-
Babu Moger authored
The event configuration can be viewed by the user by reading the configuration file /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config. The event configuration settings are domain specific and will affect all the CPUs in the domain.

Following are the types of events supported:

  ====  ===========================================================
  Bits  Description
  ====  ===========================================================
  6     Dirty Victims from the QOS domain to all types of memory
  5     Reads to slow memory in the non-local NUMA domain
  4     Reads to slow memory in the local NUMA domain
  3     Non-temporal writes to non-local NUMA domain
  2     Non-temporal writes to local NUMA domain
  1     Reads to memory in the non-local NUMA domain
  0     Reads to memory in the local NUMA domain
  ====  ===========================================================

By default, the mbm_total_bytes_config is set to 0x7f to count all the event types.

For example:

  $cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
  0=0x7f;1=0x7f;2=0x7f;3=0x7f

In this case, the event mbm_total_bytes is configured with 0x7f on domains 0 to 3.

Signed-off-by:
Babu Moger <babu.moger@amd.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/20230113152039.770054-10-babu.moger@amd.com
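As an illustration of the per-domain format shown in the example output ("domain=value" pairs separated by semicolons), here is a small user-space parser sketch. The function name and its error convention are hypothetical; it is not part of resctrl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse a string such as "0=0x7f;1=0x7f" into vals[domain]. Returns the
 * number of pairs parsed, or -1 on malformed input or when a domain id
 * is not below max_domains.
 */
int parse_event_configs(const char *s, unsigned long *vals, int max_domains)
{
    int count = 0;

    assert(s != NULL && vals != NULL && max_domains > 0);

    while (*s != '\0' && *s != '\n') {
        char *end;
        unsigned long dom = strtoul(s, &end, 10);

        if (end == s || *end != '=' || dom >= (unsigned long)max_domains)
            return -1;
        s = end + 1;

        unsigned long val = strtoul(s, &end, 16); /* accepts a 0x prefix */

        if (end == s)
            return -1;
        vals[dom] = val;
        count++;

        if (*end == ';')
            end++;
        s = end;
    }
    return count;
}
```

A value of 0x7f has bits 0 to 6 set, so each parsed domain in the example counts all seven event types listed in the table.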
-
Babu Moger authored
Add a new field in struct mon_evt to support Bandwidth Monitoring Event Configuration (BMEC) and also update the "mon_features" display.

The resctrl file "mon_features" will display the supported events and files that can be used to configure those events if monitor configuration is supported.

Before the change:

  $ cat /sys/fs/resctrl/info/L3_MON/mon_features
  llc_occupancy
  mbm_total_bytes
  mbm_local_bytes

After the change when BMEC is supported:

  $ cat /sys/fs/resctrl/info/L3_MON/mon_features
  llc_occupancy
  mbm_total_bytes
  mbm_total_bytes_config
  mbm_local_bytes
  mbm_local_bytes_config

Signed-off-by:
Babu Moger <babu.moger@amd.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/20230113152039.770054-9-babu.moger@amd.com
-
Babu Moger authored
In an upcoming change, rdt_get_mon_l3_config() needs to call rdt_cpu_has() to query the monitor related features. It cannot be called right now because rdt_cpu_has() has the __init attribute but rdt_get_mon_l3_config() doesn't. Add the __init attribute to rdt_get_mon_l3_config() that is only called by get_rdt_mon_resources() that already has the __init attribute. Also make rdt_cpu_has() available to rdt_get_mon_l3_config() via the internal header file. Signed-off-by:
Babu Moger <babu.moger@amd.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/20230113152039.770054-8-babu.moger@amd.com Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de>
-
Babu Moger authored
Add a new resource type RDT_RESOURCE_SMBA to handle the QoS enforcement policies on the external slow memory. Mostly initialization of the essentials. Setting fflags to RFTYPE_RES_MB configures the SMBA resource to have the same resctrl files as the existing MBA resource. The SMBA resource has identical properties to the existing MBA resource. These properties will be enumerated in an upcoming change and exposed via resctrl because of this flag. Signed-off-by:
Babu Moger <babu.moger@amd.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/20230113152039.770054-4-babu.moger@amd.com
-
- 27 Nov, 2022 1 commit
-
-
Borislav Petkov authored
msr-index.h should contain all MSRs for easier grepping for MSR numbers when dealing with unchecked MSR access warnings, for example. Move the resctrl ones. Prefix IA32_PQR_ASSOC with "MSR_" while at it. No functional changes. Signed-off-by:
Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221106212923.20699-1-bp@alien8.de
-
- 23 Sep, 2022 6 commits
-
-
James Morse authored
resctrl_arch_rmid_read() returns a value in chunks, as read from the hardware. This needs scaling to bytes by mon_scale, as provided by the architecture code. Now that resctrl_arch_rmid_read() performs the overflow and corrections itself, it may as well return a value in bytes directly. This allows the accesses to the architecture specific 'hw' structure to be removed. Move the mon_scale conversion into resctrl_arch_rmid_read(). mbm_bw_count() is updated to calculate bandwidth from bytes. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Jamie Iles <quic_jiles@quicinc.com> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Tested-by:
Xin Hao <xhao@linux.alibaba.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-22-james.morse@arm.com
-
James Morse authored
resctrl_cqm_threshold is stored in a hardware specific chunk size, but exposed to user-space as bytes. This means the filesystem parts of resctrl need to know how the hardware counts, to convert the user provided byte value to chunks. The interface between the architecture's resctrl code and the filesystem ought to treat everything as bytes. Change the unit of resctrl_cqm_threshold to bytes. resctrl_arch_rmid_read() still returns its value in chunks, so this needs converting to bytes. As all the users have been touched, rename the variable to resctrl_rmid_realloc_threshold, which describes what the value is for. Neither r->num_rmid nor hw_res->mon_scale are guaranteed to be a power of 2, so the existing code introduces a rounding error from resctrl's theoretical fraction of the cache usage. This behaviour is kept as it ensures the user visible value matches the value read from hardware when the rmid will be reallocated. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Jamie Iles <quic_jiles@quicinc.com> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Tested-by:
Xin Hao <xhao@linux.alibaba.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-20-james.morse@arm.com
-
James Morse authored
resctrl_arch_rmid_read() is intended as the function that an architecture agnostic resctrl filesystem driver can use to read a value in bytes from a counter. Currently the function returns the MBM values in chunks directly from hardware. When reading a bandwidth counter, get_corrected_mbm_count() must be used to correct the value read. get_corrected_mbm_count() is architecture specific, so this work should be done in resctrl_arch_rmid_read(). Move the function calls. This allows the resctrl filesystem's chunks value to be removed in favour of the architecture private version. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Jamie Iles <quic_jiles@quicinc.com> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Tested-by:
Xin Hao <xhao@linux.alibaba.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-19-james.morse@arm.com
-
James Morse authored
resctrl_arch_rmid_read() is intended as the function that an architecture agnostic resctrl filesystem driver can use to read a value in bytes from a counter. Currently the function returns the MBM values in chunks directly from hardware. When reading a bandwidth counter, mbm_overflow_count() must be used to correct for any possible overflow. As mbm_overflow_count() is architecture specific, its behaviour should be part of resctrl_arch_rmid_read(). Move the mbm_overflow_count() calls into resctrl_arch_rmid_read(). This allows the resctrl filesystem's prev_msr to be removed in favour of the architecture private version. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Jamie Iles <quic_jiles@quicinc.com> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Tested-by:
Xin Hao <xhao@linux.alibaba.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-18-james.morse@arm.com
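The overflow correction referred to above can be illustrated with a small sketch. It assumes a free-running counter that is `width` bits wide and wraps at most once between two reads; the function name and signature are hypothetical and are not claimed to match mbm_overflow_count() itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of chunks counted between two reads of a
 * counter that is only 'width' bits wide. Shifting both values to the
 * top of a 64-bit word lets unsigned wrap-around discard the bits above
 * the counter width, so a single wrap is handled without a branch.
 */
static uint64_t counter_delta(uint64_t prev, uint64_t cur, unsigned int width)
{
	assert(width >= 1 && width <= 64);

	unsigned int shift = 64 - width;
	uint64_t delta = (cur << shift) - (prev << shift);

	return delta >> shift;
}
```

For a 24-bit counter, a previous value of 0xFFFFF0 followed by a current value of 0x10 yields a delta of 0x20 chunks rather than a huge or negative number.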
-
James Morse authored
__rmid_read() selects the specified eventid and returns the counter value from the MSR. The error handling is architecture specific, and handled by the callers, rdtgroup_mondata_show() and __mon_event_count(). Error handling should be handled by architecture specific code, as a different architecture may have different requirements. MPAM's counters can report that they are 'not ready', requiring a second read after a short delay. This should be hidden from resctrl. Make __rmid_read() the architecture specific function for reading a counter. Rename it resctrl_arch_rmid_read() and move the error handling into it. A read from a counter that hardware supports but resctrl does not now returns -EINVAL instead of -EIO from the default case in __mon_event_count(). It isn't possible for user-space to see this change as resctrl doesn't expose counters it doesn't support. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Jamie Iles <quic_jiles@quicinc.com> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Tested-by:
Xin Hao <xhao@linux.alibaba.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-16-james.morse@arm.com
-
James Morse authored
To abstract the rmid counters into a helper that returns the number of bytes counted, architecture specific per-rmid state is needed. It needs to be possible to reset this hidden state, as the values may outlive the life of an rmid, or the mount time of the filesystem. mon_event_read() is called with first = true when an rmid is first allocated in mkdir_mondata_subdir(). Add resctrl_arch_reset_rmid() and call it from __mon_event_count()'s rr->first check. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Jamie Iles <quic_jiles@quicinc.com> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Tested-by:
Xin Hao <xhao@linux.alibaba.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-15-james.morse@arm.com
-
- 22 Sep, 2022 9 commits
-
-
James Morse authored
A renamed __rmid_read() is intended as the function that an architecture agnostic resctrl filesystem driver can use to read a value in bytes from a counter. Currently the function returns the MBM values in chunks directly from hardware. For bandwidth counters the resctrl filesystem uses this to calculate the number of bytes ever seen. MPAM's scaling of counters can be changed at runtime, reducing the resolution but increasing the range. When this is changed the prev_msr values need to be converted by the architecture code. Add an array for per-rmid private storage. The prev_msr and chunks values will move here to allow resctrl_arch_rmid_read() to always return the number of bytes read by this counter without assistance from the filesystem. The values are moved in later patches when the overflow and correction calls are moved into __rmid_read(). Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Jamie Iles <quic_jiles@quicinc.com> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Tested-by:
Xin Hao <xhao@linux.alibaba.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-14-james.morse@arm.com
-
James Morse authored
mbm_bw_count() is only called by the mbm_handle_overflow() worker once a second. It reads the hardware register, calculates the bandwidth and updates m->prev_bw_msr which is used to hold the previous hardware register value. Operating directly on hardware register values makes it difficult to make this code architecture independent, so that it can be moved to /fs/, making the mba_sc feature something resctrl supports with no additional support from the architecture. Prior to calling mbm_bw_count(), mbm_update() reads from the same hardware register using __mon_event_count(). Change mbm_bw_count() to use the current chunks value most recently saved by __mon_event_count(). This removes an extra call to __rmid_read(). Instead of using m->prev_msr to calculate the number of chunks seen, use the rr->val that was updated by __mon_event_count(). This removes an extra call to mbm_overflow_count() and get_corrected_mbm_count(). Calculating bandwidth like this means mbm_bw_count() no longer operates on hardware register values directly. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Jamie Iles <quic_jiles@quicinc.com> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Tested-by:
Xin Hao <xhao@linux.alibaba.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-13-james.morse@arm.com
-
James Morse authored
update_mba_bw() calculates a new control value for the MBA resource based on the user provided mbps_val and the current measured bandwidth. Some control values need remapping by delay_bw_map(). It does this by calling wrmsrl() directly. This needs splitting up to be done by an architecture specific helper, so that the remainder can eventually be moved to /fs/. Add resctrl_arch_update_one() to apply one configuration value to the provided resource and domain. This avoids the staging and cross-calling that is only needed with changes made by user-space. delay_bw_map() moves to be part of the arch code, to maintain the 'percentage control' view of MBA resources in resctrl. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Jamie Iles <quic_jiles@quicinc.com> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Tested-by:
Xin Hao <xhao@linux.alibaba.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-12-james.morse@arm.com
-
James Morse authored
The resctrl arch code provides a second configuration array mbps_val[] for the MBA software controller. Since resctrl switched over to allocating and freeing its own array when needed, nothing uses the arch code version. Remove it. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Jamie Iles <quic_jiles@quicinc.com> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Tested-by:
Xin Hao <xhao@linux.alibaba.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-11-james.morse@arm.com
-
James Morse authored
To support resctrl's MBA software controller, the architecture must provide a second configuration array to hold the mbps_val[] from user-space. This complicates the interface between the architecture specific code and the filesystem portions of resctrl that will move to /fs/, to allow multiple architectures to support resctrl. Make the filesystem parts of resctrl create an array for the mba_sc values. The software controller can be changed to use this, allowing the architecture code to only consider the values configured in hardware. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Jamie Iles <quic_jiles@quicinc.com> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Tested-by:
Xin Hao <xhao@linux.alibaba.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-9-james.morse@arm.com
-
James Morse authored
Because domains are exposed to user-space via resctrl, the filesystem must update its state when CPU hotplug callbacks are triggered. Some of this work is common to any architecture that would support resctrl, but the work is tied up with the architecture code to free the memory. Move the monitor subdir removal and the cancelling of the mbm/limbo works into a new resctrl_offline_domain() call. These bits are not specific to the architecture. Grouping them in one function allows that code to be moved to /fs/ and re-used by another architecture. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Jamie Iles <quic_jiles@quicinc.com> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Tested-by:
Xin Hao <xhao@linux.alibaba.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-6-james.morse@arm.com
-
James Morse authored
Because domains are exposed to user-space via resctrl, the filesystem must update its state when CPU hotplug callbacks are triggered. Some of this work is common to any architecture that would support resctrl, but the work is tied up with the architecture code to allocate the memory. Move domain_setup_mon_state(), the monitor subdir creation call and the mbm/limbo workers into a new resctrl_online_domain() call. These bits are not specific to the architecture. Grouping them in one function allows that code to be moved to /fs/ and re-used by another architecture. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Jamie Iles <quic_jiles@quicinc.com> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Tested-by:
Xin Hao <xhao@linux.alibaba.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-4-james.morse@arm.com
-
James Morse authored
mon_enabled and mon_capable are always set as a pair by rdt_get_mon_l3_config(). There is no point having two values. Merge them together. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Jamie Iles <quic_jiles@quicinc.com> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Tested-by:
Xin Hao <xhao@linux.alibaba.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-3-james.morse@arm.com
-
James Morse authored
rdt_resources_all[] used to have extra entries for L2CODE/L2DATA. These were hidden from resctrl by the alloc_enabled value. Now that the L2/L2CODE/L2DATA resources have been merged together, alloc_enabled doesn't mean anything, it always has the same value as alloc_capable which indicates allocation is supported by this resource. Remove alloc_enabled and its helpers. Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Jamie Iles <quic_jiles@quicinc.com> Reviewed-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Tested-by:
Xin Hao <xhao@linux.alibaba.com> Tested-by:
Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by:
Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-2-james.morse@arm.com
-