Commit 6fde1424 authored by James Morse's avatar James Morse Committed by Borislav Petkov (AMD)

x86/resctrl: Allow resctrl_arch_rmid_read() to sleep

MPAM's cache occupancy counters can take a little while to settle once the
monitor has been configured. The maximum settling time is described to the
driver via a firmware table. The value could be large enough that it makes
sense to sleep. To avoid exposing this to resctrl, it should be hidden behind
MPAM's resctrl_arch_rmid_read().

resctrl_arch_rmid_read() may be called via IPI meaning it is unable to sleep.
In this case, it should return an error if it needs to sleep. This will only
affect MPAM platforms where the cache occupancy counter isn't available
immediately, nohz_full is in use, and there are no housekeeping CPUs in the
necessary domain.

There are three callers of resctrl_arch_rmid_read(): __mon_event_count() and
__check_limbo() are both called from a non-migrateable context.
mon_event_read() invokes __mon_event_count() using smp_call_on_cpu(), which
adds work to the target CPUs workqueue.  rdtgroup_mutex() is held, meaning this
cannot race with the resctrl cpuhp callback. __check_limbo() is invoked via
schedule_delayed_work_on() also adds work to a per-cpu workqueue.

The remaining call is add_rmid_to_limbo() which is called in response to
a user-space syscall that frees an RMID. This opportunistically reads the LLC
occupancy counter on the current domain to see if the RMID is over the dirty
threshold. This has to disable preemption to avoid reading the wrong domain's
value. Disabling preemption here prevents resctrl_arch_rmid_read() from
sleeping.

add_rmid_to_limbo() walks each domain, but only reads the counter on one
domain. If the system has more than one domain, the RMID will always be added
to the limbo list. If the RMIDs usage was not over the threshold, it will be
removed from the list when __check_limbo() runs.  Make this the default
behaviour. Free RMIDs are always added to the limbo list for each domain.

The user visible effect of this is that a clean RMID is not available for
re-allocation immediately after 'rmdir()' completes. This behaviour was never
portable as it never happened on a machine with multiple domains.

Removing this path allows resctrl_arch_rmid_read() to sleep if its called with
interrupts unmasked. Document this is the expected behaviour, and add
a might_sleep() annotation to catch changes that won't work on arm64.
Signed-off-by: default avatarJames Morse <james.morse@arm.com>
Signed-off-by: default avatarBorislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: default avatarShaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: default avatarReinette Chatre <reinette.chatre@intel.com>
Reviewed-by: default avatarBabu Moger <babu.moger@amd.com>
Tested-by: default avatarShaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: default avatarPeter Newman <peternewman@google.com>
Tested-by: default avatarBabu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-15-james.morse@arm.comSigned-off-by: default avatarBorislav Petkov (AMD) <bp@alien8.de>
parent 09909e09
...@@ -277,6 +277,8 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d, ...@@ -277,6 +277,8 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
u64 msr_val, chunks; u64 msr_val, chunks;
int ret; int ret;
resctrl_arch_rmid_read_context_check();
if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask)) if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
return -EINVAL; return -EINVAL;
...@@ -455,8 +457,6 @@ static void add_rmid_to_limbo(struct rmid_entry *entry) ...@@ -455,8 +457,6 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
{ {
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl; struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
struct rdt_domain *d; struct rdt_domain *d;
int cpu, err;
u64 val = 0;
u32 idx; u32 idx;
lockdep_assert_held(&rdtgroup_mutex); lockdep_assert_held(&rdtgroup_mutex);
...@@ -464,17 +464,7 @@ static void add_rmid_to_limbo(struct rmid_entry *entry) ...@@ -464,17 +464,7 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
idx = resctrl_arch_rmid_idx_encode(entry->closid, entry->rmid); idx = resctrl_arch_rmid_idx_encode(entry->closid, entry->rmid);
entry->busy = 0; entry->busy = 0;
cpu = get_cpu();
list_for_each_entry(d, &r->domains, list) { list_for_each_entry(d, &r->domains, list) {
if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
err = resctrl_arch_rmid_read(r, d, entry->closid,
entry->rmid,
QOS_L3_OCCUP_EVENT_ID,
&val);
if (err || val <= resctrl_rmid_realloc_threshold)
continue;
}
/* /*
* For the first limbo RMID in the domain, * For the first limbo RMID in the domain,
* setup up the limbo worker. * setup up the limbo worker.
...@@ -484,15 +474,10 @@ static void add_rmid_to_limbo(struct rmid_entry *entry) ...@@ -484,15 +474,10 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
set_bit(idx, d->rmid_busy_llc); set_bit(idx, d->rmid_busy_llc);
entry->busy++; entry->busy++;
} }
put_cpu();
if (entry->busy) {
rmid_limbo_count++; rmid_limbo_count++;
if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
closid_num_dirty_rmid[entry->closid]++; closid_num_dirty_rmid[entry->closid]++;
} else {
list_add_tail(&entry->list, &rmid_free_lru);
}
} }
void free_rmid(u32 closid, u32 rmid) void free_rmid(u32 closid, u32 rmid)
......
...@@ -236,7 +236,12 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d); ...@@ -236,7 +236,12 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
* @eventid: eventid to read, e.g. L3 occupancy. * @eventid: eventid to read, e.g. L3 occupancy.
* @val: result of the counter read in bytes. * @val: result of the counter read in bytes.
* *
* Call from process context on a CPU that belongs to domain @d. * Some architectures need to sleep when first programming some of the counters.
* (specifically: arm64's MPAM cache occupancy counters can return 'not ready'
* for a short period of time). Call from a non-migrateable process context on
* a CPU that belongs to domain @d. e.g. use smp_call_on_cpu() or
* schedule_work_on(). This function can be called with interrupts masked,
* e.g. using smp_call_function_any(), but may consistently return an error.
* *
* Return: * Return:
* 0 on success, or -EIO, -EINVAL etc on error. * 0 on success, or -EIO, -EINVAL etc on error.
...@@ -245,6 +250,22 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d, ...@@ -245,6 +250,22 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, u32 rmid, enum resctrl_event_id eventid, u32 closid, u32 rmid, enum resctrl_event_id eventid,
u64 *val); u64 *val);
/**
* resctrl_arch_rmid_read_context_check() - warn about invalid contexts
*
* When built with CONFIG_DEBUG_ATOMIC_SLEEP generate a warning when
* resctrl_arch_rmid_read() is called with preemption disabled.
*
* The contract with resctrl_arch_rmid_read() is that if interrupts
* are unmasked, it can sleep. This allows NOHZ_FULL systems to use an
* IPI, (and fail if the call needed to sleep), while most of the time
* the work is scheduled, allowing the call to sleep.
*/
static inline void resctrl_arch_rmid_read_context_check(void)
{
if (!irqs_disabled())
might_sleep();
}
/** /**
* resctrl_arch_reset_rmid() - Reset any private state associated with rmid * resctrl_arch_reset_rmid() - Reset any private state associated with rmid
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment