- 18 Jun, 2024 8 commits
-
-
David Vernet authored
The most common and critical way that a BPF scheduler can misbehave is by failing to run runnable tasks for too long. This patch implements a watchdog.

* All tasks record when they become runnable.

* A watchdog work periodically scans all runnable tasks. If any task has stayed runnable for too long, the BPF scheduler is aborted.

* scheduler_tick() monitors whether the watchdog itself is stuck. If so, the BPF scheduler is aborted.

Because the watchdog only scans the tasks which are currently runnable, and does so very infrequently, the overhead should be negligible.

scx_qmap is updated so that it can be told to stall user and/or kernel tasks.

A detected task stall looks like the following:

  sched_ext: BPF scheduler "qmap" errored, disabling
  sched_ext: runnable task stall (dbus-daemon[953] failed to run for 6.478s)
     scx_check_timeout_workfn+0x10e/0x1b0
     process_one_work+0x287/0x560
     worker_thread+0x234/0x420
     kthread+0xe9/0x100
     ret_from_fork+0x1f/0x30

A detected watchdog stall:

  sched_ext: BPF scheduler "qmap" errored, disabling
  sched_ext: runnable task stall (watchdog failed to check in for 5.001s)
     scheduler_tick+0x2eb/0x340
     update_process_times+0x7a/0x90
     tick_sched_timer+0xd8/0x130
     __hrtimer_run_queues+0x178/0x3b0
     hrtimer_interrupt+0xfc/0x390
     __sysvec_apic_timer_interrupt+0xb7/0x2b0
     sysvec_apic_timer_interrupt+0x90/0xb0
     asm_sysvec_apic_timer_interrupt+0x1b/0x20
     default_idle+0x14/0x20
     arch_cpu_idle+0xf/0x20
     default_idle_call+0x50/0x90
     do_idle+0xe8/0x240
     cpu_startup_entry+0x1d/0x20
     kernel_init+0x0/0x190
     start_kernel+0x0/0x392
     start_kernel+0x324/0x392
     x86_64_start_reservations+0x2a/0x2c
     x86_64_start_kernel+0x104/0x109
     secondary_startup_64_no_verify+0xce/0xdb

Note that this patch exposes scx_ops_error[_type]() in kernel/sched/ext.h to inline scx_notify_sched_tick().

v4: - While disabling, cancel_delayed_work_sync(&scx_watchdog_work) was being called before forward progress was guaranteed and thus could lead to system lockup. Relocated.

    - While enabling, it was comparing msecs against jiffies without conversion, leading to spurious load failures on lower HZ kernels. Fixed.

    - runnable list management is now used by core bypass logic and moved to the patch implementing sched_ext core.

v3: - bpf_scx_init_member() was incorrectly comparing ops->timeout_ms against SCX_WATCHDOG_MAX_TIMEOUT, which is in jiffies, without conversion, leading to spurious load failures on lower HZ kernels. Fixed.

v2: - Julia Lawall noticed that the watchdog code was mixing msecs and jiffies. Fix by using jiffies for everything.

Signed-off-by:
David Vernet <dvernet@meta.com> Reviewed-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Tejun Heo <tj@kernel.org> Acked-by:
Josh Don <joshdon@google.com> Acked-by:
Hao Luo <haoluo@google.com> Acked-by:
Barret Rhoden <brho@google.com> Cc: Julia Lawall <julia.lawall@inria.fr>
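For orientation, here is a minimal, hedged sketch of how a BPF scheduler would opt into the watchdog timeout described in the commit above. It assumes the usual libbpf struct_ops section conventions and that the ops->timeout_ms field mentioned in the v3 note is the user-settable bound checked against SCX_WATCHDOG_MAX_TIMEOUT; treat everything beyond those names as illustrative.

  /*
   * Illustrative sketch only, not taken from the patch itself. The SEC()
   * annotation and header names follow common BPF struct_ops conventions
   * and are assumptions.
   */
  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>

  char _license[] SEC("license") = "GPL";

  SEC(".struct_ops.link")
  struct sched_ext_ops minimal_ops = {
          .name       = "minimal",
          /* Abort the BPF scheduler if a runnable task stalls for > 4s. */
          .timeout_ms = 4000U,
  };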
-
Tejun Heo authored
Implement a new scheduler class sched_ext (SCX), which allows scheduling policies to be implemented as BPF programs to achieve the following:

1. Ease of experimentation and exploration: Enabling rapid iteration of new scheduling policies.

2. Customization: Building application-specific schedulers which implement policies that are not applicable to general-purpose schedulers.

3. Rapid scheduler deployments: Non-disruptive swap outs of scheduling policies in production environments.

sched_ext leverages BPF's struct_ops feature to define a structure which exports function callbacks and flags to BPF programs that wish to implement scheduling policies. The struct_ops structure exported by sched_ext is struct sched_ext_ops, and is conceptually similar to struct sched_class. The role of sched_ext is to map the complex sched_class callbacks to the simpler and more ergonomic struct sched_ext_ops callbacks.

For more detailed discussion on the motivations and overview, please refer to the cover letter. Later patches will also add several example schedulers and documentation.

This patch implements the minimum core framework to enable implementation of BPF schedulers. Subsequent patches will gradually add functionalities including safety guarantee mechanisms, nohz and cgroup support.

include/linux/sched/ext.h defines struct sched_ext_ops. With the comment on top, each operation should be self-explanatory. The following are worth noting:

- Both "sched_ext" and its shorthand "scx" are used. If the identifier already has "sched" in it, "ext" is used; otherwise, "scx".

- In sched_ext_ops, only .name is mandatory. Every operation is optional and if omitted a simple but functional default behavior is provided.

- A new policy constant SCHED_EXT is added and a task can select sched_ext by invoking sched_setscheduler(2) with the new policy constant. However, if the BPF scheduler is not loaded, SCHED_EXT is the same as SCHED_NORMAL and the task is scheduled by CFS. When the BPF scheduler is loaded, all tasks which have the SCHED_EXT policy are switched to sched_ext.

- To bridge the workflow imbalance between the scheduler core and sched_ext_ops callbacks, sched_ext uses simple FIFOs called dispatch queues (dsq's). By default, there is one global dsq (SCX_DSQ_GLOBAL), and one local per-CPU dsq (SCX_DSQ_LOCAL). SCX_DSQ_GLOBAL is provided for convenience and need not be used by a scheduler that doesn't require it. SCX_DSQ_LOCAL is the per-CPU FIFO that sched_ext pulls from when putting the next task on the CPU. The BPF scheduler can manage an arbitrary number of dsq's using scx_bpf_create_dsq() and scx_bpf_destroy_dsq().

- sched_ext guarantees system integrity no matter what the BPF scheduler does. To enable this, each task's ownership is tracked through p->scx.ops_state and all tasks are put on the scx_tasks list. The disable path can always recover and revert all tasks back to CFS. See p->scx.ops_state and scx_tasks.

- A task is not tied to its rq while enqueued. This decouples CPU selection from queueing and allows sharing a scheduling queue across an arbitrary subset of CPUs. This adds some complexities as a task may need to be bounced between rq's right before it starts executing. See dispatch_to_local_dsq() and move_task_to_local_dsq().

- One complication that arises from the above weak association between task and rq is that synchronizing with dequeue() gets complicated as dequeue() may happen anytime while the task is enqueued and the dispatch path might need to release the rq lock to transfer the task. Solving this requires a bit of complexity. See the logic around p->scx.sticky_cpu and p->scx.ops_qseq.

- Both enable and disable paths are a bit complicated. The enable path switches all tasks without blocking to avoid issues which can arise from partially switched states (e.g. the switching task itself being starved). The disable path can't trust the BPF scheduler at all, so it also has to guarantee forward progress without blocking. See scx_ops_enable() and scx_ops_disable_workfn().

- When sched_ext is disabled, static_branches are used to shut down the entry points from hot paths.

v7: - scx_ops_bypass() was incorrectly and unnecessarily trying to grab scx_ops_enable_mutex which can lead to deadlocks in the disable path. Fixed.

    - Fixed a TASK_DEAD handling bug in the scx_ops_enable() path which could lead to use-after-free.

    - Consolidated per-cpu variable usages and other cleanups.

v6: - SCX_NR_ONLINE_OPS replaced with SCX_OPI_*_BEGIN/END so that multiple groups can be expressed. Later CPU hotplug operations are put into their own group.

    - The SCX_OPS_DISABLING state is replaced with the new bypass mechanism which allows temporarily putting the system into simple FIFO scheduling mode, bypassing the BPF scheduler. In addition to the shutdown path, this will also be used to isolate the BPF scheduler across PM events. Enabling and disabling the bypass mode requires iterating all runnable tasks. The rq->scx.runnable_list addition is moved from the later watchdog patch.

    - ops.prep_enable() is replaced with ops.init_task() and ops.enable/disable() are now called whenever the task enters and leaves sched_ext instead of when the task becomes schedulable on sched_ext and stops being so. A new operation - ops.exit_task() - is called when the task stops being schedulable on sched_ext.

    - scx_bpf_dispatch() can now be called from ops.select_cpu() too. This removes the need for communicating the local dispatch decision made by ops.select_cpu() to ops.enqueue() via per-task storage. SCX_KF_SELECT_CPU is added to support the change.

    - SCX_TASK_ENQ_LOCAL, which told the BPF scheduler that scx_select_cpu_dfl() wants the task to be dispatched to the local DSQ, was removed. Instead, scx_bpf_select_cpu_dfl() now dispatches directly if it finds a suitable idle CPU. If such behavior is not desired, users can use scx_bpf_select_cpu_dfl() which returns the verdict in a bool out param.

    - scx_select_cpu_dfl() was mishandling WAKE_SYNC and could end up queueing many tasks on a local DSQ, which makes tasks execute in order while other CPUs stay idle, which made some hackbench numbers really bad. Fixed.

    - The current state of sched_ext can now be monitored through files under /sys/sched_ext instead of /sys/kernel/debug/sched/ext. This is to enable monitoring on kernels which don't enable debugfs.

    - sched_ext wasn't telling BPF that ops.dispatch()'s @prev argument may be NULL and a BPF scheduler which derefs the pointer without checking could crash the kernel. Tell BPF. This is currently a bit ugly. A better way to annotate this is expected in the future.

    - scx_exit_info updated to carry pointers to message buffers instead of embedding them directly. This decouples buffer sizes from the API so that they can be changed without breaking compatibility.

    - exit_code added to scx_exit_info. This is used to indicate different exit conditions on non-error exits and will be used to handle e.g. CPU hotplugs.

    - The patch "sched_ext: Allow BPF schedulers to switch all eligible tasks into sched_ext" is folded in and the interface is changed so that partial switching is indicated with a new ops flag %SCX_OPS_SWITCH_PARTIAL. This makes scx_bpf_switch_all() unnecessary and in turn SCX_KF_INIT. ops.init() is now called with SCX_KF_SLEEPABLE.

    - Code reorganized so that only the parts necessary to integrate with the rest of the kernel are in the header files.

    - Changes to reflect the BPF and other kernel changes including the addition of bpf_sched_ext_ops.cfi_stubs.

v5: - To accommodate 32bit configs, p->scx.ops_state is now atomic_long_t instead of atomic64_t and scx_dsp_buf_ent.qseq which uses load_acquire/store_release is now unsigned long instead of u64.

    - Fix the bug where bpf_scx_btf_struct_access() was allowing write access to arbitrary fields.

    - Distinguish kfuncs which can be called from any sched_ext ops and from anywhere. e.g. scx_bpf_pick_idle_cpu() can now be called only from sched_ext ops.

    - Rename "type" to "kind" in scx_exit_info to make it easier to use in languages in which "type" is a reserved keyword.

    - Since cff9b233 ("kernel/sched: Modify initial boot task idle setup"), PF_IDLE is not set on idle tasks which haven't been online yet, which made scx_task_iter_next_filtered() include those idle tasks in iterations, leading to oopses. Update scx_task_iter_next_filtered() to directly test p->sched_class against idle_sched_class instead of using is_idle_task() which tests PF_IDLE.

    - Other updates to match upstream changes such as adding const to the set_cpumask() param and renaming check_preempt_curr() to wakeup_preempt().

v4: - SCHED_CHANGE_BLOCK replaced with the previous sched_deq_and_put_task()/sched_enq_and_set_task() pair. This is because upstream is adopting a different generic cleanup mechanism. Once that lands, the code will be adapted accordingly.

    - task_on_scx() used to test whether a task should be switched into SCX, which is confusing. Renamed to task_should_scx(). task_on_scx() now tests whether a task is currently on SCX.

    - scx_has_idle_cpus is barely used anymore and replaced with a direct check on the idle cpumask.

    - SCX_PICK_IDLE_CORE added and scx_pick_idle_cpu() improved to prefer fully idle cores.

    - ops.enable() now sees an up-to-date p->scx.weight value.

    - The ttwu_queue path is disabled for tasks on SCX to avoid confusing BPF schedulers expecting a ->select_cpu() call.

    - Use cpu_smt_mask() instead of topology_sibling_cpumask() like the rest of the scheduler.

v3: - ops.set_weight() added to allow BPF schedulers to track weight changes without polling p->scx.weight.

    - move_task_to_local_dsq() was losing SCX-specific enq_flags when enqueueing the task on the target dsq because it goes through activate_task() which loses the upper 32bit of the flags. Carry the flags through rq->scx.extra_enq_flags.

    - scx_bpf_dispatch(), scx_bpf_pick_idle_cpu(), scx_bpf_task_running() and scx_bpf_task_cpu() now use the new KF_RCU instead of KF_TRUSTED_ARGS to make it easier for BPF schedulers to call them.

    - The kfunc helper access control mechanism implemented through sched_ext_entity.kf_mask is improved. Now SCX_CALL_OP*() is always used when invoking scx_ops operations.

v2: - balance_scx_on_up() is dropped. Instead, on UP, balance_scx() is called from put_prev_task_scx() and pick_next_task_scx() as necessary. To determine whether balance_scx() should be called from put_prev_task_scx(), the SCX_TASK_DEQD_FOR_SLEEP flag is added. See the comment in put_prev_task_scx() for details.

    - sched_deq_and_put_task() / sched_enq_and_set_task() sequences replaced with SCHED_CHANGE_BLOCK().

    - Unused all_dsqs list removed. This was a left-over from previous iterations.

    - p->scx.kf_mask is added to track and enforce which kfunc helpers are allowed. Also, init/exit sequences are updated to make some kfuncs always safe to call regardless of the current BPF scheduler state. Combined, this should make all the kfuncs safe.

    - BPF now supports sleepable struct_ops operations. Hacky workaround removed and operations and kfunc helpers are tagged appropriately.

    - BPF now supports bitmask / cpumask helpers. scx_bpf_get_idle_cpumask() and friends are added so that BPF schedulers can use the idle masks with the generic helpers. This replaces the hacky kfunc helpers added by a separate patch in V1.

    - CONFIG_SCHED_CLASS_EXT can no longer be enabled if SCHED_CORE is enabled. This restriction will be removed by a later patch which adds core-sched support.

    - Add MAINTAINERS entries and other misc changes.

Signed-off-by:
Tejun Heo <tj@kernel.org> Co-authored-by:
David Vernet <dvernet@meta.com> Acked-by:
Josh Don <joshdon@google.com> Acked-by:
Hao Luo <haoluo@google.com> Acked-by:
Barret Rhoden <brho@google.com> Cc: Andrea Righi <andrea.righi@canonical.com>
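To make the dispatch-queue model above concrete, here is a hedged sketch of a do-nothing FIFO policy: every runnable task is appended to the global dsq (SCX_DSQ_GLOBAL) and CPUs pull from it in order. The helper names beyond SCX_DSQ_GLOBAL (scx_bpf_dispatch(), SCX_SLICE_DFL, BPF_STRUCT_OPS, the SEC() string) follow the example schedulers and should be treated as assumptions, not as guarantees of this commit.

  /* Hedged sketch of a global-FIFO sched_ext policy, not the literal example code. */
  void BPF_STRUCT_OPS(fifo_enqueue, struct task_struct *p, u64 enq_flags)
  {
          /* Append @p to the built-in global FIFO. */
          scx_bpf_dispatch(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
  }

  SEC(".struct_ops.link")
  struct sched_ext_ops fifo_ops = {
          .enqueue = (void *)fifo_enqueue,
          .name    = "global_fifo",
  };

Only .name is mandatory, so the unset callbacks fall back to the default behavior described above.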
-
Tejun Heo authored
This adds dummy implementations of sched_ext interfaces which interact with the scheduler core and hook them in the correct places. As they're all dummies, this doesn't cause any behavior changes. This is split out to help reviewing. v2: balance_scx_on_up() dropped. This will be handled in sched_ext proper. Signed-off-by:
Tejun Heo <tj@kernel.org> Reviewed-by:
David Vernet <dvernet@meta.com> Acked-by:
Josh Don <joshdon@google.com> Acked-by:
Hao Luo <haoluo@google.com> Acked-by:
Barret Rhoden <brho@google.com>
-
Tejun Heo authored
Factor out sched_weight_from/to_cgroup() which convert between scheduler shares and cgroup weight. No functional change.

The factored out functions will be used by a new BPF extensible sched_class so that the weights can be exposed to the BPF programs in a way which is consistent with cgroup weights and easier to interpret.

The weight conversions will be used regardless of cgroup usage. It's just borrowing the cgroup weight range as it's more intuitive. The CGROUP_WEIGHT_MIN/DFL/MAX constants are moved outside CONFIG_CGROUPS so that the conversion helpers can always be defined.

v2: The helpers are now defined regardless of CONFIG_CGROUPS.

Signed-off-by:
Tejun Heo <tj@kernel.org> Reviewed-by:
David Vernet <dvernet@meta.com> Acked-by:
Josh Don <joshdon@google.com> Acked-by:
Hao Luo <haoluo@google.com> Acked-by:
Barret Rhoden <brho@google.com>
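The commit above does not quote the conversion formula, so the standalone demo below is an assumption for illustration: a linear mapping where the cgroup default weight 100 corresponds to the nice-0 scheduler weight 1024, clamped to [CGROUP_WEIGHT_MIN, CGROUP_WEIGHT_MAX]. It compiles and runs in user space.

  /* Illustrative only; the exact kernel helpers may round or clamp differently. */
  #include <stdio.h>

  #define CGROUP_WEIGHT_MIN 1UL
  #define CGROUP_WEIGHT_DFL 100UL
  #define CGROUP_WEIGHT_MAX 10000UL
  #define NICE_0_WEIGHT     1024UL

  static unsigned long sched_weight_from_cgroup(unsigned long cgrp_weight)
  {
          return cgrp_weight * NICE_0_WEIGHT / CGROUP_WEIGHT_DFL;
  }

  static unsigned long sched_weight_to_cgroup(unsigned long weight)
  {
          /* Round to nearest, then clamp into the cgroup weight range. */
          unsigned long w = (weight * CGROUP_WEIGHT_DFL + NICE_0_WEIGHT / 2) / NICE_0_WEIGHT;

          if (w < CGROUP_WEIGHT_MIN)
                  w = CGROUP_WEIGHT_MIN;
          if (w > CGROUP_WEIGHT_MAX)
                  w = CGROUP_WEIGHT_MAX;
          return w;
  }

  int main(void)
  {
          printf("cgroup 100  -> weight %lu\n", sched_weight_from_cgroup(100));
          printf("weight 1024 -> cgroup %lu\n", sched_weight_to_cgroup(1024));
          return 0;
  }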
-
Tejun Heo authored
When a task switches to a new sched_class, the prev and new classes are notified through ->switched_from() and ->switched_to(), respectively, after the switching is done.

A new BPF extensible sched_class will have callbacks that allow the BPF scheduler to keep track of relevant task states (like priority and cpumask). Those callbacks aren't called while a task is on a different sched_class. When a task comes back, we want to tell the BPF progs the up-to-date state before the task gets enqueued, so we need a hook which is called before the switching is committed.

This patch adds ->switching_to() which is called during sched_class switch through check_class_changing() before the task is restored. Also, this patch exposes check_class_changing/changed() in kernel/sched/sched.h. They will be used by the new BPF extensible sched_class to implement implicit sched_class switching which is used e.g. when falling back to CFS when the BPF scheduler fails or unloads.

This is a prep patch and doesn't cause any behavior changes. The new operation and exposed functions aren't used yet.

v3: Refreshed on top of tip:sched/core.

v2: Improve patch description w/ details on planned use.

Signed-off-by:
Tejun Heo <tj@kernel.org> Reviewed-by:
David Vernet <dvernet@meta.com> Acked-by:
Josh Don <joshdon@google.com> Acked-by:
Hao Luo <haoluo@google.com> Acked-by:
Barret Rhoden <brho@google.com>
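A hedged sketch of the new hook's shape (not the literal kernel code): check_class_changing() runs before the class switch is committed, so a class implementing ->switching_to() sees the task's pre-restore state.

  /* Sketch; the real prototype and call site may differ in detail. */
  static void check_class_changing(struct rq *rq, struct task_struct *p,
                                   const struct sched_class *prev_class)
  {
          if (prev_class != p->sched_class && p->sched_class->switching_to)
                  p->sched_class->switching_to(rq, p);
  }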
-
Tejun Heo authored
Currently, during a task weight change, sched core directly calls reweight_task() defined in fair.c if @p is on CFS. Let's make it a proper sched_class operation instead. CFS's reweight_task() is renamed to reweight_task_fair() and now called through sched_class. While it turns a direct call into an indirect one, set_load_weight() isn't called from a hot path and this change shouldn't cause any noticeable difference. This will be used to implement reweight_task for a new BPF extensible sched_class so that it can keep its cached task weight up-to-date. Signed-off-by:
Tejun Heo <tj@kernel.org> Reviewed-by:
David Vernet <dvernet@meta.com> Acked-by:
Josh Don <joshdon@google.com> Acked-by:
Hao Luo <haoluo@google.com> Acked-by:
Barret Rhoden <brho@google.com>
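Roughly, the indirection described above looks like the sketch below: instead of calling fair.c's reweight_task() directly, set_load_weight() goes through the class operation when one is provided. The parameter list of the op and the surrounding details are assumptions, not the literal diff.

  /* Hedged sketch of the dispatch through the new sched_class op. */
  static void set_load_weight(struct task_struct *p, bool update_load)
  {
          int prio = p->static_prio - MAX_RT_PRIO;
          struct load_weight lw = {
                  .weight     = scale_load(sched_prio_to_weight[prio]),
                  .inv_weight = sched_prio_to_wmult[prio],
          };

          /* Let the owning class react to the weight change, if it cares. */
          if (update_load && p->sched_class->reweight_task)
                  p->sched_class->reweight_task(task_rq(p), p, &lw);
          else
                  p->se.load = lw;
  }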
-
Tejun Heo authored
A new BPF extensible sched_class will need more control over the forking process. It wants to be able to fail from sched_cgroup_fork() after the new task's sched_task_group is initialized so that the loaded BPF program can prepare the task once its cgroup association is established, and reject the fork if e.g. an allocation fails.

Allow sched_cgroup_fork() to fail by making it return int instead of void, and add sched_cancel_fork() to undo sched_fork() in the error path. sched_cgroup_fork() doesn't fail yet and this patch shouldn't cause any behavior changes.

v2: Patch description updated to detail the expected use.

Signed-off-by:
Tejun Heo <tj@kernel.org> Reviewed-by:
David Vernet <dvernet@meta.com> Acked-by:
Josh Don <joshdon@google.com> Acked-by:
Hao Luo <haoluo@google.com> Acked-by:
Barret Rhoden <brho@google.com>
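A sketch of how the fork path would consume the new return value; the function and label names below are illustrative, not lifted from kernel/fork.c.

  /* Hedged excerpt-style sketch of the error handling around the new API. */
  static struct task_struct *copy_process_excerpt(struct task_struct *p,
                                                  struct kernel_clone_args *args)
  {
          int retval;

          retval = sched_cgroup_fork(p, args);    /* may now fail */
          if (retval)
                  goto bad_fork_cancel;
          /* ... remainder of copy_process() ... */
          return p;

  bad_fork_cancel:
          sched_cancel_fork(p);                   /* undo sched_fork() */
          return ERR_PTR(retval);
  }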
-
Tejun Heo authored
Currently, sched_init() checks that the sched_class'es are in the expected order by testing each adjacency which is a bit brittle and makes it cumbersome to add optional sched_class'es. Instead, let's verify whether they're in the expected order using sched_class_above() which is what matters. Signed-off-by:
Tejun Heo <tj@kernel.org> Suggested-by:
Peter Zijlstra <peterz@infradead.org> Reviewed-by:
David Vernet <dvernet@meta.com>
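The resulting check in sched_init() could then look roughly like the sketch below; the exact class list depends on the kernel config, so treat this as an illustration rather than the literal diff.

  /* Verify the class ordering via sched_class_above() (sketch). */
  BUG_ON(!sched_class_above(&stop_sched_class, &dl_sched_class));
  BUG_ON(!sched_class_above(&dl_sched_class,   &rt_sched_class));
  BUG_ON(!sched_class_above(&rt_sched_class,   &fair_sched_class));
  BUG_ON(!sched_class_above(&fair_sched_class, &idle_sched_class));

An optional class only needs its own sched_class_above() checks against its neighbours, instead of having to slot into a fixed adjacency chain.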
-
- 05 Jun, 2024 1 commit
-
-
Ingo Molnar authored
Remove unnecessary use of the address operator. Signed-off-by:
Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org
-
- 27 May, 2024 2 commits
-
-
Ingo Molnar authored
Do a spell-checking pass. Signed-off-by:
Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
core.c has become rather large, move most scheduler syscall related functionality into a separate file, syscalls.c. This is about ~15% of core.c's raw linecount.

Move the alloc_user_cpus_ptr(), __rt_effective_prio(), rt_effective_prio(), uclamp_none(), uclamp_se_set() and uclamp_bucket_id() inlines to kernel/sched/sched.h.

Internally export the __sched_setscheduler(), __sched_setaffinity(), __setscheduler_prio(), set_load_weight(), enqueue_task(), dequeue_task(), check_class_changed(), splice_balance_callbacks() and balance_callbacks() methods to better facilitate this.

Move the new file's build to build_policy.c, because it fits there semantically, but also because it's the smallest of the 4 build units under an allmodconfig build:

    -rw-rw-r-- 1 mingo mingo 7.3M May 27 12:35 kernel/sched/core.i
    -rw-rw-r-- 1 mingo mingo 6.4M May 27 12:36 kernel/sched/build_utility.i
    -rw-rw-r-- 1 mingo mingo 6.3M May 27 12:36 kernel/sched/fair.i
    -rw-rw-r-- 1 mingo mingo 5.8M May 27 12:36 kernel/sched/build_policy.i

This better balances build time for scheduler subsystem rebuilds. I build-tested this new file as a standalone syscalls.o file for a bit, to make sure all the encapsulations & abstractions are robust.

Also update/add my copyright notices to these files.

Build time measurements:

    # -Before/+After:

    kepler:~/tip> perf stat -e 'cycles,instructions,duration_time' --sync --repeat 5 --pre 'rm -f kernel/sched/*.o' m kernel/sched/built-in.a >/dev/null

    Performance counter stats for 'm kernel/sched/built-in.a' (5 runs):

    -     71,938,508,607      cycles                                  ( +- 0.17% )
    +     71,992,916,493      cycles                                  ( +- 0.22% )
    -    106,214,780,964      instructions   # 1.48 insn per cycle    ( +- 0.01% )
    +    105,450,231,154      instructions   # 1.46 insn per cycle    ( +- 0.01% )
    -      5,878,232,620 ns   duration_time                           ( +- 0.38% )
    +      5,290,085,069 ns   duration_time                           ( +- 0.21% )

    -   5.8782 +- 0.0221 seconds time elapsed   ( +- 0.38% )
    +   5.2901 +- 0.0111 seconds time elapsed   ( +- 0.21% )

Build time improvement of -11.1% (duration_time) is expected: the parallel build time of the scheduler subsystem is determined by the largest, slowest to build object file, which is kernel/sched/core.o. By moving ~15% of its complexity into another build unit, we reduced build time by -11%.

Measured cycles spent on building is within its ~0.2% stddev noise envelope. The -0.7% reduction in instructions spent on building the scheduler is statistically reliable and somewhat surprising - I can only speculate: maybe compilers aren't that efficient at building & optimizing 10+ KLOC files (core.c), and it's an overall win to balance the linecount a bit.

Anyway, this might be a data point that suggests that reducing the linecount of our largest files will improve not just code readability and maintainability, but might also improve build times a bit.

Code generation got a bit worse, by 0.5kb text on an x86 defconfig build:

    # -Before/+After:

    kepler:~/tip> size vmlinux
        text        data      bss       dec      hex  filename
    -26475475    10439178  1740804  38655457  24dd5e1  vmlinux
    +26476003    10439178  1740804  38655985  24dd7f1  vmlinux

    kepler:~/tip> size kernel/sched/built-in.a
        text        data      bss       dec      hex  filename
    -  76056       30025      489    106570    1a04a  kernel/sched/core.o (ex kernel/sched/built-in.a)
    +  63452       29453      489     93394    16cd2  kernel/sched/core.o (ex kernel/sched/built-in.a)
       44299        2181      104     46584     b5f8  kernel/sched/fair.o (ex kernel/sched/built-in.a)
    -  42764        3424      120     46308     b4e4  kernel/sched/build_policy.o (ex kernel/sched/built-in.a)
    +  55651        4044      120     59815     e9a7  kernel/sched/build_policy.o (ex kernel/sched/built-in.a)
       44866       12655     2192     59713     e941  kernel/sched/build_utility.o (ex kernel/sched/built-in.a)
       44866       12655     2192     59713     e941  kernel/sched/build_utility.o (ex kernel/sched/built-in.a)

This is primarily due to the extra functions exported, and the size gets exaggerated somewhat by __pfx CFI function padding:

    ffffffff810cc710 <__pfx_enqueue_task>:
    ffffffff810cc710: 90   nop
    ffffffff810cc711: 90   nop
    ffffffff810cc712: 90   nop
    ffffffff810cc713: 90   nop
    ffffffff810cc714: 90   nop
    ffffffff810cc715: 90   nop
    ffffffff810cc716: 90   nop
    ffffffff810cc717: 90   nop
    ffffffff810cc718: 90   nop
    ffffffff810cc719: 90   nop
    ffffffff810cc71a: 90   nop
    ffffffff810cc71b: 90   nop
    ffffffff810cc71c: 90   nop
    ffffffff810cc71d: 90   nop
    ffffffff810cc71e: 90   nop
    ffffffff810cc71f: 90   nop

AFAICS the cost is primarily not to core.o and fair.o though (which contain the most performance sensitive scheduler functions), only to syscalls.o that gets called with much lower frequency - so I think this is an acceptable trade-off for better code separation.

Signed-off-by:
Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Link: https://lore.kernel.org/r/20240407084319.1462211-2-mingo@kernel.org
-
- 17 May, 2024 1 commit
-
-
Cheng Yu authored
In the cgroup v2 CPU subsystem, assuming we have a cgroup named 'test', and we set cpu.max and cpu.max.burst:

    # echo 1000000 > /sys/fs/cgroup/test/cpu.max
    # echo 1000000 > /sys/fs/cgroup/test/cpu.max.burst

then we check cpu.max and cpu.max.burst:

    # cat /sys/fs/cgroup/test/cpu.max
    1000000 100000
    # cat /sys/fs/cgroup/test/cpu.max.burst
    1000000

Next we set cpu.max again and check cpu.max and cpu.max.burst:

    # echo 2000000 > /sys/fs/cgroup/test/cpu.max
    # cat /sys/fs/cgroup/test/cpu.max
    2000000 100000
    # cat /sys/fs/cgroup/test/cpu.max.burst
    1000

... we find that the cpu.max.burst value changed unexpectedly.

The burst value returned by tg_get_cfs_burst() is in microseconds, while the calculation in cpu_max_write() expects the burst value in nanoseconds, which leads to the bug. To fix it, get the burst value directly from tg->cfs_bandwidth.burst.

Fixes: f4183717 ("sched/fair: Introduce the burstable CFS controller") Reported-by:
Qixin Liao <liaoqixin@huawei.com> Signed-off-by:
Cheng Yu <serein.chengyu@huawei.com> Signed-off-by:
Zhang Qiao <zhangqiao22@huawei.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Reviewed-by:
Vincent Guittot <vincent.guittot@linaro.org> Tested-by:
Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20240424132438.514720-1-serein.chengyu@huawei.com
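The toy user-space program below illustrates the unit mismatch described above (it is not the kernel code): the stored burst is in nanoseconds, the getter reports microseconds, and feeding the getter's value back into a nanosecond calculation shrinks the burst by a factor of 1000 on every cpu.max write, reproducing the 1000000 -> 1000 jump in the example.

  #include <stdio.h>

  #define NSEC_PER_USEC 1000ULL

  int main(void)
  {
          unsigned long long burst_ns    = 1000000ULL * NSEC_PER_USEC; /* 1000000us stored in ns */
          unsigned long long reported_us = burst_ns / NSEC_PER_USEC;   /* what the getter returns */

          /* Buggy rewrite path: treats the microsecond value as nanoseconds. */
          unsigned long long new_burst_ns = reported_us;
          printf("burst after buggy rewrite: %lluus\n", new_burst_ns / NSEC_PER_USEC);

          /* Fixed path: reuse the nanosecond value directly (tg->cfs_bandwidth.burst). */
          new_burst_ns = burst_ns;
          printf("burst after fixed rewrite: %lluus\n", new_burst_ns / NSEC_PER_USEC);
          return 0;
  }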
-
- 24 Apr, 2024 3 commits
-
-
Vincent Guittot authored
The optional shift of the clock used by thermal/hw load avg has been introduced to handle cases where the signal was not always a high frequency hw signal. Now that cpufreq provides a signal for firmware and SW pressure, we can remove this exception and always keep this PELT signal aligned with other signals.

Mark the sched_thermal_decay_shift boot parameter as deprecated.

Signed-off-by:
Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Tested-by:
Lukasz Luba <lukasz.luba@arm.com> Reviewed-by:
Qais Yousef <qyousef@layalina.io> Reviewed-by:
Lukasz Luba <lukasz.luba@arm.com> Link: https://lore.kernel.org/r/20240326091616.3696851-6-vincent.guittot@linaro.org
-
Vincent Guittot authored
Now that cpufreq provides a pressure value to the scheduler, rename arch_update_thermal_pressure() into arch_update_hw_pressure() to reflect that it returns a pressure applied by HW (i.e. with a high frequency change) which is not always related to thermal mitigation but may, for example, also be generated by max current limitation. Such a high frequency signal needs filtering to be smoothed and provide a value that reflects the average available capacity on the scheduler's time scale.

Signed-off-by:
Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Tested-by:
Lukasz Luba <lukasz.luba@arm.com> Reviewed-by:
Qais Yousef <qyousef@layalina.io> Reviewed-by:
Lukasz Luba <lukasz.luba@arm.com> Link: https://lore.kernel.org/r/20240326091616.3696851-5-vincent.guittot@linaro.org
-
Joel Granados authored
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels), which will reduce the overall build time size of the kernel and runtime memory bloat by ~64 bytes per sentinel (further information in the Link: https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/).

Remove the sentinel element from the ctl_table arrays.

Acked-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by:
Valentin Schneider <vschneid@redhat.com> Reviewed-by:
Valentin Schneider <vschneid@redhat.com> Signed-off-by:
Joel Granados <j.granados@samsung.com>
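The shape of the change, shown on a made-up table (field values are illustrative, and the claim that the registration path now derives the size from ARRAY_SIZE instead of scanning for a terminator is my reading of the wider series, not this commit text):

  static unsigned int example_knob;

  static struct ctl_table example_sysctls[] = {
          {
                  .procname     = "example_knob",
                  .data         = &example_knob,
                  .maxlen       = sizeof(unsigned int),
                  .mode         = 0644,
                  .proc_handler = proc_dointvec,
          },
  /*      {}   <-- empty sentinel entry, removed by this series */
  };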
-
- 12 Mar, 2024 2 commits
-
-
Ingo Molnar authored
Standardize scheduler load-balancing function names on the sched_balance_() prefix. Signed-off-by:
Ingo Molnar <mingo@kernel.org> Reviewed-by:
Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://lore.kernel.org/r/20240308111819.1101550-4-mingo@kernel.org
-
Ingo Molnar authored
- Standardize on prefixing scheduler-internal functions defined in <linux/sched.h> with sched_*() prefix. scheduler_tick() was the only function using the scheduler_ prefix. Harmonize it. - The other reason to rename it is the NOHZ scheduler tick handling functions are already named sched_tick_*(). Make the 'git grep sched_tick' more meaningful. Signed-off-by:
Ingo Molnar <mingo@kernel.org> Acked-by:
Valentin Schneider <vschneid@redhat.com> Reviewed-by:
Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://lore.kernel.org/r/20240308111819.1101550-3-mingo@kernel.org
-
- 24 Feb, 2024 1 commit
-
-
Qais Yousef authored
The new helper function is needed to help blk-mq check if it needs to dispatch the softirq on another CPU to match the performance level the IO requester is running at. This is important on HMP systems where not all CPUs have the same compute capacity. Signed-off-by:
Qais Yousef <qyousef@layalina.io> Reviewed-by:
Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240223155749.2958009-2-qyousef@layalina.ioSigned-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 16 Feb, 2024 1 commit
-
-
Shrikanth Hegde authored
There are a few cases of nested #ifdefs in the scheduler code that can be simplified:

    #ifdef DEFINE_A
    ...code block...
    #ifdef DEFINE_A        <-- This is a duplicate.
    ...code block...
    #endif
    #else
    #ifndef DEFINE_A       <-- This is also duplicate.
    ...code block...
    #endif
    #endif

More details about the script and methods used to find these code patterns can be found at:

  https://lore.kernel.org/all/20240118080326.13137-1-sshegde@linux.ibm.com/

No change in functionality intended.

[ mingo: Clarified the changelog. ]

Signed-off-by:
Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Reviewed-by:
Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20240216061433.535522-1-sshegde@linux.ibm.com
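For reference, the simplified, equivalent form of the pattern quoted above looks like the sketch below: the inner #ifdef/#ifndef on the same symbol are always true in their branches and can be dropped without changing which blocks get compiled.

  #ifdef DEFINE_A
  /* ...code block... (outer) */
  /* ...code block... (was inside the redundant inner #ifdef) */
  #else
  /* ...code block... (was inside the redundant inner #ifndef) */
  #endif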
-
- 15 Feb, 2024 3 commits
-
-
Andrea Parri authored
RISC-V uses xRET instructions on return from interrupt and to go back to user-space; the xRET instruction is not core serializing. Use FENCE.I for providing core serialization as follows:

- by calling sync_core_before_usermode() on return from interrupt (cf. ipi_sync_core()),

- via switch_mm() and sync_core_before_usermode() (respectively, for uthread->uthread and kthread->uthread transitions) before returning to user-space.

On RISC-V, the serialization in switch_mm() is activated by resetting the icache_stale_mask of the mm at prepare_sync_core_cmd().

Suggested-by:
Palmer Dabbelt <palmer@dabbelt.com> Signed-off-by:
Andrea Parri <parri.andrea@gmail.com> Reviewed-by:
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/r/20240131144936.29190-5-parri.andrea@gmail.comSigned-off-by:
Palmer Dabbelt <palmer@rivosinc.com>
-
Andrea Parri authored
To gather the architecture requirements of the "private/global expedited" membarrier commands. The file will be expanded to integrate further information about the membarrier syscall (as needed/desired in the future). While at it, amend some related inline comments in the membarrier codebase. Suggested-by:
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by:
Andrea Parri <parri.andrea@gmail.com> Reviewed-by:
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/r/20240131144936.29190-3-parri.andrea@gmail.comSigned-off-by:
Palmer Dabbelt <palmer@rivosinc.com>
-
Andrea Parri authored
The membarrier system call requires a full memory barrier after storing to rq->curr, before going back to user-space. The barrier is only needed when switching between processes: the barrier is implied by mmdrop() when switching from kernel to userspace, and it's not needed when switching from userspace to kernel. Rely on the feature/mechanism ARCH_HAS_MEMBARRIER_CALLBACKS and on the primitive membarrier_arch_switch_mm(), already adopted by the PowerPC architecture, to insert the required barrier. Fixes: fab957c1 ("RISC-V: Atomic and Locking Code") Signed-off-by:
Andrea Parri <parri.andrea@gmail.com> Reviewed-by:
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/r/20240131144936.29190-2-parri.andrea@gmail.comSigned-off-by:
Palmer Dabbelt <palmer@rivosinc.com>
-
- 05 Feb, 2024 1 commit
-
-
Jens Axboe authored
Mark the task as having a cached timestamp when we assign it, so we can efficiently check if it needs updating post being scheduled back in. This covers both the actual schedule out case, which would've flushed the plug, and the preemption case which doesn't touch the plugged requests (for many reasons, one of them being that we'd then need to have preemption disabled around plug state manipulation).

Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 27 Dec, 2023 1 commit
-
-
Kent Overstreet authored
We're trying to get sched.h down to more or less just types only, not code - rseq can live in its own header. This helps us kill the dependency on preempt.h in sched.h. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
- 23 Nov, 2023 1 commit
-
-
Vincent Guittot authored
The current method to take into account uclamp hints when estimating the target frequency can end up in a situation where the selected target frequency is finally higher than the uclamp hints, whereas there is no real need. Such cases mainly happen because we are currently mixing the traditional scheduler utilization signal with the uclamp performance hints. By adding these 2 metrics, we lose important information when it comes to selecting the target frequency, and we have to make some assumptions which can't fit all cases.

Rework the interface between the scheduler and the schedutil governor in order to propagate all information down to the cpufreq governor.

The effective_cpu_util() interface changes and now returns the actual utilization of the CPU with 2 optional inputs:

- The minimum performance for this CPU; typically the capacity to handle the deadline task and the interrupt pressure, but also the uclamp_min request when available.

- The maximum targeted performance for this CPU which reflects the maximum level that we would like to not exceed. By default it will be the CPU capacity but can be reduced because of some performance hints set with uclamp. The value can be lower than the actual utilization and/or min performance level.

A new sugov_effective_cpu_perf() interface is also available to compute the final performance level that is targeted for the CPU, after applying some cpufreq headroom and taking into account all inputs.

With these 2 functions, schedutil is now able to decide when it must go above uclamp hints. It now also has a generic way to get the min performance level.

The dependency between the energy model and the cpufreq governor and its headroom policy doesn't exist anymore. eenv_pd_max_util() asks schedutil for the targeted performance after applying the impact of the waking task.

[ mingo: Refined the changelog & C comments. ]

Signed-off-by:
Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Acked-by:
Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20231122133904.446032-2-vincent.guittot@linaro.org
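A hedged sketch of how a cpufreq governor would consume the reworked interface; the pointer out-params for min/max and the separate sugov_effective_cpu_perf() step follow the description above, but treat the exact prototypes as assumptions rather than the literal upstream signatures.

  /* Sketch of the two-step flow: gather util + bounds, then apply headroom. */
  static unsigned long sugov_target_perf_sketch(int cpu)
  {
          unsigned long util, min, max;

          /* Actual CPU utilization plus the min/max performance bounds. */
          util = effective_cpu_util(cpu, cpu_util_cfs(cpu), &min, &max);

          /* Apply cpufreq headroom and honour the bounds for the final target. */
          return sugov_effective_cpu_perf(cpu, util, min, max);
  }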
-
- 15 Nov, 2023 4 commits
-
-
Frederic Weisbecker authored
Trying to avoid that didn't bring much value after testing; add a comment about this.

Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by:
Rafael J. Wysocki <rafael@kernel.org> Link: https://lkml.kernel.org/r/20231114193840.4041-3-frederic@kernel.org
-
Peter Zijlstra authored
Low priority tasks (e.g., SCHED_OTHER) can suffer starvation if tasks with higher priority (e.g., SCHED_FIFO) monopolize CPU(s). RT Throttling has been introduced a while ago as a (mostly debug) countermeasure one can utilize to reserve some CPU time for low priority tasks (usually background type of work, e.g. workqueues, timers, etc.). It however has its own problems (see documentation) and the undesired effect of unconditionally throttling FIFO tasks even when no lower priority activity needs to run (there are mechanisms to fix this issue as well, but, again, with their own problems). Introduce deadline servers to service low priority tasks needs under starvation conditions. Deadline servers are built extending SCHED_DEADLINE implementation to allow 2-level scheduling (a sched_deadline entity becomes a container for lower priority scheduling entities). Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/4968601859d920335cf85822eb573a5f179f04b8.1699095159.git.bristot@kernel.org
-
Peter Zijlstra authored
Create a single function that initializes a sched_dl_entity. Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by:
Phil Auld <pauld@redhat.com> Reviewed-by:
Valentin Schneider <vschneid@redhat.com> Link: https://lkml.kernel.org/r/51acc695eecf0a1a2f78f9a044e11ffd9b316bcf.1699095159.git.bristot@kernel.org
-
Paul E. McKenney authored
Since RCU-tasks uses READ_ONCE(p->on_rq), ensure the write-side matches with WRITE_ONCE(). Signed-off-by:
"Paul E. McKenney" <paulmck@kernel.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/e4896e0b-eacc-45a2-a7a8-de2280a51ecc@paulmck-laptop
-
- 24 Oct, 2023 3 commits
-
-
Peter Zijlstra authored
SIS_UTIL seems to work well, let's remove the old thing.

Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by:
Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20231020134337.GD33965@noisy.programming.kicks-ass.net
-
Barry Song authored
Add a cpus_share_resources() API. This is the preparation for the optimization of select_idle_cpu() on platforms with a cluster scheduler level.

On a machine with clusters, cpus_share_resources() will test whether two cpus are within the same cluster. On a non-cluster machine it will behave the same as cpus_share_cache(). So we use "resources" here for cache resources.

Signed-off-by:
Barry Song <song.bao.hua@hisilicon.com> Signed-off-by:
Yicong Yang <yangyicong@hisilicon.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by:
Gautham R. Shenoy <gautham.shenoy@amd.com> Reviewed-by:
Tim Chen <tim.c.chen@linux.intel.com> Reviewed-by:
Vincent Guittot <vincent.guittot@linaro.org> Tested-and-reviewed-by:
Chen Yu <yu.c.chen@intel.com> Tested-by:
K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lkml.kernel.org/r/20231019033323.54147-2-yangyicong@huawei.com
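A usage sketch for the new API; the select_idle_cpu() optimization built on top of it lands separately, and the wrapper below is purely illustrative, with semantics as described in the commit above.

  /* Sketch: prefer a candidate that shares cluster-level resources. */
  static bool prefer_sibling_cpu(int this_cpu, int candidate_cpu)
  {
          /*
           * True when both CPUs sit in the same cluster; on machines without
           * clusters this degenerates to the cpus_share_cache() behaviour.
           */
          return cpus_share_resources(this_cpu, candidate_cpu);
  }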
-
Hao Jia authored
Igor Raits and Bagas Sanjaya report a RQCF_ACT_SKIP leak warning. This warning may be triggered in the following situations:

    CPU0                                      CPU1

    __schedule()
      *rq->clock_update_flags <<= 1;*         unregister_fair_sched_group()
      pick_next_task_fair+0x4a/0x410            destroy_cfs_bandwidth()
        newidle_balance+0x115/0x3e0               for_each_possible_cpu(i) *i=0*
          rq_unpin_lock(this_rq, rf)                __cfsb_csd_unthrottle()
          raw_spin_rq_unlock(this_rq)
                                                    rq_lock(*CPU0_rq*, &rf)
                                                    rq_clock_start_loop_update()
                                                    rq->clock_update_flags & RQCF_ACT_SKIP <--
          raw_spin_rq_lock(this_rq)

The purpose of RQCF_ACT_SKIP is to skip the rq clock update, but the update happens very early in __schedule() while we clear RQCF_*_SKIP very late, causing it to span the gap above and trigger this warning.

In __schedule() we can clear the RQCF_*_SKIP flag immediately after update_rq_clock() to avoid this RQCF_ACT_SKIP leak warning. And set rq->clock_update_flags to RQCF_UPDATED to avoid the rq->clock_update_flags < RQCF_ACT_SKIP warning that may be triggered later.

Fixes: ebb83d84 ("sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()")
Closes: https://lore.kernel.org/all/20230913082424.73252-1-jiahao.os@bytedance.com
Reported-by:
Igor Raits <igor.raits@gmail.com> Reported-by:
Bagas Sanjaya <bagasdotme@gmail.com> Suggested-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Hao Jia <jiahao.os@bytedance.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/a5dd536d-041a-2ce9-f4b7-64d8d85c86dc@gmail.com
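A simplified sketch of the fix's shape inside __schedule() (not the literal upstream diff): the skip bits are cleared right after the clock update, so no window remains in which another CPU can observe RQCF_ACT_SKIP set.

  /* Excerpt-style sketch of the relevant __schedule() sequence. */
  rq_lock(rq, &rf);
  smp_mb__after_spinlock();

  rq->clock_update_flags <<= 1;           /* promote REQ_SKIP to ACT_SKIP */
  update_rq_clock(rq);
  rq->clock_update_flags = RQCF_UPDATED;  /* clear *_SKIP immediately, per this fix */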
-
- 18 Oct, 2023 1 commit
-
-
Alexey Dobriyan authored
__read_mostly predates __ro_after_init. Many variables which are marked __read_mostly should have been __ro_after_init from day 1. Also, mark some stuff as "const" and "__init" while I'm at it. [akpm@linux-foundation.org: revert sysctl_nr_open_min, sysctl_nr_open_max changes due to arm warning] [akpm@linux-foundation.org: coding-style cleanups] Link: https://lkml.kernel.org/r/4f6bb9c0-abba-4ee4-a7aa-89265e886817@p183Signed-off-by:
Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- 13 Oct, 2023 1 commit
-
-
Peter Zijlstra authored
Kuyo reported sporadic failures on a sched_setaffinity() vs CPU hotplug stress-test -- notably affine_move_task() remains stuck in wait_for_completion(), leading to a hung-task detector warning.

Specifically, it was reported that stop_one_cpu_nowait(.fn = migration_cpu_stop) returns false -- this stopper is responsible for the matching complete().

The race scenario is:

    CPU0                                      CPU1

                                              // doing _cpu_down()

    __set_cpus_allowed_ptr()
      task_rq_lock();
                                              takedown_cpu()
                                                stop_machine_cpuslocked(take_cpu_down..)

                                              <PREEMPT: cpu_stopper_thread()
                                                MULTI_STOP_PREPARE
                                                ...
      __set_cpus_allowed_ptr_locked()
        affine_move_task()
          task_rq_unlock();

                        <PREEMPT: cpu_stopper_thread()\>
                          ack_state()
                                                MULTI_STOP_RUN
                                                  take_cpu_down()
                                                    __cpu_disable();
                                                    stop_machine_park();
                                                      stopper->enabled = false;
                                               />
       />
      stop_one_cpu_nowait(.fn = migration_cpu_stop);
        if (stopper->enabled)                 // false!!!

That is, by doing stop_one_cpu_nowait() after dropping rq-lock, the stopper thread gets a chance to preempt and allows the cpu-down for the target CPU to complete.

OTOH, since stop_one_cpu_nowait() / cpu_stop_queue_work() needs to issue a wakeup, it must not be run under the scheduler locks.

Solve this apparent contradiction by keeping preemption disabled over the unlock + queue_stopper combination:

    preempt_disable();
    task_rq_unlock(...);
    if (!stop_pending)
        stop_one_cpu_nowait(...)
    preempt_enable();

This respects the lock ordering constraints while still avoiding the above race. That is, if we find the CPU is online under rq-lock, the targeted stop_one_cpu_nowait() must succeed.

Apply this pattern to all similar stop_one_cpu_nowait() invocations.

Fixes: 6d337eab ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Reported-by:
"Kuyo Chang (張建文)" <Kuyo.Chang@mediatek.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by:
"Kuyo Chang (張建文)" <Kuyo.Chang@mediatek.com> Link: https://lkml.kernel.org/r/20231010200442.GA16515@noisy.programming.kicks-ass.net
-
- 09 Oct, 2023 1 commit
-
-
Vincent Guittot authored
Remove the rq::cpu_capacity_orig field and use arch_scale_cpu_capacity() instead.

The scheduler uses 3 methods to get access to a CPU's max compute capacity:

- arch_scale_cpu_capacity(cpu) which is the default way to get a CPU's capacity.

- the cpu_capacity_orig field which is periodically updated with arch_scale_cpu_capacity().

- capacity_orig_of(cpu) which encapsulates rq->cpu_capacity_orig.

There is no real need to save the value returned by arch_scale_cpu_capacity() in struct rq. arch_scale_cpu_capacity() returns:

- either a per_cpu variable.

- or a const value for systems which have only one capacity.

Remove rq::cpu_capacity_orig and use arch_scale_cpu_capacity() everywhere.

No functional changes.

Some performance tests on Arm64:

- small SMP device (hikey): no noticeable changes
- HMP device (RB5): hackbench shows minor improvement (1-2%)
- large SMP (thx2): hackbench and tbench show minor improvement (1%)

Signed-off-by:
Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Reviewed-by:
Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lore.kernel.org/r/20231009103621.374412-2-vincent.guittot@linaro.org
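The mechanical part of the conversion, shown on a made-up helper: any place that read the cached rq field now asks the architecture directly.

  /* Illustrative wrapper; the real patch edits each call site in place. */
  static inline unsigned long cpu_max_capacity(int cpu)
  {
          /* was: cpu_rq(cpu)->cpu_capacity_orig (via capacity_orig_of(cpu)) */
          return arch_scale_cpu_capacity(cpu);
  }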
-
- 07 Oct, 2023 2 commits
-
-
Yajun Deng authored
Multiple blocked tasks are printed when the system hangs. They may have the same parent pid, but belong to different task groups. Printing tgid lets users better know whether these tasks are from the same task group or not. Signed-off-by:
Yajun Deng <yajun.deng@linux.dev> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230720080516.1515297-1-yajun.deng@linux.dev
-
Ingo Molnar authored
The following commit: 9b3c4ab3 ("sched,rcu: Rework try_invoke_on_locked_down_task()") ... renamed try_invoke_on_locked_down_task() to task_call_func(), but forgot to update the comment in try_to_wake_up(). But it turns out that the smp_rmb() doesn't live in task_call_func() either, it was moved to __task_needs_rq_lock() in: 91dabf33 ("sched: Fix race in task_call_func()") Fix that now. Also fix the s/smb/smp typo while at it. Reported-by:
Zhang Qiao <zhangqiao22@huawei.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230731085759.11443-1-zhangqiao22@huawei.com
-
- 03 Oct, 2023 1 commit
-
-
Yu Liao authored
<linux/psi.h> and "autogroup.h" are included twice, remove the duplicate header inclusion. Signed-off-by:
Yu Liao <liaoyu15@huawei.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230802021501.2511569-1-liaoyu15@huawei.com
-
- 29 Sep, 2023 1 commit
-
-
Qais Yousef authored
It was useful to track feec() placement decision and debug the spare capacity and optimization issues vs uclamp_max. Signed-off-by:
Qais Yousef (Google) <qyousef@layalina.io> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Reviewed-by:
Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230916232955.2099394-4-qyousef@layalina.io
-
- 24 Sep, 2023 1 commit
-
-
Wang Jinchao authored
Simplify the conditional logic for checking worker flags by splitting the original compound `if` statement into separate `if` and `else if` clauses. This modification not only retains the previous functionality, but also saves one `if` check, improving code clarity and potentially enhancing performance.

Signed-off-by:
Wang Jinchao <wangjinchao@xfusion.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/ZOIMvURE99ZRAYEj@fedora
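A hedged before/after sketch of the flag check described above; the flag and callee names follow the PF_WQ_WORKER/PF_IO_WORKER convention used elsewhere in the scheduler and are assumptions here, not quoted from the patch.

  /* Before: compound mask test plus a nested if/else. */
  if (task_flags & (PF_WQ_WORKER | PF_IO_WORKER)) {
          if (task_flags & PF_WQ_WORKER)
                  wq_worker_sleeping(tsk);
          else
                  io_wq_worker_sleeping(tsk);
  }

  /* After: one test per flag, same behaviour, one fewer check overall. */
  if (task_flags & PF_WQ_WORKER)
          wq_worker_sleeping(tsk);
  else if (task_flags & PF_IO_WORKER)
          io_wq_worker_sleeping(tsk);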
-