    sched_ext: Implement scx_bpf_kick_cpu() and task preemption support · 81aae789
    Tejun Heo authored
    It's often useful to wake up and/or trigger a reschedule on other CPUs. This
    patch adds the scx_bpf_kick_cpu() kfunc helper, which a BPF scheduler can
    call to kick the target CPU into the scheduling path.
    
    As a sched_ext task relinquishes its CPU only after its slice is depleted,
    this patch also adds SCX_KICK_PREEMPT and SCX_ENQ_PREEMPT, which clear the
    slice of the target CPU's current task to guarantee that sched_ext's
    scheduling path runs on the CPU.
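
    The effect of clearing the slice can be illustrated with a small userspace
    model. This is only a sketch of the behavior described above, not kernel
    code: MODEL_KICK_PREEMPT, struct model_task, struct model_cpu and both
    functions are hypothetical names invented for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a preempting kick flag; not the kernel's value. */
#define MODEL_KICK_PREEMPT (1ULL << 0)

struct model_task {
	uint64_t slice_ns;		/* remaining time slice */
};

struct model_cpu {
	struct model_task *curr;	/* running task, NULL if none */
	bool resched_pending;		/* CPU was asked to enter the scheduling path */
};

/*
 * Model of a kick: always request a reschedule. With the preempt flag,
 * also zero the current task's slice so that it cannot keep the CPU.
 */
static void model_kick_cpu(struct model_cpu *cpu, uint64_t flags)
{
	if ((flags & MODEL_KICK_PREEMPT) && cpu->curr)
		cpu->curr->slice_ns = 0;
	cpu->resched_pending = true;
}

/*
 * Model of the rule stated above: the current task gives up the CPU
 * only once its slice is depleted.
 */
static bool model_task_yields_cpu(const struct model_cpu *cpu)
{
	return cpu->resched_pending &&
	       (!cpu->curr || cpu->curr->slice_ns == 0);
}
```

    In this model, a plain kick on a CPU whose current task still has slice
    left sets resched_pending but the task keeps running, whereas a kick with
    MODEL_KICK_PREEMPT makes model_task_yields_cpu() return true.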
    
    If SCX_KICK_IDLE is specified, the target CPU is kicked iff the CPU is idle
    to guarantee that the target CPU will go through at least one full sched_ext
    scheduling cycle after the kick. This can be used to wake up idle CPUs
    without incurring unnecessary overhead when the target CPU isn't currently
    idle.
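
    The "kick iff idle" rule can be sketched as a userspace model. All names
    here (SKETCH_KICK_IDLE, struct sketch_cpu, sketch_kick_cpu()) are
    hypothetical and exist only for this illustration; the flag value is not
    taken from the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an idle-only kick flag; not the kernel's value. */
#define SKETCH_KICK_IDLE (1ULL << 0)

struct sketch_cpu {
	bool is_idle;			/* whether the CPU is currently idle */
	unsigned int kicks_received;	/* number of kicks actually delivered */
};

/*
 * Model of a kick request. With the idle-only flag, a busy CPU is left
 * alone, so no kick overhead is incurred. Returns true if a kick was
 * delivered.
 */
static bool sketch_kick_cpu(struct sketch_cpu *cpu, uint64_t flags)
{
	if ((flags & SKETCH_KICK_IDLE) && !cpu->is_idle)
		return false;

	cpu->kicks_received++;
	return true;
}
```

    In this model, a request with SKETCH_KICK_IDLE against a busy CPU is a
    no-op, while the same request against an idle CPU, or a request without
    the flag, is delivered.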
    
    As a demonstration of how backward compatibility can be supported using BPF
    CO-RE, tools/sched_ext/include/scx/compat.bpf.h is added. It provides
    __COMPAT_scx_bpf_kick_cpu_IDLE(), which uses SCX_KICK_IDLE if available and
    falls back to a regular kick otherwise. This allows schedulers to use the
    new SCX_KICK_IDLE while maintaining support for older kernels. The plan is
    to temporarily use compat helpers to ease API updates and drop them after a
    few kernel releases.
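
    The fallback decision can be sketched in plain C. This is a userspace
    model under stated assumptions, not the contents of compat.bpf.h: in the
    real helper the availability check would be resolved through BPF CO-RE,
    whereas here it is a plain boolean, and struct demo_kernel,
    demo_kick_cpu(), demo_compat_kick_cpu_idle() and DEMO_KICK_IDLE are
    hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the idle-only kick flag; not the kernel's value. */
#define DEMO_KICK_IDLE (1ULL << 0)

/* Model of the running kernel as seen by the scheduler. */
struct demo_kernel {
	bool has_kick_idle;		/* whether the idle-only flag is known */
	int last_kick_cpu;		/* CPU targeted by the most recent kick */
	uint64_t last_kick_flags;	/* flags passed with the most recent kick */
};

/* Model of the kick helper: just record what was requested. */
static void demo_kick_cpu(struct demo_kernel *k, int cpu, uint64_t flags)
{
	k->last_kick_cpu = cpu;
	k->last_kick_flags = flags;
}

/*
 * Model of the compat wrapper: use the idle-only flag when the kernel
 * knows it, otherwise fall back to a regular kick with no flags.
 */
static void demo_compat_kick_cpu_idle(struct demo_kernel *k, int cpu)
{
	if (k->has_kick_idle)
		demo_kick_cpu(k, cpu, DEMO_KICK_IDLE);
	else
		demo_kick_cpu(k, cpu, 0);
}
```

    The caller always uses the wrapper, so the scheduler source stays the same
    on old and new kernels, and only the wrapper needs to go away once the
    older kernels are no longer supported.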
    
    v5: - SCX_KICK_IDLE added. Note that this also adds a compat mechanism for
          schedulers so that they can support kernels without SCX_KICK_IDLE.
          This is useful as a demonstration of how new feature flags can be
          added in a backward compatible way.
    
        - kick_cpus_irq_workfn() reimplemented so that it touches the pending
          cpumasks only as necessary to reduce kicking overhead on machines with
          a lot of CPUs.
    
        - tools/sched_ext/include/scx/compat.bpf.h added.
    
    v4: - Move example scheduler to its own patch.
    
    v3: - Make scx_example_central switch all tasks by default.
    
        - Convert to BPF inline iterators.
    
    v2: - Julia Lawall reported that scx_example_central can overflow the
          dispatch buffer and malfunction. As scheduling for other CPUs can't be
          handled by the automatic retry mechanism, fix by implementing
          explicit overflow and retry handling.
    
        - Updated to use generic BPF cpumask helpers.
    
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Reviewed-by: David Vernet <dvernet@meta.com>
    Acked-by: Josh Don <joshdon@google.com>
    Acked-by: Hao Luo <haoluo@google.com>
    Acked-by: Barret Rhoden <brho@google.com>