1. 31 Aug, 2024 2 commits
    • sched_ext: Use sched_clock_cpu() instead of rq_clock_task() in touch_core_sched() · 62607d03
      Tejun Heo authored
      Since 3cf78c5d ("sched_ext: Unpin and repin rq lock from
      balance_scx()"), sched_ext's balance path terminates rq_pin in the outermost
      function. This is simpler and in line with what other balance functions
      are doing, but it loses control over rq->clock_update_flags, which makes
      assert_clock_updated() trigger if other CPUs pin the rq lock.
      
      The only place this matters is touch_core_sched() which uses the timestamp
      to order tasks from sibling rq's. Switch to sched_clock_cpu(). Later, it may
      be better to use per-core dispatch sequence number.
      
      v2: Use sched_clock_cpu() instead of ktime_get_ns() per David.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: 3cf78c5d ("sched_ext: Unpin and repin rq lock from balance_scx()")
      Acked-by: David Vernet <void@manifault.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
    • sched_ext: Use task_can_run_on_remote_rq() test in dispatch_to_local_dsq() · 0366017e
      Tejun Heo authored
      When deciding whether a task can be migrated to a CPU,
      dispatch_to_local_dsq() was open-coding p->cpus_allowed and scx_rq_online()
      tests instead of using task_can_run_on_remote_rq(). This had two problems.
      
      - It was missing the is_migration_disabled() check and thus could try to
        migrate a task which shouldn't be migrated, leading to assertion and
        scheduling failures.
      
      - It was testing p->cpus_ptr directly instead of using task_allowed_on_cpu()
        and thus failed to consider ISA compatibility.
      
      Update dispatch_to_local_dsq() to use task_can_run_on_remote_rq():
      
      - Move scx_ops_error() triggering into task_can_run_on_remote_rq().
      
      - When migration isn't allowed, fall back to the global DSQ instead of the
        source DSQ by returning DTL_INVALID. This is both simpler and an overall
        better behavior.
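      The shape of the change can be sketched in userspace as follows. The
      function names and DTL_INVALID come from the commit; the task fields,
      bitmask plumbing, and return-value names are invented for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

enum dtl_ret { DTL_DISPATCH, DTL_INVALID };

/* Invented task model: migration-disabled flag plus an affinity bitmask. */
struct task {
	bool migration_disabled;
	unsigned long cpus_mask;
};

/* Per-CPU "SCX rq online" state for the model. */
static bool scx_rq_online_on[2] = { true, false };

/* The combined test from the commit: migration allowed, CPU in the
 * task's affinity mask, and the target rq online. */
static bool task_can_run_on_remote_rq(const struct task *p, int cpu)
{
	return !p->migration_disabled &&
	       (p->cpus_mask & (1ul << cpu)) &&
	       scx_rq_online_on[cpu];
}

static enum dtl_ret dispatch_to_local_dsq(const struct task *p, int cpu)
{
	if (!task_can_run_on_remote_rq(p, cpu))
		return DTL_INVALID;	/* caller falls back to the global DSQ */
	return DTL_DISPATCH;
}
```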
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Acked-by: David Vernet <void@manifault.com>
  2. 28 Aug, 2024 1 commit
  3. 27 Aug, 2024 1 commit
    • scx_central: Fix smatch checker warning · 59cfdf3f
      Tejun Heo authored
      ARRAY_ELEM_PTR() is an access macro used to keep the BPF verifier from
      getting confused by offset memory accesses, by yielding a valid pointer or
      NULL in a way that's clear to the verifier. As such, the canonical usage
      involves checking for a NULL return from the macro. Note that in many
      cases, the NULL condition can never happen - the checks are there just to
      hint the verifier.
      
      In a bpf_loop in scx_central.bpf.c::central_dispatch(), the NULL check was
      incomplete in that there was another dereference of the pointer in
      addition to the NULL-checked access. This worked because the pointer can
      never actually be NULL and the verifier could tell it would never be NULL
      in this case.
      
      However, this still looks wrong and trips smatch:
      
        ./tools/sched_ext/scx_central.bpf.c:205 ____central_dispatch()
        error: we previously assumed 'gimme' could be null (see line 201)
      
        ./tools/sched_ext/scx_central.bpf.c
            195
            196                         if (!scx_bpf_dispatch_nr_slots())
            197                                 break;
            198
            199                         /* central's gimme is never set */
            200                         gimme = ARRAY_ELEM_PTR(cpu_gimme_task, cpu, nr_cpu_ids);
            201                         if (gimme && !*gimme)
      				      ^^^^^
        If gimme is NULL
      
            202                                 continue;
            203
            204                         if (dispatch_to_cpu(cpu))
        --> 205                                 *gimme = false;
      
      Fix the NULL check so that there are no derefs if NULL. This doesn't change
      actual behavior.
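      The fixed pattern can be modeled in plain C as below. elem_ptr() stands in
      for ARRAY_ELEM_PTR(), returning NULL for out-of-range indices the way the
      macro does for the verifier; the names and array size are invented for the
      sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool cpu_gimme_task[4];

/* Stand-in for ARRAY_ELEM_PTR(): NULL when the index is out of range. */
static bool *elem_ptr(bool *arr, size_t i, size_t n)
{
	return i < n ? &arr[i] : NULL;
}

/* Canonical usage: bail out once when the pointer is NULL so that every
 * later dereference is covered by the same check. */
static bool try_claim(size_t cpu)
{
	bool *gimme = elem_ptr(cpu_gimme_task, cpu, 4);

	if (!gimme || !*gimme)	/* no dereference when NULL */
		return false;
	*gimme = false;		/* safe: gimme checked above */
	return true;
}
```

      Inverting the condition to `!gimme || !*gimme` means the single `continue`
      covers both the NULL and the already-claimed cases, and no dereference is
      reachable on the NULL path.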
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
      Link: http://lkml.kernel.org/r/<955e1c3c-ace2-4a1d-b246-15b8196038a3@stanley.mountain>
  4. 20 Aug, 2024 2 commits
    • sched_ext: Allow dequeue_task_scx to fail · 9ad2861b
      Yipeng Zou authored
      Since dequeue_task() is now allowed to fail, there is a compile error:
      
      kernel/sched/ext.c:3630:19: error: initialization of ‘bool (*)(struct rq*, struct task_struct *, int)’ {aka ‘_Bool (*)(struct rq *, struct task_struct *, int)’} from incompatible pointer type ‘void (*)(struct rq*, struct task_struct *, int)’
        3630 |  .dequeue_task  = dequeue_task_scx,
             |                   ^~~~~~~~~~~~~~~~
      
      Allow dequeue_task_scx to fail too.
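      The shape of the fix can be sketched with the kernel types reduced to
      opaque forward declarations: sched_class::dequeue_task() now returns bool,
      so the sched_ext implementation must match or the struct initializer fails
      with exactly the incompatible-pointer-type error quoted above. The body is
      elided; returning true models a dequeue that always succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct rq;
struct task_struct;

/* After 863ccdbb, the method returns bool instead of void. */
struct sched_class {
	bool (*dequeue_task)(struct rq *rq, struct task_struct *p, int flags);
};

static bool dequeue_task_scx(struct rq *rq, struct task_struct *p, int flags)
{
	(void)rq; (void)p; (void)flags;
	/* real body elided; this sketch models a dequeue that succeeds */
	return true;
}

/* With matching signatures, the designated initializer compiles cleanly. */
static const struct sched_class ext_sched_class = {
	.dequeue_task = dequeue_task_scx,
};
```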
      
      Fixes: 863ccdbb ("sched: Allow sched_class::dequeue_task() to fail")
      Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • Merge branch 'tip/sched/core' into for-6.12 · 5ac99857
      Tejun Heo authored
      To receive 863ccdbb ("sched: Allow sched_class::dequeue_task() to fail")
      which makes sched_class.dequeue_task() return bool instead of void. This
      leads to compile breakage and will be fixed by a follow-up patch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
  5. 17 Aug, 2024 23 commits
  6. 13 Aug, 2024 2 commits
    • sched_ext: Don't use double locking to migrate tasks across CPUs · 89909296
      Tejun Heo authored
      consume_remote_task() and dispatch_to_local_dsq() use
      move_task_to_local_dsq() to migrate the task to the target CPU. Currently,
      move_task_to_local_dsq() expects the caller to lock both the source and
      destination rq's. While this may save a few lock operations while the rq's
      are not contended, under contention, the double locking can exacerbate the
      situation significantly (refer to the linked message below).
      
      Update the migration path so that double locking is not used.
      move_task_to_local_dsq() now expects the caller to be locking the source rq,
      drops it and then acquires the destination rq lock. Code is simpler this way
      and, on a 2-way NUMA machine w/ Xeon Gold 6138, `hackbench 100 thread 5000`
      shows ~3% improvement with scx_simple.
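      The new locking rule can be sketched with plain pthread mutexes standing
      in for rq locks (the counters and names are invented for the sketch): the
      caller holds only the source lock, and the helper drops it before taking
      the destination lock, so at most one rq lock is held at any time and no
      src/dst ordering can deadlock.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t src_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t dst_lock = PTHREAD_MUTEX_INITIALIZER;
static int src_cnt = 1, dst_cnt;

/* Caller must hold src_lock; on return neither lock is held. */
static void move_task(void)
{
	src_cnt--;				/* dequeue from source */
	pthread_mutex_unlock(&src_lock);	/* never hold both locks */
	pthread_mutex_lock(&dst_lock);
	dst_cnt++;				/* enqueue on destination */
	pthread_mutex_unlock(&dst_lock);
}
```

      The trade-off named in the commit is visible here: one extra
      unlock/lock pair per migration in exchange for never contending on two
      rq locks at once.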
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20240806082716.GP37996@noisy.programming.kicks-ass.net
      Acked-by: David Vernet <void@manifault.com>
    • sched_ext: define missing cfi stubs for sched_ext · 33d031ec
      Manu Bretelle authored
      `__bpf_ops_sched_ext_ops` was missing the initialization of some struct
      attributes. With

        https://lore.kernel.org/all/20240722183049.2254692-4-martin.lau@linux.dev/

      every single attribute needs to be initialized, or programs (like
      scx_layered) will fail to load:
      
        05:26:48 [INFO] libbpf: struct_ops layered: member cgroup_init not found in kernel, skipping it as it's set to zero
        05:26:48 [INFO] libbpf: struct_ops layered: member cgroup_exit not found in kernel, skipping it as it's set to zero
        05:26:48 [INFO] libbpf: struct_ops layered: member cgroup_prep_move not found in kernel, skipping it as it's set to zero
        05:26:48 [INFO] libbpf: struct_ops layered: member cgroup_move not found in kernel, skipping it as it's set to zero
        05:26:48 [INFO] libbpf: struct_ops layered: member cgroup_cancel_move not found in kernel, skipping it as it's set to zero
        05:26:48 [INFO] libbpf: struct_ops layered: member cgroup_set_weight not found in kernel, skipping it as it's set to zero
        05:26:48 [WARN] libbpf: prog 'layered_dump': BPF program load failed: unknown error (-524)
        05:26:48 [WARN] libbpf: prog 'layered_dump': -- BEGIN PROG LOAD LOG --
        attach to unsupported member dump of struct sched_ext_ops
        processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
        -- END PROG LOAD LOG --
        05:26:48 [WARN] libbpf: prog 'layered_dump': failed to load: -524
        05:26:48 [WARN] libbpf: failed to load object 'bpf_bpf'
        05:26:48 [WARN] libbpf: failed to load BPF skeleton 'bpf_bpf': -524
        Error: Failed to load BPF program
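      The requirement can be modeled in miniature as follows. This is not the
      kernel's struct_ops machinery; the struct, member names, and can_load()
      check are invented to show why a NULL member (a missing stub) makes the
      ops table unloadable while an explicit no-op stub keeps it complete.

```c
#include <assert.h>
#include <stddef.h>

/* Invented two-member ops table standing in for sched_ext_ops. */
struct ops {
	void (*cgroup_init)(void);
	void (*dump)(void);
};

static void stub(void) {}

/* Loader check in the spirit of the failure above: any NULL member means
 * the corresponding callback is invisible and programs cannot attach. */
static int can_load(const struct ops *o)
{
	return o->cgroup_init != NULL && o->dump != NULL;
}

static const struct ops complete = { .cgroup_init = stub, .dump = stub };
static const struct ops partial  = { .cgroup_init = stub };	/* .dump is NULL */
```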
      Signed-off-by: Manu Bretelle <chantr4@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  7. 08 Aug, 2024 3 commits
    • sched_ext: Improve logging around enable/disable · 344576fa
      Tejun Heo authored
      sched_ext currently doesn't generate messages when the BPF scheduler is
      enabled and disabled unless there are errors. It is useful to have a paper
      trail. Improve logging around enable/disable:
      
      - Generate info messages on enable and non-error disable.
      
      - Update error exit message formatting so that it's consistent with the
        non-error message. Also, prefix ei->msg with the BPF scheduler's name to
        make it clear where the message is coming from.
      
      - Shorten scx_exit_reason() strings for SCX_EXIT_UNREG* for brevity and
        consistency.
      
      v2: Use pr_*() instead of KERN_* consistently. (David)
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Suggested-by: Phil Auld <pauld@redhat.com>
      Reviewed-by: Phil Auld <pauld@redhat.com>
      Acked-by: David Vernet <void@manifault.com>
    • sched_ext: Make scx_rq_online() also test cpu_active() in addition to SCX_RQ_ONLINE · 991ef53a
      Tejun Heo authored
      scx_rq_online() currently only tests SCX_RQ_ONLINE. This isn't fully correct
      - e.g. consume_dispatch_q() uses task_can_run_on_remote_rq() which tests
      scx_rq_online() to see whether the current rq can run the task, and, if so,
      calls consume_remote_task() to migrate the task to @rq. While the test
      itself was done while locking @rq, @rq can be temporarily unlocked by
      consume_remote_task() and nothing prevents SCX_RQ_ONLINE from going offline
      before the migration takes place.
      
      To address the issue, add cpu_active() test to scx_rq_online(). There is a
      synchronize_rcu() between cpu_active() being cleared and the rq going
      offline, so if an on-going scheduling operation sees cpu_active(), the
      associated rq is guaranteed to not go offline until the scheduling operation
      is complete.
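      The combined test can be sketched in userspace as below. SCX_RQ_ONLINE and
      the function names mirror the commit; the rq layout and the cpu_active
      bitmap are invented for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define SCX_RQ_ONLINE 0x1

/* Invented rq model: SCX state flags plus the owning CPU number. */
struct rq {
	unsigned int scx_flags;
	int cpu;
};

/* Per-CPU active state; cpu 1 models a CPU partway through hot-unplug. */
static bool cpu_active_bits[2] = { true, false };

static bool cpu_active(int cpu)
{
	return cpu_active_bits[cpu];
}

/* Both conditions must hold: the rq's own flag and the CPU's active bit. */
static bool scx_rq_online(const struct rq *rq)
{
	return (rq->scx_flags & SCX_RQ_ONLINE) && cpu_active(rq->cpu);
}
```

      The cpu_active() term is what closes the window described above: it goes
      false a full RCU grace period before the rq can go offline.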
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: 60c27fb5 ("sched_ext: Implement sched_ext_ops.cpu_online/offline()")
      Acked-by: David Vernet <void@manifault.com>
    • sched_ext: Fix unsafe list iteration in process_ddsp_deferred_locals() · 72763ea3
      Tejun Heo authored
      process_ddsp_deferred_locals() executes deferred direct dispatches to the
      local DSQs of remote CPUs. It iterates the tasks on
      rq->scx.ddsp_deferred_locals list, removing and calling
      dispatch_to_local_dsq() on each. However, the list is protected by the rq
      lock that can be dropped by dispatch_to_local_dsq() temporarily, so the list
      can be modified during the iteration, which can lead to oopses and other
      failures.
      
      Fix it by popping from the head of the list instead of iterating the list.
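      The pop-from-head pattern can be sketched with a plain singly linked list
      (all names invented; the kernel code uses list_head helpers): because each
      iteration re-reads the head after the lock would have been dropped,
      concurrent insertions or removals during the callback cannot invalidate a
      cached iterator.

```c
#include <assert.h>
#include <stddef.h>

struct task {
	struct task *next;
	int id;
};

/* Detach and return the current head, or NULL when the list is empty. */
static struct task *pop_head(struct task **head)
{
	struct task *p = *head;

	if (p)
		*head = p->next;
	return p;
}

/* Process every task by repeatedly popping the head; returns the count. */
static int drain(struct task **head)
{
	struct task *p;
	int n = 0;

	while ((p = pop_head(head))) {
		/* dispatch_to_local_dsq(p) would run here and may drop the
		 * lock; each loop re-reads the head, so that is safe. */
		n++;
	}
	return n;
}
```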
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: 5b26f7b9 ("sched_ext: Allow SCX_DSQ_LOCAL_ON for direct dispatches")
      Acked-by: David Vernet <void@manifault.com>
  8. 07 Aug, 2024 5 commits
  9. 06 Aug, 2024 1 commit
    • sched_ext: Make task_can_run_on_remote_rq() use common task_allowed_on_cpu() · 2c390dda
      Tejun Heo authored
      task_can_run_on_remote_rq() is similar to is_cpu_allowed() but there are
      subtle differences. It currently open codes all the tests. This is
      cumbersome to understand and error-prone in case the intersecting tests need
      to be updated.
      
      Factor out the common part - testing whether the task is allowed on the CPU
      at all regardless of the CPU state - into task_allowed_on_cpu() and make
      both is_cpu_allowed() and SCX's task_can_run_on_remote_rq() use it. As the
      common code is now shared between the two and each contains only the extra
      tests that differ between them, it's less error-prone when the conditions
      need to be updated. Also, improve the comment to explain why they are
      different.
      
      v2: Replace accidental "extern inline" with "static inline" (Peter).
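      The factoring can be modeled in userspace as follows. The three function
      names mirror the commit; the affinity bitmask and the per-CPU state arrays
      are invented for the sketch. The shared affinity test lives in one helper,
      and each caller adds only its own extra condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Invented task model: an affinity bitmask, one bit per CPU. */
struct task {
	unsigned long cpus_mask;
};

/* Per-CPU state for the model. */
static bool cpu_is_active[3]    = { true, false, true };
static bool rq_scx_online_on[3] = { true, true, false };

/* Common part: is the task allowed on the CPU at all, CPU state aside? */
static bool task_allowed_on_cpu(const struct task *p, int cpu)
{
	return p->cpus_mask & (1ul << cpu);
}

/* Core scheduler's extra condition: the CPU must be active. */
static bool is_cpu_allowed(const struct task *p, int cpu)
{
	return task_allowed_on_cpu(p, cpu) && cpu_is_active[cpu];
}

/* SCX's extra condition: the target rq must be online for SCX. */
static bool task_can_run_on_remote_rq(const struct task *p, int cpu)
{
	return task_allowed_on_cpu(p, cpu) && rq_scx_online_on[cpu];
}
```

      With this shape, a change to the shared affinity test lands in one place
      instead of two intersecting open-coded copies.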
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: David Vernet <void@manifault.com>