  1. 07 Oct, 2024 3 commits
    • sched_ext, scx_qmap: Add and use SCX_ENQ_CPU_SELECTED · 9b671793
      Tejun Heo authored
      scx_qmap and other schedulers in the SCX repo are using SCX_ENQ_WAKEUP to
      tell whether ops.select_cpu() was called. This is incorrect as
      ops.select_cpu() can be skipped in the wakeup path and leads to e.g.
      incorrectly skipping direct dispatch for tasks that are bound to a single
      CPU.
      
      sched core has been updated to specify ENQUEUE_RQ_SELECTED if
      ->select_task_rq() was called. Map it to SCX_ENQ_CPU_SELECTED and update
      scx_qmap to test it instead of SCX_ENQ_WAKEUP.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Vernet <void@manifault.com>
      Cc: Daniel Hodges <hodges.daniel.scott@gmail.com>
      Cc: Changwoo Min <multics69@gmail.com>
      Cc: Andrea Righi <andrea.righi@linux.dev>
      Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
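A minimal user-space sketch of the check described in commit 9b671793, not the scx_qmap source. Only the flag names come from the commit message; the bit values and the helper names (`select_cpu_ran_old()`, `select_cpu_ran_new()`, `should_try_direct_dispatch()`) are hypothetical and chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; the real definitions live in kernel headers. */
#define SCX_ENQ_WAKEUP        (1ULL << 0)
#define SCX_ENQ_CPU_SELECTED  (1ULL << 10)

/* Old heuristic: assume ops.select_cpu() ran whenever this is a wakeup. */
static bool select_cpu_ran_old(uint64_t enq_flags)
{
	return (enq_flags & SCX_ENQ_WAKEUP) != 0;
}

/* New check: rely on the flag that is set only if the callback was invoked. */
static bool select_cpu_ran_new(uint64_t enq_flags)
{
	return (enq_flags & SCX_ENQ_CPU_SELECTED) != 0;
}

/* The enqueue path should attempt direct dispatch itself when
 * ops.select_cpu() was skipped and therefore had no chance to do it. */
static bool should_try_direct_dispatch(uint64_t enq_flags)
{
	return !select_cpu_ran_new(enq_flags);
}
```

For a wakeup of a task bound to a single CPU, only `SCX_ENQ_WAKEUP` is set: the old test wrongly reports that ops.select_cpu() ran, while the new test asks the enqueue path to try direct dispatch.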
    • sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called · f207dc2d
      Tejun Heo authored
      During ttwu, ->select_task_rq() can be skipped if only one CPU is allowed or
      migration is disabled. sched_ext schedulers may perform operations such as
      direct dispatch from ->select_task_rq() path and it is useful for them to
      know whether ->select_task_rq() was skipped in the ->enqueue_task() path.
      
      Currently, sched_ext schedulers are using ENQUEUE_WAKEUP for this purpose
      and end up assuming incorrectly that ->select_task_rq() was called for tasks
      that are bound to a single CPU or migration disabled.
      
      Make select_task_rq() indicate whether ->select_task_rq() was called by
      setting WF_RQ_SELECTED in *wake_flags and make ttwu_do_activate() map that
      to ENQUEUE_RQ_SELECTED for ->enqueue_task().
      
      This will be used by sched_ext to fix ->select_task_rq() skip detection.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Vernet <void@manifault.com>
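The flag propagation described in commit f207dc2d can be sketched in user space as follows. This is a simplified model under stated assumptions, not the kernel code: the flag values, `struct task`, `class_select_task_rq()` and `ttwu_enqueue_flags()` are hypothetical stand-ins, and only the flag names and the skip conditions come from the commit message.

```c
#include <stdbool.h>

/* Illustrative values; the kernel's actual bit assignments may differ. */
#define WF_TTWU              0x08
#define WF_RQ_SELECTED       0x80
#define ENQUEUE_WAKEUP       0x01
#define ENQUEUE_RQ_SELECTED  0x400

/* Hypothetical, heavily reduced task descriptor. */
struct task {
	int nr_cpus_allowed;
	bool migration_disabled;
	int cpu;
};

/* Stand-in for the sched class ->select_task_rq() callback. */
static int class_select_task_rq(const struct task *p)
{
	return p->cpu;
}

/* Takes a pointer so it can report back whether the callback was invoked. */
static int select_task_rq(struct task *p, int *wake_flags)
{
	if (p->nr_cpus_allowed > 1 && !p->migration_disabled) {
		*wake_flags |= WF_RQ_SELECTED;
		return class_select_task_rq(p);
	}
	return p->cpu;	/* callback skipped, flag stays clear */
}

/* Models the mapping done on the activation side of the wakeup. */
static int ttwu_enqueue_flags(int wake_flags)
{
	int en_flags = ENQUEUE_WAKEUP;

	if (wake_flags & WF_RQ_SELECTED)
		en_flags |= ENQUEUE_RQ_SELECTED;
	return en_flags;
}
```

Passing `wake_flags` by pointer is what lets the selection step communicate the skip to the enqueue step without an extra return value.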
    • sched/core: Make select_task_rq() take the pointer to wake_flags instead of value · b62933ee
      Tejun Heo authored
      This will be used to allow select_task_rq() to indicate whether
      ->select_task_rq() was called by modifying *wake_flags.
      
      This makes try_to_wake_up() call all functions that take wake_flags with
      WF_TTWU set. Previously, only select_task_rq() was. Using the same flags is
      more consistent, and, as the flag is only tested by ->select_task_rq()
      implementations, it doesn't cause any behavior differences.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Vernet <void@manifault.com>
  2. 04 Oct, 2024 2 commits
    • sched_ext: scx_cgroup_exit() may be called without successful scx_cgroup_init() · ec010333
      Tejun Heo authored
      568894ed ("sched_ext: Add scx_cgroup_enabled to gate cgroup operations
      and fix scx_tg_online()") assumed that scx_cgroup_exit() is only called
      after scx_cgroup_init() finished successfully. This isn't true.
      scx_cgroup_exit() can be called without scx_cgroup_init() being called at
      all or after scx_cgroup_init() failed in the middle.
      
      As init state is tracked per cgroup, scx_cgroup_exit() can be used safely to
      clean up in all cases. Remove the incorrect WARN_ON_ONCE().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: 568894ed ("sched_ext: Add scx_cgroup_enabled to gate cgroup operations and fix scx_tg_online()")
    • sched_ext: Improve error reporting during loading · cc9877fb
      Tejun Heo authored
      When the BPF scheduler fails, ops.exit() allows rich error reporting through
      scx_exit_info. Use the ops.exit() path consistently for all failures which can
      be caused by the BPF scheduler:
      
      - scx_ops_error() is called after ops.init() and ops.cgroup_init() failure
        to record error information.
      
      - ops.init_task() failure now uses scx_ops_error() instead of pr_err().
      
      - The err_disable path is updated to automatically trigger scx_ops_error() to
        cover cases where the error message hasn't already been generated, and to
        always return 0 indicating init success so that the error is reported
        through ops.exit().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: David Vernet <void@manifault.com>
      Cc: Daniel Hodges <hodges.daniel.scott@gmail.com>
      Cc: Changwoo Min <multics69@gmail.com>
      Cc: Andrea Righi <andrea.righi@linux.dev>
      Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
  3. 02 Oct, 2024 1 commit
  4. 27 Sep, 2024 9 commits
    • sched_ext: Remove redundant p->nr_cpus_allowed checker · 95b87369
      Zhang Qiao authored
      select_task_rq() already checks that 'p->nr_cpus_allowed > 1', so the
      'p->nr_cpus_allowed == 1' check in scx_select_cpu_dfl() is redundant.
      Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • sched_ext: Decouple locks in scx_ops_enable() · efe231d9
      Tejun Heo authored
      The enable path uses three big locks - scx_fork_rwsem, scx_cgroup_rwsem and
      cpus_read_lock. Currently, the locks are grabbed together which is prone to
      locking order problems.
      
      For example, currently, there is a possible deadlock involving
      scx_fork_rwsem and cpus_read_lock. cpus_read_lock has to nest inside
      scx_fork_rwsem due to locking order existing in other subsystems. However,
      there exists a dependency in the other direction during hotplug if hotplug
      needs to fork a new task, which happens in some cases. This leads to the
      following deadlock:
      
             scx_ops_enable()                               hotplug
      
                                                percpu_down_write(&cpu_hotplug_lock)
         percpu_down_write(&scx_fork_rwsem)
         block on cpu_hotplug_lock
                                                kthread_create() waits for kthreadd
                                                kthreadd blocks on scx_fork_rwsem
      
      Note that this doesn't trigger lockdep because the hotplug side dependency
      bounces through kthreadd.
      
      With the preceding scx_cgroup_enabled change, this can be solved by
      decoupling cpus_read_lock, which is needed for static_key manipulations,
      from the other two locks.
      
      - Move the first block of static_key manipulations outside of scx_fork_rwsem
        and scx_cgroup_rwsem. This is now safe with the preceding
        scx_cgroup_enabled change.
      
      - Drop scx_cgroup_rwsem and scx_fork_rwsem between the two task iteration
        blocks so that __scx_ops_enabled static_key enabling is outside the two
        rwsems.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-and-tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
      Link: http://lkml.kernel.org/r/8cd0ec0c4c7c1bc0119e61fbef0bee9d5e24022d.camel@linux.ibm.com
    • sched_ext: Decouple locks in scx_ops_disable_workfn() · 16021656
      Tejun Heo authored
      The disable path uses three big locks - scx_fork_rwsem, scx_cgroup_rwsem and
      cpus_read_lock. Currently, the locks are grabbed together which is prone to
      locking order problems. With the preceding scx_cgroup_enabled change, we can
      decouple them:
      
      - As cgroup disabling no longer requires modifying a static_key which
        requires cpus_read_lock(), no need to grab cpus_read_lock() before
        grabbing scx_cgroup_rwsem.
      
      - cgroup can now be independently disabled before tasks are moved back to
        the fair class.
      
      Relocate scx_cgroup_exit() invocation before scx_fork_rwsem is grabbed, drop
      now unnecessary cpus_read_lock() and move static_key operations out of
      scx_fork_rwsem. This decouples all three locks in the disable path.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-and-tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
      Link: http://lkml.kernel.org/r/8cd0ec0c4c7c1bc0119e61fbef0bee9d5e24022d.camel@linux.ibm.com
    • sched_ext: Add scx_cgroup_enabled to gate cgroup operations and fix scx_tg_online() · 568894ed
      Tejun Heo authored
      If the BPF scheduler does not implement ops.cgroup_init(), scx_tg_online()
      didn't set SCX_TG_INITED which meant that ops.cgroup_exit(), even if
      implemented, won't be called from scx_tg_offline(). This is because
      SCX_HAS_OP(cgroup_init) is used to test both whether SCX cgroup operations
      are enabled and ops.cgroup_init() exists.
      
      Fix it by introducing a separate bool scx_cgroup_enabled to gate cgroup
      operations and use SCX_HAS_OP(cgroup_init) only to test whether
      ops.cgroup_init() exists. Make all cgroup operations consistently use
      scx_cgroup_enabled to test whether cgroup operations are enabled.
      scx_cgroup_enabled is added instead of using scx_enabled() to ease planned
      locking updates.
      Signed-off-by: Tejun Heo <tj@kernel.org>
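The separation described in commit 568894ed can be modelled in user space roughly as below. This is a sketch under assumptions, not the kernel implementation: `struct cgroup_ops`, `struct task_group`, `tg_online()`, `tg_offline()`, the `count_exit()` callback and the flag values are simplified or hypothetical; only `scx_cgroup_enabled` and `SCX_TG_INITED` are names taken from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values. */
#define SCX_TG_ONLINE  (1U << 0)
#define SCX_TG_INITED  (1U << 1)

/* Hypothetical reduced ops table; NULL means "not implemented". */
struct cgroup_ops {
	int (*cgroup_init)(void);
	void (*cgroup_exit)(void);
};

struct task_group {
	unsigned int scx_flags;
};

static bool scx_cgroup_enabled;	/* gates all cgroup operations */
static struct cgroup_ops ops;
static int exit_calls;

/* Example exit callback that only counts invocations. */
static void count_exit(void)
{
	exit_calls++;
}

static int tg_online(struct task_group *tg)
{
	int ret = 0;

	if (scx_cgroup_enabled) {
		/* Existence of the callback only decides whether to call it. */
		if (ops.cgroup_init)
			ret = ops.cgroup_init();
		/* INITED is recorded even when cgroup_init is not implemented. */
		if (ret == 0)
			tg->scx_flags |= SCX_TG_ONLINE | SCX_TG_INITED;
	} else {
		tg->scx_flags |= SCX_TG_ONLINE;
	}
	return ret;
}

static void tg_offline(struct task_group *tg)
{
	if (scx_cgroup_enabled && ops.cgroup_exit &&
	    (tg->scx_flags & SCX_TG_INITED))
		ops.cgroup_exit();
	tg->scx_flags &= ~(SCX_TG_ONLINE | SCX_TG_INITED);
}
```

With the enable state tracked separately, a scheduler that implements only the exit callback still gets it invoked when the group goes offline.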
    • sched_ext: Enable scx_ops_init_task() separately · 4269c603
      Tejun Heo authored
      scx_ops_init_task() and the follow-up scx_ops_enable_task() in the fork path
      were gated by scx_enabled() test and thus __scx_ops_enabled had to be turned
      on before the first scx_ops_init_task() loop in scx_ops_enable(). However,
      if an external entity causes sched_class switch before the loop is complete,
      tasks which are not initialized could be switched to SCX.
      
      The following can be reproduced by running a program which keeps toggling a
      process between SCHED_OTHER and SCHED_EXT using sched_setscheduler(2).
      
        sched_ext: Invalid task state transition 0 -> 3 for fish[1623]
        WARNING: CPU: 1 PID: 1650 at kernel/sched/ext.c:3392 scx_ops_enable_task+0x1a1/0x200
        ...
        Sched_ext: simple (enabling)
        RIP: 0010:scx_ops_enable_task+0x1a1/0x200
        ...
         switching_to_scx+0x13/0xa0
         __sched_setscheduler+0x850/0xa50
         do_sched_setscheduler+0x104/0x1c0
         __x64_sys_sched_setscheduler+0x18/0x30
         do_syscall_64+0x7b/0x140
         entry_SYSCALL_64_after_hwframe+0x76/0x7e
      
      Fix it by gating scx_ops_init_task() separately using
      scx_ops_init_task_enabled. __scx_ops_enabled is now set after all tasks are
      finished with scx_ops_init_task().
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • sched_ext: Fix SCX_TASK_INIT -> SCX_TASK_READY transitions in scx_ops_enable() · 9753358a
      Tejun Heo authored
      scx_ops_enable() has two task iteration loops. The first one calls
      scx_ops_init_task() on every task and the latter switches the eligible ones
      into SCX. The first loop left the tasks in SCX_TASK_INIT state and then the
      second loop switched them into READY before switching them into SCX.
      
      The distinction between INIT and READY is only meaningful in the fork path
      where it's used to tell whether the task finished forking so that we can
      tell ops.exit_task() accordingly. Leaving a task in INIT state between the two
      loops is inconsistent with the fork path and incorrect. The following can be
      triggered by running a program which keeps toggling a task between
      SCHED_OTHER and SCHED_EXT while a BPF scheduler is being enabled:
      
        sched_ext: Invalid task state transition 1 -> 3 for fish[1526]
        WARNING: CPU: 2 PID: 1615 at kernel/sched/ext.c:3393 scx_ops_enable_task+0x1a1/0x200
        ...
        Sched_ext: qmap (enabling+all)
        RIP: 0010:scx_ops_enable_task+0x1a1/0x200
        ...
         switching_to_scx+0x13/0xa0
         __sched_setscheduler+0x850/0xa50
         do_sched_setscheduler+0x104/0x1c0
         __x64_sys_sched_setscheduler+0x18/0x30
         do_syscall_64+0x7b/0x140
         entry_SYSCALL_64_after_hwframe+0x76/0x7e
      
      Fix it by transitioning to READY in the first loop right after
      scx_ops_init_task() succeeds.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: David Vernet <void@manifault.com>
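The two warnings quoted in commits 4269c603 and 9753358a ("0 -> 3" and "1 -> 3") are rejected state transitions. A minimal sketch of such a transition check follows. The numeric values are inferred from those log lines and from the state names in the commit messages; the exact rule set and the helper name `transition_ok()` are assumptions, not a copy of the kernel's validation code.

```c
#include <stdbool.h>

/* Values inferred from the warnings: INIT == 1, ENABLED == 3. */
enum scx_task_state {
	SCX_TASK_NONE,		/* 0: no scheduler state attached */
	SCX_TASK_INIT,		/* 1: init done, fork not yet finished */
	SCX_TASK_READY,		/* 2: fully initialized, not on SCX */
	SCX_TASK_ENABLED,	/* 3: running under SCX */
};

/* Hypothetical validator: returns true if prev -> next is allowed. */
static bool transition_ok(enum scx_task_state prev, enum scx_task_state next)
{
	switch (next) {
	case SCX_TASK_NONE:
		return true;			/* teardown is always allowed */
	case SCX_TASK_INIT:
		return prev == SCX_TASK_NONE;
	case SCX_TASK_READY:
		return prev != SCX_TASK_NONE;
	case SCX_TASK_ENABLED:
		return prev == SCX_TASK_READY;	/* must be READY first */
	}
	return false;
}
```

Under these rules a task can only be enabled from READY, which is why moving the INIT -> READY step into the first loop removes the window in which an external sched_setscheduler(2) call hits an INIT task.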
    • sched_ext: Initialize in bypass mode · 8c2090c5
      Tejun Heo authored
      scx_ops_enable() used preempt_disable() around the task iteration loop to
      switch tasks into SCX to guarantee forward progress of the task which is
      running scx_ops_enable(). However, in the gap between setting
      __scx_ops_enabled and preempt_disable(), an external entity can put tasks
      including the enabling one into SCX prematurely, which can lead to
      malfunctions including stalls.
      
      The bypass mode can wrap the entire enabling operation and guarantee forward
      progress no matter what the BPF scheduler does. Use the bypass mode instead
      to guarantee forward progress while enabling.
      
      While at it, release and regrab scx_tasks_lock between the two task
      iteration loops in scx_ops_enable() for clarity as there is no reason to
      keep holding the lock between them.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • sched_ext: Remove SCX_OPS_PREPPING · fc1fcebe
      Tejun Heo authored
      The distinction between SCX_OPS_PREPPING and SCX_OPS_ENABLING is not used
      anywhere and only adds confusion. Drop SCX_OPS_PREPPING.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • sched_ext: Relocate check_hotplug_seq() call in scx_ops_enable() · 1bbcfe62
      Tejun Heo authored
      check_hotplug_seq() is used to detect CPU hotplug events which occurred while
      the BPF scheduler is being loaded so that initialization can be retried if
      CPU hotplug events take place before the CPU hotplug callbacks are online.
      
      As such, the best place to call it is in the same cpus_read_lock() section
      that enables the CPU hotplug ops. Currently, it is called in the next
      cpus_read_lock() block in scx_ops_enable(). The side effect of this
      placement is a small window in which hotplug sequence detection can trigger
      unnecessarily, which isn't critical.
      
      Move check_hotplug_seq() invocation to the same cpus_read_lock() block as
      the hotplug operation enablement to close the window and get the invocation
      out of the way for planned locking updates.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: David Vernet <void@manifault.com>
  5. 26 Sep, 2024 5 commits
    • sched_ext: Use shorter slice while bypassing · 6f34d8d3
      Tejun Heo authored
      While bypassing, tasks are scheduled in FIFO order which favors tasks that
      hog CPUs. This can slow down e.g. unloading of the BPF scheduler. While
      bypassing, guaranteeing timely forward progress is the main goal. There's no
      point in giving long slices. Shorten the time slice used while bypassing
      from 20ms to 5ms.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Vernet <void@manifault.com>
    • sched_ext: Split the global DSQ per NUMA node · b7b3b2db
      Tejun Heo authored
      In the bypass mode, the global DSQ is used to schedule all tasks in simple
      FIFO order. All tasks are queued into the global DSQ and all CPUs try to
      execute tasks from it. This creates a lot of cross-node cacheline accesses
      and scheduling across the node boundaries, and can lead to live-lock
      conditions where the system takes tens of minutes to disable the BPF
      scheduler while executing in the bypass mode.
      
      Split the global DSQ per NUMA node. Each node has its own global DSQ. When a
      task is dispatched to SCX_DSQ_GLOBAL, it's put into the global DSQ local to
      the task's CPU and all CPUs in a node only consume its node-local global
      DSQ.
      
      This resolves a livelock condition which could be reliably triggered on a
      2x EPYC 7642 system by running `stress-ng --race-sched 1024` together with
      `stress-ng --workload 80 --workload-threads 10` while repeatedly enabling
      and disabling a SCX scheduler.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Vernet <void@manifault.com>
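The per-node split in commit b7b3b2db can be pictured with the user-space sketch below. It is a toy model under assumptions: the fixed 8-CPU, 2-node topology, the counter-only `struct dsq`, and the helpers `dispatch_to_global()` and `consume_global()` are hypothetical, and `cpu_to_node()` here is a local stand-in rather than the kernel helper of the same name. `find_global_dsq()` is named after the helper mentioned in the neighbouring commit, but its body is illustrative.

```c
#define NR_CPUS   8
#define NR_NODES  2

/* Toy DSQ that only tracks how many tasks are queued. */
struct dsq {
	int nr_queued;
};

/* One global DSQ per NUMA node instead of a single shared one. */
static struct dsq global_dsqs[NR_NODES];

/* Hypothetical topology: CPUs 0-3 on node 0, CPUs 4-7 on node 1. */
static int cpu_to_node(int cpu)
{
	return cpu / (NR_CPUS / NR_NODES);
}

/* Pick the global DSQ of the node that the given CPU belongs to. */
static struct dsq *find_global_dsq(int cpu)
{
	return &global_dsqs[cpu_to_node(cpu)];
}

/* A dispatch to the global DSQ lands on the task's own node. */
static void dispatch_to_global(int task_cpu)
{
	find_global_dsq(task_cpu)->nr_queued++;
}

/* A CPU only consumes from its node-local global DSQ.
 * Returns 1 if a task was taken, 0 if the local queue was empty. */
static int consume_global(int this_cpu)
{
	struct dsq *dsq = find_global_dsq(this_cpu);

	if (dsq->nr_queued == 0)
		return 0;
	dsq->nr_queued--;
	return 1;
}
```

In this model a task queued from CPU 1 is invisible to CPU 5, so CPUs on different nodes never contend on the same queue.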
    • sched_ext: Relocate find_user_dsq() · bba26bf3
      Tejun Heo authored
      To prepare for the addition of find_global_dsq(). No functional changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Vernet <void@manifault.com>
    • sched_ext: Allow only user DSQs for scx_bpf_consume(), scx_bpf_dsq_nr_queued() and bpf_iter_scx_dsq_new() · 63fb3ec8
      Tejun Heo authored
      SCX_DSQ_GLOBAL is special in that it can't be used as a priority queue and
      is consumed implicitly, but all BPF DSQ related kfuncs could be used on it.
      SCX_DSQ_GLOBAL will be split per-node for scalability and those operations
      won't make sense anymore. Disallow SCX_DSQ_GLOBAL on scx_bpf_consume(),
      scx_bpf_dsq_nr_queued() and bpf_iter_scx_dsq_new(). This means that
      SCX_DSQ_GLOBAL can only be used as a dispatch target from BPF schedulers.
      
      With scx_flatcg, which was using SCX_DSQ_GLOBAL as the fallback DSQ,
      updated, this shouldn't affect any schedulers.
      
      This leaves find_dsq_for_dispatch() the only user of find_non_local_dsq().
      Open code and remove find_non_local_dsq().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Vernet <void@manifault.com>
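A rough user-space illustration of the restriction in commit 63fb3ec8: if the query path only looks up user-created DSQs, a built-in ID such as SCX_DSQ_GLOBAL is simply not found. The ID encoding, the static `user_dsqs` table and the helper `dsq_nr_queued()` are assumptions made for this sketch, which also ignores local DSQs; `find_user_dsq()` is named after the kernel helper but implemented here as a plain linear search.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative encoding: built-in DSQ IDs carry a high flag bit. */
#define SCX_DSQ_FLAG_BUILTIN  (1ULL << 63)
#define SCX_DSQ_GLOBAL        (SCX_DSQ_FLAG_BUILTIN | 1)

struct dsq {
	uint64_t id;
	int nr_queued;
};

/* Hypothetical set of DSQs created by the BPF scheduler. */
static struct dsq user_dsqs[] = {
	{ .id = 0, .nr_queued = 2 },
	{ .id = 7, .nr_queued = 0 },
};

/* Linear search standing in for the real lookup structure. */
static struct dsq *find_user_dsq(uint64_t dsq_id)
{
	for (size_t i = 0; i < sizeof(user_dsqs) / sizeof(user_dsqs[0]); i++)
		if (user_dsqs[i].id == dsq_id)
			return &user_dsqs[i];
	return NULL;
}

/* Returns the queue length, or -1 if dsq_id is not a user DSQ. */
static int dsq_nr_queued(uint64_t dsq_id)
{
	struct dsq *dsq = find_user_dsq(dsq_id);

	return dsq ? dsq->nr_queued : -1;
}
```

Querying user DSQ 0 returns its length, while querying `SCX_DSQ_GLOBAL` fails, which matches the intent that the global DSQ remains usable only as a dispatch target.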
    • scx_flatcg: Use a user DSQ for fallback instead of SCX_DSQ_GLOBAL · c9c809f4
      Tejun Heo authored
      scx_flatcg was using SCX_DSQ_GLOBAL for fallback handling. However, it is
      assuming that SCX_DSQ_GLOBAL isn't automatically consumed, which was true a
      while ago but is no longer the case. Also, there are further changes planned
      for SCX_DSQ_GLOBAL which will disallow explicit consumption from it. Switch
      to a user DSQ for fallback.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Vernet <void@manifault.com>
  6. 25 Sep, 2024 2 commits
  7. 24 Sep, 2024 18 commits
    • sched_ext: Build fix for !CONFIG_SMP · 42268ad0
      Tejun Heo authored
      move_remote_task_to_local_dsq() is only defined on SMP configs but
      scx_dispatch_from_dsq() was calling move_remote_task_to_local_dsq() on UP
      configs too causing build failures. Add a dummy
      move_remote_task_to_local_dsq() which triggers a warning.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: 4c30f5ce ("sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()")
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202409241108.jaocHiDJ-lkp@intel.com/
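The stub pattern used by commit 42268ad0 looks roughly like the following when reduced to a user-space example. This is a sketch, not the kernel code: `struct task`, the function bodies and the `stub_warnings` counter (standing in for a kernel warning) are hypothetical; only the function name and the CONFIG_SMP split come from the commit message.

```c
#include <stdio.h>

/* Hypothetical reduced task descriptor. */
struct task {
	int cpu;
};

/* Counts stub invocations; stands in for a one-time kernel warning. */
static int stub_warnings;

#ifdef CONFIG_SMP
/* SMP build: the real helper moves the task to another CPU's queue. */
static void move_remote_task_to_local_dsq(struct task *p, int dst_cpu)
{
	p->cpu = dst_cpu;
}
#else
/* UP build: there is no remote CPU, so reaching this is a bug.
 * The dummy keeps callers compiling and flags the unexpected call. */
static inline void move_remote_task_to_local_dsq(struct task *p, int dst_cpu)
{
	(void)p;
	(void)dst_cpu;
	stub_warnings++;
	fprintf(stderr, "warning: remote move requested on a UP build\n");
}
#endif
```

Providing a dummy with the same signature lets the shared caller compile unchanged on both configurations, at the cost of a runtime warning if the UP path is ever taken.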
    • Merge tag 'kbuild-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild · 68e5c7d4
      Linus Torvalds authored
      Pull Kbuild updates from Masahiro Yamada:
      
       - Support cross-compiling linux-headers Debian package and kernel-devel
         RPM package
      
       - Add support for the linux-debug Pacman package
      
       - Improve module rebuilding speed by factoring out the common code to
         scripts/module-common.c
      
       - Separate device tree build rules into scripts/Makefile.dtbs
      
       - Add a new script to generate modules.builtin.ranges, which is useful
         for tracing tools to find symbols in built-in modules
      
       - Refactor Kconfig and misc tools
      
       - Update Kbuild and Kconfig documentation
      
      * tag 'kbuild-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (51 commits)
        kbuild: doc: replace "gcc" in external module description
        kbuild: doc: describe the -C option precisely for external module builds
        kbuild: doc: remove the description about shipped files
        kbuild: doc: drop section numbering, use references in modules.rst
        kbuild: doc: throw out the local table of contents in modules.rst
        kbuild: doc: remove outdated description of the limitation on -I usage
        kbuild: doc: remove description about grepping CONFIG options
        kbuild: doc: update the description about Kbuild/Makefile split
        kbuild: remove unnecessary export of RUST_LIB_SRC
        kbuild: remove append operation on cmd_ld_ko_o
        kconfig: cache expression values
        kconfig: use hash table to reuse expressions
        kconfig: refactor expr_eliminate_dups()
        kconfig: add comments to expression transformations
        kconfig: change some expr_*() functions to bool
        scripts: move hash function from scripts/kconfig/ to scripts/include/
        kallsyms: change overflow variable to bool type
        kallsyms: squash output_address()
        kbuild: add install target for modules.builtin.ranges
        scripts: add verifier script for builtin module range data
        ...
    • Merge tag 'linux-cpupower-6.12-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux · 7f8de2bf
      Linus Torvalds authored
      Pull cpupower updates from Shuah Khan:
       "The 'raw_pylibcpupower.i' file was being removed by "make mrproper".
      
        That was because '*.i', '*.s' and '*.o' files are generated during
        kernel compile and removed when the repo is cleaned by mrproper.
      
        Rename it to use .swg extension instead to avoid the problem.
      
        A second patch removes references to it from .gitignore"
      
      * tag 'linux-cpupower-6.12-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux:
        pm: cpupower: Clean up bindings gitignore
        pm: cpupower: rename raw_pylibcpupower.i
    • Merge tag 'i3c/for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux · cd3d6477
      Linus Torvalds authored
      Pull i3c updates from Alexandre Belloni:
       "This adds support for the I3C HCI controller of the AMD SoC which as
        expected requires quirks. Also fixes for the other drivers, including
        rate selection fixes for svc.
      
        Core:
         - allow adjusting first broadcast address speed
      
        Drivers:
         - cdns: few fixes
         - mipi-i3c-hci: Add AMD SoC I3C controller support and quirks, fix
           get_i3c_mode
         - svc: adjust rates, fix race condition"
      
      * tag 'i3c/for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
        i3c: master: svc: Fix use after free vulnerability in svc_i3c_master Driver Due to Race Condition
        i3c: master: cdns: Fix use after free vulnerability in cdns_i3c_master Driver Due to Race Condition
        i3c: master: svc: adjust SDR according to i3c spec
        i3c: master: svc: use slow speed for first broadcast address
        i3c: master: support to adjust first broadcast address speed
        i3c/master: cmd_v1: Fix the rule for getting i3c mode
        i3c: master: cdns: fix module autoloading
        i3c: mipi-i3c-hci: Add a quirk to set Response buffer threshold
        i3c: mipi-i3c-hci: Add a quirk to set timing parameters
        i3c: mipi-i3c-hci: Relocate helper macros to HCI header file
        i3c: mipi-i3c-hci: Add a quirk to set PIO mode
        i3c: mipi-i3c-hci: Read HC_CONTROL_PIO_MODE only after i3c hci v1.1
        i3c: mipi-i3c-hci: Add AMDI5017 ACPI ID to the I3C Support List
    • remoteproc: k3-m4: use the proper dependencies · ba0c0cb5
      Linus Torvalds authored
      The TI_K3_M4_REMOTEPROC Kconfig entry selects OMAP2PLUS_MBOX, but that
      driver in turn depends on other things, which the k3-m4 driver didn't.
      
      This causes a Kconfig time warning:
      
        WARNING: unmet direct dependencies detected for OMAP2PLUS_MBOX
          Depends on [n]: MAILBOX [=y] && (ARCH_OMAP2PLUS || ARCH_K3)
          Selected by [m]:
          - TI_K3_M4_REMOTEPROC [=m] && REMOTEPROC [=y] && (ARCH_K3 || COMPILE_TEST [=y])
      
      because you can't select something that is unavailable.
      
      Make the dependencies for TI_K3_M4_REMOTEPROC match those of the
      OMAP2PLUS_MBOX driver that it needs.
      
      Fixes: ebcf9008 ("remoteproc: k3-m4: Add a remoteproc driver for M4F subsystem")
      Cc: Bjorn Andersson <andersson@kernel.org>
      Cc: Martyn Welch <martyn.welch@collabora.com>
      Cc: Hari Nagalla <hnagalla@ti.com>
      Cc: Andrew Davis <afd@ti.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'input-for-v6.12-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 9ae2940c
      Linus Torvalds authored
      Pull input updates from Dmitry Torokhov:
      
       - support for PixArt PS/2 touchpad
      
       - updates to tsc2004/5, usbtouchscreen, and zforce_ts drivers
      
       - support for GPIO-only mode for ADP5588 controller
      
       - support for touch keys in Zinitix driver
      
       - support for querying density of Synaptics sensors
      
       - sysfs interface for Goodix "Berlin" devices to read and write touch
         IC registers
      
       - more quirks to i8042 to handle various Tuxedo laptops
      
       - a number of drivers have been converted to using "guard" notation
         when acquiring various locks, as well as using other cleanup
         functions to simplify releasing of resources (with more drivers to
         follow)
      
       - evdev will limit amount of data that can be written into an evdev
         instance at a given time to 4096 bytes (170 input events) to avoid
         holding evdev->mutex for too long and starving other users
      
       - Spitz has been converted to use software nodes/properties to describe
         its matrix keypad and GPIO-connected LEDs
      
       - mcs5000_ts, mcs_touchkey and keypad-nomadik-ske drivers have been
         removed since no one in mainline has been using them
      
       - other assorted cleanups and fixes
      
      * tag 'input-for-v6.12-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (98 commits)
        ARM: spitz: fix compile error when matrix keypad driver is enabled
        Input: hynitron_cstxxx - drop explicit initialization of struct i2c_device_id::driver_data to 0
        Input: adp5588-keys - fix check on return code
        Input: Convert comma to semicolon
        Input: i8042 - add TUXEDO Stellaris 15 Slim Gen6 AMD to i8042 quirk table
        Input: i8042 - add another board name for TUXEDO Stellaris Gen5 AMD line
        Input: tegra-kbc - use of_property_read_variable_u32_array() and of_property_present()
        Input: ps2-gpio - use IRQF_NO_AUTOEN flag in request_irq()
        Input: ims-pcu - fix calling interruptible mutex
        Input: zforce_ts - switch to using asynchronous probing
        Input: zforce_ts - remove assert/deassert wrappers
        Input: zforce_ts - do not hardcode interrupt level
        Input: zforce_ts - switch to using devm_regulator_get_enable()
        Input: zforce_ts - stop treating VDD regulator as optional
        Input: zforce_ts - make zforce_idtable constant
        Input: zforce_ts - use dev_err_probe() where appropriate
        Input: zforce_ts - do not ignore errors when acquiring regulator
        Input: zforce_ts - make parsing of contacts less confusing
        Input: zforce_ts - switch to using get_unaligned_le16
        Input: zforce_ts - use guard notation when acquiring mutexes
        ...
    • Merge tag 'hwlock-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux · 6db6a19f
      Linus Torvalds authored
      Pull hwspinlock update from Bjorn Andersson:
       "This converts the Spreadtrum hardware spinlock DeviceTree binding to
        YAML, to allow validation of related DeviceTree source"
      
      * tag 'hwlock-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
        dt-bindings: hwlock: sprd-hwspinlock: convert to YAML
    • Merge tag 'rpmsg-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux · 6e10aa1f
      Linus Torvalds authored
      Pull rpmsg updates from Bjorn Andersson:
      
       - Minor cleanup/refactor to the Qualcomm GLINK code, in order to add
         trace events related to the messages exchange with the remote side,
         useful for debugging a range of interoperability issues
      
       - Rewrite the nested structs with flexible array members in order to
         avoid the risk of invalid accesses
      
      * tag 'rpmsg-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
        rpmsg: glink: Avoid -Wflex-array-member-not-at-end warnings
        rpmsg: glink: Introduce packet tracepoints
        rpmsg: glink: Pass channel to qcom_glink_send_close_ack()
        rpmsg: glink: Tidy up RX advance handling
    • Merge tag 'rproc-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux · 5c480f1d
      Linus Torvalds authored
      Pull remoteproc updates from Bjorn Andersson:
      
       - Add remoteproc support for the Cortex M4F found in AM62x and AM64x of
         the TI K3 family, support for the modem remoteproc in the Qualcomm
         SDX75, and audio, compute and general-purpose DSPs of the Qualcomm
         SA8775P.
      
       - Add support for blocking and non-blocking mailbox transmissions to
         the i.MX remoteproc driver, and implement poweroff and reboot
         mechanisms using them. Plus a few bug fixes and minor improvements.
      
       - Cleanups and bug fixes for the TI K3 DSP and R5F drivers
      
       - Support mapping SRAM regions into the AMD-Xilinx Zynqmp R5 cores
      
       - Use devres helpers for various allocations in the Ingenic, TI DA8xx,
         TI Keystone, TI K3, ST slim drivers
      
       - Replace uses of of_{find,get}_property() with of_property_present()
         where possible
      
      * tag 'rproc-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux: (25 commits)
        remoteproc: ingenic: Use devm_platform_ioremap_resource_byname()
        remoteproc: da8xx: Use devm_platform_ioremap_resource_byname()
        remoteproc: st_slim: Use devm_platform_ioremap_resource_byname()
        remoteproc: xlnx: Add sram support
        remoteproc: k3-r5: Fix error handling when power-up failed
        remoteproc: imx_rproc: Add support for poweroff and reboot
        remoteproc: imx_rproc: Allow setting of the mailbox transmit mode
        remoteproc: k3-r5: Delay notification of wakeup event
        remoteproc: k3-m4: Add a remoteproc driver for M4F subsystem
        remoteproc: k3: Factor out TI-SCI processor control OF get function
        dt-bindings: remoteproc: k3-m4f: Add K3 AM64x SoCs
        remoteproc: k3-dsp: Acquire mailbox handle during probe routine
        remoteproc: k3-r5: Acquire mailbox handle during probe routine
        remoteproc: k3-r5: Use devm_rproc_alloc() helper
        remoteproc: qcom: pas: Add support for SA8775p ADSP, CDSP and GPDSP
        remoteproc: qcom: pas: Add SDX75 remoteproc support
        dt-bindings: remoteproc: qcom,sm8550-pas: document the SDX75 PAS
        remoteproc: keystone: Use devm_rproc_alloc() helper
        remoteproc: keystone: Use devm_kasprintf() to build name string
        dt-bindings: remoteproc: xlnx,zynqmp-r5fss: Add missing "additionalProperties" on child nodes
        ...
      5c480f1d
    • Merge tag 'vfio-v6.12-rc1' of https://github.com/awilliam/linux-vfio · 7bc21c5e
      Linus Torvalds authored
      Pull VFIO updates from Alex Williamson:
       "Just a few cleanups this cycle:
      
         - Remove several unused structure and function declarations, and
           unused variables (Dr. David Alan Gilbert, Yue Haibing, Zhang Zekun)
      
         - Constify unmodified structure in mdev (Hongbo Li)
      
         - Convert to unsigned type to catch overflow with less fanfare than
           passing a negative value to kcalloc() (Dan Carpenter)"
      
      * tag 'vfio-v6.12-rc1' of https://github.com/awilliam/linux-vfio:
        vfio/pci: clean up a type in vfio_pci_ioctl_pci_hot_reset_groups()
        vfio/mdev: Constify struct kobj_type
        vfio: mdev: Remove unused function declarations
        vfio/fsl-mc: Remove unused variable 'hwirq'
        vfio/pci: Remove unused struct 'vfio_pci_mmap_vma'
      7bc21c5e
    • Merge tag 'dma-mapping-6.12-2024-09-24' of git://git.infradead.org/users/hch/dma-mapping · 4491b854
      Linus Torvalds authored
      Pull dma-mapping fixes from Christoph Hellwig:
      
       - sort out a few issues with the direct calls to iommu-dma (Christoph
         Hellwig, Leon Romanovsky)
      
      * tag 'dma-mapping-6.12-2024-09-24' of git://git.infradead.org/users/hch/dma-mapping:
        dma-mapping: report unlimited DMA addressing in IOMMU DMA path
        iommu/dma: remove most stubs in iommu-dma.h
        dma-mapping: fix vmap and mmap of noncontiougs allocations
      4491b854
    • Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd · db78436b
      Linus Torvalds authored
      Pull iommufd updates from Jason Gunthorpe:
       "Collection of small cleanup and one fix:
      
         - Sort headers and struct forward declarations
      
         - Fix random selftest failures in some cases due to dirty tracking
           tests
      
         - Have the reserved IOVA regions mechanism work when a HWPT is used
           as a nesting parent. This updates the nesting parent's IOAS with
           the reserved regions of the device and will also install the ITS
           doorbell page on ARM.
      
         - Add missed validation of parent domain ops against the current
           iommu
      
         - Fix a syzkaller bug related to integer overflow during ALIGN()
      
         - Tidy two iommu_domain attach paths"
      
      * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
        iommu: Set iommu_attach_handle->domain in core
        iommufd: Avoid duplicated __iommu_group_set_core_domain() call
        iommufd: Protect against overflow of ALIGN() during iova allocation
        iommufd: Reorder struct forward declarations
        iommufd: Check the domain owner of the parent before creating a nesting domain
        iommufd/device: Enforce reserved IOVA also when attached to hwpt_nested
        iommufd/selftest: Fix buffer read overrrun in the dirty test
        iommufd: Reorder include files
      db78436b
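The ALIGN() overflow item in the iommufd pull above can be illustrated with a small userspace sketch. This is not the kernel patch: `ALIGN_UP` and `align_up_checked()` are hypothetical names, and the sketch only assumes the usual round-up-to-power-of-two formula, which wraps to a small value when the input is close to the top of the address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round x up to a power-of-two alignment, in the style of the kernel's ALIGN(). */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((uint64_t)(a) - 1))

/*
 * Hypothetical helper: store the aligned value in *out and return true,
 * or return false if rounding up would wrap past UINT64_MAX.
 */
static bool align_up_checked(uint64_t x, uint64_t align, uint64_t *out)
{
	assert(align != 0 && (align & (align - 1)) == 0); /* power of two only */

	if (x > UINT64_MAX - (align - 1))
		return false; /* x + (align - 1) would overflow */

	*out = ALIGN_UP(x, align);
	return true;
}
```

Without the check, aligning a value just below UINT64_MAX silently yields 0, which an allocator could mistake for a valid low address.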
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 54d7e819
      Linus Torvalds authored
      Pull rdma updates from Jason Gunthorpe:
       "Usual collection of small improvements and fixes, nothing especially
        stands out to me here.
      
        The new multipath PCI feature is a sign of things to come, I think we
        will see more of this in the next 10 years. Broadcom and HNS continue
        to update their drivers for their new HW generations.
      
        Summary:
      
         - Bug fixes and minor improvements in cxgb4, siw, mlx5, rxe, efa, rts,
           hfi, erdma, hns, irdma
      
         - Code cleanups/typos/etc. Tidy alloc_ordered_workqueue() calls
      
         - Multipath PCI for mlx5
      
         - Variable size work queue, SRQ changes, and relaxed ordering for new
           bnxt HW
      
         - New ODP fault resolution FW protocol in mlx5
      
         - New 'rdma monitor' netlink mechanism"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits)
        RDMA/bnxt_re: Remove the unused variable en_dev
        RDMA/nldev: Add missing break in rdma_nl_notify_err_msg()
        RDMA/irdma: fix error message in irdma_modify_qp_roce()
        RDMA/cxgb4: Added NULL check for lookup_atid
        RDMA/hns: Fix ah error counter in sw stat not increasing
        RDMA/bnxt_re: Recover the device when FW error is detected
        RDMA/bnxt_re: Group all operations under add_device and remove_device
        RDMA/bnxt_re: Use the aux device for L2 ULP callbacks
        RDMA/bnxt_re: Change aux driver data to en_info to hold more information
        RDMA/nldev: Expose whether RDMA monitoring is supported
        RDMA/nldev: Add support for RDMA monitoring
        RDMA/mlx5: Use IB set_netdev and get_netdev functions
        RDMA/device: Remove optimization in ib_device_get_netdev()
        RDMA/mlx5: Initialize phys_port_cnt earlier in RDMA device creation
        RDMA/mlx5: Obtain upper net device only when needed
        RDMA/mlx5: Check RoCE LAG status before getting netdev
        RDMA/mlx5: Consider the query_vuid cap for data_direct
        net/mlx5: Handle memory scheme ODP capabilities
        RDMA/mlx5: Add implicit MR handling to ODP memory scheme
        RDMA/mlx5: Add handling for memory scheme page fault events
        ...
      54d7e819
    • Merge tag 'sched_ext-for-6.12-rc1-fixes' of... · 6fa6588e
      Linus Torvalds authored
      Merge tag 'sched_ext-for-6.12-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
      
      Pull sched_ext fixes from Tejun Heo:
      
       - Three build fixes
      
       - The fix for a stall bug introduced by a recent optimization in sched
         core (SM_IDLE)
      
       - Addition of /sys/kernel/sched_ext/enable_seq. While not a fix, it is
         a simple addition that distro people want so that they can tell
         whether an SCX scheduler has ever been loaded on the system
      
      * tag 'sched_ext-for-6.12-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
        sched_ext: Provide a sysfs enable_seq counter
        sched_ext: Fix build when !CONFIG_STACKTRACE
        sched, sched_ext: Disable SM_IDLE/rq empty path when scx_enabled()
        sched: Put task_group::idle under CONFIG_GROUP_SCHED_WEIGHT
        sched: Add dummy version of sched_group_set_idle()
      6fa6588e
    • Merge tag 'for-6.12/io_uring-20240922' of git://git.kernel.dk/linux · 3147a068
      Linus Torvalds authored
      Pull more io_uring updates from Jens Axboe:
       "Mostly just a set of fixes in here, or little changes that didn't get
        included in the initial pull request. This contains:
      
         - Move the SQPOLL napi polling outside the submission lock (Olivier)
      
         - Rename of the "copy buffers" API that got added in the 6.12 merge
           window. There's really no copying going on, it's just referencing
           the buffers. After a bit of consideration, decided that it was
           better to simply rename this to avoid potential confusion (me)
      
         - Shrink struct io_mapped_ubuf from 48 to 32 bytes, by changing it to
           start + len tracking rather than having start / end in there, and
           by removing the caching of folio_mask when we can just calculate it
           from folio_shift when we need it (me)
      
         - Fixes for the SQPOLL affinity checking (me, Felix)
      
         - Fix for how cqring waiting checks for the presence of task_work.
           Just check it directly rather than check for a specific
           notification mechanism (me)
      
         - Tweak to how request linking is represented in tracing (me)
      
         - Fix a syzbot report that deliberately sets up a huge list of
           overflow entries, and then hits rcu stalls when flushing this list.
           Just check for the need to preempt, and drop/reacquire locks in the
           loop. There's no state maintained over the loop itself, and each
           entry is yanked from head-of-list (me)"
      
      * tag 'for-6.12/io_uring-20240922' of git://git.kernel.dk/linux:
        io_uring: check if we need to reschedule during overflow flush
        io_uring: improve request linking trace
        io_uring: check for presence of task_work rather than TIF_NOTIFY_SIGNAL
        io_uring/sqpoll: do the napi busy poll outside the submission block
        io_uring: clean up a type in io_uring_register_get_file()
        io_uring/sqpoll: do not put cpumask on stack
        io_uring/sqpoll: retain test for whether the CPU is valid
        io_uring/rsrc: change ubuf->ubuf_end to length tracking
        io_uring/rsrc: get rid of io_mapped_ubuf->folio_mask
        io_uring: rename "copy buffers" to "clone buffers"
      3147a068
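The buffer-tracking change described in the io_uring pull above (start + len instead of start / end, and no cached folio mask) can be sketched roughly as follows. The struct and helper names are hypothetical stand-ins, not the kernel's `struct io_mapped_ubuf`, and the field widths are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a mapped user buffer; not the kernel's struct. */
struct mapped_buf {
	uint64_t start;       /* first byte of the buffer */
	uint32_t len;         /* length in bytes, replaces a stored end address */
	uint32_t folio_shift; /* log2 of the folio size backing the buffer */
};

/* The end address is derived on demand instead of being stored. */
static uint64_t mapped_buf_end(const struct mapped_buf *buf)
{
	return buf->start + buf->len;
}

/* The folio mask is recomputed from the shift instead of being cached. */
static uint64_t mapped_buf_folio_mask(const struct mapped_buf *buf)
{
	assert(buf->folio_shift < 64); /* shifting by 64 or more is undefined */
	return ((uint64_t)1 << buf->folio_shift) - 1;
}
```

Both derived values cost one arithmetic operation to recompute, which is the trade the pull message describes: fewer stored fields in exchange for a trivial calculation at the point of use.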
    • Merge tag 'sysctl-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl · 172d5139
      Linus Torvalds authored
      Pull sysctl update from Joel Granados:
      
       - Avoid evaluating non-mount ctl_tables as a sysctl_mount_point by
         removing the unlikely (but possible) chance that the permanently
         empty ctl_table array shares its address with another ctl_table
      
       - Update Joel Granados' contact info in MAINTAINERS
      
      * tag 'sysctl-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
        MAINTAINERS: update email for Joel Granados
        sysctl: avoid spurious permanent empty tables
      172d5139
    • Merge tag 'riscv-for-linus-6.12-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 97d8894b
      Linus Torvalds authored
      Pull RISC-V updates from Palmer Dabbelt:
      
       - Support using Zkr to seed KASLR
      
       - Support IPI-triggered CPU backtracing
      
       - Support for generic CPU vulnerabilities reporting to userspace
      
       - A few cleanups for missing licenses
      
       - The size limit on the XIP kernel has been removed
      
       - Support for tracing userspace stacks
      
       - Support for the Svvptc extension
      
       - Various cleanups and fixes throughout the tree
      
      * tag 'riscv-for-linus-6.12-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (47 commits)
        crash: Fix riscv64 crash memory reserve dead loop
        perf/riscv-sbi: Add platform specific firmware event handling
        tools: Optimize ring buffer for riscv
        tools: Add riscv barrier implementation
        RISC-V: Don't have MAX_PHYSMEM_BITS exceed phys_addr_t
        ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE
        riscv: Enable bitops instrumentation
        riscv: Omit optimized string routines when using KASAN
        ACPI: RISCV: Make acpi_numa_get_nid() to be static
        riscv: Randomize lower bits of stack address
        selftests: riscv: Allow mmap test to compile on 32-bit
        riscv: Make riscv_isa_vendor_ext_andes array static
        riscv: Use LIST_HEAD() to simplify code
        riscv: defconfig: Disable RZ/Five peripheral support
        RISC-V: Implement kgdb_roundup_cpus() to enable future NMI Roundup
        riscv: avoid Imbalance in RAS
        riscv: cacheinfo: Add back init_cache_level() function
        riscv: Remove unused _TIF_WORK_MASK
        drivers/perf: riscv: Remove redundant macro check
        riscv: define ILLEGAL_POINTER_VALUE for 64bit
        ...
      97d8894b
    • Merge tag 'm68knommu-for-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu · 7108fff8
      Linus Torvalds authored
      Pull m68knommu fixlet from Greg Ungerer:
       "Only a single change, cleaning up white space in debug message"
      
      * tag 'm68knommu-for-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
        m68k: remove trailing space after \n newline
      7108fff8