1. 21 Apr, 2023 4 commits
    • sched/clock: Fix local_clock() before sched_clock_init() · f31dcb15
      Aaron Thompson authored
      Have local_clock() return sched_clock() if sched_clock_init() has not
      yet run. sched_clock_cpu() has this check, but it was not included in
      the new noinstr implementation of local_clock().
      
      The effect can be seen on x86 with CONFIG_PRINTK_TIME enabled, for
      instance. scd->clock quickly reaches the value of TICK_NSEC and that
      value is returned until sched_clock_init() runs.
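      A small userspace model may make the stall easier to see. It is a
      simplified sketch, not the kernel implementation: it assumes HZ=1000
      (so one tick is 1,000,000 ns) and reduces the per-CPU filter to one
      rule, namely that the filtered clock never goes backwards and never
      runs more than one tick past the last tick's GTOD time, which stays at
      0 before sched_clock_init(). Every model_* name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, assuming HZ=1000: one tick is 1,000,000 ns. */
#define MODEL_TICK_NSEC 1000000ULL

uint64_t model_raw_ns;            /* stands in for sched_clock()'s source */
bool model_sched_clock_running;   /* set once sched_clock_init() has run */
uint64_t model_tick_gtod;         /* GTOD time at the last tick; 0 before init */
uint64_t model_scd_clock;         /* filtered clock of a single model CPU */

static uint64_t model_sched_clock(void)
{
	return model_raw_ns;
}

/* Simplified filter: never go backwards and never run more than one tick
 * ahead of model_tick_gtod. Before init the tick time is stuck at 0, so
 * the result saturates at MODEL_TICK_NSEC. */
static uint64_t model_sched_clock_local(void)
{
	uint64_t clock = model_sched_clock();
	uint64_t max_clock = model_tick_gtod + MODEL_TICK_NSEC;

	if (max_clock < model_scd_clock)
		max_clock = model_scd_clock;
	if (clock < model_scd_clock)
		clock = model_scd_clock;
	if (clock > max_clock)
		clock = max_clock;
	model_scd_clock = clock;
	return clock;
}

/* Behaviour described as the bug: always go through the filter. */
uint64_t model_local_clock_unfixed(void)
{
	return model_sched_clock_local();
}

/* Behaviour described by the fix: use the raw clock until init has run. */
uint64_t model_local_clock_fixed(void)
{
	if (!model_sched_clock_running)
		return model_sched_clock();
	return model_sched_clock_local();
}

/* Take one early-boot reading at raw_ns from a freshly reset state. */
uint64_t model_early_boot_reading(bool fixed, uint64_t raw_ns)
{
	model_sched_clock_running = false;
	model_tick_gtod = 0;
	model_scd_clock = 0;
	model_raw_ns = raw_ns;
	return fixed ? model_local_clock_fixed() : model_local_clock_unfixed();
}
```

      Under these assumptions, model_early_boot_reading(false, 378956000)
      returns 1000000 while model_early_boot_reading(true, 378956000) returns
      378956000, mirroring the 0.001000 plateau in the first log below.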
      
      dmesg without this patch:
      
          [    0.000000] kvm-clock: ...
          [    0.000002] kvm-clock: ...
          [    0.000672] clocksource: ...
          [    0.001000] tsc: ...
          [    0.001000] e820: ...
          [    0.001000] e820: ...
           ...
          [    0.001000] ..TIMER: ...
          [    0.001000] clocksource: ...
          [    0.378956] Calibrating delay loop ...
          [    0.379955] pid_max: ...
      
      dmesg with this patch:
      
          [    0.000000] kvm-clock: ...
          [    0.000001] kvm-clock: ...
          [    0.000675] clocksource: ...
          [    0.002685] tsc: ...
          [    0.003331] e820: ...
          [    0.004190] e820: ...
           ...
          [    0.421939] ..TIMER: ...
          [    0.422842] clocksource: ...
          [    0.424582] Calibrating delay loop ...
          [    0.425580] pid_max: ...
      
      Fixes: 776f2291 ("sched/clock: Make local_clock() noinstr")
      Signed-off-by: Aaron Thompson <dev@aaront.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20230413175012.2201-1-dev@aaront.org
    • sched/rt: Fix bad task migration for rt tasks · feffe5bb
      Schspa Shi authored
      Commit 95158a89 ("sched,rt: Use the full cpumask for balancing")
      allows find_lock_lowest_rq() to pick a task with migration disabled.
      The purpose of that commit is to push away the task currently running
      on the CPU that has the migrate_disable() task.
      
      However, there is a race which allows a migrate_disable() task to be
      migrated. Consider:
      
        CPU0                                    CPU1
        push_rt_task
          check is_migration_disabled(next_task)
      
                                                task not running and
                                                migration_disabled == 0
      
          find_lock_lowest_rq(next_task, rq);
            _double_lock_balance(this_rq, busiest);
              raw_spin_rq_unlock(this_rq);
              double_rq_lock(this_rq, busiest);
                <<wait for busiest rq>>
                                                    <wakeup>
                                                task becomes running
                                                migrate_disable();
                                                  <context out>
          deactivate_task(rq, next_task, 0);
          set_task_cpu(next_task, lowest_rq->cpu);
            WARN_ON_ONCE(is_migration_disabled(p));
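      The message describes the race rather than the diff. A common way to
      close this kind of window is to re-validate the task's state after
      _double_lock_balance() has dropped and retaken the runqueue lock. The
      userspace sketch below models that pattern; the model_* names, the
      pthread mutexes standing in for runqueue locks, and the hard-coded
      "CPU1" step inside the unlocked window are assumptions for
      illustration only.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the CPU0/CPU1 interleaving; not kernel code. */
struct model_task {
	int migration_disabled;	/* >0 inside migrate_disable() */
	int cpu;		/* CPU the task is queued on */
};

struct model_rq {
	pthread_mutex_t lock;
	int cpu;
};

/* Runs while this_rq->lock is dropped: models CPU1 waking the task,
 * which then calls migrate_disable(). */
static void model_cpu1_runs_task(struct model_task *t)
{
	t->migration_disabled++;
}

/* Model of _double_lock_balance(): drop this_rq's lock, let the other CPU
 * run, then take both locks in a fixed (CPU id) order. */
static void model_double_lock(struct model_rq *this_rq,
			      struct model_rq *busiest, struct model_task *t)
{
	pthread_mutex_unlock(&this_rq->lock);
	model_cpu1_runs_task(t);	/* the race window */
	if (this_rq->cpu < busiest->cpu) {
		pthread_mutex_lock(&this_rq->lock);
		pthread_mutex_lock(&busiest->lock);
	} else {
		pthread_mutex_lock(&busiest->lock);
		pthread_mutex_lock(&this_rq->lock);
	}
}

/* Returns true if the task was moved to lowest->cpu. With recheck set,
 * the migration-disabled test is repeated after the locks are retaken. */
bool model_push_task(struct model_rq *this_rq, struct model_rq *lowest,
		     struct model_task *t, bool recheck)
{
	bool moved = false;

	pthread_mutex_lock(&this_rq->lock);
	if (t->migration_disabled) {	/* CPU0's first check */
		pthread_mutex_unlock(&this_rq->lock);
		return false;
	}
	model_double_lock(this_rq, lowest, t);
	if (!(recheck && t->migration_disabled)) {
		t->cpu = lowest->cpu;	/* stands in for set_task_cpu() */
		moved = true;
	}
	pthread_mutex_unlock(&lowest->lock);
	pthread_mutex_unlock(&this_rq->lock);
	return moved;
}

/* Runs the interleaving once; true means a migration-disabled task moved. */
bool model_race_moves_pinned_task(bool recheck)
{
	struct model_rq rq0 = { .cpu = 0 };
	struct model_rq rq1 = { .cpu = 1 };
	struct model_task t = { .migration_disabled = 0, .cpu = 0 };
	bool moved;

	pthread_mutex_init(&rq0.lock, NULL);
	pthread_mutex_init(&rq1.lock, NULL);
	moved = model_push_task(&rq0, &rq1, &t, recheck);
	pthread_mutex_destroy(&rq1.lock);
	pthread_mutex_destroy(&rq0.lock);
	return moved && t.migration_disabled > 0;
}
```

      In this model, model_race_moves_pinned_task(false) returns true, the
      situation the WARN_ON_ONCE() above catches, while
      model_race_moves_pinned_task(true) returns false.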
      
      Fixes: 95158a89 ("sched,rt: Use the full cpumask for balancing")
      Signed-off-by: Schspa Shi <schspa@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Reviewed-by: Valentin Schneider <vschneid@redhat.com>
      Tested-by: Dwaine Gonyier <dgonyier@redhat.com>
    • sched: Fix performance regression introduced by mm_cid · 223baf9d
      Mathieu Desnoyers authored
      Introduce per-mm/cpu current concurrency id (mm_cid) to fix a PostgreSQL
      sysbench regression reported by Aaron Lu.
      
      Keep track of the currently allocated mm_cid for each mm/cpu rather than
      freeing them immediately on context switch. This eliminates most atomic
      operations when context switching back and forth between threads
      belonging to different memory spaces in multi-threaded scenarios (many
      processes, each with many threads). The per-mm/per-cpu mm_cid values are
      serialized by their respective runqueue locks.
      
      Thread migration is handled by adding a call to
      sched_mm_cid_migrate_to() (with the destination runqueue lock held) in
      activate_task() for migrating tasks. If the destination CPU's mm_cid is
      unset, and the source runqueue is not actively using its mm_cid, then
      the source CPU's mm_cid is moved to the destination CPU on migration.
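      The caching and migration rules above can be modelled in a few lines
      of userspace C. This is a simplified, single-threaded sketch: the
      model_* names, the fixed CPU count and the 64-bit cidmask are
      assumptions, and the runqueue locking the kernel relies on is only
      noted in comments.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of per-mm/per-CPU concurrency ids; not kernel code. */
#define MODEL_NR_CPUS 8
#define MODEL_CID_UNSET (-1)

struct model_mm {
	uint64_t cidmask;		/* bit n set: cid n is allocated */
	int pcpu_cid[MODEL_NR_CPUS];	/* cid cached for this mm on each CPU */
	bool active[MODEL_NR_CPUS];	/* a thread of this mm runs there now */
};

void model_mm_init(struct model_mm *mm)
{
	mm->cidmask = 0;
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		mm->pcpu_cid[cpu] = MODEL_CID_UNSET;
		mm->active[cpu] = false;
	}
}

/* Context switch in: reuse the cached cid if there is one, otherwise
 * allocate the lowest free cid. (In the kernel this is serialized by the
 * runqueue lock.) */
int model_cid_switch_in(struct model_mm *mm, int cpu)
{
	if (mm->pcpu_cid[cpu] == MODEL_CID_UNSET) {
		int cid = 0;

		while (mm->cidmask & (1ULL << cid))
			cid++;
		mm->cidmask |= 1ULL << cid;
		mm->pcpu_cid[cpu] = cid;
	}
	mm->active[cpu] = true;
	return mm->pcpu_cid[cpu];
}

/* Context switch out: keep the cid cached instead of freeing it. */
void model_cid_switch_out(struct model_mm *mm, int cpu)
{
	mm->active[cpu] = false;
}

/* Migration from src to dst (dst's runqueue lock held in the kernel):
 * hand src's cid over only if dst has none and src is not using it. */
void model_cid_migrate_to(struct model_mm *mm, int src, int dst)
{
	if (mm->pcpu_cid[dst] != MODEL_CID_UNSET || mm->active[src])
		return;
	mm->pcpu_cid[dst] = mm->pcpu_cid[src];
	mm->pcpu_cid[src] = MODEL_CID_UNSET;
}
```

      A thread that switches out and back in on the same CPU gets the same
      cid without touching cidmask; in the model, that shared-mask update is
      what stands in for the atomic operations the message says are avoided.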
      
      Introduce a task work, executed periodically in a similar way to the
      NUMA work, which reclaims cid values only once they have been unused
      for a period of time.
      
      Keep track of the allocation time for each per-CPU cid, and let the
      task work clear the cids that are observed to be older than
      SCHED_MM_CID_PERIOD_NS and unused. This task work also clears all
      mm_cids that are greater than or equal to the Hamming weight of the mm
      cidmask, to keep concurrency ids compact.
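      The two reclaim rules can be sketched in the same style. Here
      MODEL_CID_PERIOD_NS stands in for SCHED_MM_CID_PERIOD_NS with an
      arbitrary value, the per-CPU timestamp is reduced to one field, and
      all model_* names are hypothetical. The second pass relies on a
      counting argument: if w cids are allocated and an idle CPU holds one
      that is >= w, at least one value below w must be free.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; the period value is assumed, not the kernel's. */
#define MODEL_NR_CPUS 8
#define MODEL_CID_UNSET (-1)
#define MODEL_CID_PERIOD_NS 100000000ULL

struct model_mm {
	uint64_t cidmask;			/* bit n set: cid n is allocated */
	int pcpu_cid[MODEL_NR_CPUS];		/* cid cached on each CPU */
	uint64_t cid_time[MODEL_NR_CPUS];	/* timestamp kept for that cid */
	bool active[MODEL_NR_CPUS];		/* cid in use by a running thread */
};

void model_mm_init(struct model_mm *mm)
{
	mm->cidmask = 0;
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		mm->pcpu_cid[cpu] = MODEL_CID_UNSET;
		mm->cid_time[cpu] = 0;
		mm->active[cpu] = false;
	}
}

/* Setup helper: cache `cid` on `cpu` as of time `now`, not running. */
void model_set_cid(struct model_mm *mm, int cpu, int cid, uint64_t now)
{
	mm->pcpu_cid[cpu] = cid;
	mm->cidmask |= 1ULL << cid;
	mm->cid_time[cpu] = now;
	mm->active[cpu] = false;
}

static void model_clear_cid(struct model_mm *mm, int cpu)
{
	mm->cidmask &= ~(1ULL << mm->pcpu_cid[cpu]);
	mm->pcpu_cid[cpu] = MODEL_CID_UNSET;
}

static int model_popcount(uint64_t x)
{
	int n = 0;

	for (; x; x &= x - 1)	/* clear the lowest set bit each round */
		n++;
	return n;
}

/* Periodic work: pass 1 drops idle cids older than the period; pass 2
 * drops idle cids >= the number of allocated cids (the Hamming weight). */
void model_cid_work(struct model_mm *mm, uint64_t now)
{
	int weight;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (mm->pcpu_cid[cpu] == MODEL_CID_UNSET || mm->active[cpu])
			continue;
		if (now - mm->cid_time[cpu] > MODEL_CID_PERIOD_NS)
			model_clear_cid(mm, cpu);
	}
	weight = model_popcount(mm->cidmask);
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (mm->pcpu_cid[cpu] == MODEL_CID_UNSET || mm->active[cpu])
			continue;
		if (mm->pcpu_cid[cpu] >= weight)
			model_clear_cid(mm, cpu);
	}
}
```

      For example, with cids 0, 1 and 2 cached on CPUs 0 to 2 and only cid 0
      stale, pass 1 frees cid 0, the weight drops to 2, and pass 2 frees
      cid 2, leaving only cid 1 allocated so the next allocation can reuse 0.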
      
      Because we want mm_cid values to converge towards smaller values as
      migrations happen, the prior optimization for context switches between
      threads belonging to the same mm is removed: it could delay the lazy
      release of the destination runqueue's mm_cid after it has been replaced
      by a migration. Removing this optimization is not an issue
      performance-wise, because the new per-mm/per-cpu mm_cid tracking also
      covers this more specific case.
      
      Fixes: af7f588d ("sched: Introduce per-memory-map concurrency ID")
      Reported-by: Aaron Lu <aaron.lu@intel.com>
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Aaron Lu <aaron.lu@intel.com>
      Link: https://lore.kernel.org/lkml/20230327080502.GA570847@ziqianlu-desk2/
    • Merge branch 'v6.3-rc7' · 5a4d3b38
      Peter Zijlstra authored
      Sync with the urgent patches; in particular:
      
      a53ce18c ("sched/fair: Sanitize vruntime of entity being migrated")
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
  2. 16 Apr, 2023 12 commits
  3. 15 Apr, 2023 6 commits
  4. 14 Apr, 2023 14 commits
  5. 13 Apr, 2023 4 commits
    • Merge tag 'cgroup-for-6.3-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 44149752
      Linus Torvalds authored
      Pull cgroup fixes from Tejun Heo:
       "This is a relatively big pull request this late in the cycle but the
        major contributor is the cpuset bug which is rather significant:
      
         - Fix several cpuset bugs including one where it wasn't applying the
           target cgroup when tasks are created with CLONE_INTO_CGROUP
      
        With a few smaller fixes:
      
         - Fix inverted locking order in cgroup1 freezer implementation
      
         - Fix garbage cpu.stat::core_sched.forceidle_usec reporting in the
           root cgroup"
      
      * tag 'cgroup-for-6.3-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup/cpuset: Make cpuset_attach_task() skip subpartitions CPUs for top_cpuset
        cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork() methods
        cgroup/cpuset: Make cpuset_fork() handle CLONE_INTO_CGROUP properly
        cgroup/cpuset: Wake up cpuset_attach_wq tasks in cpuset_cancel_attach()
        cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex
        cgroup/cpuset: Fix partition root's cpuset.cpus update bug
        cgroup: fix display of forceidle time at root
    • Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · e44f45fe
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "A few more clk driver fixes:
      
         - Set the max_register member of the spreadtrum regmap so that reads
           don't go off the end of the I/O space
      
         - Avoid a clk parent error in the i.MX imx6ul driver when the
           selector is unknown
      
         - Fix an oops due to REGCACHE_NONE usage by the Renesas 9-series
           driver"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: rs9: Fix suspend/resume
        clk: imx6ul: fix "failed to get parent" error
        clk: sprd: set max_register according to mapping range
    • Merge tag 'net-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 829cca4d
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bpf, and bluetooth.
      
        Not all that quiet given spring celebrations, but "current" fixes are
        thinning out, which is encouraging. One outstanding regression in the
        mlx5 driver when using old FW, not blocking but we're pushing for a
        fix.
      
        Current release - new code bugs:
      
         - eth: enetc: workaround for unresponsive pMAC after receiving
           express traffic
      
        Previous releases - regressions:
      
         - rtnetlink: restore RTM_NEW/DELLINK notification behavior, keep the
           pid/seq fields 0 for backward compatibility
      
        Previous releases - always broken:
      
         - sctp: fix a potential overflow in sctp_ifwdtsn_skip
      
         - mptcp:
            - use mptcp_schedule_work instead of open-coding it and make the
              worker check stricter, to avoid scheduling work on closed
              sockets
            - fix NULL pointer dereference on fastopen early fallback
      
         - skbuff: fix memory corruption due to a race between skb coalescing
           and releasing clones confusing page_pool reference counting
      
         - bonding: fix neighbor solicitation validation on backup slaves
      
         - bpf: tcp: use sock_gen_put instead of sock_put in bpf_iter_tcp
      
         - bpf: arm64: fix a BTI error on returning to patched function
      
         - openvswitch: fix race on port output leading to infinite loop
      
         - sfp: initialize sfp->i2c_block_size at sfp allocation to avoid
           returning a different errno than expected
      
         - phy: nxp-c45-tja11xx: unregister PTP, purge queues on remove
      
         - Bluetooth: fix printing errors if LE Connection times out
      
         - Bluetooth: assorted UaF, deadlock and data race fixes
      
         - eth: macb: fix memory corruption in extended buffer descriptor mode
      
        Misc:
      
         - adjust the XDP Rx flow hash API to also include the protocol layers
           over which the hash was computed"
      
      * tag 'net-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
        selftests/bpf: Adjust bpf_xdp_metadata_rx_hash for new arg
        mlx4: bpf_xdp_metadata_rx_hash add xdp rss hash type
        veth: bpf_xdp_metadata_rx_hash add xdp rss hash type
        mlx5: bpf_xdp_metadata_rx_hash add xdp rss hash type
        xdp: rss hash types representation
        selftests/bpf: xdp_hw_metadata remove bpf_printk and add counters
        skbuff: Fix a race between coalescing and releasing SKBs
        net: macb: fix a memory corruption in extended buffer descriptor mode
        selftests: add the missing CONFIG_IP_SCTP in net config
        udp6: fix potential access to stale information
        selftests: openvswitch: adjust datapath NL message declaration
        selftests: mptcp: userspace pm: uniform verify events
        mptcp: fix NULL pointer dereference on fastopen early fallback
        mptcp: stricter state check in mptcp_worker
        mptcp: use mptcp_schedule_work instead of open-coding it
        net: enetc: workaround for unresponsive pMAC after receiving express traffic
        sctp: fix a potential overflow in sctp_ifwdtsn_skip
        net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume()
        rtnetlink: Restore RTM_NEW/DELLINK notification behavior
        net: ti/cpsw: Add explicit platform_device.h and of_platform.h includes
        ...
    • Merge tag 'devicetree-fixes-for-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · 4413ad01
      Linus Torvalds authored
      Pull devicetree fixes from Rob Herring:
      
       - Fix interaction between fw_devlink and DT overlays causing devices to
         not be probed
      
       - Fix the compatible string for loongson,cpu-interrupt-controller
      
      * tag 'devicetree-fixes-for-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        treewide: Fix probing of devices in DT overlays
        dt-bindings: interrupt-controller: loongarch: Fix mismatched compatible