- 14 Aug, 2023 6 commits
-
-
Peter Zijlstra authored
Use guards to reduce gotos and simplify control flow. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230801211812.032678917@infradead.org
-
Peter Zijlstra authored
Use guards to reduce gotos and simplify control flow. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230801211811.964370836@infradead.org
-
Peter Zijlstra authored
Use guards to reduce gotos and simplify control flow. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230801211811.896559109@infradead.org
-
Peter Zijlstra authored
Use guards to reduce gotos and simplify control flow. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230801211811.828443100@infradead.org
-
Cyril Hrubis authored
The sched_rr_timeslice can be reset to default by writing value that is <= 0. However after reading from this file we always got the last value written, which is not useful at all. $ echo -1 > /proc/sys/kernel/sched_rr_timeslice_ms $ cat /proc/sys/kernel/sched_rr_timeslice_ms -1 Fix this by setting the variable that holds the sysctl file value to the jiffies_to_msecs(RR_TIMESLICE) in case that <= 0 value was written. Signed-off-by: Cyril Hrubis <chrubis@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Petr Vorel <pvorel@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Tested-by: Petr Vorel <pvorel@suse.cz> Link: https://lore.kernel.org/r/20230802151906.25258-3-chrubis@suse.cz
-
Cyril Hrubis authored
There is a 10% rounding error in the intial value of the sysctl_sched_rr_timeslice with CONFIG_HZ_300=y. This was found with LTP test sched_rr_get_interval01: sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90 sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90 What this test does is to compare the return value from the sched_rr_get_interval() and the sched_rr_timeslice_ms sysctl file and fails if they do not match. The problem it found is the intial sysctl file value which was computed as: static int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE; which works fine as long as MSEC_PER_SEC is multiple of HZ, however it introduces 10% rounding error for CONFIG_HZ_300: (MSEC_PER_SEC / HZ) * (100 * HZ / 1000) (1000 / 300) * (100 * 300 / 1000) 3 * 30 = 90 This can be easily fixed by reversing the order of the multiplication and division. After this fix we get: (MSEC_PER_SEC * (100 * HZ / 1000)) / HZ (1000 * (100 * 300 / 1000)) / 300 (1000 * 30) / 300 = 100 Fixes: 975e155e ("sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds") Signed-off-by: Cyril Hrubis <chrubis@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Petr Vorel <pvorel@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Tested-by: Petr Vorel <pvorel@suse.cz> Link: https://lore.kernel.org/r/20230802151906.25258-2-chrubis@suse.cz
-
- 10 Aug, 2023 1 commit
-
-
Ingo Molnar authored
Pick up the EEVDF work into the main branch - it's looking good so far. Conflicts: kernel/sched/features.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 02 Aug, 2023 4 commits
-
-
Phil Auld authored
CFS bandwidth limits and NOHZ full don't play well together. Tasks can easily run well past their quotas before a remote tick does accounting. This leads to long, multi-period stalls before such tasks can run again. Currently, when presented with these conflicting requirements the scheduler is favoring nohz_full and letting the tick be stopped. However, nohz tick stopping is already best-effort, there are a number of conditions that can prevent it, whereas cfs runtime bandwidth is expected to be enforced. Make the scheduler favor bandwidth over stopping the tick by setting TICK_DEP_BIT_SCHED when the only running task is a cfs task with runtime limit enabled. We use cfs_b->hierarchical_quota to determine if the task requires the tick. Add check in pick_next_task_fair() as well since that is where we have a handle on the task that is actually going to be running. Add check in sched_can_stop_tick() to cover some edge cases such as nr_running going from 2->1 and the 1 remains the running task. Reviewed-By: Ben Segall <bsegall@google.com> Signed-off-by: Phil Auld <pauld@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230712133357.381137-3-pauld@redhat.com
-
Phil Auld authored
In cgroupv2 cfs_b->hierarchical_quota is set to -1 for all task groups due to the previous fix simply taking the min. It should reflect a limit imposed at that level or by an ancestor. Even though cgroupv2 does not require child quota to be less than or equal to that of its ancestors the task group will still be constrained by such a quota so this should be shown here. Cgroupv1 continues to set this correctly. In both cases, add initialization when a new task group is created based on the current parent's value (or RUNTIME_INF in the case of root_task_group). Otherwise, the field is wrong until a quota is changed after creation and __cfs_schedulable() is called. Fixes: c53593e5 ("sched, cgroup: Don't reject lower cpu.max on ancestors") Signed-off-by: Phil Auld <pauld@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ben Segall <bsegall@google.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230714125746.812891-1-pauld@redhat.com
-
Johannes Weiner authored
Peter is kind enough to route the low-volume psi patches through the scheduler tree, but he is frequently not CC'd on them. While he is matched through the SCHEDULER maintainers and reviewers on kern/sched/*, that list is long, and mostly not applicable to psi code. Thus, patch submitters often just CC the explicit PSI entries. Add him to that section, to make sure he gets those patches. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230801133235.GA1766885@cmpxchg.org
-
Randy Dunlap authored
Users of KERNFS should select it to enforce its being built, so do this to prevent a build error. In file included from ../kernel/sched/build_utility.c:97: ../kernel/sched/psi.c: In function 'psi_trigger_poll': ../kernel/sched/psi.c:1479:17: error: implicit declaration of function 'kernfs_generic_poll' [-Werror=implicit-function-declaration] 1479 | kernfs_generic_poll(t->of, wait); Fixes: aff03707 ("sched/psi: use kernfs polling functions for PSI trigger polling") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Suren Baghdasaryan <surenb@google.com> Link: lore.kernel.org/r/202307310732.r65EQFY0-lkp@intel.com
-
- 26 Jul, 2023 2 commits
-
-
Chen Yu authored
The flags of the child of a given scheduling domain are used to initialize the flags of its scheduling groups. When the child of a scheduling domain is degenerated, the flags of its local scheduling group need to be updated to align with the flags of its new child domain. The flag SD_SHARE_CPUCAPACITY was aligned in Commit bf2dc42d ("sched/topology: Propagate SMT flags when removing degenerate domain"). Further generalize this alignment so other flags can be used later, such as in cluster-based task wakeup. [1] Reported-by: Yicong Yang <yangyicong@huawei.com> Suggested-by: Ricardo Neri <ricardo.neri@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Reviewed-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20230713013133.2314153-1-yu.c.chen@intel.com
-
Vincent Guittot authored
There is no need to use runnable_avg when estimating util_est and that even generates wrong behavior because one includes blocked tasks whereas the other one doesn't. This can lead to accounting twice the waking task p, once with the blocked runnable_avg and another one when adding its util_est. cpu's runnable_avg is already used when computing util_avg which is then compared with util_est. In some situation, feec will not select prev_cpu but another one on the same performance domain because of higher max_util Fixes: 7d0583cf ("sched/fair, cpufreq: Introduce 'runnable boosting'") Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lore.kernel.org/r/20230706135144.324311-1-vincent.guittot@linaro.org
-
- 19 Jul, 2023 12 commits
-
-
Peter Zijlstra authored
This allows place_entity() to consider ENQUEUE_WAKEUP and ENQUEUE_MIGRATED. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230531124604.274010996@infradead.org
-
Peter Zijlstra authored
EEVDF uses this tunable as the base request/slice -- make sure the name reflects this. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230531124604.205287511@infradead.org
-
Peter Zijlstra authored
EEVDF is a better defined scheduling policy, as a result it has less heuristics/tunables. There is no compelling reason to keep CFS around. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230531124604.137187212@infradead.org
-
Peter Zijlstra authored
Using lag is both more correct and simpler when moving between runqueues. Notable, min_vruntime() was invented as a cheap approximation of avg_vruntime() for this very purpose (SMP migration). Since we now have the real thing; use it. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230531124604.068911180@infradead.org
-
Peter Zijlstra authored
Removes the FAIR_SLEEPERS code in favour of the new LAG based placement. Specifically, the whole FAIR_SLEEPER thing was a very crude approximation to make up for the lack of lag based placement, specifically the 'service owed' part. This is important for things like 'starve' and 'hackbench'. One side effect of FAIR_SLEEPER is that it caused 'small' unfairness, specifically, by always ignoring up-to 'thresh' sleeptime it would have a 50%/50% time distribution for a 50% sleeper vs a 100% runner, while strictly speaking this should (of course) result in a 33%/67% split (as CFS will also do if the sleep period exceeds 'thresh'). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230531124604.000198861@infradead.org
-
Peter Zijlstra authored
Where CFS is currently a WFQ based scheduler with only a single knob, the weight. The addition of a second, latency oriented parameter, makes something like WF2Q or EEVDF based a much better fit. Specifically, EEVDF does EDF like scheduling in the left half of the tree -- those entities that are owed service. Except because this is a virtual time scheduler, the deadlines are in virtual time as well, which is what allows over-subscription. EEVDF has two parameters: - weight, or time-slope: which is mapped to nice just as before - request size, or slice length: which is used to compute the virtual deadline as: vd_i = ve_i + r_i/w_i Basically, by setting a smaller slice, the deadline will be earlier and the task will be more eligible and ran earlier. Tick driven preemption is driven by request/slice completion; while wakeup preemption is driven by the deadline. Because the tree is now effectively an interval tree, and the selection is no longer 'leftmost', over-scheduling is less of a problem. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230531124603.931005524@infradead.org
-
Peter Zijlstra authored
While slightly sub-optimal, updating the augmented data while going down the tree during lookup would be faster -- alas the augment interface does not currently allow for that, provide a generic helper to add a node to an augmented cached tree. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230531124603.862983648@infradead.org
-
Peter Zijlstra authored
With the introduction of avg_vruntime, it is possible to approximate lag (the entire purpose of introducing it in fact). Use this to do lag based placement over sleep+wake. Specifically, the FAIR_SLEEPERS thing places things too far to the left and messes up the deadline aspect of EEVDF. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230531124603.794929315@infradead.org
-
Peter Zijlstra authored
With the introduction of avg_vruntime() there is no need to use worse approximations. Take the 0-lag point as starting point for inserting new tasks. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230531124603.722361178@infradead.org
-
Peter Zijlstra authored
In order to move to an eligibility based scheduling policy, we need to have a better approximation of the ideal scheduler. Specifically, for a virtual time weighted fair queueing based scheduler the ideal scheduler will be the weighted average of the individual virtual runtimes (math in the comment). As such, compute the weighted average to approximate the ideal scheduler -- note that the approximation is in the individual task behaviour, which isn't strictly conformant. Specifically consider adding a task with a vruntime left of center, in this case the average will move backwards in time -- something the ideal scheduler would of course never do. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230531124603.654144274@infradead.org
-
Ingo Molnar authored
Sync with upstream fixes before applying EEVDF. Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Chin Yik Ming authored
The rename in 2f064a59 ("sched: Change task_struct::state") missed the comments. [ mingo: Improved the changelog. ] Signed-off-by: Chin Yik Ming <yikming2222@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@kernel.org> Link: https://lore.kernel.org/r/20230717064952.2804-1-yikming2222@gmail.com
-
- 16 Jul, 2023 10 commits
-
-
Linus Torvalds authored
-
https://github.com/jcmvbkbc/linux-xtensaLinus Torvalds authored
Pull xtensa fixes from Max Filippov: - fix interaction between unaligned exception handler and load/store exception handler - fix parsing ISS network interface specification string - add comment about etherdev freeing to ISS network driver * tag 'xtensa-20230716' of https://github.com/jcmvbkbc/linux-xtensa: xtensa: fix unaligned and load/store configuration interaction xtensa: ISS: fix call to split_if_spec xtensa: ISS: add comment about etherdev freeing
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull perf fix from Borislav Petkov: - Fix a lockdep warning when the event given is the first one, no event group exists yet but the code still goes and iterates over event siblings * tag 'perf_urgent_for_v6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix lockdep warning in for_each_sibling_event() on SPR
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull objtool fixes from Borislav Petkov: - Mark copy_iovec_from_user() __noclone in order to prevent gcc from doing an inter-procedural optimization and confuse objtool - Initialize struct elf fully to avoid build failures * tag 'objtool_urgent_for_v6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: iov_iter: Mark copy_iovec_from_user() noclone objtool: initialize all of struct elf
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull scheduler fixes from Borislav Petkov: - Remove a cgroup from under a polling process properly - Fix the idle sibling selection * tag 'sched_urgent_for_v6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/psi: use kernfs polling functions for PSI trigger polling sched/fair: Use recent_used_cpu to test p->cpus_ptr
-
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrlLinus Torvalds authored
Pull pin control fixes from Linus Walleij: "I'm mostly on vacation but what would vacation be without a few critical fixes so people can use their gaming laptops when hiding away from the sun (or rain)? - Fix a really annoying interrupt storm in the AMD driver affecting Asus TUF gaming notebooks - Fix device tree parsing in the Renesas driver" * tag 'pinctrl-v6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: amd: Unify debounce handling into amd_pinconf_set() pinctrl: amd: Drop pull up select configuration pinctrl: amd: Use amd_pinconf_set() for all config options pinctrl: amd: Only use special debounce behavior for GPIO 0 pinctrl: renesas: rzg2l: Handle non-unique subnode names pinctrl: renesas: rzv2m: Handle non-unique subnode names
-
git://git.samba.org/sfrench/cifs-2.6Linus Torvalds authored
Pull smb client fixes from Steve French: - Two reconnect fixes: important fix to address inFlight count to leak (which can leak credits), and fix for better handling a deleted share - DFS fix - SMB1 cleanup fix - deferred close fix * tag '6.5-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix mid leak during reconnection after timeout threshold cifs: is_network_name_deleted should return a bool smb: client: fix missed ses refcounting smb: client: Fix -Wstringop-overflow issues cifs: if deferred close is disabled then close files immediately
-
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds authored
Pull powerpc fixes from Michael Ellerman: - Fix Speculation_Store_Bypass reporting in /proc/self/status on Power10 - Fix HPT with 4K pages since recent changes by implementing pmd_same() - Fix 64-bit native_hpte_remove() to be irq-safe Thanks to Aneesh Kumar K.V, Nageswara R Sastry, and Russell Currey. * tag 'powerpc-6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/mm/book3s64/hash/4k: Add pmd_same callback for 4K page size powerpc/64e: Fix obtool warnings in exceptions-64e.S powerpc/security: Fix Speculation_Store_Bypass reporting on Power10 powerpc/64s: Fix native_hpte_remove() to be irq-safe
-
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linuxLinus Torvalds authored
Pull hardening fixes from Kees Cook: - Remove LTO-only suffixes from promoted global function symbols (Yonghong Song) - Remove unused .text..refcount section from vmlinux.lds.h (Petr Pavlu) - Add missing __always_inline to sparc __arch_xchg() (Arnd Bergmann) - Claim maintainership of string routines * tag 'hardening-v6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: sparc: mark __arch_xchg() as __always_inline MAINTAINERS: Foolishly claim maintainership of string routines kallsyms: strip LTO-only suffixes from promoted global functions vmlinux.lds.h: Remove a reference to no longer used sections .text..refcount
-
Linus Torvalds authored
Merge tag 'probes-fixes-v6.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probe fixes from Masami Hiramatsu: - fprobe: Add a comment why fprobe will be skipped if another kprobe is running in fprobe_kprobe_handler(). - probe-events: Fix some issues related to fetch-arguments: - Fix double counting of the string length for user-string and symstr. This will require longer buffer in the array case. - Fix not to count error code (minus value) for the total used length in array argument. This makes the total used length shorter. - Fix to update dynamic used data size counter only if fetcharg uses the dynamic size data. This may mis-count the used dynamic data size and corrupt data. - Revert "tracing: Add "(fault)" name injection to kernel probes" because that did not work correctly with a bug, and we agreed the current '(fault)' output (instead of '"(fault)"' like a string) explains what happened more clearly. - Fix to record 0-length (means fault access) data_loc data in fetch function itself, instead of store_trace_args(). If we record an array of string, this will fix to save fault access data on each entry of the array correctly. * tag 'probes-fixes-v6.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails Revert "tracing: Add "(fault)" name injection to kernel probes" tracing/probes: Fix to update dynamic data counter if fetcharg uses it tracing/probes: Fix not to count error code to total length tracing/probes: Fix to avoid double count of the string length on the array fprobes: Add a comment why fprobe_kprobe_handler exits if kprobe is running
-
- 15 Jul, 2023 5 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spiLinus Torvalds authored
Pull spi fixes from Mark Brown: "A couple of fairly minor driver specific fixes here, plus a bunch of maintainership and admin updates. Nothing too remarkable" * tag 'spi-fix-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: mailmap: add entry for Jonas Gorski MAINTAINERS: add myself for spi-bcm63xx spi: s3c64xx: clear loopback bit after loopback test spi: bcm63xx: fix max prepend length MAINTAINERS: Add myself as a maintainer for Microchip SPI
-
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmapLinus Torvalds authored
Pull regmap fix from Mark Brown: "One fix for an out of bounds access in the interupt code here" * tag 'regmap-fix-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap-irq: Fix out-of-bounds access when allocating config buffers
-
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommuLinus Torvalds authored
Pull iommu fixes from Joerg Roedel: - Fix a regression causing a crash on sysfs access of iommu-group specific files - Fix signedness bug in SVA code * tag 'iommu-fixes-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/sva: Fix signedness bug in iommu_sva_alloc_pasid() iommu: Fix crash during syfs iommu_groups/N/type
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 CFI fixes from Peter Zijlstra: "Fix kCFI/FineIBT weaknesses The primary bug Alyssa noticed was that with FineIBT enabled function prologues have a spurious ENDBR instruction: __cfi_foo: endbr64 subl $hash, %r10d jz 1f ud2 nop 1: foo: endbr64 <--- *sadface* This means that any indirect call that fails to target the __cfi symbol and instead targets (the regular old) foo+0, will succeed due to that second ENDBR. Fixing this led to the discovery of a single indirect call that was still doing this: ret_from_fork(). Since that's an assembly stub the compiler would not generate the proper kCFI indirect call magic and it would not get patched. Brian came up with the most comprehensive fix -- convert the thing to C with only a very thin asm wrapper. This ensures the kernel thread boostrap is a proper kCFI call. While discussing all this, Kees noted that kCFI hashes could/should be poisoned to seal all functions whose address is never taken, further limiting the valid kCFI targets -- much like we already do for IBT. So what was a 'simple' observation and fix cascaded into a bunch of inter-related CFI infrastructure fixes" * tag 'x86_urgent_for_6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cfi: Only define poison_cfi() if CONFIG_X86_KERNEL_IBT=y x86/fineibt: Poison ENDBR at +0 x86: Rewrite ret_from_fork() in C x86/32: Remove schedule_tail_wrapper() x86/cfi: Extend ENDBR sealing to kCFI x86/alternative: Rename apply_ibt_endbr() x86/cfi: Extend {JMP,CAKK}_NOSPEC comment
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds authored
Pull SCSI fixes from James Bottomley: "This is a bunch of small driver fixes and a larger rework of zone disk handling (which reaches into blk and nvme). The aacraid array-bounds fix is now critical since the security people turned on -Werror for some build tests, which now fail without it" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: storvsc: Handle SRB status value 0x30 scsi: block: Improve checks in blk_revalidate_disk_zones() scsi: block: virtio_blk: Set zone limits before revalidating zones scsi: block: nullblk: Set zone limits before revalidating zones scsi: nvme: zns: Set zone limits before revalidating zones scsi: sd_zbc: Set zone limits before revalidating zones scsi: ufs: core: Add support for qTimestamp attribute scsi: aacraid: Avoid -Warray-bounds warning scsi: ufs: ufs-mediatek: Add dependency for RESET_CONTROLLER scsi: ufs: core: Update contact email for monitor sysfs nodes scsi: scsi_debug: Remove dead code scsi: qla2xxx: Use vmalloc_array() and vcalloc() scsi: fnic: Use vmalloc_array() and vcalloc() scsi: qla2xxx: Fix error code in qla2x00_start_sp() scsi: qla2xxx: Silence a static checker warning scsi: lpfc: Fix a possible data race in lpfc_unregister_fcf_rescan()
-