Commits · ab407a1919d2676ddc5761ed459d4cc5c7be18ed · Kirill Smelkov / linux

13 Feb, 2023 18 commits

Merge tag 'clocksource.2023.02.06b' of... · ab407a19

Thomas Gleixner authored Feb 13, 2023

Merge tag 'clocksource.2023.02.06b' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into timers/core

Pull clocksource watchdog changes from Paul McKenney:

o Improvements to clocksource-watchdog console messages.

o Loosening of the clocksource-watchdog skew criteria to match
those of NTP (500 parts per million, relaxed from 400 parts
per million). If it is good enough for NTP, it is good enough
for the clocksource watchdog.

o Suspend clocksource-watchdog checking temporarily when high
memory latencies are detected. This avoids the false-positive
clock-skew events that have been seen on production systems
running memory-intensive workloads.

o On systems where the TSC is deemed trustworthy, use it as the
watchdog timesource, but only when specifically requested using
the tsc=watchdog kernel boot parameter. This permits clock-skew
events to be detected, but avoids forcing workloads to use the
slow HPET and ACPI PM timers. These last two timers are slow
enough to cause systems to be needlessly marked bad on the one
hand, and real skew does sometimes happen on production systems
running production workloads on the other. And sometimes it is
the fault of the TSC, or at least of the firmware that told the
kernel to program the TSC with the wrong frequency.

o Add a tsc=recalibrate kernel boot parameter to allow the kernel
to diagnose cases where the TSC hardware works fine, but was told
by firmware to tick at the wrong frequency. Such cases are rare,
but they really have happened on production systems.

Link: https://lore.kernel.org/r/20230210193640.GA3325193@paulmck-ThinkPad-P17-Gen-1


Merge tag 'timers-v6.3-rc1' of https://git.linaro.org/people/daniel.lezcano/linux into timers/core · 7b0f95f2

Thomas Gleixner authored Feb 13, 2023

Pull clocksource/event changes from Daniel Lezcano:

   - Add rktimer for rv1126 Rockchip based board (Jagan Teki)

   - Initialize hrtimer based broadcast clock event device on RISC-V
     before C3STOP can be used (Conor Dooley)

   - Add DT binding for RISC-V timer and add the C3STOP flag if the DT
     tells the timer can not wake up the CPU (Anup Patel)

   - Increase the RISC-V timer rating as it is more efficient than mmio
     timers (Samuel Holland)

   - Drop obsolete dependency on COMPILE_TEST on microchip-pit64b as the
     OF is already depending on it (Jean Delvare)

   - Mark sh_cmt, sh_tmu, em_sti drivers as non-removable (Uwe
     Kleine-König)

   - Add binding description for mediatek,mt8365-systimer (Bernhard
     Rosenkränzer)

   - Add compatibles for T-Head's C9xx (Icenowy Zheng)

   - Restrict the microchip-pit64b compilation to the ARM architecture
     and add the delay timer (Claudiu Beznea)

   - Set the static key to select the SBI or Sstc timer sooner to prevent
     the first call to use the SBI while Sstc must be used (Matt Evans)

   - Add the CLOCK_EVT_FEAT_DYNIRQ flag to optimize the timer wake up on
     the sun4i platform (Yangtao Li)

Link: https://lore.kernel.org/r/b7d1d982-d717-2930-b353-19b92cbe390f@linaro.org


clocksource/drivers/timer-sun4i: Add CLOCK_EVT_FEAT_DYNIRQ · 5ccb51b0

Yangtao Li authored Feb 09, 2023

Add CLOCK_EVT_FEAT_DYNIRQ so that the affinity of the timer IRQ can be set
at runtime to the core that needs to be woken up. Otherwise core0, say,
has to send an IPI to wake up core1. With CLOCK_EVT_FEAT_DYNIRQ set, the
broadcast timer can wake up the cores directly and no IPI is needed.

Systems with cpuidle enabled benefit from this feature in particular.
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Link: https://lore.kernel.org/r/20230209040239.24710-1-frank.li@vivo.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>


clocksource/drivers/em_sti: Mark driver as non-removable · cf16f631

Uwe Kleine-König authored Feb 07, 2023

The comment in the remove callback suggests that the driver is not
supposed to be unbound. However returning an error code in the remove
callback doesn't accomplish that. Instead set the suppress_bind_attrs
property (which makes it impossible to unbind the driver via sysfs).
The only remaining way to unbind a em_sti device would be module
unloading, but that doesn't apply here, as the driver cannot be built as
a module.

Also drop the useless remove callback.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230207193010.469495-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>


clocksource/drivers/sh_tmu: Mark driver as non-removable · d8c695d3

Uwe Kleine-König authored Feb 07, 2023

The comment in the remove callback suggests that the driver is not
supposed to be unbound. However returning an error code in the remove
callback doesn't accomplish that. Instead set the suppress_bind_attrs
property (which makes it impossible to unbind the driver via sysfs).
The only remaining way to unbind a sh_tmu device would be module
unloading, but that doesn't apply here, as the driver cannot be built as
a module.

Also drop the useless remove callback.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230207193614.472060-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>


clocksource/drivers/riscv: Patch riscv_clock_next_event() jump before first use · 225b9596

Matt Evans authored Feb 01, 2023

A static key is used to select between SBI and Sstc timer usage in
riscv_clock_next_event(), but currently the direction is resolved
after cpuhp_setup_state() is called (which sets the next event).  The
first event will therefore fall through the sbi_set_timer() path; this
breaks Sstc-only systems.  So, apply the jump patching before first
use.

Fixes: 9f7a8ff6 ("RISC-V: Prefer sstc extension if available")
Signed-off-by: Matt Evans <mev@rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/CDDAB2D0-264E-42F3-8E31-BA210BEB8EC1@rivosinc.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>


clocksource/drivers/timer-microchip-pit64b: Add delay timer · f3af3dc7

Claudiu Beznea authored Feb 03, 2023

Add delay timer.
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20230203130537.1921608-3-claudiu.beznea@microchip.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>


clocksource/drivers/timer-microchip-pit64b: Select driver only on ARM · d19c8b2e

Claudiu Beznea authored Feb 03, 2023

Microchip PIT64B is currently available on ARM based devices. Thus
select it only for ARM. This allows implementing delay timer.
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20230203130537.1921608-2-claudiu.beznea@microchip.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>


dt-bindings: timer: sifive,clint: add compatibles for T-Head's C9xx · abd873af

Icenowy Zheng authored Feb 02, 2023

The T-Head C906/C910 CLINT is not compliant with the SiFive ones (nor
with the upcoming ACLINT spec) because it lacks the mtime
register.

Add a compatible string formatted like the C9xx-specific PLIC
compatible, and do not allow a SiFive one as fallback because they're
not really compliant.
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Samuel Holland <samuel@sholland.org>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230202072814.319903-1-uwu@icenowy.me
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>


dt-bindings: timer: mediatek,mtk-timer: add MT8365 · 27788e01

Bernhard Rosenkränzer authored Jan 25, 2023

Add binding description for mediatek,mt8365-systimer
Signed-off-by: Bernhard Rosenkränzer <bero@baylibre.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230125143503.1015424-8-bero@baylibre.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>


clocksource/drivers/riscv: Get rid of clocksource_arch_init() callback · 3aff0403

Lad Prabhakar authored Dec 29, 2022

Having a clocksource_arch_init() callback always sets vdso_clock_mode to
VDSO_CLOCKMODE_ARCHTIMER if GENERIC_GETTIMEOFDAY is enabled, which is
required for the riscv-timer.

This works for platforms where only the riscv-timer clocksource is present.
On platforms where other clock sources are available, we want them to
register with vdso_clock_mode set to VDSO_CLOCKMODE_NONE.

On the Renesas RZ/Five SoC, the OSTM block can be used as a clocksource [0].
To avoid multiple clock sources being registered as VDSO_CLOCKMODE_ARCHTIMER,
set vdso_clock_mode in the riscv-timer driver instead of in the
clocksource_arch_init() callback, as is already done for the ARM/ARM64
architectures.

[0] drivers/clocksource/renesas-ostm.c
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Tested-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Samuel Holland <samuel@sholland.org>
Link: https://lore.kernel.org/r/20221229224601.103851-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>


clocksource/drivers/sh_cmt: Mark driver as non-removable · c3daa475

Uwe Kleine-König authored Jan 23, 2023

The comment in the remove callback suggests that the driver is not
supposed to be unbound. However returning an error code in the remove
callback doesn't accomplish that. Instead set the suppress_bind_attrs
property (which makes it impossible to unbind the driver via sysfs).
The only remaining way to unbind a sh_cmt device would be module
unloading, but that doesn't apply here, as the driver cannot be built as
a module.

Also drop the useless remove callback.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230123220221.48164-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>


clocksource/drivers/timer-microchip-pit64b: Drop obsolete dependency on COMPILE_TEST · 8d17aca9

Jean Delvare authored Jan 21, 2023

Since commit 0166dc11 ("of: make CONFIG_OF user selectable"), it
is possible to test-build any driver which depends on OF on any
architecture by explicitly selecting OF. Therefore depending on
COMPILE_TEST as an alternative is no longer needed.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Claudiu Beznea <claudiu.beznea@microchip.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20230121182911.4e47a5ff@endymion.delvare
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>


clocksource/drivers/riscv: Increase the clock source rating · 674402b0

Samuel Holland authored Dec 27, 2022

RISC-V provides an architectural clock source via the time CSR. This
clock source exposes a 64-bit counter synchronized across all CPUs.
Because it is accessed using a CSR, it is much more efficient to read
than MMIO clock sources. For example, on the Allwinner D1, reading the
sun4i timer in a loop takes 131 cycles/iteration, while reading the
RISC-V time CSR takes only 5 cycles/iteration.

Adjust the RISC-V clock source rating so it is preferred over the
various platform-specific MMIO clock sources.
Signed-off-by: Samuel Holland <samuel@sholland.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://lore.kernel.org/r/20221228004444.61568-1-samuel@sholland.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>


clocksource/drivers/timer-riscv: Set CLOCK_EVT_FEAT_C3STOP based on DT · 8932a953

Anup Patel authored Jan 03, 2023

We should set CLOCK_EVT_FEAT_C3STOP for a clock_event_device only
when riscv,timer-cannot-wake-cpu DT property is present in the RISC-V
timer DT node.

This way CLOCK_EVT_FEAT_C3STOP feature is set for clock_event_device
based on RISC-V platform capabilities rather than having it set for
all RISC-V platforms.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230103141102.772228-4-apatel@ventanamicro.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>


dt-bindings: timer: Add bindings for the RISC-V timer device · e2bcf2d8

Anup Patel authored Jan 03, 2023

We add DT bindings for a separate RISC-V timer DT node which can
be used to describe implementation specific behaviour (such as
timer interrupt not triggered during non-retentive suspend).
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230103141102.772228-3-apatel@ventanamicro.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>


RISC-V: time: initialize hrtimer based broadcast clock event device · 8b3b8fbb

Conor Dooley authored Jan 03, 2023

Similarly to commit 022eb8ae ("ARM: 8938/1: kernel: initialize
broadcast hrtimer based clock event device"), RISC-V needs to initialize
the hrtimer based broadcast clock event device before C3STOP can be used.
Otherwise, the introduction of C3STOP for the RISC-V arch timer in
commit 232ccac1 ("clocksource/drivers/riscv: Events are stopped
during CPU suspend") leaves us without any broadcast timer registered.
This prevents the kernel from entering oneshot mode, which breaks timer
behaviour, for example clock_nanosleep().

With a test app that sleeps on each CPU for 6, 5, 4 and 3 ms respectively,
with HZ=250 and C3STOP enabled, the sleep times are rounded up to the next jiffy:
== CPU: 1 ==      == CPU: 2 ==      == CPU: 3 ==      == CPU: 4 ==
Mean: 7.974992    Mean: 7.976534    Mean: 7.962591    Mean: 3.952179
Std Dev: 0.154374 Std Dev: 0.156082 Std Dev: 0.171018 Std Dev: 0.076193
Hi: 9.472000      Hi: 10.495000     Hi: 8.864000      Hi: 4.736000
Lo: 6.087000      Lo: 6.380000      Lo: 4.872000      Lo: 3.403000
Samples: 521      Samples: 521      Samples: 521      Samples: 521

Link: https://lore.kernel.org/linux-riscv/YzYTNQRxLr7Q9JR0@spud/
Fixes: 232ccac1 ("clocksource/drivers/riscv: Events are stopped during CPU suspend")
Suggested-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Samuel Holland <samuel@sholland.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230103141102.772228-2-apatel@ventanamicro.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>


dt-bindings: timer: rk-timer: Add rktimer for rv1126 · b3cbfb79

Jagan Teki authored Nov 24, 2022

Add rockchip timer compatible string for rockchip rv1126.
Signed-off-by: Jagan Teki <jagan@edgeble.ai>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221123183124.6911-3-jagan@edgeble.ai
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>


09 Feb, 2023 1 commit

time/debug: Fix memory leak with using debugfs_lookup() · 5b268d8a

Greg Kroah-Hartman authored Feb 02, 2023

When calling debugfs_lookup() the result must have dput() called on it,
otherwise the memory will leak over time.  To make things simpler, just
call debugfs_lookup_and_remove() instead which handles all of the logic at
once.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230202151214.2306822-1-gregkh@linuxfoundation.org


07 Feb, 2023 1 commit

clocksource: Enable TSC watchdog checking of HPET and PMTMR only when requested · 0051293c

Paul E. McKenney authored Feb 01, 2023

Unconditionally enabling TSC watchdog checking of the HPET and PMTMR
clocksources can degrade latency and performance.  Therefore, provide
a new "watchdog" option to the tsc= boot parameter that opts into such
checking.  Note that tsc=watchdog is overridden by a tsc=nowatchdog
regardless of their relative positions in the list of boot parameters.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Waiman Long <longman@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
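
Illustration (not part of the commit): the precedence rule described above, where tsc=nowatchdog wins over tsc=watchdog regardless of position, can be sketched in userspace C. All names here (tsc_opts_sketch, tsc_opt_record, tsc_watchdog_enabled) are hypothetical, and this is not the kernel's tsc= parser; it only shows one way to make the outcome independent of argument order.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical state collected while scanning tsc= option values. */
struct tsc_opts_sketch {
	bool watchdog_requested;   /* saw "watchdog" */
	bool watchdog_forbidden;   /* saw "nowatchdog" */
};

/* Record one option value; unknown values are ignored in this sketch. */
static void tsc_opt_record(struct tsc_opts_sketch *o, const char *val)
{
	assert(o && val);
	if (strcmp(val, "watchdog") == 0)
		o->watchdog_requested = true;
	else if (strcmp(val, "nowatchdog") == 0)
		o->watchdog_forbidden = true;
}

/*
 * The decision is taken only after every option has been recorded,
 * so the order on the command line cannot matter: the opt-out wins.
 */
static bool tsc_watchdog_enabled(const struct tsc_opts_sketch *o)
{
	return o->watchdog_requested && !o->watchdog_forbidden;
}
```

Recording "watchdog" then "nowatchdog", or the reverse, leaves tsc_watchdog_enabled() false in both cases; only "watchdog" alone enables it.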


06 Feb, 2023 1 commit

posix-timers: Use atomic64_try_cmpxchg() in __update_gt_cputime() · 915d4ad3

Uros Bizjak authored Jan 16, 2023

Use atomic64_try_cmpxchg() instead of atomic64_cmpxchg() in
__update_gt_cputime(). The x86 CMPXCHG instruction returns success in ZF
flag, so this change saves a compare after cmpxchg() (and related move
instruction in front of cmpxchg()).

Also, atomic64_try_cmpxchg() implicitly assigns old *ptr value to "old"
when cmpxchg() fails.  There is no need to re-read the value in the loop.

No functional change intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230116165337.5810-1-ubizjak@gmail.com
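
Illustration (not part of the commit): C11's atomic_compare_exchange_strong() has the same "write the observed value back into the expected argument on failure" behavior that the message describes for atomic64_try_cmpxchg(). The following is a userspace sketch under that analogy; update_gt_sketch is a hypothetical stand-in, not the kernel function.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Raise *target to new_val if new_val is greater than the current value.
 * Userspace stand-in for the pattern used in __update_gt_cputime().
 */
static void update_gt_sketch(_Atomic uint64_t *target, uint64_t new_val)
{
	assert(target);
	uint64_t cur = atomic_load(target);

	do {
		if (new_val <= cur)
			return;   /* nothing to do, stored value is already >= */
		/*
		 * On failure the call stores the value it found into 'cur',
		 * so the loop does not need to re-read *target explicitly.
		 */
	} while (!atomic_compare_exchange_strong(target, &cur, new_val));
}
```

With a plain compare-and-swap that returns the old value, the caller would have to compare the result against 'cur' and reload it by hand; the try-style form folds both steps into the call.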


02 Feb, 2023 2 commits

clocksource: Verify HPET and PMTMR when TSC unverified · efc8b329

Paul E. McKenney authored Dec 21, 2022

On systems with two or fewer sockets, when the boot CPU has CONSTANT_TSC,
NONSTOP_TSC, and TSC_ADJUST, clocksource watchdog verification of the
TSC is disabled.  This works well much of the time, but there is the
occasional production-level system that meets all of these criteria, but
which still has a TSC that skews significantly from atomic-clock time.
This is usually attributed to a firmware or hardware fault.  Yes, the
various NTP daemons do express their opinions of userspace-to-atomic-clock
time skew, but they put them in various places, depending on the daemon
and distro in question.  It would therefore be good for the kernel to
have some clue that there is a problem.

The old behavior of marking the TSC unstable is a non-starter because a
great many workloads simply cannot tolerate the overheads and latencies
of the various non-TSC clocksources.  In addition, NTP-corrected systems
sometimes can tolerate significant kernel-space time skew as long as
the userspace time sources are within epsilon of atomic-clock time.

Therefore, when watchdog verification of TSC is disabled, enable it for
HPET and PMTMR (AKA ACPI PM timer).  This provides the needed in-kernel
time-skew diagnostic without degrading the system's performance.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Waiman Long <longman@redhat.com>
Cc: <x86@kernel.org>
Tested-by: Feng Tang <feng.tang@intel.com>


x86/tsc: Add option to force frequency recalibration with HW timer · a7ec817d

Feng Tang authored Jan 04, 2023

The kernel assumes that the TSC frequency which is provided by the
hardware / firmware via MSRs or CPUID(0x15) is correct after applying
a few basic consistency checks. This disables the TSC recalibration
against HPET or PM timer.

As a result there is no mechanism to validate that frequency in cases
where a firmware or hardware defect is suspected. In one case, a user
measured the TSC frequency against an atomic clock and reported an
inaccuracy, which was later fixed in firmware.

Add an option 'recalibrate' for 'tsc' kernel parameter to force the
tsc freq recalibration with HPET or PM timer, and warn if the
deviation from previous value is more than about 500 PPM, which
provides a way to verify the data from hardware / firmware.

There is no functional change to existing work flow.

Recently there was a real-world case: "The 40ms/s divergence between
TSC and HPET was observed on hardware that is quite recent" [1]. On
that platform the TSC frequency of 1896 MHz was obtained from CPUID(0x15),
while forced recalibration against both HPET and PMTIMER yielded a
value of 1975 MHz, which also matched the check done by 'chronyd',
indicating a BIOS or firmware problem.

[Thanks tglx for helping improving the commit log]
[ paulmck: Wordsmith Kconfig help text. ]

[1]. https://lore.kernel.org/lkml/20221117230910.GI4001@paulmck-ThinkPad-P17-Gen-1/
Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: <x86@kernel.org>
Cc: <linux-doc@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
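
Illustration (not part of the commit): the figures quoted above can be checked with a little arithmetic. The sketch below computes the deviation in parts per million between the firmware-reported frequency and the recalibrated one, using integer math and frequencies in kHz. The function names are hypothetical; only the 1896 MHz and 1975 MHz values and the roughly 500 PPM threshold come from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Absolute deviation of measured_khz from reference_khz, in PPM (truncated). */
static uint64_t deviation_ppm(uint64_t reference_khz, uint64_t measured_khz)
{
	assert(reference_khz != 0);
	uint64_t diff = measured_khz > reference_khz
			? measured_khz - reference_khz
			: reference_khz - measured_khz;
	return diff * 1000000ULL / reference_khz;
}

/* True when the deviation is larger than limit_ppm. */
static bool deviation_exceeds(uint64_t reference_khz, uint64_t measured_khz,
			      uint64_t limit_ppm)
{
	return deviation_ppm(reference_khz, measured_khz) > limit_ppm;
}
```

For 1896000 kHz against 1975000 kHz the difference is 79000 kHz, which is exactly 1/24 of the reference, so deviation_ppm() returns 41666. That is about 41.7 ms of drift per second, consistent with the "40ms/s" quoted in [1] and far above a 500 PPM limit.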


31 Jan, 2023 3 commits

vdso/bits.h: Add BIT_ULL() for the sake of consistency · cbdb1f16

Andy Shevchenko authored Nov 28, 2022

The minimization done in 3945ff37 ("linux/bits.h: Extract common header
for vDSO") was required to isolate the VDSO build from the larger kernel
header impact.

The split added some inconsistency since BIT() and BIT_ULL() are now
defined in different files, which confuses the unprepared reader.

Move BIT_ULL() to vdso/bits.h. No functional change.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20221128141003.77929-1-andriy.shevchenko@linux.intel.com


hrtimer: Ignore slack time for RT tasks in schedule_hrtimeout_range() · 0c52310f

Davidlohr Bueso authored Jan 23, 2023

While in theory the timer can be triggered before expires + delta, RT
tasks really have no business allowing any extra slack time, so override
any value passed by the user and always use zero for
schedule_hrtimeout_range() calls. Furthermore, this is similar to
what the nanosleep(2) family already does with current->timer_slack_ns.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230123173206.6764-3-dave@stgolabs.net


hrtimer: Rely on rt_task() for DL tasks too · c14fd3dc

Davidlohr Bueso authored Jan 23, 2023

Checking dl_task() is redundant as rt_task() returns true for deadline
tasks too.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230123173206.6764-2-dave@stgolabs.net
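
Illustration (not part of the commit): the redundancy follows from how the two priority checks nest. The sketch below assumes, as in kernel headers of that period, that deadline tasks sit below priority 0 and RT priorities below 100; the constants and function names carry a _SKETCH suffix to mark them as stand-ins rather than the kernel's own definitions.

```c
#include <assert.h>

/* Assumed limits: deadline priorities are < 0, RT priorities are < 100. */
#define MAX_DL_PRIO_SKETCH   0
#define MAX_RT_PRIO_SKETCH 100

/* The deadline range must lie inside the RT range for the argument to hold. */
static_assert(MAX_DL_PRIO_SKETCH <= MAX_RT_PRIO_SKETCH,
	      "deadline range is expected to be a subset of the RT range");

static int dl_prio_sketch(int prio)
{
	return prio < MAX_DL_PRIO_SKETCH;
}

static int rt_prio_sketch(int prio)
{
	return prio < MAX_RT_PRIO_SKETCH;
}
```

Under these assumptions every priority that satisfies dl_prio_sketch() also satisfies rt_prio_sketch(), so a test of the form "dl || rt" reduces to "rt".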


24 Jan, 2023 1 commit

clocksource: Suspend the watchdog temporarily when high read latency detected · b7082cdf

Feng Tang authored Dec 20, 2022

Bugs have been reported on 8-socket x86 machines in which the TSC was
wrongly disabled while the system was under heavy workload.

 [ 818.380354] clocksource: timekeeping watchdog on CPU336: hpet wd-wd read-back delay of 1203520ns
 [ 818.436160] clocksource: wd-tsc-wd read-back delay of 181880ns, clock-skew test skipped!
 [ 819.402962] clocksource: timekeeping watchdog on CPU338: hpet wd-wd read-back delay of 324000ns
 [ 819.448036] clocksource: wd-tsc-wd read-back delay of 337240ns, clock-skew test skipped!
 [ 819.880863] clocksource: timekeeping watchdog on CPU339: hpet read-back delay of 150280ns, attempt 3, marking unstable
 [ 819.936243] tsc: Marking TSC unstable due to clocksource watchdog
 [ 820.068173] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
 [ 820.092382] sched_clock: Marking unstable (818769414384, 1195404998)
 [ 820.643627] clocksource: Checking clocksource tsc synchronization from CPU 267 to CPUs 0,4,25,70,126,430,557,564.
 [ 821.067990] clocksource: Switched to clocksource hpet

This can be reproduced by running memory intensive 'stream' tests,
or some of the stress-ng subcases such as 'ioport'.

The reason for these issues is that when the system is under heavy load, the
read latency of the clocksources can be very high.  Even lightweight TSC
reads can show high latencies, and latencies are much worse for external
clocksources such as HPET or the ACPI PM timer.  These latencies can
result in false-positive clocksource-unstable determinations.

These issues were initially reported by a customer running on a production
system, and this problem was reproduced on several generations of Xeon
servers, especially when running the stress-ng test.  These Xeon servers
were not production systems, but they did have the latest steppings
and firmware.

Given that the clocksource watchdog is a continual diagnostic check with
frequency of twice a second, there is no need to rush it when the system
is under heavy load.  Therefore, when high clocksource read latencies
are detected, suspend the watchdog timer for 5 minutes.
Signed-off-by: Feng Tang <feng.tang@intel.com>
Acked-by: Waiman Long <longman@redhat.com>
Cc: John Stultz <jstultz@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Feng Tang <feng.tang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>


11 Jan, 2023 1 commit

timers: Prevent union confusion from unexpected restart_syscall() · 9f76d591

Jann Horn authored Jan 05, 2023

The nanosleep syscalls use the restart_block mechanism, with a quirk:
The `type` and `rmtp`/`compat_rmtp` fields are set up unconditionally on
syscall entry, while the rest of the restart_block is only set up in the
unlikely case that the syscall is actually interrupted by a signal (or
pseudo-signal) that doesn't have a signal handler.

If the restart_block was set up by a previous syscall (futex(...,
FUTEX_WAIT, ...) or poll()) and hasn't been invalidated somehow since then,
this will clobber some of the union fields used by futex_wait_restart() and
do_restart_poll().

If userspace afterwards wrongly calls the restart_syscall syscall,
futex_wait_restart()/do_restart_poll() will read struct fields that have
been clobbered.

This doesn't actually lead to anything particularly interesting because
none of the union fields contain trusted kernel data, and
futex(..., FUTEX_WAIT, ...) and poll() aren't syscalls where it makes much
sense to apply seccomp filters to their arguments.

So the current consequences are just of the "if userspace does bad stuff,
it can damage itself, and that's not a problem" flavor.

But still, it seems like a hazard for future developers, so invalidate the
restart_block when partly setting it up in the nanosleep syscalls.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230105134403.754986-1-jannh@google.com
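
Illustration (not part of the commit): the hazard and the remedy can be modeled in userspace C. The struct layout, field names and handlers below are simplified, hypothetical stand-ins for the kernel's restart_block machinery; the sketch assumes that "invalidating" means pointing the restart handler at one that refuses to restart.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct restart_block_sketch;
typedef long (*restart_fn_sketch)(struct restart_block_sketch *);

/* One handler pointer plus per-syscall state that shares storage. */
struct restart_block_sketch {
	restart_fn_sketch fn;
	union {
		struct { unsigned int *uaddr; unsigned int val; } futex;
		struct { int type; void *rmtp; } nanosleep;
	};
};

/* Handler that refuses to restart anything. */
static long no_restart_sketch(struct restart_block_sketch *rb)
{
	(void)rb;
	return -EINTR;
}

/* Stand-in for a futex restart handler: it trusts the futex fields. */
static long futex_restart_sketch(struct restart_block_sketch *rb)
{
	return (long)rb->futex.val;
}

/*
 * Stand-in for nanosleep entry. It writes only part of the union, which
 * overlaps the futex fields, so it first invalidates the stale handler.
 */
static void nanosleep_entry_sketch(struct restart_block_sketch *rb, void *rmtp)
{
	rb->fn = no_restart_sketch;   /* the invalidation step */
	rb->nanosleep.type = 1;
	rb->nanosleep.rmtp = rmtp;
}

/* Stand-in for restart_syscall(): dispatch to whatever handler is set. */
static long restart_syscall_sketch(struct restart_block_sketch *rb)
{
	assert(rb && rb->fn);
	return rb->fn(rb);
}
```

Without the invalidation step, a stray restart after nanosleep entry would run the old futex handler against fields that the nanosleep setup has overwritten; with it, the stray restart simply returns -EINTR.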


08 Jan, 2023 3 commits

Linux 6.2-rc3 · b7bfaa76
Linus Torvalds authored Jan 08, 2023


Merge tag 'powerpc-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 93928d48

Linus Torvalds authored Jan 08, 2023

Pull powerpc fixes from Michael Ellerman:

 - Three fixes for various bogosity in our linker script, revealed
   by the recent commit which changed discard behaviour with some
   toolchains.

* tag 'powerpc-6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/vmlinux.lds: Don't discard .comment
  powerpc/vmlinux.lds: Don't discard .rela* for relocatable builds
  powerpc/vmlinux.lds: Define RUNTIME_DISCARD_EXIT


Merge tag 'fixes-2023-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock · e9ffbf16

Linus Torvalds authored Jan 08, 2023

Pull memblock fixes from Mike Rapoport:
 "Small fixes in kernel-doc and tests:

   - Fix kernel-doc for memblock_phys_free() to use correct names for
     the counterpart allocation methods

   - Fix compilation error in memblock tests"

* tag 'fixes-2023-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  memblock: Fix doc for memblock_phys_free
  memblock tests: Fix compilation error.


07 Jan, 2023 6 commits

Merge tag 'nfs-for-6.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 9b43a525

Linus Torvalds authored Jan 07, 2023

Pull NFS client fixes from Trond Myklebust:

 - Fix a race in the RPCSEC_GSS upcall code that causes hung RPC calls

 - Fix a broken coalescing test in the pNFS file layout driver

 - Ensure that the access cache rcu path also applies the login test

 - Fix up for a sparse warning

* tag 'nfs-for-6.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Fix up a sparse warning
  NFS: Judge the file access cache's timestamp in rcu path
  pNFS/filelayout: Fix coalescing test for single DS
  SUNRPC: ensure the matching upcall is in-flight upon downcall


Merge tag '6.2-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · f18fca98

Linus Torvalds authored Jan 07, 2023

Pull cifs fixes from Steve French:
 "cifs/smb3 client fixes:

   - two multichannel fixes

   - three reconnect fixes

   - unmap fix"

* tag '6.2-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix interface count calculation during refresh
  cifs: refcount only the selected iface during interface update
  cifs: protect access of TCP_Server_Info::{dstaddr,hostname}
  cifs: fix race in assemble_neg_contexts()
  cifs: ignore ipc reconnect failures during dfs failover
  cifs: Fix kmap_local_page() unmapping


Merge tag 'devicetree-fixes-for-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · 0007c040

Linus Torvalds authored Jan 07, 2023

Pull devicetree fixes from Rob Herring:

 - Fix DT memory scanning for some MIPS boards when memory is not
   specified in DT

 - Redo CONFIG_CMDLINE* handling for missing /chosen node. The first
   attempt broke PS3 (and possibly other PPC platforms).

 - Fix constraints in QCom Soundwire schema

* tag 'devicetree-fixes-for-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  of: fdt: Honor CONFIG_CMDLINE* even without /chosen node, take 2
  Revert "of: fdt: Honor CONFIG_CMDLINE* even without /chosen node"
  dt-bindings: soundwire: qcom,soundwire: correct sizes related to number of ports
  of/fdt: run soc memory setup when early_init_dt_scan_memory fails


Merge tag 'usb-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · c28bdeaf

Linus Torvalds authored Jan 07, 2023

Pull USB fixes from Greg KH:
 "Here are some small USB driver fixes for 6.2-rc3 that resolve some
  reported issues. They include:

   - oft-reported ulpi problem, so the offending commit is reverted

   - dwc3 driver bugfixes for recent changes

   - fotg210 fixes

  Most of these have been in linux-next for a while, the last few were
  on the mailing list for a long time and passed all the 0-day bot
  testing so all should be fine with them as well"

* tag 'usb-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: dwc3: gadget: Ignore End Transfer delay on teardown
  usb: dwc3: xilinx: include linux/gpio/consumer.h
  usb: fotg210-udc: fix error return code in fotg210_udc_probe()
  usb: fotg210: fix OTG-only build
  Revert "usb: ulpi: defer ulpi_register on ulpi_read_id timeout"


Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 4a4dcea0

Linus Torvalds authored Jan 07, 2023

Pull rdma fixes from Jason Gunthorpe:
 "Most noticeable is that Yishai found a big data corruption regression
  due to a change in the scatterlist:

   - Do not wrongly combine non-contiguous pages in scatterlist

   - Fix compilation warnings on gcc 13

   - Oops when using some mlx5 stats

   - Bad enforcement of atomic responder resources in mlx5"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  lib/scatterlist: Fix to merge contiguous pages into the last SG properly
  RDMA/mlx5: Fix validation of max_rd_atomic caps for DC
  RDMA/mlx5: Fix mlx5_ib_get_hw_stats when used for device
  RDMA/srp: Move large values to a new enum for gcc13

4a4dcea0

Merge tag 'kbuild-fixes-v6.2-2' of... · a7c4127a

Linus Torvalds authored Jan 07, 2023

Merge tag 'kbuild-fixes-v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Fix single *.ko build

 - Fix module builds when vmlinux.o or Module.symvers is missing

* tag 'kbuild-fixes-v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: readd -w option when vmlinux.o or Module.symver is missing
  kbuild: fix single *.ko build

a7c4127a

06 Jan, 2023 3 commits

Merge tag 'drm-fixes-2023-01-06' of git://anongit.freedesktop.org/drm/drm · 0a715535

Linus Torvalds authored Jan 06, 2023

Pull drm fixes from Daniel Vetter:
 "Still not much, but more than last week. Dave should be back next week
  from the beaching.

  drivers:
   - i915-gvt fixes
   - amdgpu/kfd fixes
   - panfrost bo refcounting fix
   - meson afbc corruption fix
   - imx plane width fix

  core:
   - drm/sched fixes
   - drm/mm kunit test fix
   - dma-buf export error handling fixes"

* tag 'drm-fixes-2023-01-06' of git://anongit.freedesktop.org/drm/drm:
  Revert "drm/amd/display: Enable Freesync Video Mode by default"
  drm/i915/gvt: fix double free bug in split_2MB_gtt_entry
  drm/i915/gvt: use atomic operations to change the vGPU status
  drm/i915/gvt: fix vgpu debugfs clean in remove
  drm/i915/gvt: fix gvt debugfs destroy
  drm/i915: unpin on error in intel_vgpu_shadow_mm_pin()
  drm/amd/display: Uninitialized variables causing 4k60 UCLK to stay at DPM1 and not DPM0
  drm/amdkfd: Fix kernel warning during topology setup
  drm/scheduler: Fix lockup in drm_sched_entity_kill()
  drm/imx: ipuv3-plane: Fix overlay plane width
  drm/scheduler: Fix lockup in drm_sched_entity_kill()
  drm/virtio: Fix memory leak in virtio_gpu_object_create()
  drm/meson: Reduce the FIFO lines held when AFBC is not used
  drm/tests: reduce drm_mm_test stack usage
  drm/panfrost: Fix GEM handle creation ref-counting
  drm/plane-helper: Add the missing declaration of drm_atomic_state
  dma-buf: fix dma_buf_export init order v2

0a715535

tpm: Allow system suspend to continue when TPM suspend fails · 1382999a

Jason A. Donenfeld authored Jan 06, 2023

TPM 1 is sometimes broken across system suspends, due to races or
locking issues or something else that haven't been diagnosed or fixed
yet, most likely having to do with concurrent reads from the TPM's
hardware random number generator driver. These issues prevent the system
from actually suspending, with errors like:

  tpm tpm0: A TPM error (28) occurred continue selftest
  ...
  tpm tpm0: A TPM error (28) occurred attempting get random
  ...
  tpm tpm0: Error (28) sending savestate before suspend
  tpm_tis 00:08: PM: __pnp_bus_suspend(): tpm_pm_suspend+0x0/0x80 returns 28
  tpm_tis 00:08: PM: dpm_run_callback(): pnp_bus_suspend+0x0/0x10 returns 28
  tpm_tis 00:08: PM: failed to suspend: error 28
  PM: Some devices failed to suspend, or early wake event detected

This issue was partially fixed by 23393c64 ("char: tpm: Protect
tpm_pm_suspend with locks"), in a last minute 6.1 commit that Linus took
directly because the TPM maintainers weren't available. However, it
seems like this just addresses the most common cases of the bug, rather
than addressing it entirely. So there are more things to fix still,
apparently.

In lieu of actually fixing the underlying bug, just allow system suspend
to continue, so that laptops still go to sleep fine. Later, this can be
reverted when the real bug is fixed.
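
The workaround described above can be sketched in plain userspace C. This is
only an illustration under stated assumptions, not the driver code:
fake_tpm_savestate() and fake_tpm_pm_suspend() are hypothetical names, and the
value 28 is borrowed from the log excerpt above.

```c
#include <stdio.h>

/* Hypothetical stand-in for the TPM savestate step; not a kernel API. */
static int fake_tpm_savestate(int simulate_failure)
{
    /* 28 mirrors the error code shown in the log excerpt above. */
    return simulate_failure ? 28 : 0;
}

/*
 * Hypothetical suspend callback: report a savestate failure, but still
 * return 0 so the caller is free to continue suspending the system.
 */
static int fake_tpm_pm_suspend(int simulate_failure)
{
    int rc = fake_tpm_savestate(simulate_failure);

    if (rc)
        fprintf(stderr, "tpm: ignoring error %d while suspending\n", rc);

    return 0; /* never veto system suspend */
}
```

The design choice is that a suspend callback returning nonzero aborts the
whole suspend, so swallowing the error trades TPM state correctness for a
laptop that still goes to sleep.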

Link: https://lore.kernel.org/lkml/7cbe96cf-e0b5-ba63-d1b4-f63d2e826efa@suse.cz/
Cc: stable@vger.kernel.org # 6.1+
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Luigi Semenzato <semenzato@chromium.org>
Cc: Peter Huewe <peterhuewe@gmx.de>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Johannes Altmanninger <aclopte@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1382999a

hfs/hfsplus: avoid WARN_ON() for sanity check, use proper error handling · cb7a95af

Linus Torvalds authored Jan 04, 2023

Commit 55d1cbbb ("hfs/hfsplus: use WARN_ON for sanity check") fixed
a build warning by turning a comment into a WARN_ON(), but it turns out
that syzbot then complains because it can trigger said warning with a
corrupted hfs image.

The warning actually does warn about a bad situation, but we are much
better off just handling it as the error it is.  So rather than warn
about us doing bad things, stop doing the bad things and return -EIO.

While at it, also fix a memory leak that was introduced by an earlier
fix for a similar syzbot warning situation, and add a check for one case
that historically wasn't handled at all (i.e. neither comment nor
subsequent WARN_ON).
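
The pattern described above, returning an error instead of warning and then
carrying on, might look roughly like the following userspace sketch. The
helper read_key_checked() and its parameters are hypothetical and do not
correspond to the actual hfs/hfsplus functions.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy a key of `len` bytes, where `len` comes from an
 * untrusted (possibly corrupted) on-disk image, into a fixed-size buffer.
 *
 * A WARN_ON-style check would only complain and then copy anyway; here an
 * oversized length is treated as an I/O error and nothing is copied.
 */
static int read_key_checked(void *dst, size_t dst_size,
                            const void *src, size_t len)
{
    if (len > dst_size)
        return -EIO; /* corrupted length: refuse instead of warning */

    memcpy(dst, src, len);
    return 0;
}
```

Callers then propagate the negative return value, so a corrupted image
produces a failed operation rather than a kernel warning.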

Reported-by: syzbot+7bb7cd3595533513a9e7@syzkaller.appspotmail.com
Fixes: 55d1cbbb ("hfs/hfsplus: use WARN_ON for sanity check")
Fixes: 8d824e69 ("hfs: fix OOB Read in __hfs_brec_find")
Link: https://lore.kernel.org/lkml/000000000000dbce4e05f170f289@google.com/
Tested-by: Michael Schmitz <schmitzmic@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cb7a95af