1. 13 Feb, 2023 18 commits
  2. 09 Feb, 2023 1 commit
  3. 07 Feb, 2023 1 commit
  4. 06 Feb, 2023 1 commit
  5. 02 Feb, 2023 2 commits
    • clocksource: Verify HPET and PMTMR when TSC unverified · efc8b329
      Paul E. McKenney authored
      On systems with two or fewer sockets, when the boot CPU has CONSTANT_TSC,
      NONSTOP_TSC, and TSC_ADJUST, clocksource watchdog verification of the
      TSC is disabled.  This works well much of the time, but there is the
      occasional production-level system that meets all of these criteria, but
      which still has a TSC that skews significantly from atomic-clock time.
      This is usually attributed to a firmware or hardware fault.  Yes, the
      various NTP daemons do express their opinions of userspace-to-atomic-clock
      time skew, but they put them in various places, depending on the daemon
      and distro in question.  It would therefore be good for the kernel to
      have some clue that there is a problem.
      
      The old behavior of marking the TSC unstable is a non-starter because a
      great many workloads simply cannot tolerate the overheads and latencies
      of the various non-TSC clocksources.  In addition, NTP-corrected systems
      sometimes can tolerate significant kernel-space time skew as long as
      the userspace time sources are within epsilon of atomic-clock time.
      
      Therefore, when watchdog verification of TSC is disabled, enable it for
      HPET and PMTMR (AKA ACPI PM timer).  This provides the needed in-kernel
      time-skew diagnostic without degrading the system's performance.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Waiman Long <longman@redhat.com>
      Cc: <x86@kernel.org>
      Tested-by: Feng Tang <feng.tang@intel.com>
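The selection rule described in the commit above can be modelled roughly as follows. This is a minimal Python sketch, not kernel code: the `BootCpu` type and both function names are hypothetical, and the assumption that only the TSC is verified in the non-exempt case is a simplification made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BootCpu:
    """Hypothetical summary of the boot CPU features named in the commit."""
    sockets: int
    constant_tsc: bool
    nonstop_tsc: bool
    tsc_adjust: bool


def tsc_watchdog_disabled(cpu):
    """True when the TSC is exempt from clocksource watchdog verification."""
    return (cpu.sockets <= 2 and cpu.constant_tsc
            and cpu.nonstop_tsc and cpu.tsc_adjust)


def clocksources_to_verify(cpu):
    """Return the clocksource names the watchdog should check.

    Assumption: when the TSC is exempt, HPET and the ACPI PM timer are
    verified instead, so that time skew still produces a kernel diagnostic.
    """
    if tsc_watchdog_disabled(cpu):
        return {"hpet", "acpi_pm"}
    return {"tsc"}
```

In this model a two-socket system with all three TSC features ends up with HPET and the PM timer under verification, while the TSC stays in use as the clocksource.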
    • x86/tsc: Add option to force frequency recalibration with HW timer · a7ec817d
      Feng Tang authored
      The kernel assumes that the TSC frequency which is provided by the
      hardware / firmware via MSRs or CPUID(0x15) is correct after applying
      a few basic consistency checks. This disables the TSC recalibration
      against HPET or PM timer.
      
      As a result there is no mechanism to validate that frequency in cases
      where a firmware or hardware defect is suspected. In one case, a
      user measured the TSC frequency against an atomic clock and
      reported an inaccuracy, which was later fixed in firmware.
      
      Add an option 'recalibrate' for 'tsc' kernel parameter to force the
      tsc freq recalibration with HPET or PM timer, and warn if the
      deviation from previous value is more than about 500 PPM, which
      provides a way to verify the data from hardware / firmware.
      
      There is no functional change to the existing workflow.
      
      Recently there was a real-world case: "The 40ms/s divergence between
      TSC and HPET was observed on hardware that is quite recent" [1]. On
      that platform the TSC frequency of 1896 MHz was obtained from CPUID(0x15),
      while forced recalibration against both HPET and PMTIMER yielded
      a value of 1975 MHz, which also matched the check by the 'chronyd'
      software, indicating a BIOS or firmware problem.
      
      [Thanks to tglx for helping improve the commit log]
      [ paulmck: Wordsmith Kconfig help text. ]
      
      [1]. https://lore.kernel.org/lkml/20221117230910.GI4001@paulmck-ThinkPad-P17-Gen-1/

      Signed-off-by: Feng Tang <feng.tang@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: <x86@kernel.org>
      Cc: <linux-doc@vger.kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
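The deviation check described in the commit above can be illustrated with the figures it quotes. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, frequencies are taken in kHz, and the 500 PPM threshold is the approximate value given in the commit message rather than the exact comparison the kernel performs.

```python
def deviation_ppm(reported_khz, measured_khz):
    """Deviation of the measured frequency from the reported one, in PPM."""
    return abs(measured_khz - reported_khz) * 1_000_000 // reported_khz


def needs_warning(reported_khz, measured_khz, threshold_ppm=500):
    """True when the recalibrated value deviates by more than the threshold."""
    return deviation_ppm(reported_khz, measured_khz) > threshold_ppm


# Figures from the commit message: CPUID(0x15) reported 1896 MHz,
# while recalibration against HPET/PMTIMER measured 1975 MHz.
print(deviation_ppm(1_896_000, 1_975_000))   # 41666
print(needs_warning(1_896_000, 1_975_000))   # True
```

A deviation of about 41,666 PPM corresponds to roughly 41.7 ms of drift per second, which is in line with the "40ms/s divergence" quoted in the report.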
  6. 31 Jan, 2023 3 commits
  7. 24 Jan, 2023 1 commit
    • clocksource: Suspend the watchdog temporarily when high read latency detected · b7082cdf
      Feng Tang authored
      Bugs have been reported on 8-socket x86 machines in which the TSC was
      wrongly disabled when the system was under heavy workload.
      
       [ 818.380354] clocksource: timekeeping watchdog on CPU336: hpet wd-wd read-back delay of 1203520ns
       [ 818.436160] clocksource: wd-tsc-wd read-back delay of 181880ns, clock-skew test skipped!
       [ 819.402962] clocksource: timekeeping watchdog on CPU338: hpet wd-wd read-back delay of 324000ns
       [ 819.448036] clocksource: wd-tsc-wd read-back delay of 337240ns, clock-skew test skipped!
       [ 819.880863] clocksource: timekeeping watchdog on CPU339: hpet read-back delay of 150280ns, attempt 3, marking unstable
       [ 819.936243] tsc: Marking TSC unstable due to clocksource watchdog
       [ 820.068173] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
       [ 820.092382] sched_clock: Marking unstable (818769414384, 1195404998)
       [ 820.643627] clocksource: Checking clocksource tsc synchronization from CPU 267 to CPUs 0,4,25,70,126,430,557,564.
       [ 821.067990] clocksource: Switched to clocksource hpet
      
      This can be reproduced by running memory intensive 'stream' tests,
      or some of the stress-ng subcases such as 'ioport'.
      
      The reason for these issues is that when the system is under heavy load, the
      read latency of the clocksources can be very high.  Even lightweight TSC
      reads can show high latencies, and latencies are much worse for external
      clocksources such as HPET or the ACPI PM timer.  These latencies can
      result in false-positive clocksource-unstable determinations.
      
      These issues were initially reported by a customer running on a production
      system, and this problem was reproduced on several generations of Xeon
      servers, especially when running the stress-ng test.  These Xeon servers
      were not production systems, but they did have the latest steppings
      and firmware.
      
      Given that the clocksource watchdog is a continual diagnostic check with
      frequency of twice a second, there is no need to rush it when the system
      is under heavy load.  Therefore, when high clocksource read latencies
      are detected, suspend the watchdog timer for 5 minutes.
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      Acked-by: Waiman Long <longman@redhat.com>
      Cc: John Stultz <jstultz@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stephen Boyd <sboyd@kernel.org>
      Cc: Feng Tang <feng.tang@intel.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
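The back-off behaviour described in the commit above can be modelled as a small state machine. This is a minimal Python sketch, not the kernel implementation: the `Watchdog` class, the `check` method, and the `MAX_READ_DELAY_NS` threshold are hypothetical, and only the 5-minute suspension comes from the commit message.

```python
SUSPEND_SECONDS = 5 * 60        # from the commit message: suspend for 5 minutes
MAX_READ_DELAY_NS = 100_000     # hypothetical threshold, chosen for illustration


class Watchdog:
    """Toy model of a clocksource watchdog that backs off under load."""

    def __init__(self):
        self.suspended_until = 0.0

    def check(self, now, read_delay_ns, skew_detected):
        """Run one watchdog pass at time `now` (seconds) and return a verdict."""
        if now < self.suspended_until:
            return "suspended"
        if read_delay_ns > MAX_READ_DELAY_NS:
            # The reads themselves were too slow to trust, so skip the skew
            # test and pause instead of marking the clocksource unstable.
            self.suspended_until = now + SUSPEND_SECONDS
            return "suspended"
        return "unstable" if skew_detected else "ok"


wd = Watchdog()
print(wd.check(0.0, 150_280, skew_detected=True))    # suspended
print(wd.check(300.0, 1_000, skew_detected=False))   # ok
```

The point of the design is that a slow read says more about system load than about the clocksource, so the skew verdict is deferred rather than acted on.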
  8. 11 Jan, 2023 1 commit
    • timers: Prevent union confusion from unexpected restart_syscall() · 9f76d591
      Jann Horn authored
      The nanosleep syscalls use the restart_block mechanism, with a quirk:
      The `type` and `rmtp`/`compat_rmtp` fields are set up unconditionally on
      syscall entry, while the rest of the restart_block is only set up in the
      unlikely case that the syscall is actually interrupted by a signal (or
      pseudo-signal) that doesn't have a signal handler.
      
      If the restart_block was set up by a previous syscall (futex(...,
      FUTEX_WAIT, ...) or poll()) and hasn't been invalidated somehow since then,
      this will clobber some of the union fields used by futex_wait_restart() and
      do_restart_poll().
      
      If userspace afterwards wrongly calls the restart_syscall syscall,
      futex_wait_restart()/do_restart_poll() will read struct fields that have
      been clobbered.
      
      This doesn't actually lead to anything particularly interesting because
      none of the union fields contain trusted kernel data, and
      futex(..., FUTEX_WAIT, ...) and poll() aren't syscalls where it makes much
      sense to apply seccomp filters to their arguments.
      
      So the current consequences are just of the "if userspace does bad stuff,
      it can damage itself, and that's not a problem" flavor.
      
      But still, it seems like a hazard for future developers, so invalidate the
      restart_block when partly setting it up in the nanosleep syscalls.
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20230105134403.754986-1-jannh@google.com
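The union confusion described in the commit above can be reproduced in miniature with `ctypes`. This is a minimal Python sketch under stated assumptions: the field layouts are simplified and hypothetical rather than the real `struct restart_block`, the restart handler is represented by its name as a string, and invalidation is modelled as resetting that handler to `do_no_restart_syscall`.

```python
import ctypes


class FutexRestart(ctypes.Structure):
    # Simplified, hypothetical layout of the futex member of the union.
    _fields_ = [("uaddr", ctypes.c_uint64), ("val", ctypes.c_uint32)]


class NanosleepRestart(ctypes.Structure):
    # Simplified, hypothetical layout of the nanosleep member of the union.
    _fields_ = [("type", ctypes.c_uint32), ("rmtp", ctypes.c_uint64)]


class RestartUnion(ctypes.Union):
    # Both members share the same storage, as in a C union.
    _fields_ = [("futex", FutexRestart), ("nanosleep", NanosleepRestart)]


class RestartBlock:
    def __init__(self):
        self.fn = "do_no_restart_syscall"   # name of the restart handler
        self.u = RestartUnion()


def futex_wait_interrupted(rb, uaddr, val):
    """A futex wait that was interrupted fills in its union member."""
    rb.u.futex.uaddr = uaddr
    rb.u.futex.val = val
    rb.fn = "futex_wait_restart"


def nanosleep_entry(rb, rmtp, invalidate):
    """nanosleep sets `type` and `rmtp` unconditionally on syscall entry."""
    if invalidate:
        rb.fn = "do_no_restart_syscall"     # assumed form of the invalidation
    rb.u.nanosleep.type = 1
    rb.u.nanosleep.rmtp = rmtp


rb = RestartBlock()
futex_wait_interrupted(rb, uaddr=0x1000, val=42)
nanosleep_entry(rb, rmtp=0x7FFD0000, invalidate=False)
# The handler still points at the futex path, but its fields were clobbered.
print(rb.fn, rb.u.futex.val != 42)
```

With `invalidate=True` the handler no longer points at the futex restart path, so a stray `restart_syscall()` would not read the overwritten fields.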
  9. 08 Jan, 2023 3 commits
  10. 07 Jan, 2023 6 commits
  11. 06 Jan, 2023 3 commits
    • Merge tag 'drm-fixes-2023-01-06' of git://anongit.freedesktop.org/drm/drm · 0a715535
      Linus Torvalds authored
      Pull drm fixes from Daniel Vetter:
       "Still not much, but more than last week. Dave should be back next week
        from the beaching.
      
        drivers:
         - i915-gvt fixes
         - amdgpu/kfd fixes
         - panfrost bo refcounting fix
         - meson afbc corruption fix
         - imx plane width fix
      
        core:
         - drm/sched fixes
         - drm/mm kunit test fix
         - dma-buf export error handling fixes"
      
      * tag 'drm-fixes-2023-01-06' of git://anongit.freedesktop.org/drm/drm:
        Revert "drm/amd/display: Enable Freesync Video Mode by default"
        drm/i915/gvt: fix double free bug in split_2MB_gtt_entry
        drm/i915/gvt: use atomic operations to change the vGPU status
        drm/i915/gvt: fix vgpu debugfs clean in remove
        drm/i915/gvt: fix gvt debugfs destroy
        drm/i915: unpin on error in intel_vgpu_shadow_mm_pin()
        drm/amd/display: Uninitialized variables causing 4k60 UCLK to stay at DPM1 and not DPM0
        drm/amdkfd: Fix kernel warning during topology setup
        drm/scheduler: Fix lockup in drm_sched_entity_kill()
        drm/imx: ipuv3-plane: Fix overlay plane width
        drm/scheduler: Fix lockup in drm_sched_entity_kill()
        drm/virtio: Fix memory leak in virtio_gpu_object_create()
        drm/meson: Reduce the FIFO lines held when AFBC is not used
        drm/tests: reduce drm_mm_test stack usage
        drm/panfrost: Fix GEM handle creation ref-counting
        drm/plane-helper: Add the missing declaration of drm_atomic_state
        dma-buf: fix dma_buf_export init order v2
    • tpm: Allow system suspend to continue when TPM suspend fails · 1382999a
      Jason A. Donenfeld authored
      TPM 1 is sometimes broken across system suspends, due to races or
      locking issues or something else that hasn't been diagnosed or fixed
      yet, most likely having to do with concurrent reads from the TPM's
      hardware random number generator driver. These issues prevent the system
      from actually suspending, with errors like:
      
        tpm tpm0: A TPM error (28) occurred continue selftest
        ...
        tpm tpm0: A TPM error (28) occurred attempting get random
        ...
        tpm tpm0: Error (28) sending savestate before suspend
        tpm_tis 00:08: PM: __pnp_bus_suspend(): tpm_pm_suspend+0x0/0x80 returns 28
        tpm_tis 00:08: PM: dpm_run_callback(): pnp_bus_suspend+0x0/0x10 returns 28
        tpm_tis 00:08: PM: failed to suspend: error 28
        PM: Some devices failed to suspend, or early wake event detected
      
      This issue was partially fixed by 23393c64 ("char: tpm: Protect
      tpm_pm_suspend with locks"), in a last minute 6.1 commit that Linus took
      directly because the TPM maintainers weren't available. However, it
      seems like this just addresses the most common cases of the bug, rather
      than addressing it entirely. So there are more things to fix still,
      apparently.
      
      In lieu of actually fixing the underlying bug, just allow system suspend
      to continue, so that laptops still go to sleep fine. Later, this can be
      reverted when the real bug is fixed.
      
      Link: https://lore.kernel.org/lkml/7cbe96cf-e0b5-ba63-d1b4-f63d2e826efa@suse.cz/
      Cc: stable@vger.kernel.org # 6.1+
      Reported-by: Vlastimil Babka <vbabka@suse.cz>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Luigi Semenzato <semenzato@chromium.org>
      Cc: Peter Huewe <peterhuewe@gmx.de>
      Cc: Jarkko Sakkinen <jarkko@kernel.org>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Johannes Altmanninger <aclopte@gmail.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hfs/hfsplus: avoid WARN_ON() for sanity check, use proper error handling · cb7a95af
      Linus Torvalds authored
      Commit 55d1cbbb ("hfs/hfsplus: use WARN_ON for sanity check") fixed
      a build warning by turning a comment into a WARN_ON(), but it turns out
      that syzbot then complains because it can trigger said warning with a
      corrupted hfs image.
      
      The warning actually does warn about a bad situation, but we are much
      better off just handling it as the error it is.  So rather than warn
      about us doing bad things, stop doing the bad things and return -EIO.
      
      While at it, also fix a memory leak that was introduced by an earlier
      fix for a similar syzbot warning situation, and add a check for one case
      that historically wasn't handled at all (i.e. neither comment nor
      subsequent WARN_ON).
      
      Reported-by: syzbot+7bb7cd3595533513a9e7@syzkaller.appspotmail.com
      Fixes: 55d1cbbb ("hfs/hfsplus: use WARN_ON for sanity check")
      Fixes: 8d824e69 ("hfs: fix OOB Read in __hfs_brec_find")
      Link: https://lore.kernel.org/lkml/000000000000dbce4e05f170f289@google.com/
      Tested-by: Michael Schmitz <schmitzmic@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Viacheslav Dubeyko <slava@dubeyko.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>