Commits · b9fa4cbd8415c57d514a45c5eb6272d40961d6c7 · Kirill Smelkov / linux

25 Jan, 2024 4 commits

Merge tag 'nfsd-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · b9fa4cbd

Linus Torvalds authored Jan 25, 2024

Pull nfsd fixes from Chuck Lever:

 - Fix in-kernel RPC UDP transport

 - Fix NFSv4.0 RELEASE_LOCKOWNER

* tag 'nfsd-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: fix RELEASE_LOCKOWNER
  SUNRPC: use request size to initialize bio_vec in svc_udp_sendto()

b9fa4cbd

Merge tag 'urgent-rcu.2024.01.24a' of https://github.com/neeraju/linux · 3cb9871f

Linus Torvalds authored Jan 25, 2024

Pull RCU fix from Neeraj Upadhyay:
 "This fixes RCU grace period stalls, which are observed when an
  outgoing CPU's quiescent state reporting results in wakeup of one of
  the grace period kthreads, to complete the grace period.

  If those kthreads have SCHED_FIFO policy, the wake up can indirectly
  arm the RT bandwith timer to the local offline CPU.

  Earlier migration of the hrtimers from the CPU introduced in commit
  5c0930cc ("hrtimers: Push pending hrtimers away from outgoing CPU
  earlier") results in this timer getting ignored.

  If the RCU grace period kthreads are waiting for RT bandwidth to be
  available, they may never be actually scheduled, resulting in RCU
  stall warnings"

* tag 'urgent-rcu.2024.01.24a' of https://github.com/neeraju/linux:
  rcu: Defer RCU kthreads wakeup when CPU is dying

3cb9871f

Merge tag 'ceph-for-6.8-rc2' of https://github.com/ceph/ceph-client · 6098d87e

Linus Torvalds authored Jan 24, 2024

Pull ceph fixes from Ilya Dryomov:
 "A fix to avoid triggering an assert in some cases where RBD exclusive
  mappings are involved and a deprecated API cleanup"

* tag 'ceph-for-6.8-rc2' of https://github.com/ceph/ceph-client:
  rbd: don't move requests to the running list on errors
  rbd: remove usage of the deprecated ida_simple_*() API

6098d87e

Merge tag 'integrity-v6.8-rc1' of... · f22face1

Linus Torvalds authored Jan 24, 2024

Merge tag 'integrity-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull integrity fix from Mimi Zohar:
 "Revert patch that required user-provided key data, since keys can be
  created from kernel-generated random numbers"

* tag 'integrity-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  Revert "KEYS: encrypted: Add check for strsep"

f22face1

24 Jan, 2024 10 commits

Merge tag 'execve-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · cf10015a

Linus Torvalds authored Jan 24, 2024

Pull execve fixes from Kees Cook:

 - Fix error handling in begin_new_exec() (Bernd Edlinger)

 - MAINTAINERS: specifically mention ELF (Alexey Dobriyan)

 - Various cleanups related to earlier open() (Askar Safin, Kees Cook)

* tag 'execve-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  exec: Distinguish in_execve from in_exec
  exec: Fix error handling in begin_new_exec()
  exec: Add do_close_execat() helper
  exec: remove useless comment
  ELF, MAINTAINERS: specifically mention ELF

cf10015a

uselib: remove use of __FMODE_EXEC · 3eab8301

Linus Torvalds authored Jan 24, 2024

Jann Horn points out that uselib() really shouldn't trigger the new
FMODE_EXEC logic introduced by commit 4759ff71 ("exec: __FMODE_EXEC
instead of in_execve for LSMs").

In fact, it shouldn't even have ever triggered the old pre-existing
logic for __FMODE_EXEC (like the NFS code that makes executables not
need read permissions).  Unlike a real execve(), that can work even with
files that are purely executable by the user (not readable), uselib()
has that MAY_READ requirement becasue it's really just a convenience
wrapper around mmap() for legacy shared libraries.

The whole FMODE_EXEC bit was originally introduced by commit
b500531e ("[PATCH] Introduce FMODE_EXEC file flag"), primarily to
give ETXTBUSY error returns for distributed filesystems.

It has since grown a few other warts (like that NFS thing), but there
really isn't any reason to use it for uselib(), and now that we are
trying to use it to replace the horrid 'tsk->in_execve' flag, it's
actively wrong.

Of course, as Jann Horn also points out, nobody should be enabling
CONFIG_USELIB in the first place in this day and age, but that's a
different discussion entirely.
Reported-by: Jann Horn <jannh@google.com>
Fixes: 4759ff71 ("exec: __FMODE_EXEC instead of in_execve for LSMs")
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3eab8301

Revert "KEYS: encrypted: Add check for strsep" · 1ed4b563

Mimi Zohar authored Jan 24, 2024

This reverts commit b4af096b.

New encrypted keys are created either from kernel-generated random
numbers or user-provided decrypted data.  Revert the change requiring
user-provided decrypted data.
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>

1ed4b563

samples/cgroup: add .gitignore file for generated samples · 443b3490

Linus Torvalds authored Jan 24, 2024

Make 'git status' quietly happy again after a full allmodconfig build.

Fixes: 60433a9d ("samples: introduce new samples subdir for cgroup")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

443b3490

exec: Distinguish in_execve from in_exec · 90383cc0

Kees Cook authored Jan 24, 2024

Just to help distinguish the fs->in_exec flag from the current->in_execve
flag, add comments in check_unsafe_exec() and copy_fs() for more
context. Also note that in_execve is only used by TOMOYO now.

Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Kees Cook <keescook@chromium.org>

90383cc0

exec: Check __FMODE_EXEC instead of in_execve for LSMs · 4759ff71

Kees Cook authored Jan 24, 2024

After commit 978ffcbf ("execve: open the executable file before
doing anything else"), current->in_execve was no longer in sync with the
open(). This broke AppArmor and TOMOYO which depend on this flag to
distinguish "open" operations from being "exec" operations.

Instead of moving around in_execve, switch to using __FMODE_EXEC, which
is where the "is this an exec?" intent is stored. Note that TOMOYO still
uses in_execve around cred handling.
Reported-by: Kevin Locke <kevin@kevinlocke.name>
Closes: https://lore.kernel.org/all/ZbE4qn9_h14OqADK@kevinlocke.nameSuggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 978ffcbf ("execve: open the executable file before doing anything else")
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: James Morris <jmorris@namei.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc:  <linux-fsdevel@vger.kernel.org>
Cc:  <linux-mm@kvack.org>
Cc:  <apparmor@lists.ubuntu.com>
Cc:  <linux-security-module@vger.kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4759ff71

rcu: Defer RCU kthreads wakeup when CPU is dying · e787644c

Frederic Weisbecker authored Dec 19, 2023

When the CPU goes idle for the last time during the CPU down hotplug
process, RCU reports a final quiescent state for the current CPU. If
this quiescent state propagates up to the top, some tasks may then be
woken up to complete the grace period: the main grace period kthread
and/or the expedited main workqueue (or kworker).

If those kthreads have a SCHED_FIFO policy, the wake up can indirectly
arm the RT bandwith timer to the local offline CPU. Since this happens
after hrtimers have been migrated at CPUHP_AP_HRTIMERS_DYING stage, the
timer gets ignored. Therefore if the RCU kthreads are waiting for RT
bandwidth to be available, they may never be actually scheduled.

This triggers TREE03 rcutorture hangs:

	 rcu: INFO: rcu_preempt self-detected stall on CPU
	 rcu:     4-...!: (1 GPs behind) idle=9874/1/0x4000000000000000 softirq=0/0 fqs=20 rcuc=21071 jiffies(starved)
	 rcu:     (t=21035 jiffies g=938281 q=40787 ncpus=6)
	 rcu: rcu_preempt kthread starved for 20964 jiffies! g938281 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
	 rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
	 rcu: RCU grace-period kthread stack dump:
	 task:rcu_preempt     state:R  running task     stack:14896 pid:14    tgid:14    ppid:2      flags:0x00004000
	 Call Trace:
	  <TASK>
	  __schedule+0x2eb/0xa80
	  schedule+0x1f/0x90
	  schedule_timeout+0x163/0x270
	  ? __pfx_process_timeout+0x10/0x10
	  rcu_gp_fqs_loop+0x37c/0x5b0
	  ? __pfx_rcu_gp_kthread+0x10/0x10
	  rcu_gp_kthread+0x17c/0x200
	  kthread+0xde/0x110
	  ? __pfx_kthread+0x10/0x10
	  ret_from_fork+0x2b/0x40
	  ? __pfx_kthread+0x10/0x10
	  ret_from_fork_asm+0x1b/0x30
	  </TASK>

The situation can't be solved with just unpinning the timer. The hrtimer
infrastructure and the nohz heuristics involved in finding the best
remote target for an unpinned timer would then also need to handle
enqueues from an offline CPU in the most horrendous way.

So fix this on the RCU side instead and defer the wake up to an online
CPU if it's too late for the local one.
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Fixes: 5c0930cc ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>

e787644c

Merge tag 'fbdev-for-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev · 1110ebe0

Linus Torvalds authored Jan 24, 2024

Pull fbdev fixes and cleanups from Helge Deller:
 "A crash fix in stifb which was missed to be included in the drm-misc
  tree, two checks to prevent wrong userspace input in sisfb and
  savagefb and two trivial printk cleanups:

   - stifb: Fix crash in stifb_blank()

   - savage/sis: Error out if pixclock equals zero

   - minor trivial cleanups"

* tag 'fbdev-for-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
  fbdev: stifb: Fix crash in stifb_blank()
  fbcon: Fix incorrect printed function name in fbcon_prepare_logo()
  fbdev: sis: Error out if pixclock equals zero
  fbdev: savage: Error out if pixclock equals zero
  fbdev: vt8500lcdfb: Remove unnecessary print function dev_err()

1110ebe0

nfsd: fix RELEASE_LOCKOWNER · edcf9725

NeilBrown authored Jan 22, 2024

The test on so_count in nfsd4_release_lockowner() is nonsense and
harmful.  Revert to using check_for_locks(), changing that to not sleep.

First: harmful.
As is documented in the kdoc comment for nfsd4_release_lockowner(), the
test on so_count can transiently return a false positive resulting in a
return of NFS4ERR_LOCKS_HELD when in fact no locks are held.  This is
clearly a protocol violation and with the Linux NFS client it can cause
incorrect behaviour.

If RELEASE_LOCKOWNER is sent while some other thread is still
processing a LOCK request which failed because, at the time that request
was received, the given owner held a conflicting lock, then the nfsd
thread processing that LOCK request can hold a reference (conflock) to
the lock owner that causes nfsd4_release_lockowner() to return an
incorrect error.

The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it
never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so
it knows that the error is impossible.  It assumes the lock owner was in
fact released so it feels free to use the same lock owner identifier in
some later locking request.

When it does reuse a lock owner identifier for which a previous RELEASE
failed, it will naturally use a lock_seqid of zero.  However the server,
which didn't release the lock owner, will expect a larger lock_seqid and
so will respond with NFS4ERR_BAD_SEQID.

So clearly it is harmful to allow a false positive, which testing
so_count allows.

The test is nonsense because ... well... it doesn't mean anything.

so_count is the sum of three different counts.
1/ the set of states listed on so_stateids
2/ the set of active vfs locks owned by any of those states
3/ various transient counts such as for conflicting locks.

When it is tested against '2' it is clear that one of these is the
transient reference obtained by find_lockowner_str_locked().  It is not
clear what the other one is expected to be.

In practice, the count is often 2 because there is precisely one state
on so_stateids.  If there were more, this would fail.

In my testing I see two circumstances when RELEASE_LOCKOWNER is called.
In one case, CLOSE is called before RELEASE_LOCKOWNER.  That results in
all the lock states being removed, and so the lockowner being discarded
(it is removed when there are no more references which usually happens
when the lock state is discarded).  When nfsd4_release_lockowner() finds
that the lock owner doesn't exist, it returns success.

The other case shows an so_count of '2' and precisely one state listed
in so_stateid.  It appears that the Linux client uses a separate lock
owner for each file resulting in one lock state per lock owner, so this
test on '2' is safe.  For another client it might not be safe.

So this patch changes check_for_locks() to use the (newish)
find_any_file_locked() so that it doesn't take a reference on the
nfs4_file and so never calls nfsd_file_put(), and so never sleeps.  With
this check is it safe to restore the use of check_for_locks() rather
than testing so_count against the mysterious '2'.

Fixes: ce3c4ad7 ("NFSD: Fix possible sleep during nfsd4_release_lockowner()")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: stable@vger.kernel.org # v6.2+
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

edcf9725

Merge tag 'trace-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 615d3006

Linus Torvalds authored Jan 23, 2024

Pull tracing and eventfs fixes from Steven Rostedt:

 - Fix histogram tracing_map insertion.

   The tracing_map_insert copies the value into the elt variable and
   then assigns the elt to the entry value. But it is possible that the
   entry value becomes visible on other CPUs before the elt is fully
   initialized. This is fixed by adding a wmb() between the
   initialization of the elt variable and assigning it.

 - Have eventfs directory have unique inode numbers.

   Having them be all the same proved to be a failure as the 'find'
   application will think that the directories are causing loops, as it
   checks for directory loops via their inodes. Have the evenfs dir
   entries get their inodes assigned when they are referenced and then
   save them in the eventfs_inode structure.

* tag 'trace-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  eventfs: Save directory inodes in the eventfs_inode structure
  tracing: Ensure visibility when inserting an element into tracing_map

615d3006

23 Jan, 2024 3 commits

eventfs: Save directory inodes in the eventfs_inode structure · 834bf76a

Steven Rostedt (Google) authored Jan 22, 2024

The eventfs inodes and directories are allocated when referenced. But this
leaves the issue of keeping consistent inode numbers and the number is
only saved in the inode structure itself. When the inode is no longer
referenced, it can be freed. When the file that the inode was representing
is referenced again, the inode is once again created, but the inode number
needs to be the same as it was before.

Just making the inode numbers the same for all files is fine, but that
does not work with directories. The find command will check for loops via
the inode number and having the same inode number for directories triggers:

  # find /sys/kernel/tracing
find: File system loop detected;
'/sys/kernel/debug/tracing/events/initcall/initcall_finish' is part of the same file system loop as
'/sys/kernel/debug/tracing/events/initcall'.
[..]

Linus pointed out that the eventfs_inode structure ends with a single
32bit int, and on 64 bit machines, there's likely a 4 byte hole due to
alignment. We can use this hole to store the inode number for the
eventfs_inode. All directories in eventfs are represented by an
eventfs_inode and that data structure can hold its inode number.

That last int was also purposely placed at the end of the structure to
prevent holes from within. Now that there's a 4 byte number to hold the
inode, both the inode number and the last integer can be moved up in the
structure for better cache locality, where the llist and rcu fields can be
moved to the end as they are only used when the eventfs_inode is being
deleted.

Link: https://lore.kernel.org/all/CAMuHMdXKiorg-jiuKoZpfZyDJ3Ynrfb8=X+c7x0Eewxn-YRdCA@mail.gmail.com/
Link: https://lore.kernel.org/linux-trace-kernel/20240122152748.46897388@gandalf.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Fixes: 53c41052 ("eventfs: Have the inodes all for files and directories all be the same")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Kees Cook <keescook@chromium.org>

834bf76a

fbdev: stifb: Fix crash in stifb_blank() · 4b088005

Helge Deller authored Jan 22, 2024

Avoid a kernel crash in stifb by providing the correct pointer to the fb_info
struct. Prior to commit e2e0b838 ("video/sticore: Remove info field from
STI struct") the fb_info struct was at the beginning of the fb struct.

Fixes: e2e0b838 ("video/sticore: Remove info field from STI struct")
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: Thomas Zimmermann <tzimmermann@suse.de>

4b088005

drm/ttm: fix ttm pool initialization for no-dma-device drivers · 7ed2632e

Fedor Pchelkin authored Jan 14, 2024

The QXL driver doesn't use any device for DMA mappings or allocations so
dev_to_node() will panic inside ttm_device_init() on NUMA systems:

  general protection fault, probably for non-canonical address 0xdffffc000000007a: 0000 [#1] PREEMPT SMP KASAN NOPTI
  KASAN: null-ptr-deref in range [0x00000000000003d0-0x00000000000003d7]
  CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.7.0+ #9
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
  RIP: 0010:ttm_device_init+0x10e/0x340
  Call Trace:
    qxl_ttm_init+0xaa/0x310
    qxl_device_init+0x1071/0x2000
    qxl_pci_probe+0x167/0x3f0
    local_pci_probe+0xe1/0x1b0
    pci_device_probe+0x29d/0x790
    really_probe+0x251/0x910
    __driver_probe_device+0x1ea/0x390
    driver_probe_device+0x4e/0x2e0
    __driver_attach+0x1e3/0x600
    bus_for_each_dev+0x12d/0x1c0
    bus_add_driver+0x25a/0x590
    driver_register+0x15c/0x4b0
    qxl_pci_driver_init+0x67/0x80
    do_one_initcall+0xf5/0x5d0
    kernel_init_freeable+0x637/0xb10
    kernel_init+0x1c/0x2e0
    ret_from_fork+0x48/0x80
    ret_from_fork_asm+0x1b/0x30
  RIP: 0010:ttm_device_init+0x10e/0x340

Fall back to NUMA_NO_NODE if there is no device for DMA.

Found by Linux Verification Center (linuxtesting.org).

Fixes: b0a7ce53 ("drm/ttm: Schedule delayed_delete worker closer")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7ed2632e

22 Jan, 2024 10 commits

Revert "btrfs: zstd: fix and simplify the inline extent decompression" · e01a83e1

Linus Torvalds authored Jan 22, 2024

This reverts commit 1e7f6def.

It causes my machine to not even boot, and Klara Modin reports that the
cause is that small zstd-compressed files return garbage when read.
Reported-by: Klara Modin <klarasmodin@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CABq1_vj4GpUeZpVG49OHCo-3sdbe2-2ROcu_xDvUG-6-5zPRXg@mail.gmail.com/Reported-and-bisected-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: David Sterba <dsterba@suse.com>
Cc: Qu Wenruo <wqu@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e01a83e1

tracing: Ensure visibility when inserting an element into tracing_map · 2b447606

Petr Pavlu authored Jan 22, 2024

Running the following two commands in parallel on a multi-processor
AArch64 machine can sporadically produce an unexpected warning about
duplicate histogram entries:

 $ while true; do
     echo hist:key=id.syscall:val=hitcount > \
       /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
     cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
     sleep 0.001
   done
 $ stress-ng --sysbadaddr $(nproc)

The warning looks as follows:

[ 2911.172474] ------------[ cut here ]------------
[ 2911.173111] Duplicates detected: 1
[ 2911.173574] WARNING: CPU: 2 PID: 12247 at kernel/trace/tracing_map.c:983 tracing_map_sort_entries+0x3e0/0x408
[ 2911.174702] Modules linked in: iscsi_ibft(E) iscsi_boot_sysfs(E) rfkill(E) af_packet(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) ena(E) tiny_power_button(E) qemu_fw_cfg(E) button(E) fuse(E) efi_pstore(E) ip_tables(E) x_tables(E) xfs(E) libcrc32c(E) aes_ce_blk(E) aes_ce_cipher(E) crct10dif_ce(E) polyval_ce(E) polyval_generic(E) ghash_ce(E) gf128mul(E) sm4_ce_gcm(E) sm4_ce_ccm(E) sm4_ce(E) sm4_ce_cipher(E) sm4(E) sm3_ce(E) sm3(E) sha3_ce(E) sha512_ce(E) sha512_arm64(E) sha2_ce(E) sha256_arm64(E) nvme(E) sha1_ce(E) nvme_core(E) nvme_auth(E) t10_pi(E) sg(E) scsi_mod(E) scsi_common(E) efivarfs(E)
[ 2911.174738] Unloaded tainted modules: cppc_cpufreq(E):1
[ 2911.180985] CPU: 2 PID: 12247 Comm: cat Kdump: loaded Tainted: G            E      6.7.0-default #2 1b58bbb22c97e4399dc09f92d309344f69c44a01
[ 2911.182398] Hardware name: Amazon EC2 c7g.8xlarge/, BIOS 1.0 11/1/2018
[ 2911.183208] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 2911.184038] pc : tracing_map_sort_entries+0x3e0/0x408
[ 2911.184667] lr : tracing_map_sort_entries+0x3e0/0x408
[ 2911.185310] sp : ffff8000a1513900
[ 2911.185750] x29: ffff8000a1513900 x28: ffff0003f272fe80 x27: 0000000000000001
[ 2911.186600] x26: ffff0003f272fe80 x25: 0000000000000030 x24: 0000000000000008
[ 2911.187458] x23: ffff0003c5788000 x22: ffff0003c16710c8 x21: ffff80008017f180
[ 2911.188310] x20: ffff80008017f000 x19: ffff80008017f180 x18: ffffffffffffffff
[ 2911.189160] x17: 0000000000000000 x16: 0000000000000000 x15: ffff8000a15134b8
[ 2911.190015] x14: 0000000000000000 x13: 205d373432323154 x12: 5b5d313131333731
[ 2911.190844] x11: 00000000fffeffff x10: 00000000fffeffff x9 : ffffd1b78274a13c
[ 2911.191716] x8 : 000000000017ffe8 x7 : c0000000fffeffff x6 : 000000000057ffa8
[ 2911.192554] x5 : ffff0012f6c24ec0 x4 : 0000000000000000 x3 : ffff2e5b72b5d000
[ 2911.193404] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0003ff254480
[ 2911.194259] Call trace:
[ 2911.194626]  tracing_map_sort_entries+0x3e0/0x408
[ 2911.195220]  hist_show+0x124/0x800
[ 2911.195692]  seq_read_iter+0x1d4/0x4e8
[ 2911.196193]  seq_read+0xe8/0x138
[ 2911.196638]  vfs_read+0xc8/0x300
[ 2911.197078]  ksys_read+0x70/0x108
[ 2911.197534]  __arm64_sys_read+0x24/0x38
[ 2911.198046]  invoke_syscall+0x78/0x108
[ 2911.198553]  el0_svc_common.constprop.0+0xd0/0xf8
[ 2911.199157]  do_el0_svc+0x28/0x40
[ 2911.199613]  el0_svc+0x40/0x178
[ 2911.200048]  el0t_64_sync_handler+0x13c/0x158
[ 2911.200621]  el0t_64_sync+0x1a8/0x1b0
[ 2911.201115] ---[ end trace 0000000000000000 ]---

The problem appears to be caused by CPU reordering of writes issued from
__tracing_map_insert().

The check for the presence of an element with a given key in this
function is:

 val = READ_ONCE(entry->val);
 if (val && keys_match(key, val->key, map->key_size)) ...

The write of a new entry is:

 elt = get_free_elt(map);
 memcpy(elt->key, key, map->key_size);
 entry->val = elt;

The "memcpy(elt->key, key, map->key_size);" and "entry->val = elt;"
stores may become visible in the reversed order on another CPU. This
second CPU might then incorrectly determine that a new key doesn't match
an already present val->key and subsequently insert a new element,
resulting in a duplicate.

Fix the problem by adding a write barrier between
"memcpy(elt->key, key, map->key_size);" and "entry->val = elt;", and for
good measure, also use WRITE_ONCE(entry->val, elt) for publishing the
element. The sequence pairs with the mentioned "READ_ONCE(entry->val);"
and the "val->key" check which has an address dependency.

The barrier is placed on a path executed when adding an element for
a new key. Subsequent updates targeting the same key remain unaffected.

From the user's perspective, the issue was introduced by commit
c193707d ("tracing: Remove code which merges duplicates"), which
followed commit cbf4100e ("tracing: Add support to detect and avoid
duplicates"). The previous code operated differently; it inherently
expected potential races which result in duplicates but merged them
later when they occurred.

Link: https://lore.kernel.org/linux-trace-kernel/20240122150928.27725-1-petr.pavlu@suse.com

Fixes: c193707d ("tracing: Remove code which merges duplicates")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

2b447606

fbcon: Fix incorrect printed function name in fbcon_prepare_logo() · 018856c3

Geert Uytterhoeven authored Jan 22, 2024

If the boot logo does not fit, a message is printed, including a wrong
function name prefix.  Instead of correcting the function name (or using
__func__), just use "fbcon", like is done in several other messages.

While at it, modernize the call by switching to pr_info().
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Helge Deller <deller@gmx.de>

018856c3

Merge tag 'for-6.8-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 5d9248ee

Linus Torvalds authored Jan 22, 2024

Pull btrfs fixes from David Sterba:

 - zoned mode fixes:
     - fix slowdown when writing large file sequentially by looking up
       block groups with enough space faster
     - locking fixes when activating a zone

 - new mount API fixes:
     - preserve mount options for a ro/rw mount of the same subvolume

 - scrub fixes:
     - fix use-after-free in case the chunk length is not aligned to
       64K, this does not happen normally but has been reported on
       images converted from ext4
     - similar alignment check was missing with raid-stripe-tree

 - subvolume deletion fixes:
     - prevent calling ioctl on already deleted subvolume
     - properly track flag tracking a deleted subvolume

 - in subpage mode, fix decompression of an inline extent (zlib, lzo,
   zstd)

 - fix crash when starting writeback on a folio, after integration with
   recent MM changes this needs to be started conditionally

 - reject unknown flags in defrag ioctl

 - error handling, API fixes, minor warning fixes

* tag 'for-6.8-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: scrub: limit RST scrub to chunk boundary
  btrfs: scrub: avoid use-after-free when chunk length is not 64K aligned
  btrfs: don't unconditionally call folio_start_writeback in subpage
  btrfs: use the original mount's mount options for the legacy reconfigure
  btrfs: don't warn if discard range is not aligned to sector
  btrfs: tree-checker: fix inline ref size in error messages
  btrfs: zstd: fix and simplify the inline extent decompression
  btrfs: lzo: fix and simplify the inline extent decompression
  btrfs: zlib: fix and simplify the inline extent decompression
  btrfs: defrag: reject unknown flags of btrfs_ioctl_defrag_range_args
  btrfs: avoid copying BTRFS_ROOT_SUBVOL_DEAD flag to snapshot of subvolume being deleted
  btrfs: don't abort filesystem when attempting to snapshot deleted subvolume
  btrfs: zoned: fix lock ordering in btrfs_zone_activate()
  btrfs: fix unbalanced unlock of mapping_tree_lock
  btrfs: ref-verify: free ref cache before clearing mount opt
  btrfs: fix kvcalloc() arguments order in btrfs_ioctl_send()
  btrfs: zoned: optimize hint byte for zoned allocator
  btrfs: zoned: factor out prepare_allocation_zoned()

5d9248ee

exec: Fix error handling in begin_new_exec() · 84c39ec5

Bernd Edlinger authored Jan 22, 2024

If get_unused_fd_flags() fails, the error handling is incomplete because
bprm->cred is already set to NULL, and therefore free_bprm will not
unlock the cred_guard_mutex. Note there are two error conditions which
end up here, one before and one after bprm->cred is cleared.

Fixes: b8a61c9e ("exec: Generic execfd support")
Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Link: https://lore.kernel.org/r/AS8P193MB128517ADB5EFF29E04389EDAE4752@AS8P193MB1285.EURP193.PROD.OUTLOOK.COM
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>

84c39ec5

exec: Add do_close_execat() helper · bdd8f624

Kees Cook authored Sep 16, 2022

Consolidate the calls to allow_write_access()/fput() into a single
place, since we repeat this code pattern. Add comments around the
callers for the details on it.

Link: https://lore.kernel.org/r/202209161637.9EDAF6B18@keescookSigned-off-by: Kees Cook <keescook@chromium.org>

bdd8f624

exec: remove useless comment · 8788a17c

Askar Safin authored Jan 09, 2024

Function name is wrong and the comment tells us nothing
Signed-off-by: Askar Safin <safinaskar@zohomail.com>
Link: https://lore.kernel.org/r/20240109030801.31827-1-safinaskar@zohomail.comSigned-off-by: Kees Cook <keescook@chromium.org>

8788a17c

ELF, MAINTAINERS: specifically mention ELF · 27daa514

Alexey Dobriyan authored Dec 06, 2023

People complain when I miss people in Cc.

[ kees: Also add the ELF uapi doc link ]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Link: https://lore.kernel.org/r/2cb0891e-d7c0-4939-bb5f-282812de6078@p183Signed-off-by: Kees Cook <keescook@chromium.org>

27daa514

Merge tag 'Wstringop-overflow-for-6.8-rc2' of... · 610347ef

Linus Torvalds authored Jan 22, 2024

Merge tag 'Wstringop-overflow-for-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

Pull stringop-overflow warning update from Gustavo A. R. Silva:
 "Enable -Wstringop-overflow globally.

  I waited for the release of -rc1 to run a final build-test on top of
  it before sending this pull request. Fortunatelly, after building 358
  kernels overnight (basically all supported archs with a wide variety
  of configs), no more warnings have surfaced! :)

  Thus, we are in a good position to enable this compiler option for all
  versions of GCC that support it, with the exception of GCC-11, which
  appears to have some issues with this option [1]"

Link: https://lore.kernel.org/lkml/b3c99290-40bc-426f-b3d2-1aa903f95c4e@embeddedor.com/ [1]

* tag 'Wstringop-overflow-for-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
  init: Kconfig: Disable -Wstringop-overflow for GCC-11
  Makefile: Enable -Wstringop-overflow globally

610347ef

Merge tag 'xsa448-6.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 0f0d819a

Linus Torvalds authored Jan 22, 2024

Pull xen netback fix from Juergen Gross:
 "Transmit requests in Xen's virtual network protocol can consist of
  multiple parts. While not really useful, except for the initial part
  any of them may be of zero length, i.e. carry no data at all.

  Besides a certain initial portion of the to be transferred data, these
  parts are directly translated into what Linux calls SKB fragments.
  Such converted request parts can, when for a particular SKB they are
  all of length zero, lead to a de-reference of NULL in core networking
  code"

* tag 'xsa448-6.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen-netback: don't produce zero-size SKB frags

0f0d819a

21 Jan, 2024 13 commits

init: Kconfig: Disable -Wstringop-overflow for GCC-11 · a5e0ace0

Gustavo A. R. Silva authored Nov 30, 2023

-Wstringop-overflow is buggy in GCC-11. Therefore, we should disable
this option specifically for that compiler version. To achieve this,
we introduce a new configuration option: GCC11_NO_STRINGOP_OVERFLOW.

The compiler option related to string operation overflow is now managed
under configuration CC_STRINGOP_OVERFLOW. This option is enabled by
default for all other versions of GCC that support it.

Link: https://lore.kernel.org/lkml/b3c99290-40bc-426f-b3d2-1aa903f95c4e@embeddedor.com/
Link: https://lore.kernel.org/lkml/20231128091351.2bfb38dd@canb.auug.org.au/Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/linux-hardening/ZWj1+jkweEDWbmAR@work/Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>

a5e0ace0

Makefile: Enable -Wstringop-overflow globally · 113a6186

Gustavo A. R. Silva authored Oct 31, 2023

It seems that we have finished addressing all the remaining
issues regarding -Wstringop-overflow. So, we are now in good
shape to enable this compiler option globally.
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>

113a6186

rbd: don't move requests to the running list on errors · ded080c8

Ilya Dryomov authored Jan 17, 2024

The running list is supposed to contain requests that are pinning the
exclusive lock, i.e. those that must be flushed before exclusive lock
is released.  When wake_lock_waiters() is called to handle an error,
requests on the acquiring list are failed with that error and no
flushing takes place.  Briefly moving them to the running list is not
only pointless but also harmful: if exclusive lock gets acquired
before all of their state machines are scheduled and go through
rbd_lock_del_request(), we trigger

    rbd_assert(list_empty(&rbd_dev->running_list));

in rbd_try_acquire_lock().

Cc: stable@vger.kernel.org
Fixes: 637cd060 ("rbd: new exclusive lock wait/wake code")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>

ded080c8

rbd: remove usage of the deprecated ida_simple_*() API · cd30e8bd

Christophe JAILLET authored Jan 15, 2024

ida_alloc() and ida_free() should be preferred to the deprecated
ida_simple_get() and ida_simple_remove().

Note that the upper limit of ida_simple_get() is exclusive, while that
of ida_alloc_max() is inclusive, so 1 has been subtracted.

[ idryomov: tweak changelog ]
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

cd30e8bd

Linux 6.8-rc1 · 6613476e
Linus Torvalds authored Jan 21, 2024

6613476e

Merge tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs · 35a4474b

Linus Torvalds authored Jan 21, 2024

Pull more bcachefs updates from Kent Overstreet:
 "Some fixes, Some refactoring, some minor features:

   - Assorted prep work for disk space accounting rewrite

   - BTREE_TRIGGER_ATOMIC: after combining our trigger callbacks, this
     makes our trigger context more explicit

   - A few fixes to avoid excessive transaction restarts on
     multithreaded workloads: fstests (in addition to ktest tests) are
     now checking slowpath counters, and that's shaking out a few bugs

   - Assorted tracepoint improvements

   - Starting to break up bcachefs_format.h and move on disk types so
     they're with the code they belong to; this will make room to start
     documenting the on disk format better.

   - A few minor fixes"

* tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs: (46 commits)
  bcachefs: Improve inode_to_text()
  bcachefs: logged_ops_format.h
  bcachefs: reflink_format.h
  bcachefs; extents_format.h
  bcachefs: ec_format.h
  bcachefs: subvolume_format.h
  bcachefs: snapshot_format.h
  bcachefs: alloc_background_format.h
  bcachefs: xattr_format.h
  bcachefs: dirent_format.h
  bcachefs: inode_format.h
  bcachefs; quota_format.h
  bcachefs: sb-counters_format.h
  bcachefs: counters.c -> sb-counters.c
  bcachefs: comment bch_subvolume
  bcachefs: bch_snapshot::btime
  bcachefs: add missing __GFP_NOWARN
  bcachefs: opts->compression can now also be applied in the background
  bcachefs: Prep work for variable size btree node buffers
  bcachefs: grab s_umount only if snapshotting
  ...

35a4474b

Merge tag 'timers-core-2024-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4fbbed78

Linus Torvalds authored Jan 21, 2024

Pull timer updates from Thomas Gleixner:
 "Updates for time and clocksources:

   - A fix for the idle and iowait time accounting vs CPU hotplug.

     The time is reset on CPU hotplug which makes the accumulated
     systemwide time jump backwards.

   - Assorted fixes and improvements for clocksource/event drivers"

* tag 'timers-core-2024-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug
  clocksource/drivers/ep93xx: Fix error handling during probe
  clocksource/drivers/cadence-ttc: Fix some kernel-doc warnings
  clocksource/drivers/timer-ti-dm: Fix make W=n kerneldoc warnings
  clocksource/timer-riscv: Add riscv_clock_shutdown callback
  dt-bindings: timer: Add StarFive JH8100 clint
  dt-bindings: timer: thead,c900-aclint-mtimer: separate mtime and mtimecmp regs

4fbbed78

Merge tag 'powerpc-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 7b297a5c

Linus Torvalds authored Jan 21, 2024

Pull powerpc fixes from Aneesh Kumar:

 - Increase default stack size to 32KB for Book3S

Thanks to Michael Ellerman.

* tag 'powerpc-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: Increase default stack size to 32KB

7b297a5c

bcachefs: Improve inode_to_text() · 249f441f

Kent Overstreet authored Jan 21, 2024

Add line breaks - inode_to_text() is now much easier to read.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

249f441f

bcachefs: logged_ops_format.h · d826cc57
Kent Overstreet authored Jan 21, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
d826cc57
bcachefs: reflink_format.h · 8d52ba60
Kent Overstreet authored Jan 21, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
8d52ba60
bcachefs; extents_format.h · b2fa1b63
Kent Overstreet authored Jan 21, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
b2fa1b63
bcachefs: ec_format.h · 0560eb9a
Kent Overstreet authored Jan 21, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
0560eb9a