Commits · 1b83ec057db16b4d0697dc21ef7a9743b6041f72 · Kirill Smelkov / linux

06 Nov, 2021 1 commit

fs: explicitly unregister per-superblock BDIs · 0b3ea092

Add a new SB_I_ flag to mark superblocks that have an ephemeral bdi
associated with them, and unregister it when the superblock is shut
down.

Link: https://lkml.kernel.org/r/20211021124441.668816-4-hch@lst.de

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0b3ea092

09 Aug, 2021 1 commit

block: remove the bd_bdi in struct block_device · a11d7fc2

Christoph Hellwig authored 3 years ago


Just retrieve the bdi from the disk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210809141744.1203023-6-hch@lst.de

Signed-off-by: Jens Axboe <axboe@kernel.dk>

a11d7fc2

01 Jun, 2021 1 commit

block: move bd_mutex to struct gendisk · a8698707

Christoph Hellwig authored 3 years ago


Replace the per-block device bd_mutex with a per-gendisk open_mutex,
thus simplifying locking wherever we deal with partitions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Link: https://lore.kernel.org/r/20210525061301.2242282-4-hch@lst.de

Signed-off-by: Jens Axboe <axboe@kernel.dk>

a8698707

22 Apr, 2021 1 commit

fs,security: Add sb_delete hook · 83e804f0

Mickaël Salaün authored 3 years ago


The sb_delete security hook is called when shutting down a superblock,
which may be useful to release kernel objects tied to the superblock's
lifetime (e.g. inodes).

This new hook is needed by Landlock to release (ephemerally) tagged
struct inodes.  This comes from the unprivileged nature of Landlock
described in the next commit.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210422154123.13086-7-mic@digikod.net

Signed-off-by: James Morris <jamorris@linux.microsoft.com>

83e804f0

25 Jan, 2021 1 commit

block: remove the NULL bdev check in bdev_read_only · 6f0d9689

Christoph Hellwig authored 4 years ago


Only a single caller can end up in bdev_read_only, so move the check
there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6f0d9689

21 Jan, 2021 1 commit

fs: fix kernel-doc markups · 961f3c89

Mauro Carvalho Chehab authored 4 years ago


Two markups are at the wrong place. Kernel-doc only
support having the comment just before the identifier.

Also, some identifiers have different names between their
prototypes and the kernel-doc markup.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/96b1e1b388600ab092331f6c4e88ff8e8779ce6c.1610610937.git.mchehab+huawei@kernel.org

Signed-off-by: Jonathan Corbet <corbet@lwn.net>

961f3c89

01 Dec, 2020 2 commits

block: remove i_bdev · 4e7b5671

Christoph Hellwig authored 4 years ago


Switch the block device lookup interfaces to directly work with a dev_t
so that struct block_device references are only acquired by the
blkdev_get variants (and the blk-cgroup special case).  This means that
we now don't need an extra reference in the inode and can generally
simplify handling of struct block_device to keep the lookups contained
in the core block layer code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Coly Li <colyli@suse.de>		[bcache]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4e7b5671

fs: remove get_super_thawed and get_super_exclusive_thawed · 60b49885

Christoph Hellwig authored 4 years ago


Just open code the wait in the only caller of both functions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

60b49885

11 Nov, 2020 3 commits

vfs: move __sb_{start,end}_write* to fs.h · 9b852342

Darrick J. Wong authored 4 years ago


Now that we've straightened out the callers, move these three functions
to fs.h since they're fairly trivial.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>

9b852342

vfs: separate __sb_start_write into blocking and non-blocking helpers · 8a3c84b6

Darrick J. Wong authored 4 years ago


Break this function into two helpers so that it's obvious that the
trylock versions return a value that must be checked, and the blocking
versions don't require that.  While we're at it, clean up the return
type mismatch.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>

8a3c84b6

vfs: remove lockdep bogosity in __sb_start_write · 22843291

Darrick J. Wong authored 4 years ago

__sb_start_write has some weird looking lockdep code that claims to
exist to handle nested freeze locking requests from xfs.  The code as
written seems broken -- if we think we hold a read lock on any of the
higher freeze levels (e.g. we hold SB_FREEZE_WRITE and are trying to
lock SB_FREEZE_PAGEFAULT), it converts a blocking lock attempt into a
trylock.

However, it's not correct to downgrade a blocking lock attempt to a
trylock unless the downgrading code or the callers are prepared to deal
with that situation.  Neither __sb_start_write nor its callers handle
this at all.  For example:

sb_start_pagefault ignores the return value completely, with the result
that if xfs_filemap_fault loses a race with a different thread trying to
fsfreeze, it will proceed without pagefault freeze protection (thereby
breaking locking rules) and then unlocks the pagefault freeze lock that
it doesn't own on its way out (thereby corrupting the lock state), which
leads to a system hang shortly afterwards.

Normally, this won't happen because our ownership of a read lock on a
higher freeze protection level blocks fsfreeze from grabbing a write
lock on that higher level.  *However*, if lockdep is offline,
lock_is_held_type unconditionally returns 1, which means that
percpu_rwsem_is_held returns 1, which means that __sb_start_write
unconditionally converts blocking freeze lock attempts into trylocks,
even when we *don't* hold anything that would block a fsfreeze.

Apparently this all held together until 5.10-rc1, when bugs in lockdep
caused lockdep to shut itself off early in an fstests run, and once
fstests gets to the "race writes with freezer" tests, kaboom.  This
might explain the long trail of vanishingly infrequent livelocks in
fstests after lockdep goes offline that I've never been able to
diagnose.

We could fix it by spinning on the trylock if wait==true, but AFAICT the
locking works fine if lockdep is not built at all (and I didn't see any
complaints running fstests overnight), so remove this snippet entirely.

NOTE: Commit f4b554af in 2015 created the current weird logic (which
used to exist in a different form in commit 5accdf82

 from 2012) in
__sb_start_write.  XFS solved this whole problem in the late 2.6 era by
creating a variant of transactions (XFS_TRANS_NO_WRITECOUNT) that don't
grab intwrite freeze protection, thus making lockdep's solution
unnecessary.  The commit claims that Dave Chinner explained that the
trylock hack + comment could be removed, but nobody ever did.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>

22843291

24 Sep, 2020 1 commit

bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flag · 1cb039f3

Christoph Hellwig authored 4 years ago


The BDI_CAP_STABLE_WRITES is one of the few bits of information in the
backing_dev_info shared between the block drivers and the writeback code.
To help untangling the dependency replace it with a queue flag and a
superblock flag derived from it.  This also helps with the case of e.g.
a file system requiring stable writes due to its own checksumming, but
not forcing it on other users of the block device like the swap code.

One downside is that we an't support the stable_pages_required bdi
attribute in sysfs anymore.  It is replaced with a queue attribute which
also is writable for easier testing.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1cb039f3

29 May, 2020 1 commit

fs: fix indentation in deactivate_super() · cc23402c

Yufen Yu authored 5 years ago


Fix the breaked indent in deactive_super().
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

cc23402c

09 May, 2020 2 commits

bdi: remove the name field in struct backing_dev_info · 1cd925d5

Christoph Hellwig authored 4 years ago


The name is only printed for a not registered bdi in writeback.  Use the
device name there as is more useful anyway for the unlike case that the
warning triggers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1cd925d5

bdi: simplify bdi_alloc · aef33c2f

Christoph Hellwig authored 4 years ago


Merge the _node vs normal version and drop the superflous gfp_t argument.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

aef33c2f

28 Apr, 2020 1 commit

Fix use after free in get_tree_bdev() · dd7bc815

David Howells authored 4 years ago

Commit 6fcf0c72, a fix to get_tree_bdev() put a missing blkdev_put() in
the wrong place, before a warnf() that displays the bdev under
consideration rather after it.

This results in a silent lockup in printk("%pg") called via warnf() from
get_tree_bdev() under some circumstances when there's a race with the
blockdev being frozen.  This can be caused by xfstests/tests/generic/085 in
combination with Lukas Czerner's ext4 mount API conversion patchset.  It
looks like it ought to occur with other users of get_tree_bdev() such as
XFS, but apparently doesn't.

Fix this by switching the order of the lines.

Fixes: 6fcf0c72

 ("vfs: add missing blkdev_put() in get_tree_bdev()")
Reported-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Ian Kent <raven@themaw.net>
cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dd7bc815

18 Dec, 2019 1 commit

fs: call fsnotify_sb_delete after evict_inodes · 1edc8eb2

Eric Sandeen authored 5 years ago


When a filesystem is unmounted, we currently call fsnotify_sb_delete()
before evict_inodes(), which means that fsnotify_unmount_inodes()
must iterate over all inodes on the superblock looking for any inodes
with watches.  This is inefficient and can lead to livelocks as it
iterates over many unwatched inodes.

At this point, SB_ACTIVE is gone and dropping refcount to zero kicks
the inode out out immediately, so anything processed by
fsnotify_sb_delete / fsnotify_unmount_inodes gets evicted in that loop.

After that, the call to evict_inodes will evict everything else with a
zero refcount.

This should speed things up overall, and avoid livelocks in
fsnotify_unmount_inodes().
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1edc8eb2

10 Oct, 2019 1 commit

vfs: add missing blkdev_put() in get_tree_bdev() · 6fcf0c72

Ian Kent authored 5 years ago


Is there are a couple of missing blkdev_put() in get_tree_bdev()?
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6fcf0c72

06 Sep, 2019 1 commit

vfs: subtype handling moved to fuse · c7eb6869

David Howells authored 6 years ago


The unused vfs code can be removed.  Don't pass empty subtype (same as if
->parse callback isn't called).

The bits that are left involve determining whether it's permitted to split the
filesystem type string passed in to mount(2).  Consequently, this means that we
cannot get rid of the FS_HAS_SUBTYPE flag unless we define that a type string
with a dot in it always indicates a subtype specification.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

c7eb6869

05 Sep, 2019 3 commits

vfs: Add a single-or-reconfig keying to vfs_get_super() · 43ce4c1f

David Howells authored 6 years ago


Add an additional keying mode to vfs_get_super() to indicate that only a
single superblock should exist in the system, and that, if it does, further
mounts should invoke reconfiguration upon it.

This allows mount_single() to be replaced.

[Fix by Eric Biggers folded in]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

43ce4c1f

vfs: Create fs_context-aware mount_bdev() replacement · fe62c3a4

David Howells authored 6 years ago


Create a function, get_tree_bdev(), that is fs_context-aware and a
->get_tree() counterpart of mount_bdev().

It caches the block device pointer in the fs_context struct so that this
information can be passed into sget_fc()'s test and set functions.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: linux-block@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fe62c3a4

new helper: get_tree_keyed() · 533770cc

Al Viro authored 5 years ago


For vfs_get_keyed_super users.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

533770cc

30 Aug, 2019 1 commit

vfs: Add file timestamp range support · 188d20bc

Deepa Dinamani authored 7 years ago


Add fields to the superblock to track the min and max
timestamps supported by filesystems.

Initially, when a superblock is allocated, initialize
it to the max and min values the fields can hold.
Individual filesystems override these to match their
actual limits.

Pseudo filesystems are assumed to always support the
min and max allowable values for the fields.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>

188d20bc

13 Aug, 2019 1 commit

fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl · 22d94f49

Eric Biggers authored 5 years ago

Add a new fscrypt ioctl, FS_IOC_ADD_ENCRYPTION_KEY.  This ioctl adds an
encryption key to the filesystem's fscrypt keyring ->s_master_keys,
making any files encrypted with that key appear "unlocked".

Why we need this
~~~~~~~~~~~~~~~~

The main problem is that the "locked/unlocked" (ciphertext/plaintext)
status of encrypted files is global, but the fscrypt keys are not.
fscrypt only looks for keys in the keyring(s) the process accessing the
filesystem is subscribed to: the thread keyring, process keyring, and
session keyring, where the session keyring may contain the user keyring.

Therefore, userspace has to put fscrypt keys in the keyrings for
individual users or sessions.  But this means that when a process with a
different keyring tries to access encrypted files, whether they appear
"unlocked" or not is nondeterministic.  This is because it depends on
whether the files are currently present in the inode cache.

Fixing this by consistently providing each process its own view of the
filesystem depending on whether it has the key or not isn't feasible due
to how the VFS caches work.  Furthermore, while sometimes users expect
this behavior, it is misguided for two reasons.  First, it would be an
OS-level access control mechanism largely redundant with existing access
control mechanisms such as UNIX file permissions, ACLs, LSMs, etc.
Encryption is actually for protecting the data at rest.

Second, almost all users of fscrypt actually do need the keys to be
global.  The largest users of fscrypt, Android and Chromium OS, achieve
this by having PID 1 create a "session keyring" that is inherited by
every process.  This works, but it isn't scalable because it prevents
session keyrings from being used for any other purpose.

On general-purpose Linux distros, the 'fscrypt' userspace tool [1] can't
similarly abuse the session keyring, so to make 'sudo' work on all
systems it has to link all the user keyrings into root's user keyring
[2].  This is ugly and raises security concerns.  Moreover it can't make
the keys available to system services, such as sshd trying to access the
user's '~/.ssh' directory (see [3], [4]) or NetworkManager trying to
read certificates from the user's home directory (see [5]); or to Docker
containers (see [6], [7]).

By having an API to add a key to the *filesystem* we'll be able to fix
the above bugs, remove userspace workarounds, and clearly express the
intended semantics: the locked/unlocked status of an encrypted directory
is global, and encryption is orthogonal to OS-level access control.

Why not use the add_key() syscall
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We use an ioctl for this API rather than the existing add_key() system
call because the ioctl gives us the flexibility needed to implement
fscrypt-specific semantics that will be introduced in later patches:

- Supporting key removal with the semantics such that the secret is
  removed immediately and any unused inodes using the key are evicted;
  also, the eviction of any in-use inodes can be retried.

- Calculating a key-dependent cryptographic identifier and returning it
  to userspace.

- Allowing keys to be added and removed by non-root users, but only keys
  for v2 encryption policies; and to prevent denial-of-service attacks,
  users can only remove keys they themselves have added, and a key is
  only really removed after all users who added it have removed it.

Trying to shoehorn these semantics into the keyrings syscalls would be
very difficult, whereas the ioctls make things much easier.

However, to reuse code the implementation still uses the keyrings
service internally.  Thus we get lockless RCU-mode key lookups without
having to re-implement it, and the keys automatically show up in
/proc/keys for debugging purposes.

References:

    [1] https://github.com/google/fscrypt
    [2] https://goo.gl/55cCrI#heading=h.vf09isp98isb
    [3] https://github.com/google/fscrypt/issues/111#issuecomment-444347939
    [4] https://github.com/google/fscrypt/issues/116
    [5] https://bugs.launchpad.net/ubuntu/+source/fscrypt/+bug/1770715
    [6] https://github.com/google/fscrypt/issues/128
    [7] https://askubuntu.com/questions/1130306/cannot-run-docker-on-an-encrypted-filesystem

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>

22d94f49

31 Jul, 2019 1 commit

Unbreak mount_capable() · c2c44ec2

Al Viro authored 5 years ago


In "consolidate the capability checks in sget_{fc,userns}())" the
wrong argument had been passed to mount_capable() by sget_fc().
That mistake had been further obscured later, when switching
mount_capable() to fs_context has moved the calculation of
bogus argument from sget_fc() to mount_capable() itself.  It
should've been fc->user_ns all along.
Screwed-up-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: Christian Brauner <christian@brauner.io>
Tested-by: Christian Brauner <christian@brauner.io>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c2c44ec2

05 Jul, 2019 2 commits

convenience helper: get_tree_single() · c23a0bba

Al Viro authored 5 years ago


counterpart of mount_single(); switch fusectl to it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c23a0bba

convenience helper get_tree_nodev() · 2ac295d4

Al Viro authored 5 years ago


counterpart of mount_nodev().  Switch hugetlb and pseudo to it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

2ac295d4

25 May, 2019 9 commits

vfs: Kill sget_userns() · 023d066a

David Howells authored 6 years ago


Kill sget_userns(), folding it into sget() as that's the only remaining
user.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-fsdevel@vger.kernel.org

023d066a

vfs: Provide sb->s_iflags settings in fs_context struct · c80fa7c8

David Howells authored 6 years ago


Provide a field in the fs_context struct through which bits in the
sb->s_iflags superblock field can be set.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-fsdevel@vger.kernel.org

c80fa7c8

move mount_capable() further out · c3aabf07

Al Viro authored 5 years ago


Call graph of vfs_get_tree():
	vfs_fsconfig_locked()	# neither kernmount, nor submount
	do_new_mount()		# neither kernmount, nor submount
	fc_mount()
		afs_mntpt_do_automount()	# submount
		mount_one_hugetlbfs()		# kernmount
		pid_ns_prepare_proc()		# kernmount
		mq_create_mount()		# kernmount
		vfs_kern_mount()
			simple_pin_fs()		# kernmount
			vfs_submount()		# submount
			kern_mount()		# kernmount
			init_mount_tree()
			btrfs_mount()
			nfs_do_root_mount()

	The first two need the check (unconditionally).
init_mount_tree() is setting rootfs up; any capability
checks make zero sense for that one.  And btrfs_mount()/
nfs_do_root_mount() have the checks already done in their
callers.

	IOW, we can shift mount_capable() handling into
the two callers - one in the normal case of mount(2),
another - in fsconfig(2) handling of FSCONFIG_CMD_CREATE.
I.e. the syscalls that set a new filesystem up.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c3aabf07

move mount_capable() calls to vfs_get_tree() · 059338aa

Al Viro authored 5 years ago


sget_fc() is called only from ->get_tree() instances and
the only instance not calling it is legacy_get_tree(),
which calls mount_capable() directly.

In all sget_fc() callers the checks could be moved to the
very beginning of ->get_tree() - ->user_ns is not changed
in between.  So lifting the checks to the only caller of
->get_tree() is OK.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

059338aa

switch mount_capable() to fs_context · 20284ab7

Al Viro authored 5 years ago


	now both callers of mount_capable() have access to fs_context;
the only difference is that for sget_fc() we have the possibility
of fc->global being true, while for legacy_get_tree() it's guaranteed
to be impossible.  Unify to more generic variant...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

20284ab7

move the capability checks from sget_userns() to legacy_get_tree() · 2527b284

Al Viro authored 5 years ago


1) all call chains leading to sget_userns() pass through ->mount()
instances.
2) none of ->mount() instances is ever called directly - the only
call site is legacy_get_tree()
3) all remaining ->mount() instances end up calling sget_userns()

IOW, we might as well do the capability checks just before calling
->mount().  As for the arguments passed to mount_capable(),
in case of call chains to sget_userns() going through sget(),
we either don't call mount_capable() at all, or pass current_user_ns()
to it.  The call chains going through mount_pseudo_xattr() don't
call mount_capable() at all (SB_KERNMOUNT in flags on those).

That could've been split into smaller steps (lifting the checks
into sget(), then callers of sget(), then all the way to the
entries of every ->mount() out there, then to the sole caller),
but that would be too much churn for little benefit...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

2527b284

vfs: Kill mount_ns() · bb7b6b2b

David Howells authored 6 years ago


Kill mount_ns() as it has been replaced by vfs_get_super() in the new mount
API.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

bb7b6b2b

consolidate the capability checks in sget_{fc,userns}() · 0ce0cf12

Al Viro authored 5 years ago


... into a common helper - mount_capable(type, userns)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0ce0cf12

start massaging the checks in sget_...(): move to sget_userns() · feb8ae43

Al Viro authored 5 years ago


there are 3 remaining callers of sget_userns() - sget(), mount_ns()
and mount_pseudo_xattr().  Extra check in sget() is conditional
upon mount being neither KERNMOUNT nor SUBMOUNT, the identical one
in mount_ns() - upon being not KERNMOUNT; mount_pseudo_xattr()
has no such checks at all.

However, mount_ns() is never used with SUBMOUNT and mount_pseudo_xattr()
is used only for KERNMOUNT, so both would be fine with the same logics
as currently done in sget(), allowing to consolidate the entire thing
in sget_userns() itself.

That's not where these checks will end up in the long run, though -
the whole reason why they'd been done so deep in the bowels of
mount(2) was that there had been no way for a filesystem to specify
which userns to look at until it has entered ->mount().

Now there is a place where filesystem could override the defaults -
->init_fs_context().  Which allows to pull the checks out into
the callers of vfs_get_tree().  That'll take quite a bit of
massage, but that mess is possible to tease apart.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

feb8ae43

29 Apr, 2019 1 commit

[fix] get rid of checking for absent device name in vfs_get_tree() · ee948837

Al Viro authored 5 years ago

It has no business being there, it's checked by relevant ->get_tree()
as it is *and* it returns the wrong error for no reason whatsoever.

Fixes: f3a09c92

 "introduce fs_context methods"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ee948837

28 Feb, 2019 2 commits

vfs: Add some logging to the core users of the fs_context log · 06a2ae56

David Howells authored 6 years ago


Add some logging to the core users of the fs_context log so that
information can be extracted from them as to the reason for failure.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

06a2ae56

convenience helpers: vfs_get_super() and sget_fc() · cb50b348

Al Viro authored 6 years ago


the former is an analogue of mount_{single,nodev} for use in
->get_tree() instances, the latter - analogue of sget() for the
same.

These are fairly similar to the originals, but the callback signature
for sget_fc() is different from sget() ones, so getting bits and
pieces shared would be too convoluted; we might get around to that
later, but for now let's just remember to keep them in sync.  They
do live next to each other, and changes in either won't be hard
to spot.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

cb50b348

30 Jan, 2019 1 commit
- introduce fs_context methods · f3a09c92
  Al Viro authored 6 years ago
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
  f3a09c92