Commits · b4e89bcba2b3a966e043107cb52c682bb860cee7 · Kirill Smelkov / linux

08 Jul, 2021 19 commits

NFSv4/pnfs: Clean up layout get on open · b4e89bcb

Trond Myklebust authored Jul 02, 2021

Cache the layout in the arguments so we don't have to keep looking it up
from the inode.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

b4e89bcb

NFSv4/pnfs: Fix layoutget behaviour after invalidation · 0b77f97a

Trond Myklebust authored Jul 02, 2021

If the layout gets invalidated, we should wait for any outstanding
layoutget requests for that layout to complete, and we should resend
them only after re-establishing the layout stateid.

Fixes: d29b468d ("pNFS/NFSv4: Improve rejection of out-of-order layouts")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

0b77f97a

NFSv4/pnfs: Fix the layout barrier update · aa95edf3

Trond Myklebust authored Jul 02, 2021

If we have multiple outstanding layoutget requests, the current code to
update the layout barrier assumes that the outstanding layout stateids
are updated in order. That's not necessarily the case.

Instead of using the value of lo->plh_outstanding as a guesstimate for
the window of values we need to accept, just wait to update the window
until we're processing the last one. The intention here is just to
ensure that we don't process 2^31 seqid updates without also updating
the barrier.

Fixes: 1bcf34fd ("pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

aa95edf3

NFS: Fix fscache read from NFS after cache error · ba512c1b

Dave Wysochanski authored Jun 29, 2021

Earlier commits refactored some NFS read code and removed
nfs_readpage_async(), but neglected to properly fixup
nfs_readpage_from_fscache_complete().  The code path is
only hit when something unusual occurs with the cachefiles
backing filesystem, such as an IO error or while a cookie
is being invalidated.

Mark page with PG_checked if fscache IO completes in error,
unlock the page, and let the VM decide to re-issue based on
PG_uptodate.  When the VM reissues the readpage, PG_checked
allows us to skip over fscache and read from the server.

Link: https://marc.info/?l=linux-nfs&m=162498209518739
Fixes: 1e83b173 ("NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async()")
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

ba512c1b

NFS: Ensure nfs_readpage returns promptly when internal error occurs · e0340f16

Dave Wysochanski authored Jun 29, 2021

A previous refactoring of nfs_readpage() might end up calling
wait_on_page_locked_killable() even if readpage_async_filler() failed
with an internal error and pg_error was non-zero (for example, if
nfs_create_request() failed). In the case of an internal error,
skip over wait_on_page_locked_killable() as this is only needed
when the read is sent and an error occurs during completion handling.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

e0340f16

Merge branch 'sysfs-devel' · 526fca37
Trond Myklebust authored Jun 21, 2021
```
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
```
526fca37

sunrpc: provide showing transport's state info in the sysfs directory · 681d5699

Olga Kornievskaia authored Jun 08, 2021

In preparation of being able to change the xprt's state, add a way
to show currect state of the transport.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

681d5699

sunrpc: provide multipath info in the sysfs directory · 0e559035

Olga Kornievskaia authored Jun 08, 2021

Allow to query xrpt_switch attributes. Currently showing the following
fields of the rpc_xprt_switch structure: xps_nxprts, xps_nactive,
xps_queuelen.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

0e559035

sunrpc: provide transport info in the sysfs directory · 4a09651a

Olga Kornievskaia authored Jun 08, 2021

Allow to query transport's attributes. Currently showing following
fields of the rpc_xprt structure: state, last_used, cong, cwnd,
max_reqs, min_reqs, num_reqs, sizes of queues binding, sending,
pending, backlog.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

4a09651a

sunrpc: add dst_attr attributes to the sysfs xprt directory · 587bc725

Olga Kornievskaia authored Jun 08, 2021

Allow to query and set the destination's address of a transport.
Setting of the destination address is allowed only for TCP or RDMA
based connections.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

587bc725

sunrpc: add add sysfs directory per xprt under each xprt_switch · d408ebe0

Olga Kornievskaia authored Jun 08, 2021

Add individual transport directories under each transport switch
group. For instance, for each nconnect=X connections there will be
a transport directory. Naming conventions also identifies transport
type -- xprt-<id>-<type> where type is udp, tcp, rdma, local, bc.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

d408ebe0

sunrpc: add a symlink from rpc-client directory to the xprt_switch · 2a338a54

Olga Kornievskaia authored Jun 08, 2021

An rpc client uses a transport switch and one ore more transports
associated with that switch. Since transports are shared among
rpc clients, create a symlink into the xprt_switch directory
instead of duplicating entries under each rpc client.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2a338a54

sunrpc: add xprt_switch direcotry to sunrpc's sysfs · baea9944

Olga Kornievskaia authored Jun 08, 2021

Add xprt_switch directory to the sysfs and create individual
xprt_swith subdirectories for multipath transport group.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

baea9944

sunrpc: keep track of the xprt_class in rpc_xprt structure · d3abc739

Olga Kornievskaia authored Jun 08, 2021

We need to keep track of the type for a given transport.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

d3abc739

sunrpc: add IDs to multipath · 5b926872

Olga Kornievskaia authored Jun 08, 2021

This is used to uniquely identify sunrpc multipath objects in /sys.
Signed-off-by: Dan Aloni <dan@kernelim.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

5b926872

sunrpc: add xprt id · 572caba4

Olga Kornievskaia authored Jun 08, 2021

This adds a unique identifier for a sunrpc transport in sysfs, which is
similarly managed to the unique IDs of clients.
Signed-off-by: Dan Aloni <dan@kernelim.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

572caba4

sunrpc: Create per-rpc_clnt sysfs kobjects · c5a382eb

Olga Kornievskaia authored Jun 08, 2021

These will eventually have files placed under them for sysfs operations.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

c5a382eb

sunrpc: Create a client/ subdirectory in the sunrpc sysfs · c441f125

Olga Kornievskaia authored Jun 08, 2021

For network namespace separation.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

c441f125

sunrpc: Create a sunrpc directory under /sys/kernel/ · 74678748

Olga Kornievskaia authored Jun 08, 2021

This is where we'll put per-rpc_client related files
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

74678748

29 Jun, 2021 3 commits

Merge branch 'leases-devel' · e9e8ee40
Trond Myklebust authored Jun 21, 2021
```
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
```
e9e8ee40

NFSv4: setlease should return EAGAIN if locks are not available · df2c7b95

Trond Myklebust authored Jun 25, 2021

Instead of returning ENOLCK when we can't hand out a lease, we should be
returning EAGAIN.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

df2c7b95

NFS: nfs_find_open_context() may only select open files · e97bc663

Trond Myklebust authored May 11, 2021

If a file has already been closed, then it should not be selected to
support further I/O.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
[Trond: Fix an invalid pointer deref reported by Colin Ian King]

e97bc663

28 Jun, 2021 4 commits

SUNRPC: Should wake up the privileged task firstly. · 5483b904

Zhang Xiaoxu authored Jun 26, 2021

When find a task from wait queue to wake up, a non-privileged task may
be found out, rather than the privileged. This maybe lead a deadlock
same as commit dfe1fe75 ("NFSv4: Fix deadlock between nfs4_evict_inode()
and nfs4_opendata_get_inode()"):

Privileged delegreturn task is queued to privileged list because all
the slots are assigned. If there has no enough slot to wake up the
non-privileged batch tasks(session less than 8 slot), then the privileged
delegreturn task maybe lost waked up because the found out task can't
get slot since the session is on draining.

So we should treate the privileged task as the emergency task, and
execute it as for as we can.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 5fcdfacc ("NFSv4: Return delegations synchronously in evict_inode")
Cc: stable@vger.kernel.org
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

5483b904

SUNRPC: Fix the batch tasks count wraparound. · fcb170a9

Zhang Xiaoxu authored Jun 26, 2021

The 'queue->nr' will wraparound from 0 to 255 when only current
priority queue has tasks. This maybe lead a deadlock same as commit
dfe1fe75 ("NFSv4: Fix deadlock between nfs4_evict_inode()
and nfs4_opendata_get_inode()"):

Privileged delegreturn task is queued to privileged list because all
the slots are assigned. When non-privileged task complete and release
the slot, a non-privileged maybe picked out. It maybe allocate slot
failed when the session on draining.

If the 'queue->nr' has wraparound to 255, and no enough slot to
service it, then the privileged delegreturn will lost to wake up.

So we should avoid the wraparound on 'queue->nr'.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 5fcdfacc ("NFSv4: Return delegations synchronously in evict_inode")
Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

fcb170a9

NFS: Remove unnecessary inode parameter from nfs_pageio_complete_read() · b42ad64f

Dave Wysochanski authored Jun 24, 2021

Simplify nfs_pageio_complete_read() by using the inode pointer saved
inside nfs_pageio_descriptor.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

b42ad64f

nfs: update has_sec_mnt_opts after cloning lsm options from parent · eae00c5d

Scott Mayhew authored Jun 22, 2021

After calling security_sb_clone_mnt_opts() in nfs_get_root(), it's
necessary to copy the value of has_sec_mnt_opts from the cloned
super_block's nfs_server.  Otherwise, calls to nfs_compare_super()
using this super_block may not return the correct result, leading to
mount failures.

For example, mounting an nfs server with the following in /etc/exports:
/export *(rw,insecure,crossmnt,no_root_squash,security_label)
and having /export/scratch on a separate block device.

mount -o v4.2,context=system_u:object_r:root_t:s0 server:/export/test /mnt/test
mount -o v4.2,context=system_u:object_r:swapfile_t:s0 server:/export/scratch /mnt/scratch

The second mount would fail with "mount.nfs: /mnt/scratch is busy or
already mounted or sharecache fail" and "SELinux: mount invalid.  Same
superblock, different security settings for..." would appear in the
syslog.

Also while we're in there, replace several instances of "NFS_SB(s)"
with "server", which was already declared at the top of the
nfs_get_root().

Fixes: ec1ade6a ("nfs: account for selinux security context when deciding to share superblock")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

eae00c5d

26 Jun, 2021 3 commits

NFS: Avoid duplicate resets of attribute cache timeouts · a9601ac5

Trond Myklebust authored Jun 26, 2021

We know that the attributes changed on the server if and only if the
change attribute is different. Otherwise, we're just refreshing our
cache with values that were already known to be stale.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

a9601ac5

NFSv4: Fix handling of non-atomic change attrbute updates · 20cf7d4e

Trond Myklebust authored Jun 25, 2021

If the change attribute update is declared to be non-atomic by the
server, or our cached value does not match the server's value before the
operation was performed, then we should declare the inode cache invalid.

On the other hand, if the change to the directory raced with a lookup or
getattr which already updated the change attribute, then optimise away
the revalidation.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

20cf7d4e

NFS: Fix up inode attribute revalidation timeouts · 213bb584

Trond Myklebust authored Jun 26, 2021

The inode is considered revalidated when we've checked the value of the
change attribute against our cached value since that suffices to
establish whether or not the other cached values are valid.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

213bb584

21 Jun, 2021 3 commits

nfs: fix acl memory leak of posix_acl_create() · 1fcb6fcd

Gao Xiang authored Jun 18, 2021

When looking into another nfs xfstests report, I found acl and
default_acl in nfs3_proc_create() and nfs3_proc_mknod() error
paths are possibly leaked. Fix them in advance.

Fixes: 013cdf10 ("nfs: use generic posix ACL infrastructure for v3 Posix ACLs")
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

1fcb6fcd

SUNRPC: prevent port reuse on transports which don't request it. · bc1c56e9

NeilBrown authored Jun 15, 2021

If an RPC client is created without RPC_CLNT_CREATE_REUSEPORT, it should
not reuse the source port when a TCP connection is re-established.
This is currently implemented by preventing the source port being
recorded after a successful connection (the call to xs_set_srcport()).

However the source port is also recorded after a successful bind in xs_bind().
This may not be needed at all and certainly is not wanted when
RPC_CLNT_CREATE_REUSEPORT wasn't requested.

So avoid that assignment when xprt.reuseport is not set.

With this change, NFSv4.1 and later mounts use a different port number on
each connection. This is helpful with some firewalls which don't cope
well with port reuse.
Signed-off-by: NeilBrown <neilb@suse.de>
Fixes: e6237b6f ("NFSv4.1: Don't rebind to the same source port when reconnecting to the server")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

bc1c56e9

rpc: remove redundant initialization of variable status · bb24cc0f

Colin Ian King authored Jun 13, 2021

The variable status is being initialized with a value that is never
read, the assignment is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

bb24cc0f

13 Jun, 2021 8 commits

sunrpc: Avoid a KASAN slab-out-of-bounds bug in xdr_set_page_base() · 6d1c0f3d

Anna Schumaker authored Jun 09, 2021

This seems to happen fairly easily during READ_PLUS testing on NFS v4.2.
I found that we could end up accessing xdr->buf->pages[pgnr] with a pgnr
greater than the number of pages in the array. So let's just return
early if we're setting base to a point at the end of the page data and
let xdr_set_tail_base() handle setting up the buffer pointers instead.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Fixes: 8d86e373 ("SUNRPC: Clean up helpers xdr_set_iov() and xdr_set_page_base()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

6d1c0f3d

NFSv4: Fix an Oops in pnfs_mark_request_commit() when doing O_DIRECT · 3731d44b

Trond Myklebust authored Jun 11, 2021

Fix an Oopsable condition in pnfs_mark_request_commit() when we're
putting a set of writes on the commit list to reschedule them after a
failed pNFS attempt.

Fixes: 9c455a8c ("NFS/pNFS: Clean up pNFS commit operations")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

3731d44b

NFSv4: Initialise connection to the server in nfs4_alloc_client() · dd99e9f9

Trond Myklebust authored Jun 09, 2021

Set up the connection to the NFSv4 server in nfs4_alloc_client(), before
we've added the struct nfs_client to the net-namespace's nfs_client_list
so that a downed server won't cause other mounts to hang in the trunking
detection code.
Reported-by: Michael Wakabayashi <mwakabayashi@vmware.com>
Fixes: 5c6e5b60 ("NFS: Fix an Oops in the pNFS files and flexfiles connection setup to the DS")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

dd99e9f9

NFSv4: Add support for application leases underpinned by a delegation · e93a5e93

Trond Myklebust authored May 07, 2021

If the NFSv4 client already holds a delegation for a file, then we can
support application leases (i.e. fcntl(fd, F_SETLEASE,...)) because the
underlying delegation guarantees that the file is not being modified on
the server by another client in a way that might conflict with the lease
guarantees.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

e93a5e93

NFSv4: Add lease breakpoints in case of a delegation recall or return · 6b4befc0

Trond Myklebust authored May 07, 2021

When we add support for application level leases and knfsd delegations
to the NFS client, we we want to have them safely underpinned by a
"real" delegation to provide the caching guarantees. If that real
delegation is recalled, then we need to ensure that the application
leases/delegations are recalled too.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

6b4befc0

NFSv4: Fix delegation return in cases where we have to retry · be200377

Trond Myklebust authored May 08, 2021

If we're unable to immediately recover all locks because the server is
unable to immediately service our reclaim calls, then we want to retry
after we've finished servicing all the other asynchronous delegation
returns on our queue.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

be200377

Linux 5.13-rc6 · 009c9aa5
Linus Torvalds authored Jun 13, 2021

009c9aa5

Merge tag 'perf-tools-fixes-for-v5.13-2021-06-13' of... · e4e45343

Linus Torvalds authored Jun 13, 2021

Merge tag 'perf-tools-fixes-for-v5.13-2021-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Correct buffer copying when peeking events

 - Sync cpufeatures/disabled-features.h header with the kernel sources

* tag 'perf-tools-fixes-for-v5.13-2021-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  tools headers cpufeatures: Sync with the kernel sources
  perf session: Correct buffer copying when peeking events

e4e45343