Commits · 9d82819d5b065348ce623f196bf601028e22ed00 · Kirill Smelkov / linux

07 Apr, 2022 5 commits

SUNRPC: Handle low memory situations in call_status() · 9d82819d

Trond Myklebust authored Apr 07, 2022

We need to handle ENFILE, ENOBUFS, and ENOMEM, because
xprt_wake_pending_tasks() can be called with any one of these due to
socket creation failures.

Fixes: b61d59ff ("SUNRPC: xs_tcp_connect_worker{4,6}: merge common code")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
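
A minimal userspace sketch of the error grouping described above, assuming the kernel convention of negative errno values. The helper name `rpc_is_low_memory_error()` is hypothetical and is not the actual call_status() code in net/sunrpc/clnt.c. It only shows which three socket-creation failures are grouped together as temporary shortages.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper. Kernel code passes negative errno values, so the
 * sketch does the same. Returns true for the socket-creation failures
 * that indicate a temporary shortage of memory or file descriptors. */
static bool rpc_is_low_memory_error(int status)
{
	assert(status <= 0); /* only error codes (or 0) are expected */
	switch (status) {
	case -ENFILE:  /* system-wide file table is full */
	case -ENOBUFS: /* no buffer space available */
	case -ENOMEM:  /* out of memory */
		return true;
	default:
		return false;
	}
}
```

A caller would presumably delay the task and retry when the helper returns true; that retry policy is an assumption of this sketch, not something the commit message states.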

SUNRPC: Handle ENOMEM in call_transmit_status() · d3c15033

Trond Myklebust authored Apr 06, 2022

Both call_transmit() and call_bc_transmit() can now return ENOMEM, so
let's make sure that we handle the errors gracefully.

Fixes: 0472e476 ("SUNRPC: Convert socket page send code to use iov_iter()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFSv4.2: Fix missing removal of SLAB_ACCOUNT on kmem_cache allocation · dcc7977c

Muchun Song authored Apr 01, 2022

The commit 5c60e89e ("NFSv4.2: Fix up an invalid combination of memory
allocation flags") stripped GFP_KERNEL_ACCOUNT down to GFP_KERNEL;
however, it forgot to remove SLAB_ACCOUNT from the kmem_cache allocation.
This means that the memory is still limited by kmemcg. This patch also
fixes a NULL pointer dereference issue [1] reported by NeilBrown.

Link: https://lore.kernel.org/all/164870069595.25542.17292003658915487357@noble.neil.brown.name/ [1]
Fixes: 5c60e89e ("NFSv4.2: Fix up an invalid combination of memory allocation flags")
Fixes: 5abc1e37 ("mm: list_lru: allocate list_lru_one only when needed")
Reported-by: NeilBrown <neilb@suse.de>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Ensure we flush any closed sockets before xs_xprt_free() · f0043206

Trond Myklebust authored Apr 03, 2022

We must ensure that all sockets are closed before we call xprt_free()
and release the reference to the net namespace. The problem is that
calling fput() defers closing the socket until delayed_fput() gets
called.

Let's fix the situation by allowing rpciod and the transport teardown
code (which runs on the system wq) to call __fput_sync(), and directly
close the socket.
Reported-by: Felix Fu <foyjog@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: a73881c9 ("SUNRPC: Fix an Oops in udp_poll()")
Cc: stable@vger.kernel.org # 5.1.x: 3be232f1: SUNRPC: Prevent immediate close+reconnect
Cc: stable@vger.kernel.org # 5.1.x: 89f42494: SUNRPC: Don't call connect() more than once on a TCP socket
Cc: stable@vger.kernel.org # 5.1.x
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Replace readdir's use of xxhash() with hash_64() · 830f1111

Trond Myklebust authored Mar 30, 2022

Both xxhash() and hash_64() appear to give similarly low collision
rates with a standard linearly increasing readdir offset. They both give
similarly higher collision rates when applied to ext4's offsets.

So switch to using the standard hash_64().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
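
A userspace copy of the generic 64-bit hash_64() from include/linux/hash.h, to show what the replacement computes: a multiply by a 64-bit golden-ratio constant, keeping the top `bits` bits of the product. How the readdir code chooses `bits` for its cookie mapping is not stated in the commit, so it is left as a parameter here.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplicative constant used by the kernel's generic hash_64(). */
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/* Userspace copy of the 64-bit generic hash_64(): multiply, then keep
 * the top 'bits' bits of the (wrapping) 64-bit product. */
static inline uint32_t hash_64(uint64_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32); /* the result is a u32 */
	return (uint32_t)((val * GOLDEN_RATIO_64) >> (64 - bits));
}
```

For example, `hash_64(cookie, 18)` would map an arbitrary 64-bit readdir cookie to an index below 2^18; the value 18 is only illustrative.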

30 Mar, 2022 3 commits

SUNRPC: handle malloc failure in ->request_prepare · eb07d5a4

NeilBrown authored Mar 30, 2022

If ->request_prepare() detects an error, it sets ->rq_task->tk_status.
This is easy for callers to ignore.
The only caller is xprt_request_enqueue_receive() and it does ignore the
error, as does call_encode() which calls it.  This can result in a
request being queued to receive a reply without an allocated receive buffer.

So instead of setting rq_task->tk_status, return an error, and store it in
->tk_status only in call_encode().

The call to xprt_request_enqueue_receive() is now earlier in
call_encode(), where the error can still be handled.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
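
A simplified sketch of the calling convention this patch moves to: ->request_prepare() reports failure through its return value, and only call_encode() records it in tk_status before anything is queued for receive. The structures are pared-down, hypothetical stand-ins for the SUNRPC ones, and allocating the receive buffer with malloc() is an assumption made to keep the example self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the SUNRPC structures. */
struct rpc_task {
	int tk_status;
};

struct rpc_rqst {
	struct rpc_task *rq_task;
	void *rq_rbuffer;   /* receive buffer */
	size_t rq_rcvsize;
};

/* Report failure through the return value instead of writing
 * rq_task->tk_status behind the caller's back. */
static int request_prepare(struct rpc_rqst *req)
{
	req->rq_rbuffer = malloc(req->rq_rcvsize);
	return req->rq_rbuffer ? 0 : -ENOMEM;
}

/* The caller decides where the error goes; only here is it stored. */
static void call_encode(struct rpc_rqst *req)
{
	int err;

	assert(req->rq_task != NULL);
	err = request_prepare(req);
	if (err < 0) {
		/* Do not queue for a reply without a receive buffer. */
		req->rq_task->tk_status = err;
		return;
	}
	/* ... queue to receive the reply, then encode the call ... */
	req->rq_task->tk_status = 0;
}
```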

NFSv4: fix open failure with O_ACCMODE flag · b243874f

ChenXiaoSong authored Mar 29, 2022

A second open() with O_ACCMODE|O_DIRECT flags will fail.

Reproducer:
  1. mount -t nfs -o vers=4.2 $server_ip:/ /mnt/
  2. fd = open("/mnt/file", O_ACCMODE|O_DIRECT|O_CREAT)
  3. close(fd)
  4. fd = open("/mnt/file", O_ACCMODE|O_DIRECT)

The server's nfsd4_decode_share_access() fails with nfserr_bad_xdr when
the client uses the incorrect share access mode of 0.

Fix this by using the NFS4_SHARE_ACCESS_BOTH share access mode in the
client, just as for the first open.

Fixes: ce4ef7c0 ("NFS: Split out NFS v4 file operations")
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
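
A hedged sketch of the flag-to-share-access mapping the fix relies on. The NFS4_SHARE_ACCESS_* values are the ones defined by the NFSv4 protocol (RFC 7530); the helper `open_flags_to_share_access()` is hypothetical, and the actual client change may be structured differently. It illustrates that the Linux-specific access mode O_ACCMODE (value 3) should yield both read and write access rather than 0.

```c
#include <assert.h>
#include <fcntl.h>

/* Share access values from the NFSv4 protocol (RFC 7530). */
#define NFS4_SHARE_ACCESS_READ  0x00000001
#define NFS4_SHARE_ACCESS_WRITE 0x00000002
#define NFS4_SHARE_ACCESS_BOTH  0x00000003

/* Hypothetical helper: derive the share access from open() flags.
 * Linux accepts access mode 3 (O_ACCMODE) as a special case; here it
 * maps to read|write instead of falling through to 0, which the
 * server rejects with nfserr_bad_xdr. */
static unsigned int open_flags_to_share_access(int flags)
{
	unsigned int access = 0;

	if ((flags & O_ACCMODE) != O_WRONLY)
		access |= NFS4_SHARE_ACCESS_READ;
	if ((flags & O_ACCMODE) != O_RDONLY)
		access |= NFS4_SHARE_ACCESS_WRITE;
	assert(access != 0); /* never send share access 0 */
	return access;
}
```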

Revert "NFSv4: Handle the special Linux file open access mode" · ab0fc21b

ChenXiaoSong authored Mar 29, 2022

This reverts commit 44942b4e.

After a file is opened a second time with O_ACCMODE|O_DIRECT flags,
nfs4_valid_open_stateid() dereferences a NULL nfs4_state on lseek().

Reproducer:
  1. mount -t nfs -o vers=4.2 $server_ip:/ /mnt/
  2. fd = open("/mnt/file", O_ACCMODE|O_DIRECT|O_CREAT)
  3. close(fd)
  4. fd = open("/mnt/file", O_ACCMODE|O_DIRECT)
  5. lseek(fd)
Reported-by: Lyu Tao <tao.lyu@epfl.ch>
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

28 Mar, 2022 1 commit

NFSv4/pNFS: Fix another issue with a list iterator pointing to the head · 7c9d845f

Trond Myklebust authored Mar 28, 2022

In nfs4_callback_devicenotify(), if we don't find a matching entry for
the deviceid, we're left with a pointer to 'struct nfs_server' that
actually points to the list of super blocks associated with our struct
nfs_client.
Furthermore, even if we have a valid pointer, nothing pins the super
block, and so the struct nfs_server could end up getting freed while
we're using it.

Since all we want is a pointer to the struct pnfs_layoutdriver_type,
let's skip all the iteration over super blocks, and just use APIs to
find the layout driver directly.
Reported-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Fixes: 1be5683b ("pnfs: CB_NOTIFY_DEVICEID")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

26 Mar, 2022 1 commit

NFS: Don't loop forever in nfs_do_recoalesce() · d02d81ef

Trond Myklebust authored Mar 25, 2022

If __nfs_pageio_add_request() fails to add the request, it will return
with either desc->pg_error < 0, or mirror->pg_recoalesce will be set, so
we are guaranteed either to exit the function altogether, or to loop.

However, if there is nothing left in mirror->pg_list to coalesce, we must
exit, so make sure that we clear mirror->pg_recoalesce every time we
loop.
Reported-by: Olga Kornievskaia <aglo@umich.edu>
Fixes: 70536bf4 ("NFS: Clean up reset of the mirror accounting variables")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

25 Mar, 2022 2 commits

SUNRPC: Don't return error values in sysfs read of closed files · ebbe7887

Trond Myklebust authored Mar 24, 2022

Instead of returning an error value, which ends up being the return
value for the read() system call, it is more elegant to simply return
the error as a string value.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Do not dereference non-socket transports in sysfs · 421ab1be

Trond Myklebust authored Mar 25, 2022

Do not cast the struct xprt to a sock_xprt unless we know it is a UDP or
TCP transport. Otherwise the call to lock the mutex will scribble over
whatever structure is actually there. This has been seen to cause hard
system lockups when the underlying transport was RDMA.

Fixes: b49ea673 ("SUNRPC: lock against ->sock changing during sysfs read")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
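
A self-contained sketch of the guard this fix adds: only downcast the generic transport to the socket transport when its type says it is UDP or TCP. The `enum xprt_kind` field is a hypothetical stand-in for however the real code identifies the transport class, and the manual offsetof() arithmetic mimics the kernel's container_of().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transport type tag. */
enum xprt_kind { XPRT_KIND_UDP, XPRT_KIND_TCP, XPRT_KIND_RDMA };

struct rpc_xprt {
	enum xprt_kind kind;
};

/* Socket transports embed the generic transport. */
struct sock_xprt {
	struct rpc_xprt xprt;
	int sock_fd;
};

/* Return the socket transport only when the generic transport really
 * is one. An RDMA transport has a different layout, so treating it as
 * a sock_xprt would make later field accesses (such as taking a mutex)
 * scribble over unrelated memory. */
static struct sock_xprt *xprt_to_sock_xprt(struct rpc_xprt *xprt)
{
	assert(xprt != NULL);
	if (xprt->kind != XPRT_KIND_UDP && xprt->kind != XPRT_KIND_TCP)
		return NULL;
	/* container_of() equivalent */
	return (struct sock_xprt *)((char *)xprt -
				    offsetof(struct sock_xprt, xprt));
}
```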

24 Mar, 2022 3 commits

NFSv4.1: don't retry BIND_CONN_TO_SESSION on session error · 1d15d121

Olga Kornievskaia authored Mar 24, 2022

There is no reason to retry the operation if a session error has
occurred; in that case the result structure isn't filled out.

Fixes: dff58530 ("NFSv4.1: fix handling of backchannel binding in BIND_CONN_TO_SESSION")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC don't resend a task on an offlined transport · 82ee41b8

Olga Kornievskaia authored Mar 24, 2022

When a task is being retried due to an NFS error, and the assigned
transport has been put offline, pick a new transport if the task is
relocatable.

Fixes: 6f081693 ("sunrpc: remove an offlined xprt using sysfs")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: replace usage of found with dedicated list iterator variable · 3de24f3d

Jakob Koschel authored Mar 24, 2022

To move the list iterator variable into the list_for_each_entry_*()
macro in the future, the list iterator variable should not be used
after the loop body.

To *never* use the list iterator variable after the loop, it was
concluded to use a separate iterator variable instead of a
found boolean [1].

This removes the need for a found variable: simply checking whether
the variable was set determines whether the break/goto was hit.

Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
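
A userspace sketch of the pattern, using a hypothetical NULL-terminated list so that it compiles on its own. With the kernel's circular list_head lists the stakes are higher: after a full traversal the cursor is not NULL but a bogus entry computed from the list head, which is why it must not be used after the loop. Here the cursor is scoped to the `for` statement, and the match is copied into a separate variable that stays NULL unless the `break` was taken.

```c
#include <stddef.h>

/* Hypothetical list node standing in for a kernel list entry. */
struct item {
	int key;
	struct item *next;
};

static struct item *find_item(struct item *head, int key)
{
	/* Dedicated result variable: NULL unless the break was hit. */
	struct item *found_item = NULL;

	/* The cursor 'pos' does not exist after the loop, so it cannot
	 * be misused there. */
	for (struct item *pos = head; pos != NULL; pos = pos->next) {
		if (pos->key == key) {
			found_item = pos;
			break;
		}
	}
	return found_item;
}
```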

23 Mar, 2022 1 commit

SUNRPC: avoid race between mod_timer() and del_timer_sync() · 3848e96e

NeilBrown authored Mar 08, 2022

xprt_destroy() claims XPRT_LOCKED and then calls del_timer_sync().
Both xprt_unlock_connect() and xprt_release() call ->release_xprt(),
which drops XPRT_LOCKED, and *then* xprt_schedule_autodisconnect(),
which calls mod_timer().

This may result in mod_timer() being called *after* del_timer_sync().
When this happens, the timer may fire long after the xprt has been freed,
and run_timer_softirq() will probably crash.

The pairing of ->release_xprt() and xprt_schedule_autodisconnect() is
always called under ->transport_lock.  So if we take ->transport_lock to
call del_timer_sync(), we can be sure that mod_timer() will run first
(if it runs at all).

Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

22 Mar, 2022 16 commits

pNFS/files: Ensure pNFS allocation modes are consistent with nfsiod · a245832a

Trond Myklebust authored Mar 21, 2022

Ensure that pNFS file commit allocations in rpciod/nfsiod callbacks can
fail in low memory mode, so that the threads don't block and loop
forever.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

pNFS/flexfiles: Ensure pNFS allocation modes are consistent with nfsiod · 3e5f151e

Trond Myklebust authored Mar 21, 2022

Ensure that pNFS flexfile allocations in rpciod/nfsiod callbacks can
fail in low memory mode, so that the threads don't block and loop
forever.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFSv4/pnfs: Ensure pNFS allocation modes are consistent with nfsiod · 63d8a41b

Trond Myklebust authored Mar 21, 2022

Ensure that pNFS allocations that can be called from rpciod/nfsiod
callbacks can fail in low memory mode, so that the threads don't block
and loop forever.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Avoid writeback threads getting stuck in mempool_alloc() · 0bae835b

Trond Myklebust authored Mar 21, 2022

In a low memory situation, allow the NFS writeback code to fail without
getting stuck in infinite loops in mempool_alloc().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: nfsiod should not block forever in mempool_alloc() · 515dcdcd

Trond Myklebust authored Mar 21, 2022

The concern is that since nfsiod is sometimes required to kick off a
commit, it can get locked up waiting forever in mempool_alloc() instead
of failing gracefully and leaving the commit until later.

Try to allocate from the slab first, with GFP_KERNEL | __GFP_NORETRY,
then fall back to a non-blocking attempt to allocate from the memory
pool.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Make the rpciod and xprtiod slab allocation modes consistent · b2648015

Trond Myklebust authored Mar 21, 2022

Make sure that rpciod and xprtiod are always using the same slab
allocation modes.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Fix unx_lookup_cred() allocation · 059ee82b

Trond Myklebust authored Mar 21, 2022

Default to the same mempool allocation strategy as for rpc_malloc().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Fix memory allocation in rpc_alloc_task() · 910ad386

Trond Myklebust authored Mar 21, 2022

As for rpc_malloc(), we first try allocating from the slab, then fall
back to a non-waiting allocation from the mempool.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Fix memory allocation in rpc_malloc() · 33e5c765

Trond Myklebust authored Mar 14, 2022

When in a low memory situation, we do want rpciod to kick off direct
reclaim in the case where that helps; however, we don't want it looping
forever in mempool_alloc().
So first try allocating from the slab using GFP_KERNEL | __GFP_NORETRY,
and then fall back to a GFP_NOWAIT allocation from the mempool.

Ditto for rpc_alloc_task().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
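
A userspace sketch of the two-step strategy, under stated assumptions: `try_alloc` stands in for the slab allocation with GFP_KERNEL | __GFP_NORETRY (may fail, does not loop), and a small preallocated array stands in for the mempool, which is consulted without waiting, as with GFP_NOWAIT. All names here are hypothetical; the real rpc_malloc() and rpc_alloc_task() are not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_SLOTS 4   /* hypothetical reserve size */
#define OBJ_SIZE 256   /* hypothetical object size */

/* Hypothetical stand-in for a mempool: objects reserved up front. */
struct reserve_pool {
	void *slots[POOL_SLOTS];
	int nr_free;
};

static int pool_init(struct reserve_pool *pool)
{
	pool->nr_free = 0;
	for (int i = 0; i < POOL_SLOTS; i++) {
		pool->slots[i] = malloc(OBJ_SIZE);
		if (!pool->slots[i])
			return -1;
		pool->nr_free++;
	}
	return 0;
}

/* Non-blocking: hand out a reserved object or fail immediately. */
static void *pool_alloc_nowait(struct reserve_pool *pool)
{
	if (pool->nr_free == 0)
		return NULL;
	return pool->slots[--pool->nr_free];
}

/* Simulates the first-stage allocator failing under memory pressure. */
static void *fail_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/* Step 1: try the general allocator, which may fail but never loops
 * (the role of GFP_KERNEL | __GFP_NORETRY).
 * Step 2: fall back to the reserve without waiting (the role of
 * GFP_NOWAIT). NULL means "back off and retry later". */
static void *alloc_with_fallback(struct reserve_pool *pool,
				 void *(*try_alloc)(size_t))
{
	void *p;

	assert(pool != NULL && try_alloc != NULL);
	p = try_alloc(OBJ_SIZE);
	if (p != NULL)
		return p;
	return pool_alloc_nowait(pool);
}
```

Injecting the first-stage allocator makes memory pressure easy to simulate: passing `fail_alloc` drains the reserve one object at a time, after which the call returns NULL instead of blocking, leaving the caller free to retry later.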

SUNRPC: Improve accuracy of socket ENOBUFS determination · d0afde5f

Trond Myklebust authored Mar 14, 2022

The current code checks whether or not the socket is in a writeable
state after we get an EAGAIN. That is racy, since we've dropped the
socket lock, so the amount of free buffer may have changed.

Instead, let's check whether the socket is writeable before we try to
write to it. If that was the case, we do expect the message to be at
least partially sent unless we're in a low memory situation.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
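
A userspace analogue of the reordering, assuming poll() with POLLOUT is an acceptable stand-in for the in-kernel writeability test: writeability is sampled before the non-blocking send, and an EAGAIN from a socket that claimed to have space is reported as -ENOBUFS. The function name and the exact mapping are illustrative, not the xprtsock implementation.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Returns bytes sent, -ENOBUFS if the socket reported write space but
 * still could not take any data, -EAGAIN if it had no space to begin
 * with, or another negative errno. */
static ssize_t send_classified(int fd, const void *buf, size_t len)
{
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	int writable;
	ssize_t n;

	assert(fd >= 0);
	/* Sample writeability first; a timeout of 0 means do not block. */
	writable = poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLOUT);
	n = send(fd, buf, len, MSG_DONTWAIT);
	if (n >= 0)
		return n;
	if (errno == EAGAIN || errno == EWOULDBLOCK)
		return writable ? -ENOBUFS : -EAGAIN;
	return -errno;
}
```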

SUNRPC: Replace internal use of SOCKWQ_ASYNC_NOSPACE · 2790a624

Trond Myklebust authored Mar 15, 2022

The socket's SOCKWQ_ASYNC_NOSPACE can be cleared by various actors in
the socket layer, so replace it with our own flag in the transport
sock_state field.
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Fix socket waits for write buffer space · 7496b59f

Trond Myklebust authored Mar 14, 2022

The socket layer requires that we use the socket lock to protect changes
to the sock->sk_write_pending field and others.
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Only save the TCP source port after the connection is complete · 3b21f757

Trond Myklebust authored Mar 16, 2022

Since the RPC client uses a non-blocking connect(), we do not expect to
see it return '0' under normal circumstances.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Don't call connect() more than once on a TCP socket · 89f42494

Trond Myklebust authored Mar 16, 2022

Avoid socket state races due to repeated calls to ->connect() using the
same socket. If connect() returns 0 due to the connection having
completed, but we are in fact in a closing state, then we may leave the
XPRT_CONNECTING flag set on the transport.
Reported-by: Enrico Scholz <enrico.scholz@sigma-chemnitz.de>
Fixes: 3be232f1 ("SUNRPC: Prevent immediate close+reconnect")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Fix revalidation of empty readdir pages · e47a62df

Trond Myklebust authored Mar 22, 2022

If the page is empty, we need to check the array->last_cookie instead of
the first entry. Add a helper for the cases where we care.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Don't deadlock when cookie hashes collide · 648a4548

Trond Myklebust authored Mar 21, 2022

In the very rare case where the readdir reply contains multiple cookies
that map to the same hash value, we can end up deadlocking waiting for a
page lock that we already hold. In this case we should fail the page
lock by using grab_cache_page_nowait().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
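
A userspace analogue of the deadlock and its fix, with a pthread mutex standing in for the page lock: if two cookies hash to the same page, the second lookup asks for a lock the thread already holds, and a blocking acquire on a non-recursive lock would never return. A non-blocking attempt fails with EBUSY instead, which is the role grab_cache_page_nowait() plays in the commit. The helper name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Try to take the (hypothetical) page lock without waiting. Returns
 * true if the lock was acquired, false if someone, possibly this very
 * thread, already holds it. */
static bool lock_page_nowait(pthread_mutex_t *page_lock)
{
	int err;

	assert(page_lock != NULL);
	err = pthread_mutex_trylock(page_lock);
	assert(err == 0 || err == EBUSY);
	return err == 0;
}
```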

21 Mar, 2022 1 commit

NFSv4.1 provide mount option to toggle trunking discovery · a43bf604

Olga Kornievskaia authored Mar 16, 2022

Introduce a new mount option -- trunkdiscovery,notrunkdiscovery -- to
toggle whether or not the client will engage in active discovery
of trunking locations.

v2: make notrunkdiscovery the default
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Fixes: 1976b2b3 ("NFSv4.1 query for fs_location attr on a new file system")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

13 Mar, 2022 7 commits

SUNRPC: change locking for xs_swap_enable/disable · 693486d5