- 06 Oct, 2021 1 commit
-
-
Benjamin Coddington authored
If nfsd has existing listening sockets without any processes, then an error returned from svc_create_xprt() for an additional transport will remove those existing listeners. We're seeing this in practice when userspace attempts to create rpcrdma transports without having the rpcrdma modules present before creating nfsd kernel processes. Fix this by checking for existing sockets before calling nfsd_destroy(). Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
- 01 Oct, 2021 2 commits
-
-
J. Bruce Fields authored
If sd_max is unsigned, then sd_max - GSS_SEQ_WIN is a very large number whenever sd_max is less than GSS_SEQ_WIN, and the comparison: seq_num <= sd->sd_max - GSS_SEQ_WIN in gss_check_seq_num is pretty much always true, even when that's clearly not what was intended. This was causing pynfs to hang when using krb5, because pynfs uses zero as the initial gss sequence number. That's perfectly legal, but this logic error causes knfsd to drop the rpc in that case. Out-of-order sequence IDs in the first GSS_SEQ_WIN (128) calls will also cause this. Fixes: 10b9d99a ("SUNRPC: Augment server-side rpcgss tracepoints") Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
Trond Myklebust authored
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
- 30 Sep, 2021 2 commits
-
-
Trond Myklebust authored
RFC3530 notes that the 'dircount' field may be zero, in which case the recommendation is to ignore it, and only enforce the 'maxcount' field. In RFC5661, this recommendation to ignore a zero valued field becomes a requirement. Fixes: aee37764 ("nfsd4: fix rd_dircount enforcement") Cc: <stable@vger.kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
Patrick Ho authored
init_nfsd() should not unregister pernet subsys if the register fails but should instead unwind from the last successful operation which is register_filesystem(). Unregistering a failed register_pernet_subsys() call can result in a kernel GPF as revealed by programmatically injecting an error in register_pernet_subsys(). Verified the fix handled failure gracefully with no lingering nfsd entry in /proc/filesystems. This change was introduced by the commit bd5ae928 ("nfsd: register pernet ops last, unregister first"), the original error handling logic was correct. Fixes: bd5ae928 ("nfsd: register pernet ops last, unregister first") Cc: stable@vger.kernel.org Signed-off-by: Patrick Ho <Patrick.Ho@netapp.com> Acked-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
- 17 Sep, 2021 2 commits
-
-
Dai Ngo authored
When the back channel enters SEQ4_STATUS_CB_PATH_DOWN state, the client recovers by sending BIND_CONN_TO_SESSION but the server fails to recover the back channel and leaves it as NFSD4_CB_DOWN. Fix by enhancing nfsd4_bind_conn_to_session to probe the back channel by calling nfsd4_probe_callback. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
Chuck Lever authored
Dai Ngo reports that, since the XDR overhaul, the NLM server crashes when the TEST procedure wants to return NLM_DENIED. There is a bug in svcxdr_encode_owner() that none of our standard test cases found. Replace the open-coded function with a call to an appropriate pre-fabricated XDR helper. Reported-by: Dai Ngo <Dai.Ngo@oracle.com> Fixes: a6a63ca5 ("lockd: Common NLM XDR helpers") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
- 03 Sep, 2021 1 commit
-
-
NeilBrown authored
When the NFS server receives a large gss (kerberos) credential and tries to pass it up to rpc.svcgssd (which is deprecated), it triggers an infinite loop in cache_read(). cache_request() always returns -EAGAIN, and this causes a "goto again". This patch: - changes the error to -E2BIG to avoid the infinite loop, and - generates a WARN_ONCE when rsi_request first sees an over-sized credential. The warning suggests switching to gssproxy. Link: https://bugzilla.kernel.org/show_bug.cgi?id=196583Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
- 01 Sep, 2021 1 commit
-
-
NeilBrown authored
alloc_pages_bulk_array() attempts to allocate at least one page based on the provided pages, and then opportunistically allocates more if that can be done without dropping the spinlock. So if it returns fewer than requested, that could just mean that it needed to drop the lock. In that case, try again immediately. Only pause for a time if no progress could be made. Reported-and-tested-by: Mike Javorski <mike.javorski@gmail.com> Reported-and-tested-by: Lothar Paltins <lopa@mailbox.org> Fixes: f6e70aab ("SUNRPC: refresh rq_pages using a bulk page allocator") Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Mel Gorman <mgorman@suse.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
- 26 Aug, 2021 4 commits
-
-
J. Bruce Fields authored
Unlike other filesystems, NFSv3 tries to use fl_file in the GETLK case. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
J. Bruce Fields authored
In the reexport case, nfsd is currently passing along locks with the reclaim bit set. The client sends a new lock request, which is granted if there's currently no conflict--even if it's possible a conflicting lock could have been briefly held in the interim. We don't currently have any way to safely grant reclaim, so for now let's just deny them all. I'm doing this by passing the reclaim bit to nfs and letting it fail the call, with the idea that eventually the client might be able to do something more forgiving here. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
J. Bruce Fields authored
As in the v4 case, it doesn't work well to block waiting for a lock on an nfs filesystem. As in the v4 case, that means we're depending on the client to poll. It's probably incorrect to depend on that, but I *think* clients do poll in practice. In any case, it's an improvement over hanging the lockd thread indefinitely as we currently are. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
J. Bruce Fields authored
NFS implements blocking locks by blocking inside its lock method. In the reexport case, this blocks the nfs server thread, which could lead to deadlocks since an nfs server thread might be required to unlock the conflicting lock. It also causes a crash, since the nfs server thread assumes it can free the lock when its lm_notify lock callback is called. Ideal would be to make the nfs lock method return without blocking in this case, but for now it works just not to attempt blocking locks. The difference is just that the original client will have to poll (as it does in the v4.0 case) instead of getting a callback when the lock's available. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
- 23 Aug, 2021 4 commits
-
-
J. Bruce Fields authored
We shouldn't really be using a read-only file descriptor to take a write lock. Most filesystems will put up with it. But NFS, for example, won't. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
J. Bruce Fields authored
Update comment to reflect that we *do* allow reexport, whether it's a good idea or not.... Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
J. Bruce Fields authored
Make this lookup slightly more concise, and prepare for changing how we look this up in a following patch. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
J. Bruce Fields authored
It'll come in handy to get the whole nlm_lock. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
- 21 Aug, 2021 1 commit
-
-
J. Bruce Fields authored
Locks have two sets of op arrays, fl_lmops for the lock manager (lockd or nfsd), fl_ops for the filesystem. The server-side lockd code has been setting its own fl_ops, which leads to confusion (and crashes) in the reexport case, where the filesystem expects to be the only one setting fl_ops. And there's no reason for it that I can see-the lm_get/put_owner ops do the same job. Reported-by: Daire Byrne <daire@dneg.com> Tested-by: Daire Byrne <daire@dneg.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
- 20 Aug, 2021 4 commits
-
-
Chuck Lever authored
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
Chuck Lever authored
Disconnect injection stress-tests the ability for both client and server implementations to behave resiliently in the face of network instability. A file called /sys/kernel/debug/fail_sunrpc/ignore-server-disconnect enables administrators to turn off server-side disconnect injection while allowing other types of sunrpc errors to be injected. The default setting is that server-side disconnect injection is enabled (ignore=false). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
Chuck Lever authored
Disconnect injection stress-tests the ability for both client and server implementations to behave resiliently in the face of network instability. Convert the existing client-side disconnect injection infrastructure to use the kernel's generic error injection facility. The generic facility has a richer set of injection criteria. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
Chuck Lever authored
This directory will contain a set of administrative controls for enabling error injection for kernel RPC consumers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
- 19 Aug, 2021 1 commit
-
-
Chuck Lever authored
svc_xprt_free() already "puts" the bc_xprt before calling the transport's "free" method. No need to do it twice. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
- 17 Aug, 2021 17 commits
-
-
J. Bruce Fields authored
This should use the network-namespace-wide client_lock, not the per-client cl_lock. You shouldn't see any bugs unless you're actually using the forced-expiry interface introduced by 89c905be. Fixes: 89c905be "nfsd: allow forced expiration of NFSv4 clients" Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
J. Bruce Fields authored
The failure case here should be rare, but it's obviously wrong. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
Chuck Lever authored
Shared by client and server. See: https://www.iana.org/assignments/rpc-authentication-numbers/rpc-authentication-numbers.xhtmlSigned-off-by: Chuck Lever <chuck.lever@oracle.com>
-
Jia He authored
nsm_use_hostnames is a module parameter and it will be exported to sysctl procfs. This is to let user sometimes change it from userspace. But the minimal unit for sysctl procfs read/write it sizeof(int). In big endian system, the converting from/to bool to/from int will cause error for proc items. This patch use a new proc_handler proc_dobool to fix it. Signed-off-by: Jia He <hejianet@gmail.com> Reviewed-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> [thuth: Fix typo in commit message] Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
Jia He authored
This is to let bool variable could be correctly displayed in big/little endian sysctl procfs. sizeof(bool) is arch dependent, proc_dobool should work in all arches. Suggested-by: Pan Xinhui <xinhui@linux.vnet.ibm.com> Signed-off-by: Jia He <hejianet@gmail.com> [thuth: rebased the patch to the current kernel version] Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
Chuck Lever authored
Some paths through svc_process() leave rqst->rq_procinfo set to NULL, which triggers a crash if tracing happens to be enabled. Fixes: 89ff8749 ("SUNRPC: Display RPC procedure names instead of proc numbers") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
NeilBrown authored
Including one's name in copyright claims is appropriate. Including it in random comments is just vanity. After 2 decades, it is time for these to be gone. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
Chuck Lever authored
Relieve contention on sc_rw_ctxt_lock by converting rdma->sc_rw_ctxts to an llist. The goal is to reduce the average overhead of Send completions, because a transport's completion handlers are single-threaded on one CPU core. This change reduces CPU utilization of each Send completion by 2-3% on my server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-By: Tom Talpey <tom@talpey.com>
-
Chuck Lever authored
/proc/lock_stat indicates the the sc_send_lock is heavily contended when the server is under load from a single client. To address this, convert the send_ctxt free list to an llist. Returning an item to the send_ctxt cache is now waitless, which reduces the instruction path length in the single-threaded Send handler (svc_rdma_wc_send). The goal is to enable the ib_comp_wq worker to handle a higher RPC/RDMA Send completion rate given the same CPU resources. This change reduces CPU utilization of Send completion by 2-3% on my server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-By: Tom Talpey <tom@talpey.com>
-
Chuck Lever authored
Because wake_up() takes an IRQ-safe lock, it can be expensive, especially to call inside of a single-threaded completion handler. What's more, the Send wait queue almost never has waiters, so most of the time, this is an expensive no-op. As always, the goal is to reduce the average overhead of each completion, because a transport's completion handlers are single- threaded on one CPU core. This change reduces CPU utilization of the Send completion thread by 2-3% on my server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-By: Tom Talpey <tom@talpey.com>
-
Benjamin Coddington authored
After calling vfs_test_lock() the pointer to a conflicting lock can be returned, and that lock is not guarunteed to be owned by nlm. In that case, we cannot cast it to struct nlm_lockowner. Instead return the pid of that conflicting lock. Fixes: 646d73e9 ("lockd: Show pid of lockd for remote locks") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
Chuck Lever authored
Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
Chuck Lever authored
Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
Steven Rostedt (VMware) authored
There's a few cases that a string that is to be recorded in a trace event, does not have a terminating 'nul' character, and instead, the tracepoint passes in the length of the string to record. Add two helper macros to the trace event code that lets this work easier, than tricks with "%.*s" logic. __string_len() which is similar to __string() for declaration, but takes a length argument. __assign_str_len() which is similar to __assign_str() for assiging the string, but it too takes a length argument. Note, the TRACE_EVENT() macro will allocate the location on the ring buffer to 'len + 1', that will be used to store the string into. It is a requirement that the 'len' used for this is a most the length of the string being recorded. This string can still use __get_str() just like strings created with __string() can use to retrieve the string. Link: https://lore.kernel.org/linux-nfs/20210513105018.7539996a@gandalf.local.home/Tested-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
Chuck Lever authored
Large splice reads call put_page() repeatedly. put_page() is relatively expensive to call, so replace it with the new svc_rqst_replace_page() helper to help amortize that cost. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: NeilBrown <neilb@suse.de>
-
Chuck Lever authored
Replacing a page in rq_pages[] requires a get_page(), which is a bus-locked operation, and a put_page(), which can be even more costly. To reduce the cost of replacing a page in rq_pages[], batch the put_page() operations by collecting "freed" pages in a pagevec, and then release those pages when the pagevec is full. This pagevec is also emptied when each RPC completes. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
Chuck Lever authored
A few useful observations: - The value in @size is never modified. - splice_desc.len is an unsigned int, and so is xdr_buf.page_len. An implicit cast to size_t is unnecessary. - The computation of .page_len is the same in all three arms of the "if" statement, so hoist it out to make it clear that the operation is an unconditional invariant. The resulting function is 18 bytes shorter on my system (-Os). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: NeilBrown <neilb@suse.de>
-