Commits · 02579b2ff8b0becfb51d85a975908ac4ab15fba8 · Kirill Smelkov / linux

17 Sep, 2021 2 commits

nfsd: back channel stuck in SEQ4_STATUS_CB_PATH_DOWN · 02579b2f

Dai Ngo authored Sep 16, 2021

When the back channel enters SEQ4_STATUS_CB_PATH_DOWN state, the client
recovers by sending BIND_CONN_TO_SESSION but the server fails to recover
the back channel and leaves it as NFSD4_CB_DOWN.

Fix by enhancing nfsd4_bind_conn_to_session to probe the back channel
by calling nfsd4_probe_callback.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

02579b2f

NLM: Fix svcxdr_encode_owner() · 89c485c7

Chuck Lever authored Sep 16, 2021

Dai Ngo reports that, since the XDR overhaul, the NLM server crashes
when the TEST procedure wants to return NLM_DENIED. There is a bug
in svcxdr_encode_owner() that none of our standard test cases found.

Replace the open-coded function with a call to an appropriate
pre-fabricated XDR helper.
Reported-by: Dai Ngo <Dai.Ngo@oracle.com>
Fixes: a6a63ca5 ("lockd: Common NLM XDR helpers")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

89c485c7

03 Sep, 2021 1 commit

SUNRPC: improve error response to over-size gss credential · 0c217d50

NeilBrown authored Sep 02, 2021

When the NFS server receives a large gss (kerberos) credential and tries
to pass it up to rpc.svcgssd (which is deprecated), it triggers an
infinite loop in cache_read().

cache_request() always returns -EAGAIN, and this causes a "goto again".

This patch:
 - changes the error to -E2BIG to avoid the infinite loop, and
 - generates a WARN_ONCE when rsi_request first sees an over-sized
   credential.  The warning suggests switching to gssproxy.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=196583Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

0c217d50

01 Sep, 2021 1 commit

SUNRPC: don't pause on incomplete allocation · e38b3f20

NeilBrown authored Aug 30, 2021

alloc_pages_bulk_array() attempts to allocate at least one page based on
the provided pages, and then opportunistically allocates more if that
can be done without dropping the spinlock.

So if it returns fewer than requested, that could just mean that it
needed to drop the lock.  In that case, try again immediately.

Only pause for a time if no progress could be made.
Reported-and-tested-by: Mike Javorski <mike.javorski@gmail.com>
Reported-and-tested-by: Lothar Paltins <lopa@mailbox.org>
Fixes: f6e70aab ("SUNRPC: refresh rq_pages using a bulk page allocator")
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Mel Gorman <mgorman@suse.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

e38b3f20

26 Aug, 2021 4 commits

nfsd: fix crash on LOCKT on reexported NFSv3 · 0bcc7ca4

J. Bruce Fields authored Aug 24, 2021

Unlike other filesystems, NFSv3 tries to use fl_file in the GETLK case.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

0bcc7ca4

nfs: don't allow reexport reclaims · bb0a55bb

J. Bruce Fields authored Aug 20, 2021

In the reexport case, nfsd is currently passing along locks with the
reclaim bit set.  The client sends a new lock request, which is granted
if there's currently no conflict--even if it's possible a conflicting
lock could have been briefly held in the interim.

We don't currently have any way to safely grant reclaim, so for now
let's just deny them all.

I'm doing this by passing the reclaim bit to nfs and letting it fail the
call, with the idea that eventually the client might be able to do
something more forgiving here.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

bb0a55bb

lockd: don't attempt blocking locks on nfs reexports · b840be2f

J. Bruce Fields authored Aug 20, 2021

As in the v4 case, it doesn't work well to block waiting for a lock on
an nfs filesystem.

As in the v4 case, that means we're depending on the client to poll.
It's probably incorrect to depend on that, but I *think* clients do poll
in practice.  In any case, it's an improvement over hanging the lockd
thread indefinitely as we currently are.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

b840be2f

nfs: don't atempt blocking locks on nfs reexports · f657f8ee

J. Bruce Fields authored Aug 20, 2021

NFS implements blocking locks by blocking inside its lock method. In
the reexport case, this blocks the nfs server thread, which could lead
to deadlocks since an nfs server thread might be required to unlock the
conflicting lock. It also causes a crash, since the nfs server thread
assumes it can free the lock when its lm_notify lock callback is called.

Ideal would be to make the nfs lock method return without blocking in
this case, but for now it works just not to attempt blocking locks. The
difference is just that the original client will have to poll (as it
does in the v4.0 case) instead of getting a callback when the lock's
available.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

f657f8ee

23 Aug, 2021 4 commits

Keep read and write fds with each nlm_file · 7f024fcd

J. Bruce Fields authored Aug 23, 2021

We shouldn't really be using a read-only file descriptor to take a write
lock.

Most filesystems will put up with it.  But NFS, for example, won't.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

7f024fcd

lockd: update nlm_lookup_file reexport comment · b661601a

J. Bruce Fields authored Aug 20, 2021

Update comment to reflect that we *do* allow reexport, whether it's a
good idea or not....
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

b661601a

nlm: minor refactoring · a81041b7

J. Bruce Fields authored Aug 23, 2021

Make this lookup slightly more concise, and prepare for changing how we
look this up in a following patch.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

a81041b7

nlm: minor nlm_lookup_file argument change · 2dc6f19e

J. Bruce Fields authored Aug 23, 2021

It'll come in handy to get the whole nlm_lock.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2dc6f19e

21 Aug, 2021 1 commit

lockd: lockd server-side shouldn't set fl_ops · 7de875b2

J. Bruce Fields authored Aug 20, 2021

Locks have two sets of op arrays, fl_lmops for the lock manager (lockd
or nfsd), fl_ops for the filesystem.  The server-side lockd code has
been setting its own fl_ops, which leads to confusion (and crashes) in
the reexport case, where the filesystem expects to be the only one
setting fl_ops.

And there's no reason for it that I can see-the lm_get/put_owner ops do
the same job.
Reported-by: Daire Byrne <daire@dneg.com>
Tested-by: Daire Byrne <daire@dneg.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

7de875b2

20 Aug, 2021 4 commits

SUNRPC: Add documentation for the fail_sunrpc/ directory · 400edd8c
Chuck Lever authored Aug 09, 2021
```
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
```
400edd8c

SUNRPC: Server-side disconnect injection · 3a126180

Chuck Lever authored Aug 03, 2021

Disconnect injection stress-tests the ability for both client and
server implementations to behave resiliently in the face of network
instability.

A file called /sys/kernel/debug/fail_sunrpc/ignore-server-disconnect
enables administrators to turn off server-side disconnect injection
while allowing other types of sunrpc errors to be injected. The
default setting is that server-side disconnect injection is enabled
(ignore=false).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

3a126180

SUNRPC: Move client-side disconnect injection · a4ae3081

Chuck Lever authored Aug 05, 2021

Disconnect injection stress-tests the ability for both client and
server implementations to behave resiliently in the face of network
instability.

Convert the existing client-side disconnect injection infrastructure
to use the kernel's generic error injection facility. The generic
facility has a richer set of injection criteria.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

a4ae3081

SUNRPC: Add a /sys/kernel/debug/fail_sunrpc/ directory · c782af25

Chuck Lever authored Aug 03, 2021

This directory will contain a set of administrative controls for
enabling error injection for kernel RPC consumers.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

c782af25

19 Aug, 2021 1 commit

svcrdma: xpt_bc_xprt is already clear in __svc_rdma_free() · 729580dd

Chuck Lever authored Aug 18, 2021

svc_xprt_free() already "puts" the bc_xprt before calling the
transport's "free" method. No need to do it twice.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

729580dd

17 Aug, 2021 17 commits

nfsd4: Fix forced-expiry locking · f7104cc1

J. Bruce Fields authored Aug 12, 2021

This should use the network-namespace-wide client_lock, not the
per-client cl_lock.

You shouldn't see any bugs unless you're actually using the
forced-expiry interface introduced by 89c905be.

Fixes: 89c905be "nfsd: allow forced expiration of NFSv4 clients"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

f7104cc1

rpc: fix gss_svc_init cleanup on failure · 5a475344

J. Bruce Fields authored Aug 12, 2021

The failure case here should be rare, but it's obviously wrong.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

5a475344

SUNRPC: Add RPC_AUTH_TLS protocol numbers · b4ab2fea

Chuck Lever authored Jul 30, 2021

Shared by client and server. See:

https://www.iana.org/assignments/rpc-authentication-numbers/rpc-authentication-numbers.xhtmlSigned-off-by: Chuck Lever <chuck.lever@oracle.com>

b4ab2fea

lockd: change the proc_handler for nsm_use_hostnames · d02a3a2c

Jia He authored Aug 03, 2021

nsm_use_hostnames is a module parameter and it will be exported to sysctl
procfs. This is to let user sometimes change it from userspace. But the
minimal unit for sysctl procfs read/write it sizeof(int).
In big endian system, the converting from/to  bool to/from int will cause
error for proc items.

This patch use a new proc_handler proc_dobool to fix it.
Signed-off-by: Jia He <hejianet@gmail.com>
Reviewed-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
[thuth: Fix typo in commit message]
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

d02a3a2c

sysctl: introduce new proc handler proc_dobool · a2071573

Jia He authored Aug 03, 2021

This is to let bool variable could be correctly displayed in
big/little endian sysctl procfs. sizeof(bool) is arch dependent,
proc_dobool should work in all arches.
Suggested-by: Pan Xinhui <xinhui@linux.vnet.ibm.com>
Signed-off-by: Jia He <hejianet@gmail.com>
[thuth: rebased the patch to the current kernel version]
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

a2071573

SUNRPC: Fix a NULL pointer deref in trace_svc_stats_latency() · 5c117207

Chuck Lever authored Aug 05, 2021

Some paths through svc_process() leave rqst->rq_procinfo set to
NULL, which triggers a crash if tracing happens to be enabled.

Fixes: 89ff8749 ("SUNRPC: Display RPC procedure names instead of proc numbers")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

5c117207

NFSD: remove vanity comments · ea49dc79

NeilBrown authored Jul 28, 2021

Including one's name in copyright claims is appropriate.  Including it
in random comments is just vanity.  After 2 decades, it is time for
these to be gone.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

ea49dc79

svcrdma: Convert rdma->sc_rw_ctxts to llist · 07a92d00

Chuck Lever authored Feb 08, 2021

Relieve contention on sc_rw_ctxt_lock by converting rdma->sc_rw_ctxts
to an llist.

The goal is to reduce the average overhead of Send completions,
because a transport's completion handlers are single-threaded on
one CPU core. This change reduces CPU utilization of each Send
completion by 2-3% on my server.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-By: Tom Talpey <tom@talpey.com>

07a92d00

svcrdma: Relieve contention on sc_send_lock. · b6c2bfea

Chuck Lever authored Feb 09, 2021

/proc/lock_stat indicates the the sc_send_lock is heavily
contended when the server is under load from a single client.

To address this, convert the send_ctxt free list to an llist.
Returning an item to the send_ctxt cache is now waitless, which
reduces the instruction path length in the single-threaded Send
handler (svc_rdma_wc_send).

The goal is to enable the ib_comp_wq worker to handle a higher
RPC/RDMA Send completion rate given the same CPU resources. This
change reduces CPU utilization of Send completion by 2-3% on my
server.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-By: Tom Talpey <tom@talpey.com>

b6c2bfea

svcrdma: Fewer calls to wake_up() in Send completion handler · 6c8c84f5

Chuck Lever authored Jul 07, 2021

Because wake_up() takes an IRQ-safe lock, it can be expensive,
especially to call inside of a single-threaded completion handler.
What's more, the Send wait queue almost never has waiters, so
most of the time, this is an expensive no-op.

As always, the goal is to reduce the average overhead of each
completion, because a transport's completion handlers are single-
threaded on one CPU core. This change reduces CPU utilization of
the Send completion thread by 2-3% on my server.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-By: Tom Talpey <tom@talpey.com>

6c8c84f5

lockd: Fix invalid lockowner cast after vfs_test_lock · cd2d644d

Benjamin Coddington authored Jul 26, 2021

After calling vfs_test_lock() the pointer to a conflicting lock can be
returned, and that lock is not guarunteed to be owned by nlm.  In that
case, we cannot cast it to struct nlm_lockowner.  Instead return the pid
of that conflicting lock.

Fixes: 646d73e9 ("lockd: Show pid of lockd for remote locks")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

cd2d644d

NFSD: Use new __string_len C macros for nfsd_clid_class · d27b74a8
Chuck Lever authored May 14, 2021
```
Clean up.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
```
d27b74a8
NFSD: Use new __string_len C macros for the nfs_dirent tracepoint · 408c0de7
Chuck Lever authored May 12, 2021
```
Clean up.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
```
408c0de7

tracing: Add trace_event helper macros __string_len() and __assign_str_len() · 883b4aee

Steven Rostedt (VMware) authored Jul 16, 2021

There's a few cases that a string that is to be recorded in a trace event,
does not have a terminating 'nul' character, and instead, the tracepoint
passes in the length of the string to record.

Add two helper macros to the trace event code that lets this work easier,
than tricks with "%.*s" logic.

  __string_len() which is similar to __string() for declaration, but takes a
                 length argument.

  __assign_str_len() which is similar to __assign_str() for assiging the
                 string, but it too takes a length argument.

Note, the TRACE_EVENT() macro will allocate the location on the ring
buffer to 'len + 1', that will be used to store the string into. It is a
requirement that the 'len' used for this is a most the length of the
string being recorded.

This string can still use __get_str() just like strings created with
__string() can use to retrieve the string.

Link: https://lore.kernel.org/linux-nfs/20210513105018.7539996a@gandalf.local.home/Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

883b4aee

NFSD: Batch release pages during splice read · 496d83cf

Chuck Lever authored Jun 28, 2021

Large splice reads call put_page() repeatedly. put_page() is
relatively expensive to call, so replace it with the new
svc_rqst_replace_page() helper to help amortize that cost.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: NeilBrown <neilb@suse.de>

496d83cf

SUNRPC: Add svc_rqst_replace_page() API · 2f0f88f4

Chuck Lever authored Jul 01, 2021

Replacing a page in rq_pages[] requires a get_page(), which is a
bus-locked operation, and a put_page(), which can be even more
costly.

To reduce the cost of replacing a page in rq_pages[], batch the
put_page() operations by collecting "freed" pages in a pagevec,
and then release those pages when the pagevec is full. This
pagevec is also emptied when each RPC completes.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2f0f88f4

NFSD: Clean up splice actor · c7e0b781

Chuck Lever authored Jun 28, 2021

A few useful observations:

 - The value in @size is never modified.

 - splice_desc.len is an unsigned int, and so is xdr_buf.page_len.
   An implicit cast to size_t is unnecessary.

 - The computation of .page_len is the same in all three arms
   of the "if" statement, so hoist it out to make it clear that
   the operation is an unconditional invariant.

The resulting function is 18 bytes shorter on my system (-Os).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: NeilBrown <neilb@suse.de>

c7e0b781

15 Aug, 2021 5 commits

Linux 5.14-rc6 · 7c60610d
Linus Torvalds authored Aug 15, 2021

7c60610d

Merge tag 'powerpc-5.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · ecf93431

Linus Torvalds authored Aug 15, 2021

Pull powerpc fixes from Michael Ellerman:

 - Fix crashes coming out of nap on 32-bit Book3s (eg. powerbooks).

 - Fix critical and debug interrupts on BookE, seen as crashes when
   using ptrace.

 - Fix an oops when running an SMP kernel on a UP system.

 - Update pseries LPAR security flavor after partition migration.

 - Fix an oops when using kprobes on BookE.

 - Fix oops on 32-bit pmac by not calling do_IRQ() from
   timer_interrupt().

 - Fix softlockups on CPU hotplug into a CPU-less node with xive (P9).

Thanks to Cédric Le Goater, Christophe Leroy, Finn Thain, Geetika
Moolchandani, Laurent Dufour, Laurent Vivier, Nicholas Piggin, Pu Lehui,
Radu Rendec, Srikar Dronamraju, and Stan Johnson.

* tag 'powerpc-5.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/xive: Do not skip CPU-less nodes when creating the IPIs
  powerpc/interrupt: Do not call single_step_exception() from other exceptions
  powerpc/interrupt: Fix OOPS by not calling do_IRQ() from timer_interrupt()
  powerpc/kprobes: Fix kprobe Oops happens in booke
  powerpc/pseries: Fix update of LPAR security flavor after LPM
  powerpc/smp: Fix OOPS in topology_init()
  powerpc/32: Fix critical and debug interrupts on BOOKE
  powerpc/32s: Fix napping restore in data storage interrupt (DSI)

ecf93431

Merge tag 'irq-urgent-2021-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c4f14eac

Linus Torvalds authored Aug 15, 2021

Pull irq fixes from Thomas Gleixner:
 "A set of fixes for PCI/MSI and x86 interrupt startup:

   - Mask all MSI-X entries when enabling MSI-X otherwise stale unmasked
     entries stay around e.g. when a crashkernel is booted.

   - Enforce masking of a MSI-X table entry when updating it, which
     mandatory according to speification

   - Ensure that writes to MSI[-X} tables are flushed.

   - Prevent invalid bits being set in the MSI mask register

   - Properly serialize modifications to the mask cache and the mask
     register for multi-MSI.

   - Cure the violation of the affinity setting rules on X86 during
     interrupt startup which can cause lost and stale interrupts. Move
     the initial affinity setting ahead of actualy enabling the
     interrupt.

   - Ensure that MSI interrupts are completely torn down before freeing
     them in the error handling case.

   - Prevent an array out of bounds access in the irq timings code"

* tag 'irq-urgent-2021-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  driver core: Add missing kernel doc for device::msi_lock
  genirq/msi: Ensure deactivation on teardown
  genirq/timings: Prevent potential array overflow in __irq_timings_store()
  x86/msi: Force affinity setup before startup
  x86/ioapic: Force affinity setup before startup
  genirq: Provide IRQCHIP_AFFINITY_PRE_STARTUP
  PCI/MSI: Protect msi_desc::masked for multi-MSI
  PCI/MSI: Use msi_mask_irq() in pci_msi_shutdown()
  PCI/MSI: Correct misleading comments
  PCI/MSI: Do not set invalid bits in MSI mask
  PCI/MSI: Enforce MSI[X] entry updates to be visible
  PCI/MSI: Enforce that MSI-X table entry is masked for update
  PCI/MSI: Mask all unused MSI-X entries
  PCI/MSI: Enable and mask MSI-X early

c4f14eac

Merge tag 'locking_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 839da253

Linus Torvalds authored Aug 15, 2021

Pull locking fix from Borislav Petkov:

 - Fix a CONFIG symbol's spelling

* tag 'locking_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/rtmutex: Use the correct rtmutex debugging config option

839da253

Merge tag 'efi_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 12aef8ac

Linus Torvalds authored Aug 15, 2021

Pull EFI fixes from Borislav Petkov:
 "A batch of fixes for the arm64 stub image loader:

   - fix a logic bug that can make the random page allocator fail
     spuriously

   - force reallocation of the Image when it overlaps with firmware
     reserved memory regions

   - fix an oversight that defeated on optimization introduced earlier
     where images loaded at a suitable offset are never moved if booting
     without randomization

   - complain about images that were not loaded at the right offset by
     the firmware image loader"

* tag 'efi_urgent_for_v5.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/libstub: arm64: Double check image alignment at entry
  efi/libstub: arm64: Warn when efi_random_alloc() fails
  efi/libstub: arm64: Relax 2M alignment again for relocatable kernels
  efi/libstub: arm64: Force Image reallocation if BSS was not reserved
  arm64: efi: kaslr: Fix occasional random alloc (and boot) failure

12aef8ac