Commits · 6518204d2304239507236919f70ecf7ff324fe20 · Kirill Smelkov / linux

07 Jan, 2024 39 commits

svcrdma: Update synopsis of svc_rdma_copy_inline_range() · 6518204d

Chuck Lever authored Dec 04, 2023

Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_copy_inline_range() can use that recv_ctxt to derive the
read_info rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

6518204d

svcrdma: Update the synopsis of svc_rdma_read_data_item() · 6e4b9b86

Chuck Lever authored Dec 04, 2023

Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_data_item() can use that recv_ctxt to derive
that information rather than the other way around. This removes
another usage of the ri_readctxt field, enabling its removal in a
subsequent patch.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

6e4b9b86

svcrdma: Update synopsis of svc_rdma_read_chunk_range() · c7eb4feb

Chuck Lever authored Dec 04, 2023

Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_chunk_range() can use that recv_ctxt to derive
that information rather than the other way around. This removes
another usage of the ri_readctxt field, enabling its removal in a
subsequent patch.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

c7eb4feb

svcrdma: Update synopsis of svc_rdma_build_read_chunk() · 02e8fe1e

Chuck Lever authored Dec 04, 2023

Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_chunk() can use that recv_ctxt to derive that
information rather than the other way around. This removes another
usage of the ri_readctxt field, enabling its removal in a
subsequent patch.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

02e8fe1e

svcrdma: Update synopsis of svc_rdma_build_read_segment() · fc20f19b

Chuck Lever authored Dec 04, 2023

Since the RDMA Read I/O state is now contained in the recv_ctxt,
svc_rdma_build_read_segment() can use the recv_ctxt to derive that
information rather than the other way around. This removes one usage
of the ri_readctxt field, enabling its removal in a subsequent
patch.

At the same time, the use of ri_rqst can similarly be replaced with
a passed-in function parameter.

Start with build_read_segment() because it is a common utility
function at the bottom of the Read chunk path.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

fc20f19b

svcrdma: Move read_info::ri_pageoff into struct svc_rdma_recv_ctxt · 919f6e79

Chuck Lever authored Dec 04, 2023

Further clean up: move the starting byte offset field into
svc_rdma_recv_ctxt.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

919f6e79

svcrdma: Move svc_rdma_read_info::ri_pageno to struct svc_rdma_recv_ctxt · 8e122582

Chuck Lever authored Dec 04, 2023

Further clean up: move the page index field into svc_rdma_recv_ctxt.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

8e122582

svcrdma: Start moving fields out of struct svc_rdma_read_info · b1818412

Chuck Lever authored Dec 04, 2023

Since the request's svc_rdma_recv_ctxt will stay around for the
duration of the RDMA Read operation, the contents of struct
svc_rdma_read_info can reside in the request's svc_rdma_recv_ctxt
rather than being allocated separately. This will eventually save a
call to kmalloc() in a hot path.

Start this clean-up by moving the Read chunk's svc_rdma_chunk_ctxt.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

b1818412
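
The allocation pattern this commit moves away from can be sketched in user-space C. Every type and function name below is hypothetical and only stands in for the kernel structures named in the message; the sketch contrasts a per-request allocation with state embedded in a longer-lived context.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-chunk I/O state. */
struct demo_chunk_ctxt {
	unsigned int sqecount;
};

/* Before: Read state lives in its own object, allocated per request. */
struct demo_read_info {
	struct demo_chunk_ctxt cc;
};

/* After: the same state is a member of the long-lived receive context. */
struct demo_recv_ctxt {
	int id;
	struct demo_chunk_ctxt cc;
};

/* Old pattern: can fail, and costs an allocator call on every Read. */
static struct demo_read_info *demo_read_info_alloc(void)
{
	return calloc(1, sizeof(struct demo_read_info));
}

/* New pattern: no allocation, only a reset of the embedded state. */
static struct demo_chunk_ctxt *demo_start_read(struct demo_recv_ctxt *ctxt)
{
	memset(&ctxt->cc, 0, sizeof(ctxt->cc));
	return &ctxt->cc;
}
```

Embedding works here only because the outer context outlives the Read operation, which is the premise the commit message states.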

svcrdma: Move struct svc_rdma_chunk_ctxt to svc_rdma.h · 6a04a434

Chuck Lever authored Dec 04, 2023

Prepare for nestling these into the send and recv ctxts so they
no longer have to be allocated dynamically.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

6a04a434

svcrdma: Remove the svc_rdma_chunk_ctxt::cc_rdma field · 2cc0f23b

Chuck Lever authored Dec 04, 2023

In every instance, the pointer address in that field is now
available by other means.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2cc0f23b

svcrdma: Pass a pointer to the transport to svc_rdma_cc_release() · bc8fd4e9

Chuck Lever authored Dec 04, 2023

Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

bc8fd4e9

svcrdma: Explicitly pass the transport to svc_rdma_post_chunk_ctxt() · 83fe6dd6

Chuck Lever authored Dec 04, 2023

Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

83fe6dd6

svcrdma: Explicitly pass the transport into Read chunk I/O paths · 4a68edd9

Chuck Lever authored Dec 04, 2023

Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

4a68edd9

svcrdma: Explicitly pass the transport into Write chunk I/O paths · c3899b71

Chuck Lever authored Dec 04, 2023

Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma
field.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

c3899b71

svcrdma: Acquire the svcxprt_rdma pointer from the CQ context · c4fd9f45

Chuck Lever authored Dec 04, 2023

Enable the removal of the svc_rdma_chunk_ctxt::cc_rdma field in a
subsequent patch.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

c4fd9f45

svcrdma: Reduce size of struct svc_rdma_rw_ctxt · 5ef6c666

Chuck Lever authored Dec 04, 2023

SG_CHUNK_SIZE is 128, making struct svc_rdma_rw_ctxt + the first
SGL array more than 4200 bytes in length, pushing the memory
allocation well into order 1.

Even so, the RDMA rw core doesn't seem to use more than max_send_sge
entries in that array (typically 32 or fewer), so the remainder is
wasted space.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

5ef6c666
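
The size arithmetic in this message can be reproduced with a small user-space sketch. It assumes 4096-byte pages, 32 bytes per scatterlist entry, and a hypothetical 128 bytes of fixed fields ahead of the array; none of these figures appear in the commit message, and the real values depend on architecture and kernel configuration.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE   4096u	/* assumption: 4 KiB pages */
#define DEMO_SGL_ENTRY   32u	/* assumption: bytes per scatterlist entry */
#define DEMO_HEADER_SIZE 128u	/* hypothetical size of the fixed fields */

/* Smallest order such that (page size << order) covers size bytes. */
static unsigned int demo_alloc_order(size_t size)
{
	unsigned int order = 0;

	while (((size_t)DEMO_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Total size of a context carrying nr_sges inline SGL entries. */
static size_t demo_ctxt_size(unsigned int nr_sges)
{
	return DEMO_HEADER_SIZE + (size_t)nr_sges * DEMO_SGL_ENTRY;
}
```

Under these assumed numbers, demo_ctxt_size(128) is 4224 bytes and needs an order-1 allocation, while demo_ctxt_size(32) is 1152 bytes and fits in order 0.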

svcrdma: Update some svcrdma DMA-related tracepoints · 2dd6e29a

Chuck Lever authored Nov 27, 2023

A send/recv_ctxt already records transport-related information
in the cq.id, thus there is no need to record the IP addresses of
the transport endpoints.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2dd6e29a

svcrdma: DMA error tracepoints should report completion IDs · 848760a9

Chuck Lever authored Nov 27, 2023

Update the DMA error flow tracepoints to report the completion ID of
the failing context. This ties the wait/failure to a particular
operation or request, which is more useful than knowing only the
failing transport.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

848760a9

svcrdma: SQ error tracepoints should report completion IDs · ad3656bd

Chuck Lever authored Nov 27, 2023

Update the Send Queue's error flow tracepoints to report the
completion ID of the waiting or failing context. This ties the
wait/failure to a particular operation or request, which is a little
more useful than knowing only the transport that is about to close.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

ad3656bd

rpcrdma: Introduce a simple cid tracepoint class · be2acb10

Chuck Lever authored Nov 27, 2023

De-duplicate some code, making it easier to add new tracepoints that
report only a completion ID.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

be2acb10

svcrdma: Add lockdep class keys for transport locks · 907e34a7

Chuck Lever authored Nov 27, 2023

Two svcrdma-related transport locks can become quite contended.
Collate their use and make them easy to find in /proc/lock_stat for
better observability.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

907e34a7

svcrdma: Clean up locking · bfb81535

Chuck Lever authored Nov 21, 2023

There's no need to protect llist_entry() with a spin lock.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

bfb81535
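
In the kernel, llist_entry() is a wrapper around container_of(), so it performs pointer arithmetic only and touches no shared state. A user-space sketch of the same idea follows, with hypothetical names throughout:

```c
#include <stddef.h>

struct demo_node {
	struct demo_node *next;
};

/* Hypothetical object that embeds a list node. */
struct demo_ctxt {
	int id;
	struct demo_node link;
};

/* Recover the enclosing object from a pointer to its embedded member. */
#define demo_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct demo_ctxt *demo_from_link(struct demo_node *n)
{
	return demo_entry(n, struct demo_ctxt, link);
}
```

Because the conversion reads nothing but the pointer it is given, it can run after the node has been removed from the list and the lock has been dropped.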

svcrdma: Add an async version of svc_rdma_write_info_free() · f09c36c8

Chuck Lever authored Nov 21, 2023

DMA unmapping can take quite some time, so it should not be handled
in a single-threaded completion handler. Defer releasing write_info
structs to the recently-added workqueue.

With this patch, DMA unmapping can be handled in parallel, and it
does not cause head-of-queue blocking of Write completions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

f09c36c8

svcrdma: Add an async version of svc_rdma_send_ctxt_put() · ae225fe2

Chuck Lever authored Nov 21, 2023

DMA unmapping can take quite some time, so it should not be handled
in a single-threaded completion handler. Defer releasing send_ctxts
to the recently-added workqueue.

With this patch, DMA unmapping can be handled in parallel, and it
does not cause head-of-queue blocking of Send completions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

ae225fe2

svcrdma: Add a utility workqueue to svcrdma · 9c7e1a06

Chuck Lever authored Nov 21, 2023

To handle work in the background, set up an UNBOUND workqueue for
svcrdma. Subsequent patches will make use of it.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

9c7e1a06

svcrdma: Pre-allocate svc_rdma_recv_ctxt objects · 877118c6

Chuck Lever authored Nov 21, 2023

The original reason for allocating svc_rdma_recv_ctxt objects during
Receive completion was to ensure the objects were allocated on the
NUMA node closest to the underlying IB device.

Since commit c5d68d25 ("svcrdma: Clean up allocation of
svc_rdma_recv_ctxt"), however, the device's favored node is
explicitly passed to the memory allocator.

To enable switching Receive completion to soft IRQ context, move
memory allocation out of completion handling, since it can be
costly, and it can sleep.

A limited number of objects is now allocated at "accept" time.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

877118c6

svcrdma: Eliminate allocation of recv_ctxt objects in backchannel · b541dd55

Chuck Lever authored Nov 21, 2023

The svc_rdma_recv_ctxt free list uses a lockless list to avoid the
need for a spin lock in the fast path. llist_del_first(), which is
used by svc_rdma_recv_ctxt_get(), requires serialization, however,
when there are multiple list producers that are unserialized.

I mistakenly thought there was only one caller of
svc_rdma_recv_ctxt_get() (svc_rdma_refresh_recvs()), thus explicit
serialization would not be necessary. But there is another caller:
svc_rdma_bc_sendto(), and these two are not serialized against each
other. I haven't seen ill effects that I could directly ascribe to
a lack of serialization. It's just an observation based on code
audit.

When DMA-mapping before sending a Reply, the passed-in struct
svc_rdma_recv_ctxt is used only for its write and reply PCLs. These
are currently always empty in the backchannel case. So, instead of
passing a full svc_rdma_recv_ctxt object to
svc_rdma_map_reply_msg(), let's pass in just the Write and Reply
PCLs.

This change makes it unnecessary for the backchannel to acquire a
dummy svc_rdma_recv_ctxt object when sending an RPC Call. The need
for svc_rdma_recv_ctxt free list serialization is now completely
avoided.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

b541dd55
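
The serialization requirement described above can be illustrated with a minimal lock-free stack in C11 atomics. This is a sketch under stated assumptions, not the kernel's llist implementation; all names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct demo_lnode {
	struct demo_lnode *next;
};

struct demo_lhead {
	_Atomic(struct demo_lnode *) first;
};

/* Push: safe from any number of concurrent callers. */
static void demo_add(struct demo_lnode *n, struct demo_lhead *h)
{
	struct demo_lnode *old = atomic_load(&h->first);

	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak(&h->first, &old, n));
}

/*
 * Pop the first node. Between reading old->next and the compare-exchange,
 * another concurrent caller of this function could remove "old" and a
 * later push could put it back with a different successor (the ABA
 * problem). Callers of this function therefore need to be serialized
 * against each other.
 */
static struct demo_lnode *demo_del_first(struct demo_lhead *h)
{
	struct demo_lnode *old = atomic_load(&h->first);
	struct demo_lnode *next;

	do {
		if (!old)
			return NULL;
		next = old->next;
	} while (!atomic_compare_exchange_weak(&h->first, &old, next));
	return old;
}
```

With only one caller of demo_del_first(), or none at all as after this commit's change to the backchannel, the ABA window cannot open and no lock is needed.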

NFSv4, NFSD: move enum nfs_cb_opnum4 to include/linux/nfs4.h · 52e89100

ChenXiaoSong authored Dec 02, 2023

The callback operations enum is defined in both the client and the
server. Move it to a common header file.
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

52e89100

nfsd: remove unnecessary NULL check · 3c86e615

Dan Carpenter authored Dec 04, 2023

We check "state" for NULL on the previous line so it can't be NULL here.
No need to check again.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202312031425.LffZTarR-lkp@intel.com/
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

3c86e615

SUNRPC: Remove RQ_SPLICE_OK · 3587b5c7

Chuck Lever authored Nov 17, 2023

This flag is no longer used.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

3587b5c7

NFSD: Modify NFSv4 to use nfsd_read_splice_ok() · a2c91753

Chuck Lever authored Nov 17, 2023

Avoid the use of an atomic bitop, and prepare for adding a run-time
switch for using splice reads.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

a2c91753

NFSD: Replace RQ_SPLICE_OK in nfsd_read() · c21fd7a8

Chuck Lever authored Nov 17, 2023

RQ_SPLICE_OK is a bit of a layering violation. Also, a subsequent
patch is going to provide a mechanism for always disabling splice
reads.

Splicing is an issue only for NFS READs, so refactor nfsd_read() to
check the auth type directly instead of relying on an rq_flag
setting.

The new helper will be added into the NFSv4 read path in a
subsequent patch.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

c21fd7a8
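
A check of this kind might look like the sketch below. The function and enum names are hypothetical, and the rule that the integrity and privacy Kerberos flavors are the ones that rule out splicing is an assumption; the commit message itself does not list the flavors. The numeric values are the Kerberos v5 pseudoflavor numbers from RFC 2623.

```c
#include <stdbool.h>

/* Pseudoflavor numbers; the DEMO_ names are illustrative only. */
enum demo_pseudoflavor {
	DEMO_AUTH_NULL = 0,
	DEMO_AUTH_UNIX = 1,
	DEMO_GSS_KRB5  = 390003,
	DEMO_GSS_KRB5I = 390004,	/* integrity protection */
	DEMO_GSS_KRB5P = 390005,	/* privacy protection */
};

/* Assumption: only the integrity and privacy flavors forbid splicing. */
static bool demo_read_splice_ok(enum demo_pseudoflavor flavor)
{
	switch (flavor) {
	case DEMO_GSS_KRB5I:
	case DEMO_GSS_KRB5P:
		return false;
	default:
		return true;
	}
}
```

Deriving the answer from the request's flavor at the point of use keeps the decision in the read path rather than in a transport-level flag.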

SUNRPC: Add a server-side API for retrieving an RPC's pseudoflavor · deb70428

Chuck Lever authored Nov 17, 2023

NFSD will use this new API to determine whether nfsd_splice_read is
safe to use. This avoids the need to add a dependency to NFSD for
CONFIG_SUNRPC_GSS.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

deb70428

NFSD: Document lack of f_pos_lock in nfsd_readdir() · a853ed55

Chuck Lever authored Nov 19, 2023

Al Viro notes that normal system calls hold f_pos_lock when calling
->iterate_shared and ->llseek; however nfsd_readdir() does not take
that mutex when calling these methods.

It should be safe, however, because the struct file acquired by
nfsd_readdir() is not visible to other threads.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

a853ed55

NFSD: Remove nfsd_drc_gc() tracepoint · d0ab8b64

Chuck Lever authored Nov 13, 2023

This trace point was for debugging the DRC's garbage collection. In
the field it's just noise.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

d0ab8b64

NFSD: Make the file_delayed_close workqueue UNBOUND · ce7df055

Chuck Lever authored Oct 22, 2023

workqueue: nfsd_file_delayed_close [nfsd] hogged CPU for >13333us 8
	times, consider switching to WQ_UNBOUND

There's no harm in closing a cached file descriptor on another core.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

ce7df055

NFSD: use read_seqbegin() rather than read_seqbegin_or_lock() · f3734cc4

Oleg Nesterov authored Oct 26, 2023

The usage of read_seqbegin_or_lock() in nfsd_copy_write_verifier()
is wrong. "seq" is always even and thus "or_lock" has no effect;
this code can never take ->writeverf_lock for writing.

I guess this is fine: nfsd_copy_write_verifier() just copies 8 bytes,
and nfsd_reset_write_verifier() is supposed to be a very rare
operation, so we do not need the adaptive locking in this case.

Yet the code looks wrong and sub-optimal; it can use read_seqbegin()
without changing the behaviour.

[ cel: Note also that it eliminates this Sparse warning:

fs/nfsd/nfssvc.c:360:6: warning: context imbalance in 'nfsd_copy_write_verifier' -
	different lock contexts for basic block

]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

f3734cc4
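
The plain sequence-counter read loop that this commit switches to can be sketched in user space with C11 atomics. The names are hypothetical, the payload is modeled as a single atomic 8-byte value, and writers are assumed to be serialized by some other means; this is not the kernel's seqlock implementation.

```c
#include <stdatomic.h>
#include <stdint.h>

struct demo_seqlock {
	atomic_uint seq;	/* even: stable, odd: a writer is active */
	_Atomic uint64_t verf;	/* the 8-byte value being protected */
};

/* Wait for an even sequence value and return it. */
static unsigned int demo_read_seqbegin(struct demo_seqlock *sl)
{
	unsigned int s;

	while ((s = atomic_load(&sl->seq)) & 1)
		;	/* a writer is in progress, keep polling */
	return s;
}

/* Nonzero if a writer ran since the matching demo_read_seqbegin(). */
static int demo_read_seqretry(struct demo_seqlock *sl, unsigned int start)
{
	return atomic_load(&sl->seq) != start;
}

/* Writer side: assumes the caller serializes writers. */
static void demo_reset_verifier(struct demo_seqlock *sl, uint64_t v)
{
	atomic_fetch_add(&sl->seq, 1);	/* becomes odd */
	atomic_store(&sl->verf, v);
	atomic_fetch_add(&sl->seq, 1);	/* becomes even again */
}

/* Reader side: retry until a consistent snapshot is observed. */
static uint64_t demo_copy_verifier(struct demo_seqlock *sl)
{
	unsigned int seq;
	uint64_t v;

	do {
		seq = demo_read_seqbegin(sl);
		v = atomic_load(&sl->verf);
	} while (demo_read_seqretry(sl, seq));
	return v;
}
```

The reader never takes a lock. It loops only if a write overlapped the copy, which matches the message's point that adaptive locking is unnecessary for a tiny copy paired with a rare writer.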

nfsd: new Kconfig option for legacy client tracking · 74fd4873

Jeff Layton authored Oct 13, 2023

We've had a number of attempts at different NFSv4 client tracking
methods over the years, but now nfsdcld has emerged as the clear winner
since the others (recoverydir and the usermodehelper upcall) are
problematic.

As a case in point, the recoverydir backend uses MD5 hashes to encode
long form clientid strings, which means that nfsd repeatedly gets dinged
on FIPS audits, since MD5 isn't considered secure. Its use of MD5 is not
cryptographically significant, so there is no danger there, but allowing
us to compile that out allows us to sidestep the issue entirely.

As a prelude to eventually removing support for these client tracking
methods, add a new Kconfig option that enables them. Mark it deprecated
and make it default to N.
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

74fd4873

Linux 6.7 · 0dd3ee31

Linus Torvalds authored Jan 07, 2024

0dd3ee31

06 Jan, 2024 1 commit

Merge tag 'i2c-for-6.7-final' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 52b1853b

Linus Torvalds authored Jan 06, 2024

Pull i2c fixes from Wolfram Sang:
 "Improve the detection when to run atomic transfer handlers for kernels
  with preemption disabled. This removes some false positive splats a
  number of users were seeing if their driver didn't have support for
  atomic transfers.

  Also, fix a typo in the docs while we are here"

* tag 'i2c-for-6.7-final' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: core: Fix atomic xfer check for non-preempt config
  Documentation/i2c: fix spelling error in i2c-address-translators

52b1853b