Commits · 95d0d30c66b855f614e677b8cd0455eed0765a6f · Kirill Smelkov / linux

25 Jul, 2022 4 commits

SUNRPC create an iterator to list only OFFLINE xprts · 95d0d30c

Create a new iterator helper that will go thru the all the transports
in the switch and return transports that are marked OFFLINE.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

95d0d30c

NFSv4.1 offline trunkable transports on DESTROY_SESSION · 88363d3e

Olga Kornievskaia authored 2 years ago


When session is destroy, some of the transports might no longer be
valid trunks for the new session. Offline existing transports.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

88363d3e

SUNRPC add function to offline remove trunkable transports · 895245cc

Olga Kornievskaia authored 2 years ago


Iterate thru available transports in the xprt_switch for all
trunkable transports offline and possibly remote them as well.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

895245cc

SUNRPC expose functions for offline remote xprt functionality · 7ffcdaa6

Olga Kornievskaia authored 2 years ago


Re-arrange the code that make offline transport and delete transport
callable functions.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

7ffcdaa6

23 Jul, 2022 12 commits

SUNRPC: Remove xdr_align_data() and xdr_expand_hole() · 29946fbc

Anna Schumaker authored 2 years ago


These functions are no longer needed now that the NFS client places data
and hole segments directly.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

29946fbc

NFS: Replace the READ_PLUS decoding code · d3b00a80

Anna Schumaker authored 2 years ago

We now take a 2-step process that allows us to place data and hole
segments directly at their final position in the xdr_stream without
needing to do a bunch of redundant copies to expand holes. Due to the
variable lengths of each segment, the xdr metadata might cross page
boundaries which I account for by setting a small scratch buffer so
xdr_inline_decode() won't fail.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

d3b00a80

SUNRPC: Add a function for zeroing out a portion of an xdr_stream · e1bd8760

Anna Schumaker authored 2 years ago

This will be used during READ_PLUS decoding for handling HOLE segments.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

e1bd8760

SUNRPC: Add a function for directly setting the xdr page len · 7c4cd5f4

Anna Schumaker authored 2 years ago

We need to do this step during READ_PLUS decoding so that we know pages
are the right length and any extra data has been preserved in the tail.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

7c4cd5f4

SUNRPC: Introduce xdr_stream_move_subsegment() · 4f5f3b60

Anna Schumaker authored 2 years ago


I do this by creating an xdr subsegment for the range we will be
operating over. This lets me shift data to the correct place without
potentially overwriting anything already there.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

4f5f3b60

NFS: Replace fs_context-related dprintk() call sites with tracepoints · 33ce83ef

Chuck Lever authored 2 years ago


Contributed as part of the long patch series that converts NFS from
using dprintk to tracepoints for observability.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

33ce83ef

SUNRPC: Replace dprintk() call site in xs_data_ready · f67939e4

Chuck Lever authored 2 years ago

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

f67939e4

SUNRPC: Fail faster on bad verifier · 0701214c

Chuck Lever authored 2 years ago


A bad verifier is not a garbage argument, it's an authentication
failure. Retrying it doesn't make the problem go away, and delays
upper layer recovery steps.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

0701214c

nfs: only issue commit in DIO codepath if we have uncommitted data · 69d96651

Jeff Layton authored 2 years ago


Currently, we try to determine whether to issue a commit based on
nfs_write_need_commit which looks at the current verifier. In the case
where we got a short write and then tried to follow it up with one that
failed, the verifier can't be trusted.

What we really want to know is whether the pgio request had any
successful writes that came back as UNSTABLE. Add a new flag to the pgio
request, and use that to indicate that we've had a successful unstable
write. Only issue a commit if that flag is set.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

69d96651

nfs: always check dreq->error after a commit · 55051c0c

Jeff Layton authored 2 years ago

When the client gets back a short DIO write, it will then attempt to
issue another write to finish the DIO request. If that write then fails
(as is often the case in an -ENOSPC situation), then we still may need
to issue a COMMIT if the earlier short write was unstable. If that COMMIT
then succeeds, then we don't want the client to reschedule the write
requests, and to instead just return a short write. Otherwise, we can
end up looping over the same DIO write forever.

Always consult dreq->error after a successful RPC, even when the flag
state is not NFS_ODIRECT_DONE.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=2028370

Reported-by: Boyang Xue <bxue@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

55051c0c

nfs: add new nfs_direct_req tracepoint events · 8efc4bbe

Jeff Layton authored 2 years ago


Add some new tracepoints to the DIO write code.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

8efc4bbe

SUNRPC: Shrink size of struct rpc_task · ba8ec7a6

Trond Myklebust authored 2 years ago


Move the field 'tk_rpc_status' so that we eliminate a 4 byte hole in the
structure.
For x86_64, this shrinks the size of the struct by 8 bytes.

'pahole' output before the change:
        /* size: 232, cachelines: 4, members: 27 */
        /* sum members: 222, holes: 1, sum holes: 4 */
        /* sum bitfield members: 8 bits (1 bytes) */
        /* padding: 5 */
        /* last cacheline: 40 bytes */

'pahole' output after the change:
        /* size: 224, cachelines: 4, members: 27 */
        /* padding: 1 */
        /* last cacheline: 32 bytes */
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

ba8ec7a6

13 Jul, 2022 1 commit

NFSv4: Fix races in the legacy idmapper upcall · 51fd2eb5

Trond Myklebust authored 2 years ago

nfs_idmap_instantiate() will cause the process that is waiting in
request_key_with_auxdata() to wake up and exit. If there is a second
process waiting for the idmap->idmap_mutex, then it may wake up and
start a new call to request_key_with_auxdata(). If the call to
idmap_pipe_downcall() from the first process has not yet finished
calling nfs_idmap_complete_pipe_upcall_locked(), then we may end up
triggering the WARN_ON_ONCE() in nfs_idmap_prepare_pipe_upcall().

The fix is to ensure that we clear idmap->idmap_upcall_data before
calling nfs_idmap_instantiate().

Fixes: e9ab41b6

 ("NFSv4: Clean up the legacy idmapper upcall")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

51fd2eb5

12 Jul, 2022 8 commits

NFS: Allow setting rsize / wsize to a multiple of PAGE_SIZE · 940261a1

Anna Schumaker authored 2 years ago


Previously, we required this to value to be a power of 2 for UDP related
reasons. This patch keeps the power of 2 rule for UDP but allows more
flexibility for TCP and RDMA.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

940261a1

sunrpc: fix expiry of auth creds · f1bafa73

Dan Aloni authored 2 years ago

Before this commit, with a large enough LRU of expired items (100), the
loop skipped all the expired items and was entirely ineffectual in
trimming the LRU list.

Fixes: 95cd6232

 ('SUNRPC: Clean up the AUTH cache code')
Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

f1bafa73

nfs: fix port value parsing · 8b4e87a1

Ian Kent authored 2 years ago


The valid values of nfs options port and mountport are 0 to USHRT_MAX.

The fs parser will return a fail for port values that are negative
and the sloppy option handling then returns success.

But the sloppy option handling is meant to return success for invalid
options not valid options with invalid values.

Restricting the sloppy option override to handle failure returns for
invalid options only is sufficient to resolve this problem.

Changes:

v2: utilize the return value from fs_parse() to resolve this problem
    instead of changing the parameter definitions.
Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

8b4e87a1

nfs: Replace kmap() with kmap_local_page() · c77c738c

Fabio M. De Francesco authored 2 years ago


The use of kmap() is being deprecated in favor of kmap_local_page().

With kmap_local_page(), the mapping is per thread, CPU local and not
globally visible. Furthermore, the mapping can be acquired from any context
(including interrupts).

Therefore, use kmap_local_page() in nfs_do_filldir() because this mapping
is per thread, CPU local, and not globally visible.
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

c77c738c

NFS: remove redundant code in nfs_file_write() · 064109db

ChenXiaoSong authored 2 years ago

filemap_fdatawait_range() will always return 0, after patch 6c984083


("NFS: Use of mapping_set_error() results in spurious errors"), it will not
save the wb err in struct address_space->flags:

  result = filemap_fdatawait_range(file->f_mapping, ...) = 0
    filemap_check_errors(mapping) = 0
      test_bit(..., &mapping->flags) // flags is 0
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

064109db

nfs/blocklayout: refactor block device opening · f931d837

Christoph Hellwig authored 2 years ago


Deduplicate the helpers to open a device node by passing a name
prefix argument and using the same helper for both kinds of paths.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

f931d837

NFSv4.1: Handle NFS4ERR_DELAY replies to OP_SEQUENCE correctly · 7ccafd4b

Trond Myklebust authored 2 years ago

Don't assume that the NFS4ERR_DELAY means that the server is processing
this slot id.

Fixes: 3453d570

 ("NFSv4.1: Avoid false retries when RPC calls are interrupted")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

7ccafd4b

NFSv4.1: Don't decrease the value of seq_nr_highest_sent · f07a5d24

Trond Myklebust authored 2 years ago

When we're trying to figure out what the server may or may not have seen
in terms of request numbers, do not assume that requests with a larger
number were missed, just because we saw a reply to a request with a
smaller number.

Fixes: 3453d570

 ("NFSv4.1: Avoid false retries when RPC calls are interrupted")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

f07a5d24

10 Jul, 2022 15 commits

NFS: Fix case insensitive renames · 6ca0a6f8

Trond Myklebust authored 2 years ago


For filesystems that are case insensitive and case preserving, we need
to be able to rename from one case folded variant of the filename to
another.
Currently, if we have looked up the target filename before the call to
rename, then we may have a hashed dentry with that target name in the
dcache, causing the vfs to optimise away the rename.
To avoid that, let's drop the target dentry, and leave it to the server
to optimise away the rename if that is the correct thing to do.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

6ca0a6f8

pNFS/files: Handle RDMA connection errors correctly · 431794e6

Trond Myklebust authored 2 years ago

The RPC/RDMA driver will return -EPROTO and -ENODEV as connection errors
under certain circumstances. Make sure that we handle them correctly and
avoid cycling forever in a LAYOUTGET/LAYOUTRETURN loop.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

431794e6

pNFS/flexfiles: Report RDMA connection errors to the server · 7836d754

Trond Myklebust authored 2 years ago

The RPC/RDMA driver will return -EPROTO and -ENODEV as connection errors
under certain circumstances. Make sure that we handle them and report
them to the server. If not, we can end up cycling forever in a
LAYOUTGET/LAYOUTRETURN loop.

Fixes: a12f996d

 ("NFSv4/pNFS: Use connections to a DS that are all of the same protocol family")
Cc: stable@vger.kernel.org # 5.11.x
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

7836d754

Revert "pNFS: nfs3_set_ds_client should set NFS_CS_NOPING" · 9597152d

Trond Myklebust authored 2 years ago

This reverts commit c6eb5843.
If a transport is down, then we want to fail over to other transports if
they are listed in the GETDEVICEINFO reply.

Fixes: c6eb5843

 ("pNFS: nfs3_set_ds_client should set NFS_CS_NOPING")
Cc: stable@vger.kernel.org # 5.11.x
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

9597152d

SUNRPC: Fix an RPC/RDMA performance regression · 4b8dbdfb

Trond Myklebust authored 2 years ago


Use the standard gfp mask instead of using GFP_NOWAIT. The latter causes
issues when under memory pressure.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

4b8dbdfb

Linux 5.19-rc6 · 32346491
Linus Torvalds authored 2 years ago

32346491

Merge branch 'hot-fixes' (fixes for rc6) · 24f4b40e

Linus Torvalds authored 2 years ago

This is a collection of three fixes for small annoyances.

Two of these are already pending in other trees, but I really don't want
to release another -rc with these issues pending, so I picked up the
patches for these things directly.  We'll end up with duplicate commits
eventually, I prefer that over having these issues pending.

The third one is just me getting rid of another BUG_ON() just because it
was reported and I dislike those things so much.

* merge 'hot-fixes' branch:
  ida: don't use BUG_ON() for debugging
  drm/aperture: Run fbdev removal before internal helpers
  ptrace: fix clearing of JOBCTL_TRACED in ptrace_unfreeze_traced()

24f4b40e

ida: don't use BUG_ON() for debugging · fc82bbf4

Linus Torvalds authored 2 years ago

This is another old BUG_ON() that just shouldn't exist (see also commit
a382f8fe: "signal handling: don't use BUG_ON() for debugging").

In fact, as Matthew Wilcox points out, this condition shouldn't really
even result in a warning, since a negative id allocation result is just
a normal allocation failure:

  "I wonder if we should even warn here -- sure, the caller is trying to
   free something that wasn't allocated, but we don't warn for
   kfree(NULL)"

and goes on to point out how that current error check is only causing
people to unnecessarily do their own index range checking before freeing
it.

This was noted by Itay Iellin, because the bluetooth HCI socket cookie
code does *not* do that range checking, and ends up just freeing the
error case too, triggering the BUG_ON().

The HCI code requires CAP_NET_RAW, and seems to just result in an ugly
splat, but there really is no reason to BUG_ON() here, and we have
generally striven for allocation models where it's always ok to just do

    free(alloc());

even if the allocation were to fail for some random reason (usually
obviously that "random" reason being some resource limit).

Fixes: 88eca020

 ("ida: simplified functions for id allocation")
Reported-by: Itay Iellin <ieitayie@gmail.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fc82bbf4

Merge tag 'dmaengine-fix-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine · 952c53cd

Linus Torvalds authored 2 years ago

Pull dmaengine fixes from Vinod Koul:
 "One core fix for DMA_INTERRUPT and rest driver fixes.

  Core:

   - Revert verification of DMA_INTERRUPT capability as that was
     incorrect

  Bunch of driver fixes for:

   - ti: refcount and put_device leak

   - qcom_bam: runtime pm overflow

   - idxd: force wq context cleanup and call idxd_enable_system_pasid()
     on success

   - dw-axi-dmac: RMW on channel suspend register

   - imx-sdma: restart cyclic channel when enabled

   - at_xdma: error handling for at_xdmac_alloc_desc

   - pl330: lockdep warning

   - lgm: error handling path in probe

   - allwinner: Fix min/max typo in binding"

* tag 'dmaengine-fix-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
  dt-bindings: dma: allwinner,sun50i-a64-dma: Fix min/max typo
  dmaengine: lgm: Fix an error handling path in intel_ldma_probe()
  dmaengine: pl330: Fix lockdep warning about non-static key
  dmaengine: idxd: Only call idxd_enable_system_pasid() if succeeded in enabling SVA feature
  dmaengine: at_xdma: handle errors of at_xdmac_alloc_desc() correctly
  dmaengine: imx-sdma: only restart cyclic channel when enabled
  dmaengine: dw-axi-dmac: Fix RMW on channel suspend register
  dmaengine: idxd: force wq context cleanup on device disable path
  dmaengine: qcom: bam_dma: fix runtime PM underflow
  dmaengine: imx-sdma: Allow imx8m for imx7 FW revs
  dmaengine: Revert "dmaengine: add verification of DMA_INTERRUPT capability for dmatest"
  dmaengine: ti: Add missing put_device in ti_dra7_xbar_route_allocate
  dmaengine: ti: Fix refcount leak in ti_dra7_xbar_route_allocate

952c53cd

Merge tag 'staging-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 5867f3b8

Linus Torvalds authored 2 years ago

Pull staging driver fix from Greg KH:
 "Here is a single staging driver fix for a reported problem that showed
  up in 5.19-rc1 in the wlan-ng driver. It has been in linux-next for a
  week with no reported problems"

* tag 'staging-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging/wlan-ng: get the correct struct hfa384x in work callback

5867f3b8

Merge tag 'char-misc-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · b41362fd

Linus Torvalds authored 2 years ago

Pull char/misc driver fixes from Greg KH:
 "Here are four small char/misc driver fixes for 5.19-rc6 to resolve
  some reported issues. They only affect two drivers:

   - rtsx_usb: fix for of-reported DMA warning error, the driver was
     handling memory buffers in odd ways, it has now been fixed up to be
     much simpler and correct by Shuah.

   - at25 eeprom driver bugfix for reported problem

  All of these have been in linux-next for a week with no reported
  problems"

* tag 'char-misc-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  misc: rtsx_usb: set return value in rsp_buf alloc err path
  misc: rtsx_usb: use separate command and response buffers
  misc: rtsx_usb: fix use of dma mapped buffer for usb bulk transfer
  eeprom: at25: Rework buggy read splitting

b41362fd

Merge tag 'io_uring-5.19-2022-07-09' of git://git.kernel.dk/linux-block · d9919d43

Linus Torvalds authored 2 years ago

Pull io_uring fix from Jens Axboe:
 "A single fix for an issue that came up yesterday that we should plug
  for -rc6.

  This is a regression introduced in this cycle"

* tag 'io_uring-5.19-2022-07-09' of git://git.kernel.dk/linux-block:
  io_uring: check that we have a file table when allocating update slots

d9919d43

Merge tag 'kbuild-fixes-v5.19-3' of... · 2fbd36df

Linus Torvalds authored 2 years ago

Merge tag 'kbuild-fixes-v5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Adjust gen_compile_commands.py to the format change of *.mod files

 - Remove unused macro in scripts/Makefile.modinst

* tag 'kbuild-fixes-v5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: remove unused cmd_none in scripts/Makefile.modinst
  gen_compile_commands: handle multiple lines per .mod file

2fbd36df

Merge tag 'irq_urgent_for_v5.19_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2b9b31ce

Linus Torvalds authored 2 years ago

Pull irq fixes from Borislav Petkov:

 - Gracefully handle failure to request MMIO resources in the GICv3
   driver

 - Make a static key static in the Apple AIC driver

 - Fix the Xilinx intc driver dependency on OF_ADDRESS

* tag 'irq_urgent_for_v5.19_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/apple-aic: Make symbol 'use_fast_ipi' static
  irqchip/xilinx: Add explicit dependency on OF_ADDRESS
  irqchip/gicv3: Handle resource request failure consistently

2b9b31ce

Merge tag 'x86_urgent_for_v5.19_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 74a0032b

Linus Torvalds authored 2 years ago

Pull x86 fixes from Borislav Petkov:

 - Prepare for and clear .brk early in order to address XenPV guests
   failures where the hypervisor verifies page tables and uninitialized
   data in that range leads to bogus failures in those checks

 - Add any potential setup_data entries supplied at boot to the identity
   pagetable mappings to prevent kexec kernel boot failures. Usually,
   this is not a problem for the normal kernel as those mappings are
   part of the initially mapped 2M pages but if kexec gets to allocate
   the second kernel somewhere else, those setup_data entries need to be
   mapped there too.

 - Fix objtool not to discard text references from the __tracepoints
   section so that ENDBR validation still works

 - Correct the setup_data types limit as it is user-visible, before 5.19
   releases

* tag 'x86_urgent_for_v5.19_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot:...

74a0032b