Commits · 793f5c2cca10d0f33d38563c142ee00942c3e21e · Kirill Smelkov / linux

13 Apr, 2023 4 commits

Merge tag 'scrub-fix-legalese-6.4_2023-04-11' of... · 793f5c2c

Dave Chinner authored Apr 14, 2023

Merge tag 'scrub-fix-legalese-6.4_2023-04-11' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into guilt/xfs-for-next

xfs_scrub: fix licensing and copyright notices [v24.5]

Fix various attribution problems in the xfs_scrub source code, such as
the author's contact information, out of date SPDX tags, and a rough
estimate of when the feature was under heavy development. The most
egregious parts are the files that are missing license information
completely.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>

793f5c2c

Merge tag 'pass-perag-refs-6.4_2023-04-11' of... · 1e5ffdc5

Dave Chinner authored Apr 14, 2023

Merge tag 'pass-perag-refs-6.4_2023-04-11' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into guilt/xfs-for-next

xfs: pass perag references around when possible [v24.5]

Avoid the cost of perag radix tree lookups by passing around active perag
references when possible.

v24.2: rework some of the naming and whatnot so there's less opencoding
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>

1e5ffdc5

Merge tag 'intents-perag-refs-6.4_2023-04-11' of... · 826053db

Dave Chinner authored Apr 14, 2023

Merge tag 'intents-perag-refs-6.4_2023-04-11' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into guilt/xfs-for-next

xfs: make intent items take a perag reference [v24.5]

Now that we've cleaned up some code warts in the deferred work item
processing code, let's make intent items take an active perag reference
from their creation until they are finally freed by the defer ops
machinery. This change facilitates the scrub drain in the next patchset
and will make it easier for the future AG removal code to detect a busy
AG in need of quiescing.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>

826053db

Merge tag 'online-fsck-design-6.4_2023-04-11' of... · bed25d80

Dave Chinner authored Apr 14, 2023

Merge tag 'online-fsck-design-6.4_2023-04-11' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into guilt/xfs-for-next

xfs: design documentation for online fsck [v24.5]

After six years of development and a nearly two year hiatus from
patchbombing, I think it is time to resume the process of merging the
online fsck feature into XFS. The full patchset comprises 105 separate
patchsets that capture 470 patches across the kernel, xfsprogs, and
fstests projects.

I would like to merge this feature into upstream in time for the 2023
LTS kernel. As of 5.15 (aka last year's LTS), we have merged all
generally useful infrastructure improvements into the regular
filesystem. The only changes to the core filesystem that remain are the
ones that are only useful to online fsck itself. In other words, the
vast majority of the new code in the patchsets comprising the online
fsck feature are is mostly self contained and can be turned off via
Kconfig.

Many of you readers might be wondering -- why have I chosen to make one
large submission with 100+ patchsets comprising ~500 patches? Why
didn't I merge small pieces of functionality bit by bit and revise
common code as necessary? Well, the simple answer is that in the past
six years, the fundamental algorithms have been revised repeatedly as
I've built out the functionality. In other words, the codebase as it is
now has the benefit that I now know every piece that's necessary to get
the job done in a reasonable manner and within the constraints laid out
by community reviews. I believe this has reduced code churn in mainline
and freed up my time so that I can iterate faster.

As a concession to the mail servers, I'm breaking up the submission into
smaller pieces; I'm only pushing the design document and the revisions
to the existing scrub code, which is the first 20%% of the patches.
Also, I'm arbitrarily restarting the version numbering by reversioning
all patchsets from version 22 to epoch 23, version 1.

The big question to everyone reading this is: How might I convince you
that there is more merit in merging the whole feature and dealing with
the consequences than continuing to maintain it out of tree?

---------

To prepare the XFS community and potential patch reviewers for the
upstream submission of the online fsck feature, I decided to write a
document capturing the broader picture behind the online repair
development effort. The document begins by defining the problems that
online fsck aims to solve and outlining specific use cases for the
functionality.

Using that as a base, the rest of the design document presents the high
level algorithms that fulfill the goals set out at the start and the
interactions between the large pieces of the system. Case studies round
out the design documentation by adding the details of exactly how
specific parts of the online fsck code integrate the algorithms with the
filesystem.

The goal of this effort is to help the XFS community understand how the
gigantic online repair patchset works. The questions I submit to the
community reviewers are:

1. As you read the design doc (and later the code), do you feel that you
understand what's going on well enough to try to fix a bug if you
found one?

2. What sorts of interactions between systems (or between scrub and the
rest of the kernel) am I missing?

3. Do you feel confident enough in the implementation as it is now that
the benefits of merging the feature (as EXPERIMENTAL) outweigh any
potential disruptions to XFS at large?

4. Are there problematic interactions between subsystems that ought to
be cleared up before merging?

5. Can I just merge all of this?

I intend to commit this document to the kernel's documentation directory
when we start merging the patchset, albeit without the links to
git.kernel.org. A much more readable version of this is posted at:
https://djwong.org/docs/xfs-online-fsck-design/

v2: add missing sections about: all the in-kernel data structures and
new apis that the scrub and repair functions use; how xattrs and
directories are checked; how space btree records are checked; and
add more details to the parts where all these bits tie together.
Proofread for verb tense inconsistencies and eliminate vague 'we'
usage. Move all the discussion of what we can do with pageable
kernel memory into a single source file and section. Document where
log incompat feature locks fit into the locking model.

v3: resync with 6.0, fix a few typos, begin discussion of the merging
plan for this megapatchset.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>

bed25d80

12 Apr, 2023 27 commits

xfs: fix BUG_ON in xfs_getbmap() · 8ee81ed5

Ye Bin authored Apr 12, 2023

There's issue as follows:
XFS: Assertion failed: (bmv->bmv_iflags & BMV_IF_DELALLOC) != 0, file: fs/xfs/xfs_bmap_util.c, line: 329
------------[ cut here ]------------
kernel BUG at fs/xfs/xfs_message.c:102!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 14612 Comm: xfs_io Not tainted 6.3.0-rc2-next-20230315-00006-g2729d23ddb3b-dirty #422
RIP: 0010:assfail+0x96/0xa0
RSP: 0018:ffffc9000fa178c0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff888179a18000
RDX: 0000000000000000 RSI: ffff888179a18000 RDI: 0000000000000002
RBP: 0000000000000000 R08: ffffffff8321aab6 R09: 0000000000000000
R10: 0000000000000001 R11: ffffed1105f85139 R12: ffffffff8aacc4c0
R13: 0000000000000149 R14: ffff888269f58000 R15: 000000000000000c
FS:  00007f42f27a4740(0000) GS:ffff88882fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000b92388 CR3: 000000024f006000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 xfs_getbmap+0x1a5b/0x1e40
 xfs_ioc_getbmap+0x1fd/0x5b0
 xfs_file_ioctl+0x2cb/0x1d50
 __x64_sys_ioctl+0x197/0x210
 do_syscall_64+0x39/0xb0
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Above issue may happen as follows:
         ThreadA                       ThreadB
do_shared_fault
 __do_fault
  xfs_filemap_fault
   __xfs_filemap_fault
    filemap_fault
                             xfs_ioc_getbmap -> Without BMV_IF_DELALLOC flag
			      xfs_getbmap
			       xfs_ilock(ip, XFS_IOLOCK_SHARED);
			       filemap_write_and_wait
 do_page_mkwrite
  xfs_filemap_page_mkwrite
   __xfs_filemap_fault
    xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
    iomap_page_mkwrite
     ...
     xfs_buffered_write_iomap_begin
      xfs_bmapi_reserve_delalloc -> Allocate delay extent
                              xfs_ilock_data_map_shared(ip)
	                      xfs_getbmap_report_one
			       ASSERT((bmv->bmv_iflags & BMV_IF_DELALLOC) != 0)
	                        -> trigger BUG_ON

As xfs_filemap_page_mkwrite() only hold XFS_MMAPLOCK_SHARED lock, there's
small window mkwrite can produce delay extent after file write in xfs_getbmap().
To solve above issue, just skip delalloc extents.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

8ee81ed5

xfs: verify buffer contents when we skip log replay · 22ed903e

Darrick J. Wong authored Apr 12, 2023

syzbot detected a crash during log recovery:

XFS (loop0): Mounting V5 Filesystem bfdc47fc-10d8-4eed-a562-11a831b3f791
XFS (loop0): Torn write (CRC failure) detected at log block 0x180. Truncating head block from 0x200.
XFS (loop0): Starting recovery (logdev: internal)
==================================================================
BUG: KASAN: slab-out-of-bounds in xfs_btree_lookup_get_block+0x15c/0x6d0 fs/xfs/libxfs/xfs_btree.c:1813
Read of size 8 at addr ffff88807e89f258 by task syz-executor132/5074

CPU: 0 PID: 5074 Comm: syz-executor132 Not tainted 6.2.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1b1/0x290 lib/dump_stack.c:106
 print_address_description+0x74/0x340 mm/kasan/report.c:306
 print_report+0x107/0x1f0 mm/kasan/report.c:417
 kasan_report+0xcd/0x100 mm/kasan/report.c:517
 xfs_btree_lookup_get_block+0x15c/0x6d0 fs/xfs/libxfs/xfs_btree.c:1813
 xfs_btree_lookup+0x346/0x12c0 fs/xfs/libxfs/xfs_btree.c:1913
 xfs_btree_simple_query_range+0xde/0x6a0 fs/xfs/libxfs/xfs_btree.c:4713
 xfs_btree_query_range+0x2db/0x380 fs/xfs/libxfs/xfs_btree.c:4953
 xfs_refcount_recover_cow_leftovers+0x2d1/0xa60 fs/xfs/libxfs/xfs_refcount.c:1946
 xfs_reflink_recover_cow+0xab/0x1b0 fs/xfs/xfs_reflink.c:930
 xlog_recover_finish+0x824/0x920 fs/xfs/xfs_log_recover.c:3493
 xfs_log_mount_finish+0x1ec/0x3d0 fs/xfs/xfs_log.c:829
 xfs_mountfs+0x146a/0x1ef0 fs/xfs/xfs_mount.c:933
 xfs_fs_fill_super+0xf95/0x11f0 fs/xfs/xfs_super.c:1666
 get_tree_bdev+0x400/0x620 fs/super.c:1282
 vfs_get_tree+0x88/0x270 fs/super.c:1489
 do_new_mount+0x289/0xad0 fs/namespace.c:3145
 do_mount fs/namespace.c:3488 [inline]
 __do_sys_mount fs/namespace.c:3697 [inline]
 __se_sys_mount+0x2d3/0x3c0 fs/namespace.c:3674
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f89fa3f4aca
Code: 83 c4 08 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fffd5fb5ef8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00646975756f6e2c RCX: 00007f89fa3f4aca
RDX: 0000000020000100 RSI: 0000000020009640 RDI: 00007fffd5fb5f10
RBP: 00007fffd5fb5f10 R08: 00007fffd5fb5f50 R09: 000000000000970d
R10: 0000000000200800 R11: 0000000000000206 R12: 0000000000000004
R13: 0000555556c6b2c0 R14: 0000000000200800 R15: 00007fffd5fb5f50
 </TASK>

The fuzzed image contains an AGF with an obviously garbage
agf_refcount_level value of 32, and a dirty log with a buffer log item
for that AGF.  The ondisk AGF has a higher LSN than the recovered log
item.  xlog_recover_buf_commit_pass2 reads the buffer, compares the
LSNs, and decides to skip replay because the ondisk buffer appears to be
newer.

Unfortunately, the ondisk buffer is corrupt, but recovery just read the
buffer with no buffer ops specified:

	error = xfs_buf_read(mp->m_ddev_targp, buf_f->blf_blkno,
			buf_f->blf_len, buf_flags, &bp, NULL);

Skipping the buffer leaves its contents in memory unverified.  This sets
us up for a kernel crash because xfs_refcount_recover_cow_leftovers
reads the buffer (which is still around in XBF_DONE state, so no read
verification) and creates a refcountbt cursor of height 32.  This is
impossible so we run off the end of the cursor object and crash.

Fix this by invoking the verifier on all skipped buffers and aborting
log recovery if the ondisk buffer is corrupt.  It might be smarter to
force replay the log item atop the buffer and then see if it'll pass the
write verifier (like ext4 does) but for now let's go with the
conservative option where we stop immediately.

Link: https://syzkaller.appspot.com/bug?extid=7e9494b8b399902e994eSigned-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

22ed903e

xfs: _{attr,data}_map_shared should take ILOCK_EXCL until iread_extents is completely done · c95356ca

Darrick J. Wong authored Apr 12, 2023

While fuzzing the data fork extent count on a btree-format directory
with xfs/375, I observed the following (excerpted) splat:

XFS: Assertion failed: xfs_isilocked(ip, XFS_ILOCK_EXCL), file: fs/xfs/libxfs/xfs_bmap.c, line: 1208
------------[ cut here ]------------
WARNING: CPU: 0 PID: 43192 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs]
Call Trace:
 <TASK>
 xfs_iread_extents+0x1af/0x210 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
 xchk_dir_walk+0xb8/0x190 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
 xchk_parent_count_parent_dentries+0x41/0x80 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
 xchk_parent_validate+0x199/0x2e0 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
 xchk_parent+0xdf/0x130 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
 xfs_scrub_metadata+0x2b8/0x730 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
 xfs_scrubv_metadata+0x38b/0x4d0 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
 xfs_ioc_scrubv_metadata+0x111/0x160 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
 xfs_file_ioctl+0x367/0xf50 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
 __x64_sys_ioctl+0x82/0xa0
 do_syscall_64+0x2b/0x80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

The cause of this is a race condition in xfs_ilock_data_map_shared,
which performs an unlocked access to the data fork to guess which lock
mode it needs:

Thread 0                          Thread 1

xfs_need_iread_extents
<observe no iext tree>
xfs_ilock(..., ILOCK_EXCL)
xfs_iread_extents
<observe no iext tree>
<check ILOCK_EXCL>
<load bmbt extents into iext>
<notice iext size doesn't
 match nextents>
                                  xfs_need_iread_extents
                                  <observe iext tree>
                                  xfs_ilock(..., ILOCK_SHARED)
<tear down iext tree>
xfs_iunlock(..., ILOCK_EXCL)
                                  xfs_iread_extents
                                  <observe no iext tree>
                                  <check ILOCK_EXCL>
                                  *BOOM*

Fix this race by adding a flag to the xfs_ifork structure to indicate
that we have not yet read in the extent records and changing the
predicate to look at the flag state, not if_height.  The memory barrier
ensures that the flag will not be set until the very end of the
function.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

c95356ca

xfs: remove WARN when dquot cache insertion fails · 4b827b3f

Dave Chinner authored Apr 12, 2023

It just creates unnecessary bot noise these days.

Reported-by: syzbot+6ae213503fb12e87934f@syzkaller.appspotmail.com
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>

4b827b3f

xfs: don't consider future format versions valid · aa880198

Dave Chinner authored Apr 12, 2023

In commit fe08cc50 we reworked the valid superblock version
checks. If it is a V5 filesystem, it is always valid, then we
checked if the version was less than V4 (reject) and then checked
feature fields in the V4 flags to determine if it was valid.

What we missed was that if the version is not V4 at this point,
we shoudl reject the fs. i.e. the check current treats V6+
filesystems as if it was a v4 filesystem. Fix this.

cc: stable@vger.kernel.org
Fixes: fe08cc50 ("xfs: open code sb verifier feature checks")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>

aa880198

xfs: update copyright years for scrub/ files · ecc73f8a

Darrick J. Wong authored Apr 11, 2023

Update the copyright years in the scrub/ source code files.  This isn't
required, but it's helpful to remind myself just how long it's taken to
develop this feature.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

ecc73f8a

xfs: fix author and spdx headers on scrub/ files · 739a2fe0

Darrick J. Wong authored Apr 11, 2023

Fix the spdx tags to match current practice, and update the author
contact information.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

739a2fe0

xfs: create traced helper to get extra perag references · 9b2e5a23

Darrick J. Wong authored Apr 11, 2023

There are a few places in the XFS codebase where a caller has either an
active or a passive reference to a perag structure and wants to give
a passive reference to some other piece of code.  Btree cursor creation
and inode walks are good examples of this.  Replace the open-coded logic
with a helper to do this.

The new function adds a few safeguards -- it checks that there's at
least one reference to the perag structure passed in, and it records the
refcount bump in the ftrace information.  This makes it much easier to
debug perag refcounting problems.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

9b2e5a23

xfs: give xfs_refcount_intent its own perag reference · 00e7b3ba

Darrick J. Wong authored Apr 11, 2023

Give the xfs_refcount_intent a passive reference to the perag structure
data.  This reference will be used to enable scrub intent draining
functionality in subsequent patches.  Any space being modified by a
refcount intent is already allocated, so we need to be able to operate
even if the AG is being shrunk or offlined.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

00e7b3ba

xfs: give xfs_rmap_intent its own perag reference · c13418e8

Darrick J. Wong authored Apr 11, 2023

Give the xfs_rmap_intent a passive reference to the perag structure
data.  This reference will be used to enable scrub intent draining
functionality in subsequent patches.  The space we're (reverse) mapping
is already allocated, so we need to be able to operate even if the AG is
being shrunk or offlined.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

c13418e8

xfs: give xfs_extfree_intent its own perag reference · f6b38463

Darrick J. Wong authored Apr 11, 2023

Give the xfs_extfree_intent an passive reference to the perag structure
data.  This reference will be used to enable scrub intent draining
functionality in subsequent patches.  The space being freed must already
be allocated, so we need to able to run even if the AG is being offlined
or shrunk.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

f6b38463

xfs: pass per-ag references to xfs_free_extent · b2ccab31

Darrick J. Wong authored Apr 11, 2023

Pass a reference to the per-AG structure to xfs_free_extent.  Most
callers already have one, so we can eliminate unnecessary lookups.  The
one exception to this is the EFI code, which the next patch will fix.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

b2ccab31

xfs: give xfs_bmap_intent its own perag reference · 774a99b4

Darrick J. Wong authored Apr 11, 2023

Give the xfs_bmap_intent an active reference to the perag structure
data.  This reference will be used to enable scrub intent draining
functionality in subsequent patches.  Later, shrink will use these
passive references to know if an AG is quiesced or not.

The reason why we take a passive ref for a file mapping operation is
simple: we're committing to some sort of action involving space in an
AG, so we want to indicate our interest in that AG.  The space is
already allocated, so we need to be able to operate on AGs that are
offline or being shrunk.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

774a99b4

xfs: document future directions of online fsck · 03786f0a

Darrick J. Wong authored Apr 11, 2023

Add the seventh and final chapter of the online fsck documentation,
where we talk about future functionality that can tie in with the
functionality provided by the online fsck patchset.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

03786f0a

xfs: document the userspace fsck driver program · af051dfb

Darrick J. Wong authored Apr 11, 2023

Add the sixth chapter of the online fsck design documentation, where
we discuss the details of the data structures and algorithms used by the
driver program xfs_scrub.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

af051dfb

xfs: document directory tree repairs · a26aa252

Darrick J. Wong authored Apr 11, 2023

Directory tree repairs are the least complete part of online fsck, due
to the lack of directory parent pointers.  However, even without that
feature, we can still make some corrections to the directory tree -- we
can salvage as many directory entries as we can from a damaged
directory, and we can reattach orphaned inodes to the lost+found, just
as xfs_repair does now.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

a26aa252

xfs: document metadata file repair · 2f754f7f

Darrick J. Wong authored Apr 11, 2023

File-based metadata (such as xattrs and directories) can be extremely
large.  To reduce the memory requirements and maximize code reuse, it is
very convenient to create a temporary file, use the regular dir/attr
code to store salvaged information, and then atomically swap the extents
between the file being repaired and the temporary file.  Record the high
level concepts behind how temporary files and atomic content swapping
should work, and then present some case studies of what the actual
repair functions do.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

2f754f7f

xfs: document full filesystem scans for online fsck · a0d856ee

Darrick J. Wong authored Apr 11, 2023

Certain parts of the online fsck code need to scan every file in the
entire filesystem.  It is not acceptable to block the entire filesystem
while this happens, which means that we need to be clever in allowing
scans to coordinate with ongoing filesystem updates.  We also need to
hook the filesystem so that regular updates propagate to the staging
records.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

a0d856ee

xfs: document online file metadata repair code · d6978871

Darrick J. Wong authored Apr 11, 2023

Add to the fifth chapter of the online fsck design documentation, where
we discuss the details of the data structures and algorithms used by the
kernel to repair file metadata.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

d6978871

xfs: document btree bulk loading · 7fb8ccff

Darrick J. Wong authored Apr 11, 2023

Add a discussion of the btree bulk loading code, which makes it easy to
take an in-memory recordset and write it out to disk in an efficient
manner.  This also enables atomic switchover from the old to the new
structure with minimal potential for leaking the old blocks.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

7fb8ccff

xfs: document pageable kernel memory · 5f658dad

Darrick J. Wong authored Apr 11, 2023

Add a discussion of pageable kernel memory, since online fsck needs
quite a bit more memory than most other parts of the filesystem to stage
records and other information.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

5f658dad

xfs: document how online fsck deals with eventual consistency · bae43864

Darrick J. Wong authored Apr 11, 2023

Writes to an XFS filesystem employ an eventual consistency update model
to break up complex multistep metadata updates into small chained
transactions. This is generally good for performance and scalability
because XFS doesn't need to prepare for enormous transactions, but it
also means that online fsck must be careful not to attempt a fsck action
unless it can be shown that there are no other threads processing a
transaction chain. This part of the design documentation covers the
thinking behind the consistency model and how scrub deals with it.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

bae43864

xfs: document the filesystem metadata checking strategy · e5edad52

Darrick J. Wong authored Apr 11, 2023

Begin the fifth chapter of the online fsck design documentation, where
we discuss the details of the data structures and algorithms used by the
kernel to examine filesystem metadata and cross-reference it around the
filesystem.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

e5edad52

xfs: document the user interface for online fsck · 4f7f6469

Darrick J. Wong authored Apr 11, 2023

Start the fourth chapter of the online fsck design documentation, which
discusses the user interface and the background scrubbing service.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

4f7f6469

xfs: document the testing plan for online fsck · 9a30b5b5

Darrick J. Wong authored Apr 11, 2023

Start the third chapter of the online fsck design documentation.  This
covers the testing plan to make sure that both online and offline fsck
can detect arbitrary problems and correct them without making things
worse.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

9a30b5b5

xfs: document the general theory underlying online fsck design · 88757e04

Darrick J. Wong authored Apr 11, 2023

Start the second chapter of the online fsck design documentation.
This covers the general theory underlying how online fsck works.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

88757e04

xfs: document the motivation for online fsck design · a8f6c2e5

Darrick J. Wong authored Apr 11, 2023

Start the first chapter of the online fsck design documentation.
This covers the motivations for creating this in the first place.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

a8f6c2e5

09 Apr, 2023 5 commits

Linux 6.3-rc6 · 09a9639e
Linus Torvalds authored Apr 09, 2023

09a9639e

Merge tag 'perf_urgent_for_v6.3_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · faf8f418

Linus Torvalds authored Apr 09, 2023

Pull perf fixes from Borislav Petkov:

 - Fix "same task" check when redirecting event output

 - Do not wait unconditionally for RCU on the event migration path if
   there are no events to migrate

* tag 'perf_urgent_for_v6.3_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix the same task check in perf_event_set_output
  perf: Optimize perf_pmu_migrate_context()

faf8f418

Merge tag 'x86_urgent_for_v6.3_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4ba115e2

Linus Torvalds authored Apr 09, 2023

Pull x86 fixes from Borislav Petkov:

 - Add a new Intel Arrow Lake CPU model number

 - Fix a confusion about how to check the version of the ACPI spec which
   supports a "online capable" bit in the MADT table which lead to a
   bunch of boot breakages with Zen1 systems and VMs

* tag 'x86_urgent_for_v6.3_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add model number for Intel Arrow Lake processor
  x86/acpi/boot: Correct acpi_is_processor_usable() check
  x86/ACPI/boot: Use FADT version to check support for online capable

4ba115e2

Merge tag 'cxl-fixes-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl · c08cfd67

Linus Torvalds authored Apr 09, 2023

Pull compute express link (cxl) fixes from Dan Williams:
 "Several fixes for driver startup regressions that landed during the
  merge window as well as some older bugs.

  The regressions were due to a lack of testing with what the CXL
  specification calls Restricted CXL Host (RCH) topologies compared to
  the testing with Virtual Host (VH) CXL topologies. A VH topology is
  typical PCIe while RCH topologies map CXL endpoints as Root Complex
  Integrated endpoints. The impact is some driver crashes on startup.

  This merge window also added compatibility for range registers (the
  mechanism that CXL 1.1 defined for mapping memory) to treat them like
  HDM decoders (the mechanism that CXL 2.0 defined for mapping
  Host-managed Device Memory). That work collided with the new region
  enumeration code that was tested with CXL 2.0 setups, and fails with
  crashes at startup.

  Lastly, the DOE (Data Object Exchange) implementation for retrieving
  an ACPI-like data table from CXL devices is being reworked for v6.4.
  Several fixes fell out of that work that are suitable for v6.3.

  All of this has been in linux-next for a while, and all reported
  issues [1] have been addressed.

  Summary:

   - Fix several issues with region enumeration in RCH topologies that
     can trigger crashes on driver startup or shutdown.

   - Fix CXL DVSEC range register compatibility versus region
     enumeration that leads to startup crashes

   - Fix CDAT endiannes handling

   - Fix multiple buffer handling boundary conditions

   - Fix Data Object Exchange (DOE) workqueue usage vs
     CONFIG_DEBUG_OBJECTS warn splats"

Link: http://lore.kernel.org/r/20230405075704.33de8121@canb.auug.org.au [1]

* tag 'cxl-fixes-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl/hdm: Extend DVSEC range register emulation for region enumeration
  cxl/hdm: Limit emulation to the number of range registers
  cxl/region: Move coherence tracking into cxl_region_attach()
  cxl/region: Fix region setup/teardown for RCDs
  cxl/port: Fix find_cxl_root() for RCDs and simplify it
  cxl/hdm: Skip emulation when driver manages mem_enable
  cxl/hdm: Fix double allocation of @cxlhdm
  PCI/DOE: Fix memory leak with CONFIG_DEBUG_OBJECTS=y
  PCI/DOE: Silence WARN splat with CONFIG_DEBUG_OBJECTS=y
  cxl/pci: Handle excessive CDAT length
  cxl/pci: Handle truncated CDAT entries
  cxl/pci: Handle truncated CDAT header
  cxl/pci: Fix CDAT retrieval on big endian

c08cfd67

Merge tag '6.3-rc5-smb3-cifs-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · cdc9718d

Linus Torvalds authored Apr 08, 2023

Pull cifs client fixes from Steve French:
 "Two cifs/smb3 client fixes, one for stable:

   - double lock fix for a cifs/smb1 reconnect path

   - DFS prefixpath fix for reconnect when server moved"

* tag '6.3-rc5-smb3-cifs-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: double lock in cifs_reconnect_tcon()
  cifs: sanitize paths in cifs_update_super_prepath.

cdc9718d

08 Apr, 2023 4 commits

Merge tag 'char-misc-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 68047c48

Linus Torvalds authored Apr 08, 2023

Pull char/misc driver fixes from Greg KH:
 "Here are a small set of various small driver changes for 6.3-rc6.
  Included in here are:

   - iio driver fixes for reported problems

   - coresight hwtracing bugfix for reported problem

   - small counter driver bugfixes

  All have been in linux-next for a while with no reported problems"

* tag 'char-misc-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  coresight: etm4x: Do not access TRCIDR1 for identification
  coresight-etm4: Fix for() loop drvdata->nr_addr_cmp range bug
  iio: adc: ti-ads7950: Set `can_sleep` flag for GPIO chip
  iio: adc: palmas_gpadc: fix NULL dereference on rmmod
  counter: 104-quad-8: Fix Synapse action reported for Index signals
  counter: 104-quad-8: Fix race condition between FLAG and CNTR reads
  iio: adc: max11410: fix read_poll_timeout() usage
  iio: dac: cio-dac: Fix max DAC write value check for 12-bit
  iio: light: cm32181: Unregister second I2C client if present
  iio: accel: kionix-kx022a: Get the timestamp from the driver's private data in the trigger_handler
  iio: adc: ad7791: fix IRQ flags
  iio: buffer: make sure O_NONBLOCK is respected
  iio: buffer: correctly return bytes written in output buffers
  iio: light: vcnl4000: Fix WARN_ON on uninitialized lock
  iio: adis16480: select CONFIG_CRC32
  drivers: iio: adc: ltc2497: fix LSB shift
  iio: adc: qcom-spmi-adc5: Fix the channel name

68047c48

Merge tag 'tty-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · aa46fe36

Linus Torvalds authored Apr 08, 2023

Pull tty/serial driver fixes from Greg KH:
 "Here are some small tty and serial driver fixes for some reported
  problems:

   - fsl_uart driver bugfixes

   - sh-sci serial driver bugfixes

   - renesas serial driver DT binding bugfixes

   - 8250 DMA bugfix

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'tty-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: serial: sh-sci: Fix Rx on RZ/G2L SCI
  tty: serial: fsl_lpuart: fix crash in lpuart_uport_is_active
  tty: serial: fsl_lpuart: avoid checking for transfer complete when UARTCTRL_SBK is asserted in lpuart32_tx_empty
  serial: 8250: Prevent starting up DMA Rx on THRI interrupt
  dt-bindings: serial: renesas,scif: Fix 4th IRQ for 4-IRQ SCIFs
  tty: serial: sh-sci: Fix transmit end interrupt handler

aa46fe36

Merge tag 'usb-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · a211b1c0

Linus Torvalds authored Apr 08, 2023

Pull USB bugfixes from Greg KH:
 "Here are some small USB bugfixes for 6.3-rc6 that have been in my
  tree, and in linux-next, for a while. Included in here are:

   - new usb-serial driver device ids

   - xhci bugfixes for reported problems

   - gadget driver bugfixes for reported problems

   - dwc3 new device id

  All have been in linux-next with no reported problems"

* tag 'usb-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: cdnsp: Fixes error: uninitialized symbol 'len'
  usb: gadgetfs: Fix ep_read_iter to handle ITER_UBUF
  usb: gadget: f_fs: Fix ffs_epfile_read_iter to handle ITER_UBUF
  usb: typec: altmodes/displayport: Fix configure initial pin assignment
  usb: dwc3: pci: add support for the Intel Meteor Lake-S
  xhci: Free the command allocated for setting LPM if we return early
  Revert "usb: xhci-pci: Set PROBE_PREFER_ASYNCHRONOUS"
  xhci: also avoid the XHCI_ZERO_64B_REGS quirk with a passthrough iommu
  USB: serial: option: add Quectel RM500U-CN modem
  usb: xhci: tegra: fix sleep in atomic call
  USB: serial: option: add Telit FE990 compositions
  USB: serial: cp210x: add Silicon Labs IFS-USB-DATACABLE IDs

a211b1c0

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · a79d5c76

Linus Torvalds authored Apr 08, 2023

Pull SCSI fixes from James Bottomley:
 "Four small fixes, all in drivers. They're all one or two lines except
  for the ufs one, but that's a simple revert of a previous feature"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: iscsi_tcp: Check that sock is valid before iscsi_set_param()
  scsi: qla2xxx: Fix memory leak in qla2x00_probe_one()
  scsi: mpi3mr: Handle soft reset in progress fault code (0xF002)
  scsi: Revert "scsi: ufs: core: Initialize devfreq synchronously"

a79d5c76