Commit 53ea7f62 authored by Linus Torvalds

Merge tag 'xfs-6.6-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Chandan Babu:

 - Chandan Babu will be taking over as the XFS release manager. He has
   reviewed all the patches that are in this branch, though I'm signing
   the branch one last time since I'm still technically maintainer. :P

 - Create a maintainer entry profile for XFS in which we lay out the
   various roles that I have played for many years.  Aside from release
   manager, the remaining roles are as yet unfilled.

 - Start merging online repair -- we now have in-memory pageable memory
   for staging btrees, a bunch of pending fixes, and we've started the
   process of refactoring the scrub support code to support more of
   repair.  In particular, reaping of old blocks from damaged structures.

 - Scrub the realtime summary file.

 - Fix a bug where scrub's quota iteration only ever returned the root
   dquot.  Oooops.

 - Fix some typos.

[ Pull request from Chandan Babu, but signed tag and description from
  Darrick Wong, thus the first person singular above is Darrick, not
  Chandan ]

* tag 'xfs-6.6-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (37 commits)
  fs/xfs: Fix typos in comments
  xfs: fix dqiterate thinko
  xfs: don't check reflink iflag state when checking cow fork
  xfs: simplify returns in xchk_bmap
  xfs: rewrite xchk_inode_is_allocated to work properly
  xfs: hide xfs_inode_is_allocated in scrub common code
  xfs: fix agf_fllast when repairing an empty AGFL
  xfs: allow userspace to rebuild metadata structures
  xfs: clear pagf_agflreset when repairing the AGFL
  xfs: allow the user to cancel repairs before we start writing
  xfs: don't complain about unfixed metadata when repairs were injected
  xfs: implement online scrubbing of rtsummary info
  xfs: always rescan allegedly healthy per-ag metadata after repair
  xfs: move the realtime summary file scrubber to a separate source file
  xfs: wrap ilock/iunlock operations on sc->ip
  xfs: get our own reference to inodes that we want to scrub
  xfs: track usage statistics of online fsck
  xfs: improve xfarray quicksort pivot
  xfs: create scaffolding for creating debugfs entries
  xfs: cache pages used for xfarray quicksort convergence
  ...
parents 38663034 c1950a11
......@@ -122,6 +122,7 @@ Documentation for filesystem implementations.
virtiofs
vfat
xfs-delayed-logging-design
xfs-maintainer-entry-profile
xfs-self-describing-metadata
xfs-online-fsck-design
zonefs
XFS Maintainer Entry Profile
============================
Overview
--------
XFS is a well known high-performance filesystem in the Linux kernel.
The aim of this project is to provide and maintain a robust and
performant filesystem.
Patches are generally merged to the for-next branch of the appropriate
git repository.
After a testing period, the for-next branch is merged to the master
branch.
Kernel code is merged to the xfs-linux tree[0].
Userspace code is merged to the xfsprogs tree[1].
Test cases are merged to the xfstests tree[2].
Ondisk format documentation is merged to the xfs-documentation tree[3].
All patchsets involving XFS *must* be cc'd in their entirety to the mailing
list linux-xfs@vger.kernel.org.
Roles
-----
There are eight key roles in the XFS project.
A person can take on multiple roles, and a role can be filled by
multiple people.
Anyone taking on a role is advised to check in with themselves and
others on a regular basis about burnout.
- **Outside Contributor**: Anyone who sends a patch but is not involved
in the XFS project on a regular basis.
These folks are usually people who work on other filesystems or
elsewhere in the kernel community.
- **Developer**: Someone who is familiar with the XFS codebase enough to
write new code, documentation, and tests.
Developers can often be found in the IRC channel mentioned by the ``C:``
entry in the kernel MAINTAINERS file.
- **Senior Developer**: A developer who is very familiar with at least
some part of the XFS codebase and/or other subsystems in the kernel.
These people collectively decide the long term goals of the project
and nudge the community in that direction.
They should help prioritize development and review work for each release
cycle.
Senior developers tend to be more active participants in the IRC channel.
- **Reviewer**: Someone (most likely also a developer) who reads code
submissions to decide:
0. Is the idea behind the contribution sound?
1. Does the idea fit the goals of the project?
2. Is the contribution designed correctly?
3. Is the contribution polished?
4. Can the contribution be tested effectively?
Reviewers should identify themselves with an ``R:`` entry in the kernel
and fstests MAINTAINERS files.
- **Testing Lead**: This person is responsible for setting the test
coverage goals of the project, negotiating with developers to decide
on new tests for new features, and making sure that developers and
release managers execute on the testing.
The testing lead should identify themselves with an ``M:`` entry in
the XFS section of the fstests MAINTAINERS file.
- **Bug Triager**: Someone who examines incoming bug reports in just
enough detail to identify the person to whom the report should be
forwarded.
The bug triagers should identify themselves with a ``B:`` entry in
the kernel MAINTAINERS file.
- **Release Manager**: This person merges reviewed patchsets into an
integration branch, tests the result locally, pushes the branch to a
public git repository, and sends pull requests further upstream.
The release manager is not expected to work on new feature patchsets.
If a developer and a reviewer fail to reach a resolution on some point,
the release manager must have the ability to intervene to try to drive a
resolution.
The release manager should identify themselves with an ``M:`` entry in
the kernel MAINTAINERS file.
- **Community Manager**: This person calls and moderates meetings of as many
XFS participants as they can get when mailing list discussions prove
insufficient for collective decisionmaking.
They may also serve as liaison between managers of the organizations
sponsoring work on any part of XFS.
- **LTS Maintainer**: Someone who backports and tests bug fixes from
upstream to the LTS kernels.
There tend to be six separate LTS trees at any given time.
The maintainer for a given LTS release should identify themselves with an
``M:`` entry in the MAINTAINERS file for that LTS tree.
Unmaintained LTS kernels should be marked with status ``S: Orphan`` in that
same file.
Submission Checklist Addendum
-----------------------------
Please follow these additional rules when submitting to XFS:
- Patches affecting only the filesystem itself should be based against
the latest -rc or the for-next branch.
These patches will be merged back to the for-next branch.
- Authors of patches touching other subsystems need to coordinate with
the maintainers of XFS and the relevant subsystems to decide how to
proceed with a merge.
- Any patchset changing XFS should be cc'd in its entirety to linux-xfs.
Do not send partial patchsets; that makes analysis of the broader
context of the changes unnecessarily difficult.
- Anyone making kernel changes that have corresponding changes to the
userspace utilities should send the userspace changes as separate
patchsets immediately after the kernel patchsets.
- Authors of bug fix patches are expected to use fstests[2] to perform
an A/B test of the patch to determine that there are no regressions.
When possible, a new regression test case should be written for
fstests.
- Authors of new feature patchsets must ensure that fstests will have
appropriate functional and input corner-case test cases for the new
feature.
- When implementing a new feature, it is strongly suggested that the
developers write a design document to answer the following questions:
* **What** problem is this trying to solve?
* **Who** will benefit from this solution, and **where** will they
access it?
* **How** will this new feature work? This should touch on major data
structures and algorithms supporting the solution at a higher level
than code comments.
* **What** userspace interfaces are necessary to build off of the new
features?
* **How** will this work be tested to ensure that it solves the
problems laid out in the design document without causing new
problems?
The design document should be committed in the kernel documentation
directory.
It may be omitted if the feature is already well known to the
community.
- Patchsets for the new tests should be submitted as separate patchsets
immediately after the kernel and userspace code patchsets.
- Changes to the on-disk format of XFS must be described in the ondisk
format document[3] and submitted as a patchset after the fstests
patchsets.
- Patchsets implementing bug fixes and further code cleanups should put
the bug fixes at the beginning of the series to ease backporting.
Key Release Cycle Dates
-----------------------
Bug fixes may be sent at any time, though the release manager may decide to
defer a patch when the next merge window is close.
Code submissions targeting the next merge window should be sent between
-rc1 and -rc6.
This gives the community time to review the changes and suggest other changes,
and gives the author time to retest those changes.
Code submissions also requiring changes to fs/iomap and targeting the
next merge window should be sent between -rc1 and -rc4.
This allows the broader kernel community adequate time to test the
infrastructure changes.
Review Cadence
--------------
In general, please wait at least one week before pinging for feedback.
To find reviewers, either consult the MAINTAINERS file, or ask
developers that have Reviewed-by tags for XFS changes to take a look and
offer their opinion.
References
----------
| [0] https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/
| [1] https://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git/
| [2] https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/
| [3] https://git.kernel.org/pub/scm/fs/xfs/xfs-documentation.git/
......@@ -105,3 +105,4 @@ to do something different in the near future.
../driver-api/media/maintainer-entry-profile
../driver-api/vfio-pci-device-specific-driver-acceptance
../nvme/feature-and-quirk-policy
../filesystems/xfs-maintainer-entry-profile
......@@ -23428,12 +23428,14 @@ F: include/xen/arm/swiotlb-xen.h
F: include/xen/swiotlb-xen.h
XFS FILESYSTEM
M: Darrick J. Wong <djwong@kernel.org>
M: Chandan Babu R <chandan.babu@oracle.com>
R: Darrick J. Wong <djwong@kernel.org>
L: linux-xfs@vger.kernel.org
S: Supported
W: http://xfs.org/
C: irc://irc.oftc.net/xfs
T: git git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git
P: Documentation/filesystems/xfs-maintainer-entry-profile.rst
F: Documentation/ABI/testing/sysfs-fs-xfs
F: Documentation/admin-guide/xfs.rst
F: Documentation/filesystems/xfs-delayed-logging-design.rst
......
......@@ -128,6 +128,7 @@ config XFS_ONLINE_SCRUB
bool "XFS online metadata check support"
default n
depends on XFS_FS
depends on TMPFS && SHMEM
select XFS_DRAIN_INTENTS
help
If you say Y here you will be able to check metadata on a
......@@ -142,6 +143,23 @@ config XFS_ONLINE_SCRUB
If unsure, say N.
config XFS_ONLINE_SCRUB_STATS
bool "XFS online metadata check usage data collection"
default y
depends on XFS_ONLINE_SCRUB
select FS_DEBUG
help
If you say Y here, the kernel will gather usage data about
the online metadata check subsystem. This includes the number
of invocations, the outcomes, and the results of repairs, if any.
This may slow down scrub slightly due to the use of high precision
timers and the need to merge per-invocation information into the
filesystem counters.
Usage data are collected in /sys/kernel/debug/xfs/scrub.
If unsure, say N.
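As a rough sketch, enabling the new statistics collection in a kernel .config
involves at least the following symbols, per the dependencies shown in these
hunks (any other options your configuration needs are omitted here):

CONFIG_TMPFS=y
CONFIG_SHMEM=y
CONFIG_XFS_FS=y
CONFIG_XFS_ONLINE_SCRUB=y
CONFIG_XFS_ONLINE_SCRUB_STATS=y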
config XFS_ONLINE_REPAIR
bool "XFS online metadata repair support"
default n
......
......@@ -164,15 +164,24 @@ xfs-y += $(addprefix scrub/, \
rmap.o \
scrub.o \
symlink.o \
xfarray.o \
xfile.o \
)
xfs-$(CONFIG_XFS_ONLINE_SCRUB_STATS) += scrub/stats.o
xfs-$(CONFIG_XFS_RT) += $(addprefix scrub/, \
rtbitmap.o \
rtsummary.o \
)
xfs-$(CONFIG_XFS_RT) += scrub/rtbitmap.o
xfs-$(CONFIG_XFS_QUOTA) += scrub/quota.o
# online repair
ifeq ($(CONFIG_XFS_ONLINE_REPAIR),y)
xfs-y += $(addprefix scrub/, \
agheader_repair.o \
reap.o \
repair.o \
)
endif
......
......@@ -743,7 +743,11 @@ struct xfs_scrub_metadata {
*/
#define XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED (1u << 7)
#define XFS_SCRUB_FLAGS_IN (XFS_SCRUB_IFLAG_REPAIR)
/* i: Rebuild the data structure. */
#define XFS_SCRUB_IFLAG_FORCE_REBUILD (1u << 8)
#define XFS_SCRUB_FLAGS_IN (XFS_SCRUB_IFLAG_REPAIR | \
XFS_SCRUB_IFLAG_FORCE_REBUILD)
#define XFS_SCRUB_FLAGS_OUT (XFS_SCRUB_OFLAG_CORRUPT | \
XFS_SCRUB_OFLAG_PREEN | \
XFS_SCRUB_OFLAG_XFAIL | \
......
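To illustrate how the new XFS_SCRUB_IFLAG_FORCE_REBUILD input flag is meant to
be driven from userspace, here is a minimal, hypothetical C sketch (not part of
this series) that asks the kernel to rebuild AG 0's AGF even if it scrubs
clean. It assumes the existing XFS_IOC_SCRUB_METADATA ioctl and struct
xfs_scrub_metadata from xfs_fs.h; the include path and error handling are
illustrative only.

/* Hypothetical userspace sketch: force a rebuild of AG 0's AGF. */
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <xfs/xfs.h>	/* pulls in xfs_fs.h; exact path is an assumption */

static int force_rebuild_agf(const char *mountpoint)
{
	struct xfs_scrub_metadata sm = {
		.sm_type  = XFS_SCRUB_TYPE_AGF,
		.sm_agno  = 0,
		/* FORCE_REBUILD without REPAIR is rejected with -EINVAL. */
		.sm_flags = XFS_SCRUB_IFLAG_REPAIR |
			    XFS_SCRUB_IFLAG_FORCE_REBUILD,
	};
	int fd, ret;

	fd = open(mountpoint, O_RDONLY | O_DIRECTORY);
	if (fd < 0)
		return -1;
	ret = ioctl(fd, XFS_IOC_SCRUB_METADATA, &sm);
	close(fd);
	return ret;
}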
......@@ -26,6 +26,7 @@
#include "scrub/trace.h"
#include "scrub/repair.h"
#include "scrub/bitmap.h"
#include "scrub/reap.h"
/* Superblock */
......@@ -48,6 +49,10 @@ xrep_superblock(
if (error)
return error;
/* Last chance to abort before we start committing fixes. */
if (xchk_should_terminate(sc, &error))
return error;
/* Copy AG 0's superblock to this one. */
xfs_buf_zero(bp, 0, BBTOB(bp->b_length));
xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
......@@ -423,6 +428,10 @@ xrep_agf(
if (error)
return error;
/* Last chance to abort before we start committing fixes. */
if (xchk_should_terminate(sc, &error))
return error;
/* Start rewriting the header and implant the btrees we found. */
xrep_agf_init_header(sc, agf_bp, &old_agf);
xrep_agf_set_roots(sc, agf, fab);
......@@ -444,13 +453,13 @@ xrep_agf(
struct xrep_agfl {
/* Bitmap of alleged AGFL blocks that we're not going to add. */
struct xbitmap crossed;
struct xagb_bitmap crossed;
/* Bitmap of other OWN_AG metadata blocks. */
struct xbitmap agmetablocks;
struct xagb_bitmap agmetablocks;
/* Bitmap of free space. */
struct xbitmap *freesp;
struct xagb_bitmap *freesp;
/* rmapbt cursor for finding crosslinked blocks */
struct xfs_btree_cur *rmap_cur;
......@@ -466,7 +475,6 @@ xrep_agfl_walk_rmap(
void *priv)
{
struct xrep_agfl *ra = priv;
xfs_fsblock_t fsb;
int error = 0;
if (xchk_should_terminate(ra->sc, &error))
......@@ -474,14 +482,13 @@ xrep_agfl_walk_rmap(
/* Record all the OWN_AG blocks. */
if (rec->rm_owner == XFS_RMAP_OWN_AG) {
fsb = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_ag.pag->pag_agno,
rec->rm_startblock);
error = xbitmap_set(ra->freesp, fsb, rec->rm_blockcount);
error = xagb_bitmap_set(ra->freesp, rec->rm_startblock,
rec->rm_blockcount);
if (error)
return error;
}
return xbitmap_set_btcur_path(&ra->agmetablocks, cur);
return xagb_bitmap_set_btcur_path(&ra->agmetablocks, cur);
}
/* Strike out the blocks that are cross-linked according to the rmapbt. */
......@@ -492,12 +499,10 @@ xrep_agfl_check_extent(
void *priv)
{
struct xrep_agfl *ra = priv;
xfs_agblock_t agbno = XFS_FSB_TO_AGBNO(ra->sc->mp, start);
xfs_agblock_t agbno = start;
xfs_agblock_t last_agbno = agbno + len - 1;
int error;
ASSERT(XFS_FSB_TO_AGNO(ra->sc->mp, start) == ra->sc->sa.pag->pag_agno);
while (agbno <= last_agbno) {
bool other_owners;
......@@ -507,7 +512,7 @@ xrep_agfl_check_extent(
return error;
if (other_owners) {
error = xbitmap_set(&ra->crossed, agbno, 1);
error = xagb_bitmap_set(&ra->crossed, agbno, 1);
if (error)
return error;
}
......@@ -533,7 +538,7 @@ STATIC int
xrep_agfl_collect_blocks(
struct xfs_scrub *sc,
struct xfs_buf *agf_bp,
struct xbitmap *agfl_extents,
struct xagb_bitmap *agfl_extents,
xfs_agblock_t *flcount)
{
struct xrep_agfl ra;
......@@ -543,8 +548,8 @@ xrep_agfl_collect_blocks(
ra.sc = sc;
ra.freesp = agfl_extents;
xbitmap_init(&ra.agmetablocks);
xbitmap_init(&ra.crossed);
xagb_bitmap_init(&ra.agmetablocks);
xagb_bitmap_init(&ra.crossed);
/* Find all space used by the free space btrees & rmapbt. */
cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.pag);
......@@ -556,7 +561,7 @@ xrep_agfl_collect_blocks(
/* Find all blocks currently being used by the bnobt. */
cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp,
sc->sa.pag, XFS_BTNUM_BNO);
error = xbitmap_set_btblocks(&ra.agmetablocks, cur);
error = xagb_bitmap_set_btblocks(&ra.agmetablocks, cur);
xfs_btree_del_cursor(cur, error);
if (error)
goto out_bmp;
......@@ -564,7 +569,7 @@ xrep_agfl_collect_blocks(
/* Find all blocks currently being used by the cntbt. */
cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp,
sc->sa.pag, XFS_BTNUM_CNT);
error = xbitmap_set_btblocks(&ra.agmetablocks, cur);
error = xagb_bitmap_set_btblocks(&ra.agmetablocks, cur);
xfs_btree_del_cursor(cur, error);
if (error)
goto out_bmp;
......@@ -573,17 +578,17 @@ xrep_agfl_collect_blocks(
* Drop the freesp meta blocks that are in use by btrees.
* The remaining blocks /should/ be AGFL blocks.
*/
error = xbitmap_disunion(agfl_extents, &ra.agmetablocks);
error = xagb_bitmap_disunion(agfl_extents, &ra.agmetablocks);
if (error)
goto out_bmp;
/* Strike out the blocks that are cross-linked. */
ra.rmap_cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.pag);
error = xbitmap_walk(agfl_extents, xrep_agfl_check_extent, &ra);
error = xagb_bitmap_walk(agfl_extents, xrep_agfl_check_extent, &ra);
xfs_btree_del_cursor(ra.rmap_cur, error);
if (error)
goto out_bmp;
error = xbitmap_disunion(agfl_extents, &ra.crossed);
error = xagb_bitmap_disunion(agfl_extents, &ra.crossed);
if (error)
goto out_bmp;
......@@ -591,12 +596,12 @@ xrep_agfl_collect_blocks(
* Calculate the new AGFL size. If we found more blocks than fit in
* the AGFL we'll free them later.
*/
*flcount = min_t(uint64_t, xbitmap_hweight(agfl_extents),
*flcount = min_t(uint64_t, xagb_bitmap_hweight(agfl_extents),
xfs_agfl_size(mp));
out_bmp:
xbitmap_destroy(&ra.crossed);
xbitmap_destroy(&ra.agmetablocks);
xagb_bitmap_destroy(&ra.crossed);
xagb_bitmap_destroy(&ra.agmetablocks);
return error;
}
......@@ -615,18 +620,24 @@ xrep_agfl_update_agf(
xfs_force_summary_recalc(sc->mp);
/* Update the AGF counters. */
if (xfs_perag_initialised_agf(sc->sa.pag))
if (xfs_perag_initialised_agf(sc->sa.pag)) {
sc->sa.pag->pagf_flcount = flcount;
clear_bit(XFS_AGSTATE_AGFL_NEEDS_RESET,
&sc->sa.pag->pag_opstate);
}
agf->agf_flfirst = cpu_to_be32(0);
agf->agf_flcount = cpu_to_be32(flcount);
agf->agf_fllast = cpu_to_be32(flcount - 1);
if (flcount)
agf->agf_fllast = cpu_to_be32(flcount - 1);
else
agf->agf_fllast = cpu_to_be32(xfs_agfl_size(sc->mp) - 1);
xfs_alloc_log_agf(sc->tp, agf_bp,
XFS_AGF_FLFIRST | XFS_AGF_FLLAST | XFS_AGF_FLCOUNT);
}
struct xrep_agfl_fill {
struct xbitmap used_extents;
struct xagb_bitmap used_extents;
struct xfs_scrub *sc;
__be32 *agfl_bno;
xfs_agblock_t flcount;
......@@ -642,17 +653,15 @@ xrep_agfl_fill(
{
struct xrep_agfl_fill *af = priv;
struct xfs_scrub *sc = af->sc;
xfs_fsblock_t fsbno = start;
xfs_agblock_t agbno = start;
int error;
while (fsbno < start + len && af->fl_off < af->flcount)
af->agfl_bno[af->fl_off++] =
cpu_to_be32(XFS_FSB_TO_AGBNO(sc->mp, fsbno++));
trace_xrep_agfl_insert(sc->sa.pag, agbno, len);
trace_xrep_agfl_insert(sc->mp, sc->sa.pag->pag_agno,
XFS_FSB_TO_AGBNO(sc->mp, start), len);
while (agbno < start + len && af->fl_off < af->flcount)
af->agfl_bno[af->fl_off++] = cpu_to_be32(agbno++);
error = xbitmap_set(&af->used_extents, start, fsbno - 1);
error = xagb_bitmap_set(&af->used_extents, start, agbno - 1);
if (error)
return error;
......@@ -667,7 +676,7 @@ STATIC int
xrep_agfl_init_header(
struct xfs_scrub *sc,
struct xfs_buf *agfl_bp,
struct xbitmap *agfl_extents,
struct xagb_bitmap *agfl_extents,
xfs_agblock_t flcount)
{
struct xrep_agfl_fill af = {
......@@ -695,17 +704,17 @@ xrep_agfl_init_header(
* blocks than fit in the AGFL, they will be freed in a subsequent
* step.
*/
xbitmap_init(&af.used_extents);
xagb_bitmap_init(&af.used_extents);
af.agfl_bno = xfs_buf_to_agfl_bno(agfl_bp),
xbitmap_walk(agfl_extents, xrep_agfl_fill, &af);
error = xbitmap_disunion(agfl_extents, &af.used_extents);
xagb_bitmap_walk(agfl_extents, xrep_agfl_fill, &af);
error = xagb_bitmap_disunion(agfl_extents, &af.used_extents);
if (error)
return error;
/* Write new AGFL to disk. */
xfs_trans_buf_set_type(sc->tp, agfl_bp, XFS_BLFT_AGFL_BUF);
xfs_trans_log_buf(sc->tp, agfl_bp, 0, BBTOB(agfl_bp->b_length) - 1);
xbitmap_destroy(&af.used_extents);
xagb_bitmap_destroy(&af.used_extents);
return 0;
}
......@@ -714,7 +723,7 @@ int
xrep_agfl(
struct xfs_scrub *sc)
{
struct xbitmap agfl_extents;
struct xagb_bitmap agfl_extents;
struct xfs_mount *mp = sc->mp;
struct xfs_buf *agf_bp;
struct xfs_buf *agfl_bp;
......@@ -725,7 +734,7 @@ xrep_agfl(
if (!xfs_has_rmapbt(mp))
return -EOPNOTSUPP;
xbitmap_init(&agfl_extents);
xagb_bitmap_init(&agfl_extents);
/*
* Read the AGF so that we can query the rmapbt. We hope that there's
......@@ -753,6 +762,10 @@ xrep_agfl(
if (error)
goto err;
/* Last chance to abort before we start committing fixes. */
if (xchk_should_terminate(sc, &error))
goto err;
/*
* Update AGF and AGFL. We reset the global free block counter when
* we adjust the AGF flcount (which can fail) so avoid updating any
......@@ -774,10 +787,10 @@ xrep_agfl(
goto err;
/* Dump any AGFL overflow. */
error = xrep_reap_extents(sc, &agfl_extents, &XFS_RMAP_OINFO_AG,
error = xrep_reap_agblocks(sc, &agfl_extents, &XFS_RMAP_OINFO_AG,
XFS_AG_RESV_AGFL);
err:
xbitmap_destroy(&agfl_extents);
xagb_bitmap_destroy(&agfl_extents);
return error;
}
......@@ -1000,6 +1013,10 @@ xrep_agi(
if (error)
return error;
/* Last chance to abort before we start committing fixes. */
if (xchk_should_terminate(sc, &error))
return error;
/* Start rewriting the header and implant the btrees we found. */
xrep_agi_init_header(sc, agi_bp, &old_agi);
xrep_agi_set_roots(sc, agi, fab);
......
......@@ -301,21 +301,15 @@ xagb_bitmap_set_btblocks(
* blocks going from the leaf towards the root.
*/
int
xbitmap_set_btcur_path(
struct xbitmap *bitmap,
xagb_bitmap_set_btcur_path(
struct xagb_bitmap *bitmap,
struct xfs_btree_cur *cur)
{
struct xfs_buf *bp;
xfs_fsblock_t fsb;
int i;
int error;
for (i = 0; i < cur->bc_nlevels && cur->bc_levels[i].ptr == 1; i++) {
xfs_btree_get_block(cur, i, &bp);
if (!bp)
continue;
fsb = XFS_DADDR_TO_FSB(cur->bc_mp, xfs_buf_daddr(bp));
error = xbitmap_set(bitmap, fsb, 1);
error = xagb_bitmap_visit_btblock(cur, i, bitmap);
if (error)
return error;
}
......@@ -323,35 +317,6 @@ xbitmap_set_btcur_path(
return 0;
}
/* Collect a btree's block in the bitmap. */
STATIC int
xbitmap_collect_btblock(
struct xfs_btree_cur *cur,
int level,
void *priv)
{
struct xbitmap *bitmap = priv;
struct xfs_buf *bp;
xfs_fsblock_t fsbno;
xfs_btree_get_block(cur, level, &bp);
if (!bp)
return 0;
fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, xfs_buf_daddr(bp));
return xbitmap_set(bitmap, fsbno, 1);
}
/* Walk the btree and mark the bitmap wherever a btree block is found. */
int
xbitmap_set_btblocks(
struct xbitmap *bitmap,
struct xfs_btree_cur *cur)
{
return xfs_btree_visit_blocks(cur, xbitmap_collect_btblock,
XFS_BTREE_VISIT_ALL, bitmap);
}
/* How many bits are set in this bitmap? */
uint64_t
xbitmap_hweight(
......@@ -385,43 +350,6 @@ xbitmap_walk(
return error;
}
struct xbitmap_walk_bits {
xbitmap_walk_bits_fn fn;
void *priv;
};
/* Walk all the bits in a run. */
static int
xbitmap_walk_bits_in_run(
uint64_t start,
uint64_t len,
void *priv)
{
struct xbitmap_walk_bits *wb = priv;
uint64_t i;
int error = 0;
for (i = start; i < start + len; i++) {
error = wb->fn(i, wb->priv);
if (error)
break;
}
return error;
}
/* Call a function for every set bit in this bitmap. */
int
xbitmap_walk_bits(
struct xbitmap *bitmap,
xbitmap_walk_bits_fn fn,
void *priv)
{
struct xbitmap_walk_bits wb = {.fn = fn, .priv = priv};
return xbitmap_walk(bitmap, xbitmap_walk_bits_in_run, &wb);
}
/* Does this bitmap have no bits set at all? */
bool
xbitmap_empty(
......
......@@ -16,10 +16,6 @@ void xbitmap_destroy(struct xbitmap *bitmap);
int xbitmap_clear(struct xbitmap *bitmap, uint64_t start, uint64_t len);
int xbitmap_set(struct xbitmap *bitmap, uint64_t start, uint64_t len);
int xbitmap_disunion(struct xbitmap *bitmap, struct xbitmap *sub);
int xbitmap_set_btcur_path(struct xbitmap *bitmap,
struct xfs_btree_cur *cur);
int xbitmap_set_btblocks(struct xbitmap *bitmap,
struct xfs_btree_cur *cur);
uint64_t xbitmap_hweight(struct xbitmap *bitmap);
/*
......@@ -33,10 +29,6 @@ typedef int (*xbitmap_walk_fn)(uint64_t start, uint64_t len, void *priv);
int xbitmap_walk(struct xbitmap *bitmap, xbitmap_walk_fn fn,
void *priv);
typedef int (*xbitmap_walk_bits_fn)(uint64_t bit, void *priv);
int xbitmap_walk_bits(struct xbitmap *bitmap, xbitmap_walk_bits_fn fn,
void *priv);
bool xbitmap_empty(struct xbitmap *bitmap);
bool xbitmap_test(struct xbitmap *bitmap, uint64_t start, uint64_t *len);
......@@ -110,5 +102,7 @@ static inline int xagb_bitmap_walk(struct xagb_bitmap *bitmap,
int xagb_bitmap_set_btblocks(struct xagb_bitmap *bitmap,
struct xfs_btree_cur *cur);
int xagb_bitmap_set_btcur_path(struct xagb_bitmap *bitmap,
struct xfs_btree_cur *cur);
#endif /* __XFS_SCRUB_BITMAP_H__ */
......@@ -38,8 +38,7 @@ xchk_setup_inode_bmap(
if (error)
goto out;
sc->ilock_flags = XFS_IOLOCK_EXCL;
xfs_ilock(sc->ip, XFS_IOLOCK_EXCL);
xchk_ilock(sc, XFS_IOLOCK_EXCL);
/*
* We don't want any ephemeral data/cow fork updates sitting around
......@@ -50,8 +49,7 @@ xchk_setup_inode_bmap(
sc->sm->sm_type != XFS_SCRUB_TYPE_BMBTA) {
struct address_space *mapping = VFS_I(sc->ip)->i_mapping;
sc->ilock_flags |= XFS_MMAPLOCK_EXCL;
xfs_ilock(sc->ip, XFS_MMAPLOCK_EXCL);
xchk_ilock(sc, XFS_MMAPLOCK_EXCL);
inode_dio_wait(VFS_I(sc->ip));
......@@ -79,9 +77,8 @@ xchk_setup_inode_bmap(
error = xchk_trans_alloc(sc, 0);
if (error)
goto out;
sc->ilock_flags |= XFS_ILOCK_EXCL;
xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
xchk_ilock(sc, XFS_ILOCK_EXCL);
out:
/* scrub teardown will unlock and release the inode */
return error;
......@@ -844,7 +841,7 @@ xchk_bmap(
/* Non-existent forks can be ignored. */
if (!ifp)
goto out;
return -ENOENT;
info.is_rt = whichfork == XFS_DATA_FORK && XFS_IS_REALTIME_INODE(ip);
info.whichfork = whichfork;
......@@ -853,10 +850,10 @@ xchk_bmap(
switch (whichfork) {
case XFS_COW_FORK:
/* No CoW forks on non-reflink inodes/filesystems. */
if (!xfs_is_reflink_inode(ip)) {
/* No CoW forks on non-reflink filesystems. */
if (!xfs_has_reflink(mp)) {
xchk_ino_set_corrupt(sc, sc->ip->i_ino);
goto out;
return 0;
}
break;
case XFS_ATTR_FORK:
......@@ -876,31 +873,31 @@ xchk_bmap(
/* No mappings to check. */
if (whichfork == XFS_COW_FORK)
xchk_fblock_set_corrupt(sc, whichfork, 0);
goto out;
return 0;
case XFS_DINODE_FMT_EXTENTS:
break;
case XFS_DINODE_FMT_BTREE:
if (whichfork == XFS_COW_FORK) {
xchk_fblock_set_corrupt(sc, whichfork, 0);
goto out;
return 0;
}
error = xchk_bmap_btree(sc, whichfork, &info);
if (error)
goto out;
return error;
break;
default:
xchk_fblock_set_corrupt(sc, whichfork, 0);
goto out;
return 0;
}
if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
goto out;
return 0;
/* Find the offset of the last extent in the mapping. */
error = xfs_bmap_last_offset(ip, &endoff, whichfork);
if (!xchk_fblock_process_error(sc, whichfork, 0, &error))
goto out;
return error;
/*
* Scrub extent records. We use a special iterator function here that
......@@ -913,12 +910,12 @@ xchk_bmap(
while (xchk_bmap_iext_iter(&info, &irec)) {
if (xchk_should_terminate(sc, &error) ||
(sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT))
goto out;
return 0;
if (irec.br_startoff >= endoff) {
xchk_fblock_set_corrupt(sc, whichfork,
irec.br_startoff);
goto out;
return 0;
}
if (isnullstartblock(irec.br_startblock))
......@@ -931,10 +928,10 @@ xchk_bmap(
if (xchk_bmap_want_check_rmaps(&info)) {
error = xchk_bmap_check_rmaps(sc, whichfork);
if (!xchk_fblock_xref_process_error(sc, whichfork, 0, &error))
goto out;
return error;
}
out:
return error;
return 0;
}
/* Scrub an inode's data fork. */
......@@ -958,8 +955,5 @@ int
xchk_bmap_cow(
struct xfs_scrub *sc)
{
if (!xfs_is_reflink_inode(sc->ip))
return -ENOENT;
return xchk_bmap(sc, XFS_COW_FORK);
}
......@@ -831,6 +831,25 @@ xchk_install_handle_inode(
return 0;
}
/*
* Install an already-referenced inode for scrubbing. Get our own reference to
* the inode to make disposal simpler. The inode must not be in I_FREEING or
* I_WILL_FREE state!
*/
int
xchk_install_live_inode(
struct xfs_scrub *sc,
struct xfs_inode *ip)
{
if (!igrab(VFS_I(ip))) {
xchk_ino_set_corrupt(sc, ip->i_ino);
return -EFSCORRUPTED;
}
sc->ip = ip;
return 0;
}
/*
* In preparation to scrub metadata structures that hang off of an inode,
* grab either the inode referenced in the scrub control structure or the
......@@ -854,10 +873,8 @@ xchk_iget_for_scrubbing(
ASSERT(sc->tp == NULL);
/* We want to scan the inode we already had opened. */
if (sc->sm->sm_ino == 0 || sc->sm->sm_ino == ip_in->i_ino) {
sc->ip = ip_in;
return 0;
}
if (sc->sm->sm_ino == 0 || sc->sm->sm_ino == ip_in->i_ino)
return xchk_install_live_inode(sc, ip_in);
/* Reject internal metadata files and obviously bad inode numbers. */
if (xfs_internal_inum(mp, sc->sm->sm_ino))
......@@ -1005,20 +1022,48 @@ xchk_setup_inode_contents(
return error;
/* Lock the inode so the VFS cannot touch this file. */
sc->ilock_flags = XFS_IOLOCK_EXCL;
xfs_ilock(sc->ip, sc->ilock_flags);
xchk_ilock(sc, XFS_IOLOCK_EXCL);
error = xchk_trans_alloc(sc, resblks);
if (error)
goto out;
sc->ilock_flags |= XFS_ILOCK_EXCL;
xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
xchk_ilock(sc, XFS_ILOCK_EXCL);
out:
/* scrub teardown will unlock and release the inode for us */
return error;
}
void
xchk_ilock(
struct xfs_scrub *sc,
unsigned int ilock_flags)
{
xfs_ilock(sc->ip, ilock_flags);
sc->ilock_flags |= ilock_flags;
}
bool
xchk_ilock_nowait(
struct xfs_scrub *sc,
unsigned int ilock_flags)
{
if (xfs_ilock_nowait(sc->ip, ilock_flags)) {
sc->ilock_flags |= ilock_flags;
return true;
}
return false;
}
void
xchk_iunlock(
struct xfs_scrub *sc,
unsigned int ilock_flags)
{
sc->ilock_flags &= ~ilock_flags;
xfs_iunlock(sc->ip, ilock_flags);
}
/*
* Predicate that decides if we need to evaluate the cross-reference check.
* If there was an error accessing the cross-reference btree, just delete
......@@ -1185,3 +1230,155 @@ xchk_fsgates_enable(
sc->flags |= scrub_fsgates;
}
/*
 * Decide if this is a cached inode that's also allocated. The caller
* must hold a reference to an AG and the AGI buffer lock to prevent inodes
* from being allocated or freed.
*
* Look up an inode by number in the given file system. If the inode number
* is invalid, return -EINVAL. If the inode is not in cache, return -ENODATA.
* If the inode is being reclaimed, return -ENODATA because we know the inode
* cache cannot be updating the ondisk metadata.
*
* Otherwise, the incore inode is the one we want, and it is either live,
* somewhere in the inactivation machinery, or reclaimable. The inode is
* allocated if i_mode is nonzero. In all three cases, the cached inode will
* be more up to date than the ondisk inode buffer, so we must use the incore
* i_mode.
*/
int
xchk_inode_is_allocated(
struct xfs_scrub *sc,
xfs_agino_t agino,
bool *inuse)
{
struct xfs_mount *mp = sc->mp;
struct xfs_perag *pag = sc->sa.pag;
xfs_ino_t ino;
struct xfs_inode *ip;
int error;
/* caller must hold perag reference */
if (pag == NULL) {
ASSERT(pag != NULL);
return -EINVAL;
}
/* caller must have AGI buffer */
if (sc->sa.agi_bp == NULL) {
ASSERT(sc->sa.agi_bp != NULL);
return -EINVAL;
}
/* reject inode numbers outside existing AGs */
ino = XFS_AGINO_TO_INO(sc->mp, pag->pag_agno, agino);
if (!xfs_verify_ino(mp, ino))
return -EINVAL;
error = -ENODATA;
rcu_read_lock();
ip = radix_tree_lookup(&pag->pag_ici_root, agino);
if (!ip) {
/* cache miss */
goto out_rcu;
}
/*
* If the inode number doesn't match, the incore inode got reused
* during an RCU grace period and the radix tree hasn't been updated.
* This isn't the inode we want.
*/
spin_lock(&ip->i_flags_lock);
if (ip->i_ino != ino)
goto out_skip;
trace_xchk_inode_is_allocated(ip);
/*
* We have an incore inode that matches the inode we want, and the
* caller holds the perag structure and the AGI buffer. Let's check
* our assumptions below:
*/
#ifdef DEBUG
/*
* (1) If the incore inode is live (i.e. referenced from the dcache),
* it will not be INEW, nor will it be in the inactivation or reclaim
* machinery. The ondisk inode had better be allocated. This is the
* most trivial case.
*/
if (!(ip->i_flags & (XFS_NEED_INACTIVE | XFS_INEW | XFS_IRECLAIMABLE |
XFS_INACTIVATING))) {
/* live inode */
ASSERT(VFS_I(ip)->i_mode != 0);
}
/*
* If the incore inode is INEW, there are several possibilities:
*
* (2) For a file that is being created, note that we allocate the
* ondisk inode before allocating, initializing, and adding the incore
* inode to the radix tree.
*
* (3) If the incore inode is being recycled, the inode has to be
* allocated because we don't allow freed inodes to be recycled.
* Recycling doesn't touch i_mode.
*/
if (ip->i_flags & XFS_INEW) {
/* created on disk already or recycling */
ASSERT(VFS_I(ip)->i_mode != 0);
}
/*
* (4) If the inode is queued for inactivation (NEED_INACTIVE) but
* inactivation has not started (!INACTIVATING), it is still allocated.
*/
if ((ip->i_flags & XFS_NEED_INACTIVE) &&
!(ip->i_flags & XFS_INACTIVATING)) {
/* definitely before difree */
ASSERT(VFS_I(ip)->i_mode != 0);
}
#endif
/*
* If the incore inode is undergoing inactivation (INACTIVATING), there
* are two possibilities:
*
* (5) It is before the point where it would get freed ondisk, in which
* case i_mode is still nonzero.
*
* (6) It has already been freed, in which case i_mode is zero.
*
* We don't take the ILOCK here, but difree and dialloc update the AGI,
* and we've taken the AGI buffer lock, which prevents that from
* happening.
*/
/*
* (7) Inodes undergoing inactivation (INACTIVATING) or queued for
* reclaim (IRECLAIMABLE) could be allocated or free. i_mode still
* reflects the ondisk state.
*/
/*
* (8) If the inode is in IFLUSHING, it's safe to query i_mode because
* the flush code uses i_mode to format the ondisk inode.
*/
/*
* (9) If the inode is in IRECLAIM and was reachable via the radix
* tree, it still has the same i_mode as it did before it entered
* reclaim. The inode object is still alive because we hold the RCU
* read lock.
*/
*inuse = VFS_I(ip)->i_mode != 0;
error = 0;
out_skip:
spin_unlock(&ip->i_flags_lock);
out_rcu:
rcu_read_unlock();
return error;
}
......@@ -88,10 +88,16 @@ int xchk_setup_xattr(struct xfs_scrub *sc);
int xchk_setup_symlink(struct xfs_scrub *sc);
int xchk_setup_parent(struct xfs_scrub *sc);
#ifdef CONFIG_XFS_RT
int xchk_setup_rt(struct xfs_scrub *sc);
int xchk_setup_rtbitmap(struct xfs_scrub *sc);
int xchk_setup_rtsummary(struct xfs_scrub *sc);
#else
static inline int
xchk_setup_rt(struct xfs_scrub *sc)
xchk_setup_rtbitmap(struct xfs_scrub *sc)
{
return -ENOENT;
}
static inline int
xchk_setup_rtsummary(struct xfs_scrub *sc)
{
return -ENOENT;
}
......@@ -137,6 +143,12 @@ int xchk_count_rmap_ownedby_ag(struct xfs_scrub *sc, struct xfs_btree_cur *cur,
int xchk_setup_ag_btree(struct xfs_scrub *sc, bool force_log);
int xchk_iget_for_scrubbing(struct xfs_scrub *sc);
int xchk_setup_inode_contents(struct xfs_scrub *sc, unsigned int resblks);
int xchk_install_live_inode(struct xfs_scrub *sc, struct xfs_inode *ip);
void xchk_ilock(struct xfs_scrub *sc, unsigned int ilock_flags);
bool xchk_ilock_nowait(struct xfs_scrub *sc, unsigned int ilock_flags);
void xchk_iunlock(struct xfs_scrub *sc, unsigned int ilock_flags);
void xchk_buffer_recheck(struct xfs_scrub *sc, struct xfs_buf *bp);
int xchk_iget(struct xfs_scrub *sc, xfs_ino_t inum, struct xfs_inode **ipp);
......@@ -155,8 +167,28 @@ static inline bool xchk_skip_xref(struct xfs_scrub_metadata *sm)
XFS_SCRUB_OFLAG_XCORRUPT);
}
#ifdef CONFIG_XFS_ONLINE_REPAIR
/* Decide if a repair is required. */
static inline bool xchk_needs_repair(const struct xfs_scrub_metadata *sm)
{
return sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
XFS_SCRUB_OFLAG_XCORRUPT |
XFS_SCRUB_OFLAG_PREEN);
}
#else
# define xchk_needs_repair(sc) (false)
#endif /* CONFIG_XFS_ONLINE_REPAIR */
int xchk_metadata_inode_forks(struct xfs_scrub *sc);
/*
* Helper macros to allocate and format xfile description strings.
* Callers must kfree the pointer returned.
*/
#define xchk_xfile_descr(sc, fmt, ...) \
kasprintf(XCHK_GFP_FLAGS, "XFS (%s): " fmt, \
(sc)->mp->m_super->s_id, ##__VA_ARGS__)
/*
* Setting up a hook to wait for intents to drain is costly -- we have to take
* the CPU hotplug lock and force an i-cache flush on all CPUs once to set it
......@@ -171,4 +203,7 @@ static inline bool xchk_need_intent_drain(struct xfs_scrub *sc)
void xchk_fsgates_enable(struct xfs_scrub *sc, unsigned int scrub_fshooks);
int xchk_inode_is_allocated(struct xfs_scrub *sc, xfs_agino_t agino,
bool *inuse);
#endif /* __XFS_SCRUB_COMMON_H__ */
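For context, the xchk_xfile_descr() macro declared earlier in this header is
used by the new rtsummary scrubber later in this series roughly like so
(condensed from xchk_setup_rtsummary(); error handling elided):

	char *descr;
	int error;

	/* Label the xfile so the pageable memory has a recognizable name. */
	descr = xchk_xfile_descr(sc, "realtime summary file");
	error = xfile_create(descr, mp->m_rsumsize, &sc->xfile);
	kfree(descr);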
......@@ -226,6 +226,16 @@ xchk_ag_btree_healthy_enough(
return true;
}
/*
* If we just repaired some AG metadata, sc->sick_mask will reflect all
* the per-AG metadata types that were repaired. Exclude these from
* the filesystem health query because we have not yet updated the
* health status and we want everything to be scanned.
*/
if ((sc->flags & XREP_ALREADY_FIXED) &&
type_to_health_flag[sc->sm->sm_type].group == XHG_AG)
mask &= ~sc->sick_mask;
if (xfs_ag_has_sickness(pag, mask)) {
sc->sm->sm_flags |= XFS_SCRUB_OFLAG_XFAIL;
return false;
......
......@@ -328,8 +328,7 @@ xchk_iallocbt_check_cluster_ifree(
goto out;
}
error = xfs_icache_inode_is_allocated(mp, bs->cur->bc_tp, fsino,
&ino_inuse);
error = xchk_inode_is_allocated(bs->sc, agino, &ino_inuse);
if (error == -ENODATA) {
/* Not cached, just read the disk buffer */
freemask_ok = irec_free ^ !!(dip->di_mode);
......
......@@ -32,15 +32,13 @@ xchk_prepare_iscrub(
{
int error;
sc->ilock_flags = XFS_IOLOCK_EXCL;
xfs_ilock(sc->ip, sc->ilock_flags);
xchk_ilock(sc, XFS_IOLOCK_EXCL);
error = xchk_trans_alloc(sc, 0);
if (error)
return error;
sc->ilock_flags |= XFS_ILOCK_EXCL;
xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
xchk_ilock(sc, XFS_ILOCK_EXCL);
return 0;
}
......@@ -83,7 +81,10 @@ xchk_setup_inode(
/* We want to scan the opened inode, so lock it and exit. */
if (sc->sm->sm_ino == 0 || sc->sm->sm_ino == ip_in->i_ino) {
sc->ip = ip_in;
error = xchk_install_live_inode(sc, ip_in);
if (error)
return error;
return xchk_prepare_iscrub(sc);
}
......
......@@ -150,8 +150,8 @@ xchk_parent_validate(
lock_mode = xchk_parent_ilock_dir(dp);
if (!lock_mode) {
xfs_iunlock(sc->ip, XFS_ILOCK_EXCL);
xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
xchk_iunlock(sc, XFS_ILOCK_EXCL);
xchk_ilock(sc, XFS_ILOCK_EXCL);
error = -EAGAIN;
goto out_rele;
}
......
......@@ -59,9 +59,12 @@ xchk_setup_quota(
error = xchk_setup_fs(sc);
if (error)
return error;
sc->ip = xfs_quota_inode(sc->mp, dqtype);
xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
sc->ilock_flags = XFS_ILOCK_EXCL;
error = xchk_install_live_inode(sc, xfs_quota_inode(sc->mp, dqtype));
if (error)
return error;
xchk_ilock(sc, XFS_ILOCK_EXCL);
return 0;
}
......@@ -235,13 +238,11 @@ xchk_quota(
* data fork we have to drop ILOCK_EXCL to use the regular dquot
* functions.
*/
xfs_iunlock(sc->ip, sc->ilock_flags);
sc->ilock_flags = 0;
xchk_iunlock(sc, sc->ilock_flags);
sqi.sc = sc;
sqi.last_id = 0;
error = xfs_qm_dqiterate(mp, dqtype, xchk_quota_item, &sqi);
sc->ilock_flags = XFS_ILOCK_EXCL;
xfs_ilock(sc->ip, sc->ilock_flags);
xchk_ilock(sc, XFS_ILOCK_EXCL);
if (error == -ECANCELED)
error = 0;
if (!xchk_fblock_process_error(sc, XFS_DATA_FORK,
......
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Copyright (C) 2022-2023 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <djwong@kernel.org>
*/
#ifndef __XFS_SCRUB_REAP_H__
#define __XFS_SCRUB_REAP_H__
int xrep_reap_agblocks(struct xfs_scrub *sc, struct xagb_bitmap *bitmap,
const struct xfs_owner_info *oinfo, enum xfs_ag_resv_type type);
#endif /* __XFS_SCRUB_REAP_H__ */
......@@ -8,6 +8,8 @@
#include "xfs_quota_defs.h"
struct xchk_stats_run;
static inline int xrep_notsupported(struct xfs_scrub *sc)
{
return -EOPNOTSUPP;
......@@ -15,28 +17,28 @@ static inline int xrep_notsupported(struct xfs_scrub *sc)
#ifdef CONFIG_XFS_ONLINE_REPAIR
/*
* This is the maximum number of deferred extent freeing item extents (EFIs)
* that we'll attach to a transaction without rolling the transaction to avoid
* overrunning a tr_itruncate reservation.
*/
#define XREP_MAX_ITRUNCATE_EFIS (128)
/* Repair helpers */
int xrep_attempt(struct xfs_scrub *sc);
int xrep_attempt(struct xfs_scrub *sc, struct xchk_stats_run *run);
void xrep_failure(struct xfs_mount *mp);
int xrep_roll_ag_trans(struct xfs_scrub *sc);
int xrep_defer_finish(struct xfs_scrub *sc);
bool xrep_ag_has_space(struct xfs_perag *pag, xfs_extlen_t nr_blocks,
enum xfs_ag_resv_type type);
xfs_extlen_t xrep_calc_ag_resblks(struct xfs_scrub *sc);
int xrep_alloc_ag_block(struct xfs_scrub *sc,
const struct xfs_owner_info *oinfo, xfs_fsblock_t *fsbno,
enum xfs_ag_resv_type resv);
int xrep_init_btblock(struct xfs_scrub *sc, xfs_fsblock_t fsb,
struct xfs_buf **bpp, xfs_btnum_t btnum,
const struct xfs_buf_ops *ops);
struct xbitmap;
struct xagb_bitmap;
int xrep_fix_freelist(struct xfs_scrub *sc, bool can_shrink);
int xrep_invalidate_blocks(struct xfs_scrub *sc, struct xbitmap *btlist);
int xrep_reap_extents(struct xfs_scrub *sc, struct xbitmap *exlist,
const struct xfs_owner_info *oinfo, enum xfs_ag_resv_type type);
struct xrep_find_ag_btree {
/* in: rmap owner of the btree we're looking for */
......@@ -70,7 +72,8 @@ int xrep_agi(struct xfs_scrub *sc);
static inline int
xrep_attempt(
struct xfs_scrub *sc)
struct xfs_scrub *sc,
struct xchk_stats_run *run)
{
return -EOPNOTSUPP;
}
......
......@@ -19,19 +19,20 @@
/* Set us up with the realtime metadata locked. */
int
xchk_setup_rt(
xchk_setup_rtbitmap(
struct xfs_scrub *sc)
{
int error;
error = xchk_setup_fs(sc);
error = xchk_trans_alloc(sc, 0);
if (error)
return error;
sc->ilock_flags = XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP;
sc->ip = sc->mp->m_rbmip;
xfs_ilock(sc->ip, sc->ilock_flags);
error = xchk_install_live_inode(sc, sc->mp->m_rbmip);
if (error)
return error;
xchk_ilock(sc, XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP);
return 0;
}
......@@ -123,43 +124,6 @@ xchk_rtbitmap(
return error;
}
/* Scrub the realtime summary. */
int
xchk_rtsummary(
struct xfs_scrub *sc)
{
struct xfs_inode *rsumip = sc->mp->m_rsumip;
struct xfs_inode *old_ip = sc->ip;
uint old_ilock_flags = sc->ilock_flags;
int error = 0;
/*
* We ILOCK'd the rt bitmap ip in the setup routine, now lock the
* rt summary ip in compliance with the rt inode locking rules.
*
* Since we switch sc->ip to rsumip we have to save the old ilock
* flags so that we don't mix up the inode state that @sc tracks.
*/
sc->ip = rsumip;
sc->ilock_flags = XFS_ILOCK_EXCL | XFS_ILOCK_RTSUM;
xfs_ilock(sc->ip, sc->ilock_flags);
/* Invoke the fork scrubber. */
error = xchk_metadata_inode_forks(sc);
if (error || (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT))
goto out;
/* XXX: implement this some day */
xchk_set_incomplete(sc);
out:
/* Switch back to the rtbitmap inode and lock flags. */
xfs_iunlock(sc->ip, sc->ilock_flags);
sc->ilock_flags = old_ilock_flags;
sc->ip = old_ip;
return error;
}
/* xref check that the extent is not free in the rtbitmap */
void
xchk_xref_is_used_rt_space(
......
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Copyright (C) 2017-2023 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <djwong@kernel.org>
*/
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
#include "xfs_format.h"
#include "xfs_trans_resv.h"
#include "xfs_mount.h"
#include "xfs_btree.h"
#include "xfs_inode.h"
#include "xfs_log_format.h"
#include "xfs_trans.h"
#include "xfs_rtalloc.h"
#include "xfs_bit.h"
#include "xfs_bmap.h"
#include "scrub/scrub.h"
#include "scrub/common.h"
#include "scrub/trace.h"
#include "scrub/xfile.h"
/*
* Realtime Summary
* ================
*
* We check the realtime summary by scanning the realtime bitmap file to create
* a new summary file incore, and then we compare the computed version against
* the ondisk version. We use the 'xfile' functionality to store this
* (potentially large) amount of data in pageable memory.
*/
/* Set us up to check the rtsummary file. */
int
xchk_setup_rtsummary(
struct xfs_scrub *sc)
{
struct xfs_mount *mp = sc->mp;
char *descr;
int error;
/*
* Create an xfile to construct a new rtsummary file. The xfile allows
* us to avoid pinning kernel memory for this purpose.
*/
descr = xchk_xfile_descr(sc, "realtime summary file");
error = xfile_create(descr, mp->m_rsumsize, &sc->xfile);
kfree(descr);
if (error)
return error;
error = xchk_trans_alloc(sc, 0);
if (error)
return error;
/* Allocate a memory buffer for the summary comparison. */
sc->buf = kvmalloc(mp->m_sb.sb_blocksize, XCHK_GFP_FLAGS);
if (!sc->buf)
return -ENOMEM;
error = xchk_install_live_inode(sc, mp->m_rsumip);
if (error)
return error;
/*
* Locking order requires us to take the rtbitmap first. We must be
* careful to unlock it ourselves when we are done with the rtbitmap
* file since the scrub infrastructure won't do that for us. Only
 * then can we lock the rtsummary inode.
*/
xfs_ilock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP);
xchk_ilock(sc, XFS_ILOCK_EXCL | XFS_ILOCK_RTSUM);
return 0;
}
/* Helper functions to record suminfo words in an xfile. */
typedef unsigned int xchk_rtsumoff_t;
static inline int
xfsum_load(
struct xfs_scrub *sc,
xchk_rtsumoff_t sumoff,
xfs_suminfo_t *info)
{
return xfile_obj_load(sc->xfile, info, sizeof(xfs_suminfo_t),
sumoff << XFS_WORDLOG);
}
static inline int
xfsum_store(
struct xfs_scrub *sc,
xchk_rtsumoff_t sumoff,
const xfs_suminfo_t info)
{
return xfile_obj_store(sc->xfile, &info, sizeof(xfs_suminfo_t),
sumoff << XFS_WORDLOG);
}
static inline int
xfsum_copyout(
struct xfs_scrub *sc,
xchk_rtsumoff_t sumoff,
xfs_suminfo_t *info,
unsigned int nr_words)
{
return xfile_obj_load(sc->xfile, info, nr_words << XFS_WORDLOG,
sumoff << XFS_WORDLOG);
}
/* Update the summary file to reflect the free extent that we've accumulated. */
STATIC int
xchk_rtsum_record_free(
struct xfs_mount *mp,
struct xfs_trans *tp,
const struct xfs_rtalloc_rec *rec,
void *priv)
{
struct xfs_scrub *sc = priv;
xfs_fileoff_t rbmoff;
xfs_rtblock_t rtbno;
xfs_filblks_t rtlen;
xchk_rtsumoff_t offs;
unsigned int lenlog;
xfs_suminfo_t v = 0;
int error = 0;
if (xchk_should_terminate(sc, &error))
return error;
/* Compute the relevant location in the rtsum file. */
rbmoff = XFS_BITTOBLOCK(mp, rec->ar_startext);
lenlog = XFS_RTBLOCKLOG(rec->ar_extcount);
offs = XFS_SUMOFFS(mp, lenlog, rbmoff);
rtbno = rec->ar_startext * mp->m_sb.sb_rextsize;
rtlen = rec->ar_extcount * mp->m_sb.sb_rextsize;
if (!xfs_verify_rtext(mp, rtbno, rtlen)) {
xchk_ino_xref_set_corrupt(sc, mp->m_rbmip->i_ino);
return -EFSCORRUPTED;
}
/* Bump the summary count. */
error = xfsum_load(sc, offs, &v);
if (error)
return error;
v++;
trace_xchk_rtsum_record_free(mp, rec->ar_startext, rec->ar_extcount,
lenlog, offs, v);
return xfsum_store(sc, offs, v);
}
/* Compute the realtime summary from the realtime bitmap. */
STATIC int
xchk_rtsum_compute(
struct xfs_scrub *sc)
{
struct xfs_mount *mp = sc->mp;
unsigned long long rtbmp_bytes;
/* If the bitmap size doesn't match the computed size, bail. */
rtbmp_bytes = howmany_64(mp->m_sb.sb_rextents, NBBY);
if (roundup_64(rtbmp_bytes, mp->m_sb.sb_blocksize) !=
mp->m_rbmip->i_disk_size)
return -EFSCORRUPTED;
return xfs_rtalloc_query_all(sc->mp, sc->tp, xchk_rtsum_record_free,
sc);
}
/* Compare the rtsummary file against the one we computed. */
STATIC int
xchk_rtsum_compare(
struct xfs_scrub *sc)
{
struct xfs_mount *mp = sc->mp;
struct xfs_buf *bp;
struct xfs_bmbt_irec map;
xfs_fileoff_t off;
xchk_rtsumoff_t sumoff = 0;
int nmap;
for (off = 0; off < XFS_B_TO_FSB(mp, mp->m_rsumsize); off++) {
int error = 0;
if (xchk_should_terminate(sc, &error))
return error;
if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
return 0;
/* Make sure we have a written extent. */
nmap = 1;
error = xfs_bmapi_read(mp->m_rsumip, off, 1, &map, &nmap,
XFS_DATA_FORK);
if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, off, &error))
return error;
if (nmap != 1 || !xfs_bmap_is_written_extent(&map)) {
xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, off);
return 0;
}
/* Read a block's worth of ondisk rtsummary file. */
error = xfs_rtbuf_get(mp, sc->tp, off, 1, &bp);
if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, off, &error))
return error;
/* Read a block's worth of computed rtsummary file. */
error = xfsum_copyout(sc, sumoff, sc->buf, mp->m_blockwsize);
if (error) {
xfs_trans_brelse(sc->tp, bp);
return error;
}
if (memcmp(bp->b_addr, sc->buf,
mp->m_blockwsize << XFS_WORDLOG) != 0)
xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, off);
xfs_trans_brelse(sc->tp, bp);
sumoff += mp->m_blockwsize;
}
return 0;
}
/* Scrub the realtime summary. */
int
xchk_rtsummary(
struct xfs_scrub *sc)
{
struct xfs_mount *mp = sc->mp;
int error = 0;
/* Invoke the fork scrubber. */
error = xchk_metadata_inode_forks(sc);
if (error || (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT))
goto out_rbm;
/* Construct the new summary file from the rtbitmap. */
error = xchk_rtsum_compute(sc);
if (error == -EFSCORRUPTED) {
/*
* EFSCORRUPTED means the rtbitmap is corrupt, which is an xref
* error since we're checking the summary file.
*/
xchk_ino_xref_set_corrupt(sc, mp->m_rbmip->i_ino);
error = 0;
goto out_rbm;
}
if (error)
goto out_rbm;
/* Does the computed summary file match the actual rtsummary file? */
error = xchk_rtsum_compare(sc);
out_rbm:
/* Unlock the rtbitmap since we're done with it. */
xfs_iunlock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP);
return error;
}
......@@ -22,6 +22,8 @@
#include "scrub/trace.h"
#include "scrub/repair.h"
#include "scrub/health.h"
#include "scrub/stats.h"
#include "scrub/xfile.h"
/*
* Online Scrub and Repair
......@@ -166,8 +168,6 @@ xchk_teardown(
struct xfs_scrub *sc,
int error)
{
struct xfs_inode *ip_in = XFS_I(file_inode(sc->file));
xchk_ag_free(sc, &sc->sa);
if (sc->tp) {
if (error == 0 && (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR))
......@@ -178,16 +178,18 @@ xchk_teardown(
}
if (sc->ip) {
if (sc->ilock_flags)
xfs_iunlock(sc->ip, sc->ilock_flags);
if (sc->ip != ip_in &&
!xfs_internal_inum(sc->mp, sc->ip->i_ino))
xchk_irele(sc, sc->ip);
xchk_iunlock(sc, sc->ilock_flags);
xchk_irele(sc, sc->ip);
sc->ip = NULL;
}
if (sc->flags & XCHK_HAVE_FREEZE_PROT) {
sc->flags &= ~XCHK_HAVE_FREEZE_PROT;
mnt_drop_write_file(sc->file);
}
if (sc->xfile) {
xfile_destroy(sc->xfile);
sc->xfile = NULL;
}
if (sc->buf) {
if (sc->buf_cleanup)
sc->buf_cleanup(sc->buf);
......@@ -322,14 +324,14 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
},
[XFS_SCRUB_TYPE_RTBITMAP] = { /* realtime bitmap */
.type = ST_FS,
.setup = xchk_setup_rt,
.setup = xchk_setup_rtbitmap,
.scrub = xchk_rtbitmap,
.has = xfs_has_realtime,
.repair = xrep_notsupported,
},
[XFS_SCRUB_TYPE_RTSUM] = { /* realtime summary */
.type = ST_FS,
.setup = xchk_setup_rt,
.setup = xchk_setup_rtsummary,
.scrub = xchk_rtsummary,
.has = xfs_has_realtime,
.repair = xrep_notsupported,
......@@ -409,6 +411,11 @@ xchk_validate_inputs(
goto out;
}
/* No rebuild without repair. */
if ((sm->sm_flags & XFS_SCRUB_IFLAG_FORCE_REBUILD) &&
!(sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR))
return -EINVAL;
/*
* We only want to repair read-write v5+ filesystems. Defer the check
* for ops->repair until after our scrub confirms that we need to
......@@ -463,8 +470,10 @@ xfs_scrub_metadata(
struct file *file,
struct xfs_scrub_metadata *sm)
{
struct xchk_stats_run run = { };
struct xfs_scrub *sc;
struct xfs_mount *mp = XFS_I(file_inode(file))->i_mount;
u64 check_start;
int error = 0;
BUILD_BUG_ON(sizeof(meta_scrub_ops) !=
......@@ -521,7 +530,9 @@ xfs_scrub_metadata(
goto out_teardown;
/* Scrub for errors. */
check_start = xchk_stats_now();
error = sc->ops->scrub(sc);
run.scrub_ns += xchk_stats_elapsed_ns(check_start);
if (error == -EDEADLOCK && !(sc->flags & XCHK_TRY_HARDER))
goto try_harder;
if (error == -ECHRNG && !(sc->flags & XCHK_NEED_DRAIN))
......@@ -533,15 +544,16 @@ xfs_scrub_metadata(
if ((sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) &&
!(sc->flags & XREP_ALREADY_FIXED)) {
bool needs_fix;
bool needs_fix = xchk_needs_repair(sc->sm);
/* Userspace asked us to rebuild the structure regardless. */
if (sc->sm->sm_flags & XFS_SCRUB_IFLAG_FORCE_REBUILD)
needs_fix = true;
/* Let debug users force us into the repair routines. */
if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_FORCE_SCRUB_REPAIR))
sc->sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
if (XFS_TEST_ERROR(needs_fix, mp, XFS_ERRTAG_FORCE_SCRUB_REPAIR))
needs_fix = true;
needs_fix = (sc->sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
XFS_SCRUB_OFLAG_XCORRUPT |
XFS_SCRUB_OFLAG_PREEN));
/*
* If userspace asked for a repair but it wasn't necessary,
* report that back to userspace.
......@@ -555,7 +567,7 @@ xfs_scrub_metadata(
* If it's broken, userspace wants us to fix it, and we haven't
* already tried to fix it, then attempt a repair.
*/
error = xrep_attempt(sc);
error = xrep_attempt(sc, &run);
if (error == -EAGAIN) {
/*
* Either the repair function succeeded or it couldn't
......@@ -583,12 +595,15 @@ xfs_scrub_metadata(
sm->sm_flags |= XFS_SCRUB_OFLAG_CORRUPT;
error = 0;
}
if (error != -ENOENT)
xchk_stats_merge(mp, sm, &run);
return error;
need_drain:
error = xchk_teardown(sc, 0);
if (error)
goto out_sc;
sc->flags |= XCHK_NEED_DRAIN;
run.retries++;
goto retry_op;
try_harder:
/*
......@@ -600,5 +615,6 @@ xfs_scrub_metadata(
if (error)
goto out_sc;
sc->flags |= XCHK_TRY_HARDER;
run.retries++;
goto retry_op;
}
......@@ -88,6 +88,10 @@ struct xfs_scrub {
*/
void (*buf_cleanup)(void *buf);
/* xfile used by the scrubbers; freed at teardown. */
struct xfile *xfile;
/* Lock flags for @ip. */
uint ilock_flags;
/* See the XCHK/XREP state flags below. */
......
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Copyright (C) 2023 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <djwong@kernel.org>
*/
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
#include "xfs_format.h"
#include "xfs_trans_resv.h"
#include "xfs_mount.h"
#include "xfs_sysfs.h"
#include "xfs_btree.h"
#include "xfs_super.h"
#include "scrub/scrub.h"
#include "scrub/stats.h"
#include "scrub/trace.h"
struct xchk_scrub_stats {
/* all 32-bit counters here */
/* checking stats */
uint32_t invocations;
uint32_t clean;
uint32_t corrupt;
uint32_t preen;
uint32_t xfail;
uint32_t xcorrupt;
uint32_t incomplete;
uint32_t warning;
uint32_t retries;
/* repair stats */
uint32_t repair_invocations;
uint32_t repair_success;
/* all 64-bit items here */
/* runtimes */
uint64_t checktime_us;
uint64_t repairtime_us;
/* non-counter state must go at the end for clearall */
spinlock_t css_lock;
};
struct xchk_stats {
struct dentry *cs_debugfs;
struct xchk_scrub_stats cs_stats[XFS_SCRUB_TYPE_NR];
};
static struct xchk_stats global_stats;
static const char *name_map[XFS_SCRUB_TYPE_NR] = {
[XFS_SCRUB_TYPE_SB] = "sb",
[XFS_SCRUB_TYPE_AGF] = "agf",
[XFS_SCRUB_TYPE_AGFL] = "agfl",
[XFS_SCRUB_TYPE_AGI] = "agi",
[XFS_SCRUB_TYPE_BNOBT] = "bnobt",
[XFS_SCRUB_TYPE_CNTBT] = "cntbt",
[XFS_SCRUB_TYPE_INOBT] = "inobt",
[XFS_SCRUB_TYPE_FINOBT] = "finobt",
[XFS_SCRUB_TYPE_RMAPBT] = "rmapbt",
[XFS_SCRUB_TYPE_REFCNTBT] = "refcountbt",
[XFS_SCRUB_TYPE_INODE] = "inode",
[XFS_SCRUB_TYPE_BMBTD] = "bmapbtd",
[XFS_SCRUB_TYPE_BMBTA] = "bmapbta",
[XFS_SCRUB_TYPE_BMBTC] = "bmapbtc",
[XFS_SCRUB_TYPE_DIR] = "directory",
[XFS_SCRUB_TYPE_XATTR] = "xattr",
[XFS_SCRUB_TYPE_SYMLINK] = "symlink",
[XFS_SCRUB_TYPE_PARENT] = "parent",
[XFS_SCRUB_TYPE_RTBITMAP] = "rtbitmap",
[XFS_SCRUB_TYPE_RTSUM] = "rtsummary",
[XFS_SCRUB_TYPE_UQUOTA] = "usrquota",
[XFS_SCRUB_TYPE_GQUOTA] = "grpquota",
[XFS_SCRUB_TYPE_PQUOTA] = "prjquota",
[XFS_SCRUB_TYPE_FSCOUNTERS] = "fscounters",
};
/* Format the scrub stats into a text buffer, similar to pcp style. */
STATIC ssize_t
xchk_stats_format(
struct xchk_stats *cs,
char *buf,
size_t remaining)
{
struct xchk_scrub_stats *css = &cs->cs_stats[0];
unsigned int i;
ssize_t copied = 0;
int ret = 0;
for (i = 0; i < XFS_SCRUB_TYPE_NR; i++, css++) {
if (!name_map[i])
continue;
ret = scnprintf(buf, remaining,
"%s %u %u %u %u %u %u %u %u %u %llu %u %u %llu\n",
name_map[i],
(unsigned int)css->invocations,
(unsigned int)css->clean,
(unsigned int)css->corrupt,
(unsigned int)css->preen,
(unsigned int)css->xfail,
(unsigned int)css->xcorrupt,
(unsigned int)css->incomplete,
(unsigned int)css->warning,
(unsigned int)css->retries,
(unsigned long long)css->checktime_us,
(unsigned int)css->repair_invocations,
(unsigned int)css->repair_success,
(unsigned long long)css->repairtime_us);
if (ret <= 0)
break;
remaining -= ret;
copied += ret;
buf += ret;
}
return copied > 0 ? copied : ret;
}
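To make the report format concrete, a single line of the debugfs output
produced by the scnprintf() above consists of the type name followed by
invocations, clean, corrupt, preen, xfail, xcorrupt, incomplete, warning,
retries, checktime_us, repair_invocations, repair_success and repairtime_us.
A line might look like the following (values made up for illustration):

bnobt 12 11 1 0 0 0 0 0 2 4810 1 1 3210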
/* Estimate the worst case buffer size required to hold the whole report. */
STATIC size_t
xchk_stats_estimate_bufsize(
struct xchk_stats *cs)
{
struct xchk_scrub_stats *css = &cs->cs_stats[0];
unsigned int i;
size_t field_width;
size_t ret = 0;
/* 4294967296 plus one space for each u32 field */
field_width = 11 * (offsetof(struct xchk_scrub_stats, checktime_us) /
sizeof(uint32_t));
/* 18446744073709551615 plus one space for each u64 field */
field_width += 21 * ((offsetof(struct xchk_scrub_stats, css_lock) -
offsetof(struct xchk_scrub_stats, checktime_us)) /
sizeof(uint64_t));
for (i = 0; i < XFS_SCRUB_TYPE_NR; i++, css++) {
if (!name_map[i])
continue;
/* name plus one space */
ret += 1 + strlen(name_map[i]);
/* all fields, plus newline */
ret += field_width + 1;
}
return ret;
}
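Concretely, with the struct layout shown earlier (eleven u32 counters followed by two u64 runtimes), the per-row estimate works out as follows; this merely restates the function's arithmetic and is not separate kernel code:

	field_width  = 11 * 11;	/* eleven u32s, 10 digits + a space each = 121 */
	field_width += 21 * 2;	/* two u64s, 20 digits + a space each; 163 total */
	/* each row then contributes strlen(name) + 1 + 163 + 1 bytes */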
/* Clear all counters. */
STATIC void
xchk_stats_clearall(
struct xchk_stats *cs)
{
struct xchk_scrub_stats *css = &cs->cs_stats[0];
unsigned int i;
for (i = 0; i < XFS_SCRUB_TYPE_NR; i++, css++) {
spin_lock(&css->css_lock);
memset(css, 0, offsetof(struct xchk_scrub_stats, css_lock));
spin_unlock(&css->css_lock);
}
}
#define XFS_SCRUB_OFLAG_UNCLEAN (XFS_SCRUB_OFLAG_CORRUPT | \
XFS_SCRUB_OFLAG_PREEN | \
XFS_SCRUB_OFLAG_XFAIL | \
XFS_SCRUB_OFLAG_XCORRUPT | \
XFS_SCRUB_OFLAG_INCOMPLETE | \
XFS_SCRUB_OFLAG_WARNING)
STATIC void
xchk_stats_merge_one(
struct xchk_stats *cs,
const struct xfs_scrub_metadata *sm,
const struct xchk_stats_run *run)
{
struct xchk_scrub_stats *css;
ASSERT(sm->sm_type < XFS_SCRUB_TYPE_NR);
css = &cs->cs_stats[sm->sm_type];
spin_lock(&css->css_lock);
css->invocations++;
if (!(sm->sm_flags & XFS_SCRUB_OFLAG_UNCLEAN))
css->clean++;
if (sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
css->corrupt++;
if (sm->sm_flags & XFS_SCRUB_OFLAG_PREEN)
css->preen++;
if (sm->sm_flags & XFS_SCRUB_OFLAG_XFAIL)
css->xfail++;
if (sm->sm_flags & XFS_SCRUB_OFLAG_XCORRUPT)
css->xcorrupt++;
if (sm->sm_flags & XFS_SCRUB_OFLAG_INCOMPLETE)
css->incomplete++;
if (sm->sm_flags & XFS_SCRUB_OFLAG_WARNING)
css->warning++;
css->retries += run->retries;
css->checktime_us += howmany_64(run->scrub_ns, NSEC_PER_USEC);
if (run->repair_attempted)
css->repair_invocations++;
if (run->repair_succeeded)
css->repair_success++;
css->repairtime_us += howmany_64(run->repair_ns, NSEC_PER_USEC);
spin_unlock(&css->css_lock);
}
/* Merge these scrub-run stats into the global and mount stat data. */
void
xchk_stats_merge(
struct xfs_mount *mp,
const struct xfs_scrub_metadata *sm,
const struct xchk_stats_run *run)
{
xchk_stats_merge_one(&global_stats, sm, run);
xchk_stats_merge_one(mp->m_scrub_stats, sm, run);
}
/* debugfs boilerplate */
static ssize_t
xchk_scrub_stats_read(
struct file *file,
char __user *ubuf,
size_t count,
loff_t *ppos)
{
struct xchk_stats *cs = file->private_data;
char *buf;
size_t bufsize;
ssize_t avail, ret;
/*
* This generates a string snapshot of all the scrub counters, so we
* do not want userspace to receive garbled text from multiple calls.
* If the file position is greater than 0, return a short read.
*/
if (*ppos > 0)
return 0;
bufsize = xchk_stats_estimate_bufsize(cs);
buf = kvmalloc(bufsize, XCHK_GFP_FLAGS);
if (!buf)
return -ENOMEM;
avail = xchk_stats_format(cs, buf, bufsize);
if (avail < 0) {
ret = avail;
goto out;
}
ret = simple_read_from_buffer(ubuf, count, ppos, buf, avail);
out:
kvfree(buf);
return ret;
}
static const struct file_operations scrub_stats_fops = {
.open = simple_open,
.read = xchk_scrub_stats_read,
};
static ssize_t
xchk_clear_scrub_stats_write(
struct file *file,
const char __user *ubuf,
size_t count,
loff_t *ppos)
{
struct xchk_stats *cs = file->private_data;
unsigned int val;
int ret;
ret = kstrtouint_from_user(ubuf, count, 0, &val);
if (ret)
return ret;
if (val != 1)
return -EINVAL;
xchk_stats_clearall(cs);
return count;
}
static const struct file_operations clear_scrub_stats_fops = {
.open = simple_open,
.write = xchk_clear_scrub_stats_write,
};
/* Initialize the stats object. */
STATIC int
xchk_stats_init(
struct xchk_stats *cs,
struct xfs_mount *mp)
{
struct xchk_scrub_stats *css = &cs->cs_stats[0];
unsigned int i;
for (i = 0; i < XFS_SCRUB_TYPE_NR; i++, css++)
spin_lock_init(&css->css_lock);
return 0;
}
/* Connect the stats object to debugfs. */
void
xchk_stats_register(
struct xchk_stats *cs,
struct dentry *parent)
{
if (!parent)
return;
cs->cs_debugfs = xfs_debugfs_mkdir("scrub", parent);
if (!cs->cs_debugfs)
return;
debugfs_create_file("stats", 0644, cs->cs_debugfs, cs,
&scrub_stats_fops);
debugfs_create_file("clear_stats", 0400, cs->cs_debugfs, cs,
&clear_scrub_stats_fops);
}
/* Free all resources related to the stats object. */
STATIC int
xchk_stats_teardown(
struct xchk_stats *cs)
{
return 0;
}
/* Disconnect the stats object from debugfs. */
void
xchk_stats_unregister(
struct xchk_stats *cs)
{
debugfs_remove(cs->cs_debugfs);
}
/* Initialize global stats and register them */
int __init
xchk_global_stats_setup(
struct dentry *parent)
{
int error;
error = xchk_stats_init(&global_stats, NULL);
if (error)
return error;
xchk_stats_register(&global_stats, parent);
return 0;
}
/* Unregister global stats and tear them down */
void
xchk_global_stats_teardown(void)
{
xchk_stats_unregister(&global_stats);
xchk_stats_teardown(&global_stats);
}
/* Allocate per-mount stats */
int
xchk_mount_stats_alloc(
struct xfs_mount *mp)
{
struct xchk_stats *cs;
int error;
cs = kvzalloc(sizeof(struct xchk_stats), GFP_KERNEL);
if (!cs)
return -ENOMEM;
error = xchk_stats_init(cs, mp);
if (error)
goto out_free;
mp->m_scrub_stats = cs;
return 0;
out_free:
kvfree(cs);
return error;
}
/* Free per-mount stats */
void
xchk_mount_stats_free(
struct xfs_mount *mp)
{
xchk_stats_teardown(mp->m_scrub_stats);
kvfree(mp->m_scrub_stats);
mp->m_scrub_stats = NULL;
}
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Copyright (C) 2023 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <djwong@kernel.org>
*/
#ifndef __XFS_SCRUB_STATS_H__
#define __XFS_SCRUB_STATS_H__
struct xchk_stats_run {
u64 scrub_ns;
u64 repair_ns;
unsigned int retries;
bool repair_attempted;
bool repair_succeeded;
};
#ifdef CONFIG_XFS_ONLINE_SCRUB_STATS
struct xchk_stats;
int __init xchk_global_stats_setup(struct dentry *parent);
void xchk_global_stats_teardown(void);
int xchk_mount_stats_alloc(struct xfs_mount *mp);
void xchk_mount_stats_free(struct xfs_mount *mp);
void xchk_stats_register(struct xchk_stats *cs, struct dentry *parent);
void xchk_stats_unregister(struct xchk_stats *cs);
void xchk_stats_merge(struct xfs_mount *mp, const struct xfs_scrub_metadata *sm,
const struct xchk_stats_run *run);
static inline u64 xchk_stats_now(void) { return ktime_get_ns(); }
static inline u64 xchk_stats_elapsed_ns(u64 since)
{
u64 now = xchk_stats_now();
/*
* If the system doesn't have a high enough resolution clock, charge at
* least one nanosecond so that our stats don't report instantaneous
* runtimes.
*/
if (now == since)
return 1;
return now - since;
}
#else
# define xchk_global_stats_setup(parent) (0)
# define xchk_global_stats_teardown() ((void)0)
# define xchk_mount_stats_alloc(mp) (0)
# define xchk_mount_stats_free(mp) ((void)0)
# define xchk_stats_register(cs, parent) ((void)0)
# define xchk_stats_unregister(cs) ((void)0)
# define xchk_stats_now() (0)
# define xchk_stats_elapsed_ns(x) (0 * (x))
# define xchk_stats_merge(mp, sm, run) ((void)0)
#endif /* CONFIG_XFS_ONLINE_SCRUB_STATS */
#endif /* __XFS_SCRUB_STATS_H__ */
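A minimal sketch of how a caller might feed these hooks, assuming the usual struct xfs_scrub fields sc->mp and sc->sm; xchk_do_scrub(), xchk_needs_repair(), and xrep_attempt_if_needed() are placeholders for the real scrub dispatch code, not functions defined by this patch:

	static int
	xexample_timed_scrub(
		struct xfs_scrub	*sc)
	{
		struct xchk_stats_run	run = { };
		u64			start;
		int			error;

		/* Time the check phase. */
		start = xchk_stats_now();
		error = xchk_do_scrub(sc);			/* placeholder */
		run.scrub_ns = xchk_stats_elapsed_ns(start);

		/* Time the repair phase, if one was requested. */
		if (!error && xchk_needs_repair(sc->sm)) {	/* placeholder */
			run.repair_attempted = true;
			start = xchk_stats_now();
			error = xrep_attempt_if_needed(sc);	/* placeholder */
			run.repair_ns = xchk_stats_elapsed_ns(start);
			if (!error)
				run.repair_succeeded = true;
		}

		/* Fold this run into the global and per-mount counters. */
		xchk_stats_merge(sc->mp, sc->sm, &run);
		return error;
	}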
......@@ -12,8 +12,10 @@
#include "xfs_mount.h"
#include "xfs_inode.h"
#include "xfs_btree.h"
#include "scrub/scrub.h"
#include "xfs_ag.h"
#include "scrub/scrub.h"
#include "scrub/xfile.h"
#include "scrub/xfarray.h"
/* Figure out which block the btree cursor was pointing to. */
static inline xfs_fsblock_t
......
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
* Copyright (C) 2021-2023 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <djwong@kernel.org>
*/
#ifndef __XFS_SCRUB_XFARRAY_H__
#define __XFS_SCRUB_XFARRAY_H__
/* xfile array index type, along with cursor initialization */
typedef uint64_t xfarray_idx_t;
#define XFARRAY_CURSOR_INIT ((__force xfarray_idx_t)0)
/* Iterate each index of an xfile array. */
#define foreach_xfarray_idx(array, idx) \
for ((idx) = XFARRAY_CURSOR_INIT; \
(idx) < xfarray_length(array); \
(idx)++)
struct xfarray {
/* Underlying file that backs the array. */
struct xfile *xfile;
/* Number of array elements. */
xfarray_idx_t nr;
/* Maximum possible array size. */
xfarray_idx_t max_nr;
/* Number of unset slots in the array below @nr. */
uint64_t unset_slots;
/* Size of an array element. */
size_t obj_size;
/* log2 of array element size, if possible. */
int obj_size_log;
};
int xfarray_create(const char *descr, unsigned long long required_capacity,
size_t obj_size, struct xfarray **arrayp);
void xfarray_destroy(struct xfarray *array);
int xfarray_load(struct xfarray *array, xfarray_idx_t idx, void *ptr);
int xfarray_unset(struct xfarray *array, xfarray_idx_t idx);
int xfarray_store(struct xfarray *array, xfarray_idx_t idx, const void *ptr);
int xfarray_store_anywhere(struct xfarray *array, const void *ptr);
bool xfarray_element_is_null(struct xfarray *array, const void *ptr);
/* Append an element to the array. */
static inline int xfarray_append(struct xfarray *array, const void *ptr)
{
return xfarray_store(array, array->nr, ptr);
}
uint64_t xfarray_length(struct xfarray *array);
int xfarray_load_next(struct xfarray *array, xfarray_idx_t *idx, void *rec);
/* Declarations for xfile array sort functionality. */
typedef cmp_func_t xfarray_cmp_fn;
/* Perform an in-memory heapsort for small subsets. */
#define XFARRAY_ISORT_SHIFT (4)
#define XFARRAY_ISORT_NR (1U << XFARRAY_ISORT_SHIFT)
/* Evaluate this many points to find the qsort pivot. */
#define XFARRAY_QSORT_PIVOT_NR (9)
struct xfarray_sortinfo {
struct xfarray *array;
/* Comparison function for the sort. */
xfarray_cmp_fn cmp_fn;
/* Maximum height of the partition stack. */
uint8_t max_stack_depth;
/* Current height of the partition stack. */
int8_t stack_depth;
/* Maximum stack depth ever used. */
uint8_t max_stack_used;
/* XFARRAY_SORT_* flags; see below. */
unsigned int flags;
/* Cache a page here for faster access. */
struct xfile_page xfpage;
void *page_kaddr;
#ifdef DEBUG
/* Performance statistics. */
uint64_t loads;
uint64_t stores;
uint64_t compares;
uint64_t heapsorts;
#endif
/*
* Extra bytes are allocated beyond the end of the structure to store
* quicksort information. C does not permit multiple VLAs per struct,
* so we document all of this in a comment.
*
* Pretend that we have a typedef for array records:
*
* typedef char[array->obj_size] xfarray_rec_t;
*
* First comes the quicksort partition stack:
*
* xfarray_idx_t lo[max_stack_depth];
* xfarray_idx_t hi[max_stack_depth];
*
* union {
*
* If for a given subset we decide to use an in-memory sort, we use a
* block of scratchpad records here to compare items:
*
* xfarray_rec_t scratch[ISORT_NR];
*
* Otherwise, we want to choose a pivot with which to partition the array.
* We store the chosen pivot record at the start of the scratchpad area
* and use the rest to sample some records to estimate the median.
* The format of the qsort_pivot array enables us to use the kernel
* heapsort function to place the median value in the middle.
*
* struct {
* xfarray_rec_t pivot;
* struct {
* xfarray_rec_t rec; (rounded up to 8 bytes)
* xfarray_idx_t idx;
* } qsort_pivot[QSORT_PIVOT_NR];
* };
* }
*/
};
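Putting that layout into numbers, the worst-case size of the trailing scratch area can be sketched as below. This is only an illustration of the comment above (assuming a pointer si to a populated sortinfo), not the actual allocation code:

	/* Illustrative only: worst-case bytes beyond the end of the struct. */
	static size_t
	xexample_sortinfo_extra_bytes(
		struct xfarray_sortinfo	*si)
	{
		size_t	recsz = si->array->obj_size;
		size_t	stack_bytes = 2 * si->max_stack_depth *
					sizeof(xfarray_idx_t);
		size_t	isort_bytes = XFARRAY_ISORT_NR * recsz;
		size_t	pivot_bytes = recsz + XFARRAY_QSORT_PIVOT_NR *
					(round_up(recsz, 8) +
					 sizeof(xfarray_idx_t));

		return stack_bytes + max(isort_bytes, pivot_bytes);
	}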
/* Sort can be interrupted by a fatal signal. */
#define XFARRAY_SORT_KILLABLE (1U << 0)
int xfarray_sort(struct xfarray *array, xfarray_cmp_fn cmp_fn,
unsigned int flags);
#endif /* __XFS_SCRUB_XFARRAY_H__ */
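A minimal usage sketch of the xfarray API declared above; the record type, comparison function, description string, and capacity argument are all invented for illustration:

	struct xexample_rec {
		uint64_t	key;
		uint64_t	val;
	};

	static int
	xexample_rec_cmp(const void *a, const void *b)
	{
		const struct xexample_rec	*ra = a;
		const struct xexample_rec	*rb = b;

		if (ra->key < rb->key)
			return -1;
		if (ra->key > rb->key)
			return 1;
		return 0;
	}

	static int
	xexample_collect(void)
	{
		struct xfarray		*array;
		struct xexample_rec	rec = { .key = 1, .val = 2 };
		xfarray_idx_t		cur;
		int			error;

		error = xfarray_create("example recs", 0, sizeof(rec), &array);
		if (error)
			return error;

		/* Stage one record, sort, then walk the array back. */
		error = xfarray_append(array, &rec);
		if (error)
			goto out;

		error = xfarray_sort(array, xexample_rec_cmp,
				XFARRAY_SORT_KILLABLE);
		if (error)
			goto out;

		foreach_xfarray_idx(array, cur) {
			error = xfarray_load(array, cur, &rec);
			if (error)
				goto out;
			/* ...examine rec... */
		}
	out:
		xfarray_destroy(array);
		return error;
	}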
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
* Copyright (C) 2018-2023 Oracle. All Rights Reserved.
* Author: Darrick J. Wong <djwong@kernel.org>
*/
#ifndef __XFS_SCRUB_XFILE_H__
#define __XFS_SCRUB_XFILE_H__
struct xfile_page {
struct page *page;
void *fsdata;
loff_t pos;
};
static inline bool xfile_page_cached(const struct xfile_page *xfpage)
{
return xfpage->page != NULL;
}
static inline pgoff_t xfile_page_index(const struct xfile_page *xfpage)
{
return xfpage->page->index;
}
struct xfile {
struct file *file;
};
int xfile_create(const char *description, loff_t isize, struct xfile **xfilep);
void xfile_destroy(struct xfile *xf);
ssize_t xfile_pread(struct xfile *xf, void *buf, size_t count, loff_t pos);
ssize_t xfile_pwrite(struct xfile *xf, const void *buf, size_t count,
loff_t pos);
/*
* Load an object. Since we're treating this file as "memory", any error or
* short IO is treated as a failure to allocate memory.
*/
static inline int
xfile_obj_load(struct xfile *xf, void *buf, size_t count, loff_t pos)
{
ssize_t ret = xfile_pread(xf, buf, count, pos);
if (ret < 0 || ret != count)
return -ENOMEM;
return 0;
}
/*
* Store an object. Since we're treating this file as "memory", any error or
* short IO is treated as a failure to allocate memory.
*/
static inline int
xfile_obj_store(struct xfile *xf, const void *buf, size_t count, loff_t pos)
{
ssize_t ret = xfile_pwrite(xf, buf, count, pos);
if (ret < 0 || ret != count)
return -ENOMEM;
return 0;
}
loff_t xfile_seek_data(struct xfile *xf, loff_t pos);
struct xfile_stat {
loff_t size;
unsigned long long bytes;
};
int xfile_stat(struct xfile *xf, struct xfile_stat *statbuf);
int xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len,
struct xfile_page *xbuf);
int xfile_put_page(struct xfile *xf, struct xfile_page *xbuf);
#endif /* __XFS_SCRUB_XFILE_H__ */
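And a similarly hedged sketch of the xfile API from this header, treating the backing file as pageable "memory"; the function name, description string, sizes, and offsets are invented:

	static int
	xexample_xfile_roundtrip(void)
	{
		struct xfile	*xf;
		uint64_t	value = 0xABCD;
		uint64_t	readback;
		int		error;

		/* Back 64KiB of pageable scratch space with an xfile. */
		error = xfile_create("example data", 65536, &xf);
		if (error)
			return error;

		/* Store and reload one object at byte offset 512. */
		error = xfile_obj_store(xf, &value, sizeof(value), 512);
		if (!error)
			error = xfile_obj_load(xf, &readback, sizeof(readback),
					512);

		xfile_destroy(xf);
		return error;
	}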
......@@ -478,7 +478,7 @@ xfs_discard_folio(
folio, ip->i_ino, pos);
/*
* The end of the punch range is always the offset of the the first
* The end of the punch range is always the offset of the first
* byte of the next folio. Hence the end offset is only dependent on the
* folio itself and not the start offset that is passed in.
*/
......
......@@ -481,7 +481,8 @@ _xfs_buf_obj_cmp(
* reallocating a busy extent. Skip this buffer and
* continue searching for an exact match.
*/
ASSERT(bp->b_flags & XBF_STALE);
if (!(map->bm_flags & XBM_LIVESCAN))
ASSERT(bp->b_flags & XBF_STALE);
return 1;
}
return 0;
......@@ -559,6 +560,10 @@ xfs_buf_find_lock(
* intact here.
*/
if (bp->b_flags & XBF_STALE) {
if (flags & XBF_LIVESCAN) {
xfs_buf_unlock(bp);
return -ENOENT;
}
ASSERT((bp->b_flags & _XBF_DELWRI_Q) == 0);
bp->b_flags &= _XBF_KMEM | _XBF_PAGES;
bp->b_ops = NULL;
......@@ -682,6 +687,8 @@ xfs_buf_get_map(
int error;
int i;
if (flags & XBF_LIVESCAN)
cmap.bm_flags |= XBM_LIVESCAN;
for (i = 0; i < nmaps; i++)
cmap.bm_len += map[i].bm_len;
......
......@@ -44,6 +44,11 @@ struct xfs_buf;
#define _XBF_DELWRI_Q (1u << 22)/* buffer on a delwri queue */
/* flags used only as arguments to access routines */
/*
* Online fsck is scanning the buffer cache for live buffers. Do not warn
* about length mismatches during lookups and do not return stale buffers.
*/
#define XBF_LIVESCAN (1u << 28)
#define XBF_INCORE (1u << 29)/* lookup only, return if found in cache */
#define XBF_TRYLOCK (1u << 30)/* lock requested, but do not wait */
#define XBF_UNMAPPED (1u << 31)/* do not map the buffer */
......@@ -67,6 +72,7 @@ typedef unsigned int xfs_buf_flags_t;
{ _XBF_KMEM, "KMEM" }, \
{ _XBF_DELWRI_Q, "DELWRI_Q" }, \
/* The following interface flags should never be set */ \
{ XBF_LIVESCAN, "LIVESCAN" }, \
{ XBF_INCORE, "INCORE" }, \
{ XBF_TRYLOCK, "TRYLOCK" }, \
{ XBF_UNMAPPED, "UNMAPPED" }
......@@ -114,8 +120,15 @@ typedef struct xfs_buftarg {
struct xfs_buf_map {
xfs_daddr_t bm_bn; /* block number for I/O */
int bm_len; /* size of I/O */
unsigned int bm_flags;
};
/*
* Online fsck is scanning the buffer cache for live buffers. Do not warn
* about length mismatches during lookups and do not return stale buffers.
*/
#define XBM_LIVESCAN (1U << 0)
#define DEFINE_SINGLE_BUF_MAP(map, blkno, numblk) \
struct xfs_buf_map (map) = { .bm_bn = (blkno), .bm_len = (numblk) };
......
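A hypothetical sketch of a live-scan lookup using the new flag. Only XBF_LIVESCAN, XBM_LIVESCAN, DEFINE_SINGLE_BUF_MAP, and the xfs_buf_get_map/xfs_buf_find_lock hunks above come from this patch; the wrapper below, and the exact flag combination the scrub code actually uses, are assumptions:

	/* Hypothetical caller; error handling trimmed for brevity. */
	static int
	xexample_livescan_get(
		struct xfs_buftarg	*target,
		xfs_daddr_t		daddr,
		int			numblks,
		struct xfs_buf		**bpp)
	{
		DEFINE_SINGLE_BUF_MAP(map, daddr, numblks);

		/*
		 * XBF_LIVESCAN is translated into XBM_LIVESCAN on the compound
		 * map, so a stale cached buffer yields -ENOENT instead of
		 * being reused, and length mismatches found during the walk
		 * do not trip the usual assert.
		 */
		return xfs_buf_get_map(target, &map, 1, XBF_LIVESCAN, bpp);
	}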
......@@ -1386,7 +1386,7 @@ xfs_qm_dqiterate(
return error;
error = iter_fn(dq, type, priv);
id = dq->q_id;
id = dq->q_id + 1;
xfs_qm_dqput(dq);
} while (error == 0 && id != 0);
......