- 26 Mar, 2020 5 commits
-
-
Darrick J. Wong authored
I noticed that fsfreeze can take a very long time to freeze an XFS if there happens to be a GETFSMAP caller running in the background. I also happened to notice the following in dmesg: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 43492 at fs/xfs/xfs_super.c:853 xfs_quiesce_attr+0x83/0x90 [xfs] Modules linked in: xfs libcrc32c ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 ip_set_hash_ip ip_set_hash_net xt_tcpudp xt_set ip_set_hash_mac ip_set nfnetlink ip6table_filter ip6_tables bfq iptable_filter sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: xfs] CPU: 2 PID: 43492 Comm: xfs_io Not tainted 5.6.0-rc4-djw #rc4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:xfs_quiesce_attr+0x83/0x90 [xfs] Code: 7c 07 00 00 85 c0 75 22 48 89 df 5b e9 96 c1 00 00 48 c7 c6 b0 2d 38 a0 48 89 df e8 57 64 ff ff 8b 83 7c 07 00 00 85 c0 74 de <0f> 0b 48 89 df 5b e9 72 c1 00 00 66 90 0f 1f 44 00 00 41 55 41 54 RSP: 0018:ffffc900030f3e28 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffff88802ac54000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff81e4a6f0 RDI: 00000000ffffffff RBP: ffff88807859f070 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000010 R12: 0000000000000000 R13: ffff88807859f388 R14: ffff88807859f4b8 R15: ffff88807859f5e8 FS: 00007fad1c6c0fc0(0000) GS:ffff88807e000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f0c7d237000 CR3: 0000000077f01003 CR4: 00000000001606a0 Call Trace: xfs_fs_freeze+0x25/0x40 [xfs] freeze_super+0xc8/0x180 do_vfs_ioctl+0x70b/0x750 ? __fget_files+0x135/0x210 ksys_ioctl+0x3a/0xb0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x50/0x1a0 entry_SYSCALL_64_after_hwframe+0x49/0xbe These two things appear to be related. The assertion trips when another thread initiates a fsmap request (which uses an empty transaction) after the freezer waited for m_active_trans to hit zero but before the the freezer executes the WARN_ON just prior to calling xfs_log_quiesce. The lengthy delays in freezing happen because the freezer calls xfs_wait_buftarg to clean out the buffer lru list. Meanwhile, the GETFSMAP caller is continuing to grab and release buffers, which means that it can take a very long time for the buffer lru list to empty out. We fix both of these races by calling sb_start_write to obtain freeze protection while using empty transactions for GETFSMAP and for metadata scrubbing. The other two users occur during mount, during which time we cannot fs freeze. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
-
Brian Foster authored
If the bio_add_page() call fails, we proceed to write out a partially constructed log buffer. This corrupts the physical log such that log recovery is not possible. Worse, persistent occurrences of this error eventually lead to a BUG_ON() failure in bio_split() as iclogs wrap the end of the physical log, which triggers log recovery on subsequent mount. Rather than warn about writing out a corrupted log buffer, shutdown the fs as is done for any log I/O related error. This preserves the consistency of the physical log such that log recovery succeeds on a subsequent mount. Note that this was observed on a 64k page debug kernel without upstream commit 59bb4798 ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)"), which demonstrated frequent iclog bio overflows due to unaligned (slab allocated) iclog data buffers. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Darrick J. Wong authored
When we're checking bestfree information in directory blocks, always drop the block buffer at the end of the function. We should always release resources when we're done using them. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
-
Darrick J. Wong authored
The dirattr btree checking code uses the altpath substructure of the dirattr state structure to check the sibling pointers of dir/attr tree blocks. At the end of sibling checks, xfs_da3_path_shift could have changed multiple levels of buffer pointers in the altpath structure. Although we release the leaf level buffer, this isn't enough -- we also need to release the node buffers that are unique to the altpath. Not releasing all of the altpath buffers leaves them locked to the transaction. This is suboptimal because we should release resources when we don't need them anymore. Fix the function to loop all levels of the altpath, and fix the return logic so that we always run the loop. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
-
Darrick J. Wong authored
When quotacheck runs, it zeroes all the timer fields in every dquot. Unfortunately, it also does this to the root dquot, which erases any preconfigured grace intervals and warning limits that the administrator may have set. Worse yet, the incore copies of those variables remain set. This cache coherence problem manifests itself as the grace interval mysteriously being reset back to the defaults at the /next/ mount. Fix it by not resetting the root disk dquot's timer and warning fields. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
- 23 Mar, 2020 8 commits
-
-
Christoph Hellwig authored
Open code the xlog_state_want_sync logic in its two callers given that this function is a trivial wrapper around xlog_state_switch_iclogs. Move the lockdep assert into xlog_state_switch_iclogs to not lose this debugging aid, and improve the comment that documents xlog_state_switch_iclogs as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Use the shutdown flag in the log to bypass xlog_state_clean_iclog entirely in case of a shut down log. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Factor out a few self-contained helpers from xlog_state_clean_iclog, and update the documentation so it primarily documents why things happens instead of how. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
We can just check for a shut down log all the way down in xlog_cil_committed instead of passing the parameter. This means a slight behavior change in that we now also abort log items if the shutdown came in halfway into the I/O completion processing, which actually is the right thing to do. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
There is no need to check for the ioerror state before the lock, as the shutdown case is not a fast path. Also remove the call to force shutdown the file system, as it must have been shut down already for an iclog to be in the ioerror state. Also clean up the flow of the function a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
The only caller of xfs_log_release_iclog doesn't care about the return value, so remove it. Also don't bother passing the mount pointer, given that we can trivially derive it from the iclog. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Factor out the shared code to wait for a log force into a new helper. This helper uses the XLOG_FORCED_SHUTDOWN check previous only used by the unmount code over the equivalent iclog ioerror state used by the other two functions. There is a slight behavior change in that the force of the unmount record is now accounted in the log force statistics. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
xlog_cil_push is only called by xlog_cil_push_work, so merge the two functions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
- 19 Mar, 2020 5 commits
-
-
Christoph Hellwig authored
We know the version is 3 if on a v5 file system. For earlier file systems formats we always upgrade the remaining v1 inodes to v2 and thus only use v2 inodes. Use the xfs_sb_version_has_large_dinode helper to check if we deal with small or large dinodes, and thus remove the need for the di_version field in struct icdinode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Only v5 file systems can have the reflink feature, and those will always use the large dinode format. Remove the extra check for the inode version. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
di_flags2 is initialized to zero for v4 and earlier file systems. This means di_flags2 can only be non-zero for a v5 file systems, in which case both the parent and child inodes can store the field. Remove the extra di_version check, and also remove the rather pointless local di_flags2 variable while at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
The size of the dinode structure is only dependent on the file system version, so instead of checking the individual inode version just use the newly added xfs_sb_version_has_large_dinode helper, and simplify various calling conventions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Add a new wrapper to check if a file system supports the v3 inode format with a larger dinode core. Previously we used xfs_sb_version_hascrc for that, which is technically correct but a little confusing to read. Also move xfs_dinode_good_version next to xfs_sb_version_has_v3inode so that we have one place that documents the superblock version to inode version relationship. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
- 18 Mar, 2020 12 commits
-
-
Brian Foster authored
AIL removal of the quotaoff start intent and free of both quotaoff intents is currently limited to the ->iop_committed() handler of the end intent. This executes when the end intent is committed to the on-disk log and marks the completion of the operation. The problem with this is it assumes the success of the operation. If a shutdown or other error occurs during the quotaoff, it's possible for the quotaoff task to exit without removing the start intent from the AIL. This results in an unmount hang as the AIL cannot be emptied. Further, no other codepath frees the intents and so this is also a memory leak vector. First, update the high level quotaoff error path to directly remove and free the quotaoff start intent if it still exists in the AIL at the time of the error. Next, update both of the start and end quotaoff intents with an ->iop_release() callback to properly handle transaction abort. This means that If the quotaoff start transaction aborts, it frees the start intent in the transaction commit path. If the filesystem shuts down before the end transaction allocates, the quotaoff sequence removes and frees the start intent. If the end transaction aborts, it removes the start intent and frees both. This ensures that a shutdown does not result in a hung unmount and that memory is not leaked regardless of when a quotaoff error occurs. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Brian Foster authored
AIL removal of the quotaoff start intent and free of both intents is hardcoded to the ->iop_committed() handler of the end intent. Factor out the start intent handling code so it can be used in a future patch to properly handle quotaoff errors. Use xfs_trans_ail_remove() instead of the _delete() variant to acquire the AIL lock and also handle cases where an intent might not reside in the AIL at the time of a failure. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Darrick J. Wong authored
Add support for btree staging cursors for the rmap btrees. This is needed both for online repair and also to convert xfs_repair to use btree bulk loading. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
-
Darrick J. Wong authored
Add support for btree staging cursors for the refcount btrees. This is needed both for online repair and also to convert xfs_repair to use btree bulk loading. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
-
Darrick J. Wong authored
Add support for btree staging cursors for the inode btrees. This is needed both for online repair and also to convert xfs_repair to use btree bulk loading. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
-
Darrick J. Wong authored
Add support for btree staging cursors for the free space btrees. This is needed both for online repair and also to convert xfs_repair to use btree bulk loading. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
-
Darrick J. Wong authored
Add a new btree function that enables us to bulk load a btree cursor. This will be used by the upcoming online repair patches to generate new btrees. This avoids the programmatic inefficiency of calling xfs_btree_insert in a loop (which generates a lot of log traffic) in favor of stamping out new btree blocks with ordered buffers, and then committing both the new root and scheduling the removal of the old btree blocks in a single transaction commit. The design of this new generic code is based off the btree rebuilding code in xfs_repair's phase 5 code, with the explicit goal of enabling us to share that code between scrub and repair. It has the additional feature of being able to control btree block loading factors. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
-
Darrick J. Wong authored
Create an in-core fake root for inode-rooted btree types so that callers can generate a whole new btree using the upcoming btree bulk load function without making the new tree accessible from the rest of the filesystem. It is up to the individual btree type to provide a function to create a staged cursor (presumably with the appropriate callouts to update the fakeroot) and then commit the staged root back into the filesystem. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
-
Darrick J. Wong authored
Create an in-core fake root for AG-rooted btree types so that callers can generate a whole new btree using the upcoming btree bulk load function without making the new tree accessible from the rest of the filesystem. It is up to the individual btree type to provide a function to create a staged cursor (presumably with the appropriate callouts to update the fakeroot) and then commit the staged root back into the filesystem. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
-
Darrick J. Wong authored
Add a xbitmap_hweight helper function so that we can get rid of the open-coded loop. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
-
Darrick J. Wong authored
Shorten the name of xfs_bitmap to xbitmap since the scrub bitmap has nothing to do with the libxfs bitmap. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
-
Darrick J. Wong authored
Remove the xfs_bitmap_destroy call from the end of xrep_reap_extents because this sort of violates our rule that the function initializing a structure should destroy it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
-
- 17 Mar, 2020 1 commit
-
-
Darrick J. Wong authored
When I lifted the code in xfs_alloc_ag_vextent_lastblock out of a loop, I forgot to convert all the accesses to len to be pointer dereferences. Coverity-id: 1457918 Fixes: 5113f8ec ("xfs: clean up weird while loop in xfs_alloc_ag_vextent_near") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
- 15 Mar, 2020 1 commit
-
-
Darrick J. Wong authored
If the xfs_buf_map array allocation in xfs_dabuf_map fails for whatever reason, we bail out with error code zero. This will confuse callers, so make sure that we return ENOMEM. Allocation failure should never happen with the small size of the array, but code defensively anyway. Fixes: 45feef8f ("xfs: refactor xfs_dabuf_map") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
-
- 13 Mar, 2020 8 commits
-
-
Christoph Hellwig authored
Move the code for verifying the iclog state on a clean unmount into a helper, and instead of checking the iclog state just rely on the shutdown check as they are equivalent. Also remove the ifdef DEBUG as the compiler is smart enough to eliminate the dead code for non-debug builds. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
When the log is shut down all iclogs are in the XLOG_STATE_IOERROR state, which means that xlog_state_want_sync and xlog_state_release_iclog are no-ops. Remove the whole section of code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Remove the ignored return value from xfs_log_unmount_write, and also remove a rather pointless assert on the return value from xfs_log_force. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Christoph Hellwig authored
A shutdown log is a slow failure path. Add an unlikely annotation to it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Dave Chinner authored
This is much less widely used than the bc_private union was, so this is done as a single patch. The named union xfs_btree_cur_private goes away and is embedded into the struct xfs_btree_cur_ag as an anonymous union, and the code is modified via this script: $ sed -i 's/priv\.\([abt|refc]\)/\1/g' fs/xfs/*[ch] fs/xfs/*/*[ch] Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
-
Dave Chinner authored
we need to name the btree cursor private structures to be able to pull them out of the deeply nested structure definition they are in now. Based on code extracted from a patchset by Darrick Wong. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
-
Dave Chinner authored
Rename the union and it's internal structures to the new name and remove the temporary defines that facilitated the change. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
-