- 07 Dec, 2016 1 commit
-
-
Lucas Stach authored
On filesystems with a lot of metadata and in metadata intensive workloads xfs_buf_find() is showing up at the top of the CPU cycles trace. Most of the CPU time is spent on CPU cache misses while traversing the rbtree. As the buffer cache does not need any kind of ordering, but fast lookups a hashtable is the natural data structure to use. The rhashtable infrastructure provides a self-scaling hashtable implementation and allows lookups to proceed while the table is going through a resize operation. This reduces the CPU-time spent for the lookups to 1/3 even for small filesystems with a relatively small number of cached buffers, with possibly much larger gains on higher loaded filesystems. [dchinner: reduce minimum hash size to an acceptable size for large filesystems with many AGs with no active use.] [dchinner: remove stale rbtree asserts.] [dchinner: use xfs_buf_map for compare function argument.] [dchinner: make functions static.] [dchinner: remove redundant comments.] Signed-off-by:
Lucas Stach <dev@lynxeye.de> Signed-off-by:
Dave Chinner <dchinner@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
- 05 Dec, 2016 14 commits
-
-
Dave Chinner authored
Nick Piggin reported that the CRC overhead in an fsync heavy workload was higher than expected on a Power8 machine. Part of this was to do with the fact that the power8 CRC implementation is not efficient for CRC lengths of less than 512 bytes, and so the way we split the CRCs over the CRC field means a lot of the CRCs are reduced to being less than than optimal size. To optimise this, change the CRC update mechanism to zero the CRC field first, and then compute the CRC in one pass over the buffer and write the result back into the buffer. We can do this safely because anything writing a CRC has exclusive access to the buffer the CRC is being calculated over. We leave the CRC verify code the same - it still splits the CRC calculation - because we do not want read-only operations modifying the underlying buffer. This is because read-only operations may not have an exclusive access to the buffer guaranteed, and so temporary modifications could leak out to to other processes accessing the buffer concurrently. Signed-off-by:
Dave Chinner <dchinner@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Dave Chinner authored
Embedding a switch statement in every btree stats inc/add adds a lot of code overhead to the core btree infrastructure paths. Stats are supposed to be small and lightweight, but the btree stats have become big and bloated as we've added more btrees. It needs fixing because the reflink code will just add more overhead again. Convert the v2 btree stats to arrays instead of independent variables, and instead use the type to index the specific btree array via an enum. This allows us to use array based indexing to update the stats, rather than having to derefence variables specific to the btree type. If we then wrap the xfsstats structure in a union and place uint32_t array beside it, and calculate the correct btree stats array base array index when creating a btree cursor, we can easily access entries in the stats structure without having to switch names based on the btree type. We then replace with the switch statement with a simple set of stats wrapper macros, resulting in a significant simplification of the btree stats code, and: text data bss dec hex filename 48905 144 8 49057 bfa1 fs/xfs/libxfs/xfs_btree.o.old 36793 144 8 36945 9051 fs/xfs/libxfs/xfs_btree.o it reduces the core btree infrastructure code size by close to 25%! Signed-off-by:
Dave Chinner <dchinner@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Darrick J. Wong authored
After various discussions on linux-fsdevel, it has been decided that it is not necessary to cap the length of a dedupe request, and that correctly-written userspace client programs will be able to absorb the change. Therefore, remove the length clamping behavior. Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Dave Chinner <dchinner@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Darrick J. Wong authored
The on-disk field di_size is used to set i_size, which is a signed integer of loff_t. If the high bit of di_size is set, we'll end up with a negative i_size, which will cause all sorts of problems. Since the VFS won't let us create a file with such length, we should catch them here in the verifier too. Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Dave Chinner <dchinner@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Darrick J. Wong authored
We shouldn't assert if somehow we end up trying to add an attr fork to an inode that apparently already has attr extents because this is an indication of on-disk corruption. Instead, return an error code to userspace. Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Dave Chinner <dchinner@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Darrick J. Wong authored
In xfs_dir3_data_read, we can encounter the situation where err == 0 and *bpp == NULL if the given bno offset happens to be a hole; this leads to a crash if we try to set the buffer type after the _da_read_buf call. Holes can happen due to corrupt or malicious entries in the bmbt data, so be a little more careful when we're handling buffers. Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Dave Chinner <dchinner@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Darrick J. Wong authored
When reading into memory all extents of a btree-format inode fork, complain if the number of extents we find is not the same as the number of extents reported in the inode core. This is needed to stop an IO action from accessing the garbage areas of the in-core fork. [dchinner: removed redundant assert] Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Dave Chinner <dchinner@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Darrick J. Wong authored
When we're reading a btree block, make sure that what we retrieved matches the owner and level; and has a plausible number of records. Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Dave Chinner <dchinner@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Darrick J. Wong authored
There is no such thing as a zero-level AG btree since even a single-node zero-records btree has one level. Btree cursor constructors read cur_nlevels straight from disk and then access things like cur_bufs[cur_nlevels - 1] which is /really/ bad if cur_nlevels is zero! Therefore, strengthen the verifiers to prevent this possibility. Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Dave Chinner <dchinner@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Eric Sandeen authored
There are a handful of xattr functions which now return nothing but zero. They can be made void, chased through calling functions, and error handling etc can be removed. Signed-off-by:
Eric Sandeen <sandeen@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Eric Sandeen authored
By inspection, xfs_bmap_trace_exlist isn't handling cow forks, and will trace the data fork instead. Fix this by setting state appropriately if whichfork == XFS_COW_FORK. ()___() < @ @ > | | {o_o} (|) Signed-off-by:
Eric Sandeen <sandeen@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Eric Sandeen authored
When xfs_bmap_trace_exlist called trace_xfs_extlist, it sent in the "whichfork" var instead of the bmap "state" as expected (even though state was already set up for this purpose). As a result, the xfs_bmap_class in tracing code used "whichfork" not state in xfs_iext_state_to_fork(), and got the wrong ifork pointer. It all goes downhill from there, including an ASSERT when ifp_bytes is empty by the time it reaches xfs_iext_get_ext(): XFS: Assertion failed: idx < ifp->if_bytes / sizeof(xfs_bmbt_rec_t) Signed-off-by:
Eric Sandeen <sandeen@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Eric Sandeen authored
We've missed properly setting the buffer type for an AGI transaction in 3 spots now, so just move it into xfs_read_agi() and set it if we are in a transaction to avoid the problem in the future. This is similar to how it is done in i.e. the dir3 and attr3 read functions. Signed-off-by:
Eric Sandeen <sandeen@redhat.com> Reviewed-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Eric Sandeen authored
xlog_recover_clear_agi_bucket didn't set the type to XFS_BLFT_AGI_BUF, so we got a warning during log replay (or an ASSERT on a debug build). XFS (md0): Unknown buffer type 0! XFS (md0): _xfs_buf_ioapply: no ops on block 0xaea8802/0x1 Fix this, as was done in f19b872b for 2 other locations with the same problem. cc: <stable@vger.kernel.org> # 3.10 to current Signed-off-by:
Eric Sandeen <sandeen@redhat.com> Reviewed-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
- 28 Nov, 2016 8 commits
-
-
Brian Foster authored
xfs_file_iomap_begin_delay() implements post-eof speculative preallocation by extending the block count of the requested delayed allocation. Now that xfs_bmapi_reserve_delalloc() has been updated to handle prealloc blocks separately and tag the inode, update xfs_file_iomap_begin_delay() to use the new parameter and rely on the former to tag the inode. Note that this patch does not change behavior. Signed-off-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Dave Chinner <dchinner@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Brian Foster authored
COW fork reservation is implemented via delayed allocation. The code is modeled after the traditional delalloc allocation code, but is slightly different in terms of how preallocation occurs. Rather than post-eof speculative preallocation, COW fork preallocation is implemented via a COW extent size hint that is designed to minimize fragmentation as a reflinked file is split over time. xfs_reflink_reserve_cow() still uses logic that is oriented towards dealing with post-eof speculative preallocation, however, and is stale or not necessarily correct. First, the EOF alignment to the COW extent size hint is implemented in xfs_bmapi_reserve_delalloc() (which does so correctly by aligning the start and end offsets) and so is not necessary in xfs_reflink_reserve_cow(). The backoff and retry logic on ENOSPC is also ineffective for the same reason, as xfs_bmapi_reserve_delalloc() will simply perform the same allocation request on the retry. Finally, since the COW extent size hint aligns the start and end offset of the range to allocate, the end_fsb != orig_end_fsb logic is not sufficient. Indeed, if a write request happens to end on an aligned offset, it is possible that we do not tag the inode for COW preallocation even though xfs_bmapi_reserve_delalloc() may have preallocated at the start offset. Kill the unnecessary, duplicate code in xfs_reflink_reserve_cow(). Remove the inode tag logic as well since xfs_bmapi_reserve_delalloc() has been updated to tag the inode correctly. Signed-off-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Dave Chinner <dchinner@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Brian Foster authored
Speculative preallocation is currently processed entirely by the callers of xfs_bmapi_reserve_delalloc(). The caller determines how much preallocation to include, adjusts the extent length and passes down the resulting request. While this works fine for post-eof speculative preallocation, it is not as reliable for COW fork preallocation. COW fork preallocation is implemented via the cowextszhint, which aligns the start offset as well as the length of the extent. Further, it is difficult for the caller to accurately identify when preallocation occurs because the returned extent could have been merged with neighboring extents in the fork. To simplify this situation and facilitate further COW fork preallocation enhancements, update xfs_bmapi_reserve_delalloc() to take a separate preallocation parameter to incorporate into the allocation request. The preallocation blocks value is tacked onto the end of the request and adjusted to accommodate neighboring extents and extent size limits. Since xfs_bmapi_reserve_delalloc() now knows precisely how much preallocation was included in the allocation, it can also tag the inodes appropriately to support preallocation reclaim. Note that xfs_bmapi_reserve_delalloc() callers are not yet updated to use the preallocation mechanism. This patch should not change behavior outside of correctly tagging reflink inodes when start offset preallocation occurs (which the caller does not handle correctly). Signed-off-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Dave Chinner <dchinner@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Darrick J. Wong authored
It turns out that btrfs and xfs had differing interpretations of what to do when the dedupe length is zero. Change xfs to follow btrfs' semantics so that the userland interface is consistent. Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Dave Chinner <dchinner@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Bhumika Goyal authored
Declare the structure xfs_nameops as const as it is only stored in the m_dirnameops field of a xfs_mount structure. This field is of type const struct xfs_nameops *, so xfs_nameops structures having this property can be declared as const. Done using Coccinelle: @r1 disable optional_qualifier @ identifier i; position p; @@ static struct xfs_nameops i@p = {...}; @ok1@ identifier r1.i; position p; struct xfs_mount mp; @@ mp.m_dirnameops=&i@p @bad@ position p!={r1.p,ok1.p}; identifier r1.i; @@ i@p @depends on !bad disable optional_qualifier@ identifier r1.i; @@ static +const struct xfs_nameops i={...}; @depends on !bad disable optional_qualifier@ identifier r1.i; @@ +const struct xfs_nameops i; File size before: text data bss dec hex filename 5302 85 0 5387 150b fs/xfs/libxfs/xfs_dir2.o File size after: text data bss dec hex filename 5318 69 0 5387 150b fs/xfs/libxfs/xfs_dir2.o Signed-off-by:
Bhumika Goyal <bhumirks@gmail.com> Reviewed-by:
Dave Chinner <dchinner@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Bhumika Goyal authored
Declare the structure xfs_item_ops as const as it is only passed as an argument to the function xfs_log_item_init. As this argument is of type const struct xfs_item_ops *, so xfs_item_ops structures having this property can be declared as const. Done using Coccinelle: @r1 disable optional_qualifier @ identifier i; position p; @@ static struct xfs_item_ops i@p = {...}; @ok1@ identifier r1.i; position p; expression e1,e2,e3; @@ xfs_log_item_init(e1,e2,e3,&i@p) @bad@ position p!={r1.p,ok1.p}; identifier r1.i; @@ i@p @depends on !bad disable optional_qualifier@ identifier r1.i; @@ static +const struct xfs_item_ops i={...}; @depends on !bad disable optional_qualifier@ identifier r1.i; @@ +const struct xfs_item_ops i; File size before: text data bss dec hex filename 737 64 8 809 329 fs/xfs/xfs_icreate_item.o File size after: text data bss dec hex filename 801 0 8 809 329 fs/xfs/xfs_icreate_item.o Signed-off-by:
Bhumika Goyal <bhumirks@gmail.com> Reviewed-by:
Eric Sandeen <sandeen@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Darrick J. Wong authored
When we're estimating the amount of space it's going to take to satisfy a delalloc reservation, we need to include the space that we might need to grow the rmapbt. This helps us to avoid running out of space later when _iomap_write_allocate needs more space than we reserved. Eryu Guan observed this happening on generic/224 when sunit/swidth were set. Reported-by:
Eryu Guan <eguan@redhat.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Dave Chinner <dchinner@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Eric Sandeen authored
When XBF_NO_IOACCT got added, it missed the translation in XFS_BUF_FLAGS, so we see "0x8" in trace output rather than the flag name. Fix it. Signed-off-by:
Eric Sandeen <sandeen@redhat.com> Reviewed-by:
Brian Foster <bfoster@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
- 24 Nov, 2016 14 commits
-
-
Christoph Hellwig authored
We only ever set a field to this constant for an impossible to reach error case in xfs_bmap_search_extents. That functions has been removed, so we can remove the constant as well. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Brian Foster <bfoster@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Now that all users are gone. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Brian Foster <bfoster@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Brian Foster <bfoster@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Brian Foster <bfoster@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
And remove the unused return value. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Brian Foster <bfoster@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Use xfs_iext_lookup_extent to look up the extent, drop a useless check, drop a unneeded return value and clean up the general style a little bit. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Brian Foster <bfoster@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Brian Foster <bfoster@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
And only lookup the previous extent inside xfs_iomap_prealloc_size if we actually need it. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Brian Foster <bfoster@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
We can easily lookup the previous extent for the cases where we need it, which saves the callers from looking it up for us later in the series. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Brian Foster <bfoster@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Brian Foster <bfoster@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Brian Foster <bfoster@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Brian Foster <bfoster@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
Rewrite the function using xfs_iext_lookup_extent and xfs_iext_get_extent, and massage the flow into something easily understandable. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Brian Foster <bfoster@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Christoph Hellwig authored
xfs_iext_lookup_extent looks up a single extent at the passed in offset, and returns the extent covering the area, or the one behind it in case of a hole, as well as the index of the returned extent in arguments, as well as a simple bool as return value that is set to false if no extent could be found because the offset is behind EOF. It is a simpler replacement for xfs_bmap_search_extent that leaves looking up the rarely needed previous extent to the caller and has a nicer calling convention. xfs_iext_get_extent is a helper for iterating over the extent list, it takes an extent index as input, and returns the extent at that index in it's expanded form in an argument if it exists. The actual return value is a bool whether the index is valid or not. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Brian Foster <bfoster@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
- 09 Nov, 2016 1 commit
-
-
Brian Foster authored
Filesystem shutdown testing on an older distro kernel has uncovered an imbalanced locking pattern for the inode flush lock in xfs_reclaim_inode(). Specifically, there is a double unlock sequence between the call to xfs_iflush_abort() and xfs_reclaim_inode() at the "reclaim:" label. This actually does not cause obvious problems on current kernels due to the current flush lock implementation. Older kernels use a counting based flush lock mechanism, however, which effectively breaks the lock indefinitely when an already unlocked flush lock is repeatedly unlocked. Though this only currently occurs on filesystem shutdown, it has reproduced the effect of elevating an fs shutdown to a system-wide crash or hang. As it turns out, the flush lock is not actually required for the reclaim logic in xfs_reclaim_inode() because by that time we have already cycled the flush lock once while holding ILOCK_EXCL. Therefore, remove the additional flush lock/unlock cycle around the 'reclaim:' label and update branches into this label to release the flush lock where appropriate. Add an assert to xfs_ifunlock() to help prevent future occurences of the same problem. Reported-by:
Zorro Lang <zlang@redhat.com> Signed-off-by:
Brian Foster <bfoster@redhat.com> Reviewed-by:
Dave Chinner <dchinner@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
- 08 Nov, 2016 2 commits
-
-
Eric Sandeen authored
The open-coded pattern: ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t) is all over the xfs code; provide a new helper xfs_iext_count(ifp) to count the number of inline extents in an inode fork. [dchinner: pick up several missed conversions] Signed-off-by:
Eric Sandeen <sandeen@redhat.com> Reviewed-by:
Brian Foster <bfoster@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
Eric Sandeen authored
There have been several reports over the years of NULL pointer dereferences in xfs_trans_log_inode during xfs_fsr processes, when the process is doing an fput and tearing down extents on the temporary inode, something like: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 PID: 29439 TASK: ffff880550584fa0 CPU: 6 COMMAND: "xfs_fsr" [exception RIP: xfs_trans_log_inode+0x10] #9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs] #10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs] #11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs] #12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs] #13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs] #14 [ffff8800a57bbe00] evict at ffffffff811e1b67 #15 [ffff8800a57bbe28] iput at ffffffff811e23a5 #16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8 #17 [ffff8800a57bbe88] dput at ffffffff811dd06c #18 [ffff8800a57bbea8] __fput at ffffffff811c823b #19 [ffff8800a57bbef0] ____fput at ffffffff811c846e #20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27 #21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c #22 [ffff8800a57bbf50] int_signal at ffffffff8161405d As it turns out, this is because the i_itemp pointer, along with the d_ops pointer, has been overwritten with zeros when we tear down the extents during truncate. When the in-core inode fork on the temporary inode used by xfs_fsr was originally set up during the extent swap, we mistakenly looked at di_nextents to determine whether all extents fit inline, but this misses extents generated by speculative preallocation; we should be using if_bytes instead. This mistake corrupts the in-memory inode, and code in xfs_iext_remove_inline eventually gets bad inputs, causing it to memmove and memset incorrect ranges; this became apparent because the two values in ifp->if_u2.if_inline_ext[1] contained what should have been in d_ops and i_itemp; they were memmoved due to incorrect array indexing and then the original locations were zeroed with memset, again due to an array overrun. Fix this by properly using i_df.if_bytes to determine the number of extents, not di_nextents. Thanks to dchinner for looking at this with me and spotting the root cause. Cc: stable@vger.kernel.org Signed-off-by:
Eric Sandeen <sandeen@redhat.com> Reviewed-by:
Brian Foster <bfoster@redhat.com> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-