Btrfs: fix stale directory entries after fsync log replay
We have another case where after an fsync log replay we get an inode with a wrong link count (smaller than it should be) and a number of directory entries greater than its link count. This happens when we add a new link hard link to our inode A and then we fsync some other inode B that has the side effect of logging the parent directory inode too. In this case at log replay time we add the new hard link to our inode (the item with key BTRFS_INODE_REF_KEY) when processing the parent directory but we never adjust the link count of our inode A. As a result we get stale dir entries for our inode A that can never be deleted and therefore it makes it impossible to remove the parent directory (as its i_size can never decrease back to 0). A simple reproducer for fstests that triggers this issue: seq=`basename $0` seqres=$RESULT_DIR/$seq echo "QA output created by $seq" tmp=/tmp/$$ status=1 # failure is the default! trap "_cleanup; exit \$status" 0 1 2 3 15 _cleanup() { _cleanup_flakey rm -f $tmp.* } # get standard environment, filters and checks . ./common/rc . ./common/filter . ./common/dmflakey # real QA test starts here _need_to_be_root _supported_fs generic _supported_os Linux _require_scratch _require_dm_flakey _require_metadata_journaling $SCRATCH_DEV rm -f $seqres.full _scratch_mkfs >>$seqres.full 2>&1 _init_flakey _mount_flakey # Create our test directory and files. mkdir $SCRATCH_MNT/testdir touch $SCRATCH_MNT/testdir/foo touch $SCRATCH_MNT/testdir/bar # Make sure everything done so far is durably persisted. sync # Create one hard link for file foo and another one for file bar. After # that fsync only the file bar. ln $SCRATCH_MNT/testdir/bar $SCRATCH_MNT/testdir/bar_link ln $SCRATCH_MNT/testdir/foo $SCRATCH_MNT/testdir/foo_link $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir/bar # Silently drop all writes on scratch device to simulate power failure. _load_flakey_table $FLAKEY_DROP_WRITES _unmount_flakey # Allow writes again and mount the fs to trigger log/journal replay. _load_flakey_table $FLAKEY_ALLOW_WRITES _mount_flakey # Now verify both our files have a link count of 2. echo "Link count for file foo: $(stat --format=%h $SCRATCH_MNT/testdir/foo)" echo "Link count for file bar: $(stat --format=%h $SCRATCH_MNT/testdir/bar)" # We should be able to remove all the links of our files in testdir, and # after that the parent directory should become empty and therefore # possible to remove it. rm -f $SCRATCH_MNT/testdir/* rmdir $SCRATCH_MNT/testdir _unmount_flakey # The fstests framework will call fsck against our filesystem which will verify # that all metadata is in a consistent state. status=0 exit The test fails with: -Link count for file foo: 2 +Link count for file foo: 1 Link count for file bar: 2 +rm: cannot remove '/home/fdmanana/btrfs-tests/scratch_1/testdir/foo_link': Stale file handle +rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/testdir': Directory not empty (...) _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent And fsck's output: (...) checking fs roots root 5 inode 258 errors 2001, no inode item, link count wrong unresolved ref dir 257 index 5 namelen 8 name foo_link filetype 1 errors 4, no inode ref Checking filesystem on /dev/sdc (...) So fix this by marking inodes for link count fixup at log replay time whenever a directory entry is replayed if the entry was created in the transaction where the fsync was made and if it points to a non-directory inode. This isn't a new problem/regression, the issue exists for a long time, possibly since the log tree feature was added (2008). Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
Showing
Please register or sign in to comment