    Btrfs: fix stale directory entries after fsync log replay · bb53eda9
    Filipe Manana authored
    We have another case where after an fsync log replay we get an inode with
    a wrong link count (smaller than it should be) and a number of directory
    entries greater than its link count. This happens when we add a new hard
    link to our inode A and then we fsync some other inode B that has
    the side effect of logging the parent directory inode too. In this case
    at log replay time we add the new hard link to our inode (the item with
    key BTRFS_INODE_REF_KEY) when processing the parent directory but we
    never adjust the link count of our inode A. As a result we get stale dir
    entries for our inode A that can never be deleted, which makes it
    impossible to remove the parent directory (as its i_size can never
    decrease back to 0).
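
    The invariant that gets broken (link count equal to the number of
    directory entries pointing at the inode) can be checked by hand with a
    small helper. This is only a sketch: check_link_count is a hypothetical
    name, not an fstests function, it assumes GNU stat and find, and it
    assumes every hard link of the file lives under the given directory.

```shell
#!/bin/bash
# Hypothetical helper, not part of fstests: compare the link count stored in
# an inode with the number of names under a directory that refer to it.
check_link_count()
{
    local file=$1 dir=$2
    local nlink entries

    # Link count as recorded in the inode.
    nlink=$(stat --format=%h "$file")
    # Names under $dir that resolve to the same inode as $file.
    entries=$(find "$dir" -xdev -samefile "$file" | wc -l)

    echo "link count: $nlink, directory entries: $entries"
    # Succeeds only when both numbers agree (assumes all links are in $dir).
    [ "$nlink" -eq "$entries" ]
}
```

    For example, check_link_count $SCRATCH_MNT/testdir/foo $SCRATCH_MNT/testdir
    should succeed on a consistent filesystem.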
    
    A simple reproducer for fstests that triggers this issue:
    
      seq=`basename $0`
      seqres=$RESULT_DIR/$seq
      echo "QA output created by $seq"
      tmp=/tmp/$$
      status=1	# failure is the default!
      trap "_cleanup; exit \$status" 0 1 2 3 15
    
      _cleanup()
      {
          _cleanup_flakey
          rm -f $tmp.*
      }
    
      # get standard environment, filters and checks
      . ./common/rc
      . ./common/filter
      . ./common/dmflakey
    
      # real QA test starts here
      _need_to_be_root
      _supported_fs generic
      _supported_os Linux
      _require_scratch
      _require_dm_flakey
      _require_metadata_journaling $SCRATCH_DEV
    
      rm -f $seqres.full
    
      _scratch_mkfs >>$seqres.full 2>&1
      _init_flakey
      _mount_flakey
    
      # Create our test directory and files.
      mkdir $SCRATCH_MNT/testdir
      touch $SCRATCH_MNT/testdir/foo
      touch $SCRATCH_MNT/testdir/bar
    
      # Make sure everything done so far is durably persisted.
      sync
    
      # Create one hard link for file foo and another one for file bar. After
      # that fsync only the file bar.
      ln $SCRATCH_MNT/testdir/bar $SCRATCH_MNT/testdir/bar_link
      ln $SCRATCH_MNT/testdir/foo $SCRATCH_MNT/testdir/foo_link
      $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir/bar
    
      # Silently drop all writes on scratch device to simulate power failure.
      _load_flakey_table $FLAKEY_DROP_WRITES
      _unmount_flakey
    
      # Allow writes again and mount the fs to trigger log/journal replay.
      _load_flakey_table $FLAKEY_ALLOW_WRITES
      _mount_flakey
    
      # Now verify both our files have a link count of 2.
      echo "Link count for file foo: $(stat --format=%h $SCRATCH_MNT/testdir/foo)"
      echo "Link count for file bar: $(stat --format=%h $SCRATCH_MNT/testdir/bar)"
    
      # We should be able to remove all the links of our files in testdir, and
      # after that the parent directory should be empty and therefore possible
      # to remove.
      rm -f $SCRATCH_MNT/testdir/*
      rmdir $SCRATCH_MNT/testdir
    
      _unmount_flakey
    
      # The fstests framework will call fsck against our filesystem which will verify
      # that all metadata is in a consistent state.
    
      status=0
      exit
    
    The test fails with:
    
     -Link count for file foo: 2
     +Link count for file foo: 1
      Link count for file bar: 2
     +rm: cannot remove '/home/fdmanana/btrfs-tests/scratch_1/testdir/foo_link': Stale file handle
     +rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/testdir': Directory not empty
     (...)
     _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
    
    And fsck's output:
    
      (...)
      checking fs roots
      root 5 inode 258 errors 2001, no inode item, link count wrong
          unresolved ref dir 257 index 5 namelen 8 name foo_link filetype 1 errors 4, no inode ref
      Checking filesystem on /dev/sdc
      (...)
    
    So fix this by marking inodes for link count fixup at log replay time
    whenever a directory entry is replayed if the entry was created in the
    transaction where the fsync was made and if it points to a non-directory
    inode.
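
    The rule above can be modelled as a simple predicate. This is a toy
    sketch and not the kernel code: needs_link_fixup and its arguments
    (the transaction id that created the entry, the transaction id of the
    fsync, and a file type string) are hypothetical names used only for
    illustration.

```shell
#!/bin/bash
# Toy model only: the names below are hypothetical, not kernel functions.
# Decide whether a replayed directory entry should mark its target inode
# for link count fixup.
needs_link_fixup()
{
    local entry_transid=$1 log_transid=$2 filetype=$3

    # Entry created in the transaction of the fsync, and the target inode
    # is not a directory.
    [ "$entry_transid" -eq "$log_transid" ] && [ "$filetype" != "dir" ]
}
```

    In this model the predicate is true only for entries like foo_link in
    the reproducer: created in the fsync's transaction and pointing to a
    regular file.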
    
    This isn't a new problem/regression; the issue has existed for a long
    time, possibly since the log tree feature was added (2008).

    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Chris Mason <clm@fb.com>