• Filipe Manana's avatar
    Btrfs: avoid fallback to transaction commit during fsync of files with holes · ebb92906
    Filipe Manana authored
    When we are doing a full fsync (bit BTRFS_INODE_NEEDS_FULL_SYNC set) of a
    file that has holes and has file extent items spanning two or more leafs,
    we can end up falling to back to a full transaction commit due to a logic
    bug that leads to failure to insert a duplicate file extent item that is
    meant to represent a hole between the last file extent item of a leaf and
    the first file extent item in the next leaf. The failure (EEXIST error)
    leads to a transaction commit (as most errors when logging an inode do).
    
    For example, we have the two following leafs:
    
    Leaf N:
    
      -----------------------------------------------
      | ..., ..., ..., (257, FILE_EXTENT_ITEM, 64K) |
      -----------------------------------------------
      The file extent item at the end of leaf N has a length of 4Kb,
      representing the file range from 64K to 68K - 1.
    
    Leaf N + 1:
    
      -----------------------------------------------
      | (257, FILE_EXTENT_ITEM, 72K), ..., ..., ... |
      -----------------------------------------------
      The file extent item at the first slot of leaf N + 1 has a length of
      4Kb too, representing the file range from 72K to 76K - 1.
    
    During the full fsync path, when we are at tree-log.c:copy_items() with
    leaf N as a parameter, after processing the last file extent item, that
    represents the extent at offset 64K, we take a look at the first file
    extent item at the next leaf (leaf N + 1), and notice there's a 4K hole
    between the two extents, and therefore we insert a file extent item
    representing that hole, starting at file offset 68K and ending at offset
    72K - 1. However we don't update the value of *last_extent, which is used
    to represent the end offset (plus 1, non-inclusive end) of the last file
    extent item inserted in the log, so it stays with a value of 68K and not
    with a value of 72K.
    
    Then, when copy_items() is called for leaf N + 1, because the value of
    *last_extent is smaller then the offset of the first extent item in the
    leaf (68K < 72K), we look at the last file extent item in the previous
    leaf (leaf N) and see it there's a 4K gap between it and our first file
    extent item (again, 68K < 72K), so we decide to insert a file extent item
    representing the hole, starting at file offset 68K and ending at offset
    72K - 1, this insertion will fail with -EEXIST being returned from
    btrfs_insert_file_extent() because we already inserted a file extent item
    representing a hole for this offset (68K) in the previous call to
    copy_items(), when processing leaf N.
    
    The -EEXIST error gets propagated to the fsync callback, btrfs_sync_file(),
    which falls back to a full transaction commit.
    
    Fix this by adjusting *last_extent after inserting a hole when we had to
    look at the next leaf.
    
    Fixes: 4ee3fad3 ("Btrfs: fix fsync after hole punching when using no-holes feature")
    Cc: stable@vger.kernel.org # 4.14+
    Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    ebb92906
tree-log.c 172 KB