    Btrfs: fix cloning range with a hole when using the NO_HOLES feature · fcb97058
    Filipe Manana authored
    When using the NO_HOLES feature, if we clone a range that contains a hole
    and a temporary ENOSPC happens while dropping extents from the target
    inode's range, we can end up failing and aborting the transaction with
    -EEXIST, or with a corrupt file extent item that has a length greater
    than it should and overlaps with other extents. For example, when cloning
    the following range from inode A to inode B:
    
      Inode A:
    
        extent A1                                          extent A2
      [ ----------- ]  [ hole, implicit, 4MB length ]  [ ------------- ]
      0            1MB                                 5MB            6MB
    
      Range to clone: [1MB, 6MB)
    
      Inode B:
    
        extent B1       extent B2        extent B3         extent B4
      [ ---------- ]  [ --------- ]    [ ---------- ]    [ ---------- ]
      0           1MB 1MB        2MB   2MB        5MB    5MB         6MB
    
      Target range: [1MB, 6MB) (same as source, to make it easier to explain)
    
    The following can happen:
    
    1) btrfs_punch_hole_range() gets -ENOSPC from __btrfs_drop_extents();
    
    2) At that point, 'cur_offset' is set to 1MB and __btrfs_drop_extents()
       has set 'drop_end' to 2MB, meaning it was able to drop only extent B2;
    
    3) We then compute 'clone_len' as 'drop_end' - 'cur_offset' = 2MB - 1MB =
       1MB;
    
    4) We then attempt to insert a file extent item at inode B with a file
       offset of 5MB, which is the value of clone_info->file_offset. This
       fails with error -EEXIST because there's already an extent at that
       offset (extent B4);
    
    5) We abort the current transaction with -EEXIST and return that error
       to user space as well.
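
    The steps above can be reproduced with a small user-space model. This is
    only a sketch under stated assumptions: 'toy_inode', 'toy_drop_range()',
    'toy_insert()' and 'toy_buggy_clone_step()' are hypothetical names that
    do not exist in btrfs, the temporary ENOSPC is simulated by capping the
    number of extents dropped per call, and extents are assumed to be kept
    sorted by file offset.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MB (1024ULL * 1024ULL)
#define TOY_MAX_EXTENTS 8

/* Toy stand-in for a file extent item covering [start, start + len). */
struct toy_extent {
    uint64_t start;
    uint64_t len;
};

/* Toy stand-in for an inode: extents kept sorted by 'start'. */
struct toy_inode {
    struct toy_extent ext[TOY_MAX_EXTENTS];
    size_t count;
};

/* Fails with -EEXIST if an extent already starts at 'start'. */
static int toy_insert(struct toy_inode *ino, uint64_t start, uint64_t len)
{
    for (size_t i = 0; i < ino->count; i++)
        if (ino->ext[i].start == start)
            return -EEXIST;
    assert(ino->count < TOY_MAX_EXTENTS);
    ino->ext[ino->count++] = (struct toy_extent){ .start = start, .len = len };
    return 0;
}

/*
 * Drops extents lying fully inside [start, end). After 'max_drops' extents
 * it gives up with -ENOSPC (the simulated temporary ENOSPC) and leaves
 * '*drop_end' at the end of the last extent it managed to drop.
 */
static int toy_drop_range(struct toy_inode *ino, uint64_t start, uint64_t end,
                          size_t max_drops, uint64_t *drop_end)
{
    size_t i = 0, dropped = 0;

    *drop_end = start;
    while (i < ino->count) {
        struct toy_extent *e = &ino->ext[i];

        if (e->start < start || e->start + e->len > end) {
            i++; /* outside the range, keep it */
            continue;
        }
        if (dropped == max_drops)
            return -ENOSPC;
        *drop_end = e->start + e->len;
        /* Remove slot i by shifting the tail down one position. */
        for (size_t j = i + 1; j < ino->count; j++)
            ino->ext[j - 1] = ino->ext[j];
        ino->count--;
        dropped++;
    }
    *drop_end = end;
    return 0;
}

/* Steps 1-4 with the pre-fix logic: insert at the clone's file offset
 * no matter how far the drop actually got. */
int toy_buggy_clone_step(struct toy_inode *dst, uint64_t cur_offset,
                         uint64_t end, uint64_t clone_file_offset,
                         size_t max_drops)
{
    uint64_t drop_end, clone_len;
    int ret = toy_drop_range(dst, cur_offset, end, max_drops, &drop_end);

    if (ret != -ENOSPC)
        return ret;
    clone_len = drop_end - cur_offset;
    return toy_insert(dst, clone_file_offset, clone_len);
}

/* Inode B from the first example: B1..B4 covering [0, 6MB). */
struct toy_inode toy_example1_inode_b(void)
{
    struct toy_inode b = { .count = 0 };

    toy_insert(&b, 0 * MB, 1 * MB); /* B1 */
    toy_insert(&b, 1 * MB, 1 * MB); /* B2 */
    toy_insert(&b, 2 * MB, 3 * MB); /* B3 */
    toy_insert(&b, 5 * MB, 1 * MB); /* B4 */
    return b;
}
```

    Calling toy_buggy_clone_step(&b, 1 * MB, 6 * MB, 5 * MB, 1) on the inode
    returned by toy_example1_inode_b() drops only B2 and then collides with
    B4, so the model returns -EEXIST just as in step 4.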
    
    Another example, for extent corruption:
    
      Inode A:
    
        extent A1                                           extent A2
      [ ----------- ]   [ hole, implicit, 10MB length ]  [ ------------- ]
      0            1MB                                  11MB            12MB
    
      Inode B:
    
        extent B1         extent B2
      [ ----------- ]   [ --------- ]    [ ----------------------------- ]
      0            1MB 1MB         5MB  5MB                             12MB
    
      Target range: [1MB, 12MB) (same as source, to make it easier to explain)
    
    1) btrfs_punch_hole_range() gets -ENOSPC from __btrfs_drop_extents();
    
    2) At that point, 'cur_offset' is set to 1MB and __btrfs_drop_extents()
       has set 'drop_end' to 5MB, meaning it was able to drop only extent B2;
    
    3) We then compute 'clone_len' as 'drop_end' - 'cur_offset' = 5MB - 1MB =
       4MB;
    
    4) We then insert a file extent item at inode B with a file offset of
       11MB, which is the value of clone_info->file_offset, and a length of
       4MB (the value of 'clone_len'). So we get 2 extent items with ranges
       that overlap, and the new item has a length of 4MB, larger than the
       extent A2 from inode A (1MB length);
    
    5) After that we end the transaction, balance the btree dirty pages and
       then either start a new transaction or join the previous one. It might
       happen that the transaction which inserted the incorrect extent was
       committed by another task, so we end up with extent corruption if a
       power failure happens.
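
    The arithmetic of this second example can be checked with a short
    user-space sketch. The names 'ovl_extent', 'ovl_buggy_clone_extent()' and
    'ovl_count_overlaps()' are hypothetical and exist only for illustration;
    the sketch merely recomputes the bad item from step 4 and counts
    overlapping pairs among half-open ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MB (1024ULL * 1024ULL)

/* Toy file extent item covering [start, start + len). */
struct ovl_extent {
    uint64_t start;
    uint64_t len;
};

/* True if the two half-open ranges share at least one byte. */
static bool ovl_ranges_overlap(struct ovl_extent a, struct ovl_extent b)
{
    return a.start < b.start + b.len && b.start < a.start + a.len;
}

/* Counts the pairs of extents in 'ext' whose ranges overlap. */
size_t ovl_count_overlaps(const struct ovl_extent *ext, size_t n)
{
    size_t pairs = 0;

    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (ovl_ranges_overlap(ext[i], ext[j]))
                pairs++;
    return pairs;
}

/*
 * The item built by the pre-fix logic: placed at the clone's file offset,
 * but sized from how far the drop got ('drop_end' - 'cur_offset').
 */
struct ovl_extent ovl_buggy_clone_extent(uint64_t cur_offset,
                                         uint64_t drop_end,
                                         uint64_t clone_file_offset)
{
    assert(drop_end >= cur_offset);
    return (struct ovl_extent){
        .start = clone_file_offset,
        .len = drop_end - cur_offset,
    };
}
```

    With cur_offset = 1MB, drop_end = 5MB and a clone file offset of 11MB, the
    item covers [11MB, 15MB). It overlaps the remaining extent [5MB, 12MB) of
    inode B and is 4MB long, although extent A2 is only 1MB long.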
    
    So fix this by making sure we attempt to insert the extent to clone at
    the destination inode only if we are past dropping the sub-range that
    corresponds to a hole.
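
    A minimal sketch of such a guard follows. It is a user-space
    illustration only: 'fix_clone_info' and 'fix_clone_len_after_drop()' are
    hypothetical names, the struct keeps just the two fields needed here, and
    the actual change in btrfs_punch_hole_range() may be shaped differently.

```c
#include <assert.h>
#include <stdint.h>

#define MB (1024ULL * 1024ULL)

/* Hypothetical, trimmed-down stand-in for the clone bookkeeping. */
struct fix_clone_info {
    uint64_t file_offset; /* target offset where the extent to clone starts */
    uint64_t data_len;    /* bytes of that extent still to be cloned */
};

/*
 * Called after a drop stopped at 'drop_end'. Returns how many bytes of the
 * extent may be inserted now, and advances the bookkeeping accordingly.
 * Returns 0 while the drop has not yet gone past the hole sub-range, so
 * nothing is inserted at a target offset that still holds old extents.
 */
uint64_t fix_clone_len_after_drop(struct fix_clone_info *ci, uint64_t drop_end)
{
    uint64_t clone_len;

    if (drop_end <= ci->file_offset)
        return 0; /* still dropping the part that corresponds to the hole */

    /* Length is measured from the extent's own offset, not from cur_offset. */
    clone_len = drop_end - ci->file_offset;
    assert(clone_len <= ci->data_len);

    ci->file_offset += clone_len;
    ci->data_len -= clone_len;
    return clone_len;
}
```

    In the first example, a drop that stops at 2MB yields 0 for an extent to
    clone at 5MB, so no insertion is attempted; once a later drop reaches 6MB
    the full 1MB is inserted at 5MB.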
    
    Fixes: 690a5dbf ("Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning extents")
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>