• Filipe Manana's avatar
    btrfs: fix race causing unnecessary inode logging during link and rename · de53d892
    Filipe Manana authored
    
    
    When we are doing a rename or a link operation for an inode that was logged
    in the previous transaction and that transaction is still committing, we
    have a time window where we incorrectly consider that the inode was logged
    previously in the current transaction and therefore decide to log it to
    update it in the log. The following steps give an example on how this
    happens during a link operation:
    
    1) Inode X is logged in transaction 1000, so its logged_trans field is set
       to 1000;
    
    2) Task A starts to commit transaction 1000;
    
    3) The state of transaction 1000 is changed to TRANS_STATE_UNBLOCKED;
    
    4) Task B starts a link operation for inode X, and as a consequence it
       starts transaction 1001;
    
    5) Task A is still committing transaction 1000, therefore the value stored
       at fs_info->last_trans_committed is still 999;
    
    6) Task B calls btrfs_log_new_name(), it reads a value of 999 from
       fs_info->last_trans_committed and because the logged_trans field of
       inode X has a value of 1000, the function does not return immediately,
       instead it proceeds to logging the inode, which should not happen
       because the inode was logged in the previous transaction (1000) and
       not in the current one (1001).
    
    This is not a functional problem, just wasted time and space logging an
    inode that does not need to be logged, contributing to higher latency
    for link and rename operations.
    
    So fix this by comparing the inodes' logged_trans field with the
    generation of the current transaction instead of comparing with the value
    stored in fs_info->last_trans_committed.
    
    This case is often hit when running dbench for a long enough duration, as
    it does lots of rename operations.
    
    This patch belongs to a patch set that is comprised of the following
    patches:
    
      btrfs: fix race causing unnecessary inode logging during link and rename
      btrfs: fix race that results in logging old extents during a fast fsync
      btrfs: fix race that causes unnecessary logging of ancestor inodes
      btrfs: fix race that makes inode logging fallback to transaction commit
      btrfs: fix race leading to unnecessary transaction commit when logging inode
      btrfs: do not block inode logging for so long during transaction commit
    
    Performance results are mentioned in the change log of the last patch.
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    de53d892
tree-log.c 173 KB