    btrfs: fix directory logging due to race with concurrent index key deletion · 8bb6898d
    Filipe Manana authored
    
    
    Sometimes we log a directory without holding its VFS lock, so while we
    are logging it, dir index entries may be added or removed. This typically
    happens when logging a dentry from a parent directory that points to a
    new directory, through log_new_dir_dentries(), or when, while logging
    some other inode, we also need to log its parent directories (through
    btrfs_log_all_parents()).
    
    This means that while we are at log_dir_items(), we may not find a dir
    index key we found before, because it was deleted in the meantime, so
    a call to btrfs_search_slot() may return 1 (key not found). In that case
    we return from log_dir_items() with a success value (the variable 'err'
    has a value of 0). This can lead to a few problems, especially in the case
    where the variable 'last_offset' has a value of (u64)-1 (the value it is
    initialized to when declared):
    
    1) By returning from log_dir_items() with success (0) and a value of
       (u64)-1 for '*last_offset_ret', we end up not logging any other dir
       index keys that follow the missing, just deleted, index key. The
       (u64)-1 value makes log_directory_changes() not call log_dir_items()
       again;
    
    2) Before returning with success (0), log_dir_items() will log a dir
       index range item covering a range from the last old dentry index
       (stored in the variable 'last_old_dentry_offset') to the value of
       'last_offset'. If 'last_offset' has a value of (u64)-1, then if the
       log is persisted and replayed after a power failure, the replay will
       delete all the directory entries that have an index number between
       last_old_dentry_offset + 1 and (u64)-1;
    
    3) We can end up returning from log_dir_items() with
       ctx->last_dir_item_offset having a lower value than
       inode->last_dir_index_offset, because the former is set to the current
       key we are processing at process_dir_items_leaf(), and at the end of
       log_directory_changes() we set inode->last_dir_index_offset to the
       current value of ctx->last_dir_item_offset. So if for example a
       deletion of a lower dir index key happened, we set
       ctx->last_dir_item_offset to that index value, then if we return from
       log_dir_items() because btrfs_search_slot() returned 1, we end up
       returning from log_dir_items() with success (0) and then
       log_directory_changes() sets inode->last_dir_index_offset to a lower
       value than it had before.
       This can result in unpredictable and unexpected behaviour when we
       need to log the directory again in the same transaction, and can
       leave us with a log tree leaf that has duplicated keys, as we do
       batch insertions of dir index keys into a log tree.
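
    The effect described in item 2 can be pictured with a small userspace
    toy model. This is only a sketch, not kernel code: toy_contains() and
    toy_replay_deleted() are hypothetical helpers, and the assumption is
    that replay removes every directory entry whose index falls inside the
    logged range but which has no matching dir index key in the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if 'v' is present in the array 'a' of length 'n', else 0. */
static int toy_contains(const uint64_t *a, size_t n, uint64_t v)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] == v)
            return 1;
    return 0;
}

/*
 * Hypothetical model of log replay for one dir index range item.
 * 'dir' holds the index numbers present in the directory, 'logged' the
 * index numbers that were written to the log, and [first, last] is the
 * range covered by the logged range item. Returns how many directory
 * entries this model would delete on replay.
 */
static size_t toy_replay_deleted(const uint64_t *dir, size_t ndir,
                                 const uint64_t *logged, size_t nlog,
                                 uint64_t first, uint64_t last)
{
    size_t deleted = 0;

    assert(first <= last);
    for (size_t i = 0; i < ndir; i++) {
        /* Entries outside the logged range are left alone. */
        if (dir[i] < first || dir[i] > last)
            continue;
        /* In range but absent from the log: treated as deleted. */
        if (!toy_contains(logged, nlog, dir[i]))
            deleted++;
    }
    return deleted;
}
```

    In this model, with entries at indexes 2, 3, 5, 6 and 7, only 2 and 3
    logged, and a range of [4, UINT64_MAX], all three entries after index 3
    count as deleted; with a range of [4, 4] none of them do.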
    
    So fix this by making log_dir_items() move on to the next dir index key
    if it does not find the one it was looking for.
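
    The idea can be sketched with a userspace toy model. This is not the
    actual patch: toy_search_slot() and toy_log_dir_items() are
    hypothetical stand-ins, a sorted array plays the role of the dir index
    keys, and the 'move_on' flag only exists to contrast the old and new
    behaviour. The assumption borrowed from btrfs_search_slot() is that a
    return value of 1 still leaves the slot at the position where the
    missing key would be inserted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy stand-in for a tree search: binary search over a sorted array.
 * Returns 0 if 'key' is found and 1 if it is not. In both cases *slot is
 * set to the position of the first element >= key (n if there is none).
 */
static int toy_search_slot(const uint64_t *keys, size_t n, uint64_t key,
                           size_t *slot)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (keys[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    *slot = lo;
    return (lo < n && keys[lo] == key) ? 0 : 1;
}

/*
 * Return how many index keys, starting at 'min_key', would get logged.
 * move_on == 0 models the old behaviour: a missing key ends the logging
 * early while still reporting success. move_on == 1 models the fix:
 * continue from the next existing key.
 */
static size_t toy_log_dir_items(const uint64_t *keys, size_t n,
                                uint64_t min_key, int move_on)
{
    size_t slot;
    int ret = toy_search_slot(keys, n, min_key, &slot);

    assert(ret == 0 || ret == 1);
    if (ret == 1 && !move_on)
        return 0;       /* old: nothing after the missing key is logged */
    return n - slot;    /* keys at positions slot..n-1 are logged */
}
```

    With keys 2, 3, 5, 6, 7 and a search for the just deleted index 4, the
    old behaviour logs nothing further, while moving on logs 5, 6 and 7.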
    
    Reported-by: David Arendt <admin@prnet.org>
    Link: https://lore.kernel.org/linux-btrfs/ae169fc6-f504-28f0-a098-6fa6a4dfb612@leemhuis.info/
    CC: stable@vger.kernel.org # 4.14+
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
tree-log.c 210 KB