• Eric Sandeen's avatar
    ext4: fix race in xattr block allocation path · 6d6a4351
    Eric Sandeen authored
    Ceph users reported that when using Ceph on ext4, the filesystem
    would often become corrupted, containing inodes with incorrect
    i_blocks counters.
    
    I managed to reproduce this with a very hacked-up "streamtest"
    binary from the Ceph tree.
    
    Ceph is doing a lot of xattr writes, to out-of-inode blocks.
    There is also another thread which does sync_file_range and close,
    of the same files.  The problem appears to happen due to this race:
    
    sync/flush thread               xattr-set thread
    -----------------               ----------------
    
    do_writepages                   ext4_xattr_set
    ext4_da_writepages              ext4_xattr_set_handle
    mpage_da_map_blocks             ext4_xattr_block_set
            set DELALLOC_RESERVE
                                    ext4_new_meta_blocks
                                            ext4_mb_new_blocks
                                                    if (!i_delalloc_reserved_flag)
                                                            vfs_dq_alloc_block
    ext4_get_blocks
    	down_write(i_data_sem)
            set i_delalloc_reserved_flag
    	...
    	up_write(i_data_sem)
                                            if (i_delalloc_reserved_flag)
                                                    vfs_dq_alloc_block_nofail
    
    
    In other words, the sync/flush thread pops in and sets
    i_delalloc_reserved_flag on the inode, which makes the xattr thread
    think that it's in a delalloc path in ext4_new_meta_blocks(),
    and add the block for a second time, after already having added
    it once in the !i_delalloc_reserved_flag case in ext4_mb_new_blocks
    
    The real problem is that we shouldn't be using the DELALLOC_RESERVED
    state flag, and instead we should be passing
    EXT4_GET_BLOCKS_DELALLOC_RESERVE down to ext4_map_blocks() instead of
    using an inode state flag.  We'll fix this for now with using
    i_data_sem to prevent this race, but this is really not the right way
    to fix things.
    Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
    Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
    Cc: stable@kernel.org
    6d6a4351
xattr.c 41.9 KB