    btrfs: zstd: fix and simplify the inline extent decompression (v2) · 56596a9f
    Qu Wenruo authored
    Note: this is a fixed version that was previously reverted as
    e01a83e1 ("Revert "btrfs: zstd: fix and simplify the inline extent
    decompression""), with fixed parameters to memzero_page().
    
    [BUG]
    If we have a filesystem with 4k sectorsize, and an inlined compressed
    extent created like this:
    
    	item 4 key (257 INODE_ITEM 0) itemoff 15863 itemsize 160
    		generation 8 transid 8 size 4096 nbytes 4096
    		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
    		sequence 1 flags 0x0(none)
    	item 5 key (257 INODE_REF 256) itemoff 15839 itemsize 24
    		index 2 namelen 14 name: source_inlined
    	item 6 key (257 EXTENT_DATA 0) itemoff 15770 itemsize 69
    		generation 8 type 0 (inline)
    		inline extent data size 48 ram_bytes 4096 compression 3 (zstd)
    
    Then trying to reflink that extent on an aarch64 system with 64K page
    size fails:
    
      # xfs_io -f -c "reflink $mnt/source_inlined 0 60k 4k" $mnt/dest
      XFS_IOC_CLONE_RANGE: Input/output error
    
    [CAUSE]
    In zstd_decompress(), we didn't treat @start_byte as just a page offset,
    but also used it as an indicator of whether we should error out, without
    any proper explanation (this was copied from other decompression code).
    
    In reality, for subpage cases, although @start_byte can be non-zero,
    we should never switch input/output buffers or error out, since neither
    the input nor the output buffer can exceed one sector.
    
    Thus the current code using @start_byte as a condition to switch
    input/output buffer or finish the decompression is completely incorrect.
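    
    As a rough illustration of why @start_byte is non-zero in the failing
    case, the sketch below computes the offset of a file position inside its
    page. This is a userspace model, not kernel code, and page_offset_of()
    is a hypothetical helper introduced only for this example. The reflink
    destination offset of 60k lands at offset 61440 inside a 64K page, but
    at offset 0 inside a 4K page, so the old condition was only ever
    exercised with a non-zero value on subpage setups.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): offset of a file position
 * inside the page that contains it. page_size must be a power of two.
 */
static inline uint64_t page_offset_of(uint64_t file_offset, uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return file_offset & (page_size - 1);
}
```

    With file_offset = 61440, this yields 61440 for a 65536-byte page and 0
    for a 4096-byte page.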
    
    [FIX]
    The fix involves several modifications:
    
    - Rename @start_byte to @dest_pgoff to properly express its meaning
    
    - Use @sectorsize rather than PAGE_SIZE to properly initialize the
      output buffer size
    
    - Use correct destination offset inside the destination page
    
    - Simplify the main loop
      Since the input/output buffer should never switch, we only need one
      zstd_decompress_stream() call.
    
    - Consider early end as an error
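    
    The last three points can be modelled in userspace roughly as follows.
    This is only a sketch under stated assumptions, not the code in zstd.c:
    copy_decoded_to_page() is a hypothetical name, @decoded and @decoded_len
    stand in for whatever the single decompression call produced, and both
    the zeroing of the unfilled tail and the -EIO return value are
    assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model only (hypothetical helper, not kernel code).
 * Copy the decoded bytes to page + dest_pgoff. If fewer than destlen
 * bytes were decoded (an "early end"), zero the rest of the range and
 * report an error instead of silently succeeding.
 */
static int copy_decoded_to_page(unsigned char *page, size_t page_size,
				size_t dest_pgoff, size_t destlen,
				const unsigned char *decoded,
				size_t decoded_len)
{
	size_t to_copy = decoded_len < destlen ? decoded_len : destlen;

	/* The destination range must fit inside the page. */
	assert(dest_pgoff + destlen <= page_size);

	/* dest_pgoff is only an offset; it never decides success or failure. */
	memcpy(page + dest_pgoff, decoded, to_copy);

	if (to_copy < destlen) {
		/* Assumed behaviour: zero the tail and fail on early end. */
		memset(page + dest_pgoff + to_copy, 0, destlen - to_copy);
		return -EIO;
	}
	return 0;
}
```

    In this model, a 4096-byte result copied to offset 61440 of a 64K page
    succeeds, while a result that stops short of @destlen is reported as an
    error regardless of the value of @dest_pgoff.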
    
    After the fix, even on aarch64 with 64K page size, the above reflink now
    works as expected:
    
      # xfs_io -f -c "reflink $mnt/source_inlined 0 60k 4k" $mnt/dest
      linked 4096/4096 bytes at offset 61440
    
    And results in the correct file layout:
    
    	item 9 key (258 INODE_ITEM 0) itemoff 15542 itemsize 160
    		generation 10 transid 10 size 65536 nbytes 4096
    		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
    		sequence 1 flags 0x0(none)
    	item 10 key (258 INODE_REF 256) itemoff 15528 itemsize 14
    		index 3 namelen 4 name: dest
    	item 11 key (258 XATTR_ITEM 3817753667) itemoff 15445 itemsize 83
    		location key (0 UNKNOWN.0 0) type XATTR
    		transid 10 data_len 37 name_len 16
    		name: security.selinux
    		data unconfined_u:object_r:unlabeled_t:s0
    	item 12 key (258 EXTENT_DATA 61440) itemoff 15392 itemsize 53
    		generation 10 type 1 (regular)
    		extent data disk byte 13631488 nr 4096
    		extent data offset 0 nr 4096 ram 4096
    		extent compression 0 (none)
    
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
zstd.c 17.8 KB