1. 30 Mar, 2018 10 commits
  2. 26 Mar, 2018 30 commits
    • Btrfs: dev-replace: make sure target is identical to source when raid56 rebuild fails · 4759700a
      Liu Bo authored
      In the last step of scrub_handle_error_block, we try to combine good
      copies on all possible mirrors. This works fine for raid1 and raid10,
      but not for raid56 as it's doing parity rebuild.
      
      If parity rebuild doesn't come back with correct data that matches its
      checksum, in case of replace we'd rather write what is stored in the
      source device than the data calculated from parity.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      4759700a
    • Btrfs: raid56: remove redundant async_missing_raid56 · d6a69135
      Liu Bo authored
      async_missing_raid56() is identical to async_read_rebuild().
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      d6a69135
    • btrfs: adjust return values of btrfs_inode_by_name · 005d6712
      Su Yue authored
      Previously, btrfs_inode_by_name() returned 0, which left the caller to
      check the objectid of location even if the type of location was invalid.
      
      Let btrfs_inode_by_name() return -EUCLEAN if a corrupted location of a
      dir entry is found.  Removal of label out_err also simplifies the
      function.
      Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ drop unlikely ]
      Signed-off-by: David Sterba <dsterba@suse.com>
      005d6712
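The return-value change described above can be sketched in userspace C. This is only an illustration, not the kernel function: the struct, the enum and the set of key types treated as valid are assumptions made for the example, and all `sketch_` names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* Linux errno value, fallback for hosts that lack it */
#endif

/* Hypothetical key types; which types count as valid is an assumption. */
enum sketch_key_type {
	SKETCH_INODE_ITEM_KEY = 1,
	SKETCH_ROOT_ITEM_KEY = 132,
};

/* Hypothetical, simplified stand-in for a dir entry location key. */
struct sketch_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/*
 * Return 0 and fill *location only when the key type is sane.
 * A corrupted type yields -EUCLEAN, so the caller never has to
 * inspect location->objectid for an entry it cannot trust.
 */
static int sketch_inode_by_name(const struct sketch_key *found,
				struct sketch_key *location)
{
	if (found->type != SKETCH_INODE_ITEM_KEY &&
	    found->type != SKETCH_ROOT_ITEM_KEY)
		return -EUCLEAN;

	*location = *found;
	return 0;
}
```

With this shape the caller only needs to test the return value; the validity check lives in one place instead of being repeated after every lookup.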
    • btrfs: rename btrfs_close_extra_device to btrfs_free_extra_devids · 9b99b115
      Anand Jain authored
      The function btrfs_close_extra_devices() is about freeing
      extra devids which may once have belonged to this filesystem.
      So rename it and add a comment. The _devids suffix is
      appropriate as this function won't handle devices which are
      outside of the filesystem being mounted.
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      9b99b115
    • btrfs: Remove root argument from cow_file_range_inline · d02c0e20
      Nikolay Borisov authored
      This argument is always set to the root of the inode, which is also
      passed. So let's get a reference inside the function and simplify
      the arg list.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      d02c0e20
    • Btrfs: send: fix typo in TLV_PUT · 895a72be
      Liu Bo authored
      According to tlv_put()'s prototype, data and attrlen need to be
      exchanged in the macro, but it seems all callers are already aware of
      this misorder and are therefore not affected.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      895a72be
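Why a misordered macro parameter list can be harmless is easy to show in isolation. The sketch below is not the btrfs code: `sketch_put()`, the buffer type and both macros are hypothetical, and the only point is that a function-like macro forwards its arguments by position, so swapped parameter names do not change behavior as long as callers pass values in the order the underlying function expects.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical output buffer, standing in for a send stream buffer. */
struct sketch_buf {
	unsigned char bytes[64];
	size_t used;
};

/* Hypothetical helper taking the payload first and its length second. */
static int sketch_put(struct sketch_buf *b, const void *data, size_t len)
{
	if (len > sizeof(b->bytes) - b->used)
		return -1;
	memcpy(b->bytes + b->used, data, len);
	b->used += len;
	return 0;
}

/*
 * Parameter names are in the wrong order relative to sketch_put(),
 * but they are forwarded positionally, so the call still works.
 */
#define SKETCH_PUT_MISNAMED(b, attrlen, data) \
	sketch_put((b), (attrlen), (data))

/* Same expansion with names that match the helper's prototype. */
#define SKETCH_PUT_FIXED(b, data, attrlen) \
	sketch_put((b), (data), (attrlen))
```

Both macros expand to the same call for the same argument list, which is consistent with the commit's remark that existing callers were not affected; the fix is about readability of the macro definition.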
    • btrfs: Remove root argument from btrfs_log_dentry_safe · e5b84f7a
      Nikolay Borisov authored
      Now that nothing uses the root arg of btrfs_log_dentry_safe it can be
      safely removed. No functional changes.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      e5b84f7a
    • btrfs: Remove root arg from btrfs_log_inode_parent · f882274b
      Nikolay Borisov authored
      btrfs_log_inode_parent is called from 2 places (btrfs_log_dentry_safe
      and btrfs_log_new_name) both of which pass inode->root as the root
      argument and the inode itself. Remove the redundant root argument and
      get a reference to the root directly from the inode, also remove
      redundant root != inode->root check from the same function. No
      functional change.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      f882274b
    • btrfs: Remove redundant comment from btrfs_search_forward · 448f3a17
      Nikolay Borisov authored
      This function always sets keep_locks to 1 and saves the old value of
      keep_locks which is restored at the end. So there is no way it can be
      called without keep_locks being set. Remove comment imposing redundant
      requirement on callers.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      448f3a17
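The save, set and restore pattern that makes the caller-side requirement redundant can be sketched as follows. This is a simplified illustration under assumed names: `sketch_path`, `sketch_walk()` and `sketch_search_forward()` are hypothetical and do not reproduce the real search logic.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a tree search path. */
struct sketch_path {
	bool keep_locks;
	bool seen_during_search; /* records the flag value while searching */
};

/* Stand-in for the body of the search; it only observes the flag. */
static void sketch_walk(struct sketch_path *p)
{
	p->seen_during_search = p->keep_locks;
}

static int sketch_search_forward(struct sketch_path *p)
{
	bool old_keep_locks = p->keep_locks; /* save the caller's setting */

	p->keep_locks = true;                /* the search always runs with it set */
	sketch_walk(p);
	p->keep_locks = old_keep_locks;      /* restore before returning */
	return 0;
}
```

Because the function forces the flag on internally and puts the old value back afterwards, a caller that enters with the flag cleared still gets a search that ran with it set.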
    • btrfs: move btrfs_listxattr prototype to xattr.h · 738c93d4
      David Sterba authored
      There's a proper header for xattr handlers.
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      738c93d4
    • btrfs: adjust return type of btrfs_getxattr · bcadd705
      David Sterba authored
      The xattr_handler::get prototype returns int, use it. The only ssize_t
      exception is the per-inode listxattr handler.
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      bcadd705
    • btrfs: drop extern from function declarations · ab0d0936
      David Sterba authored
      Extern for functions does not make any difference, there are only a few
      so let's remove them before it's too late.
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      ab0d0936
    • David Sterba's avatar
    • Btrfs: send, do not issue unnecessary truncate operations · ffa7c429
      Filipe Manana authored
      When send finishes processing an inode representing a regular file, it
      always issues a truncate operation for that file, even if its size did
      not change or the last write sets the file size correctly. In the most
      common cases, the issued write operations set the file to correct size
      (either full or incremental sends) or the file size did not change (for
      incremental sends), so the only case where a truncate operation is needed
      is when a file size becomes smaller in the send snapshot when compared
      to the parent snapshot.
      
      By not issuing unnecessary truncate operations we reduce the stream size
      and save time in the receiver. Currently truncating a file to the same
      size triggers writeback of its last page (if it's dirty) and waits for it
      to complete (only if the file size is not aligned with the filesystem's
      sector size). This is being fixed by another patch and is independent of
      this change (that patch's title is "Btrfs: skip writeback of last page
      when truncating file to same size").
      
      The following script was used to measure time spent by a receiver without
      this change applied, with this change applied, and without this change and
      with the truncate fix applied (the fix to not make it start and wait for
      writeback to complete).
      
        $ cat test_send.sh
        #!/bin/bash
      
        SRC_DEV=/dev/sdc
        DST_DEV=/dev/sdd
        SRC_MNT=/mnt/sdc
        DST_MNT=/mnt/sdd
      
        mkfs.btrfs -f $SRC_DEV >/dev/null
        mkfs.btrfs -f $DST_DEV >/dev/null
        mount $SRC_DEV $SRC_MNT
        mount $DST_DEV $DST_MNT
      
        echo "Creating source filesystem"
        for ((t = 0; t < 10; t++)); do
            (
                for ((i = 1; i <= 20000; i++)); do
                    xfs_io -f -c "pwrite -S 0xab 0 5000" \
                        $SRC_MNT/file_$i > /dev/null
                done
            ) &
           worker_pids[$t]=$!
        done
        wait ${worker_pids[@]}
      
        echo "Creating and sending snapshot"
        btrfs subvolume snapshot -r $SRC_MNT $SRC_MNT/snap1 >/dev/null
        /usr/bin/time -f "send took %e seconds"    \
               btrfs send -f $SRC_MNT/send_file $SRC_MNT/snap1
        /usr/bin/time -f "receive took %e seconds" \
               btrfs receive -f $SRC_MNT/send_file $DST_MNT
      
        umount $SRC_MNT
        umount $DST_MNT
      
      The results, which are averages for 5 runs for each case, were the
      following:
      
      * Without this change
      
      average receive time was 26.49 seconds
      standard deviation of 2.53 seconds
      
      * Without this change and with the truncate fix
      
      average receive time was 12.51 seconds
      standard deviation of 0.32 seconds
      
      * With this change and without the truncate fix
      
      average receive time was 10.02 seconds
      standard deviation of 1.11 seconds
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      ffa7c429
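The decision described in this commit message, when a trailing truncate is still needed, can be sketched as a small predicate. This is an interpretation of the message text rather than the kernel code: the struct, its fields and `sketch_need_truncate()` are hypothetical, and real send logic may track more state than this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-inode state a sender might track. */
struct sketch_send_inode {
	bool has_parent;         /* incremental send: inode exists in the parent snapshot */
	uint64_t parent_size;    /* file size in the parent snapshot */
	uint64_t send_size;      /* file size in the snapshot being sent */
	uint64_t last_write_end; /* offset just past the last issued write, 0 if none */
};

/*
 * A truncate is redundant when the last write already left the file
 * at its final size, or when the size did not change relative to the
 * parent snapshot. What remains is mainly the shrinking-file case.
 */
static bool sketch_need_truncate(const struct sketch_send_inode *ino)
{
	if (ino->last_write_end == ino->send_size)
		return false;
	if (ino->has_parent && ino->parent_size == ino->send_size)
		return false;
	return true;
}
```

For the benchmark in the message, every file is written once from offset 0 to 5000 bytes, so under this sketch the last write ends exactly at the file size and no truncate would be emitted for any of them.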
    • Btrfs: skip writeback of last page when truncating file to same size · 213e8c55
      Filipe Manana authored
      When we truncate a file to the same size and that size is not aligned
      with the sector size, we end up triggering writeback (and wait for it to
      complete) of the last page. This is unnecessary as we can not have delayed
      allocation beyond the inode's i_size and the goal of truncating a file
      to its own size is to discard prealloc extents (allocated via the
      fallocate(2) system call). Besides the unnecessary IO start and wait, it
      also breaks the opportunity for larger contiguous extents on disk, as
      before the last dirty page there might be other dirty pages.
      
      This scenario is probably not very common in general, however it is
      common for btrfs receive implementations because currently the send
      stream always issues a truncate operation for each processed inode as
      the last operation for that inode (this truncate operation is not
      always needed and the send implementation will be addressed to avoid
      them).
      
      So improve this by not starting and waiting for writeback of the inode's
      last page when we are truncating to exactly the same size.
      
      The following script was used to quickly measure the time a receive
      operation takes:
      
       $ cat test_send.sh
       #!/bin/bash
      
       SRC_DEV=/dev/sdc
       DST_DEV=/dev/sdd
       SRC_MNT=/mnt/sdc
       DST_MNT=/mnt/sdd
      
       mkfs.btrfs -f $SRC_DEV >/dev/null
       mkfs.btrfs -f $DST_DEV >/dev/null
       mount $SRC_DEV $SRC_MNT
       mount $DST_DEV $DST_MNT
      
       echo "Creating source filesystem"
       for ((t = 0; t < 10; t++)); do
           (
               for ((i = 1; i <= 20000; i++)); do
                   xfs_io -f -c "pwrite -S 0xab 0 5000" \
                      $SRC_MNT/file_$i > /dev/null
               done
           ) &
           worker_pids[$t]=$!
       done
       wait ${worker_pids[@]}
      
       echo "Creating and sending snapshot"
       btrfs subvolume snapshot -r $SRC_MNT $SRC_MNT/snap1 >/dev/null
       /usr/bin/time -f "send took %e seconds"    \
           btrfs send -f $SRC_MNT/send_file $SRC_MNT/snap1
       /usr/bin/time -f "receive took %e seconds" \
           btrfs receive -f $SRC_MNT/send_file $DST_MNT
      
       umount $SRC_MNT
       umount $DST_MNT
      
      The results for 5 runs were the following:
      
      * Without this change
      
      average receive time was 26.49 seconds
      standard deviation of 2.53 seconds
      
      * With this change
      
      average receive time was 12.51 seconds
      standard deviation of 0.32 seconds
      Reported-by: Robbie Ko <robbieko@synology.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      213e8c55
    • Btrfs: dev-replace: skip prealloc extents when copy nocow pages · ed5d5f37
      Liu Bo authored
      It doesn't make sense to process prealloc extents as pages will be
      filled with zero when reading prealloc extents.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      ed5d5f37
    • btrfs: unify types for metadata_ratio and data_chunk_allocations · d612ac59
      Anand Jain authored
      We have btrfs_fs_info::data_chunk_allocations and
      btrfs_fs_info::metadata_ratio declared as unsigned, which is the same
      as unsigned int, and kernel style prefers unsigned int over bare unsigned.
      So this patch changes them to u32.
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      d612ac59
    • btrfs: Remove redundant memory barriers around dio_private error status · de224b7c
      Nikolay Borisov authored
      Using any kind of memory barriers around atomic operations which have
      a return value is redundant, since those operations themselves are
      fully ordered. atomic_t.txt states:
      
          - RMW operations that have a return value are fully ordered;
      
          Fully ordered primitives are ordered against everything prior and
          everything subsequent. Therefore a fully ordered primitive is like
          having an smp_mb() before and an smp_mb() after the primitive.
      
      Given this let's replace the extra memory barriers with comments.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      de224b7c
    • btrfs: remove assert in btrfs_init_dev_replace_tgtdev() · 16db5758
      Anand Jain authored
      In the same function we just ran btrfs_alloc_device() which means the
      btrfs_device::resized_list is sure to be empty and we are protected
      with the btrfs_fs_info::volume_mutex.
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      16db5758
    • btrfs: add more __cold annotations · e67c718b
      David Sterba authored
      The __cold functions are placed in a special section, as they're
      expected to be called rarely. This could help i-cache prefetches or help
      the compiler decide which branches are more/less likely to be taken
      without any other annotations needed.
      
      Though we can't add more __exit annotations, it's still possible to add
      __cold (that's also added with __exit). That way the following function
      categories are tagged:
      
      - printf wrappers, error messages
      - exit helpers
      Signed-off-by: David Sterba <dsterba@suse.com>
      e67c718b
    • btrfs: add (the only possible) __exit annotation · ffc5a379
      David Sterba authored
      Recently, the __init annotations have been added. There's unfortunately
      only one case where we can add __exit, because most of the cleanup
      helpers are also called from the __init phase.
      
      As the __exit annotated functions get discarded completely for
      built-in code, we'd miss them from the init phase.
      Signed-off-by: David Sterba <dsterba@suse.com>
      ffc5a379
    • btrfs: verify subvolid mount parameter · ccb0e7d1
      Anand Jain authored
      We aren't verifying the parameter passed to the subvolid mount option,
      so we won't report and fail the mount if a junk value is specified, for
      example -o subvolid=abc.
      This patch verifies the subvolid option with match_u64.
      
      Up to now the memparse function accepted the K/M/G suffixes, which are
      usually meant for size values and do not make sense for a subvolume id.
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ update changelog ]
      Signed-off-by: David Sterba <dsterba@suse.com>
      ccb0e7d1
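The difference between lenient and strict parsing of an id can be illustrated in userspace C. This sketch does not use the kernel's match_u64 or memparse; `sketch_parse_u64_strict()` is a hypothetical helper built on `strtoull`, and it assumes decimal input only, which may differ from what the kernel helper accepts.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a decimal id strictly: the whole string must be digits.
 * Junk such as "abc" or a size suffix such as "5K" is rejected
 * with -EINVAL instead of being silently accepted or ignored.
 */
static int sketch_parse_u64_strict(const char *s, uint64_t *out)
{
	char *end;
	unsigned long long v;

	/* Reject empty strings, signs and leading whitespace up front. */
	if (!isdigit((unsigned char)s[0]))
		return -EINVAL;

	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno == ERANGE || *end != '\0')
		return -EINVAL;

	*out = (uint64_t)v;
	return 0;
}
```

A mount-option handler built this way can fail the mount early with a clear error, rather than proceeding with whatever value a permissive parser happened to produce.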
    • Btrfs: fix unexpected cow in run_delalloc_nocow · 58113753
      Liu Bo authored
      Fstests generic/475 provides a way to fail metadata reads while
      checking if checksum exists for the inode inside run_delalloc_nocow(),
      and csum_exist_in_range() interprets error (-EIO) as inode having
      checksum and makes its caller enter the cow path.
      
      In case of free space inode, this ends up with a warning in
      cow_file_range().
      
      The same problem applies to btrfs_cross_ref_exist() since it may also
      read metadata in between.
      
      With this, run_delalloc_nocow() bails out when errors occur at the two
      places.
      
      cc: <stable@vger.kernel.org> v2.6.28+
      Fixes: 17d217fe ("Btrfs: fix nodatasum handling in balancing code")
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      58113753
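The bug class described here, an error code being mistaken for a positive answer, can be sketched with a three-way return value. The helper below is hypothetical and only models the decision; it assumes a lookup that returns a negative errno on failure, 0 when no checksum exists and a positive value when one does.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Decide between the COW and NOCOW write paths from the result of a
 * checksum lookup (csum_ret). Treating every nonzero result as
 * "checksum exists" would send I/O errors down the COW path, so
 * negative values are propagated to the caller instead.
 */
static int sketch_pick_write_path(int csum_ret, bool *use_cow)
{
	if (csum_ret < 0)
		return csum_ret;     /* bail out, leave *use_cow untouched */

	*use_cow = (csum_ret > 0);   /* checksum present: data must be COWed */
	return 0;
}
```

The same three-way handling applies to any other lookup on that path that may read metadata and fail, which is the point the commit makes about the cross-reference check.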
    • btrfs: Remove custom crc32c init code · 9678c543
      Nikolay Borisov authored
      The custom crc32 init code was introduced in
      14a958e6 ("Btrfs: fix btrfs boot when compiled as built-in") to
      enable using btrfs as a built-in. However, later as pointed out by
      60efa5eb ("Btrfs: use late_initcall instead of module_init") this
      wasn't enough and finally btrfs was switched to late_initcall which
      comes after the generic crc32c implementation is initialised. The
      latter commit superseded the former. Now that we don't have to
      maintain our own code let's just remove it and switch to using the
      generic implementation.
      
      Despite touching a lot of files the patch is really simple. Here is the gist of
      the changes:
      
      1. Select LIBCRC32C rather than the low-level modules.
      2. s/btrfs_crc32c/crc32c/g
      3. replace hash.h with linux/crc32c.h
      4. Move the btrfs namehash funcs to ctree.h and change the tree accordingly.
      
      I've tested this with btrfs being both a module and a built-in and xfstest
      doesn't complain.
      
      This does seem to fix the longstanding problem of not automatically selecting
      the crc32c module when btrfs is used. Possibly there is a workaround in
      dracut.
      
      The modinfo confirms that now all the module dependencies are there:
      
      before:
      depends:        zstd_compress,zstd_decompress,raid6_pq,xor,zlib_deflate
      
      after:
      depends:        libcrc32c,zstd_compress,zstd_decompress,raid6_pq,xor,zlib_deflate
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ add more info to changelog from mails ]
      Signed-off-by: David Sterba <dsterba@suse.com>
      9678c543
    • libcrc32c: Add crc32c_impl function · df91f56a
      Nikolay Borisov authored
      This function returns a string with the currently in-use implementation
      of the crc32c algorithm, i.e. crc32c-generic (for the unoptimised, generic
      implementation) or crc32c-intel for the SSE optimised version. This
      will be used by btrfs.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      [ use crypto_shash_driver_name as suggested by Herbert ]
      Signed-off-by: David Sterba <dsterba@suse.com>
      df91f56a
    • btrfs: Refactor __get_raid_index() to btrfs_bg_flags_to_raid_index() · 3e72ee88
      Qu Wenruo authored
      Function __get_raid_index() is used to convert block group flags into
      raid index, which can be used to get various info directly from
      btrfs_raid_array[].
      
      Refactor this function a little:
      
      1) Rename to btrfs_bg_flags_to_raid_index()
         Double underscore prefix is normally for internal functions, while the
         function is used by both extent-tree and volumes.
      
         Although the name is a little longer, it should explain its usage
         quite well.
      
      2) Move it to volumes.h and make it static inline
         Just several if-else branches, really no need to define it as a normal
         function.
      
         This also makes later code re-use between kernel and btrfs-progs
         easier.
      
      3) Remove function get_block_group_index()
         Really no need to do such a simple thing as an exported function.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      3e72ee88
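A flags-to-index helper of the kind this refactoring describes, a few if-else branches suitable for a static inline in a header, might look like the sketch below. The `SKETCH_` names, the bit positions and the enum order are assumptions chosen for illustration and should not be read as the on-disk format or the kernel's definitions.

```c
#include <stdint.h>

/* Assumed profile bits, for illustration only. */
#define SKETCH_BG_RAID0  (1ULL << 3)
#define SKETCH_BG_RAID1  (1ULL << 4)
#define SKETCH_BG_DUP    (1ULL << 5)
#define SKETCH_BG_RAID10 (1ULL << 6)
#define SKETCH_BG_RAID5  (1ULL << 7)
#define SKETCH_BG_RAID6  (1ULL << 8)

/* Index into a per-profile table; the order is an assumption. */
enum sketch_raid_index {
	SKETCH_RAID_RAID10,
	SKETCH_RAID_RAID1,
	SKETCH_RAID_DUP,
	SKETCH_RAID_RAID0,
	SKETCH_RAID_SINGLE,
	SKETCH_RAID_RAID5,
	SKETCH_RAID_RAID6,
	SKETCH_NR_RAID_TYPES
};

/* Convert block group flags to a table index; no profile bit means single. */
static inline enum sketch_raid_index
sketch_bg_flags_to_raid_index(uint64_t flags)
{
	if (flags & SKETCH_BG_RAID10)
		return SKETCH_RAID_RAID10;
	else if (flags & SKETCH_BG_RAID1)
		return SKETCH_RAID_RAID1;
	else if (flags & SKETCH_BG_DUP)
		return SKETCH_RAID_DUP;
	else if (flags & SKETCH_BG_RAID0)
		return SKETCH_RAID_RAID0;
	else if (flags & SKETCH_BG_RAID5)
		return SKETCH_RAID_RAID5;
	else if (flags & SKETCH_BG_RAID6)
		return SKETCH_RAID_RAID6;

	return SKETCH_RAID_SINGLE;
}
```

Keeping the helper as a header-only static inline means both users can share it without an exported symbol, and the same header text can be reused by userspace tooling.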
    • btrfs: tree-checker: Replace root parameter with fs_info · 2f659546
      Qu Wenruo authored
      When inspecting the error message with real corruption, the "root=%llu"
      always shows "1" (root tree), instead of the correct owner.
      
      The problem is that we are getting @root from page->mapping->host, which
      points to the same btree inode, so we will always get the same root.
      
      This makes the root owner output meaningless, and makes it harder to
      port tree-checker to btrfs-progs.
      
      So get rid of the false and meaningless @root parameter and replace it
      with @fs_info.
      To get the owner, we can only rely on btrfs_header_owner() now.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      2f659546
    • Btrfs: add tracepoint for em's EEXIST case · 393da918
      Liu Bo authored
      This is adding a tracepoint 'btrfs_handle_em_exist' to help debug the
      subtle bugs around merge_extent_mapping.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      393da918
    • btrfs: Move qgroup rescan on quota enable to btrfs_quota_enable · 5d23515b
      Nikolay Borisov authored
      Currently btrfs_run_qgroups is doing a bit too much. Not only is it
      responsible for synchronizing in-memory state of qgroups to disk but
      it also contains code to trigger the initial qgroup rescan when
      quota is enabled initially. This condition is detected by checking that
      BTRFS_FS_QUOTA_ENABLED is not set and BTRFS_FS_QUOTA_ENABLING is set.
      Nothing really requires the code to be structured (and scattered)
      the way it is so let's streamline things. First move the quota rescan
      code into btrfs_quota_enable, where its invocation is closer to the
      use. This also makes the FS_QUOTA_ENABLING flag redundant so let's
      remove it as well.
      
      This has been tested with a full xfstest run with qgroups enabled on
      the scratch device of every xfstest and no regressions were observed.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      5d23515b
    • btrfs: use reada direction enum instead of constant value in load_free_space_tree · 7ce311d5
      Gu JinXiang authored
      load_free_space_tree calls either load_free_space_bitmaps or
      load_free_space_extents, and either of those two ends up calling
      btrfs_next_item.  So in function load_free_space_tree, use READA_FORWARD
      to read forward ahead.
      
      This also changes the value from READA_BACK to READA_FORWARD, since
      according to the logic, it should reada_for_search forward, not
      backward.
      Signed-off-by: Gu JinXiang <gujx@cn.fujitsu.com>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      [ update changelog ]
      Signed-off-by: David Sterba <dsterba@suse.com>
      7ce311d5