1. 09 Sep, 2019 40 commits
    • btrfs: sysfs: unexport btrfs_raid_ktype · 536ea45c
      David Sterba authored
      The last non-sysfs usage of btrfs_raid_ktype has been moved to a private
      helper in the previous patch so the variable can be made static.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: factor sysfs code out of link_block_group · 32a9991f
      David Sterba authored
      The part of link_block_group that just creates the sysfs object is
      independent and can be factored out to a helper.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: move sysfs declarations out of ctree.h · 89439109
      David Sterba authored
      As the header for sysfs code already exists, use it to clean up ctree.h.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: opencode reset of all device stats · ae4b9b4c
      Anand Jain authored
      __btrfs_reset_dev_stats() is a small helper function that resets device
      stat values and is used only once, so just open code it.
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: reset device stat using btrfs_dev_stat_set · 4e411a7d
      Anand Jain authored
      btrfs_dev_stat_reset() is overkill as a wrapper, so this patch open
      codes it.
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: qgroup: Try our best to delete qgroup relations · 73798c46
      Qu Wenruo authored
      When we try to delete qgroups, we're pretty cautious: we make sure both
      qgroups exist and there is a relationship between them, then try to
      delete the relation.

      This behavior is OK, but the problem is that we need to delete two
      relation items, and if the first item deletion fails, we error out,
      leaving the other relation item in the qgroup tree.

      Sometimes the error from del_qgroup_relation_item() could just be
      -ENOENT, thus we can ignore that error and continue without any problem.

      Furthermore, such cautious behavior makes qgroup relation deletion
      impossible for orphan relation items.

      This patch will enhance __del_qgroup_relation():
      - If both qgroups and their relation items exist
        Go through the regular deletion routine and update their accounting
        if needed.

      - If any qgroup or relation item doesn't exist
        Then we still try to delete the orphan items anyway, but don't trigger
        the accounting update.

      By this, we try our best to remove relation items, and can handle orphan
      relation items properly, while still keeping the existing behavior for a
      good qgroup tree.
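A minimal userspace sketch of the best-effort idea described above, not the kernel code itself. The `relation_items` struct and the `del_item()` and `del_relation_best_effort()` helpers are hypothetical names introduced for illustration, and the accounting update is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for the two relation items of one relation. */
struct relation_items {
	bool src_to_dst;	/* item for the (src, dst) direction */
	bool dst_to_src;	/* item for the (dst, src) direction */
};

/* Delete one item; returns -ENOENT when the item is already absent. */
static int del_item(bool *item)
{
	if (!*item)
		return -ENOENT;
	*item = false;
	return 0;
}

/*
 * Best-effort removal: always attempt both deletions, tolerate -ENOENT
 * from either one, and report -ENOENT only when both items were missing.
 */
static int del_relation_best_effort(struct relation_items *r)
{
	int ret, ret2;

	assert(r != NULL);

	ret = del_item(&r->src_to_dst);
	if (ret < 0 && ret != -ENOENT)
		return ret;

	ret2 = del_item(&r->dst_to_src);
	if (ret2 < 0 && ret2 != -ENOENT)
		return ret2;

	/* At least one deletion succeeded, so the relation is gone. */
	return (ret == 0 || ret2 == 0) ? 0 : -ENOENT;
}
```

Called on an orphan, where only one direction is present, the helper still removes the remaining item and returns 0.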
      Reported-by: Andrei Borzenkov <arvidjaar@gmail.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: clarify btrfs_ioctl_get_dev_stats padding · 73a3ca20
      Hans van Kranenburg authored
      In commit c11d2c23 ("Btrfs: add ioctl to get and reset the device
      stats") the get_dev_stats ioctl was added.
      
      Shortly thereafter, in commit b27f7c0c ("btrfs: join DEV_STATS
      ioctls to one"), the flags field was added.  However, the calculation
      for unused padding space was not updated, which also invalidated the
      comment.
      
      Clarify what happened to reduce confusion and wasted time for anyone
      implementing this.
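The arithmetic behind a stale padding comment like this can be sketched as follows. This is an illustration only: the struct names, the `STAT_VALUES_MAX` value of 5 and the `128 - 2 - N` padding expression are assumptions chosen to mirror the situation described, not a copy of the real header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value, for illustration only. */
#define STAT_VALUES_MAX 5

/* Hypothetical layout with two header words: pads to exactly 128 words. */
struct dev_stats_two_headers {
	uint64_t devid;
	uint64_t nr_items;
	uint64_t values[STAT_VALUES_MAX];
	uint64_t unused[128 - 2 - STAT_VALUES_MAX];
};

/* Same padding expression after a flags word is added: one word too many. */
struct dev_stats_three_headers {
	uint64_t devid;
	uint64_t nr_items;
	uint64_t flags;
	uint64_t values[STAT_VALUES_MAX];
	uint64_t unused[128 - 2 - STAT_VALUES_MAX];
};

/* Bytes by which the second layout exceeds the intended 1 KiB. */
static size_t padding_overshoot(void)
{
	/* Guard the unsigned subtraction below. */
	assert(sizeof(struct dev_stats_three_headers) >= 1024);
	return sizeof(struct dev_stats_three_headers) - 1024;
}
```

Under these assumptions the first layout is 1024 bytes and the second is 1032, so a "pad to 1k" comment no longer matches the struct once the extra field is present.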
      Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: make test_find_first_clear_extent_bit fail on incorrect results · 202f64ef
      Filipe Manana authored
      If any call to find_first_clear_extent_bit() returns an unexpected result,
      the test should fail and not just print an error message, otherwise
      regressions are much harder to notice.
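The difference between printing and failing can be sketched in plain C. The `range` struct and the `check_range()` and `run_checks()` helpers are hypothetical names used only for this illustration; they are not the self-test's actual functions.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct range {
	uint64_t start;
	uint64_t end;
};

/* Report a mismatch and also propagate it as an error code. */
static int check_range(const struct range *got, const struct range *want)
{
	if (got->start != want->start || got->end != want->end) {
		fprintf(stderr,
			"unexpected range [%" PRIu64 ", %" PRIu64 "], expected [%" PRIu64 ", %" PRIu64 "]\n",
			got->start, got->end, want->start, want->end);
		return -EINVAL;
	}
	return 0;
}

/* Stop at the first failed check so the whole run is reported as failed. */
static int run_checks(const struct range *got, const struct range *want, size_t n)
{
	assert(got != NULL && want != NULL);

	for (size_t i = 0; i < n; i++) {
		int ret = check_range(&got[i], &want[i]);

		if (ret)
			return ret;
	}
	return 0;
}
```

A caller that only looks at the return value now sees the failure, instead of having to notice a message in the log.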
      
      Fixes: 1eaebb34 ("btrfs: Don't trim returned range based on input value in find_first_clear_extent_bit")
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: fix memory leaks in the test test_find_first_clear_extent_bit · cdf52bd9
      Filipe Manana authored
      The test creates an extent io tree and sets several ranges with the
      CHUNK_ALLOCATED and CHUNK_TRIMMED bits, resulting in the allocation of
      several extent state structures. However the test never clears those
      ranges, resulting in memory leaks of the extent state structures.
      
      This is detected when CONFIG_BTRFS_DEBUG is set once we remove the
      btrfs module (rmmod btrfs):
      
      [57399.787918] BTRFS: state leak: start 67108864 end 75497471 state 1 in tree 1 refs 1
      [57399.790155] BTRFS: state leak: start 33554432 end 67108863 state 33 in tree 1 refs 1
      [57399.791941] BTRFS: state leak: start 1048576 end 4194303 state 33 in tree 1 refs 1
      [57399.793753] BTRFS: state leak: start 67108864 end 75497471 state 1 in tree 1 refs 1
      [57399.795188] BTRFS: state leak: start 33554432 end 67108863 state 33 in tree 1 refs 1
      [57399.796453] BTRFS: state leak: start 1048576 end 4194303 state 33 in tree 1 refs 1
      [57399.797765] BTRFS: state leak: start 67108864 end 75497471 state 1 in tree 1 refs 1
      [57399.799049] BTRFS: state leak: start 33554432 end 67108863 state 33 in tree 1 refs 1
      [57399.800142] BTRFS: state leak: start 1048576 end 4194303 state 33 in tree 1 refs 1
      [57399.801126] BTRFS: state leak: start 67108864 end 75497471 state 1 in tree 1 refs 1
      [57399.802106] BTRFS: state leak: start 33554432 end 67108863 state 33 in tree 1 refs 1
      [57399.803119] BTRFS: state leak: start 1048576 end 4194303 state 33 in tree 1 refs 1
      [57399.804153] BTRFS: state leak: start 67108864 end 75497471 state 1 in tree 1 refs 1
      [57399.805196] BTRFS: state leak: start 33554432 end 67108863 state 33 in tree 1 refs 1
      [57399.806191] BTRFS: state leak: start 1048576 end 4194303 state 33 in tree 1 refs 1
      
      The start and end offsets reported correspond exactly to the ranges
      used by the test.
      
      So fix that by clearing all the ranges when the test finishes.
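A userspace sketch of the leak and its fix, assuming a much simpler structure than the real extent io tree. The `state` and `tree` types and the `set_range()` and `clear_all()` helpers are hypothetical; the `live` counter stands in for the leak detection that runs at module unload.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one allocated state record. */
struct state {
	uint64_t start;
	uint64_t end;
	struct state *next;
};

struct tree {
	struct state *head;
	int live;	/* number of state records currently allocated */
};

/* Record a range; returns 0 on success, -1 if the allocation fails. */
static int set_range(struct tree *t, uint64_t start, uint64_t end)
{
	struct state *s;

	assert(t != NULL && start <= end);

	s = malloc(sizeof(*s));
	if (!s)
		return -1;
	s->start = start;
	s->end = end;
	s->next = t->head;
	t->head = s;
	t->live++;
	return 0;
}

/* Free every record; meant to run when the test finishes. */
static void clear_all(struct tree *t)
{
	assert(t != NULL);

	while (t->head) {
		struct state *s = t->head;

		t->head = s->next;
		free(s);
		t->live--;
	}
}
```

Without the final `clear_all()` call, `live` stays non-zero after the test, which is the userspace analogue of the state leak messages shown above.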
      
      Fixes: 1eaebb34 ("btrfs: Don't trim returned range based on input value in find_first_clear_extent_bit")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: delete debugfs code · b33151e7
      David Sterba authored
      Replaced by the sysfs exports that provide a more fine grained interface
      for filesystem debugging.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: sysfs: add debugging exports · 6e369feb
      David Sterba authored
      Add 'debug' directories to global sysfs and per-filesystem. This will
      replace the debugfs directory. The sysfs location is simpler and builds
      on top of the existing file hierarchy so there will hopefully be no more
      questions about the sample debugfs file.

      The directory is called 'debug' and only present under
      CONFIG_BTRFS_DEBUG so this will not affect production builds.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: make caching_thread use btrfs_find_next_key · 6a9fb468
      Josef Bacik authored
      extent-tree.c has a find_next_key that just walks up the path to find
      the next key, but it is used for both the caching stuff and the snapshot
      delete stuff.  The snapshot deletion stuff is special so it can't really
      use btrfs_find_next_key, but the caching thread stuff can.  We just need
      to fix btrfs_find_next_key to deal with ->skip_locking and then it works
      exactly the same as the private find_next_key helper.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: temporarily export fragment_free_space · caa4efaf
      Josef Bacik authored
      This is used in caching and reading block groups, so export it while we
      move these chunks independently.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: export the caching control helpers · e3cb339f
      Josef Bacik authored
      Man, a lot of people use this stuff.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: export the excluded extents helpers · 6f410d1b
      Josef Bacik authored
      We'll need this to move the caching stuff around.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: export the block group caching helpers · 676f1f75
      Josef Bacik authored
      This will make it so we can move them easily.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ coding style updates ]
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: migrate nocow and reservation helpers · 3eeb3226
      Josef Bacik authored
      These are relatively straightforward as well.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: migrate the block group ref counting stuff · 3cad1284
      Josef Bacik authored
      Another easy set to move over to block-group.c.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: migrate the block group lookup code · 2e405ad8
      Josef Bacik authored
      Move these bits first as they are the easiest to move.  Export two of
      the helpers so they can be moved all at once.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ minor style updates ]
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: move basic block_group definitions to their own header · aac0023c
      Josef Bacik authored
      This is prep work for moving all of the block group cache code into its
      own file.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ minor comment updates ]
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: move btrfs_add_free_space out of a header file · 478b4d9f
      Josef Bacik authored
      This is prep work for moving block_group_cache around.  Having this in
      the header file means the header file includes need to be in a certain
      order, which is awkward, so just move it into free-space-cache.c and
      then we can re-arrange later.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • f64ce7b8
    • btrfs: tree-log: convert defines to enums · e13976cf
      David Sterba authored
      Used only for in-memory state tracking.
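As a generic illustration of this kind of conversion, the sketch below groups two preprocessor constants under one enum type. All names here (`WALK_STAGE_*`, `enum walk_stage`, `stage_name()`) are hypothetical and are not the identifiers touched by the patch.

```c
#include <assert.h>

/* Before: loose preprocessor constants (hypothetical names). */
#define WALK_STAGE_PIN		0
#define WALK_STAGE_REPLAY	1

/* After: the same values grouped under a single enum type. */
enum walk_stage {
	STAGE_PIN,
	STAGE_REPLAY,
};

/* With an enum, a switch documents and covers the full set of values. */
static const char *stage_name(enum walk_stage s)
{
	assert(s == STAGE_PIN || s == STAGE_REPLAY);

	switch (s) {
	case STAGE_PIN:
		return "pin";
	case STAGE_REPLAY:
		return "replay";
	}
	return "unknown";
}
```

Because the values are only used for in-memory state, changing how they are declared does not affect anything stored on disk.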
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: remove unused key type set/get helpers · 82253cb6
      David Sterba authored
      The switch to open coded set/get happened a long time ago in
      962a298f ("btrfs: kill the key type accessor helpers"), remove the
      stray helpers.
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: remove unused btrfs_device::flush_bio_sent · adf4c0c5
      David Sterba authored
      The status of the flush bio has been tracked as a status bit since commit
      1c3063b6 ("btrfs: cleanup device states define
      BTRFS_DEV_STATE_FLUSH_SENT"), but the flush_bio_sent member was forgotten.
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: remove unnecessary condition in btrfs_clone() to avoid too much nesting · b64119b5
      Filipe Manana authored
      The bulk of the work done when cloning extents, at ioctl.c:btrfs_clone(),
      is done inside an if statement that checks if the found key has the type
      BTRFS_EXTENT_DATA_KEY. That if statement is redundant however, because
      btrfs_search_slot() always leaves us in a leaf slot that points to a key
      that is always greater than or equal to the search key, and our search
      key here has that type, and because we bail out before that if statement
      if the key at the given leaf slot is greater than BTRFS_EXTENT_DATA_KEY.
      
      Therefore just remove that if statement, not only because it is useless
      but mostly because it increases the nesting/indentation level in this
      function which is quite big and makes things a bit awkward whenever I need
      to fix something that requires changing btrfs_clone() (and it has been
      like that for many years already).
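The reasoning about why the type check is implied can be sketched with a simplified key ordering. This is an illustration under stated assumptions: the `key` struct, the linear `search_slot()`, the `count_processed()` loop and the numeric type value 108 are stand-ins, not the code in ioctl.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed numeric value of the extent data key type, for illustration only. */
#define EXTENT_DATA_TYPE 108

struct key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Order keys by (objectid, type, offset). */
static int key_cmp(const struct key *a, const struct key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Index of the first key >= target in a sorted array, or n if there is none. */
static size_t search_slot(const struct key *keys, size_t n,
			  const struct key *target)
{
	size_t i = 0;

	while (i < n && key_cmp(&keys[i], target) < 0)
		i++;
	return i;
}

/* Count the keys a clone-style loop would process for one objectid. */
static size_t count_processed(const struct key *keys, size_t n,
			      uint64_t objectid, uint64_t offset)
{
	const struct key target = { objectid, EXTENT_DATA_TYPE, offset };
	size_t processed = 0;

	for (size_t i = search_slot(keys, n, &target); i < n; i++) {
		/* Bail out past the extent data items of this objectid. */
		if (keys[i].type > EXTENT_DATA_TYPE ||
		    keys[i].objectid != objectid)
			break;
		/* Implied by key >= target plus the bail-out above. */
		assert(keys[i].type == EXTENT_DATA_TYPE);
		processed++;
	}
	return processed;
}
```

Since every key reached after the bail-out has the same objectid and a type that is neither smaller nor greater than the searched type, an extra `if` on the type can never be false.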
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: Refactor btrfs_calc_avail_data_space · 559ca6ea
      Nikolay Borisov authored
      Simplify the code by removing variables that don't bring any real value
      as well as simplifying the checks when building the candidate list of
      devices. No functional changes.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: Remove unnecessary check from join_running_log_trans · e678934c
      Nikolay Borisov authored
      join_running_log_trans checks btrfs_root::log_root outside of
      btrfs_root::log_mutex to avoid contention on the mutex. Turns out this
      check is not necessary because the two callers of join_running_log_trans
      (both of which deal with removing entries from the tree-log during
      unlink) explicitly check whether the respective inode has been logged in
      the current transaction.

      If it hasn't then it won't have any items in the tree-log and the call
      path will return before calling join_running_log_trans. If the check
      passes, however, then it's guaranteed that btrfs_root::log_root is set
      because the inode is logged.

      Those guarantees allow us to remove the speculative check as well as the
      implicit and tricky memory barrier.
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: wake up inode cache waiters sooner to reduce waiting time · 32e53440
      Filipe Manana authored
      If we need to start an inode caching thread, because none currently exists
      on disk, we can wake up all waiters as soon as we mark the range starting
      at root's highest objectid + 1 and ending at BTRFS_LAST_FREE_OBJECTID as
      free, so that they don't need to wait for the caching thread to start and
      make some progress. We follow the same approach within the caching thread,
      since as soon as it finds a free range and marks it as free space in the
      cache, it wakes up all waiters. So improve this by adding such a wakeup
      call after marking that initial range as free space.

      Fixes: a47d6b70 ("Btrfs: setup free ino caching in a more asynchronous way")
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: fix inode cache waiters hanging on path allocation failure · 9d123a35
      Filipe Manana authored
      If the caching thread fails to allocate a path, it returns without waking
      up any cache waiters, leaving them hanging forever. Fix this by following
      the same approach as when we fail to start the caching thread: print an
      error message, disable inode caching and make the waiters fall back to
      non-caching mode behaviour (calling btrfs_find_free_objectid()).

      Fixes: 581bb050 ("Btrfs: Cache free inode numbers in memory")
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: fix inode cache waiters hanging on failure to start caching thread · a68ebe07
      Filipe Manana authored
      If we fail to start the inode caching thread, we print an error message
      and disable the inode cache, however we never wake up any waiters, so they
      hang forever waiting for the caching to finish. Fix this by waking them
      up and having them fall back to a call to btrfs_find_free_objectid().

      Fixes: e60efa84 ("Btrfs: avoid triggering bug_on() when we fail to start inode caching task")
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: fix inode cache block reserve leak on failure to allocate data space · 29d47d00
      Filipe Manana authored
      If we failed to allocate the data extent(s) for the inode space cache, we
      were bailing out without releasing the previously reserved metadata. This
      was triggering the following warnings when unmounting a filesystem:
      
        $ cat -n fs/btrfs/inode.c
        (...)
        9268  void btrfs_destroy_inode(struct inode *inode)
        9269  {
        (...)
        9276          WARN_ON(BTRFS_I(inode)->block_rsv.reserved);
        9277          WARN_ON(BTRFS_I(inode)->block_rsv.size);
        (...)
        9281          WARN_ON(BTRFS_I(inode)->csum_bytes);
        9282          WARN_ON(BTRFS_I(inode)->defrag_bytes);
        (...)
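The leak pattern behind those warnings, a reservation taken before a step that can fail and not returned on the error path, can be sketched in userspace as follows. The `rsv` struct and the `reserve_metadata()`, `release_metadata()` and `save_cache()` helpers are hypothetical, and releasing on the success path as well is a simplification made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-inode block reserve. */
struct rsv {
	uint64_t reserved;
};

static int reserve_metadata(struct rsv *r, uint64_t bytes)
{
	r->reserved += bytes;
	return 0;
}

static void release_metadata(struct rsv *r, uint64_t bytes)
{
	assert(r->reserved >= bytes);
	r->reserved -= bytes;
}

/*
 * Reserve metadata, then try to allocate data space. The reservation is
 * returned on every exit path, including the data allocation failure.
 */
static int save_cache(struct rsv *r, bool data_alloc_fails)
{
	const uint64_t bytes = 4096;
	int ret;

	ret = reserve_metadata(r, bytes);
	if (ret)
		return ret;

	ret = data_alloc_fails ? -ENOSPC : 0;
	if (ret)
		goto out_release;

	/* ... write out the cache here ... */

out_release:
	release_metadata(r, bytes);
	return ret;
}
```

If the error path returned directly instead of jumping to `out_release`, `reserved` would stay non-zero, which is the condition the first `WARN_ON()` in the listing above checks for.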
      
      Several fstests test cases triggered this often, such as generic/083,
      generic/102, generic/172, generic/269 and generic/300 at least, producing
      stack traces like the following in dmesg/syslog:
      
        [82039.079546] WARNING: CPU: 2 PID: 13167 at fs/btrfs/inode.c:9276 btrfs_destroy_inode+0x203/0x270 [btrfs]
        (...)
        [82039.081543] CPU: 2 PID: 13167 Comm: umount Tainted: G        W         5.2.0-rc4-btrfs-next-50 #1
        [82039.081912] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
        [82039.082673] RIP: 0010:btrfs_destroy_inode+0x203/0x270 [btrfs]
        (...)
        [82039.083913] RSP: 0018:ffffac0b426a7d30 EFLAGS: 00010206
        [82039.084320] RAX: ffff8ddf77691158 RBX: ffff8dde29b34660 RCX: 0000000000000002
        [82039.084736] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8dde29b34660
        [82039.085156] RBP: ffff8ddf5fbec000 R08: 0000000000000000 R09: 0000000000000000
        [82039.085578] R10: ffffac0b426a7c90 R11: ffffffffb9aad768 R12: ffffac0b426a7db0
        [82039.086000] R13: ffff8ddf5fbec0a0 R14: dead000000000100 R15: 0000000000000000
        [82039.086416] FS:  00007f8db96d12c0(0000) GS:ffff8de036b00000(0000) knlGS:0000000000000000
        [82039.086837] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [82039.087253] CR2: 0000000001416108 CR3: 00000002315cc001 CR4: 00000000003606e0
        [82039.087672] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [82039.088089] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [82039.088504] Call Trace:
        [82039.088918]  destroy_inode+0x3b/0x70
        [82039.089340]  btrfs_free_fs_root+0x16/0xa0 [btrfs]
        [82039.089768]  btrfs_free_fs_roots+0xd8/0x160 [btrfs]
        [82039.090183]  ? wait_for_completion+0x65/0x1a0
        [82039.090607]  close_ctree+0x172/0x370 [btrfs]
        [82039.091021]  generic_shutdown_super+0x6c/0x110
        [82039.091427]  kill_anon_super+0xe/0x30
        [82039.091832]  btrfs_kill_super+0x12/0xa0 [btrfs]
        [82039.092233]  deactivate_locked_super+0x3a/0x70
        [82039.092636]  cleanup_mnt+0x3b/0x80
        [82039.093039]  task_work_run+0x93/0xc0
        [82039.093457]  exit_to_usermode_loop+0xfa/0x100
        [82039.093856]  do_syscall_64+0x162/0x1d0
        [82039.094244]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [82039.094634] RIP: 0033:0x7f8db8fbab37
        (...)
        [82039.095876] RSP: 002b:00007ffdce35b468 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        [82039.096290] RAX: 0000000000000000 RBX: 0000560d20b00060 RCX: 00007f8db8fbab37
        [82039.096700] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000560d20b00240
        [82039.097110] RBP: 0000560d20b00240 R08: 0000560d20b00270 R09: 0000000000000015
        [82039.097522] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f8db94bce64
        [82039.097937] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffdce35b6f0
        [82039.098350] irq event stamp: 0
        [82039.098750] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        [82039.099150] hardirqs last disabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.099545] softirqs last  enabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.099925] softirqs last disabled at (0): [<0000000000000000>] 0x0
        [82039.100292] ---[ end trace f2521afa616ddccc ]---
        [82039.100707] WARNING: CPU: 2 PID: 13167 at fs/btrfs/inode.c:9277 btrfs_destroy_inode+0x1ac/0x270 [btrfs]
        (...)
        [82039.103050] CPU: 2 PID: 13167 Comm: umount Tainted: G        W         5.2.0-rc4-btrfs-next-50 #1
        [82039.103428] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
        [82039.104203] RIP: 0010:btrfs_destroy_inode+0x1ac/0x270 [btrfs]
        (...)
        [82039.105461] RSP: 0018:ffffac0b426a7d30 EFLAGS: 00010206
        [82039.105866] RAX: ffff8ddf77691158 RBX: ffff8dde29b34660 RCX: 0000000000000002
        [82039.106270] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8dde29b34660
        [82039.106673] RBP: ffff8ddf5fbec000 R08: 0000000000000000 R09: 0000000000000000
        [82039.107078] R10: ffffac0b426a7c90 R11: ffffffffb9aad768 R12: ffffac0b426a7db0
        [82039.107487] R13: ffff8ddf5fbec0a0 R14: dead000000000100 R15: 0000000000000000
        [82039.107894] FS:  00007f8db96d12c0(0000) GS:ffff8de036b00000(0000) knlGS:0000000000000000
        [82039.108309] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [82039.108723] CR2: 0000000001416108 CR3: 00000002315cc001 CR4: 00000000003606e0
        [82039.109146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [82039.109567] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [82039.109989] Call Trace:
        [82039.110405]  destroy_inode+0x3b/0x70
        [82039.110830]  btrfs_free_fs_root+0x16/0xa0 [btrfs]
        [82039.111257]  btrfs_free_fs_roots+0xd8/0x160 [btrfs]
        [82039.111675]  ? wait_for_completion+0x65/0x1a0
        [82039.112101]  close_ctree+0x172/0x370 [btrfs]
        [82039.112519]  generic_shutdown_super+0x6c/0x110
        [82039.112988]  kill_anon_super+0xe/0x30
        [82039.113439]  btrfs_kill_super+0x12/0xa0 [btrfs]
        [82039.113861]  deactivate_locked_super+0x3a/0x70
        [82039.114278]  cleanup_mnt+0x3b/0x80
        [82039.114685]  task_work_run+0x93/0xc0
        [82039.115083]  exit_to_usermode_loop+0xfa/0x100
        [82039.115476]  do_syscall_64+0x162/0x1d0
        [82039.115863]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [82039.116254] RIP: 0033:0x7f8db8fbab37
        (...)
        [82039.117463] RSP: 002b:00007ffdce35b468 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        [82039.117882] RAX: 0000000000000000 RBX: 0000560d20b00060 RCX: 00007f8db8fbab37
        [82039.118330] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000560d20b00240
        [82039.118743] RBP: 0000560d20b00240 R08: 0000560d20b00270 R09: 0000000000000015
        [82039.119159] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f8db94bce64
        [82039.119574] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffdce35b6f0
        [82039.119987] irq event stamp: 0
        [82039.120387] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        [82039.120787] hardirqs last disabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.121182] softirqs last  enabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.121563] softirqs last disabled at (0): [<0000000000000000>] 0x0
        [82039.121933] ---[ end trace f2521afa616ddccd ]---
        [82039.122353] WARNING: CPU: 2 PID: 13167 at fs/btrfs/inode.c:9278 btrfs_destroy_inode+0x1bc/0x270 [btrfs]
        (...)
        [82039.124606] CPU: 2 PID: 13167 Comm: umount Tainted: G        W         5.2.0-rc4-btrfs-next-50 #1
        [82039.125008] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
        [82039.125801] RIP: 0010:btrfs_destroy_inode+0x1bc/0x270 [btrfs]
        (...)
        [82039.126998] RSP: 0018:ffffac0b426a7d30 EFLAGS: 00010202
        [82039.127399] RAX: ffff8ddf77691158 RBX: ffff8dde29b34660 RCX: 0000000000000002
        [82039.127803] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8dde29b34660
        [82039.128206] RBP: ffff8ddf5fbec000 R08: 0000000000000000 R09: 0000000000000000
        [82039.128611] R10: ffffac0b426a7c90 R11: ffffffffb9aad768 R12: ffffac0b426a7db0
        [82039.129020] R13: ffff8ddf5fbec0a0 R14: dead000000000100 R15: 0000000000000000
        [82039.129428] FS:  00007f8db96d12c0(0000) GS:ffff8de036b00000(0000) knlGS:0000000000000000
        [82039.129846] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [82039.130261] CR2: 0000000001416108 CR3: 00000002315cc001 CR4: 00000000003606e0
        [82039.130684] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [82039.131142] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [82039.131561] Call Trace:
        [82039.131990]  destroy_inode+0x3b/0x70
        [82039.132417]  btrfs_free_fs_root+0x16/0xa0 [btrfs]
        [82039.132844]  btrfs_free_fs_roots+0xd8/0x160 [btrfs]
        [82039.133262]  ? wait_for_completion+0x65/0x1a0
        [82039.133688]  close_ctree+0x172/0x370 [btrfs]
        [82039.134157]  generic_shutdown_super+0x6c/0x110
        [82039.134575]  kill_anon_super+0xe/0x30
        [82039.134997]  btrfs_kill_super+0x12/0xa0 [btrfs]
        [82039.135415]  deactivate_locked_super+0x3a/0x70
        [82039.135832]  cleanup_mnt+0x3b/0x80
        [82039.136239]  task_work_run+0x93/0xc0
        [82039.136637]  exit_to_usermode_loop+0xfa/0x100
        [82039.137029]  do_syscall_64+0x162/0x1d0
        [82039.137418]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [82039.137812] RIP: 0033:0x7f8db8fbab37
        (...)
        [82039.139059] RSP: 002b:00007ffdce35b468 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        [82039.139475] RAX: 0000000000000000 RBX: 0000560d20b00060 RCX: 00007f8db8fbab37
        [82039.139890] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000560d20b00240
        [82039.140302] RBP: 0000560d20b00240 R08: 0000560d20b00270 R09: 0000000000000015
        [82039.140719] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f8db94bce64
        [82039.141138] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffdce35b6f0
        [82039.141597] irq event stamp: 0
        [82039.142043] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        [82039.142443] hardirqs last disabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.142839] softirqs last  enabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.143220] softirqs last disabled at (0): [<0000000000000000>] 0x0
        [82039.143588] ---[ end trace f2521afa616ddcce ]---
        [82039.167472] WARNING: CPU: 3 PID: 13167 at fs/btrfs/extent-tree.c:10120 btrfs_free_block_groups+0x30d/0x460 [btrfs]
        (...)
        [82039.173800] CPU: 3 PID: 13167 Comm: umount Tainted: G        W         5.2.0-rc4-btrfs-next-50 #1
        [82039.174847] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
        [82039.177031] RIP: 0010:btrfs_free_block_groups+0x30d/0x460 [btrfs]
        (...)
        [82039.180397] RSP: 0018:ffffac0b426a7dd8 EFLAGS: 00010206
        [82039.181574] RAX: ffff8de010a1db40 RBX: ffff8de010a1db40 RCX: 0000000000170014
        [82039.182711] RDX: ffff8ddff4380040 RSI: ffff8de010a1da58 RDI: 0000000000000246
        [82039.183817] RBP: ffff8ddf5fbec000 R08: 0000000000000000 R09: 0000000000000000
        [82039.184925] R10: ffff8de036404380 R11: ffffffffb8a5ea00 R12: ffff8de010a1b2b8
        [82039.186090] R13: ffff8de010a1b2b8 R14: 0000000000000000 R15: dead000000000100
        [82039.187208] FS:  00007f8db96d12c0(0000) GS:ffff8de036b80000(0000) knlGS:0000000000000000
        [82039.188345] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [82039.189481] CR2: 00007fb044005170 CR3: 00000002315cc006 CR4: 00000000003606e0
        [82039.190674] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [82039.191829] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [82039.192978] Call Trace:
        [82039.194160]  close_ctree+0x19a/0x370 [btrfs]
        [82039.195315]  generic_shutdown_super+0x6c/0x110
        [82039.196486]  kill_anon_super+0xe/0x30
        [82039.197645]  btrfs_kill_super+0x12/0xa0 [btrfs]
        [82039.198696]  deactivate_locked_super+0x3a/0x70
        [82039.199619]  cleanup_mnt+0x3b/0x80
        [82039.200559]  task_work_run+0x93/0xc0
        [82039.201505]  exit_to_usermode_loop+0xfa/0x100
        [82039.202436]  do_syscall_64+0x162/0x1d0
        [82039.203339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [82039.204091] RIP: 0033:0x7f8db8fbab37
        (...)
        [82039.206360] RSP: 002b:00007ffdce35b468 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        [82039.207132] RAX: 0000000000000000 RBX: 0000560d20b00060 RCX: 00007f8db8fbab37
        [82039.207906] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000560d20b00240
        [82039.208621] RBP: 0000560d20b00240 R08: 0000560d20b00270 R09: 0000000000000015
        [82039.209285] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f8db94bce64
        [82039.209984] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffdce35b6f0
        [82039.210642] irq event stamp: 0
        [82039.211306] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        [82039.211971] hardirqs last disabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.212643] softirqs last  enabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.213304] softirqs last disabled at (0): [<0000000000000000>] 0x0
        [82039.213875] ---[ end trace f2521afa616ddccf ]---
      
      Fix this by releasing the reserved metadata on failure to allocate data
      extent(s) for the inode cache.
      
      Fixes: 69fe2d75 ("btrfs: make the delalloc block rsv per inode")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      29d47d00
    • Filipe Manana's avatar
      Btrfs: fix hang when loading existing inode cache off disk · 7764d56b
      Filipe Manana authored
      If we are able to load an existing inode cache off disk, we set the state
      of the cache to BTRFS_CACHE_FINISHED, but we don't wake up anyone waiting
      for the cache to be available. This means that anyone waiting for the
      cache to be available, waiting on the condition that either its state is
      BTRFS_CACHE_FINISHED or its available free space is greater than zero,
      can hang forever.
      
      This could be observed running fstests with MOUNT_OPTIONS="-o inode_cache",
      in particular test case generic/161 triggered it very frequently for me,
      producing a trace like the following:
      
        [63795.739712] BTRFS info (device sdc): enabling inode map caching
        [63795.739714] BTRFS info (device sdc): disk space caching is enabled
        [63795.739716] BTRFS info (device sdc): has skinny extents
        [64036.653886] INFO: task btrfs-transacti:3917 blocked for more than 120 seconds.
        [64036.654079]       Not tainted 5.2.0-rc4-btrfs-next-50 #1
        [64036.654143] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [64036.654232] btrfs-transacti D    0  3917      2 0x80004000
        [64036.654239] Call Trace:
        [64036.654258]  ? __schedule+0x3ae/0x7b0
        [64036.654271]  schedule+0x3a/0xb0
        [64036.654325]  btrfs_commit_transaction+0x978/0xae0 [btrfs]
        [64036.654339]  ? remove_wait_queue+0x60/0x60
        [64036.654395]  transaction_kthread+0x146/0x180 [btrfs]
        [64036.654450]  ? btrfs_cleanup_transaction+0x620/0x620 [btrfs]
        [64036.654456]  kthread+0x103/0x140
        [64036.654464]  ? kthread_create_worker_on_cpu+0x70/0x70
        [64036.654476]  ret_from_fork+0x3a/0x50
        [64036.654504] INFO: task xfs_io:3919 blocked for more than 120 seconds.
        [64036.654568]       Not tainted 5.2.0-rc4-btrfs-next-50 #1
        [64036.654617] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [64036.654685] xfs_io          D    0  3919   3633 0x00000000
        [64036.654691] Call Trace:
        [64036.654703]  ? __schedule+0x3ae/0x7b0
        [64036.654716]  schedule+0x3a/0xb0
        [64036.654756]  btrfs_find_free_ino+0xa9/0x120 [btrfs]
        [64036.654764]  ? remove_wait_queue+0x60/0x60
        [64036.654809]  btrfs_create+0x72/0x1f0 [btrfs]
        [64036.654822]  lookup_open+0x6bc/0x790
        [64036.654849]  path_openat+0x3bc/0xc00
        [64036.654854]  ? __lock_acquire+0x331/0x1cb0
        [64036.654869]  do_filp_open+0x99/0x110
        [64036.654884]  ? __alloc_fd+0xee/0x200
        [64036.654895]  ? do_raw_spin_unlock+0x49/0xc0
        [64036.654909]  ? do_sys_open+0x132/0x220
        [64036.654913]  do_sys_open+0x132/0x220
        [64036.654926]  do_syscall_64+0x60/0x1d0
        [64036.654933]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fix this by adding a wake_up() call right after setting the cache state to
      BTRFS_CACHE_FINISHED, at start_caching(), when we are able to load the
      cache from disk.
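
The missed-wakeup pattern described above can be illustrated with a small userspace model. This is only a sketch in Python, not the kernel code: `InodeCacheModel`, `wait_for_cache()`, `finish_loading()` and `run_once()` are hypothetical names, and a timeout stands in for "hangs forever" so that the example terminates.

```python
import threading


class InodeCacheModel:
    """Toy model of a cache whose users sleep until it has been loaded."""

    def __init__(self):
        self._cond = threading.Condition()
        self.finished = False  # stands in for BTRFS_CACHE_FINISHED

    def wait_for_cache(self, started, timeout):
        """Sleep until the cache is finished; False means no wakeup arrived."""
        with self._cond:
            started.set()  # signal that we are about to sleep
            while not self.finished:
                if not self._cond.wait(timeout):
                    return False  # timed out: nobody woke us up
            return True

    def finish_loading(self, wake):
        """Mark the cache as loaded and, optionally, wake the sleepers."""
        with self._cond:
            self.finished = True
            if wake:
                self._cond.notify_all()  # the step the fix adds


def run_once(wake, timeout=0.5):
    """Start one waiter, finish loading, and report whether it was woken."""
    cache = InodeCacheModel()
    started = threading.Event()
    result = []
    waiter = threading.Thread(
        target=lambda: result.append(cache.wait_for_cache(started, timeout)))
    waiter.start()
    started.wait()  # the waiter holds the lock until it sleeps
    cache.finish_loading(wake)
    waiter.join()
    return result[0]
```

With `wake=True` the waiter returns as soon as the state changes; with `wake=False` the state is updated but the sleeper never notices, which is the hang in the trace above.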
      
      Fixes: 82d5902d ("Btrfs: Support reading/writing on disk free ino cache")
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      7764d56b
    • Qu Wenruo's avatar
      btrfs: tree-checker: Add ROOT_ITEM check · 259ee775
      Qu Wenruo authored
      This patch introduces a ROOT_ITEM check, which includes:
      - Key->objectid and key->offset check
        Currently only some simple checks, e.g. 0 as rootid is invalid.
      
      - Item size check
        Root item size is fixed.
      
      - Generation checks
        Generation, generation_v2 and last_snapshot should not be greater than
        super generation + 1
      
      - Level and alignment check
        Level should be in [0, 7], and bytenr must be aligned to sector size.
      
      - Flags check
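
The numeric checks listed above can be sketched as follows. This is an illustrative Python model, not the tree-checker code: `check_root_item()` and its parameters are hypothetical, and the item size and flags checks are left out because the message does not spell out their values.

```python
BTRFS_MAX_LEVEL = 8  # valid tree levels are 0..7


def check_root_item(objectid, generation, generation_v2, last_snapshot,
                    level, bytenr, super_generation, sectorsize):
    """Return a list of problems; an empty list means the item passed."""
    errors = []

    # Key check: 0 is not a valid root id.
    if objectid == 0:
        errors.append("invalid root id 0")

    # Generation checks: nothing may be newer than super generation + 1.
    limit = super_generation + 1
    for name, value in (("generation", generation),
                        ("generation_v2", generation_v2),
                        ("last_snapshot", last_snapshot)):
        if value > limit:
            errors.append(f"{name} {value} exceeds {limit}")

    # Level check: must be in [0, 7].
    if not 0 <= level < BTRFS_MAX_LEVEL:
        errors.append(f"invalid level {level}")

    # Alignment check: bytenr must be a multiple of the sector size.
    if bytenr % sectorsize != 0:
        errors.append(f"bytenr {bytenr} not aligned to {sectorsize}")

    return errors
```

For example, `check_root_item(5, 10, 10, 9, 1, 12288, 10, 4096)` reports no problems, while a root id of 0, a level of 8 or an unaligned bytenr each add one entry to the returned list.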
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203261
      Reported-by: Jungyeon Yoon <jungyeon.yoon@gmail.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      259ee775
    • Qu Wenruo's avatar
      btrfs: extent-tree: Make sure we only allocate extents from block groups with the same type · 2a28468e
      Qu Wenruo authored
      [BUG]
      With a fuzzed image and the MIXED_GROUPS super flag set, we can hit the
      following BUG_ON():
      
        kernel BUG at fs/btrfs/delayed-ref.c:491!
        invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
        CPU: 0 PID: 1849 Comm: sync Tainted: G           O      5.2.0-custom #27
        RIP: 0010:update_existing_head_ref.cold+0x44/0x46 [btrfs]
        Call Trace:
         add_delayed_ref_head+0x20c/0x2d0 [btrfs]
         btrfs_add_delayed_tree_ref+0x1fc/0x490 [btrfs]
         btrfs_free_tree_block+0x123/0x380 [btrfs]
         __btrfs_cow_block+0x435/0x500 [btrfs]
         btrfs_cow_block+0x110/0x240 [btrfs]
         btrfs_search_slot+0x230/0xa00 [btrfs]
         ? __lock_acquire+0x105e/0x1e20
         btrfs_insert_empty_items+0x67/0xc0 [btrfs]
         alloc_reserved_file_extent+0x9e/0x340 [btrfs]
         __btrfs_run_delayed_refs+0x78e/0x1240 [btrfs]
         ? kvm_clock_read+0x18/0x30
         ? __sched_clock_gtod_offset+0x21/0x50
         btrfs_run_delayed_refs.part.0+0x4e/0x180 [btrfs]
         btrfs_run_delayed_refs+0x23/0x30 [btrfs]
         btrfs_commit_transaction+0x53/0x9f0 [btrfs]
         btrfs_sync_fs+0x7c/0x1c0 [btrfs]
         ? __ia32_sys_fdatasync+0x20/0x20
         sync_fs_one_sb+0x23/0x30
         iterate_supers+0x95/0x100
         ksys_sync+0x62/0xb0
         __ia32_sys_sync+0xe/0x20
         do_syscall_64+0x65/0x240
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      [CAUSE]
      This situation is caused by several factors:
      - Fuzzed image
        The extent tree of this fs is missing one backref for the extent tree
        root, so we can allocate space from that slot.

      - MIXED_BG feature
        Super block has MIXED_BG flag.

      - No mixed block groups exist
        All block groups are just regular ones.

      This makes data space_info->block_groups[] contain metadata block
      groups.  And when we reserve space for data, we can use space in
      a metadata block group.
      
      Then we hit the following file operations:
      
      - fallocate
        We need to allocate data extents.
        find_free_extent() chooses to allocate space from the metadata block
        group, and picks the space of the extent tree root, since its backref
        is missing.

        This generates one delayed ref head with is_data = 1.

      - extent tree update
        We need to update the extent tree at run_delayed_ref time.

        This generates one delayed ref head with is_data = 0, for the same
        bytenr of the old extent tree root.
      
      Then we trigger the BUG_ON().
      
      [FIX]
      The quick fix here is to check block_group->flags before using it.
      
      The problem can only happen for MIXED_GROUPS fs. Regular filesystems
      won't have space_info with DATA|METADATA flag, so there is no way to
      hit the bug.
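
A minimal sketch of such a flags check, written as a Python model rather than the actual patch. The helper names are hypothetical, and requiring every requested type bit to be present is an assumption about what "the same type" means; the two bit values are the on-disk block group type bits for DATA and METADATA.

```python
BLOCK_GROUP_DATA = 1 << 0
BLOCK_GROUP_METADATA = 1 << 2


def block_group_matches(bg_flags, wanted):
    """True if the block group carries every type bit we ask for (assumed rule)."""
    return (bg_flags & wanted) == wanted


def usable_block_groups(block_groups, wanted):
    """Skip block groups whose type does not match the allocation request."""
    return [bg for bg in block_groups
            if block_group_matches(bg["flags"], wanted)]


# The scenario from the report: the super block says MIXED_GROUPS, so the
# allocation asks for DATA|METADATA, but only regular block groups exist.
MIXED = BLOCK_GROUP_DATA | BLOCK_GROUP_METADATA
REGULAR_ONLY = [
    {"name": "metadata bg", "flags": BLOCK_GROUP_METADATA},
    {"name": "data bg", "flags": BLOCK_GROUP_DATA},
]
```

In this model `usable_block_groups(REGULAR_ONLY, MIXED)` is empty, so a data allocation can no longer land in the metadata block group that holds the extent tree root.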
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203255
      Reported-by: Jungyeon Yoon <jungyeon.yoon@gmail.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      2a28468e
    • Qu Wenruo's avatar
      btrfs: delayed-inode: Kill the BUG_ON() in btrfs_delete_delayed_dir_index() · 933c22a7
      Qu Wenruo authored
      There is one report of a fuzzed image which leads to a BUG_ON() in
      btrfs_delete_delayed_dir_index().

      Although that fuzzed image can already be addressed by the enhanced
      extent-tree error handler, it's still better to hunt down more BUG_ON()s.

      This patch removes two BUG_ON()s in
      btrfs_delete_delayed_dir_index():
      - One for error from btrfs_delayed_item_reserve_metadata()
        Instead of BUG_ON(), we output an error message, free the item,
        and return the error.
        All callers of this function handle the error by aborting the current
        transaction.
      
      - One for possible EEXIST from __btrfs_add_delayed_deletion_item()
        That function can return -EEXIST.
        We already have a good enough error message for that, only need to
        clean up the reserved metadata space and allocated item.
      
      To help the above cleanup, also modify __btrfs_remove_delayed_item(),
      called in btrfs_release_delayed_item(), to skip unassociated items.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203253
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      933c22a7
    • Qu Wenruo's avatar
      btrfs: volumes: Remove ENOSPC-prone btrfs_can_relocate() · 112974d4
      Qu Wenruo authored
      [BUG]
      Test case btrfs/156 fails since commit 302167c5 ("btrfs: don't end
      the transaction for delayed refs in throttle") with ENOSPC.
      
      [CAUSE]
      The ENOSPC is reported from btrfs_can_relocate().
      
      This function will check:
      - If this block group is empty, we can relocate
      - If we have enough free space, we can relocate

      The above checks are valid, but the following check is vague due to its
      implementation:
      - If and only if we can allocate a new block group to contain all the
        used space, we can relocate

      This design itself is OK, but the way to determine if we can allocate a
      new block group is problematic.

      btrfs_can_relocate() uses find_free_dev_extent() to find free space on a
      device.
      However find_free_dev_extent() only searches the commit root and excludes
      dev extents allocated in the current transaction. This makes it unable
      to use dev extents just freed in the current transaction.
      
      So for the following example, btrfs_can_relocate() will report ENOSPC:
      The example block group layout:
      1M      129M        257M       385M      513M       550M
      |///////|///////////|//////////|         |          |
      // = Used bg; assume every bg is 100% used for easy calculation.
      All block groups are SINGLE, and the on-disk bytenr is the same as the
      logical bytenr.

      1) Bg in [129M, 257M) gets relocated to [385M, 513M), transid=100
      1M      129M        257M       385M      513M       550M
      |///////|           |//////////|/////////|
      In transid 100, bg in [129M, 257M) gets relocated to [385M, 513M)
      
      However transid 100 is not committed yet, so in dev commit tree, we
      still have the old dev extents layout:
      1M      129M        257M       385M      513M       550M
      |///////|///////////|//////////|         |          |
      
      2) Try to relocate bg [257M, 385M)
      We go into btrfs_can_relocate(); there is no free space in the current
      bgs, so we check if we can find large enough free dev extents.

      The first slot is [385M, 513M), but that is already used by the new bg
      at [385M, 513M), so we continue searching.

      The remaining slot is [513M, 550M), smaller than the bg's length 128M.
      So btrfs_can_relocate() reports ENOSPC.

      However this is overkill. In fact, if we just skip the
      btrfs_can_relocate() check and go into the regular relocation routine,
      then at extent reservation time, if we can't find a free extent, we
      fall back to committing the transaction, which will free up the dev
      extents and allow a new block group to be created.
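
The arithmetic of the example above can be reproduced with a small model. This is a Python sketch only: `largest_free_hole()` is a hypothetical helper, not find_free_dev_extent(), and offsets are plain integers in MiB taken from the layout diagrams.

```python
def largest_free_hole(dev_start, dev_end, used):
    """Size of the largest gap not covered by any [start, end) in `used`."""
    best = 0
    cursor = dev_start
    for start, end in sorted(used):
        if start > cursor:
            best = max(best, start - cursor)
        cursor = max(cursor, end)
    return max(best, dev_end - cursor)


DEV_START, DEV_END = 1, 550
BG_LENGTH = 128

# Commit root still shows the old layout, including [129M, 257M).
COMMIT_ROOT = [(1, 129), (129, 257), (257, 385)]
# Dev extent allocated in the running transaction (transid 100).
PENDING = [(385, 513)]
# Layout once transid 100 has been committed: [129M, 257M) is free again.
AFTER_COMMIT = [(1, 129), (257, 385), (385, 513)]

before = largest_free_hole(DEV_START, DEV_END, COMMIT_ROOT + PENDING)
after = largest_free_hole(DEV_START, DEV_END, AFTER_COMMIT)
```

Here `before` is 37 (only [513M, 550M) is visible), which is smaller than the 128M block group and gives the false ENOSPC, while `after` is 128 and is enough for a new block group.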
      
      [FIX]
      The fix here is to remove btrfs_can_relocate() completely.
      
      If we hit the false ENOSPC case just like btrfs/156, the extent allocator
      will push harder by committing the transaction and we will have space
      for a new block group, avoiding the false ENOSPC.

      If we really run out of space, we will hit ENOSPC at
      relocate_block_group(), and btrfs will just report the ENOSPC error as
      usual.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      112974d4
    • Qu Wenruo's avatar
      btrfs: extent-tree: Add comment for inc_block_group_ro() · e9138142
      Qu Wenruo authored
      inc_block_group_ro() is only designed to mark one block group read-only;
      it doesn't really care if other block groups have enough free space to
      contain the used space in the block group.

      However, due to the close connection between this function and
      relocation, sometimes we can be confused and think this function is
      responsible for balance space reservation, which is not true.

      Add a comment to make the functionality clear.
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      e9138142
    • Qu Wenruo's avatar
      btrfs: volumes: Add comment for find_free_dev_extent_start() · 135da976
      Qu Wenruo authored
      Since commit 6df9a95e ("Btrfs: make the chunk allocator completely
      tree lockless") we search the commit root of the device tree to avoid
      deadlock.

      This introduced a safety feature: find_free_dev_extent_start() won't
      use dev extents which were just freed in the current transaction.

      This safety feature makes sure we won't break CoW by allocating a new
      block group from just-freed dev extents.

      However, this feature also makes find_free_dev_extent_start() unreliable
      at reporting free device space.  Add a comment so that later readers
      are careful about this behavior.

      This behavior makes one caller, btrfs_can_relocate(), unreliable at
      determining the device free space.
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      135da976