• Josef Bacik's avatar
    btrfs: don't free qgroup space unless specified · d246331b
    Josef Bacik authored
    Boris noticed in his simple quotas testing that he was getting a leak
    with Sweet Tea's change to subvol create that stopped doing a
    transaction commit.  This was just a side effect of that change.
    
    In the delayed inode code we have an optimization that will free extra
    reservations if we think we can pack a dir item into an already modified
    leaf.  Previously this wouldn't be triggered in the subvolume create
    case because we'd commit the transaction, it was still possible but
    much harder to trigger.  It could actually be triggered if we did a
    mkdir && subvol create with qgroups enabled.
    
    This occurs because in btrfs_insert_delayed_dir_index(), which gets
    called when we're adding the dir item, we do the following:
    
      btrfs_block_rsv_release(fs_info, trans->block_rsv, bytes, NULL);
    
    if we're able to skip reserving space.
    
    The problem here is that trans->block_rsv points at the temporary block
    rsv for the subvolume create, which has qgroup reservations in the block
    rsv.
    
    This is a problem because btrfs_block_rsv_release() will do the
    following:
    
      if (block_rsv->qgroup_rsv_reserved >= block_rsv->qgroup_rsv_size) {
    	  qgroup_to_release = block_rsv->qgroup_rsv_reserved -
    		  block_rsv->qgroup_rsv_size;
    	  block_rsv->qgroup_rsv_reserved = block_rsv->qgroup_rsv_size;
      }
    
    The temporary block rsv just has ->qgroup_rsv_reserved set,
    ->qgroup_rsv_size == 0.  The optimization in
    btrfs_insert_delayed_dir_index() sets ->qgroup_rsv_reserved = 0.  Then
    later on when we call btrfs_subvolume_release_metadata() which has
    
      btrfs_block_rsv_release(fs_info, rsv, (u64)-1, &qgroup_to_release);
      btrfs_qgroup_convert_reserved_meta(root, qgroup_to_release);
    
    qgroup_to_release is set to 0, and we do not convert the reserved
    metadata space.
    
    The problem here is that the block rsv code has been unconditionally
    messing with ->qgroup_rsv_reserved, because the main place this is used
    is delalloc, and any time we call btrfs_block_rsv_release() we do it
    with qgroup_to_release set, and thus do the proper accounting.
    
    The subvolume code is the only other code that uses the qgroup
    reservation stuff, but it's intermingled with the above optimization,
    and thus was getting its reservation freed out from underneath it and
    thus leaking the reserved space.
    
    The solution is to simply not mess with the qgroup reservations if we
    don't have qgroup_to_release set.  This works with the existing code as
    anything that messes with the delalloc reservations always have
    qgroup_to_release set.  This fixes the leak that Boris was observing.
    Reviewed-by: default avatarQu Wenruo <wqu@suse.com>
    CC: stable@vger.kernel.org # 5.4+
    Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    d246331b
block-rsv.c 16.4 KB