    Btrfs: fix enospc when there is plenty of space · 80eb234a
    Josef Bacik authored
    So there is an odd case where we can possibly return -ENOSPC when there is in
    fact space to be had.  It only happens with metadata writes, and happens _very_
    infrequently.  What has to happen is we have to have allocated out of
    the first logical byte on the disk, which would set last_alloc to
    first_logical_byte(root, 0), so search_start == orig_search_start.  We then
    need to allocate for normal metadata, i.e. BTRFS_BLOCK_GROUP_METADATA |
    BTRFS_BLOCK_GROUP_DUP.  We will do a block group lookup for the given
    search_start, block_group_bits() won't match, and we'll go to choose another
    block group.
    However because search_start matches orig_search_start we go to see if we can
    allocate a chunk.
    
    If we are in the situation that we cannot allocate a chunk, we fail with
    -ENOSPC.  This is kind of a big flaw of the way find_free_extent works, as it,
    along with find_free_space, loops through _all_ of the block groups, not just
    the ones that
    we want to allocate out of.  This patch completely kills find_free_space and
    rolls it into find_free_extent.  I've introduced a sort of state machine into
    this, which will make it easier to get cache miss information out of the
    allocator, and will work well with my locking changes.
    
    The basic flow is this:  We have the variable loop which is 0, meaning we are
    in the hint phase.  We look up the block group for the hint, and look up the
    space_info for what we want to allocate out of.  If the block group we were
    pointed at by the hint either isn't of the correct type, or just doesn't have
    the space we need, we set head to space_info->block_groups, so we start at the
    beginning of the block groups for this particular space info, and loop through.
    
    This is also where we add the empty_cluster to total_needed.  At this point
    loop is set to 1 and we just loop through all of the block groups for this
    particular space_info looking for the space we need, just as find_free_space
    would have done, except we only hit the block groups we want and not _all_ of
    the block groups.  If we come full circle we see if we can allocate a chunk.
    If we cannot, we exit with -ENOSPC and we are done.  If we can, we start
    over at space_info->block_groups and loop through again, with loop == 2.  If we
    come full circle and haven't found what we need then we exit with -ENOSPC.
    I've been running this for a couple of days now and it seems stable, and I
    haven't yet hit a -ENOSPC when there was plenty of space left.
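
    The flow described above can be sketched in user-space C roughly as follows.
    This is only an illustration of the three phases (loop == 0, 1, 2), not the
    kernel code: the struct layouts, MAX_GROUPS, chunk_size, chunks_left,
    alloc_chunk_sketch() and find_free_extent_sketch() are all hypothetical
    stand-ins, and the real allocator tracks free space and locking very
    differently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_GROUPS 8	/* hypothetical cap, only for this sketch */

/* Simplified stand-in for a block group (not the kernel structure). */
struct block_group {
	unsigned int flags;	/* allocation profile bits */
	unsigned long start;	/* first logical byte */
	unsigned long size;	/* total bytes */
	unsigned long free;	/* bytes still free, kept at the tail */
};

/* Simplified stand-in for a space_info: only groups of one type. */
struct space_info {
	unsigned int flags;
	struct block_group groups[MAX_GROUPS];
	size_t ngroups;
	unsigned long chunk_size;	/* size of a newly allocated chunk */
	int chunks_left;		/* how many more chunks can be made */
};

/* Pretend chunk allocation: append one empty group, or fail. */
int alloc_chunk_sketch(struct space_info *si)
{
	struct block_group *bg;
	unsigned long start = 0;

	if (si->chunks_left <= 0 || si->ngroups >= MAX_GROUPS)
		return -ENOSPC;
	if (si->ngroups > 0) {
		const struct block_group *last = &si->groups[si->ngroups - 1];

		start = last->start + last->size;
	}
	bg = &si->groups[si->ngroups++];
	bg->flags = si->flags;
	bg->start = start;
	bg->size = si->chunk_size;
	bg->free = si->chunk_size;
	si->chunks_left--;
	return 0;
}

/* Carve num_bytes out of bg if it has the right type and enough room. */
static int try_group(struct block_group *bg, unsigned int flags,
		     unsigned long total_needed, unsigned long num_bytes,
		     unsigned long *start_ret)
{
	if (bg->flags != flags || bg->free < total_needed)
		return 0;
	*start_ret = bg->start + (bg->size - bg->free);
	bg->free -= num_bytes;
	return 1;
}

int find_free_extent_sketch(struct space_info *si, struct block_group *hint,
			    unsigned long num_bytes,
			    unsigned long empty_cluster,
			    unsigned long *start_ret)
{
	unsigned long total_needed = num_bytes;
	int loop = 0;	/* 0: hint phase */
	size_t i;

	assert(si != NULL && start_ret != NULL);

	/* loop == 0: only look at the group the hint points to. */
	if (hint && try_group(hint, si->flags, total_needed, num_bytes,
			      start_ret))
		return 0;

	/* Hint missed: add empty_cluster and scan this space_info only. */
	total_needed += empty_cluster;
	loop = 1;

	for (;;) {
		for (i = 0; i < si->ngroups; i++)
			if (try_group(&si->groups[i], si->flags, total_needed,
				      num_bytes, start_ret))
				return 0;

		/* Came full circle: try a new chunk once, then rescan. */
		if (loop == 1 && alloc_chunk_sketch(si) == 0) {
			loop = 2;
			continue;
		}
		return -ENOSPC;
	}
}
```

    The point of the sketch is that a hint of the wrong type never leads
    straight to the chunk allocation check: the scan over the matching groups
    always happens first, and -ENOSPC is only returned after that scan (and at
    most one rescan after a new chunk) has come up empty.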
    
    Also I've made a groups_sem to handle the group list for the space_info.  This
    is part of my locking changes, but is relatively safe and seems better than
    holding the space_info spinlock over that entire search time.  Thanks,
    Signed-off-by: Josef Bacik <jbacik@redhat.com>