1. 15 Dec, 2011 6 commits
    • Josef Bacik's avatar
      Btrfs: only set cache_generation if we setup the block group · e65cbb94
      Josef Bacik authored
      A user reported a problem booting into a new kernel with the old format inodes.
      He was panicing in cow_file_range while writing out the inode cache.  This is
      because if the block group is not cached we'll just skip writing out the cache,
      however if it gets dirtied again in the same transaction and it finished caching
      we'd go ahead and write it out, but since we set cache_generation to the transid
      we think we've already truncated it and will just carry on, running into
      cow_file_range and blowing up.  We need to make sure we only set
      cache_generation if we've done the truncate.  The user tested this patch and
      verified that the panic no longer occured.  Thanks,
      Reported-and-Tested-by: default avatarKlaus Bitto <klaus.bitto@gmail.com>
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      e65cbb94
    • Josef Bacik's avatar
      Btrfs: don't panic if orphan item already exists · ee4d89f0
      Josef Bacik authored
      I've been hitting this BUG_ON() in btrfs_orphan_add when running xfstest 269 in
      a loop.  This is because we will add an orphan item, do the truncate, the
      truncate will fail for whatever reason (*cough*ENOSPC*cough*) and then we're
      left with an orphan item still in the fs.  Then we come back later to do another
      truncate and it blows up because we already have an orphan item.  This is ok so
      just fix the BUG_ON() to only BUG() if ret is not EEXIST.  Thanks,
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      ee4d89f0
    • Josef Bacik's avatar
      Btrfs: fix leaked space in truncate · 7041ee97
      Josef Bacik authored
      We were occasionaly leaking space when running xfstest 269.  This is because if
      we failed to start the transaction in the truncate loop we'd just goto out, but
      we need to break so that the inode is removed from the orphan list and the space
      is properly freed.  Thanks,
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      7041ee97
    • Josef Bacik's avatar
      Btrfs: fix how we do delalloc reservations and how we free reservations on error · 660d3f6c
      Josef Bacik authored
      Running xfstests 269 with some tracing my scripts kept spitting out errors about
      releasing bytes that we didn't actually have reserved.  This took me down a huge
      rabbit hole and it turns out the way we deal with reserved_extents is wrong,
      we need to only be setting it if the reservation succeeds, otherwise the free()
      method will come in and unreserve space that isn't actually reserved yet, which
      can lead to other warnings and such.  The math was all working out right in the
      end, but it caused all sorts of other issues in addition to making my scripts
      yell and scream and generally make it impossible for me to track down the
      original issue I was looking for.  The other problem is with our error handling
      in the reservation code.  There are two cases that we need to deal with
      
      1) We raced with free.  In this case free won't free anything because csum_bytes
      is modified before we dro the lock in our reservation path, so free rightly
      doesn't release any space because the reservation code may be depending on that
      reservation.  However if we fail, we need the reservation side to do the free at
      that point since that space is no longer in use.  So as it stands the code was
      doing this fine and it worked out, except in case #2
      
      2) We don't race with free.  Nobody comes in and changes anything, and our
      reservation fails.  In this case we didn't reserve anything anyway and we just
      need to clean up csum_bytes but not free anything.  So we keep track of
      csum_bytes before we drop the lock and if it hasn't changed we know we can just
      decrement csum_bytes and carry on.
      
      Because of the case where we can race with free()'s since we have to drop our
      spin_lock to do the reservation, I'm going to serialize all reservations with
      the i_mutex.  We already get this for free in the heavy use paths, truncate and
      file write all hold the i_mutex, just needed to add it to page_mkwrite and
      various ioctl/balance things.  With this patch my space leak scripts no longer
      scream bloody murder.  Thanks,
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      660d3f6c
    • Josef Bacik's avatar
      Btrfs: deal with enospc from dirtying inodes properly · 22c44fe6
      Josef Bacik authored
      Now that we're properly keeping track of delayed inode space we've been getting
      a lot of warnings out of btrfs_dirty_inode() when running xfstest 83.  This is
      because a bunch of people call mark_inode_dirty, which is void so we can't
      return ENOSPC.  This needs to be fixed in a few areas
      
      1) file_update_time - this updates the mtime and such when writing to a file,
      which will call mark_inode_dirty.  So copy file_update_time into btrfs so we can
      call btrfs_dirty_inode directly and return an error if we get one appropriately.
      
      2) fix symlinks to use btrfs_setattr for ->setattr.  For some reason we weren't
      setting ->setattr for symlinks, even though we should have been.  This catches
      one of the cases where we were getting errors in mark_inode_dirty.
      
      3) Fix btrfs_setattr and btrfs_setsize to call btrfs_dirty_inode directly
      instead of mark_inode_dirty.  This lets us return errors properly for truncate
      and chown/anything related to setattr.
      
      4) Add a new btrfs_fs_dirty_inode which will just call btrfs_dirty_inode and
      print an error if we have one.  The only remaining user we can't control for
      this is touch_atime(), but we don't really want to keep people from walking
      down the tree if we don't have space to save the atime update, so just complain
      but don't worry about it.
      
      With this patch xfstests 83 complains a handful of times instead of hundreds of
      times.  Thanks,
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      22c44fe6
    • Josef Bacik's avatar
      Btrfs: fix num_workers_starting bug and other bugs in async thread · 0dc3b84a
      Josef Bacik authored
      Al pointed out we have some random problems with the way we account for
      num_workers_starting in the async thread stuff.  First of all we need to make
      sure to decrement num_workers_starting if we fail to start the worker, so make
      __btrfs_start_workers do this.  Also fix __btrfs_start_workers so that it
      doesn't call btrfs_stop_workers(), there is no point in stopping everybody if we
      failed to create a worker.  Also check_pending_worker_creates needs to call
      __btrfs_start_work in it's work function since it already increments
      num_workers_starting.
      
      People only start one worker at a time, so get rid of the num_workers argument
      everywhere, and make btrfs_queue_worker a void since it will always succeed.
      Thanks,
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      0dc3b84a
  2. 09 Dec, 2011 1 commit
    • Chris Mason's avatar
      Btrfs: fix btrfs_end_bio to deal with write errors to a single mirror · 5dbc8fca
      Chris Mason authored
      btrfs_end_bio checks the number of errors on a bio against the max
      number of errors allowed before sending any EIOs up to the higher
      levels.
      
      If we got enough copies of the bio done for a given raid level, it is
      supposed to clear the bio error flag and return success.
      
      We have pointers to the original bio sent down by the higher layers and
      pointers to any cloned bios we made for raid purposes.  If the original
      bio happens to be the one that got an io error, but not the last one to
      finish, it might not have the BIO_UPTODATE bit set.
      
      Then, when the last bio does finish, we'll call bio_end_io on the
      original bio.  It won't have the uptodate bit set and we'll end up
      sending EIO to the higher layers.
      
      We already had a check for this, it just was conditional on getting the
      IO error on the very last bio.  Make the check unconditional so we eat
      the EIOs properly.
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
      5dbc8fca
  3. 08 Dec, 2011 4 commits
  4. 01 Dec, 2011 1 commit
  5. 30 Nov, 2011 10 commits
  6. 21 Nov, 2011 1 commit
    • Chris Mason's avatar
      Btrfs: remove free-space-cache.c WARN during log replay · 24a70313
      Chris Mason authored
      The log replay code only partially loads block groups, since
      the block group caching code is able to detect and deal with
      extents the logging code has pinned down.
      
      While the logging code is pinning down block groups, there is
      a bogus WARN_ON we're hitting if the code wasn't able to find
      an extent in the cache.  This commit removes the warning because
      it can happen any time there isn't a valid free space cache
      for that block group.
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
      24a70313
  7. 20 Nov, 2011 10 commits
    • Josef Bacik's avatar
      Btrfs: sectorsize align offsets in fiemap · 4d479cf0
      Josef Bacik authored
      We've been hitting BUG()'s in btrfs_cont_expand and btrfs_fallocate and anywhere
      else that calls btrfs_get_extent while running xfstests 13 in a loop.  This is
      because fiemap is calling btrfs_get_extent with non-sectorsize aligned offsets,
      which will end up adding mappings that are not sectorsize aligned, which will
      cause problems in some cases for subsequent calls to btrfs_get_extent for
      similar areas that are sectorsize aligned.  With this patch I ran xfstests 13 in
      a loop for a couple of hours and didn't hit the problem that I could previously
      hit in at most 20 minutes.  Thanks,
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      4d479cf0
    • Josef Bacik's avatar
      Btrfs: clear pages dirty for io and set them extent mapped · f7d61dcd
      Josef Bacik authored
      When doing the io_ctl helpers to clean up the free space cache stuff I stopped
      using our normal prepare_pages stuff, which means I of course forgot to do
      things like set the pages extent mapped, which will cause us all sorts of
      wonderful propblems.  Thanks,
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      f7d61dcd
    • Josef Bacik's avatar
      Btrfs: wait on caching if we're loading the free space cache · 291c7d2f
      Josef Bacik authored
      We've been hitting panics when running xfstest 13 in a loop for long periods of
      time.  And actually this problem has always existed so we've been hitting these
      things randomly for a while.  Basically what happens is we get a thread coming
      into the allocator and reading the space cache off of disk and adding the
      entries to the free space cache as we go.  Then we get another thread that comes
      in and tries to allocate from that block group.  Since block_group->cached !=
      BTRFS_CACHE_NO it goes ahead and tries to do the allocation.  We do this because
      if we're doing the old slow way of caching we don't want to hold people up and
      wait for everything to finish.  The problem with this is we could end up
      discarding the space cache at some arbitrary point in the future, which means we
      could very well end up allocating space that is either bad, or when the real
      caching happens it could end up thinking the space isn't in use when it really
      is and cause all sorts of other problems.
      
      The solution is to add a new flag to indicate we are loading the free space
      cache from disk, and always try to cache the block group if cache->cached !=
      BTRFS_CACHE_FINISHED.  That way if we are loading the space cache anybody else
      who tries to allocate from the block group will have to wait until it's finished
      to make sure it completes successfully.  Thanks,
      Signed-off-by: default avatarJosef Bacik <josef@redhat.com>
      291c7d2f
    • Arnd Hannemann's avatar
      Btrfs: prefix resize related printks with btrfs: · 5bb14682
      Arnd Hannemann authored
      For the user it is confusing to find something like:
      [10197.627710] new size for /dev/mapper/vg0-usr_share is 3221225472
      in kernel log, because it doesn't point directly to btrfs.
      
      This patch prefixes those messages with "btrfs:" like other btrfs
      related printks.
      Signed-off-by: default avatarArnd Hannemann <arnd@arndnet.de>
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
      5bb14682
    • David Sterba's avatar
      btrfs: fix stat blocks accounting · fadc0d8b
      David Sterba authored
      Round inode bytes and delalloc bytes up to real blocksize before
      converting to sector size. Otherwise eg. files smaller than 512
      are reported with zero blocks due to incorrect rounding.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
      fadc0d8b
    • Li Zefan's avatar
      Btrfs: avoid unnecessary bitmap search for cluster setup · 52621cb6
      Li Zefan authored
      setup_cluster_no_bitmap() searches all the extents and bitmaps starting
      from offset. Therefore if it returns -ENOSPC, all the bitmaps starting
      from offset are in the bitmaps list, so it's sufficient to search from
      this list in setup_cluser_bitmap().
      Signed-off-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
      52621cb6
    • Li Zefan's avatar
      Btrfs: fix to search one more bitmap for cluster setup · 0f0fbf1d
      Li Zefan authored
      Suppose there are two bitmaps [0, 256], [256, 512] and one extent
      [100, 120] in the free space cache, and we want to setup a cluster
      with offset=100, bytes=50.
      
      In this case, there will be only one bitmap [256, 512] in the temporary
      bitmaps list, and then setup_cluster_bitmap() won't search bitmap [0, 256].
      
      The cause is, the list is constructed in setup_cluster_no_bitmap(),
      and only bitmaps with bitmap_entry->offset >= offset will be added
      into the list, and the very bitmap that convers offset has
      bitmap_entry->offset <= offset.
      Signed-off-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
      0f0fbf1d
    • Jan Schmidt's avatar
      btrfs: mirror_num should be int, not u64 · 32240a91
      Jan Schmidt authored
      My previous patch introduced some u64 for failed_mirror variables, this one
      makes it consistent again.
      Signed-off-by: default avatarJan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
      32240a91
    • Jeff Mahoney's avatar
      btrfs: Fix up 32/64-bit compatibility for new ioctls · 745c4d8e
      Jeff Mahoney authored
       This patch casts to unsigned long before casting to a pointer and fixes
       the following warnings:
      fs/btrfs/extent_io.c:2289:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      fs/btrfs/ioctl.c:2933:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      fs/btrfs/ioctl.c:2937:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
      fs/btrfs/ioctl.c:3020:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
      fs/btrfs/scrub.c:275:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
      fs/btrfs/backref.c:686:27: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
      745c4d8e
    • Chris Mason's avatar
      Btrfs: fix barrier flushes · 387125fc
      Chris Mason authored
      When btrfs is writing the super blocks, it send barrier flushes to make
      sure writeback caching drives get all the metadata on disk in the
      right order.
      
      But, we have two bugs in the way these are sent down.  When doing
      full commits (not via the tree log), we are sending the barrier down
      before the last super when it should be going down before the first.
      
      In multi-device setups, we should be waiting for the barriers to
      complete on all devices before writing any of the supers.
      
      Both of these bugs can cause corruptions on power failures.  We fix it
      with some new code to send down empty barriers to all devices before
      writing the first super.
      
      Alexandre Oliva found the multi-device bug.  Arne Jansen did the async
      barrier loop.
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
      Reported-by: default avatarAlexandre Oliva <oliva@lsd.ic.unicamp.br>
      387125fc
  8. 15 Nov, 2011 1 commit
    • Liu Bo's avatar
      Btrfs: fix tree corruption after multi-thread snapshots and inode_cache flush · f1ebcc74
      Liu Bo authored
      The btrfs snapshotting code requires that once a root has been
      snapshotted, we don't change it during a commit.
      
      But there are two cases to lead to tree corruptions:
      
      1) multi-thread snapshots can commit serveral snapshots in a transaction,
         and this may change the src root when processing the following pending
         snapshots, which lead to the former snapshots corruptions;
      
      2) the free inode cache was changing the roots when it root the cache,
         which lead to corruptions.
      
      This fixes things by making sure we force COW the block after we create a
      snapshot during commiting a transaction, then any changes to the roots
      will result in COW, and we get all the fs roots and snapshot roots to be
      consistent.
      Signed-off-by: default avatarLiu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
      f1ebcc74
  9. 11 Nov, 2011 6 commits