1. 02 Jul, 2012 8 commits
    • Btrfs: hold a ref on the inode during writepages · 7fd1a3f7
      Josef Bacik authored
      We can race with unlink and not actually be able to do our igrab in
      btrfs_add_ordered_extent.  This will result in all sorts of problems.
      Instead of doing the complicated work to try and handle returning an error
      properly from btrfs_add_ordered_extent, just hold a ref to the inode during
      writepages.  If we cannot grab a ref we know we're freeing this inode anyway
      and can just drop the dirty pages on the floor, because screw them we're
      going to invalidate them anyway.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix tree log remove space corner case · bdb7d303
      Josef Bacik authored
      The tree log stuff can have allocated space that we end up having split
      across a bitmap and a real extent.  The free space code does not deal with
      this, it assumes that if it finds an extent or bitmap entry that the entire
      range must fall within the entry it finds.  This isn't necessarily the case,
      so rework the remove function so it can handle this case properly.  This
      fixed two panics the user hit, first in the case where the space was
      initially in a bitmap and then in an extent entry, and then the reverse
      case.  Thanks,
      Reported-and-tested-by: Shaun Reich <sreich@kde.org>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
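The reworked removal described in bdb7d303 can be pictured with a minimal Python sketch. This is not the kernel code: free space entries are modelled as plain `(start, end)` tuples with no distinction between bitmap and extent entries, and `remove_free_space` is a hypothetical helper. It only illustrates the idea that removal keeps looping over entries instead of assuming the whole range sits inside the first entry it finds.

```python
def remove_free_space(entries, offset, length):
    """Remove [offset, offset + length) from a list of (start, end) entries.

    Hypothetical model: each tuple stands for either a bitmap or an extent
    entry. The range may span several adjacent entries.
    """
    entries = sorted(entries)
    while length > 0:
        # Find the entry that contains the current offset.
        idx = next(
            (i for i, (s, e) in enumerate(entries) if s <= offset < e), None
        )
        if idx is None:
            raise ValueError("range is not fully covered by free space")
        start, end = entries[idx]
        # Only the part that falls inside this entry is removed now.
        chunk = min(length, end - offset)
        leftover = []
        if start < offset:
            leftover.append((start, offset))
        if offset + chunk < end:
            leftover.append((offset + chunk, end))
        entries[idx:idx + 1] = leftover
        # Whatever is left of the range is handled by the next iteration.
        offset += chunk
        length -= chunk
    return entries
```

For example, removing 100 bytes at offset 50 from two adjacent entries `(0, 100)` and `(100, 200)` trims the tail of the first and the head of the second.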
    • Btrfs: fix wrong check during log recovery · 6bf02314
      Liu Bo authored
      When we're evicting an inode during log recovery, we need to ensure that the inode
      is not in orphan state any more, which means the inode's run_time flags must
      _not_ have BTRFS_INODE_HAS_ORPHAN_ITEM set.  The BUG_ON was triggered because
      the check on the flags was wrong.
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: use _IOR for BTRFS_IOC_SUBVOL_GETFLAGS · d3a94048
      Alexander Block authored
      We used the wrong ioctl macro for the getflags ioctl before.
      As we don't have the set/getflags ioctls in the user space ioctl.h
      at the moment, it's safe to fix it now.
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Alexander Block <ablock84@googlemail.com>
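Why the choice of macro in d3a94048 matters can be shown by computing the request numbers by hand. The Python sketch below assumes the asm-generic ioctl.h bit layout (some architectures use a different one) and assumes the btrfs magic `0x94`, command number `25`, and an 8-byte argument. None of those values appear in the commit message, so treat them as illustrative.

```python
# Field positions in an ioctl request number (asm-generic layout, assumed).
IOC_NRSHIFT = 0
IOC_TYPESHIFT = 8
IOC_SIZESHIFT = 16
IOC_DIRSHIFT = 30

IOC_WRITE = 1  # user space passes data to the kernel
IOC_READ = 2   # user space reads data back from the kernel


def ioc(direction, magic, nr, size):
    """Pack direction, magic, command number and argument size."""
    return (
        (direction << IOC_DIRSHIFT)
        | (size << IOC_SIZESHIFT)
        | (magic << IOC_TYPESHIFT)
        | (nr << IOC_NRSHIFT)
    )


def ior(magic, nr, size):
    return ioc(IOC_READ, magic, nr, size)


def iow(magic, nr, size):
    return ioc(IOC_WRITE, magic, nr, size)


# Assumed values for illustration only.
BTRFS_IOCTL_MAGIC = 0x94
GETFLAGS_NR = 25
U64_SIZE = 8

getflags_as_read = ior(BTRFS_IOCTL_MAGIC, GETFLAGS_NR, U64_SIZE)
getflags_as_write = iow(BTRFS_IOCTL_MAGIC, GETFLAGS_NR, U64_SIZE)
```

Under these assumptions the two encodings differ only in the direction bits, so switching the macro changes the request number that user space has to pass. That is why such a fix is only safe while no user space header exports the ioctl.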
    • Btrfs: resume balance on rw (re)mounts properly · 2b6ba629
      Ilya Dryomov authored
      This introduces btrfs_resume_balance_async(), which, given that
      restriper state was recovered earlier by btrfs_recover_balance(),
      resumes balance in btrfs-balance kthread.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • Btrfs: restore restriper state on all mounts · 68310a5e
      Ilya Dryomov authored
      Fix a bug that triggered asserts in btrfs_balance() in both normal and
      resume modes -- restriper state was not properly restored on read-only
      mounts.  This factors out resuming code from btrfs_restore_balance(),
      which is now also called earlier in the mount sequence to avoid the
      problem of some early writes getting the old profile.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • Btrfs: fix dio write vs buffered read race · c3473e83
      Josef Bacik authored
      Miao pointed out there's a problem with mixing dio writes and buffered
      reads.  If the read happens between us invalidating the page range and
      actually locking the extent we can bring pages into the page cache.  Then
      once the write finishes if somebody tries to read again it will just find
      uptodate pages and we'll read stale data.  So we need to lock the extent and
      check for uptodate bits in the range.  If there are uptodate bits we need to
      unlock and invalidate again.  This will keep this race from happening since
      we will hold the extent locked until we create the ordered extent, and then
      the read side always waits for ordered extents.  There was also a race in
      how we updated i_size, previously we were relying on the generic DIO stuff
      to adjust the i_size after the DIO had completed, but this happens outside
      of the extent lock which means reads could come in and not see the updated
      i_size.  So instead move this work into where we create the extents, and
      then this way the update ordered i_size stuff works properly in the endio
      handlers.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: don't count I/O statistic read errors for missing devices · 597a60fa
      Stefan Behrens authored
      It is normal behaviour of the low level btrfs function btrfs_map_bio()
      to complete a bio with -EIO if the device is missing, instead of just
      preventing the bio creation in an earlier step.
      This used to cause I/O statistic read error increments and annoying
      printk_ratelimited messages. This commit fixes the issue.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      Reported-by: Carey Underwood <cwillu@cwillu.com>
  2. 27 Jun, 2012 7 commits
    • Btrfs: resolve tree mod log locking issue in btrfs_next_leaf · d42244a0
      Jan Schmidt authored
      With the tree mod log, we may end up with two roots (the current root and a
      rewinded version of it) both pointing to two leaves, l1 and l2, of which l2
      had already been cow-ed in the current transaction. If we don't rewind any
      tree blocks, we cannot have two roots both pointing to an already cowed tree
      block.
      
      Now there is btrfs_next_leaf, which has a leaf locked and wants a lock on
      the next (right) leaf. And there is push_leaf_left, which has a (cowed!)
      leaf locked and wants a lock on the previous (left) leaf.
      
      In order to solve this deadlock situation, we use try_lock in
      btrfs_next_leaf (only in case it's called with a tree mod log time_seq
      parameter) and if we fail to get a lock on the next leaf, we give up our lock
      on the current leaf and retry from the very beginning.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
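The try-lock-and-retry pattern described in d42244a0 can be sketched in user space with Python's `threading.Lock`. This is only an analogy for the kernel's tree locks: `lock_next_leaf`, the `on_retry` hook and the retry limit are hypothetical names introduced for illustration, and the real code restarts its search rather than simply looping.

```python
import threading


def lock_next_leaf(cur_lock, next_lock, on_retry=None, max_retries=1000):
    """Take cur_lock, then try next_lock without blocking.

    If next_lock is busy, drop cur_lock (so the holder of next_lock can make
    progress if it is waiting for cur_lock) and start over. Returns the
    number of retries that were needed; both locks are held on return.
    """
    for attempt in range(max_retries):
        cur_lock.acquire()
        if next_lock.acquire(blocking=False):
            return attempt
        # Never sleep on next_lock while holding cur_lock: that is the
        # lock-order inversion that produces the deadlock.
        cur_lock.release()
        if on_retry is not None:
            on_retry()
    raise RuntimeError("could not lock both leaves")
```

Because the caller never blocks while holding the first lock, two threads taking the same pair of locks in opposite order cannot wait on each other forever.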
    • Btrfs: fix tree mod log rewind of ADD operations · 19956c7e
      Jan Schmidt authored
      When a MOD_LOG_KEY_ADD operation is rewinded, we remove the key from the
      tree block. If it's not the last key, removal involves a move operation.
      This move operation was explicitly done before this commit.
      
      However, at insertion time, there's a move operation before the actual
      addition to make room for the new key, which is recorded in the tree mod
      log as well. This means, we must drop the move operation when rewinding the
      add operation, because the next operation we'll be rewinding will be the
      corresponding MOD_LOG_MOVE_KEYS operation.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
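The reasoning in 19956c7e can be traced with a toy model. The Python sketch below represents a tree block as a fixed-size list of slots and the tree mod log as a list of tuples. `insert_key` and `rewind` are hypothetical helpers, not kernel functions; the sketch only shows that rewinding ADD must not move keys, because the logged MOVE is rewound right after it.

```python
def insert_key(slots, nritems, slot, key, log):
    """Insert key at slot, logging the same two steps the message describes."""
    count = nritems - slot
    if count > 0:
        # Make room first; this move is recorded in the log.
        slots[slot + 1:slot + 1 + count] = slots[slot:slot + count]
        log.append(("MOVE", slot + 1, slot, count))  # (op, dst, src, count)
    slots[slot] = key
    log.append(("ADD", slot))
    return nritems + 1


def rewind(slots, nritems, log):
    """Undo the logged operations, newest first."""
    for op in reversed(log):
        if op[0] == "ADD":
            # Only drop the key. The matching MOVE entry, rewound next,
            # shifts the remaining keys back into place.
            nritems -= 1
        elif op[0] == "MOVE":
            _, dst, src, count = op
            slots[src:src + count] = slots[dst:dst + count]
    return nritems
```

If `rewind` also shifted keys while undoing ADD, the following MOVE entry would shift them a second time and corrupt the block.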
    • Btrfs: leave critical region in btrfs_find_all_roots as soon as possible · 155725c9
      Jan Schmidt authored
      When delayed refs exist, btrfs_find_all_roots used to hold the delayed ref
      mutex way longer than actually required. We ought to drop it immediately
      after we're done collecting all the delayed refs.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
    • Btrfs: always put insert_ptr modifications into the tree mod log · c3e06965
      Jan Schmidt authored
      Several callers of insert_ptr set the tree_mod_log parameter to 0 to avoid
      addition to the tree mod log. In fact, we need all of those operations. This
      commit simply removes the additional parameter and makes addition to the
      tree mod log unconditional.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
    • Btrfs: fix tree mod log for root replacements at leaf level · 28da9fb4
      Jan Schmidt authored
      For the tree mod log, we don't log any operations at leaf level. If the root
      is at the leaf level (i.e. the tree consists only of the root), then
      __tree_mod_log_oldest_root will find a ROOT_REPLACE operation in the log
      (because we always log that one no matter which level), but no other
      operations.
      
      With this patch __tree_mod_log_oldest_root exits cleanly instead of
      BUGging in this situation. get_old_root checks if it's really a root at leaf
      level in case we don't have any operations and WARNs if this assumption
      breaks.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
    • Btrfs: support root level changes in __resolve_indirect_ref · 9345457f
      Jan Schmidt authored
      With the tree mod log, we can have a tree that's two levels high, but
      btrfs_search_old_slot may still return a path with the tree root at level
      one instead. __resolve_indirect_ref must care for this and accept parents in
      a lower level than expected.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
    • Btrfs: avoid waiting for delayed refs when we must not · 8ca78f3e
      Jan Schmidt authored
      We track two conditions to decide if we should sleep while waiting for more
      delayed refs, the number of delayed refs (num_refs) and the first entry in
      the list of blockers (first_seq).
      
      When we suspect staleness, we save num_refs and do one more cycle. If
      nothing changes, we then save first_seq for later comparison and do
      wait_event. We ought to save first_seq the very same moment we're saving
      num_refs. Otherwise we cannot be sure that nothing has changed and we might
      start waiting when we shouldn't, which could lead to starvation.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
  3. 21 Jun, 2012 4 commits
  4. 16 Jun, 2012 2 commits
  5. 15 Jun, 2012 19 commits
    • Btrfs: update MAINTAINERS info for BTRFS FILE SYSTEM · 9c106405
      Liu Bo authored
      Update to the latest btrfs maintainer email address and git repo.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: destroy the items of the delayed inodes in error handling routine · 67cde344
      Miao Xie authored
      The items of the delayed inodes were not being freed; this patch
      fixes that.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: make sure that we've made everything in pinned tree clean · ed0eaa14
      Liu Bo authored
      Since we have two trees for recording pinned extents, we need to go through
      both of them to make sure that everything has been cleaned up.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: avoid memory leak of extent state in error handling routine · 6e841e32
      Liu Bo authored
      We've forgotten to clear extent states in the pinned tree, which results in a
      space counter mismatch and a memory leak:
      
      WARNING: at fs/btrfs/extent-tree.c:7537 btrfs_free_block_groups+0x1f3/0x2e0 [btrfs]()
      ...
      space_info 2 has 8380416 free, is not full
      space_info total=12582912, used=4096, pinned=4096, reserved=0, may_use=0, readonly=4194304
      btrfs state leak: start 29364224 end 29376511 state 1 in tree ffff880075f20090 refs 1
      ...
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: do not resize a seeding device · 4e42ae1b
      Liu Bo authored
      Seeding devices are not supposed to change any more.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: fix missing inherited flag in rename · bc178237
      Liu Bo authored
      When we move a file into a directory with the compression flag, we need to
      inherit BTRFS_INODE_COMPRESS and clear BTRFS_INODE_NOCOMPRESS as well.
      But if we move a file into a directory without the compression flag, we need
      to clear both of them.
      
      This is how our setflags handles the compression flag, so keep
      the same behaviour here.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
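The flag rule stated in bc178237 fits in a few lines. The Python sketch below is a plain restatement of that rule, not the kernel implementation: `inherit_compression` is a hypothetical helper and the bit positions are illustrative values that do not come from the commit message.

```python
# Illustrative bit positions (assumed, not taken from the commit message).
INODE_NOCOMPRESS = 1 << 3
INODE_COMPRESS = 1 << 11


def inherit_compression(inode_flags, dir_flags):
    """Return the inode flags after moving the inode into a directory."""
    if dir_flags & INODE_COMPRESS:
        # Target directory compresses: inherit COMPRESS, clear NOCOMPRESS.
        inode_flags &= ~INODE_NOCOMPRESS
        inode_flags |= INODE_COMPRESS
    else:
        # Target directory does not compress: clear both flags.
        inode_flags &= ~(INODE_COMPRESS | INODE_NOCOMPRESS)
    return inode_flags
```

Flags unrelated to compression pass through unchanged in both branches.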
    • Chris Mason's avatar
    • Btrfs: fix incompat flags setting · 69e380d1
      Li Zefan authored
      It's a bug, but it happens to work, as BTRFS_COMPRESS_LZO == 2, which
      has only one bit set.
      Signed-off-by: Li Zefan <lizefan@huawei.com>
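The message for 69e380d1 does not show the faulty expression. One reading, assumed here, is that an enumerated compression type was tested with a bitwise AND instead of an equality comparison. The Python sketch below shows why such a test gives the right answer by accident when the constant is 2. Only `COMPRESS_LZO == 2` comes from the message; the other values, including the extra type 3, are assumptions made to expose the difference.

```python
COMPRESS_NONE = 0      # assumed
COMPRESS_ZLIB = 1      # assumed
COMPRESS_LZO = 2       # stated in the commit message
HYPOTHETICAL_TYPE = 3  # assumed extra value, used only to expose the bug


def is_lzo_bitwise(compress_type):
    """Assumed buggy form: treats an enumerated value as a bit mask."""
    return bool(compress_type & COMPRESS_LZO)


def is_lzo_equal(compress_type):
    """Assumed fixed form: compares the enumerated value directly."""
    return compress_type == COMPRESS_LZO
```

For the values 0, 1 and 2 both forms agree, which matches the remark that the bug "happens to work". A value such as 3, which shares a bit with 2, would be misdetected by the bitwise form.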
    • Btrfs: fix defrag regression · 6c282eb4
      Li Zefan authored
      If a file has 3 small extents:
      
      | ext1 | ext2 | ext3 |
      
      Running "btrfs fi defrag" will only defrag the last two extents, if those
      extent mappings haven't been read into memory from disk.
      
      This bug was introduced by commit 17ce6ef8
      ("Btrfs: add a check to decide if we should defrag the range")
      
      The cause is, that commit looked into previous and next extents using
      lookup_extent_mapping() only.
      
      While at it, remove the code that checks the previous extent, since
      it's sufficient to check the next extent.
      Signed-off-by: Li Zefan <lizefan@huawei.com>
    • Btrfs: call filemap_fdatawrite twice for compression · 7ddf5a42
      Josef Bacik authored
      I removed this in an earlier commit and I was wrong.  Because compression
      can return from filemap_fdatawrite() without having actually set any of its
      pages as writeback() it can make filemap_fdatawait() do essentially nothing,
      and then we won't find any ordered extents because they may not have been
      created yet.  So not only does this make fsync() completely useless, but it
      will also screw up if you truncate on a non-page aligned offset since we
      zero out the end and then wait on ordered extents and then call drop caches.
      We can drop the cache before the io completes, and then when we try to unpin the
      extent we just wrote we won't find it and everything goes sideways.  So fix
      this by putting it back and put a giant comment there to keep me from trying
      to remove it in the future.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: keep inode pinned when compressing writes · 8180ef88
      Josef Bacik authored
      A user reported lots of problems using compression on the new code and it
      turns out part of the problem was that igrab() was failing when we added a
      new ordered extent.  This is because when writing out an inode under
      compression we immediately return without actually doing anything to the
      pages, and then in another thread at some point down the line actually do
      the ordered dance.  The problem is between the point that we start writeback
      and we actually add the ordered extent we could be trying to reclaim the
      inode, which makes igrab() return NULL.  So we need to do an igrab() when we
      create the async extent and then drop it when we are done with it.  This
      makes sure we stay pinned in memory until the ordered extent can get a
      reference on it and we are good to go.  With this patch we no longer panic
      in btrfs_finish_ordered_io().  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: implement ->show_devname · 9c5085c1
      Josef Bacik authored
      Because btrfs can remove the device that was mounted we need to have a
      ->show_devname so that in this case we can print out some other device in
      the file system to /proc/mounts.  So if there are multiple devices in a btrfs
      file system we will just print the device with the lowest devid that we can
      find.  This will make everything consistent and deal with device removal
      properly.  The drawback is if you mount with a device that is higher than
      the lowest devid it won't show up as the mounted device in /proc/mounts,
      but this is a small price to pay. This was inspired by Miao Xie's patch.
      Thanks,
      Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: use rcu to protect device->name · 606686ee
      Josef Bacik authored
      Al pointed out that we can just toss out the old name on a device and add a
      new one arbitrarily, so anybody who uses device->name in printk could
      possibly use free'd memory.  Instead of adding locking around all of this he
      suggested doing it with RCU, so I've introduced a struct rcu_string that
      does just that and have gone through and protected all accesses to
      device->name that aren't under the uuid_mutex with rcu_read_lock().  This
      protects us and I will use it for dealing with removing the device that we
      used to mount the file system in a later patch.  Thanks,
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: unlock everything properly in the error case for nocow · 17ca04af
      Josef Bacik authored
      I was getting hung on umount when a transaction was aborted because a range
      of one of the free space inodes was still locked.  This is because the nocow
      stuff doesn't unlock anything on error.  This fixed the problem and I
      verified that is what was happening.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: fix btrfs_destroy_marked_extents · ee670f0a
      Josef Bacik authored
      So we're forcing the eb's to have their ref count set to 1 so invalidatepage
      works but this breaks lots of things, for example root nodes, and is just
      plain wrong, we don't need to just evict all of this stuff.  Also drop the
      invalidatepage altogether and add a page_cache_release().  With this patch
      we no longer hang when trying to access the root nodes after an aborted
      transaction and we no longer leak memory.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: abort the transaction if the commit fails · 7b8b92af
      Josef Bacik authored
      If a transaction commit fails we don't abort it so we don't set an error on
      the file system.  This patch fixes that by actually calling the abort stuff
      and then adding a check for a fs error in the transaction start stuff to
      make sure it is caught properly.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: wake up transaction waiters when aborting a transaction · d7096fc3
      Josef Bacik authored
      I was getting lots of hung tasks and a NULL pointer dereference because we
      are not cleaning up the transaction properly when it aborts.  First we need
      to reset the running_transaction to NULL so we don't get a bad dereference
      for any start_transaction callers after this.  Also we cannot rely on
      waitqueue_active() since it's just a list_empty(), so just call wake_up()
      directly since that will do the barrier for us and such.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: fix locking in btrfs_destroy_delayed_refs · b939d1ab
      Josef Bacik authored
      The transaction abort stuff was throwing warnings from the list debugging
      code because we do a list_del_init outside of the delayed_refs spin lock.
      The delayed refs locking makes baby Jesus cry so it's not hard to get wrong,
      but we need to take the ref head mutex to make sure it's not being processed
      currently, and so if it is we need to drop the spin lock and then take and
      drop the mutex and do the search again.  If we can take the mutex then we
      can safely remove the head from the list and carry on.  Now when the
      transaction aborts I don't get the list debugging warnings.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: pass locked_page into extent_clear_unlock_delalloc if theres an error · beb42dd7
      Josef Bacik authored
      While doing my enospc work I got a transaction abort that resulted in a
      panic when we tried to unlock_page() an already unlocked page.  This is
      because we aren't calling extent_clear_unlock_delalloc with the locked page
      so it was unlocking all the pages in the range.  This is wrong since
      __extent_writepage expects to have the page locked still unless we return
      *page_started as 1.  This should keep us from panicking.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>