1. 02 Jul, 2013 1 commit
    • Btrfs: check for actual acls rather than just xattrs when caching no acl · f23b5a59
      Josef Bacik authored
      We have an optimization that caches "no ACLs" on an inode if there are no
      xattrs on the inode.  This saves us a lookup later to check the ACLs for
      writes or any other access.  The problem is that I use SELinux, so I always
      have an xattr on inodes.  Make this test a little smarter: check for the
      actual ACL name hash in the key, and if it isn't there we still get to cache
      no ACL, which makes everybody who uses SELinux a little happier.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      f23b5a59
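The check described in this commit can be sketched roughly as follows. This is a minimal userspace illustration, not the kernel code: `name_hash` is a stand-in FNV-1a hash and `can_cache_no_acl` is a hypothetical helper, on the assumption that xattr items are keyed by a hash of the xattr name. Only the two POSIX ACL xattr names are standard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in name hash (64-bit FNV-1a). Assumption: the real code uses the
 * filesystem's own name hash; any deterministic hash works for the sketch. */
static uint64_t name_hash(const char *name)
{
    uint64_t h = 0xcbf29ce484222325ULL;

    for (; *name; name++) {
        h ^= (unsigned char)*name;
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Returns true when no xattr key offset matches the hash of an ACL xattr
 * name, i.e. "no ACL" can be cached even though other xattrs exist. */
static bool can_cache_no_acl(const uint64_t *xattr_key_offsets, size_t n)
{
    const uint64_t acl_access = name_hash("system.posix_acl_access");
    const uint64_t acl_default = name_hash("system.posix_acl_default");

    assert(xattr_key_offsets != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (xattr_key_offsets[i] == acl_access ||
            xattr_key_offsets[i] == acl_default)
            return false; /* possible ACL (or a hash collision): do not cache */
    }
    return true;
}
```

An inode that only carries `security.selinux` would pass this check, while one that also carries `system.posix_acl_access` would not.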
  2. 01 Jul, 2013 13 commits
    • Btrfs: move btrfs_truncate_page to btrfs_cont_expand instead of btrfs_truncate · a71754fc
      Josef Bacik authored
      This has plagued us forever and I'm so over working around it.  When we truncate
      down to a non-page aligned offset we will call btrfs_truncate_page to zero out
      the end of the page and write it back to disk.  This keeps us from exposing
      stale data if we truncate back up from that point.  The problem with this is it
      requires data space to do this, and people don't really expect to get ENOSPC
      from truncate() for this sort of thing.  This also tends to bite the orphan
      cleanup stuff too, which keeps people from mounting.  To get around this we can
      just move this into btrfs_cont_expand() to make sure if we are truncating up
      from a non-page size aligned i_size we will zero out the rest of this page so
      that we don't expose stale data.  This will give ENOSPC if you try to truncate()
      up or if you try to write past the end of i_size, which is much more reasonable.
      This fixes xfstests generic/083 failing to mount because of the orphan cleanup
      failing.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      a71754fc
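The idea of zeroing the page tail only when the file grows can be sketched like this. It is a userspace illustration under stated assumptions: a fixed 4096-byte page, an in-memory buffer standing in for the page, and hypothetical helper names (`stale_tail_len`, `zero_tail_on_expand`) that do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* Bytes between i_size and the end of its page; 0 when page aligned. */
static uint32_t stale_tail_len(uint64_t i_size)
{
    uint32_t off = (uint32_t)(i_size % SKETCH_PAGE_SIZE);

    return off ? SKETCH_PAGE_SIZE - off : 0;
}

/* Zero the tail of the page holding old_size before the file grows to
 * new_size. Returns the number of bytes zeroed. */
static uint32_t zero_tail_on_expand(unsigned char *page, uint64_t old_size,
                                    uint64_t new_size)
{
    uint32_t len = stale_tail_len(old_size);

    assert(page != NULL && new_size >= old_size);
    if (new_size == old_size || len == 0)
        return 0;
    memset(page + (SKETCH_PAGE_SIZE - len), 0, len);
    return len;
}
```

With an old size of 5000 bytes the offset inside the page is 904, so growing the file zeroes the remaining 3192 bytes of that page. A page-aligned old size needs no zeroing.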
    • Btrfs: optimize reada_for_balance · 0b08851f
      Josef Bacik authored
      This patch does two things.  First we no longer explicitly read in the blocks
      we're trying to readahead.  For things like balance_level we may never actually
      use the blocks so this just adds unneeded latency, and balance_level and
      split_node will both read in the blocks they care about explicitly so if the
      blocks need to be waited on it will be done there.  Secondly we no longer drop
      the path if we do readahead, we just set the path blocking before we call
      reada_for_balance() and then we're good to go.  Hopefully this will cut down on
      the number of re-searches.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      0b08851f
    • Btrfs: optimize read_block_for_search · bdf7c00e
      Josef Bacik authored
      This patch does two things.  First, it only does one call to
      btrfs_buffer_uptodate() with the gen specified instead of once with 0 and then
      again with gen specified.  The other thing is to call btrfs_read_buffer() on the
      buffer we've found instead of dropping it and then calling read_tree_block().
      This will keep us from doing yet another radix tree lookup for a buffer we've
      already found.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      bdf7c00e
    • Btrfs: unlock extent range on enospc in compressed submit · fdf8e2ea
      Josef Bacik authored
      A user reported a deadlock where the async submit thread was blocked on the
      lock_extent() lock, and then everybody behind him was locked on the page lock
      for the page he was holding.  Looking at the code I noticed we do not unlock the
      extent range when we get ENOSPC and goto retry.  This is bad because we
      immediately try to lock that range again to do the cow, which will cause a
      deadlock.  Fix this by unlocking the range.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      fdf8e2ea
    • Btrfs: fix the comment typo for btrfs_attach_transaction_barrier · 90b6d283
      Wang Sheng-Hui authored
      The comment is for btrfs_attach_transaction_barrier, not for
      btrfs_attach_transaction. Fix the typo.
      Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
      Acked-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      90b6d283
    • Btrfs: fix not being able to find skinny extents during relocate · aee68ee5
      Josef Bacik authored
      We unconditionally search for the EXTENT_ITEM_KEY for metadata during balance,
      and then check the key that we found to see if it is actually a
      METADATA_ITEM_KEY, but this doesn't work right because METADATA is a higher key
      value, so if what we are looking for happens to be the first item in the leaf
      the search will dump us out at the previous leaf, and we won't find our item.
      So instead do what we do everywhere else, search for the skinny extent first and
      if we don't find it go back and re-search for the extent item.  This patch fixes
      the panic I was hitting when balancing a large file system with skinny extents.
      Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      aee68ee5
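The "skinny extent first, then fall back" lookup order can be sketched as below. This is a simplified model, not the relocation code: it searches one sorted in-memory array instead of a B-tree, so it does not reproduce the leaf-boundary problem itself. The helper names are hypothetical. The key type values 168 and 169 are, to my knowledge, the btrfs on-disk values for EXTENT_ITEM_KEY and METADATA_ITEM_KEY, and the key layouts (offset is the level for skinny items, the byte length for extent items) are an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { EXTENT_ITEM_KEY = 168, METADATA_ITEM_KEY = 169 };

struct key {
    uint64_t objectid;
    uint8_t type;
    uint64_t offset;
};

/* Order keys by (objectid, type, offset). */
static int key_cmp(const struct key *a, const struct key *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/* Binary search for an exact key; returns its index or -1. */
static int find_exact(const struct key *keys, size_t n, const struct key *want)
{
    size_t lo = 0, hi = n;

    assert(keys != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = key_cmp(&keys[mid], want);

        if (c == 0)
            return (int)mid;
        if (c < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}

/* Look for the skinny metadata item first, then re-search for the
 * regular extent item if it is not there. */
static int find_tree_block_extent(const struct key *keys, size_t n,
                                  uint64_t bytenr, uint64_t level,
                                  uint64_t blocksize)
{
    struct key want = { bytenr, METADATA_ITEM_KEY, level };
    int slot = find_exact(keys, n, &want);

    if (slot >= 0)
        return slot;
    want.type = EXTENT_ITEM_KEY;
    want.offset = blocksize;
    return find_exact(keys, n, &want);
}
```

Searching for the higher key type first means a miss is followed by a second, explicit search rather than a guess about which neighbouring slot the first search landed on.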
    • Btrfs: cleanup backref search commit root flag stuff · da61d31a
      Josef Bacik authored
      Looking into this backref problem I noticed we're using a macro for what turns
      out to be essentially a NULL check to see if we need to search the commit root.
      I'm killing this, let's just do what everybody else does and check if trans ==
      NULL.  I've also made it so we pass in the path to __resolve_indirect_refs which
      will have the search_commit_root flag set properly already and that way we can
      avoid allocating another path when we have a perfectly good one to use.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      da61d31a
    • Btrfs: free csums when we're done scrubbing an extent · d88d46c6
      Josef Bacik authored
      A user reported scrub taking up an unreasonable amount of RAM as it ran.  This
      is because we look up the csums for the extent we're scrubbing but don't free
      them until after we're done with the scrub, which means we can take up a whole
      lot of RAM.  This patch fixes this by dropping the csums once we're done with
      the extent we've scrubbed.  The user reported that this fixes their problem.
      Thanks,
      Reported-and-tested-by: Remco Hosman <remco@hosman.xs4all.nl>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      d88d46c6
    • Btrfs: fix transaction throttling for delayed refs · 1be41b78
      Josef Bacik authored
      Dave has this fs_mark script that can make btrfs abort with a sufficient amount
      of RAM.  This is because with more RAM we can keep more dirty metadata in cache,
      which in a roundabout way makes for many more pending delayed refs.  What
      happens is we end up not throttling the transaction enough, so when we go to
      commit the transaction when we've completely filled the file system we'll
      abort() because we use all of the space in the global reserve and we still have
      delayed refs to run.  To fix this we need to make the delayed ref flushing and
      the transaction throttling dependent upon the number of delayed refs that we
      have instead of how much reserved space is left in the global reserve.  With
      this patch we not only stop aborting transactions but we also get a smoother run
      speed with fs_mark and it makes us about 10% faster.  Thanks,
      Reported-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      1be41b78
    • Btrfs: stop waiting on current trans if we aborted · 501407aa
      Josef Bacik authored
      I hit a hang when run_delayed_refs returned an error in the beginning of
      btrfs_commit_transaction.  If we decide we need to commit the transaction in
      btrfs_end_transaction we'll set BLOCKED and start to commit, but if we get an
      error this early on we'll just exit without committing.  This is fine, except
      that anybody else who tries to start a transaction will sit in
      wait_current_trans() since we're set to BLOCKED and we never set it to something
      else or wake people up.  To fix this we want to check for trans->aborted
      everywhere we wait for the transaction state to change, and make
      btrfs_abort_transaction() wake up any waiters there may be.  All the callers
      will notice that the transaction has aborted and exit out properly.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      501407aa
    • Btrfs: wake up delayed ref flushing waiters on abort · f971fe29
      Josef Bacik authored
      I hit a deadlock because we aborted when flushing delayed refs but didn't wake
      any of the other flushers up and so everybody was just sleeping forever.  This
      should fix the problem.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      f971fe29
    • btrfs: fix the code comments for LZO compression workspace · 3fb40375
      Jie Liu authored
      Fix the code comments for lzo compression workspace.
      The buf item is used to store the decompressed data
      and cbuf is used to store the compressed data.
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      3fb40375
    • Btrfs: fix broken nocow after balance · 5bc7247a
      Miao Xie authored
      Balance will create reloc_root for each fs root, and it's going to
      record last_snapshot to filter shared blocks.  The side effect of
      setting last_snapshot is to break nocow attributes of files.

      Since the extents are not shared by the relocation tree after the balance,
      we can recover the old last_snapshot safely if no one snapshotted the
      source tree.  This patch fixes the problem that way.
      Reported-by: Kyle Gates <kylegates@hotmail.com>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      5bc7247a
  3. 14 Jun, 2013 26 commits
    • Btrfs: exclude logged extents before replying when we are mixed · 8c2a1a30
      Josef Bacik authored
      With non-mixed block groups we replay the logs before we're allowed to do any
      writes, so we get away with not pinning/removing the data extents until right
      when we replay them.  However with mixed block groups we allocate out of the
      same pool, so we could easily allocate a metadata block that was logged in our
      tree log.  To deal with this we just need to notice that we have mixed block
      groups and do the normal excluding/removal dance during the pin stage of the log
      replay and that way we don't allocate metadata blocks from areas we have logged
      data extents.  With this patch we now pass xfstests generic/311 with mixed
      block groups turned on.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      8c2a1a30
    • Btrfs: put our inode if orphan cleanup fails · 01cd3367
      Josef Bacik authored
      When we cross into a different subvol when doing a lookup we will run the orphan
      cleanup.  If this fails however we do not drop the ref to the inode we were
      looking up before we return an error, which leads to busy inodes on umount.
      Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      01cd3367
    • Btrfs: add some missing iput()'s in btrfs_orphan_cleanup · c69b26b0
      Josef Bacik authored
      There are some error cases where we don't do an iput() on our inode, fix this.
      Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      c69b26b0
    • Btrfs: do not pin while under spin lock · e78417d1
      Josef Bacik authored
      When testing a corrupted fs I noticed I was getting sleep while atomic errors
      when the transaction aborted.  This is because btrfs_pin_extent may need to
      allocate memory and we are calling this under the spin lock.  Fix this by moving
      it out and doing the pin after dropping the spin lock but before dropping the
      mutex, the same way it works when delayed refs run normally.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      e78417d1
    • Btrfs: Cocci spatch "memdup.spatch" · a5959bc0
      Thomas Meyer authored
      Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      a5959bc0
    • Btrfs: Cocci spatch "ptr_ret.spatch" · 97a184fe
      Thomas Meyer authored
      Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      97a184fe
    • Btrfs: fix qgroup rescan resume on mount · b382a324
      Jan Schmidt authored
      When called during mount, we cannot start the rescan worker thread until
      open_ctree is done. This commit restructures the qgroup rescan internals to
      enable a clean deferral of the rescan resume operation.

      First of all, the struct qgroup_rescan is removed, saving us a malloc and
      some initialization synchronization problems. Its only element (the worker
      struct) now lives within fs_info just as the rest of the rescan code.
      
      Then setting up a rescan worker is split into several reusable stages.
      Currently we have three different rescan startup scenarios:
      	(A) rescan ioctl
      	(B) rescan resume by mount
      	(C) rescan by quota enable
      
      Each case needs its own combination of the four following steps:
      	(1) set the progress [A, C: zero; B: state of umount]
      	(2) commit the transaction [A]
      	(3) set the counters [A, C: zero; B: state of umount]
      	(4) start worker [A, B, C]
      
      qgroup_rescan_init does step (1). There's no extra function added to commit
      a transaction, we've got that already. qgroup_rescan_zero_tracking does
      step (3). Step (4) is nothing more than a call to the generic
      btrfs_queue_worker.
      
      We also get rid of a double check for the rescan progress during
      btrfs_qgroup_account_ref, which is no longer required due to having step 2
      from the list above.
      
      As a side effect, this commit prepares to move the rescan start code from
      btrfs_run_qgroups (which is run during commit) to a less time critical
      section.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      b382a324
    • Btrfs: avoid double free of fs_info->qgroup_ulist · eb1716af
      Jan Schmidt authored
      When btrfs_read_qgroup_config or btrfs_quota_enable return non-zero, we've
      already freed the fs_info->qgroup_ulist. The final btrfs_free_qgroup_config
      called from quota_disable makes another ulist_free(fs_info->qgroup_ulist)
      call.
      
      We set fs_info->qgroup_ulist to NULL on the mentioned error paths, turning
      the ulist_free in btrfs_free_qgroup_config into a noop.
      
      Cc: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      eb1716af
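The "set the pointer to NULL on the error path" pattern from this commit can be sketched in userspace C. The struct and function names here are hypothetical stand-ins, and `free()` plays the role of `ulist_free`, on the assumption that both treat a NULL argument as a no-op (which the commit message states for `ulist_free`).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the part of fs_info that owns the list. */
struct fs_info_sketch {
    int *qgroup_ulist;
};

/* Allocate the list; on a simulated failure, free it and clear the
 * pointer so that a later cleanup cannot free it a second time. */
static int read_config_sketch(struct fs_info_sketch *fs, int simulate_failure)
{
    assert(fs != NULL);
    fs->qgroup_ulist = malloc(sizeof(*fs->qgroup_ulist));
    if (!fs->qgroup_ulist)
        return -1;
    if (simulate_failure) {
        free(fs->qgroup_ulist);
        fs->qgroup_ulist = NULL; /* turns the later free into a no-op */
        return -1;
    }
    return 0;
}

/* Final cleanup, always called, also after a failed read. */
static void free_config_sketch(struct fs_info_sketch *fs)
{
    assert(fs != NULL);
    free(fs->qgroup_ulist); /* free(NULL) does nothing */
    fs->qgroup_ulist = NULL;
}
```

Calling `free_config_sketch` after a failed `read_config_sketch` is safe here only because the error path cleared the pointer.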
    • Btrfs: fix memory patcher through fs_info->qgroup_ulist · 4373519d
      Jan Schmidt authored
      Commit 5b7c665e introduced fs_info->qgroup_ulist, which is allocated during
      btrfs_read_qgroup_config and meant to be used later by the qgroup accounting
      code. However, it is always freed before btrfs_read_qgroup_config returns,
      because the commit mentioned above adds a check for (ret), where a check
      for (ret < 0) would have been the right choice. This commit fixes the check.

      Cc: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      4373519d
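The difference between checking `(ret)` and `(ret < 0)` can be shown with a small sketch. It assumes the common convention that a lookup returns a negative value on error, 0 when the item was found and a positive value when it was not found. All names below are hypothetical, and the `fixed` flag exists only to compare the two checks side by side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct ctx_sketch {
    void *ulist;
};

/* Hypothetical lookup: <0 error, 0 found, >0 not found. */
static int lookup_sketch(int outcome)
{
    return outcome;
}

/* Allocate the list, run the lookup, and free the list on failure.
 * With fixed == false a positive "not found" result is mistaken for
 * an error and the list is freed even though the call succeeds. */
static int load_sketch(struct ctx_sketch *c, int outcome, bool fixed)
{
    int ret;
    bool failed;

    assert(c != NULL);
    c->ulist = malloc(16);
    if (!c->ulist)
        return -1;
    ret = lookup_sketch(outcome);
    failed = fixed ? (ret < 0) : (ret != 0);
    if (failed) {
        free(c->ulist);
        c->ulist = NULL;
    }
    return ret < 0 ? ret : 0;
}
```

With the old check, a positive result leaves the caller with a successful return code and no list. With the `ret < 0` check the list survives.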
    • Btrfs: simplify unlink reservations · d52be818
      Josef Bacik authored
      Dave pointed out a problem where if you filled up a file system as much as
      possible you couldn't remove any files.  The whole unlink reservation thing is
      convoluted because it tries to guess if it's going to add space to unlink
      something or not, and has all these odd uncommented cases where it simply does
      not try.  So to fix this I've added a way to conditionally steal from the global
      reserve if we can't make our normal reservation.  If we have more than half the
      space in the global reserve free we will go ahead and steal from the global
      reserve.  With this patch Dave's reproducer now works and I can rm all the files
      on the file system.  Thanks,
      Reported-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      d52be818
    • Btrfs: merge pending IO for tree log write back · c6adc9cc
      Miao Xie authored
      Before this patch, we first flushed the log tree of the fs/file tree and
      then flushed the log root tree. That is inefficient, especially on hard
      disks. This patch improves the situation by wrapping the two flushes in
      the same blk_plug.

      In testing, sync write performance went up ~60% (2.9MB/s -> 4.6MB/s)
      on my SCSI disk with the disk buffer enabled.
      
      Test step:
       # mkfs.btrfs -f -m single <disk>
       # mount <disk> <mnt>
       # dd if=/dev/zero of=<mnt>/file0 bs=32K count=1024 oflag=sync
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      c6adc9cc
    • Btrfs: allow file data clone within a file · a96fbc72
      Liu Bo authored
      We did not allow file data clone within the same file because of
      deadlock issues.
      
      However, we now use nested lock to avoid deadlock between the
      parent directory and the child file.
      
      So it's safe to do file clone within the same file when the two
      ranges do not overlap.
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      a96fbc72
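The non-overlap condition for a clone within one file can be written as a small predicate. This is a sketch with hypothetical function names. It assumes byte ranges of equal length given as offset plus length and does not guard against offset overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if [a, a + len) and [b, b + len) share at least one byte. */
static bool ranges_overlap(uint64_t a, uint64_t b, uint64_t len)
{
    return a < b + len && b < a + len;
}

/* A clone inside a single file is acceptable only when the source and
 * destination ranges do not overlap. */
static bool clone_within_file_allowed(uint64_t src_off, uint64_t dst_off,
                                      uint64_t len)
{
    bool overlap = ranges_overlap(src_off, dst_off, len);

    assert(!(src_off == dst_off && len > 0) || overlap);
    return !overlap;
}
```

Cloning 4096 bytes from offset 0 to offset 4096 would be allowed under this rule, while cloning the same length from offset 0 to offset 2048 would not.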
    • Btrfs: remove unused code in btrfs_del_root · b7394eb9
      Liu Bo authored
      'leaf' and 'ri' are not used.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      b7394eb9
    • Btrfs: kill replicate code in replay_one_buffer · 2da1c669
      Liu Bo authored
      EXTREF is treated the same as REF, so we can tidy up the code.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      2da1c669
    • Btrfs: update new flags for tracepoint · e112e2b4
      Liu Bo authored
      Adding new flags to keep tracepoints consistent with btrfs.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      e112e2b4
    • Btrfs: check if leaf's parent exists before pushing items around · 33157e05
      Liu Bo authored
      When splitting a leaf, pushing items around to hopefully get some space only
      works when we have a parent, i.e. we have at least one sibling leaf.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      33157e05
    • Btrfs: dont do log_removal in insert_new_root · fdd99c72
      Liu Bo authored
      When splitting a leaf, the root is just the leaf, and the tree mod log does not
      apply to leaves, so in this case we don't do log_removal.

      When splitting a node, the old root is kept as a normal node and we have already
      put records in the tree mod log for moving keys and items, so in this case we
      don't do that either.

      Given the above, insert_new_root can get rid of log_removal.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      fdd99c72
    • Btrfs: return error code in btrfs_check_trunc_cache_free_space() · 4b286cd1
      Wei Yongjun authored
      Return the error code instead of always returning 0 from
      btrfs_check_trunc_cache_free_space().
      Introduced by commit 7b61cd92
      (Btrfs: don't use global block reservation for inode cache truncation)
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      4b286cd1
    • Btrfs: fix estale with btrfs send · 139f807a
      Josef Bacik authored
      This fixes bugzilla 57491.  If we take a snapshot of a fs with an unlink ongoing
      and then try to send that root we will run into problems.  When comparing with a
      parent root we will search the parent's and the send root's commit_root, which if
      we've just created the snapshot will include the file that needs to be evicted
      by the orphan cleanup.  So when we find a changed extent we will try and copy
      that info into the send stream, but when we look up the inode we use the normal
      root, which no longer has the inode because the orphan cleanup deleted it.  The
      best solution I have for this is to check our otransid with the generation of
      the commit root and if they match just commit the transaction again, that way we
      get the changes from the orphan cleanup.  With this patch the reproducer I made
      for this bugzilla no longer returns ESTALE when trying to do the send.  Thanks,

      Cc: stable@vger.kernel.org
      Reported-by: Chris Wilson <jakdaw@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      139f807a
    • btrfs: device delete to get errors from the kernel · 183860f6
      Anand Jain authored
      When the user runs "btrfs dev del", any error about RAID requirements
      goes to /var/log/messages. It is not a good idea to clutter the system log
      with these user (knowledge) errors, and the user should not have to review
      the system messages to learn what went wrong with the CLI. The error should
      be returned to the user as part of the CLI result.

      To provide this, this patch defines a set of BTRFS_ERROR_DEV* error codes
      and their error strings.

      I expect this enum to grow with other errors that we might want to
      communicate to user land.
      
      v3:
      moved the code with in the file no logical change
      
      v1->v2:
      introduce error codes for the device mgmt usage
      
      v1:
      adds a parameter in the ioctl arg struct to carry the error string
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      183860f6
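The approach of returning an error code and mapping it to a string on the user side can be sketched as follows. The enum members, the strings and the helper are hypothetical placeholders chosen for illustration. They are not the actual BTRFS_ERROR_DEV* definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device-management error codes returned to user land. */
enum dev_err_code {
    DEV_ERR_NONE = 0,
    DEV_ERR_RAID1_MIN_NOT_MET,
    DEV_ERR_ONLY_WRITABLE,
    DEV_ERR_MAX
};

/* One message per code, indexed by the enum value. */
static const char *const dev_err_strings[DEV_ERR_MAX] = {
    [DEV_ERR_NONE] = "no error",
    [DEV_ERR_RAID1_MIN_NOT_MET] = "unable to go below two devices on raid1",
    [DEV_ERR_ONLY_WRITABLE] = "unable to remove the only writable device",
};

/* Map a code to its message; unknown codes get a generic string. */
static const char *dev_err_str(int code)
{
    if (code < 0 || code >= DEV_ERR_MAX)
        return "unknown error";
    assert(dev_err_strings[code] != NULL);
    return dev_err_strings[code];
}
```

A command-line tool could print `dev_err_str(code)` directly, so the explanation reaches the user instead of the system log.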
    • Btrfs: do delay iput in sync_fs · c73e2936
      Josef Bacik authored
      We get lock inversion with umount if we allow iputs from sync_fs, so use the
      delay iput flag to keep this from happening.  Thanks,
      Reported-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      c73e2936
    • Btrfs: make the state of the transaction more readable · 4a9d8bde
      Miao Xie authored
      We used 3 variables to track the state of the transaction, which was complex
      and wasted memory. Besides that, it was hard to understand which types of
      transaction handles should be blocked in each transaction state, so
      developers often made mistakes.

      This patch improves the situation. It defines 6 states for the
      transaction,
        enum btrfs_trans_state {
      	TRANS_STATE_RUNNING		= 0,
      	TRANS_STATE_BLOCKED		= 1,
      	TRANS_STATE_COMMIT_START	= 2,
      	TRANS_STATE_COMMIT_DOING	= 3,
      	TRANS_STATE_UNBLOCKED		= 4,
      	TRANS_STATE_COMPLETED		= 5,
      	TRANS_STATE_MAX			= 6,
        };
      and uses just 1 variable to track the state.

      In order to make the blocked handle types for each state clearer,
      we introduce an array:
        unsigned int btrfs_blocked_trans_types[TRANS_STATE_MAX] = {
      	[TRANS_STATE_RUNNING]		= 0U,
      	[TRANS_STATE_BLOCKED]		= (__TRANS_USERSPACE |
      					   __TRANS_START),
      	[TRANS_STATE_COMMIT_START]	= (__TRANS_USERSPACE |
      					   __TRANS_START |
      					   __TRANS_ATTACH),
      	[TRANS_STATE_COMMIT_DOING]	= (__TRANS_USERSPACE |
      					   __TRANS_START |
      					   __TRANS_ATTACH |
      					   __TRANS_JOIN),
      	[TRANS_STATE_UNBLOCKED]		= (__TRANS_USERSPACE |
      					   __TRANS_START |
      					   __TRANS_ATTACH |
      					   __TRANS_JOIN |
      					   __TRANS_JOIN_NOLOCK),
      	[TRANS_STATE_COMPLETED]		= (__TRANS_USERSPACE |
      					   __TRANS_START |
      					   __TRANS_ATTACH |
      					   __TRANS_JOIN |
      					   __TRANS_JOIN_NOLOCK),
        };
      which is much more intuitive.

      Besides that, because we remove ->in_commit from the transaction structure,
      the lock ->commit_lock which was used to protect it is unnecessary, so remove
      ->commit_lock.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      4a9d8bde
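How such a per-state table might be consulted is sketched below. The table contents mirror the array in the commit message, but the bit positions of the handle types, the shortened macro names (without the leading double underscore, which is reserved in userspace C) and the `must_wait` helper are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Handle-type bits; the bit positions are assumptions of this sketch. */
#define T_USERSPACE   (1U << 8)
#define T_START       (1U << 9)
#define T_ATTACH      (1U << 10)
#define T_JOIN        (1U << 11)
#define T_JOIN_NOLOCK (1U << 12)

enum trans_state {
    TRANS_STATE_RUNNING = 0,
    TRANS_STATE_BLOCKED,
    TRANS_STATE_COMMIT_START,
    TRANS_STATE_COMMIT_DOING,
    TRANS_STATE_UNBLOCKED,
    TRANS_STATE_COMPLETED,
    TRANS_STATE_MAX
};

/* For each state, the set of handle types that must wait. */
static const unsigned int blocked_types[TRANS_STATE_MAX] = {
    [TRANS_STATE_RUNNING]      = 0U,
    [TRANS_STATE_BLOCKED]      = T_USERSPACE | T_START,
    [TRANS_STATE_COMMIT_START] = T_USERSPACE | T_START | T_ATTACH,
    [TRANS_STATE_COMMIT_DOING] = T_USERSPACE | T_START | T_ATTACH | T_JOIN,
    [TRANS_STATE_UNBLOCKED]    = T_USERSPACE | T_START | T_ATTACH | T_JOIN |
                                 T_JOIN_NOLOCK,
    [TRANS_STATE_COMPLETED]    = T_USERSPACE | T_START | T_ATTACH | T_JOIN |
                                 T_JOIN_NOLOCK,
};

/* True if a handle of the given type has to wait in this state. */
static bool must_wait(enum trans_state state, unsigned int type)
{
    assert((unsigned int)state < TRANS_STATE_MAX);
    return (blocked_types[state] & type) != 0;
}
```

A single table lookup replaces separate flags: for example, a `T_JOIN` handle passes through `TRANS_STATE_BLOCKED` but has to wait once the state reaches `TRANS_STATE_COMMIT_DOING`.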
    • Btrfs: remove the time check in btrfs_commit_transaction() · 581227d0
      Miao Xie authored
      We checked the commit time to avoid committing the transaction
      frequently, but it is unnecessary because:
      - It made the transaction commit spend more time, and delayed the
        operation of the external writers (TRANS_START/TRANS_USERSPACE).
      - Except for cases where we have to commit the transaction, such as
        snapshot creation, btrfs doesn't commit the transaction on its
        own initiative.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      581227d0
    • Btrfs: remove unnecessary varient ->num_joined in btrfs_transaction structure · 3f1e3fa6
      Miao Xie authored
      We used ->num_joined to track whether any writers joined the current
      transaction while the committer was sleeping. If some writers joined the current
      transaction, we had to continue the while loop to do some necessary stuff, such
      as flushing the ordered operations. But it is unnecessary because we will do it
      after the while loop.

      Besides that, tracking ->num_joined would make the committer drop into the while
      loop when there are lots of internal writers (TRANS_JOIN).

      So we remove ->num_joined and don't track whether any writers join
      the current transaction while the committer is sleeping.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      3f1e3fa6
    • Btrfs: don't flush the delalloc inodes in the while loop if flushoncommit is set · 82436617
      Miao Xie authored
      It is unnecessary to flush the delalloc inodes again and again because
      we don't care about the dirty pages which are introduced after the flush, and
      they will be flushed in the transaction commit.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      82436617
    • Btrfs: don't wait for all the writers circularly during the transaction commit · 0860adfd
      Miao Xie authored
      btrfs_commit_transaction has the following loop before we commit the
      transaction.
      
      do {
          // attempt to do some useful stuff and/or sleep
      } while (atomic_read(&cur_trans->num_writers) > 1 ||
      	 (should_grow && cur_trans->num_joined != joined));
      
      This is used to prevent TRANS_START from getting in the way of a
      committing transaction. But it does not prevent TRANS_JOIN, that
      is, we would run this loop for a long time if some writers JOIN the
      current transaction endlessly.

      Because we need to join the current transaction to do some useful stuff,
      we can not block TRANS_JOIN here. So we introduce an external writer
      counter, which is used to count the TRANS_USERSPACE/TRANS_START writers.
      If the external writer counter is zero, we can break the above loop.

      In order to make the code clearer, we don't use an enum
      to define the type of the transaction handle, and use a bitmask instead.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      0860adfd