1. 20 Feb, 2013 40 commits
    • Btrfs: place ordered operations on a per transaction list · 569e0f35
      Josef Bacik authored
      Miao made the ordered operations stuff run async, which introduced a
      deadlock where we could get somebody (sync) racing in and committing the
      transaction while a commit was already happening.  The new committer would
      try and flush ordered operations which would hang waiting for the commit to
      finish because it is done asynchronously and no longer inherits the caller's
      trans handle.  To fix this we need to make the ordered operations list a per
      transaction list.  We can get new inodes added to the ordered operation list
      by truncating them and then having another process writing to them, so this
      makes it so that anybody trying to add an ordered operation _must_ start a
      transaction in order to add itself to the list, which will keep new inodes
      from getting added to the ordered operations list after we start committing.
      This should fix the deadlock and also keeps us from doing a lot more work
      than we need to during commit.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: relax the block group size limit for bitmaps · dde5740f
      Josef Bacik authored
      Dave pointed out that xfstests 273 will tell you that it failed to load the
      space cache for a block group when it remounts.  This is because we run out
      of space writing out the block group cache.  This is ok and is working as it
      should, but let's try to be a bit nicer.  This happens because the block
      group was 100mb, but bitmap entries cover 128mb, so we were only getting
      extent entries for this block group, which ended up being too many to fit in
      the free space cache.  So relax the bitmap size requirements to block groups
      that are at least half the size a bitmap will cover or larger, that way we
      can still keep the amount of space used in the free space cache low enough
      to be able to write it out.  With this patch I no longer fail to write out
      the free space cache.  Thanks,
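The size check described above can be sketched as follows. This is a minimal illustration, not the kernel code: the helper names are hypothetical, and the 128 MiB bitmap coverage is taken from the commit text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)
/* One bitmap entry covers 128 MiB, per the commit message. */
#define BITMAP_COVERAGE (128ULL * MiB)

/* Hypothetical "before" rule: bitmaps only for block groups that are
 * at least as large as the range a single bitmap covers. */
static bool bitmaps_allowed_old(uint64_t bg_size)
{
    assert(bg_size > 0);
    return bg_size >= BITMAP_COVERAGE;
}

/* Hypothetical "after" rule: half the bitmap coverage is enough. */
static bool bitmaps_allowed_new(uint64_t bg_size)
{
    assert(bg_size > 0);
    return bg_size >= BITMAP_COVERAGE / 2;
}
```

With these assumptions, the 100 MiB block group from the report is rejected by the old rule and accepted by the new one.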
      Reported-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: allow for selecting only completely empty chunks · 3e39cea6
      Ilya Dryomov authored
      Enhance balance usage filter by making it possible to balance out only
      completely empty chunks.  Today, usage filter properly acts on values
      from 1 to 99 inclusive, usage=100 selects all chunks, and usage=0
      selects no chunks.  This commit changes the usage=0 case: the new
      meaning is to restripe only completely empty chunks and nothing else.
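The usage semantics described above can be sketched as a small predicate. This is a hedged illustration only: `usage_selects_chunk` is a hypothetical name, and the "used bytes strictly below usage percent of the chunk size" comparison for values 1 to 99 is an assumption about how the threshold is applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether a chunk with `used` bytes out of
 * `size` bytes is selected by a given usage= filter value.
 */
static bool usage_selects_chunk(uint64_t used, uint64_t size, unsigned usage)
{
    assert(used <= size);

    if (usage == 0)
        return used == 0;   /* new meaning: only completely empty chunks */
    if (usage >= 100)
        return true;        /* usage=100 selects all chunks */

    /* 1..99: assumed rule, select chunks filled below usage percent.
     * Assumes size * usage does not overflow 64 bits. */
    return used < size * usage / 100;
}
```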
      Suggested-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: eliminate a use-after-free in btrfs_balance() · bf023ecf
      Ilya Dryomov authored
      Commit 5af3e8cc introduced a use-after-free at volumes.c:3139: bctl is freed
      above in __cancel_balance() in all cases except for balance pause.  Fix this
      by moving the offending check a couple of statements above; the meaning of the
      check is preserved.
      Reported-by: Chris Mason <chris.mason@fusionio.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: remove unused extent io tree ops V2 · c8f2f24b
      Josef Bacik authored
      Nobody uses these io tree ops anymore so just remove them and clean up the code
      a bit.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • btrfs: add cancellation points to defrag · 210549eb
      David Sterba authored
      The defrag operation can take a very long time, so we want a way to
      cancel it. The code checks for a pending signal at safe points in the
      defrag loops and returns EAGAIN. This means a user can press ^C after
      running 'btrfs fi defrag'; this works for both defrag modes, files and root.
      
      Returning from the command was instant in my light tests, but may take
      longer depending on the aging factor of the filesystem.
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • btrfs: put some enospc messages under enospc_debug · b069e0c3
      David Sterba authored
      The warning in use_block_rsv is not useful for users and may fill
      the logs unnecessarily.
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: implement unlocked dio write · 38851cc1
      Miao Xie authored
      This idea is from ext4. With this patch, dio writes can run in parallel,
      which improves performance. But because we cannot update isize without
      i_mutex, the unlocked dio write can only be done before EOF.

      We needn't worry about the race between dio write and truncate, because
      truncate needs to wait until all the dio writes end.
      
      And we also needn't worry about the race between dio write and punch hole,
      because we have extent lock to protect our operation.
      
      I ran fio to test the performance of this feature.
      
      == Hardware ==
      CPU: Intel(R) Core(TM)2 Duo CPU     E7500  @ 2.93GHz
      Mem: 2GB
      SSD: Intel X25-M 120GB (Test Partition: 60GB)
      
      == config file ==
      [global]
      ioengine=psync
      direct=1
      bs=4k
      size=32G
      runtime=60
      directory=/mnt/btrfs/
      filename=testfile
      group_reporting
      thread
      
      [file1]
      numjobs=1 # 2 4
      rw=randwrite
      
      == result (KBps) ==
      write	1	2	4
      lock	24936	24738	24726
      nolock	24962	30866	32101
      
      == result (iops) ==
      write	1	2	4
      lock	6234	6184	6181
      nolock	6240	7716	8025
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: serialize unlocked dio reads with truncate · 2e60a51e
      Miao Xie authored
      Currently, we can do unlocked dio reads, but the following race
      is possible:
      
      dio_read_task			truncate_task
      				->btrfs_setattr()
      ->btrfs_direct_IO
          ->__blockdev_direct_IO
            ->btrfs_get_block
      				  ->btrfs_truncate()
      				 #alloc truncated blocks
      				 #to other inode
            ->submit_io()
           #INFORMATION LEAK
      
      In order to avoid this problem, we must serialize unlocked dio reads with
      truncate. There are two approaches:
      - use extent lock to protect the extent that we truncate
      - use inode_dio_wait() to make sure the truncating task will wait for
        the read DIO.
      
      If we use the 1st one, we will hit the endless truncation problem caused by
      the nonlocked read DIO once we implement the nonlocked write DIO, because
      we still need to invoke inode_dio_wait() to avoid the race between write
      DIO and truncation. At that point we would have to introduce

        btrfs_inode_{block, resume}_nolock_dio()

      again. That is, we would have to implement this patch again, so I chose the
      2nd way to fix the problem.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix deadlock due to unsubmitted · 0934856d
      Miao Xie authored
      The deadlock problem happened when running fsstress (a test program in LTP).
      
      Steps to reproduce:
       # mkfs.btrfs -b 100M <partition>
       # mount <partition> <mnt>
       # <Path>/fsstress -p 3 -n 10000000 -d <mnt>
      
      The reason is:
      btrfs_direct_IO()
       |->do_direct_IO()
           |->get_page()
           |->get_blocks()
           |	 |->btrfs_delalloc_reserve_space()
           |	 |->btrfs_add_ordered_extent() -------	Add a new ordered extent
           |->dio_send_cur_page(page0) --------------	We didn't submit bio here
           |->get_page()
           |->get_blocks()
      	 |->btrfs_delalloc_reserve_space()
      	     |->flush_space()
      		 |->btrfs_start_ordered_extent()
      		     |->wait_event() ----------	Wait the completion of
      						the ordered extent that is
      						mentioned above
      
      But because we didn't submit the bio mentioned above, the ordered
      extent cannot complete, and we would wait for its completion forever.
      
      There are two methods which can fix this deadlock problem:
      1. submit the bio before we invoke get_blocks()
      2. reserve the space before we do dio
      
      Though the 1st is the simplest way, we would need to modify the VFS code, and
      it is likely to break contiguous requests and introduce a performance
      regression for the other filesystems.

      So we have to choose the 2nd way.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: cleanup orphan reservation if truncate fails · 4a7d0f68
      Josef Bacik authored
      I noticed we were getting lots of warnings with xfstest 83 because we have
      reservations outstanding.  This is because we moved the orphan add outside
      of the truncate, but we don't actually clean up our reservation if something
      fails.  This fixes the problem and I no longer see warnings.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: steal from global reserve if we are cleaning up orphans · 5d80366e
      Josef Bacik authored
      Sometimes xfstest 83 will fail to remount the scratch device because we've
      gotten ourselves so full that we cannot clean up the orphan items.  In this
      case check to see if we're doing the orphan cleanup and, if we are, allow us
      to steal our reservation from the global block rsv.  With this patch I've
      not been able to reproduce the failed mount problem.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix memory leak of pending_snapshot->inherit · 8696c533
      Miao Xie authored
      The argument "inherit" of btrfs_ioctl_snap_create_transid() was set
      to NULL while we created the snapshot, so we didn't free it even though we
      called kfree() in the caller.

      But since we are sure the snapshot creation is done after the function -
      btrfs_ioctl_snap_create_transid() - completes, it is safe not to
      set the pointer "inherit" to NULL, and to just free it in the caller of
      btrfs_ioctl_snap_create_transid(). This also makes the code more
      readable.
      Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com>
      Cc: Arne Jansen <sensille@gmx.net>
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix the race between bio and btrfs_stop_workers · 2b8195bb
      Miao Xie authored
      open_ctree() needs to read the metadata to initialize the global information
      of btrfs. But it may fail after it submits some bios, and then it jumps
      to the error path. Unfortunately, it doesn't check whether any bios are
      in flight, and just stops all the worker threads. As a result, when the
      submitted bios end, they cannot find any worker thread to deal with
      subsequent work, and an oops happens.

      kernel BUG at fs/btrfs/async-thread.c:605!

      Fix this problem by invoking invalidate_inode_pages2() before we stop the
      worker threads. This function waits until the bios end because it needs to
      lock the pages that are going to be invalidated, and a page under
      disk read IO must be locked, so invalidate_inode_pages2() has to wait until
      the end bio handler unlocks it.
      Reported-and-Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • btrfs: add "no file data" flag to btrfs send ioctl · cb95e7bf
      Mark Fasheh authored
      This patch adds the flag, BTRFS_SEND_FLAG_NO_FILE_DATA to the btrfs send
      ioctl code. When this flag is set, the btrfs send code will never write file
      data into the stream (thus also avoiding expensive reads of that data in the
      first place). BTRFS_SEND_C_UPDATE_EXTENT commands will be sent (instead of
      BTRFS_SEND_C_WRITE) with an offset, length pair indicating the extent in
      question.
      
      This patch does not affect the operation of BTRFS_SEND_C_CLONE commands -
      they will continue to be sent when a search finds an appropriate extent to
      clone from.
      Signed-off-by: Mark Fasheh <mfasheh@suse.de>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: extend the checksum item as much as possible · 2f697dc6
      Liu Bo authored
      For writes, we also reserve some space for COW blocks while updating
      the checksum tree, and we calculate the number of blocks by checking
      whether the number of outstanding bytes that are going to need csums needs
      one more block for csum.

      When we add these checksums into the checksum tree, we use the ordered sums
      list.
      Every ordered sum contains csums for each sector, and we'll first try
      to look up an existing csum item,
      a) if we don't yet have a proper csum item, then we need to insert one,
      b) or if we find one but the csum item is not big enough, then we need
      to extend it.

      The point is we'll unlock the whole path and then insert or extend,
      so others can cut in and update the tree.

      Each insert or extend needs to update the tree with COW on, and we may need
      to insert/extend many times.

      That means what we've reserved for updating the checksum tree is NOT
      enough.

      The case is even more serious with several write threads running at the
      same time: they can quickly eat our reserved space and start
      eating the global reserve pool instead.

      I have not yet come up with a way to calculate the worst case for updating
      csums, but extending the checksum item as much as possible helped
      in my test.

      The idea is that it reduces the number of times we insert/extend, so
      it saves us precious reserved space.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • btrfs: remove cache only arguments from defrag path · de78b51a
      Eric Sandeen authored
      The entry point at the defrag ioctl always sets "cache only" to 0;
      the codepaths haven't run for a long time as far as I can
      tell.  Chris says they're dead code, so remove them.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: if we aren't committing just end the transaction if we error out · e4a2bcac
      Josef Bacik authored
      I hit a deadlock where transaction commit was waiting on num_writers to be
      0.  This happened because somebody came into btrfs_commit_transaction and
      noticed we had aborted and it went to cleanup_transaction.  This shouldn't
      happen because cleanup_transaction is really to fixup a bad commit, it
      doesn't do the normal trans handle cleanup things.  So if we have an error
      just do the normal btrfs_end_transaction dance and return.  Once we are in
      the actual commit path we can use cleanup_transaction and be good to go.
      Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: handle errors in compression submission path · 3e04e7f1
      Josef Bacik authored
      I noticed we would deadlock if we aborted a transaction while doing
      compressed io.  This is because we don't unlock our pages if something goes
      horribly wrong.  To fix this we need to make sure that we call
      extent_clear_unlock_delalloc in order to unlock all the pages.  If we have
      to cow in the async submission thread we need to make sure to unlock our
      locked_page as the cow error path will not unlock the locked page as it
      depends on the caller to unlock that page.  With this patch we no longer
      deadlock on the page lock when we have an aborted transaction.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: rework the overcommit logic to be based on the total size · 70afa399
      Josef Bacik authored
      People have been complaining about random ENOSPC errors that will clear up
      after a umount or just a given amount of time.  Chris was able to reproduce
      this with stress.sh and lots of processes and so was I.  Basically the
      overcommit stuff would really let us get out of hand, in my tests I saw up
      to 30 gigs of outstanding reservations with only 2 gigs total of metadata
      space.  This usually worked out fine but with so much outstanding
      reservation the flushing stuff short circuits to make sure we don't hang
      forever flushing when we really need ENOSPC.  Plus we allocate chunks in
      order to alleviate the pressure, but this doesn't actually help us since we
      only use the non-allocated area in our over commit logic.
      
      So instead of basing overcommit on the amount of non-allocated space,
      instead just do it based on how much total space we have, and then limit it
      to the non-allocated space in case we are short on space to spill over into.
      This allows us to have the same performance as well as no longer giving
      random ENOSPC.  Thanks,
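The reworked rule can be sketched as below. This is only an illustration under stated assumptions: `overcommit_allowance` is a hypothetical helper, and the `divisor` that turns total space into an allowance is an assumed parameter, since the commit message does not give the fraction used.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: base the overcommit allowance on the total space,
 * then cap it at the non-allocated space we could actually spill into.
 */
static uint64_t overcommit_allowance(uint64_t total_bytes,
                                     uint64_t unallocated_bytes,
                                     unsigned divisor)
{
    uint64_t allowance;

    assert(divisor != 0);
    allowance = total_bytes / divisor;      /* fraction of total space */

    return allowance < unallocated_bytes ? allowance : unallocated_bytes;
}
```

With a divisor of 8, 2048 bytes of total space yields an allowance of 256 when plenty of space is unallocated, but only 100 when just 100 bytes remain unallocated.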
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: account for orphan inodes properly during cleanup · 925396ec
      Josef Bacik authored
      Dave sent me a panic where we were doing the orphan cleanup and panic'ed
      trying to release our reservation from the orphan block rsv.  The reason for
      this is because our orphan block rsv had been free'd out from underneath us
      because the transaction commit found that there were no orphan inodes
      according to its count and decided to free it.  This is incorrect so make
      sure we inc the orphan inodes count so the accounting is all done properly.
      This would also cause the warning in the orphan commit code normally if you
      had any orphans to cleanup as they would only decrement the orphan count so
      you'd get a negative orphan count which could cause problems during runtime.
      Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: unreserve space if our ordered extent fails to work · 0bec9ef5
      Josef Bacik authored
      When a transaction aborts or there's an EIO on an ordered extent or any
      error really we will not free up the space we reserved for this ordered
      extent.  This results in warnings from the block group cache cleanup in the
      case of a transaction abort, or leaking space in the case of EIO on an
      ordered extent.  Fix this up by free'ing the reserved space if we have an
      error at all trying to complete an ordered extent.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix how we discard outstanding ordered extents on abort · 779880ef
      Josef Bacik authored
      When we abort we've been just free'ing up all the ordered extents and
      hoping for the best.  This results in lots of warnings from various places,
      warnings from btrfs_destroy_inode() because its ENOSPC accounting isn't
      fixed.  It will also screw up lots of pages that have been set private but
      never get cleared because the ordered extents are never allowed to be
      submitted.  This patch fixes those warnings.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix freeing delayed ref head while still holding its mutex · eb12db69
      Josef Bacik authored
      I hit this error when reproducing a bug that would end in a transaction
      abort.  We take the delayed ref head's mutex to keep anybody from processing
      it while we're destroying it, but we fail to drop the mutex before we carry
      on and free the damned thing.  Fix this by doing the remove logic for the
      head ourselves and unlock the mutex, that way we can avoid use after free's
      or hung tasks waiting on that mutex to come back so they know the delayed
      ref completed.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • btrfs: ensure we don't overrun devices_info[] in __btrfs_alloc_chunk · 063d006f
      Eric Sandeen authored
      WARN_ON isn't enough, we need to stop the loop if for any reason
      we would overrun the devices_info array.
      
      I tried to track down the connection between the length of
      the alloc_devices list and the rw_devices counter but
      it wasn't immediately obvious, so be defensive about it.
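The defensive pattern described above, stopping the loop rather than only warning, might look like the following sketch. The function and parameter names are hypothetical and the element type is simplified to `int`; it is not the `__btrfs_alloc_chunk` code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: copy candidates into a fixed-size info array and
 * stop as soon as the array is full, whatever the length of the input.
 */
static size_t collect_devices(const int *candidates, size_t n_candidates,
                              int *info, size_t info_len)
{
    size_t ndevs = 0;

    assert(candidates != NULL && info != NULL);

    for (size_t i = 0; i < n_candidates; i++) {
        if (ndevs == info_len)
            break;                  /* a warning alone would still overrun */
        info[ndevs++] = candidates[i];
    }
    return ndevs;
}
```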
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • btrfs: remove unnecessary DEFINE_WAIT() declarations · 1971e917
      Eric Sandeen authored
      No point in DEFINE_WAIT(wait) if it's not used!
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • btrfs: remove unused "item" in btrfs_insert_delayed_item() · d4c0a7da
      Eric Sandeen authored
      "item" was set but never used in this function.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • btrfs: fix varargs in __btrfs_std_error · 37252a66
      Eric Sandeen authored
      __btrfs_std_error didn't always properly call va_end,
      and might call va_start even if fmt was NULL.
      
      Move all the varargs handling into the block where we
      have fmt.
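The shape of this fix can be illustrated in user space. The function below is a hypothetical stand-in, not `__btrfs_std_error` itself: it only shows `va_start()` and `va_end()` paired inside the branch where `fmt` is non-NULL.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical error formatter. All varargs handling lives in the block
 * where fmt is known to be non-NULL, so va_start() always has a matching
 * va_end() and is never reached with a NULL format.
 */
static int format_error(char *buf, size_t len, int err, const char *fmt, ...)
{
    assert(buf != NULL && len > 0);

    if (fmt) {
        char detail[128];
        va_list args;

        va_start(args, fmt);
        vsnprintf(detail, sizeof(detail), fmt, args);
        va_end(args);

        return snprintf(buf, len, "error %d: %s", err, detail);
    }

    return snprintf(buf, len, "error %d", err);
}
```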
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • btrfs: add missing break in btrfs_print_leaf() · 0e636027
      Eric Sandeen authored
      I don't think that BTRFS_DEV_EXTENT_KEY is supposed
      to fall through to BTRFS_DEV_STATS_KEY ...
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • btrfs: annotate intentional switch case fallthroughs · 1c697d4a
      Eric Sandeen authored
      This keeps static checkers happy.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • btrfs: handle null fs_info in btrfs_panic() · aa43a17c
      Eric Sandeen authored
      At least backref_tree_panic() can apparently pass
      in a null fs_info, so handle that in __btrfs_panic
      to get the message out on the console.
      
      The btrfs_panic macro also uses fs_info, but that's
      largely pointless; it's testing to see if
      BTRFS_MOUNT_PANIC_ON_FATAL_ERROR is not set.
      But if it *were* set, __btrfs_panic() would have,
      well, panicked and we wouldn't be here, testing it!
      So just BUG() at this point.
      
      And since we only use fs_info once now, just use it
      directly.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Eric Sandeen's avatar
      5a016047
    • btrfs: list_entry can't return NULL · d1d3cd27
      Eric Sandeen authored
      No need to test the result, we can't get a
      null pointer from list_entry()
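A user-space sketch shows why the test is pointless: `list_entry()` is pure pointer arithmetic on the address of the embedded list node. The macro below is a simplified copy of the usual `container_of` idea, and `struct item` and `item_from_link` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Simplified list_entry(): subtract the member offset from the node address. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct item {
    int value;
    struct list_head link;
};

/* Hypothetical wrapper: for any valid node pointer, the result is the
 * address of the enclosing struct, never NULL. */
static struct item *item_from_link(struct list_head *node)
{
    assert(node != NULL);
    return list_entry(node, struct item, link);
}
```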
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • btrfs: remove unused fd in btrfs_ioctl_send() · b4c6f7b7
      Eric Sandeen authored
      All we do is set it to NULL and test it :)
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: do not overcommit if we don't have enough space for global rsv · 96f1bb57
      Josef Bacik authored
      Because of how little we allocate chunks now we can get really tight on
      metadata space before we will allocate a new chunk.  This resulted in being
      unable to add device extents when allocating a new metadata chunk as we did
      not have enough space.  This is because we were allowed to overcommit too
      much metadata without actually making sure we had enough space to make
      allocations.  The idea behind overcommit is that we are allowed to say "sure
      you can have that reservation" when most of the free space is occupied by
      reservations, not actual allocations.  But in this case where a majority of
      the total space is in use by actual allocations we can screw ourselves by
      not being able to make real allocations when it matters.  So make sure we
      have enough real space for our global reserve, and if not then don't allow
      overcommitting.  Thanks,
      Reported-and-tested-by: Jim Schutt <jaschut@sandia.gov>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: remove extent mapping if we fail to add chunk · 0f5d42b2
      Josef Bacik authored
      I got a double free error when unmounting a file system that failed to add a
      chunk during its operation.  This is because we will kfree the mapping that
      we created but leave the extent_map in the em_tree for chunks.  So to fix
      this just remove the extent_map when we error out so we don't run into this
      problem.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix chunk allocation error handling · 04487488
      Josef Bacik authored
      If we error out allocating a dev extent we will have already created the
      block group and such which will cause problems since the allocator may have
      tried to allocate out of the block group that no longer exists.  This will
      cause BUG_ON()'s in the bio submission path.  This also makes a failure to
      allocate a dev extent a non-abort error, we will just clean up the dev
      extents we did allocate and exit.  Now if we fail to delete the dev extents
      we will abort since we can't have half of the dev extents hanging around,
      but this will make us much less likely to abort.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: use bit operation for ->fs_state · 87533c47
      Miao Xie authored
      There is no lock to protect fs_info->fs_state, which can cause
      problems such as one task's update being overwritten by another
      task's when several tasks modify it. For example:
      	Task0 - CPU0		Task1 - CPU1
      	mov %fs_state rax
      	or $0x1 rax
      				mov %fs_state rax
      				or $0x2 rax
      	mov rax %fs_state
      				mov rax %fs_state
      The expected value is 3, but in fact, it is 2.
      
      Though this problem doesn't happen now (because there is only one
      flag currently), the code is error prone: if we add other flags,
      the above problem is certain to happen.

      Now we use bit operations on it to fix the above problem.
      This makes the code more robust and makes it easy to
      add new flags.
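A user-space analogue of the fix, using C11 atomics in place of the kernel's bit operations, is sketched below. The flag numbers and helper names are hypothetical; the point is that each flag is set with one atomic read-modify-write instead of a separate load, OR and store.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical bit numbers, for illustration only. */
enum { FS_STATE_FLAG_A = 0, FS_STATE_FLAG_B = 1 };

static atomic_ulong fs_state;

/* Set one bit atomically, so concurrent setters cannot lose each
 * other's updates. */
static void fs_state_set_bit(int nr)
{
    assert(nr >= 0 && nr < (int)(sizeof(unsigned long) * 8));
    atomic_fetch_or(&fs_state, 1UL << nr);
}

/* Return 1 if the bit is set, 0 otherwise. */
static int fs_state_test_bit(int nr)
{
    assert(nr >= 0 && nr < (int)(sizeof(unsigned long) * 8));
    return (int)((atomic_load(&fs_state) >> nr) & 1UL);
}
```

After both flags are set, in either order and from any task, the state holds 3, which is the expected value from the interleaving shown in the commit message.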
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: use seqlock to protect fs_info->avail_{data, metadata, system}_alloc_bits · de98ced9
      Miao Xie authored
      There is no lock to protect
        fs_info->avail_{data, metadata, system}_alloc_bits,
      which may cause problems such as wrong profile
      information, so we add a seqlock to protect them.
      Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: use the inode own lock to protect its delalloc_bytes · df0af1a5
      Miao Xie authored
      We need not use a global lock to protect the delalloc_bytes of the
      inode; just use its own lock. This reduces lock contention, and
      ->delalloc_lock then protects only the delalloc inode
      list.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>