1. 15 Mar, 2010 20 commits
    • btrfs: use memparse · 91748467
      Akinobu Mita authored
      Use memparse() instead of its own private implementation.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: linux-btrfs@vger.kernel.org
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
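For readers unfamiliar with the helper, here is a minimal userspace sketch of memparse()-style parsing: a number followed by an optional K, M or G suffix, each a power of 1024. The name parse_size() is hypothetical and the code is an illustration, not the kernel function or the code this commit touched.

```c
#include <stdlib.h>

/*
 * Userspace sketch of memparse()-style parsing. parse_size() is a
 * hypothetical name. If end is non-NULL it receives a pointer to the
 * first character that was not consumed.
 */
unsigned long long parse_size(const char *str, char **end)
{
	char *p;
	unsigned long long val = strtoull(str, &p, 0);

	switch (*p) {
	case 'G': case 'g':
		val <<= 10;
		/* fall through */
	case 'M': case 'm':
		val <<= 10;
		/* fall through */
	case 'K': case 'k':
		val <<= 10;
		p++;
		break;
	default:
		break;
	}

	if (end)
		*end = p;
	return val;
}
```

Calling parse_size("4k", NULL) scales 4 by 1024; a string with no suffix is returned unscaled.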
    • Btrfs: add a "df" ioctl for btrfs · 1406e432
      Josef Bacik authored
      df is a very loaded question in btrfs.  This gives us a way to get the per-space
      usage information so we can tell exactly what is in use where.  This will help
      us figure out ENOSPC problems, and help users better understand where their disk
      space is going.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: cache the extent state everywhere we possibly can V2 · 2ac55d41
      Josef Bacik authored
      This patch just goes through and fixes everybody that does
      
      lock_extent()
      blah
      unlock_extent()
      
      to use
      
      lock_extent_bits()
      blah
      unlock_extent_cached()
      
      and pass around an extent_state so we only have to do the searches once per
      function.  This gives me about a 3 mb/s boost on my random write test.  I have
      not converted some things, like the relocation and ioctls, since they aren't
      heavily used and the relocation stuff is in the middle of being re-written.  I
      also changed clear_extent_bit() to only unset the cached state if we are
      clearing EXTENT_LOCKED and related stuff, so we can do things like this
      
      lock_extent_bits()
      clear delalloc bits
      unlock_extent_cached()
      
      without losing our cached state.  I tested this thoroughly and turned on
      LEAK_DEBUG to make sure we weren't leaking extent states, everything worked out
      fine.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: cache ordered extent when completing io · 5a1a3df1
      Josef Bacik authored
      When finishing io we run btrfs_dec_test_ordered_pending, and then immediately
      run btrfs_lookup_ordered_extent, but btrfs_dec_test_ordered_pending does that
      already, so we're searching twice when we don't have to.  This patch lets us
      pass a btrfs_ordered_extent in to btrfs_dec_test_ordered_pending so if we do
      complete io on that ordered extent we can just use the one we found then instead
      of having to do another btrfs_lookup_ordered_extent.  This made my fio job with
      the other patch go from 24 mb/s to 29 mb/s.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
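The idea of handing back the entry found during the first search can be sketched in plain C. Everything here is hypothetical (struct ordered, lookup(), dec_test_pending(), and the linear scan standing in for the rb-tree search); it only illustrates the out-parameter pattern the message describes, not the btrfs code.

```c
#include <stddef.h>

/* Hypothetical stand-in for an ordered extent. */
struct ordered {
	unsigned long long start;
	unsigned long long len;
	unsigned long long pending;	/* bytes of io still outstanding */
};

/* Counts searches so the saving is visible. */
int lookup_count;

/* Linear scan standing in for the tree search. */
struct ordered *lookup(struct ordered *tbl, size_t n, unsigned long long off)
{
	size_t i;

	lookup_count++;
	for (i = 0; i < n; i++)
		if (off >= tbl[i].start && off < tbl[i].start + tbl[i].len)
			return &tbl[i];
	return NULL;
}

/*
 * Account for completed bytes. Returns 1 when the entry has no io left
 * and, in that case, hands the entry back through *cached so the caller
 * does not need to search for it again.
 */
int dec_test_pending(struct ordered *tbl, size_t n, unsigned long long off,
		     unsigned long long bytes, struct ordered **cached)
{
	struct ordered *e = lookup(tbl, n, off);

	if (!e)
		return 0;
	if (bytes > e->pending)
		bytes = e->pending;
	e->pending -= bytes;
	if (e->pending != 0)
		return 0;
	if (cached)
		*cached = e;
	return 1;
}
```

A caller that sees a return value of 1 can use *cached directly, so each completion costs one search instead of two.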
    • Btrfs: cache extent state in find_delalloc_range · c2a128d2
      Josef Bacik authored
      This patch makes us cache the extent state we find in find_delalloc_range since
      we'll have to lock the extent later on in the function.  This will keep us from
      re-searching for the range when we try to lock the extent.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: change the ordered tree to use a spinlock instead of a mutex · 49958fd7
      Josef Bacik authored
      The ordered tree used to need a mutex, but currently all we use it for is to
      protect the rb_tree, and a spin_lock is just fine for that.  Using a spin_lock
      instead makes dbench run a little faster, 58 mb/s instead of 51 mb/s, and have
      less latency, 3445.138 ms instead of 3820.633 ms.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: finish read pages in the order they are submitted · 4125bf76
      Chris Mason authored
      The endio is done in reverse order of the bio vectors.

      That means for a sequential read, the page submitted first will finish
      last in a bio. Considering we checksum every page (which makes the cache
      hot), this introduces a delay (and a chance of squeezing out cache that
      would be used soon) for pages submitted at the beginning.

      I don't observe an obvious performance difference with this patch in my
      simple test, but it seems more natural to finish reads in the order they
      are submitted.
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • btrfs: fix btrfs_mkdir goto for no free objectids · 0be2e981
      Miao Xie authored
      btrfs_mkdir() must jump to the code that ends the transaction when
      btrfs_find_free_objectid() fails.  Otherwise the transaction can never end.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
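The shape of this kind of fix can be shown with a small goto-based cleanup sketch. All names (struct trans, trans_start(), trans_end(), make_dir()) are hypothetical, and -ENOSPC is only an illustrative error code; this is not the btrfs_mkdir() source.

```c
#include <errno.h>

/* Hypothetical stand-in for a transaction handle. */
struct trans {
	int open;
};

static void trans_start(struct trans *t) { t->open = 1; }
static void trans_end(struct trans *t) { t->open = 0; }

/*
 * Sketch of a mkdir-like function. have_free_objectid stands in for
 * the outcome of the objectid search.
 */
int make_dir(struct trans *t, int have_free_objectid)
{
	int err = 0;

	trans_start(t);

	if (!have_free_objectid) {
		err = -ENOSPC;	/* illustrative error code */
		goto out_end;	/* a bare return here would leave t open */
	}

	/* ... create the directory inside the transaction ... */

out_end:
	trans_end(t);
	return err;
}
```

Both the success path and the failure path leave through out_end, so the transaction is always closed.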
    • Btrfs: flush data on snapshot creation · 0bdb1db2
      Sage Weil authored
      Flush any delalloc extents when we create a snapshot, so that recently
      written file data is always included in the snapshot.
      
      A later commit will add the ability to snapshot without the flush, but
      most people expect flushing.
      Signed-off-by: Sage Weil <sage@newdream.net>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: make df be a little bit more understandable · bd4d1088
      Josef Bacik authored
      The way we report df usage is confusing for everybody, including some other
      utilities (bacula for one).  So this patch makes df a little bit more
      understandable.  First, we make used actually count the total amount of used
      space in all space infos.  This gives us a real view of how much disk space
      is in use.  Second, for blocks available, we only count data space.  This makes
      things like bacula work because df reports 0 when you can no longer write any
      more data to the disk.  I think this is a nice compromise, since you will end up with
      something like the following
      
      [root@alpha ~]# df -h
      Filesystem            Size  Used Avail Use% Mounted on
      /dev/mapper/VolGroup-lv_root
                            148G   30G  111G  21% /
      /dev/sda1             194M  116M   68M  64% /boot
      tmpfs                 985M   12K  985M   1% /dev/shm
      /dev/mapper/VolGroup-LogVol02
                            145G  140G     0 100% /mnt/btrfs-test
      
      Compare this with btrfsctl -i output
      
      [root@alpha btrfs-progs-unstable]# ./btrfsctl -i /mnt/btrfs-test/
      Metadata, DUP: total=4.62GB, used=2.46GB
      System, DUP: total=8.00MB, used=24.00KB
      Data: total=134.80GB, used=134.80GB
      Metadata: total=8.00MB, used=0.00
      System: total=4.00MB, used=0.00
      operation complete
      
      This way we show that there is no more data space to be used, but we have
      another 5GB of space left for metadata.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
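The accounting rule described above (used sums every space info, available counts only data space) can be sketched as follows. The struct and function names are hypothetical, and the copies field is an assumption: it treats DUP space as occupying two copies on disk. Under that assumption the sample figures give 134.80 GB of data plus twice 2.46 GB of metadata, roughly 139.7 GB, which is in line with the 140G that df prints.

```c
#include <stddef.h>

/* Hypothetical stand-in for one space info; not the kernel structure. */
struct space_info {
	int is_data;			/* nonzero for data space */
	unsigned int copies;		/* assumption: 2 for DUP, 1 otherwise */
	unsigned long long total;	/* allocated bytes */
	unsigned long long used;	/* bytes in use */
};

/* "Used" counts the used space of every space info. */
unsigned long long df_used(const struct space_info *s, size_t n)
{
	unsigned long long sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += s[i].used * s[i].copies;
	return sum;
}

/* "Avail" counts only the free room left in data space. */
unsigned long long df_avail(const struct space_info *s, size_t n)
{
	unsigned long long sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (s[i].is_data)
			sum += s[i].total - s[i].used;
	return sum;
}
```

With a full data space, df_avail() returns 0 even when metadata space still has room, which is the behaviour the message says tools like bacula rely on.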
    • btrfs: Update existing btrfs_device for renaming device · 3a0524dc
      TARUISI Hiroaki authored
      When we scan devices in a multi-device filesystem, we remember the original
      name.  If the device gets a new name, later scans don't update the
      in-kernel structures related to it, and we're not able to mount the
      filesystem.

      This patch updates the device name during scanning.
      Signed-off-by: TARUISI Hiroaki <taruishi.hiroak@jp.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: add new defrag-range ioctl. · 1e701a32
      Chris Mason authored
      The btrfs defrag ioctl was limited to doing the entire file.  This
      commit adds a new interface that can defrag a specific range inside
      the file.
      
      It can also force compression on the file, allowing you to selectively
      compress individual files after they were created, even when mount -o
      compress isn't turned on.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: be more selective in the defrag ioctl · 940100a4
      Chris Mason authored
      The btrfs defrag ioctl had some bugs around delalloc accounting, and it
      wasn't properly skipping pages that were not in the mapping.
      
      It wasn't properly clearing the page checked flag, which could make the
      writeback code ignore the page forever while pinning it as dirty.
      
      This commit fixes those problems and makes defrag a little smarter.  It
      skips holes and it doesn't waste time defragging large extents.  If a
      tiny extent comes before a very large extent, it will defrag both of
      them to make sure the tiny extent ends up next to something big.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: run the backing dev more often in the submit_bio helper · 51684082
      Chris Mason authored
      The submit_bio helper thread can decide to loop back around to
      service more bios.  This commit forces it to unplug first, which helps
      reduce the latency seen by submitters.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: make subvolid=0 mount the original default root · 4849f01d
      Josef Bacik authored
      Since there's not a good way to make sure the user sees the original default root
      tree id, not to mention that it's 5 and so is way different from any other volume,
      just make subvolid=0 mount the original default root.  This makes it a bit easier
      for users to handle in the long run.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
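The mapping described here amounts to a one-line translation. A minimal sketch, assuming the original default root has tree id 5 as the message states; the macro and function names are hypothetical, not kernel identifiers.

```c
/* Tree id of the original default root, per the commit message. */
#define ORIGINAL_DEFAULT_ROOT_ID 5ULL

/* Hypothetical helper: subvolid 0 is shorthand for the original default root. */
unsigned long long resolve_subvolid(unsigned long long subvolid)
{
	return subvolid == 0 ? ORIGINAL_DEFAULT_ROOT_ID : subvolid;
}
```

Any non-zero id passes through unchanged, so existing subvolid values keep their meaning.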
    • Btrfs: add ioctl and incompat flag to set the default mount subvol · 6ef5ed0d
      Josef Bacik authored
      This patch needs to go along with my previous patch.  This lets us set the
      default dir item's location to whatever root we want to use as our default
      mounting subvol.  With this we don't have to use mount -o subvol=<tree id>
      anymore to mount a different subvol, we can just set the new one and it will
      just magically work.  I've done some moderate testing with this, mostly just
      switching the default mount around, mounting subvols and the default mount at
      the same time and such, everything seems to work.  Thanks,
      
      Older kernels would generally still be able to mount the filesystem with the
      default subvolume set, but it would result in a different volume being mounted,
      which could be an even more unpleasant surprise for users.  So if you set your
      default subvolume, you can't go back to older kernels.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: change how we mount subvolumes · 73f73415
      Josef Bacik authored
      This work is in preparation for being able to set a different root as the
      default mounting root.
      
      There is currently a problem with how we mount subvolumes.  We cannot currently
      mount a subvolume of a subvolume, you can only mount subvolumes/snapshots of the
      default subvolume.  So say you take a snapshot of the default subvolume and call
      it snap1, and then take a snapshot of snap1 and call it snap2, so now you have
      
      /
      /snap1
      /snap1/snap2
      
      as your available volumes.  Currently you can only mount / and /snap1,
      you cannot mount /snap1/snap2.  To fix this problem instead of passing
      subvolid=<name> you must pass in subvolid=<treeid>, where <treeid> is
      the tree id that gets spit out via the subvolume listing you get from
      the subvolume listing patches (btrfs filesystem list).  This allows us
      to mount /, /snap1 and /snap1/snap2 as the root volume.
      
      In addition to the above, we also now read the default dir item in the
      tree root to get the root key that it points to.  For now this just
      points at what has always been the default subvolume, but later on I plan
      to change it to point at whatever root you want to be the new default
      root, so you can just set the default mount and not have to mount with
      -o subvolid=<treeid>.  I tested this out with the above scenario and it
      worked perfectly.  Thanks,
      
      mount -o subvol operates inside the selected subvolid.  For example:
      
      mount -o subvol=snap1,subvolid=256 /dev/xxx /mnt
      
      /mnt will have the snap1 directory for the subvolume with id
      256.
      
      mount -o subvol=snap /dev/xxx /mnt
      
      /mnt will be the snap directory of whatever the default subvolume
      is.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: make set/get functions for the super compat_ro flags use compat_ro · 12534832
      Josef Bacik authored
      Our set/get functions for compat_ro_flags actually look at compat_flags.  This
      will mess up any attempt to use the compat flags.  The fix is obvious.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: add search and inode lookup ioctls · ac8e9819
      Chris Mason authored
      The search ioctl is a generic tool for doing btree searches from
      userland applications.  The first user of the search ioctl is a
      subvolume listing feature, but we'll also use it to find new
      files in a subvolume.
      
      The search ioctl allows you to specify min and max keys to search for,
      along with min and max transid.  It returns the items along with a
      header that includes the item key.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
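The min/max key and min/max transid filtering can be pictured with a small in-memory sketch. All types and functions are hypothetical, the linear scan stands in for a real btree search, and ordering keys by (objectid, type, offset) is an assumption made for the illustration; this is not the ioctl's interface.

```c
#include <stddef.h>

/* Hypothetical key, ordered by objectid, then type, then offset. */
struct key {
	unsigned long long objectid;
	unsigned int type;
	unsigned long long offset;
};

/* Hypothetical item: a key plus the transid that last touched it. */
struct item {
	struct key key;
	unsigned long long transid;
};

/* Hypothetical search bounds, all inclusive. */
struct search_range {
	struct key min_key;
	struct key max_key;
	unsigned long long min_transid;
	unsigned long long max_transid;
};

int key_cmp(const struct key *a, const struct key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Collect up to max_out matching items into out; returns how many matched. */
size_t search_items(const struct item *items, size_t n,
		    const struct search_range *r,
		    const struct item **out, size_t max_out)
{
	size_t i, found = 0;

	for (i = 0; i < n && found < max_out; i++) {
		const struct item *it = &items[i];

		if (key_cmp(&it->key, &r->min_key) < 0 ||
		    key_cmp(&it->key, &r->max_key) > 0)
			continue;
		if (it->transid < r->min_transid || it->transid > r->max_transid)
			continue;
		out[found++] = it;
	}
	return found;
}
```

Narrowing the transid bounds is what would let a caller ask only for items changed within a given range of transactions, for example to find new files.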
    • Btrfs: add a function to lookup a directory path by following backrefs · 98d377a0
      TARUISI Hiroaki authored
      This will be used by the inode lookup ioctl.
      Signed-off-by: TARUISI Hiroaki <taruishi.hiroak@jp.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  2. 08 Mar, 2010 2 commits
  3. 12 Feb, 2010 1 commit
  4. 04 Feb, 2010 6 commits
  5. 28 Jan, 2010 8 commits
  6. 18 Jan, 2010 3 commits
    • Btrfs: fix possible panic on unmount · 11dfe35a
      Josef Bacik authored
      We can race with the unmount of an fs and the stopping of a kthread where we
      will free the block group before we're done using it.  This happens because
      we do not hold a reference on the block group while it's caching, since
      the allocator drops its reference once it exits or moves on to the next block
      group.  This patch fixes the problem by taking a reference to the block group
      before we start caching and dropping it when we're done to make sure all
      accesses to the block group are safe.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
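The reference-holding pattern behind this fix can be sketched in a few lines. The names are hypothetical and the counter is a plain int, so this is a single-threaded illustration only; real kernel code would need an atomic count. The released counter stands in for actually freeing the object.

```c
/* Hypothetical stand-in for a block group with a reference count. */
struct block_group {
	int refs;
	int *released;	/* incremented when the last reference is dropped */
};

void bg_get(struct block_group *bg)
{
	bg->refs++;
}

void bg_put(struct block_group *bg)
{
	if (--bg->refs == 0)
		(*bg->released)++;	/* stand-in for freeing the block group */
}

/* The caching work pins the block group for as long as it runs. */
void caching_begin(struct block_group *bg)
{
	bg_get(bg);
}

void caching_done(struct block_group *bg)
{
	bg_put(bg);
}
```

If another holder drops its reference while caching is still in progress, the count stays above zero and the object survives until caching_done() runs.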
    • Btrfs: deal with NULL acl sent to btrfs_set_acl · a9cc71a6
      Chris Mason authored
      It is legal for btrfs_set_acl to be sent a NULL acl.  This
      makes sure we don't dereference it.  A similar patch was sent by
      Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: fix regression in orphan cleanup · 6c090a11
      Josef Bacik authored
      Currently orphan cleanup only ever gets triggered if we cross subvolumes during
      a lookup, which means that if we just mount a plain jane fs that has orphans in
      it, they will never get cleaned up.  This results in panics like these

      http://www.kerneloops.org/oops.php?number=1109085

      where adding an orphan entry results in -EEXIST being returned and we panic.  In
      order to fix this, we check to see on lookup if our root has had the orphan
      cleanup done, and if not go ahead and do it.  This is easily reproducible by
      running this testcase
      
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <string.h>
      #include <unistd.h>
      #include <stdio.h>
      
      int main(int argc, char **argv)
      {
      	char data[4096];
      	char newdata[4096];
      	int fd1, fd2;
      
      	memset(data, 'a', 4096);
      	memset(newdata, 'b', 4096);
      
      	while (1) {
      		int i;
      
      		fd1 = creat("file1", 0666);
      		if (fd1 < 0)
      			break;
      
      		for (i = 0; i < 512; i++)
      			write(fd1, data, 4096);
      
      		fsync(fd1);
      		close(fd1);
      
      		fd2 = creat("file2", 0666);
      		if (fd2 < 0)
      			break;
      
      		ftruncate(fd2, 4096 * 512);
      
      		for (i = 0; i < 512; i++)
      			write(fd2, newdata, 4096);
      		close(fd2);
      
      		i = rename("file2", "file1");
      		unlink("file1");
      	}
      
      	return 0;
      }
      
      and then pulling the power on the box, and then trying to run that test again
      when the box comes back up.  I've tested this locally and it fixes the problem.
      Thanks to Tomas Carnecky for helping me track this down initially.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>