- 28 Jan, 2014 14 commits
-
-
Wang Shilong authored
Steps to reproduce:

    # mkfs.btrfs -f /dev/sda8
    # mount /dev/sda8 /mnt -o flushoncommit
    # dd if=/dev/zero of=/mnt/data bs=4k count=102400 &
    # mount /dev/sda8 /mnt -o remount,ro

When remounting from RW to RO, the logic is to first set the RO flag and then commit the transaction. However, with the flushoncommit option enabled, we do an RO check while committing the transaction, so the transaction gets aborted. The check here is actually wrong: we should check whether FS_STATE_ERROR is set. Fix it. Reported-by:
Qu Wenruo <quwenruo@cn.fujitsu.com> Suggested-by:
Miao Xie <miaox@cn.fujitsu.com> Signed-off-by:
Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by:
Josef Bacik <jbacik@fb.com> Signed-off-by:
Chris Mason <clm@fb.com>
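The difference between the two checks can be sketched in userspace C. This is only an illustration under stated assumptions: the flag names and both functions below are hypothetical stand-ins, not the kernel's definitions, and FS_FLAG_ERROR merely plays the role of FS_STATE_ERROR from the message above.

```c
#include <assert.h>
#include <stdbool.h>

#define FS_FLAG_RDONLY (1u << 0) /* hypothetical: filesystem was flagged read-only */
#define FS_FLAG_ERROR  (1u << 1) /* hypothetical stand-in for FS_STATE_ERROR */

/* Old check: refuses to flush as soon as the read-only flag is set. */
bool flush_must_fail_old(unsigned int flags)
{
	return (flags & FS_FLAG_RDONLY) != 0;
}

/* Fixed check: refuses to flush only when the filesystem is in an error state. */
bool flush_must_fail_new(unsigned int flags)
{
	return (flags & FS_FLAG_ERROR) != 0;
}
```

During a remount to read-only the flags hold only the read-only bit, so the old check fails the flush (and the commit aborts) while the fixed check lets it proceed.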
-
Filipe David Borba Manana authored
This change adds infrastructure to allow for generic properties for inodes. Properties are name/value pairs that can be associated with inodes for different purposes. They are stored as xattrs with the prefix "btrfs."

Properties can be inherited - this means when a directory inode has inheritable properties set, these are added to new inodes created under that directory. Further, subvolumes can also have properties associated with them, and they can be inherited from their parent subvolume. Naturally, directory properties have priority over subvolume properties (in practice a subvolume property is just a regular property associated with the root inode, objectid 256, of the subvolume's fs tree).

This change also adds one specific property implementation, named "compression", whose values can be "lzo" or "zlib" and it's an inheritable property.

The corresponding changes to btrfs-progs were also implemented. A patch with xfstests for this feature will follow once there's agreement on this change/feature.

Further, the script at the bottom of this commit message was used to do some benchmarks to measure any performance penalties of this feature. Basically the tests correspond to:

Test 1 - create a filesystem and mount it with compress-force=lzo, then sequentially create N files of 64Kb each, measure how long it took to create the files, unmount the filesystem, mount the filesystem and perform an 'ls -lha' against the test directory holding the N files, and report the time the command took.

Test 2 - create a filesystem and don't use any compression option when mounting it - instead set the compression property of the subvolume's root to 'lzo'. Then create N files of 64Kb, and report the time it took. Then unmount the filesystem, mount it again and perform an 'ls -lha' like in the former test. This means every single file ends up with a property (xattr) associated to it.

Test 3 - same as test 2, but uses 4 properties - 3 are duplicates of the compression property, have no real effect other than adding more work when inheriting properties and taking more btree leaf space.

Test 4 - same as test 3 but with 10 properties per file.

Results (in seconds, and averages of 5 runs each), for different N numbers of files follow.

* Without properties (test 1)

                        file creation time    ls -lha time
    10 000 files              3.49                0.76
    100 000 files            47.19                8.37
    1 000 000 files         518.51              107.06

* With 1 property (compression property set to lzo - test 2)

                        file creation time    ls -lha time
    10 000 files              3.63                0.93
    100 000 files            48.56                9.74
    1 000 000 files         537.72              125.11

* With 4 properties (test 3)

                        file creation time    ls -lha time
    10 000 files              3.94                1.20
    100 000 files            52.14               11.48
    1 000 000 files         572.70              142.13

* With 10 properties (test 4)

                        file creation time    ls -lha time
    10 000 files              4.61                1.35
    100 000 files            58.86               13.83
    1 000 000 files         656.01              177.61

The increased latencies with properties are essentially because of:

*) When creating an inode, we now synchronously write 1 more item (an xattr item) for each property inherited from the parent dir (or subvolume). This could be done in an asynchronous way such as we do for dir index items (delayed-inode.c), which could help reduce the file creation latency;

*) With properties, we now have larger fs trees. For this particular test each xattr item uses 75 bytes of leaf space in the fs tree. This could be less by using a new item for xattr items, instead of the current btrfs_dir_item, since we could cut the 'location' and 'type' fields (saving 18 bytes) and maybe 'transid' too (saving a total of 26 bytes per xattr item) from the btrfs_dir_item type.

Also tried batching the xattr insertions (ignoring proper hash collision handling, since it didn't exist) when creating files that inherit properties from their parent inode/subvolume, but the end results were (surprisingly) essentially the same.

Test script:

    $ cat test.pl
    #!/usr/bin/perl -w

    use strict;
    use Time::HiRes qw(time);
    use constant NUM_FILES => 10_000;
    use constant FILE_SIZES => (64 * 1024);
    use constant DEV => '/dev/sdb4';
    use constant MNT_POINT => '/home/fdmanana/btrfs-tests/dev';
    use constant TEST_DIR => (MNT_POINT . '/testdir');

    system("mkfs.btrfs", "-l", "16384", "-f", DEV) == 0 or die "mkfs.btrfs failed!";

    # following line for testing without properties
    #system("mount", "-o", "compress-force=lzo", DEV, MNT_POINT) == 0 or die "mount failed!";

    # following 2 lines for testing with properties
    system("mount", DEV, MNT_POINT) == 0 or die "mount failed!";
    system("btrfs", "prop", "set", MNT_POINT, "compression", "lzo") == 0 or die "set prop failed!";

    system("mkdir", TEST_DIR) == 0 or die "mkdir failed!";
    my ($t1, $t2);

    $t1 = time();
    for (my $i = 1; $i <= NUM_FILES; $i++) {
        my $p = TEST_DIR . '/file_' . $i;
        open(my $f, '>', $p) or die "Error opening file!";
        $f->autoflush(1);
        for (my $j = 0; $j < FILE_SIZES; $j += 4096) {
            print $f ('A' x 4096) or die "Error writing to file!";
        }
        close($f);
    }
    $t2 = time();
    print "Time to create " . NUM_FILES . ": " . ($t2 - $t1) . " seconds.\n";
    system("umount", DEV) == 0 or die "umount failed!";
    system("mount", DEV, MNT_POINT) == 0 or die "mount failed!";

    $t1 = time();
    system("bash -c 'ls -lha " . TEST_DIR . " > /dev/null'") == 0 or die "ls failed!";
    $t2 = time();
    print "Time to ls -lha all files: " . ($t2 - $t1) . " seconds.\n";
    system("umount", DEV) == 0 or die "umount failed!";

Signed-off-by:
Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by:
Josef Bacik <jbacik@fb.com> Signed-off-by:
Chris Mason <clm@fb.com>
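The naming and priority rules from the commit message above can be sketched in plain C. This is a simplified userspace illustration, not the kernel implementation: the three function names are hypothetical, and a NULL value is assumed to mean "property not set".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build the xattr name for a property: the "btrfs." prefix plus its name. */
int prop_xattr_name(const char *prop, char *buf, size_t buflen)
{
	return snprintf(buf, buflen, "btrfs.%s", prop);
}

/* The compression property accepts only "lzo" or "zlib". */
bool compression_value_valid(const char *value)
{
	return strcmp(value, "lzo") == 0 || strcmp(value, "zlib") == 0;
}

/*
 * Value a new inode inherits: the parent directory's setting has priority
 * over the subvolume's. NULL means the property is not set at that level.
 */
const char *inherited_value(const char *dir_value, const char *subvol_value)
{
	return dir_value ? dir_value : subvol_value;
}
```

With these helpers, a file created in a directory that sets "zlib" inside a subvolume that sets "lzo" would inherit "zlib"; without a directory setting it would inherit "lzo".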
-
Filipe David Borba Manana authored
When writing to a file we drop existing file extent items that cover the write range and then add a new file extent item that represents that write range.

Before this change we were doing a tree lookup to remove the file extent items, and then after we did another tree lookup to insert the new file extent item. Most of the time all the file extent items we need to drop are located within a single leaf - this is the leaf where our new file extent item ends up at. Therefore, in this common case just combine these 2 operations into a single one.

By avoiding the second btree navigation for insertion of the new file extent item, we reduce btree node/leaf lock acquisitions/releases, btree block/leaf COW operations, CPU time on btree node/leaf key binary searches, etc.

Besides file writes, this is an operation that happens for file fsyncs as well. However, log btrees are much less likely to be as big as regular fs btrees, therefore the impact of this change is smaller.

The following benchmark was performed against an SSD drive and a HDD drive, both for random and sequential writes:

    sysbench --test=fileio --file-num=4096 --file-total-size=8G \
        --file-test-mode=[rndwr|seqwr] --num-threads=512 \
        --file-block-size=8192 \
        --max-requests=1000000 \
        --file-fsync-freq=0 --file-io-mode=sync [prepare|run]

All results below are averages of 10 runs of the respective test.

** SSD sequential writes

    Before this change: 225.88 Mb/sec
    After this change:  277.26 Mb/sec

** SSD random writes

    Before this change: 49.91 Mb/sec
    After this change:  56.39 Mb/sec

** HDD sequential writes

    Before this change: 68.53 Mb/sec
    After this change:  69.87 Mb/sec

** HDD random writes

    Before this change: 13.04 Mb/sec
    After this change:  14.39 Mb/sec

Signed-off-by:
Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by:
Josef Bacik <jbacik@fb.com> Signed-off-by:
Chris Mason <clm@fb.com>
-
Miao Xie authored
The following warning message was output when running test case 274 of xfstests with the nodatacow option:

    BUG: Bad page state in process kswapd0 pfn:1c66f
    page:ffffea0000636848 count:0 mapcount:0 mapping:(null) index:0x78000
    page flags: 0x1000000000100a(error|uptodate|private_2)

This happened because the nocow range check was wrong. We should compare both the start and end positions of the extent with the write position to verify that the write position is within the extent, but the current code used only the start position for the check, so we got the wrong extent and told the caller that it was a nocow write. Then, when we wrote back the dirty pages, we found that we had to COW the extent, but by then there was no space left in the fs, so we had to set the error flag on the page. When someone reclaimed that page, the above warning was output. Fix it. Reported-by:
Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by:
Miao Xie <miaox@cn.fujitsu.com> Signed-off-by:
Josef Bacik <jbacik@fb.com> Signed-off-by:
Chris Mason <clm@fb.com>
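The range check described in this commit message can be illustrated with a small userspace sketch. It is not the kernel code: `struct extent_range` and both functions are hypothetical names, and the extent is assumed to cover the half-open byte range [start, start + len).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a file extent covering [start, start + len). */
struct extent_range {
	uint64_t start;
	uint64_t len;
};

/* Incomplete check: only compares the extent start with the write position. */
bool write_after_extent_start(const struct extent_range *e, uint64_t pos)
{
	return pos >= e->start;
}

/* Complete check: the write position must lie between start and end. */
bool write_within_extent(const struct extent_range *e, uint64_t pos)
{
	uint64_t end = e->start + e->len;

	assert(end >= e->start); /* the range must not wrap around */
	return pos >= e->start && pos < end;
}
```

A write position far past the end of an extent passes the start-only comparison but is rejected once the end of the extent is checked as well.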
-
Filipe David Borba Manana authored
If we do a btree search with the goal of updating an existing item without changing its size (ins_len == 0 and cow == 1), then we never need to hold locks on upper level nodes (even when slot == 0) after we COW their child nodes/leaves, as we won't have node splits or merges in this scenario (that is, no key additions, removals or shifts on any nodes or leaves).

Therefore release the locks immediately after COWing the child nodes/leaves while navigating the btree, even if their parent slot is 0, instead of returning a path to the caller with those nodes locked, which would get released only when the caller releases or frees the path (or if it calls btrfs_unlock_up_safe).

This is a common scenario, for example when updating inode items in fs trees and block group items in the extent tree.

The following benchmarks were performed on a quad core machine with 32Gb of ram, using a leaf/node size of 4Kb (to generate deeper fs trees more quickly).

    sysbench --test=fileio --file-num=131072 --file-total-size=8G \
        --file-test-mode=seqwr --num-threads=512 --file-block-size=8192 \
        --max-requests=100000 --file-io-mode=sync [prepare|run]

    Before this change: 49.85Mb/s (average of 5 runs)
    After this change:  50.38Mb/s (average of 5 runs)

Signed-off-by:
Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by:
Josef Bacik <jbacik@fb.com> Signed-off-by:
Chris Mason <clm@fb.com>
-
Miao Xie authored
The inode reference item is close to the inode item, so we insert it together with the inode item when we create a file or directory. In fact, we can handle inode reference deletion in the same way. So this patch introduces delayed inode reference deletion for single-link inodes (in most cases a file has no hard links, so we don't take hard links into account). This function is based on the delayed inode mechanism. After applying this patch, the time of file/directory deletion is reduced by ~10%. Signed-off-by:
Miao Xie <miaox@cn.fujitsu.com> Signed-off-by:
Chris Mason <clm@fb.com>
-
Frank Holton authored
Convert all applicable cases of printk and pr_* to the btrfs_* macros. Fix all uses of the BTRFS prefix. Signed-off-by:
Frank Holton <fholton@gmail.com> Signed-off-by:
Josef Bacik <jbacik@fb.com> Signed-off-by:
Chris Mason <clm@fb.com>
-
Wang Shilong authored
See the warning below:

    [ 1209.102076] [<ffffffffa04721b9>] remove_extent_mapping+0x69/0x70 [btrfs]
    [ 1209.102084] [<ffffffffa0466b06>] btrfs_evict_inode+0x96/0x4d0 [btrfs]
    [ 1209.102089] [<ffffffff81073010>] ? wake_atomic_t_function+0x40/0x40
    [ 1209.102092] [<ffffffff8118ab2e>] evict+0x9e/0x190
    [ 1209.102094] [<ffffffff8118b313>] iput+0xf3/0x180
    [ 1209.102101] [<ffffffffa0461fd1>] btrfs_run_delayed_iputs+0xb1/0xd0 [btrfs]
    [ 1209.102107] [<ffffffffa045d358>] __btrfs_end_transaction+0x268/0x350 [btrfs]

Clear the extent bit here to avoid triggering WARN_ON() in remove_extent_mapping(). Signed-off-by:
Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by:
Josef Bacik <jbacik@fb.com> Signed-off-by:
Chris Mason <clm@fb.com>
-
Wang Shilong authored
Chris introduced the helper function read_csums(), and this function has since been removed, but we forgot to remove its corresponding comments. Signed-off-by:
Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by:
Josef Bacik <jbacik@fb.com> Signed-off-by:
Chris Mason <clm@fb.com>
-
Tsutomu Itoh authored
Clean up btrfs_lookup_dentry() to never return NULL, but ERR_PTR(-ENOENT) instead. This keeps the return value convention consistent. Callers who use btrfs_lookup_dentry() require a trivial update. create_snapshot() in particular looks like it can also lose a BUG_ON(!inode) which is not really needed - there seems less harm in returning ENOENT to userspace at that point in the stack than there is to crash the machine. Signed-off-by:
Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by:
Josef Bacik <jbacik@fb.com> Signed-off-by:
Chris Mason <clm@fb.com>
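The "never NULL, always an error pointer" convention mentioned above can be sketched in userspace. The three helpers below are a simplified re-creation of the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() for illustration only; `struct dentry_stub` and `lookup_stub()` are hypothetical and are not the btrfs function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer in the top 4095 addresses. */
void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno value stored in an error pointer. */
long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True when the pointer falls in the range reserved for error codes. */
bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical lookup result type. */
struct dentry_stub {
	int ino;
};

/* Hypothetical lookup: returns an error pointer, never NULL, on a miss. */
struct dentry_stub *lookup_stub(int ino)
{
	static struct dentry_stub found = { 257 };

	if (ino == 257)
		return &found;
	return ERR_PTR(-ENOENT);
}
```

Callers then need a single `IS_ERR()` test instead of separate NULL and error checks, and can hand `PTR_ERR()` of the result back to their own caller.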
-
Filipe David Borba Manana authored
The inode eviction can be very slow, because during eviction we tell the VFS to truncate all of the inode's pages. This results in calls to btrfs_invalidatepage(), which in turn calls lock_extent_bits() and clear_extent_bit(). These calls result in too many merges and splits of extent_state structures, which consume a lot of time and CPU when the inode has many pages. In some scenarios I have experienced umount times higher than 15 minutes, even when there's no pending IO (after a btrfs fs sync).

A quick way to reproduce this issue:

    $ mkfs.btrfs -f /dev/sdb3
    $ mount /dev/sdb3 /mnt/btrfs
    $ cd /mnt/btrfs
    $ sysbench --test=fileio --file-num=128 --file-total-size=16G \
        --file-test-mode=seqwr --num-threads=128 \
        --file-block-size=16384 --max-time=60 --max-requests=0 run
    $ time btrfs fi sync .
    FSSync '.'

    real    0m25.457s
    user    0m0.000s
    sys     0m0.092s
    $ cd ..
    $ time umount /mnt/btrfs

    real    1m38.234s
    user    0m0.000s
    sys     1m25.760s

The same test on ext4 runs much faster:

    $ mkfs.ext4 /dev/sdb3
    $ mount /dev/sdb3 /mnt/ext4
    $ cd /mnt/ext4
    $ sysbench --test=fileio --file-num=128 --file-total-size=16G \
        --file-test-mode=seqwr --num-threads=128 \
        --file-block-size=16384 --max-time=60 --max-requests=0 run
    $ sync
    $ cd ..
    $ time umount /mnt/ext4

    real    0m3.626s
    user    0m0.004s
    sys     0m3.012s

After this patch, the unmount (inode evictions) is much faster:

    $ mkfs.btrfs -f /dev/sdb3
    $ mount /dev/sdb3 /mnt/btrfs
    $ cd /mnt/btrfs
    $ sysbench --test=fileio --file-num=128 --file-total-size=16G \
        --file-test-mode=seqwr --num-threads=128 \
        --file-block-size=16384 --max-time=60 --max-requests=0 run
    $ time btrfs fi sync .
    FSSync '.'

    real    0m26.774s
    user    0m0.000s
    sys     0m0.084s
    $ cd ..
    $ time umount /mnt/btrfs

    real    0m1.811s
    user    0m0.000s
    sys     0m1.564s

Signed-off-by:
Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by:
Josef Bacik <jbacik@fb.com> Signed-off-by:
Chris Mason <clm@fb.com>
-
Kelley Nielsen authored
This patch is the second step in bootstrapping the btrfs_find_item interface. The btrfs_find_root_ref() is similar to the former __inode_info(); it accepts four of its parameters, and duplicates the first half of its functionality. Replace the one former call to btrfs_find_root_ref() with a call to btrfs_find_item(), along with the defined key type that was used internally by btrfs_find_root_ref(), and a null found key. In btrfs_find_item(), add a test for the null key at the place where the functionality of btrfs_find_root_ref() ends; btrfs_find_item() then returns if the test passes. Finally, remove btrfs_find_root_ref(). Signed-off-by:
Kelley Nielsen <kelleynnn@gmail.com> Suggested-by:
Zach Brown <zab@redhat.com> Reviewed-by:
Josh Triplett <josh@joshtriplett.org> Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <clm@fb.com>
-
Valentina Giusti authored
Variable owner in btrfs_new_inode is unused since commit d82a6f1d (Btrfs: kill BTRFS_I(inode)->block_group) Signed-off-by:
Valentina Giusti <valentina.giusti@microon.de> Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <clm@fb.com>
-
Josef Bacik authored
Btrfs has always had these filler extent data items for holes in inodes. This has made some things very easy, like logging hole punches and sending hole punches. However, for large holey files these extent data items are pure overhead. So add an incompatible feature to no longer add hole extents, to reduce the amount of metadata used by these sorts of files. This has a few changes for logging and send, obviously, since they will need to detect holes and log/send the holes if there are any. I've tested this thoroughly with xfstests and it doesn't cause any issues with and without the incompat format set. Thanks, Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <clm@fb.com>
-
- 21 Nov, 2013 2 commits
-
-
Steven Rostedt authored
Doing an if statement to test some condition to know if we should trigger a tracepoint is pointless when tracing is disabled. This just adds overhead and wastes a branch prediction. This is why the TRACE_EVENT_CONDITION() was created. It places the check inside the jump label so that the branch does not happen unless tracing is enabled.

That is, instead of doing:

    if (em)
        trace_btrfs_get_extent(root, em);

Which is basically this:

    if (em)
        if (static_key(trace_btrfs_get_extent)) {

Using a TRACE_EVENT_CONDITION() we can just do:

    trace_btrfs_get_extent(root, em);

And the condition trace event will do:

    if (static_key(trace_btrfs_get_extent)) {
        if (em) {
            ...

The static key is a non conditional jump (or nop) that is faster than having to check if em is NULL or not. Signed-off-by:
Steven Rostedt <rostedt@goodmis.org> Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
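The ordering of the two checks can be modelled in userspace C. This sketch is an illustration under stated assumptions, not the tracing implementation: a plain `bool` stands in for the static key (the real key is a patched jump or nop, not a memory load), and all names below are hypothetical. Counters make it visible how often the condition is evaluated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

bool tracing_enabled;     /* stand-in for the static key (a plain flag here) */
unsigned int cond_checks; /* how many times the "em is set" test ran */
unsigned int events;      /* how many trace events were recorded */

/* The condition under test, instrumented so evaluations can be counted. */
bool em_is_set(const void *em)
{
	cond_checks++;
	return em != NULL;
}

/* Old pattern: the caller tests the condition before reaching the key. */
void trace_with_caller_check(const void *em)
{
	if (em_is_set(em))
		if (tracing_enabled)
			events++;
}

/* Conditional-event pattern: the condition sits behind the key. */
void trace_with_conditional_event(const void *em)
{
	if (tracing_enabled)
		if (em_is_set(em))
			events++;
}
```

With tracing disabled, the first function still evaluates the condition on every call, while the second never does; with tracing enabled both record the same events.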
-
Josef Bacik authored
We can just return false for this so we stop doing the snapshot aware defrag stuff. Thanks, Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
-
- 12 Nov, 2013 19 commits
-
-
Miao Xie authored
Rename the function btrfs_start_all_delalloc_inodes() so that its name is consistent with btrfs_wait_ordered_roots(), since they are always used in the same place. Signed-off-by:
Miao Xie <miaox@cn.fujitsu.com> Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
-
Dulshani Gunawardhana authored
Fix spacing issues detected via checkpatch.pl in accordance with the kernel style guidelines. Signed-off-by:
Dulshani Gunawardhana <dulshani.gunawardhana89@gmail.com> Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
-
Dulshani Gunawardhana authored
Use WARN_ON()'s return value in place of WARN_ON(1) for cleaner source code that outputs more descriptive warnings. Also fix the styling warning of redundant braces that came up as a result of this fix. Signed-off-by:
Dulshani Gunawardhana <dulshani.gunawardhana89@gmail.com> Reviewed-by:
Zach Brown <zab@redhat.com> Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
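The pattern can be shown with a userspace imitation of WARN_ON(). This is a simplified sketch for illustration: the macro and `warn_on()` below are not the kernel's definitions, and the two `check_len_*()` functions are hypothetical examples of the before and after forms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

unsigned int warnings; /* number of warnings emitted so far */

/* Print a warning naming the failed expression and return the condition. */
bool warn_on(bool cond, const char *expr, const char *file, int line)
{
	if (cond) {
		warnings++;
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	}
	return cond;
}

#define WARN_ON(cond) warn_on((cond) != 0, #cond, __FILE__, __LINE__)

/* Before: separate test and warning; the log only mentions "1". */
int check_len_old(long len)
{
	if (len < 0) {
		WARN_ON(1);
		return -1;
	}
	return 0;
}

/* After: WARN_ON() is the test, and the log names the failed condition. */
int check_len_new(long len)
{
	if (WARN_ON(len < 0))
		return -1;
	return 0;
}
```

Both functions return the same values; the second needs no braces and its warning text contains `len < 0` rather than `1`.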
-
Liu Bo authored
If something goes wrong in write endio, running snapshot-aware defragment can end up with undefined results, maybe a crash, so we should avoid it. In order to share similar code, this also adds a helper to free the struct for snapshot-aware defrag. Signed-off-by:
Liu Bo <bo.li.liu@oracle.com> Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
-
Josef Bacik authored
When using delalloc workers in a non-waiting way (like for enospc handling) we can end up not actually waiting for the dirty pages to be started if we have compression. We need to add an extra filemap flush to make sure any async extents that have started are actually moved along before returning. Thanks, Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
-
Josef Bacik authored
This is just the write path, the only reason we start a transaction is so we can check cross references, we don't make any actual changes, so there is no reason to abort the transaction if we fail. Thanks, Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
-
Josef Bacik authored
We can just return an error and we'll bail out properly. We still want to catch this case to make sure we don't have a bug somewhere, so just warn if this pops up. Thanks, Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
-
Josef Bacik authored
I noticed that if the free space cache has an error writing out its data it won't actually error out, it will just carry on. This is because it doesn't check the return value of btrfs_wait_ordered_range, which didn't actually return anything. So fix this in order to keep us from making the free space cache look valid when it really isn't. Thanks, Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
-
Zach Brown authored
fs/btrfs/compat.h only contained trivial macro wrappers of drop_nlink() and inc_nlink(). This doesn't belong in mainline. Signed-off-by:
Zach Brown <zab@redhat.com> Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
-
Josef Bacik authored
While trying to kill our hole extents I noticed I was seeing problems where we seek into a file and then start writing and then try to fiemap that file later. This is because we search for offset 0, don't find anything and so back up one slot, which puts us at the inode ref or something like that, which means we goto not_found and create an extent map for our entire search area. This isn't quite what we want; we want to move forward one slot and see if there is an extent there so we can limit our hole extent. This patch fixes this problem. I will add a testcase for this as well. Thanks, Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
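The idea of limiting the hole by the next extent can be sketched with a sorted array instead of a btree. This is an illustration only, assuming a hypothetical `struct extent` and a hypothetical `hole_len()` helper; it does not mirror the slot handling in btrfs_get_extent().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record; arrays of these are sorted by start offset. */
struct extent {
	uint64_t start;
	uint64_t len;
};

/*
 * Length of the hole that begins at `start` for a search of `len` bytes,
 * assuming no extent covers `start` itself. The hole stops at the first
 * extent that begins after `start`, if that extent is inside the search.
 */
uint64_t hole_len(const struct extent *ext, size_t n,
		  uint64_t start, uint64_t len)
{
	for (size_t i = 0; i < n; i++) {
		if (ext[i].start > start) {
			uint64_t gap = ext[i].start - start;

			return gap < len ? gap : len;
		}
	}
	return len; /* no later extent: the hole spans the whole search */
}
```

For a file whose only extent starts at 1 MiB, a search from offset 0 over 8 MiB yields a 1 MiB hole rather than a mapping for the entire search area.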
-
Josef Bacik authored
I'm going to be removing hole extents in the near future so I wanted to make a sanity test for btrfs_get_extent to make sure I don't break anything in the meantime. This patch just puts btrfs_get_extent through its paces by giving it a completely unreasonable mapping to look at and make sure it is giving us back maps that make sense. Thanks, Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
-
Josef Bacik authored
While trying to track down a reserved space leak I noticed a few places where we won't properly clean up reserved space if we have an error, this patch fixes those up. Thanks, Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
-
Filipe David Borba Manana authored
Currently the hash value used for adding an inode to the VFS's inode hash table consists of the plain inode number, which is a 64-bit integer. This results in hash table buckets (hlist_head lists) with too many elements for at least 2 important scenarios:

1) When we have many subvolumes. Each subvolume has its own btree where its files and directories are added to, and each has its own objectid (inode number) namespace. This means that if we have N subvolumes, and all have inode number X associated to a file or directory, the corresponding inodes all map to the same hash table entry, resulting in a bucket (hlist_head list) with N elements;

2) On 32-bit machines. The VFS hash values are unsigned longs, which are 32 bits wide on 32-bit machines, and the inode (objectid) numbers are 64-bit unsigned integers. We simply cast the inode numbers to hash values, which means that all inodes with the same lower 32 bits use the same hash bucket. For example, all inodes whose number (objectid) has 0xffff_ffff as its lower 32 bits, from 0x0000_0000_ffff_ffff up to 0xffff_ffff_ffff_ffff, will end up in the same hash table bucket.

This change ensures the inode's hash value depends both on the objectid (inode number) and its subvolume's (btree root) objectid. For 32-bit machines, this change gives better entropy by making the hash value depend on both the upper and lower 32 bits of the 64-bit hash previously computed. Signed-off-by:
Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
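A hash with the two properties described above can be sketched as follows. This is a minimal illustration, not the function from the patch: the names are hypothetical, and the multiplier `MIX_MULT` is an assumed odd 64-bit constant chosen here only to spread the root objectid across the word.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed odd 64-bit multiplier used only for mixing (not from the patch). */
#define MIX_MULT 0x9e3779b97f4a7c15ULL

/* Old behaviour on 32-bit machines: plain truncation of the inode number. */
uint32_t old_hash32(uint64_t objectid)
{
	return (uint32_t)objectid;
}

/* 64-bit hash that depends on the inode number and the subvolume root. */
uint64_t inode_hash64(uint64_t objectid, uint64_t root_objectid)
{
	return objectid ^ (root_objectid * MIX_MULT);
}

/* Fold to 32 bits so that both halves of the 64-bit hash contribute. */
uint32_t inode_hash32(uint64_t objectid, uint64_t root_objectid)
{
	uint64_t h = inode_hash64(objectid, root_objectid);

	return (uint32_t)(h >> 32) ^ (uint32_t)h;
}
```

With truncation, inode numbers 0x0000_0000_ffff_ffff and 0xffff_ffff_ffff_ffff collide; with the folded hash they differ, and the same inode number in two different subvolumes no longer produces the same 64-bit hash.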
-
Miao Xie authored
Performance sometimes dropped when we ran sysbench to measure sequential buffered writes with 2 or more threads. This happened because the task scheduler could reorder the writes of the test threads, so an incoming write could land beyond the end of the file. In that case we need to insert dummy file extents and create a hole for the area we skip, and in order to avoid the ongoing ordered extents in that area we need to wait for them. Unfortunately, the current code doesn't check whether there are ordered extents in the area; it tries to find and flush the dirty pages directly. In fact there are no dirty pages in that area, so this step is unnecessary and just wastes time. Sometimes it also increases contention on some locks and makes performance drop suddenly.

So we remove the ordered extent flush before the check, and flush the dirty pages and wait for the ordered extents only when we find them.

In my tests, before applying this patch the performance regression showed up in 1 or 2 out of 10 runs. After applying this patch, the regression went away.

Test Environment:

    CPU: 1CPU * 4Cores
    Memory: 6GB
    Partition: 20GB

Test Command:

    # sysbench --test=fileio --file-total-size=16G --file-test-mode=seqwr \
    > --num-threads=512 --file-block-size=16384 --max-time=60 --max-requests=0 run

Signed-off-by:
Miao Xie <miaox@cn.fujitsu.com> Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
-
Josef Bacik authored
I've been testing our error paths and I was tripping the BUG_ON() in drop_outstanding_extent because our outstanding_extents is 0 for space cache inodes. This is because we don't reserve metadata space for these inodes since we depend on the global block reserve for our space. To fix this we need to make sure the DO_ACCOUNTING stuff doesn't actually call release_metadata for space cache inodes. With this patch I'm no longer panicking. Thanks, Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
-
Filipe David Borba Manana authored
In inode.c:btrfs_orphan_add() if we failed to insert the orphan item, we would return without decrementing the orphan count that we just incremented before attempting the insertion, leaving the orphan inode count wrong. In inode.c:btrfs_orphan_del(), we were decrementing the inode orphan count if the bit BTRFS_INODE_ORPHAN_META_RESERVED was set, which is logically wrong because it should be decremented if the bit BTRFS_INODE_HAS_ORPHAN_ITEM was set - after all we increment the count when we set the bit BTRFS_INODE_HAS_ORPHAN_ITEM elsewhere. Signed-off-by:
Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
-
Ross Kirk authored
Remove unused eb parameter from btrfs_item_nr Signed-off-by:
Ross Kirk <ross.kirk@gmail.com> Reviewed-by:
David Sterba <dsterba@suse.cz> Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
-
Filipe David Borba Manana authored
It is not necessary to store the NULL byte in a symlink inline file extent. There's currently no code that requires the NULL byte to be present in the extent. This change also doesn't break file format compatibility nor the send/receive feature. The VFS also doesn't need the NULL byte to be present in the extent, as it reads up to inode->i_size bytes (which already excludes the NULL byte) and sets the NULL byte for us (in fs/namei.c:page_getlink()). So with this change we save 1 byte per symlink file extent (which is always inlined in the btree leaf) without losing backward and forward compatibility. Signed-off-by:
Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
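The storage rule described above can be sketched in userspace C: the extent holds exactly the target's bytes, and the reader adds the terminator after copying i_size bytes. Both function names are hypothetical, and the sketch does not reproduce the VFS code path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes needed in the inline extent: the target without a trailing NUL. */
size_t symlink_extent_size(const char *target)
{
	return strlen(target);
}

/*
 * Reader side: copy i_size bytes from the extent and append the terminator.
 * `out` must have room for i_size + 1 bytes.
 */
void read_symlink(const char *extent, size_t i_size, char *out)
{
	memcpy(out, extent, i_size);
	out[i_size] = '\0';
}
```

For the target "/mnt/target" the extent holds 11 bytes instead of 12, and the reader still gets a properly terminated string.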
-
Stefan Behrens authored
The fact that btrfs_root_refs() returned 0 for the tree_root caused bugs in the past, therefore it is set to 1 with this patch and (hopefully) all affected code is adapted to this change. I verified this change by temporarily adding WARN_ON() checks everywhere where btrfs_root_refs() is used, checking whether the logic of the code is changed by btrfs_root_refs() returning 1 instead of 0 for root->root_key.objectid == BTRFS_ROOT_TREE_OBJECTID. With these added checks, I ran the xfstests './check -g auto'. The two roots chunk_root and log_root_tree that are only referenced by the superblock and the log_roots below the log_root_tree still have btrfs_root_refs() == 0, only the tree_root is changed. Signed-off-by:
Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
-
- 18 Oct, 2013 1 commit
-
-
Josef Bacik authored
We can't be holding tree locks while we try to start a transaction, we will deadlock. Thanks, Reported-by:
Sage Weil <sage@inktank.com> Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
-
- 11 Oct, 2013 1 commit
-
-
Josef Bacik authored
btrfs_rename was using the root of the old dir instead of the root of the new dir when checking for a hash collision, so if you tried to move a file into a subvol it would freak out because it would see the file you are trying to move in its current root. This fixes the bug where this would fail:

    btrfs subvol create test1
    btrfs subvol create test2
    mv test1 test2

Thanks to Chris Murphy for catching this, Cc: stable@vger.kernel.org Reported-by:
Chris Murphy <lists@colorremedies.com> Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
-
- 21 Sep, 2013 3 commits
-
-
Guangyu Sun authored
Commit 2bc55652 (Btrfs: don't update atime on RO subvolumes) ensures that the access time of an inode is not updated when the inode lives in a read-only subvolume. However, if a directory on a read-only subvolume is accessed, the atime is updated. This results in a write operation to a read-only subvolume. I believe that access times should never be updated on read-only subvolumes.

To reproduce:

    # mkfs.btrfs -f /dev/dm-3
    (...)
    # mount /dev/dm-3 /mnt
    # btrfs subvol create /mnt/sub
    Create subvolume '/mnt/sub'
    # mkdir /mnt/sub/dir
    # echo "abc" > /mnt/sub/dir/file
    # btrfs subvol snapshot -r /mnt/sub /mnt/rosnap
    Create a readonly snapshot of '/mnt/sub' in '/mnt/rosnap'
    # stat /mnt/rosnap/dir
      File: `/mnt/rosnap/dir'
      Size: 8 Blocks: 0 IO Block: 4096 directory
    Device: 16h/22d Inode: 257 Links: 1
    Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
    Access: 2013-09-11 07:21:49.389157126 -0400
    Modify: 2013-09-11 07:22:02.330156079 -0400
    Change: 2013-09-11 07:22:02.330156079 -0400
    # ls /mnt/rosnap/dir
    file
    # stat /mnt/rosnap/dir
      File: `/mnt/rosnap/dir'
      Size: 8 Blocks: 0 IO Block: 4096 directory
    Device: 16h/22d Inode: 257 Links: 1
    Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
    Access: 2013-09-11 07:22:56.797151670 -0400
    Modify: 2013-09-11 07:22:02.330156079 -0400
    Change: 2013-09-11 07:22:02.330156079 -0400

Reported-by:
Koen De Wit <koen.de.wit@oracle.com> Signed-off-by:
Guangyu Sun <guangyu.sun@oracle.com> Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
-
Josef Bacik authored
We don't do the iput when we fail to allocate our delayed delalloc work in __start_delalloc_inodes, fix this. Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
-
Filipe David Borba Manana authored
Instead of removing the current inode from the red black tree and then adding the new one, just use the red black tree replace operation, which is more efficient. Signed-off-by:
Filipe David Borba Manana <fdmanana@gmail.com> Reviewed-by:
Zach Brown <zab@redhat.com> Signed-off-by:
Josef Bacik <jbacik@fusionio.com> Signed-off-by:
Chris Mason <chris.mason@fusionio.com>
-