- 17 May, 2010 12 commits
-
-
Theodore Ts'o authored
This patch was generated using:

#!/usr/bin/perl -i
while (<>) {
        s/[ ]+$//;
        print;
}

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Ben Hutchings authored
struct ext4_new_group_input needs to be converted because u64 has only 32-bit alignment on some 32-bit architectures, notably i386. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
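For illustration, a minimal sketch of such a conversion, assuming a compat counterpart built from compat_u64 (the compat struct name and exact layout here are assumptions, not necessarily what was merged):

#include <linux/compat.h>

/*
 * compat_u64 is only 4-byte aligned, matching the layout a 32-bit
 * (e.g. i386) caller passes through the compat ioctl path, whereas
 * plain u64 is 8-byte aligned on x86_64.
 */
struct compat_ext4_new_group_input {
        u32 group;                      /* group number for this data */
        compat_u64 block_bitmap;        /* absolute block number of block bitmap */
        compat_u64 inode_bitmap;        /* absolute block number of inode bitmap */
        compat_u64 inode_table;         /* absolute block number of inode table start */
        u32 blocks_count;               /* total number of blocks in this group */
        u16 reserved_blocks;            /* number of reserved blocks in this group */
        u16 unused;
};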
-
Ben Hutchings authored
It is unnecessary, and in general impossible, to define the compat ioctl numbers except when building the filesystem with CONFIG_COMPAT defined. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
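A sketch of the resulting arrangement (the specific ioctl number shown is assumed for illustration):

#ifdef CONFIG_COMPAT
/*
 * The 32-bit ioctl numbers depend on compat types such as
 * struct compat_ext4_new_group_input, so they can only be defined
 * when CONFIG_COMPAT is enabled.
 */
#define EXT4_IOC32_GROUP_ADD    _IOW('f', 8, struct compat_ext4_new_group_input)
#endif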
-
Li Zefan authored
Use DECLARE_EVENT_CLASS, and save ~2.7K:

   text    data     bss     dec     hex filename
 274441    7200     260  281901   44d2d fs/ext4/ext4.o.orig
 271881    7040     256  279177   44289 fs/ext4/ext4.o

4 events are converted:

ext4__mb_new_pa: ext4_mb_new_inode_pa, ext4_mb_new_group_pa
ext4__mballoc: ext4_mballoc_discard, ext4_mballoc_free

No change in functionality.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
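The pattern, as a simplified sketch (the field choices and TP_printk format below are illustrative assumptions, not the actual ext4 tracepoint definitions):

/*
 * The class carries the shared TP_STRUCT__entry/TP_fast_assign/TP_printk,
 * so the assignment code and format strings are emitted once; each
 * DEFINE_EVENT only adds a name that reuses them.
 */
DECLARE_EVENT_CLASS(ext4__mb_new_pa,
        TP_PROTO(struct ext4_allocation_context *ac,
                 struct ext4_prealloc_space *pa),
        TP_ARGS(ac, pa),

        TP_STRUCT__entry(
                __field(dev_t,  dev)
                __field(ino_t,  ino)
                __field(__u64,  pa_pstart)
                __field(__u32,  pa_len)
        ),

        TP_fast_assign(
                __entry->dev            = ac->ac_sb->s_dev;
                __entry->ino            = ac->ac_inode->i_ino;
                __entry->pa_pstart      = pa->pa_pstart;
                __entry->pa_len         = pa->pa_len;
        ),

        TP_printk("dev %d,%d ino %lu pstart %llu len %u",
                  MAJOR(__entry->dev), MINOR(__entry->dev),
                  (unsigned long) __entry->ino,
                  __entry->pa_pstart, __entry->pa_len)
);

DEFINE_EVENT(ext4__mb_new_pa, ext4_mb_new_inode_pa,
        TP_PROTO(struct ext4_allocation_context *ac,
                 struct ext4_prealloc_space *pa),
        TP_ARGS(ac, pa)
);

DEFINE_EVENT(ext4__mb_new_pa, ext4_mb_new_group_pa,
        TP_PROTO(struct ext4_allocation_context *ac,
                 struct ext4_prealloc_space *pa),
        TP_ARGS(ac, pa)
);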
-
Theodore Ts'o authored
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Li Zefan authored
Commit f8ec9d68 added a trace event ext4_da_release_space, but didn't add the corresponding trace hook. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
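The missing piece is essentially a single call at the point where the delalloc reservation is released, along these lines (a sketch; the exact call site and arguments are assumed):

static void ext4_da_release_space(struct inode *inode, int to_free)
{
        struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);

        /* fire the event that commit f8ec9d68 defined but never called */
        trace_ext4_da_release_space(inode, to_free);

        /* ... the existing reservation-release logic follows ... */
}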
-
Dmitry Monakhov authored
If i_data_sem was internally dropped due to a transaction restart, it is necessary to restart the path look-up because the extents tree was possibly modified by ext4_get_block(). https://bugzilla.kernel.org/show_bug.cgi?id=15827 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Jan Kara <jack@suse.cz>
-
Theodore Ts'o authored
Dmitry Monakhov discovered an edge case where it was possible for the EXT4_EOFBLOCKS_FL flag to get cleared unnecessarily. This is true; I have a test case that can be exercised by downloading and decompressing the file:

wget ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/ext4-testcases/eofblocks-fl-test-case.img.bz2
bunzip2 eofblocks-fl-test-case.img.bz2
dd if=/dev/zero of=eofblocks-fl-test-case.img bs=1k seek=17925 count=1 conv=notrunc

However, triggering it in real life is highly unlikely since it requires an extremely fragmented sparse file with a hole in exactly the right place in the extent tree. (It actually took quite a bit of work to generate this test case.) Still, it's nice to get even extreme corner cases to be correct, so this patch makes sure that we don't clear the EXT4_EOFBLOCKS_FL flag incorrectly even in this corner case. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Theodore Ts'o authored
If the EOFBLOCK_FL flag is set when it should not be and the inode is zero length, then eh_entries is zero, and ex is NULL, so dereferencing ex to print ex->ee_block causes a kernel OOPS in ext4_ext_map_blocks(). On top of that, the error message which is printed isn't very helpful. So we fix this by printing something more explanatory which doesn't involve trying to print ex->ee_block. Addresses-Google-Bug: #2655740 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Dmitry Monakhov authored
At several places we modify EXT4_I(inode)->i_flags without holding i_mutex (ext4_do_update_inode, ...). These modifications are racy and we can lose updates to i_flags. So convert the handling of i_flags to use bitops, which are atomic. https://bugzilla.kernel.org/show_bug.cgi?id=15792 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
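A minimal sketch of the atomic helpers this describes (assuming i_flags becomes an unsigned long so the generic bitops can operate on it; names follow the description above):

static inline void ext4_set_inode_flag(struct inode *inode, int bit)
{
        set_bit(bit, &EXT4_I(inode)->i_flags);  /* atomic read-modify-write */
}

static inline void ext4_clear_inode_flag(struct inode *inode, int bit)
{
        clear_bit(bit, &EXT4_I(inode)->i_flags);
}

static inline int ext4_test_inode_flag(struct inode *inode, int bit)
{
        return test_bit(bit, &EXT4_I(inode)->i_flags);
}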
-
Theodore Ts'o authored
EXT4_ERROR_INODE() tends to provide better error information and in a more consistent format. Some errors were not even identifying the inode or directory which was corrupted, which made them not very useful. Addresses-Google-Bug: #2507977 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Theodore Ts'o authored
This saves a huge amount of stack space by avoiding unnecessary struct buffer_head's from being allocated on the stack. In addition, to make the code easier to understand, collapse and refactor ext4_get_block(), ext4_get_block_write(), and noalloc_get_block_write() into a single function. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
- 16 May, 2010 21 commits
-
-
Theodore Ts'o authored
Jack up ext4_get_blocks() and add a new function, ext4_map_blocks(), which uses a much smaller structure, struct ext4_map_blocks, which is 20 bytes, as opposed to a struct buffer_head, which is nearly 5 times bigger on an x86_64 machine. By switching things to use ext4_map_blocks(), we can save stack space since we avoid allocating a struct buffer_head on the stack. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
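For reference, a sketch of the small mapping structure and entry point being described (field and flag names as I understand them; treat the details as assumptions):

struct ext4_map_blocks {
        ext4_fsblk_t m_pblk;    /* first physical block of the mapping */
        ext4_lblk_t m_lblk;     /* first logical block requested */
        unsigned int m_len;     /* number of blocks mapped */
        unsigned int m_flags;   /* EXT4_MAP_* state bits (mapped, new, ...) */
};                              /* 8 + 4 + 4 + 4 = 20 bytes */

/* returns the number of blocks mapped, 0 for a hole, or a negative errno */
int ext4_map_blocks(handle_t *handle, struct inode *inode,
                    struct ext4_map_blocks *map, int flags);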
-
Theodore Ts'o authored
Make a copy of write_cache_pages() for the benefit of ext4_da_writepages(). This allows us to simplify the code some, and will allow us to further customize the code in future patches. There are some nasty hacks in write_cache_pages(), which Linus has (correctly) characterized as vile. I've just copied it into write_cache_pages_da(), without trying to clean those bits up lest I break something in ext4's delalloc implementation, which is a bit fragile right now. This will allow Dave Chinner to clean up write_cache_pages() in mm/page-writeback.c, without worrying about breaking ext4. Eventually write_cache_pages_da() will go away when I rewrite ext4's delayed allocation and create a general ext4_writepages() which is used for all of ext4's writeback. For now, this is the lowest-risk way to clean up the core write_cache_pages() function. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dave Chinner <david@fromorbit.com>
-
Jan Kara authored
We failed to show journal_checksum option in /proc/mounts. Fix it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
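The fix amounts to emitting the option from ext4_show_options(), roughly (a sketch of the idea, not necessarily the exact hunk):

        /* report journal_checksum in /proc/mounts like the other options */
        if (test_opt(sb, JOURNAL_CHECKSUM))
                seq_puts(seq, ",journal_checksum");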
-
Curt Wohlgemuth authored
Fix ext4_mb_collect_stats() to use the correct test for s_bal_success; it should be testing "best-extent.fe_len >= orig-extent.fe_len", not "orig-extent.fe_len >= goal-extent.fe_len". Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
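In terms of the allocation context fields, the corrected test looks roughly like this (a sketch; the surrounding condition may include additional checks):

        /* count success only when the best extent found covers the
         * originally requested length */
        if (ac->ac_b_ex.fe_len >= ac->ac_o_ex.fe_len)
                atomic_inc(&sbi->s_bal_success);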
-
Curt Wohlgemuth authored
This adds a new field in ext4_group_info to cache the largest available block range in a block group, and avoids loading the buddy pages until *after* we've done a sanity check on the block group. With large allocation requests (e.g., an 8MiB fallocate()) and relatively full partitions, it's easy to have no block groups with a block extent large enough to satisfy the input request length. This currently causes the loop during cr == 0 in ext4_mb_regular_allocator() to load the buddy bitmap pages for EVERY block group. That can be a lot of pages. The patch below allows us to call ext4_mb_good_group() BEFORE we load the buddy pages (although we have to check again after we lock the block group). Addresses-Google-Bug: #2578108 Addresses-Google-Bug: #2704453 Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
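The reordering in that loop looks roughly like this (a sketch, assuming the cheap check can run from the cached per-group information before the buddy is loaded; ext4_mb_unload_buddy() stands in for the buddy release helper, whatever its exact name at this point in the series):

        /* cheap suitability check first, using cached group info,
         * so unsuitable groups never cost us a buddy-page load */
        if (!ext4_mb_good_group(ac, group, cr))
                continue;

        err = ext4_mb_load_buddy(sb, group, &e4b);
        if (err)
                goto out;

        ext4_lock_group(sb, group);
        if (!ext4_mb_good_group(ac, group, cr)) {
                /* re-check under the group lock; someone may have
                 * allocated from this group since the first check */
                ext4_unlock_group(sb, group);
                ext4_mb_unload_buddy(&e4b);
                continue;
        }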
-
Nikanth Karthikesan authored
Currently, using posix_fallocate one can bypass the RLIMIT_FSIZE limit and create a file larger than the limit. Add a check for that. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Amit Arora <aarora@in.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
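One way to express the check in ext4_fallocate(), using the VFS helper that enforces RLIMIT_FSIZE (a sketch; whether this is the exact form merged is an assumption):

        mutex_lock(&inode->i_mutex);
        /* inode_newsize_ok() returns -EFBIG (and raises SIGXFSZ) if the
         * requested end offset exceeds the caller's RLIMIT_FSIZE */
        ret = inode_newsize_ok(inode, offset + len);
        if (ret) {
                mutex_unlock(&inode->i_mutex);
                return ret;
        }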
-
Curt Wohlgemuth authored
Addresses-Google-Bug: #2562325 Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Curt Wohlgemuth authored
This adds a "re-mounted" message to ext4_remount(), and both it and the mount message in ext4_fill_super() now have the original mount options data string. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Eric Sandeen authored
Because we can badly over-reserve metadata when we calculate worst-case, it complicates things for quota, since we must reserve and then claim later, retry on EDQUOT, etc. Quota is also a generally smaller pool than fs free blocks, so this over-reservation hurts more, and more often. I'm of the opinion that it's not the worst thing to allow metadata to push a user slightly over quota. This simplifies the code and avoids the false quota rejections that result from worst-case speculation. This patch stops the speculative quota-charging for worst-case metadata requirements, and just charges quota when the blocks are allocated at writeout. It also is able to remove the try-again loop on EDQUOT. This patch has been tested indirectly by running the xfstests suite with a hack to mount & enable quota prior to the test. I also did a more specific test of fragmenting freespace and then doing a large delalloc write under quota; quota stopped me at the right amount of file IO, and then the writeout generated enough metadata (due to the fragmentation) that it put me slightly over quota, as expected. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Eric Sandeen authored
To simplify metadata tracking for delalloc writes, ext4 will simply claim metadata blocks at allocation time, without first speculatively reserving the worst case and then freeing what was not used. To do this, we need a mechanism to track allocations in the quota subsystem, but potentially allow that allocation to actually go over quota. This patch adds a DQUOT_SPACE_NOFAIL flag and function variants for this purpose. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
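A sketch of the flag and the convenience wrapper being described (the helper name and its exact placement in quotaops.h are assumptions):

#define DQUOT_SPACE_WARN        0x1
#define DQUOT_SPACE_RESERVE     0x2
#define DQUOT_SPACE_NOFAIL      0x4     /* charge quota even if it goes over the limit */

static inline int dquot_alloc_block_nofail(struct inode *inode, qsize_t nr)
{
        /* account the blocks unconditionally; the fs has already
         * committed to allocating them at writeout time */
        return __dquot_alloc_space(inode, nr << inode->i_blkbits,
                                   DQUOT_SPACE_WARN | DQUOT_SPACE_NOFAIL);
}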
-
Eric Sandeen authored
Switch __dquot_alloc_space and __dquot_free_space to take flags to indicate whether to warn and/or to reserve (or free reserve). This is slightly more readable at the callpoints, and makes it cleaner to add a "nofail" option in the next patch. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Dmitry Monakhov authored
Currently the block/inode/dir counters are initialized before the journal has been recovered. In fact, after journal recovery this info will probably change, and the free blocks count is critical for correct delalloc mode accounting. https://bugzilla.kernel.org/show_bug.cgi?id=15768 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Dmitry Monakhov authored
- Reorganize the locking scheme to batch two atomic operations into one. This also allows us to state that a healthy group must obey the following rule: ext4_free_inodes_count(sb, gdp) == ext4_count_free(inode_bitmap, NUM);
- Fix a possible undefined pointer dereference.
- Even if the group descriptor stats aren't accessible, we still have to update the inode bitmaps.
- Move non-group-member updates out of group_lock.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Dmitry Monakhov authored
The extents code will sometimes zero out blocks and mark them as initialized instead of splitting an extent into several smaller ones. This optimization, however, causes problems if the extent is beyond i_size, because fsck will complain if there are uninitialized blocks after i_size, as this cannot be distinguished from an inode that has an incorrect i_size field. https://bugzilla.kernel.org/show_bug.cgi?id=15742 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Theodore Ts'o authored
One of the most contended locks in the jbd2 layer is j_state_lock when running dbench. This is especially true if using the real-time kernel with its "sleeping spinlocks" patch that replaces spinlocks with priority inheriting mutexes --- but it also shows up on large SMP benchmarks. Thanks to John Stultz for pointing this out. Reviewed by Mingming Cao and Jan Kara. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Eric Sandeen authored
There was a bug reported on RHEL5 that a 10G dd on a 12G box had a very, very slow sync after that. At issue was the loop in write_cache_pages scanning all the way to the end of the 10G file, even though the subsequent call to mpage_da_submit_io would only actually write a smallish amount; then we went back to the write_cache_pages loop ... wasting tons of time in calling __mpage_da_writepage for thousands of pages we would just revisit (many times) later. Upstream it's not such a big issue for sys_sync because we get to the loop with a much smaller nr_to_write, which limits the loop. However, talking with Aneesh he realized that fsync upstream still gets here with a very large nr_to_write and we face the same problem. This patch makes mpage_add_bh_to_extent stop the loop after we've accumulated 2048 pages, by setting mpd->io_done = 1; which ultimately causes the write_cache_pages loop to break. Repeating the test with a dirty_ratio of 80 (to leave something for fsync to do), I don't see huge IO performance gains, but the reduction in cpu usage is striking: 80% usage with stock, and 2% with the below patch. Instrumenting the loop in write_cache_pages clearly shows that we are wasting time here. Eventually we need to change mpage_da_map_pages() to also submit its I/O to the block layer, subsuming mpage_da_submit_io(), and then change it to call ext4_get_blocks() multiple times. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
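The cap can be expressed roughly as follows inside mpage_add_bh_to_extent() (a sketch; the 2048-page figure comes from the description above, while the exact field names and placement are assumptions):

        /* Stop accumulating once the extent spans ~2048 pages; setting
         * io_done makes the surrounding write_cache_pages loop break
         * instead of scanning all the way to the end of the file. */
        if ((mpd->b_size >> mpd->inode->i_blkbits) >=
            (2048 << (PAGE_CACHE_SHIFT - mpd->inode->i_blkbits))) {
                mpd->io_done = 1;
                return;
        }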
-
Eric Sandeen authored
Turn off issuance of discard requests if the device does not support it - similar to the action we take for barriers. This will save a little computation time if a non-discardable device is mounted with -o discard, and also makes it obvious that it's not doing what was asked at mount time ... Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
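A sketch of the approach at the point where discards are issued (the warning text and exact call site are assumptions):

        if (test_opt(sb, DISCARD)) {
                ret = sb_issue_discard(sb, discard_block, count);
                if (ret == -EOPNOTSUPP) {
                        /* behave like a failed barrier: warn once and
                         * stop issuing discards for this mount */
                        ext4_warning(sb, "discard not supported by device, "
                                         "turning off discards");
                        clear_opt(EXT4_SB(sb)->s_mount_opt, DISCARD);
                }
        }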
-
Eric Sandeen authored
ext4_freeze() used jbd2_journal_lock_updates() which takes the j_barrier mutex, and then returns to userspace. The kernel does not like this:

================================================
[ BUG: lock held when returning to user space! ]
------------------------------------------------
lvcreate/1075 is leaving the kernel with locks still held!
1 lock held by lvcreate/1075:
 #0: (&journal->j_barrier){+.+...}, at: [<ffffffff811c6214>] jbd2_journal_lock_updates+0xe1/0xf0

Use vfs_check_frozen() added to ext4_journal_start_sb() and ext4_force_commit() instead.

Addresses-Red-Hat-Bugzilla: #568503
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
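The replacement check is essentially this, placed at the start of the transaction-start path (a simplified sketch that omits the usual error handling; ext4_force_commit() gets the equivalent treatment):

handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks)
{
        journal_t *journal = EXT4_SB(sb)->s_journal;

        /* instead of holding j_barrier across the return to userspace,
         * refuse to start new handles while the filesystem is frozen */
        vfs_check_frozen(sb, SB_FREEZE_TRANS);

        return jbd2_journal_start(journal, nblocks);
}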
-
Dmitry Monakhov authored
The generic setattr implementation is no longer responsible for quota transfer, so symlinks must be handled via ext4_setattr(). Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
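Concretely, the symlink inode operations need to route .setattr through ext4 (a sketch of the idea; the table is abbreviated and the exact entries are assumptions):

const struct inode_operations ext4_symlink_inode_operations = {
        .readlink       = generic_readlink,
        .follow_link    = page_follow_link_light,
        .put_link       = page_put_link,
        .setattr        = ext4_setattr,         /* handles quota transfer on chown */
};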
-
Eric Sandeen authored
If groups_per_flex < 2, sbi->s_flex_groups[] doesn't get filled out, and every other access to this first tests s_log_groups_per_flex; the same thing needs to happen in resize or we'll wander off into a null pointer when doing an online resize of the file system. Thanks to Christoph Biedl, who came up with the trivial testcase:

# truncate --size 128M fsfile
# mkfs.ext3 -F fsfile
# tune2fs -O extents,uninit_bg,dir_index,flex_bg,huge_file,dir_nlink,extra_isize fsfile
# e2fsck -yDf -C0 fsfile
# truncate --size 132M fsfile
# losetup /dev/loop0 fsfile
# mount /dev/loop0 mnt
# resize2fs -p /dev/loop0

https://bugzilla.kernel.org/show_bug.cgi?id=13549

Reported-by: Alessandro Polverini <alex@nibbles.it> Test-case-by: Christoph Biedl <bugzilla.kernel.bpeb@manchmal.in-ulm.de> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
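The guard in the resize path then looks something like this (a sketch; the field and helper names follow the flex_bg code as I recall it and should be treated as assumptions):

        if (sbi->s_log_groups_per_flex) {
                ext4_group_t flex_group = ext4_flex_group(sbi, input->group);

                /* only touch s_flex_groups[] when flex_bg is actually in use */
                atomic_add(input->free_blocks_count,
                           &sbi->s_flex_groups[flex_group].free_blocks);
                atomic_add(EXT4_INODES_PER_GROUP(sb),
                           &sbi->s_flex_groups[flex_group].free_inodes);
        }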
-
Dmitry Monakhov authored
allocated_meta_data is already included in 'used' variable. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
- 15 May, 2010 1 commit
-
-
Christian Borntraeger authored
I have an x86_64 kernel with i386 userspace. e4defrag fails on the EXT4_IOC_MOVE_EXT ioctl because it is not wired up for the compat case. It seems that struct move_extent is compat safe; only types with fixed widths are used:

{
        __u32 reserved;         /* should be zero */
        __u32 donor_fd;         /* donor file descriptor */
        __u64 orig_start;       /* logical start offset in block for orig */
        __u64 donor_start;      /* logical start offset in block for donor */
        __u64 len;              /* block length to be moved */
        __u64 moved_len;        /* moved block length */
};

Let's just wire up EXT4_IOC_MOVE_EXT for the compat case.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com> CC: Akira Fujita <a-fujita@rs.jp.nec.com>
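The wiring itself is small; a simplified sketch of what the compat handler needs (the real handler covers more commands, and the exact form is an assumption):

long ext4_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
        switch (cmd) {
        case EXT4_IOC_MOVE_EXT:
                /* struct move_extent uses only fixed-width __u32/__u64
                 * fields, so no 32/64-bit translation is needed */
                break;
        default:
                return -ENOIOCTLCMD;
        }
        return ext4_ioctl(file, cmd, (unsigned long) compat_ptr(arg));
}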
-
- 14 May, 2010 1 commit
-
-
Jing Zhang authored
This function cleans up after ext4_mb_load_buddy(), so the renaming makes the code clearer. Signed-off-by: Jing Zhang <zj.barak@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
- 13 May, 2010 1 commit
-
-
Jing Zhang authored
Signed-off-by: Jing Zhang <zj.barak@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
- 12 May, 2010 1 commit
-
-
Jing Zhang authored
When EIO occurs after a bio is submitted, there is no memory free operation for the bio, which results in a memory leak. There is also no check of the return value of bio_alloc(). Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Jing Zhang <zj.barak@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
- 11 May, 2010 1 commit
-
-
Steven Liu authored
Making sure ee_block is initialized to zero to prevent gcc from kvetching. It's harmless (although it's not obvious that it's harmless) from code inspection: fs/ext4/move_extent.c:478: warning: 'start_ext.ee_block' may be used uninitialized in this function Thanks to Stefan Richter for first bringing this to the attention of linux-ext4@vger.kernel.org. Signed-off-by: LiuQi <lingjiujianke@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
-
- 10 May, 2010 2 commits
-
-
Dmitry Monakhov authored
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Linus Torvalds authored
-