Commits · 296c355cd6443d89fa251885a8d78778fe111dc4 · nexedi / linux

30 Sep, 2009 1 commit

ext4: Use tracepoints for mb_history trace file · 296c355c

Theodore Ts'o authored Sep 30, 2009

The /proc/fs/ext4/<dev>/mb_history was maintained manually, and had a
number of problems: it required a largish amount of memory to be
allocated for each ext4 filesystem, and the s_mb_history_lock
introduced a CPU contention problem.  

By ripping out the mb_history code and replacing it with ftrace
tracepoints, we get more functionality: timestamps, event
filtering, the ability to correlate mballoc history with other ext4
tracepoints, etc.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

296c355c

29 Sep, 2009 4 commits

ext4, jbd2: Drop unneeded printks at mount and unmount time · 90576c0b

Theodore Ts'o authored Sep 29, 2009

There are a number of kernel printk's which are printed when an ext4
filesystem is mounted and unmounted.  Disable them to economize space
in the system logs.  In addition, disabling the mballoc stats by
default saves a number of unneeded atomic operations for every block
allocation or deallocation.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

90576c0b

ext4: Handle nested ext4_journal_start/stop calls without a journal · d3d1faf6

Curt Wohlgemuth authored Sep 29, 2009

This patch fixes a problem with handling nested calls to
ext4_journal_start/ext4_journal_stop, when there is no journal present.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

d3d1faf6
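
The message for d3d1faf6 does not say how nested calls are tracked. One plausible approach, sketched here in Python rather than kernel C, is to keep a per-task nesting depth in place of a real handle when no journal exists, so that only the outermost stop clears the state. The class, method names, and depth cap are hypothetical and are not the kernel's API.

```python
class NoJournalTask:
    """Toy model of a task that starts and stops 'handles' with no journal."""

    MAX_DEPTH = 4096  # arbitrary cap chosen for this sketch

    def __init__(self):
        self.depth = 0  # stands in for the per-task handle slot

    def journal_start(self):
        if self.depth >= self.MAX_DEPTH:
            raise RuntimeError("nesting too deep")
        self.depth += 1
        return self.depth  # a fake, non-zero "handle"

    def journal_stop(self):
        if self.depth == 0:
            raise RuntimeError("stop without matching start")
        self.depth -= 1
        return self.depth == 0  # True only when the outermost call finishes


task = NoJournalTask()
task.journal_start()
task.journal_start()              # nested call
inner_done = task.journal_stop()  # inner stop: still nested
outer_done = task.journal_stop()  # outer stop: back to depth 0
```

Because the fake handle is never zero while a start is outstanding, callers that test "is there a current handle" keep working in the nested case.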

ext4: Make sure ext4_dirty_inode() updates the inode in no journal mode · f3dc272f

Curt Wohlgemuth authored Sep 29, 2009

This patch fixes a problem where ext4_dirty_inode() was not calling
ext4_mark_inode_dirty() if the current_handle is not valid, which is
the case in no journal mode.

It also removes a test for non-matching transaction which can never
happen.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f3dc272f

ext4: Avoid updating the inode table bh twice in no journal mode · 830156c7

Frank Mayhar authored Sep 29, 2009

This is a cleanup of commit 91ac6f43.  Since ext4_mark_inode_dirty()
has already called ext4_mark_iloc_dirty(), which in turn calls
ext4_do_update_inode(), it's not necessary to have ext4_write_inode()
call ext4_do_update_inode() in no journal mode.  Indeed, it would be
duplicated work.
Reviewed-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Frank Mayhar <fmayhar@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

830156c7

28 Sep, 2009 5 commits

ext4: EXT4_IOC_MOVE_EXT: Check for different original and donor inodes first · f3ce8064

Theodore Ts'o authored Sep 28, 2009

Move the check to make sure the original and donor inodes are
different earlier, to avoid a potential deadlock by trying to lock the
same inode twice.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f3ce8064
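
The ordering described in f3ce8064 can be illustrated with a small Python model: the identity check runs before any lock is taken, so a non-reentrant lock is never acquired twice by the same caller. `Inode`, `move_extents`, and the lock-by-inode-number ordering are assumptions made for this sketch, not the ext4 code.

```python
import errno
import threading


class Inode:
    def __init__(self, ino):
        self.ino = ino
        self.lock = threading.Lock()  # non-reentrant, like a mutex


def move_extents(orig, donor):
    """Return 0 on success or a negative errno; check identity before locking."""
    if orig is donor:
        # Refusing here means we never try to lock the same inode twice.
        return -errno.EINVAL
    # Assumed for the sketch: take locks in inode-number order.
    first, second = sorted((orig, donor), key=lambda i: i.ino)
    with first.lock, second.lock:
        return 0  # the actual extent exchange is out of scope here
```

Had the check come after the locking step, `move_extents(a, a)` would block forever on the second acquisition of `a.lock`.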

ext4: async direct IO for holes and fallocate support · 8d5d02e6

Mingming Cao authored Sep 28, 2009

For async direct IO that covers holes or fallocated space, the end_io
callback function now queues the conversion work on a workqueue but
doesn't flush the work right away, as that might take too long.

But when fsync is called after all the data I/O has completed, the user
expects the metadata to be updated as well before fsync returns.

Thus we need to flush the conversion work when fsync() is called.
This patch keeps track of a list of completed async direct IOs that
have work queued on the workqueue.  When fsync() is called, it will go
through the list and do the conversion.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>

8d5d02e6
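
The bookkeeping described in 8d5d02e6 can be modelled in a few lines of Python: completion only queues the conversion, and fsync drains the queue before returning. The class and its methods are hypothetical stand-ins for this sketch; the real code uses a kernel workqueue and per-inode list.

```python
from collections import deque


class InodeModel:
    def __init__(self):
        self.pending = deque()  # completed I/Os whose conversion is still queued
        self.unwritten = set()  # extents still marked uninitialized

    def submit_dio(self, extent):
        self.unwritten.add(extent)

    def end_io(self, extent):
        # Completion path: defer the conversion instead of doing it here.
        self.pending.append(extent)

    def convert(self, extent):
        self.unwritten.discard(extent)

    def fsync(self):
        # Flush every deferred conversion before reporting success.
        while self.pending:
            self.convert(self.pending.popleft())
        return 0
```

After `end_io()` the data is on disk but the extent is still flagged uninitialized; only `fsync()` (or the deferred worker, not modelled here) makes the metadata catch up.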

ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O · 4c0425ff

Mingming Cao authored Sep 28, 2009

Currently the DIO VFS code passes create = 0 when writing to the
middle of a file.  It does this to avoid block allocation for holes, so
as not to expose stale data when there is a parallel buffered read
(which does not hold the i_mutex lock).  Direct I/O writes into holes
fall back to buffered IO for this reason.

Since preallocated extents are treated as holes when doing a
get_block() look up (buffer is not mapped), direct IO over fallocate
also falls back to buffered IO.  Thus ext4 actually silently falls
back to buffered IO in the above two cases, which is undesirable.

To fix this, this patch creates uninitialized extents when a direct I/O
write goes into holes in sparse files, and registers an end_io callback
which converts the uninitialized extent to an initialized extent after
the I/O is completed.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

4c0425ff

ext4: Split uninitialized extents for direct I/O · 0031462b

Mingming Cao authored Sep 28, 2009

When writing into an uninitialized extent via direct I/O, and the direct
I/O doesn't exactly cover the uninitialized extent, split the extent
into uninitialized and initialized extents before submitting the I/O.
This avoids needing to deal with an ENOSPC error in the end_io
callback that gets used for direct I/O.

When the IO is complete, the written extent will be marked as initialized.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0031462b
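
The split described in 0031462b amounts to cutting one block range into at most three pieces around the write. The sketch below shows that arithmetic in Python; the function name and the `(start, length, is_write_target)` tuples are invented for the illustration and do not mirror the ext4 extent structures.

```python
def split_for_write(ext_start, ext_len, w_start, w_len):
    """Split extent [ext_start, ext_start+ext_len) around a write range.

    Returns a list of (start, length, is_write_target) tuples. Only the
    piece flagged True would be converted to initialized once I/O completes.
    """
    ext_end = ext_start + ext_len
    w_end = w_start + w_len
    if w_len <= 0 or w_start < ext_start or w_end > ext_end:
        raise ValueError("write must lie inside the extent")

    pieces = []
    if w_start > ext_start:  # leading part stays uninitialized
        pieces.append((ext_start, w_start - ext_start, False))
    pieces.append((w_start, w_len, True))
    if w_end < ext_end:      # trailing part stays uninitialized
        pieces.append((w_end, ext_end - w_end, False))
    return pieces
```

Doing the split up front means any extra space needed for the new extent entries is claimed before the I/O is submitted, which is why the completion callback no longer has an ENOSPC case to handle.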

ext4: release reserved quota when block reservation for delalloc retry · 9f0ccfd8

Mingming Cao authored Sep 28, 2009

ext4_da_reserve_space() can reserve quota blocks multiple times if
ext4_claim_free_blocks() fails and we retry the allocation. We should
release the quota reservation before restarting.

Bug found by Jan Kara.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

9f0ccfd8
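
The leak fixed by 9f0ccfd8 is easy to see in a toy retry loop: if the quota reservation is not undone before retrying, each failed attempt leaves another reservation behind. The Python below is a simplified model; `QuotaModel`, the `claim_free_blocks` callback, and the retry limit are assumptions of the sketch.

```python
import errno


class QuotaModel:
    def __init__(self):
        self.reserved = 0

    def reserve(self, n):
        self.reserved += n

    def release(self, n):
        self.reserved -= n


def da_reserve_space(quota, nblocks, claim_free_blocks, max_retries=3):
    """Reserve quota, then try to claim blocks; undo the quota on failure."""
    for _ in range(max_retries):
        quota.reserve(nblocks)
        if claim_free_blocks(nblocks):
            return 0
        # Release before retrying so reservations do not pile up.
        quota.release(nblocks)
    return -errno.ENOSPC
```

With the release in place, the reserved count after a successful call equals exactly one reservation, however many retries it took.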

29 Sep, 2009 1 commit

ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks · 55138e0b

Theodore Ts'o authored Sep 29, 2009

Work around problems in the writeback code to force out writebacks in
larger chunks than just 4 MB, which is just too small.  This also works
around limitations in the ext4 block allocator, which can't allocate
more than 2048 blocks at a time.  So we need to defeat the round-robin
characteristics of the writeback code and try to write out as many
blocks as possible in one inode before allowing the writeback code to
move on to another inode.  We add a new per-filesystem tunable,
max_writeback_mb_bump, which caps this to a default of 128 MB per
inode.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

55138e0b
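
To put the numbers from 55138e0b in page terms, the following Python sketch converts megabytes to pages and applies a cap in the spirit of max_writeback_mb_bump. The 4 KiB page size and the exact bump rule are assumptions; the commit message only states the 4 MB chunk size and the 128 MB default cap.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def mb_to_pages(mb, page_size=PAGE_SIZE):
    """Number of pages in `mb` mebibytes."""
    return (mb << 20) // page_size


def bumped_nr_to_write(nr_to_write, dirty_pages, max_bump_mb=128):
    """Raise the writeback budget toward the inode's dirty page count.

    The budget is never lowered, and never raised past the cap.
    """
    cap = mb_to_pages(max_bump_mb)
    desired = min(dirty_pages, cap)
    return max(nr_to_write, desired)
```

Under these assumptions a 4 MB budget is 1024 pages and the 128 MB cap is 32768 pages, so an inode with 100000 dirty pages would get a budget of 32768 pages in one pass instead of 1024.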

28 Sep, 2009 1 commit

ext4: Fix heuristic which avoids group preallocation for closed files · 71780577

Theodore Ts'o authored Sep 28, 2009

The heuristic was designed to avoid using locality group preallocation
when writing the last segment of a closed file.  Fix it by delaying
the assignment of size to the maximum of size and isize until after we
check whether size == isize.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

71780577
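
Why the order matters in 71780577 can be shown with two small Python functions. Both take the end of the request and the file size in blocks; the parameter names, and the reduction of "closed file" to a writer count of zero, are simplifications made for this sketch.

```python
def skip_group_prealloc_buggy(req_end, isize_blocks, writers):
    # max() is applied first, so any request ending at or before EOF
    # looks like "the last segment of the file".
    size = max(req_end, isize_blocks)
    return size == isize_blocks and writers == 0


def skip_group_prealloc_fixed(req_end, isize_blocks, writers):
    size = req_end
    if size == isize_blocks and writers == 0:
        return True  # really the last segment of a closed file
    # Only now widen size; later size-based checks (not modelled) use it.
    size = max(size, isize_blocks)
    return False
```

For a request ending at block 10 of a closed 100-block file, the buggy version skips group preallocation while the fixed version does not; for a request ending exactly at block 100 both agree.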

26 Sep, 2009 1 commit

ext4: Use ext4_msg() for ext4_da_writepage() errors · 1693918e

Theodore Ts'o authored Sep 26, 2009

This allows the user to see what filesystem was involved with a
particular ext4_da_writepage() error.  Also, use KERN_CRIT which is
more appropriate than KERN_EMERG.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

1693918e

29 Sep, 2009 1 commit

ext4: Update documentation about quota mount options · 83653888

Jan Kara authored Sep 29, 2009

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

83653888

25 Sep, 2009 26 commits

Merge branch 'writeback' of git://git.kernel.dk/linux-2.6-block · 6d7f18f6

Linus Torvalds authored Sep 25, 2009

* 'writeback' of git://git.kernel.dk/linux-2.6-block:
  writeback: writeback_inodes_sb() should use bdi_start_writeback()
  writeback: don't delay inodes redirtied by a fast dirtier
  writeback: make the super_block pinning more efficient
  writeback: don't resort for a single super_block in move_expired_inodes()
  writeback: move inodes from one super_block together
  writeback: get rid of incorrect references to pdflush in comments
  writeback: improve readability of the wb_writeback() continue/break logic
  writeback: cleanup writeback_single_inode()
  writeback: kupdate writeback shall not stop when more io is possible
  writeback: stop background writeback when below background threshold
  writeback: balance_dirty_pages() shall write more than dirtied pages
  fs: Fix busyloop in wb_writeback()

6d7f18f6

writeback: writeback_inodes_sb() should use bdi_start_writeback() · 56a131dc

Jens Axboe authored Sep 25, 2009

Pointless to iterate other devices looking for a super, when
we have a bdi mapping.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

56a131dc

writeback: don't delay inodes redirtied by a fast dirtier · b3af9468

Wu Fengguang authored Sep 25, 2009

Debug traces show that in per-bdi writeback, the inode under writeback
almost always gets redirtied by a busy dirtier.  We used to call
redirty_tail() in this case, which could delay the inode for up to 30s.

This is unacceptable because it now happens so frequently for plain cp/dd,
that the accumulated delays could make writeback of big files very slow.

So let's distinguish between data redirty and metadata only redirty.
The first one is caused by a busy dirtier, while the latter one could
happen in XFS, NFS, etc. when they are doing delalloc or updating isize.

An inode being redirtied by a busy dirtier will now be requeued for the
next io, while an inode being redirtied by the fs will continue to be
delayed to avoid repeated IO.

CC: Jan Kara <jack@suse.cz>
CC: Theodore Ts'o <tytso@mit.edu>
CC: Dave Chinner <david@fromorbit.com>
CC: Chris Mason <chris.mason@oracle.com>
CC: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

b3af9468
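
The distinction drawn in b3af9468 boils down to a two-way decision after an inode has been written. The Python sketch below uses string labels for the two outcomes; `redirty_tail` is named in the commit message, while `requeue_io` and the boolean inputs are labels chosen for this illustration.

```python
def requeue_policy(data_redirtied, metadata_redirtied):
    """Decide where an inode goes after a writeback pass."""
    if data_redirtied:
        # A busy dirtier touched pages again: retry on the next io round.
        return "requeue_io"
    if metadata_redirtied:
        # The fs redirtied metadata only: delay to avoid repeated IO.
        return "redirty_tail"
    return "clean"
```

Data redirty takes priority in this sketch, so an inode that is both data- and metadata-dirty is retried promptly rather than delayed.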

writeback: make the super_block pinning more efficient · 9ecc2738

Jens Axboe authored Sep 24, 2009

Currently we pin the inode->i_sb for every single inode. This
increases cache traffic on the sb->s_umount sem. Let's instead
cache the inode sb pin state and keep the super_block pinned
for as long as we keep writing out inodes from the same
super_block.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

9ecc2738

writeback: don't resort for a single super_block in move_expired_inodes() · cf137307

Jens Axboe authored Sep 24, 2009

If we only moved inodes from a single super_block to the temporary
list, there's no point in doing a resort for multiple super_blocks.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cf137307

writeback: move inodes from one super_block together · 5c03449d

Shaohua Li authored Sep 24, 2009

__mark_inode_dirty adds inodes to the wb dirty list in random order. If a disk
has several partitions, writeback might keep the spindle moving between
partitions. To reduce the movement, it is better to write a big chunk of one
partition and then move to another. Inodes from one fs are usually in one
partition, so ideally moving inodes from one fs together should reduce spindle
movement. This patch tries to address this. Before per-bdi writeback was added,
the behavior was to write inodes from one fs first and then another, so the
patch restores the previous behavior. The loop in the patch is a bit ugly;
should we add a dirty list for each superblock in bdi_writeback?

A test on a two-partition disk with the attached fio script shows about
3% ~ 6% improvement.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

5c03449d
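
The regrouping described in 5c03449d can be sketched as a stable group-by on the superblock. This Python version is only an illustration of the ordering effect; the function name is reused loosely, and the `(inode, sb)` tuples stand in for kernel structures.

```python
def move_expired_inodes(expired):
    """Reorder (inode, sb) pairs so inodes of one superblock are adjacent.

    Superblocks keep their first-appearance order, and inodes keep their
    relative order within each superblock.
    """
    groups = {}
    for inode, sb in expired:
        groups.setdefault(sb, []).append(inode)
    # dicts preserve insertion order (Python 3.7+)
    return [inode for inodes in groups.values() for inode in inodes]
```

Given the interleaved queue `a(sda1), b(sda2), c(sda1), d(sda2)`, the result is `a, c, b, d`, so writeback finishes one partition before seeking to the other.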

writeback: get rid of incorrect references to pdflush in comments · 5b0830cb

Jens Axboe authored Sep 23, 2009

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

5b0830cb

writeback: improve readability of the wb_writeback() continue/break logic · 71fd05a8

Jens Axboe authored Sep 23, 2009

And throw some comments in there, too.
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

71fd05a8

writeback: cleanup writeback_single_inode() · ae1b7f7d

Wu Fengguang authored Sep 23, 2009

Make the if-else straight in writeback_single_inode().
No behavior change.

Cc: Jan Kara <jack@suse.cz>
Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

ae1b7f7d

writeback: kupdate writeback shall not stop when more io is possible · 7fbdea32

Wu Fengguang authored Sep 23, 2009

Fix the kupdate case, which disregards wbc.more_io and stops writeback
prematurely even when there are more inodes to be synced.

wbc.more_io should always be respected.

Also remove the pages_skipped check. It will be set when some page(s) of some
inode(s) cannot be written for now. Such inodes will be delayed for a while.
This variable has nothing to do with whether there are other writeable inodes.

CC: Jan Kara <jack@suse.cz>
CC: Dave Chinner <david@fromorbit.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

7fbdea32

writeback: stop background writeback when below background threshold · d3ddec76

Wu Fengguang authored Sep 23, 2009

Treat bdi_start_writeback(0) as a special request to do background write,
and stop such work when we are below the background dirty threshold.

Also simplify the (nr_pages <= 0) checks. Since we already pass in
nr_pages=LONG_MAX for WB_SYNC_ALL and background writes, we don't
need to worry about it being decreased to zero.
Reported-by: Richard Kennedy <richard@rsk.demon.co.uk>
CC: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

d3ddec76

writeback: balance_dirty_pages() shall write more than dirtied pages · 3a2e9a5a

Wu Fengguang authored Sep 23, 2009

Some filesystems may choose to write much more than ratelimit_pages
before calling balance_dirty_pages_ratelimited_nr(). So it is safer to
determine the number to write based on the real number of dirtied pages.

Otherwise it is possible that
  loop {
    btrfs_file_write():     dirty 1024 pages
    balance_dirty_pages():  write up to 48 pages (= ratelimit_pages * 1.5)
  }
in which the writeback rate cannot keep up with dirty rate, and the
dirty pages go all the way beyond dirty_thresh.

The increased write_chunk may make the dirtier more bumpy.
So filesystems shall take care not to dirty too much at
a time (e.g. > 4MB) without checking the ratelimit.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

3a2e9a5a
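
The arithmetic behind 3a2e9a5a can be checked with a short Python sketch. The message gives 48 pages as ratelimit_pages * 1.5, which implies ratelimit_pages = 32 in that example; carrying the same 1.5 factor over to the new calculation is an assumption of this sketch, as are the function names.

```python
def write_chunk_old(ratelimit_pages):
    """Old behaviour: a fixed chunk of 1.5 * ratelimit_pages."""
    return ratelimit_pages + ratelimit_pages // 2


def write_chunk_new(nr_dirtied, ratelimit_pages):
    """New behaviour (assumed form): scale with the pages actually dirtied."""
    base = max(nr_dirtied, ratelimit_pages)
    return base + base // 2
```

With ratelimit_pages = 32, the old chunk is 48 pages no matter how much was dirtied, while a caller that dirtied 1024 pages now gets a chunk of 1536 pages, enough for writeback to keep pace with the dirtier.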

fs: Fix busyloop in wb_writeback() · a5989bdc

Jan Kara authored Sep 16, 2009

If all inodes are under writeback (e.g. in case when there's only one inode
with dirty pages), wb_writeback() with WB_SYNC_NONE work basically degrades
to busylooping until the I_SYNC flag of the inode is cleared. Fix the problem by
waiting on the I_SYNC flag of an inode on the b_more_io list in case we failed to
write anything.
Tested-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

a5989bdc

Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6 · 53cddfcc

Linus Torvalds authored Sep 25, 2009

* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
  sh_mobile_ceu_camera: fix compile breakage, caused by a bad merge
  sh: Add support DMA Engine to SH7780
  sh: Add support DMA Engine to SH7722
  sh: enable onenand support in kfr2r09 defconfig.
  sh: update defconfigs.
  sh: add FSI driver support for ms7724se
  sh: Fix up uninitialized variable use caught by gcc 4.4.
  sh: Handle unaligned 16-bit instructions on SH-2A.
  sh: mach-ecovec24: Add active low setting for sh_eth
  sh: includecheck fix: dwarf.c

53cddfcc

Merge git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog · c09c2d10

Linus Torvalds authored Sep 25, 2009

* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
  [WATCHDOG] Add support for the Avionic Design Xanthos watchdog timer.

c09c2d10

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · 5c3cc208

Linus Torvalds authored Sep 25, 2009

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (94 commits)
  genetlink: fix netns vs. netlink table locking (2)
  3c59x: Get rid of "Trying to free already-free IRQ"
  tunnel: eliminate recursion field
  ems_pci: fix size of CAN controllers BAR mapping for CPC-PCI v2
  net: fix htmldocs sunrpc, clnt.c
  Phonet: error on broadcast sending (unimplemented)
  Phonet: fix race for port number in concurrent bind()
  pktgen: better scheduler friendliness
  pktgen: T_TERMINATE flag is unused
  ipv4: check optlen for IP_MULTICAST_IF option
  ath9k: Initialize txgain and rxgain for newer AR9287 chipsets.
  iwlagn: fix panic in iwl{5000,4965}_rx_reply_tx
  ath9k: Fix RFKILL bugs
  drivers/net/wireless: Use usb_endpoint_dir_out
  cfg80211: don't overwrite privacy setting
  wl12xx: fix kconfig/link errors
  rt2x00: fix the definition of rt2x00crypto_rx_insert_iv
  iwlwifi: reduce noise when skb allocation fails
  iwlwifi: do not send sync command while holding spinlock
  mac80211: fix DTIM setting
  ...

5c3cc208

[WATCHDOG] Add support for the Avionic Design Xanthos watchdog timer. · 38461c5c

Thierry Reding authored Sep 23, 2009

This patch adds support for the watchdog timer on Avionic Design Xanthos
boards.
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>

38461c5c

sh_mobile_ceu_camera: fix compile breakage, caused by a bad merge · a79aebfc

Guennadi Liakhovetski authored Sep 25, 2009

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

a79aebfc

sh: Add support DMA Engine to SH7780 · ecb6fd52

Nobuhiro Iwamatsu authored Mar 12, 2009

Signed-off-by: Nobuhiro Iwamatsu <iwamatsu.nobuhiro@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

ecb6fd52

sh: Add support DMA Engine to SH7722 · 8255fff4

Nobuhiro Iwamatsu authored Mar 12, 2009

Signed-off-by: Nobuhiro Iwamatsu <iwamatsu.nobuhiro@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

8255fff4

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 · c373ba99

Paul Mundt authored Sep 25, 2009

c373ba99

sh: enable onenand support in kfr2r09 defconfig. · 6f3529f0

Paul Mundt authored Sep 25, 2009

Signed-off-by: Paul Mundt <lethal@linux-sh.org>

6f3529f0

sh: update defconfigs. · 5d65498b

Paul Mundt authored Sep 25, 2009

Signed-off-by: Paul Mundt <lethal@linux-sh.org>

5d65498b

sh: add FSI driver support for ms7724se · 3e9ad52b

Kuninori Morimoto authored Aug 21, 2009

Signed-off-by: Kuninori Morimoto <morimoto.kuninori@renesas.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

3e9ad52b

Merge branch 'for-linus' of git://www.linux-m32r.org/git/takata/linux-2.6_dev · 851b147e

Linus Torvalds authored Sep 24, 2009

* 'for-linus' of git://www.linux-m32r.org/git/takata/linux-2.6_dev:
  m32r: Cleanup linker script using new linker script macros.
  m32r: Move the spi_stack_top and spu_stack_top into .init.data section.
  m32r: Remove unused .altinstructions and .exit.* code from linker script.
  m32r: Move GET_THREAD_INFO definition out of asm/thread_info.h.
  m32r: Define THREAD_SIZE only once.
  m32r: make PAGE_SIZE available to assembly.

851b147e

Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · 8e44e434

Linus Torvalds authored Sep 24, 2009

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  Fix build of cpm_uart due to core changes
  powerpc/8xx: Fix regression introduced by cache coherency rewrite
  powerpc/4xx: Fix erroneous xmon warning on PowerPC 4xx
  powerpc/mm: Fix 40x and 8xx vs. _PAGE_SPECIAL
  powerpc: Cleanup linker script using new linker script macros.
  powerpc: Fix ibm,client-architecture-support printout
  powerpc: Increase NODES_SHIFT on 64bit from 4 to 8
  powerpc/perf_counter: Fix vdso detection
  powerpc: Move 64bit heap above 1TB on machines with 1TB segments
  powerpc: Change archdata dma_data to a union
  powerpc: Rename get_dma_direct_offset get_dma_offset
  powerpc/mm: Remove duplicated #include
  powerpc/book3e-64: Remove duplicated #include
  powerpc: Check for unsupported relocs when using CONFIG_RELOCATABLE
  powerpc/pmc: Don't access lppaca on Book3E
  powerpc: kmalloc failure ignored in vio_build_iommu_table()
  hvc_console: Provide (un)locked version for hvc_resize()

8e44e434