Commits · 3d3d87471e1f45e3951c4860659cc4495cdafe6d · nexedi / linux

19 Oct, 2004 1 commit

[PATCH] unreachable code in ext3_direct_IO() · 3d3d8747

Andrew Morton authored 20 years ago

davej points out that in this code local variable `ret' is already known to be
positive non-zero, so this test is meaningless.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

3d3d8747

16 Oct, 2004 1 commit

[PATCH] ext3 direct io assert fix · 10a27261

Andrew Morton authored 20 years ago

Fix bug identified by Badari Pulavarty <pbadari@us.ibm.com>

Local variable `handle' will become stale if ext3_direct_io_get_blocks()
closes off the current transaction and starts a new one.  This causes a BUG in
journal_stop().

So reacquire the handle from *current after performing the I/O.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

10a27261

22 Sep, 2004 1 commit

[PATCH] ext3 endianness annotations and bugfixes · 5c520b7f

Alexander Viro authored 20 years ago

	* missing le32_to_cpu() in a bunch of printks
	* on big-endian boxen ext3_error() failed to set EXT3_ERROR_FS in
->s_state (cpu_to_le32() instead of cpu_to_le16())
Signed-off-by: Al Viro <viro@parcelfarce.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
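
To see why the width of the conversion matters, here is a small userspace model in Python (not kernel code). It assumes `s_state` is a 16-bit little-endian on-disk field and that `EXT3_ERROR_FS` is 0x0002; the `swab16`/`swab32`, `store_u16` and `on_disk_value` helpers are illustrative stand-ins for what `cpu_to_le16()`/`cpu_to_le32()` and a 16-bit store do on a big-endian CPU.

```python
# Userspace model of the big-endian bug; helper names are illustrative only.
EXT3_ERROR_FS = 0x0002  # assumed value of the error flag kept in s_state


def swab16(x: int) -> int:
    """Byte-swap a 16-bit value (what cpu_to_le16() does on big-endian)."""
    return int.from_bytes(x.to_bytes(2, "big"), "little")


def swab32(x: int) -> int:
    """Byte-swap a 32-bit value (what cpu_to_le32() does on big-endian)."""
    return int.from_bytes(x.to_bytes(4, "big"), "little")


def store_u16(value: int) -> int:
    """Assigning to a 16-bit field keeps only the low 16 bits."""
    return value & 0xFFFF


def on_disk_value(field: int) -> int:
    """Interpret the big-endian CPU's 16-bit field as little-endian disk data."""
    return int.from_bytes(field.to_bytes(2, "big"), "little")


# With the 32-bit swap, the flag byte ends up in the half that gets discarded.
buggy = store_u16(swab32(EXT3_ERROR_FS))
fixed = store_u16(swab16(EXT3_ERROR_FS))

print(hex(on_disk_value(buggy)))  # flag lost
print(hex(on_disk_value(fixed)))  # flag present
```

On a little-endian machine both conversions are no-ops, which is presumably why the bug only showed up on big-endian boxes.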

5c520b7f

17 Sep, 2004 1 commit

[PATCH] fix for fsync ignoring writing errors · 73441a0e

Andrey V. Savochkin authored 20 years ago

Currently metadata writing errors are ignored and not returned from
sys_fsync, at least on ext2 and ext3 filesystems.

Both ext2 and ext3 resort to sync_inode() in their ->sync_inode method,
which in turn calls ->write_inode.  ->write_inode method has void type, and
any IO errors happening inside are lost.

Make ->write_inode return the error code.
Signed-off-by: Andrey Savochkin <saw@saw.sw.com.sg>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

73441a0e

27 Aug, 2004 1 commit

[PATCH] Add a few might_sleep() checks · 026a14f0

Ingo Molnar authored 20 years ago

Add a whole bunch more might_sleep() checks.  We also enable might_sleep()
checking in copy_*_user().  This was non-trivial because the "copy_*_user()
in atomic regions" trick would generate false positives.  Fix that up by
adding a new __copy_*_user_inatomic(), which avoids the might_sleep() check.

Only i386 is supported in this patch.

With: Arjan van de Ven <arjanv@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

026a14f0

30 Jun, 2004 1 commit

[PATCH] sparse: NULL vs 0 - filesystems · a37f4989

Mika Kukkonen authored 20 years ago

a37f4989

27 Jun, 2004 1 commit

[PATCH] ext3: direct-io transaction extending fix · 375f73f9

Andrew Morton authored 20 years ago

ext3_direct_io_get_blocks() is misinterpreting the return value from
ext3_journal_extend(), and is consequently running out of buffer credits and
going BUG on tremendously large direct-io writes. Fix that up.

Also, I note that the really large direct-io writes can hold a transaction
open for the entire duration, which can be minutes. This violates ext3's
attempt to commit data at regular intervals. Fix that up by looking at the
transaction state: if it's T_LOCKED, shut off the current handle so the
pending commit can complete.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

375f73f9

18 Jun, 2004 1 commit

[PATCH] Ext3: Retry allocation after transaction commit (v2) · 5c4ad014

Theodore Y. Ts'o authored 20 years ago

Here is a reworked version of my patch to ext3 to retry certain filesystem
operations after an ENOSPC error.  The ext3_should_retry_alloc() function will
not wait on the currently running transaction if there is a currently active
handle; hence this should avoid deadlocks in the Lustre use case.  The patch
is versus BK-recent.

I've also included a simple, reliable test case which demonstrates the problem
this patch is intended to fix.  (Note that BK-recent is not sufficient to
address this test case, and waiting on the committing transaction in
ext3_new_block is also not sufficient.  Been there, tried that, didn't work.
We need to do the full-bore retry from the top level.  The
ext3_should_retry_alloc() will only wait on the committing transaction if
there is no active handle; hence Lustre will probably also need to use
ext3_should_retry_alloc() if it wants to reliably avoid this particular
problem.)

#!/bin/sh
#
#
TEST_DIR=/tmp
IMAGE=$TEST_DIR/retry.img
MNTPT=$TEST_DIR/retry.mnt
TEST_SRC=/usr/projects/e2fsprogs/e2fsprogs/build
MKE2FS_OPTS=""
IMAGE_SIZE=8192

umount $MNTPT
dd if=/dev/zero of=$IMAGE bs=4k count=$IMAGE_SIZE
mke2fs -j -F $MKE2FS_OPTS $IMAGE 

test_log ()
{
	echo $*
	logger -p local4.notice $*
}

mkdir -p $MNTPT
mount -o loop -t ext3 $IMAGE $MNTPT
test_log Retry test: BEGIN
for i in `seq 1 3`
do
	test_log "Retry test: Loop $i"
	echo 2 > /proc/sys/fs/jbd-debug
	while ! mkdir -p $MNTPT/foo/bar
	do
		test_log "Retry test: mkdir failed"
		sleep 1
	done
	echo 0 > /proc/sys/fs/jbd-debug
	cp -r $TEST_SRC $MNTPT/foo/bar 2> /dev/null
	rm -rf $MNTPT/*
done
umount $MNTPT
test_log "Retry test: END"


akpm@osdl.org

  Rework the code to make it a formal JBD API entry point.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

5c4ad014

25 May, 2004 1 commit

[PATCH] ext3: remove duplicated ext3_std_error() call · 377cb3de

Andrew Morton authored 20 years ago

From: Andi Kleen <ak@muc.de>

When start_transaction() detects an error it already calls ext3_std_error. 
No need to do it again in the caller.

377cb3de

19 May, 2004 1 commit

[PATCH] use-before-uninitialized value in ext3(2)_find_goal · 83ee50f5

Andrew Morton authored 20 years ago

From: Mingming Cao <cmm@us.ibm.com>

There is an uninitialized goal value being referenced in both the ext3 and
ext2 find goal block functions (ext3_find_goal() and ext2_find_goal()).

In the non-sequential write case, these functions check the goal value (non
zero) before calling ext3(2)_find_near() to find the goal block to
allocate.

Since the goal value is uninitialized (non zero), ext3(2)_find_near() is
never called in the non-sequential write case, thus ext3(2)_find_goal()
fails to provide a goal block in the random write case.

ext3(2)_new_block() takes the junk goal value and will turn it to goal 0
since it's normally beyond the filesystem block number limit.  The fix is
trivial.

83ee50f5

10 May, 2004 1 commit

[PATCH] Fix ext3 bogus ENOSPC · e736428d

Andrew Morton authored 20 years ago

With strange workloads which do a lot of quick truncation on small filesystems
it is possible to get into a situation where there are free blocks on the
disk, but they are not allocatable at this time due to their having been freed
up in the current JBD transaction. Applications get unexpected ENOSPC errors.

We can fix that with this patch, originally by Andreas Dilger, which forces a
single commit+retry when an ENOSPC is encountered.
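
The commit+retry idea can be sketched with a toy userspace model in Python. This is not the ext3 code: `ToyJournalledFS`, `alloc_with_retry` and their fields are hypothetical names, and the model only assumes what the message states, namely that blocks freed in the running transaction cannot be reallocated until it commits.

```python
import errno


class ToyJournalledFS:
    """Toy model: blocks freed in the running transaction are not reusable
    until that transaction commits. All names here are hypothetical."""

    def __init__(self, nblocks: int) -> None:
        self.free = nblocks        # blocks that may be allocated right now
        self.freed_in_txn = 0      # freed, but pinned by the running transaction

    def free_blocks(self, n: int) -> None:
        self.freed_in_txn += n

    def commit(self) -> None:
        # Committing releases the pinned blocks back to the allocator.
        self.free += self.freed_in_txn
        self.freed_in_txn = 0

    def alloc(self, n: int) -> int:
        if n > self.free:
            return -errno.ENOSPC
        self.free -= n
        return 0


def alloc_with_retry(fs: ToyJournalledFS, n: int) -> int:
    """Force a single commit and retry once when ENOSPC is hit."""
    ret = fs.alloc(n)
    if ret == -errno.ENOSPC:
        fs.commit()
        ret = fs.alloc(n)
    return ret


fs = ToyJournalledFS(nblocks=4)
fs.alloc(4)                        # fill the filesystem
fs.free_blocks(4)                  # e.g. a quick truncate
print(fs.alloc(2))                 # negative errno: the blocks are still pinned
print(alloc_with_retry(fs, 2))     # 0: succeeds after the forced commit
```

Retrying only once keeps a genuinely full filesystem from looping: if the second attempt also fails, the ENOSPC is returned to the caller.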

e736428d

22 Apr, 2004 1 commit

[PATCH] writeback livelock fix · 1ed73535

Andrew Morton authored 20 years ago

If a filesystem's ->writepage implementation repeatedly refuses to write the
page (it keeps on redirtying it instead) (reiserfs seems to do this) then the
writeback logic can get stuck repeatedly trying to write the same page.

Fix that up by correctly setting wbc->pages_skipped, to tell the writeback
logic that things aren't working out.

1ed73535

19 Apr, 2004 1 commit

[PATCH] direct-IO return type fixes · 59fed502

Andrew Morton authored 20 years ago

From: me, Badari Pulavarty <pbadari@us.ibm.com>

Currently a direct-IO read or write of more than 2G on 64-bit machines is
broken. Replace int with ssize_t in various places to fix that up.
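
A quick way to see the breakage from userspace is to push a byte count above 2G through a 32-bit signed integer. The Python snippet below is only an illustration: it assumes `int` is 32 bits and `ssize_t` is 64 bits on the machines in question, and uses `ctypes` fixed-width types to model them.

```python
import ctypes

request = 3 * 2**30  # a 3 GiB transfer, in bytes

# ctypes integer types wrap silently, like a C assignment would.
as_int = ctypes.c_int32(request).value       # what a 32-bit int would hold
as_ssize_t = ctypes.c_int64(request).value   # ssize_t on a 64-bit machine

print(as_int)       # negative: looks like an error code to the caller
print(as_ssize_t)   # the real byte count
```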

59fed502

17 Apr, 2004 1 commit

[PATCH] remove buffer_error() · 4f990f49

Andrew Morton authored 20 years ago

From: Jeff Garzik <jgarzik@pobox.com>

It was debug code, no longer required.

4f990f49

15 Apr, 2004 1 commit

[PATCH] ext3: journalled quotas · 2df2c24a

Andrew Morton authored 20 years ago

From: Jan Kara <jack@ucw.cz>

Journalled quota support for ext3: The patch consists of two parts - ext3
changes and changes in generic quota code. The main idea of the changes is
that a transaction is always started before any operation which changes quota
file and dirtifying of the quota causes its write to disk. These two changes
assure that quota change is journalled into the same transaction as the file
change and hence after journal replay quota is consistent with the filesystem
state. As during journal replay inodes from orphan list are deleted/truncated
we have to do quota_on before the replay of the orphan list - this problem is
solved by additional mount options to ext3 with quota file names and format.

Some changes in generic code were also needed to assure that quota structure
in file is always allocated and so ordinary quota operations (like
adding/deleting a block/inode) need only a few blocks from the transaction.

2df2c24a

20 Jan, 2004 1 commit

[PATCH] ext3: update a_ops when running `chattr +j' · fa85002b

Andrew Morton authored 21 years ago

From: Jan Kara <jack@suse.cz>

Journalled-data files need a different set of address_space_operations, so
we need to update the file's aops when someone runs `chattr +j' on the
file.

fa85002b

19 Jan, 2004 1 commit

[PATCH] bdev: switch to f_mapping · 32d66678

Andrew Morton authored 21 years ago

From: viro@parcelfarce.linux.theplanet.co.uk <viro@parcelfarce.linux.theplanet.co.uk>

A lot of places used to use ->f_dentry->d_inode->i_mapping all over the
place. Replaced with use of ->f_mapping. For now - just the places where we
literally could do search-and-replace.

32d66678

16 Oct, 2003 1 commit

[PATCH] ext3: i_disksize locking fix · 87e628f7

Andrew Morton authored 21 years ago

From: Alex Tomas <alex@clusterfs.com>

The setting of i_disksize can race against concurrent invocations of
ext3_get_block().  Moving this inside i_truncate_sem fixes it up.

87e628f7

05 Oct, 2003 1 commit

[PATCH] ext3 block allocator locking fix · 3101501b

Andrew Morton authored 21 years ago

When the BKL was removed from ext3 we lost locking coverage for
get_block()-versus-get_block().  Nobody seems to have hit the race because
get_block() almost always runs under i_sem: only memory pressure-based
writeout over a file hole runs outside i_sem.

ext2 uses the dedicated i_meta_lock spinlock in the inode to provide the
needed locking.  But ext3 already has an rwsem around all the get_block()
activity to protect it from truncate-related races.

So this patch just converts that rwsem into a semaphore, so concurrent
get_block() can never occur.  This will be more efficient than adding the new
spinlock.

We lose the ability to have two threads run get_block() against the same file
at the same time but again, that only happens during pageout over a hole
anyway.

(Kudos Alex Tomas for noticing the bug)

3101501b

01 Oct, 2003 1 commit

[PATCH] dev_t forward compatibility fix · 1885b3f1

Andrew Morton authored 21 years ago

From: Andries.Brouwer@cwi.nl

ext2 used a 32-bit field for dev_t, with possibly undefined storage
following; thus, no action was required to go to 32-bit dev_t, but going to
64-bit dev_t required some subtlety: 0 was written in the first word and
the 64 bits in the following two.  Al truncated my 64-bit stuff to 32 bits
but did not understand why there was this split, and wrote 0 followed by a
single word.  We should at least zero the word following to have
well-defined storage later.

1885b3f1

23 Sep, 2003 1 commit

[PATCH] 32-bit dev_t: switch-over · 1c2c2a8f

Alexander Viro authored 21 years ago

Real conversion to 32bit dev_t.  Expansion to:
	* mknod() - 32
	* newstat() - 32 on 64bit platforms
	* stat64() - 32 on mips, 64 on everything else (mips has weird struct
stat64 and can't get more than 32 bits).  Note that right now the difference
is purely theoretical - we don't have internal values above 32 bits, so
huge_... vs. new_... only marks the places where 64bit conversion will need
extra work.
	* arch-dependent stat variants - depending on width available.
	* ustat et al. - 32
	* filesystems that can handle 32 bits right now - 32
	* ext2 and ext3 - 32, with large dev_t inodes having 0 in the first
element of i_data[] (where we store dev_t value for small device numbers) and
keeping the value in the second element.
	* nfsd - 32; it can be driven to 64, but we'll get several issues with
NFSv2 support.
	* RAID - 32
	* devmapper - with v1 it's still 16 (nothing to do here), with v4 it's
64.
	* loop - 64
	* initramfs - 32
	* do_mounts code - 32.  Parts that scan devfs tree are using newstat()
on 64bit platforms and stat64() on the rest (IOW, the latest stat variant on
given platform).
	* old_valid_dev()/new_valid_dev() added where needed (stat variants,
mostly - we fail with -EOVERFLOW if values do not fit).
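
The ext2/ext3 storage rule from the list above (small device numbers in the first element of `i_data[]`, large ones in the second with a 0 in the first) can be modelled in a few lines of Python. The bit layouts used for the old and new encodings are my understanding of the kernel's `kdev_t.h` helpers and should be treated as an assumption; `store_rdev` and `load_rdev` are hypothetical names, not kernel functions.

```python
def old_valid_dev(major: int, minor: int) -> bool:
    """True if the device number fits the old 8:8 format."""
    return major < 256 and minor < 256


def old_encode_dev(major: int, minor: int) -> int:
    return (major << 8) | minor


def old_decode_dev(val: int) -> tuple[int, int]:
    return (val >> 8) & 0xFF, val & 0xFF


def new_encode_dev(major: int, minor: int) -> int:
    # Assumed layout: low 8 minor bits, then major, then the high minor bits.
    return (minor & 0xFF) | (major << 8) | ((minor & ~0xFF) << 12)


def new_decode_dev(val: int) -> tuple[int, int]:
    major = (val & 0xFFF00) >> 8
    minor = (val & 0xFF) | ((val >> 12) & 0xFFF00)
    return major, minor


def store_rdev(major: int, minor: int) -> list[int]:
    """Return the first three i_data[] words for a device inode."""
    if old_valid_dev(major, minor):
        return [old_encode_dev(major, minor), 0, 0]
    # Large device number: 0 in the first word, value in the second.
    return [0, new_encode_dev(major, minor), 0]


def load_rdev(i_data: list[int]) -> tuple[int, int]:
    if i_data[0]:
        return old_decode_dev(i_data[0])
    return new_decode_dev(i_data[1])


print(store_rdev(8, 1), load_rdev(store_rdev(8, 1)))
print(store_rdev(8, 300), load_rdev(store_rdev(8, 300)))
```

Keeping 0 in the first word is what lets a reader tell the two formats apart.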

1c2c2a8f

05 Sep, 2003 2 commits

[PATCH] large dev_t - second series (15/15) · a1f6ff21

Alexander Viro authored 21 years ago

old_decode_dev()/old_encode_dev() added where needed in other
filesystems. Parts in different filesystems are independent, but IMO
it's not worth splitting into a dozen half-kilobyte patches.

a1f6ff21

[PATCH] large dev_t - second series (7/15) · ad1da81a

Alexander Viro authored 21 years ago

the last kdev_t object is gone; ->i_rdev switched to dev_t.

ad1da81a

19 Aug, 2003 1 commit

[PATCH] async write errors: report truncate and io errors on · fe7e689f

Andrew Morton authored 21 years ago

From: Oliver Xymoron <oxymoron@waste.org>

These patches add the infrastructure for reporting asynchronous write errors
to block devices to userspace.  Errors which are detected due to pdflush or VM
writeout are reported at the next fsync, fdatasync, or msync on the given
file, and on close if the error occurs in time.

We do this by propagating any errors into page->mapping->error when they are
detected.  In fsync(), msync(), fdatasync() and close() we return that error
and zero it out.
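
The record-then-report scheme described above can be sketched as a toy Python model. `Mapping`, `writeout_failed` and the `fsync` function here are hypothetical stand-ins, not the kernel interfaces; the only behaviour taken from the message is that the error is stored on the mapping when detected, and returned and zeroed at the next sync call.

```python
import errno


class Mapping:
    """Toy stand-in for the per-file mapping that remembers a write error."""

    def __init__(self) -> None:
        self.error = 0


def writeout_failed(mapping: Mapping, err: int) -> None:
    """Called by background writeout when an I/O error is detected."""
    mapping.error = err


def fsync(mapping: Mapping) -> int:
    """Report the pending error once, then clear it."""
    err = mapping.error
    mapping.error = 0
    return err


m = Mapping()
writeout_failed(m, -errno.EIO)   # error happens with no caller waiting
print(fsync(m))                  # the next fsync sees -EIO
print(fsync(m))                  # reported once only: now 0
```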


The Open Group say close() _may_ fail if an I/O error occurred while reading
from or writing to the file system.  Well, in this implementation close() can
return -EIO or -ENOSPC.  And in that case it will succeed, not fail - perhaps
that is what they meant.


There are three patches in this series and testing has only been performed
with all three applied.

fe7e689f

01 Aug, 2003 4 commits

[PATCH] don't init statics to 0 (fs/) · 9cf89014

Randy Dunlap authored 21 years ago

From: Leann Ogasawara <ogasawara@osdl.org>

Uninitialize static variables initialized to 0 so they are pushed to the
.bss instead of .data.

9cf89014

[PATCH] direct-io support for XFS unwritten extents · 359a5de1

Andrew Morton authored 21 years ago

From: Nathan Scott <nathans@sgi.com>

This patch adds a mechanism by which a filesystem can register an interest in
the completion of direct I/O. The completion routine will be given the
inode, an offset and a length, and an optional filesystem-private field.

We have extended the use of the buffer_head-based interface (i.e.
get_block_t) for direct I/O such that the b_private field is now utilised.
It is defined to be initially zero at the start of I/O, and will be passed
into the filesystem unmodified by the VFS with each map request, while
setting up the direct I/O. Once I/O has completed the final value of this
pointer will be passed into a filesystem's I/O completion handler. This
mechanism can be used to keep track of all of the mapping requests which
encompass an individual direct I/O request.

This has been implemented specifically for XFS, but is done so as to be as
generic as possible. XFS uses this mechanism to provide support for
unwritten extents - these are file extents which have been pre-allocated
on-disk, but not yet written to (once written, these become regular file
extents, but only once I/O is complete).

359a5de1

[PATCH] Fix race in ext3_getblk · 77b070cb

Andrew Morton authored 21 years ago

From: Alex Tomas <bzzz@tmi.comex.ru>

ext3_getblk() memsets a newly allocated buffer, but forgets to check
whether a different thread brought it uptodate while we waited for the
buffer lock.

It's OK normally because we're serialised by the page lock.  But lustre
apparently is doing something different with getblk and hits this race.

Plus I suspect it's racy with competing O_DIRECT writes.

77b070cb

[PATCH] ext3: avoid reading empty inode blocks · bca17d03

Andrew Morton authored 21 years ago

From: Alex Tomas <bzzz@tmi.comex.ru>

ext3_get_inode_loc() reads the inode's block only if:

  1) this inode has no copy in memory
  2) the inode's block contains other valid inode(s)

This optimization avoids needless I/O in two cases:

1) a just-allocated inode is the first valid one in the inode's block

2) the kernel wants to write an inode, but the buffer to which the inode
   belongs has been freed by the VM

bca17d03

25 Jul, 2003 1 commit

Add an IO completion handler to the direct_IO path to allow the initiator · 48d86a41

Stephen Lord authored 21 years ago

to take an action at completion time. XFS uses this to

48d86a41

10 Jul, 2003 1 commit

[PATCH] i_size atomic access · eafe5916

Andrew Morton authored 21 years ago

From: Daniel McNeil <daniel@osdl.org>

This adds i_seqcount to the inode structure and then uses i_size_read() and
i_size_write() to provide atomic access to i_size.  This is a port of
Andrea Arcangeli's i_size atomic access patch from 2.4.  This only uses the
generic reader/writer consistent mechanism.

Before:
mnm:/usr/src/25> size vmlinux
   text    data     bss     dec     hex filename
2229582 1027683  162436 3419701  342e35 vmlinux

After:
mnm:/usr/src/25> size vmlinux
   text    data     bss     dec     hex filename
2225642 1027655  162436 3415733  341eb5 vmlinux

3.9k more text, a lot of it fastpath :(

It's a very minor bug, and the fix has a fairly non-minor cost.  The most
compelling reason for fixing this is that writepage() checks i_size.  If it
sees a transient value it may decide that page is outside i_size and will
refuse to write it.  Lost user data.
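
The transient value problem, and how a sequence counter avoids it, can be shown with a single-threaded Python model. It assumes the 64-bit size is stored as two 32-bit halves (as on a 32-bit machine) and simulates the racing writer with a callback; the `Inode` class, `torn_read` and the `between_halves` hook are illustrative inventions, and only the names `i_size_read()`/`i_size_write()` come from the message.

```python
class Inode:
    def __init__(self, size: int = 0) -> None:
        self.seq = 0                  # even: stable, odd: write in progress
        self.lo = size & 0xFFFFFFFF   # low 32 bits of i_size
        self.hi = size >> 32          # high 32 bits of i_size


def i_size_write(inode: Inode, size: int) -> None:
    inode.seq += 1                    # begin write (sequence becomes odd)
    inode.lo = size & 0xFFFFFFFF
    inode.hi = size >> 32
    inode.seq += 1                    # end write (sequence becomes even)


def torn_read(inode: Inode, between_halves) -> int:
    """Unprotected read: a writer can slip in between the two halves."""
    lo = inode.lo
    between_halves()
    hi = inode.hi
    return (hi << 32) | lo


def i_size_read(inode: Inode, between_halves=None) -> int:
    """Sequence-counted read: retry if a write overlapped the read."""
    while True:
        start = inode.seq
        lo = inode.lo
        if between_halves is not None:
            between_halves()          # simulate one racing writer
            between_halves = None
        hi = inode.hi
        if start % 2 == 0 and inode.seq == start:
            return (hi << 32) | lo


a = Inode(0xFFFFFFFF)
print(hex(torn_read(a, lambda: i_size_write(a, 1 << 32))))    # bogus mix
b = Inode(0xFFFFFFFF)
print(hex(i_size_read(b, lambda: i_size_write(b, 1 << 32))))  # consistent
```

The unprotected read returns a size that never existed, which is the kind of transient value that could make writepage() decide a page lies outside i_size.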

eafe5916

03 Jul, 2003 1 commit

Re-organize "ext3_get_inode_loc()" and make it easier to · 9c67eccb

Linus Torvalds authored 21 years ago

follow by splitting it into two functions: one that calculates
the position, and the other that actually reads the inode
block off the disk.

9c67eccb

25 Jun, 2003 1 commit

[PATCH] ext3: fix page lock vs journal_start ranking bug · 30276fd6

Andrew Morton authored 21 years ago

ext3_block_truncate_page() is calling grab_cache_page() inside a JBD
transaction. This is wrong, because transactions nest inside lock_page().

The deadlock is against shrink_list->ext3_journalled_writepage->journal_start.

This was not noticed before because we never used to journal writepage() data
in journalled-data mode. And because the deadlock against
generic_file_write() is covered up by i_sem.

Rework things so that we lock the page prior to starting a transaction.

30276fd6

20 Jun, 2003 1 commit

[PATCH] ext3/JBD: remove trailing whitespace · f5d256f8

Andrew Morton authored 21 years ago

ext3 and JBD still have enormous numbers of lines which end in tabs.  Fix
them all up.

f5d256f8

18 Jun, 2003 6 commits

[PATCH] ext3: disable O_DIRECT in journalled-data mode · 45c22f8f

Andrew Morton authored 21 years ago

We cannot sensibly support O_DIRECT reads or writes when all writes are
journalled.

This is because the VFS explicitly avoids syncing the file metadata during
O_DIRECT reads and writes.  ext3 with journalled data will leave pending
changes in memory and they will overwrite the results of O_DIRECT writes, and
O_DIRECT reads will not return the latest data.

Setting the a_op to null will cause opens and fcntl(F_SETFL) to return
-EINVAL if O_DIRECT is requested.

45c22f8f

[PATCH] ext3: fix data=journal for small blocksize · 319a1ad4

Andrew Morton authored 21 years ago

Fix various problems which cropped up due to MAP_SHARED traffic on
data=journal with blocksize < PAGE_CACHE_SIZE.

All relate to handling the "pending truncate" buffers outside i_size.

319a1ad4

[PATCH] ext3: add a dump_stack() · 4308a50e

Andrew Morton authored 21 years ago

add a dump_stack() to a can't-happen path which happened during development.

4308a50e

[PATCH] ext3: fix data=journal mode · de285c52

Andrew Morton authored 21 years ago

ext3's fully data-journalled mode has been broken for a year.  This patch
fixes it up.

The prepare_write/commit_write/writepage implementations have been split up.
Instead of having each function handle all three journalling modes we now have
three separate sets of address_space_operations.

The problematic part of data=journal is MAP_SHARED writepage traffic: pages
which don't have buffers.  In 2.4 these were cheatingly treated as
data-ordered buffers and that caused several nasty problems.

Here we do it properly: writepage traffic is fully journalled.  This means
that the various workarounds for the 2.4 scheme can be removed, when I
remember where they all are.

The PG_checked flag has been borrowed: it is set in the atomic set_page_dirty
a_op to tell the subsequent writepage() that this page needs to have buffers
attached, dirtied and journalled.

This rather defines PG_checked as "fs-private info in page->flags" and it
should be renamed sometime.

de285c52

[PATCH] ext3: ext3_writepage race fix · dd71e33f

Andrew Morton authored 21 years ago

After ext3_writepage() has called block_write_full_page() it will walk the
page's buffer ring dropping the buffer_head refcounts.

It does this wrong - on the final loop it will dereference the buffer_head
which it just dropped the refcount on.  Poisoned oopses have been seen
against bh->b_this_page.

Change it to take a local copy of b_this_page prior to dropping the bh's
refcount.
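
The pattern is the usual "read the next pointer before releasing the node" rule for walking a ring. The Python model below is a simplified sketch: `BufferHead` here is a toy class, and clearing `b_this_page` when the refcount drops to zero stands in for the memory being freed and poisoned, which is an assumption made for illustration.

```python
class BufferHead:
    def __init__(self, name: str) -> None:
        self.name = name
        self.refcount = 1
        self.b_this_page = None   # next buffer in the page's circular ring

    def put(self) -> None:
        self.refcount -= 1
        if self.refcount == 0:
            self.b_this_page = None   # model: released memory is poisoned


def make_ring(n: int) -> BufferHead:
    bufs = [BufferHead(f"bh{i}") for i in range(n)]
    for i, bh in enumerate(bufs):
        bh.b_this_page = bufs[(i + 1) % n]
    return bufs[0]


def drop_refs_buggy(head: BufferHead) -> int:
    dropped, bh = 0, head
    while True:
        bh.put()
        dropped += 1
        bh = bh.b_this_page       # touches a buffer we no longer hold
        if bh is head:
            return dropped


def drop_refs_fixed(head: BufferHead) -> int:
    dropped, bh = 0, head
    while True:
        nxt = bh.b_this_page      # take a local copy first
        bh.put()
        dropped += 1
        bh = nxt
        if bh is head:
            return dropped


print(drop_refs_fixed(make_ring(4)))   # walks the whole ring
try:
    drop_refs_buggy(make_ring(4))
except AttributeError as exc:
    print("buggy walk failed:", exc)
```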

dd71e33f

[PATCH] ext3: move lock_kernel() down into the JBD layer. · 3307fbd1

Andrew Morton authored 21 years ago

This is the start of the ext3 scalability rework.  It basically comes in two
halves:

- ext3 BKL/lock_super removal and scalable inode/block allocators

- JBD locking rework.

The ext3 scalability work was completed a couple of months ago.

The JBD rework has been stable for a couple of weeks now.  My gut feeling is
that there should be one, maybe two bugs left in it, but no problems have
been discovered...


Performance-wise, throughput is increased by up to 2x on dual CPU.  10x on
16-way has been measured.  Given that current ext3 is able to chew two whole
CPUs spinning on locks on a 4-way, that wasn't especially surprising.

These patches were prepared by Alex Tomas <bzzz@tmi.comex.ru> and myself.


First patch: ext3 lock_kernel() removal.

The only reason why ext3 takes lock_kernel() is because it is required by the
JBD API.

The patch removes the lock_kernel() calls from ext3 and pushes them down into JBD
itself.

3307fbd1

02 Jun, 2003 1 commit

[PATCH] ext3: fix for blocksize < PAGE_CACHE_SIZE · 32b5fa26

Andrew Morton authored 21 years ago

If a buffer_head is outside i_size, block_write_full_page() will leave it
!buffer_mapped().  We shouldn't attach that buffer to the transaction for
writeout!

This bug has been in 2.5 for some time.

32b5fa26