- 19 Oct, 2004 1 commit
Andrew Morton authored
davej points out that in this code the local variable `ret' is already known to be positive and non-zero, so this test is meaningless.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- 16 Oct, 2004 1 commit
Andrew Morton authored
Fix a bug identified by Badari Pulavarty <pbadari@us.ibm.com>: the local variable `handle' becomes stale if ext3_direct_io_get_blocks() closes off the current transaction and starts a new one. This causes a BUG in journal_stop(). So reacquire the handle from *current after performing the I/O.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
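A minimal user-space sketch of the stale-handle problem, assuming a hypothetical model in which `current_journal_info` stands in for the handle reachable through *current and the `assert()` in `journal_stop()` stands in for the BUG. `journal_start`, `journal_stop`, `get_blocks` and `direct_io` are illustrative names, not the kernel implementations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a journal handle, plus a per-task pointer that
 * stands in for the handle reachable through *current. */
struct handle {
	int transaction_id;
	int open;
};

static struct handle handle_pool[8];	/* never freed, so stale reads are safe */
static int next_slot;
static struct handle *current_journal_info;

static struct handle *journal_start(int tid)
{
	struct handle *h = &handle_pool[next_slot++ % 8];

	h->transaction_id = tid;
	h->open = 1;
	current_journal_info = h;
	return h;
}

static void journal_stop(struct handle *h)
{
	assert(h->open);	/* stands in for the BUG() on a stale handle */
	h->open = 0;
	current_journal_info = NULL;
}

/* Stand-in for ext3_direct_io_get_blocks(): when it must restart, it
 * closes the current transaction and opens a new one. */
static void get_blocks(int must_restart)
{
	if (must_restart) {
		int tid = current_journal_info->transaction_id;

		journal_stop(current_journal_info);
		journal_start(tid + 1);
	}
}

/* The fixed caller: reacquire the handle after the I/O instead of
 * trusting the local copy taken before it. */
static int direct_io(int must_restart)
{
	struct handle *handle = journal_start(1);

	get_blocks(must_restart);
	handle = current_journal_info;
	int tid = handle->transaction_id;
	journal_stop(handle);
	return tid;
}
```

Without the reassignment, `direct_io(1)` would pass the already-stopped handle to `journal_stop()` and trip the assertion.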
- 22 Sep, 2004 1 commit
Alexander Viro authored
* missing le32_to_cpu() in a bunch of printks
* on big-endian boxen, ext3_error() failed to set EXT3_ERROR_FS in ->s_state (it used cpu_to_le32() instead of cpu_to_le16())
Signed-off-by: Al Viro <viro@parcelfarce.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
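To see why cpu_to_le32() on a 16-bit field loses the flag on big-endian machines, here is a sketch that performs the byte swaps a big-endian CPU would do. The helpers `be_cpu_to_le16()`/`be_cpu_to_le32()` and the `set_error_*` functions are hypothetical; on little-endian CPUs both conversions are no-ops, which is why only big-endian boxen were affected.

```c
#include <assert.h>
#include <stdint.h>

#define EXT3_ERROR_FS 0x0002

/* What cpu_to_le16()/cpu_to_le32() do on a big-endian CPU: swap bytes. */
static uint16_t be_cpu_to_le16(uint16_t x)
{
	return (uint16_t)((x >> 8) | (x << 8));
}

static uint32_t be_cpu_to_le32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0xff00u) |
	       ((x << 8) & 0xff0000u) | (x << 24);
}

/* s_state is a 16-bit on-disk field. */
static uint16_t set_error_buggy(uint16_t s_state)
{
	/* 0x00000002 becomes 0x02000000; truncating to 16 bits leaves 0 */
	return s_state | (uint16_t)be_cpu_to_le32(EXT3_ERROR_FS);
}

static uint16_t set_error_fixed(uint16_t s_state)
{
	return s_state | be_cpu_to_le16(EXT3_ERROR_FS);
}
```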
- 17 Sep, 2004 1 commit
Andrey V. Savochkin authored
Currently, metadata write errors are ignored and not returned from sys_fsync on (at least) ext2 and ext3 filesystems. Both ext2 and ext3 resort to sync_inode() in their ->fsync method, which in turn calls ->write_inode. The ->write_inode method has void type, so any I/O errors that happen inside it are lost. Make ->write_inode return the error code.
Signed-off-by: Andrey Savochkin <saw@saw.sw.com.sg>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
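A reduced sketch of the change's shape, assuming a hypothetical `super_operations` with a single hook: once `write_inode` returns `int`, the error can travel from the filesystem back through `sync_inode()` to the fsync path. The structures and the `fail_io` knob are illustrative only.

```c
#include <assert.h>
#include <errno.h>

struct inode;

/* Reduced, hypothetical hook table: write_inode now returns an error. */
struct super_operations {
	int (*write_inode)(struct inode *inode, int sync);
};

struct inode {
	const struct super_operations *op;
	int fail_io;	/* test knob: simulate a metadata write error */
};

static int demo_write_inode(struct inode *inode, int sync)
{
	(void)sync;
	return inode->fail_io ? -EIO : 0;
}

static const struct super_operations demo_sops = {
	.write_inode = demo_write_inode,
};

static int sync_inode(struct inode *inode)
{
	return inode->op->write_inode(inode, 1);
}

static int demo_fsync(struct inode *inode)
{
	return sync_inode(inode);	/* the error is no longer dropped */
}
```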
- 27 Aug, 2004 1 commit
Ingo Molnar authored
Add a whole bunch more might_sleep() checks. We also enable might_sleep() checking in copy_*_user(). This was non-trivial because the "copy_*_user() in atomic regions" trick would generate false positives. Fix that up by adding a new __copy_*_user_inatomic(), which avoids the might_sleep() check. Only i386 is supported in this patch.
With: Arjan van de Ven <arjanv@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
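A user-space sketch of the might_sleep()/inatomic split, assuming a hypothetical `in_atomic` flag in place of the preempt count and a counter in place of the kernel's warning. `copy_from_user_inatomic()` here is a stand-in for the real __copy_*_user_inatomic() family, which is not a plain memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int in_atomic;		/* stand-in for a non-zero preempt count */
static int sleep_warnings;	/* stand-in for the might_sleep() warning */

static void might_sleep(void)
{
	if (in_atomic)
		sleep_warnings++;
}

/* No check: for callers that run with page faults disabled and handle a
 * short copy themselves. Returns the number of bytes NOT copied. */
static size_t copy_from_user_inatomic(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
	return 0;
}

/* Ordinary variant: it may fault and sleep, so it checks first. */
static size_t copy_from_user(void *dst, const void *src, size_t n)
{
	might_sleep();
	return copy_from_user_inatomic(dst, src, n);
}
```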
- 30 Jun, 2004 1 commit
Mika Kukkonen authored
- 27 Jun, 2004 1 commit
Andrew Morton authored
ext3_direct_io_get_blocks() is misinterpreting the return value from ext3_journal_extend(), and is consequently running out of buffer credits and going BUG on tremendously large direct-io writes. Fix that up. Also, I note that really large direct-io writes can hold a transaction open for their entire duration, which can be minutes. This violates ext3's attempt to commit data at regular intervals. Fix that up by looking at the transaction state: if it's T_LOCKED, shut off the current handle so the pending commit can complete.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
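A sketch of the corrected credit handling, assuming a hypothetical handle model and the JBD return convention for extending a handle: 0 means it was extended, a positive value means the running transaction is full and the handle must be restarted, a negative value is an error. The T_LOCKED branch is modelled as a restart; `DIO_CREDITS`, `RESERVE_BLOCKS` and the struct fields are illustrative values and names.

```c
#include <assert.h>

enum tstate { T_RUNNING, T_LOCKED };

struct handle {
	int credits;			/* buffer credits held by the handle */
	int room_in_transaction;	/* credits the transaction can still give */
	enum tstate state;
	int restarts;
};

#define DIO_CREDITS	8
#define RESERVE_BLOCKS	2

static int journal_extend(struct handle *h, int nblocks)
{
	if (nblocks > h->room_in_transaction)
		return 1;		/* transaction full: caller must restart */
	h->room_in_transaction -= nblocks;
	h->credits += nblocks;
	return 0;
}

static int journal_restart(struct handle *h, int nblocks)
{
	h->restarts++;
	h->credits = nblocks;
	h->room_in_transaction = 64;	/* arbitrary: a fresh transaction */
	h->state = T_RUNNING;
	return 0;
}

/* Called before mapping each block of a large direct-I/O write. */
static int dio_prepare_block(struct handle *h)
{
	int ret = 0;

	/* Do not pin a transaction that is waiting to commit. */
	if (h->state == T_LOCKED)
		return journal_restart(h, DIO_CREDITS);

	if (h->credits <= RESERVE_BLOCKS) {
		ret = journal_extend(h, DIO_CREDITS);
		if (ret > 0)	/* positive means "restart", not success */
			ret = journal_restart(h, DIO_CREDITS);
	}
	return ret;
}
```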
- 18 Jun, 2004 1 commit
Theodore Y. Ts'o authored
Here is a reworked version of my patch to ext3 to retry certain filesystem operations after an ENOSPC error. The ext3_should_retry_alloc() function will not wait on the currently running transaction if there is a currently active handle; hence this should avoid deadlocks in the Lustre use case. The patch is versus BK-recent.

I've also included a simple, reliable test case which demonstrates the problem this patch is intended to fix. (Note that BK-recent is not sufficient to address this test case, and waiting on the committing transaction in ext3_new_block is also not sufficient. Been there, tried that, didn't work. We need to do the full-bore retry from the top level. ext3_should_retry_alloc() will only wait on the committing transaction if there is an active handle; hence Lustre will probably also need to use ext3_should_retry_alloc() if it wants to reliably avoid this particular problem.)

    #!/bin/sh
    # Reproducer: repeatedly fill a small ext3 image, delete everything,
    # and check whether mkdir succeeds immediately afterwards.

    TEST_DIR=/tmp
    IMAGE=$TEST_DIR/retry.img
    MNTPT=$TEST_DIR/retry.mnt
    TEST_SRC=/usr/projects/e2fsprogs/e2fsprogs/build
    MKE2FS_OPTS=""
    IMAGE_SIZE=8192

    umount "$MNTPT"
    dd if=/dev/zero of="$IMAGE" bs=4k count=$IMAGE_SIZE
    mke2fs -j -F $MKE2FS_OPTS "$IMAGE"

    # Log to stdout and syslog (POSIX function syntax; the `function'
    # keyword is a bash extension and fails under a strict /bin/sh)
    test_log () {
        echo "$*"
        logger -p local4.notice "$*"
    }

    mkdir -p "$MNTPT"
    mount -o loop -t ext3 "$IMAGE" "$MNTPT"
    test_log "Retry test: BEGIN"
    for i in 1 2 3
    do
        test_log "Retry test: Loop $i"
        echo 2 > /proc/sys/fs/jbd-debug
        # Retry mkdir once a second until it stops failing
        while ! mkdir -p "$MNTPT/foo/bar"
        do
            test_log "Retry test: mkdir failed"
            sleep 1
        done
        echo 0 > /proc/sys/fs/jbd-debug
        # Errors from cp (e.g. ENOSPC once the image is full) are discarded
        cp -r "$TEST_SRC" "$MNTPT/foo/bar" 2> /dev/null
        rm -rf "$MNTPT"/*
    done
    umount "$MNTPT"
    test_log "Retry test: END"

akpm@osdl.org: Rework the code to make it a formal JBD API entry point.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
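A sketch of the retry rule, assuming a hypothetical journal model that tracks blocks freed in the running transaction separately from blocks freed in the committing one. As described above, a caller that already holds a handle never forces the running transaction (that could deadlock); it only waits for a transaction that is already committing. The retry limit and all names are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct journal {
	int free_blocks;	/* allocatable right now */
	int running_freed;	/* freed in the running transaction */
	int committing_freed;	/* freed in the transaction now committing */
	bool handle_active;	/* caller already holds a handle */
};

static bool should_retry_alloc(struct journal *j, int *retries)
{
	if ((*retries)++ > 3)
		return false;
	if (!j->handle_active && j->running_freed) {
		/* Force a commit of the running transaction and wait. */
		j->free_blocks += j->committing_freed + j->running_freed;
		j->committing_freed = j->running_freed = 0;
		return true;
	}
	if (j->committing_freed) {
		/* Only wait for the transaction that is already committing. */
		j->free_blocks += j->committing_freed;
		j->committing_freed = 0;
		return true;
	}
	return false;
}

static int alloc_block(struct journal *j)
{
	int retries = 0;

	for (;;) {
		if (j->free_blocks > 0) {
			j->free_blocks--;
			return 0;
		}
		if (!should_retry_alloc(j, &retries))
			return -ENOSPC;
	}
}
```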
- 25 May, 2004 1 commit
Andrew Morton authored
From: Andi Kleen <ak@muc.de>
When start_transaction() detects an error, it already calls ext3_std_error(), so there is no need to do it again in the caller.
- 19 May, 2004 1 commit
Andrew Morton authored
From: Mingming Cao <cmm@us.ibm.com>
An uninitialized goal value is referenced in both the ext3 and ext2 find-goal functions (ext3_find_goal() and ext2_find_goal()). In the non-sequential write case, these functions test whether the goal value is non-zero before deciding to call ext3(2)_find_near() to find the goal block to allocate. Since the uninitialized goal value is in practice non-zero, ext3(2)_find_near() is never called for non-sequential writes, so ext3(2)_find_goal() fails to pick a goal block in the random write case. ext3(2)_new_block() takes the junk goal value and turns it into goal 0, since it is normally beyond the filesystem's block count. The fix is trivial.
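A sketch of the fixed goal selection, assuming a hypothetical `alloc_info` that records the last allocation. The point is only that `goal` starts at 0, so the fallback to a nearby block actually runs for random writes; `find_near()` is a placeholder for ext3_find_near(), not its real logic.

```c
#include <assert.h>

struct alloc_info {
	unsigned long last_logical;	/* logical block of the last allocation */
	unsigned long last_physical;	/* its on-disk block */
};

/* Placeholder: pick a block close to the parent indirect block. */
static unsigned long find_near(unsigned long parent_block)
{
	return parent_block + 1;
}

static unsigned long find_goal(const struct alloc_info *ai,
			       unsigned long logical, unsigned long parent_block)
{
	unsigned long goal = 0;	/* the fix: never test an uninitialised value */

	if (logical == ai->last_logical + 1)	/* sequential write */
		goal = ai->last_physical + 1;
	if (goal == 0)				/* random write: fall back */
		goal = find_near(parent_block);
	return goal;
}
```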
- 10 May, 2004 1 commit
Andrew Morton authored
With strange workloads which do a lot of quick truncation on small filesystems, it is possible to get into a situation where there are free blocks on the disk but they are not allocatable at this time, because they were freed in the current JBD transaction. Applications get unexpected ENOSPC errors. We can fix that with this patch, originally by Andreas Dilger, which forces a single commit+retry when an ENOSPC is encountered.
- 22 Apr, 2004 1 commit
Andrew Morton authored
If a filesystem's ->writepage implementation repeatedly refuses to write the page (it keeps redirtying it instead; reiserfs seems to do this), then the writeback logic can get stuck repeatedly trying to write the same page. Fix that up by correctly setting wbc->pages_skipped, to tell the writeback logic that things aren't working out.
- 19 Apr, 2004 1 commit
Andrew Morton authored
From: me, Badari Pulavarty <pbadari@us.ibm.com>
Currently a direct-IO read or write of more than 2G on 64-bit machines is broken. Replace int with ssize_t in various places to fix that up.
- 17 Apr, 2004 1 commit
Andrew Morton authored
From: Jeff Garzik <jgarzik@pobox.com>
It was debug code and is no longer required.
- 15 Apr, 2004 1 commit
Andrew Morton authored
From: Jan Kara <jack@ucw.cz>
Journalled quota support for ext3. The patch consists of two parts: ext3 changes and changes in the generic quota code. The main idea is that a transaction is always started before any operation which changes the quota file, and dirtying the quota causes it to be written to disk. These two changes ensure that a quota change is journalled in the same transaction as the file change, and hence after journal replay the quota is consistent with the filesystem state. Since inodes on the orphan list are deleted/truncated during journal replay, we have to do quota_on before the orphan list is replayed; this is solved by additional ext3 mount options that give the quota file names and format. Some changes to the generic code were also needed to ensure that the quota structure in the file is always allocated, so ordinary quota operations (like adding/deleting a block/inode) need only a few blocks from the transaction.
- 20 Jan, 2004 1 commit
Andrew Morton authored
From: Jan Kara <jack@suse.cz>
Journalled-data files need a different set of address_space_operations, so we need to update the file's aops when someone runs `chattr +j' on the file.
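A sketch of per-mode operation tables, assuming a hypothetical `set_aops()` that is rerun whenever the journal-data flag changes. It also shows how a table without a direct_IO method lets open() refuse O_DIRECT for journalled-data files; the struct layouts are illustrative, not the kernel's address_space_operations.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct aops {
	const char *name;
	int (*direct_IO)(void);
};

static int dio_ok(void) { return 0; }

static const struct aops ordered_aops    = { "ordered", dio_ok };
static const struct aops writeback_aops  = { "writeback", dio_ok };
static const struct aops journalled_aops = { "journalled", NULL };

enum data_mode { DATA_ORDERED, DATA_WRITEBACK, DATA_JOURNAL };

struct inode {
	enum data_mode fs_mode;		/* the data= mount option */
	int journal_data_flag;		/* set by `chattr +j' */
	const struct aops *a_ops;
};

/* Must be called again whenever journal_data_flag changes. */
static void set_aops(struct inode *inode)
{
	if (inode->fs_mode == DATA_JOURNAL || inode->journal_data_flag)
		inode->a_ops = &journalled_aops;
	else if (inode->fs_mode == DATA_ORDERED)
		inode->a_ops = &ordered_aops;
	else
		inode->a_ops = &writeback_aops;
}

/* An O_DIRECT open fails when the table has no direct_IO method. */
static int open_direct(const struct inode *inode)
{
	return inode->a_ops->direct_IO ? 0 : -EINVAL;
}
```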
- 19 Jan, 2004 1 commit
Andrew Morton authored
From: viro@parcelfarce.linux.theplanet.co.uk
A lot of places used ->f_dentry->d_inode->i_mapping all over the place. Replace these with ->f_mapping. For now, only the places where we literally could do search-and-replace.
- 16 Oct, 2003 1 commit
Andrew Morton authored
From: Alex Tomas <alex@clusterfs.com>
The setting of i_disksize can race against concurrent invocations of ext3_get_block(). Moving it inside i_truncate_sem fixes it up.
- 05 Oct, 2003 1 commit
Andrew Morton authored
When the BKL was removed from ext3 we lost locking coverage for get_block()-versus-get_block(). Nobody seems to have hit the race because get_block() almost always runs under i_sem: only memory-pressure-based writeout over a file hole runs outside i_sem. ext2 uses the dedicated i_meta_lock spinlock in the inode to provide the needed locking. But ext3 already has an rwsem around all the get_block() activity to protect it from truncate-related races. So this patch just converts that rwsem into a semaphore, so concurrent get_block() can never occur. This will be more efficient than adding the new spinlock. We lose the ability to have two threads run get_block() against the same file at the same time, but again, that only happens during pageout over a hole anyway. (Kudos to Alex Tomas for noticing the bug.)
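A sketch of the serialisation, assuming a hypothetical inode with a pthread mutex standing in for the semaphore: because every `get_block()` takes the lock, two callers mapping the same hole get the same block rather than allocating it twice. Updating `disksize` under the same lock mirrors the related i_disksize fix; all names are illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

struct inode {
	pthread_mutex_t truncate_sem;	/* stand-in for the semaphore */
	unsigned long blocks_allocated;
	unsigned long disksize;
	unsigned long map[64];		/* 0 = hole, else a physical block */
};

static void inode_init(struct inode *inode)
{
	memset(inode, 0, sizeof(*inode));
	pthread_mutex_init(&inode->truncate_sem, NULL);
}

static unsigned long get_block(struct inode *inode, unsigned iblock,
			       unsigned long new_disksize)
{
	unsigned long phys;

	assert(iblock < 64);
	pthread_mutex_lock(&inode->truncate_sem);
	if (inode->map[iblock] == 0)	/* fill the hole exactly once */
		inode->map[iblock] = ++inode->blocks_allocated;
	if (new_disksize > inode->disksize)	/* updated under the same lock */
		inode->disksize = new_disksize;
	phys = inode->map[iblock];
	pthread_mutex_unlock(&inode->truncate_sem);
	return phys;
}
```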
- 01 Oct, 2003 1 commit
Andrew Morton authored
From: Andries.Brouwer@cwi.nl
ext2 used a 32-bit field for dev_t, with possibly undefined storage following; thus no action was required to go to 32-bit dev_t, but going to 64-bit dev_t required some subtlety: 0 was written in the first word and the 64 bits in the following two. Al truncated my 64-bit code to 32 bits but did not understand why there was this split, and wrote 0 followed by a single word. We should at least zero the following word so that the storage is well-defined later.
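A sketch of the on-disk layout, assuming the kernel's 32-bit dev_t split (12-bit major, 20-bit minor) and its old_/new_encode_dev() formulas; the helpers are rewritten here for user space and omit the cpu_to_le32() conversion applied to i_data[]. Small numbers use the old 16-bit encoding in word 0; large ones put 0 in word 0, the value in word 1, and zero word 2.

```c
#include <assert.h>
#include <stdint.h>

#define MINORBITS	20
#define MKDEV(ma, mi)	(((uint32_t)(ma) << MINORBITS) | (uint32_t)(mi))
#define MAJOR(dev)	((unsigned)((dev) >> MINORBITS))
#define MINOR(dev)	((unsigned)((dev) & ((1u << MINORBITS) - 1)))

static int old_valid_dev(uint32_t dev)
{
	return MAJOR(dev) < 256 && MINOR(dev) < 256;
}

static uint32_t old_encode_dev(uint32_t dev)
{
	return (MAJOR(dev) << 8) | MINOR(dev);
}

static uint32_t new_encode_dev(uint32_t dev)
{
	unsigned major = MAJOR(dev), minor = MINOR(dev);

	return (minor & 0xff) | (major << 8) | ((minor & ~0xffu) << 12);
}

/* Store i_rdev in the first words of the inode's i_data[] array. */
static void store_rdev(uint32_t i_data[3], uint32_t dev)
{
	if (old_valid_dev(dev)) {
		i_data[0] = old_encode_dev(dev);
		i_data[1] = 0;
	} else {
		i_data[0] = 0;			/* marks the large-dev_t form */
		i_data[1] = new_encode_dev(dev);
		i_data[2] = 0;			/* keep the next word well-defined */
	}
}
```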
- 23 Sep, 2003 1 commit
Alexander Viro authored
Real conversion to 32-bit dev_t. Expansion to:
* mknod() - 32
* newstat() - 32 on 64-bit platforms
* stat64() - 32 on mips, 64 on everything else (mips has a weird struct stat64 and can't get more than 32 bits). Note that right now the difference is purely theoretical - we don't have internal values above 32 bits, so huge_... vs. new_... only marks the places where 64-bit conversion will need extra work.
* arch-dependent stat variants - depending on the width available.
* ustat et al. - 32
* filesystems that can handle 32 bits right now - 32
* ext2 and ext3 - 32, with large dev_t inodes having 0 in the first element of i_data[] (where we store the dev_t value for small device numbers) and keeping the value in the second element.
* nfsd - 32; it can be driven to 64, but we'll get several issues with NFSv2 support.
* RAID - 32
* devmapper - with v1 it's still 16 (nothing to do here), with v4 it's 64.
* loop - 64
* initramfs - 32
* do_mounts code - 32. Parts that scan the devfs tree use newstat() on 64-bit platforms and stat64() on the rest (IOW, the latest stat variant on a given platform).
* old_valid_dev()/new_valid_dev() added where needed (stat variants, mostly - we fail with -EOVERFLOW if values do not fit).
- 05 Sep, 2003 2 commits
Alexander Viro authored
old_decode_dev()/old_encode_dev() added where needed in other filesystems. Parts in different filesystems are independent, but IMO it's not worth splitting into a dozen half-kilobyte patches.
Alexander Viro authored
the last kdev_t object is gone; ->i_rdev switched to dev_t.
- 19 Aug, 2003 1 commit
Andrew Morton authored
From: Oliver Xymoron <oxymoron@waste.org>
These patches add the infrastructure for reporting asynchronous write errors to block devices to userspace. Errors which are detected due to pdflush or VM writeout are reported at the next fsync, fdatasync, or msync on the given file, and on close if the error occurs in time. We do this by propagating any errors into page->mapping->error when they are detected. In fsync(), msync(), fdatasync() and close() we return that error and zero it out. The Open Group say close() _may_ fail if an I/O error occurred while reading from or writing to the file system. Well, in this implementation close() can return -EIO or -ENOSPC. And in that case the close still succeeds (the descriptor is released), not fails - perhaps that is what they meant. There are three patches in this series, and testing has only been performed with all three applied.
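A sketch of the error-latching scheme, assuming a hypothetical `address_space` reduced to its `error` field. This version keeps the first error until it is reported; the fsync path returns it once and clears it, so a second fsync() succeeds.

```c
#include <assert.h>
#include <errno.h>

struct address_space {
	int error;	/* pending async write error, 0 if none */
};

/* Called from background writeback (pdflush/VM) with the I/O result. */
static void writeback_page(struct address_space *mapping, int io_result)
{
	if (io_result < 0 && !mapping->error)
		mapping->error = io_result;
}

/* Report the pending error once and zero it out. */
static int collect_error(struct address_space *mapping)
{
	int err = mapping->error;

	mapping->error = 0;
	return err;
}

static int demo_fsync(struct address_space *mapping)
{
	/* ... write out and wait on dirty pages here ... */
	return collect_error(mapping);
}
```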
- 01 Aug, 2003 4 commits
Randy Dunlap authored
From: Leann Ogasawara <ogasawara@osdl.org>
Remove the explicit zero initializers from static variables so they are placed in .bss instead of .data.
Andrew Morton authored
From: Nathan Scott <nathans@sgi.com>
This patch adds a mechanism by which a filesystem can register an interest in the completion of direct I/O. The completion routine will be given the inode, an offset and a length, and an optional filesystem-private field. We have extended the use of the buffer_head-based interface (i.e. get_block_t) for direct I/O so that the b_private field is now utilised. It is defined to be zero at the start of I/O, and will be passed into the filesystem unmodified by the VFS with each map request while setting up the direct I/O. Once I/O has completed, the final value of this pointer will be passed into the filesystem's I/O completion handler. This mechanism can be used to keep track of all of the mapping requests which make up an individual direct I/O request. This has been implemented specifically for XFS, but is designed to be as generic as possible. XFS uses this mechanism to provide support for unwritten extents: file extents which have been pre-allocated on disk but not yet written to (once written, and only once the I/O is complete, they become regular file extents).
Andrew Morton authored
From: Alex Tomas <bzzz@tmi.comex.ru>
ext3_getblk() memsets a newly allocated buffer, but forgets to check whether a different thread brought it uptodate while we waited for the buffer lock. It's normally OK because we're serialised by the page lock. But Lustre apparently is doing something different with getblk and hits this race. Plus I suspect it's racy with competing O_DIRECT writes.
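A sketch of the fixed initialisation, assuming a hypothetical buffer model. In the kernel lock_buffer() can sleep, and another thread may read the block in the meantime; checking `uptodate` after taking the lock keeps the memset from wiping valid data. Here the lock is a plain flag, so the "race" is simulated by starting with an uptodate buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct buffer {
	bool locked;
	bool uptodate;
	char data[16];
};

static void lock_buffer(struct buffer *bh)   { bh->locked = true; }
static void unlock_buffer(struct buffer *bh) { bh->locked = false; }

/* Initialise a freshly allocated block without clobbering valid data. */
static void init_new_block(struct buffer *bh)
{
	lock_buffer(bh);
	if (!bh->uptodate) {		/* the check the fix adds */
		memset(bh->data, 0, sizeof(bh->data));
		bh->uptodate = true;
	}
	unlock_buffer(bh);
}
```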
Andrew Morton authored
From: Alex Tomas <bzzz@tmi.comex.ru>
ext3_get_inode_loc() reads the inode's block only if:
1) this inode has no copy in memory, or
2) the inode's block holds other valid inode(s).
This optimization lets us avoid needless I/O in two cases:
1) a just-allocated inode is the first valid one in its block;
2) the kernel wants to write the inode, but the buffer the inode belongs to has been freed by the VM.
- 25 Jul, 2003 1 commit
Stephen Lord authored
to take an action at completion time. XFS uses this to
- 10 Jul, 2003 1 commit
Andrew Morton authored
From: Daniel McNeil <daniel@osdl.org>
This adds i_seqcount to the inode structure and then uses i_size_read() and i_size_write() to provide atomic access to i_size. This is a port of Andrea Arcangeli's i_size atomic access patch from 2.4. This only uses the generic reader/writer consistent mechanism.
Before:

    mnm:/usr/src/25> size vmlinux
       text    data     bss     dec     hex filename
    2229582 1027683  162436 3419701  342e35 vmlinux

After:

    mnm:/usr/src/25> size vmlinux
       text    data     bss     dec     hex filename
    2225642 1027655  162436 3415733  341eb5 vmlinux

3.9k more text, a lot of it fastpath :(
It's a very minor bug, and the fix has a fairly non-minor cost. The most compelling reason for fixing this is that writepage() checks i_size. If it sees a transient value it may decide that the page is outside i_size and refuse to write it. Lost user data.
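A user-space sketch of the seqcount pattern behind i_size_read()/i_size_write(), written with C11 atomics and a 64-bit size split into two 32-bit words as on a 32-bit machine. Writers are assumed to be serialised already (the kernel relies on i_sem for that); readers retry until the sequence is even and unchanged. The names mirror the kernel's, but the code is illustrative, not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct inode {
	atomic_uint seq;		/* odd while a write is in progress */
	atomic_uint size_lo, size_hi;	/* the 64-bit size, torn into words */
};

static void inode_init(struct inode *inode)
{
	atomic_init(&inode->seq, 0);
	atomic_init(&inode->size_lo, 0);
	atomic_init(&inode->size_hi, 0);
}

static void i_size_write(struct inode *inode, uint64_t size)
{
	unsigned s = atomic_load_explicit(&inode->seq, memory_order_relaxed);

	atomic_store_explicit(&inode->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&inode->size_lo, (uint32_t)size,
			      memory_order_relaxed);
	atomic_store_explicit(&inode->size_hi, (uint32_t)(size >> 32),
			      memory_order_relaxed);
	atomic_store_explicit(&inode->seq, s + 2, memory_order_release);
}

static uint64_t i_size_read(struct inode *inode)
{
	unsigned s1, s2;
	uint64_t size;

	do {
		s1 = atomic_load_explicit(&inode->seq, memory_order_acquire);
		size = ((uint64_t)atomic_load_explicit(&inode->size_hi,
						       memory_order_relaxed) << 32) |
		       atomic_load_explicit(&inode->size_lo, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&inode->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);	/* retry on a torn read */
	return size;
}
```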
- 03 Jul, 2003 1 commit
Linus Torvalds authored
follow by splitting it into two functions: one that calculates the position, and the other that actually reads the inode block off the disk.
- 25 Jun, 2003 1 commit
Andrew Morton authored
ext3_block_truncate_page() is calling grab_cache_page() inside a JBD transaction. This is wrong, because transactions nest inside lock_page(). The deadlock is against shrink_list->ext3_journalled_writepage->journal_start. This was not noticed before because we never used to journal writepage() data in journalled-data mode, and because the deadlock against generic_file_write() is covered up by i_sem. Rework things so that we lock the page prior to starting a transaction.
- 20 Jun, 2003 1 commit
Andrew Morton authored
ext3 and JBD still have enormous numbers of lines which end in tabs. Fix them all up.
- 18 Jun, 2003 6 commits
Andrew Morton authored
We cannot sensibly support O_DIRECT reads or writes when all writes are journalled. This is because the VFS explicitly avoids syncing the file metadata during O_DIRECT reads and writes. ext3 with journalled data will leave pending changes in memory; they will overwrite the results of O_DIRECT writes, and O_DIRECT reads will not return the latest data. Setting the ->direct_IO a_op to NULL will cause open() and fcntl(F_SETFL) to return -EINVAL if O_DIRECT is requested.
Andrew Morton authored
Fix various problems which cropped up due to MAP_SHARED traffic on data=journal with blocksize < PAGE_CACHE_SIZE. All relate to handling the "pending truncate" buffers outside i_size.
Andrew Morton authored
add a dump_stack() to a can't-happen path which happened during development.
Andrew Morton authored
ext3's fully data-journalled mode has been broken for a year. This patch fixes it up. The prepare_write/commit_write/writepage implementations have been split up: instead of having each function handle all three journalling modes, we now have three separate sets of address_space_operations. The problematic part of data=journal is MAP_SHARED writepage traffic: pages which don't have buffers. In 2.4 these were cheatingly treated as data-ordered buffers, and that caused several nasty problems. Here we do it properly: writepage traffic is fully journalled. This means that the various workarounds for the 2.4 scheme can be removed, when I remember where they all are. The PG_checked flag has been borrowed: it is set in the atomic set_page_dirty a_op to tell the subsequent writepage() that this page needs to have buffers attached, dirtied and journalled. This effectively defines PG_checked as "fs-private info in page->flags", and it should be renamed sometime.
Andrew Morton authored
After ext3_writepage() has called block_write_full_page(), it walks the page's buffer ring dropping the buffer_head refcounts. It does this wrongly: on the final iteration it dereferences the buffer_head whose refcount it has just dropped. Poisoned oopses have been seen against bh->b_this_page. Change it to take a local copy of b_this_page prior to dropping the bh's refcount.
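A sketch of the fix, assuming a hypothetical buffer model in which dropping the last reference "frees" the buffer by poisoning `b_this_page`. Reading the next pointer before `put_bh()` keeps the walk valid; reading it afterwards would follow the poisoned link.

```c
#include <assert.h>
#include <stddef.h>

struct buffer_head {
	int count;
	struct buffer_head *b_this_page;	/* next buffer in the page's ring */
};

/* Model of freeing on the last reference: poison the link. */
static void put_bh(struct buffer_head *bh)
{
	if (--bh->count == 0)
		bh->b_this_page = NULL;
}

/* Drop one reference on every buffer in the ring; returns how many. */
static int release_ring(struct buffer_head *head)
{
	struct buffer_head *bh = head;
	int n = 0;

	do {
		struct buffer_head *next = bh->b_this_page;	/* copy first */

		put_bh(bh);
		n++;
		bh = next;
	} while (bh != head);
	return n;
}
```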
Andrew Morton authored
This is the start of the ext3 scalability rework. It basically comes in two halves: ext3 BKL/lock_super removal plus scalable inode/block allocators, and the JBD locking rework. The ext3 scalability work was completed a couple of months ago. The JBD rework has been stable for a couple of weeks now. My gut feeling is that there should be one, maybe two bugs left in it, but no problems have been discovered... Performance-wise, throughput is increased by up to 2x on dual CPU; 10x on 16-way has been measured. Given that current ext3 is able to chew two whole CPUs spinning on locks on a 4-way, that wasn't especially surprising. These patches were prepared by Alex Tomas <bzzz@tmi.comex.ru> and myself. First patch: ext3 lock_kernel() removal. The only reason why ext3 takes lock_kernel() is that it is required by the JBD API. The patch removes the lock_kernel() calls from ext3 and pushes them down into JBD itself.
- 02 Jun, 2003 1 commit
Andrew Morton authored
If a buffer_head is outside i_size, block_write_full_page() will leave it !buffer_mapped(). We shouldn't attach that buffer to the transaction for writeout! This bug has been in 2.5 for some time.