- 18 Jun, 2003 40 commits
-
Andrew Morton authored
ext3's fully data-journalled mode has been broken for a year. This patch fixes it up. The prepare_write/commit_write/writepage implementations have been split up. Instead of having each function handle all three journalling modes we now have three separate sets of address_space_operations. The problematic part of data=journal is MAP_SHARED writepage traffic: pages which don't have buffers. In 2.4 these were cheatingly treated as data-ordered buffers and that caused several nasty problems. Here we do it properly: writepage traffic is fully journalled. This means that the various workarounds for the 2.4 scheme can be removed, when I remember where they all are. The PG_checked flag has been borrowed: it is set in the atomic set_page_dirty a_op to tell the subsequent writepage() that this page needs to have buffers attached, dirtied and journalled. This rather defines PG_checked as "fs-private info in page->flags" and it should be renamed sometime.
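A minimal sketch of the journalled flavour, assuming the standard 2.6 page-flag helpers; the function and a_op names follow the description above but are not guaranteed to match the patch exactly:

    #include <linux/fs.h>
    #include <linux/mm.h>
    #include <linux/writeback.h>

    /* Atomic set_page_dirty: just record that writepage() must attach,
     * dirty and journal buffers for this page later on. */
    static int ext3_journalled_set_page_dirty(struct page *page)
    {
            SetPageChecked(page);
            return __set_page_dirty_nobuffers(page);
    }

    static int ext3_journalled_writepage(struct page *page,
                                         struct writeback_control *wbc)
    {
            if (PageChecked(page)) {
                    ClearPageChecked(page);
                    /* MAP_SHARED traffic: the page has no buffers yet, so
                     * create them, journal each one, then write the page. */
            }
            /* ... normal path for pages whose buffers are already journalled ... */
            return 0;
    }

    /* One of the three per-mode sets of address_space_operations. */
    static struct address_space_operations ext3_journalled_aops = {
            .writepage      = ext3_journalled_writepage,
            .set_page_dirty = ext3_journalled_set_page_dirty,
            /* .prepare_write / .commit_write for the journalled case, etc. */
    };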
-
Andrew Morton authored
Avoid holding the journal's j_list_lock while copying the buffer_head's data. We hold jbd_lock_bh_state() during the copy, which is all that is needed.
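A rough sketch of the resulting pattern, jbd_lock_bh_state() being the per-buffer state lock this series introduces:

    #include <linux/jbd.h>
    #include <linux/buffer_head.h>
    #include <linux/string.h>

    /* Sketch: snapshot a journalled buffer's contents.  Only the buffer's
     * own state lock is held for the copy; j_list_lock is not needed to
     * stabilise b_data for the duration of the memcpy. */
    static void jbd_snapshot_buffer(struct buffer_head *bh, char *dest)
    {
            jbd_lock_bh_state(bh);
            memcpy(dest, bh->b_data, bh->b_size);
            jbd_unlock_bh_state(bh);
    }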
-
Andrew Morton authored
In start_this_handle() the caller does not have a handle ref pinning the transaction open, and so the call to log_start_commit() is racy because some other CPU could take the transaction into commit state independently. Fix that by holding j_state_lock (which pins j_running_transaction) across the log_start_commit() call.
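A sketch of the race-free pattern, assuming __log_start_commit() is the variant that expects j_state_lock to already be held by the caller:

    #include <linux/jbd.h>

    /* j_state_lock pins j_running_transaction, so no other CPU can move
     * it into the committing state between the check and the kick. */
    static void kick_running_transaction(journal_t *journal)
    {
            spin_lock(&journal->j_state_lock);
            if (journal->j_running_transaction)
                    __log_start_commit(journal,
                                       journal->j_running_transaction->t_tid);
            spin_unlock(&journal->j_state_lock);
    }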
-
Andrew Morton authored
Plug a conceivable race with the freeing up of transactions, and add some more debug checks.
-
Andrew Morton authored
Drop in a few assertions to ensure that the locking rules are being adhered to.
-
Andrew Morton authored
Add a comment describing why a race isn't there.
-
Andrew Morton authored
After ext3_writepage() has called block_write_full_page() it will walk the page's buffer ring dropping the buffer_head refcounts. It does this wrong - on the final loop it will dereference the buffer_head which it just dropped the refcount on. Poisoned oopses have been seen against bh->b_this_page. Change it to take a local copy of b_this_page prior to dropping the bh's refcount.
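The corrected walk looks roughly like this (illustrative only; the buffer references are assumed to have been taken earlier in ext3_writepage()):

    #include <linux/buffer_head.h>

    static void drop_page_buffer_refs(struct page *page)
    {
            struct buffer_head *head = page_buffers(page);
            struct buffer_head *bh = head;

            do {
                    /* copy the link before dropping our ref: after put_bh()
                     * the buffer_head may no longer be safe to dereference */
                    struct buffer_head *next = bh->b_this_page;
                    put_bh(bh);
                    bh = next;
            } while (bh != head);
    }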
-
Andrew Morton authored
We need to check that the buffer is still journalled _after_ taking the right locks.
-
Andrew Morton authored
There's a bug: a caller tries to journal a buffer and then decides he didn't want to after all. He calls journal_release_buffer(). But journal_release_buffer() is only allowed to give the caller a buffer credit back if it was the caller who added the buffer in the first place. journal_release_buffer() currently looks at the buffer state to work that out, but gets it wrong: if the buffer has been moved onto a different list by some other part of ext3 the credit is bogusly not returned to the caller and the fs can later go BUG due to handle credit exhaustion.

The fix: Change journal_get_undo_access() to return the number of buffers which the caller actually added to the journal (one or zero). When the caller later calls journal_release_buffer(), he passes in that count, to tell journal_release_buffer() how many credits the caller should get back. For API consistency this change should also be made to journal_get_create_access() and journal_get_write_access(). But there is no requirement for that in ext3 at this time.

The remaining bug: This logic effectively gives another transaction handle a free buffer credit. These could conceivably accumulate and cause a journal overflow. This is a separate problem and needs changes to the t_outstanding_credits accounting and the logic in start_this_handle().
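One possible shape of the adjusted interface, following the description above (shown here with an out-parameter so the usual error return is preserved; the real prototypes may differ):

    #include <linux/jbd.h>

    static int modify_then_abandon(handle_t *handle, struct buffer_head *bh)
    {
            int err, credits = 0;

            /* credits is set to the number of buffers (0 or 1) that this
             * call actually added to the handle's transaction */
            err = journal_get_undo_access(handle, bh, &credits);
            if (err)
                    return err;

            /* ... the caller decides not to modify bh after all ... */

            /* refund exactly what we were charged, no more */
            journal_release_buffer(handle, bh, credits);
            return 0;
    }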
-
Andrew Morton authored
This filesystem-wide sleeping lock is no longer needed. Remove it.
-
Andrew Morton authored
lock_kernel() is no longer needed in JBD. Remove all the lock_kernel() calls from fs/jbd/. Here is where I get to say "ex-parrot".
-
Andrew Morton authored
Remove the remaining sleep_on() calls from JBD.
-
Andrew Morton authored
From: Alex Tomas <bzzz@tmi.comex.ru> We're about to remove lock_journal(), and it is lock_journal which separates the running and committing transaction's revokes on the single revoke table. So implement two revoke tables and rotate them at commit time.
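A sketch of the two-table scheme (field and helper names are illustrative): the running transaction records revokes in the current table while commit scans the other; at commit time the journal simply flips which table is current.

    #include <linux/jbd.h>

    static void switch_revoke_table(journal_t *journal)
    {
            if (journal->j_revoke == journal->j_revoke_table[0])
                    journal->j_revoke = journal->j_revoke_table[1];
            else
                    journal->j_revoke = journal->j_revoke_table[0];
    }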
-
Andrew Morton authored
Implement the designed locking around journal->j_commit_request.
-
Andrew Morton authored
Implement the designed locking around journal->j_commit_sequence.
-
Andrew Morton authored
Implement the designed locking around journal->j_free. Things get a lot better here, too.
-
Andrew Morton authored
Implement the designed locking around journal->j_tail.
-
Andrew Morton authored
Implement the designed locking around journal->j_head.
-
Andrew Morton authored
Implement the designed locking around j_checkpoint_transactions. It was all pretty much there actually.
-
Andrew Morton authored
Go through all sites which use j_committing_transaction and ensure that the designed locking is correctly implemented there.
-
Andrew Morton authored
Implement the designed locking around journal->j_running_transaction. A lot more of the new locking scheme falls into place.
-
Andrew Morton authored
We now start to move onto the fields of the topmost JBD data structure: the journal. The patch implements the designed locking around the j_barrier_count member. And as a part of that, a lot of the new locking scheme is implemented. Several lock_kernel()s and sleep_on()s go away.
-
Andrew Morton authored
Provide the designed locking around the transaction's t_jcb callback list. It turns out that this is wholly redundant at present.
-
Andrew Morton authored
Implement the designed locking for t_outstanding_credits.
-
Andrew Morton authored
Provide the designed locking for transaction_t.t_updates.
-
Andrew Morton authored
Now we move more into the locking of the transaction_t fields. t_nr_buffers locking is just an audit-and-commentary job.
-
Andrew Morton authored
This was a system-wide spinlock. Simple transformation: make it a filesystem-wide spinlock, in the JBD journal. That's a bit lame, and later it might be nice to make it per-transaction_t. But there are interesting ranking and ordering problems with that, especially around __journal_refile_buffer().
-
Andrew Morton authored
Implement the designed b_tnext locking. This also covers b_tprev locking.
-
Andrew Morton authored
Go through all b_next_transaction instances, implement locking rules. (Nothing to do here - b_transaction locking covered it)
-
Andrew Morton authored
Go through all use of b_transaction and implement the rules. Fairly straightforward.
-
Andrew Morton authored
Implement the designed locking schema around the journal_head.b_committed_data field.
-
Andrew Morton authored
We now start to move across the JBD data structure's fields, from "innermost" and outwards. Start with journal_head.b_frozen_data, because the locking for this field was partially implemented in jbd-010-b_committed_data-race-fix.patch. It is protected by jbd_lock_bh_state(). We keep the lock_journal() and spin_lock(&journal_datalist_lock) calls in place. Later, spin_lock(&journal_datalist_lock) is replaced by spin_lock(&journal->j_list_lock). Of course, this completion of the locking around b_frozen_data also puts a lot of the locking for other fields in place.
-
Andrew Morton authored
journal_unlock_journal_head() is misnamed: what it does is to drop a ref on the journal_head and free it if that ref fell to zero. It doesn't actually unlock anything. Rename it to journal_put_journal_head().
-
Andrew Morton authored
buffer_heads and journal_heads are joined at the hip. We need a lock to protect the joint and its refcounts. JBD is currently using a global spinlock for that. Change it to use one bit in bh->b_state.
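Roughly, the joint is protected by a bit-spinlock in b_state, bit_spin_lock() being the bit-based primitive an earlier patch in this series abstracts out of pte_chain_lock(); the bit number and helper names below are illustrative:

    #include <linux/buffer_head.h>
    #include <linux/spinlock.h>

    /* A state bit assumed to be free for JBD's use. */
    #define BH_JournalHead  (BH_PrivateStart + 1)

    /* Protects the bh <-> journal_head linkage and its refcounts. */
    static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
    {
            bit_spin_lock(BH_JournalHead, &bh->b_state);
    }

    static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
    {
            bit_spin_unlock(BH_JournalHead, &bh->b_state);
    }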
-
Andrew Morton authored
This was a strange spinlock which was designed to prevent another CPU from ripping a buffer's journal_head away while this CPU was inspecting its state. Really, we don't need it - we can inspect that state directly from bh->b_state. So kill it off, along with a few things which used it which are themselves not actually used any more.
-
Andrew Morton authored
This is the start of the JBD locking rework. The aims of all this are to remove all lock_kernel() calls from JBD, to remove all lock_journal() calls (the context switch rate is astonishing when the lock_kernel()s are removed) and to remove all sleep_on() instances. The strategy which is taken is: a) Define the locking schema (this patch) b) Work through every JBD data structure and implement its locking fully, according to the above schema. We work from "innermost" data structures and outwards. It isn't guaranteed that the filesystem will work very well at all stages of this patch series. In this patch: Add commentary and various locks to jbd.h describing the locking scheme which is about to be implemented. Initialise the new locks. Coding-style goodness in jbd.h.
-
Andrew Morton authored
From: Alex Tomas <bzzz@tmi.comex.ru> We have a race wherein the block allocator can decide that journal_head.b_committed_data is present and then will use it. But kjournald can concurrently free it and set the pointer to NULL. It goes oops. We introduce per-buffer_head "spinlocking" based on a bit in b_state. To do this we abstract out pte_chain_lock() and reuse the implementation. The bit-based spinlocking is pretty inefficient CPU-wise (hence the warning in there) and we may move this to a hashed spinlock later.
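A sketch of the bit-based locking pattern generalised from pte_chain_lock(); the helper names are illustrative:

    #include <linux/preempt.h>
    #include <asm/bitops.h>
    #include <asm/processor.h>

    /* Spin until the designated bit can be atomically set.  CPU-hungry
     * under contention (hence the warning), but it costs no extra
     * storage in the buffer_head. */
    static inline void jbd_bit_lock(int bitnum, unsigned long *addr)
    {
            preempt_disable();
            while (test_and_set_bit(bitnum, addr)) {
                    while (test_bit(bitnum, addr))
                            cpu_relax();
            }
    }

    static inline void jbd_bit_unlock(int bitnum, unsigned long *addr)
    {
            smp_mb__before_clear_bit();
            clear_bit(bitnum, addr);
            preempt_enable();
    }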
-
Andrew Morton authored
From: Alex Tomas <bzzz@tmi.comex.ru> This is a port from ext2 of the fuzzy counters (for Orlov allocator heuristics) and the hashed spinlocking (for the inode and block allocators).
-
Andrew Morton authored
From: Alex Tomas <bzzz@tmi.comex.ru> This patch weans ext3 off lock_super()-based protection for the inode and block allocators. It's basically the same as the ext2 changes.

1) each group has its own spinlock, which is used for group counter modifications
2) sb->s_free_blocks_count isn't used any more. ext2_statfs() and find_group_orlov() loop over groups to count free blocks (see the sketch after this entry)
3) sb->s_free_blocks_count is recalculated at mount/umount/sync_super time in order to check consistency and to avoid fsck warnings
4) reserved blocks are distributed over the last groups
5) ext3_new_block() tries to use non-reserved blocks and if that fails then tries to use reserved blocks
6) ext3_new_block() and ext3_free_blocks() do not modify sb->s_free_blocks, therefore they do not call mark_buffer_dirty() for the superblock's buffer_head; this should reduce I/O a bit

Also fix an Orlov allocator boundary case: in the interests of SMP scalability the ext2 free blocks and free inodes counters are "approximate", but there is a piece of code in the Orlov allocator which fails due to boundary conditions on really small filesystems. Fix that up via a final allocation pass which simply uses first-fit for allocation of a directory inode.
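A rough sketch of items 1 and 2, with sb_bgl_lock() assumed from the matching ext2 change rather than taken from this patch:

    #include <linux/fs.h>
    #include <linux/ext3_fs.h>

    /* Item 1: modify a group's free-blocks count under that group's own lock. */
    static void ext3_group_add_free(struct super_block *sb,
                                    unsigned int group, int count)
    {
            struct buffer_head *gd_bh;
            struct ext3_group_desc *desc = ext3_get_group_desc(sb, group, &gd_bh);

            if (!desc)
                    return;
            spin_lock(sb_bgl_lock(EXT3_SB(sb), group));
            desc->bg_free_blocks_count =
                    cpu_to_le16(le16_to_cpu(desc->bg_free_blocks_count) + count);
            spin_unlock(sb_bgl_lock(EXT3_SB(sb), group));
    }

    /* Item 2: statfs-style recount, looping over the group descriptors
     * instead of reading a single hot superblock counter. */
    static unsigned long ext3_count_free_blocks_sketch(struct super_block *sb)
    {
            unsigned long i, free = 0;

            for (i = 0; i < EXT3_SB(sb)->s_groups_count; i++) {
                    struct ext3_group_desc *desc = ext3_get_group_desc(sb, i, NULL);

                    if (desc)
                            free += le16_to_cpu(desc->bg_free_blocks_count);
            }
            return free;
    }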
-
Andrew Morton authored
Move some lock_kernel() calls from the caller to the callee, reducing hold times.
-