- 30 Aug, 2002 40 commits
-
-
Chuck Lever authored
sock_writeable determines whether there is space in a socket's output buffer. socket write_space callbacks use it to determine whether to wake up those that are waiting for more output buffer space. however, sock_writeable is not appropriate for TCP sockets. because the RPC layer's write_space callback uses it for TCP sockets, the RPC layer hammers on sock_sendmsg with dozens of write requests that are only a few hundred bytes long when it is trying to send a large write RPC request. this patch adds logic to the RPC layer's write_space callback that properly handles TCP sockets. patch reviewed by Trond and Alexey.
-
Chuck Lever authored
when several RPC requests want to reconnect a TCP transport socket at once, xprt_lock_write serializes the tasks to prevent multiple socket connects. however, TCP connects are always done by an RPC child task that has no request slot. xprt_lock_write can oops if there is no request slot allocated to the invoking RPC task. reviewed and accepted by Trond. the xprt_lock_write changes are not yet in 2.4, so this patch does not apply to 2.4.
-
Ingo Molnar authored
This fixes a bad TLS initialization bug found by Andi Kleen. x86/SMP only worked due to luck.
-
Ingo Molnar authored
This adds two scheduler-related fixes:
  - changes the migration code to use struct completion. Andrew pointed out that there might be a small window in which the up() touches the semaphore while the waiting task goes on and frees its stack. And completion is more suited for this kind of stuff anyway.
  - removes two unneeded exports, pointed out by Andrew.
-
Ingo Molnar authored
This moves CLONE_SETTID and CLONE_CLEARTID handling into kernel/fork.c, where it belongs. [the CLONE_SETTLS is x86-specific and thus remains in the per-arch process.c] This makes support for these two new flags much easier: architectures only have to pass in the user_tid pointer.
-
Dominik Brodowski authored
It would be helpful if these msr.h #defines could get in.
-
David Mosberger authored
It makes no sense to keep efi.h as an ia64-specific header (there really are x86 machines coming out with optional EFI BIOS support).
-
Linus Torvalds authored
http://lia64.bkbits.net/to-linus-2.5 into home.transmeta.com:/home/torvalds/v2.5/linux
-
Andrew Morton authored
Fix a __FUNCTION__ paste in revoke.c
-
Andrew Morton authored
O_DIRECT support for ext3. It works OK in all journalling modes. Updates to the file metadata and inode are journalled as usual. If the system crashes during an appending O_DIRECT write then journal recovery will truncate the written-to file back to the length which it had on entry to that write. If the system crashes during a file overwrite to existing blocks then the file contents will be an unknown mixture of old and new. If the system crashes during a file overwrite which instantiates new blocks in the middle of the file then there is a possibility of uninitialised disk blocks being present in the file post-recovery.
-
Andrew Morton authored
mpage_writepages() does a lock_page() on pages to be written back, even when it is being used for page reclaim writeback. This is normally OK, because the page is unlocked quickly - pages are unlocked during writeback and nobody should be performing __GFP_FS allocations inside lock_page(). But it has introduced a ranking problem in ext3:
  generic_file_write
  ->lock_page
  ->ext3_prepare_write
  ->journal_start (waits for a commit)
versus
  ext3_create()
  ->journal_start()
  ->ext3_new_inode(GFP_KERNEL)
  ->page reclaim
  ->mpage_writepages
  ->lock_page (locks up, transaction is held open)
Maybe sometime, I'll have to turn mpage_writepages' lock_page into a trylock if the caller is PF_MEMALLOC. But for now, let's make ext3's inside-transaction allocations use GFP_NOFS. There is only one of them.
-
Andrew Morton authored
This is a performance and correctness fix against the writeback paths.
The writeback code has competing requirements. Sometimes it is used for "memory cleansing": kupdate, bdflush, writer throttling, page allocator writeback, etc. And sometimes this same code is used for data integrity purposes: fsync, msync, fdatasync, sync, umount, various other kernel-internal uses.
The problem is: how to handle a dirty buffer or page which is currently under writeback. For memory cleansing, we just want to skip that buffer/page and go onto the next one. But for sync, we must wait on the old writeback and then start new writeback.
mpage_writepages() is currently correct for cleansing, but incorrect for sync. block_write_full_page() is currently correct for sync, but inefficient for cleansing.
The fix is fairly simple.
  - In mpage_writepages(), don't skip the page if it's a sync operation.
  - In block_write_full_page(), skip the buffer if it is not a sync operation. And return -EAGAIN to tell the caller that the writeout didn't work out. The caller must then set the page dirty again and move it onto mapping->dirty_pages. This is an extension of the writepage API: writepage can now return EAGAIN. There are only three callers, and they have been updated. fail_writepage() and ext3_writepage() were actually doing this by hand. They have been changed to return -EAGAIN. NTFS will want to be able to return -EAGAIN from its writepage as well.
  - A sticky question is: how to tell the writeout code which mode it is operating in? Cleansing or sync? It's such a tiny code change that I didn't have the heart to go and propagate a `mode' argument down every instance of writepages() and writepage() in the kernel. So I passed it in via current->flags.
Incidentally, the occurrence of a locked-and-dirty buffer in block_write_full_page() is fairly rare: normally the collision avoidance happens at the address_space level, via PageWriteback. But some mappings (blockdevs, ext3 files, etc) have their dirty buffers written out via submit_bh(). It is these buffers which can stall block_write_full_page(). This wart will be pretty intrusive to fix. ext3 needs to become fully page-based (ugh. It's a block-based journalling filesystem, and pages are unnatural). blockdev mappings are still written out by buffers because that's how filesystems use them. Putting _all_ metadata (indirects, inodes, superblocks, etc) into standalone address_spaces would fix that up.
  - filemap_fdatawrite() sets PF_SYNC. So filemap_fdatawrite() is the kernel function which will start writeback against a mapping for "data integrity" purposes, whereas the unexported, internal-only do_writepages() is the writeback function which is used for memory cleansing. This difference is the reason why I didn't consolidate those functions ages ago...
  - Lots of code paths had a bogus extra call to filemap_fdatawait(), which I previously added in a moment of weak-headedness. They have all been removed.
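The cleansing-versus-sync rule described in the commit message above can be sketched in userspace C. This is only an illustration under stated assumptions, not kernel code: `struct buf`, `enum wb_mode`, and `write_buffer()` are hypothetical names, the mode is passed as an argument rather than through current->flags, and "waiting" on a locked buffer is simulated by clearing a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two kinds of caller. */
enum wb_mode { WB_CLEANSE, WB_SYNC };

/* Hypothetical, heavily simplified model of a buffer. */
struct buf {
    bool dirty;
    bool locked;  /* already under writeback started by someone else */
    int  writes;  /* number of times this sketch started I/O on it */
};

/*
 * Returns 0 if the buffer is clean or writeout was started, and -EAGAIN
 * if a locked buffer was skipped. On -EAGAIN the caller is expected to
 * redirty the page, as the commit message describes.
 */
static int write_buffer(struct buf *b, enum wb_mode mode)
{
    assert(b != NULL);

    if (!b->dirty)
        return 0;

    if (b->locked) {
        if (mode == WB_CLEANSE)
            return -EAGAIN;   /* cleansing: skip rather than stall */
        b->locked = false;    /* sync: simulate waiting for the old I/O */
    }

    b->dirty = false;
    b->writes++;
    return 0;
}
```

With a buffer that is both dirty and locked, a `WB_CLEANSE` call returns -EAGAIN and leaves the buffer dirty, while a following `WB_SYNC` call "waits" and then writes it.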
-
Andrew Morton authored
A reworked version of the batched page freeing and lock amortisation for VMA teardown. It walks the existing 507-page list in the mmu_gather_t in 16-page chunks, drops their refcounts in 16-page chunks, and de-LRUs and frees any resulting zero-count pages in up-to-16 page chunks.
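The chunked walk described above can be sketched as follows. This is a userspace illustration under stated assumptions: `struct page` here is a hypothetical stand-in holding only a count and a flag, `release_pages_batched()` is an invented name, and the LRU lock is only counted, not taken. The point it tries to show is that the lock cost is paid once per 16-page chunk instead of once per page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHUNK 16  /* chunk size named in the commit message */

/* Hypothetical stand-in for struct page. */
struct page {
    int  count;
    bool freed;
};

/*
 * Drop one reference on each of nr pages, CHUNK pages at a time.
 * Returns how many pages reached a zero count and were "freed".
 * *lock_acquisitions counts simulated lock round trips.
 */
static size_t release_pages_batched(struct page **pages, size_t nr,
                                    size_t *lock_acquisitions)
{
    size_t freed = 0;

    for (size_t start = 0; start < nr; start += CHUNK) {
        size_t end = (nr - start > CHUNK) ? start + CHUNK : nr;
        struct page *zero[CHUNK];
        size_t nzero = 0;

        /* Pass 1: drop the refcounts for this chunk. */
        for (size_t i = start; i < end; i++) {
            assert(pages[i]->count > 0);
            if (--pages[i]->count == 0)
                zero[nzero++] = pages[i];
        }

        /* Pass 2: one simulated lock acquisition covers every
         * zero-count page found in this chunk. */
        if (nzero > 0) {
            (*lock_acquisitions)++;
            for (size_t j = 0; j < nzero; j++)
                zero[j]->freed = true;
            freed += nzero;
        }
    }
    return freed;
}
```

For a 507-entry list this makes 32 passes (31 full chunks and one of 11 pages), so at most 32 simulated lock acquisitions instead of up to 507.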
-
Andrew Morton authored
Clean up put_page() and page_cache_release(). It's pretty simple now:
  #define page_cache_get(page) get_page(page)
  #define page_cache_release(page) put_page(page)
-
Andrew Morton authored
it was only being used in invalidate_inode_pages(), and from there, pagevec_release() does the same thing.
-
Andrew Morton authored
As suggested by Daniel - it's a bug to run put_page_testzero against a zero-ref page.
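A minimal userspace sketch of the invariant: dropping a reference on a page whose count is already zero means some caller released a reference it never held. The `struct page` below is a hypothetical stand-in with a plain int counter, and assert() stands in for whatever check the kernel patch actually uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: just a reference count. */
struct page {
    int count;
};

static void get_page(struct page *p)
{
    p->count++;
}

/* Drop one reference; returns true if it was the last one. */
static bool put_page_testzero(struct page *p)
{
    /* A zero count here means a reference was dropped twice. */
    assert(p->count > 0);
    return --p->count == 0;
}
```

Taking two references and dropping them in turn returns false and then true. A third call would trip the assertion instead of silently driving the count negative.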
-
Ingo Molnar authored
please apply this patch (Robert ACK-ed it). While there is a preemptible kernel entry already, i think listing this at the scheduler entry is justified: preemption has a number of scheduler interactions.
-
Ingo Molnar authored
this is an updated version of the LDT fixes. It fixes the following kinds of problems:
  - fix a possible gcc optimization that causes a race, leading to the loading of a corrupt LDT descriptor upon context switch. [this fix got simplified over previous versions.]
  - remove an unconditional OOM printk, and there's no need to set ->size in the OOM path.
  - fix preemption bugs: load_LDT()/clear_LDT() was not preemption-safe when it was used outside of spinlocks.
the context-switch race is the following. 'LDT modification' is the following operation: the seg->ldt pointer is modified, then seg->size is modified. In theory gcc is free to reschedule the two modifications, and first modify ->size, then ->ldt. Thus if this modification is not synchronized with context-switches, another thread might see a temporary state of the new ->size [which was increased], but still the old pointer. I.e.:
  CPU0                          CPU1
  pc->size = newsize;
                                load_LDT(); // (oldptr, newsize)
  pc->ldt = newptr;
the corrupt LDT is loaded until the SMP cross-call is sent, leaving the window open for many usecs. the fix is to put a wmb() after ->ldt modifications. [this is also in preparation of not-write-ordered SMP x86 designs.]
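The ordering requirement above (publish the new pointer before the larger size) can be sketched in portable C11. This is an analogy under stated assumptions, not the kernel code: `struct ldt_ctx`, `grow_ldt()`, and `snapshot_ldt()` are hypothetical names, and a C11 release fence plays the role that wmb() plays in the patch. The reader side uses a matching acquire fence, which the C11 memory model needs for the guarantee to hold.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-context LDT state. */
struct ldt_ctx {
    _Atomic(void *) ldt;
    atomic_size_t   size;
};

/* Writer: store the new table pointer, then the larger size. */
static void grow_ldt(struct ldt_ctx *pc, void *newptr, size_t newsize)
{
    assert(newptr != NULL);
    atomic_store_explicit(&pc->ldt, newptr, memory_order_relaxed);
    /* Stand-in for wmb(): the pointer store above may not be
     * reordered after the size store below. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&pc->size, newsize, memory_order_relaxed);
}

/* Reader: a caller that observes the new size also observes the
 * new pointer. (New pointer with old size is possible and harmless
 * when the table only grows.) */
static void snapshot_ldt(struct ldt_ctx *pc, void **ptr, size_t *size)
{
    *size = atomic_load_explicit(&pc->size, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    *ptr = atomic_load_explicit(&pc->ldt, memory_order_relaxed);
}
```

Without the fence in `grow_ldt()`, the compiler or CPU would be free to make the size store visible first, which is the (oldptr, newsize) state shown in the CPU0/CPU1 diagram.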
-
Linus Torvalds authored
bk://linux-input.bkbits.net/linux-input into home.transmeta.com:/home/torvalds/v2.5/linux
-
Vojtech Pavlik authored
some mainboards (Andrew Morton's Dell) report that even everything is okay with AUX. Also remove a check for very old AMI i8042's, which could generate false positives on modern buggy mainboards.
-
Linus Torvalds authored
bk://jfs.bkbits.net/linux-2.5 into home.transmeta.com:/home/torvalds/v2.5/linux
-
Peter Wächtler authored
-
Peter Wächtler authored
-
Peter Wächtler authored
-
Peter Wächtler authored
-
Peter Wächtler authored
-
Peter Wächtler authored
-
Peter Wächtler authored
-
Peter Wächtler authored
-
Peter Wächtler authored
-
Peter Wächtler authored
-
Peter Wächtler authored
-
Peter Wächtler authored
-
Peter Wächtler authored
-
Peter Wächtler authored
-
Peter Wächtler authored
-
Peter Wächtler authored
-
Peter Wächtler authored
-
Peter Wächtler authored
-
Peter Wächtler authored
-