- 29 Oct, 2002 16 commits
-
-
Andrew Morton authored
There's no need to take down pagecache after performing direct-IO reads from a file or a blockdevice. And when using direct access to a blockdev which has a filesystem mounted it creates unnecessary disturbance of filesystem activity.
-
Andrew Morton authored
From Ingo. Performance optimization: do not kill threads in the same thread group as the OOM-ing thread. (It's still necessary to scan over every thread though, as it's possible to have CLONE_VM threads in a different thread group, and we do not want those to escape the OOM-kill.) Also take care not to let newly created child threads slip out of the group-kill. Note that the 2.4 kernel's OOM handler has the same problem, and it could be the reason why forkbombs occasionally slip out of the OOM kill.
-
Andrew Morton authored
shrink_slab() wants to calculate

	nr_scanned_pages * seeks_per_object * entries_in_slab / nr_lru_pages

entries_in_slab and nr_lru_pages can vary a lot. There is a potential for 32-bit overflows. I spent ages trying to avoid corner cases which cause a significant lack of precision while preserving some clarity. Gave up and used do_div(). The code is called rarely - at most once per 128 kbytes of reclaim.

The patch adds a tweak to balance_pgdat() to reduce the call rate to shrink_slab() in the case where the zone is just a little bit below pages_high.

Also increase SHRINK_BATCH. The things we're shrinking are typically a few hundred bytes, and a batchcount of 128 gives us a minimum of ten pages or so per shrinking callout.
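The overflow concern can be illustrated in plain userspace C. This is only a sketch of the arithmetic, not the kernel code: the helper name `slab_scan_target` and its parameter types are hypothetical, an ordinary 64-bit division stands in for do_div(), and it assumes the three-way product fits in 64 bits.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel's shrink_slab()): evaluate
 *   nr_scanned * seeks * entries / nr_lru_pages
 * using a 64-bit intermediate so the product cannot wrap at 32 bits.
 * Assumption: the full product still fits in 64 bits.
 */
static uint64_t slab_scan_target(uint32_t nr_scanned, uint32_t seeks,
                                 uint32_t entries, uint32_t nr_lru_pages)
{
    uint64_t delta;

    if (nr_lru_pages == 0)      /* avoid dividing by zero */
        return 0;

    delta = (uint64_t)nr_scanned * seeks;   /* widen before multiplying */
    delta *= entries;
    return delta / nr_lru_pages;            /* the kernel would use do_div() here */
}
```

With nr_scanned = 32, seeks = 2, entries = 1000000000 and nr_lru_pages = 1000, the intermediate product is 6.4e10, which does not fit in 32 bits but divides down to a small result.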
-
Andrew Morton authored
There's more work to do on these, for well-aligned copies. Arjan has some stuff for that. First step on that path is to clean the code up, get it uninlined and have a framework for making per-CPU-type decisions.
-
Andrew Morton authored
This patch speeds up copy_*_user for some Intel ia32 processors. It is based on work by Mala Anand.

It is a good win. Around 30% for all src/dest alignments except 32/32.

In this test a fully-cached one gigabyte file was read into an 8192-byte userspace buffer using read(fd, buf, 8192). The alignment of the user-side buffer was altered between runs. This is a PIII. Times are in seconds.

User buffer    2.5.41    2.5.41+ patch++
0x804c000       4.373     4.343
0x804c001      10.024     6.401
0x804c002      10.002     6.347
0x804c003      10.013     6.328
0x804c004      10.105     6.273
0x804c005      10.184     6.323
0x804c006      10.179     6.322
0x804c007      10.185     6.319
0x804c008       9.725     6.347
0x804c009       9.780     6.275
0x804c00a       9.779     6.355
0x804c00b       9.778     6.350
0x804c00c       9.723     6.351
0x804c00d       9.790     6.307
0x804c00e       9.790     6.289
0x804c00f       9.785     6.294
0x804c010       9.727     6.277
0x804c011       9.779     6.251
0x804c012       9.783     6.246
0x804c013       9.786     6.245
0x804c014       9.772     6.063
0x804c015       9.919     6.237
0x804c016       9.920     6.234
0x804c017       9.918     6.237
0x804c018       9.846     6.372
0x804c019      10.060     6.294
0x804c01a      10.049     6.328
0x804c01b      10.041     6.337
0x804c01c       9.931     6.347
0x804c01d      10.013     6.273
0x804c01e      10.020     6.346
0x804c01f      10.016     6.356
0x804c020       4.442     4.366

So `rep;movsl' is slower at all non-cache-aligned offsets.

PII is using the PIII alignment. I don't have a PII any more, but I do recall that it demonstrated the same behaviour as the PIII.

The patch contains an enhancement (based on careful testing) from Hirokazu Takahashi <taka@valinux.co.jp>. In cases where source and dest have the same alignment, but that alignment is poor, we do a short copy of a few bytes to bring the two pointers onto a favourable boundary and then do the big copy.

And also a bugfix from Hirokazu Takahashi.

As an added bonus, this patch decreases the kernel text by 28 kbytes. 22k of this is in .text and the rest in __ex_table. I'm not really sure why .text shrunk so much.

These copy routines have no special-case for constant-sized copies. So a lot of uaccess.h becomes dead code with this patch.
The next patch which uninlines the copy_*_user functions cleans all that up and saves an additional 5k.
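The "short copy, then big copy" idea from the copy_*_user message above can be sketched in portable C. This is an illustration only, not the patch's assembly: the function name `copy_with_head_align` and the 8-byte `COPY_ALIGN` boundary are assumptions, and memcpy() stands in for the bulk copy loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COPY_ALIGN 8u   /* assumed "favourable" boundary, for illustration */

/*
 * Hypothetical sketch: when source and destination share the same
 * (poor) alignment, copy a few head bytes so both pointers land on a
 * COPY_ALIGN boundary, then hand the rest to the bulk copy.
 */
static void *copy_with_head_align(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t dmis = (size_t)((uintptr_t)d % COPY_ALIGN);
    size_t smis = (size_t)((uintptr_t)s % COPY_ALIGN);

    if (dmis == smis && dmis != 0) {
        size_t head = COPY_ALIGN - dmis;    /* bytes up to the boundary */
        size_t i;

        if (head > n)
            head = n;
        for (i = 0; i < head; i++)          /* the short copy */
            d[i] = s[i];
        d += head;
        s += head;
        n -= head;
    }
    memcpy(d, s, n);                        /* the big copy */
    return dst;
}
```

When the two pointers have different misalignments no head copy can fix both, so the sketch simply falls through to the bulk copy.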
-
Andrew Morton authored
From Rik. "this trivial patch, against 2.5-current, exports nr_running and nr_iowait_tasks in /proc/stat. With this patch in vmstat will no longer need to walk all the processes in the system just to determine the number of running and blocked processes."
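A tool such as vmstat could then pick the counters out of the /proc/stat text instead of walking every process. The sketch below is not vmstat's code; the helper name `stat_field` is hypothetical, and the key names `procs_running` and `procs_blocked` are an assumption based on what current kernels print, since the message above does not give the exact field names.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find a line in `text` that starts with `key`
 * followed by a space, and return the number after it. Returns -1 if
 * the key is absent or has no numeric value.
 */
static long stat_field(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *p = text;

    while (p && *p) {
        if (strncmp(p, key, klen) == 0 && p[klen] == ' ') {
            long v;

            if (sscanf(p + klen, "%ld", &v) == 1)
                return v;
            return -1;
        }
        p = strchr(p, '\n');    /* advance to the next line */
        if (p)
            p++;
    }
    return -1;
}
```

In use, the caller would read /proc/stat into a buffer with fopen() and fread(), then call stat_field(buf, "procs_running").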
-
Andrew Morton authored
When performing lookups against very sparse trees radix_tree_gang_lookup fails to find nodes "far" to the right of the start point. Because it only understands sparseness in the leaf nodes, not the intermediate nodes. Nobody noticed this because all callers are incrementing the start index as they walk the tree. Change it to terminate the search when it really has inspected the last possible node for the current tree's height.
-
Andrew Morton authored
Sort-of-but-not-really from Hugh Dickins. We're doing a lot of buslocked operations in the page allocator just for debug. Plus when they _do_ trigger, there are so many BUG_ONs in there that it's rather hard to work out from user reports which one actually triggered. So redo all that and also print out some more useful info about the page state before taking the machine out. (And yes, we need to take the machine out. Incorrect page handling in there can cause file corruption).
-
Andrew Morton authored
Provide a function in core kernel to initialise a file_ra_state structure. Previously this was all taken care of by the fact that new struct files are all zeroed out. But now a file_ra_state may be independently allocated, and we don't want users of it to have to know how to initialise it.
-
Andrew Morton authored
Mainly from Badari Pulavarty. Traditionally we have only supported O_DIRECT I/O at an alignment and granularity which matches the underlying filesystem. That typically means that all IO must be 4k-aligned and a multiple of 4k in size. Here, we relax that so that direct I/O happens with (typically) 512-byte alignment and multiple-of-512-byte size. The tricky part is when a write starts and/or ends partway through a filesystem block which has just been added. We need to zero out the parts of that block which lie outside the written region. We handle that by putting appropriately-sized parts of the ZERO_PAGE into separate BIOs. The generic_direct_IO() function has been changed so that the filesystem must pass in the address of the block_device against which the IO is to be performed. I'd have preferred to not do this, but we do need that info at that time so that alignment checks can be performed. If the filesystem passes in a NULL block_device pointer then we fall back to the old behaviour - must align with the fs blocksize. There is no trivial way for userspace to know what the minimum alignment is - it depends on what bdev_hardsect_size() says about the device. It is _usually_ 512 bytes, but not always. This introduces the risk that someone will develop and test applications which work fine on their hardware, but will fail on someone else's hardware. It is possible to query the hardsect size using the BLKSSZGET ioctl against the backing block device. This can be performed at runtime or at application installation time.
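The BLKSSZGET query mentioned above might be used from an application roughly as follows. This is a Linux-only sketch: the helper names `query_sector_size` and `dio_request_ok` are hypothetical, and checking the buffer address as well as the offset and length is an assumption about what a careful application would verify, not something the message spells out.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKSSZGET */

/*
 * Ask the block device open on `fd` for its sector size in bytes.
 * Returns -1 if the ioctl fails (for example, `fd` is not a block device).
 */
static int query_sector_size(int fd)
{
    int size = 0;

    if (ioctl(fd, BLKSSZGET, &size) != 0)
        return -1;
    return size;
}

/*
 * Hypothetical pre-flight check for an O_DIRECT request: buffer address,
 * file offset and length must all be multiples of the sector size.
 */
static int dio_request_ok(const void *buf, long long offset, size_t len,
                          int sector_size)
{
    size_t ss;

    if (sector_size <= 0 || offset < 0)
        return 0;
    ss = (size_t)sector_size;
    return (uintptr_t)buf % ss == 0 &&
           (unsigned long long)offset % ss == 0 &&
           len % ss == 0;
}
```

An installer or startup routine could open the backing device, call query_sector_size() once, and use the result for every later alignment check rather than hard-coding 512.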
-
Andrew Morton authored
The direct IO code was initially designed to allocate a known-sized BIO, to fill it with pages and to then send it off. Then along came bio_add_page(). Really, it broke direct-io.c - it meant that the direct-IO BIO assembly code no longer had a-priori knowledge of whether a page would fit into the current BIO. Our attempts to rework the initial design to play well with bio_add_page() really weren't adequate. The code was getting more and more twisty and we kept finding corner-cases which failed. So this patch redesigns the BIO assembly and submission path of the direct-IO code so that it better suits the bio_add_page() semantics. It introduces another layer in the assembly phase: the 'cur_page' which is cached in the dio structure. The function which walks the file mapping do_direct_IO() simply emits a sequence of (page,offset,len,sector) quads into the next layer down - submit_page_section(). submit_page_section() is responsible for looking for a merge of the new quad against the previous page section (same page). If no merge is possible it passes the currently-cached page down to the next level, dio_send_cur_page(). dio_send_cur_page() will try to add the current page to the current BIO. If that fails, the current BIO is submitted for IO and we open a new one. So it's all nicely layered. The assembly of sections-of-page into the current page closely mirrors the assembly of sections-of-BIO into the current BIO. At both of these levels everything is done in a "deferred" manner: try to merge a new request onto the currently-cached one. If that fails then send the currently-cached request and then cache this one instead. Some variables have been renamed to more closely represent their usage. Some thought has been put into ownership of the various state variables within `struct dio'. We were updating and inspecting these in various places in a rather hard-to-follow manner. 
So things have been reworked so that particular functions "own" particular parts of the dio structure. Violators have been exterminated and commentary has been added to describe this ownership. The handling of file holes has been simplified. As a consequence of all this, the code is clearer and simpler than it used to be, and it now passes the modified-for-O_DIRECT fsx-linux testing again.
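The "deferred" pattern described above (try to merge onto the cached request, otherwise send the cached one and cache the new one) can be sketched at one level as follows. This is a simplified illustration, not direct-io.c: `struct section`, `struct assembler`, `submit_section` and `send_cur` are hypothetical names, a page is reduced to an integer id, the sector part of the quad is ignored, and "sending" just bumps a counter.

```c
#include <stddef.h>

/* One (page, offset, len) section; the real quad also carries a sector. */
struct section {
    int page;
    unsigned offset;
    unsigned len;
};

struct assembler {
    struct section cur;     /* the currently-cached section */
    int have_cur;           /* non-zero when `cur` is valid */
    int sent;               /* how many sections were passed down */
};

/* Pass the cached section to the next layer down (here: just count it). */
static void send_cur(struct assembler *a)
{
    if (a->have_cur) {
        a->sent++;
        a->have_cur = 0;
    }
}

/* Merge `s` onto the cached section if contiguous, else send and re-cache. */
static void submit_section(struct assembler *a, struct section s)
{
    if (a->have_cur && a->cur.page == s.page &&
        a->cur.offset + a->cur.len == s.offset) {
        a->cur.len += s.len;    /* merge: same page, contiguous bytes */
        return;
    }
    send_cur(a);                /* no merge possible: flush what we hold */
    a->cur = s;
    a->have_cur = 1;
}
```

The same shape would repeat one level down, with the cached BIO in place of the cached page and a final send_cur() call when the walk finishes.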
-
Andrew Morton authored
Two fixes here. First: Fixes a BUG() which occurs if you try to perform O_DIRECT IO against a blockdev which has an fs mounted on it. (We should be able to do that). What happens is that do_invalidatepage() ends up calling discard_buffer() on buffers which it couldn't strip. That clears buffer_mapped() against useful things like the superblock buffer_head. The next submit_bh() goes BUG over the write of an unmapped buffer. So just run try_to_release_page() (aka try_to_free_buffers()) on the invalidate path. Second: The invalidate_inode_pages() functions are best-effort pagecache shrinkers. They are used against pages inside i_size and are not supposed to throw away dirty data. However it is possible for another CPU to run set_page_dirty() against one of these pages after invalidate_inode_pages() has decided that it is clean. This could happen if someone was performing O_DIRECT IO against a file which was also mapped with MAP_SHARED. So recheck the dirty state of the page inside the mapping->page_lock and back out if the page has just been marked dirty. This will also prevent the remove_from_page_cache() BUG which will occur if someone marks the page dirty between the clear_page_dirty() and remove_from_page_cache() calls in truncate_complete_page().
-
Andrew Morton authored
simple_prepare_write() currently memsets the entire page. It only needs to clear the parts which are outside the to-be-written region. This change makes no difference to performance - that memset was just a cache preload for the copy_from_user() in generic_file_write(). But it's more correct. Also, mark the page dirty in simple_commit_write(), not in simple_prepare_write(). Because the page's contents are changed after prepare_write(). This doesn't matter in practice, but it is setting a bad example. Also, add a flush_dcache_page() to simple_prepare_write(). Again, not really needed because the page cannot be mapped into pagetables if it is not uptodate. But it is example code and should not be missing such things.
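Clearing only the parts of the page outside the to-be-written region amounts to two memsets around the [from, to) range. A minimal userspace sketch, assuming from <= to <= page_size; the function name `zero_outside` is hypothetical and the page is modelled as a plain byte array:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: zero the bytes of `page` before `from` and from
 * `to` onward, leaving [from, to) untouched for the caller to fill.
 * Assumes from <= to <= page_size.
 */
static void zero_outside(unsigned char *page, size_t page_size,
                         size_t from, size_t to)
{
    memset(page, 0, from);                  /* head: [0, from) */
    memset(page + to, 0, page_size - to);   /* tail: [to, page_size) */
}
```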
-
Andrew Morton authored
From Bill Irwin. Abstract out ramfs readpage(), prepare_write(), and commit_write() operations. Ram-backed filesystems are going to be doing a lot of zero-filled read and write operations. So in this patch, ramfs' implementations are moved to libfs in anticipation of other callers.
-
Andrew Morton authored
Patch from Hugh Dickins <hugh@veritas.com>. Fix premature -EIO from blkdev_get_block: have bdget initialize bd_block_size consistently with bd_inode->i_blkbits (assigned by new_inode). Otherwise, a subsequent set_blocksize can find that bd_block_size doesn't need updating and skip updating i_blkbits, leaving them inconsistent.
-
Andrew Morton authored
Local variable `data' is only used for debugging.
-
- 28 Oct, 2002 24 commits
-
-
Christoph Hellwig authored
Now that the devicemapper has hit the tree, there's no more reason to keep the uncompiling LVM1 code and its various hacks to other files around. This patch removes it.
-
Alexander Viro authored
* first application of the fact that block device methods are per-disk and not per-major - IDE subdrivers got block_device_operations of their own, redirects in ide.c are gone, so is a bunch of methods of IDE subdrivers.
-
Alexander Viro authored
* ide_..._ioctl() never use two of five arguments - inode and file. Arguments removed.
-
Alexander Viro authored
-
Alexander Viro authored
This chunk and the next one basically do the equivalent of sard the right way: counters are exported per-disk in driverfs, as attributes of disk or partition nodes.
-
Alexander Viro authored
-
Alexander Viro authored
* do_open() cleaned up
* we always pick block_device_operations from gendisk->fops now
* register_blkdev() just stores the name of driver, nothing more
* ->bd_op and ->bd_queue removed - we have that in gendisk
* get_blkfops() is gone
-
Alexander Viro authored
* we move allocation of gendisks in ide-probe to the moment when queues are set up, so everything that wants to feed requests in one of IDE queues can safely set ->rq_disk
-
Alexander Viro authored
-
Alexander Viro authored
-
Alexander Viro authored
-
Alexander Viro authored
* per-major array eliminated, every disk is a separate source of randomness
-
Alexander Viro authored
-
Alexander Viro authored
* remove blk_dev[]
* removed BLK_DEFAULT_QUEUE
* moved definition of CURRENT into drivers that used it
* removed definition of QUEUE from headers
-
Alexander Viro authored
* compile fixes
* switched to private queue
* set ->queue
-
Alexander Viro authored
* killed uses of CURRENT and QUEUE
-
Alexander Viro authored
* switched to private queues
* set ->queue
-
Alexander Viro authored
* killed remaining CURRENT
-
Alexander Viro authored
* switched to private queues
* set ->queue and ->private_data
* switched to use of ->bd_disk and ->rq_disk
* cleaned up
-
Alexander Viro authored
* switched to private queues
* set ->queue and ->private_data
* switched to use of ->bd_disk and ->rq_disk
* folded recalibrate[] and special_op[] into hd_info[]
* switched to passing pointers instead of indices
* cleaned up
-
Alexander Viro authored
* switched to private queues
* set ->queue
-
Alexander Viro authored
* switched to private queues
* set ->queue and ->private_data
* switched to use of ->bd_disk and ->rq_disk
* fixed the problem with request_module() from open()
* cleaned up
-
Alexander Viro authored
* switched to private queues
* set ->queue and ->private_data
* switched to use of ->bd_disk and ->rq_disk
* somewhat cleaned up
-
Alexander Viro authored
* switched to private queues
* set ->queue and ->private_data
* switched to use of ->bd_disk
-