1. 04 Jan, 2005 1 commit
    • [PATCH] VM routine fixes · 2954850e
      David Howells authored
      
      The attached patch fixes a number of problems in the VM routines:
      
       (1) Some inline funcs don't compile if CONFIG_MMU is not set.
      
       (2) swapper_pml4 needn't exist if CONFIG_MMU is not set.
      
       (3) __free_pages_ok() doesn't counter set_page_refs()'s different behaviour if
           CONFIG_MMU is not set.
      
       (4) swsusp.c invokes TLB flushing functions without including the header file
           that declares them.
      
      CONFIG_SHMEM semantics:
      
      - If MMU: Always enabled if !EMBEDDED
      
      - If MMU && EMBEDDED: configurable
      
      - If !MMU: disabled
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      2954850e
  2. 31 Aug, 2004 1 commit
  3. 23 Aug, 2004 1 commit
    • [PATCH] token based thrashing control · d4f9d02b
      Rik van Riel authored
      The following experimental patch implements token based thrashing
      protection, using the algorithm described in:
      
      	http://www.cs.wm.edu/~sjiang/token.htm
      
      
      
      When there are pageins going on, a task can grab a token that protects the
      task from pageout (except by itself) until it is no longer doing heavy
      pageins, or until the maximum hold time of the token is over.
      
      If the maximum hold time is exceeded, the task isn't eligible to hold the
      token again for a while, since it wasn't doing it much good anyway.
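      
      A minimal sketch of the idea (hypothetical helper names; the
      "not eligible again for a while" penalty is omitted):
      
      	/* One global token; it can only be stolen once the holder has
      	 * kept it past the maximum hold time. */
      	static struct mm_struct *token_mm;	/* current holder, NULL if free */
      	static unsigned long token_jiffies;	/* when it was grabbed */
      	#define TOKEN_TIMEOUT	(300 * HZ)
      
      	void grab_swap_token(struct mm_struct *mm)	/* called on heavy pagein */
      	{
      		if (token_mm == NULL ||
      		    time_after(jiffies, token_jiffies + TOKEN_TIMEOUT)) {
      			token_mm = mm;
      			token_jiffies = jiffies;
      		}
      	}
      
      	/* Page reclaim skips pages belonging to the holder, unless the
      	 * holder itself is the one doing the scanning. */
      	int protected_by_token(struct mm_struct *mm, struct mm_struct *scanner)
      	{
      		return mm == token_mm && scanner != token_mm;
      	}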
      
      I have run a very unscientific benchmark on my system to test the
      effectiveness of the patch, timing how long a 230MB two-process qsbench run
      takes, with and without the token thrashing protection present.
      
      normal 2.6.8-rc6:	6m45s
      2.6.8-rc6 + token:	4m24s
      
      This is a quick hack, implemented without having talked to the inventor of
      the algorithm.  He's copied on the mail and I suspect we'll be able to do
      better than my quick implementation ...
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      d4f9d02b
  4. 22 May, 2004 2 commits
    • [PATCH] rmap 17: real prio_tree · 2fe9c14c
      Andrew Morton authored
      From: Hugh Dickins <hugh@veritas.com>
      
      Rajesh Venkatasubramanian's implementation of a radix priority search tree of
      vmas, to handle object-based reverse mapping corner cases well.
      
      Amongst the objections to object-based rmap were test cases by akpm and by
      mingo, in which large numbers of vmas mapping disjoint or overlapping parts of
      a file showed strikingly poor performance of the i_mmap lists.  Perhaps those
      tests are irrelevant in the real world?  We cannot be too sure: the prio_tree
      is well-suited to solving precisely that problem, so unless it turns out to
      bring too much overhead, let's include it.
      
      Why is this prio_tree.c placed in mm rather than lib?  See GET_INDEX: this
      implementation is geared throughout to use with vmas, though the first half of
      the file appears more general than the second half.
      
      Each node of the prio_tree is itself (contained within) a vma: might save
      memory by allocating distinct nodes from which to hang vmas, but wouldn't save
      much, and would complicate the usage with preallocations.  Off each node of
      the prio_tree itself hangs a list of like vmas, if any.
      
      The connection from node to list is a little awkward, but probably the best
      compromise: it would be more straightforward to list likes directly from the
      tree node, but that would use more memory per vma, for the list_head and to
      identify that head.  Instead, node's shared.vm_set.head points to next vma
      (whose shared.vm_set.head points back to node vma), and that next contains the
      list_head from which the rest hang - reusing fields already used in the
      prio_tree node itself.
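      
      A rough sketch of the arrangement just described (illustrative only;
      the real patch packs these fields into the existing "shared" union of
      vm_area_struct):
      
      	struct prio_tree_node {
      		struct prio_tree_node	*left, *right, *parent;
      		unsigned long		start_pgoff;	/* radix index */
      		unsigned long		last_pgoff;	/* heap index  */
      		struct vm_area_struct	*head;		/* first vma with the same range */
      	};
      
      	/* That first "like" vma points back at the node vma through its own
      	 * head pointer and carries the list_head from which any further
      	 * like vmas hang, so no extra per-vma storage is needed. */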
      
      Currently lacks prefetch: Rajesh hopes to add some soon.
      2fe9c14c
    • [PATCH] numa api: Core NUMA API code · d3b8924a
      Andrew Morton authored
      From: Andi Kleen <ak@suse.de>
      
      The following patches add support for configurable NUMA memory policy
      for user processes. It is based on the proposal from last kernel summit
      with feedback from various people.
      
      This NUMA API doesn't attempt to implement page migration or anything
      else complicated: all it does is police the allocation when a page
      is first allocated or when a page is reallocated after swapping. Currently
      only shared memory and anonymous memory are supported; policy for
      file based mappings is not implemented yet (although they get implicitly
      policied by the default process policy).
      
      It adds three new system calls: mbind to change the policy of a VMA,
      set_mempolicy to change the policy of a process, and get_mempolicy to
      retrieve the memory policy. User tools (numactl, libnuma, test programs,
      manpages) can be found in  ftp://ftp.suse.com/pub/people/ak/numa/numactl-0.6.tar.gz
      
      For details on the system calls see the manpages
      http://www.firstfloor.org/~andi/mbind.html
      http://www.firstfloor.org/~andi/set_mempolicy.html
      http://www.firstfloor.org/~andi/get_mempolicy.html
      Most user programs should actually not use the system calls directly,
      but use the higher level functions in libnuma
      (http://www.firstfloor.org/~andi/numa.html) or the command line tools
      (http://www.firstfloor.org/~andi/numactl.html).
      
      The system calls allow user programs and administrators to set various NUMA memory
      policies for putting memory on specific nodes. Here is a short description
      of the policies, copied from the kernel patch:
      
       * NUMA policy allows the user to give hints in which node(s) memory should
       * be allocated.
       *
       * Support four policies per VMA and per process:
       *
       * The VMA policy has priority over the process policy for a page fault.
       *
       * interleave     Allocate memory interleaved over a set of nodes,
       *                with normal fallback if it fails.
       *                For VMA based allocations this interleaves based on the
       *                offset into the backing object or offset into the mapping
       *                for anonymous memory. For process policy a per-process
       *                counter is used.
       * bind           Only allocate memory on a specific set of nodes,
       *                no fallback.
       * preferred      Try a specific node first before normal fallback.
       *                As a special case node -1 here means do the allocation
       *                on the local CPU. This is normally identical to default,
       *                but useful to set in a VMA when you have a non default
       *                process policy.
       * default        Allocate on the local node first, or when on a VMA
       *                use the process policy. This is what Linux always did
       *                in a NUMA aware kernel and still does by, ahem, default.
       *
       * The process policy is applied for most non interrupt memory allocations
       * in that process' context. Interrupts ignore the policies and always
       * try to allocate on the local CPU. The VMA policy is only applied for memory
       * allocations for a VMA in the VM.
       *
       * Currently there are a few corner cases in swapping where the policy
       * is not applied, but the majority should be handled. When process policy
       * is used it is not remembered over swap outs/swap ins.
       *
       * Only the highest zone in the zone hierarchy gets policied. Allocations
       * requesting a lower zone just use default policy. This implies that
       * on systems with highmem, kernel lowmem allocations don't get policied.
       * Same with GFP_DMA allocations.
       *
       * For shmfs/tmpfs/hugetlbfs shared memory the policy is shared between
       * all users and remembered even when nobody has memory mapped.
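      
      For illustration, a user program might use the libnuma <numaif.h>
      wrappers roughly like this (simplified nodemask handling, no error
      checking, assumes at least two nodes):
      
      	#include <numaif.h>
      	#include <sys/mman.h>
      
      	int main(void)
      	{
      		unsigned long both = (1UL << 0) | (1UL << 1);	/* nodes 0 and 1 */
      		unsigned long node0 = 1UL << 0;
      		size_t len = 16 * 4096;
      
      		/* Process policy: interleave new allocations across nodes 0 and 1. */
      		set_mempolicy(MPOL_INTERLEAVE, &both, sizeof(both) * 8);
      
      		/* VMA policy: bind this anonymous mapping to node 0, no fallback. */
      		void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
      				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      		mbind(mem, len, MPOL_BIND, &node0, sizeof(node0) * 8, 0);
      		return 0;
      	}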
      
      
      
      
      This patch:
      
      This is the core NUMA API code. This includes NUMA policy aware
      wrappers for get_free_pages and alloc_page_vma(). On non NUMA kernels
      these are defined away.
      
      The system calls mbind (see http://www.firstfloor.org/~andi/mbind.html),
      get_mempolicy (http://www.firstfloor.org/~andi/get_mempolicy.html) and
      set_mempolicy (http://www.firstfloor.org/~andi/set_mempolicy.html) are
      implemented here.
      
      Adds a vm_policy field to the VMA and to the process. The process
      also has a field for interleaving. VMA interleaving uses the offset
      into the VMA, but that's not possible for process allocations.
      
      From: Andi Kleen <ak@muc.de>
      
        > Andi, how come policy_vma() calls ->set_policy under i_shared_sem?
      
        I think this can be actually dropped now.  In an earlier version I did
        walk the vma shared list to change the policies of other mappings to the
        same shared memory region.  This turned out too complicated with all the
        corner cases, so I eventually gave in and added ->get_policy to the fast
        path.  Also there is still the mmap_sem which prevents races in the same MM.
         
      
        Patch to remove it attached.  Also adds documentation and removes the
        bogus __alloc_page_vma() prototype noticed by hch.
      
      From: Andi Kleen <ak@suse.de>
      
        A few incremental fixes for NUMA API.
      
        - Fix a few comments
      
        - Add a compat_ function for get_mem_policy.  I considered changing the
          ABI to avoid this, but that would have made the API too ugly.  I put it
          directly into the file because a mm/compat.c didn't seem worth it just for
          this.
      
        - Fix the algorithm for VMA interleave.
      
      From: Matthew Dobson <colpatch@us.ibm.com>
      
        1) Move the extern of alloc_pages_current() into #ifdef CONFIG_NUMA.
          The only references to the function are in NUMA code in mempolicy.c
      
        2) Remove the definitions of __alloc_page_vma().  They aren't used.
      
        3) Move forward declaration of struct vm_area_struct to top of file.
      d3b8924a
  5. 12 Apr, 2004 1 commit
    • [PATCH] hugetlb consolidation · c8b976af
      Andrew Morton authored
      From: William Lee Irwin III <wli@holomorphy.com>
      
      The following patch consolidates redundant code in various hugetlb
      implementations.  I took the liberty of renaming a few things, since the
      code was all moved anyway, and it has the benefit of helping to catch
      missed conversions and/or consolidations.
      c8b976af
  6. 05 Sep, 2003 1 commit
    • [PATCH] Unpinned futexes v2: indexing changes · 968f11a8
      Jamie Lokier authored
      This changes the way futexes are indexed, so that they don't pin pages. 
      It also fixes some bugs with private mappings and COW pages.
      
      Currently, all futexes look up the page at the userspace address and pin
      it, using the pair (page,offset) as an index into a table of waiting
      futexes.  Any page with a futex waiting on it remains pinned in RAM,
      which is a problem when many futexes are used, especially with FUTEX_FD.
      
      Another problem is that the page is not always the correct one, if it
      can be changed later by a COW (copy on write) operation.  This can
      happen when waiting on a futex without writing to it after fork(),
      exec() or mmap(), if the page is then written to before attempting to
      wake a futex at the same address.
      
      There are two symptoms of the COW problem:
       - The wrong process can receive wakeups
       - A process can fail to receive required wakeups. 
      
      This patch fixes both by changing the indexing so that VM_SHARED
      mappings use the triple (inode,offset,index), and private mappings use
      the pair (mm,virtual_address).
      
      The former correctly handles all shared mappings, including tmpfs and
      therefore all kinds of shared memory (IPC shm, /dev/shm and
      MAP_ANON|MAP_SHARED).  This works because every mapping which is
      VM_SHARED has an associated non-zero vma->vm_file, and hence inode.
      (This is ensured in do_mmap_pgoff, where it calls shmem_zero_setup). 
      
      The latter handles all private mappings, both files and anonymous.  It
      isn't affected by COW, because it doesn't care about the actual pages,
      just the virtual address.
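      
      Roughly, the two kinds of key look like this (an illustrative sketch,
      not the exact structure from the patch):
      
      	union futex_key {
      		struct {				/* VM_SHARED mappings */
      			struct inode	*inode;		/* backing inode          */
      			unsigned long	pgoff;		/* page index in the file */
      			int		offset;		/* offset within the page */
      		} shared;
      		struct {				/* private mappings */
      			struct mm_struct *mm;		/* owning address space */
      			unsigned long	  uaddr;	/* virtual address      */
      		} private;
      	};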
      
      The patch has a few bonuses:
      
              1. It removes the vcache implementation, as only futexes were
                 using it, and they don't any more.
      
              2. Removing the vcache should make COW page faults a bit faster.
      
              3. Futex operations no longer take the page table lock, walk
                 the page table, fault in pages that aren't mapped in the
                 page table, or do a vcache hash lookup - they are mostly a
                 simple offset calculation with one hash for the futex
                 table.  So they should be noticeably faster.
      
      Special thanks to Hugh Dickins, Andrew Morton and Rusty Russell for
      insightful feedback.  All suggestions are included.
      968f11a8
  7. 04 Feb, 2003 1 commit
    • [PATCH] implement posix_fadvise64() · fccbe384
      Andrew Morton authored
      An implementation of posix_fadvise64().  It adds 368 bytes to my vmlinux and
      is worth it.
      
      I didn't bother doing posix_fadvise(), as userspace can implement that by
      calling fadvise64().
      
      The main reason for wanting this syscall is to provide userspace with the
      ability to explicitly shoot down pagecache when streaming large files.  This
      is what O_STREAMING does, only posix_fadvise() is standards-based, and harder
      to use.
      
      posix_fadvise() also subsumes sys_readahead().
      
      POSIX_FADV_WILLNEED will generally provide asynchronous readahead semantics
      for small amounts of I/O, as long as things like indirect blocks are already
      in core.
      
      POSIX_FADV_RANDOM gives unprivileged applications a way of disabling
      readahead on a per-fd basis, which may provide some benefit for super-seeky
      access patterns such as databases.
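      
      For illustration, the advice values discussed above are used from C
      roughly as follows (three independent examples, error handling omitted):
      
      	#include <fcntl.h>
      
      	void examples(int fd, off_t file_len)
      	{
      		/* Kick off asynchronous readahead of the first megabyte. */
      		posix_fadvise(fd, 0, 1 << 20, POSIX_FADV_WILLNEED);
      
      		/* Database-style access: turn readahead off for this fd. */
      		posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
      
      		/* After streaming the file, drop its pagecache. */
      		posix_fadvise(fd, 0, file_len, POSIX_FADV_DONTNEED);
      	}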
      
      
      
      The POSIX_FADV_* values are already implemented in glibc, and this patch
      ensures that they are in sync.
      
      A test app (fadvise.c) is available in ext3 CVS.  See
      
      	http://www.zip.com.au/~akpm/linux/ext3/
      
      for CVS details.
      
      Ulrich has reviewed this patch (thanks).
      fccbe384
  8. 03 Feb, 2003 1 commit
  9. 02 Dec, 2002 1 commit
  10. 01 Dec, 2002 1 commit
  11. 03 Nov, 2002 1 commit
    • [PATCH] make swap code conditional · abcb2f16
      Christoph Hellwig authored
      Make the swap code conditional on CONFIG_SWAP.  This is mostly for
      uClinux, but !CONFIG_SWAP compiles and boots fine for i386, too -
      the only problem I've seen is that X doesn't start; it's probably
      shm-related, thus it's disabled unconditionally for "normal" arches.
      
      The patch makes three files in mm/ conditional on CONFIG_SWAP, reorganizes
      include/linux/swap.h big time to provide stubs for the !CONFIG_SWAP case,
      moves the remaining /proc/swaps code to swapfile.c and cleans up some
      more MM code to compile fine without CONFIG_SWAP.
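      
      The reorganization follows the usual stub pattern, sketched here with a
      couple of illustrative prototypes (the real set of stubs is larger):
      
      	/* include/linux/swap.h, roughly: */
      	#ifdef CONFIG_SWAP
      	extern void swap_free(swp_entry_t entry);
      	extern int swap_duplicate(swp_entry_t entry);
      	#else
      	/* !CONFIG_SWAP: callers still compile, the calls become no-ops. */
      	static inline void swap_free(swp_entry_t entry) { }
      	static inline int swap_duplicate(swp_entry_t entry) { return 0; }
      	#endif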
      abcb2f16
  12. 31 Oct, 2002 1 commit
    • [PATCH] sys_remap_file_pages · d16dc20c
      Andrew Morton authored
      Ingo's remap_file_pages patch.  Supported on ia32, x86-64, sparc
      and sparc64.  Others will need to update mman.h and the syscall
      tables.
      d16dc20c
  13. 08 Oct, 2002 1 commit
    • [PATCH] free_area_init cleanup · 5b73f882
      Andrew Morton authored
      From Christoph Hellwig.
      
      If we always pass &contig_page_data into free_area_init_node for the
      non-discontig case we can merge both versions of that function into
      one.  Move that one to page_alloc.c and thus kill numa.c, which was
      totally misnamed, btw.
      5b73f882
  14. 03 Oct, 2002 1 commit
    • [PATCH] truncate/invalidate_inode_pages rewrite · 735a2573
      Andrew Morton authored
      Rewrite these functions to use gang lookup.
      
      - This probably has similar performance to the old code in the common case.
      
      - It will be vastly quicker than current code for the worst case
        (single-page truncate).
      
      - invalidate_inode_pages() has been changed.  It used to use
        page_count(page) as the "is it mapped into pagetables" heuristic.  It
        now uses the (page->pte.direct != 0) heuristic.
      
      - Removes the worst cause of scheduling latency in the kernel.
      
      - It's a big code cleanup.
      
      - invalidate_inode_pages() has been changed to take an address_space
        *, not an inode *.
      
      - the maximum hold times for mapping->page_lock are enormously reduced,
        making it quite feasible to turn this into an irq-safe lock.  Which, it
        seems, is a requirement for sane AIO<->direct-io integration, as well
        as possibly other AIO things.
      
      (Thanks Hugh for fixing a bug in this one as well).
      
      (Christoph added some stuff too)
      735a2573
  15. 27 Sep, 2002 1 commit
    • [PATCH] virtual => physical page mapping cache · 7c2149e9
      Ingo Molnar authored
      Implement a "mapping change" notification for virtual lookup caches, and
      make the futex code use that to keep the futex page pinning consistent
      across copy-on-write events in the VM space.
      7c2149e9
  16. 18 Sep, 2002 1 commit
  17. 17 Sep, 2002 1 commit
  18. 19 Jul, 2002 1 commit
    • [PATCH] minimal rmap · c48c43e6
      Andrew Morton authored
      This is the "minimal rmap" patch, written by Rik, ported to 2.5 by Craig
      Kulesa.
      
      Basically,
      
      before: When the page reclaim code decides that it has scanned too many
      unreclaimable pages on the LRU it does a scan of process virtual
      address spaces for pages to add to swapcache.  ptes pointing at the
      page are unmapped as the scan proceeds.  When all ptes referring to a
      page have been unmapped and it has been written to swap the page is
      reclaimable.
      
      after: When an anonymous page is encountered on the tail of the LRU we
      use the rmap to see if it hasn't been referenced lately.  If so then
      add it to swapcache.  When the page is again encountered on the LRU, if
      it is still unreferenced then try to unmap all ptes which refer to it
      in one hit, and if it is clean (ie: on swap) then free it.
      
      The rest of the VM - list management, the classzone concept, etc
      remains unchanged.
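      
      Conceptually, the reverse mapping added here looks like this (an
      illustrative sketch; the real structures and locking differ in detail):
      
      	/* Every mapped page carries a chain of the ptes that map it. */
      	struct pte_chain {
      		struct pte_chain *next;
      		pte_t		 *ptep;		/* one pte mapping the page */
      	};
      
      	/* struct page grows a pte_chain pointer, so try_to_unmap(page) can
      	 * walk the chain and clear each pte in one pass, instead of
      	 * scanning every mm in the system looking for mappings. */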
      
      There are a number of things which the per-page pte chain could be
      used for.  Bill Irwin has identified the following.
      
      
      (1)  page replacement no longer goes around randomly unmapping things
      
       (2)  referenced bits are more accurate because there aren't several ms
               or even seconds between finding the multiple ptes mapping a page
      
      (3)  reduces page replacement from O(total virtually mapped) to O(physical)
      
      (4)  enables defragmentation of physical memory
      
      (5)  enables cooperative offlining of memory for friendly guest instance
              behavior in UML and/or LPAR settings
      
      (6)  demonstrable benefit in performance of swapping which is common in
              end-user interactive workstation workloads (I don't like the word
              "desktop"). c.f. Craig Kulesa's post wrt. swapping performance
      
      (7)  evidence from 2.4-based rmap trees indicates approximate parity
              with mainline in kernel compiles with appropriate locking bits
      
       (8)  partitioning of physical memory can reduce the complexity of page
               replacement searches by scanning only the "interesting" zones
               (implemented and merged in 2.4-based rmap)
       
       (9)  partitioning of physical memory can increase the parallelism of page
               replacement searches by independently processing different zones
               (implemented, but not merged, in 2.4-based rmap)
      
      (10) the reverse mappings may be used for efficiently keeping pte cache
              attributes coherent
      
      (11) they may be used for virtual cache invalidation (with changes)
      
       (12) the reverse mappings enable proper RSS limit enforcement
               (implemented and merged in 2.4-based rmap)
      
      
      
      The code adds a pointer to struct page, consumes additional storage for
      the pte chains and adds computational expense to the page reclaim code
      (I measured it at 3% additional load during streaming I/O).  The
      benefits which we get back for all this are, I must say, theoretical
      and unproven.  If it has real advantages (or, indeed, disadvantages)
      then why has nobody demonstrated them?
      
      
      
      There are a number of things remaining to be done:
      
      1: Demonstrate the above advantages.
      
      2: Make it work with pte-highmem  (Bill Irwin is signed up for this)
      
      3: The don't-add-pte_chains-to-non-shared-pages optimisation (Dave
         McCracken's patch does this)
      
      4: Move the pte_chains into highmem too (Bill, I guess)
      
      5: per-cpu pte_chain freelists (Rik?)
      
      6: maybe GC the pte_chain backing pages. (Seems unavoidable.  Rik?)
      
      7: multithread the page reclaim code.  (I have patches).
      
      8: clustered add-to-swap.  Not sure if I buy this.  anon pages are
         often well-ordered-by-virtual-address on the LRU, so it "just
         works" for benchmarky loads.  But there may be some other loads...
      
      9: Fix bad IO latency in page reclaim (I have lame patches)
      
      10: Develop tuning tools, use them.
      
      11: The nightly updatedb run is still evicting everything.
      c48c43e6
  19. 30 Apr, 2002 1 commit
    • [PATCH] writeback from address spaces · 090da372
      Andrew Morton authored
      [ I reversed the order in which writeback walks the superblock's
        dirty inodes.  It sped up dbench's unlink phase greatly.  I'm
        such a sleaze ]
      
      The core writeback patch.  Switches file writeback from the dirty
      buffer LRU over to address_space.dirty_pages.
      
      - The buffer LRU is removed
      
      - The buffer hash is removed (uses blockdev pagecache lookups)
      
      - The bdflush and kupdate functions are implemented against
        address_spaces, via pdflush.
      
      - The relationship between pages and buffers is changed.
      
        - If a page has dirty buffers, it is marked dirty
        - If a page is marked dirty, it *may* have dirty buffers.
        - A dirty page may be "partially dirty".  block_write_full_page
          discovers this.
      
      - A bunch of consistency checks of the form
      
      	if (!something_which_should_be_true())
      		buffer_error();
      
        have been introduced.  These fog the code up but are important for
        ensuring that the new buffer/page code is working correctly.
      
      - New locking (inode.i_bufferlist_lock) is introduced for exclusion
        from try_to_free_buffers().  This is needed because set_page_dirty
        is called under spinlock, so it cannot lock the page.  But it
        needs access to page->buffers to set them all dirty.
      
        i_bufferlist_lock is also used to protect inode.i_dirty_buffers.
      
      - fs/inode.c has been split: all the code related to file data writeback
        has been moved into fs/fs-writeback.c
      
      - Code related to file data writeback at the address_space level is in
        the new mm/page-writeback.c
      
      - try_to_free_buffers() is now non-blocking
      
      - Switches vmscan.c over to understand that all pages with dirty data
        are now marked dirty.
      
      - Introduces a new a_op for VM writeback:
      
      	->vm_writeback(struct page *page, int *nr_to_write)
      
        this is a bit half-baked at present.  The intent is that the address_space
        is given the opportunity to perform clustered writeback: to opportunistically
        write out disk-contiguous dirty data which may be in other zones, and to let
        delayed-allocate filesystems get good disk layout.  (A sketch of how a
        filesystem might supply this op follows at the end of this entry.)
      
      - Added address_space.io_pages.  Pages which are being prepared for
        writeback.  This is here for two reasons:
      
        1: It will be needed later, when BIOs are assembled direct
           against pagecache, bypassing the buffer layer.  It avoids a
           deadlock which would occur if someone moved the page back onto the
           dirty_pages list after it was added to the BIO, but before it was
           submitted.  (hmm.  This may not be a problem with PG_writeback logic).
      
        2: Avoids a livelock which would occur if some other thread is continually
           redirtying pages.
      
      - There are two known performance problems in this code:
      
        1: Pages which are locked for writeback cause undesirable
           blocking when they are being overwritten.  A patch which leaves
           pages unlocked during writeback comes later in the series.
      
        2: While inodes are under writeback, they are locked.  This
           causes namespace lookups against the file to get unnecessarily
           blocked in wait_on_inode().  This is a fairly minor problem.
      
           I don't have a fix for this at present - I'll fix this when I
           attach dirty address_spaces direct to super_blocks.
      
      - The patch vastly increases the amount of dirty data which the
        kernel permits highmem machines to maintain.  This is because the
        balancing decisions are made against the amount of memory in the
        machine, not against the amount of buffercache-allocatable memory.
      
        This may be very wrong, although it works fine for me (2.5 gigs).
      
        We can trivially go back to the old-style throttling with
        s/nr_free_pagecache_pages/nr_free_buffer_pages/ in
        balance_dirty_pages().  But better would be to allow blockdev
        mappings to use highmem (I'm thinking about this one, slowly).  And
        to move writer-throttling and writeback decisions into the VM (modulo
        the file-overwriting problem).
      
      - Drops 24 bytes from struct buffer_head.  More to come.
      
      - There's some gunk like super_block.flags:MS_FLUSHING which needs to
        be killed.  Need a better way of providing collision avoidance
        between pdflush threads, to prevent more than one pdflush thread
        working a disk at the same time.
      
        The correct way to do that is to put a flag in the request queue to
        say "there's a pdflush thread working this disk".  This is easy to
        do: just generalise the "ra_pages" pointer to point at a struct which
        includes ra_pages and the new collision-avoidance flag.
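      
      The sketch promised above of how a filesystem might supply
      ->vm_writeback (the "myfs" names are made up, and only the simplest
      possible strategy is shown):
      
      	static int myfs_writepage(struct page *page);	/* the normal writepage */
      
      	static int myfs_vm_writeback(struct page *page, int *nr_to_write)
      	{
      		/* A smarter filesystem would also push out disk-contiguous
      		 * neighbouring dirty pages here; just write the one page. */
      		(*nr_to_write)--;
      		return myfs_writepage(page);
      	}
      
      	static struct address_space_operations myfs_aops = {
      		.writepage	= myfs_writepage,
      		.vm_writeback	= myfs_vm_writeback,
      	};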
      090da372
  20. 10 Apr, 2002 2 commits
    • [PATCH] writeback daemons · 1ed704e9
      Andrew Morton authored
      This patch implements a gang-of-threads which are designed to
      be used for dirty data writeback. "pdflush" -> dirty page
      flush, or something.
      
      The number of threads is dynamically managed by a simple
      demand-driven algorithm.
      
      "Oh no, more kernel threads".  Don't worry, kupdate and
      bdflush disappear later.
      
      The intent is that no two pdflush threads are ever performing
      writeback against the same request queue at the same time.
      It would be wasteful to do that.  My current patches don't
      quite achieve this; I need to move the state into the request
      queue itself...
      
      The driver for implementing the thread pool was to avoid the
      possibility that bdflush gets stuck on one device's get_request_wait()
      queue while lots of other disks sit idle.  Also generality,
      abstraction, and the need to have something in place to perform
      the address_space-based writeback when the buffer_head-based
      writeback disappears.
      
      There is no provision inside the pdflush code itself to prevent
      many threads from working against the same device.  That's
      the responsibility of the caller.
      
      The main API function, `pdflush_operation()' attempts to find
      a thread to do some work for you.  It is not reliable - it may
      return -1 and say "sorry, I didn't do that".  This happens if
      all threads are busy.
      
      One _could_ extend pdflush_operation() to queue the work so that
      it is guaranteed to happen.  If there's a need, that additional
      minor complexity can be added.
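      
      An illustrative caller, assuming the prototype is roughly
      pdflush_operation(void (*fn)(unsigned long), unsigned long arg0):
      
      	static void background_work(unsigned long arg)
      	{
      		printk(KERN_DEBUG "pdflush callback ran (arg=%lu)\n", arg);
      		/* ...the real writeback work would go here... */
      	}
      
      	void kick_background_work(void)
      	{
      		if (pdflush_operation(background_work, 0) < 0)
      			background_work(0);	/* all threads busy: do it ourselves */
      	}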
      1ed704e9
    • [PATCH] readahead · 8fa49846
      Andrew Morton authored
      I'd like to be able to claim amazing speedups, but
      the best benchmark I could find was diffing two
      256 megabyte files, which is about 10% quicker.  And
      that is probably due to the window size being effectively
      50% larger.
      
      Fact is, any disk worth owning nowadays has a segmented
      2-megabyte cache, and OS-level readahead mainly seems
      to save on CPU cycles rather than overall throughput.
      Once you start reading more streams than there are segments
      in the disk cache we start to win.
      
      Still.  The main motivation for this work is to
      clean the code up, and to create a central point at
      which many pages are marshalled together so that
      they can all be encapsulated into the smallest possible
      number of BIOs, and injected into the request layer.
      
      A number of filesystems were poking around inside the
      readahead state variables.  I'm not really sure what they
      were up to, but I took all that out.  The readahead
      code manages its own state autonomously and should not
      need any hints.
      
      - Unifies the current three readahead functions (mmap reads, read(2)
        and sys_readahead) into a single implementation.
      
      - More aggressive in building up the readahead windows.
      
      - More conservative in tearing them down.
      
      - Special start-of-file heuristics.
      
      - Preallocates the readahead pages, to avoid the (never demonstrated,
        but potentially catastrophic) scenario where allocation of readahead
        pages causes the allocator to perform VM writeout.
      
      - Gets all the readahead pages gathered together in
        one spot, so they can be marshalled into big BIOs.
      
      - reinstates the readahead ioctls, so hdparm(8) and blockdev(8)
        are working again.  The readahead settings are now per-request-queue,
        and the drivers never have to know about it.  I use blockdev(8).
        It works in units of 512 bytes.
      
      - Identifies readahead thrashing.
      
        Also attempts to handle it.  Certainly the changes here
        delay the onset of catastrophic readahead thrashing by
        quite a lot, and decrease its seriousness as we get more
        deeply into it, but it's still pretty bad.
      8fa49846
  21. 08 Mar, 2002 2 commits
  22. 19 Feb, 2002 1 commit
    • [PATCH] new struct page shrinkage · e5191c50
      Rik van Riel authored
      The patch has been changed like you wanted, with page->zone
      shoved into page->flags. I've also pulled the thing up to
      your latest changes from linux.bkbits.net so you should be
      able to just pull it into your tree from:
      
      Rik
      e5191c50
  23. 05 Feb, 2002 5 commits
    • v2.5.1 -> v2.5.1.1 · 0925bad3
      Linus Torvalds authored
      - me: revert the "kill(-1..)" change.  POSIX isn't that clear on the
      issue anyway, and the new behaviour breaks things.
      - Jens Axboe: more bio updates
      - Al Viro: rd_load cleanups. hpfs mount fix, mount cleanups
      - Ingo Molnar: more raid updates
      - Jakub Jelinek: fix Linux/x86 confusion about arg passing of "save_v86_state" and "do_signal"
      - Trond Myklebust: fix NFS client race conditions
      0925bad3
    • v2.5.0.9 -> v2.5.0.10 · 80044607
      Linus Torvalds authored
      - Jens Axboe: more bio stuff
      - Ingo Molnar: mempool for bio
      - Niibe Yutaka: Super-H update
      80044607
    • v2.4.13 -> v2.4.13.1 · 980adcb2
      Linus Torvalds authored
        - Michael Warfield: computone serial driver update
        - Alexander Viro: cdrom module race fixes
        - David Miller: Acenic driver fix
        - Andrew Grover: ACPI update
        - Kai Germaschewski: ISDN update
        - Tim Waugh: parport update
        - David Woodhouse: JFFS garbage collect sleep
      980adcb2
    • v2.4.6.6 -> v2.4.6.7 · 74f5133b
      Linus Torvalds authored
        - Andreas Dilger: various ext2 cleanups
        - Richard Gooch: devfs update
        - Johannes Erdfelt: USB updates
        - Alan Cox: merges
        - David Miller: fix SMP pktsched bootup deadlock (CONFIG_NET_SCHED)
        - Roman Zippel: AFFS update
        - Anton Altaparmakov: NTFS update
        - me: fix races in vfork() (semaphores are not good completion handlers)
        - Jeff Garzik: net driver updates, sysvfs update
      74f5133b
    • Import changeset · 7a2deb32
      Linus Torvalds authored
      7a2deb32