- 04 Jan, 2005 1 commit
Jay Lan authored
This patch offers a common data collection method for memory usage accounting, for use by accounting packages including BSD accounting, ELSA, CSA, and any other acct package that uses a common layer of data collection. New struct fields are added to mm_struct to save high watermarks of RSS usage as well as virtual memory usage, and new struct fields are added to task_struct to collect accumulated RSS and VM usage. These data are collected on a per-process basis.
Signed-off-by: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
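A minimal sketch of the watermark bookkeeping this describes; the struct and helper names here are illustrative assumptions, not taken from the patch:

    /* Sketch: track high watermarks of RSS and VM usage (names assumed). */
    struct mm_usage {
        unsigned long rss, total_vm;   /* current usage, in pages */
        unsigned long hiwater_rss;     /* peak RSS ever observed */
        unsigned long hiwater_vm;      /* peak virtual size ever observed */
    };

    static inline void update_mem_hiwater(struct mm_usage *mm)
    {
        if (mm->rss > mm->hiwater_rss)
            mm->hiwater_rss = mm->rss;       /* called on allocation paths */
        if (mm->total_vm > mm->hiwater_vm)
            mm->hiwater_vm = mm->total_vm;
    }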
- 03 Jan, 2005 2 commits
Andrew Morton authored
mark_page_accessed() is more heavyweight than we need: the page is already headed for the active list, so setting the software-referenced bit is equivalent.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
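A before/after sketch of the substitution (the surrounding context is invented for illustration; the patch's actual call site is in mm/memory.c):

    /* before: full accessed-bit bookkeeping */
    mark_page_accessed(page);

    /* after: the page is headed for the active list anyway, so the
     * software referenced bit alone carries the same information */
    SetPageReferenced(page);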
Dave Hansen authored
People love to do comparisons with highmem_start_page. However, where CONFIG_HIGHMEM=y and there is no actual highmem, there's no real page at *highmem_start_page. That's usually not a problem, but CONFIG_NONLINEAR is a bit more strict and catches the bogus address translations. There are about a gillion different ways to find out if a 'struct page' is highmem or not. Why not just check page_flags? Just use PageHighMem() wherever there used to be a highmem_start_page comparison. Then, kill off highmem_start_page. This removes more code than it adds, and gets rid of some nasty #ifdefs in .c files.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
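An illustrative before/after (do_highmem_thing() is a made-up placeholder):

    /* before: pointer comparison against a page that may not exist */
    if (page >= highmem_start_page)
        do_highmem_thing(page);

    /* after: ask the page's own flags */
    if (PageHighMem(page))
        do_highmem_thing(page);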
- 02 Jan, 2005 1 commit
Nick Piggin authored
Fix a 4-level page table bug that slipped through (introduced by me, not Andi). Compiles and boots on ia64 and 2-level i386.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- 01 Jan, 2005 4 commits
Nick Piggin authored
Add a temporary "fallback" header so architectures can run with the 4level pagetables patch without modification. All architectures should be converted to use the folding headers (include/asm-generic/pgtable-nop?d.h) as soon as possible, and the fallback header removed. Make all architectures include the fallback header except i386, because that architecture was earlier converted to use pgtable-nopmd.h under the 3-level system, which is not compatible with the fallback header.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
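For contrast, the folding approach the message recommends looks roughly like this - an approximation of include/asm-generic/pgtable-nopud.h, not a quotation:

    /* A folded pud is just a view of the pgd entry. */
    typedef struct { pgd_t pgd; } pud_t;

    #define PUD_SHIFT    PGDIR_SHIFT
    #define PTRS_PER_PUD 1

    static inline pud_t *pud_offset(pgd_t *pgd, unsigned long address)
    {
        return (pud_t *)pgd;    /* descend "into" the folded level */
    }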
Andi Kleen authored
Extend the Linux MM to 4-level page tables. This is the core patch for mm/*, fs/*, include/linux/*. It breaks all architectures, which will be fixed in separate patches. The conversion is quite straightforward: all the functions walking the page table hierarchy have been changed to deal with another level at the top. The additional level is called pml4. mm/memory.c has changed a lot because it did most of the heavy lifting here; most of the changes there are extensions of the previous code.
Signed-off-by: Andi Kleen <ak@suse.de>
Converted by Nick Piggin to use the pud_t 'page upper' level between pgd and pmd instead of Andi's pml4 level above pgd.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
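After this change the canonical page-table walk gains one step. A simplified sketch using the era's API (locking and allocation omitted):

    static pte_t *walk_to_pte(struct mm_struct *mm, unsigned long addr)
    {
        pgd_t *pgd = pgd_offset(mm, addr);
        pud_t *pud;
        pmd_t *pmd;

        if (pgd_none(*pgd) || pgd_bad(*pgd))
            return NULL;
        pud = pud_offset(pgd, addr);        /* the new top-level step */
        if (pud_none(*pud) || pud_bad(*pud))
            return NULL;
        pmd = pmd_offset(pud, addr);
        if (pmd_none(*pmd) || pmd_bad(*pmd))
            return NULL;
        return pte_offset_map(pmd, addr);
    }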
Nick Piggin authored
Rename clear_page_tables to clear_page_range. clear_page_range takes byte ranges and aggressively frees page table pages. Maybe useful to control page table memory consumption on 4-level architectures (and even 3-level ones). Possible downsides are:
- flush_tlb_pgtables gets called more often (only a problem for sparc64 AFAIKS).
- the opportunistic "expand to fill PGDIR_SIZE hole" logic that ensures something actually gets done under the old system is still in place. This could sometimes make unmapping small regions more inefficient. There are some other solutions to look at if this turns out to be the case, though.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andi Kleen authored
Split copy_page_range into the usual set of page table walking functions. Needed to handle the complexity when moving to 4 levels.
Signed-off-by: Andi Kleen <ak@suse.de>
Split out from Andi Kleen's 4level patch by Nick Piggin.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- 28 Dec, 2004 2 commits
Keith Owens authored
Add in_gate_area_no_task() for use in places where no task is valid (e.g. kallsyms). If you have a valid task, use in_gate_area() as before.
Signed-off-by: Keith Owens <kaos@ocs.com.au>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
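A plausible shape for the new helper, sketched under the assumption that the generic version just checks a fixed gate address range (FIXADDR_USER_* is one arch family's naming; per-arch bodies differ):

    static inline int in_gate_area_no_task(unsigned long addr)
    {
        /* No task/mm needed: only the fixed gate range is consulted. */
        return addr >= FIXADDR_USER_START && addr < FIXADDR_USER_END;
    }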
Keith Owens authored
x86-64 has special case code for in_gate_area(), but it is not clean.
* Replace CONFIG_ARCH_GATE_AREA with __HAVE_ARCH_GATE_AREA. ARCH_GATE_AREA is not a config option.
* The definitions of get_gate_vma() and in_gate_area() are identical in include/asm-x86_64/page.h and include/linux/mm.h. Fold the duplicate definitions into include/linux/mm.h.
This does not affect kallsyms directly; the patch just creates a clean base for patch 2.
Signed-off-by: Keith Owens <kaos@ocs.com.au>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
- 19 Nov, 2004 1 commit
Hugh Dickins authored
With Andrea Arcangeli <andrea@novell.com>.
Fix a hang which occurs when mlock() encounters a mapping of /dev/mem. These have VM_IO set. follow_page() keeps returning zero (not a valid pfn) and handle_mm_fault() keeps returning VM_FAULT_MINOR (there's a pte there), so get_user_pages() locks up. The patch changes get_user_pages() to just bail out when it hits a VM_IO region, and make_pages_present() is taught to ignore the resulting -EFAULT. We still have two bugs:
a) If a process has a VM_IO vma, get_user_pages() will bail early, without having considered the vmas at higher virtual addresses. As do_mlock() also walks the vma list this bug is fairly benign, but get_user_pages() is doing the wrong thing there.
b) The `len' argument to get_user_pages() should be long, not int. We presently have a 16TB limit on 64-bit.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
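The bail-out, sketched in the shape of the get_user_pages() loop (simplified from the description; i is the count of pages pinned so far):

    /* inside the per-vma loop of get_user_pages() */
    if (vma->vm_flags & VM_IO)
        return i ? i : -EFAULT;    /* partial count, or error if none pinned */

make_pages_present() then treats that -EFAULT as "nothing to do" rather than as a failure.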
- 15 Nov, 2004 1 commit
Andi Kleen authored
The current kernel oopses on x86-64 when gdb steps into the vsyscall page. This patch fixes it. I also removed the bogus NULL checks of *_offset and replaced them with proper _none checks. I made them BUGs because vsyscall pages should always be mapped.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- 28 Oct, 2004 2 commits
Arjan van de Ven authored
After William did the remap_pfn_range change, a very common pattern became:
    page = page_to_pfn(vmalloc_to_page((void *)pos));
    if (remap_pfn_range(vma, start, page, PAGE_SIZE, PAGE_SHARED)) {
The patch below adds a very simple helper, vmalloc_to_pfn(), to simplify this a bit.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
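The helper is essentially that pattern folded into one call; a sketch matching the description:

    unsigned long vmalloc_to_pfn(void *vmalloc_addr)
    {
        return page_to_pfn(vmalloc_to_page(vmalloc_addr));
    }

so the caller above collapses to remap_pfn_range(vma, start, vmalloc_to_pfn((void *)pos), PAGE_SIZE, PAGE_SHARED).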
Hugh Dickins authored
The third "shared" field of /proc/$pid/statm in 2.4 was a count of pages in the mm whose page_count is more than 1 (oddly, including pages shared just with swapcache). That's too costly to calculate each time, so 2.6 changed it to the total file-backed extent. But Andrea knows of apps and users surprised when (rss - shared) goes negative: we need to provide an rss-like statistic, close to the 2.4 interpretation. Something that's quick and easy to maintain accurately is mm->anon_rss, the count of anonymous pages in the mm. Then shared = rss - anon_rss gives a pretty good and meaningful approximation of 2.4's intention: wli confirms that this will be useful to Oracle too. Where to show it? I think it's best to treat this as a bugfix and show it in the third field of /proc/$pid/statm, after resident, as before - there's no evidence that the total file-backed extent was found useful. Albert would like other fields to revert to page counts, but that's a lot harder: if mprotect can change the category of a page, then it can't be accounted as simply as this. Only go that route if real need is shown.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: William Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
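The exported relation, sketched (mm->rss and mm->anon_rss per the description above):

    /* third field of /proc/$pid/statm */
    unsigned long shared = mm->rss - mm->anon_rss;  /* ~ file-backed resident pages */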
- 21 Oct, 2004 1 commit
Linus Torvalds authored
VM_IO tells the rest of the world that the pages may have side effects on reads/writes etc, and VM_RESERVED historically told swap-out not to bother with it.
- 20 Oct, 2004 3 commits
Andrew Morton authored
Remove unneeded preempt_disable/enable; pte_offset_map/unmap already does that.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
William Lee Irwin III authored
This patch introduces remap_pfn_range(), destined to replace remap_page_range(), to which all callers of remap_page_range() are converted in the sequel.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins authored
Let's lighten the global spinlock mmlist_lock. What's it for?
1. Its original role is to guard mmlist.
2. It later got a second role, to prevent get_task_mm from raising mm_users from the dead, just after it went down to 0.
Firstly consider the second: __exit_mm sets tsk->mm NULL while holding task_lock before calling mmput, so mmlist_lock only guards against the exceptional case of get_task_mm on a kernel workthread which did AIO's use_mm (which transiently sets its tsk->mm without raising mm_users) on an mm now exiting. Well, I don't think get_task_mm should succeed at all on use_mm tasks. It's mainly used by /proc/pid and ptrace, and it seems at best confusing for those to present the kernel thread as having a user mm, which it won't have a moment later. Define PF_BORROWED_MM, set in use_mm, clear in unuse_mm (though we could just leave it), and have get_task_mm give NULL if it is set.
Secondly consider the first: what's mmlist for?
1. Its original role was for swap_out to scan: rmap ended that in 2.5.27.
2. In 2.4.10 it got a second role, for try_to_unuse to scan for swapoff.
So, make mmlist a list of mms which may have pages on swap: add mm to mmlist when the first swap entry is assigned in try_to_unmap_one (pageout) or in copy_page_range (fork); and have mmput remove it from mmlist as before, except it is usually list_empty and there's no need to lock. drain_mmlist is added to swapoff, to empty out the mmlist if no swap is then in use. mmput leaves mm on mmlist until after its exit_mmap, so try_to_unmap_one can still add mm to mmlist without worrying about the mm_users 0 case; but try_to_unuse must avoid the mm_users 0 case (when an mm might be removed from mmlist, and freed, while it's down in unuse_process): use atomic_inc_return, now that all architectures support it. Some of the detailed comments in try_to_unuse have grown out of date: updated and trimmed some, but left SWAP_MAP_MAX for another occasion.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
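A sketch of get_task_mm() under this scheme - close to the described behaviour, but not quoted from the patch:

    struct mm_struct *get_task_mm(struct task_struct *task)
    {
        struct mm_struct *mm;

        task_lock(task);
        mm = task->mm;
        if (mm) {
            if (task->flags & PF_BORROWED_MM)
                mm = NULL;      /* use_mm() borrower: not a real user mm */
            else
                atomic_inc(&mm->mm_users);
        }
        task_unlock(task);
        return mm;
    }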
- 19 Oct, 2004 1 commit
William Lee Irwin III authored
Andi Kleen requested that the number of pagetable pages in use by a process be reported in /proc/$PID/status; this patch implements that.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
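One plausible form of the report; the VmPTE field name, the nr_ptes counter, and the assumption of one page per pte table are all guesses here, not confirmed by this log:

    /* in the /proc/$PID/status printer */
    len += sprintf(buffer + len, "VmPTE:\t%8lu kB\n",
                   mm->nr_ptes * PAGE_SIZE / 1024);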
- 18 Oct, 2004 1 commit
Roland McGrath authored
POSIX specifies that the limit settings provided by getrlimit/setrlimit are shared by the whole process, not specific to individual threads. This patch changes the behavior of those calls to comply with POSIX.

I've moved the struct rlimit array from task_struct to signal_struct, as it has the correct sharing properties. (This reduces kernel memory usage per thread in multithreaded processes by around 100/200 bytes for 32/64-bit machines respectively.)

I took a fairly minimal approach to the locking issues with the newly shared struct rlimit array. It turns out that all the code that checks limits really just needs to look at one word at a time (one rlim_cur field, usually). It's only the few places like getrlimit itself (and fork) that require atomicity in accessing a whole struct rlimit, so I just used a spin lock for them and no locking for most of the checks. If it turns out that readers of struct rlimit need more atomicity where they are now cheap, or less overhead where they are now atomic (e.g. fork), then seqcount is certainly the right thing to use for them instead of readers taking the spin lock. Though it's in signal_struct, I didn't use siglock, since the access to rlimits never needs to disable irqs and doesn't overlap with other siglock uses. Instead of adding something new, I overloaded task_lock(task->group_leader) for this; it is used for other things that are not likely to happen simultaneously with limit tweaking. To me that seems preferable to adding a word, but it would be trivial (and arguably cleaner) to add a separate lock for these users (or e.g. just use seqlock, which adds two words but is optimal for readers).

Most of the changes here are just the trivial s/->rlim/->signal->rlim/.

I stumbled across what must be a long-standing bug, in reparent_to_init. It does:
    memcpy(current->rlim, init_task.rlim, sizeof(*(current->rlim)));
when surely it was intended to be:
    memcpy(current->rlim, init_task.rlim, sizeof(current->rlim));
As rlim is an array, the * in the sizeof expression gets the size of the first element, so this just changes the first limit (RLIMIT_CPU). This is for kernel threads, where it's clear that resetting all the rlimits is what you want. With that fixed, the setting of RLIMIT_FSIZE in nfsd is superfluous, since it will now already have been reset to RLIM_INFINITY.

The other subtlety is removing:
    tsk->rlim[RLIMIT_CPU].rlim_cur = RLIM_INFINITY;
in exit_notify, which was there to avoid a race signalling during self-reaping exit. As the limit is now shared, a dying thread should not change it for others. Instead, I avoid that race by checking current->state before the RLIMIT_CPU check. (Adding one new conditional in that path is now required one way or another, since if not for this check there would also be a new race with self-reaping exit later on clearing current->signal that would have to be checked for.)

The one loose end left by this patch is with process accounting. do_acct_process temporarily resets the RLIMIT_FSIZE limit while writing the accounting record. I left this as it was, but it is now changing a limit that might be shared by other threads still running. I left it in a dubious state because it seems to me that process accounting may already be in a generally dubious state when it comes to NPTL threads. I would think you would want one record per process, with aggregate data about all threads that ever lived in it, not a separate record for each thread. I don't use process accounting myself, but if anyone is interested in testing it out I could provide a patch to change it this way.

One final note: this is not 100% POSIX compliance with regard to rlimits. POSIX specifies that RLIMIT_CPU refers to a whole process in aggregate, not to each individual thread. I will provide patches later on to achieve that change, assuming this patch goes in first.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
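The sizeof pitfall in isolation, as a standalone illustration (the struct layout and RLIM_NLIMITS value are simplified stand-ins):

    #include <string.h>

    struct rlimit { long rlim_cur, rlim_max; };
    #define RLIM_NLIMITS 11

    static struct rlimit dst[RLIM_NLIMITS], src[RLIM_NLIMITS];

    static void copy_limits(void)
    {
        /* Bug: '*' dereferences, so sizeof yields ONE element -
         * only the first limit (RLIMIT_CPU) gets copied. */
        memcpy(dst, src, sizeof(*dst));

        /* Intended: sizeof the whole array copies every limit. */
        memcpy(dst, src, sizeof(dst));
    }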
- 02 Sep, 2004 1 commit
Dave Jones authored
It can return NULL, so check for it. Spotted with the source checker from Coverity.com.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- 23 Aug, 2004 2 commits
Rik van Riel authored
The following experimental patch implements token-based thrashing protection, using the algorithm described in: http://www.cs.wm.edu/~sjiang/token.htm When there are pageins going on, a task can grab a token that protects the task from pageout (except by itself) until it is no longer doing heavy pageins, or until the maximum hold time of the token is over. If the maximum hold time is exceeded, the task isn't eligible to hold the token for a while longer, since it wasn't doing it much good anyway. I have run a very unscientific benchmark on my system to test the effectiveness of the patch, timing how long a 230MB two-process qsbench run takes, with and without the token thrashing protection present.
    normal 2.6.8-rc6:  6m45s
    2.6.8-rc6 + token: 4m24s
This is a quick hack, implemented without having talked to the inventor of the algorithm. He's copied on the mail and I suspect we'll be able to do better than my quick implementation.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
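A very rough sketch of the token idea; the names and the MAX_TOKEN_HOLD constant are invented for illustration, and the real patch certainly differs:

    #define MAX_TOKEN_HOLD (HZ * 300)      /* assumed tunable */

    static struct mm_struct *token_mm;     /* current token holder */
    static unsigned long token_expires;

    static void grab_swap_token(struct mm_struct *mm)
    {
        /* take the token if free, or if the holder overstayed */
        if (!token_mm || time_after(jiffies, token_expires)) {
            token_mm = mm;
            token_expires = jiffies + MAX_TOKEN_HOLD;
        }
    }

    static int has_swap_token(struct mm_struct *mm)
    {
        return mm == token_mm;             /* pageout leaves this mm alone */
    }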
Rajesh Venkatasubramanian authored
Currently we have:
    while ((vma = vma_prio_tree_next(vma, root, &iter, begin, end)) != NULL)
        do_something_with(vma);
Then iter, root, begin, and end are all transferred unchanged to various functions. This patch hides them in struct iter instead. It slightly lessens source size, code size, and stack usage. The patch compiles and was tested lightly.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Rajesh Venkatasubramanian <vrajesh@umich.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- 28 Jul, 2004 1 commit
David Mosberger authored
Changeset roland@redhat.com[torvalds]|ChangeSet|20040624165002|30880 inadvertently broke ia64, because the patch assumed that pgd_offset_k() is just an optimization of pgd_offset(), which it is not. This patch fixes the problem by introducing pgd_offset_gate(). On architectures on which the gate area lives in the user's address space, this should be aliased to pgd_offset(); on architectures on which the gate area lives in the kernel-mapped segment, this should be aliased to pgd_offset_k(). This bug was found and tracked down by Peter Chubb.
Signed-off-by: David Mosberger <davidm@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
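The aliasing described, sketched as a macro; the config test shown is hypothetical, since in reality each architecture chooses its own alias:

    #ifdef GATE_AREA_IN_KERNEL_SEGMENT      /* hypothetical switch */
    #define pgd_offset_gate(mm, addr)   pgd_offset_k(addr)
    #else
    #define pgd_offset_gate(mm, addr)   pgd_offset(mm, addr)
    #endif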
- 29 Jun, 2004 2 commits
Dave Hansen authored
While writing some analysis tools for memory hot-remove, we came across a single page which had a ->count that always increased, without bound. It always turned out to be the zero page, and it was caused by a leaked reference in some do_wp_page() code that ends up avoiding PG_reserved pages. Basically what happens is that page_cache_release()/put_page() ignore PG_reserved pages, while page_cache_get()/get_page() go ahead and take the reference. So, each time there's a COW fault on the zero page, you get a leaked page->count increment. It's pretty rare to have a COW fault on anything that's PG_reserved; in fact, I can't think of anything else this applies to other than the zero page. In any case, the bug doesn't cause any real problems, but it is a bit of an annoyance and is obviously incorrect. We've been running with this patch for about 3 months now, and haven't run into any problems with it.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
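The asymmetry that leaks, in two lines (a sketch of the behaviour described):

    get_page(page);    /* takes the reference, even on PG_reserved pages */
    put_page(page);    /* ignores PG_reserved pages - the count never drops */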
Linus Torvalds authored
I don't think we're in K&R any more, Toto. If you want a NULL pointer, use NULL. Don't use an integer. Most of the users really didn't seem to know the proper type.
- 27 Jun, 2004 2 commits
Hugh Dickins authored
We permanently hold the i_sem of swapfiles so that nobody can accidentally ftruncate them, causing subsequent filesystem destruction. Problem is, it's fairly easy for things like backup applications to get stuck on the swapfile, sleeping until someone does a swapoff. So take all that out again and add a new S_SWAPFILE inode flag. Test that in the truncate path and refuse to truncate an in-use swapfile. Synchronisation between swapon and truncate is via i_sem.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
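A sketch of the truncate-path test; the exact errno returned is an assumption here, not stated in this log:

    /* in the truncate path */
    if (IS_SWAPFILE(inode))
        return -ETXTBSY;    /* refuse to truncate an active swapfile */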
Martin J. Bligh authored
From: Andy Whitcroft <apw@shadowen.org>
This patch eliminates the false hole which can form between ZONE_NORMAL and ZONE_HIGHMEM. This is most easily seen when the 4g/4g split is enabled, but it's always broken; we just happen not to hit it most of the time. Basically, the patch changes the allocation of the numa remap regions (the source of the holes) such that they officially fall within VMALLOC space, where they belong. Tested in -mjb for a couple of months, and again against 2.6.7-mm1.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Martin J. Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- 24 Jun, 2004 2 commits
Roland McGrath authored
When I made get_user_pages support looking up a pte for the "gate" area, I assumed it would be part of the kernel's fixed mappings. On x86-64 running a 32-bit task, the 32-bit vsyscall DSO page still has no vma but has its pte allocated in the user mm in the normal fashion. This patch makes it use the generic page-table lookup calls rather than the shortcuts. With this, ptrace on x86-64 can access a 32-bit process's vsyscall page. The behavior on x86 is unchanged.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton authored
From: Hugh Dickins <hugh@veritas.com>
zap_pte_range is wasting time marking anon pages accessed: its original !PageSwapCache test should have been reinstated when page_mapping was changed to return swapper_space; or, more simply, just check !PageAnon.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
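The cheaper test, sketched; its exact placement inside zap_pte_range is assumed:

    if (pte_young(ptent) && !PageAnon(page))
        mark_page_accessed(page);    /* anon pages skip the bookkeeping */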
- 05 Jun, 2004 1 commit
Hugh Dickins authored
The follow_page write-access case is relying on pte_page before checking pfn_valid: rearrange that - and we don't need three struct page *pages. (I notice mempolicy.c's verify_pages is also relying on pte_page, but I'll leave that to Andi: maybe it ought to be failing on, or skipping over, VM_IO or VM_RESERVED vmas?)
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- 26 May, 2004 1 commit
Linus Torvalds authored
ptep_establish() is used to establish a new mapping at COW time, and it always replaces a non-writable page mapping with a totally new page mapping that is dirty (and likely writable, although ptrace may cause a non-writable new mapping). Because the old mapping was non-writable, we don't have to worry about losing concurrent dirty-bit updates. ptep_update_access_flags() leaves the same page mapping, but updates the accessed/dirty/writable bits (it only ever sets them, and never removes any permissions). That is often easier, but it may race with a dirty-bit update on another CPU. Booted on x86 and ppc64.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- 25 May, 2004 1 commit
Linus Torvalds authored
Preparation for the pte update race fix. This does not actually use the information yet, but the next few patches will start to put it to some good use.
- 22 May, 2004 7 commits
Andrew Morton authored
From: Hugh Dickins <hugh@veritas.com>
Andrea Arcangeli's anon_vma object-based reverse mapping scheme for anonymous pages. Instead of tracking anonymous pages by pte_chains or by mm, this tracks them by vma. But because vmas are frequently split and merged (particularly by mprotect), a page cannot point directly to its vma(s), but instead to an anon_vma list of those vmas likely to contain the page - a list on which vmas can easily be linked and unlinked as they come and go. The vmas on one list are all related, either by forking or by splitting.

This has three particular advantages over anonmm: it can cope effortlessly with mremap moves; it no longer needs page_table_lock to protect an mm's vma tree, since try_to_unmap finds vmas via page -> anon_vma -> vma instead of using find_vma; and it should use less cpu for swapout, since it can locate its anonymous vmas more quickly.

It does have disadvantages too: a lot more change in mmap.c to deal with anon_vmas, though these are small straightforward additions now that the vma merging has been refactored there; more lowmem needed for each anon_vma and vma structure; and an additional restriction on the merging of vmas (they cannot be merged if already assigned different anon_vmas, since then their pages will be pointing to different heads).

(There would be no need to enlarge the vma structure if anonymous pages belonged only to anonymous vmas; but private file mappings accumulate anonymous pages by copy-on-write, so they need to be listed in both anon_vma and prio_tree at the same time. A different implementation could avoid that by using anon_vmas only for purely anonymous vmas, and use the existing prio_tree to locate cow pages - but that would involve a long search for each single private copy, probably not a good idea.)

Where before the vm_pgoff of a purely anonymous (not file-backed) vma was meaningless, it now represents the virtual start address at which that vma is mapped - which the standard file pgoff manipulations treat linearly as vmas are split and merged. But if mremap moves the vma, then it generally carries its original vm_pgoff to the new location, so pages shared with the old location can still be found. Magic.

Hugh has massaged it somewhat: building on the earlier rmap patches, this patch is a fifth of the size of Andrea's original anon_vma patch. Please note that this posting will be his first sight of this patch, which he may or may not approve.
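The core object is tiny; a sketch of the linkage described above, with field names assumed rather than quoted from the patch:

    struct anon_vma {
        spinlock_t lock;          /* protects the list below */
        struct list_head head;    /* vmas related by fork or split */
    };
    /* Each vma then carries an anon_vma pointer plus a list node; each
     * anonymous page's ->mapping points at the anon_vma while ->index
     * holds its virtual pgoff, letting rmap compute the page's address
     * inside every listed vma. */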
Andrew Morton authored
From: Hugh Dickins <hugh@veritas.com>
Before moving on to anon_vma rmap, remove now what's peculiar to anonmm rmap: the anonmm handling and the mremap move cows. Temporarily reduce page_referenced_anon and try_to_unmap_anon to stubs, so a kernel built with this patch will not swap anonymous pages at all.
Andrew Morton authored
From: Hugh Dickins <hugh@veritas.com>
Silly final patch for anonmm rmap: change page_add_anon_rmap's mm arg to a vma arg like anon_vma rmap, to smooth the transition between them.
Andrew Morton authored
From: Hugh Dickins <hugh@veritas.com>
From: Andrea Arcangeli <andrea@suse.de>
zap_pmd_range, alone of all those page_range loops, lacks the check for whether address wrapped. Hugh is in doubt as to whether this makes any difference to any config on any arch, but eager to fix the odd one out.
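A sketch of the missing check, in the idiom of those loops (not quoted from the patch):

    next = (address + PMD_SIZE) & PMD_MASK;
    if (next < address || next > end)    /* 'next < address' is the wrap case */
        next = end;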
Andrew Morton authored
From: Hugh Dickins <hugh@veritas.com>
From: Andrea Arcangeli <andrea@suse.de>
Sprinkle unlikelys throughout mm/memory.c, wherever we see a pgd_bad or a pmd_bad; likely or unlikely on pte_same or !pte_same. Put the jump in the error return from do_no_page, not in the fast path.
Andrew Morton authored
From: Hugh Dickins <hugh@veritas.com>
Why should struct address_space have separate i_mmap and i_mmap_shared prio_trees (separating !VM_SHARED and VM_SHARED vmas)? No good reason; the same processing is usually needed on both. Merge i_mmap_shared into i_mmap, but keep the i_mmap_writable count of VM_SHARED vmas (those capable of dirtying the underlying file) for the mapping_writably_mapped test. The VM_MAYSHARE test in the arm and parisc loops is not necessarily what they will want to use in the end: it's provided as a harmless example of what might be appropriate, but maintainers are likely to revise it later (that parisc loop is currently being changed in the parisc tree anyway). On the way, remove the now out-of-date comments on vm_area_struct size.
Andrew Morton authored