Commits · 42fda66387daa53538ae13a2c858396aaf037158 · Kirill Smelkov / linux

16 Oct, 2007 40 commits

uml: throw out CONFIG_MODE_TT · 42fda663

Jeff Dike authored Oct 16, 2007

This patchset throws out tt mode, which has been non-functional for a while.

This is done in phases, interspersed with code cleanups on the affected files.

The removal is done as follows:
	remove all code, config options, and files which depend on
CONFIG_MODE_TT
	get rid of the CHOOSE_MODE macro, which decided whether to
call tt-mode or skas-mode code, and replace invocations with their
skas portions
	replace all now-trivial procedures with their skas equivalents

There are now a bunch of now-redundant pieces of data structures, including
mode-specific pieces of the thread structure, pt_regs, and mm_context.  These
are all replaced with their skas-specific contents.

As part of the ongoing style compliance project, I made a style pass over all
files that were changed.  There are three such patches, one for each phase,
covering the files affected by that phase but no later ones.

I noticed that we weren't freeing the LDT state associated with a process when
it exited, so that's fixed in one of the later patches.

The last patch is a tidying patch which I've had for a while, but which caused
inexplicable crashes under tt mode.  Since that is no longer a problem, this
can now go in.

This patch:

Start getting rid of tt mode support.

This patch throws out CONFIG_MODE_TT and all config options, code, and files
which depend on it.

CONFIG_MODE_SKAS is gone and everything that depends on it is included
unconditionally.

The few changed lines are in re-written Kconfig help, lines which needed
something skas-related removed from them, and a few more which weren't
strictly deletions.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

42fda663

UML: remove unnecessary hostfs_getattr() · a1ff5878

Miklos Szeredi authored Oct 16, 2007

Currently hostfs_getattr() just defines the default behavior.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a1ff5878

uml: add VDE networking support · ad43c356

Jeff Dike authored Oct 16, 2007

Added vde network backend in uml to introduce native Virtual Distributed
Ethernet support (using libvdeplug).
Signed-off-by: Luca Bigliardi <shammash@artha.org>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ad43c356

uml: physmem code tidying · 6d536e4b

Jeff Dike authored Oct 16, 2007

Tidying of the UML physical memory system.  These are mostly style fixes,
however the includes were cleaned as well.  This uncovered a need for
mem_user.h to be included in mode_kern_skas.h.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6d536e4b

uml: stop saving process FP state · 42daba31

Jeff Dike authored Oct 16, 2007

Throw out a lot of code dealing with saving and restoring floating-point
state. In skas mode, where processes run in a restoring floating-point state
on kernel entry and exit is pointless.

This eliminates most of arch/um/os-Linux/sys-{i386,x86_64}/registers.c. Most
of what remained is now arch-indpendent, and can be moved up to
arch/um/os-Linux/registers.c. Both arches need the jmp_buf accessor
get_thread_reg, and i386 needs {save,restore}_fp_regs because it cheats during
sigreturn by getting the fp state using ptrace rather than copying it out of
the process sigcontext.

After this, it turns out that arch/um/include/skas/mode-skas.h is almost
completely unneeded. The declarations in it are variables which either don't
exist or which don't have global scope. The one exception is
kill_off_processes_skas. If that's removed, this header can be deleted.

This uncovered a bug in user.h, which wasn't correctly making sure that a
size_t definition was available to both userspace and kernelspace files.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

42daba31

uml: stop specially protecting kernel stacks · 5c8aacea

Jeff Dike authored Oct 16, 2007

Map all of physical memory as executable to avoid having to change stack
protections during fork and exit.

unprotect_stack is now called only from MODE_TT code, so it is marked as such.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

5c8aacea

uml: fix nonremovability of watchdog · 8d820760

Jeff Dike authored Oct 16, 2007

The UML watchdog driver was using the wrong config variable to control whether
it can be unloaded once active.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8d820760

uml: fix an IPV6 libc vs kernel symbol clash · 818f6ef4

Jeff Dike authored Oct 16, 2007

On some systems, with IPV6 configured, there is a clash between the kernel's
in6addr_any and the one in libc.

This is handled in the usual (gross) way of defining the kernel symbol out of
the way on the gcc command line.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

818f6ef4

uml: stop using libc asm/page.h · 71f926f2

Jeff Dike authored Oct 16, 2007

Remove includes of asm/page.h from libc code.  This header seems to be
disappearing, and UML doesn't make much use of it anyway.

The one use, PAGE_SHIFT in stub.h, is handled by copying the constant from the
kernel side of the house in common_offsets.h.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

71f926f2

uml: console tidying · 2f8a2dc2

Jeff Dike authored Oct 16, 2007

Tidy line.c:
	The includes are more minimal
	Lots of style fixes
	All the printks have severities
	Removed some commented-out code
	Deleted a useless printk when ioctl is called
	Fixed some whitespace damage
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2f8a2dc2

uml: fix console writing bugs · c59dbcad

Jeff Dike authored Oct 16, 2007

The previous console cleanup patch switched generic_read and generic_write
from calling os_{read,write}_file to calling read and write directly. Because
the calling convention is different, they now need to get any error from errno
rather than the return value. I did this for generic_read, but forgot about
generic_write.

While chasing some output corruption, I noticed that line_write was
unnecessarily calling flush_buffer, and deleted it. I don't understand why,
but the corruption disappeared. This is unneeded because there already is a
perfectly good mechanism for finding out when the host output device has some
room to write data - there is an interrupt that comes in when writes can
happen again. line_write calling flush_buffer seemed to just be an attempt to
opportunistically get some data out to the host.

I also made write_chan short-circuit calling into the host-level code for
zero-length writes. Calling libc write with a length of zero conflated write
not being able to write anything with asking it not to write anything. Better
to just cut it off as soon as possible.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c59dbcad

uml: console subsystem tidying · e99525f9

Jeff Dike authored Oct 16, 2007

This does a lot of cleanup on the UML console system.  This patch should be
entirely non-functional.

The tidying is as follows:
	header cleanups - the includes should be closer to minimal and complete
	all printks now have a severity
	lots of style fixes
	fd_close is restructured a little in order to reduce the nesting
	some functions were calling the os_* wrappers when they can
call libc directly
	port_accept had a unnecessary variable
	it also tested a pid unecessarily before killing it
	some functions were made static
	xterm_free is gone, as it was identical to generic_free
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e99525f9

uml: fix error cleanup ordering · 79f66233

Jeff Dike authored Oct 16, 2007

I messed up the error cleanup ordering in the console port driver.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

79f66233

uml: tidy recently-moved code · 8e2d10e1

Jeff Dike authored Oct 16, 2007

Now that the generic console operations are in a userspace file, we
can do the following:
	directly call into libc instead of through the os_* wrappers
	eliminate os_window_size since it has only one user
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8e2d10e1

uml: move userspace code to userspace file · 89fe6476

Jeff Dike authored Oct 16, 2007

Move some code from a kernelspace file to a userspace file where it fits
better. This enables some tidying which is the subject of a later patch.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

89fe6476

tty: bring the old cris driver back somewhere into the realm of new tty buffering · 2090ab05

Alan Cox authored Oct 16, 2007

Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Mikael Starvik <starvik@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2090ab05

CRIS: cleanup struct irqaction initializers · e5f71781

Thomas Gleixner authored Oct 16, 2007

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mikael Starvik <starvik@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e5f71781

m32r: convert to generic sys_ptrace · 0ac15559

Christoph Hellwig authored Oct 16, 2007

Convert m32r to the generic sys_ptrace.  The conversion requires an
architecture hook after ptrace_attach which this patch adds.  The hook
will also be needed for a conersion of ia64 to the generic ptrace code.

Thanks to Hirokazu Takata for fixing a bug in the first version of this
code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0ac15559

m32r: serial: remove M32R_SIO_SHARE_IRQS · dab8f496

Hirokazu Takata authored Oct 16, 2007

Remove an unused symbol M32R_SIO_SHARE_IRQS from drivers/serial/m32r_sio.h.
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Cc: "Robert P. J. Day" <rpjday@mindspring.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dab8f496

M32R: cleanup struct irqaction initializers · 977676ac

Thomas Gleixner authored Oct 16, 2007

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

977676ac

include/asm-m32r/thread_info.h: kmalloc + memset conversion to kzalloc · 3165c0d1

Mariusz Kozlowski authored Oct 16, 2007

Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3165c0d1

alpha: beautify vmlinux.lds · b2b5d37d

Sam Ravnborg authored Oct 16, 2007

Introduced a consistent style in vmlinux.lds and it now matches the
soon-to-be common style for all arch's vmlinux.lds files.

In addition:
- Replaced hardcoded constant with PAGE_SIZE
- Fix page.h so PAGE_SIZE can be used from assembler and in lds files
- Move a few labels inside brackets so linker alignment will not
  make label point ot a too low address
- Replaced DWARF and STABS sections with definitions from asm-generic
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b2b5d37d

alpha: convert to generic sys_ptrace · a5f833f3

Christoph Hellwig authored Oct 16, 2007

This patch converts alpha to the generic sys_ptrace.  We use
force_successful_syscall_return to avoid having to pass the pt_regs pointer
down to the function.  I think the removal of the assemly stub is correct,
but I could only compile-test this patch, so please give it a spin before
commiting :)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a5f833f3

cleanup arch/alpha/Makefile · d9ff5f38

Adrian Bunk authored Oct 16, 2007

- binutils 2.7 is far below the current minimum supported version,
  and there's therefore no longer a need for an extra test

- since even gcc 3.2 already supports all options used we can use them
  unconditionally
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d9ff5f38

M68KNOMMU: remove unused config symbol CONFIG_DISKtel · fabc7f66

Greg Ungerer authored Oct 16, 2007

Remove unused config symbol CONFIG_DISKtel.
Pointed out by Robert P. J. Day <rpjday@mindspring.com>.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fabc7f66

FRV: cleanup struct irqaction initializers · a2e170ea

Thomas Gleixner authored Oct 16, 2007

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a2e170ea

include/asm-frv/thread_info.h: kmalloc + memset conversion to kzalloc · 33bbf959

Mariusz Kozlowski authored Oct 16, 2007

Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Acked-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

33bbf959

remove frv usage of flush_tlb_pgtables() · ef0fce85

Benjamin Herrenschmidt authored Oct 16, 2007

frv is the last user in the tree of that dubious hook, and it's my
understanding that it's not even needed.  It's only called by memory.c
free_pgd_range() which is always called within an mmu_gather, and
tlb_flush() on frv will do a flush_tlb_mm(), which from my reading of the
code, seems to do what flush_tlb_ptables() does, which is to clear the
cached PGE.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ef0fce85

mm: add node states sysfs class attributeS · bde631a5

Lee Schermerhorn authored Oct 16, 2007

Add a per node state sysfs class attribute file to /sys/devices/system/node
to display node state masks.

E.g., on a 4-cell HP ia64 NUMA platform, we have 5 nodes: 4 representing
the actual hardware cells and one memory-only pseudo-node representing a
small amount [512MB] of "hardware interleaved" memory.  With this patch, in
/sys/devices/system/node we see:

#ls -1F /sys/devices/system/node
has_cpu
has_normal_memory
node0/
node1/
node2/
node3/
node4/
online
possible
#cat /sys/devices/system/node/possible
0-255
#cat /sys/devices/system/node/online
0-4
#cat /sys/devices/system/node/has_normal_memory
0-4
#cat /sys/devices/system/node/has_cpu
0-3
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bde631a5

mm/vmstat.c: cleanups · e2fc88d0

Adrian Bunk authored Oct 16, 2007

This patch contains the following cleanups:
- make the needlessly global setup_vmstat() static
- remove the unused refresh_vm_stats()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e2fc88d0

mm/mempolicy.c: cleanups · dbcb0f19

Adrian Bunk authored Oct 16, 2007

This patch contains the following cleanups:
- every file should include the headers containing the prototypes for
  its global functions
- make the follosing needlessly global functions static:
  - migrate_to_node()
  - do_mbind()
  - sp_alloc()
  - mpol_rebind_policy()

[akpm@linux-foundation.org: fix uninitialised var warning]
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dbcb0f19

mm/shmem.c: make 3 functions static · d8dc74f2

Adrian Bunk authored Oct 16, 2007

This patch makes three needlessly global functions static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d8dc74f2

hugetlb: fix dynamic pool resize failure case · af767cbd

Adam Litke authored Oct 16, 2007

When gather_surplus_pages() fails to allocate enough huge pages to satisfy
the requested reservation, it frees what it did allocate back to the buddy
allocator.  put_page() should be called instead of update_and_free_page()
to ensure that pool counters are updated as appropriate and the page's
refcount is decremented.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Acked-by: Dave Hansen <haveblue@us.ibm.com>
Cc: David Gibson <hermes@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

af767cbd

hugetlb: fix hugepage allocation with memoryless nodes · 63b4613c

Nishanth Aravamudan authored Oct 16, 2007

Anton found a problem with the hugetlb pool allocation when some nodes have
no memory (http://marc.info/?l=linux-mm&m=118133042025995&w=2).  Lee worked
on versions that tried to fix it, but none were accepted.  Christoph has
created a set of patches which allow for GFP_THISNODE allocations to fail
if the node has no memory.

Currently, alloc_fresh_huge_page() returns NULL when it is not able to
allocate a huge page on the current node, as specified by its custom
interleave variable.  The callers of this function, though, assume that a
failure in alloc_fresh_huge_page() indicates no hugepages can be allocated
on the system period.  This might not be the case, for instance, if we have
an uneven NUMA system, and we happen to try to allocate a hugepage on a
node with less memory and fail, while there is still plenty of free memory
on the other nodes.

To correct this, make alloc_fresh_huge_page() search through all online
nodes before deciding no hugepages can be allocated.  Add a helper function
for actually allocating the hugepage.  Use a new global nid iterator to
control which nid to allocate on.

Note: we expect particular semantics for __GFP_THISNODE, which are now
enforced even for memoryless nodes.  That is, there is should be no
fallback to other nodes.  Therefore, we rely on the nid passed into
alloc_pages_node() to be the nid the page comes from.  If this is
incorrect, accounting will break.

Tested on x86 !NUMA, x86 NUMA, x86_64 NUMA and ppc64 NUMA (with 2
memoryless nodes).

Before on the ppc64 box:
Trying to clear the hugetlb pool
Done.       0 free
Trying to resize the pool to 100
Node 0 HugePages_Free:     25
Node 1 HugePages_Free:     75
Node 2 HugePages_Free:      0
Node 3 HugePages_Free:      0
Done. Initially     100 free
Trying to resize the pool to 200
Node 0 HugePages_Free:     50
Node 1 HugePages_Free:    150
Node 2 HugePages_Free:      0
Node 3 HugePages_Free:      0
Done.     200 free

After:
Trying to clear the hugetlb pool
Done.       0 free
Trying to resize the pool to 100
Node 0 HugePages_Free:     50
Node 1 HugePages_Free:     50
Node 2 HugePages_Free:      0
Node 3 HugePages_Free:      0
Done. Initially     100 free
Trying to resize the pool to 200
Node 0 HugePages_Free:    100
Node 1 HugePages_Free:    100
Node 2 HugePages_Free:      0
Node 3 HugePages_Free:      0
Done.     200 free
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: Christoph Lameter <clameter@sgi.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <hermes@gibson.dropbear.id.au>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Ken Chen <kenchen@google.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

63b4613c

hugetlb: fix pool resizing corner case · 6b0c880d

Adam Litke authored Oct 16, 2007

When shrinking the size of the hugetlb pool via the nr_hugepages sysctl, we
are careful to keep enough pages around to satisfy reservations.  But the
calculation is flawed for the following scenario:

Action                          Pool Counters (Total, Free, Resv)
======                          =============
Set pool to 1 page              1 1 0
Map 1 page MAP_PRIVATE          1 1 0
Touch the page to fault it in   1 0 0
Set pool to 3 pages             3 2 0
Map 2 pages MAP_SHARED          3 2 2
Set pool to 2 pages             2 1 2 <-- Mistake, should be 3 2 2
Touch the 2 shared pages        2 0 1 <-- Program crashes here

The last touch above will terminate the process due to lack of huge pages.

This patch corrects the calculation so that it factors in pages being used
for private mappings.  Andrew, this is a standalone fix suitable for
mainline.  It is also now corrected in my latest dynamic pool resizing
patchset which I will send out soon.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Acked-by: Ken Chen <kenchen@google.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6b0c880d

hugetlbfs read() support · e63e1e5a

Badari Pulavarty authored Oct 16, 2007

Support for reading from hugetlbfs files.  libhugetlbfs lets application
text/data to be placed in large pages.  When we do that, oprofile doesn't
work - since libbfd tries to read from it.

This code is very similar to what do_generic_mapping_read() does, but I
can't use it since it has PAGE_CACHE_SIZE assumptions.

[akpm@linux-foundation.org: cleanups, fix leak]
[bunk@stusta.de: make hugetlbfs_read() static]
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: William Irwin <bill.irwin@oracle.com>
Tested-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e63e1e5a

hugetlb: allow extending ftruncate on hugetlbfs · 7aa91e10

Ken Chen authored Oct 16, 2007

For historical reason, expanding ftruncate that increases file size on
hugetlbfs is not allowed due to pages were pre-faulted and lack of fault
handler.  Now that we have demand faulting on hugetlb since 2.6.15, there
is no reason to hold back that limitation.

This will make hugetlbfs behave more like a normal fs.  I'm writing a user
level code that uses hugetlbfs but will fall back to tmpfs if there are no
hugetlb page available in the system.  Having hugetlbfs specific ftruncate
behavior is a bit quirky and I would like to remove that artificial
limitation.

Signed-off-by: <kenchen@google.com>
Acked-by: Wiliam Irwin <wli@holomorphy.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7aa91e10

hugetlb: Add hugetlb_dynamic_pool sysctl · 54f9f80d

Adam Litke authored Oct 16, 2007

The maximum size of the huge page pool can be controlled using the overall
size of the hugetlb filesystem (via its 'size' mount option).  However in the
common case the this will not be set as the pool is traditionally fixed in
size at boot time.  In order to maintain the expected semantics, we need to
prevent the pool expanding by default.

This patch introduces a new sysctl controlling dynamic pool resizing.  When
this is enabled the pool will expand beyond its base size up to the size of
the hugetlb filesystem.  It is disabled by default.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Dave McCracken <dave.mccracken@oracle.com>
Cc: William Irwin <bill.irwin@oracle.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Ken Chen <kenchen@google.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

54f9f80d

hugetlb: Try to grow hugetlb pool for MAP_SHARED mappings · e4e574b7

Adam Litke authored Oct 16, 2007

Shared mappings require special handling because the huge pages needed to
fully populate the VMA must be reserved at mmap time.  If not enough pages are
available when making the reservation, allocate all of the shortfall at once
from the buddy allocator and add the pages directly to the hugetlb pool.  If
they cannot be allocated, then fail the mapping.  The page surplus is
accounted for in the same way as for private mappings; faulted surplus pages
will be freed at unmap time.  Reserved, surplus pages that have not been used
must be freed separately when their reservation has been released.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Dave McCracken <dave.mccracken@oracle.com>
Cc: William Irwin <bill.irwin@oracle.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Ken Chen <kenchen@google.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e4e574b7

hugetlb: Try to grow hugetlb pool for MAP_PRIVATE mappings · 7893d1d5

Adam Litke authored Oct 16, 2007

Because we overcommit hugepages for MAP_PRIVATE mappings, it is possible that
the hugetlb pool will be exhausted or completely reserved when a hugepage is
needed to satisfy a page fault.  Before killing the process in this situation,
try to allocate a hugepage directly from the buddy allocator.

The explicitly configured pool size becomes a low watermark.  When dynamically
grown, the allocated huge pages are accounted as a surplus over the watermark.
 As huge pages are freed on a node, surplus pages are released to the buddy
allocator so that the pool will shrink back to the watermark.

Surplus accounting also allows for friendlier explicit pool resizing.  When
shrinking a pool that is fully in-use, increase the surplus so pages will be
returned to the buddy allocator as soon as they are freed.  When growing a
pool that has a surplus, consume the surplus first and then allocate new
pages.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Dave McCracken <dave.mccracken@oracle.com>
Cc: William Irwin <bill.irwin@oracle.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Ken Chen <kenchen@google.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7893d1d5