Commits · 86c562a9d6683063e071692fe14e0a18e64ee1be · Kirill Smelkov / linux

19 Jan, 2006 15 commits

[PATCH] mm: optimize numa policy handling in slab allocator · 86c562a9

Christoph Lameter authored Jan 18, 2006

Move the interrupt check from slab_node into ___cache_alloc and add an
"unlikely()" to avoid pipeline stalls on some architectures.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

86c562a9
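
As an illustration of the branch hint mentioned in this commit, here is a minimal sketch of a caller-side check guarded by unlikely(). The macros follow the usual GCC/Clang `__builtin_expect` idiom; `struct task_ctx`, `enum alloc_path`, and `choose_alloc_path()` are hypothetical stand-ins invented for this example, not the kernel's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Branch hints built on the GCC/Clang builtin __builtin_expect. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in for the state the allocator would consult. */
struct task_ctx {
    bool in_interrupt;       /* true when running in interrupt context */
    const void *mempolicy;   /* NULL means the default (local) policy */
};

enum alloc_path { PATH_LOCAL_FAST, PATH_POLICY_SLOW };

/*
 * Decide which path an allocation takes. The policy path is taken only
 * when a policy is set and we are not in interrupt context. Both tests
 * sit in the caller and are hinted as the rare case.
 */
static enum alloc_path choose_alloc_path(const struct task_ctx *ctx)
{
    if (unlikely(ctx->mempolicy != NULL && !ctx->in_interrupt))
        return PATH_POLICY_SLOW;
    return PATH_LOCAL_FAST;
}
```

The hint only affects code layout and static branch prediction; the result of the condition is unchanged.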

[PATCH] NUMA policies in the slab allocator V2 · dc85da15

Christoph Lameter authored Jan 18, 2006

This patch fixes a regression in 2.6.14 against 2.6.13 that causes an
imbalance in memory allocation during bootup.

The slab allocator in 2.6.13 is not numa aware and simply calls
alloc_pages(). This means that memory policies may control the behavior of
alloc_pages(). During bootup the memory policy is set to MPOL_INTERLEAVE
resulting in the spreading out of allocations during bootup over all
available nodes. The slab allocator in 2.6.13 has only a single list of
slab pages. As a result the per cpu slab cache and the spinlock controlled
page lists may contain slab entries from off node memory. The slab
allocator in 2.6.13 makes no effort to discern the locality of an entry on
its lists.

The NUMA aware slab allocator in 2.6.14 controls locality of the slab pages
explicitly by calling alloc_pages_node(). The NUMA slab allocator manages
slab entries by having lists of available slab pages for each node. The
per cpu slab cache can only contain slab entries associated with the node
local to the processor. This guarantees that the default allocation mode
of the slab allocator always assigns local memory if available.

Setting MPOL_INTERLEAVE as a default policy during bootup has no effect
anymore. In 2.6.14 all node unspecific slab allocations are performed on
the boot processor. This means that most key data structures are
allocated on one node. Most processors will have to refer to these
structures making the boot node a potential bottleneck. This may reduce
performance and cause unnecessary memory pressure on the boot node.

This patch implements NUMA policies in the slab layer. The slab allocator
itself must apply NUMA memory policies explicitly, since the NUMA slab
allocator no longer lets the page allocator control locality.

The check for policies is made directly at the beginning of __cache_alloc
using current->mempolicy. The memory policy is already frequently checked
by the page allocator (alloc_page_vma() and alloc_page_current()). So it
is highly likely that the cacheline is present. For MPOL_INTERLEAVE
kmalloc() will spread out each request to one node after another so that an
equal distribution of allocations can be obtained during bootup.
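
The round-robin behaviour described for MPOL_INTERLEAVE can be sketched as follows. This is an illustrative model only: `struct interleave_state` and `interleave_next_node()` are hypothetical names, and the node set is assumed to fit in one `unsigned long` bitmask.

```c
/*
 * Hypothetical per-task interleave state: a bitmask of allowed nodes
 * and a cursor naming the node to try next.
 */
struct interleave_state {
    unsigned long allowed;  /* bit n set means node n may be used */
    unsigned int next;      /* node index to start searching from */
};

/*
 * Return the next allowed node in round-robin order and advance the
 * cursor, or -1 if no node is allowed.
 */
static int interleave_next_node(struct interleave_state *st)
{
    unsigned int nbits = (unsigned int)(sizeof(st->allowed) * 8);
    unsigned int nid;

    if (st->allowed == 0)
        return -1;

    nid = st->next % nbits;
    while (!(st->allowed & (1UL << nid)))
        nid = (nid + 1) % nbits;

    st->next = (nid + 1) % nbits;
    return (int)nid;
}
```

Calling this once per allocation request spreads successive requests across the allowed nodes one after another.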

It is not possible to push the policy check to lower layers of the NUMA
slab allocator since the per cpu caches now contain only slab
entries from the current node. If the policy says that the local node is
not preferred or is forbidden then there is no point in checking the
slab cache or local list of slab pages. The allocation is better directed
immediately to the lists containing slab entries for the allowed set of
nodes.
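
A small model of why the check has to happen before the per cpu cache is consulted. All names here (`struct slab_cache_sketch`, `cache_alloc_sketch()`, `NR_NODES`) are hypothetical, and objects are reduced to counters; a real allocator would grow a new slab rather than fail when a list is empty.

```c
#define NR_NODES 4

/* Hypothetical, heavily simplified view of one slab cache. */
struct slab_cache_sketch {
    int local_node;           /* node of the CPU doing the allocation */
    int percpu_free;          /* free objects in the per cpu cache (local only) */
    int node_free[NR_NODES];  /* free objects on each node's locked lists */
};

/*
 * Allocate one object for the node chosen by policy.
 * Returns the node the object came from, or -1 if none was available.
 */
static int cache_alloc_sketch(struct slab_cache_sketch *c, int policy_node)
{
    if (policy_node != c->local_node) {
        /* Off-node request: the per cpu cache cannot help, go to the node list. */
        if (policy_node < 0 || policy_node >= NR_NODES)
            return -1;
        if (c->node_free[policy_node] == 0)
            return -1;
        c->node_free[policy_node]--;
        return policy_node;
    }

    /* Local request: lock-free per cpu cache first, then the local node list. */
    if (c->percpu_free > 0) {
        c->percpu_free--;
        return c->local_node;
    }
    if (c->node_free[c->local_node] > 0) {
        c->node_free[c->local_node]--;
        return c->local_node;
    }
    return -1;
}
```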

This way of applying policy also fixes another strange behavior in 2.6.13.
alloc_pages() is controlled by the memory allocation policy of the current
process. It could therefore be that one process is running with
MPOL_INTERLEAVE and would f.e. obtain a new page following that policy
since no slab entries are in the lists anymore. A page can typically be
used for multiple slab entries but let's say that the current process is
only using one. The other entries are then added to the slab lists. These
are now non local entries in the slab lists despite the possible
availability of local pages that would provide faster access and increase
the performance of the application.

Another process without MPOL_INTERLEAVE may now run and expect a local slab
entry from kmalloc(). However, there are still these free slab entries
from the off node page obtained from the other process via MPOL_INTERLEAVE
in the cache. The process will then get an off node slab entry although
other slab entries may be available that are local to that process. This
means that the policy of one process may contaminate the locality of the
slab caches for other processes.

This patch in effect ensures that a per process policy is followed for the
allocation of slab entries and that the memory policy of one process cannot
influence another. A process with default policy will
always get a local slab entry if one is available, and a process using
memory policies will get its memory arranged as requested. Off-node slab
allocation requires the use of spinlocks and rules out the use of per
cpu caches. A process using memory policies to redirect
allocations off node will have to cope with additional lock overhead in
addition to the latency added by the need to access a remote slab entry.

Changes V1->V2
- Remove #ifdef CONFIG_NUMA by moving forward declaration into
prior #ifdef CONFIG_NUMA section.

- Give the function determining the node number to use a saner
name.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

dc85da15

[PATCH] sem2mutex: mm/slab.c · fc0abb14

Ingo Molnar authored Jan 18, 2006

Convert mm/slab.c's cache_chain_sem to cache_chain_mutex.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

fc0abb14

[PATCH] Zone reclaim: proc override · 1743660b

Christoph Lameter authored Jan 18, 2006

proc support for zone reclaim

This patch creates a proc entry /proc/sys/vm/zone_reclaim_mode that may be
used to override the automatic determination of the zone reclaim made on
bootup.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

1743660b

[PATCH] Zone reclaim: Reclaim logic · 9eeff239

Christoph Lameter authored Jan 18, 2006

Some bits for zone reclaim exist in 2.6.15 but they are not usable. This
patch fixes them up, removes unused code and makes zone reclaim usable.

Zone reclaim allows the reclaiming of pages from a zone if the number of
free pages falls below the watermarks even if other zones still have enough
pages available. Zone reclaim is of particular importance for NUMA
machines. It can be more beneficial to reclaim a page than to take the
performance penalties that come with allocating a page on a remote zone.

Zone reclaim is enabled if the maximum distance to another node is higher
than RECLAIM_DISTANCE, which may be defined by an arch. By default
RECLAIM_DISTANCE is 20. 20 is the distance to another node in the same
component (enclosure or motherboard) on IA64. The meaning of the NUMA
distance information seems to vary by arch.
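
The enabling rule above can be sketched as a scan over a node distance table. The flat row-major `distance` array and the function name `zone_reclaim_wanted()` are assumptions made for this example; only the default RECLAIM_DISTANCE of 20 comes from the text.

```c
#include <stdbool.h>

#define RECLAIM_DISTANCE 20  /* default named in the commit message */

/*
 * distance is an nr_nodes x nr_nodes table in row-major order, where
 * distance[i * nr_nodes + j] is the NUMA distance from node i to node j.
 * Zone reclaim is wanted when any other node is farther away than
 * RECLAIM_DISTANCE.
 */
static bool zone_reclaim_wanted(int nr_nodes, const int *distance)
{
    for (int i = 0; i < nr_nodes; i++) {
        for (int j = 0; j < nr_nodes; j++) {
            if (i != j && distance[i * nr_nodes + j] > RECLAIM_DISTANCE)
                return true;
        }
    }
    return false;
}
```

With a two-node table of `{10, 20, 20, 10}` the function returns false, because a distance equal to the threshold does not exceed it.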

If zone reclaim is not successful then no further reclaim attempts will
occur for a certain time period (ZONE_RECLAIM_INTERVAL).
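
The back-off after a failed attempt might look like the sketch below. The struct, the helper names, and the interval value of 30 ticks are illustrative assumptions; the text only states that an interval named ZONE_RECLAIM_INTERVAL exists.

```c
#include <stdbool.h>

#define ZONE_RECLAIM_INTERVAL 30UL  /* illustrative value, in arbitrary ticks */

/* Hypothetical per-zone bookkeeping for the back-off. */
struct zone_sketch {
    unsigned long last_unsuccessful;  /* tick of the last failed reclaim */
    bool ever_failed;                 /* false until the first failure */
};

/* Record that a reclaim attempt at time `now` freed nothing. */
static void note_reclaim_failure(struct zone_sketch *z, unsigned long now)
{
    z->last_unsuccessful = now;
    z->ever_failed = true;
}

/* A new attempt is allowed once the interval since the last failure has passed. */
static bool may_try_reclaim(const struct zone_sketch *z, unsigned long now)
{
    if (!z->ever_failed)
        return true;
    return now - z->last_unsuccessful >= ZONE_RECLAIM_INTERVAL;
}
```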

This patch was discussed before. See

http://marc.theaimsgroup.com/?l=linux-kernel&m=113519961504207&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=113408418232531&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=113389027420032&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=113380938612205&w=2
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

9eeff239

[PATCH] Zone reclaim: resurrect may_swap · f1fd1067

Christoph Lameter authored Jan 18, 2006

Zone reclaim has a huge impact on NUMA performance. For example, our maximum
throughput with XFS is raised from 4GB/sec to 6GB/sec. Without it, page cache
contamination of NUMA nodes destroys locality if one just does a large copy
operation, and performance drops for good until reboot.

This patch:

Resurrect may_swap in struct scan_control
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

f1fd1067

[PATCH] Simplify migrate_page_add · fc301289

Christoph Lameter authored Jan 18, 2006

Simplify migrate_page_add after feedback from Hugh.  This also allows us to
drop one parameter from migrate_page_add.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

fc301289

[PATCH] mm: migration page refcounting fix · 053837fc

Nick Piggin authored Jan 18, 2006

Migration code currently does not take a reference to the target page
properly, so between unlocking the pte and trying to take a new
reference to the page with isolate_lru_page, anything could happen to
it.

Fix this by holding the pte lock until we get a chance to elevate the
refcount.

Other small cleanups while we're here.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

053837fc

[PATCH] mm: dirty_exceeded speedup · e236a166

Andrew Morton authored Jan 18, 2006

Ravikiran reports that this variable is bouncing all around nodes on NUMA
machines, causing measurable performance problems.  Fix that up by only
writing to it when it actually changes.

And put it in a new cacheline to prevent it sharing with other things (this
happened).
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

e236a166
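
The two techniques in the dirty_exceeded commit, writing only on change and giving the variable its own cacheline, can be sketched in user-space C as follows. The 64-byte line size, the `struct padded_flag` wrapper, and the write counter are assumptions for illustration; the `aligned` attribute is a GCC/Clang extension.

```c
#define CACHELINE_SIZE 64  /* assumed line size; varies by CPU */

/*
 * Aligning the struct type to the cacheline size also rounds its size up
 * to a full line, so no other variable can share the line with it.
 */
struct padded_flag {
    int value;
} __attribute__((aligned(CACHELINE_SIZE)));

static struct padded_flag dirty_exceeded;
static unsigned long dirty_exceeded_writes;  /* counts real stores, for illustration */

/*
 * Store only when the value changes. A read of an unchanged value leaves
 * the line shared between CPUs, while an unconditional store would force
 * exclusive ownership and bounce the line between nodes.
 */
static void set_dirty_exceeded(int exceeded)
{
    if (dirty_exceeded.value != exceeded) {
        dirty_exceeded.value = exceeded;
        dirty_exceeded_writes++;
    }
}
```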

[PATCH] Prevent trident driver from grabbing pcnet32 hardware · c2aeacd4

Jon Mason authored Jan 18, 2006

Some pcnet32 hardware erroneously has the Vendor ID for Trident.  The
pcnet32 driver looks for the PCI ethernet class before grabbing the
hardware, but the current trident driver does not check against the PCI
audio class.  This allows the trident driver to claim the pcnet32 hardware.
This patch prevents that.

This revised version of the OSS Trident patch includes PCI_DEVICE Macro
usage.
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Muli Ben-Yehuda <mulix@mulix.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

c2aeacd4
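
To show why a class check keeps the audio driver away from an ethernet device that reports the same vendor and device ID, here is a simplified matching sketch. The structs and `id_matches()` are hypothetical, the vendor and device values are placeholders, and the class codes 0x0401 (multimedia audio) and 0x0200 (ethernet) are the standard PCI base-class and sub-class values.

```c
#include <stdbool.h>
#include <stdint.h>

#define CLASS_MULTIMEDIA_AUDIO 0x0401u  /* PCI base class 0x04, sub-class 0x01 */
#define CLASS_NETWORK_ETHERNET 0x0200u  /* PCI base class 0x02, sub-class 0x00 */

/* Hypothetical, simplified ID table entry. */
struct pci_id_sketch {
    uint16_t vendor;
    uint16_t device;
    uint32_t dev_class;   /* (base << 16) | (sub << 8) | prog-if */
    uint32_t class_mask;  /* which class bits must match; 0 ignores the class */
};

/* Hypothetical, simplified view of a probed device. */
struct pci_dev_sketch {
    uint16_t vendor;
    uint16_t device;
    uint32_t dev_class;
};

/* True when vendor, device, and the masked class bits all match. */
static bool id_matches(const struct pci_id_sketch *id,
                       const struct pci_dev_sketch *dev)
{
    if (id->vendor != dev->vendor || id->device != dev->device)
        return false;
    return ((id->dev_class ^ dev->dev_class) & id->class_mask) == 0;
}
```

With a mask of 0 the class is ignored and both devices match, which models the behaviour before the fix; a mask of 0xffff00 restricts the entry to audio-class devices.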

[PATCH] synclink_gt fix size of register value storage · 35fbd397

Paul Fulghum authored Jan 18, 2006

Fix incorrect variable size used to hold register value.  This bug might
wipe out a portion of the TCR value when setting the interface options.
Signed-off-by: Paul Fulghum <paulkf@microgate.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

35fbd397
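
The class of bug described in the synclink_gt commit can be reproduced in isolation. The widths used here (a 16-bit register held in an 8-bit variable) are assumptions chosen for illustration; the commit message only says the variable size was incorrect. Both function names are hypothetical.

```c
#include <stdint.h>

/*
 * Buggy read-modify-write: the register value passes through a variable
 * that is too narrow, so the upper byte is lost before the write-back.
 */
static uint16_t set_bit_via_narrow(uint16_t reg, uint16_t bit)
{
    uint8_t val = (uint8_t)reg;          /* upper 8 bits are discarded here */
    val = (uint8_t)(val | bit);
    return val;
}

/* Fixed version: the temporary is as wide as the register. */
static uint16_t set_bit_via_wide(uint16_t reg, uint16_t bit)
{
    uint16_t val = reg;
    val = (uint16_t)(val | bit);
    return val;
}
```

Starting from a register value of 0xA500 and setting bit 0x0004, the narrow version writes back 0x0004 while the wide version writes back 0xA504.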

[PATCH] scsi_transport_spi build fix · c8d338c8

Andrew Morton authored Jan 18, 2006

On alpha:

In file included from drivers/scsi/sym53c8xx_2/sym_glue.h:59,
                 from drivers/scsi/sym53c8xx_2/sym_fw.c:40:
include/scsi/scsi_transport_spi.h:57: error: field `dv_mutex' has incomplete type

Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

c8d338c8

[PATCH] x86_64: Fix MCE exception stack for boot CPU · ab26a20b

Jan Beulich authored Jan 18, 2006

Fix a typo/mis-merge in one of the previous patches.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

ab26a20b

[PATCH] jbd: remove_transaction fix · 43c3e6f5

Jan Kara authored Jan 18, 2006

We have to check that the second checkpoint list is also empty before
dropping the transaction.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

43c3e6f5

[PATCH] jbd: log_do_checkpoint fix · 8d3c7fce

Jan Kara authored Jan 18, 2006

While checkpointing we have to check that our transaction is still in the
checkpoint list *and* (not or) that it's not just a different transaction
with the same address.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

8d3c7fce

18 Jan, 2006 25 commits

Merge master.kernel.org:/home/rmk/linux-2.6-serial · 2149bcab
Linus Torvalds authored Jan 18, 2006

2149bcab
Merge master.kernel.org:/home/rmk/linux-2.6-arm · 2333f212
Linus Torvalds authored Jan 18, 2006

2333f212
Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 · 097916ec
Linus Torvalds authored Jan 18, 2006

097916ec
Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 · 3da38566
Linus Torvalds authored Jan 18, 2006

3da38566

[SPARC64]: Fix build with CONFIG_COMPAT disabled. · 959a85ad

David S. Miller authored Jan 18, 2006

Based upon a report and preliminary patch from Jim Gifford.
Signed-off-by: David S. Miller <davem@davemloft.net>

959a85ad

Merge master.kernel.org:/pub/scm/linux/kernel/git/tmlind/linux-omap-upstream · 37b797b2
Russell King authored Jan 18, 2006

37b797b2

[SPARC64]: Serial Console for E250 Patch · c126cf80

Eddie C. Dost authored Jan 18, 2006

From: Eddie C. Dost <ecd@brainaid.de>

I have the following patch for serial console over the RSC
(remote system controller) on my E250 machine. It basically adds
support for input-device=rsc and output-device=rsc from OBP, and
allows 115200,8,n,1,- serial mode setting.
Signed-off-by: David S. Miller <davem@davemloft.net>

c126cf80

[MAINTAINERS]: add entry for wireless networking · 29f8f632

John W. Linville authored Jan 18, 2006

Add an entry to MAINTAINERS for wireless networking, just so people
know whom to bless with patches.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

29f8f632

[MAINTAINERS]: correct location for net-2.6.git · d5ca3117

John W. Linville authored Jan 18, 2006

Correct location info for net-2.6 git tree.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d5ca3117

[ARM] 3281/1: ixp4xx: export ixp4xx_exp_bus_size for modules · 1e74c891

David Vrabel authored Jan 18, 2006

Patch from David Vrabel

Export ixp4xx_exp_bus_size so modules can use the IXP4XX_EXP_BUS_BASE(n) macro.

Also, fix a printk format warning.
Signed-off-by: David Vrabel <dvrabel@arcom.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

1e74c891

[ARM] 3272/1: fix kernel decompressor crash · 265d5e48

Nicolas Pitre authored Jan 18, 2006

Patch from Nicolas Pitre

Commit f4619025 broke the kernel decompressor (at least on PXA).  Here's
the fix.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

265d5e48

[ARM] 3271/1: ARM EABI: fix calling of cmpxchg syscall emulation · 5e097445

Nicolas Pitre authored Jan 18, 2006

Patch from Nicolas Pitre

This is kernel provided user space code.

Since a syscall is used, it has to be updated to work with EABI.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

5e097445

[ARM] 3270/1: ARM EABI: fix sigreturn and rt_sigreturn · fcca538b

Nicolas Pitre authored Jan 18, 2006

Patch from Nicolas Pitre

The signal return path consists of user code provided by the kernel.
Since a syscall is used, it has to be updated to work with EABI.

Noticed by Daniel Jacobowitz.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

fcca538b

[ARM] 3268/1: AT91RM9200 serial update for 2.6.15-git12 · 1230b404

Andrew Victor authored Jan 18, 2006

Patch from Andrew Victor

This patch fixes two small issues with 2.6.15-git12.

1) Corrected major/minor numbers for ttyAT devices in the Kconfig help.
   (Patch from Karl Olsen)

2) tty->flip.count has been removed.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

1230b404

[ARM] 3267/1: PXA27x SSP controller register defines · 68477d11

David Vrabel authored Jan 18, 2006

Patch from David Vrabel

PXA27x SSP controller has a few different registers, including SCR (serial clock rate) in SSCR0.
Signed-off-by: David Vrabel <dvrabel@arcom.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

68477d11

Merge git://tipc.cslab.ericsson.net/pub/git/tipc · 27a7b041
David S. Miller authored Jan 18, 2006

27a7b041

[IPV4]: Fix multiple bugs in IGMPv3 · ad12583f

David L Stevens authored Jan 18, 2006

1) fix "mld_marksources()" to
        a) send nothing when all queried sources are excluded
        b) send full exclude report when source queried sources are
                not excluded
        c) don't schedule a timer when there's nothing to report

2) fix "add_grec()" to send empty-source records when it should
        The original check doesn't account for a non-empty source
        list with all sources inactive; the new code keeps that
        short-circuit case, and also generates the group header
        with an empty list if needed.

3) fix mca_crcount decrement to be after add_grec(), which needs
        its original value

4) add/remove delete records and prevent current advertisements
        when an exclude-mode filter moves from "active" to "inactive"
        or vice versa based on new filter additions.

Items 1-3 are just IPv4 versions of the IPv6 bugs found
by Yan Zheng and fixed earlier. Item #4 is a related bug that
affects exclude-mode change records only (but not queries) and
also occurs in IPv6 (IPv6 version coming soon).
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ad12583f

[PKTGEN]: Respect hard_header_len of device. · 7ac5459e

David S. Miller authored Jan 18, 2006

Don't assume 16.

Found by Ben Greear.
Signed-off-by: David S. Miller <davem@davemloft.net>

7ac5459e

[IRDA]: maintainer status · e048a374

Stephen Hemminger authored Jan 18, 2006

Jean says he really doesn't have much time for IRDA any more.
The following would help motivate someone who has more time.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e048a374

[CASSINI]: dont touch page_count · fa4f0774

Nick Piggin authored Jan 18, 2006

Remove page refcount manipulations from cassini driver by using
another field in struct page. Needed for lockless pagecache.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

fa4f0774

[SPARC64]: Update defconfig. · c07a8475

David S. Miller authored Jan 18, 2006

Signed-off-by: David S. Miller <davem@davemloft.net>

c07a8475

[PATCH] e1000: fix compile warning · 7c4d3367

Jesse Brandeburg authored Jan 18, 2006

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: John Ronciak <john.ronciak@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>

7c4d3367

[PATCH] e1000: fix receive breakage · 86c3d59f

Jesse Brandeburg authored Jan 18, 2006

In attempting to not send the "prefetch" patch, we broke the receive code.
This patch fixes that issue.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: John Ronciak <john.ronciak@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>

86c3d59f

[PATCH] e1000: Added driver comments · 73629bbc

Jesse Brandeburg authored Jan 18, 2006

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: John Ronciak <john.ronciak@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>

73629bbc

[PATCH] e1000: Fix whitespace · 96838a40

Jesse Brandeburg authored Jan 18, 2006

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: John Ronciak <john.ronciak@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>

96838a40