Commits · 8a5ec0ba42c4919e2d8f4c3138cc8b987fdb0b79 · nexedi / linux

11 Mar, 2011 4 commits

Lockless (and preemptless) fastpaths for slub · 8a5ec0ba

Christoph Lameter authored Feb 25, 2011

Use the this_cpu_cmpxchg_double functionality to implement a lockless
allocation algorithm on arches that support fast this_cpu_ops.

Each of the per cpu pointers is paired with a transaction id that ensures
that updates of the per cpu information can only occur in sequence on
a certain cpu.

A transaction id is a "long" integer that is comprised of an event number
and the cpu number. The event number is incremented for every change to the
per cpu state. This means that the cmpxchg instruction can verify for an
update that nothing interfered and that we are updating the percpu structure
for the processor where we picked up the information and that we are also
currently on that processor when we update the information.

This results in a significant decrease of the overhead in the fastpaths. It
also makes it easy to adopt the fast path for realtime kernels since this
is lockless and does not require the use of the current per cpu area
over the critical section. It is only important that the per cpu area is
current at the beginning of the critical section and at the end.

So there is no need even to disable preemption.

Test results show that the fastpath cycle count is reduced by up to ~ 40%
(alloc/free test goes from ~140 cycles down to ~80). The slowpath for kfree
adds a few cycles.

Sadly this does nothing for the slowpath which is where the main issues with
performance in slub are but the best case performance rises significantly.
(For that see the more complex slub patches that require cmpxchg_double)

Kmalloc: alloc/free test

Before:

10000 times kmalloc(8)/kfree -> 134 cycles
10000 times kmalloc(16)/kfree -> 152 cycles
10000 times kmalloc(32)/kfree -> 144 cycles
10000 times kmalloc(64)/kfree -> 142 cycles
10000 times kmalloc(128)/kfree -> 142 cycles
10000 times kmalloc(256)/kfree -> 132 cycles
10000 times kmalloc(512)/kfree -> 132 cycles
10000 times kmalloc(1024)/kfree -> 135 cycles
10000 times kmalloc(2048)/kfree -> 135 cycles
10000 times kmalloc(4096)/kfree -> 135 cycles
10000 times kmalloc(8192)/kfree -> 144 cycles
10000 times kmalloc(16384)/kfree -> 754 cycles

After:

10000 times kmalloc(8)/kfree -> 78 cycles
10000 times kmalloc(16)/kfree -> 78 cycles
10000 times kmalloc(32)/kfree -> 82 cycles
10000 times kmalloc(64)/kfree -> 88 cycles
10000 times kmalloc(128)/kfree -> 79 cycles
10000 times kmalloc(256)/kfree -> 79 cycles
10000 times kmalloc(512)/kfree -> 85 cycles
10000 times kmalloc(1024)/kfree -> 82 cycles
10000 times kmalloc(2048)/kfree -> 82 cycles
10000 times kmalloc(4096)/kfree -> 85 cycles
10000 times kmalloc(8192)/kfree -> 82 cycles
10000 times kmalloc(16384)/kfree -> 706 cycles

Kmalloc: Repeatedly allocate then free test

Before:

10000 times kmalloc(8) -> 211 cycles kfree -> 113 cycles
10000 times kmalloc(16) -> 174 cycles kfree -> 115 cycles
10000 times kmalloc(32) -> 235 cycles kfree -> 129 cycles
10000 times kmalloc(64) -> 222 cycles kfree -> 120 cycles
10000 times kmalloc(128) -> 343 cycles kfree -> 139 cycles
10000 times kmalloc(256) -> 827 cycles kfree -> 147 cycles
10000 times kmalloc(512) -> 1048 cycles kfree -> 272 cycles
10000 times kmalloc(1024) -> 2043 cycles kfree -> 528 cycles
10000 times kmalloc(2048) -> 4002 cycles kfree -> 571 cycles
10000 times kmalloc(4096) -> 7740 cycles kfree -> 628 cycles
10000 times kmalloc(8192) -> 8062 cycles kfree -> 850 cycles
10000 times kmalloc(16384) -> 8895 cycles kfree -> 1249 cycles

After:

10000 times kmalloc(8) -> 190 cycles kfree -> 129 cycles
10000 times kmalloc(16) -> 76 cycles kfree -> 123 cycles
10000 times kmalloc(32) -> 126 cycles kfree -> 124 cycles
10000 times kmalloc(64) -> 181 cycles kfree -> 128 cycles
10000 times kmalloc(128) -> 310 cycles kfree -> 140 cycles
10000 times kmalloc(256) -> 809 cycles kfree -> 165 cycles
10000 times kmalloc(512) -> 1005 cycles kfree -> 269 cycles
10000 times kmalloc(1024) -> 1999 cycles kfree -> 527 cycles
10000 times kmalloc(2048) -> 3967 cycles kfree -> 570 cycles
10000 times kmalloc(4096) -> 7658 cycles kfree -> 637 cycles
10000 times kmalloc(8192) -> 8111 cycles kfree -> 859 cycles
10000 times kmalloc(16384) -> 8791 cycles kfree -> 1173 cycles
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

8a5ec0ba

slub: Get rid of slab_free_hook_irq() · d3f661d6

Christoph Lameter authored Feb 25, 2011

The following patch will make the fastpaths lockless and will no longer
require interrupts to be disabled. Calling the free hook with irq disabled
will no longer be possible.

Move the slab_free_hook_irq() logic into slab_free_hook. Only disable
interrupts if the features are selected that require callbacks with
interrupts off and reenable after calls have been made.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

d3f661d6

slub: min_partial needs to be in first cacheline · 1a757fe5

Christoph Lameter authored Feb 25, 2011

It is used in unfreeze_slab() which is a performance critical
function.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

1a757fe5

Merge branch 'for-2.6.39' of... · 2a6c5176

Pekka Enberg authored Mar 11, 2011

Merge branch 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu into slub/lockless

2a6c5176

08 Mar, 2011 5 commits

Linux 2.6.38-rc8 · a5abba98
Linus Torvalds authored Mar 07, 2011

a5abba98

Merge branch 's5p-fixes-for-linus' of... · 715695ca

Linus Torvalds authored Mar 07, 2011

Merge branch 's5p-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung

* 's5p-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
  ARM: S3C64XX: Update regulator names for debugfs compatiblity on SMDK6410
  ARM: S3C64XX: Fix build with WM1190 disabled and WM1192 enabled on SMDK6410
  ARM: S3C64XX: Reduce output of s3c64xx_dma_init1()
  ARM: S3C64XX: Tone down SDHCI debugging
  ARM: S3C64XX: Add clock for i2c1
  ARM: S3C64XX: Staticise non-exported GPIO to interrupt functions
  ARM: SAMSUNG: Include devs.h in dev-uart.c to prototype devices
  ARM: S3C64XX: Fix keypad setup to configure correct number of rows
  ARM: S3C2440: Fix usage gpio bank j pin definitions on GTA02
  ARM: S5P64X0: Fix number of GPIO lines in Bank F
  ARM: S3C2440: Select missing S3C_DEV_USB_HOST on GTA02

715695ca

Merge branch 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm · 34d4ade7

Linus Torvalds authored Mar 07, 2011

* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm:
  davinci: cpufreq: fix section mismatch warning
  DaVinci: fix compilation warnings in <mach/clkdev.h>
  davinci: tnetv107x: fix register indexing for GPIOs numbers > 31
  davinci: da8xx/omap-l1x: add platform device for davinci-pcm-audio
  ARM: pxa/tosa: register wm9712 codec device
  ARM: pxa: enable pxa-pcm-audio on pxa210/pxa25x platform
  ARM: pxa/colibri: don't register pxa2xx-pcmcia nodes on non-colibri platforms
  ARM: pxa/tosa: drop setting LED trigger name, as it's unsupported now
  ARM: 6762/1: Update number of VIC for S5P6442 and S5PC100
  ARM: 6761/1: Update number of VIC for S5PV210
  ARM: 6768/1: hw_breakpoint: ensure debug logic is powered up on v7 cores
  ARM: 6767/1: ptrace: fix register indexing in GETHBPREGS request
  ARM: 6765/1: remove obsolete comment from asm/mach/arch.h
  ARM: 6757/1: fix tlb.h induced linux/swap.h build failure

34d4ade7

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc · 1a345303

Linus Torvalds authored Mar 07, 2011

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
  mmc: sdio: Allow sdio operations in other threads during sdio_add_func()

1a345303

Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 · b44a53d1

Linus Torvalds authored Mar 07, 2011

* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm: index i shadowed in 2nd loop
  drm/nv50-nvc0: prevent multiple vm/bar flushes occuring simultanenously
  drm/nouveau: fix regression causing ttm to not be able to evict vram
  drm/i915: Rebind the buffer if its alignment constraints changes with tiling
  drm/i915: Disable GPU semaphores by default
  drm/i915: Do not overflow the MMADDR write FIFO
  Revert "drm/i915: fix corruptions on i8xx due to relaxed fencing"

b44a53d1

07 Mar, 2011 9 commits

drm: index i shadowed in 2nd loop · 062ac622

roel authored Mar 07, 2011

Index i was already used in thhe first loop
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

062ac622

mmc: sdio: Allow sdio operations in other threads during sdio_add_func() · 34497913

Dmitry Shmidt authored Mar 03, 2011

This fixes a bug introduced by 807e8e40 ("mmc: Fix sd/sdio/mmc
initialization frequency retries") that prevented SDIO drivers from
performing SDIO commands in their probe routines -- the above patch
called mmc_claim_host() before sdio_add_func(), which causes a deadlock
if an external SDIO driver calls sdio_claim_host().

Fix tested on an OLPC XO-1.75 with libertas on SDIO.
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
Reviewed-and-Tested-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Chris Ball <cjb@laptop.org>

34497913

Merge remote branch 'ickle/drm-intel-fixes' into drm-fixes · 86206041

Dave Airlie authored Mar 08, 2011

* ickle/drm-intel-fixes:
  drm/i915: Rebind the buffer if its alignment constraints changes with tiling
  drm/i915: Disable GPU semaphores by default
  drm/i915: Do not overflow the MMADDR write FIFO
  Revert "drm/i915: fix corruptions on i8xx due to relaxed fencing"

86206041

Merge branch 'omap-fixes-for-linus' of... · 214d93b0

Linus Torvalds authored Mar 07, 2011

Merge branch 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6

* 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
  omap: mailbox: resolve hang issue
  OMAP2+: PM: SmartReflex: fix memory leaks in Smartreflex driver
  arm: mach-omap2: smartreflex: fix another memory leak

214d93b0

Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6 · ad4a4a82

Linus Torvalds authored Mar 07, 2011

* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
  [S390] tape: deadlock on system work queue
  [S390] keyboard: integer underflow bug
  [S390] xpram: remove __initdata attribute from module parameters

ad4a4a82

drm/nv50-nvc0: prevent multiple vm/bar flushes occuring simultanenously · 6f70a4c3

Ben Skeggs authored Mar 07, 2011

The per-vm mutex doesn't prevent this completely, a flush coming from the
BAR VM could potentially happen at the same time as one for the channel
VM.  Not to mention that if/when we get per-client/channel VM, this will
happen far more frequently.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

6f70a4c3

drm/nouveau: fix regression causing ttm to not be able to evict vram · ef1b2871

Ben Skeggs authored Mar 07, 2011

TTM assumes an error condition from man->func->get_node() means that
something went horribly wrong, and causes it to bail.

The driver is supposed to return 0, and leave mm_node == NULL to
signal that it couldn't allocate any memory.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

ef1b2871

drm/i915: Rebind the buffer if its alignment constraints changes with tiling · 467cffba

Chris Wilson authored Mar 07, 2011

Early gen3 and gen2 chipset do not have the relaxed per-surface tiling
constraints of the later chipsets, so we need to check that the GTT
alignment is correct for the new tiling. If it is not, we need to
rebind.
Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

467cffba

drm/i915: Disable GPU semaphores by default · a1656b90

Chris Wilson authored Mar 04, 2011

Andi Kleen narrowed his GPU hangs on his Sugar Bay (SNB desktop) rev 09
down to the use of GPU semaphores, and we already know that they appear
broken up to Huron River (mobile) rev 08. (I'm optimistic that disabling
GPU semaphores is simply hiding another bug by the latency and
side-effects of the additional device interaction it introduces...)

However, use of semaphores is a massive performance improvement... Only
as long as the system remains stable. Enable at your peril.
Reported-by: Andi Kleen <andi-fd@firstfloor.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33921Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

a1656b90

06 Mar, 2011 5 commits

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 · 6277d53a

Linus Torvalds authored Mar 06, 2011

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: hda - Don't set to D3 in Cirrus errata init verbs
  ALSA: hda - add new Fermi 5xx codec IDs to snd-hda
  ASoC: WM8994: Ensure late enable events are processed for the ADCs
  ASoC: WM8994: Don't disable the AIF[1|2]CLK_ENA unconditionaly
  ASoC: Fix WM9081 platform data initialisation
  ALSA: hda - Fix unable to record issue on ASUS N82JV
  ALSA: HDA: Realtek: Fixup jack detection to input subsystem

6277d53a

virtio: console: Don't access vqs if device was unplugged · d7a62cd0

Amit Shah authored Mar 04, 2011

If a virtio-console device gets unplugged while a port is open, a
subsequent close() call on the port accesses vqs to free up buffers.
This can lead to a crash.

The buffers are already freed up as a result of the call to
unplug_ports() from virtcons_remove().  The fix is to simply not access
vq information if port->portdev is NULL.
Reported-by: juzhang <juzhang@redhat.com>
CC: stable@kernel.org
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d7a62cd0

Merge branch 'fix/asoc' into for-linus · 2133991d
Takashi Iwai authored Mar 06, 2011

2133991d

drm/i915: Do not overflow the MMADDR write FIFO · 91355834

Chris Wilson authored Mar 04, 2011

Whilst the GT is powered down (rc6), writes to MMADDR are placed in a
FIFO by the System Agent. This is a limited resource, only 64 entries, of
which 20 are reserved for Display and PCH writes, and so we must take
care not to queue up too many writes. To avoid this, there is counter
which we can poll to ensure there are sufficient free entries in the
fifo.

"Issuing a write to a full FIFO is not supported; at worst it could
result in corruption or a system hang."
Reported-and-Tested-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34056Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

91355834

Revert "drm/i915: fix corruptions on i8xx due to relaxed fencing" · 0ee537ab

Chris Wilson authored Mar 06, 2011

This reverts commit c2e0eb16.

As it turns out, userspace already depends upon being able to enable
tiling on existing bo which it promises to be large enough for its
purposes i.e. it will not access beyond the end of the last full-tile
row.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35016Reported-and-tested-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

0ee537ab

05 Mar, 2011 17 commits

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · fb62c00a

Linus Torvalds authored Mar 05, 2011

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: no .snap inside of snapped namespace
  libceph: fix msgr standby handling
  libceph: fix msgr keepalive flag
  libceph: fix msgr backoff
  libceph: retry after authorization failure
  libceph: fix handling of short returns from get_user_pages
  ceph: do not clear I_COMPLETE from d_release
  ceph: do not set I_COMPLETE
  Revert "ceph: keep reference to parent inode on ceph_dentry"

fb62c00a

mm: use correct numa policy node for transparent hugepages · 5c4b4be3

Andi Kleen authored Mar 04, 2011

Pass down the correct node for a transparent hugepage allocation.  Most
callers continue to use the current node, however the hugepaged daemon
now uses the previous node of the first to be collapsed page instead.
This ensures that khugepaged does not mess up local memory for an
existing process which uses local policy.

The choice of node is somewhat primitive currently: it just uses the
node of the first page in the pmd range.  An alternative would be to
look at multiple pages and use the most popular node.  I used the
simplest variant for now which should work well enough for the case of
all pages being on the same node.

[akpm@linux-foundation.org: coding-style fixes]
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

5c4b4be3

mm: preserve original node for transparent huge page copies · 19ee151e

Andi Kleen authored Mar 04, 2011

This makes a difference for LOCAL policy, where the node cannot be
determined from the policy itself, but has to be gotten from the original
page.
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

19ee151e

mm: add alloc_page_vma_node() · 236344d6

Andi Kleen authored Mar 04, 2011

Add a alloc_page_vma_node that allows passing the "local" node in.  Used
in a followon patch.
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

236344d6

mm: change alloc_pages_vma to pass down the policy node for local policy · 2f5f9486

Andi Kleen authored Mar 04, 2011

Currently alloc_pages_vma() always uses the local node as policy node for
the LOCAL policy.  Pass this node down as an argument instead.

No behaviour change from this patch, but will be needed for followons.
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2f5f9486

RapidIO: Update MAINTAINERS · b8bc1dd3

Alexandre Bounine authored Mar 04, 2011

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b8bc1dd3

drivers/video/backlight/ltv350qv.c: fix a memory leak · 9dab51da

Axel Lin authored Mar 04, 2011

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9dab51da

MAINTAINERS: add maintainer of Samsung Mobile Machine support · 10ffa964

Kyungmin Park authored Mar 04, 2011

Add maintainer of Samsung Mobile machine support.  Currently, Aquila,
Goni, Universal (C210), and Nuri board are supported.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Joe Perches <joe@perches.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

10ffa964

pps: make pps_gen_parport depend on BROKEN · 95b90afe

Thomas Gleixner authored Mar 04, 2011

This driver causes hard lockups, when the active clock soure is jiffies.

The reason is that it loops with interrupts disabled waiting for a
timestamp to be reached by polling getnstimeofday().  Though with a
jiffies clocksource, when that code runs on the same CPU which is
responsible for updating jiffies, then we loop in circles for ever
simply because the timer interrupt cannot update jiffies.  So both UP
and SMP can be affected.

There is no easy fix for that problem so make it depend on BROKEN for
now.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Gordeev <lasaine@lvk.cs.msu.su>
Cc: Rodolfo Giometti <giometti@linux.it>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

95b90afe

drivers/misc/bmp085.c: add MODULE_DEVICE_TABLE · 97e419a0

Axel Lin authored Mar 04, 2011

The device table is required to load modules based on modaliases.
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Shubhrajyoti D <shubhrajyoti@ti.com>
Cc: Christoph Mair <christoph.mair@gmail.com>
Cc: Jonathan Cameron <jic23@cam.ac.uk>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

97e419a0

cpuset: add a missing unlock in cpuset_write_resmask() · b75f38d6

Li Zefan authored Mar 04, 2011

Don't forget to release cgroup_mutex if alloc_trial_cpuset() fails.

[akpm@linux-foundation.org: avoid multiple return points]
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b75f38d6

drivers/rtc/rtc-s3c.c: fix prototype for s3c_rtc_setaie() · 2ec38a03

Axel Lin authored Mar 04, 2011

Fix s3c_rtc_setaie() prototype to eliminate the following compile
warning:

  drivers/rtc/rtc-s3c.c:383: warning: initialization from incompatible pointer type

(akpm: the rtc_class_ops.alarm_irq_enable() handler is being passed two
arguments where it expects just one, presumably with undesired effects)
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2ec38a03

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin · 212e3499

Linus Torvalds authored Mar 04, 2011

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin:
  Blackfin: iflush: update anomaly 05000491 workaround
  Blackfin: outs[lwb]: make sure count is greater than 0

212e3499

Merge branch 'rmobile-fixes-for-linus' of... · 971a967b

Linus Torvalds authored Mar 04, 2011

Merge branch 'rmobile-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6

* 'rmobile-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
  ARM: mach-shmobile: mackerel: modify LCDC clock divider value
  ARM: mach-shmobile: ap4evb: modify LCDC clock divider value
  ARM: mach-shmobile: mackerel: fixup memory initialize for zboot
  ARM: mach-shmobile: ap4evb: fixup memory initialize for zboot
  ARM: mach-shmobile: Add sh73a0 MIPI-CSI and CEU clocks
  ARM: mach-shmobile: AG5EVM MIPI-DSI LCD reset delay fix

971a967b

Merge branch 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6 · f0678f32

Linus Torvalds authored Mar 04, 2011

* 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
  sh: Change __nosave_XXX symbols to long
  sh: Flush executable pages in copy_user_highpage
  sh: Ensure ST40-300 BogoMIPS value is consistent
  sh: sh7750: Fix incompatible pointer type
  sh: sh7750: move machtypes.h to include/generated

f0678f32

Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 · be91bfeb

Linus Torvalds authored Mar 04, 2011

* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/nouveau: allocate kernel's notifier object at end of block

be91bfeb

nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab (v3) · e9e3d724

Neil Horman authored Mar 04, 2011

The "bad_page()" page allocator sanity check was reported recently (call
chain as follows):

  bad_page+0x69/0x91
  free_hot_cold_page+0x81/0x144
  skb_release_data+0x5f/0x98
  __kfree_skb+0x11/0x1a
  tcp_ack+0x6a3/0x1868
  tcp_rcv_established+0x7a6/0x8b9
  tcp_v4_do_rcv+0x2a/0x2fa
  tcp_v4_rcv+0x9a2/0x9f6
  do_timer+0x2df/0x52c
  ip_local_deliver+0x19d/0x263
  ip_rcv+0x539/0x57c
  netif_receive_skb+0x470/0x49f
  :virtio_net:virtnet_poll+0x46b/0x5c5
  net_rx_action+0xac/0x1b3
  __do_softirq+0x89/0x133
  call_softirq+0x1c/0x28
  do_softirq+0x2c/0x7d
  do_IRQ+0xec/0xf5
  default_idle+0x0/0x50
  ret_from_intr+0x0/0xa
  default_idle+0x29/0x50
  cpu_idle+0x95/0xb8
  start_kernel+0x220/0x225
  _sinittext+0x22f/0x236

It occurs because an skb with a fraglist was freed from the tcp
retransmit queue when it was acked, but a page on that fraglist had
PG_Slab set (indicating it was allocated from the Slab allocator (which
means the free path above can't safely free it via put_page.

We tracked this back to an nfsv4 setacl operation, in which the nfs code
attempted to fill convert the passed in buffer to an array of pages in
__nfs4_proc_set_acl, which gets used by the skb->frags list in
xs_sendpages.  __nfs4_proc_set_acl just converts each page in the buffer
to a page struct via virt_to_page, but the vfs allocates the buffer via
kmalloc, meaning the PG_slab bit is set.  We can't create a buffer with
kmalloc and free it later in the tcp ack path with put_page, so we need
to either:

1) ensure that when we create the list of pages, no page struct has
   PG_Slab set

 or

2) not use a page list to send this data

Given that these buffers can be multiple pages and arbitrarily sized, I
think (1) is the right way to go.  I've written the below patch to
allocate a page from the buddy allocator directly and copy the data over
to it.  This ensures that we have a put_page free-able page for every
entry that winds up on an skb frag list, so it can be safely freed when
the frame is acked.  We do a put page on each entry after the
rpc_call_sync call so as to drop our own reference count to the page,
leaving only the ref count taken by tcp_sendpages.  This way the data
will be properly freed when the ack comes in

Successfully tested by myself to solve the above oops.

Note, as this is the result of a setacl operation that exceeded a page
of data, I think this amounts to a local DOS triggerable by an
uprivlidged user, so I'm CCing security on this as well.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
CC: security@kernel.org
CC: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e9e3d724