Commits · cd4a3c503c185f5f0a20f04f90da0a6966dd03bd · nexedi / linux

08 Apr, 2011 6 commits

xfs: clean up code layout in xfs_trans_ail.c · cd4a3c50

Dave Chinner authored Apr 08, 2011

This patch rearranges the order of functions in xfs_trans_ail.c so
that forward declarations of those functions are no longer needed,
in preparation for adding new functions without having to add more
forward declarations.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>

xfs: convert the xfsaild threads to a workqueue · 0bf6a5bd

Dave Chinner authored Apr 08, 2011

Similar to the xfssyncd, the per-filesystem xfsaild threads can be
converted to a global workqueue and run periodically by delayed
works. This makes sense for the AIL pushing because it uses
variable timeouts depending on the work that needs to be done.

By removing the xfsaild, we simplify the AIL pushing code and
remove the need to spread the code to implement the threading
and pushing across multiple files.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>

xfs: introduce background inode reclaim work · a7b339f1

Dave Chinner authored Apr 08, 2011

Background inode reclaim needs to run more frequently than the XFS
syncd work, as 30s is too long between optimal reclaim runs.
Add a new periodic work item to the xfs syncd workqueue to run a
fast, non-blocking inode reclaim scan.

Background inode reclaim is kicked by the act of marking inodes for
reclaim. When an AG is first marked as having reclaimable inodes,
the background reclaim work is kicked. It will continue to run
periodically until it detects that there are no more reclaimable
inodes. It will be kicked again when the first inode is queued for
reclaim.

To ensure shrinker-based inode reclaim throttles to the inode
cleaning and reclaim rate but still reclaims inodes efficiently,
make it kick the background inode reclaim so that when we are low on
memory we are trying to reclaim inodes as efficiently as possible.
This kick should not be necessary, but it will protect against
failures to kick the background reclaim when inodes are first
dirtied.

To provide the rate throttling, make the shrinker pass do
synchronous inode reclaim so that it blocks on inodes under IO. This
means that the shrinker will reclaim inodes rather than just
skipping over them, but it does not adversely affect the rate of
reclaim because most dirty inodes are already under IO due to the
background reclaim work the shrinker kicked.

These two modifications solve one of the two OOM killer invocations
Chris Mason reported recently when running a stress testing script.
The particular workload that triggers the OOM killer is one where
there are more threads than CPUs, all unlinking files in an
extremely memory-constrained environment. Unlike other solutions,
this one has no performance impact when memory is not constrained
or when the number of concurrent threads is <= the number of CPUs.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
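
The kick and requeue rule described above can be sketched as a small single-threaded C model. This is an illustration under stated assumptions, not the XFS code: it uses one global counter of reclaimable inodes instead of per-AG tags, and the names `reclaim_state`, `reclaim_kick()`, `mark_reclaimable()` and `reclaim_worker()` are hypothetical. The first reclaimable inode queues the work, each run reclaims one non-blocking batch, and the work requeues itself only while reclaimable inodes remain.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one global count instead of per-AG reclaim tags. */
struct reclaim_state {
	int  reclaimable;	/* inodes currently marked for reclaim */
	bool work_queued;	/* is the background reclaim work pending? */
	int  queue_count;	/* how many times the work has been queued */
};

/* Queue the background work unless it is already pending. */
static void reclaim_kick(struct reclaim_state *s)
{
	if (!s->work_queued) {
		s->work_queued = true;
		s->queue_count++;
	}
}

/* Marking an inode when none were reclaimable kicks the background work. */
static void mark_reclaimable(struct reclaim_state *s)
{
	if (s->reclaimable++ == 0)
		reclaim_kick(s);
}

/* One periodic run: reclaim up to 'batch' inodes without blocking,
 * then requeue only if reclaimable inodes remain. */
static void reclaim_worker(struct reclaim_state *s, int batch)
{
	assert(s->work_queued && batch > 0);
	s->work_queued = false;
	s->reclaimable -= s->reclaimable < batch ? s->reclaimable : batch;
	if (s->reclaimable > 0)
		reclaim_kick(s);
}
```

A shrinker pass could simply call `reclaim_kick()` as well; because kicking an already-queued item does nothing in this model, the extra kick is harmless, which matches the commit's description of it as a safety net rather than a requirement.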

xfs: convert ENOSPC inode flushing to use new syncd workqueue · 89e4cb55

Dave Chinner authored Apr 08, 2011

One of the problems with the current inode flush at ENOSPC is that
we queue a flush per ENOSPC event, regardless of how many are
already queued. This can result in hundreds of queued flushes, most
of which simply burn CPU scanning and do no real work. This slows
down allocation at ENOSPC.

We really only need one active flush at a time, and we can easily
implement that via the new xfs_syncd_wq. All we need to do is queue
a flush if one is not already active, then block waiting for the
currently active flush to complete. The result is that we only ever
have a single ENOSPC inode flush active at a time and this greatly
reduces the overhead of ENOSPC processing.

On my 2p test machine, this results in tests exercising ENOSPC
conditions running significantly faster - 042 halves execution time,
083 drops from 60s to 5s, etc - while not introducing test
regressions.

This allows us to remove the old xfssyncd threads and infrastructure
as they are no longer used.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
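
The "only one active flush" rule relies on the fact that queuing a work item that is already pending is a no-op; in the kernel, `queue_work()` returns false in that case. The sketch below is a hypothetical single-threaded model of that coalescing (`enospc_flush`, `queue_flush()` and `run_pending()` are illustrative names). It deliberately does not model callers blocking until the active flush completes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single work item with queue_work()-like semantics. */
struct enospc_flush {
	bool pending;	/* a flush is queued and has not run yet */
	int  runs;	/* flushes actually executed */
};

/* Queue a flush unless one is already pending; report whether it queued. */
static bool queue_flush(struct enospc_flush *f)
{
	if (f->pending)
		return false;	/* coalesce with the pending flush */
	f->pending = true;
	return true;
}

/* Stand-in for the worker executing the queued flush. */
static void run_pending(struct enospc_flush *f)
{
	assert(f->pending);	/* only queued work is executed */
	f->pending = false;
	f->runs++;		/* real code would scan and flush inodes here */
}
```

In this model, one hundred ENOSPC events arriving while a flush is pending produce a single flush; in the real code each caller would then wait for that flush to finish instead of returning immediately.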

xfs: introduce a xfssyncd workqueue · c6d09b66

Dave Chinner authored Apr 08, 2011

All of the work xfssyncd does is background functionality. There is
no need for a thread per filesystem to do this work - it can all be
managed by a global workqueue now that workqueues manage concurrency
effectively.

Introduce a new global xfssyncd workqueue, and convert the periodic
work to use this new functionality. To do this, use a delayed work
construct to schedule the next running of the periodic sync work
for the filesystem. When the sync work is complete, queue a new
delayed work for the next running of the sync work.

In laptop mode we wait for the sync work to complete, so ensure
that the sync work queuing interface can flush and wait for work to
complete. This allows the workqueue infrastructure to replace the
sequence number and wakeup mechanism currently used.

Because the sync work does non-trivial amounts of work, mark the
new work queue as CPU intensive.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
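
The self-rearming pattern described above, where the work function does its job and then queues itself again with a delay, can be sketched with a virtual clock. This is a hypothetical model, not the kernel's delayed work API: `delayed_item`, `queue_delayed()`, `sync_worker()` and `advance()` are illustrative names, and the 30-second interval used below is only an example value.

```c
#include <assert.h>

/* Hypothetical virtual-clock model of a self-rearming delayed work item. */
struct delayed_item {
	long due;	/* virtual second of the next run, or -1 if idle */
	long interval;	/* delay used when the work re-arms itself */
	int  runs;	/* number of completed runs */
};

/* Schedule the item 'delay' seconds from 'now' unless already queued. */
static void queue_delayed(struct delayed_item *dw, long now, long delay)
{
	assert(delay >= 0);
	if (dw->due < 0)
		dw->due = now + delay;
}

/* The work function: do the periodic sync, then queue the next run. */
static void sync_worker(struct delayed_item *dw, long now)
{
	dw->due = -1;
	dw->runs++;	/* the periodic sync itself would happen here */
	queue_delayed(dw, now, dw->interval);
}

/* Advance virtual time one second at a time, firing work when due. */
static void advance(struct delayed_item *dw, long *now, long seconds)
{
	long end = *now + seconds;

	while (*now < end) {
		(*now)++;
		if (dw->due >= 0 && dw->due <= *now)
			sync_worker(dw, *now);
	}
}
```

Starting at time 0 with a 30s interval and advancing 95 seconds, the work runs at 30, 60 and 90 and is next due at 120. Because the work re-arms itself only after finishing, a slow run pushes the next run back rather than stacking runs up.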

xfs: fix extent format buffer allocation size · e828776a

Dave Chinner authored Apr 08, 2011

When formatting an inode item, we have to allocate a separate buffer
to hold extents when there are delayed allocation extents on the
inode and it is in extent format. The allocation size is derived
from the in-core data fork representation, which accounts for
delayed allocation extents, while the on-disk representation does
not contain any delalloc extents.

As a result of this mismatch, the allocated buffer can be far larger
than needed to hold the real extent list, which, because the inode
is in extent format, is limited to the size of the inode's literal
area. However, we can have thousands of delalloc extents, resulting
in an allocation size orders of magnitude larger than is needed to
hold all the real extents.

Fix this by limiting the size of the buffer being allocated to the
size of the literal area of the inodes in the filesystem (i.e. the
maximum size an inode fork can grow to).
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
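
The sizing rule amounts to clamping the in-core estimate to the literal area. A minimal sketch, assuming a 16-byte packed extent record and treating the literal area size as a plain parameter; `extent_buf_size()` is a hypothetical helper, not the XFS code.

```c
#include <assert.h>
#include <stddef.h>

#define EXTENT_REC_SIZE 16	/* assumption: one packed on-disk extent record */

/* Size the extent formatting buffer from the in-core extent count, but
 * never beyond the literal area: delalloc extents are not written, and
 * an extent-format fork cannot hold more real extents than fit there. */
static size_t extent_buf_size(size_t incore_extents, size_t literal_area)
{
	size_t want = incore_extents * EXTENT_REC_SIZE;

	assert(literal_area > 0);
	return want < literal_area ? want : literal_area;
}
```

For example, with an illustrative 156-byte literal area, 3 in-core extents need 48 bytes, while 5000 in-core extents (mostly delalloc) are clamped to 156 bytes instead of requesting 80000.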

31 Mar, 2011 1 commit

xfs: fix unreferenced var error in xfs_buf.c · 89b3600c

Dave Chinner authored Mar 29, 2011

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

29 Mar, 2011 33 commits

Linux 2.6.39-rc1 · 0ce790e7

Linus Torvalds authored Mar 29, 2011

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc · 6b2a4f7a

Linus Torvalds authored Mar 29, 2011

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: (26 commits)
  mmc: SDHI should depend on SUPERH || ARCH_SHMOBILE
  mmc: tmio_mmc: Move some defines into a shared header
  mmc: tmio: support aggressive clock gating
  mmc: tmio: fix power-mode interpretation
  mmc: tmio: remove work-around for unmasked SDIO interrupts
  sh: fix SDHI IO address-range
  ARM: mach-shmobile: fix SDHI IO address-range
  mmc: tmio: only access registers above 0xff, if available
  mfd: remove now redundant sh_mobile_sdhi.h header
  sh: convert boards to use linux/mmc/sh_mobile_sdhi.h
  ARM: mach-shmobile: convert boards to use linux/mmc/sh_mobile_sdhi.h
  mmc: tmio: convert the SDHI MMC driver from MFD to a platform driver
  sh: ecovec: use the CONFIG_MMC_TMIO symbols instead of MFD
  mmc: tmio: split core functionality, DMA and MFD glue
  mmc: tmio: use PIO for short transfers
  mmc: tmio-mmc: Improve DMA stability on sh-mobile
  mmc: fix mmc_app_send_scr() for dma transfer
  mmc: sdhci-esdhc: enable esdhc on imx53
  mmc: sdhci-esdhc: use writel/readl as general APIs
  mmc: sdhci: add the abort CMDTYPE bits definition
  ...

Merge branch 'frv' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-frv · eefbab59

Linus Torvalds authored Mar 29, 2011

* 'frv' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-frv:
  FRV: Use generic show_interrupts()
  FRV: Convert genirq namespace
  frv: Select GENERIC_HARDIRQS_NO_DEPRECATED
  frv: Convert cpu irq_chip to new functions
  frv: Convert mb93493 irq_chip to new functions
  frv: Convert mb93093 irq_chip to new function
  frv: Convert mb93091 irq_chip to new functions
  frv: Fix typo from __do_IRQ overhaul
  frv: Remove stale irq_chip.end
  FRV: Do some cleanups
  FRV: Missing node arg in alloc_thread_info_node() macro
  NOMMU: implement access_remote_vm
  NOMMU: support SMP dynamic percpu_alloc
  NOMMU: percpu should use is_vmalloc_addr().

Merge branch 'stable/bug-fixes-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen · 90f1e748

Linus Torvalds authored Mar 29, 2011

* 'stable/bug-fixes-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: Use new irq_move functions
  xen: Convert genirq namespace
  xen: fix p2m section mismatches
  xen/p2m: Allocate p2m tracking pages on override
  xen-gntdev: unlock on error path in gntdev_mmap()
  xen-gntdev: return -EFAULT on copy_to_user failure

Merge git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog · d6ae0c63

Linus Torvalds authored Mar 29, 2011

* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
  watchdog: softdog.c: enhancement to optionally invoke panic instead of reboot on timer expiry
  watchdog: fix nv_tco section mismatch
  watchdog: sp5100_tco.c: Check if firmware has set correct value in tcobase.
  watchdog: Convert release_resource to release_region/release_mem_region
  watchdog: s3c2410_wdt.c: Convert release_resource to release_region/release_mem_region

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp · 8c82840e

Linus Torvalds authored Mar 29, 2011

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  amd64_edac: Fix potential memleak

Merge branch 'irq-final-for-linus-v2' of... · c86defc8

Linus Torvalds authored Mar 29, 2011

Merge branch 'irq-final-for-linus-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'irq-final-for-linus-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (111 commits)
  gpio: ab8500: Mark broken
  genirq: Remove move_*irq leftovers
  genirq: Remove compat code
  drivers: Final irq namespace conversion
  mn10300: Use generic show_interrupts()
  mn10300: Cleanup irq_desc access
  mn10300: Convert genirq namespace
  frv: Use generic show_interrupts()
  frv: Convert genirq namespace
  frv: Select GENERIC_HARDIRQS_NO_DEPRECATED
  frv: Convert cpu irq_chip to new functions
  frv: Convert mb93493 irq_chip to new functions
  frv: Convert mb93093 irq_chip to new function
  frv: Convert mb93091 irq_chip to new functions
  frv: Fix typo from __do_IRQ overhaul
  frv: Remove stale irq_chip.end
  m68k: Convert irq function namespace
  xen: Use new irq_move functions
  xen: Cleanup genirq namespace
  unicore32: Use generic show_interrupts()
  ...

char/tpm: Fix uninitialized usage of data buffer · 1309d7af

Peter Huewe authored Mar 29, 2011

This patch fixes an information leak to userspace by initializing
the data buffer to zero.
Reported-by: Peter Huewe <huewe.external@infineon.com>
Signed-off-by: Peter Huewe <huewe.external@infineon.com>
Signed-off-by: Marcel Selhorst <m.selhorst@sirrix.com>
[ Also removed the silly "* sizeof(u8)".  If that isn't 1, we have way
  deeper problems than a simple multiplication can fix.   - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
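
A userspace sketch of the idea behind this fix: allocate the buffer zero-filled so that bytes the device never writes cannot expose stale memory when the buffer is copied out. `alloc_response_buf()` and `device_fill()` are hypothetical; in kernel code the equivalent zeroing allocator is `kzalloc()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocate a zero-filled response buffer (calloc, unlike malloc, zeroes
 * it), so bytes the device never writes cannot leak stale heap data. */
static unsigned char *alloc_response_buf(size_t size)
{
	assert(size > 0);
	return calloc(size, 1);
}

/* Hypothetical device: writes at most 'len' bytes of its reply. */
static size_t device_fill(unsigned char *buf, size_t size,
			  const unsigned char *data, size_t len)
{
	size_t n = len < size ? len : size;

	memcpy(buf, data, n);
	return n;
}
```

Passing an element size of 1 to `calloc()` reflects Linus's note above: multiplying by `sizeof(u8)` adds nothing.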

amd64_edac: Fix potential memleak · a9f0fbe2

Borislav Petkov authored Mar 29, 2011

We check the pointers together but at least one of them could be invalid
due to failed allocation. Since we cannot continue if either of the two
allocations has failed, exit early by freeing them both.

Cc: <stable@kernel.org> # 38.x
Reported-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
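
A sketch of the "free both on any failure" pattern in userspace C. It relies on `free(NULL)` being a no-op (as `kfree(NULL)` is in the kernel), so whichever allocation failed, freeing both pointers is safe. The counting allocator and the `fail_second_alloc()` fault-injection hook are hypothetical test scaffolding, not part of amd64_edac.

```c
#include <assert.h>
#include <stdlib.h>

struct pair { int *a; int *b; };

typedef void *(*alloc_fn)(size_t);

static int live_allocs;	/* outstanding allocations, to detect leaks */

static void *counting_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void counting_free(void *p)
{
	if (p)
		live_allocs--;
	free(p);	/* free(NULL) is a no-op */
}

/* Hypothetical fault injection: the second allocation fails. */
static int alloc_calls;
static void *fail_second_alloc(size_t n)
{
	return ++alloc_calls == 2 ? NULL : counting_alloc(n);
}

/* Allocate both arrays or neither. */
static int alloc_pair(struct pair *p, size_t n, alloc_fn alloc)
{
	assert(p != NULL && alloc != NULL);
	p->a = alloc(n * sizeof(*p->a));
	p->b = alloc(n * sizeof(*p->b));
	if (!p->a || !p->b) {
		/* Either one may have succeeded: free both and bail out. */
		counting_free(p->a);
		counting_free(p->b);
		p->a = p->b = NULL;
		return -1;
	}
	return 0;
}
```

Checking only one pointer, or returning early without freeing, would leak whichever allocation did succeed; freeing both unconditionally on the error path avoids having to know which one failed.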

fs: don't use igrab() while holding i_lock · 0444d76a

Dave Chinner authored Mar 29, 2011

Fix the incorrect use of igrab() inside the i_lock in NFS and Ceph.

If we are already holding the i_lock, we have a reference to the
inode so we can safely use ihold() to gain an extra reference. This
avoids hangs due to lock recursion on the i_lock now that the
inode_lock is gone and igrab() uses the i_lock itself.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: Ryan Mallon <ryan@bluewatersys.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
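
A hypothetical model of why the two helpers differ: an igrab()-style helper takes i_lock itself, while an ihold()-style helper only bumps the count because the caller already holds a reference. The lock is modelled as a flag that asserts on recursion instead of hanging; `inode_model`, `grab_ref()` and `hold_ref()` are illustrative names, and the liveness check is simplified.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: i_lock is non-recursive, represented by a flag;
 * re-acquiring it trips an assert instead of hanging. */
struct inode_model {
	bool i_lock_held;
	int  i_count;
};

static void lock_inode(struct inode_model *ino)
{
	assert(!ino->i_lock_held && "lock recursion on i_lock");
	ino->i_lock_held = true;
}

static void unlock_inode(struct inode_model *ino)
{
	assert(ino->i_lock_held);
	ino->i_lock_held = false;
}

/* igrab()-like: takes i_lock internally, so it must NOT be called with
 * i_lock already held. Returns false if the inode is going away. */
static bool grab_ref(struct inode_model *ino)
{
	bool ok;

	lock_inode(ino);
	ok = ino->i_count > 0;	/* simplified liveness check */
	if (ok)
		ino->i_count++;
	unlock_inode(ino);
	return ok;
}

/* ihold()-like: the caller already has a reference, so no lock or
 * liveness check is needed; just take another one. */
static void hold_ref(struct inode_model *ino)
{
	assert(ino->i_count > 0);
	ino->i_count++;
}
```

Calling `grab_ref()` between `lock_inode()` and `unlock_inode()` trips the recursion assert in this model, which stands in for the hang the commit describes; `hold_ref()` is safe in that position.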

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · cb1817b3

Linus Torvalds authored Mar 29, 2011

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (30 commits)
  xfrm: Restrict extended sequence numbers to esp
  xfrm: Check for esn buffer len in xfrm_new_ae
  xfrm: Assign esn pointers when cloning a state
  xfrm: Move the test on replay window size into the replay check functions
  netdev: bfin_mac: document TE setting in RMII modes
  drivers net: Fix declaration ordering in inline functions.
  cxgb3: Apply interrupt coalescing settings to all queues
  net: Always allocate at least 16 skb frags regardless of page size
  ipv4: Don't ip_rt_put() an error pointer in RAW sockets.
  net: fix ethtool->set_flags not intended -EINVAL return value
  mlx4_en: Fix loss of promiscuity
  tg3: Fix inline keyword usage
  tg3: use <linux/io.h> and <linux/uaccess.h> instead <asm/io.h> and <asm/uaccess.h>
  net: use CHECKSUM_NONE instead of magic number
  Net / jme: Do not use legacy PCI power management
  myri10ge: small rx_done refactoring
  bridge: notify applications if address of bridge device changes
  ipv4: Fix IP timestamp option (IPOPT_TS_PRESPEC) handling in ip_options_echo()
  can: c_can: Fix tx_bytes accounting
  can: c_can_platform: fix irq check in probe
  ...

xen: Use new irq_move functions · e240ae4a

Thomas Gleixner authored Mar 24, 2011

These functions take irq_data as an argument and avoid a redundant
lookup in the sparse irq case.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: Convert genirq namespace · 3b3af761

Thomas Gleixner authored Mar 25, 2011

Converted with coccinelle.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: fix p2m section mismatches · b83c6e55

Randy Dunlap authored Mar 24, 2011

Fix section mismatch warnings:
set_phys_range_identity() is called by __init xen_set_identity(),
so also mark set_phys_range_identity() as __init.
Then:
__early_alloc_p2m() is called by set_phys_range_identity(), so also
mark __early_alloc_p2m() as __init.

WARNING: arch/x86/built-in.o(.text+0x7856): Section mismatch in reference from the function __early_alloc_p2m() to the function .init.text:extend_brk()
The function __early_alloc_p2m() references
the function __init extend_brk().
This is often because __early_alloc_p2m lacks a __init
annotation or the annotation of extend_brk is wrong.

WARNING: arch/x86/built-in.o(.text+0x7967): Section mismatch in reference from the function set_phys_range_identity() to the function .init.text:extend_brk()
The function set_phys_range_identity() references
the function __init extend_brk().
This is often because set_phys_range_identity lacks a __init
annotation or the annotation of extend_brk is wrong.

[v2: Per Stephen Hemming's recommendation, made __early_alloc_p2m static]
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

FRV: Use generic show_interrupts() · 3062aa50

Thomas Gleixner authored Mar 29, 2011

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Howells <dhowells@redhat.com>

FRV: Convert genirq namespace · 60af3ab1

Thomas Gleixner authored Mar 29, 2011

Convert to new function names.  Converted with coccinelle.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Howells <dhowells@redhat.com>

frv: Select GENERIC_HARDIRQS_NO_DEPRECATED · a9554c3a

Thomas Gleixner authored Mar 29, 2011

All chips converted
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Howells <dhowells@redhat.com>

frv: Convert cpu irq_chip to new functions · 12516469

Thomas Gleixner authored Mar 29, 2011

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Howells <dhowells@redhat.com>

frv: Convert mb93493 irq_chip to new functions · a4b48a49

Thomas Gleixner authored Mar 29, 2011

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Howells <dhowells@redhat.com>

frv: Convert mb93093 irq_chip to new function · 9148d88b

Thomas Gleixner authored Mar 29, 2011

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Howells <dhowells@redhat.com>

frv: Convert mb93091 irq_chip to new functions · 193e7a5f

Thomas Gleixner authored Mar 29, 2011

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Howells <dhowells@redhat.com>

frv: Fix typo from __do_IRQ overhaul · 303fef90

Thomas Gleixner authored Mar 29, 2011

Compiles way better.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Howells <dhowells@redhat.com>

frv: Remove stale irq_chip.end · c4b15980

Thomas Gleixner authored Mar 29, 2011

irq_chip.end became obsolete with the removal of __do_IRQ().

irq-mb93093.c even lacks an implementation, but nobody noticed that
it has been broken since commit 88d6e1 in 2006.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Howells <dhowells@redhat.com>

FRV: Do some cleanups · 5ca7202b

Amerigo Wang authored Mar 29, 2011

1. frv doesn't support SMP, remove the useless SMP bits.

2. frv has its own alloc_task_struct, so define __HAVE_ARCH_TASK_STRUCT_ALLOCATOR
   (I am not sure if frv should use generic alloc_task_struct().)
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>

FRV: Missing node arg in alloc_thread_info_node() macro · 5ef9bdde

David Howells authored Mar 29, 2011

There are two alloc_thread_info_node() macros defined (one for debugging and
one for normal).  The commit that changed them most recently:

	commit b6a84016
	Author: Eric Dumazet <eric.dumazet@gmail.com>
	Date:   Tue Mar 22 16:30:42 2011 -0700
	Subject: mm: NUMA aware alloc_thread_info_node()

didn't add the node argument into the macro argument list for the normal macro.
This results in the following error:

kernel/fork.c:267:39: error: macro "alloc_thread_info_node" passed 2 arguments, but takes just 1
kernel/fork.c: In function 'dup_task_struct':
kernel/fork.c:267: error: 'alloc_thread_info_node' undeclared (first use in this function)
kernel/fork.c:267: error: (Each undeclared identifier is reported only once
kernel/fork.c:267: error: for each function it appears in.)
Signed-off-by: David Howells <dhowells@redhat.com>
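
A hypothetical illustration of the arity mismatch: if callers invoke the macro as `(tsk, node)`, every definition, including the non-debug one, must accept both parameters, even when the underlying allocator ignores the node hint. `alloc_ti()` and `alloc_ti_node()` are stand-in names, not the frv definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the frv definitions. */
struct task { int id; };
struct thread_info { struct task *task; };

static struct thread_info *alloc_ti(struct task *tsk)
{
	struct thread_info *ti;

	assert(tsk != NULL);
	ti = malloc(sizeof(*ti));
	if (ti)
		ti->task = tsk;
	return ti;
}

/*
 * Broken: a one-parameter definition such as
 *     #define alloc_ti_node(tsk) alloc_ti(tsk)
 * makes every two-argument call fail in the preprocessor.
 * Fixed: accept the node argument even though it is ignored.
 */
#define alloc_ti_node(tsk, node) ((void)(node), alloc_ti(tsk))
```

Evaluating `node` through a `(void)` cast keeps any side effects in the argument and avoids unused-variable warnings at call sites, while the allocation itself stays node-agnostic.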

NOMMU: implement access_remote_vm · f55f199b

Mike Frysinger authored Mar 29, 2011

Recent vm changes brought in a new function which the core procfs code
utilizes.  So implement it for nommu systems too to avoid link failures.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Simon Horman <horms@verge.net.au>
Tested-by: Ithamar Adema <ithamar.adema@team-embedded.nl>
Acked-by: Greg Ungerer <gerg@uclinux.org>

gpio: ab8500: Mark broken · 9ad198cb

Thomas Gleixner authored Mar 29, 2011

This driver is broken in several aspects.

 1) old style irq_chip functions. Sigh

 2) Abuse of the unlock callback. That's not supposed to be a state
    machine for everything and some more.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Remove move_*irq leftovers · 851d7cf6

Thomas Gleixner authored Mar 29, 2011

All users converted to new interface.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

genirq: Remove compat code · 0c6f8a8b

Thomas Gleixner authored Mar 28, 2011

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

drivers: Final irq namespace conversion · dced35ae

Thomas Gleixner authored Mar 28, 2011

Scripted with coccinelle.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

mn10300: Use generic show_interrupts() · 2a8f55b1

Thomas Gleixner authored Mar 24, 2011

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

mn10300: Cleanup irq_desc access · 232f1d85

Thomas Gleixner authored Mar 24, 2011

The migration code only needs access to irq_data.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

mn10300: Convert genirq namespace · f4c547eb

Thomas Gleixner authored Mar 24, 2011

Convert to new function names. Converted with coccinelle.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
