Commits · 8babd8a2e75cccff3167a61176c2a3e977e13799 · Kirill Smelkov / linux

05 Mar, 2010 4 commits

xfs: Increase the default size of the reserved blocks pool · 8babd8a2

Dave Chinner authored Mar 04, 2010

The current default size of the reserved blocks pool is easy to deplete
with certain workloads, in particular workloads that do lots of concurrent
delayed allocation extent conversions. If enough transactions are running
in parallel and the entire pool is consumed then subsequent calls to
xfs_trans_reserve() will fail with ENOSPC. Also add a rate limited
warning so we know if this starts happening again.

This is an updated version of an old patch from Lachlan McIlroy.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

8babd8a2

xfs: truncate delalloc extents when IO fails in writeback · 3ed3a434

Dave Chinner authored Mar 05, 2010

We currently use block_invalidatepage() to clean up pages where I/O
fails in ->writepage(). Unfortunately, if the page has delalloc
regions on it, we fail to remove the delalloc regions when we
invalidate the page.  This can result in tripping a BUG() in
xfs_get_blocks() later on if a direct IO read is done on that same
region - the delalloc extent is returned when none is supposed to be
there.

Fix this by truncating away the delalloc regions on the page before
invalidating it. Because they are delalloc, we can do this without
needing a transaction. Indeed - if we get ENOSPC errors, we have to
be able to do this truncation without a transaction as there is
no space left for block reservation (typically why we see a ENOSPC
in writeback).
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

3ed3a434

xfs: check for more work before sleeping in xfssyncd · 20f6b2c7

Dave Chinner authored Mar 04, 2010

xfssyncd processes a queue of work by detaching the queue and
then iterating over all the work items. It then sleeps for a
time period or until new work comes in. If new work is queued
while xfssyncd is actively processing the detached work queue,
it will not process that new work until after a sleep timeout
or the next work event queued wakes it.

Fix this by checking the work queue again before going to sleep.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

20f6b2c7

xfs: Fix a build warning in xfs_aops.c · 69418932

Dave Chinner authored Mar 04, 2010

Fix a build warning that slipped through.  Dave Chinner had posted
an updated version of his patch but the previous version--without
this fix--was what got committed.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

69418932

02 Mar, 2010 1 commit

xfs: fix locking for inode cache radix tree tag updates · f1f724e4

Christoph Hellwig authored Mar 01, 2010

The radix-tree code requires it's users to serialize tag updates
against other updates to the tree.  While XFS protects tag updates
against each other it does not serialize them against updates of the
tree contents, which can lead to tag corruption.  Fix the inode
cache to always take pag_ici_lock in exclusive mode when updating
radix tree tags.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Patrick Schreurs <patrick@news-service.com>
Tested-by: Patrick Schreurs <patrick@news-service.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

f1f724e4

01 Mar, 2010 15 commits

xfs: remove xfs_ipin/xfs_iunpin · a14a5ab5

Christoph Hellwig authored Feb 18, 2010

Inodes are only pinned/unpinned via the inode item methods, and lots of
code relies on that fact.  So remove the separate xfs_ipin/xfs_iunpin
helpers and merge them into their only callers.  This also fixes up
various duplicate and/or incorrect comments.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

a14a5ab5

xfs: cleanup xfs_iunpin_wait/xfs_iunpin_nowait · 60ec6783

Christoph Hellwig authored Feb 17, 2010

Remove the inode item pointer and ili_last_lsn checks in
__xfs_iunpin_wait as any pinned inode is guaranteed to have them
valid.  After this the xfs_iunpin_nowait case is nothing more than a
xfs_log_force_lsn, as we know that the caller has already checked
the pincount.

Make xfs_iunpin_nowait the new low-level routine just doing the log
force and rewrite xfs_iunpin_wait around it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

60ec6783

xfs: kill xfs_lrw.h · d7658d48

Christoph Hellwig authored Feb 17, 2010

Move the two declarations to better fitting headers now that
xfs_lrw.c is gone.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

d7658d48

xfs: factor common xfs_trans_bjoin code · d7e84f41

Christoph Hellwig authored Feb 15, 2010

Most of xfs_trans_bjoin is duplicated in xfs_trans_get_buf,
xfs_trans_getsb and xfs_trans_read_buf.  Add a new _xfs_trans_bjoin
which can be called by all four functions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

d7e84f41

xfs: stop passing opaque handles to xfs_log.c routines · 35a8a72f

Christoph Hellwig authored Feb 15, 2010

Currenly we pass opaque xfs_log_ticket_t handles instead of
struct xlog_ticket pointers, and void pointers instead of
struct xlog_in_core pointers to various log manager functions.
Instead pass properly typed pointers after adding forward
declarations for them to xfs_log.h, and adjust the touched
function prototypes to the standard XFS style while at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

35a8a72f

xfs: split xfs_bmap_btalloc · c467c049

Christoph Hellwig authored Feb 15, 2010

Split out the nullfb case into a separate function to reduce the stack
footprint and make the code more readable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

c467c049

xfs: fix xfs_fsblock_t tracing · f7008d0a

Christoph Hellwig authored Feb 15, 2010

Using a static buffer in xfs_fmtfsblock means we can corrupt traces if
multiple CPUs hit this code path at the same.  Just remove xfs_fmtfsblock
for now and print the block number purely numerical.  If we want the
NULLFSBLOCK and NULLSTARTBLOCK formatting back the best way would be
a decoding plugin in the trace-cmd userspace command.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

f7008d0a

xfs: fix inode pincount check in fsync · 024910cb

Christoph Hellwig authored Feb 17, 2010

We need to hold the ilock to check the inode pincount safely.  While
we're at it also remove the check for ip->i_itemp->ili_last_lsn, a
pinned inode always has it set.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

024910cb

xfs: Non-blocking inode locking in IO completion · 77d7a0c2

Dave Chinner authored Feb 17, 2010

The introduction of barriers to loop devices has created a new IO
order completion dependency that XFS does not handle. The loop
device implements barriers using fsync and so turns a log IO in the
XFS filesystem on the loop device into a data IO in the backing
filesystem. That is, the completion of log IOs in the loop
filesystem are now dependent on completion of data IO in the backing
filesystem.

This can cause deadlocks when a flush daemon issues a log force with
an inode locked because the IO completion of IO on the inode is
blocked by the inode lock. This in turn prevents further data IO
completion from occuring on all XFS filesystems on that CPU (due to
the shared nature of the completion queues). This then prevents the
log IO from completing because the log is waiting for data IO
completion as well.

The fix for this new completion order dependency issue is to make
the IO completion inode locking non-blocking. If the inode lock
can't be grabbed, simply requeue the IO completion back to the work
queue so that it can be processed later. This prevents the
completion queue from being blocked and allows data IO completion on
other inodes to proceed, hence avoiding completion order dependent
deadlocks.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

77d7a0c2

xfs: implement optimized fdatasync · 66d834ea

Christoph Hellwig authored Feb 15, 2010

Allow us to track the difference between timestamp and size updates
by using mark_inode_dirty from the I/O completion code, and checking
the VFS inode flags in xfs_file_fsync.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

66d834ea

xfs: remove wrapper for the fsync file operation · fd3200be

Christoph Hellwig authored Feb 15, 2010

Currently the fsync file operation is divided into a low-level
routine doing all the work and one that implements the Linux file
operation and does minimal argument wrapping.  This is a leftover
from the days of the vnode operations layer and can be removed to
simplify the code a bit, as well as preparing for the implementation
of an optimized fdatasync which needs to look at the Linux inode
state.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

fd3200be

xfs: remove wrappers for read/write file operations · 00258e36

Christoph Hellwig authored Feb 15, 2010

Currently the aio_read, aio_write, splice_read and splice_write file
operations are divided into a low-level routine doing all the work
and one that implements the Linux file operations and does minimal
argument wrapping.  This is a leftover from the days of the vnode
operations layer and can be removed to simplify the code a lot.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

00258e36

xfs: merge xfs_lrw.c into xfs_file.c · dda35b8f

Christoph Hellwig authored Feb 15, 2010

Currently the code to implement the file operations is split over
two small files.  Merge the content of xfs_lrw.c into xfs_file.c to
have it in one place.  Note that I haven't done various cleanups
that are possible after this yet, they will follow in the next
patch.  Also the function xfs_dev_is_read_only which was in
xfs_lrw.c before really doesn't fit in here at all and was moved to
xfs_mount.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

dda35b8f

xfs: fix dquota trace format · b262e5df

Christoph Hellwig authored Feb 14, 2010

The be32_to_cpu in the TP_printk output breaks automatic parsing of
the trace format by the trace-cmd tools, so we have to move it into
the TP_assign block.  While we're at it also fix the format for the
quota limits to more regular and easier parseable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

b262e5df

xfs: increase readdir buffer size · a9cc799e

Eric Sandeen authored Feb 03, 2010

While doing some testing of readdir perf a while back,
I noticed that the buffer size we're using internally is
smaller than what glibc gives us by default.  Upping this
size helped a bit, and seems safe.

glibc's __alloc_dir() does:

  const size_t default_allocation = (4 * BUFSIZ < sizeof (struct dirent64)
                                     ? sizeof (struct dirent64) : 4 * BUFSIZ);
  const size_t small_allocation = (BUFSIZ < sizeof (struct dirent64)
                                   ? sizeof (struct dirent64) : BUFSIZ);
  size_t allocation = default_allocation;
#ifdef _STATBUF_ST_BLKSIZE
  if (statp != NULL && default_allocation < statp->st_blksize)
    allocation = statp->st_blksize;
#endif

and

#define _G_BUFSIZ 8192
#define _IO_BUFSIZ _G_BUFSIZ
# define BUFSIZ _IO_BUFSIZ

so the default buffer is 4 * 8192 = 32768
(except in the unlikely case of blocks > 32k....)
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

a9cc799e

26 Feb, 2010 1 commit
- Merge branch 'linux-2.6.33' · 398007f8
  Alex Elder authored Feb 26, 2010
  
  398007f8
24 Feb, 2010 13 commits

Linux 2.6.33 · 60b341b7
Linus Torvalds authored Feb 24, 2010

60b341b7
Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6 · 1e6c5c4e
Linus Torvalds authored Feb 24, 2010
```
* 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6:
  parisc: Set PCI CLS early in boot.
```
1e6c5c4e
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 · 46fe2438
Linus Torvalds authored Feb 24, 2010
```
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] Fix broken sn2 build
```
46fe2438

parisc: Set PCI CLS early in boot. · 5fd4514b

Carlos O'Donell authored Feb 22, 2010

Set the PCI CLS early in the boot process to prevent
device failures. In pcibios_set_master use the new
pci_cache_line_size instead of a hard-coded value.
Signed-off-by: Carlos O'Donell <carlos@codesourcery.com>
Reviewed-by: Grant Grundler <grundler@google.com>
Signed-off-by: Kyle McMartin <kyle@redhat.com>

5fd4514b

Merge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze · 7b1f94b8

Linus Torvalds authored Feb 24, 2010

* 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: Fix out_le32() macro
  microblaze: Fix cache loop function for cache range

7b1f94b8

Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block · 83d90add

Linus Torvalds authored Feb 24, 2010

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  Revert "block: improve queue_should_plug() by looking at IO depths"

83d90add

microblaze: Fix out_le32() macro · 83b4d17d

Steven J. Magnani authored Feb 22, 2010

Trailing semicolon causes compilation involving out_le32() to fail.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>

83b4d17d

microblaze: Fix cache loop function for cache range · 0d670b24

Michal Simek authored Feb 15, 2010

I create wrong asm code but none test shows that this part of code is wrong.
I am not convinces that were good idea to create asm optimized macros
for caches. The reason is that there is not optimization with previous code
that's why make sense to add old code and do some benchmarking which
functions are faster.
Signed-off-by: Michal Simek <monstr@monstr.eu>

0d670b24

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · 75ef7cdd

Linus Torvalds authored Feb 23, 2010

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  net: bug fix for vlan + gro issue
  tc35815: Remove a wrong netif_wake_queue() call which triggers BUG_ON
  cdc_ether: new PID for Ericsson C3607w to the whitelist (resubmit)
  IPv6: better document max_addresses parameter
  MAINTAINERS: update mv643xx_eth maintenance status
  e1000: Fix DMA mapping error handling on RX
  iwlwifi: sanity check before counting number of tfds can be free
  iwlwifi: error checking for number of tfds in queue
  iwlwifi: set HT flags after channel in rxon

75ef7cdd

net: bug fix for vlan + gro issue · c4d49794

Ajit Khaparde authored Feb 16, 2010

Traffic (tcp) doesnot start on a vlan interface when gro is enabled.
Even the tcp handshake was not taking place.
This is because, the eth_type_trans call before the netif_receive_skb
in napi_gro_finish() resets the skb->dev to napi->dev from the previously
set vlan netdev interface. This causes the ip_route_input to drop the
incoming packet considering it as a packet coming from a martian source.

I could repro this on 2.6.32.7 (stable) and 2.6.33-rc7.
With this fix, the traffic starts and the test runs fine on both vlan
and non-vlan interfaces.

CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

c4d49794

Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 · be64c970

Linus Torvalds authored Feb 23, 2010

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
  ACPI: Be in TS_POLLING state during mwait based C-state entry
  ACPI: Fix regression where _PPC is not read at boot even when ignore_ppc=0
  acer-wmi: Respect current backlight level when loading

be64c970

Merge branch 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 · 34e3f91b

Linus Torvalds authored Feb 23, 2010

* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/vmwgfx: Fix queries if no dma buffer thrashing is occuring.
  drm/nv50: fix vram ptes on IGPs to point at stolen system memory
  drm/nv50: fix instmem binding on IGPs to point at stolen system memory
  drm/nv50: improve vram page table construction
  drm/nv50: more efficient clearing of gpu page table entries
  drm/nv50: make nv50_mem_vm_{bind,unbind} operate only on vram
  drm/nouveau: Fix up pre-nv17 analog load detection.

34e3f91b

[IA64] Fix broken sn2 build · f7624c97

Hedi Berriche authored Feb 23, 2010

Revert the change made to arch/ia64/sn/kernel/setup.c by commit
204fba4a as it breaks the build.

Fixing the build the b94b0808 way
breaks xpc because genksyms then fails to generate an CRC for
per_cpu____sn_cnodeid_to_nasid because of limitations in the
generic genksyms code.
Signed-off-by: Hedi Berriche <hedi@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

f7624c97

23 Feb, 2010 6 commits

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 · 675c6070
David S. Miller authored Feb 23, 2010

675c6070

tc35815: Remove a wrong netif_wake_queue() call which triggers BUG_ON · 662a96bd

Atsushi Nemoto authored Feb 19, 2010

The netif_wake_queue() is called correctly (i.e. only on !txfull
condition) from txdone routine.  So Unconditional call to the
netif_wake_queue() here is wrong.  This might cause calling of
start_xmit routine on txfull state and trigger BUG_ON.

This bug does not happen when NAPI disabled.  After txdone there
must be at least one free tx slot.  But with NAPI, this is not
true anymore and the BUG_ON can hits on heavy load.

In this driver NAPI was enabled on 2.6.33-rc1 so this is
regression from 2.6.32 kernel.
Reported-by: Ralf Roesch <ralf.roesch@rw-gmbh.de>
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

662a96bd

cdc_ether: new PID for Ericsson C3607w to the whitelist (resubmit) · cac43a1b

Torgny Johansson authored Feb 19, 2010

This patch adds a new vid/pid to the cdc_ether whitelist.

Device added:
- Ericsson Mobile Broadband variant C3607w
Signed-off-by: Torgny Johansson <torgny.johansson@gmail.com>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.htmlSigned-off-by: David S. Miller <davem@davemloft.net>

cac43a1b

IPv6: better document max_addresses parameter · e79dc484

Brian Haley authored Feb 22, 2010

Andrew Morton wrote:
>> >From ip-sysctl.txt file in kernel documentation I can see following description
>> for max_addresses:
>> max_addresses - INTEGER
>>         Number of maximum addresses per interface.  0 disables limitation.
>>         It is recommended not set too large value (or 0) because it would
>>         be too easy way to crash kernel to allow to create too much of
>>         autoconfigured addresses.
           ^^^^^^^^^^^^^^

>> If this parameter applies only for auto-configured IP addressed, please state
>> it more clearly in docs or rename the parameter to show that it refers to
>> auto-configuration.

It did mention autoconfigured in the text, but the below makes it more obvious.

More clearly document IPv6 max_addresses parameter.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e79dc484

MAINTAINERS: update mv643xx_eth maintenance status · f5ca8502

Lennert Buytenhek authored Feb 22, 2010

I am no longer with Marvell.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f5ca8502

e1000: Fix DMA mapping error handling on RX · b5abb028

Anton Blanchard authored Feb 19, 2010

Check for error return from pci_map_single/pci_map_page and clean up.

With this and the previous patch the driver was able to handle a significant
percentage of errors (I set the fault injection rate to 10% and could still
download large files at a reasonable speed).
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b5abb028