Commits · 44fb5511638938a2c37c895abc14df648ffc07e9 · nexedi / linux

04 Jun, 2009 2 commits

Btrfs: Fix oops and use after free during space balancing · 44fb5511

Chris Mason authored Jun 04, 2009

The btrfs allocator uses list_for_each to walk the available block
groups when searching for free blocks.  It starts off with a hint
to help find the best block group for a given allocation.

The hint is resolved into a block group, but we don't properly check
to make sure the block group we find isn't in the middle of being
freed due to filesystem shrinking or balancing.  If it is being
freed, the list pointers in it are bogus and can't be trusted.  But,
the code happily goes along and uses them in the list_for_each loop,
leading to all kinds of fun.

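The fix used here is to check that the block group we find really
is on the list before we use it.  Because list_del_init is used when
removing it from the list, a removed group's list pointers point back at
itself, so an empty-list check on the group's own list head reliably tells
us whether it has been removed.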

The allocation clustering code has a similar bug where it will trust
the block group in the current free space cluster.  If our allocation
flags have changed (going from single spindle dup to raid1 for example)
because the drives in the FS have changed, we're not allowed to use
the old block group any more.

The fix used here is to check the current cluster against the
current allocation flags.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: set device->total_disk_bytes when adding new device · 2cc3c559

Yan Zheng authored Jun 04, 2009

It was not being properly initialized, and so the size saved to
disk was not correct.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

14 May, 2009 6 commits

Btrfs: Spelling fix in btrfs_lookup_first_block_group comments · 9f55684c

Sankar P authored May 14, 2009

Signed-off-by: Sankar P <sankar.curiosity@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: make show_options result match actual option names · 6b65c5c6

Sage Weil authored May 14, 2009

The notreelog and flushoncommit mount options were being printed slightly
differently from their actual option names.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: remove outdated comment in btrfs_ioctl_resize() · 5d847a8e

Li Hong authored May 14, 2009

In Li Zefan's commit dae7b665, the combined kmalloc() and
copy_from_user() calls were replaced by memdup_user(), so
btrfs_ioctl_resize() no longer uses GFP_NOFS.
Signed-off-by: Li Hong <lihong.hi@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: remove some WARN_ONs in the IO failure path · cc7b0c9b

Chris Mason authored May 14, 2009

These debugging WARN_ONs make too much console noise during regular
IO failures.  An IO failure will still generate a number of messages
as we verify checksums etc, but these two are not needed.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Don't loop forever on metadata IO failures · 76a05b35

Chris Mason authored May 14, 2009

When a btrfs metadata read fails, the first thing we try to do is find
a good copy on another mirror of the block.  If this fails, read_tree_block()
ends up returning a buffer that isn't up to date.

The btrfs btree reading code was reworked to drop locks and repeat
the search when IO was done, but the changes didn't add a check for failed
reads.  The end result was looping forever on buffers that were never
going to become up to date.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
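
The missing check can be sketched as a control-flow model. Everything here is hypothetical (struct sketch_buffer, sketch_read_block(), sketch_search()): read every mirror, and if the buffer is still not up to date, return -EIO instead of restarting the search.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a metadata buffer. */
struct sketch_buffer {
	bool uptodate;
	int reads_attempted;
};

/*
 * Hypothetical stand-in for read_tree_block(): try each mirror in turn.
 * good_mirror is the index of the only readable copy (-1 for none); if
 * every mirror fails, the buffer comes back but is not up to date.
 */
static void sketch_read_block(struct sketch_buffer *buf, int num_mirrors,
			      int good_mirror)
{
	assert(num_mirrors > 0);
	for (int m = 0; m < num_mirrors; m++) {
		buf->reads_attempted++;
		if (m == good_mirror) {
			buf->uptodate = true;
			return;
		}
	}
}

/*
 * The search drops its locks for IO and starts over afterwards.  Without
 * the !uptodate check after the read, a block that can never be read
 * would send this loop around forever.
 */
static int sketch_search(struct sketch_buffer *buf, int num_mirrors,
			 int good_mirror)
{
	for (;;) {
		if (buf->uptodate)
			return 0;       /* block is valid: search succeeds */
		sketch_read_block(buf, num_mirrors, good_mirror);
		if (!buf->uptodate)
			return -EIO;    /* failed read: stop instead of retrying */
		/* read succeeded: repeat the search from the top */
	}
}
```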

Btrfs: init inode ordered_data_close flag properly · 2757495c

Chris Mason authored May 14, 2009

This flag is used to decide when we need to send a given file through
the ordered code to make sure it is fully written before a transaction
commits.  It was not being properly set to zero when the inode was
being set up.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

27 Apr, 2009 8 commits

Btrfs: look for acls during btrfs_read_locked_inode · 46a53cca

Chris Mason authored Apr 27, 2009

This changes btrfs_read_locked_inode() to peek ahead in the btree for acl items.
If it is certain a given inode has no acls, it sets the in-memory acl
fields to NULL to avoid acl lookups completely.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix acl caching · 7b1a14bb

Chris Mason authored Apr 27, 2009

Linus noticed the btrfs code to cache acls wasn't properly caching
a NULL acl when the inode didn't have any acls.  This meant the common
case of no acls resulted in expensive btree searches every time the
kernel checked permissions (which is quite often).

This is a modified version of Linus' original patch:

Properly set initial acl fields to BTRFS_ACL_NOT_CACHED in the inode.
This forces an acl lookup when permission checks are done.

Fix btrfs_get_acl to avoid lookups and locking when the inode's acl fields
are set to NULL.

Fix btrfs_get_acl to use the right return value from __btrfs_getxattr
when deciding to cache a NULL acl.  It was storing a NULL acl when
__btrfs_getxattr returned -ENOENT, but __btrfs_getxattr actually returns
-ENODATA in this case.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
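
A minimal sketch of the three-state cache this fix relies on: a sentinel meaning "not looked up yet", NULL meaning "looked up, no acl", and a real pointer. ACL_NOT_CACHED_SKETCH, struct inode_sketch and get_acl_sketch() are hypothetical names standing in for BTRFS_ACL_NOT_CACHED, the btrfs inode and btrfs_get_acl. Error handling other than -ENODATA is simplified.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical acl type; the kernel uses struct posix_acl. */
struct acl_sketch {
	int nentries;
};

/* Sentinel for "not looked up yet", distinct from NULL ("no acl"). */
#define ACL_NOT_CACHED_SKETCH ((struct acl_sketch *)-1)

struct inode_sketch {
	struct acl_sketch *acl;     /* cache slot, starts as NOT_CACHED */
	int btree_lookups;          /* counts the expensive searches */
	int xattr_ret;              /* what the xattr lookup will return */
	struct acl_sketch stored;   /* acl found on disk, if any */
};

/* Stand-in for __btrfs_getxattr(): size on success, negative errno if not. */
static int getxattr_sketch(struct inode_sketch *inode)
{
	inode->btree_lookups++;
	return inode->xattr_ret;
}

static struct acl_sketch *get_acl_sketch(struct inode_sketch *inode)
{
	int ret;

	assert(inode != NULL);
	if (inode->acl != ACL_NOT_CACHED_SKETCH)
		return inode->acl;      /* cached acl or cached NULL: no search */

	ret = getxattr_sketch(inode);
	if (ret > 0)
		inode->acl = &inode->stored;
	else if (ret == -ENODATA)
		inode->acl = NULL;      /* "no acl" is cached too */
	else
		return NULL;            /* simplified: other errors stay uncached */
	return inode->acl;
}
```

Testing for -ENOENT instead of -ENODATA in the middle branch would leave the slot at the sentinel, which is the repeated-search behaviour the fix removes.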

Btrfs: Fix a bunch of printk() warnings. · 21380931

Joel Becker authored Apr 21, 2009

Just happened to notice a bunch of %llu vs u64 warnings.  Here's a patch
to cast them all.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
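
The cast pattern behind these fixes, as a userspace sketch: uint64_t plays the role of u64, and format_bytenr() is a hypothetical helper. Because u64 is unsigned long on some 64-bit architectures (ia64, for example) and unsigned long long on others, only an explicit cast makes "%llu" correct everywhere.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper; uint64_t stands in for the kernel's u64. */
static int format_bytenr(char *buf, size_t len, uint64_t bytenr)
{
	assert(buf != NULL && len > 0);
	/* %llu expects unsigned long long, so cast rather than rely on u64's type. */
	return snprintf(buf, len, "bytenr %llu", (unsigned long long)bytenr);
}
```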

Btrfs: Fix a trivial warning using max() of u64 vs ULL. · e63b6a6c

Joel Becker authored Apr 21, 2009

A small warning popped up on ia64 because inode-map.c was comparing a
u64 object id with the ULL FIRST_FREE_OBJECTID.  My first thought was
that all the OBJECTID constants should contain the u64 cast because
btrfs code deals entirely in u64s.  But then I saw how large that was,
and figured I'd just fix the max() call.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
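
A sketch of why the warning appears and of one way to fix the call, in the style of the kernel's max_t(). The kernel's max() declares typed temporaries and compares their addresses, which the compiler flags when the two argument types differ; strict_max() below mimics that shape (GNU C) and is shown for reference only. The max_t() here is a simplified, portable version that evaluates its arguments more than once. The 256ULL value for FIRST_FREE_OBJECTID_SKETCH and the helper next_objectid() are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shape of the kernel's type-checking max() (GNU C, shown for reference
 * and not used below): comparing &_x with &_y draws a "comparison of
 * distinct pointer types" warning when x and y have different types.
 */
#define strict_max(x, y) ({                 \
	__typeof__(x) _x = (x);             \
	__typeof__(y) _y = (y);             \
	(void)(&_x == &_y);                 \
	_x > _y ? _x : _y; })

/* Simplified max_t(): force both sides to one type (evaluates args twice). */
#define max_t(type, x, y) \
	((type)(x) > (type)(y) ? (type)(x) : (type)(y))

#define FIRST_FREE_OBJECTID_SKETCH 256ULL   /* assumed value */

/* Hypothetical helper: never hand out an objectid below the first free one. */
static uint64_t next_objectid(uint64_t highest_inuse)
{
	/* strict_max(highest_inuse, FIRST_FREE_OBJECTID_SKETCH) would warn
	 * where uint64_t is unsigned long rather than unsigned long long. */
	return max_t(uint64_t, highest_inuse, FIRST_FREE_OBJECTID_SKETCH);
}
```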

Btrfs: remove unused btrfs_bit_radix slab · 45c06543

Chris Mason authored Apr 27, 2009

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: ratelimit IO error printks · 193f284d

Chris Mason authored Apr 27, 2009

Btrfs has printks for various IO errors, including bad checksums and
mismatches between what we expect the block headers to contain and what
we actually find on the disk.

Longer term we need a real reporting mechanism for this, but for now
printk is going to have to do, so this patch ratelimits those messages.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: remove #if 0 code · b7967db7

Chris Mason authored Apr 27, 2009

Btrfs had some old code sitting around under #if 0, this drops it.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: When shrinking, only update disk size on success · d6397bae

Chris Ball authored Apr 27, 2009

Previously, we updated a device's size prior to attempting a shrink
operation. This patch moves the device resizing logic to only happen if
the shrink completes successfully. In the process, it introduces a new
field to btrfs_device -- disk_total_bytes -- to track the on-disk size.
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
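
A sketch of the reordered shrink, assuming a hypothetical struct device_sketch with total_bytes (what the allocator may use) and disk_total_bytes (what is recorded on disk), plus a stand-in relocate_above() that fails with -ENOSPC when the used bytes do not fit. In this sketch the on-disk size is only updated once relocation succeeds, and the in-memory size is rolled back on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical subset of a btrfs device. */
struct device_sketch {
	uint64_t total_bytes;       /* size the allocator may use */
	uint64_t disk_total_bytes;  /* size recorded on disk */
	uint64_t bytes_used;
};

/* Stand-in for relocating everything above new_size. */
static int relocate_above(struct device_sketch *dev, uint64_t new_size)
{
	return dev->bytes_used > new_size ? -ENOSPC : 0;
}

static int shrink_device_sketch(struct device_sketch *dev, uint64_t new_size)
{
	uint64_t old_size = dev->total_bytes;
	int ret;

	assert(new_size <= old_size);
	dev->total_bytes = new_size;       /* keep new allocations below new_size */
	ret = relocate_above(dev, new_size);
	if (ret) {
		dev->total_bytes = old_size;   /* failed: nothing was shrunk */
		return ret;
	}
	dev->disk_total_bytes = new_size;  /* only now record the new size */
	return 0;
}
```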

24 Apr, 2009 6 commits

Btrfs: fix deadlocks and stalls on dead root removal · 59bc5c75

Chris Mason authored Apr 24, 2009

After a transaction commit, the old roots of the subvolume btrees are sent
through snapshot removal. This is what actually frees any blocks replaced by
COW, and anything the old blocks pointed to.

Snapshot deletion will pause when a transaction commit has started, which
helps to avoid a huge amount of delayed reference count updates piling up
as the transaction is trying to close.

But, this pause happens after the snapshot deletion process has asked other
procs on the system to throttle back a bit so that it can make progress.

We don't want to throttle everyone while we're waiting for the transaction
commit; it leads to deadlocks in the user transaction ioctls used by Ceph
and makes things slower in general.

This patch changes things to avoid the throttling while we sleep.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix fallocate deadlock on inode extent lock · e980b50c

Chris Mason authored Apr 24, 2009

The btrfs fallocate call takes an extent lock on the entire range
being fallocated, and then runs insert_reserved_extent on each
extent as it is allocated.

The problem with this is that btrfs_drop_extents may decide to try
and take the same extent lock fallocate was already holding.  The solution
used here is to push down knowledge of the range that is already locked
going into btrfs_drop_extents.

It turns out that at least one other caller had the same bug.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
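
A sketch of pushing the already-locked range down into the drop-extents step. struct extent_lock_sketch is a deliberately coarse, non-recursive toy lock (the real extent lock is range based) that reports -EDEADLK where the real code would hang. drop_extents_sketch() and its already_locked parameter are hypothetical and do not reproduce the real btrfs_drop_extents() signature.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy, non-recursive lock: a second lock attempt reports -EDEADLK where
 * the real extent lock would simply hang. */
struct extent_lock_sketch {
	bool held;
};

static int lock_sketch(struct extent_lock_sketch *l)
{
	if (l->held)
		return -EDEADLK;
	l->held = true;
	return 0;
}

static void unlock_sketch(struct extent_lock_sketch *l)
{
	assert(l->held);
	l->held = false;
}

/* Inclusive byte range the caller already holds locked. */
struct range_sketch {
	uint64_t start, end;
};

static bool covers(const struct range_sketch *r, uint64_t start, uint64_t end)
{
	return r != NULL && r->start <= start && end <= r->end;
}

/* Hypothetical drop-extents step: it only takes the lock itself when the
 * caller has not said that [start, end] is already locked. */
static int drop_extents_sketch(struct extent_lock_sketch *l, uint64_t start,
			       uint64_t end,
			       const struct range_sketch *already_locked)
{
	bool take_lock = !covers(already_locked, start, end);
	int ret;

	if (take_lock) {
		ret = lock_sketch(l);
		if (ret)
			return ret;
	}
	/* ... drop the file extent items in [start, end] here ... */
	if (take_lock)
		unlock_sketch(l);
	return 0;
}
```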

Btrfs: kill btrfs_cache_create · 9601e3f6

Christoph Hellwig authored Apr 13, 2009

Just use kmem_cache_create directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: don't export symbols · 0d4bf11e

Christoph Hellwig authored Apr 13, 2009

Currently the extent_map code is only used by btrfs, so don't export its
symbols.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: simplify makefile · 2ea2544e

Christoph Hellwig authored Apr 13, 2009

Get rid of the hacks for building out of tree, and always use += for
assigning to the object lists.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: try to keep a healthy ratio of metadata vs data block groups · 97e728d4

Josef Bacik authored Apr 21, 2009

This patch makes the chunk allocator keep a good ratio of metadata vs data
block groups. By default for every 8 data block groups, we'll allocate 1
metadata chunk, or about 12% of the disk will be allocated for metadata. This
can be changed by specifying the metadata_ratio mount option.

The metadata_ratio value is simply the number of data block groups that have
to be allocated to force a metadata chunk allocation. By making sure we
allocate metadata chunks more often, we are less likely to get into situations
where the whole disk has been allocated as data block groups.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
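
The ratio logic as a sketch: a hypothetical counter of data chunk allocations that requests a metadata chunk every metadata_ratio allocations (8 by default, per the description above). struct chunk_alloc_sketch and data_chunk_allocated() are illustrative names, not the btrfs implementation, and the sketch assumes the ratio is at least 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical allocator state for the metadata_ratio policy. */
struct chunk_alloc_sketch {
	unsigned int metadata_ratio;          /* data chunks per forced metadata chunk */
	unsigned int data_chunks_since_meta;  /* data chunks since the last one */
};

/* Call after each data chunk allocation; returns true when a metadata
 * chunk should be allocated as well. */
static bool data_chunk_allocated(struct chunk_alloc_sketch *s)
{
	assert(s->metadata_ratio > 0);
	if (++s->data_chunks_since_meta < s->metadata_ratio)
		return false;
	s->data_chunks_since_meta = 0;
	return true;
}
```

With metadata_ratio = 8, eighty data chunk allocations force ten metadata chunks.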

21 Apr, 2009 1 commit

Btrfs: fix btrfs fallocate oops and deadlock · 546888da

Chris Mason authored Apr 21, 2009

Btrfs fallocate was incorrectly starting a transaction with a lock held
on the extent_io tree for the file, which could deadlock. Strictly
speaking it was using join_transaction which would be safe, but it is better
to move the transaction outside of the lock.

When preallocated extents are overwritten, btrfs_mark_buffer_dirty was
being called on an unlocked buffer. This was triggering an assertion and
oops because the lock is supposed to be held.

The bug was calling btrfs_mark_buffer_dirty on a leaf after btrfs_del_item had
been run. btrfs_del_item takes care of dirtying things, so the solution is
to skip the btrfs_mark_buffer_dirty call in this case.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

20 Apr, 2009 4 commits

Btrfs: use the right node in reada_for_balance · 8c594ea8

Chris Mason authored Apr 20, 2009

reada_for_balance was using the wrong index into the path node array,
so it wasn't reading the right blocks.  We never directly used the
results of the read done by this function because the btree search is
started over at the end.

This fixes reada_for_balance to reada in the correct node and to
avoid searching past the last slot in the node.  It also makes sure to
hold the parent lock while we are finding the nodes to read.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
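
A sketch of bounds-checked sibling selection: for a node at a given level, its neighbours are the slots next to it in the parent one level up, and the right neighbour must not go past the parent's last slot. struct path_sketch, BTRFS_MAX_LEVEL_SKETCH and sibling_slots() are hypothetical; locking and the actual readahead are left out.

```c
#include <assert.h>

#define BTRFS_MAX_LEVEL_SKETCH 8

/* Hypothetical path: per level, the node's item count and the slot taken. */
struct path_sketch {
	int nritems[BTRFS_MAX_LEVEL_SKETCH];
	int slots[BTRFS_MAX_LEVEL_SKETCH];
};

/* Siblings of the node at `level` are slots in its parent at level + 1.
 * Writes up to two sibling slot numbers to out[] and returns how many. */
static int sibling_slots(const struct path_sketch *p, int level, int out[2])
{
	int parent = level + 1;
	int slot, nr, n = 0;

	assert(parent < BTRFS_MAX_LEVEL_SKETCH);
	slot = p->slots[parent];
	nr = p->nritems[parent];
	if (slot > 0)
		out[n++] = slot - 1;       /* left sibling */
	if (slot + 1 < nr)
		out[n++] = slot + 1;       /* right sibling, never past the last slot */
	return n;
}
```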

Btrfs: fix oops on page->mapping->host during writepage · 11c8349b

Chris Mason authored Apr 20, 2009

The extent_io writepage call updates the writepage index in the inode
as it makes progress.  But, it was doing the update after unlocking the page,
which isn't legal because page->mapping can't be trusted once the page
is unlocked.

This led to an oops, especially common with compression turned on.  The
fix here is to update the writeback index before unlocking the page.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: add a priority queue to the async thread helpers · d313d7a3

Chris Mason authored Apr 20, 2009

Btrfs is using WRITE_SYNC_PLUG to send down synchronous IOs with a
higher priority.  But, the checksumming helper threads prevent it
from being fully effective.

There are two problems.  First, a big queue of pending checksumming
will delay the synchronous IO behind other lower priority writes.  Second,
the checksumming uses an ordered async work queue.  The ordering makes sure
that IOs are sent to the block layer in the same order they are sent
to the checksumming threads.  Usually this gives us less seeky IO.

But, when we start mixing IO priorities, the lower priority IO can delay
the higher priority IO.

This patch solves both problems by adding a high priority list to the async
helper threads, and a new btrfs_set_work_high_prio(), which is used
to put a new async work item onto the higher priority list.

The ordering is still done on high priority IO, but all of the high
priority bios are ordered separately from the low priority bios.  This
ordering is purely an IO optimization; it is not involved in data
or metadata integrity.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
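
A sketch of the two-list scheme: each list is FIFO, so ordering holds within a priority, and workers always drain the high priority list first. The structures and functions below are hypothetical; only the name set_work_high_prio() echoes btrfs_set_work_high_prio(), and the real helpers use list_head-based lists and worker threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work item. */
struct work_sketch {
	int id;
	bool high_prio;
	struct work_sketch *next;
};

struct fifo_sketch {
	struct work_sketch *head, *tail;
};

struct workers_sketch {
	struct fifo_sketch high, normal;
};

static void fifo_push(struct fifo_sketch *q, struct work_sketch *w)
{
	w->next = NULL;
	if (q->tail)
		q->tail->next = w;
	else
		q->head = w;
	q->tail = w;
}

static struct work_sketch *fifo_pop(struct fifo_sketch *q)
{
	struct work_sketch *w = q->head;

	if (w) {
		q->head = w->next;
		if (!q->head)
			q->tail = NULL;
	}
	return w;
}

/* Mark an item before queueing it, like btrfs_set_work_high_prio(). */
static void set_work_high_prio(struct work_sketch *w)
{
	w->high_prio = true;
}

static void queue_work_sketch(struct workers_sketch *ws, struct work_sketch *w)
{
	assert(w != NULL);
	fifo_push(w->high_prio ? &ws->high : &ws->normal, w);
}

/* High priority first; FIFO within each list, none across lists. */
static struct work_sketch *next_work(struct workers_sketch *ws)
{
	struct work_sketch *w = fifo_pop(&ws->high);

	return w ? w : fifo_pop(&ws->normal);
}
```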

Btrfs: use WRITE_SYNC for synchronous writes · ffbd517d

Chris Mason authored Apr 20, 2009

Part of reducing fsync/O_SYNC/O_DIRECT latencies is using WRITE_SYNC for
writes we plan on waiting on in the near future.  This patch
mirrors recent changes in other filesystems and the generic code to
use WRITE_SYNC when WB_SYNC_ALL is passed and to use WRITE_SYNC for
other latency critical writes.

Btrfs uses async worker threads for checksumming before the write is done,
and then again to actually submit the bios.  The bio submission code just
runs a per-device list of bios that need to be sent down the pipe.

This list is split into low priority and high priority lists so the
WRITE_SYNC IO happens first.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

14 Apr, 2009 13 commits

Linux 2.6.30-rc2 · 0882e8dd

Linus Torvalds authored Apr 14, 2009

Merge branch 'drm-intel-next' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel · b897e6fb

Linus Torvalds authored Apr 14, 2009

* 'drm-intel-next' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel:
  drm/i915: fix scheduling while holding the new active list spinlock
  drm/i915: Allow tiling of objects with bit 17 swizzling by the CPU.
  drm/i915: Correctly set the write flag for get_user_pages in pread.
  drm/i915: Fix use of uninitialized var in 40a5f0de
  drm/i915: indicate framebuffer restore key in SysRq help message
  drm/i915: sync hdmi detection by hdmi identifier with 2D
  drm/i915: Fix a mismerge of the IGD patch (new .find_pll hooks missed)
  drm/i915: Implement batch and ring buffer dumping

x86 microcode: revert some work_on_cpu · 6f66cbc6

Hugh Dickins authored Apr 14, 2009

Revert part of af5c820a ("x86: cpumask:
use work_on_cpu in arch/x86/kernel/microcode_core.c")

That change is causing only one Intel CPU's microcode to be updated, e.g.

  microcode: CPU3 updated from revision 0x9 to 0x17, date = 2005-04-22

whereas before, the same message also appeared for CPU0, CPU1 and CPU2.

We cannot use work_on_cpu() in the CONFIG_MICROCODE_OLD_INTERFACE code,
because Intel's request_microcode_user() involves a copy_from_user() from
/sbin/microcode_ctl, which therefore needs to be on that CPU at the time.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drm/i915: fix scheduling while holding the new active list spinlock · 68c84342

Shaohua Li authored Apr 08, 2009

Regression caused by commit 5e118f41:
i915_gem_object_move_to_inactive() should be called in task context,
as it calls fput().

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
[anholt: Add more detail to the comment about the lock break that's added]
Signed-off-by: Eric Anholt <eric@anholt.net>

Merge branch 'core-fixes-for-linus' of... · 610f26e7

Linus Torvalds authored Apr 14, 2009

Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  lockdep: warn about lockdep disabling after kernel taint, fix

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse · e9de427e

Linus Torvalds authored Apr 14, 2009

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: fix "direct_io" private mmap
  fuse: fix argument type in fuse_get_user_pages()

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 · 9fc0178c

Linus Torvalds authored Apr 14, 2009

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: fix possible mismatch of sufile counters on recovery
  nilfs2: segment usage file cleanups
  nilfs2: fix wrong accounting and duplicate brelse in nilfs_sufile_set_error
  nilfs2: simplify handling of active state of segments fix
  nilfs2: remove module version
  nilfs2: fix lockdep recursive locking warning on meta data files
  nilfs2: fix lockdep recursive locking warning on bmap
  nilfs2: return f_fsid for statfs2

Merge branch 'fixes-for-linus' of git://git.monstr.eu/linux-2.6-microblaze · 2b6b6d38

Linus Torvalds authored Apr 14, 2009

* 'fixes-for-linus' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: Add missing FILE tag to MAINTAINERS
  microblaze: remove duplicated #include's
  microblaze: struct device - replace bus_id with dev_name()
  microblaze: Simplify copy_thread()
  microblaze: Add TIMESTAMPING constants to socket.h
  microblaze: Add missing empty ftrace.h file
  microblaze: Fix problem with removing zero length files

Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6 · 3e862dd5

Linus Torvalds authored Apr 14, 2009

* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
  sh: Add in PCI bus for DMA API debugging.
  sh: Pre-allocate a reasonable number of DMA debug entries.
  sh: sh7786: modify usb setup timeout judgment bug.
  MAINTAINERS: Update sh architecture file patterns.
  sh: ap325: use edge control for ov772x camera
  sh: Plug in support for ARCH=sh64 using sh SRCARCH.
  sh: urquell: Fix up address mapping in board comments.
  sh: Add support for DMA API debugging.
  sh: Provide cpumask_of_pcibus() to fix NUMA build.
  sh: urquell: Add board comment
  sh: wire up sys_preadv/sys_pwritev() syscalls.
  sh: sh7785lcr: fix PCI address map for 32-bit mode
  sh: intc: Added resume from hibernation support to the intc

Fix lpfc_parse_bg_err()'s use of do_div() · 2344b5b6

David Howells authored Apr 14, 2009

Fix lpfc_parse_bg_err()'s use of do_div(). It should be passing a 64-bit
variable as the first parameter. However, since it's only using a 32-bit
variable, it doesn't need to use do_div() at all, but can instead use the
division operator.

This deals with the following warnings:

CC drivers/scsi/lpfc/lpfc_scsi.o
drivers/scsi/lpfc/lpfc_scsi.c: In function 'lpfc_parse_bg_err':
drivers/scsi/lpfc/lpfc_scsi.c:1397: warning: comparison of distinct pointer types lacks a cast
drivers/scsi/lpfc/lpfc_scsi.c:1397: warning: right shift count >= width of type
drivers/scsi/lpfc/lpfc_scsi.c:1397: warning: passing argument 1 of '__div64_32' from incompatible pointer type
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
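
A userspace model of the do_div() contract these warnings are about: do_div(n, base) divides a 64-bit lvalue in place and returns the remainder. do_div_sketch() and blocks_from_bytes() are hypothetical; the sketch takes a pointer so the 64-bit requirement is enforced by the type. For a 32-bit value, plain division is enough.

```c
#include <assert.h>
#include <stdint.h>

/* Model of do_div(n, base): n becomes the quotient, remainder returned. */
static uint32_t do_div_sketch(uint64_t *n, uint32_t base)
{
	uint32_t rem;

	assert(base != 0);
	rem = (uint32_t)(*n % base);
	*n /= base;
	return rem;
}

/* A 32-bit dividend gains nothing from do_div(): use the / operator. */
static uint32_t blocks_from_bytes(uint32_t bytes, uint32_t blksize)
{
	return bytes / blksize;
}
```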

tty: Update some of the USB kernel doc · 78c5b82e

Leandro Dorileo authored Apr 14, 2009

Update the documentation for some usb_serial_port members.
Signed-off-by: Leandro Dorileo <ldorileo@gmail.com>
Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

parport_pc: Fix build failure drivers/parport/parport_pc.c for powerpc · 19e05426

Tony Breeds authored Apr 14, 2009

In commit 51dcdfec ("parport: Use the
PCI IRQ if offered") parport_pc_probe_port() gained an irqflags arg.
This isn't being supplied on powerpc. This patch makes powerpc fall back
to the old behaviour, that is, using "0" for irqflags.

Fixes build failure:

In file included from drivers/parport/parport_pc.c:68:
arch/powerpc/include/asm/parport.h: In function 'parport_pc_find_nonpci_ports':
arch/powerpc/include/asm/parport.h:32: error: too few arguments to function 'parport_pc_probe_port'
arch/powerpc/include/asm/parport.h:32: error: too few arguments to function 'parport_pc_probe_port'
arch/powerpc/include/asm/parport.h:32: error: too few arguments to function 'parport_pc_probe_port'
make[3]: *** [drivers/parport/parport_pc.o] Error 1
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

parport: Fix various uses of parport_pc · 28783eb5

Alan Cox authored Apr 14, 2009

These got overlooked first time around.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>