- 04 Jun, 2009 2 commits
-
-
Chris Mason authored
The btrfs allocator uses list_for_each to walk the available block groups when searching for free blocks. It starts off with a hint to help find the best block group for a given allocation. The hint is resolved into a block group, but we don't properly check to make sure the block group we find isn't in the middle of being freed due to filesystem shrinking or balancing. If it is being freed, the list pointers in it are bogus and can't be trusted. But, the code happily goes along and uses them in the list_for_each loop, leading to all kinds of fun. The fix used here is to check to make sure the block group we find really is on the list before we use it. list_del_init is used when removing it from the list, so we can do a proper check. The allocation clustering code has a similar bug where it will trust the block group in the current free space cluster. If our allocation flags have changed (going from single spindle dup to raid1 for example) because the drives in the FS have changed, we're not allowed to use the old block group any more. The fix used here is to check the current cluster against the current allocation flags. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Yan Zheng authored
It was not being properly initialized, and so the size saved to disk was not correct. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
- 14 May, 2009 6 commits
-
-
Sankar P authored
Signed-off-by: Sankar P <sankar.curiosity@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Sage Weil authored
The notreelog and flushoncommit mount options were being printed slightly differently. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Li Hong authored
In Li Zefan's commit dae7b665, a combination call of kmalloc() and copy_from_user() is replaced by memdup_user(). So btrfs_ioctl_resize() doesn't use GFP_NOFS any more. Signed-off-by: Li Hong <lihong.hi@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
These debugging WARN_ONs make too much console noise during regular IO failures. An IO failure will still generate a number of messages as we verify checksums etc, but these two are not needed. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
When a btrfs metadata read fails, the first thing we try to do is find a good copy on another mirror of the block. If this fails, read_tree_block() ends up returning a buffer that isn't up to date. The btrfs btree reading code was reworked to drop locks and repeat the search when IO was done, but the changes didn't add a check for failed reads. The end result was looping forever on buffers that were never going to become up to date. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
This flag is used to decide when we need to send a given file through the ordered code to make sure it is fully written before a transaction commits. It was not being properly set to zero when the inode was being setup. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
- 27 Apr, 2009 8 commits
-
-
Chris Mason authored
This changes btrfs_read_locked_inode() to peek ahead in the btree for acl items. If it is certain a given inode has no acls, it will set the in memory acl fields to null to avoid acl lookups completely. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
Linus noticed the btrfs code to cache acls wasn't properly caching a NULL acl when the inode didn't have any acls. This meant the common case of no acls resulted in expensive btree searches every time the kernel checked permissions (which is quite often). This is a modified version of Linus' original patch: Properly set initial acl fields to BTRFS_ACL_NOT_CACHED in the inode. This forces an acl lookup when permission checks are done. Fix btrfs_get_acl to avoid lookups and locking when the inode acls fields are set to null. Fix btrfs_get_acl to use the right return value from __btrfs_getxattr when deciding to cache a NULL acl. It was storing a NULL acl when __btrfs_getxattr return -ENOENT, but __btrfs_getxattr was actually returning -ENODATA for this case. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Joel Becker authored
Just happened to notice a bunch of %llu vs u64 warnings. Here's a patch to cast them all. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Joel Becker authored
A small warning popped up on ia64 because inode-map.c was comparing a u64 object id with the ULL FIRST_FREE_OBJECTID. My first thought was that all the OBJECTID constants should contain the u64 cast because btrfs code deals entirely in u64s. But then I saw how large that was, and figured I'd just fix the max() call. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
Btrfs has printks for various IO errors, including bad checksums and mismatches between what we expect the block headers to contain and what we actually find on the disk. Longer term we need a real reporting mechanism for this, but for now printk is going to have to do. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
Btrfs had some old code sitting around under #if 0, this drops it. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Chris Ball authored
Previously, we updated a device's size prior to attempting a shrink operation. This patch moves the device resizing logic to only happen if the shrink completes successfully. In the process, it introduces a new field to btrfs_device -- disk_total_bytes -- to track the on-disk size. Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
- 24 Apr, 2009 6 commits
-
-
Chris Mason authored
After a transaction commit, the old root of the subvol btrees are sent through snapshot removal. This is what actually frees up any blocks replaced by COW, and anything the old blocks pointed to. Snapshot deletion will pause when a transaction commit has started, which helps to avoid a huge amount of delayed reference count updates piling up as the transaction is trying to close. But, this pause happens after the snapshot deletion process has asked other procs on the system to throttle back a bit so that it can make progress. We don't want to throttle everyone while we're waiting for the transaction commit, it leads to deadlocks in the user transaction ioctls used by Ceph and makes things slower in general. This patch changes things to avoid the throttling while we sleep. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
The btrfs fallocate call takes an extent lock on the entire range being fallocated, and then runs through insert_reserved_extent on each extent as they are allocated. The problem with this is that btrfs_drop_extents may decide to try and take the same extent lock fallocate was already holding. The solution used here is to push down knowledge of the range that is already locked going into btrfs_drop_extents. It turns out that at least one other caller had the same bug. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Christoph Hellwig authored
Just use kmem_cache_create directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Christoph Hellwig authored
Currently the extent_map code is only for btrfs so don't export it's symbols. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Christoph Hellwig authored
Get rid of the hacks for building out of tree, and always use += for assigning to the object lists. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Josef Bacik authored
This patch makes the chunk allocator keep a good ratio of metadata vs data block groups. By default for every 8 data block groups, we'll allocate 1 metadata chunk, or about 12% of the disk will be allocated for metadata. This can be changed by specifying the metadata_ratio mount option. This is simply the number of data block groups that have to be allocated to force a metadata chunk allocation. By making sure we allocate metadata chunks more often, we are less likely to get into situations where the whole disk has been allocated as data block groups. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
- 21 Apr, 2009 1 commit
-
-
Chris Mason authored
Btrfs fallocate was incorrectly starting a transaction with a lock held on the extent_io tree for the file, which could deadlock. Strictly speaking it was using join_transaction which would be safe, but it is better to move the transaction outside of the lock. When preallocated extents are overwritten, btrfs_mark_buffer_dirty was being called on an unlocked buffer. This was triggering an assertion and oops because the lock is supposed to be held. The bug was calling btrfs_mark_buffer_dirty on a leaf after btrfs_del_item had been run. btrfs_del_item takes care of dirtying things, so the solution is a to skip the btrfs_mark_buffer_dirty call in this case. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
- 20 Apr, 2009 4 commits
-
-
Chris Mason authored
reada_for_balance was using the wrong index into the path node array, so it wasn't reading the right blocks. We never directly used the results of the read done by this function because the btree search is started over at the end. This fixes reada_for_balance to reada in the correct node and to avoid searching past the last slot in the node. It also makes sure to hold the parent lock while we are finding the nodes to read. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
The extent_io writepage call updates the writepage index in the inode as it makes progress. But, it was doing the update after unlocking the page, which isn't legal because page->mapping can't be trusted once the page is unlocked. This lead to an oops, especially common with compression turned on. The fix here is to update the writeback index before unlocking the page. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
Btrfs is using WRITE_SYNC_PLUG to send down synchronous IOs with a higher priority. But, the checksumming helper threads prevent it from being fully effective. There are two problems. First, a big queue of pending checksumming will delay the synchronous IO behind other lower priority writes. Second, the checksumming uses an ordered async work queue. The ordering makes sure that IOs are sent to the block layer in the same order they are sent to the checksumming threads. Usually this gives us less seeky IO. But, when we start mixing IO priorities, the lower priority IO can delay the higher priority IO. This patch solves both problems by adding a high priority list to the async helper threads, and a new btrfs_set_work_high_prio(), which is used to make put a new async work item onto the higher priority list. The ordering is still done on high priority IO, but all of the high priority bios are ordered separately from the low priority bios. This ordering is purely an IO optimization, it is not involved in data or metadata integrity. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
Part of reducing fsync/O_SYNC/O_DIRECT latencies is using WRITE_SYNC for writes we plan on waiting on in the near future. This patch mirrors recent changes in other filesystems and the generic code to use WRITE_SYNC when WB_SYNC_ALL is passed and to use WRITE_SYNC for other latency critical writes. Btrfs uses async worker threads for checksumming before the write is done, and then again to actually submit the bios. The bio submission code just runs a per-device list of bios that need to be sent down the pipe. This list is split into low priority and high priority lists so the WRITE_SYNC IO happens first. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
- 14 Apr, 2009 13 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intelLinus Torvalds authored
* 'drm-intel-next' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel: drm/i915: fix scheduling while holding the new active list spinlock drm/i915: Allow tiling of objects with bit 17 swizzling by the CPU. drm/i915: Correctly set the write flag for get_user_pages in pread. drm/i915: Fix use of uninitialized var in 40a5f0de drm/i915: indicate framebuffer restore key in SysRq help message drm/i915: sync hdmi detection by hdmi identifier with 2D drm/i915: Fix a mismerge of the IGD patch (new .find_pll hooks missed) drm/i915: Implement batch and ring buffer dumping
-
Hugh Dickins authored
Revert part of af5c820a ("x86: cpumask: use work_on_cpu in arch/x86/kernel/microcode_core.c") That change is causing only one Intel CPU's microcode to be updated e.g. microcode: CPU3 updated from revision 0x9 to 0x17, date = 2005-04-22 where before it announced that also for CPU0 and CPU1 and CPU2. We cannot use work_on_cpu() in the CONFIG_MICROCODE_OLD_INTERFACE code, because Intel's request_microcode_user() involves a copy_from_user() from /sbin/microcode_ctl, which therefore needs to be on that CPU at the time. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Shaohua Li authored
regression caused by commit 5e118f41: i915_gem_object_move_to_inactive() should be called in task context, as it calls fput(); Signed-off-by: Shaohua Li<shaohua.li@intel.com> [anholt: Add more detail to the comment about the lock break that's added] Signed-off-by: Eric Anholt <eric@anholt.net>
-
Linus Torvalds authored
Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: lockdep: warn about lockdep disabling after kernel taint, fix
-
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuseLinus Torvalds authored
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: fix "direct_io" private mmap fuse: fix argument type in fuse_get_user_pages()
-
git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2Linus Torvalds authored
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix possible mismatch of sufile counters on recovery nilfs2: segment usage file cleanups nilfs2: fix wrong accounting and duplicate brelse in nilfs_sufile_set_error nilfs2: simplify handling of active state of segments fix nilfs2: remove module version nilfs2: fix lockdep recursive locking warning on meta data files nilfs2: fix lockdep recursive locking warning on bmap nilfs2: return f_fsid for statfs2
-
git://git.monstr.eu/linux-2.6-microblazeLinus Torvalds authored
* 'fixes-for-linus' of git://git.monstr.eu/linux-2.6-microblaze: microblaze: Add missing FILE tag to MAINTAINERS microblaze: remove duplicated #include's microblaze: struct device - replace bus_id with dev_name() microblaze: Simplify copy_thread() microblaze: Add TIMESTAMPING constants to socket.h microblaze: Add missing empty ftrace.h file microblaze: Fix problem with removing zero length files
-
git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6Linus Torvalds authored
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: sh: Add in PCI bus for DMA API debugging. sh: Pre-allocate a reasonable number of DMA debug entries. sh: sh7786: modify usb setup timeout judgment bug. MAINTAINERS: Update sh architecture file patterns. sh: ap325: use edge control for ov772x camera sh: Plug in support for ARCH=sh64 using sh SRCARCH. sh: urquell: Fix up address mapping in board comments. sh: Add support for DMA API debugging. sh: Provide cpumask_of_pcibus() to fix NUMA build. sh: urquell: Add board comment sh: wire up sys_preadv/sys_pwritev() syscalls. sh: sh7785lcr: fix PCI address map for 32-bit mode sh: intc: Added resume from hibernation support to the intc
-
David Howells authored
Fix lpfc_parse_bg_err()'s use of do_div(). It should be passing a 64-bit variable as the first parameter. However, since it's only using a 32-bit variable, it doesn't need to use do_div() at all, but can instead use the division operator. This deals with the following warnings: CC drivers/scsi/lpfc/lpfc_scsi.o drivers/scsi/lpfc/lpfc_scsi.c: In function 'lpfc_parse_bg_err': drivers/scsi/lpfc/lpfc_scsi.c:1397: warning: comparison of distinct pointer types lacks a cast drivers/scsi/lpfc/lpfc_scsi.c:1397: warning: right shift count >= width of type drivers/scsi/lpfc/lpfc_scsi.c:1397: warning: passing argument 1 of '__div64_32' from incompatible pointer type Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Leandro Dorileo authored
Updates some usb_serial_port members documentation. Signed-off-by: Leandro Dorileo <ldorileo@gmail.com> Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Tony Breeds authored
In commit 51dcdfec ("parport: Use the PCI IRQ if offered") parport_pc_probe_port() gained an irqflags arg. This isn't being supplied on powerpc. This patch make powerpc fallback to the old behaviour, that is using "0" for irqflags. Fixes build failure: In file included from drivers/parport/parport_pc.c:68: arch/powerpc/include/asm/parport.h: In function 'parport_pc_find_nonpci_ports': arch/powerpc/include/asm/parport.h:32: error: too few arguments to function 'parport_pc_probe_port' arch/powerpc/include/asm/parport.h:32: error: too few arguments to function 'parport_pc_probe_port' arch/powerpc/include/asm/parport.h:32: error: too few arguments to function 'parport_pc_probe_port' make[3]: *** [drivers/parport/parport_pc.o] Error 1 Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alan Cox authored
These got overlooked first time around. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-