Commits · 6c2fce2ef6b4821c21b5c42c7207cb9cf8c87eda · nexedi / linux

27 Jun, 2008 11 commits

Support adding a spare to a live md array with external metadata. · 6c2fce2e

Neil Brown authored Jun 28, 2008

i.e. extend the 'md/dev-XXX/slot' attribute so that you can
tell a device to fill an vacant slot in an and md array.
Signed-off-by: Neil Brown <neilb@suse.de>

6c2fce2e

Enable setting of 'offset' and 'size' of a hot-added spare. · 8ed0a521

Neil Brown authored Jun 28, 2008

offset_store and rdev_size_store allow control of the region of a
device which is to be using in an md/raid array.
They only allow these values to be set when an array is being assembled,
as changing them on an active array could be dangerous.
However when adding a spare device to an array, we might need to
set the offset and size before starting recovery.  So allow
these values to be set also if "->raid_disk < 0" which indicates that
the device is still a spare.
Signed-off-by: Neil Brown <neilb@suse.de>

8ed0a521

Don't try to make md arrays dirty if that is not meaningful. · 1a0fd497

Neil Brown authored Jun 28, 2008

Arrays personalities such as 'raid0' and 'linear' have no redundancy,
and so marking them as 'clean' or 'dirty' is not meaningful.
So always allow write requests without requiring a superblock update.

Such arrays types are detected by ->sync_request being NULL. If it is
not possible to send a sync request we don't need a 'dirty' flag because
all a dirty flag does is trigger some sync_requests.
Signed-off-by: Neil Brown <neilb@suse.de>

1a0fd497

Close race in md_probe · f48ed538

Neil Brown authored Jun 28, 2008

There is a possible race in md_probe.  If two threads call md_probe
for the same device, then one could exit (having checked that
->gendisk exists) before the other has called kobject_init_and_add,
thus returning an incomplete kobj which will cause problems when
we try to add children to it.

So extend the range of protection of disks_mutex slightly to
avoid this possibility.
Signed-off-by: Neil Brown <neilb@suse.de>

f48ed538

Allow setting start point for requested check/repair · 5e96ee65

Neil Brown authored Jun 28, 2008

This makes it possible to just resync a small part of an array.
e.g. if a drive reports that it has questionable sectors,
a 'repair' of just the region covering those sectors will
cause them to be read and, if there is an error, re-written
with correct data.
Signed-off-by: Neil Brown <neilb@suse.de>

5e96ee65

Improve setting of "events_cleared" for write-intent bitmaps. · a0da84f3

Neil Brown authored Jun 28, 2008

When an array is degraded, bits in the write-intent bitmap are not
cleared, so that if the missing device is re-added, it can be synced
by only updated those parts of the device that have changed since
it was removed.

The enable this a 'events_cleared' value is stored. It is the event
counter for the array the last time that any bits were cleared.

Sometimes - if a device disappears from an array while it is 'clean' -
the events_cleared value gets updated incorrectly (there are subtle
ordering issues between updateing events in the main metadata and the
bitmap metadata) resulting in the missing device appearing to require
a full resync when it is re-added.

With this patch, we update events_cleared precisely when we are about
to clear a bit in the bitmap.  We record events_cleared when we clear
the bit internally, and copy that to the superblock which is written
out before the bit on storage.  This makes it more "obviously correct".

We also need to update events_cleared when the event_count is going
backwards (as happens on a dirty->clean transition of a non-degraded
array).

Thanks to Mike Snitzer for identifying this problem and testing early
"fixes".

Cc:  "Mike Snitzer" <snitzer@gmail.com>
Signed-off-by: Neil Brown <neilb@suse.de>

a0da84f3

use bio_endio instead of a call to bi_end_io · 0e13fe23

Neil Brown authored Jun 28, 2008

Turn calls to bi->bi_end_io() into bio_endio(). Apparently bio_endio does
exactly the same error processing as is hardcoded at these places.

bio_endio() avoids recursion (or will soon), so it should be used.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Neil Brown <neilb@suse.de>

0e13fe23

linear: correct disk numbering error check · 13864515

Nikanth Karthikesan authored Jun 28, 2008

From: "Nikanth Karthikesan" <knikanth@novell.com>

Correct disk numbering problem check.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>

13864515

Fix error paths if md_probe fails. · 9bbbca3a

Neil Brown authored Jun 28, 2008

md_probe can fail (e.g. alloc_disk could fail) without
returning an error (as it alway returns NULL).
So when we call mddev_find immediately afterwards, we need
to check that md_probe actually succeeded.  This means checking
that mdev->gendisk is non-NULL.

cc: <stable@kernel.org>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Neil Brown <neilb@suse.de>

9bbbca3a

Don't acknowlege that stripe-expand is complete until it really is. · efe31143

Neil Brown authored Jun 28, 2008

We shouldn't acknowledge that a stripe has been expanded (When
reshaping a raid5 by adding a device) until the moved data has
actually been written out.  However we are currently
acknowledging (by calling md_done_sync) when the POST_XOR
is complete and before the write.

So track in s.locked whether there are pending writes, and don't
call md_done_sync yet if there are.

Note: we all set R5_LOCKED on devices which are are about to
read from.  This probably isn't technically necessary, but is
usually done when writing a block, and justifies the use of
s.locked here.

This bug can lead to a crash if an array is stopped while an reshape
is in progress.

Cc: <stable@kernel.org>
Signed-off-by: Neil Brown <neilb@suse.de>

efe31143

Ensure interrupted recovery completed properly (v1 metadata plus bitmap) · 8c2e870a

Neil Brown authored Jun 28, 2008

If, while assembling an array, we find a device which is not fully
in-sync with the array, it is important to set the "fullsync" flags.
This is an exact analog to the setting of this flag in hot_add_disk
methods.

Currently, only v1.x metadata supports having devices in an array
which are not fully in-sync (it keep track of how in sync they are).
The 'fullsync' flag only makes a difference when a write-intent bitmap
is being used.  In this case it tells recovery to ignore the bitmap
and recovery all blocks.

This fix is already in place for raid1, but not raid5/6 or raid10.

So without this fix, a raid1 ir raid4/5/6 array with version 1.x
metadata and a write intent bitmaps, that is stopped in the middle
of a recovery, will appear to complete the recovery instantly
after it is reassembled, but the recovery will not be correct.

If you might have an array like that, issueing
   echo repair > /sys/block/mdXX/md/sync_action

will make sure recovery completes properly.

Cc: <stable@kernel.org>
Signed-off-by: Neil Brown <neilb@suse.de>

8c2e870a

25 Jun, 2008 4 commits

Linux 2.6.26-rc8 · 543cf4cb
Linus Torvalds authored Jun 24, 2008

543cf4cb

Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 · bd8c540f

Linus Torvalds authored Jun 24, 2008

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] Eliminate NULL test after alloc_bootmem in iosapic_alloc_rte()
  [IA64] Handle count==0 in sn2_ptc_proc_write()
  [IA64] Fix boot failure on ia64/sn2

bd8c540f

Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes · 035cfc61

Linus Torvalds authored Jun 24, 2008

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
  [GFS2] fix gfs2 block allocation (cleaned up)
  [GFS2] BUG: unable to handle kernel paging request at ffff81002690e000

035cfc61

Merge branch 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm · 919c0d14

Linus Torvalds authored Jun 24, 2008

* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
  KVM: Remove now unused structs from kvm_para.h
  x86: KVM guest: Use the paravirt clocksource structs and functions
  KVM: Make kvm host use the paravirt clocksource structs
  x86: Make xen use the paravirt clocksource structs and functions
  x86: Add structs and functions for paravirt clocksource
  KVM: VMX: Fix host msr corruption with preemption enabled
  KVM: ioapic: fix lost interrupt when changing a device's irq
  KVM: MMU: Fix oops on guest userspace access to guest pagetable
  KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend)
  KVM: MMU: Fix rmap_write_protect() hugepage iteration bug
  KVM: close timer injection race window in __vcpu_run
  KVM: Fix race between timer migration and vcpu migration

919c0d14

24 Jun, 2008 25 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog · de08341a

Linus Torvalds authored Jun 24, 2008

* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
  Revert "[WATCHDOG] hpwdt: Add CFLAGS to get driver working"

de08341a

Merge branch 'x86-fixes-for-linus' of... · 9bf8a943

Linus Torvalds authored Jun 24, 2008

Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  xen: remove support for non-PAE 32-bit

9bf8a943

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb · 3b968b7c

Linus Torvalds authored Jun 24, 2008

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
  kgdb: sparse fix
  kgdb: documentation update - remove kgdboe

3b968b7c

enable bus mastering on i915 at resume time · ea7b44c8

Jie Luo authored Jun 24, 2008

On 9xx chips, bus mastering needs to be enabled at resume time for much of the
chip to function. With this patch, vblank interrupts will work as expected
on resume, along with other chip functions. Fixes kernel bugzilla #10844.
Signed-off-by: Jie Luo <clotho67@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ea7b44c8

KVM: Remove now unused structs from kvm_para.h · 6b1ed908

Gerd Hoffmann authored Jun 03, 2008

The kvm_* structs are obsoleted by the pvclock_* ones.
Now all users have been switched over and the old structs
can be dropped.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

6b1ed908

x86: KVM guest: Use the paravirt clocksource structs and functions · f6e16d5a

Gerd Hoffmann authored Jun 03, 2008

This patch updates the kvm host code to use the pvclock structs
and functions, thereby making it compatible with Xen.

The patch also fixes an initialization bug: on SMP systems the
per-cpu has two different locations early at boot and after CPU
bringup.  kvmclock must take that in account when registering the
physical address within the host.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

f6e16d5a

KVM: Make kvm host use the paravirt clocksource structs · 50d0a0f9

Gerd Hoffmann authored Jun 03, 2008

This patch updates the kvm host code to use the pvclock structs.
It also makes the paravirt clock compatible with Xen.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

50d0a0f9

x86: Make xen use the paravirt clocksource structs and functions · 1c7b67f7

Gerd Hoffmann authored Jun 03, 2008

This patch updates the xen guest to use the pvclock structs
and helper functions.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

1c7b67f7

x86: Add structs and functions for paravirt clocksource · 7af192c9

Gerd Hoffmann authored Jun 03, 2008

This patch adds structs for the paravirt clocksource ABI
used by both xen and kvm (pvclock-abi.h).

It also adds some helper functions to read system time and
wall clock time from a paravirtual clocksource (pvclock.[ch]).
They are based on the xen code.  They are enabled using
CONFIG_PARAVIRT_CLOCK.

Subsequent patches of this series will put the code in use.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

7af192c9

[GFS2] fix gfs2 block allocation (cleaned up) · 5af4e7a0

Benjamin Marzinski authored Jun 24, 2008

This patch fixes bz 450641.

This patch changes the computation for zero_metapath_length(), which it
renames to metapath_branch_start(). When you are extending the metadata
tree, The indirect blocks that point to the new data block must either
diverge from the existing tree either at the inode, or at the first
indirect block. They can diverge at the first indirect block because the
inode has room for 483 pointers while the indirect blocks have room for
509 pointers, so when the tree is grown, there is some free space in the
first indirect block. What metapath_branch_start() now computes is the
height where the first indirect block for the new data block is located.
It can either be 1 (if the indirect block diverges from the inode) or 2
(if it diverges from the first indirect block).
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

5af4e7a0

[IA64] Eliminate NULL test after alloc_bootmem in iosapic_alloc_rte() · e2569b7e

Julia Lawall authored Jun 24, 2008

As noted by Akinobu Mita alloc_bootmem and related functions never return
NULL and always return a zeroed region of memory.  Thus a NULL test or
memset after calls to these functions is unnecessary.
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Tony Luck <tony.luck@intel.com>

e2569b7e

[IA64] Handle count==0 in sn2_ptc_proc_write() · 8097110d

Cliff Wickman authored Jun 24, 2008

The fix applied in e0c6d97c
"security hole in sn2_ptc_proc_write" didn't take into account
the case where count==0 (which results in a buffer underrun
when adding the trailing '\0').  Thanks to Andi Kleen for
pointing this out.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

8097110d

[IA64] Fix boot failure on ia64/sn2 · 2826f8c0

Jes Sorensen authored Jun 24, 2008

Call check_sal_cache_flush() after platform_setup() as
check_sal_cache_flush() now relies on being able to call platform
vector code.

Problem was introduced by: 3463a93d
"Update check_sal_cache_flush to use platform_send_ipi()"
Signed-off-by: Jes Sorensen <jes@sgi.com>
Tested-by: Alex Chiang: <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

2826f8c0

kgdb: sparse fix · aabdc3b8

Jason Wessel authored Jun 24, 2008

- Fix warning reported by sparse
kernel/kgdb.c:1502:6: warning: symbol 'kgdb_console_write' was not declared.
	Should it be static?
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>

aabdc3b8

kgdb: documentation update - remove kgdboe · a606b5e2

Jason Wessel authored Jun 24, 2008

kgdboe is not presently included kgdb, and there should be no
references to it.

Also fix the tcp port terminal connection example.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>

a606b5e2

xen: remove support for non-PAE 32-bit · 28499143

Jeremy Fitzhardinge authored May 09, 2008

Non-PAE operation has been deprecated in Xen for a while, and is
rarely tested or used.  xen-unstable has now officially dropped
non-PAE support.  Since Xen/pvops' non-PAE support has also been
broken for a while, we may as well completely drop it altogether.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

28499143

[GFS2] BUG: unable to handle kernel paging request at ffff81002690e000 · 17c15da0

Bob Peterson authored Jun 18, 2008

This patch fixes bugzilla bug bz448866: gfs2: BUG: unable to
handle kernel paging request at ffff81002690e000.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

17c15da0

Revert "[WATCHDOG] hpwdt: Add CFLAGS to get driver working" · 63842ccc

Wim Van Sebroeck authored Jun 24, 2008

After Linus fixed the inline assembly, the CFLAGS option is not
needed anymore.
Signed-off-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>

63842ccc

KVM: VMX: Fix host msr corruption with preemption enabled · a9b21b62

Avi Kivity authored Jun 24, 2008

Switching msrs can occur either synchronously as a result of calls to
the msr management functions (usually in response to the guest touching
virtualized msrs), or asynchronously when preempting a kvm thread that has
guest state loaded. If we're unlucky enough to have the two at the same
time, host msrs are corrupted and the machine goes kaput on the next syscall.

Most easily triggered by Windows Server 2008, as it does a lot of msr
switching during bootup.
Signed-off-by: Avi Kivity <avi@qumranet.com>

a9b21b62

KVM: ioapic: fix lost interrupt when changing a device's irq · 4fa6b9c5

Avi Kivity authored Jun 17, 2008

The ioapic acknowledge path translates interrupt vectors to irqs.  It
currently uses a first match algorithm, stopping when it finds the first
redirection table entry containing the vector.  That fails however if the
guest changes the irq to a different line, leaving the old redirection table
entry in place (though masked).  Result is interrupts not making it to the
guest.

Fix by always scanning the entire redirection table.
Signed-off-by: Avi Kivity <avi@qumranet.com>

4fa6b9c5

KVM: MMU: Fix oops on guest userspace access to guest pagetable · 6bf6a953

Avi Kivity authored Jun 12, 2008

KVM has a heuristic to unshadow guest pagetables when userspace accesses
them, on the assumption that most guests do not allow userspace to access
pagetables directly. Unfortunately, in addition to unshadowing the pagetables,
it also oopses.

This never triggers on ordinary guests since sane OSes will clear the
pagetables before assigning them to userspace, which will trigger the flood
heuristic, unshadowing the pagetables before the first userspace access. One
particular guest, though (Xenner) will run the kernel in userspace, triggering
the oops.  Since the heuristic is incorrect in this case, we can simply
remove it.
Signed-off-by: Avi Kivity <avi@qumranet.com>

6bf6a953

KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend) · 30945387

Marcelo Tosatti authored Jun 11, 2008

kvm_mmu_pte_write() does not handle 32-bit non-PAE large page backed
guests properly. It will instantiate two 2MB sptes pointing to the same
physical 2MB page when a guest large pte update is trapped.

Instead of duplicating code to handle this, disallow directory level
updates to happen through kvm_mmu_pte_write(), so the two 2MB sptes
emulating one guest 4MB pte can be correctly created by the page fault
handling path.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

30945387

KVM: MMU: Fix rmap_write_protect() hugepage iteration bug · 6597ca09

Marcelo Tosatti authored Jun 08, 2008

rmap_next() does not work correctly after rmap_remove(), as it expects
the rmap chains not to change during iteration.  Fix (for now) by restarting
iteration from the beginning.
Signed-off-by: Avi Kivity <avi@qumranet.com>

6597ca09

KVM: close timer injection race window in __vcpu_run · 06e05645

Marcelo Tosatti authored Jun 06, 2008

If a timer fires after kvm_inject_pending_timer_irqs() but before
local_irq_disable() the code will enter guest mode and only inject such
timer interrupt the next time an unrelated event causes an exit.

It would be simpler if the timer->pending irq conversion could be done
with IRQ's disabled, so that the above problem cannot happen.

For now introduce a new vcpu requests bit to cancel guest entry.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

06e05645

KVM: Fix race between timer migration and vcpu migration · d4acf7e7

Marcelo Tosatti authored Jun 06, 2008

A guest vcpu instance can be scheduled to a different physical CPU
between the test for KVM_REQ_MIGRATE_TIMER and local_irq_disable().

If that happens, the timer will only be migrated to the current pCPU on
the next exit, meaning that guest LAPIC timer event can be delayed until
a host interrupt is triggered.

Fix it by cancelling guest entry if any vcpu request is pending.  This
has the side effect of nicely consolidating vcpu->requests checks.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

d4acf7e7