Commits · f2ea68cf42aafdd93393b6b8b20fc3c2b5f4390c · Kirill Smelkov / linux

21 Jul, 2008 5 commits

md: only count actual openers as access which prevent a 'stop' · f2ea68cf

NeilBrown authored 16 years ago

Open isn't the only thing that increments ->active.  E.g. reading
/proc/mdstat will increment it briefly.  So to avoid false positives
in testing for concurrent access, introduce a new counter that counts
just the number of times the md device is open.
Signed-off-by: NeilBrown <neilb@suse.de>
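
A minimal user-space sketch of the distinction, assuming hypothetical names (`struct md_counters`, `may_stop()`, and an `is_open` allowance for a stop request that itself arrives through an open descriptor); it is not the kernel's code. Transient users such as a /proc/mdstat read touch only `active`, so a stop check based on `openers` is not fooled by them:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two counters on an md device. */
struct md_counters {
	atomic_int active;   /* any reference, including brief ones */
	atomic_int openers;  /* only real open() calls */
};

/* A real open() takes both kinds of reference. */
static void md_open_sketch(struct md_counters *c)
{
	atomic_fetch_add(&c->active, 1);
	atomic_fetch_add(&c->openers, 1);
}

static void md_release_sketch(struct md_counters *c)
{
	atomic_fetch_sub(&c->openers, 1);
	atomic_fetch_sub(&c->active, 1);
}

/* E.g. reading /proc/mdstat: only 'active' is touched. */
static void transient_get(struct md_counters *c) { atomic_fetch_add(&c->active, 1); }
static void transient_put(struct md_counters *c) { atomic_fetch_sub(&c->active, 1); }

/* Refuse to stop only if someone other than the caller has the device open.
 * is_open is 1 when the stop request comes through an open descriptor. */
static bool may_stop(struct md_counters *c, int is_open)
{
	return atomic_load(&c->openers) <= is_open;
}
```

With one opener issuing the stop and a concurrent /proc/mdstat reader, `may_stop(&c, 1)` still returns true, whereas a check on `active` would have reported the device as busy.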

md: Make mddev->array_size sector-based. · f233ea5c

Andre Noll authored 16 years ago

This patch renames the array_size field of struct mddev_s to array_sectors
and converts all instances to use units of 512 byte sectors instead of 1k
blocks.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
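
The unit change amounts to a factor of two, since a 1K block holds two 512-byte sectors. A sketch with hypothetical helper names, modelling `sector_t` as a 64-bit unsigned integer:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: sector_t modelled as a 64-bit unsigned integer. */
typedef uint64_t sector_t;

/* One 1K block holds two 512-byte sectors. */
static sector_t blocks_to_sectors(uint64_t kib_blocks)
{
	return (sector_t)kib_blocks * 2;
}

/* Converting back rounds down, dropping a trailing odd sector. */
static uint64_t sectors_to_blocks(sector_t sectors)
{
	return sectors / 2;
}
```

Because the reverse conversion rounds down, an odd sector count cannot be represented exactly in 1K blocks; keeping sizes in sectors avoids that loss.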

md: Make super_type->rdev_size_change() take sector-based sizes. · 15f4a5fd

Andre Noll authored 16 years ago

Also, change the type of the size parameter from unsigned long long to
sector_t and rename it to num_sectors.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

md: Fix check for overlapping devices. · d07bd3bc

Andre Noll authored 16 years ago

The checks in overlaps() expect all parameters either in block-based
or sector-based quantities. However, its single caller passes two
rdev->data_offset arguments as well as two rdev->size arguments, the
former being sector counts while the latter are measured in 1K blocks.

This could cause rdev_size_store() to accept an invalid size from user
space. Fix it by passing only sector-based quantities to overlaps().
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
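
A sketch of a start/length overlap test of this kind (modelled on, but not guaranteed identical to, the kernel's `overlaps()`), with `sector_t` assumed to be a 64-bit unsigned integer. All four arguments must use the same unit:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t; /* assumption: 64-bit sector count */

/* Do [s1, s1 + l1) and [s2, s2 + l2) share any sector? */
static int overlaps(sector_t s1, sector_t l1, sector_t s2, sector_t l2)
{
	if (s1 + l1 <= s2)
		return 0; /* first range ends before the second starts */
	if (s2 + l2 <= s1)
		return 0; /* second range ends before the first starts */
	return 1;
}
```

Two devices at data offsets 0 and 150 sectors, each 100K (200 sectors) long, overlap, and `overlaps(0, 200, 150, 200)` returns 1. Passing the sizes as 100 blocks instead, `overlaps(0, 100, 150, 100)` returns 0, which is the kind of false negative the fix removes.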

md: Tidy up rdev_size_store a bit: · d7027458

Neil Brown authored 16 years ago

 - use strict_strtoull in place of simple_strtoull
 - use my_mddev in place of rdev->mddev (they have the same value)
and more significantly,
 - don't adjust mddev->size to fit, rather reject changes which make
   rdev->size smaller than mddev->size

Adjusting mddev->size is a hangover from bind_rdev_to_array which
does a similar thing.  But it really is a better design to insist that
mddev->size is set as required, then the rdev->sizes are set to allow
for that.  The previous way invites confusion.
Signed-off-by: NeilBrown <neilb@suse.de>
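
A user-space sketch of the stricter behaviour, under stated assumptions: `strict_parse_ull()` and `check_rdev_size()` are hypothetical stand-ins for `strict_strtoull()` and the relevant part of `rdev_size_store()`, and `-EINVAL` is assumed as the rejection code. Both sizes must be in the same unit:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a whole decimal string, optionally ending in one newline. */
static int strict_parse_ull(const char *buf, unsigned long long *out)
{
	char *end;
	unsigned long long v;

	/* Reject leading whitespace and signs, which strtoull() would accept. */
	if (*buf < '0' || *buf > '9')
		return -EINVAL;
	errno = 0;
	v = strtoull(buf, &end, 10);
	if (errno == ERANGE)
		return -EINVAL;
	/* Allow a single trailing newline, as written by "echo". */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;
	*out = v;
	return 0;
}

/* Reject a device size smaller than the array's per-device size,
 * rather than shrinking the array to fit. */
static int check_rdev_size(const char *buf, unsigned long long array_dev_size,
			   unsigned long long *new_size)
{
	unsigned long long size;

	if (strict_parse_ull(buf, &size) < 0)
		return -EINVAL;
	if (size < array_dev_size)
		return -EINVAL;
	*new_size = size;
	return 0;
}
```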

11 Jul, 2008 10 commits

md: Turn rdev->sb_offset into a sector-based quantity. · 0f420358

Andre Noll authored 16 years ago

Rename it to sb_start to make sure all users have been converted.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

md: Make calc_dev_sboffset() return a sector count. · b73df2d3

Andre Noll authored 16 years ago

As BLOCK_SIZE_BITS is 10 and

	MD_NEW_SIZE_SECTORS(2 * x) = 2 * MD_NEW_SIZE_BLOCKS(x),

the return value of calc_dev_sboffset() doubles. Fix up all three
callers accordingly.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>
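
The identity can be checked numerically. The macros below model the 0.90 superblock layout as assumed here (64K reserved at the end of each device, i.e. 128 sectors or 64 1K blocks); treat them as an illustrative model rather than a verbatim copy of the kernel header:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 64K reserved for the superblock area. */
#define MD_RESERVED_BYTES   (64 * 1024)
#define MD_RESERVED_SECTORS (MD_RESERVED_BYTES / 512)   /* 128 */
#define MD_RESERVED_BLOCKS  (MD_RESERVED_BYTES / 1024)  /* 64, BLOCK_SIZE_BITS == 10 */

/* Round down to the reserved-area alignment, then step back one area. */
#define MD_NEW_SIZE_SECTORS(x) \
	(((x) & ~(uint64_t)(MD_RESERVED_SECTORS - 1)) - MD_RESERVED_SECTORS)
#define MD_NEW_SIZE_BLOCKS(x) \
	(((x) & ~(uint64_t)(MD_RESERVED_BLOCKS - 1)) - MD_RESERVED_BLOCKS)

/* A device of x 1K blocks has 2x sectors; the sector-based offset
 * should be exactly twice the block-based one. */
static int offsets_agree(uint64_t kib_blocks)
{
	return MD_NEW_SIZE_SECTORS(2 * kib_blocks) ==
	       2 * MD_NEW_SIZE_BLOCKS(kib_blocks);
}
```

For example, a 1000-block device gives `MD_NEW_SIZE_BLOCKS(1000) == 896` and `MD_NEW_SIZE_SECTORS(2000) == 1792`, so each caller of the sector-based function sees a value twice as large as before.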

md: Replace calc_dev_size() by calc_num_sectors(). · e7debaa4

Andre Noll authored 16 years ago

Number of sectors is the preferred unit for sizes of raid devices,
so change calc_dev_size() so that it returns this unit instead of
the number of 1K blocks.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

md: Make update_size() take the number of sectors. · d71f9f88

Andre Noll authored 16 years ago

Changing the internal representations of sizes of raid devices
from 1K blocks to sector counts (512B units) is desirable because
it allows us to get rid of many divisions/multiplications and unnecessary
casts that are present in the current code.

This patch is a first step in this direction. It replaces the old
1K-based "size" argument of update_size() by "num_sectors" and
fixes up its two callers.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

md: Better control of when do_md_stop is allowed to stop the array. · df5b20cf

Neil Brown authored 16 years ago

do_md_stop checks the number of active users before allowing the array
to be stopped.
Two problems:
  1/ it assumes the request is coming through an open file descriptor
     (via ioctl) so it allows for that.  This is not always the case.
  2/ it doesn't do the check if the array hasn't been activated.
     This is not good for cases when we use an inactive array to hold
     some devices in a container.
Signed-off-by: Neil Brown <neilb@suse.de>
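
A simplified sketch of the revised check, assuming a hypothetical `struct array_sketch` and `stop_sketch()`: the caller states whether it holds an open descriptor (`is_open`), and the busy test runs before anything looks at whether the array is running, so inactive arrays holding container devices are protected too:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified view of an md array. */
struct array_sketch {
	int users;     /* references held by openers */
	bool running;  /* a personality is active */
};

/* is_open: 1 if the request arrived via an open descriptor (ioctl),
 * 0 otherwise (e.g. a sysfs write). */
static int stop_sketch(struct array_sketch *a, int is_open)
{
	/* Checked unconditionally, even when the array is not running. */
	if (a->users > is_open)
		return -EBUSY;
	a->running = false;
	return 0;
}
```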

md: get_disk_info(): Don't convert between signed and unsigned and back. · 26ef379f

Andre Noll authored 16 years ago

The current code copies a signed int from user space, converts it to
unsigned and passes the unsigned value to find_rdev_nr() which expects
a signed value. Simply pass the signed value from user space directly.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

md: Simplify restart_array(). · 80fab1d7

Andre Noll authored 16 years ago

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

md: alloc_disk_sb(): Return proper error value. · ebc24337

Andre Noll authored 16 years ago

If alloc_page() fails, ENOMEM is a more suitable error value
than EINVAL.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

md: Simplify sb_equal(). · ce0c8e05

Andre Noll authored 16 years ago

The only caller of sb_equal() tests the return value against
zero, so it's OK to return the negated return value of memcmp().
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>
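
A sketch of the simplification with a hypothetical, padding-free superblock stand-in. Since `memcmp()` returns 0 exactly when the bytes match, `!memcmp()` yields the nonzero-means-equal result that the caller tests:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical superblock stand-in (no padding: 4 + 4 + 16 bytes). */
struct sb_sketch {
	unsigned int magic;
	unsigned int level;
	unsigned char uuid[16];
};

/* Nonzero when equal, 0 otherwise. */
static int sb_equal_sketch(const struct sb_sketch *a, const struct sb_sketch *b)
{
	return !memcmp(a, b, sizeof(*a));
}
```

Byte-wise comparison of structures is only reliable when no uninitialised padding is present, so such structures should be zeroed before their fields are filled in.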

md: Simplify uuid_equal(). · 05710466

Andre Noll authored 16 years ago

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

08 Jul, 2008 7 commits

md: sb_equal(): Fix misleading printk. · 35020f1a

Andre Noll authored 16 years ago

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

md: Fix a typo in the comment to cmd_match(). · 7f6ce769

Andre Noll authored 16 years ago

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

md: Fix typo in array_state comment. · 910d8cb3

Andre Noll authored 16 years ago

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

md: sync_speed_show(): Trivial cleanups. · 9687a60c

Andre Noll authored 16 years ago

- Remove superfluous parentheses.
- Make format string match the type of the variable that is printed.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

md: do_md_run(): Fix misleading error message. · 13e53df3

Andre Noll authored 16 years ago

In case pers->run() succeeds but creating the bitmap fails, we
print an error message stating that pers->run() has failed.

Print this message only if pers->run() really failed.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

md: md_getgeo(): Move comment to proper position. · 2f9618ce

Andre Noll authored 16 years ago

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

md: md_ioctl(): Fix misleading indentation. · bb57fc64

Andre Noll authored 16 years ago

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Neil Brown <neilb@suse.de>

01 Jul, 2008 1 commit

md: resolve external metadata handling deadlock in md_allow_write · b5470dc5

Dan Williams authored 16 years ago

md_allow_write() marks the metadata dirty while holding mddev->lock and then
waits for the write to complete.  For externally managed metadata this causes a
deadlock as userspace needs to take the lock to communicate that the metadata
update has completed.

Change md_allow_write() in the 'external' case to start the 'mark active'
operation and then return -EAGAIN.  The expected side effects while waiting for
userspace to write 'active' to 'array_state' are holding off reshape (code
currently handles -ENOMEM), cause some 'stripe_cache_size' change requests to
fail, cause some GET_BITMAP_FILE ioctl requests to fall back to GFP_NOIO, and
cause updates to 'raid_disks' to fail.  Except for 'stripe_cache_size' changes
these failures can be mitigated by coordinating with mdmon.

md_write_start() still prevents writes from occurring until the metadata
handler has had a chance to take action as it unconditionally waits for
MD_CHANGE_CLEAN to be cleared.

[neilb@suse.de: return -EAGAIN, try GFP_NOIO]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
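
A single-threaded model of the new control flow, with hypothetical names throughout (`struct meta_sketch`, `allow_write_sketch()`, `userspace_ack_sketch()`) and locking omitted. For external metadata the function starts the 'mark active' change and returns -EAGAIN instead of waiting, and a later retry succeeds once userspace has acknowledged:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the metadata state. */
struct meta_sketch {
	bool external;        /* metadata managed by userspace (mdmon) */
	bool in_sync;         /* metadata currently marked clean */
	bool change_pending;  /* 'mark active' started, not yet acknowledged */
};

static int allow_write_sketch(struct meta_sketch *m)
{
	if (!m->in_sync)
		return 0; /* already active: writes allowed */
	if (m->external) {
		/* Start the change but do not wait: waiting here while
		 * holding the lock userspace needs is what deadlocked. */
		m->change_pending = true;
		return -EAGAIN;
	}
	/* Internal metadata: the kernel updates it itself
	 * (modelled here as completing immediately). */
	m->in_sync = false;
	return 0;
}

/* Userspace writes 'active' to array_state to acknowledge. */
static void userspace_ack_sketch(struct meta_sketch *m)
{
	if (m->change_pending) {
		m->in_sync = false;
		m->change_pending = false;
	}
}
```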

27 Jun, 2008 13 commits

Support changing rdev size on running arrays. · 0cd17fec

Chris Webb authored 16 years ago

From: Chris Webb <chris@arachsys.com>

Allow /sys/block/mdX/md/rdY/size to change on running arrays, moving the
superblock if necessary for this metadata version. We prevent the available
space from shrinking to less than the used size, and allow it to be set to zero
to fill all the available space on the underlying device.
Signed-off-by: Chris Webb <chris@arachsys.com>
Signed-off-by: Neil Brown <neilb@suse.de>
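
A sketch of the validation rules described above, assuming a hypothetical `resolve_rdev_size()` that works in sectors; the error codes are illustrative, and moving the superblock is not modelled:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef uint64_t sector_t; /* assumption: 64-bit sector count */

/* requested: size written to rdY/size (0 = fill the device)
 * used:      space the array already uses on this device
 * available: space the underlying device can offer */
static int resolve_rdev_size(sector_t requested, sector_t used,
			     sector_t available, sector_t *out)
{
	if (requested == 0)
		requested = available;
	if (requested > available)
		return -ENOSPC;
	if (requested < used)
		return -EINVAL; /* never shrink below what is in use */
	*out = requested;
	return 0;
}
```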

Make sure all changes to md/dev-XX/state are notified · 52664732

Neil Brown authored 16 years ago

The important state change happens during an interrupt
in md_error.  So just set a flag there and call sysfs_notify
later in process context.
Signed-off-by: Neil Brown <neilb@suse.de>
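
The deferral pattern can be sketched with a C11 atomic flag, assuming hypothetical names; the counter stands in for `sysfs_notify()`, and the sketch assumes, as the commit implies, that the notification itself must be sent from process context:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-device state. */
struct rdev_sketch {
	atomic_bool notify_pending;
	int notifications; /* stands in for sysfs_notify() calls */
};

/* Interrupt side (md_error): only record that a notification is owed. */
static void error_irq_sketch(struct rdev_sketch *r)
{
	atomic_store(&r->notify_pending, true);
}

/* Process-context side: consume the flag and send the notification. */
static void process_ctx_sketch(struct rdev_sketch *r)
{
	if (atomic_exchange(&r->notify_pending, false))
		r->notifications++; /* sysfs_notify(...) would go here */
}
```

Several errors raised before the process-context side runs coalesce into one notification, which is enough for a monitor that rereads the attribute when woken.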

Make sure all changes to md/degraded are notified. · a99ac971

Neil Brown authored 16 years ago

When a device fails, when a spare is activated, when
an array is reshaped, or when an array is started,
the extent to which the array is degraded can change.
Signed-off-by: Neil Brown <neilb@suse.de>

Make sure all changes to md/sync_action are notified. · 72a23c21

Neil Brown authored 16 years ago

When the 'resync' thread starts or stops, when we explicitly
set sync_action, or when we determine that there is definitely nothing
to do, we notify sync_action.

To stop "sync_action" from occasionally showing the wrong value,
we introduce a new flag, MD_RECOVERY_RECOVER, to say that a
recovery is probably needed or happening, and we make sure
that we set MD_RECOVERY_RUNNING before clearing MD_RECOVERY_NEEDED.
Signed-off-by: Neil Brown <neilb@suse.de>
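
A simplified model of why the ordering matters, with hypothetical bit values and a reduced `sync_action_sketch()` that reports an action whenever RUNNING or NEEDED is set (the real attribute logic has more cases). Setting RUNNING before clearing NEEDED keeps the reported value stable, while the reverse order briefly reports 'idle':

```c
#include <assert.h>

/* Hypothetical flag bits; only the update order matters here. */
#define RECOVERY_RUNNING (1u << 0)
#define RECOVERY_NEEDED  (1u << 1)
#define RECOVERY_RECOVER (1u << 2)

enum action_sketch { ACT_IDLE, ACT_RESYNC, ACT_RECOVER };

/* Reduced model of what sync_action would report for a flag word. */
static enum action_sketch sync_action_sketch(unsigned int flags)
{
	if (!(flags & (RECOVERY_RUNNING | RECOVERY_NEEDED)))
		return ACT_IDLE;
	return (flags & RECOVERY_RECOVER) ? ACT_RECOVER : ACT_RESYNC;
}
```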

Make sure all changes to md/array_state are notified. · 0fd62b86

Neil Brown authored 16 years ago

Changes in md/array_state could be of interest to a monitoring
program.  So make sure all changes trigger a notification.

Exceptions:
   changing active_idle to active is not reported because it
      is frequent and not interesting.
   changing active to active_idle is only reported on arrays
      with externally managed metadata, as it is not interesting
      otherwise.
Signed-off-by: Neil Brown <neilb@suse.de>
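
The notification policy, including its two exceptions, reduces to a small predicate. A sketch with a hypothetical subset of states and a hypothetical function name:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of array_state values. */
enum state_sketch { ST_CLEAN, ST_ACTIVE, ST_ACTIVE_IDLE, ST_READONLY };

/* Should a change old -> new_st trigger a notification? */
static bool should_notify(enum state_sketch old, enum state_sketch new_st,
			  bool external)
{
	if (old == new_st)
		return false; /* no change, nothing to report */
	if (old == ST_ACTIVE_IDLE && new_st == ST_ACTIVE)
		return false; /* frequent and not interesting */
	if (old == ST_ACTIVE && new_st == ST_ACTIVE_IDLE)
		return external; /* only matters for external metadata */
	return true;
}
```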

Don't reject HOT_REMOVE_DISK request for an array that is not yet started. · c7d0c941

Neil Brown authored 16 years ago

There is really no need for this test here, and there are valid
cases for selectively removing devices from an array that
is not actually active.
Signed-off-by: Neil Brown <neilb@suse.de>

rationalise return value for ->hot_add_disk method. · 199050ea

Neil Brown authored 16 years ago

For all array types but linear, ->hot_add_disk returns 1 on
success, 0 on failure.
For linear, it returns 0 on success and -errno on failure.

This doesn't cause a functional problem because the ->hot_add_disk
function of linear is used quite differently to the others.
However it is confusing.

So convert all to return 0 for success or -errno on failure
and fix call sites to match.
Signed-off-by: Neil Brown <neilb@suse.de>
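
A sketch of the unified convention, assuming hypothetical helpers and `EBUSY` as an illustrative error code: an old-style 1/0 result maps to 0/-errno, and call sites propagate any nonzero value:

```c
#include <assert.h>
#include <errno.h>

/* Map the old result (1 = success, 0 = failure) to the unified
 * convention (0 = success, negative errno = failure). */
static int to_errno_convention(int old_ret, int errcode)
{
	return old_ret ? 0 : -errcode;
}

/* Call sites use the usual idiom: any nonzero value is an error. */
static int add_disk_sketch(int personality_ret)
{
	int err = to_errno_convention(personality_ret, EBUSY);

	if (err)
		return err;
	/* ... continue, e.g. start recovery ... */
	return 0;
}
```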

Support adding a spare to a live md array with external metadata. · 6c2fce2e

Neil Brown authored 16 years ago

i.e. extend the 'md/dev-XXX/slot' attribute so that you can
tell a device to fill a vacant slot in an md array.
Signed-off-by: Neil Brown <neilb@suse.de>

Enable setting of 'offset' and 'size' of a hot-added spare. · 8ed0a521

Neil Brown authored 16 years ago

offset_store and rdev_size_store allow control of the region of a
device which is to be used in an md/raid array.
They only allow these values to be set when an array is being assembled,
as changing them on an active array could be dangerous.
However, when adding a spare device to an array, we might need to
set the offset and size before starting recovery.  So allow
these values to be set also if "->raid_disk < 0" which indicates that
the device is still a spare.
Signed-off-by: Neil Brown <neilb@suse.de>

Don't try to make md arrays dirty if that is not meaningful. · 1a0fd497

Neil Brown authored 16 years ago

Array personalities such as 'raid0' and 'linear' have no redundancy,
and so marking them as 'clean' or 'dirty' is not meaningful.
So always allow write requests without requiring a superblock update.

Such array types are detected by ->sync_request being NULL. If it is
not possible to send a sync request we don't need a 'dirty' flag because
all a dirty flag does is trigger some sync_requests.
Signed-off-by: Neil Brown <neilb@suse.de>
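
A sketch of the test, assuming a hypothetical personality descriptor in which only the `sync_request` hook matters: a NULL hook means no redundancy, so writes never wait for the superblock to be marked dirty:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical personality descriptor. */
struct pers_sketch {
	int (*sync_request)(void); /* NULL for raid0 and linear */
};

static int dummy_sync(void) { return 0; }

/* Must a write wait for the superblock to be marked dirty? */
static bool write_needs_dirty(const struct pers_sketch *p, bool in_sync)
{
	if (p->sync_request == NULL)
		return false; /* nothing a dirty flag could trigger */
	return in_sync;       /* clean redundant array: mark dirty first */
}
```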

Close race in md_probe · f48ed538

Neil Brown authored 16 years ago

There is a possible race in md_probe.  If two threads call md_probe
for the same device, then one could exit (having checked that
->gendisk exists) before the other has called kobject_init_and_add,
thus returning an incomplete kobj which will cause problems when
we try to add children to it.

So extend the range of protection of disks_mutex slightly to
avoid this possibility.
Signed-off-by: Neil Brown <neilb@suse.de>
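
A user-space analogue of the fix, using a pthread mutex in place of `disks_mutex` and hypothetical `dev_sketch`/`probe_sketch()` names. Because the existence check and the full initialisation share one critical section, a second caller cannot observe a device whose kobject is not yet ready:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device: 'disk' is set first, 'kobj_ready' after it. */
struct dev_sketch {
	bool disk;
	bool kobj_ready;
};

static pthread_mutex_t disks_mutex = PTHREAD_MUTEX_INITIALIZER;

/* The lock now covers both the check and the whole initialisation. */
static bool probe_sketch(struct dev_sketch *d)
{
	bool ready;

	pthread_mutex_lock(&disks_mutex);
	if (!d->disk) {
		d->disk = true;
		d->kobj_ready = true; /* kobject_init_and_add() in the real code */
	}
	ready = d->kobj_ready;
	pthread_mutex_unlock(&disks_mutex);
	return ready;
}

/* Thread entry point: returns the device if it was fully set up. */
static void *prober(void *arg)
{
	return probe_sketch(arg) ? arg : NULL;
}
```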

Allow setting start point for requested check/repair · 5e96ee65

Neil Brown authored 16 years ago

This makes it possible to just resync a small part of an array.
e.g. if a drive reports that it has questionable sectors,
a 'repair' of just the region covering those sectors will
cause them to be read and, if there is an error, re-written
with correct data.
Signed-off-by: Neil Brown <neilb@suse.de>

Fix error paths if md_probe fails. · 9bbbca3a

Neil Brown authored 16 years ago

md_probe can fail (e.g. alloc_disk could fail) without
returning an error (as it always returns NULL).
So when we call mddev_find immediately afterwards, we need
to check that md_probe actually succeeded.  This means checking
that mddev->gendisk is non-NULL.

Cc: <stable@kernel.org>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Neil Brown <neilb@suse.de>

06 Jun, 2008 1 commit

md: fix uninitialized use of mddev->recovery_wait · a6d8113a

Dan Williams authored 16 years ago

If an array was created with --assume-clean we will oops when trying to
set ->resync_max.

Fix this by initializing ->recovery_wait in mddev_find.

Cc: <stable@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

24 May, 2008 3 commits

md: restart recovery cleanly after device failure. · dfc70645

NeilBrown authored 16 years ago

When we get any IO error during a recovery (rebuilding a spare), we abort
the recovery and restart it.

For RAID6 (and multi-drive RAID1) it may not be best to restart at the
beginning: when multiple failures can be tolerated, the recovery may be
able to continue and re-doing all that has already been done doesn't make
sense.

We already have the infrastructure to record where a recovery is up to
and restart from there, but it is not being used properly.
This is because:
  - We sometimes abort with MD_RECOVERY_ERR rather than just MD_RECOVERY_INTR,
    which causes the recovery not to be checkpointed.
  - We remove spares and then re-add them, which loses important state
    information.

The distinction between MD_RECOVERY_ERR and MD_RECOVERY_INTR really isn't
needed.  If there is an error, the relevant drive will be marked as
Faulty, and that is enough to ensure correct handling of the error.  So we
first remove MD_RECOVERY_ERR, changing some of the uses of it to
MD_RECOVERY_INTR.

Then we cause the attempt to remove a non-faulty device from an array to
fail (unless recovery is impossible as the array is too degraded).  Then
when remove_and_add_spares attempts to remove the devices on which
recovery can continue, it will fail, they will remain in place, and
recovery will continue on them as desired.

Issue:  If we are halfway through rebuilding a spare and another drive
fails, and a new spare is immediately available,  do we want to:
 1/ complete the current rebuild, then go back and rebuild the new spare or
 2/ restart the rebuild from the start and rebuild both devices in
    parallel.

Both options can be argued for.  The code currently takes option 2 as
  a/ this requires least code change
  b/ this results in a minimally-degraded array in minimal time.

Cc: "Eivind Sarto" <ivan@kasenna.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

md: allow parallel resync of md-devices. · 90b08710

Bernd Schubert authored 16 years ago

In some configurations, a raid6 resync can be limited by CPU speed
(Calculating P and Q and moving data) rather than by device speed.  In
these cases there is nothing to be gained by serialising resync of arrays
that share a device, and doing the resync in parallel can provide benefit.
So add a sysfs tunable to flag an array as being allowed to resync in
parallel with other arrays that use (a different part of) the same device.
Signed-off-by: Bernd Schubert <bs@q-leap.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

md: notify userspace on 'stop' events · 4f54b0e9

Dan Williams authored 16 years ago

This additional notification to 'array_state' is needed to allow the
monitor application to learn about stop events via sysfs.  The
sysfs_notify("sync_action") call that comes at the end of do_md_stop()
(via md_new_event) is insufficient since the 'sync_action' attribute has
been removed by this point.

(Seems like a sysfs-notify-on-removal patch is a better fix.  Currently
removal updates the event count but does not wake up waiters)
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
