Commit 2b49f029 authored by NeilBrown's avatar NeilBrown Committed by Stefan Bader

md: fix two problems with setting the "re-add" device state.

BugLink: https://bugs.launchpad.net/bugs/1784382

commit 011abdc9 upstream.

If "re-add" is written to the "state" file for a device
which is faulty, this has an effect similar to removing
and re-adding the device.  It should take up the
same slot in the array that it previously had, and
an accelerated (e.g. bitmap-based) rebuild should happen.

The slot that "it previously had" is determined by
rdev->saved_raid_disk.
However this is not set when a device fails (only when a device
is added), and it is cleared when resync completes.
This means that "re-add" will normally work once, but may not work a
second time.

This patch includes two fixes.
1/ when a device fails, record the ->raid_disk value in
    ->saved_raid_disk before clearing ->raid_disk
2/ when "re-add" is written to a device for which
    ->saved_raid_disk is not set, fail.

I think this is suitable for stable as it can
cause re-adding a device to be forced to do a full
resync which takes a lot longer and so puts data at
more risk.

Cc: <stable@vger.kernel.org> (v4.1)
Fixes: 97f6cd39 ("md-cluster: re-add capabilities")
Signed-off-by: default avatarNeilBrown <neilb@suse.com>
Reviewed-by: default avatarGoldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: default avatarShaohua Li <shli@fb.com>
Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: default avatarKleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: default avatarKhalid Elmously <khalid.elmously@canonical.com>
parent 93beef62
...@@ -2690,7 +2690,8 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len) ...@@ -2690,7 +2690,8 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
err = 0; err = 0;
} }
} else if (cmd_match(buf, "re-add")) { } else if (cmd_match(buf, "re-add")) {
if (test_bit(Faulty, &rdev->flags) && (rdev->raid_disk == -1)) { if (test_bit(Faulty, &rdev->flags) && (rdev->raid_disk == -1) &&
rdev->saved_raid_disk >= 0) {
/* clear_bit is performed _after_ all the devices /* clear_bit is performed _after_ all the devices
* have their local Faulty bit cleared. If any writes * have their local Faulty bit cleared. If any writes
* happen in the meantime in the local node, they * happen in the meantime in the local node, they
...@@ -8153,6 +8154,7 @@ static int remove_and_add_spares(struct mddev *mddev, ...@@ -8153,6 +8154,7 @@ static int remove_and_add_spares(struct mddev *mddev,
if (mddev->pers->hot_remove_disk( if (mddev->pers->hot_remove_disk(
mddev, rdev) == 0) { mddev, rdev) == 0) {
sysfs_unlink_rdev(mddev, rdev); sysfs_unlink_rdev(mddev, rdev);
rdev->saved_raid_disk = rdev->raid_disk;
rdev->raid_disk = -1; rdev->raid_disk = -1;
removed++; removed++;
} }
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment