1. 25 Apr, 2022 2 commits
    • 
      raid5: introduce MD_BROKEN · 57668f0a
      Mariusz Tkaczyk authored
      The raid456 module used to allow the array to reach a failed state.
      That was changed by fb73b357 ("raid5: block failing device if raid
      will be failed"), but the change introduced a bug: if raid5 fails
      during IO, tasks may hang without completion, because the Faulty flag
      on the device is necessary to process all requests and is checked in
      many places, mainly in analyze_stripe().
      Allow Faulty to be set on the drive again, and set MD_BROKEN if the
      raid becomes failed.
      
      As a result, this level is again allowed to reach the failed state,
      but communication with userspace (via the -EBUSY status) is preserved.
      
      This restores the possibility of failing the array via the
      "mdadm --set-faulty" command, and will be guarded by additional
      verification on the mdadm side.
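
      In rough terms, the fix described above can be sketched as a small
      userspace model (this is an illustrative assumption, not the actual
      kernel code; the struct fields and helper name are hypothetical, and
      only the Faulty and MD_BROKEN names come from the commit text):

```c
#include <assert.h>

/* Hedged sketch of the described behaviour -- not real kernel code.
 * Flag values, struct layout, and the helper below are illustrative. */

#define FAULTY    (1u << 0)   /* per-device flag */
#define MD_BROKEN (1u << 1)   /* per-array flag */

struct rdev  { unsigned flags; };
struct mddev {
    unsigned flags;
    int active_disks;
    int min_disks;            /* minimum disks the level needs */
};

/* After the fix: always mark the device Faulty (so analyze_stripe() can
 * complete pending requests), and set MD_BROKEN once the array as a
 * whole can no longer survive. */
static void raid5_error_sketch(struct mddev *md, struct rdev *dev)
{
    dev->flags |= FAULTY;
    if (--md->active_disks < md->min_disks)
        md->flags |= MD_BROKEN;
}
```

      The point of the sketch is the ordering: the device is marked Faulty
      unconditionally, so in-flight IO can complete, and only the array-wide
      state is recorded via MD_BROKEN.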
      
      Reproduction steps:
       mdadm -CR imsm -e imsm -n 3 /dev/nvme[0-2]n1
       mdadm -CR r5 -e imsm -l5 -n3 /dev/nvme[0-2]n1 --assume-clean
       mkfs.xfs /dev/md126 -f
       mount /dev/md126 /mnt/root/
      
       fio --filename=/mnt/root/file --size=5GB --direct=1 --rw=randrw
      --bs=64k --ioengine=libaio --iodepth=64 --runtime=240 --numjobs=4
      --time_based --group_reporting --name=throughput-test-job
      --eta-newline=1 &
      
       echo 1 > /sys/block/nvme2n1/device/device/remove
       echo 1 > /sys/block/nvme1n1/device/device/remove
      
       [ 1475.787779] Call Trace:
       [ 1475.793111] __schedule+0x2a6/0x700
       [ 1475.799460] schedule+0x38/0xa0
       [ 1475.805454] raid5_get_active_stripe+0x469/0x5f0 [raid456]
       [ 1475.813856] ? finish_wait+0x80/0x80
       [ 1475.820332] raid5_make_request+0x180/0xb40 [raid456]
       [ 1475.828281] ? finish_wait+0x80/0x80
       [ 1475.834727] ? finish_wait+0x80/0x80
       [ 1475.841127] ? finish_wait+0x80/0x80
       [ 1475.847480] md_handle_request+0x119/0x190
       [ 1475.854390] md_make_request+0x8a/0x190
       [ 1475.861041] generic_make_request+0xcf/0x310
       [ 1475.868145] submit_bio+0x3c/0x160
       [ 1475.874355] iomap_dio_submit_bio.isra.20+0x51/0x60
       [ 1475.882070] iomap_dio_bio_actor+0x175/0x390
       [ 1475.889149] iomap_apply+0xff/0x310
       [ 1475.895447] ? iomap_dio_bio_actor+0x390/0x390
       [ 1475.902736] ? iomap_dio_bio_actor+0x390/0x390
       [ 1475.909974] iomap_dio_rw+0x2f2/0x490
       [ 1475.916415] ? iomap_dio_bio_actor+0x390/0x390
       [ 1475.923680] ? atime_needs_update+0x77/0xe0
       [ 1475.930674] ? xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
       [ 1475.938455] xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
       [ 1475.946084] xfs_file_read_iter+0xba/0xd0 [xfs]
       [ 1475.953403] aio_read+0xd5/0x180
       [ 1475.959395] ? _cond_resched+0x15/0x30
       [ 1475.965907] io_submit_one+0x20b/0x3c0
       [ 1475.972398] __x64_sys_io_submit+0xa2/0x180
       [ 1475.979335] ? do_io_getevents+0x7c/0xc0
       [ 1475.986009] do_syscall_64+0x5b/0x1a0
       [ 1475.992419] entry_SYSCALL_64_after_hwframe+0x65/0xca
       [ 1476.000255] RIP: 0033:0x7f11fc27978d
       [ 1476.006631] Code: Bad RIP value.
       [ 1476.073251] INFO: task fio:3877 blocked for more than 120 seconds.
      
      Cc: stable@vger.kernel.org
      Fixes: fb73b357 ("raid5: block failing device if raid will be failed")
      Reviewed-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
      Signed-off-by: Song Liu <song@kernel.org>
      57668f0a
    • 
      md: Set MD_BROKEN for RAID1 and RAID10 · 9631abdb
      Mariusz Tkaczyk authored
      There is no direct mechanism to determine raid failure outside the
      personality. It is done by checking rdev->flags after executing
      md_error(). If the "faulty" flag is not set, -EBUSY is returned to
      userspace. -EBUSY means that the array would become failed after the
      drive removal.
      
      Mdadm has a special routine to handle array failure, and it is
      executed when md returns -EBUSY.
      
      There are at least two known reasons not to consider this mechanism
      correct:
      1. The drive can be removed even if the array will be failed [1].
      2. -EBUSY seems to be the wrong status: the array is not busy, but
         the removal process cannot proceed safely.
      
      The -EBUSY expectation cannot be removed without breaking
      compatibility with userspace. This patch resolves the first issue by
      adding support for the MD_BROKEN flag to RAID1 and RAID10; support
      for RAID456 is added in the next commit.
      
      The idea is to set MD_BROKEN once we are sure that the raid is now in
      a failed state. This is done in each error_handler(). md_error() then
      checks the MD_BROKEN flag; if it is set, -EBUSY is returned to
      userspace.
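
      The error_handler()/md_error() interplay described above can be
      sketched as follows (a simplified userspace model under stated
      assumptions; the types, thresholds, and helper names are
      illustrative, not the kernel's real definitions):

```c
#include <assert.h>
#include <errno.h>

/* Hedged sketch of the described flow -- not real kernel code. */

#define FAULTY    (1u << 0)
#define MD_BROKEN (1u << 1)

struct rdev  { unsigned flags; };
struct mddev { unsigned flags; int active; int min_active; };

/* Personality-specific handler: marks the device Faulty and sets
 * MD_BROKEN once the array is in a failed state. */
static void error_handler_sketch(struct mddev *md, struct rdev *dev)
{
    dev->flags |= FAULTY;
    if (--md->active < md->min_active)
        md->flags |= MD_BROKEN;
}

/* md_error() path as described: after the handler runs, a set MD_BROKEN
 * flag is reported back to userspace as -EBUSY. */
static int md_error_sketch(struct mddev *md, struct rdev *dev)
{
    error_handler_sketch(md, dev);
    return (md->flags & MD_BROKEN) ? -EBUSY : 0;
}
```

      In this model, only the personality decides when the array is failed;
      the generic layer merely translates MD_BROKEN into the -EBUSY status
      that userspace already expects.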
      
      As in the previous commit, this makes "mdadm --set-faulty" able to
      fail the array. The previously proposed workaround remains valid if
      the optional functionality [1] is disabled.
      
      [1] commit 9a567843 ("md: allow last device to be forcibly removed
          from RAID1/RAID10.")
      Reviewed-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
      Signed-off-by: Song Liu <song@kernel.org>
      9631abdb
  2. 18 Apr, 2022 38 commits