1. 25 Apr, 2022 4 commits
    • Xiaomeng Tong's avatar
      md: fix an incorrect NULL check in md_reload_sb · 64c54d92
      Xiaomeng Tong authored
      The bug is here:
      	if (!rdev || rdev->desc_nr != nr) {
      
      The list iterator value 'rdev' will *always* be set and non-NULL
      by rdev_for_each_rcu(), so it is incorrect to assume that the
      iterator value will be NULL if the list is empty or no element
      found (In fact, it will be a bogus pointer to an invalid struct
      object containing the HEAD). Otherwise it will bypass the check
      and lead to invalid memory access passing the check.
      
      To fix the bug, use a new variable 'iter' as the list iterator,
      while using the original variable 'pdev' as a dedicated pointer to
      point to the found element.
      
      Cc: stable@vger.kernel.org
      Fixes: 70bcecdb ("md-cluster: Improve md_reload_sb to be less error prone")
      Signed-off-by: default avatarXiaomeng Tong <xiam0nd.tong@gmail.com>
      Signed-off-by: default avatarSong Liu <song@kernel.org>
      64c54d92
    • Xiaomeng Tong's avatar
      md: fix an incorrect NULL check in does_sb_need_changing · fc873834
      Xiaomeng Tong authored
      The bug is here:
      	if (!rdev)
      
      The list iterator value 'rdev' will *always* be set and non-NULL
      by rdev_for_each(), so it is incorrect to assume that the iterator
      value will be NULL if the list is empty or no element found.
      Otherwise it will bypass the NULL check and lead to invalid memory
      access passing the check.
      
      To fix the bug, use a new variable 'iter' as the list iterator,
      while using the original variable 'rdev' as a dedicated pointer to
      point to the found element.
      
      Cc: stable@vger.kernel.org
      Fixes: 2aa82191 ("md-cluster: Perform a lazy update")
      Acked-by: default avatarGuoqing Jiang <guoqing.jiang@linux.dev>
      Signed-off-by: default avatarXiaomeng Tong <xiam0nd.tong@gmail.com>
      Acked-by: default avatarGoldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: default avatarSong Liu <song@kernel.org>
      fc873834
    • Mariusz Tkaczyk's avatar
      raid5: introduce MD_BROKEN · 57668f0a
      Mariusz Tkaczyk authored
      Raid456 module had allowed to achieve failed state. It was fixed by
      fb73b357 ("raid5: block failing device if raid will be failed").
      This fix introduces a bug, now if raid5 fails during IO, it may result
      with a hung task without completion. Faulty flag on the device is
      necessary to process all requests and is checked many times, mainly in
      analyze_stripe().
      Allow to set faulty on drive again and set MD_BROKEN if raid is failed.
      
      As a result, this level is allowed to achieve failed state again, but
      communication with userspace (via -EBUSY status) will be preserved.
      
      This restores possibility to fail array via #mdadm --set-faulty command
      and will be fixed by additional verification on mdadm side.
      
      Reproduction steps:
       mdadm -CR imsm -e imsm -n 3 /dev/nvme[0-2]n1
       mdadm -CR r5 -e imsm -l5 -n3 /dev/nvme[0-2]n1 --assume-clean
       mkfs.xfs /dev/md126 -f
       mount /dev/md126 /mnt/root/
      
       fio --filename=/mnt/root/file --size=5GB --direct=1 --rw=randrw
      --bs=64k --ioengine=libaio --iodepth=64 --runtime=240 --numjobs=4
      --time_based --group_reporting --name=throughput-test-job
      --eta-newline=1 &
      
       echo 1 > /sys/block/nvme2n1/device/device/remove
       echo 1 > /sys/block/nvme1n1/device/device/remove
      
       [ 1475.787779] Call Trace:
       [ 1475.793111] __schedule+0x2a6/0x700
       [ 1475.799460] schedule+0x38/0xa0
       [ 1475.805454] raid5_get_active_stripe+0x469/0x5f0 [raid456]
       [ 1475.813856] ? finish_wait+0x80/0x80
       [ 1475.820332] raid5_make_request+0x180/0xb40 [raid456]
       [ 1475.828281] ? finish_wait+0x80/0x80
       [ 1475.834727] ? finish_wait+0x80/0x80
       [ 1475.841127] ? finish_wait+0x80/0x80
       [ 1475.847480] md_handle_request+0x119/0x190
       [ 1475.854390] md_make_request+0x8a/0x190
       [ 1475.861041] generic_make_request+0xcf/0x310
       [ 1475.868145] submit_bio+0x3c/0x160
       [ 1475.874355] iomap_dio_submit_bio.isra.20+0x51/0x60
       [ 1475.882070] iomap_dio_bio_actor+0x175/0x390
       [ 1475.889149] iomap_apply+0xff/0x310
       [ 1475.895447] ? iomap_dio_bio_actor+0x390/0x390
       [ 1475.902736] ? iomap_dio_bio_actor+0x390/0x390
       [ 1475.909974] iomap_dio_rw+0x2f2/0x490
       [ 1475.916415] ? iomap_dio_bio_actor+0x390/0x390
       [ 1475.923680] ? atime_needs_update+0x77/0xe0
       [ 1475.930674] ? xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
       [ 1475.938455] xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
       [ 1475.946084] xfs_file_read_iter+0xba/0xd0 [xfs]
       [ 1475.953403] aio_read+0xd5/0x180
       [ 1475.959395] ? _cond_resched+0x15/0x30
       [ 1475.965907] io_submit_one+0x20b/0x3c0
       [ 1475.972398] __x64_sys_io_submit+0xa2/0x180
       [ 1475.979335] ? do_io_getevents+0x7c/0xc0
       [ 1475.986009] do_syscall_64+0x5b/0x1a0
       [ 1475.992419] entry_SYSCALL_64_after_hwframe+0x65/0xca
       [ 1476.000255] RIP: 0033:0x7f11fc27978d
       [ 1476.006631] Code: Bad RIP value.
       [ 1476.073251] INFO: task fio:3877 blocked for more than 120 seconds.
      
      Cc: stable@vger.kernel.org
      Fixes: fb73b357 ("raid5: block failing device if raid will be failed")
      Reviewd-by: default avatarXiao Ni <xni@redhat.com>
      Signed-off-by: default avatarMariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
      Signed-off-by: default avatarSong Liu <song@kernel.org>
      57668f0a
    • Mariusz Tkaczyk's avatar
      md: Set MD_BROKEN for RAID1 and RAID10 · 9631abdb
      Mariusz Tkaczyk authored
      There is no direct mechanism to determine raid failure outside
      personality. It is done by checking rdev->flags after executing
      md_error(). If "faulty" flag is not set then -EBUSY is returned to
      userspace. -EBUSY means that array will be failed after drive removal.
      
      Mdadm has special routine to handle the array failure and it is executed
      if -EBUSY is returned by md.
      
      There are at least two known reasons to not consider this mechanism
      as correct:
      1. drive can be removed even if array will be failed[1].
      2. -EBUSY seems to be wrong status. Array is not busy, but removal
         process cannot proceed safe.
      
      -EBUSY expectation cannot be removed without breaking compatibility
      with userspace. In this patch first issue is resolved by adding support
      for MD_BROKEN flag for RAID1 and RAID10. Support for RAID456 is added in
      next commit.
      
      The idea is to set the MD_BROKEN if we are sure that raid is in failed
      state now. This is done in each error_handler(). In md_error() MD_BROKEN
      flag is checked. If is set, then -EBUSY is returned to userspace.
      
      As in previous commit, it causes that #mdadm --set-faulty is able to
      fail array. Previously proposed workaround is valid if optional
      functionality[1] is disabled.
      
      [1] commit 9a567843("md: allow last device to be forcibly removed from
          RAID1/RAID10.")
      Reviewd-by: default avatarXiao Ni <xni@redhat.com>
      Signed-off-by: default avatarMariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
      Signed-off-by: default avatarSong Liu <song@kernel.org>
      9631abdb
  2. 18 Apr, 2022 36 commits