btrfs: Continue replace when set_block_ro failed (76a8efa1) · Commits · nexedi / linux

Commit 76a8efa1 authored Nov 17, 2015 by Zhaolei, committed by Chris Mason Nov 25, 2015

btrfs: Continue replace when set_block_ro failed

xfstests/011 fails on a node with a small filesystem.
It can be reproduced with the following script:
  # Must be an array: the script expands it with [@] and [0] below.
  DEV_LIST=(/dev/vdd /dev/vde)
  DEV_REPLACE="/dev/vdf"

  do_test()
  {
      local mkfs_opt="$1"
      local size="$2"

      dmesg -c >/dev/null
      umount $SCRATCH_MNT &>/dev/null

      echo  mkfs.btrfs -f $mkfs_opt "${DEV_LIST[*]}"
      mkfs.btrfs -f $mkfs_opt "${DEV_LIST[@]}" || return 1
      mount "${DEV_LIST[0]}" $SCRATCH_MNT

      echo -n "Writing big files"
      dd if=/dev/urandom of=$SCRATCH_MNT/t0 bs=1M count=1 >/dev/null 2>&1
      for ((i = 1; i <= size; i++)); do
          echo -n .
          /bin/cp $SCRATCH_MNT/t0 $SCRATCH_MNT/t$i || return 1
      done
      echo

      echo Start replace
      btrfs replace start -Bf "${DEV_LIST[0]}" "$DEV_REPLACE" $SCRATCH_MNT || {
          dmesg
          return 1
      }
      return 0
  }

  # Set size to a value near the fs size;
  # for example, 1897 triggers this bug on a 2.6G device.
  # do_test is a shell function, so call it by name (not as ./do_test).
  do_test "-d raid1 -m raid1" 1897

The system reports that the replace failed, with the following warning in dmesg:
 [  134.710853] BTRFS: dev_replace from /dev/vdd (devid 1) to /dev/vdf started
 [  135.542390] BTRFS: btrfs_scrub_dev(/dev/vdd, 1, /dev/vdf) failed -28
 [  135.543505] ------------[ cut here ]------------
 [  135.544127] WARNING: CPU: 0 PID: 4080 at fs/btrfs/dev-replace.c:428 btrfs_dev_replace_start+0x398/0x440()
 [  135.545276] Modules linked in:
 [  135.545681] CPU: 0 PID: 4080 Comm: btrfs Not tainted 4.3.0 #256
 [  135.546439] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
 [  135.547798]  ffffffff81c5bfcf ffff88003cbb3d28 ffffffff817fe7b5 0000000000000000
 [  135.548774]  ffff88003cbb3d60 ffffffff810a88f1 ffff88002b030000 00000000ffffffe4
 [  135.549774]  ffff88003c080000 ffff88003c082588 ffff88003c28ab60 ffff88003cbb3d70
 [  135.550758] Call Trace:
 [  135.551086]  [<ffffffff817fe7b5>] dump_stack+0x44/0x55
 [  135.551737]  [<ffffffff810a88f1>] warn_slowpath_common+0x81/0xc0
 [  135.552487]  [<ffffffff810a89e5>] warn_slowpath_null+0x15/0x20
 [  135.553211]  [<ffffffff81448c88>] btrfs_dev_replace_start+0x398/0x440
 [  135.554051]  [<ffffffff81412c3e>] btrfs_ioctl+0x1d2e/0x25c0
 [  135.554722]  [<ffffffff8114c7ba>] ? __audit_syscall_entry+0xaa/0xf0
 [  135.555506]  [<ffffffff8111ab36>] ? current_kernel_time64+0x56/0xa0
 [  135.556304]  [<ffffffff81201e3d>] do_vfs_ioctl+0x30d/0x580
 [  135.557009]  [<ffffffff8114c7ba>] ? __audit_syscall_entry+0xaa/0xf0
 [  135.557855]  [<ffffffff810011d1>] ? do_audit_syscall_entry+0x61/0x70
 [  135.558669]  [<ffffffff8120d1c1>] ? __fget_light+0x61/0x90
 [  135.559374]  [<ffffffff81202124>] SyS_ioctl+0x74/0x80
 [  135.559987]  [<ffffffff81809857>] entry_SYSCALL_64_fastpath+0x12/0x6f
 [  135.560842] ---[ end trace 2a5c1fc3205abbdd ]---

Reason:
 When a large amount of data is written to the fs, all free space
 gets allocated to data chunks.
 Operations such as scrub need to call set_block_ro(), and when there
 is only one metadata chunk in the system (or all other metadata
 chunks are full), that function tries to allocate a new chunk
 and fails because there is no space left on the device.
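
The failure sequence described above can be imitated with a small user-space toy model. This is only a sketch under loud assumptions: struct toy_fs, the toy_* functions and the 256 MiB chunk size are all hypothetical and are not btrfs code. It illustrates just the ordering of events: data fills the device, then marking the only metadata chunk read-only needs a new chunk and gets -ENOSPC.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary chunk size for the toy model; not a btrfs constant. */
#define TOY_CHUNK_MIB 256u

/* Hypothetical, heavily simplified view of a filesystem's space. */
struct toy_fs {
	uint64_t unallocated_mib;   /* device space not yet assigned to a chunk */
	unsigned spare_meta_chunks; /* metadata chunks with room, besides the target */
};

/* Carve one chunk out of unallocated space, or fail with -ENOSPC. */
static int toy_alloc_chunk(struct toy_fs *fs)
{
	assert(fs != NULL);
	if (fs->unallocated_mib < TOY_CHUNK_MIB)
		return -ENOSPC;
	fs->unallocated_mib -= TOY_CHUNK_MIB;
	return 0;
}

/* Large data writes keep taking chunks until nothing is left to allocate. */
static void toy_fill_with_data(struct toy_fs *fs)
{
	while (toy_alloc_chunk(fs) == 0)
		;
}

/*
 * Marking a metadata chunk read-only requires somewhere else for
 * metadata writes: either an existing spare chunk or a newly
 * allocated one.
 */
static int toy_set_meta_chunk_ro(struct toy_fs *fs)
{
	if (fs->spare_meta_chunks > 0)
		return 0;

	int ret = toy_alloc_chunk(fs);
	if (ret == 0)
		fs->spare_meta_chunks++;
	return ret;
}
```

In this model, calling toy_fill_with_data() and then toy_set_meta_chunk_ro() on a filesystem with no spare metadata chunk yields -ENOSPC, which corresponds to the "failed -28" line in the dmesg output above (28 is ENOSPC on Linux).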

Fix:
 When set_block_ro fails for a metadata chunk, it is not a problem:
 scrub_lock pauses commit_transaction at the same time, and
 metadata is always COWed, so in-flight writepages will not
 write data to the same place that scrub/replace is working on.
 Letting replace continue in this case is safe.
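
The decision the fix makes on the return value can be sketched in user space as follows. This is a minimal illustration, not the kernel code: enum ro_action, classify_ro_result() and release_ro() are hypothetical names introduced here, and only the three-way branch on 0, -ENOSPC and other errors is taken from the commit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of trying to mark a block group read-only. */
enum ro_action {
	RO_PROCEED_MARKED,   /* ret == 0: group is read-only, undo it later */
	RO_PROCEED_UNMARKED, /* ret == -ENOSPC: continue, nothing to undo */
	RO_ABORT             /* any other error: warn and stop */
};

/* Models the three-way branch on the return value described in the fix. */
static enum ro_action classify_ro_result(int ret, bool *ro_set)
{
	assert(ro_set != NULL);

	if (ret == 0) {
		*ro_set = true;
		return RO_PROCEED_MARKED;
	}
	if (ret == -ENOSPC) {
		/* Tolerated: replace continues without the read-only mark. */
		*ro_set = false;
		return RO_PROCEED_UNMARKED;
	}
	*ro_set = false;
	return RO_ABORT;
}

/* The read-only count may only be dropped if it was raised earlier. */
static int release_ro(bool ro_set, int ro_count)
{
	return ro_set ? ro_count - 1 : ro_count;
}
```

Tracking ro_set separately from ret matters because ret is reused for later calls in the same loop iteration, so by the time the read-only mark has to be undone it no longer says whether the mark was ever set.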

Tested with the above script, xfstests/011, and 100 runs of xfstests/070.

Changelog v1->v2:
1: Add detailed comments to the source and the commit message.
2: Add dmesg details to the commit message.
3: Only let a return value of -ENOSPC pass.
All suggested by: Filipe Manana <fdmanana@gmail.com>
Suggested-by: Filipe Manana <fdmanana@gmail.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

parent da02c689


@@ -3483,6 +3483,7 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 	u64 length;
 	u64 chunk_offset;
 	int ret = 0;
+	int ro_set;
 	int slot;
 	struct extent_buffer *l;
 	struct btrfs_key key;
@@ -3568,7 +3569,21 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 		scrub_pause_on(fs_info);
 		ret = btrfs_inc_block_group_ro(root, cache);
 		scrub_pause_off(fs_info);
-		if (ret) {
+
+		if (ret == 0) {
+			ro_set = 1;
+		} else if (ret == -ENOSPC) {
+			/*
+			 * btrfs_inc_block_group_ro return -ENOSPC when it
+			 * failed in creating new chunk for metadata.
+			 * It is not a problem for scrub/replace, because
+			 * metadata are always cowed, and our scrub paused
+			 * commit_transactions.
+			 */
+			ro_set = 0;
+		} else {
+			btrfs_warn(fs_info, "failed setting block group ro, ret=%d\n",
+				   ret);
 			btrfs_put_block_group(cache);
 			break;
 		}
@@ -3611,7 +3626,8 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 
 		scrub_pause_off(fs_info);
 
-		btrfs_dec_block_group_ro(root, cache);
+		if (ro_set)
+			btrfs_dec_block_group_ro(root, cache);
 
 		btrfs_put_block_group(cache);
 		if (ret)
