• Filipe Manana's avatar
    Btrfs: prevent send failures and crashes due to concurrent relocation · 9e967495
    Filipe Manana authored
    Send always operates on read-only trees and always expected that while it
    is in progress, nothing changes in those trees. Due to that expectation
    and the fact that send is a read-only operation, it operates on commit
    roots and does not hold transaction handles. However relocation can COW
    nodes and leafs from read-only trees, which can cause unexpected failures
    and crashes (hitting BUG_ONs). while send using a node/leaf, it gets
    COWed, the transaction used to COW it is committed, a new transaction
    starts, the extent previously used for that node/leaf gets allocated,
    possibly for another tree, and the respective extent buffer' content
    changes while send is still using it. When this happens send normally
    fails with EIO being returned to user space and messages like the
    following are found in dmesg/syslog:
    
      [ 3408.699121] BTRFS error (device sdc): parent transid verify failed on 58703872 wanted 250 found 253
      [ 3441.523123] BTRFS error (device sdc): did not find backref in send_root. inode=63211, offset=0, disk_byte=5222825984 found extent=5222825984
    
    Other times, less often, we hit a BUG_ON() because an extent buffer that
    send is using used to be a node, and while send is still using it, it
    got COWed and got reused as a leaf while send is still using, producing
    the following trace:
    
     [ 3478.466280] ------------[ cut here ]------------
     [ 3478.466282] kernel BUG at fs/btrfs/ctree.c:1806!
     [ 3478.466965] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
     [ 3478.467635] CPU: 0 PID: 2165 Comm: btrfs Not tainted 5.0.0-btrfs-next-46 #1
     [ 3478.468311] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
     [ 3478.469681] RIP: 0010:read_node_slot+0x122/0x130 [btrfs]
     (...)
     [ 3478.471758] RSP: 0018:ffffa437826bfaa0 EFLAGS: 00010246
     [ 3478.472457] RAX: ffff961416ed7000 RBX: 000000000000003d RCX: 0000000000000002
     [ 3478.473151] RDX: 000000000000003d RSI: ffff96141e387408 RDI: ffff961599b30000
     [ 3478.473837] RBP: ffffa437826bfb8e R08: 0000000000000001 R09: ffffa437826bfb8e
     [ 3478.474515] R10: ffffa437826bfa70 R11: 0000000000000000 R12: ffff9614385c8708
     [ 3478.475186] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
     [ 3478.475840] FS:  00007f8e0e9cc8c0(0000) GS:ffff9615b6a00000(0000) knlGS:0000000000000000
     [ 3478.476489] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     [ 3478.477127] CR2: 00007f98b67a056e CR3: 0000000005df6005 CR4: 00000000003606f0
     [ 3478.477762] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
     [ 3478.478385] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
     [ 3478.479003] Call Trace:
     [ 3478.479600]  ? do_raw_spin_unlock+0x49/0xc0
     [ 3478.480202]  tree_advance+0x173/0x1d0 [btrfs]
     [ 3478.480810]  btrfs_compare_trees+0x30c/0x690 [btrfs]
     [ 3478.481388]  ? process_extent+0x1280/0x1280 [btrfs]
     [ 3478.481954]  btrfs_ioctl_send+0x1037/0x1270 [btrfs]
     [ 3478.482510]  _btrfs_ioctl_send+0x80/0x110 [btrfs]
     [ 3478.483062]  btrfs_ioctl+0x13fe/0x3120 [btrfs]
     [ 3478.483581]  ? rq_clock_task+0x2e/0x60
     [ 3478.484086]  ? wake_up_new_task+0x1f3/0x370
     [ 3478.484582]  ? do_vfs_ioctl+0xa2/0x6f0
     [ 3478.485075]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
     [ 3478.485552]  do_vfs_ioctl+0xa2/0x6f0
     [ 3478.486016]  ? __fget+0x113/0x200
     [ 3478.486467]  ksys_ioctl+0x70/0x80
     [ 3478.486911]  __x64_sys_ioctl+0x16/0x20
     [ 3478.487337]  do_syscall_64+0x60/0x1b0
     [ 3478.487751]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
     [ 3478.488159] RIP: 0033:0x7f8e0d7d4dd7
     (...)
     [ 3478.489349] RSP: 002b:00007ffcf6fb4908 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
     [ 3478.489742] RAX: ffffffffffffffda RBX: 0000000000000105 RCX: 00007f8e0d7d4dd7
     [ 3478.490142] RDX: 00007ffcf6fb4990 RSI: 0000000040489426 RDI: 0000000000000005
     [ 3478.490548] RBP: 0000000000000005 R08: 00007f8e0d6f3700 R09: 00007f8e0d6f3700
     [ 3478.490953] R10: 00007f8e0d6f39d0 R11: 0000000000000202 R12: 0000000000000005
     [ 3478.491343] R13: 00005624e0780020 R14: 0000000000000000 R15: 0000000000000001
     (...)
     [ 3478.493352] ---[ end trace d5f537302be4f8c8 ]---
    
    Another possibility, much less likely to happen, is that send will not
    fail but the contents of the stream it produces may not be correct.
    
    To avoid this, do not allow send and relocation (balance) to run in
    parallel. In the long term the goal is to allow for both to be able to
    run concurrently without any problems, but that will take a significant
    effort in development and testing.
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    9e967495
volumes.c 205 KB