    md-cluster: fix use-after-free issue when removing rdev · f7c7a2f9
    Heming Zhao authored
    md_kick_rdev_from_array() removes the rdev while the rdev list is being
    walked, so the list must be traversed with rdev_for_each_safe().
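
The reason a "safe" iterator is needed can be illustrated outside the kernel. The following is a minimal userspace sketch, assuming a simplified singly linked list rather than the kernel's list implementation; `struct node`, `push`, `kick_faulty`, and `count` are hypothetical names chosen for illustration and are not part of md.

```c
/* Userspace sketch only: a simplified singly linked list, not the kernel's
 * list code or the real rdev_for_each_safe() macro. All names are
 * hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct node {
    int id;
    bool faulty;
    struct node *next;
};

/* Prepend a node to the list and return the new head. */
static struct node *push(struct node *head, int id, bool faulty)
{
    struct node *n = malloc(sizeof(*n));
    assert(n != NULL);
    n->id = id;
    n->faulty = faulty;
    n->next = head;
    return n;
}

/* Remove and free every faulty node; return how many were removed.
 * The successor is saved in `tmp` before `cur` can be freed. A loop that
 * advanced with `cur = cur->next` would read freed memory after a removal. */
static int kick_faulty(struct node **head)
{
    int removed = 0;
    struct node **link = head;   /* pointer that currently points at cur */
    struct node *cur, *tmp;

    for (cur = *head; cur != NULL; cur = tmp) {
        tmp = cur->next;         /* read the successor while cur is valid */
        if (cur->faulty) {
            *link = tmp;         /* unlink cur from the list */
            free(cur);           /* cur must not be dereferenced after this */
            removed++;
        } else {
            link = &cur->next;
        }
    }
    return removed;
}

/* Count the nodes remaining in the list. */
static int count(const struct node *head)
{
    int n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

Building a list with `push()` and then calling `kick_faulty(&head)` frees the faulty entries without touching a node after it has been released, which is the property the unsafe iteration lacks.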
    
    How to trigger:
    
    Environment: two nodes running as KVM/QEMU x86_64 VMs (2 CPUs and 2 GB
    RAM each, with 2 iSCSI LUNs).
    
    ```
    node2=192.168.0.3
    
    for i in {1..20}; do
        echo ==== $i `date` ====;
    
        mdadm -Ss && ssh ${node2} "mdadm -Ss"
        wipefs -a /dev/sda /dev/sdb
    
        mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l 1 /dev/sda \
           /dev/sdb --assume-clean
        ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb"
        mdadm --wait /dev/md0
        ssh ${node2} "mdadm --wait /dev/md0"
    
        mdadm --manage /dev/md0 --fail /dev/sda --remove /dev/sda
        sleep 1
    done
    ```
    
    Crash stack:
    
    ```
    stack segment: 0000 [#1] SMP
    ... ...
    RIP: 0010:md_check_recovery+0x1e8/0x570 [md_mod]
    ... ...
    RSP: 0018:ffffb149807a7d68 EFLAGS: 00010207
    RAX: 0000000000000000 RBX: ffff9d494c180800 RCX: ffff9d490fc01e50
    RDX: fffff047c0ed8308 RSI: 0000000000000246 RDI: 0000000000000246
    RBP: 6b6b6b6b6b6b6b6b R08: ffff9d490fc01e40 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
    R13: ffff9d494c180818 R14: ffff9d493399ef38 R15: ffff9d4933a1d800
    FS:  0000000000000000(0000) GS:ffff9d494f700000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fe68cab9010 CR3: 000000004c6be001 CR4: 00000000003706e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     raid1d+0x5c/0xd40 [raid1]
     ? finish_task_switch+0x75/0x2a0
     ? lock_timer_base+0x67/0x80
     ? try_to_del_timer_sync+0x4d/0x80
     ? del_timer_sync+0x41/0x50
     ? schedule_timeout+0x254/0x2d0
     ? md_start_sync+0xe0/0xe0 [md_mod]
     ? md_thread+0x127/0x160 [md_mod]
     md_thread+0x127/0x160 [md_mod]
     ? wait_woken+0x80/0x80
     kthread+0x10d/0x130
     ? kthread_park+0xa0/0xa0
     ret_from_fork+0x1f/0x40
    ```
    
    Fixes: dbb64f86 ("md-cluster: Fix adding of new disk with new reload code")
    Fixes: 659b254f ("md-cluster: remove a disk asynchronously from cluster environment")
    Cc: stable@vger.kernel.org
    Reviewed-by: Gang He <ghe@suse.com>
    Signed-off-by: Heming Zhao <heming.zhao@suse.com>
    Signed-off-by: Song Liu <song@kernel.org>