    md: fix regression for null-ptr-dereference in __md_stop() · 433279be
    Yu Kuai authored
    Commit 3e453522 ("md: Free resources in __md_stop") tried to fix a
    null-ptr-dereference for 'active_io' by moving percpu_ref_exit() to
    __md_stop(). However, the commit also moved the percpu_ref_exit() of
    'writes_pending' to __md_stop(), which breaks the mdadm tests:
    
    BUG: kernel NULL pointer dereference, address: 0000000000000038
    Oops: 0000 [#1] PREEMPT SMP
    CPU: 15 PID: 17830 Comm: mdadm Not tainted 6.3.0-rc3-next-20230324-00009-g520d37
    RIP: 0010:free_percpu+0x465/0x670
    Call Trace:
     <TASK>
     __percpu_ref_exit+0x48/0x70
     percpu_ref_exit+0x1a/0x90
     __md_stop+0xe9/0x170
     do_md_stop+0x1e1/0x7b0
     md_ioctl+0x90c/0x1aa0
     blkdev_ioctl+0x19b/0x400
     vfs_ioctl+0x20/0x50
     __x64_sys_ioctl+0xba/0xe0
     do_syscall_64+0x6c/0xe0
     entry_SYSCALL_64_after_hwframe+0x63/0xcd
    
    The problem can be reproduced 100% of the time with the following test:
    
    mdadm -CR /dev/md0 -l1 -n1 /dev/sda --force
    echo inactive > /sys/block/md0/md/array_state
    echo read-auto  > /sys/block/md0/md/array_state
    echo inactive > /sys/block/md0/md/array_state
    
    Root cause:
    
    // start raid
    raid1_run
     mddev_init_writes_pending
      percpu_ref_init
    
    // inactive raid
    array_state_store
     do_md_stop
      __md_stop
       percpu_ref_exit
    
    // start raid again
    array_state_store
     do_md_run
      raid1_run
       mddev_init_writes_pending
        if (mddev->writes_pending.percpu_count_ptr)
        // won't reinit
    
    // inactive raid again
    ...
    percpu_ref_exit
    -> null-ptr-dereference
    
    Before the commit, 'writes_pending' was exited only when mddev was freed,
    so it was safe to restart the raid because mddev_init_writes_pending()
    already makes sure that 'writes_pending' is initialized only once.
    
    Fix the problem by moving the exit of 'writes_pending' back to where mddev
    is freed. This makes it a little harder to see the relationship between
    allocating and freeing memory; however, the code change is much smaller
    and we have lived with this for a long time already.
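
The root cause and the fix described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct model_ref`, `MODEL_REF_DEAD`, `model_init_once()`, `model_ref_exit()` and `model_replay()` are hypothetical stand-ins, not the kernel's percpu_ref API, and the model reports a stale exit by returning false where the real code would oops. It assumes, as the call chain above suggests, that the pointer field checked by the init-once guard stays non-zero after an exit.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the non-zero value left behind by an exit. */
#define MODEL_REF_DEAD ((uintptr_t)1)

/* Toy model of a reference counter; not the kernel's struct percpu_ref. */
struct model_ref {
	uintptr_t count_ptr; /* non-zero once initialized, stays non-zero after exit */
	int *data;           /* backing storage, released by model_ref_exit() */
};

/* Init-once guard in the style of mddev_init_writes_pending(). */
static int model_init_once(struct model_ref *ref)
{
	if (ref->count_ptr)
		return 0; /* treated as already initialized: won't reinit */
	ref->data = calloc(1, sizeof(*ref->data));
	if (!ref->data)
		return -1;
	ref->count_ptr = (uintptr_t)ref->data;
	return 0;
}

/*
 * Release the backing storage. Returns false where the real code would
 * dereference a NULL pointer, i.e. when called on an already-exited ref.
 */
static bool model_ref_exit(struct model_ref *ref)
{
	if (!ref->data)
		return false; /* stale ref: the kernel oopses at this point */
	free(ref->data);
	ref->data = NULL;
	ref->count_ptr = MODEL_REF_DEAD; /* assumption: left non-zero */
	return true;
}

/*
 * Replays start, stop, start, stop. Returns whether the last exit was safe
 * (also returns false if the model's allocation fails).
 */
static bool model_replay(bool exit_on_stop)
{
	struct model_ref ref = { 0 };
	bool ok = true;

	for (int cycle = 0; cycle < 2; cycle++) {
		if (model_init_once(&ref) != 0)
			return false;
		if (exit_on_stop)
			ok = model_ref_exit(&ref); /* regression: exit on every stop */
	}
	/* With the fix, the ref is exited once, when the owner is freed. */
	if (!exit_on_stop)
		ok = model_ref_exit(&ref);
	return ok;
}
```

In this model, `model_replay(true)` mirrors the regression: the second start skips initialization because `count_ptr` is still non-zero, so the second stop exits a ref with no backing storage. `model_replay(false)` mirrors the fix: both starts share one initialization and the single exit happens when the owner goes away.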
    
    Fixes: 3e453522 ("md: Free resources in __md_stop")
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Reviewed-by: Xiao Ni <xni@redhat.com>
    Signed-off-by: Song Liu <song@kernel.org>
    Link: https://lore.kernel.org/r/20230328094400.1448955-1-yukuai1@huaweicloud.com