    md/raid1: set max_sectors during early return from choose_slow_rdev() · 36a5c03f
    Mateusz Jończyk authored
    Linux 6.9+ is unable to start a degraded RAID1 array with one drive,
    when that drive has a write-mostly flag set. During such an attempt,
    the following assertion in bio_split() is hit:
    
    	BUG_ON(sectors <= 0);
    
    Call Trace:
    	? bio_split+0x96/0xb0
    	? exc_invalid_op+0x53/0x70
    	? bio_split+0x96/0xb0
    	? asm_exc_invalid_op+0x1b/0x20
    	? bio_split+0x96/0xb0
    	? raid1_read_request+0x890/0xd20
    	? __call_rcu_common.constprop.0+0x97/0x260
    	raid1_make_request+0x81/0xce0
    	? __get_random_u32_below+0x17/0x70
    	? new_slab+0x2b3/0x580
    	md_handle_request+0x77/0x210
    	md_submit_bio+0x62/0xa0
    	__submit_bio+0x17b/0x230
    	submit_bio_noacct_nocheck+0x18e/0x3c0
    	submit_bio_noacct+0x244/0x670
    
    Investigation showed that choose_slow_rdev() does not set
    max_sectors on some early-return paths; as a result,
    raid1_read_request() calls bio_split() with sectors == 0.
    
    Fix it by filling in this variable.
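    The failure pattern can be illustrated with a minimal, self-contained
    sketch (not the actual kernel code; the real choose_slow_rdev() in
    drivers/md/raid1.c is more involved, and the function names below are
    hypothetical stand-ins): an early-return path leaves the caller's
    max_sectors output parameter untouched, so the caller ends up
    splitting the bio at 0 sectors.
    
    	#include <assert.h>
    	#include <stdio.h>
    
    	/* Stand-in for choose_slow_rdev(): picks a disk and is expected
    	 * to report how many sectors the read may cover via *max_sectors.
    	 * The bug pattern: one path returns early without filling it in. */
    	static int choose_rdev_buggy(int write_mostly_only, int read_len,
    	                             int *max_sectors)
    	{
    		if (write_mostly_only)
    			return 0;	/* BUG: *max_sectors left untouched */
    		*max_sectors = read_len;
    		return 0;
    	}
    
    	static int choose_rdev_fixed(int write_mostly_only, int read_len,
    	                             int *max_sectors)
    	{
    		if (write_mostly_only) {
    			*max_sectors = read_len; /* the fix: set it here too */
    			return 0;
    		}
    		*max_sectors = read_len;
    		return 0;
    	}
    
    	int main(void)
    	{
    		int sectors = 0;	/* caller's value before the call */
    
    		choose_rdev_buggy(1, 8, &sectors);
    		/* sectors is still 0 here, so a subsequent
    		 * bio_split(bio, sectors, ...) would hit BUG_ON(sectors <= 0) */
    		printf("buggy path: sectors=%d\n", sectors);
    
    		sectors = 0;
    		choose_rdev_fixed(1, 8, &sectors);
    		printf("fixed path: sectors=%d\n", sectors);
    		assert(sectors > 0);
    		return 0;
    	}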
    
    This bug was introduced in
    commit dfa8ecd1 ("md/raid1: factor out choose_slow_rdev() from read_balance()")
    but apparently remained hidden until
    commit 0091c5a2 ("md/raid1: factor out helpers to choose the best rdev from read_balance()"),
    which landed shortly thereafter.
    
    Cc: stable@vger.kernel.org # 6.9.x+
    Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
    Fixes: dfa8ecd1 ("md/raid1: factor out choose_slow_rdev() from read_balance()")
    Cc: Song Liu <song@kernel.org>
    Cc: Yu Kuai <yukuai3@huawei.com>
    Cc: Paul Luse <paul.e.luse@linux.intel.com>
    Cc: Xiao Ni <xni@redhat.com>
    Cc: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
    Link: https://lore.kernel.org/linux-raid/20240706143038.7253-1-mat.jonczyk@o2.pl/
    
    --
    
    Tested on both Linux 6.10 and 6.9.8.
    
    Inside a VM, the mdadm test suite for RAID1 on 6.10 found no problems:
    	./test --dev=loop --no-error --raidtype=raid1
    (on 6.9.8 there was one failure, caused by external bitmap support not
    being compiled in).
    
    Notes:
    - I was reliably getting deadlocks when adding / removing devices on
      such an array while it was loaded with fsstress running 20
      concurrent processes. When the array was idle or loaded with
      fsstress running 8 processes, no such deadlocks happened in my
      tests. However, this also occurred on unpatched Linux 6.8.0 (but
      not on 6.1.97-rc1), so it is likely an independent regression (to
      be investigated).
    - I was also getting deadlocks when adding / removing the bitmap on
      the array under similar conditions; however, this also happened on
      Linux 6.1.97-rc1. fsstress with 8 concurrent processes triggered it
      only once across many tests.
    - In my testing, hot-adding an internal bitmap to the array once
      failed with:
    	mdadm: Cannot add bitmap while array is resyncing or reshaping etc.
    	mdadm: failed to set internal bitmap.
      even though no such resync or reshape was in progress according to
      /proc/mdstat. This seems unrelated, though.
    Reviewed-by: Yu Kuai <yukuai3@huawei.com>
    Signed-off-by: Song Liu <song@kernel.org>
    Link: https://lore.kernel.org/r/20240711202316.10775-1-mat.jonczyk@o2.pl