1. 12 Jul, 2024 2 commits
    • md-cluster: fix no recovery job when adding/re-adding a disk · 35a0a409
      Heming Zhao authored
      The commit db5e653d ("md: delay choosing sync action to
      md_start_sync()") delays the start of the sync action. In a
      clustered environment, this will cause another node to first
      activate the spare disk and skip recovery. As a result, no
      nodes will perform recovery when a disk is added or re-added.
      
      Before db5e653d:
      
      ```
         node1                                node2
      ----------------------------------------------------------------
      md_check_recovery
       + md_update_sb
       |  sendmsg: METADATA_UPDATED
       + md_choose_sync_action           process_metadata_update
       |  remove_and_add_spares           //node1 has not finished adding
       + call mddev->sync_work            //the spare disk:do nothing
      
      md_start_sync
       starts md_do_sync
      
      md_do_sync
       + grabbed resync_lockres:DLM_LOCK_EX
       + do syncing job
      
      md_check_recovery
       sendmsg: METADATA_UPDATED
                                       process_metadata_update
                                         //activate spare disk
      
                                       ... ...
      
                                       md_do_sync
                                        waiting to grab resync_lockres:EX
      ```
      
      After db5e653d:
      
      (note: if 'cmd:idle' sets MD_RECOVERY_INTR after md_check_recovery
      starts md_start_sync, the INTR action further delays node1's call
      to md_do_sync.)
      
      ```
         node1                                node2
      ----------------------------------------------------------------
      md_check_recovery
       + md_update_sb
       |  sendmsg: METADATA_UPDATED
       + calls mddev->sync_work         process_metadata_update
                                         //node1 has not finished adding
                                         //the spare disk:do nothing
      
      md_start_sync
       + md_choose_sync_action
       |  remove_and_add_spares
       + calls md_do_sync
      
      md_check_recovery
       md_update_sb
        sendmsg: METADATA_UPDATED
                                        process_metadata_update
                                          //activate spare disk
      
        ... ...                         ... ...
      
                                        md_do_sync
                                         + grabbed resync_lockres:EX
                                         + raid1_sync_request skip sync under
                                           conf->fullsync:0
      md_do_sync
       1. waiting to grab resync_lockres:EX
       2. when node1 could grab EX lock,
          node1 will skip resync under recovery_offset:MaxSector
      ```
      
      How to trigger:
      
      ```
      # commands @node1
       # to easily watch the recovery status
      echo 2000 > /proc/sys/dev/raid/speed_limit_max
      ssh root@node2 "echo 2000 > /proc/sys/dev/raid/speed_limit_max"
      
      mdadm -CR /dev/md0 -l1 -b clustered -n 2 /dev/sda /dev/sdb --assume-clean
      ssh root@node2 mdadm -A /dev/md0 /dev/sda /dev/sdb
      mdadm --manage /dev/md0 --fail /dev/sda --remove /dev/sda
      mdadm --manage /dev/md0 --add /dev/sdc
      
      === "cat /proc/mdstat" on both nodes: there is no recovery action. ===
      ```
      
      How to fix:
      
      Because the md layer's code flow is hard to restructure to speed up
      the sync job on the local node, add a new cluster msg that makes the
      other node defer activating the spare disk.
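
      The ordering problem and the fix can be sketched as a toy model. The
      message name RESYNC_PENDING and the handler below are hypothetical
      stand-ins for the new cluster message this patch introduces; the real
      protocol lives in drivers/md/md-cluster.c.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdio.h>

      /* Hypothetical message types modeling the md-cluster protocol;
       * RESYNC_PENDING is an illustrative name, not the kernel's enum. */
      enum cluster_msg { METADATA_UPDATED, RESYNC_PENDING };

      struct node_state {
          bool defer_activate;  /* set when a resync-pending msg arrives */
          bool spare_activated;
      };

      /* node2's handler: without the fix it activates the spare as soon
       * as METADATA_UPDATED shows the new disk; with the fix it holds off
       * while node1's resync is still pending. */
      static void process_msg(struct node_state *n, enum cluster_msg t)
      {
          switch (t) {
          case RESYNC_PENDING:
              n->defer_activate = true;
              break;
          case METADATA_UPDATED:
              if (!n->defer_activate)
                  n->spare_activated = true;
              break;
          }
      }

      int main(void)
      {
          struct node_state old = {0}, fixed = {0};

          /* Old flow: node2 activates the spare first, so node1 later
           * finds recovery_offset:MaxSector and skips resync. */
          process_msg(&old, METADATA_UPDATED);
          assert(old.spare_activated);

          /* Fixed flow: node1 announces the pending resync first, so
           * node2 defers and node1's md_do_sync performs real recovery. */
          process_msg(&fixed, RESYNC_PENDING);
          process_msg(&fixed, METADATA_UPDATED);
          assert(!fixed.spare_activated);

          printf("ok\n");
          return 0;
      }
      ```
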
      Signed-off-by: Heming Zhao <heming.zhao@suse.com>
      Reviewed-by: Su Yue <glass.su@suse.com>
      Acked-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20240709104120.22243-2-heming.zhao@suse.com
    • md-cluster: fix hanging issue while a new disk adding · fff42f21
      Heming Zhao authored
      The commit 1bbe254e ("md-cluster: check for timeout while a
      new disk adding") is correct in terms of code syntax, but it does
      not fit the real clustered code logic.
      
      When a timeout occurs while adding a new disk, if recv_daemon()
      bypasses the unlock for ack_lockres:CR, another node will be waiting
      to grab EX lock. This will cause the cluster to hang indefinitely.
      
      How to fix:
      
      1. In dlm_lock_sync(), change the wait behaviour from waiting forever
         to waiting with a timeout. This avoids the hang when another node
         fails to handle a cluster msg. Another result of this change is
         that if another node receives an unknown msg (e.g. a new msg_type),
         the old code would hang forever, whereas the new code times out and
         fails. This helps cluster_md handle a new msg_type coming from
         nodes with different kernel/module versions (e.g. the user updates
         only one leg's kernel and monitors the stability of the new
         kernel).
      2. Under the old design, __sendmsg() always returned 0 (success)
         because it must successfully unlock ->message_lockres. This commit
         makes the function return an error number when an error occurs.
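
      The two changes follow a common pattern that can be shown in user
      space: a bounded wait whose timeout error is propagated instead of
      swallowed. Below, pthread_cond_timedwait() stands in for the kernel's
      timed completion wait; the function names mirror the commit message,
      but the bodies are illustrative, not the kernel implementation.

      ```c
      #include <errno.h>
      #include <pthread.h>
      #include <stdio.h>
      #include <time.h>

      static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
      static pthread_cond_t granted = PTHREAD_COND_INITIALIZER;
      static int lock_granted; /* would be set by the DLM grant callback */

      /* Analog of the fixed dlm_lock_sync(): wait with a deadline rather
       * than forever, so a peer that never answers cannot hang us. */
      static int dlm_lock_sync_timeout(int timeout_ms)
      {
          struct timespec ts;
          int rc = 0;

          clock_gettime(CLOCK_REALTIME, &ts);
          ts.tv_sec += timeout_ms / 1000;
          ts.tv_nsec += (timeout_ms % 1000) * 1000000L;
          if (ts.tv_nsec >= 1000000000L) {
              ts.tv_sec++;
              ts.tv_nsec -= 1000000000L;
          }

          pthread_mutex_lock(&lock);
          while (!lock_granted && rc == 0)
              rc = pthread_cond_timedwait(&granted, &lock, &ts);
          pthread_mutex_unlock(&lock);

          return rc == ETIMEDOUT ? -ETIMEDOUT : -rc;
      }

      /* Analog of the fixed __sendmsg(): propagate the error number
       * instead of unconditionally returning 0. */
      static int __sendmsg(void)
      {
          int ret = dlm_lock_sync_timeout(50); /* 50 ms for the demo */
          if (ret)
              fprintf(stderr, "lock not granted: %d\n", ret);
          return ret; /* the old code returned 0 here */
      }

      int main(void)
      {
          /* No peer ever grants the lock: the wait times out and the
           * caller sees the failure instead of the cluster hanging. */
          printf("%d\n", __sendmsg() == -ETIMEDOUT ? 1 : 0);
          return 0;
      }
      ```
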
      
      Fixes: 1bbe254e ("md-cluster: check for timeout while a new disk adding")
      Signed-off-by: Heming Zhao <heming.zhao@suse.com>
      Reviewed-by: Su Yue <glass.su@suse.com>
      Acked-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20240709104120.22243-1-heming.zhao@suse.com