1. 25 Jan, 2024 7 commits
    • scsi: ufs: qcom: Avoid re-init quirk when gears match · 10a39667
      Eric Chanudet authored
      On sa8775p-ride, probing the HBA will go through the
      UFSHCD_QUIRK_REINIT_AFTER_MAX_GEAR_SWITCH path although the power info is
      the same during the second init.
      
      The REINIT quirk only applies starting with controller v4. For these,
      ufs_qcom_get_hs_gear() reads the highest supported gear when setting the
      host_params. After the negotiation, if the host and device are on the same
      gear, it is the highest gear supported between the two. Skip REINIT to save
      some time.
      Signed-off-by: Eric Chanudet <echanude@redhat.com>
      Link: https://lore.kernel.org/r/20240123192854.1724905-4-echanude@redhat.com
      Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
      Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8775p-ride
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
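      The skip condition described in this commit can be sketched in plain C. This is a minimal illustration, not the driver code: struct host_sketch, the flag QUIRK_REINIT_SKETCH and update_reinit_sketch() are hypothetical stand-ins, and the sketch assumes the check compares the gear the PHY was first initialized with against the negotiated gear.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the REINIT quirk bit; not the kernel's value. */
#define QUIRK_REINIT_SKETCH (1u << 0)

/* Hypothetical, reduced host state used only for this sketch. */
struct host_sketch {
    uint32_t quirks;   /* bitmask of active quirks */
    uint32_t phy_gear; /* gear the PHY was initialized with */
};

/*
 * Called once negotiation has settled on negotiated_gear.
 * If the PHY already runs at that gear, a second init would change
 * nothing, so the quirk bit is cleared. Recording the negotiated gear
 * afterwards is an assumption of this sketch.
 * Returns true if a re-init is still pending.
 */
static bool update_reinit_sketch(struct host_sketch *host,
                                 uint32_t negotiated_gear)
{
    if (host->phy_gear == negotiated_gear)
        host->quirks &= ~QUIRK_REINIT_SKETCH;

    host->phy_gear = negotiated_gear;
    return (host->quirks & QUIRK_REINIT_SKETCH) != 0;
}
```

      With matching gears the function reports that no re-init is pending; with differing gears the quirk bit stays set and the stored gear is updated for the second init.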
    • scsi: ufs: qcom: Clarify comments about the initial phy_gear · 883a8b45
      Andrew Halaney authored
      The comments that currently are within the hw_ver < 4 conditional are
      misleading. They really apply to various branches of the conditionals there
      and incorrectly state that the phy_gear value can increase.
      
      Right now the logic is to:
      
       - Default to max supported gear for phy_gear
      
       - Set phy_gear to minimum value if version < 4 since those versions only
         support one PHY init sequence (and therefore don't need reinit)
      
       - Set phy_gear to the optimal value if the device version is already
         populated in the controller registers on boot
      
      Let's move some of the comment to outside the if statement and clean up the
      bit left about switching to a higher gear on reinit. This way the comment
      more accurately reflects the logic.
      Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
      Link: https://lore.kernel.org/r/20240123-ufs-reinit-comments-v1-1-ff2b3532d7fe@redhat.com
      Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
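      The three rules listed in this commit message can be written out as a small C function. This is a sketch under stated assumptions, not the driver's implementation: the function name and all parameters are hypothetical, the gear values are supplied by the caller rather than taken from the driver, and the order in which the rules are checked is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper that picks the initial PHY gear.
 *   hw_major       controller major version
 *   max_gear       highest gear the host supports (the default)
 *   min_gear       gear used where only one PHY init sequence exists
 *   dev_ver_known  true if the device version is already populated in
 *                  the controller registers at boot
 *   optimal_gear   gear matching that device version
 */
static uint32_t initial_phy_gear_sketch(uint32_t hw_major,
                                        uint32_t max_gear,
                                        uint32_t min_gear,
                                        bool dev_ver_known,
                                        uint32_t optimal_gear)
{
    uint32_t phy_gear = max_gear;   /* rule 1: default to max gear */

    if (hw_major < 4)
        phy_gear = min_gear;        /* rule 2: single init sequence */
    else if (dev_ver_known)
        phy_gear = optimal_gear;    /* rule 3: device version known */

    return phy_gear;
}
```

      Note that no branch raises the value above the default, which matches the commit's point that the old comment was wrong to say phy_gear can increase.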
    • Merge patch series "scsi: hisi_sas: Minor fixes and cleanups" · 2b9bc9ef
      Martin K. Petersen authored
      chenxiang <chenxiang66@hisilicon.com> says:
      
      This series contains some fixes and cleanups including:
      
       - Fix a deadlock issue related to automatic debugfs;
      
       - Remove redundant checks for automatic debugfs;
      
       - Check whether debugfs is enabled before removing or releasing it;
      
       - Remove hisi_hba->timer for v3 hw;
      
      Link: https://lore.kernel.org/r/1705904747-62186-1-git-send-email-chenxiang66@hisilicon.com
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: hisi_sas: Remove hisi_hba->timer for v3 hw · f9242f16
      Xiang Chen authored
      hisi_hba->timer is not used for v3 hw, but v3 hw still reaches
      operations on hisi_hba->timer in two places:
      
       - Deleting the timer in function hisi_sas_v3_hw() which is only for v3 hw;
      
       - Deleting the timer in function hisi_sas_controller_reset_prepare() which
         is common for v1/v2/v3 hw.
      
      We can remove the timer deletion outright in the first case. In the
      second case it must be skipped only for v3 hw, so check hw->sht, which is
      NULL only for v3 hw, before deleting hisi_hba->timer.
      Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
      Link: https://lore.kernel.org/r/1705904747-62186-5-git-send-email-chenxiang66@hisilicon.com
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
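      The hw->sht check used for the shared reset path can be illustrated with a small model. The structs and reset_prepare_sketch() below are hypothetical and only mirror the idea from the commit message: a NULL sht marks v3 hw, and the timer is touched only when sht is set. A boolean stands in for the real timer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical placeholder for a SCSI host template. */
struct sht_sketch { int unused; };

/* Hypothetical hw description: sht is NULL only for v3 hw. */
struct hw_sketch { const struct sht_sketch *sht; };

/* Hypothetical HBA state; timer_active stands in for a real timer. */
struct hba_sketch {
    const struct hw_sketch *hw;
    bool timer_active;
};

/*
 * Shared reset-prepare step for v1/v2/v3 hw.
 * Returns true if the timer was stopped, false if it was skipped
 * because the hw has no sht (the v3 case in this sketch).
 */
static bool reset_prepare_sketch(struct hba_sketch *hba)
{
    if (hba->hw->sht == NULL)
        return false;            /* v3 hw: timer is never used */

    hba->timer_active = false;   /* stand-in for deleting the timer */
    return true;
}
```

      Keying the decision on a field that already differs between hardware generations avoids adding a separate version flag to the common code path.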
    • scsi: hisi_sas: Check whether debugfs is enabled before removing or releasing it · 69097a63
      Yihang Li authored
      The hisi_sas debugfs removal should run only when debugfs is enabled, so
      check that debugfs is enabled before removing it.
      Signed-off-by: Yihang Li <liyihang9@huawei.com>
      Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
      Link: https://lore.kernel.org/r/1705904747-62186-4-git-send-email-chenxiang66@hisilicon.com
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: hisi_sas: Remove redundant checks for automatic debugfs dump · 3f030550
      Yihang Li authored
      In commit 63f0733d ("scsi: hisi_sas: Allocate DFX memory during dump
      trigger"), the DFX memory allocation moved from device initialization to
      the time a dump occurs, so .debugfs_itct is not a valid address and there
      is no need to check it.
      
      The parameter hisi_sas_debugfs_enable is enough to check whether automatic
      debugfs dump is triggered, so remove the redundant checks.
      
      Fixes: 63f0733d ("scsi: hisi_sas: Allocate DFX memory during dump trigger")
      Signed-off-by: Yihang Li <liyihang9@huawei.com>
      Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
      Link: https://lore.kernel.org/r/1705904747-62186-3-git-send-email-chenxiang66@hisilicon.com
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: hisi_sas: Fix a deadlock issue related to automatic dump · 3c4f53b2
      Yihang Li authored
      If we issue a command to disable a PHY, the device attached to it goes
      offline. If a 2-bit ECC error occurs at the same time, a hung task may be
      reported:
      
      [ 4613.652388] INFO: task kworker/u256:0:165233 blocked for more than 120 seconds.
      [ 4613.666297] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 4613.674809] task:kworker/u256:0  state:D stack:    0 pid:165233 ppid:     2 flags:0x00000208
      [ 4613.683959] Workqueue: 0000:74:02.0_disco_q sas_revalidate_domain [libsas]
      [ 4613.691518] Call trace:
      [ 4613.694678]  __switch_to+0xf8/0x17c
      [ 4613.698872]  __schedule+0x660/0xee0
      [ 4613.703063]  schedule+0xac/0x240
      [ 4613.706994]  schedule_timeout+0x500/0x610
      [ 4613.711705]  __down+0x128/0x36c
      [ 4613.715548]  down+0x240/0x2d0
      [ 4613.719221]  hisi_sas_internal_abort_timeout+0x1bc/0x260 [hisi_sas_main]
      [ 4613.726618]  sas_execute_internal_abort+0x144/0x310 [libsas]
      [ 4613.732976]  sas_execute_internal_abort_dev+0x44/0x60 [libsas]
      [ 4613.739504]  hisi_sas_internal_task_abort_dev.isra.0+0xbc/0x1b0 [hisi_sas_main]
      [ 4613.747499]  hisi_sas_dev_gone+0x174/0x250 [hisi_sas_main]
      [ 4613.753682]  sas_notify_lldd_dev_gone+0xec/0x2e0 [libsas]
      [ 4613.759781]  sas_unregister_common_dev+0x4c/0x7a0 [libsas]
      [ 4613.765962]  sas_destruct_devices+0xb8/0x120 [libsas]
      [ 4613.771709]  sas_do_revalidate_domain.constprop.0+0x1b8/0x31c [libsas]
      [ 4613.778930]  sas_revalidate_domain+0x60/0xa4 [libsas]
      [ 4613.784716]  process_one_work+0x248/0x950
      [ 4613.789424]  worker_thread+0x318/0x934
      [ 4613.793878]  kthread+0x190/0x200
      [ 4613.797810]  ret_from_fork+0x10/0x18
      [ 4613.802121] INFO: task kworker/u256:4:316722 blocked for more than 120 seconds.
      [ 4613.816026] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 4613.824538] task:kworker/u256:4  state:D stack:    0 pid:316722 ppid:     2 flags:0x00000208
      [ 4613.833670] Workqueue: 0000:74:02.0 hisi_sas_rst_work_handler [hisi_sas_main]
      [ 4613.841491] Call trace:
      [ 4613.844647]  __switch_to+0xf8/0x17c
      [ 4613.848852]  __schedule+0x660/0xee0
      [ 4613.853052]  schedule+0xac/0x240
      [ 4613.856984]  schedule_timeout+0x500/0x610
      [ 4613.861695]  __down+0x128/0x36c
      [ 4613.865542]  down+0x240/0x2d0
      [ 4613.869216]  hisi_sas_controller_prereset+0x58/0x1fc [hisi_sas_main]
      [ 4613.876324]  hisi_sas_rst_work_handler+0x40/0x8c [hisi_sas_main]
      [ 4613.883019]  process_one_work+0x248/0x950
      [ 4613.887732]  worker_thread+0x318/0x934
      [ 4613.892204]  kthread+0x190/0x200
      [ 4613.896118]  ret_from_fork+0x10/0x18
      [ 4613.900423] INFO: task kworker/u256:1:348985 blocked for more than 121 seconds.
      [ 4613.914341] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 4613.922852] task:kworker/u256:1  state:D stack:    0 pid:348985 ppid:     2 flags:0x00000208
      [ 4613.931984] Workqueue: 0000:74:02.0_event_q sas_port_event_worker [libsas]
      [ 4613.939549] Call trace:
      [ 4613.942702]  __switch_to+0xf8/0x17c
      [ 4613.946892]  __schedule+0x660/0xee0
      [ 4613.951083]  schedule+0xac/0x240
      [ 4613.955015]  schedule_timeout+0x500/0x610
      [ 4613.959725]  wait_for_common+0x200/0x610
      [ 4613.964349]  wait_for_completion+0x3c/0x5c
      [ 4613.969146]  flush_workqueue+0x198/0x790
      [ 4613.973776]  sas_porte_broadcast_rcvd+0x1e8/0x320 [libsas]
      [ 4613.979960]  sas_port_event_worker+0x54/0xa0 [libsas]
      [ 4613.985708]  process_one_work+0x248/0x950
      [ 4613.990420]  worker_thread+0x318/0x934
      [ 4613.994868]  kthread+0x190/0x200
      [ 4613.998800]  ret_from_fork+0x10/0x18
      
      This is because when the device goes offline, we obtain the hisi_hba
      semaphore and send the ABORT_DEV command to the device. However, the
      internal abort times out due to the 2-bit ECC error and triggers an
      automatic dump. Since the hisi_hba semaphore is already held, the dump
      cannot be executed and the controller cannot be reset.
      
      Therefore, the deadlock arises from the following circular dependency:
      hisi_sas_dev_gone() -> down() -> hisi_sas_internal_task_abort_dev() -> ...
      -> hisi_sas_internal_abort_timeout() -> down().
      
      The deadlock is triggered only when the timeout occurs while a device is
      going offline. To fix this issue, use .rst_ha_timeout to distinguish the
      scenario where a device goes offline from other scenarios.
      
      Fixes: 2ff07b5c ("scsi: hisi_sas: Directly call register snapshot instead of using workqueue")
      Signed-off-by: Yihang Li <liyihang9@huawei.com>
      Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
      Link: https://lore.kernel.org/r/1705904747-62186-2-git-send-email-chenxiang66@hisilicon.com
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
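      The circular dependency in this commit, and the idea of telling the timeout handler whether its caller already holds the semaphore, can be modelled in a few lines of C. Everything here is a hypothetical toy: the semaphore is a plain counter, try_down_sketch() returns false where the real down() would sleep, and how the caller_holds_sem flag maps onto .rst_ha_timeout is an assumption, not something taken from the driver.

```c
#include <stdbool.h>

/* Toy binary semaphore: count is 1 when free, 0 when held. */
struct sem_sketch { int count; };

/* Non-blocking stand-in for down(): false means "would sleep". */
static bool try_down_sketch(struct sem_sketch *s)
{
    if (s->count == 0)
        return false;
    s->count--;
    return true;
}

/* Stand-in for up(): release the semaphore. */
static void up_sketch(struct sem_sketch *s)
{
    s->count++;
}

/*
 * Hypothetical timeout handler that wants to take a register dump.
 * caller_holds_sem marks the device-gone path, where the semaphore
 * was taken before the abort was issued. In that case the handler
 * must not take it again.
 * Returns true if the dump could run, false if it would have blocked.
 */
static bool abort_timeout_dump_sketch(struct sem_sketch *s,
                                      bool caller_holds_sem)
{
    if (!caller_holds_sem && !try_down_sketch(s))
        return false;   /* the real down() would sleep here */

    /* ... the register snapshot would be taken here ... */

    if (!caller_holds_sem)
        up_sketch(s);
    return true;
}
```

      If the device-gone path has taken the semaphore and the handler is then called without the flag, the model reports that it would block, which corresponds to the hung tasks in the trace. With the flag set, the dump proceeds and the semaphore stays owned by the caller.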
  2. 24 Jan, 2024 16 commits
  3. 21 Jan, 2024 17 commits