  1. 17 May, 2023 1 commit
  2. 06 Mar, 2023 3 commits
  3. 07 Apr, 2022 1 commit
  4. 23 Feb, 2022 1 commit
  5. 05 Oct, 2021 1 commit
  6. 02 Jun, 2021 2 commits
  7. 04 Mar, 2021 2 commits
  8. 05 Nov, 2020 1 commit
  9. 08 Jul, 2020 1 commit
  10. 12 May, 2020 2 commits
  11. 16 Jan, 2020 3 commits
  12. 10 Oct, 2019 1 commit
  13. 13 Aug, 2019 1 commit
  14. 12 Jul, 2019 1 commit
  15. 27 Jun, 2019 11 commits
  16. 18 Jun, 2019 8 commits
    • scsi: megaraid_sas: Export RAID map through debugfs · ba53572b
      Shivasharan S authored
      Create a debugfs interface for megaraid_sas driver.  Provide interface to
      dump driver RAID map in debugfs.
      Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
      Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: megaraid_sas: Add debug prints for device list · 0a11c0b0
      Shivasharan S authored
      Add debug prints related to device list being returned by firmware.  Add a
      debug flag to activate these prints.
      Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
      Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: megaraid_sas: Print FW fault information · b6661342
      Shivasharan S authored
      When driver detects a firmware fault during load, dump additional
      information on fault code and subcode that will help in debugging.
      Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
      Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: megaraid_sas: Enhance prints in OCR and TM path · 96c9603c
      Shivasharan S authored
      This patch enhances the existing debug prints in reset and task management
      path.
      
      These debug prints in adapter reset path help with debugging issues
      related to IO timeouts that are seen frequently in the field.  Add
      additional debug prints to dump the pending command frames before
      initiating an adapter reset.  Also, print FastPath IOs that are
      outstanding.
      Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
      Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: megaraid_sas: Load balance completions across all MSI-X · 1d15d909
      Shivasharan S authored
      Driver will use "reply descriptor post queues" in round robin fashion when
      the combined MSI-X mode is not enabled. With this, IO completions are
      distributed and load balanced across all the available reply descriptor
      post queues equally.
      
      This is enabled only if combined MSI-X mode is not enabled in firmware.
      This improves performance and also fixes soft lockups.
      
      When load balancing is enabled, IRQ affinity from driver needs to be
      disabled.
      Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
      Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
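
      The round-robin selection described in the commit above can be pictured
      with a small userspace sketch. It is not the driver's code: the struct,
      its field names, and the per-CPU fallback used when combined MSI-X mode
      is enabled are all assumptions made for illustration. It only shows how
      a shared atomic counter spreads completions evenly over the available
      reply descriptor post queues.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-adapter state; field names are illustrative only. */
struct adapter_sketch {
    unsigned int msix_vectors;  /* number of reply descriptor post queues */
    int combined_msix;          /* non-zero if firmware enabled combined MSI-X mode */
    atomic_uint io_count;       /* shared counter that drives the round robin */
};

/* Choose the reply queue on which the next IO should complete. */
static unsigned int pick_reply_queue(struct adapter_sketch *a,
                                     unsigned int submit_cpu)
{
    assert(a->msix_vectors > 0);

    if (a->combined_msix)
        /* Assumed fallback, not taken from the commit: map by submitting CPU. */
        return submit_cpu % a->msix_vectors;

    /* atomic_fetch_add returns the previous value, so queues go 0, 1, 2, ... */
    return atomic_fetch_add(&a->io_count, 1) % a->msix_vectors;
}
```

      With four queues and combined MSI-X mode off, consecutive calls return
      0, 1, 2, 3, 0, ... regardless of which CPU submits, which is why the
      commit notes that driver-managed IRQ affinity has to be disabled in this
      mode.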
    • scsi: megaraid_sas: IRQ poll to avoid CPU hard lockups · 62a04f81
      Shivasharan S authored
      Issue Description:
      
      We have seen CPU lockup issues in the field on systems with a large
      (more than 96) logical CPU count.  SAS3.0 controllers (Invader series)
      support a maximum of 96 MSI-X vectors and SAS3.5 products (Ventura)
      support a maximum of 128 MSI-X vectors.

      This may be a generic issue (for any PCI device that supports completion
      on multiple reply queues).
      
      To keep the problem and the possible changes simple, it is explained
      here with respect to megaraid_sas supported h/w.  MegaRAID controller
      supports multiple reply queues in completion path.  Driver creates MSI-X
      vectors for controller as "minimum of (FW supported Reply queues, Logical
      CPUs)".  If submitter is not interrupted via completion on same CPU, there
      is a loop in the IO path. This behavior can cause hard/soft CPU lockups, IO
      timeouts, system sluggishness, etc.
      
      Example - one CPU (e.g. CPU A) is busy submitting IOs and another CPU
      (e.g. CPU B) is busy processing the corresponding reply descriptors from
      the reply descriptor queue upon receiving interrupts from the HBA.  If
      CPU A is continuously pumping IOs, then CPU B (which is executing the
      ISR) will always see valid reply descriptors in the reply descriptor
      queue and will keep processing them in a loop without leaving the ISR
      handler.
      
      megaraid_sas driver will exit ISR handler if it finds unused reply
      descriptor in the reply descriptor queue.  Since CPU A will be continuously
      sending the IOs, CPU B may always see a valid reply descriptor (posted by
      HBA Firmware after processing the IO) in the reply descriptor queue. In
      worst case, driver will not quit from this loop in the ISR handler.
      Eventually, CPU lockup will be detected by watchdog.
      
      The above behavior is not common if "rq_affinity" is set to 2 or
      affinity_hint is honored by irqbalance as "exact".  If rq_affinity is set
      to 2, submitter will always be interrupted via completion on same CPU.  If
      irqbalance is using the "exact" policy, interrupt will be delivered to
      submitter CPU.
      
      Problem statement:
      
      If the ratio of CPU count to MSI-X vector (reply descriptor queue) count
      is not 1:1, we are still exposed to the issue explained above, and for
      that we don't have any solution.

      Exposure to soft/hard lockups is seen if the CPU count is more than the
      number of MSI-X vectors supported by the device.

      If the ratio of CPU count to MSI-X vector count is not 1:1 (in other
      words, if it is something like X:1, where X > 1), then the 'exact'
      irqbalance policy or rq_affinity = 2 won't help to avoid CPU hard/soft
      lockups.  There is no one-to-one mapping between CPUs and MSI-X vectors;
      instead, one MSI-X interrupt (or reply descriptor queue) is shared by a
      group of CPUs, so a loop can form in the IO path within that CPU group
      and lockups may be observed.
      
      For example, consider a system with two NUMA nodes, each with four
      logical CPUs, and an HBA with two MSI-X vectors enabled.  The ratio of
      CPU count to MSI-X vector count is then 4:1, e.g.
      MSI-X vector 0 has affinity to CPU 0, CPU 1, CPU 2 & CPU 3 of NUMA node 0 and
      MSI-X vector 1 has affinity to CPU 4, CPU 5, CPU 6 & CPU 7 of NUMA node 1.
      
      numactl --hardware
      available: 2 nodes (0-1)
      node 0 cpus: 0 1 2 3                 --> MSI-X 0
      node 0 size: 65536 MB
      node 0 free: 63176 MB
      node 1 cpus: 4 5 6 7                 --> MSI-X 1
      node 1 size: 65536 MB
      node 1 free: 63176 MB
      
      Assume that the user starts an application which uses all the CPUs of
      NUMA node 0 for issuing IOs.  Only one CPU from the affinity list, say
      CPU 0 (it can be any CPU since this depends on irqbalance), will receive
      the interrupts from MSI-X 0 for all the IOs.  Over time, the IO
      submission percentage on CPU 0 decreases and its ISR processing
      percentage increases as it gets busier processing interrupts.
      Gradually the IO submission percentage on CPU 0 drops to zero and its
      ISR processing percentage reaches 100%, because an IO loop has formed
      within NUMA node 0: CPU 1, CPU 2 & CPU 3 are continuously busy
      submitting heavy IOs and only CPU 0 is busy in the ISR path, as it
      always finds a valid reply descriptor in the reply descriptor queue.
      Eventually, we will observe a hard lockup here.
      
      The chance of hard/soft lockups is directly proportional to the value of
      X: the higher X is, the more likely CPU lockups are.
      
      Solution:
      
      Use IRQ poll interface defined in "irq_poll.c".
      
      megaraid_sas driver will execute the ISR routine in softirq context and
      will always quit the loop based on the budget provided by the IRQ poll
      interface.  Driver will switch to IRQ poll only when more than a
      threshold number of reply descriptors are handled in one ISR.  Currently
      the threshold is set to 1/4th of the HBA queue depth.
      
      In these scenarios (i.e. where the ratio of CPU count to MSI-X vector
      count is X:1, with X > 1), the IRQ poll interface avoids CPU hard lockups
      through a voluntary exit from reply queue processing based on budget.
      Note - Only one MSI-X vector is busy doing processing.
      
      Select CONFIG_IRQ_POLL from driver Kconfig for driver compilation.
      Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
      Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
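
      The budget-based exit described in the commit above can be modelled in
      plain userspace C. This is a simplified sketch under stated
      assumptions, not the driver's implementation: the struct, the function
      names, and the flag standing in for the kernel's irq_poll scheduling are
      all hypothetical. It shows the two properties the commit relies on: the
      interrupt path hands over to polling once it has handled more than 1/4th
      of the queue depth, and each poll pass stops at its budget even if the
      submitter keeps the queue full.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one reply descriptor queue; not driver code. */
struct reply_queue_sketch {
    unsigned int pending;    /* valid reply descriptors posted by firmware */
    unsigned int threshold;  /* per-ISR limit before switching to polling */
    bool polling;            /* true while the queue is serviced by the poller */
};

static void rq_init(struct reply_queue_sketch *q, unsigned int queue_depth)
{
    q->pending = 0;
    q->threshold = queue_depth / 4;  /* 1/4th of the HBA queue depth */
    q->polling = false;
}

/* Interrupt path: drain descriptors, but hand over once past the threshold. */
static unsigned int rq_isr(struct reply_queue_sketch *q)
{
    unsigned int done = 0;

    while (q->pending > 0) {
        q->pending--;
        done++;
        if (done > q->threshold) {
            /* A real driver would schedule irq_poll here (irq_poll_sched()). */
            q->polling = true;
            break;
        }
    }
    return done;
}

/* Poll path: never handle more than `budget` descriptors in one pass. */
static unsigned int rq_poll(struct reply_queue_sketch *q, unsigned int budget)
{
    unsigned int done = 0;

    while (done < budget && q->pending > 0) {
        q->pending--;
        done++;
    }
    if (done < budget) {
        /* Queue drained: a real driver would call irq_poll_complete(). */
        q->polling = false;
    }
    return done;
}
```

      Because rq_poll() returns after at most `budget` descriptors, the CPU
      servicing the queue gets a chance to run other work between passes,
      which is the voluntary exit the commit message refers to.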
    • scsi: megaraid_sas: Block PCI config space access from userspace during OCR · 78409d4b
      Shivasharan S authored
      While an online controller reset (OCR) is in progress, there is a short
      duration where all access to controller's PCI config space from the host
      needs to be blocked.  This is due to a hardware limitation of MegaRAID
      controllers.

      With this patch, driver will block all access to controller's config space
      from userland applications by calling pci_cfg_access_lock() while OCR is in
      progress and unlocking after controller comes back to ready state.

      Add a helper function which locks the config space before initiating OCR
      and waits for the controller to become READY.
      Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>