1. 12 Jul, 2019 12 commits
    • Benjamin Block's avatar
      scsi: zfcp: fix request object use-after-free in send path causing wrong traces · 106d45f3
      Benjamin Block authored
      When tracing instances where we open and close WKA ports, we also pass the
      request-ID of the respective FSF command.
      
      But after successfully sending the FSF command we must not use the
      request-object anymore, as this might result in an use-after-free (see
      "zfcp: fix request object use-after-free in send path causing seqno
      errors" ).
      
      To fix this add a new variable that caches the request-ID before sending
      the request. This won't change during the hand-off to the FCP channel,
      and so it's safe to trace this cached request-ID later, instead of using
      the request object.
      Signed-off-by: default avatarBenjamin Block <bblock@linux.ibm.com>
      Fixes: d27a7cb9 ("zfcp: trace on request for open and close of WKA port")
      Cc: <stable@vger.kernel.org> #2.6.38+
      Reviewed-by: default avatarSteffen Maier <maier@linux.ibm.com>
      Reviewed-by: default avatarJens Remus <jremus@linux.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      106d45f3
    • Benjamin Block's avatar
      scsi: zfcp: fix request object use-after-free in send path causing seqno errors · b76becde
      Benjamin Block authored
      With a recent change to our send path for FSF commands we introduced a
      possible use-after-free of request-objects, that might further lead to
      zfcp crafting bad requests, which the FCP channel correctly complains
      about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled
      by an adapter-wide recovery.
      
      The following sequence illustrates the possible use-after-free:
      
          Send Path:
      
              int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action)
              {
                      struct zfcp_fsf_req *req;
                      ...
                      spin_lock_irq(&qdio->req_q_lock);
              //                     ^^^^^^^^^^^^^^^^
              //                     protects QDIO queue during sending
                      ...
                      req = zfcp_fsf_req_create(qdio,
                                                FSF_QTCB_OPEN_PORT_WITH_DID,
                                                SBAL_SFLAGS0_TYPE_READ,
                                                qdio->adapter->pool.erp_req);
              //            ^^^^^^^^^^^^^^^^^^^
              //            allocation of the request-object
                      ...
                      retval = zfcp_fsf_req_send(req);
                      ...
                      spin_unlock_irq(&qdio->req_q_lock);
                      return retval;
              }
      
              static int zfcp_fsf_req_send(struct zfcp_fsf_req *req)
              {
                      struct zfcp_adapter *adapter = req->adapter;
                      struct zfcp_qdio *qdio = adapter->qdio;
                      ...
                      zfcp_reqlist_add(adapter->req_list, req);
              //      ^^^^^^^^^^^^^^^^
              //      add request to our driver-internal hash-table for tracking
              //      (protected by separate lock req_list->lock)
                      ...
                      if (zfcp_qdio_send(qdio, &req->qdio_req)) {
              //          ^^^^^^^^^^^^^^
              //          hand-off the request to FCP channel;
              //          the request can complete at any point now
                              ...
                      }
      
                      /* Don't increase for unsolicited status */
                      if (!zfcp_fsf_req_is_status_read_buffer(req))
              //           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
              //           possible use-after-free
                              adapter->fsf_req_seq_no++;
              //                       ^^^^^^^^^^^^^^^^
              //                       because of the use-after-free we might
              //                       miss this accounting, and as follow-up
              //                       this results in the FCP channel error
              //                       FSF_PROT_SEQ_NUMB_ERROR
                      adapter->req_no++;
      
                      return 0;
              }
      
              static inline bool
              zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req)
              {
                      return req->qtcb == NULL;
              //             ^^^^^^^^^
              //             possible use-after-free
              }
      
          Response Path:
      
              void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx)
              {
                      ...
                      struct zfcp_fsf_req *fsf_req;
                      ...
                      for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) {
                              ...
                              fsf_req = zfcp_reqlist_find_rm(adapter->req_list,
                                                             req_id);
              //                        ^^^^^^^^^^^^^^^^^^^^
              //                        remove request from our driver-internal
              //                        hash-table (lock req_list->lock)
                              ...
                              zfcp_fsf_req_complete(fsf_req);
                      }
              }
      
              static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req)
              {
                      ...
                      if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP))
                              zfcp_fsf_req_free(req);
              //              ^^^^^^^^^^^^^^^^^
              //              free memory for request-object
                      else
                              complete(&req->completion);
              //              ^^^^^^^^
              //              completion notification for code-paths that wait
              //              synchronous for the completion of the request; in
              //              those the memory is freed separately
              }
      
      The result of the use-after-free only affects the send path, and can not
      lead to any data corruption. In case we miss the sequence-number
      accounting, because the memory was already re-purposed, the next FSF
      command will fail with said FCP channel error, and we will recover the
      whole adapter. This causes no additional errors, but it slows down
      traffic.  There is a slight chance of the same thing happen again
      recursively after the adapter recovery, but so far this has not been seen.
      
      This was seen under z/VM, where the send path might run on a virtual CPU
      that gets scheduled away by z/VM, while the return path might still run,
      and so create the necessary timing. Running with KASAN can also slow down
      the kernel sufficiently to run into this user-after-free, and then see the
      report by KASAN.
      
      To fix this, simply pull the test for the sequence-number accounting in
      front of the hand-off to the FCP channel (this information doesn't change
      during hand-off), but leave the sequence-number accounting itself where it
      is.
      
      To make future regressions of the same kind less likely, add comments to
      all closely related code-paths.
      Signed-off-by: default avatarBenjamin Block <bblock@linux.ibm.com>
      Fixes: f9eca022 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header")
      Cc: <stable@vger.kernel.org> #5.0+
      Reviewed-by: default avatarSteffen Maier <maier@linux.ibm.com>
      Reviewed-by: default avatarJens Remus <jremus@linux.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      b76becde
    • Shivasharan S's avatar
    • Shivasharan S's avatar
      scsi: megaraid_sas: Add module parameter for FW Async event logging · d956a116
      Shivasharan S authored
      Add module parameter to control logging levels of async event
      notifications from firmware that get logged to system log.  Also, allow
      changing the value from sysfs after driver load.
      Signed-off-by: default avatarShivasharan S <shivasharan.srikanteshwara@broadcom.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      d956a116
    • Shivasharan S's avatar
      scsi: megaraid_sas: Enable msix_load_balance for Invader and later controllers · 1175b884
      Shivasharan S authored
      Load balancing IO completions across all available MSI-X vectors should be
      enabled for Invader and later generation controllers only.  This needs to
      be disabled for older controllers.  Add an adapter type check before
      setting msix_load_balance flag.
      
      Fixes: 1d15d909 ("scsi: megaraid_sas: Load balance completions across all MSI-X")
      Signed-off-by: default avatarShivasharan S <shivasharan.srikanteshwara@broadcom.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      1175b884
    • Shivasharan S's avatar
      scsi: megaraid_sas: Fix calculation of target ID · c8f96df5
      Shivasharan S authored
      In megasas_get_target_prop(), driver is incorrectly calculating the target
      ID for devices with channel 1 and 3.  Due to this, firmware will either
      fail the command (if there is no device with the target id sent from
      driver) or could return the properties for a target which was not
      intended.  Devices could end up with the wrong queue depth due to this.
      
      Fix target id calculation for channel 1 and 3.
      
      Fixes: 96188a89 ("scsi: megaraid_sas: NVME interface target prop added")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarShivasharan S <shivasharan.srikanteshwara@broadcom.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      c8f96df5
    • Arnd Bergmann's avatar
      scsi: lpfc: reduce stack size with CONFIG_GCC_PLUGIN_STRUCTLEAK_VERBOSE · 057959c6
      Arnd Bergmann authored
      The lpfc_debug_dump_all_queues() function repeatedly calls into
      lpfc_debug_dump_qe() which has a temporary 128 byte buffer.  This was fine
      before the introduction of CONFIG_GCC_PLUGIN_STRUCTLEAK_VERBOSE because
      each instance could occupy the same stack slot. However, now they each get
      their own copy, which leads to a huge increase in stack usage as seen from
      the compiler warning:
      
      drivers/scsi/lpfc/lpfc_debugfs.c: In function 'lpfc_debug_dump_all_queues':
      drivers/scsi/lpfc/lpfc_debugfs.c:6474:1: error: the frame size of 1712 bytes is larger than 100 bytes [-Werror=frame-larger-than=]
      
      Avoid this by not marking lpfc_debug_dump_qe() as inline so the compiler
      can choose to emit a static version of this function when it's needed or
      otherwise silently drop it. As an added benefit, not inlining multiple
      copies of this function means we save several kilobytes of .text section,
      reducing the file size from 47kb to 43.
      
      It is somewhat unusual to have a function that is static but not inline in
      a header file, but this does not cause problems here because it is only
      used by other inline functions. It would however seem reasonable to move
      all the lpfc_debug_dump_* functions into lpfc_debugfs.c and not mark them
      inline as a later cleanup.
      
      Fixes: 81a56f6d ("gcc-plugins: structleak: Generalize to all variable types")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarJames Smart <james.smart@broadcom.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      057959c6
    • Marcos Paulo de Souza's avatar
      scsi: devinfo: BLIST_TRY_VPD_PAGES for SanDisk Cruzer Blade · 4bc02214
      Marcos Paulo de Souza authored
      Currently, all USB devices skip VPD pages, even when the device supports
      them (SPC-3 and later), but some of them support VPD, like Cruzer Blade.
      Signed-off-by: default avatarMarcos Paulo de Souza <marcos.souza.org@gmail.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      4bc02214
    • Deepak Ukey's avatar
      scsi: pm80xx: Fixed kernel panic during error recovery for SATA drive · 196ba662
      Deepak Ukey authored
      Disabling the SATA drive interface cause kernel panic. When the drive
      Interface is disabled, device should be deregistered after aborting all
      pending I/Os. Also changed the port recovery timeout to 10000 ms for
      PM8006 controller.
      Signed-off-by: default avatarDeepak Ukey <deepak.ukey@microchip.com>
      Signed-off-by: default avatarViswas G <Viswas.G@microchip.com>
      Reviewed-by: default avatarJack Wang <jinpu.wang@cloud.ionos.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      196ba662
    • Denis Efremov's avatar
      scsi: libsas: remove the exporting of sas_wait_eh · 9b79ee97
      Denis Efremov authored
      The function sas_wait_eh is declared static and marked EXPORT_SYMBOL, which
      is at best an odd combination. Because the function is not used outside of
      the drivers/scsi/libsas/sas_scsi_host.c file it is defined in, this commit
      removes the EXPORT_SYMBOL() marking.
      Signed-off-by: default avatarDenis Efremov <efremov@linux.com>
      Reviewed-by: default avatarJason Yan <yanaijie@huawei.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      9b79ee97
    • YueHaibing's avatar
      scsi: megaraid_sas: Make some symbols static · 6764f519
      YueHaibing authored
      Fix sparse warnings:
      
      drivers/scsi/megaraid/megaraid_sas_base.c:271:1: warning: symbol 'megasas_issue_dcmd' was not declared. Should it be static?
      drivers/scsi/megaraid/megaraid_sas_base.c:2227:6: warning: symbol 'megasas_do_ocr' was not declared. Should it be static?
      drivers/scsi/megaraid/megaraid_sas_base.c:3194:25: warning: symbol 'megaraid_host_attrs' was not declared. Should it be static?
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avatarYueHaibing <yuehaibing@huawei.com>
      Acked-by: default avatarSumit Saxena <sumit.saxena@broadcom.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      6764f519
    • Maurizio Lombardi's avatar
      scsi: core: use scmd_printk() to print which command timed out · 463cdad8
      Maurizio Lombardi authored
      With a possibly faulty disk the following messages may appear in the logs:
      
      kernel: sd 0:0:9:0: timing out command, waited 180s
      kernel: sd 0:0:9:0: timing out command, waited 20s
      kernel: sd 0:0:9:0: timing out command, waited 20s
      kernel: sd 0:0:9:0: timing out command, waited 60s
      kernel: sd 0:0:9:0: timing out command, waited 20s
      
      This is not very informative because it's not possible to identify the
      command that timed out.
      
      This patch replaces sdev_printk() with scmd_printk().
      Signed-off-by: default avatarMaurizio Lombardi <mlombard@redhat.com>
      Reviewed-by: default avatarEwan D. Milne <emilne@redhat.com>
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      463cdad8
  2. 27 Jun, 2019 28 commits