1. 20 Dec, 2018 2 commits
    • scsi: mpt3sas: fix memory ordering on 64bit writes · 23c3828a
      Stephan Günther authored
      With commit 09c2f95a ("scsi: mpt3sas: Swap I/O memory read value back
      to cpu endianness"), 64bit writes in _base_writeq() were rewritten to use
      __raw_writeq() instead of writeq().
      
      This introduced a bug apparent on powerpc64 systems such as the Raptor
      Talos II that causes the HBA to drop from the PCIe bus under heavy load and
      be reinitialized after a couple of seconds.
      
      It can easily be triggered on affected systems by using something like
      
        fio --name=random-write --iodepth=4 --rw=randwrite --bs=4k --direct=0 \
          --size=128M --numjobs=64 --end_fsync=1
        fio --name=random-write --iodepth=4 --rw=randwrite --bs=64k --direct=0 \
          --size=128M --numjobs=64 --end_fsync=1
      
      a couple of times. In my case I tested it on both a ZFS raidz2 and a btrfs
      raid6 using LSI 9300-8i and 9400-8i controllers.
      
      The fix reproduces the write ordering of writeq() by adding a
      mandatory write memory barrier before device access and a compiler barrier
      afterwards. The additional MMIO barrier is superfluous.
      Signed-off-by: Stephan Günther <moepi@moepi.net>
      Reported-by: Matt Corallo <linux@bluematt.me>
      Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: qla2xxx: deadlock by configfs_depend_item · 17b18eaa
      Anatoliy Glagolev authored
      The intent of invoking configfs_depend_item in commit 7474f52a
      ("tcm_qla2xxx: Perform configfs depend/undepend for base_tpg")
      was to prevent a physical Fibre Channel port removal when
      virtual (NPIV) ports announced through that physical port are active.
      The change does not work as expected: it makes an enabled physical port
      dependent on the target configfs subsystem (the port's parent), something
      that configfs guarantees anyway.
      
      Besides, scheduling work in a worker thread and waiting for the work's
      completion is not really a valid workaround for the requirement not to call
      configfs_depend_item from a configfs callback: the call occasionally
      deadlocks.
      
      Thus, removing configfs_depend_item calls does not break anything and fixes
      the deadlock problem.
      Signed-off-by: Anatoliy Glagolev <glagolig@gmail.com>
      Acked-by: Himanshu Madhani <hmadhani@marvell.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
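The deadlock pattern described in the qla2xxx commit can be illustrated outside the kernel. The following is a minimal userspace sketch, not the driver code: `subsystem_lock`, `depend_item` and `callback_via_worker` are hypothetical stand-ins, and it assumes the callback runs while a lock is held that the dependency call also needs. A timeout replaces the indefinite block so the example terminates.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a lock the framework holds while it runs a callback.
subsystem_lock = threading.Lock()


def depend_item(timeout):
    """Stand-in for a call that must take the subsystem lock itself.

    Returns True if the lock could be taken, False if it timed out.
    Without the timeout this would block forever while the lock is held.
    """
    if not subsystem_lock.acquire(timeout=timeout):
        return False
    subsystem_lock.release()
    return True


def callback_via_worker(pool, timeout=0.2):
    """Callback that defers the call to a worker and waits for it.

    Waiting for the worker keeps the lock held, so moving the call to
    another thread does not remove the lock dependency.
    """
    with subsystem_lock:
        future = pool.submit(depend_item, timeout)
        return future.result()


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        print("call from callback succeeded:", callback_via_worker(pool))
    print("call outside callback succeeded:", depend_item(0.2))
```

In this model the worker can never obtain the lock while the callback waits for it, which is why deferring the call to a worker thread is not a real workaround.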
  2. 19 Dec, 2018 14 commits
  3. 13 Dec, 2018 16 commits
  4. 08 Dec, 2018 8 commits
    • scsi: lpfc: update driver version to 12.0.0.9 · de55b786
      James Smart authored
      Update the driver version to 12.0.0.9
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Fix dif and first burst use in write commands · 7c4042a4
      James Smart authored
      When dif and first burst are used in a write command wqe, the driver was not
      properly setting fields in the io command request. This resulted in no dif
      bytes being sent and invalid xfer_rdy's, resulting in the io being aborted
      by the hardware.
      
      Correct the wqe initialization when both dif and first burst are used.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Fix driver release of fw-logging buffers · 1165a5c2
      James Smart authored
      On driver termination, after the driver stops fw logging by writing a
      register on the chip, the driver immediately unmaps and frees the logging
      buffer, without confirming in any way that the chip has received the write
      and terminated the logging. As termination on the chip is not immediate,
      the chip may issue a DMA request to the now unmapped DMA buffer, resulting
      in an IOMMU fault.
      
      Change the driver to receive a confirmation that logging has been
      terminated. The driver always issues an SLI reset to the device as part
      of shutdown and receives confirmation when that reset is complete. The
      driver was therefore modified to perform the write that disables fw
      logging prior to the SLI reset and to free the fw log buffer only after
      the SLI reset is complete. That guarantees use of the fw log buffer is
      fully terminated when it is unmapped.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
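The ordering change in the fw-logging commit can be sketched as a small simulation. This is only a model under stated assumptions, not lpfc code: `FakeAdapter` and its methods are hypothetical, the disable write is assumed to take effect on the chip only once the reset has been confirmed, and the old shutdown path is assumed to free the buffer before the reset.

```python
class FakeAdapter:
    """Hypothetical model of an adapter that logs into a host DMA buffer."""

    def __init__(self):
        self.chip_still_logging = True  # what the chip is actually doing
        self.buffer_mapped = True
        self.faults = 0

    def write_disable_logging(self):
        # Register write only: the chip has not necessarily acted on it yet,
        # so chip_still_logging is deliberately left unchanged here.
        pass

    def sli_reset(self):
        # Modelled as returning only after the chip confirms completion,
        # at which point logging DMA has stopped.
        self.chip_still_logging = False

    def free_log_buffer(self):
        self.buffer_mapped = False
        if self.chip_still_logging:
            # The chip may still DMA into the unmapped buffer.
            self.faults += 1


def shutdown_old(adapter):
    """Assumed old ordering: free right after the disable write."""
    adapter.write_disable_logging()
    adapter.free_log_buffer()
    adapter.sli_reset()


def shutdown_fixed(adapter):
    """Ordering described in the commit: free only after the reset."""
    adapter.write_disable_logging()
    adapter.sli_reset()
    adapter.free_log_buffer()


if __name__ == "__main__":
    old, fixed = FakeAdapter(), FakeAdapter()
    shutdown_old(old)
    shutdown_fixed(fixed)
    print("faults with old ordering:", old.faults)
    print("faults with fixed ordering:", fixed.faults)
```

The point of the model is that the confirmation comes from the reset, not from the register write, so the buffer release has to follow the reset.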
    • scsi: lpfc: Correct topology type reporting on G7 adapters · 76558b25
      James Smart authored
      Driver missed classifying the chip type for G7 when reporting supported
      topologies. This resulted in loop being shown as supported on FC links that
      are not supported per the standard.
      
      Add the chip classifications to the topology checks in the driver.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Correct code setting non existent bits in sli4 ABORT WQE · 1c36833d
      James Smart authored
      Driver is setting bits in word 10 of the SLI4 ABORT WQE (the wqid).  The
      field was a carry over from a prior SLI revision. The field does not exist
      in SLI4, and the action may result in an overlap with future definition of
      the WQE.
      
      Remove the setting of WQID in the ABORT WQE.
      
      Also cleaned up WQE field settings - initialize to zero, don't bother to
      set fields to zero.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Defer LS_ACC to FLOGI on point to point logins · 0a9e9687
      James Smart authored
      In the current discovery state machine, the driver treated FLOGI oddly.
      When point to point, an FLOGI is to be exchanged by the two ports, with
      the port with the most significant WWN then proceeding with PLOGI. The
      implementation in the driver was keyed too closely to "what have I sent",
      not to what has happened between the two endpoints. Thus, it blatantly
      would ACC an FLOGI, but reject PLOGIs until it had its own FLOGI ACC'd.
      The problem is that the sending of FLOGI may be delayed for some reason,
      or the response to FLOGI held off by the other side. In the failing
      situation the other side sent an FLOGI, which was ACC'd, then sent PLOGIs,
      which were rejected until the retry count for the PLOGIs was exceeded and
      the port gave up. The FLOGI may have been very late in transmit, or the
      response held off until the PLOGIs failed. Given the other port had the
      higher WWN, no PLOGIs would occur and communication stopped.
      
      Correct the situation by changing the FLOGI handling. Defer any response to
      an FLOGI until the driver has sent its FLOGI as well. Then, upon either
      completion of the sent FLOGI or upon sending an ACC to a received FLOGI
      (which may be received before or just after FLOGI was sent), the driver
      will act on who has the higher WWN. If the other port does, the driver will
      noop any handling of an FLOGI response (if outstanding) and wait for PLOGI.
      If the local port does, the driver will transition to sending PLOGI and
      will noop any action on responding to an FLOGI (if not yet received).
      
      Fortunately, to implement this, it only took another state flag and
      deferring any FLOGI response if the FLOGI has yet to be transmitted. All
      subsequent actions were already in place.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
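The deferred FLOGI handling described in this commit reduces to a small decision rule. The sketch below is a simplified illustration, not the lpfc state machine: `Action`, `on_flogi_event` and the WWN values are hypothetical, and WWNs are treated as plain integers compared numerically.

```python
from enum import Enum


class Action(Enum):
    DEFER = "defer the response until our own FLOGI has been sent"
    SEND_PLOGI = "local port has the higher WWN: proceed with PLOGI"
    WAIT_FOR_PLOGI = "peer has the higher WWN: wait for its PLOGI"


def on_flogi_event(local_wwn, peer_wwn, our_flogi_sent):
    """Decide what to do once an FLOGI has been received or completed.

    our_flogi_sent plays the role of the extra state flag mentioned in
    the commit message. Two ports never share a WWN, so equality is not
    treated as a separate case.
    """
    if not our_flogi_sent:
        return Action.DEFER
    if local_wwn > peer_wwn:
        return Action.SEND_PLOGI
    return Action.WAIT_FOR_PLOGI


if __name__ == "__main__":
    low, high = 0x10000090FA000001, 0x10000090FA000002  # made-up WWNs
    print(on_flogi_event(low, high, our_flogi_sent=False).name)
    print(on_flogi_event(low, high, our_flogi_sent=True).name)
    print(on_flogi_event(high, low, our_flogi_sent=True).name)
```

The decision depends on what both ports have exchanged rather than only on what the local port has sent, which is the behavioral change the commit describes.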
    • scsi: lpfc: ls_rjt erroneus FLOGIs · 287aba25
      James Smart authored
      In some link initialization sequences, the fw generates an erroneous FLOGI
      payload to the driver without an intervening link bounce. The driver, when
      it sees a 2nd FLOGI without an intervening link bounce, automatically
      performs a link bounce. In this case, the link bounce causes the situation
      to repeat, resulting in a nasty loop of link bounces.
      
      Resolve the issue by validating the FLOGI payload. The erroneous FLOGI will
      contain VVL signatures that are not normal. When the driver sees these, it
      will simply reject the FLOGI rather than bouncing the link. The reject is
      consumed within the firmware.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: rport port swap discovery issue. · 92ea83a8
      James Smart authored
      Two initiator ports were cable swapped and after the swap both went down.
      The driver internally swaps the nlp nodes when the node WWNs match but the
      nport id differs from before. After detecting a change in the node's RPI,
      the driver sends an UNREG_RPI command and clears the NLP_RPI_REGISTERED
      flag, then swaps the node information with the other node. The other
      node's NLP_RPI_REGISTERED flag is also cleared, but without an UNREG_RPI
      being sent, which causes the later REG_RPI for that other node to fail as
      the hardware believes it's still registered.
      
      Additionally, if the node swap occurred while the two nodes had PLOGIs in
      flight, the fc4_types weren't properly swapped, so when the PLOGIs
      completed and PRLIs were then sent, the PRLIs were for the wrong protocol.
      NVME devices saw SCSI FCP PRLIs and vice versa.
      
      Clean up the node swap so that the NLP_RPI_REGISTERED flag is handled
      properly.
      
      Fix the handling of the fc4_types when the nodes are swapped as well.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
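The two invariants behind the node swap fix can be shown with a toy model. This is a sketch under assumptions, not the lpfc implementation: `Node`, `FakeHardware` and `swap_nodes` are hypothetical, and the "node information" that is swapped is assumed to be the port name together with its fc4_types. The registration flag is cleared only together with an unregister call, and the protocol types travel with the identity they describe.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    wwpn: str
    rpi: int
    rpi_registered: bool = True
    fc4_types: set = field(default_factory=set)


class FakeHardware:
    """Hypothetical model: registering an already registered RPI fails."""

    def __init__(self, registered_rpis):
        self.registered = set(registered_rpis)

    def unreg_rpi(self, rpi):
        self.registered.discard(rpi)

    def reg_rpi(self, rpi):
        if rpi in self.registered:
            return False
        self.registered.add(rpi)
        return True


def swap_nodes(hw, a, b):
    """Swap identity between two nodes while keeping hw state consistent."""
    for node in (a, b):
        if node.rpi_registered:
            # Clear the flag only after telling the hardware.
            hw.unreg_rpi(node.rpi)
            node.rpi_registered = False
    # Protocol types move together with the port name they belong to.
    a.wwpn, b.wwpn = b.wwpn, a.wwpn
    a.fc4_types, b.fc4_types = b.fc4_types, a.fc4_types


if __name__ == "__main__":
    hw = FakeHardware({1, 2})
    a = Node("wwpn-a", 1, True, {"FCP"})
    b = Node("wwpn-b", 2, True, {"NVME"})
    swap_nodes(hw, a, b)
    print(a, hw.reg_rpi(a.rpi))
    print(b, hw.reg_rpi(b.rpi))
```

If the flag on the second node were cleared without the `unreg_rpi` call, the later `reg_rpi` for that node would return False in this model, mirroring the failure described in the commit message.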