  1. 12 Apr, 2017 3 commits
    • James Bottomley's avatar
    • Mauricio Faria de Oliveira's avatar
      scsi: ipr: do not set DID_PASSTHROUGH on CHECK CONDITION · 785a4704
      Mauricio Faria de Oliveira authored
      On a dual controller setup with multipath enabled, some MEDIUM ERRORs
      caused both paths to be failed, so I/O was queued (blocked) because the
      'queue_if_no_path' feature is enabled by default on IPR controllers.
      
      This example disables 'queue_if_no_path' so that the I/O failure is
      visible in the sg_dd program.  Notice that after the sg_dd test-case,
      both paths are in 'failed' state, and both path/priority groups are in
      'enabled' state (not 'active') -- which would block I/O with
      'queue_if_no_path'.
      
          # sg_dd if=/dev/dm-2 bs=4096 count=1 dio=1 verbose=4 blk_sgio=0
          <...>
          read(unix): count=4096, res=-1
          sg_dd: reading, skip=0 : Input/output error
          <...>
      
          # dmesg
          [...] sd 2:2:16:0: [sds] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
          [...] sd 2:2:16:0: [sds] Sense Key : Medium Error [current]
          [...] sd 2:2:16:0: [sds] Add. Sense: Unrecovered read error - recommend rewrite the data
          [...] sd 2:2:16:0: [sds] CDB: Read(10) 28 00 00 00 00 00 00 00 20 00
          [...] blk_update_request: I/O error, dev sds, sector 0
          [...] device-mapper: multipath: Failing path 65:32.
          <...>
          [...] device-mapper: multipath: Failing path 65:224.
      
          # multipath -l
          1IBM_IPR-0_59C2AE0000001F80 dm-2 IBM     ,IPR-0   59C2AE00
          size=5.2T features='0' hwhandler='1 alua' wp=rw
          |-+- policy='service-time 0' prio=0 status=enabled
          | `- 2:2:16:0 sds  65:32  failed undef running
          `-+- policy='service-time 0' prio=0 status=enabled
            `- 1:2:7:0  sdae 65:224 failed undef running
      
      This is not the desired behavior. dm-multipath explicitly checks for
      the MEDIUM ERROR case (and a few others) so as not to fail the path
      (e.g., I/O to other sectors could potentially happen without problems).
      See dm-mpath.c :: do_end_io_bio() -> noretry_error() !->! fail_path().
      
      The problem trace is:
      
      1) ipr_scsi_done()  // SENSE KEY/CHECK CONDITION detected, go to..
      2) ipr_erp_start()  // ipr_is_gscsi() and masked_ioasc OK, go to..
      3) ipr_gen_sense()  // masked_ioasc is IPR_IOASC_MED_DO_NOT_REALLOC,
                          // so set DID_PASSTHROUGH.
      
      4) scsi_decide_disposition()  // check for DID_PASSTHROUGH and return
                                    // early on, faking a DID_OK.. *instead*
                                    // of reaching scsi_check_sense().
      
                                    // Had it reached the latter, that would
                                    // set host_byte to DID_MEDIUM_ERROR.
      
      5) scsi_finish_command()
      6) scsi_io_completion()
      7) __scsi_error_from_host_byte()  // That would be converted to -ENODATA
      <...>
      8) dm_softirq_done()
      9) multipath_end_io()
      10) do_end_io()
      11) noretry_error()  // And that is checked in dm-mpath :: noretry_error()
                           // which would cause fail_path() not to be called.
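
      The trace above can be modeled in user space. The following is a minimal
      sketch, not the kernel code: the enum, the struct and every function name
      are hypothetical stand-ins, and the no-retry list is reduced to the single
      -ENODATA case (the text notes that dm-multipath checks a few others too).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's host byte codes. */
enum host_byte { HB_OK, HB_PASSTHROUGH, HB_MEDIUM_ERROR };

struct cmd_result {
    enum host_byte host;     /* host byte set by the low-level driver */
    bool check_condition;    /* device returned CHECK CONDITION */
    bool medium_error;       /* sense key is MEDIUM ERROR */
};

/* Models step 4: the pass-through host byte returns before sense is examined. */
static enum host_byte decide_disposition(struct cmd_result r)
{
    if (r.host == HB_PASSTHROUGH)
        return HB_OK;                   /* early return, sense never checked */
    if (r.check_condition && r.medium_error)
        return HB_MEDIUM_ERROR;         /* what the sense check would set */
    return r.host;
}

/* Models step 7: only the medium-error host byte maps to -ENODATA. */
static int error_from_host_byte(enum host_byte hb, bool check_condition)
{
    if (hb == HB_MEDIUM_ERROR)
        return -ENODATA;
    return check_condition ? -EIO : 0;
}

/* Models step 11 (simplified): errors for which the path must not be failed. */
static bool noretry_error(int error)
{
    return error == -ENODATA;
}

/* True when the multipath layer would fail the path for this result. */
static bool path_gets_failed(struct cmd_result r)
{
    int err;

    /* A medium error is always reported through CHECK CONDITION. */
    assert(!r.medium_error || r.check_condition);
    err = error_from_host_byte(decide_disposition(r), r.check_condition);
    return err != 0 && !noretry_error(err);
}
```

      In this model, a medium error that arrives with the pass-through host
      byte degrades to a generic I/O error and fails the path, while the same
      sense data with a plain OK host byte yields -ENODATA and leaves the path
      alone.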
      
      With this patch applied, the I/O is failed but the paths are not.  The
      multipath device continues accepting more I/O requests without blocking.
      (Notice the different host byte/driver byte handling per SCSI layer.)
      
          # dmesg
          [...] sd 2:2:7:0: [sdaf] Done: SUCCESS Result: hostbyte=0x13 driverbyte=DRIVER_OK
          [...] sd 2:2:7:0: [sdaf] CDB: Read(10) 28 00 00 00 00 00 00 00 40 00
          [...] sd 2:2:7:0: [sdaf] Sense Key : Medium Error [current]
          [...] sd 2:2:7:0: [sdaf] Add. Sense: Unrecovered read error - recommend rewrite the data
          [...] blk_update_request: critical medium error, dev sdaf, sector 0
          [...] blk_update_request: critical medium error, dev dm-6, sector 0
          [...] sd 2:2:7:0: [sdaf] Done: SUCCESS Result: hostbyte=0x13 driverbyte=DRIVER_OK
          [...] sd 2:2:7:0: [sdaf] CDB: Read(10) 28 00 00 00 00 00 00 00 10 00
          [...] sd 2:2:7:0: [sdaf] Sense Key : Medium Error [current]
          [...] sd 2:2:7:0: [sdaf] Add. Sense: Unrecovered read error - recommend rewrite the data
          [...] blk_update_request: critical medium error, dev sdaf, sector 0
          [...] blk_update_request: critical medium error, dev dm-6, sector 0
          [...] Buffer I/O error on dev dm-6, logical block 0, async page read
      
          # multipath -l 1IBM_IPR-0_59C2AE0000001F80
          1IBM_IPR-0_59C2AE0000001F80 dm-6 IBM     ,IPR-0   59C2AE00
          size=5.2T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
          |-+- policy='service-time 0' prio=0 status=active
          | `- 2:2:7:0  sdaf 65:240 active undef running
          `-+- policy='service-time 0' prio=0 status=enabled
            `- 1:2:7:0  sdh  8:112  active undef running

      Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Acked-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • Guilherme G. Piccoli's avatar
      scsi: aacraid: fix PCI error recovery path · 911e572e
      Guilherme G. Piccoli authored
      During PCI error recovery, if aac_check_health() is not aware that a
      PCI error happened and the PCI channel is offline, it might trigger
      errors (such as a NULL pointer dereference) and prevent the error
      recovery process from completing.
      
      This patch makes the health check procedure aware of PCI channel issues.
      While error recovery is in progress, aac_adapter_check_health() returns
      -1 and lets the recovery process complete successfully. This patch was
      tested on upstream kernel v4.11-rc5 on the PowerPC ppc64le architecture
      with adapter 9005:028d (VID:DID); the error recovery procedure was able
      to recover.
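
      The guard can be pictured with a small user-space sketch. Everything in
      it is an assumption made for illustration: the struct, its fields and the
      function name are hypothetical, and the status pointer merely stands in
      for hardware that must not be touched while the channel is offline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical adapter state; the real driver structures differ. */
struct adapter {
    bool pci_channel_offline;    /* set while PCI error recovery is running */
    const unsigned int *status;  /* stand-in for a mapped status register */
};

/*
 * Returns -1 while the PCI channel is offline, otherwise the non-negative
 * status value, where 0 means healthy.
 */
static int adapter_check_health(const struct adapter *dev)
{
    assert(dev != NULL);
    if (dev->pci_channel_offline)
        return -1;               /* never touch the device during recovery */
    return (int)(*dev->status & 0x7fffffffu);
}
```

      Checking the channel state first means the status pointer is never
      dereferenced during recovery, which is the kind of access the commit
      message describes as leading to a NULL pointer dereference.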
      
      Fixes: 5c63f7f7 ("aacraid: Added EEH support")
      Cc: stable@vger.kernel.org # v4.6+
      Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
      Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  2. 07 Apr, 2017 5 commits
    • Martin K. Petersen's avatar
      scsi: sd: Fix capacity calculation with 32-bit sector_t · 7c856152
      Martin K. Petersen authored
      We previously made sure that the reported disk capacity was less than
      0xffffffff blocks when the kernel was not compiled with large sector_t
      support (CONFIG_LBDAF). However, this check assumed that the capacity
      was reported in units of 512 bytes.
      
      Add a sanity check function to ensure that we only enable disks if the
      entire reported capacity can be expressed in terms of sector_t.
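
      As a worked illustration of why the block size matters, here is a
      minimal sketch of such a check. It is not the function added by this
      commit: the name and the explicit sector_t_bytes parameter are
      hypothetical, and it assumes the kernel counts capacity in 512-byte
      units.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if a disk with `blocks` logical blocks of `block_size` bytes
 * can be addressed by a sector index that is `sector_t_bytes` wide and
 * counts 512-byte units.
 */
static bool capacity_addressable(uint64_t blocks, uint32_t block_size,
                                 unsigned int sector_t_bytes)
{
    uint64_t units_per_block;
    uint64_t sectors;

    /* Block size must be a power of two and at least 512 bytes. */
    assert(block_size >= 512 && (block_size & (block_size - 1)) == 0);
    assert(sector_t_bytes == 4 || sector_t_bytes == 8);

    units_per_block = block_size / 512;
    if (blocks > UINT64_MAX / units_per_block)
        return false;            /* would overflow even a 64-bit index */

    sectors = blocks * units_per_block;
    if (sector_t_bytes == 4)
        return sectors <= UINT32_MAX;
    return true;
}
```

      With 512-byte blocks, a capacity of 0xfffffffe blocks fits a 32-bit
      index. The same block count with 4096-byte blocks is eight times as many
      512-byte units and no longer fits, which a check on the raw block count
      alone would miss.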
      
      Cc: <stable@vger.kernel.org>
      Reported-by: Steve Magnani <steve.magnani@digidescorp.com>
      Cc: Bart Van Assche <Bart.VanAssche@sandisk.com>
      Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • Sawan Chandak's avatar
      scsi: qla2xxx: Add fix to read correct register value for ISP82xx. · bf6061b1
      Sawan Chandak authored
      Add a fix to read the correct register value for ISP82xx during the
      check for register disconnect. ISP82xx has a different base register.
      
      Fixes: a465537a ("qla2xxx: Disable the adapter and skip error recovery in case of register disconnect")
      Signed-off-by: Sawan Chandak <sawan.chandak@cavium.com>
      Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • Chad Dupuis's avatar
      scsi: qedf: Fix crash due to unsolicited FIP VLAN response. · 8eaf7dfc
      Chad Dupuis authored
      We need to initialize qedf->fipvlan_compl in __qedf_probe so that if we
      receive an unsolicited FIP VLAN response, the system doesn't crash due
      to trying to complete an uninitialized completion.
      
      Also add a check to see if there are any waiters on the completion so we
      don't inadvertently kick-start the discovery process due to the
      unsolicited frame.
      
      This fixes the following crash:
      
      <1>BUG: unable to handle kernel NULL pointer dereference at (null)
      <1>IP: [<ffffffff8105ed71>] __wake_up_common+0x31/0x90
      <4>PGD 0
      <4>Oops: 0000 [#1] SMP
      <4>last sysfs file: /sys/devices/system/cpu/online
      <4>CPU 7
      <4>Modules linked in: autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc target_core_iblock target_core_file target_core_pscsi target_core_mod configfs bnx2fc cnic fcoe 8021q garp stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vfat fat uinput ipmi_devintf microcode power_meter acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support dcdbas sg joydev sb_edac edac_core lpc_ich mfd_core shpchp tg3 ptp pps_core ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif qedi(U) iscsi_boot_sysfs libiscsi scsi_transport_iscsi uio qedf(U) libfcoe libfc scsi_transport_fc scsi_tgt qede(U) qed(U) ahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
      <4>
      <4>Pid: 1485, comm: qedf_11_ll2 Not tainted 2.6.32-642.el6.x86_64 #1 Dell Inc. PowerEdge R730/0599V5
      <4>RIP: 0010:[<ffffffff8105ed71>]  [<ffffffff8105ed71>] __wake_up_common+0x31/0x90
      <4>RSP: 0018:ffff881068a83d50  EFLAGS: 00010086
      <4>RAX: ffffffffffffffe8 RBX: ffff88106bf42de0 RCX: 0000000000000000
      <4>RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88106bf42de0
      <4>RBP: ffff881068a83d90 R08: 0000000000000000 R09: 00000000fffffffe
      <4>R10: 0000000000000000 R11: 000000000000000b R12: 0000000000000286
      <4>R13: ffff88106bf42de8 R14: 0000000000000000 R15: 0000000000000000
      <4>FS:  0000000000000000(0000) GS:ffff88089c460000(0000) knlGS:0000000000000000
      <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      <4>CR2: 0000000000000000 CR3: 0000000001a8d000 CR4: 00000000001407e0
      <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>Process qedf_11_ll2 (pid: 1485, threadinfo ffff881068a80000, task ffff881068a70040)
      <4>Stack:
      <4> ffff88106ef00090 0000000300000001 ffff881068a83d90 ffff88106bf42de0
      <4><d> 0000000000000286 ffff88106bf42dd8 ffff88106bf40a50 0000000000000002
      <4><d> ffff881068a83dc0 ffffffff810634c7 ffff881000000003 000000000000000b
      <4>Call Trace:
      <4> [<ffffffff810634c7>] complete+0x47/0x60
      <4> [<ffffffffa01d37e7>] qedf_fip_recv+0x1c7/0x450 [qedf]
      <4> [<ffffffffa01cb3cb>] qedf_ll2_recv_thread+0x33b/0x510 [qedf]
      <4> [<ffffffffa01cb090>] ? qedf_ll2_recv_thread+0x0/0x510 [qedf]
      <4> [<ffffffff810a662e>] kthread+0x9e/0xc0
      <4> [<ffffffff8100c28a>] child_rip+0xa/0x20
      <4> [<ffffffff810a6590>] ? kthread+0x0/0xc0
      <4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
      <4>Code: 41 56 41 55 41 54 53 48 83 ec 18 0f 1f 44 00 00 89 75 cc 89 55 c8 4c 8d 6f 08 48 8b 57 08 41 89 cf 4d 89 c6 48 8d 42 e8 49 39 d5 <48> 8b 58 18 74 3f 48 83 eb 18 eb 0a 0f 1f 00 48 89 d8 48 8d 5a
      <1>RIP  [<ffffffff8105ed71>] __wake_up_common+0x31/0x90
      <4> RSP <ffff881068a83d50>
      <4>CR2: 0000000000000000

      Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • Martin K. Petersen's avatar
      scsi: sr: Sanity check returned mode data · a00a7862
      Martin K. Petersen authored
      Kefeng Wang discovered that old versions of the QEMU CD driver would
      return mangled mode data causing us to walk off the end of the buffer in
      an attempt to parse it. Sanity check the returned mode sense data.
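
      A sanity check of this kind can be sketched as follows. This is an
      illustration under stated assumptions, not the code from the commit: it
      assumes a MODE SENSE(10) reply (8-byte header, big-endian mode data
      length in bytes 0-1, block descriptor length in bytes 6-7), and the
      function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODE_HEADER_LEN 8   /* MODE SENSE(10) parameter header size */

/*
 * Returns the offset of the first mode page inside buf, or -1 if the
 * lengths reported by the device do not fit inside the buffer.
 */
static int first_mode_page_offset(const uint8_t *buf, size_t buf_len)
{
    size_t data_len, bd_len, offset;

    assert(buf != NULL);
    if (buf_len < MODE_HEADER_LEN)
        return -1;

    /* The mode data length field does not count its own two bytes. */
    data_len = (((size_t)buf[0] << 8) | buf[1]) + 2;
    bd_len = ((size_t)buf[6] << 8) | buf[7];
    offset = MODE_HEADER_LEN + bd_len;

    if (data_len > buf_len)
        return -1;              /* device claims more data than we hold */
    if (offset + 2 > data_len)
        return -1;              /* no room for page code and page length */
    return (int)offset;
}
```

      A caller would parse the page only when the returned offset is
      non-negative, so a mangled header is rejected before any byte beyond the
      buffer is read.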
      
      Cc: <stable@vger.kernel.org>
      Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • Fam Zheng's avatar
      scsi: sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable · 67804145
      Fam Zheng authored
      If a device reports a small max_xfer_blocks and a zero opt_xfer_blocks,
      we end up using BLK_DEF_MAX_SECTORS, which is wrong: reads and writes of
      that size may fail.
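
      The intended selection can be sketched like this. It is a simplified
      model rather than the sd.c code: the function name is hypothetical, all
      values are taken to be in the same unit, the default is passed in as a
      parameter instead of using the kernel constant, and treating an optimal
      size larger than the maximum as unusable is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Picks the read/write size limit. A value of 0 for opt_xfer or max_xfer
 * means the device did not report that limit.
 */
static uint32_t pick_rw_max(uint32_t opt_xfer, uint32_t max_xfer,
                            uint32_t def_max)
{
    assert(def_max != 0);

    /* A usable optimal size is non-zero and within the device maximum. */
    if (opt_xfer != 0 && (max_xfer == 0 || opt_xfer <= max_xfer))
        return opt_xfer;

    /* Otherwise use the default, capped by the device maximum if reported. */
    if (max_xfer != 0 && max_xfer < def_max)
        return max_xfer;
    return def_max;
}
```

      For a device reporting a maximum of 64 and no optimal size, the sketch
      yields 64 rather than the default, which is the case the commit message
      describes.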
      
      [mkp: tweaked to avoid setting rw_max twice and added typecast]
      
      Cc: <stable@vger.kernel.org> # v4.4+
      Fixes: ca369d51 ("block/sd: Fix device-imposed transfer length limits")
      Signed-off-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  3. 29 Mar, 2017 1 commit
  4. 28 Mar, 2017 1 commit
  5. 23 Mar, 2017 3 commits
  6. 20 Mar, 2017 1 commit
    • John Garry's avatar
      scsi: libsas: fix ata xfer length · 9702c67c
      John Garry authored
      The total ata xfer length may not be calculated properly, in that we do
      not use the proper method to get an sg element dma length.
      
      According to the code comment, sg_dma_len() should be used after
      dma_map_sg() is called.
      
      This issue was found by turning on the SMMUv3 in front of the hisi_sas
      controller in hip07. Multiple sg elements were being combined into a
      single element, but the original first element length was being used as
      the total xfer length.
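
      The effect of IOMMU merging on the length calculation can be shown with
      a toy model. The struct below is a hypothetical stand-in for a
      scatterlist entry, not the kernel type: `length` is the size set before
      mapping and `dma_length` is the size valid after mapping (what
      sg_dma_len() reports in the kernel).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified scatterlist entry. */
struct sg_entry {
    unsigned int length;      /* CPU-side length, set before DMA mapping */
    unsigned int dma_length;  /* device-side length, valid after mapping */
};

/* Sums the device-side lengths over the entries the mapping returned. */
static unsigned int total_xfer_len(const struct sg_entry *sg,
                                   size_t mapped_nents)
{
    unsigned int total = 0;
    size_t i;

    assert(sg != NULL || mapped_nents == 0);
    for (i = 0; i < mapped_nents; i++)
        total += sg[i].dma_length;
    return total;
}
```

      If three 4096-byte entries are merged into a single mapped entry of
      12288 bytes, summing `dma_length` over the one mapped entry gives 12288,
      whereas reading the first entry's `length` would give only 4096.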
      
      Cc: <stable@vger.kernel.org>
      Fixes: ff2aeb1e ("libata: convert to chained sg")
      Signed-off-by: John Garry <john.garry@huawei.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  7. 19 Mar, 2017 3 commits
  8. 16 Mar, 2017 1 commit
  9. 15 Mar, 2017 10 commits
  10. 14 Mar, 2017 4 commits
  11. 12 Mar, 2017 1 commit
  12. 08 Mar, 2017 2 commits
  13. 07 Mar, 2017 5 commits