1. 13 Mar, 2018 1 commit
    • scsi: libsas: defer ata device eh commands to libata · 318aaf34
      Jason Yan authored
      When an ata device is doing EH, some commands that are still attached
      to tasks are not passed to libata if abort or recovery fails, so libata
      does not handle them. When these commands complete, the sas task is
      freed but the ata qc is not. This leaks the ata qc and triggers a
      warning like the one below:
      
      WARNING: CPU: 0 PID: 28512 at drivers/ata/libata-eh.c:4037
      ata_eh_finish+0xb4/0xcc
      CPU: 0 PID: 28512 Comm: kworker/u32:2 Tainted: G     W  OE 4.14.0#1
      ......
      Call trace:
      [<ffff0000088b7bd0>] ata_eh_finish+0xb4/0xcc
      [<ffff0000088b8420>] ata_do_eh+0xc4/0xd8
      [<ffff0000088b8478>] ata_std_error_handler+0x44/0x8c
      [<ffff0000088b8068>] ata_scsi_port_error_handler+0x480/0x694
      [<ffff000008875fc4>] async_sas_ata_eh+0x4c/0x80
      [<ffff0000080f6be8>] async_run_entry_fn+0x4c/0x170
      [<ffff0000080ebd70>] process_one_work+0x144/0x390
      [<ffff0000080ec100>] worker_thread+0x144/0x418
      [<ffff0000080f2c98>] kthread+0x10c/0x138
      [<ffff0000080855dc>] ret_from_fork+0x10/0x18
      
      If too many ata qcs are leaked, ata tag allocation fails and I/O is
      blocked forever.
      
      As suggested by Dan Williams, defer ata device commands to libata and
      merge sas_eh_finish_cmd() with sas_eh_defer_cmd(). libata will handle
      ata qcs correctly after this.
      Signed-off-by: Jason Yan <yanaijie@huawei.com>
      CC: Xiaofei Tan <tanxiaofei@huawei.com>
      CC: John Garry <john.garry@huawei.com>
      CC: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
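The tag exhaustion described in this commit message can be pictured with a small userspace sketch. It is not libata's allocator: the pool size, the struct and the `sketch_*` helpers are hypothetical names chosen for illustration. It only shows why a fixed pool stops handing out tags once enough of them are never released.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pool size, chosen for illustration only. */
#define SKETCH_MAX_TAGS 32

struct sketch_tag_pool {
	uint32_t in_use;	/* bit n set means tag n is allocated */
};

/* Return the lowest free tag, or -1 when every tag is taken. */
static int sketch_tag_alloc(struct sketch_tag_pool *pool)
{
	for (int tag = 0; tag < SKETCH_MAX_TAGS; tag++) {
		uint32_t bit = UINT32_C(1) << tag;

		if (!(pool->in_use & bit)) {
			pool->in_use |= bit;
			return tag;
		}
	}
	return -1;
}

/* Release a tag. A command that never reaches this call leaks its tag. */
static void sketch_tag_free(struct sketch_tag_pool *pool, int tag)
{
	assert(tag >= 0 && tag < SKETCH_MAX_TAGS);
	pool->in_use &= ~(UINT32_C(1) << tag);
}
```

In this sketch, allocating `SKETCH_MAX_TAGS` times without a matching `sketch_tag_free()` makes every later `sketch_tag_alloc()` return -1, which corresponds to the blocked I/O the message describes.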
  2. 07 Mar, 2018 4 commits
  3. 02 Mar, 2018 8 commits
    • scsi: qedi: Fix kernel crash during port toggle · 967823d6
      Manish Rangankar authored
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000100
      
      [  985.596918] IP: _raw_spin_lock_bh+0x17/0x30
      [  985.601581] PGD 0 P4D 0
      [  985.604405] Oops: 0002 [#1] SMP
      :
      [  985.704533] CPU: 16 PID: 1156 Comm: qedi_thread/16 Not tainted 4.16.0-rc2 #1
      [  985.712397] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 2.4.3 01/17/2017
      [  985.720747] RIP: 0010:_raw_spin_lock_bh+0x17/0x30
      [  985.725996] RSP: 0018:ffffa4b1c43d3e10 EFLAGS: 00010246
      [  985.731823] RAX: 0000000000000000 RBX: ffff94a31bd03000 RCX: 0000000000000000
      [  985.739783] RDX: 0000000000000001 RSI: ffff94a32fa16938 RDI: 0000000000000100
      [  985.747744] RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000a33
      [  985.755703] R10: 0000000000000000 R11: ffffa4b1c43d3af0 R12: 0000000000000000
      [  985.763662] R13: ffff94a301f40818 R14: 0000000000000000 R15: 000000000000000c
      [  985.771622] FS:  0000000000000000(0000) GS:ffff94a32fa00000(0000) knlGS:0000000000000000
      [  985.780649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  985.787057] CR2: 0000000000000100 CR3: 000000067a009006 CR4: 00000000001606e0
      [  985.795017] Call Trace:
      [  985.797747]  qedi_fp_process_cqes+0x258/0x980 [qedi]
      [  985.803294]  qedi_percpu_io_thread+0x10f/0x1b0 [qedi]
      [  985.808931]  kthread+0xf5/0x130
      [  985.812434]  ? qedi_free_uio+0xd0/0xd0 [qedi]
      [  985.817298]  ? kthread_bind+0x10/0x10
      [  985.821372]  ? do_syscall_64+0x6e/0x1a0
      Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: qla2xxx: Fix FC-NVMe LUN discovery · 2b5b9647
      Darren Trapp authored
      Commit a4239945 ("scsi: qla2xxx: Add switch command to simplify
      fabric discovery") introduced a regression: it did not consider the
      FC-NVMe code path, which broke NVMe LUN discovery.
      
      Fixes: a4239945 ("scsi: qla2xxx: Add switch command to simplify fabric discovery")
      Signed-off-by: Darren Trapp <darren.trapp@cavium.com>
      Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: core: return BLK_STS_OK for DID_OK in __scsi_error_from_host_byte() · e39a9735
      Hannes Reinecke authored
      When __scsi_error_from_host_byte() was converted to BLK_STS error codes,
      the DID_OK case was forgotten, so the function returned an error even
      for DID_OK.
      
      Fixes: 2a842aca ("block: introduce new block status code type")
      Cc: Doug Gilbert <dgilbert@interlog.com>
      Signed-off-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Douglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
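A minimal userspace sketch of the kind of mapping this commit message describes: a switch that lacks a success case sends DID_OK to the default error branch. All `sketch_*` and `SKETCH_*` names, the enum values and the bit layout used by `sketch_host_byte()` are assumptions made for this illustration. This is not the kernel function itself.

```c
#include <assert.h>

/* Stand-in block status codes; names and values are hypothetical. */
enum sketch_blk_status {
	SKETCH_STS_OK,
	SKETCH_STS_TRANSPORT,
	SKETCH_STS_IOERR,
};

/* Stand-in host byte codes; values are assumptions for this sketch. */
enum sketch_host_code {
	SKETCH_DID_OK = 0x00,
	SKETCH_DID_TRANSPORT_FAILFAST = 0x0f,
};

static_assert(SKETCH_STS_OK == 0, "sketch expects success to be zero");

/* Assumed layout: the host byte occupies bits 16..23 of the result. */
static unsigned int sketch_host_byte(unsigned int result)
{
	return (result >> 16) & 0xff;
}

/* Mapping without a success case: DID_OK falls into the default branch. */
static enum sketch_blk_status sketch_map_without_ok(unsigned int result)
{
	switch (sketch_host_byte(result)) {
	case SKETCH_DID_TRANSPORT_FAILFAST:
		return SKETCH_STS_TRANSPORT;
	default:
		return SKETCH_STS_IOERR;
	}
}

/* Mapping with an explicit success case, as the commit title suggests. */
static enum sketch_blk_status sketch_map_with_ok(unsigned int result)
{
	switch (sketch_host_byte(result)) {
	case SKETCH_DID_OK:
		return SKETCH_STS_OK;
	case SKETCH_DID_TRANSPORT_FAILFAST:
		return SKETCH_STS_TRANSPORT;
	default:
		return SKETCH_STS_IOERR;
	}
}
```

With a result of 0, `sketch_map_without_ok()` reports an I/O error while `sketch_map_with_ok()` reports success; every other input maps identically in both versions.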
    • scsi: core: Avoid that ATA error handling can trigger a kernel hang or oops · 3be8828f
      Bart Van Assche authored
      Avoid that the recently introduced call_rcu() call in the SCSI core
      triggers a double call_rcu() call.
      Reported-by: Natanael Copa <ncopa@alpinelinux.org>
      Reported-by: Damien Le Moal <damien.lemoal@wdc.com>
      References: https://bugzilla.kernel.org/show_bug.cgi?id=198861
      Fixes: 3bd6f43f ("scsi: core: Ensure that the SCSI error handler gets woken up")
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
      Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
      Cc: Natanael Copa <ncopa@alpinelinux.org>
      Cc: Damien Le Moal <damien.lemoal@wdc.com>
      Cc: Alexandre Oliva <oliva@gnu.org>
      Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: qla2xxx: ensure async flags are reset correctly · fa83e658
      Hannes Reinecke authored
      The fcport flags FCF_ASYNC_ACTIVE and FCF_ASYNC_SENT are used to
      throttle the state machine, so we need to ensure that they are always
      set and unset correctly. Otherwise the state machine gets confused and
      no login to remote ports is attempted.
      
      Cc: Quinn Tran <quinn.tran@cavium.com>
      Cc: Himanshu Madhani <himanshu.madhani@cavium.com>
      Fixes: 3dbec59b ("scsi: qla2xxx: Prevent multiple active discovery commands per session")
      Signed-off-by: Hannes Reinecke <hare@suse.com>
      Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
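The throttling problem in this commit message can be illustrated with a small sketch. It uses a single hypothetical flag and made-up `sketch_*` helpers rather than the driver's FCF_ASYNC_ACTIVE and FCF_ASYNC_SENT handling, and only shows why a throttle flag has to be cleared on every exit path, including errors.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit, standing in for a "request in flight" marker. */
#define SKETCH_ASYNC_SENT (1u << 0)

struct sketch_port {
	unsigned int flags;
	int logins_sent;
};

/*
 * Try to start a login. 'send_ok' simulates whether handing the request
 * to the hardware succeeded. Returns true when a login was started.
 */
static bool sketch_start_login(struct sketch_port *port, bool send_ok)
{
	if (port->flags & SKETCH_ASYNC_SENT)
		return false;	/* throttled: a request is already in flight */

	port->flags |= SKETCH_ASYNC_SENT;
	if (!send_ok) {
		/* Error path: without this reset, all later logins are throttled. */
		port->flags &= ~SKETCH_ASYNC_SENT;
		return false;
	}
	port->logins_sent++;
	return true;
}

/* Completion path: clear the flag so the next login can start. */
static void sketch_login_done(struct sketch_port *port)
{
	assert(port->flags & SKETCH_ASYNC_SENT);
	port->flags &= ~SKETCH_ASYNC_SENT;
}
```

If the reset in the error path is removed from this sketch, one failed send leaves the flag set and `sketch_start_login()` refuses every later attempt, which matches the "no login attempt" symptom.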
    • scsi: qla2xxx: do not check login_state if no loop id is assigned · 07ea4b60
      Hannes Reinecke authored
      When no loop id is assigned in qla24xx_fcport_handle_login() the login
      state needs to be ignored; it will get set later on in
      qla_chk_n2n_b4_login().
      
      Cc: Quinn Tran <quinn.tran@cavium.com>
      Cc: Himanshu Madhani <himanshu.madhani@cavium.com>
      Fixes: 040036bb ("scsi: qla2xxx: Delay loop id allocation at login")
      Signed-off-by: Hannes Reinecke <hare@suse.com>
      Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: qla2xxx: Fixup locking for session deletion · 1c6cacf4
      Hannes Reinecke authored
      Commit d8630bb9 ('Serialize session deletion by using work_lock')
      tries to fix a deadlock when deleting sessions, but fails to take the
      locking rules into account. This patch resolves the situation by
      introducing a separate lock for processing the GNLIST response, and
      ensures that sess_lock is released before calling
      qlt_schedule_sess_delete().
      
      Cc: Himanshu Madhani <himanshu.madhani@cavium.com>
      Cc: Quinn Tran <quinn.tran@cavium.com>
      Fixes: d8630bb9 ("scsi: qla2xxx: Serialize session deletion by using work_lock")
      Signed-off-by: Hannes Reinecke <hare@suse.com>
      Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: qla2xxx: Fix NULL pointer crash due to active timer for ABTS · 1514839b
      himanshu.madhani@cavium.com authored
      This patch fixes a NULL pointer crash caused by an active timer running
      for an abort IOCB.
      
      From crash dump analysis it was discovered that get_next_timer_interrupt()
      encountered a corrupted entry on the timer list.
      
       #9 [ffff95e1f6f0fd40] page_fault at ffffffff914fe8f8
          [exception RIP: get_next_timer_interrupt+440]
          RIP: ffffffff90ea3088  RSP: ffff95e1f6f0fdf0  RFLAGS: 00010013
          RAX: ffff95e1f6451028  RBX: 000218e2389e5f40  RCX: 00000001232ad600
          RDX: 0000000000000001  RSI: ffff95e1f6f0fdf0  RDI: 0000000001232ad6
          RBP: ffff95e1f6f0fe40   R8: ffff95e1f6451188   R9: 0000000000000001
          R10: 0000000000000016  R11: 0000000000000016  R12: 00000001232ad5f6
          R13: ffff95e1f6450000  R14: ffff95e1f6f0fdf8  R15: ffff95e1f6f0fe10
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
      
      Looking at the assembly of get_next_timer_interrupt(), the address came
      from %r8 (ffff95e1f6451188), which points to a list_head with a single
      entry at ffff95e5ff621178.
      
       0xffffffff90ea307a <get_next_timer_interrupt+426>:      mov    (%r8),%rdx
       0xffffffff90ea307d <get_next_timer_interrupt+429>:      cmp    %r8,%rdx
       0xffffffff90ea3080 <get_next_timer_interrupt+432>:      je     0xffffffff90ea30a7 <get_next_timer_interrupt+471>
       0xffffffff90ea3082 <get_next_timer_interrupt+434>:      nopw   0x0(%rax,%rax,1)
       0xffffffff90ea3088 <get_next_timer_interrupt+440>:      testb  $0x1,0x18(%rdx)
      
       crash> rd ffff95e1f6451188 10
       ffff95e1f6451188:  ffff95e5ff621178 ffff95e5ff621178   x.b.....x.b.....
       ffff95e1f6451198:  ffff95e1f6451198 ffff95e1f6451198   ..E.......E.....
       ffff95e1f64511a8:  ffff95e1f64511a8 ffff95e1f64511a8   ..E.......E.....
       ffff95e1f64511b8:  ffff95e77cf509a0 ffff95e77cf509a0   ...|.......|....
       ffff95e1f64511c8:  ffff95e1f64511c8 ffff95e1f64511c8   ..E.......E.....
      
       crash> rd ffff95e5ff621178 10
       ffff95e5ff621178:  0000000000000001 ffff95e15936aa00   ..........6Y....
       ffff95e5ff621188:  0000000000000000 00000000ffffffff   ................
       ffff95e5ff621198:  00000000000000a0 0000000000000010   ................
       ffff95e5ff6211a8:  ffff95e5ff621198 000000000000000c   ..b.............
       ffff95e5ff6211b8:  00000f5800000000 ffff95e751f8d720   ....X... ..Q....
      
       ffff95e5ff621178 belongs to a freed mempool object at ffff95e5ff621080.
      
       CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
       ffff95dc7fd74d00 mnt_cache                384      19785     24948    594    16k
         SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
         ffffdc5dabfd8800  ffff95e5ff620000     1     42         29    13
         FREE / [ALLOCATED]
          ffff95e5ff621080  (cpu 6 cache)
      
      Examining the contents of that memory reveals a pointer to a constant string
      in the driver, "abort\0", which is set by qla24xx_async_abort_cmd().
      
       crash> rd ffffffffc059277c 20
       ffffffffc059277c:  6e490074726f6261 0074707572726574   abort.Interrupt.
       ffffffffc059278c:  00676e696c6c6f50 6920726576697244   Polling.Driver i
       ffffffffc059279c:  646f6d207325206e 6974736554000a65   n %s mode..Testi
       ffffffffc05927ac:  636976656420676e 786c252074612065   ng device at %lx
       ffffffffc05927bc:  6b63656843000a2e 646f727020676e69   ...Checking prod
       ffffffffc05927cc:  6f20444920746375 0a2e706968632066   uct ID of chip..
       ffffffffc05927dc:  5120646e756f4600 204130303232414c   .Found QLA2200A
       ffffffffc05927ec:  43000a2e70696843 20676e696b636568   Chip...Checking
       ffffffffc05927fc:  65786f626c69616d 6c636e69000a2e73   mailboxes...incl
       ffffffffc059280c:  756e696c2f656475 616d2d616d642f78   ude/linux/dma-ma
      
       crash> struct -ox srb_iocb
       struct srb_iocb {
                 union {
                     struct {...} logio;
                     struct {...} els_logo;
                     struct {...} tmf;
                     struct {...} fxiocb;
                     struct {...} abt;
                     struct ct_arg ctarg;
                     struct {...} mbx;
                     struct {...} nack;
          [0x0 ] } u;
          [0xb8] struct timer_list timer;
          [0x108] void (*timeout)(void *);
       }
       SIZE: 0x110
      
       crash> ! bc
       ibase=16
       obase=10
       B8+40
       F8
      
      The object is a srb_t, and at offset 0xf8 within that structure
      (i.e. ffff95e5ff621080 + f8 -> ffff95e5ff621178) is a struct timer_list.
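The address arithmetic above can be re-checked with a few lines of C. The constants are copied from the crash output; the macro and function names are hypothetical, and reading 0x40 as the offset of the srb_iocb data inside srb_t is an assumption inferred from the bc sum, not something the output states directly.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the crash session above. */
#define SKETCH_OBJ_BASE      UINT64_C(0xffff95e5ff621080) /* freed object */
#define SKETCH_LIST_ENTRY    UINT64_C(0xffff95e5ff621178) /* timer list entry */
#define SKETCH_TIMER_IN_IOCB UINT64_C(0xb8) /* timer offset in struct srb_iocb */
/* Assumption: second operand of the bc sum is where srb_iocb sits in srb_t. */
#define SKETCH_IOCB_IN_SRB   UINT64_C(0x40)

/* Offset of the timer_list within the enclosing object. */
static uint64_t sketch_timer_offset(void)
{
	return SKETCH_TIMER_IN_IOCB + SKETCH_IOCB_IN_SRB;
}

/* Address of the timer_list for an object starting at 'base'. */
static uint64_t sketch_timer_address(uint64_t base)
{
	uint64_t offset = sketch_timer_offset();

	assert(base <= UINT64_MAX - offset);	/* guard against wraparound */
	return base + offset;
}
```

Here `sketch_timer_offset()` evaluates to 0xf8, and `sketch_timer_address(SKETCH_OBJ_BASE)` equals `SKETCH_LIST_ENTRY`, which agrees with the conclusion that the dangling list entry is the timer embedded in the freed object.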
      
      Cc: <stable@vger.kernel.org> #4.4+
      Fixes: 4440e46d ("[SCSI] qla2xxx: Add IOCB Abort command asynchronous handling.")
      Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  4. 22 Feb, 2018 3 commits
  5. 15 Feb, 2018 1 commit
  6. 14 Feb, 2018 7 commits
  7. 06 Feb, 2018 1 commit
  8. 31 Jan, 2018 12 commits
  9. 23 Jan, 2018 3 commits