1. 12 Apr, 2023 1 commit
    • scsi: ses: Handle enclosure with just a primary component gracefully · c8e22b7a
      Jiri Kosina authored
      This reverts commit 3fe97ff3 ("scsi: ses: Don't attach if enclosure
      has no components") and introduces proper handling of the case where
      there are no detected secondary components but the primary component
      (enumerated in num_enclosures) does exist. That fix was originally
      proposed by Ding Hui <dinghui@sangfor.com.cn>.
      
      Completely ignoring devices that have one primary enclosure and no
      secondary one results in ses_intf_add() bailing out completely:
      
      	scsi 2:0:0:254: enclosure has no enumerated components
      	scsi 2:0:0:254: Failed to bind enclosure -12
      
      even on valid configurations with 1 primary and 0 secondary enclosures,
      as below:
      
      	# sg_ses /dev/sg0
      	  3PARdata  SES               3321
      	Supported diagnostic pages:
      	  Supported Diagnostic Pages [sdp] [0x0]
      	  Configuration (SES) [cf] [0x1]
      	  Short Enclosure Status (SES) [ses] [0x8]
      	# sg_ses -p cf /dev/sg0
      	  3PARdata  SES               3321
      	Configuration diagnostic page:
      	  number of secondary subenclosures: 0
      	  generation code: 0x0
      	  enclosure descriptor list
      	    Subenclosure identifier: 0 [primary]
      	      relative ES process id: 0, number of ES processes: 1
      	      number of type descriptor headers: 1
      	      enclosure logical identifier (hex): 20000002ac02068d
      	      enclosure vendor: 3PARdata  product: VV                rev: 3321
      	  type descriptor header and text list
      	    Element type: Unspecified, subenclosure id: 0
      	      number of possible elements: 1
      
      The changelog for the original fix follows:
      
      =====
      We can get a crash when disconnecting the iSCSI session, with a call
      trace like this:
      
        [ffff00002a00fb70] kfree at ffff00000830e224
        [ffff00002a00fba0] ses_intf_remove at ffff000001f200e4
        [ffff00002a00fbd0] device_del at ffff0000086b6a98
        [ffff00002a00fc50] device_unregister at ffff0000086b6d58
        [ffff00002a00fc70] __scsi_remove_device at ffff00000870608c
        [ffff00002a00fca0] scsi_remove_device at ffff000008706134
        [ffff00002a00fcc0] __scsi_remove_target at ffff0000087062e4
        [ffff00002a00fd10] scsi_remove_target at ffff0000087064c0
        [ffff00002a00fd70] __iscsi_unbind_session at ffff000001c872c4
        [ffff00002a00fdb0] process_one_work at ffff00000810f35c
        [ffff00002a00fe00] worker_thread at ffff00000810f648
        [ffff00002a00fe70] kthread at ffff000008116e98
      
      In ses_intf_add(), the component count can be 0, in which case kcalloc()
      allocates a zero-size scomp that is not saved in
      edev->component[i].scratch.
      
      In this situation, edev->component[0].scratch is an invalid pointer, and
      kfree()ing it in ses_intf_remove_enclosure() causes a crash like the one
      above. The call trace can also differ in other, random ways when kfree()
      cannot catch the invalid pointer.
      
      We should not use the edev->component[] array when the component count
      is 0. We also need to check the index when using the edev->component[]
      array in ses_enclosure_data_process().
      =====
      Reported-by: Michal Kolar <mich.k@seznam.cz>
      Originally-by: Ding Hui <dinghui@sangfor.com.cn>
      Cc: stable@vger.kernel.org
      Fixes: 3fe97ff3 ("scsi: ses: Don't attach if enclosure has no components")
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Link: https://lore.kernel.org/r/nycvar.YFH.7.76.2304042122270.29760@cbobk.fhfr.pm
      Tested-by: Michal Kolar <mich.k@seznam.cz>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
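The rule described in the ses commit above (allocate and free per-component scratch data only when the component count is non-zero, and bounds-check every component[] access) can be sketched in userspace C. This is a minimal illustration under stated assumptions, not the driver's code: the struct and function names (`struct enclosure`, `enclosure_attach()`, `enclosure_get()`, `enclosure_detach()`) are hypothetical, and calloc()/free() stand in for kcalloc()/kfree().

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel's enclosure structures. */
struct component {
	void *scratch;          /* per-component private data */
};

struct enclosure {
	int components;         /* enumerated components; may legitimately be 0 */
	struct component *component;
};

/*
 * Allocate one scratch block shared by all components, but only when there
 * are components to describe. A primary-only enclosure still attaches.
 */
int enclosure_attach(struct enclosure *edev, size_t scratch_size)
{
	char *scomp;

	if (edev->components == 0)
		return 0;       /* nothing to allocate, but not an error */

	scomp = calloc((size_t)edev->components, scratch_size);
	if (!scomp)
		return -1;
	for (int i = 0; i < edev->components; i++)
		edev->component[i].scratch = scomp + (size_t)i * scratch_size;
	return 0;
}

/* Bounds-checked access: never index component[] past the count. */
struct component *enclosure_get(struct enclosure *edev, int idx)
{
	if (idx < 0 || idx >= edev->components)
		return NULL;
	return &edev->component[idx];
}

/* component[0].scratch owns the block, which only exists if components > 0. */
void enclosure_detach(struct enclosure *edev)
{
	if (edev->components == 0)
		return;         /* component[0].scratch was never set */
	free(edev->component[0].scratch);
	for (int i = 0; i < edev->components; i++)
		edev->component[i].scratch = NULL;
}
```

With `components == 0`, attach succeeds without touching the array and detach never frees an unset pointer, which is the failure mode the quoted changelog describes.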
  2. 03 Apr, 2023 4 commits
  3. 25 Mar, 2023 4 commits
  4. 17 Mar, 2023 4 commits
    • scsi: scsi_dh_alua: Fix memleak for 'qdata' in alua_activate() · a13faca0
      Yu Kuai authored
      If alua_rtpg_queue() fails in alua_activate(), 'qdata' is not freed,
      which causes the following memleak:
      
      unreferenced object 0xffff88810b2c6980 (size 32):
        comm "kworker/u16:2", pid 635322, jiffies 4355801099 (age 1216426.076s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          40 39 24 c1 ff ff ff ff 00 f8 ea 0a 81 88 ff ff  @9$.............
        backtrace:
          [<0000000098f3a26d>] alua_activate+0xb0/0x320
          [<000000003b529641>] scsi_dh_activate+0xb2/0x140
          [<000000007b296db3>] activate_path_work+0xc6/0xe0 [dm_multipath]
          [<000000007adc9ace>] process_one_work+0x3c5/0x730
          [<00000000c457a985>] worker_thread+0x93/0x650
          [<00000000cb80e628>] kthread+0x1ba/0x210
          [<00000000a1e61077>] ret_from_fork+0x22/0x30
      
      Fix the problem by freeing 'qdata' in error path.
      
      Fixes: 625fe857 ("scsi: scsi_dh_alua: Check scsi_device_get() return value")
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Link: https://lore.kernel.org/r/20230315062154.668812-1-yukuai1@huaweicloud.com
      Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
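The ownership rule behind the alua fix (if handing 'qdata' to the queue fails, the caller still owns it and must free it) can be shown with a small userspace sketch. All names here (`activate()`, `queue_request()`, `complete_pending()`, the single-slot queue and the `live_qdata` counter) are hypothetical; the counter exists only so a leak becomes observable.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Live qdata allocations; tracked here only to make a leak observable. */
static int live_qdata;

/* Hypothetical stand-in for the per-activation request data. */
struct queue_data {
	void *callback_data;
};

/* Hypothetical single-slot queue standing in for the real work queue. */
static struct queue_data *pending;

static struct queue_data *qdata_alloc(void)
{
	struct queue_data *q = calloc(1, sizeof(*q));

	if (q)
		live_qdata++;
	return q;
}

static void qdata_free(struct queue_data *q)
{
	if (q) {
		free(q);
		live_qdata--;
	}
}

/* Takes ownership of q on success; leaves it with the caller on failure. */
static bool queue_request(struct queue_data *q)
{
	if (pending)
		return false;   /* busy: request rejected */
	pending = q;
	return true;
}

int activate(void *data)
{
	struct queue_data *qdata = qdata_alloc();

	if (!qdata)
		return -1;
	qdata->callback_data = data;

	if (!queue_request(qdata)) {
		qdata_free(qdata);  /* error path: we still own qdata, so free it */
		return -1;
	}
	return 0;               /* ownership passed to the queue */
}

/* Worker side: consume the pending request and release it. */
void complete_pending(void)
{
	qdata_free(pending);
	pending = NULL;
}
```

Without the `qdata_free()` on the failure branch, the second activation below would leave `live_qdata` at 2, mirroring the kmemleak report.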
    • scsi: qla2xxx: Synchronize the IOCB count to be in order · d3affdeb
      Quinn Tran authored
      A system hang was observed with the following call trace:
      
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      PGD 0 P4D 0
      Oops: 0000 [#1] PREEMPT SMP NOPTI
      CPU: 15 PID: 86747 Comm: nvme Kdump: loaded Not tainted 6.2.0+ #1
      Hardware name: Dell Inc. PowerEdge R6515/04F3CJ, BIOS 2.7.3 03/31/2022
      RIP: 0010:__wake_up_common+0x55/0x190
      Code: 41 f6 01 04 0f 85 b2 00 00 00 48 8b 43 08 4c 8d
            40 e8 48 8d 43 08 48 89 04 24 48 89 c6\
            49 8d 40 18 48 39 c6 0f 84 e9 00 00 00 <49> 8b 40 18 89 6c 24 14 31
            ed 4c 8d 60 e8 41 8b 18 f6 c3 04 75 5d
      RSP: 0018:ffffb05a82afbba0 EFLAGS: 00010082
      RAX: 0000000000000000 RBX: ffff8f9b83a00018 RCX: 0000000000000000
      RDX: 0000000000000001 RSI: ffff8f9b83a00020 RDI: ffff8f9b83a00018
      RBP: 0000000000000001 R08: ffffffffffffffe8 R09: ffffb05a82afbbf8
      R10: 70735f7472617473 R11: 5f30307832616c71 R12: 0000000000000001
      R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
      FS:  00007f815cf4c740(0000) GS:ffff8f9eeed80000(0000)
      	knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 000000010633a000 CR4: 0000000000350ee0
      Call Trace:
          <TASK>
          __wake_up_common_lock+0x83/0xd0
          qla_nvme_ls_req+0x21b/0x2b0 [qla2xxx]
          __nvme_fc_send_ls_req+0x1b5/0x350 [nvme_fc]
          nvme_fc_xmt_disconnect_assoc+0xca/0x110 [nvme_fc]
          nvme_fc_delete_association+0x1bf/0x220 [nvme_fc]
          ? nvme_remove_namespaces+0x9f/0x140 [nvme_core]
          nvme_do_delete_ctrl+0x5b/0xa0 [nvme_core]
          nvme_sysfs_delete+0x5f/0x70 [nvme_core]
          kernfs_fop_write_iter+0x12b/0x1c0
          vfs_write+0x2a3/0x3b0
          ksys_write+0x5f/0xe0
          do_syscall_64+0x5c/0x90
          ? syscall_exit_work+0x103/0x130
          ? syscall_exit_to_user_mode+0x12/0x30
          ? do_syscall_64+0x69/0x90
          ? exit_to_user_mode_loop+0xd0/0x130
          ? exit_to_user_mode_prepare+0xec/0x100
          ? syscall_exit_to_user_mode+0x12/0x30
          ? do_syscall_64+0x69/0x90
          ? syscall_exit_to_user_mode+0x12/0x30
          ? do_syscall_64+0x69/0x90
          entry_SYSCALL_64_after_hwframe+0x72/0xdc
          RIP: 0033:0x7f815cd3eb97
      
      The IOCB counts went out of order, which blocks any commands from going
      out and subsequently hangs the system. Synchronize the IOCB count so that
      it stays in the correct order.
      
      Fixes: 5f63a163 ("scsi: qla2xxx: Fix exchange oversubscription for management commands")
      Cc: stable@vger.kernel.org
      Signed-off-by: Quinn Tran <qutran@marvell.com>
      Signed-off-by: Nilesh Javali <njavali@marvell.com>
      Link: https://lore.kernel.org/r/20230313043711.13500-3-njavali@marvell.com
      Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
      Reviewed-by: John Meneghini <jmeneghi@redhat.com>
      Tested-by: Lin Li <lilin@redhat.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
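The qla2xxx commit message does not spell out how the IOCB count is synchronized. As a generic illustration only, not the driver's actual change, the sketch below keeps a resource count consistent by making the "check limit and reserve" step a single atomic operation (a C11 compare-and-swap loop), so concurrent reservations and releases cannot leave the count out of order. `struct iocb_pool`, `iocb_reserve()` and `iocb_release()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical firmware resource pool: at most 'limit' IOCBs in flight. */
struct iocb_pool {
	atomic_int in_use;
	int limit;
};

/* Reserve n IOCBs; the limit check and the increment happen atomically. */
bool iocb_reserve(struct iocb_pool *p, int n)
{
	int cur = atomic_load(&p->in_use);

	do {
		if (cur + n > p->limit)
			return false;   /* not enough resources: caller must back off */
		/* on failure, cur is reloaded with the current value and rechecked */
	} while (!atomic_compare_exchange_weak(&p->in_use, &cur, cur + n));
	return true;
}

/* Release exactly what was reserved so the count never drifts. */
void iocb_release(struct iocb_pool *p, int n)
{
	atomic_fetch_sub(&p->in_use, n);
}
```

A spinlock around a plain counter would serve the same purpose; the compare-and-swap form is used here only to keep the sketch free of threading-library dependencies.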
    • scsi: qla2xxx: Perform lockless command completion in abort path · 0367076b
      Nilesh Javali authored
      While adding and removing the controller, the following call trace was
      observed:
      
      WARNING: CPU: 3 PID: 623596 at kernel/dma/mapping.c:532 dma_free_attrs+0x33/0x50
      CPU: 3 PID: 623596 Comm: sh Kdump: loaded Not tainted 5.14.0-96.el9.x86_64 #1
      RIP: 0010:dma_free_attrs+0x33/0x50
      
      Call Trace:
         qla2x00_async_sns_sp_done+0x107/0x1b0 [qla2xxx]
         qla2x00_abort_srb+0x8e/0x250 [qla2xxx]
         ? ql_dbg+0x70/0x100 [qla2xxx]
         __qla2x00_abort_all_cmds+0x108/0x190 [qla2xxx]
         qla2x00_abort_all_cmds+0x24/0x70 [qla2xxx]
         qla2x00_abort_isp_cleanup+0x305/0x3e0 [qla2xxx]
         qla2x00_remove_one+0x364/0x400 [qla2xxx]
         pci_device_remove+0x36/0xa0
         __device_release_driver+0x17a/0x230
         device_release_driver+0x24/0x30
         pci_stop_bus_device+0x68/0x90
         pci_stop_and_remove_bus_device_locked+0x16/0x30
         remove_store+0x75/0x90
         kernfs_fop_write_iter+0x11c/0x1b0
         new_sync_write+0x11f/0x1b0
         vfs_write+0x1eb/0x280
         ksys_write+0x5f/0xe0
         do_syscall_64+0x5c/0x80
         ? do_user_addr_fault+0x1d8/0x680
         ? do_syscall_64+0x69/0x80
         ? exc_page_fault+0x62/0x140
         ? asm_exc_page_fault+0x8/0x30
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The command was completed in the abort path during driver unload with a
      lock held, causing the warning in the abort path. Hence, complete the
      command without any lock held.
      
      Reported-by: Lin Li <lilin@redhat.com>
      Tested-by: Lin Li <lilin@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Nilesh Javali <njavali@marvell.com>
      Link: https://lore.kernel.org/r/20230313043711.13500-2-njavali@marvell.com
      Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
      Reviewed-by: John Meneghini <jmeneghi@redhat.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
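One common way to "complete the command without any lock held", as the abort-path commit above puts it, is to detach the outstanding commands from the shared list while holding the lock, drop the lock, and only then run the completions (which may free DMA memory). The userspace sketch below assumes that pattern; it is not necessarily how the driver implements it. The spinlock is a minimal `atomic_flag` stand-in, and `struct host`, `struct cmd`, `complete_cmd()` and `abort_all_cmds()` are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal spinlock built on atomic_flag, standing in for the driver's lock. */
struct spinlock {
	atomic_flag flag;
	bool held;              /* illustration only: lets us detect misuse */
};

static void spin_lock(struct spinlock *l)
{
	while (atomic_flag_test_and_set(&l->flag))
		;               /* spin until the flag was previously clear */
	l->held = true;
}

static void spin_unlock(struct spinlock *l)
{
	l->held = false;
	atomic_flag_clear(&l->flag);
}

/* Hypothetical outstanding command on a singly linked list. */
struct cmd {
	struct cmd *next;
	bool done;
};

struct host {
	struct spinlock lock;
	struct cmd *outstanding;
	int completed_with_lock; /* counts completions that would have warned */
};

/* Completion may free DMA memory, so it must run with the lock dropped. */
static void complete_cmd(struct host *h, struct cmd *c)
{
	if (h->lock.held)
		h->completed_with_lock++;
	c->done = true;
}

/* Detach all outstanding commands under the lock, then complete unlocked. */
void abort_all_cmds(struct host *h)
{
	struct cmd *list;

	spin_lock(&h->lock);
	list = h->outstanding;
	h->outstanding = NULL;
	spin_unlock(&h->lock);

	while (list) {
		struct cmd *next = list->next;  /* read before completion */

		complete_cmd(h, list);
		list = next;
	}
}
```

Detaching the whole list first means no other path can find these commands once the lock is released, so completing them unlocked does not race with a second completion.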
    • scsi: core: Add BLIST_SKIP_VPD_PAGES for SKhynix H28U74301AMR · a204b490
      Joel Selvaraj authored
      The Xiaomi Poco F1 (qcom/sdm845-xiaomi-beryllium*.dts) comes with an
      SKhynix H28U74301AMR UFS. The sd_read_cpr() operation leads to a
      120-second timeout, making device boot-up very slow:
      
      [  121.457736] sd 0:0:0:1: [sdb] tag#23 timing out command, waited 120s
      
      Setting BLIST_SKIP_VPD_PAGES allows the device to skip the failing
      sd_read_cpr() operation and boot normally.
      
      Signed-off-by: Joel Selvaraj <joelselvaraj.oss@gmail.com>
      Link: https://lore.kernel.org/r/20230313041402.39330-1-joelselvaraj.oss@gmail.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
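The BLIST_SKIP_VPD_PAGES commit above relies on a per-device quirk table: the SCSI core looks up the device's vendor and model strings and applies the matching flags. The sketch below is loosely modeled on that idea. The table layout, the flag value `BLIST_SKIP_VPD_PAGES_SKETCH`, the exact-string matching and the `should_read_vpd()` helper are all simplifying assumptions, not the kernel's definitions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag value; the kernel defines its own BLIST_* bits. */
#define BLIST_SKIP_VPD_PAGES_SKETCH (1u << 0)

struct dev_info {
	const char *vendor;
	const char *model;
	unsigned int flags;
};

/* Quirk table: one entry for the UFS part named in the commit. */
static const struct dev_info dev_info_table[] = {
	{ "SKhynix", "H28U74301AMR", BLIST_SKIP_VPD_PAGES_SKETCH },
};

/* Return the quirk flags for an exact vendor/model match, or 0. */
unsigned int lookup_dev_flags(const char *vendor, const char *model)
{
	size_t n = sizeof(dev_info_table) / sizeof(dev_info_table[0]);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(dev_info_table[i].vendor, vendor) == 0 &&
		    strcmp(dev_info_table[i].model, model) == 0)
			return dev_info_table[i].flags;
	}
	return 0;
}

/* A caller consults the flag before issuing any VPD-page read. */
int should_read_vpd(unsigned int flags)
{
	return !(flags & BLIST_SKIP_VPD_PAGES_SKETCH);
}
```

For the listed part, `should_read_vpd()` returns 0, so the VPD-based read that timed out is never issued; unlisted devices keep the default behavior.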
  5. 10 Mar, 2023 3 commits
  6. 08 Mar, 2023 7 commits
  7. 06 Mar, 2023 17 commits