1. 17 Mar, 2023 3 commits
    • scsi: qla2xxx: Synchronize the IOCB count to be in order · d3affdeb
      Quinn Tran authored
      A system hang was observed with the following call trace:
      
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      PGD 0 P4D 0
      Oops: 0000 [#1] PREEMPT SMP NOPTI
      CPU: 15 PID: 86747 Comm: nvme Kdump: loaded Not tainted 6.2.0+ #1
      Hardware name: Dell Inc. PowerEdge R6515/04F3CJ, BIOS 2.7.3 03/31/2022
      RIP: 0010:__wake_up_common+0x55/0x190
      Code: 41 f6 01 04 0f 85 b2 00 00 00 48 8b 43 08 4c 8d
            40 e8 48 8d 43 08 48 89 04 24 48 89 c6
            49 8d 40 18 48 39 c6 0f 84 e9 00 00 00 <49> 8b 40 18 89 6c 24 14 31
            ed 4c 8d 60 e8 41 8b 18 f6 c3 04 75 5d
      RSP: 0018:ffffb05a82afbba0 EFLAGS: 00010082
      RAX: 0000000000000000 RBX: ffff8f9b83a00018 RCX: 0000000000000000
      RDX: 0000000000000001 RSI: ffff8f9b83a00020 RDI: ffff8f9b83a00018
      RBP: 0000000000000001 R08: ffffffffffffffe8 R09: ffffb05a82afbbf8
      R10: 70735f7472617473 R11: 5f30307832616c71 R12: 0000000000000001
      R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
      FS:  00007f815cf4c740(0000) GS:ffff8f9eeed80000(0000)
      	knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 000000010633a000 CR4: 0000000000350ee0
      Call Trace:
          <TASK>
          __wake_up_common_lock+0x83/0xd0
          qla_nvme_ls_req+0x21b/0x2b0 [qla2xxx]
          __nvme_fc_send_ls_req+0x1b5/0x350 [nvme_fc]
          nvme_fc_xmt_disconnect_assoc+0xca/0x110 [nvme_fc]
          nvme_fc_delete_association+0x1bf/0x220 [nvme_fc]
          ? nvme_remove_namespaces+0x9f/0x140 [nvme_core]
          nvme_do_delete_ctrl+0x5b/0xa0 [nvme_core]
          nvme_sysfs_delete+0x5f/0x70 [nvme_core]
          kernfs_fop_write_iter+0x12b/0x1c0
          vfs_write+0x2a3/0x3b0
          ksys_write+0x5f/0xe0
          do_syscall_64+0x5c/0x90
          ? syscall_exit_work+0x103/0x130
          ? syscall_exit_to_user_mode+0x12/0x30
          ? do_syscall_64+0x69/0x90
          ? exit_to_user_mode_loop+0xd0/0x130
          ? exit_to_user_mode_prepare+0xec/0x100
          ? syscall_exit_to_user_mode+0x12/0x30
          ? do_syscall_64+0x69/0x90
          ? syscall_exit_to_user_mode+0x12/0x30
          ? do_syscall_64+0x69/0x90
          entry_SYSCALL_64_after_hwframe+0x72/0xdc
          RIP: 0033:0x7f815cd3eb97
      
      The IOCB counts are out of order, which blocks any commands from going
      out and subsequently hangs the system. Synchronize the IOCB count so that
      it is in the correct order.
      
      Fixes: 5f63a163 ("scsi: qla2xxx: Fix exchange oversubscription for management commands")
      Cc: stable@vger.kernel.org
      Signed-off-by: Quinn Tran <qutran@marvell.com>
      Signed-off-by: Nilesh Javali <njavali@marvell.com>
      Link: https://lore.kernel.org/r/20230313043711.13500-3-njavali@marvell.com
      Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
      Reviewed-by: John Meneghini <jmeneghi@redhat.com>
      Tested-by: Lin Li <lilin@redhat.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: qla2xxx: Perform lockless command completion in abort path · 0367076b
      Nilesh Javali authored
      While adding and removing the controller, the following call trace was
      observed:
      
      WARNING: CPU: 3 PID: 623596 at kernel/dma/mapping.c:532 dma_free_attrs+0x33/0x50
      CPU: 3 PID: 623596 Comm: sh Kdump: loaded Not tainted 5.14.0-96.el9.x86_64 #1
      RIP: 0010:dma_free_attrs+0x33/0x50
      
      Call Trace:
         qla2x00_async_sns_sp_done+0x107/0x1b0 [qla2xxx]
         qla2x00_abort_srb+0x8e/0x250 [qla2xxx]
         ? ql_dbg+0x70/0x100 [qla2xxx]
         __qla2x00_abort_all_cmds+0x108/0x190 [qla2xxx]
         qla2x00_abort_all_cmds+0x24/0x70 [qla2xxx]
         qla2x00_abort_isp_cleanup+0x305/0x3e0 [qla2xxx]
         qla2x00_remove_one+0x364/0x400 [qla2xxx]
         pci_device_remove+0x36/0xa0
         __device_release_driver+0x17a/0x230
         device_release_driver+0x24/0x30
         pci_stop_bus_device+0x68/0x90
         pci_stop_and_remove_bus_device_locked+0x16/0x30
         remove_store+0x75/0x90
         kernfs_fop_write_iter+0x11c/0x1b0
         new_sync_write+0x11f/0x1b0
         vfs_write+0x1eb/0x280
         ksys_write+0x5f/0xe0
         do_syscall_64+0x5c/0x80
         ? do_user_addr_fault+0x1d8/0x680
         ? do_syscall_64+0x69/0x80
         ? exc_page_fault+0x62/0x140
         ? asm_exc_page_fault+0x8/0x30
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The command was completed in the abort path during driver unload with a
      lock held, which triggered the warning above. Hence complete the command
      without any lock held.
      Reported-by: Lin Li <lilin@redhat.com>
      Tested-by: Lin Li <lilin@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Nilesh Javali <njavali@marvell.com>
      Link: https://lore.kernel.org/r/20230313043711.13500-2-njavali@marvell.com
      Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
      Reviewed-by: John Meneghini <jmeneghi@redhat.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
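
The pattern described in this commit message (drop the lock around the completion callback, then retake it) can be illustrated with a small userspace sketch. This is not the qla2xxx code: `toy_lock`, `toy_cmd`, `toy_cmd_done()` and `toy_abort_all()` are hypothetical names, the "lock" is a single-threaded flag standing in for a spinlock, and the assertion in the callback only mimics a callee that must not run with the lock held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a spinlock; single-threaded, for illustration only. */
struct toy_lock { bool held; };

static void toy_lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = true; }
static void toy_lock_release(struct toy_lock *l) { assert(l->held); l->held = false; }

struct toy_cmd {
	struct toy_lock *lock;  /* lock protecting the outstanding-command list */
	int completed;          /* number of times the command was completed */
};

/* Hypothetical completion callback that must not run under the lock. */
static void toy_cmd_done(struct toy_cmd *cmd)
{
	assert(!cmd->lock->held);
	cmd->completed++;
}

/* Abort every command. The caller holds the lock on entry and on exit. */
static void toy_abort_all(struct toy_cmd *cmds, size_t n, struct toy_lock *lock)
{
	for (size_t i = 0; i < n; i++) {
		toy_lock_release(lock);   /* drop the lock before completing */
		toy_cmd_done(&cmds[i]);
		toy_lock_acquire(lock);   /* retake it to continue the walk */
	}
}

/* Small driver: returns the total number of completions performed. */
static int toy_abort_demo(void)
{
	struct toy_lock lock = { false };
	struct toy_cmd cmds[2] = { { &lock, 0 }, { &lock, 0 } };

	toy_lock_acquire(&lock);
	toy_abort_all(cmds, 2, &lock);
	toy_lock_release(&lock);
	return cmds[0].completed + cmds[1].completed;
}
```

Calling `toy_cmd_done()` without releasing the lock first would trip the assertion in the callback, which is roughly the role the warning plays in the trace above.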
    • scsi: core: Add BLIST_SKIP_VPD_PAGES for SKhynix H28U74301AMR · a204b490
      Joel Selvaraj authored
      The Xiaomi Poco F1 (qcom/sdm845-xiaomi-beryllium*.dts) comes with a
      SKhynix H28U74301AMR UFS. The sd_read_cpr() operation leads to a
      120-second timeout, making the device boot very slowly:
      
      [  121.457736] sd 0:0:0:1: [sdb] tag#23 timing out command, waited 120s
      
      Setting BLIST_SKIP_VPD_PAGES skips the failing sd_read_cpr() operation
      and lets the device boot normally.
      Signed-off-by: Joel Selvaraj <joelselvaraj.oss@gmail.com>
      Link: https://lore.kernel.org/r/20230313041402.39330-1-joelselvaraj.oss@gmail.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  2. 10 Mar, 2023 3 commits
  3. 08 Mar, 2023 7 commits
  4. 06 Mar, 2023 20 commits
  5. 05 Mar, 2023 7 commits
    • Linux 6.3-rc1 · fe15c26e
      Linus Torvalds authored
    • cpumask: re-introduce constant-sized cpumask optimizations · 596ff4a0
      Linus Torvalds authored
      Commit aa47a7c2 ("lib/cpumask: deprecate nr_cpumask_bits") resulted
      in the cpumask operations potentially becoming hugely less efficient,
      because suddenly the cpumask was always considered to be variable-sized.
      
      The optimization was then later added back in a limited form by commit
      6f9c07be ("lib/cpumask: add FORCE_NR_CPUS config option"), but that
      FORCE_NR_CPUS option is not useful in a generic kernel and more of a
      special case for embedded situations with fixed hardware.
      
      Instead, just re-introduce the optimization, with some changes.
      
      Instead of depending on CPUMASK_OFFSTACK being false, and then always
      using the full constant cpumask width, this introduces three different
      cpumask "sizes":
      
       - the exact size (nr_cpumask_bits) remains identical to nr_cpu_ids.
      
         This is used for situations where we should use the exact size.
      
       - the "small" size (small_cpumask_bits) is the NR_CPUS constant if it
         fits in a single word and the bitmap operations thus end up able
         to trigger the "small_const_nbits()" optimizations.
      
         This is used for the operations that have optimized single-word
         cases that get inlined, notably the bit find and scanning functions.
      
       - the "large" size (large_cpumask_bits) is the NR_CPUS constant if it
         is a sufficiently small constant that makes simple "copy" and
         "clear" operations more efficient.
      
         This is arbitrarily set at four words or less.
      
      As an example of this situation, without this fixed size optimization,
      cpumask_clear() will generate code like
      
              movl    nr_cpu_ids(%rip), %edx
              addq    $63, %rdx
              shrq    $3, %rdx
              andl    $-8, %edx
              callq   memset@PLT
      
      on x86-64, because it would calculate the "exact" number of longwords
      that need to be cleared.
      
      In contrast, with this patch, using a MAX_CPU of 64 (which is quite a
      reasonable value to use), the above becomes a single
      
      	movq $0,cpumask
      
      instruction instead, because instead of caring to figure out exactly how
      many CPUs the system has, it just knows that the cpumask will be a
      single word and can just clear it all.
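
The difference between the "exact" and the constant-sized clear can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel implementation: the `sketch_*` helpers are hypothetical, `NR_CPUS` is assumed to be 64 and `nr_cpu_ids` to be 8, and `large_cpumask_bits` is modelled as a macro that picks the constant when it fits in four words.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Assumed values for illustration; not taken from any real configuration. */
#define NR_CPUS 64                      /* build-time maximum */
static unsigned int nr_cpu_ids = 8;     /* CPUs actually present at run time */

#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct cpumask { unsigned long bits[BITS_TO_LONGS(NR_CPUS)]; };

/* "Large" size: the NR_CPUS constant if it fits in four words, else exact. */
#define large_cpumask_bits \
	((NR_CPUS <= 4 * BITS_PER_LONG) ? (unsigned int)NR_CPUS : nr_cpu_ids)

/* Exact clear: the length depends on a run-time value, so it stays a memset. */
static void sketch_clear_exact(struct cpumask *m)
{
	memset(m->bits, 0, BITS_TO_LONGS(nr_cpu_ids) * sizeof(unsigned long));
}

/* Constant clear: the length is known at compile time and can be folded. */
static void sketch_clear_const(struct cpumask *m)
{
	memset(m->bits, 0, BITS_TO_LONGS(large_cpumask_bits) * sizeof(unsigned long));
}

/* Setting bits stays limited to the exact nr_cpu_ids range. */
static void sketch_set_cpu(unsigned int cpu, struct cpumask *m)
{
	assert(cpu < nr_cpu_ids);
	m->bits[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);
}

static bool sketch_test_cpu(unsigned int cpu, const struct cpumask *m)
{
	return (m->bits[cpu / BITS_PER_LONG] >> (cpu % BITS_PER_LONG)) & 1UL;
}
```

With a constant length, an optimizing compiler is free to turn `sketch_clear_const()` into plain stores, whereas `sketch_clear_exact()` has to compute the length from `nr_cpu_ids` first.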
      
      Note that this does end up tightening the rules a bit from the original
      version in another way: operations that set bits in the cpumask are now
      limited to the actual nr_cpu_ids limit, whereas we used to do the
      nr_cpumask_bits thing almost everywhere in the cpumask code.
      
      But if you just clear bits, or scan for bits, we can use the simpler
      compile-time constants.
      
      In the process, remove 'cpumask_complement()' and 'for_each_cpu_not()'
      which were not useful, and which fundamentally have to be limited to
      'nr_cpu_ids'.  Better remove them now than have somebody introduce use
      of them later.
      
      Of course, on x86-64 with MAXSMP there is no sane small compile-time
      constant for the cpumask sizes, and we end up using the actual CPU bits,
      and will generate the above kind of horrors regardless.  Please don't
      use MAXSMP unless you really expect to have machines with thousands of
      cores.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'v6.3-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · f915322f
      Linus Torvalds authored
      Pull crypto fix from Herbert Xu:
       "Fix a regression in the caam driver"
      
      * tag 'v6.3-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: caam - Fix edesc/iv ordering mixup
    • Merge tag 'x86-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7f9ec7d8
      Linus Torvalds authored
      Pull x86 updates from Thomas Gleixner:
       "A small set of updates for x86:
      
         - Return -EIO instead of success when the certificate buffer for SEV
           guests is not large enough
      
         - Allow STIBP to be enabled with legacy IBRS. Legacy IBRS is cleared
           on return to userspace for performance reasons, but that leaves user
           space vulnerable to cross-thread attacks which STIBP prevents.
           Update the documentation accordingly"
      
      * tag 'x86-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        virt/sev-guest: Return -EIO if certificate buffer is not large enough
        Documentation/hw-vuln: Document the interaction between IBRS and STIBP
        x86/speculation: Allow enabling STIBP with legacy IBRS
    • Merge tag 'irq-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4e9c542c
      Linus Torvalds authored
      Pull irq updates from Thomas Gleixner:
       "A set of updates for the interrupt subsystem:
      
         - Prevent possible NULL pointer dereferences in
           irq_data_get_affinity_mask() and irq_domain_create_hierarchy()
      
         - Take the per device MSI lock before invoking code which relies on
           it being held
      
         - Make sure that MSI descriptors are unreferenced before freeing
           them. This was overlooked when the platform MSI code was converted
           to use core infrastructure and results in a false positive warning
      
         - Remove dead code in the MSI subsystem
      
         - Clarify the documentation for pci_msix_free_irq()
      
         - More kobj_type constification"
      
      * tag 'irq-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq/msi, platform-msi: Ensure that MSI descriptors are unreferenced
        genirq/msi: Drop dead domain name assignment
        irqdomain: Add missing NULL pointer check in irq_domain_create_hierarchy()
        genirq/irqdesc: Make kobj_type structures constant
        PCI/MSI: Clarify usage of pci_msix_free_irq()
        genirq/msi: Take the per-device MSI lock before validating the control structure
        genirq/ipi: Fix NULL pointer deref in irq_data_get_affinity_mask()
    • Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 1a90673e
      Linus Torvalds authored
      Pull vfs update from Al Viro:
       "Adding Christian Brauner as VFS co-maintainer"
      
      * tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        Adding VFS co-maintainer
    • Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 1a8d05a7
      Linus Torvalds authored
      Pull VM_FAULT_RETRY fixes from Al Viro:
       "Some of the page fault handlers do not deal with the following case
        correctly:
      
         - handle_mm_fault() has returned VM_FAULT_RETRY
      
         - there is a pending fatal signal
      
         - fault had happened in kernel mode
      
        The correct action in such a case is not "return unconditionally":
        fatal signals are handled only upon return to userland, and something
        like copy_to_user() would end up retrying the faulting instruction and
        triggering the same fault again and again.
      
        What we need to do in such a case is to make the caller treat that as
        a failed uaccess attempt: handle the exception if there is an exception
        handler for the faulting instruction, or oops if there isn't one.
      
        Over the years some architectures had been fixed and now are handling
        that case properly; some still do not. This series should fix the
        remaining ones.
      
        Status:
      
         - m68k, riscv, hexagon, parisc: tested/acked by maintainers.
      
         - alpha, sparc32, sparc64: tested locally - bug has been reproduced
           on the unpatched kernel and verified to be fixed by this series.
      
         - ia64, microblaze, nios2, openrisc: build, but otherwise completely
           untested"
      
      * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        openrisc: fix livelock in uaccess
        nios2: fix livelock in uaccess
        microblaze: fix livelock in uaccess
        ia64: fix livelock in uaccess
        sparc: fix livelock in uaccess
        alpha: fix livelock in uaccess
        parisc: fix livelock in uaccess
        hexagon: fix livelock in uaccess
        riscv: fix livelock in uaccess
        m68k: fix livelock in uaccess
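
The decision described in the pull message above can be written down as a small, self-contained decision function. This is a sketch of the logic only, not any architecture's fault handler: `sketch_after_fault()`, the `sketch_action` values and the `SKETCH_VM_FAULT_RETRY` flag value are hypothetical names chosen for illustration.

```c
#include <assert.h>   /* only needed for the usage checks described below */
#include <stdbool.h>

#define SKETCH_VM_FAULT_RETRY 0x400u   /* hypothetical flag value */

enum sketch_action {
	SKETCH_DONE,            /* fault handled, resume normally */
	SKETCH_RETRY,           /* retry the fault */
	SKETCH_RETURN_TO_USER,  /* return; the fatal signal is handled on exit */
	SKETCH_FIXUP_OR_OOPS,   /* treat as a failed uaccess attempt */
};

/*
 * Decide what to do after the memory-management fault handler returns.
 * 'fatal_signal' models a pending fatal signal, 'user_mode' models whether
 * the fault happened in user mode.
 */
static enum sketch_action sketch_after_fault(unsigned int fault,
					     bool fatal_signal, bool user_mode)
{
	if (!(fault & SKETCH_VM_FAULT_RETRY))
		return SKETCH_DONE;
	if (!fatal_signal)
		return SKETCH_RETRY;
	/*
	 * A fatal signal is pending. Returning unconditionally is only safe
	 * for user-mode faults; in kernel mode the faulting instruction would
	 * be retried forever, so use the exception fixup (or oops) instead.
	 */
	return user_mode ? SKETCH_RETURN_TO_USER : SKETCH_FIXUP_OR_OOPS;
}
```

For a kernel-mode fault with a fatal signal pending, `sketch_after_fault(SKETCH_VM_FAULT_RETRY, true, false)` yields `SKETCH_FIXUP_OR_OOPS`, which is the case the series is about; the same call with `user_mode` set to true yields `SKETCH_RETURN_TO_USER`.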