1. 26 Sep, 2023 4 commits
    • rbd: take header_rwsem in rbd_dev_refresh() only when updating · 0b207d02
      Ilya Dryomov authored
      rbd_dev_refresh() has been holding header_rwsem across header and
      parent info read-in unnecessarily for ages.  With commit 870611e4
      ("rbd: get snapshot context after exclusive lock is ensured to be
      held"), the potential for deadlocks became much more real owing to
      a) header_rwsem now nesting inside lock_rwsem and b) rw_semaphores
      not allowing new readers after a writer is registered.
      
      For example, assuming that I/O request 1, I/O request 2 and header
      read-in request all target the same OSD:
      
      1. I/O request 1 comes in and gets submitted
      2. watch error occurs
      3. rbd_watch_errcb() takes lock_rwsem for write, clears owner_cid and
         releases lock_rwsem
      4. after reestablishing the watch, rbd_reregister_watch() calls
         rbd_dev_refresh() which takes header_rwsem for write and submits
         a header read-in request
      5. I/O request 2 comes in: after taking lock_rwsem for read in
         __rbd_img_handle_request(), it blocks trying to take header_rwsem
         for read in rbd_img_object_requests()
      6. another watch error occurs
      7. rbd_watch_errcb() blocks trying to take lock_rwsem for write
      8. I/O request 1 completion is received by the messenger but can't be
         processed because lock_rwsem won't be granted anymore
      9. header read-in request completion can't be received, let alone
         processed, because the messenger is stranded
      
      Change rbd_dev_refresh() to take header_rwsem only for actually
      updating rbd_dev->header.  Header and parent info read-in don't need
      any locking.
      
      Cc: stable@vger.kernel.org # 0b035401: rbd: move rbd_dev_refresh() definition
      Cc: stable@vger.kernel.org # 510a7330: rbd: decouple header read-in from updating rbd_dev->header
      Cc: stable@vger.kernel.org # c1031177: rbd: decouple parent info read-in from updating rbd_dev
      Cc: stable@vger.kernel.org
      Fixes: 870611e4 ("rbd: get snapshot context after exclusive lock is ensured to be held")
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
    • rbd: decouple parent info read-in from updating rbd_dev · c1031177
      Ilya Dryomov authored
      Unlike header read-in, parent info read-in is already decoupled in
      get_parent_info(), but it's buried in rbd_dev_v2_parent_info() along
      with the processing logic.
      
      Separate the initial read-in and update read-in logic into
      rbd_dev_setup_parent() and rbd_dev_update_parent() respectively and
      have rbd_dev_v2_parent_info() just populate struct parent_image_info
      (i.e. what get_parent_info() did).  Some existing QoI issues, like
      flatten of a standalone clone being disregarded on refresh, remain.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
    • rbd: decouple header read-in from updating rbd_dev->header · 510a7330
      Ilya Dryomov authored
      Make rbd_dev_header_info() populate a passed struct rbd_image_header
      instead of rbd_dev->header and introduce rbd_dev_update_header() for
      updating mutable fields in rbd_dev->header upon refresh.  The initial
      read-in of both mutable and immutable fields in rbd_dev_image_probe()
      passes in rbd_dev->header so no update step is required there.
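      A sketch of how the two paths differ, under stated assumptions:
      header_info(), image_probe(), update_header() and image_refresh() are
      hypothetical, and which fields count as mutable (image_size, snap_seq)
      versus immutable (object_prefix, features) is illustrative rather than
      taken from rbd. Probe reads straight into the device header; refresh
      reads into a temporary and copies over only the mutable fields.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout. */
struct image_header {
	char object_prefix[32]; /* immutable: set once at probe */
	uint64_t features;      /* immutable: set once at probe */
	uint64_t image_size;    /* mutable: may change on refresh */
	uint64_t snap_seq;      /* mutable: may change on refresh */
};

/* Hypothetical read-in: fills whatever struct it is handed. */
static void header_info(const struct image_header *remote,
			struct image_header *out)
{
	assert(remote && out);
	memcpy(out, remote, sizeof(*out));
}

/* Probe: read directly into the device header, no update step. */
static void image_probe(struct image_header *dev_hdr,
			const struct image_header *remote)
{
	header_info(remote, dev_hdr);
}

/* Update step: copy only the fields allowed to change after probe. */
static void update_header(struct image_header *dev_hdr,
			  const struct image_header *fresh)
{
	dev_hdr->image_size = fresh->image_size;
	dev_hdr->snap_seq = fresh->snap_seq;
}

/* Refresh: read into a temporary, then apply the update step. */
static void image_refresh(struct image_header *dev_hdr,
			  const struct image_header *remote)
{
	struct image_header fresh;

	header_info(remote, &fresh);
	update_header(dev_hdr, &fresh);
}
```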
      
      rbd_init_layout() is now called directly from rbd_dev_image_probe()
      instead of individually in format 1 and format 2 implementations.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
    • rbd: move rbd_dev_refresh() definition · 0b035401
      Ilya Dryomov authored
      Move rbd_dev_refresh() definition further down to avoid having to
      move struct parent_image_info definition in the next commit.  This
      spares some forward declarations too.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
  2. 18 Sep, 2023 2 commits
  3. 17 Sep, 2023 11 commits
  4. 16 Sep, 2023 12 commits
    • Merge tag 'kbuild-fixes-v6.6' of... · f0b0d403
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - Fix kernel-devel RPM and linux-headers Deb package
      
       - Fix too long argument list error in 'make modules_install'
      
      * tag 'kbuild-fixes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kbuild: avoid long argument lists in make modules_install
        kbuild: fix kernel-devel RPM package and linux-headers Deb package
    • vm: fix move_vma() memory accounting being off · 3cec5049
      Linus Torvalds authored
      Commit 408579cd ("mm: Update do_vmi_align_munmap() return
      semantics") seems to have updated one of the callers of do_vmi_munmap()
      incorrectly: it used to check for the error case (which didn't
      change: negative means error).
      
      That commit changed the check to the success case (which did change:
      before that commit, 0 was success, and 1 was "success and lock
      downgraded".  After the change, it's always 0 for success, and the lock
      will have been released if requested).
      
      This didn't change any actual VM behavior _except_ for memory accounting
      when 'VM_ACCOUNT' was set on the vma.  Which made the wrong return value
      test fairly subtle, since everything continues to work.
      
      Or rather - it continues to work but the "Committed memory" accounting
      goes all wonky (Committed_AS value in /proc/meminfo), and depending on
      settings that then causes problems much much later as the VM relies on
      bogus statistics for its heuristics.
      
      Revert that one line of the change back to the original logic.
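      The return-value pitfall can be shown with a toy. This is a hypothetical
      model, not move_vma(): fake_munmap() follows the new convention (0 on
      success, negative errno on failure), and an accounting fixup that should
      only run when the unmap fails adds pages back to a made-up committed
      counter.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the Committed_AS counter, in pages. */
static long committed;

static void acct_memory(long pages)
{
	committed += pages;
}

/* Hypothetical unmap using the new convention: 0 on success, negative
 * errno on failure (the old convention also allowed 1 on success). */
static int fake_munmap(int fail)
{
	return fail ? -ENOMEM : 0;
}

/* Correct: the fixup only runs when the unmap failed. */
static void unmap_old_fixed(long pages, int fail)
{
	if (fake_munmap(fail) < 0)
		acct_memory(pages);
}

/* Buggy: tests for success instead, so every successful unmap charges
 * the pages again and the toy counter drifts upward. */
static void unmap_old_buggy(long pages, int fail)
{
	if (!fake_munmap(fail))
		acct_memory(pages);
}
```

      Testing for a negative value is correct under both the old convention
      (0 or 1 on success) and the new one, which is why the original
      error-case check never needed to change.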
      
      Fixes: 408579cd ("mm: Update do_vmi_align_munmap() return semantics")
      Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
      Reported-bisected-and-tested-by: Michael Labiuk <michael.labiuk@virtuozzo.com>
      Cc: Bagas Sanjaya <bagasdotme@gmail.com>
      Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
      Link: https://lore.kernel.org/all/1694366957@msgid.manchmal.in-ulm.de/
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · ad8a69f3
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "16 small(ish) fixes all in drivers.
      
        The major fixes are in pm8001 (fixes MSI-X issue going back to its
        origin), the qla2xxx endianness fix, which fixes a bug on big endian
        and the lpfc ones which can cause an oops on module removal without
        them"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: lpfc: Prevent use-after-free during rmmod with mapped NVMe rports
        scsi: lpfc: Early return after marking final NLP_DROPPED flag in dev_loss_tmo
        scsi: lpfc: Fix the NULL vs IS_ERR() bug for debugfs_create_file()
        scsi: target: core: Fix target_cmd_counter leak
        scsi: pm8001: Setup IRQs on resume
        scsi: pm80xx: Avoid leaking tags when processing OPC_INB_SET_CONTROLLER_CONFIG command
        scsi: pm80xx: Use phy-specific SAS address when sending PHY_START command
        scsi: ufs: core: Poll HCS.UCRDY before issuing a UIC command
        scsi: ufs: core: Move __ufshcd_send_uic_cmd() outside host_lock
        scsi: qedf: Add synchronization between I/O completions and abort
        scsi: target: Replace strlcpy() with strscpy()
        scsi: qla2xxx: Fix NULL vs IS_ERR() bug for debugfs_create_dir()
        scsi: qla2xxx: Use raw_smp_processor_id() instead of smp_processor_id()
        scsi: qla2xxx: Correct endianness for rqstlen and rsplen
        scsi: ppa: Fix accidentally reversed conditions for 16-bit and 32-bit EPP
        scsi: megaraid_sas: Fix deadlock on firmware crashdump
    • Merge tag 'ata-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata · cc3e5afc
      Linus Torvalds authored
      Pull ata fixes from Damien Le Moal:
      
       - Fix link power management transitions to disallow unsupported states
         (Niklas)
      
       - A small string handling fix for the sata_mv driver (Christophe)
      
       - Clear port pending interrupts before reset, as per AHCI
         specifications (Szuying).
      
         Followup fixes for this one are to not clear ATA_PFLAG_EH_PENDING in
         ata_eh_reset() to allow EH to continue on with other actions recorded
         with error interrupts triggered before EH completes. And an
         additional fix to avoid thawing a port twice in EH (Niklas)
      
       - Small code style fixes in the pata_parport driver to silence the
         build bot as it keeps complaining about bad indentation (me)
      
       - A fix for the recent CDL code to avoid fetching sense data for
         successful commands when not necessary for correct operation (Niklas)
      
      * tag 'ata-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
        ata: libata-core: fetch sense data for successful commands iff CDL enabled
        ata: libata-eh: do not thaw the port twice in ata_eh_reset()
        ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset()
        ata: pata_parport: Fix code style issues
        ata: libahci: clear pending interrupt status
        ata: sata_mv: Fix incorrect string length computation in mv_dump_mem()
        ata: libata: disallow dev-initiated LPM transitions to unsupported states
    • Merge tag 'usb-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · cce67b6b
      Linus Torvalds authored
      Pull USB fix from Greg KH:
       "Here is a single USB fix for a much-reported regression for 6.6-rc1.
      
        It resolves a crash in the typec debugfs code for many systems. It's
        been in linux-next with no reported issues, and many people have
        reported it resolving their problem with 6.6-rc1"
      
      * tag 'usb-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: typec: ucsi: Fix NULL pointer dereference
    • Merge tag 'driver-core-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core · 205d0494
      Linus Torvalds authored
      Pull driver core fixes from Greg KH:
       "Here is a single driver core fix for a much-reported-by-sysbot issue
        that showed up in 6.6-rc1. It's been submitted by many people, all in
        the same way, so it obviously fixes things for them all.
      
        Also in here is a single documentation update adding riscv to the
        embargoed hardware document in case there are any future issues with
        that processor family.
      
        Both of these have been in linux-next with no reported problems"
      
      * tag 'driver-core-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        Documentation: embargoed-hardware-issues.rst: Add myself for RISC-V
        driver core: return an error when dev_set_name() hasn't happened
    • Merge tag 'char-misc-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · fd455e77
      Linus Torvalds authored
      Pull char/misc fix from Greg KH:
       "Here is a single patch for 6.6-rc2 that reverts a 6.5 change for the
        comedi subsystem that has ended up being incorrect and caused drivers
        that were working for people to no longer be selectable for building
        at all.
      
        To fix this, the Kconfig change needs to be reverted and a future set
        of fixes for the ioport dependencies will show up in 6.7-rc1 (there's
        no rush for them.)
      
        This has been in linux-next with no reported issues"
      
      * tag 'char-misc-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        Revert "comedi: add HAS_IOPORT dependencies"
    • Merge tag 'i2c-for-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · c37f8efc
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "The main thing is the removal of 'probe_new' because all i2c client
        drivers are converted now. Thanks Uwe, this marks the end of a long
        conversion process.
      
        Other than that, we have a few Kconfig updates and driver bugfixes"
      
      * tag 'i2c-for-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: cadence: Fix the kernel-doc warnings
        i2c: aspeed: Reset the i2c controller when timeout occurs
        i2c: I2C_MLXCPLD on ARM64 should depend on ACPI
        i2c: Make I2C_ATR invisible
        i2c: Drop legacy callback .probe_new()
        w1: ds2482: Switch back to use struct i2c_driver's .probe()
    • ata: libata-core: fetch sense data for successful commands iff CDL enabled · 5e35a9ac
      Niklas Cassel authored
      Currently, we fetch sense data for a _successful_ command if either:
      1) Command was NCQ and ATA_DFLAG_CDL_ENABLED flag set (flag
         ATA_DFLAG_CDL_ENABLED will only be set if the Successful NCQ command
         sense data supported bit is set); or
      2) Command was non-NCQ and regular sense data reporting is enabled.
      
      This means that case 2) will trigger for a non-NCQ command which has the
      ATA_SENSE bit set, regardless of whether CDL is enabled.
      
      This decision was by design. If the device reports that it has sense data
      available, it makes sense to fetch that sense data, since the sk/asc/ascq
      could be important information regardless of whether CDL is enabled.
      
      However, the fetching of sense data for a successful command is done via
      ATA EH. Considering how intricate the ATA EH is, we really do not want to
      invoke ATA EH unless absolutely needed.
      
      Before commit 18bd7718 ("scsi: ata: libata: Handle completion of CDL
      commands using policy 0xD") we never fetched sense data for successful
      commands.
      
      In order to not invoke the ATA EH unless absolutely necessary, even if the
      device claims support for sense data reporting, only fetch sense data for
      successful commands (NCQ and non-NCQ) that are using CDL.
      
      [Damien] Modified the check to test the qc flag ATA_QCFLAG_HAS_CDL
      instead of the device support for CDL, which is implied for commands
      using CDL.
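      The old and new decision rules described above can be written as two
      predicates. The flag names and bit values are hypothetical stand-ins for
      ATA_QCFLAG_HAS_CDL and the ATA_SENSE status bit, and the old rule is
      simplified to the two cases listed at the top of this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, for illustration only. */
#define QCFLAG_HAS_CDL (1u << 0) /* command was issued with CDL */
#define STATUS_SENSE   (1u << 1) /* device reports sense data available */

/* Old rule (simplified): case 1) NCQ with CDL enabled on the device,
 * case 2) non-NCQ with regular sense data reporting enabled. */
static bool fetch_sense_old(bool ncq, bool dev_cdl_enabled,
			    bool sense_reporting_enabled, unsigned int status)
{
	if (!(status & STATUS_SENSE))
		return false;
	return ncq ? dev_cdl_enabled : sense_reporting_enabled;
}

/* New rule: only commands that use CDL, NCQ or not, cause the sense
 * data fetch (and with it a trip through ATA EH). */
static bool fetch_sense_new(unsigned int qc_flags, unsigned int status)
{
	return (qc_flags & QCFLAG_HAS_CDL) && (status & STATUS_SENSE);
}
```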
      
      Fixes: 3ac873c7 ("ata: libata-core: fix when to fetch sense data for successful commands")
      Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
      Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
    • ata: libata-eh: do not thaw the port twice in ata_eh_reset() · 7a3bc2b3
      Niklas Cassel authored
      commit 1e641060 ("libata: clear eh_info on reset completion") added
      a workaround that broke the retry mechanism in ATA EH.
      
      Tejun himself suggested removing this workaround when it was identified
      to cause additional problems:
      https://lore.kernel.org/linux-ide/20110426135027.GI878@htj.dyndns.org/
      
      He even said:
      "Hmm... it seems I wasn't thinking straight when I added that work around."
      https://lore.kernel.org/linux-ide/20110426155229.GM878@htj.dyndns.org/
      
      Although removing the workaround solved the issue, the workaround was
      kept to avoid "spurious hotplug events during reset", and another
      workaround was instead added on top of it in commit
      8c56cacc ("libata: fix unexpectedly frozen port after ata_eh_reset()").
      
      Because these IRQs happened when the port was frozen, we know that they
      were actually a side effect of PxIS and IS.IPS(x) not being cleared before
      the COMRESET. This is now done in commit 737dd811 ("ata: libahci: clear
      pending interrupt status"), so these workarounds can now be removed.
      
      Since commit 1e641060 ("libata: clear eh_info on reset completion") has
      now been reverted, the ATA EH retry mechanism is functional again, so there
      is once again no need to thaw the port more than once in ata_eh_reset().
      
      This reverts "the workaround on top of the workaround" introduced in commit
      8c56cacc ("libata: fix unexpectedly frozen port after ata_eh_reset()").
      Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
      Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
    • ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset() · 80cc944e
      Niklas Cassel authored
      ata_scsi_port_error_handler() starts off by clearing ATA_PFLAG_EH_PENDING,
      before calling ap->ops->error_handler() (without holding the ap->lock).
      
      If an error IRQ is received while ap->ops->error_handler() is running,
      the irq handler will set ATA_PFLAG_EH_PENDING.
      
      Once ap->ops->error_handler() returns, ata_scsi_port_error_handler()
      checks if ATA_PFLAG_EH_PENDING is set, and if it is, another iteration
      of ATA EH is performed.
      
      The problem is that ATA_PFLAG_EH_PENDING is not only cleared by
      ata_scsi_port_error_handler(), it is also cleared by ata_eh_reset().
      
      ata_eh_reset() is called by ap->ops->error_handler(). This additional
      clearing done by ata_eh_reset() breaks the whole retry logic in
      ata_scsi_port_error_handler(). Thus, if an error IRQ is received while
      ap->ops->error_handler() is running, the port will currently remain
      frozen and will never get re-enabled.
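      A single-threaded toy model of the retry loop (all names hypothetical;
      the real code also takes ap->lock around the flag check): the outer loop
      clears the pending flag, runs the handler, and runs it again if the flag
      was set in the meantime. An extra clear inside the handler, standing in
      for the one in ata_eh_reset(), swallows an error IRQ that arrived mid-EH.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical port state. */
struct port {
	bool eh_pending;     /* models ATA_PFLAG_EH_PENDING */
	int irqs_during_eh;  /* error IRQs to inject while the handler runs */
	bool clear_in_reset; /* models the extra clear in ata_eh_reset() */
	int passes;          /* number of error handler runs */
};

static void error_irq(struct port *p)
{
	p->eh_pending = true;
}

/* Inner handler: an IRQ may arrive, then "reset" may wipe the flag. */
static void error_handler(struct port *p)
{
	p->passes++;
	if (p->irqs_during_eh > 0) {
		p->irqs_during_eh--;
		error_irq(p);
	}
	if (p->clear_in_reset)
		p->eh_pending = false;
}

/* Outer loop: clear, run the handler, re-check the flag. */
static void port_error_handler(struct port *p)
{
	do {
		p->eh_pending = false;
		error_handler(p);
	} while (p->eh_pending);
}
```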
      
      The additional clearing in ata_eh_reset() was introduced in commit
      1e641060 ("libata: clear eh_info on reset completion").
      
      Looking at the original error report:
      https://marc.info/?l=linux-ide&m=124765325828495&w=2
      
      We can see the following happening:
      [    1.074659] ata3: XXX port freeze
      [    1.074700] ata3: XXX hardresetting link, stopping engine
      [    1.074746] ata3: XXX flipping SControl
      
      [    1.411471] ata3: XXX irq_stat=400040 CONN|PHY
      [    1.411475] ata3: XXX port freeze
      
      [    1.420049] ata3: XXX starting engine
      [    1.420096] ata3: XXX rc=0, class=1
      [    1.420142] ata3: XXX clearing IRQs for thawing
      [    1.420188] ata3: XXX port thawed
      [    1.420234] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
      
      We are not supposed to be able to receive an error IRQ while the port is
      frozen (PxIE is set to 0, i.e. all IRQs for the port are disabled).
      
      AHCI 1.3.1 section 10.7.1.1 First Tier (IS Register) states:
      "Each bit location can be thought of as reporting a '1' if the virtual
      "interrupt line" for that port is indicating it wishes to generate an
      interrupt. That is, if a port has one or more interrupt status bit set,
      and the enables for those status bits are set, then this bit shall be set."
      
      Additionally, AHCI state P:ComInit clearly shows that the state machine
      will only jump to P:ComInitSetIS (which sets IS.IPS(x) to '1'), if PxIE.PCE
      is set to '1'. In our case, PxIE is set to 0, so IS.IPS(x) won't get set.
      
      So IS.IPS(x) only gets set if PxIS and PxIE are set.
      
      AHCI 1.3.1 section 10.7.1.1 First Tier (IS Register) also states:
      "The bits in this register are read/write clear. It is set by the level of
      the virtual interrupt line being a set, and cleared by a write of '1' from
      the software."
      
      So if IS.IPS(x) is set, you need to explicitly clear it by writing a 1 to
      IS.IPS(x) for that port.
      
      Since PxIE is cleared, the only way to get an interrupt while the port is
      frozen is if IS.IPS(x) is set, and the only way IS.IPS(x) can be set when
      the port is frozen is if it was set before the port was frozen.
      
      However, since commit 737dd811 ("ata: libahci: clear pending interrupt
      status"), we clear both PxIS and IS.IPS(x) after freezing the port, but
      before the COMRESET, so the problem that commit 1e641060 ("libata:
      clear eh_info on reset completion") fixed can no longer happen.
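      The two-tier behaviour quoted above can be modelled in a few lines. This
      is a hypothetical single-port sketch, not driver code: IS.IPS(0) latches
      only when PxIS & PxIE is nonzero, both registers are write-1-to-clear,
      and PxIE == 0 stands for a frozen port. The test value 0x00400040 is the
      irq_stat from the log above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-port model of the two AHCI interrupt tiers. */
struct hba {
	uint32_t pxis; /* port interrupt status (PxIS) */
	uint32_t pxie; /* port interrupt enable (PxIE); 0 means frozen */
	uint32_t is;   /* HBA IS register; bit 0 is IS.IPS(0) */
};

/* IS.IPS(0) is set only when a status bit is both pending and enabled. */
static void hba_update(struct hba *h)
{
	if (h->pxis & h->pxie)
		h->is |= 1u;
}

/* Both registers are write-1-to-clear. */
static void write_pxis(struct hba *h, uint32_t v)
{
	h->pxis &= ~v;
}

static void write_is(struct hba *h, uint32_t v)
{
	h->is &= ~v;
}

/* The HBA raises an interrupt whenever any IS bit is set. */
static int irq_asserted(const struct hba *h)
{
	return h->is != 0;
}
```

      In the model, an IS.IPS(0) bit latched before the freeze still raises an
      interrupt after PxIE is zeroed, which is why PxIS and IS.IPS(x) are
      cleared after freezing and before the COMRESET.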
      
      Thus, revert commit 1e641060 ("libata: clear eh_info on reset
      completion"), so that the retry logic in ata_scsi_port_error_handler()
      works once again. (The retry logic is still needed, since we can still
      get an error IRQ _after_ the port has been thawed, but before
      ata_scsi_port_error_handler() takes the ap->lock in order to check
      if ATA_PFLAG_EH_PENDING is set.)
      Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
      Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
    • Merge tag 'linux-kselftest-fixes-6.6-rc2' of... · 57d88e8a
      Linus Torvalds authored
      Merge tag 'linux-kselftest-fixes-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull more kselftest fixes from Shuah Khan:
       "Fixes to user_events test and ftrace test.
      
        The user_events test was enabled by default in Linux 6.6-rc1. The
        following fixes are for bugs found since then:
      
         - add checks for dependencies and skip the test if they aren't met.
      
           The user_events test requires root access, and tracefs and
           user_events enabled. It leaves tracefs mounted and a fix is in
           progress for that missing piece.
      
         - create user_events test-specific Kconfig fragments
      
        ftrace test fixes:
      
         - unmount tracefs for recovering environment. Fix identified during
           the above mentioned user_events dependencies fix.
      
         - adds softlink to latest log directory improving usage"
      
      * tag 'linux-kselftest-fixes-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        selftests: tracing: Fix to unmount tracefs for recovering environment
        selftests: user_events: create test-specific Kconfig fragments
        ftrace/selftests: Add softlink to latest log directory
        selftests/user_events: Fix failures when user_events is not installed
  5. 15 Sep, 2023 11 commits
    • Merge tag 'nfsd-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · d8d7cd65
      Linus Torvalds authored
      Pull nfsd fixes from Chuck Lever:
      
       - Use correct order when encoding NFSv4 RENAME change_info
      
       - Fix a potential oops during NFSD shutdown
      
      * tag 'nfsd-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
        NFSD: fix possible oops when nfsd/pool_stats is closed.
        nfsd: fix change_info in NFSv4 RENAME replies
    • Merge tag 'pm-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 4eb2bd24
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "Fix the handling of block devices in the test_resume mode of
        hibernation (Chen Yu)"
      
      * tag 'pm-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM: hibernate: Fix the exclusive get block device in test_resume mode
        PM: hibernate: Rename function parameter from snapshot_test to exclusive
    • Merge tag 'thermal-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · e2dd7a16
      Linus Torvalds authored
      Pull thermal control fixes from Rafael Wysocki:
       "These fix a thermal core breakage introduced by one of the recent
        changes, amend those changes by adding 'const' to a new callback
        argument and fix two memory leaks.
      
        Specifics:
      
         - Unbreak disabled trip point check in handle_thermal_trip() that may
           cause it to skip enabled trip points (Rafael Wysocki)
      
         - Add missing of_node_put() to of_find_trip_id() and
           thermal_of_for_each_cooling_maps() that each break out of a
           for_each_child_of_node() loop without dropping the reference to the
           child object (Julia Lawall)
      
         - Constify the recently added trip argument of the .get_trend()
           thermal zone callback (Rafael Wysocki)"
      
      * tag 'thermal-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        thermal: core: Fix disabled trip point check in handle_thermal_trip()
        thermal: Constify the trip argument of the .get_trend() zone callback
        thermal/of: add missing of_node_put()
    • Merge tag 'for-6.6/dm-fixes' of... · e39bfb59
      Linus Torvalds authored
      Merge tag 'for-6.6/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper fixes from Mike Snitzer:
      
       - Fix DM core retrieve_deps() UAF race due to missing locking of a DM
         table's list of devices that is managed using dm_{get,put}_device.
      
       - Revert DM core's half-baked RCU optimization if IO submitter has set
         REQ_NOWAIT. Can be revisited, and properly justified, after
         comprehensively auditing all of DM to also pass GFP_NOWAIT for any
         allocations if REQ_NOWAIT used.
      
      * tag 'for-6.6/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm: don't attempt to queue IO under RCU protection
        dm: fix a race condition in retrieve_deps
    • Merge tag 'block-6.6-2023-09-15' of git://git.kernel.dk/linux · 5bc357b2
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - NVMe pull via Keith:
            - nvme-tcp iov len fix (Varun)
            - nvme-hwmon const qualifier for safety (Krzysztof)
            - nvme-fc null pointer checks (Nigel)
            - nvme-pci no numa node fix (Pratyush)
            - nvme timeout fix for non-compliant controllers (Keith)
      
       - MD pull via Song fixing regressions with both 6.5 and 6.6
      
       - Fix a use-after-free regression in resizing blk-mq tags (Chengming)
      
      * tag 'block-6.6-2023-09-15' of git://git.kernel.dk/linux:
        nvme: avoid bogus CRTO values
        md: Put the right device in md_seq_next
        nvme-pci: do not set the NUMA node of device if it has none
        blk-mq: fix tags UAF when shrinking q->nr_hw_queues
        md/raid1: fix error: ISO C90 forbids mixed declarations
        md: fix warning for holder mismatch from export_rdev()
        md: don't dereference mddev after export_rdev()
        nvme-fc: Prevent null pointer dereference in nvme_fc_io_getuuid()
        nvme: host: hwmon: constify pointers to hwmon_channel_info
        nvmet-tcp: pass iov_len instead of sg->length to bvec_set_page()
    • Merge tag 'io_uring-6.6-2023-09-15' of git://git.kernel.dk/linux · 31d8fddb
      Linus Torvalds authored
      Pull io_uring fix from Jens Axboe:
       "Just a single fix, fixing a regression with poll first, recvmsg, and
        using a provided buffer"
      
      * tag 'io_uring-6.6-2023-09-15' of git://git.kernel.dk/linux:
        io_uring/net: fix iter retargeting for selected buf
    • Merge tag 'firewire-fixes-6.6-rc2' of... · 0e494be7
      Linus Torvalds authored
      Merge tag 'firewire-fixes-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
      
      Pull firewire fix from Takashi Sakamoto:
       "A change applied to v6.5 kernel brings an issue that usual GFP
        allocation is done in atomic context while a spin-lock is held. Let us
        revert it"
      
      * tag 'firewire-fixes-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
        Revert "firewire: core: obsolete usage of GFP_ATOMIC at building node tree"
    • Merge tag 'drm-fixes-2023-09-15' of git://anongit.freedesktop.org/drm/drm · 9608c7b7
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Regular rc2 fixes pull, mostly made up of amdgpu stuff, one i915, and
        a bunch of others, one vkms locking violation is reverted.
      
        connector:
         - doc fix
      
        exec:
         - workaround lockdep issue
      
        tests:
         - fix a UAF
      
        vkms:
         - revert hrtimer fix
      
        fbdev:
         - g364fb: fix build failure with mips
      
        i915:
         - Only check eDP HPD when AUX CH is shared.
      
        amdgpu:
         - GC 9.4.3 fixes
         - Fix white screen issues with S/G display on system with >= 64G of ram
         - Replay fixes
         - SMU 13.0.6 fixes
         - AUX backlight fix
         - NBIO 4.3 SR-IOV fixes for HDP
         - RAS fixes
         - DP MST resume fix
         - Fix segfault on systems with no vbios
         - DPIA fixes
      
        amdkfd:
         - CWSR grace period fix
         - Unaligned doorbell fix
         - CRIU fix for GFX11
         - Add missing TLB flush on gfx10 and newer
      
        radeon:
         - make fence wait in suballocator uninterruptible
      
        gm12u320:
         - Fix the timeout usage for usb_bulk_msg()"
      
      * tag 'drm-fixes-2023-09-15' of git://anongit.freedesktop.org/drm/drm: (29 commits)
        drm/tests: helpers: Avoid a driver uaf
        Revert "drm/vkms: Fix race-condition between the hrtimer and the atomic commit"
        drm/amdkfd: Insert missing TLB flush on GFX10 and later
        drm/i915: Only check eDP HPD when AUX CH is shared
        drm/amd/display: Fix 2nd DPIA encoder Assignment
        drm/amd/display: Add DPIA Link Encoder Assignment Fix
        drm/amd/display: fix replay_mode kernel-doc warning
        drm/amdgpu: Handle null atom context in VBIOS info ioctl
        drm/amdkfd: Checkpoint and restore queues on GFX11
        drm/amd/display: Adjust the MST resume flow
        drm/amdgpu: fallback to old RAS error message for aqua_vanjaram
        drm/amdgpu/nbio4.3: set proper rmmio_remap.reg_offset for SR-IOV
        drm/amdgpu/soc21: don't remap HDP registers for SR-IOV
        drm/amd/display: Don't check registers, if using AUX BL control
        drm/amdgpu: fix retry loop test
        drm/amd/display: Add dirty rect support for Replay
        Revert "drm/amd: Disable S/G for APUs when 64GB or more host memory"
        drm/amd/display: fix the white screen issue when >= 64GB DRAM
        drm/amdkfd: Update CU masking for GFX 9.4.3
        drm/amdkfd: Update cache info reporting for GFX v9.4.3
        ...
    • Merge tag 'efi-fixes-for-v6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · e42bebf6
      Linus Torvalds authored
      Pull EFI fixes from Ard Biesheuvel:
      
       - Missing x86 patch for the runtime cleanup that was merged in -rc1
      
       - Kconfig tweak for kexec on x86 so EFI support does not get disabled
         inadvertently
      
       - Use the right EFI memory type for the unaccepted memory table so
         kexec/kdump exposes it to the crash kernel as well
      
       - Work around EFI implementations which do not implement
         QueryVariableInfo, which is now called by statfs() on efivarfs
      
      * tag 'efi-fixes-for-v6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
        efivarfs: fix statfs() on efivarfs
        efi/unaccepted: Use ACPI reclaim memory for unaccepted memory table
        efi/x86: Ensure that EFI_RUNTIME_MAP is enabled for kexec
        efi/x86: Move EFI runtime call setup/teardown helpers out of line
    • dm: don't attempt to queue IO under RCU protection · a9ce3853
      Jens Axboe authored
      dm looks up the table for IO based on the request type, with an
      assumption that if the request is marked REQ_NOWAIT, it's fine to
      attempt to submit that IO while under RCU read lock protection. This
      is not OK, as REQ_NOWAIT just means that we should not be sleeping
      waiting on other IO; it does not mean that we can't potentially
      schedule.
      
      A simple test case demonstrates this quite nicely:
      
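      #define _GNU_SOURCE     /* for preadv2() and RWF_NOWAIT */
      #include <fcntl.h>
      #include <stdlib.h>
      #include <sys/uio.h>
      
      int main(int argc, char *argv[])
      {
              struct iovec iov;
              int fd;
      
              /* O_DIRECT needs a suitably aligned buffer, hence posix_memalign() */
              fd = open("/dev/dm-0", O_RDONLY | O_DIRECT);
              posix_memalign(&iov.iov_base, 4096, 4096);
              iov.iov_len = 4096;
              /* RWF_NOWAIT asks for non-blocking I/O; dm sees a REQ_NOWAIT bio */
              preadv2(fd, &iov, 1, 0, RWF_NOWAIT);
              return 0;
      }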
      
      which will instantly spew:
      
      BUG: sleeping function called from invalid context at include/linux/sched/mm.h:306
      in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 5580, name: dm-nowait
      preempt_count: 0, expected: 0
      RCU nest depth: 1, expected: 0
      INFO: lockdep is turned off.
      CPU: 7 PID: 5580 Comm: dm-nowait Not tainted 6.6.0-rc1-g39956d2dcd81 #132
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x11d/0x1b0
       __might_resched+0x3c3/0x5e0
       ? preempt_count_sub+0x150/0x150
       mempool_alloc+0x1e2/0x390
       ? mempool_resize+0x7d0/0x7d0
       ? lock_sync+0x190/0x190
       ? lock_release+0x4b7/0x670
       ? internal_get_user_pages_fast+0x868/0x2d40
       bio_alloc_bioset+0x417/0x8c0
       ? bvec_alloc+0x200/0x200
       ? internal_get_user_pages_fast+0xb8c/0x2d40
       bio_alloc_clone+0x53/0x100
       dm_submit_bio+0x27f/0x1a20
       ? lock_release+0x4b7/0x670
       ? blk_try_enter_queue+0x1a0/0x4d0
       ? dm_dax_direct_access+0x260/0x260
       ? rcu_is_watching+0x12/0xb0
       ? blk_try_enter_queue+0x1cc/0x4d0
       __submit_bio+0x239/0x310
       ? __bio_queue_enter+0x700/0x700
       ? kvm_clock_get_cycles+0x40/0x60
       ? ktime_get+0x285/0x470
       submit_bio_noacct_nocheck+0x4d9/0xb80
       ? should_fail_request+0x80/0x80
       ? preempt_count_sub+0x150/0x150
       ? lock_release+0x4b7/0x670
       ? __bio_add_page+0x143/0x2d0
       ? iov_iter_revert+0x27/0x360
       submit_bio_noacct+0x53e/0x1b30
       submit_bio_wait+0x10a/0x230
       ? submit_bio_wait_endio+0x40/0x40
       __blkdev_direct_IO_simple+0x4f8/0x780
       ? blkdev_bio_end_io+0x4c0/0x4c0
       ? stack_trace_save+0x90/0xc0
       ? __bio_clone+0x3c0/0x3c0
       ? lock_release+0x4b7/0x670
       ? lock_sync+0x190/0x190
       ? atime_needs_update+0x3bf/0x7e0
       ? timestamp_truncate+0x21b/0x2d0
       ? inode_owner_or_capable+0x240/0x240
       blkdev_direct_IO.part.0+0x84a/0x1810
       ? rcu_is_watching+0x12/0xb0
       ? lock_release+0x4b7/0x670
       ? blkdev_read_iter+0x40d/0x530
       ? reacquire_held_locks+0x4e0/0x4e0
       ? __blkdev_direct_IO_simple+0x780/0x780
       ? rcu_is_watching+0x12/0xb0
       ? __mark_inode_dirty+0x297/0xd50
       ? preempt_count_add+0x72/0x140
       blkdev_read_iter+0x2a4/0x530
       do_iter_readv_writev+0x2f2/0x3c0
       ? generic_copy_file_range+0x1d0/0x1d0
       ? fsnotify_perm.part.0+0x25d/0x630
       ? security_file_permission+0xd8/0x100
       do_iter_read+0x31b/0x880
       ? import_iovec+0x10b/0x140
       vfs_readv+0x12d/0x1a0
       ? vfs_iter_read+0xb0/0xb0
       ? rcu_is_watching+0x12/0xb0
       ? rcu_is_watching+0x12/0xb0
       ? lock_release+0x4b7/0x670
       do_preadv+0x1b3/0x260
       ? do_readv+0x370/0x370
       __x64_sys_preadv2+0xef/0x150
       do_syscall_64+0x39/0xb0
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7f5af41ad806
      Code: 41 54 41 89 fc 55 44 89 c5 53 48 89 cb 48 83 ec 18 80 3d e4 dd 0d 00 00 74 7a 45 89 c1 49 89 ca 45 31 c0 b8 47 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 be 00 00 00 48 85 c0 79 4a 48 8b 0d da 55
      RSP: 002b:00007ffd3145c7f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000147
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5af41ad806
      RDX: 0000000000000001 RSI: 00007ffd3145c850 RDI: 0000000000000003
      RBP: 0000000000000008 R08: 0000000000000000 R09: 0000000000000008
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
      R13: 00007ffd3145c850 R14: 000055f5f0431dd8 R15: 0000000000000001
       </TASK>
      
      where in fact it is dm itself that attempts to allocate a bio clone with
      GFP_NOIO under the rcu read lock, regardless of the request type.
      
      Fix this by getting rid of the special casing for REQ_NOWAIT and just
      using the normal SRCU-protected table lookup. Get rid of the bio-based
      table locking helpers at the same time, as they are now unused.
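      The locking problem can be illustrated with a toy userspace model. This
      is a sketch under loudly stated assumptions: every model_* helper,
      submit_bio_old() and submit_bio_fixed() are hypothetical stand-ins, not
      dm, RCU or block layer APIs, and model_might_sleep() only approximates
      the kernel's might_sleep() check. It shows why choosing the lock by
      request type fails: the GFP_NOIO clone allocation may sleep either way,
      so only the sleepable SRCU read section is a safe place for it.
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdio.h>
      
      /* Hypothetical model: none of these names are real kernel APIs. */
      static int rcu_depth;        /* nesting of the non-sleepable RCU read section */
      static int srcu_depth;       /* nesting of the sleepable SRCU read section */
      static int sleep_violations; /* possibly-sleeping calls made under RCU */
      
      static void model_rcu_read_lock(void) { rcu_depth++; }
      static void model_rcu_read_unlock(void) { assert(rcu_depth > 0); rcu_depth--; }
      static int model_srcu_read_lock(void) { return srcu_depth++; }
      static void model_srcu_read_unlock(int idx)
      {
              (void)idx;
              assert(srcu_depth > 0);
              srcu_depth--;
      }
      
      /* Stand-in for might_sleep(): sleeping inside an RCU read section is a bug. */
      static void model_might_sleep(const char *caller)
      {
              if (rcu_depth > 0) {
                      sleep_violations++;
                      fprintf(stderr, "BUG: sleeping function called from invalid context (%s)\n",
                              caller);
              }
      }
      
      /* Stand-in for the GFP_NOIO bio clone allocation: it may sleep
       * whether or not the bio is REQ_NOWAIT. */
      static void model_alloc_bio_clone(void)
      {
              model_might_sleep("mempool_alloc");
      }
      
      /* Old behaviour: REQ_NOWAIT bios take the RCU-protected lookup. */
      static void submit_bio_old(bool nowait)
      {
              if (nowait) {
                      model_rcu_read_lock();
                      model_alloc_bio_clone();
                      model_rcu_read_unlock();
              } else {
                      int idx = model_srcu_read_lock();
                      model_alloc_bio_clone();
                      model_srcu_read_unlock(idx);
              }
      }
      
      /* Fixed behaviour: every bio uses the SRCU-protected lookup. */
      static void submit_bio_fixed(bool nowait)
      {
              (void)nowait; /* the request type no longer selects the lock */
              int idx = model_srcu_read_lock();
              model_alloc_bio_clone();
              model_srcu_read_unlock(idx);
      }
      ```
      
      Calling submit_bio_old(true) bumps sleep_violations, mirroring the splat
      above, while submit_bio_fixed() never does for either request type.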
      
      Cc: stable@vger.kernel.org
      Fixes: 563a225c ("dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
    • Merge tag 'selinux-pr-20230914' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · 02e768c9
      Linus Torvalds authored
      Pull selinux fix from Paul Moore:
       "A relatively small SELinux patch to fix an issue with a
        vfs/LSM/SELinux patch that went upstream during the recent merge
        window.
      
        The short version is that the original patch changed how we
        initialized mount options to resolve an NFS issue and we inadvertently
        broke a use case due to the changed behavior.
      
        The fix restores this behavior for the cases that require it while
        keeping the original NFS fix in place"
      
      * tag 'selinux-pr-20230914' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
        selinux: fix handling of empty opts in selinux_fs_context_submount()