1. 22 Aug, 2024 3 commits
  2. 19 Aug, 2024 2 commits
  3. 16 Aug, 2024 1 commit
    • block: Fix lockdep warning in blk_mq_mark_tag_wait · b313a8c8
      Li Lingfeng authored
      Lockdep reported a warning in Linux version 6.6:
      
      [  414.344659] ================================
      [  414.345155] WARNING: inconsistent lock state
      [  414.345658] 6.6.0-07439-gba2303cacfda #6 Not tainted
      [  414.346221] --------------------------------
      [  414.346712] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
      [  414.347545] kworker/u10:3/1152 [HC0[0]:SC0[0]:HE0:SE1] takes:
      [  414.349245] ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0
      [  414.351204] {IN-SOFTIRQ-W} state was registered at:
      [  414.351751]   lock_acquire+0x18d/0x460
      [  414.352218]   _raw_spin_lock_irqsave+0x39/0x60
      [  414.352769]   __wake_up_common_lock+0x22/0x60
      [  414.353289]   sbitmap_queue_wake_up+0x375/0x4f0
      [  414.353829]   sbitmap_queue_clear+0xdd/0x270
      [  414.354338]   blk_mq_put_tag+0xdf/0x170
      [  414.354807]   __blk_mq_free_request+0x381/0x4d0
      [  414.355335]   blk_mq_free_request+0x28b/0x3e0
      [  414.355847]   __blk_mq_end_request+0x242/0xc30
      [  414.356367]   scsi_end_request+0x2c1/0x830
      [  414.356863]   scsi_io_completion+0x177/0x1610
      [  414.357379]   scsi_complete+0x12f/0x260
      [  414.357856]   blk_complete_reqs+0xba/0xf0
      [  414.358338]   __do_softirq+0x1b0/0x7a2
      [  414.358796]   irq_exit_rcu+0x14b/0x1a0
      [  414.359262]   sysvec_call_function_single+0xaf/0xc0
      [  414.359828]   asm_sysvec_call_function_single+0x1a/0x20
      [  414.360426]   default_idle+0x1e/0x30
      [  414.360873]   default_idle_call+0x9b/0x1f0
      [  414.361390]   do_idle+0x2d2/0x3e0
      [  414.361819]   cpu_startup_entry+0x55/0x60
      [  414.362314]   start_secondary+0x235/0x2b0
      [  414.362809]   secondary_startup_64_no_verify+0x18f/0x19b
      [  414.363413] irq event stamp: 428794
      [  414.363825] hardirqs last  enabled at (428793): [<ffffffff816bfd1c>] ktime_get+0x1dc/0x200
      [  414.364694] hardirqs last disabled at (428794): [<ffffffff85470177>] _raw_spin_lock_irq+0x47/0x50
      [  414.365629] softirqs last  enabled at (428444): [<ffffffff85474780>] __do_softirq+0x540/0x7a2
      [  414.366522] softirqs last disabled at (428419): [<ffffffff813f65ab>] irq_exit_rcu+0x14b/0x1a0
      [  414.367425]
                     other info that might help us debug this:
      [  414.368194]  Possible unsafe locking scenario:
      [  414.368900]        CPU0
      [  414.369225]        ----
      [  414.369548]   lock(&sbq->ws[i].wait);
      [  414.370000]   <Interrupt>
      [  414.370342]     lock(&sbq->ws[i].wait);
      [  414.370802]
                      *** DEADLOCK ***
      [  414.371569] 5 locks held by kworker/u10:3/1152:
      [  414.372088]  #0: ffff88810130e938 ((wq_completion)writeback){+.+.}-{0:0}, at: process_scheduled_works+0x357/0x13f0
      [  414.373180]  #1: ffff88810201fdb8 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x3a3/0x13f0
      [  414.374384]  #2: ffffffff86ffbdc0 (rcu_read_lock){....}-{1:2}, at: blk_mq_run_hw_queue+0x637/0xa00
      [  414.375342]  #3: ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0
      [  414.376377]  #4: ffff888106205a08 (&hctx->dispatch_wait_lock){+.-.}-{2:2}, at: blk_mq_dispatch_rq_list+0x1337/0x1ee0
      [  414.378607]
                     stack backtrace:
      [  414.379177] CPU: 0 PID: 1152 Comm: kworker/u10:3 Not tainted 6.6.0-07439-gba2303cacfda #6
      [  414.380032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      [  414.381177] Workqueue: writeback wb_workfn (flush-253:0)
      [  414.381805] Call Trace:
      [  414.382136]  <TASK>
      [  414.382429]  dump_stack_lvl+0x91/0xf0
      [  414.382884]  mark_lock_irq+0xb3b/0x1260
      [  414.383367]  ? __pfx_mark_lock_irq+0x10/0x10
      [  414.383889]  ? stack_trace_save+0x8e/0xc0
      [  414.384373]  ? __pfx_stack_trace_save+0x10/0x10
      [  414.384903]  ? graph_lock+0xcf/0x410
      [  414.385350]  ? save_trace+0x3d/0xc70
      [  414.385808]  mark_lock.part.20+0x56d/0xa90
      [  414.386317]  mark_held_locks+0xb0/0x110
      [  414.386791]  ? __pfx_do_raw_spin_lock+0x10/0x10
      [  414.387320]  lockdep_hardirqs_on_prepare+0x297/0x3f0
      [  414.387901]  ? _raw_spin_unlock_irq+0x28/0x50
      [  414.388422]  trace_hardirqs_on+0x58/0x100
      [  414.388917]  _raw_spin_unlock_irq+0x28/0x50
      [  414.389422]  __blk_mq_tag_busy+0x1d6/0x2a0
      [  414.389920]  __blk_mq_get_driver_tag+0x761/0x9f0
      [  414.390899]  blk_mq_dispatch_rq_list+0x1780/0x1ee0
      [  414.391473]  ? __pfx_blk_mq_dispatch_rq_list+0x10/0x10
      [  414.392070]  ? sbitmap_get+0x2b8/0x450
      [  414.392533]  ? __blk_mq_get_driver_tag+0x210/0x9f0
      [  414.393095]  __blk_mq_sched_dispatch_requests+0xd99/0x1690
      [  414.393730]  ? elv_attempt_insert_merge+0x1b1/0x420
      [  414.394302]  ? __pfx___blk_mq_sched_dispatch_requests+0x10/0x10
      [  414.394970]  ? lock_acquire+0x18d/0x460
      [  414.395456]  ? blk_mq_run_hw_queue+0x637/0xa00
      [  414.395986]  ? __pfx_lock_acquire+0x10/0x10
      [  414.396499]  blk_mq_sched_dispatch_requests+0x109/0x190
      [  414.397100]  blk_mq_run_hw_queue+0x66e/0xa00
      [  414.397616]  blk_mq_flush_plug_list.part.17+0x614/0x2030
      [  414.398244]  ? __pfx_blk_mq_flush_plug_list.part.17+0x10/0x10
      [  414.398897]  ? writeback_sb_inodes+0x241/0xcc0
      [  414.399429]  blk_mq_flush_plug_list+0x65/0x80
      [  414.399957]  __blk_flush_plug+0x2f1/0x530
      [  414.400458]  ? __pfx___blk_flush_plug+0x10/0x10
      [  414.400999]  blk_finish_plug+0x59/0xa0
      [  414.401467]  wb_writeback+0x7cc/0x920
      [  414.401935]  ? __pfx_wb_writeback+0x10/0x10
      [  414.402442]  ? mark_held_locks+0xb0/0x110
      [  414.402931]  ? __pfx_do_raw_spin_lock+0x10/0x10
      [  414.403462]  ? lockdep_hardirqs_on_prepare+0x297/0x3f0
      [  414.404062]  wb_workfn+0x2b3/0xcf0
      [  414.404500]  ? __pfx_wb_workfn+0x10/0x10
      [  414.404989]  process_scheduled_works+0x432/0x13f0
      [  414.405546]  ? __pfx_process_scheduled_works+0x10/0x10
      [  414.406139]  ? do_raw_spin_lock+0x101/0x2a0
      [  414.406641]  ? assign_work+0x19b/0x240
      [  414.407106]  ? lock_is_held_type+0x9d/0x110
      [  414.407604]  worker_thread+0x6f2/0x1160
      [  414.408075]  ? __kthread_parkme+0x62/0x210
      [  414.408572]  ? lockdep_hardirqs_on_prepare+0x297/0x3f0
      [  414.409168]  ? __kthread_parkme+0x13c/0x210
      [  414.409678]  ? __pfx_worker_thread+0x10/0x10
      [  414.410191]  kthread+0x33c/0x440
      [  414.410602]  ? __pfx_kthread+0x10/0x10
      [  414.411068]  ret_from_fork+0x4d/0x80
      [  414.411526]  ? __pfx_kthread+0x10/0x10
      [  414.411993]  ret_from_fork_asm+0x1b/0x30
      [  414.412489]  </TASK>
      
      When interrupts are turned on while a lock taken with spin_lock_irq is
      still held, lockdep warns about a potential deadlock.
      
      blk_mq_prep_dispatch_rq
       blk_mq_get_driver_tag
        __blk_mq_get_driver_tag
         __blk_mq_alloc_driver_tag
          blk_mq_tag_busy -> tag is already busy
          // failed to get driver tag
       blk_mq_mark_tag_wait
        spin_lock_irq(&wq->lock) -> lock A (&sbq->ws[i].wait)
        __add_wait_queue(wq, wait) -> wait queue active
        blk_mq_get_driver_tag
        __blk_mq_tag_busy
      -> 1) tag must be idle, which means there can't be inflight IO
         spin_lock_irq(&tags->lock) -> lock B (hctx->tags)
         spin_unlock_irq(&tags->lock) -> unlock B, turn on interrupt accidentally
      -> 2) context must be preempted by IO interrupt to trigger deadlock.
      
      As shown above, the deadlock is not possible in theory, but the warning
      still needs to be fixed.
      
      Fix it by using spin_lock_irqsave to get lock B instead of spin_lock_irq.
      
      Fixes: 4f1731df ("blk-mq: fix potential io hang by wrong 'wake_batch'")
      Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Link: https://lore.kernel.org/r/20240815024736.2040971-1-lilingfeng@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b313a8c8
  4. 15 Aug, 2024 2 commits
  5. 12 Aug, 2024 2 commits
    • s390/dasd: fix error recovery leading to data corruption on ESE devices · 7db40423
      Stefan Haberland authored
      Extent Space Efficient (ESE) or thin provisioned volumes need to be
      formatted on demand during usual IO processing.
      
      The dasd_ese_needs_format function checks for error codes that signal
      the non-existence of a proper track format.
      
      The check for incorrect length is too imprecise since other error cases
      leading to transport of insufficient data also have this flag set.
      This might lead to data corruption in certain error cases for example
      during a storage server warmstart.
      
      Fix by removing the check for incorrect length and replacing it with
      an explicit check for invalid track format in transport mode.
      
      Also remove the check for file protected since this is not a valid
      ESE handling case.
      
      Cc: stable@vger.kernel.org # 5.3+
      Fixes: 5e2b17e7 ("s390/dasd: Add dynamic formatting support for ESE volumes")
      Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
      Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
      Link: https://lore.kernel.org/r/20240812125733.126431-3-sth@linux.ibm.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      7db40423
    • s390/dasd: Remove DMA alignment · 2a07bb64
      Eric Farman authored
      This reverts commit bc792884 ("s390/dasd: Establish DMA alignment").
      
      Quoting the original commit:
          linux-next commit bf8d0853 ("iomap: add support for dma aligned
          direct-io") changes the alignment requirement to come from the block
          device rather than the block size, and the default alignment
          requirement is 512-byte boundaries. Since DASD I/O has page
          alignments for IDAW/TIDAW requests, let's override this value to
          restore the expected behavior.
      
      I mentioned TIDAW, but that was wrong. TIDAWs have no distinct alignment
      requirement (per p. 15-70 of POPS SA22-7832-13):
      
         Unless otherwise specified, TIDAWs may designate
         a block of main storage on any boundary and length
         up to 4K bytes, provided the specified block does not
         cross a 4 K-byte boundary.
      
      IDAWs do, but the original commit neglected that while ECKD DASD are
      typically formatted in 4096-byte blocks, they don't HAVE to be. Formatting
      an ECKD volume with smaller blocks is permitted (dasdfmt -b xxx), and the
      problematic commit enforces alignment properties on such a device that
      will result in errors, such as:
      
         [test@host ~]# lsdasd -l a367 | grep blksz
           blksz:				512
         [test@host ~]# mkfs.xfs -f /dev/disk/by-path/ccw-0.0.a367-part1
         meta-data=/dev/dasdc1            isize=512    agcount=4, agsize=230075 blks
                  =                       sectsz=512   attr=2, projid32bit=1
                  =                       crc=1        finobt=1, sparse=1, rmapbt=1
                  =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
         data     =                       bsize=4096   blocks=920299, imaxpct=25
                  =                       sunit=0      swidth=0 blks
         naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
         log      =internal log           bsize=4096   blocks=16384, version=2
                  =                       sectsz=512   sunit=0 blks, lazy-count=1
         realtime =none                   extsz=4096   blocks=0, rtextents=0
         error reading existing superblock: Invalid argument
         mkfs.xfs: pwrite failed: Invalid argument
         libxfs_bwrite: write failed on (unknown) bno 0x70565c/0x100, err=22
         mkfs.xfs: Releasing dirty buffer to free list!
         found dirty buffer (bulk) on free list!
         mkfs.xfs: pwrite failed: Invalid argument
         ...snipped...
      
      The original commit omitted the FBA discipline for just this reason,
      but the formatted block size of the other disciplines was overlooked.
      The solution to all of this is to revert to the original behavior,
      such that the block size can be respected. There were two commits [1]
      that moved this code in the interim, so a straight git-revert is not
      possible, but the change is straightforward.
      
      But what of the original problem? That was manifested with a direct-io
      QEMU guest, where QEMU itself was changed a month or two later with
      commit 25474d90aa ("block: use the request length for iov alignment")
      such that the blamed kernel commit is unnecessary.
      
      [1] commit 0127a47f ("dasd: move queue setup to common code")
          commit fde07a4d ("dasd: use the atomic queue limits API")
      
      Fixes: bc792884 ("s390/dasd: Establish DMA alignment")
      Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
      Signed-off-by: Eric Farman <farman@linux.ibm.com>
      Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
      Link: https://lore.kernel.org/r/20240812125733.126431-2-sth@linux.ibm.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      2a07bb64
  6. 08 Aug, 2024 1 commit
  7. 31 Jul, 2024 3 commits
  8. 29 Jul, 2024 1 commit
  9. 27 Jul, 2024 1 commit
  10. 26 Jul, 2024 1 commit
    • Merge tag 'nvme-6.11-2024-07-26' of git://git.infradead.org/nvme into block-6.11 · f6bb5254
      Jens Axboe authored
      Pull NVMe fixes from Keith:
      
      "nvme fixes for Linux 6.11
      
       - Fix request without payloads cleanup  (Leon)
       - Use new protection information format (Francis)
       - Improved debug message for lost pci link (Bart)
       - Another apst quirk (Wang)
       - Use appropriate sysfs api for printing chars (Markus)"
      
      * tag 'nvme-6.11-2024-07-26' of git://git.infradead.org/nvme:
        nvme-pci: add missing condition check for existence of mapped data
        nvme-core: choose PIF from QPIF if QPIFS supports and PIF is QTYPE
        nvme-pci: Fix the instructions for disabling power management
        nvme: remove redundant bdev local variable
        nvme-fabrics: Use seq_putc() in __nvmf_concat_opt_tokens()
        nvme/pci: Add APST quirk for Lenovo N60z laptop
      f6bb5254
  11. 25 Jul, 2024 1 commit
  12. 24 Jul, 2024 2 commits
    • ublk: fix UBLK_CMD_DEL_DEV_ASYNC handling · 55fbb9a5
      Ming Lei authored
      In ublk_ctrl_uring_cmd(), ioctl command NR should be used for
      matching _IOC_NR(cmd_op).
      
      Fix it by adding a private macro, which keeps the change clean.
      
      Fixes: 13fe8e68 ("ublk: add UBLK_CMD_DEL_DEV_ASYNC")
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20240724143311.2646330-1-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      55fbb9a5
    • block: fix deadlock between sd_remove & sd_release · 7e04da2d
      Yang Yang authored
      Our test reports the following hung task:
      
      [ 2538.459400] INFO: task "kworker/0:0":7 blocked for more than 188 seconds.
      [ 2538.459427] Call trace:
      [ 2538.459430]  __switch_to+0x174/0x338
      [ 2538.459436]  __schedule+0x628/0x9c4
      [ 2538.459442]  schedule+0x7c/0xe8
      [ 2538.459447]  schedule_preempt_disabled+0x24/0x40
      [ 2538.459453]  __mutex_lock+0x3ec/0xf04
      [ 2538.459456]  __mutex_lock_slowpath+0x14/0x24
      [ 2538.459459]  mutex_lock+0x30/0xd8
      [ 2538.459462]  del_gendisk+0xdc/0x350
      [ 2538.459466]  sd_remove+0x30/0x60
      [ 2538.459470]  device_release_driver_internal+0x1c4/0x2c4
      [ 2538.459474]  device_release_driver+0x18/0x28
      [ 2538.459478]  bus_remove_device+0x15c/0x174
      [ 2538.459483]  device_del+0x1d0/0x358
      [ 2538.459488]  __scsi_remove_device+0xa8/0x198
      [ 2538.459493]  scsi_forget_host+0x50/0x70
      [ 2538.459497]  scsi_remove_host+0x80/0x180
      [ 2538.459502]  usb_stor_disconnect+0x68/0xf4
      [ 2538.459506]  usb_unbind_interface+0xd4/0x280
      [ 2538.459510]  device_release_driver_internal+0x1c4/0x2c4
      [ 2538.459514]  device_release_driver+0x18/0x28
      [ 2538.459518]  bus_remove_device+0x15c/0x174
      [ 2538.459523]  device_del+0x1d0/0x358
      [ 2538.459528]  usb_disable_device+0x84/0x194
      [ 2538.459532]  usb_disconnect+0xec/0x300
      [ 2538.459537]  hub_event+0xb80/0x1870
      [ 2538.459541]  process_scheduled_works+0x248/0x4dc
      [ 2538.459545]  worker_thread+0x244/0x334
      [ 2538.459549]  kthread+0x114/0x1bc
      
      [ 2538.461001] INFO: task "fsck.":15415 blocked for more than 188 seconds.
      [ 2538.461014] Call trace:
      [ 2538.461016]  __switch_to+0x174/0x338
      [ 2538.461021]  __schedule+0x628/0x9c4
      [ 2538.461025]  schedule+0x7c/0xe8
      [ 2538.461030]  blk_queue_enter+0xc4/0x160
      [ 2538.461034]  blk_mq_alloc_request+0x120/0x1d4
      [ 2538.461037]  scsi_execute_cmd+0x7c/0x23c
      [ 2538.461040]  ioctl_internal_command+0x5c/0x164
      [ 2538.461046]  scsi_set_medium_removal+0x5c/0xb0
      [ 2538.461051]  sd_release+0x50/0x94
      [ 2538.461054]  blkdev_put+0x190/0x28c
      [ 2538.461058]  blkdev_release+0x28/0x40
      [ 2538.461063]  __fput+0xf8/0x2a8
      [ 2538.461066]  __fput_sync+0x28/0x5c
      [ 2538.461070]  __arm64_sys_close+0x84/0xe8
      [ 2538.461073]  invoke_syscall+0x58/0x114
      [ 2538.461078]  el0_svc_common+0xac/0xe0
      [ 2538.461082]  do_el0_svc+0x1c/0x28
      [ 2538.461087]  el0_svc+0x38/0x68
      [ 2538.461090]  el0t_64_sync_handler+0x68/0xbc
      [ 2538.461093]  el0t_64_sync+0x1a8/0x1ac
      
        T1:				T2:
        sd_remove
        del_gendisk
        __blk_mark_disk_dead
        blk_freeze_queue_start
        ++q->mq_freeze_depth
        				bdev_release
       				mutex_lock(&disk->open_mutex)
        				sd_release
       				scsi_execute_cmd
       				blk_queue_enter
       				wait_event(!q->mq_freeze_depth)
        mutex_lock(&disk->open_mutex)
      
      SCSI does not set GD_OWNS_QUEUE, so QUEUE_FLAG_DYING is not set in
      this scenario. This is a classic ABBA deadlock. To fix the deadlock,
      make sure we don't try to acquire disk->open_mutex after freezing
      the queue.
      
      Cc: stable@vger.kernel.org
      Fixes: eec1be4c ("block: delete partitions later in del_gendisk")
      Signed-off-by: Yang Yang <yang.yang@vivo.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Fixes: and Cc: stable tags are missing. Otherwise this patch looks fine
      Link: https://lore.kernel.org/r/20240724070412.22521-1-yang.yang@vivo.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      7e04da2d
  13. 23 Jul, 2024 1 commit
  14. 22 Jul, 2024 14 commits
    • Merge tag 'irq-msi-2024-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 66ebbdfd
      Linus Torvalds authored
      Pull MSI interrupt updates from Thomas Gleixner:
       "Switch ARM/ARM64 over to the modern per device MSI domains.
      
        This simplifies the handling of platform MSI and wire to MSI
        controllers and removes about 500 lines of legacy code.
      
        Aside of that it paves the way for ARM/ARM64 to utilize the dynamic
        allocation of PCI/MSI interrupts and to support the upcoming non
        standard IMS (Interrupt Message Store) mechanism on PCIe devices"
      
      * tag 'irq-msi-2024-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
        irqchip/gic-v3-its: Correctly fish out the DID for platform MSI
        irqchip/gic-v3-its: Correctly honor the RID remapping
        genirq/msi: Move msi_device_data to core
        genirq/msi: Remove platform MSI leftovers
        irqchip/irq-mvebu-icu: Remove platform MSI leftovers
        irqchip/irq-mvebu-sei: Switch to MSI parent
        irqchip/mvebu-odmi: Switch to parent MSI
        irqchip/mvebu-gicp: Switch to MSI parent
        irqchip/irq-mvebu-icu: Prepare for real per device MSI
        irqchip/imx-mu-msi: Switch to MSI parent
        irqchip/gic-v2m: Switch to device MSI
        irqchip/gic_v3_mbi: Switch over to parent domain
        genirq/msi: Remove platform_msi_create_device_domain()
        irqchip/mbigen: Remove platform_msi_create_device_domain() fallback
        irqchip/gic-v3-its: Switch platform MSI to MSI parent
        irqchip/irq-msi-lib: Prepare for DOMAIN_BUS_WIRED_TO_MSI
        irqchip/mbigen: Prepare for real per device MSI
        irqchip/irq-msi-lib: Prepare for DEVICE MSI to replace platform MSI
        irqchip/gic-v3-its: Provide MSI parent for PCI/MSI[-X]
        irqchip/irq-msi-lib: Prepare for PCI MSI/MSIX
        ...
      66ebbdfd
    • Merge tag 'irq-core-2024-07-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ac7473a1
      Linus Torvalds authored
      Pull interrupt subsystem updates from Thomas Gleixner:
       "Core:
      
         - Provide a new mechanism to create interrupt domains. The existing
           interfaces have already too many parameters and it's a pain to
           expand any of this for new required functionality.
      
           The new function takes a pointer to a data structure as argument.
           The data structure combines all existing parameters and allows for
           easy extension.
      
           The first extension for this is to handle the instantiation of
           generic interrupt chips at the core level and to allow drivers to
           provide extra init/exit callbacks.
      
           This is necessary to do the full interrupt chip initialization
           before the new domain is published, so that concurrent usage sites
           won't see a half initialized interrupt domain. Similar problems
           exist on teardown.
      
           This has turned out to be a real problem due to the deferred and
           parallel probing which was added in recent years.
      
           Handling this at the core level allows to remove quite some accrued
           boilerplate code in existing drivers and avoids horrible
           workarounds at the driver level.
      
         - The usual small improvements all over the place
      
        Drivers:
      
         - Add support for LAN966x OIC and RZ/Five SoC
      
         - Split the STM ExtI driver into a microcontroller and a SMP version
           to allow building the latter as a module for multi-platform
           kernels
      
         - Enable MSI support for Armada 370XP on platforms which do not
           support IPIs
      
         - The usual small fixes and enhancements all over the place"
      
      * tag 'irq-core-2024-07-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits)
        irqdomain: Fix the kernel-doc and plug it into Documentation
        genirq: Set IRQF_COND_ONESHOT in request_irq()
        irqchip/imx-irqsteer: Handle runtime power management correctly
        irqchip/gic-v3: Pass #redistributor-regions to gic_of_setup_kvm_info()
        irqchip/bcm2835: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
        irqchip/gic-v4: Make sure a VPE is locked when VMAPP is issued
        irqchip/gic-v4: Substitute vmovp_lock for a per-VM lock
        irqchip/gic-v4: Always configure affinity on VPE activation
        Revert "irqchip/dw-apb-ictl: Support building as module"
        Revert "Loongarch: Support loongarch avec"
        arm64: Kconfig: Allow build irq-stm32mp-exti driver as module
        ARM: stm32: Allow build irq-stm32mp-exti driver as module
        irqchip/stm32mp-exti: Allow building as module
        irqchip/stm32mp-exti: Rename internal symbols
        irqchip/stm32-exti: Split MCU and MPU code
        arm64: Kconfig: Select STM32MP_EXTI on STM32 platforms
        ARM: stm32: Use different EXTI driver on ARMv7m and ARMv7a
        irqchip/stm32-exti: Add CONFIG_STM32MP_EXTI
        irqchip/dw-apb-ictl: Support building as module
        irqchip/riscv-aplic: Simplify the initialization code
        ...
      ac7473a1
    • Merge tag 'loongarch-6.11' of... · a362ade8
      Linus Torvalds authored
      Merge tag 'loongarch-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
      
      Pull LoongArch updates from Huacai Chen:
      
       - Define __ARCH_WANT_NEW_STAT in unistd.h
      
       - Always enumerate MADT and setup logical-physical CPU mapping
      
       - Add irq_work support via self IPIs
      
       - Add RANDOMIZE_KSTACK_OFFSET support
      
       - Add ARCH_HAS_PTE_DEVMAP support
      
       - Add ARCH_HAS_DEBUG_VM_PGTABLE support
      
       - Add writecombine support for DMW-based ioremap()
      
       - Add architectural preparation for CPUFreq
      
       - Add ACPI standard hardware register based S3 support
      
       - Add support for relocating the kernel with RELR relocation
      
       - Some bug fixes and other small changes
      
      * tag 'loongarch-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
        LoongArch: Make the users of larch_insn_gen_break() constant
        LoongArch: Check TIF_LOAD_WATCH to enable user space watchpoint
        LoongArch: Use rustc option -Zdirect-access-external-data
        LoongArch: Add support for relocating the kernel with RELR relocation
        LoongArch: Remove a redundant checking in relocator
        LoongArch: Use correct API to map cmdline in relocate_kernel()
        LoongArch: Automatically disable KASLR for hibernation
        LoongArch: Add ACPI standard hardware register based S3 support
        LoongArch: Add architectural preparation for CPUFreq
        LoongArch: Add writecombine support for DMW-based ioremap()
        LoongArch: Add ARCH_HAS_DEBUG_VM_PGTABLE support
        LoongArch: Add ARCH_HAS_PTE_DEVMAP support
        LoongArch: Add RANDOMIZE_KSTACK_OFFSET support
        LoongArch: Add irq_work support via self IPIs
        LoongArch: Always enumerate MADT and setup logical-physical CPU mapping
        LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h
      a362ade8
    • Merge tag 'thermal-6.11-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 539fbb91
      Linus Torvalds authored
      Pull thermal control fix from Rafael Wysocki:
       "Fix a flood of kernel messages coming from the thermal core on systems
        where iwlwifi is loaded, but the network interfaces controlled by it
        are down (Rafael Wysocki)"
      
      * tag 'thermal-6.11-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        thermal: core: Allow thermal zones to tell the core to ignore them
      539fbb91
    • Merge tag 'io_uring-6.11-20240722' of git://git.kernel.dk/linux · 9deed1d5
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "Two minor fixes in here, both heading to stable. In detail:
      
         - Fix error where forced async uring_cmd getsockopt returns the wrong
           value on execution, leading to it never being completed (Pavel)
      
         - Fix io_alloc_pbuf_ring() using a NULL check rather than IS_ERR
           (Pavel)"
      
      * tag 'io_uring-6.11-20240722' of git://git.kernel.dk/linux:
        io_uring: fix error pbuf checking
        io_uring: fix lost getsockopt completions
      9deed1d5
    • Merge tag 'for-6.11/block-20240722' of git://git.kernel.dk/linux · 7d080fa8
      Linus Torvalds authored
      Pull more block updates from Jens Axboe:
      
       - MD fixes via Song:
           - md-cluster fixes (Heming Zhao)
           - raid1 fix (Mateusz Jończyk)
      
       - s390/dasd module description (Jeff)
      
       - Series cleaning up and hardening the blk-mq debugfs flag handling
         (John, Christoph)
      
       - blk-cgroup cleanup (Xiu)
      
       - Error polled IO attempts if backend doesn't support it (hexue)
      
       - Fix for an sbitmap hang (Yang)
      
      * tag 'for-6.11/block-20240722' of git://git.kernel.dk/linux: (23 commits)
        blk-cgroup: move congestion_count to struct blkcg
        sbitmap: fix io hung due to race on sbitmap_word::cleared
        block: avoid polling configuration errors
        block: Catch possible entries missing from rqf_name[]
        block: Simplify definition of RQF_NAME()
        block: Use enum to define RQF_x bit indexes
        block: Catch possible entries missing from cmd_flag_name[]
        block: Catch possible entries missing from alloc_policy_name[]
        block: Catch possible entries missing from hctx_flag_name[]
        block: Catch possible entries missing from hctx_state_name[]
        block: Catch possible entries missing from blk_queue_flag_name[]
        block: Make QUEUE_FLAG_x as an enum
        block: Relocate BLK_MQ_MAX_DEPTH
        block: Relocate BLK_MQ_CPU_WORK_BATCH
        block: remove QUEUE_FLAG_STOPPED
        block: Add missing entry to hctx_flag_name[]
        block: Add zone write plugging entry to rqf_name[]
        block: Add missing entries from cmd_flag_name[]
        s390/dasd: fix error checks in dasd_copy_pair_store()
        s390/dasd: add missing MODULE_DESCRIPTION() macros
        ...
      7d080fa8
    • Merge tag 'for-6.11/block-post-20240722' of git://git.kernel.dk/linux · 02569948
      Linus Torvalds authored
      Pull block integrity mapping updates from Jens Axboe:
       "A set of cleanups and fixes for the block integrity support.
      
        Sent separately from the main block changes from last week, as they
        depended on later fixes in the 6.10-rc cycle"
      
      * tag 'for-6.11/block-post-20240722' of git://git.kernel.dk/linux:
        block: don't free the integrity payload in bio_integrity_unmap_free_user
        block: don't free submitter owned integrity payload on I/O completion
        block: call bio_integrity_unmap_free_user from blk_rq_unmap_user
        block: don't call bio_uninit from bio_endio
        block: also return bio_integrity_payload * from stubs
        block: split integrity support out of bio.h
      02569948
    • Merge tag 'bcachefs-2024-07-22' of https://evilpiepirate.org/git/bcachefs · dd018c23
      Linus Torvalds authored
      Pull bcachefs fixes from Kent Overstreet:
      
       - another fix for fsck getting stuck, from marcin
      
       - small syzbot fix
      
       - another undefined shift fix
      
      * tag 'bcachefs-2024-07-22' of https://evilpiepirate.org/git/bcachefs:
        bcachefs: Fix printbuf usage while atomic
        bcachefs: More informative error message in reattach_inode()
        bcachefs: kill btree_trans_too_many_iters() in bch2_bucket_alloc_freelist()
        bcachefs: mean_and_variance: Avoid too-large shift amounts
      dd018c23
    • Merge tag 'ntfs3_for_6.11' of https://github.com/Paragon-Software-Group/linux-ntfs3 · 5ea6d724
      Linus Torvalds authored
      Pull ntfs3 updates from Konstantin Komarov:
       "New code:
         - simple fileattr support
      
        Fixes:
         - transform resident to nonresident for compressed files
         - the format of the "nocase" mount option
         - getting file type
         - many other internal bugs
      
        Refactoring:
         - remove unused functions and macros
         - partial transition from page to folio (suggested by Matthew Wilcox)
         - legacy ntfs support"
      
      * tag 'ntfs3_for_6.11' of https://github.com/Paragon-Software-Group/linux-ntfs3: (42 commits)
        fs/ntfs3: Fix formatting, change comments, renaming
        fs/ntfs3: Update log->page_{mask,bits} if log->page_size changed
        fs/ntfs3: Implement simple fileattr
        fs/ntfs3: Redesign legacy ntfs support
        fs/ntfs3: Use function file_inode to get inode from file
        fs/ntfs3: Minor ntfs_list_ea refactoring
        fs/ntfs3: Check more cases when directory is corrupted
        fs/ntfs3: Do copy_to_user out of run_lock
        fs/ntfs3: Keep runs for $MFT::$ATTR_DATA and $MFT::$ATTR_BITMAP
        fs/ntfs3: Missed error return
        fs/ntfs3: Fix the format of the "nocase" mount option
        fs/ntfs3: Fix field-spanning write in INDEX_HDR
        ntfs3: Convert attr_wof_frame_info() to use a folio
        ntfs3: Convert ni_readpage_cmpr() to take a folio
        ntfs3: Convert ntfs_get_frame_pages() to use a folio
        ntfs3: Remove calls to set/clear the error flag
        ntfs3: Convert attr_make_nonresident to use a folio
        ntfs3: Convert attr_data_write_resident to use a folio
        ntfs3: Convert ntfs_write_end() to work on a folio
        ntfs3: Convert attr_data_read_resident() to take a folio
        ...
      5ea6d724
    • bcachefs: Fix printbuf usage while atomic · 737759fc
      Kent Overstreet authored
      Reported-by: syzbot+f765e51170cf13493f0b@syzkaller.appspotmail.com
      Fixes: f12410bb ("bcachefs: Add an error message for insufficient rw journal devs")
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
      737759fc
    • Merge tag '6.11-rc-smb3-server-fixes' of git://git.samba.org/ksmbd · 93306970
      Linus Torvalds authored
      Pull smb server fixes from Steve French:
      
       - two durable handle improvements
      
       - two small cleanup patches
      
      * tag '6.11-rc-smb3-server-fixes' of git://git.samba.org/ksmbd:
        ksmbd: add durable scavenger timer
        ksmbd: avoid reclaiming expired durable opens by the client
        ksmbd: Constify struct ksmbd_transport_ops
        ksmbd: remove duplicate SMB2 Oplock levels definitions
      93306970
    • Merge tag 'mm-nonmm-stable-2024-07-21-15-07' of... · 527eff22
      Linus Torvalds authored
      Merge tag 'mm-nonmm-stable-2024-07-21-15-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull non-MM updates from Andrew Morton:
      
       - In the series "treewide: Refactor heap related implementation",
         Kuan-Wei Chiu has significantly reworked the min_heap library code
         and has taught bcachefs to use the new more generic implementation.
      
       - Yury Norov's series "Cleanup cpumask.h inclusion in core headers"
         reworks the cpumask and nodemask headers to make things generally
         more rational.
      
       - Kuan-Wei Chiu has sent along some maintenance work against our
         sorting library code in the series "lib/sort: Optimizations and
         cleanups".
      
       - More library maintenance work from Christophe Jaillet in the series
         "Remove usage of the deprecated ida_simple_xx() API".
      
       - Ryusuke Konishi continues with the nilfs2 fixes and cleanups in the
         series "nilfs2: eliminate the call to inode_attach_wb()".
      
       - Kuan-Ying Lee has some fixes to the gdb scripts in the series "Fix
         GDB command error".
      
       - Plus the usual shower of singleton patches all over the place. Please
         see the relevant changelogs for details.
      
      * tag 'mm-nonmm-stable-2024-07-21-15-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (98 commits)
        ia64: scrub ia64 from poison.h
        watchdog/perf: properly initialize the turbo mode timestamp and rearm counter
        tsacct: replace strncpy() with strscpy()
        lib/bch.c: use swap() to improve code
        test_bpf: convert comma to semicolon
        init/modpost: conditionally check section mismatch to __meminit*
        init: remove unused __MEMINIT* macros
        nilfs2: Constify struct kobj_type
        nilfs2: avoid undefined behavior in nilfs_cnt32_ge macro
        math: rational: add missing MODULE_DESCRIPTION() macro
        lib/zlib: add missing MODULE_DESCRIPTION() macro
        fs: ufs: add MODULE_DESCRIPTION()
        lib/rbtree.c: fix the example typo
        ocfs2: add bounds checking to ocfs2_check_dir_entry()
        fs: add kernel-doc comments to ocfs2_prepare_orphan_dir()
        coredump: simplify zap_process()
        selftests/fpu: add missing MODULE_DESCRIPTION() macro
        compiler.h: simplify data_race() macro
        build-id: require program headers to be right after ELF header
        resource: add missing MODULE_DESCRIPTION()
        ...
      527eff22
    • Merge tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm · fbc90c04
      Linus Torvalds authored
      Pull MM updates from Andrew Morton:
      
       - In the series "mm: Avoid possible overflows in dirty throttling" Jan
         Kara addresses a couple of issues in the writeback throttling code.
         These fixes are also targeted at -stable kernels.
      
       - Ryusuke Konishi's series "nilfs2: fix potential issues related to
         reserved inodes" does that. This should actually be in the
         mm-nonmm-stable tree, along with the many other nilfs2 patches. My
         bad.
      
       - More folio conversions from Kefeng Wang in the series "mm: convert to
         folio_alloc_mpol()"
      
       - Kemeng Shi has sent some cleanups to the writeback code in the series
         "Add helper functions to remove repeated code and improve readability
         of cgroup writeback"
      
       - Kairui Song has made the swap code a little smaller and a little
         faster in the series "mm/swap: clean up and optimize swap cache
         index".
      
       - In the series "mm/memory: cleanly support zeropage in
         vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David
         Hildenbrand has reworked the rather sketchy handling of the use of
         the zeropage in MAP_SHARED mappings. I don't see any runtime effects
         here - more a cleanup/understandability/maintainability thing.
      
       - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling
         of higher addresses, for aarch64. The (poorly named) series is
         "Restructure va_high_addr_switch".
      
       - The core TLB handling code gets some cleanups and possible slight
         optimizations in Bang Li's series "Add update_mmu_tlb_range() to
         simplify code".
      
       - Jane Chu has improved the handling of our
         fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in
         the series "Enhance soft hwpoison handling and injection".
      
       - Jeff Johnson has sent a billion patches everywhere to add
         MODULE_DESCRIPTION() to everything. Some landed in this pull.
      
       - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang
         has simplified migration's use of hardware-offload memory copying.
      
       - Yosry Ahmed performs more folio API conversions in his series "mm:
         zswap: trivial folio conversions".
      
       - In the series "large folios swap-in: handle refault cases first",
         Chuanhua Han inches us forward in the handling of large pages in the
         swap code. This is a cleanup and optimization, working toward the end
         objective of full support of large folio swapin/out.
      
       - In the series "mm,swap: cleanup VMA based swap readahead window
         calculation", Huang Ying has contributed some cleanups and a possible
         fixlet to his VMA based swap readahead code.
      
       - In the series "add mTHP support for anonymous shmem" Baolin Wang has
         taught anonymous shmem mappings to use multisize THP. By default this
         is a no-op - users must opt in via sysfs controls. Dramatic
         improvements in pagefault latency are realized.
      
       - David Hildenbrand has some cleanups to our remaining use of
         page_mapcount() in the series "fs/proc: move page_mapcount() to
         fs/proc/internal.h".
      
       - David also has some highmem accounting cleanups in the series
         "mm/highmem: don't track highmem pages manually".
      
       - Build-time fixes and cleanups from John Hubbard in the series
         "cleanups, fixes, and progress towards avoiding "make headers"".
      
       - Cleanups and consolidation of the core pagemap handling from Barry
         Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers
         and utilize them".
      
       - Lance Yang's series "Reclaim lazyfree THP without splitting" has
         reduced the latency of the reclaim of pmd-mapped THPs under fairly
         common circumstances. A 10x speedup is seen in a microbenchmark.
      
         It does this by punting to another CPU but I guess that's a win unless
         all CPUs are pegged.
      
       - hugetlb_cgroup cleanups from Xiu Jianfeng in the series
         "mm/hugetlb_cgroup: rework on cftypes".
      
       - Miaohe Lin's series "Some cleanups for memory-failure" does just that
         thing.
      
       - Someone other than SeongJae has developed a DAMON feature in Honggyu
         Kim's series "DAMON based tiered memory management for CXL memory".
         This adds DAMON features which may be used to help determine the
         efficiency of our placement of CXL/PCIe attached DRAM.
      
       - DAMON user API centralization and simplification work in SeongJae
         Park's series "mm/damon: introduce DAMON parameters online commit
         function".
      
       - In the series "mm: page_type, zsmalloc and page_mapcount_reset()"
         David Hildenbrand does some maintenance work on zsmalloc - partially
         modernizing its use of pageframe fields.
      
       - Kefeng Wang provides more folio conversions in the series "mm: remove
         page_maybe_dma_pinned() and page_mkclean()".
      
       - More cleanup from David Hildenbrand, this time in the series
         "mm/memory_hotplug: use PageOffline() instead of PageReserved() for
         !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline()
         pages" and permits the removal of some virtio-mem hacks.
      
       - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and
         __folio_add_anon_rmap()" is a cleanup to the anon folio handling in
         preparation for mTHP (multisize THP) swapin.
      
       - Kefeng Wang's series "mm: improve clear and copy user folio"
         implements more folio conversions, this time in the area of large
         folio userspace copying.
      
       - The series "Docs/mm/damon/maintaier-profile: document a mailing tool
         and community meetup series" tells people how to get better involved
         with other DAMON developers. From SeongJae Park.
      
       - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does
         that.
      
       - David Hildenbrand sends along more cleanups, this time against the
         migration code. The series is "mm/migrate: move NUMA hinting fault
         folio isolation + checks under PTL".
      
       - Jan Kara has found quite a lot of strangenesses and minor errors in
         the readahead code. He addresses this in the series "mm: Fix various
         readahead quirks".
      
       - SeongJae Park's series "selftests/damon: test DAMOS tried regions and
         {min,max}_nr_regions" adds features and addresses errors in DAMON's
         self testing code.
      
       - Gavin Shan has found a userspace-triggerable WARN in the pagecache
         code. The series "mm/filemap: Limit page cache size to that supported
         by xarray" addresses this. The series is marked cc:stable.
      
       - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations
         and cleanup" cleans up and slightly optimizes KSM.
      
       - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of
         code motion. The series (which also makes the memcg-v1 code
         Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put
         under config option" and "mm: memcg: put cgroup v1-specific memcg
         data under CONFIG_MEMCG_V1"
      
       - Dan Schatzberg's series "Add swappiness argument to memory.reclaim"
         adds an additional feature to this cgroup-v2 control file.
      
       - The series "Userspace controls soft-offline pages" from Jiaqi Yan
         permits userspace to stop the kernel's automatic treatment of
         excessive correctable memory errors, in order to permit userspace to
         monitor and handle this situation.
      
       - Kefeng Wang's series "mm: migrate: support poison recover from
         migrate folio" teaches the kernel to appropriately handle migration
         from poisoned source folios rather than simply panicking.
      
       - SeongJae Park's series "Docs/damon: minor fixups and improvements"
         does those things.
      
       - In the series "mm/zsmalloc: change back to per-size_class lock"
         Chengming Zhou improves zsmalloc's scalability and memory
         utilization.
      
       - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for
         pinning memfd folios" makes the GUP code use FOLL_PIN rather than
         bare refcount increments. So these pages can first be moved aside if
         they reside in the movable zone or a CMA block.
      
       - Andrii Nakryiko has added a binary ioctl()-based API to
         /proc/pid/maps for much faster reading of vma information. The series
         is "query VMAs from /proc/<pid>/maps".
      
       - In the series "mm: introduce per-order mTHP split counters" Lance
         Yang improves the kernel's presentation of developer information
         related to multisize THP splitting.
      
       - Michael Ellerman has developed the series "Reimplement huge pages
         without hugepd on powerpc (8xx, e500, book3s/64)". This permits
         userspace to use all available huge page sizes.
      
       - In the series "revert unconditional slab and page allocator fault
         injection calls" Vlastimil Babka removes a performance-affecting and
         not very useful feature from slab fault injection.
      
      * tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits)
        mm/mglru: fix ineffective protection calculation
        mm/zswap: fix a white space issue
        mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio
        mm/hugetlb: fix possible recursive locking detected warning
        mm/gup: clear the LRU flag of a page before adding to LRU batch
        mm/numa_balancing: teach mpol_to_str about the balancing mode
        mm: memcg1: convert charge move flags to unsigned long long
        alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting
        lib: reuse page_ext_data() to obtain codetag_ref
        lib: add missing newline character in the warning message
        mm/mglru: fix overshooting shrinker memory
        mm/mglru: fix div-by-zero in vmpressure_calc_level()
        mm/kmemleak: replace strncpy() with strscpy()
        mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC
        mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB
        mm: ignore data-race in __swap_writepage
        hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr
        mm: shmem: rename mTHP shmem counters
        mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async()
        mm/migrate: putback split folios when numa hint migration fails
        ...
      fbc90c04
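The MM pull above mentions Dan Schatzberg's series adding a swappiness argument to the cgroup-v2 `memory.reclaim` control file. A minimal sketch of how userspace might build such a request follows. The helper names are hypothetical, and the 0-200 range check is an assumption borrowed from the global `vm.swappiness` knob, not something stated in the pull message.

```python
from pathlib import Path
from typing import Optional

# Assumed valid swappiness range (0-200), mirroring the global vm.swappiness knob.
MIN_SWAPPINESS = 0
MAX_SWAPPINESS = 200


def format_reclaim_request(amount: str, swappiness: Optional[int] = None) -> str:
    """Build the string written to memory.reclaim, e.g. "1G swappiness=60"."""
    if swappiness is None:
        return amount
    if not MIN_SWAPPINESS <= swappiness <= MAX_SWAPPINESS:
        raise ValueError(f"swappiness {swappiness} outside assumed range 0-200")
    return f"{amount} swappiness={swappiness}"


def request_reclaim(cgroup_dir: str, amount: str, swappiness: Optional[int] = None) -> None:
    """Write the request to <cgroup_dir>/memory.reclaim (hypothetical helper).

    Requires a cgroup v2 hierarchy and write permission on the control file.
    """
    Path(cgroup_dir, "memory.reclaim").write_text(format_reclaim_request(amount, swappiness))
```

Only the string formatting is exercised here; `request_reclaim()` touches the real control file and is left for the reader to try on a kernel that carries the series.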
  15. 21 Jul, 2024 3 commits
    • Merge tag 'rtc-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux · 7846b618
      Linus Torvalds authored
      Pull RTC updates from Alexandre Belloni:
       "Subsystem:
         - add missing MODULE_DESCRIPTION() macro
         - fix offset addition for alarms
      
        Drivers:
         - isl1208: alarm clearing fixes
         - mcp794xx: oscillator failure detection
         - stm32: stm32mp25 support
         - tps6594: power management support"
      
      * tag 'rtc-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
        rtc: stm32: add new st,stm32mp25-rtc compatible and check RIF configuration
        dt-bindings: rtc: stm32: introduce new st,stm32mp25-rtc compatible
        rtc: Drop explicit initialization of struct i2c_device_id::driver_data to 0
        rtc: interface: Add RTC offset to alarm after fix-up
        rtc: ds1307: Clamp year to valid BCD (0-99) in `set_time()`
        rtc: ds1307: Detect oscillator fail on mcp794xx
        rtc: isl1208: Update correct procedure for clearing alarm
        rtc: isl1208: Add a delay for clearing alarm
        dt-bindings: rtc: Convert rtc-fsl-ftm-alarm.txt to yaml format
        rtc: add missing MODULE_DESCRIPTION() macro
        rtc: abx80x: Fix return value of nvmem callback on read
        rtc: cmos: Fix return value of nvmem callbacks
        rtc: isl1208: Fix return value of nvmem callbacks
        rtc: tps6594: Add power management support
        rtc: tps6594: introduce private structure as drvdata
        rtc: tps6594: Fix memleak in probe
      7846b618
    • Merge tag '6.11-rc-part1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 33c9de29
      Linus Torvalds authored
      Pull smb client fixes from Steve French:
       "Six smb3 client fixes, most for stable including important netfs fixes:
      
         - various netfs related fixes for cifs addressing some regressions in
           6.10 (e.g. generic/708 and some multichannel crediting related
           issues)
      
         - fix for a noisy log message on copy_file_range
      
         - add trace point for read/write credits"
      
      * tag '6.11-rc-part1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: Fix missing fscache invalidation
        cifs: Add a tracepoint to track credits involved in R/W requests
        cifs: Fix setting of zero_point after DIO write
        cifs: Fix missing error code set
        cifs: Fix server re-repick on subrequest retry
        cifs: fix noisy message on copy_file_range
      33c9de29
    • Merge tag 'pinctrl-v6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 8e313211
      Linus Torvalds authored
      Pull pin control updates from Linus Walleij:
       "Some new drivers are the main part, the rest is cleanups and nonurgent
        fixes.
      
        Nothing much special about this, no core changes this time.
      
        New drivers:
      
         - Renesas RZ/V2H(P) SoC
      
         - NXP Freescale i.MX91 SoC
      
         - Nuvoton MA35D1 SoC
      
         - Qualcomm PMC8380, SM4250, SM4250 LPI
      
        Enhancements:
      
         - A slew of scope-based simplifications of of_node_put()"
      
      * tag 'pinctrl-v6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (110 commits)
        pinctrl: renesas: rzg2l: Support output enable on RZ/G2L
        pinctrl: renesas: rzg2l: Clean up and refactor OEN read/write functions
        pinctrl: renesas: rzg2l: Clarify OEN read/write support
        dt-bindings: pinctrl: pinctrl-single: Fix pinctrl-single,gpio-range description
        dt-bindings: pinctrl: npcm8xx: add missing pin group and mux function
        dt-bindings: pinctrl: pinctrl-single: fix schmitt related properties
        pinctrl: freescale: Use scope based of_node_put() cleanups
        pinctrl: equilibrium: Use scope based of_node_put() cleanups
        pinctrl: ti: iodelay: Use scope based of_node_put() cleanups
        pinctrl: qcom: lpass-lpi: increase MAX_NR_GPIO to 32
        pinctrl: cy8c95x0: Update cache modification
        pinctrl: cy8c95x0: Use cleanup.h
        pinctrl: renesas: r8a779h0: Remove unneeded separators
        pinctrl: renesas: r8a779g0: Add INTC-EX pins, groups, and function
        pinctrl: renesas: r8a779g0: Remove unneeded separators
        pinctrl: renesas: r8a779h0: Add AVB MII pins and groups
        pinctrl: renesas: r8a779g0: Fix TPU suffixes
        pinctrl: renesas: r8a779g0: Fix TCLK suffixes
        pinctrl: renesas: r8a779g0: FIX PWM suffixes
        pinctrl: renesas: r8a779g0: Fix IRQ suffixes
        ...
      8e313211
  16. 20 Jul, 2024 2 commits
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 2c9b3512
      Linus Torvalds authored
      Pull kvm updates from Paolo Bonzini:
       "ARM:
      
         - Initial infrastructure for shadow stage-2 MMUs, as part of nested
           virtualization enablement
      
         - Support for userspace changes to the guest CTR_EL0 value, enabling
           (in part) migration of VMs between heterogeneous hardware
      
         - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1
           of the protocol
      
         - FPSIMD/SVE support for nested, including merged trap configuration
           and exception routing
      
         - New command-line parameter to control the WFx trap behavior under
           KVM
      
         - Introduce kCFI hardening in the EL2 hypervisor
      
         - Fixes + cleanups for handling presence/absence of FEAT_TCRX
      
         - Miscellaneous fixes + documentation updates
      
        LoongArch:
      
         - Add paravirt steal time support
      
         - Add support for KVM_DIRTY_LOG_INITIALLY_SET
      
         - Add perf kvm-stat support for loongarch
      
        RISC-V:
      
         - Redirect AMO load/store access fault traps to guest
      
         - perf kvm stat support
      
         - Use guest files for IMSIC virtualization, when available
      
        s390:
      
         - Assortment of tiny fixes which are not time critical
      
        x86:
      
         - Fixes for Xen emulation
      
         - Add a global struct to consolidate tracking of host values, e.g.
           EFER
      
         - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the
           effective APIC bus frequency, because TDX
      
         - Print the name of the APICv/AVIC inhibits in the relevant
           tracepoint
      
         - Clean up KVM's handling of vendor specific emulation to
           consistently act on "compatible with Intel/AMD", versus checking
           for a specific vendor
      
         - Drop MTRR virtualization, and instead always honor guest PAT on
           CPUs that support self-snoop
      
         - Update to the newfangled Intel CPU FMS infrastructure
      
         - Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as
           it reads '0' and writes from userspace are ignored
      
         - Misc cleanups
      
        x86 - MMU:
      
         - Small cleanups, renames and refactoring extracted from the upcoming
           Intel TDX support
      
         - Don't allocate kvm_mmu_page.shadowed_translation for shadow pages
           that can't hold leaf SPTEs
      
         - Unconditionally drop mmu_lock when allocating TDP MMU page tables
           for eager page splitting, to avoid stalling vCPUs when splitting
           huge pages
      
         - Bug the VM instead of simply warning if KVM tries to split a SPTE
           that is non-present or not-huge. KVM is guaranteed to end up in a
           broken state because the callers fully expect a valid SPTE, so it is
           dangerous to let more MMU changes happen afterwards
      
        x86 - AMD:
      
         - Make per-CPU save_area allocations NUMA-aware
      
         - Force sev_es_host_save_area() to be inlined to avoid calling into
           an instrumentable function from noinstr code
      
         - Base support for running SEV-SNP guests. API-wise, this includes a
           new KVM_X86_SNP_VM type, encrypting/measuring the initial image into
           guest memory, and finalizing it before launching it. Internally,
           there are some gmem/mmu hooks needed to prepare gmem-allocated
           pages before mapping them into guest private memory ranges
      
           This includes basic support for attestation guest requests, enough
           to say that KVM supports the GHCB 2.0 specification
      
           There is no support yet for loading into the firmware those signing
           keys to be used for attestation requests, and therefore no need yet
           for the host to provide certificate data for those keys.
      
           To support fetching certificate data from userspace, a new KVM exit
           type will be needed to handle fetching the certificate from
           userspace.
      
           An attempt to define a new KVM_EXIT_COCO / KVM_EXIT_COCO_REQ_CERTS
           exit type to handle this was introduced in v1 of this patchset, but
           is still being discussed by the community, so for now this patchset
           only implements a stub version of SNP Extended Guest Requests that
           does not provide certificate data
      
        x86 - Intel:
      
         - Remove an unnecessary EPT TLB flush when enabling hardware
      
         - Fix a series of bugs that cause KVM to fail to detect nested
           pending posted interrupts as valid wake events for a vCPU executing
           HLT in L2 (with HLT-exiting disabled by L1)
      
         - KVM: x86: Suppress MMIO that is triggered during task switch
           emulation
      
           Explicitly suppress userspace emulated MMIO exits that are
           triggered when emulating a task switch as KVM doesn't support
           userspace MMIO during complex (multi-step) emulation
      
           Silently ignoring the exit request can result in the
           WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace
           for some other reason prior to purging mmio_needed
      
           See commit 0dc90226 ("KVM: x86: Suppress pending MMIO write
           exits if emulator detects exception") for more details on KVM's
           limitations with respect to emulated MMIO during complex emulator
           flows
      
        Generic:
      
         - Rename the AS_UNMOVABLE flag that was introduced for KVM to
           AS_INACCESSIBLE, because the special casing needed by these pages
           is not due to just unmovability (and in fact they are only
           unmovable because the CPU cannot access them)
      
         - New ioctl to populate the KVM page tables in advance, which is
           useful to mitigate KVM page faults during guest boot or after live
           migration. The code will also be used by TDX, but (probably) not
           through the ioctl
      
         - Enable halt poll shrinking by default, as Intel found it to be a
           clear win
      
         - Set up empty IRQ routing when creating a VM to avoid having to
           synchronize SRCU when creating a split IRQCHIP on x86
      
         - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with
           a flag that arch code can use for hooking both sched_in() and
           sched_out()
      
         - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
           truncating a bogus value from userspace, e.g. to help userspace
           detect bugs
      
         - Mark a vCPU as preempted if and only if it's scheduled out while in
           the KVM_RUN loop, e.g. to avoid marking it preempted and thus
           writing guest memory when retrieving guest state during live
           migration blackout
      
        Selftests:
      
         - Remove dead code in the memslot modification stress test
      
         - Treat "branch instructions retired" as supported on all AMD Family
           17h+ CPUs
      
         - Print the guest pseudo-RNG seed only when it changes, to avoid
           spamming the log for tests that create lots of VMs
      
         - Make the PMU counters test less flaky when counting LLC cache
           misses by doing CLFLUSH{OPT} in every loop iteration"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
        crypto: ccp: Add the SNP_VLEK_LOAD command
        KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops
        KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops
        KVM: x86: Replace static_call_cond() with static_call()
        KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
        x86/sev: Move sev_guest.h into common SEV header
        KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
        KVM: x86: Suppress MMIO that is triggered during task switch emulation
        KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro
        KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE
        KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY
        KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory()
        KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level
        KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault"
        KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler
        KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory
        KVM: Document KVM_PRE_FAULT_MEMORY ioctl
        mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE
        perf kvm: Add kvm-stat for loongarch64
        LoongArch: KVM: Add PV steal time support in guest side
        ...
      2c9b3512
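The KVM pull above notes that the vCPU @id is now taken as an "unsigned long" instead of "u32" so that a bogus value from userspace is not truncated. A small sketch of why truncation can hide such a value is given below. It models the two behaviors in Python rather than quoting kernel code, assumes a 64-bit "unsigned long", and uses a hypothetical `MAX_VCPU_IDS` limit (the real bound is architecture-specific).

```python
U32_MASK = 0xFFFFFFFF

# Hypothetical limit for illustration only; the real bound is architecture-specific.
MAX_VCPU_IDS = 1024


def accepts_id_as_u32(raw_id: int) -> bool:
    """Model the old behavior: the value is truncated to 32 bits before the range check."""
    truncated = raw_id & U32_MASK
    return truncated < MAX_VCPU_IDS


def accepts_id_as_unsigned_long(raw_id: int) -> bool:
    """Model the new behavior: the full-width value is range-checked."""
    return 0 <= raw_id < MAX_VCPU_IDS


# A bogus id with upper bits set whose low 32 bits happen to equal 1.
bogus_id = (1 << 32) + 1
```

With these definitions, `accepts_id_as_u32(bogus_id)` lets the value through because only the low 32 bits are inspected, while `accepts_id_as_unsigned_long(bogus_id)` rejects it, which is the kind of userspace bug the change helps surface.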
    • cifs: Fix missing fscache invalidation · a07d38af
      David Howells authored
      A network filesystem needs to implement a netfslib hook to invalidate
      fscache if it's to be able to use the cache.
      
      Fix cifs to implement the cache invalidation hook.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: linux-cifs@vger.kernel.org
      cc: netfs@lists.linux.dev
      cc: linux-fsdevel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: 3ee1a1fc ("cifs: Cut over to using netfslib")
      Signed-off-by: Steve French <stfrench@microsoft.com>
      a07d38af