1. 02 Sep, 2024 13 commits
    • mm/memcontrol: respect zswap.writeback setting from parent cg too · e3992573
      Mike Yuan authored
      Currently, the behavior of zswap.writeback wrt.  the cgroup hierarchy
      seems a bit odd.  Unlike zswap.max, it doesn't honor the value from parent
      cgroups.  This surfaced when people tried to globally disable zswap
      writeback, i.e.  reserve physical swap space only for hibernation [1] -
      disabling zswap.writeback only for the root cgroup results in subcgroups
      with zswap.writeback=1 still performing writeback.
      
      The inconsistency became more noticeable after I introduced the
      MemoryZSwapWriteback= systemd unit setting [2] for controlling the knob.
      The patch assumed that the kernel would enforce the value of parent
      cgroups.  It could probably be worked around on systemd's side, by going
      up the slice unit tree and inheriting the value.  Yet I think it's more
      sensible to make it behave consistently with zswap.max and friends.
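      A minimal C sketch of the hierarchical check described above. It assumes
      a simplified, hypothetical `struct cg` in place of the kernel's memcg
      structures: writeback is allowed only if the cgroup and every ancestor
      have zswap.writeback enabled, so disabling it on the root disables it
      for the whole tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified cgroup node; not the kernel's struct mem_cgroup. */
struct cg {
	struct cg *parent;     /* NULL for the root cgroup */
	bool zswap_writeback;  /* this cgroup's own zswap.writeback value */
};

/* Old behaviour: only the cgroup's own setting is consulted. */
static bool writeback_enabled_local(const struct cg *cg)
{
	assert(cg != NULL);
	return cg->zswap_writeback;
}

/* Hierarchical behaviour: any ancestor with writeback disabled wins. */
static bool writeback_enabled_hierarchical(const struct cg *cg)
{
	assert(cg != NULL);
	for (; cg != NULL; cg = cg->parent)
		if (!cg->zswap_writeback)
			return false;
	return true;
}
```

      With the root set to 0 and a child left at 1, the local check still
      allows writeback for the child while the hierarchical check refuses it.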
      
      [1] https://wiki.archlinux.org/title/Power_management/Suspend_and_hibernate#Disable_zswap_writeback_to_use_the_swap_space_only_for_hibernation
      [2] https://github.com/systemd/systemd/pull/31734
      
      Link: https://lkml.kernel.org/r/20240823162506.12117-1-me@yhndnzj.com
      Fixes: 501a06fe ("zswap: memcontrol: implement zswap writeback disabling")
      Signed-off-by: Mike Yuan <me@yhndnzj.com>
      Reviewed-by: Nhat Pham <nphamcs@gmail.com>
      Acked-by: Yosry Ahmed <yosryahmed@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Shakeel Butt <shakeel.butt@linux.dev>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • scripts: fix gfp-translate after ___GFP_*_BITS conversion to an enum · a3f6a89c
      Marc Zyngier authored
      Richard reports that since 772dd034 ("mm: enumerate all gfp flags"),
      gfp-translate is broken, as the bit numbers are implicit, leaving the
      shell script unable to extract them.  Even more, some bits are now at a
      variable location, making it double extra hard to parse using a simple
      shell script.
      
      Use a brute-force approach to the problem by generating a small C stub
      that will use the enum to dump the interesting bits.
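      A rough illustration of the C-stub idea, using a hypothetical miniature
      enum rather than the real gfp_types.h contents (all EX_* names are made
      up).  Because the compiler assigns the implicit bit numbers, including
      ones that shift when a config option adds a member, a table built from
      the enum can decode a mask without parsing header text.  The script's
      generated stub uses the kernel's own enum instead.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag enum with implicit values.  EX_OPTIONAL_BIT only exists
 * when EXAMPLE_CONFIG_OPTION is defined, so every later bit moves by one. */
enum {
	EX_DMA_BIT,
	EX_HIGHMEM_BIT,
#ifdef EXAMPLE_CONFIG_OPTION
	EX_OPTIONAL_BIT,
#endif
	EX_IO_BIT,
	EX_FS_BIT,
	EX_LAST_BIT
};

/* Table filled in by the compiler, so no bit number is parsed from text. */
static const struct { int bit; const char *name; } ex_flags[] = {
	{ EX_DMA_BIT, "EX_DMA" },
	{ EX_HIGHMEM_BIT, "EX_HIGHMEM" },
#ifdef EXAMPLE_CONFIG_OPTION
	{ EX_OPTIONAL_BIT, "EX_OPTIONAL" },
#endif
	{ EX_IO_BIT, "EX_IO" },
	{ EX_FS_BIT, "EX_FS" },
};

/* Print the name of each known set bit; return the number of set bits
 * that are invalid for this configuration (at or above EX_LAST_BIT). */
static int translate_mask(unsigned long mask)
{
	int invalid = 0;

	for (size_t i = 0; i < sizeof(ex_flags) / sizeof(ex_flags[0]); i++)
		if (mask & (1UL << ex_flags[i].bit))
			printf("%s\n", ex_flags[i].name);
	for (int bit = EX_LAST_BIT; bit < (int)(8 * sizeof(mask)); bit++)
		if (mask & (1UL << bit)) {
			printf("invalid bit %d\n", bit);
			invalid++;
		}
	return invalid;
}
```

      Reporting bits at or above EX_LAST_BIT as invalid corresponds to the
      "added bonus" mentioned above.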
      
      As an added bonus, we are now able to identify invalid bits for a given
      configuration.  As an added drawback, we cannot parse include files that
      predate this change anymore.  Tough luck.
      
      Link: https://lkml.kernel.org/r/20240823163850.3791201-1-maz@kernel.org
      Fixes: 772dd034 ("mm: enumerate all gfp flags")
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Reported-by: Richard Weinberger <richard@nod.at>
      Cc: Petr Tesařík <petr@tesarici.cz>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Revert "mm: skip CMA pages when they are not available" · bfe0857c
      Usama Arif authored
      This reverts commit 5da226db ("mm: skip CMA pages when they are not
      available") and b7108d66 ("Multi-gen LRU: skip CMA pages when they are
      not eligible").
      
      lruvec->lru_lock is highly contended and is held when calling
      isolate_lru_folios.  If the lru has a large number of CMA folios
      consecutively, while the allocation type requested is not MIGRATE_MOVABLE,
      isolate_lru_folios can hold the lock for a very long time while it skips
      those.  For an FIO workload, ~150 million order=0 folios were skipped to
      isolate a few ZONE_DMA folios [1].  This can cause lockups [1] and high
      memory pressure for extended periods of time [2].
      
      Remove skipping CMA for MGLRU as well, as it was introduced in sort_folio
      for the same reason as 5da226db.
      
      [1] https://lore.kernel.org/all/CAOUHufbkhMZYz20aM_3rHZ3OcK4m2puji2FGpUpn_-DevGk3Kg@mail.gmail.com/
      [2] https://lore.kernel.org/all/ZrssOrcJIDy8hacI@gmail.com/
      
      [usamaarif642@gmail.com: also revert b7108d66, per Johannes]
        Link: https://lkml.kernel.org/r/9060a32d-b2d7-48c0-8626-1db535653c54@gmail.com
        Link: https://lkml.kernel.org/r/357ac325-4c61-497a-92a3-bdbd230d5ec9@gmail.com
      Link: https://lkml.kernel.org/r/9060a32d-b2d7-48c0-8626-1db535653c54@gmail.com
      Fixes: 5da226db ("mm: skip CMA pages when they are not available")
      Signed-off-by: Usama Arif <usamaarif642@gmail.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Bharata B Rao <bharata@amd.com>
      Cc: Breno Leitao <leitao@debian.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: Zhaoyang Huang <huangzhaoyang@gmail.com>
      Cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • maple_tree: remove rcu_read_lock() from mt_validate() · f806de88
      Liam R. Howlett authored
      The write lock should be held when validating the tree to avoid updates
      racing with checks.  Holding the rcu read lock during a large tree
      validation may also cause a prolonged rcu read window and "rcu_preempt
      detected stalls" warnings.
      
      Link: https://lore.kernel.org/all/0000000000001d12d4062005aea1@google.com/
      Link: https://lkml.kernel.org/r/20240820175417.2782532-1-Liam.Howlett@oracle.com
      Fixes: 54a611b6 ("Maple Tree: add new data structure")
      Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
      Reported-by: syzbot+036af2f0c7338a33b0cd@syzkaller.appspotmail.com
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y · 6dacd79d
      Petr Tesarik authored
      Fix the condition to exclude the elfcorehdr segment from the SHA digest
      calculation.
      
      The j iterator is an index into the output sha_regions[] array, not into
      the input image->segment[] array.  Once it reaches
      image->elfcorehdr_index, all subsequent segments are excluded.  Besides,
      if the purgatory segment precedes the elfcorehdr segment, the elfcorehdr
      may be wrongly included in the calculation.
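      The indexing mistake can be reproduced with a small, hypothetical model
      of the segment loop (struct and function names are illustrative, not
      the kernel's).  With the purgatory at input index 1 and the elfcorehdr
      at input index 3, checking the output index j includes segment 3 and
      drops segment 4, while checking the input index i excludes exactly
      segment 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SEGS 16

/* Hypothetical, simplified description of the input segment array. */
struct seg_model {
	size_t nr_segments;
	size_t elfcorehdr_index;  /* index into the input segment array */
	size_t purgatory_index;   /* input index of the purgatory segment */
};

/* Store in out[] the input indices of segments that go into the digest.
 * fixed == false compares the output index j (the bug); fixed == true
 * compares the input index i.  Returns the number of selected segments. */
static size_t select_digest_segments(const struct seg_model *m, bool fixed,
				     size_t out[MAX_SEGS])
{
	size_t i, j;

	assert(m->nr_segments <= MAX_SEGS);
	for (j = i = 0; i < m->nr_segments; i++) {
		size_t idx = fixed ? i : j;

		/* Exclude the elfcorehdr segment so it can change later. */
		if (idx == m->elfcorehdr_index)
			continue;
		/* Skip purgatory: it is modified after the digest is stored. */
		if (i == m->purgatory_index)
			continue;
		out[j++] = i;
	}
	return j;
}
```

      In the buggy variant j stops advancing once it equals
      elfcorehdr_index, which is why every later segment is excluded too.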
      
      Link: https://lkml.kernel.org/r/20240805150750.170739-1-petr.tesarik@suse.com
      Fixes: f7cc804a ("kexec: exclude elfcorehdr from the segment digest")
      Signed-off-by: Petr Tesarik <ptesarik@suse.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Hari Bathini <hbathini@linux.ibm.com>
      Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
      Cc: Eric DeVolder <eric_devolder@yahoo.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/slub: add check for s->flags in the alloc_tagging_slab_free_hook · ab7ca095
      Hao Ge authored
      When CONFIG_MEMCG, CONFIG_KFENCE and CONFIG_KMEMLEAK are all enabled, the
      warning below always occurs.  This is because of the following call stack:
      mem_pool_alloc
          kmem_cache_alloc_noprof
              slab_alloc_node
                  kfence_alloc
      
      Once the kfence allocation succeeds, slab->obj_exts is no longer empty,
      because it has already been assigned a value in kfence_init_pool.
      
      Since the prepare_slab_obj_exts_hook function checks
      s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE), alloc_tag_add is not
      called as a result.  Therefore, ref->ct remains NULL.
      
      However, when mem_pool_free is called, obj_ext is not empty, so
      alloc_tag_sub eventually gets invoked.  This is where the warning occurs.
      
      So add the corresponding checks to alloc_tagging_slab_free_hook.
      For the __GFP_NO_OBJ_EXT case, I didn't find a specific case where it
      uses kfence, so I won't add the corresponding check in
      alloc_tagging_slab_free_hook for now.
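      A simplified model of why the free hook needs the same flag check as
      the allocation hook, using hypothetical flag values and types rather
      than the real SLAB_* definitions.  When tagging is skipped at
      allocation time but obj_exts is still present (as with kfence), an
      unconditional free hook finds an unset tag and would warn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; the real SLAB_* flags are defined differently. */
#define EX_SLAB_NO_OBJ_EXT   (1u << 0)
#define EX_SLAB_NOLEAKTRACE  (1u << 1)
#define EX_SKIP_TAGGING      (EX_SLAB_NO_OBJ_EXT | EX_SLAB_NOLEAKTRACE)

/* Stands in for the per-object extension holding the tag reference. */
struct ex_obj_ext { const char *ct; };

/* Allocation side: the tag is only recorded when the cache flags allow it. */
static void ex_alloc_hook(unsigned int cache_flags, struct ex_obj_ext *ext,
			  const char *tag)
{
	if (cache_flags & EX_SKIP_TAGGING)
		return;
	if (ext)
		ext->ct = tag;
}

/* Free side: returns true if it would warn "alloc_tag was not set".
 * check_flags selects the fixed behaviour that mirrors the alloc hook. */
static bool ex_free_hook(unsigned int cache_flags, struct ex_obj_ext *ext,
			 bool check_flags)
{
	if (check_flags && (cache_flags & EX_SKIP_TAGGING))
		return false;
	if (!ext)
		return false;
	bool warn = (ext->ct == NULL);
	ext->ct = NULL;
	return warn;
}
```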
      
      [    3.734349] ------------[ cut here ]------------
      [    3.734807] alloc_tag was not set
      [    3.735129] WARNING: CPU: 4 PID: 40 at ./include/linux/alloc_tag.h:130 kmem_cache_free+0x444/0x574
      [    3.735866] Modules linked in: autofs4
      [    3.736211] CPU: 4 UID: 0 PID: 40 Comm: ksoftirqd/4 Tainted: G        W          6.11.0-rc3-dirty #1
      [    3.736969] Tainted: [W]=WARN
      [    3.737258] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
      [    3.737875] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [    3.738501] pc : kmem_cache_free+0x444/0x574
      [    3.738951] lr : kmem_cache_free+0x444/0x574
      [    3.739361] sp : ffff80008357bb60
      [    3.739693] x29: ffff80008357bb70 x28: 0000000000000000 x27: 0000000000000000
      [    3.740338] x26: ffff80008207f000 x25: ffff000b2eb2fd60 x24: ffff0000c0005700
      [    3.740982] x23: ffff8000804229e4 x22: ffff800082080000 x21: ffff800081756000
      [    3.741630] x20: fffffd7ff8253360 x19: 00000000000000a8 x18: ffffffffffffffff
      [    3.742274] x17: ffff800ab327f000 x16: ffff800083398000 x15: ffff800081756df0
      [    3.742919] x14: 0000000000000000 x13: 205d344320202020 x12: 5b5d373038343337
      [    3.743560] x11: ffff80008357b650 x10: 000000000000005d x9 : 00000000ffffffd0
      [    3.744231] x8 : 7f7f7f7f7f7f7f7f x7 : ffff80008237bad0 x6 : c0000000ffff7fff
      [    3.744907] x5 : ffff80008237ba78 x4 : ffff8000820bbad0 x3 : 0000000000000001
      [    3.745580] x2 : 68d66547c09f7800 x1 : 68d66547c09f7800 x0 : 0000000000000000
      [    3.746255] Call trace:
      [    3.746530]  kmem_cache_free+0x444/0x574
      [    3.746931]  mem_pool_free+0x44/0xf4
      [    3.747306]  free_object_rcu+0xc8/0xdc
      [    3.747693]  rcu_do_batch+0x234/0x8a4
      [    3.748075]  rcu_core+0x230/0x3e4
      [    3.748424]  rcu_core_si+0x14/0x1c
      [    3.748780]  handle_softirqs+0x134/0x378
      [    3.749189]  run_ksoftirqd+0x70/0x9c
      [    3.749560]  smpboot_thread_fn+0x148/0x22c
      [    3.749978]  kthread+0x10c/0x118
      [    3.750323]  ret_from_fork+0x10/0x20
      [    3.750696] ---[ end trace 0000000000000000 ]---
      
      Link: https://lkml.kernel.org/r/20240816013336.17505-1-hao.ge@linux.dev
      Fixes: 4b873696 ("mm/slab: add allocation accounting into slab allocation and free paths")
      Signed-off-by: Hao Ge <gehao@kylinos.cn>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <kees@kernel.org>
      Cc: Kent Overstreet <kent.overstreet@linux.dev>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • nilfs2: fix state management in error path of log writing function · 6576dd66
      Ryusuke Konishi authored
      After commit a694291a ("nilfs2: separate wait function from
      nilfs_segctor_write") was applied, the log writing function
      nilfs_segctor_do_construct() was able to issue I/O requests continuously
      even if user data blocks were split into multiple logs across segments,
      but two potential flaws were introduced in its error handling.
      
      First, if nilfs_segctor_begin_construction() fails while creating the
      second or subsequent logs, the log writing function returns without
      calling nilfs_segctor_abort_construction(), so the writeback flag set on
      pages/folios will remain uncleared.  This causes page cache operations to
      hang waiting for the writeback flag.  For example,
      truncate_inode_pages_final(), which is called via nilfs_evict_inode() when
      an inode is evicted from memory, will hang.
      
      Second, the NILFS_I_COLLECTED flag set on normal inodes remains uncleared.
      As a result, if the next log write involves checkpoint creation, that's
      fine, but if a partial log write is performed that does not, inodes with
      NILFS_I_COLLECTED set are erroneously removed from the "sc_dirty_files"
      list, and their data and b-tree blocks may not be written to the device,
      corrupting the block mapping.
      
      Fix these issues by uniformly calling nilfs_segctor_abort_construction()
      on failure of each step in the loop in nilfs_segctor_do_construct(),
      having it clean up logs and segment usages according to progress, and
      correcting the conditions for calling nilfs_redirty_inodes() to ensure
      that the NILFS_I_COLLECTED flag is cleared.
      
      Link: https://lkml.kernel.org/r/20240814101119.4070-1-konishi.ryusuke@gmail.com
      Fixes: a694291a ("nilfs2: separate wait function from nilfs_segctor_write")
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • nilfs2: fix missing cleanup on rollforward recovery error · 5787fcaa
      Ryusuke Konishi authored
      In an error injection test of a routine for mount-time recovery, KASAN
      found a use-after-free bug.
      
      It turned out that if data recovery was performed using partial logs
      created by dsync writes, but an error occurred before starting the log
      writer to create a recovered checkpoint, the inodes whose data had been
      recovered were left in the ns_dirty_files list of the nilfs object and
      were not freed.
      
      Fix this issue by cleaning up inodes that have read the recovery data if
      the recovery routine fails midway before the log writer starts.
      
      Link: https://lkml.kernel.org/r/20240810065242.3701-1-konishi.ryusuke@gmail.com
      Fixes: 0f3e1c7f ("nilfs2: recovery functions")
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • nilfs2: protect references to superblock parameters exposed in sysfs · 68340825
      Ryusuke Konishi authored
      The superblock buffers of nilfs2 can not only be overwritten at runtime
      for modifications/repairs, but they are also regularly swapped, replaced
      during resizing, and even abandoned when degrading to one side due to
      backing device issues.  So, accessing them requires mutual exclusion using
      the reader/writer semaphore "nilfs->ns_sem".
      
      Some sysfs attribute show methods read this superblock buffer without the
      necessary mutual exclusion, which can cause problems with pointer
      dereferencing and memory access, so fix it.
      
      Link: https://lkml.kernel.org/r/20240811100320.9913-1-konishi.ryusuke@gmail.com
      Fixes: da7141fb ("nilfs2: add /sys/fs/nilfs2/<device> group")
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • userfaultfd: don't BUG_ON() if khugepaged yanks our page table · 4828d207
      Jann Horn authored
      Since khugepaged was changed to allow retracting page tables in file
      mappings without holding the mmap lock, these BUG_ON()s are wrong - get
      rid of them.
      
      We could also remove the preceding "if (unlikely(...))" block, but then we
      could reach pte_offset_map_lock() with transhuge pages not just for file
      mappings but also for anonymous mappings - which would probably be fine
      but I think is not necessarily expected.
      
      Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-2-5efa61078a41@google.com
      Fixes: 1d65b771 ("mm/khugepaged: retract_page_tables() without mmap or vma lock")
      Signed-off-by: Jann Horn <jannh@google.com>
      Reviewed-by: Qi Zheng <zhengqi.arch@bytedance.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Pavel Emelyanov <xemul@virtuozzo.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • userfaultfd: fix checks for huge PMDs · 71c186ef
      Jann Horn authored
      Patch series "userfaultfd: fix races around pmd_trans_huge() check", v2.
      
      The pmd_trans_huge() code in mfill_atomic() is wrong in three different
      ways depending on kernel version:
      
      1. The pmd_trans_huge() check is racy and can lead to a BUG_ON() (if you hit
         the right two race windows) - I've tested this in a kernel build with
         some extra mdelay() calls. See the commit message for a description
         of the race scenario.
         On older kernels (before 6.5), I think the same bug can even
         theoretically lead to accessing transhuge page contents as a page table
         if you hit the right 5 narrow race windows (I haven't tested this case).
      2. As pointed out by Qi Zheng, pmd_trans_huge() is not sufficient for
         detecting PMDs that don't point to page tables.
         On older kernels (before 6.5), you'd just have to win a single fairly
         wide race to hit this.
         I've tested this on 6.1 stable by racing migration (with a mdelay()
         patched into try_to_migrate()) against UFFDIO_ZEROPAGE - on my x86
         VM, that causes a kernel oops in ptlock_ptr().
      3. On newer kernels (>=6.5), for shmem mappings, khugepaged is allowed
         to yank page tables out from under us (though I haven't tested that),
         so I think the BUG_ON() checks in mfill_atomic() are just wrong.
      
      I decided to write two separate fixes for these (one fix for bugs 1+2, one
      fix for bug 3), so that the first fix can be backported to kernels
      affected by bugs 1+2.
      
      
      This patch (of 2):
      
      This fixes two issues.
      
      I discovered that the following race can occur:
      
        mfill_atomic                other thread
        ============                ============
                                    <zap PMD>
        pmdp_get_lockless() [reads none pmd]
        <bail if trans_huge>
        <if none:>
                                    <pagefault creates transhuge zeropage>
          __pte_alloc [no-op]
                                    <zap PMD>
        <bail if pmd_trans_huge(*dst_pmd)>
        BUG_ON(pmd_none(*dst_pmd))
      
      I have experimentally verified this in a kernel with extra mdelay() calls;
      the BUG_ON(pmd_none(*dst_pmd)) triggers.
      
      On kernels newer than commit 0d940a9b ("mm/pgtable: allow
      pte_offset_map[_lock]() to fail"), this can't lead to anything worse than
      a BUG_ON(), since the page table access helpers are actually designed to
      deal with page tables concurrently disappearing; but on older kernels
      (<=6.4), I think we could probably theoretically race past the two
      BUG_ON() checks and end up treating a hugepage as a page table.
      
      The second issue is that, as Qi Zheng pointed out, there are other types
      of huge PMDs that pmd_trans_huge() can't catch: devmap PMDs and swap PMDs
      (in particular, migration PMDs).
      
      On <=6.4, this is worse than the first issue: If mfill_atomic() runs on a
      PMD that contains a migration entry (which just requires winning a single,
      fairly wide race), it will pass the PMD to pte_offset_map_lock(), which
      assumes that the PMD points to a page table.
      
      Breakage follows: First, the kernel tries to take the PTE lock (which will
      crash or maybe worse if there is no "struct page" for the address bits in
      the migration entry PMD - I think at least on X86 there usually is no
      corresponding "struct page" thanks to the PTE inversion mitigation, amd64
      looks different).
      
      If that didn't crash, the kernel would next try to write a PTE into what
      it wrongly thinks is a page table.
      
      As part of fixing these issues, get rid of the check for pmd_trans_huge()
      before __pte_alloc() - that's redundant, we're going to have to check for
      that after the __pte_alloc() anyway.
      
      Backport note: pmdp_get_lockless() is pmd_read_atomic() in older kernels.
      
      Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-0-5efa61078a41@google.com
      Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-1-5efa61078a41@google.com
      Fixes: c1a4de99 ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation")
      Signed-off-by: Jann Horn <jannh@google.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Pavel Emelyanov <xemul@virtuozzo.com>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: vmalloc: ensure vmap_block is initialised before adding to queue · 3e3de794
      Will Deacon authored
      Commit 8c61291f ("mm: fix incorrect vbq reference in
      purge_fragmented_block") extended the 'vmap_block' structure to contain a
      'cpu' field which is set at allocation time to the id of the initialising
      CPU.
      
      When a new 'vmap_block' is being instantiated by new_vmap_block(), the
      partially initialised structure is added to the local 'vmap_block_queue'
      xarray before the 'cpu' field has been initialised.  If another CPU is
      concurrently walking the xarray (e.g.  via vm_unmap_aliases()), then it
      may perform an out-of-bounds access to the remote queue thanks to an
      uninitialised index.
      
      This has been observed as UBSAN errors in Android:
      
       | Internal error: UBSAN: array index out of bounds: 00000000f2005512 [#1] PREEMPT SMP
       |
       | Call trace:
       |  purge_fragmented_block+0x204/0x21c
       |  _vm_unmap_aliases+0x170/0x378
       |  vm_unmap_aliases+0x1c/0x28
       |  change_memory_common+0x1dc/0x26c
       |  set_memory_ro+0x18/0x24
       |  module_enable_ro+0x98/0x238
       |  do_init_module+0x1b0/0x310
      
      Move the initialisation of 'vb->cpu' in new_vmap_block() ahead of the
      addition to the xarray.
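      The ordering rule behind the fix (initialise first, publish second) can
      be sketched in userspace C11.  This is an analogy that assumes a single
      atomic slot in place of the per-CPU xarray, with hypothetical names: the
      release store makes the cpu field visible to any reader whose acquire
      load observes the pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vmap_block. */
struct ex_block {
	int cpu;   /* must be valid before any other CPU can find the block */
};

/* One published slot stands in for the per-CPU xarray. */
static _Atomic(struct ex_block *) ex_slot;

/* Writer: fully initialise, then publish with release semantics. */
static void ex_publish_block(struct ex_block *vb, int cpu)
{
	vb->cpu = cpu;   /* set before the block becomes reachable */
	atomic_store_explicit(&ex_slot, vb, memory_order_release);
}

/* Reader (e.g. a concurrent walker): the acquire load pairs with the
 * release store, so a non-NULL result implies vb->cpu is initialised. */
static int ex_reader_cpu(int nr_cpus)
{
	struct ex_block *vb = atomic_load_explicit(&ex_slot, memory_order_acquire);

	if (!vb)
		return -1;
	/* The cpu value is used as an index to pick a remote queue. */
	assert(vb->cpu >= 0 && vb->cpu < nr_cpus);
	return vb->cpu;
}
```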
      
      Link: https://lkml.kernel.org/r/20240812171606.17486-1-will@kernel.org
      Fixes: 8c61291f ("mm: fix incorrect vbq reference in purge_fragmented_block")
      Signed-off-by: Will Deacon <will@kernel.org>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
      Cc: Hailong.Liu <hailong.liu@oppo.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests: mm: fix build errors on armhf · b808f629
      Muhammad Usama Anjum authored
      __NR_mmap isn't defined on armhf.  mmap() is a commonly available system
      call and its wrapper is present on all architectures, so it should be
      used directly.  This fixes the build on armhf and doesn't create
      problems for other architectures.
      
      Remove the sys_mmap() functions, as they do nothing other than call
      mmap().  There is no need to set errno = 0 manually as glibc always
      resets it.
      
      For reference, the errors are as follows:
      
        CC       seal_elf
      seal_elf.c: In function 'sys_mmap':
      seal_elf.c:39:33: error: '__NR_mmap' undeclared (first use in this function)
         39 |         sret = (void *) syscall(__NR_mmap, addr, len, prot,
            |                                 ^~~~~~~~~
      
      mseal_test.c: In function 'sys_mmap':
      mseal_test.c:90:33: error: '__NR_mmap' undeclared (first use in this function)
         90 |         sret = (void *) syscall(__NR_mmap, addr, len, prot,
            |                                 ^~~~~~~~~
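      A minimal sketch of calling the libc wrapper directly, as the patch
      describes.  `ex_map_anon()` is a hypothetical helper, not a function
      from the selftests.  mmap() reports failure as MAP_FAILED rather than
      NULL, so the helper converts it.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map an anonymous private region via the portable libc wrapper instead of
 * syscall(__NR_mmap, ...), which is not defined on every architecture.
 * Returns NULL on failure. */
static void *ex_map_anon(size_t len, int prot)
{
	void *p = mmap(NULL, len, prot, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}
```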
      
      Link: https://lkml.kernel.org/r/20240809082511.497266-1-usama.anjum@collabora.com
      Fixes: 4926c7a5 ("selftest mm/mseal memory sealing")
      Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Cc: Jeff Xu <jeffxu@chromium.org>
      Cc: Kees Cook <kees@kernel.org>
      Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  2. 01 Sep, 2024 3 commits
    • Linux 6.11-rc6 · 431c1646
      Linus Torvalds authored
    • Merge tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 6b9ffc45
      Linus Torvalds authored
      Pull smb client fixes from Steve French:
      
       - copy_file_range fix
      
       - two read fixes including read past end of file rc fix and read retry
         crediting fix
      
       - falloc zero range fix
      
      * tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: Fix FALLOC_FL_ZERO_RANGE to preflush buffered part of target region
        cifs: Fix copy offload to flush destination region
        netfs, cifs: Fix handling of short DIO read
        cifs: Fix lack of credit renegotiation on read retry
    • Merge tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs · a4c76312
      Linus Torvalds authored
      Pull bcachefs fixes from Kent Overstreet:
       "The data corruption in the buffered write path is troubling; inode
        lock should not have been able to cause that...
      
         - Fix a rare data corruption in the rebalance path, caught as a nonce
           inconsistency on encrypted filesystems
      
         - Revert lockless buffered write path
      
         - Mark more errors as autofix"
      
      * tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs:
        bcachefs: Mark more errors as autofix
        bcachefs: Revert lockless buffered IO path
        bcachefs: Fix bch2_extents_match() false positive
        bcachefs: Fix failure to return error in data_update_index_update()
  3. 31 Aug, 2024 14 commits
    • bcachefs: Mark more errors as autofix · 3d3020c4
      Kent Overstreet authored
      Errors that are known to always be safe to fix should be autofix: this
      should be most errors even at this point, but that will need some
      thorough review.
      
      Note that errors are still logged in the superblock, so we'll still know
      that they happened.
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • bcachefs: Revert lockless buffered IO path · e3e69409
      Kent Overstreet authored
      We had a report of data corruption on nixos when building installer
      images.
      
      https://github.com/NixOS/nixpkgs/pull/321055#issuecomment-2184131334
      
      It seems that writes are being dropped, but only when issued by QEMU,
      and possibly only in snapshot mode. It's undetermined if it's write
      calls are being dropped or dirty folios.
      
      Further testing, via minimizing the original patch to just the change
      that skips the inode lock on non appends/truncates, reveals that it
      really is just not taking the inode lock that causes the corruption: it
      has nothing to do with the other logic changes for preserving write
      atomicity in corner cases.
      
      It's also kernel config dependent: it doesn't reproduce with the minimal
      kernel config that ktest uses, but it does reproduce with nixos's distro
      config. Bisecting the kernel config initially pointed the finger at page
      migration or compaction, but it appears that was erroneous; we haven't
      yet determined what kernel config option actually triggers it.
      
      Sadly it appears this will have to be reverted since we're getting too
      close to release and my plate is full, but we'd _really_ like to fully
      debug it.
      
      My suspicion is that this patch is exposing a preexisting bug - the
      inode lock actually covers very little in IO paths, and we have a
      different lock (the pagecache add lock) that guards against races with
      truncate here.
      
      Fixes: 7e64c86c ("bcachefs: Buffered write path now can avoid the inode lock")
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging · 6cd90e5e
      Linus Torvalds authored
      Pull misc fixes from Guenter Roeck.
      
      These are fixes for regressions that Guenter has been reporting, and
      the maintainers haven't picked up and sent in. With rc6 fairly imminent,
      I'm taking them directly from Guenter.
      
      * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        apparmor: fix policy_unpack_test on big endian systems
        Revert "MIPS: csrc-r4k: Apply verification clocksource flags"
        microblaze: don't treat zero reserved memory regions as error
    • Merge tag 'pwrseq-fixes-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 8463be84
      Linus Torvalds authored
      Pull power sequencing fix from Bartosz Golaszewski:
       "A follow-up fix for the power sequencing subsystem. It turned out the
        previous fix for this driver was incomplete and broke the WLAN support
        on some platforms. This addresses the issue.
      
         - set the direction of the wlan-enable GPIO to output after
           requesting it as-is"
      
      * tag 'pwrseq-fixes-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        power: sequencing: qcom-wcn: set the wlan-enable GPIO to output
    • power: sequencing: qcom-wcn: set the wlan-enable GPIO to output · d8b76207
      Bartosz Golaszewski authored
      Commit a9aaf1ff ("power: sequencing: request the WLAN enable GPIO
      as-is") broke WLAN on boards on which the wlan-enable GPIO enabling the
      wifi module isn't in output mode by default. We need to set direction to
      output while retaining the value that was already set to keep the ath
      module on if it's already started.
      
      Fixes: a9aaf1ff ("power: sequencing: request the WLAN enable GPIO as-is")
      Link: https://lore.kernel.org/r/20240823115500.37280-1-brgl@bgdev.pl
      Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
    • Merge tag 'usb-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · e8784b0a
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some small USB fixes for 6.11-rc6.  Included in here are:
      
         - dwc3 driver fixes for reported issues
      
         - MAINTAINERS file update, marking a driver as unsupported :(
      
         - cdnsp driver fixes
      
         - USB gadget driver fix
      
         - USB sysfs fix
      
         - other tiny fixes
      
         - new device ids for usb serial driver
      
        All of these have been in linux-next this week with no reported
        issues"
      
      * tag 'usb-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        USB: serial: option: add MeiG Smart SRM825L
        usb: cdnsp: fix for Link TRB with TC
        usb: dwc3: st: add missing depopulate in probe error path
        usb: dwc3: st: fix probed platform device ref count on probe error path
        usb: dwc3: ep0: Don't reset resource alloc flag (including ep0)
        usb: core: sysfs: Unmerge @usb3_hardware_lpm_attr_group in remove_power_attributes()
        usb: typec: fsa4480: Relax CHIP_ID check
        usb: dwc3: xilinx: add missing depopulate in probe error path
        usb: dwc3: omap: add missing depopulate in probe error path
        dt-bindings: usb: microchip,usb2514: Fix reference USB device schema
        usb: gadget: uvc: queue pump work in uvcg_video_enable()
        cdc-acm: Add DISABLE_ECHO quirk for GE HealthCare UI Controller
        usb: cdnsp: fix incorrect index in cdnsp_get_hw_deq function
        usb: dwc3: core: Prevent USB core invalid event buffer address access
        MAINTAINERS: Mark UVC gadget driver as orphan
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 770b0ffe
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Minor fixes only.
      
        The sd.c fix ignores a SYNCHRONIZE CACHE error if a format is in
        progress, which can happen when formatting a drive across
        suspend/resume"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: sd: Ignore command SYNCHRONIZE CACHE error if format in progress
        scsi: aacraid: Fix double-free on probe failure
        scsi: lpfc: Fix overflow build issue
    • Merge tag 'nfsd-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · 6a2fcc51
      Linus Torvalds authored
      Pull nfsd fix from Chuck Lever:
      
       - One more write delegation fix
      
      * tag 'nfsd-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
        nfsd: fix nfsd4_deleg_getattr_conflict in presence of third party lease
    • Merge tag 'xfs-6.11-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 0efdc097
      Linus Torvalds authored
      Pull xfs fixes from Chandan Babu:
      
       - Do not call out v1 inodes with non-zero di_nlink field as being
         corrupt
      
       - Change xfs_finobt_count_blocks() to count "free inode btree" blocks
         rather than "inode btree" blocks
      
       - Don't report the number of trimmed bytes via FITRIM because the
         underlying storage isn't required to do anything and failed discard
         IOs aren't reported to the caller anyway
      
       - Fix incorrect setting of rm_owner field in an rmap query
      
       - Report missing disk offset range in an fsmap query
      
       - Obtain m_growlock when extending realtime section of the filesystem
      
       - Reset rootdir extent size hint after extending realtime section of
         the filesystem
      
      * tag 'xfs-6.11-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: reset rootdir extent size hint after growfsrt
        xfs: take m_growlock when running growfsrt
        xfs: Fix missing interval for missing_owner in xfs fsmap
        xfs: use XFS_BUF_DADDR_NULL for daddrs in getfsmap code
        xfs: Fix the owner setting issue for rmap query in xfs fsmap
        xfs: don't bother reporting blocks trimmed via FITRIM
        xfs: xfs_finobt_count_blocks() walks the wrong btree
        xfs: fix folio dirtying for XFILE_ALLOC callers
        xfs: fix di_onlink checking for V1/V2 inodes
    • Merge tag 'arm-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · 35667a29
      Linus Torvalds authored
      Pull ARM SoC fixes from Arnd Bergmann:
       "There is a fairly large number of bug fixes for Qualcomm platforms,
        most of them addressing issues with the devicetree files for the newly
        added Snapdragon X1 based laptops to make them more reliable.
      
        The Qualcomm driver changes address a few build-time issues as well as
        runtime problems in the tzmem and scm firmware, the USB Type-C driver,
        and the cmd-db and pmic_glink soc drivers.
      
        The NXP i.MX platforms usually get a number of devicetree fixes
        proportional to the number of supported machines. This includes both
        warning fixes and correctness fixes for the 64-bit i.MX9, i.MX8 and
        layerscape platforms, as well as a single fix for a 32-bit i.MX6 based
        board.
      
        The other changes are the usual minor changes, including an update to
        the MAINTAINERS file, an omap3 dts file and a SoC driver for mpfs
        (risc-v)"
      
      * tag 'arm-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (50 commits)
        firmware: microchip: fix incorrect error report of programming:timeout on success
        soc: qcom: pd-mapper: Fix singleton refcount
        firmware: qcom: tzmem: disable sdm670 platform
        soc: qcom: pmic_glink: Actually communicate when remote goes down
        usb: typec: ucsi: Move unregister out of atomic section
        soc: qcom: pmic_glink: Fix race during initialization
        firmware: qcom: qseecom: remove unused functions
        firmware: qcom: tzmem: fix virtual-to-physical address conversion
        firmware: qcom: scm: Mark get_wq_ctx() as atomic call
        arm64: dts: qcom: x1e80100: Fix Adreno SMMU global interrupt
        arm64: dts: qcom: disable GPU on x1e80100 by default
        arm64: dts: imx8mm-phygate: fix typo pinctrcl-0
        arm64: dts: imx95: correct L3Cache cache-sets
        arm64: dts: imx95: correct a55 power-domains
        arm64: dts: freescale: imx93-tqma9352-mba93xxla: fix typo
        arm64: dts: freescale: imx93-tqma9352: fix CMA alloc-ranges
        ARM: dts: imx6dl-yapp43: Increase LED current to match the yapp4 HW design
        arm64: dts: imx93: update default value for snps,clk-csr
        arm64: dts: freescale: tqma9352: Fix watchdog reset
        arm64: dts: imx8mp-beacon-kit: Fix Stereo Audio on WM8962
        ...
    • Merge tag 'input-for-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 1934261d
      Linus Torvalds authored
      Pull input fix from Dmitry Torokhov:
      
       - a fix for the Cypress PS/2 touchpad driver for a regression
         introduced in the 6.11 merge window, where a timeout condition is
         incorrectly reported for all extended Cypress commands
      
      * tag 'input-for-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: cypress_ps2 - fix waiting for command response
    • Merge tag 'pci-v6.11-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci · 8101b276
      Linus Torvalds authored
      Pull pci fixes from Bjorn Helgaas:
      
       - Add Manivannan Sadhasivam as PCI native host bridge and endpoint
         driver reviewer (Manivannan Sadhasivam)
      
       - Disable MHI RAM data parity error interrupt for qcom SA8775P SoC to
         work around hardware erratum that causes a constant stream of
         interrupts (Manivannan Sadhasivam)
      
       - Don't try to fall back to qcom Operating Performance Points (OPP)
         support unless the platform actually supports OPP (Manivannan
         Sadhasivam)
      
       - Add imx@lists.linux.dev mailing list to MAINTAINERS for NXP
         layerscape and imx6 PCI controller drivers (Frank Li)
      
      * tag 'pci-v6.11-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
        MAINTAINERS: PCI: Add NXP PCI controller mailing list imx@lists.linux.dev
        PCI: qcom: Use OPP only if the platform supports it
        PCI: qcom-ep: Disable MHI RAM data parity error interrupt for SA8775P SoC
        MAINTAINERS: Add Manivannan Sadhasivam as Reviewer for PCI native host bridge and endpoint drivers
    • Merge tag 'block-6.11-20240830' of git://git.kernel.dk/linux · 216d1631
      Linus Torvalds authored
      Pull block fix from Jens Axboe:
       "Fix for a single regression for WRITE_SAME introduced in the 6.11
        merge window"
      
      * tag 'block-6.11-20240830' of git://git.kernel.dk/linux:
        block: fix detection of unsupported WRITE SAME in blkdev_issue_write_zeroes
    • Merge tag 'io_uring-6.11-20240830' of git://git.kernel.dk/linux · ad246d9f
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
      
       - A fix for a regression introduced in the 6.11 merge window, where
         copying of iovecs for compat mode applications broke in certain
         cases.
      
       - A fix for a bug introduced in 6.10, where recv/send bundles used
         with classic provided buffers would fail to set the right iovec
         count, resulting in 0-byte send/recv results. Found via code
         coverage testing and by writing a test case to exercise it.
      
      * tag 'io_uring-6.11-20240830' of git://git.kernel.dk/linux:
        io_uring/kbuf: return correct iovec count from classic buffer peek
        io_uring/rsrc: ensure compat iovecs are copied correctly
  4. 30 Aug, 2024 10 commits