1. 16 Aug, 2024 16 commits
    • mm/migrate: fix deadlock in migrate_pages_batch() on large folios · 2e6506e1
      Gao Xiang authored
      Currently, migrate_pages_batch() can hold locks on multiple folios taken
      in an arbitrary order.  Although folio_trylock() is used to avoid
      deadlock, as commit 2ef7dbb2 ("migrate_pages: try migrate in batch
      asynchronously firstly") mentioned, try_split_folio() was missed and can
      still block on the folio lock.
      
      It was found by a compaction stress test after I explicitly enabled large
      folios for EROFS compressed files; I cannot reproduce it with the same
      workload when large folio support is off (current mainline).
      Typically, filesystem reads (with locked file-backed folios) could use
      another bdev/meta inode to load some other I/Os (e.g. inode extent
      metadata or caching compressed data), so the locking order will be:
      
        file-backed folios  (A)
           bdev/meta folios (B)
      
      The following calltrace shows the deadlock:
         Thread 1 takes (B) lock and tries to take folio (A) lock
         Thread 2 takes (A) lock and tries to take folio (B) lock
      
      [Thread 1]
      INFO: task stress:1824 blocked for more than 30 seconds.
            Tainted: G           OE      6.10.0-rc7+ #6
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      task:stress          state:D stack:0     pid:1824  tgid:1824  ppid:1822   flags:0x0000000c
      Call trace:
       __switch_to+0xec/0x138
       __schedule+0x43c/0xcb0
       schedule+0x54/0x198
       io_schedule+0x44/0x70
       folio_wait_bit_common+0x184/0x3f8
      			<-- folio mapping ffff00036d69cb18 index 996  (**)
       __folio_lock+0x24/0x38
       migrate_pages_batch+0x77c/0xea0	// try_split_folio (mm/migrate.c:1486:2)
      					// migrate_pages_batch (mm/migrate.c:1734:16)
      		<--- LIST_HEAD(unmap_folios) has
      			..
      			folio mapping 0xffff0000d184f1d8 index 1711;   (*)
      			folio mapping 0xffff0000d184f1d8 index 1712;
      			..
       migrate_pages+0xb28/0xe90
       compact_zone+0xa08/0x10f0
       compact_node+0x9c/0x180
       sysctl_compaction_handler+0x8c/0x118
       proc_sys_call_handler+0x1a8/0x280
       proc_sys_write+0x1c/0x30
       vfs_write+0x240/0x380
       ksys_write+0x78/0x118
       __arm64_sys_write+0x24/0x38
       invoke_syscall+0x78/0x108
       el0_svc_common.constprop.0+0x48/0xf0
       do_el0_svc+0x24/0x38
       el0_svc+0x3c/0x148
       el0t_64_sync_handler+0x100/0x130
       el0t_64_sync+0x190/0x198
      
      [Thread 2]
      INFO: task stress:1825 blocked for more than 30 seconds.
            Tainted: G           OE      6.10.0-rc7+ #6
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      task:stress          state:D stack:0     pid:1825  tgid:1825  ppid:1822   flags:0x0000000c
      Call trace:
       __switch_to+0xec/0x138
       __schedule+0x43c/0xcb0
       schedule+0x54/0x198
       io_schedule+0x44/0x70
       folio_wait_bit_common+0x184/0x3f8
      			<-- folio = 0xfffffdffc6b503c0 (mapping == 0xffff0000d184f1d8 index == 1711) (*)
       __folio_lock+0x24/0x38
       z_erofs_runqueue+0x384/0x9c0 [erofs]
       z_erofs_readahead+0x21c/0x350 [erofs]       <-- folio mapping 0xffff00036d69cb18 range from [992, 1024] (**)
       read_pages+0x74/0x328
       page_cache_ra_order+0x26c/0x348
       ondemand_readahead+0x1c0/0x3a0
       page_cache_sync_ra+0x9c/0xc0
       filemap_get_pages+0xc4/0x708
       filemap_read+0x104/0x3a8
       generic_file_read_iter+0x4c/0x150
       vfs_read+0x27c/0x330
       ksys_pread64+0x84/0xd0
       __arm64_sys_pread64+0x28/0x40
       invoke_syscall+0x78/0x108
       el0_svc_common.constprop.0+0x48/0xf0
       do_el0_svc+0x24/0x38
       el0_svc+0x3c/0x148
       el0t_64_sync_handler+0x100/0x130
       el0t_64_sync+0x190/0x198
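
As a rough userspace illustration of the trylock idea (not the kernel patch itself), the sketch below uses two pthread mutexes as hypothetical stand-ins for folio locks (A) and (B). The helper name is invented for this example. A path that already holds (B) only makes a non-blocking attempt on (A) and backs off on failure, so it can never wait forever on a thread that holds (A) and wants (B).

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for the two folio locks in the report. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER; /* file-backed folio (A) */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER; /* bdev/meta folio (B) */

/*
 * Called with lock_b already held.  Taking lock_a here inverts the usual
 * A -> B order, so only a non-blocking attempt is safe.
 * Returns 0 with lock_a held, or -EAGAIN if the caller should back off.
 */
static int try_take_a_while_holding_b(void)
{
	if (pthread_mutex_trylock(&lock_a) != 0)
		return -EAGAIN;
	return 0;
}
```

On -EAGAIN the caller would typically skip or defer the work that needs (A) rather than sleep while holding (B).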
      
      Link: https://lkml.kernel.org/r/20240729021306.398286-1-hsiangkao@linux.alibaba.com
      Fixes: 5dfab109 ("migrate_pages: batch _unmap and _move")
      Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
      Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • alloc_tag: mark pages reserved during CMA activation as not tagged · 766c163c
      Suren Baghdasaryan authored
      During CMA activation, pages in CMA area are prepared and then freed
      without being allocated.  This triggers warnings when memory allocation
      debug config (CONFIG_MEM_ALLOC_PROFILING_DEBUG) is enabled.  Fix this by
      marking these pages not tagged before freeing them.
      
      Link: https://lkml.kernel.org/r/20240813150758.855881-2-surenb@google.com
      Fixes: d224eb02 ("codetag: debug: mark codetags for reserved pages as empty")
      Signed-off-by: Suren Baghdasaryan <surenb@google.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kent Overstreet <kent.overstreet@linux.dev>
      Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Sourav Panda <souravpanda@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>	[6.10]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • alloc_tag: introduce clear_page_tag_ref() helper function · a8fc28da
      Suren Baghdasaryan authored
      In several cases we are freeing pages which were not allocated using
      common page allocators.  For such cases, in order to keep allocation
      accounting correct, we should clear the page tag to indicate that the page
      being freed is expected to not have a valid allocation tag.  Introduce
      clear_page_tag_ref() helper function to be used for this.
      
      Link: https://lkml.kernel.org/r/20240813150758.855881-1-surenb@google.com
      Fixes: d224eb02 ("codetag: debug: mark codetags for reserved pages as empty")
      Signed-off-by: Suren Baghdasaryan <surenb@google.com>
      Suggested-by: David Hildenbrand <david@redhat.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kent Overstreet <kent.overstreet@linux.dev>
      Cc: Sourav Panda <souravpanda@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>	[6.10]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • crash: fix riscv64 crash memory reserve dead loop · edb907a6
      Jinjie Ruan authored
      On RISCV64 Qemu machine with 512MB memory, cmdline "crashkernel=500M,high"
      will cause system stall as below:
      
      	 Zone ranges:
      	   DMA32    [mem 0x0000000080000000-0x000000009fffffff]
      	   Normal   empty
      	 Movable zone start for each node
      	 Early memory node ranges
      	   node   0: [mem 0x0000000080000000-0x000000008005ffff]
      	   node   0: [mem 0x0000000080060000-0x000000009fffffff]
      	 Initmem setup node 0 [mem 0x0000000080000000-0x000000009fffffff]
      	(stall here)
      
      Commit 5d99cadf1568 ("crash: fix x86_32 crash memory reserve dead loop
      bug") fixed this on 32-bit architectures.  However, the problem is not
      completely solved.  If `CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX` on a
      64-bit architecture, for example when system memory is equal to
      CRASH_ADDR_LOW_MAX on RISCV64, the following infinite loop will also
      occur:
      
      	-> reserve_crashkernel_generic() and high is true
      	   -> alloc at [CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX] fail
      	      -> alloc at [0, CRASH_ADDR_LOW_MAX] fail and repeatedly
      	         (because CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX).
      
      As Catalin suggested, do not remove the ",high" reservation fallback to
      ",low" logic which will change arm64's kdump behavior, but fix it by
      skipping the above situation similar to commit d2f32f23190b ("crash: fix
      x86_32 crash memory reserve dead loop").
      
      After this patch, it prints:
      	cannot allocate crashkernel (size:0x1f400000)
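
The retry loop described above can be modelled in plain C. This is only a sketch under stated assumptions: `struct mem_model`, `model_alloc()` and `reserve_attempts()` are hypothetical names, the allocator is a toy range check, and the exact placement of the guard in the real reserve_crashkernel_generic() is assumed rather than taken from the patch. The `max_attempts` cap exists only so the unfixed behaviour can be observed without hanging.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy memory model: a single free range [free_start, free_end). */
struct mem_model {
	uint64_t free_start;
	uint64_t free_end;
};

/* Succeeds only if size bytes of the free range fit inside [start, end). */
static bool model_alloc(const struct mem_model *m, uint64_t size,
			uint64_t start, uint64_t end)
{
	uint64_t lo = start > m->free_start ? start : m->free_start;
	uint64_t hi = end < m->free_end ? end : m->free_end;

	return hi > lo && hi - lo >= size;
}

/*
 * Models a "crashkernel=size,high" reservation.  Returns the number of
 * attempts on success, or the negated number of attempts on failure.
 */
static int reserve_attempts(const struct mem_model *m, uint64_t size,
			    uint64_t low_max, uint64_t high_max,
			    bool with_fix, int max_attempts)
{
	uint64_t base = low_max, end = high_max; /* ",high": search above low_max */
	int attempts = 0;

	while (attempts < max_attempts) {
		attempts++;
		if (model_alloc(m, size, base, end))
			return attempts;
		if (end != high_max)
			break;	/* the low fallback failed as well: give up */
		/* Fall back to the low range [0, low_max]. */
		base = 0;
		end = low_max;
		/* Assumed guard: do not retry when both limits are equal. */
		if (with_fix && end == high_max)
			break;
	}
	return -attempts;
}
```

With equal limits and too little free memory, the unguarded variant keeps retrying until the cap is hit, while the guarded variant gives up after the first attempt.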
      
      Link: https://lkml.kernel.org/r/20240812062017.2674441-1-ruanjinjie@huawei.com
      Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
      Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests: memfd_secret: don't build memfd_secret test on unsupported arches · 7c5e8d21
      Muhammad Usama Anjum authored
      [1] mentions that memfd_secret is only supported on arm64, riscv, x86 and
      x86_64 for now.  It doesn't support other architectures.  I found the
      build error on arm and decided to send the fix as it was creating noise on
      KernelCI:
      
      memfd_secret.c: In function 'memfd_secret':
      memfd_secret.c:42:24: error: '__NR_memfd_secret' undeclared (first use in this function);
      did you mean 'memfd_secret'?
         42 |         return syscall(__NR_memfd_secret, flags);
            |                        ^~~~~~~~~~~~~~~~~
            |                        memfd_secret
      
      Hence, add a condition so that memfd_secret is only compiled on
      supported architectures.

      Also check in the run_vmtests script whether the memfd_secret binary is
      present before executing it.
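
The commit handles this at build time by restricting which architectures compile the test. As a different, purely illustrative technique, a wrapper can also be guarded in the source on whether the headers define `__NR_memfd_secret`. The wrapper name below is hypothetical and is not part of the selftest.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: issue the syscall when the headers define its
 * number, otherwise fail with ENOSYS instead of breaking the build.
 */
static int memfd_secret_or_enosys(unsigned int flags)
{
#ifdef __NR_memfd_secret
	return (int)syscall(__NR_memfd_secret, flags);
#else
	(void)flags;
	errno = ENOSYS;
	return -1;
#endif
}
```

A caller would treat a return of -1 with errno set to ENOSYS as "not supported here" and skip the test.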
      
      Link: https://lkml.kernel.org/r/20240812061522.1933054-1-usama.anjum@collabora.com
      Link: https://lore.kernel.org/all/20210518072034.31572-7-rppt@kernel.org/ [1]
      Link: https://lkml.kernel.org/r/20240809075642.403247-1-usama.anjum@collabora.com
      Fixes: 76fe17ef ("secretmem: test: add basic selftest for memfd_secret(2)")
      Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
      Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: fix endless reclaim on machines with unaccepted memory · 807174a9
      Kirill A. Shutemov authored
      Unaccepted memory is considered unusable free memory, which is not counted
      as free on the zone watermark check.  This causes get_page_from_freelist()
      to accept more memory to hit the high watermark, but it creates problems
      in the reclaim path.
      
      The reclaim path encounters a failed zone watermark check and attempts to
      reclaim memory.  This is usually successful, but if there is little or no
      reclaimable memory, it can result in endless reclaim with little to no
      progress.  This can occur early in the boot process, just after start of
      the init process when the only reclaimable memory is the page cache of the
      init executable and its libraries.
      
      Treat unaccepted memory as free from the watermark check's point of
      view.  This way, unaccepted memory will never trigger memory reclaim.
      Accept more memory in get_page_from_freelist() if needed.
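
The effect of the change on the watermark check can be sketched with a toy model. The struct, the field names and the comparison are assumptions made for illustration and do not mirror the kernel's actual watermark code.

```c
#include <stdbool.h>

/* Toy zone: page counts only; all names are hypothetical. */
struct zone_model {
	unsigned long free_pages;       /* accepted, immediately usable */
	unsigned long unaccepted_pages; /* free, but not yet accepted */
	unsigned long high_wmark;
};

/*
 * count_unaccepted == false models the old behaviour (unaccepted memory
 * treated as unusable); true models the behaviour after the fix.
 */
static bool watermark_ok_model(const struct zone_model *z, bool count_unaccepted)
{
	unsigned long usable = z->free_pages;

	if (count_unaccepted)
		usable += z->unaccepted_pages;
	return usable > z->high_wmark;
}
```

In this model a zone with few accepted pages but plenty of unaccepted memory fails the check only under the old behaviour, which is the situation that pushed the reclaim path into a loop.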
      
      Link: https://lkml.kernel.org/r/20240809114854.3745464-2-kirill.shutemov@linux.intel.com
      Fixes: dcdfdd40 ("mm: Add support for unaccepted memory")
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: Jianxiong Gao <jxgao@google.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Tested-by: Jianxiong Gao <jxgao@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>	[6.5+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: compaction_test: fix off by one in check_compaction() · af3b7d09
      Dan Carpenter authored
      The "initial_nr_hugepages" variable is unsigned long so it takes up to 20
      characters to print, plus 1 more character for the NUL terminator. 
      Unfortunately, this buffer is not quite large enough for the terminator to
      fit.  Also use snprintf() for a belt and suspenders approach.
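
The arithmetic can be checked with a small sketch, assuming a 64-bit `unsigned long` (ULONG_MAX is 18446744073709551615, which has 20 digits). The helper name is hypothetical. snprintf() returns the length the full string would need, so a return value that is not smaller than the buffer size signals truncation.

```c
#include <limits.h>
#include <stdio.h>

/*
 * Format value into buf.  Returns 1 if the whole number plus the NUL
 * terminator fit, 0 if snprintf() had to truncate (or failed).
 */
static int format_ulong(char *buf, size_t buf_size, unsigned long value)
{
	int n = snprintf(buf, buf_size, "%lu", value);

	return n >= 0 && (size_t)n < buf_size;
}
```

With a 64-bit `unsigned long`, a 20-byte buffer is one byte short for ULONG_MAX, while 21 bytes are enough.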
      
      Link: https://lkml.kernel.org/r/87470c06-b45a-4e83-92ff-aac2e7b9c6ba@stanley.mountain
      Fixes: fb9293b6 ("selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation")
      Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/numa: no task_numa_fault() call if PMD is changed · fd8c35a9
      Zi Yan authored
      When handling a numa page fault, task_numa_fault() should be called by a
      process that restores the page table of the faulted folio to avoid
      duplicated stats counting.  Commit c5b5a3dd ("mm: thp: refactor NUMA
      fault handling") restructured do_huge_pmd_numa_page() and did not avoid
      task_numa_fault() call in the second page table check after a numa
      migration failure.  Fix it by making all !pmd_same() return immediately.
      
      This issue can cause task_numa_fault() to be called more than necessary
      and lead to unexpected numa balancing results.  (It is hard to tell
      whether the issue will have a positive or negative performance impact
      due to duplicated numa fault counting.)
      
      Link: https://lkml.kernel.org/r/20240809145906.1513458-3-ziy@nvidia.com
      Fixes: c5b5a3dd ("mm: thp: refactor NUMA fault handling")
      Reported-by: "Huang, Ying" <ying.huang@intel.com>
      Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.intel.com/
      Signed-off-by: Zi Yan <ziy@nvidia.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/numa: no task_numa_fault() call if PTE is changed · 40b760cf
      Zi Yan authored
      When handling a numa page fault, task_numa_fault() should be called by a
      process that restores the page table of the faulted folio to avoid
      duplicated stats counting.  Commit b99a342d ("NUMA balancing: reduce
      TLB flush via delaying mapping on hint page fault") restructured
      do_numa_page() and did not avoid task_numa_fault() call in the second page
      table check after a numa migration failure.  Fix it by making all
      !pte_same() return immediately.
      
      This issue can cause task_numa_fault() to be called more than necessary
      and lead to unexpected numa balancing results.  (It is hard to tell
      whether the issue will have a positive or negative performance impact
      due to duplicated numa fault counting.)
      
      Link: https://lkml.kernel.org/r/20240809145906.1513458-2-ziy@nvidia.com
      Fixes: b99a342d ("NUMA balancing: reduce TLB flush via delaying mapping on hint page fault")
      Signed-off-by: Zi Yan <ziy@nvidia.com>
      Reported-by: "Huang, Ying" <ying.huang@intel.com>
      Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.intel.com/
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0 · 61ebe5a7
      Hailong Liu authored
      The __vmap_pages_range_noflush() assumes its argument pages** contains
      pages with the same page shift.  However, since commit e9c3cda4 ("mm,
      vmalloc: fix high order __GFP_NOFAIL allocations"), if gfp_flags includes
      __GFP_NOFAIL with high order in vm_area_alloc_pages() and page allocation
      failed for high order, the pages** may contain two different page shifts
      (high order and order-0).  This could lead __vmap_pages_range_noflush() to
      perform incorrect mappings, potentially resulting in memory corruption.
      
      Users might encounter this as follows (vmap_allow_huge = true, 2M is for
      PMD_SIZE):
      
      kvmalloc(2M, __GFP_NOFAIL|GFP_X)
          __vmalloc_node_range_noprof(vm_flags=VM_ALLOW_HUGE_VMAP)
              vm_area_alloc_pages(order=9) ---> order-9 allocation failed and fallback to order-0
                  vmap_pages_range()
                      vmap_pages_range_noflush()
                          __vmap_pages_range_noflush(page_shift = 21) ----> wrong mapping happens
      
      We can remove the fallback code because if a high-order allocation fails,
      __vmalloc_node_range_noprof() will retry with order-0.  Falling back to
      order-0 here is therefore unnecessary, so fix this by removing the
      fallback code.
      
      Link: https://lkml.kernel.org/r/20240808122019.3361-1-hailong.liu@oppo.com
      Fixes: e9c3cda4 ("mm, vmalloc: fix high order __GFP_NOFAIL allocations")
      Signed-off-by: Hailong Liu <hailong.liu@oppo.com>
      Reported-by: Tangquan Zheng <zhengtangquan@oppo.com>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Acked-by: Barry Song <baohua@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu · d75abd0d
      Waiman Long authored
      The memory_failure_cpu structure is a per-cpu structure.  Access to its
      content requires the use of get_cpu_var() to lock in the current CPU and
      disable preemption.  The use of a regular spinlock_t for locking purpose
      is fine for a non-RT kernel.
      
      Since the integration of RT spinlock support into the v5.15 kernel, a
      spinlock_t in an RT kernel becomes a sleeping lock, and taking a sleeping
      lock in a preemption-disabled context is illegal, resulting in the
      following kind of warning.
      
        [12135.732244] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
        [12135.732248] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 270076, name: kworker/0:0
        [12135.732252] preempt_count: 1, expected: 0
        [12135.732255] RCU nest depth: 2, expected: 2
          :
        [12135.732420] Hardware name: Dell Inc. PowerEdge R640/0HG0J8, BIOS 2.10.2 02/24/2021
        [12135.732423] Workqueue: kacpi_notify acpi_os_execute_deferred
        [12135.732433] Call Trace:
        [12135.732436]  <TASK>
        [12135.732450]  dump_stack_lvl+0x57/0x81
        [12135.732461]  __might_resched.cold+0xf4/0x12f
        [12135.732479]  rt_spin_lock+0x4c/0x100
        [12135.732491]  memory_failure_queue+0x40/0xe0
        [12135.732503]  ghes_do_memory_failure+0x53/0x390
        [12135.732516]  ghes_do_proc.constprop.0+0x229/0x3e0
        [12135.732575]  ghes_proc+0xf9/0x1a0
        [12135.732591]  ghes_notify_hed+0x6a/0x150
        [12135.732602]  notifier_call_chain+0x43/0xb0
        [12135.732626]  blocking_notifier_call_chain+0x43/0x60
        [12135.732637]  acpi_ev_notify_dispatch+0x47/0x70
        [12135.732648]  acpi_os_execute_deferred+0x13/0x20
        [12135.732654]  process_one_work+0x41f/0x500
        [12135.732695]  worker_thread+0x192/0x360
        [12135.732715]  kthread+0x111/0x140
        [12135.732733]  ret_from_fork+0x29/0x50
        [12135.732779]  </TASK>
      
      Fix it by using a raw_spinlock_t for locking instead.
      
      Also move the pr_err() out of the lock critical section and after
      put_cpu_ptr() to avoid indeterminate latency and the possibility of sleep
      with this call.
      
      [longman@redhat.com: don't hold percpu ref across pr_err(), per Miaohe]
        Link: https://lkml.kernel.org/r/20240807181130.1122660-1-longman@redhat.com
      Link: https://lkml.kernel.org/r/20240806164107.1044956-1-longman@redhat.com
      Fixes: 0f383b6d ("locking/spinlock: Provide RT variant")
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: don't account memmap per-node · 9d857311
      Pasha Tatashin authored
      Fix invalid access to pgdat during hot-remove operation:
      ndctl users reported a GPF when trying to destroy a namespace:
      $ ndctl destroy-namespace all -r all -f
       Segmentation fault
       dmesg:
       Oops: general protection fault, probably for
       non-canonical address 0xdffffc0000005650: 0000 [#1] PREEMPT SMP KASAN
       PTI
       KASAN: probably user-memory-access in range
       [0x000000000002b280-0x000000000002b287]
       CPU: 26 UID: 0 PID: 1868 Comm: ndctl Not tainted 6.11.0-rc1 #1
       Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS
       2.20.1 09/13/2023
       RIP: 0010:mod_node_page_state+0x2a/0x110
      
      cxl-test users report a GPF when trying to unload the test module:
      $ modprobe -r cxl-test
       dmesg
       BUG: unable to handle page fault for address: 0000000000004200
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: Oops: 0000 [#1] PREEMPT SMP PTI
       CPU: 0 UID: 0 PID: 1076 Comm: modprobe Tainted: G O N 6.11.0-rc1 #197
       Tainted: [O]=OOT_MODULE, [N]=TEST
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/15
       RIP: 0010:mod_node_page_state+0x6/0x90
      
      Currently, when memory is hot-plugged or hot-removed the accounting is
      done based on the assumption that memmap is allocated from the same node
      as the hot-plugged/hot-removed memory, which is not always the case.
      
      In addition, there are challenges with keeping the node id of the memory
      that is being removed until the time when memmap accounting is actually
      performed, since this is done after remove_pfn_range_from_zone() and
      also after remove_memory_block_devices().  This means that we can use
      neither pgdat nor walking through memblocks to get the nid.
      
      Given all of that, account the memmap overhead system wide instead.
      
      For this we are going to use global atomic counters.  Memmap size is
      rarely modified, and normally only either during early boot when there
      is only one CPU or under a hotplug global mutex lock, so there is no
      need for per-cpu optimizations.
      
      Also, while we are here, rename nr_memmap to nr_memmap_pages and
      nr_memmap_boot to nr_memmap_boot_pages to make it self-explanatory that
      the units are page counts.
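
A userspace sketch of system-wide accounting with plain global atomics is shown below, using C11 `<stdatomic.h>` in place of the kernel's atomic types. The counter names come from the commit message; the helper functions are hypothetical and only illustrate the "no per-cpu optimization" design.

```c
#include <stdatomic.h>

/* System-wide counters, in units of pages. */
static atomic_long nr_memmap_pages;
static atomic_long nr_memmap_boot_pages;

/* Hypothetical helpers: delta may be negative on hot-remove. */
static void memmap_pages_add_model(long delta)
{
	atomic_fetch_add(&nr_memmap_pages, delta);
}

static void memmap_boot_pages_add_model(long delta)
{
	atomic_fetch_add(&nr_memmap_boot_pages, delta);
}

static long memmap_pages_read_model(void)
{
	return atomic_load(&nr_memmap_pages);
}
```

Because no node id is involved, the hot-remove path can decrement the counter without needing the pgdat of the memory being removed.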
      
      [pasha.tatashin@soleen.com: address a few nits from David Hildenbrand]
        Link: https://lkml.kernel.org/r/20240809191020.1142142-4-pasha.tatashin@soleen.com
      Link: https://lkml.kernel.org/r/20240809191020.1142142-4-pasha.tatashin@soleen.com
      Link: https://lkml.kernel.org/r/20240808213437.682006-4-pasha.tatashin@soleen.com
      Fixes: 15995a35 ("mm: report per-page metadata information")
      Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
      Reported-by: Yi Zhang <yi.zhang@redhat.com>
      Closes: https://lore.kernel.org/linux-cxl/CAHj4cs9Ax1=CoJkgBGP_+sNu6-6=6v=_L-ZBZY0bVLD3wUWZQg@mail.gmail.com
      Reported-by: Alison Schofield <alison.schofield@intel.com>
      Closes: https://lore.kernel.org/linux-mm/Zq0tPd2h6alFz8XF@aschofie-mobl2/#t
      Tested-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Alison Schofield <alison.schofield@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Tested-by: Yi Zhang <yi.zhang@redhat.com>
      Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Cc: Fan Ni <fan.ni@samsung.com>
      Cc: Joel Granados <j.granados@samsung.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Li Zhijian <lizhijian@fujitsu.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Nhat Pham <nphamcs@gmail.com>
      Cc: Sourav Panda <souravpanda@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yosry Ahmed <yosryahmed@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: add system wide stats items category · f4cb78af
      Pasha Tatashin authored
      /proc/vmstat contains events and stats: events can only grow, but stats
      can grow and shrink.
      
      vmstat has the following:
      -------------------------
      NR_VM_ZONE_STAT_ITEMS:	per-zone stats
      NR_VM_NUMA_EVENT_ITEMS:	per-numa events
      NR_VM_NODE_STAT_ITEMS:	per-numa stats
      NR_VM_WRITEBACK_STAT_ITEMS:	system-wide background-writeback and
      				dirty-throttling thresholds.
      NR_VM_EVENT_ITEMS:	system-wide events
      -------------------------
      
      Rename NR_VM_WRITEBACK_STAT_ITEMS to NR_VM_STAT_ITEMS to track the
      system-wide stats.  We are going to add per-page metadata stats to this
      category in the next patch.
      
      Also delete unused writeback_stat_name().
      
      Link: https://lkml.kernel.org/r/20240809191020.1142142-2-pasha.tatashin@soleen.com
      Link: https://lkml.kernel.org/r/20240808213437.682006-3-pasha.tatashin@soleen.com
      Fixes: 15995a35 ("mm: report per-page metadata information")
      Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
      Suggested-by: Yosry Ahmed <yosryahmed@google.com>
      Tested-by: Alison Schofield <alison.schofield@intel.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Cc: Joel Granados <j.granados@samsung.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Li Zhijian <lizhijian@fujitsu.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Nhat Pham <nphamcs@gmail.com>
      Cc: Sourav Panda <souravpanda@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yi Zhang <yi.zhang@redhat.com>
      Cc: Fan Ni <fan.ni@samsung.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: don't account memmap on failure · ace0741a
      Pasha Tatashin authored
      Patch series "Fixes for memmap accounting", v4.
      
      Memmap accounting provides us with observability of how much memory is
      used for per-page metadata, i.e. "struct page"s and "struct page_ext".
      It also provides information on how much was allocated using the
      boot allocator (i.e. not part of MemTotal), and how much was allocated
      using the buddy allocator (i.e. part of MemTotal).
      
      This small series fixes a few problems that were discovered with the
      original patch.
      
      
      This patch (of 3):
      
      When we fail to allocate the memmap in alloc_vmemmap_page_list(), do not
      account any already-allocated pages: we're going to free all of them
      before we return from the function.
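
The pattern of accounting only after the whole list has been allocated can be sketched in userspace C. The function name, the `accounted_pages` counter and the forced-failure parameter are hypothetical and exist only to make the behaviour observable; this is not the kernel function.

```c
#include <errno.h>
#include <stdlib.h>

static long accounted_pages; /* stand-in for the memmap page counter */

/*
 * Allocate nr_pages buffers into pages[].  A negative fail_at never
 * fails; otherwise the allocation with that index is forced to fail.
 */
static int alloc_page_list_model(void **pages, int nr_pages, int fail_at)
{
	int i;

	for (i = 0; i < nr_pages; i++) {
		pages[i] = (i == fail_at) ? NULL : malloc(4096);
		if (!pages[i])
			goto out_free;
	}
	/* Account only once the whole list is known to be allocated. */
	accounted_pages += nr_pages;
	return 0;

out_free:
	/* Free everything allocated so far; nothing was accounted. */
	while (i-- > 0)
		free(pages[i]);
	return -ENOMEM;
}
```

Placing the accounting after the loop means the error path needs no matching "un-account" step.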
      
      Link: https://lkml.kernel.org/r/20240809191020.1142142-1-pasha.tatashin@soleen.com
      Link: https://lkml.kernel.org/r/20240808213437.682006-1-pasha.tatashin@soleen.com
      Link: https://lkml.kernel.org/r/20240808213437.682006-2-pasha.tatashin@soleen.com
      Fixes: 15995a35 ("mm: report per-page metadata information")
      Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
      Reviewed-by: Fan Ni <fan.ni@samsung.com>
      Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Tested-by: Alison Schofield <alison.schofield@intel.com>
      Reviewed-by: Muchun Song <muchun.song@linux.dev>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
      Cc: Joel Granados <j.granados@samsung.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Li Zhijian <lizhijian@fujitsu.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Nhat Pham <nphamcs@gmail.com>
      Cc: Sourav Panda <souravpanda@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yi Zhang <yi.zhang@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/hugetlb: fix hugetlb vs. core-mm PT locking · 5f75cfbd
      David Hildenbrand authored
      We recently made GUP's common page table walking code also walk hugetlb
      VMAs without most hugetlb special-casing, preparing for a future with
      less hugetlb-specific page table walking code in the codebase.
      It turns out that we missed one page table locking detail: page table
      locking for hugetlb folios that are not mapped using a single PMD/PUD.
      
      Assume we have a hugetlb folio that spans multiple PTEs (e.g., 64 KiB
      hugetlb folios on arm64 with 4 KiB base page size).  GUP, as it walks the
      page tables, will perform a pte_offset_map_lock() to grab the PTE table
      lock.
      
      However, hugetlb code that concurrently modifies these page tables would
      actually grab the mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS, the
      locks would differ.  Something similar can happen right now with hugetlb
      folios that span multiple PMDs when USE_SPLIT_PMD_PTLOCKS is in effect.
      
      This issue can be reproduced [1], for example triggering:
      
      [ 3105.936100] ------------[ cut here ]------------
      [ 3105.939323] WARNING: CPU: 31 PID: 2732 at mm/gup.c:142 try_grab_folio+0x11c/0x188
      [ 3105.944634] Modules linked in: [...]
      [ 3105.974841] CPU: 31 PID: 2732 Comm: reproducer Not tainted 6.10.0-64.eln141.aarch64 #1
      [ 3105.980406] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-4.fc40 05/24/2024
      [ 3105.986185] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [ 3105.991108] pc : try_grab_folio+0x11c/0x188
      [ 3105.994013] lr : follow_page_pte+0xd8/0x430
      [ 3105.996986] sp : ffff80008eafb8f0
      [ 3105.999346] x29: ffff80008eafb900 x28: ffffffe8d481f380 x27: 00f80001207cff43
      [ 3106.004414] x26: 0000000000000001 x25: 0000000000000000 x24: ffff80008eafba48
      [ 3106.009520] x23: 0000ffff9372f000 x22: ffff7a54459e2000 x21: ffff7a546c1aa978
      [ 3106.014529] x20: ffffffe8d481f3c0 x19: 0000000000610041 x18: 0000000000000001
      [ 3106.019506] x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000000
      [ 3106.024494] x14: ffffb85477fdfe08 x13: 0000ffff9372ffff x12: 0000000000000000
      [ 3106.029469] x11: 1fffef4a88a96be1 x10: ffff7a54454b5f0c x9 : ffffb854771b12f0
      [ 3106.034324] x8 : 0008000000000000 x7 : ffff7a546c1aa980 x6 : 0008000000000080
      [ 3106.038902] x5 : 00000000001207cf x4 : 0000ffff9372f000 x3 : ffffffe8d481f000
      [ 3106.043420] x2 : 0000000000610041 x1 : 0000000000000001 x0 : 0000000000000000
      [ 3106.047957] Call trace:
      [ 3106.049522]  try_grab_folio+0x11c/0x188
      [ 3106.051996]  follow_pmd_mask.constprop.0.isra.0+0x150/0x2e0
      [ 3106.055527]  follow_page_mask+0x1a0/0x2b8
      [ 3106.058118]  __get_user_pages+0xf0/0x348
      [ 3106.060647]  faultin_page_range+0xb0/0x360
      [ 3106.063651]  do_madvise+0x340/0x598
      
      Let's make huge_pte_lockptr() effectively use the same PT locks as any
      core-mm page table walker would.  Add ptep_lockptr() to obtain the PTE
      page table lock using a pte pointer -- unfortunately we cannot convert
      pte_lockptr() because virt_to_page() doesn't work with kmap'ed page tables
      we can have with CONFIG_HIGHPTE.
      
      Handle CONFIG_PGTABLE_LEVELS correctly by checking in reverse order, so
      that, e.g., CONFIG_PGTABLE_LEVELS==2 with
      PGDIR_SIZE==P4D_SIZE==PUD_SIZE==PMD_SIZE works as expected.  Document
      why that works.
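
A rough user-space model of "checking in reverse order": the folio size is compared against the largest level first, so that folded levels with equal sizes resolve to the coarsest lock. The names `enum pt_lock_model`, `struct pt_sizes` and `pick_pt_lock()` are hypothetical, and the mapping from level to lock is a simplification assumed for illustration, not the kernel's helper.

```c
/* Hypothetical lock classes a page table walker might take. */
enum pt_lock_model {
	LOCK_MM_WIDE,	/* models mm->page_table_lock */
	LOCK_PMD_TABLE,	/* models a split PMD table lock */
	LOCK_PTE_TABLE,	/* models a split PTE table lock */
};

/* Hypothetical description of one page table configuration. */
struct pt_sizes {
	unsigned long pmd_size;
	unsigned long pud_size;
};

/*
 * Check the largest level first. With folded levels, where
 * pmd_size == pud_size, the first test already matches, so the folio
 * gets the mm-wide lock. Testing the PMD size first would pick the
 * PMD table lock instead.
 */
static enum pt_lock_model pick_pt_lock(const struct pt_sizes *s,
				       unsigned long folio_size)
{
	if (folio_size >= s->pud_size)
		return LOCK_MM_WIDE;
	if (folio_size >= s->pmd_size)
		return LOCK_PMD_TABLE;
	return LOCK_PTE_TABLE;
}
```

With 2 MiB PMDs and 1 GiB PUDs, a 64 KiB folio maps to the PTE table lock in this model, matching the arm64 example earlier in the message.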
      
      There is one ugly case: powerpc 8xx, whereby we have an 8 MiB hugetlb
      folio being mapped using two PTE page tables.  While hugetlb wants to take
      the PMD table lock, core-mm would grab the PTE table lock of one of the
      two PTE page tables.  In such corner cases, we have to make sure that both
      locks match, which is (fortunately!) currently guaranteed for 8xx as it
      does not support SMP and consequently doesn't use split PT locks.
      
      [1] https://lore.kernel.org/all/1bbfcc7f-f222-45a5-ac44-c5a1381c596d@redhat.com/
      
      Link: https://lkml.kernel.org/r/20240801204748.99107-1-david@redhat.com
      Fixes: 9cb28da5 ("mm/gup: handle hugetlb in the generic follow_page_mask code")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      5f75cfbd
    • Pedro Falcato's avatar
      mseal: fix is_madv_discard() · e46bc2e7
      Pedro Falcato authored
      is_madv_discard did its check wrong. MADV_ flags are not bitwise,
      they're normal sequential numbers. So, for instance:
      	behavior & (/* ... */ | MADV_REMOVE)
      
      tagged both MADV_REMOVE and MADV_RANDOM (bit 0 set) as discard
      operations.
      
      As a result, the kernel could erroneously block certain madvises (e.g.,
      MADV_RANDOM or MADV_HUGEPAGE) on sealed VMAs due to them sharing bits
      with blocked MADV operations (e.g., REMOVE or WIPEONFORK).
      
      This is obviously incorrect, so use a switch statement instead.
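
A small sketch contrasting the two checks. The `M_*` constants are defined locally and are assumed to mirror the generic Linux uapi madvise values (some architectures use different numbers). The exact set of operations treated as discards here is also an assumption, and both function names are hypothetical.

```c
#include <stdbool.h>

/* Local copies of the generic madvise numbers (assumed values). */
enum {
	M_RANDOM = 1, M_DONTNEED = 4, M_FREE = 8, M_REMOVE = 9,
	M_DONTFORK = 10, M_HUGEPAGE = 14, M_WIPEONFORK = 18,
	M_DONTNEED_LOCKED = 24,
};

/* Buggy form: treats sequential numbers as if they were bit flags. */
static bool is_discard_bitwise(int behavior)
{
	return (behavior & (M_FREE | M_DONTNEED | M_DONTNEED_LOCKED |
			    M_REMOVE | M_DONTFORK | M_WIPEONFORK)) != 0;
}

/* Fixed form: compares against each value exactly. */
static bool is_discard_switch(int behavior)
{
	switch (behavior) {
	case M_FREE:
	case M_DONTNEED:
	case M_DONTNEED_LOCKED:
	case M_REMOVE:
	case M_DONTFORK:
	case M_WIPEONFORK:
		return true;
	default:
		return false;
	}
}
```

Since M_REMOVE is 9 (binary 1001), the OR-ed mask has bit 0 set, so M_RANDOM (1) matches the bitwise test even though it discards nothing.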
      
      Link: https://lkml.kernel.org/r/20240807173336.2523757-1-pedro.falcato@gmail.com
      Link: https://lkml.kernel.org/r/20240807173336.2523757-2-pedro.falcato@gmail.com
      Fixes: 8be7258a ("mseal: add mseal syscall")
      Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
      Tested-by: Jeff Xu <jeffxu@chromium.org>
      Reviewed-by: Jeff Xu <jeffxu@chromium.org>
      Cc: Kees Cook <kees@kernel.org>
      Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e46bc2e7
  2. 11 Aug, 2024 9 commits
    • Linus Torvalds's avatar
      Linux 6.11-rc3 · 7c626ce4
      Linus Torvalds authored
      7c626ce4
    • Linus Torvalds's avatar
      Merge tag 'x86-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7006fe2f
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
      
       - Fix 32-bit PTI for real.
      
         pti_clone_entry_text() is called twice, once before initcalls so that
         initcalls can use the user-mode helper and then again after text is
         set read only. Setting read only on 32-bit might break up the PMD
         mapping, which makes the second invocation of pti_clone_entry_text()
         find the mappings out of sync and fail.
      
         Allow the second call to split the existing PMDs in the user mapping
         and synchronize with the kernel mapping.
      
       - Don't make acpi_mp_wake_mailbox read-only after init as the mail box
         must be writable in the case that CPU hotplug operations happen after
         boot. Otherwise the attempt to start a CPU crashes with a write to
         read only memory.
      
       - Add a missing sanity check in mtrr_save_state() to ensure that the
         fixed MTRR MSRs are supported.
      
         Otherwise mtrr_save_state() ends up in a #GP, which is fixed up, but
         the WARN_ON() can bring systems down when panic on warn is set.
      
      * tag 'x86-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mtrr: Check if fixed MTRRs exist before saving them
        x86/paravirt: Fix incorrect virt spinlock setting on bare metal
        x86/acpi: Remove __ro_after_init from acpi_mp_wake_mailbox
        x86/mm: Fix PTI for i386 some more
      7006fe2f
    • Linus Torvalds's avatar
      Merge tag 'timers-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7270e931
      Linus Torvalds authored
      Pull time keeping fixes from Thomas Gleixner:
      
       - Fix a couple of issues in the NTP code where user supplied values are
         neither sanity checked nor clamped to the operating range. This
         results in integer overflows and eventually NTP getting out of sync.
      
         According to the history the sanity checks had been removed in favor
         of clamping the values, but the clamping never worked correctly under
         all circumstances. The NTP people asked to not bring the sanity
         checks back as it might break existing applications.
      
         Make the clamping work correctly and add it where it's missing.
      
       - If adjtimex() sets the clock it has to trigger the hrtimer subsystem
         so it can adjust and if the clock was set into the future expire
         timers if needed. The caller should provide a bitmask to tell
         hrtimers which clocks have been adjusted.
      
         adjtimex() does not use the proper constant and uses CLOCK_REALTIME
         instead, which is 0. So hrtimers adjusts only the clocks, but does
         not check for expired timers, which might make them expire really
         late. Use the proper bitmask constant instead.
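
To illustrate the adjtimex() item: a clock ID of 0 passed where a bitmask is expected selects nothing. The sketch below is a user-space model. The `MODEL_*` names and `bases_to_recheck()` are hypothetical and do not reproduce the hrtimer interface.

```c
/* POSIX clock ID; its value is 0 on Linux. */
#define MODEL_CLOCK_REALTIME 0

/* Hypothetical clock base indices. */
enum model_base { MODEL_BASE_REALTIME, MODEL_BASE_TAI, MODEL_BASE_MAX };

#define MODEL_BIT(n) (1u << (n))

/* Hypothetical mask meaning "the wall clock was set". */
#define MODEL_WALL_CLOCK_SET \
	(MODEL_BIT(MODEL_BASE_REALTIME) | MODEL_BIT(MODEL_BASE_TAI))

/* Count the clock bases that would be rechecked for expired timers. */
static unsigned int bases_to_recheck(unsigned int mask)
{
	unsigned int n = 0;

	for (int b = 0; b < MODEL_BASE_MAX; b++)
		if (mask & MODEL_BIT(b))
			n++;
	return n;
}
```

Passing `MODEL_CLOCK_REALTIME` yields an empty mask, so no base is rechecked. Passing `MODEL_WALL_CLOCK_SET` selects both modelled bases.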
      
      * tag 'timers-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        timekeeping: Fix bogus clock_was_set() invocation in do_adjtimex()
        ntp: Safeguard against time_constant overflow
        ntp: Clamp maxerror and esterror to operating range
      7270e931
    • Linus Torvalds's avatar
      Merge tag 'irq-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 56fe0a6a
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "Three small fixes for interrupt core and drivers:
      
         - The interrupt core fails to honor caller supplied affinity hints
           for non-managed interrupts and uses the system default affinity on
           startup instead. Set the missing flag in the descriptor to tell the
           core to use the provided affinity.
      
         - Fix a shift out of bounds error in the Xilinx driver
      
         - Handle switching to level trigger correctly in the RISCV APLIC
           driver. It failed to retrigger the interrupt which causes it to
           become stale"
      
      * tag 'irq-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/riscv-aplic: Retrigger MSI interrupt on source configuration
        irqchip/xilinx: Fix shift out of bounds
        genirq/irqdesc: Honor caller provided affinity in alloc_desc()
      56fe0a6a
    • Linus Torvalds's avatar
      Merge tag 'usb-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · cb2e5ee8
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are a number of small USB driver fixes for reported issues for
        6.11-rc3. Included in here are:
      
         - usb serial driver MODULE_DESCRIPTION() updates
      
         - usb serial driver fixes
      
         - typec driver fixes
      
         - usb-ip driver fix
      
         - gadget driver fixes
      
         - dt binding update
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'usb-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: typec: ucsi: Fix a deadlock in ucsi_send_command_common()
        usb: typec: tcpm: avoid sink goto SNK_UNATTACHED state if not received source capability message
        usb: gadget: f_fs: pull out f->disable() from ffs_func_set_alt()
        usb: gadget: f_fs: restore ffs_func_disable() functionality
        USB: serial: debug: do not echo input by default
        usb: typec: tipd: Delete extra semi-colon
        usb: typec: tipd: Fix dereferencing freeing memory in tps6598x_apply_patch()
        usb: gadget: u_serial: Set start_delayed during suspend
        usb: typec: tcpci: Fix error code in tcpci_check_std_output_cap()
        usb: typec: fsa4480: Check if the chip is really there
        usb: gadget: core: Check for unset descriptor
        usb: vhci-hcd: Do not drop references before new references are gained
        usb: gadget: u_audio: Check return codes from usb_ep_enable and config_ep_by_speed.
        usb: gadget: midi2: Fix the response for FB info with block 0xff
        dt-bindings: usb: microchip,usb2514: Add USB2517 compatible
        USB: serial: garmin_gps: use struct_size() to allocate pkt
        USB: serial: garmin_gps: annotate struct garmin_packet with __counted_by
        USB: serial: add missing MODULE_DESCRIPTION() macros
        USB: serial: spcp8x5: remove unused struct 'spcp8x5_usb_ctrl_arg'
      cb2e5ee8
    • Linus Torvalds's avatar
      Merge tag 'tty-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 42b34a8d
      Linus Torvalds authored
      Pull tty / serial driver fixes from Greg KH:
       "Here are some small tty and serial driver fixes for reported problems
        for 6.11-rc3. Included in here are:
      
         - sc16is7xx serial driver fixes
      
         - uartclk bugfix for a divide by zero issue
      
         - conmakehash userspace build issue fix
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'tty-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        tty: vt: conmakehash: cope with abs_srctree no longer in env
        serial: sc16is7xx: fix invalid FIFO access with special register set
        serial: sc16is7xx: fix TX fifo corruption
        serial: core: check uartclk for zero to avoid divide by zero
      42b34a8d
    • Linus Torvalds's avatar
      Merge tag 'driver-core-6.11-rc3' of... · 84e6da57
      Linus Torvalds authored
      Merge tag 'driver-core-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
      
      Pull driver core / documentation fixes from Greg KH:
       "Here are some small fixes, and some documentation updates for
        6.11-rc3. Included in here are:
      
         - embargoed hardware documentation updates based on a lot of review by
           legal-types in lots of companies to try to make the process a _bit_
           easier for us to manage over time.
      
         - rust firmware documentation fix
      
         - driver detach race fix for the fix that went into 6.11-rc1
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'driver-core-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        driver core: Fix uevent_show() vs driver detach race
        Documentation: embargoed-hardware-issues.rst: add a section documenting the "early access" process
        Documentation: embargoed-hardware-issues.rst: minor cleanups and fixes
        rust: firmware: fix invalid rustdoc link
      84e6da57
    • Linus Torvalds's avatar
      Merge tag 'char-misc-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 9221afb2
      Linus Torvalds authored
      Pull char/misc fixes from Greg KH:
       "Here are some small char/misc/other driver fixes for 6.11-rc3 for
        reported issues. Included in here are:
      
         - binder driver fixes
      
         - fsi MODULE_DESCRIPTION() additions (people seem to love them...)
      
         - eeprom driver fix
      
         - Kconfig dependency fix to resolve build issues
      
         - spmi driver fixes
      
        All of these have been in linux-next for a while with no reported
        problems"
      
      * tag 'char-misc-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        spmi: pmic-arb: add missing newline in dev_err format strings
        spmi: pmic-arb: Pass the correct of_node to irq_domain_add_tree
        binder_alloc: Fix sleeping function called from invalid context
        binder: fix descriptor lookup for context manager
        char: add missing NetWinder MODULE_DESCRIPTION() macros
        misc: mrvl-cn10k-dpi: add PCI_IOV dependency
        eeprom: ee1004: Fix locking issues in ee1004_probe()
        fsi: add missing MODULE_DESCRIPTION() macros
      9221afb2
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 04cc50c2
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Two core fixes: one to prevent discard type changes (seen on iSCSI)
        during intermittent errors and the other is fixing a lockdep problem
        caused by the queue limits change.
      
        And one driver fix in ufs"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: sd: Keep the discard mode stable
        scsi: sd: Move sd_read_cpr() out of the q->limits_lock region
        scsi: ufs: core: Fix hba->last_dme_cmd_tstamp timestamp updating logic
      04cc50c2
  3. 10 Aug, 2024 8 commits
  4. 09 Aug, 2024 7 commits
    • Kent Overstreet's avatar
      bcachefs: bcachefs_metadata_version_disk_accounting_v3 · 8a2491db
      Kent Overstreet authored
      bcachefs_metadata_version_disk_accounting_v2 erroneously had padding
      bytes in disk_accounting_key, which is a problem because we have to
      guarantee that all unused bytes in disk_accounting_key are zeroed.
      
      Fortunately 6.11 isn't out yet, so it's cheap to fix this by spinning a
      new version.
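
One way to see why unused bytes matter, assuming keys are compared or hashed as raw bytes (an assumption for this sketch, not a statement about bcachefs internals): with every byte declared as a named field, initialization can zero all of them and equal keys stay byte-identical. `struct model_key` and its helpers are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical key layout: the padding is an explicit, named field. */
struct model_key {
	uint8_t  type;
	uint8_t  pad[7];	/* explicit, so it can be zeroed and checked */
	uint64_t id;
};

/* Zero the whole key first so that unused bytes are always 0. */
static void model_key_init(struct model_key *k, uint8_t type, uint64_t id)
{
	memset(k, 0, sizeof(*k));
	k->type = type;
	k->id = id;
}

/* Raw byte comparison, as an on-disk key comparison might do. */
static int model_key_cmp(const struct model_key *a, const struct model_key *b)
{
	return memcmp(a, b, sizeof(*a));
}
```

Two keys built from differently pre-filled memory then compare equal, because no byte is left to whatever the buffer previously held.
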
      Reported-by: Gabriel de Perthuis <g2p.code@gmail.com>
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
      8a2491db
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-2024-08-10' of https://gitlab.freedesktop.org/drm/kernel · 15833fea
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Weekly regular fixes, mostly amdgpu with i915/xe having a few each,
        and then some misc bits across the board, seems about right for rc3
        time.
      
        client:
         - fix null ptr deref
      
        bridge:
         - connector: fix double free
      
        atomic:
         - fix async flip update
      
        panel:
         - document panel
      
        omap:
         - add config dependency
      
        tests:
         - fix gem shmem test
      
        drm buddy:
         - Add start address to trim function
      
        amdgpu:
         - DMCUB fix
         - Fix DET programming on some DCNs
         - DCC fixes
         - DCN 4.0.1 fixes
         - SMU 14.0.x update
         - MMHUB fix
         - DCN 3.1.4 fix
         - GC 12.0 fixes
         - Fix soft recovery error propagation
         - SDMA 7.0 fixes
         - DSC fix
      
        xe:
         - Fix off-by-one when processing RTP rules
         - Use dma_fence_chain_free in chain fence unused as a sync
         - Fix PL1 disable flow in xe_hwmon_power_max_write
         - Take ref to VM in delayed dump snapshot
      
        i915:
         - correct dual pps handling for MTL_PCH+ [display]
         - Adjust vma offset for framebuffer mmap offset [gem]
         - Fix Virtual Memory mapping boundaries calculation [gem]
         - Allow evicting to use the requested placement
         - Attempt to get pages without eviction first"
      
      * tag 'drm-fixes-2024-08-10' of https://gitlab.freedesktop.org/drm/kernel: (31 commits)
        drm/xe: Take ref to VM in delayed snapshot
        drm/xe/hwmon: Fix PL1 disable flow in xe_hwmon_power_max_write
        drm/xe: Use dma_fence_chain_free in chain fence unused as a sync
        drm/xe/rtp: Fix off-by-one when processing rules
        drm/amdgpu: Add DCC GFX12 flag to enable address alignment
        drm/amdgpu: correct sdma7 max dw
        drm/amdgpu: Add address alignment support to DCC buffers
        drm/amd/display: Skip Recompute DSC Params if no Stream on Link
        drm/amdgpu: change non-dcc buffer copy configuration
        drm/amdgpu: Forward soft recovery errors to userspace
        drm/amdgpu: add golden setting for gc v12
        drm/buddy: Add start address support to trim function
        drm/amd/display: Add missing program DET segment call to pipe init
        drm/amd/display: Add missing DCN314 to the DML Makefile
        drm/amdgpu: force to use legacy inv in mmhub
        drm/amd/pm: update powerplay structure on smu v14.0.2/3
        drm/amd/display: Add missing mcache registers
        drm/amd/display: Add dcc propagation value
        drm/amd/display: Add missing DET segments programming
        drm/amd/display: Replace dm_execute_dmub_cmd with dc_wake_and_execute_dmub_cmd
        ...
      15833fea
    • Kent Overstreet's avatar
      bcachefs: improve bch2_dev_usage_to_text() · 1a9e219d
      Kent Overstreet authored
      Add a line for capacity
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
      1a9e219d
    • Kent Overstreet's avatar
      bcachefs: bch2_accounting_invalid() · 077e4737
      Kent Overstreet authored
      Implement bch2_accounting_invalid(); check for junk at the end, and
      replicas accounting entries in particular need to be checked or we'll
      pop asserts later.
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
      077e4737
    • Linus Torvalds's avatar
      Merge tag 'bitmap-6.11-rc' of https://github.com/norov/linux · afdab700
      Linus Torvalds authored
      Pull cpumask fix from Yury Norov:
       "Fix for cpumask merge"
      
      [ Mea culpa, this was my mismerge due to too much cut-and-paste - Linus ]
      
      * tag 'bitmap-6.11-rc' of https://github.com/norov/linux:
        cpumask: Fix crash on updating CPU enabled mask
      afdab700
    • Linus Torvalds's avatar
      Merge tag 'pm-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 85082897
      Linus Torvalds authored
      Pull power management fix from Rafael Wysocki:
       "Change the default EPP (energy-performance preference) value for the
        Emerald Rapids processor in the intel_pstate driver.
      
        This should improve both the performance and energy efficiency (Pedro
        Henrique Kopper)"
      
      * tag 'pm-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: intel_pstate: Update Balance performance EPP for Emerald Rapids
      85082897
    • Linus Torvalds's avatar
      Merge tag 'asm-generic-fixes-6.11-2' of... · 58d40f5f
      Linus Torvalds authored
      Merge tag 'asm-generic-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
      
      Pull asm-generic fixes from Arnd Bergmann:
       "There are two more changes to the syscall.tbl conversion: the
        '__NR_newfstat' in the previous bugfix was a mistake and gets reverted
        now, after triple-checking that the contents are now back to what they
        were on all architectures. The __NR_nfsservctl definition is not
        really needed but came up in the same discussion as it had previously
        been defined in uapi/asm-generic/unistd.h and tested for in user
        space.
      
        There are a few more symbols that used to be defined in the old
        unistd.h file, but that are never defined on any other architecture
        using syscall.tbl format. These used to be needed inside of the
        kernel:
      
           __NR_syscalls
           __NR_arch_specific_syscall
           __NR3264_*
      
        Searching for these on https://codesearch.debian.net/ shows a few
        packages (rustc, golang, clamav, libseccomp, librsvg, strace) that
        duplicate all the macros from asm/unistd.h, but nothing that actually
        uses the macros, so I concluded that they are fine to omit after all"
      
      * tag 'asm-generic-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
        syscalls: add back legacy __NR_nfsservctl macro
        syscalls: fix fstat() entry again
      58d40f5f