1. 16 Apr, 2024 8 commits
    • mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled · 1983184c
      Miaohe Lin authored
      When I ran a hard offline test with hugetlb pages, the following deadlock occurred:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      6.8.0-11409-gf6cef5f8 #1 Not tainted
      ------------------------------------------------------
      bash/46904 is trying to acquire lock:
      ffffffffabe68910 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_slow_dec+0x16/0x60
      
      but task is already holding lock:
      ffffffffabf92ea8 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable+0x16/0x40
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (pcp_batch_high_lock){+.+.}-{3:3}:
             __mutex_lock+0x6c/0x770
             page_alloc_cpu_online+0x3c/0x70
             cpuhp_invoke_callback+0x397/0x5f0
             __cpuhp_invoke_callback_range+0x71/0xe0
             _cpu_up+0xeb/0x210
             cpu_up+0x91/0xe0
             cpuhp_bringup_mask+0x49/0xb0
             bringup_nonboot_cpus+0xb7/0xe0
             smp_init+0x25/0xa0
             kernel_init_freeable+0x15f/0x3e0
             kernel_init+0x15/0x1b0
             ret_from_fork+0x2f/0x50
             ret_from_fork_asm+0x1a/0x30
      
      -> #0 (cpu_hotplug_lock){++++}-{0:0}:
             __lock_acquire+0x1298/0x1cd0
             lock_acquire+0xc0/0x2b0
             cpus_read_lock+0x2a/0xc0
             static_key_slow_dec+0x16/0x60
             __hugetlb_vmemmap_restore_folio+0x1b9/0x200
             dissolve_free_huge_page+0x211/0x260
             __page_handle_poison+0x45/0xc0
             memory_failure+0x65e/0xc70
             hard_offline_page_store+0x55/0xa0
             kernfs_fop_write_iter+0x12c/0x1d0
             vfs_write+0x387/0x550
             ksys_write+0x64/0xe0
             do_syscall_64+0xca/0x1e0
             entry_SYSCALL_64_after_hwframe+0x6d/0x75
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(pcp_batch_high_lock);
                                     lock(cpu_hotplug_lock);
                                     lock(pcp_batch_high_lock);
        rlock(cpu_hotplug_lock);
      
       *** DEADLOCK ***
      
      5 locks held by bash/46904:
       #0: ffff98f6c3bb23f0 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x64/0xe0
       #1: ffff98f6c328e488 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf8/0x1d0
       #2: ffff98ef83b31890 (kn->active#113){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x100/0x1d0
       #3: ffffffffabf9db48 (mf_mutex){+.+.}-{3:3}, at: memory_failure+0x44/0xc70
       #4: ffffffffabf92ea8 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable+0x16/0x40
      
      stack backtrace:
      CPU: 10 PID: 46904 Comm: bash Kdump: loaded Not tainted 6.8.0-11409-gf6cef5f8 #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x68/0xa0
       check_noncircular+0x129/0x140
       __lock_acquire+0x1298/0x1cd0
       lock_acquire+0xc0/0x2b0
       cpus_read_lock+0x2a/0xc0
       static_key_slow_dec+0x16/0x60
       __hugetlb_vmemmap_restore_folio+0x1b9/0x200
       dissolve_free_huge_page+0x211/0x260
       __page_handle_poison+0x45/0xc0
       memory_failure+0x65e/0xc70
       hard_offline_page_store+0x55/0xa0
       kernfs_fop_write_iter+0x12c/0x1d0
       vfs_write+0x387/0x550
       ksys_write+0x64/0xe0
       do_syscall_64+0xca/0x1e0
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      RIP: 0033:0x7fc862314887
      Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
      RSP: 002b:00007fff19311268 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007fc862314887
      RDX: 000000000000000c RSI: 000056405645fe10 RDI: 0000000000000001
      RBP: 000056405645fe10 R08: 00007fc8623d1460 R09: 000000007fffffff
      R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
      R13: 00007fc86241b780 R14: 00007fc862417600 R15: 00007fc862416a00
      
      In short, the following call path breaks the lock dependency chain:
      
       memory_failure
        __page_handle_poison
         zone_pcp_disable -- lock(pcp_batch_high_lock)
         dissolve_free_huge_page
          __hugetlb_vmemmap_restore_folio
           static_key_slow_dec
            cpus_read_lock -- rlock(cpu_hotplug_lock)
      
      Fix this by calling drain_all_pages() instead of zone_pcp_disable(), so
      that pcp_batch_high_lock is not held across dissolve_free_huge_page().
      
      This issue did not occur before commit a6b40850 ("mm: hugetlb: replace
      hugetlb_free_vmemmap_enabled with a static_key"), which introduced
      rlock(cpu_hotplug_lock) in the dissolve_free_huge_page() code path while
      lock(pcp_batch_high_lock) is already held in __page_handle_poison().
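
      The inversion reported above can be pictured with a small userspace
      model of a lock-dependency graph.  This is only a sketch: the enum
      names, the dep[] table and record_acquire() are hypothetical, and the
      code is not how lockdep itself is implemented.  It records "B acquired
      while holding A" edges and refuses an edge that would close a cycle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock IDs for illustration; not kernel identifiers. */
enum { CPU_HOTPLUG_LOCK, PCP_BATCH_HIGH_LOCK, NR_LOCKS };

/* dep[a][b] is true when lock b has been acquired while holding lock a. */
static bool dep[NR_LOCKS][NR_LOCKS];

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reaches(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (dep[from][next] && !seen[next] && reaches(next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquire 'want' while holding 'held'".  Returns false, and records
 * nothing, when the new edge would close a cycle (a possible deadlock).
 */
static bool record_acquire(int held, int want)
{
	bool seen[NR_LOCKS] = { false };

	assert(held >= 0 && held < NR_LOCKS);
	assert(want >= 0 && want < NR_LOCKS);
	if (reaches(want, held, seen))
		return false;
	dep[held][want] = true;
	return true;
}
```

      In this model the CPU bring-up path first records
      cpu_hotplug_lock -> pcp_batch_high_lock; the memory_failure() path then
      asks for the opposite order and is rejected.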
      
      [linmiaohe@huawei.com: extend comment per Oscar]
      [akpm@linux-foundation.org: reflow block comment]
      Link: https://lkml.kernel.org/r/20240407085456.2798193-1-linmiaohe@huawei.com
      Fixes: a6b40850 ("mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key")
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Acked-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Jane Chu <jane.chu@oracle.com>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      1983184c
    • mm/userfaultfd: allow hugetlb change protection upon poison entry · c5977c95
      Peter Xu authored
      After UFFDIO_POISON, there can be two kinds of hugetlb pte markers, either
      the POISON one or UFFD_WP one.
      
      Allow change protection to run on a poisoned marker just like in the
      !hugetlb cases, ignoring the marker irrespective of the permission.
      
      Here the two bits are mutually exclusive.  For example, when installing a
      poisoned entry it must not be UFFD_WP already (by checking pte_none()
      before such an install).  It also means that if UFFD_WP is set there must
      be no POISON bit set.  This makes sense because UFFD_WP is a bit that
      reflects permission, and permissions do not apply if the pte is poisoned
      and destined to sigbus.
      
      So here we simply check whether the uffd_wp bit is set first, and do
      nothing otherwise.
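
      The rule described above can be sketched as a tiny userspace model.
      The MARKER_* values and both function names are hypothetical and only
      illustrate the "act on uffd-wp, ignore poison" decision; this is not the
      kernel's pte marker encoding.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical marker bits; the values are illustrative only. */
#define MARKER_UFFD_WP  (1u << 0)
#define MARKER_POISONED (1u << 1)

/* The two bits are mutually exclusive: exactly one of them may be set. */
static bool marker_valid(unsigned int marker)
{
	unsigned int bits = marker & (MARKER_UFFD_WP | MARKER_POISONED);

	return bits == MARKER_UFFD_WP || bits == MARKER_POISONED;
}

/*
 * Model of a change-protection pass over one marker.  Returns the marker
 * value to keep afterwards; 0 means the marker was cleared.
 */
static unsigned int change_protection(unsigned int marker, bool uffd_wp_resolve)
{
	assert(marker_valid(marker));

	/* Only a uffd-wp marker reacts to a write-protect resolve request. */
	if ((marker & MARKER_UFFD_WP) && uffd_wp_resolve)
		return 0;

	/* Poisoned marker (or no resolve requested): leave it untouched. */
	return marker;
}
```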
      
      Attach the Fixes to UFFDIO_POISON work, as before that it should not be
      possible to have poison entry for hugetlb (e.g., hugetlb doesn't do swap,
      so no chance of swapin errors).
      
      Link: https://lkml.kernel.org/r/20240405231920.1772199-1-peterx@redhat.com
      Link: https://lore.kernel.org/r/000000000000920d5e0615602dd1@google.com
      Fixes: fc71884a ("mm: userfaultfd: add new UFFDIO_POISON ioctl")
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reported-by: syzbot+b07c8ac8eee3d4d8440f@syzkaller.appspotmail.com
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Cc: <stable@vger.kernel.org>	[6.6+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c5977c95
    • mm,page_owner: fix printing of stack records · 74017458
      Oscar Salvador authored
      When seq_* code sees that its buffer overflowed, it re-allocates a bigger
      one and calls the seq_operations->start() callback again.  stack_start()
      naively assumed that if it got called again, the old record had already
      been printed, so it returned the next object, but that is not true.
      
      The consequence of that is that every time stack_stop() -> stack_start()
      get called because we needed a bigger buffer, stack_start() will skip
      entries, and those will not be printed.
      
      Fix it by not advancing to the next object in stack_start().
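
      A minimal userspace sketch of the iterator contract follows.  The
      records[] array, rec_start(), rec_next() and read_all() are hypothetical
      stand-ins, not the seq_file API; the point is only that start() must
      return the record at the current position, so a restart after an
      overflow does not skip anything.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record list standing in for the stack records. */
static const int records[] = { 10, 20, 30, 40 };
#define NR_RECORDS (sizeof(records) / sizeof(records[0]))

/* start(): return the record at *pos, or NULL at the end.  Never advances. */
static const int *rec_start(size_t *pos)
{
	return *pos < NR_RECORDS ? &records[*pos] : NULL;
}

/* next(): advance *pos, then return the record found there. */
static const int *rec_next(size_t *pos)
{
	(*pos)++;
	return rec_start(pos);
}

/*
 * Toy reader loop.  When 'overflow_once' is nonzero, the first record is not
 * emitted on the first attempt (simulated buffer overflow) and start() is
 * called again at the same position.  Returns the number of records emitted.
 */
static size_t read_all(int *out, size_t cap, int overflow_once)
{
	size_t pos = 0, n = 0;
	const int *rec = rec_start(&pos);

	while (rec && n < cap) {
		if (overflow_once) {
			overflow_once = 0;
			rec = rec_start(&pos);	/* restart, same position */
			continue;
		}
		out[n++] = *rec;
		rec = rec_next(&pos);
	}
	return n;
}
```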
      
      Link: https://lkml.kernel.org/r/20240404070702.2744-5-osalvador@suse.de
      Fixes: 765973a0 ("mm,page_owner: display all stacks and their count")
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      74017458
    • mm,page_owner: fix accounting of pages when migrating · 718b1f33
      Oscar Salvador authored
      Upon migration, newly allocated pages are given the handle of the old
      pages.  This is problematic because it means that for the stack which
      allocated the old page, we will be subtracting the old page + the new one
      when that page is freed, creating an accounting imbalance.
      
      There is an interest in keeping it that way, as otherwise the output will
      be biased towards migration stacks should those operations occur often,
      but that is not really helpful.
      
      The link from the new page to the old stack is being performed by calling
      __update_page_owner_handle() in __folio_copy_owner().  The only thing that
      is left is to link the migrate stack to the old page, so the old page will
      be subtracted from the migrate stack, avoiding by doing so any possible
      imbalance.
      
      Link: https://lkml.kernel.org/r/20240404070702.2744-4-osalvador@suse.de
      Fixes: 217b2119 ("mm,page_owner: implement the tracking of the stacks count")
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      718b1f33
    • mm,page_owner: fix refcount imbalance · f5c12105
      Oscar Salvador authored
      The current code does not handle scenarios where an allocation and the
      corresponding free operations on the same pages do not cover the same
      number of pages at once.  One example is page_alloc_exact(), where we
      allocate a page of sufficient order to satisfy the size request, but
      free the remainder right away.
      
      In the above example, we will increment the stack_record refcount only
      once, but we will decrease it the same number of times as number of unused
      pages we have to free.  This will lead to a warning because of refcount
      imbalance.
      
      Fix this by recording the number of base pages in the refcount field.
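
      The arithmetic behind this fix can be sketched in userspace as
      follows.  struct stack_count and both helpers are hypothetical names for
      illustration, not the page_owner code; the idea is only that an order-N
      allocation adds 2^N base pages, so piecewise frees balance out.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-stack counter. */
struct stack_count {
	long nr_base_pages;
};

/* An allocation of the given order accounts for 2^order base pages. */
static void account_alloc(struct stack_count *sc, unsigned int order)
{
	assert(order < 31);
	sc->nr_base_pages += 1L << order;
}

/* A free of the given order gives back 2^order base pages. */
static void account_free(struct stack_count *sc, unsigned int order)
{
	assert(order < 31);
	sc->nr_base_pages -= 1L << order;
	assert(sc->nr_base_pages >= 0);	/* would trip on an imbalance */
}
```

      For instance, a 3-page exact request served from an order-2 block adds
      4 base pages, gives 1 back immediately, and reaches 0 once the
      remaining 3 pages are freed one by one.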
      
      Link: https://lkml.kernel.org/r/20240404070702.2744-3-osalvador@suse.de
      Reported-by: syzbot+41bbfdb8d41003d12c0f@syzkaller.appspotmail.com
      Closes: https://lore.kernel.org/linux-mm/00000000000090e8ff0613eda0e5@google.com
      Fixes: 217b2119 ("mm,page_owner: implement the tracking of the stacks count")
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      f5c12105
    • mm,page_owner: update metadata for tail pages · ea4b5b33
      Oscar Salvador authored
      Patch series "page_owner: Fix refcount imbalance and print fixup", v4.
      
      This series consists of a refactoring/correctness of updating the metadata
      of tail pages, a couple of fixups for the refcounting part and a fixup for
      the stack_start() function.
      
      From this series on, instead of counting the stacks, we count the
      outstanding nr_base_pages each stack has, which gives us a much better
      memory overview.  The other fixup is for the migration part.
      
      A more detailed explanation can be found in the changelog of the
      respective patches.
      
      
      This patch (of 4):
      
      __set_page_owner_handle() and __reset_page_owner() update the metadata of
      all pages when the page is of a higher order, but we fail to do the same
      when the pages are migrated.  __folio_copy_owner() only updates the
      metadata of the head page, meaning that the information stored in the
      first page and the tail pages will not match.
      
      Strictly speaking that is not a big problem because 1) we do not print
      tail pages and 2) upon splitting all tail pages will inherit the metadata
      of the head page, but it is better to have all metadata in check should
      there be any problem, so it can ease debugging.
      
      For that purpose, a couple of helpers are created:
      __update_page_owner_handle() which updates the metadata on allocation, and
      __update_page_owner_free_handle() which does the same when the page is
      freed.
      
      __folio_copy_owner() will make use of both as it needs to entirely replace
      the page_owner metadata for the new page.
      
      Link: https://lkml.kernel.org/r/20240404070702.2744-1-osalvador@suse.de
      Link: https://lkml.kernel.org/r/20240404070702.2744-2-osalvador@suse.de
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      ea4b5b33
    • userfaultfd: change src_folio after ensuring it's unpinned in UFFDIO_MOVE · c0205eaf
      Lokesh Gidra authored
      Commit d7a08838 ("mm: userfaultfd: fix unexpected change to src_folio
      when UFFDIO_MOVE fails") moved the src_folio->{mapping, index} changing to
      after clearing the page-table and ensuring that it's not pinned.  This
      avoids failure of swapout+migration and possibly memory corruption.
      
      However, the commit missed fixing it in the huge-page case.
      
      Link: https://lkml.kernel.org/r/20240404171726.2302435-1-lokeshgidra@google.com
      Fixes: adef4406 ("userfaultfd: UFFDIO_MOVE uABI")
      Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Kalesh Singh <kaleshsingh@google.com>
      Cc: Lokesh Gidra <lokeshgidra@google.com>
      Cc: Nicolas Geoffray <ngeoffray@google.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c0205eaf
    • mm/madvise: make MADV_POPULATE_(READ|WRITE) handle VM_FAULT_RETRY properly · 631426ba
      David Hildenbrand authored
      Darrick reports that in some cases where pread() would fail with -EIO and
      mmap()+access would generate a SIGBUS signal, MADV_POPULATE_READ /
      MADV_POPULATE_WRITE will keep retrying forever and not fail with -EFAULT.
      
      While the madvise() call can be interrupted by a signal, this is not the
      desired behavior.  MADV_POPULATE_READ / MADV_POPULATE_WRITE should behave
      like page faults in that case: fail and not retry forever.
      
      A reproducer can be found at [1].
      
      The reason is that __get_user_pages(), as called by
      faultin_vma_page_range(), will not handle VM_FAULT_RETRY in a proper way:
      it will simply return 0 when VM_FAULT_RETRY happened, making
      madvise_populate()->faultin_vma_page_range() retry again and again, never
      setting FOLL_TRIED->FAULT_FLAG_TRIED for __get_user_pages().
      
      __get_user_pages_locked() does what we want, but duplicating that logic in
      faultin_vma_page_range() feels wrong.
      
      So let's use __get_user_pages_locked() instead, that will detect
      VM_FAULT_RETRY and set FOLL_TRIED when retrying, making the fault handler
      return VM_FAULT_SIGBUS (VM_FAULT_ERROR) at some point, propagating -EFAULT
      from faultin_page() to __get_user_pages(), all the way to
      madvise_populate().
      
      But, there is an issue: __get_user_pages_locked() will end up re-taking
      the MM lock and then __get_user_pages() will do another VMA lookup.  In
      the meantime, the VMA layout could have changed and we'd fail with
      different error codes than we'd want to.
      
      As __get_user_pages() will currently do a new VMA lookup either way, let
      it do the VMA handling in a different way, controlled by a new
      FOLL_MADV_POPULATE flag, effectively moving these checks from
      madvise_populate() + faultin_page_range() in there.
      
      With this change, Darrick's reproducer properly fails with -EFAULT, as
      documented for MADV_POPULATE_READ / MADV_POPULATE_WRITE.
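
      The retry behavior described above can be modeled in a few lines of
      userspace C.  Everything here is a hypothetical sketch: enum
      fault_result, failing_fault() and populate_page() are invented names,
      and the code does not reproduce __get_user_pages_locked().  It only
      shows why remembering "already tried" lets the loop end with -EFAULT
      instead of spinning.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum fault_result { FAULT_OK, FAULT_RETRY, FAULT_SIGBUS };

/*
 * Hypothetical fault handler for a page whose backing I/O always fails:
 * the first attempt asks for a retry, a retried attempt reports the error.
 */
static enum fault_result failing_fault(bool tried)
{
	return tried ? FAULT_SIGBUS : FAULT_RETRY;
}

/*
 * Populate one page.  After FAULT_RETRY the next attempt is marked as
 * "tried", so the handler can report a hard error instead of asking again.
 * Returns 0 on success or -EFAULT; *attempts counts handler calls.
 */
static int populate_page(enum fault_result (*fault)(bool), int *attempts)
{
	bool tried = false;

	for (;;) {
		(*attempts)++;
		switch (fault(tried)) {
		case FAULT_OK:
			return 0;
		case FAULT_RETRY:
			tried = true;	/* the step the old loop never took */
			break;
		case FAULT_SIGBUS:
			return -EFAULT;
		}
	}
}
```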
      
      [1] https://lore.kernel.org/all/20240313171936.GN1927156@frogsfrogsfrogs/
      
      Link: https://lkml.kernel.org/r/20240314161300.382526-1-david@redhat.com
      Link: https://lkml.kernel.org/r/20240314161300.382526-2-david@redhat.com
      Fixes: 4ca9b385 ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reported-by: Darrick J. Wong <djwong@kernel.org>
      Closes: https://lore.kernel.org/all/20240311223815.GW1927156@frogsfrogsfrogs/
      Cc: Darrick J. Wong <djwong@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      631426ba
  2. 14 Apr, 2024 10 commits
  3. 13 Apr, 2024 5 commits
    • Merge tag 'ata-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux · 7efd0a74
      Linus Torvalds authored
      Pull ata fixes from Damien Le Moal:
      
       - Add the mask_port_map parameter to the ahci driver. This is a
         follow-up to the recent snafu with the ASMedia controller and its
         virtual ports hiding port-multiplier devices. As ASMedia confirmed
         that there is no way to determine if these slow-to-probe virtual
         ports actually represent the ports of a port-multiplier device,
         this new parameter allows masking ports to significantly speed up
         probing during system boot, resulting in shorter boot times.
      
       - A fix for an incorrect handling of a port unlock in
         ata_scsi_dev_rescan().
      
       - Allow command duration limits to be detected for ACS-4 devices, as
         there are such devices out in the field.
      
      * tag 'ata-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
        ata: libata-core: Allow command duration limits detection for ACS-4 drives
        ata: libata-scsi: Fix ata_scsi_dev_rescan() error path
        ata: ahci: Add mask_port_map module parameter
      7efd0a74
    • Merge tag 'zonefs-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs · 76b0e9c4
      Linus Torvalds authored
      Pull zonefs fix from Damien Le Moal:
      
       - Suppress a coccicheck warning using str_plural()
      
      * tag 'zonefs-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
        zonefs: Use str_plural() to fix Coccinelle warning
      76b0e9c4
    • Merge tag 'v6.9-rc3-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · fa4022cb
      Linus Torvalds authored
      Pull smb client fixes from Steve French:
      
       - fix for oops in cifs_get_fattr of deleted files
      
       - fix for the remote open counter going negative in some directory
         lease cases
      
       - fix for mkfifo to instantiate dentry to avoid possible crash
      
       - important fix to allow handling key rotation for mount and remount
         (ie cases that are becoming more common when password that was used
         for the mount will expire soon but will be replaced by new password)
      
      * tag 'v6.9-rc3-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        smb3: fix broken reconnect when password changing on the server by allowing password rotation
        smb: client: instantiate when creating SFU files
        smb3: fix Open files on server counter going negative
        smb: client: fix NULL ptr deref in cifs_mark_open_handles_for_deleted_file()
      fa4022cb
    • ata: libata-core: Allow command duration limits detection for ACS-4 drives · c0297e7d
      Igor Pylypiv authored
      Even though the command duration limits (CDL) feature was first added
      in ACS-5 (major version 12), there are some ACS-4 (major version 11)
      drives that implement CDL as well.
      
      IDENTIFY_DEVICE, SUPPORTED_CAPABILITIES, and CURRENT_SETTINGS log pages
      are mandatory in the ACS-4 standard so it should be safe to read these
      log pages on older drives implementing the ACS-4 standard.
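
      As a rough illustration of the version gate, the sketch below derives
      a major version from the highest set bit of the IDENTIFY DEVICE major
      version word and allows CDL probing from version 11 (ACS-4) upwards.
      major_version() and may_probe_cdl() are hypothetical userspace helpers
      written for this example, and the bit scan is an assumption about how
      the word is interpreted; this is not the libata code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Highest set bit (1..14) of the major version word, or 0 when the word is
 * 0xFFFF (version not reported) or has none of those bits set.
 */
static unsigned int major_version(uint16_t word)
{
	unsigned int mver;

	if (word == 0xFFFF)
		return 0;
	for (mver = 14; mver >= 1; mver--)
		if (word & (1u << mver))
			break;
	return mver;
}

/* CDL log pages are only read on drives reporting ACS-4 (11) or newer. */
static bool may_probe_cdl(uint16_t major_word)
{
	return major_version(major_word) >= 11;
}
```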
      
      Fixes: 62e4a60e ("scsi: ata: libata: Detect support for command duration limits")
      Cc: stable@vger.kernel.org
      Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
      Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
      c0297e7d
    • ata: libata-scsi: Fix ata_scsi_dev_rescan() error path · 79336504
      Damien Le Moal authored
      Commit 0c76106c ("scsi: sd: Fix TCG OPAL unlock on system resume")
      incorrectly handles failures of scsi_resume_device() in
      ata_scsi_dev_rescan(), leading to a double call to
      spin_unlock_irqrestore() to unlock a device port. Fix this by redefining
      the goto labels used in case of errors and only unlock the port
      scsi_scan_mutex when scsi_resume_device() fails.
      
      Bug found with the Smatch static checker warning:
      
      	drivers/ata/libata-scsi.c:4774 ata_scsi_dev_rescan()
      	error: double unlocked 'ap->lock' (orig line 4757)
      
      Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
      Fixes: 0c76106c ("scsi: sd: Fix TCG OPAL unlock on system resume")
      Cc: stable@vger.kernel.org
      Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
      Reviewed-by: Niklas Cassel <cassel@kernel.org>
      79336504
  4. 12 Apr, 2024 17 commits
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 8f2c0577
      Linus Torvalds authored
      Pull arm64 fix from Catalin Marinas:
       "Fix the TLBI RANGE operand calculation causing live migration under
        KVM/arm64 to miss dirty pages due to stale TLB entries"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: tlb: Fix TLBI RANGE operand
      8f2c0577
    • Merge tag 'soc-fixes-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · 678e14c7
      Linus Torvalds authored
      Pull SoC fixes from Arnd Bergmann:
       "The device tree changes this time are all for NXP i.MX platforms,
        addressing issues with clocks and regulators on i.MX7 and i.MX8.
      
        The old OMAP2 based Nokia N8x0 tablet gets a couple of code fixes for
        regressions that came in.
      
        The ARM SCMI and FF-A firmware interfaces get a couple of minor bug
        fixes.
      
        A regression fix for RISC-V cache management addresses a problem with
        probe order on SiFive cores"
      
      * tag 'soc-fixes-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (23 commits)
        MAINTAINERS: Change Krzysztof Kozlowski's email address
        arm64: dts: imx8qm-ss-dma: fix can lpcg indices
        arm64: dts: imx8-ss-dma: fix can lpcg indices
        arm64: dts: imx8-ss-dma: fix adc lpcg indices
        arm64: dts: imx8-ss-dma: fix pwm lpcg indices
        arm64: dts: imx8-ss-dma: fix spi lpcg indices
        arm64: dts: imx8-ss-conn: fix usb lpcg indices
        arm64: dts: imx8-ss-lsio: fix pwm lpcg indices
        ARM: dts: imx7s-warp: Pass OV2680 link-frequencies
        ARM: dts: imx7-mba7: Use 'no-mmc' property
        arm64: dts: imx8-ss-conn: fix usdhc wrong lpcg clock order
        arm64: dts: freescale: imx8mp-venice-gw73xx-2x: fix USB vbus regulator
        arm64: dts: freescale: imx8mp-venice-gw72xx-2x: fix USB vbus regulator
        cache: sifive_ccache: Partially convert to a platform driver
        firmware: arm_scmi: Make raw debugfs entries non-seekable
        firmware: arm_scmi: Fix wrong fastchannel initialization
        firmware: arm_ffa: Fix the partition ID check in ffa_notification_info_get()
        ARM: OMAP2+: fix USB regression on Nokia N8x0
        mmc: omap: restore original power up/down steps
        mmc: omap: fix deferred probe
        ...
      678e14c7
    • Merge tag 'iommu-fixes-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · c7c4e130
      Linus Torvalds authored
      Pull iommu fixes from Joerg Roedel:
      
       - Intel VT-d Fixes:
           - Allocate local memory for PRQ page
           - Fix WARN_ON in iommu probe path
           - Fix wrong use of pasid config
      
       - AMD IOMMU Fixes:
           - Lock inversion fix
           - Log message severity fix
           - Disable SNP when v2 page-tables are used
      
       - Mediatek driver:
           - Fix module autoloading
      
      * tag 'iommu-fixes-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/amd: Change log message severity
        iommu/vt-d: Fix WARN_ON in iommu probe path
        iommu/vt-d: Allocate local memory for page request queue
        iommu/vt-d: Fix wrong use of pasid config
        iommu: mtk: fix module autoloading
        iommu/amd: Do not enable SNP when V2 page table is enabled
        iommu/amd: Fix possible irq lock inversion dependency issue
      c7c4e130
    • Merge tag 'pci-v6.9-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci · b3812ff0
      Linus Torvalds authored
      Pull pci fixes from Bjorn Helgaas:
      
       - Revert a quirk that prevented Secondary Bus Reset for LSI / Agere
         FW643.
      
         We thought the device was broken, but the reset does work correctly
         on other platforms, and the reset avoids leaking data out of VMs
         (Bjorn Helgaas)
      
       - Update MAINTAINERS to reflect that Gustavo Pimentel is no longer
         reachable (Manivannan Sadhasivam)
      
      * tag 'pci-v6.9-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
        Revert "PCI: Mark LSI FW643 to avoid bus reset"
        MAINTAINERS: Drop Gustavo Pimentel as PCI DWC Maintainer
      b3812ff0
    • Merge tag 'block-6.9-20240412' of git://git.kernel.dk/linux · d7ad0581
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - MD pull request via Song:
             - UAF fix (Yu)
      
       - Avoid out-of-bounds shift in blk-iocost (Rik)
      
       - Fix for q->blkg_list corruption (Ming)
      
       - Relax virt boundary mask/size segment checking (Ming)
      
      * tag 'block-6.9-20240412' of git://git.kernel.dk/linux:
        block: fix that blk_time_get_ns() doesn't update time after schedule
        block: allow device to have both virt_boundary_mask and max segment size
        block: fix q->blkg_list corruption during disk rebind
        blk-iocost: avoid out of bounds shift
        raid1: fix use-after-free for original bio in raid1_write_request()
      d7ad0581
    • Merge tag 'io_uring-6.9-20240412' of git://git.kernel.dk/linux · c7adbe2e
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
      
       - Fix for sigmask restoring while waiting for events (Alexey)
      
       - Typo fix in comment (Haiyue)
      
       - Fix for a msg_control restore on SEND_ZC retries (Pavel)
      
      * tag 'io_uring-6.9-20240412' of git://git.kernel.dk/linux:
        io-uring: correct typo in comment for IOU_F_TWQ_LAZY_WAKE
        io_uring/net: restore msg_control on sendzc retry
        io_uring: Fix io_cqring_wait() not restoring sigmask on get_timespec64() failure
      c7adbe2e
    • Merge tag 'ceph-for-6.9-rc4' of https://github.com/ceph/ceph-client · 90d3eaaf
      Linus Torvalds authored
      Pull ceph fixes from Ilya Dryomov:
       "Two CephFS fixes marked for stable and a MAINTAINERS update"
      
      * tag 'ceph-for-6.9-rc4' of https://github.com/ceph/ceph-client:
        MAINTAINERS: remove myself as a Reviewer for Ceph
        ceph: switch to use cap_delay_lock for the unlink delay list
        ceph: redirty page before returning AOP_WRITEPAGE_ACTIVATE
      90d3eaaf
    • Kconfig: add some hidden tabs on purpose · d5cf50da
      Linus Torvalds authored
      Commit d96c3600 ("tracing: Fix FTRACE_RECORD_RECURSION_SIZE Kconfig
      entry") removed a hidden tab because it apparently showed breakage in
      some third-party kernel config parsing tool.
      
      It wasn't clear what tool it was, but let's make sure it gets fixed.
      Because if you can't parse tabs as whitespace, you should not be parsing
      the kernel Kconfig files.
      
      In fact, let's make such breakage more obvious than some esoteric ftrace
      record size option.  If you can't parse tabs, you can't have page sizes.
      
      Yes, tab-vs-space confusion is sadly a traditional Unix thing, and
      'make' is famous for being broken in this regard.  But no, that does not
      mean that it's ok.
      
      I'd add more random tabs to our Kconfig files, but I don't want to make
      things uglier than necessary.  But it *might* be necessary if it turns
      out we see more of this kind of silly tooling.
      
      Fixes: d96c3600 ("tracing: Fix FTRACE_RECORD_RECURSION_SIZE Kconfig entry")
      Link: https://lore.kernel.org/lkml/CAHk-=wj-hLLN_t_m5OL4dXLaxvXKy_axuoJYXif7iczbfgAevQ@mail.gmail.com/
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d5cf50da
    • Linus Torvalds's avatar
      Merge tag 'trace-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 5939d451
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Fix the buffer_percent accounting as it is dependent on three
         variables:
      
           1) pages_read - number of subbuffers read
           2) pages_lost - number of subbuffers lost due to overwrite
           3) pages_touched - number of pages that a writer entered
      
         These three counters only increment, and to know how many active
         pages there are on the buffer at any given time, the pages_read and
         pages_lost are subtracted from pages_touched.
      
         But pages_touched was incremented whenever any writer went to the
         next subbuffer, even if it wasn't the only writer doing so. It was
         therefore incremented more often than it should have been, which
         made the count of subbuffers that currently have content too high.
         As a result, waiters that buffer_percent should hold until the ring
         buffer is filled to a given percentage were woken up early.
      
       - Fix warning of unused functions when PERF_EVENTS is not configured in
      
       - Replace bad tab with space in Kconfig for FTRACE_RECORD_RECURSION_SIZE
      
       - Fix to some kerneldoc function comments in eventfs code.
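
The buffer_percent accounting described in the first item can be sketched in a few lines of C. This is a simplified userspace illustration, not the ring buffer code itself: struct rb_counters, rb_dirty_pages() and rb_hit_watermark() are hypothetical names, and only the three counter names come from the pull message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the three monotonically increasing counters. */
struct rb_counters {
	size_t pages_touched;	/* subbuffers that a writer entered */
	size_t pages_read;	/* subbuffers read */
	size_t pages_lost;	/* subbuffers lost due to overwrite */
};

/* Number of subbuffers that currently hold content. */
static size_t rb_dirty_pages(const struct rb_counters *c)
{
	size_t consumed;

	assert(c != NULL);
	consumed = c->pages_read + c->pages_lost;

	/* Clamp at zero so a racy snapshot cannot underflow. */
	return c->pages_touched > consumed ? c->pages_touched - consumed : 0;
}

/* Nonzero once at least full_percent of nr_pages hold content. */
static int rb_hit_watermark(const struct rb_counters *c, size_t nr_pages,
			    unsigned int full_percent)
{
	if (nr_pages == 0)
		return 1;
	return rb_dirty_pages(c) * 100 >= (size_t)full_percent * nr_pages;
}
```

With 10 subbuffers and a 60 percent watermark, five subbuffers with content do not satisfy the check, but a pages_touched value that was over-counted by two does, which is the early wakeup the fix addresses.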
      
      * tag 'trace-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        ring-buffer: Only update pages_touched when a new page is touched
        tracing: hide unused ftrace_event_id_fops
        tracing: Fix FTRACE_RECORD_RECURSION_SIZE Kconfig entry
        eventfs: Fix kernel-doc comments to functions
      5939d451
    • Linus Torvalds's avatar
      Merge tag 'mips-fixes_6.9_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · e00011a1
      Linus Torvalds authored
      Pull MIPS fix from Thomas Bogendoerfer:
       "Fix for syscall_get_nr() to make it work even if tracing is disabled"
      
      * tag 'mips-fixes_6.9_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
        MIPS: scall: Save thread_info.syscall unconditionally on entry
      e00011a1
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-2024-04-12' of https://gitlab.freedesktop.org/drm/kernel · d1c13e80
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Looks like everyone woke up after holidays. This week's pull has a
        bunch of stuff all over: two weeks' worth of amdgpu is a lot of it,
        then i915/xe have a few, a bunch of msm fixes, then some scattered
        driver fixes.
      
        I expect things will settle down for rc5.
      
        client:
         - Protect connector modes with mode_config mutex
      
        ast:
         - Fix soft lockup
      
        host1x:
         - Do not setup DMA for virtual addresses
      
        ivpu:
         - Fix deadlock in context_xa
         - PCI fixes
         - Fixes to error handling
      
        nouveau:
         - gsp: Fix OOB access
         - Fix casting
      
        panfrost:
         - Fix error path in MMU code
      
        qxl:
         - Revert "drm/qxl: simplify qxl_fence_wait"
      
        vmwgfx:
         - Enable DMA for SEV mappings
      
        i915:
         - Couple CDCLK programming fixes
         - HDCP related fix
         - 4 Bigjoiner related fixes
         - Fix for a circular locking around GuC on reset+wedged case
      
        xe:
         - Fix double display mutex initializations
         - Fix u32 -> u64 implicit conversions
         - Fix RING_CONTEXT_CONTROL not marked as masked
      
        msm:
         - DP refcount leak fix on disconnect
         - Add missing newlines to prints in msm_fb and msm_kms
         - fix dpu debugfs entry permissions
         - Fix the interface table for the catalog of X1E80100
         - fix irq message printing
         - Bindings fix to add DP node as child of mdss for mdss node
         - Minor typo fix in DP driver API which handles port status change
         - fix CRASHDUMP_READ()
         - fix HBB (highest bank bit) for a619 to fix UBWC corruption
      
        amdgpu:
         - GPU reset fixes
         - Fix some confusing logging
         - UMSCH fix
         - Aborted suspend fix
         - DCN 3.5 fixes
         - S4 fix
         - MES logging fixes
         - SMU 14 fixes
         - SDMA 4.4.2 fix
         - KASAN fix
         - SMU 13.0.10 fix
         - VCN partition fix
         - GFX11 fixes
         - DWB fixes
         - Plane handling fix
         - FAMS fix
         - DCN 3.1.6 fix
         - VSC SDP fixes
         - OLED panel fix
         - GFX 11.5 fix
      
        amdkfd:
         - GPU reset fixes
         - fix ioctl integer overflow"
      
      * tag 'drm-fixes-2024-04-12' of https://gitlab.freedesktop.org/drm/kernel: (65 commits)
        amdkfd: use calloc instead of kzalloc to avoid integer overflow
        drm/xe: Label RING_CONTEXT_CONTROL as masked
        drm/xe/xe_migrate: Cast to output precision before multiplying operands
        drm/xe/hwmon: Cast result to output precision on left shift of operand
        drm/xe/display: Fix double mutex initialization
        drm/amdgpu: differentiate external rev id for gfx 11.5.0
        drm/amd/display: Adjust dprefclk by down spread percentage.
        drm/amd/display: Set VSC SDP Colorimetry same way for MST and SST
        drm/amd/display: Program VSC SDP colorimetry for all DP sinks >= 1.4
        drm/amd/display: fix disable otg wa logic in DCN316
        drm/amd/display: Do not recursively call manual trigger programming
        drm/amd/display: always reset ODM mode in context when adding first plane
        drm/amdgpu: fix incorrect number of active RBs for gfx11
        drm/amd/display: Return max resolution supported by DWB
        amd/amdkfd: sync all devices to wait all processes being evicted
        drm/amdgpu: clear set_q_mode_offs when VM changed
        drm/amdgpu: Fix VCN allocation in CPX partition
        drm/amd/pm: fix the high voltage issue after unload
        drm/amd/display: Skip on writeback when it's not applicable
        drm/amdgpu: implement IRQ_STATE_ENABLE for SDMA v4.4.2
        ...
      d1c13e80
    • Oleg Nesterov's avatar
      selftests: kselftest: Fix build failure with NOLIBC · 16767502
      Oleg Nesterov authored
      As Mark explains, ksft_min_kernel_version() can't be compiled with
      nolibc because nolibc doesn't implement uname().
      
      Fixes: 6d029c25 ("selftests/timers/posix_timers: Reimplement check_timer_distribution()")
      Reported-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20240412123536.GA32444@redhat.com
      Closes: https://lore.kernel.org/all/f0523b3a-ea08-4615-b0fb-5b504a2d39df@sirena.org.uk/
      16767502
    • Yu Kuai's avatar
      block: fix that blk_time_get_ns() doesn't update time after schedule · 3ec48489
      Yu Kuai authored
      While monitoring the throttle time of IO from iocost, it's found that
      such time is always zero after the io_schedule() from ioc_rqos_throttle,
      for example, with the following debug patch:
      
      +       printk("%s-%d: %s enter %llu\n", current->comm, current->pid, __func__, blk_time_get_ns());
              while (true) {
                      set_current_state(TASK_UNINTERRUPTIBLE);
                      if (wait.committed)
                              break;
                      io_schedule();
              }
      +       printk("%s-%d: %s exit  %llu\n", current->comm, current->pid, __func__, blk_time_get_ns());
      
      It can be observed that blk_time_get_ns() always returns the same time:
      
      [ 1068.096579] fio-1268: ioc_rqos_throttle enter 1067901962288
      [ 1068.272587] fio-1268: ioc_rqos_throttle exit  1067901962288
      [ 1068.274389] fio-1268: ioc_rqos_throttle enter 1067901962288
      [ 1068.472690] fio-1268: ioc_rqos_throttle exit  1067901962288
      [ 1068.474485] fio-1268: ioc_rqos_throttle enter 1067901962288
      [ 1068.672656] fio-1268: ioc_rqos_throttle exit  1067901962288
      [ 1068.674451] fio-1268: ioc_rqos_throttle enter 1067901962288
      [ 1068.872655] fio-1268: ioc_rqos_throttle exit  1067901962288
      
      And I think the root cause is that 'PF_BLOCK_TS' is always cleared
      by blk_flush_plug() before schedule(), hence blk_plug_invalidate_ts()
      will never be called:
      
      blk_time_get_ns
       plug->cur_ktime = ktime_get_ns();
       current->flags |= PF_BLOCK_TS;
      
      io_schedule:
       io_schedule_prepare
        blk_flush_plug
         __blk_flush_plug
          /* the flag is cleared, while time is not */
          current->flags &= ~PF_BLOCK_TS;
       schedule
       sched_update_worker
        /* the flag is not set, hence plug->cur_ktime is not cleared */
        if (tsk->flags & PF_BLOCK_TS)
         blk_plug_invalidate_ts()
      
      blk_time_get_ns
       /* got the time stashed before schedule */
       return plug->cur_ktime;
      
      Fix the problem by clearing cached time in __blk_flush_plug().
      
      Fixes: 06b23f92 ("block: update cached timestamp post schedule/preemption")
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Link: https://lore.kernel.org/r/20240411032349.3051233-2-yukuai1@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3ec48489
    • John Stultz's avatar
      selftests: timers: Fix abs() warning in posix_timers test · ed366de8
      John Stultz authored
      Building with clang results in the following warning:
      
        posix_timers.c:69:6: warning: absolute value function 'abs' given an
            argument of type 'long long' but has parameter of type 'int' which may
            cause truncation of value [-Wabsolute-value]
              if (abs(diff - DELAY * USECS_PER_SEC) > USECS_PER_SEC / 2) {
                  ^
      So switch to using llabs() instead.
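
A minimal sketch of why the switch matters, assuming a 64-bit 'long long' and a 32-bit 'int'. The within_half_second() helper is a hypothetical name for illustration and is not the test's actual code; only llabs() and the USECS_PER_SEC constant name come from the message above.

```c
#include <assert.h>
#include <stdlib.h>

#define USECS_PER_SEC 1000000LL

/*
 * Hypothetical helper: nonzero when the measured interval is within
 * half a second of the expected delay. llabs() takes a 'long long',
 * so the difference is never narrowed to 'int' before the comparison.
 */
static int within_half_second(long long elapsed_usecs, long long delay_secs)
{
	long long diff;

	assert(delay_secs >= 0);
	diff = elapsed_usecs - delay_secs * USECS_PER_SEC;
	return llabs(diff) <= USECS_PER_SEC / 2;
}
```

A difference of exactly 2^32 microseconds is a useful probe: narrowed to a 32-bit int it would typically become 0 and look like a perfect match, whereas the llabs() version rejects it.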
      
      Fixes: 0bc4b0cf ("selftests: add basic posix timers selftests")
      Signed-off-by: John Stultz <jstultz@google.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20240410232637.4135564-3-jstultz@google.com
      ed366de8
    • Nathan Chancellor's avatar
      selftests: kselftest: Mark functions that unconditionally call exit() as __noreturn · f7d5bcd3
      Nathan Chancellor authored
      After commit 6d029c25 ("selftests/timers/posix_timers: Reimplement
      check_timer_distribution()"), clang warns:
      
        tools/testing/selftests/timers/../kselftest.h:398:6: warning: variable 'major' is used uninitialized whenever '||' condition is true [-Wsometimes-uninitialized]
          398 |         if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
              |             ^~~~~~~~~~~~
        tools/testing/selftests/timers/../kselftest.h:401:9: note: uninitialized use occurs here
          401 |         return major > min_major || (major == min_major && minor >= min_minor);
              |                ^~~~~
        tools/testing/selftests/timers/../kselftest.h:398:6: note: remove the '||' if its condition is always false
          398 |         if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
              |             ^~~~~~~~~~~~~~~
        tools/testing/selftests/timers/../kselftest.h:395:20: note: initialize the variable 'major' to silence this warning
          395 |         unsigned int major, minor;
              |                           ^
              |                            = 0
      
      This is a false positive because if uname() fails, ksft_exit_fail_msg()
      will be called, which unconditionally calls exit(), a noreturn function.
      However, clang does not know that ksft_exit_fail_msg() will call exit() at
      the point in the pipeline that the warning is emitted because inlining has
      not occurred, so it assumes control flow will resume normally after
      ksft_exit_fail_msg() is called.
      
      Make it clear to clang that all of the functions that call exit()
      unconditionally in kselftest.h are noreturn transitively by marking them
      explicitly with '__attribute__((__noreturn__))', which clears up the
      warning above and any future warnings that may appear for the same reason.
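
The pattern can be sketched outside of kselftest.h as follows. This is an illustration under assumptions, not the patch itself: fail_msg() and min_version() are hypothetical names, and the release string is passed in as a parameter rather than obtained from uname().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical failure helper. The attribute tells the compiler that
 * control never returns to the caller, so code after the call is not
 * analysed as reachable.
 */
static __attribute__((__noreturn__)) void fail_msg(const char *msg)
{
	fprintf(stderr, "Bail out! %s\n", msg);
	exit(1);
}

/* Nonzero when `release` is at least version min_major.min_minor. */
static int min_version(const char *release, unsigned int min_major,
		       unsigned int min_minor)
{
	unsigned int major, minor;

	assert(release != NULL);

	/* If parsing fails, fail_msg() exits, so major and minor are
	 * always initialised by the time they are read below. */
	if (sscanf(release, "%u.%u.", &major, &minor) != 2)
		fail_msg("Can't parse kernel version");

	return major > min_major || (major == min_major && minor >= min_minor);
}
```

Marking the helper explicitly means the compiler does not have to inline it to learn that the failure branch ends the program.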
      
      Fixes: 6d029c25 ("selftests/timers/posix_timers: Reimplement check_timer_distribution()")
      Reported-by: John Stultz <jstultz@google.com>
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Shuah Khan <skhan@linuxfoundation.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20240411-mark-kselftest-exit-funcs-noreturn-v1-1-b027c948f586@kernel.org
      Closes: https://lore.kernel.org/all/20240410232637.4135564-2-jstultz@google.com/
      f7d5bcd3
    • John Stultz's avatar
      selftests: timers: Fix posix_timers ksft_print_msg() warning · e4a6bcea
      John Stultz authored
      After commit 6d029c25 ("selftests/timers/posix_timers: Reimplement
      check_timer_distribution()") the following warning occurs when building
      with an older gcc:
      
      posix_timers.c:250:2: warning: format not a string literal and no format arguments [-Wformat-security]
        250 |  ksft_print_msg(errmsg);
            |  ^~~~~~~~~~~~~~
      
      Fix this up by changing it to ksft_print_msg("%s", errmsg)
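
A short sketch of the same fix in isolation. The print_msg() and print_literal() helpers are hypothetical and write into a caller-supplied buffer so the result can be inspected; they are not the kselftest functions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical printf-style helper. The format attribute lets the
 * compiler check the format string against the arguments at each call.
 */
static __attribute__((format(printf, 3, 4)))
int print_msg(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}

/*
 * Print a message that is data, not a format string. Passing it
 * through "%s" means any '%' characters in it are copied literally
 * instead of being interpreted as conversions.
 */
static int print_literal(char *buf, size_t len, const char *msg)
{
	assert(buf != NULL && msg != NULL);
	return print_msg(buf, len, "%s", msg);
}
```

Calling print_literal() with a message such as "load at 90%s" copies the text unchanged, whereas using that same string directly as the format would make the helper look for an argument that was never passed.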
      
      Fixes: 6d029c25 ("selftests/timers/posix_timers: Reimplement check_timer_distribution()")
      Signed-off-by: John Stultz <jstultz@google.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Justin Stitt <justinstitt@google.com>
      Acked-by: Shuah Khan <skhan@linuxfoundation.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20240410232637.4135564-1-jstultz@google.com
      e4a6bcea
    • Vasant Hegde's avatar
      iommu/amd: Change log message severity · b8246a2a
      Vasant Hegde authored
      Use consistent log severity (pr_warn) to log all messages in SNP
      enable path.
      
      Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
      Link: https://lore.kernel.org/r/20240410101643.32309-1-vasant.hegde@amd.com
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      b8246a2a