    mm/hwpoison: delete all entries before traversal in __folio_free_raw_hwp · 9e130c4b
    Jiaqi Yan authored
    Patch series "Improve hugetlbfs read on HWPOISON hugepages", v4.
    
    Today when hardware memory is corrupted in a hugetlb hugepage, the kernel
    leaves the hugepage in pagecache [1]; otherwise future mmap or read would
    be subject to silent data corruption.  This is implemented by returning -EIO
    from hugetlbfs_read_iter immediately if the hugepage has HWPOISON flag set.
    
    Since memory_failure already tracks the raw HWPOISON subpages in a
    hugepage, a natural improvement is possible: if userspace only asks for
    healthy subpages in the pagecache, kernel can return these data.
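
    As a rough illustration of the idea (not the code from this series), the
    userspace sketch below computes how many of the requested bytes can be
    returned before the first poisoned subpage is reached.  The helper name
    healthy_prefix_bytes, the poisoned[] array, and the 4 KiB SUBPAGE_SIZE
    are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SUBPAGE_SIZE 4096u /* assumed base page size, for illustration only */

/*
 * Hypothetical helper: return how many of the requested bytes, starting at
 * offset within the hugepage, can be read before reaching the first
 * poisoned subpage. poisoned[i] is true if subpage i is a raw HWPOISON page.
 */
static size_t healthy_prefix_bytes(const bool *poisoned, size_t nr_subpages,
                                   size_t offset, size_t bytes)
{
    size_t safe = 0;
    size_t idx = offset / SUBPAGE_SIZE;     /* subpage holding the offset */
    size_t in_page = offset % SUBPAGE_SIZE; /* offset inside that subpage */

    while (safe < bytes && idx < nr_subpages) {
        if (poisoned[idx])
            break; /* stop at the first poisoned subpage */

        size_t chunk = SUBPAGE_SIZE - in_page;
        if (chunk > bytes - safe)
            chunk = bytes - safe;

        safe += chunk;
        in_page = 0; /* later subpages are read from their start */
        idx++;
    }
    return safe;
}
```

    With four subpages of which only the third is poisoned, a 16384-byte
    request at offset 0 yields 8192 healthy bytes, while a request that
    starts inside the poisoned subpage yields 0.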
    
    This patchset implements this improvement.  It consists of three parts.
    The 1st commit exports the functionality to tell if a subpage inside a
    hugetlb hugepage is a raw HWPOISON page.  The 2nd commit teaches
    hugetlbfs_read_iter to return as many healthy bytes as possible.  The 3rd
    commit properly tests this new feature.
    
    [1] commit 8625147c ("hugetlbfs: don't delete error page from pagecache")
    
    
    This patch (of 4):
    
    Traversal on llist (e.g.  llist_for_each_safe) is only safe AFTER entries
    are deleted from the llist.  Correct the way __folio_free_raw_hwp deletes
    and frees raw_hwp_page entries in raw_hwp_list: first llist_del_all, then
    kfree within llist_for_each_safe.
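
    The ordering described above can be sketched in userspace C11 as follows.
    This is only an approximation of the pattern: struct hwp_node, struct
    hwp_list, and the sketch_* functions are hypothetical stand-ins written
    with <stdatomic.h>, not the kernel's llist API or the actual
    __folio_free_raw_hwp code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a raw_hwp_page entry on a lock-free list. */
struct hwp_node {
    struct hwp_node *next;
    int id; /* illustrative payload */
};

/* Hypothetical stand-in for the list head. */
struct hwp_list {
    _Atomic(struct hwp_node *) first;
};

/* Lock-free push, similar in spirit to llist_add(). */
static void sketch_add(struct hwp_list *head, struct hwp_node *n)
{
    struct hwp_node *old = atomic_load(&head->first);

    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak(&head->first, &old, n));
}

/* Atomically detach the whole chain, similar in spirit to llist_del_all(). */
static struct hwp_node *sketch_del_all(struct hwp_list *head)
{
    return atomic_exchange(&head->first, (struct hwp_node *)NULL);
}

/* Detach first, then walk and free the chain that is now private. */
static size_t sketch_free_all(struct hwp_list *head)
{
    struct hwp_node *n = sketch_del_all(head);
    size_t freed = 0;

    while (n) {
        struct hwp_node *next = n->next; /* save next before freeing n */

        free(n);
        n = next;
        freed++;
    }
    return freed;
}

/* Small driver: push count nodes, free them all, return the number freed. */
static size_t sketch_demo(int count)
{
    struct hwp_list head;

    atomic_init(&head.first, (struct hwp_node *)NULL);
    for (int i = 0; i < count; i++) {
        struct hwp_node *n = malloc(sizeof(*n));

        if (!n)
            break;
        n->id = i;
        sketch_add(&head, n);
    }
    return sketch_free_all(&head);
}
```

    Because the chain is detached with a single atomic exchange before the
    walk starts, concurrent adders can only affect the (now empty) head and
    never the entries being freed.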
    
    As of today, concurrent adding, deleting, and traversal on raw_hwp_list
    from hugetlb.c and/or memory-failure.c are fine with each other.  Note
    this is guaranteed partly by the lock-free nature of llist, and partly by
    holding hugetlb_lock and/or mf_mutex.  For example, as llist_del_all is
    lock-free with itself, folio_clear_hugetlb_hwpoison()s from
    __update_and_free_hugetlb_folio and memory_failure won't need explicit
    locking when freeing the raw_hwp_list.  New code that manipulates
    raw_hwp_list must be careful to ensure the concurrency correctness.
    
    Link: https://lkml.kernel.org/r/20230713001833.3778937-1-jiaqiyan@google.com
    Link: https://lkml.kernel.org/r/20230713001833.3778937-2-jiaqiyan@google.com
    Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
    Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
    Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: James Houghton <jthoughton@google.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Yang Shi <shy828301@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>