    mm/khugepaged: recover from poisoned file-backed memory · 12904d95
    Jiaqi Yan authored
    Make collapse_file roll back when copying pages fails. More concretely:
    - extract copying operations into a separate loop
    - postpone the updates for nr_none until both scanning and copying
      succeeded
    - postpone joining small xarray entries until both scanning and copying
      succeeded
    - postpone the update operations to NR_XXX_THPS until both scanning and
      copying succeeded
    - for non-SHMEM files, roll back filemap_nr_thps_inc if scanning succeeded
      but copying failed
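    The ordering above (scan, then copy in its own loop, then commit all
    accounting only if both succeeded, undoing the one early increment on
    failure) can be illustrated with a small userspace model. This is a
    sketch under stated assumptions, not the kernel's collapse_file(): the
    types, the tiny sizes and every name below (struct subpage,
    copy_subpage_mc, collapse_file_model, ...) are hypothetical, and the
    "poisoned" flag stands in for a machine-check-safe copy reporting an
    uncorrectable error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NR_SUBPAGES 8    /* hypothetical; a real PMD THP spans 512 pages */
#define SUBPAGE_SIZE 16  /* hypothetical; a real page is 4096 bytes */

struct subpage {
    unsigned char data[SUBPAGE_SIZE];
    bool poisoned;       /* models an uncorrectable memory error */
};

struct mapping {
    struct subpage *slots[NR_SUBPAGES]; /* NULL models a hole */
    bool is_shmem;
    bool joined;         /* models joining the small xarray entries */
    long nr_thps;        /* models the count bumped by filemap_nr_thps_inc */
};

struct stats {
    long nr_thps;        /* models NR_SHMEM_THPS / NR_FILE_THPS */
    long nr_none;        /* models accounting for the filled holes */
};

/* Models a machine-check-safe copy that reports failure on poison. */
static bool copy_subpage_mc(unsigned char *dst, const struct subpage *src)
{
    if (src->poisoned)
        return false;
    memcpy(dst, src->data, SUBPAGE_SIZE);
    return true;
}

/* Returns true only if the collapse was committed. */
static bool collapse_file_model(struct mapping *m, struct stats *st,
                                unsigned char huge[NR_SUBPAGES * SUBPAGE_SIZE])
{
    long nr_none = 0;

    assert(m != NULL && st != NULL && huge != NULL);

    /* Scan: count holes, but do not account for them yet. */
    for (int i = 0; i < NR_SUBPAGES; i++)
        if (m->slots[i] == NULL)
            nr_none++;

    /* Non-shmem files bump the mapping's THP count before copying. */
    if (!m->is_shmem)
        m->nr_thps++;

    /* Copy in a separate loop so a failure leaves the file untouched. */
    for (int i = 0; i < NR_SUBPAGES; i++) {
        unsigned char *dst = huge + (size_t)i * SUBPAGE_SIZE;

        if (m->slots[i] == NULL) {
            memset(dst, 0, SUBPAGE_SIZE);    /* holes become zero-filled */
        } else if (!copy_subpage_mc(dst, m->slots[i])) {
            if (!m->is_shmem)
                m->nr_thps--;    /* roll back the early increment */
            return false;        /* nr_none, stats, slots: all unchanged */
        }
    }

    /* Commit: reached only when both scanning and copying succeeded. */
    m->joined = true;
    st->nr_none += nr_none;
    st->nr_thps++;
    return true;
}
```

    The design point the model tries to capture is that every side effect
    except the early per-mapping increment is deferred past the copy loop,
    so a failed copy needs only one undo step.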
    
    Tested manually:
    0. Enable khugepaged on system under test. Mount tmpfs at /mnt/ramdisk.
    1. Start a two-thread application. Each thread allocates a chunk of
       non-huge memory buffer from /mnt/ramdisk.
    2. Pick 4 random buffer addresses (2 in each thread) and inject
       uncorrectable memory errors at their physical addresses.
    3. Signal both threads to make their memory buffer collapsible, i.e.
       calling madvise(MADV_HUGEPAGE).
    4. Wait and then check kernel log: khugepaged is able to recover from
       poisoned pages by skipping them.
    5. Signal both threads to inspect their buffer contents and make sure
       there is no data corruption.
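    A minimal single-threaded userspace sketch of steps 1, 3 and 5 follows,
    assuming the caller supplies a file descriptor (for example, a file
    created under /mnt/ramdisk). The helper names and the byte pattern are
    hypothetical; the two threads and the error injection in step 2 are not
    shown, since injection at physical addresses needs platform support. A
    real run would likely also have to skip the poisoned addresses
    themselves, because reading a poisoned page is expected to raise SIGBUS.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical deterministic pattern so corruption can be detected later. */
static unsigned char pattern_byte(size_t off)
{
    return (unsigned char)(off * 31u + 7u);
}

/* Step 1: size the file, map it shared, and fill it with the pattern. */
static unsigned char *map_file_buffer(int fd, size_t len)
{
    assert(len > 0);
    if (ftruncate(fd, (off_t)len) != 0) {
        perror("ftruncate");
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }
    unsigned char *buf = p;
    for (size_t i = 0; i < len; i++)
        buf[i] = pattern_byte(i);
    return buf;
}

/* Step 3: mark the range so khugepaged may try to collapse it. */
static int make_collapsible(unsigned char *buf, size_t len)
{
#ifdef MADV_HUGEPAGE
    return madvise(buf, len, MADV_HUGEPAGE);
#else
    (void)buf;
    (void)len;
    return -1;           /* THP advice not available on this system */
#endif
}

/* Step 5: return true if every byte still matches the pattern. */
static bool buffer_intact(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != pattern_byte(i))
            return false;
    return true;
}
```

    madvise() here is only advisory: on a kernel without THP support it
    fails, and the buffer is still usable for the integrity check.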
    
    Link: https://lkml.kernel.org/r/20230329151121.949896-4-jiaqiyan@google.com
    Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
    Reviewed-by: Yang Shi <shy828301@gmail.com>
    Acked-by: Hugh Dickins <hughd@google.com>
    Cc: David Stevens <stevensd@chromium.org>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Tong Tiangen <tongtiangen@huawei.com>
    Cc: Tony Luck <tony.luck@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>