• Michal Hocko's avatar
    hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined · b15c8726
    Michal Hocko authored
    We have received a bug report that an injected MCE about faulty memory
    prevents memory offline to succeed on 4.4 base kernel.  The underlying
    reason was that the HWPoison page has an elevated reference count and the
    migration keeps failing.  There are two problems with that.  First of all
    it is dubious to migrate the poisoned page because we know that accessing
    that memory is possible to fail.  Secondly it doesn't make any sense to
    migrate a potentially broken content and preserve the memory corruption
    over to a new location.
    
    Oscar has found out that 4.4 and the current upstream kernels behave
    slightly differently with his simply testcase
    
    ===
    
    int main(void)
    {
            int ret;
            int i;
            int fd;
            char *array = malloc(4096);
            char *array_locked = malloc(4096);
    
            fd = open("/tmp/data", O_RDONLY);
            read(fd, array, 4095);
    
            for (i = 0; i < 4096; i++)
                    array_locked[i] = 'd';
    
            ret = mlock((void *)PAGE_ALIGN((unsigned long)array_locked), sizeof(array_locked));
            if (ret)
                    perror("mlock");
    
            sleep (20);
    
            ret = madvise((void *)PAGE_ALIGN((unsigned long)array_locked), 4096, MADV_HWPOISON);
            if (ret)
                    perror("madvise");
    
            for (i = 0; i < 4096; i++)
                    array_locked[i] = 'd';
    
            return 0;
    }
    ===
    
    + offline this memory.
    
    In 4.4 kernels he saw the hwpoisoned page to be returned back to the LRU
    list
    kernel:  [<ffffffff81019ac9>] dump_trace+0x59/0x340
    kernel:  [<ffffffff81019e9a>] show_stack_log_lvl+0xea/0x170
    kernel:  [<ffffffff8101ac71>] show_stack+0x21/0x40
    kernel:  [<ffffffff8132bb90>] dump_stack+0x5c/0x7c
    kernel:  [<ffffffff810815a1>] warn_slowpath_common+0x81/0xb0
    kernel:  [<ffffffff811a275c>] __pagevec_lru_add_fn+0x14c/0x160
    kernel:  [<ffffffff811a2eed>] pagevec_lru_move_fn+0xad/0x100
    kernel:  [<ffffffff811a334c>] __lru_cache_add+0x6c/0xb0
    kernel:  [<ffffffff81195236>] add_to_page_cache_lru+0x46/0x70
    kernel:  [<ffffffffa02b4373>] extent_readpages+0xc3/0x1a0 [btrfs]
    kernel:  [<ffffffff811a16d7>] __do_page_cache_readahead+0x177/0x200
    kernel:  [<ffffffff811a18c8>] ondemand_readahead+0x168/0x2a0
    kernel:  [<ffffffff8119673f>] generic_file_read_iter+0x41f/0x660
    kernel:  [<ffffffff8120e50d>] __vfs_read+0xcd/0x140
    kernel:  [<ffffffff8120e9ea>] vfs_read+0x7a/0x120
    kernel:  [<ffffffff8121404b>] kernel_read+0x3b/0x50
    kernel:  [<ffffffff81215c80>] do_execveat_common.isra.29+0x490/0x6f0
    kernel:  [<ffffffff81215f08>] do_execve+0x28/0x30
    kernel:  [<ffffffff81095ddb>] call_usermodehelper_exec_async+0xfb/0x130
    kernel:  [<ffffffff8161c045>] ret_from_fork+0x55/0x80
    
    And that latter confuses the hotremove path because an LRU page is
    attempted to be migrated and that fails due to an elevated reference
    count.  It is quite possible that the reuse of the HWPoisoned page is some
    kind of fixed race condition but I am not really sure about that.
    
    With the upstream kernel the failure is slightly different.  The page
    doesn't seem to have LRU bit set but isolate_movable_page simply fails and
    do_migrate_range simply puts all the isolated pages back to LRU and
    therefore no progress is made and scan_movable_pages finds same set of
    pages over and over again.
    
    Fix both cases by explicitly checking HWPoisoned pages before we even try
    to get reference on the page, try to unmap it if it is still mapped.  As
    explained by Naoya:
    
    : Hwpoison code never unmapped those for no big reason because
    : Ksm pages never dominate memory, so we simply didn't have strong
    : motivation to save the pages.
    
    Also put WARN_ON(PageLRU) in case there is a race and we can hit LRU
    HWPoison pages which shouldn't happen but I couldn't convince myself about
    that.  Naoya has noted the following:
    
    : Theoretically no such gurantee, because try_to_unmap() doesn't have a
    : guarantee of success and then memory_failure() returns immediately
    : when hwpoison_user_mappings fails.
    : Or the following code (comes after hwpoison_user_mappings block) also impli=
    : es
    : that the target page can still have PageLRU flag.
    :
    :         /*
    :          * Torn down by someone else?
    :          */
    :         if (PageLRU(p) && !PageSwapCache(p) && p->mapping =3D=3D NULL) {
    :                 action_result(pfn, MF_MSG_TRUNCATED_LRU, MF_IGNORED);
    :                 res =3D -EBUSY;
    :                 goto out;
    :         }
    :
    : So I think it's OK to keep "if (WARN_ON(PageLRU(page)))" block in
    : current version of your patch.
    
    Link: http://lkml.kernel.org/r/20181206120135.14079-1-mhocko@kernel.orgSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
    Reviewed-by: default avatarOscar Salvador <osalvador@suse.com>
    Debugged-by: default avatarOscar Salvador <osalvador@suse.com>
    Tested-by: default avatarOscar Salvador <osalvador@suse.com>
    Acked-by: default avatarDavid Hildenbrand <david@redhat.com>
    Acked-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    b15c8726
memory_hotplug.c 48.3 KB