    mm/memory-failure.c: transfer page count from head page to tail page after split thp · a3e0f9e4
    Naoya Horiguchi authored
    Memory failures on thp tail pages cause a kernel panic like the one below:
    
       mce: [Hardware Error]: Machine check events logged
       MCE exception done on CPU 7
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
       IP: [<ffffffff811b7cd1>] dequeue_hwpoisoned_huge_page+0x131/0x1e0
       PGD bae42067 PUD ba47d067 PMD 0
       Oops: 0000 [#1] SMP
      ...
       CPU: 7 PID: 128 Comm: kworker/7:2 Tainted: G   M       O 3.13.0-rc4-131217-1558-00003-g83b7df08e462 #25
      ...
       Call Trace:
         me_huge_page+0x3e/0x50
         memory_failure+0x4bb/0xc20
         mce_process_work+0x3e/0x70
         process_one_work+0x171/0x420
         worker_thread+0x11b/0x3a0
         ? manage_workers.isra.25+0x2b0/0x2b0
         kthread+0xe4/0x100
         ? kthread_create_on_node+0x190/0x190
         ret_from_fork+0x7c/0xb0
         ? kthread_create_on_node+0x190/0x190
      ...
       RIP   dequeue_hwpoisoned_huge_page+0x131/0x1e0
       CR2: 0000000000000058
    
    The sequence of events leading to this panic is as follows:
     - When we have a memory error on a thp tail page, the memory error
       handler grabs a refcount of the head page to keep the thp from
       going away under us.
     - Before unmapping the error page from processes, we split the thp.
       The split does not change the refcounts of the head and tail pages.
     - Then we call try_to_unmap() on the error page (which was a tail
       page before the split). Because we did not pin the error page itself
       to handle the memory error, the error page is freed and removed
       from the LRU list.
     - The error page is no longer on the LRU list, so the first page state
       check returns "unknown page," and we move on to the second check,
       which uses the saved page flags.
     - The saved page flags have PG_tail set, so the second page state check
       returns "hugepage."
     - We call me_huge_page() for the freed error page, and we hit the above
       panic.
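
    The two-pass page state check described above can be sketched as a
    small user-space toy model. This is only an illustration, not the
    code in mm/memory-failure.c: the flag bits, the table, and every
    toy_* name are hypothetical. It shows how a page whose current flags
    match nothing falls through to a lookup on the flags saved before the
    split, where a stale tail flag classifies it as a hugepage.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits for this toy model; not the kernel's real values. */
enum { TOY_PG_LRU = 1u << 0, TOY_PG_TAIL = 1u << 1 };

struct toy_state {
    unsigned mask;      /* which flag bits to look at */
    unsigned res;       /* required value of those bits */
    const char *msg;    /* resulting page state */
};

static const struct toy_state toy_states[] = {
    { TOY_PG_TAIL, TOY_PG_TAIL, "hugepage" },
    { TOY_PG_LRU,  TOY_PG_LRU,  "LRU page" },
    { 0,           0,           "unknown page" },  /* catch-all, must be last */
};

/* Return the first table entry whose masked bits match the given flags. */
const char *toy_lookup(unsigned flags)
{
    for (size_t i = 0; ; i++)
        if ((flags & toy_states[i].mask) == toy_states[i].res)
            return toy_states[i].msg;
}

/*
 * First check uses the page's current flags. Only if that ends in the
 * catch-all entry is the check repeated with the flags saved earlier.
 */
const char *toy_classify(unsigned current_flags, unsigned saved_flags)
{
    const char *msg = toy_lookup(current_flags);

    if (strcmp(msg, "unknown page") == 0)
        msg = toy_lookup(saved_flags);
    return msg;
}
```

    In this model, toy_classify(0, TOY_PG_TAIL) stands for the freed error
    page: no current flags, but a saved tail flag, so the second lookup
    decides the result.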
    
    The root cause is that we didn't move the refcount from the head page to
    the tail page after splitting the thp.  This patch does so.
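
    As a rough illustration of the idea (a user-space toy model under
    simplifying assumptions, not the actual change to
    mm/memory-failure.c), the pin held on the former head page can be
    moved to the error page once the split is done. The struct and all
    toy_* helpers are hypothetical stand-ins, and the refcount values
    used with them are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page holding only a reference count; not the kernel's struct page. */
struct toy_page {
    int refcount;
};

/* Take one reference on the page. */
void toy_get_page(struct toy_page *p)
{
    p->refcount++;
}

/* Drop one reference; returns true if that was the last one (page freed). */
bool toy_put_page(struct toy_page *p)
{
    assert(p->refcount > 0);
    return --p->refcount == 0;
}

/*
 * After the split, move the handler's pin from the former head page to
 * the error page. Nothing to do if the error page was the head itself.
 * The new reference is taken before the old one is dropped.
 */
void toy_transfer_pin(struct toy_page *head, struct toy_page *err)
{
    if (head != err) {
        toy_get_page(err);
        toy_put_page(head);
    }
}
```

    With the pin transferred, a later toy_put_page() on the error page
    (standing in for the unmap dropping its reference) no longer frees
    it, so the handler keeps working on a live page.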
    
    This panic was introduced by commit 524fca1e ("HWPOISON: fix
    misjudgement of page_action() for errors on mlocked pages").  Note that we
    did have the same refcount problem before this commit, but it was just
    ignored because we had only the first page state check, which returned
    "unknown page."  The commit changed the refcount problem from "doesn't
    work" to "kernel panic."
    Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
    Cc: Andi Kleen <andi@firstfloor.org>
    Cc: <stable@vger.kernel.org>	[3.9+]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>