    mm: soft-offline: check return value in second __get_any_page() call · cb260e84
    Naoya Horiguchi authored
    commit d96b339f upstream.
    
    I saw the following BUG_ON trigger in a testcase where a process calls
    madvise(MADV_SOFT_OFFLINE) on THPs, while a background process repeatedly
    runs the migratepages command against the first process (ping-ponging
    its pages among different NUMA nodes):
    
       Soft offlining page 0x60000 at 0x700000600000
       __get_any_page: 0x60000 free buddy page
       page:ffffea0001800000 count:0 mapcount:-127 mapping:          (null) index:0x1
       flags: 0x1fffc0000000000()
       page dumped because: VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0)
       ------------[ cut here ]------------
       kernel BUG at /src/linux-dev/include/linux/mm.h:342!
       invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
       Modules linked in: cfg80211 rfkill crc32c_intel serio_raw virtio_balloon i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi
       CPU: 3 PID: 3035 Comm: test_alloc_gene Tainted: G           O    4.4.0-rc8-v4.4-rc8-160107-1501-00000-rc8+ #74
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
       task: ffff88007c63d5c0 ti: ffff88007c210000 task.ti: ffff88007c210000
       RIP: 0010:[<ffffffff8118998c>]  [<ffffffff8118998c>] put_page+0x5c/0x60
       RSP: 0018:ffff88007c213e00  EFLAGS: 00010246
       Call Trace:
         put_hwpoison_page+0x4e/0x80
         soft_offline_page+0x501/0x520
         SyS_madvise+0x6bc/0x6f0
         entry_SYSCALL_64_fastpath+0x12/0x6a
       Code: 8b fc ff ff 5b 5d c3 48 89 df e8 b0 fa ff ff 48 89 df 31 f6 e8 c6 7d ff ff 5b 5d c3 48 c7 c6 08 54 a2 81 48 89 df e8 a4 c5 01 00 <0f> 0b 66 90 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 48 8b 47
       RIP  [<ffffffff8118998c>] put_page+0x5c/0x60
        RSP <ffff88007c213e00>
    
    The root cause is in get_any_page(), which retries to get a refcount on
    the page to be soft-offlined.  Between the two attempts it calls
    put_hwpoison_page(), expecting that the target page is put back on the
    LRU list.  But the page can also be freed to the buddy allocator, in
    which case the second __get_any_page() call takes no reference.  So the
    second check needs to handle that case.
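
The failing path can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: struct sim_page, sim_get_any_page_once(), sim_put_page(), sim_shake_page() and sim_get_any_page() are hypothetical stand-ins assumed to correspond to struct page, __get_any_page(), put_hwpoison_page(), shake_page() and get_any_page(). Refcounting is heavily simplified and hugetlb pages are ignored. With fixed == false the second check looks only at the LRU flag, so a page that was freed to buddy (return value 0, no reference taken) is still put, which sets bug_hit in place of the VM_BUG_ON_PAGE. With fixed == true the second check also requires ret == 1, as the subject line describes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct page. */
struct sim_page {
    int refcount;        /* models page->_count */
    bool on_lru;         /* models PageLRU(page) */
    bool free_on_shake;  /* racing migration frees the page to buddy */
    bool bug_hit;        /* set where VM_BUG_ON_PAGE would fire */
};

/* Models __get_any_page(): returns 1 and takes a reference for an
 * in-use page, returns 0 without a reference for a free buddy page. */
static int sim_get_any_page_once(struct sim_page *p)
{
    if (p->refcount == 0)
        return 0;
    p->refcount++;
    return 1;
}

/* Models put_hwpoison_page(): dropping a reference on a page whose
 * count is already 0 is the bug from the report. */
static void sim_put_page(struct sim_page *p)
{
    if (p->refcount == 0) {
        p->bug_hit = true;
        return;
    }
    p->refcount--;
}

/* Models shake_page(): the page may be freed to buddy instead of
 * being put back on the LRU list. */
static void sim_shake_page(struct sim_page *p)
{
    if (p->free_on_shake) {
        p->refcount = 0;
        p->on_lru = false;
    }
}

/* Models the retry in get_any_page(); 'fixed' selects whether the
 * second check also tests the return value (ret == 1). */
static int sim_get_any_page(struct sim_page *p, bool fixed)
{
    int ret = sim_get_any_page_once(p);

    if (ret == 1 && !p->on_lru) {
        sim_put_page(p);               /* drop ref from first call */
        sim_shake_page(p);             /* try to free it */
        ret = sim_get_any_page_once(p);
        bool holds_ref = fixed ? (ret == 1) : true;
        if (holds_ref && !p->on_lru) {
            sim_put_page(p);           /* drop ref from second call */
            return -EIO;
        }
    }
    return ret;
}
```

For example, a non-LRU page that is freed while being shaken makes the unfixed variant return -EIO with bug_hit set, while the fixed variant returns 0 and leaves bug_hit clear. A non-LRU page that stays in use still gets -EIO from both variants, with its reference correctly dropped.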
    
    Fixes: af8fae7c ("mm/memory-failure.c: clean up soft_offline_page()")
    Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Cc: Sasha Levin <sasha.levin@oracle.com>
    Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Jerome Marchand <jmarchan@redhat.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Steve Capper <steve.capper@linaro.org>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Michal Hocko <mhocko@suse.cz>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Rientjes <rientjes@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>