    mm: memcg: fix split queue list crash when large folio migration · 9bcef597
    Baolin Wang authored
    When running autonuma with multi-size THP enabled, I encountered the
    following kernel crash:
    
    [  134.290216] list_del corruption. prev->next should be fffff9ad42e1c490,
    but was dead000000000100. (prev=fffff9ad42399890)
    [  134.290877] kernel BUG at lib/list_debug.c:62!
    [  134.291052] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
    [  134.291210] CPU: 56 PID: 8037 Comm: numa01 Kdump: loaded Tainted:
    G            E      6.7.0-rc4+ #20
    [  134.291649] RIP: 0010:__list_del_entry_valid_or_report+0x97/0xb0
    ......
    [  134.294252] Call Trace:
    [  134.294362]  <TASK>
    [  134.294440]  ? die+0x33/0x90
    [  134.294561]  ? do_trap+0xe0/0x110
    ......
    [  134.295681]  ? __list_del_entry_valid_or_report+0x97/0xb0
    [  134.295842]  folio_undo_large_rmappable+0x99/0x100
    [  134.296003]  destroy_large_folio+0x68/0x70
    [  134.296172]  migrate_folio_move+0x12e/0x260
    [  134.296264]  ? __pfx_remove_migration_pte+0x10/0x10
    [  134.296389]  migrate_pages_batch+0x495/0x6b0
    [  134.296523]  migrate_pages+0x1d0/0x500
    [  134.296646]  ? __pfx_alloc_misplaced_dst_folio+0x10/0x10
    [  134.296799]  migrate_misplaced_folio+0x12d/0x2b0
    [  134.296953]  do_numa_page+0x1f4/0x570
    [  134.297121]  __handle_mm_fault+0x2b0/0x6c0
    [  134.297254]  handle_mm_fault+0x107/0x270
    [  134.300897]  do_user_addr_fault+0x167/0x680
    [  134.304561]  exc_page_fault+0x65/0x140
    [  134.307919]  asm_exc_page_fault+0x22/0x30
    
    The crash occurs because commit 85ce2c51 ("memcontrol: only transfer
    the memcg data for migration") removed the charging and uncharging
    operations for folios under migration and clears the memcg data of the
    old folio.
    
    When the old large folio is later released in destroy_large_folio() and
    needs to be removed from the split queue, the wrong split queue
    (pgdat->deferred_split_queue) is looked up because the old folio's memcg
    is now NULL. The list operations are then performed under the wrong
    split queue lock, which results in the list corruption shown above.
    
    After the migration, the old folio is going to be freed, so remove it
    from the split queue earlier, in mem_cgroup_migrate(), before the memcg
    data is cleared. This avoids looking up the wrong split queue.
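
    The ordering problem can be sketched with a small userspace model. This
    is not the kernel code: every type and function name below is
    hypothetical, the list and its lock are reduced to a counter, and the
    only behaviour taken from the description above is that the split queue
    is chosen from the folio's memcg when one is set and from the node
    (pgdat) otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; all names here are illustrative, not kernel APIs. */
struct split_queue {
	int len;	/* stands in for the list and the lock protecting it */
};

struct memcg_model {
	struct split_queue deferred_split_queue;
};

struct node_model {
	struct split_queue deferred_split_queue;
};

struct folio_model {
	struct memcg_model *memcg;	/* NULL once the memcg data is cleared */
	struct node_model *node;
	struct split_queue *queued_on;	/* queue that really holds the folio */
};

/* Choose the queue from the memcg if present, else fall back to the node. */
struct split_queue *pick_split_queue(struct folio_model *f)
{
	return f->memcg ? &f->memcg->deferred_split_queue
			: &f->node->deferred_split_queue;
}

void enqueue(struct folio_model *f)
{
	struct split_queue *q = pick_split_queue(f);

	assert(f->queued_on == NULL);
	q->len++;
	f->queued_on = q;
}

/* Returns false when the lookup yields a queue the folio is not on. */
bool dequeue(struct folio_model *f)
{
	struct split_queue *q = pick_split_queue(f);

	if (!f->queued_on)
		return true;		/* not queued: nothing to do */
	if (q != f->queued_on)
		return false;		/* wrong queue and lock: the crash case */
	q->len--;
	f->queued_on = NULL;
	return true;
}

/* Ordering described as buggy: clear the memcg, dequeue at free time. */
bool migrate_then_free_buggy(struct folio_model *old)
{
	old->memcg = NULL;
	return dequeue(old);
}

/* Ordering described as the fix: dequeue first, then clear the memcg. */
bool migrate_then_free_fixed(struct folio_model *old)
{
	bool ok = dequeue(old);

	old->memcg = NULL;
	return ok && dequeue(old);	/* free-time dequeue is now a no-op */
}
```

    In this model the buggy ordering reports a queue mismatch for a folio
    that was queued while it still had a memcg, and the fixed ordering
    removes it from the queue it was actually on.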
    
    [akpm@linux-foundation.org: fix comment, per Zi Yan]
    Link: https://lkml.kernel.org/r/61273e5e9b490682388377c20f52d19de4a80460.1703054559.git.baolin.wang@linux.alibaba.com
    Fixes: 85ce2c51 ("memcontrol: only transfer the memcg data for migration")
    Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
    Reviewed-by: Nhat Pham <nphamcs@gmail.com>
    Reviewed-by: Yang Shi <shy828301@gmail.com>
    Reviewed-by: Zi Yan <ziy@nvidia.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: "Huang, Ying" <ying.huang@intel.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Shakeel Butt <shakeelb@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>