    mm: fix race by making init_zero_pfn() early_initcall · e720e7d0
    Ilya Lipnitskiy authored
    There are code paths that rely on zero_pfn being fully initialized
    before core_initcall.  For example, wq_sysfs_init() is a core_initcall
    function that eventually results in a call to kernel_execve, which
    causes a page fault with a subsequent mmput.  If zero_pfn is not
    initialized by then, the RSS counters may not be cleaned up properly,
    resulting in the error:
    
      BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:1
    
    Here is an analysis of the race as seen on a MIPS device. On this
    particular MT7621 device (Ubiquiti ER-X), zero_pfn is PFN 0 until
    initialized, at which point it becomes PFN 5120:
    
      1. wq_sysfs_init calls into kobject_uevent_env at core_initcall:
           kobject_uevent_env+0x7e4/0x7ec
           kset_register+0x68/0x88
           bus_register+0xdc/0x34c
           subsys_virtual_register+0x34/0x78
           wq_sysfs_init+0x1c/0x4c
           do_one_initcall+0x50/0x1a8
           kernel_init_freeable+0x230/0x2c8
           kernel_init+0x10/0x100
           ret_from_kernel_thread+0x14/0x1c
    
      2. kobject_uevent_env() calls call_usermodehelper_exec() which executes
         kernel_execve asynchronously.
    
      3. Memory allocations in kernel_execve cause a page fault, bumping the
         MM_ANONPAGES RSS counter:
           add_mm_counter_fast+0xb4/0xc0
           handle_mm_fault+0x6e4/0xea0
           __get_user_pages.part.78+0x190/0x37c
           __get_user_pages_remote+0x128/0x360
           get_arg_page+0x34/0xa0
           copy_string_kernel+0x194/0x2a4
           kernel_execve+0x11c/0x298
           call_usermodehelper_exec_async+0x114/0x194
    
      4. If zero_pfn has not been initialized yet, zap_pte_range does not
         decrement the MM_ANONPAGES RSS counter, and the BUG message is
         triggered shortly afterwards when __mmdrop checks the RSS counters:
           __mmdrop+0x98/0x1d0
           free_bprm+0x44/0x118
           kernel_execve+0x160/0x1d8
           call_usermodehelper_exec_async+0x114/0x194
           ret_from_kernel_thread+0x14/0x1c
    
    To avoid races such as the one described above, run init_zero_pfn()
    at early_initcall level.  Depending on the architecture, ZERO_PAGE is
    either constant or gets initialized even earlier, at paging_init, so
    there is no issue with initializing zero_pfn earlier.
    
    Link: https://lkml.kernel.org/r/CALCv0x2YqOXEAy2Q=hafjhHCtTHVodChv1qpM=niAXOpqEbt7w@mail.gmail.com
    
    Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: stable@vger.kernel.org
    Tested-by: 周琰杰 (Zhou Yanjie) <zhouyanjie@wanyeetech.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>