    mm: hugetlb_vmemmap: add hugetlb_optimize_vmemmap sysctl · 78f39084
    Muchun Song authored
    Currently we must add hugetlb_free_vmemmap=on (or "off") to the boot
    cmdline and reboot the server to enable or disable the optimization of
    vmemmap pages associated with HugeTLB pages.  However, rebooting usually
    takes a long time.  So add a sysctl to enable or disable the feature at
    runtime without rebooting.  Why do we need this?  There are 3 use cases.
    
    1) The feature of minimizing the overhead of struct page associated with
       each HugeTLB page is disabled by default unless
       "hugetlb_free_vmemmap=on" is passed on the boot cmdline.  When we
       (ByteDance) deliver servers to users who want to enable this
       feature, they have to configure grub (change the boot cmdline) and
       reboot the servers, and rebooting usually takes a long time (we have
       thousands of servers).  It's a very bad experience for the users.
       So we need an approach to enable this feature after boot, without
       another reboot.  This is a use case in our practical environment.
    
    2) In some use cases HugeTLB pages are allocated 'on the fly' instead
       of being pulled from the HugeTLB pool; those workloads would be
       affected with this feature enabled.  They can be identified by the
       fact that they never explicitly allocate huge pages with
       'nr_hugepages' but only set 'nr_overcommit_hugepages' and then let
       the pages be allocated from the buddy allocator at fault time.  We
       can confirm it is a real use case from the commit 099730d6.  For
       those workloads, the page fault time could be ~2x slower than
       before.  We suspect those users want to disable this feature if the
       system has enabled it and they don't think the memory savings are
       enough to make up for the performance drop.
    
    3) A workload that wants vmemmap pages to be optimized and a workload
       that sets 'nr_overcommit_hugepages' and does not want the extra
       overhead at fault time, when the overcommitted pages are allocated
       from the buddy allocator, may be deployed on the same server.  The
       user could enable this feature, set 'nr_hugepages' and
       'nr_overcommit_hugepages', then disable the feature.  In this case,
       the overcommitted HugeTLB pages will not encounter the extra
       overhead at fault time.
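
    The order of operations in use case 3 can be sketched as below.  This
    is only an illustration, not part of the patch: it assumes the new
    knob is exposed as /proc/sys/vm/hugetlb_optimize_vmemmap (the name is
    taken from the commit title, the directory is an assumption) and that
    writing 1 or 0 enables or disables it.  The helper names and the
    sysctl_dir parameter are hypothetical; the parameter exists so the
    sequence can be tried against a scratch directory without root.

```python
from pathlib import Path

# Assumed location of the knobs; only the sysctl name appears in the commit.
VM_SYSCTL_DIR = "/proc/sys/vm"


def write_knob(sysctl_dir, name, value):
    """Write one integer to a sysctl file (needs root on a real system)."""
    Path(sysctl_dir, name).write_text(f"{value}\n")


def setup_mixed_workloads(nr_hugepages, nr_overcommit, sysctl_dir=VM_SYSCTL_DIR):
    """Replay the sequence from use case 3 and return the writes performed."""
    steps = [
        ("hugetlb_optimize_vmemmap", 1),             # enable the optimization (assumed 1 = on)
        ("nr_hugepages", nr_hugepages),              # pool pages are allocated while it is on
        ("nr_overcommit_hugepages", nr_overcommit),  # only raises the overcommit limit
        ("hugetlb_optimize_vmemmap", 0),             # pages allocated at fault time later skip it
    ]
    for name, value in steps:
        write_knob(sysctl_dir, name, value)
    return steps
```

    For example, setup_mixed_workloads(512, 64) run as root would reserve
    512 pool pages with the optimization on and leave it off for any
    overcommitted pages faulted in afterwards.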
    
    Link: https://lkml.kernel.org/r/20220512041142.39501-5-songmuchun@bytedance.com
    Signed-off-by: Muchun Song <songmuchun@bytedance.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Luis Chamberlain <mcgrof@kernel.org>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Iurii Zaikin <yzaikin@google.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Masahiro Yamada <masahiroy@kernel.org>
    Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>