• Michal Hocko's avatar
    hugetlb: do not account hugetlb pages as NR_FILE_PAGES · 4165b9b4
    Michal Hocko authored
    hugetlb pages uses add_to_page_cache to track shared mappings.  This is
    OK from the data structure point of view but it is less so from the
    NR_FILE_PAGES accounting:
    
    	- huge pages are accounted as 4k which is clearly wrong
    	- this counter is used as the amount of the reclaimable page
    	  cache which is incorrect as well because hugetlb pages are
    	  special and not reclaimable
    	- the counter is then exported to userspace via /proc/meminfo
    	  (in Cached:), /proc/vmstat and /proc/zoneinfo as
    	  nr_file_pages which is confusing at least:
    	  Cached:          8883504 kB
    	  HugePages_Free:     8348
    	  ...
    	  Cached:          8916048 kB
    	  HugePages_Free:      156
    	  ...
    	  thats 8192 huge pages allocated which is ~16G accounted as 32M
    
    There are usually not that many huge pages in the system for this to
    make any visible difference e.g.  by fooling __vm_enough_memory or
    zone_pagecache_reclaimable.
    
    Fix this by special casing huge pages in both __delete_from_page_cache
    and __add_to_page_cache_locked.  replace_page_cache_page is currently
    only used by fuse and that shouldn't touch hugetlb pages AFAICS but it
    is more robust to check for special casing there as well.
    
    Hugetlb pages shouldn't get to any other paths where we do accounting:
    	- migration - we have a special handling via
    	  hugetlbfs_migrate_page
    	- shmem - doesn't handle hugetlb pages directly even for
    	  SHM_HUGETLB resp. MAP_HUGETLB
    	- swapcache - hugetlb is not swapable
    
    This has a user visible effect but I believe it is reasonable because the
    previously exported number is simply bogus.
    
    An alternative would be to account hugetlb pages with their real size and
    treat them similar to shmem.  But this has some drawbacks.
    
    First we would have to special case in kernel users of NR_FILE_PAGES and
    considering how hugetlb is special we would have to do it everywhere.  We
    do not want Cached exported by /proc/meminfo to include it because the
    value would be even more misleading.
    
    __vm_enough_memory and zone_pagecache_reclaimable would have to do the
    same thing because those pages are simply not reclaimable.  The correction
    is even not trivial because we would have to consider all active hugetlb
    page sizes properly.  Users of the counter outside of the kernel would
    have to do the same.
    
    So the question is why to account something that needs to be basically
    excluded for each reasonable usage.  This doesn't make much sense to me.
    
    It seems that this has been broken since hugetlb was introduced but I
    haven't checked the whole history.
    
    [akpm@linux-foundation.org: tweak comments]
    Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
    Acked-by: default avatarMel Gorman <mgorman@suse.de>
    Tested-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
    Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
    Reviewed-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    4165b9b4
filemap.c 70.1 KB