    mm/page_alloc: prevent merging between isolated and other pageblocks · 7f840533
    Vlastimil Babka authored
    commit d9dddbf5 upstream.
    
    Hanjun Guo has reported that a CMA stress test causes broken accounting of
    CMA and free pages:
    
    > Before the test, I got:
    > -bash-4.3# cat /proc/meminfo | grep Cma
    > CmaTotal:         204800 kB
    > CmaFree:          195044 kB
    >
    >
    > After running the test:
    > -bash-4.3# cat /proc/meminfo | grep Cma
    > CmaTotal:         204800 kB
    > CmaFree:         6602584 kB
    >
    > So the freed CMA memory is more than total..
    >
    > Also the MemFree is more than mem total:
    >
    > -bash-4.3# cat /proc/meminfo
    > MemTotal:       16342016 kB
    > MemFree:        22367268 kB
    > MemAvailable:   22370528 kB
    
    Laura Abbott has confirmed the issue and suspected the freepage accounting
    rewrite around 3.18/4.0 by Joonsoo Kim.  Joonsoo had a theory that this is
    caused by unexpected merging between MIGRATE_ISOLATE and MIGRATE_CMA
    pageblocks:
    
    > CMA isolates MAX_ORDER aligned blocks, but, during the process,
    > partially isolated block exists. If MAX_ORDER is 11 and
    > pageblock_order is 9, two pageblocks make up MAX_ORDER
    > aligned block and I can think following scenario because pageblock
    > (un)isolation would be done one by one.
    >
    > (each character means one pageblock. 'C', 'I' means MIGRATE_CMA,
    > MIGRATE_ISOLATE, respectively.)
    >
    > CC -> IC -> II (Isolation)
    > II -> CI -> CC (Un-isolation)
    >
    > If some pages are freed at this intermediate state such as IC or CI,
    > that page could be merged to the other page that is resident on
    > different type of pageblock and it will cause wrong freepage count.
    
    This was supposed to be prevented by CMA operating on MAX_ORDER blocks,
    but since it doesn't hold the zone->lock between pageblocks, a race
    window does exist.
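
The arithmetic behind the quoted scenario can be sketched in a few lines of C. This is only an illustration: the `toy_` names are hypothetical and do not exist in the kernel, and the constants are the example values from the quote (MAX_ORDER 11, pageblock_order 9), not values read from any configuration.

```c
#include <string.h>

#define TOY_MAX_ORDER       11  /* example value from the quoted scenario */
#define TOY_PAGEBLOCK_ORDER 9   /* example value from the quoted scenario */

/* Number of pageblocks covered by the largest buddy block (order MAX_ORDER - 1). */
static unsigned toy_blocks_per_max_order(void)
{
    return 1u << (TOY_MAX_ORDER - 1 - TOY_PAGEBLOCK_ORDER);
}

/*
 * Write the state of a MAX_ORDER-aligned range after `isolated` pageblocks
 * have been isolated one by one: 'I' = MIGRATE_ISOLATE, 'C' = MIGRATE_CMA.
 * buf must have room for nblocks + 1 characters.
 */
static void toy_state(char *buf, unsigned nblocks, unsigned isolated)
{
    for (unsigned i = 0; i < nblocks; i++)
        buf[i] = (i < isolated) ? 'I' : 'C';
    buf[nblocks] = '\0';
}

/* Count the intermediate states in which both types coexist in the range. */
static unsigned toy_mixed_states(unsigned nblocks)
{
    unsigned mixed = 0;

    for (unsigned isolated = 1; isolated < nblocks; isolated++)
        mixed++;
    return mixed;
}
```

With these values the range holds two pageblocks, so isolating them one at a time passes through exactly one mixed state (IC), which is the window in which a freed page can find a buddy of a different type.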
    
    It's also likely that unexpected merging can occur between
    MIGRATE_ISOLATE and non-CMA pageblocks.  This should be prevented in
    __free_one_page() since commit 3c605096 ("mm/page_alloc: restrict
    max order of merging on isolated pageblock").  However, we only check
    the migratetype of the pageblock where buddy merging has been initiated,
    not the migratetype of the buddy pageblock (or group of pageblocks)
    which can be MIGRATE_ISOLATE.
    
    Joonsoo has suggested checking for buddy migratetype as part of
    page_is_buddy(), but that would add extra checks in allocator hotpath
    and bloat-o-meter has shown significant code bloat (the function is
    inline).
    
    This patch reduces the bloat at some expense of more complicated code.
    The buddy-merging while-loop in __free_one_page() is initially bounded
    to pageblock_border and without any migratetype checks.  The checks are
    placed outside, bumping the max_order if merging is allowed, and
    returning to the while-loop with a statement which can't possibly be
    considered harmful.
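
The control flow described above can be modelled in user space roughly as follows. This is a simplified sketch, not the kernel's __free_one_page(): every `toy_` name is hypothetical, the orders are shrunk (pageblock order 2, max order 4, eight pages), free blocks are tracked in a plain array instead of free lists, and the kernel's additional checks are left out.

```c
#define TOY_PAGEBLOCK_ORDER 2   /* 4 pages per pageblock (assumed toy value) */
#define TOY_MAX_ORDER       4   /* largest block is order 3 = 2 pageblocks */
#define TOY_NR_PAGES        8

enum toy_mt { TOY_CMA, TOY_ISOLATE };

/* Order of the free block headed at each pfn, or -1 if none starts there. */
static int toy_free_order[TOY_NR_PAGES];
static enum toy_mt toy_block_mt[TOY_NR_PAGES >> TOY_PAGEBLOCK_ORDER];

static enum toy_mt toy_mt_of(unsigned pfn)
{
    return toy_block_mt[pfn >> TOY_PAGEBLOCK_ORDER];
}

/* Free a block and merge it with free buddies; returns the final order. */
static int toy_free_one_page(unsigned pfn, int order)
{
    enum toy_mt mt = toy_mt_of(pfn);
    /* Initially merge only up to the pageblock boundary, with no type checks. */
    int max_order = TOY_PAGEBLOCK_ORDER + 1 < TOY_MAX_ORDER ?
                    TOY_PAGEBLOCK_ORDER + 1 : TOY_MAX_ORDER;

continue_merging:
    while (order < max_order - 1) {
        unsigned buddy = pfn ^ (1u << order);

        if (toy_free_order[buddy] != order)
            goto done_merging;
        toy_free_order[buddy] = -1;     /* buddy is absorbed */
        pfn &= buddy;                   /* head of the combined block */
        order++;
    }
    if (max_order < TOY_MAX_ORDER) {
        /* The next merge would cross pageblocks: check both migratetypes. */
        unsigned buddy = pfn ^ (1u << order);
        enum toy_mt buddy_mt = toy_mt_of(buddy);

        if (mt != buddy_mt && (mt == TOY_ISOLATE || buddy_mt == TOY_ISOLATE))
            goto done_merging;
        max_order++;
        goto continue_merging;
    }

done_merging:
    toy_free_order[pfn] = order;
    return order;
}

/* Pageblock 0 has pages 1-3 free, pageblock 1 is one free order-2 block. */
static int toy_demo(enum toy_mt first_block_mt)
{
    for (int i = 0; i < TOY_NR_PAGES; i++)
        toy_free_order[i] = -1;
    toy_block_mt[0] = first_block_mt;
    toy_block_mt[1] = TOY_CMA;
    toy_free_order[1] = 0;
    toy_free_order[2] = 1;
    toy_free_order[4] = 2;
    return toy_free_one_page(0, 0);     /* free the last page of pageblock 0 */
}
```

In this model, toy_demo(TOY_ISOLATE) stops at order 2 because the buddy pageblock has a different type and one of the two is isolated, while toy_demo(TOY_CMA) merges both pageblocks into a single order-3 block. Keeping the type check outside the inner loop is what lets the common path run without any migratetype lookups.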
    
    This fixes the accounting bug and also removes the arguably weird state
    in the original commit 3c605096 where buddies could be left
    unmerged.
    
    Fixes: 3c605096 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
    Link: https://lkml.org/lkml/2016/3/2/280
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Reported-by: Hanjun Guo <guohanjun@huawei.com>
    Tested-by: Hanjun Guo <guohanjun@huawei.com>
    Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Debugged-by: Laura Abbott <labbott@redhat.com>
    Debugged-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
    Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
    Cc: Michal Nazarewicz <mina86@mina86.com>
    Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    [ kamal: backport to 4.2-stable: context ]
    Signed-off-by: Kamal Mostafa <kamal@canonical.com>