    x86, numa: Implement pfn -> nid mapping granularity check · 1e01979c
    Tejun Heo authored
    SPARSEMEM w/o VMEMMAP and DISCONTIGMEM, both used only on 32bit, use
    the sections array to map pfn to nid, which limits the granularity of
    the mapping.  If NUMA nodes are laid out such that the mapping cannot
    be accurate, boot fails by triggering BUG_ON() in
    mminit_verify_page_links().
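
To illustrate why section granularity matters, here is a minimal user-space sketch of a section-granular pfn -> nid lookup. It is not the kernel's code: every `sketch_`/`SKETCH_` name is hypothetical, and it assumes 4KiB pages and the 512MiB sections of the 32bit PAE SPARSEMEM case. Two pfns on either side of a 128MiB-aligned node boundary fall into the same section, so they necessarily resolve to the same nid.

```c
#include <stddef.h>

/* Assumed values: 4KiB pages, 512MiB sections (32bit PAE SPARSEMEM). */
#define SKETCH_PAGE_SHIFT        12
#define SKETCH_SECTION_SHIFT     29
#define SKETCH_PFN_SECTION_SHIFT (SKETCH_SECTION_SHIFT - SKETCH_PAGE_SHIFT)
#define SKETCH_NR_SECTIONS       128	/* 64GiB of 512MiB sections */

/* Hypothetical stand-in for the sections array: one nid per section. */
int sketch_section_nid[SKETCH_NR_SECTIONS];

/* Index of the section that contains @pfn. */
unsigned long sketch_pfn_to_section(unsigned long pfn)
{
	return pfn >> SKETCH_PFN_SECTION_SHIFT;
}

/*
 * Section-granular lookup: every pfn in a section gets the same nid.
 * The caller must pass a pfn below 64GiB; there is no bounds check.
 */
int sketch_pfn_to_nid(unsigned long pfn)
{
	return sketch_section_nid[sketch_pfn_to_section(pfn)];
}
```

With a node boundary at pfn 0x28000 (640MiB), pfns 0x27fff and 0x28000 both land in section 1, so one of them is attributed to the wrong node whatever nid that section holds.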
    
    On 32bit, the granularity is 512MiB w/ PAE and SPARSEMEM.  This seems to have been
    granular enough until commit 2706a0bf (x86, NUMA: Enable
    CONFIG_AMD_NUMA on 32bit too).  Apparently, there is a machine which
    aligns NUMA nodes to 128MiB and has only AMD NUMA but not SRAT.  This
    led to the following BUG_ON().
    
     On node 0 totalpages: 2096615
       DMA zone: 32 pages used for memmap
       DMA zone: 0 pages reserved
       DMA zone: 3927 pages, LIFO batch:0
       Normal zone: 1740 pages used for memmap
       Normal zone: 220978 pages, LIFO batch:31
       HighMem zone: 16405 pages used for memmap
       HighMem zone: 1853533 pages, LIFO batch:31
     BUG: Int 6: CR2   (null)
          EDI   (null)  ESI 00000002  EBP 00000002  ESP c1543ecc
          EBX f2400000  EDX 00000006  ECX   (null)  EAX 00000001
          err   (null)  EIP c16209aa   CS 00000060  flg 00010002
     Stack: f2400000 00220000 f7200800 c1620613 00220000 01000000 04400000 00238000
              (null) f7200000 00000002 f7200b58 f7200800 c1620929 000375fe   (null)
            f7200b80 c16395f0 00200a02 f7200a80   (null) 000375fe 00000002   (null)
     Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00181-g2706a0bf #17
     Call Trace:
      [<c136b1e5>] ? early_fault+0x2e/0x2e
      [<c16209aa>] ? mminit_verify_page_links+0x12/0x42
      [<c1620613>] ? memmap_init_zone+0xaf/0x10c
      [<c1620929>] ? free_area_init_node+0x2b9/0x2e3
      [<c1607e99>] ? free_area_init_nodes+0x3f2/0x451
      [<c1601d80>] ? paging_init+0x112/0x118
      [<c15f578d>] ? setup_arch+0x791/0x82f
      [<c15f43d9>] ? start_kernel+0x6a/0x257
    
    This patch implements node_map_pfn_alignment(), which determines the
    maximum internode alignment, and updates numa_register_memblks() to
    reject the NUMA configuration if the node alignment is finer than the
    pfn -> nid mapping granularity of the memory model as determined by
    PAGES_PER_SECTION.
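
The check described above can be sketched in user space as follows. This is an approximation under stated assumptions, not the patch itself: `struct sketch_pfn_range`, `sketch_node_map_pfn_alignment()`, `sketch_alignment_ok()` and `SKETCH_PAGES_PER_SECTION` are hypothetical names, the ranges are assumed to be sorted by start pfn, and the section size is assumed to be 512MiB with 4KiB pages.

```c
#include <stddef.h>

/* Hypothetical stand-in for one registered memory range of a node. */
struct sketch_pfn_range {
	int nid;
	unsigned long start_pfn;
	unsigned long end_pfn;		/* exclusive */
};

/* Assumed: 512MiB sections / 4KiB pages = 2^17 pages per section. */
#define SKETCH_PAGES_PER_SECTION (1UL << 17)

/* Isolate the lowest set bit of @v (0 if @v is 0). */
static unsigned long sketch_lowest_bit(unsigned long v)
{
	return v & (~v + 1);
}

/*
 * Return the coarsest power-of-two alignment, in pages, that still
 * separates every pair of adjacent ranges belonging to different
 * nodes.  Returns 0 if there is no internode boundary.
 */
unsigned long sketch_node_map_pfn_alignment(const struct sketch_pfn_range *r,
					    size_t n)
{
	unsigned long accl_mask = 0, last_end = 0;
	int last_nid = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long start = r[i].start_pfn;
		unsigned long mask;

		if (start && last_nid >= 0 && last_nid != r[i].nid) {
			/*
			 * Start with a mask fine enough to pinpoint the
			 * start pfn, then coarsen it while the previous
			 * range still ends at or below the rounded-down
			 * start.
			 */
			mask = ~(sketch_lowest_bit(start) - 1);
			while (mask && last_end <= (start & (mask << 1)))
				mask <<= 1;
			accl_mask |= mask;	/* accumulate all masks */
		}
		last_nid = r[i].nid;
		last_end = r[i].end_pfn;
	}
	return ~accl_mask + 1;		/* convert mask to page count */
}

/* Nonzero if section-granular pfn -> nid mapping can represent it. */
int sketch_alignment_ok(unsigned long pfn_align)
{
	return !pfn_align || pfn_align >= SKETCH_PAGES_PER_SECTION;
}
```

For two nodes split at pfn 0x28000 (a 128MiB-aligned boundary), the sketch yields an alignment of 0x8000 pages (128MiB), which is below the assumed 512MiB section size, so the configuration would be rejected.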
    
    This makes the problematic machine boot w/ flatmem by rejecting the
    NUMA config and provides protection against crazy NUMA configurations.
    
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Link: http://lkml.kernel.org/r/20110712074534.GB2872@htj.dyndns.org
    LKML-Reference: <20110628174613.GP478@escobedo.osrc.amd.com>
    Reported-and-Tested-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
    Cc: Conny Seidel <conny.seidel@amd.com>
    Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>