• Mel Gorman's avatar
    mm: thp: acquire the anon_vma rwsem for write during split · 062f1af2
    Mel Gorman authored
    Zhouping Liu reported the following against 3.8-rc1 when running a mmap
    testcase from LTP.
    
      mapcount 0 page_mapcount 3
      ------------[ cut here ]------------
      kernel BUG at mm/huge_memory.c:1798!
      invalid opcode: 0000 [#1] SMP
      Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables bnep bluetooth rfkill iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vfat fat dm_mirror dm_region_hash dm_log dm_mod cdc_ether iTCO_wdt i7core_edac coretemp usbnet iTCO_vendor_support mii crc32c_intel edac_core lpc_ich shpchp ioatdma mfd_core i2c_i801 pcspkr serio_raw bnx2 microcode dca vhost_net tun macvtap macvlan kvm_intel kvm uinput mgag200 sr_mod cdrom i2c_algo_bit sd_mod drm_kms_helper crc_t10dif ata_generic pata_acpi ttm ata_piix drm libata i2c_core megaraid_sas
      CPU 1
      Pid: 23217, comm: mmap10 Not tainted 3.8.0-rc1mainline+ #17 IBM IBM System x3400 M3 Server -[7379I08]-/69Y4356
      RIP: __split_huge_page+0x677/0x6d0
      RSP: 0000:ffff88017a03fc08  EFLAGS: 00010293
      RAX: 0000000000000003 RBX: ffff88027a6c22e0 RCX: 00000000000034d2
      RDX: 000000000000748b RSI: 0000000000000046 RDI: 0000000000000246
      RBP: ffff88017a03fcb8 R08: ffffffff819d2440 R09: 000000000000054a
      R10: 0000000000aaaaaa R11: 00000000ffffffff R12: 0000000000000000
      R13: 00007f4f11a00000 R14: ffff880179e96e00 R15: ffffea0005c08000
      FS:  00007f4f11f4a740(0000) GS:ffff88017bc20000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 00000037e9ebb404 CR3: 000000017a436000 CR4: 00000000000007e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process mmap10 (pid: 23217, threadinfo ffff88017a03e000, task ffff880172dd32e0)
      Stack:
       ffff88017a540ec8 ffff88017a03fc20 ffffffff816017b5 ffff88017a03fc88
       ffffffff812fa014 0000000000000000 ffff880279ebd5c0 00000000f4f11a4c
       00000007f4f11f49 00000007f4f11a00 ffff88017a540ef0 ffff88017a540ee8
      Call Trace:
        split_huge_page+0x68/0xb0
        __split_huge_page_pmd+0x134/0x330
        split_huge_page_pmd_mm+0x51/0x60
        split_huge_page_address+0x3b/0x50
        __vma_adjust_trans_huge+0x9c/0xf0
        vma_adjust+0x684/0x750
        __split_vma.isra.28+0x1fa/0x220
        do_munmap+0xf9/0x420
        vm_munmap+0x4e/0x70
        sys_munmap+0x2b/0x40
        system_call_fastpath+0x16/0x1b
    
    Alexander Beregalov and Alex Xu reported similar bugs and Hillf Danton
    identified that commit 5a505085 ("mm/rmap: Convert the struct
    anon_vma::mutex to an rwsem") and commit 4fc3f1d6 ("mm/rmap,
    migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable")
    were likely the problem.  Reverting these commits was reported to solve
    the problem for Alexander.
    
    Despite the reason for these commits, NUMA balancing is not the direct
    source of the problem.  split_huge_page() expects the anon_vma lock to
    be exclusive to serialise the whole split operation.  Ordinarily it is
    expected that the anon_vma lock would only be required when updating the
    avcs but THP also uses the anon_vma rwsem for collapse and split
    operations where the page lock or compound lock cannot be used (as the
    page is changing from base to THP or vice versa) and the page table
    locks are insufficient.
    
    This patch takes the anon_vma lock for write to serialise against parallel
    split_huge_page as THP expected before the conversion to rwsem.
    Reported-and-tested-by: default avatarZhouping Liu <zliu@redhat.com>
    Reported-by: default avatarAlexander Beregalov <a.beregalov@gmail.com>
    Reported-by: default avatarAlex Xu <alex_y_xu@yahoo.ca>
    Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    062f1af2
huge_memory.c 73.7 KB