1. 19 Aug, 2012 24 commits
  2. 09 Aug, 2012 16 commits
    • Ben Hutchings's avatar
      Linux 3.2.27 · 8524c787
      Ben Hutchings authored
      8524c787
    • Tomoya MORINAGA's avatar
      pch_uart: Fix parity setting issue · dbf6df1f
      Tomoya MORINAGA authored
      commit 38bd2a1a upstream.
      
      Parity Setting value is reverse.
      E.G. In case of setting ODD parity, EVEN value is set.
      This patch inverts "if" condition.
      Signed-off-by: default avatarTomoya MORINAGA <tomoya.rohm@gmail.com>
      Acked-by: default avatarAlan Cox <alan@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      dbf6df1f
    • Tomoya MORINAGA's avatar
      pch_uart: Fix rx error interrupt setting issue · 0eaadc03
      Tomoya MORINAGA authored
      commit 9539dfb7 upstream.
      
      Rx Error interrupt(E.G. parity error) is not enabled.
      So, when parity error occurs, error interrupt is not occurred.
      As a result, the received data is not dropped.
      
      This patch adds enable/disable rx error interrupt code.
      Signed-off-by: default avatarTomoya MORINAGA <tomoya.rohm@gmail.com>
      Acked-by: default avatarAlan Cox <alan@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [Backported by Tomoya MORINGA: adjusted context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      0eaadc03
    • Alan Cox's avatar
      pch_uart: Fix missing break for 16 byte fifo · 882cd0e8
      Alan Cox authored
      commit 9bc03743 upstream.
      
      Otherwise we fall back to the wrong value.
      
      Reported-by: <dcb314@hotmail.com>
      Resolves-bug: https://bugzilla.kernel.org/show_bug.cgi?id=44091Signed-off-by: default avatarAlan Cox <alan@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      882cd0e8
    • Eric Dumazet's avatar
      drop_monitor: dont sleep in atomic context · d4a6acbb
      Eric Dumazet authored
      commit bec4596b upstream.
      
      drop_monitor calls several sleeping functions while in atomic context.
      
       BUG: sleeping function called from invalid context at mm/slub.c:943
       in_atomic(): 1, irqs_disabled(): 0, pid: 2103, name: kworker/0:2
       Pid: 2103, comm: kworker/0:2 Not tainted 3.5.0-rc1+ #55
       Call Trace:
        [<ffffffff810697ca>] __might_sleep+0xca/0xf0
        [<ffffffff811345a3>] kmem_cache_alloc_node+0x1b3/0x1c0
        [<ffffffff8105578c>] ? queue_delayed_work_on+0x11c/0x130
        [<ffffffff815343fb>] __alloc_skb+0x4b/0x230
        [<ffffffffa00b0360>] ? reset_per_cpu_data+0x160/0x160 [drop_monitor]
        [<ffffffffa00b022f>] reset_per_cpu_data+0x2f/0x160 [drop_monitor]
        [<ffffffffa00b03ab>] send_dm_alert+0x4b/0xb0 [drop_monitor]
        [<ffffffff810568e0>] process_one_work+0x130/0x4c0
        [<ffffffff81058249>] worker_thread+0x159/0x360
        [<ffffffff810580f0>] ? manage_workers.isra.27+0x240/0x240
        [<ffffffff8105d403>] kthread+0x93/0xa0
        [<ffffffff816be6d4>] kernel_thread_helper+0x4/0x10
        [<ffffffff8105d370>] ? kthread_freezable_should_stop+0x80/0x80
        [<ffffffff816be6d0>] ? gs_change+0xb/0xb
      
      Rework the logic to call the sleeping functions in right context.
      
      Use standard timer/workqueue api to let system chose any cpu to perform
      the allocation and netlink send.
      
      Also avoid a loop if reset_per_cpu_data() cannot allocate memory :
      use mod_timer() to wait 1/10 second before next try.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Reviewed-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      d4a6acbb
    • Neil Horman's avatar
      drop_monitor: prevent init path from scheduling on the wrong cpu · 1b51d69a
      Neil Horman authored
      commit 4fdcfa12 upstream.
      
      I just noticed after some recent updates, that the init path for the drop
      monitor protocol has a minor error.  drop monitor maintains a per cpu structure,
      that gets initalized from a single cpu.  Normally this is fine, as the protocol
      isn't in use yet, but I recently made a change that causes a failed skb
      allocation to reschedule itself .  Given the current code, the implication is
      that this workqueue reschedule will take place on the wrong cpu.  If drop
      monitor is used early during the boot process, its possible that two cpus will
      access a single per-cpu structure in parallel, possibly leading to data
      corruption.
      
      This patch fixes the situation, by storing the cpu number that a given instance
      of this per-cpu data should be accessed from.  In the case of a need for a
      reschedule, the cpu stored in the struct is assigned the rescheule, rather than
      the currently executing cpu
      
      Tested successfully by myself.
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      CC: David Miller <davem@davemloft.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      1b51d69a
    • Neil Horman's avatar
      drop_monitor: Make updating data->skb smp safe · ea39e338
      Neil Horman authored
      commit 3885ca78 upstream.
      
      Eric Dumazet pointed out to me that the drop_monitor protocol has some holes in
      its smp protections.  Specifically, its possible to replace data->skb while its
      being written.  This patch corrects that by making data->skb an rcu protected
      variable.  That will prevent it from being overwritten while a tracepoint is
      modifying it.
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Reported-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      CC: David Miller <davem@davemloft.net>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      ea39e338
    • Neil Horman's avatar
      drop_monitor: fix sleeping in invalid context warning · caaf10b6
      Neil Horman authored
      commit cde2e9a6 upstream.
      
      Eric Dumazet pointed out this warning in the drop_monitor protocol to me:
      
      [   38.352571] BUG: sleeping function called from invalid context at kernel/mutex.c:85
      [   38.352576] in_atomic(): 1, irqs_disabled(): 0, pid: 4415, name: dropwatch
      [   38.352580] Pid: 4415, comm: dropwatch Not tainted 3.4.0-rc2+ #71
      [   38.352582] Call Trace:
      [   38.352592]  [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0
      [   38.352599]  [<ffffffff81063f2a>] __might_sleep+0xca/0xf0
      [   38.352606]  [<ffffffff81655b16>] mutex_lock+0x26/0x50
      [   38.352610]  [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0
      [   38.352616]  [<ffffffff810b72d9>] tracepoint_probe_register+0x29/0x90
      [   38.352621]  [<ffffffff8153a585>] set_all_monitor_traces+0x105/0x170
      [   38.352625]  [<ffffffff8153a8ca>] net_dm_cmd_trace+0x2a/0x40
      [   38.352630]  [<ffffffff8154a81a>] genl_rcv_msg+0x21a/0x2b0
      [   38.352636]  [<ffffffff810f8029>] ? zone_statistics+0x99/0xc0
      [   38.352640]  [<ffffffff8154a600>] ? genl_rcv+0x30/0x30
      [   38.352645]  [<ffffffff8154a059>] netlink_rcv_skb+0xa9/0xd0
      [   38.352649]  [<ffffffff8154a5f0>] genl_rcv+0x20/0x30
      [   38.352653]  [<ffffffff81549a7e>] netlink_unicast+0x1ae/0x1f0
      [   38.352658]  [<ffffffff81549d76>] netlink_sendmsg+0x2b6/0x310
      [   38.352663]  [<ffffffff8150824f>] sock_sendmsg+0x10f/0x130
      [   38.352668]  [<ffffffff8150abe0>] ? move_addr_to_kernel+0x60/0xb0
      [   38.352673]  [<ffffffff81515f04>] ? verify_iovec+0x64/0xe0
      [   38.352677]  [<ffffffff81509c46>] __sys_sendmsg+0x386/0x390
      [   38.352682]  [<ffffffff810ffaf9>] ? handle_mm_fault+0x139/0x210
      [   38.352687]  [<ffffffff8165b5bc>] ? do_page_fault+0x1ec/0x4f0
      [   38.352693]  [<ffffffff8106ba4d>] ? set_next_entity+0x9d/0xb0
      [   38.352699]  [<ffffffff81310b49>] ? tty_ldisc_deref+0x9/0x10
      [   38.352703]  [<ffffffff8106d363>] ? pick_next_task_fair+0x63/0x140
      [   38.352708]  [<ffffffff8150b8d4>] sys_sendmsg+0x44/0x80
      [   38.352713]  [<ffffffff8165f8e2>] system_call_fastpath+0x16/0x1b
      
      It stems from holding a spinlock (trace_state_lock) while attempting to register
      or unregister tracepoint hooks, making in_atomic() true in this context, leading
      to the warning when the tracepoint calls might_sleep() while its taking a mutex.
      Since we only use the trace_state_lock to prevent trace protocol state races, as
      well as hardware stat list updates on an rcu write side, we can just convert the
      spinlock to a mutex to avoid this problem.
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Reported-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      CC: David Miller <davem@davemloft.net>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      caaf10b6
    • Jeongdo Son's avatar
      rt2x00: Add support for BUFFALO WLI-UC-GNM2 to rt2800usb. · 6cf299fa
      Jeongdo Son authored
      commit a769f957 upstream.
      
      This is a RT3070 based device.
      Signed-off-by: default avatarJeongdo Son <sohn9086@gmail.com>
      Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      6cf299fa
    • Jesse Barnes's avatar
      drm/i915: prefer wide & slow to fast & narrow in DP configs · 38aa8510
      Jesse Barnes authored
      commit 2514bc51 upstream.
      
      High frequency link configurations have the potential to cause trouble
      with long and/or cheap cables, so prefer slow and wide configurations
      instead.  This patch has the potential to cause trouble for eDP
      configurations that lie about available lanes, so if we run into that we
      can make it conditional on eDP.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45801
      Tested-by: peter@colberg.org
      Signed-off-by: default avatarJesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      38aa8510
    • Andreas Schwab's avatar
      m68k: Make sys_atomic_cmpxchg_32 work on classic m68k · 56c5dc34
      Andreas Schwab authored
      commit 9e2760d1 upstream.
      
      User space access must always go through uaccess accessors, since on
      classic m68k user space and kernel space are completely separate.
      Signed-off-by: default avatarAndreas Schwab <schwab@linux-m68k.org>
      Tested-by: default avatarThorsten Glaser <tg@debian.org>
      Signed-off-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      56c5dc34
    • Boaz Harrosh's avatar
      ore: Fix out-of-bounds access in _ios_obj() · 922c9791
      Boaz Harrosh authored
      commit 9e62bb44 upstream.
      
      _ios_obj() is accessed by group_index not device_table index.
      
      The oc->comps array is only a group_full of devices at a time
      it is not like ore_comp_dev() which is indexed by a global
      device_table index.
      
      This did not BUG until now because exofs only uses a single
      COMP for all devices. But with other FSs like PanFS this is
      not true.
      
      This bug was only in the write_path, all other users were
      using it correctly
      
      [This is a bug since 3.2 Kernel]
      Signed-off-by: default avatarBoaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      922c9791
    • Takashi Iwai's avatar
      ALSA: hda - Support dock on Lenovo Thinkpad T530 with ALC269VC · 18fbe5a7
      Takashi Iwai authored
      commit 707fba3f upstream.
      
      Lenovo Thinkpad T530 with ALC269VC codec has a dock port but BIOS
      doesn't set up the pins properly.  Enable the pins as well as on
      Thinkpad X230 Tablet.
      Reported-and-tested-by: default avatarMario <anyc@hadiko.de>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      18fbe5a7
    • Daniel Mack's avatar
      ALSA: snd-usb: fix clock source validity index · 768049e3
      Daniel Mack authored
      commit aff252a8 upstream.
      
      uac_clock_source_is_valid() uses the control selector value to access
      the bmControls bitmap of the clock source unit. This is wrong, as
      control selector values start from 1, while the bitmap uses all
      available bits.
      
      In other words, "Clock Validity Control" is stored in D3..2, not D5..4
      of the clock selector unit's bmControls.
      Signed-off-by: default avatarDaniel Mack <zonque@gmail.com>
      Reported-by: default avatarAndreas Koch <andreas@akdesigninc.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      768049e3
    • Mel Gorman's avatar
      mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables · 6f72a41f
      Mel Gorman authored
      commit d833352a upstream.
      
      If a process creates a large hugetlbfs mapping that is eligible for page
      table sharing and forks heavily with children some of whom fault and
      others which destroy the mapping then it is possible for page tables to
      get corrupted.  Some teardowns of the mapping encounter a "bad pmd" and
      output a message to the kernel log.  The final teardown will trigger a
      BUG_ON in mm/filemap.c.
      
      This was reproduced in 3.4 but is known to have existed for a long time
      and goes back at least as far as 2.6.37.  It was probably was introduced
      in 2.6.20 by [39dde65c: shared page table for hugetlb page].  The messages
      look like this;
      
      [  ..........] Lots of bad pmd messages followed by this
      [  127.164256] mm/memory.c:391: bad pmd ffff880412e04fe8(80000003de4000e7).
      [  127.164257] mm/memory.c:391: bad pmd ffff880412e04ff0(80000003de6000e7).
      [  127.164258] mm/memory.c:391: bad pmd ffff880412e04ff8(80000003de0000e7).
      [  127.186778] ------------[ cut here ]------------
      [  127.186781] kernel BUG at mm/filemap.c:134!
      [  127.186782] invalid opcode: 0000 [#1] SMP
      [  127.186783] CPU 7
      [  127.186784] Modules linked in: af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf ext3 jbd dm_mod coretemp crc32c_intel usb_storage ghash_clmulni_intel aesni_intel i2c_i801 r8169 mii uas sr_mod cdrom sg iTCO_wdt iTCO_vendor_support shpchp serio_raw cryptd aes_x86_64 e1000e pci_hotplug dcdbas aes_generic container microcode ext4 mbcache jbd2 crc16 sd_mod crc_t10dif i915 drm_kms_helper drm i2c_algo_bit ehci_hcd ahci libahci usbcore rtc_cmos usb_common button i2c_core intel_agp video intel_gtt fan processor thermal thermal_sys hwmon ata_generic pata_atiixp libata scsi_mod
      [  127.186801]
      [  127.186802] Pid: 9017, comm: hugetlbfs-test Not tainted 3.4.0-autobuild #53 Dell Inc. OptiPlex 990/06D7TR
      [  127.186804] RIP: 0010:[<ffffffff810ed6ce>]  [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
      [  127.186809] RSP: 0000:ffff8804144b5c08  EFLAGS: 00010002
      [  127.186810] RAX: 0000000000000001 RBX: ffffea000a5c9000 RCX: 00000000ffffffc0
      [  127.186811] RDX: 0000000000000000 RSI: 0000000000000009 RDI: ffff88042dfdad00
      [  127.186812] RBP: ffff8804144b5c18 R08: 0000000000000009 R09: 0000000000000003
      [  127.186813] R10: 0000000000000000 R11: 000000000000002d R12: ffff880412ff83d8
      [  127.186814] R13: ffff880412ff83d8 R14: 0000000000000000 R15: ffff880412ff83d8
      [  127.186815] FS:  00007fe18ed2c700(0000) GS:ffff88042dce0000(0000) knlGS:0000000000000000
      [  127.186816] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [  127.186817] CR2: 00007fe340000503 CR3: 0000000417a14000 CR4: 00000000000407e0
      [  127.186818] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  127.186819] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  127.186820] Process hugetlbfs-test (pid: 9017, threadinfo ffff8804144b4000, task ffff880417f803c0)
      [  127.186821] Stack:
      [  127.186822]  ffffea000a5c9000 0000000000000000 ffff8804144b5c48 ffffffff810ed83b
      [  127.186824]  ffff8804144b5c48 000000000000138a 0000000000001387 ffff8804144b5c98
      [  127.186825]  ffff8804144b5d48 ffffffff811bc925 ffff8804144b5cb8 0000000000000000
      [  127.186827] Call Trace:
      [  127.186829]  [<ffffffff810ed83b>] delete_from_page_cache+0x3b/0x80
      [  127.186832]  [<ffffffff811bc925>] truncate_hugepages+0x115/0x220
      [  127.186834]  [<ffffffff811bca43>] hugetlbfs_evict_inode+0x13/0x30
      [  127.186837]  [<ffffffff811655c7>] evict+0xa7/0x1b0
      [  127.186839]  [<ffffffff811657a3>] iput_final+0xd3/0x1f0
      [  127.186840]  [<ffffffff811658f9>] iput+0x39/0x50
      [  127.186842]  [<ffffffff81162708>] d_kill+0xf8/0x130
      [  127.186843]  [<ffffffff81162812>] dput+0xd2/0x1a0
      [  127.186845]  [<ffffffff8114e2d0>] __fput+0x170/0x230
      [  127.186848]  [<ffffffff81236e0e>] ? rb_erase+0xce/0x150
      [  127.186849]  [<ffffffff8114e3ad>] fput+0x1d/0x30
      [  127.186851]  [<ffffffff81117db7>] remove_vma+0x37/0x80
      [  127.186853]  [<ffffffff81119182>] do_munmap+0x2d2/0x360
      [  127.186855]  [<ffffffff811cc639>] sys_shmdt+0xc9/0x170
      [  127.186857]  [<ffffffff81410a39>] system_call_fastpath+0x16/0x1b
      [  127.186858] Code: 0f 1f 44 00 00 48 8b 43 08 48 8b 00 48 8b 40 28 8b b0 40 03 00 00 85 f6 0f 88 df fe ff ff 48 89 df e8 e7 cb 05 00 e9 d2 fe ff ff <0f> 0b 55 83 e2 fd 48 89 e5 48 83 ec 30 48 89 5d d8 4c 89 65 e0
      [  127.186868] RIP  [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
      [  127.186870]  RSP <ffff8804144b5c08>
      [  127.186871] ---[ end trace 7cbac5d1db69f426 ]---
      
      The bug is a race and not always easy to reproduce.  To reproduce it I was
      doing the following on a single socket I7-based machine with 16G of RAM.
      
      $ hugeadm --pool-pages-max DEFAULT:13G
      $ echo $((18*1048576*1024)) > /proc/sys/kernel/shmmax
      $ echo $((18*1048576*1024)) > /proc/sys/kernel/shmall
      $ for i in `seq 1 9000`; do ./hugetlbfs-test; done
      
      On my particular machine, it usually triggers within 10 minutes but
      enabling debug options can change the timing such that it never hits.
      Once the bug is triggered, the machine is in trouble and needs to be
      rebooted.  The machine will respond but processes accessing proc like "ps
      aux" will hang due to the BUG_ON.  shutdown will also hang and needs a
      hard reset or a sysrq-b.
      
      The basic problem is a race between page table sharing and teardown.  For
      the most part page table sharing depends on i_mmap_mutex.  In some cases,
      it is also taking the mm->page_table_lock for the PTE updates but with
      shared page tables, it is the i_mmap_mutex that is more important.
      
      Unfortunately it appears to be also insufficient. Consider the following
      situation
      
      Process A					Process B
      ---------					---------
      hugetlb_fault					shmdt
        						LockWrite(mmap_sem)
          						  do_munmap
      						    unmap_region
      						      unmap_vmas
      						        unmap_single_vma
      						          unmap_hugepage_range
            						            Lock(i_mmap_mutex)
      							    Lock(mm->page_table_lock)
      							    huge_pmd_unshare/unmap tables <--- (1)
      							    Unlock(mm->page_table_lock)
            						            Unlock(i_mmap_mutex)
        huge_pte_alloc				      ...
          Lock(i_mmap_mutex)				      ...
          vma_prio_walk, find svma, spte		      ...
          Lock(mm->page_table_lock)			      ...
          share spte					      ...
          Unlock(mm->page_table_lock)			      ...
          Unlock(i_mmap_mutex)			      ...
        hugetlb_no_page									  <--- (2)
      						      free_pgtables
      						        unlink_file_vma
      							hugetlb_free_pgd_range
      						    remove_vma_list
      
      In this scenario, it is possible for Process A to share page tables with
      Process B that is trying to tear them down.  The i_mmap_mutex on its own
      does not prevent Process A walking Process B's page tables.  At (1) above,
      the page tables are not shared yet so it unmaps the PMDs.  Process A sets
      up page table sharing and at (2) faults a new entry.  Process B then trips
      up on it in free_pgtables.
      
      This patch fixes the problem by adding a new function
      __unmap_hugepage_range_final that is only called when the VMA is about to
      be destroyed.  This function clears VM_MAYSHARE during
      unmap_hugepage_range() under the i_mmap_mutex.  This makes the VMA
      ineligible for sharing and avoids the race.  Superficially this looks like
      it would then be vunerable to truncate and madvise issues but hugetlbfs
      has its own truncate handlers so does not use unmap_mapping_range() and
      does not support madvise(DONTNEED).
      
      This should be treated as a -stable candidate if it is merged.
      
      Test program is as follows. The test case was mostly written by Michal
      Hocko with a few minor changes to reproduce this bug.
      
      ==== CUT HERE ====
      
      static size_t huge_page_size = (2UL << 20);
      static size_t nr_huge_page_A = 512;
      static size_t nr_huge_page_B = 5632;
      
      unsigned int get_random(unsigned int max)
      {
      	struct timeval tv;
      
      	gettimeofday(&tv, NULL);
      	srandom(tv.tv_usec);
      	return random() % max;
      }
      
      static void play(void *addr, size_t size)
      {
      	unsigned char *start = addr,
      		      *end = start + size,
      		      *a;
      	start += get_random(size/2);
      
      	/* we could itterate on huge pages but let's give it more time. */
      	for (a = start; a < end; a += 4096)
      		*a = 0;
      }
      
      int main(int argc, char **argv)
      {
      	key_t key = IPC_PRIVATE;
      	size_t sizeA = nr_huge_page_A * huge_page_size;
      	size_t sizeB = nr_huge_page_B * huge_page_size;
      	int shmidA, shmidB;
      	void *addrA = NULL, *addrB = NULL;
      	int nr_children = 300, n = 0;
      
      	if ((shmidA = shmget(key, sizeA, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
      		perror("shmget:");
      		return 1;
      	}
      
      	if ((addrA = shmat(shmidA, addrA, SHM_R|SHM_W)) == (void *)-1UL) {
      		perror("shmat");
      		return 1;
      	}
      	if ((shmidB = shmget(key, sizeB, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
      		perror("shmget:");
      		return 1;
      	}
      
      	if ((addrB = shmat(shmidB, addrB, SHM_R|SHM_W)) == (void *)-1UL) {
      		perror("shmat");
      		return 1;
      	}
      
      fork_child:
      	switch(fork()) {
      		case 0:
      			switch (n%3) {
      			case 0:
      				play(addrA, sizeA);
      				break;
      			case 1:
      				play(addrB, sizeB);
      				break;
      			case 2:
      				break;
      			}
      			break;
      		case -1:
      			perror("fork:");
      			break;
      		default:
      			if (++n < nr_children)
      				goto fork_child;
      			play(addrA, sizeA);
      			break;
      	}
      	shmdt(addrA);
      	shmdt(addrB);
      	do {
      		wait(NULL);
      	} while (--n > 0);
      	shmctl(shmidA, IPC_RMID, NULL);
      	shmctl(shmidB, IPC_RMID, NULL);
      	return 0;
      }
      
      [akpm@linux-foundation.org: name the declaration's args, fix CONFIG_HUGETLBFS=n build]
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2:
       - Adjust context
       - Drop the mmu_gather * parameters]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      6f72a41f
    • Xiao Guangrong's avatar
      mm: mmu_notifier: fix freed page still mapped in secondary MMU · 5d7a6068
      Xiao Guangrong authored
      commit 3ad3d901 upstream.
      
      mmu_notifier_release() is called when the process is exiting.  It will
      delete all the mmu notifiers.  But at this time the page belonging to the
      process is still present in page tables and is present on the LRU list, so
      this race will happen:
      
            CPU 0                 CPU 1
      mmu_notifier_release:    try_to_unmap:
         hlist_del_init_rcu(&mn->hlist);
                                  ptep_clear_flush_notify:
                                        mmu nofifler not found
                                  free page  !!!!!!
                                  /*
                                   * At the point, the page has been
                                   * freed, but it is still mapped in
                                   * the secondary MMU.
                                   */
      
        mn->ops->release(mn, mm);
      
      Then the box is not stable and sometimes we can get this bug:
      
      [  738.075923] BUG: Bad page state in process migrate-perf  pfn:03bec
      [  738.075931] page:ffffea00000efb00 count:0 mapcount:0 mapping:          (null) index:0x8076
      [  738.075936] page flags: 0x20000000000014(referenced|dirty)
      
      The same issue is present in mmu_notifier_unregister().
      
      We can call ->release before deleting the notifier to ensure the page has
      been unmapped from the secondary MMU before it is freed.
      Signed-off-by: default avatarXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      5d7a6068