1. 09 Aug, 2012 40 commits
    • drop_monitor: prevent init path from scheduling on the wrong cpu · 1b51d69a
      Neil Horman authored
      commit 4fdcfa12 upstream.
      
      I just noticed, after some recent updates, that the init path for the drop
      monitor protocol has a minor error.  drop monitor maintains a per-cpu structure
      that gets initialized from a single cpu.  Normally this is fine, as the protocol
      isn't in use yet, but I recently made a change that causes a failed skb
      allocation to reschedule itself.  Given the current code, the implication is
      that this workqueue reschedule will take place on the wrong cpu.  If drop
      monitor is used early during the boot process, it's possible that two cpus will
      access a single per-cpu structure in parallel, possibly leading to data
      corruption.
      
      This patch fixes the situation by storing the cpu number that a given instance
      of this per-cpu data should be accessed from.  If a reschedule is needed, it is
      assigned to the cpu stored in the struct rather than the currently executing
      cpu.
      
      Tested successfully by myself.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: David Miller <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
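The idea in the drop_monitor fix above can be modelled in user space. This is only a sketch under stated assumptions: `struct dm_cpu_data`, `dm_init_all()`, `dm_reschedule_target()` and `NR_CPUS_MODEL` are hypothetical names for illustration, not the kernel's own, and no real workqueue is involved. It shows that once each per-cpu instance records its owner, a reschedule can be directed at that owner regardless of which cpu is executing.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4 /* hypothetical CPU count for this model */

/* Hypothetical stand-in for the per-cpu drop monitor data. */
struct dm_cpu_data {
	int cpu; /* CPU that owns this instance */
};

static struct dm_cpu_data dm_data[NR_CPUS_MODEL];

/* Runs once, on a single CPU, and records the owner of every instance. */
static void dm_init_all(void)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		dm_data[cpu].cpu = cpu;
}

/*
 * Pick the CPU for a reschedule. The executing CPU is accepted only to
 * show that it is deliberately ignored in favor of the stored owner.
 */
static int dm_reschedule_target(const struct dm_cpu_data *data, int executing_cpu)
{
	(void)executing_cpu;
	assert(data != NULL);
	return data->cpu;
}
```

For example, after `dm_init_all()` has run on cpu 0, `dm_reschedule_target(&dm_data[3], 0)` still yields 3.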
    • drop_monitor: Make updating data->skb smp safe · ea39e338
      Neil Horman authored
      commit 3885ca78 upstream.
      
      Eric Dumazet pointed out to me that the drop_monitor protocol has some holes in
      its smp protections.  Specifically, it's possible to replace data->skb while it's
      being written.  This patch corrects that by making data->skb an rcu protected
      variable.  That will prevent it from being overwritten while a tracepoint is
      modifying it.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: David Miller <davem@davemloft.net>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • drop_monitor: fix sleeping in invalid context warning · caaf10b6
      Neil Horman authored
      commit cde2e9a6 upstream.
      
      Eric Dumazet pointed out this warning in the drop_monitor protocol to me:
      
      [   38.352571] BUG: sleeping function called from invalid context at kernel/mutex.c:85
      [   38.352576] in_atomic(): 1, irqs_disabled(): 0, pid: 4415, name: dropwatch
      [   38.352580] Pid: 4415, comm: dropwatch Not tainted 3.4.0-rc2+ #71
      [   38.352582] Call Trace:
      [   38.352592]  [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0
      [   38.352599]  [<ffffffff81063f2a>] __might_sleep+0xca/0xf0
      [   38.352606]  [<ffffffff81655b16>] mutex_lock+0x26/0x50
      [   38.352610]  [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0
      [   38.352616]  [<ffffffff810b72d9>] tracepoint_probe_register+0x29/0x90
      [   38.352621]  [<ffffffff8153a585>] set_all_monitor_traces+0x105/0x170
      [   38.352625]  [<ffffffff8153a8ca>] net_dm_cmd_trace+0x2a/0x40
      [   38.352630]  [<ffffffff8154a81a>] genl_rcv_msg+0x21a/0x2b0
      [   38.352636]  [<ffffffff810f8029>] ? zone_statistics+0x99/0xc0
      [   38.352640]  [<ffffffff8154a600>] ? genl_rcv+0x30/0x30
      [   38.352645]  [<ffffffff8154a059>] netlink_rcv_skb+0xa9/0xd0
      [   38.352649]  [<ffffffff8154a5f0>] genl_rcv+0x20/0x30
      [   38.352653]  [<ffffffff81549a7e>] netlink_unicast+0x1ae/0x1f0
      [   38.352658]  [<ffffffff81549d76>] netlink_sendmsg+0x2b6/0x310
      [   38.352663]  [<ffffffff8150824f>] sock_sendmsg+0x10f/0x130
      [   38.352668]  [<ffffffff8150abe0>] ? move_addr_to_kernel+0x60/0xb0
      [   38.352673]  [<ffffffff81515f04>] ? verify_iovec+0x64/0xe0
      [   38.352677]  [<ffffffff81509c46>] __sys_sendmsg+0x386/0x390
      [   38.352682]  [<ffffffff810ffaf9>] ? handle_mm_fault+0x139/0x210
      [   38.352687]  [<ffffffff8165b5bc>] ? do_page_fault+0x1ec/0x4f0
      [   38.352693]  [<ffffffff8106ba4d>] ? set_next_entity+0x9d/0xb0
      [   38.352699]  [<ffffffff81310b49>] ? tty_ldisc_deref+0x9/0x10
      [   38.352703]  [<ffffffff8106d363>] ? pick_next_task_fair+0x63/0x140
      [   38.352708]  [<ffffffff8150b8d4>] sys_sendmsg+0x44/0x80
      [   38.352713]  [<ffffffff8165f8e2>] system_call_fastpath+0x16/0x1b
      
      It stems from holding a spinlock (trace_state_lock) while attempting to register
      or unregister tracepoint hooks, making in_atomic() true in this context, leading
      to the warning when the tracepoint calls might_sleep() while it's taking a mutex.
      Since we only use the trace_state_lock to prevent trace protocol state races, as
      well as hardware stat list updates on an rcu write side, we can just convert the
      spinlock to a mutex to avoid this problem.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: David Miller <davem@davemloft.net>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • rt2x00: Add support for BUFFALO WLI-UC-GNM2 to rt2800usb. · 6cf299fa
      Jeongdo Son authored
      commit a769f957 upstream.
      
      This is a RT3070 based device.
      Signed-off-by: Jeongdo Son <sohn9086@gmail.com>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • drm/i915: prefer wide & slow to fast & narrow in DP configs · 38aa8510
      Jesse Barnes authored
      commit 2514bc51 upstream.
      
      High frequency link configurations have the potential to cause trouble
      with long and/or cheap cables, so prefer slow and wide configurations
      instead.  This patch has the potential to cause trouble for eDP
      configurations that lie about available lanes, so if we run into that we
      can make it conditional on eDP.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45801
      Tested-by: peter@colberg.org
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
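One way to express the "wide and slow" preference from the i915 commit above is through the order in which candidate configurations are searched. The sketch below is an illustration under assumptions, not the driver's code: `struct dp_link_config`, `pick_wide_and_slow()` and the unit-less rate values are hypothetical, and coding overhead is ignored. With the link rate as the outer loop, every lane count is tried at the slow rate before any fast configuration is considered.

```c
#include <stddef.h>

struct dp_link_config {
	int clock;      /* per-lane link rate, arbitrary units */
	int lane_count; /* 1, 2 or 4 */
};

/* Two per-lane rates, slowest first (hypothetical unit-less values). */
static const int link_clocks[] = { 162, 270 };
static const int lane_counts[] = { 1, 2, 4 };

/*
 * Return 1 and fill *out with the first configuration whose total
 * capacity (clock * lanes) covers `required`; return 0 if none does.
 * The clock is the outer loop, so every lane count is tried at the
 * slow rate before any configuration at the fast rate.
 */
static int pick_wide_and_slow(int required, struct dp_link_config *out)
{
	for (size_t c = 0; c < sizeof(link_clocks) / sizeof(link_clocks[0]); c++) {
		for (size_t l = 0; l < sizeof(lane_counts) / sizeof(lane_counts[0]); l++) {
			if (link_clocks[c] * lane_counts[l] >= required) {
				out->clock = link_clocks[c];
				out->lane_count = lane_counts[l];
				return 1;
			}
		}
	}
	return 0;
}
```

With a requirement of 250 units, this search settles on two lanes at the slow rate, whereas a search with lane count as the outer loop would have picked one lane at the fast rate.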
    • m68k: Make sys_atomic_cmpxchg_32 work on classic m68k · 56c5dc34
      Andreas Schwab authored
      commit 9e2760d1 upstream.
      
      User space access must always go through uaccess accessors, since on
      classic m68k user space and kernel space are completely separate.
      Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
      Tested-by: Thorsten Glaser <tg@debian.org>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ore: Fix out-of-bounds access in _ios_obj() · 922c9791
      Boaz Harrosh authored
      commit 9e62bb44 upstream.
      
      _ios_obj() is accessed by group_index, not by device_table index.
      
      The oc->comps array only holds one group's worth of devices at a time;
      it is not like ore_comp_dev(), which is indexed by a global
      device_table index.
      
      This did not BUG until now because exofs only uses a single
      COMP for all devices. But with other FSs like PanFS this is
      not true.
      
      This bug was only in the write_path; all other users were
      using it correctly.
      
      [This is a bug since 3.2 Kernel]
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ALSA: hda - Support dock on Lenovo Thinkpad T530 with ALC269VC · 18fbe5a7
      Takashi Iwai authored
      commit 707fba3f upstream.
      
      Lenovo Thinkpad T530 with ALC269VC codec has a dock port but BIOS
      doesn't set up the pins properly.  Enable the pins as well as on
      Thinkpad X230 Tablet.
      Reported-and-tested-by: Mario <anyc@hadiko.de>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ALSA: snd-usb: fix clock source validity index · 768049e3
      Daniel Mack authored
      commit aff252a8 upstream.
      
      uac_clock_source_is_valid() uses the control selector value to access
      the bmControls bitmap of the clock source unit. This is wrong, as
      control selector values start from 1, while the bitmap uses all
      available bits.
      
      In other words, "Clock Validity Control" is stored in D3..2, not D5..4
      of the clock selector unit's bmControls.
      Signed-off-by: Daniel Mack <zonque@gmail.com>
      Reported-by: Andreas Koch <andreas@akdesigninc.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
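The indexing issue in the snd-usb commit above comes down to a shift amount. The following is a minimal sketch, assuming each control occupies two bits of bmControls with selector 1 at D1..0, and assuming the Clock Validity Control selector is 0x02 (consistent with the D3..2 position named in the commit). The function and macro names are hypothetical and are not the driver's helpers.

```c
#include <stdint.h>

/* Assumed selector value for the Clock Validity Control (selectors start at 1). */
#define CS_CLOCK_VALID_CONTROL 0x02

/*
 * Each control owns two bits in bmControls, starting at D0 for selector 1.
 * The low bit of the pair is taken here to mean that the control is readable.
 */
static int control_is_readable(uint32_t bm_controls, unsigned int selector)
{
	unsigned int shift = (selector - 1) * 2;

	return (bm_controls >> shift) & 0x1;
}

/* The mistaken indexing, shown for contrast: it skips one bit pair. */
static int control_is_readable_off_by_one(uint32_t bm_controls, unsigned int selector)
{
	return (bm_controls >> (selector * 2)) & 0x1;
}
```

With only D2 set in bmControls, the corrected helper reports the clock validity control as readable, while the off-by-one variant looks at D4 and reports it as absent.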
    • mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables · 6f72a41f
      Mel Gorman authored
      commit d833352a upstream.
      
      If a process creates a large hugetlbfs mapping that is eligible for page
      table sharing and forks heavily with children some of whom fault and
      others which destroy the mapping then it is possible for page tables to
      get corrupted.  Some teardowns of the mapping encounter a "bad pmd" and
      output a message to the kernel log.  The final teardown will trigger a
      BUG_ON in mm/filemap.c.
      
      This was reproduced in 3.4 but is known to have existed for a long time
      and goes back at least as far as 2.6.37.  It was probably introduced
      in 2.6.20 by [39dde65c: shared page table for hugetlb page].  The messages
      look like this:
      
      [  ..........] Lots of bad pmd messages followed by this
      [  127.164256] mm/memory.c:391: bad pmd ffff880412e04fe8(80000003de4000e7).
      [  127.164257] mm/memory.c:391: bad pmd ffff880412e04ff0(80000003de6000e7).
      [  127.164258] mm/memory.c:391: bad pmd ffff880412e04ff8(80000003de0000e7).
      [  127.186778] ------------[ cut here ]------------
      [  127.186781] kernel BUG at mm/filemap.c:134!
      [  127.186782] invalid opcode: 0000 [#1] SMP
      [  127.186783] CPU 7
      [  127.186784] Modules linked in: af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf ext3 jbd dm_mod coretemp crc32c_intel usb_storage ghash_clmulni_intel aesni_intel i2c_i801 r8169 mii uas sr_mod cdrom sg iTCO_wdt iTCO_vendor_support shpchp serio_raw cryptd aes_x86_64 e1000e pci_hotplug dcdbas aes_generic container microcode ext4 mbcache jbd2 crc16 sd_mod crc_t10dif i915 drm_kms_helper drm i2c_algo_bit ehci_hcd ahci libahci usbcore rtc_cmos usb_common button i2c_core intel_agp video intel_gtt fan processor thermal thermal_sys hwmon ata_generic pata_atiixp libata scsi_mod
      [  127.186801]
      [  127.186802] Pid: 9017, comm: hugetlbfs-test Not tainted 3.4.0-autobuild #53 Dell Inc. OptiPlex 990/06D7TR
      [  127.186804] RIP: 0010:[<ffffffff810ed6ce>]  [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
      [  127.186809] RSP: 0000:ffff8804144b5c08  EFLAGS: 00010002
      [  127.186810] RAX: 0000000000000001 RBX: ffffea000a5c9000 RCX: 00000000ffffffc0
      [  127.186811] RDX: 0000000000000000 RSI: 0000000000000009 RDI: ffff88042dfdad00
      [  127.186812] RBP: ffff8804144b5c18 R08: 0000000000000009 R09: 0000000000000003
      [  127.186813] R10: 0000000000000000 R11: 000000000000002d R12: ffff880412ff83d8
      [  127.186814] R13: ffff880412ff83d8 R14: 0000000000000000 R15: ffff880412ff83d8
      [  127.186815] FS:  00007fe18ed2c700(0000) GS:ffff88042dce0000(0000) knlGS:0000000000000000
      [  127.186816] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [  127.186817] CR2: 00007fe340000503 CR3: 0000000417a14000 CR4: 00000000000407e0
      [  127.186818] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  127.186819] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  127.186820] Process hugetlbfs-test (pid: 9017, threadinfo ffff8804144b4000, task ffff880417f803c0)
      [  127.186821] Stack:
      [  127.186822]  ffffea000a5c9000 0000000000000000 ffff8804144b5c48 ffffffff810ed83b
      [  127.186824]  ffff8804144b5c48 000000000000138a 0000000000001387 ffff8804144b5c98
      [  127.186825]  ffff8804144b5d48 ffffffff811bc925 ffff8804144b5cb8 0000000000000000
      [  127.186827] Call Trace:
      [  127.186829]  [<ffffffff810ed83b>] delete_from_page_cache+0x3b/0x80
      [  127.186832]  [<ffffffff811bc925>] truncate_hugepages+0x115/0x220
      [  127.186834]  [<ffffffff811bca43>] hugetlbfs_evict_inode+0x13/0x30
      [  127.186837]  [<ffffffff811655c7>] evict+0xa7/0x1b0
      [  127.186839]  [<ffffffff811657a3>] iput_final+0xd3/0x1f0
      [  127.186840]  [<ffffffff811658f9>] iput+0x39/0x50
      [  127.186842]  [<ffffffff81162708>] d_kill+0xf8/0x130
      [  127.186843]  [<ffffffff81162812>] dput+0xd2/0x1a0
      [  127.186845]  [<ffffffff8114e2d0>] __fput+0x170/0x230
      [  127.186848]  [<ffffffff81236e0e>] ? rb_erase+0xce/0x150
      [  127.186849]  [<ffffffff8114e3ad>] fput+0x1d/0x30
      [  127.186851]  [<ffffffff81117db7>] remove_vma+0x37/0x80
      [  127.186853]  [<ffffffff81119182>] do_munmap+0x2d2/0x360
      [  127.186855]  [<ffffffff811cc639>] sys_shmdt+0xc9/0x170
      [  127.186857]  [<ffffffff81410a39>] system_call_fastpath+0x16/0x1b
      [  127.186858] Code: 0f 1f 44 00 00 48 8b 43 08 48 8b 00 48 8b 40 28 8b b0 40 03 00 00 85 f6 0f 88 df fe ff ff 48 89 df e8 e7 cb 05 00 e9 d2 fe ff ff <0f> 0b 55 83 e2 fd 48 89 e5 48 83 ec 30 48 89 5d d8 4c 89 65 e0
      [  127.186868] RIP  [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
      [  127.186870]  RSP <ffff8804144b5c08>
      [  127.186871] ---[ end trace 7cbac5d1db69f426 ]---
      
      The bug is a race and not always easy to reproduce.  To reproduce it I was
      doing the following on a single socket I7-based machine with 16G of RAM.
      
      $ hugeadm --pool-pages-max DEFAULT:13G
      $ echo $((18*1048576*1024)) > /proc/sys/kernel/shmmax
      $ echo $((18*1048576*1024)) > /proc/sys/kernel/shmall
      $ for i in `seq 1 9000`; do ./hugetlbfs-test; done
      
      On my particular machine, it usually triggers within 10 minutes but
      enabling debug options can change the timing such that it never hits.
      Once the bug is triggered, the machine is in trouble and needs to be
      rebooted.  The machine will respond but processes accessing proc like "ps
      aux" will hang due to the BUG_ON.  shutdown will also hang and needs a
      hard reset or a sysrq-b.
      
      The basic problem is a race between page table sharing and teardown.  For
      the most part page table sharing depends on i_mmap_mutex.  In some cases,
      it is also taking the mm->page_table_lock for the PTE updates but with
      shared page tables, it is the i_mmap_mutex that is more important.
      
      Unfortunately it appears to be also insufficient. Consider the following
      situation
      
      Process A					Process B
      ---------					---------
      hugetlb_fault					shmdt
        						LockWrite(mmap_sem)
          						  do_munmap
      						    unmap_region
      						      unmap_vmas
      						        unmap_single_vma
      						          unmap_hugepage_range
            						            Lock(i_mmap_mutex)
      							    Lock(mm->page_table_lock)
      							    huge_pmd_unshare/unmap tables <--- (1)
      							    Unlock(mm->page_table_lock)
            						            Unlock(i_mmap_mutex)
        huge_pte_alloc				      ...
          Lock(i_mmap_mutex)				      ...
          vma_prio_walk, find svma, spte		      ...
          Lock(mm->page_table_lock)			      ...
          share spte					      ...
          Unlock(mm->page_table_lock)			      ...
          Unlock(i_mmap_mutex)			      ...
        hugetlb_no_page									  <--- (2)
      						      free_pgtables
      						        unlink_file_vma
      							hugetlb_free_pgd_range
      						    remove_vma_list
      
      In this scenario, it is possible for Process A to share page tables with
      Process B that is trying to tear them down.  The i_mmap_mutex on its own
      does not prevent Process A walking Process B's page tables.  At (1) above,
      the page tables are not shared yet so it unmaps the PMDs.  Process A sets
      up page table sharing and at (2) faults a new entry.  Process B then trips
      up on it in free_pgtables.
      
      This patch fixes the problem by adding a new function
      __unmap_hugepage_range_final that is only called when the VMA is about to
      be destroyed.  This function clears VM_MAYSHARE during
      unmap_hugepage_range() under the i_mmap_mutex.  This makes the VMA
      ineligible for sharing and avoids the race.  Superficially this looks like
      it would then be vulnerable to truncate and madvise issues but hugetlbfs
      has its own truncate handlers so does not use unmap_mapping_range() and
      does not support madvise(DONTNEED).
      
      This should be treated as a -stable candidate if it is merged.
      
      Test program is as follows. The test case was mostly written by Michal
      Hocko with a few minor changes to reproduce this bug.
      
      ==== CUT HERE ====
      
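      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>
      #include <sys/types.h>
      #include <sys/ipc.h>
      #include <sys/shm.h>
      #include <sys/time.h>
      #include <sys/wait.h>
      
      static size_t huge_page_size = (2UL << 20);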
      static size_t nr_huge_page_A = 512;
      static size_t nr_huge_page_B = 5632;
      
      unsigned int get_random(unsigned int max)
      {
      	struct timeval tv;
      
      	gettimeofday(&tv, NULL);
      	srandom(tv.tv_usec);
      	return random() % max;
      }
      
      static void play(void *addr, size_t size)
      {
      	unsigned char *start = addr,
      		      *end = start + size,
      		      *a;
      	start += get_random(size/2);
      
      	/* we could iterate on huge pages but let's give it more time. */
      	for (a = start; a < end; a += 4096)
      		*a = 0;
      }
      
      int main(int argc, char **argv)
      {
      	key_t key = IPC_PRIVATE;
      	size_t sizeA = nr_huge_page_A * huge_page_size;
      	size_t sizeB = nr_huge_page_B * huge_page_size;
      	int shmidA, shmidB;
      	void *addrA = NULL, *addrB = NULL;
      	int nr_children = 300, n = 0;
      
      	if ((shmidA = shmget(key, sizeA, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
      		perror("shmget:");
      		return 1;
      	}
      
      	if ((addrA = shmat(shmidA, addrA, SHM_R|SHM_W)) == (void *)-1UL) {
      		perror("shmat");
      		return 1;
      	}
      	if ((shmidB = shmget(key, sizeB, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
      		perror("shmget:");
      		return 1;
      	}
      
      	if ((addrB = shmat(shmidB, addrB, SHM_R|SHM_W)) == (void *)-1UL) {
      		perror("shmat");
      		return 1;
      	}
      
      fork_child:
      	switch(fork()) {
      		case 0:
      			switch (n%3) {
      			case 0:
      				play(addrA, sizeA);
      				break;
      			case 1:
      				play(addrB, sizeB);
      				break;
      			case 2:
      				break;
      			}
      			break;
      		case -1:
      			perror("fork:");
      			break;
      		default:
      			if (++n < nr_children)
      				goto fork_child;
      			play(addrA, sizeA);
      			break;
      	}
      	shmdt(addrA);
      	shmdt(addrB);
      	do {
      		wait(NULL);
      	} while (--n > 0);
      	shmctl(shmidA, IPC_RMID, NULL);
      	shmctl(shmidB, IPC_RMID, NULL);
      	return 0;
      }
      
      [akpm@linux-foundation.org: name the declaration's args, fix CONFIG_HUGETLBFS=n build]
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2:
       - Adjust context
       - Drop the mmu_gather * parameters]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • mm: mmu_notifier: fix freed page still mapped in secondary MMU · 5d7a6068
      Xiao Guangrong authored
      commit 3ad3d901 upstream.
      
      mmu_notifier_release() is called when the process is exiting.  It will
      delete all the mmu notifiers.  But at this time the page belonging to the
      process is still present in page tables and is present on the LRU list, so
      this race will happen:
      
            CPU 0                 CPU 1
      mmu_notifier_release:    try_to_unmap:
         hlist_del_init_rcu(&mn->hlist);
                                  ptep_clear_flush_notify:
                                        mmu notifier not found
                                  free page  !!!!!!
                                  /*
                                   * At this point, the page has been
                                   * freed, but it is still mapped in
                                   * the secondary MMU.
                                   */
      
        mn->ops->release(mn, mm);
      
      Then the box is not stable and sometimes we can get this bug:
      
      [  738.075923] BUG: Bad page state in process migrate-perf  pfn:03bec
      [  738.075931] page:ffffea00000efb00 count:0 mapcount:0 mapping:          (null) index:0x8076
      [  738.075936] page flags: 0x20000000000014(referenced|dirty)
      
      The same issue is present in mmu_notifier_unregister().
      
      We can call ->release before deleting the notifier to ensure the page has
      been unmapped from the secondary MMU before it is freed.
      Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • mm: setup pageblock_order before it's used by sparsemem · 3927df7e
      Xishi Qiu authored
      commit ca57df79 upstream.
      
      On architectures with CONFIG_HUGETLB_PAGE_SIZE_VARIABLE set, such as
      Itanium, pageblock_order is a variable with default value of 0.  It's set
      to the right value by set_pageblock_order() in function
      free_area_init_core().
      
      But pageblock_order may be used by sparse_init() before free_area_init_core()
      is called along path:
      sparse_init()
          ->sparse_early_usemaps_alloc_node()
      	->usemap_size()
      	    ->SECTION_BLOCKFLAGS_BITS
      		->((1UL << (PFN_SECTION_SHIFT - pageblock_order)) *
      NR_PAGEBLOCK_BITS)
      
      The uninitialized pageblock_order will cause memory wasting because
      usemap_size() returns a much bigger value than is really needed.
      
      For example, on an Itanium platform,
      sparse_init() pageblock_order=0 usemap_size=24576
      free_area_init_core() before pageblock_order=0, usemap_size=24576
      free_area_init_core() after pageblock_order=12, usemap_size=8
      
      That means 24K memory has been wasted for each section, so fix it by calling
      set_pageblock_order() from sparse_init().
      Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
      Signed-off-by: Jiang Liu <liuj97@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Keping Chen <chenkeping@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
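The Itanium figures in the pageblock_order commit above can be reproduced with a little arithmetic. The sketch below is a user-space approximation, not the kernel's usemap_size(): the values PFN_SECTION_SHIFT = 16 and NR_PAGEBLOCK_BITS = 3 are assumptions inferred from the quoted sizes, and `usemap_size_for()` and `round_up_to()` are hypothetical helper names.

```c
#include <assert.h>

/* Assumed values, inferred from the Itanium figures quoted in the commit. */
#define PFN_SECTION_SHIFT 16
#define NR_PAGEBLOCK_BITS 3

static unsigned long round_up_to(unsigned long value, unsigned long multiple)
{
	return ((value + multiple - 1) / multiple) * multiple;
}

/* Bytes needed for one section's pageblock flags at a given pageblock_order. */
static unsigned long usemap_size_for(unsigned int pageblock_order)
{
	unsigned long bits, bytes;

	assert(pageblock_order <= PFN_SECTION_SHIFT);
	bits = (1UL << (PFN_SECTION_SHIFT - pageblock_order)) * NR_PAGEBLOCK_BITS;
	bytes = round_up_to(bits, 8) / 8;          /* bits to whole bytes */
	return round_up_to(bytes, sizeof(unsigned long));
}
```

Under these assumptions, an order of 0 needs 65536 * 3 bits, which is 24576 bytes, while an order of 12 needs 16 * 3 bits, which rounds up to 8 bytes. Both match the sizes reported in the commit.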
    • mm/page_alloc.c: remove pageblock_default_order() · 27c4b68b
      Andrew Morton authored
      commit 955c1cd7 upstream.
      
      This has always been broken: one version takes an unsigned int and the
      other version takes no arguments.  This bug was hidden because one
      version of set_pageblock_order() was a macro which doesn't evaluate its
      argument.
      
      Simplify it all and remove pageblock_default_order() altogether.
      Reported-by: rajman mekaco <rajman.mekaco@gmail.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ASoC: wm8962: Allow VMID time to fully ramp · 2145a2c9
      Mark Brown authored
      commit 9d40e558 upstream.
      
      Required for reliable power up from cold.
      Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • USB: echi-dbgp: increase the controller wait time to come out of halt. · 0776e23d
      Colin Ian King authored
      commit f96a4216 upstream.
      
      The default 10 microsecond delay for the controller to come out of
      halt in dbgp_ehci_startup is too short, so increase it to 1 millisecond.
      
      This is based on empirical testing on various USB debug ports on
      modern machines such as a Lenovo X220i and an Ivybridge development
      platform that needed to wait ~450-950 microseconds.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
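The effect of the longer wait in the commit above can be illustrated with a bounded polling loop. This is a user-space model under assumptions: `struct fake_controller`, `poll_halted_then_wait_1us()` and `wait_for_run()` are hypothetical, no hardware register is read, and each poll is simply treated as costing one microsecond.

```c
#include <stdbool.h>

/* Hypothetical model of a controller that leaves halt after some delay. */
struct fake_controller {
	unsigned int usecs_until_running; /* remaining time in the halted state */
};

/* Stand-in for reading the status register, then waiting one microsecond. */
static bool poll_halted_then_wait_1us(struct fake_controller *hc)
{
	if (hc->usecs_until_running == 0)
		return false; /* no longer halted */
	hc->usecs_until_running--;
	return true;
}

/* Poll once per microsecond, giving up after `budget_usecs` polls. */
static bool wait_for_run(struct fake_controller *hc, unsigned int budget_usecs)
{
	while (budget_usecs-- > 0) {
		if (!poll_halted_then_wait_1us(hc))
			return true;
	}
	return false;
}
```

A modelled controller that needs 950 microseconds, the upper end of the range measured in the commit, is still halted when a 10 microsecond budget runs out but is seen running within a 1000 microsecond budget.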
    • ARM: Fix undefined instruction exception handling · 83358b9b
      Russell King authored
      commit 15ac49b6 upstream.
      
      While trying to get a v3.5 kernel booted on the cubox, I noticed that
      VFP does not work correctly with VFP bounce handling.  This is because
      of the confusion over 16-bit vs 32-bit instructions, and where PC is
      supposed to point to.
      
      The rule is that FP handlers are entered with regs->ARM_pc pointing at
      the _next_ instruction to be executed.  However, if the exception is
      not handled, regs->ARM_pc points at the faulting instruction.
      
      This is easy for ARM mode, because we know that the next instruction and
      previous instructions are separated by four bytes.  This is not true of
      Thumb2 though.
      
      Since all FP instructions are 32-bit in Thumb2, it makes things easy.
      We just need to select the appropriate adjustment.  Do this by moving
      the adjustment out of do_undefinstr() into the assembly code, as only
      the assembly code knows whether it's dealing with a 32-bit or 16-bit
      instruction.
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
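The PC adjustment described in the commit above depends on the size of the faulting instruction. The sketch below is illustrative C rather than the kernel's assembly; `thumb_insn_size()` and `faulting_pc()` are hypothetical names. It relies on the Thumb-2 encoding rule that a first halfword of 0xe800 or above begins a 32-bit instruction.

```c
#include <stdint.h>

/*
 * In Thumb-2, a first halfword of 0xe800 or above marks a 32-bit
 * instruction; anything below is a 16-bit instruction.
 */
static unsigned int thumb_insn_size(uint16_t first_halfword)
{
	return first_halfword >= 0xe800 ? 4 : 2;
}

/* Address of the faulting instruction, given the PC of the next one. */
static uint32_t faulting_pc(uint32_t next_pc, int thumb, uint16_t first_halfword)
{
	/* ARM mode instructions are always four bytes wide. */
	unsigned int size = thumb ? thumb_insn_size(first_halfword) : 4;

	return next_pc - size;
}
```

In ARM mode the adjustment is always four bytes. In Thumb mode it is four bytes for a 32-bit encoding (which covers all FP instructions, per the commit) and two bytes for a 16-bit one.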
    • ARM: 7478/1: errata: extend workaround for erratum #720789 · 1b481958
      Will Deacon authored
      commit 5a783cbc upstream.
      
      Commit cdf357f1 ("ARM: 6299/1: errata: TLBIASIDIS and TLBIMVAIS
      operations can broadcast a faulty ASID") replaced by-ASID TLB flushing
      operations with all-ASID variants to work around A9 erratum #720789.
      
      This patch extends the workaround to include the tlb_range operations,
      which were overlooked by the original patch.
      Tested-by: Steve Capper <steve.capper@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ARM: 7477/1: vfp: Always save VFP state in vfp_pm_suspend on UP · cdab6eed
      Colin Cross authored
      commit 24b35521 upstream.
      
      vfp_pm_suspend should save the VFP state in suspend after
      any lazy context switch.  If it only saves when the VFP is enabled,
      the state can get lost when, on a UP system:
        Thread 1 uses the VFP
        Context switch occurs to thread 2, VFP is disabled but the
           VFP context is not saved
        Thread 2 initiates suspend
        vfp_pm_suspend is called with the VFP disabled, and the unsaved
           VFP context of Thread 1 in the registers
      
      Modify vfp_pm_suspend to save the VFP context whenever
      vfp_current_hw_state is not NULL.
      
      Includes a fix from Ido Yariv <ido@wizery.com>, who pointed out that on
      SMP systems, the state pointer can be pointing to a freed task struct if
      a task exited on another cpu, fixed by using #ifndef CONFIG_SMP in the
      new if clause.
      
      Cc: Barry Song <bs14@csr.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Ido Yariv <ido@wizery.com>
      Cc: Daniel Drake <dsd@laptop.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Colin Cross <ccross@android.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ARM: 7476/1: vfp: only clear vfp state for current cpu in vfp_pm_suspend · fc257bc5
      Colin Cross authored
      commit a84b895a upstream.
      
      vfp_pm_suspend runs on each cpu, so only clear the hardware state
      pointer for the current cpu.  This prevents a possible crash if one
      cpu clears the hw state pointer when another cpu has already
      checked if it is valid.
      Signed-off-by: Colin Cross <ccross@android.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
    • ARM: 7467/1: mutex: use generic xchg-based implementation for ARMv6+ · c51cf242
      Will Deacon authored
      commit a76d7bd9 upstream.
      
      The open-coded mutex implementation for ARMv6+ cores suffers from a
      severe lack of barriers, so in the uncontended case we don't actually
      protect any accesses performed during the critical section.
      
      Furthermore, the code is largely a duplication of the ARMv6+ atomic_dec
      code but optimised to remove a branch instruction, as the mutex fastpath
      was previously inlined. Now that this is executed out-of-line, we can
      reuse the atomic access code for the locking (in fact, we use the xchg
      code as this produces shorter critical sections).
      
      This patch uses the generic xchg based implementation for mutexes on
      ARMv6+, which introduces barriers to the lock/unlock operations and also
      has the benefit of removing a fair amount of inline assembly code.
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Nicolas Pitre <nico@linaro.org>
      Reported-by: Shan Kang <kangshan0910@gmail.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      c51cf242
    • ARM: 7466/1: disable interrupt before spinning endlessly · 2bd8f381
      Shawn Guo authored
      commit 98bd8b96 upstream.
      
      The CPU will spin endlessly at the end of the machine_halt and
      machine_restart calls.  However, this will lead to a soft lockup
      warning after about 20 seconds if CONFIG_LOCKUP_DETECTOR is enabled,
      as the system timer is still alive.
      
      Disable interrupts before spinning endlessly, so that the lockup
      warning will never be seen.
      Reported-by: Marek Vasut <marex@denx.de>
      Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      2bd8f381
    • SUNRPC: return negative value in case rpcbind client creation error · baea1631
      Stanislav Kinsbursky authored
      commit caea33da upstream.
      
      Without this patch the kernel will panic on LockD start, because lockd_up()
      checks the lockd_up_net() result for a negative value.
      From my point of view it's better to return a negative value from the rpcbind
      routines instead of replacing all such checks like the one in lockd_up().
      Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      baea1631
    • nilfs2: fix deadlock issue between chcp and thaw ioctls · 58546c72
      Ryusuke Konishi authored
      commit 572d8b39 upstream.
      
      An fs-thaw ioctl causes deadlock with a chcp or mkcp -s command:
      
       chcp            D ffff88013870f3d0     0  1325   1324 0x00000004
       ...
       Call Trace:
         nilfs_transaction_begin+0x11c/0x1a0 [nilfs2]
         wake_up_bit+0x20/0x20
         copy_from_user+0x18/0x30 [nilfs2]
         nilfs_ioctl_change_cpmode+0x7d/0xcf [nilfs2]
         nilfs_ioctl+0x252/0x61a [nilfs2]
         do_page_fault+0x311/0x34c
         get_unmapped_area+0x132/0x14e
         do_vfs_ioctl+0x44b/0x490
         __set_task_blocked+0x5a/0x61
         vm_mmap_pgoff+0x76/0x87
         __set_current_blocked+0x30/0x4a
         sys_ioctl+0x4b/0x6f
         system_call_fastpath+0x16/0x1b
       thaw            D ffff88013870d890     0  1352   1351 0x00000004
       ...
       Call Trace:
         rwsem_down_failed_common+0xdb/0x10f
         call_rwsem_down_write_failed+0x13/0x20
         down_write+0x25/0x27
         thaw_super+0x13/0x9e
         do_vfs_ioctl+0x1f5/0x490
         vm_mmap_pgoff+0x76/0x87
         sys_ioctl+0x4b/0x6f
         filp_close+0x64/0x6c
         system_call_fastpath+0x16/0x1b
      
      where the thaw ioctl deadlocked at thaw_super() when called while chcp was
      waiting at nilfs_transaction_begin() called from
      nilfs_ioctl_change_cpmode().  This deadlock is 100% reproducible.
      
      This is because nilfs_ioctl_change_cpmode() first locks sb->s_umount in
      read mode and then waits for unfreezing in nilfs_transaction_begin(),
      whereas thaw_super() locks sb->s_umount in write mode.  The locking of
      sb->s_umount here was intended to make snapshot mounts and the downgrade
      of snapshots to checkpoints exclusive.
      
      This fixes the deadlock issue by replacing the sb->s_umount usage in
      nilfs_ioctl_change_cpmode() with a dedicated mutex which protects snapshot
      mounts.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      58546c72
    • lib/vsprintf.c: kptr_restrict: fix pK-error in SysRq show-all-timers(Q) · 1b772a14
      Dan Rosenberg authored
      commit 3715c530 upstream.
      
      When using ALT+SysRq+Q all the pointers are replaced with "pK-error" like
      this:
      
      	[23153.208033]   .base:               pK-error
      
      with echo h > /proc/sysrq-trigger it works:
      
      	[23107.776363]   .base:       ffff88023e60d540
      
      The intent behind this behavior was to return "pK-error" in cases where
      the %pK format specifier was used in interrupt context, because the
      CAP_SYSLOG check wouldn't be meaningful.  Clearly this should only apply
      when kptr_restrict is actually enabled though.
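
      The intended decision can be sketched as a small userspace model.  This
      is an illustration only, not the kernel code: the enum, the function
      name pk_decide() and the in_interrupt flag are hypothetical stand-ins
      for the checks described above.

```c
#include <stdbool.h>

/* Hypothetical outcomes for a %pK conversion; names are illustrative only. */
enum pk_action {
    PK_PRINT_POINTER,   /* print the real address */
    PK_APPLY_RESTRICT,  /* process context: apply the usual kptr_restrict policy */
    PK_ERROR            /* print "pK-error" */
};

/*
 * Model of the fixed behaviour: "pK-error" is only produced when
 * kptr_restrict is enabled AND the caller is in interrupt context,
 * where a capability check would not be meaningful.
 */
enum pk_action pk_decide(int kptr_restrict, bool in_interrupt)
{
    if (kptr_restrict == 0)
        return PK_PRINT_POINTER;    /* restriction off: context is irrelevant */
    if (in_interrupt)
        return PK_ERROR;            /* restriction on, no meaningful check */
    return PK_APPLY_RESTRICT;
}
```

      With kptr_restrict set to 0, the SysRq-Q case (interrupt context) falls
      into the first branch and prints the pointer.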
      Reported-by: Stevie Trujillo <stevie.trujillo@gmail.com>
      Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      1b772a14
    • pcdp: use early_ioremap/early_iounmap to access pcdp table · ffb709e6
      Greg Pearson authored
      commit 6c4088ac upstream.
      
      efi_setup_pcdp_console() is called during boot to parse the HCDP/PCDP
      EFI system table and setup an early console for printk output.  The
      routine uses ioremap/iounmap to setup access to the HCDP/PCDP table
      information.
      
      The call to ioremap is happening early in the boot process which leads
      to a panic on x86_64 systems:
      
          panic+0x01ca
          do_exit+0x043c
          oops_end+0x00a7
          no_context+0x0119
          __bad_area_nosemaphore+0x0138
          bad_area_nosemaphore+0x000e
          do_page_fault+0x0321
          page_fault+0x0020
          reserve_memtype+0x02a1
          __ioremap_caller+0x0123
          ioremap_nocache+0x0012
          efi_setup_pcdp_console+0x002b
          setup_arch+0x03a9
          start_kernel+0x00d4
          x86_64_start_reservations+0x012c
          x86_64_start_kernel+0x00fe
      
      This replaces the calls to ioremap/iounmap in efi_setup_pcdp_console()
      with calls to early_ioremap/early_iounmap which can be called during
      early boot.
      
      This patch was tested on an x86_64 prototype system which uses the
      HCDP/PCDP table for early console setup.
      Signed-off-by: Greg Pearson <greg.pearson@hp.com>
      Acked-by: Khalid Aziz <khalid.aziz@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      ffb709e6
    • md/raid1: don't abort a resync on the first badblock. · 898ece8e
      NeilBrown authored
      commit b7219ccb upstream.
      
      If a resync of a RAID1 array with 2 devices finds a known bad block on
      one device, it will neither read from, nor write to, that device for
      this block offset.
      So there will be one read target (the other device) and zero write
      targets.
      This condition causes md/raid1 to abort the resync, assuming that it
      has finished - without known bad blocks this would be true.
      
      When there are no write targets because of the presence of bad blocks
      we should only skip over the area covered by the bad block.
      RAID10 already gets this right, raid1 doesn't.  Or didn't.
      
      As this can cause a 'sync' to abort early and appear to have succeeded,
      it could lead to some data corruption, so it is suitable for -stable.
      Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      898ece8e
    • nfs: skip commit in releasepage if we're freeing memory for fs-related reasons · 1c88c581
      Jeff Layton authored
      commit 5cf02d09 upstream.
      
      We've had some reports of a deadlock where rpciod ends up with a stack
      trace like this:
      
          PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
           #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
           #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
           #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
           #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
           #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
           #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
           #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
           #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
           #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
           #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
          #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
          #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
          #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
          #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
          #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
          #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
          #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
          #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
          #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
          #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
          #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
          #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
          #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
          #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
          #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
          #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca
      
      rpciod is trying to allocate memory for a new socket to talk to the
      server. The VM ends up calling ->releasepage to get more memory, and it
      tries to do a blocking commit. That commit can't succeed however without
      a connected socket, so we deadlock.
      
      Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
      socket allocation, and having nfs_release_page check for that flag when
      deciding whether to do a commit call. Also, set PF_FSTRANS
      unconditionally in rpc_async_schedule since that function can also do
      allocations sometimes.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      1c88c581
    • s390/mm: fix fault handling for page table walk case · c929e9bc
      Heiko Carstens authored
      commit 008c2e8f upstream.
      
      Make sure the kernel does not incorrectly create a SIGBUS signal during
      user space accesses:
      
      For user space accesses in the switched addressing mode case the kernel
      may walk page tables and access user address space via the kernel
      mapping. If a page table entry is invalid the function __handle_fault()
      gets called in order to emulate a page fault and trigger all the usual
      actions like paging in a missing page etc. by calling handle_mm_fault().
      
      If handle_mm_fault() returns with an error fixup handling is necessary.
      For the switched addressing mode case all errors need to be mapped to
      -EFAULT, so that the calling uaccess function can return -EFAULT to
      user space.
      
      Unfortunately the __handle_fault() incorrectly calls do_sigbus() if
      VM_FAULT_SIGBUS is set. This however should only happen if a page fault
      was triggered by a user space instruction. For kernel mode uaccesses
      the correct action is to only return -EFAULT.
      So user space may incorrectly see SIGBUS signals because of this bug.
      
      For current machines this would only be possible for the switched
      addressing mode case in conjunction with futex operations.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      [bwh: Backported to 3.2: do_exception() and do_sigbus() parameters differ]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      c929e9bc
    • virtio-blk: Use block layer provided spinlock · e5518571
      Asias He authored
      commit 2c95a329 upstream.
      
      Block layer will allocate a spinlock for the queue if the driver does
      not provide one in blk_init_queue().
      
      The reason to use the internal spinlock is that blk_cleanup_queue() will
      switch to use the internal spinlock in the cleanup code path.
      
              if (q->queue_lock != &q->__queue_lock)
                      q->queue_lock = &q->__queue_lock;
      
      However, processes which are in D state might have taken the
      driver-provided spinlock; when those processes wake up, they would
      release the block-layer-provided spinlock instead.
      
      =====================================
      [ BUG: bad unlock balance detected! ]
      3.4.0-rc7+ #238 Not tainted
      -------------------------------------
      fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at:
      [<ffffffff813274d2>] blk_queue_bio+0x2a2/0x380
      but there are no more locks to release!
      
      other info that might help us debug this:
      1 lock held by fio/3587:
       #0:  (&(&vblk->lock)->rlock){......}, at:
      [<ffffffff8132661a>] get_request_wait+0x19a/0x250
      
      Other drivers use block layer provided spinlock as well, e.g. SCSI.
      
      Switching to the block layer provided spinlock saves a bit of memory and
      does not increase lock contention. Performance tests show no real
      difference before and after this patch.
      
      Changes in v2: Improve commit log as Michael suggested.
      
      Cc: virtualization@lists.linux-foundation.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: Asias He <asias@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      e5518571
    • asus-wmi: use ASUS_WMI_METHODID_DSTS2 as default DSTS ID. · 78578169
      Alex Hung authored
      commit 63a78bb1 upstream.
      
      According to responses from the BIOS team, ASUS_WMI_METHODID_DSTS2
      (0x53545344) will be used as future DSTS ID. In addition, calling
      asus_wmi_evaluate_method(ASUS_WMI_METHODID_DSTS2, 0, 0, NULL) returns
      ASUS_WMI_UNSUPPORTED_METHOD in new ASUS laptop PCs. This patch fixes
      the resulting problem that no DSTS ID would be assigned in this case.
      Signed-off-by: Alex Hung <alex.hung@canonical.com>
      Signed-off-by: Matthew Garrett <mjg@redhat.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      78578169
    • random: mix in architectural randomness in extract_buf() · 7499bd63
      H. Peter Anvin authored
      commit d2e7c96a upstream.
      
      Mix in any architectural randomness in extract_buf() instead of
      xfer_secondary_buf().  This allows us to mix in more architectural
      randomness, and it also makes xfer_secondary_buf() faster, moving a
      tiny bit of additional CPU overhead to the process which is extracting
      the randomness.
      
      [ Commit description modified by tytso to remove an extended
        advertisement for the RDRAND instruction. ]
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: DJ Johnston <dj.johnston@intel.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      7499bd63
    • dm thin: fix memory leak in process_prepared_mapping error paths · 46b4d87e
      Joe Thornber authored
      commit 905386f8 upstream.
      
      Fix memory leak in process_prepared_mapping by always freeing
      the dm_thin_new_mapping structs from the mapping_pool mempool on
      the error paths.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      46b4d87e
    • dm thin: reduce endio_hook pool size · 95dc400b
      Alasdair G Kergon authored
      commit 7768ed33 upstream.
      
      Reduce the slab size used for the dm_thin_endio_hook mempool.
      
      Allocation has been seen to fail on machines with smaller amounts
      of memory due to fragmentation.
      
        lvm: page allocation failure. order:5, mode:0xd0
        device-mapper: table: 253:38: thin-pool: Error creating pool's endio_hook mempool
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      95dc400b
    • Redefine ATOMIC_INIT and ATOMIC64_INIT to drop the casts · d0192ce7
      Tony Luck authored
      commit a1193655 upstream.
      
      The following build error occurred during an ia64 build with
      swap-over-NFS patches applied.
      
      net/core/sock.c:274:36: error: initializer element is not constant
      net/core/sock.c:274:36: error: (near initialization for 'memalloc_socks')
      net/core/sock.c:274:36: error: initializer element is not constant
      
      This is identical to a parisc build error. Fengguang Wu, Mel Gorman
      and James Bottomley did all the legwork to track the root cause of
      the problem. This fix and entire commit log is shamelessly copied
      from them with one extra detail to change a dubious runtime use of
      ATOMIC_INIT() to atomic_set() in drivers/char/mspec.c
      
      Dave Anglin says:
      > Here is the line in sock.i:
      >
      > struct static_key memalloc_socks = ((struct static_key) { .enabled =
      > ((atomic_t) { (0) }) });
      
      The above line contains two compound literals.  It also uses a designated
      initializer to initialize the field enabled.  A compound literal is not a
      constant expression.
      
      The location of the above statement isn't fully clear, but if a compound
      literal occurs outside the body of a function, the initializer list must
      consist of constant expressions.
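
      A minimal sketch of the cast-free form, assuming simplified stand-in
      definitions: the atomic_t and struct static_key below are reduced to a
      single field for illustration, and read_enabled() is a hypothetical
      helper that exists only to make the example checkable.

```c
/* Simplified stand-ins for the kernel types (illustration only). */
typedef struct { int counter; } atomic_t;
struct static_key { atomic_t enabled; };

/*
 * Cast-free initializer: a plain brace-enclosed list.  The form with a
 * cast, ((atomic_t) { (i) }), is a compound literal, which is not a
 * constant expression and so may be rejected in a file-scope initializer.
 */
#define ATOMIC_INIT(i) { (i) }

/* File-scope object initialized with constant expressions only. */
struct static_key memalloc_socks = { .enabled = ATOMIC_INIT(0) };

/* Hypothetical accessor used to observe the initialized value. */
int read_enabled(void)
{
    return memalloc_socks.enabled.counter;
}
```

      Because the macro now expands to an initializer list rather than an
      expression, it can no longer be used in an assignment at run time,
      which is why the runtime use mentioned above is switched to
      atomic_set().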
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      d0192ce7
    • s390/mm: downgrade page table after fork of a 31 bit process · f22027d0
      Martin Schwidefsky authored
      commit 0f6f281b upstream.
      
      The downgrade of the 4 level page table created by init_new_context is
      currently done only in start_thread31. If a 31 bit process forks the
      new mm uses a 4 level page table, including the task size of 2<<42
      that goes along with it. This is incorrect as now a 31 bit process
      can map memory beyond 2GB. Define arch_dup_mmap to do the downgrade
      after fork.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      f22027d0
    • x86, nops: Missing break resulting in incorrect selection on Intel · 60ed9e38
      Alan Cox authored
      commit d6250a3f upstream.
      
      The Intel case falls through into the generic case which then changes
      the values.  For cases like the P6 it doesn't do the right thing so
      this seems to be a screwup.
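
      The class of bug can be sketched with a generic switch statement.  The
      enums and function names below are hypothetical and do not reproduce
      the kernel's NOP selection code; they only show how a missing break
      lets the default case overwrite a vendor-specific choice.

```c
/* Hypothetical vendor and table identifiers (illustration only). */
enum vendor { VENDOR_INTEL, VENDOR_OTHER };
enum nop_table { TABLE_INTEL, TABLE_GENERIC };

/* Buggy variant: the Intel case falls through into the default case. */
enum nop_table pick_buggy(enum vendor v)
{
    enum nop_table table = TABLE_GENERIC;

    switch (v) {
    case VENDOR_INTEL:
        table = TABLE_INTEL;
        /* missing break: execution continues into default */
    default:
        table = TABLE_GENERIC;  /* overwrites the Intel selection */
    }
    return table;
}

/* Fixed variant: the break keeps the Intel-specific selection. */
enum nop_table pick_fixed(enum vendor v)
{
    enum nop_table table = TABLE_GENERIC;

    switch (v) {
    case VENDOR_INTEL:
        table = TABLE_INTEL;
        break;
    default:
        table = TABLE_GENERIC;
    }
    return table;
}
```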
      Signed-off-by: Alan Cox <alan@linux.intel.com>
      Link: http://lkml.kernel.org/n/tip-lww2uirad4skzjlmrm0vru8o@git.kernel.org
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      60ed9e38
    • ALSA: mpu401: Fix missing initialization of irq field · 67258e44
      Takashi Iwai authored
      commit bc733d49 upstream.
      
      The irq field of struct snd_mpu401 is supposed to be initialized to -1.
      Since it's set to zero as of now, a probing error before the irq
      installation results in a kernel warning "Trying to free already-free
      IRQ 0".
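
      The convention can be sketched as follows, assuming a hypothetical
      struct mpu_dev with only an irq field; the helper names are
      illustrative and not taken from the driver.

```c
#include <stdbool.h>

/* Hypothetical device structure reduced to the field of interest. */
struct mpu_dev {
    int irq;    /* -1 means "no IRQ installed" */
};

/* Initialize irq to -1 so a later cleanup can tell nothing was requested. */
void mpu_dev_init(struct mpu_dev *dev)
{
    dev->irq = -1;
}

/*
 * Cleanup path: only release an IRQ that was actually installed.
 * Returns true if a release would have been performed.
 */
bool mpu_dev_free(struct mpu_dev *dev)
{
    if (dev->irq < 0)
        return false;   /* probing failed before the IRQ was installed */
    /* a real driver would release the interrupt line here */
    dev->irq = -1;
    return true;
}
```

      If irq were left at zero, the check in the cleanup path would pass and
      IRQ 0 would be released even though it was never requested.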
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=44821
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      67258e44
    • ALSA: hda - Fix invalid D3 of headphone DAC on VT202x codecs · 2c7b0211
      Takashi Iwai authored
      commit 6162552b upstream.
      
      We've got a bug report about silent output from the headphone on a
      mobo with VT2021, and found that this was because of the wrong
      D3 state on the DAC for the headphone output.  The bug is triggered by
      the incomplete check for this DAC in set_widgets_power_state_vt1718S().
      It checks only the connectivity of the primary output (0x27) but
      doesn't consider the path from the headphone pin (0x28).
      
      Now this patch fixes the problem by checking both pins for DAC 0x0b.
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      [bwh: Backported to 3.2: keep using snd_hda_codec_write() as
       update_power_state() is missing]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      2c7b0211
    • Input: synaptics - handle out of bounds values from the hardware · 2028a493
      Seth Forshee authored
      commit c0394506 upstream.
      
      The touchpad on the Acer Aspire One D250 will report out of range values
      in the extreme lower portion of the touchpad. These appear as abrupt
      changes in the values reported by the hardware from very low values to
      very high values, which can cause unexpected vertical jumps in the
      position of the mouse pointer.
      
      What seems to be happening is that the value is wrapping to a two's
      complement negative value of higher resolution than the 13-bit value
      reported by the hardware, with the high-order bits being truncated. This
      patch adds handling for these values by converting them to the
      appropriate negative values.
      
      The only tricky part about this is deciding when to treat a number as
      negative. It stands to reason that if out of range values can be
      reported on the low end then it could also happen on the high end, so
      not all out of range values should be treated as negative. The approach
      taken here is to split the difference between the maximum legitimate
      value for the axis and the maximum possible value that the hardware can
      report, treating values greater than this number as negative and all
      other values as positive. This can be tweaked later if hardware is found
      that operates outside of these parameters.
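
      The approach described above can be sketched as follows.  This is an
      illustration under stated assumptions, not the driver code: the macro
      ABS_POS_BITS, the function wrap_to_signed() and the axis_max parameter
      are hypothetical names, and the 13-bit width is the one given in the
      text.

```c
/* Width of the position value reported by the hardware (from the text). */
#define ABS_POS_BITS 13

/*
 * Convert a raw 13-bit reading to a signed position.  Values above the
 * midpoint between the largest legitimate axis value (axis_max) and the
 * largest value the hardware can report are treated as wrapped negatives.
 */
int wrap_to_signed(unsigned int raw, int axis_max)
{
    const int hw_max = (1 << ABS_POS_BITS) - 1;     /* 8191 */
    const int threshold = (axis_max + hw_max) / 2;  /* split the difference */
    int value = (int)(raw & (unsigned int)hw_max);

    if (value > threshold)
        value -= 1 << ABS_POS_BITS;                 /* undo the truncation */
    return value;
}
```

      For example, with a hypothetical axis maximum of 5000 the threshold is
      6595, so a raw reading of 8191 is interpreted as -1 while a reading of
      6000 is left unchanged as a slightly out-of-range positive value.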
      
      BugLink: http://bugs.launchpad.net/bugs/1001251
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Reviewed-by: Daniel Kurtz <djkurtz@chromium.org>
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      2028a493
    • video/smscufx: fix line counting in fb_write · efb6d7d6
      Alexander Holler authored
      commit 2fe2d9f4 upstream.
      
      Lines 0 and 1 were both written to line 0 (on the display) and all subsequent
      lines had an offset of -1. The result was that the last line on the display
      was never overwritten by writes to /dev/fbN.
      
      The origin of this bug seems to have been udlfb.
      Signed-off-by: Alexander Holler <holler@ahsoftware.de>
      Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      efb6d7d6