1. 20 Jan, 2016 40 commits
    • Xiong Zhang's avatar
      ALSA: hda - Set SKL+ hda controller power at freeze() and thaw() · 772115a0
      Xiong Zhang authored
      commit 3e6db33a upstream.
      
      It takes three minutes to enter into hibernation on some OEM SKL
      machines and we see many codec spurious response after thaw() opertion.
      This is because HDA is still in D0 state after freeze() call and
      pci_pm_freeze/pci_pm_freeze_noirq() don't set D3 hot in pci_bus driver.
      It seems bios still access HDA when system enter into freeze state,
      HDA will receive codec response interrupt immediately after thaw() call.
      Because of this unexpected interrupt, HDA enter into a abnormal
      state and slow down the system enter into hibernation.
      
      In this patch, we put HDA into D3 hot state in azx_freeze_noirq() and
      put HDA into D0 state in azx_thaw_noirq().
      
      V2: Only apply this fix to SKL+
          Fix compile error when CONFIG_PM_SLEEP isn't defined
      
      [Yet another fix for CONFIG_PM_SLEEP ifdef and the additional comment
       by tiwai]
      Signed-off-by: default avatarXiong Zhang <xiong.y.zhang@intel.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      772115a0
    • Stewart Smith's avatar
      powerpc/powernv: pr_warn_once on unsupported OPAL_MSG type · 180053d7
      Stewart Smith authored
      commit 98da62b7 upstream.
      
      When running on newer OPAL firmware that supports sending extra
      OPAL_MSG types, we would print a warning on *every* message received.
      
      This could be a problem for kernels that don't support OPAL_MSG_OCC
      on machines that are running real close to thermal limits and the
      OCC is throttling the chip. For a kernel that is paying attention to
      the message queue, we could get these notifications quite often.
      
      Conceivably, future message types could also come fairly often,
      and printing that we didn't understand them 10,000 times provides
      no further information than printing them once.
      Signed-off-by: default avatarStewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      180053d7
    • Neelesh Gupta's avatar
      powerpc/powernv: Fix the overflow of OPAL message notifiers head array · e27e2eea
      Neelesh Gupta authored
      commit 792f96e9 upstream.
      
      Fixes the condition check of incoming message type which can
      otherwise shoot beyond the message notifiers head array.
      Signed-off-by: default avatarNeelesh Gupta <neelegup@linux.vnet.ibm.com>
      Reviewed-by: default avatarVasant Hegde <hegdevasant@linux.vnet.ibm.com>
      Reviewed-by: default avatarAnshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      e27e2eea
    • Vineet Gupta's avatar
      ARC: dw2 unwind: Ignore CIE version !=1 gracefully instead of bailing · 03831428
      Vineet Gupta authored
      commit 323f41f9 upstream.
      
      ARC dwarf unwinder only supports CIE version == 1
      The boot time dwarf sanitizer (part of binary lookup table constructor)
      would simply bail if it saw CIE version == 3, rendering unwinder with a
      NULL lookup table.
      
      It seems libgcc linked with kernel does have such entries.
      
      With fallback linear search removed, and a NULL binary lookup table,
      unwinder fails to generate any stack trace.
      
      So allow graceful ignoring of unsupported CIE entries.
      
      This problem was initially seen in Alexey's setup (and not mine) as he
      was using buildroot built toolchain (libgcc) which doesn't get built with
      CFLAGS_FOR_TARGET="-gdwarf-2 which is my default
      
      Fixes STAR 9000985048: "kernel unwinder broken with stock tools"
      
      Fixes: 2e22502c ARC: dw2 unwind: Remove falllback linear search thru FDE entries
      Reported-by Alexey Brodkin <abrodkin@synopsys.com>
      Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      03831428
    • Vineet Gupta's avatar
      ARC: dw2 unwind: Reinstante unwinding out of modules · ee744eaf
      Vineet Gupta authored
      commit bc79c9a7 upstream.
      
      The fix which removed linear searching of dwarf (because binary lookup
      data always exists) missed out on the fact that modules don't get the
      binary lookup tables info. This caused unwinding out of modules to stop
      working.
      
      So add binary lookup header setup (equivalent of eh_frame_hdr setup) to
      modules as well.
      
      While at it, confine the header setup to within unwinder code,
      reducing one API exposed out of unwinder code.
      
      Fixes: 2e22502c ARC: dw2 unwind: Remove falllback linear search thru FDE entries
      Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ee744eaf
    • Steven Rostedt (Red Hat)'s avatar
      ftrace/scripts: Have recordmcount copy the object file · e97590e0
      Steven Rostedt (Red Hat) authored
      commit a50bd439 upstream.
      
      Russell King found that he had weird side effects when compiling the kernel
      with hard linked ccache. The reason was that recordmcount modified the
      kernel in place via mmap, and when a file gets modified twice by
      recordmcount, it will complain about it. To fix this issue, Russell wrote a
      patch that checked if the file was hard linked more than once and would
      unlink it if it was.
      
      Linus Torvalds was not happy with the fact that recordmcount does this in
      place modification. Instead of doing the unlink only if the file has two or
      more hard links, it does the unlink all the time. In otherwords, it always
      does a copy if it changed something. That is, it does the write out if a
      change was made.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      e97590e0
    • Daniel Mentz's avatar
      dma-debug: Fix dma_debug_entry offset calculation · 9108bc6d
      Daniel Mentz authored
      commit 0354aec1 upstream.
      
      dma-debug uses struct dma_debug_entry to keep track of dma coherent
      memory allocation requests. The virtual address is converted into a pfn
      and an offset. Previously, the offset was calculated using an incorrect
      bit mask.  As a result, we saw incorrect error messages from dma-debug
      like the following:
      
      "DMA-API: exceeded 7 overlapping mappings of cacheline 0x03e00000"
      
      Cacheline 0x03e00000 does not exist on our platform.
      
      Fixes: 0abdd7a8 ("dma-debug: introduce debug_dma_assert_idle()")
      Signed-off-by: default avatarDaniel Mentz <danielmentz@google.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      9108bc6d
    • Russell King's avatar
      scripts: recordmcount: break hardlinks · 33debb3f
      Russell King authored
      commit dd39a265 upstream.
      
      recordmcount edits the file in-place, which can cause problems when
      using ccache in hardlink mode.  Arrange for recordmcount to break a
      hardlinked object.
      
      Link: http://lkml.kernel.org/r/E1a7MVT-0000et-62@rmk-PC.arm.linux.org.ukSigned-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      33debb3f
    • Johan Hovold's avatar
      spi: fix parent-device reference leak · 7329eb12
      Johan Hovold authored
      commit 157f38f9 upstream.
      
      Fix parent-device reference leak due to SPI-core taking an unnecessary
      reference to the parent when allocating the master structure, a
      reference that was never released.
      
      Note that driver core takes its own reference to the parent when the
      master device is registered.
      
      Fixes: 49dce689 ("spi doesn't need class_device")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      7329eb12
    • Takashi Iwai's avatar
      ALSA: hda - Add a fixup for Thinkpad X1 Carbon 2nd · c999f74c
      Takashi Iwai authored
      commit b6903c0e upstream.
      
      Apply the same fixup for Thinkpad with dock to Thinkpad X1 Carbon 2nd,
      too.  This reduces the annoying loud cracking noise problem, as well
      as the support of missing docking port.
      
      Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=958439Reported-and-tested-by: default avatarBenjamin Poirier <bpoirier@suse.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      c999f74c
    • Anson Huang's avatar
      ARM: 8471/1: need to save/restore arm register(r11) when it is corrupted · 8e46fbac
      Anson Huang authored
      commit fa0708b3 upstream.
      
      In cpu_v7_do_suspend routine, r11 is used while it is NOT
      saved/restored, different compiler may have different usage
      of ARM general registers, so it may cause issues during
      calling cpu_v7_do_suspend.
      
      We meet kernel fault occurs when using GCC 4.8.3, r11 contains
      valid value before calling into cpu_v7_do_suspend, but when returned
      from this routine, r11 is corrupted and lead to kernel fault.
      Doing save/restore for those corrupted registers is a must in
      assemble code.
      Signed-off-by: default avatarAnson Huang <Anson.Huang@freescale.com>
      Reviewed-by: default avatarNicolas Pitre <nico@linaro.org>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8e46fbac
    • Krzysztof Hałasa's avatar
      ARM: dts: imx6: Fix Ethernet PHY mode on Ventana boards · 839e856f
      Krzysztof Hałasa authored
      commit 3a35e470 upstream.
      
      Gateworks Ventana boards seem to need "RGMII-ID" (internal delay)
      PHY mode, instead of simple "RGMII", for their Marvell 88E1510
      transceiver. Otherwise, the Ethernet MAC doesn't work with Marvell PHY
      driver (TX doesn't seem to work correctly).
      
      Tested on GW5400 rev. C.
      
      This bug affects ARM Fedora 23.
      Signed-off-by: default avatarKrzysztof Hałasa <khalasa@piap.pl>
      Acked-by: default avatarTim Harvey <tharvey@gateworks.com>
      Signed-off-by: default avatarShawn Guo <shawnguo@kernel.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      839e856f
    • Anssi Hannula's avatar
      ALSA: usb-audio: Add a more accurate volume quirk for AudioQuest DragonFly · df86d60e
      Anssi Hannula authored
      commit 42e3121d upstream.
      
      AudioQuest DragonFly DAC reports a volume control range of 0..50
      (0x0000..0x0032) which in USB Audio means a range of 0 .. 0.2dB, which
      is obviously incorrect and would cause software using the dB information
      in e.g. volume sliders to have a massive volume difference in 100..102%
      range.
      
      Commit 2d1cb7f6 ("ALSA: usb-audio: add dB range mapping for some
      devices") added a dB range mapping for it with range 0..50 dB.
      
      However, the actual volume mapping seems to be neither linear volume nor
      linear dB scale, but instead quite close to the cubic mapping e.g.
      alsamixer uses, with a range of approx. -53...0 dB.
      
      Replace the previous quirk with a custom dB mapping based on some basic
      output measurements, using a 10-item range TLV (which will still fit in
      alsa-lib MAX_TLV_RANGE_SIZE).
      
      Tested on AudioQuest DragonFly HW v1.2. The quirk is only applied if the
      range is 0..50, so if this gets fixed/changed in later HW revisions it
      will no longer be applied.
      
      v2: incorporated Takashi Iwai's suggestion for the quirk application
      method
      Signed-off-by: default avatarAnssi Hannula <anssi.hannula@iki.fi>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      df86d60e
    • Thomas Gleixner's avatar
      genirq: Prevent chip buslock deadlock · 6d463ade
      Thomas Gleixner authored
      commit abc7e40c upstream.
      
      If a interrupt chip utilizes chip->buslock then free_irq() can
      deadlock in the following way:
      
      CPU0				CPU1
      				interrupt(X) (Shared or spurious)
      free_irq(X)			interrupt_thread(X)
      chip_bus_lock(X)
      				   irq_finalize_oneshot(X)
      				     chip_bus_lock(X)
      synchronize_irq(X)
      
      synchronize_irq() waits for the interrupt thread to complete,
      i.e. forever.
      
      Solution is simple: Drop chip_bus_lock() before calling
      synchronize_irq() as we do with the irq_desc lock. There is nothing to
      be protected after the point where irq_desc lock has been released.
      
      This adds chip_bus_lock/unlock() to the remove_irq() code path, but
      that's actually correct in the case where remove_irq() is called on
      such an interrupt. The current users of remove_irq() are not affected
      as none of those interrupts is on a chip which requires buslock.
      Reported-by: default avatarFredrik Markström <fredrik.markstrom@gmail.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      6d463ade
    • Peter Hurley's avatar
      tty: Fix GPF in flush_to_ldisc() · 7f70c1f4
      Peter Hurley authored
      commit 9ce119f3 upstream.
      
      A line discipline which does not define a receive_buf() method can
      can cause a GPF if data is ever received [1]. Oddly, this was known
      to the author of n_tracesink in 2011, but never fixed.
      
      [1] GPF report
          BUG: unable to handle kernel NULL pointer dereference at           (null)
          IP: [<          (null)>]           (null)
          PGD 3752d067 PUD 37a7b067 PMD 0
          Oops: 0010 [#1] SMP KASAN
          Modules linked in:
          CPU: 2 PID: 148 Comm: kworker/u10:2 Not tainted 4.4.0-rc2+ #51
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          Workqueue: events_unbound flush_to_ldisc
          task: ffff88006da94440 ti: ffff88006db60000 task.ti: ffff88006db60000
          RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
          RSP: 0018:ffff88006db67b50  EFLAGS: 00010246
          RAX: 0000000000000102 RBX: ffff88003ab32f88 RCX: 0000000000000102
          RDX: 0000000000000000 RSI: ffff88003ab330a6 RDI: ffff88003aabd388
          RBP: ffff88006db67c48 R08: ffff88003ab32f9c R09: ffff88003ab31fb0
          R10: ffff88003ab32fa8 R11: 0000000000000000 R12: dffffc0000000000
          R13: ffff88006db67c20 R14: ffffffff863df820 R15: ffff88003ab31fb8
          FS:  0000000000000000(0000) GS:ffff88006dc00000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
          CR2: 0000000000000000 CR3: 0000000037938000 CR4: 00000000000006e0
          Stack:
           ffffffff829f46f1 ffff88006da94bf8 ffff88006da94bf8 0000000000000000
           ffff88003ab31fb0 ffff88003aabd438 ffff88003ab31ff8 ffff88006430fd90
           ffff88003ab32f9c ffffed0007557a87 1ffff1000db6cf78 ffff88003ab32078
          Call Trace:
           [<ffffffff8127cf91>] process_one_work+0x8f1/0x17a0 kernel/workqueue.c:2030
           [<ffffffff8127df14>] worker_thread+0xd4/0x1180 kernel/workqueue.c:2162
           [<ffffffff8128faaf>] kthread+0x1cf/0x270 drivers/block/aoe/aoecmd.c:1302
           [<ffffffff852a7c2f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:468
          Code:  Bad RIP value.
          RIP  [<          (null)>]           (null)
           RSP <ffff88006db67b50>
          CR2: 0000000000000000
          ---[ end trace a587f8947e54d6ea ]---
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarPeter Hurley <peter@hurleysoftware.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      7f70c1f4
    • Peter Hurley's avatar
      n_tty: Fix poll() after buffer-limited eof push read · c6b31270
      Peter Hurley authored
      commit ac8f3bf8 upstream.
      
      commit 40d5e090 ("n_tty: Fix EOF push handling") fixed EOF push
      for reads. However, that approach still allows a condition mismatch
      between poll() and read(), where poll() returns POLLIN but read()
      blocks. This state can happen when a previous read() returned because
      the user buffer was full and the next character was an EOF not at the
      beginning of the line. While the next read() will properly identify
      the condition and advance the read buffer tail without improperly
      indicating an EOF file condition (ie., read() will not mistakenly
      return 0), poll() will mistakenly indicate POLLIN.
      
      Although a possible solution would be to peek at the input buffer
      in n_tty_poll(), the better solution in this patch is to eat the
      EOF during the previous read() (ie., fix the problem by eliminating
      the condition).
      
      The current canon line buffer copy limits the scan for next end-of-line
      to the smaller of either,
         a. the remaining user buffer size
         b. completed lines in the input buffer
      When the remaining user buffer size is exactly one less than the
      end-of-line marked by EOF push, the EOF is not scanned nor skipped
      but left for subsequent reads. In the example below, the scan
      index 'eol' has stopped at the EOF because it is past the scan
      limit of 5 (not because it has found the next set bit in read_flags)
      
         user buffer [*nr = 5]    _ _ _ _ _
      
         read_flags               0 0 0 0 0   1
         input buffer             h e l l o [EOF]
                                  ^           ^
                                 /           /
                               tail        eol
      
         result: found = 0, tail += 5, *nr += 5
      
      Instead, allow the scan to peek ahead 1 byte (while still limiting the
      scan to completed lines in the input buffer). For the example above,
      
         result: found = 1, tail += 6, *nr += 5
      
      Because the scan limit is now bumped +1 byte, when the scan is
      completed, the tail advance and the user buffer copy limit is
      re-clamped to *nr when EOF is _not_ found.
      
      Fixes: 40d5e090 ("n_tty: Fix EOF push handling")
      Signed-off-by: default avatarPeter Hurley <peter@hurleysoftware.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      c6b31270
    • Mans Rullgard's avatar
      ASoC: wm8974: set cache type for regmap · 18f60cb2
      Mans Rullgard authored
      commit 1ea5998a upstream.
      
      Attempting to use this codec driver triggers a BUG() in regcache_sync()
      since no cache type is set.  The register map of this device is fairly
      small and has few holes so a flat cache is suitable.
      Signed-off-by: default avatarMans Rullgard <mans@mansr.com>
      Acked-by: default avatarCharles Keepax <ckeepax@opensource.wolfsonmicro.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      18f60cb2
    • Dmitry V. Levin's avatar
      sh64: fix __NR_fgetxattr · 3299b3e3
      Dmitry V. Levin authored
      commit 2d33fa10 upstream.
      
      According to arch/sh/kernel/syscalls_64.S and common sense, __NR_fgetxattr
      has to be defined to 259, but it doesn't.  Instead, it's defined to 269,
      which is of course used by another syscall, __NR_sched_setaffinity in this
      case.
      
      This bug was found by strace test suite.
      Signed-off-by: default avatarDmitry V. Levin <ldv@altlinux.org>
      Acked-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      3299b3e3
    • Junxiao Bi's avatar
      ocfs2: fix SGID not inherited issue · 323c78bc
      Junxiao Bi authored
      commit 854ee2e9 upstream.
      
      Commit 8f1eb487 ("ocfs2: fix umask ignored issue") introduced an
      issue, SGID of sub dir was not inherited from its parents dir.  It is
      because SGID is set into "inode->i_mode" in ocfs2_get_init_inode(), but
      is overwritten by "mode" which don't have SGID set later.
      
      Fixes: 8f1eb487 ("ocfs2: fix umask ignored issue")
      Signed-off-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Acked-by: default avatarSrinivas Eeda <srinivas.eeda@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      323c78bc
    • Seth Jennings's avatar
      drivers/base/memory.c: prohibit offlining of memory blocks with missing sections · 26178a12
      Seth Jennings authored
      commit 26bbe7ef upstream.
      
      Commit bdee237c ("x86: mm: Use 2GB memory block size on large-memory
      x86-64 systems") and 982792c7 ("x86, mm: probe memory block size for
      generic x86 64bit") introduced large block sizes for x86.  This made it
      possible to have multiple sections per memory block where previously,
      there was a only every one section per block.
      
      Since blocks consist of contiguous ranges of section, there can be holes
      in the blocks where sections are not present.  If one attempts to
      offline such a block, a crash occurs since the code is not designed to
      deal with this.
      
      This patch is a quick fix to gaurd against the crash by not allowing
      blocks with non-present sections to be offlined.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=107781Signed-off-by: default avatarSeth Jennings <sjennings@variantweb.net>
      Reported-by: default avatarAndrew Banman <abanman@sgi.com>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Greg KH <greg@kroah.com>
      Cc: Russ Anderson <rja@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      26178a12
    • Naoya Horiguchi's avatar
      mm: hugetlb: call huge_pte_alloc() only if ptep is null · 42553197
      Naoya Horiguchi authored
      commit 0d777df5 upstream.
      
      Currently at the beginning of hugetlb_fault(), we call huge_pte_offset()
      and check whether the obtained *ptep is a migration/hwpoison entry or
      not.  And if not, then we get to call huge_pte_alloc().  This is racy
      because the *ptep could turn into migration/hwpoison entry after the
      huge_pte_offset() check.  This race results in BUG_ON in
      huge_pte_alloc().
      
      We don't have to call huge_pte_alloc() when the huge_pte_offset()
      returns non-NULL, so let's fix this bug with moving the code into else
      block.
      
      Note that the *ptep could turn into a migration/hwpoison entry after
      this block, but that's not a problem because we have another
      !pte_present check later (we never go into hugetlb_no_page() in that
      case.)
      
      Fixes: 290408d4 ("hugetlb: hugepage migration core")
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: default avatarHillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      42553197
    • Michal Hocko's avatar
      mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress · 89040e36
      Michal Hocko authored
      commit 373ccbe5 upstream.
      
      Tetsuo Handa has reported that the system might basically livelock in
      OOM condition without triggering the OOM killer.
      
      The issue is caused by internal dependency of the direct reclaim on
      vmstat counter updates (via zone_reclaimable) which are performed from
      the workqueue context.  If all the current workers get assigned to an
      allocation request, though, they will be looping inside the allocator
      trying to reclaim memory but zone_reclaimable can see stalled numbers so
      it will consider a zone reclaimable even though it has been scanned way
      too much.  WQ concurrency logic will not consider this situation as a
      congested workqueue because it relies that worker would have to sleep in
      such a situation.  This also means that it doesn't try to spawn new
      workers or invoke the rescuer thread if the one is assigned to the
      queue.
      
      In order to fix this issue we need to do two things.  First we have to
      let wq concurrency code know that we are in trouble so we have to do a
      short sleep.  In order to prevent from issues handled by 0e093d99
      ("writeback: do not sleep on the congestion queue if there are no
      congested BDIs or if significant congestion is not being encountered in
      the current zone") we limit the sleep only to worker threads which are
      the ones of the interest anyway.
      
      The second thing to do is to create a dedicated workqueue for vmstat and
      mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to
      have a spare worker thread for it.
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reported-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Cristopher Lameter <clameter@sgi.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Arkadiusz Miskiewicz <arekm@maven.pl>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [ kamal: backport to 4.2-stable: use queue_delayed_work() in vmstat_update ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      89040e36
    • Christoph Lameter's avatar
      vmstat: Reduce time interval to stat update on idle cpu · 3349aa40
      Christoph Lameter authored
      commit 57c2e36b upstream.
      
      It was noted that the vm stat shepherd runs every 2 seconds and that the
      vmstat update is then scheduled 2 seconds in the future.
      
      This yields an interval of double the time interval which is not desired.
      
      Change the shepherd so that it does not delay the vmstat update on the
      other cpu.  We stil have to use schedule_delayed_work since we are using a
      delayed_work_struct but we can set the delay to 0.
      Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: Vinayak Menon <vinmenon@codeaurora.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [ kamal: 3.19-stable prereq for
        373ccbe5 mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      3349aa40
    • Naoya Horiguchi's avatar
      mm: hugetlb: fix hugepage memory leak caused by wrong reserve count · 9f631c00
      Naoya Horiguchi authored
      commit a88c7695 upstream.
      
      When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back on
      alloc_buddy_huge_page() to directly create a hugepage from the buddy
      allocator.
      
      In that case, however, if alloc_buddy_huge_page() succeeds we don't
      decrement h->resv_huge_pages, which means that successful
      hugetlb_fault() returns without releasing the reserve count.  As a
      result, subsequent hugetlb_fault() might fail despite that there are
      still free hugepages.
      
      This patch simply adds decrementing code on that code path.
      
      I reproduced this problem when testing v4.3 kernel in the following situation:
       - the test machine/VM is a NUMA system,
       - hugepage overcommiting is enabled,
       - most of hugepages are allocated and there's only one free hugepage
         which is on node 0 (for example),
       - another program, which calls set_mempolicy(MPOL_BIND) to bind itself to
         node 1, tries to allocate a hugepage,
       - the allocation should fail but the reserve count is still hold.
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [ luis: backported to 3.16:
        - use 'chg' instead of 'gbl_chg'
        - adjusted context ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      9f631c00
    • Mikulas Patocka's avatar
      parisc iommu: fix panic due to trying to allocate too large region · e8fea463
      Mikulas Patocka authored
      commit e46e31a3 upstream.
      
      When using the Promise TX2+ SATA controller on PA-RISC, the system often
      crashes with kernel panic, for example just writing data with the dd
      utility will make it crash.
      
      Kernel panic - not syncing: drivers/parisc/sba_iommu.c: I/O MMU @ 000000000000a000 is out of mapping resources
      
      CPU: 0 PID: 18442 Comm: mkspadfs Not tainted 4.4.0-rc2 #2
      Backtrace:
       [<000000004021497c>] show_stack+0x14/0x20
       [<0000000040410bf0>] dump_stack+0x88/0x100
       [<000000004023978c>] panic+0x124/0x360
       [<0000000040452c18>] sba_alloc_range+0x698/0x6a0
       [<0000000040453150>] sba_map_sg+0x260/0x5b8
       [<000000000c18dbb4>] ata_qc_issue+0x264/0x4a8 [libata]
       [<000000000c19535c>] ata_scsi_translate+0xe4/0x220 [libata]
       [<000000000c19a93c>] ata_scsi_queuecmd+0xbc/0x320 [libata]
       [<0000000040499bbc>] scsi_dispatch_cmd+0xfc/0x130
       [<000000004049da34>] scsi_request_fn+0x6e4/0x970
       [<00000000403e95a8>] __blk_run_queue+0x40/0x60
       [<00000000403e9d8c>] blk_run_queue+0x3c/0x68
       [<000000004049a534>] scsi_run_queue+0x2a4/0x360
       [<000000004049be68>] scsi_end_request+0x1a8/0x238
       [<000000004049de84>] scsi_io_completion+0xfc/0x688
       [<0000000040493c74>] scsi_finish_command+0x17c/0x1d0
      
      The cause of the crash is not exhaustion of the IOMMU space, there is
      plenty of free pages. The function sba_alloc_range is called with size
      0x11000, thus the pages_needed variable is 0x11. The function
      sba_search_bitmap is called with bits_wanted 0x11 and boundary size is
      0x10 (because dma_get_seg_boundary(dev) returns 0xffff).
      
      The function sba_search_bitmap attempts to allocate 17 pages that must not
      cross 16-page boundary - it can't satisfy this requirement
      (iommu_is_span_boundary always returns true) and fails even if there are
      many free entries in the IOMMU space.
      
      How did it happen that we try to allocate 17 pages that don't cross
      16-page boundary? The cause is in the function iommu_coalesce_chunks. This
      function tries to coalesce adjacent entries in the scatterlist. The
      function does several checks if it may coalesce one entry with the next,
      one of those checks is this:
      
      	if (startsg->length + dma_len > max_seg_size)
      		break;
      
      When it finishes coalescing adjacent entries, it allocates the mapping:
      
      sg_dma_len(contig_sg) = dma_len;
      dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
      sg_dma_address(contig_sg) =
      	PIDE_FLAG
      	| (iommu_alloc_range(ioc, dev, dma_len) << IOVP_SHIFT)
      	| dma_offset;
      
      It is possible that (startsg->length + dma_len > max_seg_size) is false
      (we are just near the 0x10000 max_seg_size boundary), so the funcion
      decides to coalesce this entry with the next entry. When the coalescing
      succeeds, the function performs
      	dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
      And now, because of non-zero dma_offset, dma_len is greater than 0x10000.
      iommu_alloc_range (a pointer to sba_alloc_range) is called and it attempts
      to allocate 17 pages for a device that must not cross 16-page boundary.
      
      To fix the bug, we must make sure that dma_len after addition of
      dma_offset and alignment doesn't cross the segment boundary. I.e. change
      	if (startsg->length + dma_len > max_seg_size)
      		break;
      to
      	if (ALIGN(dma_len + dma_offset + startsg->length, IOVP_SIZE) > max_seg_size)
      		break;
      
      This patch makes this change (it precalculates max_seg_boundary at the
      beginning of the function iommu_coalesce_chunks). I also added a check
      that the mapping length doesn't exceed dma_get_seg_boundary(dev) (it is
      not needed for Promise TX2+ SATA, but it may be needed for other devices
      that have dma_get_seg_boundary lower than dma_get_max_seg_size).
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      e8fea463
    • Prarit Bhargava's avatar
      powercap / RAPL: fix BIOS lock check · 67a5e079
      Prarit Bhargava authored
      commit 79a21dbf upstream.
      
      Intel RAPL initialized on several systems where the BIOS lock bit (msr
      0x610, bit 63) was set.  This occured because the return value of
      rapl_read_data_raw() was being checked, rather than the value of the variable
      passed in, locked.
      
      This patch properly implments the rapl_read_data_raw() call to check the
      variable locked, and now the Intel RAPL driver outputs the warning:
      
      	intel_rapl: RAPL package 0 domain package locked by BIOS
      
      and does not initialize for the package.
      Signed-off-by: default avatarPrarit Bhargava <prarit@redhat.com>
      Acked-by: default avatarJacob Pan <jacob.jun.pan@linux.intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      67a5e079
    • Alan Stern's avatar
      USB: add quirk for devices with broken LPM · 8fa6c4cc
      Alan Stern authored
      commit ad87e032 upstream.
      
      Some USB device / host controller combinations seem to have problems
      with Link Power Management.  For example, Steinar found that his xHCI
      controller wouldn't handle bandwidth calculations correctly for two
      video cards simultaneously when LPM was enabled, even though the bus
      had plenty of bandwidth available.
      
      This patch introduces a new quirk flag for devices that should remain
      disabled for LPM, and creates quirk entries for Steinar's devices.
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Reported-by: default avatarSteinar H. Gunderson <sgunderson@bigfoot.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8fa6c4cc
    • Mathias Nyman's avatar
      xhci: fix usb2 resume timing and races. · 8f367951
      Mathias Nyman authored
      commit f69115fd upstream.
      
      According to USB 2 specs ports need to signal resume for at least 20ms,
      in practice even longer, before moving to U0 state.
      Both host and devices can initiate resume.
      
      On device initiated resume, a port status interrupt with the port in resume
      state in issued. The interrupt handler tags a resume_done[port]
      timestamp with current time + USB_RESUME_TIMEOUT, and kick roothub timer.
      Root hub timer requests for port status, finds the port in resume state,
      checks if resume_done[port] timestamp passed, and set port to U0 state.
      
      On host initiated resume, current code sets the port to resume state,
      sleep 20ms, and finally sets the port to U0 state. This should also
      be changed to work in a similar way as the device initiated resume, with
      timestamp tagging, but that is not yet tested and will be a separate
      fix later.
      
      There are a few issues with this approach
      
      1. A host initiated resume will also generate a resume event. The event
         handler will find the port in resume state, believe it's a device
         initiated resume, and act accordingly.
      
      2. A port status request might cut the resume signalling short if a
         get_port_status request is handled during the host resume signalling.
         The port will be found in resume state. The timestamp is not set leading
         to time_after_eq(jiffies, timestamp) returning true, as timestamp = 0.
         get_port_status will proceed with moving the port to U0.
      
      3. If an error, or anything else happens to the port during device
         initiated resume signalling it will leave all the device resume
         parameters hanging uncleared, preventing further suspend, returning
         -EBUSY, and cause the pm thread to busyloop trying to enter suspend.
      
      Fix this by using the existing resuming_ports bitfield to indicate that
      resume signalling timing is taken care of.
      Check if the resume_done[port] is set before using it for timestamp
      comparison, and also clear out any resume signalling related variables
      if port is not in U0 or Resume state
      
      This issue was discovered when a PM thread busylooped, trying to runtime
      suspend the xhci USB 2 roothub on a Dell XPS
      Reported-by: default avatarDaniel J Blueman <daniel@quora.org>
      Tested-by: default avatarDaniel J Blueman <daniel@quora.org>
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8f367951
    • James Bottomley's avatar
      ses: fix additional element traversal bug · f217c967
      James Bottomley authored
      commit 5e103356 upstream.
      
      KASAN found that our additional element processing scripts drop off
      the end of the VPD page into unallocated space.  The reason is that
      not every element has additional information but our traversal
      routines think they do, leading to them expecting far more additional
      information than is present.  Fix this by adding a gate to the
      traversal routine so that it only processes elements that are expected
      to have additional information (list is in SES-2 section 6.1.13.1:
      Additional Element Status diagnostic page overview)
      Reported-by: default avatarPavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Tested-by: default avatarPavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Signed-off-by: default avatarJames Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      f217c967
    • Stefan Agner's avatar
      ARM: dts: vf610: use reset values for L2 cache latencies · 42ec3a62
      Stefan Agner authored
      commit 9c171905 upstream.
      
      Linux on Vybrid used several different L2 latencies so far, none
      of them seem to be the right ones. According to the application note
      AN4947 ("Understanding Vybrid Architecture"), the tag portion runs
      on CPU clock and is inside the L2 cache controller, whereas the data
      portion is stored in the external SRAM running on platform clock.
      Hence it is likely that the correct value requires a higher data
      latency then tag latency.
      
      These are the values which have been used so far:
      - The mainline values:
        arm,data-latency = <1 1 1>;
        arm,tag-latency = <2 2 2>;
        Those values have lead to problems on higher clocks. They look
        like a poor translation from the reset values (missing +1 offset
        and a mix up between tag/latency values).
      - The Linux 3.0 (SoC vendor BSP) values (converted to DT notation):
        arm,data-latency = <4 2 3>
        arm,tag-latency = <4 2 3>
        The cache initialization function along with the value matches the
        i.MX6 code from the same kernel, so it seems that those values have
        just been copied.
      - The Colibri values:
        arm,data-latency = <2 1 2>;
        arm,tag-latency = <3 2 3>;
        Those were a mix between the values of the Linux 3.0 based BSP and
        the mainline values above.
      - The SoC Reset values (converted to DT notation):
        arm,data-latency = <3 3 3>;
        arm,tag-latency = <2 2 2>;
      
      So far there is no official statement on what the correct values are.
      See also the related Freescale community thread:
      https://community.freescale.com/message/579785#579785
      
      For now, the reset values seem to be the best bet. Remove all other
      "bogus" values and use the reset value on vf610.dtsi level.
      Signed-off-by: default avatarStefan Agner <stefan@agner.ch>
      Signed-off-by: default avatarShawn Guo <shawnguo@kernel.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      42ec3a62
    • Peter Ujfalusi's avatar
      ASoC: davinci-mcasp: Fix XDATA check in mcasp_start_tx · 3c195497
      Peter Ujfalusi authored
      commit e2a0c9fa upstream.
      
      The condition for checking for XDAT being cleared was not correct.
      
      Fixes: 36bcecd0 ("ASoC: davinci-mcasp: Correct TX start sequence")
      Signed-off-by: default avatarPeter Ujfalusi <peter.ujfalusi@ti.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      3c195497
    • Kirill A. Shutemov's avatar
      vgaarb: fix signal handling in vga_get() · 9521297b
      Kirill A. Shutemov authored
      commit 9f5bd308 upstream.
      
      There are few defects in vga_get() related to signal hadning:
      
        - we shouldn't check for pending signals for TASK_UNINTERRUPTIBLE
          case;
      
        - if we found pending signal we must remove ourself from wait queue
          and change task state back to running;
      
        - -ERESTARTSYS is more appropriate, I guess.
      Signed-off-by: default avatarKirill A. Shutemov <kirill@shutemov.name>
      Reviewed-by: default avatarDavid Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      9521297b
    • James Bottomley's avatar
      ses: Fix problems with simple enclosures · 8e937e7f
      James Bottomley authored
      commit 3417c1b5 upstream.
      
      Simple enclosure implementations (mostly USB) are allowed to return only
      page 8 to every diagnostic query.  That really confuses our
      implementation because we assume the return is the page we asked for and
      end up doing incorrect offsets based on bogus information leading to
      accesses outside of allocated ranges.  Fix that by checking the page
      code of the return and giving an error if it isn't the one we asked for.
      This should fix reported bugs with USB storage by simply refusing to
      attach to enclosures that behave like this.  It's also good defensive
      practise now that we're starting to see more USB enclosures.
      Reported-by: default avatarAndrea Gelmini <andrea.gelmini@gelma.net>
      Reviewed-by: default avatarEwan D. Milne <emilne@redhat.com>
      Reviewed-by: default avatarTomas Henzl <thenzl@redhat.com>
      Signed-off-by: default avatarJames Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8e937e7f
    • Joe Thornber's avatar
      dm btree: fix bufio buffer leaks in dm_btree_del() error path · dcaff136
      Joe Thornber authored
      commit ed8b45a3 upstream.
      
      If dm_btree_del()'s call to push_frame() fails, e.g. due to
      btree_node_validator finding invalid metadata, the dm_btree_del() error
      path must unlock all frames (which have active dm-bufio buffers) that
      were pushed onto the del_stack.
      
      Otherwise, dm_bufio_client_destroy() will BUG_ON() because dm-bufio
      buffers have leaked, e.g.:
        device-mapper: bufio: leaked buffer 3, hold count 1, list 0
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      dcaff136
    • Johannes Berg's avatar
      rfkill: copy the name into the rfkill struct · 78ecb946
      Johannes Berg authored
      commit b7bb1100 upstream.
      
      Some users of rfkill, like NFC and cfg80211, use a dynamic name when
      allocating rfkill, in those cases dev_name(). Therefore, the pointer
      passed to rfkill_alloc() might not be valid forever, I specifically
      found the case that the rfkill name was quite obviously an invalid
      pointer (or at least garbage) when the wiphy had been renamed.
      
      Fix this by making a copy of the rfkill name in rfkill_alloc().
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      78ecb946
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSR · e091254a
      Paul Mackerras authored
      commit c20875a3 upstream.
      
      Currently it is possible for userspace (e.g. QEMU) to set a value
      for the MSR for a guest VCPU which has both of the TS bits set,
      which is an illegal combination.  The result of this is that when
      we execute a hrfid (hypervisor return from interrupt doubleword)
      instruction to enter the guest, the CPU will take a TM Bad Thing
      type of program interrupt (vector 0x700).
      
      Now, if PR KVM is configured in the kernel along with HV KVM, we
      actually handle this without crashing the host or giving hypervisor
      privilege to the guest; instead what happens is that we deliver a
      program interrupt to the guest, with SRR0 reflecting the address
      of the hrfid instruction and SRR1 containing the MSR value at that
      point.  If PR KVM is not configured in the kernel, then we try to
      run the host's program interrupt handler with the MMU set to the
      guest context, which almost certainly causes a host crash.
      
      This closes the hole by making kvmppc_set_msr_hv() check for the
      illegal combination and force the TS field to a safe value (00,
      meaning non-transactional).
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      e091254a
    • John Keeping's avatar
      ASoC: es8328: Fix deemphasis values · a41c44c8
      John Keeping authored
      commit 84ebac4d upstream.
      
      This is using completely the wrong mask and value when updating the
      register.  Since the correct values are already defined in the header,
      switch to using a table with explicit constants rather than shifting the
      array index.
      Signed-off-by: default avatarJohn Keeping <john@metanate.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      a41c44c8
    • Jan Stancek's avatar
      ipmi: move timer init to before irq is setup · 31764cd9
      Jan Stancek authored
      commit 27f972d3 upstream.
      
      We encountered a panic on boot in ipmi_si on a dell per320 due to an
      uninitialized timer as follows.
      
      static int smi_start_processing(void       *send_info,
                                      ipmi_smi_t intf)
      {
              /* Try to claim any interrupts. */
              if (new_smi->irq_setup)
                      new_smi->irq_setup(new_smi);
      
       --> IRQ arrives here and irq handler tries to modify uninitialized timer
      
          which triggers BUG_ON(!timer->function) in __mod_timer().
      
       Call Trace:
         <IRQ>
         [<ffffffffa0532617>] start_new_msg+0x47/0x80 [ipmi_si]
         [<ffffffffa053269e>] start_check_enables+0x4e/0x60 [ipmi_si]
         [<ffffffffa0532bd8>] smi_event_handler+0x1e8/0x640 [ipmi_si]
         [<ffffffff810f5584>] ? __rcu_process_callbacks+0x54/0x350
         [<ffffffffa053327c>] si_irq_handler+0x3c/0x60 [ipmi_si]
         [<ffffffff810efaf0>] handle_IRQ_event+0x60/0x170
         [<ffffffff810f245e>] handle_edge_irq+0xde/0x180
         [<ffffffff8100fc59>] handle_irq+0x49/0xa0
         [<ffffffff8154643c>] do_IRQ+0x6c/0xf0
         [<ffffffff8100ba53>] ret_from_intr+0x0/0x11
      
              /* Set up the timer that drives the interface. */
              setup_timer(&new_smi->si_timer, smi_timeout, (long)new_smi);
      
      The following patch fixes the problem.
      
      To: Openipmi-developer@lists.sourceforge.net
      To: Corey Minyard <minyard@acm.org>
      CC: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarJan Stancek <jstancek@redhat.com>
      Signed-off-by: default avatarTony Camuso <tcamuso@redhat.com>
      Signed-off-by: default avatarCorey Minyard <cminyard@mvista.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      31764cd9
    • Joe Thornber's avatar
      dm space map metadata: fix ref counting bug when bootstrapping a new space map · 0b5f23b5
      Joe Thornber authored
      commit 50dd842a upstream.
      
      When applying block operations (BOPs) do not remove them from the
      uncommitted BOP ring-buffer until after they've been applied -- in case
      we recurse.
      
      Also, perform BOP_INC operation, in dm_sm_metadata_create() and
      sm_metadata_extend(), in terms of the uncommitted BOP ring-buffer rather
      than using direct calls to sm_ll_inc().
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      0b5f23b5
    • Joe Thornber's avatar
      dm thin metadata: fix bug when taking a metadata snapshot · 589850d3
      Joe Thornber authored
      commit 49e99fc7 upstream.
      
      When you take a metadata snapshot the btree roots for the mapping and
      details tree need to have their reference counts incremented so they
      persist for the lifetime of the metadata snap.
      
      The roots being incremented were those currently written in the
      superblock, which could possibly be out of date if concurrent IO is
      triggering new mappings, breaking of sharing, etc.
      
      Fix this by performing a commit with the metadata lock held while taking
      a metadata snapshot.
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      589850d3