1. 25 Jan, 2016 39 commits
    • Steven Rostedt (Red Hat)
      ftrace/module: Call clean up function when module init fails early · 2d6a0dc1
      Steven Rostedt (Red Hat) authored
      commit 049fb9bd upstream.
      
      If the module init code fails after calling ftrace_module_init() and before
      calling do_init_module(), we can suffer from a memory leak. This is because
      ftrace_module_init() allocates pages to store the locations that ftrace
      hooks are placed in the module text. If do_init_module() fails, it still
      calls the MODULE_GOING notifiers which will tell ftrace to do a clean up of
      the pages it allocated for the module. But if load_module() fails before
      then, the pages allocated by ftrace_module_init() will never be freed.
      
      Call ftrace_release_mod() on the module if load_module() fails before
      getting to do_init_module().
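      The fix follows the usual kernel error-unwinding pattern: anything
      allocated before the failure point is released on the error path. A
      minimal userspace sketch of that pattern follows. The names
      fake_ftrace_module_init(), fake_ftrace_release_mod() and
      load_module_sketch() are hypothetical stand-ins, not kernel functions.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the per-module state ftrace allocates. */
struct fake_module {
	void *ftrace_pages;	/* where ftrace hook locations would be recorded */
};

static int pages_live;	/* outstanding allocations, so leaks are visible */

static int fake_ftrace_module_init(struct fake_module *mod)
{
	mod->ftrace_pages = malloc(4096);
	if (!mod->ftrace_pages)
		return -1;
	pages_live++;
	return 0;
}

static void fake_ftrace_release_mod(struct fake_module *mod)
{
	if (mod->ftrace_pages) {
		free(mod->ftrace_pages);
		mod->ftrace_pages = NULL;
		pages_live--;
	}
}

/*
 * Model of load_module(): fail_early simulates an error after the ftrace
 * allocation but before do_init_module() (whose MODULE_GOING notifier
 * would otherwise do the cleanup).
 */
static int load_module_sketch(struct fake_module *mod, int fail_early)
{
	int err = fake_ftrace_module_init(mod);

	if (err)
		return err;

	if (fail_early) {
		err = -1;
		goto free_ftrace;
	}

	/* From here on, do_init_module() would own the cleanup. */
	return 0;

free_ftrace:
	/* Without this call the pages allocated above would leak. */
	fake_ftrace_release_mod(mod);
	return err;
}
```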
      
      Link: http://lkml.kernel.org/r/567CEA31.1070507@intel.com
      Reported-by: "Qiu, PeiyangX" <peiyangx.qiu@intel.com>
      Fixes: a949ae56 "ftrace/module: Hardcode ftrace_module_init() call into load_module()"
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • Roman Volkov
      dts: vt8500: Add SDHC node to DTS file for WM8650 · 7ab823bb
      Roman Volkov authored
      commit 0f090bf1 upstream.
      
      Since WM8650 has the same 'WMT' SDHC controller as WM8505, and the driver
      is already in the kernel, this node enables controller support for
      WM8650.
      Signed-off-by: Roman Volkov <rvolkov@v1ros.org>
      Reviewed-by: Alexey Charkov <alchark@gmail.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • NeilBrown
      async_tx: use GFP_NOWAIT rather than GFP_IO · 89341a82
      NeilBrown authored
      commit b02bab6b upstream.
      
      These async_XX functions are called from md/raid5 in an atomic
      section, between get_cpu() and put_cpu(), so they must not sleep.
      So use GFP_NOWAIT rather than GFP_IO.
      
      Dan Williams writes: Longer term async_tx needs to be merged into md
      directly as we can allocate this unmap data statically per-stripe
      rather than per request.
      
      Fixed: 7476bd79 ("async_pq: convert to dmaengine_unmap_data")
      Reported-and-tested-by: Stanislav Samsonov <slava@annapurnalabs.com>
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Vinod Koul <vinod.koul@intel.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • Qiu Peiyang
      tracing: Fix setting of start_index in find_next() · ed801ef1
      Qiu Peiyang authored
      commit f36d1be2 upstream.
      
      When we do cat /sys/kernel/debug/tracing/printk_formats, we hit a kernel
      panic in t_show():
      
      general protection fault: 0000 [#1] PREEMPT SMP
      CPU: 0 PID: 2957 Comm: sh Tainted: G W  O 3.14.55-x86_64-01062-gd4acdc7 #2
      RIP: 0010:[<ffffffff811375b2>]
       [<ffffffff811375b2>] t_show+0x22/0xe0
      RSP: 0000:ffff88002b4ebe80  EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
      RDX: 0000000000000004 RSI: ffffffff81fd26a6 RDI: ffff880032f9f7b1
      RBP: ffff88002b4ebe98 R08: 0000000000001000 R09: 000000000000ffec
      R10: 0000000000000000 R11: 000000000000000f R12: ffff880004d9b6c0
      R13: 7365725f6d706400 R14: ffff880004d9b6c0 R15: ffffffff82020570
      FS:  0000000000000000(0000) GS:ffff88003aa00000(0063) knlGS:00000000f776bc40
      CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
      CR2: 00000000f6c02ff0 CR3: 000000002c2b3000 CR4: 00000000001007f0
      Call Trace:
       [<ffffffff811dc076>] seq_read+0x2f6/0x3e0
       [<ffffffff811b749b>] vfs_read+0x9b/0x160
       [<ffffffff811b7f69>] SyS_read+0x49/0xb0
       [<ffffffff81a3a4b9>] ia32_do_call+0x13/0x13
       ---[ end trace 5bd9eb630614861e ]---
      Kernel panic - not syncing: Fatal exception
      
      The first time find_next calls find_next_mod_format, the latter should
      iterate over trace_bprintk_fmt_list to find the module's first print
      format. However, in the current code start_index is initially smaller
      than *pos, so the list is never iterated. The later container_of() then
      computes a wrong address from the previous v, which makes mod_fmt a
      meaningless object, and so is the returned mod_fmt->fmt.
      
      This patch fixes it by correcting start_index. With the fix, on the
      first call to find_next_mod_format, start_index equals *pos, and the
      code iterates over trace_bprintk_fmt_list to get the right module
      printk format, so the returned mod_fmt->fmt is correct too.
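      The index arithmetic can be modeled in userspace. This sketch assumes
      (it is not stated above) that two built-in format sections, counted by
      n_bprintk and n_tracepoint, precede the module list. find_next_model()
      and the region enum are hypothetical, and the module-list walk is
      reduced to a subtraction.

```c
/* Which region a seq_file position falls into, in this model. */
enum region { REGION_BPRINTK, REGION_TRACEPOINT, REGION_MODULE };

struct lookup {
	enum region region;
	long index;	/* offset within that region */
};

/*
 * Map pos onto the regions.  Before the module list is consulted,
 * start_index must equal the total number of built-in entries, so that
 * the first module entry is found exactly when pos == start_index.
 */
static struct lookup find_next_model(long pos, long n_bprintk, long n_tracepoint)
{
	struct lookup r;
	long start_index = n_bprintk;
	long last_index;

	if (pos < start_index) {
		r.region = REGION_BPRINTK;
		r.index = pos;
		return r;
	}

	last_index = start_index;
	start_index = n_tracepoint;

	if (pos < last_index + start_index) {
		r.region = REGION_TRACEPOINT;
		r.index = pos - last_index;
		return r;
	}

	/* The correction: without it start_index stays smaller than pos. */
	start_index += last_index;

	r.region = REGION_MODULE;
	r.index = pos - start_index;
	return r;
}
```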
      
      Link: http://lkml.kernel.org/r/5684B900.9000309@intel.com
      
      Fixes: 102c9323 "tracing: Add __tracepoint_string() to export string pointers"
      Signed-off-by: Qiu Peiyang <peiyangx.qiu@intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • Colin Ian King
      ftrace/scripts: Fix incorrect use of sprintf in recordmcount · c8255dfd
      Colin Ian King authored
      commit 713a3e4d upstream.
      
      Fix build warning:
      
      scripts/recordmcount.c:589:4: warning: format not a string
      literal and no format arguments [-Wformat-security]
          sprintf("%s: failed\n", file);
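      sprintf()'s first parameter is the destination buffer, so the call
      above passes the string literal as the buffer and file as the format
      string. A corrected sketch, assuming the intent was simply to format
      or print the message; format_failure() and report_failure() are
      hypothetical helpers:

```c
#include <stdio.h>

/*
 * Broken:  sprintf("%s: failed\n", file);
 *          (the literal is the destination, file is the format)
 * Fixed:   the format is a literal and file is its argument.
 */
static int format_failure(char *buf, size_t len, const char *file)
{
	return snprintf(buf, len, "%s: failed\n", file);
}

static void report_failure(const char *file)
{
	fprintf(stderr, "%s: failed\n", file);
}
```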
      
      Fixes: a50bd439 ("ftrace/scripts: Have recordmcount copy the object file")
      Link: http://lkml.kernel.org/r/1451516801-16951-1-git-send-email-colin.king@canonical.com
      
      Cc: Li Bin <huawei.libin@huawei.com>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • Andrew Banman
      mm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone() · cdfc80a4
      Andrew Banman authored
      commit 5f0f2887 upstream.
      
      test_pages_in_a_zone() does not account for the possibility of missing
      sections in the given pfn range.  pfn_valid_within always returns 1 when
      CONFIG_HOLES_IN_ZONE is not set, allowing invalid pfns from missing
      sections to pass the test, leading to a kernel oops.
      
      Wrap an additional pfn loop with PAGES_PER_SECTION granularity to check
      for missing sections before proceeding into the zone-check code.
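      The two-level loop can be sketched in userspace. This model is a
      simplification built on assumptions: PAGES_PER_SECTION is shrunk to 4,
      section presence lives in a hypothetical section_present[] array,
      zone_of[] stands in for looking up a page's zone, and missing sections
      are skipped rather than rejected (that choice is an assumption too).

```c
#include <stdbool.h>

#define PAGES_PER_SECTION 4UL	/* tiny value, for illustration only */
#define NR_SECTIONS 4UL

/* Hypothetical memory map: which sections exist, and each pfn's zone. */
static bool section_present[NR_SECTIONS];
static int zone_of[NR_SECTIONS * PAGES_PER_SECTION];	/* meaningful only in present sections */

/*
 * Returns true if every pfn of a present section in [start_pfn, end_pfn)
 * lies in one zone.  end_pfn must not exceed the modeled range.
 */
static bool test_pages_in_a_zone_model(unsigned long start_pfn,
				       unsigned long end_pfn)
{
	int zone = -1;
	unsigned long pfn = start_pfn;

	while (pfn < end_pfn) {
		unsigned long sec = pfn / PAGES_PER_SECTION;
		unsigned long sec_end = (sec + 1) * PAGES_PER_SECTION;

		/* Outer, section-granular check: never look inside a hole. */
		if (!section_present[sec]) {
			pfn = sec_end;
			continue;
		}

		/* Inner loop: the zone comparison, limited to this section. */
		for (; pfn < sec_end && pfn < end_pfn; pfn++) {
			if (zone == -1)
				zone = zone_of[pfn];
			else if (zone_of[pfn] != zone)
				return false;
		}
	}
	return true;
}
```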
      
      This also prevents a crash from offlining memory devices with missing
      sections.  Despite this, it may be a good idea to keep the related patch
      '[PATCH 3/3] drivers: memory: prohibit offlining of memory blocks with
      missing sections' because missing sections in a memory block may lead to
      other problems not covered by the scope of this fix.
      Signed-off-by: Andrew Banman <abanman@sgi.com>
      Acked-by: Alex Thorlton <athorlton@sgi.com>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Greg KH <greg@kroah.com>
      Cc: Seth Jennings <sjennings@variantweb.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • Nikesh Oswal
      ASoC: arizona: Fix bclk for sample rates that are multiple of 4kHz · 0f251e0f
      Nikesh Oswal authored
      commit e73694d8 upstream.
      
      For a sample rate of 12kHz the bclk was taken from the 44.1kHz table as
      we test for a multiple of 8kHz. This patch fixes this issue by testing
      for multiples of 4kHz instead.
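      The table choice reduces to a divisibility test: every rate in the
      48 kHz family (8k, 12k, 16k, 24k, 32k, 48k, 96k, ...) is a multiple of
      4 kHz, whereas 12 kHz is not a multiple of 8 kHz. uses_48k_table() is a
      hypothetical helper, not the driver's code:

```c
#include <stdbool.h>

/*
 * Decide which bclk table a sample rate belongs to.  The 44.1 kHz family
 * (11025, 22050, 44100, 88200, ...) is never a multiple of 4000.
 */
static bool uses_48k_table(unsigned int rate)
{
	return rate % 4000 == 0;	/* the old test used 8000 */
}
```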
      Signed-off-by: Nikesh Oswal <Nikesh.Oswal@cirrus.com>
      Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • Mario Kleiner
      ALSA: hda/realtek - Fix silent headphone output on MacPro 4,1 (v2) · dfdad38e
      Mario Kleiner authored
      commit 9f660a1c upstream.
      
      Without this patch, internal speaker and line-out work,
      but front headphone output jack stays silent on the
      Mac Pro 4,1.
      
      This code path also gets executed on the MacPro 5,1 due
      to an identical codec SSID, but I don't know whether it has
      any positive or adverse effects there.
      
      (v2) Implement feedback from Takashi Iwai: Reuse
           alc889_fixup_mbp_vref and just add a new nid
           0x19 for the MacPro 4,1.
      Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • Helge Deller
      parisc: Fix syscall restarts · ef5ca35f
      Helge Deller authored
      commit 71a71fb5 upstream.
      
      On parisc, syscalls that are interrupted by signals sometimes failed to
      restart and instead returned -ENOSYS, which in the worst case led to
      userspace crashes.
      A similar problem existed on MIPS and was fixed by commit e967ef02
      ("MIPS: Fix restart of indirect syscalls").
      
      On parisc, the current syscall restart code assumes that all syscall
      callers load the syscall number in the delay slot of the ble
      instruction. That is how it is done, e.g., in the unistd.h header file:
      	ble 0x100(%sr2, %r0)
      	ldi #syscall_nr, %r20
      Because of that assumption the current code never restored %r20 before
      returning to userspace.
      
      This assumption is at least not true for code which uses the glibc
      syscall() function, which instead uses this syntax:
      	ble 0x100(%sr2, %r0)
      	copy regX, %r20
      where regX depends on how the compiler optimizes the code and register
      usage.
      
      This patch fixes the problem by adding code that analyzes how the syscall
      number is loaded in the delay slot and, if needed, copies the syscall
      number to regX prior to returning to userspace for the syscall restart.
      Signed-off-by: Helge Deller <deller@gmx.de>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • Ashok Raj
      x86/mce: Ensure offline CPUs don't participate in rendezvous process · b4f22bf1
      Ashok Raj authored
      commit d90167a9 upstream.
      
      Intel's MCA implementation broadcasts MCEs to all CPUs on the
      node. This poses a problem for offlined CPUs which cannot
      participate in the rendezvous process:
      
        Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
        Kernel Offset: disabled
        Rebooting in 100 seconds..
      
      More specifically, Linux does a soft offline of a CPU when
      writing a 0 to /sys/devices/system/cpu/cpuX/online, which
      doesn't prevent the #MC exception from being broadcast to that
      CPU.
      
      Ensure that offline CPUs don't participate in the MCE rendezvous
      and clear the RIP valid status bit so that a second MCE won't
      cause a shutdown.
      
      Without the patch, mce_start() will increment mce_callin and
      wait for all CPUs. Offlined CPUs should avoid participating in
      the rendezvous process altogether.
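      The rendezvous problem can be illustrated with a toy, single-threaded
      model: every CPU receives the broadcast, but the rendezvous only
      completes when the number of CPUs that checked in equals the number of
      online CPUs. Everything below (the model arrays, the 4-CPU size,
      broadcast_mce_model()) is hypothetical; it touches no MSRs.

```c
#include <stdbool.h>

#define NR_CPUS_MODEL 4

/* Hypothetical model state: which CPUs are online, and the callin counter. */
static bool cpu_online_model[NR_CPUS_MODEL];
static int mce_callin_model;

static int num_online_model(void)
{
	int n = 0;

	for (int i = 0; i < NR_CPUS_MODEL; i++)
		n += cpu_online_model[i];
	return n;
}

/* Per-CPU handler entry: with the fix, offline CPUs bail out first. */
static void do_machine_check_model(int cpu, bool skip_offline)
{
	if (skip_offline && !cpu_online_model[cpu])
		return;	/* the patch also clears the RIP valid bit here */
	mce_callin_model++;
}

/* Broadcast to every CPU; succeed only if callin matches the online count. */
static bool broadcast_mce_model(bool skip_offline)
{
	mce_callin_model = 0;
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		do_machine_check_model(cpu, skip_offline);
	return mce_callin_model == num_online_model();
}
```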
      Signed-off-by: Ashok Raj <ashok.raj@intel.com>
      [ Massage commit message. ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1449742346-21470-2-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • Alan Stern
      USB: fix invalid memory access in hub_activate() · 72c6f525
      Alan Stern authored
      commit e50293ef upstream.
      
      Commit 8520f380 ("USB: change hub initialization sleeps to
      delayed_work") changed the hub_activate() routine to make part of it
      run in a workqueue.  However, the commit failed to take a reference to
      the usb_hub structure or to lock the hub interface while doing so.  As
      a result, if a hub is plugged in and quickly unplugged before the work
      routine can run, the routine will try to access memory that has been
      deallocated.  Or, if the hub is unplugged while the routine is
      running, the memory may be deallocated while it is in active use.
      
      This patch fixes the problem by taking a reference to the usb_hub at
      the start of hub_activate() and releasing it at the end (when the work
      is finished), and by locking the hub interface while the work routine
      is running.  It also adds a check at the start of the routine to see
      if the hub has already been disconnected, in which case nothing should
      be done.
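      The lifetime rule can be shown with a plain reference count: the
      deferred work holds its own reference, so an early unplug cannot free
      the object underneath it. A minimal sketch with hypothetical names
      (hub_model, hub_activate_start(), hub_activate_work()); the kernel uses
      a kref and also locks the interface, which this single-threaded model
      omits.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified hub object; the kernel uses a kref instead. */
struct hub_model {
	int refcount;
	bool disconnected;
	int work_done;
};

static int hubs_freed;	/* counts releases, so the lifetime can be checked */

static void hub_get(struct hub_model *hub)
{
	hub->refcount++;
}

static void hub_put(struct hub_model *hub)
{
	if (--hub->refcount == 0) {
		free(hub);
		hubs_freed++;
	}
}

/* Synchronous part of hub_activate(): pin the hub before deferring work. */
static void hub_activate_start(struct hub_model *hub)
{
	hub_get(hub);
}

/* Deferred part, run later from a workqueue in the real driver. */
static void hub_activate_work(struct hub_model *hub)
{
	if (!hub->disconnected)	/* already unplugged: do nothing */
		hub->work_done++;
	hub_put(hub);	/* drop the reference taken in hub_activate_start() */
}

/* Unplug: mark the hub gone and drop the reference held since probe. */
static void hub_disconnect(struct hub_model *hub)
{
	hub->disconnected = true;
	hub_put(hub);
}
```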
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Reported-by: Alexandru Cornea <alexandru.cornea@intel.com>
      Tested-by: Alexandru Cornea <alexandru.cornea@intel.com>
      Fixes: 8520f380 ("USB: change hub initialization sleeps to delayed_work")
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      [ luis: backported to 3.16:
        - Added forward declaration of hub_release() which mainline had with commit
          32a69589 ("usb: hub: convert khubd into workqueue") ]
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • Dan Carpenter
      USB: ipaq.c: fix a timeout loop · a195293f
      Dan Carpenter authored
      commit abdc9a3b upstream.
      
      The code expects the loop to end with "retries" set to zero but, because
      it is a post-op, it will end set to -1.  I have fixed this by moving the
      decrement inside the loop.
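      Why the post-decrement matters: when every attempt fails,
      while (retries--) tests 0 one last time and still decrements it. A
      minimal sketch contrasting the two loop shapes, with a hypothetical
      always_fails() operation standing in for the USB control message:

```c
/* Hypothetical operation that always fails, to exercise the timeout path. */
static int always_fails(void)
{
	return -1;
}

/* Buggy shape: the post-decrement in the condition leaves retries at -1. */
static int retry_buggy(int retries)
{
	int result;

	while (retries--) {
		result = always_fails();
		if (!result)
			break;
	}
	return retries;
}

/* Fixed shape: decrement inside the loop, so a timeout ends at 0. */
static int retry_fixed(int retries)
{
	int result;

	while (retries) {
		retries--;
		result = always_fails();
		if (!result)
			break;
	}
	return retries;
}
```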
      
      Fixes: 014aa2a3 ('USB: ipaq: minor ipaq_open() cleanup.')
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • Konrad Rzeszutek Wilk
      xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set. · a6ef41a4
      Konrad Rzeszutek Wilk authored
      commit 408fb0e5 upstream.
      
      commit f598282f ("PCI: Fix the NIU MSI-X problem in a better way")
      teaches us that dealing with MSI-X can be troublesome.
      
      Further checks in the MSI-X architecture show that if the
      PCI_COMMAND_MEMORY bit is turned off in the PCI_COMMAND register, we
      may not be able to access the BARs (since they are memory regions).
      
      Since the MSI-X tables are located there, that can lead
      to us causing PCIe errors. Inhibit any operation on
      MSI-X unless the MEMORY bit is set.
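      The guard amounts to a single bit test on the command register. A
      sketch, assuming PCI_COMMAND_MEMORY is bit 1 (0x2) of PCI_COMMAND as in
      the PCI specification; msix_op_allowed() is hypothetical, and the
      -ENXIO return value is an assumption of this sketch.

```c
#include <errno.h>
#include <stdint.h>

#define PCI_COMMAND_MEMORY 0x2	/* Memory Space Enable bit of PCI_COMMAND */

/*
 * Refuse MSI-X operations while memory decoding is disabled, because the
 * MSI-X table lives behind a memory BAR.
 */
static int msix_op_allowed(uint16_t pci_command)
{
	if (!(pci_command & PCI_COMMAND_MEMORY))
		return -ENXIO;
	return 0;
}
```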
      
      Note that a Xen hypervisor with
      "x86/MSI-X: access MSI-X table only after having enabled MSI-X"
      will return:
      xen_pciback: 0000:0a:00.1: error -6 enabling MSI-X for guest 3!

      when the generic MSI code tries to set up the PIRQ without the
      MEMORY bit set. This means that with later versions of Xen
      (4.6) this patch is not necessary.
      
      This is part of XSA-157
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • Konrad Rzeszutek Wilk
      xen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has MSI(X) enabled. · 1c668940
      Konrad Rzeszutek Wilk authored
      commit 7cfb905b upstream.
      
      Otherwise just continue on, returning the same values as
      previously (return of 0, and op->result has the PIRQ value).
      
      This does not change the behavior of XEN_PCI_OP_disable_msi[|x].
      
      pci_disable_msi and pci_disable_msix already check
      msi_enabled and msix_enabled, so they will error out immediately.
      
      However the guest can still call these operations and cause
      us to disable the 'ack_intr'. That means the backend IRQ handler
      for the legacy interrupt will not respond to interrupts anymore.
      
      If the device is causing an interrupt storm, this will lead
      the Linux generic code to disable the interrupt line.
      
      Naturally, this will only happen if the device in question
      is plugged into the motherboard on a shared level interrupt GSI.
      
      This is part of XSA-157
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • Konrad Rzeszutek Wilk
      xen/pciback: Do not install an IRQ handler for MSI interrupts. · f3b94b80
      Konrad Rzeszutek Wilk authored
      commit a396f3a2 upstream.
      
      Otherwise a guest can subvert the generic MSI code to trigger
      a BUG_ON condition during MSI interrupt freeing:
      
       for (i = 0; i < entry->nvec_used; i++)
              BUG_ON(irq_has_action(entry->irq + i));
      
      The Xen PCI backend installs an IRQ handler (request_irq) for
      the dev->irq whenever the guest writes PCI_COMMAND_MEMORY
      (or PCI_COMMAND_IO) to the PCI_COMMAND register. This is
      done in case the device has legacy interrupts and the GSI line
      is shared by the backend devices.
      
      To subvert the backend, the guest needs to make the backend
      change dev->irq from the GSI to the MSI interrupt line,
      make the backend allocate an interrupt handler, and then command
      the backend to free the MSI interrupt and hit the BUG_ON.
      
      Since the backend only calls 'request_irq' when the guest
      writes to the PCI_COMMAND register, the guest needs to call
      XEN_PCI_OP_enable_msi before any other operation. This will
      cause the generic MSI code to set up an MSI entry and
      populate dev->irq with the new PIRQ value.
      
      Then the guest can write PCI_COMMAND_MEMORY to PCI_COMMAND
      and cause the backend to set up an IRQ handler for dev->irq
      (which now holds the MSI PIRQ instead of the GSI value). See
      'xen_pcibk_control_isr'.
      
      Then the guest disables the MSI: XEN_PCI_OP_disable_msi
      which ends up triggering the BUG_ON condition in 'free_msi_irqs'
      as there is an IRQ handler for the entry->irq (dev->irq).
      
      Note that this cannot be done using MSI-X, as the generic
      code does not overwrite dev->irq with the MSI-X PIRQ values.
      
      The patch inhibits setting up the IRQ handler if MSI or
      MSI-X (for symmetry reasons) code had been called successfully.
      
      P.S.
      When Xen PCIBack sets up the device for guest consumption, it
      ends up writing 0 to PCI_COMMAND (see xen_pcibk_reset_device).
      The XSA-120 addendum patch removed that; however, when upstreaming said
      addendum we found that it caused issues with upstream qemu. That
      has now been fixed in upstream qemu.
      
      This is part of XSA-157
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • Konrad Rzeszutek Wilk
      xen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or MSI-X enabled · 7a0b00fe
      Konrad Rzeszutek Wilk authored
      commit 5e0ce145 upstream.
      
      The guest sequence of:
      
        a) XEN_PCI_OP_enable_msix
        b) XEN_PCI_OP_enable_msix
      
      results in a NULL pointer dereference due to the use of freed pointers.
      
      The device passed in the guest MUST have MSI-X capability.
      
      Step a) constructs a SysFS representation of MSI and MSI groups.
      Step b) adds a second set of them, but adding them to SysFS fails (duplicate entry).
      'populate_msi_sysfs' frees the newly allocated msi_irq_groups (note that
      pdev->msi_irq_groups from step a) is still set) and also frees ALL of the
      device's MSI-X entries (the ones allocated in steps a) and b)).
      
      The unwind code, 'free_msi_irqs', deletes all the entries and tries to
      delete pdev->msi_irq_groups (which hasn't been set to NULL).
      However, the pointers in SysFS are already freed, and we hit a NULL
      pointer dereference further on when 'strlen' is attempted on a freed pointer.
      
      The patch adds a simple check in XEN_PCI_OP_enable_msix to guard
      against that. The check for msi_enabled is not strictly necessary.
      
      This is part of XSA-157
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • Konrad Rzeszutek Wilk
      xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled · 55226b96
      Konrad Rzeszutek Wilk authored
      commit 56441f3c upstream.
      
      The guest sequence of:
      
       a) XEN_PCI_OP_enable_msi
       b) XEN_PCI_OP_enable_msi
       c) XEN_PCI_OP_disable_msi
      
      results in hitting a BUG_ON condition in the msi.c code.

      The MSI code uses dev->msi_list, to which it adds MSI entries.
      Under the above conditions a BUG_ON() can be hit. The device
      passed in the guest MUST have MSI capability.
      
      Step a) adds the entry to dev->msi_list and sets msi_enabled.
      Step b) adds a second entry, but adding it to SysFS fails (duplicate entry),
      so it deletes all of the entries from msi_list and returns (with msi_enabled
      still set). In step c), pci_disable_msi passes the msi_enabled checks and hits:
      
      BUG_ON(list_empty(dev_to_msi_list(&dev->dev)));
      
      and blows up.
      
      The patch adds a simple check in XEN_PCI_OP_enable_msi to guard
      against that. The check for msix_enabled is not strictly necessary.
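      The effect of the guard on the a), b), c) sequence can be modeled
      without any PCI code. In this hypothetical sketch nr_entries stands in
      for dev->msi_list, disable_msi_model() returns -1 where the kernel would
      hit the BUG_ON, and the -EINVAL error code is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the device state the backend inspects. */
struct msi_dev_model {
	bool msi_enabled;
	bool msix_enabled;
	int nr_entries;		/* stands in for the length of dev->msi_list */
};

/* Guarded enable: refuse a second enable instead of corrupting the list. */
static int enable_msi_model(struct msi_dev_model *dev)
{
	if (dev->msi_enabled || dev->msix_enabled)
		return -EINVAL;
	dev->nr_entries = 1;
	dev->msi_enabled = true;
	return 0;
}

/* Disable; -1 marks where the kernel would hit BUG_ON on an empty list. */
static int disable_msi_model(struct msi_dev_model *dev)
{
	if (!dev->msi_enabled)
		return 0;
	if (dev->nr_entries == 0)
		return -1;
	dev->nr_entries = 0;
	dev->msi_enabled = false;
	return 0;
}
```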
      
      This is part of XSA-157.
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • Konrad Rzeszutek Wilk
      xen/pciback: Save xen_pci_op commands before processing it · e69bc5ac
      Konrad Rzeszutek Wilk authored
      commit 8135cf8b upstream.
      
      Double fetch vulnerabilities happen when a variable is
      fetched twice from shared memory but a security check is only
      performed the first time.
      
      The xen_pcibk_do_op function performs a switch statement on the op->cmd
      value, which is stored in shared memory. Depending on how the compiler
      optimizes the code, this can result in a double fetch vulnerability.
      
      This patch fixes it by saving the xen_pci_op command before
      processing it. We also use 'barrier' to make sure that the
      compiler cannot re-read the command from shared memory later.
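      The copy-then-barrier pattern in a few lines: snapshot the request,
      issue a compiler barrier, and switch only on the snapshot. This is a
      sketch; struct shared_op, the OP_* values and do_op_sketch() are
      hypothetical, and barrier() here is the GCC/Clang empty asm with a
      memory clobber, the same idea as the kernel's barrier().

```c
/* Compiler barrier: the compiler must assume all memory may have changed. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical request living in memory the other side can rewrite. */
struct shared_op {
	int cmd;
};

enum { OP_A = 1, OP_B = 2 };

/*
 * Take a private copy first.  After the barrier the compiler cannot
 * substitute a fresh load of op->cmd for local.cmd, so the value that is
 * checked is the value that is used.
 */
static int do_op_sketch(const struct shared_op *op)
{
	struct shared_op local = *op;

	barrier();

	switch (local.cmd) {
	case OP_A:
		return 10;
	case OP_B:
		return 20;
	default:
		return -1;
	}
}
```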
      
      This is part of XSA155.
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Jan Beulich <JBeulich@suse.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • Roger Pau Monné
      xen-blkback: read from indirect descriptors only once · e4b67292
      Roger Pau Monné authored
      commit 18779149 upstream.
      
      Since indirect descriptors are in memory shared with the frontend, the
      frontend could alter the first_sect and last_sect values after they have
      been validated but before they are recorded in the request.  This may
      result in I/O requests that overflow the foreign page, possibly
      overwriting local pages when the I/O request is executed.
      
      When parsing indirect descriptors, only read first_sect and last_sect
      once.
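      A sketch of "read once, validate the copy, record the copy". The
      READ_ONCE_U8 macro is a hypothetical single volatile load in the spirit
      of READ_ONCE()/ACCESS_ONCE(); the segment structs are hypothetical, and
      SECTORS_PER_PAGE assumes a 4 KiB page of 512-byte sectors
      (PAGE_SIZE >> 9).

```c
#include <stdbool.h>
#include <stdint.h>

/* One volatile load: the compiler may not fetch the field again. */
#define READ_ONCE_U8(x) (*(const volatile uint8_t *)&(x))

#define SECTORS_PER_PAGE 8	/* 4096 >> 9, assumed page size */

/* Hypothetical indirect segment as laid out in shared memory. */
struct shared_segment {
	uint8_t first_sect;
	uint8_t last_sect;
};

/* Backend-private copy of the segment. */
struct local_seg {
	uint8_t first_sect;
	uint8_t last_sect;
};

/* Fetch each field once, validate the local copies, and record only those. */
static bool parse_segment(const struct shared_segment *shared,
			  struct local_seg *out)
{
	uint8_t first = READ_ONCE_U8(shared->first_sect);
	uint8_t last = READ_ONCE_U8(shared->last_sect);

	if (last >= SECTORS_PER_PAGE || last < first)
		return false;

	out->first_sect = first;	/* the validated values, not a re-read */
	out->last_sect = last;
	return true;
}
```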
      
      This is part of XSA155.
      Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [ luis: backported to 3.16:
        - Use ACCESS_ONCE instead of READ_ONCE
        - Use PAGE_SIZE instead of XEN_PAGE_SIZE ]
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • Roger Pau Monné
      xen-blkback: only read request operation from shared ring once · 123ed94d
      Roger Pau Monné authored
      commit 1f13d75c upstream.
      
      A compiler may load a switch statement value multiple times, which could
      be bad when the value is in memory shared with the frontend.
      
      When converting a non-native request to a native one, ensure that
      src->operation is only loaded once by using READ_ONCE().
      
      This is part of XSA155.
      Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [ luis: backported to 3.16:
        - replaced READ_ONCE() by ACCESS_ONCE() ]
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • David Vrabel
      xen-netback: use RING_COPY_REQUEST() throughout · fb9643fe
      David Vrabel authored
      commit 68a33bfd upstream.
      
      Instead of open-coding memcpy()s and directly accessing Tx and Rx
      requests, use the new RING_COPY_REQUEST() that ensures the local copy
      is correct.
      
      This is more than is strictly necessary for guest Rx requests since
      only the id and gref fields are used and it is harmless if the
      frontend modifies these.
      
      This is part of XSA155.
      Reviewed-by: Wei Liu <wei.liu2@citrix.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [ kamal: backport to 3.13-stable: context (s/queue/vif/) ]
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • David Vrabel
      xen-netback: don't use last request to determine minimum Tx credit · fbc40eb1
      David Vrabel authored
      commit 0f589967 upstream.
      
      The last request transmitted by the guest gives no indication of the
      minimum amount of credit that the guest might need to send a packet,
      since the last packet might have been a small one.
      
      Instead allow for the worst case 128 KiB packet.
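      A minimal sketch of a credit top-up that always leaves room for one
      worst-case 128 KiB (131072-byte) packet instead of sizing the burst from
      the last request. The struct, tx_add_credit_model() and the exact
      replenish formula (cap at max_burst, guard against wrap-around) are
      assumptions for illustration.

```c
#include <limits.h>

/* Hypothetical per-interface credit state. */
struct credit_model {
	unsigned long credit_bytes;	/* configured credit per window */
	unsigned long remaining_credit;
};

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Replenish credit; a burst is never smaller than one 128 KiB packet. */
static void tx_add_credit_model(struct credit_model *q)
{
	unsigned long max_burst = max_ul(131072UL, q->credit_bytes);
	unsigned long max_credit = q->remaining_credit + max_burst;

	if (max_credit < q->remaining_credit)	/* wrapped around */
		max_credit = ULONG_MAX;

	q->remaining_credit = min_ul(max_credit, max_burst);
}
```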
      
      This is part of XSA155.
      Reviewed-by: Wei Liu <wei.liu2@citrix.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [ kamal: backport to 3.13-stable: context ]
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • David Vrabel
      xen: Add RING_COPY_REQUEST() · bfb335b5
      David Vrabel authored
      commit 454d5d88 upstream.
      
      RING_GET_REQUEST() on a shared ring is easy to use incorrectly
      (i.e., by not considering that the other end may alter the data in the
      shared ring while it is being inspected).  Safe usage of a request
      generally requires taking a local copy.
      
      Provide a RING_COPY_REQUEST() macro to use instead of
      RING_GET_REQUEST() and an open-coded memcpy().  This takes care of
      ensuring that the copy is done correctly regardless of any possible
      compiler optimizations.
      
      Use a volatile source to prevent the compiler from reordering or
      omitting the copy.
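      The idea can be sketched with a toy ring. Everything here is
      hypothetical and far simpler than Xen's ring.h: the _MODEL macros only
      mimic the roles of RING_GET_REQUEST() (a pointer into shared memory) and
      RING_COPY_REQUEST() (a forced copy through a pointer to a
      volatile-qualified struct), and __typeof__ is a GCC/Clang extension.

```c
/* Hypothetical request type and ring. */
struct req {
	int id;
	int size;
};

#define RING_SIZE 4	/* must be a power of two for the mask below */

struct ring_model {
	struct req slots[RING_SIZE];	/* memory shared with the other end */
};

/* Pointer into shared memory: reading through it may see later changes. */
#define RING_GET_REQUEST_MODEL(r, idx) (&(r)->slots[(idx) & (RING_SIZE - 1)])

/*
 * Copy through a pointer to a volatile struct, so the compiler must really
 * perform the copy now instead of re-reading shared memory later.
 */
#define RING_COPY_REQUEST_MODEL(r, idx, req) do {				\
	*(req) = *(const volatile __typeof__(*(req)) *)			\
		RING_GET_REQUEST_MODEL(r, idx);				\
} while (0)
```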
      
      This is part of XSA155.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • Michael Holzheu
      s390/dis: Fix handling of format specifiers · 1938ca88
      Michael Holzheu authored
      commit 272fa59c upstream.
      
      The print_insn() function returns strings like "lghi %r1,0". To escape the
      '%' character in sprintf() a second '%' is used. For example "lghi %%r1,0"
      is converted into "lghi %r1,0".
      
      After print_insn() the output string is passed to printk(). Because format
      specifiers like "%r" or "%f" are ignored by printk() this works by chance
      most of the time. But for instructions with control registers like
      "lctl %c6,%c6,780" this fails because printk() interprets "%c" as
      character format specifier.
      
      Fix this problem and escape the '%' characters twice.
      
      For example "lctl %%%%c6,%%%%c6,780" is then converted by sprintf()
      into "lctl %%c6,%%c6,780" and by printk() into "lctl %c6,%c6,780".
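      The two formatting passes can be reproduced with snprintf() standing in
      for both the disassembler's sprintf() and printk(). two_pass() is a
      hypothetical helper; the second call deliberately uses a non-literal
      format to model printk, so the format warnings are silenced around it.

```c
#include <stdio.h>

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wformat-security"
#pragma GCC diagnostic ignored "-Wformat-nonliteral"
/* Returns the length of the final string. */
static int two_pass(char *stage1, size_t len1, char *out, size_t len2)
{
	/* Pass 1 (sprintf in the disassembler): "%%%%" becomes "%%". */
	snprintf(stage1, len1, "lctl %%%%c6,%%%%c6,780");
	/* Pass 2 (printk): the result is used as a format; "%%" becomes "%". */
	return snprintf(out, len2, stage1);
}
#pragma GCC diagnostic pop
```

      After both passes the text reads "lctl %c6,%c6,780"; with only one
      level of escaping, the second pass would see a bare "%c" and try to
      consume a character argument that does not exist.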
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      [ luis: backported to 3.16:
        - drop condition with OPERAND_VR introduced only with commit
          3585cb02 ("s390/disassembler: add vector instructions") ]
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • Xiong Zhang
      ALSA: hda - Set SKL+ hda controller power at freeze() and thaw() · b7827ba5
      Xiong Zhang authored
      commit 3e6db33a upstream.
      
      Entering hibernation takes three minutes on some OEM SKL machines, and
      many spurious codec responses are seen after the thaw() operation.
      This is because HDA is still in D0 state after the freeze() call, and
      pci_pm_freeze()/pci_pm_freeze_noirq() don't put it into D3hot in the
      PCI bus driver. It seems the BIOS still accesses HDA while the system
      enters the freeze state, so HDA receives a codec response interrupt
      immediately after the thaw() call. Because of this unexpected
      interrupt, HDA enters an abnormal state and slows down the system's
      entry into hibernation.
      
      In this patch, we put HDA into D3hot state in azx_freeze_noirq() and
      put it back into D0 state in azx_thaw_noirq().
      
      V2: Only apply this fix to SKL+
          Fix compile error when CONFIG_PM_SLEEP isn't defined
      
      [Yet another fix for CONFIG_PM_SLEEP ifdef and the additional comment
       by tiwai]
      Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      b7827ba5
    • Vineet Gupta's avatar
      ARC: dw2 unwind: Ignore CIE version !=1 gracefully instead of bailing · ec48b677
      Vineet Gupta authored
      commit 323f41f9 upstream.
      
      The ARC dwarf unwinder only supports CIE version 1.
      The boot-time dwarf sanitizer (part of the binary lookup table
      constructor) would simply bail if it saw CIE version 3, leaving the
      unwinder with a NULL lookup table.
      
      It seems the libgcc linked into the kernel does have such entries.
      
      With the fallback linear search removed and a NULL binary lookup
      table, the unwinder fails to generate any stack trace.
      
      So ignore unsupported CIE entries gracefully instead.
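      A hedged sketch of the behaviour change, assuming the CIE version fields
      have already been parsed into an array (demo_scan_cies() is a
      hypothetical helper, not the ARC unwinder's code): the old path gives up
      on the first unsupported version, while the new one skips that entry and
      keeps building the table.

```c
#include <stddef.h>

/*
 * versions[i] is the version field of the i-th CIE. Returns how many CIEs
 * are usable, or -1 if table setup was abandoned. bail_on_unknown = 1
 * models the old behaviour, 0 the new one.
 */
long demo_scan_cies(const unsigned char *versions, size_t n, int bail_on_unknown)
{
    long usable = 0;

    for (size_t i = 0; i < n; i++) {
        if (versions[i] != 1) {
            if (bail_on_unknown)
                return -1;      /* old: no lookup table at all */
            continue;           /* new: skip just this entry */
        }
        usable++;
    }
    return usable;
}
```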
      
      This problem was initially seen in Alexey's setup (and not mine) as he
      was using a buildroot-built toolchain (libgcc), which doesn't get built
      with CFLAGS_FOR_TARGET="-gdwarf-2", my default.
      
      Fixes STAR 9000985048: "kernel unwinder broken with stock tools"
      
      Fixes: 2e22502c ARC: dw2 unwind: Remove falllback linear search thru FDE entries
      Reported-by: Alexey Brodkin <abrodkin@synopsys.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      ec48b677
    • Vineet Gupta's avatar
      ARC: dw2 unwind: Reinstante unwinding out of modules · 978c03ee
      Vineet Gupta authored
      commit bc79c9a7 upstream.
      
      The fix which removed linear searching of dwarf (because binary lookup
      data always exists) missed the fact that modules don't get the binary
      lookup table info. This caused unwinding out of modules to stop
      working.
      
      So add the binary lookup header setup (the equivalent of eh_frame_hdr
      setup) to modules as well.
      
      While at it, confine the header setup to the unwinder code, which
      removes one API exposed by the unwinder.
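      To illustrate what the binary lookup table provides, here is a minimal
      sketch of the lookup it enables, assuming a hypothetical table of
      (start PC, FDE index) pairs sorted by start PC. A real unwinder would
      also check that the PC falls inside the matching FDE's range; that step
      is omitted here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup-table entry: start PC of a function and its FDE. */
struct demo_lookup {
    uintptr_t start_pc;
    size_t fde_index;
};

/*
 * Binary search a table sorted by start_pc for the last entry whose
 * start_pc <= pc. Returns -1 if pc precedes every entry or the table
 * is empty.
 */
long demo_find_fde(const struct demo_lookup *tbl, size_t n, uintptr_t pc)
{
    size_t lo = 0, hi = n;   /* search the half-open range [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tbl[mid].start_pc <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the number of entries with start_pc <= pc. */
    return lo == 0 ? -1 : (long)tbl[lo - 1].fde_index;
}
```

      A module without such a table behaves like calling demo_find_fde() with
      an empty table: every lookup fails, so unwinding stops at the module.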
      
      Fixes: 2e22502c ARC: dw2 unwind: Remove falllback linear search thru FDE entries
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      978c03ee
    • Steven Rostedt (Red Hat)'s avatar
      ftrace/scripts: Have recordmcount copy the object file · 8cfcb2dc
      Steven Rostedt (Red Hat) authored
      commit a50bd439 upstream.
      
      Russell King found weird side effects when compiling the kernel with
      hard-linked ccache. The reason was that recordmcount modified the
      object files in place via mmap, and when a file gets modified twice by
      recordmcount, it complains about it. To fix this issue, Russell wrote a
      patch that checked whether the file had more than one hard link and,
      if so, unlinked it.
      
      Linus Torvalds was not happy that recordmcount modifies files in
      place. So instead of unlinking only when the file has two or more hard
      links, recordmcount now always does it: whenever it changes something,
      it writes out a fresh copy of the file.
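      A sketch of the read, modify, then write-a-fresh-copy pattern described
      above, in portable C. demo_rewrite_if_changed() and the edit callback are
      hypothetical; writing to "<path>.tmp" and then calling rename() is an
      assumption of this sketch, chosen because rename() replaces the
      directory entry atomically. The path ends up pointing at a new inode, so
      any other hard link, such as a ccache entry, keeps the original bytes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a whole file into a malloc'd buffer; returns NULL on error. */
static unsigned char *demo_slurp(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    unsigned char *buf = NULL;
    long size;

    if (!f)
        return NULL;
    if (fseek(f, 0, SEEK_END) == 0 && (size = ftell(f)) >= 0 &&
        fseek(f, 0, SEEK_SET) == 0) {
        buf = malloc(size > 0 ? (size_t)size : 1);
        if (buf && fread(buf, 1, (size_t)size, f) != (size_t)size) {
            free(buf);
            buf = NULL;
        }
        *len = (size_t)size;
    }
    fclose(f);
    return buf;
}

/*
 * Apply edit() to an in-memory copy of path. Only if it reports a change,
 * write the result to "<path>.tmp" and rename() it over path.
 * Returns 1 if rewritten, 0 if unchanged, -1 on error.
 */
int demo_rewrite_if_changed(const char *path,
                            int (*edit)(unsigned char *buf, size_t len))
{
    char tmp[4096];
    size_t len = 0;
    unsigned char *buf = demo_slurp(path, &len);
    int ret = -1;
    FILE *out;

    if (!buf)
        return -1;
    if (!edit(buf, len)) {
        ret = 0;                          /* unchanged: leave the file alone */
    } else if (snprintf(tmp, sizeof(tmp), "%s.tmp", path) < (int)sizeof(tmp) &&
               (out = fopen(tmp, "wb")) != NULL) {
        int ok = fwrite(buf, 1, len, out) == len;

        ok = (fclose(out) == 0) && ok;
        if (ok && rename(tmp, path) == 0)
            ret = 1;                      /* path now names a new inode */
        else
            remove(tmp);
    }
    free(buf);
    return ret;
}
```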
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      8cfcb2dc
    • Russell King's avatar
      scripts: recordmcount: break hardlinks · 10d6d9b7
      Russell King authored
      commit dd39a265 upstream.
      
      recordmcount edits the file in-place, which can cause problems when
      using ccache in hardlink mode.  Arrange for recordmcount to break a
      hardlinked object.
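      Detecting the hard link itself only needs stat(): st_nlink counts the
      names that refer to the same inode. A minimal POSIX sketch, with
      demo_is_hardlinked() as a hypothetical helper name:

```c
#include <sys/stat.h>

/*
 * Report whether path shares its inode with other names. An in-place
 * editor that must leave those other names untouched has to break the
 * link (write a separate copy) when this returns 1.
 * Returns 1 if hard-linked, 0 if not, -1 on error.
 */
int demo_is_hardlinked(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return st.st_nlink > 1;
}
```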
      
      Link: http://lkml.kernel.org/r/E1a7MVT-0000et-62@rmk-PC.arm.linux.org.uk
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      10d6d9b7
    • Johan Hovold's avatar
      spi: fix parent-device reference leak · 94e2c6af
      Johan Hovold authored
      commit 157f38f9 upstream.
      
      Fix a parent-device reference leak caused by the SPI core taking an
      unnecessary reference to the parent when allocating the master
      structure, a reference that was never released.
      
      Note that the driver core takes its own reference to the parent when
      the master device is registered.
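      The leak can be modelled as plain reference counting. In this
      hypothetical sketch, struct demo_dev, demo_get() and demo_put() stand in
      for struct device, get_device() and put_device(): with the extra
      reference taken at allocation, the parent's count never returns to its
      starting value.

```c
/* Hypothetical reference-counted device, standing in for struct device. */
struct demo_dev {
    int refs;
};

static void demo_get(struct demo_dev *d) { d->refs++; }
static void demo_put(struct demo_dev *d) { d->refs--; }

/*
 * Run one alloc/register/unregister cycle against a parent device and
 * return the parent's refcount afterwards (1 means nothing leaked).
 */
int demo_master_lifecycle(int take_extra_ref_on_alloc)
{
    struct demo_dev parent = { 1 };   /* the parent's own reference */

    if (take_extra_ref_on_alloc)
        demo_get(&parent);            /* old behaviour: never released */
    demo_get(&parent);                /* driver core, on registration */
    demo_put(&parent);                /* driver core, on unregistration */
    return parent.refs;
}
```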
      
      Fixes: 49dce689 ("spi doesn't need class_device")
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      94e2c6af
    • Anson Huang's avatar
      ARM: 8471/1: need to save/restore arm register(r11) when it is corrupted · f073e42e
      Anson Huang authored
      commit fa0708b3 upstream.
      
      The cpu_v7_do_suspend routine uses r11 without saving and restoring
      it. Different compilers may use the ARM general-purpose registers
      differently, so this can cause problems for callers of
      cpu_v7_do_suspend.
      
      We hit a kernel fault when using GCC 4.8.3: r11 contains a valid value
      before the call into cpu_v7_do_suspend, but is corrupted on return from
      this routine, which leads to the fault. Assembly code must save and
      restore any such registers it corrupts.
      Signed-off-by: Anson Huang <Anson.Huang@freescale.com>
      Reviewed-by: Nicolas Pitre <nico@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      f073e42e
    • Anssi Hannula's avatar
      ALSA: usb-audio: Add a more accurate volume quirk for AudioQuest DragonFly · 0059fc7b
      Anssi Hannula authored
      commit 42e3121d upstream.
      
      AudioQuest DragonFly DAC reports a volume control range of 0..50
      (0x0000..0x0032) which in USB Audio means a range of 0 .. 0.2dB, which
      is obviously incorrect and would cause software using the dB information
      in e.g. volume sliders to have a massive volume difference in 100..102%
      range.
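      For reference, USB Audio Class volume values are signed 16-bit numbers
      in units of 1/256 dB, which is where the 0 .. 0.2dB figure comes from:
      0x0032 is 50, and 50/256 is about 0.195 dB. A one-line conversion (the
      function name is hypothetical):

```c
#include <stdint.h>

/* USB Audio Class volume: signed 16-bit value in steps of 1/256 dB. */
double demo_uac_volume_to_db(int16_t raw)
{
    return raw / 256.0;
}
```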
      
      Commit 2d1cb7f6 ("ALSA: usb-audio: add dB range mapping for some
      devices") added a dB range mapping for it with range 0..50 dB.
      
      However, the actual volume mapping seems to be neither linear volume nor
      linear dB scale, but instead quite close to the cubic mapping e.g.
      alsamixer uses, with a range of approx. -53...0 dB.
      
      Replace the previous quirk with a custom dB mapping based on some basic
      output measurements, using a 10-item range TLV (which will still fit in
      alsa-lib MAX_TLV_RANGE_SIZE).
      
      Tested on AudioQuest DragonFly HW v1.2. The quirk is only applied if the
      range is 0..50, so if this gets fixed/changed in later HW revisions it
      will no longer be applied.
      
      v2: incorporated Takashi Iwai's suggestion for the quirk application
      method
      Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      [ kamal: backport to 3.13-stable: use snd_printk instead of usb_audio_info ]
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      0059fc7b
    • Thomas Gleixner's avatar
      genirq: Prevent chip buslock deadlock · ae501c63
      Thomas Gleixner authored
      commit abc7e40c upstream.
      
      If an interrupt chip utilizes chip->buslock, then free_irq() can
      deadlock in the following way:
      
      CPU0				CPU1
      				interrupt(X) (Shared or spurious)
      free_irq(X)			interrupt_thread(X)
      chip_bus_lock(X)
      				   irq_finalize_oneshot(X)
      				     chip_bus_lock(X)
      synchronize_irq(X)
      
      synchronize_irq() waits for the interrupt thread to complete,
      i.e. forever.
      
      The solution is simple: release the chip bus lock before calling
      synchronize_irq(), as we do with the irq_desc lock. There is nothing to
      be protected after the point where the irq_desc lock has been released.
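      The lock ordering can be reproduced with POSIX threads (build with
      -pthread). In this hypothetical model, demo_bus_lock plays
      chip->buslock, demo_irq_thread() plays the interrupt thread that needs
      the lock in irq_finalize_oneshot(), and pthread_join() plays
      synchronize_irq(). Releasing the lock before joining is what lets the
      thread finish; joining while still holding it would wait forever.

```c
#include <pthread.h>

/* Hypothetical stand-ins for chip->buslock and the interrupt thread. */
static pthread_mutex_t demo_bus_lock = PTHREAD_MUTEX_INITIALIZER;
static int demo_thread_done;

/* Like irq_finalize_oneshot(): the thread needs the bus lock to finish. */
static void *demo_irq_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_bus_lock);
    demo_thread_done = 1;
    pthread_mutex_unlock(&demo_bus_lock);
    return NULL;
}

/*
 * Fixed ordering: tear down under the bus lock, drop the lock, and only
 * then wait for the thread. Returns 1 once the thread has completed.
 */
int demo_free_irq(void)
{
    pthread_t t;

    pthread_mutex_lock(&demo_bus_lock);
    /* The interrupt thread starts while we hold the lock, as on CPU1. */
    if (pthread_create(&t, NULL, demo_irq_thread, NULL) != 0) {
        pthread_mutex_unlock(&demo_bus_lock);
        return -1;
    }
    /* ... remove the handler while holding the lock ... */
    pthread_mutex_unlock(&demo_bus_lock);   /* drop before waiting */
    pthread_join(t, NULL);                  /* "synchronize_irq()" */
    return demo_thread_done;
}
```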
      
      This adds chip_bus_lock/unlock() to the remove_irq() code path, but
      that's actually correct in the case where remove_irq() is called on
      such an interrupt. The current users of remove_irq() are not affected
      as none of those interrupts is on a chip which requires buslock.
      Reported-by: Fredrik Markström <fredrik.markstrom@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      ae501c63
    • Peter Hurley's avatar
      tty: Fix GPF in flush_to_ldisc() · d75e0b71
      Peter Hurley authored
      commit 9ce119f3 upstream.
      
      A line discipline which does not define a receive_buf() method can
      cause a GPF if data is ever received [1]. Oddly, this was known
      to the author of n_tracesink in 2011, but never fixed.
      
      [1] GPF report
          BUG: unable to handle kernel NULL pointer dereference at           (null)
          IP: [<          (null)>]           (null)
          PGD 3752d067 PUD 37a7b067 PMD 0
          Oops: 0010 [#1] SMP KASAN
          Modules linked in:
          CPU: 2 PID: 148 Comm: kworker/u10:2 Not tainted 4.4.0-rc2+ #51
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          Workqueue: events_unbound flush_to_ldisc
          task: ffff88006da94440 ti: ffff88006db60000 task.ti: ffff88006db60000
          RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
          RSP: 0018:ffff88006db67b50  EFLAGS: 00010246
          RAX: 0000000000000102 RBX: ffff88003ab32f88 RCX: 0000000000000102
          RDX: 0000000000000000 RSI: ffff88003ab330a6 RDI: ffff88003aabd388
          RBP: ffff88006db67c48 R08: ffff88003ab32f9c R09: ffff88003ab31fb0
          R10: ffff88003ab32fa8 R11: 0000000000000000 R12: dffffc0000000000
          R13: ffff88006db67c20 R14: ffffffff863df820 R15: ffff88003ab31fb8
          FS:  0000000000000000(0000) GS:ffff88006dc00000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
          CR2: 0000000000000000 CR3: 0000000037938000 CR4: 00000000000006e0
          Stack:
           ffffffff829f46f1 ffff88006da94bf8 ffff88006da94bf8 0000000000000000
           ffff88003ab31fb0 ffff88003aabd438 ffff88003ab31ff8 ffff88006430fd90
           ffff88003ab32f9c ffffed0007557a87 1ffff1000db6cf78 ffff88003ab32078
          Call Trace:
           [<ffffffff8127cf91>] process_one_work+0x8f1/0x17a0 kernel/workqueue.c:2030
           [<ffffffff8127df14>] worker_thread+0xd4/0x1180 kernel/workqueue.c:2162
           [<ffffffff8128faaf>] kthread+0x1cf/0x270 drivers/block/aoe/aoecmd.c:1302
           [<ffffffff852a7c2f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:468
          Code:  Bad RIP value.
          RIP  [<          (null)>]           (null)
           RSP <ffff88006db67b50>
          CR2: 0000000000000000
          ---[ end trace a587f8947e54d6ea ]---
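      The defensive pattern, sketched in C with a hypothetical trimmed-down
      ops structure (not the tty layer's struct tty_ldisc_ops): treat
      receive_buf() as optional and check it before calling through it.
      Reporting unhandled data as consumed is a design choice of this sketch,
      which simply discards it.

```c
#include <stddef.h>

/* Hypothetical line-discipline ops with an optional receive hook. */
struct demo_ldisc_ops {
    size_t (*receive_buf)(const unsigned char *data, size_t count);
};

/*
 * Deliver received bytes only if the discipline implements receive_buf();
 * calling through a NULL function pointer is the "RIP: (null)" fault in
 * the report. Returns the number of bytes consumed.
 */
size_t demo_deliver(const struct demo_ldisc_ops *ops,
                    const unsigned char *data, size_t count)
{
    if (!ops->receive_buf)
        return count;   /* nobody wants the data: discard it */
    return ops->receive_buf(data, count);
}
```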
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      d75e0b71
    • Peter Hurley's avatar
      n_tty: Fix poll() after buffer-limited eof push read · 67ee7d42
      Peter Hurley authored
      commit ac8f3bf8 upstream.
      
      commit 40d5e090 ("n_tty: Fix EOF push handling") fixed EOF push
      for reads. However, that approach still allows a condition mismatch
      between poll() and read(), where poll() returns POLLIN but read()
      blocks. This state can happen when a previous read() returned because
      the user buffer was full and the next character was an EOF not at the
      beginning of the line. While the next read() will properly identify
      the condition and advance the read buffer tail without improperly
      indicating an EOF condition (i.e., read() will not mistakenly
      return 0), poll() will mistakenly indicate POLLIN.
      
      Although a possible solution would be to peek at the input buffer
      in n_tty_poll(), the better solution in this patch is to eat the
      EOF during the previous read() (i.e., fix the problem by eliminating
      the condition).
      
      The current canon line buffer copy limits the scan for the next
      end-of-line to the smaller of:
         a. the remaining user buffer size
         b. completed lines in the input buffer
      When the remaining user buffer size is exactly one less than the
      end-of-line marked by EOF push, the EOF is neither scanned nor skipped
      but left for subsequent reads. In the example below, the scan
      index 'eol' has stopped at the EOF because it is past the scan
      limit of 5 (not because it has found the next set bit in read_flags).
      
         user buffer [*nr = 5]    _ _ _ _ _
      
         read_flags               0 0 0 0 0   1
         input buffer             h e l l o [EOF]
                                  ^           ^
                                 /           /
                               tail        eol
      
         result: found = 0, tail += 5, *nr += 5
      
      Instead, allow the scan to peek ahead 1 byte (while still limiting the
      scan to completed lines in the input buffer). For the example above,
      
         result: found = 1, tail += 6, *nr += 5
      
      Because the scan limit is now bumped by 1 byte, once the scan is
      completed, the tail advance and the user buffer copy limit are
      re-clamped to *nr when EOF is _not_ found.
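      The scan described above can be modelled in a few lines. This is a
      simplified, hypothetical model rather than the n_tty code: read_flags
      marks end-of-line positions, is_eof marks which of them are EOF push
      characters, avail is the number of bytes in completed lines, and nr is
      the space left in the user buffer. With peek_ahead set it reproduces
      both results shown for the "hello[EOF]" example.

```c
#include <stddef.h>

struct demo_scan_result {
    int found;        /* the end-of-line marker was consumed */
    size_t tail_adv;  /* how far the read tail moves */
    size_t copied;    /* bytes handed to the user buffer */
};

struct demo_scan_result demo_canon_scan(const unsigned char *read_flags,
                                        const int *is_eof, size_t avail,
                                        size_t nr, int peek_ahead)
{
    struct demo_scan_result r;
    size_t limit = peek_ahead ? nr + 1 : nr;   /* peek one byte past nr */
    size_t eol = 0;

    if (limit > avail)
        limit = avail;               /* never scan past completed lines */
    while (eol < limit && !read_flags[eol])
        eol++;

    if (eol < limit && is_eof[eol]) {
        r.found = 1;
        r.tail_adv = eol + 1;        /* eat the EOF marker ... */
        r.copied = eol;              /* ... without copying it */
    } else {
        size_t c = eol + (eol < limit);  /* a normal EOL char is copied */

        if (c > nr)
            c = nr;                  /* re-clamp to the user buffer */
        r.found = (eol < limit) && c == eol + 1;
        r.tail_adv = c;
        r.copied = c;
    }
    return r;
}
```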
      
      Fixes: 40d5e090 ("n_tty: Fix EOF push handling")
      Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
      [ kamal: backported to 3.13: adjusted context ]
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      67ee7d42
    • Dmitry V. Levin's avatar
      sh64: fix __NR_fgetxattr · 371e4358
      Dmitry V. Levin authored
      commit 2d33fa10 upstream.
      
      According to arch/sh/kernel/syscalls_64.S and common sense, __NR_fgetxattr
      should be defined as 259, but it isn't.  Instead, it is defined as 269,
      which is already used by another syscall, __NR_sched_setaffinity in
      this case.
      
      This bug was found by the strace test suite.
      Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
      Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      371e4358
    • Seth Jennings's avatar
      drivers/base/memory.c: prohibit offlining of memory blocks with missing sections · 147ca6b6
      Seth Jennings authored
      commit 26bbe7ef upstream.
      
      Commit bdee237c ("x86: mm: Use 2GB memory block size on large-memory
      x86-64 systems") and 982792c7 ("x86, mm: probe memory block size for
      generic x86 64bit") introduced large block sizes for x86.  This made it
      possible to have multiple sections per memory block where previously
      there was only ever one section per block.
      
      Since blocks consist of contiguous ranges of sections, there can be
      holes in the blocks where sections are not present.  If one attempts
      to offline such a block, a crash occurs since the code is not designed
      to deal with this.
      
      This patch is a quick fix to guard against the crash by not allowing
      blocks with non-present sections to be offlined.
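      A minimal sketch of the guard, assuming a hypothetical array that
      records which sections are present; the -EINVAL return value is also an
      assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>

/*
 * A memory block spans nr_sections sections starting at index first, and
 * present[i] says whether section i exists. Refuse to offline a block
 * with any hole instead of letting later code trip over it.
 */
int demo_offline_block(const int *present, size_t first, size_t nr_sections)
{
    for (size_t i = first; i < first + nr_sections; i++)
        if (!present[i])
            return -EINVAL;   /* hole in the block: reject */
    /* ... the actual offlining would happen here ... */
    return 0;
}
```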
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=107781
      Signed-off-by: Seth Jennings <sjennings@variantweb.net>
      Reported-by: Andrew Banman <abanman@sgi.com>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Greg KH <greg@kroah.com>
      Cc: Russ Anderson <rja@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      147ca6b6
    • Naoya Horiguchi's avatar
      mm: hugetlb: call huge_pte_alloc() only if ptep is null · 9179dc36
      Naoya Horiguchi authored
      commit 0d777df5 upstream.
      
      Currently at the beginning of hugetlb_fault(), we call huge_pte_offset()
      and check whether the obtained *ptep is a migration/hwpoison entry or
      not.  And if not, then we get to call huge_pte_alloc().  This is racy
      because the *ptep could turn into migration/hwpoison entry after the
      huge_pte_offset() check.  This race results in BUG_ON in
      huge_pte_alloc().
      
      We don't have to call huge_pte_alloc() when huge_pte_offset()
      returns non-NULL, so let's fix this bug by moving the code into an
      else block.
      
      Note that the *ptep could turn into a migration/hwpoison entry after
      this block, but that's not a problem because we have another
      !pte_present check later (we never go into hugetlb_no_page() in that
      case.)
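      The control-flow shape of the fix, as a hypothetical user-space model
      (demo_lookup() and demo_alloc() stand in for huge_pte_offset() and
      huge_pte_alloc()): an entry found by the lookup is only inspected, and
      the allocator runs solely on the path where the lookup returned NULL.
      The model does not reproduce the race itself.

```c
#include <stddef.h>

/* Hypothetical page-table slot model; not the kernel's hugetlb types. */
enum demo_entry { DEMO_NONE, DEMO_MIGRATION };

struct demo_table {
    enum demo_entry *slots[4];   /* NULL: no entry allocated yet */
    enum demo_entry storage[4];
    int allocs;                  /* how many times the allocator ran */
};

/* Stand-in for huge_pte_offset(): may return NULL. */
static enum demo_entry *demo_lookup(struct demo_table *t, size_t idx)
{
    return t->slots[idx];
}

/* Stand-in for huge_pte_alloc(): create an empty entry. */
static enum demo_entry *demo_alloc(struct demo_table *t, size_t idx)
{
    t->allocs++;
    t->storage[idx] = DEMO_NONE;
    t->slots[idx] = &t->storage[idx];
    return t->slots[idx];
}

/* Returns 1 if the caller must wait for migration, 0 otherwise. */
int demo_fault(struct demo_table *t, size_t idx)
{
    enum demo_entry *ptep = demo_lookup(t, idx);

    if (ptep) {
        if (*ptep == DEMO_MIGRATION)
            return 1;            /* existing entry: inspect, never realloc */
    } else {
        ptep = demo_alloc(t, idx);
    }
    /* ... continue handling the fault with ptep ... */
    return 0;
}
```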
      
      Fixes: 290408d4 ("hugetlb: hugepage migration core")
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      9179dc36
    • Michal Hocko's avatar
      mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress · 10125ebf
      Michal Hocko authored
      commit 373ccbe5 upstream.
      
      Tetsuo Handa has reported that the system might basically livelock in
      an OOM condition without triggering the OOM killer.
      
      The issue is caused by an internal dependency of the direct reclaim on
      vmstat counter updates (via zone_reclaimable), which are performed from
      the workqueue context.  If all the current workers get assigned to an
      allocation request, though, they will be looping inside the allocator
      trying to reclaim memory, but zone_reclaimable can see stalled numbers,
      so it will consider a zone reclaimable even though it has been scanned
      way too much.  The WQ concurrency logic will not consider this
      situation a congested workqueue because it relies on the worker having
      to sleep in such a situation.  This also means that it doesn't try to
      spawn new workers or invoke the rescuer thread if one is assigned to
      the queue.
      
      In order to fix this issue we need to do two things.  First, we have to
      let the wq concurrency code know that we are in trouble, so we have to
      do a short sleep.  To avoid the issues handled by 0e093d99
      ("writeback: do not sleep on the congestion queue if there are no
      congested BDIs or if significant congestion is not being encountered in
      the current zone"), we limit the sleep to worker threads, which are
      the ones of interest anyway.
      
      The second thing to do is to create a dedicated workqueue for vmstat and
      mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to
      have a spare worker thread for it.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Cristopher Lameter <clameter@sgi.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Arkadiusz Miskiewicz <arekm@maven.pl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      [ luis: backported to 3.16, based on Ben's backport to 3.2:
        - use queue_delayed_work instead of queue_delayed_work_on in function
          vmstat_update()
        - change start_cpu_timer() instead of vmstat_shepherd()
        - adjusted context ]
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
      [ kamal: backport to 3.13-stable: context ]
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      10125ebf
  2. 22 Jan, 2016 1 commit
    • Mikulas Patocka's avatar
      parisc iommu: fix panic due to trying to allocate too large region · f9ac1915
      Mikulas Patocka authored
      commit e46e31a3 upstream.
      
      When using the Promise TX2+ SATA controller on PA-RISC, the system often
      crashes with a kernel panic; for example, just writing data with the dd
      utility will make it crash.
      
      Kernel panic - not syncing: drivers/parisc/sba_iommu.c: I/O MMU @ 000000000000a000 is out of mapping resources
      
      CPU: 0 PID: 18442 Comm: mkspadfs Not tainted 4.4.0-rc2 #2
      Backtrace:
       [<000000004021497c>] show_stack+0x14/0x20
       [<0000000040410bf0>] dump_stack+0x88/0x100
       [<000000004023978c>] panic+0x124/0x360
       [<0000000040452c18>] sba_alloc_range+0x698/0x6a0
       [<0000000040453150>] sba_map_sg+0x260/0x5b8
       [<000000000c18dbb4>] ata_qc_issue+0x264/0x4a8 [libata]
       [<000000000c19535c>] ata_scsi_translate+0xe4/0x220 [libata]
       [<000000000c19a93c>] ata_scsi_queuecmd+0xbc/0x320 [libata]
       [<0000000040499bbc>] scsi_dispatch_cmd+0xfc/0x130
       [<000000004049da34>] scsi_request_fn+0x6e4/0x970
       [<00000000403e95a8>] __blk_run_queue+0x40/0x60
       [<00000000403e9d8c>] blk_run_queue+0x3c/0x68
       [<000000004049a534>] scsi_run_queue+0x2a4/0x360
       [<000000004049be68>] scsi_end_request+0x1a8/0x238
       [<000000004049de84>] scsi_io_completion+0xfc/0x688
       [<0000000040493c74>] scsi_finish_command+0x17c/0x1d0
      
      The cause of the crash is not exhaustion of the IOMMU space; there are
      plenty of free pages. The function sba_alloc_range is called with size
      0x11000, thus the pages_needed variable is 0x11. The function
      sba_search_bitmap is called with bits_wanted 0x11 and a boundary size of
      0x10 (because dma_get_seg_boundary(dev) returns 0xffff).
      
      The function sba_search_bitmap attempts to allocate 17 pages that must
      not cross a 16-page boundary. It can't satisfy this requirement
      (iommu_is_span_boundary always returns true) and fails even if there are
      many free entries in the IOMMU space.
      
      How did it happen that we try to allocate 17 pages that don't cross a
      16-page boundary? The cause is in the function iommu_coalesce_chunks. This
      function tries to coalesce adjacent entries in the scatterlist. The
      function does several checks to decide whether it may coalesce one entry
      with the next; one of those checks is this:
      
      	if (startsg->length + dma_len > max_seg_size)
      		break;
      
      When it finishes coalescing adjacent entries, it allocates the mapping:
      
      sg_dma_len(contig_sg) = dma_len;
      dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
      sg_dma_address(contig_sg) =
      	PIDE_FLAG
      	| (iommu_alloc_range(ioc, dev, dma_len) << IOVP_SHIFT)
      	| dma_offset;
      
      It is possible that (startsg->length + dma_len > max_seg_size) is false
      (we are just near the 0x10000 max_seg_size boundary), so the function
      decides to coalesce this entry with the next entry. When the coalescing
      succeeds, the function performs
      	dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
      And now, because of non-zero dma_offset, dma_len is greater than 0x10000.
      iommu_alloc_range (a pointer to sba_alloc_range) is called and it attempts
      to allocate 17 pages for a device that must not cross 16-page boundary.
      
      To fix the bug, we must make sure that dma_len after addition of
      dma_offset and alignment doesn't cross the segment boundary. I.e. change
      	if (startsg->length + dma_len > max_seg_size)
      		break;
      to
      	if (ALIGN(dma_len + dma_offset + startsg->length, IOVP_SIZE) > max_seg_size)
      		break;
      
      This patch makes this change (it precalculates max_seg_boundary at the
      beginning of the function iommu_coalesce_chunks). I also added a check
      that the mapping length doesn't exceed dma_get_seg_boundary(dev) (it is
      not needed for Promise TX2+ SATA, but it may be needed for other devices
      that have dma_get_seg_boundary lower than dma_get_max_seg_size).
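      The arithmetic can be checked with a small C sketch. DEMO_ALIGN()
      mirrors the kernel's ALIGN() for power-of-two alignments, the 4 KiB I/O
      page size is inferred from the 0x11000 to 0x11 page conversion above,
      and the sizes used in the checks (dma_len 0xF000, a 0x1000 entry,
      dma_offset 0x200) are illustrative values, not taken from the report.
      All function names are hypothetical.

```c
#include <stddef.h>

/* Same rounding as the kernel's ALIGN() for a power-of-two a. */
#define DEMO_ALIGN(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))
#define DEMO_IOVP_SIZE 0x1000UL   /* assumed 4 KiB I/O page */

/* Old test: ignores dma_offset and rounding to a full I/O page. */
int demo_old_would_coalesce(size_t dma_len, size_t sg_len, size_t max_seg)
{
    return !(sg_len + dma_len > max_seg);
}

/* New test: checks the length that will actually be mapped. */
int demo_new_would_coalesce(size_t dma_len, size_t dma_offset,
                            size_t sg_len, size_t max_seg)
{
    return !(DEMO_ALIGN(dma_len + dma_offset + sg_len, DEMO_IOVP_SIZE) > max_seg);
}

/* I/O pages requested from the IOMMU once coalescing is done. */
size_t demo_pages_needed(size_t dma_len, size_t dma_offset)
{
    return DEMO_ALIGN(dma_len + dma_offset, DEMO_IOVP_SIZE) / DEMO_IOVP_SIZE;
}
```

      With those values the old test allows coalescing into a 0x10000-byte
      chunk, demo_pages_needed(0x10000, 0x200) then asks for 17 pages, and the
      new test refuses to coalesce in the first place.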
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      f9ac1915