1. 13 Feb, 2014 12 commits
    • mm/memory-failure.c: shift page lock from head page to tail page after thp split · 9af7b867
      Naoya Horiguchi authored
      commit 54b9dd14 upstream.
      
      After thp split in hwpoison_user_mappings(), we hold the page lock on the
      raw error page only around try_to_unmap(), so we are exposed to a race
      condition.
      
      I found in the RHEL7 MCE-relay testing that we have "bad page" error
      when a memory error happens on a thp tail page used by qemu-kvm:
      
        Triggering MCE exception on CPU 10
        mce: [Hardware Error]: Machine check events logged
        MCE exception done on CPU 10
        MCE 0x38c535: Killing qemu-kvm:8418 due to hardware memory corruption
        MCE 0x38c535: dirty LRU page recovery: Recovered
        qemu-kvm[8418]: segfault at 20 ip 00007ffb0f0f229a sp 00007fffd6bc5240 error 4 in qemu-kvm[7ffb0ef14000+420000]
        BUG: Bad page state in process qemu-kvm  pfn:38c400
        page:ffffea000e310000 count:0 mapcount:0 mapping:          (null) index:0x7ffae3c00
        page flags: 0x2fffff0008001d(locked|referenced|uptodate|dirty|swapbacked)
        Modules linked in: hwpoison_inject mce_inject vhost_net macvtap macvlan ...
        CPU: 0 PID: 8418 Comm: qemu-kvm Tainted: G   M        --------------   3.10.0-54.0.1.el7.mce_test_fixed.x86_64 #1
        Hardware name: NEC NEC Express5800/R120b-1 [N8100-1719F]/MS-91E7-001, BIOS 4.6.3C19 02/10/2011
        Call Trace:
          dump_stack+0x19/0x1b
          bad_page.part.59+0xcf/0xe8
          free_pages_prepare+0x148/0x160
          free_hot_cold_page+0x31/0x140
          free_hot_cold_page_list+0x46/0xa0
          release_pages+0x1c1/0x200
          free_pages_and_swap_cache+0xad/0xd0
          tlb_flush_mmu.part.46+0x4c/0x90
          tlb_finish_mmu+0x55/0x60
          exit_mmap+0xcb/0x170
          mmput+0x67/0xf0
          vhost_dev_cleanup+0x231/0x260 [vhost_net]
          vhost_net_release+0x3f/0x90 [vhost_net]
          __fput+0xe9/0x270
          ____fput+0xe/0x10
          task_work_run+0xc4/0xe0
          do_exit+0x2bb/0xa40
          do_group_exit+0x3f/0xa0
          get_signal_to_deliver+0x1d0/0x6e0
          do_signal+0x48/0x5e0
          do_notify_resume+0x71/0xc0
          retint_signal+0x48/0x8c
      
      The reason for this bug is that a page fault happens before the head
      page is unlocked at the end of memory_failure().  This strange page fault
      tries to access address 0x20, and I'm not sure why qemu-kvm does this,
      but as a result the SIGSEGV makes qemu-kvm exit, and on the way out we
      hit the bad page bug/warning because we try to free a locked page (the
      former head page).
      
      To fix this, this patch shifts the page lock from the head page to the
      tail page just after the thp split.  SIGSEGV still happens, but it affects
      only the VMs hit by the error, not the whole system.
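
      As a rough illustration only, the lock hand-over described above can be
      modelled in userspace C.  struct toy_page and toy_shift_lock() are
      hypothetical names, not kernel code; the real patch operates on
      struct page with the kernel's page locking helpers.

```c
#include <stdbool.h>

/* Toy stand-in for struct page: it only tracks the lock bit. */
struct toy_page {
    bool locked;
};

/*
 * After a (pretend) thp split, move the lock from the former head page
 * to the raw error page and return the page that is now locked.
 * The new lock is taken before the old one is dropped, so there is no
 * window in which neither page is locked.
 */
struct toy_page *toy_shift_lock(struct toy_page *head, struct toy_page *raw)
{
    if (head != raw) {
        raw->locked = true;   /* models locking the raw error page */
        head->locked = false; /* models unlocking the former head page */
    }
    return raw;
}
```

      With this ordering, the former head page is no longer locked by the
      time it can be freed, which is the state the bad page check expects.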
      
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xen/pvhvm: If xen_platform_pci=0 is set don't blow up (v4). · 4aa9ed1b
      Konrad Rzeszutek Wilk authored
      commit 51c71a3b upstream.
      
      The user has the option of disabling the platform driver:
      00:02.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)
      
      which is used to unplug the emulated drivers (IDE, Realtek 8169, etc)
      and allow the PV drivers to take over. If the user wishes
      to disable that they can set:
      
        xen_platform_pci=0
        (in the guest config file)
      
      or
        xen_emul_unplug=never
        (on the Linux command line)
      
      except that it does not work properly. The PV drivers still try to
      load, and since the Xen platform driver has not run and so has not
      initialized the grant tables, most of the PV drivers stumble upon:
      
      input: Xen Virtual Keyboard as /devices/virtual/input/input5
      input: Xen Virtual Pointer as /devices/virtual/input/input6M
      ------------[ cut here ]------------
      kernel BUG at /home/konrad/ssd/konrad/linux/drivers/xen/grant-table.c:1206!
      invalid opcode: 0000 [#1] SMP
      Modules linked in: xen_kbdfront(+) xenfs xen_privcmd
      CPU: 6 PID: 1389 Comm: modprobe Not tainted 3.13.0-rc1upstream-00021-ga6c892b-dirty #1
      Hardware name: Xen HVM domU, BIOS 4.4-unstable 11/26/2013
      RIP: 0010:[<ffffffff813ddc40>]  [<ffffffff813ddc40>] get_free_entries+0x2e0/0x300
      Call Trace:
       [<ffffffff8150d9a3>] ? evdev_connect+0x1e3/0x240
       [<ffffffff813ddd0e>] gnttab_grant_foreign_access+0x2e/0x70
       [<ffffffffa0010081>] xenkbd_connect_backend+0x41/0x290 [xen_kbdfront]
       [<ffffffffa0010a12>] xenkbd_probe+0x2f2/0x324 [xen_kbdfront]
       [<ffffffff813e5757>] xenbus_dev_probe+0x77/0x130
       [<ffffffff813e7217>] xenbus_frontend_dev_probe+0x47/0x50
       [<ffffffff8145e9a9>] driver_probe_device+0x89/0x230
       [<ffffffff8145ebeb>] __driver_attach+0x9b/0xa0
       [<ffffffff8145eb50>] ? driver_probe_device+0x230/0x230
       [<ffffffff8145eb50>] ? driver_probe_device+0x230/0x230
       [<ffffffff8145cf1c>] bus_for_each_dev+0x8c/0xb0
       [<ffffffff8145e7d9>] driver_attach+0x19/0x20
       [<ffffffff8145e260>] bus_add_driver+0x1a0/0x220
       [<ffffffff8145f1ff>] driver_register+0x5f/0xf0
       [<ffffffff813e55c5>] xenbus_register_driver_common+0x15/0x20
       [<ffffffff813e76b3>] xenbus_register_frontend+0x23/0x40
       [<ffffffffa0015000>] ? 0xffffffffa0014fff
       [<ffffffffa001502b>] xenkbd_init+0x2b/0x1000 [xen_kbdfront]
       [<ffffffff81002049>] do_one_initcall+0x49/0x170
      
      .. snip..
      
      which is hardly nice. This patch fixes this by having each
      PV driver check for:
       - if running in PV, then it is fine to execute (as that is their
         native environment).
       - if running in HVM, check if user wanted 'xen_emul_unplug=never',
         in which case bail out and don't load any PV drivers.
       - if running in HVM, and if PCI device 5853:0001 (xen_platform_pci)
         does not exist, then bail out and do not load PV drivers.
       - (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=ide-disks',
         then bail out for all PV devices _except_ the block one.
         Ditto for the network one ('nics').
       - (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=unnecessary'
         then load block PV driver, and also setup the legacy IDE paths.
         In (v3) make it actually load PV drivers.
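
      The checks listed above can be sketched as a single decision function.
      This is a simplified userspace model: all toy_* names are hypothetical,
      and it assumes exactly one 'xen_emul_unplug' value is in effect, which
      is an assumption of this sketch rather than something the patch states.

```c
#include <stdbool.h>

enum toy_domain { TOY_PV, TOY_HVM };
enum toy_dev { TOY_DEV_BLOCK, TOY_DEV_NIC, TOY_DEV_OTHER };
enum toy_unplug {
    TOY_UNPLUG_DEFAULT,     /* no xen_emul_unplug given */
    TOY_UNPLUG_NEVER,
    TOY_UNPLUG_IDE_DISKS,
    TOY_UNPLUG_NICS,
    TOY_UNPLUG_UNNECESSARY
};

/* Decide whether a PV driver for device class 'dev' may load. */
bool toy_pv_driver_may_load(enum toy_domain dom, bool platform_pci_present,
                            enum toy_unplug opt, enum toy_dev dev)
{
    if (dom == TOY_PV)
        return true;            /* PV is the drivers' native environment */
    if (opt == TOY_UNPLUG_NEVER)
        return false;           /* user asked for emulated devices only */
    if (!platform_pci_present)
        return false;           /* xen_platform_pci=0: device 5853:0001 absent */

    switch (opt) {
    case TOY_UNPLUG_IDE_DISKS:
        return dev == TOY_DEV_BLOCK;  /* only the block PV driver */
    case TOY_UNPLUG_NICS:
        return dev == TOY_DEV_NIC;    /* only the network PV driver */
    default:
        return true;            /* default and 'unnecessary': PV drivers load */
    }
}
```

      For example, an HVM guest without the platform device gets no PV
      drivers, while a PV guest is never affected by these options.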
      
      Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
      Reported-by: Anthony PERARD <anthony.perard@citrix.com>
      Reported-and-Tested-by: Fabio Fantoni <fabio.fantoni@m2r.biz>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [v2: Add extra logic to handle the myriad ways 'xen_emul_unplug'
      can be used per Ian and Stefano suggestion]
      [v3: Make the unnecessary case work properly]
      [v4: s/disks/ide-disks/ spotted by Fabio]
      Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com> [for PCI parts]
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • audit: correct a type mismatch in audit_syscall_exit() · 941a0851
      AKASHI Takahiro authored
      commit 06bdadd7 upstream.
      
      audit_syscall_exit() saves the result of regs_return_value() in an
      intermediate "int" variable and passes it to __audit_syscall_exit(), which
      expects its second argument to be a "long" value.  This truncates the
      value returned by a system call and produces a wrong audit record.
      
      I don't know why gcc doesn't complain about this, but it causes a
      problem at runtime on arm64 (and probably most 64-bit archs).
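
      A small userspace sketch of the truncation, assuming an LP64 platform
      where "long" is 64 bits and "int" is 32 bits.  The toy_* functions are
      hypothetical and only mirror the shape of the problem.  Implicit
      narrowing of this kind is accepted by gcc without a diagnostic unless
      -Wconversion is enabled.

```c
/* Problematic shape: the long result is squeezed through an int. */
long toy_pass_via_int(long ret)
{
    int tmp = ret;  /* upper 32 bits are lost on LP64 */
    return tmp;
}

/* Corrected shape: the intermediate variable matches the callee's type. */
long toy_pass_via_long(long ret)
{
    long tmp = ret;
    return tmp;
}
```

      A return value such as an mmap() address above 4 GiB survives the
      second function but not the first.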
      
      Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Eric Paris <eparis@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • audit: reset audit backlog wait time after error recovery · 5f6c20cd
      Richard Guy Briggs authored
      commit e789e561 upstream.
      
      When the audit queue overflows and times out (audit_backlog_wait_time), the
      audit queue overflow timeout is set to zero.  Once the audit queue overflow
      timeout condition recovers, the timeout should be reset to the original value.
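
      The intended behaviour can be sketched as a two-state toggle.  The
      struct, the function names, and the default value of 60 are
      illustrative assumptions, not the kernel's actual definitions.

```c
#define TOY_BACKLOG_WAIT_DEFAULT 60 /* hypothetical default, arbitrary units */

struct toy_audit_state {
    int backlog_wait_time;
};

/* Queue overflowed and the wait timed out: stop waiting from now on. */
void toy_backlog_timed_out(struct toy_audit_state *s)
{
    s->backlog_wait_time = 0;
}

/* Queue has room again: restore the original wait time. */
void toy_backlog_recovered(struct toy_audit_state *s)
{
    s->backlog_wait_time = TOY_BACKLOG_WAIT_DEFAULT;
}
```

      Without the second step the wait time would stay at zero forever after
      the first overflow.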
      
      See also:
      	https://lkml.org/lkml/2013/9/2/473
      
      Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
      Signed-off-by: Dan Duval <dan.duval@oracle.com>
      Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
      Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fuse: fix pipe_buf_operations · cfb7ee59
      Miklos Szeredi authored
      commit 28a625cb upstream.
      
      Having this struct in module memory could Oops if the module is
      unloaded while the buffer still persists in a pipe.
      
      Since sock_pipe_buf_ops is essentially the same as fuse_dev_pipe_buf_ops,
      merge them into nosteal_pipe_buf_ops (this is the same as
      default_pipe_buf_ops except that stealing the page from the buffer is not
      allowed).
      
      Reported-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Revert "EISA: Initialize device before its resources" · eb45c9b3
      Bjorn Helgaas authored
      commit 765ee51f upstream.
      
      This reverts commit 26abfeed.
      
      In the eisa_probe() force_probe path, if we were unable to request slot
      resources (e.g., [io 0x800-0x8ff]), we skipped the slot with "Cannot
      allocate resource for EISA slot %d" before reading the EISA signature in
      eisa_init_device().
      
      Commit 26abfeed moved eisa_init_device() earlier, so we tried to read
      the EISA signature before requesting the slot resources, and this caused
      hangs during boot.
      
      Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1251816
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • intel-iommu: fix off-by-one in pagetable freeing · 1c6270df
      Alex Williamson authored
      commit 08336fd2 upstream.
      
      dma_pte_free_level() has an off-by-one error when checking whether a pte
      is completely covered by a range.  Take for example the case of
      attempting to free pfn 0x0 - 0x1ff, i.e. 512 entries covering the first
      2M superpage.
      
      The level_size() is 0x200 and we test:
      
        static void dma_pte_free_level(...
      	...
      
      	if (!(0 > 0 || 0x1ff < 0 + 0x200)) {
      		...
      	}
      
      Clearly the 2nd test is true, which means we fail to take the branch to
      clear and free the pagetable entry.  As a result, we're leaking
      pagetables and failing to install new pages over the range.
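
      The arithmetic can be checked in isolation.  The sketch below is a
      userspace model with hypothetical toy_* names: the first function
      reproduces the test as quoted above, the second compares against the
      last pfn the entry actually covers (level_pfn + level_size - 1), which
      is one way to remove the off-by-one.

```c
#include <stdbool.h>

/* Test as quoted: compares last_pfn with one past the end of the entry. */
bool toy_covered_as_quoted(unsigned long start_pfn, unsigned long last_pfn,
                           unsigned long level_pfn, unsigned long level_size)
{
    return !(start_pfn > level_pfn || last_pfn < level_pfn + level_size);
}

/* Inclusive test: the entry covers level_pfn .. level_pfn + level_size - 1. */
bool toy_covered_inclusive(unsigned long start_pfn, unsigned long last_pfn,
                           unsigned long level_pfn, unsigned long level_size)
{
    return !(start_pfn > level_pfn ||
             last_pfn < level_pfn + level_size - 1);
}
```

      For the range 0x0 - 0x1ff and a level size of 0x200, the quoted test
      reports "not covered" while the inclusive test reports "covered".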
      
      This was found with a PCI device assigned to a QEMU guest using vfio-pci
      without a VGA device present.  The first 1M of guest address space is
      mapped with various combinations of 4K pages, but eventually the range
      is entirely freed and replaced with a 2M contiguous mapping.
      intel-iommu errors out with something like:
      
        ERROR: DMA PTE for vPFN 0x0 already set (to 5c2b8003 not 849c00083)
      
      In this case 5c2b8003 is the pointer to the previous leaf page that was
      neither freed nor cleared and 849c00083 is the superpage entry that
      we're trying to replace it with.
      
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Joerg Roedel <joro@8bytes.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arch/sh/kernel/kgdb.c: add missing #include <linux/sched.h> · 032e521f
      Wanlong Gao authored
      commit 53a52f17 upstream.
      
        arch/sh/kernel/kgdb.c: In function 'sleeping_thread_to_gdb_regs':
        arch/sh/kernel/kgdb.c:225:32: error: implicit declaration of function 'task_stack_page' [-Werror=implicit-function-declaration]
        arch/sh/kernel/kgdb.c:242:23: error: dereferencing pointer to incomplete type
        arch/sh/kernel/kgdb.c:243:22: error: dereferencing pointer to incomplete type
        arch/sh/kernel/kgdb.c: In function 'singlestep_trap_handler':
        arch/sh/kernel/kgdb.c:310:27: error: 'SIGTRAP' undeclared (first use in this function)
        arch/sh/kernel/kgdb.c:310:27: note: each undeclared identifier is reported only once for each function it appears in
      
      This was introduced by commit 16559ae4 ("kgdb: remove #include
      <linux/serial_8250.h> from kgdb.h").
      
      [geert@linux-m68k.org: reworded and reformatted]
      Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
      Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org>
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tracing: Check if tracing is enabled in trace_puts() · b64f0de1
      Steven Rostedt (Red Hat) authored
      commit 3132e107 upstream.
      
      If trace_puts() is used very early in boot up, it can crash the machine
      if it is called before the ring buffer is allocated. If a trace_printk()
      is used with no arguments, then it will be converted into a trace_puts()
      and suffer the same fate.
      
      Fixes: 09ae7234 "tracing: Add trace_puts() for even faster trace_printk() tracing"
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tracing: Have trace buffer point back to trace_array · 3b5121fd
      Steven Rostedt (Red Hat) authored
      commit dced341b upstream.
      
      The trace buffer has a descriptor pointer that goes back to the trace
      array. But it was never assigned. Luckily, nothing uses it (yet), but
      it will in the future.
      
      Although nothing currently uses this, if any of the new features get
      backported to older kernels, and because this is such a simple change,
      I'm marking it for stable too.
      
      Fixes: 12883efb "tracing: Consolidate max_tr into main trace_array structure"
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • SELinux: Fix memory leak upon loading policy · 07415a18
      Tetsuo Handa authored
      commit 8ed81460 upstream.
      
      Hello.
      
      I got the leak below with linux-3.10.0-54.0.1.el7.x86_64.
      
      [  681.903890] kmemleak: 5538 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
      
      Below is a patch, but I don't know whether we need special handling for
      undoing the ebitmap_set_bit() call.
      ----------
      >>From fe97527a90fe95e2239dfbaa7558f0ed559c0992 Mon Sep 17 00:00:00 2001
      From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Date: Mon, 6 Jan 2014 16:30:21 +0900
      Subject: SELinux: Fix memory leak upon loading policy
      
      Commit 2463c26d "SELinux: put name based create rules in a hashtable" did not
      check the return value of hashtab_insert() in filename_trans_read(). It leaks
      memory if hashtab_insert() returns an error.
      
        unreferenced object 0xffff88005c9160d0 (size 8):
          comm "systemd", pid 1, jiffies 4294688674 (age 235.265s)
          hex dump (first 8 bytes):
            57 0b 00 00 6b 6b 6b a5                          W...kkk.
          backtrace:
            [<ffffffff816604ae>] kmemleak_alloc+0x4e/0xb0
            [<ffffffff811cba5e>] kmem_cache_alloc_trace+0x12e/0x360
            [<ffffffff812aec5d>] policydb_read+0xd1d/0xf70
            [<ffffffff812b345c>] security_load_policy+0x6c/0x500
            [<ffffffff812a623c>] sel_write_load+0xac/0x750
            [<ffffffff811eb680>] vfs_write+0xc0/0x1f0
            [<ffffffff811ec08c>] SyS_write+0x4c/0xa0
            [<ffffffff81690419>] system_call_fastpath+0x16/0x1b
            [<ffffffffffffffff>] 0xffffffffffffffff
      
      However, we should not return an EEXIST error to the caller, or systemd will
      show the message below and the boot sequence will freeze.
      
        systemd[1]: Failed to load SELinux policy. Freezing.
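
      The pattern described above (check the insert result, free what was
      not stored, and do not treat a duplicate as fatal) can be sketched in
      userspace C.  The single-slot toy_table and both toy_* functions are
      hypothetical stand-ins, not the SELinux hashtab API.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-slot table, just enough to produce -EEXIST. */
struct toy_table {
    const char *key;
    void *datum;
};

int toy_insert(struct toy_table *t, const char *key, void *datum)
{
    if (t->key && strcmp(t->key, key) == 0)
        return -EEXIST;  /* duplicate key */
    if (t->key)
        return -ENOMEM;  /* toy table is full */
    t->key = key;
    t->datum = datum;
    return 0;
}

/* Returns 0 on success or duplicate; frees the datum if it was not stored. */
int toy_read_entry(struct toy_table *t, const char *key, int value)
{
    int *datum = malloc(sizeof(*datum));
    int ret;

    if (!datum)
        return -ENOMEM;
    *datum = value;

    ret = toy_insert(t, key, datum);
    if (ret) {
        free(datum);     /* the table did not take ownership */
        if (ret == -EEXIST)
            ret = 0;     /* a duplicate must not abort loading */
    }
    return ret;
}
```

      Other errors are still propagated, so only the duplicate case is
      tolerated.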
      
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Eric Paris <eparis@redhat.com>
      Signed-off-by: Paul Moore <pmoore@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mei: mei_hbm_dispatch() returns void · fec8eb1f
      Paul Bolle authored
      Building hbm.o for v3.13.2 triggers a GCC warning:
          drivers/misc/mei/hbm.c: In function 'mei_hbm_dispatch':
          drivers/misc/mei/hbm.c:596:3: warning: 'return' with a value, in function returning void [enabled by default]
             return 0;
             ^
      
      GCC is correct, obviously. So let's return void instead of zero here.
      
      Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
      Acked-by: Tomas Winkler <tomas.winkler@intel.com>
      Cc: Alexander Usyskin <alexander.usyskin@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 06 Feb, 2014 28 commits
    • Linux 3.13.2 · fd82174a
      Greg Kroah-Hartman authored
    • x86, cpu, amd: Add workaround for family 16h, erratum 793 · 31752e89
      Borislav Petkov authored
      commit 3b564968 upstream.
      
      This adds the workaround for erratum 793 as a precaution in case not
      every BIOS implements it.  This addresses CVE-2013-6885.
      
      Erratum text:
      
      [Revision Guide for AMD Family 16h Models 00h-0Fh Processors,
      document 51810 Rev. 3.04 November 2013]
      
      793 Specific Combination of Writes to Write Combined Memory Types and
      Locked Instructions May Cause Core Hang
      
      Description
      
      Under a highly specific and detailed set of internal timing
      conditions, a locked instruction may trigger a timing sequence whereby
      the write to a write combined memory type is not flushed, causing the
      locked instruction to stall indefinitely.
      
      Potential Effect on System
      
      Processor core hang.
      
      Suggested Workaround
      
      BIOS should set MSR C001_1020[15] = 1b.
      
      Fix Planned
      
      No fix planned
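
      The suggested workaround amounts to setting bit 15 of the MSR value if
      it is not already set.  The sketch below models only that bit
      arithmetic in userspace; toy_apply_erratum_793() is a hypothetical
      helper, and actual code would read and write the MSR through the
      kernel's MSR accessors.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_ERRATUM_793_BIT (UINT64_C(1) << 15)

/*
 * Compute the MSR value with bit 15 set and store it in *out.
 * Returns true when the value changed, i.e. a write-back is needed.
 */
bool toy_apply_erratum_793(uint64_t msr_val, uint64_t *out)
{
    *out = msr_val | TOY_ERRATUM_793_BIT;
    return *out != msr_val;
}
```

      When the BIOS has already applied the workaround the function reports
      that no write is needed.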
      
      [ hpa: updated description, fixed typo in MSR name ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/20140114230711.GS29865@pd.tnic
      Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc: Make sure "cache" directory is removed when offlining cpu · 8698e686
      Paul Mackerras authored
      commit 91b973f9 upstream.
      
      The code in remove_cache_dir() is supposed to remove the "cache"
      subdirectory from the sysfs directory for a CPU when that CPU is
      being offlined.  It tries to do this by calling kobject_put() on
      the kobject for the subdirectory.  However, the subdirectory only
      gets removed once the last reference goes away, and the reference
      being put here may well not be the last reference.  That means
      that the "cache" subdirectory may still exist when the offlining
      operation has finished.  If the same CPU subsequently gets onlined,
      the code tries to add a new "cache" subdirectory.  If the old
      subdirectory has not yet been removed, we get a WARN_ON in the
      sysfs code, with stack trace, and an error message printed on the
      console.  Further, we ultimately end up with an online cpu with no
      "cache" subdirectory.
      
      This fixes it by doing an explicit kobject_del() at the point where
      we want the subdirectory to go away.  kobject_del() removes the sysfs
      directory even though the object still exists in memory.  The object
      will get freed at some point in the future.  A subsequent onlining
      operation can create a new sysfs directory, even if the old object
      still exists in memory, without causing any problems.
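
      The difference between dropping a reference and deleting explicitly
      can be shown with a toy refcount model.  struct toy_kobj and the toy_*
      functions are hypothetical and only imitate the behaviour described
      above; they are not the kobject API.

```c
#include <stdbool.h>

struct toy_kobj {
    int refcount;
    bool in_sysfs; /* is the directory still visible? */
};

/* Drop one reference; the directory goes away only with the last one. */
void toy_put(struct toy_kobj *k)
{
    if (--k->refcount == 0)
        k->in_sysfs = false;
}

/* Remove the directory now, however many references remain. */
void toy_del(struct toy_kobj *k)
{
    k->in_sysfs = false;
}

/* A new directory with the same name can be added only once the old one is gone. */
bool toy_can_add_same_name(const struct toy_kobj *old)
{
    return !old->in_sysfs;
}
```

      With two references outstanding, a single put leaves the directory in
      place, whereas an explicit delete frees the name immediately.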
      
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc: Fix the setup of CPU-to-Node mappings during CPU online · 5326f5fc
      Srivatsa S. Bhat authored
      commit d4edc5b6 upstream.
      
      On POWER platforms, the hypervisor can notify the guest kernel about dynamic
      changes in the cpu-numa associativity (VPHN topology update). Hence the
      cpu-to-node mappings that we got from the firmware during boot, may no longer
      be valid after such updates. This is handled using the arch_update_cpu_topology()
      hook in the scheduler, and the sched-domains are rebuilt according to the new
      mappings.
      
      But unfortunately, at the moment, CPU hotplug ignores these updated mappings
      and instead queries the firmware for the cpu-to-numa relationships and uses
      them during CPU online. So the kernel can end up assigning wrong NUMA nodes
      to CPUs during subsequent CPU hotplug online operations (after booting).
      
      Further, a particularly problematic scenario can result from this bug:
      On POWER platforms, the SMT mode can be switched between 1, 2, 4 (and even 8)
      threads per core. The switch to Single-Threaded (ST) mode is performed by
      offlining all except the first CPU thread in each core. Switching back to
      SMT mode involves onlining those other threads back, in each core.
      
      Now consider this scenario:
      
      1. During boot, the kernel gets the cpu-to-node mappings from the firmware
         and assigns the CPUs to NUMA nodes appropriately, during CPU online.
      
      2. Later on, the hypervisor updates the cpu-to-node mappings dynamically and
         communicates this update to the kernel. The kernel in turn updates its
         cpu-to-node associations and rebuilds its sched domains. Everything is
         fine so far.
      
      3. Now, the user switches the machine from SMT to ST mode (say, by running
         ppc64_cpu --smt=1). This involves offlining all except 1 thread in each
         core.
      
      4. The user then tries to switch back from ST to SMT mode (say, by running
         ppc64_cpu --smt=4), and this involves onlining those threads back. Since
         CPU hotplug ignores the new mappings, it queries the firmware and tries to
         associate the newly onlined sibling threads to the old NUMA nodes. This
         results in sibling threads within the same core getting associated with
         different NUMA nodes, which is incorrect.
      
         The scheduler's build-sched-domains code gets thoroughly confused with this
         and enters an infinite loop and causes soft-lockups, as explained in detail
         in commit 3be7db6a (powerpc: VPHN topology change updates all siblings).
      
      So to fix this, use the numa_cpu_lookup_table to remember the updated
      cpu-to-node mappings, and use them during CPU hotplug online operations.
      Further, we also need to ensure that all threads in a core are assigned to a
      common NUMA node, irrespective of whether all those threads were online during
      the topology update. To achieve this, we take care not to use cpu_sibling_mask()
      since it is not hotplug invariant. Instead, we use cpu_first_sibling_thread()
      and set up the mappings manually using the 'threads_per_core' value for that
      particular platform. This helps us ensure that we don't hit this bug with any
      combination of CPU hotplug and SMT mode switching.
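
      A minimal sketch of the "set up the mappings manually" step, assuming
      threads of a core are numbered contiguously and threads_per_core is a
      power of two.  The table, its size, and the toy_* helpers are
      hypothetical and stand in for the kernel's lookup table.

```c
#define TOY_NR_CPUS 16

/* First thread of the core that 'cpu' belongs to. */
int toy_first_thread_sibling(int cpu, int threads_per_core)
{
    return cpu & ~(threads_per_core - 1);
}

/*
 * Record 'node' for every thread of cpu's core, online or not, so a
 * later hotplug online operation finds the updated mapping.
 */
void toy_update_core_node(int table[TOY_NR_CPUS], int cpu,
                          int threads_per_core, int node)
{
    int first = toy_first_thread_sibling(cpu, threads_per_core);
    int i;

    for (i = 0; i < threads_per_core && first + i < TOY_NR_CPUS; i++)
        table[first + i] = node;
}
```

      Because the loop walks the core by index instead of consulting a mask
      of online siblings, offline threads receive the same node as their
      online siblings.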
      
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: restrict snapshotting to own subvolumes · 000bed8a
      David Sterba authored
      commit d0242061 upstream.
      
      Currently, any user can snapshot any subvolume if the path is accessible and
      thus indirectly create and keep files he does not own under his directories.
      This is not possible with traditional directories.
      
      In a security context, a user can snapshot the root filesystem and pin any
      potentially buggy binaries, even if updates are applied.
      
      All the snapshots are visible to the administrator, so it's possible to
      verify if there are suspicious snapshots.
      
      Another more practical problem is that any user can pin the space used
      by e.g. root and cause ENOSPC.
      
      Original report:
      https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/484786
      
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: setup inode location during btrfs_init_inode_locked · 5d4d57d2
      Chris Mason authored
      commit 90d3e592 upstream.
      
      We have a race during inode init because the BTRFS_I(inode)->location is set up
      after the inode hash table lock is dropped.  btrfs_find_actor uses the location
      field, so our search might not find an existing inode in the hash table if we
      race with the inode init code.
      
      This commit changes things to set up the location field sooner.  Also, the find
      actor now uses only the location objectid to match inodes.  For inode hashing,
      we just need a unique and stable test; it doesn't have to reflect the inode
      numbers we show to userland.
      
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: handle EAGAIN case properly in btrfs_drop_snapshot() · fb0f7df2
      Wang Shilong authored
      commit 90515e7f upstream.
      
      We may return early from btrfs_drop_snapshot(); we shouldn't
      call btrfs_std_err() in this case, so fix it.
      
      Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • target/iscsi: Fix network portal creation race · 0bf44d68
      Andy Grover authored
      commit ee291e63 upstream.
      
      When creating network portals rapidly, such as when restoring a
      configuration, LIO's code to reuse existing portals can return a false
      negative if the thread hasn't run yet and set np_thread_state to
      ISCSI_NP_THREAD_ACTIVE. This causes an error in the network stack
      when attempting to bind to the same address/port.
      
      This patch sets NP_THREAD_ACTIVE before the np is placed on g_np_list,
      so even if the thread hasn't run yet, iscsit_get_np will return the
      existing np.
      
      Also, convert np_lock -> np_mutex + hold across adding new net portal
      to g_np_list to prevent a race where two threads may attempt to create
      the same network portal, resulting in one of them failing.
      
      (nab: Add missing mutex_unlocks in iscsit_add_np failure paths)
      (DanC: Fix incorrect spin_unlock -> spin_unlock_bh)
      Signed-off-by: Andy Grover <agrover@redhat.com>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • iscsi-target: Pre-allocate more tags to avoid ack starvation · e00d98a6
      Nicholas Bellinger authored
      commit 4a4caa29 upstream.
      
      This patch addresses a traditional iscsi-target fabric ack starvation
      issue where iscsit_allocate_cmd() -> percpu_ida_alloc_state() ends up
      hitting slow path percpu-ida code, because iscsit_ack_from_expstatsn()
      is expected to free ack'ed tags after tag allocation.
      
      This is done to take into account the tags waiting to be acknowledged
      and released in iscsit_ack_from_expstatsn(), but whose number is not
      directly limited by the CmdSN Window queue_depth being enforced by
      the target.
      
      So that said, this patch bumps up the pre-allocated number of
      per session tags to:
      
        (max(queue_depth, ISCSIT_MIN_TAGS) * 2) + ISCSIT_EXTRA_TAGS
      
      for good measure to avoid the percpu_ida_alloc_state() slow path.
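
      The formula above, written out as a small function.  The minimum and
      extra tag counts are passed in as parameters because their values are
      not stated here; the numbers in the usage note are purely
      illustrative, and toy_session_tags() is a hypothetical name.

```c
/* (max(queue_depth, min_tags) * 2) + extra_tags */
unsigned int toy_session_tags(unsigned int queue_depth,
                              unsigned int min_tags,
                              unsigned int extra_tags)
{
    unsigned int base = queue_depth > min_tags ? queue_depth : min_tags;

    return base * 2 + extra_tags;
}
```

      For instance, with an assumed minimum of 16 and 8 extra tags, a queue
      depth of 64 yields 136 tags, and a queue depth of 4 yields 40.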
      
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • virtio-scsi: Fix hotcpu_notifier use-after-free with virtscsi_freeze · f91c2051
      Asias He authored
      commit f466f753 upstream.
      
      vqs are freed in virtscsi_freeze but the hotcpu_notifier is not
      unregistered. This results in a use-after-free when the notifier
      callback is called after virtscsi_freeze.
      
      Fixes: 285e71ea ("virtio-scsi: reset virtqueue affinity when doing cpu hotplug")
      Signed-off-by: Asias He <asias.hejun@gmail.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f91c2051
    • Dan Carpenter's avatar
      SCSI: qla4xxx: overflow in qla4xxx_set_chap_entry() · 8507df58
      Dan Carpenter authored
      commit 3c60cfd7 upstream.
      
      We should cap the size of memcpy() because it comes from the network
      and can't be trusted.
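
      A minimal sketch of the general pattern, assuming a fixed-size
      destination buffer. The struct, the buffer size and the function name
      are hypothetical and do not come from the qla4xxx driver; the actual
      fix may bound the length differently.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_SECRET_MAX 100  /* hypothetical destination buffer size */

struct demo_chap_entry {
    char   secret[DEMO_SECRET_MAX];
    size_t secret_len;
};

/* Copy a secret whose length was supplied by an untrusted peer.
 * The copy is capped at the size of the destination buffer, so an
 * oversized claimed_len cannot overflow dst->secret. */
size_t demo_set_secret(struct demo_chap_entry *dst, const void *src,
                       size_t claimed_len)
{
    size_t n = claimed_len;

    if (n > sizeof(dst->secret))
        n = sizeof(dst->secret);

    memcpy(dst->secret, src, n);
    dst->secret_len = n;
    return n;
}
```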
      
      Fixes: 26ffd7b4 ('[SCSI] qla4xxx: Add support to set CHAP entries')
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8507df58
    • Vijaya Mohan Guvva's avatar
      SCSI: bfa: Chinook quad port 16G FC HBA claim issue · e88f3608
      Vijaya Mohan Guvva authored
      commit dcaf9aed upstream.
      
      A bfa driver crash is observed while pushing the firmware to the Chinook
      quad-port card, caused by an access to uninitialized bfi_image_ct2, which
      is initialized only for CT2 ASIC based cards after request_firmware().
      For the quad-port Chinook (CT2 ASIC based), bfi_image_ct2 is not
      initialized because there is no check for the Chinook PCI device ID before
      request_firmware(); instead bfi_image_cb is initialized, as it is the
      default case of the card type check.
      
      This patch includes changes to read the right firmware for quad port chinook.
      Signed-off-by: Vijaya Mohan Guvva <vmohan@brocade.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e88f3608
    • Thomas Pugliese's avatar
      usb: core: get config and string descriptors for unauthorized devices · 6d954143
      Thomas Pugliese authored
      commit 83e83ecb upstream.
      
      There is no need to skip querying the config and string descriptors for
      unauthorized WUSB devices when usb_new_device is called.  It is allowed
      by WUSB spec.  The only action that needs to be delayed until
      authorization time is the set config.  This change allows user mode
      tools to see the config and string descriptors earlier in enumeration
      which is needed for some WUSB devices to function properly on Android
      systems.  It also reduces the amount of divergent code paths needed
      for WUSB devices.
      Signed-off-by: Thomas Pugliese <thomas.pugliese@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6d954143
    • Mikulas Patocka's avatar
      hpfs: remember free space · 3e700ec6
      Mikulas Patocka authored
      commit 2cbe5c76 upstream.
      
      Previously, hpfs scanned all bitmaps each time the user asked for free
      space using statfs.  This patch changes it so that hpfs scans the
      bitmaps only once, remembers the free space and on next invocation of
      statfs it returns the value instantly.
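
      The caching idea can be sketched in user-space C as follows. All
      names here (demo_fs, DEMO_FREE_UNKNOWN, demo_statfs_free) are
      hypothetical and are not the hpfs implementation; the sketch assumes
      one bit per sector with 1 meaning free, and counts scans only to make
      the effect visible.

```c
#include <stddef.h>

#define DEMO_FREE_UNKNOWN ((unsigned)-1)  /* sentinel: not computed yet */

struct demo_fs {
    const unsigned char *bitmap;   /* one bit per sector, 1 = free */
    size_t   bitmap_bytes;
    unsigned cached_free;          /* DEMO_FREE_UNKNOWN until first scan */
    unsigned scans;                /* how many full scans were performed */
};

/* Expensive path: walk the whole bitmap and count the set bits. */
static unsigned demo_count_free(struct demo_fs *fs)
{
    unsigned free_count = 0;

    fs->scans++;
    for (size_t i = 0; i < fs->bitmap_bytes; i++)
        for (unsigned char b = fs->bitmap[i]; b; b >>= 1)
            free_count += b & 1u;
    return free_count;
}

/* statfs-style query: scan once, then answer from the cached value. */
unsigned demo_statfs_free(struct demo_fs *fs)
{
    if (fs->cached_free == DEMO_FREE_UNKNOWN)
        fs->cached_free = demo_count_free(fs);
    return fs->cached_free;
}
```

      A real filesystem would also have to adjust or invalidate the cached
      value whenever sectors are allocated or freed; that part is omitted
      here.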
      
      New versions of wine are hammering on the statfs syscall very heavily,
      making some games unplayable when they're stored on hpfs, with load
      times in minutes.
      
      This should be backported to the stable kernels because it fixes a
      user-visible problem (excessive level load times in wine).
      Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3e700ec6
    • Takashi Iwai's avatar
      ALSA: hda - Don't set indep_hp flag for old AD codecs · 8a8c97e6
      Takashi Iwai authored
      commit cbd209f4 upstream.
      
      Some old AD codecs don't like the independent HP handling, either it
      contains a single DAC (AD1981) or it mandates the mixer routing
      (AD1986A).  This patch removes the indep_hp flag for such codecs.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=68081
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8a8c97e6
    • Mihai Caraman's avatar
      KVM: PPC: e500: Fix bad address type in deliver_tlb_miss() · 946c119b
      Mihai Caraman authored
      commit 70713fe3 upstream.
      
      Use gva_t instead of unsigned int for eaddr in deliver_tlb_miss().
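
      Why the parameter type matters can be shown with a small user-space
      sketch. It assumes a 32-bit unsigned int and uses a 64-bit typedef,
      demo_gva_t, as a stand-in for a guest virtual address type; both
      function names are hypothetical and this is not the KVM code.

```c
#include <stdint.h>

typedef uint64_t demo_gva_t;  /* stand-in for a 64-bit guest address type */

/* Address carried in an unsigned int: the upper bits are lost when
 * unsigned int is narrower than the address. */
uint64_t demo_miss_addr_via_uint(uint64_t fault_addr)
{
    unsigned int eaddr = (unsigned int)fault_addr;
    return eaddr;
}

/* Address carried in the address-sized type: the value is preserved. */
uint64_t demo_miss_addr_via_gva(uint64_t fault_addr)
{
    demo_gva_t eaddr = fault_addr;
    return eaddr;
}
```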
      Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      946c119b
    • Andreas Schwab's avatar
    • Helge Deller's avatar
      parisc: fix cache-flushing · 625988d3
      Helge Deller authored
      commit 57737c49 upstream.
      
      This commit:
      f8dae006: parisc: Ensure full cache coherency for kmap/kunmap
      caused negative caching side-effects, e.g. hanging processes with expect and
      too many inequivalent alias messages from flush_dcache_page() on Debian 5 systems.
      
      This patch now partly reverts it and has been in production use on our Debian buildd
      makeservers for a week without any major problems.
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: John David Anglin <dave.anglin@bell.net>
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      625988d3
    • Helge Deller's avatar
      parisc/sti_console: prefer Linux fonts over built-in ROM fonts · 3b11ff52
      Helge Deller authored
      commit 8a10bc9d upstream.
      
      The built-in ROM fonts lack many necessary ASCII characters, which is
      why it makes sense to prefer the Linux fonts instead if they are
      available.  This makes consoles on STI graphics cards which are not
      supported by the stifb driver (e.g. Visualize FXe) look much nicer.
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3b11ff52
    • Mikulas Patocka's avatar
      alpha: fix broken network checksum · 2a98c6f0
      Mikulas Patocka authored
      commit 0ef38d70 upstream.
      
      The patch 3ddc5b46 breaks networking on
      alpha (there is a follow-up fix 5cfe8f1b,
      but networking is still broken even with the second patch).
      
      The patch 3ddc5b46 makes
      csum_partial_copy_from_user check the pointer with access_ok. However,
      csum_partial_copy_from_user is called also from csum_partial_copy_nocheck
      and csum_partial_copy_nocheck is called on kernel pointers and it is
      supposed not to check pointer validity.
      
      This bug results in ssh session hangs if the system is loaded and bulk
      data are printed to ssh terminal.
      
      This patch fixes csum_partial_copy_nocheck to call set_fs(KERNEL_DS), so
      that access_ok in csum_partial_copy_from_user accepts kernel-space
      addresses.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Matt Turner <mattst88@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2a98c6f0
    • Dong Aisheng's avatar
      mmc: sdhci-esdhc-imx: fix access hardirq-unsafe lock in atomic context · 8fabad70
      Dong Aisheng authored
      commit a974862f upstream.
      
      Sometimes we may hit the following lockdep issue.
      The root cause is that the .set_clock callback is executed with
      spin_lock_irqsave held in sdhci_do_set_ios. However, the IMX set_clock
      callback calls clk_get_rate, which takes a mutex.
      
      The fix avoids taking the mutex in the .set_clock callback by initializing
      pltfm_host->clock at probe time and using it later instead of calling
      clk_get_rate again in atomic context.
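
      The shape of the fix can be sketched in user-space C. This is a
      simulation under stated assumptions, not driver code: demo_clk_get_rate()
      stands in for a rate query that may sleep, the demo_in_atomic flag stands
      in for "spinlock held", and all names and the clock rate are
      hypothetical.

```c
#include <stdbool.h>

bool demo_in_atomic;           /* simulated "spinlock held" state */
int  demo_sleeping_in_atomic;  /* counts sleeping calls made while atomic */

/* Stand-in for a rate query that takes a mutex and may therefore sleep. */
static unsigned long demo_clk_get_rate(void)
{
    if (demo_in_atomic)
        demo_sleeping_in_atomic++;
    return 198000000UL;  /* assumed example rate in Hz */
}

struct demo_host {
    unsigned long clock;  /* base rate, cached once at probe time */
};

/* Probe runs in process context, so a sleeping call is fine here. */
void demo_probe(struct demo_host *h)
{
    h->clock = demo_clk_get_rate();
}

/* Runs with the (simulated) spinlock held: only the cached rate is used.
 * Returns the highest rate <= target reachable by power-of-two division. */
unsigned long demo_set_clock(struct demo_host *h, unsigned long target)
{
    unsigned long div = 1;

    if (target == 0)
        return 0;

    demo_in_atomic = true;
    while (h->clock / div > target)
        div *= 2;
    demo_in_atomic = false;

    return h->clock / div;
}
```

      Because the rate is read once in demo_probe(), the atomic section in
      demo_set_clock() never reaches the sleeping call.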
      
      [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
      3.13.0-rc1+ #285 Not tainted
      ------------------------------------------------------
      kworker/u8:1/29 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
       (prepare_lock){+.+...}, at: [<80480b08>] clk_prepare_lock+0x44/0xe4
      
      and this task is already holding:
       (&(&host->lock)->rlock#2){-.-...}, at: [<804611f4>] sdhci_do_set_ios+0x20/0x720
      which would create a new lock dependency:
       (&(&host->lock)->rlock#2){-.-...} -> (prepare_lock){+.+...}
      
      but this new dependency connects a HARDIRQ-irq-safe lock:
       (&(&host->lock)->rlock#2){-.-...}
      ... which became HARDIRQ-irq-safe at:
        [<8005f030>] mark_lock+0x140/0x6ac
        [<80060760>] __lock_acquire+0xb30/0x1cbc
        [<800620d0>] lock_acquire+0x70/0x84
        [<8061d2f0>] _raw_spin_lock+0x30/0x40
        [<80460668>] sdhci_irq+0x24/0xa68
        [<8006b1d4>] handle_irq_event_percpu+0x54/0x18c
        [<8006b350>] handle_irq_event+0x44/0x64
        [<8006e50c>] handle_fasteoi_irq+0xa0/0x170
        [<8006a8f0>] generic_handle_irq+0x30/0x44
        [<8000f238>] handle_IRQ+0x54/0xbc
        [<8000864c>] gic_handle_irq+0x30/0x64
        [<80013024>] __irq_svc+0x44/0x5c
        [<80614c58>] printk+0x38/0x40
        [<804622a8>] sdhci_add_host+0x844/0xbcc
        [<80464948>] sdhci_esdhc_imx_probe+0x378/0x67c
        [<8032ee88>] platform_drv_probe+0x20/0x50
        [<8032d48c>] driver_probe_device+0x118/0x234
        [<8032d690>] __driver_attach+0x9c/0xa0
        [<8032b89c>] bus_for_each_dev+0x68/0x9c
        [<8032cf44>] driver_attach+0x20/0x28
        [<8032cbc8>] bus_add_driver+0x148/0x1f4
        [<8032dce0>] driver_register+0x80/0x100
        [<8032ee54>] __platform_driver_register+0x50/0x64
        [<8084b094>] sdhci_esdhc_imx_driver_init+0x18/0x20
        [<80008980>] do_one_initcall+0x108/0x16c
        [<8081cca4>] kernel_init_freeable+0x10c/0x1d0
        [<80611c50>] kernel_init+0x10/0x120
        [<8000e9c8>] ret_from_fork+0x14/0x2c
      
      to a HARDIRQ-irq-unsafe lock:
       (prepare_lock){+.+...}
      ... which became HARDIRQ-irq-unsafe at:
      ...  [<8005f030>] mark_lock+0x140/0x6ac
        [<8005f604>] mark_held_locks+0x68/0x12c
        [<8005f780>] trace_hardirqs_on_caller+0xb8/0x1d8
        [<8005f8b4>] trace_hardirqs_on+0x14/0x18
        [<8061a130>] mutex_trylock+0x180/0x20c
        [<80480ad8>] clk_prepare_lock+0x14/0xe4
        [<804816a4>] clk_notifier_register+0x28/0xf0
        [<80015120>] twd_clk_init+0x50/0x68
        [<80008980>] do_one_initcall+0x108/0x16c
        [<8081cca4>] kernel_init_freeable+0x10c/0x1d0
        [<80611c50>] kernel_init+0x10/0x120
        [<8000e9c8>] ret_from_fork+0x14/0x2c
      
      other info that might help us debug this:
      
       Possible interrupt unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(prepare_lock);
                                     local_irq_disable();
                                     lock(&(&host->lock)->rlock#2);
                                     lock(prepare_lock);
        <Interrupt>
          lock(&(&host->lock)->rlock#2);
      
       *** DEADLOCK ***
      
      3 locks held by kworker/u8:1/29:
       #0:  (kmmcd){.+.+.+}, at: [<8003db18>] process_one_work+0x128/0x468
       #1:  ((&(&host->detect)->work)){+.+.+.}, at: [<8003db18>] process_one_work+0x128/0x468
       #2:  (&(&host->lock)->rlock#2){-.-...}, at: [<804611f4>] sdhci_do_set_ios+0x20/0x720
      
      the dependencies between HARDIRQ-irq-safe lock and the holding lock:
      -> (&(&host->lock)->rlock#2){-.-...} ops: 330 {
         IN-HARDIRQ-W at:
                          [<8005f030>] mark_lock+0x140/0x6ac
                          [<80060760>] __lock_acquire+0xb30/0x1cbc
                          [<800620d0>] lock_acquire+0x70/0x84
                          [<8061d2f0>] _raw_spin_lock+0x30/0x40
                          [<80460668>] sdhci_irq+0x24/0xa68
                          [<8006b1d4>] handle_irq_event_percpu+0x54/0x18c
                          [<8006b350>] handle_irq_event+0x44/0x64
                          [<8006e50c>] handle_fasteoi_irq+0xa0/0x170
                          [<8006a8f0>] generic_handle_irq+0x30/0x44
                          [<8000f238>] handle_IRQ+0x54/0xbc
                          [<8000864c>] gic_handle_irq+0x30/0x64
                          [<80013024>] __irq_svc+0x44/0x5c
                          [<80614c58>] printk+0x38/0x40
                          [<804622a8>] sdhci_add_host+0x844/0xbcc
                          [<80464948>] sdhci_esdhc_imx_probe+0x378/0x67c
                          [<8032ee88>] platform_drv_probe+0x20/0x50
                          [<8032d48c>] driver_probe_device+0x118/0x234
                          [<8032d690>] __driver_attach+0x9c/0xa0
                          [<8032b89c>] bus_for_each_dev+0x68/0x9c
                          [<8032cf44>] driver_attach+0x20/0x28
                          [<8032cbc8>] bus_add_driver+0x148/0x1f4
                          [<8032dce0>] driver_register+0x80/0x100
                          [<8032ee54>] __platform_driver_register+0x50/0x64
                          [<8084b094>] sdhci_esdhc_imx_driver_init+0x18/0x20
                          [<80008980>] do_one_initcall+0x108/0x16c
                          [<8081cca4>] kernel_init_freeable+0x10c/0x1d0
                          [<80611c50>] kernel_init+0x10/0x120
                          [<8000e9c8>] ret_from_fork+0x14/0x2c
         IN-SOFTIRQ-W at:
                          [<8005f030>] mark_lock+0x140/0x6ac
                          [<80060204>] __lock_acquire+0x5d4/0x1cbc
                          [<800620d0>] lock_acquire+0x70/0x84
                          [<8061d40c>] _raw_spin_lock_irqsave+0x40/0x54
                          [<8045e4a4>] sdhci_tasklet_finish+0x1c/0x120
                          [<8002b538>] tasklet_action+0xa0/0x15c
                          [<8002b778>] __do_softirq+0x118/0x290
                          [<8002bcf4>] irq_exit+0xb4/0x10c
                          [<8000f240>] handle_IRQ+0x5c/0xbc
                          [<8000864c>] gic_handle_irq+0x30/0x64
                          [<80013024>] __irq_svc+0x44/0x5c
                          [<80614c58>] printk+0x38/0x40
                          [<804622a8>] sdhci_add_host+0x844/0xbcc
                          [<80464948>] sdhci_esdhc_imx_probe+0x378/0x67c
                          [<8032ee88>] platform_drv_probe+0x20/0x50
                          [<8032d48c>] driver_probe_device+0x118/0x234
                          [<8032d690>] __driver_attach+0x9c/0xa0
                          [<8032b89c>] bus_for_each_dev+0x68/0x9c
                          [<8032cf44>] driver_attach+0x20/0x28
                          [<8032cbc8>] bus_add_driver+0x148/0x1f4
                          [<8032dce0>] driver_register+0x80/0x100
                          [<8032ee54>] __platform_driver_register+0x50/0x64
                          [<8084b094>] sdhci_esdhc_imx_driver_init+0x18/0x20
                          [<80008980>] do_one_initcall+0x108/0x16c
                          [<8081cca4>] kernel_init_freeable+0x10c/0x1d0
                          [<80611c50>] kernel_init+0x10/0x120
                          [<8000e9c8>] ret_from_fork+0x14/0x2c
         INITIAL USE at:
                         [<8005f030>] mark_lock+0x140/0x6ac
                         [<8005ff0c>] __lock_acquire+0x2dc/0x1cbc
                         [<800620d0>] lock_acquire+0x70/0x84
                         [<8061d40c>] _raw_spin_lock_irqsave+0x40/0x54
                         [<804611f4>] sdhci_do_set_ios+0x20/0x720
                         [<80461924>] sdhci_set_ios+0x30/0x3c
                         [<8044cea0>] mmc_power_up+0x6c/0xd0
                         [<8044dac4>] mmc_start_host+0x60/0x70
                         [<8044eb3c>] mmc_add_host+0x60/0x88
                         [<8046225c>] sdhci_add_host+0x7f8/0xbcc
                         [<80464948>] sdhci_esdhc_imx_probe+0x378/0x67c
                         [<8032ee88>] platform_drv_probe+0x20/0x50
                         [<8032d48c>] driver_probe_device+0x118/0x234
                         [<8032d690>] __driver_attach+0x9c/0xa0
                         [<8032b89c>] bus_for_each_dev+0x68/0x9c
                         [<8032cf44>] driver_attach+0x20/0x28
                         [<8032cbc8>] bus_add_driver+0x148/0x1f4
                         [<8032dce0>] driver_register+0x80/0x100
                         [<8032ee54>] __platform_driver_register+0x50/0x64
                         [<8084b094>] sdhci_esdhc_imx_driver_init+0x18/0x20
                         [<80008980>] do_one_initcall+0x108/0x16c
                         [<8081cca4>] kernel_init_freeable+0x10c/0x1d0
                         [<80611c50>] kernel_init+0x10/0x120
                         [<8000e9c8>] ret_from_fork+0x14/0x2c
       }
       ... key      at: [<80e040e8>] __key.26952+0x0/0x8
       ... acquired at:
         [<8005eb60>] check_usage+0x3d0/0x5c0
         [<8005edac>] check_irq_usage+0x5c/0xb8
         [<80060d38>] __lock_acquire+0x1108/0x1cbc
         [<800620d0>] lock_acquire+0x70/0x84
         [<8061a210>] mutex_lock_nested+0x54/0x3c0
         [<80480b08>] clk_prepare_lock+0x44/0xe4
         [<8048188c>] clk_get_rate+0x14/0x64
         [<8046374c>] esdhc_pltfm_set_clock+0x20/0x2a4
         [<8045d70c>] sdhci_set_clock+0x4c/0x498
         [<80461518>] sdhci_do_set_ios+0x344/0x720
         [<80461924>] sdhci_set_ios+0x30/0x3c
         [<8044c390>] __mmc_set_clock+0x44/0x60
         [<8044cd4c>] mmc_set_clock+0x10/0x14
         [<8044f8f4>] mmc_init_card+0x1b4/0x1520
         [<80450f00>] mmc_attach_mmc+0xb4/0x194
         [<8044da08>] mmc_rescan+0x294/0x2f0
         [<8003db94>] process_one_work+0x1a4/0x468
         [<8003e850>] worker_thread+0x118/0x3e0
         [<80044de0>] kthread+0xd4/0xf0
         [<8000e9c8>] ret_from_fork+0x14/0x2c
      
      the dependencies between the lock to be acquired and HARDIRQ-irq-unsafe lock:
      -> (prepare_lock){+.+...} ops: 395 {
         HARDIRQ-ON-W at:
                          [<8005f030>] mark_lock+0x140/0x6ac
                          [<8005f604>] mark_held_locks+0x68/0x12c
                          [<8005f780>] trace_hardirqs_on_caller+0xb8/0x1d8
                          [<8005f8b4>] trace_hardirqs_on+0x14/0x18
                          [<8061a130>] mutex_trylock+0x180/0x20c
                          [<80480ad8>] clk_prepare_lock+0x14/0xe4
                          [<804816a4>] clk_notifier_register+0x28/0xf0
                          [<80015120>] twd_clk_init+0x50/0x68
                          [<80008980>] do_one_initcall+0x108/0x16c
                          [<8081cca4>] kernel_init_freeable+0x10c/0x1d0
                          [<80611c50>] kernel_init+0x10/0x120
                          [<8000e9c8>] ret_from_fork+0x14/0x2c
         SOFTIRQ-ON-W at:
                          [<8005f030>] mark_lock+0x140/0x6ac
                          [<8005f604>] mark_held_locks+0x68/0x12c
                          [<8005f7c8>] trace_hardirqs_on_caller+0x100/0x1d8
                          [<8005f8b4>] trace_hardirqs_on+0x14/0x18
                          [<8061a130>] mutex_trylock+0x180/0x20c
                          [<80480ad8>] clk_prepare_lock+0x14/0xe4
                          [<804816a4>] clk_notifier_register+0x28/0xf0
                          [<80015120>] twd_clk_init+0x50/0x68
                          [<80008980>] do_one_initcall+0x108/0x16c
                          [<8081cca4>] kernel_init_freeable+0x10c/0x1d0
                          [<80611c50>] kernel_init+0x10/0x120
                          [<8000e9c8>] ret_from_fork+0x14/0x2c
         INITIAL USE at:
                         [<8005f030>] mark_lock+0x140/0x6ac
                         [<8005ff0c>] __lock_acquire+0x2dc/0x1cbc
                         [<800620d0>] lock_acquire+0x70/0x84
                         [<8061a0c8>] mutex_trylock+0x118/0x20c
                         [<80480ad8>] clk_prepare_lock+0x14/0xe4
                         [<80482af8>] __clk_init+0x1c/0x45c
                         [<8048306c>] _clk_register+0xd0/0x170
                         [<80483148>] clk_register+0x3c/0x7c
                         [<80483b4c>] clk_register_fixed_rate+0x88/0xd8
                         [<80483c04>] of_fixed_clk_setup+0x68/0x94
                         [<8084c6fc>] of_clk_init+0x44/0x68
                         [<808202b0>] time_init+0x2c/0x38
                         [<8081ca14>] start_kernel+0x1e4/0x368
                         [<10008074>] 0x10008074
       }
       ... key      at: [<808afebc>] prepare_lock+0x38/0x48
       ... acquired at:
         [<8005eb94>] check_usage+0x404/0x5c0
         [<8005edac>] check_irq_usage+0x5c/0xb8
         [<80060d38>] __lock_acquire+0x1108/0x1cbc
         [<800620d0>] lock_acquire+0x70/0x84
         [<8061a210>] mutex_lock_nested+0x54/0x3c0
         [<80480b08>] clk_prepare_lock+0x44/0xe4
         [<8048188c>] clk_get_rate+0x14/0x64
         [<8046374c>] esdhc_pltfm_set_clock+0x20/0x2a4
         [<8045d70c>] sdhci_set_clock+0x4c/0x498
         [<80461518>] sdhci_do_set_ios+0x344/0x720
         [<80461924>] sdhci_set_ios+0x30/0x3c
         [<8044c390>] __mmc_set_clock+0x44/0x60
         [<8044cd4c>] mmc_set_clock+0x10/0x14
         [<8044f8f4>] mmc_init_card+0x1b4/0x1520
         [<80450f00>] mmc_attach_mmc+0xb4/0x194
         [<8044da08>] mmc_rescan+0x294/0x2f0
         [<8003db94>] process_one_work+0x1a4/0x468
         [<8003e850>] worker_thread+0x118/0x3e0
         [<80044de0>] kthread+0xd4/0xf0
         [<8000e9c8>] ret_from_fork+0x14/0x2c
      
      stack backtrace:
      CPU: 2 PID: 29 Comm: kworker/u8:1 Not tainted 3.13.0-rc1+ #285
      Workqueue: kmmcd mmc_rescan
      Backtrace:
      [<80012160>] (dump_backtrace+0x0/0x10c) from [<80012438>] (show_stack+0x18/0x1c)
       r6:00000000 r5:00000000 r4:8088ecc8 r3:bfa11200
      [<80012420>] (show_stack+0x0/0x1c) from [<80616b14>] (dump_stack+0x84/0x9c)
      [<80616a90>] (dump_stack+0x0/0x9c) from [<8005ebb4>] (check_usage+0x424/0x5c0)
       r5:80979940 r4:bfa29b44
      [<8005e790>] (check_usage+0x0/0x5c0) from [<8005edac>] (check_irq_usage+0x5c/0xb8)
      [<8005ed50>] (check_irq_usage+0x0/0xb8) from [<80060d38>] (__lock_acquire+0x1108/0x1cbc)
       r8:bfa115e8 r7:80df9884 r6:80dafa9c r5:00000003 r4:bfa115d0
      [<8005fc30>] (__lock_acquire+0x0/0x1cbc) from [<800620d0>] (lock_acquire+0x70/0x84)
      [<80062060>] (lock_acquire+0x0/0x84) from [<8061a210>] (mutex_lock_nested+0x54/0x3c0)
       r7:bfa11200 r6:80dafa9c r5:00000000 r4:80480b08
      [<8061a1bc>] (mutex_lock_nested+0x0/0x3c0) from [<80480b08>] (clk_prepare_lock+0x44/0xe4)
      [<80480ac4>] (clk_prepare_lock+0x0/0xe4) from [<8048188c>] (clk_get_rate+0x14/0x64)
       r6:03197500 r5:bf0e9aa8 r4:bf827400 r3:808ae128
      [<80481878>] (clk_get_rate+0x0/0x64) from [<8046374c>] (esdhc_pltfm_set_clock+0x20/0x2a4)
       r5:bf0e9aa8 r4:bf0e9c40
      [<8046372c>] (esdhc_pltfm_set_clock+0x0/0x2a4) from [<8045d70c>] (sdhci_set_clock+0x4c/0x498)
      [<8045d6c0>] (sdhci_set_clock+0x0/0x498) from [<80461518>] (sdhci_do_set_ios+0x344/0x720)
       r8:0000003b r7:20000113 r6:bf0e9d68 r5:bf0e9aa8 r4:bf0e9c40
      r3:00000000
      [<804611d4>] (sdhci_do_set_ios+0x0/0x720) from [<80461924>] (sdhci_set_ios+0x30/0x3c)
       r9:00000004 r8:bf131000 r7:bf131048 r6:00000000 r5:bf0e9aa8
      r4:bf0e9800
      [<804618f4>] (sdhci_set_ios+0x0/0x3c) from [<8044c390>] (__mmc_set_clock+0x44/0x60)
       r5:03197500 r4:bf0e9800
      [<8044c34c>] (__mmc_set_clock+0x0/0x60) from [<8044cd4c>] (mmc_set_clock+0x10/0x14)
       r5:00000000 r4:bf0e9800
      [<8044cd3c>] (mmc_set_clock+0x0/0x14) from [<8044f8f4>] (mmc_init_card+0x1b4/0x1520)
      [<8044f740>] (mmc_init_card+0x0/0x1520) from [<80450f00>] (mmc_attach_mmc+0xb4/0x194)
      [<80450e4c>] (mmc_attach_mmc+0x0/0x194) from [<8044da08>] (mmc_rescan+0x294/0x2f0)
       r5:8065f358 r4:bf0e9af8
      [<8044d774>] (mmc_rescan+0x0/0x2f0) from [<8003db94>] (process_one_work+0x1a4/0x468)
       r8:00000000 r7:bfa29eb0 r6:bf80dc00 r5:bf0e9af8 r4:bf9e3f00
      r3:8044d774
      [<8003d9f0>] (process_one_work+0x0/0x468) from [<8003e850>] (worker_thread+0x118/0x3e0)
      [<8003e738>] (worker_thread+0x0/0x3e0) from [<80044de0>] (kthread+0xd4/0xf0)
      [<80044d0c>] (kthread+0x0/0xf0) from [<8000e9c8>] (ret_from_fork+0x14/0x2c)
       r7:00000000 r6:00000000 r5:80044d0c r4:bf9e7f00
      
      Fixes: 0ddf03c9 mmc: esdhc-imx: parse max-frequency from devicetree
      Signed-off-by: Dong Aisheng <b29396@freescale.com>
      Acked-by: Shawn Guo <shawn.guo@linaro.org>
      Tested-by: Philippe De Muyter <phdm@macqel.be>
      Signed-off-by: Chris Ball <chris@printf.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8fabad70
    • Aisheng Dong's avatar
      mmc: sdhci: fix lockdep error in tuning routine · ee02211f
      Aisheng Dong authored
      commit 2b35bd83 upstream.
      
      The sdhci_execute_tuning routine takes the lock in two separate steps:
        disable_irq(host->irq);
        spin_lock(&host->lock);
      This causes the following lockdep error message, since &host->lock
      can also be taken in irq context.
      Use spin_lock_irqsave/spin_unlock_irqrestore instead to get rid of
      this error message.
      
      [ INFO: inconsistent lock state ]
      3.13.0-rc1+ #287 Not tainted
      ---------------------------------
      inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
      kworker/u2:1/33 [HC0[0]:SC0[0]:HE1:SE1] takes:
       (&(&host->lock)->rlock){?.-...}, at: [<8045f7f4>] sdhci_execute_tuning+0x4c/0x710
      {IN-HARDIRQ-W} state was registered at:
        [<8005f030>] mark_lock+0x140/0x6ac
        [<80060760>] __lock_acquire+0xb30/0x1cbc
        [<800620d0>] lock_acquire+0x70/0x84
        [<8061d1c8>] _raw_spin_lock+0x30/0x40
        [<804605cc>] sdhci_irq+0x24/0xa68
        [<8006b1d4>] handle_irq_event_percpu+0x54/0x18c
        [<8006b350>] handle_irq_event+0x44/0x64
        [<8006e50c>] handle_fasteoi_irq+0xa0/0x170
        [<8006a8f0>] generic_handle_irq+0x30/0x44
        [<8000f238>] handle_IRQ+0x54/0xbc
        [<8000864c>] gic_handle_irq+0x30/0x64
        [<80013024>] __irq_svc+0x44/0x5c
        [<80329bf4>] dev_vprintk_emit+0x50/0x58
        [<80329c24>] dev_printk_emit+0x28/0x30
        [<80329fec>] __dev_printk+0x4c/0x90
        [<8032a180>] dev_err+0x3c/0x48
        [<802dd4f0>] _regulator_get+0x158/0x1cc
        [<802dd5b4>] regulator_get_optional+0x18/0x1c
        [<80461df4>] sdhci_add_host+0x42c/0xbd8
        [<80464820>] sdhci_esdhc_imx_probe+0x378/0x67c
        [<8032ee88>] platform_drv_probe+0x20/0x50
        [<8032d48c>] driver_probe_device+0x118/0x234
        [<8032d690>] __driver_attach+0x9c/0xa0
        [<8032b89c>] bus_for_each_dev+0x68/0x9c
        [<8032cf44>] driver_attach+0x20/0x28
        [<8032cbc8>] bus_add_driver+0x148/0x1f4
        [<8032dce0>] driver_register+0x80/0x100
        [<8032ee54>] __platform_driver_register+0x50/0x64
        [<8084b094>] sdhci_esdhc_imx_driver_init+0x18/0x20
        [<80008980>] do_one_initcall+0x108/0x16c
        [<8081cca4>] kernel_init_freeable+0x10c/0x1d0
        [<80611b28>] kernel_init+0x10/0x120
        [<8000e9c8>] ret_from_fork+0x14/0x2c
      irq event stamp: 805
      hardirqs last  enabled at (805): [<8061d43c>] _raw_spin_unlock_irqrestore+0x38/0x4c
      hardirqs last disabled at (804): [<8061d2c8>] _raw_spin_lock_irqsave+0x24/0x54
      softirqs last  enabled at (570): [<8002b824>] __do_softirq+0x1c4/0x290
      softirqs last disabled at (561): [<8002bcf4>] irq_exit+0xb4/0x10c
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&(&host->lock)->rlock);
        <Interrupt>
          lock(&(&host->lock)->rlock);
      
       *** DEADLOCK ***
      
      2 locks held by kworker/u2:1/33:
       #0:  (kmmcd){.+.+..}, at: [<8003db18>] process_one_work+0x128/0x468
       #1:  ((&(&host->detect)->work)){+.+...}, at: [<8003db18>] process_one_work+0x128/0x468
      
      stack backtrace:
      CPU: 0 PID: 33 Comm: kworker/u2:1 Not tainted 3.13.0-rc1+ #287
      Workqueue: kmmcd mmc_rescan
      Backtrace:
      [<80012160>] (dump_backtrace+0x0/0x10c) from [<80012438>] (show_stack+0x18/0x1c)
       r6:bfad0900 r5:00000000 r4:8088ecc8 r3:bfad0900
      [<80012420>] (show_stack+0x0/0x1c) from [<806169ec>] (dump_stack+0x84/0x9c)
      [<80616968>] (dump_stack+0x0/0x9c) from [<806147b4>] (print_usage_bug+0x260/0x2d0)
       r5:8076ba88 r4:80977410
      [<80614554>] (print_usage_bug+0x0/0x2d0) from [<8005f0d0>] (mark_lock+0x1e0/0x6ac)
       r9:8005e678 r8:00000000 r7:bfad0900 r6:00001015 r5:bfad0cd0
      r4:00000002
      [<8005eef0>] (mark_lock+0x0/0x6ac) from [<80060234>] (__lock_acquire+0x604/0x1cbc)
      [<8005fc30>] (__lock_acquire+0x0/0x1cbc) from [<800620d0>] (lock_acquire+0x70/0x84)
      [<80062060>] (lock_acquire+0x0/0x84) from [<8061d1c8>] (_raw_spin_lock+0x30/0x40)
       r7:00000000 r6:bfb63000 r5:00000000 r4:bfb60568
      [<8061d198>] (_raw_spin_lock+0x0/0x40) from [<8045f7f4>] (sdhci_execute_tuning+0x4c/0x710)
       r4:bfb60000
      [<8045f7a8>] (sdhci_execute_tuning+0x0/0x710) from [<80453454>] (mmc_sd_init_card+0x5f8/0x660)
      [<80452e5c>] (mmc_sd_init_card+0x0/0x660) from [<80453748>] (mmc_attach_sd+0xb4/0x180)
       r9:bf92d400 r8:8065f364 r7:00061a80 r6:bfb60000 r5:8065f358
      r4:bfb60000
      [<80453694>] (mmc_attach_sd+0x0/0x180) from [<8044d9f8>] (mmc_rescan+0x284/0x2f0)
       r5:8065f358 r4:bfb602f8
      [<8044d774>] (mmc_rescan+0x0/0x2f0) from [<8003db94>] (process_one_work+0x1a4/0x468)
       r8:00000000 r7:bfb55eb0 r6:bf80dc00 r5:bfb602f8 r4:bfb35980
      r3:8044d774
      [<8003d9f0>] (process_one_work+0x0/0x468) from [<8003e850>] (worker_thread+0x118/0x3e0)
      [<8003e738>] (worker_thread+0x0/0x3e0) from [<80044de0>] (kthread+0xd4/0xf0)
      [<80044d0c>] (kthread+0x0/0xf0) from [<8000e9c8>] (ret_from_fork+0x14/0x2c)
       r7:00000000 r6:00000000 r5:80044d0c r4:bfb37b40
      Signed-off-by: Dong Aisheng <b29396@freescale.com>
      Signed-off-by: Chris Ball <chris@printf.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ee02211f
    • David Cohen's avatar
      mmc: sdhci-pci: add broken HS200 quirk for Intel Merrifield · 8a6551ad
      David Cohen authored
      commit 390145f9 upstream.
      
      Due to a so far unknown hw issue, Merrifield is unable to enable HS200
      support. This patch adds a quirk to keep SDHCI from initializing with
      the error below:
      
      [   53.850132] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 3.12.0-rc6-00037-g3d7c8d9-dirty #36
      [   53.850150] Hardware name: Intel Corporation Merrifield/SALT BAY, BIOS 397 2013.09.12:11.51.40
      [   53.850167]  00000000 00000000 ee409e48 c18816d2 00000000 ee409e78 c123e254 c1acc9b0
      [   53.850227]  00000000 00000000 c1b14148 000003de c16c03bf c16c03bf ee75b480 ed97c54c
      [   53.850282]  ee75b480 ee409e88 c123e292 00000009 00000000 ee409ef8 c16c03bf c1207fac
      [   53.850339] Call Trace:
      [   53.850376]  [<c18816d2>] dump_stack+0x4b/0x79
      [   53.850408]  [<c123e254>] warn_slowpath_common+0x84/0xa0
      [   53.850436]  [<c16c03bf>] ? sdhci_send_command+0xb4f/0xc50
      [   53.850462]  [<c16c03bf>] ? sdhci_send_command+0xb4f/0xc50
      [   53.850490]  [<c123e292>] warn_slowpath_null+0x22/0x30
      [   53.850516]  [<c16c03bf>] sdhci_send_command+0xb4f/0xc50
      [   53.850545]  [<c1207fac>] ? native_sched_clock+0x2c/0xb0
      [   53.850575]  [<c14c1f93>] ? delay_tsc+0x73/0xb0
      [   53.850601]  [<c14c1ebe>] ? __const_udelay+0x1e/0x20
      [   53.850626]  [<c16bdeb3>] ? sdhci_reset+0x93/0x190
      [   53.850654]  [<c16c05b0>] sdhci_finish_data+0xf0/0x2e0
      [   53.850683]  [<c16c130f>] sdhci_irq+0x31f/0x930
      [   53.850713]  [<c12cb080>] ? __buffer_unlock_commit+0x10/0x20
      [   53.850740]  [<c12cbcd7>] ? trace_buffer_unlock_commit+0x37/0x50
      [   53.850773]  [<c1288f3c>] handle_irq_event_percpu+0x5c/0x220
      [   53.850800]  [<c128bc96>] ? handle_fasteoi_irq+0x16/0xd0
      [   53.850827]  [<c128913a>] handle_irq_event+0x3a/0x60
      [   53.850852]  [<c128bc80>] ? unmask_irq+0x30/0x30
      [   53.850878]  [<c128bcce>] handle_fasteoi_irq+0x4e/0xd0
      [   53.850895]  <IRQ>  [<c1890b52>] ? do_IRQ+0x42/0xb0
      [   53.850943]  [<c1890a31>] ? common_interrupt+0x31/0x38
      [   53.850973]  [<c12b00d8>] ? cgroup_mkdir+0x4e8/0x580
      [   53.851001]  [<c1208d32>] ? default_idle+0x22/0xf0
      [   53.851029]  [<c1209576>] ? arch_cpu_idle+0x26/0x30
      [   53.851054]  [<c1288505>] ? cpu_startup_entry+0x65/0x240
      [   53.851082]  [<c18793d5>] ? rest_init+0xb5/0xc0
      [   53.851108]  [<c1879320>] ? __read_lock_failed+0x18/0x18
      [   53.851138]  [<c1bf6a15>] ? start_kernel+0x31b/0x321
      [   53.851164]  [<c1bf652f>] ? repair_env_string+0x51/0x51
      [   53.851190]  [<c1bf6363>] ? i386_start_kernel+0x139/0x13c
      [   53.851209] ---[ end trace 92777f5fe48d33f2 ]---
      [   53.853449] mmcblk0: error -84 transferring data, sector 11142162, nr 304, cmd response 0x0, card status 0x0
      [   53.853476] mmcblk0: retrying using single block read
      [   55.937863] sdhci: Timeout waiting for Buffer Read Ready interrupt during tuning procedure, falling back to fixed sampling clock
      [   56.207951] sdhci: Timeout waiting for Buffer Read Ready interrupt during tuning procedure, falling back to fixed sampling clock
      [   66.228785] mmc0: Timeout waiting for hardware interrupt.
      [   66.230855] ------------[ cut here ]------------
      Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
      Reviewed-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
      Acked-by: Dong Aisheng <b29396@freescale.com>
      Signed-off-by: Chris Ball <chris@printf.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8a6551ad
    • mmc: sdhci: add quirk for broken HS200 support · a5891096
      David Cohen authored
      commit 13868bf2 upstream.
      
      This patch defines a quirk for platforms unable to enable HS200 support.
      Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
      Reviewed-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
      Acked-by: Dong Aisheng <b29396@freescale.com>
      Signed-off-by: Chris Ball <chris@printf.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a5891096
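The quirk described above can be pictured as a flag that masks a capability the hardware claims to have. The following is a minimal sketch under stated assumptions, not the driver's code: every `demo_`/`DEMO_` name, the flag values, and the struct layout are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for illustration; not the kernel's definitions. */
#define DEMO_HW_SUPPORTS_SDR104  (1u << 0)  /* capability bit reported by the controller */
#define DEMO_CAP_HS200           (1u << 0)  /* capability advertised to the MMC core */
#define DEMO_QUIRK_BROKEN_HS200  (1u << 6)  /* platform cannot actually use HS200 */

struct demo_host {
    uint32_t hw_caps;  /* what the controller claims to support */
    uint32_t quirks;   /* platform-specific workarounds */
};

/* Return the capabilities that should be advertised for this host.
 * HS200 is advertised only if the hardware reports support for it
 * and the platform has not flagged it as broken. */
static uint32_t demo_advertised_caps(const struct demo_host *host)
{
    uint32_t caps = 0;

    if ((host->hw_caps & DEMO_HW_SUPPORTS_SDR104) &&
        !(host->quirks & DEMO_QUIRK_BROKEN_HS200))
        caps |= DEMO_CAP_HS200;

    return caps;
}
```

With this shape, a platform driver only has to set the quirk bit for the affected hardware; the generic code path stays unchanged for every other controller.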
    • net: gre: use icmp_hdr() to get inner ip header · c8579f32
      Duan Jiong authored
      [ Upstream commit c0c0c50f ]
      
      When dealing with ICMP messages, skb->data points to the IP header
      that triggered the sending of the ICMP message.
      
      In gre_cisco_err(), parse_gre_header() is called, and it calls
      iptunnel_pull_header() at its end to pull the skb, so skb->data no
      longer points to the inner IP header.
      
      Unfortunately, ipgre_err() still needs the IP addresses in the
      inner IP header to look up the tunnel with ip_tunnel_lookup().
      
      So just use icmp_hdr() to get the inner IP header instead of skb->data.
      Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c8579f32
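The reasoning in this message is that a moving data cursor stops being a reliable way to find the inner IP header once a header pull has happened, while the ICMP header position stays fixed. The toy model below illustrates only that idea. It is a sketch, not kernel code: `struct demo_pkt`, the `demo_` helpers, and the 20-byte and 4-byte header sizes used in the usage note are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ICMP_HDR_LEN 8u  /* size of the fixed ICMP header */

/* Toy stand-in for a packet buffer; hypothetical, not struct sk_buff. */
struct demo_pkt {
    uint8_t buf[64];
    size_t  icmp_off;  /* recorded offset of the ICMP header (does not move) */
    size_t  data_off;  /* moving cursor, playing the role of skb->data */
};

/* Advance the cursor past len bytes, as a header pull does. */
static void demo_pull(struct demo_pkt *p, size_t len)
{
    p->data_off += len;
}

/* Current cursor position. */
static const uint8_t *demo_data(const struct demo_pkt *p)
{
    return p->buf + p->data_off;
}

/* Inner IP header located relative to the ICMP header.
 * The result is independent of how far the cursor has been pulled. */
static const uint8_t *demo_inner_ip(const struct demo_pkt *p)
{
    return p->buf + p->icmp_off + DEMO_ICMP_HDR_LEN;
}
```

Starting with the cursor right after the ICMP header, `demo_data()` and `demo_inner_ip()` agree. After `demo_pull(&p, 20 + 4)` (an assumed inner IP header plus a minimal GRE header), only `demo_inner_ip()` still points at the inner IP header.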
    • xen-netfront: fix resource leak in netfront · b84c36cb
      Annie Li authored
      [ Upstream commit cefe0078 ]
      
      This patch removes grant transfer releasing code from netfront, and uses
      gnttab_end_foreign_access to end grant access since
      gnttab_end_foreign_access_ref may fail when the grant entry is
      currently used for reading or writing.
      
      * clean up grant transfer code kept from old netfront(2.6.18) which grants
      pages for access/map and transfer. But grant transfer is deprecated in current
      netfront, so remove corresponding release code for transfer.
      
      * fix resource leak, release grant access (through gnttab_end_foreign_access)
      and skb for tx/rx path, use get_page to ensure page is released when grant
      access is completed successfully.
      
      Xen-blkfront/xen-tpmfront/xen-pcifront have a similar issue, but patches
      for them will be created separately.
      
      V6: Correct subject line and commit message.
      
      V5: Remove unnecessary change in xennet_end_access.
      
      V4: Revert put_page in gnttab_end_foreign_access, and keep netfront change in
      single patch.
      
      V3: Changes as suggested by David Vrabel, ensure pages are not freed until
      grant access is ended.
      
      V2: Improve patch comments.
      Signed-off-by: Annie Li <annie.li@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b84c36cb
    • net: Fix memory leak if TPROXY used with TCP early demux · 0fdedfaa
      Holger Eitzenberger authored
      [ Upstream commit a452ce34 ]
      
      I see a memory leak when using a transparent HTTP proxy using TPROXY
      together with TCP early demux and Kernel v3.8.13.15 (Ubuntu stable):
      
      unreferenced object 0xffff88008cba4a40 (size 1696):
        comm "softirq", pid 0, jiffies 4294944115 (age 8907.520s)
        hex dump (first 32 bytes):
          0a e0 20 6a 40 04 1b 37 92 be 32 e2 e8 b4 00 00  .. j@..7..2.....
          02 00 07 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff810b710a>] kmem_cache_alloc+0xad/0xb9
          [<ffffffff81270185>] sk_prot_alloc+0x29/0xc5
          [<ffffffff812702cf>] sk_clone_lock+0x14/0x283
          [<ffffffff812aaf3a>] inet_csk_clone_lock+0xf/0x7b
          [<ffffffff8129a893>] netlink_broadcast+0x14/0x16
          [<ffffffff812c1573>] tcp_create_openreq_child+0x1b/0x4c3
          [<ffffffff812c033e>] tcp_v4_syn_recv_sock+0x38/0x25d
          [<ffffffff812c13e4>] tcp_check_req+0x25c/0x3d0
          [<ffffffff812bf87a>] tcp_v4_do_rcv+0x287/0x40e
          [<ffffffff812a08a7>] ip_route_input_noref+0x843/0xa55
          [<ffffffff812bfeca>] tcp_v4_rcv+0x4c9/0x725
          [<ffffffff812a26f4>] ip_local_deliver_finish+0xe9/0x154
          [<ffffffff8127a927>] __netif_receive_skb+0x4b2/0x514
          [<ffffffff8127aa77>] process_backlog+0xee/0x1c5
          [<ffffffff8127c949>] net_rx_action+0xa7/0x200
          [<ffffffff81209d86>] add_interrupt_randomness+0x39/0x157
      
      But there are many more, resulting in the machine going OOM after some
      days.
      
      From looking at the TPROXY code, and with help from Florian, I see
      that the memory leak is introduced in tcp_v4_early_demux():
      
        void tcp_v4_early_demux(struct sk_buff *skb)
        {
          /* ... */
      
          iph = ip_hdr(skb);
          th = tcp_hdr(skb);
      
          if (th->doff < sizeof(struct tcphdr) / 4)
              return;
      
          sk = __inet_lookup_established(dev_net(skb->dev), &tcp_hashinfo,
                             iph->saddr, th->source,
                             iph->daddr, ntohs(th->dest),
                             skb->skb_iif);
          if (sk) {
              skb->sk = sk;
      
      where the socket is assigned unconditionally to skb->sk, also bumping
      the refcnt on it.  This is problematic, because in our case the skb
      already has a socket assigned in the TPROXY target.  This then results
      in the leak I see.
      
      The very same issue seems to exist with IPv6, but I haven't tested it.
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0fdedfaa
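The leak described in this message comes from overwriting a pointer that already carries a reference. The toy model below shows why the unconditional assignment loses a reference and how a guard avoids it. It is a sketch only: the `demo_` types and helpers are hypothetical, and the guard is one possible way to avoid the overwrite, not necessarily the form the upstream fix takes.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; hypothetical, not the kernel's struct sock / struct sk_buff. */
struct demo_sock { int refcnt; };
struct demo_skb  { struct demo_sock *sk; };

static void demo_sock_hold(struct demo_sock *sk)
{
    sk->refcnt++;
}

/* Unconditional attach: a socket already attached to the skb is
 * overwritten, so the reference held through the skb is never dropped. */
static void demo_attach_unconditional(struct demo_skb *skb, struct demo_sock *found)
{
    demo_sock_hold(found);
    skb->sk = found;
}

/* Guarded attach: do nothing if a socket is already attached.
 * Returns 1 if the socket was attached, 0 if the skb was left alone. */
static int demo_attach_guarded(struct demo_skb *skb, struct demo_sock *found)
{
    if (skb->sk != NULL)
        return 0;

    demo_sock_hold(found);
    skb->sk = found;
    return 1;
}

/* Drop the reference held through the skb, as freeing the skb would. */
static void demo_skb_release(struct demo_skb *skb)
{
    if (skb->sk != NULL) {
        skb->sk->refcnt--;
        skb->sk = NULL;
    }
}
```

In the unconditional case, releasing the skb drops the reference on the newly found socket only, and the socket attached earlier keeps a count that nothing will ever decrement. In the guarded case, the earlier socket stays attached and is released normally.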
    • fib_frontend: fix possible NULL pointer dereference · 5c9dfac1
      Oliver Hartkopp authored
      [ Upstream commit a0065f26 ]
      
      The two commits 0115e8e3 (net: remove delay at device dismantle) and
      748e2d93 (net: reinstate rtnl in call_netdevice_notifiers()) silently
      removed a NULL pointer check for in_dev since Linux 3.7.
      
      This patch re-introduces the check, because its absence crashes the kernel
      when small MTU values are set on non-IP-capable netdevices.
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5c9dfac1
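The check this message refers to follows a common pattern: a notifier handler must return early when a device has no per-protocol state attached. The sketch below illustrates that pattern under stated assumptions. The `demo_` types, the return codes, and the "flush" counter are hypothetical and do not reproduce the actual fib_frontend code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes for the sketch. */
enum { DEMO_NOTIFY_DONE = 0, DEMO_NOTIFY_HANDLED = 1 };

/* Toy per-device IPv4 state; counts how often routes were flushed. */
struct demo_in_dev { int flush_count; };

/* Toy network device; in_dev is NULL on devices that cannot carry IP. */
struct demo_netdev { struct demo_in_dev *in_dev; };

/* Handle an MTU-change event for a device. */
static int demo_mtu_event(struct demo_netdev *dev)
{
    struct demo_in_dev *in_dev = dev->in_dev;

    /* Without this check, the access below would dereference NULL
     * for devices that have no IPv4 state. */
    if (in_dev == NULL)
        return DEMO_NOTIFY_DONE;

    in_dev->flush_count++;
    return DEMO_NOTIFY_HANDLED;
}
```

A device without IPv4 state (for example a CAN interface, used here purely as an illustration) is skipped, while an IP-capable device is processed as before.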