1. 09 Feb, 2014 7 commits
  2. 08 Feb, 2014 6 commits
  3. 07 Feb, 2014 15 commits
    • Linus Torvalds's avatar
      Merge tag 'driver-core-3.14-rc2' of... · 34a9bff4
      Linus Torvalds authored
      Merge tag 'driver-core-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
      
      Pull driver core fix from Greg KH:
       "Here is a single kernfs fix to resolve a much-reported lockdep issue
        with the removal of entries in sysfs"
      
      * tag 'driver-core-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        kernfs: make kernfs_deactivate() honor KERNFS_LOCKDEP flag
      34a9bff4
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · 41f76d8b
      Linus Torvalds authored
      Pull ceph fixes from Sage Weil:
       "There is an RBD fix for a crash due to the immutable bio changes, an
        error path fix, and a locking fix in the recent redirect support"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
        libceph: do not dereference a NULL bio pointer
        libceph: take map_sem for read in handle_reply()
        libceph: factor out logic from ceph_osdc_start_request()
        libceph: fix error handling in ceph_osdc_init()
      41f76d8b
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 42be3f35
      Linus Torvalds authored
      Pull arm64 fixes from Catalin Marinas:
       - Relax VDSO alignment requirements so that the kernel-picked one (4K)
         does not conflict with the dynamic linker's one (64K)
       - VDSO gettimeofday fix
       - Barrier fixes for atomic operations and cache flushing
       - TLB invalidation when overriding early page mappings during boot
       - Wired up new 32-bit arm (compat) syscalls
       - LSM_MMAP_MIN_ADDR when COMPAT is enabled
       - defconfig update
       - Clean-up (comments, pgd_alloc).
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: defconfig: Expand default enabled features
        arm64: asm: remove redundant "cc" clobbers
        arm64: atomics: fix use of acquire + release for full barrier semantics
        arm64: barriers: allow dsb macro to take option parameter
        security: select correct default LSM_MMAP_MIN_ADDR on arm on arm64
        arm64: compat: Wire up new AArch32 syscalls
        arm64: vdso: update wtm fields for CLOCK_MONOTONIC_COARSE
        arm64: vdso: fix coarse clock handling
        arm64: simplify pgd_alloc
        arm64: fix typo: s/SERRROR/SERROR/
        arm64: Invalidate the TLB when replacing pmd entries during boot
        arm64: Align CMA sizes to PAGE_SIZE
        arm64: add DSB after icache flush in __flush_icache_all()
        arm64: vdso: prevent ld from aligning PT_LOAD segments to 64k
      42be3f35
    • Linus Torvalds's avatar
      Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · d94d0e27
      Linus Torvalds authored
      Pull MIPS updates from Ralf Baechle:
       "hree minor patches.  All have sat in -next for a few days"
      
      * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
        MIPS: fpu.h: Fix build when CONFIG_BUG is not set
        MIPS: Wire up sched_setattr/sched_getattr syscalls
        MIPS: Alchemy: Fix DB1100 GPIO registration
      d94d0e27
    • Linus Torvalds's avatar
      Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 3e382dd9
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
       "A series of small fixes.  Mostly driver ones.  There is one core
        regression fix on a patch that was meant to fix some race issues on
        vb2, but that actually caused more harm than good.  So, we're just
        reverting it for now"
      
      * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        [media] adv7842: Composite free-run platfrom-data fix
        [media] v4l2-dv-timings: fix GTF calculation
        [media] hdpvr: Fix memory leak in debug
        [media] af9035: add ID [2040:f900] Hauppauge WinTV-MiniStick 2
        [media] mxl111sf: Fix compile when CONFIG_DVB_USB_MXL111SF is unset
        [media] mxl111sf: Fix unintentional garbage stack read
        [media] cx24117: use a valid dev pointer for dev_err printout
        [media] cx24117: remove dead code in always 'false' if statement
        [media] update Michael Krufky's email address
        [media] vb2: Check if there are buffers before streamon
        [media] Revert "[media] videobuf_vm_{open,close} race fixes"
        [media] go7007-loader: fix usb_dev leak
        [media] media: bt8xx: add missing put_device call
        [media] exynos4-is: Compile in fimc-lite runtime PM callbacks conditionally
        [media] exynos4-is: Compile in fimc runtime PM callbacks conditionally
        [media] exynos4-is: Fix error paths in probe() for !pm_runtime_enabled()
        [media] s5p-jpeg: Fix wrong NV12 format parameters
        [media] s5k5baf: allow to handle arbitrary long i2c sequences
      3e382dd9
    • Linus Torvalds's avatar
      Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging · 2091f435
      Linus Torvalds authored
      Pull hwmon fixes from Guenter Roeck:
       "Fix PMBus driver problem with some multi-page voltage sensors and fix
        da9055 interrupt initialization"
      
      * tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: (da9055) Remove use of regmap_irq_get_virq()
        hwmon: (pmbus) Support per-page exponent in linear mode
      2091f435
    • Linus Torvalds's avatar
      Merge tag 'pm+acpi-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 22446d3f
      Linus Torvalds authored
      Pull ACPI and power management fixes from Rafael Wysocki:
       "These include a fix for a recent ACPI hotplug regression, four
        concurrency related fixes and one PCI device removal fix for
        ACPI-based PCI hotplug (ACPIPHP), intel_pstate fix that should go into
        stable, three simple ACPI cleanups and a new entry for the ACPI video
        blacklist.
      
        Specifics:
      
         - Fix for a recent ACPI hotplug regression causing a NULL pointer
           dereference to occur while handling ACPI eject notifications for
           already ejected devices.  From Toshi Kani.
      
         - Four concurrency-related fixes for ACPIPHP.  Two of them add
           missing locking and the other two fix race conditions related to
           reference counting.
      
         - ACPIPHP fix to avoid NULL pointer dereferences during device
           removal involving Virtual Funcions.
      
         - intel_pstate fix to make it compute the percentage of time the CPU
           is busy properly.  From Dirk Brandewie.
      
         - Removal of two unnecessary NULL pointer checks in ACPI code and a
           fix for sscanf() format string from Dan Carpenter and Luis G.F.
      
         - New ACPI video blacklist entry for HP EliteBook Revolve 810 from
           Mika Westerberg"
      
      * tag 'pm+acpi-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI / hotplug: Fix panic on eject to ejected device
        ACPI / battery: Fix incorrect sscanf() string in acpi_battery_init_alarm()
        ACPI / proc: remove unneeded NULL check
        ACPI / utils: remove a pointless NULL check
        ACPI / video: Add HP EliteBook Revolve 810 to the blacklist
        intel_pstate: Take core C0 time into account for core busy calculation
        ACPI / hotplug / PCI: Fix bridge removal race vs dock events
        ACPI / hotplug / PCI: Fix bridge removal race in handle_hotplug_event()
        ACPI / hotplug / PCI: Scan root bus under the PCI rescan-remove lock
        ACPI / hotplug / PCI: Move PCI rescan-remove locking to hotplug_event()
        ACPI / hotplug / PCI: Remove entries from bus->devices in reverse order
      22446d3f
    • Ilya Dryomov's avatar
      libceph: do not dereference a NULL bio pointer · 0ec1d15e
      Ilya Dryomov authored
      Commit f38a5181 ("ceph: Convert to immutable biovecs") introduced
      a NULL pointer dereference, which broke rbd in -rc1.  Fix it.
      
      Cc: Kent Overstreet <kmo@daterainc.com>
      Signed-off-by: default avatarIlya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: default avatarSage Weil <sage@inktank.com>
      0ec1d15e
    • H. Peter Anvin's avatar
      Merge tag 'efi-urgent' into x86/urgent · a3b072cd
      H. Peter Anvin authored
       * Avoid WARN_ON() when mapping BGRT on Baytrail (EFI 32-bit).
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      a3b072cd
    • Ilya Dryomov's avatar
      libceph: take map_sem for read in handle_reply() · ff513ace
      Ilya Dryomov authored
      Handling redirect replies requires both map_sem and request_mutex.
      Taking map_sem unconditionally near the top of handle_reply() avoids
      possible race conditions that arise from releasing request_mutex to be
      able to acquire map_sem in redirect reply case.  (Lock ordering is:
      map_sem, request_mutex, crush_mutex.)
      Signed-off-by: default avatarIlya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: default avatarSage Weil <sage@inktank.com>
      ff513ace
    • Ilya Dryomov's avatar
      libceph: factor out logic from ceph_osdc_start_request() · 0bbfdfe8
      Ilya Dryomov authored
      Factor out logic from ceph_osdc_start_request() into a new helper,
      __ceph_osdc_start_request().  ceph_osdc_start_request() now amounts to
      taking locks and calling __ceph_osdc_start_request().
      Signed-off-by: default avatarIlya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: default avatarSage Weil <sage@inktank.com>
      0bbfdfe8
    • Mark Rutland's avatar
      arm64: defconfig: Expand default enabled features · 55834a77
      Mark Rutland authored
      FPGA implementations of the Cortex-A57 and Cortex-A53 are now available
      in the form of the SMM-A57 and SMM-A53 Soft Macrocell Models (SMMs) for
      Versatile Express. As these attach to a Motherboard Express V2M-P1 it
      would be useful to have support for some V2M-P1 peripherals enabled by
      default.
      
      Additionally a couple of of features have been introduced since the last
      defconfig update (CMA, jump labels) that would be good to have enabled
      by default to ensure they are build and boot tested.
      
      This patch updates the arm64 defconfig to enable support for these
      devices and features. The arm64 Kconfig is modified to select
      HAVE_PATA_PLATFORM, which is required to enable support for the
      CompactFlash controller on the V2M-P1.
      
      A few options which don't need to appear in defconfig are trimmed:
      
      * BLK_DEV - selected by default
      * EXPERIMENTAL - otherwise gone from the kernel
      * MII - selected by drivers which require it
      * USB_SUPPORT - selected by default
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      55834a77
    • Will Deacon's avatar
      arm64: asm: remove redundant "cc" clobbers · 95c41896
      Will Deacon authored
      cbnz/tbnz don't update the condition flags, so remove the "cc" clobbers
      from inline asm blocks that only use these instructions to implement
      conditional branches.
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      95c41896
    • Will Deacon's avatar
      arm64: atomics: fix use of acquire + release for full barrier semantics · 8e86f0b4
      Will Deacon authored
      Linux requires a number of atomic operations to provide full barrier
      semantics, that is no memory accesses after the operation can be
      observed before any accesses up to and including the operation in
      program order.
      
      On arm64, these operations have been incorrectly implemented as follows:
      
      	// A, B, C are independent memory locations
      
      	<Access [A]>
      
      	// atomic_op (B)
      1:	ldaxr	x0, [B]		// Exclusive load with acquire
      	<op(B)>
      	stlxr	w1, x0, [B]	// Exclusive store with release
      	cbnz	w1, 1b
      
      	<Access [C]>
      
      The assumption here being that two half barriers are equivalent to a
      full barrier, so the only permitted ordering would be A -> B -> C
      (where B is the atomic operation involving both a load and a store).
      
      Unfortunately, this is not the case by the letter of the architecture
      and, in fact, the accesses to A and C are permitted to pass their
      nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
      or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
      store-release on B). This is a clear violation of the full barrier
      requirement.
      
      The simple way to fix this is to implement the same algorithm as ARMv7
      using explicit barriers:
      
      	<Access [A]>
      
      	// atomic_op (B)
      	dmb	ish		// Full barrier
      1:	ldxr	x0, [B]		// Exclusive load
      	<op(B)>
      	stxr	w1, x0, [B]	// Exclusive store
      	cbnz	w1, 1b
      	dmb	ish		// Full barrier
      
      	<Access [C]>
      
      but this has the undesirable effect of introducing *two* full barrier
      instructions. A better approach is actually the following, non-intuitive
      sequence:
      
      	<Access [A]>
      
      	// atomic_op (B)
      1:	ldxr	x0, [B]		// Exclusive load
      	<op(B)>
      	stlxr	w1, x0, [B]	// Exclusive store with release
      	cbnz	w1, 1b
      	dmb	ish		// Full barrier
      
      	<Access [C]>
      
      The simple observations here are:
      
        - The dmb ensures that no subsequent accesses (e.g. the access to C)
          can enter or pass the atomic sequence.
      
        - The dmb also ensures that no prior accesses (e.g. the access to A)
          can pass the atomic sequence.
      
        - Therefore, no prior access can pass a subsequent access, or
          vice-versa (i.e. A is strictly ordered before C).
      
        - The stlxr ensures that no prior access can pass the store component
          of the atomic operation.
      
      The only tricky part remaining is the ordering between the ldxr and the
      access to A, since the absence of the first dmb means that we're now
      permitting re-ordering between the ldxr and any prior accesses.
      
      From an (arbitrary) observer's point of view, there are two scenarios:
      
        1. We have observed the ldxr. This means that if we perform a store to
           [B], the ldxr will still return older data. If we can observe the
           ldxr, then we can potentially observe the permitted re-ordering
           with the access to A, which is clearly an issue when compared to
           the dmb variant of the code. Thankfully, the exclusive monitor will
           save us here since it will be cleared as a result of the store and
           the ldxr will retry. Notice that any use of a later memory
           observation to imply observation of the ldxr will also imply
           observation of the access to A, since the stlxr/dmb ensure strict
           ordering.
      
        2. We have not observed the ldxr. This means we can perform a store
           and influence the later ldxr. However, that doesn't actually tell
           us anything about the access to [A], so we've not lost anything
           here either when compared to the dmb variant.
      
      This patch implements this solution for our barriered atomic operations,
      ensuring that we satisfy the full barrier requirements where they are
      needed.
      
      Cc: <stable@vger.kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      8e86f0b4
    • Adam Thomson's avatar
      hwmon: (da9055) Remove use of regmap_irq_get_virq() · 4f545a4b
      Adam Thomson authored
      Remove use of regmap_irq_get_virq() in driver probe which was
      conflicting with use of platform_get_irq_byname().
      platform_get_irq_byname() already returns the VIRQ number due
      to MFD core translation so using regmap_irq_get_virq() on that
      returned value results in an incorrect IRQ being requested.
      The driver probes then fail because of this.
      Signed-off-by: default avatarAdam Thomson <Adam.Thomson.Opensource@diasemi.com>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      4f545a4b
  4. 06 Feb, 2014 12 commits
    • Rafael J. Wysocki's avatar
      Merge branches 'acpi-cleanup' and 'acpi-video' · a2ff34c4
      Rafael J. Wysocki authored
      * acpi-cleanup:
        ACPI / battery: Fix incorrect sscanf() string in acpi_battery_init_alarm()
        ACPI / proc: remove unneeded NULL check
        ACPI / utils: remove a pointless NULL check
      
      * acpi-video:
        ACPI / video: Add HP EliteBook Revolve 810 to the blacklist
      a2ff34c4
    • Rafael J. Wysocki's avatar
      Merge branch 'pm-cpufreq' · 7fd90506
      Rafael J. Wysocki authored
      * pm-cpufreq:
        intel_pstate: Take core C0 time into account for core busy calculation
      7fd90506
    • Rafael J. Wysocki's avatar
      Merge branches 'acpi-pci-hotplug' and 'acpi-hotplug' · 93e73711
      Rafael J. Wysocki authored
      * acpi-pci-hotplug:
        ACPI / hotplug / PCI: Fix bridge removal race vs dock events
        ACPI / hotplug / PCI: Fix bridge removal race in handle_hotplug_event()
        ACPI / hotplug / PCI: Scan root bus under the PCI rescan-remove lock
        ACPI / hotplug / PCI: Move PCI rescan-remove locking to hotplug_event()
        ACPI / hotplug / PCI: Remove entries from bus->devices in reverse order
      
      * acpi-hotplug:
        ACPI / hotplug: Fix panic on eject to ejected device
      93e73711
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew Morton) · 9343224b
      Linus Torvalds authored
      Merge a bunch of fixes from Andrew Morton:
       "Commit 579f8290 ("swap: add a simple detector for inappropriate
        swapin readahead") is a feature.  No probs if you decide to defer it
        until the next merge window.
      
        It has been sitting in my tree for over a year because of my dislike
        of all the magic numbers, but recent discussion with Hugh has made me
        give up"
      
      * emailed patches fron Andrew Morton <akpm@linux-foundation.org>:
        mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irq
        arch/x86/mm/numa.c: fix array index overflow when synchronizing nid to memblock.reserved.
        arch/x86/mm/numa.c: initialize numa_kernel_nodes in numa_clear_kernel_node_hotplug()
        mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq()
        mm/swap: fix race on swap_info reuse between swapoff and swapon
        swap: add a simple detector for inappropriate swapin readahead
        ocfs2: free allocated clusters if error occurs after ocfs2_claim_clusters
        Documentation/kernel-parameters.txt: fix memmap= language
      9343224b
    • KOSAKI Motohiro's avatar
      mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irq · 227d53b3
      KOSAKI Motohiro authored
      To use spin_{un}lock_irq is dangerous if caller disabled interrupt.
      During aio buffer migration, we have a possibility to see the following
      call stack.
      
      aio_migratepage  [disable interrupt]
        migrate_page_copy
          clear_page_dirty_for_io
            set_page_dirty
              __set_page_dirty_buffers
                __set_page_dirty
                  spin_lock_irq
      
      This mean, current aio migration is a deadlockable.  spin_lock_irqsave
      is a safer alternative and we should use it.
      Signed-off-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reported-by: David Rientjes rientjes@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      227d53b3
    • Tang Chen's avatar
      arch/x86/mm/numa.c: fix array index overflow when synchronizing nid to memblock.reserved. · 7bc35fdd
      Tang Chen authored
      The following path will cause array out of bound.
      
      memblock_add_region() will always set nid in memblock.reserved to
      MAX_NUMNODES.  In numa_register_memblks(), after we set all nid to
      correct valus in memblock.reserved, we called setup_node_data(), and
      used memblock_alloc_nid() to allocate memory, with nid set to
      MAX_NUMNODES.
      
      The nodemask_t type can be seen as a bit array.  And the index is 0 ~
      MAX_NUMNODES-1.
      
      After that, when we call node_set() in numa_clear_kernel_node_hotplug(),
      the nodemask_t got an index of value MAX_NUMNODES, which is out of [0 ~
      MAX_NUMNODES-1].
      
      See below:
      
      numa_init()
       |---> numa_register_memblks()
       |      |---> memblock_set_node(memory)		set correct nid in memblock.memory
       |      |---> memblock_set_node(reserved)	set correct nid in memblock.reserved
       |      |......
       |      |---> setup_node_data()
       |             |---> memblock_alloc_nid()	here, nid is set to MAX_NUMNODES (1024)
       |......
       |---> numa_clear_kernel_node_hotplug()
              |---> node_set()			here, we have an index 1024, and overflowed
      
      This patch moves nid setting to numa_clear_kernel_node_hotplug() to fix
      this problem.
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Signed-off-by: default avatarTang Chen <tangchen@cn.fujitsu.com>
      Tested-by: default avatarGu Zheng <guz.fnst@cn.fujitsu.com>
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Tested-by: default avatarDave Jones <davej@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7bc35fdd
    • Tang Chen's avatar
      arch/x86/mm/numa.c: initialize numa_kernel_nodes in numa_clear_kernel_node_hotplug() · 017c217a
      Tang Chen authored
      On-stack variable numa_kernel_nodes in numa_clear_kernel_node_hotplug()
      was not initialized.  So we need to initialize it.
      
      [akpm@linux-foundation.org: use NODE_MASK_NONE, per David]
      Signed-off-by: default avatarTang Chen <tangchen@cn.fujitsu.com>
      Tested-by: default avatarGu Zheng <guz.fnst@cn.fujitsu.com>
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Reported-by: default avatarDavid Rientjes <rientjes@google.com>
      Tested-by: default avatarDave Jones <davej@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      017c217a
    • KOSAKI Motohiro's avatar
      mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq() · a85d9df1
      KOSAKI Motohiro authored
      During aio stress test, we observed the following lockdep warning.  This
      mean AIO+numa_balancing is currently deadlockable.
      
      The problem is, aio_migratepage disable interrupt, but
      __set_page_dirty_nobuffers unintentionally enable it again.
      
      Generally, all helper function should use spin_lock_irqsave() instead of
      spin_lock_irq() because they don't know caller at all.
      
         other info that might help us debug this:
          Possible unsafe locking scenario:
      
                CPU0
                ----
           lock(&(&ctx->completion_lock)->rlock);
           <Interrupt>
             lock(&(&ctx->completion_lock)->rlock);
      
          *** DEADLOCK ***
      
            dump_stack+0x19/0x1b
            print_usage_bug+0x1f7/0x208
            mark_lock+0x21d/0x2a0
            mark_held_locks+0xb9/0x140
            trace_hardirqs_on_caller+0x105/0x1d0
            trace_hardirqs_on+0xd/0x10
            _raw_spin_unlock_irq+0x2c/0x50
            __set_page_dirty_nobuffers+0x8c/0xf0
            migrate_page_copy+0x434/0x540
            aio_migratepage+0xb1/0x140
            move_to_new_page+0x7d/0x230
            migrate_pages+0x5e5/0x700
            migrate_misplaced_page+0xbc/0xf0
            do_numa_page+0x102/0x190
            handle_pte_fault+0x241/0x970
            handle_mm_fault+0x265/0x370
            __do_page_fault+0x172/0x5a0
            do_page_fault+0x1a/0x70
            page_fault+0x28/0x30
      Signed-off-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a85d9df1
    • Weijie Yang's avatar
      mm/swap: fix race on swap_info reuse between swapoff and swapon · f893ab41
      Weijie Yang authored
      swapoff clear swap_info's SWP_USED flag prematurely and free its
      resources after that.  A concurrent swapon will reuse this swap_info
      while its previous resources are not cleared completely.
      
      These late freed resources are:
       - p->percpu_cluster
       - swap_cgroup_ctrl[type]
       - block_device setting
       - inode->i_flags &= ~S_SWAPFILE
      
      This patch clears the SWP_USED flag after all its resources are freed,
      so that swapon can reuse this swap_info by alloc_swap_info() safely.
      
      [akpm@linux-foundation.org: tidy up code comment]
      Signed-off-by: default avatarWeijie Yang <weijie.yang@samsung.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f893ab41
    • Shaohua Li's avatar
      swap: add a simple detector for inappropriate swapin readahead · 579f8290
      Shaohua Li authored
      This is a patch to improve swap readahead algorithm.  It's from Hugh and
      I slightly changed it.
      
      Hugh's original changelog:
      
      swapin readahead does a blind readahead, whether or not the swapin is
      sequential.  This may be ok on harddisk, because large reads have
      relatively small costs, and if the readahead pages are unneeded they can
      be reclaimed easily - though, what if their allocation forced reclaim of
      useful pages? But on SSD devices large reads are more expensive than
      small ones: if the readahead pages are unneeded, reading them in caused
      significant overhead.
      
      This patch adds very simplistic random read detection.  Stealing the
      PageReadahead technique from Konstantin Khlebnikov's patch, avoiding the
      vma/anon_vma sophistications of Shaohua Li's patch, swapin_nr_pages()
      simply looks at readahead's current success rate, and narrows or widens
      its readahead window accordingly.  There is little science to its
      heuristic: it's about as stupid as can be whilst remaining effective.
      
      The table below shows elapsed times (in centiseconds) when running a
      single repetitive swapping load across a 1000MB mapping in 900MB ram
      with 1GB swap (the harddisk tests had taken painfully too long when I
      used mem=500M, but SSD shows similar results for that).
      
      Vanilla is the 3.6-rc7 kernel on which I started; Shaohua denotes his
      Sep 3 patch in mmotm and linux-next; HughOld denotes my Oct 1 patch
      which Shaohua showed to be defective; HughNew this Nov 14 patch, with
      page_cluster as usual at default of 3 (8-page reads); HughPC4 this same
      patch with page_cluster 4 (16-page reads); HughPC0 with page_cluster 0
      (1-page reads: no readahead).
      
      HDD for swapping to harddisk, SSD for swapping to VertexII SSD.  Seq for
      sequential access to the mapping, cycling five times around; Rand for
      the same number of random touches.  Anon for a MAP_PRIVATE anon mapping;
      Shmem for a MAP_SHARED anon mapping, equivalent to tmpfs.
      
      One weakness of Shaohua's vma/anon_vma approach was that it did not
      optimize Shmem: seen below.  Konstantin's approach was perhaps mistuned,
      50% slower on Seq: did not compete and is not shown below.
      
      HDD        Vanilla Shaohua HughOld HughNew HughPC4 HughPC0
      Seq Anon     73921   76210   75611   76904   78191  121542
      Seq Shmem    73601   73176   73855   72947   74543  118322
      Rand Anon   895392  831243  871569  845197  846496  841680
      Rand Shmem 1058375 1053486  827935  764955  764376  756489
      
      SSD        Vanilla Shaohua HughOld HughNew HughPC4 HughPC0
      Seq Anon     24634   24198   24673   25107   21614   70018
      Seq Shmem    24959   24932   25052   25703   22030   69678
      Rand Anon    43014   26146   28075   25989   26935   25901
      Rand Shmem   45349   45215   28249   24268   24138   24332
      
      These tests are, of course, two extremes of a very simple case: under
      heavier mixed loads I've not yet observed any consistent improvement or
      degradation, and wider testing would be welcome.
      
      Shaohua Li:
      
      Test shows Vanilla is slightly better in sequential workload than Hugh's
      patch.  I observed with Hugh's patch sometimes the readahead size is
      shrinked too fast (from 8 to 1 immediately) in sequential workload if
      there is no hit.  And in such case, continuing doing readahead is good
      actually.
      
      I don't prepare a sophisticated algorithm for the sequential workload
      because so far we can't guarantee sequential accessed pages are swap out
      sequentially.  So I slightly change Hugh's heuristic - don't shrink
      readahead size too fast.
      
      Here is my test result (unit second, 3 runs average):
      	Vanilla		Hugh		New
      Seq	356		370		360
      Random	4525		2447		2444
      
      Attached graph is the swapin/swapout throughput I collected with 'vmstat
      2'.  The first part is running a random workload (till around 1200 of
      the x-axis) and the second part is running a sequential workload.
      swapin and swapout throughput are almost identical in steady state in
      both workloads.  These are expected behavior.  while in Vanilla, swapin
      is much bigger than swapout especially in random workload (because wrong
      readahead).
      
      Original patches by: Shaohua Li and Konstantin Khlebnikov.
      
      [fengguang.wu@intel.com: swapin_nr_pages() can be static]
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarShaohua Li <shli@fusionio.com>
      Signed-off-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      579f8290
    • Zongxun Wang's avatar
      ocfs2: free allocated clusters if error occurs after ocfs2_claim_clusters · fb951eb5
      Zongxun Wang authored
      Even if using the same jbd2 handle, we cannot rollback a transaction.
      So once some error occurs after successfully allocating clusters, the
      allocated clusters will never be used and it means they are lost.  For
      example, call ocfs2_claim_clusters successfully when expanding a file,
      but failed in ocfs2_insert_extent.  So we need free the allocated
      clusters if they are not used indeed.
      Signed-off-by: default avatarZongxun Wang <wangzongxun@huawei.com>
      Signed-off-by: default avatarJoseph Qi <joseph.qi@huawei.com>
      Acked-by: default avatarJoel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fb951eb5
    • Randy Dunlap's avatar
      Documentation/kernel-parameters.txt: fix memmap= language · 277cba1d
      Randy Dunlap authored
      Clean up descriptions of memmap= boot options.
      
      Add periods (full stops), drop commas, change "used" to "reserved" or
      "marked".
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Cc: Andiry Xu <andiry.xu@gmail.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      277cba1d