Commits · 0534af01cca338193abbfdb53650af91e65dbf10 · Kirill Smelkov / linux

11 Apr, 2014 1 commit

x86, calgary: Use 8M TCE table size by default · 0534af01

WANG Chao authored Mar 10, 2014

New kexec-tools wants to pass kdump kernel needed memmap via E820
directly, instead of memmap=exactmap. This makes saved_max_pfn not
be passed down to 2nd kernel. To keep 1st kernel and 2nd kernel using
the same TCE table size, Muli suggest to hard code the size to max (8M).

We can't get rid of saved_max_pfn this time, for backward compatibility
with old first kernel and new second kernel. However new first kernel
and old second kernel can not work unfortunately.

v2->v1:
- retain saved_max_pfn so new 2nd kernel can work with old 1st kernel
  from Vivek
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Muli Ben-Yehuda <mulix@mulix.org>
Acked-by: Jon Mason <jdmason@kudzu.us>
Link: http://lkml.kernel.org/r/1394463120-26999-1-git-send-email-chaowang@redhat.comSigned-off-by: H. Peter Anvin <hpa@linux.intel.com>

0534af01

09 Feb, 2014 3 commits

x86/gpu: Print the Intel graphics stolen memory range · c71ef7b3

Ville Syrjälä authored Feb 05, 2014

Print an informative message when reserving the graphics stolen
memory region in the early quirk.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1391628540-23072-4-git-send-email-ville.syrjala@linux.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

c71ef7b3

x86/gpu: Add Intel graphics stolen memory quirk for gen2 platforms · a4dff769

Ville Syrjälä authored Feb 05, 2014

There isn't an explicit stolen memory base register on gen2.
Some old comment in the i915 code suggests we should get it via
max_low_pfn_mapped, but that's clearly a bad idea on my MGM.

The e820 map in said machine looks like this:

	BIOS-e820: [mem 0x0000000000000000-0x000000000009f7ff] usable
	BIOS-e820: [mem 0x000000000009f800-0x000000000009ffff] reserved
	BIOS-e820: [mem 0x00000000000ce000-0x00000000000cffff] reserved
	BIOS-e820: [mem 0x00000000000dc000-0x00000000000fffff] reserved
	BIOS-e820: [mem 0x0000000000100000-0x000000001f6effff] usable
	BIOS-e820: [mem 0x000000001f6f0000-0x000000001f6f7fff] ACPI data
	BIOS-e820: [mem 0x000000001f6f8000-0x000000001f6fffff] ACPI NVS
	BIOS-e820: [mem 0x000000001f700000-0x000000001fffffff] reserved
	BIOS-e820: [mem 0x00000000fec10000-0x00000000fec1ffff] reserved
	BIOS-e820: [mem 0x00000000ffb00000-0x00000000ffbfffff] reserved
	BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved

That makes max_low_pfn_mapped = 1f6f0000, so assuming our stolen
memory would start there would place it on top of some ACPI
memory regions. So not a good idea as already stated.

The 9MB region after the ACPI regions at 0x1f700000 however
looks promising given that the macine reports the stolen memory
size to be 8MB. Looking at the PGTBL_CTL register, the GTT
entries are at offset 0x1fee00000, and given that the GTT
entries occupy 128KB, it looks like the stolen memory could
start at 0x1f700000 and the GTT entries would occupy the last
128KB of the stolen memory.

After some more digging through chipset documentation, I've
determined the BIOS first allocates space for something called
TSEG (something to do with SMM) from the top of memory, and then
it allocates the graphics stolen memory below that. Accordind to
the chipset documentation TSEG has a fixed size of 1MB on 855.
So that explains the top 1MB in the e820 region. And it also
confirms that the GTT entries are in fact at the end of the the
stolen memory region.

Derive the stolen memory base address on gen2 the same as the
BIOS does (TOM-TSEG_SIZE-stolen_size). There are a few
differences between the registers on various gen2 chipsets, so a
few different codepaths are required.

865G is again bit more special since it seems to support enough
memory to hit 4GB address space issues. This means the PCI
allocations will also affect the location of the stolen memory.
Fortunately there appears to be the TOUD register which may give
us the correct answer directly. But the chipset docs are a bit
unclear, so I'm not 100% sure that the graphics stolen memory is
always the last thing the BIOS steals. Someone would need to
verify it on a real system.

I tested this on the my 830 and 855 machines, and so far
everything looks peachy.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1391628540-23072-3-git-send-email-ville.syrjala@linux.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

a4dff769

x86/gpu: Add vfunc for Intel graphics stolen memory base address · 52ca7045

Ville Syrjälä authored Feb 05, 2014

For gen2 devices we're going to need another way to determine
the stolen memory base address. Make that into a vfunc as well.

Also drop the bogus inline keyword from gen8_stolen_size().
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1391628540-23072-2-git-send-email-ville.syrjala@linux.intel.comSigned-off-by: Ingo Molnar <mingo@kernel.org>

52ca7045

08 Feb, 2014 6 commits

Merge tag 'pinctrl-v3.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 49447903

Linus Torvalds authored Feb 08, 2014

Pull pinctrl fixes from Linus Walleij:
 "First round of pin control fixes for v3.14:

   - Protect pinctrl_list_add() with the proper mutex.  This was
     identified by RedHat.  Caused nasty locking warnings was rootcased
     by Stanislaw Gruszka.

   - Avoid adding dangerous debugfs files when either half of the
     subsystem is unused: pinmux or pinconf.

   - Various fixes to various drivers: locking, hardware particulars, DT
     parsing, error codes"

* tag 'pinctrl-v3.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: tegra: return correct error type
  pinctrl: do not init debugfs entries for unimplemented functionalities
  pinctrl: protect pinctrl_list add
  pinctrl: sirf: correct the pin index of ac97_pins group
  pinctrl: imx27: fix offset calculation in imx_read_2bit
  pinctrl: vt8500: Change devicetree data parsing
  pinctrl: imx27: fix wrong offset to ICONFB
  pinctrl: at91: use locked variant of irq_set_handler

49447903

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c132adef

Linus Torvalds authored Feb 08, 2014

Pull irq fix from Thomas Gleixner:
 "Add a missing Kconfig dependency"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Generic irq chip requires IRQ_DOMAIN

c132adef

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c1ff8431

Linus Torvalds authored Feb 08, 2014

Pull x86 fixes from Peter Anvin:
 "Quite a varied little collection of fixes.  Most of them are
  relatively small or isolated; the biggest one is Mel Gorman's fixes
  for TLB range flushing.

  A couple of AMD-related fixes (including not crashing when given an
  invalid microcode image) and fix a crash when compiled with gcov"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode, AMD: Unify valid container checks
  x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y
  x86/efi: Allow mapping BGRT on x86-32
  x86: Fix the initialization of physnode_map
  x86, cpu hotplug: Fix stack frame warning in check_irq_vectors_for_cpu_disable()
  x86/intel/mid: Fix X86_INTEL_MID dependencies
  arch/x86/mm/srat: Skip NUMA_NO_NODE while parsing SLIT
  mm, x86: Revisit tlb_flushall_shift tuning for page flushes except on IvyBridge
  x86: mm: change tlb_flushall_shift for IvyBridge
  x86/mm: Eliminate redundant page table walk during TLB range flushing
  x86/mm: Clean up inconsistencies when flushing TLB ranges
  mm, x86: Account for TLB flushes only when debugging
  x86/AMD/NB: Fix amd_set_subcaches() parameter type
  x86/quirks: Add workaround for AMD F16h Erratum792
  x86, doc, kconfig: Fix dud URL for Microcode data

c1ff8431

Merge tag 'jfs-3.14-rc2' of git://github.com/kleikamp/linux-shaggy · ec2e6cb2

Linus Torvalds authored Feb 08, 2014

Pull jfs fix from David Kleikamp:
 "Fix regression"

* tag 'jfs-3.14-rc2' of git://github.com/kleikamp/linux-shaggy:
  jfs: fix generic posix ACL regression

ec2e6cb2

jfs: fix generic posix ACL regression · c18f7b51

Dave Kleikamp authored Feb 07, 2014

I missed a couple errors in reviewing the patches converting jfs
to use the generic posix ACL function. Setting ACL's currently
fails with -EOPNOTSUPP.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

c18f7b51

watchdog: dw_wdt: Add dependency on HAS_IOMEM · 1ccfe6f9

Richard Weinberger authored Jan 31, 2014

On archs like S390 or um this driver cannot build nor work.
Make it depend on HAS_IOMEM to bypass build failures.

drivers/built-in.o: In function `dw_wdt_drv_probe':
drivers/watchdog/dw_wdt.c:302: undefined reference to `devm_ioremap_resource'
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>

1ccfe6f9

07 Feb, 2014 15 commits

Merge tag 'driver-core-3.14-rc2' of... · 34a9bff4

Linus Torvalds authored Feb 07, 2014

Merge tag 'driver-core-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fix from Greg KH:
 "Here is a single kernfs fix to resolve a much-reported lockdep issue
  with the removal of entries in sysfs"

* tag 'driver-core-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  kernfs: make kernfs_deactivate() honor KERNFS_LOCKDEP flag

34a9bff4

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · 41f76d8b

Linus Torvalds authored Feb 07, 2014

Pull ceph fixes from Sage Weil:
 "There is an RBD fix for a crash due to the immutable bio changes, an
  error path fix, and a locking fix in the recent redirect support"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: do not dereference a NULL bio pointer
  libceph: take map_sem for read in handle_reply()
  libceph: factor out logic from ceph_osdc_start_request()
  libceph: fix error handling in ceph_osdc_init()

41f76d8b

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 42be3f35

Linus Torvalds authored Feb 07, 2014

Pull arm64 fixes from Catalin Marinas:
 - Relax VDSO alignment requirements so that the kernel-picked one (4K)
   does not conflict with the dynamic linker's one (64K)
 - VDSO gettimeofday fix
 - Barrier fixes for atomic operations and cache flushing
 - TLB invalidation when overriding early page mappings during boot
 - Wired up new 32-bit arm (compat) syscalls
 - LSM_MMAP_MIN_ADDR when COMPAT is enabled
 - defconfig update
 - Clean-up (comments, pgd_alloc).

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: defconfig: Expand default enabled features
  arm64: asm: remove redundant "cc" clobbers
  arm64: atomics: fix use of acquire + release for full barrier semantics
  arm64: barriers: allow dsb macro to take option parameter
  security: select correct default LSM_MMAP_MIN_ADDR on arm on arm64
  arm64: compat: Wire up new AArch32 syscalls
  arm64: vdso: update wtm fields for CLOCK_MONOTONIC_COARSE
  arm64: vdso: fix coarse clock handling
  arm64: simplify pgd_alloc
  arm64: fix typo: s/SERRROR/SERROR/
  arm64: Invalidate the TLB when replacing pmd entries during boot
  arm64: Align CMA sizes to PAGE_SIZE
  arm64: add DSB after icache flush in __flush_icache_all()
  arm64: vdso: prevent ld from aligning PT_LOAD segments to 64k

42be3f35

Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · d94d0e27

Linus Torvalds authored Feb 07, 2014

Pull MIPS updates from Ralf Baechle:
 "hree minor patches.  All have sat in -next for a few days"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: fpu.h: Fix build when CONFIG_BUG is not set
  MIPS: Wire up sched_setattr/sched_getattr syscalls
  MIPS: Alchemy: Fix DB1100 GPIO registration

d94d0e27

Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 3e382dd9

Linus Torvalds authored Feb 07, 2014

Pull media fixes from Mauro Carvalho Chehab:
 "A series of small fixes.  Mostly driver ones.  There is one core
  regression fix on a patch that was meant to fix some race issues on
  vb2, but that actually caused more harm than good.  So, we're just
  reverting it for now"

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  [media] adv7842: Composite free-run platfrom-data fix
  [media] v4l2-dv-timings: fix GTF calculation
  [media] hdpvr: Fix memory leak in debug
  [media] af9035: add ID [2040:f900] Hauppauge WinTV-MiniStick 2
  [media] mxl111sf: Fix compile when CONFIG_DVB_USB_MXL111SF is unset
  [media] mxl111sf: Fix unintentional garbage stack read
  [media] cx24117: use a valid dev pointer for dev_err printout
  [media] cx24117: remove dead code in always 'false' if statement
  [media] update Michael Krufky's email address
  [media] vb2: Check if there are buffers before streamon
  [media] Revert "[media] videobuf_vm_{open,close} race fixes"
  [media] go7007-loader: fix usb_dev leak
  [media] media: bt8xx: add missing put_device call
  [media] exynos4-is: Compile in fimc-lite runtime PM callbacks conditionally
  [media] exynos4-is: Compile in fimc runtime PM callbacks conditionally
  [media] exynos4-is: Fix error paths in probe() for !pm_runtime_enabled()
  [media] s5p-jpeg: Fix wrong NV12 format parameters
  [media] s5k5baf: allow to handle arbitrary long i2c sequences

3e382dd9

Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging · 2091f435

Linus Torvalds authored Feb 07, 2014

Pull hwmon fixes from Guenter Roeck:
 "Fix PMBus driver problem with some multi-page voltage sensors and fix
  da9055 interrupt initialization"

* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (da9055) Remove use of regmap_irq_get_virq()
  hwmon: (pmbus) Support per-page exponent in linear mode

2091f435

Merge tag 'pm+acpi-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 22446d3f

Linus Torvalds authored Feb 07, 2014

Pull ACPI and power management fixes from Rafael Wysocki:
 "These include a fix for a recent ACPI hotplug regression, four
  concurrency related fixes and one PCI device removal fix for
  ACPI-based PCI hotplug (ACPIPHP), intel_pstate fix that should go into
  stable, three simple ACPI cleanups and a new entry for the ACPI video
  blacklist.

  Specifics:

   - Fix for a recent ACPI hotplug regression causing a NULL pointer
     dereference to occur while handling ACPI eject notifications for
     already ejected devices.  From Toshi Kani.

   - Four concurrency-related fixes for ACPIPHP.  Two of them add
     missing locking and the other two fix race conditions related to
     reference counting.

   - ACPIPHP fix to avoid NULL pointer dereferences during device
     removal involving Virtual Funcions.

   - intel_pstate fix to make it compute the percentage of time the CPU
     is busy properly.  From Dirk Brandewie.

   - Removal of two unnecessary NULL pointer checks in ACPI code and a
     fix for sscanf() format string from Dan Carpenter and Luis G.F.

   - New ACPI video blacklist entry for HP EliteBook Revolve 810 from
     Mika Westerberg"

* tag 'pm+acpi-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / hotplug: Fix panic on eject to ejected device
  ACPI / battery: Fix incorrect sscanf() string in acpi_battery_init_alarm()
  ACPI / proc: remove unneeded NULL check
  ACPI / utils: remove a pointless NULL check
  ACPI / video: Add HP EliteBook Revolve 810 to the blacklist
  intel_pstate: Take core C0 time into account for core busy calculation
  ACPI / hotplug / PCI: Fix bridge removal race vs dock events
  ACPI / hotplug / PCI: Fix bridge removal race in handle_hotplug_event()
  ACPI / hotplug / PCI: Scan root bus under the PCI rescan-remove lock
  ACPI / hotplug / PCI: Move PCI rescan-remove locking to hotplug_event()
  ACPI / hotplug / PCI: Remove entries from bus->devices in reverse order

22446d3f

libceph: do not dereference a NULL bio pointer · 0ec1d15e

Ilya Dryomov authored Feb 05, 2014

Commit f38a5181 ("ceph: Convert to immutable biovecs") introduced
a NULL pointer dereference, which broke rbd in -rc1.  Fix it.

Cc: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

0ec1d15e

Merge tag 'efi-urgent' into x86/urgent · a3b072cd

H. Peter Anvin authored Feb 07, 2014

 * Avoid WARN_ON() when mapping BGRT on Baytrail (EFI 32-bit).
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

a3b072cd

libceph: take map_sem for read in handle_reply() · ff513ace

Ilya Dryomov authored Feb 03, 2014

Handling redirect replies requires both map_sem and request_mutex.
Taking map_sem unconditionally near the top of handle_reply() avoids
possible race conditions that arise from releasing request_mutex to be
able to acquire map_sem in redirect reply case.  (Lock ordering is:
map_sem, request_mutex, crush_mutex.)
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

ff513ace

libceph: factor out logic from ceph_osdc_start_request() · 0bbfdfe8

Ilya Dryomov authored Jan 31, 2014

Factor out logic from ceph_osdc_start_request() into a new helper,
__ceph_osdc_start_request().  ceph_osdc_start_request() now amounts to
taking locks and calling __ceph_osdc_start_request().
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

0bbfdfe8

arm64: defconfig: Expand default enabled features · 55834a77

Mark Rutland authored Feb 07, 2014

FPGA implementations of the Cortex-A57 and Cortex-A53 are now available
in the form of the SMM-A57 and SMM-A53 Soft Macrocell Models (SMMs) for
Versatile Express. As these attach to a Motherboard Express V2M-P1 it
would be useful to have support for some V2M-P1 peripherals enabled by
default.

Additionally a couple of of features have been introduced since the last
defconfig update (CMA, jump labels) that would be good to have enabled
by default to ensure they are build and boot tested.

This patch updates the arm64 defconfig to enable support for these
devices and features. The arm64 Kconfig is modified to select
HAVE_PATA_PLATFORM, which is required to enable support for the
CompactFlash controller on the V2M-P1.

A few options which don't need to appear in defconfig are trimmed:

* BLK_DEV - selected by default
* EXPERIMENTAL - otherwise gone from the kernel
* MII - selected by drivers which require it
* USB_SUPPORT - selected by default
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

55834a77

arm64: asm: remove redundant "cc" clobbers · 95c41896

Will Deacon authored Feb 04, 2014

cbnz/tbnz don't update the condition flags, so remove the "cc" clobbers
from inline asm blocks that only use these instructions to implement
conditional branches.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

95c41896

arm64: atomics: fix use of acquire + release for full barrier semantics · 8e86f0b4

Will Deacon authored Feb 04, 2014

Linux requires a number of atomic operations to provide full barrier
semantics, that is no memory accesses after the operation can be
observed before any accesses up to and including the operation in
program order.

On arm64, these operations have been incorrectly implemented as follows:

	// A, B, C are independent memory locations

	<Access [A]>

	// atomic_op (B)
1:	ldaxr	x0, [B]		// Exclusive load with acquire
	<op(B)>
	stlxr	w1, x0, [B]	// Exclusive store with release
	cbnz	w1, 1b

	<Access [C]>

The assumption here being that two half barriers are equivalent to a
full barrier, so the only permitted ordering would be A -> B -> C
(where B is the atomic operation involving both a load and a store).

Unfortunately, this is not the case by the letter of the architecture
and, in fact, the accesses to A and C are permitted to pass their
nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
store-release on B). This is a clear violation of the full barrier
requirement.

The simple way to fix this is to implement the same algorithm as ARMv7
using explicit barriers:

	<Access [A]>

	// atomic_op (B)
	dmb	ish		// Full barrier
1:	ldxr	x0, [B]		// Exclusive load
	<op(B)>
	stxr	w1, x0, [B]	// Exclusive store
	cbnz	w1, 1b
	dmb	ish		// Full barrier

	<Access [C]>

but this has the undesirable effect of introducing *two* full barrier
instructions. A better approach is actually the following, non-intuitive
sequence:

	<Access [A]>

	// atomic_op (B)
1:	ldxr	x0, [B]		// Exclusive load
	<op(B)>
	stlxr	w1, x0, [B]	// Exclusive store with release
	cbnz	w1, 1b
	dmb	ish		// Full barrier

	<Access [C]>

The simple observations here are:

  - The dmb ensures that no subsequent accesses (e.g. the access to C)
    can enter or pass the atomic sequence.

  - The dmb also ensures that no prior accesses (e.g. the access to A)
    can pass the atomic sequence.

  - Therefore, no prior access can pass a subsequent access, or
    vice-versa (i.e. A is strictly ordered before C).

  - The stlxr ensures that no prior access can pass the store component
    of the atomic operation.

The only tricky part remaining is the ordering between the ldxr and the
access to A, since the absence of the first dmb means that we're now
permitting re-ordering between the ldxr and any prior accesses.

From an (arbitrary) observer's point of view, there are two scenarios:

  1. We have observed the ldxr. This means that if we perform a store to
     [B], the ldxr will still return older data. If we can observe the
     ldxr, then we can potentially observe the permitted re-ordering
     with the access to A, which is clearly an issue when compared to
     the dmb variant of the code. Thankfully, the exclusive monitor will
     save us here since it will be cleared as a result of the store and
     the ldxr will retry. Notice that any use of a later memory
     observation to imply observation of the ldxr will also imply
     observation of the access to A, since the stlxr/dmb ensure strict
     ordering.

  2. We have not observed the ldxr. This means we can perform a store
     and influence the later ldxr. However, that doesn't actually tell
     us anything about the access to [A], so we've not lost anything
     here either when compared to the dmb variant.

This patch implements this solution for our barriered atomic operations,
ensuring that we satisfy the full barrier requirements where they are
needed.

Cc: <stable@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

8e86f0b4

hwmon: (da9055) Remove use of regmap_irq_get_virq() · 4f545a4b

Adam Thomson authored Feb 06, 2014

Remove use of regmap_irq_get_virq() in driver probe which was
conflicting with use of platform_get_irq_byname().
platform_get_irq_byname() already returns the VIRQ number due
to MFD core translation so using regmap_irq_get_virq() on that
returned value results in an incorrect IRQ being requested.
The driver probes then fail because of this.
Signed-off-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

4f545a4b

06 Feb, 2014 15 commits

Merge branches 'acpi-cleanup' and 'acpi-video' · a2ff34c4

Rafael J. Wysocki authored Feb 06, 2014

* acpi-cleanup:
  ACPI / battery: Fix incorrect sscanf() string in acpi_battery_init_alarm()
  ACPI / proc: remove unneeded NULL check
  ACPI / utils: remove a pointless NULL check

* acpi-video:
  ACPI / video: Add HP EliteBook Revolve 810 to the blacklist

a2ff34c4

Merge branch 'pm-cpufreq' · 7fd90506

Rafael J. Wysocki authored Feb 06, 2014

* pm-cpufreq:
  intel_pstate: Take core C0 time into account for core busy calculation

7fd90506

Merge branches 'acpi-pci-hotplug' and 'acpi-hotplug' · 93e73711

Rafael J. Wysocki authored Feb 06, 2014

* acpi-pci-hotplug:
  ACPI / hotplug / PCI: Fix bridge removal race vs dock events
  ACPI / hotplug / PCI: Fix bridge removal race in handle_hotplug_event()
  ACPI / hotplug / PCI: Scan root bus under the PCI rescan-remove lock
  ACPI / hotplug / PCI: Move PCI rescan-remove locking to hotplug_event()
  ACPI / hotplug / PCI: Remove entries from bus->devices in reverse order

* acpi-hotplug:
  ACPI / hotplug: Fix panic on eject to ejected device

93e73711

Merge branch 'akpm' (patches from Andrew Morton) · 9343224b

Linus Torvalds authored Feb 06, 2014

Merge a bunch of fixes from Andrew Morton:
 "Commit 579f8290 ("swap: add a simple detector for inappropriate
  swapin readahead") is a feature.  No probs if you decide to defer it
  until the next merge window.

  It has been sitting in my tree for over a year because of my dislike
  of all the magic numbers, but recent discussion with Hugh has made me
  give up"

* emailed patches fron Andrew Morton <akpm@linux-foundation.org>:
  mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irq
  arch/x86/mm/numa.c: fix array index overflow when synchronizing nid to memblock.reserved.
  arch/x86/mm/numa.c: initialize numa_kernel_nodes in numa_clear_kernel_node_hotplug()
  mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq()
  mm/swap: fix race on swap_info reuse between swapoff and swapon
  swap: add a simple detector for inappropriate swapin readahead
  ocfs2: free allocated clusters if error occurs after ocfs2_claim_clusters
  Documentation/kernel-parameters.txt: fix memmap= language

9343224b

mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irq · 227d53b3

KOSAKI Motohiro authored Feb 06, 2014

To use spin_{un}lock_irq is dangerous if caller disabled interrupt.
During aio buffer migration, we have a possibility to see the following
call stack.

aio_migratepage  [disable interrupt]
  migrate_page_copy
    clear_page_dirty_for_io
      set_page_dirty
        __set_page_dirty_buffers
          __set_page_dirty
            spin_lock_irq

This mean, current aio migration is a deadlockable.  spin_lock_irqsave
is a safer alternative and we should use it.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reported-by: David Rientjes rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

227d53b3

arch/x86/mm/numa.c: fix array index overflow when synchronizing nid to memblock.reserved. · 7bc35fdd

Tang Chen authored Feb 06, 2014

The following path will cause array out of bound.

memblock_add_region() will always set nid in memblock.reserved to
MAX_NUMNODES.  In numa_register_memblks(), after we set all nid to
correct valus in memblock.reserved, we called setup_node_data(), and
used memblock_alloc_nid() to allocate memory, with nid set to
MAX_NUMNODES.

The nodemask_t type can be seen as a bit array.  And the index is 0 ~
MAX_NUMNODES-1.

After that, when we call node_set() in numa_clear_kernel_node_hotplug(),
the nodemask_t got an index of value MAX_NUMNODES, which is out of [0 ~
MAX_NUMNODES-1].

See below:

numa_init()
 |---> numa_register_memblks()
 |      |---> memblock_set_node(memory)		set correct nid in memblock.memory
 |      |---> memblock_set_node(reserved)	set correct nid in memblock.reserved
 |      |......
 |      |---> setup_node_data()
 |             |---> memblock_alloc_nid()	here, nid is set to MAX_NUMNODES (1024)
 |......
 |---> numa_clear_kernel_node_hotplug()
        |---> node_set()			here, we have an index 1024, and overflowed

This patch moves nid setting to numa_clear_kernel_node_hotplug() to fix
this problem.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reported-by: Dave Jones <davej@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Tested-by: Dave Jones <davej@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7bc35fdd

arch/x86/mm/numa.c: initialize numa_kernel_nodes in numa_clear_kernel_node_hotplug() · 017c217a

Tang Chen authored Feb 06, 2014

On-stack variable numa_kernel_nodes in numa_clear_kernel_node_hotplug()
was not initialized.  So we need to initialize it.

[akpm@linux-foundation.org: use NODE_MASK_NONE, per David]
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: David Rientjes <rientjes@google.com>
Tested-by: Dave Jones <davej@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

017c217a

mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq() · a85d9df1

KOSAKI Motohiro authored Feb 06, 2014

During aio stress test, we observed the following lockdep warning.  This
mean AIO+numa_balancing is currently deadlockable.

The problem is, aio_migratepage disable interrupt, but
__set_page_dirty_nobuffers unintentionally enable it again.

Generally, all helper function should use spin_lock_irqsave() instead of
spin_lock_irq() because they don't know caller at all.

   other info that might help us debug this:
    Possible unsafe locking scenario:

          CPU0
          ----
     lock(&(&ctx->completion_lock)->rlock);
     <Interrupt>
       lock(&(&ctx->completion_lock)->rlock);

    *** DEADLOCK ***

      dump_stack+0x19/0x1b
      print_usage_bug+0x1f7/0x208
      mark_lock+0x21d/0x2a0
      mark_held_locks+0xb9/0x140
      trace_hardirqs_on_caller+0x105/0x1d0
      trace_hardirqs_on+0xd/0x10
      _raw_spin_unlock_irq+0x2c/0x50
      __set_page_dirty_nobuffers+0x8c/0xf0
      migrate_page_copy+0x434/0x540
      aio_migratepage+0xb1/0x140
      move_to_new_page+0x7d/0x230
      migrate_pages+0x5e5/0x700
      migrate_misplaced_page+0xbc/0xf0
      do_numa_page+0x102/0x190
      handle_pte_fault+0x241/0x970
      handle_mm_fault+0x265/0x370
      __do_page_fault+0x172/0x5a0
      do_page_fault+0x1a/0x70
      page_fault+0x28/0x30
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a85d9df1

mm/swap: fix race on swap_info reuse between swapoff and swapon · f893ab41

Weijie Yang authored Feb 06, 2014

swapoff clear swap_info's SWP_USED flag prematurely and free its
resources after that.  A concurrent swapon will reuse this swap_info
while its previous resources are not cleared completely.

These late freed resources are:
 - p->percpu_cluster
 - swap_cgroup_ctrl[type]
 - block_device setting
 - inode->i_flags &= ~S_SWAPFILE

This patch clears the SWP_USED flag after all its resources are freed,
so that swapon can reuse this swap_info by alloc_swap_info() safely.

[akpm@linux-foundation.org: tidy up code comment]
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f893ab41

swap: add a simple detector for inappropriate swapin readahead · 579f8290

Shaohua Li authored Feb 06, 2014

This is a patch to improve swap readahead algorithm.  It's from Hugh and
I slightly changed it.

Hugh's original changelog:

swapin readahead does a blind readahead, whether or not the swapin is
sequential.  This may be ok on harddisk, because large reads have
relatively small costs, and if the readahead pages are unneeded they can
be reclaimed easily - though, what if their allocation forced reclaim of
useful pages? But on SSD devices large reads are more expensive than
small ones: if the readahead pages are unneeded, reading them in caused
significant overhead.

This patch adds very simplistic random read detection.  Stealing the
PageReadahead technique from Konstantin Khlebnikov's patch, avoiding the
vma/anon_vma sophistications of Shaohua Li's patch, swapin_nr_pages()
simply looks at readahead's current success rate, and narrows or widens
its readahead window accordingly.  There is little science to its
heuristic: it's about as stupid as can be whilst remaining effective.

The table below shows elapsed times (in centiseconds) when running a
single repetitive swapping load across a 1000MB mapping in 900MB ram
with 1GB swap (the harddisk tests had taken painfully too long when I
used mem=500M, but SSD shows similar results for that).

Vanilla is the 3.6-rc7 kernel on which I started; Shaohua denotes his
Sep 3 patch in mmotm and linux-next; HughOld denotes my Oct 1 patch
which Shaohua showed to be defective; HughNew this Nov 14 patch, with
page_cluster as usual at default of 3 (8-page reads); HughPC4 this same
patch with page_cluster 4 (16-page reads); HughPC0 with page_cluster 0
(1-page reads: no readahead).

HDD for swapping to harddisk, SSD for swapping to VertexII SSD.  Seq for
sequential access to the mapping, cycling five times around; Rand for
the same number of random touches.  Anon for a MAP_PRIVATE anon mapping;
Shmem for a MAP_SHARED anon mapping, equivalent to tmpfs.

One weakness of Shaohua's vma/anon_vma approach was that it did not
optimize Shmem: seen below.  Konstantin's approach was perhaps mistuned,
50% slower on Seq: did not compete and is not shown below.

HDD        Vanilla Shaohua HughOld HughNew HughPC4 HughPC0
Seq Anon     73921   76210   75611   76904   78191  121542
Seq Shmem    73601   73176   73855   72947   74543  118322
Rand Anon   895392  831243  871569  845197  846496  841680
Rand Shmem 1058375 1053486  827935  764955  764376  756489

SSD        Vanilla Shaohua HughOld HughNew HughPC4 HughPC0
Seq Anon     24634   24198   24673   25107   21614   70018
Seq Shmem    24959   24932   25052   25703   22030   69678
Rand Anon    43014   26146   28075   25989   26935   25901
Rand Shmem   45349   45215   28249   24268   24138   24332

These tests are, of course, two extremes of a very simple case: under
heavier mixed loads I've not yet observed any consistent improvement or
degradation, and wider testing would be welcome.

Shaohua Li:

Test shows Vanilla is slightly better in sequential workload than Hugh's
patch.  I observed with Hugh's patch sometimes the readahead size is
shrinked too fast (from 8 to 1 immediately) in sequential workload if
there is no hit.  And in such case, continuing doing readahead is good
actually.

I don't prepare a sophisticated algorithm for the sequential workload
because so far we can't guarantee sequential accessed pages are swap out
sequentially.  So I slightly change Hugh's heuristic - don't shrink
readahead size too fast.

Here is my test result (unit second, 3 runs average):
	Vanilla		Hugh		New
Seq	356		370		360
Random	4525		2447		2444

Attached graph is the swapin/swapout throughput I collected with 'vmstat
2'.  The first part is running a random workload (till around 1200 of
the x-axis) and the second part is running a sequential workload.
swapin and swapout throughput are almost identical in steady state in
both workloads.  These are expected behavior.  while in Vanilla, swapin
is much bigger than swapout especially in random workload (because wrong
readahead).

Original patches by: Shaohua Li and Konstantin Khlebnikov.

[fengguang.wu@intel.com: swapin_nr_pages() can be static]
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

579f8290

ocfs2: free allocated clusters if error occurs after ocfs2_claim_clusters · fb951eb5

Zongxun Wang authored Feb 06, 2014

Even if using the same jbd2 handle, we cannot rollback a transaction.
So once some error occurs after successfully allocating clusters, the
allocated clusters will never be used and it means they are lost.  For
example, call ocfs2_claim_clusters successfully when expanding a file,
but failed in ocfs2_insert_extent.  So we need free the allocated
clusters if they are not used indeed.
Signed-off-by: Zongxun Wang <wangzongxun@huawei.com>
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fb951eb5

Documentation/kernel-parameters.txt: fix memmap= language · 277cba1d

Randy Dunlap authored Feb 06, 2014

Clean up descriptions of memmap= boot options.

Add periods (full stops), drop commas, change "used" to "reserved" or
"marked".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Andiry Xu <andiry.xu@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

277cba1d

Merge tag 'sound-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · f2de3a15

Linus Torvalds authored Feb 06, 2014

Pull sound fixes from Takashi Iwai:
 "A few HD-audio fixes and one USB-audio kconfig dependency fix.  All
  small and device-specific changes marked with Cc to stable"

* tag 'sound-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Improve loopback path lookups for AD1983
  ALSA: hda - Fix missing VREF setup for Mac Pro 1,1
  ALSA: hda - Add missing mixer widget for AD1983
  ALSA: hda/realtek - Avoid invalid COEFs for ALC271X
  ALSA: hda - Fix silent output on Toshiba Satellite L40
  ALSA: usb-audio: Add missing kconfig dependecy

f2de3a15

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 65f0505b

Linus Torvalds authored Feb 06, 2014

Pull drm fixes from Dave Airlie:
 "A few regression fixes already, one for my own stupidity, and mgag200
  typo fix, vmwgfx fixes and ttm regression fixes, and a radeon register
  checker update for older cards to handle geom shaders"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/radeon: allow geom rings to be setup on r600/r700 (v2)
  drm/mgag200,ast,cirrus: fix regression with drm_can_sleep conversion
  drm/ttm: Don't clear page metadata of imported sg pages
  drm/ttm: Fix TTM object open regression
  vmwgfx: Fix unitialized stack read in vmw_setup_otable_base
  drm/vmwgfx: Reemit context bindings when necessary v2
  drm/vmwgfx: Detect old user-space drivers and set up legacy emulation v2
  drm/vmwgfx: Emulate legacy shaders on guest-backed devices v2
  drm/vmwgfx: Fix legacy surface reference size copyback
  drm/vmwgfx: Fix SET_SHADER_CONST emulation on guest-backed devices
  drm/vmwgfx: Fix regression caused by "drm/ttm: make ttm reservation calls behave like reservation calls"
  drm/vmwgfx: Don't commit staged bindings if execbuf fails
  drm/mgag200: fix typo causing bw limits to be ignored on some chips

65f0505b

x86, microcode, AMD: Unify valid container checks · 75a1ba5b

Borislav Petkov authored Feb 03, 2014

For additional coverage, BorisO and friends unknowlingly did swap AMD
microcode with Intel microcode blobs in order to see what happens. What
did happen on 32-bit was

[    5.722656] BUG: unable to handle kernel paging request at be3a6008
[    5.722693] IP: [<c106d6b4>] load_microcode_amd+0x24/0x3f0
[    5.722716] *pdpt = 0000000000000000 *pde = 0000000000000000

because there was a valid initrd there but without valid microcode in it
and the container check happened *after* the relocated ramdisk handling
on 32-bit, which was clearly wrong.

While at it, take care of the ramdisk relocation on both 32- and 64-bit
as it is done on both. Also, comment what we're doing because this code
is a bit tricky.
Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391460104-7261-1-git-send-email-bp@alien8.deSigned-off-by: H. Peter Anvin <hpa@zytor.com>

75a1ba5b