Commits · 1292c129597ce42a75d9e97cd312c3242e10a6f3 · Kirill Smelkov / linux

08 Dec, 2011 13 commits

arm/dt: tegra: Fix SDHCI nodes to match board files · 1292c129

Stephen Warren authored Nov 21, 2011

Mark any SDHCI controllers that aren't registered by the board files as
disabled in the device-tree files.

In practice, these controllers:

* Have nothing hooked up to them at all, or
* For ports intended for SDIO usage, the drivers for anything that might
  be attached are not in the device-tree yet. If/when drivers appear, the
  SD/MMC port can be re-enabled.

The only possible exception is TrimSlice's mico SD slot, but that wasn't
enabled in the board files before anyway, and doesn't work when all the
SDHCI controllers are enabled anyway.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Olof Johansson <olof@lixom.net>

1292c129

arm/dt: tegra: Fix serial nodes to match board files · 31c1ec92

Stephen Warren authored Nov 21, 2011

Mark any serial ports that aren't registered by the board files as disabled
in the device-tree files.

In practice, none of the now-disabled ports ended up succeeding device
probing because of the missing clock-frequency property. However,
explicitly marking the devices disabled has the advantage of squashing
the dev_warn() the failed probe causes, and documenting that we intend
the port not to be used, rather than accidentally left out the property.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Olof Johansson <olof@lixom.net>

31c1ec92

arm/dt: tegra: Fix I2C nodes to match board files · 88950f3b

Stephen Warren authored Nov 21, 2011

With board files, all I2C busses run at 400KHz. Fix the device-tree
to be consistent with this. It's possible this is incorrect, but at
least it keeps the board files and device-tree consistent.

Also, disable any I2C controllers that the board files don't register,
also for consistency.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Olof Johansson <olof@lixom.net>

88950f3b

arm/dt: tegra: Remove /chosen node · 492f204d

Stephen Warren authored Nov 21, 2011

The command-lines present in the existing /chosen node are not necessarily
correct for all users. Ideally, we should simply use the command-line
supplied by the boot-loader.

In fact, using the boot-loader's cmdline is quite easy; either the
bootloader fully supports DT, in which case it can modify the DT passed
to the kernel to include its command-line, or CONFIG_APPENDED_DTB can
be used in conjunction with CONFIG_ARM_ATAG_DTB_COMPAT, and the kernel
will substitute the bootloader's command-line into the DT.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Olof Johansson <olof@lixom.net>

492f204d

arm/dt: tegra: Remove /memreserve/ from device-tree files · 5a854265

Stephen Warren authored Nov 21, 2011

There are no drivers in the kernel at present which can make use of the
memory reserved by /memreserve/, so there is no point reserving it. Remove
/memreserve/ to allow the user more memory. It's also unclear whether any
future driver would actually require /memreserve/, or allocate memory
through some other mechanism.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Olof Johansson <olof@lixom.net>

5a854265

arm/tegra: board-dt: Enable audio-related clocks · 586187e2

Stephen Warren authored Dec 07, 2011

Certain clocks are required for core audio functionality. Set up the
appropriate parenting relationships, and enable clocks that must be
on permanently.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Olof Johansson <olof@lixom.net>

586187e2

arm/tegra: board-dt: Fix AUXDATA typo · f110164e

Stephen Warren authored Dec 07, 2011

Fix the address of the I2S2 controller in the AUXDATA table.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Olof Johansson <olof@lixom.net>

f110164e

arm/dt: tegra: add dts file for paz00 · cc2afa43

Marc Dietrich authored Nov 01, 2011

This adds a dts file for paz00. As a side effect, this also enables
the embedded controller which controls the keyboard, touchpad, power,
leds, and some other functions.

Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Marc Dietrich <marvin24@gmx.de>
Acked-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Olof Johansson <olof@lixom.net>

cc2afa43

arm/tegra: Add device-tree support for TrimSlice board · a7db2c15

Stephen Warren authored Oct 25, 2011

* Add device-tree file for TrimSlice
* Add that to the list of .dts files to build
* Update board-dt.c to recognize TrimSlice board name

v2: Makefile: Add board-trimslice-pinmux.c to obj-$(CONFIG_MACH_TEGRA_DT).
v3: Makefile: Use brackets not braces around var names
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Olof Johansson <olof@lixom.net>

a7db2c15

arm/dt: tegra: Clean up I2S and DAS nodes · 64f88ec3

Stephen Warren authored Dec 07, 2011

The I2S and DAS nodes don't have children, and hence don't need to set
address/size cells.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Olof Johansson <olof@lixom.net>

64f88ec3

USB: ehci-tegra: add probing through device tree · 4a53f4e6

Olof Johansson authored Nov 04, 2011

Rely on platform_data being passed through auxdata for now; more elaborate
bindings for phy config and tunings to be added.

v2: moved vbus-gpio check to the helper function, added check for !of_node,
    added usb2 clock to board-dt table.
Signed-off-by: Olof Johansson <olof@lixom.net>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Grant Likely <grant.likely@secretlab.ca>

4a53f4e6

arm/dt: add basic usb nodes to tegra device trees · c27317c0

Olof Johansson authored Nov 04, 2011

For now they are a minimal binding. It needs to be amended with
vendor-specific settings for phy setup and link tuning, etc.

v2: Added bindings specification and phy_type properties
Signed-off-by: Olof Johansson <olof@lixom.net>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Stephen Warren <swarren@nvidia.com>

c27317c0

arm/tegra: fix variable formatting in makefile · 317d5330

Olof Johansson authored Nov 09, 2011

For some reason it started out using {} instead of (), and it's
proliferated from there. Switch back to ().
Signed-off-by: Olof Johansson <olof@lixom.net>
Acked-by: Colin Cross <ccross@android.com>

317d5330

01 Dec, 2011 10 commits

Linux 3.2-rc4 · 5611cc45
Linus Torvalds authored Dec 01, 2011

5611cc45

Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 · 0a4ebed7

Linus Torvalds authored Dec 01, 2011

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (31 commits)
  ocfs2: avoid unaligned access to dqc_bitmap
  ocfs2: Use filemap_write_and_wait() instead of write_inode_now()
  ocfs2: honor O_(D)SYNC flag in fallocate
  ocfs2: Add a missing journal credit in ocfs2_link_credits() -v2
  ocfs2: send correct UUID to cleancache initialization
  ocfs2: Commit transactions in error cases -v2
  ocfs2: make direntry invalid when deleting it
  fs/ocfs2/dlm/dlmlock.c: free kmem_cache_zalloc'd data using kmem_cache_free
  ocfs2: Avoid livelock in ocfs2_readpage()
  ocfs2: serialize unaligned aio
  ocfs2: Implement llseek()
  ocfs2: Fix ocfs2_page_mkwrite()
  ocfs2: Add comment about orphan scanning
  ocfs2: Clean up messages in the fs
  ocfs2/cluster: Cluster up now includes network connections too
  ocfs2/cluster: Add new function o2net_fill_node_map()
  ocfs2/cluster: Fix output in file elapsed_time_in_ms
  ocfs2/dlm: dlmlock_remote() needs to account for remastery
  ocfs2/dlm: Take inflight reference count for remotely mastered resources too
  ocfs2/dlm: Cleanup dlm_wait_for_node_death() and dlm_wait_for_node_recovery()
  ...

0a4ebed7

ocfs2: avoid unaligned access to dqc_bitmap · 93925579

Akinobu Mita authored Nov 15, 2011

The dqc_bitmap field of struct ocfs2_local_disk_chunk is 32-bit aligned,
but not 64-bit aligned.  The dqc_bitmap is accessed by ocfs2_set_bit(),
ocfs2_clear_bit(), ocfs2_test_bit(), or ocfs2_find_next_zero_bit().  These
are wrapper macros for ext2_*_bit() which need to take an unsigned long
aligned address (though some architectures are able to handle unaligned
address correctly)

So some 64bit architectures may not be able to access the dqc_bitmap
correctly.

This avoids such unaligned access by using another wrapper functions for
ext2_*_bit().  The code is taken from fs/ext4/mballoc.c which also need to
handle unaligned bitmap access.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Joel Becker <jlbec@evilplan.org>

93925579

Merge branch 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm · 3b120ab7

Linus Torvalds authored Dec 01, 2011

* 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm:
ARM: 7182/1: ARM cpu topology: fix warning
ARM: 7181/1: Restrict kprobes probing SWP instructions to ARMv5 and below
ARM: 7180/1: Change kprobes testcase with unpredictable STRD instruction
ARM: 7177/1: GIC: avoid skipping non-existent PPIs in irq_start calculation
ARM: 7176/1: cpu_pm: register GIC PM notifier only once
ARM: 7175/1: add subname parameter to mfp_set_groupg callers
ARM: 7174/1: Fix build error in kprobes test code on Thumb2 kernels
ARM: 7172/1: dma: Drop GFP_COMP for DMA memory allocations
ARM: 7171/1: unwind: add unwind directives to bitops assembly macros
ARM: 7170/2: fix compilation breakage in entry-armv.S
ARM: 7168/1: use cache type functions for arch_get_unmapped_area
ARM: perf: check that we have a platform device when reserving PMU
ARM: 7166/1: Use PMD_SHIFT instead of PGDIR_SHIFT in dma-consistent.c
ARM: 7165/2: PL330: Fix typo in _prepare_ccr()
ARM: 7163/2: PL330: Only register usable channels
ARM: 7162/1: errata: tidy up Kconfig options for PL310 errata workarounds
ARM: 7161/1: errata: no automatic store buffer drain
ARM: perf: initialise used_mask for fake PMU during validation
ARM: PMU: remove pmu_init declaration
ARM: PMU: re-export release_pmu symbol to modules

3b120ab7

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · b930c264

Linus Torvalds authored Dec 01, 2011

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix meta data raid-repair merge problem
  Btrfs: skip allocation attempt from empty cluster
  Btrfs: skip block groups without enough space for a cluster
  Btrfs: start search for new cluster at the beginning
  Btrfs: reset cluster's max_size when creating bitmap
  Btrfs: initialize new bitmaps' list
  Btrfs: fix oops when calling statfs on readonly device
  Btrfs: Don't error on resizing FS to same size
  Btrfs: fix deadlock on metadata reservation when evicting a inode
  Fix URL of btrfs-progs git repository in docs
  btrfs scrub: handle -ENOMEM from init_ipath()

b930c264

Btrfs: fix meta data raid-repair merge problem · f4a8e656

Jan Schmidt authored Dec 01, 2011

Commit 4a54c8c1 introduced raid-repair, killing the individual
readpage_io_failed_hook entries from inode.c and disk-io.c. Commit
4bb31e92 introduced new readahead code, adding a readpage_io_failed_hook to
disk-io.c.

The raid-repair commit had logic to disable raid-repair, if
readpage_io_failed_hook is set. Thus, the readahead commit effectively
disabled raid-repair for meta data.

This commit changes the logic to always attempt raid-repair when needed and
call the readpage_io_failed_hook in case raid-repair fails. This is much
more straight forward and should have been like that from the beginning.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Reported-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

f4a8e656

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband · 11d814a2

Linus Torvalds authored Nov 30, 2011

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB: Fix RCU lockdep splats
  IB/ipoib: Prevent hung task or softlockup processing multicast response
  IB/qib: Fix over-scheduling of QSFP work
  RDMA/cxgb4: Fix retry with MPAv1 logic for MPAv2
  RDMA/cxgb4: Fix iw_cxgb4 count_rcqes() logic
  IB/qib: Don't use schedule_work()

11d814a2

Merge branch 'dt-for-linus' of git://sources.calxeda.com/kernel/linux · c290b2f2

Linus Torvalds authored Nov 30, 2011

* 'dt-for-linus' of git://sources.calxeda.com/kernel/linux:
  of: Add Silicon Image vendor prefix
  of/irq: of_irq_init: add check for parent equal to child node

c290b2f2

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator · d6e92d36

Linus Torvalds authored Nov 30, 2011

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: twl: fix twl4030 support for smps regulators
  regulator: fix use after free bug
  regulator: aat2870: Fix the logic of checking if no id is matched in aat2870_get_regulator

d6e92d36

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · cd5b49bc

Linus Torvalds authored Nov 30, 2011

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (45 commits)
  ARM: ux500: update defconfig
  ARM: u300: update defconfig
  ARM: at91: enable additional boards in existing soc defconfig files
  ARM: at91: refresh soc defconfig files for 3.2
  ARM: at91: rename defconfig files appropriately
  ARM: OMAP2+: Fix Compilation error when omap_l3_noc built as module
  ARM: OMAP2+: Remove empty io.h
  ARM: OMAP2: select ARM_AMBA if OMAP3_EMU is defined
  ARM: OMAP: smartreflex: fix IRQ handling bug
  ARM: OMAP: PM: only register TWL with voltage layer when device is present
  ARM: OMAP: hwmod: Fix the addr space, irq, dma count APIs
  arm: mx28: fix bit operation in clock setting
  ARM: imx: export imx_ioremap
  ARM: imx/mm-imx3: conditionally compile i.MX31 and i.MX35 code
  ARM: mx5: Fix checkpatch warnings in cpu-imx5.c
  MAINTAINERS: Add missing directory
  ARM: imx: drop 'ARCH_MX31' and 'ARCH_MX35'
  ARM: imx6q: move clock register map to machine_desc.map_io
  ARM: pxa168/gplugd: add the correct SSP device
  ARM: Update mach-types to fix mxs build breakage
  ...

cd5b49bc

30 Nov, 2011 14 commits

ARM: 7182/1: ARM cpu topology: fix warning · 4cbd6b16

Vincent Guittot authored Nov 29, 2011

kernel/sched.c:7354:2: warning: initialization from incompatible pointer type

Align cpu_coregroup_mask prototype interface with sched_domain_mask_f typedef
use int cpu instead of unsigned int cpu

Cc: <stable@vger.kernel.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

4cbd6b16

ARM: 7181/1: Restrict kprobes probing SWP instructions to ARMv5 and below · b5bed7fe

Jon Medhurst (Tixy) authored Nov 29, 2011

The SWP instruction is deprecated on ARMv6 and with ARMv7 it will be
UNDEFINED when CONFIG_SWP_EMULATE is selected. In this case, probing a
SWP instruction will cause an oops when the kprobes emulation code
executes an undefined instruction.

As the SWP instruction should be rare or non-existent in kernels for
ARMv6 and later, we can simply avoid these problems by not allowing
probing of these.
Reported-by: Leif Lindholm <leif.lindholm@arm.com>
Tested-by: Leif Lindholm <leif.lindholm@arm.com>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

b5bed7fe

ARM: 7180/1: Change kprobes testcase with unpredictable STRD instruction · 14383c29

Jon Medhurst (Tixy) authored Nov 29, 2011

There is a kprobes testcase for the instruction "strd r2, [r3], r4".
This has unpredictable behaviour as it uses r3 for register writeback
addressing and also stores it to memory.

On a cortex A9, this testcase would fail because the instruction writes
the updated value of r3 to memory, whereas the kprobes emulation code
writes the original value.

Fix this by changing testcase to used r5 instead of r3.
Reported-by: Leif Lindholm <leif.lindholm@arm.com>
Tested-by: Leif Lindholm <leif.lindholm@arm.com>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

14383c29

Btrfs: skip allocation attempt from empty cluster · be064d11

Alexandre Oliva authored Nov 30, 2011

If we don't have a cluster, don't bother trying to allocate from it,
jumping right away to the attempt to allocate a new cluster.
Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

be064d11

Btrfs: skip block groups without enough space for a cluster · 425d8315

Alexandre Oliva authored Nov 30, 2011

We test whether a block group has enough free space to hold the
requested block, but when we're doing clustered allocation, we can
save some cycles by testing whether it has enough room for the cluster
upfront, otherwise we end up attempting to set up a cluster and
failing. Only in the NO_EMPTY_SIZE loop do we attempt an unclustered
allocation, and by then we'll have zeroed the cluster size, so this
patch won't stop us from using the block group as a last resort.
Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

425d8315

Btrfs: start search for new cluster at the beginning · 1b22bad7

Alexandre Oliva authored Nov 30, 2011

Instead of starting at zero (offset is always zero), request a cluster
starting at search_start, that denotes the beginning of the current
block group.
Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

1b22bad7

Btrfs: reset cluster's max_size when creating bitmap · b78d09bc

Alexandre Oliva authored Nov 30, 2011

The field that indicates the size of the largest contiguous chunk of
free space in the cluster is not initialized when setting up bitmaps,
it's only increased when we find a larger contiguous chunk.  We end up
retaining a larger value than appropriate for highly-fragmented
clusters, which may cause pointless searches for large contiguous
groups, and even cause clusters that do not meet the density
requirements to be set up.
Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

b78d09bc

Btrfs: initialize new bitmaps' list · f2d0f676

Alexandre Oliva authored Nov 28, 2011

We're failing to create clusters with bitmaps because
setup_cluster_no_bitmap checks that the list is empty before inserting
the bitmap entry in the list for setup_cluster_bitmap, but the list
field is only initialized when it is restored from the on-disk free
space cache, or when it is written out to disk.

Besides a potential race condition due to the multiple use of the list
field, filesystem performance severely degrades over time: as we use
up all non-bitmap free extents, the try-to-set-up-cluster dance is
done at every metadata block allocation. For every block group, we
fail to set up a cluster, and after failing on them all up to twice,
we fall back to the much slower unclustered allocation.

To make matters worse, before the unclustered allocation, we try to
create new block groups until we reach the 1% threshold, which
introduces additional bitmaps and thus block groups that we'll iterate
over at each metadata block request.

f2d0f676

Btrfs: fix oops when calling statfs on readonly device · b772a86e

Li Zefan authored Nov 28, 2011

To reproduce this bug:

  # dd if=/dev/zero of=img bs=1M count=256
  # mkfs.btrfs img
  # losetup -r /dev/loop1 img
  # mount /dev/loop1 /mnt
  OOPS!!

It triggered BUG_ON(!nr_devices) in btrfs_calc_avail_data_space().

To fix this, instead of checking write-only devices, we check all open
deivces:

  # df -h /dev/loop1
  Filesystem            Size  Used Avail Use% Mounted on
  /dev/loop1            250M   28K  238M   1% /mnt
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

b772a86e

Btrfs: Don't error on resizing FS to same size · ece7d20e

Mike Fleetwood authored Nov 18, 2011

It seems overly harsh to fail a resize of a btrfs file system to the
same size when a shrink or grow would succeed. User app GParted trips
over this error. Allow it by bypassing the shrink or grow operation.
Signed-off-by: Mike Fleetwood <mike.fleetwood@googlemail.com>

ece7d20e

Btrfs: fix deadlock on metadata reservation when evicting a inode · aa38a711

Miao Xie authored Nov 18, 2011

When I ran the xfstests, I found the test tasks was blocked on meta-data
reservation.

By debugging, I found the reason of this bug:
   start transaction
        |
	v
   reserve meta-data space
	|
	v
   flush delay allocation -> iput inode -> evict inode
	^					|
	|					v
   wait for delay allocation flush <- reserve meta-data space

And besides that, the flush on evicting inode will block the thread, which
is reclaiming the memory, and make oom happen easily.

Fix this bug by skipping the flush step when evicting inode.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>

aa38a711

Fix URL of btrfs-progs git repository in docs · b52f75a5

Arnd Hannemann authored Nov 16, 2011

	The location of the btrfs-progs repository has been changed.
	This patch updates the documentation accordingly.
Signed-off-by: Arnd Hannemann <arnd@arndnet.de>

b52f75a5

btrfs scrub: handle -ENOMEM from init_ipath() · 26bdef54

Dan Carpenter authored Nov 16, 2011

init_ipath() can return an ERR_PTR(-ENOMEM).
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

26bdef54

Merge branches 'cxgb4', 'ipoib', 'misc' and 'qib' into for-next · a493f1a2
Roland Dreier authored Nov 29, 2011

a493f1a2

29 Nov, 2011 3 commits

Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 8cd79203

Linus Torvalds authored Nov 29, 2011

* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: Update comments describing device power management callbacks
  PM / Sleep: Update documentation related to system wakeup
  PM / Runtime: Make documentation follow the new behavior of irq_safe
  PM / Sleep: Correct inaccurate information in devices.txt
  PM / Domains: Document how PM domains are used by the PM core
  PM / Hibernate: Do not leak memory in error/test code paths

8cd79203

IB: Fix RCU lockdep splats · 580da35a

Eric Dumazet authored Nov 29, 2011

Commit f2c31e32 ("net: fix NULL dereferences in check_peer_redir()")
forgot to take care of infiniband uses of dst neighbours.

Many thanks to Marc Aurele who provided a nice bug report and feedback.
Reported-by: Marc Aurele La France <tsi@ualberta.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>

580da35a

IB/ipoib: Prevent hung task or softlockup processing multicast response · 3874397c

Mike Marciniszyn authored Nov 21, 2011

This following can occur with ipoib when processing a multicast reponse:

    BUG: soft lockup - CPU#0 stuck for 67s! [ib_mad1:982]
    Modules linked in: ...
    CPU 0:
    Modules linked in: ...
    Pid: 982, comm: ib_mad1 Not tainted 2.6.32-131.0.15.el6.x86_64 #1 ProLiant DL160 G5
    RIP: 0010:[<ffffffff814ddb27>]  [<ffffffff814ddb27>] _spin_unlock_irqrestore+0x17/0x20
    RSP: 0018:ffff8802119ed860  EFLAGS: 00000246
    0000000000000004 RBX: ffff8802119ed860 RCX: 000000000000a299
    RDX: ffff88021086c700 RSI: 0000000000000246 RDI: 0000000000000246
    RBP: ffffffff8100bc8e R08: ffff880210ac229c R09: 0000000000000000
    R10: ffff88021278aab8 R11: 0000000000000000 R12: ffff8802119ed860
    R13: ffffffff8100be6e R14: 0000000000000001 R15: 0000000000000003
    FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    CR2: 00000000006d4840 CR3: 0000000209aa5000 CR4: 00000000000406f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Call Trace:
    [<ffffffffa032c247>] ? ipoib_mcast_send+0x157/0x480 [ib_ipoib]
    [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
    [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
    [<ffffffffa03283d4>] ? ipoib_path_lookup+0x124/0x2d0 [ib_ipoib]
    [<ffffffffa03286fc>] ? ipoib_start_xmit+0x17c/0x430 [ib_ipoib]
    [<ffffffff8141e758>] ? dev_hard_start_xmit+0x2c8/0x3f0
    [<ffffffff81439d0a>] ? sch_direct_xmit+0x15a/0x1c0
    [<ffffffff81423098>] ? dev_queue_xmit+0x388/0x4d0
    [<ffffffffa032d6b7>] ? ipoib_mcast_join_finish+0x2c7/0x510 [ib_ipoib]
    [<ffffffffa032dab8>] ? ipoib_mcast_sendonly_join_complete+0x1b8/0x1f0 [ib_ipoib]
    [<ffffffffa02a0946>] ? mcast_work_handler+0x1a6/0x710 [ib_sa]
    [<ffffffffa015f01e>] ? ib_send_mad+0xfe/0x3c0 [ib_mad]
    [<ffffffffa00f6c93>] ? ib_get_cached_lmc+0xa3/0xb0 [ib_core]
    [<ffffffffa02a0f9b>] ? join_handler+0xeb/0x200 [ib_sa]
    [<ffffffffa029e4fc>] ? ib_sa_mcmember_rec_callback+0x5c/0xa0 [ib_sa]
    [<ffffffffa029e79c>] ? recv_handler+0x3c/0x70 [ib_sa]
    [<ffffffffa01603a4>] ? ib_mad_completion_handler+0x844/0x9d0 [ib_mad]
    [<ffffffffa015fb60>] ? ib_mad_completion_handler+0x0/0x9d0 [ib_mad]
    [<ffffffff81088830>] ? worker_thread+0x170/0x2a0
    [<ffffffff8108e160>] ? autoremove_wake_function+0x0/0x40
    [<ffffffff810886c0>] ? worker_thread+0x0/0x2a0
    [<ffffffff8108ddf6>] ? kthread+0x96/0xa0
    [<ffffffff8100c1ca>] ? child_rip+0xa/0x20

Coinciding with stack trace is the following message:

    ib0: ib_address_create failed

The code below in ipoib_mcast_join_finish() will note the above
failure in the address handle but otherwise continue:

                ah = ipoib_create_ah(dev, priv->pd, &av);
                if (!ah) {
                        ipoib_warn(priv, "ib_address_create failed\n");
                } else {

The while loop at the bottom of ipoib_mcast_join_finish() will attempt
to send queued multicast packets in mcast->pkt_queue and eventually
end up in ipoib_mcast_send():

        if (!mcast->ah) {
                if (skb_queue_len(&mcast->pkt_queue) < IPOIB_MAX_MCAST_QUEUE)
                        skb_queue_tail(&mcast->pkt_queue, skb);
                else {
                        ++dev->stats.tx_dropped;
                        dev_kfree_skb_any(skb);
                }

My read is that the code will requeue the packet and return to the
ipoib_mcast_join_finish() while loop and the stage is set for the
"hung" task diagnostic as the while loop never sees a non-NULL ah, and
will do nothing to resolve.

There are GFP_ATOMIC allocates in the provider routines, so this is
possible and should be dealt with.

The test that induced the failure is associated with a host SM on the
same server during a shutdown.

This patch causes ipoib_mcast_join_finish() to exit with an error
which will flush the queued mcast packets.  Nothing is done to unwind
the QP attached state so that subsequent sends from above will retry
the join.
Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
Reviewed-by: Gary Leshner <gary.leshner@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

3874397c