Commits · 31155dabf022ae9e239ca5321287f8128857a52d · nexedi / linux

19 Sep, 2012 10 commits

Fix 'Device not ready' issue on mpt2sas · 31155dab

James Bottomley authored Jul 25, 2012

commit 14216561 upstream.

This is a particularly nasty SCSI ATA Translation Layer (SATL) problem.

SAT-2 says (section 8.12.2)

        if the device is in the stopped state as the result of
        processing a START STOP UNIT command (see 9.11), then the SATL
        shall terminate the TEST UNIT READY command with CHECK CONDITION
        status with the sense key set to NOT READY and the additional
        sense code of LOGICAL UNIT NOT READY, INITIALIZING COMMAND
        REQUIRED;

mpt2sas internal SATL seems to implement this.  The result is very confusing
standby behaviour (using hdparm -y).  If you suspend a drive and then send
another command, usually it wakes up.  However, if the next command is a TEST
UNIT READY, the SATL sees that the drive is suspended and proceeds to follow
the SATL rules for this, returning NOT READY to all subsequent commands.  This
means that the ordering of TEST UNIT READY is crucial: if you send TUR and
then a command, you get a NOT READY to both back.  If you send a command and
then a TUR, you get GOOD status because the preceeding command woke the drive.

This bit us badly because

commit 85ef06d1
Author: Tejun Heo <tj@kernel.org>
Date:   Fri Jul 1 16:17:47 2011 +0200

    block: flush MEDIA_CHANGE from drivers on close(2)

Changed our ordering on TEST UNIT READY commands meaning that SATA drives
connected to an mpt2sas now suspend and refuse to wake (because the mpt2sas
SATL sees the suspend *before* the drives get awoken by the next ATA command)
resulting in lots of failed commands.

The standard is completely nuts forcing this inconsistent behaviour, but we
have to work around it.

The fix for this is twofold:

   1. Set the allow_restart flag so we wake the drive when we see it has been
      suspended

   2. Return all TEST UNIT READY status directly to the mid layer without any
      further error handling which prevents us causing error handling which
      may offline the device just because of a media check TUR.
Reported-by: Matthias Prager <linux@matthiasprager.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

31155dab

megaraid_sas: Move poll_aen_lock initializer · fb225ce3

Kashyap Desai authored Jul 17, 2012

commit bd8d6dd4 upstream.

The following patch moves the poll_aen_lock initializer from
megasas_probe_one() to megasas_init().  This prevents a crash when a user
loads the driver and tries to issue a poll() system call on the ioctl
interface with no adapters present.
Signed-off-by: Kashyap Desai <Kashyap.Desai@lsi.com>
Signed-off-by: Adam Radford <aradford@gmail.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

fb225ce3

mpt2sas: Fix for Driver oops, when loading driver with max_queue_depth command... · f64c045a

sreekanth.reddy@lsi.com authored Jul 17, 2012

mpt2sas: Fix for Driver oops, when loading driver with max_queue_depth command line option to a very small value

commit 338b131a upstream.

If the specified max_queue_depth setting is less than the
expected number of internal commands, then driver will calculate
the queue depth size to a negitive number. This negitive number
is actually a very large number because variable is unsigned
16bit integer. So, the driver will ask for a very large amount of
memory for message frames and resulting into oops as memory
allocation routines will not able to handle such a large request.

So, in order to limit this kind of oops, The driver need to set
the max_queue_depth to a scsi mid layer's can_queue value. Then
the overall message frames required for IO is minimum of either
(max_queue_depth plus internal commands) or the IOC global
credits.
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

f64c045a

Input: i8042 - add Gigabyte T1005 series netbooks to noloop table · abad539f

Dmitry Torokhov authored Aug 21, 2012

commit 7b125b94 upstream.

They all define their chassis type as "Other" and therefore are not
categorized as "laptops" by the driver, which tries to perform AUX IRQ
delivery test which fails and causes touchpad not working.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=42620Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

abad539f

iwlwifi: protect SRAM debugfs · 22d4436d

Johannes Berg authored Aug 21, 2012

commit 4fc79db1 upstream.

If the device is not started, we can't read its
SRAM and attempting to do so will cause issues.
Protect the debugfs read.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[bwh: Backported to 3.2:
 - Adjust context
 - Pass iwl_shared not iwl_priv pointer to iwl_is_ready_rf()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

22d4436d

iwlwifi: fix flow handler debug code · cfc41fe5

Johannes Berg authored Aug 21, 2012

commit 94543a8d upstream.

iwl_dbgfs_fh_reg_read() can cause crashes and/or
BUG_ON in slub because the ifdefs are wrong, the
code in iwl_dump_fh() should use DEBUGFS, not
DEBUG to protect the buffer writing code.

Also, while at it, clean up the arguments to the
function, some code and make it generally safer.
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[bwh: Backported to 3.2: adjust filenames and context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

cfc41fe5

ARM: Orion: Set eth packet size csum offload limit · f7532f73

Arnaud Patard (Rtp) authored Jul 26, 2012

commit 58569aee upstream.

The mv643xx ethernet controller limits the packet size for the TX
checksum offloading. This patch sets this limits for Kirkwood and
Dove which have smaller limits that the default.

As a side note, this patch is an updated version of a patch sent some years
ago: http://lists.infradead.org/pipermail/linux-arm-kernel/2010-June/017320.html
which seems to have been lost.
Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
[bwh: Backported to 3.2: adjust for the extra two parameters of
 orion_ge0{0,1}_init()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

f7532f73

ARM: OMAP2+: Fix dmtimer set source clock failure · 20ac628b

Jon Hunter authored Jul 13, 2012

commit 54f32a35 upstream.

Calling the dmtimer function omap_dm_timer_set_source() fails if following a
call to pm_runtime_put() to disable the timer. For example the following
sequence would fail to set the parent clock ...

	omap_dm_timer_stop(gptimer);
	omap_dm_timer_set_source(gptimer, OMAP_TIMER_SRC_32_KHZ);

The following error message would be seen ...

omap_dm_timer_set_source: failed to set timer_32k_ck as parent

The problem is that, by design, pm_runtime_put() simply decrements the usage
count and returns before the timer has actually been disabled. Therefore,
setting the parent clock failed because the timer was still active when the
trying to set the parent clock. Setting a parent clock will fail if the clock
you are setting the parent of has a non-zero usage count. To ensure that this
does not fail use pm_runtime_put_sync() when disabling the timer.

Note that this will not be seen on OMAP1 devices, because these devices do
not use the clock framework for dmtimers.
Signed-off-by: Jon Hunter <jon-hunter@ti.com>
Acked-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

20ac628b

ARM: S3C24XX: Fix s3c2410_dma_enqueue parameters · ccd2951d

Heiko Stuebner authored Aug 07, 2012

commit b01858c7 upstream.

Commit d670ac01 (ARM: SAMSUNG: DMA Cleanup as per sparse) changed the
prototype of the s3c2410_dma_* functions to use the enum dma_ch instead
of an generic unsigned int.

In the s3c24xx dma.c s3c2410_dma_enqueue seems to have been forgotten,
the other functions there were changed correctly.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

ccd2951d

Redefine ATOMIC_INIT and ATOMIC64_INIT to drop the casts · 3d0292f5

Mel Gorman authored Jul 23, 2012

commit bba3d8c3 upstream.

The following build error occured during a parisc build with
swap-over-NFS patches applied.

net/core/sock.c:274:36: error: initializer element is not constant
net/core/sock.c:274:36: error: (near initialization for 'memalloc_socks')
net/core/sock.c:274:36: error: initializer element is not constant

Dave Anglin says:
> Here is the line in sock.i:
>
> struct static_key memalloc_socks = ((struct static_key) { .enabled =
> ((atomic_t) { (0) }) });

The above line contains two compound literals.  It also uses a designated
initializer to initialize the field enabled.  A compound literal is not a
constant expression.

The location of the above statement isn't fully clear, but if a compound
literal occurs outside the body of a function, the initializer list must
consist of constant expressions.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

3d0292f5

12 Sep, 2012 30 commits

Linux 3.2.29 · 21094cfa
Ben Hutchings authored Sep 12, 2012

21094cfa

mm: avoid swapping out with swappiness==0 · a65afe79

Satoru Moriya authored May 29, 2012

commit fe35004f upstream.

Sometimes we'd like to avoid swapping out anonymous memory.  In
particular, avoid swapping out pages of important process or process
groups while there is a reasonable amount of pagecache on RAM so that we
can satisfy our customers' requirements.

OTOH, we can control how aggressive the kernel will swap memory pages with
/proc/sys/vm/swappiness for global and
/sys/fs/cgroup/memory/memory.swappiness for each memcg.

But with current reclaim implementation, the kernel may swap out even if
we set swappiness=0 and there is pagecache in RAM.

This patch changes the behavior with swappiness==0.  If we set
swappiness==0, the kernel does not swap out completely (for global reclaim
until the amount of free pages and filebacked pages in a zone has been
reduced to something very very small (nr_free + nr_filebacked < high
watermark)).
Signed-off-by: Satoru Moriya <satoru.moriya@hds.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
 - Adjust context
 - vmscan_swappiness() does not have a zone parameter]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

a65afe79

Squashfs: fix mount time sanity check for corrupted superblock · 4bddad58

Phillip Lougher authored Jan 02, 2012

commit cc37f75a upstream.

A Squashfs filesystem containing nothing but an empty directory,
although unusual and ultimately pointless, is still valid.

The directory_table >= next_table sanity check rejects these
filesystems as invalid because the directory_table is empty and
equal to next_table.
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

4bddad58

asus-nb-wmi: add some video toggle keys · b64c7d34

AceLan Kao authored Jul 04, 2012

commit 3766054f upstream.

There are some new video switch keys that used by newer machines.
0xA0 - SDSP HDMI only
0xA1 - SDSP LCD + HDMI
0xA2 - SDSP CRT + HDMI
0xA3 - SDSP TV + HDMI
But in Linux, there is no suitable userspace application to handle this,
so, mapping them all to KEY_SWITCHVIDEOMODE.
Signed-off-by: AceLan Kao <acelan.kao@canonical.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

b64c7d34

NFS: Fix Oopses in nfs_lookup_revalidate and nfs4_lookup_revalidate · 2a53885e

Trond Myklebust authored Aug 22, 2012

Fix the following Oops in 3.5.1:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
 IP: [<ffffffffa03789cd>] nfs_lookup_revalidate+0x2d/0x480 [nfs]
 PGD 337c63067 PUD 0
 Oops: 0000 [#1] SMP
 CPU 5
 Modules linked in: nfs fscache nfsd lockd nfs_acl auth_rpcgss sunrpc af_packet binfmt_misc cpufreq_conservative cpufreq_userspace cpufreq_powersave dm_mod acpi_cpufreq mperf coretemp gpio_ich kvm_intel joydev kvm ioatdma hid_generic igb lpc_ich i7core_edac edac_core ptp serio_raw dca pcspkr i2c_i801 mfd_core sg pps_core usbhid crc32c_intel microcode button autofs4 uhci_hcd ttm drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect syscopyarea ehci_hcd usbcore usb_common scsi_dh_rdac scsi_dh_emc scsi_dh_hp_sw scsi_dh_alua scsi_dh edd fan ata_piix thermal processor thermal_sys

 Pid: 30431, comm: java Not tainted 3.5.1-2-default #1 Supermicro X8DTT/X8DTT
 RIP: 0010:[<ffffffffa03789cd>]  [<ffffffffa03789cd>] nfs_lookup_revalidate+0x2d/0x480 [nfs]
 RSP: 0018:ffff8801b418bd38  EFLAGS: 00010292
 RAX: 00000000fffffff6 RBX: ffff88032016d800 RCX: 0000000000000020
 RDX: ffffffff00000000 RSI: 0000000000000000 RDI: ffff8801824a7b00
 RBP: ffff8801b418bdf8 R08: 7fffff0034323030 R09: fffffffff04c03ed
 R10: ffff8801824a7b00 R11: 0000000000000002 R12: ffff8801824a7b00
 R13: ffff8801824a7b00 R14: 0000000000000000 R15: ffff8803201725d0
 FS:  00002b53a46cb700(0000) GS:ffff88033fc20000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000038 CR3: 000000020a426000 CR4: 00000000000007e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process java (pid: 30431, threadinfo ffff8801b418a000, task ffff8801b5d20600)
 Stack:
  ffff8801b418be44 ffff88032016d800 ffff8801b418bdf8 0000000000000000
  ffff8801824a7b00 ffff8801b418bdd7 ffff8803201725d0 ffffffff8116a9c0
  ffff8801b5c38dc0 0000000000000007 ffff88032016d800 0000000000000000
 Call Trace:
  [<ffffffff8116a9c0>] lookup_dcache+0x80/0xe0
  [<ffffffff8116aa43>] __lookup_hash+0x23/0x90
  [<ffffffff8116b4a5>] lookup_one_len+0xc5/0x100
  [<ffffffffa03869a3>] nfs_sillyrename+0xe3/0x210 [nfs]
  [<ffffffff8116cadf>] vfs_unlink.part.25+0x7f/0xe0
  [<ffffffff8116f22c>] do_unlinkat+0x1ac/0x1d0
  [<ffffffff815717b9>] system_call_fastpath+0x16/0x1b
  [<00002b5348b5f527>] 0x2b5348b5f526
 Code: ec 38 b8 f6 ff ff ff 4c 89 64 24 18 4c 89 74 24 28 49 89 fc 48 89 5c 24 08 48 89 6c 24 10 49 89 f6 4c 89 6c 24 20 4c 89 7c 24 30 <f6> 46 38 40 0f 85 d1 00 00 00 e8 c4 c4 df e0 48 8b 58 30 49 89
 RIP  [<ffffffffa03789cd>] nfs_lookup_revalidate+0x2d/0x480 [nfs]
  RSP <ffff8801b418bd38>
 CR2: 0000000000000038
 ---[ end trace 845113ed191985dd ]---

This Oops affects 3.5 kernels and older, and is due to lookup_one_len()
calling down to the dentry revalidation code with a NULL pointer
to struct nameidata.

It is fixed upstream by commit 0b728e19 (stop passing nameidata *
to ->d_revalidate())
Reported-by: Richard Ems <richard.ems@cape-horn-eng.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

2a53885e

Staging: speakup: fix an improperly-declared variable. · fc1dd47c

Christopher Brannon authored Jun 16, 2012

commit 4ea418b8 upstream.

A local static variable was declared as a pointer to a string
constant.  We're assigning to the underlying memory, so it
needs to be an array instead.
Signed-off-by: Christopher Brannon <chris@the-brannons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

fc1dd47c

pmac_zilog,kdb: Fix console poll hook to return instead of loop · 4422e6fe

Jason Wessel authored Aug 12, 2012

commit 38f8eefc upstream.

kdb <-> kgdb transitioning does not work properly with this UART
driver because the get character routine loops indefinitely as opposed
to returning NO_POLL_CHAR per the expectation of the KDB I/O driver
API.

The symptom is a kernel hang when trying to switch debug modes.

Cc: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

4422e6fe

HID: add support for Cypress barcode scanner 04B4:ED81 · 7a42c228

Lionel Vaux authored Jul 22, 2012

commit 76c9d8fe upstream.

Add yet another device to the list of Cypress barcode scanners
needing the CP_RDESC_SWAPPED_MIN_MAX quirk.
Signed-off-by: Lionel Vaux (iouri) <lionel.vaux@free.fr>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

7a42c228

virtio-blk: Reset device after blk_cleanup_queue() · 95964784

Asias He authored May 25, 2012

commit 483001c7 upstream.

blk_cleanup_queue() will call blk_drian_queue() to drain all the
requests before queue DEAD marking. If we reset the device before
blk_cleanup_queue() the drain would fail.

1) if the queue is stopped in do_virtblk_request() because device is
full, the q->request_fn() will not be called.

blk_drain_queue() {
   while(true) {
      ...
      if (!list_empty(&q->queue_head))
        __blk_run_queue(q) {
	    if (queue is not stoped)
		q->request_fn()
	}
      ...
   }
}

Do no reset the device before blk_cleanup_queue() gives the chance to
start the queue in interrupt handler blk_done().

2) In commit b79d866c, We abort requests
dispatched to driver before blk_cleanup_queue(). There is a race if
requests are dispatched to driver after the abort and before the queue
DEAD mark. To fix this, instead of aborting the requests explicitly, we
can just reset the device after after blk_cleanup_queue so that the
device can complete all the requests before queue DEAD marking in the
drain process.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

95964784

virtio-blk: Call del_gendisk() before disable guest kick · 88963fd6

Asias He authored May 25, 2012

commit 02e2b124 upstream.

del_gendisk() might not return due to failing to remove the
/sys/block/vda/serial sysfs entry when another thread (udev) is
trying to read it.

virtblk_remove()
  vdev->config->reset() : guest will not kick us through interrupt
    del_gendisk()
      device_del()
        kobject_del(): got stuck, sysfs entry ref count non zero

sysfs_open_file(): user space process read /sys/block/vda/serial
   sysfs_get_active() : got sysfs entry ref count
      dev_attr_show()
        virtblk_serial_show()
           blk_execute_rq() : got stuck, interrupt is disabled
                              request cannot be finished

This patch fixes it by calling del_gendisk() before we disable guest's
interrupt so that the request sent in virtblk_serial_show() will be
finished and del_gendisk() will success.

This fixes another race in hot-unplug process.

It is save to call del_gendisk(vblk->disk) before
flush_work(&vblk->config_work) which might access vblk->disk, because
vblk->disk is not freed until put_disk(vblk->disk).

Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

88963fd6

virtio-blk: Fix hot-unplug race in remove method · e5000b33

Asias He authored May 04, 2012

commit b79d866c upstream.

If we reset the virtio-blk device before the requests already dispatched
to the virtio-blk driver from the block layer are finised, we will stuck
in blk_cleanup_queue() and the remove will fail.

blk_cleanup_queue() calls blk_drain_queue() to drain all requests queued
before DEAD marking. However it will never success if the device is
already stopped. We'll have q->in_flight[] > 0, so the drain will not
finish.

How to reproduce the race:
1. hot-plug a virtio-blk device
2. keep reading/writing the device in guest
3. hot-unplug while the device is busy serving I/O

Test:
~1000 rounds of hot-plug/hot-unplug test passed with this patch.

Changes in v3:
- Drop blk_abort_queue and blk_abort_request
- Use __blk_end_request_all to complete request dispatched to driver

Changes in v2:
- Drop req_in_flight
- Use virtqueue_detach_unused_buf to get request dispatched to driver
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

e5000b33

virtio_blk: Drop unused request tracking list · 788c8de3

Asias He authored Mar 30, 2012

commit f65ca1dc upstream.

Benchmark shows small performance improvement on fusion io device.

Before:
  seq-read : io=1,024MB, bw=19,982KB/s, iops=39,964, runt= 52475msec
  seq-write: io=1,024MB, bw=20,321KB/s, iops=40,641, runt= 51601msec
  rnd-read : io=1,024MB, bw=15,404KB/s, iops=30,808, runt= 68070msec
  rnd-write: io=1,024MB, bw=14,776KB/s, iops=29,552, runt= 70963msec

After:
  seq-read : io=1,024MB, bw=20,343KB/s, iops=40,685, runt= 51546msec
  seq-write: io=1,024MB, bw=20,803KB/s, iops=41,606, runt= 50404msec
  rnd-read : io=1,024MB, bw=16,221KB/s, iops=32,442, runt= 64642msec
  rnd-write: io=1,024MB, bw=15,199KB/s, iops=30,397, runt= 68991msec
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

788c8de3

virtio_blk: fix config handler race · 0984c4c4

Michael S. Tsirkin authored Jan 12, 2012

commit 4678d6f9 upstream.

Fix a theoretical race related to config work
handler: a config interrupt might happen
after we flush config work but before we
reset the device. It will then cause the
config work to run during or after reset.

Two problems with this:
- if this runs after device is gone we will get use after free
- access of config while reset is in progress is racy
(as layout is changing).

As a solution
1. flush after reset when we know there will be no more interrupts
2. add a flag to disable config access before reset
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

0984c4c4

dmi: Feed DMI table to /dev/random driver · d9033bcc

Tony Luck authored Jul 20, 2012

commit d114a333 upstream.

Send the entire DMI (SMBIOS) table to the /dev/random driver to
help seed its pools.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

d9033bcc

random: Add comment to random_initialize() · 86946487

Tony Luck authored Jul 23, 2012

commit cbc96b75 upstream.

Many platforms have per-machine instance data (serial numbers,
asset tags, etc.) squirreled away in areas that are accessed
during early system bringup. Mixing this data into the random
pools has a very high value in providing better random data,
so we should allow (and even encourage) architecture code to
call add_device_randomness() from the setup_arch() paths.

However, this limits our options for internal structure of
the random driver since random_initialize() is not called
until long after setup_arch().

Add a big fat comment to rand_initialize() spelling out
this requirement.
Suggested-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

86946487

MAINTAINERS: Theodore Ts'o is taking over the random driver · 94688740

Theodore Ts'o authored Jul 04, 2012

commit 330e0a01 upstream.

Matt Mackall stepped down as the /dev/random driver maintainer last
year, so Theodore Ts'o is taking back the /dev/random driver.

Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

94688740

drivers/char/random.c: fix boot id uniqueness race · a457fb13

Mathieu Desnoyers authored Apr 12, 2012

commit 44e4360f upstream.

/proc/sys/kernel/random/boot_id can be read concurrently by userspace
processes.  If two (or more) user-space processes concurrently read
boot_id when sysctl_bootid is not yet assigned, a race can occur making
boot_id differ between the reads.  Because the whole point of the boot id
is to be unique across a kernel execution, fix this by protecting this
operation with a spinlock.

Given that this operation is not frequently used, hitting the spinlock
on each call should not be an issue.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

a457fb13

Bluetooth: Fix using uninitialized option in RFCMode · 39b78b60

Szymon Janc authored Jun 08, 2012

commit 8f321f85 upstream.

If remote device sends bogus RFC option with invalid length,
undefined options values are used. Fix this by using defaults when
remote misbehaves.

This also fixes the following warning reported by gcc 4.7.0:

net/bluetooth/l2cap_core.c: In function 'l2cap_config_rsp':
net/bluetooth/l2cap_core.c:3302:13: warning: 'rfc.max_pdu_size' may be used uninitialized in this function [-Wmaybe-uninitialized]
net/bluetooth/l2cap_core.c:3266:24: note: 'rfc.max_pdu_size' was declared here
net/bluetooth/l2cap_core.c:3298:25: warning: 'rfc.monitor_timeout' may be used uninitialized in this function [-Wmaybe-uninitialized]
net/bluetooth/l2cap_core.c:3266:24: note: 'rfc.monitor_timeout' was declared here
net/bluetooth/l2cap_core.c:3297:25: warning: 'rfc.retrans_timeout' may be used uninitialized in this function [-Wmaybe-uninitialized]
net/bluetooth/l2cap_core.c:3266:24: note: 'rfc.retrans_timeout' was declared here
net/bluetooth/l2cap_core.c:3295:2: warning: 'rfc.mode' may be used uninitialized in this function [-Wmaybe-uninitialized]
net/bluetooth/l2cap_core.c:3266:24: note: 'rfc.mode' was declared here
Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

39b78b60

block: replace __getblk_slow misfix by grow_dev_page fix · 46d271ae

Hugh Dickins authored Aug 23, 2012

commit 676ce6d5 upstream.

Commit 91f68c89 ("block: fix infinite loop in __getblk_slow")
is not good: a successful call to grow_buffers() cannot guarantee
that the page won't be reclaimed before the immediate next call to
__find_get_block(), which is why there was always a loop there.

Yesterday I got "EXT4-fs error (device loop0): __ext4_get_inode_loc:3595:
inode #19278: block 664: comm cc1: unable to read itable block" on console,
which pointed to this commit.

I've been trying to bisect for weeks, why kbuild-on-ext4-on-loop-on-tmpfs
sometimes fails from a missing header file, under memory pressure on
ppc G5.  I've never seen this on x86, and I've never seen it on 3.5-rc7
itself, despite that commit being in there: bisection pointed to an
irrelevant pinctrl merge, but hard to tell when failure takes between
18 minutes and 38 hours (but so far it's happened quicker on 3.6-rc2).

(I've since found such __ext4_get_inode_loc errors in /var/log/messages
from previous weeks: why the message never appeared on console until
yesterday morning is a mystery for another day.)

Revert 91f68c89, restoring __getblk_slow() to how it was (plus
a checkpatch nitfix).  Simplify the interface between grow_buffers()
and grow_dev_page(), and avoid the infinite loop beyond end of device
by instead checking init_page_buffers()'s end_block there (I presume
that's more efficient than a repeated call to blkdev_max_block()),
returning -ENXIO to __getblk_slow() in that case.

And remove akpm's ten-year-old "__getblk() cannot fail ... weird"
comment, but that is worrying: are all users of __getblk() really
now prepared for a NULL bh beyond end of device, or will some oops??
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

46d271ae

fs/buffer.c: remove BUG() in possible but rare condition · 5835d2dc

Glauber Costa authored Apr 25, 2012

commit 61065a30 upstream.

While stressing the kernel with with failing allocations today, I hit the
following chain of events:

alloc_page_buffers():

	bh = alloc_buffer_head(GFP_NOFS);
	if (!bh)
		goto no_grow; <= path taken

grow_dev_page():
        bh = alloc_page_buffers(page, size, 0);
        if (!bh)
                goto failed;  <= taken, consequence of the above

and then the failed path BUG()s the kernel.

The failure is inserted a litte bit artificially, but even then, I see no
reason why it should be deemed impossible in a real box.

Even though this is not a condition that we expect to see around every
time, failed allocations are expected to be handled, and BUG() sounds just
too much.  As a matter of fact, grow_dev_page() can return NULL just fine
in other circumstances, so I propose we just remove it, then.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

5835d2dc

rapidio/tsi721: fix unused variable compiler warning · 9bf9e4d2

Alexandre Bounine authored Aug 21, 2012

commit 9a9a9a7a upstream.

Fix unused variable compiler warning when built with CONFIG_RAPIDIO_DEBUG
option off.

This patch is applicable to kernel versions starting from v3.2
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

9bf9e4d2

rapidio/tsi721: fix inbound doorbell interrupt handling · 61ee0199

Alexandre Bounine authored Aug 21, 2012

commit 3670e7e1 upstream.

Make sure that there is no doorbell messages left behind due to disabled
interrupts during inbound doorbell processing.

The most common case for this bug is loss of rionet JOIN messages in
systems with three or more rionet participants and MSI or MSI-X enabled.
As result, requests for packet transfers may finish with "destination
unreachable" error message.

This patch is applicable to kernel versions starting from v3.2.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

61ee0199

drivers/rtc/rtc-rs5c348.c: fix hour decoding in 12-hour mode · 05621c97

Atsushi Nemoto authored Aug 21, 2012

commit 7dbfb315 upstream.

Correct the offset by subtracting 20 from tm_hour before taking the
modulo 12.

[ "Why 20?" I hear you ask. Or at least I did.

  Here's the reason why: RS5C348_BIT_PM is 32, and is - stupidly -
  included in the RS5C348_HOURS_MASK define.  So it's really subtracting
  out that bit to get "hour+12".  But then because it does things modulo
  12, it needs to add the 12 in again afterwards anyway.

  This code is confused.  It would be much clearer if RS5C348_HOURS_MASK
  just didn't include the RS5C348_BIT_PM bit at all, then it wouldn't
  need to do the silly subtract either.

  Whatever. It's all just math, the end result is the same.   - Linus ]
Reported-by: James Nute <newten82@gmail.com>
Tested-by: James Nute <newten82@gmail.com>
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

05621c97

drivers/misc/sgi-xp/xpc_uv.c: SGI XPC fails to load when cpu 0 is out of IRQ resources · 33da4c1e

Robin Holt authored Aug 21, 2012

commit 7838f994 upstream.

On many of our larger systems, CPU 0 has had all of its IRQ resources
consumed before XPC loads.  Worst cases on machines with multiple 10
GigE cards and multiple IB cards have depleted the entire first socket
of IRQs.

This patch makes selecting the node upon which IRQs are allocated (as
well as all the other GRU Message Queue structures) specifiable as a
module load param and has a default behavior of searching all nodes/cpus
for an available resources.

[akpm@linux-foundation.org: fix build: include cpu.h and module.h]
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

33da4c1e

mm: hugetlbfs: correctly populate shared pmd · cf727a57

Michal Hocko authored Aug 21, 2012

commit eb48c071 upstream.

Each page mapped in a process's address space must be correctly
accounted for in _mapcount.  Normally the rules for this are
straightforward but hugetlbfs page table sharing is different.  The page
table pages at the PMD level are reference counted while the mapcount
remains the same.

If this accounting is wrong, it causes bugs like this one reported by
Larry Woodman:

  kernel BUG at mm/filemap.c:135!
  invalid opcode: 0000 [#1] SMP
  CPU 22
  Modules linked in: bridge stp llc sunrpc binfmt_misc dcdbas microcode pcspkr acpi_pad acpi]
  Pid: 18001, comm: mpitest Tainted: G        W    3.3.0+ #4 Dell Inc. PowerEdge R620/07NDJ2
  RIP: 0010:[<ffffffff8112cfed>]  [<ffffffff8112cfed>] __delete_from_page_cache+0x15d/0x170
  Process mpitest (pid: 18001, threadinfo ffff880428972000, task ffff880428b5cc20)
  Call Trace:
    delete_from_page_cache+0x40/0x80
    truncate_hugepages+0x115/0x1f0
    hugetlbfs_evict_inode+0x18/0x30
    evict+0x9f/0x1b0
    iput_final+0xe3/0x1e0
    iput+0x3e/0x50
    d_kill+0xf8/0x110
    dput+0xe2/0x1b0
    __fput+0x162/0x240

During fork(), copy_hugetlb_page_range() detects if huge_pte_alloc()
shared page tables with the check dst_pte == src_pte.  The logic is if
the PMD page is the same, they must be shared.  This assumes that the
sharing is between the parent and child.  However, if the sharing is
with a different process entirely then this check fails as in this
diagram:

  parent
    |
    ------------>pmd
                 src_pte----------> data page
                                        ^
  other--------->pmd--------------------|
                  ^
  child-----------|
                 dst_pte

For this situation to occur, it must be possible for Parent and Other to
have faulted and failed to share page tables with each other.  This is
possible due to the following style of race.

  PROC A                                          PROC B
  copy_hugetlb_page_range                         copy_hugetlb_page_range
    src_pte == huge_pte_offset                      src_pte == huge_pte_offset
    !src_pte so no sharing                          !src_pte so no sharing

  (time passes)

  hugetlb_fault                                   hugetlb_fault
    huge_pte_alloc                                  huge_pte_alloc
      huge_pmd_share                                 huge_pmd_share
        LOCK(i_mmap_mutex)
        find nothing, no sharing
        UNLOCK(i_mmap_mutex)
                                                      LOCK(i_mmap_mutex)
                                                      find nothing, no sharing
                                                      UNLOCK(i_mmap_mutex)
      pmd_alloc                                       pmd_alloc
      LOCK(instantiation_mutex)
      fault
      UNLOCK(instantiation_mutex)
                                                  LOCK(instantiation_mutex)
                                                  fault
                                                  UNLOCK(instantiation_mutex)

These two processes are not poing to the same data page but are not
sharing page tables because the opportunity was missed.  When either
process later forks, the src_pte == dst pte is potentially insufficient.
As the check falls through, the wrong PTE information is copied in
(harmless but wrong) and the mapcount is bumped for a page mapped by a
shared page table leading to the BUG_ON.

This patch addresses the issue by moving pmd_alloc into huge_pmd_share
which guarantees that the shared pud is populated in the same critical
section as pmd.  This also means that huge_pte_offset test in
huge_pmd_share is serialized correctly now which in turn means that the
success of the sharing will be higher as the racing tasks see the pud
and pmd populated together.

Race identified and changelog written mostly by Mel Gorman.

{akpm@linux-foundation.org: attempt to make the huge_pmd_share() comment comprehensible, clean up coding style]
Reported-by: Larry Woodman <lwoodman@redhat.com>
Tested-by: Larry Woodman <lwoodman@redhat.com>
Reviewed-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Ken Chen <kenchen@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

cf727a57

cciss: fix incorrect scsi status reporting · e371958b

Stephen M. Cameron authored Aug 21, 2012

commit b0cf0b11 upstream.

Delete code which sets SCSI status incorrectly as it's already been set
correctly above this incorrect code.  The bug was introduced in 2009 by
commit b0e15f6d ("cciss: fix typo that causes scsi status to be
lost.")
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Reported-by: Roel van Meer <roel.vanmeer@bokxing.nl>
Tested-by: Roel van Meer <roel.vanmeer@bokxing.nl>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

e371958b

fbcon: fix race condition between console lock and cursor timer (v1.1) · 94fb2469

Dave Airlie authored Aug 21, 2012

commit d8636a27 upstream.

So we've had a fair few reports of fbcon handover breakage between
efi/vesafb and i915 surface recently, so I dedicated a couple of
days to finding the problem.

Essentially the last thing we saw was the conflicting framebuffer
message and that was all.

So after much tracing with direct netconsole writes (printks
under console_lock not so useful), I think I found the race.

Thread A (driver load)    Thread B (timer thread)
  unbind_con_driver ->              |
  bind_con_driver ->                |
  vc->vc_sw->con_deinit ->          |
  fbcon_deinit ->                   |
  console_lock()                    |
      |                             |
      |                       fbcon_flashcursor timer fires
      |                       console_lock() <- blocked for A
      |
      |
fbcon_del_cursor_timer ->
  del_timer_sync
  (BOOM)

Of course because all of this is under the console lock,
we never see anything, also since we also just unbound the active
console guess what we never see anything.

Hopefully this fixes the problem for anyone seeing vesafb->kms
driver handoff.

v1.1: add comment suggestion from Alan.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

94fb2469

Revert "drm/radeon: fix bo creation retry path" · 2744f4e7

Alex Deucher authored Aug 21, 2012

commit 676bc2e1 upstream.

This reverts commit d1c7871d.

ttm_bo_init() destroys the BO on failure. So this patch makes
the retry path work with freed memory.  This ends up causing
kernel panics when this path is hit.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

2744f4e7

svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping · 9d16d84d

J. Bruce Fields authored Aug 17, 2012

commit d10f27a7 upstream.

The rpc server tries to ensure that there will be room to send a reply
before it receives a request.

It does this by tracking, in xpt_reserved, an upper bound on the total
size of the replies that is has already committed to for the socket.

Currently it is adding in the estimate for a new reply *before* it
checks whether there is space available.  If it finds that there is not
space, it then subtracts the estimate back out.

This may lead the subsequent svc_xprt_enqueue to decide that there is
space after all.

The results is a svc_recv() that will repeatedly return -EAGAIN, causing
server threads to loop without doing any actual work.
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

9d16d84d

svcrpc: sends on closed socket should stop immediately · 1057f770

J. Bruce Fields authored Aug 20, 2012

commit f06f00a2 upstream.

svc_tcp_sendto sets XPT_CLOSE if we fail to transmit the entire reply.
However, the XPT_CLOSE won't be acted on immediately.  Meanwhile other
threads could send further replies before the socket is really shut
down.  This can manifest as data corruption: for example, if a truncated
read reply is followed by another rpc reply, that second reply will look
to the client like further read data.

Symptoms were data corruption preceded by svc_tcp_sendto logging
something like

	kernel: rpc-srv/tcp: nfsd: sent only 963696 when sending 1048708 bytes - shutting down socket
Reported-by: Malahal Naineni <malahal@us.ibm.com>
Tested-by: Malahal Naineni <malahal@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

1057f770