- 06 Feb, 2015 26 commits
-
-
Bob Paauwe authored
commit af1a7301 upstream. When creating a fence for a tiled object, only fence the area that makes up the actual tiles. The object may be larger than the tiled area and if we allow those extra addresses to be fenced, they'll get converted to addresses beyond where the object is mapped. This opens up the possiblity of writes beyond the end of object. To prevent this, we adjust the size of the fence to only encompass the area that makes up the actual tiles. The extra space is considered un-tiled and now behaves as if it was a linear object. Testcase: igt/gem_tiled_fence_overflow Reported-by: Dan Hettena <danh@ghs.com> Signed-off-by: Bob Paauwe <bob.j.paauwe@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mugunthan V N authored
commit 02a54164 upstream. In Dual EMAC, the default VLANs are used to segregate Rx packets between the ports, so adding the same default VLAN to the switch will affect the normal packet transfers. So returning error on addition of dual EMAC default VLANs. Even if EMAC 0 default port VLAN is added to EMAC 1, it will lead to break dual EMAC port separations. Fixes: d9ba8f9e (driver: net: ethernet: cpsw: dual emac interface implementation) Reported-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ashay Jaiswal authored
commit 83b0302d upstream. The regulator framework maintains a list of consumer regulators for a regulator device and protects it from concurrent access using the regulator device's mutex lock. In the case of regulator_put() the consumer is removed and regulator device's parameters are updated without holding the regulator device's mutex. This would lead to a race condition between the regulator_put() and any function which traverses the consumer list or modifies regulator device's parameters. Fix this race condition by holding the regulator device's mutex in case of regulator_put. Signed-off-by: Ashay Jaiswal <ashayj@codeaurora.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mika Westerberg authored
commit c957e8f0 upstream. Once the current message is finished, the driver notifies SPI core about this by calling spi_finalize_current_message(). This function queues next message to be transferred. If there are more messages in the queue, it is possible that the driver is asked to transfer the next message at this point. When spi_finalize_current_message() returns the driver clears the drv_data->cur_chip pointer to NULL. The problem is that if the driver already started the next message clearing drv_data->cur_chip will cause NULL pointer dereference which crashes the kernel like: BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 IP: [<ffffffffa0022bc8>] cs_deassert+0x18/0x70 [spi_pxa2xx_platform] PGD 78bb8067 PUD 37712067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: CPU: 1 PID: 11 Comm: ksoftirqd/1 Tainted: G O 3.18.0-rc4-mjo #5 Hardware name: Intel Corp. VALLEYVIEW B3 PLATFORM/NOTEBOOK, BIOS MNW2CRB1.X64.0071.R30.1408131301 08/13/2014 task: ffff880077f9f290 ti: ffff88007a820000 task.ti: ffff88007a820000 RIP: 0010:[<ffffffffa0022bc8>] [<ffffffffa0022bc8>] cs_deassert+0x18/0x70 [spi_pxa2xx_platform] RSP: 0018:ffff88007a823d08 EFLAGS: 00010202 RAX: 0000000000000008 RBX: ffff8800379a4430 RCX: 0000000000000026 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff8800379a4430 RBP: ffff88007a823d18 R08: 00000000ffffffff R09: 000000007a9bc65a R10: 000000000000028f R11: 0000000000000005 R12: ffff880070123e98 R13: ffff880070123de8 R14: 0000000000000100 R15: ffffc90004888000 FS: 0000000000000000(0000) GS:ffff880079a80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000048 CR3: 000000007029b000 CR4: 00000000001007e0 Stack: ffff88007a823d58 ffff8800379a4430 ffff88007a823d48 ffffffffa0022c89 0000000000000000 ffff8800379a4430 0000000000000000 0000000000000006 ffff88007a823da8 ffffffffa0023be0 ffff88007a823dd8 ffffffff81076204 Call Trace: [<ffffffffa0022c89>] giveback+0x69/0xa0 [spi_pxa2xx_platform] [<ffffffffa0023be0>] pump_transfers+0x710/0x740 [spi_pxa2xx_platform] [<ffffffff81076204>] ? pick_next_task_fair+0x744/0x830 [<ffffffff81049679>] tasklet_action+0xa9/0xe0 [<ffffffff81049a0e>] __do_softirq+0xee/0x280 [<ffffffff81049bc0>] run_ksoftirqd+0x20/0x40 [<ffffffff810646df>] smpboot_thread_fn+0xff/0x1b0 [<ffffffff810645e0>] ? SyS_setgroups+0x150/0x150 [<ffffffff81060f9d>] kthread+0xcd/0xf0 [<ffffffff81060ed0>] ? kthread_create_on_node+0x180/0x180 [<ffffffff8187a82c>] ret_from_fork+0x7c/0xb0 Fix this by clearing drv_data->cur_chip before we call spi_finalize_current_message(). Reported-by: Martin Oldfield <m@mjoldfield.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Joe Thornber authored
commit 766a7888 upstream. Commit 9b1cc9f2 ("dm cache: share cache-metadata object across inactive and active DM tables") mistakenly ignored the use of ERR_PTR returns. Restore missing IS_ERR checks and ERR_PTR returns where appropriate. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Joe Thornber authored
commit 2a7eaea0 upstream. You can't modify the metadata in these modes. It's better to fail these messages immediately than let the block-manager deny write locks on metadata blocks. Otherwise these failed metadata changes will trigger 'needs_check' to get set in the metadata superblock -- requiring repair using the thin_check utility. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Johannes Berg authored
commit 0fa7b391 upstream. In case userspace attempts to obtain key information for or delete a unicast key, this is currently erroneously rejected unless the driver sets the WIPHY_FLAG_IBSS_RSN flag. Apparently enough drivers do so it was never noticed. Fix that, and while at it fix a potential memory leak: the error path in the get_key() function was placed after allocating a message but didn't free it - move it to a better place. Luckily admin permissions are needed to call this operation. Fixes: e31b8213 ("cfg80211/mac80211: allow per-station GTKs") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mathy Vanhoef authored
commit 3a5c5e81 upstream. Fix a regression introduced by commit a5e70697 ("mac80211: add radiotap flag and handling for 5/10 MHz") where the IEEE80211_CHAN_CCK channel type flag was incorrectly replaced by the IEEE80211_CHAN_OFDM flag. This commit fixes that by using the CCK flag again. Fixes: a5e70697 ("mac80211: add radiotap flag and handling for 5/10 MHz") Signed-off-by: Mathy Vanhoef <vanhoefm@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Trond Myklebust authored
commit 3175e1dc upstream. If we start state recovery on a client that failed to initialise correctly, then we are very likely to Oops. Reported-by: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de> Link: http://lkml.kernel.org/r/130621862.279655.1421851650684.JavaMail.zimbra@desy.deSigned-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peng Tao authored
commit ee8a1a8b upstream. We only support swap file calling nfs_direct_IO. However, application might be able to get to nfs_direct_IO if it toggles O_DIRECT flag during IO and it can deadlock because we grab inode->i_mutex in nfs_file_direct_write(). So return 0 for such case. Then the generic layer will fall back to buffer IO. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jochen Hein authored
commit 1d90d6d5 upstream. Without this the aux port does not get detected, and consequently the touchpad will not work. With this patch the touchpad is detected: $ dmesg | grep -E "(SYN|i8042|serio)" pnp 00:03: Plug and Play ACPI device, IDs SYN1d22 PNP0f13 (active) i8042: PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12 serio: i8042 KBD port at 0x60,0x64 irq 1 serio: i8042 AUX port at 0x60,0x64 irq 12 input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input4 psmouse serio1: synaptics: Touchpad model: 1, fw: 8.1, id: 0x1e2b1, caps: 0xd00123/0x840300/0x126800, board id: 2863, fw id: 1473085 input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio1/input/input6 dmidecode excerpt for this laptop is: Handle 0x0001, DMI type 1, 27 bytes System Information Manufacturer: Medion Product Name: Akoya E7225 Version: 1.0 Signed-off-by: Jochen Hein <jochen@jochen.org> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peter Hutterer authored
commit 8543cf1c upstream. LEN0037 found in the Lenovo ThinkPad X1 Carbon 2nd (2014 model) Reported-and-tested-by: Bjoern Olausson <bjoern@olausson.de> Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Paul Osmialowski authored
commit 34e81ad5 upstream. This patch solves deadlock between clock prepare mutex and regmap mutex reported by Tomasz Figa in [1] by implementing solution from [2]: "always leave the clock of the i2c controller in a prepared state". [1] https://lkml.org/lkml/2014/7/2/171 [2] https://lkml.org/lkml/2014/7/2/207 On each i2c transfer handled by s3c24xx_i2c_xfer(), clk_prepare_enable() was called, which calls clk_prepare() then clk_enable(). clk_prepare() takes prepare_lock mutex before proceeding. Note that i2c transfer functions are invoked from many places in kernel, typically with some other additional lock held. It may happen that function on CPU1 (e.g. regmap_update_bits()) has taken a mutex (i.e. regmap lock mutex) then it attempts i2c communication in order to proceed (so it needs to obtain clock related prepare_lock mutex during transfer preparation stage due to clk_prepare() call). At the same time other task on CPU0 wants to operate on clock (e.g. to (un)prepare clock for some other reason) so it has taken prepare_lock mutex. CPU0: CPU1: clk_disable_unused() regulator_disable() clk_prepare_lock() map->lock(map->lock_arg) regmap_read() s3c24xx_i2c_xfer() map->lock(map->lock_arg) clk_prepare_lock() Implemented solution from [2] leaves i2c clock prepared. Preparation is done in s3c24xx_i2c_probe() function. Without this patch, it is immediately unprepared by clk_disable_unprepare() call. I've replaced this call with clk_disable() and I've added clk_unprepare() call in s3c24xx_i2c_remove(). The s3c24xx_i2c_xfer() function now uses clk_enable() instead of clk_prepare_enable() (and clk_disable() instead of clk_unprepare_disable()). Signed-off-by: Paul Osmialowski <p.osmialowsk@samsung.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ilya Dryomov authored
commit e69b8d41 upstream. This effectively reverts the last hunk of 392a9dad ("rbd: detect when clone image is flattened"). The problem with parent_overlap != 0 condition is that it's possible and completely valid to have an image with parent_overlap == 0 whose parent state needs to be cleaned up on unmap. The next commit, which drops the "clone image now standalone" logic, opens up another window of opportunity to hit this, but even without it # cat parent-ref.sh #!/bin/bash rbd create --image-format 2 --size 1 foo rbd snap create foo@snap rbd snap protect foo@snap rbd clone foo@snap bar rbd resize --allow-shrink --size 0 bar rbd resize --size 1 bar DEV=$(rbd map bar) rbd unmap $DEV leaves rbd_device/rbd_spec/etc and rbd_client along with ceph_client hanging around. My thinking behind calling rbd_dev_parent_put() unconditionally is that there shouldn't be any requests in flight at that point in time as we are deep into unmap sequence. Hence, even if rbd_dev_unparent() caused by flatten is delayed by in-flight requests, it will have finished by the time we reach rbd_dev_unprobe() caused by unmap, thus turning unconditional rbd_dev_parent_put() into a no-op. Fixes: http://tracker.ceph.com/issues/10352Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Josh Durgin <jdurgin@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Clemens Ladisch authored
commit 0767e95b upstream. When the last subscriber to a "Through" port has been removed, the subscribed destination ports might still be active, so it would be wrong to send "all sounds off" and "reset controller" events to them. The proper place for such a shutdown would be the closing of the actual MIDI port (and close_substream() in rawmidi.c already can do this). This also fixes a deadlock when dummy_unuse() tries to send events to its own port that is already locked because it is being freed. Reported-by: Peter Billam <peter@www.pjb.com.au> Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Laurent Dufour authored
commit e6eb2eba upstream. The commit 3b8a3c01 ("powerpc/pseries: Fix endiannes issue in RTAS call from xmon") was fixing an endianness issue in the call made from xmon to RTAS. However, as Michael Ellerman noticed, this fix was not complete, the token value was not byte swapped. This lead to call an unexpected and most of the time unexisting RTAS function, which is silently ignored by RTAS. This fix addresses this hole. Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ahmed S. Darwish authored
commit e638642b upstream. While being in an ERROR_WARNING state, and receiving further bus error events with error counters still in the ERROR_WARNING range of 97-127 inclusive, the state handling code erroneously reverts back to ERROR_ACTIVE. Per the CAN standard, only revert to ERROR_ACTIVE when the error counters are less than 96. Moreover, in certain Kvaser models, the BUS_ERROR flag is always set along with undefined bits in the M16C status register. Thus use bitwise operators instead of full equality for checking that register against bus errors. Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ahmed S. Darwish authored
commit 14c10c2a upstream. On some x86 laptops, plugging a Kvaser device again after an unplug makes the firmware always ignore the very first command. For such a case, provide some room for retries instead of completely exiting the driver init code. Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ahmed S. Darwish authored
commit 3803fa69 upstream. Send expected argument to the URB completion hander: a CAN netdevice instead of the network interface private context `kvaser_usb_net_priv'. This was discovered by having some garbage in the kernel log in place of the netdevice names: can0 and can1. Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ahmed S. Darwish authored
commit ded50066 upstream. Upon receiving a hardware event with the BUS_RESET flag set, the driver kills all of its anchored URBs and resets all of its transmit URB contexts. Unfortunately it does so under the context of URB completion handler `kvaser_usb_read_bulk_callback()', which is often called in an atomic context. While the device is flooded with many received error packets, usb_kill_urb() typically sleeps/reschedules till the transfer request of each killed URB in question completes, leading to the sleep in atomic bug. [3] In v2 submission of the original driver patch [1], it was stated that the URBs kill and tx contexts reset was needed since we don't receive any tx acknowledgments later and thus such resources will be locked down forever. Fortunately this is no longer needed since an earlier bugfix in this patch series is now applied: all tx URB contexts are reset upon CAN channel close. [2] Moreover, a BUS_RESET is now treated _exactly_ like a BUS_OFF event, which is the recommended handling method advised by the device manufacturer. [1] http://article.gmane.org/gmane.linux.network/239442 http://www.webcitation.org/6Vr2yagAQ [2] can: kvaser_usb: Reset all URB tx contexts upon channel close 889b77f7 [3] Stacktrace: <IRQ> [<ffffffff8158de87>] dump_stack+0x45/0x57 [<ffffffff8158b60c>] __schedule_bug+0x41/0x4f [<ffffffff815904b1>] __schedule+0x5f1/0x700 [<ffffffff8159360a>] ? _raw_spin_unlock_irqrestore+0xa/0x10 [<ffffffff81590684>] schedule+0x24/0x70 [<ffffffff8147d0a5>] usb_kill_urb+0x65/0xa0 [<ffffffff81077970>] ? prepare_to_wait_event+0x110/0x110 [<ffffffff8147d7d8>] usb_kill_anchored_urbs+0x48/0x80 [<ffffffffa01f4028>] kvaser_usb_unlink_tx_urbs+0x18/0x50 [kvaser_usb] [<ffffffffa01f45d0>] kvaser_usb_rx_error+0xc0/0x400 [kvaser_usb] [<ffffffff8108b14a>] ? vprintk_default+0x1a/0x20 [<ffffffffa01f5241>] kvaser_usb_read_bulk_callback+0x4c1/0x5f0 [kvaser_usb] [<ffffffff8147a73e>] __usb_hcd_giveback_urb+0x5e/0xc0 [<ffffffff8147a8a1>] usb_hcd_giveback_urb+0x41/0x110 [<ffffffffa0008748>] finish_urb+0x98/0x180 [ohci_hcd] [<ffffffff810cd1a7>] ? acct_account_cputime+0x17/0x20 [<ffffffff81069f65>] ? local_clock+0x15/0x30 [<ffffffffa000a36b>] ohci_work+0x1fb/0x5a0 [ohci_hcd] [<ffffffff814fbb31>] ? process_backlog+0xb1/0x130 [<ffffffffa000cd5b>] ohci_irq+0xeb/0x270 [ohci_hcd] [<ffffffff81479fc1>] usb_hcd_irq+0x21/0x30 [<ffffffff8108bfd3>] handle_irq_event_percpu+0x43/0x120 [<ffffffff8108c0ed>] handle_irq_event+0x3d/0x60 [<ffffffff8108ec84>] handle_fasteoi_irq+0x74/0x110 [<ffffffff81004dfd>] handle_irq+0x1d/0x30 [<ffffffff81004727>] do_IRQ+0x57/0x100 [<ffffffff8159482a>] common_interrupt+0x6a/0x6a Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peter Ujfalusi authored
commit 20602e34 upstream. We should select FSR also to be driven by McBSP, not only FSX. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Acked-by: Jarkko Nikula <jarkko.nikula@bitmer.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Qais Yousef authored
commit d3268a40 upstream. In soc_new_compress() when rtd->dai_link->dynamic is set, we create the pcm substreams with this call: ret = snd_pcm_new_internal(rtd->card->snd_card, new_name, num, 1, 0, &be_pcm); which passes 0 as capture_count leading to be_pcm->streams[SNDRV_PCM_STREAM_CAPTURE].substream being NULL, hence when trying to set rtd a few lines below we get an oops. Fix by using rtd->dai_link->dpcm_playback and rtd->dai_link->dpcm_capture as playback_count and capture_count to snd_pcm_new_internal(). Signed-off-by: Qais Yousef <qais.yousef@imgtec.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Aurelien BOUIN authored
commit adc60298 upstream. The xDC field should have 5 bit width according to Reference Manual. Thus this patch fixes it. Signed-off-by: Aurelien BOUIN <a_bouin@yahoo.fr> Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Zidan Wang authored
commit 22ee76da upstream. wm8960 codec can't support sample rate 11250, it must be 11025. Signed-off-by: Zidan Wang <b50113@freescale.com> Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andy Shevchenko authored
commit 67bf9cda upstream. The FIFO size is 40 accordingly to the specifications, but this means 0x40, i.e. 64 bytes. This patch fixes the typo and enables FIFO size autodetection for Intel MID devices. Fixes: 7063c0d9 (spi/dw_spi: add DMA support) Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kees Cook authored
commit d69911a6 upstream. Commit e6023367 ("x86, kaslr: Prevent .bss from overlaping initrd") added Perl to the required build environment. This reimplements in shell the Perl script used to find the size of the kernel with bss and brk added. Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Rob Landley <rob@landley.net> Acked-by: Rob Landley <rob@landley.net> Cc: Anca Emanuel <anca.emanuel@gmail.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Junjie Mao <eternal.n08@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 30 Jan, 2015 14 commits
-
-
Greg Kroah-Hartman authored
-
NeilBrown authored
commit 108cef3a upstream. It is critical that fetch_block() and handle_stripe_dirtying() are consistent in their analysis of what needs to be loaded. Otherwise raid5 can wait forever for a block that won't be loaded. Currently when writing to a RAID5 that is resyncing, to a location beyond the resync offset, handle_stripe_dirtying chooses a reconstruct-write cycle, but fetch_block() assumes a read-modify-write, and a lockup can happen. So treat that case just like RAID6, just as we do in handle_stripe_dirtying. RAID6 always does reconstruct-write. This bug was introduced when the behaviour of handle_stripe_dirtying was changed in 3.7, so the patch is suitable for any kernel since, though it will need careful merging for some versions. Cc: stable@vger.kernel.org (v3.7+) Fixes: a7854487Reported-by: Henry Cai <henryplusplus@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michal Hocko authored
commit 45f87de5 upstream. Commit 2457aec6 ("mm: non-atomically mark page accessed during page cache allocation where possible") has added a separate parameter for specifying gfp mask for radix tree allocations. Not only this is less than optimal from the API point of view because it is error prone, it is also buggy currently because grab_cache_page_write_begin is using GFP_KERNEL for radix tree and if fgp_flags doesn't contain FGP_NOFS (mostly controlled by fs by AOP_FLAG_NOFS flag) but the mapping_gfp_mask has __GFP_FS cleared then the radix tree allocation wouldn't obey the restriction and might recurse into filesystem and cause deadlocks. This is the case for most filesystems unfortunately because only ext4 and gfs2 are using AOP_FLAG_NOFS. Let's simply remove radix_gfp_mask parameter because the allocation context is same for both page cache and for the radix tree. Just make sure that the radix tree gets only the sane subset of the mask (e.g. do not pass __GFP_WRITE). Long term it is more preferable to convert remaining users of AOP_FLAG_NOFS to use mapping_gfp_mask instead and simplify this interface even further. Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mel Gorman authored
commit 4ffeaf35 upstream. The fair zone allocation policy round-robins allocations between zones within a node to avoid age inversion problems during reclaim. If the first allocation fails, the batch counts are reset and a second attempt made before entering the slow path. One assumption made with this scheme is that batches expire at roughly the same time and the resets each time are justified. This assumption does not hold when zones reach their low watermark as the batches will be consumed at uneven rates. Allocation failure due to watermark depletion result in additional zonelist scans for the reset and another watermark check before hitting the slowpath. On UMA, the benefit is negligible -- around 0.25%. On 4-socket NUMA machine it's variable due to the variability of measuring overhead with the vmstat changes. The system CPU overhead comparison looks like 3.16.0-rc3 3.16.0-rc3 3.16.0-rc3 vanilla vmstat-v5 lowercost-v5 User 746.94 774.56 802.00 System 65336.22 32847.27 40852.33 Elapsed 27553.52 27415.04 27368.46 However it is worth noting that the overall benchmark still completed faster and intuitively it makes sense to take as few passes as possible through the zonelists. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mel Gorman authored
commit f7b5d647 upstream. The purpose of numa_zonelist_order=zone is to preserve lower zones for use with 32-bit devices. If locality is preferred then the numa_zonelist_order=node policy should be used. Unfortunately, the fair zone allocation policy overrides this by skipping zones on remote nodes until the lower one is found. While this makes sense from a page aging and performance perspective, it breaks the expected zonelist policy. This patch restores the expected behaviour for zone-list ordering. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mel Gorman authored
commit bb0b6dff upstream. When kswapd is awake reclaiming, the per-cpu stat thresholds are lowered to get more accurate counts to avoid breaching watermarks. This threshold update iterates over all possible CPUs which is unnecessary. Only online CPUs need to be updated. If a new CPU is onlined, refresh_zone_stat_thresholds() will set the thresholds correctly. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mel Gorman authored
commit 0d5d823a upstream. zone->pages_scanned is a write-intensive cache line during page reclaim and it's also updated during page free. Move the counter into vmstat to take advantage of the per-cpu updates and do not update it in the free paths unless necessary. On a small UMA machine running tiobench the difference is marginal. On a 4-node machine the overhead is more noticable. Note that automatic NUMA balancing was disabled for this test as otherwise the system CPU overhead is unpredictable. 3.16.0-rc3 3.16.0-rc3 3.16.0-rc3 vanillarearrange-v5 vmstat-v5 User 746.94 759.78 774.56 System 65336.22 58350.98 32847.27 Elapsed 27553.52 27282.02 27415.04 Note that the overhead reduction will vary depending on where exactly pages are allocated and freed. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mel Gorman authored
commit 3484b2de upstream. The arrangement of struct zone has changed over time and now it has reached the point where there is some inappropriate sharing going on. On x86-64 for example o The zone->node field is shared with the zone lock and zone->node is accessed frequently from the page allocator due to the fair zone allocation policy. o span_seqlock is almost never used by shares a line with free_area o Some zone statistics share a cache line with the LRU lock so reclaim-intensive and allocator-intensive workloads can bounce the cache line on a stat update This patch rearranges struct zone to put read-only and read-mostly fields together and then splits the page allocator intensive fields, the zone statistics and the page reclaim intensive fields into their own cache lines. Note that the type of lowmem_reserve changes due to the watermark calculations being signed and avoiding a signed/unsigned conversion there. On the test configuration I used the overall size of struct zone shrunk by one cache line. On smaller machines, this is not likely to be noticable. However, on a 4-node NUMA machine running tiobench the system CPU overhead is reduced by this patch. 3.16.0-rc3 3.16.0-rc3 vanillarearrange-v5r9 User 746.94 759.78 System 65336.22 58350.98 Elapsed 27553.52 27282.02 Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mel Gorman authored
commit 24b7e581 upstream. This was formerly the series "Improve sequential read throughput" which noted some major differences in performance of tiobench since 3.0. While there are a number of factors, two that dominated were the introduction of the fair zone allocation policy and changes to CFQ. The behaviour of fair zone allocation policy makes more sense than tiobench as a benchmark and CFQ defaults were not changed due to insufficient benchmarking. This series is what's left. It's one functional fix to the fair zone allocation policy when used on NUMA machines and a reduction of overhead in general. tiobench was used for the comparison despite its flaws as an IO benchmark as in this case we are primarily interested in the overhead of page allocator and page reclaim activity. On UMA, it makes little difference to overhead 3.16.0-rc3 3.16.0-rc3 vanilla lowercost-v5 User 383.61 386.77 System 403.83 401.74 Elapsed 5411.50 5413.11 On a 4-socket NUMA machine it's a bit more noticable 3.16.0-rc3 3.16.0-rc3 vanilla lowercost-v5 User 746.94 802.00 System 65336.22 40852.33 Elapsed 27553.52 27368.46 This patch (of 6): The LRU insertion and activate tracepoints take PFN as a parameter forcing the overhead to the caller. Move the overhead to the tracepoint fast-assign method to ensure the cost is only incurred when the tracepoint is active. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jerome Marchand authored
commit 2ab051e1 upstream. When memory cgoups are enabled, the code that decides to force to scan anonymous pages in get_scan_count() compares global values (free, high_watermark) to a value that is restricted to a memory cgroup (file). It make the code over-eager to force anon scan. For instance, it will force anon scan when scanning a memcg that is mainly populated by anonymous page, even when there is plenty of file pages to get rid of in others memcgs, even when swappiness == 0. It breaks user's expectation about swappiness and hurts performance. This patch makes sure that forced anon scan only happens when there not enough file pages for the all zone, not just in one random memcg. [hannes@cmpxchg.org: cleanups] Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Joonsoo Kim authored
commit 474750ab upstream. Richard Yao reported a month ago that his system have a trouble with vmap_area_lock contention during performance analysis by /proc/meminfo. Andrew asked why his analysis checks /proc/meminfo stressfully, but he didn't answer it. https://lkml.org/lkml/2014/4/10/416 Although I'm not sure that this is right usage or not, there is a solution reducing vmap_area_lock contention with no side-effect. That is just to use rcu list iterator in get_vmalloc_info(). rcu can be used in this function because all RCU protocol is already respected by writers, since Nick Piggin commit db64fe02 ("mm: rewrite vmap layer") back in linux-2.6.28 Specifically : insertions use list_add_rcu(), deletions use list_del_rcu() and kfree_rcu(). Note the rb tree is not used from rcu reader (it would not be safe), only the vmap_area_list has full RCU protection. Note that __purge_vmap_area_lazy() already uses this rcu protection. rcu_read_lock(); list_for_each_entry_rcu(va, &vmap_area_list, list) { if (va->flags & VM_LAZY_FREE) { if (va->va_start < *start) *start = va->va_start; if (va->va_end > *end) *end = va->va_end; nr += (va->va_end - va->va_start) >> PAGE_SHIFT; list_add_tail(&va->purge_list, &valist); va->flags |= VM_LAZY_FREEING; va->flags &= ~VM_LAZY_FREE; } } rcu_read_unlock(); Peter: : While rcu list traversal over the vmap_area_list is safe, this may : arrive at different results than the spinlocked version. The rcu list : traversal version will not be a 'snapshot' of a single, valid instant : of the entire vmap_area_list, but rather a potential amalgam of : different list states. Joonsoo: : Yes, you are right, but I don't think that we should be strict here. : Meminfo is already not a 'snapshot' at specific time. While we try to get : certain stats, the other stats can change. And, although we may arrive at : different results than the spinlocked version, the difference would not be : large and would not make serious side-effect. [edumazet@google.com: add more commit description] Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reported-by: Richard Yao <ryao@gentoo.org> Acked-by: Eric Dumazet <edumazet@google.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Zhang Yanfei <zhangyanfei.yes@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jerome Marchand authored
commit 21bda264 upstream. Commit 71e3aac0 ("thp: transparent hugepage core") adds copy_pte_range prototype to huge_mm.h. I'm not sure why (or if) this function have been used outside of memory.c, but it currently isn't. This patch makes copy_pte_range() static again. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David Rientjes authored
commit 14a4e214 upstream. Commit 9f1b868a ("mm: thp: khugepaged: add policy for finding target node") improved the previous khugepaged logic which allocated a transparent hugepages from the node of the first page being collapsed. However, it is still possible to collapse pages to remote memory which may suffer from additional access latency. With the current policy, it is possible that 255 pages (with PAGE_SHIFT == 12) will be collapsed remotely if the majority are allocated from that node. When zone_reclaim_mode is enabled, it means the VM should make every attempt to allocate locally to prevent NUMA performance degradation. In this case, we do not want to collapse hugepages to remote nodes that would suffer from increased access latency. Thus, when zone_reclaim_mode is enabled, only allow collapsing to nodes with RECLAIM_DISTANCE or less. There is no functional change for systems that disable zone_reclaim_mode. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Bob Liu <bob.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Hugh Dickins authored
commit c0d73261 upstream. Use ACCESS_ONCE() in handle_pte_fault() when getting the entry or orig_pte upon which all subsequent decisions and pte_same() tests will be made. I have no evidence that its lack is responsible for the mm/filemap.c:202 BUG_ON(page_mapped(page)) in __delete_from_page_cache() found by trinity, and I am not optimistic that it will fix it. But I have found no other explanation, and ACCESS_ONCE() here will surely not hurt. If gcc does re-access the pte before passing it down, then that would be disastrous for correct page fault handling, and certainly could explain the page_mapped() BUGs seen (concurrent fault causing page to be mapped in a second time on top of itself: mapcount 2 for a single pte). Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-