- 28 Oct, 2016 21 commits
-
-
Helge Deller authored
commit 690d097c upstream. Increase the initial kernel default page mapping size for SMP kernels to 32MB and add a runtime check which panics early if the kernel is bigger than the initial mapping size. This fixes boot crashes of 32bit SMP kernels. Due to the introduction of huge page support in kernel 4.4 and its required initial kernel layout in memory, a 32bit SMP kernel usually got bigger (in layout, not size) than 16MB. Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sergey Senozhatsky authored
commit c6fe46a7 upstream. 'best' is always less than or equal to 'pos', so `best - pos' returns a negative value which is then cast to `unsigned int' and passed to __cpufreq_driver_target()->acpi_cpufreq_target() for policy->freq_table selection. This results in BUG: unable to handle kernel paging request at ffff881019b469f8 IP: [<ffffffffa00356c1>] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq] PGD 267f067 PUD 0 Oops: 0000 [#1] PREEMPT SMP CPU: 6 PID: 70 Comm: kworker/6:1 Not tainted 4.9.0-rc1-next-20161017-dbg-dirty Workqueue: events dbs_work_handler task: ffff88041b808000 task.stack: ffff88041b810000 RIP: 0010:[<ffffffffa00356c1>] [<ffffffffa00356c1>] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq] RSP: 0018:ffff88041b813c60 EFLAGS: 00010282 RAX: ffff880419b46a00 RBX: ffff88041b848400 RCX: ffff880419b20f80 RDX: 00000000001dff38 RSI: 00000000ffffffff RDI: ffff88041b848400 RBP: ffff88041b813cb0 R08: 0000000000000006 R09: 0000000000000040 R10: ffffffff8207f9e0 R11: ffffffff8173595b R12: 0000000000000000 R13: ffff88041f1dff38 R14: 0000000000262900 R15: 0000000bfffffff4 FS: 0000000000000000(0000) GS:ffff88041f000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff881019b469f8 CR3: 000000041a2d3000 CR4: 00000000001406e0 Stack: ffff88041b813cb0 ffffffff813347f9 ffff88041b813ca0 ffffffff81334663 ffff88041f1d4bc0 ffff88041b848400 0000000000000000 0000000000000000 0000000000262900 0000000000000000 ffff88041b813d00 ffffffff813355dc Call Trace: [<ffffffff813347f9>] ? cpufreq_freq_transition_begin+0xf1/0xfc [<ffffffff81334663>] ? get_cpu_idle_time+0x97/0xa6 [<ffffffff813355dc>] __cpufreq_driver_target+0x3b6/0x44e [<ffffffff81336ca3>] cs_dbs_timer+0x11a/0x135 [<ffffffff81336fda>] dbs_work_handler+0x39/0x62 [<ffffffff81057823>] process_one_work+0x280/0x4a5 [<ffffffff81058719>] worker_thread+0x24f/0x397 [<ffffffff810584ca>] ? rescuer_thread+0x30b/0x30b [<ffffffff81418380>] ? nl80211_get_key+0x29/0x36a [<ffffffff8105d2b7>] kthread+0xfc/0x104 [<ffffffff8107ceea>] ? put_lock_stats.isra.9+0xe/0x20 [<ffffffff8105d1bb>] ? kthread_create_on_node+0x3f/0x3f [<ffffffff814b2092>] ret_from_fork+0x22/0x30 Code: 56 4d 6b ff 0c 41 55 41 54 53 48 83 ec 28 48 8b 15 ad 1e 00 00 44 8b 41 08 48 8b 87 c8 00 00 00 49 89 d5 4e 03 2c c5 80 b2 78 81 <46> 8b 74 38 04 45 3b 75 00 75 11 31 c0 83 39 00 0f 84 1c 01 00 RIP [<ffffffffa00356c1>] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq] RSP <ffff88041b813c60> CR2: ffff881019b469f8 ---[ end trace 16d9fc7a17897d37 ]--- [ rjw: In some cases this bug may also cause incorrect frequencies to be selected by cpufreq governors. ] Fixes: 899bb664 (cpufreq: skip invalid entries when searching the frequency) Link: http://marc.info/?l=linux-kernel&m=147672030714331&w=2 Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com> Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
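The conversion described in this commit message can be reproduced in user space. The following is a minimal sketch only: index_from_difference() is a hypothetical helper written for illustration, not the kernel function, and it assumes nothing beyond standard C integer conversion rules.

```c
#include <limits.h>

/*
 * Hypothetical helper (not kernel code): mimics passing `best - pos`
 * to a parameter of type unsigned int.
 */
static unsigned int index_from_difference(int best, int pos)
{
    /*
     * When best < pos the subtraction is negative, and converting a
     * negative int to unsigned int wraps modulo UINT_MAX + 1.
     */
    return (unsigned int)(best - pos);
}
```

With best = 2 and pos = 3 the helper returns UINT_MAX (0xffffffff when unsigned int is 32 bits wide), which appears consistent with the value 00000000ffffffff shown in RSI in the register dump. Used as a table index, such a value points far outside the frequency table.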
-
Srinivas Pandruvada authored
commit f9f4872d upstream. It is a requirement that the MSR_PM_ENABLE MSR must be set to 0x01 before reading MSR_HWP_CAPABILITIES on a given CPU. If cpufreq init() is scheduled on a CPU which is not the same as policy->cpu or migrates to a different CPU before calling msr read for MSR_HWP_CAPABILITIES, it is possible that MSR_PM_ENABLE was not set to 0x01 on that CPU. This will cause a GP fault. So, like other places in this path, rdmsrl_on_cpu should be used instead of rdmsrl. Moreover, the scope of MSR_HWP_CAPABILITIES is on a per-thread basis, so it should be read from the same CPU for which MSR_HWP_REQUEST is getting set. dmesg dump or warning: [ 22.014488] WARNING: CPU: 139 PID: 1 at arch/x86/mm/extable.c:50 ex_handler_rdmsr_unsafe+0x68/0x70 [ 22.014492] unchecked MSR access error: RDMSR from 0x771 [ 22.014493] Modules linked in: [ 22.014507] CPU: 139 PID: 1 Comm: swapper/0 Not tainted 4.7.5+ #1 ... ... [ 22.014516] Call Trace: [ 22.014542] [<ffffffff813d7dd1>] dump_stack+0x63/0x82 [ 22.014558] [<ffffffff8107bc8b>] __warn+0xcb/0xf0 [ 22.014561] [<ffffffff8107bcff>] warn_slowpath_fmt+0x4f/0x60 [ 22.014563] [<ffffffff810676f8>] ex_handler_rdmsr_unsafe+0x68/0x70 [ 22.014564] [<ffffffff810677d9>] fixup_exception+0x39/0x50 [ 22.014604] [<ffffffff8102e400>] do_general_protection+0x80/0x150 [ 22.014610] [<ffffffff817f9ec8>] general_protection+0x28/0x30 [ 22.014635] [<ffffffff81687940>] ? get_target_pstate_use_performance+0xb0/0xb0 [ 22.014642] [<ffffffff810600c7>] ? native_read_msr+0x7/0x40 [ 22.014657] [<ffffffff81688123>] intel_pstate_hwp_set+0x23/0x130 [ 22.014660] [<ffffffff81688406>] intel_pstate_set_policy+0x1b6/0x340 [ 22.014662] [<ffffffff816829bb>] cpufreq_set_policy+0xeb/0x2c0 [ 22.014664] [<ffffffff81682f39>] cpufreq_init_policy+0x79/0xe0 [ 22.014666] [<ffffffff81682cb0>] ? cpufreq_update_policy+0x120/0x120 [ 22.014669] [<ffffffff816833a6>] cpufreq_online+0x406/0x820 [ 22.014671] [<ffffffff8168381f>] cpufreq_add_dev+0x5f/0x90 [ 22.014717] [<ffffffff81530ac8>] subsys_interface_register+0xb8/0x100 [ 22.014719] [<ffffffff816821bc>] cpufreq_register_driver+0x14c/0x210 [ 22.014749] [<ffffffff81fe1d90>] intel_pstate_init+0x39d/0x4d5 [ 22.014751] [<ffffffff81fe13f2>] ? cpufreq_gov_dbs_init+0x12/0x12 Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Aaro Koskinen authored
commit 899bb664 upstream. Skip invalid entries when searching the frequency. This fixes cpufreq at least on loongson2 MIPS board. Fixes: da0c6dc0 (cpufreq: Handle sorted frequency tables more efficiently) Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Rafael J. Wysocki authored
commit abb66279 upstream. Commit d352cf47 (cpufreq: conservative: Do not use transition notifications) overlooked the case when the "frequency step" used by the conservative governor is small relative to the distances between the available frequencies and broke the algorithm by using policy->cur instead of the previously requested frequency when computing the next one. As a result, the governor may not be able to go outside of a narrow range between two consecutive available frequencies. Fix the problem by making the governor save the previously requested frequency and select the next one relative to that value (unless it is out of range, in which case policy->cur will be used instead). Fixes: d352cf47 (cpufreq: conservative: Do not use transition notifications) Link: https://bugzilla.kernel.org/show_bug.cgi?id=177171 Reported-and-tested-by: Aleksey Rybalkin <aleksey@rybalkin.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
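The effect described in this commit message can be illustrated with a small user-space simulation. This is a sketch under stated assumptions, not the governor code: resolve_at_or_below() and ramp_up() are hypothetical names, the driver is assumed to pick the highest available frequency at or below the request, and an out-of-range request is simply clamped to the table maximum rather than handled as in the real fix.

```c
/*
 * Hypothetical model of frequency resolution: highest table entry at or
 * below the target (table sorted ascending, target >= table[0]).
 */
static unsigned int resolve_at_or_below(const unsigned int *table, int n,
                                        unsigned int target)
{
    unsigned int best = table[0];

    for (int i = 0; i < n; i++)
        if (table[i] <= target)
            best = table[i];
    return best;
}

/*
 * Raise the frequency `iterations` times by `step`.
 * remember_request == 0: each request is computed from the current
 *                        frequency (the broken behaviour described above).
 * remember_request != 0: each request is computed from the previously
 *                        requested frequency (the idea behind the fix).
 */
static unsigned int ramp_up(const unsigned int *table, int n,
                            unsigned int step, int iterations,
                            int remember_request)
{
    unsigned int cur = table[0];
    unsigned int requested = cur;
    unsigned int max = table[n - 1];

    for (int i = 0; i < iterations; i++) {
        unsigned int base = remember_request ? requested : cur;

        requested = base + step;
        if (requested > max)
            requested = max;   /* simplification: clamp to the maximum */
        cur = resolve_at_or_below(table, n, requested);
    }
    return cur;
}
```

With a table of {1000, 2000} and a step of 300, the variant based on the current frequency keeps requesting 1300 and stays at 1000, while the variant that remembers its last request reaches 2000 after four iterations.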
-
Dave Gerlach authored
commit e01072d2 upstream. Now that the cpufreq-dt-platdev is used to create the cpufreq-dt platform device for all OMAP platforms and the platform code that did it before has been removed, add ti,am33xx and ti,dra7xx to the machine list in cpufreq-dt-platdev which had relied on the removed platform code to do this previously. Fixes: 7694ca6e (cpufreq: omap: Use generic platdev driver) Signed-off-by: Dave Gerlach <d-gerlach@ti.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sergei Shtylyov authored
commit e330b9a6 upstream. of_irq_get[_byname]() return 0 iff irq_create_of_mapping() call fails. Returning both error code and 0 on failure is a sign of a misdesigned API, it makes the failure check unnecessarily complex and error prone. We should rely on the platform IRQ resource in this case, not return 0, especially as 0 can be a valid IRQ resource too... Fixes: aff008ad ("platform_get_irq: Revert to platform_get_resource if of_irq_get fails") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Bjorn Helgaas authored
commit 8dd99bca upstream. The tegra_pcie_phy_disable() path called pads_writel() with arguments in the wrong order. Swap them to be the "value, offset" order expected by pads_writel(). Fixes: 6fe7c187 ("PCI: tegra: Support per-lane PHYs") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Maik Broemme authored
commit 8e2e0317 upstream. Similar to the AR93xx and the AR94xx series, the AR95xx series also has the same quirk for the Bus Reset. It will lead to instant system reset if the device is assigned via VFIO to a KVM VM. I've been able to reproduce this behavior with a MikroTik R11e-2HnD. Fixes: c3e59ee4 ("PCI: Mark Atheros AR93xx to avoid bus reset") Signed-off-by: Maik Broemme <mbroemme@libmpq.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Haibo Chen authored
commit 02265cd6 upstream. Potentially overflowing expression 1000000 * data->timeout_clks with type unsigned int is evaluated using 32-bit arithmetic, and then used in a context that expects an expression of type unsigned long long. To avoid overflow, cast 1000000U to type unsigned long long. Special thanks to Coverity. Fixes: 7f05538a ("mmc: sdhci: fix data timeout (part 2)") Signed-off-by: Haibo Chen <haibo.chen@nxp.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
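The overflow described in this commit message can be shown in isolation. A minimal sketch, assuming a 32-bit unsigned int; the two function names are hypothetical and only stand in for the expression quoted in the commit message.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the problematic expression: both operands
 * are 32-bit, so the product wraps modulo 2^32 before it is widened.
 */
static uint64_t timeout_product_32bit(uint32_t timeout_clks)
{
    return (uint32_t)1000000 * timeout_clks;
}

/*
 * Hypothetical stand-in for the corrected expression: one operand is
 * widened first, so the multiplication itself is done in 64 bits.
 */
static uint64_t timeout_product_64bit(uint32_t timeout_clks)
{
    return (unsigned long long)1000000U * timeout_clks;
}
```

For timeout_clks = 5000 the mathematically correct product is 5000000000, which does not fit in 32 bits; the 32-bit variant yields 705032704 instead, while the widened variant yields the full value.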
-
Daniel Glöckner authored
commit 0ed50abb upstream. CMD23 aka SET_BLOCK_COUNT was introduced with MMC v3.1. Older versions of the specification only allowed multi-block transfers to be terminated with CMD12. The patch fixes the following problem: mmc0: new MMC card at address 0001 mmcblk0: mmc0:0001 SDMB-16 15.3 MiB mmcblk0: timed out sending SET_BLOCK_COUNT command, card status 0x400900 ... blk_update_request: I/O error, dev mmcblk0, sector 0 Buffer I/O error on dev mmcblk0, logical block 0, async page read mmcblk0: unable to read partition table Signed-off-by: Daniel Glöckner <dg@emlix.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Larry Finger authored
commit 0c9d3491 upstream. Some RTL8821AE devices sold in Great Britain have the country code of 0x25 encoded in their EEPROM. This value is not tested in the routine that establishes the regulatory info for the chip. The fix is to set this code to have the same capabilities as the EU countries. In addition, the channels allowed for COUNTRY_CODE_ETSI were more properly suited for China and Israel, not the EU. This problem has also been fixed. Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Rajkumar Manoharan authored
commit 0628467f upstream. The firmware runs a watchdog timer that tracks the copy engine ring index and write index. Whenever both indices are stuck at the same location for a given duration, the watchdog is triggered to assert the target. While updating the copy engine destination ring write index, the driver ensures that the write index will not be the same as the read index by finding the delta between these two indices (CE_RING_DELTA). The HTT target-to-host copy engine (CE5) is a special case where ring buffers are reused and the delta check is not applied while updating the write index. In a rare scenario, whenever the CE5 ring is full, both indices refer to the same location, and this causes the CE ring stuck issue explained above. This issue was originally reported on IPQ4019 during long-hour stress testing and during veriwave max clients testsuites. The same issue is also observed on other chips. Fix this by ensuring that the write index is one less than the read index, which means that the full ring is available for receiving data. Tested-by: Tamizh chelvam <c_traja@qti.qualcomm.com> Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
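The index arithmetic described in this commit message can be sketched as follows. This is an illustration only: ring_delta() and write_index_behind_read() are hypothetical helpers, and the ring size is assumed to be a power of two so that indices can be wrapped with a mask.

```c
#include <stdint.h>

/*
 * Hypothetical helper: distance from `from` to `to` on a ring whose
 * size is a power of two (nentries_mask == nentries - 1).
 */
static uint32_t ring_delta(uint32_t nentries_mask, uint32_t from, uint32_t to)
{
    return (to - from) & nentries_mask;
}

/*
 * Hypothetical helper: the write index that sits one slot behind the
 * read index, so the two indices never refer to the same location.
 */
static uint32_t write_index_behind_read(uint32_t nentries_mask,
                                        uint32_t read_index)
{
    /* Unsigned subtraction wraps, so read_index == 0 maps to the last slot. */
    return (read_index - 1) & nentries_mask;
}
```

For an 8-entry ring (mask 7), a read index of 0 gives a write index of 7 and a read index of 5 gives 4, so a watchdog comparing the two indices never sees them equal in this state.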
-
Lin Huang authored
commit c8a9a6da upstream. devfreq-event.h defines the devfreq_event_get_drvdata() function twice when CONFIG_PM_DEVFREQ_EVENT is disabled, which leads to a build failure. So remove the extra devfreq_event_get_drvdata() function. Fixes: f262f28c ("PM / devfreq: event: Add devfreq_event class") Signed-off-by: Lin Huang <hl@rock-chips.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Geert Uytterhoeven authored
commit 0278b34b upstream. Sometimes spidev_test crashes with: *** Error in `spidev_test': munmap_chunk(): invalid pointer: 0x00022020 *** Aborted or just Segmentation fault This is due to transfer_escaped_string() miscalculating the required size of the buffer by one byte, causing a buffer overflow in unescape(). Drop the bogus "+ 1" in the strlen() parameter to fix this. Note that unescape() never copies the zero-terminator of the source string, so it writes at most as many bytes as the length of the source string. Fixes: 30061915 (spi: spidev_test: Added input buffer from the terminal) Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
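The sizing argument in this commit message can be checked with a simplified model. The following sketch is not the spidev_test source: unescape_sketch() is a hypothetical, reduced version of an unescape routine that turns each "\xHH" sequence into one byte and, as the message notes, never copies the terminating NUL.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical, simplified unescape(): copies src to dst and converts
 * each "\xHH" sequence into a single byte. The terminating NUL is not
 * copied, so at most strlen(src) bytes are written to dst.
 */
static size_t unescape_sketch(unsigned char *dst, const char *src)
{
    size_t written = 0;
    unsigned int ch;

    while (*src) {
        if (src[0] == '\\' && src[1] == 'x' &&
            isxdigit((unsigned char)src[2]) &&
            isxdigit((unsigned char)src[3])) {
            sscanf(src + 2, "%2x", &ch);       /* parse the two hex digits */
            dst[written++] = (unsigned char)ch;
            src += 4;                          /* skip "\xHH" */
        } else {
            dst[written++] = (unsigned char)*src++;
        }
    }
    return written;
}
```

For a plain string without escapes the routine writes exactly strlen(src) bytes, so a buffer of strlen(src) bytes is sufficient, whereas a buffer sized with strlen(src + 1) is one byte too small for any non-empty input.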
-
Lucas Stach authored
commit b1d51b44 upstream. The current clock tree only implements the minimal set of differences between the i.MX6Q and the i.MX6DL, but that doesn't really reflect reality. Apply the following fixes to match the RM: - DL has no GPU3D_SHADER_SEL/PODF, the shader domain is clocked by GPU3D_CORE - GPU3D_SHADER_SEL/PODF has been repurposed as GPU2D_CORE_SEL/PODF - GPU2D_CORE_SEL/PODF has been repurposed as MLB_SEL/PODF Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lucas Stach authored
commit d8846023 upstream. Initialize the GPU clock muxes to sane inputs. Until now they have not been changed from their default values, which means that both GPU3D shader and GPU2D core were fed by clock inputs whose rates exceed the maximum allowed frequency of the cores by as much as 200MHz. This fixes a severe GPU stability issue on i.MX6DL. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jan Remmet authored
commit 8f9165c9 upstream. http://www.ti.com/lit/pdf/SWCZ010: DCDC o/p voltage can go higher than programmed value Impact: VDDI, VDD2, and VIO output programmed voltage level can go higher than expected or crash, when coming out of PFM to PWM mode or using DVFS. Description: When DCDC CLK SYNC bits are 11/01: * VIO 3-MHz oscillator is the source clock of the digital core and input clock of VDD1 and VDD2 * Turn-on of VDD1 and VDD2 HSD PFET is synchronized or at a constant phase shift * Current pulled through VCC1+VCC2 is Iload(VDD1) + Iload(VDD2) * The 3 HSD PFET will be turned-on at the same time, causing the highest possible switching noise on the application. This noise level depends on the layout, the VBAT level, and the load current. The noise level increases with improper layout. When DCDC CLK SYNC bits are 00: * VIO 3-MHz oscillator is the source clock of digital core * VDD1 and VDD2 are running on their own 3-MHz oscillator * Current pulled through VCC1+VCC2 average of Iload(VDD1) + Iload(VDD2) * The switching noise of the 3 SMPS will be randomly spread over time, causing lower overall switching noise. Workaround: Set DCDCCTRL_REG[1:0]= 00. Signed-off-by: Jan Remmet <j.remmet@phytec.de> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
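The workaround quoted in this commit message amounts to clearing the two lowest bits of the register value. A minimal sketch, assuming an 8-bit register; DCDC_CLK_SYNC_MASK and clear_dcdc_clk_sync() are hypothetical names, and the actual register access (bus read and write) is left out.

```c
#include <stdint.h>

/* Hypothetical mask name for bits [1:0] of DCDCCTRL_REG. */
#define DCDC_CLK_SYNC_MASK 0x03u

/*
 * Hypothetical helper: returns the register value with bits [1:0]
 * cleared and all other bits left unchanged.
 */
static uint8_t clear_dcdc_clk_sync(uint8_t dcdcctrl)
{
    return (uint8_t)(dcdcctrl & ~DCDC_CLK_SYNC_MASK);
}
```

A driver would typically apply this as a read-modify-write so that the remaining bits of the register keep their current settings.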
-
Alexander Usyskin authored
commit ac182e8a upstream. Add device ids for Intel Kabypoint PCH (Kabylake) Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tomas Winkler authored
commit 2d4d5481 upstream. Correct errno on client disconnection is -ENODEV not -EBUSY Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Liu Gang authored
commit d71cf15b upstream. From the beginning of the gpio-mpc8xxx.c, the "handle_level_irq" has been used to handle GPIO interrupts in the PowerPC/Layerscape platforms. But actually, almost all PowerPC/Layerscape platforms assert an interrupt request upon either a high-to-low change or any change on the state of the signal. So the "handle_level_irq" is not reasonable for PowerPC/Layerscape GPIO interrupts; it should be "handle_edge_irq". Otherwise the system may lose some interrupts from the PIN's state changes. Signed-off-by: Liu Gang <Gang.Liu@nxp.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 22 Oct, 2016 19 commits
-
-
Greg Kroah-Hartman authored
-
Glauber Costa authored
commit 3932a86b upstream. While debugging timeouts happening in my application workload (ScyllaDB), I have observed calls to open() taking a long time, ranging everywhere from 2 seconds - the first ones that are enough to time out my application - to more than 30 seconds. The problem seems to happen because XFS may block on pending metadata updates under certain circumstances, and that's confirmed with the following backtrace taken by the offcputime tool (iovisor/bcc): ffffffffb90c57b1 finish_task_switch ffffffffb97dffb5 schedule ffffffffb97e310c schedule_timeout ffffffffb97e1f12 __down ffffffffb90ea821 down ffffffffc046a9dc xfs_buf_lock ffffffffc046abfb _xfs_buf_find ffffffffc046ae4a xfs_buf_get_map ffffffffc046babd xfs_buf_read_map ffffffffc0499931 xfs_trans_read_buf_map ffffffffc044a561 xfs_da_read_buf ffffffffc0451390 xfs_dir3_leaf_read.constprop.16 ffffffffc0452b90 xfs_dir2_leaf_lookup_int ffffffffc0452e0f xfs_dir2_leaf_lookup ffffffffc044d9d3 xfs_dir_lookup ffffffffc047d1d9 xfs_lookup ffffffffc0479e53 xfs_vn_lookup ffffffffb925347a path_openat ffffffffb9254a71 do_filp_open ffffffffb9242a94 do_sys_open ffffffffb9242b9e sys_open ffffffffb97e42b2 entry_SYSCALL_64_fastpath 00007fb0698162ed [unknown] Inspecting my run with blktrace, I can see that the xfsaild kthread exhibits very high "Dispatch wait" times, in the dozens of seconds range and consistent with the open() times I have seen in that run.
Still from the blktrace output, we can after searching a bit, identify the request that wasn't dispatched: 8,0 11 152 81.092472813 804 A WM 141698288 + 8 <- (8,1) 141696240 8,0 11 153 81.092472889 804 Q WM 141698288 + 8 [xfsaild/sda1] 8,0 11 154 81.092473207 804 G WM 141698288 + 8 [xfsaild/sda1] 8,0 11 206 81.092496118 804 I WM 141698288 + 8 ( 22911) [xfsaild/sda1] <==== 'I' means Inserted (into the IO scheduler) ===================================> 8,0 0 289372 96.718761435 0 D WM 141698288 + 8 (15626265317) [swapper/0] <==== Only 15s later the CFQ scheduler dispatches the request ======================> As we can see above, in this particular example CFQ took 15 seconds to dispatch this request. Going back to the full trace, we can see that the xfsaild queue had plenty of opportunity to run, and it was selected as the active queue many times. It would just always be preempted by something else (example): 8,0 1 0 81.117912979 0 m N cfq1618SN / insert_request 8,0 1 0 81.117913419 0 m N cfq1618SN / add_to_rr 8,0 1 0 81.117914044 0 m N cfq1618SN / preempt 8,0 1 0 81.117914398 0 m N cfq767A / slice expired t=1 8,0 1 0 81.117914755 0 m N cfq767A / resid=40 8,0 1 0 81.117915340 0 m N / served: vt=1948520448 min_vt=1948520448 8,0 1 0 81.117915858 0 m N cfq767A / sl_used=1 disp=0 charge=0 iops=1 sect=0 where cfq767 is the xfsaild queue and cfq1618 corresponds to one of the ScyllaDB IO dispatchers. The requests preempting the xfsaild queue are synchronous requests. That's a characteristic of ScyllaDB workloads, as we only ever issue O_DIRECT requests. While it can be argued that preempting ASYNC requests in favor of SYNC is part of the CFQ logic, I don't believe that doing so for 15+ seconds is anyone's goal. Moreover, unless I am misunderstanding something, that breaks the expectation set by the "fifo_expire_async" tunable, which in my system is set to the default. 
Looking at the code, it seems to me that the issue is that after we make an async queue active, there is no guarantee that it will execute any request. When the queue itself tests if it cfq_may_dispatch() it can bail if it sees SYNC requests in flight. An incoming request from another queue can also preempt it in such situation before we have the chance to execute anything (as seen in the trace above). This patch sets the must_dispatch flag if we notice that we have requests that are already fifo_expired. This flag is always cleared after cfq_dispatch_request() returns from cfq_dispatch_requests(), so it won't pin the queue for subsequent requests (unless they are themselves expired) Care is taken during preempt to still allow rt requests to preempt us regardless. Testing my workload with this patch applied produces much better results. From the application side I see no timeouts, and the open() latency histogram generated by systemtap looks much better, with the worst outlier at 131ms: Latency histogram of xfs_buf_lock acquisition (microseconds): value |-------------------------------------------------- count 0 | 11 1 |@@@@ 161 2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1966 4 |@ 54 8 | 36 16 | 7 32 | 0 64 | 0 ~ 1024 | 0 2048 | 0 4096 | 1 8192 | 1 16384 | 2 32768 | 0 65536 | 0 131072 | 1 262144 | 0 524288 | 0 Signed-off-by: Glauber Costa <glauber@scylladb.com> CC: Jens Axboe <axboe@kernel.dk> CC: linux-block@vger.kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: Glauber Costa <glauber@scylladb.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vishal Verma authored
commit c09f1218 upstream. Commit 20985164 "acpi: nfit: Add support for hot-add" added support for _FIT notifications, but it neglected to verify the notification event code matches the one in the ACPI spec for "NFIT Update". Currently there is only one code in the spec, but once additional codes are added, older kernels (without this fix) will misbehave by assuming all event notifications are for an NFIT Update. Fixes: 20985164 ("acpi: nfit: Add support for hot-add") Cc: <linux-acpi@vger.kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Reported-by: Linda Knippers <linda.knippers@hpe.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Laszlo Ersek authored
commit c2cbc38b upstream. Before commit a3257256 ("drm: Lobotomize set_busid nonsense for !pci drivers"), several DRM drivers for platform devices used to expose an explicit "drm_driver.set_busid" callback, invariably backed by drm_platform_set_busid(). Commit a3257256 removed drm_platform_set_busid(), along with the referring .set_busid field initializations. This was justified because interchangeable functionality had been implemented in drm_dev_alloc() / drm_dev_init(), which DRM_IOCTL_SET_VERSION would rely on going forward. However, commit a3257256 also removed drm_virtio_set_busid(), for which the same consolidation was not appropriate: this .set_busid callback had been implemented with drm_pci_set_busid(), and not drm_platform_set_busid(). The error regressed Xorg/xserver on QEMU's "virtio-vga" card; the drmGetBusid() function from libdrm would no longer return stable PCI identifiers like "pci:0000:00:02.0", but rather unstable platform ones like "virtio0". Reinstate drm_virtio_set_busid() with judicious use of git checkout -p a3257256^ -- drivers/gpu/drm/virtio Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Emil Velikov <emil.l.velikov@gmail.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Joachim Frieben <jfrieben@hotmail.com> Reported-by: Joachim Frieben <jfrieben@hotmail.com> Fixes: a3257256 Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1366842 Signed-off-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David Howells authored
commit a818101d upstream. An NULL-pointer dereference happens in cachefiles_mark_object_inactive() when it tries to read i_blocks so that it can tell the cachefilesd daemon how much space it's making available. The problem is that cachefiles_drop_object() calls cachefiles_mark_object_inactive() after calling cachefiles_delete_object() because the object being marked active staves off attempts to (re-)use the file at that filename until after it has been deleted. This means that d_inode is NULL by the time we come to try to access it. To fix the problem, have the caller of cachefiles_mark_object_inactive() supply the number of blocks freed up. Without this, the following oops may occur: BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 IP: [<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles] ... CPU: 11 PID: 527 Comm: kworker/u64:4 Tainted: G I ------------ 3.10.0-470.el7.x86_64 #1 Hardware name: Hewlett-Packard HP Z600 Workstation/0B54h, BIOS 786G4 v03.19 03/11/2011 Workqueue: fscache_object fscache_object_work_func [fscache] task: ffff880035edaf10 ti: ffff8800b77c0000 task.ti: ffff8800b77c0000 RIP: 0010:[<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles] RSP: 0018:ffff8800b77c3d70 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8800bf6cc400 RCX: 0000000000000034 RDX: 0000000000000000 RSI: ffff880090ffc710 RDI: ffff8800bf761ef8 RBP: ffff8800b77c3d88 R08: 2000000000000000 R09: 0090ffc710000000 R10: ff51005d2ff1c400 R11: 0000000000000000 R12: ffff880090ffc600 R13: ffff8800bf6cc520 R14: ffff8800bf6cc400 R15: ffff8800bf6cc498 FS: 0000000000000000(0000) GS:ffff8800bb8c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000098 CR3: 00000000019ba000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff880090ffc600 ffff8800bf6cc400 ffff8800867df140 
ffff8800b77c3db0 ffffffffa06c48cb ffff880090ffc600 ffff880090ffc180 ffff880090ffc658 ffff8800b77c3df0 ffffffffa085d846 ffff8800a96b8150 ffff880090ffc600 Call Trace: [<ffffffffa06c48cb>] cachefiles_drop_object+0x6b/0xf0 [cachefiles] [<ffffffffa085d846>] fscache_drop_object+0xd6/0x1e0 [fscache] [<ffffffffa085d615>] fscache_object_work_func+0xa5/0x200 [fscache] [<ffffffff810a605b>] process_one_work+0x17b/0x470 [<ffffffff810a6e96>] worker_thread+0x126/0x410 [<ffffffff810a6d70>] ? rescuer_thread+0x460/0x460 [<ffffffff810ae64f>] kthread+0xcf/0xe0 [<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140 [<ffffffff81695418>] ret_from_fork+0x58/0x90 [<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140 The oopsing code shows: callq 0xffffffff810af6a0 <wake_up_bit> mov 0xf8(%r12),%rax mov 0x30(%rax),%rax mov 0x98(%rax),%rax <---- oops here lock add %rax,0x130(%rbx) where this is: d_backing_inode(object->dentry)->i_blocks Fixes: a5b3a80b (CacheFiles: Provide read-and-reset release counters for cachefilesd) Reported-by: Jianhong Yin <jiyin@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Steve Dickson <steved@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Miklos Szeredi authored
commit f2b20f6e upstream. This fixes a bug where the permission was not properly checked in overlayfs. The testcase is ltp/utimensat01. It is also cleaner and safer to do the permission checking in the vfs helper instead of the caller. This patch introduces an additional ia_valid flag ATTR_TOUCH (since touch(1) is the most obvious user of utimes(NULL)) that is passed into notify_change whenever the conditions for this special permission checking mode are met. Reported-by: Aihua Zhang <zhangaihua1@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Tested-by: Aihua Zhang <zhangaihua1@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marcelo Ricardo Leitner authored
commit 3a8db798 upstream. After backporting commit ee44b4bc ("dlm: use sctp 1-to-1 API") series to a kernel with an older workqueue which didn't use RCU yet, it was noticed that we are freeing the workqueues in dlm_lowcomms_stop() too early as free_conn() will try to access that memory for canceling the queued works if any. This issue was introduced by commit 0d737a8c as before it such attempt to cancel the queued works wasn't performed, so the issue was not present. This patch fixes it by simply inverting the free order. Fixes: 0d737a8c ("dlm: fix race while closing connections") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marcelo Cerri authored
commit 80da44c2 upstream. This patch changes the p8_ghash driver to use ghash-generic as a fixed fallback implementation. This allows the correct value of descsize to be defined directly in its shash_alg structure and avoids problems with incorrect buffer sizes when its state is exported or imported. Reported-by: Jan Stancek <jstancek@redhat.com> Fixes: cc333cd6 ("crypto: vmx - Adding GHASH routines for VMX module") Signed-off-by: Marcelo Cerri <marcelo.cerri@canonical.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Marcelo Cerri authored
commit a397ba82 upstream. Move common values and types used by ghash-generic to a new header file so drivers can directly use ghash-generic as a fallback implementation. Fixes: cc333cd6 ("crypto: vmx - Adding GHASH routines for VMX module") Signed-off-by: Marcelo Cerri <marcelo.cerri@canonical.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jan Kara authored
commit 9b623df6 upstream. When zeroing blocks for DAX allocations, we also have to unmap aliases in the block device mappings. Otherwise writeback can overwrite zeros with stale data from block device page cache. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
gmail authored
commit e81d4477 upstream. The commit 6050d47a: "ext4: bail out from make_indexed_dir() on first error" could end up leaking bh2 in the error path. [ Also avoid renaming bh2 to bh, which just confuses things --tytso ] Signed-off-by: yangsheng <yngsion@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ross Zwisler authored
commit cca32b7e upstream. Currently when doing a DAX hole punch with ext4 we fail to do a writeback. This is because the logic around filemap_write_and_wait_range() in ext4_punch_hole() only looks for dirty page cache pages in the radix tree, not for dirty DAX exceptional entries. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Biggers authored
commit dcce7a46 upstream. This bug was introduced in v4.8-rc1. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Fabian Frederick authored
commit edf15aa1 upstream. Running xfstests generic/013 with kmemleak gives the following: unreferenced object 0xffff8801d3d27de0 (size 96): comm "fsstress", pid 4941, jiffies 4294860168 (age 53.485s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff818eaaf3>] kmemleak_alloc+0x23/0x40 [<ffffffff81179805>] __kmalloc+0xf5/0x1d0 [<ffffffff8122ef5c>] ext4_find_extent+0x1ec/0x2f0 [<ffffffff8123530c>] ext4_insert_range+0x34c/0x4a0 [<ffffffff81235942>] ext4_fallocate+0x4e2/0x8b0 [<ffffffff81181334>] vfs_fallocate+0x134/0x210 [<ffffffff8118203f>] SyS_fallocate+0x3f/0x60 [<ffffffff818efa9b>] entry_SYSCALL_64_fastpath+0x13/0x8f [<ffffffffffffffff>] 0xffffffffffffffff Problem seems mitigated by dropping refs and freeing path when there's no path[depth].p_ext Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
wangguang authored
commit 4e800c03 upstream. Pages clear their buffers after an ext4 delayed block allocation fails; however, this does not clear the pte_dirty flag. If the pages are then unmapped, unmap_page_range() may, according to pte_dirty, try to call __set_page_dirty(), which may lead to the BUG_ON at mpage_prepare_extent_to_map: head = page_buffers(page);. This patch calls clear_page_dirty_for_io() to clear pte_dirty in mpage_release_unused_pages() for mmapped pages. Steps to reproduce the bug: (1) mmap a file in ext4: addr = (char *)mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); memset(addr, 'i', 4096); (2) return EIO at ext4_writepages->mpage_map_and_submit_extent->mpage_map_one_extent, which causes this log message to be printed: ext4_msg(sb, KERN_CRIT, "Delayed block allocation failed for " "inode %lu at logical offset %llu with" " max blocks %u with error %d", inode->i_ino, (unsigned long long)map->m_lblk, (unsigned)map->m_len, -err); (3) unmapping addr causes a warning at __set_page_dirty: WARN_ON_ONCE(warn && !PageUptodate(page)); (4) wait for a minute, then the BUG_ON triggers. Signed-off-by: wangguang <wangguang03@zte.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
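Steps (1) and (3) of the reproduction are ordinary user-space calls and can be sketched as below. This is a minimal sketch under assumptions, not the reporter's test program: `dirty_mapped_page()` and `SKETCH_LEN` are hypothetical names, and the `ftruncate()` call is an addition so that the file is guaranteed to cover the mapping. Step (2), forcing EIO inside ext4_writepages(), needs kernel-side fault injection and is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SKETCH_LEN 4096 /* one page, as in the commit message */

/*
 * Hypothetical helper: map the first page of `path` shared and writable,
 * dirty it through the mapping, then unmap it.
 * Returns 0 on success, -1 on any failure.
 */
static int dirty_mapped_page(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* Assumption: size the file so stores into the mapping do not SIGBUS. */
    if (ftruncate(fd, SKETCH_LEN) != 0) {
        close(fd);
        return -1;
    }

    /* Step (1): shared writable mapping, then dirty every byte of the page. */
    char *addr = mmap(NULL, SKETCH_LEN, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memset(addr, 'i', SKETCH_LEN);

    /* Step (3): unmapping is where the dirty PTE state gets examined. */
    int rc = munmap(addr, SKETCH_LEN);
    close(fd);
    return rc;
}
```

On a kernel without the injected allocation failure this simply succeeds; the warning and BUG_ON described above only appear when writeback of the dirtied page fails in between.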
-
Daeho Jeong authored
commit 93e3b4e6 upstream. Now, ext4_do_update_inode() clears high 16-bit fields of uid/gid of deleted and evicted inode to fix up interoperability with old kernels. However, it checks only i_dtime of an inode to determine whether the inode was deleted and evicted, and this is very risky, because i_dtime can be used for the pointer maintaining orphan inode list, too. We need to further check whether the i_dtime is being used for the orphan inode list even if the i_dtime is not NULL. We found that high 16-bit fields of uid/gid of inode are unintentionally and permanently cleared when the inode truncation is just triggered, but not finished, and the inode metadata, whose high uid/gid bits are cleared, is written on disk, and the sudden power-off follows that in order. Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Hobin Woo <hobin.woo@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Whitney authored
commit 14fbd4aa upstream. Online defragging of encrypted files is not currently implemented. However, the move extent ioctl can still return successfully when called. For example, this occurs when xfstest ext4/020 is run on an encrypted file system, resulting in a corrupted test file and a corresponding test failure. Until the proper functionality is implemented, fail the move extent ioctl if either the original or donor file is encrypted. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jan Kara authored
commit e03a9976 upstream. Thomas has reported a lockdep splat hitting in add_transaction_credits(). The problem is that that function calls jbd2_might_wait_for_commit() while holding j_state_lock which is wrong (we do not really wait for transaction commit while holding that lock). Fix the problem by moving jbd2_might_wait_for_commit() into places where we are ready to wait for transaction commit and thus j_state_lock is unlocked. Fixes: 1eaa566d Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Wei Fang authored
commit c2a9737f upstream. We triggered a dead loop in truncate_inode_pages_range() on 32-bit architectures with the test case below: ... fd = open(); write(fd, buf, 4096); preadv64(fd, &iovec, 1, 0xffffffff000); ftruncate(fd, 0); ... ftruncate() then never returns. The filesystem used in this case is ubifs, but it can be triggered on many other filesystems. When preadv64() is called with offset=0xffffffff000, a page with index=0xffffffff will be added to the radix tree of ->mapping. Then this page can be found in ->mapping with pagevec_lookup(). After that, truncate_inode_pages_range(), which is called in ftruncate(), will fall into an infinite loop: - find a page with index=0xffffffff, since index>=end, this page won't be truncated - index++, and index becomes 0 - the page with index=0xffffffff will be found again The data type of index is unsigned long, so index won't overflow to 0 on 64-bit architectures in this case, and the dead loop won't happen. Since truncate_inode_pages_range() is executed while holding inode->i_rwsem, any operation that needs this lock will be blocked, and a hung task will result, e.g.: INFO: task truncate_test:3364 blocked for more than 120 seconds. ... call_rwsem_down_write_failed+0x17/0x30 generic_file_write_iter+0x32/0x1c0 ubifs_write_iter+0xcc/0x170 __vfs_write+0xc4/0x120 vfs_write+0xb2/0x1b0 SyS_write+0x46/0xa0 The page with index=0xffffffff added to ->mapping is useless. Fix this by checking the read position before allocating pages. Link: http://lkml.kernel.org/r/1475151010-40166-1-git-send-email-fangwei1@huawei.com Signed-off-by: Wei Fang <fangwei1@huawei.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
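The index arithmetic behind the loop can be checked in user space. The following is a minimal sketch, not kernel code: `pgoff32_t`, `offset_to_index()`, `next_index()` and `read_pos_allowed()` are hypothetical names, `uint32_t` stands in for a 32-bit `unsigned long`, and a 4 KiB page size is assumed. The limit passed to `read_pos_allowed()` is a caller-supplied illustrative value; the commit message only says that the read position is checked before pages are allocated.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: models a page index held in a 32-bit unsigned long. */
typedef uint32_t pgoff32_t;

#define SKETCH_PAGE_SHIFT 12 /* assumed 4 KiB pages */

/* Page index for a byte offset, truncated to 32 bits. */
static pgoff32_t offset_to_index(uint64_t pos)
{
    return (pgoff32_t)(pos >> SKETCH_PAGE_SHIFT);
}

/* The "index++" step of the scan; wraps to 0 after 0xffffffff. */
static pgoff32_t next_index(pgoff32_t index)
{
    return (pgoff32_t)(index + 1u);
}

/*
 * Hypothetical guard in the spirit of the fix: refuse a read whose
 * position is at or beyond the given limit, so no page with an
 * out-of-range index is ever added to the mapping.
 */
static int read_pos_allowed(uint64_t pos, uint64_t max_bytes)
{
    assert(max_bytes > 0);
    return pos < max_bytes;
}
```

Here `offset_to_index(0xffffffff000)` yields 0xffffffff and `next_index()` of that value is 0, so a scan that skips the page and increments the index starts over from the beginning and finds the same page again. Rejecting the read up front, for instance with an illustrative limit of 2^43 - 1 bytes, keeps such a page out of the mapping in the first place.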
-