- 19 Feb, 2016 40 commits
-
Mathias Krause authored
commit 3dcc8d39 upstream. Commit 12669631 ("PCI: Prevent out of bounds access in numa_node override") missed that the user-provided node could also be negative. Handle this case as well to avoid out-of-bounds accesses to the node_states[] array. However, allow the special value -1, i.e. NUMA_NO_NODE, to be able to set the 'no specific node' configuration. Fixes: 12669631 ("PCI: Prevent out of bounds access in numa_node override") Fixes: 63692df1 ("PCI: Allow numa_node override via sysfs") Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Sasha Levin <sasha.levin@oracle.com> CC: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
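For illustration, a minimal sketch of the kind of check described above (this is not the exact upstream hunk; the sysfs store routine in drivers/pci/pci-sysfs.c differs in detail):

	/* Hedged sketch: reject any negative node except NUMA_NO_NODE (-1),
	 * and any node that would index node_states[] out of bounds. */
	static int numa_node_store_check(int node)
	{
		if (node == NUMA_NO_NODE)
			return 0;		/* "no specific node" is allowed */
		if (node < 0 || node >= MAX_NUMNODES || !node_online(node))
			return -EINVAL;
		return 0;
	}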
-
Alexander Duyck authored
commit ea9a8854 upstream. The enumeration path should leave NumVFs set to zero. But after 4449f079 ("PCI: Calculate maximum number of buses required for VFs"), we call virtfn_max_buses() in the enumeration path, which changes NumVFs. This NumVFs change is visible via lspci and sysfs until a driver enables SR-IOV. Iterate from TotalVFs down to zero so NumVFs is zero when we're finished computing the maximum number of buses. Validate offset and stride in the loop, so we can test it at every possible NumVFs setting. Rename virtfn_max_buses() to compute_max_vf_buses() to hint that it does have a side effect of updating iov->max_VF_buses. [bhelgaas: changelog, rename, allow numVF==1 && stride==0, rework loop, reverse sense of error path] Fixes: 4449f079 ("PCI: Calculate maximum number of buses required for VFs") Based-on-patch-by: Ethan Zhao <ethan.zhao@oracle.com> Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
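A rough sketch of the counting-down idea (illustrative only; the real compute_max_vf_buses() in drivers/pci/iov.c validates offset/stride and tracks bus numbers inside the loop):

	for (nr_virtfn = iov->total_VFs; nr_virtfn; nr_virtfn--) {
		pci_iov_set_numvfs(dev, nr_virtfn);
		/* ...read VF Offset/Stride, bail out if invalid,
		 * and record the highest bus number needed... */
	}
	pci_iov_set_numvfs(dev, 0);	/* leave NumVFs cleared for enumeration */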
-
Gabriele Paoloni authored
commit fa3b7cba upstream. The first argument of dw_pcie_cfg_read/write() is a 32-bit aligned address. The second argument is the byte offset into a 32-bit word, and dw_pcie_cfg_read/write() only look at the low two bits. SPEAr13xx used dw_pcie_cfg_read() and dw_pcie_cfg_write() incorrectly: it passed important address bits in the second argument, where they were ignored. Pass the complete 32-bit word address in the first argument and only the 2-bit offset into that word in the second argument. Without this fix, the SPEAr13xx host does not work with certain buggy gen1 cards (which only link up with a gen1 host), nor with any endpoint that generates read requests larger than 128 bytes. [bhelgaas: changelog] Reported-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Pratyush Anand <panand@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
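In other words (a hedged sketch of the calling convention, not the literal SPEAr13xx diff; the exact dw_pcie_cfg_read() prototype varies across kernel versions):

	/* 'where' is the full config-space offset of the field being read */
	ret = dw_pcie_cfg_read(va_address + (where & ~0x3),	/* 32-bit aligned word */
			       where & 0x3,			/* offset within that word */
			       size, &val);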
-
Sebastian Siewior authored
commit 6b238de1 upstream. If __erase_worker() fails to erase the EB and schedule_erase() fails as well to do anything about it then we go RO. But that is not a reason to leak the e argument here. Therefore clean up e. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
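A minimal sketch of the error path being described (assuming the entry came from the UBI wl-entry slab cache; the actual cleanup helper used upstream may differ):

	err = schedule_erase(ubi, e, vol_id, lnum, torture);
	if (err) {
		/* we are going read-only anyway, but do not leak 'e' */
		kmem_cache_free(ubi_wl_entry_slab, e);
		goto out_ro;
	}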
-
Sebastian Siewior authored
commit 1a31b20c upstream. Since fastmap was introduced we gained do_sync_erase(). This function can return an error and its error handling isn't obvious. First, the memory allocation for struct ubi_work can fail, in which case the struct ubi_wl_entry is leaked. However, if the memory allocation succeeds, then the tail function takes care of the struct ubi_wl_entry. A free here could result in a double free. To make the error handling simpler, I split the tail function into one piece which does the work and another which frees the struct ubi_work passed as an argument. As a result, do_sync_erase() can keep the struct on the stack and we get rid of one error source. Fixes: 8199b901 ("UBI: Add fastmap support to the WL sub-system") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Brian Norris authored
commit 96dd922c upstream. Fallout from commit 832f5dac ("MIPS: Remove all the uses of custom gpio.h"). We see errors like this:

drivers/mtd/nand/jz4740_nand.c: In function 'jz_nand_detect_bank':
drivers/mtd/nand/jz4740_nand.c:340:9: error: 'JZ_GPIO_MEM_CS0' undeclared (first use in this function)
drivers/mtd/nand/jz4740_nand.c:340:9: note: each undeclared identifier is reported only once for each function it appears in
drivers/mtd/nand/jz4740_nand.c:359:2: error: implicit declaration of function 'jz_gpio_set_function' [-Werror=implicit-function-declaration]
drivers/mtd/nand/jz4740_nand.c:359:29: error: 'JZ_GPIO_FUNC_MEM_CS0' undeclared (first use in this function)
drivers/mtd/nand/jz4740_nand.c:399:29: error: 'JZ_GPIO_FUNC_NONE' undeclared (first use in this function)
drivers/mtd/nand/jz4740_nand.c: In function 'jz_nand_probe':
drivers/mtd/nand/jz4740_nand.c:528:13: error: 'JZ_GPIO_MEM_CS0' undeclared (first use in this function)
drivers/mtd/nand/jz4740_nand.c: In function 'jz_nand_remove':
drivers/mtd/nand/jz4740_nand.c:555:14: error: 'JZ_GPIO_MEM_CS0' undeclared (first use in this function)

Patched similarly to: https://patchwork.linux-mips.org/patch/11089/ Fixes: 832f5dac ("MIPS: Remove all the uses of custom gpio.h") Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Brian Norris authored
commit 9ca641b0 upstream. If multiple NAND chips are registered to the same controller, then when rebooting the system, the first one will grab the controller lock, while the second will wait forever for the first one to release it. i.e., a classic deadlock. This problem was solved for a similar case (suspend/resume) back in commit 6b0d9a84 ("mtd: nand: fix multi-chip suspend problem"), and the shutdown state really isn't much different for us, so rather than adding a new special case to nand_get_device(), we can just overload the FL_PM_SUSPENDED state. Now, multiple chips can "get" the same controller lock (preventing further I/O), while we still allow other chips to pass through nand_shutdown(). Original report: http://thread.gmane.org/gmane.linux.drivers.mtd/59726 http://lists.infradead.org/pipermail/linux-mtd/2015-July/059992.html Fixes: 72ea4036 ("mtd: nand: added nand_shutdown") Reported-by: Andrew E. Mileski <andrewm@isoar.ca> Signed-off-by: Brian Norris <computersforpeace@gmail.com> Cc: Scott Branden <sbranden@broadcom.com> Cc: Andrew E. Mileski <andrewm@isoar.ca> Acked-by: Scott Branden <sbranden@broadcom.com> Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Brian Norris authored
commit f3c63795 upstream. Commit 073db4a5 ("mtd: fix: avoid race condition when accessing mtd->usecount") fixed a race condition but due to poor ordering of the mutex acquisition, introduced a potential deadlock. The deadlock can occur, for example, when rmmod'ing the m25p80 module, which will delete one or more MTDs, along with any corresponding mtdblock devices. This could potentially race with an acquisition of the block device as follows. -> blktrans_open() -> mutex_lock(&dev->lock); -> mutex_lock(&mtd_table_mutex); -> del_mtd_device() -> mutex_lock(&mtd_table_mutex); -> blktrans_notify_remove() -> del_mtd_blktrans_dev() -> mutex_lock(&dev->lock); This is a classic (potential) ABBA deadlock, which can be fixed by making the A->B ordering consistent everywhere. There was no real purpose to the ordering in the original patch, AFAIR, so this shouldn't be a problem. This ordering was actually already present in del_mtd_blktrans_dev(), for one, where the function tried to ensure that its caller already held mtd_table_mutex before it acquired &dev->lock: if (mutex_trylock(&mtd_table_mutex)) { mutex_unlock(&mtd_table_mutex); BUG(); } So, reverse the ordering of acquisition of &dev->lock and &mtd_table_mutex so we always acquire mtd_table_mutex first. Snippets of the lockdep output follow: # modprobe -r m25p80 [ 53.419251] [ 53.420838] ====================================================== [ 53.427300] [ INFO: possible circular locking dependency detected ] [ 53.433865] 4.3.0-rc6 #96 Not tainted [ 53.437686] ------------------------------------------------------- [ 53.444220] modprobe/372 is trying to acquire lock: [ 53.449320] (&new->lock){+.+...}, at: [<c043fe4c>] del_mtd_blktrans_dev+0x80/0xdc [ 53.457271] [ 53.457271] but task is already holding lock: [ 53.463372] (mtd_table_mutex){+.+.+.}, at: [<c0439994>] del_mtd_device+0x18/0x100 [ 53.471321] [ 53.471321] which lock already depends on the new lock. 
[ 53.471321] [ 53.479856] [ 53.479856] the existing dependency chain (in reverse order) is: [ 53.487660] -> #1 (mtd_table_mutex){+.+.+.}: [ 53.492331] [<c043fc5c>] blktrans_open+0x34/0x1a4 [ 53.497879] [<c01afce0>] __blkdev_get+0xc4/0x3b0 [ 53.503364] [<c01b0bb8>] blkdev_get+0x108/0x320 [ 53.508743] [<c01713c0>] do_dentry_open+0x218/0x314 [ 53.514496] [<c0180454>] path_openat+0x4c0/0xf9c [ 53.519959] [<c0182044>] do_filp_open+0x5c/0xc0 [ 53.525336] [<c0172758>] do_sys_open+0xfc/0x1cc [ 53.530716] [<c000f740>] ret_fast_syscall+0x0/0x1c [ 53.536375] -> #0 (&new->lock){+.+...}: [ 53.540587] [<c063f124>] mutex_lock_nested+0x38/0x3cc [ 53.546504] [<c043fe4c>] del_mtd_blktrans_dev+0x80/0xdc [ 53.552606] [<c043f164>] blktrans_notify_remove+0x7c/0x84 [ 53.558891] [<c04399f0>] del_mtd_device+0x74/0x100 [ 53.564544] [<c043c670>] del_mtd_partitions+0x80/0xc8 [ 53.570451] [<c0439aa0>] mtd_device_unregister+0x24/0x48 [ 53.576637] [<c046ce6c>] spi_drv_remove+0x1c/0x34 [ 53.582207] [<c03de0f0>] __device_release_driver+0x88/0x114 [ 53.588663] [<c03de19c>] device_release_driver+0x20/0x2c [ 53.594843] [<c03dd9e8>] bus_remove_device+0xd8/0x108 [ 53.600748] [<c03dacc0>] device_del+0x10c/0x210 [ 53.606127] [<c03dadd0>] device_unregister+0xc/0x20 [ 53.611849] [<c046d878>] __unregister+0x10/0x20 [ 53.617211] [<c03da868>] device_for_each_child+0x50/0x7c [ 53.623387] [<c046eae8>] spi_unregister_master+0x58/0x8c [ 53.629578] [<c03e12f0>] release_nodes+0x15c/0x1c8 [ 53.635223] [<c03de0f8>] __device_release_driver+0x90/0x114 [ 53.641689] [<c03de900>] driver_detach+0xb4/0xb8 [ 53.647147] [<c03ddc78>] bus_remove_driver+0x4c/0xa0 [ 53.652970] [<c00cab50>] SyS_delete_module+0x11c/0x1e4 [ 53.658976] [<c000f740>] ret_fast_syscall+0x0/0x1c [ 53.664621] [ 53.664621] other info that might help us debug this: [ 53.664621] [ 53.672979] Possible unsafe locking scenario: [ 53.672979] [ 53.679169] CPU0 CPU1 [ 53.683900] ---- ---- [ 53.688633] lock(mtd_table_mutex); [ 53.692383] lock(&new->lock); [ 53.698306] lock(mtd_table_mutex); [ 53.704658] lock(&new->lock); [ 53.707946] [ 53.707946] *** DEADLOCK *** Fixes: 073db4a5 ("mtd: fix: avoid race condition when accessing mtd->usecount") Reported-by: Felipe Balbi <balbi@ti.com> Tested-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
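The resulting A->B ordering, sketched (illustrative; the real blktrans_open() and del_mtd_blktrans_dev() bodies do much more work):

	/* Always take mtd_table_mutex (A) before dev->lock (B). */
	static void blktrans_open_sketch(struct mtd_blktrans_dev *dev)
	{
		mutex_lock(&mtd_table_mutex);	/* A */
		mutex_lock(&dev->lock);		/* B */
		/* ... get the MTD device, bump usecount ... */
		mutex_unlock(&dev->lock);
		mutex_unlock(&mtd_table_mutex);
	}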
-
Boris BREZILLON authored
commit e5bae867 upstream. If we fail to allocate a partition structure in the middle of the partition creation process, the already allocated partitions are never removed, which means they are still present in the partition list and their resources are never freed. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dmitry Kasatkin authored
commit 72e1eed8 upstream. If IMA_LOAD_X509 is enabled, either directly or indirectly via IMA_APPRAISE_SIGNED_INIT, certificates are loaded onto the IMA trusted keyring by the kernel via key_create_or_update(). When the KEY_ALLOC_TRUSTED flag is provided, certificates are loaded without first verifying the certificate is properly signed by a trusted key on the system keyring. This patch removes the KEY_ALLOC_TRUSTED flag. Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@huawei.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jarkko Sakkinen authored
commit b1a4144a upstream. Mimi reported that afb5abc2 reverted the fix in 398a1e71. This patch brings the fix back. Fixes: afb5abc2 ("tpm: two-phase chip management functions") Reported-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Acked-by: Peter Huewe <PeterHuewe@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Martin Wilck authored
commit 2aef9da6 upstream. Release IRQs used for probing only. Otherwise the TPM will end up with all IRQs 3-15 assigned. Fixes: afb5abc2 ("tpm: two-phase chip management functions") Signed-off-by: Martin Wilck <Martin.Wilck@ts.fujitsu.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Acked-by: Peter Huewe <PeterHuewe@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Hon Ching \(Vicky\) Lo authored
commit 60ecd86c upstream. At IBM vTPM initialization, tpm_ibmvtpm_probe() registers its interrupt handler, ibmvtpm_interrupt, which calls ibmvtpm_crq_process to allocate memory for the rtce buffer. The current code uses 'GFP_KERNEL' for this allocation, which may sleep and therefore resulted in a warning from kernel/lockdep.c. This patch uses 'GFP_ATOMIC' instead so that the allocation is high-priority and does not sleep. Signed-off-by: Hon Ching(Vicky) Lo <honclo@linux.vnet.ibm.com> Signed-off-by: Peter Huewe <peterhuewe@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
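Conceptually, the change is just the allocation flag used in interrupt context (a sketch; the surrounding field names in tpm_ibmvtpm.c are assumptions here):

	/* called from the CRQ interrupt path: must not sleep */
	ibmvtpm->rtce_buf = kmalloc(ibmvtpm->rtce_size, GFP_ATOMIC);
	if (!ibmvtpm->rtce_buf) {
		dev_err(ibmvtpm->dev, "failed to allocate rtce buffer\n");
		return;
	}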
-
Jarkko Sakkinen authored
commit 149789ce upstream. The command buffer address must be read with exactly two 32-bit reads. Otherwise, on some HW platforms, it seems that HW will abort the read operation, which causes CPU to fill the read bytes with 1's. Therefore, we cannot rely on memcpy_fromio() but must call ioread32() two times instead. Also, this matches the PC Client Platform TPM Profile specification, which defines command buffer address with two 32-bit fields. Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Reviewed-by: Peter Huewe <peterhuewe@gmx.de> Signed-off-by: Peter Huewe <peterhuewe@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
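A sketch of the access pattern (the register/field names here are illustrative, not the exact tpm_crb structure layout):

	/* read the 64-bit command buffer address as two explicit 32-bit
	 * accesses, low word first, instead of one memcpy_fromio() */
	u64 cmd_pa;

	cmd_pa  = (u64)ioread32(&cca->cmd_pa_low);
	cmd_pa |= (u64)ioread32(&cca->cmd_pa_high) << 32;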
-
Ricardo Ribalda Delgado authored
commit eca37c7c upstream. Some users have reported that in polled mode the driver randomly fails to read the last word of the transfer. The end condition used for the transmissions (in polled and irq mode) has been the TX_EMPTY flag. But Lars-Peter Clausen has identified a delay from TX_EMPTY being set to the actual end of the data RX. I believe that this race condition has not been detected until now because of the latency added by the IRQ handler or the PCIe bridge. This bug affects setups with low latency access to the SPI core. This patch replaces the readout logic: for all the words except the last one, the TX_EMPTY flag is used (and cached); if !TX_EMPTY, or it is the last word, the status register is read and the RX_EMPTY flag is used. The performance is not affected: there is an extra read of the status register, but the readout can start as soon as there is a word in the buffer. Reported-by: Edward Kigwana <ekigwana@scires.com> Initial-fix-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
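The core idea for the final word, sketched with names modeled on the spi-xilinx register macros (treat the details as illustrative, not the literal driver code):

	/* for the last word, don't trust the cached TX_EMPTY status:
	 * poll the status register until the RX FIFO actually has data */
	do {
		sr = xspi->read_fn(xspi->regs + XSPI_SR_OFFSET);
	} while (sr & XSPI_SR_RX_EMPTY_MASK);
	xilinx_spi_rx(xspi);		/* now the last word is really there */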
-
Uri Mashiach authored
commit e47301b0 upstream. Fix the below Oops when trying to modprobe wlcore_spi. The oops occurs because the wl1271_power_{off,on}() functions don't check the power() function pointer.

[ 23.401447] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 23.409954] pgd = c0004000
[ 23.412922] [00000000] *pgd=00000000
[ 23.416693] Internal error: Oops: 80000007 [#1] SMP ARM
[ 23.422168] Modules linked in: wl12xx wlcore mac80211 cfg80211 musb_dsps musb_hdrc usbcore usb_common snd_soc_simple_card evdev joydev omap_rng wlcore_spi snd_soc_tlv320aic23_i2c rng_core snd_soc_tlv320aic23 c_can_platform c_can can_dev snd_soc_davinci_mcasp snd_soc_edma snd_soc_omap omap_wdt musb_am335x cpufreq_dt thermal_sys hwmon
[ 23.453253] CPU: 0 PID: 36 Comm: kworker/0:2 Not tainted 4.2.0-00002-g951efee-dirty #233
[ 23.461720] Hardware name: Generic AM33XX (Flattened Device Tree)
[ 23.468123] Workqueue: events request_firmware_work_func
[ 23.473690] task: de32efc0 ti: de4ee000 task.ti: de4ee000
[ 23.479341] PC is at 0x0
[ 23.482112] LR is at wl12xx_set_power_on+0x28/0x124 [wlcore]
[ 23.488074] pc : [<00000000>] lr : [<bf2581f0>] psr: 60000013
[ 23.488074] sp : de4efe50 ip : 00000002 fp : 00000000
[ 23.500162] r10: de7cdd00 r9 : dc848800 r8 : bf27af00
[ 23.505663] r7 : bf27a1a8 r6 : dcbd8a80 r5 : dce0e2e0 r4 : dce0d2e0
[ 23.512536] r3 : 00000000 r2 : 00000000 r1 : 00000001 r0 : dc848810
[ 23.519412] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 23.527109] Control: 10c5387d Table: 9cb78019 DAC: 00000015
[ 23.533160] Process kworker/0:2 (pid: 36, stack limit = 0xde4ee218)
[ 23.539760] Stack: (0xde4efe50 to 0xde4f0000)
[...]
[ 23.665030] [<bf2581f0>] (wl12xx_set_power_on [wlcore]) from [<bf25f7ac>] (wlcore_nvs_cb+0x118/0xa4c [wlcore])
[ 23.675604] [<bf25f7ac>] (wlcore_nvs_cb [wlcore]) from [<c04387ec>] (request_firmware_work_func+0x30/0x58)
[ 23.685784] [<c04387ec>] (request_firmware_work_func) from [<c0058e2c>] (process_one_work+0x1b4/0x4b4)
[ 23.695591] [<c0058e2c>] (process_one_work) from [<c0059168>] (worker_thread+0x3c/0x4a4)
[ 23.704124] [<c0059168>] (worker_thread) from [<c005ee68>] (kthread+0xd4/0xf0)
[ 23.711747] [<c005ee68>] (kthread) from [<c000f598>] (ret_from_fork+0x14/0x3c)
[ 23.719357] Code: bad PC value
[ 23.722760] ---[ end trace 981be8510db9b3a9 ]---

Prevent the oops by validating the power() pointer value before calling the function. Signed-off-by: Uri Mashiach <uri.mashiach@compulab.co.il> Acked-by: Igor Grinberg <grinberg@compulab.co.il> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
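The guard being described is essentially a NULL check before the indirect call (a sketch; the if_ops layout and return policy are assumptions based on the text above):

	static int wl1271_power_on(struct wl1271 *wl)
	{
		if (!wl->if_ops->power)		/* platform glue gave us no hook */
			return -EINVAL;
		return wl->if_ops->power(wl->dev, true);
	}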
-
Uri Mashiach authored
commit 9b2761cb upstream. The maximum chunks used by the function is (SPI_AGGR_BUFFER_SIZE / WSPI_MAX_CHUNK_SIZE + 1). The original commands array had space for (SPI_AGGR_BUFFER_SIZE / WSPI_MAX_CHUNK_SIZE) commands. When the last chunk is used (len > 4 * WSPI_MAX_CHUNK_SIZE), the last command is stored outside the bounds of the commands array. Oops 5 (page fault) is generated during current wl1271 firmware load attempt: root@debian-armhf:~# ifconfig wlan0 up [ 294.312399] Unable to handle kernel paging request at virtual address 00203fc4 [ 294.320173] pgd = de528000 [ 294.323028] [00203fc4] *pgd=00000000 [ 294.326916] Internal error: Oops: 5 [#1] SMP ARM [ 294.331789] Modules linked in: bnep rfcomm bluetooth ipv6 arc4 wl12xx wlcore mac80211 musb_dsps cfg80211 musb_hdrc usbcore usb_common wlcore_spi omap_rng rng_core musb_am335x omap_wdt cpufreq_dt thermal_sys hwmon [ 294.351838] CPU: 0 PID: 1827 Comm: ifconfig Not tainted 4.2.0-00002-g3e9ad27-dirty #78 [ 294.360154] Hardware name: Generic AM33XX (Flattened Device Tree) [ 294.366557] task: dc9d6d40 ti: de550000 task.ti: de550000 [ 294.372236] PC is at __spi_validate+0xa8/0x2ac [ 294.376902] LR is at __spi_sync+0x78/0x210 [ 294.381200] pc : [<c049c760>] lr : [<c049ebe0>] psr: 60000013 [ 294.381200] sp : de551998 ip : de5519d8 fp : 00200000 [ 294.393242] r10: de551c8c r9 : de5519d8 r8 : de3a9000 [ 294.398730] r7 : de3a9258 r6 : de3a9400 r5 : de551a48 r4 : 00203fbc [ 294.405577] r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : de3a9000 [ 294.412420] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 294.419918] Control: 10c5387d Table: 9e528019 DAC: 00000015 [ 294.425954] Process ifconfig (pid: 1827, stack limit = 0xde550218) [ 294.432437] Stack: (0xde551998 to 0xde552000) ... [ 294.883613] [<c049c760>] (__spi_validate) from [<c049ebe0>] (__spi_sync+0x78/0x210) [ 294.891670] [<c049ebe0>] (__spi_sync) from [<bf036598>] (wl12xx_spi_raw_write+0xfc/0x148 [wlcore_spi]) [ 294.901661] [<bf036598>] (wl12xx_spi_raw_write [wlcore_spi]) from [<bf21c694>] (wlcore_boot_upload_firmware+0x1ec/0x458 [wlcore]) [ 294.914038] [<bf21c694>] (wlcore_boot_upload_firmware [wlcore]) from [<bf24532c>] (wl12xx_boot+0xc10/0xfac [wl12xx]) [ 294.925161] [<bf24532c>] (wl12xx_boot [wl12xx]) from [<bf20d5cc>] (wl1271_op_add_interface+0x5b0/0x910 [wlcore]) [ 294.936364] [<bf20d5cc>] (wl1271_op_add_interface [wlcore]) from [<bf15c4ac>] (ieee80211_do_open+0x44c/0xf7c [mac80211]) [ 294.947963] [<bf15c4ac>] (ieee80211_do_open [mac80211]) from [<c0537978>] (__dev_open+0xa8/0x110) [ 294.957307] [<c0537978>] (__dev_open) from [<c0537bf8>] (__dev_change_flags+0x88/0x148) [ 294.965713] [<c0537bf8>] (__dev_change_flags) from [<c0537cd0>] (dev_change_flags+0x18/0x48) [ 294.974576] [<c0537cd0>] (dev_change_flags) from [<c05a55a0>] (devinet_ioctl+0x6b4/0x7d0) [ 294.983191] [<c05a55a0>] (devinet_ioctl) from [<c0517040>] (sock_ioctl+0x1e4/0x2bc) [ 294.991244] [<c0517040>] (sock_ioctl) from [<c017d378>] (do_vfs_ioctl+0x420/0x6b0) [ 294.999208] [<c017d378>] (do_vfs_ioctl) from [<c017d674>] (SyS_ioctl+0x6c/0x7c) [ 295.006880] [<c017d674>] (SyS_ioctl) from [<c000f4c0>] (ret_fast_syscall+0x0/0x54) [ 295.014835] Code: e1550004 e2444034 0a00007d e5953018 (e5942008) [ 295.021544] ---[ end trace 66ed188198f4e24e ]--- Signed-off-by: Uri Mashiach <uri.mashiach@compulab.co.il> Acked-by: Igor Grinberg <grinberg@compulab.co.il> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
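The sizing problem is plain arithmetic; a sketch of the corrected bound (macro names follow the text above, treat the exact definition as illustrative):

	/* worst case: a partial chunk follows the full-sized ones */
	#define WSPI_MAX_NUM_OF_CHUNKS \
		((SPI_AGGR_BUFFER_SIZE / WSPI_MAX_CHUNK_SIZE) + 1)

	u32 commands[WSPI_MAX_NUM_OF_CHUNKS];	/* one command per chunk */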
-
Johan Hovold authored
commit 157f38f9 upstream. Fix parent-device reference leak due to SPI-core taking an unnecessary reference to the parent when allocating the master structure, a reference that was never released. Note that driver core takes its own reference to the parent when the master device is registered. Fixes: 49dce689 ("spi doesn't need class_device") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Vignesh R authored
commit bc27a539 upstream. Writing invalid command to QSPI_SPI_CMD_REG will terminate current transfer and de-assert the chip select. This has to be done before calling spi_finalize_current_message(). Because spi_finalize_current_message() will mark the end of current message transfer and schedule the next transfer. If the chipselect is not de-asserted before calling spi_finalize_current_message() then the next transfer will overlap with the previous transfer leading to data corruption. __spi_pump_message() can be called either from kthread worker context or directly from the calling process's context. It is possible that these two calls can race against each other. But race is serialized by checking whether master->cur_msg == NULL (pointer to msg being handled by transfer_one() at present). The master->cur_msg is set to NULL when spi_finalize_current_message() is called on that message, which means calling spi_finalize_current_message() allows __spi_sync() to pump next message in calling process context. Now if spi-ti-qspi calls spi_finalize_current_message() before we terminate transfer at hardware side, if __spi_pump_message() is called from process context then the successive transactions can overlap. Fix this by moving writing invalid command to QSPI_SPI_CMD_REG to before calling spi_finalize_current_message() call. Signed-off-by: Vignesh R <vigneshr@ti.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
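The ordering constraint, sketched (register/helper names here are illustrative, based on the description above):

	/* terminate the transfer and de-assert CS first ... */
	ti_qspi_write(qspi, QSPI_INVAL, QSPI_SPI_CMD_REG);
	/* ... and only then let the core schedule the next message */
	spi_finalize_current_message(master);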
-
David Mosberger-Tang authored
commit 06515f83 upstream. The DMA-slave configuration depends on the whether <= 8 or > 8 bits are transferred per word, so we need to call atmel_spi_dma_slave_config() with the correct value. Signed-off-by: David Mosberger <davidm@egauge.net> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Neil Armstrong authored
commit 468a3208 upstream. Since the "Switch driver to use transfer_one" change, the cs_change behavior has changed: a channel chip select can still be asserted when changing channel, left over from a previous last transfer in a message having the cs_change attribute. Since it makes no sense to have multiple chip selects asserted at the same time, disable all the remaining forced chip selects in the prepare_message hook, which is called right before spi_transfer_one_message(). The current channel configuration is ignored there in order to keep the possibility to leave the chip select asserted between messages. This fixes the bug on a DM8168 ES2.1 SoC and an OMAP4 ES2.1 SoC: a CHCONF_FORCE left set on the wrong channel was hanging all the other channels' transfers. Fixes: b28cb941 ("spi: omap2-mcspi: Switch driver to use transfer_one") Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Reviewed-by: Michael Welling <mwelling@ieee.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mauricio Faria de Oliveira authored
commit 47796938 upstream. This reverts commit a1989b33. That commit introduced a regression at least for the case of the SG_IO ioctl() running without CAP_SYS_RAWIO capability (e.g., unprivileged users) when there are no active paths: the ioctl() fails with the ENOTTY errno immediately rather than blocking due to queue_if_no_path until a path becomes active, for example. That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices (qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2]) from multipath devices; which leads to SCSI/filesystem errors in such a guest. More general scenarios can hit that regression too. The following demonstration employs a SG_IO ioctl() with a standard SCSI INQUIRY command for this objective (some output & user changes omitted for brevity and comments added for clarity). Reverting that commit restores normal operation (queueing) in failing scenarios; tested on linux-next (next-20151022). 1) Test-case is based on sg_simple0 [3] (just SG_IO; remove SG_GET_VERSION_NUM) $ cat sg_simple0.c ... see [3] ... $ sed '/SG_GET_VERSION_NUM/,/}/d' sg_simple0.c > sgio_inquiry.c $ gcc sgio_inquiry.c -o sgio_inquiry 2) The ioctl() works fine with active paths present. # multipath -l 85ag56 85ag56 (...) dm-19 IBM ,2145 size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw |-+- policy='service-time 0' prio=0 status=active | |- 8:0:11:0 sdz 65:144 active undef running | `- 9:0:9:0 sdbf 67:144 active undef running `-+- policy='service-time 0' prio=0 status=enabled |- 8:0:12:0 sdae 65:224 active undef running `- 9:0:12:0 sdbo 68:32 active undef running $ ./sgio_inquiry /dev/mapper/85ag56 Some of the INQUIRY command's response: IBM 2145 0000 INQUIRY duration=0 millisecs, resid=0 3) The ioctl() fails with ENOTTY errno with _no_ active paths present, for unprivileged users (rather than blocking due to queue_if_no_path). # for path in $(multipath -l 85ag56 | grep -o 'sd[a-z]\+'); \ do multipathd -k"fail path $path"; done # multipath -l 85ag56 85ag56 (...) dm-19 IBM ,2145 size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw |-+- policy='service-time 0' prio=0 status=enabled | |- 8:0:11:0 sdz 65:144 failed undef running | `- 9:0:9:0 sdbf 67:144 failed undef running `-+- policy='service-time 0' prio=0 status=enabled |- 8:0:12:0 sdae 65:224 failed undef running `- 9:0:12:0 sdbo 68:32 failed undef running $ ./sgio_inquiry /dev/mapper/85ag56 sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device 4) dmesg shows that scsi_verify_blk_ioctl() failed for SG_IO (0x2285); it returns -ENOIOCTLCMD, later replaced with -ENOTTY in vfs_ioctl(). $ dmesg <...> [] device-mapper: multipath: Failing path 65:144. [] device-mapper: multipath: Failing path 67:144. [] device-mapper: multipath: Failing path 65:224. [] device-mapper: multipath: Failing path 68:32. [] sgio_inquiry: sending ioctl 2285 to a partition! 5) The ioctl() only works if the SYS_CAP_RAWIO capability is present (then queueing happens -- in this example, queue_if_no_path is set); this is due to a conditional check in scsi_verify_blk_ioctl(). 
# capsh --drop=cap_sys_rawio -- -c './sgio_inquiry /dev/mapper/85ag56' sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device # ./sgio_inquiry /dev/mapper/85ag56 & [1] 72830 # cat /proc/72830/stack [<c00000171c0df700>] 0xc00000171c0df700 [<c000000000015934>] __switch_to+0x204/0x350 [<c000000000152d4c>] msleep+0x5c/0x80 [<c00000000077dfb0>] dm_blk_ioctl+0x70/0x170 [<c000000000487c40>] blkdev_ioctl+0x2b0/0x9b0 [<c0000000003128e4>] block_ioctl+0x64/0xd0 [<c0000000002dd3b0>] do_vfs_ioctl+0x490/0x780 [<c0000000002dd774>] SyS_ioctl+0xd4/0xf0 [<c000000000009358>] system_call+0x38/0xd0 6) This is the function call chain exercised in this analysis: SYSCALL_DEFINE3(ioctl, <...>) @ fs/ioctl.c -> do_vfs_ioctl() -> vfs_ioctl() ... error = filp->f_op->unlocked_ioctl(filp, cmd, arg); ... -> dm_blk_ioctl() @ drivers/md/dm.c -> multipath_ioctl() @ drivers/md/dm-mpath.c ... (bdev = NULL, due to no active paths) ... if (!bdev || <...>) { int err = scsi_verify_blk_ioctl(NULL, cmd); if (err) r = err; } ... -> scsi_verify_blk_ioctl() @ block/scsi_ioctl.c ... if (bd && bd == bd->bd_contains) // not taken (bd = NULL) return 0; ... if (capable(CAP_SYS_RAWIO)) // not taken (unprivileged user) return 0; ... printk_ratelimited(KERN_WARNING "%s: sending ioctl %x to a partition!\n" <...>); return -ENOIOCTLCMD; <- ... return r ? : <...> <- ... if (error == -ENOIOCTLCMD) error = -ENOTTY; out: return error; ... Links: [1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52 [2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device') [3] http://tldp.org/HOWTO/SCSI-Generic-HOWTO/pexample.html (Revision 1.2, 2002-05-03) Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mikulas Patocka authored
commit ad5f498f upstream. Commit bfebd1cd ("dm: add full blk-mq support to request-based DM") moves the initialization of the fields backing_dev_info.congested_fn, backing_dev_info.congested_data and queuedata from the function dm_init_md_queue (which is called when the device is created) to dm_init_old_md_queue (which is called after the device type is determined). There is no locking when accessing these variables, thus it is possible for other parts of the kernel to briefly see this data in a transient state (e.g. queue->backing_dev_info.congested_fn initialized and md->queue->backing_dev_info.congested_data uninitialized, resulting in passing an incorrect parameter to the function dm_any_congested). This queue data is left initialized for blk-mq devices even though they don't use it. Fixes: bfebd1cd ("dm: add full blk-mq support to request-based DM") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Dmitry V. Levin authored
commit 2d33fa10 upstream. According to arch/sh/kernel/syscalls_64.S and common sense, __NR_fgetxattr has to be defined to 259, but it isn't. Instead, it's defined to 269, which is of course used by another syscall, __NR_sched_setaffinity in this case. This bug was found by the strace test suite. Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
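The fix itself is a one-line constant change; conceptually:

	/* sh 64-bit syscall numbering (sketch of the corrected value) */
	#define __NR_fgetxattr	259	/* was 269, colliding with __NR_sched_setaffinity */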
-
xuejiufei authored
commit c95a5180 upstream. When the recovery master goes down, dlm_do_local_recovery_cleanup() only removes the $RECOVERY lock owned by the dead node, but does not clear the refmap bit. This makes the umount thread fall into a dead loop, migrating $RECOVERY to the dead node. Signed-off-by: xuejiufei <xuejiufei@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
xuejiufei authored
commit bef5502d upstream. We have found that the migration source triggers a BUG, because the refcount of the mle is already zero before the put, when the target goes down during migration. The situation is as follows:

dlm_migrate_lockres
  dlm_add_migration_mle
  dlm_mark_lockres_migrating
  dlm_get_mle_inuse
  <<<<<< Now the refcount of the mle is 2.
  dlm_send_one_lockres and wait for the target to become the new master.
  <<<<<< o2hb detects the target down and cleans the migration mle. Now the refcount is 1.

dlm_migrate_lockres is woken, and puts the mle twice when it finds that the target has gone down, which triggers the BUG with the following message: "ERROR: bad mle: ". Signed-off-by: Jiufei Xue <xuejiufei@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Joseph Qi authored
commit 5c9ee4cb upstream. When resizing, ocfs2 first extends the last group descriptor (gd). If a backup super should live in that gd, it calculates the new backup super location and updates the corresponding values. But it currently doesn't consider the case where the backup super has already been set up. In that case it still sets the bit in the gd bitmap and then decreases bg_free_bits_count, which leads to a corrupted gd and triggers the BUG in ocfs2_block_group_set_bits: BUG_ON(le16_to_cpu(bg->bg_free_bits_count) < num_bits); So check whether the backup super is already done and only then do the updates. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Jiufei Xue <xuejiufei@huawei.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Junxiao Bi authored
commit 854ee2e9 upstream. Commit 8f1eb487 ("ocfs2: fix umask ignored issue") introduced an issue: the SGID bit of a sub directory was not inherited from its parent directory. This is because SGID is set into "inode->i_mode" in ocfs2_get_init_inode(), but is later overwritten by "mode", which doesn't have SGID set. Fixes: 8f1eb487 ("ocfs2: fix umask ignored issue") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Acked-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mike Kravetz authored
commit dbe409e4 upstream. Dmitry Vyukov reported the following memory leak unreferenced object 0xffff88002eaafd88 (size 32): comm "a.out", pid 5063, jiffies 4295774645 (age 15.810s) hex dump (first 32 bytes): 28 e9 4e 63 00 88 ff ff 28 e9 4e 63 00 88 ff ff (.Nc....(.Nc.... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: kmalloc include/linux/slab.h:458 region_chg+0x2d4/0x6b0 mm/hugetlb.c:398 __vma_reservation_common+0x2c3/0x390 mm/hugetlb.c:1791 vma_needs_reservation mm/hugetlb.c:1813 alloc_huge_page+0x19e/0xc70 mm/hugetlb.c:1845 hugetlb_no_page mm/hugetlb.c:3543 hugetlb_fault+0x7a1/0x1250 mm/hugetlb.c:3717 follow_hugetlb_page+0x339/0xc70 mm/hugetlb.c:3880 __get_user_pages+0x542/0xf30 mm/gup.c:497 populate_vma_page_range+0xde/0x110 mm/gup.c:919 __mm_populate+0x1c7/0x310 mm/gup.c:969 do_mlock+0x291/0x360 mm/mlock.c:637 SYSC_mlock2 mm/mlock.c:658 SyS_mlock2+0x4b/0x70 mm/mlock.c:648 Dmitry identified a potential memory leak in the routine region_chg, where a region descriptor is not free'ed on an error path. However, the root cause for the above memory leak resides in region_del. In this specific case, a "placeholder" entry is created in region_chg. The associated page allocation fails, and the placeholder entry is left in the reserve map. This is "by design" as the entry should be deleted when the map is released. The bug is in the region_del routine which is used to delete entries within a specific range (and when the map is released). region_del did not handle the case where a placeholder entry exactly matched the start of the range range to be deleted. In this case, the entry would not be deleted and leaked. The fix is to take these special placeholder entries into account in region_del. The region_chg error path leak is also fixed. Fixes: feba16e2 ("mm/hugetlb: add region_del() to delete a specific range of entries") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Richard Weinberger authored
commit 9d8a7652 upstream. sigsuspend() is nowhere used except in signal.c itself, so we can mark it static and stop polluting the global namespace. But this patch is more than a boring cleanup patch; it fixes a real issue on UserModeLinux. UML has a special console driver to display ttys using xterm, or other terminal emulators, on the host side. Vegard reported that sometimes UML is unable to spawn a xterm and that he's facing the following warning: WARNING: CPU: 0 PID: 908 at include/linux/thread_info.h:128 sigsuspend+0xab/0xc0() It turned out that this warning makes absolutely no sense, as the UML xterm code calls sigsuspend() on the host side, or at least it tries to. But as the kernel itself offers a sigsuspend() symbol, the linker chose this one instead of the glibc wrapper. Interestingly this code has worked forever, but always blocked signals on the wrong side. Some recent kernel change made the WARN_ON() trigger and uncovered the bug. It is a wonderful example of how much works by chance on computers. :-) Fixes: 68f3f16d ("new helper: sigsuspend()") Signed-off-by: Richard Weinberger <richard@nod.at> Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Tested-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
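The visible change is just the linkage of the helper (a sketch of the hunk in kernel/signal.c):

	-int sigsuspend(sigset_t *set)
	+static int sigsuspend(sigset_t *set)	/* signal.c is the only user */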
-
Naoya Horiguchi authored
commit 0d777df5 upstream. Currently at the beginning of hugetlb_fault(), we call huge_pte_offset() and check whether the obtained *ptep is a migration/hwpoison entry or not. And if not, then we get to call huge_pte_alloc(). This is racy because the *ptep could turn into a migration/hwpoison entry after the huge_pte_offset() check. This race results in a BUG_ON in huge_pte_alloc(). We don't have to call huge_pte_alloc() when huge_pte_offset() returns non-NULL, so let's fix this bug by moving the code into the else block. Note that the *ptep could still turn into a migration/hwpoison entry after this block, but that's not a problem because we have another !pte_present check later (we never go into hugetlb_no_page() in that case.) Fixes: 290408d4 ("hugetlb: hugepage migration core") Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
OGAWA Hirofumi authored
commit 928a4771 upstream. For the root directory, . and .. are faked (using dir_emit_dots()) and ctx->pos is reset from 2 to 0. A corrupted root directory could cause fat_get_entry() to fail, but ->iterate() (fat_readdir()) reports progress to the VFS (with ctx->pos rewound to 0), so any following calls to ->iterate() continue to return the same entries again and again. The result is that userspace will never see the end of the directory, causing e.g. 'ls' to hang in a getdents() loop. [hirofumi@mail.parknet.co.jp: cleanup and make sure to correct fake_offset] Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Tested-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Richard Weinberger <richard.weinberger@gmail.com> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mike Kravetz authored
commit 1817889e upstream. Hugh Dickins pointed out problems with the new hugetlbfs fallocate hole punch code. These problems are in the routine remove_inode_hugepages and mostly occur in the case where there are holes in the range of pages to be removed. These holes could be the result of a previous hole punch or simply sparse allocation. The current code could access pages outside the specified range. remove_inode_hugepages handles both hole punch and truncate operations. Page index handling was fixed/cleaned up so that the loop index always matches the page being processed. The code now only makes a single pass through the range of pages as it was determined page faults could not race with truncate. A cond_resched() was added after removing up to PAGEVEC_SIZE pages. Some totally unnecessary code in hugetlbfs_fallocate() that remained from early development was also removed. Tested with fallocate tests submitted here: http://librelist.com/browser//libhugetlbfs/2015/6/25/patch-tests-add-tests-for-fallocate-system-call/ And, some ftruncate tests under development Fixes: b5cec28d ("hugetlbfs: truncate_hugepages() takes a range of pages") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Hillf Danton" <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michal Hocko authored
commit 373ccbe5 upstream. Tetsuo Handa has reported that the system might basically livelock in an OOM condition without triggering the OOM killer. The issue is caused by an internal dependency of the direct reclaim on vmstat counter updates (via zone_reclaimable), which are performed from workqueue context. If all the current workers get assigned to an allocation request, though, they will be looping inside the allocator trying to reclaim memory, but zone_reclaimable can see stalled numbers, so it will consider a zone reclaimable even though it has been scanned way too much. The WQ concurrency logic will not consider this situation a congested workqueue, because it relies on the assumption that a worker would have to sleep in such a situation. This also means that it doesn't try to spawn new workers or invoke the rescuer thread if one is assigned to the queue. In order to fix this issue we need to do two things. First, we have to let the WQ concurrency code know that we are in trouble, so we have to do a short sleep. In order to prevent the issues handled by 0e093d99 ("writeback: do not sleep on the congestion queue if there are no congested BDIs or if significant congestion is not being encountered in the current zone"), we limit the sleep to worker threads, which are the ones of interest anyway. The second thing to do is to create a dedicated workqueue for vmstat and mark it WQ_MEM_RECLAIM to note that it participates in reclaim and to have a spare worker thread for it. Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Tejun Heo <tj@kernel.org> Cc: Cristopher Lameter <clameter@sgi.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Arkadiusz Miskiewicz <arekm@maven.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
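The second part looks roughly like this (a sketch, not the exact mm/vmstat.c hunk):

	static struct workqueue_struct *vmstat_wq;

	/* a dedicated WQ_MEM_RECLAIM workqueue guarantees a rescuer thread */
	vmstat_wq = alloc_workqueue("vmstat", WQ_FREEZABLE | WQ_MEM_RECLAIM, 0);

	/* queue the per-cpu vmstat update on it instead of the system wq */
	queue_delayed_work_on(cpu, vmstat_wq,
			      &per_cpu(vmstat_work, cpu),
			      round_jiffies_relative(sysctl_stat_interval));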
-
Naoya Horiguchi authored
commit a88c7695 upstream. When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back on alloc_buddy_huge_page() to directly create a hugepage from the buddy allocator. In that case, however, if alloc_buddy_huge_page() succeeds we don't decrement h->resv_huge_pages, which means that a successful hugetlb_fault() returns without releasing the reserve count. As a result, subsequent hugetlb_fault() calls might fail even though there are still free hugepages. This patch simply adds the decrementing code on that code path. I reproduced this problem when testing the v4.3 kernel in the following situation: - the test machine/VM is a NUMA system, - hugepage overcommitting is enabled, - most of the hugepages are allocated and there's only one free hugepage which is on node 0 (for example), - another program, which calls set_mempolicy(MPOL_BIND) to bind itself to node 1, tries to allocate a hugepage, - the allocation should fail, but the reserve count is still held. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: David Rientjes <rientjes@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michal Hocko authored
commit c12176d3 upstream. Commit 424cdc14 ("memcg: convert threshold to bytes") has fixed a regression introduced by 3e32cb2e ("mm: memcontrol: lockless page counters") where thresholds were silently converted to use page units rather than bytes when interpreting the user input. The fix is not complete, though, as properly pointed out by Ben Hutchings during stable backport review. The page count is converted to bytes, but unsigned long is used to hold the value, which would obviously not be sufficient for 32b systems with more than 4G thresholds. The same applies to usage as taken from mem_cgroup_usage, which might overflow. Let's remove this bytes vs. pages internal tracking difference and handle thresholds in page units internally. Change mem_cgroup_usage() to return the value in page units and revert 424cdc14 because this should be sufficient for consistent handling. mem_cgroup_read_u64, as the only user of mem_cgroup_usage outside of the threshold handling code, is converted to give the proper in-bytes result. It is doing that already for the page_counter output, so this is more consistent as well. The value presented to userspace is still in byte units. Fixes: 424cdc14 ("memcg: convert threshold to bytes") Fixes: 3e32cb2e ("mm: memcontrol: lockless page counters") Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Ben Hutchings <ben@decadent.org.uk> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org>

From: Michal Hocko <mhocko@kernel.org> Subject: memcg: fix thresholds for 32b architectures. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: Johannes Weiner <hannes@cmpxchg.org>

From: Andrew Morton <akpm@linux-foundation.org> Subject: memcg: fix thresholds for 32b architectures. Don't attempt to inline mem_cgroup_usage(): the compiler ignores the inline anyway, and __always_inlining it adds 600 bytes of goop to the .o file. Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Greg Thelen authored
commit 0f930902 upstream. Since 5cec38ac ("fs, seq_file: fallback to vmalloc instead of oom kill processes") seq_buf_alloc() avoids calling the oom killer for PAGE_SIZE or smaller allocations; but larger allocations can use the oom killer via vmalloc(). Thus reads of small files can return ENOMEM, but larger files use the oom killer to avoid ENOMEM. The effect of this bug is that reads from /proc and other virtual filesystems can return ENOMEM instead of the preferred behavior - oom killing something (possibly the calling process). I don't know of anyone except Google who has noticed the issue. I suspect the fix is more needed in smaller systems where there isn't any reclaimable memory. But these seem like the kinds of systems which probably don't use the oom killer for production situations. Memory overcommit requires use of the oom killer to select a victim regardless of file size. Enable oom killer for small seq_buf_alloc() allocations. Fixes: 5cec38ac ("fs, seq_file: fallback to vmalloc instead of oom kill processes") Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Greg Thelen <gthelen@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andy Shevchenko authored
commit 9f029f54 upstream. There is a classical off-by-one error in the case where we try to place, for example, 1+1 bytes as hex in a buffer of size 6. The expected result is a truncated output, but in reality we get 6 bytes filled followed by a terminating NUL. Change the logic of how we fill the output when dumping bytes into limited space. This follows the snprintf() behaviour by truncating the output even on half bytes. Fixes: 114fc1af (hexdump: make it return number of bytes placed in buffer) Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reported-by: Aaro Koskinen <aaro.koskinen@nokia.com> Tested-by: Aaro Koskinen <aaro.koskinen@nokia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
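The truncation rule mirrors snprintf(): stop before emitting a byte that no longer fits. A simplified sketch of that inner loop (not the literal lib/hexdump.c code):

	/* reserve space for two hex digits plus the terminating NUL */
	for (i = 0; i < len; i++) {
		if (lx + 3 > linebuflen)
			goto overflow;		/* truncate, like snprintf() would */
		ch = ptr[i];
		linebuf[lx++] = hex_asc_hi(ch);
		linebuf[lx++] = hex_asc_lo(ch);
	}
	linebuf[lx] = '\0';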
-
Tetsuo Handa authored
commit 426fb5e7 upstream. It was confirmed that a local unprivileged user can consume all memory reserves and hang up that system using time lag between the OOM killer sets TIF_MEMDIE on an OOM victim and sends SIGKILL to that victim, for printk() inside for_each_process() loop at oom_kill_process() can consume many seconds when there are many thread groups sharing the same memory. Before starting oom-depleter process: Node 0 DMA: 3*4kB (UM) 6*8kB (U) 4*16kB (UEM) 0*32kB 0*64kB 1*128kB (M) 2*256kB (EM) 2*512kB (UE) 2*1024kB (EM) 1*2048kB (E) 1*4096kB (M) = 9980kB Node 0 DMA32: 31*4kB (UEM) 27*8kB (UE) 32*16kB (UE) 13*32kB (UE) 14*64kB (UM) 7*128kB (UM) 8*256kB (UM) 8*512kB (UM) 3*1024kB (U) 4*2048kB (UM) 362*4096kB (UM) = 1503220kB As of invoking the OOM killer: Node 0 DMA: 11*4kB (UE) 8*8kB (UEM) 6*16kB (UE) 2*32kB (EM) 0*64kB 1*128kB (U) 3*256kB (UEM) 2*512kB (UE) 3*1024kB (UEM) 1*2048kB (U) 0*4096kB = 7308kB Node 0 DMA32: 1049*4kB (UEM) 507*8kB (UE) 151*16kB (UE) 53*32kB (UEM) 83*64kB (UEM) 52*128kB (EM) 25*256kB (UEM) 11*512kB (M) 6*1024kB (UM) 1*2048kB (M) 0*4096kB = 44556kB Between the thread group leader got TIF_MEMDIE and receives SIGKILL: Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB The oom-depleter's thread group leader which got TIF_MEMDIE started memset() in user space after the OOM killer set TIF_MEMDIE, and it was free to abuse ALLOC_NO_WATERMARKS by TIF_MEMDIE for memset() in user space until SIGKILL is delivered. If SIGKILL is delivered before TIF_MEMDIE is set, the oom-depleter can terminate without touching memory reserves. Although the possibility of hitting this time lag is very small for 3.19 and earlier kernels because TIF_MEMDIE is set immediately before sending SIGKILL, preemption or long interrupts (an extreme example is SysRq-t) can step between and allow memory allocations which are not needed for terminating the OOM victim. Fixes: 83363b91 ("oom: make sure that TIF_MEMDIE is set under task_lock") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Catalin Marinas authored
commit d4322d88 upstream. On systems with a KMALLOC_MIN_SIZE of 128 (arm64, some mips and powerpc configurations defining ARCH_DMA_MINALIGN to 128), the first kmalloc_caches[] entry to be initialised after slab_early_init = 0 is "kmalloc-128" with index 7. Depending on the debug kernel configuration, sizeof(struct kmem_cache) can be larger than 128 resulting in an INDEX_NODE of 8. Commit 8fc9cf42 ("slab: make more slab management structure off the slab") enables off-slab management objects for sizes starting with PAGE_SIZE >> 5 (128 bytes for a 4KB page configuration) and the creation of the "kmalloc-128" cache would try to place the management objects off-slab. However, since KMALLOC_MIN_SIZE is already 128 and freelist_size == 32 in __kmem_cache_create(), kmalloc_slab(freelist_size) returns NULL (kmalloc_caches[7] not populated yet). This triggers the following bug on arm64: kernel BUG at /work/Linux/linux-2.6-aarch64/mm/slab.c:2283! Internal error: Oops - BUG: 0 [#1] SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.3.0-rc4+ #540 Hardware name: Juno (DT) PC is at __kmem_cache_create+0x21c/0x280 LR is at __kmem_cache_create+0x210/0x280 [...] Call trace: __kmem_cache_create+0x21c/0x280 create_boot_cache+0x48/0x80 create_kmalloc_cache+0x50/0x88 create_kmalloc_caches+0x4c/0xf4 kmem_cache_init+0x100/0x118 start_kernel+0x214/0x33c This patch introduces an OFF_SLAB_MIN_SIZE definition to avoid off-slab management objects for sizes equal to or smaller than KMALLOC_MIN_SIZE. Fixes: 8fc9cf42 ("slab: make more slab management structure off the slab") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
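Conceptually the guard looks like this (a sketch; the exact OFF_SLAB_MIN_SIZE expression in mm/slab.c may differ):

	/* never put freelist management off-slab for objects that are not
	 * strictly larger than the smallest kmalloc cache */
	#define OFF_SLAB_MIN_SIZE	(max_t(size_t, PAGE_SIZE >> 5, KMALLOC_MIN_SIZE + 1))

	if (size >= OFF_SLAB_MIN_SIZE && !slab_early_init)
		flags |= CFLGS_OFF_SLAB;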
-