- 29 Apr, 2016 6 commits
-
-
Konstantin Khlebnikov authored
Khugepaged detects own VMAs by checking vm_file and vm_ops but this way it cannot distinguish private /dev/zero mappings from other special mappings like /dev/hpet which has no vm_ops and popultes PTEs in mmap. This fixes false-positive VM_BUG_ON and prevents installing THP where they are not expected. Link: http://lkml.kernel.org/r/CACT4Y+ZmuZMV5CjSFOeXviwQdABAgT7T+StKfTqan9YDtgEi5g@mail.gmail.com Fixes: 78f11a25 ("mm: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanups") Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Krzysztof Kozlowski authored
Patchwork introduced a garbled Polish character in commit 1e3012d0 ("crypto: s5p-sss - Use memcpy_toio for iomem annotated memory") so fix the mail mapping. Additionally prefer to use kernel.org account for personal work, instead of my gmail address. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
Andrea has found[1] a race condition on MMU-gather based TLB flush vs split_huge_page() or shrinker which frees huge zero under us (patch 1/2 and 2/2 respectively). With new THP refcounting, we don't need patch 1/2: mmu_gather keeps the page pinned until flush is complete and the pin prevents the page from being split under us. We still need patch 2/2. This is simplified version of Andrea's patch. We don't need fancy encoding. [1] http://lkml.kernel.org/r/1447938052-22165-1-git-send-email-aarcange@redhat.comSigned-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Steve Capper authored
HugeTLB pages cannot be split, so we use the compound_mapcount to track rmaps. Currently page_mapped() will check the compound_mapcount, but will also go through the constituent pages of a THP compound page and query the individual _mapcount's too. Unfortunately, page_mapped() does not distinguish between HugeTLB and THP compound pages and assumes that a compound page always needs to have HPAGE_PMD_NR pages querying. For most cases when dealing with HugeTLB this is just inefficient, but for scenarios where the HugeTLB page size is less than the pmd block size (e.g. when using contiguous bit on ARM) this can lead to crashes. This patch adjusts the page_mapped function such that we skip the unnecessary THP reference checks for HugeTLB pages. Fixes: e1534ae9 ("mm: differentiate page_mapped() from page_mapcount() for compound pages") Signed-off-by: Steve Capper <steve.capper@arm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Atsushi Kumagai authored
PageAnon() always look at head page to check PAGE_MAPPING_ANON and tail page's page->mapping has just a poisoned data since commit 1c290f64 ("mm: sanitize page->mapping for tail pages"). If makedumpfile checks page->mapping of a compound tail page to distinguish anonymous page as usual, it must fail in newer kernel. So it's necessary to export OFFSET(page.compound_head) to avoid checking compound tail pages. The problem is that unnecessary hugepages won't be removed from a dump file in kernels 4.5.x and later. This means that extra disk space would be consumed. It's a problem, but not critical. Signed-off-by: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Atsushi Kumagai authored
makedumpfile refers page.lru.next to get the order of compound pages for page filtering. However, now the order is stored in page.compound_order, hence VMCOREINFO should be updated to export the offset of page.compound_order. The fact is, page.compound_order was introduced already in kernel 4.0, but the offset of it was the same as page.lru.next until kernel 4.3, so this was not actual problem. The above can be said also for page.lru.prev and page.compound_dtor, it's necessary to detect hugetlbfs pages. Further, the content was changed from direct address to the ID which means dtor. The problem is that unnecessary hugepages won't be removed from a dump file in kernels 4.4.x and later. This means that extra disk space would be consumed. It's a problem, but not critical. Signed-off-by: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 27 Apr, 2016 9 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds authored
Pull workqueue fix from Tejun Heo: "So, it turns out we had a silly bug in the most fundamental part of workqueue for a very long time. AFAICS, this dates back to pre-git era and has quite likely been there from the time workqueue was first introduced. A work item uses its PENDING bit to synchronize multiple queuers. Anyone who wins the PENDING bit owns the pending state of the work item. Whether a queuer wins or loses the race, one thing should be guaranteed - there will soon be at least one execution of the work item - where "after" means that the execution instance would be able to see all the changes that the queuer has made prior to the queueing attempt. Unfortunately, we were missing a smp_mb() after clearing PENDING for execution, so nothing guaranteed visibility of the changes that a queueing loser has made, which manifested as a reproducible blk-mq stall. Lots of kudos to Roman for debugging the problem. The patch for -stable is the minimal one. For v3.7, Peter is working on a patch to make the code path slightly more efficient and less fragile" * 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix ghost PENDING flag while doing MQ IO
-
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroupLinus Torvalds authored
Pull cgroup fixes from Tejun Heo: "Two patches to fix a deadlock which can be easily triggered if memcg charge moving is used. This bug was introduced while converting threadgroup locking to a global percpu_rwsem and is caused by cgroup controller task migration path depending on the ability to create new kthreads. cpuset had a similar issue which was fixed by performing heavy-lifting operations asynchronous to task migration. The two patches fix the same issue in memcg in a similar way. The first patch makes the mechanism generic and the second relocates memcg charge moving outside the migration path. Given that we don't want to perform heavy operations while writelocking threadgroup lock anyway, moving them out of the way is a desirable solution. One thing to note is that the problem was difficult to debug because lockdep couldn't figure out the deadlock condition. Looking into how to improve that" * 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: memcg: relocate charge moving from ->attach to ->post_attach cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback
-
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linuxLinus Torvalds authored
Pull i2c fixes from Wolfram Sang: "I2C has one buildfix, one ABBA deadlock fix, and three simple 'add ID' patches" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: exynos5: Fix possible ABBA deadlock by keeping I2C clock prepared i2c: cpm: Fix build break due to incompatible pointer types i2c: ismt: Add Intel DNV PCI ID i2c: xlp9xx: add support for Broadcom Vulcan i2c: rk3x: add support for rk3228
-
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arcLinus Torvalds authored
Pull ARC fixes from Vineet Gupta: - lockdep now works for ARCv2 builds - enable DT reserved-memory binding (for forthcoming HDMI driver) * tag 'arc-4.6-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: add support for reserved memory defined by device tree ARC: support generic per-device coherent dma mem Documentation: dt: arc: fix spelling mistakes ARCv2: Enable LOCKDEP
-
git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2Linus Torvalds authored
Pull arch/nios2 fix from Ley Foon Tan: "memset: use the right constraint modifier for the %4 output operand" * tag 'nios2-v4.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2: nios2: memset: use the right constraint modifier for the %4 output operand
-
Linus Torvalds authored
Merge tag 'platform-drivers-x86-v4.6-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86 Pull x86 platform driver fix from Darren Hart: "Fix regression caused by hotkey enabling value in toshiba_acpi" * tag 'platform-drivers-x86-v4.6-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: toshiba_acpi: Fix regression caused by hotkey enabling value
-
Alexey Brodkin authored
Enable reserved memory initialization from device tree. Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
-
Alexey Brodkin authored
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
-
Romain Perier authored
Depending on the size of the area to be memset'ed, the nios2 memset implementation either uses a naive loop (for buffers smaller or equal than 8 bytes) or a more optimized implementation (for buffers larger than 8 bytes). This implementation does 4-byte stores rather than 1-byte stores to speed up memset. However, we discovered that on our nios2 platform, memset() was not properly setting the buffer to the expected value. A memset of 0xff would not set the entire buffer to 0xff, but to: 0xff 0x00 0xff 0x00 0xff 0x00 0xff 0x00 ... Which is obviously incorrect. Our investigation has revealed that the problem lies in the incorrect constraints used in the inline assembly. The following piece of assembly, from the nios2 memset implementation, is supposed to create a 4-byte value that repeats 4 times the 1-byte pattern passed as memset argument: /* fill8 %3, %5 (c & 0xff) */ " slli %4, %5, 8\n" " or %4, %4, %5\n" " slli %3, %4, 16\n" " or %3, %3, %4\n" However, depending on the compiler and optimization level, this code might be compiled as: 34: 280a923a slli r5,r5,8 38: 294ab03a or r5,r5,r5 3c: 2808943a slli r4,r5,16 40: 2148b03a or r4,r4,r5 This is wrong because r5 gets used both for %5 and %4, which leads to the final pattern stored in r4 to be 0xff00ff00 rather than the expected 0xffffffff. %4 is defined with the "=r" constraint, i.e as an output operand. However, as explained in http://www.ethernut.de/en/documents/arm-inline-asm.html, this does not prevent gcc from using the same register for an output operand (%4) and input operand (%5). By using the constraint modifier '&', we indicate that the register should be used for output only. With this change, we get the following assembly output: 34: 2810923a slli r8,r5,8 38: 4150b03a or r8,r8,r5 3c: 400e943a slli r7,r8,16 40: 3a0eb03a or r7,r7,r8 Which correctly produces the 0xffffffff pattern when 0xff is passed as the memset() pattern. It is worth mentioning the observed consequence of this bug: we were hitting the kernel BUG() in mm/bootmem.c:__free() that verifies when marking a page as free that it was previously marked as occupied (i.e that the bit was set to 1). The entire bootmem bitmap is set to 0xff bit via a memset() during the bootmem initialization. The bootmem_free() call right after the initialization was finding some bits to be set to 0, which didn't make sense since the bitmap has just been memset'ed to 0xff. Except that due to the bug explained above, the bitmap was in fact initialized to 0xff00ff00. Thanks to Marek Vasut for his help and feedback. Signed-off-by: Romain Perier <romain.perier@free-electrons.com> Acked-by: Marek Vasut <marex@denx.de> Acked-by: Ley Foon Tan <lftan@altera.com>
-
- 26 Apr, 2016 10 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds authored
Pull networking fixes from David Miller: 1) Handle v4/v6 mixed sockets properly in soreuseport, from Craig Gallak. 2) Bug fixes for the new macsec facility (missing kmalloc NULL checks, missing locking around netdev list traversal, etc.) from Sabrina Dubroca. 3) Fix handling of host routes on ifdown in ipv6, from David Ahern. 4) Fix double-fdput in bpf verifier. From Jann Horn. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (31 commits) bpf: fix double-fdput in replace_map_fd_with_map_ptr() net: ipv6: Delete host routes on an ifdown Revert "ipv6: Revert optional address flusing on ifdown." net/mlx4_en: fix spurious timestamping callbacks net: dummy: remove note about being Y by default cxgbi: fix uninitialized flowi6 ipv6: Revert optional address flusing on ifdown. ipv4/fib: don't warn when primary address is missing if in_dev is dead net/mlx5: Add pci shutdown callback net/mlx5_core: Remove static from local variable net/mlx5e: Use vport MTU rather than physical port MTU net/mlx5e: Fix minimum MTU net/mlx5e: Device's mtu field is u16 and not int net/mlx5_core: Add ConnectX-5 to list of supported devices net/mlx5e: Fix MLX5E_100BASE_T define net/mlx5_core: Fix soft lockup in steering error flow qlcnic: Update version to 5.3.64 net: stmmac: socfpga: Remove re-registration of reset controller macsec: fix netlink attribute validation macsec: add missing macsec prefix in uapi ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds authored
Pull ARM SoC fixes from Arnd Bergmann: "Here are the latest bug fixes for ARM SoCs, mostly addressing recent regressions. Changes are across several platforms, so I'm listing every change separately here. Regressions since 4.5: - A correction of the psci firmware DT binding, to prevent users from relying on unintended semantics - Actually getting the newly merged clock driver for some OMAP platforms to work - A revert of patches for the Qualcomm BAM, these need to be reworked for 4.7 to avoid breaking boards other than the one they were intended for - A correction for the I2C device nodes on the Socionext Uniphier platform - i.MX SDHCI was broken for non-DT platforms due to a change with the setting of the DMA mask - A revert of a patch that accidentally added a nonexisting clock on the Rensas "Porter" board - A couple of OMAP fixes that are all related to suspend after the power domain changes for dra7 - On Mediatek, revert part of the power domain initialization changes that broke mt8173-evb Fixes for older bugs: - Workaround for an "external abort" in the omap34xx suspend/resume code. - The USB1/eSATA should not be listed as an excon device on am57xx-beagle-x15 (broken since v4.0) - A v4.5 regression in the TI AM33xx and AM43XX DT specifying incorrect DMA request lines for the GPMC - The jiffies calibration on Renesas platforms was incorrect for some modern CPU cores. - A hardware errata woraround for clockdomains on TI DRA7" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: drivers: firmware: psci: unify enable-method binding on ARM {64,32}-bit systems arm64: dts: uniphier: fix I2C nodes of PH1-LD20 ARM: shmobile: timer: Fix preset_lpj leading to too short delays Revert "ARM: dts: porter: Enable SCIF_CLK frequency and pins" ARM: dts: r8a7791: Don't disable referenced optional clocks Revert "ARM: OMAP: Catch callers of revision information prior to it being populated" ARM: OMAP3: Fix external abort on 36xx waking from off mode idle ARM: dts: am57xx-beagle-x15: remove extcon_usb1 ARM: dts: am437x: Fix GPMC dma properties ARM: dts: am33xx: Fix GPMC dma properties Revert "soc: mediatek: SCPSYS: Fix double enabling of regulators" ARM: mach-imx: sdhci-esdhc-imx: initialize DMA mask ARM: DRA7: clockdomain: Implement timer workaround for errata i874 ARM: OMAP: Catch callers of revision information prior to it being populated ARM: dts: dra7: Correct clock tree for sys_32k_ck ARM: OMAP: DRA7: Provide proper class to omap2_set_globals_tap ARM: OMAP: DRA7: wakeupgen: Skip SAR save for wakeupgen Revert "dts: msm8974: Add dma channels for blsp2_i2c1 node" Revert "dts: msm8974: Add blsp2_bam dma node" ARM: dts: Add clocks for dm814x ADPLL
-
Linus Torvalds authored
This is more prep-work for the upcoming pty changes. Still just code cleanup with no actual semantic changes. This removes a bunch pointless complexity by just having the slave pty side remember the dentry associated with the devpts slave rather than the inode. That allows us to remove all the "look up the dentry" code for when we want to remove it again. Together with moving the tty pointer from "inode->i_private" to "dentry->d_fsdata" and getting rid of pointless inode locking, this removes about 30 lines of code. Not only is the end result smaller, it's simpler and easier to understand. The old code, for example, depended on the d_find_alias() to not just find the dentry, but also to check that it is still hashed, which in turn validated the tty pointer in the inode. That is a _very_ roundabout way to say "invalidate the cached tty pointer when the dentry is removed". The new code just does dentry->d_fsdata = NULL; in devpts_pty_kill() instead, invalidating the tty pointer rather more directly and obviously. Don't do something complex and subtle when the obvious straightforward approach will do. The rest of the patch (ie apart from code deletion and the above tty pointer clearing) is just switching the calling convention to pass the dentry or file pointer around instead of the inode. Cc: Eric Biederman <ebiederm@xmission.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Cc: Jann Horn <jann@thejh.net> Cc: Greg KH <greg@kroah.com> Cc: Jiri Slaby <jslaby@suse.com> Cc: Florian Weimer <fw@deneb.enyo.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jann Horn authored
When bpf(BPF_PROG_LOAD, ...) was invoked with a BPF program whose bytecode references a non-map file descriptor as a map file descriptor, the error handling code called fdput() twice instead of once (in __bpf_map_get() and in replace_map_fd_with_map_ptr()). If the file descriptor table of the current task is shared, this causes f_count to be decremented too much, allowing the struct file to be freed while it is still in use (use-after-free). This can be exploited to gain root privileges by an unprivileged user. This bug was introduced in commit 0246e64d ("bpf: handle pseudo BPF_LD_IMM64 insn"), but is only exploitable since commit 1be7f75d ("bpf: enable non-root eBPF programs") because previously, CAP_SYS_ADMIN was required to reach the vulnerable code. (posted publicly according to request by maintainer) Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Ahern authored
It was a simple idea -- save IPv6 configured addresses on a link down so that IPv6 behaves similar to IPv4. As always the devil is in the details and the IPv6 stack as too many behavioral differences from IPv4 making the simple idea more complicated than it needs to be. The current implementation for keeping IPv6 addresses can panic or spit out a warning in one of many paths: 1. IPv6 route gets an IPv4 route as its 'next' which causes a panic in rt6_fill_node while handling a route dump request. 2. rt->dst.obsolete is set to DST_OBSOLETE_DEAD hitting the WARN_ON in fib6_del 3. Panic in fib6_purge_rt because rt6i_ref count is not 1. The root cause of all these is references related to the host route for an address that is retained. So, this patch deletes the host route every time the ifdown loop runs. Since the host route is deleted and will be re-generated an up there is no longer a need for the l3mdev fix up. On the 'admin up' side move addrconf_permanent_addr into the NETDEV_UP event handling so that it runs only once versus on UP and CHANGE events. All of the current panics and warnings appear to be related to addresses on the loopback device, but given the catastrophic nature when a bug is triggered this patch takes the conservative approach and evicts all host routes rather than trying to determine when it can be re-used and when it can not. That can be a later optimizaton if desired. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
This reverts commit 841645b5. Ok, this puts the feature back. I've decided to apply David A.'s bug fix and run with that rather than make everyone wait another whole release for this feature. Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roman Pen authored
The bug in a workqueue leads to a stalled IO request in MQ ctx->rq_list with the following backtrace: [ 601.347452] INFO: task kworker/u129:5:1636 blocked for more than 120 seconds. [ 601.347574] Tainted: G O 4.4.5-1-storage+ #6 [ 601.347651] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 601.348142] kworker/u129:5 D ffff880803077988 0 1636 2 0x00000000 [ 601.348519] Workqueue: ibnbd_server_fileio_wq ibnbd_dev_file_submit_io_worker [ibnbd_server] [ 601.348999] ffff880803077988 ffff88080466b900 ffff8808033f9c80 ffff880803078000 [ 601.349662] ffff880807c95000 7fffffffffffffff ffffffff815b0920 ffff880803077ad0 [ 601.350333] ffff8808030779a0 ffffffff815b01d5 0000000000000000 ffff880803077a38 [ 601.350965] Call Trace: [ 601.351203] [<ffffffff815b0920>] ? bit_wait+0x60/0x60 [ 601.351444] [<ffffffff815b01d5>] schedule+0x35/0x80 [ 601.351709] [<ffffffff815b2dd2>] schedule_timeout+0x192/0x230 [ 601.351958] [<ffffffff812d43f7>] ? blk_flush_plug_list+0xc7/0x220 [ 601.352208] [<ffffffff810bd737>] ? ktime_get+0x37/0xa0 [ 601.352446] [<ffffffff815b0920>] ? bit_wait+0x60/0x60 [ 601.352688] [<ffffffff815af784>] io_schedule_timeout+0xa4/0x110 [ 601.352951] [<ffffffff815b3a4e>] ? _raw_spin_unlock_irqrestore+0xe/0x10 [ 601.353196] [<ffffffff815b093b>] bit_wait_io+0x1b/0x70 [ 601.353440] [<ffffffff815b056d>] __wait_on_bit+0x5d/0x90 [ 601.353689] [<ffffffff81127bd0>] wait_on_page_bit+0xc0/0xd0 [ 601.353958] [<ffffffff81096db0>] ? autoremove_wake_function+0x40/0x40 [ 601.354200] [<ffffffff81127cc4>] __filemap_fdatawait_range+0xe4/0x140 [ 601.354441] [<ffffffff81127d34>] filemap_fdatawait_range+0x14/0x30 [ 601.354688] [<ffffffff81129a9f>] filemap_write_and_wait_range+0x3f/0x70 [ 601.354932] [<ffffffff811ced3b>] blkdev_fsync+0x1b/0x50 [ 601.355193] [<ffffffff811c82d9>] vfs_fsync_range+0x49/0xa0 [ 601.355432] [<ffffffff811cf45a>] blkdev_write_iter+0xca/0x100 [ 601.355679] [<ffffffff81197b1a>] __vfs_write+0xaa/0xe0 [ 601.355925] [<ffffffff81198379>] vfs_write+0xa9/0x1a0 [ 601.356164] [<ffffffff811c59d8>] kernel_write+0x38/0x50 The underlying device is a null_blk, with default parameters: queue_mode = MQ submit_queues = 1 Verification that nullb0 has something inflight: root@pserver8:~# cat /sys/block/nullb0/inflight 0 1 root@pserver8:~# find /sys/block/nullb0/mq/0/cpu* -name rq_list -print -exec cat {} \; ... /sys/block/nullb0/mq/0/cpu2/rq_list CTX pending: ffff8838038e2400 ... During debug it became clear that stalled request is always inserted in the rq_list from the following path: save_stack_trace_tsk + 34 blk_mq_insert_requests + 231 blk_mq_flush_plug_list + 281 blk_flush_plug_list + 199 wait_on_page_bit + 192 __filemap_fdatawait_range + 228 filemap_fdatawait_range + 20 filemap_write_and_wait_range + 63 blkdev_fsync + 27 vfs_fsync_range + 73 blkdev_write_iter + 202 __vfs_write + 170 vfs_write + 169 kernel_write + 56 So blk_flush_plug_list() was called with from_schedule == true. If from_schedule is true, that means that finally blk_mq_insert_requests() offloads execution of __blk_mq_run_hw_queue() and uses kblockd workqueue, i.e. it calls kblockd_schedule_delayed_work_on(). That means, that we race with another CPU, which is about to execute __blk_mq_run_hw_queue() work. Further debugging shows the following traces from different CPUs: CPU#0 CPU#1 ---------------------------------- ------------------------------- reqeust A inserted STORE hctx->ctx_map[0] bit marked kblockd_schedule...() returns 1 <schedule to kblockd workqueue> request B inserted STORE hctx->ctx_map[1] bit marked kblockd_schedule...() returns 0 *** WORK PENDING bit is cleared *** flush_busy_ctxs() is executed, but bit 1, set by CPU#1, is not observed As a result request B pended forever. This behaviour can be explained by speculative LOAD of hctx->ctx_map on CPU#0, which is reordered with clear of PENDING bit and executed _before_ actual STORE of bit 1 on CPU#1. The proper fix is an explicit full barrier <mfence>, which guarantees that clear of PENDING bit is to be executed before all possible speculative LOADS or STORES inside actual work function. Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Cc: Gioh Kim <gi-oh.kim@profitbricks.com> Cc: Michael Wang <yun.wang@profitbricks.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Tejun Heo <tj@kernel.org>
-
Sudeep Holla authored
Currently ARM CPUs DT bindings allows different enable-method value for PSCI based systems. On ARM 64-bit this property is required and must be "psci" while on ARM 32-bit systems this property is optional and must be "arm,psci" if present. However, "arm,psci" has always been the compatible string for the PSCI node, and was never intended to be the enable-method. So this is a bug in the binding and not a deliberate attempt at specifying 32-bit differently. This is problematic if 32-bit OS is run on 64-bit system which has "psci" as enable-method rather than the expected "arm,psci". So let's unify the value into "psci" and remove support for "arm,psci" before it finds any users. Reported-by: Soby Mathew <Soby.Mathew@arm.com> Cc: Rob Herring <robh+dt@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-
Eric Dumazet authored
When multiple skb are TX-completed in a row, we might incorrectly keep a timestamp of a prior skb and cause extra work. Fixes: ec693d47 ("net/mlx4_en: Add HW timestamping (TS) support") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ivan Babrou authored
Signed-off-by: Ivan Babrou <ivan@cloudflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 25 Apr, 2016 9 commits
-
-
Jiri Benc authored
ip6_route_output looks into different fields in the passed flowi6 structure, yet cxgbi passes garbage in nearly all those fields. Zero the structure out first. Fixes: fc8d0590 ("libcxgbi: Add ipv6 api to driver") Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tejun Heo authored
Hello, So, this ended up a lot simpler than I originally expected. I tested it lightly and it seems to work fine. Petr, can you please test these two patches w/o the lru drain drop patch and see whether the problem is gone? Thanks. ------ 8< ------ If charge moving is used, memcg performs relabeling of the affected pages from its ->attach callback which is called under both cgroup_threadgroup_rwsem and thus can't create new kthreads. This is fragile as various operations may depend on workqueues making forward progress which relies on the ability to create new kthreads. There's no reason to perform charge moving from ->attach which is deep in the task migration path. Move it to ->post_attach which is called after the actual migration is finished and cgroup_threadgroup_rwsem is dropped. * move_charge_struct->mm is added and ->can_attach is now responsible for pinning and recording the target mm. mem_cgroup_clear_mc() is updated accordingly. This also simplifies mem_cgroup_move_task(). * mem_cgroup_move_task() is now called from ->post_attach instead of ->attach. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@kernel.org> Debugged-and-tested-by: Petr Mladek <pmladek@suse.com> Reported-by: Cyril Hrubis <chrubis@suse.cz> Reported-by: Johannes Weiner <hannes@cmpxchg.org> Fixes: 1ed13287 ("sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem") Cc: <stable@vger.kernel.org> # 4.4+
-
Tejun Heo authored
Since e93ad19d ("cpuset: make mm migration asynchronous"), cpuset kicks off asynchronous NUMA node migration if necessary during task migration and flushes it from cpuset_post_attach_flush() which is called at the end of __cgroup_procs_write(). This is to avoid performing migration with cgroup_threadgroup_rwsem write-locked which can lead to deadlock through dependency on kworker creation. memcg has a similar issue with charge moving, so let's convert it to an official callback rather than the current one-off cpuset specific function. This patch adds cgroup_subsys->post_attach callback and makes cpuset register cpuset_post_attach_flush() as its ->post_attach. The conversion is mostly one-to-one except that the new callback is called under cgroup_mutex. This is to guarantee that no other migration operations are started before ->post_attach callbacks are finished. cgroup_mutex is one of the outermost mutex in the system and has never been and shouldn't be a problem. We can add specialized synchronization around __cgroup_procs_write() but I don't think there's any noticeable benefit. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> # 4.4+ prerequisite for the next patch
-
David S. Miller authored
This reverts the following three commits: 70af921d 799977d9 f1705ec1 The feature was ill conceived, has terrible semantics, and has added nothing but regressions to the already fragile ipv6 stack. Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional") Signed-off-by: David S. Miller <davem@davemloft.net>
-
Azael Avalos authored
Commit 52cbae01 ("toshiba_acpi: Change default Hotkey enabling value") changed the hotkeys enabling value, as it was the same value Windows uses, however, it turns out that the value tells the EC that the driver will now take care of the hardware events like the physical RFKill switch or the pointing device toggle button. This patch reverts such commit by changing the default hotkey enabling value to 0x09, which enables hotkey events only, making the hardware buttons working again. Fixes bugs 113331 and 114941. Signed-off-by: Azael Avalos <coproscefalo@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Darren Hart <dvhart@linux.intel.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds authored
Pull crypto fixes from Herbert Xu: "This fixes a couple of regressions in the talitos driver that were introduced back in 4.3. The first bug causes a crash when the driver's AEAD functionality is used while the second bug prevents its AEAD feature from working once you get past the first bug" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: talitos - fix AEAD tcrypt tests crypto: talitos - fix crash in talitos_cra_init()
-
Kevin Hilman authored
Merge tag 'omap-for-v4.6/dt-ti81xx-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes Enable dm814x and dra62x clock driver. This branch has a dependency to the clk-ti branch from the Linux clk tree for the ADPLL clock driver. Otherwise things won't keep booting properly when we flip over to use the clock driver instead of fixed clocks set up by the bootloader. * tag 'omap-for-v4.6/dt-ti81xx-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: dts: Add clocks for dm814x ADPLL
-
Eric Engestrom authored
Signed-off-by: Eric Engestrom <eric@engestrom.ch> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
-
Paolo Abeni authored
After commit fbd40ea0 ("ipv4: Don't do expensive useless work during inetdev destroy.") when deleting an interface, fib_del_ifaddr() can be executed without any primary address present on the dead interface. The above is safe, but triggers some "bug: prim == NULL" warnings. This commit avoids warning if the in_dev is dead Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 24 Apr, 2016 6 commits
-
-
Linus Torvalds authored
-
David S. Miller authored
Saeed Mahameed says: ==================== mlx5 driver updates and fixes Changes from V0: - Dropped: ("net/mlx5e: Reset link modes upon setting speed to zero") - Fixed compilation issue introduced to mlx5_ib driver. - Rebased to df637193 ('Revert "Prevent NUll pointer dereference with two PHYs on cpsw"') This series has few bug fixes for mlx5 core and ethernet driver. Eli fixed a wrong static local variable declaration in flow steering API. Majd added the support of ConnectX-5 PF and VF and added the support for kernel shutdown pci callback for more robust reboot procedures. Maor fixed a soft lockup in flow steering. Rana fixed a wrog speed define in mlx5 EN driver. I also had the chance to introduce some bug fixes in mlx5 EN mtu reporting and handling. For -stable: net/mlx5_core: Fix soft lockup in steering error flow net/mlx5e: Device's mtu field is u16 and not int net/mlx5e: Fix minimum MTU net/mlx5e: Use vport MTU rather than physical port MTU ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Majd Dibbiny authored
This patch introduces kexec support for mlx5. When switching kernels, kexec() calls shutdown, which unloads the driver and cleans its resources. In addition, remove unregister netdev from shutdown flow. This will allow a clean shutdown, even if some netdev clients did not release their reference from this netdev. Releasing The HW resources only is enough as the kernel is shutting down Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eli Cohen authored
The static is not required and breaks re-entrancy if it will be required. Fixes: 25302363 ("net/mlx5_core: Flow steering tree initialization") Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Saeed Mahameed authored
Set and report vport MTU rather than physical MTU, Driver will set both vport and physical port mtu and will rely on the query of vport mtu. SRIOV VFs have to report their MTU to their vport manager (PF), and this will allow them to work with any MTU they need without failing the request. Also for some cases where the PF is not a port owner, PF can work with MTU less than the physical port mtu if set physical port mtu didn't take effect. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Saeed Mahameed authored
Minimum MTU that can be set in Connectx4 device is 68. This fixes the case where a user wants to set invalid MTU, the driver will fail to satisfy this request and the interface will stay down. It is better to report an error and continue working with old mtu. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-