- 06 Jul, 2017 37 commits
-
-
Andrea Arcangeli authored
Without a max deduplication limit for each KSM page, the list of the rmap_items associated to each stable_node can grow infinitely large. During the rmap walk each entry can take up to ~10usec to process because of IPIs for the TLB flushing (both for the primary MMU and the secondary MMUs with the MMU notifier). With only 16GB of address space shared in the same KSM page, that would amount to dozens of seconds of kernel runtime. A ~256 max deduplication factor will reduce the latencies of the rmap walks on KSM pages to order of a few msec. Just doing the cond_resched() during the rmap walks is not enough, the list size must have a limit too, otherwise the caller could get blocked in (schedule friendly) kernel computations for seconds, unexpectedly. There's room for optimization to significantly reduce the IPI delivery cost during the page_referenced(), but at least for page_migration in the KSM case (used by hard NUMA bindings, compaction and NUMA balancing) it may be inevitable to send lots of IPIs if each rmap_item->mm is active on a different CPU and there are lots of CPUs. Even if we ignore the IPI delivery cost, we've still to walk the whole KSM rmap list, so we can't allow millions or billions (ulimited) number of entries in the KSM stable_node rmap_item lists. The limit is enforced efficiently by adding a second dimension to the stable rbtree. So there are three types of stable_nodes: the regular ones (identical as before, living in the first flat dimension of the stable rbtree), the "chains" and the "dups". Every "chain" and all "dups" linked into a "chain" enforce the invariant that they represent the same write protected memory content, even if each "dup" will be pointed by a different KSM page copy of that content. This way the stable rbtree lookup computational complexity is unaffected if compared to an unlimited max_sharing_limit. It is still enforced that there cannot be KSM page content duplicates in the stable rbtree itself. Adding the second dimension to the stable rbtree only after the max_page_sharing limit hits, provides for a zero memory footprint increase on 64bit archs. The memory overhead of the per-KSM page stable_tree and per virtual mapping rmap_item is unchanged. Only after the max_page_sharing limit hits, we need to allocate a stable_tree "chain" and rb_replace() the "regular" stable_node with the newly allocated stable_node "chain". After that we simply add the "regular" stable_node to the chain as a stable_node "dup" by linking hlist_dup in the stable_node_chain->hlist. This way the "regular" (flat) stable_node is converted to a stable_node "dup" living in the second dimension of the stable rbtree. During stable rbtree lookups the stable_node "chain" is identified as stable_node->rmap_hlist_len == STABLE_NODE_CHAIN (aka is_stable_node_chain()). When dropping stable_nodes, the stable_node "dup" is identified as stable_node->head == STABLE_NODE_DUP_HEAD (aka is_stable_node_dup()). The STABLE_NODE_DUP_HEAD must be an unique valid pointer never used elsewhere in any stable_node->head/node to avoid a clashes with the stable_node->node.rb_parent_color pointer, and different from &migrate_nodes. So the second field of &migrate_nodes is picked and verified as always safe with a BUILD_BUG_ON in case the list_head implementation changes in the future. The STABLE_NODE_DUP is picked as a random negative value in stable_node->rmap_hlist_len. rmap_hlist_len cannot become negative when it's a "regular" stable_node or a stable_node "dup". The stable_node_chain->nid is irrelevant. The stable_node_chain->kpfn is aliased in a union with a time field used to rate limit the stable_node_chain->hlist prunes. The garbage collection of the stable_node_chain happens lazily during stable rbtree lookups (as for all other kind of stable_nodes), or while disabling KSM with "echo 2 >/sys/kernel/mm/ksm/run" while collecting the entire stable rbtree. While the "regular" stable_nodes and the stable_node "dups" must wait for their underlying tree_page to be freed before they can be freed themselves, the stable_node "chains" can be freed immediately if the stable_node->hlist turns empty. This is because the "chains" are never pointed by any page->mapping and they're effectively stable rbtree KSM self contained metadata. [akpm@linux-foundation.org: fix non-NUMA build] Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Tested-by: Petr Holasek <pholasek@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Evgheni Dereveanchin <ederevea@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Gavin Guo <gavin.guo@canonical.com> Cc: Jay Vosburgh <jay.vosburgh@canonical.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Wei Yang authored
When start_pfn equals end_pfn, __free_pages_memory() has no effect and __free_memory_core() will finally return (end_pfn - start_pfn) = 0. This patch returns 0 directly when start_pfn equals end_pfn. Link: http://lkml.kernel.org/r/20170502131115.6650-1-richard.weiyang@gmail.comSigned-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nick Desaulniers authored
Clang and its -Wunsequenced emits a warning mm/vmscan.c:2961:25: error: unsequenced modification and access to 'gfp_mask' [-Wunsequenced] .gfp_mask = (gfp_mask = current_gfp_context(gfp_mask)), ^ While it is not clear to me whether the initialization code violates the specification (6.7.8 par 19 (ISO/IEC 9899) looks like it disagrees) the code is quite confusing and worth cleaning up anyway. Fix this by reusing sc.gfp_mask rather than the updated input gfp_mask parameter. Link: http://lkml.kernel.org/r/20170510154030.10720-1-nick.desaulniers@gmail.comSigned-off-by: Nick Desaulniers <nick.desaulniers@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Daniel Micay authored
The protection map is only modified by per-arch init code so it can be protected from writes after the init code runs. This change was extracted from PaX where it's part of KERNEXEC. Link: http://lkml.kernel.org/r/20170510174441.26163-1-danielmicay@gmail.comSigned-off-by: Daniel Micay <danielmicay@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Dave Hansen authored
There are a number of times that we loop over NR_MEM_SECTIONS, looking for section_present() on each section. But, when we have very large physical address spaces (large MAX_PHYSMEM_BITS), NR_MEM_SECTIONS becomes very large, making the loops quite long. With MAX_PHYSMEM_BITS=46 and a section size of 128MB, the current loops are 512k iterations, which we barely notice on modern hardware. But, raising MAX_PHYSMEM_BITS higher (like we will see on systems that support 5-level paging) makes this 64x longer and we start to notice, especially on slower systems like simulators. A 10-second delay for 512k iterations is annoying. But, a 640- second delay is crippling. This does not help if we have extremely sparse physical address spaces, but those are quite rare. We expect that most of the "slow" systems where this matters will also be quite small and non-sparse. To fix this, we track the highest section we've ever encountered. This lets us know when we will *never* see another section_present(), and lets us break out of the loops earlier. Doing the whole for_each_present_section_nr() macro is probably overkill, but it will ensure that any future loop iterations that we grow are more likely to be correct. Kirrill said "It shaved almost 40 seconds from boot time in qemu with 5-level paging enabled for me". Link: http://lkml.kernel.org/r/20170504174434.C45A4735@viggo.jf.intel.comSigned-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kees Cook authored
Some hardened environments want to build kernels with slab_nomerge already set (so that they do not depend on remembering to set the kernel command line option). This is desired to reduce the risk of kernel heap overflows being able to overwrite objects from merged caches and changes the requirements for cache layout control, increasing the difficulty of these attacks. By keeping caches unmerged, these kinds of exploits can usually only damage objects in the same cache (though the risk to metadata exploitation is unchanged). Link: http://lkml.kernel.org/r/20170620230911.GA25238@beastSigned-off-by: Kees Cook <keescook@chromium.org> Cc: Daniel Micay <danielmicay@gmail.com> Cc: David Windsor <dave@nullcore.net> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Daniel Micay <danielmicay@gmail.com> Cc: David Windsor <dave@nullcore.net> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Tejun Heo <tj@kernel.org> Cc: Daniel Mack <daniel@zonque.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Helge Deller <deller@gmx.de> Cc: Rik van Riel <riel@redhat.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Canjiang Lu authored
Link: http://lkml.kernel.org/r/20170616072918epcms5p4ff16c24ef8472b4c3b4371823cd87856@epcms5p4Signed-off-by: Canjiang Lu <canjiang.lu@samsung.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Wei Yang authored
kmem_cache->cpu_partial is just used when CONFIG_SLUB_CPU_PARTIAL is set, so wrap it with config CONFIG_SLUB_CPU_PARTIAL will save some space on 32bit arch. This patch wraps kmem_cache->cpu_partial in config CONFIG_SLUB_CPU_PARTIAL and wraps its sysfs too. Link: http://lkml.kernel.org/r/20170502144533.10729-4-richard.weiyang@gmail.comSigned-off-by: Wei Yang <richard.weiyang@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Wei Yang authored
cpu_slab's field partial is used when CONFIG_SLUB_CPU_PARTIAL is set, which means we can save a pointer's space on each cpu for every slub item. This patch wraps cpu_slab->partial in CONFIG_SLUB_CPU_PARTIAL and wraps its sysfs use too. [akpm@linux-foundation.org: avoid strange 80-col tricks] Link: http://lkml.kernel.org/r/20170502144533.10729-3-richard.weiyang@gmail.comSigned-off-by: Wei Yang <richard.weiyang@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Wei Yang authored
Patch series "try to save some memory for kmem_cache in some cases", v2. kmem_cache is a frequently used data in kernel. During the code reading, I found maybe we could save some space in some cases. 1. On 64bit arch, type int will occupy a word if it doesn't sit well. 2. cpu_slab->partial is just used when CONFIG_SLUB_CPU_PARTIAL is set 3. cpu_partial is just used when CONFIG_SLUB_CPU_PARTIAL is set, while just save some space on 32bit arch. This patch (of 3): On 64bit arch, struct is 8-bytes aligned, so int will occupy a word if it doesn't sit well. This patch pack red_left_pad with reserved to save 8 bytes for struct kmem_cache on a 64bit arch. Link: http://lkml.kernel.org/r/20170502144533.10729-2-richard.weiyang@gmail.comSigned-off-by: Wei Yang <richard.weiyang@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Wei Yang authored
Each time a slab is deactivated, the page and freelist pointer should be reset. This patch just merges these two options into deactivate_slab(). Link: http://lkml.kernel.org/r/20170507031215.3130-2-richard.weiyang@gmail.comSigned-off-by: Wei Yang <richard.weiyang@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Wei Yang authored
When the code comes to this point, there are two cases: 1. cpu_slab is deactivated 2. cpu_slab is empty In both cased, cpu_slab->freelist is NULL at this moment. This patch removes the redundant assignment of cpu_slab->freelist. Link: http://lkml.kernel.org/r/20170507031215.3130-1-richard.weiyang@gmail.comSigned-off-by: Wei Yang <richard.weiyang@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michal Hocko authored
There is no real reason to duplicate kvmalloc* helpers so drop alloc_fdmem and replace it with the appropriate library function. Link: http://lkml.kernel.org/r/20170531155145.17111-2-mhocko@kernel.orgSigned-off-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Arvind Yadav authored
attribute_groups are not supposed to change at runtime. All functions working with attribute_groups provided by <linux/sysfs.h> work with const attribute_group. So mark the non-const structs as const. File size before: text data bss dec hex filename 4402 1088 38 5528 1598 fs/ocfs2/stackglue.o File size After adding 'const': text data bss dec hex filename 4442 1024 38 5504 1580 fs/ocfs2/stackglue.o Link: http://lkml.kernel.org/r/cab4e59b4918db3ed2ec77073a4cb310c4429ef5.1498808026.git.arvind.yadav.cs@gmail.comSigned-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
piaojun authored
'sd->dbg_sock' is malloced in sc_common_open(), but not freed at the end of sc_fop_release(). Link: http://lkml.kernel.org/r/594FB0A4.2050105@huawei.comSigned-off-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Fabian Frederick authored
Filesystems generally use SUPER_MAGIC values from magic.h instead of a local definition. Link: http://lkml.kernel.org/r/20170521154217.27917-1-fabf@skynet.beSigned-off-by: Fabian Frederick <fabf@skynet.be> Reviewed-by: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Gang He authored
Fix a static code checker warning: fs/ocfs2/inode.c:179 ocfs2_iget() warn: passing zero to 'ERR_PTR' Fixes: d56a8f32 ("ocfs2: check/fix inode block for online file check") Link: http://lkml.kernel.org/r/1495516634-1952-1-git-send-email-ghe@suse.comSigned-off-by: Gang He <ghe@suse.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Reviewed-by: Eric Ren <zren@suse.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
SF Markus Elfring authored
drivers/sh/intc/virq.c: delete an error message for a failed memory allocation in add_virq_to_pirq() This issue was detected by using the Coccinelle software. Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf Link: http://lkml.kernel.org/r/54e30d61-5183-9911-cf35-1410fb78da5a@users.sourceforge.netSigned-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michael Ellerman authored
This header always exists, so doesn't require an ifdef around its inclusion. When CONFIG_ARCH_HAS_SET_MEMORY=y it includes the asm header, otherwise it provides empty versions of the set_memory_xx() routines. Link: http://lkml.kernel.org/r/1498717781-29151-4-git-send-email-mpe@ellerman.id.auSigned-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michael Ellerman authored
This header always exists, so doesn't require an ifdef around its inclusion. When CONFIG_ARCH_HAS_SET_MEMORY=y it includes the asm header, otherwise it provides empty versions of the set_memory_xx() routines. The usages of set_memory_xx() are still guarded by CONFIG_STRICT_MODULE_RWX. Link: http://lkml.kernel.org/r/1498717781-29151-3-git-send-email-mpe@ellerman.id.auSigned-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Laura Abbott <labbott@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michael Ellerman authored
This header always exists, so doesn't require an ifdef around its inclusion. When CONFIG_ARCH_HAS_SET_MEMORY=y it includes the asm header, otherwise it provides empty versions of the set_memory_xx() routines. Link: http://lkml.kernel.org/r/1498717781-29151-2-git-send-email-mpe@ellerman.id.auSigned-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Laura Abbott <labbott@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michael Ellerman authored
Currently code that wants to use set_memory_ro() etc, needs to include asm/set_memory.h, which doesn't exist on all arches. Some code knows it only builds on arches which have the header, other code guards the inclusion with an #ifdef, neither is ideal. So create linux/set_memory.h. This always exists, so users don't need an #ifdef just to include the header. When CONFIG_ARCH_HAS_SET_MEMORY=y it includes asm/set_memory.h, otherwise it provides empty non-failing implementations. Link: http://lkml.kernel.org/r/1498717781-29151-1-git-send-email-mpe@ellerman.id.auSigned-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Colin Ian King authored
Here are some of the more spelling mistakes and typos that I've found while fixing up spelling mistakes in kernel error message text over the past several weeks. Link: http://lkml.kernel.org/r/20170621142614.12529-1-colin.king@canonical.comSigned-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Joe Perches <joe@perches.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Rob Landley authored
Clarify help text that compression applies to ramfs as well as legacy ramdisk. Link: http://lkml.kernel.org/r/f206a960-5a61-cf59-f27c-e9f34872063c@landley.netSigned-off-by: Rob Landley <rob@landley.net> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Rob Landley authored
scripts/gen_initramfs_list.sh: teach INITRAMFS_ROOT_UID and INITRAMFS_ROOT_GID that -1 means "current user". Teach INITRAMFS_ROOT_UID and INITRAMFS_ROOT_GID that -1 means "current user". Link: http://lkml.kernel.org/r/2df3a9fb-4378-fa16-679d-99e788926c05@landley.netSigned-off-by: Rob Landley <rob@landley.net> Cc: Michal Marek <mmarek@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Logan Gunthorpe authored
Add a default ioremap function which was not provided in all circumstances. (Only when CONFIG_PCI and CONFIG_TILEGX was set). I have designs to use them in scatterlist.c where they'd likely never be called with this architecture, but it is needed to compile. Thus, if the function is ever hit it returns NULL. Link: http://lkml.kernel.org/r/1495726904-27380-1-git-send-email-logang@deltatee.comSigned-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Stephen Bates <sbates@raithlin.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Tobias Klauser authored
The mn10300 arch uses a verbatim copy of the asm-generic version and does not add any own implementations to the header, so use asm-generic/fb.h instead of duplicating code. Link: http://lkml.kernel.org/r/20170517083348.1815-1-tklauser@distanz.chSigned-off-by: Tobias Klauser <tklauser@distanz.ch> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Tobias Klauser authored
mn10300's asm/device.h is merely including asm-generic/device.h. Thus, the arch specific header can be omitted and the generic header can be used directly. Link: http://lkml.kernel.org/r/20170517124857.26834-1-tklauser@distanz.chSigned-off-by: Tobias Klauser <tklauser@distanz.ch> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Marcin Nowakowski authored
core_kernel_text is used by MIPS in its function graph trace processing, so having this method traced leads to an infinite set of recursive calls such as: Call Trace: ftrace_return_to_handler+0x50/0x128 core_kernel_text+0x10/0x1b8 prepare_ftrace_return+0x6c/0x114 ftrace_graph_caller+0x20/0x44 return_to_handler+0x10/0x30 return_to_handler+0x0/0x30 return_to_handler+0x0/0x30 ftrace_ops_no_ops+0x114/0x1bc core_kernel_text+0x10/0x1b8 core_kernel_text+0x10/0x1b8 core_kernel_text+0x10/0x1b8 ftrace_ops_no_ops+0x114/0x1bc core_kernel_text+0x10/0x1b8 prepare_ftrace_return+0x6c/0x114 ftrace_graph_caller+0x20/0x44 (...) Mark the function notrace to avoid it being traced. Link: http://lkml.kernel.org/r/1498028607-6765-1-git-send-email-marcin.nowakowski@imgtec.comSigned-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Meyer <thomas@m3y3r.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kirill A. Shutemov authored
Reinette reported the following crash: BUG: Bad page state in process log2exe pfn:57600 page:ffffea00015d8000 count:0 mapcount:0 mapping: (null) index:0x20200 flags: 0x4000000000040019(locked|uptodate|dirty|swapbacked) raw: 4000000000040019 0000000000000000 0000000000020200 00000000ffffffff raw: ffffea00015d8020 ffffea00015d8020 0000000000000000 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set bad because of flags: 0x1(locked) Modules linked in: rfcomm 8021q bnep intel_rapl x86_pkg_temp_thermal coretemp efivars btusb btrtl btbcm pwm_lpss_pci snd_hda_codec_hdmi btintel pwm_lpss snd_hda_codec_realtek snd_soc_skl snd_hda_codec_generic snd_soc_skl_ipc spi_pxa2xx_platform snd_soc_sst_ipc snd_soc_sst_dsp i2c_designware_platform i2c_designware_core snd_hda_ext_core snd_soc_sst_match snd_hda_intel snd_hda_codec mei_me snd_hda_core mei snd_soc_rt286 snd_soc_rl6347a snd_soc_core efivarfs CPU: 1 PID: 354 Comm: log2exe Not tainted 4.12.0-rc7-test-test #19 Hardware name: Intel corporation NUC6CAYS/NUC6CAYB, BIOS AYAPLCEL.86A.0027.2016.1108.1529 11/08/2016 Call Trace: bad_page+0x16a/0x1f0 free_pages_check_bad+0x117/0x190 free_hot_cold_page+0x7b1/0xad0 __put_page+0x70/0xa0 madvise_free_huge_pmd+0x627/0x7b0 madvise_free_pte_range+0x6f8/0x1150 __walk_page_range+0x6b5/0xe30 walk_page_range+0x13b/0x310 madvise_free_page_range.isra.16+0xad/0xd0 madvise_free_single_vma+0x2e4/0x470 SyS_madvise+0x8ce/0x1450 If somebody frees the page under us and we hold the last reference to it, put_page() would attempt to free the page before unlocking it. The fix is trivial reorder of operations. Dave said: "I came up with the exact same patch. For posterity, here's the test case, generated by syzkaller and trimmed down by Reinette: https://www.sr71.net/~dave/intel/log2.c And the config that helps detect this: https://www.sr71.net/~dave/intel/config-log2" Fixes: b8d3c4c3 ("mm/huge_memory.c: don't split THP page when MADV_FREE syscall is called") Link: http://lkml.kernel.org/r/20170628101249.17879-1-kirill.shutemov@linux.intel.comSigned-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Reinette Chatre <reinette.chatre@intel.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Huang Ying <ying.huang@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
David Rientjes authored
The motivation for commit abb2ea7d ("compiler, clang: suppress warning for unused static inline functions") was to suppress clang's warnings about unused static inline functions. For configs without CONFIG_OPTIMIZE_INLINING enabled, such as any non-x86 architecture, `inline' in the kernel implies that __attribute__((always_inline)) is used. Some code depends on that behavior, see https://lkml.org/lkml/2017/6/13/918: net/built-in.o: In function `__xchg_mb': arch/arm64/include/asm/cmpxchg.h:99: undefined reference to `__compiletime_assert_99' arch/arm64/include/asm/cmpxchg.h:99: undefined reference to `__compiletime_assert_99 The full fix would be to identify these breakages and annotate the functions with __always_inline instead of `inline'. But since we are late in the 4.12-rc cycle, simply carry forward the forced inlining behavior and work toward moving arm64, and other architectures, toward CONFIG_OPTIMIZE_INLINING behavior. Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1706261552200.1075@chino.kir.corp.google.comSigned-off-by: David Rientjes <rientjes@google.com> Reported-by: Sodagudi Prasad <psodagud@codeaurora.org> Tested-by: Sodagudi Prasad <psodagud@codeaurora.org> Tested-by: Matthias Kaehlcke <mka@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroupLinus Torvalds authored
Pull cgroup changes from Tejun Heo: - Waiman made the debug controller work and a lot more useful on cgroup2 - There were a couple issues with cgroup subtree delegation. The documentation on delegating to a non-root user was missing some part and cgroup namespace support wasn't factoring in delegation at all. The documentation is updated and the now there is a mount option to make cgroup namespace fit for delegation * 'for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: implement "nsdelegate" mount option cgroup: restructure cgroup_procs_write_permission() cgroup: "cgroup.subtree_control" should be writeable by delegatee cgroup: fix lockdep warning in debug controller cgroup: refactor cgroup_masks_read() in the debug controller cgroup: make debug an implicit controller on cgroup2 cgroup: Make debug cgroup support v2 and thread mode cgroup: Make Kconfig prompt of debug cgroup more accurate cgroup: Move debug cgroup to its own file cgroup: Keep accurate count of tasks in each css_set
-
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libataLinus Torvalds authored
Pull libata updates from Tejun Heo: - Christoph added support for TCG OPAL self encrypting disks - Minwoo added support for ATA PASS-THROUGH(32) - Linus Walleij removed spurious drvdata assignments in some drivers - Support for a few new device and other fixes * 'for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: (33 commits) sd: add support for TCG OPAL self encrypting disks libata: fix build warning from unused goto label libata: Support for an ATA PASS-THROUGH(32) command. ahci: Add Device ID for ASMedia 1061R and 1062R sata_via: Enable optional hotplug on VT6420 ata: ahci_brcm: Avoid writing to read-only registers libata: Add the AHCI_HFLAG_NO_WRITE_TO_RO flag libata: Add the AHCI_HFLAG_YES_ALPM flag ata: ftide010: fix resource printing libata: make the function name in comment match the actual function ata: sata_rcar: make of_device_ids const. ata: pata_octeon_cf: make of_device_ids const. libata: Convert bare printks to pr_cont libahci: wrong comments in ahci_do_softreset() ata: declare ata_port_info structures as const ata: Add driver for Faraday Technology FTIDE010 ata: Add DT bindings for the Gemini SATA bridge ata: Add DT bindings for Faraday Technology FTIDE010 libata: implement SECURITY PROTOCOL IN/OUT libata: factor out a ata_identify_page_supported helper ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpuLinus Torvalds authored
Pull percpu updates from Tejun Heo: "These are the percpu changes for the v4.13-rc1 merge window. There are a couple visibility related changes - tracepoints and allocator stats through debugfs, along with __ro_after_init markings and a cosmetic rename in percpu_counter. Please note that the simple O(#elements_in_the_chunk) area allocator used by percpu allocator is again showing scalability issues, primarily with bpf allocating and freeing large number of counters. Dennis is working on the replacement allocator and the percpu allocator will be seeing increased churns in the coming cycles" * 'for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: fix static checker warnings in pcpu_destroy_chunk percpu: fix early calls for spinlock in pcpu_stats percpu: resolve err may not be initialized in pcpu_alloc percpu_counter: Rename __percpu_counter_add to percpu_counter_add_batch percpu: add tracepoint support for percpu memory percpu: expose statistics about percpu memory via debugfs percpu: migrate percpu data structures to internal header percpu: add missing lockdep_assert_held to func pcpu_free_area mark most percpu globals as __ro_after_init
-
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linuxLinus Torvalds authored
Pull another parisc update from Helge Deller: "Christoph Hellwig provided one patch for the parisc architecture to drop the DMA_ERROR_CODE define from the parisc architecture" * 'parisc-4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: ->mapping_error
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds authored
Pull arm64 updates from Will Deacon: - RAS reporting via GHES/APEI (ACPI) - Indirect ftrace trampolines for modules - Improvements to kernel fault reporting - Page poisoning - Sigframe cleanups and preparation for SVE context - Core dump fixes - Sparse fixes (mainly relating to endianness) - xgene SoC PMU v3 driver - Misc cleanups and non-critical fixes * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (75 commits) arm64: fix endianness annotation for 'struct jit_ctx' and friends arm64: cpuinfo: constify attribute_group structures. arm64: ptrace: Fix incorrect get_user() use in compat_vfp_set() arm64: ptrace: Remove redundant overrun check from compat_vfp_set() arm64: ptrace: Avoid setting compat FP[SC]R to garbage if get_user fails arm64: fix endianness annotation for __apply_alternatives()/get_alt_insn() arm64: fix endianness annotation in get_kaslr_seed() arm64: add missing conversion to __wsum in ip_fast_csum() arm64: fix endianness annotation in acpi_parking_protocol.c arm64: use readq() instead of readl() to read 64bit entry_point arm64: fix endianness annotation for reloc_insn_movw() & reloc_insn_imm() arm64: fix endianness annotation for aarch64_insn_write() arm64: fix endianness annotation in aarch64_insn_read() arm64: fix endianness annotation in call_undef_hook() arm64: fix endianness annotation for debug-monitors.c ras: mark stub functions as 'inline' arm64: pass endianness info to sparse arm64: ftrace: fix !CONFIG_ARM64_MODULE_PLTS kernels arm64: signal: Allow expansion of the signal frame acpi: apei: check for pending errors when probing GHES entries ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespaceLinus Torvalds authored
Pull mnt namespace updates from Eric Biederman: "A big break-through came during this development cycle as a way was found to maintain the existing umount -l semantics while allowing for optimizations that improve the performance. That is represented by the first change in this series moving the reparenting of mounts into their own pass. This has allowed addressing the horrific performance of umount -l on a carefully crafted tree of mounts with locks held (0.06s vs 60s in my testing). What allowed this was not changing where umounts propagate to while propgating umounts. The next change fixes the case where the order of the mount whose umount are being progated visits a tree where the mounts are stacked upon each other in another order. This is weird but not hard to implement. The final change takes advantage of the unchanging mount propgation tree to skip parts of the mount propgation tree that have already been visited. Yielding a very nice speed up in the worst case. There remains one outstanding question about the semantics of umount -l that I am still discussiong with Ram Pai. In practice that area of the semantics was changed by 1064f874 ("mnt: Tuck mounts under others instead of creating shadow/side mounts.") and no regressions have been reported. Still I intend to finish talking that out with him to ensure there is not something a more intense use of mount propagation in the future will not cause to become significant" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: mnt: Make propagate_umount less slow for overlapping mount propagation trees mnt: In propgate_umount handle visiting mounts in any order mnt: In umount propagation reparent in a separate pass
-
- 05 Jul, 2017 3 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2Linus Torvalds authored
Pull GFS2 updates from Bob Peterson: "We've got eight GFS2 patches for this merge window: - Andreas Gruenbacher has four patches related to cleaning up the GFS2 inode evict process. This is about half of his patches designed to fix a long-standing GFS2 hang related to the inode shrinker: Shrinker calls gfs2 evict, evict calls DLM, DLM requires memory and blocks on the shrinker. These four patches have been well tested. His second set of patches are still being tested, so I plan to hold them until the next merge window, after we have more weeks of testing. The first patch eliminates the flush_delayed_work, which can block. - Andreas's second patch protects setting of gl_object for rgrps with a spin_lock to prevent proven races. - His third patch introduces a centralized mechanism for queueing glock work with better reference counting, to prevent more races. -His fourth patch retains a reference to inode glocks when an error occurs while creating an inode. This keeps the subsequent evict from needing to reacquire the glock, which might call into DLM and block in low memory conditions. - Arvind Yadav has a patch to add const to attribute_group structures. - I have a patch to detect directory entry inconsistencies and withdraw the file system if any are found. Better that than silent corruption. - I have a patch to remove a vestigial variable from glock structures, saving some slab space. - I have another patch to remove a vestigial variable from the GFS2 in-core superblock structure" * tag 'gfs2-4.13.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: GFS2: constify attribute_group structures. gfs2: gfs2_create_inode: Keep glock across iput gfs2: Clean up glock work enqueuing gfs2: Protect gl->gl_object by spin lock gfs2: Get rid of flush_delayed_work in gfs2_evict_inode GFS2: Eliminate vestigial sd_log_flush_wrapped GFS2: Remove gl_list from glock structure GFS2: Withdraw when directory entry inconsistencies are detected
-
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds authored
Pull btrfs updates from David Sterba: "The core updates improve error handling (mostly related to bios), with the usual incremental work on the GFP_NOFS (mis)use removal, refactoring or cleanups. Except the two top patches, all have been in for-next for an extensive amount of time. User visible changes: - statx support - quota override tunable - improved compression thresholds - obsoleted mount option alloc_start Core updates: - bio-related updates: - faster bio cloning - no allocation failures - preallocated flush bios - more kvzalloc use, memalloc_nofs protections, GFP_NOFS updates - prep work for btree_inode removal - dir-item validation - qgoup fixes and updates - cleanups: - removed unused struct members, unused code, refactoring - argument refactoring (fs_info/root, caller -> callee sink) - SEARCH_TREE ioctl docs" * 'for-4.13-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (115 commits) btrfs: Remove false alert when fiemap range is smaller than on-disk extent btrfs: Don't clear SGID when inheriting ACLs btrfs: fix integer overflow in calc_reclaim_items_nr btrfs: scrub: fix target device intialization while setting up scrub context btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges btrfs: qgroup: Introduce extent changeset for qgroup reserve functions btrfs: qgroup: Fix qgroup reserved space underflow caused by buffered write and quotas being enabled btrfs: qgroup: Return actually freed bytes for qgroup release or free data btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function btrfs: qgroup: Add quick exit for non-fs extents Btrfs: rework delayed ref total_bytes_pinned accounting Btrfs: return old and new total ref mods when adding delayed refs Btrfs: always account pinned bytes when dropping a tree block ref Btrfs: update total_bytes_pinned when pinning down extents Btrfs: make BUG_ON() in add_pinned_bytes() an ASSERT() Btrfs: make add_pinned_bytes() take an s64 num_bytes instead of u64 btrfs: fix validation of XATTR_ITEM dir items btrfs: Verify dir_item in iterate_object_props btrfs: Check name_len before in btrfs_del_root_ref btrfs: Check name_len before reading btrfs_get_name ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds authored
Pull memdup_user() conversions from Al Viro: "A fairly self-contained series - hunting down open-coded memdup_user() and memdup_user_nul() instances" * 'work.memdup_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: bpf: don't open-code memdup_user() kimage_file_prepare_segments(): don't open-code memdup_user() ethtool: don't open-code memdup_user() do_ip_setsockopt(): don't open-code memdup_user() do_ipv6_setsockopt(): don't open-code memdup_user() irda: don't open-code memdup_user() xfrm_user_policy(): don't open-code memdup_user() ima_write_policy(): don't open-code memdup_user_nul() sel_write_validatetrans(): don't open-code memdup_user_nul()
-