- 09 Sep, 2017 40 commits
-
-
Eric Dumazet authored
__radix_tree_preload() only disables preemption if no error is returned. So we really need to make sure callers always check the return value. idr_preload() contract is to always disable preemption, so we need to add a missing preempt_disable() if an error happened. Similarly, ida_pre_get() only needs to call preempt_enable() in the case no error happened. Link: http://lkml.kernel.org/r/1504637190.15310.62.camel@edumazet-glaptop3.roam.corp.google.com Fixes: 0a835c4f ("Reimplement IDR and IDA using the radix tree") Fixes: 7ad3d4d8 ("ida: Move ida_bitmap to a percpu variable") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> [4.11+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Baoquan He authored
One line of code was commented out with a C++-style comment for debugging, but it was never removed. Clean it up. Link: http://lkml.kernel.org/r/1503312113-11843-1-git-send-email-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Dan Carpenter authored
This is mostly to keep the number of static checker warnings down so we can spot new bugs instead of them being drowned in noise. This function doesn't return normal kernel error codes but instead the return value is used to display exactly which memory failed. I chose -1 as hopefully that's a helpful thing to print. Link: http://lkml.kernel.org/r/20170817115420.uikisjvfmtrqkzjn@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Kees Cook <keescook@chromium.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com> Cc: Daniel Micay <danielmicay@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
As of commit 4cf0b354 ("rhashtable: avoid large lock-array allocations"), the default value for the locks multiplier was reduced from 128 to 32. Update the header file to reflect this. Link: http://lkml.kernel.org/r/20170815215401.30745-1-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Yury Norov authored
The macro is the compile-time analogue of bitmap_from_u64() with the same purpose: convert the 64-bit number to the properly ordered pair of 32-bit parts, suitable for filling the bitmap in a 32-bit BE environment. Use it to make test_bitmap_parselist() correct for 32-bit BE ABIs. Tested on BE mips/qemu. [akpm@linux-foundation.org: tweak code comment] Link: http://lkml.kernel.org/r/20170810172916.24144-1-ynorov@caviumnetworks.com Signed-off-by: Yury Norov <ynorov@caviumnetworks.com> Cc: Noam Camus <noamca@mellanox.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
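The splitting described above can be sketched in userspace C. This is an illustration only, not the kernel's definition: the SKETCH_ and sketch_ names are hypothetical, and fixed 32-bit words stand in for the unsigned long words of a 32-bit ABI.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the macro discussed above: expand one 64-bit
 * constant into two 32-bit initialisers, low half first, which is the
 * word order a bitmap made of 32-bit words expects.
 */
#define SKETCH_BITMAP_FROM_U64(n) \
	((uint32_t)((uint64_t)(n) & 0xffffffffu)), \
	((uint32_t)((uint64_t)(n) >> 32))

/* Bits 2 and 32 are set in the 64-bit constant. */
static const uint32_t sketch_words[] = {
	SKETCH_BITMAP_FROM_U64(0x0000000100000004ULL),
};

/* Test bit nr in a bitmap stored as an array of 32-bit words. */
static int sketch_test_bit(const uint32_t *map, unsigned int nr)
{
	return (map[nr / 32] >> (nr % 32)) & 1u;
}
```

Because the split happens on values rather than on bytes in memory, the word order is the same on little and big endian hosts.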
-
Yury Norov authored
Do some basic checks for bitmap_parselist(). [akpm@linux-foundation.org: fix printk warning] Link: http://lkml.kernel.org/r/20170807225438.16161-2-ynorov@caviumnetworks.com Signed-off-by: Yury Norov <ynorov@caviumnetworks.com> Cc: Noam Camus <noamca@mellanox.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Yury Norov authored
The current implementation of bitmap_parselist() uses a static variable to save local state while setting bits in the bitmap. It is obviously wrong if we assume execution in a multiprocessor environment. Fortunately, it's possible to rewrite this portion of code to avoid using the static variable. It is also possible to set bits in the mask per-range with bitmap_set() rather than per-bit with set_bit(), as it is implemented now; this is much faster. The important side effect of this change is that setting bits in this function is from now on not per-bit atomic and is less memory-ordered. This is because set_bit() guarantees the order of memory accesses, while bitmap_set() does not. I think that is an advantage of the new approach, because bitmap_parselist() is intended to initialise bit arrays, and the user should protect the whole bitmap during initialisation if needed. So protecting individual bits looks expensive and useless. Also, other range-oriented functions in lib/bitmap.c don't worry much about atomicity. With all that, setting 2k bits in a map with a pattern like 0-2047:128/256 becomes ~50 times faster after applying the patch in my testing environment (arm64 hosted on qemu). The second patch of the series adds the test for bitmap_parselist(). It's not intended to cover all tricky cases, just to make sure that I didn't screw up during rework. Link: http://lkml.kernel.org/r/20170807225438.16161-1-ynorov@caviumnetworks.com Signed-off-by: Yury Norov <ynorov@caviumnetworks.com> Cc: Noam Camus <noamca@mellanox.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
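The per-bit versus per-range difference can be illustrated with a small userspace sketch. The sk_ helpers below are hypothetical and are not the kernel's set_bit() or bitmap_set(); in particular, neither of them is atomic. The point is only that the range version touches each word once instead of once per bit.

```c
#include <limits.h>
#include <stddef.h>

#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Per-bit: one read-modify-write of a word for every bit in the range. */
static void sk_set_bits_one_by_one(unsigned long *map, size_t start, size_t len)
{
	for (size_t i = start; i < start + len; i++)
		map[i / SK_BITS_PER_LONG] |= 1UL << (i % SK_BITS_PER_LONG);
}

/* Per-range: build one mask per word touched and apply it in one step. */
static void sk_set_range(unsigned long *map, size_t start, size_t len)
{
	while (len) {
		size_t word = start / SK_BITS_PER_LONG;
		size_t off = start % SK_BITS_PER_LONG;
		size_t n = SK_BITS_PER_LONG - off;	/* bits left in this word */
		unsigned long mask;

		if (n > len)
			n = len;
		/* A full-word mask is special-cased to avoid shifting by the word width. */
		mask = (n == SK_BITS_PER_LONG) ? ~0UL : ((1UL << n) - 1) << off;
		map[word] |= mask;
		start += n;
		len -= n;
	}
}
```

Both helpers produce the same bitmap; only the number of memory accesses differs.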
-
Florian Fainelli authored
Add a test module that allows testing that CONFIG_DEBUG_VIRTUAL works correctly, at least that it can catch invalid calls to virt_to_phys() against the non-linear kernel virtual address map. Link: http://lkml.kernel.org/r/20170808164035.26725-1-f.fainelli@gmail.com Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andy Shevchenko authored
In some cases the caller would like to use the error code directly without shadowing. -EINVAL feels like the right code to return in case of error in hex2bin(). Link: http://lkml.kernel.org/r/20170731135510.68023-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
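A userspace sketch of the behaviour described above: a hex decoder that reports malformed input as -EINVAL so the caller can pass the value straight up. The sk_ names are hypothetical and this is not the lib/hexdump.c source.

```c
#include <errno.h>
#include <stddef.h>

/* Convert one hex digit to its value, or return -1 if it is not a hex digit. */
static int sk_hex_to_bin(char ch)
{
	if (ch >= '0' && ch <= '9')
		return ch - '0';
	if (ch >= 'a' && ch <= 'f')
		return ch - 'a' + 10;
	if (ch >= 'A' && ch <= 'F')
		return ch - 'A' + 10;
	return -1;
}

/*
 * Decode 2 * count hex digits from src into count bytes at dst.
 * Returns 0 on success or -EINVAL on the first invalid digit.
 */
static int sk_hex2bin(unsigned char *dst, const char *src, size_t count)
{
	while (count--) {
		int hi = sk_hex_to_bin(*src++);
		int lo = sk_hex_to_bin(*src++);

		if (hi < 0 || lo < 0)
			return -EINVAL;
		*dst++ = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}
```

With this convention a caller can write `ret = sk_hex2bin(...); if (ret) return ret;` instead of mapping a generic failure value to an error code itself.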
-
Davidlohr Bueso authored
For the same reasons we already cache the leftmost pointer, apply the same optimization for rb_last() calls. Users must explicitly do this as rb_root_cached only deals with the smallest node. [dave@stgolabs.net: brain fart #1] Link: http://lkml.kernel.org/r/20170731155955.GD21328@linux-80c1.suse Link: http://lkml.kernel.org/r/20170719014603.19029-18-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Jens Axboe <axboe@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
Such that we can optimize __mem_cgroup_largest_soft_limit_node(). The only overhead is the extra footprint for the cached pointer, but this should not be an issue for mem_cgroup_tree_per_node. [dave@stgolabs.net: brain fart #2] Link: http://lkml.kernel.org/r/20170731160114.GE21328@linux-80c1.suse Link: http://lkml.kernel.org/r/20170719014603.19029-17-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
... such that we can avoid the tree walks to get the node with the smallest key. Semantically the same as the previously used rb_first(), but O(1). The main overhead is the extra footprint for the cached rb_node pointer, which should not matter for epoll. Link: http://lkml.kernel.org/r/20170719014603.19029-15-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
... such that we can avoid the tree walks to get the node with the smallest key. Semantically the same as the previously used rb_first(), but O(1). The main overhead is the extra footprint for the cached rb_node pointer, which should not matter for procfs. Link: http://lkml.kernel.org/r/20170719014603.19029-14-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
interval_tree.h _is_ the generic flavor. Link: http://lkml.kernel.org/r/20170719014603.19029-13-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
Allow interval trees to quickly check for overlaps to avoid unnecessary tree lookups in interval_tree_iter_first(). As of this patch, all interval tree flavors will require using a 'rb_root_cached' such that we can have the leftmost node easily available. While most users will make use of this feature, those with special functions (in addition to the generic insert, delete, search calls) will avoid using the cached option as they can do funky things with insertions -- for example, vma_interval_tree_insert_after(). [jglisse@redhat.com: fix deadlock from typo vm_lock_anon_vma()] Link: http://lkml.kernel.org/r/20170808225719.20723-1-jglisse@redhat.com Link: http://lkml.kernel.org/r/20170719014603.19029-12-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Doug Ledford <dledford@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: David Airlie <airlied@linux.ie> Cc: Jason Wang <jasowang@redhat.com> Cc: Christian Benvenuti <benve@cisco.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
... with the generic rbtree flavor instead. No changes in semantics whatsoever. Link: http://lkml.kernel.org/r/20170719014603.19029-11-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jens Axboe <axboe@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
... with the generic rbtree flavor instead. No changes in semantics whatsoever. Link: http://lkml.kernel.org/r/20170719014603.19029-10-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
... with the generic rbtree flavor instead. No changes in semantics whatsoever. Link: http://lkml.kernel.org/r/20170719014603.19029-9-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
... with the generic rbtree flavor instead. No changes in semantics whatsoever. Link: http://lkml.kernel.org/r/20170719014603.19029-8-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
We can work with a single rb_root_cached root to test both cached and non-cached rbtrees. In addition, add a test to measure latencies between rb_first and its fast counterpart. Link: http://lkml.kernel.org/r/20170719014603.19029-7-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
This adds a second test for regular rb-tree testing in that there is no need to repeat it for the augmented flavor. Link: http://lkml.kernel.org/r/20170719014603.19029-6-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
Allows for more flexible debugging. Link: http://lkml.kernel.org/r/20170719014603.19029-5-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
While overall the code is very nicely commented, it might not be immediately obvious from the diagrams what is going on. Add a very brief summary of each case. Opposite cases where the node is the left child are left untouched. Link: http://lkml.kernel.org/r/20170719014603.19029-4-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
The only times the nil-parent (root node) condition is true are when the node is the first in the tree, or after fixing rbtree rule #4 and the case 1 rebalancing made the node the root. Such conditions do not apply most of the time: (i) The common case in an rbtree is to have more than a single node, so this is only true for the first rb_insert(). (ii) While there is a chance only one first rotation is needed, cases where the node's uncle is black (cases 2,3) are more common as we can have the following scenarios during the rotation looping: case1 only, case1+1, case2+3, case1+2+3, case3 only, etc. This patch, therefore, adds an unlikely() optimization to this conditional. When profiling with CONFIG_PROFILE_ANNOTATED_BRANCHES, a kernel build shows that the incorrect rate is less than 15%, and for workloads that involve insert-mostly trees the incorrect rate over time tends to be less than 2%. Link: http://lkml.kernel.org/r/20170719014603.19029-3-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
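For readers unfamiliar with the annotation, here is a minimal sketch of an unlikely()-style hint, assuming a GCC or clang compiler that provides __builtin_expect(). The sk_ names are hypothetical and the loop is a plain walk up parent pointers, not the rbtree insertion code; it only shows a nil-parent test that is false on most iterations being marked as the cold path.

```c
#include <stddef.h>

/* Hint to the compiler that x is expected to be false (GCC/clang builtin). */
#define sk_unlikely(x) __builtin_expect(!!(x), 0)

struct sk_node {
	struct sk_node *parent;
};

/* Count the edges between a node and the root by following parent pointers. */
static int sk_depth(const struct sk_node *n)
{
	int depth = 0;

	for (;;) {
		/* Reaching the root happens once per walk, so mark it as rare. */
		if (sk_unlikely(!n->parent))
			return depth;
		n = n->parent;
		depth++;
	}
}
```

The hint changes only code layout and branch prediction assumptions, never the result.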
-
Davidlohr Bueso authored
Patch series "rbtree: Cache leftmost node internally", v4. A series extending rbtrees to internally cache the leftmost node such that we can have fast overlap check optimization for all interval tree users[1]. The benefits of this series are that it: (i) unifies users that do internal leftmost node caching, (ii) optimizes all interval tree users, and (iii) converts at least two new users (epoll and procfs) to the new interface.
This patch (of 16): Red-black tree semantics imply that nodes with smaller or greater (or equal for duplicates) keys are always to the left and right, respectively. For the kernel this is extremely evident when considering our rb_first() semantics. Enabling lookups for the smallest node in the tree in O(1) can save a good chunk of cycles in not having to walk down the tree each time. To this end there are a few core users that explicitly do this, such as the scheduler and rtmutexes. There is also the desire for interval trees to have this optimization allowing faster overlap checking. This patch introduces a new 'struct rb_root_cached' which is just the root with a cached pointer to the leftmost node. The reason why the regular rb_root was not extended instead of adding a new structure was that this allows the user to have the choice between memory footprint and actual tree performance. The new wrappers on top of the regular rb_root calls are:
- rb_first_cached(cached_root) -- which is a fast replacement for rb_first.
- rb_insert_color_cached(node, cached_root, new)
- rb_erase_cached(node, cached_root)
In addition, augmented cached interfaces are also added for basic insertion and deletion operations, which become important for the interval tree changes. With the exception of the inserts, which add a bool for updating the new leftmost, the interfaces are kept the same. To this end, porting rb users to the cached version becomes really trivial, and keeping current rbtree semantics for users that don't care about the optimization requires zero overhead. Link: http://lkml.kernel.org/r/20170719014603.19029-2-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
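The idea of a root that carries a cached leftmost pointer can be sketched in userspace with a plain, unbalanced binary search tree. This is a toy illustration under stated assumptions: the sk_ types and functions are hypothetical, there is no rebalancing and no erase, and it is not the kernel's rbtree code. What it does mirror is the insertion pattern, where the caller tracks during the descent whether the new node is the new leftmost.

```c
#include <stdbool.h>
#include <stddef.h>

struct sk_node {
	int key;
	struct sk_node *left, *right;
};

/* A root plus a cached pointer to the node with the smallest key. */
struct sk_root_cached {
	struct sk_node *root;
	struct sk_node *leftmost;
};

/* Insert node, updating the cache only if the descent never went right. */
static void sk_insert(struct sk_root_cached *tree, struct sk_node *node)
{
	struct sk_node **link = &tree->root;
	bool leftmost = true;

	node->left = node->right = NULL;
	while (*link) {
		if (node->key < (*link)->key) {
			link = &(*link)->left;
		} else {
			link = &(*link)->right;
			leftmost = false;	/* something smaller already exists */
		}
	}
	*link = node;
	if (leftmost)
		tree->leftmost = node;
}

/* Smallest node by walking down the left spine: O(height). */
static struct sk_node *sk_first_walk(const struct sk_root_cached *tree)
{
	struct sk_node *n = tree->root;

	while (n && n->left)
		n = n->left;
	return n;
}

/* Smallest node from the cache: O(1). */
static struct sk_node *sk_first_cached(const struct sk_root_cached *tree)
{
	return tree->leftmost;
}
```

The cost is one extra pointer per root, which matches the footprint trade-off the message describes.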
-
Matthias Kaehlcke authored
GENMASK(_ULL) performs a left-shift of ~0UL(L), which technically results in an integer overflow. clang raises a warning if the overflow occurs in a preprocessor expression. Clear the low-order bits through a subtraction instead of the left-shift to avoid the overflow. (akpm: no change in .text size in my testing) Link: http://lkml.kernel.org/r/20170803212020.24939-1-mka@chromium.org Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
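The two ways of clearing the low-order bits can be compared in a short sketch. The SK_ macro names are hypothetical and the definitions are written from the description above, not copied from the kernel header; the point is that subtracting (1UL << l) from ~0UL and adding 1 yields the same value as shifting ~0UL left by l.

```c
#include <limits.h>

#define SK_BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Shift form: drop bits below l by shifting all-ones to the left. */
#define SK_GENMASK_SHIFT(h, l) \
	(((~0UL) << (l)) & (~0UL >> (SK_BITS_PER_LONG - 1 - (h))))

/* Subtraction form: all-ones minus 2^l plus 1 leaves bits l and above set. */
#define SK_GENMASK_SUB(h, l) \
	(((~0UL) - (1UL << (l)) + 1) & (~0UL >> (SK_BITS_PER_LONG - 1 - (h))))
```

For example, SK_GENMASK_SUB(7, 4) evaluates to 0xf0: the first term has bits 4 and above set, and the second term keeps bits 7 and below.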
-
Babu Moger authored
We have seen some generic code use the config parameter CONFIG_CPU_BIG_ENDIAN to decide the endianness. Here are a few examples: include/asm-generic/qrwlock.h, drivers/of/base.c, drivers/of/fdt.c, drivers/tty/serial/earlycon.c, drivers/tty/serial/serial_core.c. Display a warning if CPU_BIG_ENDIAN is not defined on a big endian architecture and also warn if it is defined on little endian architectures. Here is our original discussion https://lkml.org/lkml/2017/5/24/620 Link: http://lkml.kernel.org/r/1499358861-179979-4-git-send-email-babu.moger@oracle.com Signed-off-by: Babu Moger <babu.moger@oracle.com> Suggested-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: David S. Miller <davem@davemloft.net> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jonas Bonn <jonas@southpole.se> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Michal Simek <monstr@monstr.eu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Babu Moger authored
microblaze architectures can be configured for either little or big endian formats. Add a choice option for the user to select the correct endian format (default to big endian). Also update the Makefile so the toolchain can compile for the format it is configured for. Link: http://lkml.kernel.org/r/1499358861-179979-3-git-send-email-babu.moger@oracle.com Signed-off-by: Babu Moger <babu.moger@oracle.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Michal Simek <monstr@monstr.eu> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: David S. Miller <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jonas Bonn <jonas@southpole.se> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Babu Moger authored
Patch series "Define CPU_BIG_ENDIAN or warn for inconsistencies", v3. While working on enabling queued rwlock on SPARC, I found the following code in include/asm-generic/qrwlock.h which uses CONFIG_CPU_BIG_ENDIAN to clear a byte.
	static inline u8 *__qrwlock_write_byte(struct qrwlock *lock)
	{
		return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
	}
The problem is that many of the fixed big endian architectures don't define CPU_BIG_ENDIAN, so this clears the wrong byte. Define CPU_BIG_ENDIAN for all the fixed big endian architectures to fix it. Also found a few more references to this config parameter in drivers/of/base.c, drivers/of/fdt.c, drivers/tty/serial/earlycon.c and drivers/tty/serial/serial_core.c. Be aware that this may cause regressions if someone has worked around problems in the above code already. Remove the work-around. Here is our original discussion https://lkml.org/lkml/2017/5/24/620 Link: http://lkml.kernel.org/r/1499358861-179979-2-git-send-email-babu.moger@oracle.com Signed-off-by: Babu Moger <babu.moger@oracle.com> Suggested-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Stafford Horne <shorne@gmail.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Jonas Bonn <jonas@southpole.se> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Michal Simek <monstr@monstr.eu> Cc: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
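Why the offset must track the real byte order can be seen in a userspace sketch. It assumes a plain 32-bit word rather than struct qrwlock, and it detects the byte order at run time instead of through a config symbol; the sk_ names are hypothetical. The offset arithmetic is the one from the snippet above: 0 on little endian, 3 on big endian, which in both cases addresses the least significant byte.

```c
#include <stdint.h>
#include <string.h>

/* Return 1 on a big endian host, 0 on a little endian one. */
static int sk_is_big_endian(void)
{
	const uint32_t probe = 1;
	uint8_t first;

	memcpy(&first, &probe, 1);	/* inspect the lowest-addressed byte */
	return first == 0;
}

/* Address of the least significant byte of *word on either byte order. */
static uint8_t *sk_low_byte(uint32_t *word)
{
	return (uint8_t *)word + 3 * sk_is_big_endian();
}
```

If the big endian flag were wrongly reported as 0 on a big endian machine, sk_low_byte() would return the address of the most significant byte instead, which is the "wrong byte" failure the message describes.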
-
Alexey Dobriyan authored
First, the number of CPUs can't be a negative number. Second, different signedness leads to suboptimal code in the following cases:
1) kmalloc(nr_cpu_ids * sizeof(X)); "int" has to be sign extended to size_t.
2) while (loff_t *pos < nr_cpu_ids) MOVSXD is 1 byte longer than the same MOV.
Other cases exist as well. Basically the compiler is told that nr_cpu_ids can't be negative, which can't be deduced if it is "int". Code savings on allyesconfig kernel: -3KB
	add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370)
	function                     old     new   delta
	coretemp_cpu_online          450     512     +62
	rcu_init_one                1234    1272     +38
	pci_device_probe             374     399     +25
	...
	pgdat_reclaimable_pages      628     556     -72
	select_fallback_rq           446     369     -77
	task_numa_find_cpu          1923    1807    -116
Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
Where possible, call memset16(), memmove() or memcpy() instead of using open-coded loops. I don't like the calling convention that uses a byte count instead of a count of u16s, but it's a little late to change that. Reduces code size of fbcon.o by almost 400 bytes on my laptop build. [akpm@linux-foundation.org: fix build] Link: http://lkml.kernel.org/r/20170720184539.31609-9-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Minchan Kim <minchan@kernel.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
memset32() can be used to initialise these three arrays. Minor code footprint reduction. Link: http://lkml.kernel.org/r/20170720184539.31609-8-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: David Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Minchan Kim <minchan@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
zram was the motivation for creating memset_l(). Minchan Kim sees a 7% performance improvement on x86 with 100MB of non-zero deduplicatable data:
	perf stat -r 10 dd if=/dev/zram0 of=/dev/null
	vanilla:  0.232050465 seconds time elapsed ( +- 0.51% )
	memset_l: 0.217219387 seconds time elapsed ( +- 0.07% )
Link: http://lkml.kernel.org/r/20170720184539.31609-7-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: David Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
Alpha already had an optimised fill-memory-with-16-bit-quantity assembler routine called memsetw(). It has a slightly different calling convention from memset16() in that it takes a byte count, not a count of words. That's the same convention used by ARM's __memset routines, so rename Alpha's routine to match and add a memset16() wrapper around it. Then convert Alpha's scr_memsetw() to call memset16() instead of memsetw(). Link: http://lkml.kernel.org/r/20170720184539.31609-6-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: David Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Minchan Kim <minchan@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
Reuse the existing optimised memset implementation to implement an optimised memset32 and memset64. Link: http://lkml.kernel.org/r/20170720184539.31609-5-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: David Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Minchan Kim <minchan@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
These are single instructions on x86. There's no 64-bit instruction for x86-32, but we don't yet have any user for memset64() on 32-bit architectures, so don't bother to implement it. Link: http://lkml.kernel.org/r/20170720184539.31609-4-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: David Miller <davem@davemloft.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
[akpm@linux-foundation.org: minor tweaks] Link: http://lkml.kernel.org/r/20170720184539.31609-3-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: David Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Minchan Kim <minchan@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
Patch series "Multibyte memset variations", v4. A relatively common idiom we're missing is a function to fill an area of memory with a pattern which is larger than a single byte. I first noticed this with a zram patch which wanted to fill a page with an 'unsigned long' value. There turn out to be quite a few places in the kernel which can benefit from using an optimised function rather than a loop; sometimes text size, sometimes speed, and sometimes both. The optimised PowerPC version (not included here) improves performance by about 30% on POWER8 on just the raw memset_l(). Most of the extra lines of code come from the three testcases I added. This patch (of 8): memset16(), memset32() and memset64() are like memset(), but allow the caller to fill the destination with a value larger than a single byte. memset_l() and memset_p() allow the caller to use unsigned long and pointer values respectively. Link: http://lkml.kernel.org/r/20170720184539.31609-2-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: David Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Minchan Kim <minchan@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
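A plain-loop userspace sketch of what such multibyte fill helpers do. The sk_ functions are hypothetical stand-ins, not the kernel's generic or optimised implementations, and the assumption here is that the length is given as a count of elements rather than a count of bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill count 16-bit elements at s with the value v. */
static uint16_t *sk_memset16(uint16_t *s, uint16_t v, size_t count)
{
	uint16_t *p = s;

	while (count--)
		*p++ = v;
	return s;
}

/* Fill count 32-bit elements at s with the value v. */
static uint32_t *sk_memset32(uint32_t *s, uint32_t v, size_t count)
{
	uint32_t *p = s;

	while (count--)
		*p++ = v;
	return s;
}
```

A plain memset() cannot express these fills unless every byte of the pattern is identical, which is why dedicated helpers are useful.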
-
Masahiro Yamada authored
This macro is useful to avoid a link error on 32-bit systems. We have the same definition in two drivers, so move it to include/linux/kernel.h. While we are here, refactor DIV_ROUND_UP_ULL() by using DIV_ROUND_DOWN_ULL(). Link: http://lkml.kernel.org/r/1500945156-12907-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Mark Brown <broonie@kernel.org> Cc: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.com> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Boris Brezillon <boris.brezillon@free-electrons.com> Cc: Marek Vasut <marek.vasut@gmail.com> Cc: Brian Norris <computersforpeace@gmail.com> Cc: Richard Weinberger <richard@nod.at> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
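The refactoring mentioned above, expressing the round-up division through the round-down one, can be sketched as follows. The sk_ functions are hypothetical userspace stand-ins: they use the ordinary `/` operator, whereas the kernel macros are assumed to go through a dedicated 64-bit division helper so that 32-bit builds link.

```c
/* Round-down 64-bit division by a 32-bit divisor. */
static unsigned long long sk_div_round_down_ull(unsigned long long ll,
						unsigned int d)
{
	return ll / d;
}

/* Round-up division built on the round-down helper: add d - 1 first. */
static unsigned long long sk_div_round_up_ull(unsigned long long ll,
					      unsigned int d)
{
	return sk_div_round_down_ull(ll + d - 1, d);
}
```

Adding d - 1 before dividing bumps any dividend that is not an exact multiple of d into the next quotient, while exact multiples are unaffected.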
-
David Rientjes authored
If there are large numbers of hugepages to iterate while reading /proc/pid/smaps, the page walk never does cond_resched(). On archs without split pmd locks, there can be significant and observable contention on mm->page_table_lock which causes lengthy delays without rescheduling. Always reschedule in smaps_pte_range() if necessary since the pagewalk iteration can be expensive. Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1708211405520.131071@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-