Commits · 283ea345934d277e30c841c577e0e2142b4bfcae · Kirill Smelkov / linux

17 Oct, 2019 3 commits

coccinelle: api/devm_platform_ioremap_resource: remove useless script · 283ea345

Alexandre Belloni authored Oct 17, 2019

While it is useful for new drivers to use devm_platform_ioremap_resource,
this script is currently used to spam maintainers, often updating very
old drivers. The net benefit is the removal of 2 lines of code in the
driver but the review load for the maintainers is huge. As of now, more
that 560 patches have been sent, some of them obviously broken, as in:

https://lore.kernel.org/lkml/9bbcce19c777583815c92ce3c2ff2586@www.loen.fr/

Remove the script to reduce the spam.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Acked-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

283ea345

Merge tag 'platform-drivers-x86-v5.4-3' of git://git.infradead.org/linux-platform-drivers-x86 · fe7d2c23

Linus Torvalds authored Oct 17, 2019

Pull x86 platform driver fixes from Andy Shevchenko:

 - Users of Intel P-Unit IPC driver might be surprised by harmless
   warning. Thus, switch to API which doesn't issue a warning at all.

 - I²C multi-instantiate driver continues to add slave devices even when
   IRQ resource is not found. For devices in the market IRQ resource is
   mandatory, so, fail the ->probe() of the parent driver to avoid
   slaves being probed.

 - Avoid compiler warning due to unused variable in Classmate laptop
   driver.

* tag 'platform-drivers-x86-v5.4-3' of git://git.infradead.org/linux-platform-drivers-x86:
  platform/x86: i2c-multi-instantiate: Fail the probe if no IRQ provided
  platform/x86: intel_punit_ipc: Avoid error message when retrieving IRQ
  platform/x86: classmate-laptop: remove unused variable

fe7d2c23

Merge tag 'gpio-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 7801158f

Linus Torvalds authored Oct 17, 2019

Pull GPIO fixes from Linus Walleij:
 "The fixes pertain to a problem with initializing the Intel GPIO
  irqchips when adding gpiochips.

  Andy fixed it up elegantly by adding a hardware initialization
  callback to the struct gpio_irq_chip so let's use this. Tested and
  verified on the target hardware"

* tag 'gpio-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: lynxpoint: set default handler to be handle_bad_irq()
  gpio: merrifield: Move hardware initialization to callback
  gpio: lynxpoint: Move hardware initialization to callback
  gpio: intel-mid: Move hardware initialization to callback
  gpiolib: Initialize the hardware with a callback
  gpio: merrifield: Restore use of irq_base

7801158f

16 Oct, 2019 1 commit

kthread: make __kthread_queue_delayed_work static · bc88f85c

Ben Dooks authored Oct 16, 2019

The __kthread_queue_delayed_work is not exported so
make it static, to avoid the following sparse warning:

kernel/kthread.c:869:6: warning: symbol '__kthread_queue_delayed_work' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bc88f85c

15 Oct, 2019 5 commits

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 3b1f00ac

Linus Torvalds authored Oct 15, 2019

Pull virtio fixes from Michael Tsirkin:
 "Some minor bugfixes"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost/test: stop device before reset
  tools/virtio: xen stub
  tools/virtio: more stubs

3b1f00ac

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 8625732e

Linus Torvalds authored Oct 15, 2019

Pull SCSI fixes from James Bottomley:
 "Five changes, two in drivers (qla2xxx, zfcp), one to MAINTAINERS
  (qla2xxx) and two in the core.

  The last two are mostly about removing incorrect messages from the
  kernel log: the resid message is definitely wrong and the sync cache
  on protected drive problem is arguably wrong"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: MAINTAINERS: Update qla2xxx driver
  scsi: zfcp: fix reaction on bit error threshold notification
  scsi: core: save/restore command resid for error handling
  scsi: qla2xxx: Remove WARN_ON_ONCE in qla2x00_status_cont_entry()
  scsi: sd: Ignore a failure to sync cache due to lack of authorization

8625732e

sparc64: disable fast-GUP due to unexplained oopses · 8e0d0ad2

Linus Torvalds authored Oct 15, 2019

HAVE_FAST_GUP enables the lockless quick page table walker for simple
cases, and is a nice optimization for some random loads that can then
use get_user_pages_fast() rather than the more careful page walker.

However, for some unexplained reason, it seems to be subtly broken on
sparc64.  The breakage is only with some compiler versions and some
hardware, and nobody seems to have figured out what triggers it,
although there's a simple reprodicer for the problem when it does
trigger.

The problem was introduced with the conversion to the generic GUP code
in commit 7b9afb86 ("sparc64: use the generic get_user_pages_fast
code"), but nothing looks obviously wrong in that conversion.  It may be
a compiler bug that just hits us with the code reorganization.  Or it
may be something very specific to sparc64.

This disables HAVE_FAST_GUP entirely.  That makes things like futexes a
bit slower, but at least they work.  If we can figure out the trigger,
that would be lovely, but it's been three months already..

Link: https://lore.kernel.org/lkml/20190717215956.GA30369@altlinux.org/
Fixes: 7b9afb86 ("sparc64: use the generic get_user_pages_fast code")
Reported-by: Dmitry V Levin <ldv@altlinux.org>
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Requested-by: Meelis Roos <mroos@linux.ee>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8e0d0ad2

Merge branch 'parisc-5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 02755af0

Linus Torvalds authored Oct 15, 2019

Pull parisc fixes from Helge Deller:

 - Fix a parisc-specific fallout of Christoph's
   dma_set_mask_and_coherent() patches (Sven)

 - Fix a vmap memory leak in ioremap()/ioremap() (Helge)

 - Some minor cleanups and documentation updates (Nick, Helge)

* 'parisc-5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Remove 32-bit DMA enforcement from sba_iommu
  parisc: Fix vmap memory leak in ioremap()/iounmap()
  parisc: prefer __section from compiler_attributes.h
  parisc: sysctl.c: Use CONFIG_PARISC instead of __hppa_ define
  MAINTAINERS: Add hp_sdc drivers to parisc arch

02755af0

Merge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging · 37b238da

Linus Torvalds authored Oct 15, 2019

Pull dmi fix from Jean Delvare.

* 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  firmware: dmi: Fix unlikely out-of-bounds read in save_mem_devices

37b238da

14 Oct, 2019 30 commits

Merge branch 'akpm' (patches from Andrew) · 5bc52f64

Linus Torvalds authored Oct 14, 2019

Merge more fixes from Andrew Morton:
 "The usual shower of hotfixes and some followups to the recently merged
  page_owner enhancements"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm/memory-failure: poison read receives SIGKILL instead of SIGBUS if mmaped more than once
  mm/slab.c: fix kernel-doc warning for __ksize()
  xarray.h: fix kernel-doc warning
  bitmap.h: fix kernel-doc warning and typo
  fs/fs-writeback.c: fix kernel-doc warning
  fs/libfs.c: fix kernel-doc warning
  fs/direct-io.c: fix kernel-doc warning
  mm, compaction: fix wrong pfn handling in __reset_isolation_pfn()
  mm, hugetlb: allow hugepage allocations to reclaim as needed
  lib/test_meminit: add a kmem_cache_alloc_bulk() test
  mm/slub.c: init_on_free=1 should wipe freelist ptr for bulk allocations
  lib/generic-radix-tree.c: add kmemleak annotations
  mm/slub: fix a deadlock in show_slab_objects()
  mm, page_owner: rename flag indicating that page is allocated
  mm, page_owner: decouple freeing stack trace from debug_pagealloc
  mm, page_owner: fix off-by-one error in __set_page_owner_handle()

5bc52f64

gpio: lynxpoint: set default handler to be handle_bad_irq() · 75e99bf5

Andy Shevchenko authored Oct 09, 2019

We switch the default handler to be handle_bad_irq() instead of
handle_simple_irq() (which was not correct anyway).
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

75e99bf5

gpio: merrifield: Move hardware initialization to callback · 4c875409

Andy Shevchenko authored Oct 09, 2019

The driver wants to initialize related registers before IRQ chip will be added.
That's why move it to a corresponding callback. It also fixes the NULL pointer
dereference.

Fixes: 8f86a5b4 ("gpio: merrifield: Pass irqchip when adding gpiochip")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

4c875409

gpio: lynxpoint: Move hardware initialization to callback · a3391206

Andy Shevchenko authored Oct 09, 2019

The driver wants to initialize related registers before IRQ chip will be added.
That's why move it to a corresponding callback. It also fixes the NULL pointer
dereference.

Fixes: 7b1e8894 ("gpio: lynxpoint: Pass irqchip when adding gpiochip")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

a3391206

gpio: intel-mid: Move hardware initialization to callback · a752fbb4

Andy Shevchenko authored Oct 09, 2019

The driver wants to initialize related registers before IRQ chip will be added.
That's why move it to a corresponding callback. It also fixes the NULL pointer
dereference.

Fixes: 8069e69a ("gpio: intel-mid: Pass irqchip when adding gpiochip")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

a752fbb4

gpiolib: Initialize the hardware with a callback · 9411e3aa

Andy Shevchenko authored Oct 09, 2019

After changing the drivers to use GPIO core to add an IRQ chip
it appears that some of them requires a hardware initialization
before adding the IRQ chip.

Add an optional callback ->init_hw() to allow that drivers
to initialize hardware if needed.

This change is a part of the fix NULL pointer dereference
brought to the several drivers recently.

Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

9411e3aa

gpio: merrifield: Restore use of irq_base · 6658f87f

Andy Shevchenko authored Oct 09, 2019

During conversion to internal IRQ chip initialization the commit
8f86a5b4 ("gpio: merrifield: Pass irqchip when adding gpiochip")
lost the irq_base assignment.

drivers/gpio/gpio-merrifield.c: In function ‘mrfld_gpio_probe’:
drivers/gpio/gpio-merrifield.c:405:17: warning: variable ‘irq_base’ set but not used [-Wunused-but-set-variable]

Assign the girq->first to it.

Fixes: 8f86a5b4 ("gpio: merrifield: Pass irqchip when adding gpiochip")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

6658f87f

mm/memory-failure: poison read receives SIGKILL instead of SIGBUS if mmaped more than once · 3d7fed4a

Jane Chu authored Oct 14, 2019

Mmap /dev/dax more than once, then read the poison location using
address from one of the mappings.  The other mappings due to not having
the page mapped in will cause SIGKILLs delivered to the process.
SIGKILL succeeds over SIGBUS, so user process loses the opportunity to
handle the UE.

Although one may add MAP_POPULATE to mmap(2) to work around the issue,
MAP_POPULATE makes mapping 128GB of pmem several magnitudes slower, so
isn't always an option.

Details -

  ndctl inject-error --block=10 --count=1 namespace6.0

  ./read_poison -x dax6.0 -o 5120 -m 2
  mmaped address 0x7f5bb6600000
  mmaped address 0x7f3cf3600000
  doing local read at address 0x7f3cf3601400
  Killed

Console messages in instrumented kernel -

  mce: Uncorrected hardware memory error in user-access at edbe201400
  Memory failure: tk->addr = 7f5bb6601000
  Memory failure: address edbe201: call dev_pagemap_mapping_shift
  dev_pagemap_mapping_shift: page edbe201: no PUD
  Memory failure: tk->size_shift == 0
  Memory failure: Unable to find user space address edbe201 in read_poison
  Memory failure: tk->addr = 7f3cf3601000
  Memory failure: address edbe201: call dev_pagemap_mapping_shift
  Memory failure: tk->size_shift = 21
  Memory failure: 0xedbe201: forcibly killing read_poison:22434 because of failure to unmap corrupted page
    => to deliver SIGKILL
  Memory failure: 0xedbe201: Killing read_poison:22434 due to hardware memory corruption
    => to deliver SIGBUS

Link: http://lkml.kernel.org/r/1565112345-28754-3-git-send-email-jane.chu@oracle.comSigned-off-by: Jane Chu <jane.chu@oracle.com>
Suggested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3d7fed4a

mm/slab.c: fix kernel-doc warning for __ksize() · 87bf4f71

Randy Dunlap authored Oct 14, 2019

Fix kernel-doc warning in mm/slab.c:

mm/slab.c:4215: warning: Function parameter or member 'objp' not described in '__ksize'

Also add Return: documentation section for this function.

Link: http://lkml.kernel.org/r/68c9fd7d-f09e-d376-e292-c7b2bdf1774d@infradead.org
Fixes: 10d1f8cb ("mm/slab: refactor common ksize KASAN logic into slab_common.c")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

87bf4f71

xarray.h: fix kernel-doc warning · 13bea898

Randy Dunlap authored Oct 14, 2019

Fix (Sphinx) kernel-doc warning in <linux/xarray.h>:

  include/linux/xarray.h:232: WARNING: Unexpected indentation.

Link: http://lkml.kernel.org/r/89ba2134-ce23-7c10-5ee1-ef83b35aa984@infradead.org
Fixes: a3e4d3f9 ("XArray: Redesign xa_alloc API")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

13bea898

bitmap.h: fix kernel-doc warning and typo · 2a7e582f

Randy Dunlap authored Oct 14, 2019

Fix kernel-doc warning in <linux/bitmap.h>:

  include/linux/bitmap.h:341: warning: Function parameter or member 'nbits' not described in 'bitmap_or_equal'

Also fix small typo (bitnaps).

Link: http://lkml.kernel.org/r/0729ea7a-2c0d-b2c5-7dd3-3629ee0803e2@infradead.org
Fixes: b9fa6442 ("cpumask: Implement cpumask_or_equal()")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2a7e582f

fs/fs-writeback.c: fix kernel-doc warning · b46ec1da

Randy Dunlap authored Oct 14, 2019

Fix kernel-doc warning in fs/fs-writeback.c:

  fs/fs-writeback.c:913: warning: Excess function parameter 'nr_pages' description in 'cgroup_writeback_by_id'

Link: http://lkml.kernel.org/r/756645ac-0ce8-d47e-d30a-04d9e4923a4f@infradead.org
Fixes: d62241c7 ("writeback, memcg: Implement cgroup_writeback_by_id()")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b46ec1da

fs/libfs.c: fix kernel-doc warning · 8e88bfba

Randy Dunlap authored Oct 14, 2019

Fix kernel-doc warning in fs/libfs.c:

  fs/libfs.c:496: warning: Excess function parameter 'available' description in 'simple_write_end'

Link: http://lkml.kernel.org/r/5fc9d70b-e377-0ec9-066a-970d49579041@infradead.org
Fixes: ad2a722f ("libfs: Open code simple_commit_write into only user")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Boaz Harrosh <boazh@netapp.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8e88bfba

fs/direct-io.c: fix kernel-doc warning · c70d868f

Randy Dunlap authored Oct 14, 2019

Fix kernel-doc warning in fs/direct-io.c:

  fs/direct-io.c:258: warning: Excess function parameter 'offset' description in 'dio_complete'

Also, don't mark this function as having kernel-doc notation since it is
not exported.

Link: http://lkml.kernel.org/r/97908511-4328-4a56-17fe-f43a1d7aa470@infradead.org
Fixes: 6d544bb4 ("dio: centralize completion in dio_complete()")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Zach Brown <zab@zabbo.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c70d868f

mm, compaction: fix wrong pfn handling in __reset_isolation_pfn() · a2e9a5af

Vlastimil Babka authored Oct 14, 2019

Florian and Dave reported [1] a NULL pointer dereference in
__reset_isolation_pfn().  While the exact cause is unclear, staring at
the code revealed two bugs, which might be related.

One bug is that if zone starts in the middle of pageblock, block_page
might correspond to different pfn than block_pfn, and then the
pfn_valid_within() checks will check different pfn's than those accessed
via struct page.  This might result in acessing an unitialized page in
CONFIG_HOLES_IN_ZONE configs.

The other bug is that end_page refers to the first page of next
pageblock and not last page of current pageblock.  The online and valid
check is then wrong and with sections, the while (page < end_page) loop
might wander off actual struct page arrays.

[1] https://lore.kernel.org/linux-xfs/87o8z1fvqu.fsf@mid.deneb.enyo.de/

Link: http://lkml.kernel.org/r/20191008152915.24704-1-vbabka@suse.cz
Fixes: 6b0868c8 ("mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Florian Weimer <fw@deneb.enyo.de>
Reported-by: Dave Chinner <david@fromorbit.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a2e9a5af

mm, hugetlb: allow hugepage allocations to reclaim as needed · 3f36d866

David Rientjes authored Oct 14, 2019

Commit b39d0ee2 ("mm, page_alloc: avoid expensive reclaim when
compaction may not succeed") has chnaged the allocator to bail out from
the allocator early to prevent from a potentially excessive memory
reclaim.  __GFP_RETRY_MAYFAIL is designed to retry the allocation,
reclaim and compaction loop as long as there is a reasonable chance to
make forward progress.  Neither COMPACT_SKIPPED nor COMPACT_DEFERRED at
the INIT_COMPACT_PRIORITY compaction attempt gives this feedback.

The most obvious affected subsystem is hugetlbfs which allocates huge
pages based on an admin request (or via admin configured overcommit).  I
have done a simple test which tries to allocate half of the memory for
hugetlb pages while the memory is full of a clean page cache.  This is
not an unusual situation because we try to cache as much of the memory
as possible and sysctl/sysfs interface to allocate huge pages is there
for flexibility to allocate hugetlb pages at any time.

System has 1GB of RAM and we are requesting 515MB worth of hugetlb pages
after the memory is prefilled by a clean page cache:

  root@test1:~# cat hugetlb_test.sh

  set -x
  echo 0 > /proc/sys/vm/nr_hugepages
  echo 3 > /proc/sys/vm/drop_caches
  echo 1 > /proc/sys/vm/compact_memory
  dd if=/mnt/data/file-1G of=/dev/null bs=$((4<<10))
  TS=$(date +%s)
  echo 256 > /proc/sys/vm/nr_hugepages
  cat /proc/sys/vm/nr_hugepages

The results for 2 consecutive runs on clean 5.3

  root@test1:~# sh hugetlb_test.sh
  + echo 0
  + echo 3
  + echo 1
  + dd if=/mnt/data/file-1G of=/dev/null bs=4096
  262144+0 records in
  262144+0 records out
  1073741824 bytes (1.1 GB) copied, 21.0694 s, 51.0 MB/s
  + date +%s
  + TS=1569905284
  + echo 256
  + cat /proc/sys/vm/nr_hugepages
  256
  root@test1:~# sh hugetlb_test.sh
  + echo 0
  + echo 3
  + echo 1
  + dd if=/mnt/data/file-1G of=/dev/null bs=4096
  262144+0 records in
  262144+0 records out
  1073741824 bytes (1.1 GB) copied, 21.7548 s, 49.4 MB/s
  + date +%s
  + TS=1569905311
  + echo 256
  + cat /proc/sys/vm/nr_hugepages
  256

Now with b39d0ee2 applied

  root@test1:~# sh hugetlb_test.sh
  + echo 0
  + echo 3
  + echo 1
  + dd if=/mnt/data/file-1G of=/dev/null bs=4096
  262144+0 records in
  262144+0 records out
  1073741824 bytes (1.1 GB) copied, 20.1815 s, 53.2 MB/s
  + date +%s
  + TS=1569905516
  + echo 256
  + cat /proc/sys/vm/nr_hugepages
  11
  root@test1:~# sh hugetlb_test.sh
  + echo 0
  + echo 3
  + echo 1
  + dd if=/mnt/data/file-1G of=/dev/null bs=4096
  262144+0 records in
  262144+0 records out
  1073741824 bytes (1.1 GB) copied, 21.9485 s, 48.9 MB/s
  + date +%s
  + TS=1569905541
  + echo 256
  + cat /proc/sys/vm/nr_hugepages
  12

The success rate went down by factor of 20!

Although hugetlb allocation requests might fail and it is reasonable to
expect them to under extremely fragmented memory or when the memory is
under a heavy pressure but the above situation is not that case.

Fix the regression by reverting back to the previous behavior for
__GFP_RETRY_MAYFAIL requests and disable the beail out heuristic for
those requests.

Mike said:

: hugetlbfs allocations are commonly done via sysctl/sysfs shortly after
: boot where this may not be as much of an issue.  However, I am aware of at
: least three use cases where allocations are made after the system has been
: up and running for quite some time:
:
: - DB reconfiguration.  If sysctl/sysfs fails to get required number of
:   huge pages, system is rebooted to perform allocation after boot.
:
: - VM provisioning.  If unable get required number of huge pages, fall
:   back to base pages.
:
: - An application that does not preallocate pool, but rather allocates
:   pages at fault time for optimal NUMA locality.
:
: In all cases, I would expect b39d0ee2 to cause regressions and
: noticable behavior changes.
:
: My quick/limited testing in
: https://lkml.kernel.org/r/3468b605-a3a9-6978-9699-57c52a90bd7e@oracle.com
: was insufficient.  It was also mentioned that if something like
: b39d0ee2 went forward, I would like exemptions for __GFP_RETRY_MAYFAIL
: requests as in this patch.

[mhocko@suse.com: reworded changelog]
Link: http://lkml.kernel.org/r/20191007075548.12456-1-mhocko@kernel.org
Fixes: b39d0ee2 ("mm, page_alloc: avoid expensive reclaim when compaction may not succeed")
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3f36d866

lib/test_meminit: add a kmem_cache_alloc_bulk() test · 03a9349a

Alexander Potapenko authored Oct 14, 2019

Make sure allocations from kmem_cache_alloc_bulk() and
kmem_cache_free_bulk() are properly initialized.

Link: http://lkml.kernel.org/r/20191007091605.30530-2-glider@google.comSigned-off-by: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Thibaut Sautereau <thibaut@sautereau.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

03a9349a

mm/slub.c: init_on_free=1 should wipe freelist ptr for bulk allocations · 0f181f9f

Alexander Potapenko authored Oct 14, 2019

slab_alloc_node() already zeroed out the freelist pointer if
init_on_free was on.  Thibaut Sautereau noticed that the same needs to
be done for kmem_cache_alloc_bulk(), which performs the allocations
separately.

kmem_cache_alloc_bulk() is currently used in two places in the kernel,
so this change is unlikely to have a major performance impact.

SLAB doesn't require a similar change, as auto-initialization makes the
allocator store the freelist pointers off-slab.

Link: http://lkml.kernel.org/r/20191007091605.30530-1-glider@google.com
Fixes: 6471384a ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options")
Signed-off-by: Alexander Potapenko <glider@google.com>
Reported-by: Thibaut Sautereau <thibaut@sautereau.fr>
Reported-by: Kees Cook <keescook@chromium.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Laura Abbott <labbott@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0f181f9f

lib/generic-radix-tree.c: add kmemleak annotations · 3c52b0af

Eric Biggers authored Oct 14, 2019

Kmemleak is falsely reporting a leak of the slab allocation in
sctp_stream_init_ext():

  BUG: memory leak
  unreferenced object 0xffff8881114f5d80 (size 96):
   comm "syz-executor934", pid 7160, jiffies 4294993058 (age 31.950s)
   hex dump (first 32 bytes):
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
   backtrace:
     [<00000000ce7a1326>] kmemleak_alloc_recursive  include/linux/kmemleak.h:55 [inline]
     [<00000000ce7a1326>] slab_post_alloc_hook mm/slab.h:439 [inline]
     [<00000000ce7a1326>] slab_alloc mm/slab.c:3326 [inline]
     [<00000000ce7a1326>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
     [<000000007abb7ac9>] kmalloc include/linux/slab.h:547 [inline]
     [<000000007abb7ac9>] kzalloc include/linux/slab.h:742 [inline]
     [<000000007abb7ac9>] sctp_stream_init_ext+0x2b/0xa0  net/sctp/stream.c:157
     [<0000000048ecb9c1>] sctp_sendmsg_to_asoc+0x946/0xa00  net/sctp/socket.c:1882
     [<000000004483ca2b>] sctp_sendmsg+0x2a8/0x990 net/sctp/socket.c:2102
     [...]

But it's freed later.  Kmemleak misses the allocation because its
pointer is stored in the generic radix tree sctp_stream::out, and the
generic radix tree uses raw pages which aren't tracked by kmemleak.

Fix this by adding the kmemleak hooks to the generic radix tree code.

Link: http://lkml.kernel.org/r/20191004065039.727564-1-ebiggers@kernel.orgSigned-off-by: Eric Biggers <ebiggers@google.com>
Reported-by: <syzbot+7f3b6b106be8dcdcdeec@syzkaller.appspotmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3c52b0af

mm/slub: fix a deadlock in show_slab_objects() · e4f8e513

Qian Cai authored Oct 14, 2019

A long time ago we fixed a similar deadlock in show_slab_objects() [1].
However, it is apparently due to the commits like 01fb58bc ("slab:
remove synchronous synchronize_sched() from memcg cache deactivation
path") and 03afc0e2 ("slab: get_online_mems for
kmem_cache_{create,destroy,shrink}"), this kind of deadlock is back by
just reading files in /sys/kernel/slab which will generate a lockdep
splat below.

Since the "mem_hotplug_lock" here is only to obtain a stable online node
mask while racing with NUMA node hotplug, in the worst case, the results
may me miscalculated while doing NUMA node hotplug, but they shall be
corrected by later reads of the same files.

  WARNING: possible circular locking dependency detected
  ------------------------------------------------------
  cat/5224 is trying to acquire lock:
  ffff900012ac3120 (mem_hotplug_lock.rw_sem){++++}, at:
  show_slab_objects+0x94/0x3a8

  but task is already holding lock:
  b8ff009693eee398 (kn->count#45){++++}, at: kernfs_seq_start+0x44/0xf0

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #2 (kn->count#45){++++}:
         lock_acquire+0x31c/0x360
         __kernfs_remove+0x290/0x490
         kernfs_remove+0x30/0x44
         sysfs_remove_dir+0x70/0x88
         kobject_del+0x50/0xb0
         sysfs_slab_unlink+0x2c/0x38
         shutdown_cache+0xa0/0xf0
         kmemcg_cache_shutdown_fn+0x1c/0x34
         kmemcg_workfn+0x44/0x64
         process_one_work+0x4f4/0x950
         worker_thread+0x390/0x4bc
         kthread+0x1cc/0x1e8
         ret_from_fork+0x10/0x18

  -> #1 (slab_mutex){+.+.}:
         lock_acquire+0x31c/0x360
         __mutex_lock_common+0x16c/0xf78
         mutex_lock_nested+0x40/0x50
         memcg_create_kmem_cache+0x38/0x16c
         memcg_kmem_cache_create_func+0x3c/0x70
         process_one_work+0x4f4/0x950
         worker_thread+0x390/0x4bc
         kthread+0x1cc/0x1e8
         ret_from_fork+0x10/0x18

  -> #0 (mem_hotplug_lock.rw_sem){++++}:
         validate_chain+0xd10/0x2bcc
         __lock_acquire+0x7f4/0xb8c
         lock_acquire+0x31c/0x360
         get_online_mems+0x54/0x150
         show_slab_objects+0x94/0x3a8
         total_objects_show+0x28/0x34
         slab_attr_show+0x38/0x54
         sysfs_kf_seq_show+0x198/0x2d4
         kernfs_seq_show+0xa4/0xcc
         seq_read+0x30c/0x8a8
         kernfs_fop_read+0xa8/0x314
         __vfs_read+0x88/0x20c
         vfs_read+0xd8/0x10c
         ksys_read+0xb0/0x120
         __arm64_sys_read+0x54/0x88
         el0_svc_handler+0x170/0x240
         el0_svc+0x8/0xc

  other info that might help us debug this:

  Chain exists of:
    mem_hotplug_lock.rw_sem --> slab_mutex --> kn->count#45

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(kn->count#45);
                                 lock(slab_mutex);
                                 lock(kn->count#45);
    lock(mem_hotplug_lock.rw_sem);

   *** DEADLOCK ***

  3 locks held by cat/5224:
   #0: 9eff00095b14b2a0 (&p->lock){+.+.}, at: seq_read+0x4c/0x8a8
   #1: 0eff008997041480 (&of->mutex){+.+.}, at: kernfs_seq_start+0x34/0xf0
   #2: b8ff009693eee398 (kn->count#45){++++}, at:
  kernfs_seq_start+0x44/0xf0

  stack backtrace:
  Call trace:
   dump_backtrace+0x0/0x248
   show_stack+0x20/0x2c
   dump_stack+0xd0/0x140
   print_circular_bug+0x368/0x380
   check_noncircular+0x248/0x250
   validate_chain+0xd10/0x2bcc
   __lock_acquire+0x7f4/0xb8c
   lock_acquire+0x31c/0x360
   get_online_mems+0x54/0x150
   show_slab_objects+0x94/0x3a8
   total_objects_show+0x28/0x34
   slab_attr_show+0x38/0x54
   sysfs_kf_seq_show+0x198/0x2d4
   kernfs_seq_show+0xa4/0xcc
   seq_read+0x30c/0x8a8
   kernfs_fop_read+0xa8/0x314
   __vfs_read+0x88/0x20c
   vfs_read+0xd8/0x10c
   ksys_read+0xb0/0x120
   __arm64_sys_read+0x54/0x88
   el0_svc_handler+0x170/0x240
   el0_svc+0x8/0xc

I think it is important to mention that this doesn't expose the
show_slab_objects to use-after-free.  There is only a single path that
might really race here and that is the slab hotplug notifier callback
__kmem_cache_shrink (via slab_mem_going_offline_callback) but that path
doesn't really destroy kmem_cache_node data structures.

[1] http://lkml.iu.edu/hypermail/linux/kernel/1101.0/02850.html

[akpm@linux-foundation.org: add comment explaining why we don't need mem_hotplug_lock]
Link: http://lkml.kernel.org/r/1570192309-10132-1-git-send-email-cai@lca.pw
Fixes: 01fb58bc ("slab: remove synchronous synchronize_sched() from memcg cache deactivation path")
Fixes: 03afc0e2 ("slab: get_online_mems for kmem_cache_{create,destroy,shrink}")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e4f8e513

mm, page_owner: rename flag indicating that page is allocated · fdf3bf80

Vlastimil Babka authored Oct 14, 2019

Commit 37389167 ("mm, page_owner: keep owner info when freeing the
page") has introduced a flag PAGE_EXT_OWNER_ACTIVE to indicate that page
is tracked as being allocated.  Kirril suggested naming it
PAGE_EXT_OWNER_ALLOCATED to make it more clear, as "active is somewhat
loaded term for a page".

Link: http://lkml.kernel.org/r/20190930122916.14969-4-vbabka@suse.czSigned-off-by: Vlastimil Babka <vbabka@suse.cz>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Walter Wu <walter-zh.wu@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fdf3bf80

mm, page_owner: decouple freeing stack trace from debug_pagealloc · 0fe9a448

Vlastimil Babka authored Oct 14, 2019

Commit 8974558f ("mm, page_owner, debug_pagealloc: save and dump
freeing stack trace") enhanced page_owner to also store freeing stack
trace, when debug_pagealloc is also enabled.  KASAN would also like to
do this [1] to improve error reports to debug e.g. UAF issues.

Kirill has suggested that the freeing stack trace saving should be also
possible to be enabled separately from KASAN or debug_pagealloc, i.e.
with an extra boot option.  Qian argued that we have enough options
already, and avoiding the extra overhead is not worth the complications
in the case of a debugging option.  Kirill noted that the extra stack
handle in struct page_owner requires 0.1% of memory.

This patch therefore enables free stack saving whenever page_owner is
enabled, regardless of whether debug_pagealloc or KASAN is also enabled.
KASAN kernels booted with page_owner=on will thus benefit from the
improved error reports.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=203967

[vbabka@suse.cz: v3]
  Link: http://lkml.kernel.org/r/20191007091808.7096-3-vbabka@suse.cz
Link: http://lkml.kernel.org/r/20190930122916.14969-3-vbabka@suse.czSigned-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Qian Cai <cai@lca.pw>
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Walter Wu <walter-zh.wu@mediatek.com>
Suggested-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Suggested-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0fe9a448

mm, page_owner: fix off-by-one error in __set_page_owner_handle() · 5556cfe8

Vlastimil Babka authored Oct 14, 2019

Patch series "followups to debug_pagealloc improvements through
page_owner", v3.

These are followups to [1] which made it to Linus meanwhile.  Patches 1
and 3 are based on Kirill's review, patch 2 on KASAN request [2].  It
would be nice if all of this made it to 5.4 with [1] already there (or
at least Patch 1).

This patch (of 3):

As noted by Kirill, commit 7e2f2a0c ("mm, page_owner: record page
owner for each subpage") has introduced an off-by-one error in
__set_page_owner_handle() when looking up page_ext for subpages.  As a
result, the head page page_owner info is set twice, while for the last
tail page, it's not set at all.

Fix this and also make the code more efficient by advancing the page_ext
pointer we already have, instead of calling lookup_page_ext() for each
subpage.  Since the full size of struct page_ext is not known at compile
time, we can't use a simple page_ext++ statement, so introduce a
page_ext_next() inline function for that.

Link: http://lkml.kernel.org/r/20190930122916.14969-2-vbabka@suse.cz
Fixes: 7e2f2a0c ("mm, page_owner: record page owner for each subpage")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Reported-by: Miles Chen <miles.chen@mediatek.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Walter Wu <walter-zh.wu@mediatek.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

5556cfe8

parisc: Remove 32-bit DMA enforcement from sba_iommu · c32c47aa

Sven Schnelle authored Sep 24, 2019

This breaks booting from sata_sil24 with the recent DMA change.
According to James Bottomley this was in to improve performance by
kicking the device into 32 bit descriptors, which are usually more
efficient, especially with older dual descriptor format cards like we
have on parisc systems.
Remove it for now to make DMA working again.

Fixes: dcc02c19 ("sata_sil24: use dma_set_mask_and_coherent")
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>

c32c47aa

parisc: Fix vmap memory leak in ioremap()/iounmap() · 513f7f74

Helge Deller authored Oct 04, 2019

Sven noticed that calling ioremap() and iounmap() multiple times leads
to a vmap memory leak:
	vmap allocation for size 4198400 failed:
	use vmalloc=<size> to increase size

It seems we missed calling vunmap() in iounmap().
Signed-off-by: Helge Deller <deller@gmx.de>
Noticed-by: Sven Schnelle <svens@stackframe.org>
Cc: <stable@vger.kernel.org> # v3.16+

513f7f74

parisc: prefer __section from compiler_attributes.h · 0703ad21

Nick Desaulniers authored Aug 12, 2019

Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Helge Deller <deller@gmx.de>

0703ad21

parisc: sysctl.c: Use CONFIG_PARISC instead of __hppa_ define · b67114db
Helge Deller authored Oct 04, 2019
```
Signed-off-by: Helge Deller <deller@gmx.de>
```
b67114db

firmware: dmi: Fix unlikely out-of-bounds read in save_mem_devices · 81dde26d

Jean Delvare authored Oct 14, 2019

Before reading the Extended Size field, we should ensure it fits in
the DMI record. There is already a record length check but it does
not cover that field.

It would take a seriously corrupted DMI table to hit that bug, so no
need to worry, but we should still fix it.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes: 6deae96b ("firmware, DMI: Add function to look up a handle and return DIMM size")
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>

81dde26d

kmemleak: Do not corrupt the object_list during clean-up · 2abd839a

Catalin Marinas authored Oct 04, 2019

In case of an error (e.g. memory pool too small), kmemleak disables
itself and cleans up the already allocated metadata objects. However, if
this happens early before the RCU callback mechanism is available,
put_object() skips call_rcu() and frees the object directly. This is not
safe with the RCU list traversal in __kmemleak_do_cleanup().

Change the list traversal in __kmemleak_do_cleanup() to
list_for_each_entry_safe() and remove the rcu_read_{lock,unlock} since
the kmemleak is already disabled at this point. In addition, avoid an
unnecessary metadata object rb-tree look-up since it already has the
struct kmemleak_object pointer.

Fixes: c5665868 ("mm: kmemleak: use the memory pool for early allocations")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reported-by: Marc Dionne <marc.c.dionne@gmail.com>
Reported-by: Ted Ts'o <tytso@mit.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2abd839a

platform/x86: i2c-multi-instantiate: Fail the probe if no IRQ provided · 832392db

Andy Shevchenko authored Oct 11, 2019

For APIC case of interrupt we don't fail a ->probe() of the driver,
which makes kernel to print a lot of warnings from the children.

We have two options here:
- switch to platform_get_irq_optional(), though it won't stop children
  to be probed and failed
- fail the ->probe() of i2c-multi-instantiate

Since the in reality we never had devices in the wild where IRQ resource
is optional, the latter solution suits the best.

Fixes: 799d3379 ("platform/x86: i2c-multi-instantiate: Introduce IOAPIC IRQ support")
Reported-by: Ammy Yi <ammy.yi@intel.com>
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>

832392db

13 Oct, 2019 1 commit
- Linux 5.4-rc3 · 4f5cafb5
  Linus Torvalds authored Oct 13, 2019
  
  4f5cafb5