Commits · ef64753c1922511e7d81947a8d27e72925a05e2c · nexedi / linux

17 Jan, 2020 1 commit

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · ef64753c

Linus Torvalds authored Jan 16, 2020

Pull clk fixes from Stephen Boyd:
 "Second collection of clk fixes for the next release.

  This one includes a fix for PM on TI SoCs with sysc devices and fixes
  a bunch of clks that are stuck always enabled on Qualcomm SDM845 SoCs.

  Allwinner SoCs get the usual set of fixes too, mostly correcting
  drivers to have the right bits that match the hardware.

  There's also a Samsung and Tegra fix in here to mark a clk critical
  and avoid a double free.

  And finally there's a fix for critical clks that silences a big
  warning splat about trying to enable a clk that couldn't even be
  prepared"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: ti: dra7-atl: Remove pm_runtime_irq_safe()
  clk: qcom: gcc-sdm845: Add missing flag to votable GDSCs
  clk: sunxi-ng: h6-r: Fix AR100/R_APB2 parent order
  clk: sunxi-ng: h6-r: Simplify R_APB1 clock definition
  clk: sunxi-ng: sun8i-r: Fix divider on APB0 clock
  clk: Don't try to enable critical clocks if prepare failed
  clk: tegra: Fix double-free in tegra_clk_init()
  clk: samsung: exynos5420: Keep top G3D clocks enabled
  clk: sunxi-ng: r40: Allow setting parent rate for external clock outputs
  clk: sunxi-ng: v3s: Fix incorrect number of hw_clks.

ef64753c

16 Jan, 2020 2 commits

Merge tag 'pm-5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · f4353c3e

Linus Torvalds authored Jan 16, 2020

Pull power management fix from Rafael Wysocki:
 "Fix a coding mistake in the teo cpuidle governor causing data to be
  written beyond the last array element (Ikjoon Jang)"

* tag 'pm-5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpuidle: teo: Fix intervals[] array indexing bug

f4353c3e

Merge tag 'tag-chrome-platform-fixes-for-v5.5-rc7' of... · 0c99ee44

Linus Torvalds authored Jan 16, 2020

Merge tag 'tag-chrome-platform-fixes-for-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux

Pull chrome platform fix from Benson Leung:
 "One fix in the wilco_ec keyboard backlight driver to allow the EC
  driver to continue loading in the absence of a backlight module"

* tag 'tag-chrome-platform-fixes-for-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
  platform/chrome: wilco_ec: Fix keyboard backlight probing

0c99ee44

15 Jan, 2020 8 commits

Fix built-in early-load Intel microcode alignment · f5ae2ea6

Jari Ruusu authored Jan 12, 2020

Intel Software Developer's Manual, volume 3, chapter 9.11.6 says:

 "Note that the microcode update must be aligned on a 16-byte boundary
  and the size of the microcode update must be 1-KByte granular"

When early-load Intel microcode is loaded from initramfs, userspace tool
'iucode_tool' has already 16-byte aligned those microcode bits in that
initramfs image.  Image that was created something like this:

 iucode_tool --write-earlyfw=FOO.cpio microcode-files...

However, when early-load Intel microcode is loaded from built-in
firmware BLOB using CONFIG_EXTRA_FIRMWARE= kernel config option, that
16-byte alignment is not guaranteed.

Fix this by forcing all built-in firmware BLOBs to 16-byte alignment.

[ If we end up having other firmware with much bigger alignment
  requirements, we might need to introduce some method for the firmware
  to specify it, this is the minimal "just increase the alignment a bit
  to account for this one special case" patch    - Linus ]
Signed-off-by: Jari Ruusu <jari.ruusu@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f5ae2ea6

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2 · a4feff22

Linus Torvalds authored Jan 15, 2020

Pull arch/nios2 fixlet from Ley Foon Tan:
 "Update my nios2 maintainer email"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
  MAINTAINERS: Update Ley Foon Tan's email address

a4feff22

Merge tag 'platform-drivers-x86-v5.5-3' of git://git.infradead.org/linux-platform-drivers-x86 · 51d69817

Linus Torvalds authored Jan 15, 2020

Pull x86 platform driver fixes from Andy Shevchenko:

 - Fix keyboard brightness control for ASUS laptops

 - Better handling parameters of GPD pocket fan module to avoid
   thermal shock

 - Add IDs to PMC platform driver to support Intel Comet Lake

 - Fix potential dead lock in Mellanox TM FIFO driver and ABI
   documentation

* tag 'platform-drivers-x86-v5.5-3' of git://git.infradead.org/linux-platform-drivers-x86:
  Documentation/ABI: Add missed attribute for mlxreg-io sysfs interfaces
  Documentation/ABI: Fix documentation inconsistency for mlxreg-io sysfs interfaces
  platform/x86: asus-wmi: Fix keyboard brightness cannot be set to 0
  platform/x86: intel_pmc_core: update Comet Lake platform driver
  platform/x86: GPD pocket fan: Allow somewhat lower/higher temperature limits
  platform/x86: GPD pocket fan: Use default values when wrong modparams are given
  platform/mellanox: fix potential deadlock in the tmfifo driver
  platform/x86: intel-ips: Use the correct style for SPDX License Identifier

51d69817

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 0174cb6c

Linus Torvalds authored Jan 15, 2020

Pull crypto fix from Herbert Xu:
 "This fixes a build problem for the hisilicon driver"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: hisilicon/sec2 - Use atomics instead of __sync

0174cb6c

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 84bf3946

Linus Torvalds authored Jan 15, 2020

Pull vfs fixes from Al Viro:
 "Fixes for mountpoint_last() bugs (by converting to use of
  lookup_last()) and an autofs regression fix from this cycle (caused by
  follow_managed() breakage introduced in barrier fixes series)"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix autofs regression caused by follow_managed() changes
  reimplement path_mountpoint() with less magic

84bf3946

fix autofs regression caused by follow_managed() changes · 508c8772

Al Viro authored Jan 14, 2020

we need to reload ->d_flags after the call of ->d_manage() - the thing
might've been called with dentry still negative and have the damn thing
turned positive while we'd waited.

Fixes: d41efb52 "fs/namei.c: pull positivity check into follow_managed()"
Reported-by: Ian Kent <raven@themaw.net>
Tested-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

508c8772

reimplement path_mountpoint() with less magic · c64cd6e3

Al Viro authored Jan 10, 2020

... and get rid of a bunch of bugs in it.  Background:
the reason for path_mountpoint() is that umount() really doesn't
want attempts to revalidate the root of what it's trying to umount.
The thing we want to avoid actually happen from complete_walk();
solution was to do something parallel to normal path_lookupat()
and it both went overboard and got the boilerplate subtly
(and not so subtly) wrong.

A better solution is to do pretty much what the normal path_lookupat()
does, but instead of complete_walk() do unlazy_walk().  All it takes
to avoid that ->d_weak_revalidate() call...  mountpoint_last() goes
away, along with everything it got wrong, and so does the magic around
LOOKUP_NO_REVAL.

Another source of bugs is that when we traverse mounts at the final
location (and we need to do that - umount . expects to get whatever's
overmounting ., if any, out of the lookup) we really ought to take
care of ->d_manage() - as it is, manual umount of autofs automount
in progress can lead to unpleasant surprises for the daemon.  Easily
solved by using handle_lookup_down() instead of follow_mount().
Tested-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c64cd6e3

MAINTAINERS: Update Ley Foon Tan's email address · 051d75d3

Ley Foon Tan authored Jan 15, 2020

@altera.com email is going to removed. Change to @intel.com email.
Signed-off-by: Ley Foon Tan <ley.foon.tan@intel.com>

051d75d3

14 Jan, 2020 26 commits

Merge tag 'nfs-for-5.5-2' of git://git.linux-nfs.org/projects/anna/linux-nfs · 95e20af9

Linus Torvalds authored Jan 14, 2020

Pull NFS client bugfixes from Anna Schumaker:
 "Three NFS over RDMA fixes for bugs Chuck found that can be hit during
  device removal:

   - Fix create_qp crash on device unload

   - Fix completion wait during device removal

   - Fix oops in receive handler after device removal"

* tag 'nfs-for-5.5-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  xprtrdma: Fix oops in Receive handler after device removal
  xprtrdma: Fix completion wait during device removal
  xprtrdma: Fix create_qp crash on device unload

95e20af9

xprtrdma: Fix oops in Receive handler after device removal · 671c450b

Chuck Lever authored Jan 03, 2020

Since v5.4, a device removal occasionally triggered this oops:

Dec 2 17:13:53 manet kernel: BUG: unable to handle page fault for address: 0000000c00000219
Dec 2 17:13:53 manet kernel: #PF: supervisor read access in kernel mode
Dec 2 17:13:53 manet kernel: #PF: error_code(0x0000) - not-present page
Dec 2 17:13:53 manet kernel: PGD 0 P4D 0
Dec 2 17:13:53 manet kernel: Oops: 0000 [#1] SMP
Dec 2 17:13:53 manet kernel: CPU: 2 PID: 468 Comm: kworker/2:1H Tainted: G W 5.4.0-00050-g53717e43af61 #883
Dec 2 17:13:53 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
Dec 2 17:13:53 manet kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
Dec 2 17:13:53 manet kernel: RIP: 0010:rpcrdma_wc_receive+0x7c/0xf6 [rpcrdma]
Dec 2 17:13:53 manet kernel: Code: 6d 8b 43 14 89 c1 89 45 78 48 89 4d 40 8b 43 2c 89 45 14 8b 43 20 89 45 18 48 8b 45 20 8b 53 14 48 8b 30 48 8b 40 10 48 8b 38 <48> 8b 87 18 02 00 00 48 85 c0 75 18 48 8b 05 1e 24 c4 e1 48 85 c0
Dec 2 17:13:53 manet kernel: RSP: 0018:ffffc900035dfe00 EFLAGS: 00010246
Dec 2 17:13:53 manet kernel: RAX: ffff888467290000 RBX: ffff88846c638400 RCX: 0000000000000048
Dec 2 17:13:53 manet kernel: RDX: 0000000000000048 RSI: 00000000f942e000 RDI: 0000000c00000001
Dec 2 17:13:53 manet kernel: RBP: ffff888467611b00 R08: ffff888464e4a3c4 R09: 0000000000000000
Dec 2 17:13:53 manet kernel: R10: ffffc900035dfc88 R11: fefefefefefefeff R12: ffff888865af4428
Dec 2 17:13:53 manet kernel: R13: ffff888466023000 R14: ffff88846c63f000 R15: 0000000000000010
Dec 2 17:13:53 manet kernel: FS: 0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000
Dec 2 17:13:53 manet kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 2 17:13:53 manet kernel: CR2: 0000000c00000219 CR3: 0000000002009002 CR4: 00000000001606e0
Dec 2 17:13:53 manet kernel: Call Trace:
Dec 2 17:13:53 manet kernel: __ib_process_cq+0x5c/0x14e [ib_core]
Dec 2 17:13:53 manet kernel: ib_cq_poll_work+0x26/0x70 [ib_core]
Dec 2 17:13:53 manet kernel: process_one_work+0x19d/0x2cd
Dec 2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
Dec 2 17:13:53 manet kernel: worker_thread+0x1a6/0x25a
Dec 2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
Dec 2 17:13:53 manet kernel: kthread+0xf4/0xf9
Dec 2 17:13:53 manet kernel: ? kthread_queue_delayed_work+0x74/0x74
Dec 2 17:13:53 manet kernel: ret_from_fork+0x24/0x30

The proximal cause is that this rpcrdma_rep has a rr_rdmabuf that
is still pointing to the old ib_device, which has been freed. The
only way that is possible is if this rpcrdma_rep was not destroyed
by rpcrdma_ia_remove.

Debugging showed that was indeed the case: this rpcrdma_rep was
still in use by a completing RPC at the time of the device removal,
and thus wasn't on the rep free list. So, it was not found by
rpcrdma_reps_destroy().

The fix is to introduce a list of all rpcrdma_reps so that they all
can be found when a device is removed. That list is used to perform
only regbuf DMA unmapping, replacing that call to
rpcrdma_reps_destroy().

Meanwhile, to prevent corruption of this list, I've moved the
destruction of temp rpcrdma_rep objects to rpcrdma_post_recvs().
rpcrdma_xprt_drain() ensures that post_recvs (and thus rep_destroy) is
not invoked while rpcrdma_reps_unmap is walking rb_all_reps, thus
protecting the rb_all_reps list.

Fixes: b0b227f0 ("xprtrdma: Use an llist to manage free rpcrdma_reps")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

671c450b

xprtrdma: Fix completion wait during device removal · 13cb886c

Chuck Lever authored Jan 03, 2020

I've found that on occasion, "rmmod <dev>" will hang while if an NFS
is under load.

Ensure that ri_remove_done is initialized only just before the
transport is woken up to force a close. This avoids the completion
possibly getting initialized again while the CM event handler is
waiting for a wake-up.

Fixes: bebd0318 ("xprtrdma: Support unplugging an HCA from under an NFS mount")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

13cb886c

xprtrdma: Fix create_qp crash on device unload · b32b9ed4

Chuck Lever authored Jan 03, 2020

On device re-insertion, the RDMA device driver crashes trying to set
up a new QP:

Nov 27 16:32:06 manet kernel: BUG: kernel NULL pointer dereference, address: 00000000000001c0
Nov 27 16:32:06 manet kernel: #PF: supervisor write access in kernel mode
Nov 27 16:32:06 manet kernel: #PF: error_code(0x0002) - not-present page
Nov 27 16:32:06 manet kernel: PGD 0 P4D 0
Nov 27 16:32:06 manet kernel: Oops: 0002 [#1] SMP
Nov 27 16:32:06 manet kernel: CPU: 1 PID: 345 Comm: kworker/u28:0 Tainted: G W 5.4.0 #852
Nov 27 16:32:06 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
Nov 27 16:32:06 manet kernel: Workqueue: xprtiod xprt_rdma_connect_worker [rpcrdma]
Nov 27 16:32:06 manet kernel: RIP: 0010:atomic_try_cmpxchg+0x2/0x12
Nov 27 16:32:06 manet kernel: Code: ff ff 48 8b 04 24 5a c3 c6 07 00 0f 1f 40 00 c3 31 c0 48 81 ff 08 09 68 81 72 0c 31 c0 48 81 ff 83 0c 68 81 0f 92 c0 c3 8b 06 <f0> 0f b1 17 0f 94 c2 84 d2 75 02 89 06 88 d0 c3 53 ba 01 00 00 00
Nov 27 16:32:06 manet kernel: RSP: 0018:ffffc900035abbf0 EFLAGS: 00010046
Nov 27 16:32:06 manet kernel: RAX: 0000000000000000 RBX: 00000000000001c0 RCX: 0000000000000000
Nov 27 16:32:06 manet kernel: RDX: 0000000000000001 RSI: ffffc900035abbfc RDI: 00000000000001c0
Nov 27 16:32:06 manet kernel: RBP: ffffc900035abde0 R08: 000000000000000e R09: ffffffffffffc000
Nov 27 16:32:06 manet kernel: R10: 0000000000000000 R11: 000000000002e800 R12: ffff88886169d9f8
Nov 27 16:32:06 manet kernel: R13: ffff88886169d9f4 R14: 0000000000000246 R15: 0000000000000000
Nov 27 16:32:06 manet kernel: FS: 0000000000000000(0000) GS:ffff88846fa40000(0000) knlGS:0000000000000000
Nov 27 16:32:06 manet kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 27 16:32:06 manet kernel: CR2: 00000000000001c0 CR3: 0000000002009006 CR4: 00000000001606e0
Nov 27 16:32:06 manet kernel: Call Trace:
Nov 27 16:32:06 manet kernel: do_raw_spin_lock+0x2f/0x5a
Nov 27 16:32:06 manet kernel: create_qp_common.isra.47+0x856/0xadf [mlx4_ib]
Nov 27 16:32:06 manet kernel: ? slab_post_alloc_hook.isra.60+0xa/0x1a
Nov 27 16:32:06 manet kernel: ? __kmalloc+0x125/0x139
Nov 27 16:32:06 manet kernel: mlx4_ib_create_qp+0x57f/0x972 [mlx4_ib]

The fix is to copy the qp_init_attr struct that was just created by
rpcrdma_ep_create() instead of using the one from the previous
connection instance.

Fixes: 98ef77d1 ("xprtrdma: Send Queue size grows after a reconnect")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

b32b9ed4

Merge branch 'parisc-5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 452424cd

Linus Torvalds authored Jan 14, 2020

Pull parisc fixes from Helge Deller:
 "A boot crash fix by Mike Rapoport and a printk fix by Krzysztof
  Kozlowski"

* 'parisc-5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: fix map_pages() to actually populate upper directory
  parisc: Use proper printk format for resource_size_t

452424cd

Merge tag 'asm-generic-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground · 67373994

Linus Torvalds authored Jan 14, 2020

Pull asm-generic fixes from Arnd Bergmann:
 "Here are two bugfixes from Mike Rapoport, both fixing compile-time
  errors for the nds32 architecture that were recently introduced"

* tag 'asm-generic-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
  nds32: fix build failure caused by page table folding updates
  asm-generic/nds32: don't redefine cacheflush primitives

67373994

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · c21ed4d9

Linus Torvalds authored Jan 14, 2020

Pull SCSI fixes from James Bottomley:
 "Two simple fixes in the upper drivers (so both fairly core), one in
  enclosures, which fixes replugging a device into an enclosure slot and
  one in the disk driver which fixes revalidating a drive with
  protection information (PI) to make it a non-PI drive ... previously
  we were still remembering the old PI state.

  Both fixed issues are quite rare in the field"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: enclosure: Fix stale device oops with hot replug
  scsi: sd: Clear sdkp->protection_type if disk is reformatted without PI

c21ed4d9

Merge branch 'dhowells' (patches from DavidH) · e033e7d4

Linus Torvalds authored Jan 14, 2020

Merge misc fixes from David Howells.

Two afs fixes and a key refcounting fix.

* dhowells:
  afs: Fix afs_lookup() to not clobber the version on a new dentry
  afs: Fix use-after-loss-of-ref
  keys: Fix request_key() cache

e033e7d4

afs: Fix afs_lookup() to not clobber the version on a new dentry · f52b83b0

David Howells authored Jan 14, 2020

Fix afs_lookup() to not clobber the version set on a new dentry by
afs_do_lookup() - especially as it's using the wrong version of the
version (we need to use the one given to us by whatever op the dir
contents correspond to rather than what's in the afs_vnode).

Fixes: 9dd0b82e ("afs: Fix missing dentry data version updating")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f52b83b0

afs: Fix use-after-loss-of-ref · 40a708bd

David Howells authored Jan 14, 2020

afs_lookup() has a tracepoint to indicate the outcome of
d_splice_alias(), passing it the inode to retrieve the fid from.
However, the function gave up its ref on that inode when it called
d_splice_alias(), which may have failed and dropped the inode.

Fix this by caching the fid.

Fixes: 80548b03 ("afs: Add more tracepoints")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

40a708bd

keys: Fix request_key() cache · 8379bb84

David Howells authored Jan 14, 2020

When the key cached by request_key() and co. is cleaned up on exit(),
the code looks in the wrong task_struct, and so clears the wrong cache.
This leads to anomalies in key refcounting when doing, say, a kernel
build on an afs volume, that then trigger kasan to report a
use-after-free when the key is viewed in /proc/keys.

Fix this by making exit_creds() look in the passed-in task_struct rather
than in current (the task_struct cleanup code is deferred by RCU and
potentially run in another task).

Fixes: 7743c48e ("keys: Cache result of request_key*() temporarily in task_struct")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8379bb84

Merge branch 'akpm' (patches from Andrew) · 3f1f9a9b

Linus Torvalds authored Jan 14, 2020

Merge misc fixes from Andrew Morton:
 "11 mm fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: khugepaged: add trace status description for SCAN_PAGE_HAS_PRIVATE
  mm: memcg/slab: call flush_memcg_workqueue() only if memcg workqueue is valid
  mm/page-writeback.c: improve arithmetic divisions
  mm/page-writeback.c: use div64_ul() for u64-by-unsigned-long divide
  mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio()
  mm, debug_pagealloc: don't rely on static keys too early
  mm: memcg/slab: fix percpu slab vmstats flushing
  mm/shmem.c: thp, shmem: fix conflict of above-47bit hint address and PMD alignment
  mm/huge_memory.c: thp: fix conflict of above-47bit hint address and PMD alignment
  mm/memory_hotplug: don't free usage map when removing a re-added early section
  mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations

3f1f9a9b

parisc: fix map_pages() to actually populate upper directory · 8b7f938e

Mike Rapoport authored Jan 08, 2020

The commit d96885e2 ("parisc: use pgtable-nopXd instead of
4level-fixup") converted PA-RISC to use folded page tables, but it missed
the conversion of pgd_populate() to pud_populate() in maps_pages()
function. This caused the upper page table directory to remain empty and
the system would crash as a result.

Using pud_populate() that actually populates the page table instead of
dummy pgd_populate() fixes the issue.

Fixes: d96885e2 ("parisc: use pgtable-nopXd instead of 4level-fixup")
Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Jeroen Roovers <jer@gentoo.org>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Jeroen Roovers <jer@gentoo.org>
Tested-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Helge Deller <deller@gmx.de>

8b7f938e

parisc: Use proper printk format for resource_size_t · 4f80b70e

Krzysztof Kozlowski authored Jan 03, 2020

resource_size_t should be printed with its own size-independent format
to fix warnings when compiling on 64-bit platform (e.g. with
COMPILE_TEST):

    arch/parisc/kernel/drivers.c: In function 'print_parisc_device':
    arch/parisc/kernel/drivers.c:892:9: warning:
        format '%p' expects argument of type 'void *',
        but argument 4 has type 'resource_size_t {aka unsigned int}' [-Wformat=]
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Helge Deller <deller@gmx.de>

4f80b70e

Merge tag 'Intel-CVE-2019-14615' from bundle by Akeem Abodunrin. · 63d264fe

Linus Torvalds authored Jan 13, 2020

Merge Intel Gen9 graphics fix from Akeem Abodunrin:
 "Insufficient control flow in certain data structures for some Intel
  Processors with Intel Processor Graphics may allow an unauthenticated
  user to potentially enable information disclosure via local access

  This provides mitigation for Gen9 hardware. Note that Gen8 is not
  impacted due to a previously implemented workaround.

  The mitigation involves using an existing hardware feature to forcibly
  clear down all EU state at each context switch"

* tag 'Intel-CVE-2019-14615' of emailed bundle from Akeem G Abodunrin <akeem.g.abodunrin@intel.com>:
  drm/i915/gen9: Clear residual context state on context switch

63d264fe

mm: khugepaged: add trace status description for SCAN_PAGE_HAS_PRIVATE · 554913f6

Yang Shi authored Jan 13, 2020

Commit 99cb0dbd ("mm,thp: add read-only THP support for (non-shmem)
FS") introduced a new khugepaged scan result: SCAN_PAGE_HAS_PRIVATE, but
the corresponding description for trace events were not added.

Link: http://lkml.kernel.org/r/1574793844-2914-1-git-send-email-yang.shi@linux.alibaba.com
Fixes: 99cb0dbd ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

554913f6

mm: memcg/slab: call flush_memcg_workqueue() only if memcg workqueue is valid · 2fe20210

Adrian Huang authored Jan 13, 2020

When booting with amd_iommu=off, the following WARNING message
appears:

  AMD-Vi: AMD IOMMU disabled on kernel command-line
  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 0 at kernel/workqueue.c:2772 flush_workqueue+0x42e/0x450
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-rc3-amd-iommu #6
  Hardware name: Lenovo ThinkSystem SR655-2S/7D2WRCZ000, BIOS D8E101L-1.00 12/05/2019
  RIP: 0010:flush_workqueue+0x42e/0x450
  Code: ff 0f 0b e9 7a fd ff ff 4d 89 ef e9 33 fe ff ff 0f 0b e9 7f fd ff ff 0f 0b e9 bc fd ff ff 0f 0b e9 a8 fd ff ff e8 52 2c fe ff <0f> 0b 31 d2 48 c7 c6 e0 88 c5 95 48 c7 c7 d8 ad f0 95 e8 19 f5 04
  Call Trace:
   kmem_cache_destroy+0x69/0x260
   iommu_go_to_state+0x40c/0x5ab
   amd_iommu_prepare+0x16/0x2a
   irq_remapping_prepare+0x36/0x5f
   enable_IR_x2apic+0x21/0x172
   default_setup_apic_routing+0x12/0x6f
   apic_intr_mode_init+0x1a1/0x1f1
   x86_late_time_init+0x17/0x1c
   start_kernel+0x480/0x53f
   secondary_startup_64+0xb6/0xc0
  ---[ end trace 30894107c3749449 ]---
  x2apic: IRQ remapping doesn't support X2APIC mode
  x2apic disabled

The warning is caused by the calling of 'kmem_cache_destroy()'
in free_iommu_resources(). Here is the call path:

  free_iommu_resources
    kmem_cache_destroy
      flush_memcg_workqueue
        flush_workqueue

The root cause is that the IOMMU subsystem runs before the workqueue
subsystem, which the variable 'wq_online' is still 'false'.  This leads
to the statement 'if (WARN_ON(!wq_online))' in flush_workqueue() is
'true'.

Since the variable 'memcg_kmem_cache_wq' is not allocated during the
time, it is unnecessary to call flush_memcg_workqueue().  This prevents
the WARNING message triggered by flush_workqueue().

Link: http://lkml.kernel.org/r/20200103085503.1665-1-ahuang12@lenovo.com
Fixes: 92ee383f ("mm: fix race between kmem_cache destroy, create and deactivate")
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Reported-by: Xiaochun Lee <lixc17@lenovo.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2fe20210

mm/page-writeback.c: improve arithmetic divisions · 0a5d1a7f

Wen Yang authored Jan 13, 2020

Use div64_ul() instead of do_div() if the divisor is unsigned long, to
avoid truncation to 32-bit on 64-bit platforms.

Link: http://lkml.kernel.org/r/20200102081442.8273-4-wenyang@linux.alibaba.comSigned-off-by: Wen Yang <wenyang@linux.alibaba.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0a5d1a7f

mm/page-writeback.c: use div64_ul() for u64-by-unsigned-long divide · d3ac946e

Wen Yang authored Jan 13, 2020

The two variables 'numerator' and 'denominator', though they are
declared as long, they should actually be unsigned long (according to
the implementation of the fprop_fraction_percpu() function)

And do_div() does a 64-by-32 division, while the divisor 'denominator'
is unsigned long, thus 64-bit on 64-bit platforms.  Hence the proper
function to call is div64_ul().

Link: http://lkml.kernel.org/r/20200102081442.8273-3-wenyang@linux.alibaba.comSigned-off-by: Wen Yang <wenyang@linux.alibaba.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d3ac946e

mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio() · 6d9e8c65

Wen Yang authored Jan 13, 2020

Patch series "use div64_ul() instead of div_u64() if the divisor is
unsigned long".

We were first inspired by commit b0ab99e7 ("sched: Fix possible divide
by zero in avg_atom () calculation"), then refer to the recently analyzed
mm code, we found this suspicious place.

 201                 if (min) {
 202                         min *= this_bw;
 203                         do_div(min, tot_bw);
 204                 }

And we also disassembled and confirmed it:

  /usr/src/debug/kernel-4.9.168-016.ali3000/linux-4.9.168-016.ali3000.alios7.x86_64/mm/page-writeback.c: 201
  0xffffffff811c37da <__wb_calc_thresh+234>:      xor    %r10d,%r10d
  0xffffffff811c37dd <__wb_calc_thresh+237>:      test   %rax,%rax
  0xffffffff811c37e0 <__wb_calc_thresh+240>:      je 0xffffffff811c3800 <__wb_calc_thresh+272>
  /usr/src/debug/kernel-4.9.168-016.ali3000/linux-4.9.168-016.ali3000.alios7.x86_64/mm/page-writeback.c: 202
  0xffffffff811c37e2 <__wb_calc_thresh+242>:      imul   %r8,%rax
  /usr/src/debug/kernel-4.9.168-016.ali3000/linux-4.9.168-016.ali3000.alios7.x86_64/mm/page-writeback.c: 203
  0xffffffff811c37e6 <__wb_calc_thresh+246>:      mov    %r9d,%r10d    ---> truncates it to 32 bits here
  0xffffffff811c37e9 <__wb_calc_thresh+249>:      xor    %edx,%edx
  0xffffffff811c37eb <__wb_calc_thresh+251>:      div    %r10
  0xffffffff811c37ee <__wb_calc_thresh+254>:      imul   %rbx,%rax
  0xffffffff811c37f2 <__wb_calc_thresh+258>:      shr    $0x2,%rax
  0xffffffff811c37f6 <__wb_calc_thresh+262>:      mul    %rcx
  0xffffffff811c37f9 <__wb_calc_thresh+265>:      shr    $0x2,%rdx
  0xffffffff811c37fd <__wb_calc_thresh+269>:      mov    %rdx,%r10

This series uses div64_ul() instead of div_u64() if the divisor is
unsigned long, to avoid truncation to 32-bit on 64-bit platforms.

This patch (of 3):

The variables 'min' and 'max' are unsigned long and do_div truncates
them to 32 bits, which means it can test non-zero and be truncated to
zero for division.  Fix this issue by using div64_ul() instead.

Link: http://lkml.kernel.org/r/20200102081442.8273-2-wenyang@linux.alibaba.com
Fixes: 693108a8 ("writeback: make bdi->min/max_ratio handling cgroup writeback aware")
Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6d9e8c65

mm, debug_pagealloc: don't rely on static keys too early · 8e57f8ac

Vlastimil Babka authored Jan 13, 2020

Commit 96a2b03f ("mm, debug_pagelloc: use static keys to enable
debugging") has introduced a static key to reduce overhead when
debug_pagealloc is compiled in but not enabled.  It relied on the
assumption that jump_label_init() is called before parse_early_param()
as in start_kernel(), so when the "debug_pagealloc=on" option is parsed,
it is safe to enable the static key.

However, it turns out multiple architectures call parse_early_param()
earlier from their setup_arch().  x86 also calls jump_label_init() even
earlier, so no issue was found while testing the commit, but same is not
true for e.g.  ppc64 and s390 where the kernel would not boot with
debug_pagealloc=on as found by our QA.

To fix this without tricky changes to init code of multiple
architectures, this patch partially reverts the static key conversion
from 96a2b03f.  Init-time and non-fastpath calls (such as in arch
code) of debug_pagealloc_enabled() will again test a simple bool
variable.  Fastpath mm code is converted to a new
debug_pagealloc_enabled_static() variant that relies on the static key,
which is enabled in a well-defined point in mm_init() where it's
guaranteed that jump_label_init() has been called, regardless of
architecture.

[sfr@canb.auug.org.au: export _debug_pagealloc_enabled_early]
  Link: http://lkml.kernel.org/r/20200106164944.063ac07b@canb.auug.org.au
Link: http://lkml.kernel.org/r/20191219130612.23171-1-vbabka@suse.cz
Fixes: 96a2b03f ("mm, debug_pagelloc: use static keys to enable debugging")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Qian Cai <cai@lca.pw>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8e57f8ac

mm: memcg/slab: fix percpu slab vmstats flushing · 4a87e2a2

Roman Gushchin authored Jan 13, 2020

Currently slab percpu vmstats are flushed twice: during the memcg
offlining and just before freeing the memcg structure. Each time percpu
counters are summed, added to the atomic counterparts and propagated up
by the cgroup tree.

The second flushing is required due to how recursive vmstats are
implemented: counters are batched in percpu variables on a local level,
and once a percpu value is crossing some predefined threshold, it spills
over to atomic values on the local and each ascendant levels. It means
that without flushing some numbers cached in percpu variables will be
dropped on floor each time a cgroup is destroyed. And with uptime the
error on upper levels might become noticeable.

The first flushing aims to make counters on ancestor levels more
precise. Dying cgroups may resume in the dying state for a long time.
After kmem_cache reparenting which is performed during the offlining
slab counters of the dying cgroup don't have any chances to be updated,
because any slab operations will be performed on the parent level. It
means that the inaccuracy caused by percpu batching will not decrease up
to the final destruction of the cgroup. By the original idea flushing
slab counters during the offlining should minimize the visible
inaccuracy of slab counters on the parent level.

The problem is that percpu counters are not zeroed after the first
flushing. So every cached percpu value is summed twice. It creates a
small error (up to 32 pages per cpu, but usually less) which accumulates
on parent cgroup level. After creating and destroying of thousands of
child cgroups, slab counter on parent level can be way off the real
value.

For now, let's just stop flushing slab counters on memcg offlining. It
can't be done correctly without scheduling a work on each cpu: reading
and zeroing it during css offlining can race with an asynchronous
update, which doesn't expect values to be changed underneath.

With this change, slab counters on parent level will become eventually
consistent. Once all dying children are gone, values are correct. And
if not, the error is capped by 32 * NR_CPUS pages per dying cgroup.

It's not perfect, as slab are reparented, so any updates after the
reparenting will happen on the parent level. It means that if a slab
page was allocated, a counter on child level was bumped, then the page
was reparented and freed, the annihilation of positive and negative
counter values will not happen until the child cgroup is released. It
makes slab counters different from others, and it might want us to
implement flushing in a correct form again. But it's also a question of
performance: scheduling a work on each cpu isn't free, and it's an open
question if the benefit of having more accurate counters is worth it.

We might also consider flushing all counters on offlining, not only slab
counters.

So let's fix the main problem now: make the slab counters eventually
consistent, so at least the error won't grow with uptime (or more
precisely the number of created and destroyed cgroups). And think about
the accuracy of counters separately.

Link: http://lkml.kernel.org/r/20191220042728.1045881-1-guro@fb.com
Fixes: bee07b33 ("mm: memcontrol: flush percpu slab vmstats on kmem offlining")
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4a87e2a2

mm/shmem.c: thp, shmem: fix conflict of above-47bit hint address and PMD alignment · 99158997

Kirill A. Shutemov authored Jan 13, 2020

Shmem/tmpfs tries to provide THP-friendly mappings if huge pages are
enabled.  But it doesn't work well with above-47bit hint address.

Normally, the kernel doesn't create userspace mappings above 47-bit,
even if the machine allows this (such as with 5-level paging on x86-64).
Not all user space is ready to handle wide addresses.  It's known that
at least some JIT compilers use higher bits in pointers to encode their
information.

Userspace can ask for allocation from full address space by specifying
hint address (with or without MAP_FIXED) above 47-bits.  If the
application doesn't need a particular address, but wants to allocate
from whole address space it can specify -1 as a hint address.

Unfortunately, this trick breaks THP alignment in shmem/tmp:
shmem_get_unmapped_area() would not try to allocate PMD-aligned area if
*any* hint address specified.

This can be fixed by requesting the aligned area if the we failed to
allocated at user-specified hint address.  The request with inflated
length will also take the user-specified hint address.  This way we will
not lose an allocation request from the full address space.

[kirill@shutemov.name: fold in a fixup]
  Link: http://lkml.kernel.org/r/20191223231309.t6bh5hkbmokihpfu@box
Link: http://lkml.kernel.org/r/20191220142548.7118-3-kirill.shutemov@linux.intel.com
Fixes: b569bab7 ("x86/mm: Prepare to expose larger address space to userspace")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Willhalm, Thomas" <thomas.willhalm@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "Bruggeman, Otto G" <otto.g.bruggeman@intel.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

99158997

mm/huge_memory.c: thp: fix conflict of above-47bit hint address and PMD alignment · 97d3d0f9

Kirill A. Shutemov authored Jan 13, 2020

Patch series "Fix two above-47bit hint address vs.  THP bugs".

The two get_unmapped_area() implementations have to be fixed to provide
THP-friendly mappings if above-47bit hint address is specified.

This patch (of 2):

Filesystems use thp_get_unmapped_area() to provide THP-friendly
mappings.  For DAX in particular.

Normally, the kernel doesn't create userspace mappings above 47-bit,
even if the machine allows this (such as with 5-level paging on x86-64).
Not all user space is ready to handle wide addresses.  It's known that
at least some JIT compilers use higher bits in pointers to encode their
information.

Userspace can ask for allocation from full address space by specifying
hint address (with or without MAP_FIXED) above 47-bits.  If the
application doesn't need a particular address, but wants to allocate
from whole address space it can specify -1 as a hint address.

Unfortunately, this trick breaks thp_get_unmapped_area(): the function
would not try to allocate PMD-aligned area if *any* hint address
specified.

Modify the routine to handle it correctly:

 - Try to allocate the space at the specified hint address with length
   padding required for PMD alignment.
 - If failed, retry without length padding (but with the same hint
   address);
 - If the returned address matches the hint address return it.
 - Otherwise, align the address as required for THP and return.

The user specified hint address is passed down to get_unmapped_area() so
above-47bit hint address will be taken into account without breaking
alignment requirements.

Link: http://lkml.kernel.org/r/20191220142548.7118-2-kirill.shutemov@linux.intel.com
Fixes: b569bab7 ("x86/mm: Prepare to expose larger address space to userspace")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Thomas Willhalm <thomas.willhalm@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Bruggeman, Otto G" <otto.g.bruggeman@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

97d3d0f9

mm/memory_hotplug: don't free usage map when removing a re-added early section · 8068df3b

David Hildenbrand authored Jan 13, 2020

When we remove an early section, we don't free the usage map, as the
usage maps of other sections are placed into the same page.  Once the
section is removed, it is no longer an early section (especially, the
memmap is freed).  When we re-add that section, the usage map is reused,
however, it is no longer an early section.  When removing that section
again, we try to kfree() a usage map that was allocated during early
boot - bad.

Let's check against PageReserved() to see if we are dealing with an
usage map that was allocated during boot.  We could also check against
!(PageSlab(usage_page) || PageCompound(usage_page)), but PageReserved() is
cleaner.

Can be triggered using memtrace under ppc64/powernv:

  $ mount -t debugfs none /sys/kernel/debug/
  $ echo 0x20000000 > /sys/kernel/debug/powerpc/memtrace/enable
  $ echo 0x20000000 > /sys/kernel/debug/powerpc/memtrace/enable
   ------------[ cut here ]------------
   kernel BUG at mm/slub.c:3969!
   Oops: Exception in kernel mode, sig: 5 [#1]
   LE PAGE_SIZE=3D64K MMU=3DHash SMP NR_CPUS=3D2048 NUMA PowerNV
   Modules linked in:
   CPU: 0 PID: 154 Comm: sh Not tainted 5.5.0-rc2-next-20191216-00005-g0be1dba7b7c0 #61
   NIP kfree+0x338/0x3b0
   LR section_deactivate+0x138/0x200
   Call Trace:
     section_deactivate+0x138/0x200
     __remove_pages+0x114/0x150
     arch_remove_memory+0x3c/0x160
     try_remove_memory+0x114/0x1a0
     __remove_memory+0x20/0x40
     memtrace_enable_set+0x254/0x850
     simple_attr_write+0x138/0x160
     full_proxy_write+0x8c/0x110
     __vfs_write+0x38/0x70
     vfs_write+0x11c/0x2a0
     ksys_write+0x84/0x140
     system_call+0x5c/0x68
   ---[ end trace 4b053cbd84e0db62 ]---

The first invocation will offline+remove memory blocks.  The second
invocation will first add+online them again, in order to offline+remove
them again (usually we are lucky and the exact same memory blocks will
get "reallocated").

Tested on powernv with boot memory: The usage map will not get freed.
Tested on x86-64 with DIMMs: The usage map will get freed.

Using Dynamic Memory under a Power DLAPR can trigger it easily.

Triggering removal (I assume after previously removed+re-added) of
memory from the HMC GUI can crash the kernel with the same call trace
and is fixed by this patch.

Link: http://lkml.kernel.org/r/20191217104637.5509-1-david@redhat.com
Fixes: 326e1b8f ("mm/sparsemem: introduce a SECTION_IS_EARLY flag")
Signed-off-by: David Hildenbrand <david@redhat.com>
Tested-by: Pingfan Liu <piliu@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8068df3b

mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations · cc638f32

Vlastimil Babka authored Jan 13, 2020

THP page faults now attempt a __GFP_THISNODE allocation first, which
should only compact existing free memory, followed by another attempt
that can allocate from any node using reclaim/compaction effort
specified by global defrag setting and madvise.

This patch makes the following changes to the scheme:

 - Before the patch, the first allocation relies on a check for
   pageblock order and __GFP_IO to prevent excessive reclaim. This
   however affects also the second attempt, which is not limited to
   single node.

   Instead of that, reuse the existing check for costly order
   __GFP_NORETRY allocations, and make sure the first THP attempt uses
   __GFP_NORETRY. As a side-effect, all costly order __GFP_NORETRY
   allocations will bail out if compaction needs reclaim, while
   previously they only bailed out when compaction was deferred due to
   previous failures.

   This should be still acceptable within the __GFP_NORETRY semantics.

 - Before the patch, the second allocation attempt (on all nodes) was
   passing __GFP_NORETRY. This is redundant as the check for pageblock
   order (discussed above) was stronger. It's also contrary to
   madvise(MADV_HUGEPAGE) which means some effort to allocate THP is
   requested.

   After this patch, the second attempt doesn't pass __GFP_THISNODE nor
   __GFP_NORETRY.

To sum up, THP page faults now try the following attempts:

1. local node only THP allocation with no reclaim, just compaction.
2. for madvised VMA's or when synchronous compaction is enabled always - THP
   allocation from any node with effort determined by global defrag setting
   and VMA madvise
3. fallback to base pages on any node

Link: http://lkml.kernel.org/r/08a3f4dd-c3ce-0009-86c5-9ee51aba8557@suse.cz
Fixes: b39d0ee2 ("mm, page_alloc: avoid expensive reclaim when compaction may not succeed")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cc638f32

13 Jan, 2020 3 commits

Documentation/ABI: Add missed attribute for mlxreg-io sysfs interfaces · f3efc406

Vadim Pasternak authored Jan 13, 2020

Add missed "cpld4_version" attribute.

Fixes: 52675da1 ("Documentation/ABI: Add new attribute for mlxreg-io sysfs interfaces")
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

f3efc406

Documentation/ABI: Fix documentation inconsistency for mlxreg-io sysfs interfaces · f4094826

Vadim Pasternak authored Jan 13, 2020

Fix attribute name from "jtag_enable", which described twice to
"cpld3_version", which is expected to be instead of second appearance
of "jtag_enable".

Fixes: 2752e344 ("Documentation/ABI: Add new attribute for mlxreg-io sysfs interfaces")
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

f4094826

Merge tag 'sunxi-clk-fixes-for-5.5' of... · a0af2742

Stephen Boyd authored Jan 13, 2020

Merge tag 'sunxi-clk-fixes-for-5.5' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into clk-fixes

Pull Allwinner clk fixes from Maxime Ripard:

Our usual set of fixes for Allwinner, to fix the number of reported
clocks on the v3s, fixing the external clock on the R40, and some
fixes for the AR100 co-processor clocks.

* tag 'sunxi-clk-fixes-for-5.5' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  clk: sunxi-ng: h6-r: Fix AR100/R_APB2 parent order
  clk: sunxi-ng: h6-r: Simplify R_APB1 clock definition
  clk: sunxi-ng: sun8i-r: Fix divider on APB0 clock
  clk: sunxi-ng: r40: Allow setting parent rate for external clock outputs
  clk: sunxi-ng: v3s: Fix incorrect number of hw_clks.

a0af2742