Commits · 7110629263469b4664d00b38ef80a656eddf3637 · nexedi / linux

08 Apr, 2019 5 commits

tpm: fix an invalid condition in tpm_common_poll · 71106292

Tadeusz Struk authored Mar 27, 2019

The poll condition should only check response_length,
because reads should only be issued if there is data to read.
The response_read flag only prevents double writes.
The problem was that the write set the response_read to false,
enqued a tpm job, and returned. Then application called poll
which checked the response_read flag and returned EPOLLIN.
Then the application called read, but got nothing.
After all that the async_work kicked in.
Added also mutex_lock around the poll check to prevent
other possible race conditions.

Fixes: 9488585b ("tpm: add support for partial reads")
Reported-by: Mantas Mikulėnas <grawity@gmail.com>
Tested-by: Mantas Mikulėnas <grawity@gmail.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: James Morris <james.morris@microsoft.com>

71106292

tpm: turn on TPM on suspend for TPM 1.x · e891db1a

Jarkko Sakkinen authored Mar 22, 2019

tpm_chip_start/stop() should be also called for TPM 1.x devices on
suspend. Add that functionality back. Do not lock the chip because
it is unnecessary as there are no multiple threads using it when
doing the suspend.

Fixes: a3fbfae8 ("tpm: take TPM chip power gating out of tpm_transmit()")
Reported-by: Paul Zimmerman <pauldzim@gmail.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Domenico Andreoli <domenico.andreoli@linux.com>
Signed-off-by: James Morris <james.morris@microsoft.com>

e891db1a

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · fd008d1a

Linus Torvalds authored Apr 07, 2019

Pull crypto fix from Herbert Xu:
 "This fixes a bug in the implementation of xcbc and cmac in caam"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: caam - fix copy of next buffer for xcbc and cmac

fd008d1a

slab: fix a crash by reading /proc/slab_allocators · fcf88917

Qian Cai authored Apr 06, 2019

The commit 510ded33 ("slab: implement slab_root_caches list")
changes the name of the list node within "struct kmem_cache" from "list"
to "root_caches_node", but leaks_show() still use the "list" which
causes a crash when reading /proc/slab_allocators.

You need to have CONFIG_SLAB=y and CONFIG_MEMCG=y to see the problem,
because without MEMCG all slab caches are root caches, and the "list"
node happens to be the right one.

Fixes: 510ded33 ("slab: implement slab_root_caches list")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Tobin C. Harding <tobin@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fcf88917

Linux 5.1-rc4 · 15ade5d2
Linus Torvalds authored Apr 07, 2019

15ade5d2

07 Apr, 2019 15 commits

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · d8491223

Linus Torvalds authored Apr 07, 2019

Pull ARM SoC fixes from Olof Johansson:
 "A collection of fixes from the last few weeks. Most of them are
  smaller tweaks and fixes to DT and hardware descriptions for boards.
  Some of the more significant ones are:

   - eMMC and RGMII stability tweaks for rk3288

   - DDC fixes for Rock PI 4

   - Audio fixes for two TI am335x eval boards

   - D_CAN clock fix for am335x

   - Compilation fixes for clang

   - !HOTPLUG_CPU compilation fix for one of the new platforms this
     release (milbeaut)

   - A revert of a gpio fix for nomadik that instead was fixed in the
     gpio subsystem

   - Whitespace fix for the DT JSON schema (no tabs allowed)"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (25 commits)
  ARM: milbeaut: fix build with !CONFIG_HOTPLUG_CPU
  ARM: iop: don't use using 64-bit DMA masks
  ARM: orion: don't use using 64-bit DMA masks
  Revert "ARM: dts: nomadik: Fix polarity of SPI CS"
  dt-bindings: cpu: Fix JSON schema
  arm/mach-at91/pm : fix possible object reference leak
  ARM: dts: at91: Fix typo in ISC_D0 on PC9
  ARM: dts: Fix dcan clkctrl clock for am3
  reset: meson-audio-arb: Fix missing .owner setting of reset_controller_dev
  dt-bindings: reset: meson-g12a: Add missing USB2 PHY resets
  ARM: dts: rockchip: Remove #address/#size-cells from rk3288-veyron gpio-keys
  ARM: dts: rockchip: Remove #address/#size-cells from rk3288 mipi_dsi
  ARM: dts: rockchip: Fix gpu opp node names for rk3288
  ARM: dts: am335x-evmsk: Correct the regulators for the audio codec
  ARM: dts: am335x-evm: Correct the regulators for the audio codec
  ARM: OMAP2+: add missing of_node_put after of_device_is_available
  ARM: OMAP1: ams-delta: Fix broken GPIO ID allocation
  arm64: dts: stratix10: add the sysmgr-syscon property from the gmac's
  arm64: dts: rockchip: fix rk3328 sdmmc0 write errors
  arm64: dts: rockchip: fix rk3328 rgmii high tx error rate
  ...

d8491223

Merge tag 'for-linus-20190407' of git://git.kernel.dk/linux-block · 429fba10

Linus Torvalds authored Apr 07, 2019

Pull block fixes from Jens Axboe:

 - Fixups for the pf/pcd queue handling (YueHaibing)

 - Revert of the three direct issue changes as they have been proven to
   cause an issue with dm-mpath (Bart)

 - Plug rq_count reset fix (Dongli)

 - io_uring double free in fileset registration error handling (me)

 - Make null_blk handle bad numa node passed in (John)

 - BFQ ifdef fix (Konstantin)

 - Flush queue leak fix (Shenghui)

 - Plug trace fix (Yufen)

* tag 'for-linus-20190407' of git://git.kernel.dk/linux-block:
  xsysace: Fix error handling in ace_setup
  null_blk: prevent crash from bad home_node value
  block: Revert v5.0 blk_mq_request_issue_directly() changes
  paride/pcd: Fix potential NULL pointer dereference and mem leak
  blk-mq: do not reset plug->rq_count before the list is sorted
  paride/pf: Fix potential NULL pointer dereference
  io_uring: fix double free in case of fileset regitration failure
  blk-mq: add trace block plug and unplug for multiple queues
  block: use blk_free_flush_queue() to free hctx->fq in blk_mq_init_hctx
  block/bfq: fix ifdef for CONFIG_BFQ_GROUP_IOSCHED=y

429fba10

ARM: milbeaut: fix build with !CONFIG_HOTPLUG_CPU · 9a8f3203

Arnd Bergmann authored Mar 13, 2019

When HOTPLUG_CPU is disabled, some fields in the smp operations
are not available or needed:

arch/arm/mach-milbeaut/platsmp.c:90:3: error: field designator 'cpu_die' does not refer to any field in type
      'struct smp_operations'
        .cpu_die                = m10v_cpu_die,
         ^
arch/arm/mach-milbeaut/platsmp.c:91:3: error: field designator 'cpu_kill' does not refer to any field in type
      'struct smp_operations'
        .cpu_kill               = m10v_cpu_kill,
         ^

Hide them in an #ifdef like the other platforms do.

Fixes: 9fb29c73 ("ARM: milbeaut: Add basic support for Milbeaut m10v SoC")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Olof Johansson <olof@lixom.net>

9a8f3203

ARM: iop: don't use using 64-bit DMA masks · 2125801c

Arnd Bergmann authored Mar 25, 2019

clang warns about statically defined DMA masks from the DMA_BIT_MASK
macro with length 64:

 arch/arm/mach-iop13xx/setup.c:303:35: error: shift count >= width of type [-Werror,-Wshift-count-overflow]
 static u64 iop13xx_adma_dmamask = DMA_BIT_MASK(64);
                                  ^~~~~~~~~~~~~~~~
 include/linux/dma-mapping.h:141:54: note: expanded from macro 'DMA_BIT_MASK'
 #define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
                                                      ^ ~~~

The ones in iop shouldn't really be 64 bit masks, so changing them
to what the driver can support avoids the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Olof Johansson <olof@lixom.net>

2125801c

ARM: orion: don't use using 64-bit DMA masks · cd92d74d

Arnd Bergmann authored Mar 25, 2019

clang warns about statically defined DMA masks from the DMA_BIT_MASK
macro with length 64:

arch/arm/plat-orion/common.c:625:29: error: shift count >= width of type [-Werror,-Wshift-count-overflow]
                .coherent_dma_mask      = DMA_BIT_MASK(64),
                                          ^~~~~~~~~~~~~~~~
include/linux/dma-mapping.h:141:54: note: expanded from macro 'DMA_BIT_MASK'
 #define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))

The ones in orion shouldn't really be 64 bit masks, so changing them
to what the driver can support avoids the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Olof Johansson <olof@lixom.net>

cd92d74d

Revert "ARM: dts: nomadik: Fix polarity of SPI CS" · fbe8758f

Olof Johansson authored Apr 07, 2019

This reverts commit fa946356.

Per Linus Walleij:

Dear ARM SoC maintainers,

can you please revert this patch. It was the wrong solution to the
wrong problem, and I must have acted in stress. Andrey fixed the
real bug in a proper way in these commits:

commit e5545c94
"gpio: of: Check propname before applying "cs-gpios" quirks"
commit 7ce40277
"gpio: of: Check for "spi-cs-high" in child instead of parent node"
Signed-off-by: Olof Johansson <olof@lixom.net>

fbe8758f

Merge tag 'omap-for-v5.1/fixes-signed' of... · c983f102

Olof Johansson authored Apr 07, 2019

Merge tag 'omap-for-v5.1/fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes

Fixes for omaps for v5.1-rc cycle

Few small fixes for omap variants:

- Fix ams-delta gpio IDs
- Add missing of_node_put for omapdss platform init code
- Fix unconfigured audio regulators for two am335x boards
- Fix use of wrong offset for am335x d_can clocks

* tag 'omap-for-v5.1/fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: dts: Fix dcan clkctrl clock for am3
  ARM: dts: am335x-evmsk: Correct the regulators for the audio codec
  ARM: dts: am335x-evm: Correct the regulators for the audio codec
  ARM: OMAP2+: add missing of_node_put after of_device_is_available
  ARM: OMAP1: ams-delta: Fix broken GPIO ID allocation
Signed-off-by: Olof Johansson <olof@lixom.net>

c983f102

Merge tag 'at91-5.1-fixes' of... · fccf5166

Olof Johansson authored Apr 07, 2019

Merge tag 'at91-5.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into arm/fixes

AT91 fixes for 5.1

- fix a typo in sama5d2 pinmuxing which concerns the ISC data 0 signal
- fix a kobject reference leak

* tag 'at91-5.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
  arm/mach-at91/pm : fix possible object reference leak
  ARM: dts: at91: Fix typo in ISC_D0 on PC9
Signed-off-by: Olof Johansson <olof@lixom.net>

fccf5166

Merge tag 'v5.1-rockchip-dtfixes-1' of... · a9708285

Olof Johansson authored Apr 07, 2019

Merge tag 'v5.1-rockchip-dtfixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/fixes

Fixes for dtc warnings, fixes for ethernet transfers on rk3328,
sd-card related fixes on both rk3328 ans rk3288-tinker and a
regulator fix on rock64 and making ddc actually work on the
Rock PI 4 due to missing the ddc bus.

* tag 'v5.1-rockchip-dtfixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  ARM: dts: rockchip: Remove #address/#size-cells from rk3288-veyron gpio-keys
  ARM: dts: rockchip: Remove #address/#size-cells from rk3288 mipi_dsi
  ARM: dts: rockchip: Fix gpu opp node names for rk3288
  arm64: dts: rockchip: fix rk3328 sdmmc0 write errors
  arm64: dts: rockchip: fix rk3328 rgmii high tx error rate
  ARM: dts: rockchip: Fix SD card detection on rk3288-tinker
  arm64: dts: rockchip: Fix vcc_host1_5v GPIO polarity on rk3328-rock64
  ARM: dts: rockchip: fix rk3288 cpu opp node reference
  arm64: dts: rockchip: add DDC bus on Rock Pi 4
  arm64: dts: rockchip: fix rk3328-roc-cc gmac2io tx/rx_delay
Signed-off-by: Olof Johansson <olof@lixom.net>

a9708285

Merge tag 'stratix10_fix_for_v5.1' of... · 3e372088

Olof Johansson authored Apr 07, 2019

Merge tag 'stratix10_fix_for_v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux into arm/fixes

arm64: dts: stratix10: fix emac loading warning
- Add missing "altr,sysmgr-syscon" property to all gmac nodes

* tag 'stratix10_fix_for_v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
  arm64: dts: stratix10: add the sysmgr-syscon property from the gmac's
Signed-off-by: Olof Johansson <olof@lixom.net>

3e372088

Merge tag 'reset-fixes-for-v5.1' of git://git.pengutronix.de/pza/linux into arm/fixes · 57683e45

Olof Johansson authored Apr 07, 2019

Reset controller fixes for v5.1

This tag adds missing USB PHY reset lines to the Meson G12A reset
controller header and fixes the Meson Audio ARB driver to prevent
module unloading while it is in use.

* tag 'reset-fixes-for-v5.1' of git://git.pengutronix.de/pza/linux:
  reset: meson-audio-arb: Fix missing .owner setting of reset_controller_dev
  dt-bindings: reset: meson-g12a: Add missing USB2 PHY resets
Signed-off-by: Olof Johansson <olof@lixom.net>

57683e45

dt-bindings: cpu: Fix JSON schema · ac0722f2

Maxime Ripard authored Mar 18, 2019

Commit fd73403a ("dt-bindings: arm: Add SMP enable-method for
Milbeaut") added support for a new cpu enable-method, but did so using
tabulations to ident. This is however invalid in the syntax, and resulted
in a failure when trying to use that schemas for validation.

Use spaces instead of tabs to indent to fix this.

Fixes: fd73403a ("dt-bindings: arm: Add SMP enable-method for Milbeaut")
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Sugaya Taichi <sugaya.taichi@socionext.com>
Signed-off-by: Olof Johansson <olof@lixom.net>

ac0722f2

Merge tag 'for-linus-5.1b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 3b046891

Linus Torvalds authored Apr 07, 2019

Pull xen fixes from Juergen Gross:
 "One minor fix and a small cleanup for the xen privcmd driver"

* tag 'for-linus-5.1b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: Prevent buffer overflow in privcmd ioctl
  xen: use struct_size() helper in kzalloc()

3b046891

Merge tag 'mtd/fixes-for-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux · 82331a70

Linus Torvalds authored Apr 07, 2019

Pull MTD fix from Richard Weinberger:
 "A single fix for a possible infinite loop in the cfi_cmdset_0002
  driver"

* tag 'mtd/fixes-for-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
  mtd: cfi: fix deadloop in cfi_cmdset_0002.c do_write_buffer

82331a70

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · eccc58cb

Linus Torvalds authored Apr 07, 2019

Pull SCSI fixes from James Bottomley:
 "Five small fixes. Four in three drivers: qedi, lpfc and storvsc. The
  final one is labelled core, but merely adds a dh rdac entry for Lenovo
  systems"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: lpfc: Fix missing wakeups on abort threads
  scsi: storvsc: Reduce default ring buffer size to 128 Kbytes
  scsi: storvsc: Fix calculation of sub-channel count
  scsi: core: add new RDAC LENOVO/DE_Series device
  scsi: qedi: remove declaration of nvm_image from stack

eccc58cb

06 Apr, 2019 20 commits

Merge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · faac51dd

Linus Torvalds authored Apr 06, 2019

Pull i2c fix from Wolfram Sang:
 "A simple but wanted driver bugfix"

* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: imx: don't leak the i2c adapter on error

faac51dd

Merge branch 'parisc-5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 373c3925

Linus Torvalds authored Apr 06, 2019

Pull parisc fixes from Helge Deller:
 "A 32-bit boot regression fix introduced in the merge window, a QEMU
  detection fix and two fixes by Sven regarding ptrace & kprobes"

* 'parisc-5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Detect QEMU earlier in boot process
  parisc: also set iaoq_b in instruction_pointer_set()
  parisc: regs_return_value() should return gpr28
  Revert: parisc: Use F_EXTEND() macro in iosapic code

373c3925

parisc: Detect QEMU earlier in boot process · d006e95b

Helge Deller authored Apr 02, 2019

While adding LASI support to QEMU, I noticed that the QEMU detection in
the kernel happens much too late. For example, when a LASI chip is found
by the kernel, it registers the LASI LED driver as well.  But when we
run on QEMU it makes sense to avoid spending unnecessary CPU cycles, so
we need to access the running_on_QEMU flag earlier than before.

This patch now makes the QEMU detection the fist task of the Linux
kernel by moving it to where the kernel enters the C-coding.

Fixes: 310d8278 ("parisc: qemu idle sleep support")
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # v4.14+

d006e95b

parisc: also set iaoq_b in instruction_pointer_set() · f324fa58

Sven Schnelle authored Apr 04, 2019

When setting the instruction pointer on PA-RISC we also need
to set the back of the instruction queue to the new offset, otherwise
we will execute on instruction from the new location, and jumping
back to the old location stored in iaoq_b.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: 75ebedf1 ("parisc: Add HAVE_REGS_AND_STACK_ACCESS_API feature")
Cc: stable@vger.kernel.org # 4.19+

f324fa58

parisc: regs_return_value() should return gpr28 · 45efd871

Sven Schnelle authored Apr 04, 2019

While working on kretprobes for PA-RISC I was wondering while the
kprobes sanity test always fails on kretprobes. This is caused by
returning gpr20 instead of gpr28.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.14+

45efd871

Revert: parisc: Use F_EXTEND() macro in iosapic code · c2f8d7cb

Helge Deller authored Mar 18, 2019

Revert parts of commit 97d7e2e3 ("parisc: Use F_EXTEND() macro in
iosapic code"). It breaks booting the 32-bit kernel on some machines.
Reported-by: Sven Schnelle <svens@stackframe.org>
Tested-by: Sven Schnelle <svens@stackframe.org>
Fixes: 97d7e2e3 ("parisc: Use F_EXTEND() macro in iosapic code")
Signed-off-by: Helge Deller <deller@gmx.de>

c2f8d7cb

fs: stream_open - opener for stream-like files so that read and write can run... · 10dce8af

Kirill Smelkov authored Mar 26, 2019

fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock

Commit 9c225f26 ("vfs: atomic f_pos accesses as per POSIX") added
locking for file.f_pos access and in particular made concurrent read and
write not possible - now both those functions take f_pos lock for the
whole run, and so if e.g. a read is blocked waiting for data, write will
deadlock waiting for that read to complete.

This caused regression for stream-like files where previously read and
write could run simultaneously, but after that patch could not do so
anymore. See e.g. commit 581d21a2 ("xenbus: fix deadlock on writes
to /proc/xen/xenbus") which fixes such regression for particular case of
/proc/xen/xenbus.

The patch that added f_pos lock in 2014 did so to guarantee POSIX thread
safety for read/write/lseek and added the locking to file descriptors of
all regular files. In 2014 that thread-safety problem was not new as it
was already discussed earlier in 2006.

However even though 2006'th version of Linus's patch was adding f_pos
locking "only for files that are marked seekable with FMODE_LSEEK (thus
avoiding the stream-like objects like pipes and sockets)", the 2014
version - the one that actually made it into the tree as 9c225f26 -
is doing so irregardless of whether a file is seekable or not.

See

    https://lore.kernel.org/lkml/53022DB1.4070805@gmail.com/
    https://lwn.net/Articles/180387
    https://lwn.net/Articles/180396

for historic context.

The reason that it did so is, probably, that there are many files that
are marked non-seekable, but e.g. their read implementation actually
depends on knowing current position to correctly handle the read. Some
examples:

	kernel/power/user.c		snapshot_read
	fs/debugfs/file.c		u32_array_read
	fs/fuse/control.c		fuse_conn_waiting_read + ...
	drivers/hwmon/asus_atk0110.c	atk_debugfs_ggrp_read
	arch/s390/hypfs/inode.c		hypfs_read_iter
	...

Despite that, many nonseekable_open users implement read and write with
pure stream semantics - they don't depend on passed ppos at all. And for
those cases where read could wait for something inside, it creates a
situation similar to xenbus - the write could be never made to go until
read is done, and read is waiting for some, potentially external, event,
for potentially unbounded time -> deadlock.

Besides xenbus, there are 14 such places in the kernel that I've found
with semantic patch (see below):

	drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write()
	drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write()
	drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write()
	drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write()
	net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write()
	drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write()
	drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write()
	drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write()
	net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write()
	drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write()
	drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write()
	drivers/input/misc/uinput.c:400:1-17: ERROR: uinput_fops: .read() can deadlock .write()
	drivers/infiniband/core/user_mad.c:985:7-23: ERROR: umad_fops: .read() can deadlock .write()
	drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write()

In addition to the cases above another regression caused by f_pos
locking is that now FUSE filesystems that implement open with
FOPEN_NONSEEKABLE flag, can no longer implement bidirectional
stream-like files - for the same reason as above e.g. read can deadlock
write locking on file.f_pos in the kernel.

FUSE's FOPEN_NONSEEKABLE was added in 2008 in a7c1b990 ("fuse:
implement nonseekable open") to support OSSPD. OSSPD implements /dev/dsp
in userspace with FOPEN_NONSEEKABLE flag, with corresponding read and
write routines not depending on current position at all, and with both
read and write being potentially blocking operations:

See

    https://github.com/libfuse/osspd
    https://lwn.net/Articles/308445

    https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1406
    https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1438-L1477
    https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1479-L1510

Corresponding libfuse example/test also describes FOPEN_NONSEEKABLE as
"somewhat pipe-like files ..." with read handler not using offset.
However that test implements only read without write and cannot exercise
the deadlock scenario:

    https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L124-L131
    https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L146-L163
    https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L209-L216

I've actually hit the read vs write deadlock for real while implementing
my FUSE filesystem where there is /head/watch file, for which open
creates separate bidirectional socket-like stream in between filesystem
and its user with both read and write being later performed
simultaneously. And there it is semantically not easy to split the
stream into two separate read-only and write-only channels:

    https://lab.nexedi.com/kirr/wendelin.core/blob/f13aa600/wcfs/wcfs.go#L88-169

Let's fix this regression. The plan is:

1. We can't change nonseekable_open to include &~FMODE_ATOMIC_POS -
   doing so would break many in-kernel nonseekable_open users which
   actually use ppos in read/write handlers.

2. Add stream_open() to kernel to open stream-like non-seekable file
   descriptors. Read and write on such file descriptors would never use
   nor change ppos. And with that property on stream-like files read and
   write will be running without taking f_pos lock - i.e. read and write
   could be running simultaneously.

3. With semantic patch search and convert to stream_open all in-kernel
   nonseekable_open users for which read and write actually do not
   depend on ppos and where there is no other methods in file_operations
   which assume @offset access.

4. Add FOPEN_STREAM to fs/fuse/ and open in-kernel file-descriptors via
   steam_open if that bit is present in filesystem open reply.

   It was tempting to change fs/fuse/ open handler to use stream_open
   instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but
   grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
   and in particular GVFS which actually uses offset in its read and
   write handlers

	https://codesearch.debian.net/search?q=-%3Enonseekable+%3D
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481

   so if we would do such a change it will break a real user.

5. Add stream_open and FOPEN_STREAM handling to stable kernels starting
   from v3.14+ (the kernel where 9c225f26 first appeared).

   This will allow to patch OSSPD and other FUSE filesystems that
   provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE
   in their open handler and this way avoid the deadlock on all kernel
   versions. This should work because fs/fuse/ ignores unknown open
   flags returned from a filesystem and so passing FOPEN_STREAM to a
   kernel that is not aware of this flag cannot hurt. In turn the kernel
   that is not aware of FOPEN_STREAM will be < v3.14 where just
   FOPEN_NONSEEKABLE is sufficient to implement streams without read vs
   write deadlock.

This patch adds stream_open, converts /proc/xen/xenbus to it and adds
semantic patch to automatically locate in-kernel places that are either
required to be converted due to read vs write deadlock, or that are just
safe to be converted because read and write do not use ppos and there
are no other funky methods in file_operations.

Regarding semantic patch I've verified each generated change manually -
that it is correct to convert - and each other nonseekable_open instance
left - that it is either not correct to convert there, or that it is not
converted due to current stream_open.cocci limitations.

The script also does not convert files that should be valid to convert,
but that currently have .llseek = noop_llseek or generic_file_llseek for
unknown reason despite file being opened with nonseekable_open (e.g.
drivers/input/mousedev.c)

Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yongzhi Pan <panyongzhi@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Nikolaus Rath <Nikolaus@rath.org>
Cc: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

10dce8af

xsysace: Fix error handling in ace_setup · 47b16820

Guenter Roeck authored Feb 19, 2019

If xace hardware reports a bad version number, the error handling code
in ace_setup() calls put_disk(), followed by queue cleanup. However, since
the disk data structure has the queue pointer set, put_disk() also
cleans and releases the queue. This results in blk_cleanup_queue()
accessing an already released data structure, which in turn may result
in a crash such as the following.

[   10.681671] BUG: Kernel NULL pointer dereference at 0x00000040
[   10.681826] Faulting instruction address: 0xc0431480
[   10.682072] Oops: Kernel access of bad area, sig: 11 [#1]
[   10.682251] BE PAGE_SIZE=4K PREEMPT Xilinx Virtex440
[   10.682387] Modules linked in:
[   10.682528] CPU: 0 PID: 1 Comm: swapper Tainted: G        W         5.0.0-rc6-next-20190218+ #2
[   10.682733] NIP:  c0431480 LR: c043147c CTR: c0422ad8
[   10.682863] REGS: cf82fbe0 TRAP: 0300   Tainted: G        W          (5.0.0-rc6-next-20190218+)
[   10.683065] MSR:  00029000 <CE,EE,ME>  CR: 22000222  XER: 00000000
[   10.683236] DEAR: 00000040 ESR: 00000000
[   10.683236] GPR00: c043147c cf82fc90 cf82ccc0 00000000 00000000 00000000 00000002 00000000
[   10.683236] GPR08: 00000000 00000000 c04310bc 00000000 22000222 00000000 c0002c54 00000000
[   10.683236] GPR16: 00000000 00000001 c09aa39c c09021b0 c09021dc 00000007 c0a68c08 00000000
[   10.683236] GPR24: 00000001 ced6d400 ced6dcf0 c0815d9c 00000000 00000000 00000000 cedf0800
[   10.684331] NIP [c0431480] blk_mq_run_hw_queue+0x28/0x114
[   10.684473] LR [c043147c] blk_mq_run_hw_queue+0x24/0x114
[   10.684602] Call Trace:
[   10.684671] [cf82fc90] [c043147c] blk_mq_run_hw_queue+0x24/0x114 (unreliable)
[   10.684854] [cf82fcc0] [c04315bc] blk_mq_run_hw_queues+0x50/0x7c
[   10.685002] [cf82fce0] [c0422b24] blk_set_queue_dying+0x30/0x68
[   10.685154] [cf82fcf0] [c0423ec0] blk_cleanup_queue+0x34/0x14c
[   10.685306] [cf82fd10] [c054d73c] ace_probe+0x3dc/0x508
[   10.685445] [cf82fd50] [c052d740] platform_drv_probe+0x4c/0xb8
[   10.685592] [cf82fd70] [c052abb0] really_probe+0x20c/0x32c
[   10.685728] [cf82fda0] [c052ae58] driver_probe_device+0x68/0x464
[   10.685877] [cf82fdc0] [c052b500] device_driver_attach+0xb4/0xe4
[   10.686024] [cf82fde0] [c052b5dc] __driver_attach+0xac/0xfc
[   10.686161] [cf82fe00] [c0528428] bus_for_each_dev+0x80/0xc0
[   10.686314] [cf82fe30] [c0529b3c] bus_add_driver+0x144/0x234
[   10.686457] [cf82fe50] [c052c46c] driver_register+0x88/0x15c
[   10.686610] [cf82fe60] [c09de288] ace_init+0x4c/0xac
[   10.686742] [cf82fe80] [c0002730] do_one_initcall+0xac/0x330
[   10.686888] [cf82fee0] [c09aafd0] kernel_init_freeable+0x34c/0x478
[   10.687043] [cf82ff30] [c0002c6c] kernel_init+0x18/0x114
[   10.687188] [cf82ff40] [c000f2f0] ret_from_kernel_thread+0x14/0x1c
[   10.687349] Instruction dump:
[   10.687435] 3863ffd4 4bfffd70 9421ffd0 7c0802a6 93c10028 7c9e2378 93e1002c 38810008
[   10.687637] 7c7f1b78 90010034 4bfffc25 813f008c <81290040> 75290100 4182002c 80810008
[   10.688056] ---[ end trace 13c9ff51d41b9d40 ]---

Fix the problem by setting the disk queue pointer to NULL before calling
put_disk(). A more comprehensive fix might be to rearrange the code
to check the hardware version before initializing data structures,
but I don't know if this would have undesirable side effects, and
it would increase the complexity of backporting the fix to older kernels.

Fixes: 74489a91 ("Add support for Xilinx SystemACE CompactFlash interface")
Acked-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

47b16820

null_blk: prevent crash from bad home_node value · 7ff684a6

John Pittman authored Apr 05, 2019

At module load, if the selected home_node value is greater than
the available numa nodes, the system will crash in
__alloc_pages_nodemask() due to a bad paging request.  Prevent this
user error crash by detecting the bad value, logging an error, and
setting g_home_node back to the default of NUMA_NO_NODE.
Signed-off-by: John Pittman <jpittman@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7ff684a6

Merge tag 'rtc-5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux · be76865d

Linus Torvalds authored Apr 06, 2019

Pull RTC fixes from Alexandre Belloni:

 - Various alarm fixes for da9063, cros-ec and sh

 - sd3078 manufacturer name fix as this was introduced this cycle

* tag 'rtc-5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
  rtc: da9063: set uie_unsupported when relevant
  rtc: sd3078: fix manufacturer name
  rtc: sh: Fix invalid alarm warning for non-enabled alarm
  rtc: cros-ec: Fail suspend/resume if wake IRQ can't be configured

be76865d

i2c: imx: don't leak the i2c adapter on error · 3ace6891

Laurentiu Tudor authored Apr 01, 2019

Make sure to free the i2c adapter on the error exit path.
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Fixes: e1ab9a46 ("i2c: imx: improve the error handling in i2c_imx_dma_request()")
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

3ace6891

Merge branch 'akpm' (patches from Andrew) · f654f0fc

Linus Torvalds authored Apr 05, 2019

Merge misc fixes from Andrew Morton:
 "14 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  kernel/sysctl.c: fix out-of-bounds access when setting file-max
  mm/util.c: fix strndup_user() comment
  sh: fix multiple function definition build errors
  MAINTAINERS: add maintainer and replacing reviewer ARM/NUVOTON NPCM
  MAINTAINERS: fix bad pattern in ARM/NUVOTON NPCM
  mm: writeback: use exact memcg dirty counts
  psi: clarify the units used in pressure files
  mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()
  hugetlbfs: fix memory leak for resv_map
  mm: fix vm_fault_t cast in VM_FAULT_GET_HINDEX()
  lib/lzo: fix bugs for very short or empty input
  include/linux/bitrev.h: fix constant bitrev
  kmemleak: powerpc: skip scanning holes in the .bss section
  lib/string.c: implement a basic bcmp

f654f0fc

kernel/sysctl.c: fix out-of-bounds access when setting file-max · 9002b214

Will Deacon authored Apr 05, 2019

Commit 32a5ad9c ("sysctl: handle overflow for file-max") hooked up
min/max values for the file-max sysctl parameter via the .extra1 and
.extra2 fields in the corresponding struct ctl_table entry.

Unfortunately, the minimum value points at the global 'zero' variable,
which is an int.  This results in a KASAN splat when accessed as a long
by proc_doulongvec_minmax on 64-bit architectures:

  | BUG: KASAN: global-out-of-bounds in __do_proc_doulongvec_minmax+0x5d8/0x6a0
  | Read of size 8 at addr ffff2000133d1c20 by task systemd/1
  |
  | CPU: 0 PID: 1 Comm: systemd Not tainted 5.1.0-rc3-00012-g40b114779944 #2
  | Hardware name: linux,dummy-virt (DT)
  | Call trace:
  |  dump_backtrace+0x0/0x228
  |  show_stack+0x14/0x20
  |  dump_stack+0xe8/0x124
  |  print_address_description+0x60/0x258
  |  kasan_report+0x140/0x1a0
  |  __asan_report_load8_noabort+0x18/0x20
  |  __do_proc_doulongvec_minmax+0x5d8/0x6a0
  |  proc_doulongvec_minmax+0x4c/0x78
  |  proc_sys_call_handler.isra.19+0x144/0x1d8
  |  proc_sys_write+0x34/0x58
  |  __vfs_write+0x54/0xe8
  |  vfs_write+0x124/0x3c0
  |  ksys_write+0xbc/0x168
  |  __arm64_sys_write+0x68/0x98
  |  el0_svc_common+0x100/0x258
  |  el0_svc_handler+0x48/0xc0
  |  el0_svc+0x8/0xc
  |
  | The buggy address belongs to the variable:
  |  zero+0x0/0x40
  |
  | Memory state around the buggy address:
  |  ffff2000133d1b00: 00 00 00 00 00 00 00 00 fa fa fa fa 04 fa fa fa
  |  ffff2000133d1b80: fa fa fa fa 04 fa fa fa fa fa fa fa 04 fa fa fa
  | >ffff2000133d1c00: fa fa fa fa 04 fa fa fa fa fa fa fa 00 00 00 00
  |                                ^
  |  ffff2000133d1c80: fa fa fa fa 00 fa fa fa fa fa fa fa 00 00 00 00
  |  ffff2000133d1d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Fix the splat by introducing a unsigned long 'zero_ul' and using that
instead.

Link: http://lkml.kernel.org/r/20190403153409.17307-1-will.deacon@arm.com
Fixes: 32a5ad9c ("sysctl: handle overflow for file-max")
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Christian Brauner <christian@brauner.io>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9002b214

mm/util.c: fix strndup_user() comment · e9145521

Andrew Morton authored Apr 05, 2019

The kerneldoc misdescribes strndup_user()'s return value.

Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Timur Tabi <timur@freescale.com>
Cc: Mihai Caraman <mihai.caraman@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e9145521

sh: fix multiple function definition build errors · acaf892e

Randy Dunlap authored Apr 05, 2019

Many of the sh CPU-types have their own plat_irq_setup() and
arch_init_clk_ops() functions, so these same (empty) functions in
arch/sh/boards/of-generic.c are not needed and cause build errors.

If there is some case where these empty functions are needed, they can
be retained by marking them as "__weak" while at the same time making
builds that do not need them succeed.

Fixes these build errors:

arch/sh/boards/of-generic.o: In function `plat_irq_setup':
(.init.text+0x134): multiple definition of `plat_irq_setup'
arch/sh/kernel/cpu/sh2/setup-sh7619.o:(.init.text+0x30): first defined here
arch/sh/boards/of-generic.o: In function `arch_init_clk_ops':
(.init.text+0x118): multiple definition of `arch_init_clk_ops'
arch/sh/kernel/cpu/sh2/clock-sh7619.o:(.init.text+0x0): first defined here

Link: http://lkml.kernel.org/r/9ee4e0c5-f100-86a2-bd4d-1d3287ceab31@infradead.orgSigned-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kbuild test robot <lkp@intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

acaf892e

MAINTAINERS: add maintainer and replacing reviewer ARM/NUVOTON NPCM · 803cfadc

Tomer Maimon authored Apr 05, 2019

Add Tali Perry as Nuvoton NPCM maintainer, replace Brendan Higgins
Nuvoton NPCM reviewer with Benjamin Fair.

Link: http://lkml.kernel.org/r/20190328235752.334462-2-tmaimon77@gmail.comSigned-off-by: Tomer Maimon <tmaimon77@gmail.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Reviewed-by: Benjamin Fair <benjaminfair@google.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Cc: Joe Perches <joe@perches.com>
Cc: Avi Fishman <avifishman70@gmail.com>
Cc: Patrick Venture <venture@google.com>
Cc: Nancy Yuen <yuenn@google.com>
Cc: Tali Perry <tali.perry1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

803cfadc

MAINTAINERS: fix bad pattern in ARM/NUVOTON NPCM · 166dbd93

Tomer Maimon authored Apr 05, 2019

In the process of upstreaming architecture support for ARM/NUVOTON NPCM
include/dt-bindings/clock/nuvoton,npcm7xx-clks.h was renamed
include/dt-bindings/clock/nuvoton,npcm7xx-clock.h without updating
MAINTAINERS.  This updates the MAINTAINERS pattern to match the new name
of this file.

Link: http://lkml.kernel.org/r/20190328235752.334462-1-tmaimon77@gmail.com
Fixes: 6a498e06 ("MAINTAINERS: Add entry for the Nuvoton NPCM architecture")
Signed-off-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Tomer Maimon <tmaimon77@gmail.com>
Reported-by: Joe Perches <joe@perches.com>
Reviewed-by: Benjamin Fair <benjaminfair@google.com>
Cc: Avi Fishman <avifishman70@gmail.com>
Cc: Mukesh Ojha <mojha@codeaurora.org>
Cc: Nancy Yuen <yuenn@google.com>
Cc: Patrick Venture <venture@google.com>
Cc: Tali Perry <tali.perry1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

166dbd93

mm: writeback: use exact memcg dirty counts · 0b3d6e6f

Greg Thelen authored Apr 05, 2019

Since commit a983b5eb ("mm: memcontrol: fix excessive complexity in
memory.stat reporting") memcg dirty and writeback counters are managed
as:

 1) per-memcg per-cpu values in range of [-32..32]

 2) per-memcg atomic counter

When a per-cpu counter cannot fit in [-32..32] it's flushed to the
atomic.  Stat readers only check the atomic.  Thus readers such as
balance_dirty_pages() may see a nontrivial error margin: 32 pages per
cpu.

Assuming 100 cpus:
   4k x86 page_size:  13 MiB error per memcg
  64k ppc page_size: 200 MiB error per memcg

Considering that dirty+writeback are used together for some decisions the
errors double.

This inaccuracy can lead to undeserved oom kills.  One nasty case is
when all per-cpu counters hold positive values offsetting an atomic
negative value (i.e.  per_cpu[*]=32, atomic=n_cpu*-32).
balance_dirty_pages() only consults the atomic and does not consider
throttling the next n_cpu*32 dirty pages.  If the file_lru is in the
13..200 MiB range then there's absolutely no dirty throttling, which
burdens vmscan with only dirty+writeback pages thus resorting to oom
kill.

It could be argued that tiny containers are not supported, but it's more
subtle.  It's the amount the space available for file lru that matters.
If a container has memory.max-200MiB of non reclaimable memory, then it
will also suffer such oom kills on a 100 cpu machine.

The following test reliably ooms without this patch.  This patch avoids
oom kills.

  $ cat test
  mount -t cgroup2 none /dev/cgroup
  cd /dev/cgroup
  echo +io +memory > cgroup.subtree_control
  mkdir test
  cd test
  echo 10M > memory.max
  (echo $BASHPID > cgroup.procs && exec /memcg-writeback-stress /foo)
  (echo $BASHPID > cgroup.procs && exec dd if=/dev/zero of=/foo bs=2M count=100)

  $ cat memcg-writeback-stress.c
  /*
   * Dirty pages from all but one cpu.
   * Clean pages from the non dirtying cpu.
   * This is to stress per cpu counter imbalance.
   * On a 100 cpu machine:
   * - per memcg per cpu dirty count is 32 pages for each of 99 cpus
   * - per memcg atomic is -99*32 pages
   * - thus the complete dirty limit: sum of all counters 0
   * - balance_dirty_pages() only sees atomic count -99*32 pages, which
   *   it max()s to 0.
   * - So a workload can dirty -99*32 pages before balance_dirty_pages()
   *   cares.
   */
  #define _GNU_SOURCE
  #include <err.h>
  #include <fcntl.h>
  #include <sched.h>
  #include <stdlib.h>
  #include <stdio.h>
  #include <sys/stat.h>
  #include <sys/sysinfo.h>
  #include <sys/types.h>
  #include <unistd.h>

  static char *buf;
  static int bufSize;

  static void set_affinity(int cpu)
  {
  	cpu_set_t affinity;

  	CPU_ZERO(&affinity);
  	CPU_SET(cpu, &affinity);
  	if (sched_setaffinity(0, sizeof(affinity), &affinity))
  		err(1, "sched_setaffinity");
  }

  static void dirty_on(int output_fd, int cpu)
  {
  	int i, wrote;

  	set_affinity(cpu);
  	for (i = 0; i < 32; i++) {
  		for (wrote = 0; wrote < bufSize; ) {
  			int ret = write(output_fd, buf+wrote, bufSize-wrote);
  			if (ret == -1)
  				err(1, "write");
  			wrote += ret;
  		}
  	}
  }

  int main(int argc, char **argv)
  {
  	int cpu, flush_cpu = 1, output_fd;
  	const char *output;

  	if (argc != 2)
  		errx(1, "usage: output_file");

  	output = argv[1];
  	bufSize = getpagesize();
  	buf = malloc(getpagesize());
  	if (buf == NULL)
  		errx(1, "malloc failed");

  	output_fd = open(output, O_CREAT|O_RDWR);
  	if (output_fd == -1)
  		err(1, "open(%s)", output);

  	for (cpu = 0; cpu < get_nprocs(); cpu++) {
  		if (cpu != flush_cpu)
  			dirty_on(output_fd, cpu);
  	}

  	set_affinity(flush_cpu);
  	if (fsync(output_fd))
  		err(1, "fsync(%s)", output);
  	if (close(output_fd))
  		err(1, "close(%s)", output);
  	free(buf);
  }

Make balance_dirty_pages() and wb_over_bg_thresh() work harder to
collect exact per memcg counters.  This avoids the aforementioned oom
kills.

This does not affect the overhead of memory.stat, which still reads the
single atomic counter.

Why not use percpu_counter? memcg already handles cpus going offline, so
no need for that overhead from percpu_counter.  And the percpu_counter
spinlocks are more heavyweight than is required.

It probably also makes sense to use exact dirty and writeback counters
in memcg oom reports.  But that is saved for later.

Link: http://lkml.kernel.org/r/20190329174609.164344-1-gthelen@google.comSigned-off-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>	[4.16+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0b3d6e6f

psi: clarify the units used in pressure files · be87ab0a

Waiman Long authored Apr 05, 2019

The output of the PSI files show a bunch of numbers with no unit. The
psi.txt documentation file also does not indicate what units are used.
One can only find out by looking at the source code. The units are
percentage for the averages and useconds for the total. Make the
information easier to find by documenting the units in psi.txt.

Link: http://lkml.kernel.org/r/20190402193810.3450-1-longman@redhat.comSigned-off-by: Waiman Long <longman@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

be87ab0a

mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd() · c6f3c5ee

Aneesh Kumar K.V authored Apr 05, 2019

With some architectures like ppc64, set_pmd_at() cannot cope with a
situation where there is already some (different) valid entry present.

Use pmdp_set_access_flags() instead to modify the pfn which is built to
deal with modifying existing PMD entries.

This is similar to commit cae85cb8 ("mm/memory.c: fix modifying of
page protection by insert_pfn()")

We also do similar update w.r.t insert_pfn_pud eventhough ppc64 don't
support pud pfn entries now.

Without this patch we also see the below message in kernel log "BUG:
non-zero pgtables_bytes on freeing mm:"

Link: http://lkml.kernel.org/r/20190402115125.18803-1-aneesh.kumar@linux.ibm.comSigned-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reported-by: Chandan Rajendra <chandan@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c6f3c5ee