Commits · 758e99fefe1d9230111296956335cd35995c0eaf · nexedi / linux

20 Feb, 2017 2 commits
- nfsd: minor nfsd_setattr cleanup · 758e99fe
  Christoph Hellwig authored Feb 20, 2017
```
Simplify exit paths, size_change use.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
```
  758e99fe
- nfsd: merge stable fix into main nfsd branch · 60709c09
  J. Bruce Fields authored Feb 20, 2017
  
  60709c09
17 Feb, 2017 6 commits

NFSD: Reserve adequate space for LOCKT operation · 7323f0d2

Kinglong Mee authored Feb 03, 2017

After tightening the OP_LOCKT reply size estimate, we can get warnings
like:

[11512.783519] RPC request reserved 124 but used 152
[11512.813624] RPC request reserved 108 but used 136
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

7323f0d2

NFSD: Get response size before operation for all RPCs · 2282cd2c

Kinglong Mee authored Feb 03, 2017

NFSD usess PAGE_SIZE as the reply size estimate for RPCs which don't
support op_rsize_bop(), A PAGE_SIZE (4096) is larger than many real
response sizes, eg, access (op_encode_hdr_size + 2), seek
(op_encode_hdr_size + 3).

This patch just adds op_rsize_bop() for all RPCs getting response size.

An overestimate is generally safe but the tighter estimates are probably
better.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

2282cd2c

nfsd/callback: Drop a useless data copy when comparing sessionid · 82743380
Kinglong Mee authored Feb 05, 2017
```
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
```
82743380

nfsd/callback: skip the callback tag · e86a40bc

Kinglong Mee authored Feb 05, 2017

The callback tag is NULL, and hdr->nops is unused too right now, but.
But if we were to ever test with a nonzero callback tag, nops will get a
bad value.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

e86a40bc

nfsd/callback: Cleanup callback cred on shutdown · f7d1ddbe

Kinglong Mee authored Feb 05, 2017

The rpccred gotten from rpc_lookup_machine_cred() should be put when
state is shutdown.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

f7d1ddbe

nfsd/idmap: return nfserr_inval for 0-length names · c3821b34

Kinglong Mee authored Feb 05, 2017

Tigran Mkrtchyan's new pynfs testcases for zero length principals fail:

SATT16   st_setattr.testEmptyPrincipal                            : FAILURE
           Setting empty owner should return NFS4ERR_INVAL,
           instead got NFS4ERR_BADOWNER
SATT17   st_setattr.testEmptyGroupPrincipal                       : FAILURE
           Setting empty owner_group should return NFS4ERR_INVAL,
           instead got NFS4ERR_BADOWNER

This patch checks the principal and returns nfserr_inval directly.  It
could check after decoding in nfs4xdr.c, but it's simpler to do it in
nfsd_map_xxxx.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

c3821b34

10 Feb, 2017 1 commit

nfsd: Revert "nfsd: special case truncates some more" · 0839ffb8

J. Bruce Fields authored Feb 09, 2017

This patch incorrectly attempted nested mnt_want_write, and incorrectly
disabled nfsd's owner override for truncate.  We'll fix those problems
and make another attempt soon, for the moment I think the safest is to
revert.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

0839ffb8

08 Feb, 2017 9 commits

SUNRPC/Cache: Always treat the invalid cache as unexpired · d6fc8821

Kinglong Mee authored Feb 08, 2017

When the first time pynfs runs after rpc/nfsd startup, always get the warning,

"Got error: Connection closed"

I found the problem is caused by,
1. A new startup of nfsd, rpc.mountd, etc,
2. A rpc request from client (pynfs test, or normal mounting),
3. An ip_map cache is created but invalid, so upcall to rpc.mountd,
4. rpc.mountd process the ip_map upcall, before write the valid data to nfsd,
   do auth_reload(), and check_useipaddr(),
5. For the first time, old_use_ipaddr = -1, it causes rpc.mountd do write_flush    that doing cache_clean,
6. The ip_map cache will be treat as expired and clean,
7. When rpc.mountd write the valid data to nfsd, a new ip_map is created
   and updated, the cache_check of old ip_map(doing the upcall) will
   return -ETIMEDOUT.
8. RPC layer return SVC_CLOSE and close the xprt after commit 4d712ef1
   "svcauth_gss: Close connection when dropping an incoming message"

NeilBrown suggest in another email,

"If CACHE_VALID is not set, then there is no data in the cache item,
 so there is nothing to expire. So it would be nice if cache items that
 don't have CACHE_VALID are never treated as expired."

v3, change the order of the two patches
v2, change the checking of CACHE_PENDING to CACHE_VALID
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

d6fc8821

SUNRPC: Drop all entries from cache_detail when cache_purge() · 471a930a

Kinglong Mee authored Feb 08, 2017

User always free the cache_detail after sunrpc_destroy_cache_detail(),
so, it must cleanup up entries that left in the cache_detail,
otherwise, NULL reference may be caused when using the left entries.

Also, NeriBrown suggests "write a stand-alone cache_purge()."

v3, move the cache_fresh_unlocked() out of write lock,
v2, a stand-alone cache_purge(), not only for sunrpc_destroy_cache_detail
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

471a930a

svcrdma: Poll CQs in "workqueue" mode · 81fa3275

Chuck Lever authored Feb 07, 2017

svcrdma calls svc_xprt_put() in its completion handlers, which
currently run in IRQ context.

However, svc_xprt_put() is meant to be invoked in process context,
not in IRQ context. After the last transport reference is gone, it
directly calls a transport release function that expects to run in
process context.

Change the CQ polling modes to IB_POLL_WORKQUEUE so that svcrdma
invokes svc_xprt_put() only in process context. As an added benefit,
bottom half-disabled spin locking can be eliminated from I/O paths.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

81fa3275

svcrdma: Combine list fields in struct svc_rdma_op_ctxt · a3ab867f

Chuck Lever authored Feb 07, 2017

Clean up: The free list and the dto_q list fields are never used at
the same time. Reduce the size of struct svc_rdma_op_ctxt by
combining these fields.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

a3ab867f

svcrdma: Remove unused sc_dto_q field · aba7d14b

Chuck Lever authored Feb 07, 2017

Clean up. Commit be99bb11 ("svcrdma: Use new CQ API for
RPC-over-RDMA server send CQs") removed code that used the sc_dto_q
field, but neglected to remove sc_dto_q at the same time.

Fixes: be99bb11 ("svcrdma: Use new CQ API for RPC-over- ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

aba7d14b

svcrdma: Clean up backchannel send header encoding · c2ccf64a

Chuck Lever authored Feb 07, 2017

Replace C structure-based XDR decoding with pointer arithmetic.
Pointer arithmetic is considered more portable.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

c2ccf64a

svcrdma: Clean up RPC-over-RDMA Call header decoder · 647e18e3

Chuck Lever authored Feb 07, 2017

Replace C structure-based XDR decoding with pointer arithmetic.
Pointer arithmetic is considered more portable.

Rename the "decode" functions. Nothing is decoded here, they
perform only transport header sanity checking. Use existing XDR
naming conventions to help readability.

Straight-line the hot path:
 - relocate the dprintk call sites out of line
 - remove unnecessary byte-swapping
 - reduce count of conditional branches

Deprecate RDMA_MSGP. It's not properly spec'd by RFC5666, and
therefore never used by any V1 client.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

647e18e3

svcrdma: Clean up RPC-over-RDMA Reply header encoder · 98fc21d3

Chuck Lever authored Feb 07, 2017

Replace C structure-based XDR decoding with pointer arithmetic.
Pointer arithmetic is considered more portable, and is used
throughout the kernel's existing XDR encoders. The gcc optimizer
generates similar assembler code either way.

Byte-swapping before a memory store on x86 typically results in an
instruction pipeline stall. Avoid byte-swapping when encoding a new
header.

svcrdma currently doesn't alter a connection's credit grant value
after the connection has been accepted, so it is effectively a
constant. Cache the byte-swapped value in a separate field.

Christoph suggested pulling the header encoding logic into the only
function that uses it.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

98fc21d3

svcrdma: Another sendto chunk list parsing update · cbaf5803

Chuck Lever authored Feb 07, 2017

Commit 5fdca653 ("svcrdma: Renovate sendto chunk list parsing")
missed a spot. svc_rdma_xdr_get_reply_hdr_len() also assumes the
Write list has only one Write chunk. There's no harm in making this
code more general.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

cbaf5803

06 Feb, 2017 1 commit

NFSDv4: use export cache flushtime for changeid on V4ROOT objects. · b8800921

NeilBrown authored Jan 30, 2017

If you change the set of filesystems that are exported, then
the contents of various directories in the NFSv4 pseudo-root
is likely to change.  However the change-id of those
directories is currently tied to the underlying directory,
so the client may not see the changes in a timely fashion.

This patch changes the change-id number to be derived from the
"flush_time" of the export cache.  Whenever any changes are
made to the set of exported filesystems, this flush_time is
updated.  The result is that clients see changes to the set
of exported filesystems much more quickly, often immediately.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

b8800921

05 Feb, 2017 1 commit
- Linux 4.10-rc7 · d5adbfcd
  Linus Torvalds authored Feb 05, 2017
  
  d5adbfcd
04 Feb, 2017 6 commits

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a572a1b9

Linus Torvalds authored Feb 04, 2017

Pull irq fixes from Thomas Gleixner:

 - Prevent double activation of interrupt lines, which causes problems
   on certain interrupt controllers

 - Handle the fallout of the above because x86 (ab)uses the activation
   function to reconfigure interrupts under the hood.

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq: Make irq activate operations symmetric
  irqdomain: Avoid activating interrupts more than once

a572a1b9

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 24bc5fe7

Linus Torvalds authored Feb 04, 2017

Pull KVM fix from Radim Krčmář:
 "Fix a regression that prevented migration between hosts with different
  XSAVE features even if the missing features were not used by the guest
  (for stable)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: do not save guest-unsupported XSAVE state

24bc5fe7

Merge tag 'char-misc-4.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 412e6d3f

Linus Torvalds authored Feb 04, 2017

Pull char/misc driver fixes from Greg KH:
 "Here are two bugfixes that resolve some reported issues. One in the
  firmware loader, that should fix the much-reported problem of crashes
  with it. The other is a hyperv fix for a reported regression.

  Both have been in linux-next for a week or so with no reported issues"

* tag 'char-misc-4.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  Drivers: hv: vmbus: finally fix hv_need_to_signal_on_read()
  firmware: fix NULL pointer dereference in __fw_load_abort()

412e6d3f

Merge tag 'staging-4.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 252bf9f4

Linus Torvalds authored Feb 04, 2017

Pull staging/IIO fixes from Greg KH:
 "Here are a few small IIO and one staging driver fix for 4.10-rc7. They
  fix some reported issues with the drivers.

  All of them have been in linux-next for a week or so with no reported
  issues"

* tag 'staging-4.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: greybus: timesync: validate platform state callback
  iio: dht11: Use usleep_range instead of msleep for start signal
  iio: adc: palmas_gpadc: retrieve a valid iio_dev in suspend/resume
  iio: health: max30100: fixed parenthesis around FIFO count check
  iio: health: afe4404: retrieve a valid iio_dev in suspend/resume
  iio: health: afe4403: retrieve a valid iio_dev in suspend/resume

252bf9f4

Merge tag 'usb-4.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 8fcdcc42

Linus Torvalds authored Feb 04, 2017

Pull USB fixes from Greg KH:
 "Here are some small USB fixes for some reported issues, and the usual
  number of new device ids for 4.10-rc7.

  All of these, except the last new device id, have been in linux-next
  for a while with no reported issues"

* tag 'usb-4.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: serial: pl2303: add ATEN device ID
  usb: gadget: f_fs: Assorted buffer overflow checks.
  USB: Add quirk for WORLDE easykey.25 MIDI keyboard
  usb: musb: Fix external abort on non-linefetch for musb_irq_work()
  usb: musb: Fix host mode error -71 regression
  USB: serial: option: add device ID for HP lt2523 (Novatel E371)
  USB: serial: qcserial: add Dell DW5570 QDL

8fcdcc42

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · a0a28644

Linus Torvalds authored Feb 03, 2017

Pull SCSI fix from James Bottomley:
 "A single fix this time: a fix for a virtqueue removal bug which only
  appears to affect S390, but which results in the queue hanging forever
  thus causing the machine to fail shutdown"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: virtio_scsi: Reject commands when virtqueue is broken

a0a28644

03 Feb, 2017 14 commits

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · a49e6f58

Linus Torvalds authored Feb 03, 2017

Pull virtio/vhost fixes from Michael S. Tsirkin:
 "Last minute fixes:

   - ARM DMA fix revert

   - vhost endian-ness fix

   - MAINTAINERS: email address change for Amit"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  MAINTAINERS: update email address for Amit Shah
  vhost: fix initialization for vq->is_le
  Revert "vring: Force use of DMA API for ARM-based systems with legacy devices"

a49e6f58

Merge tag 'vfio-v4.10-rc7' of git://github.com/awilliam/linux-vfio · e9f7f17d

Linus Torvalds authored Feb 03, 2017

Pull VFIO fix from Alex Williamson:
 "Fix an error path in SPAPR IOMMU backend (Alexey Kardashevskiy)"

* tag 'vfio-v4.10-rc7' of git://github.com/awilliam/linux-vfio:
  vfio/spapr: Fix missing mutex unlock when creating a window

e9f7f17d

Merge branch 'akpm' (patches from Andrew) · 7a92cc6b

Linus Torvalds authored Feb 03, 2017

Merge fixes from Andrew Morton:
 "8 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm, fs: check for fatal signals in do_generic_file_read()
  fs: break out of iomap_file_buffered_write on fatal signals
  base/memory, hotplug: fix a kernel oops in show_valid_zones()
  mm/memory_hotplug.c: check start_pfn in test_pages_in_a_zone()
  jump label: pass kbuild_cflags when checking for asm goto support
  shmem: fix sleeping from atomic context
  kasan: respect /proc/sys/kernel/traceoff_on_warning
  zswap: disable changing params if init fails

7a92cc6b

mm, fs: check for fatal signals in do_generic_file_read() · 5abf186a

Michal Hocko authored Feb 03, 2017

do_generic_file_read() can be told to perform a large request from
userspace.  If the system is under OOM and the reading task is the OOM
victim then it has an access to memory reserves and finishing the full
request can lead to the full memory depletion which is dangerous.  Make
sure we rather go with a short read and allow the killed task to
terminate.

Link: http://lkml.kernel.org/r/20170201092706.9966-3-mhocko@kernel.orgSigned-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

5abf186a

fs: break out of iomap_file_buffered_write on fatal signals · d1908f52

Michal Hocko authored Feb 03, 2017

Tetsuo has noticed that an OOM stress test which performs large write
requests can cause the full memory reserves depletion.  He has tracked
this down to the following path

	__alloc_pages_nodemask+0x436/0x4d0
	alloc_pages_current+0x97/0x1b0
	__page_cache_alloc+0x15d/0x1a0          mm/filemap.c:728
	pagecache_get_page+0x5a/0x2b0           mm/filemap.c:1331
	grab_cache_page_write_begin+0x23/0x40   mm/filemap.c:2773
	iomap_write_begin+0x50/0xd0             fs/iomap.c:118
	iomap_write_actor+0xb5/0x1a0            fs/iomap.c:190
	? iomap_write_end+0x80/0x80             fs/iomap.c:150
	iomap_apply+0xb3/0x130                  fs/iomap.c:79
	iomap_file_buffered_write+0x68/0xa0     fs/iomap.c:243
	? iomap_write_end+0x80/0x80
	xfs_file_buffered_aio_write+0x132/0x390 [xfs]
	? remove_wait_queue+0x59/0x60
	xfs_file_write_iter+0x90/0x130 [xfs]
	__vfs_write+0xe5/0x140
	vfs_write+0xc7/0x1f0
	? syscall_trace_enter+0x1d0/0x380
	SyS_write+0x58/0xc0
	do_syscall_64+0x6c/0x200
	entry_SYSCALL64_slow_path+0x25/0x25

the oom victim has access to all memory reserves to make a forward
progress to exit easier.  But iomap_file_buffered_write and other
callers of iomap_apply loop to complete the full request.  We need to
check for fatal signals and back off with a short write instead.

As the iomap_apply delegates all the work down to the actor we have to
hook into those.  All callers that work with the page cache are calling
iomap_write_begin so we will check for signals there.  dax_iomap_actor
has to handle the situation explicitly because it copies data to the
userspace directly.  Other callers like iomap_page_mkwrite work on a
single page or iomap_fiemap_actor do not allocate memory based on the
given len.

Fixes: 68a9f5e7 ("xfs: implement iomap based buffered write path")
Link: http://lkml.kernel.org/r/20170201092706.9966-2-mhocko@kernel.orgSigned-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d1908f52

base/memory, hotplug: fix a kernel oops in show_valid_zones() · a96dfddb

Toshi Kani authored Feb 03, 2017

Reading a sysfs "memoryN/valid_zones" file leads to the following oops
when the first page of a range is not backed by struct page.
show_valid_zones() assumes that 'start_pfn' is always valid for
page_zone().

 BUG: unable to handle kernel paging request at ffffea017a000000
 IP: show_valid_zones+0x6f/0x160

This issue may happen on x86-64 systems with 64GiB or more memory since
their memory block size is bumped up to 2GiB.  [1] An example of such
systems is desribed below.  0x3240000000 is only aligned by 1GiB and
this memory block starts from 0x3200000000, which is not backed by
struct page.

 BIOS-e820: [mem 0x0000003240000000-0x000000603fffffff] usable

Since test_pages_in_a_zone() already checks holes, fix this issue by
extending this function to return 'valid_start' and 'valid_end' for a
given range.  show_valid_zones() then proceeds with the valid range.

[1] 'Commit bdee237c ("x86: mm: Use 2GB memory block size on
    large-memory x86-64 systems")'

Link: http://lkml.kernel.org/r/20170127222149.30893-3-toshi.kani@hpe.comSigned-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Zhang Zhen <zhenzhang.zhang@huawei.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>	[4.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a96dfddb

mm/memory_hotplug.c: check start_pfn in test_pages_in_a_zone() · deb88a2a

Toshi Kani authored Feb 03, 2017

Patch series "fix a kernel oops when reading sysfs valid_zones", v2.

A sysfs memory file is created for each 2GiB memory block on x86-64 when
the system has 64GiB or more memory.  [1] When the start address of a
memory block is not backed by struct page, i.e.  a memory range is not
aligned by 2GiB, reading its 'valid_zones' attribute file leads to a
kernel oops.  This issue was observed on multiple x86-64 systems with
more than 64GiB of memory.  This patch-set fixes this issue.

Patch 1 first fixes an issue in test_pages_in_a_zone(), which does not
test the start section.

Patch 2 then fixes the kernel oops by extending test_pages_in_a_zone()
to return valid [start, end).

Note for stable kernels: The memory block size change was made by commit
bdee237c ("x86: mm: Use 2GB memory block size on large-memory x86-64
systems"), which was accepted to 3.9.  However, this patch-set depends
on (and fixes) the change to test_pages_in_a_zone() made by commit
5f0f2887 ("mm/memory_hotplug.c: check for missing sections in
test_pages_in_a_zone()"), which was accepted to 4.4.

So, I recommend that we backport it up to 4.4.

[1] 'Commit bdee237c ("x86: mm: Use 2GB memory block size on
    large-memory x86-64 systems")'

This patch (of 2):

test_pages_in_a_zone() does not check 'start_pfn' when it is aligned by
section since 'sec_end_pfn' is set equal to 'pfn'.  Since this function
is called for testing the range of a sysfs memory file, 'start_pfn' is
always aligned by section.

Fix it by properly setting 'sec_end_pfn' to the next section pfn.

Also make sure that this function returns 1 only when the range belongs
to a zone.

Link: http://lkml.kernel.org/r/20170127222149.30893-2-toshi.kani@hpe.comSigned-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Greg KH <greg@kroah.com>
Cc: <stable@vger.kernel.org>	[4.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

deb88a2a

jump label: pass kbuild_cflags when checking for asm goto support · 35f860f9

David Lin authored Feb 03, 2017

Some versions of ARM GCC compiler such as Android toolchain throws in a
'-fpic' flag by default.  This causes the gcc-goto check script to fail
although some config would have '-fno-pic' flag in the KBUILD_CFLAGS.

This patch passes the KBUILD_CFLAGS to the check script so that the
script does not rely on the default config from different compilers.

Link: http://lkml.kernel.org/r/20170120234329.78868-1-dtwlin@google.comSigned-off-by: David Lin <dtwlin@google.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Michal Marek <mmarek@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

35f860f9

shmem: fix sleeping from atomic context · 253fd0f0

Kirill A. Shutemov authored Feb 03, 2017

Syzkaller fuzzer managed to trigger this:

    BUG: sleeping function called from invalid context at mm/shmem.c:852
    in_atomic(): 1, irqs_disabled(): 0, pid: 529, name: khugepaged
    3 locks held by khugepaged/529:
     #0:  (shrinker_rwsem){++++..}, at: [<ffffffff818d7ef1>] shrink_slab.part.59+0x121/0xd30 mm/vmscan.c:451
     #1:  (&type->s_umount_key#29){++++..}, at: [<ffffffff81a63630>] trylock_super+0x20/0x100 fs/super.c:392
     #2:  (&(&sbinfo->shrinklist_lock)->rlock){+.+.-.}, at: [<ffffffff818fd83e>] spin_lock include/linux/spinlock.h:302 [inline]
     #2:  (&(&sbinfo->shrinklist_lock)->rlock){+.+.-.}, at: [<ffffffff818fd83e>] shmem_unused_huge_shrink+0x28e/0x1490 mm/shmem.c:427
    CPU: 2 PID: 529 Comm: khugepaged Not tainted 4.10.0-rc5+ #201
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Call Trace:
       shmem_undo_range+0xb20/0x2710 mm/shmem.c:852
       shmem_truncate_range+0x27/0xa0 mm/shmem.c:939
       shmem_evict_inode+0x35f/0xca0 mm/shmem.c:1030
       evict+0x46e/0x980 fs/inode.c:553
       iput_final fs/inode.c:1515 [inline]
       iput+0x589/0xb20 fs/inode.c:1542
       shmem_unused_huge_shrink+0xbad/0x1490 mm/shmem.c:446
       shmem_unused_huge_scan+0x10c/0x170 mm/shmem.c:512
       super_cache_scan+0x376/0x450 fs/super.c:106
       do_shrink_slab mm/vmscan.c:378 [inline]
       shrink_slab.part.59+0x543/0xd30 mm/vmscan.c:481
       shrink_slab mm/vmscan.c:2592 [inline]
       shrink_node+0x2c7/0x870 mm/vmscan.c:2592
       shrink_zones mm/vmscan.c:2734 [inline]
       do_try_to_free_pages+0x369/0xc80 mm/vmscan.c:2776
       try_to_free_pages+0x3c6/0x900 mm/vmscan.c:2982
       __perform_reclaim mm/page_alloc.c:3301 [inline]
       __alloc_pages_direct_reclaim mm/page_alloc.c:3322 [inline]
       __alloc_pages_slowpath+0xa24/0x1c30 mm/page_alloc.c:3683
       __alloc_pages_nodemask+0x544/0xae0 mm/page_alloc.c:3848
       __alloc_pages include/linux/gfp.h:426 [inline]
       __alloc_pages_node include/linux/gfp.h:439 [inline]
       khugepaged_alloc_page+0xc2/0x1b0 mm/khugepaged.c:750
       collapse_huge_page+0x182/0x1fe0 mm/khugepaged.c:955
       khugepaged_scan_pmd+0xfdf/0x12a0 mm/khugepaged.c:1208
       khugepaged_scan_mm_slot mm/khugepaged.c:1727 [inline]
       khugepaged_do_scan mm/khugepaged.c:1808 [inline]
       khugepaged+0xe9b/0x1590 mm/khugepaged.c:1853
       kthread+0x326/0x3f0 kernel/kthread.c:227
       ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430

The iput() from atomic context was a bad idea: if after igrab() somebody
else calls iput() and we left with the last inode reference, our iput()
would lead to inode eviction and therefore sleeping.

This patch should fix the situation.

Link: http://lkml.kernel.org/r/20170131093141.GA15899@node.shutemov.nameSigned-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

253fd0f0

kasan: respect /proc/sys/kernel/traceoff_on_warning · 4f40c6e5

Peter Zijlstra authored Feb 03, 2017

After much waiting I finally reproduced a KASAN issue, only to find my
trace-buffer empty of useful information because it got spooled out :/

Make kasan_report honour the /proc/sys/kernel/traceoff_on_warning
interface.

Link: http://lkml.kernel.org/r/20170125164106.3514-1-aryabinin@virtuozzo.comSigned-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4f40c6e5

zswap: disable changing params if init fails · d7b028f5

Dan Streetman authored Feb 03, 2017

Add zswap_init_failed bool that prevents changing any of the module
params, if init_zswap() fails, and set zswap_enabled to false.  Change
'enabled' param to a callback, and check zswap_init_failed before
allowing any change to 'enabled', 'zpool', or 'compressor' params.

Any driver that is built-in to the kernel will not be unloaded if its
init function returns error, and its module params remain accessible for
users to change via sysfs.  Since zswap uses param callbacks, which
assume that zswap has been initialized, changing the zswap params after
a failed initialization will result in WARNING due to the param
callbacks expecting a pool to already exist.  This prevents that by
immediately exiting any of the param callbacks if initialization failed.

This was reported here:
  https://marc.info/?l=linux-mm&m=147004228125528&w=4

And fixes this WARNING:
  [  429.723476] WARNING: CPU: 0 PID: 5140 at mm/zswap.c:503 __zswap_pool_current+0x56/0x60

The warning is just noise, and not serious.  However, when init fails,
zswap frees all its percpu dstmem pages and its kmem cache.  The kmem
cache might be serious, if kmem_cache_alloc(NULL, gfp) has problems; but
the percpu dstmem pages are definitely a problem, as they're used as
temporary buffer for compressed pages before copying into place in the
zpool.

If the user does get zswap enabled after an init failure, then zswap
will likely Oops on the first page it tries to compress (or worse, start
corrupting memory).

Fixes: 90b0fc26 ("zswap: change zpool/compressor at runtime")
Link: http://lkml.kernel.org/r/20170124200259.16191-2-ddstreet@ieee.orgSigned-off-by: Dan Streetman <dan.streetman@canonical.com>
Reported-by: Marcin Miroslaw <marcin@mejor.pl>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d7b028f5

Merge tag 'regulator-fix-v4.10-rc6' of... · 3f67790d

Linus Torvalds authored Feb 03, 2017

Merge tag 'regulator-fix-v4.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
 "Three changes here: two run of the mill driver specific fixes and a
  change from Mark Rutland which reverts some new device specific ACPI
  binding code which was added during the merge window as there are
  concerns about this sending the wrong signal about usage of regulators
  in ACPI systems"

* tag 'regulator-fix-v4.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: fixed: Revert support for ACPI interface
  regulator: axp20x: AXP806: Fix dcdcb being set instead of dcdce
  regulator: twl6030: fix range comparison, allowing vsel = 59

3f67790d

MAINTAINERS: update email address for Amit Shah · 79134d11

Amit Shah authored Feb 03, 2017

I'm leaving my job at Red Hat, this email address will stop working next week.
Update it to one that I will have access to later.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

79134d11

vhost: fix initialization for vq->is_le · cda8bba0

Halil Pasic authored Jan 30, 2017

Currently, under certain circumstances vhost_init_is_le does just a part
of the initialization job, and depends on vhost_reset_is_le being called
too. For this reason vhost_vq_init_access used to call vhost_reset_is_le
when vq->private_data is NULL. This is not only counter intuitive, but
also real a problem because it breaks vhost_net. The bug was introduced to
vhost_net with commit 2751c988 ("vhost: cross-endian support for
legacy devices"). The symptom is corruption of the vq's used.idx field
(virtio) after VHOST_NET_SET_BACKEND was issued as a part of the vhost
shutdown on a vq with pending descriptors.

Let us make sure the outcome of vhost_init_is_le never depend on the state
it is actually supposed to initialize, and fix virtio_net by removing the
reset from vhost_vq_init_access.

With the above, there is no reason for vhost_reset_is_le to do just half
of the job. Let us make vhost_reset_is_le reinitialize is_le.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reported-by: Michael A. Tebolt <miket@us.ibm.com>
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Fixes: commit 2751c988 ("vhost: cross-endian support for legacy devices")
Cc: <stable@vger.kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Michael A. Tebolt <miket@us.ibm.com>

cda8bba0