Commits · ede6507d67e9f10a8df7f96ed0176d639cd25beb · Kirill Smelkov / linux

10 Nov, 2017 21 commits

dm cache: remove usused deferred_cells member from struct cache · ede6507d
Joe Thornber authored Nov 09, 2017
```
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
```
ede6507d

dm cache policy smq: allocate cache blocks in order · 9768a10d

Joe Thornber authored Nov 09, 2017

Previously, cache blocks were being allocated in reverse order.  Fix
this by pulling the block off the head of the free list.

Shouldn't have any impact on performance or latency but it is more
correct to have the cache blocks allocated/mapped in ascending order.
This fix will slightly increase the chances of two adjacent oblocks
being in adjacent cblocks.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

9768a10d

dm cache policy smq: change max background work from 10240 to 4096 blocks · 8ee18ede

Joe Thornber authored Nov 09, 2017

10240 blocks was too much, lowering this reduces the latency of copying
and consumes less memory.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

8ee18ede

dm cache background tracker: limit amount of background work that may be issued at once · 64748b16

Joe Thornber authored Nov 08, 2017

On large systems the cache policy can be over enthusiastic and queue far
too much dirty data to be written back.  This consumes memory.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

64748b16

dm cache policy smq: take origin idle status into account when queuing writebacks · deb71918

Joe Thornber authored Nov 08, 2017

If the origin device is idle try and writeback more data.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

deb71918

dm cache policy smq: handle races with queuing background_work · 1e72a8e8

Joe Thornber authored Nov 08, 2017

The background_tracker holds a set of promotions/demotions that the
cache policy wishes the core target to implement.

When adding a new operation to the tracker it's possible that an
operation on the same block is already present (but in practise this
doesn't appear to be happening).  Catch these situations and do the
appropriate cleanup.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

1e72a8e8

dm raid: fix panic when attempting to force a raid to sync · 23397844

Heinz Mauelshagen authored Nov 02, 2017

Requesting a sync on an active raid device via a table reload
(see 'sync' parameter in Documentation/device-mapper/dm-raid.txt)
skips the super_load() call that defines the superblock size
(rdev->sb_size) -- resulting in an oops if/when super_sync()->memset()
is called.

Fix by moving the initialization of the superblock start and size
out of super_load() to the caller (analyse_superblocks).
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

23397844

dm integrity: allow unaligned bv_offset · 95b1369a

Mikulas Patocka authored Nov 07, 2017

When slub_debug is enabled kmalloc returns unaligned memory. XFS uses
this unaligned memory for its buffers (if an unaligned buffer crosses a
page, XFS frees it and allocates a full page instead - see the function
xfs_buf_allocate_memory).

dm-integrity checks if bv_offset is aligned on page size and this check
fail with slub_debug and XFS.

Fix this bug by removing the bv_offset check, leaving only the check for
bv_len.

Fixes: 7eada909 ("dm: add integrity target")
Cc: stable@vger.kernel.org # v4.12+
Reported-by: Bruno Prémont <bonbons@sysophe.eu>
Reviewed-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

95b1369a

dm crypt: allow unaligned bv_offset · 0440d5c0

Mikulas Patocka authored Nov 07, 2017

When slub_debug is enabled kmalloc returns unaligned memory. XFS uses
this unaligned memory for its buffers (if an unaligned buffer crosses a
page, XFS frees it and allocates a full page instead - see the function
xfs_buf_allocate_memory).

dm-crypt checks if bv_offset is aligned on page size and these checks
fail with slub_debug and XFS.

Fix this bug by removing the bv_offset checks. Switch to checking if
bv_len is aligned instead of bv_offset (this check should be sufficient
to prevent overruns if a bio with too small bv_len is received).

Fixes: 8f0009a2 ("dm crypt: optionally support larger encryption sector size")
Cc: stable@vger.kernel.org # v4.12+
Reported-by: Bruno Prémont <bonbons@sysophe.eu>
Tested-by: Bruno Prémont <bonbons@sysophe.eu>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

0440d5c0

dm: small cleanup in dm_get_md() · 49de5769

Mike Snitzer authored Nov 06, 2017

Makes dm_get_md() and dm_get_from_kobject() have similar code.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

49de5769

dm: fix race between dm_get_from_kobject() and __dm_destroy() · b9a41d21

Hou Tao authored Nov 01, 2017

The following BUG_ON was hit when testing repeat creation and removal of
DM devices:

    kernel BUG at drivers/md/dm.c:2919!
    CPU: 7 PID: 750 Comm: systemd-udevd Not tainted 4.1.44
    Call Trace:
     [<ffffffff81649e8b>] dm_get_from_kobject+0x34/0x3a
     [<ffffffff81650ef1>] dm_attr_show+0x2b/0x5e
     [<ffffffff817b46d1>] ? mutex_lock+0x26/0x44
     [<ffffffff811df7f5>] sysfs_kf_seq_show+0x83/0xcf
     [<ffffffff811de257>] kernfs_seq_show+0x23/0x25
     [<ffffffff81199118>] seq_read+0x16f/0x325
     [<ffffffff811de994>] kernfs_fop_read+0x3a/0x13f
     [<ffffffff8117b625>] __vfs_read+0x26/0x9d
     [<ffffffff8130eb59>] ? security_file_permission+0x3c/0x44
     [<ffffffff8117bdb8>] ? rw_verify_area+0x83/0xd9
     [<ffffffff8117be9d>] vfs_read+0x8f/0xcf
     [<ffffffff81193e34>] ? __fdget_pos+0x12/0x41
     [<ffffffff8117c686>] SyS_read+0x4b/0x76
     [<ffffffff817b606e>] system_call_fastpath+0x12/0x71

The bug can be easily triggered, if an extra delay (e.g. 10ms) is added
between the test of DMF_FREEING & DMF_DELETING and dm_get() in
dm_get_from_kobject().

To fix it, we need to ensure the test of DMF_FREEING & DMF_DELETING and
dm_get() are done in an atomic way, so _minor_lock is used.

The other callers of dm_get() have also been checked to be OK: some
callers invoke dm_get() under _minor_lock, some callers invoke it under
_hash_lock, and dm_start_request() invoke it after increasing
md->open_count.

Cc: stable@vger.kernel.org
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

b9a41d21

dm: allocate struct mapped_device with kvzalloc · 856eb091

Mikulas Patocka authored Oct 31, 2017

The structure srcu_struct can be very big, its size is proportional to the
value CONFIG_NR_CPUS. The Fedora kernel has CONFIG_NR_CPUS 8192, the field
io_barrier in the struct mapped_device has 84kB in the debugging kernel
and 50kB in the non-debugging kernel. The large size may result in failure
of the function kzalloc_node.

In order to avoid the allocation failure, we use the function
kvzalloc_node, this function falls back to vmalloc if a large contiguous
chunk of memory is not available. This patch also moves the field
io_barrier to the last position of struct mapped_device - the reason is
that on many processor architectures, short memory offsets result in
smaller code than long memory offsets - on x86-64 it reduces code size by
320 bytes.

Note to stable kernel maintainers - the kernels 4.11 and older don't have
the function kvzalloc_node, you can use the function vzalloc_node instead.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

856eb091

dm zoned: ignore last smaller runt zone · 114e0259

Damien Le Moal authored Oct 28, 2017

The SCSI layer allows ZBC drives to have a smaller last runt zone. For
such a device, specifying the entire capacity for a dm-zoned target
table entry fails because the specified capacity is not aligned on a
device zone size indicated in the request queue structure of the
device.

Fix this problem by ignoring the last runt zone in the entry length
when seting up the dm-zoned target (ctr method) and when iterating table
entries of the target (iterate_devices method). This allows dm-zoned
users to still easily setup a target using the entire device capacity
(as mandated by dm-zoned) or the aligned capacity excluding the last
runt zone.

While at it, replace direct references to the device queue chunk_sectors
limit with calls to the accessor blk_queue_zone_sectors().
Reported-by: Peter Desnoyers <pjd@ccs.neu.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

114e0259

dm space map metadata: use ARRAY_SIZE · fbc61291

Jérémy Lefaure authored Oct 01, 2017

Using the ARRAY_SIZE macro improves the readability of the code.

Found with Coccinelle with the following semantic patch:
@r depends on (org || report)@
type T;
T[] E;
position p;
@@
(
 (sizeof(E)@p /sizeof(*E))
|
 (sizeof(E)@p /sizeof(E[...]))
|
 (sizeof(E)@p /sizeof(T))
)
Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

fbc61291

dm log writes: add support for DAX · 98d82f48

Ross Zwisler authored Oct 19, 2017

Now that we have the ability log filesystem writes using a flat buffer, add
support for DAX.

The motivation for this support is the need for an xfstest that can test
the new MAP_SYNC DAX flag.  By logging the filesystem activity with
dm-log-writes we can show that the MAP_SYNC page faults are writing out
their metadata as they happen, instead of requiring an explicit
msync/fsync.

Unfortunately we can't easily track data that has been written via
mmap() now that the dax_flush() abstraction was removed by commit
c3ca015f ("dax: remove the pmem_dax_ops->flush abstraction").
Otherwise we could just treat each flush as a big write, and store the
data that is being synced to media.  It may be worthwhile to add the
dax_flush() entry point back, just as a notifier so we can do this
logging.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

98d82f48

dm log writes: add support for inline data buffers · e5a20660

Ross Zwisler authored Oct 19, 2017

Currently dm-log-writes supports writing filesystem data via BIOs, and
writing internal metadata from a flat buffer via write_metadata().

For DAX writes, though, we won't have a BIO, but will instead have an
iterator that we'll want to use to fill a flat data buffer.

So, create write_inline_data() which allows us to write filesystem data
using a flat buffer as a source, and wire it up in log_one_block().
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

e5a20660

dm cache: simplify get_per_bio_data() by removing data_size argument · 693b960e

Mike Snitzer authored Oct 19, 2017

There is only one per_bio_data size now that writethrough-specific data
was removed from the per_bio_data structure.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

693b960e

dm cache: remove all obsolete writethrough-specific code · 9958f1d9

Mike Snitzer authored Oct 19, 2017

Now that the writethrough code is much simpler there is no need to track
so much state or cascade bio submission (as was done, via
writethrough_endio(), to issue origin then cache IO in series).

As such the obsolete writethrough list and workqueue is also removed.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

9958f1d9

dm cache: submit writethrough writes in parallel to origin and cache · 2df3bae9

Mike Snitzer authored Oct 19, 2017

Discontinue issuing writethrough write IO in series to the origin and
then cache.

Use bio_clone_fast() to create a new origin clone bio that will be
mapped to the origin device and then bio_chain() it to the bio that gets
remapped to the cache device.  The origin clone bio does _not_ have a
copy of the per_bio_data -- as such check_if_tick_bio_needed() will not
be called.

The cache bio (parent bio) will not complete until the origin bio has
completed -- this fulfills bio_clone_fast()'s requirements as well as
the requirement to not complete the original IO until the write IO has
completed to both the origin and cache device.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

2df3bae9

dm cache: pass cache structure to mode functions · 8e3c3827

Mike Snitzer authored Oct 19, 2017

No functional changes, just a bit cleaner than passing cache_features
structure.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

8e3c3827

dm cache: fix race condition in the writeback mode overwrite_bio optimisation · d1260e2a

Joe Thornber authored Nov 10, 2017

When a DM cache in writeback mode moves data between the slow and fast
device it can often avoid a copy if the triggering bio either:

i) covers the whole block (no point copying if we're about to overwrite it)
ii) the migration is a promotion and the origin block is currently discarded

Prior to this fix there was a race with case (ii).  The discard status
was checked with a shared lock held (rather than exclusive).  This meant
another bio could run in parallel and write data to the origin, removing
the discard state.  After the promotion the parallel write would have
been lost.

With this fix the discard status is re-checked once the exclusive lock
has been aquired.  If the block is no longer discarded it falls back to
the slower full copy path.

Fixes: b29d4986 ("dm cache: significant rework to leverage dm-bio-prison-v2")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

d1260e2a

24 Oct, 2017 3 commits

dm cache: convert dm_cache_metadata.ref_count from atomic_t to refcount_t · 6bdd0796

Elena Reshetova authored Oct 20, 2017

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable dm_cache_metadata.ref_count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

6bdd0796

dm: convert table_device.count from atomic_t to refcount_t · b0b4d7c6

Elena Reshetova authored Oct 20, 2017

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable table_device.count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

b0b4d7c6

dm: convert dm_dev_internal.count from atomic_t to refcount_t · 2a0b4682

Elena Reshetova authored Oct 20, 2017

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable dm_dev_internal.count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

2a0b4682

23 Oct, 2017 4 commits

Linux 4.14-rc6 · bb176f67
Linus Torvalds authored Oct 23, 2017

bb176f67

Merge tag 'staging-4.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · dd9d064e

Linus Torvalds authored Oct 23, 2017

Pull staging and IIO fixes from Greg KH:
 "Here are a small number of patches to resolve some reported IIO and a
  staging driver problem. Nothing major here, full details are in the
  shortlog below.

  All have been in linux-next with no reported issues"

* tag 'staging-4.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: bcm2835-audio: Fix memory corruption
  iio: adc: at91-sama5d2_adc: fix probe error on missing trigger property
  iio: adc: dln2-adc: fix build error
  iio: dummy: events: Add missing break
  staging: iio: ade7759: fix signed extension bug on shift of a u8
  iio: pressure: zpa2326: Remove always-true check which confuses gcc
  iio: proximity: as3935: noise detection + threshold changes

dd9d064e

Merge tag 'char-misc-4.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 17e7637f

Linus Torvalds authored Oct 23, 2017

Pull char/misc driver fixes from Greg KH:
 "Here are four small fixes for 4.14-rc6.

  Three of them are binder driver fixes for reported issues, and the
  last one is a hyperv driver bugfix. Nothing major, but good fixes to
  get into 4.14-final.

  All of these have been in linux-next with no reported issues"

* tag 'char-misc-4.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  android: binder: Fix null ptr dereference in debug msg
  android: binder: Don't get mm from task
  vmbus: hvsock: add proper sync for vmbus_hvsock_device_unregister()
  binder: call poll_wait() unconditionally.

17e7637f

Merge tag 'usb-4.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 58059921

Linus Torvalds authored Oct 23, 2017

Pull USB/PHY fixes from Greg KH:
 "Here are a small number of USB and PHY driver fixes for 4.14-rc6

  There is the usual musb and xhci fixes in here, as well as some needed
  phy patches. Also is a nasty regression fix for usbfs that has started
  to hit a lot of people using virtual machines.

  All of these have been in linux-next with no reported problems"

* tag 'usb-4.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (26 commits)
  usb: hub: Allow reset retry for USB2 devices on connect bounce
  USB: core: fix out-of-bounds access bug in usb_get_bos_descriptor()
  MAINTAINERS: fix git tree url for musb module
  usb: quirks: add quirk for WORLDE MINI MIDI keyboard
  usb: musb: sunxi: Explicitly release USB PHY on exit
  usb: musb: Check for host-mode using is_host_active() on reset interrupt
  usb: musb: musb_cppi41: Configure the number of channels for DA8xx
  usb: musb: musb_cppi41: Fix cppi41_set_dma_mode() for DA8xx
  usb: musb: musb_cppi41: Fix the address of teardown and autoreq registers
  USB: musb: fix late external abort on suspend
  USB: musb: fix session-bit runtime-PM quirk
  usb: cdc_acm: Add quirk for Elatec TWN3
  USB: devio: Revert "USB: devio: Don't corrupt user memory"
  usb: xhci: Handle error condition in xhci_stop_device()
  usb: xhci: Reset halted endpoint if trb is noop
  xhci: Cleanup current_cmd in xhci_cleanup_command_queue()
  xhci: Identify USB 3.1 capable hosts by their port protocol capability
  USB: serial: metro-usb: add MS7820 device id
  phy: rockchip-typec: Check for errors from tcphy_phy_init()
  phy: rockchip-typec: Don't set the aux voltage swing to 400 mV
  ...

58059921

22 Oct, 2017 12 commits

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 02982f85

Linus Torvalds authored Oct 22, 2017

Pull input fix from Dmitry Torokhov:
 "A fix for a broken commit in the previous pull breaking automatic
  module loading of input handlers, such ad evdev"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: do not use property bits when generating module alias

02982f85

Input: do not use property bits when generating module alias · 09c3e01b

Dmitry Torokhov authored Oct 22, 2017

The commit 8724ecb0 ("Input: allow matching device IDs on property
bits") started using property bits when generating module aliases for input
handlers, but did not adjust the generation of MODALIAS attribute on input
device uevents, breaking automatic module loading. Given that no handler
currently uses property bits in their module tables, let's revert this part
of the commit for now.
Reported-by: Damien Wyart <damien.wyart@gmail.com>
Tested-by: Damien Wyart <damien.wyart@gmail.com>
Fixes: 8724ecb0 ("Input: allow matching device IDs on property bits")
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

09c3e01b

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 936fd005

Linus Torvalds authored Oct 22, 2017

Pull x86 fixes from Thomas Gleixner:
 "A couple of fixes addressing the following issues:

   - The last polishing for the TLB code, removing the last BUG_ON() and
     the debug file along with tidying up the lazy TLB code.

   - Prevent triple fault on 1st Gen. 486 caused by stupidly calling the
     early IDT setup after the first function which causes a fault which
     should be caught by the exception table.

   - Limit the mmap of /dev/mem to valid addresses

   - Prevent late microcode loading on Broadwell X

   - Remove a redundant assignment in the cache info code"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Limit mmap() of /dev/mem to valid physical addresses
  x86/mm: Remove debug/x86/tlb_defer_switch_to_init_mm
  x86/mm: Tidy up "x86/mm: Flush more aggressively in lazy TLB mode"
  x86/mm/64: Remove the last VM_BUG_ON() from the TLB code
  x86/microcode/intel: Disable late loading on model 79
  x86/idt: Initialize early IDT before cr4_init_shadow()
  x86/cpu/intel_cacheinfo: Remove redundant assignment to 'this_leaf'

936fd005

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9e415a8e

Linus Torvalds authored Oct 22, 2017

Pull timer fix from Thomas Gleixner:
 "A single fix to make the cs5535 clock event driver robust agaist
  spurious interrupts"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clockevents/drivers/cs5535: Improve resilience to spurious interrupts

9e415a8e

Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5670a847

Linus Torvalds authored Oct 22, 2017

Pull smp/hotplug fix from Thomas Gleixner:
 "The recent rework of the callback invocation missed to cleanup the
  leftovers of the operation, so under certain circumstances a
  subsequent CPU hotplug operation accesses stale data and crashes.
  Clean it up."

* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu/hotplug: Reset node state after operation

5670a847

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 085cf9bf

Linus Torvalds authored Oct 22, 2017

Pull perf fixes from Thomas Gleixner:
 "A series of fixes for perf tooling:

   - Make xyarray return the X/Y size correctly which fixes a crash in
     the exit code.

   - Fix the libc path in test so it works not only on Debian/Ubuntu
     correctly

   - Check for eBPF file existance and output a useful error message
     instead of failing to compile a non existant file

   - Make sure perf_hpp_fmt is not longer references before freeing it

   - Use list_del_init() in the histogram code to prevent a crash when
     the already deleted element is deleted again

   - Remove the leftovers of the removed '-l' option

   - Add reviewer entries to the MAINTAINERS file"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf test shell trace+probe_libc_inet_pton.sh: Be compatible with Debian/Ubuntu
  perf xyarray: Fix wrong processing when closing evsel fd
  perf buildid-list: Fix crash when processing PERF_RECORD_NAMESPACE
  perf record: Fix documentation for a inexistent option '-l'
  perf tools: Add long time reviewers to MAINTAINERS
  perf tools: Check wether the eBPF file exists in event parsing
  perf hists: Add extra integrity checks to fmt_free()
  perf hists: Fix crash in perf_hpp__reset_output_field()

085cf9bf

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4f184d7d

Linus Torvalds authored Oct 22, 2017

Pull irq fixes from Thomas Gleixner:
 "A set of small fixes mostly in the irq drivers area:

   - Make the tango irq chip work correctly, which requires a new
     function in the generiq irq chip implementation

   - A set of updates to the GIC-V3 ITS driver removing a bogus BUG_ON()
     and parsing the VCPU table size correctly"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: generic chip: remove irq_gc_mask_disable_reg_and_ack()
  irqchip/tango: Use irq_gc_mask_disable_and_ack_set
  genirq: generic chip: Add irq_gc_mask_disable_and_ack_set()
  irqchip/gic-v3-its: Add missing changes to support 52bit physical address
  irqchip/gic-v3-its: Fix the incorrect parsing of VCPU table size
  irqchip/gic-v3-its: Fix the incorrect BUG_ON in its_init_vpe_domain()
  DT: arm,gic-v3: Update the ITS size in the examples

4f184d7d

Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b8d389e8

Linus Torvalds authored Oct 22, 2017

Pull objtool fix from Thomas Gleixner:
 "Plug a memory leak in the instruction decoder"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix memory leak in decode_instructions()

b8d389e8

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · b5ac3beb

Linus Torvalds authored Oct 21, 2017

Pull networking fixes from David Miller:
 "A little more than usual this time around. Been travelling, so that is
  part of it.

  Anyways, here are the highlights:

   1) Deal with memcontrol races wrt. listener dismantle, from Eric
      Dumazet.

   2) Handle page allocation failures properly in nfp driver, from Jaku
      Kicinski.

   3) Fix memory leaks in macsec, from Sabrina Dubroca.

   4) Fix crashes in pppol2tp_session_ioctl(), from Guillaume Nault.

   5) Several fixes in bnxt_en driver, including preventing potential
      NVRAM parameter corruption from Michael Chan.

   6) Fix for KRACK attacks in wireless, from Johannes Berg.

   7) rtnetlink event generation fixes from Xin Long.

   8) Deadlock in mlxsw driver, from Ido Schimmel.

   9) Disallow arithmetic operations on context pointers in bpf, from
      Jakub Kicinski.

  10) Missing sock_owned_by_user() check in sctp_icmp_redirect(), from
      Xin Long.

  11) Only TCP is supported for sockmap, make that explicit with a
      check, from John Fastabend.

  12) Fix IP options state races in DCCP and TCP, from Eric Dumazet.

  13) Fix panic in packet_getsockopt(), also from Eric Dumazet.

  14) Add missing locked in hv_sock layer, from Dexuan Cui.

  15) Various aquantia bug fixes, including several statistics handling
      cures. From Igor Russkikh et al.

  16) Fix arithmetic overflow in devmap code, from John Fastabend.

  17) Fix busted socket memory accounting when we get a fault in the tcp
      zero copy paths. From Willem de Bruijn.

  18) Don't leave opt->tot_len uninitialized in ipv6, from Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
  stmmac: Don't access tx_q->dirty_tx before netif_tx_lock
  ipv6: flowlabel: do not leave opt->tot_len with garbage
  of_mdio: Fix broken PHY IRQ in case of probe deferral
  textsearch: fix typos in library helpers
  rxrpc: Don't release call mutex on error pointer
  net: stmmac: Prevent infinite loop in get_rx_timestamp_status()
  net: stmmac: Fix stmmac_get_rx_hwtstamp()
  net: stmmac: Add missing call to dev_kfree_skb()
  mlxsw: spectrum_router: Configure TIGCR on init
  mlxsw: reg: Add Tunneling IPinIP General Configuration Register
  net: ethtool: remove error check for legacy setting transceiver type
  soreuseport: fix initialization race
  net: bridge: fix returning of vlan range op errors
  sock: correct sk_wmem_queued accounting on efault in tcp zerocopy
  bpf: add test cases to bpf selftests to cover all access tests
  bpf: fix pattern matches for direct packet access
  bpf: fix off by one for range markings with L{T, E} patterns
  bpf: devmap fix arithmetic overflow in bitmap_size calculation
  net: aquantia: Bad udp rate on default interrupt coalescing
  net: aquantia: Enable coalescing management via ethtool interface
  ...

b5ac3beb

stmmac: Don't access tx_q->dirty_tx before netif_tx_lock · 8d5f4b07

Bernd Edlinger authored Oct 21, 2017

This is the possible reason for different hard to reproduce
problems on my ARMv7-SMP test system.

The symptoms are in recent kernels imprecise external aborts,
and in older kernels various kinds of network stalls and
unexpected page allocation failures.

My testing indicates that the trouble started between v4.5 and v4.6
and prevails up to v4.14.

Using the dirty_tx before acquiring the spin lock is clearly
wrong and was first introduced with v4.6.

Fixes: e3ad57c9 ("stmmac: review RX/TX ring management")
Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

8d5f4b07

ipv6: flowlabel: do not leave opt->tot_len with garbage · 864e2a1f

Eric Dumazet authored Oct 21, 2017

When syzkaller team brought us a C repro for the crash [1] that
had been reported many times in the past, I finally could find
the root cause.

If FlowLabel info is merged by fl6_merge_options(), we leave
part of the opt_space storage provided by udp/raw/l2tp with random value
in opt_space.tot_len, unless a control message was provided at sendmsg()
time.

Then ip6_setup_cork() would use this random value to perform a kzalloc()
call. Undefined behavior and crashes.

Fix is to properly set tot_len in fl6_merge_options()

At the same time, we can also avoid consuming memory and cpu cycles
to clear it, if every option is copied via a kmemdup(). This is the
change in ip6_setup_cork().

[1]
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 6613 Comm: syz-executor0 Not tainted 4.14.0-rc4+ #127
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801cb64a100 task.stack: ffff8801cc350000
RIP: 0010:ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168
RSP: 0018:ffff8801cc357550 EFLAGS: 00010203
RAX: dffffc0000000000 RBX: ffff8801cc357748 RCX: 0000000000000010
RDX: 0000000000000002 RSI: ffffffff842bd1d9 RDI: 0000000000000014
RBP: ffff8801cc357620 R08: ffff8801cb17f380 R09: ffff8801cc357b10
R10: ffff8801cb64a100 R11: 0000000000000000 R12: ffff8801cc357ab0
R13: ffff8801cc357b10 R14: 0000000000000000 R15: ffff8801c3bbf0c0
FS:  00007f9c5c459700(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020324000 CR3: 00000001d1cf2000 CR4: 00000000001406f0
DR0: 0000000020001010 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Call Trace:
 ip6_make_skb+0x282/0x530 net/ipv6/ip6_output.c:1729
 udpv6_sendmsg+0x2769/0x3380 net/ipv6/udp.c:1340
 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
 sock_sendmsg_nosec net/socket.c:633 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:643
 SYSC_sendto+0x358/0x5a0 net/socket.c:1750
 SyS_sendto+0x40/0x50 net/socket.c:1718
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x4520a9
RSP: 002b:00007f9c5c458c08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004520a9
RDX: 0000000000000001 RSI: 0000000020fd1000 RDI: 0000000000000016
RBP: 0000000000000086 R08: 0000000020e0afe4 R09: 000000000000001c
R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004bb1ee
R13: 00000000ffffffff R14: 0000000000000016 R15: 0000000000000029
Code: e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ea 0f 00 00 48 8d 79 04 48 b8 00 00 00 00 00 fc ff df 45 8b 74 24 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85
RIP: ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168 RSP: ffff8801cc357550
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

864e2a1f

of_mdio: Fix broken PHY IRQ in case of probe deferral · 66bdede4

Geert Uytterhoeven authored Oct 18, 2017

If an Ethernet PHY is initialized before the interrupt controller it is
connected to, a message like the following is printed:

    irq: no irq domain found for /interrupt-controller@e61c0000 !

However, the actual error is ignored, leading to a non-functional (POLL)
PHY interrupt later:

    Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY driver [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=POLL)

Depending on whether the PHY driver will fall back to polling, Ethernet
may or may not work.

To fix this:
  1. Switch of_mdiobus_register_phy() from irq_of_parse_and_map() to
     of_irq_get().
     Unlike the former, the latter returns -EPROBE_DEFER if the
     interrupt controller is not yet available, so this condition can be
     detected.
     Other errors are handled the same as before, i.e. use the passed
     mdio->irq[addr] as interrupt.
  2. Propagate and handle errors from of_mdiobus_register_phy() and
     of_mdiobus_register_device().
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

66bdede4