1. 20 May, 2017 40 commits
    • Martin Brandenburg's avatar
      orangefs: do not check possibly stale size on truncate · 8bd83223
      Martin Brandenburg authored
      commit 53950ef5 upstream.
      
      Let the server figure this out because our size might be out of date or
      not present.
      
      The bug was that
      
      	xfs_io -f -t -c "pread -v 0 100" /mnt/foo
      	echo "Test" > /mnt/foo
      	xfs_io -f -t -c "pread -v 0 100" /mnt/foo
      
      fails because the second truncate did not happen if nothing had
      requested the size after the write in echo.  Thus i_size was zero (not
      present) and the orangefs_setattr though i_size was zero and there was
      nothing to do.
      Signed-off-by: default avatarMartin Brandenburg <martin@omnibond.com>
      Signed-off-by: default avatarMike Marshall <hubcap@omnibond.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8bd83223
    • Martin Brandenburg's avatar
      orangefs: do not set getattr_time on orangefs_lookup · 66f1885b
      Martin Brandenburg authored
      commit 17930b25 upstream.
      
      Since orangefs_lookup calls orangefs_iget which calls
      orangefs_inode_getattr, getattr_time will get set.
      Signed-off-by: default avatarMartin Brandenburg <martin@omnibond.com>
      Signed-off-by: default avatarMike Marshall <hubcap@omnibond.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      66f1885b
    • Martin Brandenburg's avatar
      orangefs: clean up oversize xattr validation · a6d3c332
      Martin Brandenburg authored
      commit e675c5ec upstream.
      
      Also don't check flags as this has been validated by the VFS already.
      
      Fix an off-by-one error in the max size checking.
      
      Stop logging just because userspace wants to write attributes which do
      not fit.
      
      This and the previous commit fix xfstests generic/020.
      Signed-off-by: default avatarMartin Brandenburg <martin@omnibond.com>
      Signed-off-by: default avatarMike Marshall <hubcap@omnibond.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a6d3c332
    • Martin Brandenburg's avatar
      7adb1503
    • Eric Biggers's avatar
      ext4: evict inline data when writing to memory map · 1567c170
      Eric Biggers authored
      commit 7b4cc978 upstream.
      
      Currently the case of writing via mmap to a file with inline data is not
      handled.  This is maybe a rare case since it requires a writable memory
      map of a very small file, but it is trivial to trigger with on
      inline_data filesystem, and it causes the
      'BUG_ON(ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA));' in
      ext4_writepages() to be hit:
      
          mkfs.ext4 -O inline_data /dev/vdb
          mount /dev/vdb /mnt
          xfs_io -f /mnt/file \
      	-c 'pwrite 0 1' \
      	-c 'mmap -w 0 1m' \
      	-c 'mwrite 0 1' \
      	-c 'fsync'
      
      	kernel BUG at fs/ext4/inode.c:2723!
      	invalid opcode: 0000 [#1] SMP
      	CPU: 1 PID: 2532 Comm: xfs_io Not tainted 4.11.0-rc1-xfstests-00301-g071d9acf3d1f #633
      	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
      	task: ffff88003d3a8040 task.stack: ffffc90000300000
      	RIP: 0010:ext4_writepages+0xc89/0xf8a
      	RSP: 0018:ffffc90000303ca0 EFLAGS: 00010283
      	RAX: 0000028410000000 RBX: ffff8800383fa3b0 RCX: ffffffff812afcdc
      	RDX: 00000a9d00000246 RSI: ffffffff81e660e0 RDI: 0000000000000246
      	RBP: ffffc90000303dc0 R08: 0000000000000002 R09: 869618e8f99b4fa5
      	R10: 00000000852287a2 R11: 00000000a03b49f4 R12: ffff88003808e698
      	R13: 0000000000000000 R14: 7fffffffffffffff R15: 7fffffffffffffff
      	FS:  00007fd3e53094c0(0000) GS:ffff88003e400000(0000) knlGS:0000000000000000
      	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      	CR2: 00007fd3e4c51000 CR3: 000000003d554000 CR4: 00000000003406e0
      	Call Trace:
      	 ? _raw_spin_unlock+0x27/0x2a
      	 ? kvm_clock_read+0x1e/0x20
      	 do_writepages+0x23/0x2c
      	 ? do_writepages+0x23/0x2c
      	 __filemap_fdatawrite_range+0x80/0x87
      	 filemap_write_and_wait_range+0x67/0x8c
      	 ext4_sync_file+0x20e/0x472
      	 vfs_fsync_range+0x8e/0x9f
      	 ? syscall_trace_enter+0x25b/0x2d0
      	 vfs_fsync+0x1c/0x1e
      	 do_fsync+0x31/0x4a
      	 SyS_fsync+0x10/0x14
      	 do_syscall_64+0x69/0x131
      	 entry_SYSCALL64_slow_path+0x25/0x25
      
      We could try to be smart and keep the inline data in this case, or at
      least support delayed allocation when allocating the block, but these
      solutions would be more complicated and don't seem worthwhile given how
      rare this case seems to be.  So just fix the bug by calling
      ext4_convert_inline_data() when we're asked to make a page writable, so
      that any inline data gets evicted, with the block allocated immediately.
      Reported-by: default avatarNick Alcock <nick.alcock@oracle.com>
      Reviewed-by: default avatarAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1567c170
    • Jan Kara's avatar
      jbd2: fix dbench4 performance regression for 'nobarrier' mounts · 7621cb79
      Jan Kara authored
      commit 5052b069 upstream.
      
      Commit b685d3d6 "block: treat REQ_FUA and REQ_PREFLUSH as
      synchronous" removed REQ_SYNC flag from WRITE_FUA implementation. Since
      JBD2 strips REQ_FUA and REQ_FLUSH flags from submitted IO when the
      filesystem is mounted with nobarrier mount option, journal superblock
      writes ended up being async writes after this patch and that caused
      heavy performance regression for dbench4 benchmark with high number of
      processes. In my test setup with HP RAID array with non-volatile write
      cache and 32 GB ram, dbench4 runs with 8 processes regressed by ~25%.
      
      Fix the problem by making sure journal superblock writes are always
      treated as synchronous since they generally block progress of the
      journalling machinery and thus the whole filesystem.
      
      Fixes: b685d3d6Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7621cb79
    • Christian Borntraeger's avatar
      perf annotate s390: Implement jump types for perf annotate · 7a17cbb4
      Christian Borntraeger authored
      commit d9f8dfa9 upstream.
      
      Implement simple detection for all kind of jumps and branches.
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Cc: Andreas Krebbel <krebbel@linux.vnet.ibm.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-s390 <linux-s390@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1491465112-45819-3-git-send-email-borntraeger@de.ibm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7a17cbb4
    • Christian Borntraeger's avatar
      perf annotate s390: Fix perf annotate error -95 (4.10 regression) · b09eace7
      Christian Borntraeger authored
      commit e77852b3 upstream.
      
      since 4.10 perf annotate exits on s390 with an "unknown error -95".
      Turns out that commit 786c1b51 ("perf annotate: Start supporting
      cross arch annotation") added a hard requirement for architecture
      support when objdump is used but only provided x86 and arm support.
      Meanwhile power was added so lets add s390 as well.
      
      While at it make sure to implement the branch and jump types.
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Cc: Andreas Krebbel <krebbel@linux.vnet.ibm.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-s390 <linux-s390@vger.kernel.org>
      Fixes: 786c1b51 "perf annotate: Start supporting cross arch annotation"
      Link: http://lkml.kernel.org/r/1491465112-45819-2-git-send-email-borntraeger@de.ibm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b09eace7
    • Adrian Hunter's avatar
      perf auxtrace: Fix no_size logic in addr_filter__resolve_kernel_syms() · 08b0b5bd
      Adrian Hunter authored
      commit c3a0bbc7 upstream.
      
      Address filtering with kernel symbols incorrectly resulted in the error
      "Cannot determine size of symbol" because the no_size logic was the wrong
      way around.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Tested-by: default avatarAndi Kleen <ak@linux.intel.com>
      Link: http://lkml.kernel.org/r/1490357752-27942-1-git-send-email-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      08b0b5bd
    • Mike Marciniszyn's avatar
      IB/hfi1: Prevent kernel QP post send hard lockups · 2d584522
      Mike Marciniszyn authored
      commit b6eac931 upstream.
      
      The driver progress routines can call cond_resched() when
      a timeslice is exhausted and irqs are enabled.
      
      If the ULP had been holding a spin lock without disabling irqs and
      the post send directly called the progress routine, the cond_resched()
      could yield allowing another thread from the same ULP to deadlock
      on that same lock.
      
      Correct by replacing the current hfi1_do_send() calldown with a unique
      one for post send and adding an argument to hfi1_do_send() to indicate
      that the send engine is running in a thread.   If the routine is not
      running in a thread, avoid calling cond_resched().
      
      Fixes: Commit 831464ce ("IB/hfi1: Don't call cond_resched in atomic mode when sending packets")
      Reviewed-by: default avatarDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: default avatarDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2d584522
    • Jack Morgenstein's avatar
      IB/mlx4: Reduce SRIOV multicast cleanup warning message to debug level · 7a48c89a
      Jack Morgenstein authored
      commit fb7a9174 upstream.
      
      A warning message during SRIOV multicast cleanup should have actually been
      a debug level message. The condition generating the warning does no harm
      and can fill the message log.
      
      In some cases, during testing, some tests were so intense as to swamp the
      message log with these warning messages, causing a stall in the console
      message log output task. This stall caused an NMI to be sent to all CPUs
      (so that they all dumped their stacks into the message log).
      Aside from the message flood causing an NMI, the tests all passed.
      
      Once the message flood which caused the NMI is removed (by reducing the
      warning message to debug level), the NMI no longer occurs.
      
      Sample message log (console log) output illustrating the flood and
      resultant NMI (snippets with comments and modified with ... instead
      of hex digits, to satisfy checkpatch.pl):
      
       <mlx4_ib> _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!...
       *** About 4000 almost identical lines in less than one second ***
       <mlx4_ib> _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!...
       INFO: rcu_sched detected stalls on CPUs/tasks: { 17} (...)
       *** { 17} above indicates that CPU 17 was the one that stalled ***
       sending NMI to all CPUs:
       ...
       NMI backtrace for cpu 17
       CPU: 17 PID: 45909 Comm: kworker/17:2
       Hardware name: HP ProLiant DL360p Gen8, BIOS P71 09/08/2013
       Workqueue: events fb_flashcursor
       task: ffff880478...... ti: ffff88064e...... task.ti: ffff88064e......
       RIP: 0010:[ffffffff81......]  [ffffffff81......] io_serial_in+0x15/0x20
       RSP: 0018:ffff88064e257cb0  EFLAGS: 00000002
       RAX: 0000000000...... RBX: ffffffff81...... RCX: 0000000000......
       RDX: 0000000000...... RSI: 0000000000...... RDI: ffffffff81......
       RBP: ffff88064e...... R08: ffffffff81...... R09: 0000000000......
       R10: 0000000000...... R11: ffff88064e...... R12: 0000000000......
       R13: 0000000000...... R14: ffffffff81...... R15: 0000000000......
       FS:  0000000000......(0000) GS:ffff8804af......(0000) knlGS:000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080......
       CR2: 00007f2a2f...... CR3: 0000000001...... CR4: 0000000000......
       DR0: 0000000000...... DR1: 0000000000...... DR2: 0000000000......
       DR3: 0000000000...... DR6: 00000000ff...... DR7: 0000000000......
       Stack:
       ffff88064e...... ffffffff81...... ffffffff81...... 0000000000......
       ffffffff81...... ffff88064e...... ffffffff81...... ffffffff81......
       ffffffff81...... ffff88064e...... ffffffff81...... 0000000000......
       Call Trace:
      [<ffffffff813d099b>] wait_for_xmitr+0x3b/0xa0
      [<ffffffff813d0b5c>] serial8250_console_putchar+0x1c/0x30
      [<ffffffff813d0b40>] ? serial8250_console_write+0x140/0x140
      [<ffffffff813cb5fa>] uart_console_write+0x3a/0x80
      [<ffffffff813d0aae>] serial8250_console_write+0xae/0x140
      [<ffffffff8107c4d1>] call_console_drivers.constprop.15+0x91/0xf0
      [<ffffffff8107d6cf>] console_unlock+0x3bf/0x400
      [<ffffffff813503cd>] fb_flashcursor+0x5d/0x140
      [<ffffffff81355c30>] ? bit_clear+0x120/0x120
      [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
      [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
      [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
      [<ffffffff810a5aef>] kthread+0xcf/0xe0
      [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
      [<ffffffff81645858>] ret_from_fork+0x58/0x90
      [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
      Code: 48 89 e5 d3 e6 48 63 f6 48 03 77 10 8b 06 5d c3 66 0f 1f 44 00 00 66 66 66 6
      
      As indicated in the stack trace above, the console output task got swamped.
      
      Fixes: b9c5d6a6 ("IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV")
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarLeon Romanovsky <leon@kernel.org>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7a48c89a
    • Jack Morgenstein's avatar
      IB/mlx4: Fix ib device initialization error flow · 295a7aab
      Jack Morgenstein authored
      commit 99e68909 upstream.
      
      In mlx4_ib_add, procedure mlx4_ib_alloc_eqs is called to allocate EQs.
      
      However, in the mlx4_ib_add error flow, procedure mlx4_ib_free_eqs is not
      called to free the allocated EQs.
      
      Fixes: e605b743 ("IB/mlx4: Increase the number of vectors (EQs) available for ULPs")
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarLeon Romanovsky <leon@kernel.org>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      295a7aab
    • Shamir Rabinovitch's avatar
      IB/IPoIB: ibX: failed to create mcg debug file · f98af463
      Shamir Rabinovitch authored
      commit 771a5258 upstream.
      
      When udev renames the netdev devices, ipoib debugfs entries does not
      get renamed. As a result, if subsequent probe of ipoib device reuse the
      name then creating a debugfs entry for the new device would fail.
      
      Also, moved ipoib_create_debug_files and ipoib_delete_debug_files as part
      of ipoib event handling in order to avoid any race condition between these.
      
      Fixes: 1732b0ef ([IPoIB] add path record information in debugfs)
      Signed-off-by: default avatarVijay Kumar <vijay.ac.kumar@oracle.com>
      Signed-off-by: default avatarShamir Rabinovitch <shamir.rabinovitch@oracle.com>
      Reviewed-by: default avatarMark Bloch <markb@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f98af463
    • Michael J. Ruhl's avatar
      IB/core: For multicast functions, verify that LIDs are multicast LIDs · 88ded1f1
      Michael J. Ruhl authored
      commit 8561eae6 upstream.
      
      The Infiniband spec defines "A multicast address is defined by a
      MGID and a MLID" (section 10.5).  Currently the MLID value is not
      validated.
      
      Add check to verify that the MLID value is in the correct address
      range.
      
      Fixes: 0c33aeed ("[IB] Add checks to multicast attach and detach")
      Reviewed-by: default avatarIra Weiny <ira.weiny@intel.com>
      Reviewed-by: default avatarDasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
      Signed-off-by: default avatarMichael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: default avatarDennis Dalessandro <dennis.dalessandro@intel.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      88ded1f1
    • Parav Pandit's avatar
      IB/core: Fix kernel crash during fail to initialize device · 82f0fecc
      Parav Pandit authored
      commit 4be3a4fa upstream.
      
      This patch fixes the kernel crash that occurs during ib_dealloc_device()
      called due to provider driver fails with an error after
      ib_alloc_device() and before it can register using ib_register_device().
      
      This crashed seen in tha lab as below which can occur with any IB device
      which fails to perform its device initialization before invoking
      ib_register_device().
      
      This patch avoids touching cache and port immutable structures if device
      is not yet initialized.
      It also releases related memory when cache and port immutable data
      structure initialization fails during register_device() state.
      
      [81416.561946] BUG: unable to handle kernel NULL pointer dereference at (null)
      [81416.570340] IP: ib_cache_release_one+0x29/0x80 [ib_core]
      [81416.576222] PGD 78da66067
      [81416.576223] PUD 7f2d7c067
      [81416.579484] PMD 0
      [81416.582720]
      [81416.587242] Oops: 0000 [#1] SMP
      [81416.722395] task: ffff8807887515c0 task.stack: ffffc900062c0000
      [81416.729148] RIP: 0010:ib_cache_release_one+0x29/0x80 [ib_core]
      [81416.735793] RSP: 0018:ffffc900062c3a90 EFLAGS: 00010202
      [81416.741823] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
      [81416.749785] RDX: 0000000000000000 RSI: 0000000000000282 RDI: ffff880859fec000
      [81416.757757] RBP: ffffc900062c3aa0 R08: ffff8808536e5ac0 R09: ffff880859fec5b0
      [81416.765708] R10: 00000000536e5c01 R11: ffff8808536e5ac0 R12: ffff880859fec000
      [81416.773672] R13: 0000000000000000 R14: ffff8808536e5ac0 R15: ffff88084ebc0060
      [81416.781621] FS:  00007fd879fab740(0000) GS:ffff88085fac0000(0000) knlGS:0000000000000000
      [81416.790522] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [81416.797094] CR2: 0000000000000000 CR3: 00000007eb215000 CR4: 00000000003406e0
      [81416.805051] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [81416.812997] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [81416.820950] Call Trace:
      [81416.824226]  ib_device_release+0x1e/0x40 [ib_core]
      [81416.829858]  device_release+0x32/0xa0
      [81416.834370]  kobject_cleanup+0x63/0x170
      [81416.839058]  kobject_put+0x25/0x50
      [81416.843319]  ib_dealloc_device+0x25/0x40 [ib_core]
      [81416.848986]  mlx5_ib_add+0x163/0x1990 [mlx5_ib]
      [81416.854414]  mlx5_add_device+0x5a/0x160 [mlx5_core]
      [81416.860191]  mlx5_register_interface+0x8d/0xc0 [mlx5_core]
      [81416.866587]  ? 0xffffffffa09e9000
      [81416.870816]  mlx5_ib_init+0x15/0x17 [mlx5_ib]
      [81416.876094]  do_one_initcall+0x51/0x1b0
      [81416.880861]  ? __vunmap+0x85/0xd0
      [81416.885113]  ? kmem_cache_alloc_trace+0x14b/0x1b0
      [81416.890768]  ? vfree+0x2e/0x70
      [81416.894762]  do_init_module+0x60/0x1fa
      [81416.899441]  load_module+0x15f6/0x1af0
      [81416.904114]  ? __symbol_put+0x60/0x60
      [81416.908709]  ? ima_post_read_file+0x3d/0x80
      [81416.913828]  ? security_kernel_post_read_file+0x6b/0x80
      [81416.920006]  SYSC_finit_module+0xa6/0xf0
      [81416.924888]  SyS_finit_module+0xe/0x10
      [81416.929568]  entry_SYSCALL_64_fastpath+0x1a/0xa9
      [81416.935089] RIP: 0033:0x7fd879494949
      [81416.939543] RSP: 002b:00007ffdbc1b4e58 EFLAGS: 00000202 ORIG_RAX: 0000000000000139
      [81416.947982] RAX: ffffffffffffffda RBX: 0000000001b66f00 RCX: 00007fd879494949
      [81416.955965] RDX: 0000000000000000 RSI: 000000000041a13c RDI: 0000000000000003
      [81416.963926] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000001b652a0
      [81416.971861] R10: 0000000000000003 R11: 0000000000000202 R12: 00007ffdbc1b3e70
      [81416.979763] R13: 00007ffdbc1b3e50 R14: 0000000000000005 R15: 0000000000000000
      [81417.008005] RIP: ib_cache_release_one+0x29/0x80 [ib_core] RSP: ffffc900062c3a90
      [81417.016045] CR2: 0000000000000000
      
      Fixes: 55aeed06 ("IB/core: Make ib_alloc_device init the kobject")
      Fixes: 7738613e ("IB/core: Add per port immutable struct to ib_device")
      Reviewed-by: default avatarDaniel Jurgens <danielj@mellanox.com>
      Signed-off-by: default avatarParav Pandit <parav@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leon@kernel.org>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      82f0fecc
    • Jack Morgenstein's avatar
      IB/core: Fix sysfs registration error flow · 5c7f1dfa
      Jack Morgenstein authored
      commit b312be3d upstream.
      
      The kernel commit cited below restructured ib device management
      so that the device kobject is initialized in ib_alloc_device.
      
      As part of the restructuring, the kobject is now initialized in
      procedure ib_alloc_device, and is later added to the device hierarchy
      in the ib_register_device call stack, in procedure
      ib_device_register_sysfs (which calls device_add).
      
      However, in the ib_device_register_sysfs error flow, if an error
      occurs following the call to device_add, the cleanup procedure
      device_unregister is called. This call results in the device object
      being deleted -- which results in various use-after-free crashes.
      
      The correct cleanup call is device_del -- which undoes device_add
      without deleting the device object.
      
      The device object will then (correctly) be deleted in the
      ib_register_device caller's error cleanup flow, when the caller invokes
      ib_dealloc_device.
      
      Fixes: 55aeed06 ("IB/core: Make ib_alloc_device init the kobject")
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarLeon Romanovsky <leon@kernel.org>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5c7f1dfa
    • Ding Tianhong's avatar
      iov_iter: don't revert iov buffer if csum error · 09944c26
      Ding Tianhong authored
      commit a6a59932 upstream.
      
      The patch 32786821 (make skb_copy_datagram_msg() et.al. preserve
      ->msg_iter on error) will revert the iov buffer if copy to iter
      failed, but it didn't copy any datagram if the skb_checksum_complete
      error, so no need to revert any data at this place.
      
      v2: Sabrina notice that return -EFAULT when checksum error is not correct
          here, it would confuse the caller about the return value, so fix it.
      
      Fixes: 32786821 ("make skb_copy_datagram_msg() et.al. preserve->msg_iter on error")
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Acked-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarWei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      09944c26
    • Alex Williamson's avatar
      vfio/type1: Remove locked page accounting workqueue · 1e0b0a9e
      Alex Williamson authored
      commit 0cfef2b7 upstream.
      
      If the mmap_sem is contented then the vfio type1 IOMMU backend will
      defer locked page accounting updates to a workqueue task.  This has a
      few problems and depending on which side the user tries to play, they
      might be over-penalized for unmaps that haven't yet been accounted or
      race the workqueue to enter more mappings than they're allowed.  The
      original intent of this workqueue mechanism seems to be focused on
      reducing latency through the ioctl, but we cannot do so at the cost
      of correctness.  Remove this workqueue mechanism and update the
      callers to allow for failure.  We can also now recheck the limit under
      write lock to make sure we don't exceed it.
      
      vfio_pin_pages_remote() also now necessarily includes an unwind path
      which we can jump to directly if the consecutive page pinning finds
      that we're exceeding the user's memory limits.  This avoids the
      current lazy approach which does accounting and mapping up to the
      fault, only to return an error on the next iteration to unwind the
      entire vfio_dma.
      Reviewed-by: default avatarPeter Xu <peterx@redhat.com>
      Reviewed-by: default avatarKirti Wankhede <kwankhede@nvidia.com>
      Signed-off-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1e0b0a9e
    • Dennis Yang's avatar
      dm thin: fix a memory leak when passing discard bio down · 34224e0e
      Dennis Yang authored
      commit 948f581a upstream.
      
      dm-thin does not free the discard_parent bio after all chained sub
      bios finished. The following kmemleak report could be observed after
      pool with discard_passdown option processes discard bios in
      linux v4.11-rc7. To fix this, we drop the discard_parent bio reference
      when its endio (passdown_endio) called.
      
      unreferenced object 0xffff8803d6b29700 (size 256):
        comm "kworker/u8:0", pid 30349, jiffies 4379504020 (age 143002.776s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          01 00 00 00 00 00 00 f0 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81a5efd9>] kmemleak_alloc+0x49/0xa0
          [<ffffffff8114ec34>] kmem_cache_alloc+0xb4/0x100
          [<ffffffff8110eec0>] mempool_alloc_slab+0x10/0x20
          [<ffffffff8110efa5>] mempool_alloc+0x55/0x150
          [<ffffffff81374939>] bio_alloc_bioset+0xb9/0x260
          [<ffffffffa018fd20>] process_prepared_discard_passdown_pt1+0x40/0x1c0 [dm_thin_pool]
          [<ffffffffa018b409>] break_up_discard_bio+0x1a9/0x200 [dm_thin_pool]
          [<ffffffffa018b484>] process_discard_cell_passdown+0x24/0x40 [dm_thin_pool]
          [<ffffffffa018b24d>] process_discard_bio+0xdd/0xf0 [dm_thin_pool]
          [<ffffffffa018ecf6>] do_worker+0xa76/0xd50 [dm_thin_pool]
          [<ffffffff81086239>] process_one_work+0x139/0x370
          [<ffffffff810867b1>] worker_thread+0x61/0x450
          [<ffffffff8108b316>] kthread+0xd6/0xf0
          [<ffffffff81a6cd1f>] ret_from_fork+0x3f/0x70
          [<ffffffffffffffff>] 0xffffffffffffffff
      Signed-off-by: default avatarDennis Yang <dennisyang@qnap.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      34224e0e
    • Bart Van Assche's avatar
      dm rq: check blk_mq_register_dev() return value in dm_mq_init_request_queue() · 899750e7
      Bart Van Assche authored
      commit 23a60124 upstream.
      
      Otherwise the request-based DM blk-mq request_queue will be put into
      service without being properly exported via sysfs.
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      899750e7
    • Somasundaram Krishnasamy's avatar
      dm era: save spacemap metadata root after the pre-commit · ee68aa11
      Somasundaram Krishnasamy authored
      commit 117aceb0 upstream.
      
      When committing era metadata to disk, it doesn't always save the latest
      spacemap metadata root in superblock. Due to this, metadata is getting
      corrupted sometimes when reopening the device. The correct order of update
      should be, pre-commit (shadows spacemap root), save the spacemap root
      (newly shadowed block) to in-core superblock and then the final commit.
      Signed-off-by: default avatarSomasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ee68aa11
    • Ondrej Kozina's avatar
      dm crypt: rewrite (wipe) key in crypto layer using random data · f38faf56
      Ondrej Kozina authored
      commit c82feeec upstream.
      
      The message "key wipe" used to wipe real key stored in crypto layer by
      rewriting it with zeroes.  Since commit 28856a9e ("crypto: xts -
      consolidate sanity check for keys") this no longer works in FIPS mode
      for XTS.
      
      While running in FIPS mode the crypto key part has to differ from the
      tweak key.
      
      Fixes: 28856a9e ("crypto: xts - consolidate sanity check for keys")
      Signed-off-by: default avatarOndrej Kozina <okozina@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f38faf56
    • Gary R Hook's avatar
      crypto: ccp - Change ISR handler method for a v5 CCP · 6ab60d2b
      Gary R Hook authored
      commit 6263b51e upstream.
      
      The CCP has the ability to perform several operations simultaneously,
      but only one interrupt.  When implemented as a PCI device and using
      MSI-X/MSI interrupts, use a tasklet model to service interrupts. By
      disabling and enabling interrupts from the CCP, coupled with the
      queuing that tasklets provide, we can ensure that all events
      (occurring on the device) are recognized and serviced.
      
      This change fixes a problem wherein 2 or more busy queues can cause
      notification bits to change state while a (CCP) interrupt is being
      serviced, but after the queue state has been evaluated. This results
      in the event being 'lost' and the queue hanging, waiting to be
      serviced. Since the status bits are never fully de-asserted, the
      CCP never generates another interrupt (all bits zero -> one or more
      bits one), and no further CCP operations will be executed.
      Signed-off-by: default avatarGary R Hook <gary.hook@amd.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6ab60d2b
    • Gary R Hook's avatar
      crypto: ccp - Change ISR handler method for a v3 CCP · 747cbf47
      Gary R Hook authored
      commit 7b537b24 upstream.
      
      The CCP has the ability to perform several operations simultaneously,
      but only one interrupt.  When implemented as a PCI device and using
      MSI-X/MSI interrupts, use a tasklet model to service interrupts. By
      disabling and enabling interrupts from the CCP, coupled with the
      queuing that tasklets provide, we can ensure that all events
      (occurring on the device) are recognized and serviced.
      
      This change fixes a problem wherein 2 or more busy queues can cause
      notification bits to change state while a (CCP) interrupt is being
      serviced, but after the queue state has been evaluated. This results
      in the event being 'lost' and the queue hanging, waiting to be
      serviced. Since the status bits are never fully de-asserted, the
      CCP never generates another interrupt (all bits zero -> one or more
      bits one), and no further CCP operations will be executed.
      Signed-off-by: default avatarGary R Hook <gary.hook@amd.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      747cbf47
    • Gary R Hook's avatar
      crypto: ccp - Disable interrupts early on unload · 511b79fb
      Gary R Hook authored
      commit 116591fe upstream.
      
      Ensure that we disable interrupts first when shutting down
      the driver.
      Signed-off-by: default avatarGary R Hook <ghook@amd.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      511b79fb
    • Gary R Hook's avatar
      crypto: ccp - Use only the relevant interrupt bits · c133daf7
      Gary R Hook authored
      commit 56467cb1 upstream.
      
      Each CCP queue can product interrupts for 4 conditions:
      operation complete, queue empty, error, and queue stopped.
      This driver only works with completion and error events.
      Signed-off-by: default avatarGary R Hook <gary.hook@amd.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c133daf7
    • Stephan Mueller's avatar
      crypto: algif_aead - Require setkey before accept(2) · b576fed7
      Stephan Mueller authored
      commit 2a2a251f upstream.
      
      Some cipher implementations will crash if you try to use them
      without calling setkey first.  This patch adds a check so that
      the accept(2) call will fail with -ENOKEY if setkey hasn't been
      done on the socket yet.
      
      Fixes: 400c40cf ("crypto: algif - add AEAD support")
      Signed-off-by: default avatarStephan Mueller <smueller@chronox.de>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b576fed7
    • Krzysztof Kozlowski's avatar
      crypto: s5p-sss - Close possible race for completed requests · 72a03cf8
      Krzysztof Kozlowski authored
      commit 42d5c176 upstream.
      
      Driver is capable of handling only one request at a time and it stores
      it in its state container struct s5p_aes_dev.  This stored request must be
      protected between concurrent invocations (e.g. completing current
      request and scheduling new one).  Combination of lock and "busy" field
      is used for that purpose.
      
      When "busy" field is true, the driver will not accept new request thus
      it will not overwrite currently handled data.
      
      However commit 28b62b14 ("crypto: s5p-sss - Fix spinlock recursion
      on LRW(AES)") moved some of the write to "busy" field out of a lock
      protected critical section.  This might lead to potential race between
      completing current request and scheduling a new one.  Effectively the
      request completion might try to operate on new crypto request.
      
      Fixes: 28b62b14 ("crypto: s5p-sss - Fix spinlock recursion on LRW(AES)")
      Signed-off-by: default avatarKrzysztof Kozlowski <krzk@kernel.org>
      Reviewed-by: default avatarBartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72a03cf8
    • Mike Snitzer's avatar
      block: fix blk_integrity_register to use template's interval_exp if not 0 · d9ae27b6
      Mike Snitzer authored
      commit 2859323e upstream.
      
      When registering an integrity profile: if the template's interval_exp is
      not 0 use it, otherwise use the ilog2() of logical block size of the
      provided gendisk.
      
      This fixes a long-standing DM linear target bug where it cannot pass
      integrity data to the underlying device if its logical block size
      conflicts with the underlying device's logical block size.
      Reported-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d9ae27b6
    • Marc Zyngier's avatar
      arm64: KVM: Fix decoding of Rt/Rt2 when trapping AArch32 CP accesses · d9783032
      Marc Zyngier authored
      commit c667186f upstream.
      
      Our 32bit CP14/15 handling inherited some of the ARMv7 code for handling
      the trapped system registers, completely missing the fact that the
      fields for Rt and Rt2 are now 5 bit wide, and not 4...
      
      Let's fix it, and provide an accessor for the most common Rt case.
      Reviewed-by: default avatarChristoffer Dall <cdall@linaro.org>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarChristoffer Dall <cdall@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d9783032
    • Andrew Jones's avatar
      KVM: arm/arm64: fix races in kvm_psci_vcpu_on · 95121cc9
      Andrew Jones authored
      commit 6c7a5dce upstream.
      
      Fix potential races in kvm_psci_vcpu_on() by taking the kvm->lock
      mutex.  In general, it's a bad idea to allow more than one PSCI_CPU_ON
      to process the same target VCPU at the same time.  One such problem
      that may arise is that one PSCI_CPU_ON could be resetting the target
      vcpu, which fills the entire sys_regs array with a temporary value
      including the MPIDR register, while another looks up the VCPU based
      on the MPIDR value, resulting in no target VCPU found.  Resolves both
      races found with the kvm-unit-tests/arm/psci unit test.
      Reviewed-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: default avatarChristoffer Dall <cdall@linaro.org>
      Reported-by: default avatarLevente Kurusa <lkurusa@redhat.com>
      Suggested-by: default avatarChristoffer Dall <cdall@linaro.org>
      Signed-off-by: default avatarAndrew Jones <drjones@redhat.com>
      Signed-off-by: default avatarChristoffer Dall <cdall@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      95121cc9
    • Paolo Bonzini's avatar
      Revert "KVM: Support vCPU-based gfn->hva cache" · d1daa545
      Paolo Bonzini authored
      commit 4e335d9e upstream.
      
      This reverts commit bbd64115.
      
      I've been sitting on this revert for too long and it unfortunately
      missed 4.11.  It's also the reason why I haven't merged ring-based
      dirty tracking for 4.12.
      
      Using kvm_vcpu_memslots in kvm_gfn_to_hva_cache_init and
      kvm_vcpu_write_guest_offset_cached means that the MSR value can
      now be used to access SMRAM, simply by making it point to an SMRAM
      physical address.  This is problematic because it lets the guest
      OS overwrite memory that it shouldn't be able to touch.
      
      Fixes: bbd64115Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d1daa545
    • David Hildenbrand's avatar
      KVM: x86: fix user triggerable warning in kvm_apic_accept_events() · 2d271c95
      David Hildenbrand authored
      commit 28bf2888 upstream.
      
      If we already entered/are about to enter SMM, don't allow switching to
      INIT/SIPI_RECEIVED, otherwise the next call to kvm_apic_accept_events()
      will report a warning.
      
      Same applies if we are already in MP state INIT_RECEIVED and SMM is
      requested to be turned on. Refuse to set the VCPU events in this case.
      
      Fixes: cd7764fe ("KVM: x86: latch INITs while in system management mode")
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2d271c95
    • Vince Weaver's avatar
      perf/x86: Fix Broadwell-EP DRAM RAPL events · d85d4c87
      Vince Weaver authored
      commit 33b88e70 upstream.
      
      It appears as though the Broadwell-EP DRAM units share the special
      units quirk with Haswell-EP/KNL.
      
      Without this patch, you get really high results (a single DRAM using 20W
      of power).
      
      The powercap driver in drivers/powercap/intel_rapl.c already has this
      change.
      Signed-off-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d85d4c87
    • Richard Weinberger's avatar
      um: Fix PTRACE_POKEUSER on x86_64 · 9e748196
      Richard Weinberger authored
      commit 9abc74a2 upstream.
      
      This is broken since ever but sadly nobody noticed.
      Recent versions of GDB set DR_CONTROL unconditionally and
      UML dies due to a heap corruption. It turns out that
      the PTRACE_POKEUSER was copy&pasted from i386 and assumes
      that addresses are 4 bytes long.
      
      Fix that by using 8 as address size in the calculation.
      Reported-by: default avatarjie cao <cj3054@gmail.com>
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9e748196
    • Ben Hutchings's avatar
      x86, pmem: Fix cache flushing for iovec write < 8 bytes · f25c69bd
      Ben Hutchings authored
      commit 8376efd3 upstream.
      
      Commit 11e63f6d added cache flushing for unaligned writes from an
      iovec, covering the first and last cache line of a >= 8 byte write and
      the first cache line of a < 8 byte write.  But an unaligned write of
      2-7 bytes can still cover two cache lines, so make sure we flush both
      in that case.
      
      Fixes: 11e63f6d ("x86, pmem: fix broken __copy_user_nocache ...")
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f25c69bd
    • Andy Lutomirski's avatar
      selftests/x86/ldt_gdt_32: Work around a glibc sigaction() bug · 20fd61db
      Andy Lutomirski authored
      commit 65973dd3 upstream.
      
      i386 glibc is buggy and calls the sigaction syscall incorrectly.
      
      This is asymptomatic for normal programs, but it blows up on
      programs that do evil things with segmentation.  The ldt_gdt
      self-test is an example of such an evil program.
      
      This doesn't appear to be a regression -- I think I just got lucky
      with the uninitialized memory that glibc threw at the kernel when I
      wrote the test.
      
      This hackish fix manually issues sigaction(2) syscalls to undo the
      damage.  Without the fix, ldt_gdt_32 segfaults; with the fix, it
      passes for me.
      
      See: https://sourceware.org/bugzilla/show_bug.cgi?id=21269Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/aaab0f9f93c9af25396f01232608c163a760a668.1490218061.git.luto@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      20fd61db
    • Ashish Kalra's avatar
      x86/boot: Fix BSS corruption/overwrite bug in early x86 kernel startup · 0dc0e26f
      Ashish Kalra authored
      commit d594aa02 upstream.
      
      The minimum size for a new stack (512 bytes) setup for arch/x86/boot components
      when the bootloader does not setup/provide a stack for the early boot components
      is not "enough".
      
      The setup code executing as part of early kernel startup code, uses the stack
      beyond 512 bytes and accidentally overwrites and corrupts part of the BSS
      section. This is exposed mostly in the early video setup code, where
      it was corrupting BSS variables like force_x, force_y, which in-turn affected
      kernel parameters such as screen_info (screen_info.orig_video_cols) and
      later caused an exception/panic in console_init().
      
      Most recent boot loaders setup the stack for early boot components, so this
      stack overwriting into BSS section issue has not been exposed.
      Signed-off-by: default avatarAshish Kalra <ashish@bluestacks.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170419152015.10011-1-ashishkalra@Ashishs-MacBook-Pro.localSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0dc0e26f
    • Guenter Roeck's avatar
      usb: hub: Do not attempt to autosuspend disconnected devices · 12001f3a
      Guenter Roeck authored
      commit f5cccf49 upstream.
      
      While running a bind/unbind stress test with the dwc3 usb driver on rk3399,
      the following crash was observed.
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000218
      pgd = ffffffc00165f000
      [00000218] *pgd=000000000174f003, *pud=000000000174f003,
      				*pmd=0000000001750003, *pte=00e8000001751713
      Internal error: Oops: 96000005 [#1] PREEMPT SMP
      Modules linked in: uinput uvcvideo videobuf2_vmalloc cmac
      ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat rfcomm
      xt_mark fuse bridge stp llc zram btusb btrtl btbcm btintel bluetooth
      ip6table_filter mwifiex_pcie mwifiex cfg80211 cdc_ether usbnet r8152 mii joydev
      snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device ppp_async
      ppp_generic slhc tun
      CPU: 1 PID: 29814 Comm: kworker/1:1 Not tainted 4.4.52 #507
      Hardware name: Google Kevin (DT)
      Workqueue: pm pm_runtime_work
      task: ffffffc0ac540000 ti: ffffffc0af4d4000 task.ti: ffffffc0af4d4000
      PC is at autosuspend_check+0x74/0x174
      LR is at autosuspend_check+0x70/0x174
      ...
      Call trace:
      [<ffffffc00080dcc0>] autosuspend_check+0x74/0x174
      [<ffffffc000810500>] usb_runtime_idle+0x20/0x40
      [<ffffffc000785ae0>] __rpm_callback+0x48/0x7c
      [<ffffffc000786af0>] rpm_idle+0x1e8/0x498
      [<ffffffc000787cdc>] pm_runtime_work+0x88/0xcc
      [<ffffffc000249bb8>] process_one_work+0x390/0x6b8
      [<ffffffc00024abcc>] worker_thread+0x480/0x610
      [<ffffffc000251a80>] kthread+0x164/0x178
      [<ffffffc0002045d0>] ret_from_fork+0x10/0x40
      
      Source:
      
      (gdb) l *0xffffffc00080dcc0
      0xffffffc00080dcc0 is in autosuspend_check
      (drivers/usb/core/driver.c:1778).
      1773		/* We don't need to check interfaces that are
      1774		 * disabled for runtime PM.  Either they are unbound
      1775		 * or else their drivers don't support autosuspend
      1776		 * and so they are permanently active.
      1777		 */
      1778		if (intf->dev.power.disable_depth)
      1779			continue;
      1780		if (atomic_read(&intf->dev.power.usage_count) > 0)
      1781			return -EBUSY;
      1782		w |= intf->needs_remote_wakeup;
      
      Code analysis shows that intf is set to NULL in usb_disable_device() prior
      to setting actconfig to NULL. At the same time, usb_runtime_idle() does not
      lock the usb device, and neither does any of the functions in the
      traceback. This means that there is no protection against a race condition
      where usb_disable_device() is removing dev->actconfig->interface[] pointers
      while those are being accessed from autosuspend_check().
      
      To solve the problem, synchronize and validate device state between
      autosuspend_check() and usb_disconnect().
      Acked-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      12001f3a
    • Guenter Roeck's avatar
      usb: hub: Fix error loop seen after hub communication errors · 90d4e2a6
      Guenter Roeck authored
      commit 245b2eec upstream.
      
      While stress testing a usb controller using a bind/unbind looop, the
      following error loop was observed.
      
      usb 7-1.2: new low-speed USB device number 3 using xhci-hcd
      usb 7-1.2: hub failed to enable device, error -108
      usb 7-1-port2: cannot disable (err = -22)
      usb 7-1-port2: couldn't allocate usb_device
      usb 7-1-port2: cannot disable (err = -22)
      hub 7-1:1.0: hub_ext_port_status failed (err = -22)
      hub 7-1:1.0: hub_ext_port_status failed (err = -22)
      hub 7-1:1.0: activate --> -22
      hub 7-1:1.0: hub_ext_port_status failed (err = -22)
      hub 7-1:1.0: hub_ext_port_status failed (err = -22)
      hub 7-1:1.0: activate --> -22
      hub 7-1:1.0: hub_ext_port_status failed (err = -22)
      hub 7-1:1.0: hub_ext_port_status failed (err = -22)
      hub 7-1:1.0: activate --> -22
      hub 7-1:1.0: hub_ext_port_status failed (err = -22)
      hub 7-1:1.0: hub_ext_port_status failed (err = -22)
      hub 7-1:1.0: activate --> -22
      hub 7-1:1.0: hub_ext_port_status failed (err = -22)
      hub 7-1:1.0: hub_ext_port_status failed (err = -22)
      hub 7-1:1.0: activate --> -22
      hub 7-1:1.0: hub_ext_port_status failed (err = -22)
      hub 7-1:1.0: hub_ext_port_status failed (err = -22)
      hub 7-1:1.0: activate --> -22
      hub 7-1:1.0: hub_ext_port_status failed (err = -22)
      hub 7-1:1.0: hub_ext_port_status failed (err = -22)
      hub 7-1:1.0: activate --> -22
      hub 7-1:1.0: hub_ext_port_status failed (err = -22)
      hub 7-1:1.0: hub_ext_port_status failed (err = -22)
      hub 7-1:1.0: activate --> -22
      hub 7-1:1.0: hub_ext_port_status failed (err = -22)
      hub 7-1:1.0: hub_ext_port_status failed (err = -22)
      ** 57 printk messages dropped ** hub 7-1:1.0: activate --> -22
      ** 82 printk messages dropped ** hub 7-1:1.0: hub_ext_port_status failed (err = -22)
      
      This continues forever. After adding tracebacks into the code,
      the call sequence leading to this is found to be as follows.
      
      [<ffffffc0007fc8e0>] hub_activate+0x368/0x7b8
      [<ffffffc0007fceb4>] hub_resume+0x2c/0x3c
      [<ffffffc00080b3b8>] usb_resume_interface.isra.6+0x128/0x158
      [<ffffffc00080b5d0>] usb_suspend_both+0x1e8/0x288
      [<ffffffc00080c9c4>] usb_runtime_suspend+0x3c/0x98
      [<ffffffc0007820a0>] __rpm_callback+0x48/0x7c
      [<ffffffc00078217c>] rpm_callback+0xa8/0xd4
      [<ffffffc000786234>] rpm_suspend+0x84/0x758
      [<ffffffc000786ca4>] rpm_idle+0x2c8/0x498
      [<ffffffc000786ed4>] __pm_runtime_idle+0x60/0xac
      [<ffffffc00080eba8>] usb_autopm_put_interface+0x6c/0x7c
      [<ffffffc000803798>] hub_event+0x10ac/0x12ac
      [<ffffffc000249bb8>] process_one_work+0x390/0x6b8
      [<ffffffc00024abcc>] worker_thread+0x480/0x610
      [<ffffffc000251a80>] kthread+0x164/0x178
      [<ffffffc0002045d0>] ret_from_fork+0x10/0x40
      
      kick_hub_wq() is called from hub_activate() even after failures to
      communicate with the hub. This results in an endless sequence of
      hub event -> hub activate -> wq trigger -> hub event -> ...
      
      Provide two solutions for the problem.
      
      - Only trigger the hub event queue if communication with the hub
        is successful.
      - After a suspend failure, only resume already suspended interfaces
        if the communication with the device is still possible.
      
      Each of the changes fixes the observed problem. Use both to improve
      robustness.
      Acked-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      90d4e2a6