1. 15 Aug, 2018 19 commits
    • Jack Morgenstein's avatar
      IB/mlx4: Mark user MR as writable if actual virtual memory is writable · e2ba7bf1
      Jack Morgenstein authored
      commit d8f9cc32 upstream.
      
      To allow rereg_user_mr to modify the MR from read-only to writable without
      using get_user_pages again, we needed to define the initial MR as writable.
      However, this was originally done unconditionally, without taking into
      account the writability of the underlying virtual memory.
      
      As a result, any attempt to register a read-only MR over read-only
      virtual memory failed.
      
      To fix this, do not add the writable flag bit when the user virtual memory
      is not writable (e.g. const memory).
      
      However, when the underlying memory is NOT writable (and we therefore
      do not define the initial MR as writable), the IB core adds a
      "force writable" flag to its user-pages request. If this succeeds,
      the reg_user_mr caller gets a writable copy of the original pages.
      
      If the user-space caller then does a rereg_user_mr operation to enable
      writability, this will succeed. This should not be allowed, since
      the original virtual memory was not writable.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 9376932d ("IB/mlx4_ib: Add support for user MR re-registration")
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: default avatarSudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e2ba7bf1
    • Jack Morgenstein's avatar
      IB/core: Make testing MR flags for writability a static inline function · 11410f99
      Jack Morgenstein authored
      commit 08bb558a upstream.
      
      Make the MR writability flags check, which is performed in umem.c,
      a static inline function in file ib_verbs.h
      
      This allows the function to be used by low-level infiniband drivers.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: default avatarSudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      11410f99
    • Eric W. Biederman's avatar
      proc: Fix proc_sys_prune_dcache to hold a sb reference · a3a7b992
      Eric W. Biederman authored
      commit 2fd1d2c4 upstream.
      
      Andrei Vagin writes:
      FYI: This bug has been reproduced on 4.11.7
      > BUG: Dentry ffff895a3dd01240{i=4e7c09a,n=lo}  still in use (1) [unmount of proc proc]
      > ------------[ cut here ]------------
      > WARNING: CPU: 1 PID: 13588 at fs/dcache.c:1445 umount_check+0x6e/0x80
      > CPU: 1 PID: 13588 Comm: kworker/1:1 Not tainted 4.11.7-200.fc25.x86_64 #1
      > Hardware name: CompuLab sbc-flt1/fitlet, BIOS SBCFLT_0.08.04 06/27/2015
      > Workqueue: events proc_cleanup_work
      > Call Trace:
      >  dump_stack+0x63/0x86
      >  __warn+0xcb/0xf0
      >  warn_slowpath_null+0x1d/0x20
      >  umount_check+0x6e/0x80
      >  d_walk+0xc6/0x270
      >  ? dentry_free+0x80/0x80
      >  do_one_tree+0x26/0x40
      >  shrink_dcache_for_umount+0x2d/0x90
      >  generic_shutdown_super+0x1f/0xf0
      >  kill_anon_super+0x12/0x20
      >  proc_kill_sb+0x40/0x50
      >  deactivate_locked_super+0x43/0x70
      >  deactivate_super+0x5a/0x60
      >  cleanup_mnt+0x3f/0x90
      >  mntput_no_expire+0x13b/0x190
      >  kern_unmount+0x3e/0x50
      >  pid_ns_release_proc+0x15/0x20
      >  proc_cleanup_work+0x15/0x20
      >  process_one_work+0x197/0x450
      >  worker_thread+0x4e/0x4a0
      >  kthread+0x109/0x140
      >  ? process_one_work+0x450/0x450
      >  ? kthread_park+0x90/0x90
      >  ret_from_fork+0x2c/0x40
      > ---[ end trace e1c109611e5d0b41 ]---
      > VFS: Busy inodes after unmount of proc. Self-destruct in 5 seconds.  Have a nice day...
      > BUG: unable to handle kernel NULL pointer dereference at           (null)
      > IP: _raw_spin_lock+0xc/0x30
      > PGD 0
      
      Fix this by taking a reference to the super block in proc_sys_prune_dcache.
      
      The superblock reference is the core of the fix however the sysctl_inodes
      list is converted to a hlist so that hlist_del_init_rcu may be used.  This
      allows proc_sys_prune_dache to remove inodes the sysctl_inodes list, while
      not causing problems for proc_sys_evict_inode when if it later choses to
      remove the inode from the sysctl_inodes list.  Removing inodes from the
      sysctl_inodes list allows proc_sys_prune_dcache to have a progress
      guarantee, while still being able to drop all locks.  The fact that
      head->unregistering is set in start_unregistering ensures that no more
      inodes will be added to the the sysctl_inodes list.
      
      Previously the code did a dance where it delayed calling iput until the
      next entry in the list was being considered to ensure the inode remained on
      the sysctl_inodes list until the next entry was walked to.  The structure
      of the loop in this patch does not need that so is much easier to
      understand and maintain.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarAndrei Vagin <avagin@gmail.com>
      Tested-by: default avatarAndrei Vagin <avagin@openvz.org>
      Fixes: ace0c791 ("proc/sysctl: Don't grab i_lock under sysctl_lock.")
      Fixes: d6cffbbe ("proc/sysctl: prune stale dentries during unregistering")
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a3a7b992
    • Eric W. Biederman's avatar
      proc/sysctl: Don't grab i_lock under sysctl_lock. · 631f93a6
      Eric W. Biederman authored
      commit ace0c791 upstream.
      
      Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
      > This patch has locking problem. I've got lockdep splat under LTP.
      >
      > [ 6633.115456] ======================================================
      > [ 6633.115502] [ INFO: possible circular locking dependency detected ]
      > [ 6633.115553] 4.9.10-debug+ #9 Tainted: G             L
      > [ 6633.115584] -------------------------------------------------------
      > [ 6633.115627] ksm02/284980 is trying to acquire lock:
      > [ 6633.115659]  (&sb->s_type->i_lock_key#4){+.+...}, at: [<ffffffff816bc1ce>] igrab+0x1e/0x80
      > [ 6633.115834] but task is already holding lock:
      > [ 6633.115882]  (sysctl_lock){+.+...}, at: [<ffffffff817e379b>] unregister_sysctl_table+0x6b/0x110
      > [ 6633.116026] which lock already depends on the new lock.
      > [ 6633.116026]
      > [ 6633.116080]
      > [ 6633.116080] the existing dependency chain (in reverse order) is:
      > [ 6633.116117]
      > -> #2 (sysctl_lock){+.+...}:
      > -> #1 (&(&dentry->d_lockref.lock)->rlock){+.+...}:
      > -> #0 (&sb->s_type->i_lock_key#4){+.+...}:
      >
      > d_lock nests inside i_lock
      > sysctl_lock nests inside d_lock in d_compare
      >
      > This patch adds i_lock nesting inside sysctl_lock.
      
      Al Viro <viro@ZenIV.linux.org.uk> replied:
      > Once ->unregistering is set, you can drop sysctl_lock just fine.  So I'd
      > try something like this - use rcu_read_lock() in proc_sys_prune_dcache(),
      > drop sysctl_lock() before it and regain after.  Make sure that no inodes
      > are added to the list ones ->unregistering has been set and use RCU list
      > primitives for modifying the inode list, with sysctl_lock still used to
      > serialize its modifications.
      >
      > Freeing struct inode is RCU-delayed (see proc_destroy_inode()), so doing
      > igrab() is safe there.  Since we don't drop inode reference until after we'd
      > passed beyond it in the list, list_for_each_entry_rcu() should be fine.
      
      I agree with Al Viro's analsysis of the situtation.
      
      Fixes: d6cffbbe ("proc/sysctl: prune stale dentries during unregistering")
      Reported-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Tested-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Suggested-by: default avatarAl Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      631f93a6
    • Konstantin Khlebnikov's avatar
      proc/sysctl: prune stale dentries during unregistering · b96e215e
      Konstantin Khlebnikov authored
      commit d6cffbbe upstream.
      
      Currently unregistering sysctl table does not prune its dentries.
      Stale dentries could slowdown sysctl operations significantly.
      
      For example, command:
      
       # for i in {1..100000} ; do unshare -n -- sysctl -a &> /dev/null ; done
       creates a millions of stale denties around sysctls of loopback interface:
      
       # sysctl fs.dentry-state
       fs.dentry-state = 25812579  24724135        45      0       0       0
      
       All of them have matching names thus lookup have to scan though whole
       hash chain and call d_compare (proc_sys_compare) which checks them
       under system-wide spinlock (sysctl_lock).
      
       # time sysctl -a > /dev/null
       real    1m12.806s
       user    0m0.016s
       sys     1m12.400s
      
      Currently only memory reclaimer could remove this garbage.
      But without significant memory pressure this never happens.
      
      This patch collects sysctl inodes into list on sysctl table header and
      prunes all their dentries once that table unregisters.
      
      Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
      > On 10.02.2017 10:47, Al Viro wrote:
      >> how about >> the matching stats *after* that patch?
      >
      > dcache size doesn't grow endlessly, so stats are fine
      >
      > # sysctl fs.dentry-state
      > fs.dentry-state = 92712	58376	45	0	0	0
      >
      > # time sysctl -a &>/dev/null
      >
      > real	0m0.013s
      > user	0m0.004s
      > sys	0m0.008s
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Suggested-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b96e215e
    • Al Viro's avatar
      fix __legitimize_mnt()/mntput() race · e31578c6
      Al Viro authored
      commit 119e1ef8 upstream.
      
      __legitimize_mnt() has two problems - one is that in case of success
      the check of mount_lock is not ordered wrt preceding increment of
      refcount, making it possible to have successful __legitimize_mnt()
      on one CPU just before the otherwise final mntpu() on another,
      with __legitimize_mnt() not seeing mntput() taking the lock and
      mntput() not seeing the increment done by __legitimize_mnt().
      Solved by a pair of barriers.
      
      Another is that failure of __legitimize_mnt() on the second
      read_seqretry() leaves us with reference that'll need to be
      dropped by caller; however, if that races with final mntput()
      we can end up with caller dropping rcu_read_lock() and doing
      mntput() to release that reference - with the first mntput()
      having freed the damn thing just as rcu_read_lock() had been
      dropped.  Solution: in "do mntput() yourself" failure case
      grab mount_lock, check if MNT_DOOMED has been set by racing
      final mntput() that has missed our increment and if it has -
      undo the increment and treat that as "failure, caller doesn't
      need to drop anything" case.
      
      It's not easy to hit - the final mntput() has to come right
      after the first read_seqretry() in __legitimize_mnt() *and*
      manage to miss the increment done by __legitimize_mnt() before
      the second read_seqretry() in there.  The things that are almost
      impossible to hit on bare hardware are not impossible on SMP
      KVM, though...
      Reported-by: default avatarOleg Nesterov <oleg@redhat.com>
      Fixes: 48a066e7 ("RCU'd vsfmounts")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e31578c6
    • Al Viro's avatar
      fix mntput/mntput race · 87a2d84d
      Al Viro authored
      commit 9ea0a46c upstream.
      
      mntput_no_expire() does the calculation of total refcount under mount_lock;
      unfortunately, the decrement (as well as all increments) are done outside
      of it, leading to false positives in the "are we dropping the last reference"
      test.  Consider the following situation:
      	* mnt is a lazy-umounted mount, kept alive by two opened files.  One
      of those files gets closed.  Total refcount of mnt is 2.  On CPU 42
      mntput(mnt) (called from __fput()) drops one reference, decrementing component
      	* After it has looked at component #0, the process on CPU 0 does
      mntget(), incrementing component #0, gets preempted and gets to run again -
      on CPU 69.  There it does mntput(), which drops the reference (component #69)
      and proceeds to spin on mount_lock.
      	* On CPU 42 our first mntput() finishes counting.  It observes the
      decrement of component #69, but not the increment of component #0.  As the
      result, the total it gets is not 1 as it should've been - it's 0.  At which
      point we decide that vfsmount needs to be killed and proceed to free it and
      shut the filesystem down.  However, there's still another opened file
      on that filesystem, with reference to (now freed) vfsmount, etc. and we are
      screwed.
      
      It's not a wide race, but it can be reproduced with artificial slowdown of
      the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.
      
      Fix consists of moving the refcount decrement under mount_lock; the tricky
      part is that we want (and can) keep the fast case (i.e. mount that still
      has non-NULL ->mnt_ns) entirely out of mount_lock.  All places that zero
      mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu()
      before that mntput().  IOW, if mntput() observes (under rcu_read_lock())
      a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to
      be dropped.
      Reported-by: default avatarJann Horn <jannh@google.com>
      Tested-by: default avatarJann Horn <jannh@google.com>
      Fixes: 48a066e7 ("RCU'd vsfmounts")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87a2d84d
    • Al Viro's avatar
      make sure that __dentry_kill() always invalidates d_seq, unhashed or not · 59199c04
      Al Viro authored
      commit 4c0d7cd5 upstream.
      
      RCU pathwalk relies upon the assumption that anything that changes
      ->d_inode of a dentry will invalidate its ->d_seq.  That's almost
      true - the one exception is that the final dput() of already unhashed
      dentry does *not* touch ->d_seq at all.  Unhashing does, though,
      so for anything we'd found by RCU dcache lookup we are fine.
      Unfortunately, we can *start* with an unhashed dentry or jump into
      it.
      
      We could try and be careful in the (few) places where that could
      happen.  Or we could just make the final dput() invalidate the damn
      thing, unhashed or not.  The latter is much simpler and easier to
      backport, so let's do it that way.
      Reported-by: default avatar"Dae R. Jeong" <threeearcat@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      59199c04
    • Al Viro's avatar
      root dentries need RCU-delayed freeing · cfac7df7
      Al Viro authored
      commit 90bad5e0 upstream.
      
      Since mountpoint crossing can happen without leaving lazy mode,
      root dentries do need the same protection against having their
      memory freed without RCU delay as everything else in the tree.
      
      It's partially hidden by RCU delay between detaching from the
      mount tree and dropping the vfsmount reference, but the starting
      point of pathwalk can be on an already detached mount, in which
      case umount-caused RCU delay has already passed by the time the
      lazy pathwalk grabs rcu_read_lock().  If the starting point
      happens to be at the root of that vfsmount *and* that vfsmount
      covers the entire filesystem, we get trouble.
      
      Fixes: 48a066e7 ("RCU'd vsfmounts")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cfac7df7
    • Linus Torvalds's avatar
      init: rename and re-order boot_cpu_state_init() · 6bb53ee1
      Linus Torvalds authored
      commit b5b1404d upstream.
      
      This is purely a preparatory patch for upcoming changes during the 4.19
      merge window.
      
      We have a function called "boot_cpu_state_init()" that isn't really
      about the bootup cpu state: that is done much earlier by the similarly
      named "boot_cpu_init()" (note lack of "state" in name).
      
      This function initializes some hotplug CPU state, and needs to run after
      the percpu data has been properly initialized.  It even has a comment to
      that effect.
      
      Except it _doesn't_ actually run after the percpu data has been properly
      initialized.  On x86 it happens to do that, but on at least arm and
      arm64, the percpu base pointers are initialized by the arch-specific
      'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init().
      
      This had some unexpected results, and in particular we have a patch
      pending for the merge window that did the obvious cleanup of using
      'this_cpu_write()' in the cpu hotplug init code:
      
        -       per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
        +       this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
      
      which is obviously the right thing to do.  Except because of the
      ordering issue, it actually failed miserably and unexpectedly on arm64.
      
      So this just fixes the ordering, and changes the name of the function to
      be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu
      hotplug state, because the core CPU state was supposed to have already
      been done earlier.
      
      Marked for stable, since the (not yet merged) patch that will show this
      problem is marked for stable.
      Reported-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Reported-by: default avatarMian Yousaf Kaukab <yousaf.kaukab@suse.com>
      Suggested-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6bb53ee1
    • Bart Van Assche's avatar
      scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled · bcf447f8
      Bart Van Assche authored
      commit 1214fd7b upstream.
      
      Surround scsi_execute() calls with scsi_autopm_get_device() and
      scsi_autopm_put_device(). Note: removing sr_mutex protection from the
      scsi_cd_get() and scsi_cd_put() calls is safe because the purpose of
      sr_mutex is to serialize cdrom_*() calls.
      
      This patch avoids that complaints similar to the following appear in the
      kernel log if runtime power management is enabled:
      
      INFO: task systemd-udevd:650 blocked for more than 120 seconds.
           Not tainted 4.18.0-rc7-dbg+ #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      systemd-udevd   D28176   650    513 0x00000104
      Call Trace:
      __schedule+0x444/0xfe0
      schedule+0x4e/0xe0
      schedule_preempt_disabled+0x18/0x30
      __mutex_lock+0x41c/0xc70
      mutex_lock_nested+0x1b/0x20
      __blkdev_get+0x106/0x970
      blkdev_get+0x22c/0x5a0
      blkdev_open+0xe9/0x100
      do_dentry_open.isra.19+0x33e/0x570
      vfs_open+0x7c/0xd0
      path_openat+0x6e3/0x1120
      do_filp_open+0x11c/0x1c0
      do_sys_open+0x208/0x2d0
      __x64_sys_openat+0x59/0x70
      do_syscall_64+0x77/0x230
      entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@wdc.com>
      Cc: Maurizio Lombardi <mlombard@redhat.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: <stable@vger.kernel.org>
      Tested-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bcf447f8
    • Hans de Goede's avatar
      ACPI / LPSS: Add missing prv_offset setting for byt/cht PWM devices · 51b3938e
      Hans de Goede authored
      commit fdcb613d upstream.
      
      The LPSS PWM device on on Bay Trail and Cherry Trail devices has a set
      of private registers at offset 0x800, the current lpss_device_desc for
      them already sets the LPSS_SAVE_CTX flag to have these saved/restored
      over device-suspend, but the current lpss_device_desc was not setting
      the prv_offset field, leading to the regular device registers getting
      saved/restored instead.
      
      This is causing the PWM controller to no longer work, resulting in a black
      screen,  after a suspend/resume on systems where the firmware clears the
      APB clock and reset bits at offset 0x804.
      
      This commit fixes this by properly setting prv_offset to 0x800 for
      the PWM devices.
      
      Cc: stable@vger.kernel.org
      Fixes: e1c74817 ("ACPI / LPSS: Add Intel BayTrail ACPI mode PWM")
      Fixes: 1bfbd8eb ("ACPI / LPSS: Add ACPI IDs for Intel Braswell")
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Acked-by: default avatarRafael J . Wysocki <rjw@rjwysocki.net>
      Signed-off-by: default avatarThierry Reding <thierry.reding@gmail.com>
      Signed-off-by: default avatarSudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      51b3938e
    • Juergen Gross's avatar
      xen/netfront: don't cache skb_shinfo() · af3bd8d6
      Juergen Gross authored
      commit d472b3a6 upstream.
      
      skb_shinfo() can change when calling __pskb_pull_tail(): Don't cache
      its return value.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarWei Liu <wei.liu2@citrix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      af3bd8d6
    • Linus Torvalds's avatar
      Mark HI and TASKLET softirq synchronous · fbf12e19
      Linus Torvalds authored
      commit 3c53776e upstream.
      
      Way back in 4.9, we committed 4cd13c21 ("softirq: Let ksoftirqd do
      its job"), and ever since we've had small nagging issues with it.  For
      example, we've had:
      
        1ff68820 ("watchdog: core: make sure the watchdog_worker is not deferred")
        8d5755b3 ("watchdog: softdog: fire watchdog even if softirqs do not get to run")
        217f6974 ("net: busy-poll: allow preemption in sk_busy_loop()")
      
      all of which worked around some of the effects of that commit.
      
      The DVB people have also complained that the commit causes excessive USB
      URB latencies, which seems to be due to the USB code using tasklets to
      schedule USB traffic.  This seems to be an issue mainly when already
      living on the edge, but waiting for ksoftirqd to handle it really does
      seem to cause excessive latencies.
      
      Now Hanna Hawa reports that this issue isn't just limited to USB URB and
      DVB, but also causes timeout problems for the Marvell SoC team:
      
       "I'm facing kernel panic issue while running raid 5 on sata disks
        connected to Macchiatobin (Marvell community board with Armada-8040
        SoC with 4 ARMv8 cores of CA72) Raid 5 built with Marvell DMA engine
        and async_tx mechanism (ASYNC_TX_DMA [=y]); the DMA driver (mv_xor_v2)
        uses a tasklet to clean the done descriptors from the queue"
      
      The latency problem causes a panic:
      
        mv_xor_v2 f0400000.xor: dma_sync_wait: timeout!
        Kernel panic - not syncing: async_tx_quiesce: DMA error waiting for transaction
      
      We've discussed simply just reverting the original commit entirely, and
      also much more involved solutions (with per-softirq threads etc).  This
      patch is intentionally stupid and fairly limited, because the issue
      still remains, and the other solutions either got sidetracked or had
      other issues.
      
      We should probably also consider the timer softirqs to be synchronous
      and not be delayed to ksoftirqd (since they were the issue with the
      earlier watchdog problems), but that should be done as a separate patch.
      This does only the tasklet cases.
      Reported-and-tested-by: default avatarHanna Hawa <hannah@marvell.com>
      Reported-and-tested-by: default avatarJosef Griebichler <griebichler.josef@gmx.at>
      Reported-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fbf12e19
    • Andrey Konovalov's avatar
      kasan: add no_sanitize attribute for clang builds · 50bed434
      Andrey Konovalov authored
      commit 12c8f25a upstream.
      
      KASAN uses the __no_sanitize_address macro to disable instrumentation of
      particular functions.  Right now it's defined only for GCC build, which
      causes false positives when clang is used.
      
      This patch adds a definition for clang.
      
      Note, that clang's revision 329612 or higher is required.
      
      [andreyknvl@google.com: remove redundant #ifdef CONFIG_KASAN check]
        Link: http://lkml.kernel.org/r/c79aa31a2a2790f6131ed607c58b0dd45dd62a6c.1523967959.git.andreyknvl@google.com
      Link: http://lkml.kernel.org/r/4ad725cc903f8534f8c8a60f0daade5e3d674f8d.1523554166.git.andreyknvl@google.comSigned-off-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Acked-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Paul Lawrence <paullawrence@google.com>
      Cc: Sandipan Das <sandipan@linux.vnet.ibm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Sodagudi Prasad <psodagud@codeaurora.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      50bed434
    • John David Anglin's avatar
      parisc: Define mb() and add memory barriers to assembler unlock sequences · 2106b21a
      John David Anglin authored
      commit fedb8da9 upstream.
      
      For years I thought all parisc machines executed loads and stores in
      order. However, Jeff Law recently indicated on gcc-patches that this is
      not correct. There are various degrees of out-of-order execution all the
      way back to the PA7xxx processor series (hit-under-miss). The PA8xxx
      series has full out-of-order execution for both integer operations, and
      loads and stores.
      
      This is described in the following article:
      http://web.archive.org/web/20040214092531/http://www.cpus.hp.com/technical_references/advperf.shtml
      
      For this reason, we need to define mb() and to insert a memory barrier
      before the store unlocking spinlocks. This ensures that all memory
      accesses are complete prior to unlocking. The ldcw instruction performs
      the same function on entry.
      Signed-off-by: default avatarJohn David Anglin <dave.anglin@bell.net>
      Cc: stable@vger.kernel.org # 4.0+
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2106b21a
    • Helge Deller's avatar
      parisc: Enable CONFIG_MLONGCALLS by default · 5f394c9e
      Helge Deller authored
      commit 66509a27 upstream.
      
      Enable the -mlong-calls compiler option by default, because otherwise in most
      cases linking the vmlinux binary fails due to truncations of R_PARISC_PCREL22F
      relocations. This fixes building the 64-bit defconfig.
      
      Cc: stable@vger.kernel.org # 4.0+
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f394c9e
    • Tadeusz Struk's avatar
      tpm: fix race condition in tpm_common_write() · 1d4167a8
      Tadeusz Struk authored
      commit 3ab2011e upstream.
      
      There is a race condition in tpm_common_write function allowing
      two threads on the same /dev/tpm<N>, or two different applications
      on the same /dev/tpmrm<N> to overwrite each other commands/responses.
      Fixed this by taking the priv->buffer_mutex early in the function.
      
      Also converted the priv->data_pending from atomic to a regular size_t
      type. There is no need for it to be atomic since it is only touched
      under the protection of the priv->buffer_mutex.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarTadeusz Struk <tadeusz.struk@intel.com>
      Reviewed-by: default avatarJarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Signed-off-by: default avatarJarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d4167a8
    • Theodore Ts'o's avatar
      ext4: fix check to prevent initializing reserved inodes · 954e572a
      Theodore Ts'o authored
      commit 50122847 upstream.
      
      Commit 8844618d: "ext4: only look at the bg_flags field if it is
      valid" will complain if block group zero does not have the
      EXT4_BG_INODE_ZEROED flag set.  Unfortunately, this is not correct,
      since a freshly created file system has this flag cleared.  It gets
      almost immediately after the file system is mounted read-write --- but
      the following somewhat unlikely sequence will end up triggering a
      false positive report of a corrupted file system:
      
         mkfs.ext4 /dev/vdc
         mount -o ro /dev/vdc /vdc
         mount -o remount,rw /dev/vdc
      
      Instead, when initializing the inode table for block group zero, test
      to make sure that itable_unused count is not too large, since that is
      the case that will result in some or all of the reserved inodes
      getting cleared.
      
      This fixes the failures reported by Eric Whiteney when running
      generic/230 and generic/231 in the the nojournal test case.
      
      Fixes: 8844618d ("ext4: only look at the bg_flags field if it is valid")
      Reported-by: default avatarEric Whitney <enwlinux@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      954e572a
  2. 09 Aug, 2018 18 commits
  3. 06 Aug, 2018 3 commits