1. 15 Aug, 2018 13 commits
    • Al Viro's avatar
      fix mntput/mntput race · 87a2d84d
      Al Viro authored
      commit 9ea0a46c upstream.
      
      mntput_no_expire() does the calculation of total refcount under mount_lock;
      unfortunately, the decrement (as well as all increments) are done outside
      of it, leading to false positives in the "are we dropping the last reference"
      test.  Consider the following situation:
      	* mnt is a lazy-umounted mount, kept alive by two opened files.  One
      of those files gets closed.  Total refcount of mnt is 2.  On CPU 42
      mntput(mnt) (called from __fput()) drops one reference, decrementing component
      	* After it has looked at component #0, the process on CPU 0 does
      mntget(), incrementing component #0, gets preempted and gets to run again -
      on CPU 69.  There it does mntput(), which drops the reference (component #69)
      and proceeds to spin on mount_lock.
      	* On CPU 42 our first mntput() finishes counting.  It observes the
      decrement of component #69, but not the increment of component #0.  As the
      result, the total it gets is not 1 as it should've been - it's 0.  At which
      point we decide that vfsmount needs to be killed and proceed to free it and
      shut the filesystem down.  However, there's still another opened file
      on that filesystem, with reference to (now freed) vfsmount, etc. and we are
      screwed.
      
      It's not a wide race, but it can be reproduced with artificial slowdown of
      the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.
      
      Fix consists of moving the refcount decrement under mount_lock; the tricky
      part is that we want (and can) keep the fast case (i.e. mount that still
      has non-NULL ->mnt_ns) entirely out of mount_lock.  All places that zero
      mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu()
      before that mntput().  IOW, if mntput() observes (under rcu_read_lock())
      a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to
      be dropped.
      Reported-by: default avatarJann Horn <jannh@google.com>
      Tested-by: default avatarJann Horn <jannh@google.com>
      Fixes: 48a066e7 ("RCU'd vsfmounts")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87a2d84d
    • Al Viro's avatar
      make sure that __dentry_kill() always invalidates d_seq, unhashed or not · 59199c04
      Al Viro authored
      commit 4c0d7cd5 upstream.
      
      RCU pathwalk relies upon the assumption that anything that changes
      ->d_inode of a dentry will invalidate its ->d_seq.  That's almost
      true - the one exception is that the final dput() of already unhashed
      dentry does *not* touch ->d_seq at all.  Unhashing does, though,
      so for anything we'd found by RCU dcache lookup we are fine.
      Unfortunately, we can *start* with an unhashed dentry or jump into
      it.
      
      We could try and be careful in the (few) places where that could
      happen.  Or we could just make the final dput() invalidate the damn
      thing, unhashed or not.  The latter is much simpler and easier to
      backport, so let's do it that way.
      Reported-by: default avatar"Dae R. Jeong" <threeearcat@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      59199c04
    • Al Viro's avatar
      root dentries need RCU-delayed freeing · cfac7df7
      Al Viro authored
      commit 90bad5e0 upstream.
      
      Since mountpoint crossing can happen without leaving lazy mode,
      root dentries do need the same protection against having their
      memory freed without RCU delay as everything else in the tree.
      
      It's partially hidden by RCU delay between detaching from the
      mount tree and dropping the vfsmount reference, but the starting
      point of pathwalk can be on an already detached mount, in which
      case umount-caused RCU delay has already passed by the time the
      lazy pathwalk grabs rcu_read_lock().  If the starting point
      happens to be at the root of that vfsmount *and* that vfsmount
      covers the entire filesystem, we get trouble.
      
      Fixes: 48a066e7 ("RCU'd vsfmounts")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cfac7df7
    • Linus Torvalds's avatar
      init: rename and re-order boot_cpu_state_init() · 6bb53ee1
      Linus Torvalds authored
      commit b5b1404d upstream.
      
      This is purely a preparatory patch for upcoming changes during the 4.19
      merge window.
      
      We have a function called "boot_cpu_state_init()" that isn't really
      about the bootup cpu state: that is done much earlier by the similarly
      named "boot_cpu_init()" (note lack of "state" in name).
      
      This function initializes some hotplug CPU state, and needs to run after
      the percpu data has been properly initialized.  It even has a comment to
      that effect.
      
      Except it _doesn't_ actually run after the percpu data has been properly
      initialized.  On x86 it happens to do that, but on at least arm and
      arm64, the percpu base pointers are initialized by the arch-specific
      'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init().
      
      This had some unexpected results, and in particular we have a patch
      pending for the merge window that did the obvious cleanup of using
      'this_cpu_write()' in the cpu hotplug init code:
      
        -       per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
        +       this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
      
      which is obviously the right thing to do.  Except because of the
      ordering issue, it actually failed miserably and unexpectedly on arm64.
      
      So this just fixes the ordering, and changes the name of the function to
      be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu
      hotplug state, because the core CPU state was supposed to have already
      been done earlier.
      
      Marked for stable, since the (not yet merged) patch that will show this
      problem is marked for stable.
      Reported-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Reported-by: default avatarMian Yousaf Kaukab <yousaf.kaukab@suse.com>
      Suggested-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6bb53ee1
    • Bart Van Assche's avatar
      scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled · bcf447f8
      Bart Van Assche authored
      commit 1214fd7b upstream.
      
      Surround scsi_execute() calls with scsi_autopm_get_device() and
      scsi_autopm_put_device(). Note: removing sr_mutex protection from the
      scsi_cd_get() and scsi_cd_put() calls is safe because the purpose of
      sr_mutex is to serialize cdrom_*() calls.
      
      This patch avoids that complaints similar to the following appear in the
      kernel log if runtime power management is enabled:
      
      INFO: task systemd-udevd:650 blocked for more than 120 seconds.
           Not tainted 4.18.0-rc7-dbg+ #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      systemd-udevd   D28176   650    513 0x00000104
      Call Trace:
      __schedule+0x444/0xfe0
      schedule+0x4e/0xe0
      schedule_preempt_disabled+0x18/0x30
      __mutex_lock+0x41c/0xc70
      mutex_lock_nested+0x1b/0x20
      __blkdev_get+0x106/0x970
      blkdev_get+0x22c/0x5a0
      blkdev_open+0xe9/0x100
      do_dentry_open.isra.19+0x33e/0x570
      vfs_open+0x7c/0xd0
      path_openat+0x6e3/0x1120
      do_filp_open+0x11c/0x1c0
      do_sys_open+0x208/0x2d0
      __x64_sys_openat+0x59/0x70
      do_syscall_64+0x77/0x230
      entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@wdc.com>
      Cc: Maurizio Lombardi <mlombard@redhat.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: <stable@vger.kernel.org>
      Tested-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bcf447f8
    • Hans de Goede's avatar
      ACPI / LPSS: Add missing prv_offset setting for byt/cht PWM devices · 51b3938e
      Hans de Goede authored
      commit fdcb613d upstream.
      
      The LPSS PWM device on on Bay Trail and Cherry Trail devices has a set
      of private registers at offset 0x800, the current lpss_device_desc for
      them already sets the LPSS_SAVE_CTX flag to have these saved/restored
      over device-suspend, but the current lpss_device_desc was not setting
      the prv_offset field, leading to the regular device registers getting
      saved/restored instead.
      
      This is causing the PWM controller to no longer work, resulting in a black
      screen,  after a suspend/resume on systems where the firmware clears the
      APB clock and reset bits at offset 0x804.
      
      This commit fixes this by properly setting prv_offset to 0x800 for
      the PWM devices.
      
      Cc: stable@vger.kernel.org
      Fixes: e1c74817 ("ACPI / LPSS: Add Intel BayTrail ACPI mode PWM")
      Fixes: 1bfbd8eb ("ACPI / LPSS: Add ACPI IDs for Intel Braswell")
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Acked-by: default avatarRafael J . Wysocki <rjw@rjwysocki.net>
      Signed-off-by: default avatarThierry Reding <thierry.reding@gmail.com>
      Signed-off-by: default avatarSudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      51b3938e
    • Juergen Gross's avatar
      xen/netfront: don't cache skb_shinfo() · af3bd8d6
      Juergen Gross authored
      commit d472b3a6 upstream.
      
      skb_shinfo() can change when calling __pskb_pull_tail(): Don't cache
      its return value.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Reviewed-by: default avatarWei Liu <wei.liu2@citrix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      af3bd8d6
    • Linus Torvalds's avatar
      Mark HI and TASKLET softirq synchronous · fbf12e19
      Linus Torvalds authored
      commit 3c53776e upstream.
      
      Way back in 4.9, we committed 4cd13c21 ("softirq: Let ksoftirqd do
      its job"), and ever since we've had small nagging issues with it.  For
      example, we've had:
      
        1ff68820 ("watchdog: core: make sure the watchdog_worker is not deferred")
        8d5755b3 ("watchdog: softdog: fire watchdog even if softirqs do not get to run")
        217f6974 ("net: busy-poll: allow preemption in sk_busy_loop()")
      
      all of which worked around some of the effects of that commit.
      
      The DVB people have also complained that the commit causes excessive USB
      URB latencies, which seems to be due to the USB code using tasklets to
      schedule USB traffic.  This seems to be an issue mainly when already
      living on the edge, but waiting for ksoftirqd to handle it really does
      seem to cause excessive latencies.
      
      Now Hanna Hawa reports that this issue isn't just limited to USB URB and
      DVB, but also causes timeout problems for the Marvell SoC team:
      
       "I'm facing kernel panic issue while running raid 5 on sata disks
        connected to Macchiatobin (Marvell community board with Armada-8040
        SoC with 4 ARMv8 cores of CA72) Raid 5 built with Marvell DMA engine
        and async_tx mechanism (ASYNC_TX_DMA [=y]); the DMA driver (mv_xor_v2)
        uses a tasklet to clean the done descriptors from the queue"
      
      The latency problem causes a panic:
      
        mv_xor_v2 f0400000.xor: dma_sync_wait: timeout!
        Kernel panic - not syncing: async_tx_quiesce: DMA error waiting for transaction
      
      We've discussed simply just reverting the original commit entirely, and
      also much more involved solutions (with per-softirq threads etc).  This
      patch is intentionally stupid and fairly limited, because the issue
      still remains, and the other solutions either got sidetracked or had
      other issues.
      
      We should probably also consider the timer softirqs to be synchronous
      and not be delayed to ksoftirqd (since they were the issue with the
      earlier watchdog problems), but that should be done as a separate patch.
      This does only the tasklet cases.
      Reported-and-tested-by: default avatarHanna Hawa <hannah@marvell.com>
      Reported-and-tested-by: default avatarJosef Griebichler <griebichler.josef@gmx.at>
      Reported-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fbf12e19
    • Andrey Konovalov's avatar
      kasan: add no_sanitize attribute for clang builds · 50bed434
      Andrey Konovalov authored
      commit 12c8f25a upstream.
      
      KASAN uses the __no_sanitize_address macro to disable instrumentation of
      particular functions.  Right now it's defined only for GCC build, which
      causes false positives when clang is used.
      
      This patch adds a definition for clang.
      
      Note, that clang's revision 329612 or higher is required.
      
      [andreyknvl@google.com: remove redundant #ifdef CONFIG_KASAN check]
        Link: http://lkml.kernel.org/r/c79aa31a2a2790f6131ed607c58b0dd45dd62a6c.1523967959.git.andreyknvl@google.com
      Link: http://lkml.kernel.org/r/4ad725cc903f8534f8c8a60f0daade5e3d674f8d.1523554166.git.andreyknvl@google.comSigned-off-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Acked-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Paul Lawrence <paullawrence@google.com>
      Cc: Sandipan Das <sandipan@linux.vnet.ibm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Sodagudi Prasad <psodagud@codeaurora.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      50bed434
    • John David Anglin's avatar
      parisc: Define mb() and add memory barriers to assembler unlock sequences · 2106b21a
      John David Anglin authored
      commit fedb8da9 upstream.
      
      For years I thought all parisc machines executed loads and stores in
      order. However, Jeff Law recently indicated on gcc-patches that this is
      not correct. There are various degrees of out-of-order execution all the
      way back to the PA7xxx processor series (hit-under-miss). The PA8xxx
      series has full out-of-order execution for both integer operations, and
      loads and stores.
      
      This is described in the following article:
      http://web.archive.org/web/20040214092531/http://www.cpus.hp.com/technical_references/advperf.shtml
      
      For this reason, we need to define mb() and to insert a memory barrier
      before the store unlocking spinlocks. This ensures that all memory
      accesses are complete prior to unlocking. The ldcw instruction performs
      the same function on entry.
      Signed-off-by: default avatarJohn David Anglin <dave.anglin@bell.net>
      Cc: stable@vger.kernel.org # 4.0+
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2106b21a
    • Helge Deller's avatar
      parisc: Enable CONFIG_MLONGCALLS by default · 5f394c9e
      Helge Deller authored
      commit 66509a27 upstream.
      
      Enable the -mlong-calls compiler option by default, because otherwise in most
      cases linking the vmlinux binary fails due to truncations of R_PARISC_PCREL22F
      relocations. This fixes building the 64-bit defconfig.
      
      Cc: stable@vger.kernel.org # 4.0+
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f394c9e
    • Tadeusz Struk's avatar
      tpm: fix race condition in tpm_common_write() · 1d4167a8
      Tadeusz Struk authored
      commit 3ab2011e upstream.
      
      There is a race condition in tpm_common_write function allowing
      two threads on the same /dev/tpm<N>, or two different applications
      on the same /dev/tpmrm<N> to overwrite each other commands/responses.
      Fixed this by taking the priv->buffer_mutex early in the function.
      
      Also converted the priv->data_pending from atomic to a regular size_t
      type. There is no need for it to be atomic since it is only touched
      under the protection of the priv->buffer_mutex.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarTadeusz Struk <tadeusz.struk@intel.com>
      Reviewed-by: default avatarJarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Signed-off-by: default avatarJarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d4167a8
    • Theodore Ts'o's avatar
      ext4: fix check to prevent initializing reserved inodes · 954e572a
      Theodore Ts'o authored
      commit 50122847 upstream.
      
      Commit 8844618d: "ext4: only look at the bg_flags field if it is
      valid" will complain if block group zero does not have the
      EXT4_BG_INODE_ZEROED flag set.  Unfortunately, this is not correct,
      since a freshly created file system has this flag cleared.  It gets
      almost immediately after the file system is mounted read-write --- but
      the following somewhat unlikely sequence will end up triggering a
      false positive report of a corrupted file system:
      
         mkfs.ext4 /dev/vdc
         mount -o ro /dev/vdc /vdc
         mount -o remount,rw /dev/vdc
      
      Instead, when initializing the inode table for block group zero, test
      to make sure that itable_unused count is not too large, since that is
      the case that will result in some or all of the reserved inodes
      getting cleared.
      
      This fixes the failures reported by Eric Whiteney when running
      generic/230 and generic/231 in the the nojournal test case.
      
      Fixes: 8844618d ("ext4: only look at the bg_flags field if it is valid")
      Reported-by: default avatarEric Whitney <enwlinux@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      954e572a
  2. 09 Aug, 2018 18 commits
  3. 06 Aug, 2018 9 commits