1. 30 May, 2018 11 commits
    • Revert "ipc/shm: Fix shmat mmap nil-page protection" · bd4792b3
      Davidlohr Bueso authored
      commit a73ab244 upstream.
      
      Patch series "ipc/shm: shmat() fixes around nil-page".
      
      These patches fix two issues reported[1] a while back by Joe and Andrea
      around how shmat(2) behaves with nil-page.
      
      The first reverts a commit that was based on the incorrect belief that
      mapping nil-page (address=0) was a no-no with MAP_FIXED.  This is not the
      case, with the exception of SHM_REMAP, which is addressed in the second
      patch.
      
      I chose two patches because it is easier to backport and it explicitly
      reverts bogus behaviour.  Both patches ought to be in -stable and ltp
      testcases need to be updated (the added testcase around the cve can be
      modified to just test for SHM_RND|SHM_REMAP).
      
      [1] lkml.kernel.org/r/20180430172152.nfa564pvgpk3ut7p@linux-n805
      
      This patch (of 2):
      
      Commit 95e91b83 ("ipc/shm: Fix shmat mmap nil-page protection")
      worked on the idea that we should not be mapping as root addr=0 and
      MAP_FIXED.  However, it was reported that this scenario is in fact
      valid, which makes the patch both bogus and a userspace regression.
      
      For example X11's libint10.so relies on shmat(1, SHM_RND) for lowmem
      initialization[1].
      
      [1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/int10/linux.c#n347
      Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net
      Fixes: 95e91b83 ("ipc/shm: Fix shmat mmap nil-page protection")
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
      Reported-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent · 9de92451
      Joe Jin authored
      commit 4855c92d upstream.
      
      When running raidconfig from Dom0 we found that the Xen DMA heap is reduced,
      but the Dom heap is increased by the same size. Tracing raidconfig we found
      that the related ioctl() in megaraid_sas calls dma_alloc_coherent()
      to allocate memory. If the memory allocated by Dom0 is not in the DMA area,
      it exchanges memory with Xen to meet the requirement. Later, drivers
      call dma_free_coherent() to free the memory. In xen_swiotlb_free_coherent()
      the check condition (dev_addr + size - 1 <= dma_mask) is always false,
      which prevents calling xen_destroy_contiguous_region() to return the memory
      to the Xen DMA heap.
      
      This issue was introduced by commit 6810df88 "xen-swiotlb: When doing
      coherent alloc/dealloc check before swizzling the MFNs.".
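
      The condition quoted in the message compares the last byte of a buffer
      against a DMA mask. The sketch below only illustrates how such a range
      check can be evaluated in user space without integer wraparound; the
      helper name is hypothetical and it does not reproduce the kernel's
      actual fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): true when the buffer
 * [dev_addr, dev_addr + size) lies entirely at or below dma_mask.
 * Written so that dev_addr + size - 1 is never computed directly,
 * which avoids wraparound for addresses near UINT64_MAX.
 */
static bool toy_range_within_mask(uint64_t dev_addr, uint64_t size,
                                  uint64_t dma_mask)
{
    if (size == 0)
        return false;               /* empty range: treated as invalid here */
    if (size - 1 > dma_mask)
        return false;               /* longer than the addressable window */
    return dev_addr <= dma_mask - (size - 1);
}
```

      With a 32-bit mask, a 4 KiB buffer at 0xfffff000 passes the check, while
      an 8 KiB buffer at the same address does not.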
      Signed-off-by: Joe Jin <joe.jin@oracle.com>
      Tested-by: John Sobecki <john.sobecki@oracle.com>
      Reviewed-by: Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • libata: blacklist Micron 500IT SSD with MU01 firmware · 03d08c08
      Sudip Mukherjee authored
      commit 136d769e upstream.
      
      While whitelisting Micron M500DC drives, the tweaked blacklist entry
      also enabled queued TRIM for M500IT variants, but these do not support
      queued TRIM. While using those SSDs with the latest kernel we have
      seen errors and even the partition table getting corrupted.
      
      Part of the dmesg output:
      [    6.727384] ata1.00: ATA-9: Micron_M500IT_MTFDDAK060MBD, MU01, max UDMA/133
      [    6.727390] ata1.00: 117231408 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
      [    6.741026] ata1.00: supports DRM functions and may not be fully accessible
      [    6.759887] ata1.00: configured for UDMA/133
      [    6.762256] scsi 0:0:0:0: Direct-Access     ATA      Micron_M500IT_MT MU01 PQ: 0 ANSI: 5
      
      and then for the error:
      [  120.860334] ata1.00: exception Emask 0x1 SAct 0x7ffc0007 SErr 0x0 action 0x6 frozen
      [  120.860338] ata1.00: irq_stat 0x40000008
      [  120.860342] ata1.00: failed command: SEND FPDMA QUEUED
      [  120.860351] ata1.00: cmd 64/01:00:00:00:00/00:00:00:00:00/a0 tag 0 ncq dma 512 out
               res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x5 (timeout)
      [  120.860353] ata1.00: status: { DRDY }
      [  120.860543] ata1: hard resetting link
      [  121.166128] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
      [  121.166376] ata1.00: supports DRM functions and may not be fully accessible
      [  121.186238] ata1.00: supports DRM functions and may not be fully accessible
      [  121.204445] ata1.00: configured for UDMA/133
      [  121.204454] ata1.00: device reported invalid CHS sector 0
      [  121.204541] sd 0:0:0:0: [sda] tag#18 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
      [  121.204546] sd 0:0:0:0: [sda] tag#18 Sense Key : 0x5 [current]
      [  121.204550] sd 0:0:0:0: [sda] tag#18 ASC=0x21 ASCQ=0x4
      [  121.204555] sd 0:0:0:0: [sda] tag#18 CDB: opcode=0x93 93 08 00 00 00 00 00 04 28 80 00 00 00 30 00 00
      [  121.204559] print_req_error: I/O error, dev sda, sector 272512
      
      After a few reboots with these errors, the SSD is corrupted.
      After blacklisting it, the errors are no longer seen and the SSD does not
      get corrupted any more.
      
      Fixes: 243918be ("libata: Do not blacklist Micron M500DC")
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • libata: Blacklist some Sandisk SSDs for NCQ · 3b56232f
      Tejun Heo authored
      commit 322579dc upstream.
      
      Sandisk SSDs SD7SN6S256G and SD8SN8U256G are regularly locking up
      under sustained moderate load with NCQ enabled.  Blacklist for now.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Dave Jones <davej@codemonkey.org.uk>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register · 089440aa
      Corneliu Doban authored
      commit 5f651b87 upstream.
      
      When the host controller accepts only 32bit writes, the value of the
      16bit TRANSFER_MODE register, which shares a 32bit address with the
      16bit COMMAND register, needs to be saved and written in a single
      32bit write together with the command, as this write triggers the
      host to send the command on the SD interface.
      
      When sending the tuning command, TRANSFER_MODE is written and then
      sdhci_set_transfer_mode reads it back to clear the AUTO_CMD12 bit and
      writes it again. This results in a wrong value being written, because
      the initial write was only saved in a shadow and the read-back returned
      a wrong value from the register.
      
      Fix sdhci_iproc_readw to return the saved value of TRANSFER_MODE
      when a saved value exists.
      
      Apply the same fix to reads of the BLOCK_SIZE and BLOCK_COUNT registers,
      which are saved for a different reason, although a scenario that would
      cause the mentioned problem on these registers is not probable.
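
      The shadow-register behaviour described above can be modelled in user
      space. This is a minimal sketch, assuming a toy host where TRANSFER_MODE
      occupies the low half and COMMAND the high half of one 32bit word; the
      struct and function names are hypothetical and are not the driver's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a controller that only accepts 32-bit writes. */
struct toy_host {
    uint32_t hw_reg;        /* TRANSFER_MODE in bits 15:0, COMMAND in bits 31:16 */
    uint16_t shadow_xfer;   /* pending TRANSFER_MODE value, not yet in hardware */
    bool     shadow_valid;  /* true while shadow_xfer holds a pending value */
};

/* A 16-bit TRANSFER_MODE write is only recorded in the shadow. */
static void toy_write_transfer_mode(struct toy_host *h, uint16_t val)
{
    h->shadow_xfer = val;
    h->shadow_valid = true;
}

/* The COMMAND write flushes shadow and command in one 32-bit store. */
static void toy_write_command(struct toy_host *h, uint16_t cmd)
{
    uint16_t mode = h->shadow_valid ? h->shadow_xfer
                                    : (uint16_t)(h->hw_reg & 0xffff);

    h->hw_reg = ((uint32_t)cmd << 16) | mode;
    h->shadow_valid = false;
}

/* Reads prefer the pending shadow value over the stale hardware value. */
static uint16_t toy_read_transfer_mode(const struct toy_host *h)
{
    if (h->shadow_valid)
        return h->shadow_xfer;
    return (uint16_t)(h->hw_reg & 0xffff);
}
```

      In this model a read-modify-write of TRANSFER_MODE between the two
      writes sees the pending value, which is the property the fix restores.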
      
      Fixes: b580c52d ("mmc: sdhci-iproc: add IPROC SDHCI driver")
      Signed-off-by: Corneliu Doban <corneliu.doban@broadcom.com>
      Signed-off-by: Scott Branden <scott.branden@broadcom.com>
      Cc: stable@vger.kernel.org # v4.1+
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: timer: Fix pause event notification · c04a69fb
      Ben Hutchings authored
      commit 3ae18097 upstream.
      
      Commit f65e0d29 ("ALSA: timer: Call notifier in the same spinlock")
      combined the start/continue and stop/pause functions, and in doing so
      changed the event code for the pause case to SNDRV_TIMER_EVENT_CONTINUE.
      Change it back to SNDRV_TIMER_EVENT_PAUSE.
      
      Fixes: f65e0d29 ("ALSA: timer: Call notifier in the same spinlock")
      Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Cc: stable@vger.kernel.org
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • aio: fix io_destroy(2) vs. lookup_ioctx() race · 1ac501e0
      Al Viro authored
      commit baf10564 upstream.
      
      kill_ioctx() used to have an explicit RCU delay between removing the
      reference from ->ioctx_table and percpu_ref_kill() dropping the refcount.
      At some point that delay had been removed, on the theory that
      percpu_ref_kill() itself contained an RCU delay.  Unfortunately, that was
      the wrong kind of RCU delay and it didn't care about rcu_read_lock() used
      by lookup_ioctx().  As the result, we could get ctx freed right under
      lookup_ioctx().  Tejun has fixed that in a6d7cff4 ("fs/aio: Add explicit
      RCU grace period when freeing kioctx"); however, that fix is not enough.
      
      Suppose io_destroy() from one thread races with e.g. io_setup() from another;
      CPU1 removes the reference from current->mm->ioctx_table[...] just as CPU2
      has picked it (under rcu_read_lock()).  Then CPU1 proceeds to drop the
      refcount, getting it to 0 and triggering a call of free_ioctx_users(),
      which proceeds to drop the secondary refcount and once that reaches zero
      calls free_ioctx_reqs().  That does
              INIT_RCU_WORK(&ctx->free_rwork, free_ioctx);
              queue_rcu_work(system_wq, &ctx->free_rwork);
      and schedules freeing the whole thing after RCU delay.
      
      In the meanwhile CPU2 has gotten around to percpu_ref_get(), bumping the
      refcount from 0 to 1 and returned the reference to io_setup().
      
      Tejun's fix (that queue_rcu_work() in there) guarantees that ctx won't get
      freed until after percpu_ref_get().  Sure, we'd increment the counter before
      ctx can be freed.  Now we are out of rcu_read_lock() and there's nothing to
      stop freeing of the whole thing.  Unfortunately, CPU2 assumes that since it
      has grabbed the reference, ctx is *NOT* going away until it gets around to
      dropping that reference.
      
      The fix is obvious - use percpu_ref_tryget_live() and treat failure as miss.
      It's not costlier than what we currently do in normal case, it's safe to
      call since freeing *is* delayed and it closes the race window - either
      lookup_ioctx() comes before percpu_ref_kill() (in which case ctx->users
      won't reach 0 until the caller of lookup_ioctx() drops it) or lookup_ioctx()
      fails, ctx->users is unaffected and caller of lookup_ioctx() doesn't see
      the object in question at all.
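
      The difference between an unconditional get and a "tryget live" can be
      shown with a toy counter. This is a single-threaded sketch with
      hypothetical names; it ignores per-CPU counters, atomics and RCU, so it
      models only the success/failure rule, not the kernel's percpu_ref.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy, single-threaded stand-in for a refcount with a "killed" state. */
struct toy_ref {
    long count;   /* outstanding references */
    bool dead;    /* set once the owner has killed the ref */
};

static void toy_ref_init(struct toy_ref *r)
{
    r->count = 1;          /* the initial reference held by the table */
    r->dead = false;
}

/* Unconditional get: succeeds even after kill, reviving a dying object. */
static void toy_ref_get(struct toy_ref *r)
{
    r->count++;
}

/* Conditional get: refuses once the ref has been killed. */
static bool toy_ref_tryget_live(struct toy_ref *r)
{
    if (r->dead)
        return false;
    r->count++;
    return true;
}

static void toy_ref_put(struct toy_ref *r)
{
    r->count--;
}

/* Kill: mark dead and drop the initial reference. */
static void toy_ref_kill(struct toy_ref *r)
{
    r->dead = true;
    r->count--;
}
```

      After toy_ref_kill(), toy_ref_tryget_live() fails and leaves the count
      at zero, whereas toy_ref_get() would bump it from 0 to 1, which is the
      situation the message describes.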
      
      Cc: stable@kernel.org
      Fixes: a6d7cff4 "fs/aio: Add explicit RCU grace period when freeing kioctx"
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • affs_lookup(): close a race with affs_remove_link() · 4d21e6e8
      Al Viro authored
      commit 30da870c upstream.
      
      we unlock the directory hash too early - if we are looking at secondary
      link and primary (in another directory) gets removed just as we unlock,
      we could have the old primary moved in place of the secondary, leaving
      us to look into freed entry (and leaving our dentry with ->d_fsdata
      pointing to a freed entry).
      
      Cc: stable@vger.kernel.org # 2.4.4+
      Acked-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable" · bf8ac80f
      Colin Ian King authored
      commit ba3696e9 upstream.
      
      Trivial fix to spelling mistake in debugfs_entries text.
      
      Fixes: 669e846e ("KVM/MIPS32: MIPS arch specific APIs for KVM")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kernel-janitors@vger.kernel.org
      Cc: <stable@vger.kernel.org> # 3.10+
      Signed-off-by: James Hogan <jhogan@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs · f363b0ca
      Maciej W. Rozycki authored
      commit 9a3a92cc upstream.
      
      Check the TIF_32BIT_FPREGS task setting of the tracee rather than the
      tracer in determining the layout of floating-point general registers in
      the floating-point context, correcting access to odd-numbered registers
      for o32 tracees where the setting disagrees between the two processes.
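
      Why the layout matters for odd-numbered registers can be sketched with
      a toy register file. This assumes the usual 32-bit FP register model in
      which an odd register is held in the upper half of the preceding even
      64-bit slot; the function name and the boolean parameter are
      hypothetical stand-ins for the tracee's TIF_32BIT_FPREGS setting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy read of floating-point register n from a 32-entry, 64-bit-wide
 * save area. The layout is chosen from the tracee's setting, which is
 * passed in explicitly here.
 */
static uint64_t toy_peek_fpr(const uint64_t fpr[32], unsigned int n,
                             bool tracee_32bit_fpregs)
{
    if (tracee_32bit_fpregs) {
        /* Odd registers live in the high word of the even slot below. */
        return (uint32_t)(fpr[n & ~1u] >> (32 * (n & 1u)));
    }
    /* 64-bit layout: every register has its own slot. */
    return fpr[n];
}
```

      Reading register 1 therefore returns different data depending on which
      layout is assumed, so taking the setting from the wrong process gives a
      wrong result whenever tracer and tracee disagree.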
      
      Fixes: 597ce172 ("MIPS: Support for 64-bit FP with O32 binaries")
      Signed-off-by: Maciej W. Rozycki <macro@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: <stable@vger.kernel.org> # 3.14+
      Signed-off-by: James Hogan <jhogan@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • MIPS: ptrace: Expose FIR register through FP regset · 422494a9
      Maciej W. Rozycki authored
      commit 71e909c0 upstream.
      
      Correct commit 7aeb753b ("MIPS: Implement task_user_regset_view.")
      and expose the FIR register using the unused 4 bytes at the end of the
      NT_PRFPREG regset.  Without that register included clients cannot use
      the PTRACE_GETREGSET request to retrieve the complete FPU register set
      and have to resort to one of the older interfaces, either PTRACE_PEEKUSR
      or PTRACE_GETFPREGS, to retrieve the missing piece of data.  Also the
      register is irreversibly missing from core dumps.
      
      This register is architecturally hardwired and read-only so the write
      path does not matter.  Ignore data supplied on writes then.
      
      Fixes: 7aeb753b ("MIPS: Implement task_user_regset_view.")
      Signed-off-by: James Hogan <jhogan@kernel.org>
      Signed-off-by: Maciej W. Rozycki <macro@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: <stable@vger.kernel.org> # 3.13+
      Patchwork: https://patchwork.linux-mips.org/patch/19273/
      Signed-off-by: James Hogan <jhogan@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 26 May, 2018 29 commits
    • Linux 4.4.133 · 7620164e
      Greg Kroah-Hartman authored
    • x86/kexec: Avoid double free_page() upon do_kexec_load() failure · eef045e7
      Tetsuo Handa authored
      commit a466ef76 upstream.
      
      >From ff82bedd3e12f0d3353282054ae48c3bd8c72012 Mon Sep 17 00:00:00 2001
      From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Date: Wed, 9 May 2018 12:12:39 +0900
      Subject: x86/kexec: Avoid double free_page() upon do_kexec_load() failure
      
      syzbot is reporting crashes after memory allocation failure inside
      do_kexec_load() [1]. This is because free_transition_pgtable() is called
      by both init_transition_pgtable() and machine_kexec_cleanup() when memory
      allocation failed inside init_transition_pgtable().
      
      Regarding 32bit code, machine_kexec_free_page_tables() is called by both
      machine_kexec_alloc_page_tables() and machine_kexec_cleanup() when memory
      allocation failed inside machine_kexec_alloc_page_tables().
      
      Fix this by leaving the error handling to machine_kexec_cleanup()
      (and optionally setting NULL after free_page()).
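
      The pattern of leaving error handling to a single cleanup function, and
      clearing pointers after freeing, can be sketched in user space. The
      names below are hypothetical and malloc()/free() stand in for the page
      allocator; this is an illustration of the idea, not the kexec code.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_image {
    void *pgd;
    void *pud;
};

/* Idempotent cleanup: free(NULL) is a no-op and pointers are cleared. */
static void toy_free_pgtable(struct toy_image *img)
{
    free(img->pgd);
    img->pgd = NULL;
    free(img->pud);
    img->pud = NULL;
}

/*
 * On failure, return without freeing anything. The caller's cleanup
 * path is the only place that frees, so nothing is freed twice.
 * fail_second simulates an allocation failure for the second table.
 */
static int toy_init_pgtable(struct toy_image *img, int fail_second)
{
    img->pgd = malloc(64);
    if (!img->pgd)
        return -1;
    img->pud = fail_second ? NULL : malloc(64);
    if (!img->pud)
        return -1;
    return 0;
}
```

      Because the cleanup function resets each pointer, calling it a second
      time on the same image is harmless.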
      
      [1] https://syzkaller.appspot.com/bug?id=91e52396168cf2bdd572fe1e1bc0bc645c1c6b40
      
      Fixes: f5deb796 ("x86: kexec: Use one page table in x86_64 machine_kexec")
      Fixes: 92be3d6b ("kexec/i386: allocate page table pages dynamically")
      Reported-by: syzbot <syzbot+d96f60296ef613fe1d69@syzkaller.appspotmail.com>
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: thomas.lendacky@amd.com
      Cc: prudo@linux.vnet.ibm.com
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: syzkaller-bugs@googlegroups.com
      Cc: takahiro.akashi@linaro.org
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: akpm@linux-foundation.org
      Cc: dyoung@redhat.com
      Cc: kirill.shutemov@linux.intel.com
      Link: https://lkml.kernel.org/r/201805091942.DGG12448.tMFVFSJFQOOLHO@I-love.SAKURA.ne.jp
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • hfsplus: stop workqueue when fill_super() failed · 338762ca
      Tetsuo Handa authored
      commit 66072c29 upstream.
      
      syzbot is reporting ODEBUG messages at hfsplus_fill_super() [1].  This
      is because hfsplus_fill_super() forgot to call cancel_delayed_work_sync().
      
      As far as I can see, it is hfsplus_mark_mdb_dirty() from
      hfsplus_new_inode() in hfsplus_fill_super() that calls
      queue_delayed_work().  Therefore, I assume that hfsplus_new_inode() does
      not fail if queue_delayed_work() was called, and the out_put_hidden_dir
      label is the appropriate location to call cancel_delayed_work_sync().
      
      [1] https://syzkaller.appspot.com/bug?id=a66f45e96fdbeb76b796bf46eb25ea878c42a6c9
      
      Link: http://lkml.kernel.org/r/964a8b27-cd69-357c-fe78-76b066056201@I-love.SAKURA.ne.jp
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: syzbot <syzbot+4f2e5f086147d543ab03@syzkaller.appspotmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com>
      Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cfg80211: limit wiphy names to 128 bytes · 87c807f1
      Johannes Berg authored
      commit a7cfebcb upstream.
      
      There's currently no limit on wiphy names, other than netlink
      message size and memory limitations, but that causes issues when,
      for example, the wiphy name is used in a uevent, e.g. in rfkill
      where we use the same name for the rfkill instance, and then the
      buffer there is "only" 2k for the environment variables.
      
      This was reported by syzkaller, which used a 4k name.
      
      Limit the name to something reasonable, I randomly picked 128.
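
      A length limit of this kind amounts to a simple validation step before
      the name is accepted. The sketch below uses a hypothetical constant and
      function name, and assumes the limit applies to the string length
      excluding the terminating NUL; the message does not say which is meant.

```c
#include <assert.h>
#include <string.h>

#define TOY_WIPHY_NAME_MAXLEN 128   /* limit named in the commit title */

/* Returns 0 when the name is acceptable, -1 when it is too long. */
static int toy_check_wiphy_name(const char *name)
{
    if (strlen(name) > TOY_WIPHY_NAME_MAXLEN)
        return -1;
    return 0;
}
```

      Rejecting the name up front keeps later consumers with fixed-size
      buffers, such as the 2k uevent environment mentioned above, from seeing
      an oversized string.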
      
      Reported-by: syzbot+230d9e642a85d3fec29c@syzkaller.appspotmail.com
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • gpio: rcar: Add Runtime PM handling for interrupts · c610b0cb
      Geert Uytterhoeven authored
      commit b26a719b upstream.
      
      The R-Car GPIO driver handles Runtime PM for requested GPIOs only.
      
      When using a GPIO purely as an interrupt source, no Runtime PM handling
      is done, and the GPIO module's clock may not be enabled.
      
      To fix this:
        - Add .irq_request_resources() and .irq_release_resources() callbacks
          to handle Runtime PM when an interrupt is requested,
        - Add irq_bus_lock() and sync_unlock() callbacks to handle Runtime PM
          when e.g. disabling/enabling an interrupt, or configuring the
          interrupt type.
      
      Fixes: d5c3d846 "net: phy: Avoid polling PHY with PHY_IGNORE_INTERRUPTS"
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      [fabrizio: cherry-pick to v4.4.y. Use container_of instead of
      gpiochip_get_data.]
      Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com>
      Reviewed-by: Biju Das <biju.das@bp.renesas.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting · 09f7ebaa
      John Stultz authored
      commit 3d88d56c upstream.
      
      Due to how the MONOTONIC_RAW accumulation logic was handled,
      there is the potential for a 1ns discontinuity when we do
      accumulations. This small discontinuity has for the most part
      gone un-noticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW
      in their vDSO clock_gettime implementation, we've seen failures
      with the inconsistency-check test in kselftest.
      
      This patch addresses the issue by using the same sub-ns
      accumulation handling that CLOCK_MONOTONIC uses, which avoids
      the issue for in-kernel users.
      
      Since the ARM64 vDSO implementation has its own clock_gettime
      calculation logic, this patch reduces the frequency of errors,
      but failures are still seen. The ARM64 vDSO will need to be
      updated to include the sub-nanosecond xtime_nsec values in its
      calculation for this issue to be completely fixed.
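
      The effect of carrying sub-nanosecond remainders can be illustrated
      with a fixed-point accumulator. This is a toy sketch with hypothetical
      names, in which nanoseconds are stored left-shifted by a fixed number
      of fractional bits; it is not the timekeeping code itself.

```c
#include <assert.h>
#include <stdint.h>

struct toy_clock {
    uint64_t sec;           /* whole seconds */
    uint64_t nsec_shifted;  /* nanoseconds << shift, fraction included */
    uint32_t shift;         /* number of fractional bits */
};

/* Add one interval, keeping the fractional part across calls. */
static void toy_accumulate(struct toy_clock *c, uint64_t interval_shifted)
{
    uint64_t sec_shifted = (uint64_t)1000000000 << c->shift;

    c->nsec_shifted += interval_shifted;
    while (c->nsec_shifted >= sec_shifted) {
        c->nsec_shifted -= sec_shifted;   /* carry into seconds */
        c->sec++;
    }
}

/* Whole nanoseconds currently accumulated within the second. */
static uint64_t toy_nsec(const struct toy_clock *c)
{
    return c->nsec_shifted >> c->shift;
}
```

      With 8 fractional bits, two intervals of 1.5 ns (0x180) accumulate to
      3 ns, whereas truncating each interval to whole nanoseconds first would
      give 2 ns; that lost remainder is the kind of 1ns discontinuity
      described above.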
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Tested-by: Daniel Mentz <danielmentz@google.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: "stable #4 . 8+" <stable@vger.kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Link: http://lkml.kernel.org/r/1496965462-20003-3-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      [fabrizio: cherry-pick to 4.4. Kept cycle_t type for function
      logarithmic_accumulation local variable "interval". Dropped
      casting of "interval" variable]
      Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com>
      Signed-off-by: Biju Das <biju.das@bp.renesas.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dmaengine: ensure dmaengine helpers check valid callback · 92cffdc9
      Vinod Koul authored
      commit 757d12e5 upstream.
      
      dmaengine has various device callbacks and exposes helper
      functions to invoke these. These helpers should check that the channel,
      device and callback are valid before invoking them.
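
      A helper of the kind described might validate each level of
      indirection before calling through the function pointer. The types,
      names and the -ENOSYS return value below are assumptions made for
      illustration; they are not taken from the dmaengine headers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_dma_chan;

/* Hypothetical device with one optional callback. */
struct toy_dma_device {
    int (*device_terminate_all)(struct toy_dma_chan *chan);
};

struct toy_dma_chan {
    struct toy_dma_device *device;
};

/* Helper: check channel, device and callback before invoking it. */
static int toy_terminate_all(struct toy_dma_chan *chan)
{
    if (!chan || !chan->device || !chan->device->device_terminate_all)
        return -ENOSYS;   /* assumed error code for a missing callback */
    return chan->device->device_terminate_all(chan);
}

/* Example callback used to exercise the success path. */
static int toy_terminate_ok(struct toy_dma_chan *chan)
{
    (void)chan;
    return 0;
}
```

      Callers then get an error code instead of a NULL pointer dereference
      when a driver does not implement the callback.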
      Reported-by: Jon Hunter <jonathanh@nvidia.com>
      Signed-off-by: Vinod Koul <vinod.koul@intel.com>
      Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com>
      Signed-off-by: Jianming Qiao <jianming.qiao@bp.renesas.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scsi: zfcp: fix infinite iteration on ERP ready list · 36797134
      Jens Remus authored
      commit fa89adba upstream.
      
      zfcp_erp_adapter_reopen() schedules blocking of all of the adapter's
      rports via zfcp_scsi_schedule_rports_block() and enqueues a reopen
      adapter ERP action via zfcp_erp_action_enqueue(). Both are separately
      processed asynchronously and concurrently.
      
      Blocking of rports is done in a kworker by zfcp_scsi_rport_work(). It
      calls zfcp_scsi_rport_block(), which then traces a DBF REC "scpdely" via
      zfcp_dbf_rec_trig().  zfcp_dbf_rec_trig() acquires the DBF REC spin lock
      and then iterates with list_for_each() over the adapter's ERP ready list
      without holding the ERP lock. This opens a race window in which the
      current list entry can be moved to another list, causing list_for_each()
      to iterate forever on the wrong list, as the erp_ready_head is never
      encountered as terminal condition.
      
      Meanwhile the ERP action can be processed in the ERP thread by
      zfcp_erp_thread(). It calls zfcp_erp_strategy(), which acquires the ERP
      lock and then calls zfcp_erp_action_to_running() to move the ERP action
      from the ready to the running list.  zfcp_erp_action_to_running() can
      move the ERP action using list_move() just during the aforementioned
      race window. It then traces a REC RUN "erator1" via zfcp_dbf_rec_run().
      zfcp_dbf_rec_run() tries to acquire the DBF REC spin lock. If this is
      held by the infinitely looping kworker, it effectively spins forever.
      
      Example Sequence Diagram:
      
      Process                ERP Thread             rport_work
      -------------------    -------------------    -------------------
      zfcp_erp_adapter_reopen()
      zfcp_erp_adapter_block()
      zfcp_scsi_schedule_rports_block()
      lock ERP                                      zfcp_scsi_rport_work()
      zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_ADAPTER)
      list_add_tail() on ready                      !(rport_task==RPORT_ADD)
      wake_up() ERP thread                          zfcp_scsi_rport_block()
      zfcp_dbf_rec_trig()    zfcp_erp_strategy()    zfcp_dbf_rec_trig()
      unlock ERP                                    lock DBF REC
      zfcp_erp_wait()        lock ERP
      |                      zfcp_erp_action_to_running()
      |                                             list_for_each() ready
      |                      list_move()              current entry
      |                        ready to running
      |                      zfcp_dbf_rec_run()       endless loop over running
      |                      zfcp_dbf_rec_run_lvl()
      |                      lock DBF REC spins forever
      
      Any adapter recovery can trigger this, such as setting the device offline
      or reboot.
      
      V4.9 commit 4eeaa4f3 ("zfcp: close window with unblocked rport
      during rport gone") introduced additional tracing of (un)blocking of
      rports. It missed that the adapter->erp_lock must be held when calling
      zfcp_dbf_rec_trig().
      
      This fix uses the approach formerly introduced by commit aa0fec62
      ("[SCSI] zfcp: Fix sparse warning by providing new entry in dbf") that got
      later removed by commit ae0904f6 ("[SCSI] zfcp: Redesign of the debug
      tracing for recovery actions.").
      
      Introduce zfcp_dbf_rec_trig_lock(), a wrapper for zfcp_dbf_rec_trig() that
      acquires and releases the adapter->erp_lock for read.
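
      The endless iteration described earlier in this message follows from
      how a circular list walk terminates: it stops only when it meets the
      head it started from. The toy below uses hypothetical names and a step
      limit in place of a real concurrent thread, and moves the current entry
      to another list in the middle of the walk.

```c
#include <assert.h>
#include <stddef.h>

struct toy_node {
    struct toy_node *next, *prev;
};

static void toy_list_init(struct toy_node *head)
{
    head->next = head->prev = head;
}

static void toy_list_add_tail(struct toy_node *n, struct toy_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void toy_list_del(struct toy_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/*
 * Walk the list at head. If other is non-NULL, move victim onto other
 * when the walk reaches it, imitating a concurrent list_move().
 * Returns the number of entries visited, or -1 once limit is exceeded.
 */
static int toy_walk(struct toy_node *head, struct toy_node *victim,
                    struct toy_node *other, int limit)
{
    int steps = 0;

    for (struct toy_node *p = head->next; p != head; p = p->next) {
        if (++steps > limit)
            return -1;              /* would have looped forever */
        if (other && p == victim) {
            toy_list_del(p);
            toy_list_add_tail(p, other);
        }
    }
    return steps;
}
```

      Once the entry has been moved, the walk circulates on the other list
      and never sees the original head again, which is why the iteration has
      to be serialized against the move by a lock.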
      Reported-by: Sebastian Ott <sebott@linux.ibm.com>
      Signed-off-by: Jens Remus <jremus@linux.ibm.com>
      Fixes: 4eeaa4f3 ("zfcp: close window with unblocked rport during rport gone")
      Cc: <stable@vger.kernel.org> # 2.6.32+
      Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: Steffen Maier <maier@linux.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scsi: sg: allocate with __GFP_ZERO in sg_build_indirect() · 93314640
      Alexander Potapenko authored
      commit a45b599a upstream.
      
      This shall help avoid copying uninitialized memory to the userspace when
      calling ioctl(fd, SG_IO) with an empty command.
      
      Reported-by: syzbot+7d26fc1eea198488deab@syzkaller.appspotmail.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Acked-by: Douglas Gilbert <dgilbert@interlog.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scsi: libsas: defer ata device eh commands to libata · 6efcc74e
      Jason Yan authored
      commit 318aaf34 upstream.
      
      When an ata device is doing EH, some commands still attached to tasks are
      not passed to libata when abort or recovery fails, so libata does
      not handle these commands. After these commands are done, the sas task is
      freed but the ata qc is not. This causes an ata qc leak and triggers a
      warning like the one below:
      
      WARNING: CPU: 0 PID: 28512 at drivers/ata/libata-eh.c:4037
      ata_eh_finish+0xb4/0xcc
      CPU: 0 PID: 28512 Comm: kworker/u32:2 Tainted: G     W  OE 4.14.0#1
      ......
      Call trace:
      [<ffff0000088b7bd0>] ata_eh_finish+0xb4/0xcc
      [<ffff0000088b8420>] ata_do_eh+0xc4/0xd8
      [<ffff0000088b8478>] ata_std_error_handler+0x44/0x8c
      [<ffff0000088b8068>] ata_scsi_port_error_handler+0x480/0x694
      [<ffff000008875fc4>] async_sas_ata_eh+0x4c/0x80
      [<ffff0000080f6be8>] async_run_entry_fn+0x4c/0x170
      [<ffff0000080ebd70>] process_one_work+0x144/0x390
      [<ffff0000080ec100>] worker_thread+0x144/0x418
      [<ffff0000080f2c98>] kthread+0x10c/0x138
      [<ffff0000080855dc>] ret_from_fork+0x10/0x18
      
      If too many ata qcs are leaked, ata tag allocation will fail and I/O will
      be blocked forever.
      
      As suggested by Dan Williams, defer ata device commands to libata and
      merge sas_eh_finish_cmd() with sas_eh_defer_cmd(). libata will handle
      ata qcs correctly after this.
      Signed-off-by: Jason Yan <yanaijie@huawei.com>
      CC: Xiaofei Tan <tanxiaofei@huawei.com>
      CC: John Garry <john.garry@huawei.com>
      CC: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Martin Schwidefsky's avatar
      s390: use expoline thunks in the BPF JIT · 8c6d306f
      Martin Schwidefsky authored
      [ Upstream commit de5cb6eb ]
      
      The BPF JIT needs safeguarding against spectre v2 in the sk_load_xxx
      assembler stubs, and the indirect branches generated by the JIT itself
      need to be converted to expolines.
      
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Martin Schwidefsky's avatar
      s390: extend expoline to BC instructions · f436cb96
      Martin Schwidefsky authored
      [ Upstream commit 6deaa3bb ]
      
      The BPF JIT uses a 'b <disp>(%r<x>)' instruction in the definition
      of the sk_load_word and sk_load_half functions.
      
      Add support for branch-on-condition instructions contained in the
      thunk code of an expoline.
      
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Martin Schwidefsky's avatar
      s390: move spectre sysfs attribute code · c617e74f
      Martin Schwidefsky authored
      [ Upstream commit 4253b0e0 ]
      
      The nospec-branch.c file is compiled without the gcc options to
      generate expoline thunks. The return branch of the sysfs show
      functions cpu_show_spectre_v1 and cpu_show_spectre_v2 is an indirect
      branch as well. These need to be compiled with expolines.
      
      Move the sysfs functions for spectre reporting to a separate file
      and lose a '.' from one of the messages.
      
      Cc: stable@vger.kernel.org # 4.16
      Fixes: d424986f ("s390: add sysfs attributes for spectre")
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Martin Schwidefsky's avatar
      s390/kernel: use expoline for indirect branches · 90305465
      Martin Schwidefsky authored
      [ Upstream commit c50c84c3 ]
      
      The assembler code in arch/s390/kernel uses a few more indirect branches
      which need to be done with execute trampolines for CONFIG_EXPOLINE=y.
      
      Cc: stable@vger.kernel.org # 4.16
      Fixes: f19fbd5e ("s390: introduce execute-trampolines for branches")
      Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Martin Schwidefsky's avatar
      s390/lib: use expoline for indirect branches · 5ce9dc0f
      Martin Schwidefsky authored
      [ Upstream commit 97489e06 ]
      
      The returns from the memmove, memset, memcpy, __memset16, __memset32 and
      __memset64 functions are done with "br %r14". These are indirect branches
      as well and need to use execute trampolines for CONFIG_EXPOLINE=y.
      
      Cc: stable@vger.kernel.org # 4.16
      Fixes: f19fbd5e ("s390: introduce execute-trampolines for branches")
      Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Martin Schwidefsky's avatar
      s390: move expoline assembler macros to a header · 73bf2b1c
      Martin Schwidefsky authored
      [ Upstream commit 6dd85fbb ]
      
      To be able to use the expoline branches in different assembler
      files move the associated macros from entry.S to a new header
      nospec-insn.h.
      
      While we are at it make the macros a bit nicer to use.
      
      Cc: stable@vger.kernel.org # 4.16
      Fixes: f19fbd5e ("s390: introduce execute-trampolines for branches")
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Martin Schwidefsky's avatar
      s390: add assembler macros for CPU alternatives · 2685d33c
      Martin Schwidefsky authored
      [ Upstream commit fba9eb79 ]
      
      Add a header with macros usable in assembler files to emit alternative
      code sequences. It works analogously to the alternatives for inline
      assemblies in C files, with the same restrictions and capabilities.
      The syntax is
      
           ALTERNATIVE "<default instructions sequence>", \
      		 "<alternative instructions sequence>", \
      		 "<feature-bit>"
      and
      
           ALTERNATIVE_2 "<default instructions sequence>", \
      		   "<alternative instructions sequence #1>", \
      		   "<feature-bit #1>", \
      		   "<alternative instructions sequence #2>", \
      		   "<feature-bit #2>"
      
      Reviewed-by: Vasily Gorbik <gor@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Al Viro's avatar
      ext2: fix a block leak · 3cd868dc
      Al Viro authored
      commit 5aa1437d upstream.
      
      open file, unlink it, then use ioctl(2) to make it immutable or
      append only.  Now close it and watch the blocks *not* freed...
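      
      The sequence above can be sketched as a small userspace helper. This
      is only an illustration, not the reproducer used for the fix: the
      function name ext2_leak_sequence() and its return convention are
      hypothetical, the path is assumed to live on an ext2 mount, and
      setting the immutable flag normally needs CAP_LINUX_IMMUTABLE.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_IMMUTABLE_FL */

/*
 * Hypothetical helper: run the open/unlink/ioctl/close sequence on `path`.
 * Returns 0 if every step succeeded, or a negative errno from the first
 * step that failed (for example -EPERM without CAP_LINUX_IMMUTABLE).
 */
int ext2_leak_sequence(const char *path)
{
	char buf[4096];
	int flags = 0;
	int ret = 0;
	ssize_t n;
	int fd;

	assert(path != NULL);

	fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
	if (fd < 0)
		return -errno;

	/* Write some data so the inode owns at least one block. */
	memset(buf, 0xab, sizeof(buf));
	n = write(fd, buf, sizeof(buf));
	if (n < 0)
		ret = -errno;
	else if ((size_t)n != sizeof(buf))
		ret = -EIO;

	/* Drop the last link; the inode now lives only through fd. */
	if (unlink(path) < 0 && ret == 0)
		ret = -errno;

	/* Mark the already-unlinked inode immutable. */
	if (ret == 0 && ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
		ret = -errno;
	if (ret == 0) {
		flags |= FS_IMMUTABLE_FL;
		if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
			ret = -errno;
	}

	/* Final close: on an affected ext2 the blocks are not freed here. */
	close(fd);
	return ret;
}
```

      Comparing free block counts (e.g. with df) before and after the call
      is one way to observe whether the blocks were released.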
      
      Immutable/append-only checks belong in ->setattr().
      Note: the bug is old and backport to anything prior to 737f2e93
      ("ext2: convert to use the new truncate convention") will need
      these checks lifted into ext2_setattr().
      
      Cc: stable@kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Eric Dumazet's avatar
      tcp: purge write queue in tcp_connect_init() · 5bbe138a
      Eric Dumazet authored
      [ Upstream commit 7f582b24 ]
      
      syzkaller found a reliable way to crash the host, hitting a BUG()
      in __tcp_retransmit_skb()
      
      A malicious MSG_FASTOPEN is the root cause. We need to purge the write
      queue in tcp_connect_init() at the point we init snd_una/write_seq.
      
      This patch also replaces the BUG() by a less intrusive WARN_ON_ONCE()
      
      kernel BUG at net/ipv4/tcp_output.c:2837!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 5276 Comm: syz-executor0 Not tainted 4.17.0-rc3+ #51
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__tcp_retransmit_skb+0x2992/0x2eb0 net/ipv4/tcp_output.c:2837
      RSP: 0000:ffff8801dae06ff8 EFLAGS: 00010206
      RAX: ffff8801b9fe61c0 RBX: 00000000ffc18a16 RCX: ffffffff864e1a49
      RDX: 0000000000000100 RSI: ffffffff864e2e12 RDI: 0000000000000005
      RBP: ffff8801dae073a0 R08: ffff8801b9fe61c0 R09: ffffed0039c40dd2
      R10: ffffed0039c40dd2 R11: ffff8801ce206e93 R12: 00000000421eeaad
      R13: ffff8801ce206d4e R14: ffff8801ce206cc0 R15: ffff8801cd4f4a80
      FS:  0000000000000000(0000) GS:ffff8801dae00000(0063) knlGS:00000000096bc900
      CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
      CR2: 0000000020000000 CR3: 00000001c47b6000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <IRQ>
       tcp_retransmit_skb+0x2e/0x250 net/ipv4/tcp_output.c:2923
       tcp_retransmit_timer+0xc50/0x3060 net/ipv4/tcp_timer.c:488
       tcp_write_timer_handler+0x339/0x960 net/ipv4/tcp_timer.c:573
       tcp_write_timer+0x111/0x1d0 net/ipv4/tcp_timer.c:593
       call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
       expire_timers kernel/time/timer.c:1363 [inline]
       __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
       run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
       __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
       invoke_softirq kernel/softirq.c:365 [inline]
       irq_exit+0x1d1/0x200 kernel/softirq.c:405
       exiting_irq arch/x86/include/asm/apic.h:525 [inline]
       smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
       apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
      
      Fixes: cf60af03 ("net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Eric Dumazet's avatar
      sock_diag: fix use-after-free read in __sk_free · 8e299f7a
      Eric Dumazet authored
      [ Upstream commit 9709020c ]
      
      We must not call sock_diag_has_destroy_listeners(sk) on a socket
      that has no reference on net structure.
      
      BUG: KASAN: use-after-free in sock_diag_has_destroy_listeners include/linux/sock_diag.h:75 [inline]
      BUG: KASAN: use-after-free in __sk_free+0x329/0x340 net/core/sock.c:1609
      Read of size 8 at addr ffff88018a02e3a0 by task swapper/1/0
      
      CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.17.0-rc5+ #54
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1b9/0x294 lib/dump_stack.c:113
       print_address_description+0x6c/0x20b mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
       sock_diag_has_destroy_listeners include/linux/sock_diag.h:75 [inline]
       __sk_free+0x329/0x340 net/core/sock.c:1609
       sk_free+0x42/0x50 net/core/sock.c:1623
       sock_put include/net/sock.h:1664 [inline]
       reqsk_free include/net/request_sock.h:116 [inline]
       reqsk_put include/net/request_sock.h:124 [inline]
       inet_csk_reqsk_queue_drop_and_put net/ipv4/inet_connection_sock.c:672 [inline]
       reqsk_timer_handler+0xe27/0x10e0 net/ipv4/inet_connection_sock.c:739
       call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
       expire_timers kernel/time/timer.c:1363 [inline]
       __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
       run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
       __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
       invoke_softirq kernel/softirq.c:365 [inline]
       irq_exit+0x1d1/0x200 kernel/softirq.c:405
       exiting_irq arch/x86/include/asm/apic.h:525 [inline]
       smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
       apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
       </IRQ>
      RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:54
      RSP: 0018:ffff8801d9ae7c38 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
      RAX: dffffc0000000000 RBX: 1ffff1003b35cf8a RCX: 0000000000000000
      RDX: 1ffffffff11a30d0 RSI: 0000000000000001 RDI: ffffffff88d18680
      RBP: ffff8801d9ae7c38 R08: ffffed003b5e46c3 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
      R13: ffff8801d9ae7cf0 R14: ffffffff897bef20 R15: 0000000000000000
       arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
       default_idle+0xc2/0x440 arch/x86/kernel/process.c:354
       arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:345
       default_idle_call+0x6d/0x90 kernel/sched/idle.c:93
       cpuidle_idle_call kernel/sched/idle.c:153 [inline]
       do_idle+0x395/0x560 kernel/sched/idle.c:262
       cpu_startup_entry+0x104/0x120 kernel/sched/idle.c:368
       start_secondary+0x426/0x5b0 arch/x86/kernel/smpboot.c:269
       secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:242
      
      Allocated by task 4557:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
       kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
       kmem_cache_zalloc include/linux/slab.h:691 [inline]
       net_alloc net/core/net_namespace.c:383 [inline]
       copy_net_ns+0x159/0x4c0 net/core/net_namespace.c:423
       create_new_namespaces+0x69d/0x8f0 kernel/nsproxy.c:107
       unshare_nsproxy_namespaces+0xc3/0x1f0 kernel/nsproxy.c:206
       ksys_unshare+0x708/0xf90 kernel/fork.c:2408
       __do_sys_unshare kernel/fork.c:2476 [inline]
       __se_sys_unshare kernel/fork.c:2474 [inline]
       __x64_sys_unshare+0x31/0x40 kernel/fork.c:2474
       do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 69:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
       net_free net/core/net_namespace.c:399 [inline]
       net_drop_ns.part.14+0x11a/0x130 net/core/net_namespace.c:406
       net_drop_ns net/core/net_namespace.c:405 [inline]
       cleanup_net+0x6a1/0xb20 net/core/net_namespace.c:541
       process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
       worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
       kthread+0x345/0x410 kernel/kthread.c:240
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
      
      The buggy address belongs to the object at ffff88018a02c140
       which belongs to the cache net_namespace of size 8832
      The buggy address is located 8800 bytes inside of
       8832-byte region [ffff88018a02c140, ffff88018a02e3c0)
      The buggy address belongs to the page:
      page:ffffea0006280b00 count:1 mapcount:0 mapping:ffff88018a02c140 index:0x0 compound_mapcount: 0
      flags: 0x2fffc0000008100(slab|head)
      raw: 02fffc0000008100 ffff88018a02c140 0000000000000000 0000000100000001
      raw: ffffea00062a1320 ffffea0006268020 ffff8801d9bdde40 0000000000000000
      page dumped because: kasan: bad access detected
      
      Fixes: b922622e ("sock_diag: don't broadcast kernel sockets")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Craig Gallek <kraig@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Willem de Bruijn's avatar
      packet: in packet_snd start writing at link layer allocation · d9fb8cc2
      Willem de Bruijn authored
      [ Upstream commit b84bbaf7 ]
      
      Packet sockets allow construction of packets shorter than
      dev->hard_header_len to accommodate protocols with variable length
      link layer headers. These packets are padded to dev->hard_header_len,
      because some device drivers interpret that as a minimum packet size.
      
      packet_snd reserves dev->hard_header_len bytes on allocation.
      SOCK_DGRAM sockets call skb_push in dev_hard_header() to ensure that
      link layer headers are stored in the reserved range. SOCK_RAW sockets
      do the same in tpacket_snd, but not in packet_snd.
      
      Syzbot was able to send a zero byte packet to a device with a massive
      116B link layer header, causing padding to cross over into skb_shinfo.
      Fix this by writing from the start of the llheader reserved range also
      in the case of packet_snd/SOCK_RAW.
      
      Update skb_set_network_header to the new offset. This also corrects
      it for SOCK_DGRAM, where it incorrectly double counted reserve due to
      the skb_push in dev_hard_header.
      
      Fixes: 9ed988cd ("packet: validate variable length ll headers")
      Reported-by: syzbot+71d74a5406d02057d559@syzkaller.appspotmail.com
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Willem de Bruijn's avatar
      net: test tailroom before appending to linear skb · 671cf50f
      Willem de Bruijn authored
      [ Upstream commit 113f99c3 ]
      
      Device features may change during transmission. In particular with
      corking, a device may toggle scatter-gather in between allocating
      and writing to an skb.
      
      Do not unconditionally assume that !NETIF_F_SG at write time implies
      that the same held at alloc time and thus the skb has sufficient
      tailroom.
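      
      The idea of testing tailroom at write time can be illustrated with a
      toy userspace model. Everything here (struct toy_skb, toy_tailroom(),
      toy_append_linear(), the dev_has_sg argument) is hypothetical and only
      mirrors the shape of the check; it is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BUF_MAX 64

/* Toy stand-in for a linear skb (hypothetical, not struct sk_buff). */
struct toy_skb {
	unsigned char data[TOY_BUF_MAX];
	size_t end;	/* bytes actually reserved at allocation time */
	size_t len;	/* bytes already written */
};

/* Space left in the linear area. */
size_t toy_tailroom(const struct toy_skb *skb)
{
	assert(skb->len <= skb->end && skb->end <= TOY_BUF_MAX);
	return skb->end - skb->len;
}

/*
 * Append to the linear area only if the device lacks scatter-gather at
 * write time AND the buffer really has room. Returns 1 on a linear
 * append, 0 if the caller must take another path (page fragments in the
 * real stack).
 */
int toy_append_linear(struct toy_skb *skb, const void *src, size_t n,
		      int dev_has_sg)
{
	if (dev_has_sg || toy_tailroom(skb) < n)
		return 0;
	memcpy(skb->data + skb->len, src, n);
	skb->len += n;
	return 1;
}
```

      A buffer sized while scatter-gather was on may have little tailroom,
      so a later write with scatter-gather off has to check the space
      instead of assuming it.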
      
      This issue predates git history.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Liu Bo's avatar
      btrfs: fix reading stale metadata blocks after degraded raid1 mounts · e2da30c5
      Liu Bo authored
      commit 02a3307a upstream.
      
      If a btree block, aka. extent buffer, is not available in the extent
      buffer cache, it'll be read out from the disk instead, i.e.
      
      btrfs_search_slot()
        read_block_for_search()  # hold parent and its lock, go to read child
          btrfs_release_path()
          read_tree_block()  # read child
      
      Unfortunately, the parent lock is released before the child is read, so
      commit 5bdd3536 ("Btrfs: Fix block generation verification race")
      used 0 as the parent transid when reading the child block.  That forces
      read_tree_block() to skip the check of whether the parent transid
      differs from the generation id of the child it reads from disk.
      
      A simple PoC is included in btrfs/124,
      
      0. A two-disk raid1 btrfs,
      
      1. Right after mkfs.btrfs, block A is allocated to be device tree's root.
      
      2. Mount this filesystem and put it in use, after a while, device tree's
         root got COW but block A hasn't been allocated/overwritten yet.
      
      3. Umount it and reload the btrfs module to remove both disks from the
         global @fs_devices list.
      
      4. mount -odegraded dev1 and write some data, so now block A is allocated
         to be a leaf in checksum tree.  Note that only dev1 has the latest
         metadata of this filesystem.
      
      5. Umount it and mount it again normally (with both disks), since raid1
         can pick up one disk by the writer task's pid, if btrfs_search_slot()
         needs to read block A, dev2 which does NOT have the latest metadata
         might be read for block A, then we got a stale block A.
      
      6. As parent transid is not checked, block A is marked as uptodate and
         put into the extent buffer cache, so the future search won't bother
         to read disk again, which means it'll make changes on this stale
         one and make it dirty and flush it onto disk.
      
      To avoid the problem, parent transid needs to be passed to
      read_tree_block().
      
      In order to get a valid parent transid, we need to hold the parent's
      lock until finishing reading child.
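      
      Why passing 0 hides the stale block can be shown with a toy model of
      the generation check. The names below (struct toy_block,
      toy_verify_parent_transid()) are hypothetical; the sketch assumes only
      what the text states, namely that a parent transid of 0 means the
      check is skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an extent buffer header (hypothetical). */
struct toy_block {
	unsigned long long generation;	/* transid the block was written in */
};

/*
 * Returns 0 if the block is accepted, -1 if it is rejected as stale.
 * A parent_transid of 0 means "caller has no expectation", so the
 * comparison is skipped and any block is accepted.
 */
int toy_verify_parent_transid(const struct toy_block *eb,
			      unsigned long long parent_transid)
{
	assert(eb != NULL);

	if (parent_transid == 0)
		return 0;
	return eb->generation == parent_transid ? 0 : -1;
}
```

      With a stale copy of block A (say generation 5) and a parent that
      expects generation 9, passing 0 accepts the block, while passing 9
      rejects it so that another mirror can be tried.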
      
      This patch needs to be slightly adapted for stable kernels, the
      &first_key parameter added to read_tree_block() is from 4.16+
      (581c1760). The fix is to replace 0 by 'gen'.
      
      Fixes: 5bdd3536 ("Btrfs: Fix block generation verification race")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      [ update changelog ]
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Anand Jain's avatar
      btrfs: fix crash when trying to resume balance without the resume flag · 68dea4bd
      Anand Jain authored
      commit 02ee654d upstream.
      
      We set the BTRFS_BALANCE_RESUME flag only in btrfs_recover_balance(),
      which isn't called during remount. So when resuming a paused balance
      we hit the bug:
      
       kernel: kernel BUG at fs/btrfs/volumes.c:3890!
       ::
       kernel:  balance_kthread+0x51/0x60 [btrfs]
       kernel:  kthread+0x111/0x130
       ::
       kernel: RIP: btrfs_balance+0x12e1/0x1570 [btrfs] RSP: ffffba7d0090bde8
      
      Reproducer:
        On a mounted filesystem:
      
        btrfs balance start --full-balance /btrfs
        btrfs balance pause /btrfs
        mount -o remount,ro /dev/sdb /btrfs
        mount -o remount,rw /dev/sdb /btrfs
      
      To fix this, set the BTRFS_BALANCE_RESUME flag in
      btrfs_resume_balance_async().
      
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Filipe Manana's avatar
      Btrfs: fix xattr loss after power failure · 72d1df8a
      Filipe Manana authored
      commit 9a8fca62 upstream.
      
      If a file has xattrs and we fsync it (which clears the flags
      BTRFS_INODE_NEEDS_FULL_SYNC and BTRFS_INODE_COPY_EVERYTHING from its
      inode), the current transaction then commits, and we fsync the file
      again (with neither of those bits set in its inode), we end up not
      logging all its xattrs. This results in all xattrs being deleted when
      replaying the log after a power failure.
      
      Trivial reproducer
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
      
        $ touch /mnt/foobar
        $ setfattr -n user.xa -v qwerty /mnt/foobar
        $ xfs_io -c "fsync" /mnt/foobar
      
        $ sync
      
        $ xfs_io -c "pwrite -S 0xab 0 64K" /mnt/foobar
        $ xfs_io -c "fsync" /mnt/foobar
        <power failure>
      
        $ mount /dev/sdb /mnt
        $ getfattr --absolute-names --dump /mnt/foobar
        <empty output>
        $
      
      So fix this by making sure all xattrs are logged if we log a file's inode
      item and neither BTRFS_INODE_NEEDS_FULL_SYNC nor
      BTRFS_INODE_COPY_EVERYTHING was set in the inode.
      
      Fixes: 36283bf7 ("Btrfs: fix fsync xattr loss in the fast fsync path")
      Cc: <stable@vger.kernel.org> # 4.2+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Masami Hiramatsu's avatar
      ARM: 8772/1: kprobes: Prohibit kprobes on get_user functions · 104cff91
      Masami Hiramatsu authored
      commit 0d73c3f8 upstream.
      
      Since do_undefinstr() uses get_user to get the undefined
      instruction, it can be called before kprobes performs its
      recursion check. This can cause an infinite recursive
      exception.
      Prohibit probing on get_user functions.
      
      Fixes: 24ba613c ("ARM kprobes: core code")
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Masami Hiramatsu's avatar
      ARM: 8770/1: kprobes: Prohibit probing on optimized_callback · 51425b75
      Masami Hiramatsu authored
      commit 70948c05 upstream.
      
      Prohibit probing on optimized_callback() because
      it is called from kprobes itself. If we put a kprobe
      on it, that will cause a recursive call loop.
      Mark it NOKPROBE_SYMBOL.
      
      Fixes: 0dc016db ("ARM: kprobes: enable OPTPROBES for ARM 32")
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Masami Hiramatsu's avatar
      ARM: 8769/1: kprobes: Fix to use get_kprobe_ctlblk after irq-disabed · 0ec84ae5
      Masami Hiramatsu authored
      commit 69af7e23 upstream.
      
      Since get_kprobe_ctlblk() uses smp_processor_id() to access a
      per-cpu variable, it hits the smp_processor_id() sanity check, as below.
      
      [    7.006928] BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
      [    7.007859] caller is debug_smp_processor_id+0x20/0x24
      [    7.008438] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1-00192-g4eb17253e4b5 #1
      [    7.008890] Hardware name: Generic DT based system
      [    7.009917] [<c0313f0c>] (unwind_backtrace) from [<c030e6d8>] (show_stack+0x20/0x24)
      [    7.010473] [<c030e6d8>] (show_stack) from [<c0c64694>] (dump_stack+0x84/0x98)
      [    7.010990] [<c0c64694>] (dump_stack) from [<c071ca5c>] (check_preemption_disabled+0x138/0x13c)
      [    7.011592] [<c071ca5c>] (check_preemption_disabled) from [<c071ca80>] (debug_smp_processor_id+0x20/0x24)
      [    7.012214] [<c071ca80>] (debug_smp_processor_id) from [<c03335e0>] (optimized_callback+0x2c/0xe4)
      [    7.013077] [<c03335e0>] (optimized_callback) from [<bf0021b0>] (0xbf0021b0)
      
      To fix this issue, call get_kprobe_ctlblk() right after irqs are
      disabled, since that also disables preemption.
      
      Fixes: 0dc016db ("ARM: kprobes: enable OPTPROBES for ARM 32")
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Dexuan Cui's avatar
      tick/broadcast: Use for_each_cpu() specially on UP kernels · 2c4c0ab8
      Dexuan Cui authored
      commit 5596fe34 upstream.
      
      for_each_cpu() unintuitively reports CPU0 as set independent of the actual
      cpumask content on UP kernels. This causes an unexpected PIT interrupt
      storm on a UP kernel running in an SMP virtual machine on Hyper-V, and as
      a result, the virtual machine can suffer from a strange random delay of 1~20
      minutes during boot-up, and sometimes it can hang forever.
      
      Protect against it by checking whether the cpumask is empty before
      entering the for_each_cpu() loop.
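      
      The behaviour and the guard can be modelled in plain userspace C. The
      toy_ names below are hypothetical; the macro only imitates the shape
      of a uniprocessor for_each_cpu() that evaluates the mask without
      testing it, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy one-word mask standing in for struct cpumask (hypothetical). */
struct toy_cpumask {
	unsigned long bits;
};

bool toy_cpumask_empty(const struct toy_cpumask *m)
{
	assert(m != NULL);
	return m->bits == 0;
}

/*
 * Shape of a uniprocessor for_each_cpu(): the mask is evaluated but never
 * tested, so the body always runs exactly once with cpu == 0.
 */
#define toy_for_each_cpu_up(cpu, mask) \
	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)(mask))

/* Unguarded walk: counts how many CPUs the loop claims are set. */
int count_cpus_unguarded(const struct toy_cpumask *m)
{
	int cpu, n = 0;

	toy_for_each_cpu_up(cpu, m)
		n++;
	return n;
}

/* Guarded walk: skip the loop entirely when the mask is empty. */
int count_cpus_guarded(const struct toy_cpumask *m)
{
	int cpu, n = 0;

	if (toy_cpumask_empty(m))
		return 0;
	toy_for_each_cpu_up(cpu, m)
		n++;
	return n;
}
```

      With an empty mask the unguarded walk still visits CPU0 once, while
      the guarded walk visits nothing.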
      
      [ tglx: Use !IS_ENABLED(CONFIG_SMP) instead of #ifdeffery ]
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Josh Poulson <jopoulso@microsoft.com>
      Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: stable@vger.kernel.org
      Cc: Rakib Mullick <rakib.mullick@gmail.com>
      Cc: Jork Loeser <Jork.Loeser@microsoft.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: KY Srinivasan <kys@microsoft.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Link: https://lkml.kernel.org/r/KL1P15301MB000678289FE55BA365B3279ABF990@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM
      Link: https://lkml.kernel.org/r/KL1P15301MB0006FA63BC22BEB64902EAA0BF930@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>