1. 26 Jun, 2017 40 commits
    • Linus Lüssing's avatar
      ipv6: Fix IPv6 packet loss in scenarios involving roaming + snooping switches · 4c8eb627
      Linus Lüssing authored
      [ Upstream commit a088d1d7 ]
      
      When for instance a mobile Linux device roams from one access point to
      another with both APs sharing the same broadcast domain and a
      multicast snooping switch in between:
      
      1)    (c) <~~~> (AP1) <--[SSW]--> (AP2)
      
      2)              (AP1) <--[SSW]--> (AP2) <~~~> (c)
      
      Then currently IPv6 multicast packets will get lost for (c) until an
      MLD Querier sends its next query message. The packet loss occurs
      because upon roaming the Linux host so far stayed silent regarding
      MLD and the snooping switch will therefore be unaware of the
      multicast topology change for a while.
      
      This patch fixes this by always resending MLD reports when an interface
      change happens, for instance from NO-CARRIER to CARRIER state.
      Signed-off-by: default avatarLinus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      4c8eb627
    • Stefan Brüns's avatar
      sierra_net: Add support for IPv6 and Dual-Stack Link Sense Indications · 0def8e45
      Stefan Brüns authored
      [ Upstream commit 5a70348e ]
      
      If a context is configured as dualstack ("IPv4v6"), the modem indicates
      the context activation with a slightly different indication message.
      The dual-stack indication omits the link_type (IPv4/v6) and adds
      additional address fields.
      IPv6 LSIs are identical to IPv4 LSIs, but have a different link type.
      Signed-off-by: default avatarStefan Brüns <stefan.bruens@rwth-aachen.de>
      Reviewed-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      0def8e45
    • Stefan Brüns's avatar
      sierra_net: Skip validating irrelevant fields for IDLE LSIs · 0c2950fa
      Stefan Brüns authored
      [ Upstream commit 764895d3 ]
      
      When the context is deactivated, the link_type is set to 0xff, which
      triggers a warning message, and results in a wrong link status, as
      the LSI is ignored.
      Signed-off-by: default avatarStefan Brüns <stefan.bruens@rwth-aachen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      0c2950fa
    • Ralf Baechle's avatar
      NET: mkiss: Fix panic · a9cbb7cd
      Ralf Baechle authored
      [ Upstream commit 7ba1b689 ]
      
      If a USB-to-serial adapter is unplugged, the driver re-initializes, with
      dev->hard_header_len and dev->addr_len set to zero, instead of the correct
      values.  If then a packet is sent through the half-dead interface, the
      kernel will panic due to running out of headroom in the skb when pushing
      for the AX.25 headers resulting in this panic:
      
      [<c0595468>] (skb_panic) from [<c0401f70>] (skb_push+0x4c/0x50)
      [<c0401f70>] (skb_push) from [<bf0bdad4>] (ax25_hard_header+0x34/0xf4 [ax25])
      [<bf0bdad4>] (ax25_hard_header [ax25]) from [<bf0d05d4>] (ax_header+0x38/0x40 [mkiss])
      [<bf0d05d4>] (ax_header [mkiss]) from [<c041b584>] (neigh_compat_output+0x8c/0xd8)
      [<c041b584>] (neigh_compat_output) from [<c043e7a8>] (ip_finish_output+0x2a0/0x914)
      [<c043e7a8>] (ip_finish_output) from [<c043f948>] (ip_output+0xd8/0xf0)
      [<c043f948>] (ip_output) from [<c043f04c>] (ip_local_out_sk+0x44/0x48)
      
      This patch makes mkiss behave like the 6pack driver. 6pack does not
      panic.  In 6pack.c sp_setup() (same function name here) the values for
      dev->hard_header_len and dev->addr_len are set to the same values as in
      my mkiss patch.
      
      [ralf@linux-mips.org: Massages original submission to conform to the usual
      standards for patch submissions.]
      Signed-off-by: default avatarThomas Osterried <thomas@osterried.de>
      Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      a9cbb7cd
    • Ralf Baechle's avatar
      NET: Fix /proc/net/arp for AX.25 · d914dc3b
      Ralf Baechle authored
      [ Upstream commit 4872e57c ]
      
      When sending ARP requests over AX.25 links the hwaddress in the neighbour
      cache are not getting initialized.  For such an incomplete arp entry
      ax2asc2 will generate an empty string resulting in /proc/net/arp output
      like the following:
      
      $ cat /proc/net/arp
      IP address       HW type     Flags       HW address            Mask     Device
      192.168.122.1    0x1         0x2         52:54:00:00:5d:5f     *        ens3
      172.20.1.99      0x3         0x0              *        bpq0
      
      The missing field will confuse the procfs parsing of arp(8) resulting in
      incorrect output for the device such as the following:
      
      $ arp
      Address                  HWtype  HWaddress           Flags Mask            Iface
      gateway                  ether   52:54:00:00:5d:5f   C                     ens3
      172.20.1.99                      (incomplete)                              ens3
      
      This changes the content of /proc/net/arp to:
      
      $ cat /proc/net/arp
      IP address       HW type     Flags       HW address            Mask     Device
      172.20.1.99      0x3         0x0         *                     *        bpq0
      192.168.122.1    0x1         0x2         52:54:00:00:5d:5f     *        ens3
      
      To do so it change ax2asc to put the string "*" in buf for a NULL address
      argument.  Finally the HW address field is left aligned in a 17 character
      field (the length of an ethernet HW address in the usual hex notation) for
      readability.
      Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      d914dc3b
    • Jonathan T. Leighton's avatar
      ipv6: Inhibit IPv4-mapped src address on the wire. · 68978d69
      Jonathan T. Leighton authored
      [ Upstream commit ec5e3b0a ]
      
      This patch adds a check for the problematic case of an IPv4-mapped IPv6
      source address and a destination address that is neither an IPv4-mapped
      IPv6 address nor in6addr_any, and returns an appropriate error. The
      check in done before returning from looking up the route.
      Signed-off-by: default avatarJonathan T. Leighton <jtleight@udel.edu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      68978d69
    • Jonathan T. Leighton's avatar
      ipv6: Handle IPv4-mapped src to in6addr_any dst. · 19708236
      Jonathan T. Leighton authored
      [ Upstream commit 052d2369 ]
      
      This patch adds a check on the type of the source address for the case
      where the destination address is in6addr_any. If the source is an
      IPv4-mapped IPv6 source address, the destination is changed to
      ::ffff:127.0.0.1, and otherwise the destination is changed to ::1. This
      is done in three locations to handle UDP calls to either connect() or
      sendmsg() and TCP calls to connect(). Note that udpv6_sendmsg() delays
      handling an in6addr_any destination until very late, so the patch only
      needs to handle the case where the source is an IPv4-mapped IPv6
      address.
      Signed-off-by: default avatarJonathan T. Leighton <jtleight@udel.edu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      19708236
    • Anssi Hannula's avatar
      net: xilinx_emaclite: fix receive buffer overflow · dd4d061c
      Anssi Hannula authored
      [ Upstream commit cd224553 ]
      
      xilinx_emaclite looks at the received data to try to determine the
      Ethernet packet length but does not properly clamp it if
      proto_type == ETH_P_IP or 1500 < proto_type <= 1518, causing a buffer
      overflow and a panic via skb_panic() as the length exceeds the allocated
      skb size.
      
      Fix those cases.
      
      Also add an additional unconditional check with WARN_ON() at the end.
      Signed-off-by: default avatarAnssi Hannula <anssi.hannula@bitwise.fi>
      Fixes: bb81b2dd ("net: add Xilinx emac lite device driver")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      dd4d061c
    • Anssi Hannula's avatar
      net: xilinx_emaclite: fix freezes due to unordered I/O · 742e7978
      Anssi Hannula authored
      [ Upstream commit acf138f1 ]
      
      The xilinx_emaclite uses __raw_writel and __raw_readl for register
      accesses. Those functions do not imply any kind of memory barriers and
      they may be reordered.
      
      The driver does not seem to take that into account, though, and the
      driver does not satisfy the ordering requirements of the hardware.
      For clear examples, see xemaclite_mdio_write() and xemaclite_mdio_read()
      which try to set MDIO address before initiating the transaction.
      
      I'm seeing system freezes with the driver with GCC 5.4 and current
      Linux kernels on Zynq-7000 SoC immediately when trying to use the
      interface.
      
      In commit 123c1407 ("net: emaclite: Do not use microblaze and ppc
      IO functions") the driver was switched from non-generic
      in_be32/out_be32 (memory barriers, big endian) to
      __raw_readl/__raw_writel (no memory barriers, native endian), so
      apparently the device follows system endianness and the driver was
      originally written with the assumption of memory barriers.
      
      Rather than try to hunt for each case of missing barrier, just switch
      the driver to use iowrite32/ioread32/iowrite32be/ioread32be depending
      on endianness instead.
      
      Tested on little-endian Zynq-7000 ARM SoC FPGA.
      Signed-off-by: default avatarAnssi Hannula <anssi.hannula@bitwise.fi>
      Fixes: 123c1407 ("net: emaclite: Do not use microblaze and ppc IO
      functions")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      742e7978
    • Richard's avatar
      partitions/msdos: FreeBSD UFS2 file systems are not recognized · afae1d9d
      Richard authored
      [ Upstream commit 22322035 ]
      
      The code in block/partitions/msdos.c recognizes FreeBSD, OpenBSD
      and NetBSD partitions and does a reasonable job picking out OpenBSD
      and NetBSD UFS subpartitions.
      
      But for FreeBSD the subpartitions are always "bad".
      
          Kernel: <bsd:bad subpartition - ignored
      
      Though all 3 of these BSD systems use UFS as a file system, only
      FreeBSD uses relative start addresses in the subpartition
      declarations.
      
      The following patch fixes this for FreeBSD partitions and leaves
      the code for OpenBSD and NetBSD intact:
      Signed-off-by: default avatarRichard Narron <comet.berkeley@gmail.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      afae1d9d
    • Imre Deak's avatar
      PCI/PM: Add needs_resume flag to avoid suspend complete optimization · 7f6abe4c
      Imre Deak authored
      [ Upstream commit 4d071c32 ]
      
      Some drivers - like i915 - may not support the system suspend direct
      complete optimization due to differences in their runtime and system
      suspend sequence.  Add a flag that when set resumes the device before
      calling the driver's system suspend handlers which effectively disables
      the optimization.
      
      Needed by a future patch fixing suspend/resume on i915.
      
      Suggested by Rafael.
      Signed-off-by: default avatarImre Deak <imre.deak@intel.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Acked-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      7f6abe4c
    • Kees Cook's avatar
      usercopy: Adjust tests to deal with SMAP/PAN · cd1c4f85
      Kees Cook authored
      [ Upstream commit f5f893c5 ]
      
      Under SMAP/PAN/etc, we cannot write directly to userspace memory, so
      this rearranges the test bytes to get written through copy_to_user().
      Additionally drops the bad copy_from_user() test that would trigger a
      memcpy() against userspace on failure.
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      cd1c4f85
    • Kristina Martsenko's avatar
      arm64: entry: improve data abort handling of tagged pointers · 9da80866
      Kristina Martsenko authored
      [ Upstream commit 276e9327 ]
      
      When handling a data abort from EL0, we currently zero the top byte of
      the faulting address, as we assume the address is a TTBR0 address, which
      may contain a non-zero address tag. However, the address may be a TTBR1
      address, in which case we should not zero the top byte. This patch fixes
      that. The effect is that the full TTBR1 address is passed to the task's
      signal handler (or printed out in the kernel log).
      
      When handling a data abort from EL1, we leave the faulting address
      intact, as we assume it's either a TTBR1 address or a TTBR0 address with
      tag 0x00. This is true as far as I'm aware, we don't seem to access a
      tagged TTBR0 address anywhere in the kernel. Regardless, it's easy to
      forget about address tags, and code added in the future may not always
      remember to remove tags from addresses before accessing them. So add tag
      handling to the EL1 data abort handler as well. This also makes it
      consistent with the EL0 data abort handler.
      
      Fixes: d50240a5 ("arm64: mm: permit use of tagged pointers at EL0")
      Cc: <stable@vger.kernel.org> # 3.12.x-
      Reviewed-by: default avatarDave Martin <Dave.Martin@arm.com>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarKristina Martsenko <kristina.martsenko@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      9da80866
    • Julius Werner's avatar
      drivers: char: mem: Fix wraparound check to allow mappings up to the end · 47e49f2d
      Julius Werner authored
      [ Upstream commit 32829da5 ]
      
      A recent fix to /dev/mem prevents mappings from wrapping around the end
      of physical address space. However, the check was written in a way that
      also prevents a mapping reaching just up to the end of physical address
      space, which may be a valid use case (especially on 32-bit systems).
      This patch fixes it by checking the last mapped address (instead of the
      first address behind that) for overflow.
      
      Fixes: b299cde2 ("drivers: char: mem: Check for address space wraparound with mmap()")
      Cc: <stable@vger.kernel.org>
      Reported-by: default avatarNico Huber <nico.h@gmx.de>
      Signed-off-by: default avatarJulius Werner <jwerner@chromium.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      47e49f2d
    • Takashi Iwai's avatar
      ASoC: Fix use-after-free at card unregistration · bb3556c1
      Takashi Iwai authored
      [ Upstream commit 4efda5f2 ]
      
      soc_cleanup_card_resources() call snd_card_free() at the last of its
      procedure.  This turned out to lead to a use-after-free.
      PCM runtimes have been already removed via soc_remove_pcm_runtimes(),
      while it's dereferenced later in soc_pcm_free() called via
      snd_card_free().
      
      The fix is simple: just move the snd_card_free() call to the beginning
      of the whole procedure.  This also gives another benefit: it
      guarantees that all operations have been shut down before actually
      releasing the resources, which was racy until now.
      Reported-and-tested-by: default avatarRobert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      bb3556c1
    • Takashi Iwai's avatar
      ALSA: timer: Fix missing queue indices reset at SNDRV_TIMER_IOCTL_SELECT · 88c41586
      Takashi Iwai authored
      [ Upstream commit ba3021b2 ]
      
      snd_timer_user_tselect() reallocates the queue buffer dynamically, but
      it forgot to reset its indices.  Since the read may happen
      concurrently with ioctl and snd_timer_user_tselect() allocates the
      buffer via kmalloc(), this may lead to the leak of uninitialized
      kernel-space data, as spotted via KMSAN:
      
        BUG: KMSAN: use of unitialized memory in snd_timer_user_read+0x6c4/0xa10
        CPU: 0 PID: 1037 Comm: probe Not tainted 4.11.0-rc5+ #2739
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
        Call Trace:
         __dump_stack lib/dump_stack.c:16
         dump_stack+0x143/0x1b0 lib/dump_stack.c:52
         kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:1007
         kmsan_check_memory+0xc2/0x140 mm/kmsan/kmsan.c:1086
         copy_to_user ./arch/x86/include/asm/uaccess.h:725
         snd_timer_user_read+0x6c4/0xa10 sound/core/timer.c:2004
         do_loop_readv_writev fs/read_write.c:716
         __do_readv_writev+0x94c/0x1380 fs/read_write.c:864
         do_readv_writev fs/read_write.c:894
         vfs_readv fs/read_write.c:908
         do_readv+0x52a/0x5d0 fs/read_write.c:934
         SYSC_readv+0xb6/0xd0 fs/read_write.c:1021
         SyS_readv+0x87/0xb0 fs/read_write.c:1018
      
      This patch adds the missing reset of queue indices.  Together with the
      previous fix for the ioctl/read race, we cover the whole problem.
      Reported-by: default avatarAlexander Potapenko <glider@google.com>
      Tested-by: default avatarAlexander Potapenko <glider@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      88c41586
    • Takashi Iwai's avatar
      ALSA: timer: Fix race between read and ioctl · 5d28ba6e
      Takashi Iwai authored
      [ Upstream commit d11662f4 ]
      
      The read from ALSA timer device, the function snd_timer_user_tread(),
      may access to an uninitialized struct snd_timer_user fields when the
      read is concurrently performed while the ioctl like
      snd_timer_user_tselect() is invoked.  We have already fixed the races
      among ioctls via a mutex, but we seem to have forgotten the race
      between read vs ioctl.
      
      This patch simply applies (more exactly extends the already applied
      range of) tu->ioctl_lock in snd_timer_user_tread() for closing the
      race window.
      Reported-by: default avatarAlexander Potapenko <glider@google.com>
      Tested-by: default avatarAlexander Potapenko <glider@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      5d28ba6e
    • Dan Carpenter's avatar
      drm/vmwgfx: Handle vmalloc() failure in vmw_local_fifo_reserve() · 29837be8
      Dan Carpenter authored
      [ Upstream commit f0c62e98 ]
      
      If vmalloc() fails then we need to a bit of cleanup before returning.
      
      Cc: <stable@vger.kernel.org>
      Fixes: fb1d9738 ("drm/vmwgfx: Add DRM driver for VMware Virtual GPU")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: default avatarSinclair Yeh <syeh@vmware.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      29837be8
    • Jin Yao's avatar
      perf/core: Drop kernel samples even though :u is specified · d6f90404
      Jin Yao authored
      [ Upstream commit cc1582c2 ]
      
      When doing sampling, for example:
      
        perf record -e cycles:u ...
      
      On workloads that do a lot of kernel entry/exits we see kernel
      samples, even though :u is specified. This is due to skid existing.
      
      This might be a security issue because it can leak kernel addresses even
      though kernel sampling support is disabled.
      
      The patch drops the kernel samples if exclude_kernel is specified.
      
      For example, test on Haswell desktop:
      
        perf record -e cycles:u <mgen>
        perf report --stdio
      
      Before patch applied:
      
          99.77%  mgen     mgen              [.] buf_read
           0.20%  mgen     mgen              [.] rand_buf_init
           0.01%  mgen     [kernel.vmlinux]  [k] apic_timer_interrupt
           0.00%  mgen     mgen              [.] last_free_elem
           0.00%  mgen     libc-2.23.so      [.] __random_r
           0.00%  mgen     libc-2.23.so      [.] _int_malloc
           0.00%  mgen     mgen              [.] rand_array_init
           0.00%  mgen     [kernel.vmlinux]  [k] page_fault
           0.00%  mgen     libc-2.23.so      [.] __random
           0.00%  mgen     libc-2.23.so      [.] __strcasestr
           0.00%  mgen     ld-2.23.so        [.] strcmp
           0.00%  mgen     ld-2.23.so        [.] _dl_start
           0.00%  mgen     libc-2.23.so      [.] sched_setaffinity@@GLIBC_2.3.4
           0.00%  mgen     ld-2.23.so        [.] _start
      
      We can see kernel symbols apic_timer_interrupt and page_fault.
      
      After patch applied:
      
          99.79%  mgen     mgen           [.] buf_read
           0.19%  mgen     mgen           [.] rand_buf_init
           0.00%  mgen     libc-2.23.so   [.] __random_r
           0.00%  mgen     mgen           [.] rand_array_init
           0.00%  mgen     mgen           [.] last_free_elem
           0.00%  mgen     libc-2.23.so   [.] vfprintf
           0.00%  mgen     libc-2.23.so   [.] rand
           0.00%  mgen     libc-2.23.so   [.] __random
           0.00%  mgen     libc-2.23.so   [.] _int_malloc
           0.00%  mgen     libc-2.23.so   [.] _IO_doallocbuf
           0.00%  mgen     ld-2.23.so     [.] do_lookup_x
           0.00%  mgen     ld-2.23.so     [.] open_verify.constprop.7
           0.00%  mgen     ld-2.23.so     [.] _dl_important_hwcaps
           0.00%  mgen     libc-2.23.so   [.] sched_setaffinity@@GLIBC_2.3.4
           0.00%  mgen     ld-2.23.so     [.] _start
      
      There are only userspace symbols.
      Signed-off-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Cc: kan.liang@intel.com
      Cc: mark.rutland@arm.com
      Cc: will.deacon@arm.com
      Cc: yao.jin@intel.com
      Link: http://lkml.kernel.org/r/1495706947-3744-1-git-send-email-yao.jin@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      d6f90404
    • Michael Bringmann's avatar
      powerpc/hotplug-mem: Fix missing endian conversion of aa_index · f4455627
      Michael Bringmann authored
      [ Upstream commit dc421b20 ]
      
      When adding or removing memory, the aa_index (affinity value) for the
      memblock must also be converted to match the endianness of the rest
      of the 'ibm,dynamic-memory' property.  Otherwise, subsequent retrieval
      of the attribute will likely lead to non-existent nodes, followed by
      using the default node in the code inappropriately.
      
      Fixes: 5f97b2a0 ("powerpc/pseries: Implement memory hotplug add in the kernel")
      Cc: stable@vger.kernel.org # v4.1+
      Signed-off-by: default avatarMichael Bringmann <mwb@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      f4455627
    • Michael Ellerman's avatar
      powerpc/numa: Fix percpu allocations to be NUMA aware · 7ee9689e
      Michael Ellerman authored
      [ Upstream commit ba4a648f ]
      
      In commit 8c272261 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we
      switched to the generic implementation of cpu_to_node(), which uses a percpu
      variable to hold the NUMA node for each CPU.
      
      Unfortunately we neglected to notice that we use cpu_to_node() in the allocation
      of our percpu areas, leading to a chicken and egg problem. In practice what
      happens is when we are setting up the percpu areas, cpu_to_node() reports that
      all CPUs are on node 0, so we allocate all percpu areas on node 0.
      
      This is visible in the dmesg output, as all pcpu allocs being in group 0:
      
        pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
        pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
        pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
        pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
        pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39
        pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47
      
      To fix it we need an early_cpu_to_node() which can run prior to percpu being
      setup. We already have the numa_cpu_lookup_table we can use, so just plumb it
      in. With the patch dmesg output shows two groups, 0 and 1:
      
        pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
        pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
        pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
        pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31
        pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39
        pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47
      
      We can also check the data_offset in the paca of various CPUs, with the fix we
      see:
      
        CPU 0:  data_offset = 0x0ffe8b0000
        CPU 24: data_offset = 0x1ffe5b0000
      
      And we can see from dmesg that CPU 24 has an allocation on node 1:
      
        node   0: [mem 0x0000000000000000-0x0000000fffffffff]
        node   1: [mem 0x0000001000000000-0x0000001fffffffff]
      
      Cc: stable@vger.kernel.org # v3.16+
      Fixes: 8c272261 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID")
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      7ee9689e
    • Johannes Thumshirn's avatar
      scsi: qla2xxx: don't disable a not previously enabled PCI device · eecbbd83
      Johannes Thumshirn authored
      [ Upstream commit ddff7ed4 ]
      
      When pci_enable_device() or pci_enable_device_mem() fail in
      qla2x00_probe_one() we bail out but do a call to
      pci_disable_device(). This causes the dev_WARN_ON() in
      pci_disable_device() to trigger, as the device wasn't enabled
      previously.
      
      So instead of taking the 'probe_out' error path we can directly return
      *iff* one of the pci_enable_device() calls fails.
      
      Additionally rename the 'probe_out' goto label's name to the more
      descriptive 'disable_device'.
      Signed-off-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Fixes: e315cd28 ("[SCSI] qla2xxx: Code changes for qla data structure refactoring")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: default avatarGiridhar Malavali <giridhar.malavali@cavium.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      eecbbd83
    • Marc Zyngier's avatar
      KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages · 4a213a0f
      Marc Zyngier authored
      [ Upstream commit d6dbdd3c ]
      
      Under memory pressure, we start ageing pages, which amounts to parsing
      the page tables. Since we don't want to allocate any extra level,
      we pass NULL for our private allocation cache. Which means that
      stage2_get_pud() is allowed to fail. This results in the following
      splat:
      
      [ 1520.409577] Unable to handle kernel NULL pointer dereference at virtual address 00000008
      [ 1520.417741] pgd = ffff810f52fef000
      [ 1520.421201] [00000008] *pgd=0000010f636c5003, *pud=0000010f56f48003, *pmd=0000000000000000
      [ 1520.429546] Internal error: Oops: 96000006 [#1] PREEMPT SMP
      [ 1520.435156] Modules linked in:
      [ 1520.438246] CPU: 15 PID: 53550 Comm: qemu-system-aar Tainted: G        W       4.12.0-rc4-00027-g1885c397eaec #7205
      [ 1520.448705] Hardware name: FOXCONN R2-1221R-A4/C2U4N_MB, BIOS G31FB12A 10/26/2016
      [ 1520.463726] task: ffff800ac5fb4e00 task.stack: ffff800ce04e0000
      [ 1520.469666] PC is at stage2_get_pmd+0x34/0x110
      [ 1520.474119] LR is at kvm_age_hva_handler+0x44/0xf0
      [ 1520.478917] pc : [<ffff0000080b137c>] lr : [<ffff0000080b149c>] pstate: 40000145
      [ 1520.486325] sp : ffff800ce04e33d0
      [ 1520.489644] x29: ffff800ce04e33d0 x28: 0000000ffff40064
      [ 1520.494967] x27: 0000ffff27e00000 x26: 0000000000000000
      [ 1520.500289] x25: ffff81051ba65008 x24: 0000ffff40065000
      [ 1520.505618] x23: 0000ffff40064000 x22: 0000000000000000
      [ 1520.510947] x21: ffff810f52b20000 x20: 0000000000000000
      [ 1520.516274] x19: 0000000058264000 x18: 0000000000000000
      [ 1520.521603] x17: 0000ffffa6fe7438 x16: ffff000008278b70
      [ 1520.526940] x15: 000028ccd8000000 x14: 0000000000000008
      [ 1520.532264] x13: ffff7e0018298000 x12: 0000000000000002
      [ 1520.537582] x11: ffff000009241b93 x10: 0000000000000940
      [ 1520.542908] x9 : ffff0000092ef800 x8 : 0000000000000200
      [ 1520.548229] x7 : ffff800ce04e36a8 x6 : 0000000000000000
      [ 1520.553552] x5 : 0000000000000001 x4 : 0000000000000000
      [ 1520.558873] x3 : 0000000000000000 x2 : 0000000000000008
      [ 1520.571696] x1 : ffff000008fd5000 x0 : ffff0000080b149c
      [ 1520.577039] Process qemu-system-aar (pid: 53550, stack limit = 0xffff800ce04e0000)
      [...]
      [ 1521.510735] [<ffff0000080b137c>] stage2_get_pmd+0x34/0x110
      [ 1521.516221] [<ffff0000080b149c>] kvm_age_hva_handler+0x44/0xf0
      [ 1521.522054] [<ffff0000080b0610>] handle_hva_to_gpa+0xb8/0xe8
      [ 1521.527716] [<ffff0000080b3434>] kvm_age_hva+0x44/0xf0
      [ 1521.532854] [<ffff0000080a58b0>] kvm_mmu_notifier_clear_flush_young+0x70/0xc0
      [ 1521.539992] [<ffff000008238378>] __mmu_notifier_clear_flush_young+0x88/0xd0
      [ 1521.546958] [<ffff00000821eca0>] page_referenced_one+0xf0/0x188
      [ 1521.552881] [<ffff00000821f36c>] rmap_walk_anon+0xec/0x250
      [ 1521.558370] [<ffff000008220f78>] rmap_walk+0x78/0xa0
      [ 1521.563337] [<ffff000008221104>] page_referenced+0x164/0x180
      [ 1521.569002] [<ffff0000081f1af0>] shrink_active_list+0x178/0x3b8
      [ 1521.574922] [<ffff0000081f2058>] shrink_node_memcg+0x328/0x600
      [ 1521.580758] [<ffff0000081f23f4>] shrink_node+0xc4/0x328
      [ 1521.585986] [<ffff0000081f2718>] do_try_to_free_pages+0xc0/0x340
      [ 1521.592000] [<ffff0000081f2a64>] try_to_free_pages+0xcc/0x240
      [...]
      
      The trivial fix is to handle this NULL pud value early, rather than
      dereferencing it blindly.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: default avatarChristoffer Dall <cdall@linaro.org>
      Signed-off-by: default avatarChristoffer Dall <cdall@linaro.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      4a213a0f
    • Jeff Mahoney's avatar
      btrfs: fix memory leak in update_space_info failure path · 951269f9
      Jeff Mahoney authored
      [ Upstream commit 896533a7 ]
      
      If we fail to add the space_info kobject, we'll leak the memory
      for the percpu counter.
      
      Fixes: 6ab0a202 (btrfs: publish allocation data in sysfs)
      Cc: <stable@vger.kernel.org> # v3.14+
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      951269f9
    • David Sterba's avatar
      btrfs: use correct types for page indices in btrfs_page_exists_in_range · d42014c8
      David Sterba authored
      [ Upstream commit cc2b702c ]
      
      Variables start_idx and end_idx are supposed to hold a page index
      derived from the file offsets. The int type is not the right one though,
      offsets larger than 1 << 44 will get silently trimmed off the high bits.
      (1 << 44 is 16TiB)
      
      What can go wrong, if start is below the boundary and end gets trimmed:
      - if there's a page after start, we'll find it (radix_tree_gang_lookup_slot)
      - the final check "if (page->index <= end_idx)" will unexpectedly fail
      
      The function will return false, ie. "there's no page in the range",
      although there is at least one.
      
      btrfs_page_exists_in_range is used to prevent races in:
      
      * in hole punching, where we make sure there are not pages in the
        truncated range, otherwise we'll wait for them to finish and redo
        truncation, but we're going to replace the pages with holes anyway so
        the only problem is the intermediate state
      
      * lock_extent_direct: we want to make sure there are no pages before we
        lock and start DIO, to prevent stale data reads
      
      For practical occurence of the bug, there are several constaints.  The
      file must be quite large, the affected range must cross the 16TiB
      boundary and the internal state of the file pages and pending operations
      must match.  Also, we must not have started any ordered data in the
      range, otherwise we don't even reach the buggy function check.
      
      DIO locking tries hard in several places to avoid deadlocks with
      buffered IO and avoids waiting for ranges. The worst consequence seems
      to be stale data read.
      
      CC: Liu Bo <bo.li.liu@oracle.com>
      CC: stable@vger.kernel.org	# 3.16+
      Fixes: fc4adbff ("btrfs: Drop EXTENT_UPTODATE check in hole punching and direct locking")
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      d42014c8
    • Frederic Barrat's avatar
      cxl: Fix error path on bad ioctl · cc558c20
      Frederic Barrat authored
      [ Upstream commit cec422c1 ]
      
      Fix error path if we can't copy user structure on CXL_IOCTL_START_WORK
      ioctl. We shouldn't unlock the context status mutex as it was not
      locked (yet).
      
      Fixes: 0712dc7e ("cxl: Fix issues when unmapping contexts")
      Cc: stable@vger.kernel.org # v3.19+
      Signed-off-by: default avatarFrederic Barrat <fbarrat@linux.vnet.ibm.com>
      Reviewed-by: default avatarVaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Reviewed-by: default avatarAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      cc558c20
    • Al Viro's avatar
      ufs: set correct ->s_maxsize · c58e11d1
      Al Viro authored
      [ Upstream commit 6b0d144f ]
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      c58e11d1
    • Al Viro's avatar
      fix ufs_isblockset() · 7ba100d5
      Al Viro authored
      [ Upstream commit 414cf718 ]
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      7ba100d5
    • Tejun Heo's avatar
      cpuset: consider dying css as offline · 7f805350
      Tejun Heo authored
      [ Upstream commit 41c25707 ]
      
      In most cases, a cgroup controller don't care about the liftimes of
      cgroups.  For the controller, a css becomes online when ->css_online()
      is called on it and offline when ->css_offline() is called.
      
      However, cpuset is special in that the user interface it exposes cares
      whether certain cgroups exist or not.  Combined with the RCU delay
      between cgroup removal and css offlining, this can lead to user
      visible behavior oddities where operations which should succeed after
      cgroup removals fail for some time period.  The effects of cgroup
      removals are delayed when seen from userland.
      
      This patch adds css_is_dying() which tests whether offline is pending
      and updates is_cpuset_online() so that the function returns false also
      while offline is pending.  This gets rid of the userland visible
      delays.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarDaniel Jordan <daniel.m.jordan@oracle.com>
      Link: http://lkml.kernel.org/r/327ca1f5-7957-fbb9-9e5f-9ba149d40ba2@oracle.com
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      7f805350
    • Matt Ranostay's avatar
      iio: proximity: as3935: fix AS3935_INT mask · 51037ec2
      Matt Ranostay authored
      [ Upstream commit 275292d3 ]
      
      AS3935 interrupt mask has been incorrect so valid lightning events
      would never trigger an buffer event. Also noise interrupt should be
      BIT(0).
      
      Fixes: 24ddb0e4 ("iio: Add AS3935 lightning sensor support")
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarMatt Ranostay <matt.ranostay@konsulko.com>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      51037ec2
    • Oleg Drokin's avatar
      staging/lustre/lov: remove set_fs() call from lov_getstripe() · 60e9d774
      Oleg Drokin authored
      [ Upstream commit 0a33252e ]
      
      lov_getstripe() calls set_fs(KERNEL_DS) so that it can handle a struct
      lov_user_md pointer from user- or kernel-space.  This changes the
      behavior of copy_from_user() on SPARC and may result in a misaligned
      access exception which in turn oopses the kernel.  In fact the
      relevant argument to lov_getstripe() is never called with a
      kernel-space pointer and so changing the address limits is unnecessary
      and so we remove the calls to save, set, and restore the address
      limits.
      Signed-off-by: default avatarJohn L. Hammond <john.hammond@intel.com>
      Reviewed-on: http://review.whamcloud.com/6150
      Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3221Reviewed-by: default avatarAndreas Dilger <andreas.dilger@intel.com>
      Reviewed-by: default avatarLi Wei <wei.g.li@intel.com>
      Signed-off-by: default avatarOleg Drokin <green@linuxhacker.ru>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      60e9d774
    • Michael Thalmeier's avatar
      usb: chipidea: debug: check before accessing ci_role · 6f4f7e81
      Michael Thalmeier authored
      [ Upstream commit 0340ff83 ]
      
      ci_role BUGs when the role is >= CI_ROLE_END.
      
      Cc: stable@vger.kernel.org  #v3.10+
      Signed-off-by: default avatarMichael Thalmeier <michael.thalmeier@hale.at>
      Signed-off-by: default avatarPeter Chen <peter.chen@nxp.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      6f4f7e81
    • Jisheng Zhang's avatar
      usb: chipidea: udc: fix NULL pointer dereference if udc_start failed · 9738b3df
      Jisheng Zhang authored
      [ Upstream commit aa1f058d ]
      
      Fix below NULL pointer dereference. we set ci->roles[CI_ROLE_GADGET]
      too early in ci_hdrc_gadget_init(), if udc_start() fails due to some
      reason, the ci->roles[CI_ROLE_GADGET] check in  ci_hdrc_gadget_destroy
      can't protect us.
      
      We fix this issue by only setting ci->roles[CI_ROLE_GADGET] if
      udc_start() succeed.
      
      [    1.398550] Unable to handle kernel NULL pointer dereference at
      virtual address 00000000
      ...
      [    1.448600] PC is at dma_pool_free+0xb8/0xf0
      [    1.453012] LR is at dma_pool_free+0x28/0xf0
      [    2.113369] [<ffffff80081817d8>] dma_pool_free+0xb8/0xf0
      [    2.118857] [<ffffff800841209c>] destroy_eps+0x4c/0x68
      [    2.124165] [<ffffff8008413770>] ci_hdrc_gadget_destroy+0x28/0x50
      [    2.130461] [<ffffff800840fa30>] ci_hdrc_probe+0x588/0x7e8
      [    2.136129] [<ffffff8008380fb8>] platform_drv_probe+0x50/0xb8
      [    2.142066] [<ffffff800837f494>] driver_probe_device+0x1fc/0x2a8
      [    2.148270] [<ffffff800837f68c>] __device_attach_driver+0x9c/0xf8
      [    2.154563] [<ffffff800837d570>] bus_for_each_drv+0x58/0x98
      [    2.160317] [<ffffff800837f174>] __device_attach+0xc4/0x138
      [    2.166072] [<ffffff800837f738>] device_initial_probe+0x10/0x18
      [    2.172185] [<ffffff800837e58c>] bus_probe_device+0x94/0xa0
      [    2.177940] [<ffffff800837c560>] device_add+0x3f0/0x560
      [    2.183337] [<ffffff8008380d20>] platform_device_add+0x180/0x240
      [    2.189541] [<ffffff800840f0e8>] ci_hdrc_add_device+0x440/0x4f8
      [    2.195654] [<ffffff8008414194>] ci_hdrc_usb2_probe+0x13c/0x2d8
      [    2.201769] [<ffffff8008380fb8>] platform_drv_probe+0x50/0xb8
      [    2.207705] [<ffffff800837f494>] driver_probe_device+0x1fc/0x2a8
      [    2.213910] [<ffffff800837f5ec>] __driver_attach+0xac/0xb0
      [    2.219575] [<ffffff800837d4b0>] bus_for_each_dev+0x60/0xa0
      [    2.225329] [<ffffff800837ec80>] driver_attach+0x20/0x28
      [    2.230816] [<ffffff800837e880>] bus_add_driver+0x1d0/0x238
      [    2.236571] [<ffffff800837fdb0>] driver_register+0x60/0xf8
      [    2.242237] [<ffffff8008380ef4>] __platform_driver_register+0x44/0x50
      [    2.248891] [<ffffff80086fd440>] ci_hdrc_usb2_driver_init+0x18/0x20
      [    2.255365] [<ffffff8008082950>] do_one_initcall+0x38/0x128
      [    2.261121] [<ffffff80086e0d00>] kernel_init_freeable+0x1ac/0x250
      [    2.267414] [<ffffff800852f0b8>] kernel_init+0x10/0x100
      [    2.272810] [<ffffff8008082680>] ret_from_fork+0x10/0x50
      
      Cc: stable <stable@vger.kernel.org>
      Fixes: 3f124d23 ("usb: chipidea: add role init and destroy APIs")
      Signed-off-by: default avatarJisheng Zhang <jszhang@marvell.com>
      Signed-off-by: default avatarPeter Chen <peter.chen@nxp.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      9738b3df
    • Thinh Nguyen's avatar
      usb: gadget: f_mass_storage: Serialize wake and sleep execution · db87e41d
      Thinh Nguyen authored
      [ Upstream commit dc9217b6 ]
      
      f_mass_storage has a memorry barrier issue with the sleep and wake
      functions that can cause a deadlock. This results in intermittent hangs
      during MSC file transfer. The host will reset the device after receiving
      no response to resume the transfer. This issue is seen when dwc3 is
      processing 2 transfer-in-progress events at the same time, invoking
      completion handlers for CSW and CBW. Also this issue occurs depending on
      the system timing and latency.
      
      To increase the chance to hit this issue, you can force dwc3 driver to
      wait and process those 2 events at once by adding a small delay (~100us)
      in dwc3_check_event_buf() whenever the request is for CSW and read the
      event count again. Avoid debugging with printk and ftrace as extra
      delays and memory barrier will mask this issue.
      
      Scenario which can lead to failure:
      -----------------------------------
      1) The main thread sleeps and waits for the next command in
         get_next_command().
      2) bulk_in_complete() wakes up main thread for CSW.
      3) bulk_out_complete() tries to wake up the running main thread for CBW.
      4) thread_wakeup_needed is not loaded with correct value in
         sleep_thread().
      5) Main thread goes to sleep again.
      
      The pattern is shown below. Note the 2 critical variables.
       * common->thread_wakeup_needed
       * bh->state
      
      	CPU 0 (sleep_thread)		CPU 1 (wakeup_thread)
      	==============================  ===============================
      
      					bh->state = BH_STATE_FULL;
      					smp_wmb();
      	thread_wakeup_needed = 0;	thread_wakeup_needed = 1;
      	smp_rmb();
      	if (bh->state != BH_STATE_FULL)
      		sleep again ...
      
      As pointed out by Alan Stern, this is an R-pattern issue. The issue can
      be seen when there are two wakeups in quick succession. The
      thread_wakeup_needed can be overwritten in sleep_thread, and the read of
      the bh->state maybe reordered before the write to thread_wakeup_needed.
      
      This patch applies full memory barrier smp_mb() in both sleep_thread()
      and wakeup_thread() to ensure the order which the thread_wakeup_needed
      and bh->state are written and loaded.
      
      However, a better solution in the future would be to use wait_queue
      method that takes care of managing memory barrier between waker and
      waiter.
      
      Cc: <stable@vger.kernel.org>
      Acked-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: default avatarThinh Nguyen <thinhn@synopsys.com>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      db87e41d
    • Konstantin Khlebnikov's avatar
      ext4: keep existing extra fields when inode expands · 92629579
      Konstantin Khlebnikov authored
      [ Upstream commit 887a9730 ]
      
      ext4_expand_extra_isize() should clear only space between old and new
      size.
      
      Fixes: 6dd4ee7c # v2.6.23
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      92629579
    • Jan Kara's avatar
      ext4: fix SEEK_HOLE · 4d1adc2a
      Jan Kara authored
      [ Upstream commit 7d95eddf ]
      
      Currently, SEEK_HOLE implementation in ext4 may both return that there's
      a hole at some offset although that offset already has data and skip
      some holes during a search for the next hole. The first problem is
      demostrated by:
      
      xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "seek -h 0" file
      wrote 57344/57344 bytes at offset 0
      56 KiB, 14 ops; 0.0000 sec (2.054 GiB/sec and 538461.5385 ops/sec)
      Whence	Result
      HOLE	0
      
      Where we can see that SEEK_HOLE wrongly returned offset 0 as containing
      a hole although we have written data there. The second problem can be
      demonstrated by:
      
      xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "pwrite 128k 8k"
             -c "seek -h 0" file
      
      wrote 57344/57344 bytes at offset 0
      56 KiB, 14 ops; 0.0000 sec (1.978 GiB/sec and 518518.5185 ops/sec)
      wrote 8192/8192 bytes at offset 131072
      8 KiB, 2 ops; 0.0000 sec (2 GiB/sec and 500000.0000 ops/sec)
      Whence	Result
      HOLE	139264
      
      Where we can see that hole at offsets 56k..128k has been ignored by the
      SEEK_HOLE call.
      
      The underlying problem is in the ext4_find_unwritten_pgoff() which is
      just buggy. In some cases it fails to update returned offset when it
      finds a hole (when no pages are found or when the first found page has
      higher index than expected), in some cases conditions for detecting hole
      are just missing (we fail to detect a situation where indices of
      returned pages are not contiguous).
      
      Fix ext4_find_unwritten_pgoff() to properly detect non-contiguous page
      indices and also handle all cases where we got less pages then expected
      in one place and handle it properly there.
      
      CC: stable@vger.kernel.org
      Fixes: c8c0df24
      CC: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      4d1adc2a
    • Wanpeng Li's avatar
      KVM: async_pf: avoid async pf injection when in guest mode · 8406f302
      Wanpeng Li authored
      [ Upstream commit 9bc1f09f ]
      
       INFO: task gnome-terminal-:1734 blocked for more than 120 seconds.
             Not tainted 4.12.0-rc4+ #8
       "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
       gnome-terminal- D    0  1734   1015 0x00000000
       Call Trace:
        __schedule+0x3cd/0xb30
        schedule+0x40/0x90
        kvm_async_pf_task_wait+0x1cc/0x270
        ? __vfs_read+0x37/0x150
        ? prepare_to_swait+0x22/0x70
        do_async_page_fault+0x77/0xb0
        ? do_async_page_fault+0x77/0xb0
        async_page_fault+0x28/0x30
      
      This is triggered by running both win7 and win2016 on L1 KVM simultaneously,
      and then gives stress to memory on L1, I can observed this hang on L1 when
      at least ~70% swap area is occupied on L0.
      
      This is due to async pf was injected to L2 which should be injected to L1,
      L2 guest starts receiving pagefault w/ bogus %cr2(apf token from the host
      actually), and L1 guest starts accumulating tasks stuck in D state in
      kvm_async_pf_task_wait() since missing PAGE_READY async_pfs.
      
      This patch fixes the hang by doing async pf when executing L1 guest.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      8406f302
    • Marc Zyngier's avatar
      arm: KVM: Allow unaligned accesses at HYP · fdb67b2a
      Marc Zyngier authored
      [ Upstream commit 33b5c388 ]
      
      We currently have the HSCTLR.A bit set, trapping unaligned accesses
      at HYP, but we're not really prepared to deal with it.
      
      Since the rest of the kernel is pretty happy about that, let's follow
      its example and set HSCTLR.A to zero. Modern CPUs don't really care.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarChristoffer Dall <cdall@linaro.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      fdb67b2a
    • Wanpeng Li's avatar
      KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation · 1e8dabb6
      Wanpeng Li authored
      [ Upstream commit a3641631 ]
      
      If "i" is the last element in the vcpu->arch.cpuid_entries[] array, it
      potentially can be exploited the vulnerability. this will out-of-bounds
      read and write.  Luckily, the effect is small:
      
      	/* when no next entry is found, the current entry[i] is reselected */
      	for (j = i + 1; ; j = (j + 1) % nent) {
      		struct kvm_cpuid_entry2 *ej = &vcpu->arch.cpuid_entries[j];
      		if (ej->function == e->function) {
      
      It reads ej->maxphyaddr, which is user controlled.  However...
      
      			ej->flags |= KVM_CPUID_FLAG_STATE_READ_NEXT;
      
      After cpuid_entries there is
      
      	int maxphyaddr;
      	struct x86_emulate_ctxt emulate_ctxt;  /* 16-byte aligned */
      
      So we have:
      
      - cpuid_entries at offset 1B50 (6992)
      - maxphyaddr at offset 27D0 (6992 + 3200 = 10192)
      - padding at 27D4...27DF
      - emulate_ctxt at 27E0
      
      And it writes in the padding.  Pfew, writing the ops field of emulate_ctxt
      would have been much worse.
      
      This patch fixes it by modding the index to avoid the out-of-bounds
      access. Worst case, i == j and ej->function == e->function,
      the loop can bail out.
      Reported-by: default avatarMoguofang <moguofang@huawei.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Guofang Mo <moguofang@huawei.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      1e8dabb6
    • Paolo Bonzini's avatar
      kvm: async_pf: fix rcu_irq_enter() with irqs enabled · 702eb8d2
      Paolo Bonzini authored
      [ Upstream commit bbaf0e2b ]
      
      native_safe_halt enables interrupts, and you just shouldn't
      call rcu_irq_enter() with interrupts enabled.  Reorder the
      call with the following local_irq_disable() to respect the
      invariant.
      Reported-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Acked-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      702eb8d2