1. 27 Jul, 2016 40 commits
    • Alex Deucher's avatar
      drm/radeon: fix asic initialization for virtualized environments · 24b37a76
      Alex Deucher authored
      commit 05082b8b upstream.
      
      When executing in a PCI passthrough based virtuzliation environment, the
      hypervisor will usually attempt to send a PCIe bus reset signal to the
      ASIC when the VM reboots. In this scenario, the card is not correctly
      initialized, but we still consider it to be posted. Therefore, in a
      passthrough based environemnt we should always post the card to guarantee
      it is in a good state for driver initialization.
      
      Ported from amdgpu commit:
      amdgpu: fix asic initialization for virtualized environments
      
      Cc: Andres Rodriguez <andres.rodriguez@amd.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      24b37a76
    • Jeff Mahoney's avatar
      btrfs: account for non-CoW'd blocks in btrfs_abort_transaction · ba797a4f
      Jeff Mahoney authored
      commit 64c12921 upstream.
      
      The test for !trans->blocks_used in btrfs_abort_transaction is
      insufficient to determine whether it's safe to drop the transaction
      handle on the floor.  btrfs_cow_block, informed by should_cow_block,
      can return blocks that have already been CoW'd in the current
      transaction.  trans->blocks_used is only incremented for new block
      allocations. If an operation overlaps the blocks in the current
      transaction entirely and must abort the transaction, we'll happily
      let it clean up the trans handle even though it may have modified
      the blocks and will commit an incomplete operation.
      
      In the long-term, I'd like to do closer tracking of when the fs
      is actually modified so we can still recover as gracefully as possible,
      but that approach will need some discussion.  In the short term,
      since this is the only code using trans->blocks_used, let's just
      switch it to a bool indicating whether any blocks were used and set
      it when should_cow_block returns false.
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ba797a4f
    • Tejun Heo's avatar
      percpu: fix synchronization between synchronous map extension and chunk destruction · 05f18e6f
      Tejun Heo authored
      commit 6710e594 upstream.
      
      For non-atomic allocations, pcpu_alloc() can try to extend the area
      map synchronously after dropping pcpu_lock; however, the extension
      wasn't synchronized against chunk destruction and the chunk might get
      freed while extension is in progress.
      
      This patch fixes the bug by putting most of non-atomic allocations
      under pcpu_alloc_mutex to synchronize against pcpu_balance_work which
      is responsible for async chunk management including destruction.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-and-tested-by: default avatarAlexei Starovoitov <alexei.starovoitov@gmail.com>
      Reported-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Fixes: 1a4d7607 ("percpu: implement asynchronous chunk population")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      05f18e6f
    • Tejun Heo's avatar
      percpu: fix synchronization between chunk->map_extend_work and chunk destruction · 5825418a
      Tejun Heo authored
      commit 4f996e23 upstream.
      
      Atomic allocations can trigger async map extensions which is serviced
      by chunk->map_extend_work.  pcpu_balance_work which is responsible for
      destroying idle chunks wasn't synchronizing properly against
      chunk->map_extend_work and may end up freeing the chunk while the work
      item is still in flight.
      
      This patch fixes the bug by rolling async map extension operations
      into pcpu_balance_work.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-and-tested-by: default avatarAlexei Starovoitov <alexei.starovoitov@gmail.com>
      Reported-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Fixes: 9c824b6a ("percpu: make sure chunk->map array has available space")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5825418a
    • Miklos Szeredi's avatar
      af_unix: fix hard linked sockets on overlay · d808791e
      Miklos Szeredi authored
      commit eb0a4a47 upstream.
      
      Overlayfs uses separate inodes even in the case of hard links on the
      underlying filesystems.  This is a problem for AF_UNIX socket
      implementation which indexes sockets based on the inode.  This resulted in
      hard linked sockets not working.
      
      The fix is to use the real, underlying inode.
      
      Test case follows:
      
      -- ovl-sock-test.c --
      #include <unistd.h>
      #include <err.h>
      #include <sys/socket.h>
      #include <sys/un.h>
      
      #define SOCK "test-sock"
      #define SOCK2 "test-sock2"
      
      int main(void)
      {
      	int fd, fd2;
      	struct sockaddr_un addr = {
      		.sun_family = AF_UNIX,
      		.sun_path = SOCK,
      	};
      	struct sockaddr_un addr2 = {
      		.sun_family = AF_UNIX,
      		.sun_path = SOCK2,
      	};
      
      	unlink(SOCK);
      	unlink(SOCK2);
      	if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
      		err(1, "socket");
      	if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1)
      		err(1, "bind");
      	if (listen(fd, 0) == -1)
      		err(1, "listen");
      	if (link(SOCK, SOCK2) == -1)
      		err(1, "link");
      	if ((fd2 = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
      		err(1, "socket");
      	if (connect(fd2, (struct sockaddr *) &addr2, sizeof(addr2)) == -1)
      		err (1, "connect");
      	return 0;
      }
      ----
      Reported-by: default avatarAlexander Morozov <alexandr.morozov@docker.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d808791e
    • Miklos Szeredi's avatar
      vfs: add d_real_inode() helper · 8b8b6b53
      Miklos Szeredi authored
      commit a1180844 upstream.
      
      Needed by the following fix.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b8b6b53
    • James Morse's avatar
      arm64: kernel: Save and restore UAO and addr_limit on exception entry · 2ce30374
      James Morse authored
      commit e19a6ee2 upstream.
      
      If we take an exception while at EL1, the exception handler inherits
      the original context's addr_limit and PSTATE.UAO values. To be consistent
      always reset addr_limit and PSTATE.UAO on (re-)entry to EL1. This
      prevents accidental re-use of the original context's addr_limit.
      
      Based on a similar patch for arm from Russell King.
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Reviewed-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarJames Morse <james.morse@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2ce30374
    • Shaokun Zhang's avatar
      arm64: mm: remove page_mapping check in __sync_icache_dcache · bd1e0058
      Shaokun Zhang authored
      commit 20c27a42 upstream.
      
      __sync_icache_dcache unconditionally skips the cache maintenance for
      anonymous pages, under the assumption that flushing is only required in
      the presence of D-side aliases [see 7249b79f ("arm64: Do not flush
      the D-cache for anonymous pages")].
      
      Unfortunately, this breaks migration of anonymous pages holding
      self-modifying code, where userspace cannot be reasonably expected to
      reissue maintenance instructions in response to a migration.
      
      This patch fixes the problem by removing the broken page_mapping(page)
      check from the cache syncing code, otherwise we may end up fetching and
      executing stale instructions from the PoU.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarShaokun Zhang <zhangshaokun@hisilicon.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bd1e0058
    • Mark Rutland's avatar
      arm64: fix dump_instr when PAN and UAO are in use · e75da3fe
      Mark Rutland authored
      commit c5cea06b upstream.
      
      If the kernel is set to show unhandled signals, and a user task does not
      handle a SIGILL as a result of an instruction abort, we will attempt to
      log the offending instruction with dump_instr before killing the task.
      
      We use dump_instr to log the encoding of the offending userspace
      instruction. However, dump_instr is also used to dump instructions from
      kernel space, and internally always switches to KERNEL_DS before dumping
      the instruction with get_user. When both PAN and UAO are in use, reading
      a user instruction via get_user while in KERNEL_DS will result in a
      permission fault, which leads to an Oops.
      
      As we have regs corresponding to the context of the original instruction
      abort, we can inspect this and only flip to KERNEL_DS if the original
      abort was taken from the kernel, avoiding this issue. At the same time,
      remove the redundant (and incorrect) comments regarding the order
      dump_mem and dump_instr are called in.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Reported-by: default avatarVladimir Murzin <vladimir.murzin@arm.com>
      Tested-by: default avatarVladimir Murzin <vladimir.murzin@arm.com>
      Fixes: 57f4959b ("arm64: kernel: Add support for User Access Override")
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e75da3fe
    • Robin Murphy's avatar
      drm/nouveau/Revert "drm/nouveau/device/pci: set as non-CPU-coherent on ARM64" · 896f190d
      Robin Murphy authored
      commit 539aae6e upstream.
      
      This reverts commit 1733a2ad.
      
      There is apparently something amiss with the way the TTM code handles
      DMA buffers, which the above commit was attempting to work around for
      arm64 systems with non-coherent PCI. Unfortunately, this completely
      breaks systems *with* coherent PCI (which appear to be the majority).
      
      Booting a plain arm64 defconfig + CONFIG_DRM + CONFIG_DRM_NOUVEAU on
      a machine with a PCI GPU having coherent dma_map_ops (in this case a
      7600GT card plugged into an ARM Juno board) results in a fatal crash:
      
      [    2.803438] nouveau 0000:06:00.0: DRM: allocated 1024x768 fb: 0x9000, bo ffffffc976141c00
      [    2.897662] Unable to handle kernel NULL pointer dereference at virtual address 000001ac
      [    2.897666] pgd = ffffff8008e00000
      [    2.897675] [000001ac] *pgd=00000009ffffe003, *pud=00000009ffffe003, *pmd=0000000000000000
      [    2.897680] Internal error: Oops: 96000045 [#1] PREEMPT SMP
      [    2.897685] Modules linked in:
      [    2.897692] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.6.0-rc5+ #543
      [    2.897694] Hardware name: ARM Juno development board (r1) (DT)
      [    2.897699] task: ffffffc9768a0000 ti: ffffffc9768a8000 task.ti: ffffffc9768a8000
      [    2.897711] PC is at __memcpy+0x7c/0x180
      [    2.897719] LR is at OUT_RINGp+0x34/0x70
      [    2.897724] pc : [<ffffff80083465fc>] lr : [<ffffff800854248c>] pstate: 80000045
      [    2.897726] sp : ffffffc9768ab360
      [    2.897732] x29: ffffffc9768ab360 x28: 0000000000000001
      [    2.897738] x27: ffffffc97624c000 x26: 0000000000000000
      [    2.897744] x25: 0000000000000080 x24: 0000000000006c00
      [    2.897749] x23: 0000000000000005 x22: ffffffc97624c010
      [    2.897755] x21: 0000000000000004 x20: 0000000000000004
      [    2.897761] x19: ffffffc9763da000 x18: ffffffc976b2491c
      [    2.897766] x17: 0000000000000007 x16: 0000000000000006
      [    2.897771] x15: 0000000000000001 x14: 0000000000000001
      [    2.897777] x13: 0000000000e31b70 x12: ffffffc9768a0080
      [    2.897783] x11: 0000000000000000 x10: fffffffffffffb00
      [    2.897788] x9 : 0000000000000000 x8 : 0000000000000000
      [    2.897793] x7 : 0000000000000000 x6 : 00000000000001ac
      [    2.897799] x5 : 00000000ffffffff x4 : 0000000000000000
      [    2.897804] x3 : 0000000000000010 x2 : 0000000000000010
      [    2.897810] x1 : ffffffc97624c010 x0 : 00000000000001ac
      ...
      [    2.898494] Call trace:
      [    2.898499] Exception stack(0xffffffc9768ab1a0 to 0xffffffc9768ab2c0)
      [    2.898506] b1a0: ffffffc9763da000 0000000000000004 ffffffc9768ab360 ffffff80083465fc
      [    2.898513] b1c0: ffffffc976801e00 ffffffc9762b8000 ffffffc9768ab1f0 ffffff80080ec158
      [    2.898520] b1e0: ffffffc9768ab230 ffffff8008496d04 ffffffc975ce6d80 ffffffc9768ab36e
      [    2.898527] b200: ffffffc9768ab36f ffffffc9768ab29d ffffffc9768ab29e ffffffc9768a0000
      [    2.898533] b220: ffffffc9768ab250 ffffff80080e70c0 ffffffc9768ab270 ffffff8008496e44
      [    2.898540] b240: 00000000000001ac ffffffc97624c010 0000000000000010 0000000000000010
      [    2.898546] b260: 0000000000000000 00000000ffffffff 00000000000001ac 0000000000000000
      [    2.898552] b280: 0000000000000000 0000000000000000 fffffffffffffb00 0000000000000000
      [    2.898558] b2a0: ffffffc9768a0080 0000000000e31b70 0000000000000001 0000000000000001
      [    2.898566] [<ffffff80083465fc>] __memcpy+0x7c/0x180
      [    2.898574] [<ffffff800853e164>] nv04_fbcon_imageblit+0x1d4/0x2e8
      [    2.898582] [<ffffff800853d6d0>] nouveau_fbcon_imageblit+0xd8/0xe0
      [    2.898591] [<ffffff80083c4db4>] soft_cursor+0x154/0x1d8
      [    2.898598] [<ffffff80083c47b4>] bit_cursor+0x4fc/0x538
      [    2.898605] [<ffffff80083c0cfc>] fbcon_cursor+0x134/0x1a8
      [    2.898613] [<ffffff800841c280>] hide_cursor+0x38/0xa0
      [    2.898620] [<ffffff800841d420>] redraw_screen+0x120/0x228
      [    2.898628] [<ffffff80083bf268>] fbcon_prepare_logo+0x370/0x3f8
      [    2.898635] [<ffffff80083bf640>] fbcon_init+0x350/0x560
      [    2.898641] [<ffffff800841c634>] visual_init+0xac/0x108
      [    2.898648] [<ffffff800841df14>] do_bind_con_driver+0x1c4/0x3a8
      [    2.898655] [<ffffff800841e4f4>] do_take_over_console+0x174/0x1e8
      [    2.898662] [<ffffff80083bf8c4>] do_fbcon_takeover+0x74/0x100
      [    2.898669] [<ffffff80083c3e44>] fbcon_event_notify+0x8cc/0x920
      [    2.898680] [<ffffff80080d7e38>] notifier_call_chain+0x50/0x90
      [    2.898685] [<ffffff80080d8214>] __blocking_notifier_call_chain+0x4c/0x90
      [    2.898691] [<ffffff80080d826c>] blocking_notifier_call_chain+0x14/0x20
      [    2.898696] [<ffffff80083c5e1c>] fb_notifier_call_chain+0x1c/0x28
      [    2.898703] [<ffffff80083c81ac>] register_framebuffer+0x1cc/0x2e0
      [    2.898712] [<ffffff800845da80>] drm_fb_helper_initial_config+0x288/0x3e8
      [    2.898719] [<ffffff800853da20>] nouveau_fbcon_init+0xe0/0x118
      [    2.898727] [<ffffff800852d2f8>] nouveau_drm_load+0x268/0x890
      [    2.898734] [<ffffff8008466e24>] drm_dev_register+0xbc/0xc8
      [    2.898740] [<ffffff8008468a88>] drm_get_pci_dev+0xa0/0x180
      [    2.898747] [<ffffff800852cb28>] nouveau_drm_probe+0x1a0/0x1e0
      [    2.898755] [<ffffff80083a32e0>] pci_device_probe+0x98/0x110
      [    2.898763] [<ffffff800858e434>] driver_probe_device+0x204/0x2b0
      [    2.898770] [<ffffff800858e58c>] __driver_attach+0xac/0xb0
      [    2.898777] [<ffffff800858c3e0>] bus_for_each_dev+0x60/0xa0
      [    2.898783] [<ffffff800858dbc0>] driver_attach+0x20/0x28
      [    2.898789] [<ffffff800858d7b0>] bus_add_driver+0x1d0/0x238
      [    2.898796] [<ffffff800858ed50>] driver_register+0x60/0xf8
      [    2.898802] [<ffffff80083a20dc>] __pci_register_driver+0x3c/0x48
      [    2.898809] [<ffffff8008468eb4>] drm_pci_init+0xf4/0x120
      [    2.898818] [<ffffff8008c56fc0>] nouveau_drm_init+0x21c/0x230
      [    2.898825] [<ffffff80080829d4>] do_one_initcall+0x8c/0x190
      [    2.898832] [<ffffff8008c31af4>] kernel_init_freeable+0x14c/0x1f0
      [    2.898839] [<ffffff80088a0c20>] kernel_init+0x10/0x100
      [    2.898845] [<ffffff8008085e10>] ret_from_fork+0x10/0x40
      [    2.898853] Code: a88120c7 a8c12027 a88120c7 a8c12027 (a88120c7)
      [    2.898871] ---[ end trace d5713dcad023ee04 ]---
      [    2.898888] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
      
      In a toss-up between the GPU seeing stale data artefacts on some systems
      vs. catastrophic kernel crashes on other systems, the latter would seem
      to take precedence, so revert this change until the real underlying
      problem can be fixed.
      Signed-off-by: default avatarRobin Murphy <robin.murphy@arm.com>
      Acked-by: default avatarAlexandre Courbot <acourbot@nvidia.com>
      [acourbot@nvidia.com: port to Nouveau tree, remove bits in lib/]
      Signed-off-by: default avatarAlexandre Courbot <acourbot@nvidia.com>
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      896f190d
    • Junichi Nomura's avatar
      ipmi: Remove smi_msg from waiting_rcv_msgs list before handle_one_recv_msg() · b6d3c5d0
      Junichi Nomura authored
      commit ae4ea9a2 upstream.
      
      Commit 7ea0ed2b ("ipmi: Make the message handler easier to use for
      SMI interfaces") changed handle_new_recv_msgs() to call handle_one_recv_msg()
      for a smi_msg while the smi_msg is still connected to waiting_rcv_msgs list.
      That could lead to following list corruption problems:
      
      1) low-level function treats smi_msg as not connected to list
      
        handle_one_recv_msg() could end up calling smi_send(), which
        assumes the msg is not connected to list.
      
        For example, the following sequence could corrupt list by
        doing list_add_tail() for the entry still connected to other list.
      
          handle_new_recv_msgs()
            msg = list_entry(waiting_rcv_msgs)
            handle_one_recv_msg(msg)
              handle_ipmb_get_msg_cmd(msg)
                smi_send(msg)
                  spin_lock(xmit_msgs_lock)
                  list_add_tail(msg)
                  spin_unlock(xmit_msgs_lock)
      
      2) race between multiple handle_new_recv_msgs() instances
      
        handle_new_recv_msgs() once releases waiting_rcv_msgs_lock before calling
        handle_one_recv_msg() then retakes the lock and list_del() it.
      
        If others call handle_new_recv_msgs() during the window shown below
        list_del() will be done twice for the same smi_msg.
      
        handle_new_recv_msgs()
          spin_lock(waiting_rcv_msgs_lock)
          msg = list_entry(waiting_rcv_msgs)
          spin_unlock(waiting_rcv_msgs_lock)
        |
        | handle_one_recv_msg(msg)
        |
          spin_lock(waiting_rcv_msgs_lock)
          list_del(msg)
          spin_unlock(waiting_rcv_msgs_lock)
      
      Fixes: 7ea0ed2b ("ipmi: Make the message handler easier to use for SMI interfaces")
      Signed-off-by: default avatarJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      [Added a comment to describe why this works.]
      Signed-off-by: default avatarCorey Minyard <cminyard@mvista.com>
      Tested-by: default avatarYe Feng <yefeng.yl@alibaba-inc.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b6d3c5d0
    • Stefan Agner's avatar
      drm/fsl-dcu: use flat regmap cache · c62d5ac1
      Stefan Agner authored
      commit ce492b3b upstream.
      
      Using flat regmap cache instead of RB-tree to avoid the following
      lockdep warning on driver load:
      WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:2755 lockdep_trace_alloc+0x15c/0x160()
      DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
      
      The RB-tree regmap cache needs to allocate new space on first
      writes. However, allocations in an atomic context (e.g. when a
      spinlock is held) are not allowed. The function regmap_write
      calls map->lock, which acquires a spinlock in the fast_io case.
      Since the FSL DCU driver uses MMIO, the regmap bus of type
      regmap_mmio is being used which has fast_io set to true.
      
      Use flat regmap cache and specify max register to be large
      enouth to cover all registers available in LS1021a and Vybrids
      register space.
      Signed-off-by: default avatarStefan Agner <stefan@agner.ch>
      Cc: Mark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c62d5ac1
    • Mathieu Larouche's avatar
      drm/mgag200: Black screen fix for G200e rev 4 · 6d364182
      Mathieu Larouche authored
      commit d3922b69 upstream.
      
      - Fixed black screen for some resolutions of G200e rev4
      - Fixed testm & testn which had predetermined value.
      Reported-by: default avatarJan Beulich <jbeulich@suse.com>
      Signed-off-by: default avatarMathieu Larouche <mathieu.larouche@matrox.com>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6d364182
    • Vegard Nossum's avatar
      apparmor: fix oops, validate buffer size in apparmor_setprocattr() · dba63efd
      Vegard Nossum authored
      commit 30a46a46 upstream.
      
      When proc_pid_attr_write() was changed to use memdup_user apparmor's
      (interface violating) assumption that the setprocattr buffer was always
      a single page was violated.
      
      The size test is not strictly speaking needed as proc_pid_attr_write()
      will reject anything larger, but for the sake of robustness we can keep
      it in.
      
      SMACK and SELinux look safe to me, but somebody else should probably
      have a look just in case.
      
      Based on original patch from Vegard Nossum <vegard.nossum@oracle.com>
      modified for the case that apparmor provides null termination.
      
      Fixes: bb646cdbReported-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: John Johansen <john.johansen@canonical.com>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: default avatarJohn Johansen <john.johansen@canonical.com>
      Reviewed-by: default avatarTyler Hicks <tyhicks@canonical.com>
      Signed-off-by: default avatarJames Morris <james.l.morris@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dba63efd
    • Joerg Roedel's avatar
      iommu/amd: Fix unity mapping initialization race · 59235422
      Joerg Roedel authored
      commit 522e5cb7 upstream.
      
      There is a race condition in the AMD IOMMU init code that
      causes requested unity mappings to be blocked by the IOMMU
      for a short period of time. This results on boot failures
      and IO_PAGE_FAULTs on some machines.
      
      Fix this by making sure the unity mappings are installed
      before all other DMA is blocked.
      
      Fixes: aafd8ba0 ('iommu/amd: Implement add_device and remove_device')
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      59235422
    • Joerg Roedel's avatar
      iommu/vt-d: Enable QI on all IOMMUs before setting root entry · a1cf0802
      Joerg Roedel authored
      commit a4c34ff1 upstream.
      
      This seems to be required on some X58 chipsets on systems
      with more than one IOMMU. QI does not work until it is
      enabled on all IOMMUs in the system.
      Reported-by: default avatarDheeraj CVR <cvr.dheeraj@gmail.com>
      Tested-by: default avatarDheeraj CVR <cvr.dheeraj@gmail.com>
      Fixes: 5f0a7f76 ('iommu/vt-d: Make root entry visible for hardware right after allocation')
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a1cf0802
    • Jean-Philippe Brucker's avatar
      iommu/arm-smmu: Wire up map_sg for arm-smmu-v3 · ba785c64
      Jean-Philippe Brucker authored
      commit 9aeb26cf upstream.
      
      The map_sg callback is missing from arm_smmu_ops, but is required by
      iommu.h. Similarly to most other IOMMU drivers, connect it to
      default_iommu_map_sg.
      Signed-off-by: default avatarJean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ba785c64
    • John Keeping's avatar
      iommu/rockchip: Fix zap cache during device attach · e44a3eee
      John Keeping authored
      commit ae8a7910 upstream.
      
      rk_iommu_command() takes a struct rk_iommu and iterates over the slave
      MMUs, so this is doubly wrong in that we're passing in the wrong pointer
      and talking to MMUs that we shouldn't be.
      
      Fixes: cd6438c5 ("iommu/rockchip: Reconstruct to support multi slaves")
      Signed-off-by: default avatarJohn Keeping <john@metanate.com>
      Tested-by: default avatarHeiko Stuebner <heiko@sntech.de>
      Reviewed-by: default avatarHeiko Stuebner <heiko@sntech.de>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e44a3eee
    • Jiri Slaby's avatar
      base: make module_create_drivers_dir race-free · 41fa543b
      Jiri Slaby authored
      commit 7e1b1fc4 upstream.
      
      Modules which register drivers via standard path (driver_register) in
      parallel can cause a warning:
      WARNING: CPU: 2 PID: 3492 at ../fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80
      sysfs: cannot create duplicate filename '/module/saa7146/drivers'
      Modules linked in: hexium_gemini(+) mxb(+) ...
      ...
      Call Trace:
      ...
       [<ffffffff812e63a2>] sysfs_warn_dup+0x62/0x80
       [<ffffffff812e6487>] sysfs_create_dir_ns+0x77/0x90
       [<ffffffff8140f2c4>] kobject_add_internal+0xb4/0x340
       [<ffffffff8140f5b8>] kobject_add+0x68/0xb0
       [<ffffffff8140f631>] kobject_create_and_add+0x31/0x70
       [<ffffffff8157a703>] module_add_driver+0xc3/0xd0
       [<ffffffff8155e5d4>] bus_add_driver+0x154/0x280
       [<ffffffff815604c0>] driver_register+0x60/0xe0
       [<ffffffff8145bed0>] __pci_register_driver+0x60/0x70
       [<ffffffffa0273e14>] saa7146_register_extension+0x64/0x90 [saa7146]
       [<ffffffffa0033011>] hexium_init_module+0x11/0x1000 [hexium_gemini]
      ...
      
      As can be (mostly) seen, driver_register causes this call sequence:
        -> bus_add_driver
          -> module_add_driver
            -> module_create_drivers_dir
      The last one creates "drivers" directory in /sys/module/<...>. When
      this is done in parallel, the directory is attempted to be created
      twice at the same time.
      
      This can be easily reproduced by loading mxb and hexium_gemini in
      parallel:
      while :; do
        modprobe mxb &
        modprobe hexium_gemini
        wait
        rmmod mxb hexium_gemini saa7146_vv saa7146
      done
      
      saa7146 calls pci_register_driver for both mxb and hexium_gemini,
      which means /sys/module/saa7146/drivers is to be created for both of
      them.
      
      Fix this by a new mutex in module_create_drivers_dir which makes the
      test-and-create "drivers" dir atomic.
      
      I inverted the condition and removed 'return' to avoid multiple
      unlocks or a goto.
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Fixes: fe480a26 (Modules: only add drivers/ direcory if needed)
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      41fa543b
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Handle NULL formats in hold_module_trace_bprintk_format() · faa84d83
      Steven Rostedt (Red Hat) authored
      commit 70c8217a upstream.
      
      If a task uses a non constant string for the format parameter in
      trace_printk(), then the trace_printk_fmt variable is set to NULL. This
      variable is then saved in the __trace_printk_fmt section.
      
      The function hold_module_trace_bprintk_format() checks to see if duplicate
      formats are used by modules, and reuses them if so (saves them to the list
      if it is new). But this function calls lookup_format() that does a strcmp()
      to the value (which is now NULL) and can cause a kernel oops.
      
      This wasn't an issue till 3debb0a9 ("tracing: Fix trace_printk() to print
      when not using bprintk()") which added "__used" to the trace_printk_fmt
      variable, and before that, the kernel simply optimized it out (no NULL value
      was saved).
      
      The fix is simply to handle the NULL pointer in lookup_format() and have the
      caller ignore the value if it was NULL.
      
      Link: http://lkml.kernel.org/r/1464769870-18344-1-git-send-email-zhengjun.xing@intel.comReported-by: default avatarxingzhen <zhengjun.xing@intel.com>
      Acked-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Fixes: 3debb0a9 ("tracing: Fix trace_printk() to print when not using bprintk()")
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      faa84d83
    • Allen Hung's avatar
      HID: multitouch: enable palm rejection for Windows Precision Touchpad · 3920fceb
      Allen Hung authored
      commit 6dd2e27a upstream.
      
      The usage Confidence is mandary to Windows Precision Touchpad devices. If
      it is examined in input_mapping on a WIndows Precision Touchpad, a new add
      quirk MT_QUIRK_CONFIDENCE desgned for such devices will be applied to the
      device. A touch with the confidence bit is not set is determined as
      invalid.
      
      Tested on Dell XPS13 9343
      Reviewed-by: default avatarBenjamin Tissoires <benjamin.tissoires@redhat.com>
      Tested-by: Andy Lutomirski <luto@kernel.org> # XPS 13 9350, BIOS 1.4.3
      Signed-off-by: default avatarAllen Hung <allen_hung@dell.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3920fceb
    • Allen Hung's avatar
      Revert "HID: multitouch: enable palm rejection if device implements confidence usage" · 63abc957
      Allen Hung authored
      commit 62630ea7 upstream.
      
      This reverts commit 25a84db1 ("HID: multitouch: enable palm rejection
      if device implements confidence usage")
      
      The commit enables palm rejection for Win8 Precision Touchpad devices but
      the quirk MT_QUIRK_VALID_IS_CONFIDENCE it is using is not working very
      properly. This quirk is originally designed for some WIn7 touchscreens. Use
      of this for a Win8 Precision Touchpad will cause unexpected pointer jumping
      problem.
      Reviewed-by: default avatarBenjamin Tissoires <benjamin.tissoires@redhat.com>
      Tested-by: Andy Lutomirski <luto@kernel.org> # XPS 13 9350, BIOS 1.4.3
      Signed-off-by: default avatarAllen Hung <allen_hung@dell.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      63abc957
    • Scott Bauer's avatar
      HID: hiddev: validate num_values for HIDIOCGUSAGES, HIDIOCSUSAGES commands · 5618231d
      Scott Bauer authored
      commit 93a2001b upstream.
      
      This patch validates the num_values parameter from userland during the
      HIDIOCGUSAGES and HIDIOCSUSAGES commands. Previously, if the report id was set
      to HID_REPORT_ID_UNKNOWN, we would fail to validate the num_values parameter
      leading to a heap overflow.
      Signed-off-by: default avatarScott Bauer <sbauer@plzdonthack.me>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5618231d
    • Oliver Neukum's avatar
      HID: elo: kill not flush the work · f956617c
      Oliver Neukum authored
      commit ed596a4a upstream.
      
      Flushing a work that reschedules itself is not a sensible operation. It needs
      to be killed. Failure to do so leads to a kernel panic in the timer code.
      Signed-off-by: default avatarOliver Neukum <ONeukum@suse.com>
      Reviewed-by: default avatarBenjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f956617c
    • Quentin Casasnovas's avatar
      KVM: nVMX: VMX instructions: fix segment checks when L1 is in long mode. · f846155f
      Quentin Casasnovas authored
      commit ff30ef40 upstream.
      
      I couldn't get Xen to boot a L2 HVM when it was nested under KVM - it was
      getting a GP(0) on a rather unspecial vmread from Xen:
      
           (XEN) ----[ Xen-4.7.0-rc  x86_64  debug=n  Not tainted ]----
           (XEN) CPU:    1
           (XEN) RIP:    e008:[<ffff82d0801e629e>] vmx_get_segment_register+0x14e/0x450
           (XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor (d1v0)
           (XEN) rax: ffff82d0801e6288   rbx: ffff83003ffbfb7c   rcx: fffffffffffab928
           (XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: ffff83000bdd0000
           (XEN) rbp: ffff83000bdd0000   rsp: ffff83003ffbfab0   r8:  ffff830038813910
           (XEN) r9:  ffff83003faf3958   r10: 0000000a3b9f7640   r11: ffff83003f82d418
           (XEN) r12: 0000000000000000   r13: ffff83003ffbffff   r14: 0000000000004802
           (XEN) r15: 0000000000000008   cr0: 0000000080050033   cr4: 00000000001526e0
           (XEN) cr3: 000000003fc79000   cr2: 0000000000000000
           (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
           (XEN) Xen code around <ffff82d0801e629e> (vmx_get_segment_register+0x14e/0x450):
           (XEN)  00 00 41 be 02 48 00 00 <44> 0f 78 74 24 08 0f 86 38 56 00 00 b8 08 68 00
           (XEN) Xen stack trace from rsp=ffff83003ffbfab0:
      
           ...
      
           (XEN) Xen call trace:
           (XEN)    [<ffff82d0801e629e>] vmx_get_segment_register+0x14e/0x450
           (XEN)    [<ffff82d0801f3695>] get_page_from_gfn_p2m+0x165/0x300
           (XEN)    [<ffff82d0801bfe32>] hvmemul_get_seg_reg+0x52/0x60
           (XEN)    [<ffff82d0801bfe93>] hvm_emulate_prepare+0x53/0x70
           (XEN)    [<ffff82d0801ccacb>] handle_mmio+0x2b/0xd0
           (XEN)    [<ffff82d0801be591>] emulate.c#_hvm_emulate_one+0x111/0x2c0
           (XEN)    [<ffff82d0801cd6a4>] handle_hvm_io_completion+0x274/0x2a0
           (XEN)    [<ffff82d0801f334a>] __get_gfn_type_access+0xfa/0x270
           (XEN)    [<ffff82d08012f3bb>] timer.c#add_entry+0x4b/0xb0
           (XEN)    [<ffff82d08012f80c>] timer.c#remove_entry+0x7c/0x90
           (XEN)    [<ffff82d0801c8433>] hvm_do_resume+0x23/0x140
           (XEN)    [<ffff82d0801e4fe7>] vmx_do_resume+0xa7/0x140
           (XEN)    [<ffff82d080164aeb>] context_switch+0x13b/0xe40
           (XEN)    [<ffff82d080128e6e>] schedule.c#schedule+0x22e/0x570
           (XEN)    [<ffff82d08012c0cc>] softirq.c#__do_softirq+0x5c/0x90
           (XEN)    [<ffff82d0801602c5>] domain.c#idle_loop+0x25/0x50
           (XEN)
           (XEN)
           (XEN) ****************************************
           (XEN) Panic on CPU 1:
           (XEN) GENERAL PROTECTION FAULT
           (XEN) [error_code=0000]
           (XEN) ****************************************
      
      Tracing my host KVM showed it was the one injecting the GP(0) when
      emulating the VMREAD and checking the destination segment permissions in
      get_vmx_mem_address():
      
           3)               |    vmx_handle_exit() {
           3)               |      handle_vmread() {
           3)               |        nested_vmx_check_permission() {
           3)               |          vmx_get_segment() {
           3)   0.074 us    |            vmx_read_guest_seg_base();
           3)   0.065 us    |            vmx_read_guest_seg_selector();
           3)   0.066 us    |            vmx_read_guest_seg_ar();
           3)   1.636 us    |          }
           3)   0.058 us    |          vmx_get_rflags();
           3)   0.062 us    |          vmx_read_guest_seg_ar();
           3)   3.469 us    |        }
           3)               |        vmx_get_cs_db_l_bits() {
           3)   0.058 us    |          vmx_read_guest_seg_ar();
           3)   0.662 us    |        }
           3)               |        get_vmx_mem_address() {
           3)   0.068 us    |          vmx_cache_reg();
           3)               |          vmx_get_segment() {
           3)   0.074 us    |            vmx_read_guest_seg_base();
           3)   0.068 us    |            vmx_read_guest_seg_selector();
           3)   0.071 us    |            vmx_read_guest_seg_ar();
           3)   1.756 us    |          }
           3)               |          kvm_queue_exception_e() {
           3)   0.066 us    |            kvm_multiple_exception();
           3)   0.684 us    |          }
           3)   4.085 us    |        }
           3)   9.833 us    |      }
           3) + 10.366 us   |    }
      
      Cross-checking the KVM/VMX VMREAD emulation code with the Intel Software
      Developper Manual Volume 3C - "VMREAD - Read Field from Virtual-Machine
      Control Structure", I found that we're enforcing that the destination
      operand is NOT located in a read-only data segment or any code segment when
      the L1 is in long mode - BUT that check should only happen when it is in
      protected mode.
      
      Shuffling the code a bit to make our emulation follow the specification
      allows me to boot a Xen dom0 in a nested KVM and start HVM L2 guests
      without problems.
      
      Fixes: f9eb4af6 ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
      Signed-off-by: default avatarQuentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Eugene Korenevsky <ekorenevsky@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f846155f
    • James Morse's avatar
      KVM: arm/arm64: Stop leaking vcpu pid references · 77588bda
      James Morse authored
      commit 591d215a upstream.
      
      kvm provides kvm_vcpu_uninit(), which amongst other things, releases the
      last reference to the struct pid of the task that was last running the vcpu.
      
      On arm64 built with CONFIG_DEBUG_KMEMLEAK, starting a guest with kvmtool,
      then killing it with SIGKILL results (after some considerable time) in:
      > cat /sys/kernel/debug/kmemleak
      > unreferenced object 0xffff80007d5ea080 (size 128):
      >  comm "lkvm", pid 2025, jiffies 4294942645 (age 1107.776s)
      >  hex dump (first 32 bytes):
      >    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      >    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      >  backtrace:
      >    [<ffff8000001b30ec>] create_object+0xfc/0x278
      >    [<ffff80000071da34>] kmemleak_alloc+0x34/0x70
      >    [<ffff80000019fa2c>] kmem_cache_alloc+0x16c/0x1d8
      >    [<ffff8000000d0474>] alloc_pid+0x34/0x4d0
      >    [<ffff8000000b5674>] copy_process.isra.6+0x79c/0x1338
      >    [<ffff8000000b633c>] _do_fork+0x74/0x320
      >    [<ffff8000000b66b0>] SyS_clone+0x18/0x20
      >    [<ffff800000085cb0>] el0_svc_naked+0x24/0x28
      >    [<ffffffffffffffff>] 0xffffffffffffffff
      
      On x86 kvm_vcpu_uninit() is called on the path from kvm_arch_destroy_vm(),
      on arm no equivalent call is made. Add the call to kvm_arch_vcpu_free().
      Signed-off-by: default avatarJames Morse <james.morse@arm.com>
      Fixes: 749cf76c ("KVM: ARM: Initial skeleton to compile KVM support")
      Acked-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      77588bda
    • Christian Borntraeger's avatar
      KVM: s390/mm: Fix CMMA reset during reboot · c0917816
      Christian Borntraeger authored
      commit 1c343f7b upstream.
      
      commit 1e133ab2 ("s390/mm: split arch/s390/mm/pgtable.c") factored
      out the page table handling code from __gmap_zap and  __s390_reset_cmma
      into ptep_zap_unused and added a simple flag that tells which one of the
      function (reset or not) is to be made. This also changed the behaviour,
      as it also zaps unused page table entries on reset.
      Turns out that this is wrong as s390_reset_cmma uses the page walker,
      which DOES NOT take the ptl lock.
      
      The most simple fix is to not do the zapping part on reset (which uses
      the walker)
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Fixes: 1e133ab2 ("s390/mm: split arch/s390/mm/pgtable.c")
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c0917816
    • Xiubo Li's avatar
      kvm: Fix irq route entries exceeding KVM_MAX_IRQ_ROUTES · 95fddb6a
      Xiubo Li authored
      commit caf1ff26 upstream.
      
      These days, we experienced one guest crash with 8 cores and 3 disks,
      with qemu error logs as bellow:
      
      qemu-system-x86_64: /build/qemu-2.0.0/kvm-all.c:984:
      kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
      
      And then we found one patch(bdf026317d) in qemu tree, which said
      could fix this bug.
      
      Execute the following script will reproduce the BUG quickly:
      
      irq_affinity.sh
      ========================================================================
      
      vda_irq_num=25
      vdb_irq_num=27
      while [ 1 ]
      do
          for irq in {1,2,4,8,10,20,40,80}
              do
                  echo $irq > /proc/irq/$vda_irq_num/smp_affinity
                  echo $irq > /proc/irq/$vdb_irq_num/smp_affinity
                  dd if=/dev/vda of=/dev/zero bs=4K count=100 iflag=direct
                  dd if=/dev/vdb of=/dev/zero bs=4K count=100 iflag=direct
              done
      done
      ========================================================================
      
      The following qemu log is added in the qemu code and is displayed when
      this bug reproduced:
      
      kvm_irqchip_commit_routes: max gsi: 1008, nr_allocated_irq_routes: 1024,
      irq_routes->nr: 1024, gsi_count: 1024.
      
      That's to say when irq_routes->nr == 1024, there are 1024 routing entries,
      but in the kernel code when routes->nr >= 1024, will just return -EINVAL;
      
      The nr is the number of the routing entries which is in of
      [1 ~ KVM_MAX_IRQ_ROUTES], not the index in [0 ~ KVM_MAX_IRQ_ROUTES - 1].
      
      This patch fix the BUG above.
      Signed-off-by: default avatarXiubo Li <lixiubo@cmss.chinamobile.com>
      Signed-off-by: default avatarWei Tang <tangwei@cmss.chinamobile.com>
      Signed-off-by: default avatarZhang Zhuoyu <zhangzhuoyu@cmss.chinamobile.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      95fddb6a
    • Yang Zhang's avatar
      kvm: vmx: check apicv is active before using VT-d posted interrupt · c91ef8db
      Yang Zhang authored
      commit a0052191 upstream.
      
      VT-d posted interrupt is relying on the CPU side's posted interrupt.
      Need to check whether VCPU's APICv is active before enabing VT-d
      posted interrupt.
      
      Fixes: d62caabbSigned-off-by: default avatarYang Zhang <yang.zhang.wz@gmail.com>
      Signed-off-by: default avatarShengge Ding <shengge.dsg@alibaba-inc.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c91ef8db
    • Dan Carpenter's avatar
      KEYS: potential uninitialized variable · d8498ead
      Dan Carpenter authored
      commit 38327424 upstream.
      
      If __key_link_begin() failed then "edit" would be uninitialized.  I've
      added a check to fix that.
      
      This allows a random user to crash the kernel, though it's quite
      difficult to achieve.  There are three ways it can be done as the user
      would have to cause an error to occur in __key_link():
      
       (1) Cause the kernel to run out of memory.  In practice, this is difficult
           to achieve without ENOMEM cropping up elsewhere and aborting the
           attempt.
      
       (2) Revoke the destination keyring between the keyring ID being looked up
           and it being tested for revocation.  In practice, this is difficult to
           time correctly because the KEYCTL_REJECT function can only be used
           from the request-key upcall process.  Further, users can only make use
           of what's in /sbin/request-key.conf, though this does including a
           rejection debugging test - which means that the destination keyring
           has to be the caller's session keyring in practice.
      
       (3) Have just enough key quota available to create a key, a new session
           keyring for the upcall and a link in the session keyring, but not then
           sufficient quota to create a link in the nominated destination keyring
           so that it fails with EDQUOT.
      
      The bug can be triggered using option (3) above using something like the
      following:
      
      	echo 80 >/proc/sys/kernel/keys/root_maxbytes
      	keyctl request2 user debug:fred negate @t
      
      The above sets the quota to something much lower (80) to make the bug
      easier to trigger, but this is dependent on the system.  Note also that
      the name of the keyring created contains a random number that may be
      between 1 and 10 characters in size, so may throw the test off by
      changing the amount of quota used.
      
      Assuming the failure occurs, something like the following will be seen:
      
      	kfree_debugcheck: out of range ptr 6b6b6b6b6b6b6b68h
      	------------[ cut here ]------------
      	kernel BUG at ../mm/slab.c:2821!
      	...
      	RIP: 0010:[<ffffffff811600f9>] kfree_debugcheck+0x20/0x25
      	RSP: 0018:ffff8804014a7de8  EFLAGS: 00010092
      	RAX: 0000000000000034 RBX: 6b6b6b6b6b6b6b68 RCX: 0000000000000000
      	RDX: 0000000000040001 RSI: 00000000000000f6 RDI: 0000000000000300
      	RBP: ffff8804014a7df0 R08: 0000000000000001 R09: 0000000000000000
      	R10: ffff8804014a7e68 R11: 0000000000000054 R12: 0000000000000202
      	R13: ffffffff81318a66 R14: 0000000000000000 R15: 0000000000000001
      	...
      	Call Trace:
      	  kfree+0xde/0x1bc
      	  assoc_array_cancel_edit+0x1f/0x36
      	  __key_link_end+0x55/0x63
      	  key_reject_and_link+0x124/0x155
      	  keyctl_reject_key+0xb6/0xe0
      	  keyctl_negate_key+0x10/0x12
      	  SyS_keyctl+0x9f/0xe7
      	  do_syscall_64+0x63/0x13a
      	  entry_SYSCALL64_slow_path+0x25/0x25
      
      Fixes: f70e2e06 ('KEYS: Do preallocation for __key_link()')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d8498ead
    • Martin KaFai Lau's avatar
      ipv6: Fix mem leak in rt6i_pcpu · 2ac4d627
      Martin KaFai Lau authored
      [ Upstream commit 903ce4ab ]
      
      It was first reported and reproduced by Petr (thanks!) in
      https://bugzilla.kernel.org/show_bug.cgi?id=119581
      
      free_percpu(rt->rt6i_pcpu) used to always happen in ip6_dst_destroy().
      
      However, after fixing a deadlock bug in
      commit 9c7370a1 ("ipv6: Fix a potential deadlock when creating pcpu rt"),
      free_percpu() is not called before setting non_pcpu_rt->rt6i_pcpu to NULL.
      
      It is worth to note that rt6i_pcpu is protected by table->tb6_lock.
      
      kmemleak somehow did not report it.  We nailed it down by
      observing the pcpu entries in /proc/vmallocinfo (first suggested
      by Hannes, thanks!).
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Fixes: 9c7370a1 ("ipv6: Fix a potential deadlock when creating pcpu rt")
      Reported-by: default avatarPetr Novopashenniy <pety@rusnet.ru>
      Tested-by: default avatarPetr Novopashenniy <pety@rusnet.ru>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Petr Novopashenniy <pety@rusnet.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2ac4d627
    • Bjørn Mork's avatar
      cdc_ncm: workaround for EM7455 "silent" data interface · 33884dba
      Bjørn Mork authored
      [ Upstream commit c086e709 ]
      
      Several Lenovo users have reported problems with their Sierra
      Wireless EM7455 modem. The driver has loaded successfully and
      the MBIM management channel has appeared to work, including
      establishing a connection to the mobile network. But no frames
      have been received over the data interface.
      
      The problem affects all EM7455 and MC7455, and is assumed to
      affect other modems based on the same Qualcomm chipset and
      baseband firmware.
      
      Testing narrowed the problem down to what seems to be a
      firmware timing bug during initialization. Adding a short sleep
      while probing is sufficient to make the problem disappear.
      Experiments have shown that 1-2 ms is too little to have any
      effect, while 10-20 ms is enough to reliably succeed.
      Reported-by: default avatarStefan Armbruster <ml001@armbruster-it.de>
      Reported-by: default avatarRalph Plawetzki <ralph@purejava.org>
      Reported-by: default avatarAndreas Fett <andreas.fett@secunet.com>
      Reported-by: default avatarRasmus Lerdorf <rasmus@lerdorf.com>
      Reported-by: default avatarSamo Ratnik <samo.ratnik@gmail.com>
      Reported-and-tested-by: default avatarAleksander Morgado <aleksander@aleksander.es>
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      33884dba
    • Haishuang Yan's avatar
      geneve: fix max_mtu setting · 309bcd10
      Haishuang Yan authored
      [ Upstream commit d5d5e8d5 ]
      
      For ipv6+udp+geneve encapsulation data, the max_mtu should subtract
      sizeof(ipv6hdr), instead of sizeof(iphdr).
      Signed-off-by: default avatarHaishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      309bcd10
    • Daniel Borkmann's avatar
      macsec: set actual real device for xmit when !protect_frames · 80420d65
      Daniel Borkmann authored
      [ Upstream commit 79c62220 ]
      
      Avoid recursions of dev_queue_xmit() to the wrong net device when
      frames are unprotected, since at that time skb->dev still points to
      our own macsec dev and unlike macsec_encrypt_finish() dev pointer
      doesn't get updated to real underlying device.
      
      Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      80420d65
    • WANG Cong's avatar
      net_sched: fix mirrored packets checksum · 49729ecc
      WANG Cong authored
      [ Upstream commit 82a31b92 ]
      
      Similar to commit 9b368814 ("net: fix bridge multicast packet checksum validation")
      we need to fixup the checksum for CHECKSUM_COMPLETE when
      pushing skb on RX path. Otherwise we get similar splats.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      49729ecc
    • David S. Miller's avatar
      packet: Use symmetric hash for PACKET_FANOUT_HASH. · 74a601e6
      David S. Miller authored
      [ Upstream commit eb70db87 ]
      
      People who use PACKET_FANOUT_HASH want a symmetric hash, meaning that
      they want packets going in both directions on a flow to hash to the
      same bucket.
      
      The core kernel SKB hash became non-symmetric when the ipv6 flow label
      and other entities were incorporated into the standard flow hash order
      to increase entropy.
      
      But there are no users of PACKET_FANOUT_HASH who want an assymetric
      hash, they all want a symmetric one.
      
      Therefore, use the flow dissector to compute a flat symmetric hash
      over only the protocol, addresses and ports.  This hash does not get
      installed into and override the normal skb hash, so this change has
      no effect whatsoever on the rest of the stack.
      Reported-by: default avatarEric Leblond <eric@regit.org>
      Tested-by: default avatarEric Leblond <eric@regit.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      74a601e6
    • Peter Zijlstra's avatar
      sched/fair: Fix cfs_rq avg tracking underflow · 59eca1a4
      Peter Zijlstra authored
      commit 89741892 upstream.
      
      As per commit:
      
        b7fa30c9 ("sched/fair: Fix post_init_entity_util_avg() serialization")
      
      > the code generated from update_cfs_rq_load_avg():
      >
      > 	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
      > 		s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
      > 		sa->load_avg = max_t(long, sa->load_avg - r, 0);
      > 		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
      > 		removed_load = 1;
      > 	}
      >
      > turns into:
      >
      > ffffffff81087064:       49 8b 85 98 00 00 00    mov    0x98(%r13),%rax
      > ffffffff8108706b:       48 85 c0                test   %rax,%rax
      > ffffffff8108706e:       74 40                   je     ffffffff810870b0 <update_blocked_averages+0xc0>
      > ffffffff81087070:       4c 89 f8                mov    %r15,%rax
      > ffffffff81087073:       49 87 85 98 00 00 00    xchg   %rax,0x98(%r13)
      > ffffffff8108707a:       49 29 45 70             sub    %rax,0x70(%r13)
      > ffffffff8108707e:       4c 89 f9                mov    %r15,%rcx
      > ffffffff81087081:       bb 01 00 00 00          mov    $0x1,%ebx
      > ffffffff81087086:       49 83 7d 70 00          cmpq   $0x0,0x70(%r13)
      > ffffffff8108708b:       49 0f 49 4d 70          cmovns 0x70(%r13),%rcx
      >
      > Which you'll note ends up with sa->load_avg -= r in memory at
      > ffffffff8108707a.
      
      So I _should_ have looked at other unserialized users of ->load_avg,
      but alas. Luckily nikbor reported a similar /0 from task_h_load() which
      instantly triggered recollection of this here problem.
      
      Aside from the intermediate value hitting memory and causing problems,
      there's another problem: the underflow detection relies on the signed
      bit. This reduces the effective width of the variables, IOW its
      effectively the same as having these variables be of signed type.
      
      This patch changes to a different means of unsigned underflow
      detection to not rely on the signed bit. This allows the variables to
      use the 'full' unsigned range. And it does so with explicit LOAD -
      STORE to ensure any intermediate value will never be visible in
      memory, allowing these unserialized loads.
      
      Note: GCC generates crap code for this, might warrant a look later.
      
      Note2: I say 'full' above, if we end up at U*_MAX we'll still explode;
             maybe we should do clamping on add too.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Cc: bsegall@google.com
      Cc: kernel@kyup.com
      Cc: morten.rasmussen@arm.com
      Cc: pjt@google.com
      Cc: steve.muckle@linaro.org
      Fixes: 9d89c257 ("sched/fair: Rewrite runnable load and utilization average tracking")
      Link: http://lkml.kernel.org/r/20160617091948.GJ30927@twins.programming.kicks-ass.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      59eca1a4
    • Kirill A. Shutemov's avatar
      UBIFS: Implement ->migratepage() · 27bd6fcb
      Kirill A. Shutemov authored
      commit 4ac1c17b upstream.
      
      During page migrations UBIFS might get confused
      and the following assert triggers:
      [  213.480000] UBIFS assert failed in ubifs_set_page_dirty at 1451 (pid 436)
      [  213.490000] CPU: 0 PID: 436 Comm: drm-stress-test Not tainted 4.4.4-00176-geaa802524636-dirty #1008
      [  213.490000] Hardware name: Allwinner sun4i/sun5i Families
      [  213.490000] [<c0015e70>] (unwind_backtrace) from [<c0012cdc>] (show_stack+0x10/0x14)
      [  213.490000] [<c0012cdc>] (show_stack) from [<c02ad834>] (dump_stack+0x8c/0xa0)
      [  213.490000] [<c02ad834>] (dump_stack) from [<c0236ee8>] (ubifs_set_page_dirty+0x44/0x50)
      [  213.490000] [<c0236ee8>] (ubifs_set_page_dirty) from [<c00fa0bc>] (try_to_unmap_one+0x10c/0x3a8)
      [  213.490000] [<c00fa0bc>] (try_to_unmap_one) from [<c00fadb4>] (rmap_walk+0xb4/0x290)
      [  213.490000] [<c00fadb4>] (rmap_walk) from [<c00fb1bc>] (try_to_unmap+0x64/0x80)
      [  213.490000] [<c00fb1bc>] (try_to_unmap) from [<c010dc28>] (migrate_pages+0x328/0x7a0)
      [  213.490000] [<c010dc28>] (migrate_pages) from [<c00d0cb0>] (alloc_contig_range+0x168/0x2f4)
      [  213.490000] [<c00d0cb0>] (alloc_contig_range) from [<c010ec00>] (cma_alloc+0x170/0x2c0)
      [  213.490000] [<c010ec00>] (cma_alloc) from [<c001a958>] (__alloc_from_contiguous+0x38/0xd8)
      [  213.490000] [<c001a958>] (__alloc_from_contiguous) from [<c001ad44>] (__dma_alloc+0x23c/0x274)
      [  213.490000] [<c001ad44>] (__dma_alloc) from [<c001ae08>] (arm_dma_alloc+0x54/0x5c)
      [  213.490000] [<c001ae08>] (arm_dma_alloc) from [<c035cecc>] (drm_gem_cma_create+0xb8/0xf0)
      [  213.490000] [<c035cecc>] (drm_gem_cma_create) from [<c035cf20>] (drm_gem_cma_create_with_handle+0x1c/0xe8)
      [  213.490000] [<c035cf20>] (drm_gem_cma_create_with_handle) from [<c035d088>] (drm_gem_cma_dumb_create+0x3c/0x48)
      [  213.490000] [<c035d088>] (drm_gem_cma_dumb_create) from [<c0341ed8>] (drm_ioctl+0x12c/0x444)
      [  213.490000] [<c0341ed8>] (drm_ioctl) from [<c0121adc>] (do_vfs_ioctl+0x3f4/0x614)
      [  213.490000] [<c0121adc>] (do_vfs_ioctl) from [<c0121d30>] (SyS_ioctl+0x34/0x5c)
      [  213.490000] [<c0121d30>] (SyS_ioctl) from [<c000f2c0>] (ret_fast_syscall+0x0/0x34)
      
      UBIFS is using PagePrivate() which can have different meanings across
      filesystems. Therefore the generic page migration code cannot handle this
      case correctly.
      We have to implement our own migration function which basically does a
      plain copy but also duplicates the page private flag.
      UBIFS is not a block device filesystem and cannot use buffer_migrate_page().
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      [rw: Massaged changelog, build fixes, etc...]
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      Acked-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      27bd6fcb
    • Richard Weinberger's avatar
      mm: Export migrate_page_move_mapping and migrate_page_copy · 3eaf687c
      Richard Weinberger authored
      commit 1118dce7 upstream.
      
      Export these symbols such that UBIFS can implement
      ->migratepage.
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      Acked-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3eaf687c
    • Harvey Hunt's avatar
      irqchip/mips-gic: Fix IRQs in gic_dev_domain · fe5b6d00
      Harvey Hunt authored
      commit 4b2312bd upstream.
      
      When allocating a new device IRQ, gic_dev_domain_alloc() correctly calls
      irq_domain_set_hwirq_and_chip(), but gic_irq_domain_alloc() does not. This
      means that gic_irq_domain believes all IRQs from the dev domain have an
      hwirq of 0 and creates incorrect mappings in the linear_revmap. As
      gic_irq_domain is a parent of the gic_dev_domain, this leads to an
      inability to boot on devices with a GIC. Excerpt of the error:
      
      [    2.297649] irq 0: nobody cared (try booting with the "irqpoll" option)
      ...
      [    2.436963] handlers:
      [    2.439492] Disabling IRQ #0
      
      Fix this by calling irq_domain_set_hwirq_and_chip() for both the dev and
      irq domain.
      
      Now that we are modifying the parent domain, be sure to clear it up in
      case of an allocation error.
      
      Fixes: c98c1822 ("irqchip/mips-gic: Add device hierarchy domain")
      Fixes: 2af70a96 ("irqchip/mips-gic: Add a IPI hierarchy domain")
      Signed-off-by: default avatarHarvey Hunt <harvey.hunt@imgtec.com>
      Tested-by: Govindraj Raja <Govindraj.Raja@imgtec.com> # On Pistachio SoC
      Reviewed-by: default avatarMatt Redfearn <matt.redfearn@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: Qais Yousef <qsyousef@gmail.com>
      Cc: jason@lakedaemon.net
      Cc: marc.zyngier@arm.com
      Link: http://lkml.kernel.org/r/1464001552-31174-1-git-send-email-harvey.hunt@imgtec.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fe5b6d00