1. 07 Sep, 2016 18 commits
    • libnvdimm, nd_blk: mask off reserved status bits · a794c7a9
      Ross Zwisler authored
      commit 68202c9f upstream.
      
      The "NVDIMM Block Window Driver Writer's Guide":
      
          http://pmem.io/documents/NVDIMM_DriverWritersGuide-July-2016.pdf
      
      ...defines the layout of the block window status register.  For the July
      2016 version of the spec linked to above, this happens in Figure 4 on
      page 26.
      
      The only bits defined in this spec are bits 31, 5, 4, 2, 1 and 0.  The
      rest of the bits in the status register are reserved, and there is a
      warning following the diagram that says:
      
          Note: The driver cannot assume the value of the RESERVED bits in the
          status register are zero. These reserved bits need to be masked off, and
          the driver must avoid checking the state of those bits.
      
      This change ensures that for hardware implementations that set these
      reserved bits in the status register, the driver won't incorrectly fail the
      block I/Os.
      Reviewed-by: Lee, Chun-Yi <jlee@suse.com>
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
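      
      A minimal sketch of the masking described in this entry, assuming a
      32-bit status value. The names BW_STATUS_DEFINED_BITS and
      bw_status_sanitize() are hypothetical and are not taken from the driver;
      only the list of defined bits (31, 5, 4, 2, 1, 0) comes from the commit
      message.

```c
#include <stdint.h>

/* Bits the guide defines in the block window status register: 31, 5, 4, 2, 1, 0. */
#define BW_STATUS_DEFINED_BITS \
	((UINT32_C(1) << 31) | (UINT32_C(1) << 5) | (UINT32_C(1) << 4) | \
	 (UINT32_C(1) << 2)  | (UINT32_C(1) << 1) | (UINT32_C(1) << 0))

/* Hypothetical helper: drop reserved bits before the caller inspects the status. */
static inline uint32_t bw_status_sanitize(uint32_t raw)
{
	return raw & BW_STATUS_DEFINED_BITS;
}
```

      With this in place, a status word that has only reserved bits set
      sanitizes to zero, so a check for "any error bit set" would no longer
      fail the I/O.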
    • perf intel-pt: Fix occasional decoding errors when tracing system-wide · af8ff84d
      Adrian Hunter authored
      commit 3d918fb1 upstream.
      
      In order to successfully decode Intel PT traces, context switch events
      are needed from the moment the trace starts. Currently that is ensured
      by using the 'immediate' flag which enables the switch event when it is
      opened.
      
      However, since commit 86c27869 ("perf intel-pt: Add support for
      PERF_RECORD_SWITCH") that might not always happen. When tracing
      system-wide the context switch event is added to the tracking event
      which was not set as 'immediate'. Change that so it is.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Fixes: 86c27869 ("perf intel-pt: Add support for PERF_RECORD_SWITCH")
      Link: http://lkml.kernel.org/r/1471245784-22580-1-git-send-email-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tracing: Fix tick_stop tracepoint symbols for user export · 07780111
      Steven Rostedt (Red Hat) authored
      commit c87edb36 upstream.
      
      The symbols used in the tick_stop tracepoint were not being converted
      properly into integers in the trace_stop format file. Instead we had this:
      
      print fmt: "success=%d dependency=%s", REC->success,
          __print_symbolic(REC->dependency, { 0, "NONE" },
           { (1 << TICK_DEP_BIT_POSIX_TIMER), "POSIX_TIMER" },
           { (1 << TICK_DEP_BIT_PERF_EVENTS), "PERF_EVENTS" },
           { (1 << TICK_DEP_BIT_SCHED), "SCHED" },
           { (1 << TICK_DEP_BIT_CLOCK_UNSTABLE), "CLOCK_UNSTABLE" })
      
      User space tools have no idea how to parse "TICK_DEP_BIT_SCHED" or the other
      symbols used to do the bit shifting. The reason is that the conversion was
      done using the TICK_DEP_MASK_* symbols, which are just macros that
      expand to the bit shift itself (with the exception of NONE, which was
      converted properly, because it doesn't use bits, and is defined as zero).
      
      The TICK_DEP_BIT_* symbols need to be denoted by TRACE_DEFINE_ENUM() in order
      for this event to be converted properly for user space tools to parse.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Fixes: e6e6cc22 ("nohz: Use enum code for tick stop failure tracing message")
      Reported-by: Luiz Capitulino <lcapitulino@redhat.com>
      Tested-by: Luiz Capitulino <lcapitulino@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • vfio/pci: Fix NULL pointer oops in error interrupt setup handling · 71a3276e
      Alex Williamson authored
      commit c8952a70 upstream.
      
      There are multiple cases in vfio_pci_set_ctx_trigger_single() where
      we assume we can safely read from our data pointer without actually
      checking whether the user has passed any data via the count field.
      VFIO_IRQ_SET_DATA_NONE in particular is entirely broken since we
      attempt to pull an int32_t file descriptor out before even checking
      the data type.  The other data types assume the data pointer contains
      one element of their type as well.
      
      In part this is good news because we were previously restricted from
      doing much sanitization of parameters because it was missed in the
      past and we didn't want to break existing users.  Clearly DATA_NONE
      is completely broken, so it must not have any users and we can fix
      it up completely.  For DATA_BOOL and DATA_EVENTFD, we'll just
      protect ourselves, returning error when count is zero since we
      previously would have oopsed.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Reported-by: Chris Thompson <the_cartographer@hotmail.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
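      
      The count check described in this entry can be sketched in user space
      roughly as follows. This is a simplified model, not the driver code:
      the enum, demo_set_trigger() and its out-parameters are hypothetical
      stand-ins, and what DATA_NONE does with the trigger is not modeled
      beyond "never read the data pointer".

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the VFIO_IRQ_SET_DATA_* request types. */
enum demo_data_type { DEMO_DATA_NONE, DEMO_DATA_BOOL, DEMO_DATA_EVENTFD };

/*
 * Hypothetical single-trigger handler.  Returns 0 on success and -EINVAL
 * for a malformed request.  *fired is set when the trigger would be
 * signalled, *new_fd when an eventfd would be configured.
 */
static int demo_set_trigger(enum demo_data_type type, unsigned int count,
			    const void *data, int *fired, int32_t *new_fd)
{
	switch (type) {
	case DEMO_DATA_NONE:
		/* No payload by definition: never dereference data. */
		*fired = 1;
		return 0;
	case DEMO_DATA_BOOL:
		if (!count || !data)
			return -EINVAL;	/* nothing to read, refuse instead of oopsing */
		if (*(const uint8_t *)data)
			*fired = 1;
		return 0;
	case DEMO_DATA_EVENTFD:
		if (!count || !data)
			return -EINVAL;
		*new_fd = *(const int32_t *)data;
		return 0;
	}
	return -EINVAL;
}
```

      The point of the sketch is the ordering: the data type and count are
      validated before the first read through the data pointer.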
    • mm/slub.c: run free_partial() outside of the kmem_cache_node->list_lock · e6279bc9
      Chris Wilson authored
      commit 60398923 upstream.
      
      With debugobjects enabled and using SLAB_DESTROY_BY_RCU, when a
      kmem_cache_node is destroyed the call_rcu() may trigger a slab
      allocation to fill the debug object pool (__debug_object_init:fill_pool).
      
      Everywhere but during kmem_cache_destroy(), discard_slab() is performed
      outside of the kmem_cache_node->list_lock and avoids a lockdep warning
      about potential recursion:
      
        =============================================
        [ INFO: possible recursive locking detected ]
        4.8.0-rc1-gfxbench+ #1 Tainted: G     U
        ---------------------------------------------
        rmmod/8895 is trying to acquire lock:
         (&(&n->list_lock)->rlock){-.-...}, at: [<ffffffff811c80d7>] get_partial_node.isra.63+0x47/0x430
      
        but task is already holding lock:
         (&(&n->list_lock)->rlock){-.-...}, at: [<ffffffff811cbda4>] __kmem_cache_shutdown+0x54/0x320
      
        other info that might help us debug this:
        Possible unsafe locking scenario:
              CPU0
              ----
         lock(&(&n->list_lock)->rlock);
         lock(&(&n->list_lock)->rlock);
      
         *** DEADLOCK ***
         May be due to missing lock nesting notation
         5 locks held by rmmod/8895:
         #0:  (&dev->mutex){......}, at: driver_detach+0x42/0xc0
         #1:  (&dev->mutex){......}, at: driver_detach+0x50/0xc0
         #2:  (cpu_hotplug.dep_map){++++++}, at: get_online_cpus+0x2d/0x80
         #3:  (slab_mutex){+.+.+.}, at: kmem_cache_destroy+0x3c/0x220
         #4:  (&(&n->list_lock)->rlock){-.-...}, at: __kmem_cache_shutdown+0x54/0x320
      
        stack backtrace:
        CPU: 6 PID: 8895 Comm: rmmod Tainted: G     U          4.8.0-rc1-gfxbench+ #1
        Hardware name: Gigabyte Technology Co., Ltd. H87M-D3H/H87M-D3H, BIOS F11 08/18/2015
        Call Trace:
          __lock_acquire+0x1646/0x1ad0
          lock_acquire+0xb2/0x200
          _raw_spin_lock+0x36/0x50
          get_partial_node.isra.63+0x47/0x430
          ___slab_alloc.constprop.67+0x1a7/0x3b0
          __slab_alloc.isra.64.constprop.66+0x43/0x80
          kmem_cache_alloc+0x236/0x2d0
          __debug_object_init+0x2de/0x400
          debug_object_activate+0x109/0x1e0
          __call_rcu.constprop.63+0x32/0x2f0
          call_rcu+0x12/0x20
          discard_slab+0x3d/0x40
          __kmem_cache_shutdown+0xdb/0x320
          shutdown_cache+0x19/0x60
          kmem_cache_destroy+0x1ae/0x220
          i915_gem_load_cleanup+0x14/0x40 [i915]
          i915_driver_unload+0x151/0x180 [i915]
          i915_pci_remove+0x14/0x20 [i915]
          pci_device_remove+0x34/0xb0
          __device_release_driver+0x95/0x140
          driver_detach+0xb6/0xc0
          bus_remove_driver+0x53/0xd0
          driver_unregister+0x27/0x50
          pci_unregister_driver+0x25/0x70
          i915_exit+0x1a/0x1e2 [i915]
          SyS_delete_module+0x193/0x1f0
          entry_SYSCALL_64_fastpath+0x1c/0xac
      
      Fixes: 52b4b950 ("mm: slab: free kmem_cache_node after destroy sysfs file")
      Link: http://lkml.kernel.org/r/1470759070-18743-1-git-send-email-chris@chris-wilson.co.uk
      Reported-by: Dave Gordon <david.s.gordon@intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Dave Gordon <david.s.gordon@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
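      
      The general pattern behind this fix, detaching entries while holding
      the lock and freeing them only after dropping it, can be sketched as
      below. This is a user-space model under stated assumptions: the lock is
      a plain flag, the list is singly linked, and every demo_* name is
      hypothetical. It does not reproduce the actual slub code, which also
      decides per slab whether it can be discarded.

```c
#include <stdlib.h>

struct demo_slab {
	struct demo_slab *next;
};

struct demo_node {
	int locked;                 /* stand-in for list_lock */
	struct demo_slab *partial;  /* partial list, protected by the lock */
};

static int demo_freed_while_locked;  /* counts frees performed under the lock */

/* Freeing may allocate or take the lock again, so it must run unlocked. */
static void demo_discard(struct demo_node *n, struct demo_slab *s)
{
	if (n->locked)
		demo_freed_while_locked++;
	free(s);
}

/* Empties the partial list and returns the number of slabs discarded. */
static int demo_free_partial(struct demo_node *n)
{
	struct demo_slab *discard, *s;
	int count = 0;

	n->locked = 1;              /* lock */
	discard = n->partial;       /* detach the whole list under the lock */
	n->partial = NULL;
	n->locked = 0;              /* unlock */

	while ((s = discard) != NULL) {   /* free outside the lock */
		discard = s->next;
		demo_discard(n, s);
		count++;
	}
	return count;
}
```

      Because the discard step never runs with the lock held, a nested
      allocation triggered from it cannot recurse on the same lock.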
    • virtio: fix memory leak in virtqueue_add() · 1711824f
      Wei Yongjun authored
      commit 58625edf upstream.
      
      When using the indirect buffers feature, 'desc' is allocated in
      virtqueue_add() but isn't freed before leaving on a ring full error,
      causing a memory leak.
      
      For example, it seems rather clear that this can trigger
      with virtio net if mergeable buffers are not used.
      Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
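      
      The leak and its fix follow a common shape: an allocation made early in
      a function has to be released on every early-exit path. A hedged
      user-space sketch, with hypothetical names (demo_ring,
      demo_add_indirect, demo_live_tables), an illustrative entry size, and
      illustrative error codes:

```c
#include <errno.h>
#include <stdlib.h>

struct demo_ring {
	unsigned int num_free;   /* free descriptor slots in the ring */
};

static int demo_live_tables;     /* indirect tables allocated and not yet freed */

/*
 * Hypothetical add path: allocates an indirect table, then needs one ring
 * slot for it.  On success the caller owns *out and frees it later.
 */
static int demo_add_indirect(struct demo_ring *ring, unsigned int nr_desc,
			     void **out)
{
	void *desc = calloc(nr_desc, 16);   /* 16 bytes per entry, illustrative */

	if (!desc)
		return -ENOMEM;
	demo_live_tables++;

	if (ring->num_free < 1) {
		/* Ring full: release the table before bailing out. */
		free(desc);
		demo_live_tables--;
		return -ENOSPC;
	}

	ring->num_free--;
	*out = desc;
	return 0;
}
```

      Without the free() in the ring-full branch, each failed call would
      leave one table allocated with no remaining reference to it.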
    • parisc: Fix order of EREFUSED define in errno.h · 928f8268
      Helge Deller authored
      commit 3eb53b20 upstream.
      
      When building gccgo in userspace, errno.h gets parsed and the go include file
      sysinfo.go is generated.
      
      Since EREFUSED is defined to the same value as ECONNREFUSED, and ECONNREFUSED
      is defined later on in errno.h, this leads to go complaining that EREFUSED
      isn't defined yet.
      
      Fix this trivial problem by moving the define of EREFUSED down after
      ECONNREFUSED in errno.h (and clean up the indenting while touching this line).
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
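      
      For illustration, the resulting order looks like the fragment below.
      The DEMO_ prefix and the numeric value are assumptions made so the
      fragment does not clash with a real errno.h. The C preprocessor itself
      does not care about the order, since macros are expanded at the point
      of use, but a tool that reads the defines top to bottom can only
      resolve the alias once the name it refers to has been seen.

```c
/* Illustrative value only; the point is the order of the two defines. */
#define DEMO_ECONNREFUSED	239
/* The alias follows the name it refers to, so a line-by-line reader can resolve it. */
#define DEMO_EREFUSED		DEMO_ECONNREFUSED
```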
    • efi/capsule: Allocate whole capsule into virtual memory · e2358544
      Austin Christ authored
      commit 6862e6ad upstream.
      
      According to UEFI 2.6 section 7.5.3, the capsule should be in contiguous
      virtual memory and firmware may consume the capsule immediately. To
      correctly implement this functionality, the kernel driver needs to vmap
      the entire capsule at the time it is made available to firmware.
      
      The virtual allocation of the capsule update has been changed from kmap,
      which was only mapping the first page of the update, to vmap, which
      maps the entire data payload.
      Signed-off-by: Austin Christ <austinwc@codeaurora.org>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
      Reviewed-by: Lee, Chun-Yi <jlee@suse.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kweh Hock Leong <hock.leong.kweh@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1470912120-22831-3-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO · a7408eca
      James Hogan authored
      commit 3146bc64 upstream.
      
      AT_VECTOR_SIZE_ARCH should be defined with the maximum number of
      NEW_AUX_ENT entries that ARCH_DLINFO can contain, but it wasn't defined
      for arm64 at all even though ARCH_DLINFO will contain one NEW_AUX_ENT
      for the VDSO address.
      
      This shouldn't be a problem as AT_VECTOR_SIZE_BASE includes space for
      AT_BASE_PLATFORM which arm64 doesn't use, but let's define it now and add
      the comment above ARCH_DLINFO as found in several other architectures to
      remind future modifiers of ARCH_DLINFO to keep AT_VECTOR_SIZE_ARCH up to
      date.
      
      Fixes: f668cd16 ("arm64: ELF definitions")
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: hda - Manage power well properly for resume · 8ecb3bdb
      Takashi Iwai authored
      commit a52ff34e upstream.
      
      For SKL and later Intel chips, we control the power well on a per-codec
      basis via the link_power callback since the commit [03b135ce: ALSA:
      hda - remove dependency on i915 power well for SKL].
      However, there are a few exceptional cases where the gfx registers are
      accessed from the audio driver: namely the wakeup override bit
      toggling at (both system and runtime) resume.  This seems to cause a
      kernel warning when accessed while the power well is down (and likely
      results in bogus register accesses).
      
      This patch puts the proper power up / down sequence around the resume
      code so that the wakeup bit is fiddled properly while the power is
      up.  (The other callback, sync_audio_rate, is used only in the PCM
      callback, so it's guaranteed in the power-on.)
      
      Also, by this proper power up/down, the instantaneous flip of wakeup
      bit in the resume callback that was introduced by the commit
      [033ea349: ALSA: hda - Fix Skylake codec timeout] becomes
      superfluous, as snd_hdac_display_power() already does it.  So we can
      clean it up together.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96214
      Fixes: 03b135ce ('ALSA: hda - remove dependency on i915 power well for SKL')
      Tested-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: usb-audio: Add quirk for ELP HD USB Camera · 8e9e9080
      Vittorio Gambaletta (VittGam) authored
      commit 41f5e3bd upstream.
      
      The ELP HD USB Camera (05a3:9420) needs this quirk for suppressing
      the unsupported sample rate inquiry.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=98481
      Signed-off-by: Vittorio Gambaletta <linuxbugs@vittgam.net>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: usb-audio: Add a sample rate quirk for Creative Live! Cam Socialize HD (VF0610) · 6be84fb4
      Piotr Karasinski authored
      commit 7627e40c upstream.
      
      VF0610 does not support reading the sample rate which leads to many
      lines of "cannot get freq at ep 0x82". This patch adds the USB ID
      (0x041E:4080) to snd_usb_get_sample_rate_quirk() list.
      Signed-off-by: Piotr Karasinski <peter.karasinski@gmail.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
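      
      A quirk list of this kind is typically a lookup keyed on the packed
      vendor:product ID. The sketch below shows the idea with the two device
      IDs named in this entry and in the ELP HD USB Camera entry before it.
      DEMO_USB_ID() and demo_skip_sample_rate_read() are hypothetical names
      and this is not the body of snd_usb_get_sample_rate_quirk().

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack vendor and product IDs into one value, vendor in the high 16 bits. */
#define DEMO_USB_ID(vendor, product) \
	(((uint32_t)(vendor) << 16) | (uint32_t)(product))

/* Hypothetical lookup: true if the device should skip the sample rate read-back. */
static bool demo_skip_sample_rate_read(uint32_t id)
{
	switch (id) {
	case DEMO_USB_ID(0x041e, 0x4080): /* Creative Live! Cam Socialize HD (VF0610) */
	case DEMO_USB_ID(0x05a3, 0x9420): /* ELP HD USB Camera */
		return true;
	}
	return false;
}
```

      Adding support for another device that cannot report its sample rate
      then amounts to one more case label.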
    • SUNRPC: allow for upcalls for same uid but different gss service · 4fd2aa11
      Olga Kornievskaia authored
      commit 9130b8db upstream.
      
      It's possible to have simultaneous upcalls for the same UID but a
      different GSS service. In that case, we need to allow the
      upcall to gssd to proceed so that the same context is not used
      by two different GSS services. Some servers lock the use of a context
      to the GSS service.
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
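      
      The matching rule described in this entry can be sketched as a search
      over pending upcalls that treats the GSS service as part of the key.
      The structure and function names below are hypothetical, and the real
      code tracks more state than a UID and a service number.

```c
#include <stddef.h>

struct demo_upcall {
	unsigned int uid;
	int service;                 /* GSS service the context is requested for */
	struct demo_upcall *next;
};

/* Returns a pending upcall that can be shared, or NULL if a new one is needed. */
static struct demo_upcall *demo_find_upcall(struct demo_upcall *pending,
					    unsigned int uid, int service)
{
	for (struct demo_upcall *pos = pending; pos; pos = pos->next) {
		if (pos->uid != uid)
			continue;
		if (pos->service != service)
			continue;    /* same user, different service: do not share */
		return pos;
	}
	return NULL;
}
```

      A NULL result for the same UID with a different service is what lets a
      second upcall go out and obtain its own context.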
    • SUNRPC: Handle EADDRNOTAVAIL on connection failures · 071f3ed4
      Trond Myklebust authored
      commit 1f4c17a0 upstream.
      
      If the connect attempt immediately fails with an EADDRNOTAVAIL error, then
      that means our choice of source port number was bad.
      This error is expected when we set the SO_REUSEPORT socket option and we
      have 2 sockets sharing the same source and destination address and port
      combinations.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Fixes: 402e23b4 ("SUNRPC: Fix stupid typo in xs_sock_set_reuseport")
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
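      
      One plausible way to react to this error is to forget the chosen source
      port so that the next attempt picks a new one. The commit message only
      states that the port choice was bad; the reaction shown here, the
      structure, and the function name are assumptions made for illustration.

```c
#include <errno.h>

struct demo_transport {
	unsigned short srcport;   /* 0 means "pick a fresh port on the next attempt" */
};

/* Returns 1 if the caller should retry the connect, 0 otherwise. */
static int demo_handle_connect_error(struct demo_transport *t, int err)
{
	switch (err) {
	case -EADDRNOTAVAIL:
		/* Source port is unusable: forget it so a new one is chosen. */
		t->srcport = 0;
		return 1;
	default:
		return 0;
	}
}
```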
    • tools/testing/nvdimm: fix SIGTERM vs hotplug crash · 2be379d4
      Dan Williams authored
      commit d8d378fa upstream.
      
      The unit tests crash when hotplug races the previous probe. This race
      requires that the loading of the nfit_test module be terminated with
      SIGTERM, and the module to be unloaded while the ars scan is still
      running.
      
      In contrast to the normal nfit driver, the unit test calls
      acpi_nfit_init() twice to simulate hotplug, whereas the nominal case
      goes through the acpi_nfit_notify() event handler.  The
      acpi_nfit_notify() path is careful to flush the previous region
      registration before servicing the hotplug event. The unit test was
      missing this guarantee.
      
       BUG: unable to handle kernel NULL pointer dereference at           (null)
       IP: [<ffffffff810cdce7>] pwq_activate_delayed_work+0x47/0x170
       [..]
       Call Trace:
        [<ffffffff810ce186>] pwq_dec_nr_in_flight+0x66/0xa0
        [<ffffffff810ce490>] process_one_work+0x2d0/0x680
        [<ffffffff810ce331>] ? process_one_work+0x171/0x680
        [<ffffffff810ce88e>] worker_thread+0x4e/0x480
        [<ffffffff810ce840>] ? process_one_work+0x680/0x680
        [<ffffffff810ce840>] ? process_one_work+0x680/0x680
        [<ffffffff810d5343>] kthread+0xf3/0x110
        [<ffffffff8199846f>] ret_from_fork+0x1f/0x40
        [<ffffffff810d5250>] ? kthread_create_on_node+0x230/0x230
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/platform/uv: Skip UV runtime services mapping in the efi_runtime_disabled case · 942e1a1c
      Alex Thorlton authored
      commit f72075c9 upstream.
      
      This problem has actually been in the UV code for a while, but we didn't
      catch it until recently, because we had been relying on EFI_OLD_MEMMAP
      to allow our systems to boot for a period of time.  We noticed the issue
      when trying to kexec a recent community kernel, where we hit this NULL
      pointer dereference in efi_sync_low_kernel_mappings():
      
       [    0.337515] BUG: unable to handle kernel NULL pointer dereference at 0000000000000880
       [    0.346276] IP: [<ffffffff8105df8d>] efi_sync_low_kernel_mappings+0x5d/0x1b0
      
      The problem doesn't show up with EFI_OLD_MEMMAP because we skip the
      chunk of setup_efi_state() that sets the efi_loader_signature for the
      kexec'd kernel.  When the kexec'd kernel boots, it won't set EFI_BOOT in
      setup_arch, so we completely avoid the bug.
      
      We always kexec with noefi on the command line, so this shouldn't be an
      issue, but since we're not actually checking for efi_runtime_disabled in
      uv_bios_init(), we end up trying to do EFI runtime callbacks when we
      shouldn't be. This patch just adds a check for efi_runtime_disabled in
      uv_bios_init() so that we don't map in uv_systab when runtime_disabled ==
      true.
      Signed-off-by: Alex Thorlton <athorlton@sgi.com>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Travis <travis@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1470912120-22831-2-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions · f05d3770
      Denys Vlasenko authored
      commit 68187872 upstream.
      
      Since instruction decoder now supports EVEX-encoded instructions, two fixes
      are needed to correctly handle them in uprobes.
      
      Extended bits for MODRM.rm field need to be sanitized just like we do it
      for VEX3, to avoid encoding wrong register for register-relative access.
      
      EVEX has _two_ extended bits: b and x. Theoretically, EVEX.x should be
      ignored by the CPU (since GPRs go only up to 15, not 31), but let's be
      paranoid here: proper encoding for register-relative access
      should have EVEX.x = 1.
      
      Secondly, we should fetch vex.vvvv for EVEX too.
      This is now super easy because instruction decoder populates
      vex_prefix.bytes[2] for all flavors of (e)vex encodings, even for VEX2.
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Fixes: 8a764a87 ("x86/asm/decoder: Create artificial 3rd byte for 2-byte VEX")
      Link: http://lkml.kernel.org/r/20160811154521.20469-1-dvlasenk@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/mm: Disable preemption during CR3 read+write · e2585f9d
      Sebastian Andrzej Siewior authored
      commit 5cf0791d upstream.
      
      There's a subtle preemption race on UP kernels:
      
      Usually current->mm (and therefore mm->pgd) stays the same during the
      lifetime of a task so it does not matter if a task gets preempted during
      the read and write of the CR3.
      
      But then, there is this scenario on x86-UP:
      
      TaskA is in do_exit() and exit_mm() sets current->mm = NULL followed by:
      
       -> mmput()
       -> exit_mmap()
       -> tlb_finish_mmu()
       -> tlb_flush_mmu()
       -> tlb_flush_mmu_tlbonly()
       -> tlb_flush()
       -> flush_tlb_mm_range()
       -> __flush_tlb_up()
       -> __flush_tlb()
       ->  __native_flush_tlb()
      
      At this point current->mm is NULL but current->active_mm still points to
      the "old" mm.
      
      Let's preempt taskA _after_ native_read_cr3() by taskB. TaskB has its
      own mm so CR3 has changed.
      
      Now preempt back to taskA. TaskA has no ->mm set so it borrows taskB's
      mm and so CR3 remains unchanged. Once taskA gets active it continues
      where it was interrupted and that means it writes its old CR3 value
      back. Everything is fine because userland won't need its memory
      anymore.
      
      Now the fun part:
      
      Let's preempt taskA one more time and get back to taskB. This
      time switch_mm() won't do a thing because oldmm (->active_mm)
      is the same as mm (as per context_switch()). So we remain
      with a bad CR3 / PGD and return to userland.
      
      The next thing that happens is handle_mm_fault() with an address for
      the execution of its code in userland. handle_mm_fault() realizes that
      it has a PTE with proper rights so it returns doing nothing. But the
      CPU looks at the wrong PGD and insists that something is wrong and
      faults again. And again. And one more time…
      
      This pagefault circle continues until the scheduler gets tired of it and
      puts another task on the CPU. It gets a little difficult if the task is an
      RT task with a high priority. The system will either freeze or it gets
      fixed by the software watchdog thread which usually runs at RT-max prio.
      But waiting for the watchdog will increase the latency of the RT task
      which is no good.
      
      Fix this by disabling preemption across the critical code section.
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/1470404259-26290-1-git-send-email-bigeasy@linutronix.de
      [ Prettified the changelog. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
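      
      The race can be reproduced in a toy user-space model. Everything here
      is simulated and hypothetical: demo_cr3 is a plain variable, a "context
      switch" is an assignment, and demo_preempt_point() marks the instant
      between the read and the write where another task wants the CPU. It is
      a sketch of the ordering problem, not of the kernel's scheduler or TLB
      code.

```c
#include <stdint.h>

static uint64_t demo_cr3;        /* simulated CR3 */
static int demo_preempt_count;   /* > 0 means preemption is disabled */
static int demo_switch_pending;  /* a context switch was requested but deferred */
static uint64_t demo_next_pgd;   /* page table root of the task that wants to run */

/* A preemption point: switch now if allowed, otherwise remember the request. */
static void demo_preempt_point(void)
{
	if (demo_preempt_count > 0)
		demo_switch_pending = 1;
	else
		demo_cr3 = demo_next_pgd;
}

/* Re-enable preemption and perform a switch that was deferred meanwhile. */
static void demo_preempt_enable(void)
{
	if (--demo_preempt_count == 0 && demo_switch_pending) {
		demo_switch_pending = 0;
		demo_cr3 = demo_next_pgd;
	}
}

/* Read CR3, get preempted, write the stale value back: the racy sequence. */
static void demo_flush_racy(void)
{
	uint64_t val = demo_cr3;

	demo_preempt_point();
	demo_cr3 = val;
}

/* Same sequence with preemption disabled around the read and the write. */
static void demo_flush_guarded(void)
{
	uint64_t val;

	demo_preempt_count++;
	val = demo_cr3;
	demo_preempt_point();
	demo_cr3 = val;
	demo_preempt_enable();
}
```

      In the racy variant the stale value overwrites the root loaded by the
      switch. In the guarded variant the switch is deferred until after the
      write, so the simulated CR3 ends up holding the new task's root.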
  2. 20 Aug, 2016 22 commits