1. 19 Jan, 2017 40 commits
    • Andrei Vagin's avatar
      pid: fix lockdep deadlock warning due to ucount_lock · 52fd0ab0
      Andrei Vagin authored
      commit add7c65c upstream.
      
      =========================================================
      [ INFO: possible irq lock inversion dependency detected ]
      4.10.0-rc2-00024-g4aecec9-dirty #118 Tainted: G        W
      ---------------------------------------------------------
      swapper/1/0 just changed the state of lock:
       (&(&sighand->siglock)->rlock){-.....}, at: [<ffffffffbd0a1bc6>] __lock_task_sighand+0xb6/0x2c0
      but this lock took another, HARDIRQ-unsafe lock in the past:
       (ucounts_lock){+.+...}
      and interrupts could create inverse lock ordering between them.
      other info that might help us debug this:
      Chain exists of:                 &(&sighand->siglock)->rlock --> &(&tty->ctrl_lock)->rlock --> ucounts_lock
       Possible interrupt unsafe locking scenario:
             CPU0                    CPU1
             ----                    ----
        lock(ucounts_lock);
                                     local_irq_disable();
                                     lock(&(&sighand->siglock)->rlock);
                                     lock(&(&tty->ctrl_lock)->rlock);
        <Interrupt>
          lock(&(&sighand->siglock)->rlock);
      
       *** DEADLOCK ***
      
      This patch removes a dependency between rlock and ucount_lock.
      
      Fixes: f333c700 ("pidns: Add a limit on the number of pid namespaces")
      Signed-off-by: default avatarAndrei Vagin <avagin@openvz.org>
      Acked-by: default avatarAl Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      52fd0ab0
    • Augusto Mecking Caringi's avatar
      vme: Fix wrong pointer utilization in ca91cx42_slave_get · 57bfd5a3
      Augusto Mecking Caringi authored
      commit c8a6a09c upstream.
      
      In ca91cx42_slave_get function, the value pointed by vme_base pointer is
      set through:
      
      *vme_base = ioread32(bridge->base + CA91CX42_VSI_BS[i]);
      
      So it must be dereferenced to be used in calculation of pci_base:
      
      *pci_base = (dma_addr_t)*vme_base + pci_offset;
      
      This bug was caught thanks to the following gcc warning:
      
      drivers/vme/bridges/vme_ca91cx42.c: In function ‘ca91cx42_slave_get’:
      drivers/vme/bridges/vme_ca91cx42.c:467:14: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      *pci_base = (dma_addr_t)vme_base + pci_offset;
      Signed-off-by: default avatarAugusto Mecking Caringi <augustocaringi@gmail.com>
      Acked-By: default avatarMartyn Welch <martyn@welchs.me.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      57bfd5a3
    • Herbert Xu's avatar
      Revert "tty: serial: 8250: add CON_CONSDEV to flags" · 1f363639
      Herbert Xu authored
      commit 6741f551 upstream.
      
      This commit needs to be reverted because it prevents people from
      using the serial console as a secondary console with input being
      directed to tty0.
      
      IOW, if you boot with console=ttyS0 console=tty0 then all kernels
      prior to this commit will produce output on both ttyS0 and tty0
      but input will only be taken from tty0.  With this patch the serial
      console will always be the primary console instead of tty0,
      potentially preventing people from getting into their machines in
      emergency situations.
      
      Fixes: d03516df ("tty: serial: 8250: add CON_CONSDEV to flags")
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1f363639
    • Takashi Sakamoto's avatar
      ASoC: hdmi-codec: use unsigned type to structure members with bit-field · f9cf776b
      Takashi Sakamoto authored
      commit 9e4d59ad upstream.
      
      This is a fix for Linux 4.10-rc1.
      
      In C language specification, a bit-field is interpreted as a signed or
      unsigned integer type consisting of the specified number of bits.
      
      In GCC manual, the range of a signed bit field of N bits is from
      -(2^N) / 2 to ((2^N) / 2) - 1
      https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.html#Bit-Fields
      
      Therefore, when defined as 1 bit-field with signed type, variables can
      represents -1 and 0.
      
      The snd-soc-hdmi-codec module includes a structure which has signed type
      members with bit-fields. Codes of this module assign 0 and 1 to the
      members. This seems to result in implementation-dependent behaviours.
      
      As of v4.10-rc1 merge window, outside of sound subsystem, this structure
      is referred by below GPU modules.
       - tda998x
       - sti-drm
       - mediatek-drm-hdmi
       - msm
      
      As long as I review their codes relevant to the structure, the structure
      members are used just for condition statements and printk formats.
      My proposal of change is a bit intrusive to the printk formats but this
      may be acceptable.
      
      Totally, it's reasonable to use unsigned type for the structure members.
      This bug is detected by Sparse, static code analyzer with below warnings.
      
      ./include/sound/hdmi-codec.h:39:26: error: dubious one-bit signed bitfield
      ./include/sound/hdmi-codec.h:40:28: error: dubious one-bit signed bitfield
      ./include/sound/hdmi-codec.h:41:29: error: dubious one-bit signed bitfield
      ./include/sound/hdmi-codec.h:42:31: error: dubious one-bit signed bitfield
      
      Fixes: 09184118 ("ASoC: hdmi-codec: Add hdmi-codec for external HDMI-encoders")
      Signed-off-by: default avatarTakashi Sakamoto <o-takashi@sakamocchi.jp>
      Acked-by: default avatarArnaud Pouliquen <arnaud.pouliquen@st.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f9cf776b
    • David Sterba's avatar
      btrfs: fix crash when tracepoint arguments are freed by wq callbacks · 28dad9aa
      David Sterba authored
      commit ac0c7cf8 upstream.
      
      Enabling btrfs tracepoints leads to instant crash, as reported. The wq
      callbacks could free the memory and the tracepoints started to
      dereference the members to get to fs_info.
      
      The proposed fix https://marc.info/?l=linux-btrfs&m=148172436722606&w=2
      removed the tracepoints but we could preserve them by passing only the
      required data in a safe way.
      
      Fixes: bc074524 ("btrfs: prefix fsid to all trace events")
      Reported-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: default avatarQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      28dad9aa
    • Mathias Nyman's avatar
      xhci: fix deadlock at host remove by running watchdog correctly · 4d0f302b
      Mathias Nyman authored
      commit d6169d04 upstream.
      
      If a URB is killed while the host is removed we can end up in a situation
      where the hub thread takes the roothub device lock, and waits for
      the URB to be given back by xhci-hcd, blocking the host remove code.
      
      xhci-hcd tries to stop the endpoint and give back the urb, but can't
      as the host is removed from PCI bus at the same time, preventing the normal
      way of giving back urb.
      
      Instead we need to rely on the stop command timeout function to give back
      the urb. This xhci_stop_endpoint_command_watchdog() timeout function
      used a XHCI_STATE_DYING flag to indicate if the timeout function is already
      running, but later this flag has been taking into use in other places to
      mark that xhci is dying.
      
      Remove checks for XHCI_STATE_DYING in xhci_urb_dequeue. We are still
      checking that reading from pci state does not return 0xffffffff or that
      host is not halted before trying to stop the endpoint.
      
      This whole area of stopping endpoints, giving back URBs, and the wathdog
      timeout need rework, this fix focuses on solving a specific deadlock
      issue that we can then send to stable before any major rework.
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4d0f302b
    • Al Viro's avatar
      fix a fencepost error in pipe_advance() · d06367ac
      Al Viro authored
      commit b9dc6f65 upstream.
      
      The logics in pipe_advance() used to release all buffers past the new
      position failed in cases when the number of buffers to release was equal
      to pipe->buffers.  If that happened, none of them had been released,
      leaving pipe full.  Worse, it was trivial to trigger and we end up with
      pipe full of uninitialized pages.  IOW, it's an infoleak.
      Reported-by: default avatar"Alan J. Wylie" <alan@wylie.me.uk>
      Tested-by: default avatar"Alan J. Wylie" <alan@wylie.me.uk>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d06367ac
    • Vlad Tsyrklevich's avatar
      i2c: fix kernel memory disclosure in dev interface · ab895739
      Vlad Tsyrklevich authored
      commit 30f939fe upstream.
      
      i2c_smbus_xfer() does not always fill an entire block, allowing
      kernel stack memory disclosure through the temp variable. Clear
      it before it's read to.
      Signed-off-by: default avatarVlad Tsyrklevich <vlad@tsyrklevich.net>
      Signed-off-by: default avatarWolfram Sang <wsa@the-dreams.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ab895739
    • John Garry's avatar
      i2c: print correct device invalid address · 93c94ec2
      John Garry authored
      commit 6f724fb3 upstream.
      
      In of_i2c_register_device(), when the check for
      device address validity fails we print the info.addr,
      which has not been assigned properly.
      
      Fix this by printing the actual invalid address.
      Signed-off-by: default avatarJohn Garry <john.garry@huawei.com>
      Reviewed-by: default avatarVladimir Zapolskiy <vz@mleia.com>
      Signed-off-by: default avatarWolfram Sang <wsa@the-dreams.de>
      Fixes: b4e2f6ac ("i2c: apply DT flags when probing")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      93c94ec2
    • Guenter Roeck's avatar
      Input: elants_i2c - avoid divide by 0 errors on bad touchscreen data · 61a8c337
      Guenter Roeck authored
      commit 1c3415a0 upstream.
      
      The following crash may be seen if bad data is received from the
      touchscreen.
      
      [ 2189.425150] elants_i2c i2c-ELAN0001:00: unknown packet ff ff ff ff
      [ 2189.430738] divide error: 0000 [#1] PREEMPT SMP
      [ 2189.434679] gsmi: Log Shutdown Reason 0x03
      [ 2189.434689] Modules linked in: ip6t_REJECT nf_reject_ipv6 rfcomm evdi
      uinput uvcvideo cmac videobuf2_vmalloc videobuf2_memops snd_hda_codec_hdmi
      i2c_dev videobuf2_core snd_soc_sst_cht_bsw_rt5645 snd_hda_intel
      snd_intel_sst_acpi btusb btrtl btbcm btintel bluetooth snd_soc_sst_acpi
      snd_hda_codec snd_intel_sst_core snd_hwdep snd_soc_sst_mfld_platform
      snd_hda_core snd_soc_rt5645 memconsole_x86_legacy memconsole zram snd_soc_rl6231
      fuse ip6table_filter iwlmvm iwlwifi iwl7000_mac80211 cfg80211 iio_trig_sysfs
      joydev cros_ec_sensors cros_ec_sensors_core industrialio_triggered_buffer
      kfifo_buf industrialio snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq
      snd_seq_device ppp_async ppp_generic slhc tun
      [ 2189.434866] CPU: 0 PID: 106 Comm: irq/184-ELAN000 Tainted: G        W
      3.18.0-13101-g57e8190 #1
      [ 2189.434883] Hardware name: GOOGLE Ultima, BIOS Google_Ultima.7287.131.43 07/20/2016
      [ 2189.434898] task: ffff88017a0b6d80 ti: ffff88017a2bc000 task.ti: ffff88017a2bc000
      [ 2189.434913] RIP: 0010:[<ffffffffbecc48d5>]  [<ffffffffbecc48d5>] elants_i2c_irq+0x190/0x200
      [ 2189.434937] RSP: 0018:ffff88017a2bfd98  EFLAGS: 00010293
      [ 2189.434948] RAX: 0000000000000000 RBX: ffff88017a967828 RCX: ffff88017a9678e8
      [ 2189.434962] RDX: 0000000000000000 RSI: 0000000000000246 RDI: 0000000000000000
      [ 2189.434975] RBP: ffff88017a2bfdd8 R08: 00000000000003e8 R09: 0000000000000000
      [ 2189.434989] R10: 0000000000000000 R11: 000000000044a2bd R12: ffff88017a991800
      [ 2189.435001] R13: ffffffffbe8a2a53 R14: ffff88017a0b6d80 R15: ffff88017a0b6d80
      [ 2189.435011] FS:  0000000000000000(0000) GS:ffff88017fc00000(0000) knlGS:0000000000000000
      [ 2189.435022] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 2189.435030] CR2: 00007f678d94b000 CR3: 000000003f41a000 CR4: 00000000001007f0
      [ 2189.435039] Stack:
      [ 2189.435044]  ffff88017a2bfda8 ffff88017a9678e8 646464647a2bfdd8 0000000006e09574
      [ 2189.435060]  0000000000000000 ffff88017a088b80 ffff88017a921000 ffffffffbe8a2a53
      [ 2189.435074]  ffff88017a2bfe08 ffffffffbe8a2a73 ffff88017a0b6d80 0000000006e09574
      [ 2189.435089] Call Trace:
      [ 2189.435101]  [<ffffffffbe8a2a53>] ? irq_thread_dtor+0xa9/0xa9
      [ 2189.435112]  [<ffffffffbe8a2a73>] irq_thread_fn+0x20/0x40
      [ 2189.435123]  [<ffffffffbe8a2be1>] irq_thread+0x14e/0x222
      [ 2189.435135]  [<ffffffffbee8cbeb>] ? __schedule+0x3b3/0x57a
      [ 2189.435145]  [<ffffffffbe8a29aa>] ? wake_threads_waitq+0x2d/0x2d
      [ 2189.435156]  [<ffffffffbe8a2a93>] ? irq_thread_fn+0x40/0x40
      [ 2189.435168]  [<ffffffffbe87c385>] kthread+0x10e/0x116
      [ 2189.435178]  [<ffffffffbe87c277>] ? __kthread_parkme+0x67/0x67
      [ 2189.435189]  [<ffffffffbee900ac>] ret_from_fork+0x7c/0xb0
      [ 2189.435199]  [<ffffffffbe87c277>] ? __kthread_parkme+0x67/0x67
      [ 2189.435208] Code: ff ff eb 73 0f b6 bb c1 00 00 00 83 ff 03 7e 13 49 8d 7c
      24 20 ba 04 00 00 00 48 c7 c6 8a cd 21 bf eb 4d 0f b6 83 c2 00 00 00 99 <f7> ff
      83 f8 37 75 15 48 6b f7 37 4c 8d a3 c4 00 00 00 4c 8d ac
      [ 2189.435312] RIP  [<ffffffffbecc48d5>] elants_i2c_irq+0x190/0x200
      [ 2189.435323]  RSP <ffff88017a2bfd98>
      [ 2189.435350] ---[ end trace f4945345a75d96dd ]---
      [ 2189.443841] Kernel panic - not syncing: Fatal exception
      [ 2189.444307] Kernel Offset: 0x3d800000 from 0xffffffff81000000
      	(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      [ 2189.444519] gsmi: Log Shutdown Reason 0x02
      
      The problem was seen with a 3.18 based kernel, but there is no reason
      to believe that the upstream code is safe.
      
      Fixes: 66aee900 ("Input: add support for Elan eKTH I2C touchscreens")
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      61a8c337
    • Johan Hovold's avatar
      USB: serial: ch341: fix open and resume after B0 · 0556a65e
      Johan Hovold authored
      commit a20047f3 upstream.
      
      The private baud_rate variable is used to configure the port at open and
      reset-resume and must never be set to (and left at) zero or reset-resume
      and all further open attempts will fail.
      
      Fixes: aa91def4 ("USB: ch341: set tty baud speed according to tty struct")
      Fixes: 664d5df9 ("USB: usb-serial ch341: support for DTR/RTS/CTS")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0556a65e
    • Johan Hovold's avatar
      USB: serial: ch341: fix control-message error handling · 3ed1f6da
      Johan Hovold authored
      commit 2d5a9c72 upstream.
      
      A short control transfer would currently fail to be detected, something
      which could lead to stale buffer data being used as valid input.
      
      Check for short transfers, and make sure to log any transfer errors.
      
      Note that this also avoids leaking heap data to user space (TIOCMGET)
      and the remote device (break control).
      
      Fixes: 6ce76104 ("USB: Driver for CH341 USB-serial adaptor")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ed1f6da
    • Johan Hovold's avatar
      USB: serial: ch341: fix open error handling · 139556a9
      Johan Hovold authored
      commit f2950b78 upstream.
      
      Make sure to stop the interrupt URB before returning on errors during
      open.
      
      Fixes: 664d5df9 ("USB: usb-serial ch341: support for DTR/RTS/CTS")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      139556a9
    • Johan Hovold's avatar
      USB: serial: ch341: fix resume after reset · 1685daad
      Johan Hovold authored
      commit ce5e2928 upstream.
      
      Fix reset-resume handling which failed to resubmit the read and
      interrupt URBs, thereby leaving a port that was open before suspend in a
      broken state until closed and reopened.
      
      Fixes: 1ded7ea4 ("USB: ch341 serial: fix port number changed after resume")
      Fixes: 2bfd1c96 ("USB: serial: ch341: remove reset_resume callback")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1685daad
    • Johan Hovold's avatar
      USB: serial: ch341: fix initial modem-control state · 4aeab97a
      Johan Hovold authored
      commit 4e2da446 upstream.
      
      DTR and RTS will be asserted by the tty-layer when the port is opened
      and deasserted on close (if HUPCL is set). Make sure the initial state
      is not-asserted before the port is first opened as well.
      
      Fixes: 664d5df9 ("USB: usb-serial ch341: support for DTR/RTS/CTS")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4aeab97a
    • Johan Hovold's avatar
      USB: serial: kl5kusb105: fix line-state error handling · 58ede4be
      Johan Hovold authored
      commit 146cc8a1 upstream.
      
      The current implementation failed to detect short transfers when
      attempting to read the line state, and also, to make things worse,
      logged the content of the uninitialised heap transfer buffer.
      
      Fixes: abf492e7 ("USB: kl5kusb105: fix DMA buffers on stack")
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reviewed-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      58ede4be
    • Bin Liu's avatar
      usb: musb: fix runtime PM in debugfs · dfd48efc
      Bin Liu authored
      commit 7b6c1b4c upstream.
      
      MUSB driver now has runtime PM support, but the debugfs driver misses
      the PM _get/_put() calls, which could cause MUSB register access
      failure.
      Acked-by: default avatarTony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarBin Liu <b-liu@ti.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dfd48efc
    • Andy Lutomirski's avatar
      wusbcore: Fix one more crypto-on-the-stack bug · 88d3670a
      Andy Lutomirski authored
      commit 620f1a63 upstream.
      
      The driver put a constant buffer of all zeros on the stack and
      pointed a scatterlist entry at it.  This doesn't work with virtual
      stacks.  Use ZERO_PAGE instead.
      Reported-by: default avatarEric Biggers <ebiggers3@gmail.com>
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      88d3670a
    • Borislav Petkov's avatar
      x86/CPU/AMD: Fix Bulldozer topology · 99ff99b8
      Borislav Petkov authored
      commit a33d3317 upstream.
      
      The following commit:
      
        8196dab4 ("x86/cpu: Get rid of compute_unit_id")
      
      ... broke the initial strategy for Bulldozer-based cores' topology,
      where we consider each thread of a compute unit a standalone core
      and not a HT or SMT thread.
      
      Revert to the firmware-supplied core_id numbering and do not make
      them thread siblings as we don't consider them for such even if they
      technically are, more or less.
      Reported-and-tested-by: default avatarBrice Goglin <Brice.Goglin@inria.fr>
      Tested-by: default avatarYazen Ghannam <yazen.ghannam@amd.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 8196dab4 ("x86/cpu: Get rid of compute_unit_id")
      Link: http://lkml.kernel.org/r/20170105092638.5247-1-bp@alien8.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      99ff99b8
    • Thomas Gleixner's avatar
      x86/bugs: Separate AMD E400 erratum and C1E bug · bd7e7694
      Thomas Gleixner authored
      commit 3344ed30 upstream.
      
      The workaround for the AMD Erratum E400 (Local APIC timer stops in C1E
      state) is a two step process:
      
       - Selection of the E400 aware idle routine
      
       - Detection whether the platform is affected
      
      The idle routine selection happens for possibly affected CPUs depending on
      family/model/stepping information. These range of CPUs is not necessarily
      affected as the decision whether to enable the C1E feature is made by the
      firmware. Unfortunately there is no way to query this at early boot.
      
      The current implementation polls a MSR in the E400 aware idle routine to
      detect whether the CPU is affected. This is inefficient on non affected
      CPUs because every idle entry has to do the MSR read.
      
      There is a better way to detect this before going idle for the first time
      which requires to seperate the bug flags:
      
        X86_BUG_AMD_E400 	- Selects the E400 aware idle routine and
        			  enables the detection
      
        X86_BUG_AMD_APIC_C1E  - Set when the platform is affected by E400
      
      Replace the current X86_BUG_AMD_APIC_C1E usage by the new X86_BUG_AMD_E400
      bug bit to select the idle routine which currently does an unconditional
      detection poll. X86_BUG_AMD_APIC_C1E is going to be used in later patches
      to remove the MSR polling and simplify the handling of this misfeature.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20161209182912.2726-3-bp@alien8.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bd7e7694
    • Yazen Ghannam's avatar
      x86/cpu/AMD: Clean up cpu_llc_id assignment per topology feature · e2d9ad2c
      Yazen Ghannam authored
      commit b6a50cdd upstream.
      
      These changes do not affect current hw - just a cleanup:
      
      Currently, we assume that a system has a single Last Level Cache (LLC)
      per node, and that the cpu_llc_id is thus equal to the node_id. This no
      longer applies since Fam17h can have multiple last level caches within a
      node.
      
      So group the cpu_llc_id assignment by topology feature and family in
      order to make the computation of cpu_llc_id on the different families
      more clear.
      
      Here is how the LLC ID is being computed on the different families:
      
      The NODEID_MSR feature only applies to Fam10h in which case the LLC is
      at the node level.
      
      The TOPOEXT feature is used on families 15h, 16h and 17h. So far we only
      see multiple last level caches if L3 caches are available. Otherwise,
      the cpu_llc_id will default to be the phys_proc_id.
      
      We have L3 caches only on families 15h and 17h:
      
       - on Fam15h, the LLC is at the node level.
      
       - on Fam17h, the LLC is at the core complex level and can be found by
         right shifting the APIC ID. Also, keep the family checks explicit so that
         new families will fall back to the default, which will be node_id for
         TOPOEXT systems.
      
      Single node systems in families 10h and 15h will have a Node ID of 0
      which will be the same as the phys_proc_id, so we don't need to check
      for multiple nodes before using the node_id.
      Tested-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarYazen Ghannam <Yazen.Ghannam@amd.com>
      [ Rewrote the commit message. ]
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20161108153054.bs3sajbyevq6a6uu@pd.tnicSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e2d9ad2c
    • Artur Molchanov's avatar
      bridge: netfilter: Fix dropping packets that moving through bridge interface · 259495a0
      Artur Molchanov authored
      commit 14221cc4 upstream.
      
      Problem:
      br_nf_pre_routing_finish() calls itself instead of
      br_nf_pre_routing_finish_bridge(). Due to this bug reverse path filter drops
      packets that go through bridge interface.
      
      User impact:
      Local docker containers with bridge network can not communicate with each
      other.
      
      Fixes: c5136b15 ("netfilter: bridge: add and use br_nf_hook_thresh")
      Signed-off-by: default avatarArtur Molchanov <artur.molchanov@synesis.ru>
      Acked-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      259495a0
    • Jan Kara's avatar
      xfs: Timely free truncated dirty pages · 6ba35da6
      Jan Kara authored
      commit 0a417b8d upstream.
      
      Commit 99579cce "xfs: skip dirty pages in ->releasepage()" started
      to skip dirty pages in xfs_vm_releasepage() which also has the effect
      that if a dirty page is truncated, it does not get freed by
      block_invalidatepage() and is lingering in LRU list waiting for reclaim.
      So a simple loop like:
      
      while true; do
      	dd if=/dev/zero of=file bs=1M count=100
      	rm file
      done
      
      will keep using more and more memory until we hit low watermarks and
      start pagecache reclaim which will eventually reclaim also the truncate
      pages. Keeping these truncated (and thus never usable) pages in memory
      is just a waste of memory, is unnecessarily stressing page cache
      reclaim, and reportedly also leads to anonymous mmap(2) returning ENOMEM
      prematurely.
      
      So instead of just skipping dirty pages in xfs_vm_releasepage(), return
      to old behavior of skipping them only if they have delalloc or unwritten
      buffers and fix the spurious warnings by warning only if the page is
      clean.
      
      CC: Brian Foster <bfoster@redhat.com>
      CC: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: default avatarPetr Tůma <petr.tuma@d3s.mff.cuni.cz>
      Fixes: 99579cceSigned-off-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6ba35da6
    • Geert Uytterhoeven's avatar
      gpio: Move freeing of GPIO hogs before numbing of the device · 86673e93
      Geert Uytterhoeven authored
      commit 5018ada6 upstream.
      
      When removing a gpiochip that uses GPIO hogging (e.g. by unloading the
      chip's DT overlay), a warning is printed:
      
          gpio gpiochip8: REMOVING GPIOCHIP WITH GPIOS STILL REQUESTED
      
      This happens because gpiochip_free_hogs() is called after the gdev->chip
      pointer is reset to NULL. Hence __gpiod_free() cannot determine the
      chip in use, and cannot clear flags nor call the optional chip-specific
      .free() callback.
      
      Move the call to gpiochip_free_hogs() up to fix this.
      
      Fixes: ff2b1359 ("gpio: make the gpiochip a real device")
      Signed-off-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      86673e93
    • Johannes Berg's avatar
      nl80211: fix sched scan netlink socket owner destruction · 0a28f539
      Johannes Berg authored
      commit 753aacfd upstream.
      
      A single netlink socket might own multiple interfaces *and* a
      scheduled scan request (which might belong to another interface),
      so when it goes away both may need to be destroyed.
      
      Remove the schedule_scan_stop indirection to fix this - it's only
      needed for interface destruction because of the way this works
      right now, with a single work taking care of all interfaces.
      
      Fixes: 93a1e86c ("nl80211: Stop scheduled scan if netlink client disappears")
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0a28f539
    • Nicolai Stange's avatar
      x86/efi: Don't allocate memmap through memblock after mm_init() · 14d6c966
      Nicolai Stange authored
      commit 20b1e22d upstream.
      
      With the following commit:
      
        4bc9f92e ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
      
      ...  efi_bgrt_init() calls into the memblock allocator through
      efi_mem_reserve() => efi_arch_mem_reserve() *after* mm_init() has been called.
      
      Indeed, KASAN reports a bad read access later on in efi_free_boot_services():
      
        BUG: KASAN: use-after-free in efi_free_boot_services+0xae/0x24c
                  at addr ffff88022de12740
        Read of size 4 by task swapper/0/0
        page:ffffea0008b78480 count:0 mapcount:-127
        mapping:          (null) index:0x1 flags: 0x5fff8000000000()
        [...]
        Call Trace:
         dump_stack+0x68/0x9f
         kasan_report_error+0x4c8/0x500
         kasan_report+0x58/0x60
         __asan_load4+0x61/0x80
         efi_free_boot_services+0xae/0x24c
         start_kernel+0x527/0x562
         x86_64_start_reservations+0x24/0x26
         x86_64_start_kernel+0x157/0x17a
         start_cpu+0x5/0x14
      
      The instruction at the given address is the first read from the memmap's
      memory, i.e. the read of md->type in efi_free_boot_services().
      
      Note that the writes earlier in efi_arch_mem_reserve() don't splat because
      they're done through early_memremap()ed addresses.
      
      So, after memblock is gone, allocations should be done through the "normal"
      page allocator. Introduce a helper, efi_memmap_alloc() for this. Use
      it from efi_arch_mem_reserve(), efi_free_boot_services() and, for the sake
      of consistency, from efi_fake_memmap() as well.
      
      Note that for the latter, the memmap allocations cease to be page aligned.
      This isn't needed though.
      Tested-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarNicolai Stange <nicstange@gmail.com>
      Reviewed-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Fixes: 4bc9f92e ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
      Link: http://lkml.kernel.org/r/20170105125130.2815-1-nicstange@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      14d6c966
    • Peter Jones's avatar
      efi/x86: Prune invalid memory map entries and fix boot regression · 99b17ac0
      Peter Jones authored
      commit 0100a3e6 upstream.
      
      Some machines, such as the Lenovo ThinkPad W541 with firmware GNET80WW
      (2.28), include memory map entries with phys_addr=0x0 and num_pages=0.
      
      These machines fail to boot after the following commit,
      
        commit 8e80632f ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")
      
      Fix this by removing such bogus entries from the memory map.
      
      Furthermore, currently the log output for this case (with efi=debug)
      looks like:
      
       [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |  |  |  |  |  ] range=[0x0000000000000000-0xffffffffffffffff] (0MB)
      
      This is clearly wrong, and also not as informative as it could be.  This
      patch changes it so that if we find obviously invalid memory map
      entries, we print an error and skip those entries.  It also detects the
      display of the address range calculation overflow, so the new output is:
      
       [    0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
       [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |   |  |  |  |  ] range=[0x0000000000000000-0x0000000000000000] (invalid)
      
      It also detects memory map sizes that would overflow the physical
      address, for example phys_addr=0xfffffffffffff000 and
      num_pages=0x0200000000000001, and prints:
      
       [    0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
       [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |   |  |  |  |  ] range=[phys_addr=0xfffffffffffff000-0x20ffffffffffffffff] (invalid)
      
      It then removes these entries from the memory map.
      Signed-off-by: default avatarPeter Jones <pjones@redhat.com>
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      [ardb: refactor for clarity with no functional changes, avoid PAGE_SHIFT]
      Signed-off-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      [Matt: Include bugzilla info in commit log]
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=191121Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      99b17ac0
    • Ard Biesheuvel's avatar
      efi/libstub/arm*: Pass latest memory map to the kernel · 74ce3fd6
      Ard Biesheuvel authored
      commit abfb7b68 upstream.
      
      As reported by James Morse, the current libstub code involving the
      annotated memory map only works somewhat correctly by accident, due
      to the fact that a pool allocation happens to be reused immediately,
      retaining its former contents on most implementations of the
      UEFI boot services.
      
      Instead of juggling memory maps, which makes the code more complex than
      it needs to be, simply put placeholder values into the FDT for the memory
      map parameters, and only write the actual values after ExitBootServices()
      has been called.
      Reported-by: default avatarJames Morse <james.morse@arm.com>
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Jeffrey Hugo <jhugo@codeaurora.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-efi@vger.kernel.org
      Fixes: ed9cc156 ("efi/libstub: Use efi_exit_boot_services() in FDT")
      Link: http://lkml.kernel.org/r/1482587963-20183-2-git-send-email-ard.biesheuvel@linaro.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      74ce3fd6
    • Steve Rutherford's avatar
      KVM: x86: Introduce segmented_write_std · 736e77c0
      Steve Rutherford authored
      commit 129a72a0 upstream.
      
      Introduces segemented_write_std.
      
      Switches from emulated reads/writes to standard read/writes in fxsave,
      fxrstor, sgdt, and sidt.  This fixes CVE-2017-2584, a longstanding
      kernel memory leak.
      
      Since commit 283c95d0 ("KVM: x86: emulate FXSAVE and FXRSTOR",
      2016-11-09), which is luckily not yet in any final release, this would
      also be an exploitable kernel memory *write*!
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Fixes: 96051572
      Fixes: 283c95d0Suggested-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarSteve Rutherford <srutherford@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      736e77c0
    • Radim Krčmář's avatar
      KVM: x86: emulate FXSAVE and FXRSTOR · 83fedbb7
      Radim Krčmář authored
      commit 283c95d0 upstream.
      
      Internal errors were reported on 16 bit fxsave and fxrstor with ipxe.
      Old Intels don't have unrestricted_guest, so we have to emulate them.
      
      The patch takes advantage of the hardware implementation.
      
      AMD and Intel differ in saving and restoring other fields in first 32
      bytes.  A test wrote 0xff to the fxsave area, 0 to upper bits of MCSXR
      in the fxsave area, executed fxrstor, rewrote the fxsave area to 0xee,
      and executed fxsave:
      
        Intel (Nehalem):
          7f 1f 7f 7f ff 00 ff 07 ff ff ff ff ff ff 00 00
          ff ff ff ff ff ff 00 00 ff ff 00 00 ff ff 00 00
        Intel (Haswell -- deprecated FPU CS and FPU DS):
          7f 1f 7f 7f ff 00 ff 07 ff ff ff ff 00 00 00 00
          ff ff ff ff 00 00 00 00 ff ff 00 00 ff ff 00 00
        AMD (Opteron 2300-series):
          7f 1f 7f 7f ff 00 ee ee ee ee ee ee ee ee ee ee
          ee ee ee ee ee ee ee ee ff ff 00 00 ff ff 02 00
      
      fxsave/fxrstor will only be emulated on early Intels, so KVM can't do
      much to improve the situation.
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      83fedbb7
    • Radim Krčmář's avatar
      KVM: x86: add asm_safe wrapper · aae8f346
      Radim Krčmář authored
      commit aabba3c6 upstream.
      
      Move the existing exception handling for inline assembly into a macro
      and switch its return values to X86EMUL type.
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aae8f346
    • Radim Krčmář's avatar
      KVM: x86: add Align16 instruction flag · bc5e1316
      Radim Krčmář authored
      commit d3fe959f upstream.
      
      Needed for FXSAVE and FXRSTOR.
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bc5e1316
    • Wanpeng Li's avatar
      KVM: x86: fix NULL deref in vcpu_scan_ioapic · 90f70fcd
      Wanpeng Li authored
      commit 546d87e5 upstream.
      
      Reported by syzkaller:
      
          BUG: unable to handle kernel NULL pointer dereference at 00000000000001b0
          IP: _raw_spin_lock+0xc/0x30
          PGD 3e28eb067
          PUD 3f0ac6067
          PMD 0
          Oops: 0002 [#1] SMP
          CPU: 0 PID: 2431 Comm: test Tainted: G           OE   4.10.0-rc1+ #3
          Call Trace:
           ? kvm_ioapic_scan_entry+0x3e/0x110 [kvm]
           kvm_arch_vcpu_ioctl_run+0x10a8/0x15f0 [kvm]
           ? pick_next_task_fair+0xe1/0x4e0
           ? kvm_arch_vcpu_load+0xea/0x260 [kvm]
           kvm_vcpu_ioctl+0x33a/0x600 [kvm]
           ? hrtimer_try_to_cancel+0x29/0x130
           ? do_nanosleep+0x97/0xf0
           do_vfs_ioctl+0xa1/0x5d0
           ? __hrtimer_init+0x90/0x90
           ? do_nanosleep+0x5b/0xf0
           SyS_ioctl+0x79/0x90
           do_syscall_64+0x6e/0x180
           entry_SYSCALL64_slow_path+0x25/0x25
          RIP: _raw_spin_lock+0xc/0x30 RSP: ffffa43688973cc0
      
      The syzkaller folks reported a NULL pointer dereference due to
      ENABLE_CAP succeeding even without an irqchip.  The Hyper-V
      synthetic interrupt controller is activated, resulting in a
      wrong request to rescan the ioapic and a NULL pointer dereference.
      
          #include <sys/ioctl.h>
          #include <sys/mman.h>
          #include <sys/types.h>
          #include <linux/kvm.h>
          #include <pthread.h>
          #include <stddef.h>
          #include <stdint.h>
          #include <stdlib.h>
          #include <string.h>
          #include <unistd.h>
      
          #ifndef KVM_CAP_HYPERV_SYNIC
          #define KVM_CAP_HYPERV_SYNIC 123
          #endif
      
          void* thr(void* arg)
          {
      	struct kvm_enable_cap cap;
      	cap.flags = 0;
      	cap.cap = KVM_CAP_HYPERV_SYNIC;
      	ioctl((long)arg, KVM_ENABLE_CAP, &cap);
      	return 0;
          }
      
          int main()
          {
      	void *host_mem = mmap(0, 0x1000, PROT_READ|PROT_WRITE,
      			MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
      	int kvmfd = open("/dev/kvm", 0);
      	int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0);
      	struct kvm_userspace_memory_region memreg;
      	memreg.slot = 0;
      	memreg.flags = 0;
      	memreg.guest_phys_addr = 0;
      	memreg.memory_size = 0x1000;
      	memreg.userspace_addr = (unsigned long)host_mem;
      	host_mem[0] = 0xf4;
      	ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &memreg);
      	int cpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0);
      	struct kvm_sregs sregs;
      	ioctl(cpufd, KVM_GET_SREGS, &sregs);
      	sregs.cr0 = 0;
      	sregs.cr4 = 0;
      	sregs.efer = 0;
      	sregs.cs.selector = 0;
      	sregs.cs.base = 0;
      	ioctl(cpufd, KVM_SET_SREGS, &sregs);
      	struct kvm_regs regs = { .rflags = 2 };
      	ioctl(cpufd, KVM_SET_REGS, &regs);
      	ioctl(vmfd, KVM_CREATE_IRQCHIP, 0);
      	pthread_t th;
      	pthread_create(&th, 0, thr, (void*)(long)cpufd);
      	usleep(rand() % 10000);
      	ioctl(cpufd, KVM_RUN, 0);
      	pthread_join(th, 0);
      	return 0;
          }
      
      This patch fixes it by failing ENABLE_CAP if without an irqchip.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Fixes: 5c919412 (kvm/x86: Hyper-V synthetic interrupt controller)
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      90f70fcd
    • David Matlack's avatar
      KVM: x86: flush pending lapic jump label updates on module unload · 5ed21cc0
      David Matlack authored
      commit cef84c30 upstream.
      
      KVM's lapic emulation uses static_key_deferred (apic_{hw,sw}_disabled).
      These are implemented with delayed_work structs which can still be
      pending when the KVM module is unloaded. We've seen this cause kernel
      panics when the kvm_intel module is quickly reloaded.
      
      Use the new static_key_deferred_flush() API to flush pending updates on
      module unload.
      Signed-off-by: default avatarDavid Matlack <dmatlack@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5ed21cc0
    • David Matlack's avatar
      jump_labels: API for flushing deferred jump label updates · 483ecebb
      David Matlack authored
      commit b6416e61 upstream.
      
      Modules that use static_key_deferred need a way to synchronize with
      any delayed work that is still pending when the module is unloaded.
      Introduce static_key_deferred_flush() which flushes any pending
      jump label updates.
      Signed-off-by: default avatarDavid Matlack <dmatlack@google.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      483ecebb
    • Wanpeng Li's avatar
      KVM: eventfd: fix NULL deref irqbypass consumer · 7caf473f
      Wanpeng Li authored
      commit 4f3dbdf4 upstream.
      
      Reported syzkaller:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
          IP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
          PGD 0
      
          Oops: 0002 [#1] SMP
          CPU: 1 PID: 125 Comm: kworker/1:1 Not tainted 4.9.0+ #1
          Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm]
          task: ffff9bbe0dfbb900 task.stack: ffffb61802014000
          RIP: 0010:irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
          Call Trace:
           irqfd_shutdown+0x66/0xa0 [kvm]
           process_one_work+0x16b/0x480
           worker_thread+0x4b/0x500
           kthread+0x101/0x140
           ? process_one_work+0x480/0x480
           ? kthread_create_on_node+0x60/0x60
           ret_from_fork+0x25/0x30
          RIP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] RSP: ffffb61802017e20
          CR2: 0000000000000008
      
      The syzkaller folks reported a NULL pointer dereference that due to
      unregister an consumer which fails registration before. The syzkaller
      creates two VMs w/ an equal eventfd occasionally. So the second VM
      fails to register an irqbypass consumer. It will make irqfd as inactive
      and queue an workqueue work to shutdown irqfd and unregister the irqbypass
      consumer when eventfd is closed. However, the second consumer has been
      initialized though it fails registration. So the token(same as the first
      VM's) is taken to unregister the consumer through the workqueue, the
      consumer of the first VM is found and unregistered, then NULL deref incurred
      in the path of deleting consumer from the consumers list.
      
      This patch fixes it by making irq_bypass_register/unregister_consumer()
      looks for the consumer entry based on consumer pointer itself instead of
      token matching.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Suggested-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7caf473f
    • Paolo Bonzini's avatar
      KVM: x86: fix emulation of "MOV SS, null selector" · 7718ffcf
      Paolo Bonzini authored
      commit 33ab9110 upstream.
      
      This is CVE-2017-2583.  On Intel this causes a failed vmentry because
      SS's type is neither 3 nor 7 (even though the manual says this check is
      only done for usable SS, and the dmesg splat says that SS is unusable!).
      On AMD it's worse: svm.c is confused and sets CPL to 0 in the vmcb.
      
      The fix fabricates a data segment descriptor when SS is set to a null
      selector, so that CPL and SS.DPL are set correctly in the VMCS/vmcb.
      Furthermore, only allow setting SS to a NULL selector if SS.RPL < 3;
      this in turn ensures CPL < 3 because RPL must be equal to CPL.
      
      Thanks to Andy Lutomirski and Willy Tarreau for help in analyzing
      the bug and deciphering the manuals.
      Reported-by: default avatarXiaohan Zhang <zhangxiaohan1@huawei.com>
      Fixes: 79d5b4c3Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7718ffcf
    • Mike Kravetz's avatar
      mm/hugetlb.c: fix reservation race when freeing surplus pages · 1e26cec6
      Mike Kravetz authored
      commit e5bbc8a6 upstream.
      
      return_unused_surplus_pages() decrements the global reservation count,
      and frees any unused surplus pages that were backing the reservation.
      
      Commit 7848a4bf ("mm/hugetlb.c: add cond_resched_lock() in
      return_unused_surplus_pages()") added a call to cond_resched_lock in the
      loop freeing the pages.
      
      As a result, the hugetlb_lock could be dropped, and someone else could
      use the pages that will be freed in subsequent iterations of the loop.
      This could result in inconsistent global hugetlb page state, application
      api failures (such as mmap) failures or application crashes.
      
      When dropping the lock in return_unused_surplus_pages, make sure that
      the global reservation count (resv_huge_pages) remains sufficiently
      large to prevent someone else from claiming pages about to be freed.
      
      Analyzed by Paul Cassella.
      
      Fixes: 7848a4bf ("mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages()")
      Link: http://lkml.kernel.org/r/1483991767-6879-1-git-send-email-mike.kravetz@oracle.comSigned-off-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Reported-by: default avatarPaul Cassella <cassella@cray.com>
      Suggested-by: default avatarMichal Hocko <mhocko@kernel.org>
      Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1e26cec6
    • John Sperbeck's avatar
      mm/slab.c: fix SLAB freelist randomization duplicate entries · 8315c22e
      John Sperbeck authored
      commit c4e490cf upstream.
      
      This patch fixes a bug in the freelist randomization code.  When a high
      random number is used, the freelist will contain duplicate entries.  It
      will result in different allocations sharing the same chunk.
      
      It will result in odd behaviours and crashes.  It should be uncommon but
      it depends on the machines.  We saw it happening more often on some
      machines (every few hours of running tests).
      
      Fixes: c7ce4f60 ("mm: SLAB freelist randomization")
      Link: http://lkml.kernel.org/r/20170103181908.143178-1-thgarnie@google.comSigned-off-by: default avatarJohn Sperbeck <jsperbeck@google.com>
      Signed-off-by: default avatarThomas Garnier <thgarnie@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8315c22e
    • Minchan Kim's avatar
      mm: support anonymous stable page · 6ca29ee3
      Minchan Kim authored
      commit f0571429 upstream.
      
      During developemnt for zram-swap asynchronous writeback, I found strange
      corruption of compressed page, resulting in:
      
        Modules linked in: zram(E)
        CPU: 3 PID: 1520 Comm: zramd-1 Tainted: G            E   4.8.0-mm1-00320-ge0d4894c9c38-dirty #3274
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
        task: ffff88007620b840 task.stack: ffff880078090000
        RIP: set_freeobj.part.43+0x1c/0x1f
        RSP: 0018:ffff880078093ca8  EFLAGS: 00010246
        RAX: 0000000000000018 RBX: ffff880076798d88 RCX: ffffffff81c408c8
        RDX: 0000000000000018 RSI: 0000000000000000 RDI: 0000000000000246
        RBP: ffff880078093cb0 R08: 0000000000000000 R09: 0000000000000000
        R10: ffff88005bc43030 R11: 0000000000001df3 R12: ffff880076798d88
        R13: 000000000005bc43 R14: ffff88007819d1b8 R15: 0000000000000001
        FS:  0000000000000000(0000) GS:ffff88007e380000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007fc934048f20 CR3: 0000000077b01000 CR4: 00000000000406e0
        Call Trace:
          obj_malloc+0x22b/0x260
          zs_malloc+0x1e4/0x580
          zram_bvec_rw+0x4cd/0x830 [zram]
          page_requests_rw+0x9c/0x130 [zram]
          zram_thread+0xe6/0x173 [zram]
          kthread+0xca/0xe0
          ret_from_fork+0x25/0x30
      
      With investigation, it reveals currently stable page doesn't support
      anonymous page.  IOW, reuse_swap_page can reuse the page without waiting
      writeback completion so it can overwrite page zram is compressing.
      
      Unfortunately, zram has used per-cpu stream feature from v4.7.
      It aims for increasing cache hit ratio of scratch buffer for
      compressing. Downside of that approach is that zram should ask
      memory space for compressed page in per-cpu context which requires
      stricted gfp flag which could be failed. If so, it retries to
      allocate memory space out of per-cpu context so it could get memory
      this time and compress the data again, copies it to the memory space.
      
      In this scenario, zram assumes the data should never be changed
      but it is not true unless stable page supports. So, If the data is
      changed under us, zram can make buffer overrun because second
      compression size could be bigger than one we got in previous trial
      and blindly, copy bigger size object to smaller buffer which is
      buffer overrun. The overrun breaks zsmalloc free object chaining
      so system goes crash like above.
      
      I think below is same problem.
      https://bugzilla.suse.com/show_bug.cgi?id=997574
      
      Unfortunately, reuse_swap_page should be atomic so that we cannot wait on
      writeback in there so the approach in this patch is simply return false if
      we found it needs stable page.  Although it increases memory footprint
      temporarily, it happens rarely and it should be reclaimed easily althoug
      it happened.  Also, It would be better than waiting of IO completion,
      which is critial path for application latency.
      
      Fixes: da9556a2 ("zram: user per-cpu compression streams")
      Link: http://lkml.kernel.org/r/20161120233015.GA14113@bbox
      Link: http://lkml.kernel.org/r/1482366980-3782-2-git-send-email-minchan@kernel.orgSigned-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Hyeoncheol Lee <cheol.lee@lge.com>
      Cc: <yjay.kim@lge.com>
      Cc: Sangseok Lee <sangseok.lee@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6ca29ee3