1. 03 Feb, 2018 25 commits
    • Nikita Leshenko's avatar
      KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered · 3ee7ae2b
      Nikita Leshenko authored
      
      [ Upstream commit a8bfec29 ]
      
      Some OSes (Linux, Xen) use this behavior to clear the Remote IRR bit for
      IOAPICs without an EOI register. They simulate the EOI message manually
      by changing the trigger mode to edge and then back to level, with the
      entry being masked during this.
      
      QEMU implements this feature in commit ed1263c363c9
      ("ioapic: clear remote irr bit for edge-triggered interrupts")
      
      As a side effect, this commit removes an incorrect behavior where Remote
      IRR was cleared when the redirection table entry was rewritten. This is not
      consistent with the manual and also opens an opportunity for a strange
      behavior when a redirection table entry is modified from an interrupt
      handler that handles the same entry: The modification will clear the
      Remote IRR bit even though the interrupt handler is still running.
      Signed-off-by: default avatarNikita Leshenko <nikita.leshchenko@oracle.com>
      Reviewed-by: default avatarLiran Alon <liran.alon@oracle.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: default avatarSteve Rutherford <srutherford@google.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ee7ae2b
    • Nikita Leshenko's avatar
      KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race · 19a2c007
      Nikita Leshenko authored
      
      [ Upstream commit 0fc5a36d ]
      
      KVM uses ioapic_handled_vectors to track vectors that need to notify the
      IOAPIC on EOI. The problem is that IOAPIC can be reconfigured while an
      interrupt with old configuration is pending or running and
      ioapic_handled_vectors only remembers the newest configuration;
      thus EOI from the old interrupt is not delievered to the IOAPIC.
      
      A previous commit db2bdcbb
      ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
      addressed this issue by adding pending edge-triggered interrupts to
      ioapic_handled_vectors, fixing this race for edge-triggered interrupts.
      The commit explicitly ignored level-triggered interrupts,
      but this race applies to them as well:
      
      1) IOAPIC sends a level triggered interrupt vector to VCPU0
      2) VCPU0's handler deasserts the irq line and reconfigures the IOAPIC
         to route the vector to VCPU1. The reconfiguration rewrites only the
         upper 32 bits of the IOREDTBLn register. (Causes KVM to update
         ioapic_handled_vectors for VCPU0 and it no longer includes the vector.)
      3) VCPU0 sends EOI for the vector, but it's not delievered to the
         IOAPIC because the ioapic_handled_vectors doesn't include the vector.
      4) New interrupts are not delievered to VCPU1 because remote_irr bit
         is set forever.
      
      Therefore, the correct behavior is to add all pending and running
      interrupts to ioapic_handled_vectors.
      
      This commit introduces a slight performance hit similar to
      commit db2bdcbb ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
      for the rare case that the vector is reused by a non-IOAPIC source on
      VCPU0. We prefer to keep solution simple and not handle this case just
      as the original commit does.
      
      Fixes: db2bdcbb ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
      Signed-off-by: default avatarNikita Leshenko <nikita.leshchenko@oracle.com>
      Reviewed-by: default avatarLiran Alon <liran.alon@oracle.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      19a2c007
    • Wanpeng Li's avatar
      KVM: X86: Fix operand/address-size during instruction decoding · 3bad2ece
      Wanpeng Li authored
      
      [ Upstream commit 3853be26 ]
      
      Pedro reported:
        During tests that we conducted on KVM, we noticed that executing a "PUSH %ES"
        instruction under KVM produces different results on both memory and the SP
        register depending on whether EPT support is enabled. With EPT the SP is
        reduced by 4 bytes (and the written value is 0-padded) but without EPT support
        it is only reduced by 2 bytes. The difference can be observed when the CS.DB
        field is 1 (32-bit) but not when it's 0 (16-bit).
      
      The internal segment descriptor cache exist even in real/vm8096 mode. The CS.D
      also should be respected instead of just default operand/address-size/66H
      prefix/67H prefix during instruction decoding. This patch fixes it by also
      adjusting operand/address-size according to CS.D.
      Reported-by: default avatarPedro Fonseca <pfonseca@cs.washington.edu>
      Tested-by: default avatarPedro Fonseca <pfonseca@cs.washington.edu>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Pedro Fonseca <pfonseca@cs.washington.edu>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3bad2ece
    • Liran Alon's avatar
      KVM: x86: Don't re-execute instruction when not passing CR2 value · 80d2b5af
      Liran Alon authored
      
      [ Upstream commit 9b8ae637 ]
      
      In case of instruction-decode failure or emulation failure,
      x86_emulate_instruction() will call reexecute_instruction() which will
      attempt to use the cr2 value passed to x86_emulate_instruction().
      However, when x86_emulate_instruction() is called from
      emulate_instruction(), cr2 is not passed (passed as 0) and therefore
      it doesn't make sense to execute reexecute_instruction() logic at all.
      
      Fixes: 51d8b661 ("KVM: cleanup emulate_instruction")
      Signed-off-by: default avatarLiran Alon <liran.alon@oracle.com>
      Reviewed-by: default avatarNikita Leshenko <nikita.leshchenko@oracle.com>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      80d2b5af
    • Liran Alon's avatar
      KVM: x86: emulator: Return to user-mode on L1 CPL=0 emulation failure · 3d4df917
      Liran Alon authored
      
      [ Upstream commit 1f4dcb3b ]
      
      On this case, handle_emulation_failure() fills kvm_run with
      internal-error information which it expects to be delivered
      to user-mode for further processing.
      However, the code reports a wrong return-value which makes KVM to never
      return to user-mode on this scenario.
      
      Fixes: 6d77dbfc ("KVM: inject #UD if instruction emulation fails and exit to
      userspace")
      Signed-off-by: default avatarLiran Alon <liran.alon@oracle.com>
      Reviewed-by: default avatarNikita Leshenko <nikita.leshchenko@oracle.com>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3d4df917
    • Lyude Paul's avatar
      igb: Free IRQs when device is hotplugged · 2b84fd62
      Lyude Paul authored
      commit 888f2293 upstream.
      
      Recently I got a Caldigit TS3 Thunderbolt 3 dock, and noticed that upon
      hotplugging my kernel would immediately crash due to igb:
      
      [  680.825801] kernel BUG at drivers/pci/msi.c:352!
      [  680.828388] invalid opcode: 0000 [#1] SMP
      [  680.829194] Modules linked in: igb(O) thunderbolt i2c_algo_bit joydev vfat fat btusb btrtl btbcm btintel bluetooth ecdh_generic hp_wmi sparse_keymap rfkill wmi_bmof iTCO_wdt intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul snd_pcm rtsx_pci_ms mei_me snd_timer memstick snd pcspkr mei soundcore i2c_i801 tpm_tis psmouse shpchp wmi tpm_tis_core tpm video hp_wireless acpi_pad rtsx_pci_sdmmc mmc_core crc32c_intel serio_raw rtsx_pci mfd_core xhci_pci xhci_hcd i2c_hid i2c_core [last unloaded: igb]
      [  680.831085] CPU: 1 PID: 78 Comm: kworker/u16:1 Tainted: G           O     4.15.0-rc3Lyude-Test+ #6
      [  680.831596] Hardware name: HP HP ZBook Studio G4/826B, BIOS P71 Ver. 01.03 06/09/2017
      [  680.832168] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
      [  680.832687] RIP: 0010:free_msi_irqs+0x180/0x1b0
      [  680.833271] RSP: 0018:ffffc9000030fbf0 EFLAGS: 00010286
      [  680.833761] RAX: ffff8803405f9c00 RBX: ffff88033e3d2e40 RCX: 000000000000002c
      [  680.834278] RDX: 0000000000000000 RSI: 00000000000000ac RDI: ffff880340be2178
      [  680.834832] RBP: 0000000000000000 R08: ffff880340be1ff0 R09: ffff8803405f9c00
      [  680.835342] R10: 0000000000000000 R11: 0000000000000040 R12: ffff88033d63a298
      [  680.835822] R13: ffff88033d63a000 R14: 0000000000000060 R15: ffff880341959000
      [  680.836332] FS:  0000000000000000(0000) GS:ffff88034f440000(0000) knlGS:0000000000000000
      [  680.836817] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  680.837360] CR2: 000055e64044afdf CR3: 0000000001c09002 CR4: 00000000003606e0
      [  680.837954] Call Trace:
      [  680.838853]  pci_disable_msix+0xce/0xf0
      [  680.839616]  igb_reset_interrupt_capability+0x5d/0x60 [igb]
      [  680.840278]  igb_remove+0x9d/0x110 [igb]
      [  680.840764]  pci_device_remove+0x36/0xb0
      [  680.841279]  device_release_driver_internal+0x157/0x220
      [  680.841739]  pci_stop_bus_device+0x7d/0xa0
      [  680.842255]  pci_stop_bus_device+0x2b/0xa0
      [  680.842722]  pci_stop_bus_device+0x3d/0xa0
      [  680.843189]  pci_stop_and_remove_bus_device+0xe/0x20
      [  680.843627]  trim_stale_devices+0xf3/0x140
      [  680.844086]  trim_stale_devices+0x94/0x140
      [  680.844532]  trim_stale_devices+0xa6/0x140
      [  680.845031]  ? get_slot_status+0x90/0xc0
      [  680.845536]  acpiphp_check_bridge.part.5+0xfe/0x140
      [  680.846021]  acpiphp_hotplug_notify+0x175/0x200
      [  680.846581]  ? free_bridge+0x100/0x100
      [  680.847113]  acpi_device_hotplug+0x8a/0x490
      [  680.847535]  acpi_hotplug_work_fn+0x1a/0x30
      [  680.848076]  process_one_work+0x182/0x3a0
      [  680.848543]  worker_thread+0x2e/0x380
      [  680.848963]  ? process_one_work+0x3a0/0x3a0
      [  680.849373]  kthread+0x111/0x130
      [  680.849776]  ? kthread_create_worker_on_cpu+0x50/0x50
      [  680.850188]  ret_from_fork+0x1f/0x30
      [  680.850601] Code: 43 14 85 c0 0f 84 d5 fe ff ff 31 ed eb 0f 83 c5 01 39 6b 14 0f 86 c5 fe ff ff 8b 7b 10 01 ef e8 b7 e4 d2 ff 48 83 78 70 00 74 e3 <0f> 0b 49 8d b5 a0 00 00 00 e8 62 6f d3 ff e9 c7 fe ff ff 48 8b
      [  680.851497] RIP: free_msi_irqs+0x180/0x1b0 RSP: ffffc9000030fbf0
      
      As it turns out, normally the freeing of IRQs that would fix this is called
      inside of the scope of __igb_close(). However, since the device is
      already gone by the point we try to unregister the netdevice from the
      driver due to a hotplug we end up seeing that the netif isn't present
      and thus, forget to free any of the device IRQs.
      
      So: make sure that if we're in the process of dismantling the netdev, we
      always allow __igb_close() to be called so that IRQs may be freed
      normally. Additionally, only allow igb_close() to be called from
      __igb_close() if it hasn't already been called for the given adapter.
      Signed-off-by: default avatarLyude Paul <lyude@redhat.com>
      Fixes: 9474933c ("igb: close/suspend race in netif_device_detach")
      Cc: Todd Fujinaka <todd.fujinaka@intel.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Tested-by: default avatarAaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b84fd62
    • Jesse Chan's avatar
      mtd: nand: denali_pci: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE · f42132c9
      Jesse Chan authored
      commit d822401d upstream.
      
      This change resolves a new compile-time warning
      when built as a loadable module:
      
      WARNING: modpost: missing MODULE_LICENSE() in drivers/mtd/nand/denali_pci.o
      see include/linux/module.h for more information
      
      This adds the license as "GPL v2", which matches the header of the file.
      
      MODULE_DESCRIPTION and MODULE_AUTHOR are also added.
      Signed-off-by: default avatarJesse Chan <jc@linux.com>
      Acked-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: default avatarBoris Brezillon <boris.brezillon@free-electrons.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f42132c9
    • Jesse Chan's avatar
      gpio: ath79: add missing MODULE_DESCRIPTION/LICENSE · 40ac08f6
      Jesse Chan authored
      commit 539340f3 upstream.
      
      This change resolves a new compile-time warning
      when built as a loadable module:
      
      WARNING: modpost: missing MODULE_LICENSE() in drivers/gpio/gpio-ath79.o
      see include/linux/module.h for more information
      
      This adds the license as "GPL v2", which matches the header of the file.
      
      MODULE_DESCRIPTION is also added.
      Signed-off-by: default avatarJesse Chan <jc@linux.com>
      Acked-by: default avatarAlban Bedel <albeu@free.fr>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      40ac08f6
    • Jesse Chan's avatar
      gpio: iop: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE · 35831adf
      Jesse Chan authored
      commit 97b03136 upstream.
      
      This change resolves a new compile-time warning
      when built as a loadable module:
      
      WARNING: modpost: missing MODULE_LICENSE() in drivers/gpio/gpio-iop.o
      see include/linux/module.h for more information
      
      This adds the license as "GPL", which matches the header of the file.
      
      MODULE_DESCRIPTION and MODULE_AUTHOR are also added.
      Signed-off-by: default avatarJesse Chan <jc@linux.com>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      35831adf
    • Jesse Chan's avatar
      power: reset: zx-reboot: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE · 3f19b9ea
      Jesse Chan authored
      commit 348c7cf5 upstream.
      
      This change resolves a new compile-time warning
      when built as a loadable module:
      
      WARNING: modpost: missing MODULE_LICENSE() in drivers/power/reset/zx-reboot.o
      see include/linux/module.h for more information
      
      This adds the license as "GPL v2", which matches the header of the file.
      
      MODULE_DESCRIPTION and MODULE_AUTHOR are also added.
      Signed-off-by: default avatarJesse Chan <jc@linux.com>
      Signed-off-by: default avatarSebastian Reichel <sebastian.reichel@collabora.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f19b9ea
    • Stephan Mueller's avatar
      crypto: af_alg - whitelist mask and type · fba245f4
      Stephan Mueller authored
      commit bb30b884 upstream.
      
      The user space interface allows specifying the type and mask field used
      to allocate the cipher. Only a subset of the possible flags are intended
      for user space. Therefore, white-list the allowed flags.
      
      In case the user space caller uses at least one non-allowed flag, EINVAL
      is returned.
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarStephan Mueller <smueller@chronox.de>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fba245f4
    • Stephan Mueller's avatar
      crypto: aesni - handle zero length dst buffer · 0b69d7f4
      Stephan Mueller authored
      commit 9c674e1e upstream.
      
      GCM can be invoked with a zero destination buffer. This is possible if
      the AAD and the ciphertext have zero lengths and only the tag exists in
      the source buffer (i.e. a source buffer cannot be zero). In this case,
      the GCM cipher only performs the authentication and no decryption
      operation.
      
      When the destination buffer has zero length, it is possible that no page
      is mapped to the SG pointing to the destination. In this case,
      sg_page(req->dst) is an invalid access. Therefore, page accesses should
      only be allowed if the req->dst->length is non-zero which is the
      indicator that a page must exist.
      
      This fixes a crash that can be triggered by user space via AF_ALG.
      Signed-off-by: default avatarStephan Mueller <smueller@chronox.de>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0b69d7f4
    • Takashi Iwai's avatar
      ALSA: seq: Make ioctls race-free · 623e5c8a
      Takashi Iwai authored
      commit b3defb79 upstream.
      
      The ALSA sequencer ioctls have no protection against racy calls while
      the concurrent operations may lead to interfere with each other.  As
      reported recently, for example, the concurrent calls of setting client
      pool with a combination of write calls may lead to either the
      unkillable dead-lock or UAF.
      
      As a slightly big hammer solution, this patch introduces the mutex to
      make each ioctl exclusive.  Although this may reduce performance via
      parallel ioctl calls, usually it's not demanded for sequencer usages,
      hence it should be negligible.
      Reported-by: default avatarLuo Quan <a4651386@163.com>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      [bwh: Backported to 4.4: ioctl dispatch is done from snd_seq_do_ioctl();
       take the mutex and add ret variable there.]
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      623e5c8a
    • Hugh Dickins's avatar
      kaiser: fix intel_bts perf crashes · 145ebf95
      Hugh Dickins authored
      Vince reported perf_fuzzer quickly locks up on 4.15-rc7 with PTI;
      Robert reported Bad RIP with KPTI and Intel BTS also on 4.15-rc7:
      honggfuzz -f /tmp/somedirectorywithatleastonefile \
                --linux_perf_bts_edge -s -- /bin/true
      (honggfuzz from https://github.com/google/honggfuzz) crashed with
      BUG: unable to handle kernel paging request at ffff9d3215100000
      (then narrowed it down to
      perf record --per-thread -e intel_bts//u -- /bin/ls).
      
      The intel_bts driver does not use the 'normal' BTS buffer which is
      exposed through kaiser_add_mapping(), but instead uses the memory
      allocated for the perf AUX buffer.
      
      This obviously comes apart when using PTI, because then the kernel
      mapping, which includes that AUX buffer memory, disappears while
      switched to user page tables.
      
      Easily fixed in old-Kaiser backports, by applying kaiser_add_mapping()
      to those pages; perhaps not so easy for upstream, where 4.15-rc8 commit
      99a9dc98 ("x86,perf: Disable intel_bts when PTI") disables for now.
      
      Slightly reorganized surrounding code in bts_buffer_setup_aux(),
      so it can better match bts_buffer_free_aux(): free_aux with an #ifdef
      to avoid the loop when PTI is off, but setup_aux needs to loop anyway
      (and kaiser_add_mapping() is cheap when PTI config is off or "pti=off").
      Reported-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Reported-by: default avatarRobert Święcki <robert@swiecki.net>
      Analyzed-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Analyzed-by: default avatarStephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Vince Weaver <vince@deater.net>
      Cc: stable@vger.kernel.org
      Cc: Jiri Kosina <jkosina@suze.cz>
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      145ebf95
    • Dave Hansen's avatar
      x86/pti: Make unpoison of pgd for trusted boot work for real · 5991ee90
      Dave Hansen authored
      commit 445b69e3 upstream.
      
      The inital fix for trusted boot and PTI potentially misses the pgd clearing
      if pud_alloc() sets a PGD.  It probably works in *practice* because for two
      adjacent calls to map_tboot_page() that share a PGD entry, the first will
      clear NX, *then* allocate and set the PGD (without NX clear).  The second
      call will *not* allocate but will clear the NX bit.
      
      Defer the NX clearing to a point after it is known that all top-level
      allocations have occurred.  Add a comment to clarify why.
      
      [ tglx: Massaged changelog ]
      
      [hughd notes: I have not tested tboot, but this looks to me as necessary
      and as safe in old-Kaiser backports as it is upstream; I'm not submitting
      the commit-to-be-fixed 262b6b30, since it was undone by 445b69e3,
      and makes conflict trouble because of 5-level's p4d versus 4-level's pgd.]
      
      Fixes: 262b6b30 ("x86/tboot: Unbreak tboot with PTI enabled")
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: "Tim Chen" <tim.c.chen@linux.intel.com>
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: peterz@infradead.org
      Cc: ning.sun@intel.com
      Cc: tboot-devel@lists.sourceforge.net
      Cc: andi@firstfloor.org
      Cc: luto@kernel.org
      Cc: law@redhat.com
      Cc: pbonzini@redhat.com
      Cc: torvalds@linux-foundation.org
      Cc: gregkh@linux-foundation.org
      Cc: dwmw@amazon.co.uk
      Cc: nickc@redhat.com
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180110224939.2695CD47@viggo.jf.intel.comSigned-off-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5991ee90
    • Daniel Borkmann's avatar
      bpf: reject stores into ctx via st and xadd · faa74a86
      Daniel Borkmann authored
      [ upstream commit f37a8cb8 ]
      
      Alexei found that verifier does not reject stores into context
      via BPF_ST instead of BPF_STX. And while looking at it, we
      also should not allow XADD variant of BPF_STX.
      
      The context rewriter is only assuming either BPF_LDX_MEM- or
      BPF_STX_MEM-type operations, thus reject anything other than
      that so that assumptions in the rewriter properly hold. Add
      test cases as well for BPF selftests.
      
      Fixes: d691f9e8 ("bpf: allow programs to write to certain skb fields")
      Reported-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      faa74a86
    • Alexei Starovoitov's avatar
      bpf: fix 32-bit divide by zero · 02662601
      Alexei Starovoitov authored
      [ upstream commit 68fda450 ]
      
      due to some JITs doing if (src_reg == 0) check in 64-bit mode
      for div/mod operations mask upper 32-bits of src register
      before doing the check
      
      Fixes: 62258278 ("net: filter: x86: internal BPF JIT")
      Fixes: 7a12b503 ("sparc64: Add eBPF JIT.")
      Reported-by: syzbot+48340bb518e88849e2e3@syzkaller.appspotmail.com
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      02662601
    • Eric Dumazet's avatar
      bpf: fix divides by zero · b72ba2a0
      Eric Dumazet authored
      [ upstream commit c366287e ]
      
      Divides by zero are not nice, lets avoid them if possible.
      
      Also do_div() seems not needed when dealing with 32bit operands,
      but this seems a minor detail.
      
      Fixes: bd4cf0ed ("net: filter: rework/optimize internal BPF interpreter's instruction set")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b72ba2a0
    • Daniel Borkmann's avatar
      bpf: avoid false sharing of map refcount with max_entries · 96d9b233
      Daniel Borkmann authored
      [ upstream commit be95a845 ]
      
      In addition to commit b2157399 ("bpf: prevent out-of-bounds
      speculation") also change the layout of struct bpf_map such that
      false sharing of fast-path members like max_entries is avoided
      when the maps reference counter is altered. Therefore enforce
      them to be placed into separate cachelines.
      
      pahole dump after change:
      
        struct bpf_map {
              const struct bpf_map_ops  * ops;                 /*     0     8 */
              struct bpf_map *           inner_map_meta;       /*     8     8 */
              void *                     security;             /*    16     8 */
              enum bpf_map_type          map_type;             /*    24     4 */
              u32                        key_size;             /*    28     4 */
              u32                        value_size;           /*    32     4 */
              u32                        max_entries;          /*    36     4 */
              u32                        map_flags;            /*    40     4 */
              u32                        pages;                /*    44     4 */
              u32                        id;                   /*    48     4 */
              int                        numa_node;            /*    52     4 */
              bool                       unpriv_array;         /*    56     1 */
      
              /* XXX 7 bytes hole, try to pack */
      
              /* --- cacheline 1 boundary (64 bytes) --- */
              struct user_struct *       user;                 /*    64     8 */
              atomic_t                   refcnt;               /*    72     4 */
              atomic_t                   usercnt;              /*    76     4 */
              struct work_struct         work;                 /*    80    32 */
              char                       name[16];             /*   112    16 */
              /* --- cacheline 2 boundary (128 bytes) --- */
      
              /* size: 128, cachelines: 2, members: 17 */
              /* sum members: 121, holes: 1, sum holes: 7 */
        };
      
      Now all entries in the first cacheline are read only throughout
      the life time of the map, set up once during map creation. Overall
      struct size and number of cachelines doesn't change from the
      reordering. struct bpf_map is usually first member and embedded
      in map structs in specific map implementations, so also avoid those
      members to sit at the end where it could potentially share the
      cacheline with first map values e.g. in the array since remote
      CPUs could trigger map updates just as well for those (easily
      dirtying members like max_entries intentionally as well) while
      having subsequent values in cache.
      
      Quoting from Google's Project Zero blog [1]:
      
        Additionally, at least on the Intel machine on which this was
        tested, bouncing modified cache lines between cores is slow,
        apparently because the MESI protocol is used for cache coherence
        [8]. Changing the reference counter of an eBPF array on one
        physical CPU core causes the cache line containing the reference
        counter to be bounced over to that CPU core, making reads of the
        reference counter on all other CPU cores slow until the changed
        reference counter has been written back to memory. Because the
        length and the reference counter of an eBPF array are stored in
        the same cache line, this also means that changing the reference
        counter on one physical CPU core causes reads of the eBPF array's
        length to be slow on other physical CPU cores (intentional false
        sharing).
      
      While this doesn't 'control' the out-of-bounds speculation through
      masking the index as in commit b2157399, triggering a manipulation
      of the map's reference counter is really trivial, so lets not allow
      to easily affect max_entries from it.
      
      Splitting to separate cachelines also generally makes sense from
      a performance perspective anyway in that fast-path won't have a
      cache miss if the map gets pinned, reused in other progs, etc out
      of control path, thus also avoids unintentional false sharing.
      
        [1] https://googleprojectzero.blogspot.ch/2018/01/reading-privileged-memory-with-side.htmlSigned-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      96d9b233
    • Daniel Borkmann's avatar
      bpf: arsh is not supported in 32 bit alu thus reject it · 7dcda40e
      Daniel Borkmann authored
      [ upstream commit 7891a87e ]
      
      The following snippet was throwing an 'unknown opcode cc' warning
      in BPF interpreter:
      
        0: (18) r0 = 0x0
        2: (7b) *(u64 *)(r10 -16) = r0
        3: (cc) (u32) r0 s>>= (u32) r0
        4: (95) exit
      
      Although a number of JITs do support BPF_ALU | BPF_ARSH | BPF_{K,X}
      generation, not all of them do and interpreter does neither. We can
      leave existing ones and implement it later in bpf-next for the
      remaining ones, but reject this properly in verifier for the time
      being.
      
      Fixes: 17a52670 ("bpf: verifier (add verifier core)")
      Reported-by: syzbot+93c4904c5c70348a6890@syzkaller.appspotmail.com
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7dcda40e
    • Alexei Starovoitov's avatar
      bpf: introduce BPF_JIT_ALWAYS_ON config · 28c48674
      Alexei Starovoitov authored
      [ upstream commit 290af866 ]
      
      The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.
      
      A quote from goolge project zero blog:
      "At this point, it would normally be necessary to locate gadgets in
      the host kernel code that can be used to actually leak data by reading
      from an attacker-controlled location, shifting and masking the result
      appropriately and then using the result of that as offset to an
      attacker-controlled address for a load. But piecing gadgets together
      and figuring out which ones work in a speculation context seems annoying.
      So instead, we decided to use the eBPF interpreter, which is built into
      the host kernel - while there is no legitimate way to invoke it from inside
      a VM, the presence of the code in the host kernel's text section is sufficient
      to make it usable for the attack, just like with ordinary ROP gadgets."
      
      To make attacker job harder introduce BPF_JIT_ALWAYS_ON config
      option that removes interpreter from the kernel in favor of JIT-only mode.
      So far eBPF JIT is supported by:
      x64, arm64, arm32, sparc64, s390, powerpc64, mips64
      
      The start of JITed program is randomized and code page is marked as read-only.
      In addition "constant blinding" can be turned on with net.core.bpf_jit_harden
      
      v2->v3:
      - move __bpf_prog_ret0 under ifdef (Daniel)
      
      v1->v2:
      - fix init order, test_bpf and cBPF (Daniel's feedback)
      - fix offloaded bpf (Jakub's feedback)
      - add 'return 0' dummy in case something can invoke prog->bpf_func
      - retarget bpf tree. For bpf-next the patch would need one extra hunk.
        It will be sent when the trees are merged back to net-next
      
      Considered doing:
        int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
      but it seems better to land the patch as-is and in bpf-next remove
      bpf_jit_enable global variable from all JITs, consolidate in one place
      and remove this jit_init() function.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      28c48674
    • Alexei Starovoitov's avatar
      bpf: fix bpf_tail_call() x64 JIT · 361fb048
      Alexei Starovoitov authored
      [ upstream commit 90caccdd ]
      
      - bpf prog_array just like all other types of bpf array accepts 32-bit index.
        Clarify that in the comment.
      - fix x64 JIT of bpf_tail_call which was incorrectly loading 8 instead of 4 bytes
      - tighten corresponding check in the interpreter to stay consistent
      
      The JIT bug can be triggered after introduction of BPF_F_NUMA_NODE flag
      in commit 96eabe7a in 4.14. Before that the map_flags would stay zero and
      though JIT code is wrong it will check bounds correctly.
      Hence two fixes tags. All other JITs don't have this problem.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Fixes: 96eabe7a ("bpf: Allow selecting numa node during map creation")
      Fixes: b52f00e6 ("x86: bpf_jit: implement bpf_tail_call() helper")
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      361fb048
    • Eric Dumazet's avatar
      x86: bpf_jit: small optimization in emit_bpf_tail_call() · 5a802e67
      Eric Dumazet authored
      [ upstream commit 84ccac6e ]
      
      Saves 4 bytes replacing following instructions :
      
      lea rax, [rsi + rdx * 8 + offsetof(...)]
      mov rax, qword ptr [rax]
      cmp rax, 0
      
      by :
      
      mov rax, [rsi + rdx * 8 + offsetof(...)]
      test rax, rax
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5a802e67
    • Alexei Starovoitov's avatar
      bpf: fix branch pruning logic · 1367d854
      Alexei Starovoitov authored
      [ Upstream commit c131187d ]
      
      when the verifier detects that register contains a runtime constant
      and it's compared with another constant it will prune exploration
      of the branch that is guaranteed not to be taken at runtime.
      This is all correct, but malicious program may be constructed
      in such a way that it always has a constant comparison and
      the other branch is never taken under any conditions.
      In this case such path through the program will not be explored
      by the verifier. It won't be taken at run-time either, but since
      all instructions are JITed the malicious program may cause JITs
      to complain about using reserved fields, etc.
      To fix the issue we have to track the instructions explored by
      the verifier and sanitize instructions that are dead at run time
      with NOPs. We cannot reject such dead code, since llvm generates
      it for valid C code, since it doesn't do as much data flow
      analysis as the verifier does.
      
      Fixes: 17a52670 ("bpf: verifier (add verifier core)")
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1367d854
    • Linus Torvalds's avatar
      loop: fix concurrent lo_open/lo_release · b3922254
      Linus Torvalds authored
      commit ae665016 upstream.
      
      范龙飞 reports that KASAN can report a use-after-free in __lock_acquire.
      The reason is due to insufficient serialization in lo_release(), which
      will continue to use the loop device even after it has decremented the
      lo_refcnt to zero.
      
      In the meantime, another process can come in, open the loop device
      again as it is being shut down. Confusion ensues.
      Reported-by: default avatar范龙飞 <long7573@126.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3922254
  2. 31 Jan, 2018 15 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.4.114 · 49fe90b8
      Greg Kroah-Hartman authored
      49fe90b8
    • Ben Hutchings's avatar
      nfsd: auth: Fix gid sorting when rootsquash enabled · 3f84339b
      Ben Hutchings authored
      commit 19952667 upstream.
      
      Commit bdcf0a42 ("kernel: make groups_sort calling a responsibility
      group_info allocators") appears to break nfsd rootsquash in a pretty
      major way.
      
      It adds a call to groups_sort() inside the loop that copies/squashes
      gids, which means the valid gids are sorted along with the following
      garbage.  The net result is that the highest numbered valid gids are
      replaced with any lower-valued garbage gids, possibly including 0.
      
      We should sort only once, after filling in all the gids.
      
      Fixes: bdcf0a42 ("kernel: make groups_sort calling a responsibility ...")
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Acked-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Wolfgang Walter <linux@stwm.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f84339b
    • Dan Streetman's avatar
      net: tcp: close sock if net namespace is exiting · edaafa80
      Dan Streetman authored
      
      [ Upstream commit 4ee806d5 ]
      
      When a tcp socket is closed, if it detects that its net namespace is
      exiting, close immediately and do not wait for FIN sequence.
      
      For normal sockets, a reference is taken to their net namespace, so it will
      never exit while the socket is open.  However, kernel sockets do not take a
      reference to their net namespace, so it may begin exiting while the kernel
      socket is still open.  In this case if the kernel socket is a tcp socket,
      it will stay open trying to complete its close sequence.  The sock's dst(s)
      hold a reference to their interface, which are all transferred to the
      namespace's loopback interface when the real interfaces are taken down.
      When the namespace tries to take down its loopback interface, it hangs
      waiting for all references to the loopback interface to release, which
      results in messages like:
      
      unregister_netdevice: waiting for lo to become free. Usage count = 1
      
      These messages continue until the socket finally times out and closes.
      Since the net namespace cleanup holds the net_mutex while calling its
      registered pernet callbacks, any new net namespace initialization is
      blocked until the current net namespace finishes exiting.
      
      After this change, the tcp socket notices the exiting net namespace, and
      closes immediately, releasing its dst(s) and their reference to the
      loopback interface, which lets the net namespace continue exiting.
      
      Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811Signed-off-by: default avatarDan Streetman <ddstreet@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      edaafa80
    • Eric Dumazet's avatar
      flow_dissector: properly cap thoff field · d35cd5e2
      Eric Dumazet authored
      
      [ Upstream commit d0c081b4 ]
      
      syzbot reported yet another crash [1] that is caused by
      insufficient validation of DODGY packets.
      
      Two bugs are happening here to trigger the crash.
      
      1) Flow dissection leaves with incorrect thoff field.
      
      2) skb_probe_transport_header() sets transport header to this invalid
      thoff, even if pointing after skb valid data.
      
      3) qdisc_pkt_len_init() reads out-of-bound data because it
      trusts tcp_hdrlen(skb)
      
      Possible fixes :
      
      - Full flow dissector validation before injecting bad DODGY packets in
      the stack.
       This approach was attempted here : https://patchwork.ozlabs.org/patch/
      861874/
      
      - Have more robust functions in the core.
        This might be needed anyway for stable versions.
      
      This patch fixes the flow dissection issue.
      
      [1]
      CPU: 1 PID: 3144 Comm: syzkaller271204 Not tainted 4.15.0-rc4-mm1+ #49
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:53
       print_address_description+0x73/0x250 mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:355 [inline]
       kasan_report+0x23b/0x360 mm/kasan/report.c:413
       __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:432
       __tcp_hdrlen include/linux/tcp.h:35 [inline]
       tcp_hdrlen include/linux/tcp.h:40 [inline]
       qdisc_pkt_len_init net/core/dev.c:3160 [inline]
       __dev_queue_xmit+0x20d3/0x2200 net/core/dev.c:3465
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3554
       packet_snd net/packet/af_packet.c:2943 [inline]
       packet_sendmsg+0x3ad5/0x60a0 net/packet/af_packet.c:2968
       sock_sendmsg_nosec net/socket.c:628 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:638
       sock_write_iter+0x31a/0x5d0 net/socket.c:907
       call_write_iter include/linux/fs.h:1776 [inline]
       new_sync_write fs/read_write.c:469 [inline]
       __vfs_write+0x684/0x970 fs/read_write.c:482
       vfs_write+0x189/0x510 fs/read_write.c:544
       SYSC_write fs/read_write.c:589 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:581
       entry_SYSCALL_64_fastpath+0x1f/0x96
      
      Fixes: 34fad54c ("net: __skb_flow_dissect() must cap its return value")
      Fixes: a6e544b0 ("flow_dissector: Jump to exit code in __skb_flow_dissect")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d35cd5e2
    • Jim Westfall's avatar
      ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY · cab84514
      Jim Westfall authored
      
      [ Upstream commit cd9ff4de ]
      
      Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices
      to avoid making an entry for every remote ip the device needs to talk to.
      
      This used the be the old behavior but became broken in a263b309
      (ipv4: Make neigh lookups directly in output packet path) and later removed
      in 0bb4087c (ipv4: Fix neigh lookup keying over loopback/point-to-point
      devices) because it was broken.
      Signed-off-by: default avatarJim Westfall <jwestfall@surrealistic.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cab84514
    • Jim Westfall's avatar
      net: Allow neigh contructor functions ability to modify the primary_key · 29837a4a
      Jim Westfall authored
      
      [ Upstream commit 096b9854 ]
      
      Use n->primary_key instead of pkey to account for the possibility that a neigh
      constructor function may have modified the primary_key value.
      Signed-off-by: default avatarJim Westfall <jwestfall@surrealistic.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      29837a4a
    • Neil Horman's avatar
      vmxnet3: repair memory leak · 0d9bcadb
      Neil Horman authored
      
      [ Upstream commit 848b1598 ]
      
      with the introduction of commit
      b0eb57cb, it appears that rq->buf_info
      is improperly handled.  While it is heap allocated when an rx queue is
      setup, and freed when torn down, an old line of code in
      vmxnet3_rq_destroy was not properly removed, leading to rq->buf_info[0]
      being set to NULL prior to its being freed, causing a memory leak, which
      eventually exhausts the system on repeated create/destroy operations
      (for example, when  the mtu of a vmxnet3 interface is changed
      frequently.
      
      Fix is pretty straight forward, just move the NULL set to after the
      free.
      
      Tested by myself with successful results
      
      Applies to net, and should likely be queued for stable, please
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Reported-By: boyang@redhat.com
      CC: boyang@redhat.com
      CC: Shrikrishna Khare <skhare@vmware.com>
      CC: "VMware, Inc." <pv-drivers@vmware.com>
      CC: David S. Miller <davem@davemloft.net>
      Acked-by: default avatarShrikrishna Khare <skhare@vmware.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d9bcadb
    • Xin Long's avatar
      sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf · 50194c3f
      Xin Long authored
      
      [ Upstream commit a0ff6600 ]
      
      After commit cea0cc80 ("sctp: use the right sk after waking up from
      wait_buf sleep"), it may change to lock another sk if the asoc has been
      peeled off in sctp_wait_for_sndbuf.
      
      However, the asoc's new sk could be already closed elsewhere, as it's in
      the sendmsg context of the old sk that can't avoid the new sk's closing.
      If the sk's last one refcnt is held by this asoc, later on after putting
      this asoc, the new sk will be freed, while under it's own lock.
      
      This patch is to revert that commit, but fix the old issue by returning
      error under the old sk's lock.
      
      Fixes: cea0cc80 ("sctp: use the right sk after waking up from wait_buf sleep")
      Reported-by: syzbot+ac6ea7baa4432811eb50@syzkaller.appspotmail.com
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      50194c3f
    • Xin Long's avatar
      sctp: do not allow the v4 socket to bind a v4mapped v6 address · 23f521bc
      Xin Long authored
      
      [ Upstream commit c5006b8a ]
      
      The check in sctp_sockaddr_af is not robust enough to forbid binding a
      v4mapped v6 addr on a v4 socket.
      
      The worse thing is that v4 socket's bind_verify would not convert this
      v4mapped v6 addr to a v4 addr. syzbot even reported a crash as the v4
      socket bound a v6 addr.
      
      This patch is to fix it by doing the common sa.sa_family check first,
      then AF_INET check for v4mapped v6 addrs.
      
      Fixes: 7dab83de ("sctp: Support ipv6only AF_INET6 sockets.")
      Reported-by: syzbot+7b7b518b1228d2743963@syzkaller.appspotmail.com
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      23f521bc
    • Francois Romieu's avatar
      r8169: fix memory corruption on retrieval of hardware statistics. · d6611191
      Francois Romieu authored
      
      [ Upstream commit a78e9366 ]
      
      Hardware statistics retrieval hurts in tight invocation loops.
      
      Avoid extraneous write and enforce strict ordering of writes targeted to
      the tally counters dump area address registers.
      Signed-off-by: default avatarFrancois Romieu <romieu@fr.zoreil.com>
      Tested-by: default avatarOliver Freyermuth <o.freyermuth@googlemail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6611191
    • Guillaume Nault's avatar
      pppoe: take ->needed_headroom of lower device into account on xmit · a49558eb
      Guillaume Nault authored
      
      [ Upstream commit 02612bb0 ]
      
      In pppoe_sendmsg(), reserving dev->hard_header_len bytes of headroom
      was probably fine before the introduction of ->needed_headroom in
      commit f5184d26 ("net: Allow netdevices to specify needed head/tailroom").
      
      But now, virtual devices typically advertise the size of their overhead
      in dev->needed_headroom, so we must also take it into account in
      skb_reserve().
      Allocation size of skb is also updated to take dev->needed_tailroom
      into account and replace the arbitrary 32 bytes with the real size of
      a PPPoE header.
      
      This issue was discovered by syzbot, who connected a pppoe socket to a
      gre device which had dev->header_ops->create == ipgre_header and
      dev->hard_header_len == 0. Therefore, PPPoE didn't reserve any
      headroom, and dev_hard_header() crashed when ipgre_header() tried to
      prepend its header to skb->data.
      
      skbuff: skb_under_panic: text:000000001d390b3a len:31 put:24
      head:00000000d8ed776f data:000000008150e823 tail:0x7 end:0xc0 dev:gre0
      ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:104!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
          (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 3670 Comm: syzkaller801466 Not tainted
      4.15.0-rc7-next-20180115+ #97
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:skb_panic+0x162/0x1f0 net/core/skbuff.c:100
      RSP: 0018:ffff8801d9bd7840 EFLAGS: 00010282
      RAX: 0000000000000083 RBX: ffff8801d4f083c0 RCX: 0000000000000000
      RDX: 0000000000000083 RSI: 1ffff1003b37ae92 RDI: ffffed003b37aefc
      RBP: ffff8801d9bd78a8 R08: 1ffff1003b37ae8a R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff86200de0
      R13: ffffffff84a981ad R14: 0000000000000018 R15: ffff8801d2d34180
      FS:  00000000019c4880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000208bc000 CR3: 00000001d9111001 CR4: 00000000001606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        skb_under_panic net/core/skbuff.c:114 [inline]
        skb_push+0xce/0xf0 net/core/skbuff.c:1714
        ipgre_header+0x6d/0x4e0 net/ipv4/ip_gre.c:879
        dev_hard_header include/linux/netdevice.h:2723 [inline]
        pppoe_sendmsg+0x58e/0x8b0 drivers/net/ppp/pppoe.c:890
        sock_sendmsg_nosec net/socket.c:630 [inline]
        sock_sendmsg+0xca/0x110 net/socket.c:640
        sock_write_iter+0x31a/0x5d0 net/socket.c:909
        call_write_iter include/linux/fs.h:1775 [inline]
        do_iter_readv_writev+0x525/0x7f0 fs/read_write.c:653
        do_iter_write+0x154/0x540 fs/read_write.c:932
        vfs_writev+0x18a/0x340 fs/read_write.c:977
        do_writev+0xfc/0x2a0 fs/read_write.c:1012
        SYSC_writev fs/read_write.c:1085 [inline]
        SyS_writev+0x27/0x30 fs/read_write.c:1082
        entry_SYSCALL_64_fastpath+0x29/0xa0
      
      Admittedly PPPoE shouldn't be allowed to run on non Ethernet-like
      interfaces, but reserving space for ->needed_headroom is a more
      fundamental issue that needs to be addressed first.
      
      Same problem exists for __pppoe_xmit(), which also needs to take
      dev->needed_headroom into account in skb_cow_head().
      
      Fixes: f5184d26 ("net: Allow netdevices to specify needed head/tailroom")
      Reported-by: syzbot+ed0838d0fa4c4f2b528e20286e6dc63effc7c14d@syzkaller.appspotmail.com
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Reviewed-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a49558eb
    • Eric Dumazet's avatar
      net: qdisc_pkt_len_init() should be more robust · f50fc5f4
      Eric Dumazet authored
      
      [ Upstream commit 7c68d1a6 ]
      
      Without proper validation of DODGY packets, we might very well
      feed qdisc_pkt_len_init() with invalid GSO packets.
      
      tcp_hdrlen() might access out-of-bound data, so let's use
      skb_header_pointer() and proper checks.
      
      Whole story is described in commit d0c081b4 ("flow_dissector:
      properly cap thoff field")
      
      We have the goal of validating DODGY packets earlier in the stack,
      so we might very well revert this fix in the future.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Reported-by: syzbot+9da69ebac7dddd804552@syzkaller.appspotmail.com
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f50fc5f4
    • Craig Gallek's avatar
      tcp: __tcp_hdrlen() helper · bed42ef5
      Craig Gallek authored
      commit d9b3fca2 upstream.
      
      tcp_hdrlen is wasteful if you already have a pointer to struct tcphdr.
      This splits the size calculation into a helper function that can be
      used if a struct tcphdr is already available.
      Signed-off-by: default avatarCraig Gallek <kraig@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bed42ef5
    • Felix Fietkau's avatar
      net: igmp: fix source address check for IGMPv3 reports · 6c489ab4
      Felix Fietkau authored
      
      [ Upstream commit ad23b750 ]
      
      Commit "net: igmp: Use correct source address on IGMPv3 reports"
      introduced a check to validate the source address of locally generated
      IGMPv3 packets.
      Instead of checking the local interface address directly, it uses
      inet_ifa_match(fl4->saddr, ifa), which checks if the address is on the
      local subnet (or equal to the point-to-point address if used).
      
      This breaks for point-to-point interfaces, so check against
      ifa->ifa_local directly.
      
      Cc: Kevin Cernekee <cernekee@chromium.org>
      Fixes: a46182b0 ("net: igmp: Use correct source address on IGMPv3 reports")
      Reported-by: default avatarSebastian Gottschall <s.gottschall@dd-wrt.com>
      Signed-off-by: default avatarFelix Fietkau <nbd@nbd.name>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Tested-by: default avatarFlorian Wolters <florian@florian-wolters.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6c489ab4
    • Yuiko Oshino's avatar
      lan78xx: Fix failure in USB Full Speed · 1202fc02
      Yuiko Oshino authored
      
      [ Upstream commit a5b1379a ]
      
      Fix initialize the uninitialized tx_qlen to an appropriate value when USB
      Full Speed is used.
      
      Fixes: 55d7de9d ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
      Signed-off-by: default avatarYuiko Oshino <yuiko.oshino@microchip.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1202fc02