1. 25 May, 2018 23 commits
    • x86/kexec: Avoid double free_page() upon do_kexec_load() failure · a81f4015
      Tetsuo Handa authored
      commit a466ef76 upstream.
      
      From ff82bedd3e12f0d3353282054ae48c3bd8c72012 Mon Sep 17 00:00:00 2001
      From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Date: Wed, 9 May 2018 12:12:39 +0900
      Subject: x86/kexec: Avoid double free_page() upon do_kexec_load() failure
      
      syzbot is reporting crashes after memory allocation failure inside
      do_kexec_load() [1]. This is because free_transition_pgtable() is called
      by both init_transition_pgtable() and machine_kexec_cleanup() when memory
      allocation failed inside init_transition_pgtable().
      
      Regarding 32bit code, machine_kexec_free_page_tables() is called by both
      machine_kexec_alloc_page_tables() and machine_kexec_cleanup() when memory
      allocation failed inside machine_kexec_alloc_page_tables().
      
      Fix this by leaving the error handling to machine_kexec_cleanup()
      (and optionally setting NULL after free_page()).
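
      The single-owner cleanup idea can be sketched in userspace C. This is
      only an illustration under stated assumptions: struct image,
      image_alloc() and image_cleanup() are hypothetical names, and
      calloc()/free() stand in for the kernel page allocator. It is not the
      actual kexec code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the state that owns the page-table pages. */
struct image {
    void *pgd;
    void *pud;
};

/*
 * The only place that frees. Pointers are reset to NULL after free(),
 * so calling this twice is harmless (free(NULL) is a no-op).
 */
static void image_cleanup(struct image *img)
{
    assert(img != NULL);
    free(img->pud);
    img->pud = NULL;
    free(img->pgd);
    img->pgd = NULL;
}

/*
 * Allocation helper. On failure it returns an error and frees nothing
 * itself; the caller is expected to run image_cleanup().
 * fail_second simulates an allocation failure on the second page.
 */
static int image_alloc(struct image *img, int fail_second)
{
    assert(img != NULL);
    img->pgd = calloc(1, 4096);
    if (!img->pgd)
        return -1;
    img->pud = fail_second ? NULL : calloc(1, 4096);
    if (!img->pud)
        return -1;
    return 0;
}
```

      Because only image_cleanup() frees, a failed image_alloc() followed by
      the caller's cleanup releases each page exactly once.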
      
      [1] https://syzkaller.appspot.com/bug?id=91e52396168cf2bdd572fe1e1bc0bc645c1c6b40
      
      Fixes: f5deb796 ("x86: kexec: Use one page table in x86_64 machine_kexec")
      Fixes: 92be3d6b ("kexec/i386: allocate page table pages dynamically")
      Reported-by: syzbot <syzbot+d96f60296ef613fe1d69@syzkaller.appspotmail.com>
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: thomas.lendacky@amd.com
      Cc: prudo@linux.vnet.ibm.com
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: syzkaller-bugs@googlegroups.com
      Cc: takahiro.akashi@linaro.org
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: akpm@linux-foundation.org
      Cc: dyoung@redhat.com
      Cc: kirill.shutemov@linux.intel.com
      Link: https://lkml.kernel.org/r/201805091942.DGG12448.tMFVFSJFQOOLHO@I-love.SAKURA.ne.jp
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • hfsplus: stop workqueue when fill_super() failed · 2595f213
      Tetsuo Handa authored
      commit 66072c29 upstream.
      
      syzbot is reporting ODEBUG messages at hfsplus_fill_super() [1].  This
      is because hfsplus_fill_super() forgot to call cancel_delayed_work_sync().
      
      As far as I can see, it is hfsplus_mark_mdb_dirty() from
      hfsplus_new_inode() in hfsplus_fill_super() that calls
      queue_delayed_work().  Therefore, I assume that hfsplus_new_inode() does
      not fail if queue_delayed_work() was called, and the out_put_hidden_dir
      label is the appropriate location to call cancel_delayed_work_sync().
      
      [1] https://syzkaller.appspot.com/bug?id=a66f45e96fdbeb76b796bf46eb25ea878c42a6c9
      
      Link: http://lkml.kernel.org/r/964a8b27-cd69-357c-fe78-76b066056201@I-love.SAKURA.ne.jp
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: syzbot <syzbot+4f2e5f086147d543ab03@syzkaller.appspotmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com>
      Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cfg80211: limit wiphy names to 128 bytes · 7d73a8c0
      Johannes Berg authored
      commit a7cfebcb upstream.
      
      There's currently no limit on wiphy names, other than netlink
      message size and memory limitations, but that causes issues when,
      for example, the wiphy name is used in a uevent, e.g. in rfkill
      where we use the same name for the rfkill instance, and then the
      buffer there is "only" 2k for the environment variables.
      
      This was reported by syzkaller, which used a 4k name.
      
      Limit the name to something reasonable, I randomly picked 128.
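
      A length check of this kind can be sketched as follows. NAME_MAXLEN
      and check_name() are hypothetical names chosen for illustration; the
      value 128 is the one the message says was picked. The real check in
      cfg80211 may be placed and named differently.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Illustrative limit; 128 is the value named in the commit message. */
#define NAME_MAXLEN 128

/* Returns 0 if the name fits within the limit, -EINVAL otherwise. */
static int check_name(const char *name)
{
    assert(name != NULL);
    if (strlen(name) > NAME_MAXLEN)
        return -EINVAL;
    return 0;
}
```

      Rejecting the name at the point where it is set means later users of
      the name, such as a fixed-size uevent buffer, never see an oversized
      string.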
      
      Reported-by: syzbot+230d9e642a85d3fec29c@syzkaller.appspotmail.com
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scsi: zfcp: fix infinite iteration on ERP ready list · 6c657191
      Jens Remus authored
      commit fa89adba upstream.
      
      zfcp_erp_adapter_reopen() schedules blocking of all of the adapter's
      rports via zfcp_scsi_schedule_rports_block() and enqueues a reopen
      adapter ERP action via zfcp_erp_action_enqueue(). Both are separately
      processed asynchronously and concurrently.
      
      Blocking of rports is done in a kworker by zfcp_scsi_rport_work(). It
      calls zfcp_scsi_rport_block(), which then traces a DBF REC "scpdely" via
      zfcp_dbf_rec_trig().  zfcp_dbf_rec_trig() acquires the DBF REC spin lock
      and then iterates with list_for_each() over the adapter's ERP ready list
      without holding the ERP lock. This opens a race window in which the
      current list entry can be moved to another list, causing list_for_each()
      to iterate forever on the wrong list, as the erp_ready_head is never
      encountered as terminal condition.
      
      Meanwhile the ERP action can be processed in the ERP thread by
      zfcp_erp_thread(). It calls zfcp_erp_strategy(), which acquires the ERP
      lock and then calls zfcp_erp_action_to_running() to move the ERP action
      from the ready to the running list.  zfcp_erp_action_to_running() can
      move the ERP action using list_move() just during the aforementioned
      race window. It then traces a REC RUN "erator1" via zfcp_dbf_rec_run().
      zfcp_dbf_rec_run() tries to acquire the DBF REC spin lock. If this is
      held by the infinitely looping kworker, it effectively spins forever.
      
      Example Sequence Diagram:
      
      Process                ERP Thread             rport_work
      -------------------    -------------------    -------------------
      zfcp_erp_adapter_reopen()
      zfcp_erp_adapter_block()
      zfcp_scsi_schedule_rports_block()
      lock ERP                                      zfcp_scsi_rport_work()
      zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_ADAPTER)
      list_add_tail() on ready                      !(rport_task==RPORT_ADD)
      wake_up() ERP thread                          zfcp_scsi_rport_block()
      zfcp_dbf_rec_trig()    zfcp_erp_strategy()    zfcp_dbf_rec_trig()
      unlock ERP                                    lock DBF REC
      zfcp_erp_wait()        lock ERP
      |                      zfcp_erp_action_to_running()
      |                                             list_for_each() ready
      |                      list_move()              current entry
      |                        ready to running
      |                      zfcp_dbf_rec_run()       endless loop over running
      |                      zfcp_dbf_rec_run_lvl()
      |                      lock DBF REC spins forever
      
      Any adapter recovery can trigger this, such as setting the device offline
      or reboot.
      
      V4.9 commit 4eeaa4f3 ("zfcp: close window with unblocked rport
      during rport gone") introduced additional tracing of (un)blocking of
      rports. It missed that the adapter->erp_lock must be held when calling
      zfcp_dbf_rec_trig().
      
      This fix uses the approach formerly introduced by commit aa0fec62
      ("[SCSI] zfcp: Fix sparse warning by providing new entry in dbf") that got
      later removed by commit ae0904f6 ("[SCSI] zfcp: Redesign of the debug
      tracing for recovery actions.").
      
      Introduce zfcp_dbf_rec_trig_lock(), a wrapper for zfcp_dbf_rec_trig() that
      acquires and releases the adapter->erp_lock for read.

      Reported-by: Sebastian Ott <sebott@linux.ibm.com>
      Signed-off-by: Jens Remus <jremus@linux.ibm.com>
      Fixes: 4eeaa4f3 ("zfcp: close window with unblocked rport during rport gone")
      Cc: <stable@vger.kernel.org> # 2.6.32+
      Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: Steffen Maier <maier@linux.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scsi: sg: allocate with __GFP_ZERO in sg_build_indirect() · ad251832
      Alexander Potapenko authored
      commit a45b599a upstream.
      
      This shall help avoid copying uninitialized memory to the userspace when
      calling ioctl(fd, SG_IO) with an empty command.
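
      A userspace analogue of the idea, assuming calloc() as a stand-in for
      a __GFP_ZERO allocation and memcpy() as a stand-in for the copy to
      userspace. alloc_reply() and copy_out() are hypothetical helpers, not
      functions from the sg driver.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Allocate a zero-filled reply buffer, so bytes that are never written
 * cannot expose stale heap contents when the whole buffer is copied out.
 */
static unsigned char *alloc_reply(size_t len)
{
    assert(len > 0);
    return calloc(len, 1);
}

/* Copies the full buffer to the caller, written or not. */
static void copy_out(unsigned char *dst, const unsigned char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, len);
}
```

      With a plain malloc() the unwritten tail of the buffer would hold
      whatever the allocator left there; zeroing at allocation time removes
      that dependency on every later writer being complete.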
      
      Reported-by: syzbot+7d26fc1eea198488deab@syzkaller.appspotmail.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Acked-by: Douglas Gilbert <dgilbert@interlog.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • scsi: libsas: defer ata device eh commands to libata · e420d983
      Jason Yan authored
      commit 318aaf34 upstream.
      
      When an ata device is doing EH, some commands still attached to tasks
      are not passed to libata if the abort or the recovery failed, so libata
      does not handle these commands. Once these commands are done, the sas
      task is freed but the ata qc is not. This leaks the ata qc and triggers
      a warning like the one below:
      
      WARNING: CPU: 0 PID: 28512 at drivers/ata/libata-eh.c:4037
      ata_eh_finish+0xb4/0xcc
      CPU: 0 PID: 28512 Comm: kworker/u32:2 Tainted: G     W  OE 4.14.0#1
      ......
      Call trace:
      [<ffff0000088b7bd0>] ata_eh_finish+0xb4/0xcc
      [<ffff0000088b8420>] ata_do_eh+0xc4/0xd8
      [<ffff0000088b8478>] ata_std_error_handler+0x44/0x8c
      [<ffff0000088b8068>] ata_scsi_port_error_handler+0x480/0x694
      [<ffff000008875fc4>] async_sas_ata_eh+0x4c/0x80
      [<ffff0000080f6be8>] async_run_entry_fn+0x4c/0x170
      [<ffff0000080ebd70>] process_one_work+0x144/0x390
      [<ffff0000080ec100>] worker_thread+0x144/0x418
      [<ffff0000080f2c98>] kthread+0x10c/0x138
      [<ffff0000080855dc>] ret_from_fork+0x10/0x18
      
      If too many ata qcs are leaked, ata tag allocation will fail and I/O
      will be blocked forever.
      
      As suggested by Dan Williams, defer ata device commands to libata and
      merge sas_eh_finish_cmd() with sas_eh_defer_cmd(). libata will handle
      ata qcs correctly after this.

      Signed-off-by: Jason Yan <yanaijie@huawei.com>
      CC: Xiaofei Tan <tanxiaofei@huawei.com>
      CC: John Garry <john.garry@huawei.com>
      CC: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390: use expoline thunks in the BPF JIT · 6089a72d
      Martin Schwidefsky authored
      [ Upstream commit de5cb6eb ]
      
      The BPF JIT needs safeguarding against Spectre v2 in the sk_load_xxx
      assembler stubs, and the indirect branches generated by the JIT itself
      need to be converted to expolines.

      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390: extend expoline to BC instructions · 1ace5fcb
      Martin Schwidefsky authored
      [ Upstream commit 6deaa3bb ]
      
      The BPF JIT uses a 'b <disp>(%r<x>)' instruction in the definition
      of the sk_load_word and sk_load_half functions.
      
      Add support for branch-on-condition instructions contained in the
      thunk code of an expoline.

      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390: move spectre sysfs attribute code · b004790d
      Martin Schwidefsky authored
      [ Upstream commit 4253b0e0 ]
      
      The nospec-branch.c file is compiled without the gcc options to
      generate expoline thunks. The return branch of the sysfs show
      functions cpu_show_spectre_v1 and cpu_show_spectre_v2 is an indirect
      branch as well. These need to be compiled with expolines.
      
      Move the sysfs functions for spectre reporting to a separate file
      and drop a '.' from one of the messages.
      
      Cc: stable@vger.kernel.org # 4.16
      Fixes: d424986f ("s390: add sysfs attributes for spectre")
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/kernel: use expoline for indirect branches · b35421ab
      Martin Schwidefsky authored
      [ Upstream commit c50c84c3 ]
      
      The assembler code in arch/s390/kernel uses a few more indirect branches
      which need to be done with execute trampolines for CONFIG_EXPOLINE=y.
      
      Cc: stable@vger.kernel.org # 4.16
      Fixes: f19fbd5e ("s390: introduce execute-trampolines for branches")
      Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/ftrace: use expoline for indirect branches · caa47e1f
      Martin Schwidefsky authored
      [ Upstream commit 23a4d7fd ]
      
      The return from the ftrace_stub, _mcount, ftrace_caller and
      return_to_handler functions is done with "br %r14" and "br %r1".
      These are indirect branches as well and need to use execute
      trampolines for CONFIG_EXPOLINE=y.
      
      The ftrace_caller function is a special case as it returns to the
      start of a function and may only use %r0 and %r1. For a pre z10
      machine the standard execute trampoline uses a LARL + EX to do
      this, but this requires *two* registers in the range %r1..%r15.
      To get around this the 'br %r1' located in the lowcore is used,
      then the EX instruction does not need an address register.
      But the lowcore trick may only be used for pre z14 machines,
      with noexec=on the mapping for the first page may not contain
      instructions. The solution for that is an ALTERNATIVE in the
      expoline THUNK generated by 'GEN_BR_THUNK %r1' to switch to
      EXRL, this relies on the fact that a machine that supports
      noexec=on has EXRL as well.
      
      Cc: stable@vger.kernel.org # 4.16
      Fixes: f19fbd5e ("s390: introduce execute-trampolines for branches")
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/lib: use expoline for indirect branches · cba0d6c2
      Martin Schwidefsky authored
      [ Upstream commit 97489e06 ]
      
      The return from the memmove, memset, memcpy, __memset16, __memset32 and
      __memset64 functions is done with "br %r14". These are indirect branches
      as well and need to use execute trampolines for CONFIG_EXPOLINE=y.
      
      Cc: stable@vger.kernel.org # 4.16
      Fixes: f19fbd5e ("s390: introduce execute-trampolines for branches")
      Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/crc32-vx: use expoline for indirect branches · f37bfc0d
      Martin Schwidefsky authored
      [ Upstream commit 467a3bf2 ]
      
      The return from the crc32_le_vgfm_16/crc32c_le_vgfm_16 and the
      crc32_be_vgfm_16 functions is done with "br %r14". These are indirect
      branches as well and need to use execute trampolines for CONFIG_EXPOLINE=y.
      
      Cc: stable@vger.kernel.org # 4.16
      Fixes: f19fbd5e ("s390: introduce execute-trampolines for branches")
      Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390: move expoline assembler macros to a header · 4a5c26dd
      Martin Schwidefsky authored
      [ Upstream commit 6dd85fbb ]
      
      To be able to use the expoline branches in different assembler
      files move the associated macros from entry.S to a new header
      nospec-insn.h.
      
      While we are at it make the macros a bit nicer to use.
      
      Cc: stable@vger.kernel.org # 4.16
      Fixes: f19fbd5e ("s390: introduce execute-trampolines for branches")
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390: add assembler macros for CPU alternatives · 63257f26
      Martin Schwidefsky authored
      [ Upstream commit fba9eb79 ]
      
      Add a header with macros usable in assembler files to emit alternative
      code sequences. It works analogously to the alternatives for inline
      assemblies in C files, with the same restrictions and capabilities.
      The syntax is

           ALTERNATIVE "<default instructions sequence>", \
      		 "<alternative instructions sequence>", \
      		 "<feature-bit>"
      and

           ALTERNATIVE_2 "<default instructions sequence>", \
      		   "<alternative instructions sequence #1>", \
      		   "<feature-bit #1>", \
      		   "<alternative instructions sequence #2>", \
      		   "<feature-bit #2>"

      Reviewed-by: Vasily Gorbik <gor@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext2: fix a block leak · 808449d2
      Al Viro authored
      commit 5aa1437d upstream.
      
      open file, unlink it, then use ioctl(2) to make it immutable or
      append only.  Now close it and watch the blocks *not* freed...
      
      Immutable/append-only checks belong in ->setattr().
      Note: the bug is old and backport to anything prior to 737f2e93
      ("ext2: convert to use the new truncate convention") will need
      these checks lifted into ext2_setattr().
      
      Cc: stable@kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • vmxnet3: use DMA memory barriers where required · aab32922
      hpreg@vmware.com authored
      [ Upstream commit f3002c13 ]
      
      The gen bits must be read first from (resp. written last to) DMA memory.
      The proper way to enforce this on Linux is to call dma_rmb() (resp.
      dma_wmb()).

      Signed-off-by: Regis Duchesne <hpreg@vmware.com>
      Acked-by: Ronak Doshi <doshir@vmware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • vmxnet3: set the DMA mask before the first DMA map operation · 779fd38b
      hpreg@vmware.com authored
      [ Upstream commit 61aeecea ]
      
      The DMA mask must be set before, not after, the first DMA map operation, or
      the first DMA map operation could in theory fail on some systems.
      
      Fixes: b0eb57cb ("VMXNET3: Add support for virtual IOMMU")
      Signed-off-by: Regis Duchesne <hpreg@vmware.com>
      Acked-by: Ronak Doshi <doshir@vmware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: purge write queue in tcp_connect_init() · 74a4c09d
      Eric Dumazet authored
      [ Upstream commit 7f582b24 ]
      
      syzkaller found a reliable way to crash the host, hitting a BUG()
      in __tcp_retransmit_skb().

      Malicious MSG_FASTOPEN is the root cause. We need to purge the write
      queue in tcp_connect_init() at the point we init snd_una/write_seq.

      This patch also replaces the BUG() with a less intrusive WARN_ON_ONCE().
      
      kernel BUG at net/ipv4/tcp_output.c:2837!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 5276 Comm: syz-executor0 Not tainted 4.17.0-rc3+ #51
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__tcp_retransmit_skb+0x2992/0x2eb0 net/ipv4/tcp_output.c:2837
      RSP: 0000:ffff8801dae06ff8 EFLAGS: 00010206
      RAX: ffff8801b9fe61c0 RBX: 00000000ffc18a16 RCX: ffffffff864e1a49
      RDX: 0000000000000100 RSI: ffffffff864e2e12 RDI: 0000000000000005
      RBP: ffff8801dae073a0 R08: ffff8801b9fe61c0 R09: ffffed0039c40dd2
      R10: ffffed0039c40dd2 R11: ffff8801ce206e93 R12: 00000000421eeaad
      R13: ffff8801ce206d4e R14: ffff8801ce206cc0 R15: ffff8801cd4f4a80
      FS:  0000000000000000(0000) GS:ffff8801dae00000(0063) knlGS:00000000096bc900
      CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
      CR2: 0000000020000000 CR3: 00000001c47b6000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <IRQ>
       tcp_retransmit_skb+0x2e/0x250 net/ipv4/tcp_output.c:2923
       tcp_retransmit_timer+0xc50/0x3060 net/ipv4/tcp_timer.c:488
       tcp_write_timer_handler+0x339/0x960 net/ipv4/tcp_timer.c:573
       tcp_write_timer+0x111/0x1d0 net/ipv4/tcp_timer.c:593
       call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
       expire_timers kernel/time/timer.c:1363 [inline]
       __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
       run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
       __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
       invoke_softirq kernel/softirq.c:365 [inline]
       irq_exit+0x1d1/0x200 kernel/softirq.c:405
       exiting_irq arch/x86/include/asm/apic.h:525 [inline]
       smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
       apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
      
      Fixes: cf60af03 ("net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sock_diag: fix use-after-free read in __sk_free · a5e907c3
      Eric Dumazet authored
      [ Upstream commit 9709020c ]
      
      We must not call sock_diag_has_destroy_listeners(sk) on a socket
      that has no reference on net structure.
      
      BUG: KASAN: use-after-free in sock_diag_has_destroy_listeners include/linux/sock_diag.h:75 [inline]
      BUG: KASAN: use-after-free in __sk_free+0x329/0x340 net/core/sock.c:1609
      Read of size 8 at addr ffff88018a02e3a0 by task swapper/1/0
      
      CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.17.0-rc5+ #54
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1b9/0x294 lib/dump_stack.c:113
       print_address_description+0x6c/0x20b mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
       sock_diag_has_destroy_listeners include/linux/sock_diag.h:75 [inline]
       __sk_free+0x329/0x340 net/core/sock.c:1609
       sk_free+0x42/0x50 net/core/sock.c:1623
       sock_put include/net/sock.h:1664 [inline]
       reqsk_free include/net/request_sock.h:116 [inline]
       reqsk_put include/net/request_sock.h:124 [inline]
       inet_csk_reqsk_queue_drop_and_put net/ipv4/inet_connection_sock.c:672 [inline]
       reqsk_timer_handler+0xe27/0x10e0 net/ipv4/inet_connection_sock.c:739
       call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
       expire_timers kernel/time/timer.c:1363 [inline]
       __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
       run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
       __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
       invoke_softirq kernel/softirq.c:365 [inline]
       irq_exit+0x1d1/0x200 kernel/softirq.c:405
       exiting_irq arch/x86/include/asm/apic.h:525 [inline]
       smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
       apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
       </IRQ>
      RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:54
      RSP: 0018:ffff8801d9ae7c38 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
      RAX: dffffc0000000000 RBX: 1ffff1003b35cf8a RCX: 0000000000000000
      RDX: 1ffffffff11a30d0 RSI: 0000000000000001 RDI: ffffffff88d18680
      RBP: ffff8801d9ae7c38 R08: ffffed003b5e46c3 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
      R13: ffff8801d9ae7cf0 R14: ffffffff897bef20 R15: 0000000000000000
       arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
       default_idle+0xc2/0x440 arch/x86/kernel/process.c:354
       arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:345
       default_idle_call+0x6d/0x90 kernel/sched/idle.c:93
       cpuidle_idle_call kernel/sched/idle.c:153 [inline]
       do_idle+0x395/0x560 kernel/sched/idle.c:262
       cpu_startup_entry+0x104/0x120 kernel/sched/idle.c:368
       start_secondary+0x426/0x5b0 arch/x86/kernel/smpboot.c:269
       secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:242
      
      Allocated by task 4557:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
       kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
       kmem_cache_zalloc include/linux/slab.h:691 [inline]
       net_alloc net/core/net_namespace.c:383 [inline]
       copy_net_ns+0x159/0x4c0 net/core/net_namespace.c:423
       create_new_namespaces+0x69d/0x8f0 kernel/nsproxy.c:107
       unshare_nsproxy_namespaces+0xc3/0x1f0 kernel/nsproxy.c:206
       ksys_unshare+0x708/0xf90 kernel/fork.c:2408
       __do_sys_unshare kernel/fork.c:2476 [inline]
       __se_sys_unshare kernel/fork.c:2474 [inline]
       __x64_sys_unshare+0x31/0x40 kernel/fork.c:2474
       do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 69:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
       net_free net/core/net_namespace.c:399 [inline]
       net_drop_ns.part.14+0x11a/0x130 net/core/net_namespace.c:406
       net_drop_ns net/core/net_namespace.c:405 [inline]
       cleanup_net+0x6a1/0xb20 net/core/net_namespace.c:541
       process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
       worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
       kthread+0x345/0x410 kernel/kthread.c:240
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
      
      The buggy address belongs to the object at ffff88018a02c140
       which belongs to the cache net_namespace of size 8832
      The buggy address is located 8800 bytes inside of
       8832-byte region [ffff88018a02c140, ffff88018a02e3c0)
      The buggy address belongs to the page:
      page:ffffea0006280b00 count:1 mapcount:0 mapping:ffff88018a02c140 index:0x0 compound_mapcount: 0
      flags: 0x2fffc0000008100(slab|head)
      raw: 02fffc0000008100 ffff88018a02c140 0000000000000000 0000000100000001
      raw: ffffea00062a1320 ffffea0006268020 ffff8801d9bdde40 0000000000000000
      page dumped because: kasan: bad access detected
      
      Fixes: b922622e ("sock_diag: don't broadcast kernel sockets")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Craig Gallek <kraig@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • packet: in packet_snd start writing at link layer allocation · 6190cce2
      Willem de Bruijn authored
      [ Upstream commit b84bbaf7 ]
      
      Packet sockets allow construction of packets shorter than
      dev->hard_header_len to accommodate protocols with variable length
      link layer headers. These packets are padded to dev->hard_header_len,
      because some device drivers interpret that as a minimum packet size.
      
      packet_snd reserves dev->hard_header_len bytes on allocation.
      SOCK_DGRAM sockets call skb_push in dev_hard_header() to ensure that
      link layer headers are stored in the reserved range. SOCK_RAW sockets
      do the same in tpacket_snd, but not in packet_snd.
      
      Syzbot was able to send a zero byte packet to a device with massive
      116B link layer header, causing padding to cross over into skb_shinfo.
      Fix this by writing from the start of the llheader reserved range also
      in the case of packet_snd/SOCK_RAW.
      
      Update skb_set_network_header to the new offset. This also corrects
      it for SOCK_DGRAM, where it incorrectly double counted reserve due to
      the skb_push in dev_hard_header.
      
      Fixes: 9ed988cd ("packet: validate variable length ll headers")
      Reported-by: syzbot+71d74a5406d02057d559@syzkaller.appspotmail.com
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: test tailroom before appending to linear skb · 2ef22bd0
      Willem de Bruijn authored
      [ Upstream commit 113f99c3 ]
      
      Device features may change during transmission. In particular with
      corking, a device may toggle scatter-gather in between allocating
      and writing to an skb.
      
      Do not unconditionally assume that !NETIF_F_SG at write time implies
      that the same held at alloc time and thus the skb has sufficient
      tailroom.
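
      The "check at write time" rule can be sketched with a small linear
      buffer. struct linbuf, linbuf_tailroom() and linbuf_append() are
      hypothetical and only loosely modelled on an skb's linear area; they
      are not the networking code touched by this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size linear buffer. */
struct linbuf {
    unsigned char data[64];
    size_t len;
};

/* Bytes still free after the data already stored. */
static size_t linbuf_tailroom(const struct linbuf *b)
{
    return sizeof(b->data) - b->len;
}

/*
 * Append n bytes only if they fit right now. Returns n on success and 0
 * when there is not enough tailroom, in which case the caller has to
 * take another path instead of assuming the space was reserved earlier.
 */
static size_t linbuf_append(struct linbuf *b, const void *src, size_t n)
{
    assert(b != NULL && b->len <= sizeof(b->data));
    if (n > linbuf_tailroom(b))
        return 0;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return n;
}
```

      The check is made against the buffer's actual state rather than
      against a condition that held when the buffer was allocated.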
      
      This issue predates git history.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/mlx4_core: Fix error handling in mlx4_init_port_info. · 97b7270c
      Tarick Bedeir authored
      [ Upstream commit 57f6f99f ]
      
      Avoid exiting the function with a lingering sysfs file (if the first
      call to device_create_file() fails while the second succeeds), and avoid
      calling devlink_port_unregister() twice.
      
      In other words, either mlx4_init_port_info() succeeds and returns zero, or
      it fails, returns non-zero, and requires no cleanup.
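
      The "succeeds, or fails with nothing left behind" contract is commonly
      written with goto-based unwinding. A minimal sketch, assuming a
      hypothetical struct port whose three flags stand in for the devlink
      registration and the two sysfs files; create_step() and port_init()
      are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state: each flag is 1 while the resource exists. */
struct port {
    int registered;
    int file_a;
    int file_b;
};

/* Creates one resource, or fails without side effects. */
static int create_step(int *slot, int should_fail)
{
    if (should_fail)
        return -1;
    *slot = 1;
    return 0;
}

/*
 * Either all three resources exist and 0 is returned, or a non-zero
 * error is returned and everything created so far has been undone.
 */
static int port_init(struct port *p, int fail_a, int fail_b)
{
    int err;

    assert(p != NULL);
    err = create_step(&p->registered, 0);
    if (err)
        return err;
    err = create_step(&p->file_a, fail_a);
    if (err)
        goto err_unregister;
    err = create_step(&p->file_b, fail_b);
    if (err)
        goto err_remove_a;
    return 0;

err_remove_a:
    p->file_a = 0;      /* undo the first file */
err_unregister:
    p->registered = 0;  /* undo the registration exactly once */
    return err;
}
```

      Each label undoes one step and falls through to the next, so every
      resource is released once regardless of where the failure happened.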
      
      Fixes: 096335b3 ("mlx4_core: Allow dynamic MTU configuration for IB ports")
      Signed-off-by: Tarick Bedeir <tarick@google.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 22 May, 2018 17 commits