1. 30 May, 2018 15 commits
  2. 26 May, 2018 25 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.4.133 · 7620164e
      Greg Kroah-Hartman authored
      7620164e
    • Tetsuo Handa's avatar
      x86/kexec: Avoid double free_page() upon do_kexec_load() failure · eef045e7
      Tetsuo Handa authored
      commit a466ef76 upstream.
      
      >From ff82bedd3e12f0d3353282054ae48c3bd8c72012 Mon Sep 17 00:00:00 2001
      From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Date: Wed, 9 May 2018 12:12:39 +0900
      Subject: x86/kexec: Avoid double free_page() upon do_kexec_load() failure
      
      syzbot is reporting crashes after memory allocation failure inside
      do_kexec_load() [1]. This is because free_transition_pgtable() is called
      by both init_transition_pgtable() and machine_kexec_cleanup() when memory
      allocation failed inside init_transition_pgtable().
      
      Regarding 32bit code, machine_kexec_free_page_tables() is called by both
      machine_kexec_alloc_page_tables() and machine_kexec_cleanup() when memory
      allocation failed inside machine_kexec_alloc_page_tables().
      
      Fix this by leaving the error handling to machine_kexec_cleanup()
      (and optionally setting NULL after free_page()).
      
      [1] https://syzkaller.appspot.com/bug?id=91e52396168cf2bdd572fe1e1bc0bc645c1c6b40
      
      Fixes: f5deb796 ("x86: kexec: Use one page table in x86_64 machine_kexec")
      Fixes: 92be3d6b ("kexec/i386: allocate page table pages dynamically")
      Reported-by: default avatarsyzbot <syzbot+d96f60296ef613fe1d69@syzkaller.appspotmail.com>
      Signed-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarBaoquan He <bhe@redhat.com>
      Cc: thomas.lendacky@amd.com
      Cc: prudo@linux.vnet.ibm.com
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: syzkaller-bugs@googlegroups.com
      Cc: takahiro.akashi@linaro.org
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: akpm@linux-foundation.org
      Cc: dyoung@redhat.com
      Cc: kirill.shutemov@linux.intel.com
      Link: https://lkml.kernel.org/r/201805091942.DGG12448.tMFVFSJFQOOLHO@I-love.SAKURA.ne.jpSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eef045e7
    • Tetsuo Handa's avatar
      hfsplus: stop workqueue when fill_super() failed · 338762ca
      Tetsuo Handa authored
      commit 66072c29 upstream.
      
      syzbot is reporting ODEBUG messages at hfsplus_fill_super() [1].  This
      is because hfsplus_fill_super() forgot to call cancel_delayed_work_sync().
      
      As far as I can see, it is hfsplus_mark_mdb_dirty() from
      hfsplus_new_inode() in hfsplus_fill_super() that calls
      queue_delayed_work().  Therefore, I assume that hfsplus_new_inode() does
      not fail if queue_delayed_work() was called, and the out_put_hidden_dir
      label is the appropriate location to call cancel_delayed_work_sync().
      
      [1] https://syzkaller.appspot.com/bug?id=a66f45e96fdbeb76b796bf46eb25ea878c42a6c9
      
      Link: http://lkml.kernel.org/r/964a8b27-cd69-357c-fe78-76b066056201@I-love.SAKURA.ne.jpSigned-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: default avatarsyzbot <syzbot+4f2e5f086147d543ab03@syzkaller.appspotmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com>
      Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      338762ca
    • Johannes Berg's avatar
      cfg80211: limit wiphy names to 128 bytes · 87c807f1
      Johannes Berg authored
      commit a7cfebcb upstream.
      
      There's currently no limit on wiphy names, other than netlink
      message size and memory limitations, but that causes issues when,
      for example, the wiphy name is used in a uevent, e.g. in rfkill
      where we use the same name for the rfkill instance, and then the
      buffer there is "only" 2k for the environment variables.
      
      This was reported by syzkaller, which used a 4k name.
      
      Limit the name to something reasonable, I randomly picked 128.
      
      Reported-by: syzbot+230d9e642a85d3fec29c@syzkaller.appspotmail.com
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87c807f1
    • Geert Uytterhoeven's avatar
      gpio: rcar: Add Runtime PM handling for interrupts · c610b0cb
      Geert Uytterhoeven authored
      commit b26a719b upstream.
      
      The R-Car GPIO driver handles Runtime PM for requested GPIOs only.
      
      When using a GPIO purely as an interrupt source, no Runtime PM handling
      is done, and the GPIO module's clock may not be enabled.
      
      To fix this:
        - Add .irq_request_resources() and .irq_release_resources() callbacks
          to handle Runtime PM when an interrupt is requested,
        - Add irq_bus_lock() and sync_unlock() callbacks to handle Runtime PM
          when e.g. disabling/enabling an interrupt, or configuring the
          interrupt type.
      
      Fixes: d5c3d846 "net: phy: Avoid polling PHY with PHY_IGNORE_INTERRUPTS"
      Signed-off-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      [fabrizio: cherry-pick to v4.4.y. Use container_of instead of
      gpiochip_get_data.]
      Signed-off-by: default avatarFabrizio Castro <fabrizio.castro@bp.renesas.com>
      Reviewed-by: default avatarBiju Das <biju.das@bp.renesas.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c610b0cb
    • John Stultz's avatar
      time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting · 09f7ebaa
      John Stultz authored
      commit 3d88d56c upstream.
      
      Due to how the MONOTONIC_RAW accumulation logic was handled,
      there is the potential for a 1ns discontinuity when we do
      accumulations. This small discontinuity has for the most part
      gone un-noticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW
      in their vDSO clock_gettime implementation, we've seen failures
      with the inconsistency-check test in kselftest.
      
      This patch addresses the issue by using the same sub-ns
      accumulation handling that CLOCK_MONOTONIC uses, which avoids
      the issue for in-kernel users.
      
      Since the ARM64 vDSO implementation has its own clock_gettime
      calculation logic, this patch reduces the frequency of errors,
      but failures are still seen. The ARM64 vDSO will need to be
      updated to include the sub-nanosecond xtime_nsec values in its
      calculation for this issue to be completely fixed.
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Tested-by: default avatarDaniel Mentz <danielmentz@google.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Stephen Boyd <stephen.boyd@linaro.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: "stable #4 . 8+" <stable@vger.kernel.org>
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Link: http://lkml.kernel.org/r/1496965462-20003-3-git-send-email-john.stultz@linaro.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      [fabrizio: cherry-pick to 4.4. Kept cycle_t type for function
      logarithmic_accumulation local variable "interval". Dropped
      casting of "interval" variable]
      Signed-off-by: default avatarFabrizio Castro <fabrizio.castro@bp.renesas.com>
      Signed-off-by: default avatarBiju Das <biju.das@bp.renesas.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      09f7ebaa
    • Vinod Koul's avatar
      dmaengine: ensure dmaengine helpers check valid callback · 92cffdc9
      Vinod Koul authored
      commit 757d12e5 upstream.
      
      dmaengine has various device callbacks and exposes helper
      functions to invoke these. These helpers should check if channel,
      device and callback is valid or not before invoking them.
      Reported-by: default avatarJon Hunter <jonathanh@nvidia.com>
      Signed-off-by: default avatarVinod Koul <vinod.koul@intel.com>
      Signed-off-by: default avatarFabrizio Castro <fabrizio.castro@bp.renesas.com>
      Signed-off-by: default avatarJianming Qiao <jianming.qiao@bp.renesas.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      92cffdc9
    • Jens Remus's avatar
      scsi: zfcp: fix infinite iteration on ERP ready list · 36797134
      Jens Remus authored
      commit fa89adba upstream.
      
      zfcp_erp_adapter_reopen() schedules blocking of all of the adapter's
      rports via zfcp_scsi_schedule_rports_block() and enqueues a reopen
      adapter ERP action via zfcp_erp_action_enqueue(). Both are separately
      processed asynchronously and concurrently.
      
      Blocking of rports is done in a kworker by zfcp_scsi_rport_work(). It
      calls zfcp_scsi_rport_block(), which then traces a DBF REC "scpdely" via
      zfcp_dbf_rec_trig().  zfcp_dbf_rec_trig() acquires the DBF REC spin lock
      and then iterates with list_for_each() over the adapter's ERP ready list
      without holding the ERP lock. This opens a race window in which the
      current list entry can be moved to another list, causing list_for_each()
      to iterate forever on the wrong list, as the erp_ready_head is never
      encountered as terminal condition.
      
      Meanwhile the ERP action can be processed in the ERP thread by
      zfcp_erp_thread(). It calls zfcp_erp_strategy(), which acquires the ERP
      lock and then calls zfcp_erp_action_to_running() to move the ERP action
      from the ready to the running list.  zfcp_erp_action_to_running() can
      move the ERP action using list_move() just during the aforementioned
      race window. It then traces a REC RUN "erator1" via zfcp_dbf_rec_run().
      zfcp_dbf_rec_run() tries to acquire the DBF REC spin lock. If this is
      held by the infinitely looping kworker, it effectively spins forever.
      
      Example Sequence Diagram:
      
      Process                ERP Thread             rport_work
      -------------------    -------------------    -------------------
      zfcp_erp_adapter_reopen()
      zfcp_erp_adapter_block()
      zfcp_scsi_schedule_rports_block()
      lock ERP                                      zfcp_scsi_rport_work()
      zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_ADAPTER)
      list_add_tail() on ready                      !(rport_task==RPORT_ADD)
      wake_up() ERP thread                          zfcp_scsi_rport_block()
      zfcp_dbf_rec_trig()    zfcp_erp_strategy()    zfcp_dbf_rec_trig()
      unlock ERP                                    lock DBF REC
      zfcp_erp_wait()        lock ERP
      |                      zfcp_erp_action_to_running()
      |                                             list_for_each() ready
      |                      list_move()              current entry
      |                        ready to running
      |                      zfcp_dbf_rec_run()       endless loop over running
      |                      zfcp_dbf_rec_run_lvl()
      |                      lock DBF REC spins forever
      
      Any adapter recovery can trigger this, such as setting the device offline
      or reboot.
      
      V4.9 commit 4eeaa4f3 ("zfcp: close window with unblocked rport
      during rport gone") introduced additional tracing of (un)blocking of
      rports. It missed that the adapter->erp_lock must be held when calling
      zfcp_dbf_rec_trig().
      
      This fix uses the approach formerly introduced by commit aa0fec62
      ("[SCSI] zfcp: Fix sparse warning by providing new entry in dbf") that got
      later removed by commit ae0904f6 ("[SCSI] zfcp: Redesign of the debug
      tracing for recovery actions.").
      
      Introduce zfcp_dbf_rec_trig_lock(), a wrapper for zfcp_dbf_rec_trig() that
      acquires and releases the adapter->erp_lock for read.
      Reported-by: default avatarSebastian Ott <sebott@linux.ibm.com>
      Signed-off-by: default avatarJens Remus <jremus@linux.ibm.com>
      Fixes: 4eeaa4f3 ("zfcp: close window with unblocked rport during rport gone")
      Cc: <stable@vger.kernel.org> # 2.6.32+
      Reviewed-by: default avatarBenjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: default avatarSteffen Maier <maier@linux.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      36797134
    • Alexander Potapenko's avatar
      scsi: sg: allocate with __GFP_ZERO in sg_build_indirect() · 93314640
      Alexander Potapenko authored
      commit a45b599a upstream.
      
      This shall help avoid copying uninitialized memory to the userspace when
      calling ioctl(fd, SG_IO) with an empty command.
      
      Reported-by: syzbot+7d26fc1eea198488deab@syzkaller.appspotmail.com
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAlexander Potapenko <glider@google.com>
      Acked-by: default avatarDouglas Gilbert <dgilbert@interlog.com>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      93314640
    • Jason Yan's avatar
      scsi: libsas: defer ata device eh commands to libata · 6efcc74e
      Jason Yan authored
      commit 318aaf34 upstream.
      
      When ata device doing EH, some commands still attached with tasks are
      not passed to libata when abort failed or recover failed, so libata did
      not handle these commands. After these commands done, sas task is freed,
      but ata qc is not freed. This will cause ata qc leak and trigger a
      warning like below:
      
      WARNING: CPU: 0 PID: 28512 at drivers/ata/libata-eh.c:4037
      ata_eh_finish+0xb4/0xcc
      CPU: 0 PID: 28512 Comm: kworker/u32:2 Tainted: G     W  OE 4.14.0#1
      ......
      Call trace:
      [<ffff0000088b7bd0>] ata_eh_finish+0xb4/0xcc
      [<ffff0000088b8420>] ata_do_eh+0xc4/0xd8
      [<ffff0000088b8478>] ata_std_error_handler+0x44/0x8c
      [<ffff0000088b8068>] ata_scsi_port_error_handler+0x480/0x694
      [<ffff000008875fc4>] async_sas_ata_eh+0x4c/0x80
      [<ffff0000080f6be8>] async_run_entry_fn+0x4c/0x170
      [<ffff0000080ebd70>] process_one_work+0x144/0x390
      [<ffff0000080ec100>] worker_thread+0x144/0x418
      [<ffff0000080f2c98>] kthread+0x10c/0x138
      [<ffff0000080855dc>] ret_from_fork+0x10/0x18
      
      If ata qc leaked too many, ata tag allocation will fail and io blocked
      for ever.
      
      As suggested by Dan Williams, defer ata device commands to libata and
      merge sas_eh_finish_cmd() with sas_eh_defer_cmd(). libata will handle
      ata qcs correctly after this.
      Signed-off-by: default avatarJason Yan <yanaijie@huawei.com>
      CC: Xiaofei Tan <tanxiaofei@huawei.com>
      CC: John Garry <john.garry@huawei.com>
      CC: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6efcc74e
    • Martin Schwidefsky's avatar
      s390: use expoline thunks in the BPF JIT · 8c6d306f
      Martin Schwidefsky authored
      [ Upstream commit de5cb6eb ]
      
      The BPF JIT need safe guarding against spectre v2 in the sk_load_xxx
      assembler stubs and the indirect branches generated by the JIT itself
      need to be converted to expolines.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8c6d306f
    • Martin Schwidefsky's avatar
      s390: extend expoline to BC instructions · f436cb96
      Martin Schwidefsky authored
      [ Upstream commit 6deaa3bb ]
      
      The BPF JIT uses a 'b <disp>(%r<x>)' instruction in the definition
      of the sk_load_word and sk_load_half functions.
      
      Add support for branch-on-condition instructions contained in the
      thunk code of an expoline.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f436cb96
    • Martin Schwidefsky's avatar
      s390: move spectre sysfs attribute code · c617e74f
      Martin Schwidefsky authored
      [ Upstream commit 4253b0e0 ]
      
      The nospec-branch.c file is compiled without the gcc options to
      generate expoline thunks. The return branch of the sysfs show
      functions cpu_show_spectre_v1 and cpu_show_spectre_v2 is an indirect
      branch as well. These need to be compiled with expolines.
      
      Move the sysfs functions for spectre reporting to a separate file
      and loose an '.' for one of the messages.
      
      Cc: stable@vger.kernel.org # 4.16
      Fixes: d424986f ("s390: add sysfs attributes for spectre")
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c617e74f
    • Martin Schwidefsky's avatar
      s390/kernel: use expoline for indirect branches · 90305465
      Martin Schwidefsky authored
      [ Upstream commit c50c84c3 ]
      
      The assember code in arch/s390/kernel uses a few more indirect branches
      which need to be done with execute trampolines for CONFIG_EXPOLINE=y.
      
      Cc: stable@vger.kernel.org # 4.16
      Fixes: f19fbd5e ("s390: introduce execute-trampolines for branches")
      Reviewed-by: default avatarHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      90305465
    • Martin Schwidefsky's avatar
      s390/lib: use expoline for indirect branches · 5ce9dc0f
      Martin Schwidefsky authored
      [ Upstream commit 97489e06 ]
      
      The return from the memmove, memset, memcpy, __memset16, __memset32 and
      __memset64 functions are done with "br %r14". These are indirect branches
      as well and need to use execute trampolines for CONFIG_EXPOLINE=y.
      
      Cc: stable@vger.kernel.org # 4.16
      Fixes: f19fbd5e ("s390: introduce execute-trampolines for branches")
      Reviewed-by: default avatarHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5ce9dc0f
    • Martin Schwidefsky's avatar
      s390: move expoline assembler macros to a header · 73bf2b1c
      Martin Schwidefsky authored
      [ Upstream commit 6dd85fbb ]
      
      To be able to use the expoline branches in different assembler
      files move the associated macros from entry.S to a new header
      nospec-insn.h.
      
      While we are at it make the macros a bit nicer to use.
      
      Cc: stable@vger.kernel.org # 4.16
      Fixes: f19fbd5e ("s390: introduce execute-trampolines for branches")
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      73bf2b1c
    • Martin Schwidefsky's avatar
      s390: add assembler macros for CPU alternatives · 2685d33c
      Martin Schwidefsky authored
      [ Upstream commit fba9eb79 ]
      
      Add a header with macros usable in assembler files to emit alternative
      code sequences. It works analog to the alternatives for inline assmeblies
      in C files, with the same restrictions and capabilities.
      The syntax is
      
           ALTERNATIVE "<default instructions sequence>", \
      		 "<alternative instructions sequence>", \
      		 "<features-bit>"
      and
      
           ALTERNATIVE_2 "<default instructions sequence>", \
      		   "<alternative instructions sqeuence #1>", \
      		   "<feature-bit #1>",
      		   "<alternative instructions sqeuence #2>", \
      		   "<feature-bit #2>"
      Reviewed-by: default avatarVasily Gorbik <gor@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2685d33c
    • Al Viro's avatar
      ext2: fix a block leak · 3cd868dc
      Al Viro authored
      commit 5aa1437d upstream.
      
      open file, unlink it, then use ioctl(2) to make it immutable or
      append only.  Now close it and watch the blocks *not* freed...
      
      Immutable/append-only checks belong in ->setattr().
      Note: the bug is old and backport to anything prior to 737f2e93
      ("ext2: convert to use the new truncate convention") will need
      these checks lifted into ext2_setattr().
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3cd868dc
    • Eric Dumazet's avatar
      tcp: purge write queue in tcp_connect_init() · 5bbe138a
      Eric Dumazet authored
      [ Upstream commit 7f582b24 ]
      
      syzkaller found a reliable way to crash the host, hitting a BUG()
      in __tcp_retransmit_skb()
      
      Malicous MSG_FASTOPEN is the root cause. We need to purge write queue
      in tcp_connect_init() at the point we init snd_una/write_seq.
      
      This patch also replaces the BUG() by a less intrusive WARN_ON_ONCE()
      
      kernel BUG at net/ipv4/tcp_output.c:2837!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 5276 Comm: syz-executor0 Not tainted 4.17.0-rc3+ #51
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__tcp_retransmit_skb+0x2992/0x2eb0 net/ipv4/tcp_output.c:2837
      RSP: 0000:ffff8801dae06ff8 EFLAGS: 00010206
      RAX: ffff8801b9fe61c0 RBX: 00000000ffc18a16 RCX: ffffffff864e1a49
      RDX: 0000000000000100 RSI: ffffffff864e2e12 RDI: 0000000000000005
      RBP: ffff8801dae073a0 R08: ffff8801b9fe61c0 R09: ffffed0039c40dd2
      R10: ffffed0039c40dd2 R11: ffff8801ce206e93 R12: 00000000421eeaad
      R13: ffff8801ce206d4e R14: ffff8801ce206cc0 R15: ffff8801cd4f4a80
      FS:  0000000000000000(0000) GS:ffff8801dae00000(0063) knlGS:00000000096bc900
      CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
      CR2: 0000000020000000 CR3: 00000001c47b6000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <IRQ>
       tcp_retransmit_skb+0x2e/0x250 net/ipv4/tcp_output.c:2923
       tcp_retransmit_timer+0xc50/0x3060 net/ipv4/tcp_timer.c:488
       tcp_write_timer_handler+0x339/0x960 net/ipv4/tcp_timer.c:573
       tcp_write_timer+0x111/0x1d0 net/ipv4/tcp_timer.c:593
       call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
       expire_timers kernel/time/timer.c:1363 [inline]
       __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
       run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
       __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
       invoke_softirq kernel/softirq.c:365 [inline]
       irq_exit+0x1d1/0x200 kernel/softirq.c:405
       exiting_irq arch/x86/include/asm/apic.h:525 [inline]
       smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
       apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
      
      Fixes: cf60af03 ("net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5bbe138a
    • Eric Dumazet's avatar
      sock_diag: fix use-after-free read in __sk_free · 8e299f7a
      Eric Dumazet authored
      [ Upstream commit 9709020c ]
      
      We must not call sock_diag_has_destroy_listeners(sk) on a socket
      that has no reference on net structure.
      
      BUG: KASAN: use-after-free in sock_diag_has_destroy_listeners include/linux/sock_diag.h:75 [inline]
      BUG: KASAN: use-after-free in __sk_free+0x329/0x340 net/core/sock.c:1609
      Read of size 8 at addr ffff88018a02e3a0 by task swapper/1/0
      
      CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.17.0-rc5+ #54
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1b9/0x294 lib/dump_stack.c:113
       print_address_description+0x6c/0x20b mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
       sock_diag_has_destroy_listeners include/linux/sock_diag.h:75 [inline]
       __sk_free+0x329/0x340 net/core/sock.c:1609
       sk_free+0x42/0x50 net/core/sock.c:1623
       sock_put include/net/sock.h:1664 [inline]
       reqsk_free include/net/request_sock.h:116 [inline]
       reqsk_put include/net/request_sock.h:124 [inline]
       inet_csk_reqsk_queue_drop_and_put net/ipv4/inet_connection_sock.c:672 [inline]
       reqsk_timer_handler+0xe27/0x10e0 net/ipv4/inet_connection_sock.c:739
       call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
       expire_timers kernel/time/timer.c:1363 [inline]
       __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
       run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
       __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
       invoke_softirq kernel/softirq.c:365 [inline]
       irq_exit+0x1d1/0x200 kernel/softirq.c:405
       exiting_irq arch/x86/include/asm/apic.h:525 [inline]
       smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
       apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
       </IRQ>
      RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:54
      RSP: 0018:ffff8801d9ae7c38 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
      RAX: dffffc0000000000 RBX: 1ffff1003b35cf8a RCX: 0000000000000000
      RDX: 1ffffffff11a30d0 RSI: 0000000000000001 RDI: ffffffff88d18680
      RBP: ffff8801d9ae7c38 R08: ffffed003b5e46c3 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
      R13: ffff8801d9ae7cf0 R14: ffffffff897bef20 R15: 0000000000000000
       arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
       default_idle+0xc2/0x440 arch/x86/kernel/process.c:354
       arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:345
       default_idle_call+0x6d/0x90 kernel/sched/idle.c:93
       cpuidle_idle_call kernel/sched/idle.c:153 [inline]
       do_idle+0x395/0x560 kernel/sched/idle.c:262
       cpu_startup_entry+0x104/0x120 kernel/sched/idle.c:368
       start_secondary+0x426/0x5b0 arch/x86/kernel/smpboot.c:269
       secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:242
      
      Allocated by task 4557:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
       kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
       kmem_cache_zalloc include/linux/slab.h:691 [inline]
       net_alloc net/core/net_namespace.c:383 [inline]
       copy_net_ns+0x159/0x4c0 net/core/net_namespace.c:423
       create_new_namespaces+0x69d/0x8f0 kernel/nsproxy.c:107
       unshare_nsproxy_namespaces+0xc3/0x1f0 kernel/nsproxy.c:206
       ksys_unshare+0x708/0xf90 kernel/fork.c:2408
       __do_sys_unshare kernel/fork.c:2476 [inline]
       __se_sys_unshare kernel/fork.c:2474 [inline]
       __x64_sys_unshare+0x31/0x40 kernel/fork.c:2474
       do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 69:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
       net_free net/core/net_namespace.c:399 [inline]
       net_drop_ns.part.14+0x11a/0x130 net/core/net_namespace.c:406
       net_drop_ns net/core/net_namespace.c:405 [inline]
       cleanup_net+0x6a1/0xb20 net/core/net_namespace.c:541
       process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
       worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
       kthread+0x345/0x410 kernel/kthread.c:240
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
      
      The buggy address belongs to the object at ffff88018a02c140
       which belongs to the cache net_namespace of size 8832
      The buggy address is located 8800 bytes inside of
       8832-byte region [ffff88018a02c140, ffff88018a02e3c0)
      The buggy address belongs to the page:
      page:ffffea0006280b00 count:1 mapcount:0 mapping:ffff88018a02c140 index:0x0 compound_mapcount: 0
      flags: 0x2fffc0000008100(slab|head)
      raw: 02fffc0000008100 ffff88018a02c140 0000000000000000 0000000100000001
      raw: ffffea00062a1320 ffffea0006268020 ffff8801d9bdde40 0000000000000000
      page dumped because: kasan: bad access detected
      
      Fixes: b922622e ("sock_diag: don't broadcast kernel sockets")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Craig Gallek <kraig@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8e299f7a
    • Willem de Bruijn's avatar
      packet: in packet_snd start writing at link layer allocation · d9fb8cc2
      Willem de Bruijn authored
      [ Upstream commit b84bbaf7 ]
      
      Packet sockets allow construction of packets shorter than
      dev->hard_header_len to accommodate protocols with variable length
      link layer headers. These packets are padded to dev->hard_header_len,
      because some device drivers interpret that as a minimum packet size.
      
      packet_snd reserves dev->hard_header_len bytes on allocation.
      SOCK_DGRAM sockets call skb_push in dev_hard_header() to ensure that
      link layer headers are stored in the reserved range. SOCK_RAW sockets
      do the same in tpacket_snd, but not in packet_snd.
      
      Syzbot was able to send a zero byte packet to a device with massive
      116B link layer header, causing padding to cross over into skb_shinfo.
      Fix this by writing from the start of the llheader reserved range also
      in the case of packet_snd/SOCK_RAW.
      
      Update skb_set_network_header to the new offset. This also corrects
      it for SOCK_DGRAM, where it incorrectly double counted reserve due to
      the skb_push in dev_hard_header.
      
      Fixes: 9ed988cd ("packet: validate variable length ll headers")
      Reported-by: syzbot+71d74a5406d02057d559@syzkaller.appspotmail.com
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d9fb8cc2
    • Willem de Bruijn's avatar
      net: test tailroom before appending to linear skb · 671cf50f
      Willem de Bruijn authored
      [ Upstream commit 113f99c3 ]
      
      Device features may change during transmission. In particular with
      corking, a device may toggle scatter-gather in between allocating
      and writing to an skb.
      
      Do not unconditionally assume that !NETIF_F_SG at write time implies
      that the same held at alloc time and thus the skb has sufficient
      tailroom.
      
      This issue predates git history.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      671cf50f
    • Liu Bo's avatar
      btrfs: fix reading stale metadata blocks after degraded raid1 mounts · e2da30c5
      Liu Bo authored
      commit 02a3307a upstream.
      
      If a btree block, aka. extent buffer, is not available in the extent
      buffer cache, it'll be read out from the disk instead, i.e.
      
      btrfs_search_slot()
        read_block_for_search()  # hold parent and its lock, go to read child
          btrfs_release_path()
          read_tree_block()  # read child
      
      Unfortunately, the parent lock got released before reading child, so
      commit 5bdd3536 ("Btrfs: Fix block generation verification race") had
      used 0 as parent transid to read the child block.  It forces
      read_tree_block() not to check if parent transid is different with the
      generation id of the child that it reads out from disk.
      
      A simple PoC is included in btrfs/124,
      
      0. A two-disk raid1 btrfs,
      
      1. Right after mkfs.btrfs, block A is allocated to be device tree's root.
      
      2. Mount this filesystem and put it in use, after a while, device tree's
         root got COW but block A hasn't been allocated/overwritten yet.
      
      3. Umount it and reload the btrfs module to remove both disks from the
         global @fs_devices list.
      
      4. mount -odegraded dev1 and write some data, so now block A is allocated
         to be a leaf in checksum tree.  Note that only dev1 has the latest
         metadata of this filesystem.
      
      5. Umount it and mount it again normally (with both disks), since raid1
         can pick up one disk by the writer task's pid, if btrfs_search_slot()
         needs to read block A, dev2 which does NOT have the latest metadata
         might be read for block A, then we got a stale block A.
      
      6. As parent transid is not checked, block A is marked as uptodate and
         put into the extent buffer cache, so the future search won't bother
         to read disk again, which means it'll make changes on this stale
         one and make it dirty and flush it onto disk.
      
      To avoid the problem, parent transid needs to be passed to
      read_tree_block().
      
      In order to get a valid parent transid, we need to hold the parent's
      lock until finishing reading child.
      
      This patch needs to be slightly adapted for stable kernels, the
      &first_key parameter added to read_tree_block() is from 4.16+
      (581c1760). The fix is to replace 0 by 'gen'.
      
      Fixes: 5bdd3536 ("Btrfs: Fix block generation verification race")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: default avatarLiu Bo <bo.liu@linux.alibaba.com>
      Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarQu Wenruo <wqu@suse.com>
      [ update changelog ]
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e2da30c5
    • Anand Jain's avatar
      btrfs: fix crash when trying to resume balance without the resume flag · 68dea4bd
      Anand Jain authored
      commit 02ee654d upstream.
      
      We set the BTRFS_BALANCE_RESUME flag in the btrfs_recover_balance()
      only, which isn't called during the remount. So when resuming from
      the paused balance we hit the bug:
      
       kernel: kernel BUG at fs/btrfs/volumes.c:3890!
       ::
       kernel:  balance_kthread+0x51/0x60 [btrfs]
       kernel:  kthread+0x111/0x130
       ::
       kernel: RIP: btrfs_balance+0x12e1/0x1570 [btrfs] RSP: ffffba7d0090bde8
      
      Reproducer:
        On a mounted filesystem:
      
        btrfs balance start --full-balance /btrfs
        btrfs balance pause /btrfs
        mount -o remount,ro /dev/sdb /btrfs
        mount -o remount,rw /dev/sdb /btrfs
      
      To fix this set the BTRFS_BALANCE_RESUME flag in
      btrfs_resume_balance_async().
      
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: default avatarAnand Jain <anand.jain@oracle.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      68dea4bd
    • Filipe Manana's avatar
      Btrfs: fix xattr loss after power failure · 72d1df8a
      Filipe Manana authored
      commit 9a8fca62 upstream.
      
      If a file has xattrs, we fsync it, to ensure we clear the flags
      BTRFS_INODE_NEEDS_FULL_SYNC and BTRFS_INODE_COPY_EVERYTHING from its
      inode, the current transaction commits and then we fsync it (without
      either of those bits being set in its inode), we end up not logging
      all its xattrs. This results in deleting all xattrs when replying the
      log after a power failure.
      
      Trivial reproducer
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
      
        $ touch /mnt/foobar
        $ setfattr -n user.xa -v qwerty /mnt/foobar
        $ xfs_io -c "fsync" /mnt/foobar
      
        $ sync
      
        $ xfs_io -c "pwrite -S 0xab 0 64K" /mnt/foobar
        $ xfs_io -c "fsync" /mnt/foobar
        <power failure>
      
        $ mount /dev/sdb /mnt
        $ getfattr --absolute-names --dump /mnt/foobar
        <empty output>
        $
      
      So fix this by making sure all xattrs are logged if we log a file's inode
      item and neither the flags BTRFS_INODE_NEEDS_FULL_SYNC nor
      BTRFS_INODE_COPY_EVERYTHING were set in the inode.
      
      Fixes: 36283bf7 ("Btrfs: fix fsync xattr loss in the fast fsync path")
      Cc: <stable@vger.kernel.org> # 4.2+
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72d1df8a