1. 11 Oct, 2019 6 commits
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Check for MMU ready on piggybacked virtual cores · 4faa7f05
      Paul Mackerras authored
      commit d28eafc5 upstream.
      
      When we are running multiple vcores on the same physical core, they
      could be from different VMs and so it is possible that one of the
      VMs could have its arch.mmu_ready flag cleared (for example by a
      concurrent HPT resize) when we go to run it on a physical core.
      We currently check the arch.mmu_ready flag for the primary vcore
      but not the flags for the other vcores that will be run alongside
      it.  This adds that check, and also a check when we select the
      secondary vcores from the preempted vcores list.
      
      Cc: stable@vger.kernel.org # v4.14+
      Fixes: 38c53af8 ("KVM: PPC: Book3S HV: Fix exclusion between HPT resizing and other HPT updates")
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4faa7f05
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Fix race in re-enabling XIVE escalation interrupts · 577a5119
      Paul Mackerras authored
      commit 959c5d51 upstream.
      
      Escalation interrupts are interrupts sent to the host by the XIVE
      hardware when it has an interrupt to deliver to a guest VCPU but that
      VCPU is not running anywhere in the system.  Hence we disable the
      escalation interrupt for the VCPU being run when we enter the guest
      and re-enable it when the guest does an H_CEDE hypercall indicating
      it is idle.
      
      It is possible that an escalation interrupt gets generated just as we
      are entering the guest.  In that case the escalation interrupt may be
      using a queue entry in one of the interrupt queues, and that queue
      entry may not have been processed when the guest exits with an H_CEDE.
      The existing entry code detects this situation and does not clear the
      vcpu->arch.xive_esc_on flag as an indication that there is a pending
      queue entry (if the queue entry gets processed, xive_esc_irq() will
      clear the flag).  There is a comment in the code saying that if the
      flag is still set on H_CEDE, we have to abort the cede rather than
      re-enabling the escalation interrupt, lest we end up with two
      occurrences of the escalation interrupt in the interrupt queue.
      
      However, the exit code doesn't do that; it aborts the cede in the sense
      that vcpu->arch.ceded gets cleared, but it still enables the escalation
      interrupt by setting the source's PQ bits to 00.  Instead we need to
      set the PQ bits to 10, indicating that an interrupt has been triggered.
      We also need to avoid setting vcpu->arch.xive_esc_on in this case
      (i.e. vcpu->arch.xive_esc_on seen to be set on H_CEDE) because
      xive_esc_irq() will run at some point and clear it, and if we race with
      that we may end up with an incorrect result (i.e. xive_esc_on set when
      the escalation interrupt has just been handled).
      
      It is extremely unlikely that having two queue entries would cause
      observable problems; theoretically it could cause queue overflow, but
      the CPU would have to have thousands of interrupts targetted to it for
      that to be possible.  However, this fix will also make it possible to
      determine accurately whether there is an unhandled escalation
      interrupt in the queue, which will be needed by the following patch.
      
      Fixes: 9b9b13a6 ("KVM: PPC: Book3S HV: Keep XIVE escalation interrupt masked unless ceded")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190813100349.GD9567@blackberrySigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      577a5119
    • Vasily Gorbik's avatar
      s390/cio: exclude subchannels with no parent from pseudo check · 46cb14a5
      Vasily Gorbik authored
      commit ab575884 upstream.
      
      ccw console is created early in start_kernel and used before css is
      initialized or ccw console subchannel is registered. Until then console
      subchannel does not have a parent. For that reason assume subchannels
      with no parent are not pseudo subchannels. This fixes the following
      kasan finding:
      
      BUG: KASAN: global-out-of-bounds in sch_is_pseudo_sch+0x8e/0x98
      Read of size 8 at addr 00000000000005e8 by task swapper/0/0
      
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc8-07370-g6ac43dd12538 #2
      Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
      Call Trace:
      ([<000000000012cd76>] show_stack+0x14e/0x1e0)
       [<0000000001f7fb44>] dump_stack+0x1a4/0x1f8
       [<00000000007d7afc>] print_address_description+0x64/0x3c8
       [<00000000007d75f6>] __kasan_report+0x14e/0x180
       [<00000000018a2986>] sch_is_pseudo_sch+0x8e/0x98
       [<000000000189b950>] cio_enable_subchannel+0x1d0/0x510
       [<00000000018cac7c>] ccw_device_recognition+0x12c/0x188
       [<0000000002ceb1a8>] ccw_device_enable_console+0x138/0x340
       [<0000000002cf1cbe>] con3215_init+0x25e/0x300
       [<0000000002c8770a>] console_init+0x68a/0x9b8
       [<0000000002c6a3d6>] start_kernel+0x4fe/0x728
       [<0000000000100070>] startup_continue+0x70/0xd0
      
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarSebastian Ott <sebott@linux.ibm.com>
      Signed-off-by: default avatarVasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      46cb14a5
    • Vasily Gorbik's avatar
      s390/topology: avoid firing events before kobjs are created · 9aa823b3
      Vasily Gorbik authored
      commit f3122a79 upstream.
      
      arch_update_cpu_topology is first called from:
      kernel_init_freeable->sched_init_smp->sched_init_domains
      
      even before cpus has been registered in:
      kernel_init_freeable->do_one_initcall->s390_smp_init
      
      Do not trigger kobject_uevent change events until cpu devices are
      actually created. Fixes the following kasan findings:
      
      BUG: KASAN: global-out-of-bounds in kobject_uevent_env+0xb40/0xee0
      Read of size 8 at addr 0000000000000020 by task swapper/0/1
      
      BUG: KASAN: global-out-of-bounds in kobject_uevent_env+0xb36/0xee0
      Read of size 8 at addr 0000000000000018 by task swapper/0/1
      
      CPU: 0 PID: 1 Comm: swapper/0 Tainted: G    B
      Hardware name: IBM 3906 M04 704 (LPAR)
      Call Trace:
      ([<0000000143c6db7e>] show_stack+0x14e/0x1a8)
       [<0000000145956498>] dump_stack+0x1d0/0x218
       [<000000014429fb4c>] print_address_description+0x64/0x380
       [<000000014429f630>] __kasan_report+0x138/0x168
       [<0000000145960b96>] kobject_uevent_env+0xb36/0xee0
       [<0000000143c7c47c>] arch_update_cpu_topology+0x104/0x108
       [<0000000143df9e22>] sched_init_domains+0x62/0xe8
       [<000000014644c94a>] sched_init_smp+0x3a/0xc0
       [<0000000146433a20>] kernel_init_freeable+0x558/0x958
       [<000000014599002a>] kernel_init+0x22/0x160
       [<00000001459a71d4>] ret_from_fork+0x28/0x30
       [<00000001459a71dc>] kernel_thread_starter+0x0/0x10
      
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarVasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9aa823b3
    • Thomas Huth's avatar
      KVM: s390: Test for bad access register and size at the start of S390_MEM_OP · ddfef75f
      Thomas Huth authored
      commit a13b03bb upstream.
      
      If the KVM_S390_MEM_OP ioctl is called with an access register >= 16,
      then there is certainly a bug in the calling userspace application.
      We check for wrong access registers, but only if the vCPU was already
      in the access register mode before (i.e. the SIE block has recorded
      it). The check is also buried somewhere deep in the calling chain (in
      the function ar_translation()), so this is somewhat hard to find.
      
      It's better to always report an error to the userspace in case this
      field is set wrong, and it's safer in the KVM code if we block wrong
      values here early instead of relying on a check somewhere deep down
      the calling chain, so let's add another check to kvm_s390_guest_mem_op()
      directly.
      
      We also should check that the "size" is non-zero here (thanks to Janosch
      Frank for the hint!). If we do not check the size, we could call vmalloc()
      with this 0 value, and this will cause a kernel warning.
      Signed-off-by: default avatarThomas Huth <thuth@redhat.com>
      Link: https://lkml.kernel.org/r/20190829122517.31042-1-thuth@redhat.comReviewed-by: default avatarCornelia Huck <cohuck@redhat.com>
      Reviewed-by: default avatarJanosch Frank <frankja@linux.ibm.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ddfef75f
    • Vasily Gorbik's avatar
      s390/process: avoid potential reading of freed stack · 8b41a30f
      Vasily Gorbik authored
      commit 8769f610 upstream.
      
      With THREAD_INFO_IN_TASK (which is selected on s390) task's stack usage
      is refcounted and should always be protected by get/put when touching
      other task's stack to avoid race conditions with task's destruction code.
      
      Fixes: d5c352cd ("s390: move thread_info into task_struct")
      Cc: stable@vger.kernel.org # v4.10+
      Acked-by: default avatarIlya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: default avatarVasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b41a30f
  2. 07 Oct, 2019 34 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.19.78 · 58fce206
      Greg Kroah-Hartman authored
      58fce206
    • Bharath Vedartham's avatar
      9p/cache.c: Fix memory leak in v9fs_cache_session_get_cookie · 5b0446c8
      Bharath Vedartham authored
      commit 962a991c upstream.
      
      v9fs_cache_session_get_cookie assigns a random cachetag to v9ses->cachetag,
      if the cachetag is not assigned previously.
      
      v9fs_random_cachetag allocates memory to v9ses->cachetag with kmalloc and uses
      scnprintf to fill it up with a cachetag.
      
      But if scnprintf fails, v9ses->cachetag is not freed in the current
      code causing a memory leak.
      
      Fix this by freeing v9ses->cachetag it v9fs_random_cachetag fails.
      
      This was reported by syzbot, the link to the report is below:
      https://syzkaller.appspot.com/bug?id=f012bdf297a7a4c860c38a88b44fbee43fd9bbf3
      
      Link: http://lkml.kernel.org/r/20190522194519.GA5313@bharath12345-Inspiron-5559
      Reported-by: syzbot+3a030a73b6c1e9833815@syzkaller.appspotmail.com
      Signed-off-by: default avatarBharath Vedartham <linux.bhar@gmail.com>
      Signed-off-by: default avatarDominique Martinet <dominique.martinet@cea.fr>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5b0446c8
    • Tetsuo Handa's avatar
      kexec: bail out upon SIGKILL when allocating memory. · d85bc11a
      Tetsuo Handa authored
      commit 7c3a6aed upstream.
      
      syzbot found that a thread can stall for minutes inside kexec_load() after
      that thread was killed by SIGKILL [1].  It turned out that the reproducer
      was trying to allocate 2408MB of memory using kimage_alloc_page() from
      kimage_load_normal_segment().  Let's check for SIGKILL before doing memory
      allocation.
      
      [1] https://syzkaller.appspot.com/bug?id=a0e3436829698d5824231251fad9d8e998f94f5e
      
      Link: http://lkml.kernel.org/r/993c9185-d324-2640-d061-bed2dd18b1f7@I-love.SAKURA.ne.jpSigned-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: default avatarsyzbot <syzbot+8ab2d0f39fb79fe6ca40@syzkaller.appspotmail.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d85bc11a
    • Andrey Konovalov's avatar
      NFC: fix attrs checks in netlink interface · c8a65ec0
      Andrey Konovalov authored
      commit 18917d51 upstream.
      
      nfc_genl_deactivate_target() relies on the NFC_ATTR_TARGET_INDEX
      attribute being present, but doesn't check whether it is actually
      provided by the user. Same goes for nfc_genl_fw_download() and
      NFC_ATTR_FIRMWARE_NAME.
      
      This patch adds appropriate checks.
      
      Found with syzkaller.
      Signed-off-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c8a65ec0
    • Eric Biggers's avatar
      smack: use GFP_NOFS while holding inode_smack::smk_lock · 1b425032
      Eric Biggers authored
      commit e5bfad3d upstream.
      
      inode_smack::smk_lock is taken during smack_d_instantiate(), which is
      called during a filesystem transaction when creating a file on ext4.
      Therefore to avoid a deadlock, all code that takes this lock must use
      GFP_NOFS, to prevent memory reclaim from waiting for the filesystem
      transaction to complete.
      
      Reported-by: syzbot+0eefc1e06a77d327a056@syzkaller.appspotmail.com
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarCasey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1b425032
    • Jann Horn's avatar
      Smack: Don't ignore other bprm->unsafe flags if LSM_UNSAFE_PTRACE is set · ef9744a0
      Jann Horn authored
      commit 3675f052 upstream.
      
      There is a logic bug in the current smack_bprm_set_creds():
      If LSM_UNSAFE_PTRACE is set, but the ptrace state is deemed to be
      acceptable (e.g. because the ptracer detached in the meantime), the other
      ->unsafe flags aren't checked. As far as I can tell, this means that
      something like the following could work (but I haven't tested it):
      
       - task A: create task B with fork()
       - task B: set NO_NEW_PRIVS
       - task B: install a seccomp filter that makes open() return 0 under some
         conditions
       - task B: replace fd 0 with a malicious library
       - task A: attach to task B with PTRACE_ATTACH
       - task B: execve() a file with an SMACK64EXEC extended attribute
       - task A: while task B is still in the middle of execve(), exit (which
         destroys the ptrace relationship)
      
      Make sure that if any flags other than LSM_UNSAFE_PTRACE are set in
      bprm->unsafe, we reject the execve().
      
      Cc: stable@vger.kernel.org
      Fixes: 5663884c ("Smack: unify all ptrace accesses in the smack")
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarCasey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ef9744a0
    • Pierre-Louis Bossart's avatar
      soundwire: fix regmap dependencies and align with other serial links · 47035934
      Pierre-Louis Bossart authored
      [ Upstream commit 8676b3ca ]
      
      The existing code has a mixed select/depend usage which makes no sense.
      
      config SOUNDWIRE_BUS
             tristate
             select REGMAP_SOUNDWIRE
      
      config REGMAP_SOUNDWIRE
              tristate
              depends on SOUNDWIRE_BUS
      
      Let's remove one layer of Kconfig definitions and align with the
      solutions used by all other serial links.
      Signed-off-by: default avatarPierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
      Link: https://lore.kernel.org/r/20190718230215.18675-1-pierre-louis.bossart@linux.intel.comSigned-off-by: default avatarVinod Koul <vkoul@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      47035934
    • Pierre-Louis Bossart's avatar
      soundwire: Kconfig: fix help format · 322753c7
      Pierre-Louis Bossart authored
      [ Upstream commit 9d7cd9d5 ]
      
      Move to the regular help format, --help-- is no longer recommended.
      Reviewed-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarPierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      322753c7
    • Eric Dumazet's avatar
      sch_cbq: validate TCA_CBQ_WRROPT to avoid crash · 74e2a311
      Eric Dumazet authored
      [ Upstream commit e9789c7c ]
      
      syzbot reported a crash in cbq_normalize_quanta() caused
      by an out of range cl->priority.
      
      iproute2 enforces this check, but malicious users do not.
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN PTI
      Modules linked in:
      CPU: 1 PID: 26447 Comm: syz-executor.1 Not tainted 5.3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:cbq_normalize_quanta.part.0+0x1fd/0x430 net/sched/sch_cbq.c:902
      RSP: 0018:ffff8801a5c333b0 EFLAGS: 00010206
      RAX: 0000000020000003 RBX: 00000000fffffff8 RCX: ffffc9000712f000
      RDX: 00000000000043bf RSI: ffffffff83be8962 RDI: 0000000100000018
      RBP: ffff8801a5c33420 R08: 000000000000003a R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000002ef
      R13: ffff88018da95188 R14: dffffc0000000000 R15: 0000000000000015
      FS:  00007f37d26b1700(0000) GS:ffff8801dad00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000004c7cec CR3: 00000001bcd0a006 CR4: 00000000001626f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       [<ffffffff83be9d57>] cbq_normalize_quanta include/net/pkt_sched.h:27 [inline]
       [<ffffffff83be9d57>] cbq_addprio net/sched/sch_cbq.c:1097 [inline]
       [<ffffffff83be9d57>] cbq_set_wrr+0x2d7/0x450 net/sched/sch_cbq.c:1115
       [<ffffffff83bee8a7>] cbq_change_class+0x987/0x225b net/sched/sch_cbq.c:1537
       [<ffffffff83b96985>] tc_ctl_tclass+0x555/0xcd0 net/sched/sch_api.c:2329
       [<ffffffff83a84655>] rtnetlink_rcv_msg+0x485/0xc10 net/core/rtnetlink.c:5248
       [<ffffffff83cadf0a>] netlink_rcv_skb+0x17a/0x460 net/netlink/af_netlink.c:2510
       [<ffffffff83a7db6d>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5266
       [<ffffffff83cac2c6>] netlink_unicast_kernel net/netlink/af_netlink.c:1324 [inline]
       [<ffffffff83cac2c6>] netlink_unicast+0x536/0x720 net/netlink/af_netlink.c:1350
       [<ffffffff83cacd4a>] netlink_sendmsg+0x89a/0xd50 net/netlink/af_netlink.c:1939
       [<ffffffff8399d46e>] sock_sendmsg_nosec net/socket.c:673 [inline]
       [<ffffffff8399d46e>] sock_sendmsg+0x12e/0x170 net/socket.c:684
       [<ffffffff8399f1fd>] ___sys_sendmsg+0x81d/0x960 net/socket.c:2359
       [<ffffffff839a2d05>] __sys_sendmsg+0x105/0x1d0 net/socket.c:2397
       [<ffffffff839a2df9>] SYSC_sendmsg net/socket.c:2406 [inline]
       [<ffffffff839a2df9>] SyS_sendmsg+0x29/0x30 net/socket.c:2404
       [<ffffffff8101ccc8>] do_syscall_64+0x528/0x770 arch/x86/entry/common.c:305
       [<ffffffff84400091>] entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      74e2a311
    • Tuong Lien's avatar
      tipc: fix unlimited bundling of small messages · ed9420dd
      Tuong Lien authored
      [ Upstream commit e95584a8 ]
      
      We have identified a problem with the "oversubscription" policy in the
      link transmission code.
      
      When small messages are transmitted, and the sending link has reached
      the transmit window limit, those messages will be bundled and put into
      the link backlog queue. However, bundles of data messages are counted
      at the 'CRITICAL' level, so that the counter for that level, instead of
      the counter for the real, bundled message's level is the one being
      increased.
      Subsequent, to-be-bundled data messages at non-CRITICAL levels continue
      to be tested against the unchanged counter for their own level, while
      contributing to an unrestrained increase at the CRITICAL backlog level.
      
      This leaves a gap in congestion control algorithm for small messages
      that can result in starvation for other users or a "real" CRITICAL
      user. Even that eventually can lead to buffer exhaustion & link reset.
      
      We fix this by keeping a 'target_bskb' buffer pointer at each levels,
      then when bundling, we only bundle messages at the same importance
      level only. This way, we know exactly how many slots a certain level
      have occupied in the queue, so can manage level congestion accurately.
      
      By bundling messages at the same level, we even have more benefits. Let
      consider this:
      - One socket sends 64-byte messages at the 'CRITICAL' level;
      - Another sends 4096-byte messages at the 'LOW' level;
      
      When a 64-byte message comes and is bundled the first time, we put the
      overhead of message bundle to it (+ 40-byte header, data copy, etc.)
      for later use, but the next message can be a 4096-byte one that cannot
      be bundled to the previous one. This means the last bundle carries only
      one payload message which is totally inefficient, as for the receiver
      also! Later on, another 64-byte message comes, now we make a new bundle
      and the same story repeats...
      
      With the new bundling algorithm, this will not happen, the 64-byte
      messages will be bundled together even when the 4096-byte message(s)
      comes in between. However, if the 4096-byte messages are sent at the
      same level i.e. 'CRITICAL', the bundling algorithm will again cause the
      same overhead.
      
      Also, the same will happen even with only one socket sending small
      messages at a rate close to the link transmit's one, so that, when one
      message is bundled, it's transmitted shortly. Then, another message
      comes, a new bundle is created and so on...
      
      We will solve this issue radically by another patch.
      
      Fixes: 365ad353 ("tipc: reduce risk of user starvation during link congestion")
      Reported-by: default avatarHoang Le <hoang.h.le@dektech.com.au>
      Acked-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed9420dd
    • Dongli Zhang's avatar
      xen-netfront: do not use ~0U as error return value for xennet_fill_frags() · a1afd826
      Dongli Zhang authored
      [ Upstream commit a761129e ]
      
      xennet_fill_frags() uses ~0U as return value when the sk_buff is not able
      to cache extra fragments. This is incorrect because the return type of
      xennet_fill_frags() is RING_IDX and 0xffffffff is an expected value for
      ring buffer index.
      
      In the situation when the rsp_cons is approaching 0xffffffff, the return
      value of xennet_fill_frags() may become 0xffffffff which xennet_poll() (the
      caller) would regard as error. As a result, queue->rx.rsp_cons is set
      incorrectly because it is updated only when there is error. If there is no
      error, xennet_poll() would be responsible to update queue->rx.rsp_cons.
      Finally, queue->rx.rsp_cons would point to the rx ring buffer entries whose
      queue->rx_skbs[i] and queue->grant_rx_ref[i] are already cleared to NULL.
      This leads to NULL pointer access in the next iteration to process rx ring
      buffer entries.
      
      The symptom is similar to the one fixed in
      commit 00b36850 ("xen-netfront: do not assume sk_buff_head list is
      empty in error handling").
      
      This patch changes the return type of xennet_fill_frags() to indicate
      whether it is successful or failed. The queue->rx.rsp_cons will be
      always updated inside this function.
      
      Fixes: ad4f15dc ("xen/netfront: don't bug in case of too many frags")
      Signed-off-by: default avatarDongli Zhang <dongli.zhang@oracle.com>
      Reviewed-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a1afd826
    • Dotan Barak's avatar
      net/rds: Fix error handling in rds_ib_add_one() · 36a4043c
      Dotan Barak authored
      [ Upstream commit d64bf89a ]
      
      rds_ibdev:ipaddr_list and rds_ibdev:conn_list are initialized
      after allocation some resources such as protection domain.
      If allocation of such resources fail, then these uninitialized
      variables are accessed in rds_ib_dev_free() in failure path. This
      can potentially crash the system. The code has been updated to
      initialize these variables very early in the function.
      Signed-off-by: default avatarDotan Barak <dotanb@dev.mellanox.co.il>
      Signed-off-by: default avatarSudhakar Dindukurti <sudhakar.dindukurti@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      36a4043c
    • Josh Hunt's avatar
      udp: only do GSO if # of segs > 1 · 012363f5
      Josh Hunt authored
      [ Upstream commit 4094871d ]
      
      Prior to this change an application sending <= 1MSS worth of data and
      enabling UDP GSO would fail if the system had SW GSO enabled, but the
      same send would succeed if HW GSO offload is enabled. In addition to this
      inconsistency the error in the SW GSO case does not get back to the
      application if sending out of a real device so the user is unaware of this
      failure.
      
      With this change we only perform GSO if the # of segments is > 1 even
      if the application has enabled segmentation. I've also updated the
      relevant udpgso selftests.
      
      Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
      Signed-off-by: default avatarJosh Hunt <johunt@akamai.com>
      Reviewed-by: default avatarWillem de Bruijn <willemb@google.com>
      Reviewed-by: default avatarAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      012363f5
    • Linus Walleij's avatar
      net: dsa: rtl8366: Check VLAN ID and not ports · 5c08d7e4
      Linus Walleij authored
      [ Upstream commit e8521e53 ]
      
      There has been some confusion between the port number and
      the VLAN ID in this driver. What we need to check for
      validity is the VLAN ID, nothing else.
      
      The current confusion came from assigning a few default
      VLANs for default routing and we need to rewrite that
      properly.
      
      Instead of checking if the port number is a valid VLAN
      ID, check the actual VLAN IDs passed in to the callback
      one by one as expected.
      
      Fixes: d8652956 ("net: dsa: realtek-smi: Add Realtek SMI driver")
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5c08d7e4
    • Dexuan Cui's avatar
      vsock: Fix a lockdep warning in __vsock_release() · 3c1f0704
      Dexuan Cui authored
      [ Upstream commit 0d9138ff ]
      
      Lockdep is unhappy if two locks from the same class are held.
      
      Fix the below warning for hyperv and virtio sockets (vmci socket code
      doesn't have the issue) by using lock_sock_nested() when __vsock_release()
      is called recursively:
      
      ============================================
      WARNING: possible recursive locking detected
      5.3.0+ #1 Not tainted
      --------------------------------------------
      server/1795 is trying to acquire lock:
      ffff8880c5158990 (sk_lock-AF_VSOCK){+.+.}, at: hvs_release+0x10/0x120 [hv_sock]
      
      but task is already holding lock:
      ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(sk_lock-AF_VSOCK);
        lock(sk_lock-AF_VSOCK);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      2 locks held by server/1795:
       #0: ffff8880c5d05ff8 (&sb->s_type->i_mutex_key#10){+.+.}, at: __sock_release+0x2d/0xa0
       #1: ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
      
      stack backtrace:
      CPU: 5 PID: 1795 Comm: server Not tainted 5.3.0+ #1
      Call Trace:
       dump_stack+0x67/0x90
       __lock_acquire.cold.67+0xd2/0x20b
       lock_acquire+0xb5/0x1c0
       lock_sock_nested+0x6d/0x90
       hvs_release+0x10/0x120 [hv_sock]
       __vsock_release+0x24/0xf0 [vsock]
       __vsock_release+0xa0/0xf0 [vsock]
       vsock_release+0x12/0x30 [vsock]
       __sock_release+0x37/0xa0
       sock_close+0x14/0x20
       __fput+0xc1/0x250
       task_work_run+0x98/0xc0
       do_exit+0x344/0xc60
       do_group_exit+0x47/0xb0
       get_signal+0x15c/0xc50
       do_signal+0x30/0x720
       exit_to_usermode_loop+0x50/0xa0
       do_syscall_64+0x24e/0x270
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x7f4184e85f31
      Tested-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: default avatarDexuan Cui <decui@microsoft.com>
      Reviewed-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3c1f0704
    • Josh Hunt's avatar
      udp: fix gso_segs calculations · 544aee54
      Josh Hunt authored
      [ Upstream commit 44b321e5 ]
      
      Commit dfec0ee2 ("udp: Record gso_segs when supporting UDP segmentation offload")
      added gso_segs calculation, but incorrectly got sizeof() the pointer and
      not the underlying data type. In addition let's fix the v6 case.
      
      Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
      Fixes: dfec0ee2 ("udp: Record gso_segs when supporting UDP segmentation offload")
      Signed-off-by: default avatarJosh Hunt <johunt@akamai.com>
      Reviewed-by: default avatarAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      544aee54
    • Eric Dumazet's avatar
      sch_dsmark: fix potential NULL deref in dsmark_init() · 79fd59ae
      Eric Dumazet authored
      [ Upstream commit 474f0813 ]
      
      Make sure TCA_DSMARK_INDICES was provided by the user.
      
      syzbot reported :
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 8799 Comm: syz-executor235 Not tainted 5.3.0+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:nla_get_u16 include/net/netlink.h:1501 [inline]
      RIP: 0010:dsmark_init net/sched/sch_dsmark.c:364 [inline]
      RIP: 0010:dsmark_init+0x193/0x640 net/sched/sch_dsmark.c:339
      Code: 85 db 58 0f 88 7d 03 00 00 e8 e9 1a ac fb 48 8b 9d 70 ff ff ff 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 ca
      RSP: 0018:ffff88809426f3b8 EFLAGS: 00010247
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff85c6eb09
      RDX: 0000000000000000 RSI: ffffffff85c6eb17 RDI: 0000000000000004
      RBP: ffff88809426f4b0 R08: ffff88808c4085c0 R09: ffffed1015d26159
      R10: ffffed1015d26158 R11: ffff8880ae930ac7 R12: ffff8880a7e96940
      R13: dffffc0000000000 R14: ffff88809426f8c0 R15: 0000000000000000
      FS:  0000000001292880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000080 CR3: 000000008ca1b000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       qdisc_create+0x4ee/0x1210 net/sched/sch_api.c:1237
       tc_modify_qdisc+0x524/0x1c50 net/sched/sch_api.c:1653
       rtnetlink_rcv_msg+0x463/0xb00 net/core/rtnetlink.c:5223
       netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
       rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5241
       netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
       netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328
       netlink_sendmsg+0x8a5/0xd60 net/netlink/af_netlink.c:1917
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:657
       ___sys_sendmsg+0x803/0x920 net/socket.c:2311
       __sys_sendmsg+0x105/0x1d0 net/socket.c:2356
       __do_sys_sendmsg net/socket.c:2365 [inline]
       __se_sys_sendmsg net/socket.c:2363 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2363
       do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x440369
      
      Fixes: 758cc43c ("[PKT_SCHED]: Fix dsmark to apply changes consistent")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      79fd59ae
    • David Howells's avatar
      rxrpc: Fix rxrpc_recvmsg tracepoint · 76b55277
      David Howells authored
      [ Upstream commit db9b2e0a ]
      
      Fix the rxrpc_recvmsg tracepoint to handle being called with a NULL call
      parameter.
      
      Fixes: a25e21f0 ("rxrpc, afs: Use debug_ids rather than pointers in traces")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      76b55277
    • Reinhard Speyerer's avatar
      qmi_wwan: add support for Cinterion CLS8 devices · 7047aae6
      Reinhard Speyerer authored
      [ Upstream commit cf74ac6d ]
      
      Add support for Cinterion CLS8 devices.
      Use QMI_QUIRK_SET_DTR as required for Qualcomm MDM9x07 chipsets.
      
      T:  Bus=01 Lev=03 Prnt=05 Port=01 Cnt=02 Dev#= 25 Spd=480  MxCh= 0
      D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
      P:  Vendor=1e2d ProdID=00b0 Rev= 3.18
      S:  Manufacturer=GEMALTO
      S:  Product=USB Modem
      C:* #Ifs= 5 Cfg#= 1 Atr=80 MxPwr=500mA
      I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
      E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      E:  Ad=83(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      E:  Ad=85(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      E:  Ad=87(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
      E:  Ad=89(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
      E:  Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      Signed-off-by: default avatarReinhard Speyerer <rspmn@arcor.de>
      Acked-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7047aae6
    • Eric Dumazet's avatar
      nfc: fix memory leak in llcp_sock_bind() · dd9c580a
      Eric Dumazet authored
      [ Upstream commit a0c2dc1f ]
      
      sysbot reported a memory leak after a bind() has failed.
      
      While we are at it, abort the operation if kmemdup() has failed.
      
      BUG: memory leak
      unreferenced object 0xffff888105d83ec0 (size 32):
        comm "syz-executor067", pid 7207, jiffies 4294956228 (age 19.430s)
        hex dump (first 32 bytes):
          00 69 6c 65 20 72 65 61 64 00 6e 65 74 3a 5b 34  .ile read.net:[4
          30 32 36 35 33 33 30 39 37 5d 00 00 00 00 00 00  026533097]......
        backtrace:
          [<0000000036bac473>] kmemleak_alloc_recursive /./include/linux/kmemleak.h:43 [inline]
          [<0000000036bac473>] slab_post_alloc_hook /mm/slab.h:522 [inline]
          [<0000000036bac473>] slab_alloc /mm/slab.c:3319 [inline]
          [<0000000036bac473>] __do_kmalloc /mm/slab.c:3653 [inline]
          [<0000000036bac473>] __kmalloc_track_caller+0x169/0x2d0 /mm/slab.c:3670
          [<000000000cd39d07>] kmemdup+0x27/0x60 /mm/util.c:120
          [<000000008e57e5fc>] kmemdup /./include/linux/string.h:432 [inline]
          [<000000008e57e5fc>] llcp_sock_bind+0x1b3/0x230 /net/nfc/llcp_sock.c:107
          [<000000009cb0b5d3>] __sys_bind+0x11c/0x140 /net/socket.c:1647
          [<00000000492c3bbc>] __do_sys_bind /net/socket.c:1658 [inline]
          [<00000000492c3bbc>] __se_sys_bind /net/socket.c:1656 [inline]
          [<00000000492c3bbc>] __x64_sys_bind+0x1e/0x30 /net/socket.c:1656
          [<0000000008704b2a>] do_syscall_64+0x76/0x1a0 /arch/x86/entry/common.c:296
          [<000000009f4c57a4>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 30cc4587 ("NFC: Move LLCP code to the NFC top level diirectory")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dd9c580a
    • Martin KaFai Lau's avatar
      net: Unpublish sk from sk_reuseport_cb before call_rcu · d5b1db1c
      Martin KaFai Lau authored
      [ Upstream commit 8c7138b3 ]
      
      The "reuse->sock[]" array is shared by multiple sockets.  The going away
      sk must unpublish itself from "reuse->sock[]" before making call_rcu()
      call.  However, this unpublish-action is currently done after a grace
      period and it may cause use-after-free.
      
      The fix is to move reuseport_detach_sock() to sk_destruct().
      Due to the above reason, any socket with sk_reuseport_cb has
      to go through the rcu grace period before freeing it.
      
      It is a rather old bug (~3 yrs).  The Fixes tag is not necessary
      the right commit but it is the one that introduced the SOCK_RCU_FREE
      logic and this fix is depending on it.
      
      Fixes: a4298e45 ("net: add SOCK_RCU_FREE socket flag")
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Suggested-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5b1db1c
    • Navid Emamdoost's avatar
      net: qlogic: Fix memory leak in ql_alloc_large_buffers · 9d0995cc
      Navid Emamdoost authored
      [ Upstream commit 1acb8f2a ]
      
      In ql_alloc_large_buffers, a new skb is allocated via netdev_alloc_skb.
      This skb should be released if pci_dma_mapping_error fails.
      
      Fixes: 0f8ab89e ("qla3xxx: Check return code from pci_map_single() in ql_release_to_lrg_buf_free_list(), ql_populate_free_queue(), ql_alloc_large_buffers(), and ql3xxx_send()")
      Signed-off-by: default avatarNavid Emamdoost <navid.emamdoost@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9d0995cc
    • Paolo Abeni's avatar
      net: ipv4: avoid mixed n_redirects and rate_tokens usage · 124b64fe
      Paolo Abeni authored
      [ Upstream commit b406472b ]
      
      Since commit c09551c6 ("net: ipv4: use a dedicated counter
      for icmp_v4 redirect packets") we use 'n_redirects' to account
      for redirect packets, but we still use 'rate_tokens' to compute
      the redirect packets exponential backoff.
      
      If the device sent to the relevant peer any ICMP error packet
      after sending a redirect, it will also update 'rate_token' according
      to the leaking bucket schema; typically 'rate_token' will raise
      above BITS_PER_LONG and the redirect packets backoff algorithm
      will produce undefined behavior.
      
      Fix the issue using 'n_redirects' to compute the exponential backoff
      in ip_rt_send_redirect().
      
      Note that we still clear rate_tokens after a redirect silence period,
      to avoid changing an established behaviour.
      
      The root cause predates git history; before the mentioned commit in
      the critical scenario, the kernel stopped sending redirects, after
      the mentioned commit the behavior more randomic.
      Reported-by: default avatarXiumei Mu <xmu@redhat.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Fixes: c09551c6 ("net: ipv4: use a dedicated counter for icmp_v4 redirect packets")
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Acked-by: default avatarLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      124b64fe
    • David Ahern's avatar
      ipv6: Handle missing host route in __ipv6_ifa_notify · 6f8564ed
      David Ahern authored
      [ Upstream commit 2d819d25 ]
      
      Rajendra reported a kernel panic when a link was taken down:
      
          [ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
          [ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290
      
          <snip>
      
          [ 6870.570501] Call Trace:
          [ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40
          [ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0
          [ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260
          [ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430
          [ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
          [ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430
          [ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490
          [ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430
          [ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0
          [ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
          [ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60
          [ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70
          [ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0
      
      addrconf_dad_work is kicked to be scheduled when a device is brought
      up. There is a race between addrcond_dad_work getting scheduled and
      taking the rtnl lock and a process taking the link down (under rtnl).
      The latter removes the host route from the inet6_addr as part of
      addrconf_ifdown which is run for NETDEV_DOWN. The former attempts
      to use the host route in __ipv6_ifa_notify. If the down event removes
      the host route due to the race to the rtnl, then the BUG listed above
      occurs.
      
      Since the DAD sequence can not be aborted, add a check for the missing
      host route in __ipv6_ifa_notify. The only way this should happen is due
      to the previously mentioned race. The host route is created when the
      address is added to an interface; it is only removed on a down event
      where the address is kept. Add a warning if the host route is missing
      AND the device is up; this is a situation that should never happen.
      
      Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional")
      Reported-by: default avatarRajendra Dendukuri <rajendra.dendukuri@broadcom.com>
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f8564ed
    • Eric Dumazet's avatar
      ipv6: drop incoming packets having a v4mapped source address · 658d7ee4
      Eric Dumazet authored
      [ Upstream commit 6af1799a ]
      
      This began with a syzbot report. syzkaller was injecting
      IPv6 TCP SYN packets having a v4mapped source address.
      
      After an unsuccessful 4-tuple lookup, TCP creates a request
      socket (SYN_RECV) and calls reqsk_queue_hash_req()
      
      reqsk_queue_hash_req() calls sk_ehashfn(sk)
      
      At this point we have AF_INET6 sockets, and the heuristic
      used by sk_ehashfn() to either hash the IPv4 or IPv6 addresses
      is to use ipv6_addr_v4mapped(&sk->sk_v6_daddr)
      
      For the particular spoofed packet, we end up hashing V4 addresses
      which were not initialized by the TCP IPv6 stack, so KMSAN fired
      a warning.
      
      I first fixed sk_ehashfn() to test both source and destination addresses,
      but then faced various problems, including user-space programs
      like packetdrill that had similar assumptions.
      
      Instead of trying to fix the whole ecosystem, it is better
      to admit that we have a dual stack behavior, and that we
      can not build linux kernels without V4 stack anyway.
      
      The dual stack API automatically forces the traffic to be IPv4
      if v4mapped addresses are used at bind() or connect(), so it makes
      no sense to allow IPv6 traffic to use the same v4mapped class.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      658d7ee4
    • Johan Hovold's avatar
      hso: fix NULL-deref on tty open · a495fd19
      Johan Hovold authored
      [ Upstream commit 8353da9f ]
      
      Fix NULL-pointer dereference on tty open due to a failure to handle a
      missing interrupt-in endpoint when probing modem ports:
      
      	BUG: kernel NULL pointer dereference, address: 0000000000000006
      	...
      	RIP: 0010:tiocmget_submit_urb+0x1c/0xe0 [hso]
      	...
      	Call Trace:
      	hso_start_serial_device+0xdc/0x140 [hso]
      	hso_serial_open+0x118/0x1b0 [hso]
      	tty_open+0xf1/0x490
      
      Fixes: 542f5482 ("tty: Modem functions for the HSO driver")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a495fd19
    • Haishuang Yan's avatar
      erspan: remove the incorrect mtu limit for erspan · 7f30c44b
      Haishuang Yan authored
      [ Upstream commit 0e141f75 ]
      
      erspan driver calls ether_setup(), after commit 61e84623
      ("net: centralize net_device min/max MTU checking"), the range
      of mtu is [min_mtu, max_mtu], which is [68, 1500] by default.
      
      It causes the dev mtu of the erspan device to not be greater
      than 1500, this limit value is not correct for ipgre tap device.
      
      Tested:
      Before patch:
      # ip link set erspan0 mtu 1600
      Error: mtu greater than device maximum.
      After patch:
      # ip link set erspan0 mtu 1600
      # ip -d link show erspan0
      21: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1600 qdisc noop state DOWN
      mode DEFAULT group default qlen 1000
          link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 0
      
      Fixes: 61e84623 ("net: centralize net_device min/max MTU checking")
      Signed-off-by: default avatarHaishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7f30c44b
    • Vishal Kulkarni's avatar
      cxgb4:Fix out-of-bounds MSI-X info array access · 2b838911
      Vishal Kulkarni authored
      [ Upstream commit 6b517374 ]
      
      When fetching free MSI-X vectors for ULDs, check for the error code
      before accessing MSI-X info array. Otherwise, an out-of-bounds access is
      attempted, which results in kernel panic.
      
      Fixes: 94cdb8bb ("cxgb4: Add support for dynamic allocation of resources for ULD")
      Signed-off-by: default avatarShahjada Abul Husain <shahjada@chelsio.com>
      Signed-off-by: default avatarVishal Kulkarni <vishal@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b838911
    • Daniel Borkmann's avatar
      bpf: fix use after free in prog symbol exposure · ed568ca7
      Daniel Borkmann authored
      commit c751798a upstream.
      
      syzkaller managed to trigger the warning in bpf_jit_free() which checks via
      bpf_prog_kallsyms_verify_off() for potentially unlinked JITed BPF progs
      in kallsyms, and subsequently trips over GPF when walking kallsyms entries:
      
        [...]
        8021q: adding VLAN 0 to HW filter on device batadv0
        8021q: adding VLAN 0 to HW filter on device batadv0
        WARNING: CPU: 0 PID: 9869 at kernel/bpf/core.c:810 bpf_jit_free+0x1e8/0x2a0
        Kernel panic - not syncing: panic_on_warn set ...
        CPU: 0 PID: 9869 Comm: kworker/0:7 Not tainted 5.0.0-rc8+ #1
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Workqueue: events bpf_prog_free_deferred
        Call Trace:
         __dump_stack lib/dump_stack.c:77 [inline]
         dump_stack+0x113/0x167 lib/dump_stack.c:113
         panic+0x212/0x40b kernel/panic.c:214
         __warn.cold.8+0x1b/0x38 kernel/panic.c:571
         report_bug+0x1a4/0x200 lib/bug.c:186
         fixup_bug arch/x86/kernel/traps.c:178 [inline]
         do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
         do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
         invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
        RIP: 0010:bpf_jit_free+0x1e8/0x2a0
        Code: 02 4c 89 e2 83 e2 07 38 d0 7f 08 84 c0 0f 85 86 00 00 00 48 ba 00 02 00 00 00 00 ad de 0f b6 43 02 49 39 d6 0f 84 5f fe ff ff <0f> 0b e9 58 fe ff ff 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1
        RSP: 0018:ffff888092f67cd8 EFLAGS: 00010202
        RAX: 0000000000000007 RBX: ffffc90001947000 RCX: ffffffff816e9d88
        RDX: dead000000000200 RSI: 0000000000000008 RDI: ffff88808769f7f0
        RBP: ffff888092f67d00 R08: fffffbfff1394059 R09: fffffbfff1394058
        R10: fffffbfff1394058 R11: ffffffff89ca02c7 R12: ffffc90001947002
        R13: ffffc90001947020 R14: ffffffff881eca80 R15: ffff88808769f7e8
        BUG: unable to handle kernel paging request at fffffbfff400d000
        #PF error: [normal kernel read fault]
        PGD 21ffee067 P4D 21ffee067 PUD 21ffed067 PMD 9f942067 PTE 0
        Oops: 0000 [#1] PREEMPT SMP KASAN
        CPU: 0 PID: 9869 Comm: kworker/0:7 Not tainted 5.0.0-rc8+ #1
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Workqueue: events bpf_prog_free_deferred
        RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:495 [inline]
        RIP: 0010:bpf_tree_comp kernel/bpf/core.c:558 [inline]
        RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline]
        RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline]
        RIP: 0010:bpf_prog_kallsyms_find+0x107/0x2e0 kernel/bpf/core.c:632
        Code: 00 f0 ff ff 44 38 c8 7f 08 84 c0 0f 85 fa 00 00 00 41 f6 45 02 01 75 02 0f 0b 48 39 da 0f 82 92 00 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 30 84 c0 74 08 3c 03 0f 8e 45 01 00 00 8b 03 48 c1 e0
        [...]
      
      Upon further debugging, it turns out that whenever we trigger this
      issue, the kallsyms removal in bpf_prog_ksym_node_del() was /skipped/
      but yet bpf_jit_free() reported that the entry is /in use/.
      
      Problem is that symbol exposure via bpf_prog_kallsyms_add() but also
      perf_event_bpf_event() were done /after/ bpf_prog_new_fd(). Once the
      fd is exposed to the public, a parallel close request came in right
      before we attempted to do the bpf_prog_kallsyms_add().
      
      Given at this time the prog reference count is one, we start to rip
      everything underneath us via bpf_prog_release() -> bpf_prog_put().
      The memory is eventually released via deferred free, so we're seeing
      that bpf_jit_free() has a kallsym entry because we added it from
      bpf_prog_load() but /after/ bpf_prog_put() from the remote CPU.
      
      Therefore, move both notifications /before/ we install the fd. The
      issue was never seen between bpf_prog_alloc_id() and bpf_prog_new_fd()
      because upon bpf_prog_get_fd_by_id() we'll take another reference to
      the BPF prog, so we're still holding the original reference from the
      bpf_prog_load().
      
      Fixes: 6ee52e2a ("perf, bpf: Introduce PERF_RECORD_BPF_EVENT")
      Fixes: 74451e66 ("bpf: make jited programs visible in traces")
      Reported-by: syzbot+bd3bba6ff3fcea7a6ec6@syzkaller.appspotmail.com
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Cc: Song Liu <songliubraving@fb.com>
      Signed-off-by: default avatarZubin Mithra <zsm@chromium.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ed568ca7
    • Damien Le Moal's avatar
      block: mq-deadline: Fix queue restart handling · dbb7339c
      Damien Le Moal authored
      [ Upstream commit cb8acabb ]
      
      Commit 7211aef8 ("block: mq-deadline: Fix write completion
      handling") added a call to blk_mq_sched_mark_restart_hctx() in
      dd_dispatch_request() to make sure that write request dispatching does
      not stall when all target zones are locked. This fix left a subtle race
      when a write completion happens during a dispatch execution on another
      CPU:
      
      CPU 0: Dispatch			CPU1: write completion
      
      dd_dispatch_request()
          lock(&dd->lock);
          ...
          lock(&dd->zone_lock);	dd_finish_request()
          rq = find request		lock(&dd->zone_lock);
          unlock(&dd->zone_lock);
          				zone write unlock
      				unlock(&dd->zone_lock);
      				...
      				__blk_mq_free_request
                                            check restart flag (not set)
      				      -> queue not run
          ...
          if (!rq && have writes)
              blk_mq_sched_mark_restart_hctx()
          unlock(&dd->lock)
      
      Since the dispatch context finishes after the write request completion
      handling, marking the queue as needing a restart is not seen from
      __blk_mq_free_request() and blk_mq_sched_restart() not executed leading
      to the dispatch stall under 100% write workloads.
      
      Fix this by moving the call to blk_mq_sched_mark_restart_hctx() from
      dd_dispatch_request() into dd_finish_request() under the zone lock to
      ensure full mutual exclusion between write request dispatch selection
      and zone unlock on write request completion.
      
      Fixes: 7211aef8 ("block: mq-deadline: Fix write completion handling")
      Cc: stable@vger.kernel.org
      Reported-by: default avatarHans Holmberg <Hans.Holmberg@wdc.com>
      Reviewed-by: default avatarHans Holmberg <hans.holmberg@wdc.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      dbb7339c
    • Alexandre Ghiti's avatar
      arm: use STACK_TOP when computing mmap base address · af10ffa6
      Alexandre Ghiti authored
      [ Upstream commit 86e568e9 ]
      
      mmap base address must be computed wrt stack top address, using TASK_SIZE
      is wrong since STACK_TOP and TASK_SIZE are not equivalent.
      
      Link: http://lkml.kernel.org/r/20190730055113.23635-8-alex@ghiti.frSigned-off-by: default avatarAlexandre Ghiti <alex@ghiti.fr>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      af10ffa6
    • Alexandre Ghiti's avatar
      arm: properly account for stack randomization and stack guard gap · f91a9c65
      Alexandre Ghiti authored
      [ Upstream commit af0f4297 ]
      
      This commit takes care of stack randomization and stack guard gap when
      computing mmap base address and checks if the task asked for
      randomization.  This fixes the problem uncovered and not fixed for arm
      here: https://lkml.kernel.org/r/20170622200033.25714-1-riel@redhat.com
      
      Link: http://lkml.kernel.org/r/20190730055113.23635-7-alex@ghiti.frSigned-off-by: default avatarAlexandre Ghiti <alex@ghiti.fr>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      f91a9c65
    • Alexandre Ghiti's avatar
      mips: properly account for stack randomization and stack guard gap · 53ba8d43
      Alexandre Ghiti authored
      [ Upstream commit b1f61b5b ]
      
      This commit takes care of stack randomization and stack guard gap when
      computing mmap base address and checks if the task asked for
      randomization.  This fixes the problem uncovered and not fixed for arm
      here: https://lkml.kernel.org/r/20170622200033.25714-1-riel@redhat.com
      
      Link: http://lkml.kernel.org/r/20190730055113.23635-10-alex@ghiti.frSigned-off-by: default avatarAlexandre Ghiti <alex@ghiti.fr>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Acked-by: default avatarPaul Burton <paul.burton@mips.com>
      Reviewed-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      53ba8d43
    • Alexandre Ghiti's avatar
      arm64: consider stack randomization for mmap base only when necessary · e1b391ab
      Alexandre Ghiti authored
      [ Upstream commit e8d54b62 ]
      
      Do not offset mmap base address because of stack randomization if current
      task does not want randomization.  Note that x86 already implements this
      behaviour.
      
      Link: http://lkml.kernel.org/r/20190730055113.23635-4-alex@ghiti.frSigned-off-by: default avatarAlexandre Ghiti <alex@ghiti.fr>
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      e1b391ab