1. 16 Dec, 2017 23 commits
    • Masahiro Yamada's avatar
      kbuild: do not call cc-option before KBUILD_CFLAGS initialization · 97c66870
      Masahiro Yamada authored
      
      [ Upstream commit 433dc2eb ]
      
      Some $(call cc-option,...) are invoked very early, even before
      KBUILD_CFLAGS, etc. are initialized.
      
      The returned string from $(call cc-option,...) depends on
      KBUILD_CPPFLAGS, KBUILD_CFLAGS, and GCC_PLUGINS_CFLAGS.
      
      Since they are exported, they are not empty when the top Makefile
      is recursively invoked.
      
      The recursion occurs in several places.  For example, the top
      Makefile invokes itself for silentoldconfig.  "make tinyconfig",
      "make rpm-pkg" are the cases, too.
      
      In those cases, the second call of cc-option from the same line
      runs a different shell command due to non-pristine KBUILD_CFLAGS.
      
      To get the same result all the time, KBUILD_* and GCC_PLUGINS_CFLAGS
      must be initialized before any call of cc-option.  This avoids
      garbage data in the .cache.mk file.
      
      Move all calls of cc-option below the config targets because target
      compiler flags are unnecessary for Kconfig.
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Reviewed-by: default avatarDouglas Anderson <dianders@chromium.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      97c66870
    • Paul Mackerras's avatar
      powerpc/64: Fix checksum folding in csum_tcpudp_nofold and ip_fast_csum_nofold · eae3f3ab
      Paul Mackerras authored
      commit b492f7e4 upstream.
      
      These functions compute an IP checksum by computing a 64-bit sum and
      folding it to 32 bits (the "nofold" in their names refers to folding
      down to 16 bits).  However, doing (u32) (s + (s >> 32)) is not
      sufficient to fold a 64-bit sum to 32 bits correctly.  The addition
      can produce a carry out from bit 31, which needs to be added in to
      the sum to produce the correct result.
      
      To fix this, we copy the from64to32() function from lib/checksum.c
      and use that.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eae3f3ab
    • Marc Zyngier's avatar
      KVM: arm/arm64: vgic-its: Preserve the revious read from the pending table · 9414a630
      Marc Zyngier authored
      commit 64afe6e9 upstream.
      
      The current pending table parsing code assumes that we keep the
      previous read of the pending bits, but keep that variable in
      the current block, making sure it is discarded on each loop.
      
      We end-up using whatever is on the stack. Who knows, it might
      just be the right thing...
      
      Fixes: 33d3bc95 ("KVM: arm64: vgic-its: Read initial LPI pending table")
      Cc: stable@vger.kernel.org # 4.8
      Reported-by: default avatarAKASHI Takahiro <takahiro.akashi@linaro.org>
      Reviewed-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9414a630
    • Al Viro's avatar
      fix kcm_clone() · 80c0f477
      Al Viro authored
      commit a5739435 upstream.
      
      1) it's fput() or sock_release(), not both
      2) don't do fd_install() until the last failure exit.
      3) not a bug per se, but... don't attach socket to struct file
         until it's set up.
      
      Take reserving descriptor into the caller, move fd_install() to the
      caller, sanitize failure exits and calling conventions.
      Acked-by: default avatarTom Herbert <tom@herbertland.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      80c0f477
    • Vincent Pelletier's avatar
      usb: gadget: ffs: Forbid usb_ep_alloc_request from sleeping · 16648cbc
      Vincent Pelletier authored
      commit 30bf90cc upstream.
      
      Found using DEBUG_ATOMIC_SLEEP while submitting an AIO read operation:
      
      [  100.853642] BUG: sleeping function called from invalid context at mm/slab.h:421
      [  100.861148] in_atomic(): 1, irqs_disabled(): 1, pid: 1880, name: python
      [  100.867954] 2 locks held by python/1880:
      [  100.867961]  #0:  (&epfile->mutex){....}, at: [<f8188627>] ffs_mutex_lock+0x27/0x30 [usb_f_fs]
      [  100.868020]  #1:  (&(&ffs->eps_lock)->rlock){....}, at: [<f818ad4b>] ffs_epfile_io.isra.17+0x24b/0x590 [usb_f_fs]
      [  100.868076] CPU: 1 PID: 1880 Comm: python Not tainted 4.14.0-edison+ #118
      [  100.868085] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
      [  100.868093] Call Trace:
      [  100.868122]  dump_stack+0x47/0x62
      [  100.868156]  ___might_sleep+0xfd/0x110
      [  100.868182]  __might_sleep+0x68/0x70
      [  100.868217]  kmem_cache_alloc_trace+0x4b/0x200
      [  100.868248]  ? dwc3_gadget_ep_alloc_request+0x24/0xe0 [dwc3]
      [  100.868302]  dwc3_gadget_ep_alloc_request+0x24/0xe0 [dwc3]
      [  100.868343]  usb_ep_alloc_request+0x16/0xc0 [udc_core]
      [  100.868386]  ffs_epfile_io.isra.17+0x444/0x590 [usb_f_fs]
      [  100.868424]  ? _raw_spin_unlock_irqrestore+0x27/0x40
      [  100.868457]  ? kiocb_set_cancel_fn+0x57/0x60
      [  100.868477]  ? ffs_ep0_poll+0xc0/0xc0 [usb_f_fs]
      [  100.868512]  ffs_epfile_read_iter+0xfe/0x157 [usb_f_fs]
      [  100.868551]  ? security_file_permission+0x9c/0xd0
      [  100.868587]  ? rw_verify_area+0xac/0x120
      [  100.868633]  aio_read+0x9d/0x100
      [  100.868692]  ? __fget+0xa2/0xd0
      [  100.868727]  ? __might_sleep+0x68/0x70
      [  100.868763]  SyS_io_submit+0x471/0x680
      [  100.868878]  do_int80_syscall_32+0x4e/0xd0
      [  100.868921]  entry_INT80_32+0x2a/0x2a
      [  100.868932] EIP: 0xb7fbb676
      [  100.868941] EFLAGS: 00000292 CPU: 1
      [  100.868951] EAX: ffffffda EBX: b7aa2000 ECX: 00000002 EDX: b7af8368
      [  100.868961] ESI: b7fbb660 EDI: b7aab000 EBP: bfb6c658 ESP: bfb6c638
      [  100.868973]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
      Signed-off-by: default avatarVincent Pelletier <plr.vincent@gmail.com>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      16648cbc
    • Heiko Carstens's avatar
      s390: always save and restore all registers on context switch · 47273f0d
      Heiko Carstens authored
      commit fbbd7f1a upstream.
      
      The switch_to() macro has an optimization to avoid saving and
      restoring register contents that aren't needed for kernel threads.
      
      There is however the possibility that a kernel thread execve's a user
      space program. In such a case the execve'd process can partially see
      the contents of the previous process, which shouldn't be allowed.
      
      To avoid this, simply always save and restore register contents on
      context switch.
      
      Fixes: fdb6d070 ("switch_to: dont restore/save access & fpu regs for kernel threads")
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      47273f0d
    • Masamitsu Yamazaki's avatar
      ipmi: Stop timers before cleaning up the module · f8dac5bf
      Masamitsu Yamazaki authored
      commit 4f7f5551 upstream.
      
      System may crash after unloading ipmi_si.ko module
      because a timer may remain and fire after the module cleaned up resources.
      
      cleanup_one_si() contains the following processing.
      
              /*
               * Make sure that interrupts, the timer and the thread are
               * stopped and will not run again.
               */
              if (to_clean->irq_cleanup)
                      to_clean->irq_cleanup(to_clean);
              wait_for_timer_and_thread(to_clean);
      
              /*
               * Timeouts are stopped, now make sure the interrupts are off
               * in the BMC.  Note that timers and CPU interrupts are off,
               * so no need for locks.
               */
              while (to_clean->curr_msg || (to_clean->si_state != SI_NORMAL)) {
                      poll(to_clean);
                      schedule_timeout_uninterruptible(1);
              }
      
      si_state changes as following in the while loop calling poll(to_clean).
      
        SI_GETTING_MESSAGES
          => SI_CHECKING_ENABLES
           => SI_SETTING_ENABLES
            => SI_GETTING_EVENTS
             => SI_NORMAL
      
      As written in the code comments above,
      timers are expected to stop before the polling loop and not to run again.
      But the timer is set again in the following process
      when si_state becomes SI_SETTING_ENABLES.
      
        => poll
           => smi_event_handler
             => handle_transaction_done
                // smi_info->si_state == SI_SETTING_ENABLES
               => start_getting_events
                 => start_new_msg
                  => smi_mod_timer
                    => mod_timer
      
      As a result, before the timer set in start_new_msg() expires,
      the polling loop may see si_state becoming SI_NORMAL
      and the module clean-up finishes.
      
      For example, hard LOCKUP and panic occurred as following.
      smi_timeout was called after smi_event_handler,
      kcs_event and hangs at port_inb()
      trying to access I/O port after release.
      
          [exception RIP: port_inb+19]
          RIP: ffffffffc0473053  RSP: ffff88069fdc3d80  RFLAGS: 00000006
          RAX: ffff8806800f8e00  RBX: ffff880682bd9400  RCX: 0000000000000000
          RDX: 0000000000000ca3  RSI: 0000000000000ca3  RDI: ffff8806800f8e40
          RBP: ffff88069fdc3d80   R8: ffffffff81d86dfc   R9: ffffffff81e36426
          R10: 00000000000509f0  R11: 0000000000100000  R12: 0000000000]:000000
          R13: 0000000000000000  R14: 0000000000000246  R15: ffff8806800f8e00
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
       --- <NMI exception stack> ---
      
      To fix the problem I defined a flag, timer_can_start,
      as member of struct smi_info.
      The flag is enabled immediately after initializing the timer
      and disabled immediately before waiting for timer deletion.
      
      Fixes: 0cfec916 ("ipmi: Start the timer and thread on internal msgs")
      Signed-off-by: default avatarYamazaki Masamitsu <m-yamazaki@ah.jp.nec.com>
      [Adjusted for recent changes in the driver.]
      [Some fairly major changes went into the IPMI driver in 4.15, so this
       required a backport as the code had changed and moved to a different
       file.  The 4.14 version of this patch moved some code under an
       if statement causing it to not apply to 4.7-4.13.]
      Signed-off-by: default avatarCorey Minyard <cminyard@mvista.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f8dac5bf
    • Debabrata Banerjee's avatar
      Fix handling of verdicts after NF_QUEUE · 0cab694a
      Debabrata Banerjee authored
      [This fix is only needed for v4.9 stable since v4.10+ does not have the issue]
      
      A verdict of NF_STOLEN after NF_QUEUE will cause an incorrect return value
      and a potential kernel panic via double free of skb's
      
      This was broken by commit 7034b566 ("netfilter: fix nf_queue handling")
      and subsequently fixed in v4.10 by commit c63cbc46 ("netfilter:
      use switch() to handle verdict cases from nf_hook_slow()"). However that
      commit cannot be cleanly cherry-picked to v4.9
      Signed-off-by: default avatarDebabrata Banerjee <dbanerje@akamai.com>
      Acked-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      0cab694a
    • Tommi Rantala's avatar
      tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv() · cf00fd3d
      Tommi Rantala authored
      
      [ Upstream commit c7799c06 ]
      
      Remove the second tipc_rcv() call in tipc_udp_recv(). We have just
      checked that the bearer is not up, and calling tipc_rcv() with a bearer
      that is not up leads to a TIPC div-by-zero crash in
      tipc_node_calculate_timer(). The crash is rare in practice, but can
      happen like this:
      
        We're enabling a bearer, but it's not yet up and fully initialized.
        At the same time we receive a discovery packet, and in tipc_udp_recv()
        we end up calling tipc_rcv() with the not-yet-initialized bearer,
        causing later the div-by-zero crash in tipc_node_calculate_timer().
      
      Jon Maloy explains the impact of removing the second tipc_rcv() call:
        "link setup in the worst case will be delayed until the next arriving
         discovery messages, 1 sec later, and this is an acceptable delay."
      
      As the tipc_rcv() call is removed, just leave the function via the
      rcu_out label, so that we will kfree_skb().
      
      [   12.590450] Own node address <1.1.1>, network identity 1
      [   12.668088] divide error: 0000 [#1] SMP
      [   12.676952] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.14.2-dirty #1
      [   12.679225] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
      [   12.682095] task: ffff8c2a761edb80 task.stack: ffffa41cc0cac000
      [   12.684087] RIP: 0010:tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc]
      [   12.686486] RSP: 0018:ffff8c2a7fc838a0 EFLAGS: 00010246
      [   12.688451] RAX: 0000000000000000 RBX: ffff8c2a5b382600 RCX: 0000000000000000
      [   12.691197] RDX: 0000000000000000 RSI: ffff8c2a5b382600 RDI: ffff8c2a5b382600
      [   12.693945] RBP: ffff8c2a7fc838b0 R08: 0000000000000001 R09: 0000000000000001
      [   12.696632] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c2a5d8949d8
      [   12.699491] R13: ffffffff95ede400 R14: 0000000000000000 R15: ffff8c2a5d894800
      [   12.702338] FS:  0000000000000000(0000) GS:ffff8c2a7fc80000(0000) knlGS:0000000000000000
      [   12.705099] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   12.706776] CR2: 0000000001bb9440 CR3: 00000000bd009001 CR4: 00000000003606e0
      [   12.708847] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   12.711016] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   12.712627] Call Trace:
      [   12.713390]  <IRQ>
      [   12.714011]  tipc_node_check_dest+0x2e8/0x350 [tipc]
      [   12.715286]  tipc_disc_rcv+0x14d/0x1d0 [tipc]
      [   12.716370]  tipc_rcv+0x8b0/0xd40 [tipc]
      [   12.717396]  ? minmax_running_min+0x2f/0x60
      [   12.718248]  ? dst_alloc+0x4c/0xa0
      [   12.718964]  ? tcp_ack+0xaf1/0x10b0
      [   12.719658]  ? tipc_udp_is_known_peer+0xa0/0xa0 [tipc]
      [   12.720634]  tipc_udp_recv+0x71/0x1d0 [tipc]
      [   12.721459]  ? dst_alloc+0x4c/0xa0
      [   12.722130]  udp_queue_rcv_skb+0x264/0x490
      [   12.722924]  __udp4_lib_rcv+0x21e/0x990
      [   12.723670]  ? ip_route_input_rcu+0x2dd/0xbf0
      [   12.724442]  ? tcp_v4_rcv+0x958/0xa40
      [   12.725039]  udp_rcv+0x1a/0x20
      [   12.725587]  ip_local_deliver_finish+0x97/0x1d0
      [   12.726323]  ip_local_deliver+0xaf/0xc0
      [   12.726959]  ? ip_route_input_noref+0x19/0x20
      [   12.727689]  ip_rcv_finish+0xdd/0x3b0
      [   12.728307]  ip_rcv+0x2ac/0x360
      [   12.728839]  __netif_receive_skb_core+0x6fb/0xa90
      [   12.729580]  ? udp4_gro_receive+0x1a7/0x2c0
      [   12.730274]  __netif_receive_skb+0x1d/0x60
      [   12.730953]  ? __netif_receive_skb+0x1d/0x60
      [   12.731637]  netif_receive_skb_internal+0x37/0xd0
      [   12.732371]  napi_gro_receive+0xc7/0xf0
      [   12.732920]  receive_buf+0x3c3/0xd40
      [   12.733441]  virtnet_poll+0xb1/0x250
      [   12.733944]  net_rx_action+0x23e/0x370
      [   12.734476]  __do_softirq+0xc5/0x2f8
      [   12.734922]  irq_exit+0xfa/0x100
      [   12.735315]  do_IRQ+0x4f/0xd0
      [   12.735680]  common_interrupt+0xa2/0xa2
      [   12.736126]  </IRQ>
      [   12.736416] RIP: 0010:native_safe_halt+0x6/0x10
      [   12.736925] RSP: 0018:ffffa41cc0cafe90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff4d
      [   12.737756] RAX: 0000000000000000 RBX: ffff8c2a761edb80 RCX: 0000000000000000
      [   12.738504] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      [   12.739258] RBP: ffffa41cc0cafe90 R08: 0000014b5b9795e5 R09: ffffa41cc12c7e88
      [   12.740118] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
      [   12.740964] R13: ffff8c2a761edb80 R14: 0000000000000000 R15: 0000000000000000
      [   12.741831]  default_idle+0x2a/0x100
      [   12.742323]  arch_cpu_idle+0xf/0x20
      [   12.742796]  default_idle_call+0x28/0x40
      [   12.743312]  do_idle+0x179/0x1f0
      [   12.743761]  cpu_startup_entry+0x1d/0x20
      [   12.744291]  start_secondary+0x112/0x120
      [   12.744816]  secondary_startup_64+0xa5/0xa5
      [   12.745367] Code: b9 f4 01 00 00 48 89 c2 48 c1 ea 02 48 3d d3 07 00
      00 48 0f 47 d1 49 8b 0c 24 48 39 d1 76 07 49 89 14 24 48 89 d1 31 d2 48
      89 df <48> f7 f1 89 c6 e8 81 6e ff ff 5b 41 5c 5d c3 66 90 66 2e 0f 1f
      [   12.747527] RIP: tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc] RSP: ffff8c2a7fc838a0
      [   12.748555] ---[ end trace 1399ab83390650fd ]---
      [   12.749296] Kernel panic - not syncing: Fatal exception in interrupt
      [   12.750123] Kernel Offset: 0x13200000 from 0xffffffff82000000
      (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      [   12.751215] Rebooting in 60 seconds..
      
      Fixes: c9b64d49 ("tipc: add replicast peer discovery")
      Signed-off-by: default avatarTommi Rantala <tommi.t.rantala@nokia.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cf00fd3d
    • Julian Wiedmann's avatar
      s390/qeth: fix thinko in IPv4 multicast address tracking · 0cfe6df9
      Julian Wiedmann authored
      
      [ Upsteam commit bc3ab705 ]
      
      Commit 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      reworked how secondary addresses are managed for qeth devices.
      Instead of dropping & subsequently re-adding all addresses on every
      ndo_set_rx_mode() call, qeth now keeps track of the addresses that are
      currently registered with the HW.
      On a ndo_set_rx_mode(), we thus only need to do (de-)registration
      requests for the addresses that have actually changed.
      
      On L3 devices, the lookup for IPv4 Multicast addresses checks the wrong
      hashtable - and thus never finds a match. As a result, we first delete
      *all* such addresses, and then re-add them again. So each set_rx_mode()
      causes a short period where the IPv4 Multicast addresses are not
      registered, and the card stops forwarding inbound traffic for them.
      
      Fix this by setting the ->is_multicast flag on the lookup object, thus
      enabling qeth_l3_ip_from_hash() to search the correct hashtable and
      find a match there.
      
      Fixes: 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0cfe6df9
    • Julian Wiedmann's avatar
      s390/qeth: fix GSO throughput regression · 1d55222b
      Julian Wiedmann authored
      
      [ Upstream commit 6d69b1f1 ]
      
      Using GSO with small MTUs currently results in a substantial throughput
      regression - which is caused by how qeth needs to map non-linear skbs
      into its IO buffer elements:
      compared to a linear skb, each GSO-segmented skb effectively consumes
      twice as many buffer elements (ie two instead of one) due to the
      additional header-only part. This causes the Output Queue to be
      congested with low-utilized IO buffers.
      
      Fix this as follows:
      If the MSS is low enough so that a non-SG GSO segmentation produces
      order-0 skbs (currently ~3500 byte), opt out from NETIF_F_SG. This is
      where we anticipate the biggest savings, since an SG-enabled
      GSO segmentation produces skbs that always consume at least two
      buffer elements.
      
      Larger MSS values continue to get a SG-enabled GSO segmentation, since
      1) the relative overhead of the additional header-only buffer element
      becomes less noticeable, and
      2) the linearization overhead increases.
      
      With the throughput regression fixed, re-enable NETIF_F_SG by default to
      reap the significant CPU savings of GSO.
      
      Fixes: 5722963a ("qeth: do not turn on SG per default")
      Reported-by: default avatarNils Hoppmann <niho@de.ibm.com>
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d55222b
    • Julian Wiedmann's avatar
      s390/qeth: build max size GSO skbs on L2 devices · fbf0dfe7
      Julian Wiedmann authored
      
      [ Upstream commit 0cbff6d4 ]
      
      The current GSO skb size limit was copy&pasted over from the L3 path,
      where it is needed due to a TSO limitation.
      As L2 devices don't offer TSO support (and thus all GSO skbs are
      segmented before they reach the driver), there's no reason to restrict
      the stack in how large it may build the GSO skbs.
      
      Fixes: d52aec97 ("qeth: enable scatter/gather in layer 2 mode")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fbf0dfe7
    • Eric Dumazet's avatar
      tcp/dccp: block bh before arming time_wait timer · aa0080f1
      Eric Dumazet authored
      
      [ Upstream commit cfac7f83 ]
      
      Maciej Żenczykowski reported some panics in tcp_twsk_destructor()
      that might be caused by the following bug.
      
      timewait timer is pinned to the cpu, because we want to transition
      timwewait refcount from 0 to 4 in one go, once everything has been
      initialized.
      
      At the time commit ed2e9239 ("tcp/dccp: fix timewait races in timer
      handling") was merged, TCP was always running from BH habdler.
      
      After commit 5413d1ba ("net: do not block BH while processing
      socket backlog") we definitely can run tcp_time_wait() from process
      context.
      
      We need to block BH in the critical section so that the pinned timer
      has still its purpose.
      
      This bug is more likely to happen under stress and when very small RTO
      are used in datacenter flows.
      
      Fixes: 5413d1ba ("net: do not block BH while processing socket backlog")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarMaciej Żenczykowski <maze@google.com>
      Acked-by: default avatarMaciej Żenczykowski <maze@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aa0080f1
    • Lars Persson's avatar
      stmmac: reset last TSO segment size after device open · 30985e3b
      Lars Persson authored
      
      [ Upstream commit 45ab4b13 ]
      
      The mss variable tracks the last max segment size sent to the TSO
      engine. We do not update the hardware as long as we receive skb:s with
      the same value in gso_size.
      
      During a network device down/up cycle (mapped to stmmac_release() and
      stmmac_open() callbacks) we issue a reset to the hardware and it
      forgets the setting for mss. However we did not zero out our mss
      variable so the next transmission of a gso packet happens with an
      undefined hardware setting.
      
      This triggers a hang in the TSO engine and eventuelly the netdev
      watchdog will bark.
      
      Fixes: f748be53 ("stmmac: support new GMAC4")
      Signed-off-by: default avatarLars Persson <larper@axis.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      30985e3b
    • Eric Dumazet's avatar
      net: remove hlist_nulls_add_tail_rcu() · 564fe3e0
      Eric Dumazet authored
      
      [ Upstream commit d7efc6c1 ]
      
      Alexander Potapenko reported use of uninitialized memory [1]
      
      This happens when inserting a request socket into TCP ehash,
      in __sk_nulls_add_node_rcu(), since sk_reuseport is not initialized.
      
      Bug was added by commit d894ba18 ("soreuseport: fix ordering for
      mixed v4/v6 sockets")
      
      Note that d296ba60 ("soreuseport: Resolve merge conflict for v4/v6
      ordering fix") missed the opportunity to get rid of
      hlist_nulls_add_tail_rcu() :
      
      Both UDP sockets and TCP/DCCP listeners no longer use
      __sk_nulls_add_node_rcu() for their hash insertion.
      
      Since all other sockets have unique 4-tuple, the reuseport status
      has no special meaning, so we can always use hlist_nulls_add_head_rcu()
      for them and save few cycles/instructions.
      
      [1]
      
      ==================================================================
      BUG: KMSAN: use of uninitialized memory in inet_ehash_insert+0xd40/0x1050
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3288
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:16
       dump_stack+0x185/0x1d0 lib/dump_stack.c:52
       kmsan_report+0x13f/0x1c0 mm/kmsan/kmsan.c:1016
       __msan_warning_32+0x69/0xb0 mm/kmsan/kmsan_instr.c:766
       __sk_nulls_add_node_rcu ./include/net/sock.h:684
       inet_ehash_insert+0xd40/0x1050 net/ipv4/inet_hashtables.c:413
       reqsk_queue_hash_req net/ipv4/inet_connection_sock.c:754
       inet_csk_reqsk_queue_hash_add+0x1cc/0x300 net/ipv4/inet_connection_sock.c:765
       tcp_conn_request+0x31e7/0x36f0 net/ipv4/tcp_input.c:6414
       tcp_v4_conn_request+0x16d/0x220 net/ipv4/tcp_ipv4.c:1314
       tcp_rcv_state_process+0x42a/0x7210 net/ipv4/tcp_input.c:5917
       tcp_v4_do_rcv+0xa6a/0xcd0 net/ipv4/tcp_ipv4.c:1483
       tcp_v4_rcv+0x3de0/0x4ab0 net/ipv4/tcp_ipv4.c:1763
       ip_local_deliver_finish+0x6bb/0xcb0 net/ipv4/ip_input.c:216
       NF_HOOK ./include/linux/netfilter.h:248
       ip_local_deliver+0x3fa/0x480 net/ipv4/ip_input.c:257
       dst_input ./include/net/dst.h:477
       ip_rcv_finish+0x6fb/0x1540 net/ipv4/ip_input.c:397
       NF_HOOK ./include/linux/netfilter.h:248
       ip_rcv+0x10f6/0x15c0 net/ipv4/ip_input.c:488
       __netif_receive_skb_core+0x36f6/0x3f60 net/core/dev.c:4298
       __netif_receive_skb net/core/dev.c:4336
       netif_receive_skb_internal+0x63c/0x19c0 net/core/dev.c:4497
       napi_skb_finish net/core/dev.c:4858
       napi_gro_receive+0x629/0xa50 net/core/dev.c:4889
       e1000_receive_skb drivers/net/ethernet/intel/e1000/e1000_main.c:4018
       e1000_clean_rx_irq+0x1492/0x1d30
      drivers/net/ethernet/intel/e1000/e1000_main.c:4474
       e1000_clean+0x43aa/0x5970 drivers/net/ethernet/intel/e1000/e1000_main.c:3819
       napi_poll net/core/dev.c:5500
       net_rx_action+0x73c/0x1820 net/core/dev.c:5566
       __do_softirq+0x4b4/0x8dd kernel/softirq.c:284
       invoke_softirq kernel/softirq.c:364
       irq_exit+0x203/0x240 kernel/softirq.c:405
       exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:638
       do_IRQ+0x15e/0x1a0 arch/x86/kernel/irq.c:263
       common_interrupt+0x86/0x86
      
      Fixes: d894ba18 ("soreuseport: fix ordering for mixed v4/v6 sockets")
      Fixes: d296ba60 ("soreuseport: Resolve merge conflict for v4/v6 ordering fix")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAlexander Potapenko <glider@google.com>
      Acked-by: default avatarCraig Gallek <kraig@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      564fe3e0
    • Bjørn Mork's avatar
      usbnet: fix alignment for frames with no ethernet header · 80ad5bd1
      Bjørn Mork authored
      
      [ Upstream commit a4abd7a8 ]
      
      The qmi_wwan minidriver support a 'raw-ip' mode where frames are
      received without any ethernet header. This causes alignment issues
      because the skbs allocated by usbnet are "IP aligned".
      
      Fix by allowing minidrivers to disable the additional alignment
      offset. This is implemented using a per-device flag, since the same
      minidriver also supports 'ethernet' mode.
      
      Fixes: 32f7adf6 ("net: qmi_wwan: support "raw IP" mode")
      Reported-and-tested-by: default avatarJay Foster <jay@systech.com>
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      80ad5bd1
    • Eric Dumazet's avatar
      net/packet: fix a race in packet_bind() and packet_notifier() · 5471afee
      Eric Dumazet authored
      
      [ Upstream commit 15fe076e ]
      
      syzbot reported crashes [1] and provided a C repro easing bug hunting.
      
      When/if packet_do_bind() calls __unregister_prot_hook() and releases
      po->bind_lock, another thread can run packet_notifier() and process an
      NETDEV_UP event.
      
      This calls register_prot_hook() and hooks again the socket right before
      first thread is able to grab again po->bind_lock.
      
      Fixes this issue by temporarily setting po->num to 0, as suggested by
      David Miller.
      
      [1]
      dev_remove_pack: ffff8801bf16fa80 not found
      ------------[ cut here ]------------
      kernel BUG at net/core/dev.c:7945!  ( BUG_ON(!list_empty(&dev->ptype_all)); )
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      device syz0 entered promiscuous mode
      CPU: 0 PID: 3161 Comm: syzkaller404108 Not tainted 4.14.0+ #190
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff8801cc57a500 task.stack: ffff8801cc588000
      RIP: 0010:netdev_run_todo+0x772/0xae0 net/core/dev.c:7945
      RSP: 0018:ffff8801cc58f598 EFLAGS: 00010293
      RAX: ffff8801cc57a500 RBX: dffffc0000000000 RCX: ffffffff841f75b2
      RDX: 0000000000000000 RSI: 1ffff100398b1ede RDI: ffff8801bf1f8810
      device syz0 entered promiscuous mode
      RBP: ffff8801cc58f898 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801bf1f8cd8
      R13: ffff8801cc58f870 R14: ffff8801bf1f8780 R15: ffff8801cc58f7f0
      FS:  0000000001716880(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020b13000 CR3: 0000000005e25000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       rtnl_unlock+0xe/0x10 net/core/rtnetlink.c:106
       tun_detach drivers/net/tun.c:670 [inline]
       tun_chr_close+0x49/0x60 drivers/net/tun.c:2845
       __fput+0x333/0x7f0 fs/file_table.c:210
       ____fput+0x15/0x20 fs/file_table.c:244
       task_work_run+0x199/0x270 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x9bb/0x1ae0 kernel/exit.c:865
       do_group_exit+0x149/0x400 kernel/exit.c:968
       SYSC_exit_group kernel/exit.c:979 [inline]
       SyS_exit_group+0x1d/0x20 kernel/exit.c:977
       entry_SYSCALL_64_fastpath+0x1f/0x96
      RIP: 0033:0x44ad19
      
      Fixes: 30f7ea1c ("packet: race condition in packet_bind")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: Francesco Ruggeri <fruggeri@aristanetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5471afee
    • Mike Maloney's avatar
      packet: fix crash in fanout_demux_rollover() · 30c573af
      Mike Maloney authored
      
      syzkaller found a race condition fanout_demux_rollover() while removing
      a packet socket from a fanout group.
      
      po->rollover is read and operated on during packet_rcv_fanout(), via
      fanout_demux_rollover(), but the pointer is currently cleared before the
      synchronization in packet_release().   It is safer to delay the cleanup
      until after synchronize_net() has been called, ensuring all calls to
      packet_rcv_fanout() for this socket have finished.
      
      To further simplify synchronization around the rollover structure, set
      po->rollover in fanout_add() only if there are no errors.  This removes
      the need for rcu in the struct and in the call to
      packet_getsockopt(..., PACKET_ROLLOVER_STATS, ...).
      
      Crashing stack trace:
       fanout_demux_rollover+0xb6/0x4d0 net/packet/af_packet.c:1392
       packet_rcv_fanout+0x649/0x7c8 net/packet/af_packet.c:1487
       dev_queue_xmit_nit+0x835/0xc10 net/core/dev.c:1953
       xmit_one net/core/dev.c:2975 [inline]
       dev_hard_start_xmit+0x16b/0xac0 net/core/dev.c:2995
       __dev_queue_xmit+0x17a4/0x2050 net/core/dev.c:3476
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3509
       neigh_connected_output+0x489/0x720 net/core/neighbour.c:1379
       neigh_output include/net/neighbour.h:482 [inline]
       ip6_finish_output2+0xad1/0x22a0 net/ipv6/ip6_output.c:120
       ip6_finish_output+0x2f9/0x920 net/ipv6/ip6_output.c:146
       NF_HOOK_COND include/linux/netfilter.h:239 [inline]
       ip6_output+0x1f4/0x850 net/ipv6/ip6_output.c:163
       dst_output include/net/dst.h:459 [inline]
       NF_HOOK.constprop.35+0xff/0x630 include/linux/netfilter.h:250
       mld_sendpack+0x6a8/0xcc0 net/ipv6/mcast.c:1660
       mld_send_initial_cr.part.24+0x103/0x150 net/ipv6/mcast.c:2072
       mld_send_initial_cr net/ipv6/mcast.c:2056 [inline]
       ipv6_mc_dad_complete+0x99/0x130 net/ipv6/mcast.c:2079
       addrconf_dad_completed+0x595/0x970 net/ipv6/addrconf.c:4039
       addrconf_dad_work+0xac9/0x1160 net/ipv6/addrconf.c:3971
       process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2113
       worker_thread+0x223/0x1990 kernel/workqueue.c:2247
       kthread+0x35e/0x430 kernel/kthread.c:231
       ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:432
      
      Fixes: 0648ab70 ("packet: rollover prepare: per-socket state")
      Fixes: 509c7a1e ("packet: avoid panic in packet_getsockopt()")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarMike Maloney <maloney@google.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      30c573af
    • Hangbin Liu's avatar
      sit: update frag_off info · 5f218c3f
      Hangbin Liu authored
      
      [ Upstream commit f859b4af ]
      
      After parsing the sit netlink change info, we forget to update frag_off in
      ipip6_tunnel_update(). Fix it by assigning frag_off with new value.
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Acked-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f218c3f
    • Håkon Bugge's avatar
      rds: Fix NULL pointer dereference in __rds_rdma_map · 3259862d
      Håkon Bugge authored
      
      [ Upstream commit f3069c6d ]
      
      This is a fix for syzkaller719569, where memory registration was
      attempted without any underlying transport being loaded.
      
      Analysis of the case reveals that it is the setsockopt() RDS_GET_MR
      (2) and RDS_GET_MR_FOR_DEST (7) that are vulnerable.
      
      Here is an example stack trace when the bug is hit:
      
      BUG: unable to handle kernel NULL pointer dereference at 00000000000000c0
      IP: __rds_rdma_map+0x36/0x440 [rds]
      PGD 2f93d03067 P4D 2f93d03067 PUD 2f93d02067 PMD 0
      Oops: 0000 [#1] SMP
      Modules linked in: bridge stp llc tun rpcsec_gss_krb5 nfsv4
      dns_resolver nfs fscache rds binfmt_misc sb_edac intel_powerclamp
      coretemp kvm_intel kvm irqbypass crct10dif_pclmul c rc32_pclmul
      ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd
      iTCO_wdt mei_me sg iTCO_vendor_support ipmi_si mei ipmi_devintf nfsd
      shpchp pcspkr i2c_i801 ioatd ma ipmi_msghandler wmi lpc_ich mfd_core
      auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2
      mgag200 i2c_algo_bit drm_kms_helper ixgbe syscopyarea ahci sysfillrect
      sysimgblt libahci mdio fb_sys_fops ttm ptp libata sd_mod mlx4_core drm
      crc32c_intel pps_core megaraid_sas i2c_core dca dm_mirror
      dm_region_hash dm_log dm_mod
      CPU: 48 PID: 45787 Comm: repro_set2 Not tainted 4.14.2-3.el7uek.x86_64 #2
      Hardware name: Oracle Corporation ORACLE SERVER X5-2L/ASM,MOBO TRAY,2U, BIOS 31110000 03/03/2017
      task: ffff882f9190db00 task.stack: ffffc9002b994000
      RIP: 0010:__rds_rdma_map+0x36/0x440 [rds]
      RSP: 0018:ffffc9002b997df0 EFLAGS: 00010202
      RAX: 0000000000000000 RBX: ffff882fa2182580 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffc9002b997e40 RDI: ffff882fa2182580
      RBP: ffffc9002b997e30 R08: 0000000000000000 R09: 0000000000000002
      R10: ffff885fb29e3838 R11: 0000000000000000 R12: ffff882fa2182580
      R13: ffff882fa2182580 R14: 0000000000000002 R15: 0000000020000ffc
      FS:  00007fbffa20b700(0000) GS:ffff882fbfb80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000000000c0 CR3: 0000002f98a66006 CR4: 00000000001606e0
      Call Trace:
       rds_get_mr+0x56/0x80 [rds]
       rds_setsockopt+0x172/0x340 [rds]
       ? __fget_light+0x25/0x60
       ? __fdget+0x13/0x20
       SyS_setsockopt+0x80/0xe0
       do_syscall_64+0x67/0x1b0
       entry_SYSCALL64_slow_path+0x25/0x25
      RIP: 0033:0x7fbff9b117f9
      RSP: 002b:00007fbffa20aed8 EFLAGS: 00000293 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 00000000000c84a4 RCX: 00007fbff9b117f9
      RDX: 0000000000000002 RSI: 0000400000000114 RDI: 000000000000109b
      RBP: 00007fbffa20af10 R08: 0000000000000020 R09: 00007fbff9dd7860
      R10: 0000000020000ffc R11: 0000000000000293 R12: 0000000000000000
      R13: 00007fbffa20b9c0 R14: 00007fbffa20b700 R15: 0000000000000021
      
      Code: 41 56 41 55 49 89 fd 41 54 53 48 83 ec 18 8b 87 f0 02 00 00 48
      89 55 d0 48 89 4d c8 85 c0 0f 84 2d 03 00 00 48 8b 87 00 03 00 00 <48>
      83 b8 c0 00 00 00 00 0f 84 25 03 00 0 0 48 8b 06 48 8b 56 08
      
      The fix is to check the existence of an underlying transport in
      __rds_rdma_map().
      Signed-off-by: default avatarHåkon Bugge <haakon.bugge@oracle.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3259862d
    • Jon Maloy's avatar
      tipc: fix memory leak in tipc_accept_from_sock() · 96b4a8ac
      Jon Maloy authored
      
      [ Upstream commit a7d5f107 ]
      
      When the function tipc_accept_from_sock() fails to create an instance of
      struct tipc_subscriber it omits to free the already created instance of
      struct tipc_conn instance before it returns.
      
      We fix that with this commit.
      Reported-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      96b4a8ac
    • Julian Wiedmann's avatar
      s390/qeth: fix early exit from error path · 20610f5b
      Julian Wiedmann authored
      
      [ Upstream commit 83cf79a2 ]
      
      When the allocation of the addr buffer fails, we need to free
      our refcount on the inetdevice before returning.
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      20610f5b
    • Sebastian Sjoholm's avatar
      net: qmi_wwan: add Quectel BG96 2c7c:0296 · 32436bf3
      Sebastian Sjoholm authored
      
      [ Upstream commit f9409e7f ]
      
      Quectel BG96 is an Qualcomm MDM9206 based IoT modem, supporting both
      CAT-M and NB-IoT. Tested hardware is BG96 mounted on Quectel development
      board (EVB). The USB id is added to qmi_wwan.c to allow QMI
      communication with the BG96.
      Signed-off-by: default avatarSebastian Sjoholm <ssjoholm@mac.com>
      Acked-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      32436bf3
  2. 14 Dec, 2017 17 commits