1. 30 Mar, 2017 11 commits
    • tcp: initialize icsk_ack.lrcvtime at session start time · afaed241
      Eric Dumazet authored
      [ Upstream commit 15bb7745 ]
      
      icsk_ack.lrcvtime has a 0 value at socket creation time.
      
      tcpi_last_data_recv can have a bogus value if no payload is ever received.
      
      This patch initializes icsk_ack.lrcvtime for active sessions
      in tcp_finish_connect(), and for passive sessions in
      tcp_create_openreq_child().

      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
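
To illustrate why a zero icsk_ack.lrcvtime yields a bogus tcpi_last_data_recv, here is a minimal sketch. It assumes the reported value is simply the time elapsed since lrcvtime, converted to milliseconds. The function name, the tick rate, and the tick values are hypothetical and chosen only for illustration.

```python
HZ = 1000  # assumed tick rate (ticks per second); real kernels vary


def last_data_recv_ms(now_ticks, lrcvtime_ticks, hz=HZ):
    """Toy model: elapsed time since lrcvtime, in milliseconds."""
    return (now_ticks - lrcvtime_ticks) * 1000 // hz


# Hypothetical session: established at tick 5,000,000, queried 200 ticks
# later, with no payload received in between.
established = 5_000_000
now = established + 200

# lrcvtime left at its creation-time value of 0: the result tracks uptime.
uninitialized = last_data_recv_ms(now, 0)

# lrcvtime set when the session starts: the result tracks session age.
initialized = last_data_recv_ms(now, established)
```

With lrcvtime left at 0 the model reports roughly the system uptime rather than anything related to the connection, which is the bogus value the commit message describes.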
    • socket, bpf: fix sk_filter use after free in sk_clone_lock · 95aa915c
      Daniel Borkmann authored
      [ Upstream commit a97e50cc ]
      
      In sk_clone_lock(), we create a new socket and inherit most of the
      parent's members via sock_copy() which memcpy()'s various sections.
      Now, in case the parent socket had a BPF socket filter attached,
      then newsk->sk_filter points to the same instance as the original
      sk->sk_filter.
      
      sk_filter_charge() is then called on the newsk->sk_filter to take a
      reference and should that fail due to hitting max optmem, we bail
      out and release the newsk instance.
      
      The issue is that commit 278571ba ("net: filter: simplify socket
      charging") wrongly combined the dismantle path with the failure path
      of xfrm_sk_clone_policy(). This means, even when charging failed, we
      call sk_free_unlock_clone() on the newsk, which then still points to
      the same sk_filter as the original sk.
      
      Thus, sk_free_unlock_clone() calls into __sk_destruct() eventually
      where it tests for present sk_filter and calls sk_filter_uncharge()
      on it, which potentially lets sk_omem_alloc wrap around and releases
      the eBPF prog and sk_filter structure from the (still intact) parent.
      
      Fix it by making sure that when sk_filter_charge() failed, we reset
      newsk->sk_filter back to NULL before passing to sk_free_unlock_clone(),
      so that we don't mess with the parent's sk_filter.
      
      Only if xfrm_sk_clone_policy() fails, we did reach the point where
      either the parent's filter was NULL and as a result newsk's as well
      or where we previously had a successful sk_filter_charge(), thus for
      that case, we do need sk_filter_uncharge() to release the prior taken
      reference on sk_filter.
      
      Fixes: 278571ba ("net: filter: simplify socket charging")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
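
The failure path described above can be modelled with a small reference-counting sketch. This is a toy model, not kernel code: the classes, the `optmem_available` flag, and the `reset_on_failure` switch are all hypothetical names used to contrast the old and fixed behaviour.

```python
class Filter:
    """Toy stand-in for a shared sk_filter with a reference count."""

    def __init__(self):
        self.refcnt = 1      # reference held by the parent socket
        self.freed = False

    def put(self):
        self.refcnt -= 1
        if self.refcnt == 0:
            self.freed = True


class Sock:
    def __init__(self, sk_filter=None):
        self.sk_filter = sk_filter


def charge(flt, optmem_available):
    """Take a reference for the clone; fail when option memory is exhausted."""
    if not optmem_available:
        return False
    flt.refcnt += 1
    return True


def destruct(sk):
    """Teardown drops a filter reference if a filter is still attached."""
    if sk.sk_filter is not None:
        sk.sk_filter.put()
        sk.sk_filter = None


def clone(parent, optmem_available, reset_on_failure):
    newsk = Sock(parent.sk_filter)   # memcpy-style copy: same filter instance
    if newsk.sk_filter is not None and not charge(newsk.sk_filter, optmem_available):
        if reset_on_failure:
            newsk.sk_filter = None   # the fix: the clone never owned a reference
        destruct(newsk)
        return None
    return newsk
```

Without the reset, a failed charge makes the teardown drop a reference the clone never took, so the parent's filter is released while the parent still points at it.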
    • ipv4: provide stronger user input validation in nl_fib_input() · 38dece41
      Eric Dumazet authored
      [ Upstream commit c64c0b3c ]
      
      Alexander reported a KMSAN splat caused by reads of an uninitialized
      field (tb_id_in) from a user-provided struct fib_result_nl.
      
      It turns out that the nl_fib_input() sanity tests on user input are a bit
      wrong:
      
      A user can claim that nlh->nlmsg_len is big enough, but provide
      a too-small buffer at sendmsg() time.

      Reported-by: Alexander Potapenko <glider@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
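
The class of check involved can be sketched as follows: the length claimed in the message header must be validated against the number of bytes actually received, not just against the expected payload size. This is an illustration only, not the kernel's code. `PAYLOAD_SIZE` is an assumed stand-in for the size of struct fib_result_nl, and `make_msg` and `request_is_valid` are hypothetical helpers.

```python
import struct

NLMSG_HDRLEN = 16    # u32 len, u16 type, u16 flags, u32 seq, u32 pid
PAYLOAD_SIZE = 20    # assumed payload size, for illustration only


def make_msg(claimed_len, actual_payload_len):
    """Build a message whose header may lie about the total length."""
    header = struct.pack("=IHHII", claimed_len, 0, 0, 0, 0)
    return header + bytes(actual_payload_len)


def request_is_valid(buf):
    # The buffer must really hold a header plus a full payload.
    if len(buf) < NLMSG_HDRLEN + PAYLOAD_SIZE:
        return False
    (nlmsg_len,) = struct.unpack_from("=I", buf, 0)
    # The claimed length must not exceed what was actually received.
    if len(buf) < nlmsg_len:
        return False
    # The claimed payload must be large enough for the expected struct.
    if nlmsg_len - NLMSG_HDRLEN < PAYLOAD_SIZE:
        return False
    return True
```

A message built with `make_msg(36, 4)` claims 36 bytes but carries only 20, so a check that trusts the header alone would go on to read bytes that were never supplied.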
    • net: bcmgenet: remove bcmgenet_internal_phy_setup() · 85f00dac
      Doug Berger authored
      [ Upstream commit 31739eae ]
      
      Commit 6ac3ce82 ("net: bcmgenet: Remove excessive PHY reset")
      removed the bcmgenet_mii_reset() function from bcmgenet_power_up() and
      bcmgenet_internal_phy_setup() functions.  In so doing it broke the reset
      of the internal PHY devices used by the GENETv1-GENETv3 which required
      this reset before the UniMAC was enabled.  It also broke the internal
      GPHY devices used by the GENETv4 because the config_init that installed
      the AFE workaround was no longer occurring after the reset of the GPHY
      performed by bcmgenet_phy_power_set() in bcmgenet_internal_phy_setup().
      In addition the code in bcmgenet_internal_phy_setup() related to the
      "enable APD" comment goes with the bcmgenet_mii_reset() so it should
      have also been removed.
      
      Commit bd4060a6 ("net: bcmgenet: Power on integrated GPHY in
      bcmgenet_power_up()") moved the bcmgenet_phy_power_set() call to the
      bcmgenet_power_up() function, but failed to remove it from the
      bcmgenet_internal_phy_setup() function.  Had it done so, the
      bcmgenet_internal_phy_setup() function would have been empty and could
      have been removed at that time.
      
      Commit 5dbebbb4 ("net: bcmgenet: Software reset EPHY after power on")
      was submitted to correct the functional problems introduced by
      commit 6ac3ce82 ("net: bcmgenet: Remove excessive PHY reset"). It
      was included in v4.4 and made available on 4.3-stable. Unfortunately,
      it didn't fully revert the commit because this bcmgenet_mii_reset()
      doesn't apply the soft reset to the internal GPHY used by GENETv4 like
      the previous one did. This prevents the restoration of the AFE
      workarounds for internal GPHY devices after the bcmgenet_phy_power_set() in
      bcmgenet_internal_phy_setup().
      
      This commit takes the alternate approach of removing the unnecessary
      bcmgenet_internal_phy_setup() function which shouldn't have been in v4.3
      so that when bcmgenet_mii_reset() was restored it should have only gone
      into bcmgenet_power_up().  This will avoid the problems while also
      removing the redundancy (and hopefully some of the confusion).
      
      Fixes: 6ac3ce82 ("net: bcmgenet: Remove excessive PHY reset")
      Signed-off-by: Doug Berger <opendmb@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/mlx5e: Count LRO packets correctly · fdcee7c1
      Gal Pressman authored
      [ Upstream commit 8ab7e2ae ]
      
      The RX packet statistics ('rx_packets' counter) used to count each LRO
      packet as one, even though it contains multiple segments.
      This patch increments the counter by the number of segments, which
      aligns the driver with the behavior of other drivers in the stack.
      
      Note that no information is lost in this patch, because the
      'rx_lro_packets' counter still exists.
      
      Before, ethtool showed:
      $ ethtool -S ens6 | egrep "rx_packets|rx_lro_packets"
           rx_packets: 435277
           rx_lro_packets: 35847
           rx_packets_phy: 1935066
      
      Now, we will see the more logical statistics:
      $ ethtool -S ens6 | egrep "rx_packets|rx_lro_packets"
           rx_packets: 1935066
           rx_lro_packets: 35847
           rx_packets_phy: 1935066
      
      Fixes: e586b3b0 ("net/mlx5: Ethernet Datapath files")
      Signed-off-by: Gal Pressman <galp@mellanox.com>
      Cc: kernel-team@fb.com
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
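
The two ethtool snapshots above can be reconciled with a little arithmetic. The sketch assumes both snapshots describe the same traffic, which the identical rx_lro_packets and rx_packets_phy values suggest but the commit message does not state outright. The variable names are illustrative.

```python
# Counter values copied from the ethtool output in the commit message.
before_rx_packets = 435277    # each LRO packet counted once
after_rx_packets = 1935066    # each LRO packet counted once per segment
rx_lro_packets = 35847

# Packets that did not go through LRO are counted the same way in both cases.
non_lro_packets = before_rx_packets - rx_lro_packets

# Whatever remains of the new total must be LRO segments.
lro_segments = after_rx_packets - non_lro_packets

# Average number of segments aggregated into one LRO packet.
avg_segments_per_lro = lro_segments / rx_lro_packets
```

Under that assumption, 399430 packets bypassed LRO and the 35847 LRO packets carried 1535636 segments, about 43 segments each, which is why the new rx_packets value matches rx_packets_phy.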
    • net/mlx5: Increase number of max QPs in default profile · 9d1894cb
      Maor Gottlieb authored
      [ Upstream commit 5f40b4ed ]
      
      With ConnectX-4 sharing SRQs from the same space as QPs, we hit a
      limit that prevents some applications from allocating the number of
      QPs they need. Double the size to 256K.
      
      Fixes: e126ba97 ('mlx5: Add driver for Mellanox Connect-IB adapters')
      Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: unix: properly re-increment inflight counter of GC discarded candidates · 610c6bcc
      Andrey Ulanov authored
      [ Upstream commit 7df9c246 ]
      
      Dmitry has reported that a BUG_ON() condition in unix_notinflight()
      may be triggered by simple code that forwards a unix socket in an
      SCM_RIGHTS message.
      That is caused by an incorrect unix socket GC implementation in unix_gc().
      
      The GC first collects a list of candidates, then (a) decrements their
      "children's" inflight counter, (b) checks which inflight counters are
      now 0, and then (c) increments all inflight counters back.
      (a) and (c) are done by calling scan_children() with inc_inflight or
      dec_inflight as the second argument.
      
      Commit 6209344f ("net: unix: fix inflight counting bug in garbage
      collector") changed scan_children() such that it no longer considers
      sockets that do not have the UNIX_GC_CANDIDATE flag. It also added a block
      of code that unsets this flag _before_ invoking
      scan_children(, dec_inflight, ). This may lead to incorrect inflight
      counters for some sockets.
      
      This change fixes this bug by changing the order of operations:
      UNIX_GC_CANDIDATE is now unset only after all inflight counters are
      restored to the original state.
      
        kernel BUG at net/unix/garbage.c:149!
        RIP: 0010:[<ffffffff8717ebf4>]  [<ffffffff8717ebf4>]
        unix_notinflight+0x3b4/0x490 net/unix/garbage.c:149
        Call Trace:
         [<ffffffff8716cfbf>] unix_detach_fds.isra.19+0xff/0x170 net/unix/af_unix.c:1487
         [<ffffffff8716f6a9>] unix_destruct_scm+0xf9/0x210 net/unix/af_unix.c:1496
         [<ffffffff86a90a01>] skb_release_head_state+0x101/0x200 net/core/skbuff.c:655
         [<ffffffff86a9808a>] skb_release_all+0x1a/0x60 net/core/skbuff.c:668
         [<ffffffff86a980ea>] __kfree_skb+0x1a/0x30 net/core/skbuff.c:684
         [<ffffffff86a98284>] kfree_skb+0x184/0x570 net/core/skbuff.c:705
         [<ffffffff871789d5>] unix_release_sock+0x5b5/0xbd0 net/unix/af_unix.c:559
         [<ffffffff87179039>] unix_release+0x49/0x90 net/unix/af_unix.c:836
         [<ffffffff86a694b2>] sock_release+0x92/0x1f0 net/socket.c:570
         [<ffffffff86a6962b>] sock_close+0x1b/0x20 net/socket.c:1017
         [<ffffffff81a76b8e>] __fput+0x34e/0x910 fs/file_table.c:208
         [<ffffffff81a771da>] ____fput+0x1a/0x20 fs/file_table.c:244
         [<ffffffff81483ab0>] task_work_run+0x1a0/0x280 kernel/task_work.c:116
         [<     inline     >] exit_task_work include/linux/task_work.h:21
         [<ffffffff8141287a>] do_exit+0x183a/0x2640 kernel/exit.c:828
         [<ffffffff8141383e>] do_group_exit+0x14e/0x420 kernel/exit.c:931
         [<ffffffff814429d3>] get_signal+0x663/0x1880 kernel/signal.c:2307
         [<ffffffff81239b45>] do_signal+0xc5/0x2190 arch/x86/kernel/signal.c:807
         [<ffffffff8100666a>] exit_to_usermode_loop+0x1ea/0x2d0
        arch/x86/entry/common.c:156
         [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:190
         [<ffffffff81009693>] syscall_return_slowpath+0x4d3/0x570
        arch/x86/entry/common.c:259
         [<ffffffff881478e6>] entry_SYSCALL_64_fastpath+0xc4/0xc6
      
      Link: https://lkml.org/lkml/2017/3/6/252
      Signed-off-by: Andrey Ulanov <andreyu@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Fixes: 6209344f ("net: unix: fix inflight counting bug in garbage collector")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
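
The ordering problem can be shown with a deliberately simplified toy model of steps (a) to (c). It is not the real unix_gc(), which does considerably more. The class, the `clear_flag_early` switch, and the two-socket scenario are hypothetical. The only property borrowed from the commit message is that scan_children() skips sockets without the candidate flag.

```python
class USock:
    def __init__(self, name, inflight=0):
        self.name = name
        self.inflight = inflight
        self.candidate = False
        self.children = []   # sockets whose fds sit in this socket's queue


def scan_children(sock, func):
    # Only children still flagged as candidates are visited.
    for child in sock.children:
        if child.candidate:
            func(child)


def dec_inflight(s):
    s.inflight -= 1


def inc_inflight(s):
    s.inflight += 1


def run_gc(candidates, clear_flag_early):
    for s in candidates:
        s.candidate = True
    for s in candidates:                                    # (a) decrement
        scan_children(s, dec_inflight)
    not_cycle = [s for s in candidates if s.inflight > 0]   # (b) check
    if clear_flag_early:                                    # old ordering
        for s in not_cycle:
            s.candidate = False
    for s in candidates:                                    # (c) restore
        scan_children(s, inc_inflight)
    if not clear_flag_early:                                # fixed ordering
        for s in not_cycle:
            s.candidate = False


def build():
    # A is referenced once from candidate B's queue and once from elsewhere.
    a = USock("A", inflight=2)
    b = USock("B", inflight=1)
    b.children = [a]
    return a, b
```

In this model, clearing the flag before step (c) leaves A one reference short, because the restoring scan no longer visits it. Clearing the flag afterwards returns every counter to its starting value.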
    • amd-xgbe: Fix jumbo MTU processing on newer hardware · ae43f936
      Lendacky, Thomas authored
      [ Upstream commit 622c36f1 ]
      
      Newer hardware does not provide a cumulative payload length when multiple
      descriptors are needed to handle the data. Once the MTU increases beyond
      the size that can be handled by a single descriptor, the SKB does not get
      built properly by the driver.
      
      The driver will now calculate the size of the data buffers used by the
      hardware.  The first buffer of the first descriptor is for packet headers
      or packet headers and data when the headers can't be split. Subsequent
      descriptors in a multi-descriptor chain will not use the first buffer. The
      second buffer is used by all the descriptors in the chain for payload data.
      Based on whether the driver is processing the first, intermediate, or last
      descriptor it can calculate the buffer usage and build the SKB properly.
      
      Tested and verified on both old and new hardware.

      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
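
The per-descriptor buffer accounting described above might look roughly like the following sketch. All function and parameter names are hypothetical. It also assumes that the total packet length is known once the last descriptor is reached, which the commit message does not spell out.

```python
def buf1_len(first, last, hdr_len, buf1_size, pkt_len):
    """Bytes used in the first (header) buffer of one descriptor."""
    if not first:
        return 0                    # only the first descriptor uses buffer 1
    if hdr_len:
        return hdr_len              # headers were split off into buffer 1
    if not last:
        return buf1_size            # no split, more to come: buffer 1 is full
    return min(buf1_size, pkt_len)  # no split, single descriptor


def buf2_len(last, buf2_size, pkt_len, consumed):
    """Bytes used in the second (payload) buffer of one descriptor."""
    if not last:
        return buf2_size            # not the last descriptor: buffer 2 is full
    return pkt_len - consumed       # last descriptor: whatever remains


def chain_usage(n_desc, hdr_len, buf1_size, buf2_size, pkt_len):
    """Walk a descriptor chain and return (buf1, buf2) usage per descriptor."""
    usage, consumed = [], 0
    for i in range(n_desc):
        first, last = i == 0, i == n_desc - 1
        b1 = buf1_len(first, last, hdr_len, buf1_size, pkt_len)
        consumed += b1
        b2 = buf2_len(last, buf2_size, pkt_len, consumed)
        consumed += b2
        usage.append((b1, b2))
    return usage


# Hypothetical jumbo frame: 9000 bytes, 66-byte split header, 4096-byte buffers.
example = chain_usage(n_desc=3, hdr_len=66, buf1_size=256,
                      buf2_size=4096, pkt_len=9000)
```

For this example the three descriptors account for 66 + 4096, 4096, and 742 bytes, which sum to the 9000-byte packet without relying on a cumulative length from the hardware.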
    • net: properly release sk_frag.page · f3126725
      Eric Dumazet authored
      [ Upstream commit 22a0e18e ]
      
      I mistakenly added the code to release sk->sk_frag in
      sk_common_release() instead of sk_destruct().
      
      TCP sockets using sk->sk_allocation == GFP_ATOMIC do not call
      sk_common_release() at close time, thus leaking one (order-3) page.
      
      iSCSI is using such sockets.
      
      Fixes: 5640f768 ("net: use a per task frag allocator")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: bcmgenet: Do not suspend PHY if Wake-on-LAN is enabled · 12f0bffc
      Florian Fainelli authored
      [ Upstream commit 5371bbf4 ]
      
      Suspending the PHY would put it in a low power state where it
      may no longer allow us to do Wake-on-LAN.
      
      Fixes: cc013fb4 ("net: bcmgenet: correctly suspend and resume PHY device")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/openvswitch: Set the ipv6 source tunnel key address attribute correctly · b362d673
      Or Gerlitz authored
      [ Upstream commit 3d20f1f7 ]
      
      When dealing with the IPv6 source tunnel key address attribute
      (OVS_TUNNEL_KEY_ATTR_IPV6_SRC), we wrongly set the tunnel
      dst IP. Fix that.
      
      Fixes: 6b26ba3a ('openvswitch: netlink attributes for IPv6 tunneling')
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reported-by: Paul Blakey <paulb@mellanox.com>
      Acked-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 26 Mar, 2017 29 commits