  1. 10 Feb, 2017 1 commit
  2. 18 Mar, 2015 1 commit
    • udp: only allow UFO for packets from SOCK_DGRAM sockets · 6b313008
      Michal Kubeček authored
      [ Upstream commit acf8dd0a ]
      
      If an over-MTU UDP datagram is sent through a SOCK_RAW socket to a
      UFO-capable device, ip_ufo_append_data() sets skb->ip_summed to
      CHECKSUM_PARTIAL unconditionally as all GSO code assumes transport layer
      checksum is to be computed on segmentation. However, in this case,
      skb->csum_start and skb->csum_offset are never set as raw socket
      transmit path bypasses udp_send_skb(), where they are usually set. As a
      result, the driver may access invalid memory when trying to calculate the
      checksum and store the result (as observed in the virtio_net driver).
      
      Moreover, the very idea of modifying the userspace provided UDP header
      is IMHO against raw socket semantics (I wasn't able to find a document
      clearly stating this or the opposite, though). And while allowing
      CHECKSUM_NONE in the UFO case would be more efficient, it would be a bit
      too intrusive a change just to handle a corner case like this. Therefore,
      restricting UFO to packets from SOCK_DGRAM sockets seems to be the best
      option.
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6b313008
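
      A minimal kernel-style sketch of the guard this commit describes; the
      helper name and its argument list are hypothetical, not the actual patch:

          /* Only let the UFO path handle datagrams coming from SOCK_DGRAM
           * sockets: raw sockets bypass udp_send_skb() and never fill
           * skb->csum_start / skb->csum_offset.
           */
          static bool may_use_ufo(const struct sock *sk,
                                  const struct net_device *dev,
                                  unsigned int length, unsigned int mtu)
          {
                  return length > mtu &&                   /* needs segmentation  */
                         (dev->features & NETIF_F_UFO) &&  /* device supports UFO */
                         sk->sk_type == SOCK_DGRAM;        /* not a raw socket    */
          }
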
  3. 27 Feb, 2015 2 commits
  4. 14 Nov, 2014 1 commit
  5. 14 Aug, 2014 1 commit
    • inetpeer: get rid of ip_id_count · ff1f69a8
      Eric Dumazet authored
      [ Upstream commit 73f156a6 ]
      
      Ideally, we would generate the IP ID using a per-destination-IP
      generator.
      
      Linux kernels used the inet_peer cache for this purpose, but this had a
      huge cost on servers with MTU discovery disabled.
      
      1) each inet_peer struct consumes 192 bytes
      
      2) inetpeer cache uses a binary tree of inet_peer structs,
         with a nominal size of ~66000 elements under load.
      
      3) lookups in this tree are hitting a lot of cache lines, as tree depth
         is about 20.
      
      4) If the server handles many TCP flows, there is a high probability of
         not finding the inet_peer, allocating a fresh one, and inserting it in
         the tree with the same initial ip_id_count (cf. secure_ip_id()).
      
      5) We garbage collect inet_peer aggressively.
      
      IP ID generation does not have to be 'perfect'.
      
      The goal is to avoid duplicates over a short period of time,
      so that reassembly units have a chance to complete reassembly of
      fragments belonging to one message before receiving other fragments
      with a recycled ID.
      
      We simply use an array of generators and a Jenkins hash of the dst IP
      as the key.
      
      ipv6_select_ident() is put back into net/ipv6/ip6_output.c, where it
      belongs (it is only used from that file).
      
      secure_ip_id() and secure_ipv6_id() are no longer needed.
      
      Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
      unnecessary decrement/increment of the number of segments.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ff1f69a8
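
      The idea can be sketched in a few lines of user-space C (the array size,
      hash, and function names here are only illustrative; the kernel uses
      jhash and an atomic add):

          #include <stdint.h>
          #include <stdio.h>

          #define ID_SLOTS 2048                 /* illustrative slot count */
          static uint32_t idents[ID_SLOTS];     /* one generator per slot  */

          /* Toy mixing function standing in for the kernel's Jenkins hash. */
          static uint32_t mix(uint32_t daddr)
          {
                  daddr ^= daddr >> 16;
                  daddr *= 0x45d9f3bu;
                  daddr ^= daddr >> 16;
                  return daddr;
          }

          /* Pick the generator for this destination and advance it by the
           * number of segments, so every segment gets a distinct ID.
           */
          static uint16_t ip_ident_for(uint32_t daddr, uint32_t segs)
          {
                  uint32_t *slot = &idents[mix(daddr) % ID_SLOTS];
                  uint16_t id = (uint16_t)*slot;

                  *slot += segs;
                  return id;
          }

          int main(void)
          {
                  /* Two datagrams to 10.0.0.1 draw from the same generator. */
                  printf("%u\n", ip_ident_for(0x0a000001u, 1));
                  printf("%u\n", ip_ident_for(0x0a000001u, 1));
                  return 0;
          }
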
  6. 04 Nov, 2013 1 commit
    • inet: fix possible memory corruption with UDP_CORK and UFO · b90cd7b9
      Hannes Frederic Sowa authored
      [ This is a simplified -stable version of a set of upstream commits. ]
      
      This is a replacement patch only for stable which does fix the problems
      handled by the following two commits in -net:
      
      "ip_output: do skb ufo init for peeked non ufo skb as well" (e93b7d74)
      "ip6_output: do skb ufo init for peeked non ufo skb as well" (c547dbf5)
      
      Three frames are written to a corked UDP socket whose output netdevice
      has UFO enabled.  If the first and third frames are smaller than the MTU
      and the second one is bigger, we enqueue the second frame with
      skb_append_datato_frags() without initializing the GSO fields. The third
      frame is then appended regularly, which constructs an invalid skb.
      
      The problem is fixed by always using skb_append_datato_frags() as soon
      as the first frag has been enqueued to the skb, without marking the
      packet as SKB_GSO_UDP.
      
      The problem with only two frames for ipv6 was fixed by "ipv6: udp
      packets following an UFO enqueued packet need also be handled by UFO"
      (2811ebac).
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b90cd7b9
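
      A kernel-style sketch of the rule described above; the helper name is
      hypothetical and this is not the actual patch:

          /* Once the corked skb already carries page frags from an earlier
           * over-MTU append, later frames must also go through
           * skb_append_datato_frags() (without tagging SKB_GSO_UDP), instead
           * of the regular copy path that would corrupt the skb layout.
           */
          static bool must_append_to_frags(struct sk_buff *skb,
                                           unsigned int length, unsigned int mtu)
          {
                  return skb_shinfo(skb)->nr_frags > 0 ||  /* already frag-based */
                         length > mtu;                     /* over-MTU frame     */
          }
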
  7. 13 Oct, 2013 2 commits
  8. 11 May, 2013 1 commit
  9. 01 Apr, 2013 1 commit
  10. 13 Feb, 2013 1 commit
    • net: Fix possible wrong checksum generation. · c9af6db4
      Pravin B Shelar authored
      Patch cef401de (net: fix possible wrong checksum
      generation) fixed wrong checksum calculation, but it broke TSO by
      defining a new GSO type without a corresponding netdev feature.
      net_gso_ok() would not allow hardware checksum/segmentation
      offload of such packets without that feature.
      
      The following patch fixes both TSO and the wrong checksum, using the
      same logic that Eric Dumazet used. It introduces a new flag,
      SKBTX_SHARED_FRAG, set when at least one frag can be modified by
      the user, but the flag is kept in the skb shared info tx_flags
      rather than in gso_type.
      
      tx_flags is a better fit than gso_type, since an skb can have a shared
      frag without being a GSO packet. This avoids linking SHARED_FRAG to
      GSO, so there is no need to define a netdev feature for it.
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c9af6db4
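
      A short kernel-style sketch of where the hint lives; illustrative only,
      not the patch itself:

          /* The "a frag may still be modified by user space" hint is carried
           * in skb_shinfo(skb)->tx_flags, so it also applies to non-GSO skbs
           * and needs no netdev feature bit.
           */
          static void mark_shared_frag(struct sk_buff *skb)
          {
                  skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
          }

          static bool has_shared_frag(struct sk_buff *skb)
          {
                  return skb_shinfo(skb)->tx_flags & SKBTX_SHARED_FRAG;
          }
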
  11. 09 Dec, 2012 1 commit
  12. 08 Oct, 2012 1 commit
    • ipv4: introduce rt_uses_gateway · 155e8336
      Julian Anastasov authored
      Add a new flag to remember when the route is via a gateway.
      We will use it to allow rt_gateway to contain the address of a
      directly connected host for the cases when DST_NOCACHE is
      used or when the NH exception caches a per-destination route
      without the DST_NOCACHE flag, i.e. when routes are not used for
      other destinations. This way we force neighbour resolution
      to work with the routed destination, while a different address
      can be used in the packet, a feature needed
      for IPVS-DR, where the original packet for a virtual IP is routed
      via a route to the real IP.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      155e8336
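
      An illustrative use of the new flag (the helper name is hypothetical,
      not the patch itself):

          /* Resolve the neighbour against the gateway when the route uses
           * one, otherwise against the packet's own destination; the packet
           * may still carry a different address (the IPVS-DR case).
           */
          static __be32 choose_nexthop(const struct rtable *rt, __be32 pkt_daddr)
          {
                  return rt->rt_uses_gateway ? rt->rt_gateway : pkt_daddr;
          }
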
  13. 24 Sep, 2012 1 commit
    • net: use a per task frag allocator · 5640f768
      Eric Dumazet authored
      We currently use a per socket order-0 page cache for tcp_sendmsg()
      operations.
      
      This page is used to build fragments for skbs.
      
      This is done to increase the probability of coalescing small write()s
      into single segments in skbs still in the write queue (not yet sent).
      
      But it wastes a lot of memory for applications handling many mostly
      idle sockets, since each socket holds one page in sk->sk_sndmsg_page.
      
      It is also quite inefficient to build 64KB TSO packets, because we need
      about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit the
      page allocator more often than wanted.
      
      This patch adds a per task frag allocator and uses bigger pages,
      if available (up to 32768 bytes per frag, that is, order-3 pages on
      x86). An automatic fallback is done in case of memory pressure.
      
      This increases TCP stream performance by 20% on the loopback device,
      but also benefits other network devices, since 8x fewer frags are
      mapped on transmit and unmapped on tx completion. Alexander Duyck
      mentioned a probable performance win on systems with IOMMU enabled.
      
      It is possible that some SG-enabled hardware cannot cope with bigger
      fragments, but its ndo_start_xmit() should already handle this by
      splitting a fragment into sub-fragments, since some arches have
      PAGE_SIZE = 65536.
      
      Successfully tested on various Ethernet devices
      (ixgbe, igb, bnx2x, tg3, Mellanox mlx4).
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5640f768
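
      A user-space sketch of the allocation policy (names and sizes are
      illustrative; the kernel allocates pages, not heap memory):

          #include <stdio.h>
          #include <stdlib.h>

          struct frag_cache {
                  void   *buf;       /* current frag buffer          */
                  size_t  size;      /* its size                     */
                  size_t  offset;    /* next free byte within buffer */
          };

          /* Try a big 32KB (order-3) buffer first and fall back to smaller
           * sizes, down to a single 4KB page, under memory pressure.
           */
          static int frag_cache_refill(struct frag_cache *c)
          {
                  size_t want;

                  for (want = 32768; want >= 4096; want >>= 1) {
                          c->buf = malloc(want);  /* stand-in for alloc_pages() */
                          if (c->buf) {
                                  c->size = want;
                                  c->offset = 0;
                                  return 0;
                          }
                  }
                  return -1;
          }

          int main(void)
          {
                  struct frag_cache c = { 0 };

                  if (frag_cache_refill(&c) == 0)
                          printf("got a %zu-byte frag buffer\n", c.size);
                  free(c.buf);
                  return 0;
          }
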
  14. 26 Aug, 2012 1 commit
  15. 21 Aug, 2012 1 commit
  16. 10 Aug, 2012 2 commits
  17. 06 Aug, 2012 1 commit
  18. 22 Jul, 2012 1 commit
  19. 20 Jul, 2012 1 commit
  20. 19 Jul, 2012 1 commit
    • ipv4: tcp: remove per net tcp_sock · be9f4a44
      Eric Dumazet authored
      tcp_v4_send_reset() and tcp_v4_send_ack() use a single socket
      per network namespace.
      
      This leads to bad behavior on multiqueue NICs, because many CPUs
      contend for the socket lock and, once the socket lock is acquired,
      extra false sharing on various socket fields slows down the operations.
      
      To better resist attacks, we use a per-cpu socket. Each CPU can
      run without contention, using appropriate (local node) memory.
      
      Additional features :
      
      1) We also mirror the queue_mapping of the incoming skb, so that
      answers use the same queue if possible.
      
      2) Setting the SOCK_USE_WRITE_QUEUE socket flag speeds up sock_wfree().
      
      3) We now limit the number of in-flight RST/ACK [1] packets
      per CPU, instead of per namespace, and we honor the sysctl_wmem_default
      limit dynamically. (Prior to this patch, the sysctl_wmem_default value
      was copied at boot time, so any later change would not affect the
      tcp_sock limit.)
      
      [1] These packets are only generated when no socket was matched for
      the incoming packet.
      Reported-by: Bill Sommerfeld <wsommerfeld@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      be9f4a44
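
      A kernel-style sketch of the per-cpu control socket idea; the variable
      and helper names are hypothetical, not the patch itself:

          /* One control socket per CPU instead of one per namespace, so that
           * RST/ACK generation never contends on a shared socket lock.
           */
          static struct sock * __percpu *ctl_sk;  /* set up with alloc_percpu() */

          static struct sock *ctl_sock_this_cpu(void)
          {
                  /* each CPU uses its own socket and its own (local) memory */
                  return *this_cpu_ptr(ctl_sk);
          }
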
  21. 05 Jul, 2012 2 commits
  22. 28 Jun, 2012 1 commit
  23. 13 Jun, 2012 1 commit
    • net-next: add dev_loopback_xmit() to avoid duplicate code · 95603e22
      Michel Machado authored
      Add dev_loopback_xmit() in order to deduplicate functions
      ip_dev_loopback_xmit() (in net/ipv4/ip_output.c) and
      ip6_dev_loopback_xmit() (in net/ipv6/ip6_output.c).
      
      I was about to reinvent the wheel when I noticed that
      ip_dev_loopback_xmit() and ip6_dev_loopback_xmit() do exactly what I
      need and are not IP-only functions, but they were not available to reuse
      elsewhere.
      
      ip6_dev_loopback_xmit() does not have the line "skb_dst_force(skb);", but
      I understand that it is harmless and should be in dev_loopback_xmit().
      Signed-off-by: Michel Machado <michel@digirati.com.br>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      CC: James Morris <jmorris@namei.org>
      CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jpirko@redhat.com>
      CC: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
      CC: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      95603e22
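
      The shared helper can be sketched roughly as follows (reconstructed as
      an illustration from the two IP-specific versions, not quoted from the
      patch):

          /* Loop a fully built packet back into the local receive path. */
          static int loopback_xmit_sketch(struct sk_buff *skb)
          {
                  skb_reset_mac_header(skb);
                  __skb_pull(skb, skb_network_offset(skb)); /* point at L3 header */
                  skb->pkt_type = PACKET_LOOPBACK;
                  skb->ip_summed = CHECKSUM_UNNECESSARY;
                  skb_dst_force(skb);  /* the line the IPv6 version was missing */
                  netif_rx_ni(skb);    /* hand the packet to local receive      */
                  return 0;
          }
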
  24. 04 Jun, 2012 1 commit
  25. 15 May, 2012 1 commit
  26. 28 Mar, 2012 1 commit
  27. 05 Dec, 2011 1 commit
  28. 01 Dec, 2011 1 commit
  29. 24 Oct, 2011 1 commit
    • ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT · 66b13d99
      Eric Dumazet authored
      There is a long-standing bug in the Linux TCP stack concerning ACK
      messages sent on behalf of TIME_WAIT sockets.
      
      In the IP header of the ACK message, we choose to reflect the TOS field
      of the incoming message, and this might break some setups.
      
      Examples of things that were broken:
        - Routing using TOS as a selector
        - Firewalls
        - Traffic classification / shaping
      
      We now remember the inet tos field in the timewait structure and use it
      for ACK generation and route lookup.
      
      Notes:
       - We still reflect the incoming TOS in RST messages.
       - We could extend MuraliRaja Muniraju's patch to report the TOS value in
      netlink messages for TIME_WAIT sockets.
       - A patch is needed for IPv6.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      66b13d99
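
      A small sketch of the approach (field and helper names are simplified
      and illustrative):

          #include <stdint.h>

          struct tw_sketch {
                  uint8_t tw_tos;   /* TOS remembered when entering TIME_WAIT */
          };

          /* Saved once, when the socket moves to TIME_WAIT. */
          static void tw_save_tos(struct tw_sketch *tw, uint8_t inet_tos)
          {
                  tw->tw_tos = inet_tos;
          }

          /* Used for both the ACK's IP header and the route lookup, instead
           * of echoing the TOS of the packet that triggered the ACK.
           */
          static uint8_t tw_ack_tos(const struct tw_sketch *tw)
          {
                  return tw->tw_tos;
          }
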
  30. 19 Oct, 2011 1 commit
  31. 25 Aug, 2011 1 commit
  32. 08 Aug, 2011 1 commit
  33. 03 Aug, 2011 1 commit
  34. 22 Jul, 2011 1 commit
  35. 18 Jul, 2011 1 commit
  36. 17 Jul, 2011 1 commit