1. 19 Sep, 2016 8 commits
    • Thadeu Lima de Souza Cascardo's avatar
      openvswitch: fix flow stats accounting when node 0 is not possible · 40773966
      Thadeu Lima de Souza Cascardo authored
      On a system with only node 1 as possible, all statistics is going to be
      accounted on node 0 as it will have a single writer.
      
      However, when getting and clearing the statistics, node 0 is not going
      to be considered, as it's not a possible node.
      
      Tested that statistics are not zero on a system with only node 1
      possible. Also compile-tested with CONFIG_NUMA off.
      Signed-off-by: default avatarThadeu Lima de Souza Cascardo <cascardo@redhat.com>
      Acked-by: default avatarPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      40773966
    • David S. Miller's avatar
      Merge branch 'sctp-transmit-errs' · 829ff348
      David S. Miller authored
      Xin Long says:
      
      ====================
      sctp: fix the transmit err process
      
      This patchset is to improve the transmit err process and also fix some
      issues.
      
      After this patchset, once the chunks are enqueued successfully, even
      if the chunks fail to send out, no matter because of nodst or nomem,
      no err retruns back to users any more. Instead, they are taken care
      of by retransmit.
      
      v1->v2:
        - add more details to the changelog in patch 1/6
        - add Fixes: tag in patch 2/6, 3/6
        - also revert 69b5777f in patch 3/6
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      829ff348
    • Xin Long's avatar
      sctp: not return ENOMEM err back in sctp_packet_transmit · 41001faf
      Xin Long authored
      As David and Marcelo's suggestion, ENOMEM err shouldn't return back to
      user in transmit path. Instead, sctp's retransmit would take care of
      the chunks that fail to send because of ENOMEM.
      
      This patch is only to do some release job when alloc_skb fails, not to
      return ENOMEM back any more.
      
      Besides, it also cleans up sctp_packet_transmit's err path, and fixes
      some issues in err path:
      
       - It didn't free the head skb in nomem: path.
       - No need to check nskb in no_route: path.
       - It should goto err: path if alloc_skb fails for head.
       - Not all the NOMEMs should free nskb.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      41001faf
    • Xin Long's avatar
      sctp: make sctp_outq_flush/tail/uncork return void · 83dbc3d4
      Xin Long authored
      sctp_outq_flush return value is meaningless now, this patch is
      to make sctp_outq_flush return void, as well as sctp_outq_fail
      and sctp_outq_uncork.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      83dbc3d4
    • Xin Long's avatar
      sctp: save transmit error to sk_err in sctp_outq_flush · 64519440
      Xin Long authored
      Every time when sctp calls sctp_outq_flush, it sends out the chunks of
      control queue, retransmit queue and data queue. Even if some trunks are
      failed to transmit, it still has to flush all the transports, as it's
      the only chance to clean that transmit_list.
      
      So the latest transmit error here should be returned back. This transmit
      error is an internal error of sctp stack.
      
      I checked all the places where it uses the transmit error (the return
      value of sctp_outq_flush), most of them are actually just save it to
      sk_err.
      
      Except for sctp_assoc/endpoint_bh_rcv, they will drop the chunk if
      it's failed to send a REPLY, which is actually incorrect, as we can't
      be sure the error that sctp_outq_flush returns is from sending that
      REPLY.
      
      So it's meaningless for sctp_outq_flush to return error back.
      
      This patch is to save transmit error to sk_err in sctp_outq_flush, the
      new error can update the old value. Eventually, sctp_wait_for_* would
      check for it.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      64519440
    • Xin Long's avatar
      sctp: free msg->chunks when sctp_primitive_SEND return err · b61c654f
      Xin Long authored
      Last patch "sctp: do not return the transmit err back to sctp_sendmsg"
      made sctp_primitive_SEND return err only when asoc state is unavailable.
      In this case, chunks are not enqueued, they have no chance to be freed if
      we don't take care of them later.
      
      This Patch is actually to revert commit 1cd4d5c4 ("sctp: remove the
      unused sctp_datamsg_free()"), commit 69b5777f ("sctp: hold the chunks
      only after the chunk is enqueued in outq") and commit 8b570dc9 ("sctp:
      only drop the reference on the datamsg after sending a msg"), to use
      sctp_datamsg_free to free the chunks of current msg.
      
      Fixes: 8b570dc9 ("sctp: only drop the reference on the datamsg after sending a msg")
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b61c654f
    • Xin Long's avatar
      sctp: do not return the transmit err back to sctp_sendmsg · 66388f2c
      Xin Long authored
      Once a chunk is enqueued successfully, sctp queues can take care of it.
      Even if it is failed to transmit (like because of nomem), it should be
      put into retransmit queue.
      
      If sctp report this error to users, it confuses them, they may resend
      that msg, but actually in kernel sctp stack is in charge of retransmit
      it already.
      
      Besides, this error probably is not from the failure of transmitting
      current msg, but transmitting or retransmitting another msg's chunks,
      as sctp_outq_flush just tries to send out all transports' chunks.
      
      This patch is to make sctp_cmd_send_msg return avoid, and not return the
      transmit err back to sctp_sendmsg
      
      Fixes: 8b570dc9 ("sctp: only drop the reference on the datamsg after sending a msg")
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      66388f2c
    • Xin Long's avatar
      sctp: remove the unnecessary state check in sctp_outq_tail · 2c89791e
      Xin Long authored
      Data Chunks are only sent by sctp_primitive_SEND, in which sctp checks
      the asoc's state through statetable before calling sctp_outq_tail. So
      there's no need to check the asoc's state again in sctp_outq_tail.
      
      Besides, sctp_do_sm is protected by lock_sock, even if sending msg is
      interrupted by timer events, the event's processes still need to acquire
      lock_sock first. It means no others CMDs can be enqueue into side effect
      list before CMD_SEND_MSG to change asoc->state, so it's safe to remove it.
      
      This patch is to remove redundant asoc->state check from sctp_outq_tail.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2c89791e
  2. 17 Sep, 2016 18 commits
    • David S. Miller's avatar
      Merge branch 'ip_tunnel-collect_md' · fd9527f4
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      ip_tunnel: add collect_md mode to IPv4/IPv6 tunnels
      
      Similar to geneve, vxlan, gre tunnels implement 'collect metadata' mode
      in ipip, ipip6, ip6ip6 tunnels.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fd9527f4
    • Alexei Starovoitov's avatar
      samples/bpf: add comprehensive ipip, ipip6, ip6ip6 test · 173ca26e
      Alexei Starovoitov authored
      the test creates 3 namespaces with veth connected via bridge.
      First two namespaces simulate two different hosts with the same
      IPv4 and IPv6 addresses configured on the tunnel interface and they
      communicate with outside world via standard tunnels.
      Third namespace creates collect_md tunnel that is driven by BPF
      program which selects different remote host (either first or
      second namespace) based on tcp dest port number while tcp dst
      ip is the same.
      This scenario is rough approximation of load balancer use case.
      The tests check both traditional tunnel configuration and collect_md mode.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      173ca26e
    • Alexei Starovoitov's avatar
      samples/bpf: extend test_tunnel_bpf.sh with IPIP test · a1c82704
      Alexei Starovoitov authored
      extend existing tests for vxlan, geneve, gre to include IPIP tunnel.
      It tests both traditional tunnel configuration and
      dynamic via bpf helpers.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a1c82704
    • Alexei Starovoitov's avatar
      ip6_tunnel: add collect_md mode to IPv6 tunnels · 8d79266b
      Alexei Starovoitov authored
      Similar to gre, vxlan, geneve tunnels allow IPIP6 and IP6IP6 tunnels
      to operate in 'collect metadata' mode.
      Unlike ipv4 code here it's possible to reuse ip6_tnl_xmit() function
      for both collect_md and traditional tunnels.
      bpf_skb_[gs]et_tunnel_key() helpers and ovs (in the future) are the users.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8d79266b
    • Alexei Starovoitov's avatar
      ip_tunnel: add collect_md mode to IPIP tunnel · cfc7381b
      Alexei Starovoitov authored
      Similar to gre, vxlan, geneve tunnels allow IPIP tunnels to
      operate in 'collect metadata' mode.
      bpf_skb_[gs]et_tunnel_key() helpers can make use of it right away.
      ovs can use it as well in the future (once appropriate ovs-vport
      abstractions and user apis are added).
      Note that just like in other tunnels we cannot cache the dst,
      since tunnel_info metadata can be different for every packet.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cfc7381b
    • Julia Lawall's avatar
      l2tp: constify net_device_ops structures · eb94737d
      Julia Lawall authored
      Check for net_device_ops structures that are only stored in the netdev_ops
      field of a net_device structure.  This field is declared const, so
      net_device_ops structures that have this property can be declared as const
      also.
      
      The semantic patch that makes this change is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @r disable optional_qualifier@
      identifier i;
      position p;
      @@
      static struct net_device_ops i@p = { ... };
      
      @ok@
      identifier r.i;
      struct net_device e;
      position p;
      @@
      e.netdev_ops = &i@p;
      
      @bad@
      position p != {r.p,ok.p};
      identifier r.i;
      struct net_device_ops e;
      @@
      e@i@p
      
      @depends on !bad disable optional_qualifier@
      identifier r.i;
      @@
      static
      +const
       struct net_device_ops i = { ... };
      // </smpl>
      
      The result of size on this file before the change is:
         text	      data     bss     dec         hex	  filename
         3401        931      44    4376        1118	net/l2tp/l2tp_eth.o
      
      and after the change it is:
         text	     data        bss	    dec	    hex	filename
         3993       347         44       4384    1120	net/l2tp/l2tp_eth.o
      Signed-off-by: default avatarJulia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      eb94737d
    • Julia Lawall's avatar
      dwc_eth_qos: constify net_device_ops structures · 37307504
      Julia Lawall authored
      Check for net_device_ops structures that are only stored in the netdev_ops
      field of a net_device structure.  This field is declared const, so
      net_device_ops structures that have this property can be declared as const
      also.
      
      The semantic patch that makes this change is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @r disable optional_qualifier@
      identifier i;
      position p;
      @@
      static struct net_device_ops i@p = { ... };
      
      @ok@
      identifier r.i;
      struct net_device e;
      position p;
      @@
      e.netdev_ops = &i@p;
      
      @bad@
      position p != {r.p,ok.p};
      identifier r.i;
      struct net_device_ops e;
      @@
      e@i@p
      
      @depends on !bad disable optional_qualifier@
      identifier r.i;
      @@
      static
      +const
       struct net_device_ops i = { ... };
      // </smpl>
      
      The result of size on this file before the change is:
         text	      data     bss     dec         hex	  filename
        21623       1316      40   22979        59c3
         drivers/net/ethernet/synopsys/dwc_eth_qos.o
      
      and after the change it is:
         text	     data        bss	    dec	    hex	filename
        22199       724         40      22963    59b3
         drivers/net/ethernet/synopsys/dwc_eth_qos.o
      Signed-off-by: default avatarJulia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      37307504
    • Julia Lawall's avatar
      hisilicon: constify net_device_ops structures · 66f58ec4
      Julia Lawall authored
      Check for net_device_ops structures that are only stored in the netdev_ops
      field of a net_device structure.  This field is declared const, so
      net_device_ops structures that have this property can be declared as const
      also.
      
      The semantic patch that makes this change is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @r disable optional_qualifier@
      identifier i;
      position p;
      @@
      static struct net_device_ops i@p = { ... };
      
      @ok@
      identifier r.i;
      struct net_device e;
      position p;
      @@
      e.netdev_ops = &i@p;
      
      @bad@
      position p != {r.p,ok.p};
      identifier r.i;
      struct net_device_ops e;
      @@
      e@i@p
      
      @depends on !bad disable optional_qualifier@
      identifier r.i;
      @@
      static
      +const
       struct net_device_ops i = { ... };
      // </smpl>
      
      The result of size on this file before the change is:
      
         text	      data     bss     dec         hex	  filename
         7995	       848       8    8851        2293
         drivers/net/ethernet/hisilicon/hip04_eth.o
      
      and after the change it is:
      
         text	     data        bss	    dec	    hex	filename
         8571	      256          8       8835    2283
         drivers/net/ethernet/hisilicon/hip04_eth.o
      Signed-off-by: default avatarJulia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      66f58ec4
    • Alan Cox's avatar
      llc: switch type to bool as the timeout is only tested versus 0 · 5ff904d5
      Alan Cox authored
      (As asked by Dave in Februrary)
      Signed-off-by: default avatarAlan Cox <alan@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5ff904d5
    • David Ahern's avatar
      net: l3mdev: Remove netif_index_is_l3_master · 19664c6a
      David Ahern authored
      No longer used after e0d56fdd ("net: l3mdev: remove redundant calls")
      Signed-off-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      19664c6a
    • David Ahern's avatar
      net: vrf: Remove RT_FL_TOS · e1fb9d03
      David Ahern authored
      No longer used after d66f6c0a ("net: ipv4: Remove l3mdev_get_saddr")
      Signed-off-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e1fb9d03
    • Eric Dumazet's avatar
      tcp: prepare skbs for better sack shifting · 3613b3db
      Eric Dumazet authored
      With large BDP TCP flows and lossy networks, it is very important
      to keep a low number of skbs in the write queue.
      
      RACK and SACK processing can perform a linear scan of it.
      
      We should avoid putting any payload in skb->head, so that SACK
      shifting can be done if needed.
      
      With this patch, we allow to pack ~0.5 MB per skb instead of
      the 64KB initially cooked at tcp_sendmsg() time.
      
      This gives a reduction of number of skbs in write queue by eight.
      tcp_rack_detect_loss() likes this.
      
      We still allow payload in skb->head for first skb put in the queue,
      to not impact RPC workloads.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3613b3db
    • David S. Miller's avatar
      Merge tag 'wireless-drivers-next-for-davem-2016-09-15' of... · e812bd90
      David S. Miller authored
      Merge tag 'wireless-drivers-next-for-davem-2016-09-15' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
      
      Kalle Valo says:
      
      ====================
      wireless-drivers-next patches for 4.9
      
      Major changes:
      
      iwlwifi
      
      * preparation for new a000 HW continues
      * some DQA improvements
      * add support for GMAC
      * add support for 9460, 9270 and 9170 series
      
      mwifiex
      
      * support random MAC address for scanning
      * add HT aggregation support for adhoc mode
      * add custom regulatory domain support
      * add manufacturing mode support via nl80211 testmode interface
      
      bcma
      
      * support BCM53573 series of wireless SoCs
      
      bitfield.h
      
      * add FIELD_PREP() and FIELD_GET() macros
      
      mt7601u
      
      * convert to use the new bitfield.h macros
      
      brcmfmac
      
      * add support for bcm4339 chip with modalias sdio:c00v02D0d4339
      
      ath10k
      
      * add nl80211 testmode support for 10.4 firmware
      * hide kernel addresses from logs using %pK format specifier
      * implement NAPI support
      * enable peer stats by default
      
      ath9k
      
      * use ieee80211_tx_status_noskb where possible
      
      wil6210
      
      * extract firmware capabilities from the firmware file
      
      ath6kl
      
      * enable firmware crash dumps on the AR6004
      
      ath-current is also merged to fix a conflict in ath10k.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e812bd90
    • David S. Miller's avatar
      Merge branch 'mlx5e-order-0' · 31b96621
      David S. Miller authored
      Tariq Toukan says:
      
      ====================
      mlx5e Order-0 pages for Striding RQ
      
      In this series, we refactor our Striding RQ receive-flow to always use
      fragmented WQEs (Work Queue Elements) using order-0 pages, omitting the
      flow that allocates and splits high-order pages which would fragment
      and deplete high-order pages in the system.
      
      The first patch gives a slight degradation, but opens the opportunity
      to using a simple page-cache mechanism of a fair size.
      The page-cache, implemented in patch 3, not only closes the performance
      gap but even gives a gain.
      In patch 2 we re-organize the code to better manage the calls for
      alloc/de-alloc pages in the RX flow.
      
      Series generated against net-next commit:
      bed806cb "Merge branch 'mlxsw-ethtool'"
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      31b96621
    • Tariq Toukan's avatar
      net/mlx5e: Implement RX mapped page cache for page recycle · 4415a031
      Tariq Toukan authored
      Instead of reallocating and mapping pages for RX data-path,
      recycle already used pages in a per ring cache.
      
      Performance tests:
      The following results were measured on a freshly booted system,
      giving optimal baseline performance, as high-order pages are yet to
      be fragmented and depleted.
      
      We ran pktgen single-stream benchmarks, with iptables-raw-drop:
      
      Single stride, 64 bytes:
      * 4,739,057 - baseline
      * 4,749,550 - order0 no cache
      * 4,786,899 - order0 with cache
      1% gain
      
      Larger packets, no page cross, 1024 bytes:
      * 3,982,361 - baseline
      * 3,845,682 - order0 no cache
      * 4,127,852 - order0 with cache
      3.7% gain
      
      Larger packets, every 3rd packet crosses a page, 1500 bytes:
      * 3,731,189 - baseline
      * 3,579,414 - order0 no cache
      * 3,931,708 - order0 with cache
      5.4% gain
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4415a031
    • Tariq Toukan's avatar
      net/mlx5e: Introduce API for RX mapped pages · a5a0c590
      Tariq Toukan authored
      Manage the allocation and deallocation of mapped RX pages only
      through dedicated API functions.
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a5a0c590
    • Tariq Toukan's avatar
      net/mlx5e: Single flow order-0 pages for Striding RQ · 7e426671
      Tariq Toukan authored
      To improve the memory consumption scheme, we omit the flow that
      demands and splits high-order pages in Striding RQ, and stay
      with a single Striding RQ flow that uses order-0 pages.
      
      Moving to fragmented memory allows the use of larger MPWQEs,
      which reduces the number of UMR posts and filler CQEs.
      
      Moving to a single flow allows several optimizations that improve
      performance, especially in production servers where we would
      anyway fallback to order-0 allocations:
      - inline functions that were called via function pointers.
      - improve the UMR post process.
      
      This patch alone is expected to give a slight performance reduction.
      However, the new memory scheme gives the possibility to use a page-cache
      of a fair size, that doesn't inflate the memory footprint, which will
      dramatically fix the reduction and even give a performance gain.
      
      Performance tests:
      The following results were measured on a freshly booted system,
      giving optimal baseline performance, as high-order pages are yet to
      be fragmented and depleted.
      
      We ran pktgen single-stream benchmarks, with iptables-raw-drop:
      
      Single stride, 64 bytes:
      * 4,739,057 - baseline
      * 4,749,550 - this patch
      no reduction
      
      Larger packets, no page cross, 1024 bytes:
      * 3,982,361 - baseline
      * 3,845,682 - this patch
      3.5% reduction
      
      Larger packets, every 3rd packet crosses a page, 1500 bytes:
      * 3,731,189 - baseline
      * 3,579,414 - this patch
      4% reduction
      
      Fixes: 461017cb ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)")
      Fixes: bc77b240 ("net/mlx5e: Add fragmented memory support for RX multi packet WQE")
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7e426671
    • David Howells's avatar
      rxrpc: Make IPv6 support conditional on CONFIG_IPV6 · d1912747
      David Howells authored
      Add CONFIG_AF_RXRPC_IPV6 and make the IPv6 support code conditional on it.
      This is then made conditional on CONFIG_IPV6.
      
      Without this, the following can be seen:
      
         net/built-in.o: In function `rxrpc_init_peer':
      >> peer_object.c:(.text+0x18c3c8): undefined reference to `ip6_route_output_flags'
      Reported-by: default avatarkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d1912747
  3. 16 Sep, 2016 14 commits