1. 17 Jan, 2017 9 commits
    • mpls: Packet stats · 27d69105
      Robert Shearman authored
      Having MPLS packet stats is useful for observing network operation and
      for diagnosing network problems. In the absence of anything better,
      RFC2863 and RFC3813 are used for guidance on which stats to expose
      and on their semantics. In particular, rx_noroutes maps to
      ifInUnknownProtos in RFC2863. The stats are exposed to userspace via
      AF_MPLS attributes embedded in the IFLA_STATS_AF_SPEC attribute of
      RTM_GETSTATS messages.
      
      All the introduced fields are 64-bit, even error ones, to ensure no
      overflow with long uptimes. Per-CPU counters are used to avoid
      cache-line contention on the commonly used fields. The other fields
      have also been made per-CPU to avoid performance problems in
      error conditions, on the assumption that on some platforms atomic
      operations could cost more than sending the packet
      (which is what would be done in the success case). If that's not the
      case, we could instead not use per-CPU counters for these fields.
      
      Only unicast and non-fragment are exposed at the moment, but other
      counters can be exposed in the future either by adding to the end of
      struct mpls_link_stats or by additional netlink attributes in the
      AF_MPLS IFLA_STATS_AF_SPEC nested attribute.
      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
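The per-CPU counter idea described in this commit can be sketched as a small model. This is only an illustration in Python, not the kernel code: `rx_noroutes` is named in the commit message, while `rx_packets`, `rx_bytes`, `LinkStats` and `PerCpuStats` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, fields

U64_MASK = (1 << 64) - 1  # counters are modelled as 64-bit values


@dataclass
class LinkStats:
    """Illustrative subset of per-link counters (field set is an assumption)."""
    rx_packets: int = 0
    rx_bytes: int = 0
    rx_noroutes: int = 0


class PerCpuStats:
    """One private LinkStats slot per CPU; readers sum all slots."""

    def __init__(self, num_cpus: int) -> None:
        self.per_cpu = [LinkStats() for _ in range(num_cpus)]

    def inc_rx(self, cpu: int, nbytes: int) -> None:
        # Only the owning CPU writes its slot, so writers never contend.
        slot = self.per_cpu[cpu]
        slot.rx_packets = (slot.rx_packets + 1) & U64_MASK
        slot.rx_bytes = (slot.rx_bytes + nbytes) & U64_MASK

    def inc_noroutes(self, cpu: int) -> None:
        slot = self.per_cpu[cpu]
        slot.rx_noroutes = (slot.rx_noroutes + 1) & U64_MASK

    def read(self) -> LinkStats:
        # The reader pays the cost: it walks every CPU's slot and adds them up.
        total = LinkStats()
        for slot in self.per_cpu:
            for f in fields(LinkStats):
                summed = getattr(total, f.name) + getattr(slot, f.name)
                setattr(total, f.name, summed & U64_MASK)
        return total
```

The trade-off shown here is the one the message discusses: updates touch only CPU-local state, and the rarely executed read path does the aggregation.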
    • net: AF-specific RTM_GETSTATS attributes · aefb4d4a
      Robert Shearman authored
      Add the functionality for including address-family-specific per-link
      stats in RTM_GETSTATS messages. This is done through adding a new
      IFLA_STATS_AF_SPEC attribute under which address family attributes are
      nested and then the AF-specific attributes can be further nested. This
      follows the model of IFLA_AF_SPEC on RTM_*LINK messages and it has the
      advantage of presenting an easily extended hierarchy. The rtnl_af_ops
      structure is extended to provide AFs with the opportunity to fill and
      provide the size of their stats attributes.
      
      One alternative would have been to provide AFs with the ability to add
      attributes directly into the RTM_GETSTATS message without a nested
      hierarchy. I discounted this approach as it increases the rate at
      which the 32 attribute number space is used up and it makes
      implementation a little more tricky for stats dump resuming (at the
      moment the order in which attributes are added to the message has to
      match the numeric order of the attributes).
      
      Another alternative would have been to register per-AF RTM_GETSTATS
      handlers. I discounted this approach as I perceived a common use-case
      to be getting all the stats for an interface and this approach would
      necessitate multiple requests/dumps to retrieve them all.
      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
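The nested hierarchy described above (an AF-spec attribute containing a per-family attribute, which in turn contains the family's own attributes) can be sketched with a generic netlink attribute encoder. This is a userspace illustration only: the helper names `nla` and `parse` are hypothetical, and the numeric attribute values are assumptions that should be checked against the kernel's uapi headers before being relied on.

```python
import struct

NLA_HDRLEN = 4  # attribute header: u16 length (header + payload), u16 type

# Numeric values below are assumptions for illustration; verify against
# the uapi headers of the kernel you target.
IFLA_STATS_AF_SPEC = 5
AF_MPLS = 28
MPLS_STATS_LINK = 1


def nla(attr_type: int, payload: bytes) -> bytes:
    """Encode one attribute; the payload is padded to a 4-byte boundary."""
    length = NLA_HDRLEN + len(payload)
    pad = (-length) % 4
    return struct.pack("=HH", length, attr_type) + payload + b"\x00" * pad


def parse(buf: bytes) -> dict:
    """Decode one level of attributes into {type: payload}."""
    attrs = {}
    off = 0
    while off + NLA_HDRLEN <= len(buf):
        length, attr_type = struct.unpack_from("=HH", buf, off)
        if length < NLA_HDRLEN or off + length > len(buf):
            break  # malformed attribute, stop parsing
        attrs[attr_type] = buf[off + NLA_HDRLEN:off + length]
        off += (length + 3) & ~3  # advance to the next aligned attribute
    return attrs


# Two made-up 64-bit counters stand in for the real stats structure.
stats_blob = struct.pack("=2Q", 7, 1234)
message = nla(IFLA_STATS_AF_SPEC, nla(AF_MPLS, nla(MPLS_STATS_LINK, stats_blob)))
```

A reader would call `parse` once per nesting level: first on the message, then on the `IFLA_STATS_AF_SPEC` payload, then on the `AF_MPLS` payload. Only one attribute number is consumed at the top level regardless of how many families add stats, which is the advantage the message argues for.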
    • net: marvell: sky2: use new api ethtool_{get|set}_link_ksettings · 55f78fcd
      Philippe Reynes authored
      The ethtool API {get|set}_settings is deprecated.
      We move this driver to the new API {get|set}_link_ksettings.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: marvell: skge: use new api ethtool_{get|set}_link_ksettings · 0f826385
      Philippe Reynes authored
      The ethtool API {get|set}_settings is deprecated.
      We move this driver to the new API {get|set}_link_ksettings.
      
      The callback set_link_ksettings no longer updates the value
      of advertising, as the struct ethtool_link_ksettings is
      defined as const.
      
      As I don't have the hardware, I'd be very pleased if
      someone could test this patch.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: jme: use new api ethtool_{get|set}_link_ksettings · c523838c
      Philippe Reynes authored
      The ethtool API {get|set}_settings is deprecated.
      We move this driver to the new API {get|set}_link_ksettings.
      
      As I don't have the hardware, I'd be very pleased if
      someone could test this patch.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: korina: use new api ethtool_{get|set}_link_ksettings · af473688
      Philippe Reynes authored
      The ethtool API {get|set}_settings is deprecated.
      We move this driver to the new API {get|set}_link_ksettings.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mvneta-xmit_more-bql' · b8128c42
      David S. Miller authored
      Marcin Wojtas says:
      
      ====================
      mvneta xmit_more and bql support
      
      This is a delayed v2 of a short patchset that introduces xmit_more and BQL
      to the mvneta driver. The only change is in the xmit_more support: a
      condition check that prevents excessive concatenation of descriptors
      before flushing in HW.
      
      Any comments or feedback would be welcome.
      
      Changelog:
      v1 -> v2:
      
      * Add a condition check that ensures too many descriptors are not
        concatenated before flushing in HW.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvneta: add BQL support · a29b6235
      Marcin Wojtas authored
      Tests showed that when the whole bandwidth is consumed, the latency for
      various kinds of traffic can reach high values. With a saturated
      link (e.g. with iperf from target to host), a simple ping could take a
      significant amount of time. BQL proved to improve this situation
      when implemented in the mvneta driver. Measurements of ping latency
      for 3 link speeds:

      Speed (Mbps) | Latency w/o BQL | Latency with BQL
      10           | 7-14 ms         | 3.5 ms
      100          | 2-12 ms         | 0.6 ms
      1000         | often timeout   | up to 2 ms
      
      Decreasing latency as above results in a slight performance cost: 4 kpps
      (-1.4%) when pushing 64B packets via two bridged interfaces of Armada 38x.
      For 1500B packets in the same setup, the mpstat tool showed +8% of
      CPU occupation (default affinity, second CPU idle). Even so, this
      cost seems reasonable to take, considering the other improvements.
      
      This commit adds byte queue limit mechanism for the mvneta driver.
      Signed-off-by: Marcin Wojtas <mw@semihalf.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
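The basic mechanism behind a byte queue limit can be illustrated with a toy model. This is a simplified sketch, not the driver or the kernel's BQL implementation: the class and method names are hypothetical, and the limit here is fixed, whereas real BQL adjusts it dynamically.

```python
class ByteQueue:
    """Toy model of a TX queue that is paused once too many bytes are in flight."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit = limit_bytes   # fixed here; an assumption for simplicity
        self.inflight = 0          # bytes handed to hardware, not yet completed
        self.stopped = False       # True means the stack must hold back packets

    def sent(self, nbytes: int) -> None:
        # Called when a packet is handed to the hardware ring.
        self.inflight += nbytes
        if self.inflight >= self.limit:
            self.stopped = True

    def completed(self, nbytes: int) -> None:
        # Called when the hardware reports the packet as transmitted.
        self.inflight -= nbytes
        if self.stopped and self.inflight < self.limit:
            self.stopped = False
```

With a bounded amount of data sitting in the hardware ring, a small packet such as a ping queues behind fewer bytes, which is consistent with the latency figures reported in the table above.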
    • net: mvneta: add xmit_more support · 2a90f7e1
      Simon Guinot authored
      Based on the xmit_more flag of the skb, TX descriptors can be concatenated
      before flushing. This commit delays the TX descriptor flush if the queue is
      running and there are more skbs to send.
      
      Due to a limitation of the MVNETA_TXQ_UPDATE_REG(q) registers, at most
      255 descriptors can be flushed at once. Because of that, a new macro
      (MVNETA_TXQ_DEC_SENT_MASK) was added to ensure that the number of
      concatenated descriptors does not exceed that value.
      Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
      Signed-off-by: Marcin Wojtas <mw@semihalf.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
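The batching rule described in this commit can be modelled as follows. This is a hedged sketch rather than the driver code: `TxQueue`, `xmit` and `flush` are hypothetical names, the 255 limit is taken from the commit message, and each call is assumed to add no more than 255 descriptors.

```python
MAX_DESCS_PER_FLUSH = 255  # limit stated in the commit message


class TxQueue:
    """Model of delayed descriptor flushing driven by an xmit_more hint."""

    def __init__(self) -> None:
        self.pending = 0    # descriptors queued but not yet announced to HW
        self.flushes = []   # descriptor counts written per (modelled) flush

    def flush(self) -> None:
        if self.pending:
            self.flushes.append(self.pending)
            self.pending = 0

    def xmit(self, ndescs: int, xmit_more: bool, queue_stopped: bool = False) -> None:
        # Never let one batch grow past what a single register write can carry.
        if self.pending + ndescs > MAX_DESCS_PER_FLUSH:
            self.flush()
        self.pending += ndescs
        # Flush when no more packets are coming, when the queue is stopped,
        # or when the batch has reached the hardware limit.
        if not xmit_more or queue_stopped or self.pending >= MAX_DESCS_PER_FLUSH:
            self.flush()
```

For example, 300 single-descriptor packets sent with `xmit_more=True` followed by one with `xmit_more=False` produce two flushes in this model, and neither exceeds the limit.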
  2. 16 Jan, 2017 15 commits
  3. 14 Jan, 2017 16 commits
    • Merge tag 'mac80211-next-for-davem-2017-01-13' of... · bb60b8b3
      David S. Miller authored
      Merge tag 'mac80211-next-for-davem-2017-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
      
      Johannes Berg says:
      
      ====================
      For 4.11, we seem to have more than in the past few releases:
       * socket owner support for connections, so when the wifi
         manager (e.g. wpa_supplicant) is killed, connections are
         torn down - wpa_supplicant is critical to managing certain
         operations, and can opt in to this where applicable
       * minstrel & minstrel_ht updates to be more efficient (time and space)
       * set wifi_acked/wifi_acked_valid for skb->destructor use in the
         kernel, which was already available to userspace
       * don't indicate new mesh peers that might be used if there's no
         room to add them
       * multicast-to-unicast support in mac80211, for better medium usage
         (since unicast frames can use *much* higher rates, by ~3 orders of
         magnitude)
       * add API to read channel (frequency) limitations from DT
       * add infrastructure to allow randomizing public action frames for
         MAC address privacy (still requires driver support)
       * many cleanups and small improvements/fixes across the board
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: Remove redundant memset before memcpy · ca4b5eb8
      Shyam Saini authored
      The region set by the call to memset is immediately overwritten by
      the subsequent call to memcpy, which makes the memset redundant.
      
      Also remove the memset(&info, 0, sizeof(info)) on line 398 because
      info is memcpy()'ed to before being used in the loop and it isn't
      used outside of the loop.
      Signed-off-by: Shyam Saini <mayhs11saini@gmail.com>
      Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: Fix misleading packet/frame count stats. · f750e82e
      Ganesh Goudar authored
      Do not count pause frames as part of general TX/RX frame
      counters.
      
      Based on the original work of Casey Leedom <leedom@chelsio.com>
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bnxt_en-next' · 4b89aa3c
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: Misc. updates for net-next.
      
      Miscellaneous updates including firmware spec update, ethtool -p blinking
      LED support, RDMA SRIOV config callback, and minor fixes.
      
      v2: Dropped the DCBX RoCE app TLV patch until the ETH_P_IBOE RDMA patch
      is merged.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Add the ulp_sriov_cfg hooks for bnxt_re RDMA driver. · 2f593846
      Michael Chan authored
      Add the ulp_sriov_cfg callbacks when the number of VFs is changing.  This
      allows the RDMA driver to provision RDMA resources for the VFs.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Add support for ethtool -p. · 5ad2cbee
      Michael Chan authored
      Add LED blinking code to support ethtool -p on the PF.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Clear TPA flags when BNXT_FLAG_NO_AGG_RINGS is set. · 341138c3
      Michael Chan authored
      Commit bdbd1eb5 ("bnxt_en: Handle no aggregation ring gracefully.")
      introduced the BNXT_FLAG_NO_AGG_RINGS flag.  For consistency,
      bnxt_set_tpa_flags() should also clear TPA flags when there are no
      aggregation rings.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Fix compiler warnings when CONFIG_RFS_ACCEL is not defined. · b7429954
      Michael Chan authored
      CC [M]  drivers/net/ethernet/broadcom/bnxt/bnxt.o
      drivers/net/ethernet/broadcom/bnxt/bnxt.c:4947:21: warning: ‘bnxt_get_max_func_rss_ctxs’ defined but not used [-Wunused-function]
       static unsigned int bnxt_get_max_func_rss_ctxs(struct bnxt *bp)
                           ^
        CC [M]  drivers/net/ethernet/broadcom/bnxt/bnxt.o
      drivers/net/ethernet/broadcom/bnxt/bnxt.c:4956:21: warning: ‘bnxt_get_max_func_vnics’ defined but not used [-Wunused-function]
       static unsigned int bnxt_get_max_func_vnics(struct bnxt *bp)
                           ^
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp-RACK-fast-recovery' · 718e14bb
      David S. Miller authored
      Yuchung Cheng says:
      
      ====================
      tcp: RACK fast recovery
      
      The patch set enables RACK loss detection (draft-ietf-tcpm-rack-01)
      to trigger fast recovery with a reordering timer.
      
      Previously RACK has been running in auxiliary mode, where it is
      used to detect packet losses once recovery has been triggered by
      other algorithms (e.g., FACK). By inspecting packet timestamps,
      RACK can start ACK-driven repairs in a timely manner. A few similar heuristics
      are no longer needed and are either removed or disabled to reduce
      the complexity of the Linux TCP loss recovery engine:
      
        1. FACK (Forward Acknowledgement)
        2. Early Retransmit (RFC5827)
        3. thin_dupack (fast recovery on single DUPACK for thin-streams)
        4. NCR (Non-Congestion Robustness, RFC4653)
        5. Forward Retransmit
      
      After this change, Linux's loss recovery algorithms consist of
        1. Conventional DUPACK threshold approach (RFC6675)
        2. RACK and Tail Loss Probe (draft-ietf-tcpm-rack-01)
        3. RTO plus F-RTO extension (RFC5682)
      
      The patch set has been tested on Google servers extensively and
      presented in several IETF meetings. The data suggests that RACK
      successfully improves recovery performance:
      https://www.ietf.org/proceedings/97/slides/slides-97-tcpm-draft-ietf-tcpm-rack-01.pdf
      https://www.ietf.org/proceedings/96/slides/slides-96-tcpm-3.pdf
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: disable fack by default · 94bdc978
      Yuchung Cheng authored
      This patch disables FACK by default as RACK is the successor of FACK
      (inspired by the insights behind FACK).
      
      FACK[1] in Linux works as follows: a packet P is deemed lost,
      if packet Q of higher sequence is s/acked and P and Q are distant
      by at least dupthresh number of packets in sequence space.
      
      FACK is more aggressive than the IETF recommended recovery for SACK
      (RFC3517 A Conservative Selective Acknowledgment (SACK)-based Loss
       Recovery Algorithm for TCP), because a single SACK may trigger
      fast recovery. This obviously won't work well with reordering so
      FACK is dynamically disabled upon detecting reordering.
      
      RACK supersedes FACK by using time distance instead of sequence
      distance. On reordering, RACK waits for a quarter of RTT after receiving
      a single SACK before starting recovery. (The timer can be made more
      adaptive in the future by measuring reordering distance in time,
      but currently RTT/4 seems to work well.) Once the recovery starts,
      RACK behaves almost like FACK because it reduces the reordering
      window to 1ms, so it fast retransmits quickly. In addition, RACK
      can detect lost retransmissions as it does not care about the packet
      sequences (being repeated or not), which is extremely useful when
      the connection is going through a traffic policer.
      
      Google server experiments indicate that disabling FACK after enabling
      RACK has negligible impact on the overall loss recovery performance
      with more reordering events detected.  But we still keep the FACK
      implementation as a backup in case RACK has bugs and needs to be disabled.
      
      [1] M. Mathis, J. Mahdavi, "Forward Acknowledgment: Refining
      TCP Congestion Control," In Proceedings of SIGCOMM '96, August 1996.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
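The contrast between sequence distance (FACK) and time distance (RACK) can be sketched as two predicates. This is a simplified model based only on the description in this commit message, not the kernel logic or the full draft: the `Pkt` type, the function names and the microsecond fields are hypothetical, and the RACK rule here ignores the timer and RTT handling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pkt:
    seq: int              # position in sequence space, counted in packets
    sent_us: int          # send time in microseconds
    sacked: bool = False  # True once the packet has been (s)acked


def fack_lost(p: Pkt, pkts: List[Pkt], dupthresh: int = 3) -> bool:
    """FACK-style rule: P is lost if some (s)acked packet Q is at least
    dupthresh packets ahead of P in sequence space."""
    if p.sacked:
        return False
    return any(q.sacked and q.seq - p.seq >= dupthresh for q in pkts)


def rack_lost(p: Pkt, pkts: List[Pkt], reo_wnd_us: int) -> bool:
    """RACK-style rule (simplified): P is lost if some (s)acked packet was
    sent more than the reordering window after P."""
    if p.sacked:
        return False
    delivered_send_times = [q.sent_us for q in pkts if q.sacked]
    if not delivered_send_times:
        return False
    return max(delivered_send_times) - p.sent_us > reo_wnd_us
```

Because the RACK-style rule compares send times rather than sequence numbers, a retransmitted copy of a packet is just another transmission with a later send time, which is how the message explains RACK's ability to notice lost retransmissions.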
    • tcp: remove thin_dupack feature · 4a7f6009
      Yuchung Cheng authored
      Thin stream DUPACK starts fast recovery on only one DUPACK,
      provided the connection is a thin stream (i.e., low inflight).  But
      this older feature is now subsumed by RACK. If a connection
      receives only a single DUPACK, RACK arms a reordering timer
      and soon starts fast recovery instead of a timeout if no further
      ACKs are received.
      
      The socket option (THIN_DUPACK) is kept as a nop for compatibility.
      Note that this patch does not change another thin-stream feature,
      which enables linear RTO, although it might be good to generalize
      that in the future (i.e., linear RTO for the first say 3 retries).
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: remove RFC4653 NCR · ac229dca
      Yuchung Cheng authored
      This patch removes the (partial) implementation of the aggressive
      limited transmit in RFC4653 TCP Non-Congestion Robustness (NCR).
      
      NCR is a mitigation to the problem created by the dynamic
      DUPACK threshold.  The current adaptive DUPACK threshold
      (tp->reordering) could cause timeouts by preventing fast recovery.
      For example, if the last packet of a cwnd burst was reordered, the
      threshold will be set to the size of cwnd. But if the next application
      burst is smaller than the threshold and has drops instead of reorderings,
      the sender would not trigger fast recovery but instead resort to a
      timeout recovery.
      
      NCR mitigates this issue by additionally checking the number of DUPACKs
      against the current flight size. The technique is similar to
      the early retransmit RFC.
      
      With RACK loss detection, this mitigation is not needed, because RACK
      does not use DUPACK threshold to detect losses. RACK arms a reordering
      timer to fire at most a quarter RTT later to start fast recovery.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
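The timeout problem described above comes down to simple arithmetic, worked through here as a sketch. The function names and the example numbers (a threshold raised to 10, a later burst of 4 packets with 1 drop) are assumptions made for illustration; the model only counts an upper bound on DUPACKs and does not implement NCR itself.

```python
def max_dupacks(burst_size: int, lost: int) -> int:
    """Upper bound on DUPACKs: at most one per packet of the burst
    that actually arrives at the receiver."""
    return burst_size - lost


def fast_recovery_possible(burst_size: int, lost: int, dupthresh: int) -> bool:
    """A pure DUPACK-threshold sender can only enter fast recovery if the
    burst is able to generate at least dupthresh DUPACKs."""
    return max_dupacks(burst_size, lost) >= dupthresh


# Threshold inflated to 10 by an earlier reordering event: a burst of 4
# with 1 drop yields at most 3 DUPACKs, so the sender must wait for a timeout.
stuck = fast_recovery_possible(burst_size=4, lost=1, dupthresh=10)

# With the conventional threshold of 3, the same burst can recover quickly.
fine = fast_recovery_possible(burst_size=4, lost=1, dupthresh=3)
```

In this model `stuck` is `False` and `fine` is `True`, which is the gap that NCR papered over and that a timer-based detector avoids by not depending on a DUPACK count at all.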
    • tcp: remove early retransmit · bec41a11
      Yuchung Cheng authored
      This patch removes the support of RFC5827 early retransmit (i.e.,
      fast recovery on small inflight with <3 dupacks) because it is
      subsumed by the new RACK loss detection. More specifically when
      RACK receives DUPACKs, it'll arm a reordering timer to start fast
      recovery after a quarter of (min)RTT, hence it covers the early
      retransmit except RACK does not limit itself to specific inflight
      or dupack numbers.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: remove forward retransmit feature · 840a3cbe
      Yuchung Cheng authored
      Forward retransmit is an esoteric feature in RFC3517 (condition(3)
      in the NextSeg()). Basically if a packet is not considered lost by
      the current criteria (# of dupacks etc), but the congestion window
      has room for more packets, then retransmit this packet.
      
      However, it actually conflicts with the rest of the recovery design. For
      example, when reordering is detected we want to be conservative
      in retransmitting packets, but the forward-retransmit feature would
      break that by forcing more retransmissions. Also, the implementation is
      fairly complicated inside the retransmission logic, inducing extra
      iterations in the write queue. With RACK, losses are detected
      in a timely manner and this heuristic is no longer necessary. Therefore
      this patch removes the feature.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: extend F-RTO to catch more spurious timeouts · 89fe18e4
      Yuchung Cheng authored
      Current F-RTO reverts cwnd reset whenever a never-retransmitted
      packet was (s)acked. The timeout can be declared spurious because
      the packets acknowledged with this ACK were transmitted before the
      timeout, so clearly not all the packets were lost and the cwnd reset
      was unwarranted.
      
      This nice detection does not really depend on F-RTO internals. This
      patch applies the detection universally. On Google servers this
      change detected 20% more spurious timeouts.
      Suggested-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
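The detection rule described in this commit can be expressed as a small predicate. This is a sketch of the idea only, under the assumption that the sender tracks, per segment, whether it was ever retransmitted and whether it was sent before the timeout fired; the `Seg` type and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Seg:
    seq: int
    retransmitted: bool = False    # True if the segment was ever retransmitted
    sent_before_rto: bool = True   # True if first sent before the timeout fired


def timeout_is_spurious(newly_acked: Iterable[Seg]) -> bool:
    """The timeout is judged spurious if this ACK (s)acks at least one segment
    that was sent before the timeout and never retransmitted: that original
    transmission evidently got through, so not everything was lost."""
    return any(s.sent_before_rto and not s.retransmitted for s in newly_acked)
```

An ACK that covers only retransmitted segments proves nothing about the original flight, so the predicate stays false in that case and the cwnd reset stands.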