1. 14 Nov, 2017 14 commits
  2. 13 Nov, 2017 26 commits
    • Merge branch 'net-improve-the-process-of-redirect-and-toobig-for-ipv6-tunnels' · ede372dc
      David S. Miller authored
      Xin Long says:
      
      ====================
      net: improve the process of redirect and toobig for ipv6 tunnels
      
      There are three kinds of icmp packets for tunnels to process: toobig
      (needfrag), redirect, and others. They should be processed as follows:
      
       - toobig(needfrag)
         update the lower dst's pmtu by route cache, also update sk dst's pmtu
         if possible, or it will be fine if sk dst pmtu will get updated on tx
         path.
      
       - redirect
         update the lower dst's gw by route cache and return, no need to send
         this redirect packet to user sk.
      
       - others
         send the packet to user's sk, or it will also be fine to use err_count
         to count it and report fail link on tx path.
      
      All ipv4 tunnels basically follow this, while some ipv6 tunnels do it
      differently: ip6gre and ip6_tunnel update the tnl dev's mtu instead of
      the lower dst's pmtu, and their err_handlers have no redirect
      processing, which doesn't make any sense and even causes performance
      problems.
      
      This patchset improves the processing of redirect and toobig for
      ip6gre, ip4ip6 and ip6ip6 tunnels, as in ipv4 tunnels.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ede372dc
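The three cases listed in the cover letter above can be sketched as a small userspace model. This is only an illustration under stated assumptions: `struct tunnel_state`, `tunnel_handle_icmp()` and the enum values are hypothetical names, not kernel APIs, and the real handlers operate on route cache entries rather than plain integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum icmp_kind { ICMP_KIND_TOOBIG, ICMP_KIND_REDIRECT, ICMP_KIND_OTHER };

struct lower_dst {
    uint32_t pmtu;     /* path MTU cached for the lower route */
    uint32_t gateway;  /* next hop cached for the lower route (illustrative id) */
};

struct tunnel_state {
    struct lower_dst dst;
    uint32_t dev_mtu;        /* tunnel device MTU: not touched by icmp errors */
    unsigned int err_count;  /* errors left for the tx path / user socket */
};

/* Returns 1 if the error is passed on towards the user's socket, 0 if it
 * was fully handled at the tunnel level. */
int tunnel_handle_icmp(struct tunnel_state *t, enum icmp_kind kind, uint32_t info)
{
    assert(t != NULL);

    switch (kind) {
    case ICMP_KIND_TOOBIG:
        /* Lower the cached path MTU only; the sk dst pmtu is assumed to be
         * refreshed later on the tx path. */
        if (info < t->dst.pmtu)
            t->dst.pmtu = info;
        return 0;
    case ICMP_KIND_REDIRECT:
        /* Update the cached gateway and return; nothing goes to user space. */
        t->dst.gateway = info;
        return 0;
    default:
        /* Everything else is counted and reported upwards. */
        t->err_count++;
        return 1;
    }
}
```

In this model a toobig packet never changes `dev_mtu`, which is the behaviour the patchset argues for.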
    • ip6_tunnel: clean up ip4ip6 and ip6ip6's err_handlers · 77552cfa
      Xin Long authored
      This patch removes some useless redirect code and fixes some
      indentation in ip4ip6 and ip6ip6's err_handlers.
      
      Note that the redirect icmp packet is already processed in ip6_tnl_err;
      the old redirect code in ip4ip6_err never actually worked, even
      before this patch. Besides, there's no need to send a redirect to
      the user's sk, since it's for the lower dst, so just remove it in
      this patch.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      77552cfa
    • ip6_tunnel: process toobig in a better way · b00f5432
      Xin Long authored
      The same improvement in "ip6_gre: process toobig in a better way"
      is needed by ip4ip6 and ip6ip6 as well.
      
      Note that ip4ip6 and ip6ip6 will also update sk dst pmtu in their
      err_handlers. Like I said before, gre6 could not do this as its
      inner proto is not certain. But for all of them, sk dst pmtu will
      be updated in the tx path if needed.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b00f5432
    • ip6_tunnel: add the process for redirect in ip6_tnl_err · 383c1f88
      Xin Long authored
      The same process for redirect in "ip6_gre: add the process for redirect
      in ip6gre_err" is needed by ip4ip6 and ip6ip6 as well.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      383c1f88
    • ip6_gre: process toobig in a better way · fe1a4ca0
      Xin Long authored
      Now ip6gre processes a toobig icmp packet by setting the gre dev's mtu
      in ip6gre_err, which causes several problems:
      
        - It can't set the mtu with dev_set_mtu because it's not in user
          context, so the route cache and idev->cnf.mtu6 are not updated.
      
        - It has to update sk dst pmtu in the tx path according to gredev->mtu
          for ip6gre, while it updates pmtu again according to the lower dst
          pmtu in ip6_tnl_xmit.
      
        - Changing dev->mtu in response to a toobig icmp packet is not a good
          idea; a toobig packet should only affect pmtu.
      
      This patch processes toobig by updating the lower dst's pmtu, as the
      sk dst pmtu will be updated later in ip6_tnl_xmit, the same way as in
      ip4gre.
      
      Note that the gre dev's mtu will not be updated any more; it doesn't
      make any sense to change the dev's mtu after receiving a toobig packet.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fe1a4ca0
    • ip6_gre: add the process for redirect in ip6gre_err · 929fc032
      Xin Long authored
      This patch adds redirect icmp packet processing for ip6gre by
      calling ip6_redirect() in ip6gre_err(), as in vti6_err.
      
      Prior to this patch, no route cache entry was even generated after
      receiving a redirect.
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      929fc032
    • forcedeth: remove redudant assignments in xmit · 0d728b84
      Zhu Yanjun authored
      In the xmit path, some variables are assigned many times, although
      it is enough to set them once.
      In long-running tests, throughput is better than before.
      
      CC: Srinivas Eeda <srinivas.eeda@oracle.com>
      CC: Joe Jin <joe.jin@oracle.com>
      CC: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0d728b84
    • Merge tag 'nfc-next-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next · 6afce196
      David S. Miller authored
      Samuel Ortiz says:
      
      ====================
      NFC 4.15 pull request
      
      This is the NFC pull request for 4.15. We have:
      
      - A new netlink command for explicitly deactivating NFC targets
      - i2c constification for all NFC drivers
      - One NFC device allocation error path fix
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6afce196
    • Merge branch 'Openvswitch-meter-action' · fd9080a3
      David S. Miller authored
      Andy Zhou says:
      
      ====================
      Openvswitch meter action
      
      This patch series is the first attempt to add openvswitch
      meter support. We have previously experimented with adding
      metering support in nftables. However 1) It was not clear
      how to expose a named nftables object cleanly, and 2)
      the logic that implements metering is quite small, < 100 lines
      of code.
      
      With those two observations, it seems cleaner to add meter
      support in the openvswitch module directly.
      
      ---
      
          v1(RFC)->v2:  remove unused code, improve locking,
                        and other review comments
          v2 -> v3:     rebase
          v3 -> v4:     fix undefined "__udivdi3" references on 32 bit builds.
                        use div_u64() instead.
          v4 -> v5:     rebase
      ====================
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fd9080a3
    • openvswitch: Add meter action support · cd8a6c33
      Andy Zhou authored
      Implements OVS kernel meter action support.
      Signed-off-by: Andy Zhou <azhou@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cd8a6c33
    • openvswitch: Add meter infrastructure · 96fbc13d
      Andy Zhou authored
      OVS kernel datapath so far does not support Openflow meter action.
      This is the first stab at adding kernel datapath meter support.
      This implementation supports only drop band type.
      Signed-off-by: Andy Zhou <azhou@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      96fbc13d
    • openvswitch: export get_dp() API. · 9602c01e
      Andy Zhou authored
      Later patches will invoke get_dp() outside of datapath.c. Export it.
      Signed-off-by: Andy Zhou <azhou@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9602c01e
    • openvswitch: Add meter netlink definitions · 57940406
      Andy Zhou authored
      Meter has its own netlink family. Define netlink messages and attributes
      for communicating with the user space programs.
      Signed-off-by: Andy Zhou <azhou@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      57940406
    • Merge branch 'dsa-b53-Support-prepended-Broadcom-tags' · aef1e0d5
      David S. Miller authored
      Florian Fainelli says:
      
      ====================
      net: dsa: b53: Support prepended Broadcom tags
      
      This patch series adds support for a prepended form of the 4-byte
      Broadcom tag that we already support. This type of tag will typically be
      used when interfaced to a SoC like BCM58xx (NorthStar Plus) which supports
      a Flow Accelerator (WIP). In that case, we need to support a slightly
      different tagging format.
      
      The first patch does a bit of re-factoring and passes a port index to
      the get_tag_protocol() function since at least two different drivers need
      that type of information (mt7530, b53) to support tagging or not.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aef1e0d5
    • net: dsa: b53: Support prepended Broadcom tags · 11606039
      Florian Fainelli authored
      On BCM58xx devices (Northstar Plus), there is an accelerator attached to
      port 8 which would only work if we use prepended Broadcom tags. Resolve
      that difference in our get_tag_protocol() function by setting the
      appropriate tagging protocol in that case. We need to change
      b53_brcm_hdr_setup() a little bit now since we can deal with two types
      of Broadcom tags.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      11606039
    • net: dsa: Support prepended Broadcom tag · b74b70c4
      Florian Fainelli authored
      Add a new type, DSA_TAG_PROTO_PREPEND, which allows us to support the
      4-byte Broadcom tag that we already support, but in a format where it
      is prepended to the packet instead of located between the MAC SA and
      the EtherType (DSA_TAG_PROTO_BRCM).
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b74b70c4
    • net: dsa: tag_brcm: Prepare for supporting prepended tag · f7c39e3d
      Florian Fainelli authored
      In preparation for supporting the same Broadcom tag format, but
      prepended to the Ethernet frame instead of inserted between the MAC SA
      and EtherType, restructure the code a little bit to make that possible
      and take an offset parameter.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f7c39e3d
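As a rough illustration of the offset parameter described in the commits above, the sketch below inserts a 4-byte tag either at the very start of a frame or after the two MAC addresses. It is a userspace model, not the tag_brcm code: `insert_tag()` and the `TAG_OFFSET_*` constants are hypothetical names, and the tag bytes passed in are placeholders rather than the real Broadcom tag layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_ADDR_LEN          6
#define TAG_LEN               4                   /* 4-byte tag, per the commit text */
#define TAG_OFFSET_PREPENDED  0                   /* tag goes before the MAC DA */
#define TAG_OFFSET_AFTER_SA   (2 * MAC_ADDR_LEN)  /* tag goes between MAC SA and EtherType */

/* Copy frame into out with tag inserted at offset.
 * Returns the new length, or 0 if the arguments do not fit. */
size_t insert_tag(const uint8_t *frame, size_t len, size_t offset,
                  const uint8_t tag[TAG_LEN], uint8_t *out, size_t out_cap)
{
    assert(frame != NULL && tag != NULL && out != NULL);

    if (offset > len || out_cap < len + TAG_LEN)
        return 0;

    memcpy(out, frame, offset);                   /* bytes before the tag */
    memcpy(out + offset, tag, TAG_LEN);           /* the tag itself */
    memcpy(out + offset + TAG_LEN, frame + offset, len - offset); /* the rest */
    return len + TAG_LEN;
}
```

Taking the offset as a parameter lets one routine serve both placements, which appears to be the point of the restructuring.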
    • net: dsa: Pass a port to get_tag_protocol() · 5ed4e3eb
      Florian Fainelli authored
      A number of drivers want to check whether the configured CPU port is a
      possible configuration for enabling tagging. Pass down the CPU port
      number so they can verify that.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5ed4e3eb
    • net/sched/sch_red.c: work around gcc-4.4.4 anon union initializer issue · ee9d3429
      Andrew Morton authored
      gcc-4.4.4 (at least) has issues with initializers and anonymous unions:
      
      net/sched/sch_red.c: In function 'red_dump_offload':
      net/sched/sch_red.c:282: error: unknown field 'stats' specified in initializer
      net/sched/sch_red.c:282: warning: initialization makes integer from pointer without a cast
      net/sched/sch_red.c:283: error: unknown field 'stats' specified in initializer
      net/sched/sch_red.c:283: warning: initialization makes integer from pointer without a cast
      net/sched/sch_red.c: In function 'red_dump_stats':
      net/sched/sch_red.c:352: error: unknown field 'xstats' specified in initializer
      net/sched/sch_red.c:352: warning: initialization makes integer from pointer without a cast
      
      Work around this.
      
      Fixes: 602f3baf ("net_sch: red: Add offload ability to RED qdisc")
      Cc: Nogah Frankel <nogahf@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: Simon Horman <simon.horman@netronome.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ee9d3429
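The errors quoted above come from naming a member of an anonymous union in a designated initializer. One way to sidestep that, which is not necessarily what this patch does, is to initialize the ordinary members first and assign the union member afterwards. The sketch below uses hypothetical types (`struct stats_blob`, `struct offload_req`) that only mimic the shape of the problem and assumes a compiler that accepts anonymous unions (C11 or the GNU extension).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not the sch_red structures. */
struct stats_blob { uint32_t drops; };

struct offload_req {
    int command;
    uint32_t handle;
    union {                      /* anonymous union */
        struct stats_blob *stats;
        uint32_t *xstats;
    };
};

struct offload_req make_stats_req(struct stats_blob *blob)
{
    assert(blob != NULL);

    /* The form old compilers reject would be:
     *     struct offload_req req = { .command = 1, .stats = blob };
     * Instead, initialize only the named members here... */
    struct offload_req req = { .command = 1, .handle = 42 };

    /* ...and set the anonymous-union member with a plain assignment. */
    req.stats = blob;
    return req;
}
```

The assignment form compiles the same way on old and new compilers, at the cost of one extra statement.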
    • net/mlx4: Use Kconfig flag to remove support of old gen2 Mellanox devices · a1b87145
      Slava Shwartsman authored
      Since Mellanox's focus is on newer adapters, we would like to have the
      ability to disable the support for old gen2 adapters.
      
      This can be done by turning off the MLX4_CORE_GEN2 Kconfig flag.
      We keep it turned on by default.
      Signed-off-by: Slava Shwartsman <slavash@mellanox.com>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a1b87145
    • af_netlink: ensure that NLMSG_DONE never fails in dumps · 0642840b
      Jason A. Donenfeld authored
      The way people generally use netlink_dump is that they fill in the skb
      as much as possible, breaking when nla_put returns an error. Then, they
      get called again and start filling out the next skb, and again, and so
      forth. The mechanism at work here is the ability for the iterative
      dumping function to detect when the skb is filled up and not fill it
      past the brim, waiting for a fresh skb for the rest of the data.
      
      However, if the attributes are small and nicely packed, it is possible
      that a dump callback function successfully fills in attributes until the
      skb is of size 4080 (libmnl's default page-sized receive buffer size).
      The dump function completes, satisfied, and then, if it happens to be
      that this is actually the last skb, and no further ones are to be sent,
      then netlink_dump will add on the NLMSG_DONE part:
      
        nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
      
      It is very important that netlink_dump does this, of course. However, in
      this example, that call to nlmsg_put_answer will fail, because the
      previous filling by the dump function did not leave it enough room. And
      how could it possibly have done so? All of the nla_put variety of
      functions simply check to see if the skb has enough tailroom,
      independent of the context it is in.
      
      In order to keep the important assumptions of all netlink dump users, it
      is therefore important to give them an skb that has this end part of the
      tail already reserved, so that the call to nlmsg_put_answer does not
      fail. Otherwise, library authors are forced to find some bizarre sized
      receive buffer that has a large modulo relative to the common sizes of
      messages received, which is ugly and buggy.
      
      This patch thus saves the NLMSG_DONE for an additional message, for the
      case that things are dangerously close to the brim. This requires
      keeping track of the errno from ->dump() across calls.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0642840b
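The behaviour described in the commit above can be modelled with a few lines of userspace C. This is a sketch under assumptions, not the af_netlink code: `struct dump_state` and `dump_one_message()` are hypothetical, every record is given the same size, and `DONE_SIZE` assumes a 16-byte netlink header followed by a 4-byte integer payload.

```c
#include <assert.h>
#include <stddef.h>

#define SKB_SIZE   4080  /* buffer size used in the commit's example */
#define DONE_SIZE  20    /* assumed: 16-byte header + 4-byte int payload */

struct dump_state {
    size_t items_left;  /* records the dump callback still has to emit */
    int done_sent;      /* set once the terminating message went out */
};

/* Fill one message. Returns the bytes used and sets *has_done to 1 if the
 * terminator was appended to this message. */
size_t dump_one_message(struct dump_state *st, size_t item_size, int *has_done)
{
    size_t used = 0;

    assert(st != NULL && has_done != NULL);
    *has_done = 0;

    /* The dump callback packs records until the next one would not fit. */
    while (st->items_left > 0 && used + item_size <= SKB_SIZE) {
        used += item_size;
        st->items_left--;
    }

    /* Append the terminator only if it fits; otherwise leave it for the
     * next call, which then yields a message holding just the terminator. */
    if (st->items_left == 0 && !st->done_sent && used + DONE_SIZE <= SKB_SIZE) {
        used += DONE_SIZE;
        st->done_sent = 1;
        *has_done = 1;
    }
    return used;
}
```

With 255 records of 16 bytes the first message is filled to exactly 4080 bytes, so the terminator is deferred to a second, 20-byte message instead of failing.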
    • Merge branch 'netem-add-nsec-scheduling-and-slot-feature' · 907a4425
      David S. Miller authored
      Dave Taht says:
      
      ====================
      netem: add nsec scheduling and slot feature
      
      This patch series converts netem away from the old "ticks" interface and
      userspace API, and adds support for a new "slot" feature intended to
      emulate bursty macs such as WiFi and LTE better.
      
      Changes since v2:
      Use u64 for packet_len_sched_time()
      Use simpler max(time_to_send,q->slot.slot_next)
      
      Changes since v1:
      Always pass new nanosecond APIs to userspace
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      907a4425
    • netem: support delivering packets in delayed time slots · 836af83b
      Dave Taht authored
      Slotting is a crude approximation of the behaviors of shared media such
      as cable, wifi, and LTE, which gather up a bunch of packets within a
      varying delay window and deliver them, relative to that, nearly all at
      once.
      
      It works within the existing loss, duplication, jitter and delay
      parameters of netem. Some amount of inherent latency must be specified,
      regardless.
      
      The new "slot" parameter specifies a minimum and maximum delay between
      transmission attempts.
      
      The "bytes" and "packets" parameters can be used to limit the amount of
      information transferred per slot.
      
      Examples of use:
      
      tc qdisc add dev eth0 root netem delay 200us \
               slot 800us 10ms bytes 64k packets 42
      
      A more correct example, using stacked netem instances and a packet limit
      to emulate a tail drop wifi queue with slots and variable packet
      delivery, with a 200Mbit isochronous underlying rate, and 20ms path
      delay:
      
      tc qdisc add dev eth0 root handle 1: netem delay 20ms rate 200mbit \
               limit 10000
      tc qdisc add dev eth0 parent 1:1 handle 10:1 netem delay 200us \
               slot 800us 10ms bytes 64k packets 42 limit 512
      Signed-off-by: Dave Taht <dave.taht@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      836af83b
    • netem: add uapi to express delay and jitter in nanoseconds · 99803171
      Dave Taht authored
      netem userspace has long relied on a horrible /proc/net/psched hack
      to translate the current notion of "ticks" to nanoseconds.
      
      Expressing latency and jitter instead, in well defined nanoseconds,
      increases the dynamic range of emulated delays and jitter in netem.
      
      It will also ease a transition where reducing a tick to nsec
      equivalence would constrain the max delay in prior versions of
      netem to only 4.3 seconds.
      Signed-off-by: Dave Taht <dave.taht@gmail.com>
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      99803171
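The 4.3 second figure in the commit above matches what a 32-bit value can hold when one unit is one nanosecond. The commit does not state the field width, so the 32-bit assumption here is an inference; the helper names are hypothetical and the code is plain arithmetic, not netem code.

```c
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL
#define NSEC_PER_SEC  1000000000ULL

/* Largest delay, in whole milliseconds, that a 32-bit nanosecond count
 * can express: UINT32_MAX ns is 4294967295 ns, a little under 4.3 s. */
uint64_t u32_ns_max_delay_ms(void)
{
    return (uint64_t)UINT32_MAX / NSEC_PER_MSEC;
}

/* Returns 1 if delay_ns fits in a 32-bit field, 0 if it needs 64 bits. */
int fits_in_u32_ns(uint64_t delay_ns)
{
    return delay_ns <= UINT32_MAX;
}
```

A 4 second delay still fits in 32 bits of nanoseconds while a 5 second delay does not, which is why a wider nanosecond representation increases the usable range.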
    • netem: convert to qdisc_watchdog_schedule_ns · 112f9cb6
      Dave Taht authored
      Upgrade the internal netem scheduler to use nanoseconds rather than
      ticks throughout.
      
      Convert to and from the std "ticks" userspace api automatically,
      while allowing for finer grained scheduling to take place.
      Signed-off-by: Dave Taht <dave.taht@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      112f9cb6
    • ipv6: try not to take rtnl_lock in ip6mr_sk_done · 338d182f
      Francesco Ruggeri authored
      Avoid traversing the list of mr6_tables (which requires the
      rtnl_lock) in ip6mr_sk_done(), when we know in advance that
      a match will not be found.
      This can happen when rawv6_close()/ip6mr_sk_done() is invoked
      on non-mroute6 sockets.
      This patch helps reduce rtnl_lock contention when destroying
      a large number of net namespaces, each having a non-mroute6
      raw socket.
      
      v2: same patch, only fixed subject line and expanded comment.
      Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      338d182f