1. 24 Jan, 2017 3 commits
    • bridge: multicast to unicast · 6db6f0ea
      Felix Fietkau authored
      Implements an optional, per-bridge-port flag and feature to deliver
      multicast packets to each host on the corresponding port individually
      via unicast. This is done by copying the packet once per host and
      changing the multicast destination MAC address to that host's unicast
      address.
      
      multicast-to-unicast works on top of the multicast snooping feature of
      the bridge, which means unicast copies are only delivered to hosts that
      are interested in them and have previously signaled this via IGMP/MLD
      reports.
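The per-host copy-and-rewrite step described above can be modeled in plain C. This is a minimal userspace sketch, not the bridge code: `struct frame`, `mcast_to_unicast()` and the `hosts` array (standing in for the listener list learned by IGMP/MLD snooping) are hypothetical names, and real packet buffers, reference counting and per-port state are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical, simplified frame: only the fields this sketch needs. */
struct frame {
	uint8_t dst[ETH_ALEN];
	uint8_t src[ETH_ALEN];
	size_t len;
	uint8_t payload[64];
};

/* The group bit is the least significant bit of the first address octet. */
static int is_multicast_ether(const uint8_t *addr)
{
	return addr[0] & 0x01;
}

/*
 * Write one copy of 'in' per interested host into 'out', replacing the
 * multicast destination with that host's unicast MAC. Returns the number
 * of copies produced; non-multicast frames yield 0 copies.
 */
size_t mcast_to_unicast(const struct frame *in,
			const uint8_t (*hosts)[ETH_ALEN], size_t nhosts,
			struct frame *out, size_t max_out)
{
	size_t n = 0;

	assert(in != NULL && (nhosts == 0 || hosts != NULL));
	if (!is_multicast_ether(in->dst))
		return 0;

	for (size_t i = 0; i < nhosts && n < max_out; i++) {
		out[n] = *in;                           /* copy the whole frame */
		memcpy(out[n].dst, hosts[i], ETH_ALEN); /* rewrite destination */
		n++;
	}
	return n;
}
```

Each copy differs from the original only in its destination address, which is why the feature depends on snooping: without a listener list there is no set of unicast addresses to copy to.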
      
      This feature is intended for interface types which have a more reliable
      and/or efficient way to deliver unicast packets than broadcast ones
      (e.g. wifi).
      
      However, it should only be enabled on interfaces where no IGMPv2/MLDv1
      report suppression takes place. This feature is disabled by default.
      
      The initial patch and idea are from Felix Fietkau.
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      [linus.luessing@c0d3.blue: various bug + style fixes, commit message]
      Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
      Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Introduce a sysctl that modifies the value of PROT_SOCK. · 4548b683
      Krister Johansen authored
      Add net.ipv4.ip_unprivileged_port_start, a per-namespace sysctl that
      denotes the first unprivileged inet port in the namespace.  To disable
      all privileged ports, set this to zero.  The sysctl also checks for
      overlap with the local port range: the privileged and local ranges may
      not overlap.
      
      The use case for this change is to allow containerized processes to bind
      to privileged ports, but prevent them from ever being allowed to modify
      their container's network configuration.  The latter is accomplished by
      ensuring that the network namespace is not a child of the user
      namespace.  This modification was needed to allow the container manager
      to disable a namespace's privileged port restrictions without exposing
      control of the network namespace to processes in the user namespace.
      Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf, lpm: fix kfree of im_node in trie_update_elem · d140199a
      Daniel Borkmann authored
      We need to initialize im_node to NULL; otherwise, on the error path, it
      gets passed to kfree() as an uninitialized pointer.
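The bug class is easiest to see in a userspace analogue of the shared-error-label pattern. This is a hypothetical sketch, not trie_update_elem() itself: `struct node` and `update_elem()` are made-up names, and free() stands in for kfree(). Both are no-ops on NULL, which is exactly what the NULL initializer relies on.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct node {
	struct node *child;
};

/*
 * Hypothetical update path with two allocations and a shared error label.
 * Because im_node starts out as NULL, the error path may free it
 * unconditionally: free(NULL), like kfree(NULL), does nothing.
 */
int update_elem(int fail_early, struct node **result)
{
	struct node *new_node, *im_node = NULL;
	int ret = 0;

	assert(result != NULL);
	*result = NULL;

	new_node = calloc(1, sizeof(*new_node));
	if (!new_node)
		return -ENOMEM;

	if (fail_early) {
		ret = -EINVAL;       /* im_node has not been allocated yet */
		goto out;
	}

	im_node = calloc(1, sizeof(*im_node));
	if (!im_node) {
		ret = -ENOMEM;
		goto out;
	}
	im_node->child = new_node;
	*result = im_node;

out:
	if (ret) {
		free(new_node);
		free(im_node);       /* safe only because of the NULL initializer */
	}
	return ret;
}
```

Without `= NULL`, the `fail_early` branch would pass an indeterminate pointer to free(), which is undefined behavior.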
      
      Fixes: b95a5c4d ("bpf: add a longest prefix match trie map implementation")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 23 Jan, 2017 9 commits
    • Merge branch 'bpf-lpm' · 2acc76cb
      David S. Miller authored
      Daniel Mack says:
      
      ====================
      bpf: add longest prefix match map
      
      This patch set adds a longest prefix match algorithm that can be used
      to match IP addresses to a stored set of ranges. It is exposed as a
      bpf map type.
      
      Internally, data is stored in an unbalanced tree of nodes that has a
      maximum height of n, where n is the prefixlen the trie was created
      with.
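To make the height bound concrete, here is a deliberately naive binary trie for IPv4 keys. It is not the kernel implementation, which (per the changelog below) stores prefixes in nodes and adds intermediate nodes only where needed; this sketch spends one node per bit, the simplest structure whose depth equals the prefix length. `struct tnode`, `trie_insert()` and `trie_lookup()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* One node per bit: a prefix of length n ends at depth n, so no lookup
 * walks more than 32 levels for IPv4 keys. */
struct tnode {
	struct tnode *child[2];
	bool has_value;
	uint64_t value;
};

/* Bit i of a host-order IPv4 address, counting from the most significant. */
static int bit_at(uint32_t addr, unsigned i)
{
	return (addr >> (31 - i)) & 1;
}

/* Store value under prefix/len. Returns 0, or -1 on bad length or OOM. */
int trie_insert(struct tnode *root, uint32_t prefix, unsigned len, uint64_t value)
{
	struct tnode *n = root;

	assert(root != NULL);
	if (len > 32)
		return -1;
	for (unsigned i = 0; i < len; i++) {
		int b = bit_at(prefix, i);

		if (!n->child[b]) {
			n->child[b] = calloc(1, sizeof(*n->child[b]));
			if (!n->child[b])
				return -1;
		}
		n = n->child[b];
	}
	n->has_value = true;
	n->value = value;
	return 0;
}

/* Follow addr's bits and remember the deepest node that carries a value. */
bool trie_lookup(const struct tnode *root, uint32_t addr, uint64_t *value)
{
	const struct tnode *n = root, *best = NULL;

	assert(root != NULL && value != NULL);
	for (unsigned i = 0; n != NULL; i++) {
		if (n->has_value)
			best = n;
		if (i == 32)
			break;
		n = n->child[bit_at(addr, i)];
	}
	if (!best)
		return false;
	*value = best->value;
	return true;
}
```

The trie is unbalanced in the same sense as the description: its shape follows the stored prefixes, and only the maximum depth is fixed.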
      
      Note that this has nothing to do with fib or fib6 and is in no way meant
      to replace or share code with it. It's rather a much simpler
      implementation that is specifically written with bpf maps in mind.
      
      Patch 1/3 adds the implementation, 2/3 an extensive test suite, and 3/3
      has benchmarking code for the new trie type.
      
      Feedback is much appreciated.
      
      Changelog:
      
      v3 -> v4:
      	* David added a 3rd patch that augments map_perf_test for
      	  LPM trie benchmarks
      	* Limit allocation of maps of this new type to CAP_SYS_ADMIN
      	  for now, as requested by Alexei
      	* Add a stub .map_delete_elem so the core does not stumble
      	  over a NULL pointer when the syscall is invoked
      	* Tests for non-power-of-2 prefix lengths were added
      	* More comment style fixes
      
      v2 -> v3:
      	* Store both the key match data and the caller provided
      	  value in the same byte array attached to a node. This
      	  avoids double allocations
      	* Bring back node->flags to distinguish between 'real'
      	  and intermediate nodes
      	* Fix comment style and some typos
      
      v1 -> v2:
      	* Turn spin lock into raw spinlock
      	* Lock with irqsave options during trie_update_elem()
      	* Return -ENOMEM properly from trie_alloc()
      	* Force attr->flags == BPF_F_NO_PREALLOC during creation
      	* Set trie->map.pages after creation to account for map memory
      	* Allow arbitrary value sizes
      	* Removed node->flags and denote intermediate nodes through
      	  node->value == NULL instead
      
      rfc -> v1:
      	* Add __rcu pointer annotations to make sparse happy
      	* Fold _lpm_trie_find_target_node() into its only caller
      	* Fix some minor documentation issues
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • samples/bpf: add lpm-trie benchmark · b8a943e2
      David Herrmann authored
      Extend the map_perf_test_{user,kern}.c infrastructure to stress test
      lpm-trie lookups. We hook into the kprobe on sys_gettid() and measure
      the latency depending on trie size and lookup count.
      
      On my Intel Haswell i7-6400U, a single gettid() syscall with an empty
      bpf program takes roughly 6.5us. Lookups in empty tries take ~1.8us on
      the first try and ~0.9us on retries. Lookups in tries with 8192 entries
      take ~7.1us (on the first _and_ any subsequent try).
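The userspace half of such a measurement reduces to timing a tight loop of gettid() calls; running it once without and once with a kprobe program doing trie lookups gives the per-call difference. The Linux-only sketch below times the loop only. Loading and attaching the BPF program is not shown, and `avg_gettid_ns()` is a hypothetical helper, not code from map_perf_test_user.c.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

static uint64_t time_get_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Average cost of one gettid() syscall, in nanoseconds, over iters calls.
 * The raw syscall is used because older glibc versions have no gettid()
 * wrapper. */
double avg_gettid_ns(unsigned iters)
{
	uint64_t start, end;

	assert(iters > 0);
	start = time_get_ns();
	for (unsigned i = 0; i < iters; i++)
		syscall(SYS_gettid);
	end = time_get_ns();
	return (double)(end - start) / iters;
}
```

Averaging over many iterations amortizes the cost of the two clock reads, which would otherwise dominate a single sub-microsecond sample.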
      Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
      Reviewed-by: Daniel Mack <daniel@zonque.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: Add tests for the lpm trie map · 4d3381f5
      David Herrmann authored
      The first part of this program runs randomized tests against the
      lpm-bpf-map. It implements a "Trivial Longest Prefix Match" (tlpm)
      based on simple, linear, singly linked lists. The implementation
      should be pretty straightforward.
      
      Based on tlpm, this inserts randomized data into bpf-lpm-maps and
      verifies that the trie-based bpf-map implementation behaves the same
      way as tlpm.
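A reference model of this kind needs little more than a list scan. The sketch below is simplified to IPv4 keys held in a uint32_t; `struct tlpm_entry`, `tlpm_add()` and `tlpm_match()` are hypothetical names, not the test program's actual code. A randomized test would insert the same prefixes here and into the map, then check that both agree for random lookup addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct tlpm_entry {
	struct tlpm_entry *next;
	uint32_t prefix;   /* host-order IPv4 prefix, host bits cleared */
	unsigned n_bits;   /* prefix length, 0..32 */
};

static uint32_t mask_of(unsigned n_bits)
{
	/* Shifting a 32-bit value by 32 is undefined, so /0 is special-cased. */
	return n_bits == 0 ? 0 : 0xFFFFFFFFu << (32 - n_bits);
}

/* Push a new entry; on allocation failure the list is returned unchanged. */
struct tlpm_entry *tlpm_add(struct tlpm_entry *head, uint32_t prefix, unsigned n_bits)
{
	struct tlpm_entry *e;

	assert(n_bits <= 32);
	e = malloc(sizeof(*e));
	if (!e)
		return head;
	e->next = head;
	e->prefix = prefix & mask_of(n_bits);
	e->n_bits = n_bits;
	return e;
}

/* Linear scan: O(n) per lookup, but simple enough to trust as a reference. */
const struct tlpm_entry *tlpm_match(const struct tlpm_entry *head, uint32_t addr)
{
	const struct tlpm_entry *best = NULL;

	for (; head != NULL; head = head->next) {
		if ((addr & mask_of(head->n_bits)) != head->prefix)
			continue;
		if (!best || head->n_bits > best->n_bits)
			best = head;
	}
	return best;
}
```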
      
      The second part uses 'real world' IPv4 and IPv6 addresses and tests
      the trie with those.
      Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: Daniel Mack <daniel@zonque.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: add a longest prefix match trie map implementation · b95a5c4d
      Daniel Mack authored
      This trie implements a longest prefix match algorithm that can be used
      to match IP addresses to a stored set of ranges.
      
      Internally, data is stored in an unbalanced trie of nodes that has a
      maximum height of n, where n is the prefixlen the trie was created
      with.
      
      Tries may be created with prefix lengths that are multiples of 8, in
      the range from 8 to 2048. The key used for lookup and update operations
      is a struct bpf_lpm_trie_key, and the value is a uint64_t.
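As a sketch of what a caller prepares, assume the key is a 32-bit prefix length followed by the address bytes in network order, and the key size is that 4-byte field plus max_prefixlen/8 data bytes. Real programs should use struct bpf_lpm_trie_key from the uapi header <linux/bpf.h>; `struct lpm_key_ipv4` and the helper functions here are hypothetical local stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for an IPv4 key under the assumed layout. */
struct lpm_key_ipv4 {
	uint32_t prefixlen;  /* number of significant bits in data */
	uint8_t data[4];     /* IPv4 address in network byte order */
};

/* Range stated in the commit message: multiples of 8 from 8 to 2048. */
bool lpm_max_prefixlen_valid(unsigned n)
{
	return n >= 8 && n <= 2048 && n % 8 == 0;
}

/* Key size under the assumed layout: prefixlen field plus n/8 data bytes. */
size_t lpm_key_size(unsigned max_prefixlen)
{
	assert(lpm_max_prefixlen_valid(max_prefixlen));
	return sizeof(uint32_t) + max_prefixlen / 8;
}

struct lpm_key_ipv4 make_ipv4_key(uint8_t a, uint8_t b, uint8_t c, uint8_t d,
				  uint32_t prefixlen)
{
	struct lpm_key_ipv4 key = { .prefixlen = prefixlen, .data = { a, b, c, d } };

	assert(prefixlen <= 32);
	return key;
}
```

For example, 192.168.0.0/16 becomes a key with prefixlen 16 and data {192, 168, 0, 0}; only the first 16 bits take part in matching.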
      
      The code carries more information about the internal implementation.
      Signed-off-by: Daniel Mack <daniel@zonque.org>
      Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: xilinx: constify net_device_ops structure · 10eeb5e6
      Bhumika Goyal authored
      Declare the net_device_ops structure as const, as it is only stored in
      the netdev_ops field of a net_device structure. This field is of type
      const, so net_device_ops structures with the same properties can be
      made const too.
      Done using Coccinelle:
      
      @r1 disable optional_qualifier@
      identifier i;
      position p;
      @@
      static struct net_device_ops i@p={...};
      
      @ok1@
      identifier r1.i;
      position p;
      struct net_device ndev;
      @@
      ndev.netdev_ops=&i@p
      
      @bad@
      position p!={r1.p,ok1.p};
      identifier r1.i;
      @@
      i@p
      
      @depends on !bad disable optional_qualifier@
      identifier r1.i;
      @@
      +const
      struct net_device_ops i;
      
      File size before:
         text	   data	    bss	    dec	    hex	filename
         6201	    744	      0	   6945	   1b21 ethernet/xilinx/xilinx_emaclite.o
      
      File size after:
         text	   data	    bss	    dec	    hex	filename
         6745	    192	      0	   6937	   1b19 ethernet/xilinx/xilinx_emaclite.o
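In miniature, the pattern the script targets looks like the code below: an ops table whose only use is having its address stored in a const pointer field. The types are simplified, hypothetical stand-ins for the kernel's, and `demo_netdev_ops`/`demo_setup()` are made-up names. The size output above is consistent with the table moving to read-only data: data shrinks by 552 bytes and text grows by 544, since size(1) in its default format counts read-only sections under text.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types, for illustration only. */
struct net_device_ops {
	int (*ndo_open)(void *dev);
};

struct net_device {
	const struct net_device_ops *netdev_ops; /* the field is already const */
};

static int demo_open(void *dev)
{
	(void)dev;
	return 0;
}

/* Only ever referenced through the const pointer below, so it can be const;
 * the compiler may then place it in read-only data. */
static const struct net_device_ops demo_netdev_ops = {
	.ndo_open = demo_open,
};

void demo_setup(struct net_device *ndev)
{
	assert(ndev != NULL);
	ndev->netdev_ops = &demo_netdev_ops;
}
```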
      Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: moxa: constify net_device_ops structures · 30bd2f52
      Bhumika Goyal authored
      Declare the net_device_ops structure as const, as it is only stored in
      the netdev_ops field of a net_device structure. This field is of type
      const, so net_device_ops structures with the same properties can be
      made const too.
      Done using Coccinelle:
      
      @r1 disable optional_qualifier@
      identifier i;
      position p;
      @@
      static struct net_device_ops i@p={...};
      
      @ok1@
      identifier r1.i;
      position p;
      struct net_device ndev;
      @@
      ndev.netdev_ops=&i@p
      
      @bad@
      position p!={r1.p,ok1.p};
      identifier r1.i;
      @@
      i@p
      
      @depends on !bad disable optional_qualifier@
      identifier r1.i;
      @@
      +const
      struct net_device_ops i;
      
      File size before:
         text	   data	    bss	    dec	    hex	filename
         4821	    744	      0	   5565	   15bd ethernet/moxa/moxart_ether.o
      
      File size after:
         text	   data	    bss	    dec	    hex	filename
         5373	    192	      0	   5565	   15bd ethernet/moxa/moxart_ether.o
      Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qcom/emac: claim the irq only when the device is opened · 4404323c
      Timur Tabi authored
      During reset, emac_mac_down() and emac_mac_up() are called, so we
      don't want them to free and claim the IRQ unnecessarily.  Move those
      operations to open/close.
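A toy model of the resulting ownership: open/close own the IRQ, and the up/down helpers used by reset no longer touch it. All names are hypothetical, boolean flags stand in for request_irq() and free_irq(), and none of this is the driver's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model: counts IRQ requests across resets. */
struct emac_model {
	bool irq_claimed;
	bool mac_up;
	unsigned irq_requests;
};

/* After the change, up/down only (re)start the MAC; no IRQ handling. */
static void model_mac_up(struct emac_model *d)   { d->mac_up = true; }
static void model_mac_down(struct emac_model *d) { d->mac_up = false; }

int model_open(struct emac_model *d)
{
	assert(!d->irq_claimed);
	d->irq_claimed = true;      /* stands in for request_irq() */
	d->irq_requests++;
	model_mac_up(d);
	return 0;
}

void model_reset(struct emac_model *d)
{
	model_mac_down(d);
	model_mac_up(d);            /* IRQ stays claimed across the reset */
}

void model_close(struct emac_model *d)
{
	model_mac_down(d);
	d->irq_claimed = false;     /* stands in for free_irq() */
}
```

However many resets occur between open and close, the IRQ is requested exactly once.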
      Signed-off-by: Timur Tabi <timur@codeaurora.org>
      Reviewed-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qcom/emac: rename emac_phy to emac_sgmii and move it · 41c1093f
      Timur Tabi authored
      The EMAC has an internal PHY that is often called the "SGMII".  This
      SGMII is also connected to an external PHY, which is managed by phylib.
      These dual PHYs often cause confusion.  In this case, the data structure
      for managing the SGMII was mis-named and located in the wrong header file.
      
      Structure emac_phy is renamed to emac_sgmii to clearly indicate that it
      applies to the internal PHY only.  It is also moved from emac_phy.h
      (which supports the external PHY) to emac_sgmii.h (where it belongs).
      
      To keep the changes minimal, only the structure name is changed, not
      the names of any variables of that type.
      Signed-off-by: Timur Tabi <timur@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnx2x: avoid two atomic ops per page on x86 · b9032741
      Eric Dumazet authored
      Commit 4cace675 ("bnx2x: Alloc 4k fragment for each rx ring buffer
      element") added extra put_page() and get_page() calls on arches where
      PAGE_SIZE=4K, like x86.
      
      Reorder things to avoid this overhead.
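The saving can be illustrated with a userspace model of a page pool that hands out fixed-size fragments and tracks a page reference count. The model assumes, rather than quotes, the reordering: take an extra reference only when the pool will carve another fragment from the same page, and otherwise hand the pool's own allocation reference to the last fragment. With 4 KiB pages and 4 KiB fragments, each page then needs no extra refcount updates, instead of a get/put pair. `struct frag_pool`, `pool_get_frag()` and `struct page_model` are hypothetical, and plain integers stand in for atomic page refcounts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct page_model {
	int refcount;              /* stands in for the page's atomic refcount */
};

struct frag_pool {
	struct page_model *page;   /* page currently being carved, or NULL */
	size_t offset;             /* bytes already handed out from it */
	size_t page_size;
	size_t frag_size;
	unsigned long atomic_ops;  /* extra refcount updates performed */
};

/* Hand out one fragment; the caller owns one reference to the returned page. */
struct page_model *pool_get_frag(struct frag_pool *pool)
{
	struct page_model *page;

	assert(pool->frag_size > 0 && pool->page_size % pool->frag_size == 0);
	if (!pool->page) {
		pool->page = calloc(1, sizeof(*pool->page));
		if (!pool->page)
			return NULL;
		pool->page->refcount = 1;  /* allocation reference */
		pool->offset = 0;
	}

	page = pool->page;
	pool->offset += pool->frag_size;
	if (pool->page_size - pool->offset >= pool->frag_size) {
		/* Another fragment will come from this page: keep a reference. */
		page->refcount++;
		pool->atomic_ops++;
	} else {
		/* Last fragment: the allocation reference moves to the caller. */
		pool->page = NULL;
	}
	return page;
}
```

With 16 KiB pages and 4 KiB fragments, the model performs three extra increments per page, one for each fragment after the first, and each of the four fragment holders ends up owning exactly one reference.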
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
      Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
      Cc: Ariel Elior <ariel.elior@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 22 Jan, 2017 15 commits
  4. 20 Jan, 2017 13 commits