Commits · f19397a5c65665d66e3866b42056f1f58b7a366b · nexedi / linux

05 Dec, 2017 1 commit

bpf: Add access to snd_cwnd and others in sock_ops · f19397a5

Lawrence Brakmo authored Dec 01, 2017

Adds read access to snd_cwnd and srtt_us fields of tcp_sock. Since these
fields are only valid if the socket associated with the sock_ops program
call is a full socket, the field is_fullsock is also added to the
bpf_sock_ops struct. If the socket is not a full socket, reading these
fields returns 0.

Note that in most cases it will not be necessary to check is_fullsock to
know if there is a full socket. The context of the call, as specified by
the 'op' field, can sometimes determine whether there is a full socket.

The struct bpf_sock_ops has the following fields added:

  __u32 is_fullsock;      /* Some TCP fields are only valid if
                           * there is a full socket. If not, the
                           * fields read as zero.
			   */
  __u32 snd_cwnd;
  __u32 srtt_us;          /* Averaged RTT << 3 in usecs */

There is a new macro, SOCK_OPS_GET_TCP32(NAME), to make it easier to add
read access to more 32 bit tcp_sock fields.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

f19397a5

04 Dec, 2017 16 commits

bpf: move bpf csum flag check · 792f3dd6

William Tu authored Dec 04, 2017

trivial move the BPF_F_ZERO_CSUM_TX check right below the
'flags & BPF_F_DONT_FRAGMENT', so common tun_flags handling
is logically together.
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

792f3dd6

rtnetlink: ipv6: convert remaining users to rtnl_register_module · a3fde2ad

Florian Westphal authored Dec 04, 2017

convert remaining users of rtnl_register to rtnl_register_module
and un-export rtnl_register.
Requested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

a3fde2ad

Merge tag 'linux-can-next-for-4.16-20171201' of... · 112d59c7

David S. Miller authored Dec 04, 2017

Merge tag 'linux-can-next-for-4.16-20171201' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2017-12-01

this is a pull request of 10 patches for net-next/master.

The first two patches are by Arnd Bergmann, they convert the peak_usb
from using "struct timeval" to "ktime_t". The error handling in the
vxcan driver is clean up by Markus Elfring's patch. Bhumika Goyal
contributes a patch for the c_can_pci driver to make the pci data const.
The six patches by Pankaj Bansal for the flexcan driver add LS1021A
support by making the endianness of the driver configurable by the
device tree.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

112d59c7

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · d671965b

David S. Miller authored Dec 04, 2017

Daniel Borkmann says:

====================
pull-request: bpf-next 2017-12-03

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Addition of a software model for BPF offloads in order to ease
   testing code changes in that area and make semantics more clear.
   This is implemented in a new driver called netdevsim, which can
   later also be extended for other offloads. SR-IOV support is added
   as well to netdevsim. BPF kernel selftests for offloading are
   added so we can track basic functionality as well as exercising
   all corner cases around BPF offloading, from Jakub.

2) Today drivers have to drop the reference on BPF progs they hold
   due to XDP on device teardown themselves. Change this in order
   to make XDP handling inside the drivers less error prone, and
   move disabling XDP to the core instead, also from Jakub.

3) Misc set of BPF verifier improvements and cleanups as preparatory
   work for upcoming BPF-to-BPF calls. Among others, this set also
   improves liveness marking such that pruning can be slightly more
   effective. Register and stack liveness information is now included
   in the verifier log as well, from Alexei.

4) nfp JIT improvements in order to identify load/store sequences in
   the BPF prog e.g. coming from memcpy lowering and optimizing them
   through the NPU's command push pull (CPP) instruction, from Jiong.

5) Cleanups to test_cgrp2_attach2.c BPF sample code in oder to remove
   bpf_prog_attach() magic values and replacing them with actual proper
   attach flag instead, from David.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d671965b

Merge branch 'rtnetlink-rework-handler-registration' · f4d4c49b

David S. Miller authored Dec 04, 2017

Florian Westphal says:

====================
rtnetlink: rework handler (un)registering

Peter Zijlstra reported (referring to commit 019a3169,
"rtnetlink: add reference counting to prevent module unload while dump is in progress"):

 1) it not in fact a refcount, so using refcount_t is silly
 2) there is a distinct lack of memory barriers, so we can easily
    observe the decrement while the msg_handler is still in progress.
 3) waiting with a schedule()/yield() loop is complete crap and subject
    life-locks, imagine doing that rtnl_unregister_all() from a RT task.

In ancient times rtnetlink exposed a statically-sized table with
preset doit/dumpit handlers to be called for a protocol/type pair.

Later the rtnl_register interface was added and the table was allocated
on demand.  Eventually these were also used by modules.

Problem is that nothing prevents module unload while a netlink dump
is in progress.  netlink dumps can be span multiple recv calls and
netlink core saves the to-be-repeated dumper address for later invocation.

To prevent rmmod the netlink core expects callers to pass in the owning
module so a reference can be taken.

So far rtnetlink wasn't doing this, add new interface to pass THIS_MODULE.
Moreover, when converting parts of the rtnetlink handling to rcu this code
gained way too many READ_ONCE spots, remove them and the extra refcounting.

Take a module reference when running dumpit and doit callbacks
and never alter content of rtnl_link structures after they have been
published via rcu_assign_pointer.

Based partially on earlier patch from Peter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f4d4c49b

rtnetlink: remove __rtnl_register · 16feebcf

Florian Westphal authored Dec 02, 2017

This removes __rtnl_register and switches callers to either
rtnl_register or rtnl_register_module.

Also, rtnl_register() will now print an error if memory allocation
failed rather than panic the kernel.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

16feebcf

net: use rtnl_register_module where needed · c1c502b5

Florian Westphal authored Dec 02, 2017

all of these can be compiled as a module, so use new
_module version to make sure module can no longer be removed
while callback/dump is in use.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

c1c502b5

rtnetlink: get reference on module before invoking handlers · e4202511

Florian Westphal authored Dec 02, 2017

Add yet another rtnl_register function.  It will be used by modules
that can be removed.

The passed module struct is used to prevent module unload while
a netlink dump is in progress or when a DOIT_UNLOCKED doit callback
is called.

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

e4202511

net: rtnetlink: use rcu to free rtnl message handlers · addf9b90

Florian Westphal authored Dec 02, 2017

rtnetlink is littered with READ_ONCE() because we can have read accesses
while another cpu can write to the structure we're reading by
(un)registering doit or dumpit handlers.

This patch changes this so that (un)registering cpu allocates a new
structure and then publishes it via rcu_assign_pointer, i.e. once
another cpu can see such pointer no modifications will occur anymore.

based on initial patch from Peter Zijlstra.

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

addf9b90

net: phy: broadcom: re-add mistakenly removed config settings · 9753c21f

Heiner Kallweit authored Dec 02, 2017

Previous patch mistakenly removed three chip-specific config settings.
Add them again.

Fixes: 80274aba "net: phy: remove generic settings for callbacks config_aneg and read_status from drivers"
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9753c21f

Merge branch 'ipv6-gre-collect_md' · 90ffa72f

David S. Miller authored Dec 04, 2017

William Tu says:

====================
add ip6 gre and gretap collect_md mode

Similar to gre, vxlan, geneve, ipip tunnels, allow ip6gretap tunnels to
operate in collect metadata mode.  The first patch adds the support to
ip6_gre.c. The second patch enables unsetting the csum for ipv6 tunnel,
when using bpf_skb_[gs]et_tunnel_key() helpers.  Finally, the last patch
adds the ip6 gre and gretap tunnel test cases to BPF sample code.

The corresponding iproute2 patch:
https://marc.info/?l=linux-netdev&m=151216943128087&w=2
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

90ffa72f

samples/bpf: extend test_tunnel_bpf.sh with ip6gre · 56ddd302

William Tu authored Dec 01, 2017

Extend existing tests for vxlan, gre, geneve, ipip, erspan,
to include ip6 gre and gretap tunnel.
Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

56ddd302

bpf: allow disabling tunnel csum for ipv6 · b8da518c

William Tu authored Dec 01, 2017

Before the patch, BPF_F_ZERO_CSUM_TX can be used only for ipv4 tunnel.
With introduction of ip6gretap collect_md mode, the flag should be also
supported for ipv6.
Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

b8da518c

ip6_gre: add ip6 gre and gretap collect_md mode · 6712abc1

William Tu authored Dec 01, 2017

Similar to gre, vxlan, geneve, ipip tunnels, allow ip6 gre and gretap
tunnels to operate in collect metadata mode.  bpf_skb_[gs]et_tunnel_key()
helpers can make use of it right away.  OVS can use it as well in the
future.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6712abc1

net: phy: core: don't disable device interrupts in phy_change · c34bc2b5

Heiner Kallweit authored Nov 30, 2017

If state is not PHY_HALTED I see no need to temporarily disable
interrupts on the device. As long as the current interrupt isn't acked
on the device no new interrupt can happen anyway.

In addition remove a unneeded enabling of interrupts in the state
machine when handling state PHY_CHANGELINK.

Tested on a Odroid-C2 with RTL8211F phy in interrupt mode.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c34bc2b5

net: phy: core: remove now uneeded disabling of interrupts · a6d1642d

Heiner Kallweit authored Nov 30, 2017

After commits c974bdbc "net: phy: Use threaded IRQ, to allow IRQ from
sleeping devices" and 664fcf12 "net: phy: Threaded interrupts allow
some simplification" all relevant code pieces run in process context
anyway and I don't think we need the disabling of interrupts any longer.

Interestingly enough, latter commit already removed the comment
explaining why interrupts need to be temporarily disabled.

On my system phy interrupt mode works fine with this patch.
However I may miss something, especially in the context of shared phy
interrupts, therefore I'd appreciate if more people could test this.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

a6d1642d

03 Dec, 2017 23 commits

Merge branch 'tcp-2nd-listener-hash' · c6d3c96f

David S. Miller authored Dec 03, 2017

Martin KaFai Lau says:

====================
tcp: Add a 2nd listener hashtable (port+addr)

This patch set adds a 2nd listener hashtable.  It is to resolve
the performance issue when a process is listening at many IP
addresses with the same port (e.g. [IP1]:443, [IP2]:443... [IPN]:443)

v2:
- Move the new lhash2 and lhash2_mask before the existing
  listening_hash to avoid adding another cacheline
  to inet_hashinfo (Suggested by Eric Dumazet, Thanks!)
- I take this chance to plug an existing 4 bytes hole while
  adding 'unsigned int lhash2_mask'.
- Add some comments about lhash2 in inet_hashtables.h
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

c6d3c96f

tcp: Enable 2nd listener hashtable in TCP · 27da6d37

Martin KaFai Lau authored Dec 01, 2017

Enable the second listener hashtable in TCP.
The scale is the same as UDP which is one slot per 2MB.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

27da6d37

inet: Add a 2nd listener hashtable (port+addr) · 61b7c691

Martin KaFai Lau authored Dec 01, 2017

The current listener hashtable is hashed by port only.
When a process is listening at many IP addresses with the same port (e.g.
[IP1]:443, [IP2]:443... [IPN]:443), the inet[6]_lookup_listener()
performance is degraded to a link list.  It is prone to syn attack.

UDP had a similar issue and a second hashtable was added to resolve it.

This patch adds a second hashtable for the listener's sockets.
The second hashtable is hashed by port and address.

It cannot reuse the existing skc_portaddr_node which is shared
with skc_bind_node.  TCP listener needs to use skc_bind_node.
Instead, this patch adds a hlist_node 'icsk_listen_portaddr_node' to
the inet_connection_sock which the listener (like TCP) also belongs to.

The new portaddr hashtable may need two lookup (First by IP:PORT.
Second by INADDR_ANY:PORT if the IP:PORT is a not found).   Hence,
it implements a similar cut off as UDP such that it will only consult the
new portaddr hashtable if the current port-only hashtable has >10
sk in the link-list.

lhash2 and lhash2_mask are added to 'struct inet_hashinfo'.  I take
this chance to plug a 4 bytes hole.  It is done by first moving
the existing bind_bucket_cachep up and then add the new
(int lhash2_mask, *lhash2) after the existing bhash_size.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

61b7c691

udp: Move udp[46]_portaddr_hash() to net/ip[v6].h · f0b1e64c

Martin KaFai Lau authored Dec 01, 2017

This patch moves the udp[46]_portaddr_hash()
to net/ip[v6].h.  The function name is renamed to
ipv[46]_portaddr_hash().

It will be used by a later patch which adds a second listener
hashtable hashed by the address and port.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f0b1e64c

inet: Add a count to struct inet_listen_hashbucket · 76d013b2

Martin KaFai Lau authored Dec 01, 2017

This patch adds a count to the 'struct inet_listen_hashbucket'.
It counts how many sk is hashed to a bucket.  It will be
used to decide if the (to-be-added) portaddr listener's hashtable
should be used during inet[6]_lookup_listener().
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

76d013b2

enic: add sw timestamp support · fb7516d4

Govindarajulu Varadarajan authored Dec 01, 2017

Add ethtool ops to advertise sw timestamping.
Call skb_tx_timestamp() just before ringing the wq doorbell.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fb7516d4

Merge branch 'hv_netvsc-minor-optimizations' · b8278f2c

David S. Miller authored Dec 03, 2017

Stephen Hemminger says:

====================
hv_netvsc: minor optimizations

These are a set of local optimizations the Hyper-V networking driver.
Also include a vmbus patch in this set, because it depends on the
netvsc that last used that function.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b8278f2c

vmbus: make hv_get_ringbuffer_availbytes local · 0487426f

Stephen Hemminger authored Dec 01, 2017

The last use of hv_get_ringbuffer_availbytes in drivers is now
gone. Only used by the debug info routine so make it static. Also, add
READ_ONCE() to avoid any possible issues with potentially volatile
index values.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0487426f

hv_netvsc: optimize initialization of RNDIS header · f5a22550

Stephen Hemminger authored Dec 01, 2017

The memset of the whole maximum possible RNDIS header is unnecessary.
For the main part of the header use a structure assignment.

No need to memset the whole per packet info. Instead rely on caller to
set what it wants. Also get rid of cast to void and signed/unsigned
conversion. Now return pointer to per packet data (rather than the
header) which simplifies use by code setting up the packet data.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f5a22550

hv_netvsc: use reciprocal divide to speed up percent calculation · a7f99d0f

Stephen Hemminger authored Dec 01, 2017

Every packet sent checks the available ring space. The calculation
can be sped up by using reciprocal divide which is multiplication.

Since ring_size can only be configured by module parameter, so it doesn't
have to be passed around everywhere. Also it should be unsigned
since it is number of pages.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7f99d0f

hv_netvsc: replace divide with mask when computing padding · b85e06f7

Stephen Hemminger authored Dec 01, 2017

Packet alignment is always a power of 2 therefore modulus can
be replaced with a faster and operation
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b85e06f7

hv_netvsc: don't need local xmit_more · 200a5699

Stephen Hemminger authored Dec 01, 2017

Since skb is always non-NULL in the copy portion of netvsc_send
do not need local variable.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

200a5699

hv_netvsc: drop unused macros · 07a7c494

Stephen Hemminger authored Dec 01, 2017

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07a7c494

ipvlan: Add new func ipvlan_is_valid_dev instead of duplicated codes · 5e51fe6f

Gao Feng authored Dec 01, 2017

There are multiple duplicated condition checks in the current codes, so
I add the new func ipvlan_is_valid_dev instead of the duplicated codes to
check if the netdev is real ipvlan dev.
Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e51fe6f

Merge branch 'realtek-phy-improvements' · 0653cb2e

David S. Miller authored Dec 03, 2017

Martin Blumenstingl says:

====================
Realtek Ethernet PHY driver improvements

This series provides some small improvements and cleanups for the
Realtek Ethernet PHY driver.
None of the patches in this series should change any functionality.
The goal is to make the code a bit easier to read by:
- re-using the BIT and GENMASK macros (which makes it easier to compare
  the #defines in the kernel with the values from the datasheets)
- rename a #define from a generic name to a PHY-specific name since it's
  only used for one specific PHY
- logically group the register #defines and their register bit #defines
  together
- indentation cleanups
- removed some code duplicating for reading/writing registers on a
  Realtek specific "page"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

0653cb2e

net: phy: realtek: add utility functions to read/write page addresses · 136819a6

Martin Blumenstingl authored Dec 02, 2017

Realtek PHYs implement the concept of so-called "extension pages". The
reason for this is probably because these PHYs expose more registers
than available in the standard address range.
After all read/write operations on such a page are done the driver
should switch back to page 0 where the standard MII registers (such as
MII_BMCR) are available.

When referring to such a register the datasheets of RTL8211E and
RTL8211F always specify:
- the page / "ext. page" which has to be written to RTL821x_PAGE_SELECT
- an address (sometimes also called reg)

These new utility functions make the existing code easier to read since
it removes some duplication (switching back to page 0 is done within the
new helpers for example).

No functional changes are intended.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

136819a6

net: phy: realtek: use the same indentation for all #defines · f609ab0e

Martin Blumenstingl authored Dec 02, 2017

This simply makes the code easier to read. No functional changes.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

f609ab0e

net: phy: realtek: group all register bit #defines for RTL821x_INER · a82f266d

Martin Blumenstingl authored Dec 02, 2017

This simply moves all register bit #defines which describe the (PHY
specific) bits in the RTL821x_INER right below the RTL821x_INER register
definition. This makes it easier to spot which registers and bits belong
together.
No functional changes.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

a82f266d

net: phy: realtek: rename RTL821x_INER_INIT to RTL8211B_INER_INIT · 69021e32

Martin Blumenstingl authored Dec 02, 2017

This macro is only used by the RTL8211B code. RTL8211E and RTL8211F both
use other bits to initialize the RTL821x_INER register.
No functional changes.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

69021e32

net: phy: realtek: use the BIT and GENMASK macros · 8cc5baef

Martin Blumenstingl authored Dec 02, 2017

This makes it easier to compare the #defines with the datasheets.
No functional changes.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

8cc5baef

Merge branch 'dsa-cross-chip-FDB-support' · 75d0de8c

David S. Miller authored Dec 02, 2017

Vivien Didelot says:

====================
net: dsa: cross-chip FDB support

DSA can have interconnected switches. For instance, the ZII Dev Rev B
board described in arch/arm/boot/dts/vf610-zii-dev-rev-b.dts has a
switch fabric composed of 3 switch devices like this:

                          lan4                 lan6
        CPU (eth1)            |  lan5         |  lan7
                  |           | |             | |
       [0 1 2 3 4 6 5]---[6 0 1 2 3 4 5]---[9 0 1 2 3 4 5 6 7 8]
        | | |               |                     | | |
    lan0  |  lan2       lan3                  lan8  |  optical4
           lan1                                      optical3

One current issue with DSA is cross-chip FDB. If we add a static MAC
address on lan3, only its parent switch 1 (the one in the middle) will
be programmed. That is not correct in a cross-chip environment, because
the DSA ports connecting to switch 1 of adjacent switch 0 (on the left)
and switch 2 (on the right) must be programmed too.

Without this patchset, a dump of the hardware FDB of switches 0, 1 and 2
after programming a MAC address on lan3 looks like this (*):

    # bridge fdb add 11:22:33:44:55:66 dev lan3
    # cat /sys/kernel/debug/mv88e6xxx/sw*/atu/0 | grep -v FID
       0  ff:ff:ff:ff:ff:ff            MC_STATIC       n  0 1 2 3 4 5 6
       0  11:22:33:44:55:66    MC_STATIC_MGMT_PO       n  0 - - - - - -
       0  ff:ff:ff:ff:ff:ff            MC_STATIC       n  0 1 2 3 4 5 6
       0  ff:ff:ff:ff:ff:ff            MC_STATIC       n  0 1 2 3 4 5 6 7 8 9

With this patchset applied, adjacent DSA ports get programmed too:

    # bridge fdb add 11:22:33:44:55:66 dev lan3
    # cat /sys/kernel/debug/mv88e6xxx/sw*/atu/0 | grep -v FID
       0  11:22:33:44:55:66    MC_STATIC_MGMT_PO       n  - - - - - 5 -
       0  ff:ff:ff:ff:ff:ff            MC_STATIC       n  0 1 2 3 4 5 6
       0  11:22:33:44:55:66    MC_STATIC_MGMT_PO       n  0 - - - - - -
       0  ff:ff:ff:ff:ff:ff            MC_STATIC       n  0 1 2 3 4 5 6
       0  11:22:33:44:55:66    MC_STATIC_MGMT_PO       n  - - - - - - - - - 9
       0  ff:ff:ff:ff:ff:ff            MC_STATIC       n  0 1 2 3 4 5 6 7 8 9

In order to do that, the first commit introduces a dsa_towards_port()
helper which returns the local port of a switch which must be used to
reach an arbitrary switch port (local or from an adjacent switch.)

The second patch uses this helper to configure the port reaching the
target port for every switches of the fabric.

(*) a patch for squashed debugfs interface which applies on top of this
patchset is available here:

    https://github.com/vivien/linux/commit/f8e6ba34c68a72d3bf42f4dea79abacb2e61a3cc.patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

75d0de8c

net: dsa: support cross-chip FDB operations · 3169241f

Vivien Didelot authored Nov 30, 2017

When a MAC address is added to or removed from a switch port in the
fabric, the target switch must program its port and adjacent switches
must program their local DSA port used to reach the target switch.

For this purpose, use the dsa_towards_port() helper to identify the
local switch port which must be programmed.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3169241f

net: dsa: introduce dsa_towards_port helper · 3b8fac5d

Vivien Didelot authored Nov 30, 2017

Add a new helper returning the local port used to reach an arbitrary
switch port in the fabric.

Its only user at the moment is the dsa_upstream_port helper, which
returns the local port reaching the dedicated CPU port, but it will be
used in cross-chip FDB operations.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3b8fac5d