Commits · 7035aad875ba83331ab243173d2775f59880164f · Kirill Smelkov / linux

09 Aug, 2019 22 commits

net: stmmac: xgmac: Implement tx_queue_prio() · 7035aad8

Jose Abreu authored Aug 07, 2019

Implement the TX Queue Priority callback in XGMAC core.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7035aad8

net: stmmac: xgmac: Implement set_mtl_tx_queue_weight() · 5656ac55

Jose Abreu authored Aug 07, 2019

Implement the TX Queue Weight callback. In order for this to be active
we also need to set ETS algorithm when configuring Queue.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5656ac55

net: stmmac: xgmac: Implement MMC counters · b6cdf09f

Jose Abreu authored Aug 07, 2019

Implement the MMC counters feature in XGMAC core.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b6cdf09f

tipc: add loopback device tracking · 6c9081a3

John Rutherford authored Aug 07, 2019

Since node internal messages are passed directly to the socket, it is not
possible to observe those messages via tcpdump or wireshark.

We now remedy this by making it possible to clone such messages and send
the clones to the loopback interface.  The clones are dropped at reception
and have no functional role except making the traffic visible.

The feature is enabled if network taps are active for the loopback device.
pcap filtering restrictions require the messages to be presented to the
receiving side of the loopback device.

v3 - Function dev_nit_active used to check for network taps.
   - Procedure netif_rx_ni used to send cloned messages to loopback device.
Signed-off-by: John Rutherford <john.rutherford@dektech.com.au>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6c9081a3

Merge branch 'flow_offload-add-indr-block-in-nf_table_offload' · 2339ef1c

David S. Miller authored Aug 08, 2019

wenxu says:

====================
flow_offload: add indr-block in nf_table_offload

This series patch make nftables offload support the vlan and
tunnel device offload through indr-block architecture.

The first four patches mv tc indr block to flow offload and
rename to flow-indr-block.
Because the new flow-indr-block can't get the tcf_block
directly. The fifth patch provide a callback list to get
flow_block of each subsystem immediately when the device
register and contain a block.
The last patch make nf_tables_offload support flow-indr-block.

This version add a mutex lock for add/del flow_indr_block_ing_cb
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

2339ef1c

netfilter: nf_tables_offload: support indr block call · 9a32669f

wenxu authored Aug 07, 2019

nftable support indr-block call. It makes nftable an offload vlan
and tunnel device.

nft add table netdev firewall
nft add chain netdev firewall aclout { type filter hook ingress offload device mlx_pf0vf0 priority - 300 \; }
nft add rule netdev firewall aclout ip daddr 10.0.0.1 fwd to vlan0
nft add chain netdev firewall aclin { type filter hook ingress device vlan0 priority - 300 \; }
nft add rule netdev firewall aclin ip daddr 10.0.0.7 fwd to mlx_pf0vf0
Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9a32669f

flow_offload: support get multi-subsystem block · 1150ab0f

wenxu authored Aug 07, 2019

It provide a callback list to find the blocks of tc
and nft subsystems
Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1150ab0f

flow_offload: move tc indirect block to flow offload · 4e481908

wenxu authored Aug 07, 2019

move tc indirect block to flow_offload and rename
it to flow indirect block.The nf_tables can use the
indr block architecture.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4e481908

cls_api: add flow_indr_block_call function · e4da9102

wenxu authored Aug 07, 2019

This patch make indr_block_call don't access struct tc_indr_block_cb
and tc_indr_block_dev directly
Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e4da9102

cls_api: remove the tcf_block cache · f8436988

wenxu authored Aug 07, 2019

Remove the tcf_block in the tc_indr_block_dev for muti-subsystem
support.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f8436988

cls_api: modify the tc_indr_block_ing_cmd parameters. · 242453c2

wenxu authored Aug 07, 2019

This patch make tc_indr_block_ing_cmd can't access struct
tc_indr_block_dev and tc_indr_block_cb.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

242453c2

Merge branch 'net-batched-receive-in-GRO-path' · 61552d2c

David S. Miller authored Aug 08, 2019

Edward Cree says:

====================
net: batched receive in GRO path

This series listifies part of GRO processing, in a manner which allows those
 packets which are not GROed (i.e. for which dev_gro_receive returns
 GRO_NORMAL) to be passed on to the listified regular receive path.
dev_gro_receive() itself is not listified, nor the per-protocol GRO
 callback, since GRO's need to hold packets on lists under napi->gro_hash
 makes keeping the packets on other lists awkward, and since the GRO control
 block state of held skbs can refer only to one 'new' skb at a time.
Instead, when napi_frags_finish() handles a GRO_NORMAL result, stash the skb
 onto a list in the napi struct, which is received at the end of the napi
 poll or when its length exceeds the (new) sysctl net.core.gro_normal_batch.

Performance figures with this series, collected on a back-to-back pair of
 Solarflare sfn8522-r2 NICs with 120-second NetPerf tests.  In the stats,
 sample size n for old and new code is 6 runs each; p is from a Welch t-test.
Tests were run both with GRO enabled and disabled, the latter simulating
 uncoalesceable packets (e.g. due to IP or TCP options).  The receive side
 (which was the device under test) had the NetPerf process pinned to one CPU,
 and the device interrupts pinned to a second CPU.  CPU utilisation figures
 (used in cases of line-rate performance) are summed across all CPUs.
net.core.gro_normal_batch was left at its default value of 8.

TCP 4 streams, GRO on: all results line rate (9.415Gbps)
net-next: 210.3% cpu
after #1: 181.5% cpu (-13.7%, p=0.031 vs net-next)
after #3: 196.7% cpu (- 8.4%, p=0.136 vs net-next)
TCP 4 streams, GRO off:
net-next: 8.017 Gbps
after #1: 7.785 Gbps (- 2.9%, p=0.385 vs net-next)
after #3: 7.604 Gbps (- 5.1%, p=0.282 vs net-next.  But note *)
TCP 1 stream, GRO off:
net-next: 6.553 Gbps
after #1: 6.444 Gbps (- 1.7%, p=0.302 vs net-next)
after #3: 6.790 Gbps (+ 3.6%, p=0.169 vs net-next)
TCP 1 stream, GRO on, busy_read = 50: all results line rate
net-next: 156.0% cpu
after #1: 174.5% cpu (+11.9%, p=0.015 vs net-next)
after #3: 165.0% cpu (+ 5.8%, p=0.147 vs net-next)
TCP 1 stream, GRO off, busy_read = 50:
net-next: 6.488 Gbps
after #1: 6.625 Gbps (+ 2.1%, p=0.059 vs net-next)
after #3: 7.351 Gbps (+13.3%, p=0.026 vs net-next)
TCP_RR 100 streams, GRO off, 8000 byte payload
net-next: 995.083 us
after #1: 969.167 us (- 2.6%, p=0.204 vs net-next)
after #3: 976.433 us (- 1.9%, p=0.254 vs net-next)
TCP_RR 100 streams, GRO off, 8000 byte payload, busy_read = 50:
net-next:   2.851 ms
after #1:   2.871 ms (+ 0.7%, p=0.134 vs net-next)
after #3:   2.937 ms (+ 3.0%, p<0.001 vs net-next)
TCP_RR 100 streams, GRO off, 1 byte payload, busy_read = 50:
net-next: 867.317 us
after #1: 865.717 us (- 0.2%, p=0.334 vs net-next)
after #3: 868.517 us (+ 0.1%, p=0.414 vs net-next)

(*) These tests produced a mixture of line-rate and below-line-rate results,
 meaning that statistically speaking the results were 'censored' by the
 upper bound, and were thus not normally distributed, making a Welch t-test
 mathematically invalid.  I therefore also calculated estimators according
 to [1], which gave the following:
net-next: 8.133 Gbps
after #1: 8.130 Gbps (- 0.0%, p=0.499 vs net-next)
after #3: 7.680 Gbps (- 5.6%, p=0.285 vs net-next)
(though my procedure for determining ν wasn't mathematically well-founded
 either, so take that p-value with a grain of salt).
A further check came from dividing the bandwidth figure by the CPU usage for
 each test run, giving:
net-next: 3.461
after #1: 3.198 (- 7.6%, p=0.145 vs net-next)
after #3: 3.641 (+ 5.2%, p=0.280 vs net-next)

The above results are fairly mixed, and in most cases not statistically
 significant.  But I think we can roughly conclude that the series
 marginally improves non-GROable throughput, without hurting latency
 (except in the large-payload busy-polling case, which in any case yields
 horrid performance even on net-next (almost triple the latency without
 busy-poll).  Also, drivers which, unlike sfc, pass UDP traffic to GRO
 would expect to see a benefit from gaining access to batching.

Changed in v3:
 * gro_normal_batch sysctl now uses SYSCTL_ONE instead of &one
 * removed RFC tags (no comments after a week means no-one objects, right?)

Changed in v2:
 * During busy poll, call gro_normal_list() to receive batched packets
   after each cycle of the napi busy loop.  See comments in Patch #3 for
   complications of doing the same in busy_poll_stop().

[1]: Cohen 1959, doi: 10.1080/00401706.1959.10489859
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

61552d2c

net: use listified RX for handling GRO_NORMAL skbs · 323ebb61

Edward Cree authored Aug 06, 2019

When GRO decides not to coalesce a packet, in napi_frags_finish(), instead
 of passing it to the stack immediately, place it on a list in the napi
 struct.  Then, at flush time (napi_complete_done(), napi_poll(), or
 napi_busy_loop()), call netif_receive_skb_list_internal() on the list.
We'd like to do that in napi_gro_flush(), but it's not called if
 !napi->gro_bitmask, so we have to do it in the callers instead.  (There are
 a handful of drivers that call napi_gro_flush() themselves, but it's not
 clear why, or whether this will affect them.)
Because a full 64 packets is an inefficiently large batch, also consume the
 list whenever it exceeds gro_normal_batch, a new net/core sysctl that
 defaults to 8.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

323ebb61

sfc: falcon: don't score irq moderation points for GRO · 67270136

Edward Cree authored Aug 06, 2019

Same rationale as for sfc, except that this wasn't performance-tested.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

67270136

sfc: don't score irq moderation points for GRO · 5e040d4b

Edward Cree authored Aug 06, 2019

We already scored points when handling the RX event, no-one else does this,
and looking at the history it appears this was originally meant to only
score on merges, not on GRO_NORMAL. Moreover, it gets in the way of
changing GRO to not immediately pass GRO_NORMAL skbs to the stack.
Performance testing with four TCP streams received on a single CPU (where
throughput was line rate of 9.4Gbps in all tests) showed a 13.7% reduction
in RX CPU usage (n=6, p=0.03).
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e040d4b

qed: Add new ethtool supported port types based on media. · 5e6d9fc7

Rahul Verma authored Aug 05, 2019

Supported ports in ethtool <eth1> are displayed based on media type.
For media type fibre and twinaxial, port type is "FIBRE". Media type
Base-T is "TP" and media KR is "Backplane".

V1->V2:
Corrected the subject.
Signed-off-by: Rahul Verma <rahulv@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e6d9fc7

cxgb4: smt: Use normal int for refcount · ad2dcba0

Chuhong Yuan authored Aug 06, 2019

All refcount operations are protected by spinlocks now.
Then the atomic counter can be replaced by a normal int.

This patch depends on PATCH 1/2.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ad2dcba0

cxgb4: smt: Add lock for atomic_dec_and_test · 4a8937b8

Chuhong Yuan authored Aug 06, 2019

The atomic_dec_and_test() is not safe because it is
outside of locks.
Move the locks of t4_smte_free() to its caller,
cxgb4_smt_release() to protect the atomic decrement.

Fixes: 3bdb376e ("cxgb4: introduce SMT ops to prepare for SMAC rewrite support")
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4a8937b8

selftests: Add l2tp tests · e858ef1c

David Ahern authored Aug 05, 2019

Add IPv4 and IPv6 l2tp tests. Current set is over IP and with
IPsec.

v2
- add l2tp.sh to TEST_PROGS in Makefile
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e858ef1c

net: delete "register" keyword · 9d2f1123

Alexey Dobriyan authored Aug 05, 2019

Delete long obsoleted "register" keyword.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9d2f1123

mkiss: Use refcount_t for refcount · 4b4de398

Chuhong Yuan authored Aug 03, 2019

refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4b4de398

dpaa_eth: Use refcount_t for refcount · 31168a6d

Chuhong Yuan authored Aug 03, 2019

refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31168a6d

08 Aug, 2019 1 commit

Merge tag 'batadv-next-for-davem-20190808' of git://git.open-mesh.org/linux-merge · b3a598eb

David S. Miller authored Aug 08, 2019

Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - Replace usage of strlcpy with strscpy, by Sven Eckelmann

 - Add OGMv2 per-interface queue and aggregations, by Linus Luessing
   (2 patches)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b3a598eb

07 Aug, 2019 2 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 13dfb3fa
David S. Miller authored Aug 06, 2019
```
Just minor overlapping changes in the conflicts here.
Signed-off-by: David S. Miller <davem@davemloft.net>
```
13dfb3fa

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 33920f1e

Linus Torvalds authored Aug 06, 2019

Pull networking fixes from David Miller:
 "Yeah I should have sent a pull request last week, so there is a lot
  more here than usual:

   1) Fix memory leak in ebtables compat code, from Wenwen Wang.

   2) Several kTLS bug fixes from Jakub Kicinski (circular close on
      disconnect etc.)

   3) Force slave speed check on link state recovery in bonding 802.3ad
      mode, from Thomas Falcon.

   4) Clear RX descriptor bits before assigning buffers to them in
      stmmac, from Jose Abreu.

   5) Several missing of_node_put() calls, mostly wrt. for_each_*() OF
      loops, from Nishka Dasgupta.

   6) Double kfree_skb() in peak_usb can driver, from Stephane Grosjean.

   7) Need to hold sock across skb->destructor invocation, from Cong
      Wang.

   8) IP header length needs to be validated in ipip tunnel xmit, from
      Haishuang Yan.

   9) Use after free in ip6 tunnel driver, also from Haishuang Yan.

  10) Do not use MSI interrupts on r8169 chips before RTL8168d, from
      Heiner Kallweit.

  11) Upon bridge device init failure, we need to delete the local fdb.
      From Nikolay Aleksandrov.

  12) Handle erros from of_get_mac_address() properly in stmmac, from
      Martin Blumenstingl.

  13) Handle concurrent rename vs. dump in netfilter ipset, from Jozsef
      Kadlecsik.

  14) Setting NETIF_F_LLTX on mac80211 causes complete breakage with
      some devices, so revert. From Johannes Berg.

  15) Fix deadlock in rxrpc, from David Howells.

  16) Fix Kconfig deps of enetc driver, we must have PHYLIB. From Yue
      Haibing.

  17) Fix mvpp2 crash on module removal, from Matteo Croce.

  18) Fix race in genphy_update_link, from Heiner Kallweit.

  19) bpf_xdp_adjust_head() stopped working with generic XDP when we
      fixes generic XDP to support stacked devices properly, fix from
      Jesper Dangaard Brouer.

  20) Unbalanced RCU locking in rt6_update_exception_stamp_rt(), from
      David Ahern.

  21) Several memory leaks in new sja1105 driver, from Vladimir Oltean"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (214 commits)
  net: dsa: sja1105: Fix memory leak on meta state machine error path
  net: dsa: sja1105: Fix memory leak on meta state machine normal path
  net: dsa: sja1105: Really fix panic on unregistering PTP clock
  net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as well
  net: dsa: sja1105: Fix broken learning with vlan_filtering disabled
  net: dsa: qca8k: Add of_node_put() in qca8k_setup_mdio_bus()
  net: sched: sample: allow accessing psample_group with rtnl
  net: sched: police: allow accessing police->params with rtnl
  net: hisilicon: Fix dma_map_single failed on arm64
  net: hisilicon: fix hip04-xmit never return TX_BUSY
  net: hisilicon: make hip04_tx_reclaim non-reentrant
  tc-testing: updated vlan action tests with batch create/delete
  net sched: update vlan action for batched events operations
  net: stmmac: tc: Do not return a fragment entry
  net: stmmac: Fix issues when number of Queues >= 4
  net: stmmac: xgmac: Fix XGMAC selftests
  be2net: disable bh with spin_lock in be_process_mcc
  net: cxgb3_main: Fix a resource leak in a error path in 'init_one()'
  net: ethernet: sun4i-emac: Support phy-handle property for finding PHYs
  net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
  ...

33920f1e

06 Aug, 2019 15 commits

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 05bb5203

David S. Miller authored Aug 06, 2019

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2019-08-05

This series contains updates to i40e driver only.

Dmitrii adds missing statistic counters for VEB and VEB TC's.

Slawomir adds support for logging the "Disable Firmware LLDP" flag
option and its current status.

Jake fixes an issue where VF's being notified of their link status
before their queues are enabled which was causing issues.  So always
report link status down when the VF queues are not enabled.  Also adds
future proofing when statistics are added or removed by adding checks to
ensure the data pointer for the strings lines up with the expected
statistics count.

Czeslaw fixes the advertised mode reported in ethtool for FEC, where the
"None BaseR RS" was always being displayed no matter what the mode it
was in.  Also added logging information when the PF is entering or
leaving "allmulti" (or promiscuous) mode.  Fixed up the logging logic
for VF's when leaving multicast mode to not include unicast as well.

v2: drop Aleksandr's patch (previously patch #2 in the series) to
    display the VF MAC address that is set by the VF while community
    feedback is addressed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

05bb5203

openvswitch: Print error when ovs_execute_actions() fails · aa733660

Yifeng Sun authored Aug 04, 2019

Currently in function ovs_dp_process_packet(), return values of
ovs_execute_actions() are silently discarded. This patch prints out
an debug message when error happens so as to provide helpful hints
for debugging.
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

aa733660

Merge branch 'sja1105-fixes' · feac1d68

David S. Miller authored Aug 06, 2019

Vladimir Oltean says:

====================
Fixes for SJA1105 DSA: FDBs, Learning and PTP

This is an assortment of functional fixes for the sja1105 switch driver
targeted for the "net" tree (although they apply on net-next just as
well).

Patch 1/5 ("net: dsa: sja1105: Fix broken learning with vlan_filtering
disabled") repairs a breakage introduced in the early development stages
of the driver: support for traffic from the CPU has broken "normal"
frame forwarding (based on DMAC) - there is connectivity through the
switch only because all frames are flooded.
I debated whether this patch qualifies as a fix, since it puts the
switch into a mode it has never operated in before (aka SVL). But
"normal" forwarding did use to work before the "Traffic support for
SJA1105 DSA driver" patchset, and arguably this patch should have been
part of that.
Also, it would be strange for this feature to be broken in the 5.2 LTS.

Patch 2/5 ("net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as
well") is a simplification of a previous FDB-related patch that is
currently in the 5.3 rc's.

Patches 3/5 - 5/5 fix various crashes found while running linuxptp over the
switch ports for extended periods of time, or in conjunction with other
error conditions. The fixed-up commits were all introduced in 5.2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

feac1d68

net: dsa: sja1105: Fix memory leak on meta state machine error path · 93fa8587

Vladimir Oltean authored Aug 05, 2019

When RX timestamping is enabled and two link-local (non-meta) frames are
received in a row, this constitutes an error.

The tagger is always caching the last link-local frame, in an attempt to
merge it with the meta follow-up frame when that arrives. To recover
from the above error condition, the initial cached link-local frame is
dropped and the second frame in a row is cached (in expectance of the
second meta frame).

However, when dropping the initial link-local frame, its backing memory
was being leaked.

Fixes: f3097be2 ("net: dsa: sja1105: Add a state machine for RX timestamping")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

93fa8587

net: dsa: sja1105: Fix memory leak on meta state machine normal path · f163fed2

Vladimir Oltean authored Aug 05, 2019

After a meta frame is received, it is associated with the cached
sp->data->stampable_skb from the DSA tagger private structure.

Cached means its refcount is incremented with skb_get() in order for
dsa_switch_rcv() to not free it when the tagger .rcv returns NULL.

The mistake is that skb_unref() is not the correct function to use. It
will correctly decrement the refcount (which will go back to zero) but
the skb memory will not be freed.  That is the job of kfree_skb(), which
also calls skb_unref().

But it turns out that freeing the cached stampable_skb is in fact not
necessary.  It is still a perfectly valid skb, and now it is even
annotated with the partial RX timestamp.  So remove the skb_copy()
altogether and simply pass the stampable_skb with a refcount of 1
(incremented by us, decremented by dsa_switch_rcv) up the stack.

Fixes: f3097be2 ("net: dsa: sja1105: Add a state machine for RX timestamping")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f163fed2

net: dsa: sja1105: Really fix panic on unregistering PTP clock · 6cb0abbd

Vladimir Oltean authored Aug 05, 2019

The IS_ERR_OR_NULL(priv->clock) check inside
sja1105_ptp_clock_unregister() is preventing cancel_delayed_work_sync
from actually being run.

Additionally, sja1105_ptp_clock_unregister() does not actually get run,
when placed in sja1105_remove(). The DSA switch gets torn down, but the
sja1105 module does not get unregistered. So sja1105_ptp_clock_unregister
needs to be moved to sja1105_teardown, to be symmetrical with
sja1105_ptp_clock_register which is called from the DSA sja1105_setup.

It is strange to fix a "fixes" patch, but the probe failure can only be
seen when the attached PHY does not respond to MDIO (issue which I can't
pinpoint the reason to) and it goes away after I power-cycle the board.
This time the patch was validated on a failing board, and the kernel
panic from the fixed commit's message can no longer be seen.

Fixes: 29dd908d ("net: dsa: sja1105: Cancel PTP delayed work on unregister")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6cb0abbd

net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as well · 4b7da3d8

Vladimir Oltean authored Aug 05, 2019

It looks like the FDB dump taken from first-generation switches also
contains information on whether entries are static or not. So use that
instead of searching through the driver's tables.

Fixes: d7637782 ("net: dsa: sja1105: Implement is_static for FDB entries on E/T")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4b7da3d8

net: dsa: sja1105: Fix broken learning with vlan_filtering disabled · 6d7c7d94

Vladimir Oltean authored Aug 05, 2019

When put under a bridge with vlan_filtering 0, the SJA1105 ports will
flood all traffic as if learning was broken. This is because learning
interferes with the rx_vid's configured by dsa_8021q as unique pvid's.

So learning technically still *does* work, it's just that the learnt
entries never get matched due to their unique VLAN ID.

The setting that saves the day is Shared VLAN Learning, which on this
switch family works exactly as desired: VLAN tagging still works
(untagged traffic gets the correct pvid) and FDB entries are still
populated with the correct contents including VID. Also, a frame cannot
violate the forwarding domain restrictions enforced by its classified
VLAN. It is just that the VID is ignored when looking up the FDB for
taking a forwarding decision (selecting the egress port).

This patch activates SVL, and the result is that frames with a learnt
DMAC are no longer flooded in the scenario described above.

Now exactly *because* SVL works as desired, we have to revisit some
earlier patches:

- It is no longer necessary to manipulate the VID of the 'bridge fdb
  {add,del}' command when vlan_filtering is off. This is because now,
  SVL is enabled for that case, so the actual VID does not matter*.

- It is still desirable to hide dsa_8021q VID's in the FDB dump
  callback. But right now the dump callback should no longer hide
  duplicates (one per each front panel port's pvid, plus one for the
  VLAN that the CPU port is going to tag a TX frame with), because there
  shouldn't be any (the switch will match a single FDB entry no matter
  its VID anyway).

* Not really... It's no longer necessary to transform a 'bridge fdb add'
  into 5 fdb add operations, but the user might still add a fdb entry with
  any vid, and all of them would appear as duplicates in 'bridge fdb
  show'. So force a 'bridge fdb add' to insert the VID of 0**, so that we
  can prune the duplicates at insertion time.

** The VID of 0 is better than 1 because it is always guaranteed to be
   in the ports' hardware filter. DSA also avoids putting the VID inside
   the netlink response message towards the bridge driver when we return
   this particular VID, which makes it suitable for FDB entries learnt
   with vlan_filtering off.

Fixes: 227d07a0 ("net: dsa: sja1105: Add support for traffic through standalone ports")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Georg Waibel <georg.waibel@sensor-technik.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

6d7c7d94

net: dsa: qca8k: Add of_node_put() in qca8k_setup_mdio_bus() · f26e0cca

Nishka Dasgupta authored Aug 04, 2019

Each iteration of for_each_available_child_of_node() puts the previous
node, but in the case of a return from the middle of the loop, there
is no put, thus causing a memory leak. Hence add an of_node_put() before
the return.
Additionally, the local variable ports in the function
qca8k_setup_mdio_bus() takes the return value of of_get_child_by_name(),
which gets a node but does not put it. If the function returns without
putting ports, it may cause a memory leak. Hence put ports before the
mid-loop return statement, and also outside the loop after its last usage
in this function.
Issues found with Coccinelle.
Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f26e0cca

Merge branch 'Support-tunnels-over-VLAN-in-NFP' · ef68de56

David S. Miller authored Aug 06, 2019

John Hurley says:

====================
Support tunnels over VLAN in NFP

This patchset deals with tunnel encap and decap when the end-point IP
address is on an internal port (for example and OvS VLAN port). Tunnel
encap without VLAN is already supported in the NFP driver. This patchset
extends that to include a push VLAN along with tunnel header push.

Patches 1-4 extend the flow_offload IR API to include actions that use
skbedit to set the ptype of an SKB and that send a packet to port ingress
from the act_mirred module. Such actions are used in flower rules that
forward tunnel packets to internal ports where they can be decapsulated.
OvS and its TC API is an example of a user-space app that produces such
rules.

Patch 5 modifies the encap offload code to allow the pushing of a VLAN
header after a tunnel header push.

Patches 6-10 deal with tunnel decap when the end-point is on an internal
port. They detect 'pre-tunnel rules' which do not deal with tunnels
themselves but, rather, forward packets to internal ports where they
can be decapped if required. Such rules are offloaded to a table in HW
along with an indication of whether packets need to be passed to this
table of not (based on their destination MAC address). Matching against
this table prior to decapsulation in HW allows the correct parsing and
handling of outer VLANs on tunnelled packets and the correct updating of
stats for said 'pre-tunnel' rules.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ef68de56

nfp: flower: encode mac indexes with pre-tunnel rule check · 2e0bc7f3

John Hurley authored Aug 04, 2019

When a tunnel packet arrives on the NFP card, its destination MAC is
looked up and MAC index returned for it. This index can help verify the
tunnel by, for example, ensuring that the packet arrived on the expected
port. If the packet is destined for a known MAC that is not connected to a
given physical port then the mac index can have a global value (e.g. when
a series of bonded ports shared the same MAC).

If the packet is to be detunneled at a bridge device or internal port like
an Open vSwitch VLAN port, then it should first match a 'pre-tunnel' rule
to direct it to that internal port.

Use the MAC index to indicate if a packet should match a pre-tunnel rule
before decap is allowed. Do this by tracking the number of internal ports
associated with a MAC address and, if the number if >0, set a bit in the
mac_index to forward the packet to the pre-tunnel table before continuing
with decap.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2e0bc7f3

nfp: flower: remove offloaded MACs when reprs are applied to OvS bridges · 09aa811b

John Hurley authored Aug 04, 2019

MAC addresses along with an identifying index are offloaded to firmware to
allow tunnel decapsulation. If a tunnel packet arrives with a matching
destination MAC address and a verified index, it can continue on the
decapsulation process. This replicates the MAC verifications carried out
in the kernel network stack.

When a netdev is added to a bridge (e.g. OvS) then packets arriving on
that dev are directed through the bridge datapath instead of passing
through the network stack. Therefore, tunnelled packets matching the MAC
of that dev will not be decapped here.

Replicate this behaviour on firmware by removing offloaded MAC addresses
when a MAC representer is added to an OvS bridge. This can prevent any
false positive tunnel decaps.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09aa811b

nfp: flower: offload pre-tunnel rules · f12725d9

John Hurley authored Aug 04, 2019

Pre-tunnel rules are TC flower and OvS rules that forward a packet to the
tunnel end point where it can then pass through the network stack and be
decapsulated. These are required if the tunnel end point is, say, an OvS
internal port.

Currently, firmware determines that a packet is in a tunnel and decaps it
if it has a known destination IP and MAC address. However, this bypasses
the flower pre-tunnel rule and so does not update the stats. Further to
this it ignores VLANs that may exist outside of the tunnel header.

Offload pre-tunnel rules to the NFP. This embeds the pre-tunnel rule into
the tunnel decap process based on (firmware) mac index and VLAN. This
means that decap can be carried out correctly with VLANs and that stats
can be updated for all kernel rules correctly.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f12725d9

nfp: flower: verify pre-tunnel rules · 120ffd84

John Hurley authored Aug 04, 2019

Pre-tunnel rules must direct packets to an internal port based on L2
information. Rules that egress to an internal port are already indicated
by a non-NULL device in its nfp_fl_payload struct. Verfiy the rest of the
match fields indicate that the rule is a pre-tunnel rule. This requires a
full match on the destination MAC address, an option VLAN field, and no
specific matches on other lower layer fields (with the exception of L4
proto and flags).

If a rule is identified as a pre-tunnel rule then mark it for offload to
the pre-tunnel table. Similarly, remove it from the pre-tunnel table on
rule deletion. The actual offloading of these commands is left to a
following patch.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

120ffd84

nfp: flower: detect potential pre-tunnel rules · f5c977ee

John Hurley authored Aug 04, 2019

Pre-tunnel rules are used when the tunnel end-point is on an 'internal
port'. These rules are used to direct the tunnelled packets (based on outer
header fields) to the internal port where they can be detunnelled. The
rule must send the packet to ingress the internal port at the TC layer.

Currently FW does not support an action to send to ingress so cannot
offload such rules. However, in preparation for populating the pre-tunnel
table to represent such rules, check for rules that send to the ingress of
an internal port and mark them as such. Further validation of such rules
is left to subsequent patches.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f5c977ee