Commits · b9f2b0ffde0c9b666b2b1672eb468b8f805a9b97 · nexedi / linux

15 Nov, 2019 33 commits

vsock: handle buffer_size sockopts in the core · b9f2b0ff

Stefano Garzarella authored Nov 14, 2019

virtio_transport and vmci_transport handle the buffer_size
sockopts in a very similar way.

In order to support multiple transports, this patch moves this
handling in the core to allow the user to change the options
also if the socket is not yet assigned to any transport.

This patch also adds the '.notify_buffer_size' callback in the
'struct virtio_transport' in order to inform the transport,
when the buffer_size is changed by the user. It is also useful
to limit the 'buffer_size' requested (e.g. virtio transports).
Acked-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b9f2b0ff

vsock: add 'struct vsock_sock *' param to vsock_core_get_transport() · daabfbca

Stefano Garzarella authored Nov 14, 2019

Since now the 'struct vsock_sock' object contains a pointer to
the transport, this patch adds a parameter to the
vsock_core_get_transport() to return the right transport
assigned to the socket.

This patch modifies also the virtio_transport_get_ops(), that
uses the vsock_core_get_transport(), adding the
'struct vsock_sock *' parameter.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

daabfbca

vsock/virtio: add transport parameter to the virtio_transport_reset_no_sock() · 4c7246dc

Stefano Garzarella authored Nov 14, 2019

We are going to add 'struct vsock_sock *' parameter to
virtio_transport_get_ops().

In some cases, like in the virtio_transport_reset_no_sock(),
we don't have any socket assigned to the packet received,
so we can't use the virtio_transport_get_ops().

In order to allow virtio_transport_reset_no_sock() to use the
'.send_pkt' callback from the 'vhost_transport' or 'virtio_transport',
we add the 'struct virtio_transport *' to it and to its caller:
virtio_transport_recv_pkt().

We moved the 'vhost_transport' and 'virtio_transport' definition,
to pass their address to the virtio_transport_recv_pkt().
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4c7246dc

vsock: add 'transport' member in the struct vsock_sock · fe502c4a

Stefano Garzarella authored Nov 14, 2019

As a preparation to support multiple transports, this patch adds
the 'transport' member at the 'struct vsock_sock'.
This new field is initialized during the creation in the
__vsock_create() function.

This patch also renames the global 'transport' pointer to
'transport_single', since for now we're only supporting a single
transport registered at run-time.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fe502c4a

vsock: remove include/linux/vm_sockets.h file · 3603a2e9

Stefano Garzarella authored Nov 14, 2019

This header file now only includes the "uapi/linux/vm_sockets.h".
We can include directly it when needed.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3603a2e9

vsock: remove vm_sockets_get_local_cid() · db205c76

Stefano Garzarella authored Nov 14, 2019

vm_sockets_get_local_cid() is only used in virtio_transport_common.c.
We can replace it calling the virtio_transport_get_ops() and
using the get_local_cid() callback registered by the transport.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

db205c76

vsock/vmci: remove unused VSOCK_DEFAULT_CONNECT_TIMEOUT · 7ed78bc4

Stefano Garzarella authored Nov 14, 2019

The VSOCK_DEFAULT_CONNECT_TIMEOUT definition was introduced with
commit d021c344 ("VSOCK: Introduce VM Sockets"), but it is
never used in the net/vmw_vsock/vmci_transport.c.

VSOCK_DEFAULT_CONNECT_TIMEOUT is used and defined in
net/vmw_vsock/af_vsock.c

Cc: Jorgen Hansen <jhansen@vmware.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7ed78bc4

Merge branch 'octeontx2-af-Debugfs-support-and-updates-to-parser-profile' · 798a496b

David S. Miller authored Nov 14, 2019

Sunil Goutham says:

====================
octeontx2-af: Debugfs support and updates to parser profile

This patchset adds debugfs support to dump various HW state machine info
which helps in debugging issues. Info includes
- Current queue context, stats, resource utilization etc
- MCAM entry utilization, miss and pkt drop counter
- CGX ingress and egress stats
- Current RVU block allocation status
- etc.

Rest patches has changes wrt
- Updated packet parsing profile for parsing more protocols.
- RSS algorithms to include inner protocols while generating hash
- Handle current version of silicon's limitations wrt shaping, coloring
  and fixed mapping of transmit limiter queue's configuration.
- Enable broadcast packet replication to PF and it's VFs.
- Support for configurable NDC cache waymask
- etc

Changes from v1:
   Removed inline keyword for newly introduced APIs in few patches.
   - Suggested by David Miller.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

798a496b

octeontx2-af: Start/Stop traffic in CGX along with NPC · a7faa68b

Subbaraya Sundeep authored Nov 14, 2019

Traffic for a CGX mapped NIXLF can be stopped by disabling entries
in NPC MCAM or by configuring CGX and mailbox messages exist for the
two options. If traffic is stopped at CGX then VFs of that PF are
also effected hence CGX traffic should be started/stopped by
tracking all the users of it. This patch implements that CGX users
tracking. CGX is also configured along with NPC if required.

Also removed a check which mandates even number of LBK VFs.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7faa68b

octeontx2-af: Add option to disable dynamic entry caching in NDC · a0291766

Sunil Goutham authored Nov 14, 2019

A config option is added to disable caching of dynamic entries
like SQEs and stack pages. Also locks down all HW contexts in NDC,
preventing them from being evicted.

This option is useful when the queue count is large and there are
huge NDC cache misses. It's trade off between SQ context misses and
dynamically changing entries like SQE and stack page pointers.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a0291766

octeontx2-af: Support configurable NDC cache way_mask · ee1e7591

Geetha sowjanya authored Nov 14, 2019

Each of the NIX/NPA LFs can choose which ways of their respective
NDC caches should be used to cache their contexts. This enables
flexible configurations like disabling caching for a LF, limiting
it's context to a certain set of ways etc etc. Separate way_mask
for NIX-TX and NIX-RX is not supported.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ee1e7591

octeontx2-af: Enable broadcast packet replication · 561e8752

Sunil Goutham authored Nov 14, 2019

Ingress packet replication support has been added to 96xx B0
silicon. This patch enables using that feature to replicate
ingress broadcast packets to PF and it's VFs.

Also fixed below issues
- VFs can also install NPC MCAM entry to forward broadcast pkts.
  Otherwise, unless PF's interface is UP, VFs will not receive
  bcast packets.
- NPC MCAM entry is disabled when PF and all it's VFs are down.
- Few corner cases in installing multicast entry list.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

561e8752

octeontx2-af: Support fixed transmit scheduler topology · 5d9b976d

Sunil Goutham authored Nov 14, 2019

CN96xx initial silicon doesn't support all features pertaining to
NIX transmit scheduling and shaping.
- It supports a fixed topology of 1:1 mapped transmit
  limiters at all levels.
- Supports DWRR only at SMQ/MDQ and TL1.
- Doesn't support shaping and coloring.

This patch adds HW capability structure by which each variant
and skew of silicon can be differentiated by their supported
features. And adds support for A0 silicon's transmit scheduler
capabilities or rather limitations.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5d9b976d

octeontx2-af: Add more RSS algorithms · 206ff848

Kiran Kumar K authored Nov 14, 2019

This patch adds support for few more RSS key types for flow key
algorithm to compute rss hash index.

Following flow key types have been added.
- Tunnel types like NVGRE, VXLAN, GENEVE.
- L2 offload type ETH_DMAC, Here we will consider only DMAC 6 bytes.
- And extension header IPV6_EXT (1 byte followed by IPV6 header
- Hashing inner protocol fields for inner DMAC, IPv4/v6, TCP, UDP, SCTP.
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

206ff848

octeontx2-af: Clear NPC MCAM entries before update · 8cc89ae9

Nithin Dabilpuram authored Nov 14, 2019

Writing into NPC MCAM1 and MCAM0 registers are suppressed if
they happened to form a reserved combination. Hence
clear and disable MCAM entries before update.

For HRM:
[CAM(1)]<n>=1, [CAM(0)]<n>=1: Reserved.
The reserved combination is not allowed. Hardware suppresses any
write to CAM(0) or CAM(1) that would result in the reserved combination for
any CAM bit.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8cc89ae9

octeontx2-af: Update NPC KPU packet parsing profile · 922584f6

Hao Zheng authored Nov 14, 2019

Updated NPC KPU packet parsing profile with support for following

- Fragmentation support for IPv4 IPv6 outer header
- NIX instruction header support
- QinQ with TPID of 0x8100 as non inner most vlan tag, as legacy
  network equipments still generate QinQ packets with this configuration.
- To better support RSS for tunnelled packets, udp based tunnel
  protocols such as vxlan, vxlan-gpe, geneve and gtpu are now
  captured into a separate layer E. Consequently, the inner
  packet headers are pushed one layer down to LF, LG, and LH
  accordingly.
- Support for rfc7510 mpls in udp. Up to 4 MPLS labels can be parsed
  and captured in one layer LE.
- Parser support for DSA, extended DSA and eDSA tags right after
  ethernet header by Marvell SOHO and Falcon switches. For extended
  DSA and eDSA tags, a special PKIND of 62 is used, as these tags don't
  contain a tpid field.
- Higig2 protocol header parsing support, added a NPC_LT_LA_HIGIG2_ETHER
  for a combined header of HIGIG2 and Ethernet.  Add a
  NPC_LT_LA_IH_NIX_HIGIG2_ETHER for a combined header of nix_ih,
  HIGIG2 and Ethernet on egress side. Also added 2 upper flags in LA to
  indicate the presence of nix_ih and HIGIG2.

Other changes include
- IPv4.TTL==0 IPv6.HLIM==0 check
- Per RFC 1858, mark fragment offset == 1 as error
- TCP invalid flags check
- Separate error codes for outer and inner IPv4 checksum errors.
- Fix a parser error when KPU parses incoming IPSec ESP and AH packets
- NPC vtag capture/strip hardware expect tag pointer to point to
  tpid/ethertype instead of tci. So move lb_ptr to point to tpid/ethertype.
- Fix npc parser error when parsing udp packets that don't have any payload.
- For a single MCAM entry to match on packets with one or stacked vlan tags
  combine NPC_LT_LB_STAG and NPC_LT_LB_QINQ to NPC_LT_LB_STAG_QINQ.
- NVGRE to have a separate ltype LD_NVGRE instead of combined with LD_GRE.
- Reserve top LD/LTYPEs to support custom KPU profile fields.
Signed-off-by: Hao Zheng <haoz@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

922584f6

octeontx2-af: Add macro to generate mbox handlers declarations · c6614738

Subbaraya Sundeep authored Nov 14, 2019

For every mailbox handler added to rvu, we are adding a function
declaration in rvu header file. Cleaned this up by adding a macro
to generate these declarations automatically.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c6614738

octeontx2-af: Sync hw mbox with bounce buffer. · fdb90298

Geetha sowjanya authored Nov 14, 2019

If mailbox client has a bounce buffer or a intermediate buffer where
mbox messages are framed then copy them from there to HW buffer.
If 'mbase' and 'hw_mbase' are not same then assume 'mbase' points to
bounce buffer.

This patch also adds msg_size field to mbox header to copy only valid
data instead of whole buffer.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fdb90298

octeontx2-af: Add mbox API to validate all responses · a36740f6

Sunil Goutham authored Nov 14, 2019

Added a new mailbox API which goes through all responses
to check their IDs and response codes.

Also added logic to prevent queuing multiple works to
process the same mailbox message. This scenario happens
when AF is processing a PF's request and menawhile PF
sends ACK to AF sent UP message, then mbox_hdr->num_msgs
in the PF->AF DOWN mbox region will be nonzero and AF
will end up processing PF's request again. This is fixed
by taking a backup of num_msgs counter and clearing the
same in the mbox region before scheduling work.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a36740f6

octeontx2-af: Add NPC MCAM entry allocation status to debugfs · e07fb507

Sunil Goutham authored Nov 14, 2019

Added support to display current NPC MCAM entries and counter's allocation
status ín debugfs.

cat /sys/kernel/debug/octeontx2/npc/mcam_info' will dump following info
- MCAM Rx and Tx keysize
- Total MCAM entries and counters
- Current available count
- Count of number of MCAM entries and counters allocated
  by a RVU PF/VF device.

Also, one NPC MCAM counter (last one) is reserved and mapped to
NPC RX_INTF's MISS_ACTION to count dropped packets due to no MCAM
entry match. This pkt drop counter can be checked via debugfs.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e07fb507

octeontx2-af: Add per CGX port level NIX Rx/Tx counters · f967488d

Linu Cherian authored Nov 14, 2019

A CGX port is shared by a RVU PF and it's VFs. These per
CGX port level NIX Rx/Tx counters are cumilative stats of
all NIXLFs sharing this port. These stats when compared
to CGX Rx/Tx stats helps in identifying pkts dropped within
the system, if any.
Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f967488d

octeontx2-af: Add CGX LMAC stats to debugfs · c57211b5

Prakash Brahmajyosyula authored Nov 14, 2019

This patch adds CGX LMAC physical interface or serdes Rx/Tx
packet stats to debugfs.

'cat cgx<idx>/lmac<idx>/stats' dumps the current interface link
status and Rx/Tx stats. Stats include pkt received/transmitted,
dropped, pause frames etc etc.
Signed-off-by: Prakash Brahmajyosyula <bprakash@marvell.com>
Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c57211b5

octeontx2-af: Add NDC block stats to debugfs. · c5a797e0

Prakash Brahmajyosyula authored Nov 14, 2019

NDC is a data cache unit which caches NPA and NIX block's
aura/pool/RQ/SQ/CQ/etc contexts to reduce number of costly
DRAM accesses.

This patch adds support to dump cache's performance stats
like cache line hit/miss counters, average cycles taken for
accessing cached and non-cached data. This will help in
checking if NPA/NIX context reads/writes are having NDC cache
misses which inturn might effect performance.

Also changed NDC enums to reflect correct NDC hardware instance.
Signed-off-by: Prakash Brahmajyosyula <bprakash@marvell.com>
Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c5a797e0

octeontx2-af: Add NIX RQ, SQ and CQ contexts to debugfs · 02e202c3

Prakash Brahmajyosyula authored Nov 14, 2019

To aid in debugging NIX block related issues, added support to dump
NIX block LF's RQ, SQ and CQ hardware contexts in debugfs. User can
check which contexts are enabled currently and dump it's current HW
context.

Four new files 'qsize', 'rq_ctx', 'sq_ctx' and 'cq_ctx' are added to the
debugfs at 'sys/kernel/debug/octeontx2/nix/'

'echo <nixlf index> > qsize' will display current enabled CQ/SQ/RQs.
'echo <nixlf> [rq number/all] > rq_ctx',
'echo <nixlf> [sq number/all] > sq_ctx' &
'echo <nixlf> [cq number/all] > cq_ctx' will dump RQ/SQ/CQ's current
hardware context.
Signed-off-by: Prakash Brahmajyosyula <bprakash@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

02e202c3

octeontx2-af: Add NPA aura and pool contexts to debugfs · 8756828a

Christina Jacob authored Nov 14, 2019

To aid in debugging NPA related issues, added support to dump
NPA (pool allocator) block LF's aura and pool hardware contexts in
debugfs. User can check which contexts are enabled currently and dump
it's current HW context.

Three new files 'qsize', 'aura_ctx', 'pool_ctx' are added to the
debugfs at 'sys/kernel/debug/octeontx2/npa/'

'echo <npalf index> > qsize' will display current enabled Aura/Pools.
'echo <npalf> [aura number/all] > aura_ctx' &
'echo <npalf> [aura number/all] > pool_ctx' will dump Aura/Pool
context info.
Signed-off-by: Christina Jacob <cjacob@marvell.com>
Signed-off-by: Prakash Brahmajyosyula <bprakash@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8756828a

octeontx2-af: Dump current resource provisioning status · 23205e6d

Christina Jacob authored Nov 14, 2019

Added support to dump current resource provisioning status
of all resource virtualization unit (RVU) block's
(i.e NPA, NIX, SSO, SSOW, CPT, TIM) local functions attached
to a PF_FUNC into a debugfs file.

'cat /sys/kernel/debug/octeontx2/rsrc_alloc'
will show the current block LF's allocation status.
Signed-off-by: Christina Jacob <cjacob@marvell.com>
Signed-off-by: Prakash Brahmajyosyula <bprakash@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

23205e6d

net: mvneta: fix build skb for bm capable devices · b37fa92e

Lorenzo Bianconi authored Nov 14, 2019

Fix build_skb for bm capable devices when they fall-back using swbm path
(e.g. when bm properties are configured in device tree but
CONFIG_MVNETA_BM_ENABLE is not set). In this case rx_offset_correction is
overwritten so we need to use it building skb instead of
MVNETA_SKB_HEADROOM directly

Fixes: 8dc9a088 ("net: mvneta: rely on build_skb in mvneta_rx_swbm poll routine")
Fixes: 0db51da7 ("net: mvneta: add basic XDP support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reported-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

b37fa92e

Merge tag 'mlx5-updates-2019-11-12' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · f97d139a

David S. Miller authored Nov 14, 2019

Saeed Mahameed says:

====================
mlx5-updates-2019-11-12

1) Merge mlx5-next for devlink reload and flowtable offloads dependencies
2) Devlink reload support
3) TC Flowtable offloads
4) Misc cleanup
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f97d139a

r8169: use r8168d_modify_extpage in rtl8168f_config_eee_phy · d0db136f

Heiner Kallweit authored Nov 13, 2019

Use r8168d_modify_extpage() also in rtl8168f_config_eee_phy() to
simplify the code.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d0db136f

ibmveth: Detect unsupported packets before sending to the hypervisor · 6f227543

Cris Forno authored Nov 13, 2019

Currently, when ibmveth receive a loopback packet, it reports an
ambiguous error message "tx: h_send_logical_lan failed with rc=-4"
because the hypervisor rejects those types of packets. This fix
detects loopback packet and assures the source packet's MAC address
matches the driver's MAC address before transmitting to the
hypervisor.
Signed-off-by: Cris Forno <cforno12@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6f227543

net: phy: dp83869: Add TI dp83869 phy · 01db923e

Dan Murphy authored Nov 13, 2019

Add support for the TI DP83869 Gigabit ethernet phy
device.

The DP83869 is a robust, low power, fully featured
Physical Layer transceiver with integrated PMD
sublayers to support 10BASE-T, 100BASE-TX and
1000BASE-T Ethernet protocols.
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01db923e

dt-bindings: net: dp83869: Add TI dp83869 phy · 4d66c56f

Dan Murphy authored Nov 13, 2019

Add dt bindings for the TI dp83869 Gigabit ethernet phy
device.
Signed-off-by: Dan Murphy <dmurphy@ti.com>
CC: Rob Herring <robh+dt@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

4d66c56f

net: openvswitch: add hash info to upcall · bd1903b7

Tonghao Zhang authored Nov 13, 2019

When using the kernel datapath, the upcall don't
include skb hash info relatived. That will introduce
some problem, because the hash of skb is important
in kernel stack. For example, VXLAN module uses
it to select UDP src port. The tx queue selection
may also use the hash in stack.

Hash is computed in different ways. Hash is random
for a TCP socket, and hash may be computed in hardware,
or software stack. Recalculation hash is not easy.

Hash of TCP socket is computed:
tcp_v4_connect
    -> sk_set_txhash (is random)

__tcp_transmit_skb
    -> skb_set_hash_from_sk

There will be one upcall, without information of skb
hash, to ovs-vswitchd, for the first packet of a TCP
session. The rest packets will be processed in Open vSwitch
modules, hash kept. If this tcp session is forward to
VXLAN module, then the UDP src port of first tcp packet
is different from rest packets.

TCP packets may come from the host or dockers, to Open vSwitch.
To fix it, we store the hash info to upcall, and restore hash
when packets sent back.

+---------------+          +-------------------------+
|   Docker/VMs  |          |     ovs-vswitchd        |
+----+----------+          +-+--------------------+--+
     |                       ^                    |
     |                       |                    |
     |                       |  upcall            v restore packet hash (not recalculate)
     |                     +-+--------------------+--+
     |  tap netdev         |                         |   vxlan module
     +--------------->     +-->  Open vSwitch ko     +-->
       or internal type    |                         |
                           +-------------------------+

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.htmlSigned-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bd1903b7

14 Nov, 2019 7 commits

Merge branch 'Rework-mt762x-GDM-setup-flow' · 839554b7

David S. Miller authored Nov 14, 2019

MarkLee says:

====================
Rework mt762x GDM setup flow

The mt762x GDM block is mainly used to setup the HW internal
rx path from GMAC to RX DMA engine(PDMA) and the packet
switching engine(PSE) is responsed to do the data forward
following the GDM configuration.

This patch set have three goals :

1. Integrate GDM/PSE setup operations into single function "mtk_gdm_config"

2. Refine the timing of GDM/PSE setup, move it from mtk_hw_init
   to mtk_open

3. Enable GDM GDMA_DROP_ALL mode to drop all packet during the
   stop operation
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

839554b7

net: ethernet: mediatek: Enable GDM GDMA_DROP_ALL mode · 8d66a818

MarkLee authored Nov 13, 2019

Enable GDM GDMA_DROP_ALL mode to drop all packet during the
stop operation. This is recommended by the mt762x HW design
to drop all packet from GMAC before stopping PDMA.
Signed-off-by: MarkLee <Mark-MC.Lee@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8d66a818

net: ethernet: mediatek: Refine the timing of GDM/PSE setup · 5ac9eda0

MarkLee authored Nov 13, 2019

Refine the timing of GDM/PSE setup, move it from mtk_hw_init
to mtk_open. This is recommended by the mt762x HW design to
do GDM/PSE setup only after PDMA has been started.

We exclude mt7628 in mtk_gdm_config function since it is a old IP
and there is no GDM/PSE block on it.
Signed-off-by: MarkLee <Mark-MC.Lee@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5ac9eda0

net: ethernet: mediatek: Integrate GDM/PSE setup operations · 8d3f4a95

MarkLee authored Nov 13, 2019

Integrate GDM/PSE setup operations into single function "mtk_gdm_config"
Signed-off-by: MarkLee <Mark-MC.Lee@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8d3f4a95

net: dsa: sja1105: Simplify reset handling · abfb228a

Vladimir Oltean authored Nov 13, 2019

We don't really need 10k species of reset. Remove everything except cold
reset which is what is actually used. Too bad the hardware designers
couldn't agree to use the same bit field for rev 1 and rev 2, so the
(*reset_cmd) function pointer is there to stay.

However let's simplify the prototype and give it a struct dsa_switch (we
want to avoid forward-declarations of structures, in this case struct
sja1105_private, wherever we can).
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

abfb228a

Merge branch 'PTP-clock-source-for-SJA1105-tc-taprio-offload' · ccb68993

David S. Miller authored Nov 14, 2019

Vladimir Oltean says:

====================
PTP clock source for SJA1105 tc-taprio offload

This series makes the IEEE 802.1Qbv egress scheduler of the sja1105
switch use a time reference that is synchronized to the network. This
enables quite a few real Time Sensitive Networking use cases, since in
this mode the switch can offer its clients a TDMA sort of access to the
network, and guaranteed latency for frames that are properly scheduled
based on the common PTP time.

The driver needs to do a 2-part activity:
- Program the gate control list into the static config and upload it
  over SPI to the switch (already supported)
- Write the activation time of the scheduler (base-time) into the
  PTPSCHTM register, and set the PTPSTRTSCH bit.
- Monitor the activation of the scheduler at the planned time and its
  health.

Ok, 3 parts.

The time-aware scheduler cannot be programmed to activate at a time in
the past, and there is some logic to avoid that.

PTPCLKCORP is one of those "black magic" registers that just need to be
written to the length of the cycle. There is a 40-line long comment in
the second patch which explains why.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ccb68993

net: dsa: sja1105: Implement state machine for TAS with PTP clock source · 86db36a3

Vladimir Oltean authored Nov 12, 2019

Tested using the following bash script and the tc from iproute2-next:

	#!/bin/bash

	set -e -u -o pipefail

	NSEC_PER_SEC="1000000000"

	gatemask() {
		local tc_list="$1"
		local mask=0

		for tc in ${tc_list}; do
			mask=$((${mask} | (1 << ${tc})))
		done

		printf "%02x" ${mask}
	}

	if ! systemctl is-active --quiet ptp4l; then
		echo "Please start the ptp4l service"
		exit
	fi

	now=$(phc_ctl /dev/ptp1 get | gawk '/clock time is/ { print $5; }')
	# Phase-align the base time to the start of the next second.
	sec=$(echo "${now}" | gawk -F. '{ print $1; }')
	base_time="$(((${sec} + 1) * ${NSEC_PER_SEC}))"

	tc qdisc add dev swp5 parent root handle 100 taprio \
		num_tc 8 \
		map 0 1 2 3 5 6 7 \
		queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
		base-time ${base_time} \
		sched-entry S $(gatemask 7) 100000 \
		sched-entry S $(gatemask "0 1 2 3 4 5 6") 400000 \
		clockid CLOCK_TAI flags 2

The "state machine" is a workqueue invoked after each manipulation
command on the PTP clock (reset, adjust time, set time, adjust
frequency) which checks over the state of the time-aware scheduler.
So it is not monitored periodically, only in reaction to a PTP command
typically triggered from a userspace daemon (linuxptp). Otherwise there
is no reason for things to go wrong.

Now that the timecounter/cyclecounter has been replaced with hardware
operations on the PTP clock, the TAS Kconfig now depends upon PTP and
the standalone clocksource operating mode has been removed.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

86db36a3