Commit 08cbabb7 authored by David S. Miller's avatar David S. Miller

Merge tag 'mlx5-updates-2021-02-04' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

mlx5-updates-2021-02-04

Vlad Buslov says:
=================

Implement support for VF tunneling

Abstract

Currently, mlx5 only supports configuration with tunnel endpoint IP address on
uplink representor. Remove implicit and explicit assumptions of tunnel always
being terminated on uplink and implement necessary infrastructure for
configuring tunnels on VF representors and updating rules on such tunnels
according to routing changes.

SW TC model

From TC perspective VF tunnel configuration requires two rules in both
directions:

TX rules

1. Rule that redirects packets from UL to VF rep that has the tunnel
endpoint IP address:

$ tc -s filter show dev enp8s0f0 ingress
filter protocol ip pref 4 flower chain 0
filter protocol ip pref 4 flower chain 0 handle 0x1
  dst_mac 16:c9:a0:2d:69:2c
  src_mac 0c:42:a1:58:ab:e4
  eth_type ipv4
  ip_flags nofrag
  in_hw in_hw_count 1
        action order 1: mirred (Egress Redirect to device enp8s0f0_0) stolen
        index 3 ref 1 bind 1 installed 377 sec used 0 sec
        Action statistics:
        Sent 114096 bytes 952 pkt (dropped 0, overlimits 0 requeues 0)
        Sent software 0 bytes 0 pkt
        Sent hardware 114096 bytes 952 pkt
        backlog 0b 0p requeues 0
        cookie 878fa48d8c423fc08c3b6ca599b50a97
        no_percpu
        used_hw_stats delayed

2. Rule that decapsulates the tunneled flow and redirects to destination VF
representor:

$ tc -s filter show dev vxlan_sys_4789 ingress
filter protocol ip pref 4 flower chain 0
filter protocol ip pref 4 flower chain 0 handle 0x1
  dst_mac ca:2e:a7:3f:f5:0f
  src_mac 0a:40:bd:30:89:99
  eth_type ipv4
  enc_dst_ip 7.7.7.5
  enc_src_ip 7.7.7.1
  enc_key_id 98
  enc_dst_port 4789
  enc_tos 0
  ip_flags nofrag
  in_hw in_hw_count 1
        action order 1: tunnel_key  unset pipe
         index 2 ref 1 bind 1 installed 434 sec used 434 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
        used_hw_stats delayed

        action order 2: mirred (Egress Redirect to device enp8s0f0_1) stolen
        index 4 ref 1 bind 1 installed 434 sec used 0 sec
        Action statistics:
        Sent 129936 bytes 1082 pkt (dropped 0, overlimits 0 requeues 0)
        Sent software 0 bytes 0 pkt
        Sent hardware 129936 bytes 1082 pkt
        backlog 0b 0p requeues 0
        cookie ac17cf398c4c69e4a5b2f7aabd1b88ff
        no_percpu
        used_hw_stats delayed

RX rules

1. Rule that encapsulates the tunneled flow and redirects packets from
source VF rep to tunnel device:

$ tc -s filter show dev enp8s0f0_1 ingress
filter protocol ip pref 4 flower chain 0
filter protocol ip pref 4 flower chain 0 handle 0x1
  dst_mac 0a:40:bd:30:89:99
  src_mac ca:2e:a7:3f:f5:0f
  eth_type ipv4
  ip_tos 0/0x3
  ip_flags nofrag
  in_hw in_hw_count 1
        action order 1: tunnel_key  set
        src_ip 7.7.7.5
        dst_ip 7.7.7.1
        key_id 98
        dst_port 4789
        nocsum
        ttl 64 pipe
         index 1 ref 1 bind 1 installed 411 sec used 411 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
        no_percpu
        used_hw_stats delayed

        action order 2: mirred (Egress Redirect to device vxlan_sys_4789) stolen
        index 1 ref 1 bind 1 installed 411 sec used 0 sec
        Action statistics:
        Sent 5615833 bytes 4028 pkt (dropped 0, overlimits 0 requeues 0)
        Sent software 0 bytes 0 pkt
        Sent hardware 5615833 bytes 4028 pkt
        backlog 0b 0p requeues 0
        cookie bb406d45d343bf7ade9690ae80c7cba4
        no_percpu
        used_hw_stats delayed

2. Rule that redirects from tunnel device to UL rep:

$ tc -s filter show dev vxlan_sys_4789 ingress
filter protocol ip pref 4 flower chain 0
filter protocol ip pref 4 flower chain 0 handle 0x1
  dst_mac ca:2e:a7:3f:f5:0f
  src_mac 0a:40:bd:30:89:99
  eth_type ipv4
  enc_dst_ip 7.7.7.5
  enc_src_ip 7.7.7.1
  enc_key_id 98
  enc_dst_port 4789
  enc_tos 0
  ip_flags nofrag
  in_hw in_hw_count 1
        action order 1: tunnel_key  unset pipe
         index 2 ref 1 bind 1 installed 434 sec used 434 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
        used_hw_stats delayed

        action order 2: mirred (Egress Redirect to device enp8s0f0_1) stolen
        index 4 ref 1 bind 1 installed 434 sec used 0 sec
        Action statistics:
        Sent 129936 bytes 1082 pkt (dropped 0, overlimits 0 requeues 0)
        Sent software 0 bytes 0 pkt
        Sent hardware 129936 bytes 1082 pkt
        backlog 0b 0p requeues 0
        cookie ac17cf398c4c69e4a5b2f7aabd1b88ff
        no_percpu
        used_hw_stats delayed

HW offloads model

For hardware offload the goal is to mach packet on both rules without exposing
it to software on tunnel endpoint VF. In order to achieve this for tx, TC
implementation marks encap rules with tunnel endpoint on mlx5 VF of same eswitch
with MLX5_ESW_DEST_CHAIN_WITH_SRC_PORT_CHANGE flag and adds header modification
rule to overwrite packet source port to the value of tunnel VF. Eswitch code is
modified to recirculate such packets after source port value is changed, which
allows second tx rules to match.

For rx path indirect table infrastructure is used to allow fully processing VF
tunnel traffic in hardware. To implement such pipeline driver needs to program
the hardware after matching on UL rule to overwrite source vport from UL to
tunnel VF and recirculate the packet to the root table to allow matching on the
rule installed on tunnel VF. For this, indirect table matches all encapsulated
traffic by tunnel parameters and all other IP traffic is sent to tunnel VF by
the miss rule. Such configuration will cause packet to appear on VF representor
instead of VF itself if packet has been matches by indirect table rule based on
tunnel parameters but missed on second rule (after recirculation). Handle such
case by marking packets processed by indirect table with special 0xFFF value in
reg_c1 and extending slow table with additional flow group that matches on
reg_c0 (source port value set by indirect tables) and reg_c1 (special 0xFFF
mark). When creating offloads fdb tables, install one rule per VF vport to match
on recirculated miss packets and redirect them to appropriate VF vport.

Routing events

In order to support routing changes and migration of tunnel device between
different endpoint VFs, implement routing infrastructure and update it with FIB
events. Routing entry table is introduced to mlx5 TC. Every rx and tx VF tunnel
rule is attached to a routing entry, which is shared for rules of same tunnel.
On FIB event the work is scheduled to delete/recreate all rules of affected
tunnel.

Note: only vxlan tunnel type is supported by this series.

=================
parents 4429c5fc 8914add2
......@@ -40,6 +40,7 @@ mlx5_core-$(CONFIG_MLX5_ESWITCH) += lag_mp.o lib/geneve.o lib/port_tun.o \
en_rep.o en/rep/bond.o en/mod_hdr.o
mlx5_core-$(CONFIG_MLX5_CLS_ACT) += en_tc.o en/rep/tc.o en/rep/neigh.o \
en/mapping.o lib/fs_chains.o en/tc_tun.o \
esw/indir_table.o en/tc_tun_encap.o \
en/tc_tun_vxlan.o en/tc_tun_gre.o en/tc_tun_geneve.o \
en/tc_tun_mplsoudp.o diag/en_tc_tracepoint.o
mlx5_core-$(CONFIG_MLX5_TC_CT) += en/tc_ct.o
......
......@@ -15,7 +15,7 @@ TRACE_EVENT(mlx5e_rep_neigh_update,
TP_PROTO(const struct mlx5e_neigh_hash_entry *nhe, const u8 *ha,
bool neigh_connected),
TP_ARGS(nhe, ha, neigh_connected),
TP_STRUCT__entry(__string(devname, nhe->m_neigh.dev->name)
TP_STRUCT__entry(__string(devname, nhe->neigh_dev->name)
__array(u8, ha, ETH_ALEN)
__array(u8, v4, 4)
__array(u8, v6, 16)
......@@ -25,7 +25,7 @@ TRACE_EVENT(mlx5e_rep_neigh_update,
struct in6_addr *pin6;
__be32 *p32;
__assign_str(devname, mn->dev->name);
__assign_str(devname, nhe->neigh_dev->name);
__entry->neigh_connected = neigh_connected;
memcpy(__entry->ha, ha, ETH_ALEN);
......
......@@ -77,7 +77,7 @@ TRACE_EVENT(mlx5e_stats_flower,
TRACE_EVENT(mlx5e_tc_update_neigh_used_value,
TP_PROTO(const struct mlx5e_neigh_hash_entry *nhe, bool neigh_used),
TP_ARGS(nhe, neigh_used),
TP_STRUCT__entry(__string(devname, nhe->m_neigh.dev->name)
TP_STRUCT__entry(__string(devname, nhe->neigh_dev->name)
__array(u8, v4, 4)
__array(u8, v6, 16)
__field(bool, neigh_used)
......@@ -86,7 +86,7 @@ TRACE_EVENT(mlx5e_tc_update_neigh_used_value,
struct in6_addr *pin6;
__be32 *p32;
__assign_str(devname, mn->dev->name);
__assign_str(devname, nhe->neigh_dev->name);
__entry->neigh_used = neigh_used;
p32 = (__be32 *)__entry->v4;
......
......@@ -129,10 +129,10 @@ static void mlx5e_rep_neigh_update(struct work_struct *work)
work);
struct mlx5e_neigh_hash_entry *nhe = update_work->nhe;
struct neighbour *n = update_work->n;
bool neigh_connected, same_dev;
struct mlx5e_encap_entry *e;
unsigned char ha[ETH_ALEN];
struct mlx5e_priv *priv;
bool neigh_connected;
u8 nud_state, dead;
rtnl_lock();
......@@ -146,12 +146,16 @@ static void mlx5e_rep_neigh_update(struct work_struct *work)
memcpy(ha, n->ha, ETH_ALEN);
nud_state = n->nud_state;
dead = n->dead;
same_dev = READ_ONCE(nhe->neigh_dev) == n->dev;
read_unlock_bh(&n->lock);
neigh_connected = (nud_state & NUD_VALID) && !dead;
trace_mlx5e_rep_neigh_update(nhe, ha, neigh_connected);
if (!same_dev)
goto out;
list_for_each_entry(e, &nhe->encap_list, encap_list) {
if (!mlx5e_encap_take(e))
continue;
......@@ -160,6 +164,7 @@ static void mlx5e_rep_neigh_update(struct work_struct *work)
mlx5e_rep_update_flows(priv, e, neigh_connected, ha);
mlx5e_encap_put(priv, e);
}
out:
rtnl_unlock();
mlx5e_release_neigh_update_work(update_work);
}
......@@ -175,7 +180,6 @@ static struct neigh_update_work *mlx5e_alloc_neigh_update_work(struct mlx5e_priv
if (WARN_ON(!update_work))
return NULL;
m_neigh.dev = n->dev;
m_neigh.family = n->ops->family;
memcpy(&m_neigh.dst_ip, n->primary_key, n->tbl->key_len);
......@@ -246,7 +250,7 @@ static int mlx5e_rep_netevent_event(struct notifier_block *nb,
rcu_read_lock();
list_for_each_entry_rcu(nhe, &neigh_update->neigh_list,
neigh_list) {
if (p->dev == nhe->m_neigh.dev) {
if (p->dev == READ_ONCE(nhe->neigh_dev)) {
found = true;
break;
}
......@@ -369,7 +373,8 @@ mlx5e_rep_neigh_entry_lookup(struct mlx5e_priv *priv,
}
int mlx5e_rep_neigh_entry_create(struct mlx5e_priv *priv,
struct mlx5e_encap_entry *e,
struct mlx5e_neigh *m_neigh,
struct net_device *neigh_dev,
struct mlx5e_neigh_hash_entry **nhe)
{
int err;
......@@ -379,10 +384,11 @@ int mlx5e_rep_neigh_entry_create(struct mlx5e_priv *priv,
return -ENOMEM;
(*nhe)->priv = priv;
memcpy(&(*nhe)->m_neigh, &e->m_neigh, sizeof(e->m_neigh));
memcpy(&(*nhe)->m_neigh, m_neigh, sizeof(*m_neigh));
spin_lock_init(&(*nhe)->encap_list_lock);
INIT_LIST_HEAD(&(*nhe)->encap_list);
refcount_set(&(*nhe)->refcnt, 1);
WRITE_ONCE((*nhe)->neigh_dev, neigh_dev);
err = mlx5e_rep_neigh_entry_insert(priv, *nhe);
if (err)
......
......@@ -16,7 +16,8 @@ struct mlx5e_neigh_hash_entry *
mlx5e_rep_neigh_entry_lookup(struct mlx5e_priv *priv,
struct mlx5e_neigh *m_neigh);
int mlx5e_rep_neigh_entry_create(struct mlx5e_priv *priv,
struct mlx5e_encap_entry *e,
struct mlx5e_neigh *m_neigh,
struct net_device *neigh_dev,
struct mlx5e_neigh_hash_entry **nhe);
void mlx5e_rep_neigh_entry_release(struct mlx5e_neigh_hash_entry *nhe);
......
......@@ -26,7 +26,9 @@ struct mlx5e_rep_indr_block_priv {
};
int mlx5e_rep_encap_entry_attach(struct mlx5e_priv *priv,
struct mlx5e_encap_entry *e)
struct mlx5e_encap_entry *e,
struct mlx5e_neigh *m_neigh,
struct net_device *neigh_dev)
{
struct mlx5e_rep_priv *rpriv = priv->ppriv;
struct mlx5_rep_uplink_priv *uplink_priv = &rpriv->uplink_priv;
......@@ -39,9 +41,9 @@ int mlx5e_rep_encap_entry_attach(struct mlx5e_priv *priv,
return err;
mutex_lock(&rpriv->neigh_update.encap_lock);
nhe = mlx5e_rep_neigh_entry_lookup(priv, &e->m_neigh);
nhe = mlx5e_rep_neigh_entry_lookup(priv, m_neigh);
if (!nhe) {
err = mlx5e_rep_neigh_entry_create(priv, e, &nhe);
err = mlx5e_rep_neigh_entry_create(priv, m_neigh, neigh_dev, &nhe);
if (err) {
mutex_unlock(&rpriv->neigh_update.encap_lock);
mlx5_tun_entropy_refcount_dec(tun_entropy,
......@@ -122,7 +124,7 @@ void mlx5e_rep_update_flows(struct mlx5e_priv *priv,
}
unlock:
mutex_unlock(&esw->offloads.encap_tbl_lock);
mlx5e_put_encap_flow_list(priv, &flow_list);
mlx5e_put_flow_list(priv, &flow_list);
}
static int
......@@ -651,7 +653,7 @@ bool mlx5e_rep_tc_update_skb(struct mlx5_cqe64 *cqe,
tc_skb_ext->chain = chain;
zone_restore_id = reg_c1 & ZONE_RESTORE_MAX;
zone_restore_id = reg_c1 & ESW_ZONE_ID_MASK;
uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
uplink_priv = &uplink_rpriv->uplink_priv;
......@@ -660,7 +662,7 @@ bool mlx5e_rep_tc_update_skb(struct mlx5_cqe64 *cqe,
return false;
}
tunnel_id = reg_c1 >> REG_MAPPING_SHIFT(TUNNEL_TO_REG);
tunnel_id = reg_c1 >> ESW_TUN_OFFSET;
return mlx5e_restore_tunnel(priv, skb, tc_priv, tunnel_id);
#endif /* CONFIG_NET_TC_SKB_EXT */
......
......@@ -27,7 +27,9 @@ void mlx5e_rep_update_flows(struct mlx5e_priv *priv,
unsigned char ha[ETH_ALEN]);
int mlx5e_rep_encap_entry_attach(struct mlx5e_priv *priv,
struct mlx5e_encap_entry *e);
struct mlx5e_encap_entry *e,
struct mlx5e_neigh *m_neigh,
struct net_device *neigh_dev);
void mlx5e_rep_encap_entry_detach(struct mlx5e_priv *priv,
struct mlx5e_encap_entry *e);
......
......@@ -711,6 +711,8 @@ mlx5_tc_ct_entry_add_rule(struct mlx5_tc_ct_priv *ct_priv,
attr->outer_match_level = MLX5_MATCH_L4;
attr->counter = entry->counter->counter;
attr->flags |= MLX5_ESW_ATTR_FLAG_NO_IN_PORT;
if (ct_priv->ns_type == MLX5_FLOW_NAMESPACE_FDB)
attr->esw_attr->in_mdev = priv->mdev;
mlx5_tc_ct_set_tuple_match(netdev_priv(ct_priv->netdev), spec, flow_rule);
mlx5e_tc_match_to_reg_match(spec, ZONE_TO_REG, entry->tuple.zone, MLX5_CT_ZONE_MASK);
......@@ -1761,7 +1763,6 @@ __mlx5_tc_ct_flow_offload_clear(struct mlx5_tc_ct_priv *ct_priv,
goto err_set_registers;
}
dealloc_mod_hdr_actions(mod_acts);
pre_ct_attr->modify_hdr = mod_hdr;
pre_ct_attr->action |= MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
......
......@@ -73,7 +73,7 @@ struct mlx5_ct_attr {
#define zone_restore_to_reg_ct {\
.mfield = MLX5_ACTION_IN_FIELD_METADATA_REG_C_1,\
.moffset = 0,\
.mlen = 1,\
.mlen = (ESW_ZONE_ID_BITS / 8),\
.soffset = MLX5_BYTE_OFF(fte_match_param,\
misc_parameters_2.metadata_reg_c_1) + 3,\
}
......@@ -81,14 +81,12 @@ struct mlx5_ct_attr {
#define nic_zone_restore_to_reg_ct {\
.mfield = MLX5_ACTION_IN_FIELD_METADATA_REG_B,\
.moffset = 2,\
.mlen = 1,\
.mlen = (ESW_ZONE_ID_BITS / 8),\
}
#define REG_MAPPING_MLEN(reg) (mlx5e_tc_attr_to_reg_mappings[reg].mlen)
#define REG_MAPPING_MOFFSET(reg) (mlx5e_tc_attr_to_reg_mappings[reg].moffset)
#define REG_MAPPING_SHIFT(reg) (REG_MAPPING_MOFFSET(reg) * 8)
#define ZONE_RESTORE_BITS (REG_MAPPING_MLEN(ZONE_RESTORE_TO_REG) * 8)
#define ZONE_RESTORE_MAX GENMASK(ZONE_RESTORE_BITS - 1, 0)
#if IS_ENABLED(CONFIG_MLX5_TC_CT)
......
/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
/* Copyright (c) 2021 Mellanox Technologies. */
#ifndef __MLX5_EN_TC_PRIV_H__
#define __MLX5_EN_TC_PRIV_H__
#include "en_tc.h"
#define MLX5E_TC_FLOW_BASE (MLX5E_TC_FLAG_LAST_EXPORTED_BIT + 1)
#define MLX5E_TC_MAX_SPLITS 1
enum {
MLX5E_TC_FLOW_FLAG_INGRESS = MLX5E_TC_FLAG_INGRESS_BIT,
MLX5E_TC_FLOW_FLAG_EGRESS = MLX5E_TC_FLAG_EGRESS_BIT,
MLX5E_TC_FLOW_FLAG_ESWITCH = MLX5E_TC_FLAG_ESW_OFFLOAD_BIT,
MLX5E_TC_FLOW_FLAG_FT = MLX5E_TC_FLAG_FT_OFFLOAD_BIT,
MLX5E_TC_FLOW_FLAG_NIC = MLX5E_TC_FLAG_NIC_OFFLOAD_BIT,
MLX5E_TC_FLOW_FLAG_OFFLOADED = MLX5E_TC_FLOW_BASE,
MLX5E_TC_FLOW_FLAG_HAIRPIN = MLX5E_TC_FLOW_BASE + 1,
MLX5E_TC_FLOW_FLAG_HAIRPIN_RSS = MLX5E_TC_FLOW_BASE + 2,
MLX5E_TC_FLOW_FLAG_SLOW = MLX5E_TC_FLOW_BASE + 3,
MLX5E_TC_FLOW_FLAG_DUP = MLX5E_TC_FLOW_BASE + 4,
MLX5E_TC_FLOW_FLAG_NOT_READY = MLX5E_TC_FLOW_BASE + 5,
MLX5E_TC_FLOW_FLAG_DELETED = MLX5E_TC_FLOW_BASE + 6,
MLX5E_TC_FLOW_FLAG_CT = MLX5E_TC_FLOW_BASE + 7,
MLX5E_TC_FLOW_FLAG_L3_TO_L2_DECAP = MLX5E_TC_FLOW_BASE + 8,
MLX5E_TC_FLOW_FLAG_TUN_RX = MLX5E_TC_FLOW_BASE + 9,
MLX5E_TC_FLOW_FLAG_FAILED = MLX5E_TC_FLOW_BASE + 10,
};
struct mlx5e_tc_flow_parse_attr {
const struct ip_tunnel_info *tun_info[MLX5_MAX_FLOW_FWD_VPORTS];
struct net_device *filter_dev;
struct mlx5_flow_spec spec;
struct mlx5e_tc_mod_hdr_acts mod_hdr_acts;
int mirred_ifindex[MLX5_MAX_FLOW_FWD_VPORTS];
struct ethhdr eth;
};
/* Helper struct for accessing a struct containing list_head array.
* Containing struct
* |- Helper array
* [0] Helper item 0
* |- list_head item 0
* |- index (0)
* [1] Helper item 1
* |- list_head item 1
* |- index (1)
* To access the containing struct from one of the list_head items:
* 1. Get the helper item from the list_head item using
* helper item =
* container_of(list_head item, helper struct type, list_head field)
* 2. Get the contining struct from the helper item and its index in the array:
* containing struct =
* container_of(helper item, containing struct type, helper field[index])
*/
struct encap_flow_item {
struct mlx5e_encap_entry *e; /* attached encap instance */
struct list_head list;
int index;
};
struct encap_route_flow_item {
struct mlx5e_route_entry *r; /* attached route instance */
int index;
};
struct mlx5e_tc_flow {
struct rhash_head node;
struct mlx5e_priv *priv;
u64 cookie;
unsigned long flags;
struct mlx5_flow_handle *rule[MLX5E_TC_MAX_SPLITS + 1];
/* flows sharing the same reformat object - currently mpls decap */
struct list_head l3_to_l2_reformat;
struct mlx5e_decap_entry *decap_reformat;
/* flows sharing same route entry */
struct list_head decap_routes;
struct mlx5e_route_entry *decap_route;
struct encap_route_flow_item encap_routes[MLX5_MAX_FLOW_FWD_VPORTS];
/* Flow can be associated with multiple encap IDs.
* The number of encaps is bounded by the number of supported
* destinations.
*/
struct encap_flow_item encaps[MLX5_MAX_FLOW_FWD_VPORTS];
struct mlx5e_tc_flow *peer_flow;
struct mlx5e_mod_hdr_handle *mh; /* attached mod header instance */
struct mlx5e_hairpin_entry *hpe; /* attached hairpin instance */
struct list_head hairpin; /* flows sharing the same hairpin */
struct list_head peer; /* flows with peer flow */
struct list_head unready; /* flows not ready to be offloaded (e.g
* due to missing route)
*/
struct net_device *orig_dev; /* netdev adding flow first */
int tmp_entry_index;
struct list_head tmp_list; /* temporary flow list used by neigh update */
refcount_t refcnt;
struct rcu_head rcu_head;
struct completion init_done;
int tunnel_id; /* the mapped tunnel id of this flow */
struct mlx5_flow_attr *attr;
};
u8 mlx5e_tc_get_ip_version(struct mlx5_flow_spec *spec, bool outer);
struct mlx5_flow_handle *
mlx5e_tc_offload_fdb_rules(struct mlx5_eswitch *esw,
struct mlx5e_tc_flow *flow,
struct mlx5_flow_spec *spec,
struct mlx5_flow_attr *attr);
bool mlx5e_is_offloaded_flow(struct mlx5e_tc_flow *flow);
static inline void __flow_flag_set(struct mlx5e_tc_flow *flow, unsigned long flag)
{
/* Complete all memory stores before setting bit. */
smp_mb__before_atomic();
set_bit(flag, &flow->flags);
}
#define flow_flag_set(flow, flag) __flow_flag_set(flow, MLX5E_TC_FLOW_FLAG_##flag)
static inline bool __flow_flag_test_and_set(struct mlx5e_tc_flow *flow,
unsigned long flag)
{
/* test_and_set_bit() provides all necessary barriers */
return test_and_set_bit(flag, &flow->flags);
}
#define flow_flag_test_and_set(flow, flag) \
__flow_flag_test_and_set(flow, \
MLX5E_TC_FLOW_FLAG_##flag)
static inline void __flow_flag_clear(struct mlx5e_tc_flow *flow, unsigned long flag)
{
/* Complete all memory stores before clearing bit. */
smp_mb__before_atomic();
clear_bit(flag, &flow->flags);
}
#define flow_flag_clear(flow, flag) __flow_flag_clear(flow, \
MLX5E_TC_FLOW_FLAG_##flag)
static inline bool __flow_flag_test(struct mlx5e_tc_flow *flow, unsigned long flag)
{
bool ret = test_bit(flag, &flow->flags);
/* Read fields of flow structure only after checking flags. */
smp_mb__after_atomic();
return ret;
}
#define flow_flag_test(flow, flag) __flow_flag_test(flow, \
MLX5E_TC_FLOW_FLAG_##flag)
void mlx5e_tc_unoffload_from_slow_path(struct mlx5_eswitch *esw,
struct mlx5e_tc_flow *flow);
struct mlx5_flow_handle *
mlx5e_tc_offload_to_slow_path(struct mlx5_eswitch *esw,
struct mlx5e_tc_flow *flow,
struct mlx5_flow_spec *spec);
void mlx5e_tc_unoffload_fdb_rules(struct mlx5_eswitch *esw,
struct mlx5e_tc_flow *flow,
struct mlx5_flow_attr *attr);
struct mlx5e_tc_flow *mlx5e_flow_get(struct mlx5e_tc_flow *flow);
void mlx5e_flow_put(struct mlx5e_priv *priv, struct mlx5e_tc_flow *flow);
struct mlx5_fc *mlx5e_tc_get_counter(struct mlx5e_tc_flow *flow);
#endif /* __MLX5_EN_TC_PRIV_H__ */
......@@ -59,17 +59,30 @@ int mlx5e_tc_tun_init_encap_attr(struct net_device *tunnel_dev,
int mlx5e_tc_tun_create_header_ipv4(struct mlx5e_priv *priv,
struct net_device *mirred_dev,
struct mlx5e_encap_entry *e);
int mlx5e_tc_tun_update_header_ipv4(struct mlx5e_priv *priv,
struct net_device *mirred_dev,
struct mlx5e_encap_entry *e);
#if IS_ENABLED(CONFIG_INET) && IS_ENABLED(CONFIG_IPV6)
int mlx5e_tc_tun_create_header_ipv6(struct mlx5e_priv *priv,
struct net_device *mirred_dev,
struct mlx5e_encap_entry *e);
int mlx5e_tc_tun_update_header_ipv6(struct mlx5e_priv *priv,
struct net_device *mirred_dev,
struct mlx5e_encap_entry *e);
#else
static inline int
mlx5e_tc_tun_create_header_ipv6(struct mlx5e_priv *priv,
struct net_device *mirred_dev,
struct mlx5e_encap_entry *e) { return -EOPNOTSUPP; }
int mlx5e_tc_tun_update_header_ipv6(struct mlx5e_priv *priv,
struct net_device *mirred_dev,
struct mlx5e_encap_entry *e)
{ return -EOPNOTSUPP; }
#endif
int mlx5e_tc_tun_route_lookup(struct mlx5e_priv *priv,
struct mlx5_flow_spec *spec,
struct mlx5_flow_attr *attr);
bool mlx5e_tc_tun_device_to_offload(struct mlx5e_priv *priv,
struct net_device *netdev);
......
This diff is collapsed.
/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
/* Copyright (c) 2021 Mellanox Technologies. */
#ifndef __MLX5_EN_TC_TUN_ENCAP_H__
#define __MLX5_EN_TC_TUN_ENCAP_H__
#include "tc_priv.h"
void mlx5e_detach_encap(struct mlx5e_priv *priv,
struct mlx5e_tc_flow *flow, int out_index);
int mlx5e_attach_encap(struct mlx5e_priv *priv,
struct mlx5e_tc_flow *flow,
struct net_device *mirred_dev,
int out_index,
struct netlink_ext_ack *extack,
struct net_device **encap_dev,
bool *encap_valid);
int mlx5e_attach_decap(struct mlx5e_priv *priv,
struct mlx5e_tc_flow *flow,
struct netlink_ext_ack *extack);
void mlx5e_detach_decap(struct mlx5e_priv *priv,
struct mlx5e_tc_flow *flow);
int mlx5e_attach_decap_route(struct mlx5e_priv *priv,
struct mlx5e_tc_flow *flow);
void mlx5e_detach_decap_route(struct mlx5e_priv *priv,
struct mlx5e_tc_flow *flow);
struct ip_tunnel_info *mlx5e_dup_tun_info(const struct ip_tunnel_info *tun_info);
int mlx5e_tc_set_attr_rx_tun(struct mlx5e_tc_flow *flow,
struct mlx5_flow_spec *spec);
struct mlx5e_tc_tun_encap *mlx5e_tc_tun_init(struct mlx5e_priv *priv);
void mlx5e_tc_tun_cleanup(struct mlx5e_tc_tun_encap *encap);
#endif /* __MLX5_EN_TC_TUN_ENCAP_H__ */
......@@ -59,6 +59,8 @@ struct mlx5e_neigh_update_table {
struct mlx5_tc_ct_priv;
struct mlx5e_rep_bond;
struct mlx5e_tc_tun_encap;
struct mlx5_rep_uplink_priv {
/* Filters DB - instantiated by the uplink representor and shared by
* the uplink's VFs
......@@ -90,6 +92,9 @@ struct mlx5_rep_uplink_priv {
/* support eswitch vports bonding */
struct mlx5e_rep_bond *bond;
/* tc tunneling encapsulation private data */
struct mlx5e_tc_tun_encap *encap;
};
struct mlx5e_rep_priv {
......@@ -110,7 +115,6 @@ struct mlx5e_rep_priv *mlx5e_rep_to_rep_priv(struct mlx5_eswitch_rep *rep)
}
struct mlx5e_neigh {
struct net_device *dev;
union {
__be32 v4;
struct in6_addr v6;
......@@ -122,6 +126,7 @@ struct mlx5e_neigh_hash_entry {
struct rhash_head rhash_node;
struct mlx5e_neigh m_neigh;
struct mlx5e_priv *priv;
struct net_device *neigh_dev;
/* Save the neigh hash entry in a list on the representor in
* addition to the hash table. In order to iterate easily over the
......@@ -153,6 +158,7 @@ enum {
/* set when the encap entry is successfully offloaded into HW */
MLX5_ENCAP_ENTRY_VALID = BIT(0),
MLX5_REFORMAT_DECAP = BIT(1),
MLX5_ENCAP_ENTRY_NO_ROUTE = BIT(2),
};
struct mlx5e_decap_key {
......@@ -175,12 +181,12 @@ struct mlx5e_encap_entry {
struct mlx5e_neigh_hash_entry *nhe;
/* neigh hash entry list of encaps sharing the same neigh */
struct list_head encap_list;
struct mlx5e_neigh m_neigh;
/* a node of the eswitch encap hash table which keeping all the encap
* entries
*/
struct hlist_node encap_hlist;
struct list_head flows;
struct list_head route_list;
struct mlx5_pkt_reformat *pkt_reformat;
const struct ip_tunnel_info *tun_info;
unsigned char h_dest[ETH_ALEN]; /* destination eth addr */
......
......@@ -37,6 +37,8 @@
#include "en.h"
#include "eswitch.h"
#include "en/tc_ct.h"
#include "en/tc_tun.h"
#include "en_rep.h"
#define MLX5E_TC_FLOW_ID_MASK 0x0000ffff
......@@ -76,6 +78,7 @@ struct mlx5_flow_attr {
struct mlx5_flow_table *dest_ft;
u8 inner_match_level;
u8 outer_match_level;
u8 ip_version;
u32 flags;
union {
struct mlx5_esw_flow_attr esw_attr[0];
......@@ -83,6 +86,19 @@ struct mlx5_flow_attr {
};
};
struct mlx5_rx_tun_attr {
u16 decap_vport;
union {
__be32 v4;
struct in6_addr v6;
} src_ip; /* Valid if decap_vport is not zero */
union {
__be32 v4;
struct in6_addr v6;
} dst_ip; /* Valid if decap_vport is not zero */
u32 vni;
};
#define MLX5E_TC_TABLE_CHAIN_TAG_BITS 16
#define MLX5E_TC_TABLE_CHAIN_TAG_MASK GENMASK(MLX5E_TC_TABLE_CHAIN_TAG_BITS - 1, 0)
......@@ -158,7 +174,7 @@ bool mlx5e_encap_take(struct mlx5e_encap_entry *e);
void mlx5e_encap_put(struct mlx5e_priv *priv, struct mlx5e_encap_entry *e);
void mlx5e_take_all_encap_flows(struct mlx5e_encap_entry *e, struct list_head *flow_list);
void mlx5e_put_encap_flow_list(struct mlx5e_priv *priv, struct list_head *flow_list);
void mlx5e_put_flow_list(struct mlx5e_priv *priv, struct list_head *flow_list);
struct mlx5e_neigh_hash_entry;
void mlx5e_tc_update_neigh_used_value(struct mlx5e_neigh_hash_entry *nhe);
......@@ -167,6 +183,7 @@ void mlx5e_tc_reoffload_flows_work(struct work_struct *work);
enum mlx5e_tc_attr_to_reg {
CHAIN_TO_REG,
VPORT_TO_REG,
TUNNEL_TO_REG,
CTSTATE_TO_REG,
ZONE_TO_REG,
......@@ -197,6 +214,11 @@ int mlx5e_tc_match_to_reg_set(struct mlx5_core_dev *mdev,
enum mlx5e_tc_attr_to_reg type,
u32 data);
void mlx5e_tc_match_to_reg_mod_hdr_change(struct mlx5_core_dev *mdev,
struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts,
enum mlx5e_tc_attr_to_reg type,
int act_id, u32 data);
void mlx5e_tc_match_to_reg_match(struct mlx5_flow_spec *spec,
enum mlx5e_tc_attr_to_reg type,
u32 data,
......@@ -207,6 +229,16 @@ void mlx5e_tc_match_to_reg_get_match(struct mlx5_flow_spec *spec,
u32 *data,
u32 *mask);
int mlx5e_tc_match_to_reg_set_and_get_id(struct mlx5_core_dev *mdev,
struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts,
enum mlx5_flow_namespace_type ns,
enum mlx5e_tc_attr_to_reg type,
u32 data);
int mlx5e_tc_add_flow_mod_hdr(struct mlx5e_priv *priv,
struct mlx5e_tc_flow_parse_attr *parse_attr,
struct mlx5e_tc_flow *flow);
int alloc_mod_hdr_actions(struct mlx5_core_dev *mdev,
int namespace,
struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts);
......@@ -242,6 +274,10 @@ mlx5_tc_rule_delete(struct mlx5e_priv *priv,
struct mlx5_flow_handle *rule,
struct mlx5_flow_attr *attr);
bool mlx5e_tc_is_vf_tunnel(struct net_device *out_dev, struct net_device *route_dev);
int mlx5e_tc_query_route_vport(struct net_device *out_dev, struct net_device *route_dev,
u16 *vport);
#else /* CONFIG_MLX5_CLS_ACT */
static inline int mlx5e_tc_nic_init(struct mlx5e_priv *priv) { return 0; }
static inline void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv) {}
......@@ -283,7 +319,7 @@ static inline bool mlx5e_cqe_regb_chain(struct mlx5_cqe64 *cqe)
reg_b = be32_to_cpu(cqe->ft_metadata);
if (reg_b >> (MLX5E_TC_TABLE_CHAIN_TAG_BITS + ZONE_RESTORE_BITS))
if (reg_b >> (MLX5E_TC_TABLE_CHAIN_TAG_BITS + ESW_ZONE_ID_BITS))
return false;
chain = reg_b & MLX5E_TC_TABLE_CHAIN_TAG_MASK;
......
This diff is collapsed.
/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
/* Copyright (c) 2021 Mellanox Technologies. */
#ifndef __MLX5_ESW_FT_H__
#define __MLX5_ESW_FT_H__
#ifdef CONFIG_MLX5_CLS_ACT
struct mlx5_esw_indir_table *
mlx5_esw_indir_table_init(void);
void
mlx5_esw_indir_table_destroy(struct mlx5_esw_indir_table *indir);
struct mlx5_flow_table *mlx5_esw_indir_table_get(struct mlx5_eswitch *esw,
struct mlx5_flow_attr *attr,
struct mlx5_flow_spec *spec,
u16 vport, bool decap);
void mlx5_esw_indir_table_put(struct mlx5_eswitch *esw,
struct mlx5_flow_attr *attr,
u16 vport, bool decap);
bool
mlx5_esw_indir_table_needed(struct mlx5_eswitch *esw,
struct mlx5_flow_attr *attr,
u16 vport_num,
struct mlx5_core_dev *dest_mdev);
u16
mlx5_esw_indir_table_decap_vport(struct mlx5_flow_attr *attr);
#else
/* indir API stubs */
struct mlx5_esw_indir_table *
mlx5_esw_indir_table_init(void)
{
return NULL;
}
void
mlx5_esw_indir_table_destroy(struct mlx5_esw_indir_table *indir)
{
}
static inline struct mlx5_flow_table *
mlx5_esw_indir_table_get(struct mlx5_eswitch *esw,
struct mlx5_flow_attr *attr,
struct mlx5_flow_spec *spec,
u16 vport, bool decap)
{
return ERR_PTR(-EOPNOTSUPP);
}
static inline void
mlx5_esw_indir_table_put(struct mlx5_eswitch *esw,
struct mlx5_flow_attr *attr,
u16 vport, bool decap)
{
}
bool
mlx5_esw_indir_table_needed(struct mlx5_eswitch *esw,
struct mlx5_flow_attr *attr,
u16 vport_num,
struct mlx5_core_dev *dest_mdev)
{
return false;
}
static inline u16
mlx5_esw_indir_table_decap_vport(struct mlx5_flow_attr *attr)
{
return 0;
}
#endif
#endif /* __MLX5_ESW_FT_H__ */
......@@ -1300,6 +1300,13 @@ int mlx5_esw_vport_enable(struct mlx5_eswitch *esw, u16 vport_num,
(!vport_num && mlx5_core_is_ecpf(esw->dev)))
vport->info.trusted = true;
if (!mlx5_esw_is_manager_vport(esw, vport->vport) &&
MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) {
ret = mlx5_esw_vport_vhca_id_set(esw, vport_num);
if (ret)
goto err_vhca_mapping;
}
esw_vport_change_handle_locked(vport);
esw->enabled_vports++;
......@@ -1307,6 +1314,11 @@ int mlx5_esw_vport_enable(struct mlx5_eswitch *esw, u16 vport_num,
done:
mutex_unlock(&esw->state_lock);
return ret;
err_vhca_mapping:
esw_vport_cleanup(esw, vport);
mutex_unlock(&esw->state_lock);
return ret;
}
void mlx5_esw_vport_disable(struct mlx5_eswitch *esw, u16 vport_num)
......@@ -1325,6 +1337,11 @@ void mlx5_esw_vport_disable(struct mlx5_eswitch *esw, u16 vport_num)
/* Disable events from this vport */
arm_vport_context_events_cmd(esw->dev, vport->vport, 0);
if (!mlx5_esw_is_manager_vport(esw, vport->vport) &&
MLX5_CAP_GEN(esw->dev, vhca_resource_manager))
mlx5_esw_vport_vhca_id_clear(esw, vport_num);
/* We don't assume VFs will cleanup after themselves.
* Calling vport change handler while vport is disabled will cleanup
* the vport resources.
......@@ -1815,6 +1832,7 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
mlx5e_mod_hdr_tbl_init(&esw->offloads.mod_hdr);
atomic64_set(&esw->offloads.num_flows, 0);
ida_init(&esw->offloads.vport_metadata_ida);
xa_init_flags(&esw->offloads.vhca_map, XA_FLAGS_ALLOC);
mutex_init(&esw->state_lock);
mutex_init(&esw->mode_lock);
......@@ -1854,6 +1872,8 @@ void mlx5_eswitch_cleanup(struct mlx5_eswitch *esw)
esw_offloads_cleanup_reps(esw);
mutex_destroy(&esw->mode_lock);
mutex_destroy(&esw->state_lock);
WARN_ON(!xa_empty(&esw->offloads.vhca_map));
xa_destroy(&esw->offloads.vhca_map);
ida_destroy(&esw->offloads.vport_metadata_ida);
mlx5e_mod_hdr_tbl_destroy(&esw->offloads.mod_hdr);
mutex_destroy(&esw->offloads.encap_tbl_lock);
......
......@@ -270,5 +270,7 @@ void mlx5_mdev_uninit(struct mlx5_core_dev *dev);
void mlx5_unload_one(struct mlx5_core_dev *dev, bool cleanup);
int mlx5_load_one(struct mlx5_core_dev *dev, bool boot);
int mlx5_vport_get_other_func_cap(struct mlx5_core_dev *dev, u16 function_id, void *out);
void mlx5_events_work_enqueue(struct mlx5_core_dev *dev, struct work_struct *work);
#endif /* __MLX5_CORE_H__ */
......@@ -1164,3 +1164,15 @@ u16 mlx5_eswitch_get_total_vports(const struct mlx5_core_dev *dev)
return MLX5_SPECIAL_VPORTS(dev) + mlx5_core_max_vfs(dev) + mlx5_sf_max_functions(dev);
}
EXPORT_SYMBOL_GPL(mlx5_eswitch_get_total_vports);
int mlx5_vport_get_other_func_cap(struct mlx5_core_dev *dev, u16 function_id, void *out)
{
u16 opmod = (MLX5_CAP_GENERAL << 1) | (HCA_CAP_OPMOD_GET_MAX & 0x01);
u8 in[MLX5_ST_SZ_BYTES(query_hca_cap_in)] = {};
MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
MLX5_SET(query_hca_cap_in, in, op_mod, opmod);
MLX5_SET(query_hca_cap_in, in, function_id, function_id);
MLX5_SET(query_hca_cap_in, in, other_function, true);
return mlx5_cmd_exec_inout(dev, query_hca_cap, in, out);
}
......@@ -96,6 +96,35 @@ static inline u32 mlx5_eswitch_get_vport_metadata_mask(void)
u32 mlx5_eswitch_get_vport_metadata_for_match(struct mlx5_eswitch *esw,
u16 vport_num);
u32 mlx5_eswitch_get_vport_metadata_for_set(struct mlx5_eswitch *esw,
u16 vport_num);
/* Reg C1 usage:
* Reg C1 = < ESW_TUN_ID(12) | ESW_TUN_OPTS(12) | ESW_ZONE_ID(8) >
*
* Highest 12 bits of reg c1 is the encapsulation tunnel id, next 12 bits is
* encapsulation tunnel options, and the lowest 8 bits are used for zone id.
*
* Zone id is used to restore CT flow when packet misses on chain.
*
* Tunnel id and options are used together to restore the tunnel info metadata
* on miss and to support inner header rewrite by means of implicit chain 0
* flows.
*/
#define ESW_ZONE_ID_BITS 8
#define ESW_TUN_OPTS_BITS 12
#define ESW_TUN_ID_BITS 12
#define ESW_TUN_OPTS_OFFSET ESW_ZONE_ID_BITS
#define ESW_TUN_OFFSET ESW_TUN_OPTS_OFFSET
#define ESW_ZONE_ID_MASK GENMASK(ESW_ZONE_ID_BITS - 1, 0)
#define ESW_TUN_OPTS_MASK GENMASK(32 - ESW_TUN_ID_BITS - 1, ESW_TUN_OPTS_OFFSET)
#define ESW_TUN_MASK GENMASK(31, ESW_TUN_OFFSET)
#define ESW_TUN_ID_SLOW_TABLE_GOTO_VPORT 0 /* 0 is not a valid tunnel id */
#define ESW_TUN_OPTS_SLOW_TABLE_GOTO_VPORT 0xFFF /* 0xFFF is a reserved mapping */
#define ESW_TUN_SLOW_TABLE_GOTO_VPORT ((ESW_TUN_ID_SLOW_TABLE_GOTO_VPORT << ESW_TUN_OPTS_BITS) | \
ESW_TUN_OPTS_SLOW_TABLE_GOTO_VPORT)
#define ESW_TUN_SLOW_TABLE_GOTO_VPORT_MARK ESW_TUN_OPTS_MASK
u8 mlx5_eswitch_mode(struct mlx5_core_dev *dev);
#else /* CONFIG_MLX5_ESWITCH */
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment