Commit f86c70ed authored by David S. Miller's avatar David S. Miller

Merge tag 'mlx5-updates-2021-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-04-06

Introduce TC sample offload

Background
----------

The tc sample action allows user to sample traffic matched by tc
classifier. The sampling consists of choosing packets randomly and
sampling them using psample module.

The tc sample parameters include group id, sampling rate and packet's
truncation (to save kernel-user traffic).

Sample in TC SW
---------------

User must specify rate and group id for sample action, truncate is
optional.

tc filter add dev enp4s0f0_0 ingress protocol ip prio 1 flower	\
	src_mac 02:25:d0:14:01:02 dst_mac 02:25:d0:14:01:03	\
	action sample rate 10 group 5 trunc 60			\
	action mirred egress redirect dev enp4s0f0_1

The tc sample action kernel module 'act_sample' will call another
kernel module 'psample' to send sampled packets to userspace.

MLX5 sample HW offload - MLX5 driver patches
--------------------------------------------

The sample action is translated to a goto flow table object
destination which samples packets according to the provided
sample ratio. Sampled packets are duplicated. One copy is
processed by a termination table, named the sample table,
which sends the packet to the eswitch manager port (that will
be processed by software).

The second copy is processed by the default table which executes
the subsequent actions. The default table is created per <vport,
chain, prio> tuple as rules with different prios and chains may
overlap.

For example, for the following typical flow table:

+-------------------------------+
+       original flow table     +
+-------------------------------+
+         original match        +
+-------------------------------+
+ sample action + other actions +
+-------------------------------+

We translate the tc filter with sample action to the following HW model:

        +---------------------+
        + original flow table +
        +---------------------+
        +   original match    +
        +---------------------+
                   |
                   v
+------------------------------------------------+
+                Flow Sampler Object             +
+------------------------------------------------+
+                    sample ratio                +
+------------------------------------------------+
+    sample table id    |    default table id    +
+------------------------------------------------+
           |                            |
           v                            v
+-----------------------------+  +----------------------------------------+
+        sample table         +  + default table per <vport, chain, prio> +
+-----------------------------+  +----------------------------------------+
+ forward to management vport +  +            original match              +
+-----------------------------+  +----------------------------------------+
                                 +            other actions               +
                                 +----------------------------------------+

Flow sampler object
-------------------

Hardware introduces flow sampler object to do sample. It is a new
destination type. Driver needs to specify two flow table ids in it.
One is sample table id. The other one is the default table id.
Sample table samples the packets according to the sample rate and
forward the sampled packets to eswitch manager port. Default table
finishes the subsequent actions.

Group id and reg_c0
-------------------

Userspace program will take different actions for sampled packets
according to tc sample action group id. So hardware must pass group
id to software for each sampled packets. In Paul Blakey's "Introduce
connection tracking offload" patch set, reg_c0 lower 16 bits are used
for miss packet chain id restore. We convert reg_c0 lower 16 bits to
a common object pool, so other features can also use it.

Since sample group id is 32 bits, create a 16 bits object id to map
the group id and write the object id to reg_c0 lower 16 bits. reg_c0
can only be used for matching. Write reg_c0 to flow_tag, so software
can get the object id via flow_tag and find group id via the common
object pool.

Sampler restore handle
----------------------

Use common object pool to create an object id to map sample parameters.
Allocate a modify header action to write the object id to reg_c0 lower
16 bits. Create a restore rule to pass the object id to software. So
software can identify sampled packets via the object id and send it to
userspace.

Aggregate the modify header action, restore rule and object id to a
sample restore handle. Re-use identical sample restore handle for
the same object id.

Send sampled packets to userspace
---------------------------------

The destination for sampled packets is eswitch manager port, so
representors can receive sampled packets together with the group id.
Driver will send sampled packets and group id to userspace via psample.
====================
Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
parents 872fff33 f94d6389
......@@ -104,6 +104,18 @@ config MLX5_TC_CT
If unsure, set to Y
config MLX5_TC_SAMPLE
bool "MLX5 TC sample offload support"
depends on MLX5_CLS_ACT
default y
help
Say Y here if you want to support offloading sample rules via tc
sample action.
If set to N, will not be able to configure tc rules with sample
action.
If unsure, set to Y
config MLX5_CORE_EN_DCB
bool "Data Center Bridging (DCB) Support"
default y
......
......@@ -37,9 +37,10 @@ mlx5_core-$(CONFIG_MLX5_EN_RXNFC) += en_fs_ethtool.o
mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) += en_dcbnl.o en/port_buffer.o
mlx5_core-$(CONFIG_PCI_HYPERV_INTERFACE) += en/hv_vhca_stats.o
mlx5_core-$(CONFIG_MLX5_ESWITCH) += lag_mp.o lib/geneve.o lib/port_tun.o \
en_rep.o en/rep/bond.o en/mod_hdr.o
en_rep.o en/rep/bond.o en/mod_hdr.o \
en/mapping.o
mlx5_core-$(CONFIG_MLX5_CLS_ACT) += en_tc.o en/rep/tc.o en/rep/neigh.o \
en/mapping.o lib/fs_chains.o en/tc_tun.o \
lib/fs_chains.o en/tc_tun.o \
esw/indir_table.o en/tc_tun_encap.o \
en/tc_tun_vxlan.o en/tc_tun_gre.o en/tc_tun_geneve.o \
en/tc_tun_mplsoudp.o diag/en_tc_tracepoint.o
......@@ -53,7 +54,8 @@ mlx5_core-$(CONFIG_MLX5_ESWITCH) += eswitch.o eswitch_offloads.o eswitch_offlo
mlx5_core-$(CONFIG_MLX5_ESWITCH) += esw/acl/helper.o \
esw/acl/egress_lgcy.o esw/acl/egress_ofld.o \
esw/acl/ingress_lgcy.o esw/acl/ingress_ofld.o \
esw/devlink_port.o
esw/devlink_port.o esw/vporttbl.o
mlx5_core-$(CONFIG_MLX5_TC_SAMPLE) += esw/sample.o
mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
mlx5_core-$(CONFIG_VXLAN) += lib/vxlan.o
......
......@@ -29,6 +29,7 @@ struct mlx5e_tc_table {
struct netdev_net_notifier netdevice_nn;
struct mlx5_tc_ct_priv *ct;
struct mapping_ctx *mapping;
};
struct mlx5e_flow_table {
......
......@@ -17,6 +17,7 @@
#include "en/mapping.h"
#include "en/tc_tun.h"
#include "lib/port_tun.h"
#include "esw/sample.h"
struct mlx5e_rep_indr_block_priv {
struct net_device *netdev;
......@@ -611,19 +612,46 @@ static bool mlx5e_restore_tunnel(struct mlx5e_priv *priv, struct sk_buff *skb,
return true;
}
static bool mlx5e_restore_skb(struct sk_buff *skb, u32 chain, u32 reg_c1,
struct mlx5e_tc_update_priv *tc_priv)
{
struct mlx5e_priv *priv = netdev_priv(skb->dev);
u32 tunnel_id = reg_c1 >> ESW_TUN_OFFSET;
if (chain) {
struct mlx5_rep_uplink_priv *uplink_priv;
struct mlx5e_rep_priv *uplink_rpriv;
struct tc_skb_ext *tc_skb_ext;
struct mlx5_eswitch *esw;
u32 zone_restore_id;
tc_skb_ext = skb_ext_add(skb, TC_SKB_EXT);
if (!tc_skb_ext) {
WARN_ON(1);
return false;
}
tc_skb_ext->chain = chain;
zone_restore_id = reg_c1 & ESW_ZONE_ID_MASK;
esw = priv->mdev->priv.eswitch;
uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
uplink_priv = &uplink_rpriv->uplink_priv;
if (!mlx5e_tc_ct_restore_flow(uplink_priv->ct_priv, skb,
zone_restore_id))
return false;
}
return mlx5e_restore_tunnel(priv, skb, tc_priv, tunnel_id);
}
#endif /* CONFIG_NET_TC_SKB_EXT */
bool mlx5e_rep_tc_update_skb(struct mlx5_cqe64 *cqe,
struct sk_buff *skb,
struct mlx5e_tc_update_priv *tc_priv)
{
#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
u32 chain = 0, reg_c0, reg_c1, tunnel_id, zone_restore_id;
struct mlx5_rep_uplink_priv *uplink_priv;
struct mlx5e_rep_priv *uplink_rpriv;
struct tc_skb_ext *tc_skb_ext;
struct mlx5_mapped_obj mapped_obj;
struct mlx5_eswitch *esw;
struct mlx5e_priv *priv;
u32 reg_c0, reg_c1;
int err;
reg_c0 = (be32_to_cpu(cqe->sop_drop_qpn) & MLX5E_TC_FLOW_ID_MASK);
......@@ -639,36 +667,29 @@ bool mlx5e_rep_tc_update_skb(struct mlx5_cqe64 *cqe,
priv = netdev_priv(skb->dev);
esw = priv->mdev->priv.eswitch;
err = mlx5_get_chain_for_tag(esw_chains(esw), reg_c0, &chain);
err = mapping_find(esw->offloads.reg_c0_obj_pool, reg_c0, &mapped_obj);
if (err) {
netdev_dbg(priv->netdev,
"Couldn't find chain for chain tag: %d, err: %d\n",
"Couldn't find mapped object for reg_c0: %d, err: %d\n",
reg_c0, err);
return false;
}
if (chain) {
tc_skb_ext = skb_ext_add(skb, TC_SKB_EXT);
if (!tc_skb_ext) {
WARN_ON(1);
return false;
}
tc_skb_ext->chain = chain;
zone_restore_id = reg_c1 & ESW_ZONE_ID_MASK;
uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
uplink_priv = &uplink_rpriv->uplink_priv;
if (!mlx5e_tc_ct_restore_flow(uplink_priv->ct_priv, skb,
zone_restore_id))
return false;
}
tunnel_id = reg_c1 >> ESW_TUN_OFFSET;
return mlx5e_restore_tunnel(priv, skb, tc_priv, tunnel_id);
#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
if (mapped_obj.type == MLX5_MAPPED_OBJ_CHAIN)
return mlx5e_restore_skb(skb, mapped_obj.chain, reg_c1, tc_priv);
#endif /* CONFIG_NET_TC_SKB_EXT */
#if IS_ENABLED(CONFIG_MLX5_TC_SAMPLE)
if (mapped_obj.type == MLX5_MAPPED_OBJ_SAMPLE) {
mlx5_esw_sample_skb(skb, &mapped_obj);
return false;
}
#endif /* CONFIG_MLX5_TC_SAMPLE */
if (mapped_obj.type != MLX5_MAPPED_OBJ_SAMPLE &&
mapped_obj.type != MLX5_MAPPED_OBJ_CHAIN) {
netdev_dbg(priv->netdev, "Invalid mapped object type: %d\n", mapped_obj.type);
return false;
}
return true;
}
......
......@@ -27,6 +27,7 @@ enum {
MLX5E_TC_FLOW_FLAG_L3_TO_L2_DECAP = MLX5E_TC_FLOW_BASE + 8,
MLX5E_TC_FLOW_FLAG_TUN_RX = MLX5E_TC_FLOW_BASE + 9,
MLX5E_TC_FLOW_FLAG_FAILED = MLX5E_TC_FLOW_BASE + 10,
MLX5E_TC_FLOW_FLAG_SAMPLE = MLX5E_TC_FLOW_BASE + 11,
};
struct mlx5e_tc_flow_parse_attr {
......
......@@ -89,6 +89,7 @@ struct mlx5_rep_uplink_priv {
struct mapping_ctx *tunnel_enc_opts_mapping;
struct mlx5_tc_ct_priv *ct_priv;
struct mlx5_esw_psample *esw_psample;
/* support eswitch vports bonding */
struct mlx5e_rep_bond *bond;
......
......@@ -47,6 +47,7 @@
#include <net/tc_act/tc_pedit.h>
#include <net/tc_act/tc_csum.h>
#include <net/tc_act/tc_mpls.h>
#include <net/psample.h>
#include <net/arp.h>
#include <net/ipv6_stubs.h>
#include <net/bareudp.h>
......@@ -65,6 +66,7 @@
#include "en/mod_hdr.h"
#include "en/tc_priv.h"
#include "en/tc_tun_encap.h"
#include "esw/sample.h"
#include "lib/devcom.h"
#include "lib/geneve.h"
#include "lib/fs_chains.h"
......@@ -221,6 +223,25 @@ get_ct_priv(struct mlx5e_priv *priv)
return priv->fs.tc.ct;
}
#if IS_ENABLED(CONFIG_MLX5_TC_SAMPLE)
static struct mlx5_esw_psample *
get_sample_priv(struct mlx5e_priv *priv)
{
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
struct mlx5_rep_uplink_priv *uplink_priv;
struct mlx5e_rep_priv *uplink_rpriv;
if (is_mdev_switchdev_mode(priv->mdev)) {
uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
uplink_priv = &uplink_rpriv->uplink_priv;
return uplink_priv->esw_psample;
}
return NULL;
}
#endif
struct mlx5_flow_handle *
mlx5_tc_rule_insert(struct mlx5e_priv *priv,
struct mlx5_flow_spec *spec,
......@@ -1090,6 +1111,10 @@ mlx5e_tc_offload_fdb_rules(struct mlx5_eswitch *esw,
rule = mlx5_tc_ct_flow_offload(get_ct_priv(flow->priv),
flow, spec, attr,
mod_hdr_acts);
#if IS_ENABLED(CONFIG_MLX5_TC_SAMPLE)
} else if (flow_flag_test(flow, SAMPLE)) {
rule = mlx5_esw_sample_offload(get_sample_priv(flow->priv), spec, attr);
#endif
} else {
rule = mlx5_eswitch_add_offloaded_rule(esw, spec, attr);
}
......@@ -1125,6 +1150,13 @@ void mlx5e_tc_unoffload_fdb_rules(struct mlx5_eswitch *esw,
return;
}
#if IS_ENABLED(CONFIG_MLX5_TC_SAMPLE)
if (flow_flag_test(flow, SAMPLE)) {
mlx5_esw_sample_unoffload(get_sample_priv(flow->priv), flow->rule[0], attr);
return;
}
#endif
if (attr->esw_attr->split_count)
mlx5_eswitch_del_fwd_rule(esw, flow->rule[1], attr);
......@@ -1481,6 +1513,7 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv,
if (flow_flag_test(flow, L3_TO_L2_DECAP))
mlx5e_detach_decap(priv, flow);
kfree(flow->attr->esw_attr->sample);
kfree(flow->attr);
}
......@@ -3627,6 +3660,7 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
bool ft_flow = mlx5e_is_ft_flow(flow);
const struct flow_action_entry *act;
struct mlx5_esw_flow_attr *esw_attr;
struct mlx5_sample_attr sample = {};
bool encap = false, decap = false;
u32 action = attr->action;
int err, i, if_count = 0;
......@@ -3881,6 +3915,10 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
attr->dest_chain = act->chain_index;
break;
case FLOW_ACTION_CT:
if (flow_flag_test(flow, SAMPLE)) {
NL_SET_ERR_MSG_MOD(extack, "Sample action with connection tracking is not supported");
return -EOPNOTSUPP;
}
err = mlx5_tc_ct_parse_action(get_ct_priv(priv), attr, act, extack);
if (err)
return err;
......@@ -3888,6 +3926,17 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
flow_flag_set(flow, CT);
esw_attr->split_count = esw_attr->out_count;
break;
case FLOW_ACTION_SAMPLE:
if (flow_flag_test(flow, CT)) {
NL_SET_ERR_MSG_MOD(extack, "Sample action with connection tracking is not supported");
return -EOPNOTSUPP;
}
sample.rate = act->sample.rate;
sample.group_num = act->sample.psample_group->group_num;
if (act->sample.truncate)
sample.trunc_size = act->sample.trunc_size;
flow_flag_set(flow, SAMPLE);
break;
default:
NL_SET_ERR_MSG_MOD(extack, "The offload action is not supported");
return -EOPNOTSUPP;
......@@ -3966,6 +4015,16 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
return -EOPNOTSUPP;
}
/* Allocate sample attribute only when there is a sample action and
* no errors after parsing.
*/
if (flow_flag_test(flow, SAMPLE)) {
esw_attr->sample = kzalloc(sizeof(*esw_attr->sample), GFP_KERNEL);
if (!esw_attr->sample)
return -ENOMEM;
*esw_attr->sample = sample;
}
return 0;
}
......@@ -4731,6 +4790,7 @@ int mlx5e_tc_nic_init(struct mlx5e_priv *priv)
{
struct mlx5e_tc_table *tc = &priv->fs.tc;
struct mlx5_core_dev *dev = priv->mdev;
struct mapping_ctx *chains_mapping;
struct mlx5_chains_attr attr = {};
int err;
......@@ -4745,15 +4805,22 @@ int mlx5e_tc_nic_init(struct mlx5e_priv *priv)
lockdep_set_class(&tc->ht.mutex, &tc_ht_lock_key);
if (MLX5_CAP_FLOWTABLE_NIC_RX(priv->mdev, ignore_flow_level)) {
chains_mapping = mapping_create(sizeof(struct mlx5_mapped_obj),
MLX5E_TC_TABLE_CHAIN_TAG_MASK, true);
if (IS_ERR(chains_mapping)) {
err = PTR_ERR(chains_mapping);
goto err_mapping;
}
tc->mapping = chains_mapping;
if (MLX5_CAP_FLOWTABLE_NIC_RX(priv->mdev, ignore_flow_level))
attr.flags = MLX5_CHAINS_AND_PRIOS_SUPPORTED |
MLX5_CHAINS_IGNORE_FLOW_LEVEL_SUPPORTED;
attr.max_restore_tag = MLX5E_TC_TABLE_CHAIN_TAG_MASK;
}
attr.ns = MLX5_FLOW_NAMESPACE_KERNEL;
attr.max_ft_sz = mlx5e_tc_nic_get_ft_size(dev);
attr.max_grp_num = MLX5E_TC_TABLE_NUM_GROUPS;
attr.default_ft = mlx5e_vlan_get_flowtable(priv->fs.vlan);
attr.mapping = chains_mapping;
tc->chains = mlx5_chains_create(dev, &attr);
if (IS_ERR(tc->chains)) {
......@@ -4780,6 +4847,8 @@ int mlx5e_tc_nic_init(struct mlx5e_priv *priv)
mlx5_tc_ct_clean(tc->ct);
mlx5_chains_destroy(tc->chains);
err_chains:
mapping_destroy(chains_mapping);
err_mapping:
rhashtable_destroy(&tc->ht);
return err;
}
......@@ -4814,6 +4883,7 @@ void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv)
mutex_destroy(&tc->t_lock);
mlx5_tc_ct_clean(tc->ct);
mapping_destroy(tc->mapping);
mlx5_chains_destroy(tc->chains);
}
......@@ -4837,6 +4907,10 @@ int mlx5e_tc_esw_init(struct rhashtable *tc_ht)
&esw->offloads.mod_hdr,
MLX5_FLOW_NAMESPACE_FDB);
#if IS_ENABLED(CONFIG_MLX5_TC_SAMPLE)
uplink_priv->esw_psample = mlx5_esw_sample_init(netdev_priv(priv->netdev));
#endif
mapping = mapping_create(sizeof(struct tunnel_match_key),
TUNNEL_INFO_BITS_MASK, true);
if (IS_ERR(mapping)) {
......@@ -4874,6 +4948,9 @@ int mlx5e_tc_esw_init(struct rhashtable *tc_ht)
err_enc_opts_mapping:
mapping_destroy(uplink_priv->tunnel_mapping);
err_tun_mapping:
#if IS_ENABLED(CONFIG_MLX5_TC_SAMPLE)
mlx5_esw_sample_cleanup(uplink_priv->esw_psample);
#endif
mlx5_tc_ct_clean(uplink_priv->ct_priv);
netdev_warn(priv->netdev,
"Failed to initialize tc (eswitch), err: %d", err);
......@@ -4892,6 +4969,9 @@ void mlx5e_tc_esw_cleanup(struct rhashtable *tc_ht)
mapping_destroy(uplink_priv->tunnel_enc_opts_mapping);
mapping_destroy(uplink_priv->tunnel_mapping);
#if IS_ENABLED(CONFIG_MLX5_TC_SAMPLE)
mlx5_esw_sample_cleanup(uplink_priv->esw_psample);
#endif
mlx5_tc_ct_clean(uplink_priv->ct_priv);
}
......@@ -4973,6 +5053,7 @@ bool mlx5e_tc_update_skb(struct mlx5_cqe64 *cqe,
u32 chain = 0, chain_tag, reg_b, zone_restore_id;
struct mlx5e_priv *priv = netdev_priv(skb->dev);
struct mlx5e_tc_table *tc = &priv->fs.tc;
struct mlx5_mapped_obj mapped_obj;
struct tc_skb_ext *tc_skb_ext;
int err;
......@@ -4980,7 +5061,7 @@ bool mlx5e_tc_update_skb(struct mlx5_cqe64 *cqe,
chain_tag = reg_b & MLX5E_TC_TABLE_CHAIN_TAG_MASK;
err = mlx5_get_chain_for_tag(nic_chains(priv), chain_tag, &chain);
err = mapping_find(tc->mapping, chain_tag, &mapped_obj);
if (err) {
netdev_dbg(priv->netdev,
"Couldn't find chain for chain tag: %d, err: %d\n",
......@@ -4988,7 +5069,8 @@ bool mlx5e_tc_update_skb(struct mlx5_cqe64 *cqe,
return false;
}
if (chain) {
if (mapped_obj.type == MLX5_MAPPED_OBJ_CHAIN) {
chain = mapped_obj.chain;
tc_skb_ext = skb_ext_add(skb, TC_SKB_EXT);
if (WARN_ON(!tc_skb_ext))
return false;
......@@ -5001,6 +5083,9 @@ bool mlx5e_tc_update_skb(struct mlx5_cqe64 *cqe,
if (!mlx5e_tc_ct_restore_flow(tc->ct, skb,
zone_restore_id))
return false;
} else {
netdev_dbg(priv->netdev, "Invalid mapped object type: %d\n", mapped_obj.type);
return false;
}
#endif /* CONFIG_NET_TC_SKB_EXT */
......
This diff is collapsed.
/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
/* Copyright (c) 2021 Mellanox Technologies. */
#ifndef __MLX5_EN_TC_SAMPLE_H__
#define __MLX5_EN_TC_SAMPLE_H__
#include "en.h"
#include "eswitch.h"
struct mlx5e_priv;
struct mlx5_flow_attr;
struct mlx5_esw_psample;
struct mlx5_sample_attr {
u32 group_num;
u32 rate;
u32 trunc_size;
u32 restore_obj_id;
u32 sampler_id;
struct mlx5_flow_table *sample_default_tbl;
struct mlx5_sample_flow *sample_flow;
};
void mlx5_esw_sample_skb(struct sk_buff *skb, struct mlx5_mapped_obj *mapped_obj);
struct mlx5_flow_handle *
mlx5_esw_sample_offload(struct mlx5_esw_psample *sample_priv,
struct mlx5_flow_spec *spec,
struct mlx5_flow_attr *attr);
void
mlx5_esw_sample_unoffload(struct mlx5_esw_psample *sample_priv,
struct mlx5_flow_handle *rule,
struct mlx5_flow_attr *attr);
struct mlx5_esw_psample *
mlx5_esw_sample_init(struct mlx5e_priv *priv);
void
mlx5_esw_sample_cleanup(struct mlx5_esw_psample *esw_psample);
#endif /* __MLX5_EN_TC_SAMPLE_H__ */
// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
// Copyright (c) 2021 Mellanox Technologies.
#include "eswitch.h"
/* This struct is used as a key to the hash table and we need it to be packed
* so hash result is consistent
*/
struct mlx5_vport_key {
u32 chain;
u16 prio;
u16 vport;
u16 vhca_id;
const struct esw_vport_tbl_namespace *vport_ns;
} __packed;
struct mlx5_vport_table {
struct hlist_node hlist;
struct mlx5_flow_table *fdb;
u32 num_rules;
struct mlx5_vport_key key;
};
static struct mlx5_flow_table *
esw_vport_tbl_create(struct mlx5_eswitch *esw, struct mlx5_flow_namespace *ns,
const struct esw_vport_tbl_namespace *vport_ns)
{
struct mlx5_flow_table_attr ft_attr = {};
struct mlx5_flow_table *fdb;
if (vport_ns->max_num_groups)
ft_attr.autogroup.max_num_groups = vport_ns->max_num_groups;
else
ft_attr.autogroup.max_num_groups = esw->params.large_group_num;
ft_attr.max_fte = vport_ns->max_fte;
ft_attr.prio = FDB_PER_VPORT;
ft_attr.flags = vport_ns->flags;
fdb = mlx5_create_auto_grouped_flow_table(ns, &ft_attr);
if (IS_ERR(fdb)) {
esw_warn(esw->dev, "Failed to create per vport FDB Table err %ld\n",
PTR_ERR(fdb));
}
return fdb;
}
static u32 flow_attr_to_vport_key(struct mlx5_eswitch *esw,
struct mlx5_vport_tbl_attr *attr,
struct mlx5_vport_key *key)
{
key->vport = attr->vport;
key->chain = attr->chain;
key->prio = attr->prio;
key->vhca_id = MLX5_CAP_GEN(esw->dev, vhca_id);
key->vport_ns = attr->vport_ns;
return jhash(key, sizeof(*key), 0);
}
/* caller must hold vports.lock */
static struct mlx5_vport_table *
esw_vport_tbl_lookup(struct mlx5_eswitch *esw, struct mlx5_vport_key *skey, u32 key)
{
struct mlx5_vport_table *e;
hash_for_each_possible(esw->fdb_table.offloads.vports.table, e, hlist, key)
if (!memcmp(&e->key, skey, sizeof(*skey)))
return e;
return NULL;
}
struct mlx5_flow_table *
mlx5_esw_vporttbl_get(struct mlx5_eswitch *esw, struct mlx5_vport_tbl_attr *attr)
{
struct mlx5_core_dev *dev = esw->dev;
struct mlx5_flow_namespace *ns;
struct mlx5_flow_table *fdb;
struct mlx5_vport_table *e;
struct mlx5_vport_key skey;
u32 hkey;
mutex_lock(&esw->fdb_table.offloads.vports.lock);
hkey = flow_attr_to_vport_key(esw, attr, &skey);
e = esw_vport_tbl_lookup(esw, &skey, hkey);
if (e) {
e->num_rules++;
goto out;
}
e = kzalloc(sizeof(*e), GFP_KERNEL);
if (!e) {
fdb = ERR_PTR(-ENOMEM);
goto err_alloc;
}
ns = mlx5_get_flow_namespace(dev, MLX5_FLOW_NAMESPACE_FDB);
if (!ns) {
esw_warn(dev, "Failed to get FDB namespace\n");
fdb = ERR_PTR(-ENOENT);
goto err_ns;
}
fdb = esw_vport_tbl_create(esw, ns, attr->vport_ns);
if (IS_ERR(fdb))
goto err_ns;
e->fdb = fdb;
e->num_rules = 1;
e->key = skey;
hash_add(esw->fdb_table.offloads.vports.table, &e->hlist, hkey);
out:
mutex_unlock(&esw->fdb_table.offloads.vports.lock);
return e->fdb;
err_ns:
kfree(e);
err_alloc:
mutex_unlock(&esw->fdb_table.offloads.vports.lock);
return fdb;
}
void
mlx5_esw_vporttbl_put(struct mlx5_eswitch *esw, struct mlx5_vport_tbl_attr *attr)
{
struct mlx5_vport_table *e;
struct mlx5_vport_key key;
u32 hkey;
mutex_lock(&esw->fdb_table.offloads.vports.lock);
hkey = flow_attr_to_vport_key(esw, attr, &key);
e = esw_vport_tbl_lookup(esw, &key, hkey);
if (!e || --e->num_rules)
goto out;
hash_del(&e->hlist);
mlx5_destroy_flow_table(e->fdb);
kfree(e);
out:
mutex_unlock(&esw->fdb_table.offloads.vports.lock);
}
......@@ -46,6 +46,24 @@
#include "lib/fs_chains.h"
#include "sf/sf.h"
#include "en/tc_ct.h"
#include "esw/sample.h"
enum mlx5_mapped_obj_type {
MLX5_MAPPED_OBJ_CHAIN,
MLX5_MAPPED_OBJ_SAMPLE,
};
struct mlx5_mapped_obj {
enum mlx5_mapped_obj_type type;
union {
u32 chain;
struct {
u32 group_id;
u32 rate;
u32 trunc_size;
} sample;
};
};
#ifdef CONFIG_MLX5_ESWITCH
......@@ -206,6 +224,7 @@ struct mlx5_esw_offload {
struct mlx5_flow_table *ft_offloads_restore;
struct mlx5_flow_group *restore_group;
struct mlx5_modify_hdr *restore_copy_hdr_id;
struct mapping_ctx *reg_c0_obj_pool;
struct mlx5_flow_table *ft_offloads;
struct mlx5_flow_group *vport_rx_group;
......@@ -357,6 +376,9 @@ void
mlx5_eswitch_termtbl_put(struct mlx5_eswitch *esw,
struct mlx5_termtbl_handle *tt);
void
mlx5_eswitch_clear_rule_source_port(struct mlx5_eswitch *esw, struct mlx5_flow_spec *spec);
struct mlx5_flow_handle *
mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
struct mlx5_flow_spec *spec,
......@@ -404,6 +426,7 @@ enum {
MLX5_ESW_ATTR_FLAG_SLOW_PATH = BIT(1),
MLX5_ESW_ATTR_FLAG_NO_IN_PORT = BIT(2),
MLX5_ESW_ATTR_FLAG_SRC_REWRITE = BIT(3),
MLX5_ESW_ATTR_FLAG_SAMPLE = BIT(4),
};
struct mlx5_esw_flow_attr {
......@@ -428,6 +451,7 @@ struct mlx5_esw_flow_attr {
} dests[MLX5_MAX_FLOW_FWD_VPORTS];
struct mlx5_rx_tun_attr *rx_tun_attr;
struct mlx5_pkt_reformat *decap_pkt_reformat;
struct mlx5_sample_attr *sample;
};
int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
......@@ -713,13 +737,26 @@ void
esw_vport_destroy_offloads_acl_tables(struct mlx5_eswitch *esw,
struct mlx5_vport *vport);
int mlx5_esw_vport_tbl_get(struct mlx5_eswitch *esw);
void mlx5_esw_vport_tbl_put(struct mlx5_eswitch *esw);
struct esw_vport_tbl_namespace {
int max_fte;
int max_num_groups;
u32 flags;
};
struct mlx5_vport_tbl_attr {
u16 chain;
u16 prio;
u16 vport;
const struct esw_vport_tbl_namespace *vport_ns;
};
struct mlx5_flow_table *
mlx5_esw_vporttbl_get(struct mlx5_eswitch *esw, struct mlx5_vport_tbl_attr *attr);
void
mlx5_esw_vporttbl_put(struct mlx5_eswitch *esw, struct mlx5_vport_tbl_attr *attr);
struct mlx5_flow_handle *
esw_add_restore_rule(struct mlx5_eswitch *esw, u32 tag);
u32
esw_get_max_restore_tag(struct mlx5_eswitch *esw);
int esw_offloads_load_rep(struct mlx5_eswitch *esw, u16 vport_num);
void esw_offloads_unload_rep(struct mlx5_eswitch *esw, u16 vport_num);
......
......@@ -7,15 +7,11 @@
#include "lib/fs_chains.h"
#include "en/mapping.h"
#include "mlx5_core.h"
#include "fs_core.h"
#include "eswitch.h"
#include "en.h"
#include "en_tc.h"
#define chains_lock(chains) ((chains)->lock)
#define chains_ht(chains) ((chains)->chains_ht)
#define chains_mapping(chains) ((chains)->chains_mapping)
#define prios_ht(chains) ((chains)->prios_ht)
#define ft_pool_left(chains) ((chains)->ft_left)
#define tc_default_ft(chains) ((chains)->tc_default_ft)
......@@ -300,7 +296,7 @@ create_chain_restore(struct fs_chain *chain)
!mlx5_chains_prios_supported(chains))
return 0;
err = mapping_add(chains_mapping(chains), &chain->chain, &index);
err = mlx5_chains_get_chain_mapping(chains, chain->chain, &index);
if (err)
return err;
if (index == MLX5_FS_DEFAULT_FLOW_TAG) {
......@@ -310,10 +306,8 @@ create_chain_restore(struct fs_chain *chain)
*
* This case isn't possible with MLX5_FS_DEFAULT_FLOW_TAG = 0.
*/
err = mapping_add(chains_mapping(chains),
&chain->chain, &index);
mapping_remove(chains_mapping(chains),
MLX5_FS_DEFAULT_FLOW_TAG);
err = mlx5_chains_get_chain_mapping(chains, chain->chain, &index);
mapping_remove(chains->chains_mapping, MLX5_FS_DEFAULT_FLOW_TAG);
if (err)
return err;
}
......@@ -361,7 +355,7 @@ create_chain_restore(struct fs_chain *chain)
mlx5_del_flow_rules(chain->restore_rule);
err_rule:
/* Datapath can't find this mapping, so we can safely remove it */
mapping_remove(chains_mapping(chains), chain->id);
mapping_remove(chains->chains_mapping, chain->id);
return err;
}
......@@ -376,7 +370,7 @@ static void destroy_chain_restore(struct fs_chain *chain)
mlx5_del_flow_rules(chain->restore_rule);
mlx5_modify_header_dealloc(chains->dev, chain->miss_modify_hdr);
mapping_remove(chains_mapping(chains), chain->id);
mapping_remove(chains->chains_mapping, chain->id);
}
static struct fs_chain *
......@@ -797,7 +791,6 @@ static struct mlx5_fs_chains *
mlx5_chains_init(struct mlx5_core_dev *dev, struct mlx5_chains_attr *attr)
{
struct mlx5_fs_chains *chains_priv;
struct mapping_ctx *mapping;
u32 max_flow_counter;
int err;
......@@ -816,6 +809,7 @@ mlx5_chains_init(struct mlx5_core_dev *dev, struct mlx5_chains_attr *attr)
chains_priv->flags = attr->flags;
chains_priv->ns = attr->ns;
chains_priv->group_num = attr->max_grp_num;
chains_priv->chains_mapping = attr->mapping;
tc_default_ft(chains_priv) = tc_end_ft(chains_priv) = attr->default_ft;
mlx5_core_info(dev, "Supported tc offload range - chains: %u, prios: %u\n",
......@@ -832,20 +826,10 @@ mlx5_chains_init(struct mlx5_core_dev *dev, struct mlx5_chains_attr *attr)
if (err)
goto init_prios_ht_err;
mapping = mapping_create(sizeof(u32), attr->max_restore_tag,
true);
if (IS_ERR(mapping)) {
err = PTR_ERR(mapping);
goto mapping_err;
}
chains_mapping(chains_priv) = mapping;
mutex_init(&chains_lock(chains_priv));
return chains_priv;
mapping_err:
rhashtable_destroy(&prios_ht(chains_priv));
init_prios_ht_err:
rhashtable_destroy(&chains_ht(chains_priv));
init_chains_ht_err:
......@@ -857,7 +841,6 @@ static void
mlx5_chains_cleanup(struct mlx5_fs_chains *chains)
{
mutex_destroy(&chains_lock(chains));
mapping_destroy(chains_mapping(chains));
rhashtable_destroy(&prios_ht(chains));
rhashtable_destroy(&chains_ht(chains));
......@@ -884,25 +867,18 @@ int
mlx5_chains_get_chain_mapping(struct mlx5_fs_chains *chains, u32 chain,
u32 *chain_mapping)
{
return mapping_add(chains_mapping(chains), &chain, chain_mapping);
struct mapping_ctx *ctx = chains->chains_mapping;
struct mlx5_mapped_obj mapped_obj = {};
mapped_obj.type = MLX5_MAPPED_OBJ_CHAIN;
mapped_obj.chain = chain;
return mapping_add(ctx, &mapped_obj, chain_mapping);
}
int
mlx5_chains_put_chain_mapping(struct mlx5_fs_chains *chains, u32 chain_mapping)
{
return mapping_remove(chains_mapping(chains), chain_mapping);
}
int mlx5_get_chain_for_tag(struct mlx5_fs_chains *chains, u32 tag,
u32 *chain)
{
int err;
struct mapping_ctx *ctx = chains->chains_mapping;
err = mapping_find(chains_mapping(chains), tag, chain);
if (err) {
mlx5_core_warn(chains->dev, "Can't find chain for tag: %d\n", tag);
return -ENOENT;
}
return 0;
return mapping_remove(ctx, chain_mapping);
}
......@@ -7,6 +7,7 @@
#include <linux/mlx5/fs.h>
struct mlx5_fs_chains;
struct mlx5_mapped_obj;
enum mlx5_chains_flags {
MLX5_CHAINS_AND_PRIOS_SUPPORTED = BIT(0),
......@@ -20,7 +21,7 @@ struct mlx5_chains_attr {
u32 max_ft_sz;
u32 max_grp_num;
struct mlx5_flow_table *default_ft;
u32 max_restore_tag;
struct mapping_ctx *mapping;
};
#if IS_ENABLED(CONFIG_MLX5_CLS_ACT)
......@@ -63,9 +64,6 @@ struct mlx5_fs_chains *
mlx5_chains_create(struct mlx5_core_dev *dev, struct mlx5_chains_attr *attr);
void mlx5_chains_destroy(struct mlx5_fs_chains *chains);
int
mlx5_get_chain_for_tag(struct mlx5_fs_chains *chains, u32 tag, u32 *chain);
void
mlx5_chains_set_end_ft(struct mlx5_fs_chains *chains,
struct mlx5_flow_table *ft);
......
......@@ -74,20 +74,19 @@ bool mlx5_eswitch_reg_c1_loopback_enabled(const struct mlx5_eswitch *esw);
bool mlx5_eswitch_vport_match_metadata_enabled(const struct mlx5_eswitch *esw);
/* Reg C0 usage:
* Reg C0 = < ESW_PFNUM_BITS(4) | ESW_VPORT BITS(12) | ESW_CHAIN_TAG(16) >
* Reg C0 = < ESW_PFNUM_BITS(4) | ESW_VPORT BITS(12) | ESW_REG_C0_OBJ(16) >
*
* Highest 4 bits of the reg c0 is the PF_NUM (range 0-15), 12 bits of
* unique non-zero vport id (range 1-4095). The rest (lowest 16 bits) is left
* for tc chain tag restoration.
* for user data objects managed by a common mapping context.
* PFNUM + VPORT comprise the SOURCE_PORT matching.
*/
#define ESW_VPORT_BITS 12
#define ESW_PFNUM_BITS 4
#define ESW_SOURCE_PORT_METADATA_BITS (ESW_PFNUM_BITS + ESW_VPORT_BITS)
#define ESW_SOURCE_PORT_METADATA_OFFSET (32 - ESW_SOURCE_PORT_METADATA_BITS)
#define ESW_CHAIN_TAG_METADATA_BITS (32 - ESW_SOURCE_PORT_METADATA_BITS)
#define ESW_CHAIN_TAG_METADATA_MASK GENMASK(ESW_CHAIN_TAG_METADATA_BITS - 1,\
0)
#define ESW_REG_C0_USER_DATA_METADATA_BITS (32 - ESW_SOURCE_PORT_METADATA_BITS)
#define ESW_REG_C0_USER_DATA_METADATA_MASK GENMASK(ESW_REG_C0_USER_DATA_METADATA_BITS - 1, 0)
static inline u32 mlx5_eswitch_get_vport_metadata_mask(void)
{
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment