Commit c3e676b9 authored by Jakub Kicinski

Merge branch 'inet-separate-dscp-from-ecn-bits-using-new-dscp_t-type'

Guillaume Nault says:

====================
inet: Separate DSCP from ECN bits using new dscp_t type

The networking stack currently doesn't clearly distinguish between DSCP
and ECN bits. The entire DSCP+ECN bits are stored in u8 variables (or
structure fields), and each part of the stack handles them in their own
way, using different macros. This has created several bugs in the past
and some uncommon code paths are still unfixed.

Such bugs generally manifest by selecting invalid routes because of ECN
bits interfering with FIB routes and rules lookups (more details in the
LPC 2021 talk[1] and in the RFC of this series[2]).

This patch series aims at preventing the introduction of such bugs (and
detecting existing ones), by introducing a dscp_t type, representing
"sanitised" DSCP values (that is, with no ECN information), as opposed
to plain u8 values that contain both DSCP and ECN information. dscp_t
makes it clear for the reader what we're working on, and Sparse can
flag invalid interactions between dscp_t and plain u8.
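
Sparse's __bitwise checking has no direct equivalent in plain userspace C, but the idea can be approximated by wrapping the value in a single-member struct, so that the compiler refuses to mix it with a plain integer. The sketch below is only an analogue under that assumption: demo_dscp_t and the demo_* helpers are hypothetical names and are not part of this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for dscp_t: a distinct type, so that
 * "demo_dscp_t == uint8_t" does not compile.
 */
typedef struct {
	uint8_t v;	/* DS field with the two ECN bits cleared */
} demo_dscp_t;

#define DEMO_DSCP_MASK 0xfcu	/* upper six bits of the DS field */

/* Sanitise a raw DS field: drop the ECN bits, keep the DSCP bits. */
static inline demo_dscp_t demo_dsfield_to_dscp(uint8_t dsfield)
{
	demo_dscp_t d;

	d.v = (uint8_t)(dsfield & DEMO_DSCP_MASK);
	return d;
}

/* Convert back to a plain DS field value (ECN bits are always 0). */
static inline uint8_t demo_dscp_to_dsfield(demo_dscp_t d)
{
	return d.v;
}

/* Comparisons have to go through a helper, which keeps them explicit. */
static inline bool demo_dscp_equal(demo_dscp_t a, demo_dscp_t b)
{
	return a.v == b.v;
}
```

With this wrapper, DS fields 0x11 and 0x12 sanitise to the same value, which is the property the series relies on for FIB lookups.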

This series converts only a few variables and structures:

  * Patch 1 converts the tclass field of struct fib6_rule. It
    effectively forbids the use of ECN bits in the tos/dsfield option
    of ip -6 rule. Rules now match packets solely based on their DSCP
    bits, so ECN doesn't influence the result any more. This contrasts
    with the previous behaviour where all 8 bits of the Traffic Class
    field were used. It is believed that this change is acceptable as
    matching ECN bits wasn't usable for IPv4, so only IPv6-only
    deployments could be depending on it. Also the previous behaviour
    made DSCP-based ip6-rules fail for packets with both a DSCP and an
    ECN mark, which is another reason why any such deployment is unlikely.

  * Patch 2 converts the tos field of struct fib4_rule. This one too
    effectively forbids defining ECN bits, this time in ip -4 rule.
    Before that, setting ECN bit 1 was accepted, while ECN bit 0 was
    rejected. But even when accepted, the rule would never match, as
    the packets would have their ECN bits cleared before doing the
    rule lookup.

  * Patch 3 converts the fc_tos field of struct fib_config. This is
    equivalent to patch 2, but for IPv4 routes. Routes using a
    tos/dsfield option with any ECN bit set are now rejected. Before
    this patch, they were accepted but, as with ip4 rules, these routes
    couldn't match any packet, since their ECN bits are cleared before
    the lookup.

  * Patch 4 converts the fa_tos field of struct fib_alias. This one is
    pure internal u8 to dscp_t conversion. While patches 1-3 had user
    facing consequences, this patch shouldn't have any side effect and
    is there to give an overview of what future conversion patches will
    look like. Conversions are quite mechanical, but imply some code
    churn, which is the price for the extra clarity and possibility of
    type checking.
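
The user-visible behaviour described for patches 1-3 can be sketched in userspace C as follows. This is an illustration rather than the kernel code: struct demo_rule and the demo_* functions are hypothetical, and the 0x1e value used for the old IPv4 check is assumed to match IPTOS_TOS_MASK.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_DSCP_MASK      0xfcu	/* six DSCP bits */
#define DEMO_ECN_MASK       0x03u	/* two ECN bits */
#define DEMO_IPTOS_TOS_MASK 0x1eu	/* assumed value of IPTOS_TOS_MASK */

struct demo_rule {
	uint8_t dscp;	/* DS field with ECN cleared; 0 means "match any" */
};

/* Old ip -4 rule check: only bits outside IPTOS_TOS_MASK were refused,
 * so ECN bit 1 (0x02) passed while ECN bit 0 (0x01) did not.
 */
static inline bool demo_old_tos_valid(uint8_t tos)
{
	return !(tos & (uint8_t)~DEMO_IPTOS_TOS_MASK);
}

/* New check: refuse any tos/dsfield value that has an ECN bit set. */
static inline int demo_rule_configure(struct demo_rule *r, uint8_t tos)
{
	if (tos & DEMO_ECN_MASK)
		return -EINVAL;
	r->dscp = tos;
	return 0;
}

/* Compare only the DSCP bits of the packet's DS field with the rule. */
static inline bool demo_rule_match(const struct demo_rule *r,
				   uint8_t pkt_dsfield)
{
	return !r->dscp ||
	       r->dscp == (uint8_t)(pkt_dsfield & DEMO_DSCP_MASK);
}
```

A rule configured with 0x10 then matches packets whose DS field is 0x10, 0x11, 0x12 or 0x13, which is the case exercised by the selftests added in this series.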

To summarise, all the behaviour changes required for the dscp_t type
approach to work should be contained in patches 1-3. These changes are
edge cases of ip-route and ip-rule that don't currently work properly.
So they should be safe. Also, a kernel selftest is added for each of
them.

Finally, this work also paves the way for allowing the usage of the 3
high order DSCP bits in IPv4 (a few call paths already handle them, but
in general the stack clears them before IPv4 rule and route lookups).
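
To illustrate why the 3 high order DSCP bits matter, the sketch below compares a legacy-style lookup key with a full DSCP key. The 0x1c mask is an assumption about what the IPv4 paths apply before lookups (the old TOS bits without ECN), and the demo_* names are hypothetical.

```c
#include <stdint.h>

#define DEMO_LEGACY_RT_MASK 0x1cu	/* assumed legacy IPv4 lookup mask */
#define DEMO_DSCP_MASK      0xfcu	/* all six DSCP bits */

/* Key as seen by a lookup that drops the 3 high order DSCP bits. */
static inline uint8_t demo_legacy_lookup_key(uint8_t dsfield)
{
	return (uint8_t)(dsfield & DEMO_LEGACY_RT_MASK);
}

/* Key as seen by a lookup that keeps the whole DSCP. */
static inline uint8_t demo_dscp_lookup_key(uint8_t dsfield)
{
	return (uint8_t)(dsfield & DEMO_DSCP_MASK);
}
```

Under these assumptions, DS fields 0xb8 (DSCP 46) and 0x18 (DSCP 6) produce the same legacy key, 0x18, but distinct keys once all six DSCP bits are kept.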

References:
  [1] LPC 2021 talk:
        - https://linuxplumbersconf.org/event/11/contributions/943/
        - Direct link to slide deck:
            https://linuxplumbersconf.org/event/11/contributions/943/attachments/901/1780/inet_tos_lpc2021.pdf
  [2] RFC version of this series:
      - https://lore.kernel.org/netdev/cover.1638814614.git.gnault@redhat.com/
====================

Link: https://lore.kernel.org/r/cover.1643981839.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
parents 642436a1 32ccf110
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
 * inet_dscp.h: helpers for handling differentiated services codepoints (DSCP)
 *
 * DSCP is defined in RFC 2474:
 *
 *        0   1   2   3   4   5   6   7
 *      +---+---+---+---+---+---+---+---+
 *      |         DSCP          |  CU   |
 *      +---+---+---+---+---+---+---+---+
 *
 *        DSCP: differentiated services codepoint
 *        CU:   currently unused
 *
 * The whole DSCP + CU bits form the DS field.
 * The DS field is also commonly called TOS or Traffic Class (for IPv6).
 *
 * Note: the CU bits are now used for Explicit Congestion Notification
 *       (RFC 3168).
 */

#ifndef _INET_DSCP_H
#define _INET_DSCP_H

#include <linux/types.h>

/* Special type for storing DSCP values.
 *
 * A dscp_t variable stores a DS field with the CU (ECN) bits cleared.
 * Using dscp_t allows to strictly separate DSCP and ECN bits, thus avoiding
 * bugs where ECN bits are erroneously taken into account during FIB lookups
 * or policy routing.
 *
 * Note: to get the real DSCP value contained in a dscp_t variable one would
 * have to do a bit shift after calling inet_dscp_to_dsfield(). We could have
 * a helper for that, but there's currently no users.
 */
typedef u8 __bitwise dscp_t;

#define INET_DSCP_MASK 0xfc

static inline dscp_t inet_dsfield_to_dscp(__u8 dsfield)
{
	return (__force dscp_t)(dsfield & INET_DSCP_MASK);
}

static inline __u8 inet_dscp_to_dsfield(dscp_t dscp)
{
	return (__force __u8)dscp;
}

static inline bool inet_validate_dscp(__u8 val)
{
	return !(val & ~INET_DSCP_MASK);
}

#endif /* _INET_DSCP_H */
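
The header comment notes that obtaining the real DSCP value requires a bit shift and that no helper exists for it. A minimal userspace sketch of such a conversion might look like the following; demo_dsfield_to_dscp_value() and demo_dscp_value_to_dsfield() are hypothetical names, not kernel functions.

```c
#include <stdint.h>

#define DEMO_DSCP_MASK  0xfcu
#define DEMO_DSCP_SHIFT 2	/* the DSCP sits above the two ECN bits */

/* DS field (any ECN bits) -> 6-bit DSCP codepoint in the range 0..63. */
static inline uint8_t demo_dsfield_to_dscp_value(uint8_t dsfield)
{
	return (uint8_t)((dsfield & DEMO_DSCP_MASK) >> DEMO_DSCP_SHIFT);
}

/* 6-bit DSCP codepoint -> DS field with the ECN bits set to 0. */
static inline uint8_t demo_dscp_value_to_dsfield(uint8_t value)
{
	return (uint8_t)((value << DEMO_DSCP_SHIFT) & DEMO_DSCP_MASK);
}
```

For example, DS field 0xb8 carries codepoint 46, and so does 0xbb, since the two low bits are ECN.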
@@ -17,6 +17,7 @@
 #include <linux/rcupdate.h>
 #include <net/fib_notifier.h>
 #include <net/fib_rules.h>
+#include <net/inet_dscp.h>
 #include <net/inetpeer.h>
 #include <linux/percpu.h>
 #include <linux/notifier.h>
@@ -24,7 +25,7 @@
 struct fib_config {
 	u8 fc_dst_len;
-	u8 fc_tos;
+	dscp_t fc_dscp;
 	u8 fc_protocol;
 	u8 fc_scope;
 	u8 fc_type;
......
@@ -17,6 +17,7 @@
 #include <net/if_inet6.h>
 #include <net/flow.h>
 #include <net/flow_dissector.h>
+#include <net/inet_dscp.h>
 #include <net/snmp.h>
 #include <net/netns/hash.h>
@@ -974,6 +975,11 @@ static inline u8 ip6_tclass(__be32 flowinfo)
 	return ntohl(flowinfo & IPV6_TCLASS_MASK) >> IPV6_TCLASS_SHIFT;
 }
+static inline dscp_t ip6_dscp(__be32 flowinfo)
+{
+	return inet_dsfield_to_dscp(ip6_tclass(flowinfo));
+}
 static inline __be32 ip6_make_flowinfo(unsigned int tclass, __be32 flowlabel)
 {
 	return htonl(tclass << IPV6_TCLASS_SHIFT) | flowlabel;
......
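
The ip6_dscp() helper added above extracts the Traffic Class from the first 32 bits of the IPv6 header and then drops the ECN bits. A rough userspace equivalent, assuming the standard layout (4-bit version, 8-bit Traffic Class, 20-bit flow label) and using hypothetical demo_* names and constants, could be:

```c
#include <arpa/inet.h>	/* htonl(), ntohl() */
#include <stdint.h>

#define DEMO_TCLASS_MASK  0x0ff00000u	/* Traffic Class, host byte order */
#define DEMO_TCLASS_SHIFT 20
#define DEMO_DSCP_MASK    0xfcu

/* flowinfo_be: first 32 bits of the IPv6 header, network byte order. */
static inline uint8_t demo_ip6_tclass(uint32_t flowinfo_be)
{
	return (uint8_t)((ntohl(flowinfo_be) & DEMO_TCLASS_MASK) >>
			 DEMO_TCLASS_SHIFT);
}

/* Same as above, with the two ECN bits cleared. */
static inline uint8_t demo_ip6_dscp(uint32_t flowinfo_be)
{
	return (uint8_t)(demo_ip6_tclass(flowinfo_be) & DEMO_DSCP_MASK);
}
```

For a header starting with 0x6b912345 (version 6, Traffic Class 0xb9, flow label 0x12345), the Traffic Class is 0xb9 and the sanitised DSCP field is 0xb8.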
@@ -32,6 +32,7 @@
 #include <linux/list.h>
 #include <linux/slab.h>
+#include <net/inet_dscp.h>
 #include <net/ip.h>
 #include <net/protocol.h>
 #include <net/route.h>
@@ -735,8 +736,16 @@ static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
 	memset(cfg, 0, sizeof(*cfg));
 	rtm = nlmsg_data(nlh);
+	if (!inet_validate_dscp(rtm->rtm_tos)) {
+		NL_SET_ERR_MSG(extack,
+			       "Invalid dsfield (tos): ECN bits must be 0");
+		err = -EINVAL;
+		goto errout;
+	}
+	cfg->fc_dscp = inet_dsfield_to_dscp(rtm->rtm_tos);
 	cfg->fc_dst_len = rtm->rtm_dst_len;
-	cfg->fc_tos = rtm->rtm_tos;
 	cfg->fc_table = rtm->rtm_table;
 	cfg->fc_protocol = rtm->rtm_protocol;
 	cfg->fc_scope = rtm->rtm_scope;
......
@@ -4,13 +4,14 @@
 #include <linux/types.h>
 #include <linux/list.h>
+#include <net/inet_dscp.h>
 #include <net/ip_fib.h>
 #include <net/nexthop.h>
 struct fib_alias {
 	struct hlist_node fa_list;
 	struct fib_info *fa_info;
-	u8 fa_tos;
+	dscp_t fa_dscp;
 	u8 fa_type;
 	u8 fa_state;
 	u8 fa_slen;
......
@@ -23,6 +23,7 @@
 #include <linux/list.h>
 #include <linux/rcupdate.h>
 #include <linux/export.h>
+#include <net/inet_dscp.h>
 #include <net/ip.h>
 #include <net/route.h>
 #include <net/tcp.h>
@@ -35,7 +36,7 @@ struct fib4_rule {
 	struct fib_rule common;
 	u8 dst_len;
 	u8 src_len;
-	u8 tos;
+	dscp_t dscp;
 	__be32 src;
 	__be32 srcmask;
 	__be32 dst;
@@ -49,7 +50,7 @@ static bool fib4_rule_matchall(const struct fib_rule *rule)
 {
 	struct fib4_rule *r = container_of(rule, struct fib4_rule, common);
-	if (r->dst_len || r->src_len || r->tos)
+	if (r->dst_len || r->src_len || r->dscp)
 		return false;
 	return fib_rule_matchall(rule);
 }
@@ -185,7 +186,7 @@ INDIRECT_CALLABLE_SCOPE int fib4_rule_match(struct fib_rule *rule,
 	    ((daddr ^ r->dst) & r->dstmask))
 		return 0;
-	if (r->tos && (r->tos != fl4->flowi4_tos))
+	if (r->dscp && r->dscp != inet_dsfield_to_dscp(fl4->flowi4_tos))
 		return 0;
 	if (rule->ip_proto && (rule->ip_proto != fl4->flowi4_proto))
@@ -225,10 +226,12 @@ static int fib4_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
 	int err = -EINVAL;
 	struct fib4_rule *rule4 = (struct fib4_rule *) rule;
-	if (frh->tos & ~IPTOS_TOS_MASK) {
-		NL_SET_ERR_MSG(extack, "Invalid tos");
+	if (!inet_validate_dscp(frh->tos)) {
+		NL_SET_ERR_MSG(extack,
+			       "Invalid dsfield (tos): ECN bits must be 0");
 		goto errout;
 	}
+	rule4->dscp = inet_dsfield_to_dscp(frh->tos);
 	/* split local/main if they are not already split */
 	err = fib_unmerge(net);
@@ -270,7 +273,6 @@ static int fib4_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
 	rule4->srcmask = inet_make_mask(rule4->src_len);
 	rule4->dst_len = frh->dst_len;
 	rule4->dstmask = inet_make_mask(rule4->dst_len);
-	rule4->tos = frh->tos;
 	net->ipv4.fib_has_custom_rules = true;
@@ -313,7 +315,7 @@ static int fib4_rule_compare(struct fib_rule *rule, struct fib_rule_hdr *frh,
 	if (frh->dst_len && (rule4->dst_len != frh->dst_len))
 		return 0;
-	if (frh->tos && (rule4->tos != frh->tos))
+	if (frh->tos && inet_dscp_to_dsfield(rule4->dscp) != frh->tos)
 		return 0;
 #ifdef CONFIG_IP_ROUTE_CLASSID
@@ -337,7 +339,7 @@ static int fib4_rule_fill(struct fib_rule *rule, struct sk_buff *skb,
 	frh->dst_len = rule4->dst_len;
 	frh->src_len = rule4->src_len;
-	frh->tos = rule4->tos;
+	frh->tos = inet_dscp_to_dsfield(rule4->dscp);
 	if ((rule4->dst_len &&
 	     nla_put_in_addr(skb, FRA_DST, rule4->dst)) ||
......
@@ -32,6 +32,7 @@
 #include <linux/hash.h>
 #include <net/arp.h>
+#include <net/inet_dscp.h>
 #include <net/ip.h>
 #include <net/protocol.h>
 #include <net/route.h>
@@ -523,7 +524,7 @@ void rtmsg_fib(int event, __be32 key, struct fib_alias *fa,
 	fri.tb_id = tb_id;
 	fri.dst = key;
 	fri.dst_len = dst_len;
-	fri.tos = fa->fa_tos;
+	fri.tos = inet_dscp_to_dsfield(fa->fa_dscp);
 	fri.type = fa->fa_type;
 	fri.offload = fa->offload;
 	fri.trap = fa->trap;
@@ -2039,7 +2040,7 @@ static void fib_select_default(const struct flowi4 *flp, struct fib_result *res)
 	int order = -1, last_idx = -1;
 	struct fib_alias *fa, *fa1 = NULL;
 	u32 last_prio = res->fi->fib_priority;
-	u8 last_tos = 0;
+	dscp_t last_dscp = 0;
 	hlist_for_each_entry_rcu(fa, fa_head, fa_list) {
 		struct fib_info *next_fi = fa->fa_info;
@@ -2047,19 +2048,20 @@ static void fib_select_default(const struct flowi4 *flp, struct fib_result *res)
 		if (fa->fa_slen != slen)
 			continue;
-		if (fa->fa_tos && fa->fa_tos != flp->flowi4_tos)
+		if (fa->fa_dscp &&
+		    fa->fa_dscp != inet_dsfield_to_dscp(flp->flowi4_tos))
 			continue;
 		if (fa->tb_id != tb->tb_id)
 			continue;
 		if (next_fi->fib_priority > last_prio &&
-		    fa->fa_tos == last_tos) {
-			if (last_tos)
+		    fa->fa_dscp == last_dscp) {
+			if (last_dscp)
 				continue;
 			break;
 		}
 		if (next_fi->fib_flags & RTNH_F_DEAD)
 			continue;
-		last_tos = fa->fa_tos;
+		last_dscp = fa->fa_dscp;
 		last_prio = next_fi->fib_priority;
 		if (next_fi->fib_scope != res->scope ||
......
@@ -61,6 +61,7 @@
 #include <linux/vmalloc.h>
 #include <linux/notifier.h>
 #include <net/net_namespace.h>
+#include <net/inet_dscp.h>
 #include <net/ip.h>
 #include <net/protocol.h>
 #include <net/route.h>
@@ -81,7 +82,7 @@ static int call_fib_entry_notifier(struct notifier_block *nb,
 		.dst = dst,
 		.dst_len = dst_len,
 		.fi = fa->fa_info,
-		.tos = fa->fa_tos,
+		.tos = inet_dscp_to_dsfield(fa->fa_dscp),
 		.type = fa->fa_type,
 		.tb_id = fa->tb_id,
 	};
@@ -98,7 +99,7 @@ static int call_fib_entry_notifiers(struct net *net,
 		.dst = dst,
 		.dst_len = dst_len,
 		.fi = fa->fa_info,
-		.tos = fa->fa_tos,
+		.tos = inet_dscp_to_dsfield(fa->fa_dscp),
 		.type = fa->fa_type,
 		.tb_id = fa->tb_id,
 	};
@@ -973,13 +974,13 @@ static struct key_vector *fib_find_node(struct trie *t,
 	return n;
 }
-/* Return the first fib alias matching TOS with
+/* Return the first fib alias matching DSCP with
  * priority less than or equal to PRIO.
  * If 'find_first' is set, return the first matching
- * fib alias, regardless of TOS and priority.
+ * fib alias, regardless of DSCP and priority.
  */
 static struct fib_alias *fib_find_alias(struct hlist_head *fah, u8 slen,
-					u8 tos, u32 prio, u32 tb_id,
+					dscp_t dscp, u32 prio, u32 tb_id,
 					bool find_first)
 {
 	struct fib_alias *fa;
@@ -988,6 +989,10 @@ static struct fib_alias *fib_find_alias(struct hlist_head *fah, u8 slen,
 		return NULL;
 	hlist_for_each_entry(fa, fah, fa_list) {
+		/* Avoid Sparse warning when using dscp_t in inequalities */
+		u8 __fa_dscp = inet_dscp_to_dsfield(fa->fa_dscp);
+		u8 __dscp = inet_dscp_to_dsfield(dscp);
 		if (fa->fa_slen < slen)
 			continue;
 		if (fa->fa_slen != slen)
@@ -998,9 +1003,9 @@ static struct fib_alias *fib_find_alias(struct hlist_head *fah, u8 slen,
 			break;
 		if (find_first)
 			return fa;
-		if (fa->fa_tos > tos)
+		if (__fa_dscp > __dscp)
 			continue;
-		if (fa->fa_info->fib_priority >= prio || fa->fa_tos < tos)
+		if (fa->fa_info->fib_priority >= prio || __fa_dscp < __dscp)
 			return fa;
 	}
@@ -1027,8 +1032,8 @@ fib_find_matching_alias(struct net *net, const struct fib_rt_info *fri)
 	hlist_for_each_entry_rcu(fa, &l->leaf, fa_list) {
 		if (fa->fa_slen == slen && fa->tb_id == fri->tb_id &&
-		    fa->fa_tos == fri->tos && fa->fa_info == fri->fi &&
-		    fa->fa_type == fri->type)
+		    fa->fa_dscp == inet_dsfield_to_dscp(fri->tos) &&
+		    fa->fa_info == fri->fi && fa->fa_type == fri->type)
 			return fa;
 	}
@@ -1210,7 +1215,7 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 	struct fib_info *fi;
 	u8 plen = cfg->fc_dst_len;
 	u8 slen = KEYLENGTH - plen;
-	u8 tos = cfg->fc_tos;
+	dscp_t dscp;
 	u32 key;
 	int err;
@@ -1227,12 +1232,13 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 		goto err;
 	}
+	dscp = cfg->fc_dscp;
 	l = fib_find_node(t, &tp, key);
-	fa = l ? fib_find_alias(&l->leaf, slen, tos, fi->fib_priority,
+	fa = l ? fib_find_alias(&l->leaf, slen, dscp, fi->fib_priority,
 				tb->tb_id, false) : NULL;
 	/* Now fa, if non-NULL, points to the first fib alias
-	 * with the same keys [prefix,tos,priority], if such key already
+	 * with the same keys [prefix,dscp,priority], if such key already
 	 * exists or to the node before which we will insert new one.
 	 *
 	 * If fa is NULL, we will need to allocate a new one and
@@ -1240,7 +1246,7 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 	 * of the new alias.
 	 */
-	if (fa && fa->fa_tos == tos &&
+	if (fa && fa->fa_dscp == dscp &&
 	    fa->fa_info->fib_priority == fi->fib_priority) {
 		struct fib_alias *fa_first, *fa_match;
@@ -1260,7 +1266,7 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 		hlist_for_each_entry_from(fa, fa_list) {
 			if ((fa->fa_slen != slen) ||
 			    (fa->tb_id != tb->tb_id) ||
-			    (fa->fa_tos != tos))
+			    (fa->fa_dscp != dscp))
 				break;
 			if (fa->fa_info->fib_priority != fi->fib_priority)
 				break;
@@ -1288,7 +1294,7 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 				goto out;
 			fi_drop = fa->fa_info;
-			new_fa->fa_tos = fa->fa_tos;
+			new_fa->fa_dscp = fa->fa_dscp;
 			new_fa->fa_info = fi;
 			new_fa->fa_type = cfg->fc_type;
 			state = fa->fa_state;
@@ -1351,7 +1357,7 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 		goto out;
 	new_fa->fa_info = fi;
-	new_fa->fa_tos = tos;
+	new_fa->fa_dscp = dscp;
 	new_fa->fa_type = cfg->fc_type;
 	new_fa->fa_state = 0;
 	new_fa->fa_slen = slen;
@@ -1567,7 +1573,8 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 				if (index >= (1ul << fa->fa_slen))
 					continue;
 			}
-			if (fa->fa_tos && fa->fa_tos != flp->flowi4_tos)
+			if (fa->fa_dscp &&
+			    inet_dscp_to_dsfield(fa->fa_dscp) != flp->flowi4_tos)
 				continue;
 			if (fi->fib_dead)
 				continue;
@@ -1703,7 +1710,7 @@ int fib_table_delete(struct net *net, struct fib_table *tb,
 	struct key_vector *l, *tp;
 	u8 plen = cfg->fc_dst_len;
 	u8 slen = KEYLENGTH - plen;
-	u8 tos = cfg->fc_tos;
+	dscp_t dscp;
 	u32 key;
 	key = ntohl(cfg->fc_dst);
@@ -1715,11 +1722,13 @@ int fib_table_delete(struct net *net, struct fib_table *tb,
 	if (!l)
 		return -ESRCH;
-	fa = fib_find_alias(&l->leaf, slen, tos, 0, tb->tb_id, false);
+	dscp = cfg->fc_dscp;
+	fa = fib_find_alias(&l->leaf, slen, dscp, 0, tb->tb_id, false);
 	if (!fa)
 		return -ESRCH;
-	pr_debug("Deleting %08x/%d tos=%d t=%p\n", key, plen, tos, t);
+	pr_debug("Deleting %08x/%d dsfield=0x%02x t=%p\n", key, plen,
+		 inet_dscp_to_dsfield(dscp), t);
 	fa_to_delete = NULL;
 	hlist_for_each_entry_from(fa, fa_list) {
@@ -1727,7 +1736,7 @@ int fib_table_delete(struct net *net, struct fib_table *tb,
 		if ((fa->fa_slen != slen) ||
 		    (fa->tb_id != tb->tb_id) ||
-		    (fa->fa_tos != tos))
+		    (fa->fa_dscp != dscp))
 			break;
 		if ((!cfg->fc_type || fa->fa_type == cfg->fc_type) &&
@@ -2295,7 +2304,7 @@ static int fn_trie_dump_leaf(struct key_vector *l, struct fib_table *tb,
 				fri.tb_id = tb->tb_id;
 				fri.dst = xkey;
 				fri.dst_len = KEYLENGTH - fa->fa_slen;
-				fri.tos = fa->fa_tos;
+				fri.tos = inet_dscp_to_dsfield(fa->fa_dscp);
 				fri.type = fa->fa_type;
 				fri.offload = fa->offload;
 				fri.trap = fa->trap;
@@ -2807,8 +2816,9 @@ static int fib_trie_seq_show(struct seq_file *seq, void *v)
 					     fa->fa_info->fib_scope),
 				   rtn_type(buf2, sizeof(buf2),
 					    fa->fa_type));
-			if (fa->fa_tos)
-				seq_printf(seq, " tos=%d", fa->fa_tos);
+			if (fa->fa_dscp)
+				seq_printf(seq, " tos=%d",
+					   inet_dscp_to_dsfield(fa->fa_dscp));
 			seq_putc(seq, '\n');
 		}
 	}
......
@@ -84,6 +84,7 @@
 #include <linux/jhash.h>
 #include <net/dst.h>
 #include <net/dst_metadata.h>
+#include <net/inet_dscp.h>
 #include <net/net_namespace.h>
 #include <net/ip.h>
 #include <net/route.h>
@@ -3391,7 +3392,7 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 				if (fa->fa_slen == slen &&
 				    fa->tb_id == fri.tb_id &&
-				    fa->fa_tos == fri.tos &&
+				    fa->fa_dscp == inet_dsfield_to_dscp(fri.tos) &&
 				    fa->fa_info == res.fi &&
 				    fa->fa_type == fri.type) {
 					fri.offload = fa->offload;
......
@@ -16,6 +16,7 @@
 #include <linux/indirect_call_wrapper.h>
 #include <net/fib_rules.h>
+#include <net/inet_dscp.h>
 #include <net/ipv6.h>
 #include <net/addrconf.h>
 #include <net/ip6_route.h>
@@ -25,14 +26,14 @@ struct fib6_rule {
 	struct fib_rule common;
 	struct rt6key src;
 	struct rt6key dst;
-	u8 tclass;
+	dscp_t dscp;
 };
 static bool fib6_rule_matchall(const struct fib_rule *rule)
 {
 	struct fib6_rule *r = container_of(rule, struct fib6_rule, common);
-	if (r->dst.plen || r->src.plen || r->tclass)
+	if (r->dst.plen || r->src.plen || r->dscp)
 		return false;
 	return fib_rule_matchall(rule);
 }
@@ -323,7 +324,7 @@ INDIRECT_CALLABLE_SCOPE int fib6_rule_match(struct fib_rule *rule,
 			return 0;
 	}
-	if (r->tclass && r->tclass != ip6_tclass(fl6->flowlabel))
+	if (r->dscp && r->dscp != ip6_dscp(fl6->flowlabel))
 		return 0;
 	if (rule->ip_proto && (rule->ip_proto != fl6->flowi6_proto))
@@ -349,6 +350,13 @@ static int fib6_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
 	struct net *net = sock_net(skb->sk);
 	struct fib6_rule *rule6 = (struct fib6_rule *) rule;
+	if (!inet_validate_dscp(frh->tos)) {
+		NL_SET_ERR_MSG(extack,
+			       "Invalid dsfield (tos): ECN bits must be 0");
+		goto errout;
+	}
+	rule6->dscp = inet_dsfield_to_dscp(frh->tos);
 	if (rule->action == FR_ACT_TO_TBL && !rule->l3mdev) {
 		if (rule->table == RT6_TABLE_UNSPEC) {
 			NL_SET_ERR_MSG(extack, "Invalid table");
@@ -369,7 +377,6 @@ static int fib6_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
 	rule6->src.plen = frh->src_len;
 	rule6->dst.plen = frh->dst_len;
-	rule6->tclass = frh->tos;
 	if (fib_rule_requires_fldissect(rule))
 		net->ipv6.fib6_rules_require_fldissect++;
@@ -402,7 +409,7 @@ static int fib6_rule_compare(struct fib_rule *rule, struct fib_rule_hdr *frh,
 	if (frh->dst_len && (rule6->dst.plen != frh->dst_len))
 		return 0;
-	if (frh->tos && (rule6->tclass != frh->tos))
+	if (frh->tos && inet_dscp_to_dsfield(rule6->dscp) != frh->tos)
 		return 0;
 	if (frh->src_len &&
@@ -423,7 +430,7 @@ static int fib6_rule_fill(struct fib_rule *rule, struct sk_buff *skb,
 	frh->dst_len = rule6->dst.plen;
 	frh->src_len = rule6->src.plen;
-	frh->tos = rule6->tclass;
+	frh->tos = inet_dscp_to_dsfield(rule6->dscp);
 	if ((rule6->dst.plen &&
 	     nla_put_in6_addr(skb, FRA_DST, &rule6->dst.addr)) ||
......
...@@ -114,10 +114,25 @@ fib_rule6_test_match_n_redirect() ...@@ -114,10 +114,25 @@ fib_rule6_test_match_n_redirect()
log_test $? 0 "rule6 del by pref: $description" log_test $? 0 "rule6 del by pref: $description"
} }
fib_rule6_test_reject()
{
local match="$1"
local rc
$IP -6 rule add $match table $RTABLE 2>/dev/null
rc=$?
log_test $rc 2 "rule6 check: $match"
if [ $rc -eq 0 ]; then
$IP -6 rule del $match table $RTABLE
fi
}
fib_rule6_test() fib_rule6_test()
{ {
local getmatch local getmatch
local match local match
local cnt
# setup the fib rule redirect route # setup the fib rule redirect route
$IP -6 route add table $RTABLE default via $GW_IP6 dev $DEV onlink $IP -6 route add table $RTABLE default via $GW_IP6 dev $DEV onlink
...@@ -128,8 +143,21 @@ fib_rule6_test() ...@@ -128,8 +143,21 @@ fib_rule6_test()
match="from $SRC_IP6 iif $DEV" match="from $SRC_IP6 iif $DEV"
fib_rule6_test_match_n_redirect "$match" "$match" "iif redirect to table" fib_rule6_test_match_n_redirect "$match" "$match" "iif redirect to table"
# Reject dsfield (tos) options which have ECN bits set
for cnt in $(seq 1 3); do
match="dsfield $cnt"
fib_rule6_test_reject "$match"
done
# Don't take ECN bits into account when matching on dsfield
match="tos 0x10" match="tos 0x10"
fib_rule6_test_match_n_redirect "$match" "$match" "tos redirect to table" for cnt in "0x10" "0x11" "0x12" "0x13"; do
# Using option 'tos' instead of 'dsfield' as old iproute2
# versions don't support 'dsfield' in ip rule show.
getmatch="tos $cnt"
fib_rule6_test_match_n_redirect "$match" "$getmatch" \
"$getmatch redirect to table"
done
match="fwmark 0x64" match="fwmark 0x64"
getmatch="mark 0x64" getmatch="mark 0x64"
@@ -187,10 +215,25 @@ fib_rule4_test_match_n_redirect()
 	log_test $? 0 "rule4 del by pref: $description"
 }
 
+fib_rule4_test_reject()
+{
+	local match="$1"
+	local rc
+
+	$IP rule add $match table $RTABLE 2>/dev/null
+	rc=$?
+	log_test $rc 2 "rule4 check: $match"
+
+	if [ $rc -eq 0 ]; then
+		$IP rule del $match table $RTABLE
+	fi
+}
+
 fib_rule4_test()
 {
 	local getmatch
 	local match
+	local cnt
 
 	# setup the fib rule redirect route
 	$IP route add table $RTABLE default via $GW_IP4 dev $DEV onlink
@@ -206,8 +249,21 @@ fib_rule4_test()
 	fib_rule4_test_match_n_redirect "$match" "$match" "iif redirect to table"
 	ip netns exec testns sysctl -qw net.ipv4.ip_forward=0
 
+	# Reject dsfield (tos) options which have ECN bits set
+	for cnt in $(seq 1 3); do
+		match="dsfield $cnt"
+		fib_rule4_test_reject "$match"
+	done
+
+	# Don't take ECN bits into account when matching on dsfield
 	match="tos 0x10"
-	fib_rule4_test_match_n_redirect "$match" "$match" "tos redirect to table"
+	for cnt in "0x10" "0x11" "0x12" "0x13"; do
+		# Using option 'tos' instead of 'dsfield' as old iproute2
+		# versions don't support 'dsfield' in ip rule show.
+		getmatch="tos $cnt"
+		fib_rule4_test_match_n_redirect "$match" "$getmatch" \
+						"$getmatch redirect to table"
+	done
 
 	match="fwmark 0x64"
 	getmatch="mark 0x64"
......
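The rule tests above rely on two properties of the dsfield (tos) option: a value with either of the two low-order (ECN) bits set is expected to be refused when the rule is added, and a rule added with `tos 0x10` is expected to match lookups for 0x10 through 0x13. A minimal sketch of that arithmetic follows. `dsfield_has_ecn` and `dsfield_matches_rule` are hypothetical helpers, not part of the test script in this diff, and the sketch only models a non-zero rule value; it does not reproduce the kernel's lookup code.

```shell
# Hypothetical helpers for illustration only.

# Succeeds when the value has at least one ECN bit (mask 0x03) set,
# i.e. when it is not acceptable as a dsfield/tos rule option.
dsfield_has_ecn()
{
	[ $(($1 & 0x03)) -ne 0 ]
}

# Succeeds when a looked-up dsfield ($1) equals the rule's dsfield ($2)
# once the ECN bits of the looked-up value are masked out.
dsfield_matches_rule()
{
	local pkt=$(($1 & 0xfc))
	local rule=$(($2))

	[ "$pkt" -eq "$rule" ]
}
```

With these helpers, `dsfield_has_ecn` succeeds for 1, 2 and 3 (the values the reject loop feeds to `ip rule add`), and `dsfield_matches_rule` succeeds for each of 0x10, 0x11, 0x12 and 0x13 against a rule value of 0x10.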
@@ -1447,6 +1447,81 @@ ipv4_local_rt_cache()
 	log_test $? 0 "Cached route removed from VRF port device"
 }
 
+ipv4_rt_dsfield()
+{
+	echo
+	echo "IPv4 route with dsfield tests"
+
+	run_cmd "$IP route flush 172.16.102.0/24"
+
+	# New routes should reject dsfield options that interfere with ECN
+	run_cmd "$IP route add 172.16.102.0/24 dsfield 0x01 via 172.16.101.2"
+	log_test $? 2 "Reject route with dsfield 0x01"
+
+	run_cmd "$IP route add 172.16.102.0/24 dsfield 0x02 via 172.16.101.2"
+	log_test $? 2 "Reject route with dsfield 0x02"
+
+	run_cmd "$IP route add 172.16.102.0/24 dsfield 0x03 via 172.16.101.2"
+	log_test $? 2 "Reject route with dsfield 0x03"
+
+	# A generic route that doesn't take DSCP into account
+	run_cmd "$IP route add 172.16.102.0/24 via 172.16.101.2"
+
+	# A more specific route for DSCP 0x10
+	run_cmd "$IP route add 172.16.102.0/24 dsfield 0x10 via 172.16.103.2"
+
+	# DSCP 0x10 should match the specific route, no matter the ECN bits
+	$IP route get fibmatch 172.16.102.1 dsfield 0x10 | \
+		grep -q "via 172.16.103.2"
+	log_test $? 0 "IPv4 route with DSCP and ECN:Not-ECT"
+
+	$IP route get fibmatch 172.16.102.1 dsfield 0x11 | \
+		grep -q "via 172.16.103.2"
+	log_test $? 0 "IPv4 route with DSCP and ECN:ECT(1)"
+
+	$IP route get fibmatch 172.16.102.1 dsfield 0x12 | \
+		grep -q "via 172.16.103.2"
+	log_test $? 0 "IPv4 route with DSCP and ECN:ECT(0)"
+
+	$IP route get fibmatch 172.16.102.1 dsfield 0x13 | \
+		grep -q "via 172.16.103.2"
+	log_test $? 0 "IPv4 route with DSCP and ECN:CE"
+
+	# Unknown DSCP should match the generic route, no matter the ECN bits
+	$IP route get fibmatch 172.16.102.1 dsfield 0x14 | \
+		grep -q "via 172.16.101.2"
+	log_test $? 0 "IPv4 route with unknown DSCP and ECN:Not-ECT"
+
+	$IP route get fibmatch 172.16.102.1 dsfield 0x15 | \
+		grep -q "via 172.16.101.2"
+	log_test $? 0 "IPv4 route with unknown DSCP and ECN:ECT(1)"
+
+	$IP route get fibmatch 172.16.102.1 dsfield 0x16 | \
+		grep -q "via 172.16.101.2"
+	log_test $? 0 "IPv4 route with unknown DSCP and ECN:ECT(0)"
+
+	$IP route get fibmatch 172.16.102.1 dsfield 0x17 | \
+		grep -q "via 172.16.101.2"
+	log_test $? 0 "IPv4 route with unknown DSCP and ECN:CE"
+
+	# Null DSCP should match the generic route, no matter the ECN bits
+	$IP route get fibmatch 172.16.102.1 dsfield 0x00 | \
+		grep -q "via 172.16.101.2"
+	log_test $? 0 "IPv4 route with no DSCP and ECN:Not-ECT"
+
+	$IP route get fibmatch 172.16.102.1 dsfield 0x01 | \
+		grep -q "via 172.16.101.2"
+	log_test $? 0 "IPv4 route with no DSCP and ECN:ECT(1)"
+
+	$IP route get fibmatch 172.16.102.1 dsfield 0x02 | \
+		grep -q "via 172.16.101.2"
+	log_test $? 0 "IPv4 route with no DSCP and ECN:ECT(0)"
+
+	$IP route get fibmatch 172.16.102.1 dsfield 0x03 | \
+		grep -q "via 172.16.101.2"
+	log_test $? 0 "IPv4 route with no DSCP and ECN:CE"
+}
+
 ipv4_route_test()
 {
 	route_setup
@@ -1454,6 +1529,7 @@ ipv4_route_test()
 	ipv4_rt_add
 	ipv4_rt_replace
 	ipv4_local_rt_cache
+	ipv4_rt_dsfield
 
 	route_cleanup
 }
......
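The route test names above label each lookup with the ECN codepoint carried in the two low-order bits of the dsfield value; for example, 0x11 is DSCP 0x10 with ECT(1). As an illustration, the following sketch splits a value the same way. `split_dsfield` is a hypothetical helper, not part of the test script in this diff, and the codepoint names are assumed to follow RFC 3168.

```shell
# Hypothetical helper for illustration only: split a dsfield/tos byte into
# its DSCP part (upper six bits, kept in place) and its ECN codepoint
# (lower two bits), then print both.
split_dsfield()
{
	local val=$(($1))
	local dscp=$((val & 0xfc))
	local ecn=$((val & 0x03))
	local name

	case $ecn in
	0) name="Not-ECT" ;;
	1) name="ECT(1)" ;;
	2) name="ECT(0)" ;;
	3) name="CE" ;;
	esac

	printf 'dscp=0x%02x ecn=%s\n' "$dscp" "$name"
}
```

For instance, `split_dsfield 0x13` prints `dscp=0x10 ecn=CE`, which corresponds to the "IPv4 route with DSCP and ECN:CE" case, while `split_dsfield 0x03` prints `dscp=0x00 ecn=CE`, the "no DSCP" case that falls through to the generic route.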