Commit 7c804e91 authored by David S. Miller

Merge branch 'ipv6-ioam'

Justin Iurman says:

====================
Support for the IOAM Pre-allocated Trace with IPv6

v5:
 - Refine types, min/max and default values for new sysctls
 - Introduce a "_wide" sysctl for each "ioam6_id" sysctl
 - Add more validation on headers before processing data
 - RCU for sc <> ns pointers + appropriate accessors
 - Generic Netlink policies are now per op, not per family anymore
 - Address other comments/remarks from Jakub (thanks again)
 - Revert "__packed" to "__attribute__((packed))" for uapi headers
 - Add tests to cover the functionality added, as requested by David Ahern

v4:
 - Address warnings from checkpatch (ignore errors related to unnamed bitfields
   in the first patch)
 - Use of hweight32 (thanks Jakub)
 - Remove inline keyword from static functions in C files and let the compiler
   decide what to do (thanks Jakub)

v3:
 - Fix warning "unused label 'out_unregister_genl'" by adding conditional macro
 - Fix lwtunnel output redirect bug: dst cache useless in this case, use
   orig_output instead

v2:
 - Fix warning with static for __ioam6_fill_trace_data
 - Fix sparse warning with __force when casting __be64 to __be32
 - Fix unchecked dereference when removing IOAM namespaces or schemas
 - exthdrs.c: Don't drop by default (now: ignore) to match the act bits "00"
 - Add control plane support for the inline insertion (lwtunnel)
 - Provide uapi structures
 - Use __net_timestamp if skb->tstamp is empty
 - Add note about the temporary IANA allocation
 - Remove support for "removable" TLVs
 - Remove support for virtual/anonymous tunnel decapsulation

In-situ Operations, Administration, and Maintenance (IOAM) records
operational and telemetry information in a packet while it traverses
a path between two points in an IOAM domain. It is defined in
draft-ietf-ippm-ioam-data [1]. IOAM data fields can be encapsulated
into a variety of protocols. The IPv6 encapsulation is defined in
draft-ietf-ippm-ioam-ipv6-options [2], via extension headers. IOAM
can be used to complement OAM mechanisms based on e.g. ICMP or other
types of probe packets.

This patchset implements support for the Pre-allocated Trace, carried
in a Hop-by-Hop Options header. Therefore, a new IPv6 Hop-by-Hop TLV
option is introduced, see IANA [3]. The three other IOAM options are not
included in this patchset (Incremental Trace, Proof-of-Transit and
Edge-to-Edge). The main idea behind the IOAM Pre-allocated Trace is that
a node pre-allocates some room in packets for IOAM data. Then, each IOAM
node on the path inserts its data. There are several interesting use
cases, e.g. fast failure detection/isolation or smart service selection.
Another compelling use case is what we have called Cross-Layer Telemetry
(see the demo video in its repository [4]), which aims to make the entire
stack (L2/L3 -> L7) visible to distributed tracing tools (e.g. Jaeger),
instead of the current limited L5 -> L7 view. This makes IOAM a useful
feature for the Linux kernel.

This patchset also provides control plane support, but only for inline
insertion (host-to-host use case), through lightweight tunnels. For
in-transit traffic, the solution is an IPv6-in-IPv6 encapsulation, which
brings some difficulties and still requires some work and discussion
(i.e. anonymous tunnel decapsulation and multi-egress resolution).

- Patch 1: IPv6 IOAM headers definition
- Patch 2: Data plane support for Pre-allocated Trace
- Patch 3: IOAM Generic Netlink API
- Patch 4: Support for IOAM injection with lwtunnels
- Patch 5: Documentation for new IOAM sysctls
- Patch 6: Test for the IOAM insertion with IPv6

  [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data
  [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options
  [3] https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2
  [4] https://github.com/iurmanj/cross-layer-telemetry
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
parents 71f4f89a 968691c7
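The Pre-allocated Trace idea from the cover letter (one node reserves room, each IOAM node on the path then inserts its data) can be illustrated with a small userspace sketch. This is not code from the patchset: `struct trace_buf`, `trace_prealloc()` and `node_insert()` are hypothetical names, and the "fill from the end, set an overflow flag when full" policy is an assumption based on a reading of the IOAM data draft [1]. Lengths are counted in 4-octet words.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TRACE_WORDS_MAX 61 /* 244 bytes / 4, mirrors IOAM6_TRACE_DATA_SIZE_MAX */

/* Hypothetical, simplified view of a pre-allocated trace. */
struct trace_buf {
	uint8_t nodelen;   /* words written by each node */
	uint8_t remlen;    /* words still free */
	bool overflow;     /* set when a node could not insert its data */
	uint32_t data[TRACE_WORDS_MAX];
};

/* Encapsulating node: reserve room for `hops` nodes up front. */
static bool trace_prealloc(struct trace_buf *t, uint8_t nodelen, uint8_t hops)
{
	unsigned int total = (unsigned int)nodelen * hops;

	assert(t != NULL);
	if (nodelen == 0 || total > TRACE_WORDS_MAX)
		return false;

	memset(t, 0, sizeof(*t));
	t->nodelen = nodelen;
	t->remlen = (uint8_t)total;
	return true;
}

/* Node on the path: write into the free slot closest to the end. */
static bool node_insert(struct trace_buf *t, const uint32_t *words)
{
	assert(t != NULL && words != NULL);
	if (t->overflow)
		return false;
	if (t->remlen < t->nodelen) {
		/* No room left: flag it so later nodes skip the trace. */
		t->overflow = true;
		return false;
	}
	memcpy(&t->data[t->remlen - t->nodelen], words,
	       (size_t)t->nodelen * sizeof(uint32_t));
	t->remlen -= t->nodelen;
	return true;
}
```

With room reserved for two hops, the first node's data lands in the last slot, the second node's data in the slot before it, and a third node only sets the overflow flag.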
.. SPDX-License-Identifier: GPL-2.0
=====================
IOAM6 Sysfs variables
=====================
/proc/sys/net/conf/<iface>/ioam6_* variables:
=============================================
ioam6_enabled - BOOL
Accept (= enabled) or ignore (= disabled) IPv6 IOAM options on ingress
for this interface.
* 0 - disabled (default)
* 1 - enabled
ioam6_id - SHORT INTEGER
Define the IOAM id of this interface.
Default is ~0.
ioam6_id_wide - INTEGER
Define the wide IOAM id of this interface.
Default is ~0.
@@ -1926,6 +1926,23 @@ fib_notify_on_flag_change - INTEGER
- 1 - Emit notifications.
- 2 - Emit notifications only for RTM_F_OFFLOAD_FAILED flag change.
ioam6_id - INTEGER
Define the IOAM id of this node. Uses only 24 bits out of 32 in total.
Min: 0
Max: 0xFFFFFF
Default: 0xFFFFFF
ioam6_id_wide - LONG INTEGER
Define the wide IOAM id of this node. Uses only 56 bits out of 64 in
total. Can be different from ioam6_id.
Min: 0
Max: 0xFFFFFFFFFFFFFF
Default: 0xFFFFFFFFFFFFFF
IPv6 Fragmentation:
ip6frag_high_thresh - INTEGER
......
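The per-node sysctl ranges documented above (24 usable bits for `ioam6_id`, 56 for `ioam6_id_wide`) match the way the uapi header derives `IOAM6_DEFAULT_ID` and `IOAM6_DEFAULT_ID_WIDE`, namely by shifting the all-ones "unavailable" values right by 8. A minimal userspace sketch of that arithmetic follows. The `NODE_ID_*` macros and the two range helpers are hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for IOAM6_DEFAULT_ID and IOAM6_DEFAULT_ID_WIDE. */
#define NODE_ID_DEFAULT      (UINT32_MAX >> 8) /* 0xFFFFFF: 24 bits set */
#define NODE_ID_WIDE_DEFAULT (UINT64_MAX >> 8) /* 0xFFFFFFFFFFFFFF: 56 bits set */

/* Hypothetical helper: is v within the documented ioam6_id range? */
static bool node_id_in_range(uint64_t v)
{
	return v <= NODE_ID_DEFAULT;
}

/* Hypothetical helper: is v within the documented ioam6_id_wide range? */
static bool node_id_wide_in_range(uint64_t v)
{
	return v <= NODE_ID_WIDE_DEFAULT;
}
```

The defaults are therefore also the documented maxima, so the default value doubles as the largest value the range allows.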
/* SPDX-License-Identifier: GPL-2.0+ */
/*
* IPv6 IOAM
*
* Author:
* Justin Iurman <justin.iurman@uliege.be>
*/
#ifndef _LINUX_IOAM6_H
#define _LINUX_IOAM6_H
#include <uapi/linux/ioam6.h>
#endif /* _LINUX_IOAM6_H */
/* SPDX-License-Identifier: GPL-2.0+ */
/*
* IPv6 IOAM Generic Netlink API
*
* Author:
* Justin Iurman <justin.iurman@uliege.be>
*/
#ifndef _LINUX_IOAM6_GENL_H
#define _LINUX_IOAM6_GENL_H
#include <uapi/linux/ioam6_genl.h>
#endif /* _LINUX_IOAM6_GENL_H */
/* SPDX-License-Identifier: GPL-2.0+ */
/*
* IPv6 IOAM Lightweight Tunnel API
*
* Author:
* Justin Iurman <justin.iurman@uliege.be>
*/
#ifndef _LINUX_IOAM6_IPTUNNEL_H
#define _LINUX_IOAM6_IPTUNNEL_H
#include <uapi/linux/ioam6_iptunnel.h>
#endif /* _LINUX_IOAM6_IPTUNNEL_H */
@@ -76,6 +76,9 @@ struct ipv6_devconf {
__s32 disable_policy;
__s32 ndisc_tclass;
__s32 rpl_seg_enabled;
__u32 ioam6_id;
__u32 ioam6_id_wide;
__u8 ioam6_enabled;
struct ctl_table_header *sysctl_header;
};
......
/* SPDX-License-Identifier: GPL-2.0+ */
/*
* IPv6 IOAM implementation
*
* Author:
* Justin Iurman <justin.iurman@uliege.be>
*/
#ifndef _NET_IOAM6_H
#define _NET_IOAM6_H
#include <linux/net.h>
#include <linux/ipv6.h>
#include <linux/ioam6.h>
#include <linux/rhashtable-types.h>
struct ioam6_namespace {
struct rhash_head head;
struct rcu_head rcu;
struct ioam6_schema __rcu *schema;
__be16 id;
__be32 data;
__be64 data_wide;
};
struct ioam6_schema {
struct rhash_head head;
struct rcu_head rcu;
struct ioam6_namespace __rcu *ns;
u32 id;
int len;
__be32 hdr;
u8 data[0];
};
struct ioam6_pernet_data {
struct mutex lock;
struct rhashtable namespaces;
struct rhashtable schemas;
};
static inline struct ioam6_pernet_data *ioam6_pernet(struct net *net)
{
#if IS_ENABLED(CONFIG_IPV6)
return net->ipv6.ioam6_data;
#else
return NULL;
#endif
}
struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id);
void ioam6_fill_trace_data(struct sk_buff *skb,
struct ioam6_namespace *ns,
struct ioam6_trace_hdr *trace);
int ioam6_init(void);
void ioam6_exit(void);
int ioam6_iptunnel_init(void);
void ioam6_iptunnel_exit(void);
#endif /* _NET_IOAM6_H */
@@ -51,6 +51,8 @@ struct netns_sysctl_ipv6 {
int max_dst_opts_len;
int max_hbh_opts_len;
int seg6_flowlabel;
u32 ioam6_id;
u64 ioam6_id_wide;
bool skip_notify_on_dev_down;
u8 fib_notify_on_flag_change;
};
@@ -110,6 +112,7 @@ struct netns_ipv6 {
spinlock_t lock;
u32 seq;
} ip6addrlbl_table;
struct ioam6_pernet_data *ioam6_data;
};
#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
......
@@ -145,6 +145,7 @@ struct in6_flowlabel_req {
#define IPV6_TLV_PADN 1
#define IPV6_TLV_ROUTERALERT 5
#define IPV6_TLV_CALIPSO 7 /* RFC 5570 */
#define IPV6_TLV_IOAM 49 /* TEMPORARY IANA allocation for IOAM */
#define IPV6_TLV_JUMBO 194
#define IPV6_TLV_HAO 201 /* home address option */
......
/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
/*
* IPv6 IOAM implementation
*
* Author:
* Justin Iurman <justin.iurman@uliege.be>
*/
#ifndef _UAPI_LINUX_IOAM6_H
#define _UAPI_LINUX_IOAM6_H
#include <asm/byteorder.h>
#include <linux/types.h>
#define IOAM6_U16_UNAVAILABLE U16_MAX
#define IOAM6_U32_UNAVAILABLE U32_MAX
#define IOAM6_U64_UNAVAILABLE U64_MAX
#define IOAM6_DEFAULT_ID (IOAM6_U32_UNAVAILABLE >> 8)
#define IOAM6_DEFAULT_ID_WIDE (IOAM6_U64_UNAVAILABLE >> 8)
#define IOAM6_DEFAULT_IF_ID IOAM6_U16_UNAVAILABLE
#define IOAM6_DEFAULT_IF_ID_WIDE IOAM6_U32_UNAVAILABLE
/*
* IPv6 IOAM Option Header
*/
struct ioam6_hdr {
__u8 opt_type;
__u8 opt_len;
__u8 :8; /* reserved */
#define IOAM6_TYPE_PREALLOC 0
__u8 type;
} __attribute__((packed));
/*
* IOAM Trace Header
*/
struct ioam6_trace_hdr {
__be16 namespace_id;
#if defined(__LITTLE_ENDIAN_BITFIELD)
__u8 :1, /* unused */
:1, /* unused */
overflow:1,
nodelen:5;
__u8 remlen:7,
:1; /* unused */
union {
__be32 type_be32;
struct {
__u32 bit7:1,
bit6:1,
bit5:1,
bit4:1,
bit3:1,
bit2:1,
bit1:1,
bit0:1,
bit15:1, /* unused */
bit14:1, /* unused */
bit13:1, /* unused */
bit12:1, /* unused */
bit11:1,
bit10:1,
bit9:1,
bit8:1,
bit23:1, /* reserved */
bit22:1,
bit21:1, /* unused */
bit20:1, /* unused */
bit19:1, /* unused */
bit18:1, /* unused */
bit17:1, /* unused */
bit16:1, /* unused */
:8; /* reserved */
} type;
};
#elif defined(__BIG_ENDIAN_BITFIELD)
__u8 nodelen:5,
overflow:1,
:1, /* unused */
:1; /* unused */
__u8 :1, /* unused */
remlen:7;
union {
__be32 type_be32;
struct {
__u32 bit0:1,
bit1:1,
bit2:1,
bit3:1,
bit4:1,
bit5:1,
bit6:1,
bit7:1,
bit8:1,
bit9:1,
bit10:1,
bit11:1,
bit12:1, /* unused */
bit13:1, /* unused */
bit14:1, /* unused */
bit15:1, /* unused */
bit16:1, /* unused */
bit17:1, /* unused */
bit18:1, /* unused */
bit19:1, /* unused */
bit20:1, /* unused */
bit21:1, /* unused */
bit22:1,
bit23:1, /* reserved */
:8; /* reserved */
} type;
};
#else
#error "Please fix <asm/byteorder.h>"
#endif
#define IOAM6_TRACE_DATA_SIZE_MAX 244
__u8 data[0];
} __attribute__((packed));
#endif /* _UAPI_LINUX_IOAM6_H */
/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
/*
* IPv6 IOAM Generic Netlink API
*
* Author:
* Justin Iurman <justin.iurman@uliege.be>
*/
#ifndef _UAPI_LINUX_IOAM6_GENL_H
#define _UAPI_LINUX_IOAM6_GENL_H
#define IOAM6_GENL_NAME "IOAM6"
#define IOAM6_GENL_VERSION 0x1
enum {
IOAM6_ATTR_UNSPEC,
IOAM6_ATTR_NS_ID, /* u16 */
IOAM6_ATTR_NS_DATA, /* u32 */
IOAM6_ATTR_NS_DATA_WIDE,/* u64 */
#define IOAM6_MAX_SCHEMA_DATA_LEN (255 * 4)
IOAM6_ATTR_SC_ID, /* u32 */
IOAM6_ATTR_SC_DATA, /* Binary */
IOAM6_ATTR_SC_NONE, /* Flag */
IOAM6_ATTR_PAD,
__IOAM6_ATTR_MAX,
};
#define IOAM6_ATTR_MAX (__IOAM6_ATTR_MAX - 1)
enum {
IOAM6_CMD_UNSPEC,
IOAM6_CMD_ADD_NAMESPACE,
IOAM6_CMD_DEL_NAMESPACE,
IOAM6_CMD_DUMP_NAMESPACES,
IOAM6_CMD_ADD_SCHEMA,
IOAM6_CMD_DEL_SCHEMA,
IOAM6_CMD_DUMP_SCHEMAS,
IOAM6_CMD_NS_SET_SCHEMA,
__IOAM6_CMD_MAX,
};
#define IOAM6_CMD_MAX (__IOAM6_CMD_MAX - 1)
#endif /* _UAPI_LINUX_IOAM6_GENL_H */
/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
/*
* IPv6 IOAM Lightweight Tunnel API
*
* Author:
* Justin Iurman <justin.iurman@uliege.be>
*/
#ifndef _UAPI_LINUX_IOAM6_IPTUNNEL_H
#define _UAPI_LINUX_IOAM6_IPTUNNEL_H
enum {
IOAM6_IPTUNNEL_UNSPEC,
IOAM6_IPTUNNEL_TRACE, /* struct ioam6_trace_hdr */
__IOAM6_IPTUNNEL_MAX,
};
#define IOAM6_IPTUNNEL_MAX (__IOAM6_IPTUNNEL_MAX - 1)
#endif /* _UAPI_LINUX_IOAM6_IPTUNNEL_H */
@@ -190,6 +190,9 @@ enum {
DEVCONF_NDISC_TCLASS,
DEVCONF_RPL_SEG_ENABLED,
DEVCONF_RA_DEFRTR_METRIC,
DEVCONF_IOAM6_ENABLED,
DEVCONF_IOAM6_ID,
DEVCONF_IOAM6_ID_WIDE,
DEVCONF_MAX
};
......
@@ -14,6 +14,7 @@ enum lwtunnel_encap_types {
LWTUNNEL_ENCAP_BPF,
LWTUNNEL_ENCAP_SEG6_LOCAL,
LWTUNNEL_ENCAP_RPL,
LWTUNNEL_ENCAP_IOAM6,
__LWTUNNEL_ENCAP_MAX,
};
......
@@ -43,6 +43,8 @@ static const char *lwtunnel_encap_str(enum lwtunnel_encap_types encap_type)
return "SEG6LOCAL";
case LWTUNNEL_ENCAP_RPL:
return "RPL";
case LWTUNNEL_ENCAP_IOAM6:
return "IOAM6";
case LWTUNNEL_ENCAP_IP6:
case LWTUNNEL_ENCAP_IP:
case LWTUNNEL_ENCAP_NONE:
......
@@ -328,4 +328,15 @@ config IPV6_RPL_LWTUNNEL
If unsure, say N.
config IPV6_IOAM6_LWTUNNEL
bool "IPv6: IOAM Pre-allocated Trace insertion support"
depends on IPV6
select LWTUNNEL
help
Support for the inline insertion of IOAM Pre-allocated
Trace Header (only on locally generated packets), using
the lightweight tunnels mechanism.
If unsure, say N.
endif # IPV6
@@ -10,7 +10,7 @@ ipv6-objs := af_inet6.o anycast.o ip6_output.o ip6_input.o addrconf.o \
route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \
raw.o icmp.o mcast.o reassembly.o tcp_ipv6.o ping.o \
exthdrs.o datagram.o ip6_flowlabel.o inet6_connection_sock.o \
udp_offload.o seg6.o fib6_notifier.o rpl.o ioam6.o
ipv6-offload := ip6_offload.o tcpv6_offload.o exthdrs_offload.o
@@ -27,6 +27,7 @@ ipv6-$(CONFIG_NETLABEL) += calipso.o
ipv6-$(CONFIG_IPV6_SEG6_LWTUNNEL) += seg6_iptunnel.o seg6_local.o
ipv6-$(CONFIG_IPV6_SEG6_HMAC) += seg6_hmac.o
ipv6-$(CONFIG_IPV6_RPL_LWTUNNEL) += rpl_iptunnel.o
ipv6-$(CONFIG_IPV6_IOAM6_LWTUNNEL) += ioam6_iptunnel.o
ipv6-objs += $(ipv6-y)
......
@@ -89,12 +89,15 @@
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/export.h>
#include <linux/ioam6.h>
#define INFINITY_LIFE_TIME 0xFFFFFFFF
#define IPV6_MAX_STRLEN \
sizeof("ffff:ffff:ffff:ffff:ffff:ffff:255.255.255.255")
static u32 ioam6_if_id_max = U16_MAX;
static inline u32 cstamp_delta(unsigned long cstamp)
{
return (cstamp - INITIAL_JIFFIES) * 100UL / HZ;
@@ -237,6 +240,9 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
.addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64,
.disable_policy = 0,
.rpl_seg_enabled = 0,
.ioam6_enabled = 0,
.ioam6_id = IOAM6_DEFAULT_IF_ID,
.ioam6_id_wide = IOAM6_DEFAULT_IF_ID_WIDE,
};
static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
@@ -293,6 +299,9 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
.addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64,
.disable_policy = 0,
.rpl_seg_enabled = 0,
.ioam6_enabled = 0,
.ioam6_id = IOAM6_DEFAULT_IF_ID,
.ioam6_id_wide = IOAM6_DEFAULT_IF_ID_WIDE,
};
/* Check if link is ready: is it up and is a valid qdisc available */
@@ -5524,6 +5533,9 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
array[DEVCONF_DISABLE_POLICY] = cnf->disable_policy;
array[DEVCONF_NDISC_TCLASS] = cnf->ndisc_tclass;
array[DEVCONF_RPL_SEG_ENABLED] = cnf->rpl_seg_enabled;
array[DEVCONF_IOAM6_ENABLED] = cnf->ioam6_enabled;
array[DEVCONF_IOAM6_ID] = cnf->ioam6_id;
array[DEVCONF_IOAM6_ID_WIDE] = cnf->ioam6_id_wide;
}
static inline size_t inet6_ifla6_size(void)
@@ -6930,6 +6942,31 @@ static const struct ctl_table addrconf_sysctl[] = {
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "ioam6_enabled",
.data = &ipv6_devconf.ioam6_enabled,
.maxlen = sizeof(u8),
.mode = 0644,
.proc_handler = proc_dou8vec_minmax,
.extra1 = (void *)SYSCTL_ZERO,
.extra2 = (void *)SYSCTL_ONE,
},
{
.procname = "ioam6_id",
.data = &ipv6_devconf.ioam6_id,
.maxlen = sizeof(u32),
.mode = 0644,
.proc_handler = proc_douintvec_minmax,
.extra1 = (void *)SYSCTL_ZERO,
.extra2 = (void *)&ioam6_if_id_max,
},
{
.procname = "ioam6_id_wide",
.data = &ipv6_devconf.ioam6_id_wide,
.maxlen = sizeof(u32),
.mode = 0644,
.proc_handler = proc_douintvec,
},
{
/* sentinel */
}
......
@@ -62,6 +62,7 @@
#include <net/rpl.h>
#include <net/compat.h>
#include <net/xfrm.h>
#include <net/ioam6.h>
#include <linux/uaccess.h>
#include <linux/mroute6.h>
@@ -961,6 +962,9 @@ static int __net_init inet6_net_init(struct net *net)
net->ipv6.sysctl.fib_notify_on_flag_change = 0;
atomic_set(&net->ipv6.fib6_sernum, 1);
net->ipv6.sysctl.ioam6_id = IOAM6_DEFAULT_ID;
net->ipv6.sysctl.ioam6_id_wide = IOAM6_DEFAULT_ID_WIDE;
err = ipv6_init_mibs(net);
if (err)
return err;
@@ -1191,6 +1195,10 @@ static int __init inet6_init(void)
if (err)
goto rpl_fail;
err = ioam6_init();
if (err)
goto ioam6_fail;
err = igmp6_late_init();
if (err)
goto igmp6_late_err;
@@ -1213,6 +1221,8 @@ static int __init inet6_init(void)
igmp6_late_cleanup();
#endif
igmp6_late_err:
ioam6_exit();
ioam6_fail:
rpl_exit();
rpl_fail:
seg6_exit();
......
@@ -49,6 +49,9 @@
#include <net/seg6_hmac.h>
#endif
#include <net/rpl.h>
#include <linux/ioam6.h>
#include <net/ioam6.h>
#include <net/dst_metadata.h>
#include <linux/uaccess.h>
@@ -928,6 +931,60 @@ static bool ipv6_hop_ra(struct sk_buff *skb, int optoff)
return false;
}
/* IOAM */
static bool ipv6_hop_ioam(struct sk_buff *skb, int optoff)
{
struct ioam6_trace_hdr *trace;
struct ioam6_namespace *ns;
struct ioam6_hdr *hdr;
/* Bad alignment (must be 4n-aligned) */
if (optoff & 3)
goto drop;
/* Ignore if IOAM is not enabled on ingress */
if (!__in6_dev_get(skb->dev)->cnf.ioam6_enabled)
goto ignore;
/* Truncated Option header */
hdr = (struct ioam6_hdr *)(skb_network_header(skb) + optoff);
if (hdr->opt_len < 2)
goto drop;
switch (hdr->type) {
case IOAM6_TYPE_PREALLOC:
/* Truncated Pre-allocated Trace header */
if (hdr->opt_len < 2 + sizeof(*trace))
goto drop;
/* Malformed Pre-allocated Trace header */
trace = (struct ioam6_trace_hdr *)((u8 *)hdr + sizeof(*hdr));
if (hdr->opt_len < 2 + sizeof(*trace) + trace->remlen * 4)
goto drop;
/* Ignore if the IOAM namespace is unknown */
ns = ioam6_namespace(ipv6_skb_net(skb), trace->namespace_id);
if (!ns)
goto ignore;
if (!skb_valid_dst(skb))
ip6_route_input(skb);
ioam6_fill_trace_data(skb, ns, trace);
break;
default:
break;
}
ignore:
return true;
drop:
kfree_skb(skb);
return false;
}
/* Jumbo payload */
static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
@@ -999,6 +1056,10 @@ static const struct tlvtype_proc tlvprochopopt_lst[] = {
.type = IPV6_TLV_ROUTERALERT,
.func = ipv6_hop_ra,
},
{
.type = IPV6_TLV_IOAM,
.func = ipv6_hop_ioam,
},
{
.type = IPV6_TLV_JUMBO,
.func = ipv6_hop_jumbo,
......
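The length and alignment checks that `ipv6_hop_ioam()` performs before touching trace data can be restated as a standalone userspace sketch. `ioam_prealloc_check()` and `enum ioam_verdict` are hypothetical names. `TRACE_HDR_LEN` is 8 on the assumption that the packed `struct ioam6_trace_hdr` occupies 2 + 1 + 1 + 4 bytes. The two "ignore" paths (IOAM disabled on the ingress interface, unknown namespace) depend on kernel state and are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size of the packed trace header: namespace_id (2),
 * nodelen/overflow (1), remlen (1), trace type (4). */
#define TRACE_HDR_LEN 8u

enum ioam_verdict { IOAM_DROP, IOAM_PROCESS };

/*
 * optoff:  offset of the option from the start of the IPv6 header
 * opt_len: option length as carried on the wire (it excludes the
 *          opt_type and opt_len bytes, hence the "2 +" terms below,
 *          which cover the reserved and type bytes)
 * remlen:  remaining trace length, in 4-octet words
 */
static enum ioam_verdict ioam_prealloc_check(int optoff, uint8_t opt_len,
					     uint8_t remlen)
{
	/* Bad alignment: the option must be 4n-aligned. */
	if (optoff & 3)
		return IOAM_DROP;

	/* Truncated option header. */
	if (opt_len < 2)
		return IOAM_DROP;

	/* Truncated Pre-allocated Trace header. */
	if (opt_len < 2 + TRACE_HDR_LEN)
		return IOAM_DROP;

	/* Malformed: the free space claimed by remlen must fit. */
	if (opt_len < 2 + TRACE_HDR_LEN + (unsigned int)remlen * 4)
		return IOAM_DROP;

	return IOAM_PROCESS;
}
```

For example, an option at offset 44 with `opt_len` 18 and `remlen` 2 passes (2 + 8 + 8 = 18), while the same option with `opt_len` 17 is dropped.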
// SPDX-License-Identifier: GPL-2.0+
/*
* IPv6 IOAM implementation
*
* Author:
* Justin Iurman <justin.iurman@uliege.be>
*/
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/net.h>
#include <linux/ioam6.h>
#include <linux/ioam6_genl.h>
#include <linux/rhashtable.h>
#include <net/addrconf.h>
#include <net/genetlink.h>
#include <net/ioam6.h>
static void ioam6_ns_release(struct ioam6_namespace *ns)
{
kfree_rcu(ns, rcu);
}
static void ioam6_sc_release(struct ioam6_schema *sc)
{
kfree_rcu(sc, rcu);
}
static void ioam6_free_ns(void *ptr, void *arg)
{
struct ioam6_namespace *ns = (struct ioam6_namespace *)ptr;
if (ns)
ioam6_ns_release(ns);
}
static void ioam6_free_sc(void *ptr, void *arg)
{
struct ioam6_schema *sc = (struct ioam6_schema *)ptr;
if (sc)
ioam6_sc_release(sc);
}
static int ioam6_ns_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
{
const struct ioam6_namespace *ns = obj;
return (ns->id != *(__be16 *)arg->key);
}
static int ioam6_sc_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
{
const struct ioam6_schema *sc = obj;
return (sc->id != *(u32 *)arg->key);
}
static const struct rhashtable_params rht_ns_params = {
.key_len = sizeof(__be16),
.key_offset = offsetof(struct ioam6_namespace, id),
.head_offset = offsetof(struct ioam6_namespace, head),
.automatic_shrinking = true,
.obj_cmpfn = ioam6_ns_cmpfn,
};
static const struct rhashtable_params rht_sc_params = {
.key_len = sizeof(u32),
.key_offset = offsetof(struct ioam6_schema, id),
.head_offset = offsetof(struct ioam6_schema, head),
.automatic_shrinking = true,
.obj_cmpfn = ioam6_sc_cmpfn,
};
static struct genl_family ioam6_genl_family;
static const struct nla_policy ioam6_genl_policy_addns[] = {
[IOAM6_ATTR_NS_ID] = { .type = NLA_U16 },
[IOAM6_ATTR_NS_DATA] = { .type = NLA_U32 },
[IOAM6_ATTR_NS_DATA_WIDE] = { .type = NLA_U64 },
};
static const struct nla_policy ioam6_genl_policy_delns[] = {
[IOAM6_ATTR_NS_ID] = { .type = NLA_U16 },
};
static const struct nla_policy ioam6_genl_policy_addsc[] = {
[IOAM6_ATTR_SC_ID] = { .type = NLA_U32 },
[IOAM6_ATTR_SC_DATA] = { .type = NLA_BINARY,
.len = IOAM6_MAX_SCHEMA_DATA_LEN },
};
static const struct nla_policy ioam6_genl_policy_delsc[] = {
[IOAM6_ATTR_SC_ID] = { .type = NLA_U32 },
};
static const struct nla_policy ioam6_genl_policy_ns_sc[] = {
[IOAM6_ATTR_NS_ID] = { .type = NLA_U16 },
[IOAM6_ATTR_SC_ID] = { .type = NLA_U32 },
[IOAM6_ATTR_SC_NONE] = { .type = NLA_FLAG },
};
static int ioam6_genl_addns(struct sk_buff *skb, struct genl_info *info)
{
struct ioam6_pernet_data *nsdata;
struct ioam6_namespace *ns;
u64 data64;
u32 data32;
__be16 id;
int err;
if (!info->attrs[IOAM6_ATTR_NS_ID])
return -EINVAL;
id = cpu_to_be16(nla_get_u16(info->attrs[IOAM6_ATTR_NS_ID]));
nsdata = ioam6_pernet(genl_info_net(info));
mutex_lock(&nsdata->lock);
ns = rhashtable_lookup_fast(&nsdata->namespaces, &id, rht_ns_params);
if (ns) {
err = -EEXIST;
goto out_unlock;
}
ns = kzalloc(sizeof(*ns), GFP_KERNEL);
if (!ns) {
err = -ENOMEM;
goto out_unlock;
}
ns->id = id;
if (!info->attrs[IOAM6_ATTR_NS_DATA])
data32 = IOAM6_U32_UNAVAILABLE;
else
data32 = nla_get_u32(info->attrs[IOAM6_ATTR_NS_DATA]);
if (!info->attrs[IOAM6_ATTR_NS_DATA_WIDE])
data64 = IOAM6_U64_UNAVAILABLE;
else
data64 = nla_get_u64(info->attrs[IOAM6_ATTR_NS_DATA_WIDE]);
ns->data = cpu_to_be32(data32);
ns->data_wide = cpu_to_be64(data64);
err = rhashtable_lookup_insert_fast(&nsdata->namespaces, &ns->head,
rht_ns_params);
if (err)
kfree(ns);
out_unlock:
mutex_unlock(&nsdata->lock);
return err;
}
static int ioam6_genl_delns(struct sk_buff *skb, struct genl_info *info)
{
struct ioam6_pernet_data *nsdata;
struct ioam6_namespace *ns;
struct ioam6_schema *sc;
__be16 id;
int err;
if (!info->attrs[IOAM6_ATTR_NS_ID])
return -EINVAL;
id = cpu_to_be16(nla_get_u16(info->attrs[IOAM6_ATTR_NS_ID]));
nsdata = ioam6_pernet(genl_info_net(info));
mutex_lock(&nsdata->lock);
ns = rhashtable_lookup_fast(&nsdata->namespaces, &id, rht_ns_params);
if (!ns) {
err = -ENOENT;
goto out_unlock;
}
sc = rcu_dereference_protected(ns->schema,
lockdep_is_held(&nsdata->lock));
err = rhashtable_remove_fast(&nsdata->namespaces, &ns->head,
rht_ns_params);
if (err)
goto out_unlock;
if (sc)
rcu_assign_pointer(sc->ns, NULL);
ioam6_ns_release(ns);
out_unlock:
mutex_unlock(&nsdata->lock);
return err;
}
static int __ioam6_genl_dumpns_element(struct ioam6_namespace *ns,
u32 portid,
u32 seq,
u32 flags,
struct sk_buff *skb,
u8 cmd)
{
struct ioam6_schema *sc;
u64 data64;
u32 data32;
void *hdr;
hdr = genlmsg_put(skb, portid, seq, &ioam6_genl_family, flags, cmd);
if (!hdr)
return -ENOMEM;
data32 = be32_to_cpu(ns->data);
data64 = be64_to_cpu(ns->data_wide);
if (nla_put_u16(skb, IOAM6_ATTR_NS_ID, be16_to_cpu(ns->id)) ||
(data32 != IOAM6_U32_UNAVAILABLE &&
nla_put_u32(skb, IOAM6_ATTR_NS_DATA, data32)) ||
(data64 != IOAM6_U64_UNAVAILABLE &&
nla_put_u64_64bit(skb, IOAM6_ATTR_NS_DATA_WIDE,
data64, IOAM6_ATTR_PAD)))
goto nla_put_failure;
rcu_read_lock();
sc = rcu_dereference(ns->schema);
if (sc && nla_put_u32(skb, IOAM6_ATTR_SC_ID, sc->id)) {
rcu_read_unlock();
goto nla_put_failure;
}
rcu_read_unlock();
genlmsg_end(skb, hdr);
return 0;
nla_put_failure:
genlmsg_cancel(skb, hdr);
return -EMSGSIZE;
}
static int ioam6_genl_dumpns_start(struct netlink_callback *cb)
{
struct ioam6_pernet_data *nsdata = ioam6_pernet(sock_net(cb->skb->sk));
struct rhashtable_iter *iter = (struct rhashtable_iter *)cb->args[0];
if (!iter) {
iter = kmalloc(sizeof(*iter), GFP_KERNEL);
if (!iter)
return -ENOMEM;
cb->args[0] = (long)iter;
}
rhashtable_walk_enter(&nsdata->namespaces, iter);
return 0;
}
static int ioam6_genl_dumpns_done(struct netlink_callback *cb)
{
struct rhashtable_iter *iter = (struct rhashtable_iter *)cb->args[0];
rhashtable_walk_exit(iter);
kfree(iter);
return 0;
}
static int ioam6_genl_dumpns(struct sk_buff *skb, struct netlink_callback *cb)
{
struct rhashtable_iter *iter;
struct ioam6_namespace *ns;
int err;
iter = (struct rhashtable_iter *)cb->args[0];
rhashtable_walk_start(iter);
for (;;) {
ns = rhashtable_walk_next(iter);
if (IS_ERR(ns)) {
if (PTR_ERR(ns) == -EAGAIN)
continue;
err = PTR_ERR(ns);
goto done;
} else if (!ns) {
break;
}
err = __ioam6_genl_dumpns_element(ns,
NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
NLM_F_MULTI,
skb,
IOAM6_CMD_DUMP_NAMESPACES);
if (err)
goto done;
}
err = skb->len;
done:
rhashtable_walk_stop(iter);
return err;
}
static int ioam6_genl_addsc(struct sk_buff *skb, struct genl_info *info)
{
struct ioam6_pernet_data *nsdata;
int len, len_aligned, err;
struct ioam6_schema *sc;
u32 id;
if (!info->attrs[IOAM6_ATTR_SC_ID] || !info->attrs[IOAM6_ATTR_SC_DATA])
return -EINVAL;
id = nla_get_u32(info->attrs[IOAM6_ATTR_SC_ID]);
nsdata = ioam6_pernet(genl_info_net(info));
mutex_lock(&nsdata->lock);
sc = rhashtable_lookup_fast(&nsdata->schemas, &id, rht_sc_params);
if (sc) {
err = -EEXIST;
goto out_unlock;
}
len = nla_len(info->attrs[IOAM6_ATTR_SC_DATA]);
len_aligned = ALIGN(len, 4);
sc = kzalloc(sizeof(*sc) + len_aligned, GFP_KERNEL);
if (!sc) {
err = -ENOMEM;
goto out_unlock;
}
sc->id = id;
sc->len = len_aligned;
sc->hdr = cpu_to_be32(sc->id | ((u8)(sc->len / 4) << 24));
nla_memcpy(sc->data, info->attrs[IOAM6_ATTR_SC_DATA], len);
err = rhashtable_lookup_insert_fast(&nsdata->schemas, &sc->head,
rht_sc_params);
if (err)
goto free_sc;
out_unlock:
mutex_unlock(&nsdata->lock);
return err;
free_sc:
kfree(sc);
goto out_unlock;
}
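The length handling in `ioam6_genl_addsc()` (round the schema data up to a multiple of 4, then pack the length in 4-octet words into the top byte of `sc->hdr` next to the schema id) can be checked in isolation. The following userspace sketch uses hypothetical helper names (`align4`, `schema_hdr`, `put_be32`) and works on host-order values, with an explicit big-endian serialization standing in for `cpu_to_be32()`. Like the code above, it does not mask the id, so an id above 0xFFFFFF would overlap the length byte.

```c
#include <assert.h>
#include <stdint.h>

/* Round len up to the next multiple of 4, like ALIGN(len, 4). */
static uint32_t align4(uint32_t len)
{
	return (len + 3u) & ~3u;
}

/* Host-order schema header: top byte is the length in 4-octet words,
 * the low 24 bits carry the schema id. */
static uint32_t schema_hdr(uint32_t id, uint32_t len_aligned)
{
	assert(len_aligned % 4u == 0);
	return id | ((uint32_t)(uint8_t)(len_aligned / 4u) << 24);
}

/* Serialize v in network byte order (the in-memory layout of a __be32). */
static void put_be32(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v >> 24);
	out[1] = (uint8_t)(v >> 16);
	out[2] = (uint8_t)(v >> 8);
	out[3] = (uint8_t)v;
}
```

Since `IOAM6_MAX_SCHEMA_DATA_LEN` is 255 * 4, the word count always fits in the single length byte.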
static int ioam6_genl_delsc(struct sk_buff *skb, struct genl_info *info)
{
struct ioam6_pernet_data *nsdata;
struct ioam6_namespace *ns;
struct ioam6_schema *sc;
int err;
u32 id;
if (!info->attrs[IOAM6_ATTR_SC_ID])
return -EINVAL;
id = nla_get_u32(info->attrs[IOAM6_ATTR_SC_ID]);
nsdata = ioam6_pernet(genl_info_net(info));
mutex_lock(&nsdata->lock);
sc = rhashtable_lookup_fast(&nsdata->schemas, &id, rht_sc_params);
if (!sc) {
err = -ENOENT;
goto out_unlock;
}
ns = rcu_dereference_protected(sc->ns, lockdep_is_held(&nsdata->lock));
err = rhashtable_remove_fast(&nsdata->schemas, &sc->head,
rht_sc_params);
if (err)
goto out_unlock;
if (ns)
rcu_assign_pointer(ns->schema, NULL);
ioam6_sc_release(sc);
out_unlock:
mutex_unlock(&nsdata->lock);
return err;
}
static int __ioam6_genl_dumpsc_element(struct ioam6_schema *sc,
u32 portid, u32 seq, u32 flags,
struct sk_buff *skb, u8 cmd)
{
struct ioam6_namespace *ns;
void *hdr;
hdr = genlmsg_put(skb, portid, seq, &ioam6_genl_family, flags, cmd);
if (!hdr)
return -ENOMEM;
if (nla_put_u32(skb, IOAM6_ATTR_SC_ID, sc->id) ||
nla_put(skb, IOAM6_ATTR_SC_DATA, sc->len, sc->data))
goto nla_put_failure;
rcu_read_lock();
ns = rcu_dereference(sc->ns);
if (ns && nla_put_u16(skb, IOAM6_ATTR_NS_ID, be16_to_cpu(ns->id))) {
rcu_read_unlock();
goto nla_put_failure;
}
rcu_read_unlock();
genlmsg_end(skb, hdr);
return 0;
nla_put_failure:
genlmsg_cancel(skb, hdr);
return -EMSGSIZE;
}
static int ioam6_genl_dumpsc_start(struct netlink_callback *cb)
{
struct ioam6_pernet_data *nsdata = ioam6_pernet(sock_net(cb->skb->sk));
struct rhashtable_iter *iter = (struct rhashtable_iter *)cb->args[0];
if (!iter) {
iter = kmalloc(sizeof(*iter), GFP_KERNEL);
if (!iter)
return -ENOMEM;
cb->args[0] = (long)iter;
}
rhashtable_walk_enter(&nsdata->schemas, iter);
return 0;
}
static int ioam6_genl_dumpsc_done(struct netlink_callback *cb)
{
struct rhashtable_iter *iter = (struct rhashtable_iter *)cb->args[0];
rhashtable_walk_exit(iter);
kfree(iter);
return 0;
}
static int ioam6_genl_dumpsc(struct sk_buff *skb, struct netlink_callback *cb)
{
struct rhashtable_iter *iter;
struct ioam6_schema *sc;
int err;
iter = (struct rhashtable_iter *)cb->args[0];
rhashtable_walk_start(iter);
for (;;) {
sc = rhashtable_walk_next(iter);
if (IS_ERR(sc)) {
if (PTR_ERR(sc) == -EAGAIN)
continue;
err = PTR_ERR(sc);
goto done;
} else if (!sc) {
break;
}
err = __ioam6_genl_dumpsc_element(sc,
NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
NLM_F_MULTI,
skb,
IOAM6_CMD_DUMP_SCHEMAS);
if (err)
goto done;
}
err = skb->len;
done:
rhashtable_walk_stop(iter);
return err;
}
static int ioam6_genl_ns_set_schema(struct sk_buff *skb, struct genl_info *info)
{
struct ioam6_namespace *ns, *ns_ref;
struct ioam6_schema *sc, *sc_ref;
struct ioam6_pernet_data *nsdata;
__be16 ns_id;
u32 sc_id;
int err;
if (!info->attrs[IOAM6_ATTR_NS_ID] ||
(!info->attrs[IOAM6_ATTR_SC_ID] &&
!info->attrs[IOAM6_ATTR_SC_NONE]))
return -EINVAL;
ns_id = cpu_to_be16(nla_get_u16(info->attrs[IOAM6_ATTR_NS_ID]));
nsdata = ioam6_pernet(genl_info_net(info));
mutex_lock(&nsdata->lock);
ns = rhashtable_lookup_fast(&nsdata->namespaces, &ns_id, rht_ns_params);
if (!ns) {
err = -ENOENT;
goto out_unlock;
}
if (info->attrs[IOAM6_ATTR_SC_NONE]) {
sc = NULL;
} else {
sc_id = nla_get_u32(info->attrs[IOAM6_ATTR_SC_ID]);
sc = rhashtable_lookup_fast(&nsdata->schemas, &sc_id,
rht_sc_params);
if (!sc) {
err = -ENOENT;
goto out_unlock;
}
}
sc_ref = rcu_dereference_protected(ns->schema,
lockdep_is_held(&nsdata->lock));
if (sc_ref)
rcu_assign_pointer(sc_ref->ns, NULL);
rcu_assign_pointer(ns->schema, sc);
if (sc) {
ns_ref = rcu_dereference_protected(sc->ns,
lockdep_is_held(&nsdata->lock));
if (ns_ref)
rcu_assign_pointer(ns_ref->schema, NULL);
rcu_assign_pointer(sc->ns, ns);
}
err = 0;
out_unlock:
mutex_unlock(&nsdata->lock);
return err;
}
static const struct genl_ops ioam6_genl_ops[] = {
{
.cmd = IOAM6_CMD_ADD_NAMESPACE,
.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
.doit = ioam6_genl_addns,
.flags = GENL_ADMIN_PERM,
.policy = ioam6_genl_policy_addns,
.maxattr = ARRAY_SIZE(ioam6_genl_policy_addns) - 1,
},
{
.cmd = IOAM6_CMD_DEL_NAMESPACE,
.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
.doit = ioam6_genl_delns,
.flags = GENL_ADMIN_PERM,
.policy = ioam6_genl_policy_delns,
.maxattr = ARRAY_SIZE(ioam6_genl_policy_delns) - 1,
},
{
.cmd = IOAM6_CMD_DUMP_NAMESPACES,
.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
.start = ioam6_genl_dumpns_start,
.dumpit = ioam6_genl_dumpns,
.done = ioam6_genl_dumpns_done,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = IOAM6_CMD_ADD_SCHEMA,
.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
.doit = ioam6_genl_addsc,
.flags = GENL_ADMIN_PERM,
.policy = ioam6_genl_policy_addsc,
.maxattr = ARRAY_SIZE(ioam6_genl_policy_addsc) - 1,
},
{
.cmd = IOAM6_CMD_DEL_SCHEMA,
.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
.doit = ioam6_genl_delsc,
.flags = GENL_ADMIN_PERM,
.policy = ioam6_genl_policy_delsc,
.maxattr = ARRAY_SIZE(ioam6_genl_policy_delsc) - 1,
},
{
.cmd = IOAM6_CMD_DUMP_SCHEMAS,
.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
.start = ioam6_genl_dumpsc_start,
.dumpit = ioam6_genl_dumpsc,
.done = ioam6_genl_dumpsc_done,
.flags = GENL_ADMIN_PERM,
},
{
.cmd = IOAM6_CMD_NS_SET_SCHEMA,
.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
.doit = ioam6_genl_ns_set_schema,
.flags = GENL_ADMIN_PERM,
.policy = ioam6_genl_policy_ns_sc,
.maxattr = ARRAY_SIZE(ioam6_genl_policy_ns_sc) - 1,
},
};
static struct genl_family ioam6_genl_family __ro_after_init = {
.name = IOAM6_GENL_NAME,
.version = IOAM6_GENL_VERSION,
.netnsok = true,
.parallel_ops = true,
.ops = ioam6_genl_ops,
.n_ops = ARRAY_SIZE(ioam6_genl_ops),
.module = THIS_MODULE,
};
struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id)
{
struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
return rhashtable_lookup_fast(&nsdata->namespaces, &id, rht_ns_params);
}
static void __ioam6_fill_trace_data(struct sk_buff *skb,
struct ioam6_namespace *ns,
struct ioam6_trace_hdr *trace,
struct ioam6_schema *sc,
u8 sclen)
{
struct __kernel_sock_timeval ts;
u64 raw64;
u32 raw32;
u16 raw16;
u8 *data;
u8 byte;
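/* Node data is written from the end of the pre-allocated area towards its
 * start: this node's slot begins (nodelen + sclen) 4-octet units before
 * the current remlen offset
 */
data = trace->data + trace->remlen * 4 - trace->nodelen * 4 - sclen * 4;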
/* hop_lim and node_id */
if (trace->type.bit0) {
byte = ipv6_hdr(skb)->hop_limit;
if (skb->dev)
byte--;
raw32 = dev_net(skb_dst(skb)->dev)->ipv6.sysctl.ioam6_id;
*(__be32 *)data = cpu_to_be32((byte << 24) | raw32);
data += sizeof(__be32);
}
/* ingress_if_id and egress_if_id */
if (trace->type.bit1) {
if (!skb->dev)
raw16 = IOAM6_U16_UNAVAILABLE;
else
raw16 = (__force u16)__in6_dev_get(skb->dev)->cnf.ioam6_id;
*(__be16 *)data = cpu_to_be16(raw16);
data += sizeof(__be16);
if (skb_dst(skb)->dev->flags & IFF_LOOPBACK)
raw16 = IOAM6_U16_UNAVAILABLE;
else
raw16 = (__force u16)__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
*(__be16 *)data = cpu_to_be16(raw16);
data += sizeof(__be16);
}
/* timestamp seconds */
if (trace->type.bit2) {
if (!skb->dev) {
*(__be32 *)data = cpu_to_be32(IOAM6_U32_UNAVAILABLE);
} else {
if (!skb->tstamp)
__net_timestamp(skb);
skb_get_new_timestamp(skb, &ts);
*(__be32 *)data = cpu_to_be32((u32)ts.tv_sec);
}
data += sizeof(__be32);
}
/* timestamp subseconds */
if (trace->type.bit3) {
if (!skb->dev) {
*(__be32 *)data = cpu_to_be32(IOAM6_U32_UNAVAILABLE);
} else {
if (!skb->tstamp)
__net_timestamp(skb);
if (!trace->type.bit2)
skb_get_new_timestamp(skb, &ts);
*(__be32 *)data = cpu_to_be32((u32)ts.tv_usec);
}
data += sizeof(__be32);
}
/* transit delay */
if (trace->type.bit4) {
*(__be32 *)data = cpu_to_be32(IOAM6_U32_UNAVAILABLE);
data += sizeof(__be32);
}
/* namespace data */
if (trace->type.bit5) {
*(__be32 *)data = ns->data;
data += sizeof(__be32);
}
/* queue depth */
if (trace->type.bit6) {
*(__be32 *)data = cpu_to_be32(IOAM6_U32_UNAVAILABLE);
data += sizeof(__be32);
}
/* checksum complement */
if (trace->type.bit7) {
*(__be32 *)data = cpu_to_be32(IOAM6_U32_UNAVAILABLE);
data += sizeof(__be32);
}
/* hop_lim and node_id (wide) */
if (trace->type.bit8) {
byte = ipv6_hdr(skb)->hop_limit;
if (skb->dev)
byte--;
raw64 = dev_net(skb_dst(skb)->dev)->ipv6.sysctl.ioam6_id_wide;
*(__be64 *)data = cpu_to_be64(((u64)byte << 56) | raw64);
data += sizeof(__be64);
}
/* ingress_if_id and egress_if_id (wide) */
if (trace->type.bit9) {
if (!skb->dev)
raw32 = IOAM6_U32_UNAVAILABLE;
else
raw32 = __in6_dev_get(skb->dev)->cnf.ioam6_id_wide;
*(__be32 *)data = cpu_to_be32(raw32);
data += sizeof(__be32);
if (skb_dst(skb)->dev->flags & IFF_LOOPBACK)
raw32 = IOAM6_U32_UNAVAILABLE;
else
raw32 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id_wide;
*(__be32 *)data = cpu_to_be32(raw32);
data += sizeof(__be32);
}
/* namespace data (wide) */
if (trace->type.bit10) {
*(__be64 *)data = ns->data_wide;
data += sizeof(__be64);
}
/* buffer occupancy */
if (trace->type.bit11) {
*(__be32 *)data = cpu_to_be32(IOAM6_U32_UNAVAILABLE);
data += sizeof(__be32);
}
/* opaque state snapshot */
if (trace->type.bit22) {
if (!sc) {
*(__be32 *)data = cpu_to_be32(IOAM6_U32_UNAVAILABLE >> 8);
} else {
*(__be32 *)data = sc->hdr;
data += sizeof(__be32);
memcpy(data, sc->data, sc->len);
}
}
}
/* called with rcu_read_lock() */
void ioam6_fill_trace_data(struct sk_buff *skb,
			   struct ioam6_namespace *ns,
			   struct ioam6_trace_hdr *trace)
{
	struct ioam6_schema *sc;
	u8 sclen = 0;

	/* Skip if Overflow flag is set OR
	 * if an unknown type (bit 12-21) is set
	 */
	if (trace->overflow ||
	    trace->type.bit12 | trace->type.bit13 | trace->type.bit14 |
	    trace->type.bit15 | trace->type.bit16 | trace->type.bit17 |
	    trace->type.bit18 | trace->type.bit19 | trace->type.bit20 |
	    trace->type.bit21) {
		return;
	}

	/* NodeLen does not include Opaque State Snapshot length. We need to
	 * take it into account if the corresponding bit is set (bit 22) and
	 * if the current IOAM namespace has an active schema attached to it
	 */
	sc = rcu_dereference(ns->schema);
	if (trace->type.bit22) {
		sclen = sizeof_field(struct ioam6_schema, hdr) / 4;

		if (sc)
			sclen += sc->len / 4;
	}

	/* If there is no space remaining, we set the Overflow flag and we
	 * skip without filling the trace
	 */
	if (!trace->remlen || trace->remlen < trace->nodelen + sclen) {
		trace->overflow = 1;
		return;
	}

	__ioam6_fill_trace_data(skb, ns, trace, sc, sclen);
	trace->remlen -= trace->nodelen + sclen;
}
static int __net_init ioam6_net_init(struct net *net)
{
struct ioam6_pernet_data *nsdata;
int err = -ENOMEM;
nsdata = kzalloc(sizeof(*nsdata), GFP_KERNEL);
if (!nsdata)
goto out;
mutex_init(&nsdata->lock);
net->ipv6.ioam6_data = nsdata;
err = rhashtable_init(&nsdata->namespaces, &rht_ns_params);
if (err)
goto free_nsdata;
err = rhashtable_init(&nsdata->schemas, &rht_sc_params);
if (err)
goto free_rht_ns;
out:
return err;
free_rht_ns:
rhashtable_destroy(&nsdata->namespaces);
free_nsdata:
kfree(nsdata);
net->ipv6.ioam6_data = NULL;
goto out;
}
static void __net_exit ioam6_net_exit(struct net *net)
{
struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
rhashtable_free_and_destroy(&nsdata->namespaces, ioam6_free_ns, NULL);
rhashtable_free_and_destroy(&nsdata->schemas, ioam6_free_sc, NULL);
kfree(nsdata);
}
static struct pernet_operations ioam6_net_ops = {
.init = ioam6_net_init,
.exit = ioam6_net_exit,
};
int __init ioam6_init(void)
{
int err = register_pernet_subsys(&ioam6_net_ops);
if (err)
goto out;
err = genl_register_family(&ioam6_genl_family);
if (err)
goto out_unregister_pernet_subsys;
#ifdef CONFIG_IPV6_IOAM6_LWTUNNEL
err = ioam6_iptunnel_init();
if (err)
goto out_unregister_genl;
#endif
pr_info("In-situ OAM (IOAM) with IPv6\n");
out:
return err;
#ifdef CONFIG_IPV6_IOAM6_LWTUNNEL
out_unregister_genl:
genl_unregister_family(&ioam6_genl_family);
#endif
out_unregister_pernet_subsys:
unregister_pernet_subsys(&ioam6_net_ops);
goto out;
}
void ioam6_exit(void)
{
#ifdef CONFIG_IPV6_IOAM6_LWTUNNEL
ioam6_iptunnel_exit();
#endif
genl_unregister_family(&ioam6_genl_family);
unregister_pernet_subsys(&ioam6_net_ops);
}
// SPDX-License-Identifier: GPL-2.0+
/*
* IPv6 IOAM Lightweight Tunnel implementation
*
* Author:
* Justin Iurman <justin.iurman@uliege.be>
*/
#include <linux/kernel.h>
#include <linux/skbuff.h>
#include <linux/net.h>
#include <linux/netlink.h>
#include <linux/in6.h>
#include <linux/ioam6.h>
#include <linux/ioam6_iptunnel.h>
#include <net/dst.h>
#include <net/sock.h>
#include <net/lwtunnel.h>
#include <net/ioam6.h>
#define IOAM6_MASK_SHORT_FIELDS 0xff100000
#define IOAM6_MASK_WIDE_FIELDS 0xe00000
struct ioam6_lwt_encap {
struct ipv6_hopopt_hdr eh;
u8 pad[2]; /* 2-octet padding for 4n-alignment */
struct ioam6_hdr ioamh;
struct ioam6_trace_hdr traceh;
} __packed;
struct ioam6_lwt {
struct ioam6_lwt_encap tuninfo;
};
static struct ioam6_lwt *ioam6_lwt_state(struct lwtunnel_state *lwt)
{
return (struct ioam6_lwt *)lwt->data;
}
static struct ioam6_lwt_encap *ioam6_lwt_info(struct lwtunnel_state *lwt)
{
return &ioam6_lwt_state(lwt)->tuninfo;
}
static struct ioam6_trace_hdr *ioam6_trace(struct lwtunnel_state *lwt)
{
return &(ioam6_lwt_state(lwt)->tuninfo.traceh);
}
static const struct nla_policy ioam6_iptunnel_policy[IOAM6_IPTUNNEL_MAX + 1] = {
[IOAM6_IPTUNNEL_TRACE] = NLA_POLICY_EXACT_LEN(sizeof(struct ioam6_trace_hdr)),
};
static int nla_put_ioam6_trace(struct sk_buff *skb, int attrtype,
struct ioam6_trace_hdr *trace)
{
struct ioam6_trace_hdr *data;
struct nlattr *nla;
int len;
len = sizeof(*trace);
nla = nla_reserve(skb, attrtype, len);
if (!nla)
return -EMSGSIZE;
data = nla_data(nla);
memcpy(data, trace, len);
return 0;
}
static bool ioam6_validate_trace_hdr(struct ioam6_trace_hdr *trace)
{
	u32 fields;

	if (!trace->type_be32 || !trace->remlen ||
	    trace->remlen > IOAM6_TRACE_DATA_SIZE_MAX / 4)
		return false;

	/* NodeLen is derived from the trace type, in 4-octet units: one unit
	 * per short field and two units per wide field
	 */
	trace->nodelen = 0;
	fields = be32_to_cpu(trace->type_be32);

	trace->nodelen += hweight32(fields & IOAM6_MASK_SHORT_FIELDS)
				* (sizeof(__be32) / 4);
	trace->nodelen += hweight32(fields & IOAM6_MASK_WIDE_FIELDS)
				* (sizeof(__be64) / 4);

	return true;
}
static int ioam6_build_state(struct net *net, struct nlattr *nla,
unsigned int family, const void *cfg,
struct lwtunnel_state **ts,
struct netlink_ext_ack *extack)
{
struct nlattr *tb[IOAM6_IPTUNNEL_MAX + 1];
struct ioam6_lwt_encap *tuninfo;
struct ioam6_trace_hdr *trace;
struct lwtunnel_state *s;
int len_aligned;
int len, err;
if (family != AF_INET6)
return -EINVAL;
err = nla_parse_nested(tb, IOAM6_IPTUNNEL_MAX, nla,
ioam6_iptunnel_policy, extack);
if (err < 0)
return err;
if (!tb[IOAM6_IPTUNNEL_TRACE]) {
NL_SET_ERR_MSG(extack, "missing trace");
return -EINVAL;
}
trace = nla_data(tb[IOAM6_IPTUNNEL_TRACE]);
if (!ioam6_validate_trace_hdr(trace)) {
NL_SET_ERR_MSG_ATTR(extack, tb[IOAM6_IPTUNNEL_TRACE],
"invalid trace validation");
return -EINVAL;
}
len = sizeof(*tuninfo) + trace->remlen * 4;
len_aligned = ALIGN(len, 8);
s = lwtunnel_state_alloc(len_aligned);
if (!s)
return -ENOMEM;
tuninfo = ioam6_lwt_info(s);
tuninfo->eh.hdrlen = (len_aligned >> 3) - 1;
tuninfo->pad[0] = IPV6_TLV_PADN;
tuninfo->ioamh.type = IOAM6_TYPE_PREALLOC;
tuninfo->ioamh.opt_type = IPV6_TLV_IOAM;
tuninfo->ioamh.opt_len = sizeof(tuninfo->ioamh) - 2 + sizeof(*trace)
+ trace->remlen * 4;
memcpy(&tuninfo->traceh, trace, sizeof(*trace));
len = len_aligned - len;
if (len == 1) {
tuninfo->traceh.data[trace->remlen * 4] = IPV6_TLV_PAD1;
} else if (len > 0) {
tuninfo->traceh.data[trace->remlen * 4] = IPV6_TLV_PADN;
tuninfo->traceh.data[trace->remlen * 4 + 1] = len - 2;
}
s->type = LWTUNNEL_ENCAP_IOAM6;
s->flags |= LWTUNNEL_STATE_OUTPUT_REDIRECT;
*ts = s;
return 0;
}
static int ioam6_do_inline(struct sk_buff *skb, struct ioam6_lwt_encap *tuninfo)
{
struct ioam6_trace_hdr *trace;
struct ipv6hdr *oldhdr, *hdr;
struct ioam6_namespace *ns;
int hdrlen, err;
hdrlen = (tuninfo->eh.hdrlen + 1) << 3;
err = skb_cow_head(skb, hdrlen + skb->mac_len);
if (unlikely(err))
return err;
oldhdr = ipv6_hdr(skb);
skb_pull(skb, sizeof(*oldhdr));
skb_postpull_rcsum(skb, skb_network_header(skb), sizeof(*oldhdr));
skb_push(skb, sizeof(*oldhdr) + hdrlen);
skb_reset_network_header(skb);
skb_mac_header_rebuild(skb);
hdr = ipv6_hdr(skb);
memmove(hdr, oldhdr, sizeof(*oldhdr));
tuninfo->eh.nexthdr = hdr->nexthdr;
skb_set_transport_header(skb, sizeof(*hdr));
skb_postpush_rcsum(skb, hdr, sizeof(*hdr) + hdrlen);
memcpy(skb_transport_header(skb), (u8 *)tuninfo, hdrlen);
hdr->nexthdr = NEXTHDR_HOP;
hdr->payload_len = cpu_to_be16(skb->len - sizeof(*hdr));
trace = (struct ioam6_trace_hdr *)(skb_transport_header(skb)
+ sizeof(struct ipv6_hopopt_hdr) + 2
+ sizeof(struct ioam6_hdr));
ns = ioam6_namespace(dev_net(skb_dst(skb)->dev), trace->namespace_id);
if (ns)
ioam6_fill_trace_data(skb, ns, trace);
return 0;
}
static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
{
	struct lwtunnel_state *lwt = skb_dst(skb)->lwtstate;
	int err = -EINVAL;

	if (skb->protocol != htons(ETH_P_IPV6))
		goto drop;

	/* Only handle packets we send ourselves (skb->dev is not set) and
	 * that do not already carry a Hop-by-Hop Options header
	 */
	if (skb->dev || ipv6_hdr(skb)->nexthdr == NEXTHDR_HOP)
		goto out;

	err = ioam6_do_inline(skb, ioam6_lwt_info(lwt));
	if (unlikely(err))
		goto drop;

	err = skb_cow_head(skb, LL_RESERVED_SPACE(skb_dst(skb)->dev));
	if (unlikely(err))
		goto drop;

out:
	return lwt->orig_output(net, sk, skb);
drop:
	kfree_skb(skb);
	return err;
}
static int ioam6_fill_encap_info(struct sk_buff *skb,
struct lwtunnel_state *lwtstate)
{
struct ioam6_trace_hdr *trace = ioam6_trace(lwtstate);
if (nla_put_ioam6_trace(skb, IOAM6_IPTUNNEL_TRACE, trace))
return -EMSGSIZE;
return 0;
}
static int ioam6_encap_nlsize(struct lwtunnel_state *lwtstate)
{
struct ioam6_trace_hdr *trace = ioam6_trace(lwtstate);
return nla_total_size(sizeof(*trace));
}
static int ioam6_encap_cmp(struct lwtunnel_state *a, struct lwtunnel_state *b)
{
struct ioam6_trace_hdr *a_hdr = ioam6_trace(a);
struct ioam6_trace_hdr *b_hdr = ioam6_trace(b);
return (a_hdr->namespace_id != b_hdr->namespace_id);
}
static const struct lwtunnel_encap_ops ioam6_iptun_ops = {
.build_state = ioam6_build_state,
.output = ioam6_output,
.fill_encap = ioam6_fill_encap_info,
.get_encap_size = ioam6_encap_nlsize,
.cmp_encap = ioam6_encap_cmp,
.owner = THIS_MODULE,
};
int __init ioam6_iptunnel_init(void)
{
return lwtunnel_encap_add_ops(&ioam6_iptun_ops, LWTUNNEL_ENCAP_IOAM6);
}
void ioam6_iptunnel_exit(void)
{
lwtunnel_encap_del_ops(&ioam6_iptun_ops, LWTUNNEL_ENCAP_IOAM6);
}
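The NodeLen computation in ioam6_validate_trace_hdr() above can be replayed outside the kernel. The following is a minimal sketch in POSIX shell, not part of the patch: the helper names popcount and nodelen are hypothetical, popcount stands in for the kernel's hweight32(), and only the two mask values are taken from ioam6_iptunnel.c.

```sh
#!/bin/sh
# Sketch: derive NodeLen (in 4-octet units) from a 32-bit IOAM trace type,
# using the two masks defined in ioam6_iptunnel.c.

MASK_SHORT=$(( 0xff100000 ))  # IOAM6_MASK_SHORT_FIELDS: one 4-octet unit each
MASK_WIDE=$(( 0x00e00000 ))   # IOAM6_MASK_WIDE_FIELDS: two 4-octet units each

# popcount VALUE: print the number of set bits in VALUE
# (hypothetical helper, stand-in for hweight32()).
popcount() {
    v=$(( $1 ))
    n=0
    while [ "$v" -ne 0 ]; do
        n=$(( n + (v & 1) ))
        v=$(( v >> 1 ))
    done
    echo "$n"
}

# nodelen TRACE_TYPE: print the per-node data length in 4-octet units
# (hypothetical helper mirroring the two hweight32() lines).
nodelen() {
    short=$(popcount $(( $1 & MASK_SHORT )))
    wide=$(popcount $(( $1 & MASK_WIDE )))
    echo $(( short + 2 * wide ))
}

nodelen 0xfff00200   # prints 15
```

For the selftest's trace type 0xfff00200 this gives 15: nine short fields (bits 0-7 and bit 11) plus three wide fields (bits 8-10) counted twice. Bit 22 is covered by neither mask, which is consistent with the comment in ioam6_fill_trace_data() stating that NodeLen does not include the Opaque State Snapshot length.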
@@ -21,6 +21,7 @@
 #ifdef CONFIG_NETLABEL
 #include <net/calipso.h>
 #endif
+#include <linux/ioam6.h>
 static int two = 2;
 static int three = 3;
@@ -28,6 +29,8 @@ static int flowlabel_reflect_max = 0x7;
 static int auto_flowlabels_max = IP6_AUTO_FLOW_LABEL_MAX;
 static u32 rt6_multipath_hash_fields_all_mask =
 	FIB_MULTIPATH_HASH_FIELD_ALL_MASK;
+static u32 ioam6_id_max = IOAM6_DEFAULT_ID;
+static u64 ioam6_id_wide_max = IOAM6_DEFAULT_ID_WIDE;
 static int proc_rt6_multipath_hash_policy(struct ctl_table *table, int write,
 					  void *buffer, size_t *lenp, loff_t *ppos)
@@ -196,6 +199,22 @@ static struct ctl_table ipv6_table_template[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= &two,
 	},
+	{
+		.procname	= "ioam6_id",
+		.data		= &init_net.ipv6.sysctl.ioam6_id,
+		.maxlen		= sizeof(u32),
+		.mode		= 0644,
+		.proc_handler	= proc_douintvec_minmax,
+		.extra2		= &ioam6_id_max,
+	},
+	{
+		.procname	= "ioam6_id_wide",
+		.data		= &init_net.ipv6.sysctl.ioam6_id_wide,
+		.maxlen		= sizeof(u64),
+		.mode		= 0644,
+		.proc_handler	= proc_doulongvec_minmax,
+		.extra2		= &ioam6_id_wide_max,
+	},
 	{ }
 };
...
@@ -25,6 +25,7 @@ TEST_PROGS += bareudp.sh
 TEST_PROGS += unicast_extensions.sh
 TEST_PROGS += udpgro_fwd.sh
 TEST_PROGS += veth.sh
+TEST_PROGS += ioam6.sh
 TEST_PROGS_EXTENDED := in_netns.sh
 TEST_GEN_FILES = socket nettest
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy reuseport_addr_any
@@ -36,6 +37,7 @@ TEST_GEN_FILES += fin_ack_lat
 TEST_GEN_FILES += reuseaddr_ports_exhausted
 TEST_GEN_FILES += hwtstamp_config rxtimestamp timestamping txtimestamp
 TEST_GEN_FILES += ipsec
+TEST_GEN_FILES += ioam6_parser
 TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
 TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls
...
@@ -42,3 +42,4 @@ CONFIG_NET_CLS_FLOWER=m
 CONFIG_NET_ACT_TUNNEL_KEY=m
 CONFIG_NET_ACT_MIRRED=m
 CONFIG_BAREUDP=m
+CONFIG_IPV6_IOAM6_LWTUNNEL=y
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0+
#
# Author: Justin Iurman <justin.iurman@uliege.be>
#
# This test evaluates IOAM insertion for IPv6 by checking the integrity of the
# IOAM data on the receiver.
#
# The topology consists of 3 nodes: Alpha (sender), Beta (router in between)
# and Gamma (receiver). An IOAM domain is configured from Alpha to Gamma only,
# not on the reverse path. When Gamma is the destination, Alpha adds an IOAM
# option (Pre-allocated Trace) inside a Hop-by-Hop Options header and fills the
# trace with its own IOAM data. Beta and Gamma then process the trace in turn.
# The IOAM data integrity is checked on Gamma by comparing the received trace
# with the pre-defined IOAM configuration (see below).
#
# +-------------------+            +-------------------+
# |                   |            |                   |
# |    alpha netns    |            |    gamma netns    |
# |                   |            |                   |
# |  +-------------+  |            |  +-------------+  |
# |  |    veth0    |  |            |  |    veth0    |  |
# |  |  db01::2/64 |  |            |  |  db02::2/64 |  |
# |  +-------------+  |            |  +-------------+  |
# |         .         |            |         .         |
# +-------------------+            +-------------------+
#           .                                .
#           .                                .
#           .                                .
# +----------------------------------------------------+
# |         .                                .         |
# |  +-------------+                  +-------------+  |
# |  |    veth0    |                  |    veth1    |  |
# |  |  db01::1/64 | ................ |  db02::1/64 |  |
# |  +-------------+                  +-------------+  |
# |                                                    |
# |                     beta netns                     |
# |                                                    |
# +--------------------------+-------------------------+
#
#
# ~~~~~~~~~~~~~~~~~~~~~~
# | IOAM configuration |
# ~~~~~~~~~~~~~~~~~~~~~~
#
# Alpha
# +-----------------------------------------------------------+
# | Type | Value |
# +-----------------------------------------------------------+
# | Node ID | 1 |
# +-----------------------------------------------------------+
# | Node Wide ID | 11111111 |
# +-----------------------------------------------------------+
# | Ingress ID | 0xffff (default value) |
# +-----------------------------------------------------------+
# | Ingress Wide ID | 0xffffffff (default value) |
# +-----------------------------------------------------------+
# | Egress ID | 101 |
# +-----------------------------------------------------------+
# | Egress Wide ID | 101101 |
# +-----------------------------------------------------------+
# | Namespace Data | 0xdeadbee0 |
# +-----------------------------------------------------------+
# | Namespace Wide Data | 0xcafec0caf00dc0de |
# +-----------------------------------------------------------+
# | Schema ID | 777 |
# +-----------------------------------------------------------+
# | Schema Data | something that will be 4n-aligned |
# +-----------------------------------------------------------+
#
# Note: When Gamma is the destination, Alpha adds an IOAM Pre-allocated Trace
#       option inside a Hop-by-Hop Options header, where 164 bytes are
#       pre-allocated for the trace, with 123 as the IOAM-Namespace and
#       0xfff00200 as the trace type (= all options available at this time).
#       As a result, and given the IOAM configurations here, only Alpha and
#       Beta should be able to insert their IOAM data; Gamma won't have
#       enough space left and will set the overflow bit.
#
# Beta
# +-----------------------------------------------------------+
# | Type | Value |
# +-----------------------------------------------------------+
# | Node ID | 2 |
# +-----------------------------------------------------------+
# | Node Wide ID | 22222222 |
# +-----------------------------------------------------------+
# | Ingress ID | 201 |
# +-----------------------------------------------------------+
# | Ingress Wide ID | 201201 |
# +-----------------------------------------------------------+
# | Egress ID | 202 |
# +-----------------------------------------------------------+
# | Egress Wide ID | 202202 |
# +-----------------------------------------------------------+
# | Namespace Data | 0xdeadbee1 |
# +-----------------------------------------------------------+
# | Namespace Wide Data | 0xcafec0caf11dc0de |
# +-----------------------------------------------------------+
# | Schema ID | 0xffffff (= None) |
# +-----------------------------------------------------------+
# | Schema Data | |
# +-----------------------------------------------------------+
#
# Gamma
# +-----------------------------------------------------------+
# | Type | Value |
# +-----------------------------------------------------------+
# | Node ID | 3 |
# +-----------------------------------------------------------+
# | Node Wide ID | 33333333 |
# +-----------------------------------------------------------+
# | Ingress ID | 301 |
# +-----------------------------------------------------------+
# | Ingress Wide ID | 301301 |
# +-----------------------------------------------------------+
# | Egress ID | 0xffff (default value) |
# +-----------------------------------------------------------+
# | Egress Wide ID | 0xffffffff (default value) |
# +-----------------------------------------------------------+
# | Namespace Data | 0xdeadbee2 |
# +-----------------------------------------------------------+
# | Namespace Wide Data | 0xcafec0caf22dc0de |
# +-----------------------------------------------------------+
# | Schema ID | 0xffffff (= None) |
# +-----------------------------------------------------------+
# | Schema Data | |
# +-----------------------------------------------------------+
#===============================================================================
#
# WARNING:
# Do NOT modify the following configuration unless you know what you're doing.
#
IOAM_NAMESPACE=123
IOAM_TRACE_TYPE=0xfff00200
IOAM_PREALLOC_DATA_SIZE=164
ALPHA=(
1 # ID
11111111 # Wide ID
0xffff # Ingress ID
0xffffffff # Ingress Wide ID
101 # Egress ID
101101 # Egress Wide ID
0xdeadbee0 # Namespace Data
0xcafec0caf00dc0de # Namespace Wide Data
777 # Schema ID (0xffffff = None)
"something that will be 4n-aligned" # Schema Data
)
BETA=(
2
22222222
201
201201
202
202202
0xdeadbee1
0xcafec0caf11dc0de
0xffffff
""
)
GAMMA=(
3
33333333
301
301301
0xffff
0xffffffff
0xdeadbee2
0xcafec0caf22dc0de
0xffffff
""
)
#===============================================================================
# Exit code 4 is the kselftest convention for a skipped test.
if [ "$(id -u)" -ne 0 ]; then
  echo "SKIP: Need root privileges"
  exit 4
fi

if [ ! -x "$(command -v ip)" ]; then
  echo "SKIP: Could not run test without ip tool"
  exit 4
fi

ip ioam &>/dev/null
if [ $? = 1 ]; then
  echo "SKIP: ip tool must include IOAM"
  exit 4
fi

if [ ! -e /proc/sys/net/ipv6/ioam6_id ]; then
  echo "SKIP: ioam6 sysctls do not exist"
  exit 4
fi
cleanup()
{
ip link del ioam-veth-alpha 2>/dev/null || true
ip link del ioam-veth-gamma 2>/dev/null || true
ip netns del ioam-node-alpha || true
ip netns del ioam-node-beta || true
ip netns del ioam-node-gamma || true
}
setup()
{
ip netns add ioam-node-alpha
ip netns add ioam-node-beta
ip netns add ioam-node-gamma
ip link add name ioam-veth-alpha type veth peer name ioam-veth-betaL
ip link add name ioam-veth-betaR type veth peer name ioam-veth-gamma
ip link set ioam-veth-alpha netns ioam-node-alpha
ip link set ioam-veth-betaL netns ioam-node-beta
ip link set ioam-veth-betaR netns ioam-node-beta
ip link set ioam-veth-gamma netns ioam-node-gamma
ip -netns ioam-node-alpha link set ioam-veth-alpha name veth0
ip -netns ioam-node-beta link set ioam-veth-betaL name veth0
ip -netns ioam-node-beta link set ioam-veth-betaR name veth1
ip -netns ioam-node-gamma link set ioam-veth-gamma name veth0
ip -netns ioam-node-alpha addr add db01::2/64 dev veth0
ip -netns ioam-node-alpha link set veth0 up
ip -netns ioam-node-alpha link set lo up
ip -netns ioam-node-alpha route add default via db01::1
ip -netns ioam-node-beta addr add db01::1/64 dev veth0
ip -netns ioam-node-beta addr add db02::1/64 dev veth1
ip -netns ioam-node-beta link set veth0 up
ip -netns ioam-node-beta link set veth1 up
ip -netns ioam-node-beta link set lo up
ip -netns ioam-node-gamma addr add db02::2/64 dev veth0
ip -netns ioam-node-gamma link set veth0 up
ip -netns ioam-node-gamma link set lo up
ip -netns ioam-node-gamma route add default via db02::1
# - IOAM config -
ip netns exec ioam-node-alpha sysctl -wq net.ipv6.ioam6_id=${ALPHA[0]}
ip netns exec ioam-node-alpha sysctl -wq net.ipv6.ioam6_id_wide=${ALPHA[1]}
ip netns exec ioam-node-alpha sysctl -wq net.ipv6.conf.veth0.ioam6_id=${ALPHA[4]}
ip netns exec ioam-node-alpha sysctl -wq net.ipv6.conf.veth0.ioam6_id_wide=${ALPHA[5]}
ip -netns ioam-node-alpha ioam namespace add ${IOAM_NAMESPACE} data ${ALPHA[6]} wide ${ALPHA[7]}
ip -netns ioam-node-alpha ioam schema add ${ALPHA[8]} "${ALPHA[9]}"
ip -netns ioam-node-alpha ioam namespace set ${IOAM_NAMESPACE} schema ${ALPHA[8]}
ip -netns ioam-node-alpha route add db02::/64 encap ioam6 trace type ${IOAM_TRACE_TYPE:0:-2} ns ${IOAM_NAMESPACE} size ${IOAM_PREALLOC_DATA_SIZE} via db01::1 dev veth0
ip netns exec ioam-node-beta sysctl -wq net.ipv6.conf.all.forwarding=1
ip netns exec ioam-node-beta sysctl -wq net.ipv6.ioam6_id=${BETA[0]}
ip netns exec ioam-node-beta sysctl -wq net.ipv6.ioam6_id_wide=${BETA[1]}
ip netns exec ioam-node-beta sysctl -wq net.ipv6.conf.veth0.ioam6_enabled=1
ip netns exec ioam-node-beta sysctl -wq net.ipv6.conf.veth0.ioam6_id=${BETA[2]}
ip netns exec ioam-node-beta sysctl -wq net.ipv6.conf.veth0.ioam6_id_wide=${BETA[3]}
ip netns exec ioam-node-beta sysctl -wq net.ipv6.conf.veth1.ioam6_id=${BETA[4]}
ip netns exec ioam-node-beta sysctl -wq net.ipv6.conf.veth1.ioam6_id_wide=${BETA[5]}
ip -netns ioam-node-beta ioam namespace add ${IOAM_NAMESPACE} data ${BETA[6]} wide ${BETA[7]}
ip netns exec ioam-node-gamma sysctl -wq net.ipv6.ioam6_id=${GAMMA[0]}
ip netns exec ioam-node-gamma sysctl -wq net.ipv6.ioam6_id_wide=${GAMMA[1]}
ip netns exec ioam-node-gamma sysctl -wq net.ipv6.conf.veth0.ioam6_enabled=1
ip netns exec ioam-node-gamma sysctl -wq net.ipv6.conf.veth0.ioam6_id=${GAMMA[2]}
ip netns exec ioam-node-gamma sysctl -wq net.ipv6.conf.veth0.ioam6_id_wide=${GAMMA[3]}
ip -netns ioam-node-gamma ioam namespace add ${IOAM_NAMESPACE} data ${GAMMA[6]} wide ${GAMMA[7]}
}
run()
{
  echo -n "IOAM test... "

  # Sanity check: Gamma must be reachable before the trace is inspected.
  ip netns exec ioam-node-alpha ping6 -c 5 -W 1 db02::2 &>/dev/null
  if [ $? != 0 ]; then
    echo "FAILED"
    cleanup &>/dev/null
    exit 1
  fi

  # Start the parser on Gamma with the values expected from Alpha and Beta.
  ip netns exec ioam-node-gamma ./ioam6_parser veth0 2 ${IOAM_NAMESPACE} ${IOAM_TRACE_TYPE} 64 ${ALPHA[0]} ${ALPHA[1]} ${ALPHA[2]} ${ALPHA[3]} ${ALPHA[4]} ${ALPHA[5]} ${ALPHA[6]} ${ALPHA[7]} ${ALPHA[8]} "${ALPHA[9]}" 63 ${BETA[0]} ${BETA[1]} ${BETA[2]} ${BETA[3]} ${BETA[4]} ${BETA[5]} ${BETA[6]} ${BETA[7]} ${BETA[8]} &

  local spid=$!
  sleep 0.1

  ip netns exec ioam-node-alpha ping6 -c 5 -W 1 db02::2 &>/dev/null

  wait $spid
  [ $? = 0 ] && echo "PASSED" || echo "FAILED"
}
cleanup &>/dev/null
setup
run
cleanup &>/dev/null
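The note in the script header (Alpha and Beta fit, Gamma overflows) can be checked by hand. The sketch below, in POSIX shell, replays the remaining-length accounting of ioam6_fill_trace_data() for the three nodes. It is an illustration and not part of the patch: the helper names sclen and visit are hypothetical, and it assumes that the schema length is stored padded to a multiple of 4 bytes, as the parser's __ALIGN_KERNEL(strlen(sc_data), 4) check suggests.

```sh
#!/bin/sh
# Sketch: replay the space check of ioam6_fill_trace_data() for the three
# selftest nodes. All lengths are in 4-octet units.

NODELEN=15             # trace type 0xfff00200: 9 short fields + 3 wide fields (x2)
REMLEN=$(( 164 / 4 ))  # IOAM_PREALLOC_DATA_SIZE=164 bytes -> 41 units

# sclen SCHEMA_DATA: print the Opaque State Snapshot length for bit 22:
# one header unit plus the payload padded to 4 bytes ("" means no schema).
# Assumption: the padding rule matches the parser's __ALIGN_KERNEL() check.
sclen() {
    echo $(( 1 + (${#1} + 3) / 4 ))
}

# visit NAME SCHEMA_DATA: apply the overflow test, then consume the space.
visit() {
    need=$(( NODELEN + $(sclen "$2") ))
    if [ "$REMLEN" -eq 0 ] || [ "$REMLEN" -lt "$need" ]; then
        echo "$1: overflow (needs $need, $REMLEN left)"
    else
        REMLEN=$(( REMLEN - need ))
        echo "$1: wrote $need, $REMLEN left"
    fi
}

visit alpha "something that will be 4n-aligned"
visit beta ""
visit gamma ""
```

Under these assumptions Alpha consumes 25 units (15 plus a 10-unit schema snapshot), Beta consumes the remaining 16, and Gamma finds no space left, which matches the two-node expectation passed to ioam6_parser in run().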
// SPDX-License-Identifier: GPL-2.0+
/*
* Author: Justin Iurman (justin.iurman@uliege.be)
*
* IOAM parser for IPv6, see ioam6.sh for details.
*/
#include <asm/byteorder.h>
#include <linux/const.h>
#include <linux/if_ether.h>
#include <linux/ioam6.h>
#include <linux/ipv6.h>
#include <sys/socket.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
struct node_args {
__u32 id;
__u64 wide;
__u16 ingr_id;
__u16 egr_id;
__u32 ingr_wide;
__u32 egr_wide;
__u32 ns_data;
__u64 ns_wide;
__u32 sc_id;
__u8 hop_limit;
__u8 *sc_data; /* NULL when sc_id = 0xffffff (default empty value) */
};
/* expected args per node, in that order */
enum {
NODE_ARG_HOP_LIMIT,
NODE_ARG_ID,
NODE_ARG_WIDE,
NODE_ARG_INGR_ID,
NODE_ARG_INGR_WIDE,
NODE_ARG_EGR_ID,
NODE_ARG_EGR_WIDE,
NODE_ARG_NS_DATA,
NODE_ARG_NS_WIDE,
NODE_ARG_SC_ID,
__NODE_ARG_MAX,
};
#define NODE_ARGS_SIZE __NODE_ARG_MAX
struct args {
__u16 ns_id;
__u32 trace_type;
__u8 n_node;
__u8 *ifname;
struct node_args node[0];
};
/* expected args, in that order */
enum {
ARG_IFNAME,
ARG_N_NODE,
ARG_NS_ID,
ARG_TRACE_TYPE,
__ARG_MAX,
};
#define ARGS_SIZE __ARG_MAX
int check_ioam6_node_data(__u8 **p, struct ioam6_trace_hdr *trace, __u8 hlim,
__u32 id, __u64 wide, __u16 ingr_id, __u32 ingr_wide,
__u16 egr_id, __u32 egr_wide, __u32 ns_data,
__u64 ns_wide, __u32 sc_id, __u8 *sc_data)
{
__u64 raw64;
__u32 raw32;
__u8 sc_len;
if (trace->type.bit0) {
raw32 = __be32_to_cpu(*((__u32 *)*p));
if (hlim != (raw32 >> 24) || id != (raw32 & 0xffffff))
return 1;
*p += sizeof(__u32);
}
if (trace->type.bit1) {
raw32 = __be32_to_cpu(*((__u32 *)*p));
if (ingr_id != (raw32 >> 16) || egr_id != (raw32 & 0xffff))
return 1;
*p += sizeof(__u32);
}
if (trace->type.bit2)
*p += sizeof(__u32);
if (trace->type.bit3)
*p += sizeof(__u32);
if (trace->type.bit4) {
if (__be32_to_cpu(*((__u32 *)*p)) != 0xffffffff)
return 1;
*p += sizeof(__u32);
}
if (trace->type.bit5) {
if (__be32_to_cpu(*((__u32 *)*p)) != ns_data)
return 1;
*p += sizeof(__u32);
}
if (trace->type.bit6) {
if (__be32_to_cpu(*((__u32 *)*p)) != 0xffffffff)
return 1;
*p += sizeof(__u32);
}
if (trace->type.bit7) {
if (__be32_to_cpu(*((__u32 *)*p)) != 0xffffffff)
return 1;
*p += sizeof(__u32);
}
if (trace->type.bit8) {
raw64 = __be64_to_cpu(*((__u64 *)*p));
if (hlim != (raw64 >> 56) || wide != (raw64 & 0xffffffffffffff))
return 1;
*p += sizeof(__u64);
}
if (trace->type.bit9) {
if (__be32_to_cpu(*((__u32 *)*p)) != ingr_wide)
return 1;
*p += sizeof(__u32);
if (__be32_to_cpu(*((__u32 *)*p)) != egr_wide)
return 1;
*p += sizeof(__u32);
}
if (trace->type.bit10) {
if (__be64_to_cpu(*((__u64 *)*p)) != ns_wide)
return 1;
*p += sizeof(__u64);
}
if (trace->type.bit11) {
if (__be32_to_cpu(*((__u32 *)*p)) != 0xffffffff)
return 1;
*p += sizeof(__u32);
}
if (trace->type.bit22) {
raw32 = __be32_to_cpu(*((__u32 *)*p));
sc_len = sc_data ? __ALIGN_KERNEL(strlen(sc_data), 4) : 0;
if (sc_len != (raw32 >> 24) * 4 || sc_id != (raw32 & 0xffffff))
return 1;
*p += sizeof(__u32);
if (sc_data) {
if (strncmp(*p, sc_data, strlen(sc_data)))
return 1;
*p += strlen(sc_data);
sc_len -= strlen(sc_data);
while (sc_len--) {
if (**p != '\0')
return 1;
*p += sizeof(__u8);
}
}
}
return 0;
}
int check_ioam6_trace(struct ioam6_trace_hdr *trace, struct args *args)
{
__u8 *p;
int i;
if (__be16_to_cpu(trace->namespace_id) != args->ns_id ||
__be32_to_cpu(trace->type_be32) != args->trace_type)
return 1;
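/*
 * Skip the unused pre-allocated space (remlen is in 4-octet units).
 * The most recent node's data comes first, hence the reverse loop.
 */
p = trace->data + trace->remlen * 4;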
for (i = args->n_node - 1; i >= 0; i--) {
if (check_ioam6_node_data(&p, trace,
args->node[i].hop_limit,
args->node[i].id,
args->node[i].wide,
args->node[i].ingr_id,
args->node[i].ingr_wide,
args->node[i].egr_id,
args->node[i].egr_wide,
args->node[i].ns_data,
args->node[i].ns_wide,
args->node[i].sc_id,
args->node[i].sc_data))
return 1;
}
return 0;
}
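/*
 * Parse the arguments of one node. Most numeric arguments are tried as
 * decimal first, then as hexadecimal if that yields 0; a value of 0 is
 * rejected. Advances *argcp and *argvp past the consumed arguments.
 */
int parse_node_args(int *argcp, char ***argvp, struct node_args *node)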
{
char **argv = *argvp;
if (*argcp < NODE_ARGS_SIZE)
return 1;
node->hop_limit = strtoul(argv[NODE_ARG_HOP_LIMIT], NULL, 10);
if (!node->hop_limit) {
node->hop_limit = strtoul(argv[NODE_ARG_HOP_LIMIT], NULL, 16);
if (!node->hop_limit)
return 1;
}
node->id = strtoul(argv[NODE_ARG_ID], NULL, 10);
if (!node->id) {
node->id = strtoul(argv[NODE_ARG_ID], NULL, 16);
if (!node->id)
return 1;
}
node->wide = strtoull(argv[NODE_ARG_WIDE], NULL, 10);
if (!node->wide) {
node->wide = strtoull(argv[NODE_ARG_WIDE], NULL, 16);
if (!node->wide)
return 1;
}
node->ingr_id = strtoul(argv[NODE_ARG_INGR_ID], NULL, 10);
if (!node->ingr_id) {
node->ingr_id = strtoul(argv[NODE_ARG_INGR_ID], NULL, 16);
if (!node->ingr_id)
return 1;
}
node->ingr_wide = strtoul(argv[NODE_ARG_INGR_WIDE], NULL, 10);
if (!node->ingr_wide) {
node->ingr_wide = strtoul(argv[NODE_ARG_INGR_WIDE], NULL, 16);
if (!node->ingr_wide)
return 1;
}
node->egr_id = strtoul(argv[NODE_ARG_EGR_ID], NULL, 10);
if (!node->egr_id) {
node->egr_id = strtoul(argv[NODE_ARG_EGR_ID], NULL, 16);
if (!node->egr_id)
return 1;
}
node->egr_wide = strtoul(argv[NODE_ARG_EGR_WIDE], NULL, 10);
if (!node->egr_wide) {
node->egr_wide = strtoul(argv[NODE_ARG_EGR_WIDE], NULL, 16);
if (!node->egr_wide)
return 1;
}
node->ns_data = strtoul(argv[NODE_ARG_NS_DATA], NULL, 16);
if (!node->ns_data)
return 1;
node->ns_wide = strtoull(argv[NODE_ARG_NS_WIDE], NULL, 16);
if (!node->ns_wide)
return 1;
node->sc_id = strtoul(argv[NODE_ARG_SC_ID], NULL, 10);
if (!node->sc_id) {
node->sc_id = strtoul(argv[NODE_ARG_SC_ID], NULL, 16);
if (!node->sc_id)
return 1;
}
*argcp -= NODE_ARGS_SIZE;
*argvp += NODE_ARGS_SIZE;
if (node->sc_id != 0xffffff) {
if (!*argcp)
return 1;
node->sc_data = argv[NODE_ARG_SC_ID + 1];
*argcp -= 1;
*argvp += 1;
}
return 0;
}
struct args *parse_args(int argc, char **argv)
{
struct args *args;
int n_node, i;
if (argc < ARGS_SIZE)
goto out;
n_node = strtoul(argv[ARG_N_NODE], NULL, 10);
if (!n_node || n_node > 10)
goto out;
args = calloc(1, sizeof(*args) + n_node * sizeof(struct node_args));
if (!args)
goto out;
args->ns_id = strtoul(argv[ARG_NS_ID], NULL, 10);
if (!args->ns_id)
goto free;
args->trace_type = strtoul(argv[ARG_TRACE_TYPE], NULL, 16);
if (!args->trace_type)
goto free;
args->n_node = n_node;
args->ifname = argv[ARG_IFNAME];
argv += ARGS_SIZE;
argc -= ARGS_SIZE;
for (i = 0; i < n_node; i++) {
if (parse_node_args(&argc, &argv, &args->node[i]))
goto free;
}
if (argc)
goto free;
return args;
free:
free(args);
out:
return NULL;
}
int main(int argc, char **argv)
{
int ret, fd, pkts, size, hoplen, found;
struct ioam6_trace_hdr *ioam6h;
struct ioam6_hdr *opt;
struct ipv6hdr *ip6h;
__u8 buffer[400], *p;
struct args *args;
args = parse_args(argc - 1, argv + 1);
if (!args) {
ret = 1;
goto out;
}
fd = socket(AF_PACKET, SOCK_DGRAM, __cpu_to_be16(ETH_P_IPV6));
/* socket() returns -1 on failure; 0 is a valid descriptor */
if (fd < 0) {
ret = 1;
goto out;
}
if (setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE,
args->ifname, strlen(args->ifname))) {
ret = 1;
goto close;
}
pkts = 0;
found = 0;
while (pkts < 3 && !found) {
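size = recv(fd, buffer, sizeof(buffer), 0);
if (size < 0) {
ret = 1;
goto close;
}
pkts++;
/* ignore packets too short to carry an IPv6 header */
if (size < (int)sizeof(*ip6h))
continue;
ip6h = (struct ipv6hdr *)buffer;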
if (ip6h->nexthdr == IPPROTO_HOPOPTS) {
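p = buffer + sizeof(*ip6h);
/* options length: extension header length minus its 2 fixed bytes */
hoplen = ((p[1] + 1) << 3) - sizeof(struct ipv6_hopopt_hdr);
p += sizeof(struct ipv6_hopopt_hdr);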
while (hoplen > 0) {
opt = (struct ioam6_hdr *)p;
if (opt->opt_type == IPV6_TLV_IOAM &&
opt->type == IOAM6_TYPE_PREALLOC) {
found = 1;
p += sizeof(*opt);
ioam6h = (struct ioam6_trace_hdr *)p;
ret = check_ioam6_trace(ioam6h, args);
break;
}
p += opt->opt_len + 2;
hoplen -= opt->opt_len + 2;
}
}
}
if (!found)
ret = 1;
close:
close(fd);
out:
free(args);
return ret;
}