Commit e8b9eab9 authored by Martynas Pumputis, committed by David S. Miller

net: retrieve netns cookie via getsockopt

It's getting more common to run nested container environments for
testing cloud software. One such example is Kind [1], which runs a
Kubernetes cluster in Docker containers on a single host. Each container
acts as a Kubernetes node, and thus can run any Pod (aka container)
inside it. This approach simplifies testing a lot, as it eliminates
complicated VM setups.

Unfortunately, such a setup breaks some functionality when cgroupv2 BPF
programs are used for load-balancing. The load-balancer BPF program
needs to detect whether a request originates from the host netns or a
container netns in order to allow some access, e.g. to a service via a
loopback IP address. Typically, the programs detect this by comparing
the netns cookie with that of the init ns, obtained via a call to
bpf_get_netns_cookie(NULL). However, in nested environments the latter
cannot be used, given that the Kubernetes node's netns is not the init ns.
To fix this, we need to pass the Kubernetes node netns cookie to the
program in a different way: by extending getsockopt() with a
SO_NETNS_COOKIE option, the orchestrator which runs in the Kubernetes
node netns can retrieve the cookie and pass it to the program instead.
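
As an illustration of the userspace side described above, here is a
minimal sketch of how an orchestrator might read the cookie of its own
netns. The helper name read_netns_cookie() is hypothetical, and the
fallback value 71 is an assumption that only holds for the hunks in this
commit that define SO_NETNS_COOKIE as 71 (two other hunks use 0x4045 and
0x0050); prefer the definition from the system headers when available.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Assumption: fallback for older userspace headers. 71 matches the
 * decimal-valued hunks in this commit, not every architecture. */
#ifndef SO_NETNS_COOKIE
#define SO_NETNS_COOKIE 71
#endif

/* Hypothetical helper: stores the cookie of the caller's netns in
 * *cookie and returns 0, or returns a negative errno value. */
static int read_netns_cookie(uint64_t *cookie)
{
	socklen_t len = sizeof(*cookie);
	int fd, ret = 0;

	/* Any socket created in this netns should do; UDP is cheap. */
	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -errno;

	if (getsockopt(fd, SOL_SOCKET, SO_NETNS_COOKIE, cookie, &len) < 0)
		ret = -errno;

	close(fd);
	return ret;
}
```

The orchestrator could then hand the value to the BPF program, for
example through a map, so the program compares against it instead of
the init ns cookie. How the value is passed is not specified by this
commit.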

This follows up on Eric's commit 3d368ab8 ("net: initialize
net->net_cookie at netns setup") to allow retrieval via
SO_NETNS_COOKIE. It is also in line with how we retrieve the socket
cookie via SO_COOKIE.
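
To make the parallel with SO_COOKIE concrete, the sketch below reads
either 64-bit option through one hypothetical helper,
read_u64_sockopt(). The fallback numbers (57 and 71) are assumptions
taken from the generic definitions and do not apply to every
architecture.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>

/* Assumption: fallbacks for older headers, generic values only. */
#ifndef SO_COOKIE
#define SO_COOKIE 57
#endif
#ifndef SO_NETNS_COOKIE
#define SO_NETNS_COOKIE 71
#endif

/* Hypothetical helper: reads a u64-sized SOL_SOCKET option such as
 * SO_COOKIE or SO_NETNS_COOKIE. Returns 0 or a negative errno value. */
static int read_u64_sockopt(int fd, int optname, uint64_t *val)
{
	socklen_t len = sizeof(*val);

	if (getsockopt(fd, SOL_SOCKET, optname, val, &len) < 0)
		return -errno;
	return 0;
}
```

With two sockets opened in the same netns, I would expect SO_COOKIE to
differ between them while SO_NETNS_COOKIE is identical, since the former
identifies the socket and the latter the namespace.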

  [1] https://kind.sigs.k8s.io/

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Cc: Eric Dumazet <edumazet@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
parent 35713d9b
@@ -127,6 +127,8 @@
 #define SO_PREFER_BUSY_POLL 69
 #define SO_BUSY_POLL_BUDGET 70
+#define SO_NETNS_COOKIE 71
 #if !defined(__KERNEL__)
 #if __BITS_PER_LONG == 64
...
@@ -138,6 +138,8 @@
 #define SO_PREFER_BUSY_POLL 69
 #define SO_BUSY_POLL_BUDGET 70
+#define SO_NETNS_COOKIE 71
 #if !defined(__KERNEL__)
 #if __BITS_PER_LONG == 64
...
@@ -119,6 +119,8 @@
 #define SO_PREFER_BUSY_POLL 0x4043
 #define SO_BUSY_POLL_BUDGET 0x4044
+#define SO_NETNS_COOKIE 0x4045
 #if !defined(__KERNEL__)
 #if __BITS_PER_LONG == 64
...
@@ -120,6 +120,8 @@
 #define SO_PREFER_BUSY_POLL 0x0048
 #define SO_BUSY_POLL_BUDGET 0x0049
+#define SO_NETNS_COOKIE 0x0050
 #if !defined(__KERNEL__)
...
@@ -122,6 +122,8 @@
 #define SO_PREFER_BUSY_POLL 69
 #define SO_BUSY_POLL_BUDGET 70
+#define SO_NETNS_COOKIE 71
 #if !defined(__KERNEL__)
 #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
...
@@ -1635,6 +1635,13 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val = sk->sk_bound_dev_if;
 		break;
+	case SO_NETNS_COOKIE:
+		lv = sizeof(u64);
+		if (len != lv)
+			return -EINVAL;
+		v.val64 = sock_net(sk)->net_cookie;
+		break;
 	default:
 		/* We implement the SO_SNDLOWAT etc to not be settable
 		 * (1003.1g 7).
...
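
The new case in sock_getsockopt() rejects any option length other than
sizeof(u64). A small userspace probe can show what that means for
callers; this is only a sketch, probe_netns_cookie_len() is a
hypothetical name, and the fallback value 71 is an assumption that holds
only for the hunks above that use it.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Assumption: fallback for older headers, not valid on every arch. */
#ifndef SO_NETNS_COOKIE
#define SO_NETNS_COOKIE 71
#endif

/* Hypothetical probe: requests the netns cookie with a caller-chosen
 * option length. Returns 0 on success or a negative errno value. */
static int probe_netns_cookie_len(socklen_t len)
{
	unsigned char buf[16] = {0};
	int fd, ret = 0;

	/* Local guard so the kernel never writes past buf. */
	if (len > sizeof(buf))
		return -EINVAL;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -errno;

	if (getsockopt(fd, SOL_SOCKET, SO_NETNS_COOKIE, buf, &len) < 0)
		ret = -errno;

	close(fd);
	return ret;
}
```

On a kernel carrying this patch, I would expect
probe_netns_cookie_len(4) to return -EINVAL and
probe_netns_cookie_len(8) to return 0. A kernel without the option
should fail both calls, typically with -ENOPROTOOPT.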