    selftests/bpf: Add cls_redirect classifier · 23458901
    Lorenz Bauer authored
    cls_redirect is a TC clsact based replacement for the glb-redirect iptables
    module available at [1]. It enables what GitHub calls "second chance"
    flows [2], similar to what the Beamer paper [3] proposes. In contrast to
    glb-redirect, it also supports migrating UDP flows as long as connected
    sockets are used. cls_redirect is in production at Cloudflare, as part of
    our own L4 load balancer.
    
    We have modified the encapsulation format slightly from glb-redirect:
    glbgue_chained_routing.private_data_type has been repurposed to form a
    version field and several flags. Both have been arranged in a way that
    a private_data_type value of zero matches the current glb-redirect
    behaviour. This means that cls_redirect will understand packets in
    glb-redirect format, but not vice versa.
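
The backwards-compatibility rule above can be sketched in C. This is only an
illustration under stated assumptions: the bit positions, the mask names and
the sketch_* identifiers are hypothetical and are not taken from the patch.
The only property carried over from the description is that an all-zero
private_data_type decodes to version 0 with no flags set, which is the
glb-redirect behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: low nibble = version, one bit = "GRE on last hop". */
#define SKETCH_VERSION_MASK      0x0fu
#define SKETCH_FLAG_LAST_HOP_GRE 0x10u

struct sketch_encap_info {
	uint8_t version;      /* 0 means plain glb-redirect format */
	bool    last_hop_gre; /* assumed flag, named after the GRE option */
	bool    legacy;       /* true when the whole byte is zero */
};

/* Split the repurposed byte into a version and flags. */
static inline struct sketch_encap_info sketch_decode(uint8_t private_data_type)
{
	struct sketch_encap_info info;

	info.version = private_data_type & SKETCH_VERSION_MASK;
	info.last_hop_gre = (private_data_type & SKETCH_FLAG_LAST_HOP_GRE) != 0;
	info.legacy = (private_data_type == 0);
	return info;
}
```

With a layout like this, a sender that only knows the glb-redirect format
always writes zero and is still decoded correctly, while a glb-redirect
receiver has no way to interpret the non-zero values.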
    
    The test suite only covers basic features. For example, cls_redirect will
    correctly forward path MTU discovery packets, but this is not exercised.
    It is also possible to switch the encapsulation format to GRE on the last
    hop, which is also not tested.
    
    There are two major distinctions from glb-redirect: first, cls_redirect
    relies on receiving encapsulated packets directly from a router. This is
    because we don't have access to the neighbour tables from BPF, yet. See
    forward_to_next_hop for details. Second, cls_redirect performs decapsulation
    instead of using separate ipip and sit tunnel devices. This
    avoids issues with the sit tunnel [4] and makes deploying the classifier
    easier: decapsulated packets appear on the same interface, so existing
    firewall rules continue to work as expected.
    
    The code base started its life on v4.19, so there are most likely still
    holdovers from old workarounds. In no particular order:
    
    - The function buf_off is required to defeat a clang optimization
      that leads to the verifier rejecting the program due to pointer
      arithmetic in the wrong order.
    
    - The function pkt_parse_ipv6 is force inlined, because it would
      otherwise be rejected due to returning a pointer to stack memory.
    
    - The functions fill_tuple and classify_tcp contain kludges, because
      we've run out of function arguments.
    
    - The logic in general is rather nested, due to verifier restrictions.
      I think this is either because the verifier loses track of constants
      on the stack, or because it can't track enum-like variables.
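
As an illustration of the first item, here is a userspace sketch of the
general idea behind a helper like buf_off: compute the offset as a plain
scalar first, then hide that scalar from the optimizer so it cannot fold the
subtraction back into later pointer arithmetic. The struct, the field names
and the empty-asm barrier are assumptions made for this sketch; the real
function may be written differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer: 'start' is the packet start, 'head' the read cursor. */
struct sketch_buf {
	const uint8_t *start;
	const uint8_t *head;
};

/* Return the cursor offset as an opaque scalar. */
static inline size_t sketch_buf_off(const struct sketch_buf *buf)
{
	/* pointer - pointer yields a scalar */
	size_t off = (size_t)(buf->head - buf->start);

	/*
	 * Empty asm with a "+r" operand (GNU C extension): the compiler
	 * must assume 'off' was modified here, so it cannot reorder the
	 * arithmetic above into later expressions that use 'off'.
	 */
	__asm__ volatile("" : "+r"(off));
	return off;
}
```

The barrier emits no instructions; it only restricts what the compiler may
assume about the value.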
    
    1: https://github.com/github/glb-director/tree/master/src/glb-redirect
    2: https://github.com/github/glb-director/blob/master/docs/development/second-chance-design.md
    3: https://www.usenix.org/conference/nsdi18/presentation/olteanu
    4: https://github.com/github/glb-director/issues/64
    
    Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20200424185556.7358-2-lmb@cloudflare.com
test_progs.h 3.94 KB