    skbuff: bridge: Add layer 2 miss indication · 7b4858df
    Ido Schimmel authored
    For EVPN non-DF (Designated Forwarder) filtering we need to be able to
    prevent decapsulated traffic from being flooded to a multi-homed host.
    Filtering of multicast and broadcast traffic can be achieved using the
    following flower filter:
    
     # tc filter add dev bond0 egress pref 1 proto all flower indev vxlan0 dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 action drop
    
    Unlike broadcast and multicast traffic, it is not currently possible to
    filter unknown unicast traffic. The classification into unknown unicast
    is performed by the bridge driver, but is not visible to other layers
    such as tc.
    
    Solve this by adding a new 'l2_miss' bit to the tc skb extension. Clear
    the bit whenever a packet enters the bridge (received from a bridge port
    or transmitted via the bridge) and set it if the packet did not match an
    FDB or MDB entry. If there is no skb extension and the bit needs to be
    cleared, then do not allocate one as no extension is equivalent to the
    bit being cleared. The bit is not set for broadcast packets as they
    never perform a lookup and therefore never incur a miss.
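
    The rules above can be sketched as a small userspace model. This is
    only an illustration under stated assumptions: 'model_skb',
    'model_ext', 'model_miss_set' and 'model_bridge_forward' are
    hypothetical names, not kernel types or functions, and calloc()
    stands in for the real skb extension allocation.

    ```c
    #include <stdbool.h>
    #include <stdlib.h>

    /* Hypothetical stand-ins for the tc skb extension and the skb. */
    struct model_ext {
            bool l2_miss;
    };

    struct model_skb {
            struct model_ext *ext;  /* NULL: no extension, bit reads as cleared */
    };

    /* Read the bit; a missing extension is equivalent to a cleared bit. */
    static bool model_l2_miss(const struct model_skb *skb)
    {
            return skb->ext != NULL && skb->ext->l2_miss;
    }

    /* Set or clear the bit, allocating an extension only when setting it. */
    static void model_miss_set(struct model_skb *skb, bool miss)
    {
            if (skb->ext != NULL) {
                    skb->ext->l2_miss = miss;
                    return;
            }
            if (!miss)
                    return;         /* nothing to clear, so do not allocate */
            skb->ext = calloc(1, sizeof(*skb->ext));
            if (skb->ext != NULL)
                    skb->ext->l2_miss = true;
    }

    enum model_dst { MODEL_UNICAST, MODEL_MULTICAST, MODEL_BROADCAST };

    /* Clear on entry, set on an FDB/MDB miss, never set for broadcast. */
    static void model_bridge_forward(struct model_skb *skb, enum model_dst dst,
                                     bool entry_found)
    {
            model_miss_set(skb, false);     /* packet enters the bridge */
            if (dst == MODEL_BROADCAST)
                    return;                 /* no lookup, hence no miss */
            if (!entry_found)
                    model_miss_set(skb, true);
    }
    ```

    In this model a known unicast packet never gets an extension, an
    unknown unicast packet gets one with the bit set, and re-entering the
    bridge clears the bit on an extension that already exists.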
    
    A bit that is set for every flooded packet would also work for the
    current use case, but it does not allow us to differentiate between
    registered and unregistered multicast traffic, which might be useful in
    the future.
    
    To keep the performance impact to a minimum, the marking of packets is
    guarded by the 'tc_skb_ext_tc' static key. When the key is disabled,
    the skb is not touched and an skb extension is not allocated. Instead,
    only a 5-byte nop is executed, as demonstrated below for the call site
    in br_handle_frame().
    
    Before the patch:
    
    ```
            memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
      c37b09:       49 c7 44 24 28 00 00    movq   $0x0,0x28(%r12)
      c37b10:       00 00
    
            p = br_port_get_rcu(skb->dev);
      c37b12:       49 8b 44 24 10          mov    0x10(%r12),%rax
            memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
      c37b17:       49 c7 44 24 30 00 00    movq   $0x0,0x30(%r12)
      c37b1e:       00 00
      c37b20:       49 c7 44 24 38 00 00    movq   $0x0,0x38(%r12)
      c37b27:       00 00
    ```
    
    After the patch (when static key is disabled):
    
    ```
            memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
      c37c29:       49 c7 44 24 28 00 00    movq   $0x0,0x28(%r12)
      c37c30:       00 00
      c37c32:       49 8d 44 24 28          lea    0x28(%r12),%rax
      c37c37:       48 c7 40 08 00 00 00    movq   $0x0,0x8(%rax)
      c37c3e:       00
      c37c3f:       48 c7 40 10 00 00 00    movq   $0x0,0x10(%rax)
      c37c46:       00
    
    #ifdef CONFIG_HAVE_JUMP_LABEL_HACK
    
    static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
    {
            asm_volatile_goto("1:"
      c37c47:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
            br_tc_skb_miss_set(skb, false);
    
            p = br_port_get_rcu(skb->dev);
      c37c4c:       49 8b 44 24 10          mov    0x10(%r12),%rax
    ```
    
    Subsequent patches will extend the flower classifier to be able to match
    on the new 'l2_miss' bit and enable / disable the static key when
    filters that match on it are added / deleted.
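
    The intended interaction between the guard and filter add / delete
    can be sketched as follows. This is a rough userspace model, not the
    kernel implementation: all 'model_*' names are hypothetical, a plain
    use counter replaces the runtime code patching a real static key
    performs, and counting users per filter is an assumption about how
    the follow-up patches enable and disable the key.

    ```c
    #include <stdbool.h>

    /* Hypothetical stand-in for the 'tc_skb_ext_tc' static key. */
    static unsigned int model_key_users;

    static bool model_key_enabled(void)
    {
            return model_key_users > 0;
    }

    /* Called when a filter matching on l2_miss is added. */
    static void model_filter_add(void)
    {
            model_key_users++;
    }

    /* Called when such a filter is deleted. */
    static void model_filter_del(void)
    {
            if (model_key_users > 0)
                    model_key_users--;
    }

    static unsigned int model_marked;       /* packets the marking code touched */

    /* Guarded marking: returns immediately while no filter needs the bit. */
    static void model_mark_packet(void)
    {
            if (!model_key_enabled())
                    return;                 /* fast path, packet untouched */
            model_marked++;
    }
    ```

    With no filters installed, model_mark_packet() does nothing. It only
    starts counting packets once at least one filter has been added, and
    stops again when the last one is deleted.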

    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>