    net: bridge: add helper to replay port and host-joined mdb entries · 4f2673b3
    Vladimir Oltean authored
    I have a system with DSA ports, and udhcpcd is configured to bring
    interfaces up as soon as they are created.
    
    I create a bridge as follows:
    
    ip link add br0 type bridge
    
    As soon as I create the bridge and udhcpcd brings it up, I also have
    avahi which automatically starts sending IPv6 packets to advertise some
    local services, and because of that, the br0 bridge joins the following
    IPv6 groups due to the code path detailed below:
    
    33:33:ff:6d:c1:9c vid 0
    33:33:00:00:00:6a vid 0
    33:33:00:00:00:fb vid 0
    
    br_dev_xmit
    -> br_multicast_rcv
       -> br_ip6_multicast_add_group
          -> __br_multicast_add_group
             -> br_multicast_host_join
                -> br_mdb_notify
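
    For reference, the three addresses above are the Ethernet forms of
    IPv6 multicast groups: RFC 2464 maps a group to 33:33 followed by the
    low 32 bits of the IPv6 address, so 33:33:00:00:00:fb corresponds to
    ff02::fb (mDNS, which is what avahi speaks). A minimal userspace
    sketch of that mapping follows; ipv6_mc_to_mac is a hypothetical name
    used for illustration, not a bridge function:

```c
#include <stdint.h>
#include <string.h>

/*
 * Illustrative only: derive the Ethernet multicast address for an IPv6
 * group (RFC 2464, section 7): 33:33 followed by the last four bytes of
 * the 16-byte IPv6 address.
 */
static void ipv6_mc_to_mac(const uint8_t ip6[16], uint8_t mac[6])
{
    mac[0] = 0x33;
    mac[1] = 0x33;
    memcpy(&mac[2], &ip6[12], 4);
}
```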
    
    This is all fine, but inside br_mdb_notify we have br_mdb_switchdev_host
    hooked up, and switchdev will attempt to offload the host-joined groups
    to an empty list of ports. Of course, nobody offloads them.
    
    Then when we add a port to br0:
    
    ip link set swp0 master br0
    
    the bridge doesn't replay the host-joined MDB entries from br_add_if.
    Eventually the host-joined addresses expire and a switchdev
    notification for deleting them is emitted, but surprise, the original
    addition was missed entirely.
    
    The strategy to address this problem is to replay the MDB entries (both
    the port ones and the host joined ones) when the new port joins the
    bridge, similar to what vxlan_fdb_replay does (in that case, its FDB can
    be populated and only then attached to a bridge that you offload).
    However, there are two possibilities: the addresses can be 'pushed' by
    the bridge into the port, or the port can 'pull' them from the bridge.
    
    Considering that in the general case, the new port can be really late to
    the party, and there may have been many other switchdev ports that
    already received the initial notification, we would like to avoid
    delivering duplicate events to them, since they might misbehave.
    Currently, the bridge calls the entire switchdev notifier chain, whereas
    a replay should call only the notifier block of the new port. But the
    bridge doesn't know what the new port's notifier block is; it only
    knows where the switchdev notifier chain is. So for simplicity, we
    make this a driver-initiated pull for now, and the notifier block is
    passed as an argument.
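
    The difference between notifying the whole chain and replaying to a
    single notifier block can be sketched in userspace C. This is only a
    model: struct notifier, count_event, chain_notify and replay_to are
    hypothetical stand-ins, not the kernel's notifier or switchdev API.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for a notifier block. */
struct notifier {
    int (*call)(struct notifier *nb, const unsigned char *addr);
    int seen;                 /* number of events delivered to this block */
    struct notifier *next;    /* next block on the chain */
};

/* Example handler: just count the events it receives. */
static int count_event(struct notifier *nb, const unsigned char *addr)
{
    (void)addr;
    nb->seen++;
    return 0;
}

/* Live path: every block registered on the chain sees the event. */
static void chain_notify(struct notifier *head, const unsigned char *addr)
{
    for (struct notifier *nb = head; nb; nb = nb->next)
        nb->call(nb, addr);
}

/*
 * Replay path: the caller hands over its own block, so the entries that
 * already exist are delivered to that block only and older ports see no
 * duplicates.
 */
static void replay_to(struct notifier *nb,
                      const unsigned char (*addrs)[6], size_t n)
{
    for (size_t i = 0; i < n; i++)
        nb->call(nb, addrs[i]);
}
```

    In this model, a port that was on the chain from the start and a port
    that pulls a replay later each end up seeing every group exactly once.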
    
    To emulate the calling context for mdb objects (deferred and put on the
    blocking notifier chain), we must iterate under RCU protection through
    the bridge's mdb entries, queue them, and only call them once we're out
    of the RCU read-side critical section.
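
    The two-phase shape described above (snapshot the entries inside the
    read-side section, deliver them only after leaving it) can be modelled
    in userspace C as follows. Everything here is an assumption made for
    illustration: the in_read_section flag merely stands in for
    rcu_read_lock()/rcu_read_unlock(), and mdb_replay, struct mdb_entry,
    struct queued and count_outside are hypothetical names, not the
    functions added by this patch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of one bridge MDB entry. */
struct mdb_entry {
    unsigned char addr[6];
    struct mdb_entry *next;
};

/* Private copy of an entry, queued for delivery after the walk. */
struct queued {
    unsigned char addr[6];
    struct queued *next;
};

static int in_read_section;   /* stand-in for the RCU read-side section */

typedef int (*replay_cb)(const unsigned char *addr, void *ctx);

/* Example callback: counts deliveries, fails if called inside the section. */
static int count_outside(const unsigned char *addr, void *ctx)
{
    (void)addr;
    if (in_read_section)
        return -1;
    (*(int *)ctx)++;
    return 0;
}

static int mdb_replay(const struct mdb_entry *mdb, replay_cb cb, void *ctx)
{
    struct queued *head = NULL, **tail = &head, *q;
    int err = 0;

    /* Phase 1: walk the list and copy entries; no callbacks in here. */
    in_read_section = 1;
    for (; mdb; mdb = mdb->next) {
        q = malloc(sizeof(*q));
        if (!q) {
            err = -1;
            break;
        }
        memcpy(q->addr, mdb->addr, sizeof(q->addr));
        q->next = NULL;
        *tail = q;
        tail = &q->next;
    }
    in_read_section = 0;

    /* Phase 2: deliver in order, stop calling on first error, free all. */
    while (head) {
        q = head;
        head = q->next;
        if (!err)
            err = cb(q->addr, ctx);
        free(q);
    }
    return err;
}
```

    Copying the entries first means the callback may block without
    holding up the list walk, at the cost of one allocation per entry.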
    
    There was some opportunity for reuse between br_mdb_switchdev_host_port,
    br_mdb_notify and the newly added br_mdb_queue_one in how the switchdev
    mdb object is created, so a helper was created.

    Suggested-by: Ido Schimmel <idosch@idosch.org>
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
br_mdb.c 29.9 KB