• Vladimir Oltean's avatar
    net: mscc: ocelot: fix incorrect balancing with down LAG ports · a14e6b69
    Vladimir Oltean authored
    Assuming the test setup described here:
    https://patchwork.kernel.org/project/netdevbpf/cover/20210205130240.4072854-1-vladimir.oltean@nxp.com/
    (swp1 and swp2 are in bond0, and bond0 is in a bridge with swp0)
    
    it can be seen that when swp1 goes down (on either board A or B), then
    traffic that should go through that port isn't forwarded anywhere.
    
    A dump of the PGID table shows the following:
    
    PGID_DST[0] = ports 0
    PGID_DST[1] = ports 1
    PGID_DST[2] = ports 2
    PGID_DST[3] = ports 3
    PGID_DST[4] = ports 4
    PGID_DST[5] = ports 5
    PGID_DST[6] = no ports
    PGID_AGGR[0] = ports 0, 1, 2, 3, 4, 5
    PGID_AGGR[1] = ports 0, 1, 2, 3, 4, 5
    PGID_AGGR[2] = ports 0, 1, 2, 3, 4, 5
    PGID_AGGR[3] = ports 0, 1, 2, 3, 4, 5
    PGID_AGGR[4] = ports 0, 1, 2, 3, 4, 5
    PGID_AGGR[5] = ports 0, 1, 2, 3, 4, 5
    PGID_AGGR[6] = ports 0, 1, 2, 3, 4, 5
    PGID_AGGR[7] = ports 0, 1, 2, 3, 4, 5
    PGID_AGGR[8] = ports 0, 1, 2, 3, 4, 5
    PGID_AGGR[9] = ports 0, 1, 2, 3, 4, 5
    PGID_AGGR[10] = ports 0, 1, 2, 3, 4, 5
    PGID_AGGR[11] = ports 0, 1, 2, 3, 4, 5
    PGID_AGGR[12] = ports 0, 1, 2, 3, 4, 5
    PGID_AGGR[13] = ports 0, 1, 2, 3, 4, 5
    PGID_AGGR[14] = ports 0, 1, 2, 3, 4, 5
    PGID_AGGR[15] = ports 0, 1, 2, 3, 4, 5
    PGID_SRC[0] = ports 1, 2
    PGID_SRC[1] = ports 0
    PGID_SRC[2] = ports 0
    PGID_SRC[3] = no ports
    PGID_SRC[4] = no ports
    PGID_SRC[5] = no ports
    PGID_SRC[6] = ports 0, 1, 2, 3, 4, 5
    
    Whereas a "good" PGID configuration for that setup should have looked
    like this:
    
    PGID_DST[0] = ports 0
    PGID_DST[1] = ports 1, 2
    PGID_DST[2] = ports 1, 2
    PGID_DST[3] = ports 3
    PGID_DST[4] = ports 4
    PGID_DST[5] = ports 5
    PGID_DST[6] = no ports
    PGID_AGGR[0] = ports 0, 2, 3, 4, 5
    PGID_AGGR[1] = ports 0, 2, 3, 4, 5
    PGID_AGGR[2] = ports 0, 2, 3, 4, 5
    PGID_AGGR[3] = ports 0, 2, 3, 4, 5
    PGID_AGGR[4] = ports 0, 2, 3, 4, 5
    PGID_AGGR[5] = ports 0, 2, 3, 4, 5
    PGID_AGGR[6] = ports 0, 2, 3, 4, 5
    PGID_AGGR[7] = ports 0, 2, 3, 4, 5
    PGID_AGGR[8] = ports 0, 2, 3, 4, 5
    PGID_AGGR[9] = ports 0, 2, 3, 4, 5
    PGID_AGGR[10] = ports 0, 2, 3, 4, 5
    PGID_AGGR[11] = ports 0, 2, 3, 4, 5
    PGID_AGGR[12] = ports 0, 2, 3, 4, 5
    PGID_AGGR[13] = ports 0, 2, 3, 4, 5
    PGID_AGGR[14] = ports 0, 2, 3, 4, 5
    PGID_AGGR[15] = ports 0, 2, 3, 4, 5
    PGID_SRC[0] = ports 1, 2
    PGID_SRC[1] = ports 0
    PGID_SRC[2] = ports 0
    PGID_SRC[3] = no ports
    PGID_SRC[4] = no ports
    PGID_SRC[5] = no ports
    PGID_SRC[6] = ports 0, 1, 2, 3, 4, 5
    
    In other words, in the "bad" configuration, the attempt is to remove the
    inactive swp1 from the destination ports via PGID_DST. But when a MAC
    table entry is learned, it is learned towards PGID_DST 1, because that
    is the logical port id of the LAG itself (it is equal to the lowest
    numbered member port). So when swp1 becomes inactive, if we set
    PGID_DST[1] to contain just swp1 and not swp2, the packet will not have
    any chance to reach the destination via swp2.
    
    The "correct" way to remove swp1 as a destination is via PGID_AGGR
    (remove swp1 from the aggregation port groups for all aggregation
    codes). This means that PGID_DST[1] and PGID_DST[2] must still contain
    both swp1 and swp2. This makes the MAC table still treat packets
    destined towards the single-port LAG as "multicast", and the inactive
    ports are removed via the aggregation code tables.
    
    The change presented here is a design one: the ocelot_get_bond_mask()
    function used to take an "only_active_ports" argument. We don't need
    that. The only call site that specifies only_active_ports=true,
    ocelot_set_aggr_pgids(), must retrieve the entire bonding mask, because
    it must program that into PGID_DST. Additionally, it must also clear the
    inactive ports from the bond mask here, which it can't do if bond_mask
    just contains the active ports:
    
    	ac = ocelot_read_rix(ocelot, ANA_PGID_PGID, i);
    	ac &= ~bond_mask;  <---- here
    	/* Don't do division by zero if there was no active
    	 * port. Just make all aggregation codes zero.
    	 */
    	if (num_active_ports)
    		ac |= BIT(aggr_idx[i % num_active_ports]);
    	ocelot_write_rix(ocelot, ac, ANA_PGID_PGID, i);
    
    So it becomes the responsibility of ocelot_set_aggr_pgids() to take
    ocelot_port->lag_tx_active into consideration when populating the
    aggr_idx array.
    
    Fixes: 23ca3b72 ("net: mscc: ocelot: rebalance LAGs on link up/down events")
    Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
    Link: https://lore.kernel.org/r/20220107164332.402133-1-vladimir.oltean@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
    a14e6b69
ocelot.c 73.2 KB