    ipv4: Update exception handling for multipath routes via same device · 2fbc6e89
    David Ahern authored
    Kfir reported that pmtu exceptions are not created properly for
    deployments where multipath routes use the same device.
    
    After some digging I see 2 compounding problems:
    1. ip_route_output_key_hash_rcu is updating the flowi4_oif *after*
       the route lookup. This is the second use case where this has
       been a problem (the first is related to use of vti devices with
       VRF). I cannot find any reason for the oif to be changed after the
       lookup; the code dates back to the start of the git history. It does
       not seem logical, so remove it.
    
    2. fib_lookups for exceptions do not call fib_select_path to handle
       multipath route selection based on the hash.
    
    The end result is that the fib_lookup used to add the exception
    always creates it using the first leg of the route.
    
    An example topology showing the problem:
    
                     |  host1
                 +------+
                 | eth0 |  .209
                 +------+
                     |
                 +------+
         switch  | br0  |
                 +------+
                     |
           +---------+---------+
           | host2             |  host3
       +------+             +------+
       | eth0 | .250        | eth0 | 192.168.252.252
       +------+             +------+
    
       +-----+             +-----+
       | vti | .2          | vti | 192.168.247.3
       +-----+             +-----+
           \                  /
     =================================
     tunnels
             192.168.247.1/24
    
    for h in host1 host2 host3; do
            ip netns add ${h}
            ip -netns ${h} link set lo up
            ip netns exec ${h} sysctl -wq net.ipv4.ip_forward=1
    done
    
    ip netns add switch
    ip -netns switch li set lo up
    ip -netns switch link add br0 type bridge stp 0
    ip -netns switch link set br0 up
    
    for n in 1 2 3; do
            ip -netns switch link add eth-sw type veth peer name eth-h${n}
            ip -netns switch li set eth-h${n} master br0 up
            ip -netns switch li set eth-sw netns host${n} name eth0
    done
    
    ip -netns host1 addr add 192.168.252.209/24 dev eth0
    ip -netns host1 link set dev eth0 up
    ip -netns host1 route add 192.168.247.0/24 \
            nexthop via 192.168.252.250 dev eth0 nexthop via 192.168.252.252 dev eth0
    
    ip -netns host2 addr add 192.168.252.250/24 dev eth0
    ip -netns host2 link set dev eth0 up
    
    ip -netns host3 addr add 192.168.252.252/24 dev eth0
    ip -netns host3 link set dev eth0 up
    
    ip netns add tunnel
    ip -netns tunnel li set lo up
    ip -netns tunnel li add br0 type bridge
    ip -netns tunnel li set br0 up
    for n in $(seq 11 20); do
            ip -netns tunnel addr add dev br0 192.168.247.${n}/24
    done
    
    for n in 2 3
    do
            ip -netns tunnel link add vti${n} type veth peer name eth${n}
            ip -netns tunnel link set eth${n} mtu 1360 master br0 up
            ip -netns tunnel link set vti${n} netns host${n} mtu 1360 up
            ip -netns host${n} addr add dev vti${n} 192.168.247.${n}/24
    done
    ip -netns tunnel ro add default nexthop via 192.168.247.2 nexthop via 192.168.247.3
    
    ip netns exec host1 ping -M do -s 1400 -c3 -I 192.168.252.209 192.168.247.11
    ip netns exec host1 ping -M do -s 1400 -c3 -I 192.168.252.209 192.168.247.15
    ip -netns host1 ro ls cache
    
    Before this patch the cache always shows exceptions against the first
    leg in the multipath route; 192.168.252.250 per this example. Since the
    hash has an initial random seed, you may need to vary the final octet
    more than what is listed. In my tests, trying addresses between 11 and 19
    usually found at least one that hashed to each leg.
    
    With this patch, the cache will have exceptions for both legs.
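
    A minimal sketch for checking the result from a script, assuming the
    cache listing prints one "<dst> via <gateway> dev <dev>" line per
    exception (the helper names below are hypothetical and not part of the
    patch or of iproute2):

```shell
# Print each distinct nexthop gateway found in `ip route show cache`
# output read from stdin, one per line, sorted.
# Assumption: every exception line contains "via <gateway>".
cache_gateways() {
        awk '{ for (i = 1; i < NF; i++) if ($i == "via") print $(i + 1) }' | sort -u
}

# Succeed only if the cached exceptions cover at least two gateways,
# i.e. both legs of the two-nexthop route in this example.
exceptions_on_both_legs() {
        cache_gateways | awk 'END { exit !(NR >= 2) }'
}
```

    For example, after running the pings:
    "ip -netns host1 ro ls cache | exceptions_on_both_legs && echo both".
    Because the hash seed is random, a failure may only mean the tested
    destinations all hashed to the same leg.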
    
    Fixes: 4895c771 ("ipv4: Add FIB nexthop exceptions")
    Reported-by: Kfir Itzhak <mastertheknife@gmail.com>
    Signed-off-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>