    ipv6: Skip XFRM lookup if dst_entry in socket cache is valid · 48fc8f94
    Jakub Sitnicki authored
    [ Upstream commit 00bc0ef5 ]
    
    At present we perform an xfrm_lookup() for each UDPv6 message we
    send. The lookup involves querying the flow cache (flow_cache_lookup)
    and, in case of a cache miss, creating an XFRM bundle.
    
    If we miss the flow cache, we can end up creating a new bundle and
    deriving the path MTU (xfrm_init_pmtu) from an already transformed
    dst_entry, which we pass from the socket cache (sk->sk_dst_cache) down
    to xfrm_lookup(). This can happen only if we're caching the dst_entry
    in the socket, that is when we're using a connected UDP socket.
    
    To put it another way, the path MTU shrinks each time we miss the flow
    cache, which later on leads to incorrectly fragmented payload. It can
    be observed with ESPv6 in transport mode:
    
      1) Set up a transformation and lower the MTU to trigger fragmentation
        # ip xfrm policy add dir out src ::1 dst ::1 \
          tmpl src ::1 dst ::1 proto esp spi 1
        # ip xfrm state add src ::1 dst ::1 \
          proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b
        # ip link set dev lo mtu 1500
    
      2) Monitor the packet flow and set up a UDP sink
        # tcpdump -ni lo -ttt &
        # socat udp6-listen:12345,fork /dev/null &
    
      3) Send a datagram that needs fragmentation with a connected socket
        # perl -e 'print "@" x 1470' | socat - udp6:[::1]:12345
        2016/06/07 18:52:52 socat[724] E read(3, 0x555bb3d5ba00, 8192): Protocol error
        00:00:00.000000 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x2), length 1448
        00:00:00.000014 IP6 ::1 > ::1: frag (1448|32)
        00:00:00.000050 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x3), length 1272
        (^ ICMPv6 Parameter Problem)
        00:00:00.000022 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x5), length 136
    
      4) Compare it to a non-connected socket
        # perl -e 'print "@" x 1500' | socat - udp6-sendto:[::1]:12345
        00:00:40.535488 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x6), length 1448
        00:00:00.000010 IP6 ::1 > ::1: frag (1448|64)
    
    What happens in step (3) is:
    
      1) when connecting the socket in __ip6_datagram_connect(), we
         perform an XFRM lookup, miss the flow cache, create an XFRM
         bundle, and cache the destination,
    
      2) afterwards, when sending the datagram, we perform an XFRM lookup
         again, miss the flow cache (due to a mismatch of flowi6_iif and
         flowi6_oif, which is an issue of its own), and recreate an XFRM
         bundle based on the cached (and already transformed) destination.
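
    The compounding effect of step (2) can be illustrated with a small
    userspace model. This is only a sketch under stated assumptions, not
    kernel code: struct toy_dst, toy_bundle_create(), toy_connected_send()
    and the 40-byte overhead are all made up for illustration. It shows
    that if each flow cache miss builds a new bundle on top of the cached,
    already transformed entry, the derived path MTU drops by the transform
    overhead every time.

```c
#include <assert.h>

/* Made-up per-transform overhead in bytes; NOT the real ESP overhead. */
#define TOY_XFRM_OVERHEAD 40u

/* Hypothetical stand-in for a dst_entry: tracks only the path MTU and
 * how many transforms are stacked on top of the underlying route. */
struct toy_dst {
	unsigned int pmtu;
	unsigned int xfrm_depth;
};

/* Models bundle creation on a flow cache miss: the new entry derives
 * its path MTU from whatever entry it was handed. */
struct toy_dst toy_bundle_create(struct toy_dst base)
{
	struct toy_dst bundle = {
		.pmtu = base.pmtu - TOY_XFRM_OVERHEAD,
		.xfrm_depth = base.xfrm_depth + 1,
	};

	return bundle;
}

/* Models a connected socket without the fix: the cached entry is fed
 * back into bundle creation on every miss. Returns the entry that is
 * cached after the given number of misses. */
struct toy_dst toy_connected_send(unsigned int link_mtu, unsigned int misses)
{
	struct toy_dst cached = { .pmtu = link_mtu, .xfrm_depth = 0 };
	unsigned int i;

	for (i = 0; i < misses; i++)
		cached = toy_bundle_create(cached);

	return cached;
}

/* Small self-check of the model: one miss applies the overhead once,
 * two misses apply it twice. */
void toy_selfcheck(void)
{
	assert(toy_connected_send(1500, 1).pmtu == 1500 - TOY_XFRM_OVERHEAD);
	assert(toy_connected_send(1500, 2).pmtu == 1500 - 2 * TOY_XFRM_OVERHEAD);
}
```

    In this model the first miss corresponds to the lookup done at
    connect time and the second to the lookup done at send time.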
    
    To prevent the recreation of an XFRM bundle, avoid an XFRM lookup
    altogether whenever we already have a destination entry cached in the
    socket. This prevents the path MTU shrinkage and brings us on par with
    UDPv4.
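
    The idea can be sketched as a userspace model. It is an illustration
    under stated assumptions rather than the actual change: struct
    demo_sock, demo_xfrm_lookup(), demo_sk_dst_lookup() and the 40-byte
    overhead are hypothetical names and values, and the real validity
    check on the cached entry is reduced to a single flag. The point is
    that a valid cached entry is returned as-is, so the transform overhead
    is applied once no matter how many lookups follow.

```c
#include <assert.h>

/* Made-up per-transform overhead in bytes; NOT the real ESP overhead. */
#define DEMO_XFRM_OVERHEAD 40u

/* Hypothetical stand-in for a dst_entry: tracks only the path MTU. */
struct demo_dst {
	unsigned int pmtu;
};

/* Hypothetical stand-in for a socket with a cached destination. */
struct demo_sock {
	struct demo_dst cache;
	int cache_valid;
};

/* Models an XFRM lookup that misses the flow cache and builds a bundle
 * on top of the entry it was given. */
struct demo_dst demo_xfrm_lookup(struct demo_dst base)
{
	struct demo_dst bundle = { .pmtu = base.pmtu - DEMO_XFRM_OVERHEAD };

	return bundle;
}

/* Models the lookup with the fix applied: a valid cached entry
 * short-circuits the XFRM lookup entirely. */
struct demo_dst demo_sk_dst_lookup(struct demo_sock *sk, unsigned int link_mtu)
{
	struct demo_dst route = { .pmtu = link_mtu };

	if (sk->cache_valid)
		return sk->cache;

	/* No usable cached entry: transform the plain route once and
	 * remember the result in the socket. */
	sk->cache = demo_xfrm_lookup(route);
	sk->cache_valid = 1;
	return sk->cache;
}

/* Runs the given number of lookups on a fresh socket and returns the
 * path MTU of the last entry handed back (link_mtu if none were run). */
unsigned int demo_pmtu_after(unsigned int link_mtu, unsigned int lookups)
{
	struct demo_sock sk = { .cache = { 0 }, .cache_valid = 0 };
	struct demo_dst dst = { .pmtu = link_mtu };
	unsigned int i;

	for (i = 0; i < lookups; i++)
		dst = demo_sk_dst_lookup(&sk, link_mtu);

	return dst.pmtu;
}
```

    With this model, demo_pmtu_after(1500, 1) and demo_pmtu_after(1500, 5)
    yield the same path MTU, which is the behavior the fix aims for.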
    
    The fix also benefits connected PINGv6 sockets, another user of
    ip6_sk_dst_lookup_flow(), which also suffers from messages being
    transformed twice.
    
    Joint work with Hannes Frederic Sowa.
    Reported-by: Jan Tluka <jtluka@redhat.com>
    Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
    Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>