• Jakub Kicinski's avatar
    ipv4: fix ip option filtering for locally generated fragments · 27a8caa5
    Jakub Kicinski authored
    During IP fragmentation we sanitize IP options. This means overwriting
    options which should not be copied with NOPs. Only the first fragment
    has the original, full options.
    
    ip_fraglist_prepare() copies the IP header and options from previous
    fragment to the next one. Commit 19c3401a ("net: ipv4: place control
    buffer handling away from fragmentation iterators") moved sanitizing
    options before ip_fraglist_prepare() which means options are sanitized
    and then overwritten again with the old values.
    
    Fixing this is not enough, however, nor did the sanitization work
    prior to aforementioned commit.
    
    ip_options_fragment() (which does the sanitization) uses ipcb->opt.optlen
    for the length of the options. ipcb->opt of fragments is not populated
    (it's 0), only the head skb has the state properly built. So even when
    called at the right time ip_options_fragment() does nothing. This seems
    to date back all the way to v2.5.44 when the fast path for pre-fragmented
    skbs had been introduced. Prior to that ip_options_build() would have been
    called for every fragment (in fact ever since v2.5.44 the fragmentation
    handing in ip_options_build() has been dead code, I'll clean it up in
    -next).
    
    In the original patch (see Link) caixf mentions fixing the handling
    for fragments other than the second one, but I'm not sure how _any_
    fragment could have had their options sanitized with the code
    as it stood.
    
    Tested with python (MTU on lo lowered to 1000 to force fragmentation):
    
      import socket
      s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
      s.setsockopt(socket.IPPROTO_IP, socket.IP_OPTIONS,
                   bytearray([7,4,5,192, 20|0x80,4,1,0]))
      s.sendto(b'1'*2000, ('127.0.0.1', 1234))
    
    Before:
    
    IP (tos 0x0, ttl 64, id 1053, offset 0, flags [+], proto UDP (17), length 996, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256))
        localhost.36500 > localhost.search-agent: UDP, length 2000
    IP (tos 0x0, ttl 64, id 1053, offset 968, flags [+], proto UDP (17), length 996, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256))
        localhost > localhost: udp
    IP (tos 0x0, ttl 64, id 1053, offset 1936, flags [none], proto UDP (17), length 100, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256))
        localhost > localhost: udp
    
    After:
    
    IP (tos 0x0, ttl 96, id 42549, offset 0, flags [+], proto UDP (17), length 996, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256))
        localhost.51607 > localhost.search-agent: UDP, bad length 2000 > 960
    IP (tos 0x0, ttl 96, id 42549, offset 968, flags [+], proto UDP (17), length 996, options (NOP,NOP,NOP,NOP,RA value 256))
        localhost > localhost: udp
    IP (tos 0x0, ttl 96, id 42549, offset 1936, flags [none], proto UDP (17), length 100, options (NOP,NOP,NOP,NOP,RA value 256))
        localhost > localhost: udp
    
    RA (20 | 0x80) is now copied as expected, RR (7) is "NOPed out".
    
    Link: https://lore.kernel.org/netdev/20220107080559.122713-1-ooppublic@163.com/
    Fixes: 19c3401a ("net: ipv4: place control buffer handling away from fragmentation iterators")
    Fixes: 1da177e4 ("Linux-2.6.12-rc2")
    Signed-off-by: default avatarcaixf <ooppublic@163.com>
    Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    27a8caa5
ip_output.c 42.9 KB