1. 05 Aug, 2023 8 commits
    • mptcp: fix disconnect vs accept race · 511b90e3
      Paolo Abeni authored
      Despite commit 0ad529d9 ("mptcp: fix possible divide by zero in
      recvmsg()"), the mptcp protocol is still prone to a race between
      disconnect() (or shutdown) and accept.
      
      The root cause is that the mentioned commit checks the msk-level
      flag, but mptcp_stream_accept() does not acquire the msk-level lock,
      as it can rely directly on the first subflow lock.
      
      As reported by Christoph, that can lead to a race where an msk
      socket is accepted after mptcp_subflow_queue_clean() releases
      the listener socket lock and just before it takes destructive
      actions, leading to the following splat:
      
      BUG: kernel NULL pointer dereference, address: 0000000000000012
      PGD 5a4ca067 P4D 5a4ca067 PUD 37d4c067 PMD 0
      Oops: 0000 [#1] PREEMPT SMP
      CPU: 2 PID: 10955 Comm: syz-executor.5 Not tainted 6.5.0-rc1-gdc7b257ee5dd #37
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
      RIP: 0010:mptcp_stream_accept+0x1ee/0x2f0 include/net/inet_sock.h:330
      Code: 0a 09 00 48 8b 1b 4c 39 e3 74 07 e8 bc 7c 7f fe eb a1 e8 b5 7c 7f fe 4c 8b 6c 24 08 eb 05 e8 a9 7c 7f fe 49 8b 85 d8 09 00 00 <0f> b6 40 12 88 44 24 07 0f b6 6c 24 07 bf 07 00 00 00 89 ee e8 89
      RSP: 0018:ffffc90000d07dc0 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: ffff888037e8d020 RCX: ffff88803b093300
      RDX: 0000000000000000 RSI: ffffffff833822c5 RDI: ffffffff8333896a
      RBP: 0000607f82031520 R08: ffff88803b093300 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000003e83 R12: ffff888037e8d020
      R13: ffff888037e8c680 R14: ffff888009af7900 R15: ffff888009af6880
      FS:  00007fc26d708640(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000012 CR3: 0000000066bc5001 CR4: 0000000000370ee0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       do_accept+0x1ae/0x260 net/socket.c:1872
       __sys_accept4+0x9b/0x110 net/socket.c:1913
       __do_sys_accept4 net/socket.c:1954 [inline]
       __se_sys_accept4 net/socket.c:1951 [inline]
       __x64_sys_accept4+0x20/0x30 net/socket.c:1951
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      
      Address the issue by temporarily removing the pending request sockets
      from the accept queue, so that a racing accept() can't touch them.
      
      After depleting the msk, the ssk still exist as plain TCP sockets:
      re-insert them into the accept queue, so that a later
      inet_csk_listen_stop() will complete the TCP socket disposal.
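
The detach, clean, re-insert pattern described above can be sketched in userspace C. This is only a toy model under stated assumptions: `struct pending`, `struct accept_queue` and every `q_*` helper are hypothetical names invented for illustration, and the spinlock is a plain C11 `atomic_flag`; none of this is the kernel's actual data structure or the code changed by the commit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's request-socket structures. */
struct pending {
    struct pending *next;
    int torn_down;          /* set once the upper-layer state is released */
};

struct accept_queue {
    atomic_flag lock;       /* toy spinlock guarding head */
    struct pending *head;
};

static void q_lock(struct accept_queue *q)
{
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;                   /* spin until the flag was previously clear */
}

static void q_unlock(struct accept_queue *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* What a racing accept() would do: pop the first pending entry, if any. */
static struct pending *q_accept(struct accept_queue *q)
{
    assert(q != NULL);
    q_lock(q);
    struct pending *p = q->head;
    if (p) {
        q->head = p->next;
        p->next = NULL;
    }
    q_unlock(q);
    return p;
}

/* Detach every pending entry so that q_accept() cannot observe them. */
static struct pending *q_detach_all(struct accept_queue *q)
{
    assert(q != NULL);
    q_lock(q);
    struct pending *list = q->head;
    q->head = NULL;
    q_unlock(q);
    return list;
}

/* Put a previously detached list back, ahead of any newer entries. */
static void q_reinsert(struct accept_queue *q, struct pending *list)
{
    assert(q != NULL);
    if (!list)
        return;
    struct pending *tail = list;
    while (tail->next)
        tail = tail->next;
    q_lock(q);
    tail->next = q->head;
    q->head = list;
    q_unlock(q);
}

/* Cleanup path: the destructive step runs while the entries are hidden. */
static void q_clean(struct accept_queue *q)
{
    struct pending *list = q_detach_all(q);
    for (struct pending *p = list; p; p = p->next)
        p->torn_down = 1;
    q_reinsert(q, list);
}
```

While the entries are detached, `q_accept()` simply sees an empty queue, so it cannot pick up an entry that is in the middle of being torn down; after `q_reinsert()` the entries are visible again for the final disposal.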
      
      Fixes: 2a6a870e ("mptcp: stops worker on unaccepted sockets at listener close")
      Cc: stable@vger.kernel.org
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/423
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-4-6671b1ab11cc@tessares.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: avoid bogus reset on fallback close · ff18f9ef
      Paolo Abeni authored
      Since the blamed commit, the MPTCP protocol unconditionally sends
      TCP resets on all the subflows on disconnect().
      
      That fits full-blown MPTCP sockets - to implement the fastclose
      mechanism - but on fallback sockets it causes unexpected corruption
      of the data stream, caught as sporadic self-test failures.
      
      Fixes: d21f8348 ("mptcp: use fastclose on more edge scenarios")
      Cc: stable@vger.kernel.org
      Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/419
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-3-6671b1ab11cc@tessares.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: mptcp: join: fix 'implicit EP' test · c8c101ae
      Andrea Claudi authored
      mptcp_join 'implicit EP' test currently fails when using ip mptcp:
      
        $ ./mptcp_join.sh -iI
        <snip>
        001 implicit EP    creation[fail] expected '10.0.2.2 10.0.2.2 id 1 implicit' found '10.0.2.2 id 1 rawflags 10 '
        Error: too many addresses or duplicate one: -22.
                           ID change is prevented[fail] expected '10.0.2.2 10.0.2.2 id 1 implicit' found '10.0.2.2 id 1 rawflags 10 '
                           modif is allowed[fail] expected '10.0.2.2 10.0.2.2 id 1 signal' found '10.0.2.2 id 1 signal '
      
      This happens for two reasons:
      - iproute v6.3.0 does not support the implicit flag, fixed with
        iproute2-next commit 3a2535a41854 ("mptcp: add support for implicit
        flag")
      - pm_nl_check_endpoint wrongly expects the ip address to be repeated
        twice in the iproute output, and does not account for a trailing
        whitespace in it.
      
      Fix the issue by trimming the whitespace in the output string and
      removing the double address from the expected string.
      
      Fixes: 69c6ce7b ("selftests: mptcp: add implicit endpoint test case")
      Cc: stable@vger.kernel.org
      Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-2-6671b1ab11cc@tessares.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: mptcp: join: fix 'delete and re-add' test · aaf2123a
      Andrea Claudi authored
      mptcp_join 'delete and re-add' test fails when using ip mptcp:
      
        $ ./mptcp_join.sh -iI
        <snip>
        002 delete and re-add                    before delete[ ok ]
                                                 mptcp_info subflows=1         [ ok ]
        Error: argument "ADDRESS" is wrong: invalid for non-zero id address
                                                 after delete[fail] got 2:2 subflows expected 1
      
      This happens because endpoint delete includes an ip address while id is
      not 0, contrary to what is indicated in the ip mptcp man page:
      
      "When used with the delete id operation, an IFADDR is only included when
      the ID is 0."
      
      Fix the issue by using the $addr variable in pm_nl_del_endpoint()
      only when id is 0.
      
      Fixes: 34aa6e3b ("selftests: mptcp: add ip mptcp wrappers")
      Cc: stable@vger.kernel.org
      Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Link: https://lore.kernel.org/r/20230803-upstream-net-20230803-misc-fixes-6-5-v1-1-6671b1ab11cc@tessares.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'tunnels-fix-ipv4-pmtu-icmp-checksum' · ec935188
      Jakub Kicinski authored
      Florian Westphal says:
      
      ====================
      tunnels: fix ipv4 pmtu icmp checksum
      
      The checksum of the generated ipv4 icmp pmtud message is
      only correct if the skb that causes the icmp error generation
      is linear.
      
      Fix this and add a selftest for this.
      ====================
      
      Link: https://lore.kernel.org/r/20230803152653.29535-1-fw@strlen.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: net: test vxlan pmtu exceptions with tcp · 136a1b43
      Florian Westphal authored
      TCP might get stuck if a nonlinear skb exceeds the path MTU, because
      the icmp error contains an incorrect icmp checksum in that case.
      
      Extend the existing test for vxlan to also send at least 1MB worth of
      data via TCP in addition to the existing 'large icmp packet adds
      route exception'.
      
      On my test VM this fails due to 0-size output file without
      "tunnels: fix kasan splat when generating ipv4 pmtu error".
      
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Link: https://lore.kernel.org/r/20230803152653.29535-3-fw@strlen.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tunnels: fix kasan splat when generating ipv4 pmtu error · 6a7ac3d2
      Florian Westphal authored
      If we try to emit an icmp error in response to a nonlinear skb, we get
      
      BUG: KASAN: slab-out-of-bounds in ip_compute_csum+0x134/0x220
      Read of size 4 at addr ffff88811c50db00 by task iperf3/1691
      CPU: 2 PID: 1691 Comm: iperf3 Not tainted 6.5.0-rc3+ #309
      [..]
       kasan_report+0x105/0x140
       ip_compute_csum+0x134/0x220
       iptunnel_pmtud_build_icmp+0x554/0x1020
       skb_tunnel_check_pmtu+0x513/0xb80
       vxlan_xmit_one+0x139e/0x2ef0
       vxlan_xmit+0x1867/0x2760
       dev_hard_start_xmit+0x1ee/0x4f0
       br_dev_queue_push_xmit+0x4d1/0x660
       [..]
      
      ip_compute_csum() cannot deal with nonlinear skbs, so avoid it.
      After this change, splat is gone and iperf3 is no longer stuck.
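
Why a checksum routine that expects one contiguous buffer goes wrong on nonlinear data can be illustrated with a small userspace sketch. The commit message does not show the patch, so this is not the kernel fix: `struct frag`, `csum_linear()` and `csum_frags()` are hypothetical names, and the code only demonstrates an RFC 1071 style 16-bit one's complement checksum that walks scattered fragments instead of assuming all bytes sit behind a single pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one piece of a scattered buffer. */
struct frag {
    const uint8_t *data;
    size_t len;
};

/* Fold a wide accumulator into 16 bits and return its complement. */
static uint16_t csum_fold64(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Checksum of a contiguous buffer: only valid if all len bytes are there. */
static uint16_t csum_linear(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    uint64_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += (i & 1) ? buf[i] : (uint64_t)buf[i] << 8;
    return csum_fold64(sum);
}

/*
 * Checksum of the same bytes spread over several fragments. The running
 * position keeps the high/low byte pairing right even when a fragment
 * has an odd length.
 */
static uint16_t csum_frags(const struct frag *frags, size_t nfrags)
{
    assert(frags != NULL || nfrags == 0);
    uint64_t sum = 0;
    size_t pos = 0;
    for (size_t f = 0; f < nfrags; f++) {
        for (size_t i = 0; i < frags[f].len; i++, pos++) {
            uint8_t b = frags[f].data[i];
            sum += (pos & 1) ? b : (uint64_t)b << 8;
        }
    }
    return csum_fold64(sum);
}
```

Calling `csum_linear()` with the total length but a pointer to only the first fragment would read past the end of that fragment, which is the kind of out-of-bounds access the KASAN report above flags; `csum_frags()` visits each piece within its own bounds and yields the same value as `csum_linear()` does on a truly contiguous copy.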
      
      Fixes: 4cb47a86 ("tunnels: PMTU discovery support for directly bridged IP packets")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Link: https://lore.kernel.org/r/20230803152653.29535-2-fw@strlen.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/packet: annotate data-races around tp->status · 8a989617
      Eric Dumazet authored
      Another syzbot report [1] is about tp->status lockless reads
      from __packet_get_status()
      
      [1]
      BUG: KCSAN: data-race in __packet_rcv_has_room / __packet_set_status
      
      write to 0xffff888117d7c080 of 8 bytes by interrupt on cpu 0:
      __packet_set_status+0x78/0xa0 net/packet/af_packet.c:407
      tpacket_rcv+0x18bb/0x1a60 net/packet/af_packet.c:2483
      deliver_skb net/core/dev.c:2173 [inline]
      __netif_receive_skb_core+0x408/0x1e80 net/core/dev.c:5337
      __netif_receive_skb_one_core net/core/dev.c:5491 [inline]
      __netif_receive_skb+0x57/0x1b0 net/core/dev.c:5607
      process_backlog+0x21f/0x380 net/core/dev.c:5935
      __napi_poll+0x60/0x3b0 net/core/dev.c:6498
      napi_poll net/core/dev.c:6565 [inline]
      net_rx_action+0x32b/0x750 net/core/dev.c:6698
      __do_softirq+0xc1/0x265 kernel/softirq.c:571
      invoke_softirq kernel/softirq.c:445 [inline]
      __irq_exit_rcu+0x57/0xa0 kernel/softirq.c:650
      sysvec_apic_timer_interrupt+0x6d/0x80 arch/x86/kernel/apic/apic.c:1106
      asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645
      smpboot_thread_fn+0x33c/0x4a0 kernel/smpboot.c:112
      kthread+0x1d7/0x210 kernel/kthread.c:379
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
      
      read to 0xffff888117d7c080 of 8 bytes by interrupt on cpu 1:
      __packet_get_status net/packet/af_packet.c:436 [inline]
      packet_lookup_frame net/packet/af_packet.c:524 [inline]
      __tpacket_has_room net/packet/af_packet.c:1255 [inline]
      __packet_rcv_has_room+0x3f9/0x450 net/packet/af_packet.c:1298
      tpacket_rcv+0x275/0x1a60 net/packet/af_packet.c:2285
      deliver_skb net/core/dev.c:2173 [inline]
      dev_queue_xmit_nit+0x38a/0x5e0 net/core/dev.c:2243
      xmit_one net/core/dev.c:3574 [inline]
      dev_hard_start_xmit+0xcf/0x3f0 net/core/dev.c:3594
      __dev_queue_xmit+0xefb/0x1d10 net/core/dev.c:4244
      dev_queue_xmit include/linux/netdevice.h:3088 [inline]
      can_send+0x4eb/0x5d0 net/can/af_can.c:276
      bcm_can_tx+0x314/0x410 net/can/bcm.c:302
      bcm_tx_timeout_handler+0xdb/0x260
      __run_hrtimer kernel/time/hrtimer.c:1685 [inline]
      __hrtimer_run_queues+0x217/0x700 kernel/time/hrtimer.c:1749
      hrtimer_run_softirq+0xd6/0x120 kernel/time/hrtimer.c:1766
      __do_softirq+0xc1/0x265 kernel/softirq.c:571
      run_ksoftirqd+0x17/0x20 kernel/softirq.c:939
      smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164
      kthread+0x1d7/0x210 kernel/kthread.c:379
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
      
      value changed: 0x0000000000000000 -> 0x0000000020000081
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 19 Comm: ksoftirqd/1 Not tainted 6.4.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
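
What "annotating" such a lockless access usually looks like can be sketched in userspace C. The commit message does not include the patch, so this assumes the annotation takes the common READ_ONCE()/WRITE_ONCE() form; the macros below are simplified local approximations (they rely on the GNU `__typeof__` extension), and `struct frame_hdr` with its two helpers is a hypothetical stand-in for the ring frame header, not the af_packet code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace approximations of the kernel macros (assumption). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical frame header shared between a writer and lockless readers. */
struct frame_hdr {
    unsigned long tp_status;
};

/* Writer side: a single store the compiler may not split or repeat. */
static void frame_set_status(struct frame_hdr *h, unsigned long status)
{
    assert(h != NULL);
    WRITE_ONCE(h->tp_status, status);
}

/* Reader side: a single load the compiler may not cache or re-read. */
static unsigned long frame_get_status(struct frame_hdr *h)
{
    assert(h != NULL);
    return READ_ONCE(h->tp_status);
}
```

The annotations do not add locking or ordering; they mark the concurrent access as intentional and constrain the compiler to one access per use.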
      
      Fixes: 69e3c75f ("net: TX_RING and packet mmap")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20230803145600.2937518-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 04 Aug, 2023 6 commits
  3. 03 Aug, 2023 23 commits
  4. 02 Aug, 2023 3 commits