1. 23 Jun, 2021 29 commits
  2. 22 Jun, 2021 11 commits
    • Merge branch 'mptcp-C-flag-and-fixes' · 38f75922
      David S. Miller authored
      Mat Martineau says:
      
      ====================
      mptcp: Connection-time 'C' flag and two fixes
      
      Here are six more patches from the MPTCP tree.
      
      Most of them add support for the 'C' flag in the MPTCP connection-time
      option headers. This flag affects how the initial address and port are
      treated by each peer. Normally one peer may send MP_JOIN requests to the
      remote address and port that were used when initiating the MPTCP
      connection. The 'C' bit indicates that MP_JOINs should only be sent to
      remote addresses that have been advertised with ADD_ADDR.
      
      The other two patches are unrelated improvements.
      
      Patches 1-4: Add the 'C' flag feature, a sysctl to optionally enable it,
      and a selftest.
      
      Patch 5: Adjust rp_filter settings in a selftest.
      
      Patch 6: Improve rbuf cleanup for MPTCP sockets.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: refine mptcp_cleanup_rbuf · fde56eea
      Paolo Abeni authored
      The current cleanup rbuf tries a bit too hard to avoid acquiring
      the subflow socket lock. We may end up delaying the needed ack,
      or skip acking a blocked subflow.
      
      Address the above by extending the conditions used to trigger the
      cleanup to reflect more closely what TCP does, and by invoking
      tcp_cleanup_rbuf() on all the active subflows.
      
      Note that we can't replicate the exact tests implemented in
      tcp_cleanup_rbuf(), as MPTCP lacks some of the required info - e.g.
      ping-pong mode.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mptcp: turn rp_filter off on each NIC · d8e336f7
      Yonglong Li authored
      To turn rp_filter off we should:
      
        echo 0 > /proc/sys/net/ipv4/conf/default/rp_filter
      
      and
      
        echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter
      
      before the NICs are created.
      Co-developed-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mptcp: add deny_join_id0 testcases · 0cddb4a6
      Geliang Tang authored
      This patch adds a new argument '-d' to the mptcp_join.sh script, to
      invoke the testcases for the MP_CAPABLE 'C' flag.
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: add deny_join_id0 in mptcp_options_received · df377be3
      Geliang Tang authored
      This patch adds a new flag named deny_join_id0 in struct
      mptcp_options_received. It is set when an MP_CAPABLE option with the
      flag MPTCP_CAP_DENY_JOIN_ID0 is received.
      
      Also add a new flag remote_deny_join_id0 in struct mptcp_pm_data. When the
      flag deny_join_id0 is set, set this remote_deny_join_id0 flag.
      
      In mptcp_pm_create_subflow_or_signal_addr, if the remote_deny_join_id0 flag
      is set, and the remote address id is zero, stop this connection.
      Suggested-by: Florian Westphal <fw@strlen.de>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
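
The receive-side behaviour described in this commit can be pictured with a minimal Python sketch. It is not the kernel code: both function names are hypothetical, and the 0x20 mask assumes the flags A..H fill the MP_CAPABLE flags byte from the most significant bit down, so that 'C' (the third bit) is 0x20.

```python
# Assumption: 'C' is the third most significant bit of the flags byte.
MPTCP_CAP_DENY_JOIN_ID0 = 0x20


def parse_deny_join_id0(mp_capable_flags: int) -> bool:
    """Return True when the peer set the 'C' flag in MP_CAPABLE."""
    return bool(mp_capable_flags & MPTCP_CAP_DENY_JOIN_ID0)


def may_join(remote_addr_id: int, remote_deny_join_id0: bool) -> bool:
    """Hypothetical helper: refuse joins to the initial address (id 0)
    when the peer asked for that, allow everything else."""
    return not (remote_deny_join_id0 and remote_addr_id == 0)
```

With the flag set, only addresses advertised via ADD_ADDR (non-zero ids in this sketch) remain valid join targets.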
    • mptcp: add allow_join_id0 in mptcp_out_options · bab6b88e
      Geliang Tang authored
      This patch defines a new flag MPTCP_CAP_DENY_JOIN_ID0 for the third bit,
      labeled "C", of the MP_CAPABLE option.
      
      Add a new flag allow_join_id0 in struct mptcp_out_options. If this flag is
      not set, send out the MP_CAPABLE option with the flag MPTCP_CAP_DENY_JOIN_ID0.
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
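
As a rough illustration of the send side, the sketch below builds the MP_CAPABLE flags byte in Python. It rests on two assumptions that the commit message does not spell out: that the 'C' bit is 0x20 (third bit from the most significant end), and that the deny bit is emitted when allow_join_id0 is clear, which is what the flag names and the cover letter's description of the 'C' bit suggest. The function name is hypothetical.

```python
# Assumption: 'C' is the third most significant bit of the flags byte.
MPTCP_CAP_DENY_JOIN_ID0 = 0x20


def mp_capable_flags(base_flags: int, allow_join_id0: bool) -> int:
    """Return the flags byte with the 'C' bit reflecting allow_join_id0."""
    if not allow_join_id0:
        # Joins to the initial address/port are not wanted: set 'C'.
        return (base_flags | MPTCP_CAP_DENY_JOIN_ID0) & 0xFF
    # Joins to the initial address/port are fine: make sure 'C' is clear.
    return base_flags & ~MPTCP_CAP_DENY_JOIN_ID0 & 0xFF
```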
    • mptcp: add sysctl allow_join_initial_addr_port · d2f77960
      Geliang Tang authored
      This patch adds a new sysctl, named allow_join_initial_addr_port, to
      control whether peers are allowed to send join requests to the IP
      address and port number used by the initial subflow.
      Suggested-by: Florian Westphal <fw@strlen.de>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sctp-packetization-path-MTU' · a432c771
      David S. Miller authored
      Xin Long says:
      
      ====================
      sctp: implement RFC8899: Packetization Layer Path MTU Discovery for SCTP transport
      
      Overview (from RFC8899):
      
        In contrast to PMTUD, Packetization Layer Path MTU Discovery
        (PLPMTUD) [RFC4821] introduces a method that does not rely upon
        reception and validation of PTB messages.  It is therefore more
        robust than Classical PMTUD.  This has become the recommended
        approach for implementing discovery of the PMTU [BCP145].
      
        It uses a general strategy in which the PL sends probe packets to
        search for the largest size of unfragmented datagram that can be sent
        over a network path.  Probe packets are sent to explore using a
        larger packet size.  If a probe packet is successfully delivered (as
        determined by the PL), then the PLPMTU is raised to the size of the
        successful probe.  If a black hole is detected (e.g., where packets
        of size PLPMTU are consistently not received), the method reduces the
        PLPMTU.
      
      SCTP Probe Packets:
      
        As the RFC suggested, the probe packets consist of an SCTP common header
        followed by a HEARTBEAT chunk and a PAD chunk. The PAD chunk is used to
        control the length of the probe packet.  The HEARTBEAT chunk is used to
        trigger the sending of a HEARTBEAT ACK chunk to confirm this probe on
        the HEARTBEAT sender.
      
        The HEARTBEAT chunk also carries a Heartbeat Information parameter that
        includes the probe size to help an implementation associate a HEARTBEAT
        ACK with the size of probe that was sent. The sender uses the nonce and
        the probe size to verify the information returned.
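
The probe layout described above can be sketched in Python as follows. This is an illustration only, not the kernel implementation: the Heartbeat Information payload layout (an 8-byte nonce followed by a 4-byte probe size), the function names, and the choice to make the whole SCTP packet (without the IP header) equal to probe_size are assumptions, and the checksum field is left at zero.

```python
import struct

SCTP_CID_HEARTBEAT = 4      # HEARTBEAT chunk type
SCTP_CID_PAD = 0x84         # PAD chunk type
SCTP_PARAM_HB_INFO = 1      # Heartbeat Information parameter type


def build_probe(sport, dport, vtag, nonce, probe_size):
    """Build common header + HEARTBEAT + PAD, probe_size bytes in total."""
    # Assumed payload: 64-bit nonce followed by the 32-bit probe size.
    hb_data = struct.pack("!QI", nonce, probe_size)
    hb_info = struct.pack("!HH", SCTP_PARAM_HB_INFO, 4 + len(hb_data)) + hb_data
    hb_chunk = struct.pack("!BBH", SCTP_CID_HEARTBEAT, 0, 4 + len(hb_info)) + hb_info
    # Common header: ports, verification tag, checksum (not computed here).
    header = struct.pack("!HHII", sport, dport, vtag, 0)
    pad_len = probe_size - len(header) - len(hb_chunk)
    if pad_len < 4 or pad_len % 4:
        raise ValueError("probe_size leaves no room for an aligned PAD chunk")
    # The PAD chunk length includes its own 4-byte chunk header.
    pad_chunk = struct.pack("!BBH", SCTP_CID_PAD, 0, pad_len) + bytes(pad_len - 4)
    return header + hb_chunk + pad_chunk


def matches_probe(hb_info, nonce, probe_size):
    """Check the Heartbeat Information echoed back in a HEARTBEAT ACK."""
    ptype, plen, got_nonce, got_size = struct.unpack("!HHQI", hb_info[:16])
    return (ptype == SCTP_PARAM_HB_INFO and plen == 16
            and got_nonce == nonce and got_size == probe_size)
```

The PAD chunk simply absorbs whatever is left after the header and the HEARTBEAT chunk, which is how the probe length is controlled.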
      
      Detailed Implementation on SCTP:
      
                             +------+
                    +------->| Base |-----------------+ Connectivity
                    |        +------+                 | or BASE_PLPMTU
                    |           |                     | confirmation failed
                    |           |                     v
                    |           | Connectivity    +-------+
                    |           | and BASE_PLPMTU | Error |
                    |           | confirmed       +-------+
                    |           |                     | Consistent
                    |           v                     | connectivity
         Black Hole |       +--------+                | and BASE_PLPMTU
          detected  |       | Search |<---------------+ confirmed
                    |       +--------+
                    |          ^  |
                    |          |  |
                    |    Raise |  | Search
                    |    timer |  | algorithm
                    |  expired |  | completed
                    |          |  |
                    |          |  v
                    |   +-----------------+
                    +---| Search Complete |
                        +-----------------+
      
        When PLPMTUD is enabled, it's in Base state, and starts to probe with
        BASE_PLPMTU (1200). If this probe succeeds, it goes to Search state;
        If this probe fails, it goes to Error state under which pl.pmtu goes
        down to MIN_PLPMTU (512) and keeps probing with BASE_PLPMTU until it
        succeeds and goes to Search state.
      
        During the Search state, the probe size initially grows by a Big step
        (32) each time the last probe succeeds. Once a probe (such as 1420)
        fails after MAX_PROBES (3) attempts, probe_size goes back to the last
        one (1420 - 32 = 1388), 'probe_high' is set to 1420, and the growing
        step becomes a Small one (4). Probing then continues, growing by a
        Small step each round, until it reaches the optimal size (such as
        1400), i.e. the probe with the next size (1404) fails. It then syncs
        this size to pathmtu and goes to Complete state.
      
        In Complete state, it only does a probe check for the pathmtu just
        set. If the check fails, a Black Hole is detected and it goes back to
        Base state. If it succeeds, it goes back to Search state, and probing
        continues by growing a Small step (1400 + 4). If this probe fails,
        probe_high is set, it goes back to 1388 and then to Complete state,
        which is normally a kind of loop. However, if the env's pathmtu
        somehow changes to a bigger size, this probe will succeed and probing
        continues by growing a Big step (1400 + 32) each round until another
        probe fails.
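
A simplified Python simulation of the Base and Search phases may help in following the step sizes. It is a sketch under stated assumptions rather than the kernel algorithm: a probe is taken to succeed exactly when its size does not exceed the real path MTU, the MAX_PROBES retries, the timers and the Complete-state re-check are left out, and the function name is hypothetical.

```python
BASE_PLPMTU = 1200
BIG_STEP = 32
SMALL_STEP = 4


def search_plpmtu(path_mtu):
    """Return the PLPMTU the search settles on, or None if even the
    BASE_PLPMTU probe fails (the Error state in the diagram)."""
    if BASE_PLPMTU > path_mtu:
        return None
    pmtu = BASE_PLPMTU          # last size confirmed by a probe
    probe_high = None           # first size known to fail
    step = BIG_STEP
    probe = pmtu + step
    while True:
        if probe <= path_mtu:   # probe acknowledged
            pmtu = probe
            probe += step
        elif probe_high is None:
            # First failure: remember it, fall back to the last good
            # size and continue with the Small step.
            probe_high = probe
            step = SMALL_STEP
            probe = pmtu + step
        else:
            # A Small-step probe failed: the search is complete.
            return pmtu
```

For a path MTU of 1400 this sketch probes 1232 up to 1392 with the Big step, fails at 1424, then confirms 1396 and 1400 with the Small step and stops when 1404 fails.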
      
      PTB Messages Process:
      
        PLPMTUD doesn't rely on these packets to find the pmtu, and shouldn't
        trust them either. When processing them, it only changes the probe_size
        to PL_PTB_SIZE (info - hlen) if 'pl.pmtu < PL_PTB_SIZE < the current
        probe_size' during Search state, as this could help probe_size get to
        the optimal size faster. For example:
      
        pl.pmtu = 1388, probe_size = 1420, while the env's pathmtu = 1400.
        When probe_size is 1420, a Toobig packet with 1400 comes back. If probe
        size changes to use 1400, it will save quite a few rounds to get there.
        But of course after having this value, PLPMTUD will still verify it on
        its own before using it.
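
The PTB rule above boils down to one range check, sketched here in Python. The function name and the string used for the state are hypothetical, and hlen stands for whatever header length the implementation subtracts from the reported value; only the condition itself comes from the description.

```python
def probe_size_after_ptb(state, pl_pmtu, probe_size, info, hlen):
    """Return the probe size to use after a PTB (Toobig) message."""
    ptb_size = info - hlen
    # Only a value strictly between the confirmed pmtu and the current
    # probe size is useful, and only while searching.
    if state == "search" and pl_pmtu < ptb_size < probe_size:
        return ptb_size
    return probe_size
```

In the example above (pl.pmtu = 1388, probe_size = 1420), a PTB size of 1400 moves the next probe to 1400, which still has to be confirmed by a probe of its own.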
      
      Patches:
      
        - Patch 1-6: introduce some new constants/variables from the RFC, sysctl
          and members in transport, APIs for the following patches, chunks and
          a timer for the probe sending and some code for the probe receiving.
      
        - Patch 7-9: implement the state transition on the tx path, rx path and
          toobig ICMP packet processing. This is the main algorithm part.
      
        - Patch 10: activate this feature
      
        - Patch 11-14: improve the process for ICMP packets for SCTP over UDP,
          so that it can also be covered by this feature.
      
      Tests:
      
        - do sysctl and setsockopt tests for this feature's enabling and disabling.
      
        - get these pr_debug points for this feature by
            # cat /sys/kernel/debug/dynamic_debug/control | grep PLP
          and enable them on kernel dynamic debug, then play with the pathmtu and
          check if the state transition and plpmtu change match the RFC.
      
        - do the above tests for SCTP over IPv4/IPv6 and SCTP over UDP.
      
      v1->v2:
        - See Patch 06/14.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: process sctp over udp icmp err on sctp side · 9e47df00
      Xin Long authored
      Previously, sctp over udp was using udp tunnel's icmp err process, which
      only does sk lookup on sctp side. However, sctp's icmp error process
      has more things to do, like syncing assoc pmtu/retransmitting packets
      for toobig type err, and starting proto_unreach_timer for unreach type
      err, etc.
      
      Now that PLPMTUD has been added, it also requires processing toobig type
      err on sctp side. This patch processes icmp err on sctp side by parsing
      the type/code/info in .encap_err_lookup and calling sctp's icmp
      processing functions. Note that as the 'redirect' err process needs to
      know the outer ip(v6) header, we have to leave it to udp(v6)_err to
      handle.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: extract sctp_v4_err_handle function from sctp_v4_err · d8306075
      Xin Long authored
      This patch is to extract sctp_v4_err_handle() from sctp_v4_err() to
      only handle the icmp err after the sock lookup, and it also makes
      the code clearer.
      
      sctp_v4_err_handle() will be used in sctp over udp's err handling
      in the following patch.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: extract sctp_v6_err_handle function from sctp_v6_err · f6549bd3
      Xin Long authored
      This patch is to extract sctp_v6_err_handle() from sctp_v6_err() to
      only handle the icmp err after the sock lookup, and it also makes
      the code clearer.
      
      sctp_v6_err_handle() will be used in sctp over udp's err handling
      in the following patch.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>