An error occurred fetching the project authors.
  1. 22 Aug, 2017 2 commits
  2. 18 Aug, 2017 1 commit
  3. 07 Aug, 2017 1 commit
  4. 03 Aug, 2017 3 commits
    • Neal Cardwell's avatar
      tcp: fix xmit timer to only be reset if data ACKed/SACKed · df92c839
      Neal Cardwell authored
      Fix a TCP loss recovery performance bug raised recently on the netdev
      list, in two threads:
      
      (i)  July 26, 2017: netdev thread "TCP fast retransmit issues"
      (ii) July 26, 2017: netdev thread:
           "[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one
           outstanding TLP retransmission"
      
      The basic problem is that incoming TCP packets that did not indicate
      forward progress could cause the xmit timer (TLP or RTO) to be rearmed
      and pushed back in time. In certain corner cases this could result in
      the following problems noted in these threads:
      
       - Repeated ACKs coming in with bogus SACKs corrupted by middleboxes
         could cause TCP to repeatedly schedule TLPs forever. We kept
         sending TLPs after every ~200ms, which elicited bogus SACKs, which
         caused more TLPs, ad infinitum; we never fired an RTO to fill in
         the holes.
      
       - Incoming data segments could, in some cases, cause us to reschedule
         our RTO or TLP timer further out in time, for no good reason. This
         could cause repeated inbound data to result in stalls in outbound
         data, in the presence of packet loss.
      
      This commit fixes these bugs by changing the TLP and RTO ACK
      processing to:
      
       (a) Only reschedule the xmit timer once per ACK.
      
       (b) Only reschedule the xmit timer if tcp_clean_rtx_queue() deems the
           ACK indicates sufficient forward progress (a packet was
           cumulatively ACKed, or we got a SACK for a packet that was sent
           before the most recent retransmit of the write queue head).
      
      This brings us back into closer compliance with the RFCs, since, as
      the comment for tcp_rearm_rto() notes, we should only restart the RTO
      timer after forward progress on the connection. Previously we were
      restarting the xmit timer even in these cases where there was no
      forward progress.
      
      As a side benefit, this commit simplifies and speeds up the TCP timer
      arming logic. We had been calling inet_csk_reset_xmit_timer() three
      times on normal ACKs that cumulatively acknowledged some data:
      
      1) Once near the top of tcp_ack() to switch from TLP timer to RTO:
              if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
                     tcp_rearm_rto(sk);
      
      2) Once in tcp_clean_rtx_queue(), to update the RTO:
              if (flag & FLAG_ACKED) {
                     tcp_rearm_rto(sk);
      
      3) Once in tcp_ack() after tcp_fastretrans_alert() to switch from RTO
         to TLP:
              if (icsk->icsk_pending == ICSK_TIME_RETRANS)
                     tcp_schedule_loss_probe(sk);
      
      This commit, by only rescheduling the xmit timer once per ACK,
      simplifies the code and reduces CPU overhead.
      
      This commit was tested in an A/B test with Google web server
      traffic. SNMP stats and request latency metrics were within noise
      levels, substantiating that for normal web traffic patterns this is a
      rare issue. This commit was also tested with packetdrill tests to
      verify that it fixes the timer behavior in the corner cases discussed
      in the netdev threads mentioned above.
      
      This patch is a bug fix patch intended to be queued for -stable
      relases.
      
      Fixes: 6ba8a3b1 ("tcp: Tail loss probe (TLP)")
      Reported-by: default avatarKlavs Klavsen <kl@vsen.dk>
      Reported-by: default avatarMao Wenan <maowenan@huawei.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarNandita Dukkipati <nanditad@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      df92c839
    • Neal Cardwell's avatar
      tcp: introduce tcp_rto_delta_us() helper for xmit timer fix · e1a10ef7
      Neal Cardwell authored
      Pure refactor. This helper will be required in the xmit timer fix
      later in the patch series. (Because the TLP logic will want to make
      this calculation.)
      
      Fixes: 6ba8a3b1 ("tcp: Tail loss probe (TLP)")
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarNandita Dukkipati <nanditad@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e1a10ef7
    • Neal Cardwell's avatar
      tcp: remove extra POLL_OUT added for finished active connect() · d06c3583
      Neal Cardwell authored
      Commit 45f119bf ("tcp: remove header prediction") introduced a
      minor bug: the sk_state_change() and sk_wake_async() notifications for
      a completed active connection happen twice: once in this new spot
      inside tcp_finish_connect() and once in the existing code in
      tcp_rcv_synsent_state_process() immediately after it calls
      tcp_finish_connect(). This commit remoes the duplicate POLL_OUT
      notifications.
      
      Fixes: 45f119bf ("tcp: remove header prediction")
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d06c3583
  5. 02 Aug, 2017 2 commits
  6. 31 Jul, 2017 4 commits
  7. 25 Jul, 2017 1 commit
  8. 01 Jul, 2017 3 commits
  9. 08 Jun, 2017 4 commits
  10. 02 Jun, 2017 1 commit
  11. 25 May, 2017 1 commit
    • Eric Dumazet's avatar
      tcp: better validation of received ack sequences · d0e1a1b5
      Eric Dumazet authored
      Paul Fiterau Brostean reported :
      
      <quote>
      Linux TCP stack we analyze exhibits behavior that seems odd to me.
      The scenario is as follows (all packets have empty payloads, no window
      scaling, rcv/snd window size should not be a factor):
      
             TEST HARNESS (CLIENT)                        LINUX SERVER
      
         1.  -                                          LISTEN (server listen,
      then accepts)
      
         2.  - --> <SEQ=100><CTL=SYN>               --> SYN-RECEIVED
      
         3.  - <-- <SEQ=300><ACK=101><CTL=SYN,ACK>  <-- SYN-RECEIVED
      
         4.  - --> <SEQ=101><ACK=301><CTL=ACK>      --> ESTABLISHED
      
         5.  - <-- <SEQ=301><ACK=101><CTL=FIN,ACK>  <-- FIN WAIT-1 (server
      opts to close the data connection calling "close" on the connection
      socket)
      
         6.  - --> <SEQ=101><ACK=99999><CTL=FIN,ACK> --> CLOSING (client sends
      FIN,ACK with not yet sent acknowledgement number)
      
         7.  - <-- <SEQ=302><ACK=102><CTL=ACK>      <-- CLOSING (ACK is 102
      instead of 101, why?)
      
      ... (silence from CLIENT)
      
         8.  - <-- <SEQ=301><ACK=102><CTL=FIN,ACK>  <-- CLOSING
      (retransmission, again ACK is 102)
      
      Now, note that packet 6 while having the expected sequence number,
      acknowledges something that wasn't sent by the server. So I would
      expect
      the packet to maybe prompt an ACK response from the server, and then be
      ignored. Yet it is not ignored and actually leads to an increase of the
      acknowledgement number in the server's retransmission of the FIN,ACK
      packet. The explanation I found is that the FIN  in packet 6 was
      processed, despite the acknowledgement number being unacceptable.
      Further experiments indeed show that the server processes this FIN,
      transitioning to CLOSING, then on receiving an ACK for the FIN it had
      send in packet 5, the server (or better said connection) transitions
      from CLOSING to TIME_WAIT (as signaled by netstat).
      
      </quote>
      
      Indeed, tcp_rcv_state_process() calls tcp_ack() but
      does not exploit the @acceptable status but for TCP_SYN_RECV
      state.
      
      What we want here is to send a challenge ACK, if not in TCP_SYN_RECV
      state. TCP_FIN_WAIT1 state is not the only state we should fix.
      
      Add a FLAG_NO_CHALLENGE_ACK so that tcp_rcv_state_process()
      can choose to send a challenge ACK and discard the packet instead
      of wrongly change socket state.
      
      With help from Neal Cardwell.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarPaul Fiterau Brostean <p.fiterau-brostean@science.ru.nl>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d0e1a1b5
  12. 19 May, 2017 1 commit
  13. 18 May, 2017 1 commit
  14. 17 May, 2017 6 commits
  15. 16 May, 2017 1 commit
  16. 12 May, 2017 1 commit
  17. 05 May, 2017 1 commit
    • Eric Dumazet's avatar
      tcp: randomize timestamps on syncookies · 84b114b9
      Eric Dumazet authored
      Whole point of randomization was to hide server uptime, but an attacker
      can simply start a syn flood and TCP generates 'old style' timestamps,
      directly revealing server jiffies value.
      
      Also, TSval sent by the server to a particular remote address vary
      depending on syncookies being sent or not, potentially triggering PAWS
      drops for innocent clients.
      
      Lets implement proper randomization, including for SYNcookies.
      
      Also we do not need to export sysctl_tcp_timestamps, since it is not
      used from a module.
      
      In v2, I added Florian feedback and contribution, adding tsoff to
      tcp_get_cookie_sock().
      
      v3 removed one unused variable in tcp_v4_connect() as Florian spotted.
      
      Fixes: 95a22cae ("tcp: randomize tcp timestamp offsets for each connection")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarFlorian Westphal <fw@strlen.de>
      Tested-by: default avatarFlorian Westphal <fw@strlen.de>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      84b114b9
  18. 26 Apr, 2017 6 commits