1. 17 May, 2020 10 commits
    • John Hubbard's avatar
      rds: convert get_user_pages() --> pin_user_pages() · dbfe7d74
      John Hubbard authored
      This code was using get_user_pages_fast(), in a "Case 2" scenario
      (DMA/RDMA), using the categorization from [1]. That means that it's
      time to convert the get_user_pages_fast() + put_page() calls to
      pin_user_pages_fast() + unpin_user_pages() calls.
      
      There is some helpful background in [2]: basically, this is a small
      part of fixing a long-standing disconnect between pinning pages, and
      file systems' use of those pages.
      
      [1] Documentation/core-api/pin_user_pages.rst
      
      [2] "Explicit pinning of user-space pages":
          https://lwn.net/Articles/807108/
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: netdev@vger.kernel.org
      Cc: linux-rdma@vger.kernel.org
      Cc: rds-devel@oss.oracle.com
      Signed-off-by: default avatarJohn Hubbard <jhubbard@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dbfe7d74
    • David S. Miller's avatar
      Merge branch 'mptcp-do-not-block-on-subflow-socket' · 9740a7ae
      David S. Miller authored
      Florian Westphal says:
      
      ====================
      mptcp: do not block on subflow socket
      
      This series reworks mptcp_sendmsg logic to avoid blocking on the subflow
      socket.
      
      It does so by removing the wait loop from mptcp_sendmsg_frag helper.
      
      In order to do that, it moves prerequisites that are currently
      handled in mptcp_sendmsg_frag (and cause it to wait until they are
      met, e.g. frag cache refill) into the callers.
      
      The worker can just reschedule in case no subflow socket is ready,
      since it can't wait -- doing so would block other work items and
      doesn't make sense anyway because we should not (re)send data
      in case resources are already low.
      
      The sendmsg path can use the existing wait logic until memory
      becomes available.
      
      Because large send requests can result in multiple mptcp_sendmsg_frag
      calls from mptcp_sendmsg, we may need to restart the socket lookup in
      case subflow can't accept more data or memory is low.
      
      Doing so blocks on the mptcp socket, and existing wait handling
      releases the msk lock while blocking.
      
      Lastly, no need to use GFP_ATOMIC for extension allocation:
      extend __skb_ext_alloc with gfp_t arg instead of hard-coded ATOMIC and
      then relax the allocation constraints for mptcp case: those requests
      occur in process context.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9740a7ae
    • Florian Westphal's avatar
      net: allow __skb_ext_alloc to sleep · 4930f483
      Florian Westphal authored
      mptcp calls this from the transmit side, from process context.
      Allow a sleeping allocation instead of unconditional GFP_ATOMIC.
      Acked-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4930f483
    • Florian Westphal's avatar
      mptcp: remove inner wait loop from mptcp_sendmsg_frag · 5c826443
      Florian Westphal authored
      previous patches made sure we only call into this function
      when these prerequisites are met, so no need to wait on the
      subflow socket anymore.
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/7Acked-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5c826443
    • Florian Westphal's avatar
      mptcp: fill skb page frag cache outside of mptcp_sendmsg_frag · 17091708
      Florian Westphal authored
      The mptcp_sendmsg_frag helper contains a loop that will wait on the
      subflow sk.
      
      It seems preferrable to only wait in mptcp_sendmsg() when blocking io is
      requested.  mptcp_sendmsg already has such a wait loop that is used when
      no subflow socket is available for transmission.
      
      This is another preparation patch that makes sure we call
      mptcp_sendmsg_frag only if the page frag cache has been refilled.
      
      Followup patch will remove the wait loop from mptcp_sendmsg_frag().
      
      The retransmit worker doesn't need to do this refill as it won't
      transmit new mptcp-level data.
      Acked-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      17091708
    • Florian Westphal's avatar
      mptcp: fill skb extension cache outside of mptcp_sendmsg_frag · 149f7c71
      Florian Westphal authored
      The mptcp_sendmsg_frag helper contains a loop that will wait on the
      subflow sk.
      
      It seems preferrable to only wait in mptcp_sendmsg() when blocking io is
      requested.  mptcp_sendmsg already has such a wait loop that is used when
      no subflow socket is available for transmission.
      
      This is a preparation patch that makes sure we call
      mptcp_sendmsg_frag only if a skb extension has been allocated.
      
      Moreover, such allocation currently uses GFP_ATOMIC while it
      could use sleeping allocation instead.
      
      Followup patches will remove the wait loop from mptcp_sendmsg_frag()
      and will allow to do a sleeping allocation for the extension.
      Acked-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      149f7c71
    • Florian Westphal's avatar
      mptcp: avoid blocking in tcp_sendpages · 72511aab
      Florian Westphal authored
      The transmit loop continues to xmit new data until an error is returned
      or all data was transmitted.
      
      For the blocking i/o case, this means that tcp_sendpages() may block on
      the subflow until more space becomes available, i.e. we end up sleeping
      with the mptcp socket lock held.
      
      Instead we should check if a different subflow is ready to be used.
      
      This restarts the subflow sk lookup when the tx operation succeeded
      and the tcp subflow can't accept more data or if tcp_sendpages
      indicates -EAGAIN on a blocking mptcp socket.
      
      In that case we also need to set the NOSPACE bit to make sure we get
      notified once memory becomes available.
      
      In case all subflows are busy, the existing logic will wait until a
      subflow is ready, releasing the mptcp socket lock while doing so.
      
      The mptcp worker already sets DONTWAIT, so no need to make changes there.
      
      v2:
       * set NOSPACE bit
       * add a comment to clarify that mptcp-sk sndbuf limits need to
         be checked as well.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      72511aab
    • Florian Westphal's avatar
      mptcp: break and restart in case mptcp sndbuf is full · fb529e62
      Florian Westphal authored
      Its not enough to check for available tcp send space.
      
      We also hold on to transmitted data for mptcp-level retransmits.
      Right now we will send more and more data if the peer can ack data
      at the tcp level fast enough, since that frees up tcp send buffer space.
      
      But we also need to check that data was acked and reclaimed at the mptcp
      level.
      
      Therefore add needed check in mptcp_sendmsg, flush tcp data and
      wait until more mptcp snd space becomes available if we are over the
      limit.  Before we wait for more data, also make sure we start the
      retransmit timer if we ran out of sndbuf space.
      
      Otherwise there is a very small chance that we wait forever:
      
       * receiver is waiting for data
       * sender is blocked because mptcp socket buffer is full
       * at tcp level, all data was acked
       * mptcp-level snd_una was not updated, because last ack
         that acknowledged the last data packet carried an older
         MPTCP-ack.
      
      Restarting the retransmit timer avoids this problem: if TCP
      subflow is idle, data is retransmitted from the RTX queue.
      
      New data will make the peer send a new, updated MPTCP-Ack.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fb529e62
    • Florian Westphal's avatar
      mptcp: move common nospace-pattern to a helper · a0e17064
      Florian Westphal authored
      Paolo noticed that ssk_check_wmem() has same pattern, so add/use
      common helper for both places.
      Suggested-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a0e17064
    • David Ahern's avatar
      selftests: Drop 'pref medium' in route checks · eb682677
      David Ahern authored
      The 'pref medium' attribute was moved in iproute2 to be near the prefix
      which is where it applies versus after the last nexthop. The nexthop
      tests were updated to drop the string from route checking, but it crept
      in again with the compat tests.
      
      Fixes: 4dddb5be ("selftests: net: add new testcases for nexthop API compat mode sysctl")
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      eb682677
  2. 16 May, 2020 19 commits
  3. 15 May, 2020 11 commits