1. 23 Sep, 2021 11 commits
    • David S. Miller's avatar
      Merge branch 'remove-sk-skb-caches' · 5146a574
      David S. Miller authored
      Paolo Abeni says:
      
      ====================
      net: remove sk skb caches
      
      Eric noted we would be better off reverting the sk
      skb caches.
      
      MPTCP relies on such a feature, so we need a
      little refactor of the MPTCP tx path before the mentioned
      revert.
      
      The first patch exposes additional TCP helpers. The 2nd patch
      changes the MPTCP code to do locally the whole skb allocation
      and updating, so it does not rely anymore on core TCP helpers
      for that nor the sk skb cache.
      
      As a side effect, we can make the tcp_build_frag helper static.
      
      Finally, we can pull Eric's revert.
      
      RFC -> v1:
       - drop driver specific patch - no more needed after helper rename
       - rename skb_entail -> tcp_skb_entail (Eric)
       - preserve the tcp_build_frag helpwe, just make it static (Eric)
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5146a574
    • Eric Dumazet's avatar
      tcp: remove sk_{tr}x_skb_cache · d8b81175
      Eric Dumazet authored
      This reverts the following patches :
      
      - commit 2e05fcae ("tcp: fix compile error if !CONFIG_SYSCTL")
      - commit 4f661542 ("tcp: fix zerocopy and notsent_lowat issues")
      - commit 472c2e07 ("tcp: add one skb cache for tx")
      - commit 8b27dae5 ("tcp: add one skb cache for rx")
      
      Having a cache of one skb (in each direction) per TCP socket is fragile,
      since it can cause a significant increase of memory needs,
      and not good enough for high speed flows anyway where more than one skb
      is needed.
      
      We want instead to add a generic infrastructure, with more flexible
      per-cpu caches, for alien NUMA nodes.
      Acked-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Acked-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d8b81175
    • Paolo Abeni's avatar
      tcp: make tcp_build_frag() static · ff6fb083
      Paolo Abeni authored
      After the previous patch the mentioned helper is
      used only inside its compilation unit: let's make
      it static.
      
      RFC -> v1:
       - preserve the tcp_build_frag() helper (Eric)
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ff6fb083
    • Paolo Abeni's avatar
      mptcp: stop relying on tcp_tx_skb_cache · f70cad10
      Paolo Abeni authored
      We want to revert the skb TX cache, but MPTCP is currently
      using it unconditionally.
      
      Rework the MPTCP tx code, so that tcp_tx_skb_cache is not
      needed anymore: do the whole coalescing check, skb allocation
      skb initialization/update inside mptcp_sendmsg_frag(), quite
      alike the current TCP code.
      Reviewed-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f70cad10
    • Paolo Abeni's avatar
      tcp: expose the tcp_mark_push() and tcp_skb_entail() helpers · 04d8825c
      Paolo Abeni authored
      the tcp_skb_entail() helper is actually skb_entail(), renamed
      to provide proper scope.
      
          The two helper will be used by the next patch.
      
      RFC -> v1:
       - rename skb_entail to tcp_skb_entail (Eric)
      Acked-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      04d8825c
    • Paolo Abeni's avatar
      mptcp: ensure tx skbs always have the MPTCP ext · efe686ff
      Paolo Abeni authored
      Due to signed/unsigned comparison, the expression:
      
      	info->size_goal - skb->len > 0
      
      evaluates to true when the size goal is smaller than the
      skb size. That results in lack of tx cache refill, so that
      the skb allocated by the core TCP code lacks the required
      MPTCP skb extensions.
      
      Due to the above, syzbot is able to trigger the following WARN_ON():
      
      WARNING: CPU: 1 PID: 810 at net/mptcp/protocol.c:1366 mptcp_sendmsg_frag+0x1362/0x1bc0 net/mptcp/protocol.c:1366
      Modules linked in:
      CPU: 1 PID: 810 Comm: syz-executor.4 Not tainted 5.14.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:mptcp_sendmsg_frag+0x1362/0x1bc0 net/mptcp/protocol.c:1366
      Code: ff 4c 8b 74 24 50 48 8b 5c 24 58 e9 0f fb ff ff e8 13 44 8b f8 4c 89 e7 45 31 ed e8 98 57 2e fe e9 81 f4 ff ff e8 fe 43 8b f8 <0f> 0b 41 bd ea ff ff ff e9 6f f4 ff ff 4c 89 e7 e8 b9 8e d2 f8 e9
      RSP: 0018:ffffc9000531f6a0 EFLAGS: 00010216
      RAX: 000000000000697f RBX: 0000000000000000 RCX: ffffc90012107000
      RDX: 0000000000040000 RSI: ffffffff88eac9e2 RDI: 0000000000000003
      RBP: ffff888078b15780 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff88eac017 R11: 0000000000000000 R12: ffff88801de0a280
      R13: 0000000000006b58 R14: ffff888066278280 R15: ffff88803c2fe9c0
      FS:  00007fd9f866e700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007faebcb2f718 CR3: 00000000267cb000 CR4: 00000000001506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       __mptcp_push_pending+0x1fb/0x6b0 net/mptcp/protocol.c:1547
       mptcp_release_cb+0xfe/0x210 net/mptcp/protocol.c:3003
       release_sock+0xb4/0x1b0 net/core/sock.c:3206
       sk_stream_wait_memory+0x604/0xed0 net/core/stream.c:145
       mptcp_sendmsg+0xc39/0x1bc0 net/mptcp/protocol.c:1749
       inet6_sendmsg+0x99/0xe0 net/ipv6/af_inet6.c:643
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:724
       sock_write_iter+0x2a0/0x3e0 net/socket.c:1057
       call_write_iter include/linux/fs.h:2163 [inline]
       new_sync_write+0x40b/0x640 fs/read_write.c:507
       vfs_write+0x7cf/0xae0 fs/read_write.c:594
       ksys_write+0x1ee/0x250 fs/read_write.c:647
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x4665f9
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fd9f866e188 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 000000000056c038 RCX: 00000000004665f9
      RDX: 00000000000e7b78 RSI: 0000000020000000 RDI: 0000000000000003
      RBP: 00000000004bfcc4 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056c038
      R13: 0000000000a9fb1f R14: 00007fd9f866e300 R15: 0000000000022000
      
      Fix the issue rewriting the relevant expression to avoid
      sign-related problems - note: size_goal is always >= 0.
      
      Additionally, ensure that the skb in the tx cache always carries
      the relevant extension.
      
      Reported-and-tested-by: syzbot+263a248eec3e875baa7b@syzkaller.appspotmail.com
      Fixes: 1094c6fe ("mptcp: fix possible divide by zero")
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      efe686ff
    • Vladimir Oltean's avatar
      net: dsa: sja1105: don't keep a persistent reference to the reset GPIO · 33e1501f
      Vladimir Oltean authored
      The driver only needs the reset GPIO for a very brief period, so instead
      of using devres and keeping the descriptor pointer inside priv, just use
      that descriptor inside the sja1105_hw_reset function and then let go of
      it.
      
      Also use gpiod_get_optional while at it, and error out on real errors
      (bad flags etc).
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      33e1501f
    • David S. Miller's avatar
      Merge branch 'ja1105-deps' · a7597f79
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Fix circular dependency between sja1105 and tag_sja1105
      
      As discussed here:
      https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
      DSA tagging protocols cannot use symbols exported by switch drivers.
      
      Eliminate the two instances of that from tag_sja1105, and that allows us
      to have a working setup with modules again.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a7597f79
    • Vladimir Oltean's avatar
      net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch driver · f5aef424
      Vladimir Oltean authored
      It's nice to be able to test a tagging protocol with dsa_loop, but not
      at the cost of losing the ability of building the tagging protocol and
      switch driver as modules, because as things stand, there is a circular
      dependency between the two. Tagging protocol drivers cannot depend on
      switch drivers, that is a hard fact.
      
      The reasoning behind the blamed patch was that accessing dp->priv should
      first make sure that the structure behind that pointer is what we really
      think it is.
      
      Currently the "sja1105" and "sja1110" tagging protocols only operate
      with the sja1105 switch driver, just like any other tagging protocol and
      switch combination. The only way to mix and match them is by modifying
      the code, and this applies to dsa_loop as well (by default that uses
      DSA_TAG_PROTO_NONE). So while in principle there is an issue, in
      practice there isn't one.
      
      Until we extend dsa_loop to allow user space configuration, treat the
      problem as a non-issue and just say that DSA ports found by tag_sja1105
      are always sja1105 ports, which is in fact true. But keep the
      dsa_port_is_sja1105 function so that it's easy to patch it during
      testing, and rely on dead code elimination.
      
      Fixes: 994d2cbb ("net: dsa: tag_sja1105: be dsa_loop-safe")
      Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f5aef424
    • Vladimir Oltean's avatar
      net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver · 6d709cad
      Vladimir Oltean authored
      The problem is that DSA tagging protocols really must not depend on the
      switch driver, because this creates a circular dependency at insmod
      time, and the switch driver will effectively not load when the tagging
      protocol driver is missing.
      
      The code was structured in the way it was for a reason, though. The DSA
      driver-facing API for PTP timestamping relies on the assumption that
      two-step TX timestamps are provided by the hardware in an out-of-band
      manner, typically by raising an interrupt and making that timestamp
      available inside some sort of FIFO which is to be accessed over
      SPI/MDIO/etc.
      
      So the API puts .port_txtstamp into dsa_switch_ops, because it is
      expected that the switch driver needs to save some state (like put the
      skb into a queue until its TX timestamp arrives).
      
      On SJA1110, TX timestamps are provided by the switch as Ethernet
      packets, so this makes them be received and processed by the tagging
      protocol driver. This in itself is great, because the timestamps are
      full 64-bit and do not require reconstruction, and since Ethernet is the
      fastest I/O method available to/from the switch, PTP timestamps arrive
      very quickly, no matter how bottlenecked the SPI connection is, because
      SPI interaction is not needed at all.
      
      DSA's code structure and strict isolation between the tagging protocol
      driver and the switch driver break the natural code organization.
      
      When the tagging protocol driver receives a packet which is classified
      as a metadata packet containing timestamps, it passes those timestamps
      one by one to the switch driver, which then proceeds to compare them
      based on the recorded timestamp ID that was generated in .port_txtstamp.
      
      The communication between the tagging protocol and the switch driver is
      done through a method exported by the switch driver, sja1110_process_meta_tstamp.
      To satisfy build requirements, we force a dependency to build the
      tagging protocol driver as a module when the switch driver is a module.
      However, as explained in the first paragraph, that causes the circular
      dependency.
      
      To solve this, move the skb queue from struct sja1105_private :: struct
      sja1105_ptp_data to struct sja1105_private :: struct sja1105_tagger_data.
      The latter is a data structure for which hacks have already been put
      into place to be able to create persistent storage per switch that is
      accessible from the tagging protocol driver (see sja1105_setup_ports).
      
      With the skb queue directly accessible from the tagging protocol driver,
      we can now move sja1110_process_meta_tstamp into the tagging driver
      itself, and avoid exporting a symbol.
      
      Fixes: 566b18c8 ("net: dsa: sja1105: implement TX timestamping for SJA1110")
      Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6d709cad
    • Vladimir Oltean's avatar
      net: dsa: sja1105: remove sp->dp · 68a81bb2
      Vladimir Oltean authored
      It looks like this field was never used since its introduction in commit
      227d07a0 ("net: dsa: sja1105: Add support for traffic through
      standalone ports") remove it.
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      68a81bb2
  2. 22 Sep, 2021 6 commits
  3. 21 Sep, 2021 9 commits
  4. 20 Sep, 2021 14 commits