1. 08 Jan, 2022 9 commits
    • af_packet: fix tracking issues in packet_do_bind() · bf44077c
      Eric Dumazet authored
      It appears that my changes in packet_do_bind() were
      slightly wrong.
      
      syzbot found that calling bind() twice would trigger
      a false positive.
      
      Remove the proto_curr/dev_curr variables and rewrite things
      to be less confusing (for example, use the standard dev_hold_track()
      instead of netdev_tracker_alloc()).
      
      Fixes: f1d9268e ("net: add net device refcount tracker to struct packet_type")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220107183953.3886647-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'mptcp-refactoring-for-one-selftest-and-csum-validation' · d8caa2ed
      Jakub Kicinski authored
      Mat Martineau says:
      
      ====================
      mptcp: Refactoring for one selftest and csum validation
      
      Patch 1 changes the MPTCP join self tests to depend more on events
      rather than delays, so the script runs faster and has more consistent
      results.
      
      Patches 2 and 3 get rid of some duplicate code in MPTCP's checksum
      validation by modifying and leveraging an existing helper function.
      ====================
      
      Link: https://lore.kernel.org/r/20220107192524.445137-1-mathew.j.martineau@linux.intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: reuse __mptcp_make_csum in validate_data_csum · 8401e87f
      Geliang Tang authored
      This patch reuses __mptcp_make_csum() in validate_data_csum() instead of
      open-coding the checksum computation.
      Signed-off-by: Geliang Tang <geliang.tang@suse.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: change the parameter of __mptcp_make_csum · c312ee21
      Geliang Tang authored
      This patch changes the type of the last parameter of __mptcp_make_csum()
      from __sum16 to __wsum, and exports the function in protocol.h.
      Signed-off-by: Geliang Tang <geliang.tang@suse.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
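The two types named in this commit differ in width: in the kernel, `__wsum` holds a 32-bit unfolded partial sum, while `__sum16` holds the final folded 16-bit checksum. The following is a minimal userspace sketch of that distinction, not the MPTCP code. The names `wsum_t`, `sum16_t`, `partial_sum()` and `fold()` are hypothetical stand-ins introduced for illustration, and the byte-wise loop is a plain Internet-checksum implementation rather than the kernel's optimized helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel's __wsum and __sum16. */
typedef uint32_t wsum_t;   /* 32-bit unfolded one's-complement partial sum */
typedef uint16_t sum16_t;  /* 16-bit folded, complemented checksum */

/*
 * Add the 16-bit big-endian words of buf to a running partial sum.
 * Chaining calls is only valid when every chunk except the last has
 * an even length.
 */
static wsum_t partial_sum(const uint8_t *buf, size_t len, wsum_t sum)
{
	uint64_t acc = sum;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		acc += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		acc += (uint32_t)buf[len - 1] << 8; /* pad odd byte with zero */
	/* Fold 64 bits down to 32 with end-around carry. */
	while (acc >> 32)
		acc = (acc & 0xffffffffu) + (acc >> 32);
	return (wsum_t)acc;
}

/* Fold a 32-bit partial sum to 16 bits and complement it. */
static sum16_t fold(wsum_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (sum16_t)~sum;
}
```

A function that accepts the wide type can be handed an already accumulated partial sum and fold it only once at the end, for example `fold(partial_sum(tail, tail_len, partial_sum(head, head_len, 0)))`.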
    • selftests: mptcp: more stable join tests-cases · 327b9a94
      Paolo Abeni authored
      MPTCP join self-tests are a bit fragile as they rely on
      delays instead of events to catch up with the expected
      socket states.
      
      Replace the delay with state checking where possible and
      reduce the number of sleeps in the most complex scenarios.
      
      This both reduces the test run time and improves
      stability.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
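The idea of replacing fixed delays with state checks can be sketched as a small polling helper. This is only an illustration of the pattern in C; the actual self-tests are shell scripts, and `wait_for()`, `cond_fn`, `struct countdown` and `countdown_done()` are hypothetical names introduced here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

typedef bool (*cond_fn)(void *ctx);

/*
 * Poll cond() every poll_ms milliseconds until it returns true or
 * roughly timeout_ms milliseconds of sleeping have elapsed.
 * Returns true as soon as the condition holds, false on timeout.
 */
static bool wait_for(cond_fn cond, void *ctx, int timeout_ms, int poll_ms)
{
	struct timespec ts = {
		.tv_sec = poll_ms / 1000,
		.tv_nsec = (poll_ms % 1000) * 1000000L,
	};
	int waited;

	assert(cond != NULL && poll_ms > 0);
	for (waited = 0; ; waited += poll_ms) {
		if (cond(ctx))
			return true;
		if (waited >= timeout_ms)
			return false;
		nanosleep(&ts, NULL);
	}
}

/* Example condition: becomes true after a given number of polls. */
struct countdown {
	int polls_left;
};

static bool countdown_done(void *ctx)
{
	struct countdown *c = ctx;

	return c->polls_left-- <= 0;
}
```

Compared with an unconditional sleep, the caller returns as soon as the expected state is observed, and the timeout only bounds the failure case.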
    • net: dsa: felix: add port fast age support · 5cad43a5
      Vladimir Oltean authored
      Add support for flushing the MAC table on a given port in the ocelot
      switch library, and use this functionality in the felix DSA driver.
      
      This operation is needed when a port leaves a bridge to become
      standalone, when learning is disabled, and when the STP state
      changes to a state where no FDB entry should be present.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20220107144229.244584-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mscc: ocelot: fix incorrect balancing with down LAG ports · a14e6b69
      Vladimir Oltean authored
      Assuming the test setup described here:
      https://patchwork.kernel.org/project/netdevbpf/cover/20210205130240.4072854-1-vladimir.oltean@nxp.com/
      (swp1 and swp2 are in bond0, and bond0 is in a bridge with swp0)
      
      it can be seen that when swp1 goes down (on either board A or B), then
      traffic that should go through that port isn't forwarded anywhere.
      
      A dump of the PGID table shows the following:
      
      PGID_DST[0] = ports 0
      PGID_DST[1] = ports 1
      PGID_DST[2] = ports 2
      PGID_DST[3] = ports 3
      PGID_DST[4] = ports 4
      PGID_DST[5] = ports 5
      PGID_DST[6] = no ports
      PGID_AGGR[0] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[1] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[2] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[3] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[4] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[5] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[6] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[7] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[8] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[9] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[10] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[11] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[12] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[13] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[14] = ports 0, 1, 2, 3, 4, 5
      PGID_AGGR[15] = ports 0, 1, 2, 3, 4, 5
      PGID_SRC[0] = ports 1, 2
      PGID_SRC[1] = ports 0
      PGID_SRC[2] = ports 0
      PGID_SRC[3] = no ports
      PGID_SRC[4] = no ports
      PGID_SRC[5] = no ports
      PGID_SRC[6] = ports 0, 1, 2, 3, 4, 5
      
      Whereas a "good" PGID configuration for that setup should have looked
      like this:
      
      PGID_DST[0] = ports 0
      PGID_DST[1] = ports 1, 2
      PGID_DST[2] = ports 1, 2
      PGID_DST[3] = ports 3
      PGID_DST[4] = ports 4
      PGID_DST[5] = ports 5
      PGID_DST[6] = no ports
      PGID_AGGR[0] = ports 0, 2, 3, 4, 5
      PGID_AGGR[1] = ports 0, 2, 3, 4, 5
      PGID_AGGR[2] = ports 0, 2, 3, 4, 5
      PGID_AGGR[3] = ports 0, 2, 3, 4, 5
      PGID_AGGR[4] = ports 0, 2, 3, 4, 5
      PGID_AGGR[5] = ports 0, 2, 3, 4, 5
      PGID_AGGR[6] = ports 0, 2, 3, 4, 5
      PGID_AGGR[7] = ports 0, 2, 3, 4, 5
      PGID_AGGR[8] = ports 0, 2, 3, 4, 5
      PGID_AGGR[9] = ports 0, 2, 3, 4, 5
      PGID_AGGR[10] = ports 0, 2, 3, 4, 5
      PGID_AGGR[11] = ports 0, 2, 3, 4, 5
      PGID_AGGR[12] = ports 0, 2, 3, 4, 5
      PGID_AGGR[13] = ports 0, 2, 3, 4, 5
      PGID_AGGR[14] = ports 0, 2, 3, 4, 5
      PGID_AGGR[15] = ports 0, 2, 3, 4, 5
      PGID_SRC[0] = ports 1, 2
      PGID_SRC[1] = ports 0
      PGID_SRC[2] = ports 0
      PGID_SRC[3] = no ports
      PGID_SRC[4] = no ports
      PGID_SRC[5] = no ports
      PGID_SRC[6] = ports 0, 1, 2, 3, 4, 5
      
      In other words, in the "bad" configuration, the attempt is to remove the
      inactive swp1 from the destination ports via PGID_DST. But when a MAC
      table entry is learned, it is learned towards PGID_DST 1, because that
      is the logical port id of the LAG itself (it is equal to the lowest
      numbered member port). So when swp1 becomes inactive, if we set
      PGID_DST[1] to contain just swp1 and not swp2, the packet will not have
      any chance to reach the destination via swp2.
      
      The "correct" way to remove swp1 as a destination is via PGID_AGGR
      (remove swp1 from the aggregation port groups for all aggregation
      codes). This means that PGID_DST[1] and PGID_DST[2] must still contain
      both swp1 and swp2. This makes the MAC table still treat packets
      destined towards the single-port LAG as "multicast", and the inactive
      ports are removed via the aggregation code tables.
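The difference between the two dumps can be checked with a small model. The sketch below assumes, as a simplification, that the switch computes the egress port set as the bitwise AND of the PGID_DST entry selected by the MAC table, the PGID_AGGR entry selected by the frame's aggregation code, and the PGID_SRC entry of the ingress port. `struct pgid_tables`, `make_tables()` and `egress_mask()` are hypothetical names, not driver code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_PORTS 6
#define NUM_AGGR  16
#define BIT(n)    (1u << (n))
#define ALL_PORTS (BIT(NUM_PORTS) - 1)

struct pgid_tables {
	uint32_t dst[NUM_PORTS + 1];
	uint32_t aggr[NUM_AGGR];
	uint32_t src[NUM_PORTS + 1];
};

/*
 * Build the tables from the dumps above. Only PGID_DST[1], PGID_DST[2]
 * and the (uniform) PGID_AGGR value differ between the two dumps.
 */
static struct pgid_tables make_tables(uint32_t dst1, uint32_t dst2,
				      uint32_t aggr)
{
	struct pgid_tables t;
	int i;

	memset(&t, 0, sizeof(t));
	for (i = 0; i < NUM_PORTS; i++)
		t.dst[i] = BIT(i);
	t.dst[1] = dst1;
	t.dst[2] = dst2;
	for (i = 0; i < NUM_AGGR; i++)
		t.aggr[i] = aggr;
	t.src[0] = BIT(1) | BIT(2);
	t.src[1] = BIT(0);
	t.src[2] = BIT(0);
	t.src[NUM_PORTS] = ALL_PORTS;
	return t;
}

/* Assumed forwarding model: AND of the three selected port masks. */
static uint32_t egress_mask(const struct pgid_tables *t, int dst_pgid,
			    int aggr_code, int src_port)
{
	assert(dst_pgid >= 0 && dst_pgid <= NUM_PORTS);
	assert(aggr_code >= 0 && aggr_code < NUM_AGGR);
	assert(src_port >= 0 && src_port <= NUM_PORTS);
	return t->dst[dst_pgid] & t->aggr[aggr_code] & t->src[src_port];
}
```

Under this model, a frame entering on swp0 for a MAC address learned towards PGID_DST 1 resolves to swp1 only in the "bad" tables (and swp1 is down), while the "good" tables resolve it to swp2 for every aggregation code.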
      
      The change presented here is a design one: the ocelot_get_bond_mask()
      function used to take an "only_active_ports" argument. We don't need
      that. The only call site that specifies only_active_ports=true,
      ocelot_set_aggr_pgids(), must retrieve the entire bonding mask, because
      it must program that into PGID_DST. Additionally, it must also clear the
      inactive ports from the bond mask here, which it can't do if bond_mask
      just contains the active ports:
      
      	ac = ocelot_read_rix(ocelot, ANA_PGID_PGID, i);
      	ac &= ~bond_mask;  <---- here
      	/* Don't do division by zero if there was no active
      	 * port. Just make all aggregation codes zero.
      	 */
      	if (num_active_ports)
      		ac |= BIT(aggr_idx[i % num_active_ports]);
      	ocelot_write_rix(ocelot, ac, ANA_PGID_PGID, i);
      
      So it becomes the responsibility of ocelot_set_aggr_pgids() to take
      ocelot_port->lag_tx_active into consideration when populating the
      aggr_idx array.
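A userspace sketch of that responsibility, assuming a single bond and using plain arrays in place of the hardware registers: the full bond mask is cleared from each aggregation entry, and only ports with `lag_tx_active` set are entered into `aggr_idx`. `struct port` and `set_aggr_pgids()` are hypothetical simplifications, not the driver's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PORTS 6
#define NUM_AGGR  16
#define BIT(n)    (1u << (n))

struct port {
	bool in_bond;        /* member of the (single) modeled bond */
	bool lag_tx_active;  /* link is up and may transmit for the LAG */
};

static void set_aggr_pgids(const struct port ports[NUM_PORTS],
			   uint32_t aggr[NUM_AGGR])
{
	uint32_t bond_mask = 0;
	int aggr_idx[NUM_PORTS];
	int num_active_ports = 0;
	int p, i;

	assert(ports != NULL && aggr != NULL);

	/* bond_mask covers all members; aggr_idx only the active ones. */
	for (p = 0; p < NUM_PORTS; p++) {
		if (!ports[p].in_bond)
			continue;
		bond_mask |= BIT(p);
		if (ports[p].lag_tx_active)
			aggr_idx[num_active_ports++] = p;
	}

	for (i = 0; i < NUM_AGGR; i++) {
		uint32_t ac = aggr[i];

		/* Clear every bond member, active or not. */
		ac &= ~bond_mask;
		/* Re-add one active member per aggregation code, if any. */
		if (num_active_ports)
			ac |= BIT(aggr_idx[i % num_active_ports]);
		aggr[i] = ac;
	}
}
```

With swp1 and swp2 bonded and only swp2 active, every entry becomes ports 0, 2, 3, 4, 5, which matches the "good" PGID_AGGR dump.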
      
      Fixes: 23ca3b72 ("net: mscc: ocelot: rebalance LAGs on link up/down events")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20220107164332.402133-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · a5e7d9bb
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2022-01-07
      
      This series contains updates to i40e and iavf drivers.
      
      Karen limits per VF MAC filters so that one VF does not consume all
      filters for i40e.
      
      Jedrzej reduces busy wait time for admin queue calls for i40e.
      
      Mateusz updates firmware versions to reflect new supported NVM images
      and renames an error to remove non-inclusive language for i40e.
      
      Yang Li fixes a set but not used warning for i40e.
      
      Jason Wang removes an unneeded variable for iavf.
      
      * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
        iavf: remove an unneeded variable
        i40e: remove variables set but not used
        i40e: Remove non-inclusive language
        i40e: Update FW API version
        i40e: Minimize amount of busy-waiting during AQ send
        i40e: Add ensurance of MacVlan resources for every trusted VF
      ====================
      
      Link: https://lore.kernel.org/r/20220107175704.438387-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/tls: Fix skb memory leak when running kTLS traffic · ffef737f
      Gal Pressman authored
      The cited Fixes commit introduced a memory leak when running kTLS
      traffic (with/without hardware offloads).
      I'm running nginx on the server side and wrk on the client side and get
      the following:
      
        unreferenced object 0xffff8881935e9b80 (size 224):
        comm "softirq", pid 0, jiffies 4294903611 (age 43.204s)
        hex dump (first 32 bytes):
          80 9b d0 36 81 88 ff ff 00 00 00 00 00 00 00 00  ...6............
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000efe2a999>] build_skb+0x1f/0x170
          [<00000000ef521785>] mlx5e_skb_from_cqe_mpwrq_linear+0x2bc/0x610 [mlx5_core]
          [<00000000945d0ffe>] mlx5e_handle_rx_cqe_mpwrq+0x264/0x9e0 [mlx5_core]
          [<00000000cb675b06>] mlx5e_poll_rx_cq+0x3ad/0x17a0 [mlx5_core]
          [<0000000018aac6a9>] mlx5e_napi_poll+0x28c/0x1b60 [mlx5_core]
          [<000000001f3369d1>] __napi_poll+0x9f/0x560
          [<00000000cfa11f72>] net_rx_action+0x357/0xa60
          [<000000008653b8d7>] __do_softirq+0x282/0x94e
          [<00000000644923c6>] __irq_exit_rcu+0x11f/0x170
          [<00000000d4085f8f>] irq_exit_rcu+0xa/0x20
          [<00000000d412fef4>] common_interrupt+0x7d/0xa0
          [<00000000bfb0cebc>] asm_common_interrupt+0x1e/0x40
          [<00000000d80d0890>] default_idle+0x53/0x70
          [<00000000f2b9780e>] default_idle_call+0x8c/0xd0
          [<00000000c7659e15>] do_idle+0x394/0x450
      
      I'm not familiar with these areas of the code, but I've added this
      sk_defer_free_flush() to tls_sw_recvmsg() based on a hunch and it
      resolved the issue.
      
      Fixes: f35f8219 ("tcp: defer skb freeing after socket lock is released")
      Signed-off-by: Gal Pressman <gal@nvidia.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20220102081253.9123-1-gal@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
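Why a missing flush shows up as a leak can be illustrated with a toy model of deferred freeing. This is a sketch under loud assumptions and not the kernel implementation: `struct toy_sock`, `toy_alloc()`, `toy_defer_free()`, `toy_defer_free_flush()` and `toy_recvmsg()` are hypothetical names that only mimic the idea of queueing buffers on the socket and releasing them in a separate flush step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct buf {
	struct buf *next;
};

struct toy_sock {
	struct buf *defer_list;  /* buffers queued for a later free */
	int live_bufs;           /* allocated but not yet freed */
};

static struct buf *toy_alloc(struct toy_sock *sk)
{
	struct buf *b = calloc(1, sizeof(*b));

	assert(b != NULL);
	sk->live_bufs++;
	return b;
}

/* Queue a consumed buffer instead of freeing it immediately. */
static void toy_defer_free(struct toy_sock *sk, struct buf *b)
{
	b->next = sk->defer_list;
	sk->defer_list = b;
}

/* Actually release everything that was queued. */
static void toy_defer_free_flush(struct toy_sock *sk)
{
	while (sk->defer_list) {
		struct buf *b = sk->defer_list;

		sk->defer_list = b->next;
		free(b);
		sk->live_bufs--;
	}
}

/*
 * A receive path that consumes one buffer. If it never flushes, and no
 * other path does so on its behalf, the queued buffers accumulate.
 */
static void toy_recvmsg(struct toy_sock *sk, bool flush_on_exit)
{
	struct buf *b = toy_alloc(sk);

	toy_defer_free(sk, b);
	if (flush_on_exit)
		toy_defer_free_flush(sk);
}
```

In this model, a receive path that defers frees but never flushes leaves `live_bufs` growing with every call, which is the shape of the leak the report describes.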
  2. 07 Jan, 2022 31 commits