1. 10 Jul, 2020 11 commits
    • Merge branch 'udp_tunnel-add-NIC-RX-port-offload-infrastructure' · 0ea46047
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      udp_tunnel: add NIC RX port offload infrastructure
      
      Kernel has a facility to notify drivers about the UDP tunnel ports
      so that devices can recognize tunneled packets. This is important
      mostly for RX - devices which don't support CHECKSUM_COMPLETE can
      report checksums of inner packets, and compute RSS over inner headers.
      Some drivers also match the UDP tunnel ports for TX, although
      doing so may lead to false positives and negatives.
      
      Unfortunately the user experience when trying to take advantage
      of these facilities is suboptimal. First of all there is no way
      for users to check which ports are offloaded. Many drivers resort
      to printing messages to aid debugging, others use debugfs. Even worse
      the availability of the RX features (NETIF_F_RX_UDP_TUNNEL_PORT)
      is established purely on the basis of the driver having the ndos
      installed. For most drivers, however, the ability to perform offloads
      is contingent on device capabilities (drivers support multiple device
      and firmware versions). Unless the driver resorts to hackish clearing
      of features set incorrectly by the core - users are left guessing
      whether their device really supports UDP tunnel port offload or not.
      
      There is currently no way to indicate or configure whether RX
      features include just the checksum offload or checksum and using
      inner headers for RSS. Many drivers default to not using inner
      headers for RSS because most implementations populate the source
      port with entropy from the inner headers. This, however, is not
      always the case, for example certain switches are only able to
      use a fixed source port during encapsulation.
      
      We have also seen many driver authors get the intricacies of UDP
      tunnel port offloads wrong. Most commonly the drivers forget to
      perform reference counting, or take sleeping locks in the callbacks.
      
      This work tries to improve the situation by pulling the UDP tunnel
      port table maintenance out of the drivers. It turns out that almost
      all drivers maintain a fixed size table of ports (in most cases one
      per tunnel type), so we can take care of all the refcounting in the
      core, and let the driver specify if they need to sleep in the
      callbacks or not. The new common implementation will also support
      replacing ports - when a port is removed from a full table it will
      try to find a previously missing port to take its place.
      
      This patch only implements the core functionality along with a few
      drivers I was hoping to test manually [1] along with a test based
      on a netdevsim implementation. Following patches will convert all
      the drivers. Once that's complete we can remove the ndos, and rely
      directly on the new infrastructure.
      
      Then after RSS (RXFH) is converted to netlink we can add the ability
      to configure the use of inner RSS headers for UDP tunnels.
      
      [1] Unfortunately I wasn't able to, turns out 2 of the devices
      I had access to were older generation or had old FW, and they
      did not actually support UDP tunnel port notifications (see
      the second paragraph). The third device appears to program
      the UDP ports correctly but it generates bad UDP checksums with
      or without these patches. Long story short - I'd appreciate
      reviews and testing here.
      
      v4:
       - better build fix (hopefully this one does it..)
      v3:
       - fix build issue;
       - improve bnxt changes.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
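The refcounting and port replacement described in the cover letter above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel implementation: `TABLE_SIZE`, `MAX_PORTS`, `struct port_ref`, `struct port_table` and the `port_*` helpers are all hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 2 /* hypothetical device table capacity */
#define MAX_PORTS  8 /* hypothetical cap on ports the model tracks */

struct port_ref {
	uint16_t port;
	unsigned int refcnt; /* number of tunnel sockets using this port */
	bool offloaded;      /* currently occupies a device table slot */
};

struct port_table {
	struct port_ref ports[MAX_PORTS];
	unsigned int n_offloaded;
};

/* Look up a port that is still referenced; NULL if unknown. */
static struct port_ref *find_port(struct port_table *t, uint16_t port)
{
	for (size_t i = 0; i < MAX_PORTS; i++)
		if (t->ports[i].refcnt && t->ports[i].port == port)
			return &t->ports[i];
	return NULL;
}

/* Add one user of @port. Returns false only if tracking space ran out. */
static bool port_add(struct port_table *t, uint16_t port)
{
	struct port_ref *p = find_port(t, port);

	if (p) {
		p->refcnt++; /* already known: just take another reference */
		return true;
	}
	for (size_t i = 0; i < MAX_PORTS; i++) {
		p = &t->ports[i];
		if (p->refcnt)
			continue;
		p->port = port;
		p->refcnt = 1;
		/* Offload only while the device table has room;
		 * otherwise the port is remembered as "missing". */
		p->offloaded = t->n_offloaded < TABLE_SIZE;
		if (p->offloaded)
			t->n_offloaded++;
		return true;
	}
	return false;
}

/* Drop one user of @port. When the last user of an offloaded port goes
 * away, hand the freed slot to a previously missing port, if any. */
static void port_del(struct port_table *t, uint16_t port)
{
	struct port_ref *p = find_port(t, port);

	if (!p || --p->refcnt)
		return;
	if (!p->offloaded)
		return;
	p->offloaded = false;
	t->n_offloaded--;
	for (size_t i = 0; i < MAX_PORTS; i++) {
		struct port_ref *m = &t->ports[i];

		if (m->refcnt && !m->offloaded) {
			m->offloaded = true;
			t->n_offloaded++;
			break;
		}
	}
}

static bool port_is_offloaded(struct port_table *t, uint16_t port)
{
	struct port_ref *p = find_port(t, port);

	return p && p->offloaded;
}
```

With a two-entry table, adding ports 4789, 6081 and 4790 leaves 4790 missing; once the last user of 4789 goes away, 4790 takes over the freed slot.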
    • mlx4: convert to new udp_tunnel_nic infra · fb6f8970
      Jakub Kicinski authored
      Convert to new infra, make use of the ability to sleep in the callback.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt: convert to new udp_tunnel_nic infra · 442a35a5
      Jakub Kicinski authored
      Convert to new infra, taking advantage of sleeping in callbacks.
      
      v2:
       - use bp->*_fw_dst_port_id != INVALID_HW_RING_ID as indication
         that the offload is active.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ixgbe: convert to new udp_tunnel_nic infra · dc221851
      Jakub Kicinski authored
      Make use of new common udp_tunnel_nic infra. ixgbe supports
      IPv4 only, and only single VxLAN and Geneve ports (one each).
      
      v2:
       - split out the RXCSUM feature handling to separate change;
       - declare structs separately;
       - use ti.type instead of assuming table 0 is VxLAN;
       - move setting netdev->udp_tunnel_nic_info to its own switch.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ixgbe: don't clear UDP tunnel ports when RXCSUM is disabled · abc0c78c
      Jakub Kicinski authored
      It appears the clearing of UDP tunnel ports when RXCSUM
      is disabled is unnecessary. Driver will not pay attention
      to checksum bits if RXCSUM is not set, so we can let
      the hardware parse the packets.
      
      Note that the UDP tunnel port NDO handlers don't pay attention
      to the state of RXCSUM, so the ports could have been re-programmed
      anyway.
      
      This cleanup simplifies the later conversion patch.
      
      v2:
       - break this out of the following patch.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net: add a test for UDP tunnel info infra · 91f430b2
      Jakub Kicinski authored
      Add a test validating that the UDP tunnel infra works.
      
      $ ./udp_tunnel_nic.sh
      PASSED all 383 checks
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: add UDP tunnel port offload support · 424be63a
      Jakub Kicinski authored
      Add UDP tunnel port handlers to our fake driver so we can test
      the core infra.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: add tunnel info interface · c7d759eb
      Jakub Kicinski authored
      Add an interface to report offloaded UDP ports via ethtool netlink.
      
      Now that core takes care of tracking which UDP tunnel ports the NICs
      are aware of we can quite easily export this information out to
      user space.
      
      The responsibility of writing the netlink dumps is split between
      ethtool code and udp_tunnel_nic.c - since udp_tunnel module may
      not always be loaded, yet we should always report the capabilities
      of the NIC.
      
      $ ethtool --show-tunnels eth0
      Tunnel information for eth0:
        UDP port table 0:
          Size: 4
          Types: vxlan
          No entries
        UDP port table 1:
          Size: 4
          Types: geneve, vxlan-gpe
          Entries (1):
              port 1230, vxlan-gpe
      
      v4:
       - back to v2, build fix is now directly in udp_tunnel.h
      v3:
       - don't compile ETHTOOL_MSG_TUNNEL_INFO_GET in if CONFIG_INET
         not set.
      v2:
       - fix string set count,
       - reorder enums in the uAPI,
       - fix type of ETHTOOL_A_TUNNEL_UDP_TABLE_TYPES to bitset
         in docs and comments.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp_tunnel: add central NIC RX port offload infrastructure · cc4e3835
      Jakub Kicinski authored
      Cater to devices which:
       (a) may want to sleep in the callbacks;
       (b) only have IPv4 support;
       (c) need all the programming to happen while the netdev is up.
      
      Drivers attach UDP tunnel offload info struct to their netdevs,
      where they declare how many UDP ports of various tunnel types
      they support. Core takes care of tracking which ports to offload.
      
      Use a fixed-size array since this matches what almost all drivers
      do, and avoids the complexity and uncertainty around memory allocations
      in an atomic context.
      
      Make sure that tunnel drivers don't try to replay the ports when
      new NIC netdev is registered. Automatic replays would mess up
      reference counting, and will be removed completely once all drivers
      are converted.
      
      v4:
       - use a #define NULL to avoid build issues with CONFIG_INET=n.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
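The three device constraints (a) to (c) listed in the commit message above could be expressed as capability flags that a core consults before touching the device. The sketch below is one plausible policy, written as an assumption for illustration: the flag names, `enum port_action` and `decide_port_action()` are hypothetical and are not taken from the kernel sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability flags a driver could declare for its device. */
enum nic_info_flags {
	NIC_INFO_MAY_SLEEP = 1u << 0, /* (a) callbacks may sleep */
	NIC_INFO_IPV4_ONLY = 1u << 1, /* (b) device handles IPv4 tunnels only */
	NIC_INFO_OPEN_ONLY = 1u << 2, /* (c) program only while netdev is up */
};

enum port_action {
	PORT_IGNORE,      /* never offload this port on this device */
	PORT_DEFER_OPEN,  /* remember the port, program it on open */
	PORT_QUEUE_WORK,  /* program later from a context that may sleep */
	PORT_PROGRAM_NOW, /* safe to call the driver immediately */
};

/* Decide what to do with a new tunnel port given the declared flags. */
static enum port_action decide_port_action(unsigned int flags,
					   bool is_ipv6, bool netdev_up)
{
	if ((flags & NIC_INFO_IPV4_ONLY) && is_ipv6)
		return PORT_IGNORE;
	if ((flags & NIC_INFO_OPEN_ONLY) && !netdev_up)
		return PORT_DEFER_OPEN;
	if (flags & NIC_INFO_MAY_SLEEP)
		return PORT_QUEUE_WORK;
	return PORT_PROGRAM_NOW;
}
```

Keeping the decision in one place is the point of the sketch: a driver only states its constraints, and the ordering of the checks lives in shared code rather than in every driver.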
    • udp_tunnel: re-number the offload tunnel types · 84a4160e
      Jakub Kicinski authored
      Make it possible to use tunnel types as flags more easily.
      There doesn't appear to be any user using the type as an
      array index, so this should make no difference.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
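Numbering tunnel types as single bits lets one table advertise several types through a bitmask, which is what the "Types: geneve, vxlan-gpe" line in the ethtool output above suggests. The following is a minimal sketch of that idea; `enum tunnel_type`, `struct table_info` and `find_table()` are hypothetical names chosen for the example, and the concrete bit values are an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Each tunnel type is a distinct bit so types can be OR-ed into a mask. */
enum tunnel_type {
	TUNNEL_TYPE_VXLAN     = 1u << 0,
	TUNNEL_TYPE_GENEVE    = 1u << 1,
	TUNNEL_TYPE_VXLAN_GPE = 1u << 2,
};

struct table_info {
	unsigned int n_entries;    /* number of ports the table can hold */
	unsigned int tunnel_types; /* OR of enum tunnel_type values */
};

/* Return the index of the first table accepting @type, or -1 if none. */
static int find_table(const struct table_info *tables, size_t n_tables,
		      enum tunnel_type type)
{
	for (size_t i = 0; i < n_tables; i++)
		if (tables[i].n_entries && (tables[i].tunnel_types & type))
			return (int)i;
	return -1;
}
```

Because the values are no longer consecutive integers, they should not be used as array indices, which matches the commit's remark that no existing user does so.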
    • debugfs: make sure we can remove u32_array files cleanly · a2b992c8
      Jakub Kicinski authored
      debugfs_create_u32_array() allocates a small structure to wrap
      the data and size information about the array. If users ever
      try to remove the file this leads to a leak since nothing ever
      frees this wrapper.
      
      That said there are no upstream users of debugfs_create_u32_array()
      that'd remove a u32 array file (we only have one u32 array user in
      CMA), so there is no real bug here.
      
      Make callers pass a wrapper they allocated. This way the lifetime
      management of the wrapper is on the caller, and we can avoid the
      potential leak in debugfs.
      
      CC: Chucheng Luo <luochucheng@vivo.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
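The ownership change described above, where the caller allocates the wrapper instead of the file-creation helper, can be modelled in user space. This is a sketch only: `struct u32_array`, `struct fake_file` and the helper functions are hypothetical stand-ins and do not reproduce the debugfs API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Wrapper describing the array; allocated and owned by the caller. */
struct u32_array {
	uint32_t *array;
	uint32_t n_elements;
};

/* Stand-in for a file object; it only borrows the wrapper. */
struct fake_file {
	const struct u32_array *data; /* borrowed, never freed by the file */
};

/* Create a file that exposes the caller's wrapper. */
static struct fake_file *create_u32_array_file(const struct u32_array *wrapper)
{
	struct fake_file *f = malloc(sizeof(*f));

	if (!f)
		return NULL;
	f->data = wrapper;
	return f;
}

/* Removing the file frees only what the file itself allocated, so there
 * is no hidden wrapper left behind to leak. */
static void remove_file(struct fake_file *f)
{
	free(f);
}

/* Example reader: sum the elements visible through the file. */
static uint32_t file_read_sum(const struct fake_file *f)
{
	uint32_t sum = 0;

	for (uint32_t i = 0; i < f->data->n_elements; i++)
		sum += f->data->array[i];
	return sum;
}
```

The caller decides how long the wrapper lives (static, embedded in a larger struct, or heap-allocated), which is why removal of the file cannot leak it.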
  2. 09 Jul, 2020 18 commits
  3. 08 Jul, 2020 11 commits
    • bonding: don't need RTNL for ipsec helpers · f548a476
      Jarod Wilson authored
      The bond_ipsec_* helpers don't need RTNL, and can potentially get called
      without it being held, so switch from rtnl_dereference() to
      rcu_dereference() to access bond struct data.
      
      Lightly tested with xfrm bonding, no problems found, should address the
      syzkaller bug referenced below.
      
      Reported-by: syzbot+582c98032903dcc04816@syzkaller.appspotmail.com
      CC: Huy Nguyen <huyn@mellanox.com>
      CC: Saeed Mahameed <saeedm@mellanox.com>
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      CC: Jakub Kicinski <kuba@kernel.org>
      CC: Steffen Klassert <steffen.klassert@secunet.com>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      CC: netdev@vger.kernel.org
      CC: intel-wired-lan@lists.osuosl.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-bindings: dp83869: Fix the type of device · 7d25e14e
      Fabio Estevam authored
      DP83869 is an Ethernet PHY, not a charger, so fix the documentation
      accordingly.
      
      Fixes: 4d66c56f ("dt-bindings: net: dp83869: Add TI dp83869 phy")
      Signed-off-by: Fabio Estevam <festevam@gmail.com>
      Acked-by: Dan Murphy <dmurphy@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-bindings: dp83867: Fix the type of device · a6b9580b
      Fabio Estevam authored
      DP83867 is an Ethernet PHY, not a charger, so fix the documentation
      accordingly.
      
      Fixes: 74ac28f1 ("dt-bindings: dp83867: Convert DP83867 to yaml")
      Signed-off-by: Fabio Estevam <festevam@gmail.com>
      Acked-by: Dan Murphy <dmurphy@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: deal with xfrm state in all modes and add more error-checking · 5cd24cbe
      Jarod Wilson authored
      It's possible that device removal happens when the bond is in non-AB mode,
      and addition happens in AB mode, so bond_ipsec_del_sa() never gets called,
      which leaves security associations in an odd state if bond_ipsec_add_sa()
      then gets called after switching the bond into AB. Just call add and
      delete universally for all modes to keep things consistent.
      
      However, it's also possible that this code gets called when the system is
      shutting down, and the xfrm subsystem has already been disconnected from
      the bond device, so we need to do some error-checking and bail, lest we
      hit a null ptr deref.
      
      Fixes: a3b658cf ("bonding: allow xfrm offload setup post-module-load")
      CC: Huy Nguyen <huyn@mellanox.com>
      CC: Saeed Mahameed <saeedm@mellanox.com>
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      CC: Jakub Kicinski <kuba@kernel.org>
      CC: Steffen Klassert <steffen.klassert@secunet.com>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      CC: netdev@vger.kernel.org
      CC: intel-wired-lan@lists.osuosl.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'RTL8366RB-tagging-support' · 32e0d42a
      David S. Miller authored
      Linus Walleij says:
      
      ====================
      RTL8366RB tagging support
      
      This patch set adds DSA tagging support to the RTL8366RB
      DSA driver.
      
      There is a minor performance improvement in the tag parser
      compared to the previous patch set and the review tags
      have been collected.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: rtl8366rb: Support the CPU DSA tag · a20fafb9
      Linus Walleij authored
      This activates the support to use the CPU tag to properly
      direct ingress traffic to the right port.
      
      Bit 15 in register RTL8368RB_CPU_CTRL_REG can be set to
      1 to disable the insertion of the CPU tag which is what
      the code currently does. The bit 15 define calls this
      setting RTL8368RB_CPU_INSTAG which is confusing since the
      inverse meaning is implied: programmers may think that
      setting this bit to 1 will *enable* inserting the tag
      rather than disabling it, so rename this setting in
      bit 15 to RTL8368RB_CPU_NO_TAG which is more to the
      point.
      
      After this e.g. ping works out-of-the-box with the
      RTL8366RB.
      
      Cc: DENG Qingfang <dqfext@gmail.com>
      Cc: Mauri Sandberg <sandberg@mailfence.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
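The naming problem described above comes down to bit 15 having inverted sense: setting it disables tag insertion. A small sketch makes the polarity explicit. `CPU_NO_TAG` is a hypothetical stand-in for the renamed define, and the helpers operate on a plain 16-bit value rather than on real hardware registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the renamed bit 15: 1 means "no CPU tag". */
#define CPU_NO_TAG (1u << 15)

/* Return the register value with CPU tag insertion enabled or disabled. */
static uint16_t cpu_ctrl_set_tagging(uint16_t reg, bool enable_tag)
{
	if (enable_tag)
		return (uint16_t)(reg & ~CPU_NO_TAG); /* clear bit: tag on */
	return (uint16_t)(reg | CPU_NO_TAG);          /* set bit: tag off */
}

/* Tagging is active exactly when the "no tag" bit is clear. */
static bool cpu_ctrl_tagging_enabled(uint16_t reg)
{
	return !(reg & CPU_NO_TAG);
}
```

With the bit named for what it does, enabling the tag reads as clearing a "no tag" flag, so the inverted polarity is visible at the call site.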
    • net: dsa: tag_rtl4_a: Implement Realtek 4 byte A tag · efd7fe68
      Linus Walleij authored
      This implements the known parts of the Realtek 4 byte
      tag protocol version 0xA, as found in the RTL8366RB
      DSA switch.
      
      It is designated as protocol version 0xA as a
      different Realtek 4 byte tag format with protocol
      version 0x9 is known to exist in the Realtek RTL8306
      chips.
      
      The tag and switch chip lack public documentation, so
      the tag format has been reverse-engineered from
      packet dumps. As only ingress traffic has been available
      for analysis an egress tag has not been possible to
      develop (even using educated guesses about bit fields)
      so this is as far as it gets. It is not known if the
      switch even supports egress tagging.
      
      Extensive attempts to figure out the egress tag format
      were made. When nothing else worked, I just tried all bit
      combinations with 0xannp where a is protocol and p is
      port. I looped through all values several times trying
      to get a response from ping, without any positive
      result.
      
      Using just these ingress tags however, the switch
      functionality is vastly improved and the packets find
      their way into the destination port without any
      tricky VLAN configuration. On the D-Link DIR-685 the
      LAN ports now come up and respond to ping without
      any command line configuration so this is a real
      improvement for users.
      
      Egress packets need to be restricted to the proper
      target ports using VLAN, which the RTL8366RB DSA
      switch driver already sets up.
      
      Cc: DENG Qingfang <dqfext@gmail.com>
      Cc: Mauri Sandberg <sandberg@mailfence.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
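An ingress parser for such a tag might look like the sketch below. The layout is an assumption made for illustration: a 2-byte EtherType followed by a 16-bit word shaped like the 0xannp pattern mentioned above, with the protocol in the top nibble and the port in the low nibble. The EtherType value 0x8899 and the function name `parse_rtl4a_tag()` are likewise assumptions, not details confirmed by the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAG_LEN       4
#define TAG_ETHERTYPE 0x8899 /* assumption: EtherType preceding the tag word */
#define TAG_PROTO_A   0xA    /* protocol version named in the commit message */

/* Return the source port encoded in the tag, or -1 if the bytes do not
 * look like a protocol-0xA tag under the assumed layout. */
static int parse_rtl4a_tag(const uint8_t *tag, size_t len)
{
	uint16_t etype, word;

	if (len < TAG_LEN)
		return -1;

	/* Both fields are big-endian on the wire. */
	etype = (uint16_t)((tag[0] << 8) | tag[1]);
	word = (uint16_t)((tag[2] << 8) | tag[3]);

	if (etype != TAG_ETHERTYPE)
		return -1;
	if ((word >> 12) != TAG_PROTO_A)
		return -1;

	return word & 0xF; /* assumed: port lives in the low nibble */
}
```

Rejecting frames whose protocol nibble is not 0xA keeps the parser from misreading the 0x9 tag variant that the commit message says exists on other Realtek chips.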
    • net/mlx5: Added support for 100Gbps per lane link modes · 12fdafb8
      Meir Lichtinger authored
      This patch exposes new link modes using 100Gbps per lane, including 100G,
      200G and 400G modes.
      Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
      Reviewed-by: Aya Levin <ayal@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: Add support for 100Gbps per lane link modes · 065e0d42
      Meir Lichtinger authored
      Define 100G, 200G and 400G link modes using 100Gbps per lane.
      
      LR, ER and FR are defined as a single link mode because they
      use the same technology and by design are fully interoperable.
      EEPROM content indicates if the module is LR, ER, or FR, and the
      user space ethtool decoder is planned to support decoding these
      modes in the EEPROM.
      Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
      CC: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Aya Levin <ayal@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bnxt_en-Driver-update-for-net-next' · 66846b7d
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: Driver update for net-next.
      
      This patchset implements ethtool -X to setup user-defined RSS indirection
      table.  The new infrastructure also allows the proper logical ring index
      to be used to populate the RSS indirection when queried by ethtool -x.
      Prior to these patches, we were incorrectly populating the output of
      ethtool -x with internal ring IDs which would make no sense to the user.
      
      The last 2 patches add some cleanups to the VLAN acceleration logic
      and check the firmware capabilities before allowing VLAN acceleration
      offloads.
      
      v4: Move bnxt_get_rxfh_indir_size() fix to a new patch #2.
          Modify patch #7 to revert RSS map to default only when necessary.
      
      v3: Use ALIGN() in patch 5.
          Add warning messages in patch 6.
      
      v2: Some RSS indirection table changes requested by Jakub Kicinski.
      ====================
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: allow firmware to disable VLAN offloads · 1da63ddd
      Edwin Peer authored
      Bare-metal use cases require giving firmware and the embedded
      application processor control over VLAN offloads. The driver should
      not attempt to override or utilize this feature in such scenarios
      since it will not work as expected.
      Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>