1. 06 Feb, 2021 15 commits
    • net/mlx5e: TC preparation refactoring for routing update event · c7b9038d
      Vlad Buslov authored
      The following patch in the series implements the routing update event,
      which requires the ability to dynamically modify a rule's match_to_reg
      modify header actions during the rule's lifetime. To accommodate this,
      refactor and extend the TC infrastructure in the following ways:
      
      - Modify mod_hdr infrastructure to preserve its parse attribute for whole
      rule lifetime, instead of deallocating it after rule creation.
      
      - Extend match_to_reg infrastructure with new function
      mlx5e_tc_match_to_reg_set_and_get_id(), which returns a mod_hdr action id
      that can be used afterwards to update the action, and
      mlx5e_tc_match_to_reg_mod_hdr_change(), which modifies an existing action
      by its id.
      
      - Extend tun API with new functions mlx5e_tc_tun_update_header_ipv{4|6}()
      that are used to update the tunnel header of an existing encap entry.
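
      The "set and get id, then change by id" pattern described above can be
      pictured with a small userspace sketch. All names prefixed with sketch_
      and the fixed capacity are assumptions made for illustration; this is not
      the driver's actual mod_hdr layout or API.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ACTIONS 16 /* hypothetical capacity, not from the driver */

/* Hypothetical stand-in for one modify-header "set register" action. */
struct sketch_mod_hdr_action {
    uint8_t  reg;  /* which metadata register is written */
    uint32_t data; /* value written into that register */
};

/* Action list that stays allocated for the whole rule lifetime. */
struct sketch_mod_hdr_acts {
    struct sketch_mod_hdr_action actions[SKETCH_MAX_ACTIONS];
    size_t num_actions;
};

/* Append an action and return its id (its index), or -1 when the list is full. */
int sketch_set_and_get_id(struct sketch_mod_hdr_acts *acts, uint8_t reg, uint32_t data)
{
    if (acts->num_actions >= SKETCH_MAX_ACTIONS)
        return -1;
    acts->actions[acts->num_actions].reg = reg;
    acts->actions[acts->num_actions].data = data;
    return (int)acts->num_actions++;
}

/* Rewrite an existing action in place, addressed by a previously returned id. */
int sketch_mod_hdr_change(struct sketch_mod_hdr_acts *acts, int act_id, uint32_t data)
{
    if (act_id < 0 || (size_t)act_id >= acts->num_actions)
        return -1;
    acts->actions[act_id].data = data;
    return 0;
}
```

      The id is only useful because the action list outlives rule creation,
      which is why the first bullet keeps the parse attribute alive for the
      whole rule lifetime.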
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Refactor neigh update infrastructure · 2221d954
      Vlad Buslov authored
      Following patches in the series implement route update, which can cause
      encap entries to migrate between routing devices. Consequently, their
      parent nhe's also need to be transferable between devices instead of
      having the neigh device as a part of their immutable key. Move neigh device
      from struct mlx5_neigh to struct mlx5e_neigh_hash_entry and check that nhe
      and neigh devices are the same in the workqueue neigh update handler.
      
      Save the neigh net_device, which can change dynamically, in a dedicated
      nhe->dev field. Once the FIB event handler implemented in following patches
      starts changing nhe->dev, the NETEVENT_DELAY_PROBE_TIME_UPDATE handler can
      concurrently access the nhe entry while traversing the neigh list under rcu
      read lock. Processing stale values in that handler doesn't change the handler
      logic, so just wrap all accesses to the dev pointer in {WRITE|READ}_ONCE()
      helpers.
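
      As a rough userspace analogy for the {WRITE|READ}_ONCE() accesses to
      nhe->dev, the sketch below uses C11 relaxed atomics. This only
      approximates the kernel helpers (which are built on volatile accesses),
      and every sketch_ name is an assumption made for illustration.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device. */
struct sketch_net_device {
    int ifindex;
};

/* Hypothetical stand-in for the neigh hash entry with its mutable dev field. */
struct sketch_nhe {
    _Atomic(struct sketch_net_device *) dev;
};

/* Writer side (the route update path): publish the new device pointer. */
void sketch_nhe_set_dev(struct sketch_nhe *nhe, struct sketch_net_device *dev)
{
    atomic_store_explicit(&nhe->dev, dev, memory_order_relaxed);
}

/* Reader side: load the pointer exactly once, then compare the snapshot. */
int sketch_nhe_matches_dev(struct sketch_nhe *nhe, const struct sketch_net_device *dev)
{
    struct sketch_net_device *cur =
        atomic_load_explicit(&nhe->dev, memory_order_relaxed);

    return cur == dev;
}
```

      A reader may still observe the old device; as the commit message notes,
      that is acceptable here because acting on a stale value does not change
      the handler logic.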
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Create route entry infrastructure · 777bb800
      Vlad Buslov authored
      Implement dedicated route entry infrastructure to be used in following
      patch by route update event. Both encap (indirectly through their
      corresponding encap entries) and decap (directly) flows are attached to
      routing entry. Since route update also requires updating encap (route
      device MAC address is a source MAC address of tunnel encapsulation), same
      encap_tbl_lock mutex is used for synchronization.
      
      The new infrastructure looks similar to existing infrastructures for shared
      encap, mod_hdr and hairpin entries:
      
      - Per-eswitch hash table is used for quick entry lookup.
      
      - Flows are attached to per-entry linked list and hold reference to entry
        during their lifetime.
      
      - Atomic reference counting and rcu mechanisms are used as synchronization
        primitives for concurrent access.
      
      The infrastructure also enables connection tracking on stacked devices
      topology by attaching CT chain 0 flow on tunneling dev to decap route
      entry.
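
      The shared-entry pattern listed above (hash table lookup plus reference
      counting) can be sketched in userspace as follows. This is a
      single-threaded illustration: the real code relies on atomic reference
      counts, rcu and encap_tbl_lock, none of which are modeled here. The key
      (an IPv4 endpoint address) and all sketch_ names are assumptions.

```c
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_ROUTE_BUCKETS 8 /* hypothetical table size */

/* Hypothetical route entry keyed by a tunnel endpoint IPv4 address. */
struct sketch_route_entry {
    uint32_t endpoint_ip;
    int refcnt; /* one reference per attached flow */
    struct sketch_route_entry *next;
};

struct sketch_route_tbl {
    struct sketch_route_entry *buckets[SKETCH_ROUTE_BUCKETS];
};

/* Look up an entry and take a reference, creating the entry on first use. */
struct sketch_route_entry *sketch_route_get(struct sketch_route_tbl *tbl, uint32_t ip)
{
    unsigned int b = ip % SKETCH_ROUTE_BUCKETS;
    struct sketch_route_entry *e;

    for (e = tbl->buckets[b]; e; e = e->next) {
        if (e->endpoint_ip == ip) {
            e->refcnt++;
            return e;
        }
    }

    e = calloc(1, sizeof(*e));
    if (!e)
        return NULL;
    e->endpoint_ip = ip;
    e->refcnt = 1;
    e->next = tbl->buckets[b];
    tbl->buckets[b] = e;
    return e;
}

/* Drop a reference; unlink and free on the last put. Returns 1 if freed. */
int sketch_route_put(struct sketch_route_tbl *tbl, struct sketch_route_entry *e)
{
    struct sketch_route_entry **pp;

    if (--e->refcnt > 0)
        return 0;
    for (pp = &tbl->buckets[e->endpoint_ip % SKETCH_ROUTE_BUCKETS]; *pp; pp = &(*pp)->next) {
        if (*pp == e) {
            *pp = e->next;
            break;
        }
    }
    free(e);
    return 1;
}
```

      Two flows that resolve to the same route share one entry, so a single
      route update event can reach every attached flow through it.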
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Extract tc tunnel encap/decap code to dedicated file · 0d9f9647
      Vlad Buslov authored
      Following patches in the series extend the extracted code with routing
      infrastructure. To improve code modularity, create a dedicated
      tc_tun_encap.c source file and move encap/decap related code to the new
      file. Export code that is used by both regular TC code and encap/decap code
      into tc_priv.h (new header intended to be used only by TC module). Rename
      some exported functions by adding "mlx5e_" prefix to their names.
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Match recirculated packet miss in slow table using reg_c1 · 8e404fef
      Vlad Buslov authored
      The previous patch in the series, which implements the stacked devices RX
      path, adds indirect table rules that match on tunnel VNI. After such a rule
      is created, all tunnel traffic is recirculated to the root table. However, a
      recirculated packet might not match any rule installed in the table (for
      example, when IP traffic follows ARP traffic). In that case packets appear
      on the representor of the tunnel endpoint VF instead of being redirected to
      the VF itself.
      
      Extend slow table with additional flow group that matches on reg_c0 (source
      port value set by indirect tables implemented by previous patch in series)
      and reg_c1 (special 0xFFF mark). When creating offloads fdb tables, install
      one rule per VF vport to match on recirculated miss packets and redirect
      them to appropriate VF vport. Modify indirect tables code to also rewrite
      reg_c1 with special 0xFFF mark.
      
      Implementation reuses reg_c1 tunnel id bits. This is safe to do because
      recirculated packets are always matched before decapsulation.
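
      The slow table behavior described above can be modeled as a lookup over
      per-VF rules. This is a simplified sketch: it treats the mark as a whole
      field rather than specific reg_c1 bits, and the sketch_ names and the
      "no vport" sentinel are assumptions, not driver definitions.

```c
#include <stdint.h>

#define SKETCH_RECIRC_MISS_MARK 0xFFFu /* the special 0xFFF mark from the text */
#define SKETCH_NO_VPORT (-1)           /* hypothetical "no rule matched" result */

/* Metadata carried by a recirculated packet. */
struct sketch_pkt_meta {
    uint32_t reg_c0_src_port; /* source port value set by the indirect table */
    uint32_t reg_c1_mark;     /* mark written by the indirect table */
};

/*
 * One rule per VF vport: match on (reg_c0 == that vport, reg_c1 == 0xFFF)
 * and redirect to the vport. Returns the destination vport or SKETCH_NO_VPORT.
 */
int sketch_slow_table_lookup(const struct sketch_pkt_meta *m,
                             const uint32_t *vf_vports, int num_vfs)
{
    int i;

    if (m->reg_c1_mark != SKETCH_RECIRC_MISS_MARK)
        return SKETCH_NO_VPORT;
    for (i = 0; i < num_vfs; i++) {
        if (vf_vports[i] == m->reg_c0_src_port)
            return (int)vf_vports[i];
    }
    return SKETCH_NO_VPORT;
}
```

      Packets without the mark fall through to the existing slow path
      handling, so only recirculated misses are steered to the VF.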
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Refactor reg_c1 usage · 48d216e5
      Vlad Buslov authored
      Following patch in series uses reg_c1 in eswitch code. To use reg_c1
      helpers in both TC and eswitch code, refactor existing helpers according to
      similar use case of reg_c0 and move the functionality into eswitch.h.
      Calculate reg mappings length from new defines to ensure that they are
      always in sync and only need to be changed in single place.
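
      Deriving every mask and offset from a few width defines might look like
      the sketch below. The field names and widths (an 8-bit zone field plus a
      24-bit tunnel field) are assumptions chosen for illustration, not values
      taken from this commit.

```c
#include <stdint.h>

/* Assumed field widths; everything else is derived from these defines. */
#define SKETCH_ZONE_ID_BITS  8
#define SKETCH_TUN_OPTS_BITS 12
#define SKETCH_TUN_ID_BITS   12

#define SKETCH_TUN_OFFSET    SKETCH_ZONE_ID_BITS
#define SKETCH_TUN_BITS      (SKETCH_TUN_OPTS_BITS + SKETCH_TUN_ID_BITS)
#define SKETCH_ZONE_ID_MASK  ((1u << SKETCH_ZONE_ID_BITS) - 1u)
#define SKETCH_TUN_MASK      (((1u << SKETCH_TUN_BITS) - 1u) << SKETCH_TUN_OFFSET)

/* Changing one width without the others is caught at compile time. */
_Static_assert(SKETCH_ZONE_ID_BITS + SKETCH_TUN_BITS == 32,
               "fields must exactly fill the 32-bit register");

/* Pack the tunnel and zone fields into one register value. */
uint32_t sketch_reg_c1_pack(uint32_t tun, uint32_t zone)
{
    return ((tun << SKETCH_TUN_OFFSET) & SKETCH_TUN_MASK) |
           (zone & SKETCH_ZONE_ID_MASK);
}

/* Extract the tunnel field. */
uint32_t sketch_reg_c1_tun(uint32_t reg)
{
    return (reg & SKETCH_TUN_MASK) >> SKETCH_TUN_OFFSET;
}

/* Extract the zone field. */
uint32_t sketch_reg_c1_zone(uint32_t reg)
{
    return reg & SKETCH_ZONE_ID_MASK;
}
```

      With this layout a width change happens in a single place, which is the
      property the commit message asks for.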
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: VF tunnel RX traffic offloading · a508728a
      Vlad Buslov authored
      When tunnel endpoint is on VF the encapsulated RX traffic is exposed on the
      representor of the VF without any further processing of rules installed on
      the VF. Detect such case by checking if the device returned by route lookup
      in decap rule handling code is a mlx5 VF and handle it with new redirection
      tables API.
      
      Example TC rules for VF tunnel traffic:
      
      1. Rule that encapsulates the tunneled flow and redirects packets from
      source VF rep to tunnel device:
      
      $ tc -s filter show dev enp8s0f0_1 ingress
      filter protocol ip pref 4 flower chain 0
      filter protocol ip pref 4 flower chain 0 handle 0x1
        dst_mac 0a:40:bd:30:89:99
        src_mac ca:2e:a7:3f:f5:0f
        eth_type ipv4
        ip_tos 0/0x3
        ip_flags nofrag
        in_hw in_hw_count 1
              action order 1: tunnel_key  set
              src_ip 7.7.7.5
              dst_ip 7.7.7.1
              key_id 98
              dst_port 4789
              nocsum
              ttl 64 pipe
               index 1 ref 1 bind 1 installed 411 sec used 411 sec
              Action statistics:
              Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
              backlog 0b 0p requeues 0
              no_percpu
              used_hw_stats delayed
      
              action order 2: mirred (Egress Redirect to device vxlan_sys_4789) stolen
              index 1 ref 1 bind 1 installed 411 sec used 0 sec
              Action statistics:
              Sent 5615833 bytes 4028 pkt (dropped 0, overlimits 0 requeues 0)
              Sent software 0 bytes 0 pkt
              Sent hardware 5615833 bytes 4028 pkt
              backlog 0b 0p requeues 0
              cookie bb406d45d343bf7ade9690ae80c7cba4
              no_percpu
              used_hw_stats delayed
      
      2. Rule that redirects from tunnel device to UL rep:
      
      $ tc -s filter show dev vxlan_sys_4789 ingress
      filter protocol ip pref 4 flower chain 0
      filter protocol ip pref 4 flower chain 0 handle 0x1
        dst_mac ca:2e:a7:3f:f5:0f
        src_mac 0a:40:bd:30:89:99
        eth_type ipv4
        enc_dst_ip 7.7.7.5
        enc_src_ip 7.7.7.1
        enc_key_id 98
        enc_dst_port 4789
        enc_tos 0
        ip_flags nofrag
        in_hw in_hw_count 1
              action order 1: tunnel_key  unset pipe
               index 2 ref 1 bind 1 installed 434 sec used 434 sec
              Action statistics:
              Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
              backlog 0b 0p requeues 0
              used_hw_stats delayed
      
              action order 2: mirred (Egress Redirect to device enp8s0f0_1) stolen
              index 4 ref 1 bind 1 installed 434 sec used 0 sec
              Action statistics:
              Sent 129936 bytes 1082 pkt (dropped 0, overlimits 0 requeues 0)
              Sent software 0 bytes 0 pkt
              Sent hardware 129936 bytes 1082 pkt
              backlog 0b 0p requeues 0
              cookie ac17cf398c4c69e4a5b2f7aabd1b88ff
              no_percpu
              used_hw_stats delayed
      Co-developed-by: Dmytro Linkin <dlinkin@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Remove redundant match on tunnel destination mac · 4ad9116c
      Vlad Buslov authored
      Remove hardcoded match on tunnel destination MAC address. Such match is no
      longer required and would be wrong for stacked devices topology where
      encapsulation destination MAC address will be the address of tunnel VF that
      can change dynamically on route change (implemented in following patches in
      the series).
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: E-Switch, Indirect table infrastructure · 34ca6535
      Vlad Buslov authored
      Indirect table infrastructure is used to allow fully processing VF tunnel
      traffic in hardware. Kernel software model uses two TC rules for such
      traffic: UL rep to tunnel device, then tunnel VF rep to destination VF rep.
      To implement such a pipeline, the driver needs to program the hardware,
      after matching on the UL rule, to overwrite the source vport from UL to
      tunnel VF and recirculate the packet to the root table to allow matching on
      the rule installed on tunnel VF. For this, the indirect table matches all
      encapsulated traffic by tunnel parameters, and all other IP traffic is sent
      to the tunnel VF by the miss rule.
      
      Indirect table API overview:
      
      - mlx5_esw_indir_table_{init|destroy}() - init and destroy opaque indirect
      table object.
      
      - mlx5_esw_indir_table_get() - get or create new table according to vport
      id and IP version. Table has following pre-created groups: recirculation
      group with match on ethertype and VNI (rules that match encapsulated
      packets are installed to this group) and forward group with default/miss
      rule that forwards to vport of tunnel endpoint VF (rule for regular
      non-encapsulated packets).
      
      - mlx5_esw_indir_table_put() - decrease reference to the indirect table and
      matching rule (for encapsulated traffic).
      
      - mlx5_esw_indir_table_needed() - check that in_port is an uplink port and
      out_port is VF on the same eswitch, verify that the rule is for IP traffic
      and source port rewrite functionality can be used.
      
      - mlx5_esw_indir_table_decap_vport() - function returns decap vport of
      flow attribute.
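
      The two groups of an indirect table (recirculation rules for
      encapsulated packets and a default/miss rule for everything else) can be
      pictured with the following sketch. It models one table with a single
      installed rule; the struct layouts and sketch_ names are assumptions.

```c
#include <stdint.h>

enum sketch_verdict {
    SKETCH_RECIRCULATE,  /* source vport rewritten, packet goes back to root table */
    SKETCH_FORWARD_TO_VF /* default/miss rule: forward to tunnel endpoint VF */
};

/* Hypothetical packet view with only the fields the table matches on. */
struct sketch_pkt {
    uint16_t ethertype;
    int      is_encap; /* nonzero for tunnel-encapsulated packets */
    uint32_t vni;
    uint32_t src_vport;
};

/* Hypothetical indirect table holding one recirculation rule. */
struct sketch_indir_table {
    uint16_t ethertype;       /* IP version the table was created for */
    uint32_t vni;             /* VNI of the installed recirculation rule */
    uint32_t tunnel_vf_vport; /* vport of the tunnel endpoint VF */
};

/* Apply the recirculation group first, then fall back to the miss rule. */
enum sketch_verdict sketch_indir_process(const struct sketch_indir_table *t,
                                         struct sketch_pkt *p)
{
    if (p->is_encap && p->ethertype == t->ethertype && p->vni == t->vni) {
        p->src_vport = t->tunnel_vf_vport; /* overwrite source vport: UL -> tunnel VF */
        return SKETCH_RECIRCULATE;
    }
    return SKETCH_FORWARD_TO_VF;
}
```

      After recirculation the packet looks as if it arrived from the tunnel
      VF, so the second TC rule of the software model can match it in hardware.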
      Co-developed-by: Dmytro Linkin <dlinkin@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Refactor tun routing helpers · 6717986e
      Vlad Buslov authored
      Refactor tun routing helpers to use dedicated struct
      mlx5e_tc_tun_route_attr instead of multiple output arguments. This
      simplifies the callers (no need to keep track of a bunch of output param
      pointers) and allows unifying struct release code in the new
      mlx5e_tc_tun_route_attr_cleanup() helper instead of requiring callers to
      manually release some of the output parameters that require it.
      
      Simplify code by unifying error handling at the end of the function and
      rearranging code. Remove redundant empty line.
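
      The shape of this refactoring, one result struct with a matching cleanup
      helper and a single error exit, is sketched below in userspace C. The
      fields, the string-based "devices" and the placeholder TTL are
      assumptions and do not mirror struct mlx5e_tc_tun_route_attr.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical result struct replacing several output parameters. */
struct sketch_route_attr {
    char *out_dev;   /* owned copy, released by the cleanup helper */
    char *route_dev; /* owned copy, released by the cleanup helper */
    int ttl;
};

/* Single place that knows which members need releasing; safe to call twice. */
void sketch_route_attr_cleanup(struct sketch_route_attr *attr)
{
    free(attr->out_dev);
    free(attr->route_dev);
    attr->out_dev = NULL;
    attr->route_dev = NULL;
}

static char *sketch_copy(const char *s)
{
    char *c = malloc(strlen(s) + 1);

    if (c)
        strcpy(c, s);
    return c;
}

/* Fill the struct; on any failure, unwind through one error label. */
int sketch_route_lookup(struct sketch_route_attr *attr,
                        const char *out_dev, const char *route_dev)
{
    memset(attr, 0, sizeof(*attr));
    attr->out_dev = sketch_copy(out_dev);
    attr->route_dev = sketch_copy(route_dev);
    if (!attr->out_dev || !attr->route_dev)
        goto err;
    attr->ttl = 64; /* placeholder value for the sketch */
    return 0;

err:
    sketch_route_attr_cleanup(attr);
    return -1;
}
```

      Callers then pair every successful lookup with one cleanup call instead
      of releasing individual output parameters by hand.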
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: VF tunnel TX traffic offloading · 10742efc
      Vlad Buslov authored
      When tunnel endpoint is on VF, driver still assumes that endpoint is on
      uplink and incorrectly configures encap rule offload according to that
      assumption. As a result, traffic is sent directly to the uplink and rules
      installed on representor of tunnel endpoint VF are ignored.
      
      Implement following changes to allow offloading tx traffic with tunnel
      endpoint on VF:
      
      - For tunneling flows perform route lookup on route and out devices pair.
      If out device is uplink and route device is VF of same physical port, then
      modify packet reg_c_0 metadata register (source port) with the value of VF
      vport. Use eswitch vhca_id->vport mapping introduced in one of previous
      patches in the series to obtain vport from route netdevice.
      
      - Recirculate encapsulated packets to VF vport in order to apply any flow
      rules installed on VF representor that match on encapsulated traffic.
      
      Only enable support for this functionality when all following conditions
      are true:
      
      - Hardware advertises capability to preserve reg_c_0 value on packet
      recirculation.
      
      - Vport metadata matching is enabled.
      
      - Termination tables are to be used by the flow.
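
      The decision logic described above can be summarized as two predicates:
      one for the three enabling conditions and one for the uplink/VF check.
      The structs and sketch_ names below are assumptions made for
      illustration; the driver queries real capabilities and netdevices.

```c
#include <stdbool.h>

/* Hypothetical flags mirroring the three enabling conditions. */
struct sketch_caps {
    bool reg_c0_preserved_on_recirc; /* hardware keeps reg_c_0 on recirculation */
    bool vport_metadata_matching;    /* vport metadata matching is enabled */
    bool termination_tables_used;    /* flow uses termination tables */
};

/* Hypothetical device description. */
struct sketch_dev {
    bool is_uplink;
    bool is_vf;
    int  phys_port; /* physical port the device belongs to */
};

/* All three conditions must hold for the feature to be enabled. */
bool sketch_vf_tunnel_supported(const struct sketch_caps *c)
{
    return c->reg_c0_preserved_on_recirc &&
           c->vport_metadata_matching &&
           c->termination_tables_used;
}

/*
 * Rewrite reg_c_0 (source port) to the VF vport only when the out device is
 * the uplink and the route device is a VF of the same physical port.
 */
bool sketch_needs_src_port_rewrite(const struct sketch_caps *c,
                                   const struct sketch_dev *out_dev,
                                   const struct sketch_dev *route_dev)
{
    return sketch_vf_tunnel_supported(c) &&
           out_dev->is_uplink &&
           route_dev->is_vf &&
           out_dev->phys_port == route_dev->phys_port;
}
```

      When the predicate is false, the flow keeps the previous behavior of
      treating the uplink as the tunnel endpoint.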
      
      Example TC rules for VF tunnel traffic:
      
      1. Rule that redirects packets from UL to VF rep that has the tunnel
      endpoint IP address:
      
      $ tc -s filter show dev enp8s0f0 ingress
      filter protocol ip pref 4 flower chain 0
      filter protocol ip pref 4 flower chain 0 handle 0x1
        dst_mac 16:c9:a0:2d:69:2c
        src_mac 0c:42:a1:58:ab:e4
        eth_type ipv4
        ip_flags nofrag
        in_hw in_hw_count 1
              action order 1: mirred (Egress Redirect to device enp8s0f0_0) stolen
              index 3 ref 1 bind 1 installed 377 sec used 0 sec
              Action statistics:
              Sent 114096 bytes 952 pkt (dropped 0, overlimits 0 requeues 0)
              Sent software 0 bytes 0 pkt
              Sent hardware 114096 bytes 952 pkt
              backlog 0b 0p requeues 0
              cookie 878fa48d8c423fc08c3b6ca599b50a97
              no_percpu
              used_hw_stats delayed
      
      2. Rule that decapsulates the tunneled flow and redirects to destination VF
      representor:
      
      $ tc -s filter show dev vxlan_sys_4789 ingress
      filter protocol ip pref 4 flower chain 0
      filter protocol ip pref 4 flower chain 0 handle 0x1
        dst_mac ca:2e:a7:3f:f5:0f
        src_mac 0a:40:bd:30:89:99
        eth_type ipv4
        enc_dst_ip 7.7.7.5
        enc_src_ip 7.7.7.1
        enc_key_id 98
        enc_dst_port 4789
        enc_tos 0
        ip_flags nofrag
        in_hw in_hw_count 1
              action order 1: tunnel_key  unset pipe
               index 2 ref 1 bind 1 installed 434 sec used 434 sec
              Action statistics:
              Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
              backlog 0b 0p requeues 0
              used_hw_stats delayed
      
              action order 2: mirred (Egress Redirect to device enp8s0f0_1) stolen
              index 4 ref 1 bind 1 installed 434 sec used 0 sec
              Action statistics:
              Sent 129936 bytes 1082 pkt (dropped 0, overlimits 0 requeues 0)
              Sent software 0 bytes 0 pkt
              Sent hardware 129936 bytes 1082 pkt
              backlog 0b 0p requeues 0
              cookie ac17cf398c4c69e4a5b2f7aabd1b88ff
              no_percpu
              used_hw_stats delayed
      Co-developed-by: Dmytro Linkin <dlinkin@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: E-Switch, Refactor rule offload forward action processing · 9e51c0a6
      Vlad Buslov authored
      Following patches in the series extend forwarding functionality with VF
      tunnel TX and RX handling. Extract action forwarding processing code into
      dedicated functions to simplify further extensions:
      
      - Handle every forwarding case with dedicated function instead of inline
      code.
      
      - Extract forwarding dest dispatch conditional into helper function
      esw_setup_dests().
      
      - Unify forwarding cleanup code in error path of
      mlx5_eswitch_add_offloaded_rule() and in rule deletion code of
      __mlx5_eswitch_del_rule() in new helper function esw_cleanup_dests() (dual
      to new esw_setup_dests() helper).
      
      This patch does not change functionality.
      Co-developed-by: Dmytro Linkin <dlinkin@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Always set attr mdev pointer · 275c21d6
      Vlad Buslov authored
      Eswitch offloads extensions in following patches in the series require
      attr->esw_attr->in_mdev pointer to always be set. This is already the case
      for all code paths except mlx5_tc_ct_entry_add_rule() function. Fix the
      function to assign mdev pointer with priv->mdev value.
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: E-Switch, Maintain vhca_id to vport_num mapping · 84ae9c1f
      Vlad Buslov authored
      Following patches in the series need to be able to map VF netdev to vport.
      Since it is trivial to obtain vhca_id from netdev, maintain mapping from
      vhca_id to vport_num inside eswitch offloads using xarray. Provide function
      mlx5_eswitch_vhca_id_to_vport() to be used by TC code in following patches
      to obtain the mapping.
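
      A userspace stand-in for the vhca_id to vport_num mapping is sketched
      below. It uses a fixed array where the driver uses an xarray, and the
      capacity, the "value plus one" empty-slot encoding and all sketch_ names
      are assumptions made for illustration.

```c
#include <stdint.h>

#define SKETCH_MAX_VHCA_ID 256 /* hypothetical bound on vhca_id values */

/* Each slot stores vport_num + 1 so that 0 can mean "no mapping". */
struct sketch_vhca_map {
    uint32_t slot[SKETCH_MAX_VHCA_ID];
};

/* Record a mapping; fails if the id is out of range or already mapped. */
int sketch_vhca_map_insert(struct sketch_vhca_map *m, uint16_t vhca_id, uint16_t vport_num)
{
    if (vhca_id >= SKETCH_MAX_VHCA_ID || m->slot[vhca_id])
        return -1;
    m->slot[vhca_id] = (uint32_t)vport_num + 1;
    return 0;
}

/* Look up the vport for a vhca_id; returns 0 and fills *vport_num on success. */
int sketch_vhca_id_to_vport(const struct sketch_vhca_map *m, uint16_t vhca_id,
                            uint16_t *vport_num)
{
    if (vhca_id >= SKETCH_MAX_VHCA_ID || !m->slot[vhca_id])
        return -1;
    *vport_num = (uint16_t)(m->slot[vhca_id] - 1);
    return 0;
}
```

      TC code that holds only a netdev can then derive the vhca_id and ask
      the eswitch for the matching vport.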
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: E-Switch, Refactor setting source port · b055ecf5
      Mark Bloch authored
      Setting the source port requires only the E-Switch and vport number.
      Refactor the function to get those parameters instead of passing the full
      attribute.
      Signed-off-by: Mark Bloch <mbloch@nvidia.com>
      Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
  2. 05 Feb, 2021 25 commits