1. 04 Sep, 2019 1 commit
    • Merge tag 'mlx5-updates-2019-09-01-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 94810bd3
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2019-09-01  (Software steering support)
      
      Abstract:
      --------
      Mellanox ConnectX devices support packet matching, packet modification and
      redirection. These functionalities are also referred to as flow steering.
      To configure a steering rule, the rule is written to device-owned
      memory, which the device accesses and caches when processing
      a packet.
      Steering rules are constructed from multiple steering entries (STE).
      
      Rules are configured using the firmware command interface. The firmware
      processes the given driver command, translates it into STEs, and then
      writes them to device memory in the current steering tables.
      This process is slow due to the architecture of the command interface and
      the processing complexity of each rule.
      
      The highlight of this patchset is cutting out the middleman (the firmware)
      and programming steering rules into the device directly from the driver,
      with no firmware intervention whatsoever.
      
      Motivation:
      -----------
      Software (driver-managed) steering allows higher rule insertion rates
      than the FW steering described above. This is achieved by using
      internal RDMA writes to device-owned memory instead of the slow
      command interface to program steering rules.

      Software (driver-managed) steering also doesn't depend on new FW
      for new steering functionality: new implementations can be done in the
      driver, skipping the FW layer.
      
      Performance:
      ------------
      Using the new approach, a single core can program ~300K rules per second
      (measured with a direct raw test against the new mlx5 SW steering layer,
      without any kernel layer involved).
      
      Test: TC L2 rules
      33K rules/s with software steering (this patchset).
       5K rules/s with FW and the current driver.
      This will improve the performance of OVS-based solutions.
      
      Architecture and implementation details:
      ----------------------------------------
      Software steering will be dynamically selected via a devlink device
      parameter. Example:
      $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
                pci/0000:06:00.0:
                name flow_steering_mode type driver-specific
                values:
                   cmode runtime value smfs
      
      The mlx5 software steering module, a.k.a. DR (Direct Rule), is implemented
      and contained in the mlx5/core/steering directory and is controlled by the
      MLX5_SW_STEERING kconfig flag.
      
      The mlx5 core steering layer (fs_core) already provides a shim layer for
      implementing different steering mechanisms; software steering leverages
      it, as seen at the end of this series.
      
      When software steering is supported for a specific steering domain
      (NIC/RDMA/Vport/ESwitch, etc.), rules targeting this domain are created
      using SW steering instead of FW.
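The shim layer described above amounts to a per-mechanism table of callbacks that the core dispatches through. Below is a minimal user-space sketch of that dispatch. It assumes hypothetical names (`struct steering_cmds`, `fw_create_rule()`, `dr_create_rule()`, `select_steering_cmds()`) that are not the fs_core API, and its callbacks only count which backend handled a rule:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical backend interface: each steering mechanism fills in the
 * same callbacks, so the core layer does not care which one runs. */
struct steering_cmds {
    const char *name;
    int (*create_rule)(unsigned int rule_id);
};

/* Hypothetical counters standing in for real FW commands / DR writes. */
static unsigned int fw_rules_created;
static unsigned int dr_rules_created;

static int fw_create_rule(unsigned int rule_id)
{
    (void)rule_id;          /* a real backend would issue a FW command */
    fw_rules_created++;
    return 0;
}

static int dr_create_rule(unsigned int rule_id)
{
    (void)rule_id;          /* a real backend would write STEs directly */
    dr_rules_created++;
    return 0;
}

static const struct steering_cmds fw_cmds = { "fw", fw_create_rule };
static const struct steering_cmds dr_cmds = { "dr", dr_create_rule };

/* Pick the backend per steering domain: SW steering only where the
 * domain supports it, FW commands otherwise. */
const struct steering_cmds *select_steering_cmds(bool domain_supports_sw)
{
    return domain_supports_sw ? &dr_cmds : &fw_cmds;
}
```

Callers only ever go through `cmds->create_rule()`, which is what lets a new mechanism be plugged in without touching rule-creation call sites.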
      
      The implementation includes:
      Domain - The steering domain is the object that all other objects reside
          in. It holds the memory allocator, send engine, locks and other shared
          data needed by lower objects such as table, matcher, rule and action.
          Each domain can contain multiple tables. A domain is equivalent to a
          namespace, e.g. NIC/RDMA/Vport/ESwitch, etc., as currently implemented
          in mlx5_core fs_core (flow steering core).
      
      Table - Table objects are used for holding multiple matchers; each table
          has a level used to prevent processing loops. Packets are directed
          to this table once it is set as the root table, which is done by
          fs_core using a FW command. A packet is processed inside the table
          matcher by matcher until a successful hit; otherwise the packet
          performs the default action.
      
      Matcher - Matcher objects are used to specify the field mask for
          matching when processing a packet. A matcher belongs to a table; each
          matcher can hold multiple rules, each rule with different matching
          values corresponding to the matcher mask. Each matcher has a priority
          used for rule processing order inside the table.
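The table and matcher behaviour described above (matchers tried in priority order, each comparing the masked packet fields against its rules' values, with a default action on a miss) can be sketched as follows. All names, the 4-byte match width, and the convention that a lower priority value is tried first are assumptions for illustration; the real DR code matches on STEs in device memory, not in a C loop:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MATCH_BYTES 4           /* hypothetical match-field width */

struct sketch_rule {
    uint8_t value[MATCH_BYTES]; /* values compared under the matcher mask */
    int action_id;              /* returned on a hit */
};

struct sketch_matcher {
    unsigned int priority;      /* assumed: lower value is tried first */
    uint8_t mask[MATCH_BYTES];
    const struct sketch_rule *rules;
    size_t nrules;
};

struct sketch_table {
    struct sketch_matcher *matchers;
    size_t nmatchers;
    int default_action;         /* used when no matcher hits */
};

static bool rule_hits(const struct sketch_matcher *m,
                      const struct sketch_rule *r, const uint8_t *pkt)
{
    for (size_t i = 0; i < MATCH_BYTES; i++)
        if ((pkt[i] & m->mask[i]) != (r->value[i] & m->mask[i]))
            return false;
    return true;
}

static int by_priority(const void *a, const void *b)
{
    const struct sketch_matcher *x = a, *y = b;

    return (x->priority > y->priority) - (x->priority < y->priority);
}

/* Order matchers once, so lookups can walk them front to back. */
void table_sort(struct sketch_table *t)
{
    qsort(t->matchers, t->nmatchers, sizeof(*t->matchers), by_priority);
}

/* Walk matcher by matcher until a rule hits; else the default action. */
int table_lookup(const struct sketch_table *t, const uint8_t *pkt)
{
    for (size_t m = 0; m < t->nmatchers; m++) {
        const struct sketch_matcher *mt = &t->matchers[m];

        for (size_t r = 0; r < mt->nrules; r++)
            if (rule_hits(mt, &mt->rules[r], pkt))
                return mt->rules[r].action_id;
    }
    return t->default_action;
}
```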
      
      Action - Action objects are created to specify different steering actions
          such as count, reformat (encapsulate, decapsulate, ...), modify
          header, forward to table and many other actions. When creating a rule,
          a sequence of actions can be provided to be executed on a successful
          match.
      
      Rule - Rule objects are used to specify a specific match on packets as
          well as the actions that should be executed. A rule belongs to a
          matcher.
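A rule pairs a match value with an ordered sequence of actions. The following minimal sketch runs such a sequence on a hit. It assumes hypothetical action kinds: `ACT_SET_VLAN` stands in for a modify-header action and `ACT_FWD_TABLE` for forward-to-table. Matching itself is not shown:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical action kinds; ACT_SET_VLAN stands in for modify header. */
enum act_type { ACT_COUNT, ACT_SET_VLAN, ACT_FWD_TABLE };

struct act {
    enum act_type type;
    uint32_t arg;               /* meaning depends on type */
};

struct pkt_state {
    uint16_t vlan;              /* a header field an action may rewrite */
    int next_table;             /* -1 until a forward action runs */
};

struct flow_counter {
    uint64_t packets;
};

/* Hypothetical rule: match value plus the actions to run on a hit. */
struct sw_rule {
    uint32_t match_value;       /* used by the matcher, not shown here */
    const struct act *acts;
    size_t nacts;
};

/* Execute the rule's actions in the order they were provided. */
void rule_apply(const struct sw_rule *rule, struct pkt_state *pkt,
                struct flow_counter *cnt)
{
    for (size_t i = 0; i < rule->nacts; i++) {
        const struct act *a = &rule->acts[i];

        switch (a->type) {
        case ACT_COUNT:
            cnt->packets++;
            break;
        case ACT_SET_VLAN:
            pkt->vlan = (uint16_t)a->arg;
            break;
        case ACT_FWD_TABLE:
            pkt->next_table = (int)a->arg;
            break;
        }
    }
}
```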
      
      STE - This layer is used to hold the specific STE format for the device
          and to convert the requested rule to STEs. Each rule is constructed of
          an STE chain; multiple rules construct a steering graph. Each node in
          the graph is a hash table containing multiple STEs. The index of each
          STE in the hash table is calculated using a CRC32 hash function.
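The following sketch shows how an STE's position in a hash table can be derived from a CRC32 hash. It hashes only the bits selected by the matcher mask and keeps the low bits of the result as the index. It assumes the standard reflected CRC-32 (polynomial 0xEDB88320) and a power-of-two table size. The exact CRC variant, seed and tag layout used by the device are not stated here, and `ste_hash_index()` is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

/* One byte of the reflected CRC-32 (IEEE 802.3) update. */
static uint32_t crc32_step(uint32_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++)
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    return crc;
}

uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
        crc = crc32_step(crc, data[i]);
    return ~crc;
}

/* Hypothetical: index of an STE in a hash table of 2^log_size entries.
 * Only bits selected by the matcher mask take part in the hash, so tags
 * that differ only in unmasked bits land in the same bucket. */
uint32_t ste_hash_index(const uint8_t *tag, const uint8_t *mask,
                        size_t len, unsigned int log_size)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
        crc = crc32_step(crc, (uint8_t)(tag[i] & mask[i]));
    crc = ~crc;
    return crc & ((1u << log_size) - 1u);
}
```

`crc32_ieee()` returns the standard check value 0xCBF43926 for the ASCII string "123456789". Masking before hashing is what lets every rule of one matcher share a single hash function.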
      
      Memory pool - Used for managing and caching device-owned memory for rule
          insertion. The memory is allocated using the DM (device memory) API.
      
      Communication with device - A layer for standard RDMA operations using an
          RC QP to configure device steering.
      
      Command utility - This module holds all of the FW commands that are
          required for SW steering to function.
      
      Patch planning and files:
      -------------------------
      1) The first patch adds support for flow steering actions to the fs_cmd
      shim layer.

      2) The next 12 patches add a file per software steering
      functionality/module as described above. (See patches with title: DR, *)

      3) Add CONFIG_MLX5_SW_STEERING for software steering support and enable
      building the new files.
      
      4) The next two patches add support for software steering in the mlx5
      steering shim layer:
      net/mlx5: Add API to set the namespace steering mode
      net/mlx5: Add direct rule fs_cmd implementation
      
      5) The last two patches add the new devlink parameter to select the mlx5
      steering mode, which is valid only for switchdev mode for now.
      Two modes are supported:
          1. DMFS - Device managed flow steering
          2. SMFS - Software/Driver managed flow steering

          In DMFS mode, the HW steering entities are created through the
          FW. In SMFS mode, these entities are created directly by the
          driver.

          The driver will use the devlink steering mode only if the steering
          domain supports it; for now, SMFS manages only the switchdev
          eswitch steering domain.
      
          User command examples:
          - Set SMFS flow steering mode::
      
              $ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime
      
          - Read device flow steering mode::
      
              $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
                pci/0000:06:00.0:
                name flow_steering_mode type driver-specific
                values:
                   cmode runtime value smfs
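The mode-selection rule described above can be sketched as below: the requested devlink mode is honoured only where the steering domain supports it, currently only the switchdev eswitch domain. The enum names, `parse_steering_mode()` and `effective_steering_mode()` are hypothetical. "dmfs" is assumed to be the value string for DMFS mode, alongside the "smfs" shown above:

```c
#include <stdbool.h>
#include <string.h>

enum steering_mode {
    STEERING_MODE_INVALID = -1,
    STEERING_MODE_DMFS,         /* FW creates the steering entities */
    STEERING_MODE_SMFS,         /* driver writes them directly */
};

/* Hypothetical domain ids; per the text, only the switchdev eswitch
 * domain supports SMFS for now. */
enum steering_domain { DOMAIN_NIC, DOMAIN_RDMA, DOMAIN_ESWITCH_SWITCHDEV };

/* Parse the string passed as "value" to devlink dev param set. */
enum steering_mode parse_steering_mode(const char *value)
{
    if (strcmp(value, "dmfs") == 0)
        return STEERING_MODE_DMFS;
    if (strcmp(value, "smfs") == 0)
        return STEERING_MODE_SMFS;
    return STEERING_MODE_INVALID;
}

static bool domain_supports_smfs(enum steering_domain domain)
{
    return domain == DOMAIN_ESWITCH_SWITCHDEV;
}

/* The requested mode only takes effect where the domain supports it;
 * every other domain keeps FW (DMFS) steering. */
enum steering_mode effective_steering_mode(enum steering_mode requested,
                                           enum steering_domain domain)
{
    if (requested == STEERING_MODE_SMFS && domain_supports_smfs(domain))
        return STEERING_MODE_SMFS;
    return STEERING_MODE_DMFS;
}
```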
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 03 Sep, 2019 18 commits
  3. 02 Sep, 2019 16 commits
    • Merge branch 'mvpp2-per-cpu-buffers' · 67538eb5
      David S. Miller authored
      Matteo Croce says:
      
      ====================
      mvpp2: per-cpu buffers
      
      This patchset works around a PP2 HW limitation which prevents the use of
      per-cpu rx buffers.
      The first patch is just a refactor to prepare for the second one.
      The second one allocates per-cpu buffers if the following conditions are met:
      - the number of CPUs is less than or equal to 4
      - no port is using jumbo frames

      If these conditions are not met at load time, or if jumbo frames are
      enabled later on, the driver reverts to the shared allocation.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mvpp2: percpu buffers · 7d04b0b1
      Matteo Croce authored
      Every mvpp2 unit can use up to 8 buffer pools managed by the BM (the HW
      buffer manager). The HW places each frame in a buffer pool depending on
      the frame size: short (< 128 bytes), long (< 1664 bytes) or jumbo (up to
      9856 bytes).
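The size-based pool choice the HW makes can be modelled in software as follows, using the thresholds quoted above. This is a sketch only: `bm_pool_for_frame()` and the enum are hypothetical, and in the driver the selection is performed by the hardware, not by a C function:

```c
/* Pool kinds described in the commit message. */
enum bm_pool_kind { BM_POOL_SHORT, BM_POOL_LONG, BM_POOL_JUMBO, BM_POOL_NONE };

/* Size thresholds from the text, in bytes. */
#define BM_SHORT_LIMIT  128u    /* short: < 128 */
#define BM_LONG_LIMIT   1664u   /* long:  < 1664 */
#define BM_JUMBO_MAX    9856u   /* jumbo: up to 9856 */

enum bm_pool_kind bm_pool_for_frame(unsigned int frame_size)
{
    if (frame_size < BM_SHORT_LIMIT)
        return BM_POOL_SHORT;
    if (frame_size < BM_LONG_LIMIT)
        return BM_POOL_LONG;
    if (frame_size <= BM_JUMBO_MAX)
        return BM_POOL_JUMBO;
    return BM_POOL_NONE;        /* larger than any pool can hold */
}
```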
      
      As any unit can have up to 4 ports, the driver allocates only 2 pools,
      one for short and one for long frames, and shares them between ports.
      When the first port MTU is set higher than 1664 bytes, a third pool is
      allocated for jumbo frames.
      
      This shared allocation makes it impossible to use per-cpu allocators,
      and creates contention between HW queues.
      
      If possible, i.e. if the number of possible CPUs is at most 4 and jumbo
      frames are not used, switch to a new scheme: allocate 8 per-cpu pools for
      short and long frames and bind every pool to an RXQ.
      
      When the first port MTU is set higher than 1664 bytes, the allocation
      scheme reverts to the old behaviour (3 shared pools), and when the MTU of
      all ports is lowered again, the per-cpu buffers are allocated again.
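The conditions for the per-cpu scheme can be sketched as two small checks. The CPU limit is read here as "one short and one long pool per CPU must fit into the 8 BM pools", which gives the at-most-4-CPUs condition from the cover letter. The function names and constants are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

#define BM_MAX_POOLS     8u     /* BM pools per unit, per the text */
#define JUMBO_MTU_LIMIT  1664u  /* an MTU above this needs the jumbo pool */

/* True if any port needs the jumbo pool. */
static bool any_jumbo_port(const unsigned int *port_mtu, size_t nports)
{
    for (size_t i = 0; i < nports; i++)
        if (port_mtu[i] > JUMBO_MTU_LIMIT)
            return true;
    return false;
}

/* Per-cpu scheme: a short and a long pool per CPU must fit into the BM
 * pools, and no port may use jumbo frames. */
bool use_percpu_pools(unsigned int ncpus, const unsigned int *port_mtu,
                      size_t nports)
{
    return 2u * ncpus <= BM_MAX_POOLS && !any_jumbo_port(port_mtu, nports);
}

/* Shared scheme: short + long pools, plus a jumbo pool once any port
 * needs it. */
unsigned int shared_pool_count(const unsigned int *port_mtu, size_t nports)
{
    return any_jumbo_port(port_mtu, nports) ? 3u : 2u;
}
```

Re-running `use_percpu_pools()` whenever a port MTU changes mirrors the switch back and forth between the two schemes described above.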
      Signed-off-by: Matteo Croce <mcroce@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mvpp2: refactor BM pool functions · 13616361
      Matteo Croce authored
      Refactor mvpp2_bm_pool_create(), mvpp2_bm_pool_destroy() and
      mvpp2_bm_pools_init() so that they accept a struct device instead
      of a struct platform_device, as they just need platform_device->dev.
      
      Removing such a dependency makes the BM code more reusable in contexts
      where we don't have a pointer to the platform_device.
      Signed-off-by: Matteo Croce <mcroce@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: Fix off-by-one number of calls to devlink_port_unregister · 4ba0ebbc
      Vladimir Oltean authored
      When a function such as dsa_slave_create fails, currently the following
      stack trace can be seen:
      
      [    2.038342] sja1105 spi0.1: Probed switch chip: SJA1105T
      [    2.054556] sja1105 spi0.1: Reset switch and programmed static config
      [    2.063837] sja1105 spi0.1: Enabled switch tagging
      [    2.068706] fsl-gianfar soc:ethernet@2d90000 eth2: error -19 setting up slave phy
      [    2.076371] ------------[ cut here ]------------
      [    2.080973] WARNING: CPU: 1 PID: 21 at net/core/devlink.c:6184 devlink_free+0x1b4/0x1c0
      [    2.088954] Modules linked in:
      [    2.092005] CPU: 1 PID: 21 Comm: kworker/1:1 Not tainted 5.3.0-rc6-01360-g41b52e38d2b6-dirty #1746
      [    2.100912] Hardware name: Freescale LS1021A
      [    2.105162] Workqueue: events deferred_probe_work_func
      [    2.110287] [<c03133a4>] (unwind_backtrace) from [<c030d8cc>] (show_stack+0x10/0x14)
      [    2.117992] [<c030d8cc>] (show_stack) from [<c10b08d8>] (dump_stack+0xb4/0xc8)
      [    2.125180] [<c10b08d8>] (dump_stack) from [<c0349d04>] (__warn+0xe0/0xf8)
      [    2.132018] [<c0349d04>] (__warn) from [<c0349e34>] (warn_slowpath_null+0x40/0x48)
      [    2.139549] [<c0349e34>] (warn_slowpath_null) from [<c0f19d74>] (devlink_free+0x1b4/0x1c0)
      [    2.147772] [<c0f19d74>] (devlink_free) from [<c1064fc0>] (dsa_switch_teardown+0x60/0x6c)
      [    2.155907] [<c1064fc0>] (dsa_switch_teardown) from [<c1065950>] (dsa_register_switch+0x8e4/0xaa8)
      [    2.164821] [<c1065950>] (dsa_register_switch) from [<c0ba7fe4>] (sja1105_probe+0x21c/0x2ec)
      [    2.173216] [<c0ba7fe4>] (sja1105_probe) from [<c0b35948>] (spi_drv_probe+0x80/0xa4)
      [    2.180920] [<c0b35948>] (spi_drv_probe) from [<c0a4c1cc>] (really_probe+0x108/0x400)
      [    2.188711] [<c0a4c1cc>] (really_probe) from [<c0a4c694>] (driver_probe_device+0x78/0x1bc)
      [    2.196933] [<c0a4c694>] (driver_probe_device) from [<c0a4a3dc>] (bus_for_each_drv+0x58/0xb8)
      [    2.205414] [<c0a4a3dc>] (bus_for_each_drv) from [<c0a4c024>] (__device_attach+0xd0/0x168)
      [    2.213637] [<c0a4c024>] (__device_attach) from [<c0a4b1d0>] (bus_probe_device+0x84/0x8c)
      [    2.221772] [<c0a4b1d0>] (bus_probe_device) from [<c0a4b72c>] (deferred_probe_work_func+0x84/0xc4)
      [    2.230686] [<c0a4b72c>] (deferred_probe_work_func) from [<c03650a4>] (process_one_work+0x218/0x510)
      [    2.239772] [<c03650a4>] (process_one_work) from [<c03660d8>] (worker_thread+0x2a8/0x5c0)
      [    2.247908] [<c03660d8>] (worker_thread) from [<c036b348>] (kthread+0x148/0x150)
      [    2.255265] [<c036b348>] (kthread) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
      [    2.262444] Exception stack(0xea965fb0 to 0xea965ff8)
      [    2.267466] 5fa0:                                     00000000 00000000 00000000 00000000
      [    2.275598] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      [    2.283729] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
      [    2.290333] ---[ end trace ca5d506728a0581a ]---
      
      devlink_free is complaining right here:
      
      	WARN_ON(!list_empty(&devlink->port_list));
      
      This happens because devlink_port_unregister is no longer done right
      away in dsa_port_setup when setting up a DSA_PORT_TYPE_USER port has failed.
      Vivien said about this change that:
      
          Also no need to call devlink_port_unregister from within dsa_port_setup
          as this step is inconditionally handled by dsa_port_teardown on error.
      
      which is not really true. The devlink_port_unregister function _is_
      being called unconditionally from within dsa_port_setup, but not for
      this port that just failed, just for the previous ones which were set
      up.
      
      ports_teardown:
      	for (i = 0; i < port; i++)
      		dsa_port_teardown(&ds->ports[i]);
      
      Initially I was tempted to fix this by extending the "for" loop to also
      cover the port that failed during setup. But this could have potentially
      unforeseen consequences unrelated to devlink_port or even other types of
      ports than user ports, which I can't really test for. For example, if
      for some reason devlink_port_register itself were to fail, then
      unconditionally unregistering it in dsa_port_teardown would not be a
      smart idea. The list might go on.
      
      So just make dsa_port_setup undo the setup it had done upon failure, and
      let the for loop undo the work of setting up the previous ports, which
      are guaranteed to be brought up to a consistent state.
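The error-unwinding pattern this fix restores can be reproduced in user space. Each port's setup undoes its own partial work on failure, and the caller's loop tears down only the ports that completed. All types and functions below are hypothetical stand-ins for the DSA and devlink code, with a counter playing the role of devlink's port list:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a DSA port. */
struct sketch_port {
    bool devlink_registered;
    bool slave_created;
};

/* Ports still registered; devlink_free() would WARN if this is not 0. */
static int registered_devlink_ports;

static void sketch_devlink_port_register(struct sketch_port *dp)
{
    dp->devlink_registered = true;
    registered_devlink_ports++;
}

static void sketch_devlink_port_unregister(struct sketch_port *dp)
{
    if (dp->devlink_registered) {
        dp->devlink_registered = false;
        registered_devlink_ports--;
    }
}

/* Simulates dsa_slave_create() failing with -ENODEV on one port. */
static int sketch_slave_create(struct sketch_port *dp, int index, int fail_at)
{
    if (index == fail_at)
        return -ENODEV;
    dp->slave_created = true;
    return 0;
}

static int sketch_port_setup(struct sketch_port *dp, int index, int fail_at)
{
    int err;

    sketch_devlink_port_register(dp);
    err = sketch_slave_create(dp, index, fail_at);
    if (err)
        sketch_devlink_port_unregister(dp); /* undo our own partial setup */
    return err;
}

static void sketch_port_teardown(struct sketch_port *dp)
{
    dp->slave_created = false;
    sketch_devlink_port_unregister(dp);
}

int sketch_switch_setup(struct sketch_port *ports, int nports, int fail_at)
{
    int port, err = 0;

    for (port = 0; port < nports; port++) {
        err = sketch_port_setup(&ports[port], port, fail_at);
        if (err)
            goto ports_teardown;
    }
    return 0;

ports_teardown:
    /* Only the ports before the failing one were fully set up. */
    for (int i = 0; i < port; i++)
        sketch_port_teardown(&ports[i]);
    return err;
}
```

Removing the unregister call in `sketch_port_setup()` leaves one port registered after a failure, which is the leak devlink_free() warns about in the trace above.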
      
      Fixes: 955222ca ("net: dsa: use a single switch statement for port setup")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlx5: Add missing init_net check in FIB notifier · a21cf11b
      Jiri Pirko authored
      Take only FIB events that are happening in init_net into account. No other
      namespaces are supported.
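Below is a minimal user-space sketch of the namespace check. It uses simplified stand-ins for `struct net`, `init_net`, `net_eq()` and `struct fib_notifier_info`; the kernel versions carry much more state. `sketch_fib_event()` is a hypothetical handler, not the mlx5 function:

```c
#include <stdbool.h>

/* Minimal stand-ins for the kernel types involved. */
struct net { int unused; };
static struct net init_net;

struct fib_notifier_info {
    struct net *net;            /* namespace the FIB event happened in */
};

enum { NOTIFY_DONE = 0x0000, NOTIFY_OK = 0x0001 };

static bool net_eq(const struct net *a, const struct net *b)
{
    return a == b;
}

static unsigned int fib_events_handled;

/* Ignore FIB events from any namespace other than init_net. */
int sketch_fib_event(unsigned long event, struct fib_notifier_info *info)
{
    (void)event;
    if (!net_eq(info->net, &init_net))
        return NOTIFY_DONE;

    fib_events_handled++;       /* real code would process the event here */
    return NOTIFY_OK;
}
```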
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 765b7590
      David S. Miller authored
      r8152 conflicts are the NAPI fixes in 'net' overlapping with
      some tasklet stuff in net-next
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Linux 5.3-rc7 · 089cf7f6
      Linus Torvalds authored
    • Merge tag 'char-misc-5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 49ffdb4c
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here are some small char and misc driver fixes for reported issues for
        5.3-rc7
      
        Also included in here is the documentation for how we are handling
        hardware issues under embargo that everyone has finally agreed on, as
        well as a MAINTAINERS update for the suckers who agreed to handle the
        LICENSES/ files.
      
        All of these have been in linux-next last week with no reported
        issues"
      
      * tag 'char-misc-5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        fsi: scom: Don't abort operations for minor errors
        vmw_balloon: Fix offline page marking with compaction
        VMCI: Release resource if the work is already queued
        Documentation/process: Embargoed hardware security issues
        lkdtm/bugs: fix build error in lkdtm_EXHAUST_STACK
        mei: me: add Tiger Lake point LP device ID
        intel_th: pci: Add Tiger Lake support
        intel_th: pci: Add support for another Lewisburg PCH
        stm class: Fix a double free of stm_source_device
        MAINTAINERS: add entry for LICENSES and SPDX stuff
        fpga: altera-ps-spi: Fix getting of optional confd gpio
    • Merge tag 'usb-5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 2c248f92
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some small USB fixes that have been in linux-next this past
        week for 5.3-rc7
      
        They fix the usual xhci, syzbot reports, and other small issues that
        have come up last week.
      
        All have been in linux-next with no reported issues"
      
      * tag 'usb-5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        USB: cdc-wdm: fix race between write and disconnect due to flag abuse
        usb: host: xhci: rcar: Fix typo in compatible string matching
        usb: host: xhci-tegra: Set DMA mask correctly
        USB: storage: ums-realtek: Whitelist auto-delink support
        USB: storage: ums-realtek: Update module parameter description for auto_delink_en
        usb: host: ohci: fix a race condition between shutdown and irq
        usb: hcd: use managed device resources
        typec: tcpm: fix a typo in the comparison of pdo_max_voltage
        usb-storage: Add new JMS567 revision to unusual_devs
        usb: chipidea: udc: don't do hardware access if gadget has stopped
        usbtmc: more sanity checking for packet size
        usb: udc: lpc32xx: silence fall-through warning
    • Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux · a06ebb8d
      Saeed Mahameed authored
      Merge mlx5-next patches needed for upcoming mlx5 software steering.
      
      1) Alex adds HW bits and definitions required for SW steering
      2) Ariel moves device memory management to mlx5_core (From mlx5_ib)
      3) Maor, Cleanups and fixups for eswitch mode and RoCE
      4) Mark, Set only stag for match untagged packets
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Set only stag for match untagged packets · fc603294
      Mark Bloch authored
      Enabling cvlan_tag in the match criteria while disabling it in the
      match value means that neither S nor C tags exist (untagged for both).
      Signed-off-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Add stub for mlx5_eswitch_mode · f813cb50
      Maor Gottlieb authored
      Return MLX5_ESWITCH_NONE when CONFIG_MLX5_ESWITCH
      is not selected.
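The usual kernel pattern for such a stub is an `#ifdef` around two inline definitions, so callers compile whether or not the option is enabled. Below is a sketch with stand-in types. The enum values and struct layout are assumptions, not the mlx5 headers:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in mode values; the real ones live in the mlx5 headers. */
enum { MLX5_ESWITCH_NONE, MLX5_ESWITCH_LEGACY, MLX5_ESWITCH_OFFLOADS };

struct mlx5_eswitch { uint16_t mode; };  /* stand-in; real struct is larger */

#ifdef CONFIG_MLX5_ESWITCH
/* Eswitch support built in: report the current mode. */
static inline uint16_t mlx5_eswitch_mode(struct mlx5_eswitch *esw)
{
    return esw ? esw->mode : MLX5_ESWITCH_NONE;
}
#else
/* Eswitch support compiled out: callers still build, and always
 * see "no eswitch". */
static inline uint16_t mlx5_eswitch_mode(struct mlx5_eswitch *esw)
{
    (void)esw;
    return MLX5_ESWITCH_NONE;
}
#endif
```

Built without `-DCONFIG_MLX5_ESWITCH`, every call returns `MLX5_ESWITCH_NONE` regardless of the argument.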
      Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Avoid disabling RoCE when uninitialized · 3a6ef515
      Maor Gottlieb authored
      Move the check of whether RoCE steering is initialized into the
      disable RoCE function. This ensures that RoCE is disabled only if
      it was successfully enabled before.
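Moving the guard into the disable path means every caller gets it for free. Below is a user-space sketch of the idea. It uses hypothetical names (`struct sketch_roce`, `sketch_enable_roce()`, `sketch_disable_roce()`) rather than the mlx5 eswitch code:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the RoCE steering state. */
struct sketch_roce {
    bool steering_initialized;
    unsigned int teardowns;     /* how many real cleanups ran */
};

int sketch_enable_roce(struct sketch_roce *r, bool hw_ok)
{
    if (!hw_ok)
        return -1;              /* enabling failed: nothing was set up */
    r->steering_initialized = true;
    return 0;
}

/* The "is it initialized?" check lives here, so no caller can tear
 * down state that was never created. */
void sketch_disable_roce(struct sketch_roce *r)
{
    if (!r->steering_initialized)
        return;
    r->steering_initialized = false;
    r->teardowns++;
}
```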
      
      Fixes: 80f09dfc ("net/mlx5: Eswitch, enable RoCE loopback traffic")
      Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Add HW bits and definitions required for SW steering · 97b5484e
      Alex Vesker authored
      Add the required Software Steering hardware definitions and
      bits to mlx5_ifc.
      Signed-off-by: Alex Vesker <valex@mellanox.com>
      Signed-off-by: Yevgeny Klitenik <kliten@mellanox.com>
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Move device memory management to mlx5_core · c9b9dcb4
      Ariel Levkovich authored
      Move the device memory (SW ICM) allocation and deallocation commands
      to mlx5_core to expose this API to all mlx5_core users.
      
      This comes as preparation for supporting SW steering in the kernel,
      where it will be required to allocate and register device
      memory for direct rule insertion.
      
      In addition, an API to register this device memory for future
      remote access operations is introduced using the create_mkey
      commands.
      Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 345464fb
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix some length checks during OGM processing in batman-adv, from
          Sven Eckelmann.
      
       2) Fix regression that caused netfilter conntrack sysctls to not be
          per-netns any more. From Florian Westphal.
      
       3) Use after free in netpoll, from Feng Sun.
      
       4) Guard destruction of pfifo_fast per-cpu qdisc stats with
          qdisc_is_percpu_stats(), from Davide Caratti. A similar bug is fixed
          in pfifo_fast_enqueue().
      
       5) Fix memory leak in mld_del_delrec(), from Eric Dumazet.
      
       6) Handle neigh events on internal ports correctly in nfp, from John
          Hurley.
      
       7) Clear SKB timestamp in NF flow table code so that it does not
          confuse fq scheduler. From Florian Westphal.
      
       8) taprio destroy can crash if it is invoked in a failure path of
          taprio_init(), because the list head isn't set up properly yet and
          the list del is unconditional. Perform the list add earlier to
          address this. From Vladimir Oltean.
      
       9) Make sure to reapply vlan filters on device up, in aquantia driver.
          From Dmitry Bogdanov.
      
      10) sgiseeq driver releases DMA memory using free_page() instead of
          dma_free_attrs(). From Christophe JAILLET.
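Item 8 describes a common init/destroy ordering fix: if destroy() unconditionally unlinks an entry from a global list, init() must link it before any step that can fail. Below is a user-space sketch with a minimal list in the style of <linux/list.h>. `sketch_init()`, `sketch_destroy()` and `qdisc_list` are hypothetical stand-ins, not the taprio code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Crashes (NULL dereference) if entry was never linked. */
static void list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = entry->prev = NULL;
}

static bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Hypothetical global list of instances, like taprio's list. */
static struct list_head qdisc_list = { &qdisc_list, &qdisc_list };

struct sketch_qdisc {
    struct list_head list;
    bool configured;
};

/* destroy() runs unconditionally, even after a failed init(). */
void sketch_destroy(struct sketch_qdisc *q)
{
    list_del(&q->list);         /* safe only because init() linked q first */
    q->configured = false;
}

int sketch_init(struct sketch_qdisc *q, bool config_ok)
{
    /* Link first, so every later failure path leaves q in a state
     * that destroy() can undo. */
    list_add(&q->list, &qdisc_list);
    if (!config_ok)
        return -1;              /* caller then invokes sketch_destroy() */
    q->configured = true;
    return 0;
}
```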
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits)
        net: seeq: Fix the function used to release some memory in an error handling path
        enetc: Add missing call to 'pci_free_irq_vectors()' in probe and remove functions
        net: bcmgenet: use ethtool_op_get_ts_info()
        tc-testing: don't hardcode 'ip' in nsPlugin.py
        net: dsa: microchip: add KSZ8563 compatibility string
        dt-bindings: net: dsa: document additional Microchip KSZ8563 switch
        net: aquantia: fix out of memory condition on rx side
        net: aquantia: linkstate irq should be oneshot
        net: aquantia: reapply vlan filters on up
        net: aquantia: fix limit of vlan filters
        net: aquantia: fix removal of vlan 0
        net/sched: cbs: Set default link speed to 10 Mbps in cbs_set_port_rate
        taprio: Set default link speed to 10 Mbps in taprio_set_picos_per_byte
        taprio: Fix kernel panic in taprio_destroy
        net: dsa: microchip: fill regmap_config name
        rxrpc: Fix lack of conn cleanup when local endpoint is cleaned up [ver #2]
        net: stmmac: dwmac-rk: Don't fail if phy regulator is absent
        amd-xgbe: Fix error path in xgbe_mod_init()
        netfilter: nft_meta_bridge: Fix get NFT_META_BRI_IIFVPROTO in network byteorder
        mac80211: Correctly set noencrypt for PAE frames
        ...
  4. 01 Sep, 2019 5 commits