Commit 451c7900 authored by David S. Miller

Merge branch 'devlink-documentation-refactor'

Jacob Keller says:

====================
devlink documentation refactor

This series updates the devlink documentation, with a few primary goals

 * move all of the devlink documentation into a dedicated subfolder
 * convert that documentation to the reStructuredText format
 * merge driver-specific documentations into a single file per driver
 * add missing documentation, including per-driver and devlink generally

For each driver, I took the time to review the code and add further
documentation on the various features it currently supports. Additionally, I
added new documentation files for some of the features such as
devlink-dpipe, devlink-resource, and devlink-regions.

Note for the region snapshot triggering, I kept that as a separate patch as
that is based on work that has not yet been merged to net-next, and may
change.

I also improved the existing documentation for devlink-info and
devlink-param by adding a bit more of an introduction when converting it to
the rst format.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
parents efa193ba 9cd3e2c6
@@ -39,7 +39,7 @@ but without enabling "switch" mode, or to different bridges.
 Devlink configuration parameters
 ====================
-See Documentation/networking/devlink-params-ti-cpsw-switch.txt
+See Documentation/networking/devlink/ti-cpsw-switch.rst
 ====================
 # Bridging in dual mac mode
The health mechanism is targeted for Real Time Alerting, in order to know when
something bad has happened to a PCI device
- Provide alert debug information
- Self healing
- If problem needs vendor support, provide a way to gather all needed debugging
information.
The main idea is to unify and centralize driver health reports in the
generic devlink instance and allow the user to set different
attributes of the health reporting and recovery procedures.
The devlink health reporter:
Device driver creates a "health reporter" per each error/health type.
Error/Health type can be known/generic (e.g. pci error, fw error, rx/tx error)
or unknown (driver specific).
For each registered health reporter a driver can issue error/health reports
asynchronously. All health reports handling is done by devlink.
Device driver can provide specific callbacks for each "health reporter", e.g.
- Recovery procedures
- Diagnostics and object dump procedures
- OOB initial parameters
Different parts of the driver can register different types of health reporters
with different handlers.
Once an error is reported, devlink health will do the following actions:
* A log is sent to the kernel trace events buffer
* Health status and statistics are updated for the reporter instance
* An object dump is taken and saved at the reporter instance (as long as
there is no other dump which is already stored)
* An auto recovery attempt is made, depending on:
- Auto-recovery configuration
- Grace period vs. time passed since last recovery
The user interface:
The user can access and change each reporter's parameters and driver-specific callbacks
via devlink, e.g. per error type (per health reporter):
- Configure reporter's generic parameters (like: disable/enable auto recovery)
- Invoke recovery procedure
- Run diagnostics
- Object dump
The devlink health interface (via netlink):
DEVLINK_CMD_HEALTH_REPORTER_GET
Retrieves status and configuration info per DEV and reporter.
DEVLINK_CMD_HEALTH_REPORTER_SET
Allows reporter-related configuration setting.
DEVLINK_CMD_HEALTH_REPORTER_RECOVER
Triggers a reporter's recovery procedure.
DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE
Retrieves diagnostics data from a reporter on a device.
DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET
Retrieves the last stored dump. Devlink health
saves a single dump. If a dump is not already stored by devlink
for this reporter, devlink generates a new dump.
The dump output is defined by the reporter.
DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR
Clears the last saved dump file for the specified reporter.
                                                  netlink
                                        +--------------------------+
                                        |                          |
                                        |            +             |
                                        |            |             |
                                        +--------------------------+
                                                     |request for ops
                                                     |(diagnose,
    mlx5_core                               devlink  |recover,
                                                     |dump)
    +--------+                          +--------------------------+
    |        |                          |    reporter|             |
    |        |                          |  +---------v----------+  |
    |        |   ops execution          |  |                    |  |
    |     <--------------------------------+                    |  |
    |        |                          |  |                    |  |
    |        |                          |  + ^------------------+  |
    |        |                          |    | request for ops     |
    |        |                          |    | (recover, dump)     |
    |        |                          |    |                     |
    |        |                          |  +-+------------------+  |
    |        |     health report        |  | health handler     |  |
    |        +----------------------------->                    |  |
    |        |                          |  +--------------------+  |
    |        |   health reporter create |                          |
    |        +-------------------------->                          |
    +--------+                          +--------------------------+
enable_sriov [DEVICE, GENERIC]
Configuration mode: Permanent
ignore_ari [DEVICE, GENERIC]
Configuration mode: Permanent
msix_vec_per_pf_max [DEVICE, GENERIC]
Configuration mode: Permanent
msix_vec_per_pf_min [DEVICE, GENERIC]
Configuration mode: Permanent
gre_ver_check [DEVICE, DRIVER-SPECIFIC]
Generic Routing Encapsulation (GRE) version check will
be enabled in the device. If disabled, device skips
version checking for incoming packets.
Type: Boolean
Configuration mode: Permanent
flow_steering_mode [DEVICE, DRIVER-SPECIFIC]
Controls the flow steering mode of the driver.
Two modes are supported:
1. 'dmfs' - Device managed flow steering.
2. 'smfs' - Software/Driver managed flow steering.
In DMFS mode, the HW steering entities are created and
managed through the Firmware.
In SMFS mode, the HW steering entities are created and
managed by the driver directly in hardware,
without firmware intervention.
Type: String
Configuration mode: runtime
enable_roce [DEVICE, GENERIC]
Enable handling of RoCE traffic in the device.
Enabled by default.
Configuration mode: driverinit
fw_load_policy [DEVICE, GENERIC]
Configuration mode: driverinit
acl_region_rehash_interval [DEVICE, DRIVER-SPECIFIC]
Sets an interval for periodic ACL region rehashes.
The value is in milliseconds, minimal value is "3000".
Value "0" disables the periodic work.
The first rehash will be run right after value is set.
Type: u32
Configuration mode: runtime
ATU_hash [DEVICE, DRIVER-SPECIFIC]
Select one of four possible hashing algorithms for
MAC addresses in the Address Translation Unit.
A value of 3 seems to work better than the default of
1 when many MAC addresses have the same OUI.
Configuration mode: runtime
Type: u8. 0-3 valid.
fw_load_policy [DEVICE, GENERIC]
Configuration mode: permanent
reset_dev_on_drv_probe [DEVICE, GENERIC]
Configuration mode: permanent
ale_bypass [DEVICE, DRIVER-SPECIFIC]
Allows to enable ALE_CONTROL(4).BYPASS mode for debug purposes.
All packets will be sent to the Host port only if enabled.
Type: bool
Configuration mode: runtime
switch_mode [DEVICE, DRIVER-SPECIFIC]
Enable switch mode
Type: bool
Configuration mode: runtime
Devlink configuration parameters
================================
Following is the list of configuration parameters via devlink interface.
Each parameter can be generic or driver-specific, and all are device-level
parameters.
Note that the driver-specific files should also list the generic params
they support, along with the supported config modes.
Each parameter can be set in different configuration modes:
runtime - set while driver is running, no reset required.
driverinit - applied while driver initializes, requires restart
driver by devlink reload command.
permanent - written to device's non-volatile memory, hard reset
required.
Following is the list of parameters:
====================================
enable_sriov [DEVICE, GENERIC]
Enable Single Root I/O Virtualisation (SRIOV) in
the device.
Type: Boolean
ignore_ari [DEVICE, GENERIC]
Ignore Alternative Routing-ID Interpretation (ARI)
capability. If enabled, the adapter will ignore the ARI
capability even when the platform has the support
enabled, and creates the same number of partitions as when the
platform does not support ARI.
Type: Boolean
msix_vec_per_pf_max [DEVICE, GENERIC]
Provides the maximum number of MSIX interrupts that
a device can create. Value is same across all
physical functions (PFs) in the device.
Type: u32
msix_vec_per_pf_min [DEVICE, GENERIC]
Provides the minimum number of MSIX interrupts required
for the device initialization. Value is same across all
physical functions (PFs) in the device.
Type: u32
fw_load_policy [DEVICE, GENERIC]
Controls the device's firmware loading policy.
Valid values:
* DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_DRIVER (0)
Load firmware version preferred by the driver.
* DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_FLASH (1)
Load firmware currently stored in flash.
* DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_DISK (2)
Load firmware currently available on host's disk.
Type: u8
reset_dev_on_drv_probe [DEVICE, GENERIC]
Controls the device's reset policy on driver probe.
Valid values:
* DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_UNKNOWN (0)
Unknown or invalid value.
* DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_ALWAYS (1)
Always reset device on driver probe.
* DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_NEVER (2)
Never reset device on driver probe.
* DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_DISK (3)
Reset only if device firmware can be found in the
filesystem.
Type: u8
enable_roce [DEVICE, GENERIC]
Enable handling of RoCE traffic in the device.
Type: Boolean
.. SPDX-License-Identifier: GPL-2.0
======================
Devlink Trap netdevsim
======================
Driver-specific Traps
=====================
.. list-table:: List of Driver-specific Traps Registered by ``netdevsim``
:widths: 5 5 90
* - Name
- Type
- Description
* - ``fid_miss``
- ``exception``
- When a packet enters the device it is classified to a filtering
identifier (FID) based on the ingress port and VLAN. This trap is used
to trap packets for which a FID could not be found.
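As a rough illustration of the classification behind ``fid_miss``, the sketch below models the FID lookup as a dictionary keyed by ingress port and VLAN. The table contents, the ``classify`` helper and its return values are hypothetical and are not taken from the ``netdevsim`` implementation.

```python
# Hypothetical FID table: (ingress port, VLAN ID) -> filtering identifier.
FID_TABLE = {
    (1, 10): 100,
    (1, 20): 101,
    (2, 10): 100,
}


def classify(port, vlan):
    """Return ("forward", fid) on a hit, or ("trap", "fid_miss") on a miss."""
    fid = FID_TABLE.get((port, vlan))
    if fid is None:
        # No FID could be found, so the packet is trapped as an exception.
        return ("trap", "fid_miss")
    return ("forward", fid)
```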
.. SPDX-License-Identifier: GPL-2.0
====================
bnxt devlink support
====================
This document describes the devlink features implemented by the ``bnxt``
device driver.
Parameters
==========
.. list-table:: Generic parameters implemented
* - Name
- Mode
* - ``enable_sriov``
- Permanent
* - ``ignore_ari``
- Permanent
* - ``msix_vec_per_pf_max``
- Permanent
* - ``msix_vec_per_pf_min``
- Permanent
The ``bnxt`` driver also implements the following driver-specific
parameters.
.. list-table:: Driver-specific parameters implemented
:widths: 5 5 5 85
* - Name
- Type
- Mode
- Description
* - ``gre_ver_check``
- Boolean
- Permanent
- Generic Routing Encapsulation (GRE) version check will be enabled in
the device. If disabled, the device will skip the version check for
incoming packets.
.. SPDX-License-Identifier: GPL-2.0
=============
Devlink DPIPE
=============
Background
==========
While performing the hardware offloading process, much of the hardware
specifics cannot be presented. These details are useful for debugging, and
``devlink-dpipe`` provides a standardized way to provide visibility into the
offloading process.
For example, the routing longest prefix match (LPM) algorithm used by the
Linux kernel may differ from the hardware implementation. The pipeline debug
API (DPIPE) is aimed at providing the user visibility into the ASIC's
pipeline in a generic way.
The hardware offload process is expected to be done in a way that the user
should not be able to distinguish between the hardware vs. software
implementation. In this process, hardware specifics are neglected. In
reality those details can have lots of meaning and should be exposed in some
standard way.
This problem is made even more complex when one wishes to offload the
control path of the whole networking stack to a switch ASIC. Due to
differences in the hardware and software models some processes cannot be
represented correctly.
One example is the kernel's LPM algorithm, which in many cases differs
greatly from the hardware implementation. The configuration API is the same,
but one cannot rely on the Forwarding Information Base (FIB) to look like the
Level Path Compression trie (LPC-trie) in hardware.
In many situations, trying to analyze a system failure based solely on the
kernel's dump may not be enough. By combining this data with complementary
information about the underlying hardware, this debugging can be made
easier; additionally, the information can be useful when debugging
performance issues.
Overview
========
The ``devlink-dpipe`` interface closes this gap. The hardware's pipeline is
modeled as a graph of match/action tables. Each table represents a specific
hardware block. This model is not new, first being used by the P4 language.
Traditionally it has been used as an alternative model for hardware
configuration, but the ``devlink-dpipe`` interface uses it for visibility
purposes as a standard complementary tool. The system's view from
``devlink-dpipe`` should change according to the changes done by the
standard configuration tools.
For example, it's quite common to implement Access Control Lists (ACL)
using Ternary Content Addressable Memory (TCAM). The TCAM memory can be
divided into TCAM regions. Complex TC filters can have multiple rules with
different priorities and different lookup keys. On the other hand, hardware
TCAM regions have a predefined lookup key. Offloading the TC filter rules
using the TCAM engine can result in multiple TCAM regions being interconnected
in a chain (which may affect the data path latency). In response to a new TC
filter, new tables should be created describing those regions.
Model
=====
The ``DPIPE`` model introduces several objects:
* headers
* tables
* entries
A ``header`` describes packet formats and provides names for fields within
the packet. A ``table`` describes hardware blocks. An ``entry`` describes
the actual content of a specific table.
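The three object kinds can be pictured as plain data records. This is a minimal sketch for orientation only: the class names, attributes and sample values are illustrative assumptions and do not mirror the kernel's ``devlink`` structures.

```python
from dataclasses import dataclass, field


@dataclass
class Header:
    """Describes a packet format and names the fields within it."""
    name: str
    fields: list


@dataclass
class Entry:
    """One row of a table: match values, action values and a counter."""
    index: int
    match_values: dict
    action_values: dict
    counter: int = 0


@dataclass
class Table:
    """Describes a hardware block and holds its current entries."""
    name: str
    size: int
    counters_enabled: bool = False
    entries: list = field(default_factory=list)


# Hypothetical sample objects.
ipv4 = Header("ipv4", ["dst_addr"])
host = Table("local_host", size=4096, counters_enabled=True)
host.entries.append(
    Entry(0, {"ipv4.dst_addr": "192.0.2.1"},
          {"ethernet.daddr": "00:11:22:33:44:55"})
)
```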
The hardware pipeline is not port specific, but rather describes the whole
ASIC. Thus it is tied to the top of the ``devlink`` infrastructure.
Drivers can register and unregister tables at run time, in order to support
dynamic behavior. This dynamic behavior is mandatory for describing hardware
blocks like TCAM regions which can be allocated and freed dynamically.
``devlink-dpipe`` generally is not intended for configuration. The exception
is hardware counting for a specific table.
The following commands are used to obtain the ``dpipe`` objects from
userspace:
* ``table_get``: Receive a table's description.
* ``headers_get``: Receive a device's supported headers.
* ``entries_get``: Receive a table's current entries.
* ``counters_set``: Enable or disable counters on a table.
Table
-----
The driver should implement the following operations for each table:
* ``matches_dump``: Dump the supported matches.
* ``actions_dump``: Dump the supported actions.
* ``entries_dump``: Dump the actual content of the table.
* ``counters_set_update``: Synchronize hardware with counters enabled or
disabled.
Header/Field
------------
In a similar way to P4, headers and fields are used to describe a table's
behavior. There is a slight difference between the standard protocol headers
and specific ASIC metadata. The protocol headers should be declared in the
``devlink`` core API. On the other hand, ASIC metadata is driver specific
and should be defined in the driver. Additionally, each driver-specific
devlink documentation file should document the driver-specific ``dpipe``
headers it implements. The headers and fields are identified by enumeration.
In order to provide further visibility some ASIC metadata fields could be
mapped to kernel objects. For example, internal router interface indexes can
be directly mapped to the net device ifindex. FIB table indexes used by
different Virtual Routing and Forwarding (VRF) tables can be mapped to
internal routing table indexes.
Match
-----
Matches are kept primitive and close to hardware operation. Match types like
LPM are not supported due to the fact that this is exactly a process we wish
to describe in full detail. Examples of matches:
* ``field_exact``: Exact match on a specific field.
* ``field_exact_mask``: Exact match on a specific field after masking.
* ``field_range``: Match on a specific range.
The IDs of the header and the field should be specified in order to
identify the specific field. Furthermore, the header index should be
specified in order to distinguish multiple headers of the same type in a
packet (tunneling).
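The three match primitives can be sketched as simple predicates on integer field values. The function signatures below are hypothetical, and treating ``field_range`` as an inclusive range is an assumption made for this illustration.

```python
def field_exact(value, key):
    """Exact match on a field value."""
    return value == key


def field_exact_mask(value, key, mask):
    """Exact match after applying the mask to both the value and the key."""
    return (value & mask) == (key & mask)


def field_range(value, low, high):
    """Match when the value lies in [low, high] (inclusive by assumption)."""
    return low <= value <= high


# 192.0.2.77 against the prefix 192.0.2.0/24, expressed as integers.
in_prefix = field_exact_mask(0xC000024D, 0xC0000200, 0xFFFFFF00)
```

Note how an LPM lookup is not a primitive here: it has to be expressed as a sequence of masked exact matches, which is the level of detail the interface aims to expose.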
Action
------
Similar to match, the actions are kept primitive and close to hardware
operation. For example:
* ``field_modify``: Modify the field value.
* ``field_inc``: Increment the field value.
* ``push_header``: Add a header.
* ``pop_header``: Remove a header.
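As a minimal sketch of these primitives, the code below applies them to a packet modeled as an ordered list of ``(header name, fields)`` pairs. The packet representation and all function signatures are assumptions made for illustration; pushing and popping at the outermost position is likewise an assumption.

```python
def _find(packet, header):
    """Return the field dict of the first header with the given name."""
    for name, fields in packet:
        if name == header:
            return fields
    raise KeyError(header)


def field_modify(packet, header, field, value):
    """Set a field to a new value."""
    _find(packet, header)[field] = value


def field_inc(packet, header, field, amount=1):
    """Increment a field (a negative amount decrements it)."""
    _find(packet, header)[field] += amount


def push_header(packet, header, fields):
    """Add a header at the outermost position."""
    packet.insert(0, (header, dict(fields)))


def pop_header(packet):
    """Remove and return the outermost header."""
    return packet.pop(0)


packet = [("ethernet", {"daddr": "ff:ff:ff:ff:ff:ff"}), ("ipv4", {"ttl": 64})]
field_modify(packet, "ethernet", "daddr", "00:11:22:33:44:55")
field_inc(packet, "ipv4", "ttl", -1)
push_header(packet, "vlan", {"vid": 10})
```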
Entry
-----
Entries of a specific table can be dumped on demand. Each entry is
identified with an index and its properties are described by a list of
match/action values and a specific counter. By dumping the table's content, the
interactions between tables can be resolved.
Abstraction Example
===================
The following is an example of the abstraction model of the L3 part of
Mellanox Spectrum ASIC. The blocks are described in the order they appear in
the pipeline. The table sizes in the following examples are not real
hardware sizes and are provided for demonstration purposes.
LPM
---
The LPM algorithm can be implemented as a list of hash tables. Each hash
table contains routes with the same prefix length. The root of the list is
/32, and in case of a miss the hardware will continue to the next hash
table. The depth of the search will affect the data path latency.
In case of a hit the entry contains information about the next stage of the
pipeline which resolves the MAC address. The next stage can be either local
host table for directly connected routes, or adjacency table for next-hops.
The ``meta.lpm_prefix`` field is used to connect two LPM tables.
.. code::
table lpm_prefix_16 {
size: 4096,
counters_enabled: true,
match: { meta.vr_id: exact,
ipv4.dst_addr: exact_mask,
ipv6.dst_addr: exact_mask,
meta.lpm_prefix: exact },
action: { meta.adj_index: set,
meta.adj_group_size: set,
meta.rif_port: set,
meta.lpm_prefix: set },
}
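The list-of-hash-tables idea can be sketched in a few lines. This is an illustrative model only, restricted to IPv4: the ``HashListLpm`` class and its result strings are hypothetical, and the number of tables probed stands in for the search depth that affects latency.

```python
import ipaddress


class HashListLpm:
    """LPM modeled as hash tables ordered from the longest prefix (/32) down."""

    def __init__(self):
        # One hash table per prefix length: {prefix_len: {network_int: result}}
        self.tables = {}

    def add_route(self, cidr, result):
        net = ipaddress.ip_network(cidr)
        table = self.tables.setdefault(net.prefixlen, {})
        table[int(net.network_address)] = result

    def lookup(self, addr):
        """Return (result, tables_probed); result is None on a full miss."""
        ip = int(ipaddress.ip_address(addr))
        probed = 0
        for plen in sorted(self.tables, reverse=True):
            probed += 1
            mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
            hit = self.tables[plen].get(ip & mask)
            if hit is not None:
                return hit, probed
            # Miss: continue to the next (shorter prefix) hash table.
        return None, probed


lpm = HashListLpm()
lpm.add_route("10.0.0.0/8", "adjacency:1")
lpm.add_route("10.1.0.0/16", "adjacency:2")
lpm.add_route("10.1.2.3/32", "local_host")
```

With these routes, a lookup of ``10.1.2.3`` hits in the first table, while ``10.9.9.9`` is only resolved after probing all three.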
Local Host
----------
In the case of local routes the LPM lookup already resolves the egress
router interface (RIF), yet the exact MAC address is not known. The local
host table is a hash table combining the output interface id with
destination IP address as a key. The result is the MAC address.
.. code::
table local_host {
size: 4096,
counters_enabled: true,
match: { meta.rif_port: exact,
ipv4.dst_addr: exact},
action: { ethernet.daddr: set }
}
Adjacency
---------
In case of remote routes, this table performs ECMP. The LPM lookup yields an
ECMP group size and an index that serves as a global offset into this table.
Concurrently a hash of the packet is generated. Based on the ECMP group size
and the packet's hash a local offset is generated. Multiple LPM entries can
point to the same adjacency group.
.. code::
table adjacency {
size: 4096,
counters_enabled: true,
match: { meta.adj_index: exact,
meta.adj_group_size: exact,
meta.packet_hash_index: exact },
action: { ethernet.daddr: set,
meta.erif: set }
}
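The offset computation can be illustrated as follows. The text does not say how the local offset is derived from the hash and the group size, so the modulo reduction below is an assumption, as are the function name and the sample next-hop labels.

```python
def adjacency_index(adj_index, adj_group_size, packet_hash):
    """Combine the global offset from LPM with a hash-derived local offset."""
    # Assumption: the local offset is the packet hash modulo the group size.
    local_offset = packet_hash % adj_group_size
    return adj_index + local_offset


# Hypothetical adjacency table; one ECMP group of size 3 starts at index 1.
adjacency = ["nh-a", "nh-b", "nh-c", "nh-d"]
next_hop = adjacency[adjacency_index(1, 3, packet_hash=7)]
```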
ERIF
----
In case the egress RIF and destination MAC have been resolved by previous
tables, this table performs multiple operations like TTL decrease and MTU check.
Then the decision of forward/drop is taken and the port L3 statistics are
updated based on the packet's type (broadcast, unicast, multicast).
.. code::
table erif {
size: 800,
counters_enabled: true,
match: { meta.rif_port: exact,
meta.is_l3_unicast: exact,
meta.is_l3_broadcast: exact,
meta.is_l3_multicast: exact },
action: { meta.l3_drop: set,
meta.l3_forward: set }
}
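A simplified model of the ERIF stage is sketched below. The packet dictionary, the ``erif`` function and the choice to count only forwarded packets are assumptions made for illustration; the real hardware behavior may differ.

```python
from collections import Counter


def erif(packet, mtu, stats):
    """Apply the egress RIF checks; return True to forward, False to drop."""
    if packet["ttl"] <= 1:
        return False  # TTL would reach zero after the decrease
    if packet["length"] > mtu:
        return False  # MTU check failed
    packet["ttl"] -= 1
    # Update L3 statistics by packet type (unicast, multicast, broadcast).
    stats[packet["type"]] += 1
    return True


stats = Counter()
pkt = {"ttl": 64, "length": 1500, "type": "unicast"}
forwarded = erif(pkt, mtu=1500, stats=stats)
```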
.. SPDX-License-Identifier: GPL-2.0
==============
Devlink Health
==============
Background
==========
The ``devlink`` health mechanism is targeted for Real Time Alerting, in
order to know when something bad happened to a PCI device.
* Provide alert debug information.
* Self healing.
* If problem needs vendor support, provide a way to gather all needed
debugging information.
Overview
========
The main idea is to unify and centralize driver health reports in the
generic ``devlink`` instance and allow the user to set different
attributes of the health reporting and recovery procedures.
The ``devlink`` health reporter:
Device driver creates a "health reporter" per each error/health type.
Error/Health type can be known/generic (e.g. pci error, fw error, rx/tx error)
or unknown (driver specific).
For each registered health reporter a driver can issue error/health reports
asynchronously. All health reports handling is done by ``devlink``.
Device driver can provide specific callbacks for each "health reporter", e.g.:
* Recovery procedures
* Diagnostics procedures
* Object dump procedures
* OOB initial parameters
Different parts of the driver can register different types of health reporters
with different handlers.
Actions
=======
Once an error is reported, devlink health will perform the following actions:
* A log is sent to the kernel trace events buffer
* Health status and statistics are updated for the reporter instance
* An object dump is taken and saved at the reporter instance (as long as
there is no other dump which is already stored)
* An auto recovery attempt is made, depending on:
- Auto-recovery configuration
- Grace period vs. time passed since last recovery
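The sequence above can be sketched as a small state machine. This is a toy model, not the kernel implementation: the ``HealthReporter`` class, its attribute names and the use of seconds for the grace period are assumptions, and the trace-event logging step is omitted.

```python
import time


class HealthReporter:
    """Toy model of the actions taken when an error is reported."""

    def __init__(self, recover, dump, auto_recover=True, grace_period=0.0):
        self.recover_cb = recover          # driver recovery callback
        self.dump_cb = dump                # driver object-dump callback
        self.auto_recover = auto_recover
        self.grace_period = grace_period   # seconds (assumed unit)
        self.error_count = 0
        self.recover_count = 0
        self.healthy = True
        self.saved_dump = None
        self.last_recovery = None

    def report(self, now=None):
        """Handle one error report; return True if recovery was attempted."""
        now = time.monotonic() if now is None else now
        # Update health status and statistics.
        self.error_count += 1
        self.healthy = False
        # Take and save a dump only if none is stored yet.
        if self.saved_dump is None:
            self.saved_dump = self.dump_cb()
        # Auto recovery depends on configuration and the grace period.
        if not self.auto_recover:
            return False
        if (self.last_recovery is not None
                and now - self.last_recovery < self.grace_period):
            return False
        self.recover_cb()
        self.last_recovery = now
        self.recover_count += 1
        self.healthy = True
        return True
```

In this model, a second error that arrives within the grace period is counted and leaves the reporter unhealthy, but no recovery is attempted.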
User Interface
==============
The user can access and change each reporter's parameters and driver-specific callbacks
via ``devlink``, e.g. per error type (per health reporter):
* Configure reporter's generic parameters (like: disable/enable auto recovery)
* Invoke recovery procedure
* Run diagnostics
* Object dump
.. list-table:: List of devlink health interfaces
:widths: 10 90
* - Name
- Description
* - ``DEVLINK_CMD_HEALTH_REPORTER_GET``
- Retrieves status and configuration info per DEV and reporter.
* - ``DEVLINK_CMD_HEALTH_REPORTER_SET``
- Allows reporter-related configuration setting.
* - ``DEVLINK_CMD_HEALTH_REPORTER_RECOVER``
- Triggers a reporter's recovery procedure.
* - ``DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE``
- Retrieves diagnostics data from a reporter on a device.
* - ``DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET``
- Retrieves the last stored dump. Devlink health
saves a single dump. If a dump is not already stored by devlink
for this reporter, devlink generates a new dump.
The dump output is defined by the reporter.
* - ``DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR``
- Clears the last saved dump file for the specified reporter.
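The single-dump semantics of ``DUMP_GET`` and ``DUMP_CLEAR`` can be modeled as below. The ``DumpStore`` class is hypothetical, and keeping the freshly generated dump as the stored one is an assumption of this sketch.

```python
class DumpStore:
    """Holds at most one dump for a reporter."""

    def __init__(self, generate):
        self.generate = generate  # reporter-defined dump callback
        self.saved = None

    def dump_get(self):
        # Return the stored dump; generate and keep one if none exists.
        if self.saved is None:
            self.saved = self.generate()
        return self.saved

    def dump_clear(self):
        # Forget the saved dump so that the next get generates a new one.
        self.saved = None
```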
The following diagram provides a general overview of ``devlink-health``::
                                                  netlink
                                        +--------------------------+
                                        |                          |
                                        |            +             |
                                        |            |             |
                                        +--------------------------+
                                                     |request for ops
                                                     |(diagnose,
    mlx5_core                               devlink  |recover,
                                                     |dump)
    +--------+                          +--------------------------+
    |        |                          |    reporter|             |
    |        |                          |  +---------v----------+  |
    |        |   ops execution          |  |                    |  |
    |     <--------------------------------+                    |  |
    |        |                          |  |                    |  |
    |        |                          |  + ^------------------+  |
    |        |                          |    | request for ops     |
    |        |                          |    | (recover, dump)     |
    |        |                          |    |                     |
    |        |                          |  +-+------------------+  |
    |        |     health report        |  | health handler     |  |
    |        +----------------------------->                    |  |
    |        |                          |  +--------------------+  |
    |        |   health reporter create |                          |
    |        +-------------------------->                          |
    +--------+                          +--------------------------+
 .. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
-=====================
-Devlink info versions
-=====================
+============
+Devlink Info
+============
+The ``devlink-info`` mechanism enables device drivers to report device
+information in a generic fashion. It is extensible, and enables exporting
+even device or driver specific information.
+devlink supports representing the following types of versions
+.. list-table:: List of version types
+:widths: 5 95
+* - Type
+- Description
+* - ``fixed``
+- Represents fixed versions, which cannot change. For example,
+component identifiers or the board version reported in the PCI VPD.
+* - ``running``
+- Represents the version of the currently running component. For
+example the running version of firmware. These versions generally
+only update after a reboot.
+* - ``stored``
+- Represents the version of a component as stored, such as after a
+flash update. Stored values should update to reflect changes in the
+flash even if a reboot has not yet occurred.
+Generic Versions
+================
+It is expected that drivers use the following generic names for exporting
+version information. Other information may be exposed using driver-specific
+names, but these should be documented in the driver-specific file.
 board.id
-========
+--------
 Unique identifier of the board design.
 board.rev
-=========
+---------
 Board design revision.
 asic.id
-=======
+-------
 ASIC design identifier.
 asic.rev
-========
+--------
 ASIC design revision.
 board.manufacture
-=================
+-----------------
 An identifier of the company or the facility which produced the part.
 fw
-==
+--
 Overall firmware version, often representing the collection of
 fw.mgmt, fw.app, etc.
 fw.mgmt
-=======
+-------
 Control unit firmware version. This firmware is responsible for house
 keeping tasks, PHY control etc. but not the packet-by-packet data path
 operation.
 fw.app
-======
+------
 Data path microcode controlling high-speed packet processing.
 fw.undi
-=======
+-------
 UNDI software, may include the UEFI driver, firmware or both.
 fw.ncsi
-=======
+-------
 Version of the software responsible for supporting/handling the
 Network Controller Sideband Interface.
 fw.psid
-=======
+-------
 Unique identifier of the firmware parameter set.
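One practical use of the ``running`` and ``stored`` version types in the new ``devlink-info`` text is spotting a flash update that has not yet taken effect. The sketch below assumes the versions have already been collected into a nested dictionary; the ``pending_updates`` helper, the dictionary layout and the version strings are all hypothetical.

```python
def pending_updates(info):
    """Return components whose stored version differs from the running one."""
    running = info.get("running", {})
    stored = info.get("stored", {})
    return {
        name: (running[name], stored[name])
        for name in stored
        if name in running and running[name] != stored[name]
    }


# Hypothetical version report using the generic names listed above.
info = {
    "fixed": {"board.id": "ABC123"},
    "running": {"fw.mgmt": "1.2.3", "fw.app": "4.0"},
    "stored": {"fw.mgmt": "1.3.0", "fw.app": "4.0"},
}
pending = pending_updates(info)
```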
.. SPDX-License-Identifier: GPL-2.0
==============
Devlink Params
==============
``devlink`` provides capability for a driver to expose device parameters for low
level device functionality. Since devlink can operate at the device-wide
level, it can be used to provide configuration that may affect multiple
ports on a single device.
This document describes a number of generic parameters that are supported
across multiple drivers. Each driver is also free to add their own
parameters. Each driver must document the specific parameters they support,
whether generic or not.
Configuration modes
===================
Parameters may be set in different configuration modes.
.. list-table:: Possible configuration modes
:widths: 5 90
* - Name
- Description
* - ``runtime``
- set while the driver is running, and takes effect immediately. No
reset is required.
* - ``driverinit``
- applied while the driver initializes. Requires the user to restart
the driver using the ``devlink`` reload command.
* - ``permanent``
- written to the device's non-volatile memory. A hard reset is required
for it to take effect.
Reloading
---------
In order for ``driverinit`` parameters to take effect, the driver must
support reloading via the ``devlink-reload`` command. This command will
request a reload of the device driver.
Generic configuration parameters
================================
The following is a list of generic configuration parameters that drivers may
add. Use of generic parameters is preferred over each driver creating their
own name.
.. list-table:: List of generic parameters
:widths: 5 5 90
* - Name
- Type
- Description
* - ``enable_sriov``
- Boolean
- Enable Single Root I/O Virtualization (SRIOV) in the device.
* - ``ignore_ari``
- Boolean
- Ignore Alternative Routing-ID Interpretation (ARI) capability. If
enabled, the adapter will ignore ARI capability even when the
platform has support enabled. The device will create the same number
of partitions as when the platform does not support ARI.
* - ``msix_vec_per_pf_max``
- u32
- Provides the maximum number of MSI-X interrupts that a device can
create. Value is the same across all physical functions (PFs) in the
device.
* - ``msix_vec_per_pf_min``
- u32
- Provides the minimum number of MSI-X interrupts required for the
device to initialize. Value is the same across all physical functions
(PFs) in the device.
* - ``fw_load_policy``
- u8
- Control the device's firmware loading policy.
- ``DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_DRIVER`` (0)
Load firmware version preferred by the driver.
- ``DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_FLASH`` (1)
Load firmware currently stored in flash.
- ``DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_DISK`` (2)
Load firmware currently available on host's disk.
* - ``reset_dev_on_drv_probe``
- u8
- Controls the device's reset policy on driver probe.
- ``DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_UNKNOWN`` (0)
Unknown or invalid value.
- ``DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_ALWAYS`` (1)
Always reset device on driver probe.
- ``DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_NEVER`` (2)
Never reset device on driver probe.
- ``DEVLINK_PARAM_RESET_DEV_ON_DRV_PROBE_VALUE_DISK`` (3)
Reset the device only if firmware can be found in the filesystem.
* - ``enable_roce``
- Boolean
- Enable handling of RoCE traffic in the device.
* - ``internal_err_reset``
- Boolean
- When enabled, the device driver will reset the device on internal
errors.
* - ``max_macs``
- u32
- Specifies the maximum number of MAC addresses per ethernet port of
this device.
* - ``region_snapshot_enable``
- Boolean
- Enable capture of ``devlink-region`` snapshots.
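The ``u8`` values listed for ``fw_load_policy`` and ``reset_dev_on_drv_probe`` can be mirrored in a small lookup helper, for instance when decoding values read from a device. The numeric values come from the table above; the Python class names and the ``parse_u8_param`` helper are hypothetical.

```python
from enum import IntEnum


class FwLoadPolicy(IntEnum):
    DRIVER = 0  # load firmware version preferred by the driver
    FLASH = 1   # load firmware currently stored in flash
    DISK = 2    # load firmware currently available on host's disk


class ResetDevOnDrvProbe(IntEnum):
    UNKNOWN = 0  # unknown or invalid value
    ALWAYS = 1   # always reset device on driver probe
    NEVER = 2    # never reset device on driver probe
    DISK = 3     # reset only if firmware can be found in the filesystem


def parse_u8_param(enum_cls, raw):
    """Map a raw u8 value to its symbolic name, rejecting unknown values."""
    try:
        return enum_cls(raw)
    except ValueError:
        raise ValueError(
            f"{raw} is not a valid {enum_cls.__name__} value") from None
```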
.. SPDX-License-Identifier: GPL-2.0
==============
Devlink Region
==============
``devlink`` regions enable access to driver defined address regions using
devlink.
Each device can create and register its own supported address regions. The
region can then be accessed via the devlink region interface.
Region snapshots are collected by the driver, and can be accessed via read
or dump commands. This allows future analysis on the created snapshots.
Regions may optionally support triggering snapshots on demand.
The major benefit to creating a region is to provide access to internal
address regions that are otherwise inaccessible to the user.
Regions may also be used to provide an additional way to debug complex error
states, but see also :doc:`devlink-health`.
example usage
-------------
.. code:: shell
$ devlink region help
$ devlink region show [ DEV/REGION ]
$ devlink region del DEV/REGION snapshot SNAPSHOT_ID
$ devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ]
$ devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ] address ADDRESS length LENGTH
# Show all of the exposed regions with region sizes:
$ devlink region show
pci/0000:00:05.0/cr-space: size 1048576 snapshot [1 2]
pci/0000:00:05.0/fw-health: size 64 snapshot [1 2]
# Delete a snapshot using:
$ devlink region del pci/0000:00:05.0/cr-space snapshot 1
# Trigger (request) a snapshot be taken:
$ devlink region trigger pci/0000:00:05.0/cr-space
# Dump a snapshot:
$ devlink region dump pci/0000:00:05.0/fw-health snapshot 1
0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8
0000000000000020 0016 0bb8 0016 1720 0000 0000 c00f 3ffc
0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5
# Read a specific part of a snapshot:
$ devlink region read pci/0000:00:05.0/fw-health snapshot 1 address 0 length 16
0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
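The relationship between ``dump`` and ``read`` in the example above can be checked offline. The sketch below parses the dump text shown for ``fw-health`` and slices it by address and length. It assumes each line is a hexadecimal offset followed by the bytes in order; ``parse_dump`` and ``region_read`` are hypothetical helpers, not part of any devlink tool.

```python
DUMP = """\
0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8
0000000000000020 0016 0bb8 0016 1720 0000 0000 c00f 3ffc
0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5
"""


def parse_dump(text):
    """Parse 'OFFSET hex hex ...' lines into a single bytes object."""
    data = bytearray()
    for line in text.strip().splitlines():
        offset, *groups = line.split()
        if int(offset, 16) != len(data):
            raise ValueError("non-contiguous dump at offset " + offset)
        data += bytes.fromhex("".join(groups))
    return bytes(data)


def region_read(snapshot, address, length):
    """Return length bytes of the snapshot starting at address."""
    if address < 0 or length < 0 or address + length > len(snapshot):
        raise ValueError("read outside of region")
    return snapshot[address:address + length]


snapshot = parse_dump(DUMP)
first_line = region_read(snapshot, 0, 16)
```

The parsed snapshot is 64 bytes long, which agrees with the size reported for ``fw-health`` by ``devlink region show``.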
As regions are likely very device or driver specific, no generic regions are
defined. See the driver-specific documentation files for information on the
specific regions a driver supports.
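When scripting around regions, it can help to pull the snapshot IDs out of the ``devlink region show`` output before dumping or deleting them. The following is a minimal sketch, assuming the output format matches the sample shown above; the ``region_snapshots`` helper and the ``sample_output`` variable are hypothetical names used only for illustration, and the sketch parses captured text rather than calling ``devlink`` itself.

```shell
#!/bin/sh
# Sample `devlink region show` output, copied from the example above.
sample_output='pci/0000:00:05.0/cr-space: size 1048576 snapshot [1 2]
pci/0000:00:05.0/fw-health: size 64 snapshot [1 2]'

# region_snapshots REGION (hypothetical helper): read `devlink region show`
# output on stdin and print the snapshot IDs listed for REGION, one per line.
region_snapshots() {
    awk -v region="$1:" '
        $1 == region && /\[/ {
            sub(/^.*\[/, "")   # drop everything up to and including "["
            sub(/\].*$/, "")   # drop the closing "]" and anything after it
            n = split($0, ids, " ")
            for (i = 1; i <= n; i++) print ids[i]
        }'
}

# List the snapshots of the fw-health region from the sample output.
printf '%s\n' "$sample_output" | region_snapshots pci/0000:00:05.0/fw-health
```

On a real system the sample text would be replaced by piping ``devlink region show`` into the helper.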
.. SPDX-License-Identifier: GPL-2.0
================
Devlink Resource
================
``devlink`` provides the ability for drivers to register resources, which
can allow administrators to see the device restrictions for a given
resource, as well as how much of the given resource is currently
in use. Additionally, these resources can optionally have configurable size.
This could enable the administrator to limit the number of resources that
are used.
For example, the ``netdevsim`` driver enables ``/IPv4/fib`` and
``/IPv4/fib-rules`` as resources to limit the number of IPv4 FIB entries and
rules for a given device.
Resource Ids
============
Each resource is represented by an id, and contains information about its
current size and related sub resources. To access a sub resource, you
specify the path of the resource. For example ``/IPv4/fib`` is the id for
the ``fib`` sub-resource under the ``IPv4`` resource.
example usage
-------------
The resources exposed by the driver can be observed, for example:
.. code:: shell
$ devlink resource show pci/0000:03:00.0
pci/0000:03:00.0:
name kvd size 245760 unit entry
resources:
name linear size 98304 occ 0 unit entry size_min 0 size_max 147456 size_gran 128
name hash_double size 60416 unit entry size_min 32768 size_max 180224 size_gran 128
name hash_single size 87040 unit entry size_min 65536 size_max 212992 size_gran 128
The size of some resources can be changed. Examples:
.. code:: shell
$ devlink resource set pci/0000:03:00.0 path /kvd/hash_single size 73088
$ devlink resource set pci/0000:03:00.0 path /kvd/hash_double size 74368
The changes do not apply immediately. This can be verified through the
``size_new`` attribute, which represents the pending change in size. For example:
.. code:: shell
$ devlink resource show pci/0000:03:00.0
pci/0000:03:00.0:
name kvd size 245760 unit entry size_valid false
resources:
name linear size 98304 size_new 147456 occ 0 unit entry size_min 0 size_max 147456 size_gran 128
name hash_double size 60416 unit entry size_min 32768 size_max 180224 size_gran 128
name hash_single size 87040 unit entry size_min 65536 size_max 212992 size_gran 128
Note that changes in resource size may require a device reload to properly
take effect.
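Before requesting a new size, a script can check the value against the ``size_min``, ``size_max`` and ``size_gran`` attributes reported by ``devlink resource show``. The sketch below assumes that a size is acceptable when it lies within the minimum and maximum and is a whole multiple of the granularity; ``size_is_valid`` and ``check`` are hypothetical helper names, and the numbers are taken from the example output above.

```shell
#!/bin/sh
# size_is_valid SIZE MIN MAX GRAN (hypothetical helper): succeed if SIZE lies
# within [MIN, MAX] and is a whole multiple of GRAN.
size_is_valid() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ] && [ $(( $1 % $4 )) -eq 0 ]
}

# check NAME SIZE MIN MAX GRAN: report whether SIZE passes the checks above.
check() {
    if size_is_valid "$2" "$3" "$4" "$5"; then
        echo "$1: size $2 is acceptable"
    else
        echo "$1: size $2 is rejected"
    fi
}

# Limits copied from the `devlink resource show` example output.
check hash_single 73088 65536 212992 128
check hash_double 74368 32768 180224 128
```

Both sizes used in the ``devlink resource set`` example pass these checks. The driver may still apply further constraints, such as the combined size of the sub-resources.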
@@ -233,7 +233,7 @@ help debug packet drops caused by these exceptions. The following list includes
 links to the description of driver-specific traps registered by various device
 drivers:
-  * :doc:`devlink-trap-netdevsim`
+  * :doc:`netdevsim`
 Generic Packet Trap Groups
 ==========================
......
Linux Devlink Documentation
===========================
devlink is an API to expose device information and resources not directly
related to any device class, such as chip-wide/switch-ASIC-wide configuration.
Interface documentation
-----------------------
The following pages describe various interfaces available through devlink in
general.
.. toctree::
:maxdepth: 1
devlink-dpipe
devlink-health
devlink-info
devlink-params
devlink-region
devlink-resource
devlink-trap
Driver-specific documentation
-----------------------------
Each driver that implements ``devlink`` is expected to document what
parameters, info versions, and other features it supports.
.. toctree::
:maxdepth: 1
bnxt
ionic
mlx4
mlx5
mlxsw
mv88e6xxx
netdevsim
nfp
qed
ti-cpsw-switch
.. SPDX-License-Identifier: GPL-2.0
=====================
ionic devlink support
=====================
This document describes the devlink features implemented by the ``ionic``
device driver.
Info versions
=============
The ``ionic`` driver reports the following versions:
.. list-table:: devlink info versions implemented
:widths: 5 5 90
* - Name
- Type
- Description
* - ``fw``
- running
- Version of firmware running on the device
* - ``asic.id``
- fixed
- The ASIC type for this device
* - ``asic.rev``
- fixed
- The revision of the ASIC for this device
.. SPDX-License-Identifier: GPL-2.0
====================
mlx4 devlink support
====================
This document describes the devlink features implemented by the ``mlx4``
device driver.
Parameters
==========
.. list-table:: Generic parameters implemented
* - Name
- Mode
* - ``internal_err_reset``
- driverinit, runtime
* - ``max_macs``
- driverinit
* - ``region_snapshot_enable``
- driverinit, runtime
The ``mlx4`` driver also implements the following driver-specific
parameters.
.. list-table:: Driver-specific parameters implemented
:widths: 5 5 5 85
* - Name
- Type
- Mode
- Description
* - ``enable_64b_cqe_eqe``
- Boolean
- driverinit
- Enable 64 byte CQEs/EQEs, if the FW supports it.
* - ``enable_4k_uar``
- Boolean
- driverinit
- Enable using the 4k UAR.
The ``mlx4`` driver supports reloading via ``DEVLINK_CMD_RELOAD``.
Regions
=======
The ``mlx4`` driver supports dumping the firmware PCI crspace and health
buffer during a critical firmware issue.
If a firmware command times out, the firmware gets stuck, or the catastrophic
buffer holds a non-zero value, the driver takes a snapshot.
The ``cr-space`` region will contain the firmware PCI crspace contents. The
``fw-health`` region will contain the device firmware's health buffer.
Snapshots for both of these regions are taken on the same event triggers.
.. SPDX-License-Identifier: GPL-2.0
====================
mlx5 devlink support
====================
This document describes the devlink features implemented by the ``mlx5``
device driver.
Parameters
==========
.. list-table:: Generic parameters implemented
* - Name
- Mode
* - ``enable_roce``
- driverinit
The ``mlx5`` driver also implements the following driver-specific
parameters.
.. list-table:: Driver-specific parameters implemented
:widths: 5 5 5 85
* - Name
- Type
- Mode
- Description
* - ``flow_steering_mode``
- string
- runtime
- Controls the flow steering mode of the driver
* ``dmfs`` Device managed flow steering. In DMFS mode, the HW
steering entities are created and managed through firmware.
* ``smfs`` Software managed flow steering. In SMFS mode, the HW
steering entities are created and managed through the driver without
firmware intervention.
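Since ``flow_steering_mode`` is a runtime string parameter, it is set with ``devlink dev param set`` using ``cmode runtime``. The sketch below only builds and prints the command after checking that the mode is one of the two documented values, so it is safe to run without ``mlx5`` hardware; ``flow_steering_cmd`` is a hypothetical helper and the device name ``pci/0000:06:00.0`` is a placeholder.

```shell
#!/bin/sh
# flow_steering_cmd DEV MODE (hypothetical helper): print the devlink command
# that selects MODE on DEV, or fail if MODE is not a documented value.
flow_steering_cmd() {
    case "$2" in
        dmfs|smfs)
            echo "devlink dev param set $1 name flow_steering_mode value $2 cmode runtime"
            ;;
        *)
            echo "unknown flow steering mode: $2" >&2
            return 1
            ;;
    esac
}

# Placeholder device name; substitute the address of a real mlx5 device.
flow_steering_cmd pci/0000:06:00.0 smfs
```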
The ``mlx5`` driver supports reloading via ``DEVLINK_CMD_RELOAD``.
Info versions
=============
The ``mlx5`` driver reports the following versions:
.. list-table:: devlink info versions implemented
:widths: 5 5 90
* - Name
- Type
- Description
* - ``fw.psid``
- fixed
- Used to represent the board id of the device.
* - ``fw.version``
- stored, running
- Three digit major.minor.subminor firmware version number.
.. SPDX-License-Identifier: GPL-2.0
=====================
mlxsw devlink support
=====================
This document describes the devlink features implemented by the ``mlxsw``
device driver.
Parameters
==========
.. list-table:: Generic parameters implemented
* - Name
- Mode
* - ``fw_load_policy``
- driverinit
The ``mlxsw`` driver also implements the following driver-specific
parameters.
.. list-table:: Driver-specific parameters implemented
:widths: 5 5 5 85
* - Name
- Type
- Mode
- Description
* - ``acl_region_rehash_interval``
- u32
- runtime
- Sets an interval for periodic ACL region rehashes. The value is
specified in milliseconds, with a minimum of ``3000``. The value of
``0`` disables periodic work entirely. The first rehash will be run
immediately after the value is set.
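The range rule for ``acl_region_rehash_interval`` (``0`` to disable, otherwise at least ``3000`` milliseconds) can be checked before the value is handed to ``devlink dev param set``. This is a minimal sketch: ``rehash_interval_ok`` and ``rehash_interval_cmd`` are hypothetical helpers, the device name is a placeholder, and the command is printed rather than executed.

```shell
#!/bin/sh
# rehash_interval_ok MS (hypothetical helper): succeed if MS is 0, which
# disables the periodic work, or at least the documented minimum of 3000.
rehash_interval_ok() {
    [ "$1" -eq 0 ] || [ "$1" -ge 3000 ]
}

# rehash_interval_cmd DEV MS: print the devlink command that would apply MS
# on DEV, or fail if MS is outside the documented range.
rehash_interval_cmd() {
    if rehash_interval_ok "$2"; then
        echo "devlink dev param set $1 name acl_region_rehash_interval value $2 cmode runtime"
    else
        echo "invalid rehash interval: $2" >&2
        return 1
    fi
}

# Placeholder device name; substitute the address of a real mlxsw device.
rehash_interval_cmd pci/0000:03:00.0 5000
```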
The ``mlxsw`` driver supports reloading via ``DEVLINK_CMD_RELOAD``.
Info versions
=============
The ``mlxsw`` driver reports the following versions:
.. list-table:: devlink info versions implemented
:widths: 5 5 90
* - Name
- Type
- Description
* - ``hw.revision``
- fixed
- The hardware revision for this board
* - ``fw.psid``
- fixed
- Firmware PSID
* - ``fw.version``
- running
- Three digit firmware version
.. SPDX-License-Identifier: GPL-2.0
=========================
mv88e6xxx devlink support
=========================
This document describes the devlink features implemented by the ``mv88e6xxx``
device driver.
Parameters
==========
The ``mv88e6xxx`` driver implements the following driver-specific parameters.
.. list-table:: Driver-specific parameters implemented
:widths: 5 5 5 85
* - Name
- Type
- Mode
- Description
* - ``ATU_hash``
- u8
- runtime
- Select one of four possible hashing algorithms for MAC addresses in
the Address Translation Unit. A value of 3 may work better than the
default of 1 when many MAC addresses have the same OUI. Only the
values 0 to 3 are valid for this parameter.
.. SPDX-License-Identifier: GPL-2.0
=========================
netdevsim devlink support
=========================
This document describes the ``devlink`` features supported by the
``netdevsim`` device driver.
Parameters
==========
.. list-table:: Generic parameters implemented
* - Name
- Mode
* - ``max_macs``
- driverinit
The ``netdevsim`` driver also implements the following driver-specific
parameters.
.. list-table:: Driver-specific parameters implemented
:widths: 5 5 5 85
* - Name
- Type
- Mode
- Description
* - ``test1``
- Boolean
- driverinit
- Test parameter used to show how a driver-specific devlink parameter
can be implemented.
The ``netdevsim`` driver supports reloading via ``DEVLINK_CMD_RELOAD``.
Regions
=======
The ``netdevsim`` driver exposes a ``dummy`` region as an example of how the
devlink-region interfaces work. A snapshot is taken whenever the
``take_snapshot`` debugfs file is written to.
Resources
=========
The ``netdevsim`` driver exposes resources to control the number of FIB
entries and FIB rule entries that the driver will allow.
.. code:: shell
$ devlink resource set netdevsim/netdevsim0 path /IPv4/fib size 96
$ devlink resource set netdevsim/netdevsim0 path /IPv4/fib-rules size 16
$ devlink resource set netdevsim/netdevsim0 path /IPv6/fib size 64
$ devlink resource set netdevsim/netdevsim0 path /IPv6/fib-rules size 16
$ devlink dev reload netdevsim/netdevsim0
Driver-specific Traps
=====================
.. list-table:: List of Driver-specific Traps Registered by ``netdevsim``
:widths: 5 5 90
* - Name
- Type
- Description
* - ``fid_miss``
- ``exception``
- When a packet enters the device it is classified to a filtering
identifier (FID) based on the ingress port and VLAN. This trap is used
to trap packets for which a FID could not be found.
.. SPDX-License-Identifier: GPL-2.0
===================
nfp devlink support
===================
This document describes the devlink features implemented by the ``nfp``
device driver.
Parameters
==========
.. list-table:: Generic parameters implemented
* - Name
- Mode
* - ``fw_load_policy``
- permanent
* - ``reset_dev_on_drv_probe``
- permanent
Info versions
=============
The ``nfp`` driver reports the following versions:
.. list-table:: devlink info versions implemented
:widths: 5 5 90
* - Name
- Type
- Description
* - ``board.id``
- fixed
- Part number identifying the board design
* - ``board.rev``
- fixed
- Revision of the board design
* - ``board.manufacture``
- fixed
- Vendor of the board design
* - ``board.model``
- fixed
- Model name of the board design
* - ``fw.bundle_id``
- stored, running
- Firmware bundle id
* - ``fw.mgmt``
- stored, running
- Version of the management firmware
* - ``fw.cpld``
- stored, running
- The CPLD firmware component version
* - ``fw.app``
- stored, running
- The APP firmware component version
* - ``fw.undi``
- stored, running
- The UNDI firmware component version
* - ``fw.ncsi``
- stored, running
- The NCSI firmware component version
* - ``chip.init``
- stored, running
- The CFGR firmware component version
.. SPDX-License-Identifier: GPL-2.0
===================
qed devlink support
===================
This document describes the devlink features implemented by the ``qed`` core
device driver.
Parameters
==========
The ``qed`` driver implements the following driver-specific parameters.
.. list-table:: Driver-specific parameters implemented
:widths: 5 5 5 85
* - Name
- Type
- Mode
- Description
* - ``iwarp_cmt``
- Boolean
- runtime
- Enable iWARP functionality for 100g devices. Note that this impacts
L2 performance, and is therefore not enabled by default.
.. SPDX-License-Identifier: GPL-2.0
==============================
ti-cpsw-switch devlink support
==============================
This document describes the devlink features implemented by the ``ti-cpsw-switch``
device driver.
Parameters
==========
The ``ti-cpsw-switch`` driver implements the following driver-specific
parameters.
.. list-table:: Driver-specific parameters implemented
:widths: 5 5 5 85
* - Name
- Type
- Mode
- Description
* - ``ale_bypass``
- Boolean
- runtime
- Enables ALE_CONTROL(4).BYPASS mode for debugging purposes. In this
mode, all packets will be sent to the host port only.
* - ``switch_mode``
- Boolean
- runtime
- Enable switch mode
@@ -13,9 +13,7 @@ Contents:
 can_ucan_protocol
 device_drivers/index
 dsa/index
-devlink-info-versions
-devlink-trap
-devlink-trap-netdevsim
+devlink/index
 ethtool-netlink
 ieee802154
 j1939
......
@@ -4848,6 +4848,7 @@ S: Supported
 F: net/core/devlink.c
 F: include/net/devlink.h
 F: include/uapi/linux/devlink.h
+F: Documentation/networking/devlink
 DIALOG SEMICONDUCTOR DRIVERS
 M: Support Opensource <support.opensource@diasemi.com>
@@ -9889,7 +9890,7 @@ S: Maintained
 F: drivers/net/dsa/mv88e6xxx/
 F: include/linux/platform_data/mv88e6xxx.h
 F: Documentation/devicetree/bindings/net/dsa/marvell.txt
-F: Documentation/networking/devlink-params-mv88e6xxx.txt
+F: Documentation/networking/devlink/mv88e6xxx.rst
 MARVELL ARMADA DRM SUPPORT
 M: Russell King <linux@armlinux.org.uk>
......
@@ -270,7 +270,7 @@ struct nsim_trap_data {
 };
 /* All driver-specific traps must be documented in
- * Documentation/networking/devlink-trap-netdevsim.rst
+ * Documentation/networking/devlink/netdevsim.rst
  */
 enum {
 NSIM_TRAP_ID_BASE = DEVLINK_TRAP_GENERIC_ID_MAX,
......
@@ -485,6 +485,8 @@ enum devlink_param_generic_id {
 #define DEVLINK_INFO_VERSION_GENERIC_FW_UNDI "fw.undi"
 /* NCSI support/handler version */
 #define DEVLINK_INFO_VERSION_GENERIC_FW_NCSI "fw.ncsi"
+/* FW parameter set id */
+#define DEVLINK_INFO_VERSION_GENERIC_FW_PSID "fw.psid"
 struct devlink_region;
 struct devlink_info_req;
@@ -562,7 +564,7 @@ struct devlink_trap {
 };
 /* All traps must be documented in
- * Documentation/networking/devlink-trap.rst
+ * Documentation/networking/devlink/devlink-trap.rst
  */
 enum devlink_trap_generic_id {
 DEVLINK_TRAP_GENERIC_ID_SMAC_MC,
@@ -596,7 +598,7 @@ enum devlink_trap_generic_id {
 };
 /* All trap groups must be documented in
- * Documentation/networking/devlink-trap.rst
+ * Documentation/networking/devlink/devlink-trap.rst
  */
 enum devlink_trap_group_generic_id {
 DEVLINK_TRAP_GROUP_GENERIC_ID_L2_DROPS,
......