- 01 Jul, 2019 10 commits
-
-
Parav Pandit authored
While enabling SR-IOV, PCI core already checks that if SR-IOV is already enabled, it returns failure error code. Hence, remove such duplicate check from mlx5_core driver. While at it, make mlx5_device_disable_sriov() to perform cleanup of VFs in reverse order of mlx5_device_enable_sriov(). Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Bodong Wang authored
When ECPF eswitch manager is at offloads mode, it monitors functions changed event from host PF side and acts according to the number of VFs enabled/disabled. As ECPF and host PF work in two independent hosts, it's possible that host PF OS reboots but ECPF system is still kept on and continues monitoring events from host PF. When kernel from host PF side is booting, PCI iov driver does sriov_init and compute_max_vf_buses by iterating over all valid num of VFs. This triggers FLR and generates functions changed events, even though host PF HCA is not enabled at this time. However, ECPF is not aware of this information, and still handles these events as usual. ECPF system will see massive number of reps are created, but destroyed immediately once creation finished. To eliminate this noise, a bit is added to host parameter context to indicate host PF is disabled. ECPF will not handle the VF changed event if this bit is set. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Parav Pandit authored
As mlx5_get_next_phys_dev is used only for PCI PF devices use case, limit it to search only for PCI devices. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Parav Pandit authored
mlx5_pci_init() performs pci specific initialization of the mlx5_core_dev struct. Hence move pci_status_mutex to pci initialization routine mlx5_pci_init(). This allows reusing mlx5_mdev_init() to non PCI devices. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Huy Nguyen authored
Rename mlx5_pci_dev_type to mlx5_coredev_type to distinguish different mlx5 device types. mlx5_coredev_type represents mlx5_core_dev instance type. Hence keep mlx5_coredev_type in mlx5_core_dev structure. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Bodong Wang authored
When an IB rep is loaded, netdev for the same vport is saved for later reference. However, it's not cleaned up when doing unload. For ECPF, kernel crashes when driver is referring to the already removed netdev. Following steps lead to a shown call trace: 1. Create n VFs from host PF 2. Distroy the VFs 3. Run "rdma link" from ARM Call trace: mlx5_ib_get_netdev+0x9c/0xe8 [mlx5_ib] mlx5_query_port_roce+0x268/0x558 [mlx5_ib] mlx5_ib_rep_query_port+0x14/0x34 [mlx5_ib] ib_query_port+0x9c/0xfc [ib_core] fill_port_info+0x74/0x28c [ib_core] nldev_port_get_doit+0x1a8/0x1e8 [ib_core] rdma_nl_rcv_msg+0x16c/0x1c0 [ib_core] rdma_nl_rcv+0xe8/0x144 [ib_core] netlink_unicast+0x184/0x214 netlink_sendmsg+0x288/0x354 sock_sendmsg+0x18/0x2c __sys_sendto+0xbc/0x138 __arm64_sys_sendto+0x28/0x34 el0_svc_common+0xb0/0x100 el0_svc_handler+0x6c/0x84 el0_svc+0x8/0xc Cleanup the rep and netdev reference when unloading IB rep. Fixes: 26628e2d ("RDMA/mlx5: Move to single device multiport ports in switchdev mode") Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Bodong Wang authored
In the single IB device mode, the mapping between vport number and rep relies on a counter. However for dynamic vport allocation, it is desired to keep consistent map of eswitch vport and IB port. Hence, simplify code to remove the free running counter and instead use the available vport index during load/unload sequence from the eswitch. Signed-off-by: Bodong Wang <bodong@mellanox.com> Suggested-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Bodong Wang authored
Driver is referring to the array index when doing rep initialization, using vport is confusing as it's normally interpreted as vport number. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Shay Agroskin authored
Given a fw component index, the MCQI register allows us to query this component's information (e.g. its version and capabilities). Given a fw component index, the MCQS register allows us to query the status of a fw component, including its type and state (e.g. PRESET/IN_USE). It can be used to find the index of a component of a specific type, by sequentially increasing the component index, and querying each time the type of the returned component. If max component index is reached, 'last_index_flag' is set by the HCA. These registers' description was added to query the running and pending fw version of the HCA. Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Parav Pandit authored
Update mlx5 device interface data structures for: 1. New command definitions for allocating, deallocating SF 2. Query SF partition 3. Eswitch SF fields 4. HCA CAP SF fields 5. Extend Eswitch functions command for SF Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
- 26 Jun, 2019 13 commits
-
-
Jianbo Liu authored
As the ingress ACL rules save vhca id and vport number to packet's metadata REG_C_0, and the metadata matching for the rules in both fast path and slow path are all added, enable this feature if supported. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Jianbo Liu authored
If vport metadata matching is enabled in eswitch, the rule created must be changed to match on the metadata, instead of source port. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Jianbo Liu authored
In slow path, packet that not matched by any offloaded rule is forwarded to eswitch vport manager for further processing. Add matching on metadata for peer miss rules in FDB, and rules which forward packet to correct representor in esw manager NIC_RX table. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Jianbo Liu authored
In order to do matching on metadata in slow path when demuxing traffic to representors, explicitly enable the feature that allows HW to pass metadata REG_C_0 from FDB to eswitch manager NIC_RX table. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Jianbo Liu authored
Add esw vport query and modify functions, and exposing them is needed for enabling or disabling registers passed as metatdata to vport NIC_RX table in slow path. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Jianbo Liu authored
If FW's capabilities and configurations meet the requirement of vport metadata matching, this feature will be used. As the information about vport number and vhca_id related to packet is already stored to its metadata register, which is used as an indicator for perticular vport, now we can change to match on this metadata for all the offloading rules in fast path. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Jianbo Liu authored
In vport metadata matching, source port number is replaced by metadata. While FW has no idea about what it is in the metadata, a syndrome will happen. Specify a known origin to avoid the syndrome. However, there is no functional change because ANY_VPORT (0) is filled in flow_source, the same default value as before, as a pre-step towards metadata matching for fast path. There are two other values can be filled in flow_source. When setting 0x1, packet matching this rule is from uplink, while 0x2 is for packet from other local vports. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Jianbo Liu authored
When a dual-port VHCA sends a RoCE packet on its non-native port, and the packet arrives to its affiliated vport FDB, a mismatch might occur on the rules that match the packet source vport as it is not represented by single VHCA only in this case. So we change to match on metadata instead of source vport. To do that, a rule is created in all vports and uplink ingress ACLs, to save the source vport number and vhca id in the packet's metadata in order to match on it later. The metadata register used is the first of the 32-bit type C registers. It can be used for matching and header modify operations. The higher 16 bits of this register are for vhca id, and the lower 16 ones is for vport number. This change is not for dual-port RoCE only. If HW and FW allow, the vport metadata matching is enabled by default. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Jianbo Liu authored
Refactor the flow data structures, add new flow_context and move flow_tag into it, as flow_tag doesn't belong to the rule action. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Parav Pandit authored
Introduce a helper API mlx5_eswitch_is_vf_vport() to check if a given vport_num belongs to VF or not. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Jianbo Liu authored
That modify header action can be then attached to a steering rule in the ingress ACL. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Jianbo Liu authored
The ingress and egress ACL root namespaces are created per vport and stored into arrays. However, the vport number is not the same as the index. Passing the array index, instead of vport number, to get the correct ingress and egress acl namespace. Fixes: 9b93ab98 ("net/mlx5: Separate ingress/egress namespaces for each vport") Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Jianbo Liu authored
When a dual-port VHCA sends a RoCE packet on its non-native port, and the packet arrives to its affiliated vport FDB, a mismatch might occur on the rules that match the packet source vport. So we replace the match on source port with the match on metadata that was configured in ingress ACL, and that metadata will be passed further also to the NIC RX table of the eswitch manager. Introduce vport metadata matching bits and enum constants as a pre-step towards metadata matching. o metadata type C registers in the misc parameters 2 fields. o esw_uplink_ingress_acl bit in esw cap. If it set, the device supports ingress ACL for the uplink vport. o fdb_to_vport_reg_* bits in flow table cap and esw vport context, to support propagating the metadata to the nic rx through the loopback path. o flow_source in flow context, to indicate the known origin of packets. o enum constants, to support the above bits. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
- 24 Jun, 2019 1 commit
-
-
Matthew Wilcox authored
The lock protecting the data structure does not need to be an rwlock. The only read access to the lock is in an error path, and if that's limiting your scalability, you have bigger performance problems. Eliminate mlx5_mkey_table in favour of using the xarray directly. reg_mr_callback must use GFP_ATOMIC for allocating XArray nodes as it may be called in interrupt context. This also fixes a minor bug where SRCU locking was being used on the radix tree read side, when RCU was needed too. Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
- 16 Jun, 2019 2 commits
-
-
Maor Gottlieb authored
Add API to get the current Eswitch encap mode. It will be used in downstream patches to check if flow table can be created with encap support or not. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com>
-
Leon Romanovsky authored
Devlink has UAPI declaration for encap mode, so there is no need to be loose on the data get/set by drivers. Update call sites to use enum devlink_eswitch_encap_mode instead of plain u8. Suggested-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Petr Vorel <pvorel@suse.cz>
-
- 13 Jun, 2019 14 commits
-
-
Yuval Avnery authored
Previously, EQ joined the chain notifier on creation. This forced the caller to be ready to handle events before creating the EQ through eq_create_generic interface. To help the caller control when the created EQ will be attached to the IRQ, add enable/disable API. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Ariel Levkovich authored
The patch modifies the IRQ allocation so that all async EQs are assigned to the same IRQ resulting in more available IRQs for completion EQs. The changes are using the support for IRQ sharing and EQ polling budget that was introduced in previous patches so when the shared interrupt is triggered, the kernel will serially call the handler of each of the sharing EQs with a certain budget of EQEs to poll in order to prevent starvation. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Yuval Avnery authored
struct mlx5_irq_info is an active object and not just info. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Yuval Avnery authored
Finalize IRQ separation and expose irq interface. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Yuval Avnery authored
IRQ interface should operate within the irq_table context. It should be independent of any EQ data structure. The interface that will be exposed: init/clenup, create/destroy, attach/detach Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Yuval Avnery authored
IRQ allocation should be part of the IRQ table life-cycle. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Yuval Avnery authored
Affinity set/clear is part of the IRQ life-cycle. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Yuval Avnery authored
Rmap creation/deletion is part of the IRQ life-cycle. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Yuval Avnery authored
IRQ table should only exist for mlx5_core_dev for PF and VF only. EQ table of mediated devices should hold a pointer to the IRQ table of the parent PCI device. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Yuval Avnery authored
Instead of requesting IRQ with eq creation, IRQs will be requested before EQ table creation. Instead of freeing the IRQs after EQ destroy, free IRQs after eq table destroy. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Yuval Avnery authored
Multiple EQs may share the same IRQ in subsequent patches. Instead of calling the IRQ handler directly, the EQ will register to an atomic chain notfier. The Linux built-in shared IRQ is not used because it forces the caller to disable the IRQ and clear affinity before free_irq() can be called. This patch is the first step in the separation of IRQ and EQ logic. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Yuval Avnery authored
Multiple EQs may share the same irq in subsequent patches. To avoid starvation, a budget is set per EQ's interrupt handler. Because of this change, it is no longer required to check that MLX5_NUM_SPARE_EQE eqes were polled (to detect that arm is required). It is guaranteed that MLX5_NUM_SPARE_EQE > budget, therefore the handler will arm and exit the handler before all the entries in the eq are polled. In the scenario where the handler is out of budget and there are more EQEs to poll, arming the EQ guarantees that the HW will send another interrupt and the handler will be called again. Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Bodong Wang authored
For ECPF with eswitch manager privilege, query the host max VF count by querying the device using query_functions command. With this enhancement: 1. flow steering entries are created only for valid vports based on the max VF count of the PF. 2. Driver only queries cap of valid vport. Eswitch requires the max VFs when doing initialization, so do sr-iov init before eswitch init. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-
Bodong Wang authored
Current function only returns host num of VFs, later patch requires other params such as host maximum num of VFs. Return the raw output so that caller can extract info as needed. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-