- 19 Jun, 2019 40 commits
-
-
David S. Miller authored
Jesper Dangaard Brouer says: ==================== xdp: page_pool fixes and in-flight accounting This patchset fix page_pool API and users, such that drivers can use it for DMA-mapping. A number of places exist, where the DMA-mapping would not get released/unmapped, all these are fixed. This occurs e.g. when an xdp_frame gets converted to an SKB. As network stack doesn't have any callback for XDP memory models. The patchset also address a shutdown race-condition. Today removing a XDP memory model, based on page_pool, is only delayed one RCU grace period. This isn't enough as redirected xdp_frames can still be in-flight on different queues (remote driver TX, cpumap or veth). We stress that when drivers use page_pool for DMA-mapping, then they MUST use one packet per page. This might change in the future, but more work lies ahead, before we can lift this restriction. This patchset change the page_pool API to be more strict, as in-flight page accounting is added. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesper Dangaard Brouer authored
For DMA mapping use-case the page_pool keeps a pointer to the struct device, which is used in DMA map/unmap calls. For our in-flight handling, we also need to make sure that the struct device have not disappeared. This is assured via using get_device/put_device API. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reported-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesper Dangaard Brouer authored
The xdp tracepoints for mem id disconnect don't carry information about, why it was not safe_to_remove. The tracepoint page_pool:page_pool_inflight in this patch can be used for extract this info for further debugging. This patchset also adds tracepoint for the pages_state_* release/hold transitions, including a pointer to the page. This can be used for stats about in-flight pages, or used to debug page leakage via keeping track of page pointer and combining this with kprobe for __put_page(). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesper Dangaard Brouer authored
These tracepoints make it easier to troubleshoot XDP mem id disconnect. The xdp:mem_disconnect tracepoint cannot be replaced via kprobe. It is placed at the last stable place for the pointer to struct xdp_mem_allocator, just before it's scheduled for RCU removal. It also extract info on 'safe_to_remove' and 'force'. Detailed info about in-flight pages is not available at this layer. The next patch will added tracepoints needed at the page_pool layer for this. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesper Dangaard Brouer authored
If bugs exists or are introduced later e.g. by drivers misusing the API, then we want to warn about the issue, such that developer notice. This patch will generate a bit of noise in form of periodic pr_warn every 30 seconds. It is not nice to have this stall warning running forever. Thus, this patch will (after 120 attempts) force disconnect the mem id (from the rhashtable) and free the page_pool object. This will cause fallback to the put_page() as before, which only potentially leak DMA-mappings, if objects are really stuck for this long. In that unlikely case, a WARN_ONCE should show us the call stack. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesper Dangaard Brouer authored
This patch is needed before we can allow drivers to use page_pool for DMA-mappings. Today with page_pool and XDP return API, it is possible to remove the page_pool object (from rhashtable), while there are still in-flight packet-pages. This is safely handled via RCU and failed lookups in __xdp_return() fallback to call put_page(), when page_pool object is gone. In-case page is still DMA mapped, this will result in page note getting correctly DMA unmapped. To solve this, the page_pool is extended with tracking in-flight pages. And XDP disconnect system queries page_pool and waits, via workqueue, for all in-flight pages to be returned. To avoid killing performance when tracking in-flight pages, the implement use two (unsigned) counters, that in placed on different cache-lines, and can be used to deduct in-flight packets. This is done by mapping the unsigned "sequence" counters onto signed Two's complement arithmetic operations. This is e.g. used by kernel's time_after macros, described in kernel commit 1ba3aab3 and 5a581b36, and also explained in RFC1982. The trick is these two incrementing counters only need to be read and compared, when checking if it's safe to free the page_pool structure. Which will only happen when driver have disconnected RX/alloc side. Thus, on a non-fast-path. It is chosen that page_pool tracking is also enabled for the non-DMA use-case, as this can be used for statistics later. After this patch, using page_pool requires more strict resource "release", e.g. via page_pool_release_page() that was introduced in this patchset, and previous patches implement/fix this more strict requirement. Drivers no-longer call page_pool_destroy(). Drivers already call xdp_rxq_info_unreg() which call xdp_rxq_info_unreg_mem_model(), which will attempt to disconnect the mem id, and if attempt fails schedule the disconnect for later via delayed workqueue. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesper Dangaard Brouer authored
The mlx5 driver is using page_pool, but not for DMA-mapping (currently), and is a little too relaxed about returning or releasing page resources, as it is not strictly necessary, when not using DMA-mappings. As this patchset is working towards tracking page_pool resources, to know about in-flight frames on shutdown. Then fix places where mlx5 leak page_pool resource. In case of dma_mapping_error, then recycle into page_pool. In mlx5e_free_rq() moved the page_pool_destroy() call to after the mlx5e_page_release() calls, as it is more correct. In mlx5e_page_release() when no recycle was requested, then release page from the page_pool, via page_pool_release_page(). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesper Dangaard Brouer authored
In case driver fails to register the page_pool with XDP return API (via xdp_rxq_info_reg_mem_model()), then the driver can free the page_pool resources more directly than calling page_pool_destroy(), which does a unnecessarily RCU free procedure. This patch is preparing for removing page_pool_destroy(), from driver invocation. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesper Dangaard Brouer authored
Like cpumap use xdp_release_frame() when an xdp_frame got converted into an SKB and send towars the network stack. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesper Dangaard Brouer authored
When converting an xdp_frame into an SKB, and sending this into the network stack, then the underlying XDP memory model need to release associated resources, because the network stack don't have callbacks for XDP memory models. The only memory model that needs this is page_pool, when a driver use the DMA-mapping feature. Introduce page_pool_release_page(), which basically does the same as page_pool_unmap_page(). Add xdp_release_frame() as the XDP memory model interface for calling it, if the memory model match MEM_TYPE_PAGE_POOL, to save the function call overhead for others. Have cpumap call xdp_release_frame() before xdp_scrub_frame(). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesper Dangaard Brouer authored
Fix error handling case, where inserting ID with rhashtable_insert_slow fails in xdp_rxq_info_reg_mem_model, which leads to never releasing the IDA ID, as the lookup in xdp_rxq_info_unreg_mem_model fails and thus ida_simple_remove() is never called. Fix by releasing ID via ida_simple_remove(), and mark xdp_rxq->mem.id with zero, which is already checked in xdp_rxq_info_unreg_mem_model(). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ilias Apalodimas authored
On a previous patch dma addr was stored in 'struct page'. Use that to unmap DMA addresses used by network drivers Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ilias Apalodimas authored
On a previous patch dma addr was stored in 'struct page'. Use that to retrieve DMA addresses used by network drivers Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ilias Apalodimas authored
netsec_process_rx was running in a loop trying to process as many packets as possible before re-enabling interrupts. With the recent DMA changes this is not needed anymore as we manage to consume all the budget without looping over the function. Since it has no performance penalty let's remove that and simplify the Rx path a bit Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ilias Apalodimas authored
Since we changed the Tx ring handling and now depends on bit31 to figure out the owner of the descriptor, we should initialize this every time the device goes down-up instead of doing it once on driver init. If the value is not correctly initialized the device won't have any available descriptors Changes since v1: - Typo fixes Fixes: 35e07d23 ("net: socionext: remove mmio reads on Tx") Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Rasmus Villemoes authored
The comment is correct, but the code ends up moving the bits four places too far, into the VTUOp field. Fixes: bec8e572 (net: dsa: mv88e6xxx: implement vtu_getnext and vtu_loadpurge for mv88e6250) Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Use _BITUL() instead. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Ido Schimmel says: ==================== mlxsw: Implement flower ingress device matching offload Jiri says: In case of using shared block, user might find it handy to be able to insert filters to match on particular ingress device. This patchset exposes the ingress ifindex through flow_dissector and flow_offload so mlxsw can use it to push down to HW. See the selftests for examples of usage. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Extend tc_flower to test plain ingress device matching and also tc_shblock to test ingress device matching on shared block. Add new tc_flower_router.sh where ingress device matching on egress (after routing) is done. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Benefit from the previously extended flow_dissector infrastructure and offload matching on ingress port. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Fix the size of the SRC_SYS_PORT element to be 16. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
RX_ACL_SYSTEM_PORT is 8 bit but SRC_SYS_PORT is 16 bits. Internally, SRC_SYS_PORT is used to carry the value. Relax the checker in case of RX_ACL_SYSTEM_PORT and allow different size. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
RX_ACL_SYSTEM_PORT is equal to SRC_SYS_PORT - 1. So before write to block we need to adjust the key value. Introduce new "EXT" helper to implement this. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Implement support for previously added flow dissector meta key. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Use previously introduced infra to obtain and store ingress ifindex instead doing it locally. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiri Pirko authored
Add new key meta that contains ingress ifindex value and add a function to dissect this from skb. The key and function is prepared to cover other potential skb metadata values dissection. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
Function mlx5_devlink_alloc is missing a void argument, add it to clean up the non-ANSI function declaration. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Maxime Chevallier says: ==================== net: mvpp2: cls: Allow steering based on vlan tag The PPv2 classifier can perform flow steering based on keys extracted from the VLAN tag. This series adds support for using the vlan id and the vlan prio as keys, using the ethtool interface. Patch 1 is a preparatory patch that prevent false-positive matches, using a dedicated lookup id for the RSS C2 lookup. Patch 2 allows to separate the flows based on the header fields they contain. The main goal is to be able to separate tagged traffic from untagged traffic for flow steering, just as we already do for RSS. Patch 3 solves an issue we have when extracting fields that aren't full bytes, such as the vlan tag which is 12 bits wide, or the priority which is 3 bits wide. Finally, patch 4 adds support for steering based on both vlan id and priority, extracted from the outermost tag. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Maxime Chevallier authored
This commit allows using the vlan Id and priority as parts of the key for classification offload. These fields are extracted from the outermost tag, if multiple tags are present. Vlan Id and priority are considered as 2 different fields by the classifier, however the fields are both appended in the Header Extracted Key in the same layout as they are found in the tags. This means that when steering only based on the prio, a 16-bit slot is still taken in the HEK. The classifier doesn't allow extracting the DEI bit from the tag, so we explicitly prevent user from using this bit in the key. This commit adds the vlan priotity as a compatible HEK field for tagged traffic, meaning that we limit the possibility of extracting this field only to the flows that contain tagged traffic. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Maxime Chevallier authored
The C2 TCAM used for classification uses a key (Header Extracted Key) built by concatenating several fields extracted from the packet header. After a lot of trial-and-error and some guess work, it seems the HEK is right justified, with the first fields being stored in the MSB, then concatenated up until the LSB. Until now, this doesn't cause any issue since all HEK fields we use are full bytes. However this is an issue for the upcoming VLAN id and pri extraction, which aren't full bytes. Rework the way we built that TCAM key, by changing the order in which we append the fields. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Maxime Chevallier authored
The way we currently handle classification offload and RSS is by having dedicated lookup sequences in the flow table, each being selected depending on several fields being present in the packet header. We need to make sure the classification operation we want to perform can be done in each flow we want to insert it into. As an example, classifying on VLAN tag can only be done on flows used for tagged traffic. This commit makes sure we don't insert rules in flows we aren't compatible with. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Maxime Chevallier authored
When performing a TCAM lookup in the C2 engine, it's possible that multiple entries match the packet. To make sure the correct entry match when performing a lookup, the Flow Table can set a lookup type, which will be used in the TCAM lookup, thus preventing such false-positives. We need to make sure the RSS match doesn't interfere with other classification lookups, hence we use a dedicated lookup_type for it. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Yash Shah says: ==================== Add macb support for SiFive FU540-C000 On FU540, the management IP block is tightly coupled with the Cadence MACB IP block. It manages many of the boundary signals from the MACB IP This patchset controls the tx_clk input signal to the MACB IP. It switches between the local TX clock (125MHz) and PHY TX clocks. This is necessary to toggle between 1Gb and 100/10Mb speeds. Future patches may add support for monitoring or controlling other IP boundary signals. This patchset is mostly based on work done by Wesley Terpstra <wesley@sifive.com> This patchset is based on Linux v5.2-rc1 and tested on HiFive Unleashed board with additional board related patches needed for testing can be found at dev/yashs/ethernet_v3 branch of: https://github.com/yashshah7/riscv-linux.git Change History: V3: - Revert "MACB_SIFIVE_FU540" config changes in Kconfig and driver code. The driver does not depend on SiFive GPIO driver. V2: - Change compatible string from "cdns,fu540-macb" to "sifive,fu540-macb" - Add "MACB_SIFIVE_FU540" in Kconfig to support SiFive FU540 in macb driver. This is needed because on FU540, the macb driver depends on SiFive GPIO driver. - Avoid writing the result of a comparison to a register. - Fix the issue of probe fail on reloading the module reported by: Andreas Schwab <schwab@suse.de> ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yash Shah authored
The management IP block is tightly coupled with the Cadence MACB IP block on the FU540, and manages many of the boundary signals from the MACB IP. This patch only controls the tx_clk input signal to the MACB IP. Future patches may add support for monitoring or controlling other IP boundary signals. Signed-off-by: Yash Shah <yash.shah@sifive.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yash Shah authored
Add the compatibility string documentation for SiFive FU540-C0000 interface. On the FU540, this driver also needs to read and write registers in a management IP block that monitors or drives boundary signals for the GEMGXL IP block that are not directly mapped to GEMGXL registers. Therefore, add additional range to "reg" property for SiFive GEMGXL management IP registers. Signed-off-by: Yash Shah <yash.shah@sifive.com> Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Xue Chaojing says: ==================== hinic: add rss support and rss parameters configuration This series add rss support for HINIC driver and implement the ethtool interface related to rss parameter configuration. user can use ethtool configure rss parameters or show rss parameters. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xue Chaojing authored
This patch adds support rss parameters with ethtool, user can change hash key, hash indirection table, hash function by ethtool -X, and show rss parameters by ethtool -x. Signed-off-by: Xue Chaojing <xuechaojing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xue Chaojing authored
This patch moves ethtool code from hinic_main.c to hinic_ethtool.c Signed-off-by: Xue Chaojing <xuechaojing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xue Chaojing authored
This patch adds rss support for the HINIC driver. Signed-off-by: Xue Chaojing <xuechaojing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
Currently the call to device_property_read_u32_array is not error checked leading to potential garbage values in the delays array that are then used in msleep delays. Add a sanity check to the property fetching. Addresses-Coverity: ("Uninitialized scalar variable") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-