- 18 Nov, 2016 1 commit
-
-
Eric Dumazet authored
UDP busy polling is restricted to connected UDP sockets. This is because sk_busy_loop() only takes care of one NAPI context. There are cases where it could be extended. 1) Some hosts receive traffic on a single NIC, with one RX queue. 2) Some applications use SO_REUSEPORT and associated BPF filter to split the incoming traffic on one UDP socket per RX queue/thread/cpu 3) Some UDP sockets are used to send/receive traffic for one flow, but they do not bother with connect() This patch records the napi_id of first received skb, giving more reach to busy polling. Tested: lpaa23:~# echo 70 >/proc/sys/net/core/busy_read lpaa24:~# echo 70 >/proc/sys/net/core/busy_read lpaa23:~# for f in `seq 1 10`; do ./super_netperf 1 -H lpaa24 -t UDP_RR -l 5; done Before patch : 27867 28870 37324 41060 41215 36764 36838 44455 41282 43843 After patch : 73920 73213 70147 74845 71697 68315 68028 75219 70082 73707 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 17 Nov, 2016 22 commits
-
-
David S. Miller authored
Sowmini Varadhan says: ==================== RDS: TCP: HA/Failover fixes This series contains a set of fixes for bugs exposed when we ran the following in a loop between a test machine pair: while (1); do # modprobe rds-tcp on test nodes # run rds-stress in bi-dir mode between test machine pair # modprobe -r rds-tcp on test nodes done rds-stress in bi-dir mode will cause both nodes to initiate RDS-TCP connections at almost the same instant, exposing the bugs fixed in this series. Without the fixes, rds-stress reports sporadic packet drops, and packets arriving out of sequence. After the fixes,we have been able to run the test overnight, without any issues. Each patch has a detailed description of the root-cause fixed by the patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sowmini Varadhan authored
When 2 RDS peers initiate an RDS-TCP connection simultaneously, there is a potential for "duelling syns" on either/both sides. See commit 241b2719 ("RDS-TCP: Reset tcp callbacks if re-using an outgoing socket in rds_tcp_accept_one()") for a description of this condition, and the arbitration logic which ensures that the numerically large IP address in the TCP connection is bound to the RDS_TCP_PORT ("canonical ordering"). The rds_connection should not be marked as RDS_CONN_UP until the arbitration logic has converged for the following reason. The sender may start transmitting RDS datagrams as soon as RDS_CONN_UP is set, and since the sender removes all datagrams from the rds_connection's cp_retrans queue based on TCP acks. If the TCP ack was sent from a tcp socket that got reset as part of duel aribitration (but before data was delivered to the receivers RDS socket layer), the sender may end up prematurely freeing the datagram, and the datagram is no longer reliably deliverable. This patch remedies that condition by making sure that, upon receipt of 3WH completion state change notification of TCP_ESTABLISHED in rds_tcp_state_change, we mark the rds_connection as RDS_CONN_UP if, and only if, the IP addresses and ports for the connection are canonically ordered. In all other cases, rds_tcp_state_change will force an rds_conn_path_drop(), and rds_queue_reconnect() on both peers will restart the connection to ensure canonical ordering. A side-effect of enforcing this condition in rds_tcp_state_change() is that rds_tcp_accept_one_path() can now be refactored for simplicity. It is also no longer possible to encounter an RDS_CONN_UP connection in the arbitration logic in rds_tcp_accept_one(). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sowmini Varadhan authored
The RDS transport has to be able to distinguish between two types of failure events: (a) when the transport fails (e.g., TCP connection reset) but the RDS socket/connection layer on both sides stays the same (b) when the peer's RDS layer itself resets (e.g., due to module reload or machine reboot at the peer) In case (a) both sides must reconnect and continue the RDS messaging without any message loss or disruption to the message sequence numbers, and this is achieved by rds_send_path_reset(). In case (b) we should reset all rds_connection state to the new incarnation of the peer. Examples of state that needs to be reset are next expected rx sequence number from, or messages to be retransmitted to, the new incarnation of the peer. To achieve this, the RDS handshake probe added as part of commit 5916e2c1 ("RDS: TCP: Enable multipath RDS for TCP") is enhanced so that sender and receiver of the RDS ping-probe will add a generation number as part of the RDS_EXTHDR_GEN_NUM extension header. Each peer stores local and remote generation numbers as part of each rds_connection. Changes in generation number will be detected via incoming handshake probe ping request or response and will allow the receiver to reset rds_connection state. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sowmini Varadhan authored
As noted in rds_recv_incoming() sequence numbers on data packets can decreas for the failover case, and the Rx path is equipped to recover from this, if the RDS_FLAG_RETRANSMITTED is set on the rds header of an incoming message with a suspect sequence number. The RDS_FLAG_RETRANSMITTED is predicated on the RDS_FLAG_RETRANSMITTED flag in the rds_message, so make sure the flag is set on messages queued for retransmission. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
LABBE Corentin authored
As sugested by Joe Perches, we could replace all if (netif_msg_type(priv)) dev_xxx(priv->devices, ...) by the simpler macro netif_xxx(priv, hw, priv->dev, ...) Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
LABBE Corentin authored
Some printing have the function name hardcoded. It is better to use __func__ instead. Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
LABBE Corentin authored
The stmmac driver use lots of pr_xxx functions to print information. This is bad since we cannot know which device logs the information. (moreover if two stmmac device are present) Furthermore, it seems that it assumes wrongly that all logs will always be subsequent by using a dev_xxx then some indented pr_xxx like this: kernel: sun7i-dwmac 1c50000.ethernet: no reset control found kernel: Ring mode enabled kernel: No HW DMA feature register supported kernel: Normal descriptors kernel: TX Checksum insertion supported So this patch replace all pr_xxx by their netdev_xxx counterpart. Excepts for some printing where netdev "cause" unpretty output like: sun7i-dwmac 1c50000.ethernet (unnamed net_device) (uninitialized): no reset control found In those case, I keep dev_xxx. In the same time I remove some "stmmac:" print since this will be a duplicate with that dev_xxx displays. Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
When I wrote sch_fq.c, hash_ptr() on 64bit arches was awful, and I chose hash_32(). Linus Torvalds and George Spelvin fixed this issue, so we can use hash_ptr() to get more entropy on 64bit arches with Terabytes of memory, and avoid the cast games. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Calling napi_hash_del() after netif_napi_del() is pointless. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
There is no need calling napi_hash_del()+synchronize_rcu() before calling netif_napi_del() netif_napi_del() does this already. Using napi_hash_del() in a driver is useful only when dealing with a batch of NAPI structures, so that a single synchronize_rcu() can be used. mlx4_en_deactivate_cq() is deactivating a single NAPI. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jiri Pirko says: ==================== mlxsw: Introduce support for I2C bus Vadim says: This patchset adds I2C access support for SwitchX, SwitchX2, SwitchIB, SwitchIB2 and Spectrum silicones. It contains: - Small changes in mlxsw core code, needed for I2C bus support; - I2C driver, which obtains I2C input/output mailboxes setting and provides command interface implementation. - Minimal driver, which works on top of I2C driver and allows running of mlxsw command interface over I2C bus; Use case: On system, which does not have PCI to ASIC (BMC), hwmon functionality (sensors, pwm, tacho) will be available through I2C. Usage (manual probing): echo mlxsw_minimal 0x48 > /sys/bus/i2c/devices/i2c-2/new_device Sysfs interface: /sys/bus/i2c/devices/2-0048/hwmon/hwmon5/pwm1 /sys/bus/i2c/devices/2-0048/hwmon/hwmon5/temp1_input ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vadim Pasternak authored
Add I2C access support for Mellanox ASICs: - Virtual Protocol Interconnect switches SwitchX, SwitchX2, providing InfiniBand, Ethernet and Fibre Channel connectivity; - Infiniband switches SwitchIB, SwitchIB2: - Ethernet switch Spectrum. Example of probing activation: echo mlxsw_minimal 0x48 > /sys/bus/i2c/devices/i2c-2/new_device Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vadim Pasternak authored
We are going to add a minimal driver on top of the mlxsw core infrastructure, which will be mainly used for hardware monitoring in Baseboard management controller (BMC) installations. Unlike the switch drivers (e.g., spectrum, switchx2), this driver does not initialize the ASIC and therefore doesn't need to implement the init() and fini() methods in its 'mlxsw_driver' struct. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vadim Pasternak authored
Add I2C bus implementation for Mellanox Technologies Switch ASICs. This includes command interface implementation using input / out mailboxes, whose location is retrieved from the firmware during probe time. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vadim Pasternak authored
The mlxsw core infrastructure currently assumes that communication with the ASIC is always possible using Ethernet management datagrams (EMADs), but this is only possible when the PCI bus is used. The bus capability flag is added to indicate EMAD support and make core initialize EMAD communication only when it's set. Otherwise, register access is done using command interface. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julia Lawall authored
knav_queue_open always returns an ERR_PTR value, never NULL. This can be confirmed by unfolding the function calls and conforms to the function's documentation. Thus, replace IS_ERR_OR_NULL by IS_ERR in error checks. The change is made using the following semantic patch: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x; statement S; @@ x = knav_queue_open(...); if ( - IS_ERR_OR_NULL + IS_ERR (x)) S // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
Now sctp transport rhashtable uses hash(lport, dport, daddr) as the key to hash a node to one chain. If in one host thousands of assocs connect to one server with the same lport and different laddrs (although it's not a normal case), all the transports would be hashed into the same chain. It may cause to keep returning -EBUSY when inserting a new node, as the chain is too long and sctp inserts a transport node in a loop, which could even lead to system hangs there. The new rhlist interface works for this case that there are many nodes with the same key in one chain. It puts them into a list then makes this list be as a node of the chain. This patch is to replace rhashtable_ interface with rhltable_ interface. Since a chain would not be too long and it would not return -EBUSY with this fix when inserting a node, the reinsert loop is also removed here. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Michael Chan says: ==================== bnxt_en: Updates. New firmware spec. update, autoneg update, and UDP RSS support. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
To display and modify the RSS hash. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
The newer chips have proper support for 4-tuple UDP RSS. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
On some dual port NICs, the speed setting on one port can affect the available speed on the other port. Add logic to detect these changes and adjust the advertised speed settings when necessary. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
Use the new FORCE_LINK_DWN bit to shutdown link during close. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 16 Nov, 2016 17 commits
-
-
Eric Dumazet authored
Callers of netpoll_poll_lock() own NAPI_STATE_SCHED Callers of netpoll_poll_unlock() have BH blocked between the NAPI_STATE_SCHED being cleared and poll_lock is released. We can avoid the spinlock which has no contention, and use cmpxchg() on poll_owner which we need to set anyway. This removes a possible lockdep violation after the cited commit, since sk_busy_loop() re-enables BH before calling busy_poll_stop() Fixes: 217f6974 ("net: busy-poll: allow preemption in sk_busy_loop()") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Rafal Ozieblo authored
New Cadence GEM hardware support Large Segment Offload (LSO): TCP segmentation offload (TSO) as well as UDP fragmentation offload (UFO). Support for those features was added to the driver. Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
The netdev->real_num_rx_queues setting is only available if CONFIG_SYSFS is enabled, so we now get a build failure when that is turned off: netronome/nfp/nfp_net_common.c: In function 'nfp_net_ring_swap_enable': netronome/nfp/nfp_net_common.c:2489:18: error: 'struct net_device' has no member named 'real_num_rx_queues'; did you mean 'real_num_tx_queues'? As far as I can tell, the check here is only used as an optimization that we can skip in order to fix the compilation. If sysfs is disabled, the following netif_set_real_num_rx_queues() has no effect. Fixes: 164d1e9e ("nfp: add support for ethtool .set_channels") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Calling napi_hash_del() after netif_napi_del() is pointless. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Edward Cree <ecree@solarflare.com> Cc: Bert Kenward <bkenward@solarflare.com> Acked-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Lebrun authored
This patch changes the lwtunnel_headroom() function which is called in ipv4_mtu() and ip6_mtu(), to also return the correct headroom value when the lwtunnel state is OUTPUT_REDIRECT. This patch enables e.g. SR-IPv6 encapsulations to work without manually setting the route mtu. Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
The recent merge commit bb598c1b ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net") would cause the FIB abort warning to fire whenever we flush the FIB tables - either during module removal or actual abort. Move it back to its rightful location in the FIB abort function. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
The SPEED_UNFORCED indicates the MAC & PHY should perform auto-negotiation to determine a speed which works. If this is called for, don't set the force bit. If it is set, the MAC actually does 10Gbps, why the internal PHYs don't support. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Tom Lendacky says: ==================== amd-xgbe: AMD XGBE driver updates 2016-11-15 This patch series addresses some minor issues found in the recently accepted patch series for the AMD XGBE driver. The following fixes are included in this driver update series: - Fix a possibly uninitialized variable in the debugfs support - Fix the GPIO pin number constraint check This patch series is based on net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Lendacky, Thomas authored
The GPIO support in the hardware allows for up to 16 GPIO pins, enumerated from 0 to 15. The driver uses the wrong value (16) to validate the GPIO pin range in the routines to set and clear the GPIO output pins. Update the code to use the correct value (15). Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Lendacky, Thomas authored
The debugfs support in the driver uses a common routine to write the debugfs values. In this routine, if the input file position is non-zero then the write routine will not return an error and an output parameter will not have been set. Because an error isn't returned an uninitialized value will be written into a register. Fix the common write routine to return an error if the input file position is non-zero, which will propagate back to the caller. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Florian Fainelli says: ==================== net: Implenent ethtool::nway_reset for a few drivers This patch series depends on "net: phy: Centralize auto-negotation restart" since it provides phy_ethtool_nway_reset as a helper function. The drivers here already support PHYLIB, so there really is no reason why restarting auto-negotiation would not be possible with these. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are already using dev->phydev all over the place so this comes for free. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are already using dev->phydev all over the place so this comes for free. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are already using dev->phydev all over the place so this comes for free. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are already using dev->phydev all over the place so this comes for free. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
Utilize the generic phy_ethtool_nway_reset() helper function to implement an autonegotiation restart. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Eric Dumazet says: ==================== net: busy-poll: allow preemption and other optimizations It is time to have preemption points in sk_busy_loop() and improve its scalability. Also napi_complete() and friends can tell drivers when it is safe to not re-enable device interrupts, saving some overhead under high busy polling. mlx4 and bnx2x are changed accordingly, to show how this busy polling status can be exploited by drivers. Next steps will implement Zach Brown suggestion, where NAPI polling would be enabled all the time for some chosen queues. This is needed for efficient epoll() support anyway. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-