- 05 Dec, 2016 27 commits
-
-
Daniel Borkmann authored
Commit 18cdb37e ("net: sched: do not use tcf_proto 'tp' argument from call_rcu") removed the last usage of tp from cls_bpf_delete_prog(), so remove it from the function's arguments as well, to avoid giving the wrong impression. tp is illegal to access from this callback, since it could already have been freed. Refactor the deletion code a bit, so that cls_bpf_destroy() can call into the same code for prog deletion as the cls_bpf_delete() op, instead of having it unnecessarily duplicated. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Commit d691f9e8 ("bpf: allow programs to write to certain skb fields") pushed access type check outside of __is_valid_access() to have different restrictions for socket filters and tc programs. type is thus not used anymore within __is_valid_access() and should be removed as a function argument. Same for __is_valid_xdp_access() introduced by 6a773a15 ("bpf: add XDP prog type for early driver filter"). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Florian Fainelli says:
====================
net: ethoc: Misc improvements

This patch series fixes/improves a few things:
- implement a proper PHYLIB adjust_link callback to set the duplex mode accordingly
- do not open code the fetching of a MAC address in OF/DT environments
- demote an error message that occurs more frequently than expected in low CPU/memory/bandwidth environments

Tested on a Cirrus Logic EP93xx / TS7300 board.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
Spamming the console with "net eth1: packet dropped" messages can happen fairly frequently if the adapter is busy transmitting; demote the message to a debug print. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
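A minimal sketch of this kind of demotion; the call site shown here is hypothetical, not the actual ethoc code:

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical drop path: the message moves from netdev_err() to
 * netdev_dbg(), so it only shows up with dynamic debug enabled.
 */
static void ethoc_drop_packet(struct net_device *dev, struct sk_buff *skb)
{
        netdev_dbg(dev, "packet dropped\n");    /* was: netdev_err() */
        dev->stats.tx_dropped++;
        dev_kfree_skb_any(skb);
}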
-
Florian Fainelli authored
Do not open code getting the MAC address exclusively from the "local-mac-address" property, but instead use of_get_mac_address() which looks up the MAC address using the 3 typical property names. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
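A sketch of the change, with a hypothetical helper name; of_get_mac_address() is the standard kernel API that tries "mac-address", "local-mac-address" and "address" in turn:

#include <linux/of_net.h>
#include <linux/etherdevice.h>
#include <linux/string.h>

static void ethoc_get_mac_from_dt(struct net_device *netdev,
                                  struct device_node *np)
{
        /* replaces a hand-rolled of_get_property(np, "local-mac-address", ...) */
        const void *mac = of_get_mac_address(np);

        if (mac)
                memcpy(netdev->dev_addr, mac, ETH_ALEN);
}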
-
Florian Fainelli authored
ethoc_mdio_poll(), which is our PHYLIB adjust_link callback, currently does nothing; we should at least react to duplex changes and update MODER accordingly. Speed changes are not a problem, since the OpenCores Ethernet core seems to react okay without us telling it. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
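A hedged sketch of an adjust_link handler reacting to duplex; the MODER bit name and the accessors are assumptions about the driver's internals:

#include <linux/phy.h>
#include <linux/netdevice.h>

struct ethoc;                           /* driver-private state */
extern u32 ethoc_read(struct ethoc *priv, unsigned int reg);
extern void ethoc_write(struct ethoc *priv, unsigned int reg, u32 val);
#define MODER           0x00            /* OpenCores mode register */
#define MODER_FULLD     (1 << 10)       /* full-duplex bit (assumed) */

static void ethoc_mdio_poll(struct net_device *dev)
{
        struct ethoc *priv = netdev_priv(dev);
        struct phy_device *phydev = dev->phydev;
        u32 mode = ethoc_read(priv, MODER);

        if (phydev->duplex == DUPLEX_FULL)
                mode |= MODER_FULLD;
        else
                mode &= ~MODER_FULLD;

        ethoc_write(priv, MODER, mode);
}

The real implementation presumably also tracks the previous duplex so MODER is only rewritten on an actual change.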
-
Eric Dumazet authored
1) Old code was hard to maintain, due to complex lock chains. (We probably will be able to remove some kfree_rcu() in callers)
2) Using a single timer to update all estimators does not scale.
3) Code was buggy on 32bit kernel (WRITE_ONCE() on 64bit quantity is not supposed to work well)

In this rewrite:
- I removed the RB tree that had to be scanned in gen_estimator_active(). qdisc dumps should be much faster.
- Each estimator has its own timer.
- Estimations are maintained in net_rate_estimator structure, instead of dirtying the qdisc. Minor, but part of the simplification.
- Reading the estimator uses RCU and a seqcount to provide proper support for 32bit kernels.
- We reduce memory need when estimators are not used, since we store a pointer, instead of the bytes/packets counters.
- xt_rateest_mt() no longer has to grab a spinlock. (In the future, xt_rateest_tg() could be switched to per cpu counters)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
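The 32bit fix hinges on the seqcount read side; a hedged sketch follows, where the struct and field names are assumptions rather than the exact upstream ones:

#include <linux/seqlock.h>

struct net_rate_estimator {
        seqcount_t      seq;
        u32             avbps;          /* bytes per second (scaled) */
        u32             avpps;          /* packets per second (scaled) */
};

/* Retry loop: readers fetch a consistent bytes/packets rate pair
 * without locks, which is safe even where 64bit stores tear.
 */
static void est_read_rate(const struct net_rate_estimator *est,
                          u32 *bps, u32 *pps)
{
        unsigned int start;

        do {
                start = read_seqcount_begin(&est->seq);
                *bps = est->avbps;
                *pps = est->avpps;
        } while (read_seqcount_retry(&est->seq, start));
}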
-
Hadar Hen Zion authored
Check if the device returned from the tcf_exts_get_dev() function supports tc offload, and in case the rule can't be offloaded, set the filter's hw_dev parameter to the original device given by the user. The hw_dev parameter should always be set by fl_hw_replace_filter(), since this pointer is used by dump stats and destroy filter for each flower rule (offloaded or not). Fixes: 7091d8c7 ('net/sched: cls_flower: Add offload support using egress Hardware device') Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reported-by: Simon Horman <horms@verge.net.au> Tested-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Erik Nordmark authored
Make the kernel accept IPv6 routes with an IPv4-mapped address as next-hop. It is possible to configure IP interfaces with IPv4-mapped addresses, and one can add IPv6 routes for IPv4-mapped destinations/prefixes, yet prior to this fix the kernel returned EINVAL when attempting to add an IPv6 route with an IPv4-mapped address as a nexthop/gateway. RFC 4798 (a proposed standard RFC) uses IPv4-mapped addresses as nexthops, so in order to support that type of address configuration the kernel needs to allow IPv4-mapped addresses as nexthops. Signed-off-by: Erik Nordmark <nordmark@arista.com> Signed-off-by: Bob Gilligan <gilligan@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Gianluca Borello authored
Occasionally, clang (e.g. version 3.8.1) translates a sum between two constant operands using a BPF_OR instead of a BPF_ADD. The verifier does not currently handle this scenario, and the destination register type becomes UNKNOWN_VALUE even if it's still storing a constant. As a result, the destination register cannot be used as an argument to a helper function expecting an ARG_CONST_STACK_*, limiting some use cases. Modify the verifier to handle this case, and add a few tests to make sure all combinations are supported, and that stack boundaries are still verified even with BPF_OR. Signed-off-by: Gianluca Borello <g.borello@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
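A hedged sketch of the verifier-side idea, not the upstream diff: when both operands of a BPF_OR hold known constants, the result is still a known constant, so the register keeps its CONST_IMM type instead of being downgraded:

#include <linux/bpf_verifier.h>

/* Illustrative only: constant-fold BPF_OR when both registers hold
 * known constants, preserving CONST_IMM; otherwise fall back to
 * UNKNOWN_VALUE as before.
 */
static void check_or_const(struct bpf_reg_state *dst_reg,
                           const struct bpf_reg_state *src_reg)
{
        if (dst_reg->type == CONST_IMM && src_reg->type == CONST_IMM) {
                dst_reg->imm |= src_reg->imm;
                return;
        }
        dst_reg->type = UNKNOWN_VALUE;
        dst_reg->imm = 0;
}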
-
Florian Fainelli authored
Implement ethtool::nway_reset by utilizing mii_nway_restart(). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:
====================
pull request: bluetooth-next 2016-12-03

Here's a set of Bluetooth & 802.15.4 patches for net-next (i.e. 4.10 kernel):
- Fix for a potential NULL deref in the ieee802154 netlink code
- Fix for the ED values of the at86rf2xx driver
- Documentation updates to ieee802154
- Cleanups to u8 vs __u8 usage
- Timer API usage cleanups in HCI drivers

Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Eric Dumazet says:
====================
tcp: tsq: performance series

Under very high TX stress, the CPU handling NIC TX completions can spend a considerable amount of cycles handling TSQ (TCP Small Queues) logic. This patch series avoids some atomic operations, but the most notable patch is the 3rd one, which allows other cpus processing ACK packets and calling tcp_write_xmit() to grab TCP_TSQ_DEFERRED, so that tcp_tasklet_func() can skip already processed sockets. This avoids lots of lock acquisitions and cache line accesses, particularly under load.

In v2, I added:
- A tcp_small_queue_check() change to allow the 1st and 2nd packets in the write queue to be sent, even in the case TX completion of already acknowledged packets did not happen yet. This helps when TX completion coalescing parameters are set even to insane values, and/or busy polling is used.
- A reorganization of struct sock fields to lower false sharing and increase data locality.
- Moved tsq_flags from tcp_sock to struct sock, also to reduce cache line misses during TX completions.

I measured an overall throughput gain of 22% for heavy TCP use over a single TX queue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
tsq_flags being in the same cache line as sk_wmem_alloc makes a lot of sense: both fields are changed from tcp_wfree() and more generally by various TSQ-related functions. The prior patch made room in struct sock and added sk_tsq_flags; this patch deletes tsq_flags from struct tcp_sock. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Group fields used in TX path, and keep some cache lines mostly read to permit sharing among cpus. Gained two 4 bytes holes on 64bit arches. Added a place holder for tcp tsq_flags, next to sk_wmem_alloc to speed up tcp_wfree() in the following patch. I have not added ____cacheline_aligned_in_smp, this might be done later. I prefer doing this once inet and tcp/udp sockets reorg is also done. Tested with both TCP and UDP. UDP receiver performance under flood increased by ~20%: accessing sk_filter/sk_wq/sk_napi_id no longer stalls because sk_drops was moved away from a critical cache line, now mostly read and shared.

/* --- cacheline 4 boundary (256 bytes) --- */
unsigned int            sk_napi_id;           /* 0x100  0x4 */
int                     sk_rcvbuf;            /* 0x104  0x4 */
struct sk_filter *      sk_filter;            /* 0x108  0x8 */
union {
        struct socket_wq *      sk_wq;        /*        0x8 */
        struct socket_wq *      sk_wq_raw;    /*        0x8 */
};                                            /* 0x110  0x8 */
struct xfrm_policy *    sk_policy[2];         /* 0x118  0x10 */
struct dst_entry *      sk_rx_dst;            /* 0x128  0x8 */
struct dst_entry *      sk_dst_cache;         /* 0x130  0x8 */
atomic_t                sk_omem_alloc;        /* 0x138  0x4 */
int                     sk_sndbuf;            /* 0x13c  0x4 */
/* --- cacheline 5 boundary (320 bytes) --- */
int                     sk_wmem_queued;       /* 0x140  0x4 */
atomic_t                sk_wmem_alloc;        /* 0x144  0x4 */
long unsigned int       sk_tsq_flags;         /* 0x148  0x8 */
struct sk_buff *        sk_send_head;         /* 0x150  0x8 */
struct sk_buff_head     sk_write_queue;       /* 0x158  0x18 */
__s32                   sk_peek_off;          /* 0x170  0x4 */
int                     sk_write_pending;     /* 0x174  0x4 */
long int                sk_sndtimeo;          /* 0x178  0x8 */

Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Adding a likely() in tcp_mtu_probe() moves its code, which used to be inlined in front of tcp_write_xmit(), out of the hot path. We still have a cache line miss to access icsk->icsk_mtup.enabled; we will probably have to reorganize fields to help data locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Always allow the first two skbs in the write queue to be sent, regardless of sk_wmem_alloc/sk_pacing_rate values. This helps a lot in situations where TX completions are delayed either because of driver latencies or softirq latencies. Test is done with no cache line misses. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
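A hedged sketch of the check with the shortcut in place; the upstream function takes extra parameters and differs in detail:

#include <net/tcp.h>

static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb)
{
        unsigned int limit;

        limit = max(2 * skb->truesize, sk->sk_pacing_rate >> 10);
        limit = min_t(u32, limit, sysctl_tcp_limit_output_bytes);

        if (atomic_read(&sk->sk_wmem_alloc) > limit) {
                /* Shortcut: the 1st and 2nd skbs in the write queue may
                 * always go out; no need to wait for a TX completion
                 * that may be arbitrarily delayed.
                 */
                if (skb == sk->sk_write_queue.next ||
                    skb->prev == sk->sk_write_queue.next)
                        return false;
                return true;    /* throttle; TSQ tasklet resumes us later */
        }
        return false;
}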
-
Eric Dumazet authored
Under high load, tcp_wfree() has an atomic operation trying to schedule a tasklet over and over. We can schedule it only if our per cpu list was empty. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
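A hedged sketch of the enqueue path; the helper name is illustrative, and struct tsq_tasklet mirrors the per-cpu struct in tcp_output.c:

#include <linux/interrupt.h>
#include <linux/list.h>
#include <linux/tcp.h>

struct tsq_tasklet {
        struct tasklet_struct   tasklet;
        struct list_head        head;   /* queue of tcp sockets */
};

/* The list was empty iff no tasklet run is pending, so only the
 * first enqueuer needs to pay for tasklet_schedule().
 */
static void tsq_enqueue(struct tsq_tasklet *tsq, struct tcp_sock *tp)
{
        bool empty = list_empty(&tsq->head);

        list_add(&tp->tsq_node, &tsq->head);
        if (empty)
                tasklet_schedule(&tsq->tasklet);
}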
-
Eric Dumazet authored
Under high stress, I've seen tcp_tasklet_func() consuming ~700 usec, handling ~150 tcp sockets. By setting TCP_TSQ_DEFERRED in tcp_wfree(), we give a chance for other cpus/threads entering tcp_write_xmit() to grab it, allowing tcp_tasklet_func() to skip sockets that already did an xmit cycle. In the future, we might give ACK processing an increased budget to further reduce the amount of work left to tcp_tasklet_func(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Instead of atomically clearing the TSQ_THROTTLED bit and atomically setting the TSQ_QUEUED bit, use one cmpxchg() to perform a single locked operation. Since the following patch will also set TCP_TSQ_DEFERRED here, this cmpxchg() makes that addition free. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
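A hedged sketch of the resulting loop (flag values illustrative): the THROTTLED-to-QUEUED transition happens in one cmpxchg(), retried if another CPU changed the word underneath us:

#include <linux/atomic.h>
#include <linux/compiler.h>

#define TSQF_THROTTLED  (1UL << 0)      /* illustrative values */
#define TSQF_QUEUED     (1UL << 1)

static bool tsq_throttled_to_queued(unsigned long *flags)
{
        unsigned long oval, nval;

        for (oval = READ_ONCE(*flags); ; oval = nval) {
                if (!(oval & TSQF_THROTTLED) || (oval & TSQF_QUEUED))
                        return false;   /* nothing to do / already queued */

                nval = (oval & ~TSQF_THROTTLED) | TSQF_QUEUED;
                nval = cmpxchg(flags, oval, nval);
                if (nval == oval)
                        return true;    /* we performed the transition */
        }
}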
-
Eric Dumazet authored
This is a cleanup, to ease code review of following patches. Old 'enum tsq_flags' is renamed, and a new enumeration is added with the flags used in cmpxchg() operations as opposed to single bit operations. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
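A sketch of the resulting pair of enumerations; this follows the pattern the commit describes (bit numbers for single-bit operations, masks for cmpxchg()), though exact members may differ:

enum tsq_enum {
        TSQ_THROTTLED,
        TSQ_QUEUED,
        TCP_TSQ_DEFERRED,
        TCP_WRITE_TIMER_DEFERRED,
        TCP_DELACK_TIMER_DEFERRED,
        TCP_MTU_REDUCED_DEFERRED,
};

enum tsq_flags {
        TSQF_THROTTLED             = (1UL << TSQ_THROTTLED),
        TSQF_QUEUED                = (1UL << TSQ_QUEUED),
        TCPF_TSQ_DEFERRED          = (1UL << TCP_TSQ_DEFERRED),
        TCPF_WRITE_TIMER_DEFERRED  = (1UL << TCP_WRITE_TIMER_DEFERRED),
        TCPF_DELACK_TIMER_DEFERRED = (1UL << TCP_DELACK_TIMER_DEFERRED),
        TCPF_MTU_REDUCED_DEFERRED  = (1UL << TCP_MTU_REDUCED_DEFERRED),
};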
-
David S. Miller authored
Michael Chan says: ==================== bnxt_en: Add DCBNL support. This series adds DCBNL operations to support host-based IEEE DCBX. v2: Updated to the latest firmware interface spec. David, please consider this series for net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
Report PFC statistics to ethtool -S and DCBNL. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
Support only IEEE DCBX initially. Add IEEE DCBNL ops and functions to get and set the hardware DCBX parameters. The DCB code is conditional on Kconfig CONFIG_BNXT_DCB. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
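A hedged sketch of the hookup; the handler bodies are placeholders, but struct dcbnl_rtnl_ops and its IEEE callbacks are the standard kernel interface for host-based DCBX:

#include <net/dcbnl.h>

static int bnxt_ieee_getets(struct net_device *dev, struct ieee_ets *ets)
{
        /* fill *ets from firmware state */
        return 0;
}

static int bnxt_ieee_setets(struct net_device *dev, struct ieee_ets *ets)
{
        /* validate and push the new ETS config to firmware */
        return 0;
}

static u8 bnxt_getdcbx(struct net_device *dev)
{
        return DCB_CAP_DCBX_HOST | DCB_CAP_DCBX_VER_IEEE;
}

static u8 bnxt_setdcbx(struct net_device *dev, u8 mode)
{
        /* accept only host-based IEEE DCBX; nonzero means "rejected" */
        return mode == (DCB_CAP_DCBX_HOST | DCB_CAP_DCBX_VER_IEEE) ? 0 : 1;
}

static const struct dcbnl_rtnl_ops bnxt_dcbnl_ops = {
        .ieee_getets    = bnxt_ieee_getets,
        .ieee_setets    = bnxt_ieee_setets,
        .getdcbx        = bnxt_getdcbx,
        .setdcbx        = bnxt_setdcbx,
};

/* wired up at probe time, guarded by CONFIG_BNXT_DCB:
 *      dev->dcbnl_ops = &bnxt_dcbnl_ops;
 */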
-
Michael Chan authored
Latest interface has the latest DCB command structs. Get and store the max number of lossless TCs the hardware can support. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
Add a new function bnxt_setup_mq_tc() to handle MQPRIO. This new function will be called during ETS setup when we add DCBNL in the next patch. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jesper Nilsson authored
According to the documentation, the PHYs supported by this driver can also support pause frames. Announce this to be so. Tested with a TI83822I. Acked-by: Andrew F. Davis <afd@ti.com> Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 04 Dec, 2016 13 commits
-
-
Erik Nordmark authored
Implemented RFC 7527 Enhanced DAD. IPv6 duplicate address detection can fail if there is some temporary loopback of Ethernet frames. RFC 7527 solves this by including a random nonce in the NS messages used for DAD; if an NS is received with the same nonce, it is assumed to be a looped-back DAD probe and is ignored. RFC 7527 is enabled by default and can be disabled by setting both of conf/{all,interface}/enhanced_dad to zero. Signed-off-by: Erik Nordmark <nordmark@arista.com> Signed-off-by: Bob Gilligan <gilligan@arista.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
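A hedged sketch of the receive-side check; the helper and argument names are illustrative, and RFC 7527 reuses the 6-byte RFC 3971 nonce option:

#include <linux/types.h>
#include <linux/string.h>

/* If the nonce carried in an incoming DAD NS equals the nonce we sent
 * with our own probe, the NS is our own looped-back frame and must not
 * fail DAD.
 */
static bool dad_ns_is_looped_back(const u8 *rx_nonce, const u8 *my_nonce)
{
        return rx_nonce && my_nonce && memcmp(rx_nonce, my_nonce, 6) == 0;
}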
-
David S. Miller authored
Andrew Lunn says:
====================
mv88e6390 batch 3

More patches to support the MV88E6390. This is mostly refactoring existing code and adding implementations for the mv88e6390. This patchset sets which reserved frames are sent to the CPU and the size of jumbo frames that will be accepted, turns off egress rate limiting, and configures pause frames.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
The mv88e6390 has a number of flow control registers accessed via the Flow Control register. Use these to set the pause control. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
The mv88e6390 has a different mechanism for configuring pause. Refactor the code into an ops function, and for the moment, don't add any mv88e6390 code yet. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
There are two different rate limiting configurations, depending on the switch generation. Refactor this into ops. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
Some switches support jumbo frames. Refactor this code into operations in the ops structure. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
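The recurring pattern in this series, sketched with illustrative names (the real op and helper differ in detail): per-generation behavior hangs off an ops structure, and a NULL op means the feature is absent:

struct mv88e6xxx_chip;

struct mv88e6xxx_ops {
        /* left NULL on switch generations without jumbo support */
        int (*port_jumbo_config)(struct mv88e6xxx_chip *chip, int port);
};

static int mv88e6xxx_port_setup_jumbo(struct mv88e6xxx_chip *chip,
                                      const struct mv88e6xxx_ops *ops,
                                      int port)
{
        if (!ops->port_jumbo_config)
                return 0;       /* not supported on this generation */
        return ops->port_jumbo_config(chip, port);
}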
-
Andrew Lunn authored
Older devices have a couple of registers in global2. The mv88e6390 family has a single register in global1 behind which hides similar configuration. Implement an op for this. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Andrew Lunn says:
====================
MV88E6390 batch two

This is the second batch of patches adding support for the MV88E6390. They are not sufficient to make it work properly.

The mv88e6390 has a much expanded set of priority maps. Refactor the existing code, and implement basic support for the new device. Similarly, the monitor control register has been reworked.

The mv88e6390 has something odd in its EDSA tagging implementation, which means it is not possible to use it, so we need to use DSA tagging. This is the first device with EDSA support where we need to use DSA, and the code does not support this, so two patches refactor the existing code. The two different register definitions are separated out, and using DSA on an EDSA-capable device is added.

v2:
- Add port prefix
- Add helper function for 6390
- Add _IEEE_ into #defines
- Split monitor_ctrl into a number of separate ops
- Remove 6390 code which is management, used in a later patch
- s/EGREES/EGRESS/
- Broke up setup_port_dsa() and set_port_dsa() into a number of ops

v3:
- Verify mandatory ops for port setup
- Don't set ether type for DSA port
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
Older chips only support DSA tagging. Newer chips have both DSA and EDSA tagging. Refactor the code by adding port functions for setting the frame mode, the egress mode, and whether to forward unknown frames. This results in the helper mv88e6xxx_6065_family() becoming unused, so remove it. Signed-off-by: Andrew Lunn <andrew@lunn.ch>
v3:
- Verify mandatory ops for port setup
- Don't set ether type for DSA port
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
Older chips support a single tagging protocol, DSA. Newer chips support both DSA and EDSA, an enhanced version. Having both as an option changes the register layouts. Up until now, it has been assumed that if EDSA is supported, it will be used; hence the register layout has been determined by which protocol should be used. However, the mv88e6390 has a different implementation of EDSA, which means we need to use DSA tagging. Hence separate the selection of the protocol from the register layout. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
The mv88e6390 changes the Monitor Control register into the Monitor and Management Control, which is an indirection register to various registers. Add ops in global1 to set the CPU port and the ingress/egress port for both register layouts. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
The mv88e6390 does not have the two registers to set the frame priority map. Instead it has an indirection register for setting a number of different priority maps. Refactor the old code into a function, implement the mv88e6390 version, and use an op to call the right one. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jiri Pirko says:
====================
ipv4: fib: Replay events when registering FIB notifier

Ido says:

In kernel 4.9 the switchdev-specific FIB offload mechanism was replaced by a new FIB notification chain to which modules could register in order to be notified about the addition and deletion of FIB entries. The motivation for this change was that switchdev drivers need to be able to reflect the entire FIB table and not only FIBs configured on top of the port netdevs themselves. This is useful in case of in-band management.

The fundamental problem with this approach is that upon registration listeners lose all the information previously sent in the chain and thus have an incomplete view of the FIB tables, which can result in packet loss. This patchset fixes that by dumping the FIB tables and replaying notifications previously sent in the chain for the registered notification block.

The entire dump process is done under RCU and thus the FIB notification chain is converted to be atomic. The listeners are modified accordingly. This is done in the first eight patches. The ninth patch adds a change sequence counter to ensure the integrity of the FIB dump. The last patch adds the dump itself to the FIB chain registration function and modifies existing listeners to pass a callback to be executed in case the dump was inconsistent.

---
v3->v4:
- Register the notification block after the dump and protect it using the change sequence counter (Hannes Frederic Sowa).
- Since we now integrate the dump into the registration function, drop the sysctl to set the maximum number of retries and instead set it to a fixed number. Let's see if it's really a problem before adding something we can never remove.
- For the same reason, dump FIB tables for all net namespaces.
- Add a comment regarding guarantees provided by mutex semantics.

v2->v3:
- Add sysctl to set the number of FIB dump retries (Hannes Frederic Sowa).
- Read the sequence counter under RTNL to ensure synchronization between the dump process and other processes changing the routing tables (Hannes Frederic Sowa).
- Pass a callback to the dump function to be executed prior to a retry.
- Limit the dump to a single net namespace.

v1->v2:
- Add a sequence counter to ensure the integrity of the FIB dump (David S. Miller, Hannes Frederic Sowa).
- Protect notifications from re-ordering in listeners by using an ordered workqueue (Hannes Frederic Sowa).
- Introduce fib_info_hold() (Jiri Pirko).
- Relieve rocker from the need to invoke the FIB dump by registering to the FIB notification chain prior to ports creation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
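A hedged sketch of the registration flow described above; the helpers are hypothetical stand-ins for the dump, the sequence read, and the raw chain registration:

#include <linux/notifier.h>
#include <linux/errno.h>

#define FIB_DUMP_MAX_RETRIES    5       /* fixed, per the v4 notes above */

/* Hypothetical stand-ins for the mechanics described in the cover letter. */
extern unsigned int fib_seq_read(void);
extern void fib_dump_to_listener(struct notifier_block *nb);
extern int fib_chain_register(struct notifier_block *nb);

static int register_fib_notifier_sketch(struct notifier_block *nb,
                                        void (*cb)(struct notifier_block *nb))
{
        int retries = 0;

        do {
                unsigned int seq = fib_seq_read();

                fib_dump_to_listener(nb);       /* replay RULE/ENTRY events */
                if (seq == fib_seq_read())
                        return fib_chain_register(nb);

                if (cb)
                        cb(nb);         /* listener undoes partially dumped state */
        } while (++retries < FIB_DUMP_MAX_RETRIES);

        return -EBUSY;
}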
-