- 06 Nov, 2017 6 commits
-
-
Florian Westphal authored
We currently call ->nlattr_tuple_size() once at register time and cache result in l4proto->nla_size. nla_size is the only member that is written to, avoiding this would allow to make l4proto trackers const. We can use ->nlattr_tuple_size() at run time, and cache result in the individual trackers instead. This is an intermediate step, next patch removes nlattr_size() callback and computes size at compile time, then removes nla_size. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
Jindřich Makovička says: The logical OR looks fishy to me. Shouldn't be && there instead? Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1199Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
Instead of passing mask to all the helpers, just fixup the search key early. After rbtree conversion, each rbtree node stores connections of same 'addr & mask', so no need to pass the mask too. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Colin Ian King authored
buf is initialized to buf_start and then set on the next statement to buf_start + offsets[i]. Clean this up to just initialize buf to buf_start + offsets[i] to clean up the clang build warning: "Value stored to 'buf' during its initialization is never read" Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
KUWAZAWA Takuya authored
Information about ipvs in different network namespace can be seen via procfs. How to reproduce: # ip netns add ns01 # ip netns add ns02 # ip netns exec ns01 ip a add dev lo 127.0.0.1/8 # ip netns exec ns02 ip a add dev lo 127.0.0.1/8 # ip netns exec ns01 ipvsadm -A -t 10.1.1.1:80 # ip netns exec ns02 ipvsadm -A -t 10.1.1.2:80 The ipvsadm displays information about its own network namespace only. # ip netns exec ns01 ipvsadm -Ln IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 10.1.1.1:80 wlc # ip netns exec ns02 ipvsadm -Ln IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 10.1.1.2:80 wlc But I can see information about other network namespace via procfs. # ip netns exec ns01 cat /proc/net/ip_vs IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 0A010101:0050 wlc TCP 0A010102:0050 wlc # ip netns exec ns02 cat /proc/net/ip_vs IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 0A010102:0050 wlc Signed-off-by: KUWAZAWA Takuya <albatross0@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Helge Deller authored
The debug and error printk functions in ipvs uses wrongly the %pF instead of the %pS printk format specifier for printing symbols for the address returned by _builtin_return_address(0). Fix it for the ia64, ppc64 and parisc64 architectures. Signed-off-by: Helge Deller <deller@gmx.de> Cc: Wensong Zhang <wensong@linux-vs.org> Cc: netdev@vger.kernel.org Cc: lvs-devel@vger.kernel.org Cc: netfilter-devel@vger.kernel.org Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 24 Oct, 2017 10 commits
-
-
Eric Sesterhenn authored
Add missing counter decrement to prevent out of bounds memory read. Signed-off-by: Eric Sesterhenn <eric.sesterhenn@x41-dsec.de> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
Only stored, never read. This is a leftover from commit 7d084877 ("netfilter: connlimit: use rbtree for per-host conntrack obj storage"), which added the rbtree node struct that stores the address instead. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Harsha Sharma authored
Remove typedef from struct as linux-kernel coding style tends to avoid using typedefs. Done using following coccinelle semantic patch @r1@ type T; @@ typedef struct { ... } T; @script:python c1@ T2; T << r1.T; @@ if T[-2:] =="_t" or T[-2:] == "_T": coccinelle.T2 = T[:-2]; else: coccinelle.T2 = T; print T, coccinelle.T2 @r2@ type r1.T; identifier c1.T2; @@ -typedef struct + T2 { ... } -T ; @r3@ type r1.T; identifier c1.T2; @@ -T +struct T2 Signed-off-by: Harsha Sharma <harshasharmaiitr@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
previous patches removed all writes to them. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
after previous commit xt_replace_table will wait until all cpus had even seqcount (i.e., no cpu is accessing old ruleset). Add a 'old' counter retrival version that doesn't synchronize counters. Its not needed, the old counters are not in use anymore at this point. This speeds up table replacement on busy systems with large tables (and many cores). Cc: Dan Williams <dcbw@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
xt_replace_table relies on table replacement counter retrieval (which uses xt_recseq to synchronize pcpu counters). This is fine, however with large rule set get_counters() can take a very long time -- it needs to synchronize all counters because it has to assume concurrent modifications can occur. Make xt_replace_table synchronize by itself by waiting until all cpus had an even seqcount. This allows a followup patch to copy the counters of the old ruleset without any synchonization after xt_replace_table has completed. Cc: Dan Williams <dcbw@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
not needed/used anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
We currently pass down the l4 protocol to the conntrack ->packet() function, but the only user of this is the debug info decision. Same information can be derived from struct nf_conn. Add a wrapper for the previous patch that extracs the information from nf_conn and passes it to nf_l4proto_log_invalid(). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
We currently pass down the l4 protocol to the conntrack ->packet() function, but the only user of this is the debug info decision. Same information can be derived from struct nf_conn. As a first step, add and use a new log function for this, similar to nf_ct_helper_log(). Add __cold annotation -- invalid packets should be infrequent so gcc can consider all call paths that lead to such a function as unlikely. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
We can use a single statement for this. While at it, fixup the comment -- we don't have pernet table/ops anymore, the function is only called from module exit path. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 04 Oct, 2017 3 commits
-
-
Aaron Conole authored
The prefixlen maps used here are identical, and have been since introduction. It seems to make sense to use a single large map, that the preprocessor will fill appropriately. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jozsef Kadlecsik authored
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
simran singhal authored
Simplify function returns by merging assignment and return into one command line. Signed-off-by: simran singhal <singhalsimran0@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 03 Oct, 2017 21 commits
-
-
David S. Miller authored
Marcelo Ricardo Leitner says: ==================== Introduce SCTP Stream Schedulers This patchset introduces the SCTP Stream Schedulers are defined by https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13 It provides 3 schedulers at the moment: FCFS, Priority and Round Robin. The other 3, Round Robin per packet, Fair Capacity and Weighted Fair Capacity will be added later. More specifically, WFQ is required by WebRTC Datachannels. The draft also defines the idata chunk, allowing a usermsg to be interrupted by another piece of idata from another stream. This patchset *doesn't* include it. It will be posted later by Xin Long. Its integration with this patchset is very simple and it basically only requires a tweak in sctp_sched_dequeue_done(), to ignore datamsg boundaries. The first 5 patches are a preparation for the next ones. The most relevant patches are the 4th and 6th ones. More details are available on each patch. v2: changelog update on patch 3 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marcelo Ricardo Leitner authored
This patch introduces RFC Draft ndata section 3.2 Priority Based Scheduler (SCTP_SS_RR). Works by maintaining a list of enqueued streams and tracking the last one used to send data. When the datamsg is done, it switches to the next stream. See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marcelo Ricardo Leitner authored
This patch introduces RFC Draft ndata section 3.4 Priority Based Scheduler (SCTP_SS_PRIO). It works by having a struct sctp_stream_priority for each priority configured. This struct is then enlisted on a queue ordered per priority if, and only if, there is a stream with data queued, so that dequeueing is very straightforward: either finish current datamsg or simply dequeue from the highest priority queued, which is the next stream pointed, and that's it. If there are multiple streams assigned with the same priority and with data queued, it will do round robin amongst them while respecting datamsgs boundaries (when not using idata chunks), to be reasonably fair. We intentionally don't maintain a list of priorities nor a list of all streams with the same priority to save memory. The first would mean at least 2 other pointers per priority (which, for 1000 priorities, that can mean 16kB) and the second would also mean 2 other pointers but per stream. As SCTP supports up to 65535 streams on a given asoc, that's 1MB. This impacts when giving a priority to some stream, as we have to find out if the new priority is already being used and if we can free the old one, and also when tearing down. The new fields in struct sctp_stream_out_ext and sctp_stream are added under a union because that memory is to be shared with other schedulers. See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marcelo Ricardo Leitner authored
As defined per RFC Draft ndata Section 4.3.3, named as SCTP_STREAM_SCHEDULER_VALUE. See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marcelo Ricardo Leitner authored
As defined per RFC Draft ndata Section 4.3.2, named as SCTP_STREAM_SCHEDULER. See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marcelo Ricardo Leitner authored
This patch introduces the hooks necessary to do stream scheduling, as per RFC Draft ndata. It also introduces the first scheduler, which is what we do today but now factored out: first come first served (FCFS). With stream scheduling now we have to track which chunk was enqueued on which stream and be able to select another other than the in front of the main outqueue. So we introduce a list on sctp_stream_out_ext structure for this purpose. We reuse sctp_chunk->transmitted_list space for the list above, as the chunk cannot belong to the two lists at the same time. By using the union in there, we can have distinct names for these moments. sctp_sched_ops are the operations expected to be implemented by each scheduler. The dequeueing is a bit particular to this implementation but it is to match how we dequeue packets today. We first dequeue and then check if it fits the packet and if not, we requeue it at head. Thus why we don't have a peek operation but have dequeue_done instead, which is called once the chunk can be safely considered as transmitted. The check removed from sctp_outq_flush is now performed by sctp_stream_outq_migrate, which is only called during assoc setup. (sctp_sendmsg() also checks for it) The only operation that is foreseen but not yet added here is a way to signalize that a new packet is starting or that the packet is done, for round robin scheduler per packet, but is intentionally left to the patch that actually implements it. Support for I-DATA chunks, also described in this RFC, with user message interleaving is straightforward as it just requires the schedulers to probe for the feature and ignore datamsg boundaries when dequeueing. See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marcelo Ricardo Leitner authored
Add a helper to fetch the stream number from a given chunk. Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marcelo Ricardo Leitner authored
With the stream schedulers, sctp_stream_out will become too big to be allocated by kmalloc and as we need to allocate with BH disabled, we cannot use __vmalloc in sctp_stream_init(). This patch moves out the stats from sctp_stream_out to sctp_stream_out_ext, which will be allocated only when the application tries to sendmsg something on it. Just the introduction of sctp_stream_out_ext would already fix the issue described above by splitting the allocation in two. Moving the stats to it also reduces the pressure on the allocator as we will ask for less memory atomically when creating the socket and we will use GFP_KERNEL later. Then, for stream schedulers, we will just use sctp_stream_out_ext. Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marcelo Ricardo Leitner authored
There is 1 place allocating it and another reallocating. Move such procedures to a common function. v2: updated changelog Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marcelo Ricardo Leitner authored
There is 1 place allocating it and 2 other reallocating. Move such procedures to a common function. Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marcelo Ricardo Leitner authored
As SCTP supports up to 65535 streams, that can lead to very large allocations in sctp_stream_init(). As Xin Long noticed, systems with small amounts of memory are more prone to not have enough memory and dump warnings on dmesg initiated by user actions. Thus, silence them. Also, if the reallocation of stream->out is not necessary, skip it and keep the memory we already have. Reported-by: Xin Long <lucien.xin@gmail.com> Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queueDavid S. Miller authored
Jeff Kirsher says: ==================== 100GbE Intel Wired LAN Driver Updates 2017-10-03 This series contains updates to fm10k only. Jake provides majority of the changes in this series, starting with using fm10k_prepare_for_reset() if we lose PCIe link. Before we would detach the device and close the netdev, which left a lot of items still active, such as the Tx/Rx resources. This could cause problems where register reads would return potentially invalid values and would result in unknown driver behavior, so call fm10k_prepare_for_reset() much like we do for suspend/resume cycles. This will attempt to shutdown as much as possible to prevent possible issues. Then replaced the PCI specific legacy power management hooks with the new generic power management hooks for both suspend and hibernate. Introduced a workqueue item which monitors a queue of MAC and VLAN requests since a large number of MAC address or VLAN updates at once can overload the mailbox with too many messages at once. Fixed a cppcheck warning by properly declaring the min_rate and max_rate variables in the declaration and definition for .ndo_set_vf_bw, rather than using "unused" for the minimum rates. Joe Perches fixes the backward logic when using net_ratelimit(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Westphal authored
Device alias can be set by either rtnetlink (rtnl is held) or sysfs. rtnetlink hold the rtnl mutex, sysfs acquires it for this purpose. Add an extra mutex for it and use rcu to protect concurrent accesses. This allows the sysfs path to not take rtnl and would later allow to not hold it when dumping ifalias. Based on suggestion from Eric Dumazet. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Mahesh Bandewar authored
Some NIC drivers don't have correct speed/duplex settings at the time they send NETDEV_UP notification and that messes up the bonding state. Especially 802.3ad mode which is very sensitive to these settings. In the current implementation we invoke bond_update_speed_duplex() when we receive NETDEV_UP, however, ignore the return value. If the values we get are invalid (UNKNOWN), then slave gets removed from the aggregator with speed and duplex set to UNKNOWN while link is still marked as UP. This patch fixes this scenario. Also 802.3ad mode is sensitive to these conditions while other modes are not, so making sure that it doesn't change the behavior for other modes. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dan Carpenter authored
We accidentally return success if the kmalloc_array() call fails. Fixes: 0e14c777 ("mlxsw: spectrum: Add the multicast routing hardware logic") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dan Carpenter authored
mlxsw_afa_block_create() doesn't return error pointers, it returns NULL on error. Fixes: 0e14c777 ("mlxsw: spectrum: Add the multicast routing hardware logic") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
The function mt7530_phy_write is local to the source and does not need to be in global scope, so make it static. Cleans up sparse warnings: symbol 'mt7530_phy_write' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
The functions lan9303_mdio_phy_write and lan9303_mdio_phy_read are local to the source and do not need to be in global scope, so make them static. Cleans up sparse warnings: symbol 'lan9303_mdio_phy_write' was not declared. Should it be static? symbol 'lan9303_mdio_phy_read' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jiri Pirko says: ==================== mlxsw: Add support for partial multicast route offload Yotam says: Previous patchset introduced support for offloading multicast MFC routes to the Spectrum hardware. As described in that patchset, no partial offloading is supported, i.e if a route has one output interface which is not a valid offloadable device (e.g. pimreg device, dummy device, management NIC), the route is trapped to the CPU and the forwarding is done in slow-path. Add support for partial offloading of multicast routes, by letting the hardware to forward the packet to all the in-hardware devices, while the kernel ipmr module will continue forwarding to all other interfaces. Similarly to the bridge, the kernel ipmr module will forward a marked packet to an interface only if the interface has a different parent ID than the packet's ingress interfaces. The first patch introduces the offload_mr_fwd_mark skb field, which can be used by offloading drivers to indicate that a packet had already gone through multicast forwarding in hardware, similarly to the offload_fwd_mark field that indicates that a packet had already gone through L2 forwarding in hardware. Patches 2 and 3 change the ipmr module to not forward packets that had already been forwarded by the hardware, i.e. packets that are marked with offload_mr_fwd_mark and the ingress VIF shares the same parent ID with the egress VIF. Patches 4, 5, 6 and 7 add the support in the mlxsw Spectrum driver for trap and forward routes, while marking the trapped packets with the offload_mr_fwd_mark. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yotam Gigi authored
Add the support of trap-and-forward route action in the multicast routing offloading logic. A route will be set to trap-and-forward action if one (or more) of its output interfaces is not offload-able, i.e. does not have a valid Spectrum RIF. This way, a route with mixed output VIFs list, which contains both offload-able and un-offload-able devices can go through partial offloading in hardware, and the rest will be done in the kernel ipmr module. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yotam Gigi authored
In addition to the current multicast route actions, which include trap route action and a forward route action, add the trap-and-forward multicast route action, and implement it in the multicast routing hardware logic. To implement that, add a trap-and-forward ACL action as the last action in the route flexible action set. The used trap is the ACL2 trap, which marks the packets with offload_mr_forward_mark, to prevent the packet from being forwarded again by the kernel. Note: At that stage the offloading logic does not support trap-and-forward multicast routes. This patch adds the support only in the hardware logic. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-