- 04 Nov, 2016 1 commit
-
-
Vivien Didelot authored
The Marvell switches contain one internal SMI device per port, called "Port Registers". Depending on the model, the addresses of these devices start from 0x0, 0x8 or 0x10. Start moving Port Registers specific code to their own files. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 03 Nov, 2016 12 commits
-
-
Simon Horman authored
Support matching on SCTP ports in the same way that matching on TCP and UDP ports is already supported.

Example usage:

    tc qdisc add dev eth0 ingress
    tc filter add dev eth0 protocol ip parent ffff: \
        flower indev eth0 ip_proto sctp dst_port 80 \
        action drop

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Elad Raz authored
The system-status register is actually 16 bits wide, not 8 bits wide. Fixes: 233fa44b ("mlxsw: pci: Implement reset done check") Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Willem de Bruijn says: ==================== ip: add RECVFRAGSIZE cmsg On IP datagrams and raw sockets, when packets arrive fragmented, expose the largest received fragment size through a new cmsg. Protocols implemented on top of these sockets may use this, for instance, to inform peers to lower MSS on platforms that silently allow send calls to exceed PMTU and cause fragmentation. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Willem de Bruijn authored
IP6CB and IPCB have a frag_max_size field. In IPv6 this field is filled in when packets are reassembled by the connection tracking code. Also fill in when reassembling in the input path, to expose it through cmsg IPV6_RECVFRAGSIZE in all cases. Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Willem de Bruijn authored
When reading a datagram or raw packet that arrived fragmented, expose the maximum fragment size if recorded to allow applications to estimate receive path MTU.

At this point, the field is only recorded when ipv6 connection tracking is enabled. A follow-up patch will record this field also in the ipv6 input path.

Tested using the test for IP_RECVFRAGSIZE plus

    ip netns exec to ip addr add dev veth1 fc07::1/64
    ip netns exec from ip addr add dev veth0 fc07::2/64

    ip netns exec to ./recv_cmsg_recvfragsize -6 -u -p 6000 &
    ip netns exec from nc -q 1 -u fc07::1 6000 < payload

Both with and without enabling connection tracking

    ip6tables -A INPUT -m state --state NEW -p udp -j LOG

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Willem de Bruijn authored
The IP stack records the largest fragment of a reassembled packet in IPCB(skb)->frag_max_size. When reading a datagram or raw packet that arrived fragmented, expose the value to allow applications to estimate receive path MTU.

Tested:
Sent data over a veth pair of which the source has a small mtu. Sent data using netcat, received using a dedicated process. Verified that the cmsg IP_RECVFRAGSIZE is returned only when data arrives fragmented, and in that case matches the veth mtu.

    ip link add veth0 type veth peer name veth1

    ip netns add from
    ip netns add to

    ip link set dev veth1 netns to
    ip netns exec to ip addr add dev veth1 192.168.10.1/24
    ip netns exec to ip link set dev veth1 up

    ip link set dev veth0 netns from
    ip netns exec from ip addr add dev veth0 192.168.10.2/24
    ip netns exec from ip link set dev veth0 up
    ip netns exec from ip link set dev veth0 mtu 1300
    ip netns exec from ethtool -K veth0 ufo off

    dd if=/dev/zero bs=1 count=1400 2>/dev/null > payload

    ip netns exec to ./recv_cmsg_recvfragsize -4 -u -p 6000 &
    ip netns exec from nc -q 1 -u 192.168.10.1 6000 < payload

using github.com/wdebruij/kerneltools/blob/master/tests/recvfragsize.c

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
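A receiver in the spirit of the test program could be sketched in Python as below. This is only an illustration, not the recvfragsize.c tool: the numeric value of IP_RECVFRAGSIZE is an assumption taken from the Linux UAPI header linux/in.h (Python's socket module does not export it), and the helper names are hypothetical.

```python
import socket
import struct

# Assumption: value from linux/in.h; not defined by Python's socket module.
IP_RECVFRAGSIZE = 25


def frag_size_from_cmsgs(ancdata):
    """Return the largest-fragment size found in recvmsg() ancillary data.

    Returns None when no IP_RECVFRAGSIZE cmsg is present, which is the
    expected outcome for datagrams that did not arrive fragmented.
    """
    int_size = struct.calcsize("i")
    for level, ctype, data in ancdata:
        if level == socket.IPPROTO_IP and ctype == IP_RECVFRAGSIZE:
            # The value is delivered as a native int.
            return struct.unpack("i", data[:int_size])[0]
    return None


def receive_once(port=6000):
    """Receive one UDP datagram; return (payload length, fragment size or None).

    Linux only; setsockopt() raises OSError on kernels without the option.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.IPPROTO_IP, IP_RECVFRAGSIZE, 1)
        sock.bind(("", port))
        payload, ancdata, _flags, _peer = sock.recvmsg(
            65535, socket.CMSG_SPACE(struct.calcsize("i")))
        return len(payload), frag_size_from_cmsgs(ancdata)
```

With the veth setup above, receive_once() would be expected to report a fragment size only for payloads larger than the sender's mtu.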
-
David S. Miller authored
Neil Armstrong says:

====================
net: stmmac: Add OXNAS DWMAC Glue

This patchset adds support for the Synopsys DWMAC Gigabit Ethernet controller Glue layer of the Oxford Semiconductor OX820 SoC.

Changes since v2 at http://lkml.kernel.org/r/20161031105345.16711-1-narmstrong@baylibre.com :
 - Disable/Unprepare clock if regmap read fails in oxnas_dwmac_init

Changes since v1 at https://patchwork.kernel.org/patch/9388231/ :
 - Split dt-bindings in a separate patch
 - Add IP version in the dt-bindings compatible
 - Check return of clk_prepare_enable()
 - use get_stmmac_bsp_priv() helper
 - hardwire setup values in oxnas_dwmac_init()

Changes since RFC at https://patchwork.kernel.org/patch/9387257 :
 - Drop init/exit callbacks
 - Implement proper remove and PM callback
 - Call init from probe
 - Disable/Unprepare clock if stmmac probe fails
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Neil Armstrong authored
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Neil Armstrong authored
Add Synopsys Designware MAC Glue layer for the Oxford Semiconductor OX820. Acked-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Cyrill Gorcunov says: ==================== net: Fixes for raw diag sockets handling Hi! Here are a few fixes for raw-diag sockets handling: missing sock_put call and jump for exiting from nested cycle. I made patches for iproute2 as well so will send them out soon. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Cyrill Gorcunov authored
I missed that sk_for_each is called inside a "for" loop, so a goto is needed here to return the matching socket. CC: David S. Miller <davem@davemloft.net> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: David Ahern <dsa@cumulusnetworks.com> CC: Andrey Vagin <avagin@openvz.org> CC: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Cyrill Gorcunov authored
In raw_diag_destroy, the helper raw_sock_get returns the socket with a reference held (via sock_hold), so we have to drop that reference with sock_put afterwards. CC: David S. Miller <davem@davemloft.net> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: David Ahern <dsa@cumulusnetworks.com> CC: Andrey Vagin <avagin@openvz.org> CC: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 02 Nov, 2016 18 commits
-
-
Govindarajulu Varadarajan authored
The driver sets the skb l4/l3 hash based on NIC_CFG_RSS_HASH_TYPE_*, which is a bit mask. This is wrong: the hardware actually provides an enum. Use CQ_ENET_RQ_DESC_RSS_TYPE_* to set the l3 and l4 hash type. Fixes: bf751ba8 ("driver/net: enic: record q_number and rss_hash for skb") Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
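The bug class described here, testing a configuration bit mask against an enumerated per-packet value, can be sketched as follows. All type names and numeric values in this sketch are hypothetical and chosen only for illustration; they are not the enic driver's actual constants.

```python
from enum import IntEnum, IntFlag


class CfgHashType(IntFlag):
    """Hypothetical configuration-time bit mask of enabled hash types."""
    IPV4 = 1 << 0
    TCP_IPV4 = 1 << 1
    IPV6 = 1 << 2


class DescRssType(IntEnum):
    """Hypothetical per-packet hash type reported by hardware as an enum."""
    NONE = 0
    IPV4 = 1
    TCP_IPV4 = 2
    IPV6 = 3


def is_l4_hash_buggy(rss_type):
    # Wrong: treats the enumerated value as if it were a set of mask bits.
    return bool(int(rss_type) & int(CfgHashType.TCP_IPV4))


def is_l4_hash_fixed(rss_type):
    # Right: compares against the enumerator that denotes an L4 hash.
    return rss_type == DescRssType.TCP_IPV4
```

In this model the buggy check misclassifies DescRssType.IPV6 as an L4 hash, because its numeric value happens to share a bit with the mask constant.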
-
Philippe Reynes authored
The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Reviewed-by: David Dillow <dave@thedillows.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
commit ca26893f ("rhashtable: Add rhlist interface") added a field to rhashtable_iter so that its length became 56 bytes and would exceed the size of args in netlink_callback (which is 48 bytes). The netlink diag dump function already allocates an iter structure and stores a pointer to it in the args of netlink_callback. ila_xlat also uses rhashtable_iter but still puts it directly in the arg block. Now that the size of rhashtable_iter has increased, we write beyond the structure. The next field happens to be the cb_mutex pointer in netlink_sock, hence the crash.

Fix is to alloc the rhashtable_iter and save it as a pointer in arg.

Tested:

    modprobe ila
    ./ip ila add loc 3333:0:0:0 loc_match 2222:0:0:1,
    ./ip ila list  # NO crash now

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
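The overrun can be modelled with a byte buffer. This is a toy sketch using the sizes quoted in the message; the 8-byte pointer size (a 64-bit kernel), the placeholder values, and all helper names are assumptions made for illustration.

```python
import struct

ARGS_BYTES = 48      # size of args in netlink_callback, per the message
ITER_BYTES = 56      # size of rhashtable_iter after the rhlist change
POINTER_BYTES = 8    # assumption: 64-bit kernel

NEIGHBOUR = 0xDEADBEEF  # placeholder for the adjacent pointer field


def make_memory():
    """Model the args block followed by one adjacent pointer-sized field."""
    return bytearray(ARGS_BYTES) + bytearray(struct.pack("<Q", NEIGHBOUR))


def neighbour_of(mem):
    """Read back the pointer-sized field that sits right after args."""
    return struct.unpack_from("<Q", mem, ARGS_BYTES)[0]


def store_inline(mem):
    """Buggy variant: copy the whole 56-byte iterator into the 48-byte block."""
    mem[0:ITER_BYTES] = b"\xff" * ITER_BYTES


def store_pointer(mem, address=0x1000):
    """Fixed variant: keep the iterator elsewhere, store only its address."""
    struct.pack_into("<Q", mem, 0, address)
```

Storing inline overwrites exactly ITER_BYTES - ARGS_BYTES = 8 bytes past the block, which in this model is the whole neighbouring pointer; storing a pointer leaves it intact.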
-
Cyrill Gorcunov authored
While preparing patches for killing raw sockets via the diag netlink interface I noticed that my runs got stuck:

 | [root@pcs7 ~]# cat /proc/`pidof ss`/stack
 | [<ffffffff816d1a76>] __lock_sock+0x80/0xc4
 | [<ffffffff816d206a>] lock_sock_nested+0x47/0x95
 | [<ffffffff8179ded6>] udp_disconnect+0x19/0x33
 | [<ffffffff8179b517>] raw_abort+0x33/0x42
 | [<ffffffff81702322>] sock_diag_destroy+0x4d/0x52

which has not been the case before. I narrowed it down to the commit

 | commit 286c72de
 | Author: Eric Dumazet <edumazet@google.com>
 | Date: Thu Oct 20 09:39:40 2016 -0700
 |
 | udp: must lock the socket in udp_disconnect()

where we start locking the socket for a different reason. So raw_abort escaped the renaming and we have to fix this typo by using __udp_disconnect instead.

Fixes: 286c72de ("udp: must lock the socket in udp_disconnect()")
CC: David S. Miller <davem@davemloft.net>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: David Ahern <dsa@cumulusnetworks.com>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
CC: James Morris <jmorris@namei.org>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Patrick McHardy <kaber@trash.net>
CC: Andrey Vagin <avagin@openvz.org>
CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Woojung Huh authored
To make full use of phylib's interrupt support, rather than handling some of the PHY work in the MAC driver, create an irq_domain for the PHY interrupt delivered through the USB interrupt EP, and pass the irq number to phy_connect_direct() instead of PHY_IGNORE_INTERRUPT. Idea comes from drivers/gpio/gpio-dl2.c Signed-off-by: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
As Ilya Lesokhin suggested, we can collapse two skbs at retransmit time even if the skb at the right has fragments.

We simply have to use more generic skb_copy_bits() instead of skb_copy_from_linear_data() in tcp_collapse_retrans()

Also need to guard this skb_copy_bits() in case there is nothing to copy, otherwise skb_put() could panic if left skb has frags.

Tested:

Used following packetdrill test

    // Establish a connection.
    0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
    +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
    +0 bind(3, ..., ...) = 0
    +0 listen(3, 1) = 0

    +0 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 8>
    +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
    +.100 < . 1:1(0) ack 1 win 257
    +0 accept(3, ..., ...) = 4

    +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
    +0 write(4, ..., 200) = 200
    +0 > P. 1:201(200) ack 1
    +.001 write(4, ..., 200) = 200
    +0 > P. 201:401(200) ack 1
    +.001 write(4, ..., 200) = 200
    +0 > P. 401:601(200) ack 1
    +.001 write(4, ..., 200) = 200
    +0 > P. 601:801(200) ack 1
    +.001 write(4, ..., 200) = 200
    +0 > P. 801:1001(200) ack 1
    +.001 write(4, ..., 100) = 100
    +0 > P. 1001:1101(100) ack 1
    +.001 write(4, ..., 100) = 100
    +0 > P. 1101:1201(100) ack 1
    +.001 write(4, ..., 100) = 100
    +0 > P. 1201:1301(100) ack 1
    +.001 write(4, ..., 100) = 100
    +0 > P. 1301:1401(100) ack 1

    +.100 < . 1:1(0) ack 1 win 257 <nop,nop,sack 1001:1401>
    // Check that TCP collapse works :
    +0 > P. 1:1001(1000) ack 1

Reported-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Philippe Reynes authored
The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Philippe Reynes authored
The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Philippe Reynes authored
The old ethtool api (get_settings and set_settings) has generic mii functions mii_ethtool_sset and mii_ethtool_gset. To support the new ethtool api ({get|set}_link_ksettings), we add two generic mii functions, mii_ethtool_{get|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Tariq Toukan says:

====================
mlx4 XDP TX refactor

This patchset refactors the XDP forwarding case, so that its dedicated transmit queues are managed in a complete separation from the other regular ones.

It also adds ethtool counters for XDP cases.

Series generated against net-next commit:
22ca904a genetlink: fix error return code in genl_register_family()

Thanks,
Tariq.

v3:
* Exposed per ring counters.

v2:
* Added ethtool counters.
* Rebased, now patch 2 reverts Brenden's fix, as the bug no longer exists: 958b3d39 ("net/mlx4_en: fixup xdp tx irq to match rx")
* Updated commit message of patch 2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tariq Toukan authored
XDP statistics are reported in ethtool, in total and per ring, as follows:
- xdp_drop: the number of packets dropped by xdp.
- xdp_tx: the number of packets forwarded by xdp.
- xdp_tx_full: the number of times an xdp forward failed due to a full tx xdp ring.

In addition, all packets that are dropped/forwarded by XDP are no longer accounted in rx_packets/rx_bytes of the ring, so that they count traffic that is passed to the stack.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tariq Toukan authored
Separately manage the two types of TX rings: regular ones, and XDP. Upon an XDP set, do not borrow regular TX rings and convert them into XDP ones, but allocate new ones, unless we hit the max number of rings. Which means that in systems with smaller #cores we will not consume the current TX rings for XDP, while we are still in the num TX limit. XDP TX rings counters are not shown in ethtool statistics. Instead, XDP counters will be added to the respective RX rings in a downstream patch. This has no performance implications. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tariq Toukan authored
Support XDP CQ type, and refactor the CQ type enum. Rename the is_tx field to match the change. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
After adding sctp gso, sctp_packet_transmit has become quite a big function. This patch extracts the packet-packing code from sctp_packet_transmit into sctp_packet_pack, adds some comments, and simplifies the error path by freeing the auth chunk when the packet chunk_list is freed in the out path and by freeing the head skb early if packing the packet fails. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Roi Dayan says: ==================== misc TC/flower changes This series includes two small changes to the TC flower classifier. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roi Dayan authored
Move common code from fl_delete and fl_destroy to __fl_delete. Signed-off-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roi Dayan authored
tcf_unbind was called in fl_delete but was missing in fl_destroy when force deleting flows. Fixes: 77b9900e ('tc: introduce Flower classifier') Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
David S. Miller authored
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next tree. This includes better integration with the routing subsystem for nf_tables, explicit notrack support and smaller updates. More specifically, they are:

1) Add fib lookup expression for nf_tables, from Florian Westphal. This new expression provides a native replacement for iptables addrtype and rp_filter matches. This is more flexible though, since we can populate the kernel flowi representation to inquire fib to accommodate new usecases, such as RTBH through skb mark.

2) Introduce rt expression for nf_tables, from Anders K. Pedersen. This new expression allows you to access skbuff route metadata, more specifically nexthop and classid fields.

3) Add notrack support for nf_tables, to skip conntracking, requested by many users already.

4) Add boilerplate code to allow use of the nf_log infrastructure from nf_tables ingress.

5) Allow mangling pkttype from the nf_tables prerouting chain, to emulate the xtables cluster match, from Liping Zhang.

6) Move socket lookup code into generic nf_socket_* infrastructure so we can provide a native replacement for the xtables socket match.

7) Make sure nfnetlink_queue data that is updated on every packet is placed in a different cache from read-only data, from Florian Westphal.

8) Handle NF_STOLEN from nf_tables core, also from Florian Westphal.

9) Start round robin number generation in nft_numgen from zero, instead of n-1, for consistency with xtables statistics match, patch from Liping Zhang.

10) Set GFP_NOWARN flag in skbuff netlink allocations in nfnetlink_log, given we retry with a smaller allocation on failure, from Calvin Owens.

11) Cleanup xt_multiport to use switch(), from Gao feng.

12) Remove superfluous check in nft_immediate and nft_cmp, from Liping Zhang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 01 Nov, 2016 9 commits
-
-
Florian Westphal authored
As the comment indicates, the data at the end of nfqnl_instance struct is written on every queue/dequeue, so it should reside in its own cacheline. Before this change, 'lock' was in first cacheline so we dirtied both. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Liping Zhang authored
After calling nft_data_init, the size is already validated and desc.len will not exceed sizeof(struct nft_data), i.e. 16 bytes, so it will never exceed U8_MAX. Furthermore, in nft_immediate_init, we forget to call nft_data_uninit when desc.len exceeds U8_MAX; although this cannot happen, it is a logical mistake. Remove the redundant validation introduced by commit 36b701fa ("netfilter: nf_tables: validate maximum value of u32 netlink attributes") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Anders K. Pedersen authored
Introduces an nftables rt expression for routing related data with support for nexthop (i.e. the directly connected IP address that an outgoing packet is sent to), which can be used either for matching or accounting, e.g.

    # nft add rule filter postrouting \
        ip daddr 192.168.1.0/24 rt nexthop != 192.168.0.1 drop

This will drop any traffic to 192.168.1.0/24 that is not routed via 192.168.0.1.

    # nft add rule filter postrouting \
        flow table acct { rt nexthop timeout 600s counter }
    # nft add rule ip6 filter postrouting \
        flow table acct { rt nexthop timeout 600s counter }

These rules count outgoing traffic per nexthop. Note that the timeout releases an entry if no traffic is seen for this nexthop within 10 minutes.

    # nft add rule inet filter postrouting \
        ether type ip \
        flow table acct { rt nexthop timeout 600s counter }
    # nft add rule inet filter postrouting \
        ether type ip6 \
        flow table acct { rt nexthop timeout 600s counter }

Same as above, but via the inet family, where the ether type must be specified explicitly.

"rt classid" is also implemented identically to "meta rtclassid", since it is more logical to have this match in the routing expression going forward.

Signed-off-by: Anders K. Pedersen <akp@cohaesio.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Pablo Neira Ayuso authored
We need this split to reuse existing codebase for the upcoming nf_tables socket expression. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Pablo Neira Ayuso authored
Move layer 2 packet logging into nf_log_l2packet() that resides in nf_log_common.c, so this can be shared by both bridge and netdev families. This patch adds the boiler plate code to register the netdev logging family. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
Add FIB expression, supported for ipv4, ipv6 and inet family (the latter just dispatches to ipv4 or ipv6 one based on nfproto).

Currently supports fetching output interface index/name and the rtm_type associated with an address.

This can be used for adding path filtering. rtm_type is useful to e.g. enforce a strong-end host model where packets are only accepted if daddr is configured on the interface the packet arrived on.

The fib expression is a native nftables alternative to the xtables addrtype and rp_filter matches.

FIB result order for oif/oifname retrieval is as follows:
 - if packet is local (skb has rtable, RTF_LOCAL set, this will also catch looped-back multicast packets), set oif to the loopback interface.
 - if fib lookup returns an error, or result points to local, store zero result. This means '--local' option of -m rpfilter is not supported. It is possible to use 'fib type local' or add explicit saddr/daddr matching rules to create exceptions if this is really needed.
 - store result in the destination register. In case of multiple routes, search set for desired oif in case strict matching is requested.

ipv4 and ipv6 fib expressions are supposed to behave the same.

[ I have collapsed Arnd Bergmann's ("netfilter: nf_tables: fib warnings") http://patchwork.ozlabs.org/patch/688615/ to address fallout from this patch after rebasing nf-next, that was posted to address compilation warnings. --pablo ]

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
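The result order for oif retrieval described in this message can be paraphrased as a small decision function. This is a simplified model, not the kernel code: the function name, its parameters, and the loopback index are hypothetical, and returning zero when strict matching finds no route is an assumption the message does not spell out.

```python
LOOPBACK_IFINDEX = 1  # assumption: conventional index of the loopback device


def fib_oif(packet_is_local, route_oifs, result_is_local=False, want_oif=None):
    """Model the oif result order from the message.

    route_oifs: output interface indexes returned by the lookup,
                or None if the lookup failed.
    want_oif:   interface to match when strict matching is requested.
    """
    # 1. Local packets always resolve to the loopback interface.
    if packet_is_local:
        return LOOPBACK_IFINDEX
    # 2. Lookup error, or a result pointing to local: store a zero result.
    if not route_oifs or result_is_local:
        return 0
    # 3. Strict matching: search the set of routes for the desired oif.
    if want_oif is not None:
        # Assumption: no matching route yields a zero result.
        return want_oif if want_oif in route_oifs else 0
    # Otherwise store the (first) lookup result.
    return route_oifs[0]
```

A reverse-path filter in this model would call fib_oif() with want_oif set to the ingress interface and drop the packet when the result is zero.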
-
Wei Yongjun authored
Fix to return a negative error code from the idr_alloc() error handling case instead of 0, as done elsewhere in this function. Also fix the return value check of idr_alloc() since idr_alloc returns negative errors on failure, not zero. Fixes: 2ae0f17d ("genetlink: use idr to track families") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
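The two mistakes named here (checking for zero instead of a negative value, and returning 0 on the error path) can be modelled as follows. fake_idr_alloc is a hypothetical stand-in for idr_alloc(), and the register functions are simplified illustrations rather than the genetlink code.

```python
import errno


def fake_idr_alloc(used, start, end):
    """Stand-in allocator: lowest free id in [start, end), else -ENOSPC."""
    for candidate in range(start, end):
        if candidate not in used:
            used.add(candidate)
            return candidate
    return -errno.ENOSPC


def register_buggy(used, start, end):
    err = 0
    family_id = fake_idr_alloc(used, start, end)
    if not family_id:      # wrong test: failure is negative, never zero
        return err         # and err would still be 0 on this path
    return 0               # a negative id slips through as success


def register_fixed(used, start, end):
    family_id = fake_idr_alloc(used, start, end)
    if family_id < 0:      # failure is signalled by a negative errno
        return family_id   # propagate it to the caller
    return 0
```

With the id range exhausted, register_buggy() reports success while register_fixed() returns -ENOSPC.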
-
David Ahern authored
Enable support for IPv4 multicast:
- similar to unicast the flow struct is updated to L3 master device if relevant prior to calling fib_rules_lookup. The table id is saved to the lookup arg so the rule action for ipmr can return the table associated with the device.
- ip_mr_forward needs to check for master device mismatch as well since the skb->dev is set to it
- allow multicast address on VRF device for Rx by checking for the daddr in the VRF device as well as the original ingress device
- on Tx need to drop to __mkroute_output when FIB lookup fails for multicast destination address.
- if CONFIG_IP_MROUTE_MULTIPLE_TABLES is enabled VRF driver creates IPMR FIB rules on first device create similar to FIB rules. In addition the VRF driver does not divert IPv4 multicast packets: it breaks on Tx since the fib lookup fails on the mcast address.

With this patch, ipmr forwarding and local rx/tx work.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Parthasarathy Bhuvaragan says:

====================
tipc: socket layer improvements

The following issues with the current socket layer hinder socket diagnostics implementation, which led to this patch series.

1. tipc socket state is derived from multiple variables like sock->state, tsk->probing_state and tsk->connected. This style forces us to export multiple attributes to the user space, which has to be backward compatible.
2. Abuse of sock->state cannot be exported to user-space without requiring tipc specific hacks in the user-space.
   - For connection less (CL) sockets sock->state is overloaded to tipc state SS_READY.
   - For connection oriented (CO) listening socket sock->state is overloaded to tipc state SS_LISTEN.

This series is split into four:
1. Bug fixes in patch #1,2,3.
2. Minor cleanups in patch#4-5.
3. Express all tipc states using a single variable in patch#6-8.
4. Migrate the new tipc states to sk->sk_state in patch#9-16.

The figures below represent the FSM after this series:

Stream Server Listening Socket:
+-----------+       +-------------+
| TIPC_OPEN |------>| TIPC_LISTEN |
+-----------+       +-------------+

Stream Server Data Socket:
+-----------+       +------------------+
| TIPC_OPEN |------>| TIPC_ESTABLISHED |
+-----------+       +------------------+
                          ^   |
                          |   |
                          |   v
                    +--------------------+
                    | TIPC_DISCONNECTING |
                    +--------------------+

Stream Socket Client:
+-----------+       +-----------------+
| TIPC_OPEN |------>| TIPC_CONNECTING |------+
+-----------+       +-----------------+      |
                            |                |
                            |                |
                            v                |
                    +------------------+     |
                    | TIPC_ESTABLISHED |     |
                    +------------------+     |
                          ^   |              |
                          |   |              |
                          |   v              |
                    +--------------------+   |
                    | TIPC_DISCONNECTING |<--+
                    +--------------------+

NOTE: This is just a base refactoring required for socket diagnostics. TIPC socket diagnostics support will be introduced in a later series.

v2:
- remove extra cast and parenthesis as suggested by David S. Miller in #4.
- map new tipc state values to tcp states to address Eric Dumazet's concern, thus allow the usage of generic sk_* helpers. This is done in patch#10-15.
- remove TIPC_PROBING state and replace it with probe_unacked flag in #11.
- replace the TIPC_CLOSING state in v1 with sk_shutdown flag in #14.
- introduce __tipc_shutdown() to avoid code duplication in #14.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
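The state machine in these figures can be written down as a transition table. The sketch below merges the edges of the three figures into one table, as read from the diagrams; it is a reading aid with hypothetical class and function names, not the TIPC implementation.

```python
from enum import Enum


class TipcState(Enum):
    OPEN = "TIPC_OPEN"
    LISTEN = "TIPC_LISTEN"
    CONNECTING = "TIPC_CONNECTING"
    ESTABLISHED = "TIPC_ESTABLISHED"
    DISCONNECTING = "TIPC_DISCONNECTING"


# Union of the edges drawn in the three figures.
TRANSITIONS = {
    TipcState.OPEN: {TipcState.LISTEN, TipcState.ESTABLISHED,
                     TipcState.CONNECTING},
    TipcState.LISTEN: set(),
    TipcState.CONNECTING: {TipcState.ESTABLISHED, TipcState.DISCONNECTING},
    TipcState.ESTABLISHED: {TipcState.DISCONNECTING},
    # The figures draw an arrow in each direction between these two states.
    TipcState.DISCONNECTING: {TipcState.ESTABLISHED},
}


def can_transition(src, dst):
    """True if the figures contain an edge from src to dst."""
    return dst in TRANSITIONS[src]


def advance(state, new_state):
    """Return new_state if the move is drawn in the figures, else raise."""
    if not can_transition(state, new_state):
        raise ValueError(f"{state.value} -> {new_state.value} is not in the FSM")
    return new_state
```

Keeping every state in a single variable, which is what the series does with sk->sk_state, is what makes a single table like this sufficient.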
-