- 17 Jun, 2019 7 commits
-
-
Fernando Fernandez Mancera authored
This new UAPI file is going to be used by the xt and nft common SYNPROXY infrastructure. It is needed to avoid duplicated code. Signed-off-by:
Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
git://blackhole.kfki.hu/nf-nextPablo Neira Ayuso authored
Jozsef Kadlecsik says: ==================== ipset patches for nf-next - Remove useless memset() calls, nla_parse_nested/nla_parse erase the tb array properly, from Florent Fourcot. - Merge the uadd and udel functions, the code is nicer this way, also from Florent Fourcot. - Add a missing check for the return value of a nla_parse[_deprecated] call, from Aditya Pakki. - Add the last missing check for the return value of nla_parse[_deprecated] call. - Fix error path and release the references properly in set_target_v3_checkentry(). - Fix memory accounting which is reported to userspace for hash types on resize, from Stefano Brivio. - Update my email address to kadlec@netfilter.org. The patch covers all places in the source tree where my kadlec@blackhole.kfki.hu address could be found. ==================== Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Christian Brauner authored
Currently, the /proc/sys/net/bridge folder is only created in the initial network namespace. This patch ensures that the /proc/sys/net/bridge folder is available in each network namespace if the module is loaded and disappears from all network namespaces when the module is unloaded. In doing so the patch makes the sysctls: bridge-nf-call-arptables bridge-nf-call-ip6tables bridge-nf-call-iptables bridge-nf-filter-pppoe-tagged bridge-nf-filter-vlan-tagged bridge-nf-pass-vlan-input-dev apply per network namespace. This unblocks some use-cases where users would like to e.g. not do bridge filtering for bridges in a specific network namespace while doing so for bridges located in another network namespace. The netfilter rules are afaict already per network namespace so it should be safe for users to specify whether bridge devices inside a network namespace are supposed to go through iptables et al. or not. Also, this can already be done per-bridge by setting an option for each individual bridge via Netlink. It should also be possible to do this for all bridges in a network namespace via sysctls. Cc: Tyler Hicks <tyhicks@canonical.com> Signed-off-by:
Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Christian Brauner authored
This ports the sysctls to use struct brnf_net. With this patch we make it possible to namespace the br_netfilter module in the following patch. Signed-off-by:
Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Pablo Neira Ayuso authored
Reject flags that are not supported with EINVAL. Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
____nf_conntrack_find() performs checks on the conntrack objects in this order: 1. if (nf_ct_is_expired(ct)) This fetches ct->timeout, in third cache line. The hnnode that is used to store the list pointers resides in the first (origin) or second (reply tuple) cache lines. This test rarely passes, but its necessary to reap obsolete entries. 2. if (nf_ct_is_dying(ct)) This fetches ct->status, also in third cache line. The test is useless, and can be removed: Consider: cpu0 cpu1 ct = ____nf_conntrack_find() atomic_inc_not_zero(ct) -> ok nf_ct_key_equal -> ok is_dying -> DYING bit not set, ok set_bit(ct, DYING); ... unhash ... etc. return ct -> returning a ct with dying bit set, despite having a test for it. This (unlikely) case is fine - refcount prevents ct from getting free'd. 3. if (nf_ct_key_equal(h, tuple, zone, net)) nf_ct_key_equal checks in following order: 1. Tuple equal (first or second cacheline) 2. Zone equal (third cacheline) 3. confirmed bit set (->status, third cacheline) 4. net namespace match (third cacheline). Swapping "timeout" and "cpu" places timeout in the first cacheline. This has two advantages: 1. For a conntrack that won't even match the original tuple, we will now only fetch the first and maybe the second cacheline instead of always accessing the 3rd one as well. 2. in case of TCP ct->timeout changes frequently because we reduce/increase it when there are packets outstanding in the network. The first cacheline contains both the reference count and the ct spinlock, i.e. moving timeout there avoids writes to 3rd cacheline. The restart sequence in __nf_conntrack_find() is removed, if we found a candidate, but then fail to increment the refcount or discover the tuple has changed (object recycling), just pretend we did not find an entry. A second lookup won't find anything until another CPU adds a new conntrack with identical tuple into the hash table, which is very unlikely. We have the confirmation-time checks (when we hold hash lock) that deal with identical entries and even perform clash resolution in some cases. Signed-off-by:
Florian Westphal <fw@strlen.de> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Stéphane Veyret authored
This patch allows to add, list and delete expectations via nft objref infrastructure and assigning these expectations via nft rule. This allows manual port triggering when no helper is defined to manage a specific protocol. For example, if I have an online game which protocol is based on initial connection to TCP port 9753 of the server, and where the server opens a connection to port 9876, I can set rules as follow: table ip filter { ct expectation mygame { protocol udp; dport 9876; timeout 2m; size 1; } chain input { type filter hook input priority 0; policy drop; tcp dport 9753 ct expectation set "mygame"; } chain output { type filter hook output priority 0; policy drop; udp dport 9876 ct status expected accept; } } Signed-off-by:
Stéphane Veyret <sveyret@gmail.com> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
- 10 Jun, 2019 7 commits
-
-
Jozsef Kadlecsik authored
It's better to use my kadlec@netfilter.org email address in the source code. I might not be able to use kadlec@blackhole.kfki.hu in the future. Signed-off-by:
Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by:
Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
-
Stefano Brivio authored
If a fresh array block is allocated during resize, the current in-memory set size should be increased by the size of the block, not replaced by it. Before the fix, adding entries to a hash set type, leading to a table resize, caused an inconsistent memory size to be reported. This becomes more obvious when swapping sets with similar sizes: # cat hash_ip_size.sh #!/bin/sh FAIL_RETRIES=10 tries=0 while [ ${tries} -lt ${FAIL_RETRIES} ]; do ipset create t1 hash:ip for i in `seq 1 4345`; do ipset add t1 1.2.$((i / 255)).$((i % 255)) done t1_init="$(ipset list t1|sed -n 's/Size in memory: \(.*\)/\1/p')" ipset create t2 hash:ip for i in `seq 1 4360`; do ipset add t2 1.2.$((i / 255)).$((i % 255)) done t2_init="$(ipset list t2|sed -n 's/Size in memory: \(.*\)/\1/p')" ipset swap t1 t2 t1_swap="$(ipset list t1|sed -n 's/Size in memory: \(.*\)/\1/p')" t2_swap="$(ipset list t2|sed -n 's/Size in memory: \(.*\)/\1/p')" ipset destroy t1 ipset destroy t2 tries=$((tries + 1)) if [ ${t1_init} -lt 10000 ] || [ ${t2_init} -lt 10000 ]; then echo "FAIL after ${tries} tries:" echo "T1 size ${t1_init}, after swap ${t1_swap}" echo "T2 size ${t2_init}, after swap ${t2_swap}" exit 1 fi done echo "PASS" # echo -n 'func hash_ip4_resize +p' > /sys/kernel/debug/dynamic_debug/control # ./hash_ip_size.sh [ 2035.018673] attempt to resize set t1 from 10 to 11, t 00000000fe6551fa [ 2035.078583] set t1 resized from 10 (00000000fe6551fa) to 11 (00000000172a0163) [ 2035.080353] Table destroy by resize 00000000fe6551fa FAIL after 4 tries: T1 size 9064, after swap 71128 T2 size 71128, after swap 9064 Reported-by:
NOYB <JunkYardMail1@Frontier.com> Fixes: 9e41f26a ("netfilter: ipset: Count non-static extension memory for userspace") Signed-off-by:
Stefano Brivio <sbrivio@redhat.com> Signed-off-by:
Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
-
Jozsef Kadlecsik authored
Fix error path and release the references properly. Signed-off-by:
Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
-
Jozsef Kadlecsik authored
In dump_init() the outdated comment was incorrect and we had a missing validation check of nla_parse_deprecated(). Signed-off-by:
Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
-
Aditya Pakki authored
When nla_parse fails, we should not use the results (the first argument). The fix checks if it fails, and if so, returns its error code upstream. Signed-off-by:
Aditya Pakki <pakki001@umn.edu> Signed-off-by:
Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
-
Florent Fourcot authored
Both functions are using exactly the same code, except the command value passed to call_ad function. Signed-off-by:
Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by:
Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
-
Florent Fourcot authored
One of the memset call is buggy: it does not erase full array, but only pointer size. Moreover, after a check, first step of nla_parse_nested/nla_parse is to erase tb array as well. We can remove both calls safely. Signed-off-by:
Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by:
Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
-
- 06 Jun, 2019 9 commits
-
-
wenxu authored
CONFIG_NETFILTER=m and CONFIG_NF_DEFRAG_IPV6 is not set ERROR: "nf_ct_frag6_gather" [net/ipv6/ipv6.ko] undefined! Fixes: c9bb6165 ("netfilter: nf_conntrack_bridge: fix CONFIG_IPV6=y") Reported-by:
kbuild test robot <lkp@intel.com> Signed-off-by:
wenxu <wenxu@ucloud.cn> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Zhiqiang Liu authored
small cleanup: "struct request_sock_queue *queue" parameter of reqsk_queue_unlink func is never used in the func, so we can remove it. Signed-off-by:
Zhiqiang Liu <liuzhiqiang26@huawei.com> Reviewed-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
In the early days of phylib we had a functionality that changed to the next lower speed in fixed mode if no link was established after a certain period of time. This functionality has been removed years ago, and state PHY_FORCING isn't needed any longer. Instead we can go from UP to RUNNING or NOLINK directly (same as in autoneg mode). Signed-off-by:
Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Zhu Yanjun authored
The variable cache_allocs is to indicate how many frags (KiB) are in one rds connection frag cache. The command "rds-info -Iv" will output the rds connection cache statistics as below: " RDS IB Connections: LocalAddr RemoteAddr Tos SL LocalDev RemoteDev 1.1.1.14 1.1.1.14 58 255 fe80::2:c903:a:7a31 fe80::2:c903:a:7a31 send_wr=256, recv_wr=1024, send_sge=8, rdma_mr_max=4096, rdma_mr_size=257, cache_allocs=12 " This means that there are about 12KiB frag in this rds connection frag cache. Since rds.h in rds-tools is not related with the kernel rds.h, the change in kernel rds.h does not affect rds-tools. rds-info in rds-tools 2.0.5 and 2.0.6 is tested with this commit. It works well. Signed-off-by:
Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Biao Huang says: ==================== complete dwmac-mediatek driver and fix flow control issue Changes in v2: patch#1: there is no extra action in mediatek_dwmac_remove, remove it v1: This series mainly complete dwmac-mediatek driver: 1. add power on/off operations for dwmac-mediatek. 2. disable rx watchdog to reduce rx path reponding time. 3. change the default value of tx-frames from 25 to 1, so ptp4l will test pass by default. and also fix the issue that flow control won't be disabled any more once being enabled. ==================== Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Biao Huang authored
Current dwmac4_flow_ctrl will not clear GMAC_RX_FLOW_CTRL_RFE/GMAC_RX_FLOW_CTRL_RFE bits, so MAC hw will keep flow control on although expecting flow control off by ethtool. Add codes to fix it. Fixes: 477286b5 ("stmmac: add GMAC4 core support") Signed-off-by:
Biao Huang <biao.huang@mediatek.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Biao Huang authored
the default value of tx-frames is 25, it's too late when passing tstamp to stack, then the ptp4l will fail: ptp4l -i eth0 -f gPTP.cfg -m ptp4l: selected /dev/ptp0 as PTP clock ptp4l: port 1: INITIALIZING to LISTENING on INITIALIZE ptp4l: port 0: INITIALIZING to LISTENING on INITIALIZE ptp4l: port 1: link up ptp4l: timed out while polling for tx timestamp ptp4l: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug ptp4l: port 1: send peer delay response failed ptp4l: port 1: LISTENING to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED) ptp4l tests pass when changing the tx-frames from 25 to 1 with ethtool -C option. It should be fine to set tx-frames default value to 1, so ptp4l will pass by default. Signed-off-by:
Biao Huang <biao.huang@mediatek.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Biao Huang authored
disable rx watchdog for dwmac-mediatek, then the hw will issue a rx interrupt once receiving a packet, so the responding time for rx path will be reduced. Signed-off-by:
Biao Huang <biao.huang@mediatek.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Biao Huang authored
add Ethernet power on/off operations in init/exit flow. Signed-off-by:
Biao Huang <biao.huang@mediatek.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 05 Jun, 2019 17 commits
-
-
Enrico Weigelt authored
IS_ERR() already calls unlikely(), so this extra likely() call around the !IS_ERR() is not needed. Signed-off-by:
Enrico Weigelt <info@metux.net> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Enrico Weigelt authored
IS_ERR() already calls unlikely(), so this extra unlikely() call around IS_ERR() is not needed. Signed-off-by:
Enrico Weigelt <info@metux.net> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Enrico Weigelt authored
IS_ERR() already calls unlikely(), so this extra unlikely() call around IS_ERR() is not needed. Signed-off-by:
Enrico Weigelt <info@metux.net> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Enrico Weigelt authored
IS_ERR() already calls unlikely(), so this extra likely() call around the !IS_ERR() is not needed. Signed-off-by:
Enrico Weigelt <info@metux.net> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Enrico Weigelt authored
IS_ERR() already calls unlikely(), so this extra likely() call around the !IS_ERR() is not needed. Signed-off-by:
Enrico Weigelt <info@metux.net> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Gustavo A. R. Silva authored
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct nfp_tun_active_tuns { ... struct route_ip_info { __be32 ipv4; __be32 egress_port; __be32 extra[2]; } tun_info[]; }; Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. So, replace the following form: sizeof(struct nfp_tun_active_tuns) + sizeof(struct route_ip_info) * count with: struct_size(payload, tun_info, count) This code was detected with the help of Coccinelle. Signed-off-by:
Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by:
Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Lihong Yang authored
The PF driver state flag __I40E_VIRTCHNL_OP_PENDING needs to be checked and set at the beginning of i40e_ndo_set_vf_mac. Otherwise, if there are error conditions before it, the flag will be cleared unexpectedly by this function to cause potential race conditions. Hence move the check to the top of this function. Signed-off-by:
Lihong Yang <lihong.yang@intel.com> Signed-off-by:
Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Lihong Yang authored
The VF configuration returned in i40e_ndo_get_vf_config is already stored by the PF. There is no dependency on any specific state of the VF to return the configuration. Drop the check against I40E_VF_STATE_INIT since it is not needed. Signed-off-by:
Lihong Yang <lihong.yang@intel.com> Tested-by:
Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by:
Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queueDavid S. Miller authored
Jeff Kirsher says: ==================== 10GbE Intel Wired LAN Driver Updates 2019-06-05 This series contains updates to mainly ixgbe, with a few updates to i40e, net, ice and hns2 driver. Jan adds support for tracking each queue pair for whether or not AF_XDP zero copy is enabled. Also updated the ixgbe driver to use the netdev-provided umems so that we do not need to contain these structures in our own adapter structure. William Tu provides two fixes for AF_XDP statistics which were causing incorrect counts. Jake reduces the PTP transmit timestamp timeout from 15 seconds to 1 second, which is still well after the maximum expected delay. Also fixes an issues with the PTP SDP pin setup which was not properly aligning on a full second, so updated the code to account for the cyclecounter multiplier and simplify the code to make the intent of the calculations more clear. Updated the function header comments to help with the code documentation. Added support for SDP/PPS output for x550 devices, which is slightly different than x540 devices that currently have this support. Anirudh adds a new define for Link Layer Discovery Protocol to the networking core, so that drivers do not have to create and use their own definitions. In addition, update all the drivers currently defining their own LLDP define to use the new networking core define. ==================== Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Kangjie Lu authored
If ixgbevf_write_msg_read_ack fails, return its error code upstream Signed-off-by:
Kangjie Lu <kjlu@umn.edu> Tested-by:
Andrew Bowers <andrewx.bowers@intel.com> Reviewed-by:
Mukesh Ojha <mojha@codeaurora.org> Signed-off-by:
Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jacob Keller authored
Similar to the X540 hardware, enable support for generating a 1pps output signal on SDP0. This support is slightly different to the X540 hardware, because of the register layout changes. First, the system time register is now represented in 'cycles' and 'billions of cycles'. Second, we need to also program the TSSDP register, as well as the ESDP register. Third, the clock output uses only FREQOUT, instead of a full 64bit value for the output clock period. Finally, we have to use the ST0 bit instead of the SYNCLK bit in the TSAUXC register. This support should work even for the hardware with a higher frequency clock, as it carefully takes into account the multiply and shift of the cycle counter used. We also set the pps configuration to 1, since we now support generating a pulse per second output. Signed-off-by:
Jacob Keller <jacob.e.keller@intel.com> Tested-by:
Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by:
Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Anirudh Venkataramanan authored
Remove references to HCLGE_MAC_ETHERTYPE_LLDP and use ETH_P_LLDP instead. Signed-off-by:
Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by:
Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by:
Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jeff Kirsher authored
Instead of using a local define for the LLDP ethertype, use the kernel define ETH_P_LLDP. Signed-off-by:
Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Anirudh Venkataramanan authored
Remove references to IXGBE_ETH_P_LLD and use ETH_P_LLDP instead. Signed-off-by:
Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by:
Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by:
Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Anirudh Venkataramanan authored
Remove references to I40E_ETH_P_LLDP and use ETH_P_LLDP instead. Signed-off-by:
Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by:
Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by:
Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Anirudh Venkataramanan authored
Add a new define ETH_P_LLDP for Link Layer Discovery Protocol (LLDP) ethertype. Suggested-by:
Bruce Allan <bruce.w.allan@intel.com> Signed-off-by:
Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by:
Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by:
Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jacob Keller authored
This function was missing a documentation comment. Add one now. Signed-off-by:
Jacob Keller <jacob.e.keller@intel.com> Tested-by:
Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by:
Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-