- 27 Jan, 2017 2 commits
-
-
Andreas Schultz authored
Auto-load the module when userspace asks for the gtp netlink family. Signed-off-by: Andreas Schultz <aschultz@tpip.net> Acked-by: Harald Welte <laforge@netfilter.org> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pablo Neira authored
Unlike ipv4, this control socket is shared by all cpus so we cannot use it as scratchpad area to annotate the mark that we pass to ip6_xmit(). Add a new parameter to ip6_xmit() to indicate the mark. The SCTP socket family caches the flowi6 structure in the sctp_transport structure, so we cannot use to carry the mark unless we later on reset it back, which I discarded since it looks ugly to me. Fixes: bf99b4de ("tcp: fix mark propagation with fwmark_reflect enabled") Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 26 Jan, 2017 10 commits
-
-
Kazuya Mizuguchi authored
"swiotlb buffer is full" errors occur after repeated initialisation of a device - f.e. suspend/resume or ip link set up/down. This is because memory mapped using dma_map_single() in ravb_ring_format() and ravb_start_xmit() is not released. Resolve this problem by unmapping descriptors when freeing rings. Fixes: c156633f ("Renesas Ethernet AVB driver proper") Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com> [simon: reworked] Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller authored
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains a large batch with Netfilter fixes for your net tree, they are: 1) Two patches to solve conntrack garbage collector cpu hogging, one to remove GC_MAX_EVICTS and another to look at the ratio (scanned entries vs. evicted entries) to make a decision on whether to reduce or not the scanning interval. From Florian Westphal. 2) Two patches to fix incorrect set element counting if NLM_F_EXCL is is not set. Moreover, don't decrenent set->nelems from abort patch if -ENFILE which leaks a spare slot in the set. This includes a patch to deconstify the set walk callback to update set->ndeact. 3) Two fixes for the fwmark_reflect sysctl feature: Propagate mark to reply packets both from nf_reject and local stack, from Pau Espin Pedrol. 4) Fix incorrect handling of loopback traffic in rpfilter and nf_tables fib expression, from Liping Zhang. 5) Fix oops on stateful objects netlink dump, when no filter is specified. Also from Liping Zhang. 6) Fix a build error if proc is not available in ipt_CLUSTERIP, related to fix that was applied in the previous batch for net. From Arnd Bergmann. 7) Fix lack of string validation in table, chain, set and stateful object names in nf_tables, from Liping Zhang. Moreover, restrict maximum log prefix length to 127 bytes, otherwise explicitly bail out. 8) Two patches to fix spelling and typos in nf_tables uapi header file and Kconfig, patches from Alexander Alemayhu and William Breathitt Gray. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.open-mesh.org/linux-mergeDavid S. Miller authored
Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - fix reference count handling on fragmentation error, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
commit 17bedab2 ("bpf: xdp: Allow head adjustment in XDP prog") added a new XDP helper to prepend and remove data from a frame. Make virtio_net reject programs making use of this helper until proper support is added. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Fastabend authored
In the small buffer case during driver unload we currently use put_page instead of dev_kfree_skb. Resolve this by adding a check for virtnet mode when checking XDP queue type. Also name the function so that the code reads correctly to match the additional check. Fixes: bb91accf ("virtio-net: XDP support for small buffers") Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Hayes Wang says: ==================== r8152: fix scheduling napi v3: simply the argument for patch #3. Replace &tp->napi with napi. v2: Add smp_mb__after_atomic() for patch #1. v1: Scheduling the napi during the following periods would let it be ignored. And the events wouldn't be handled until next napi_schedule() is called. 1. after napi_disable and before napi_enable(). 2. after all actions of napi function is completed and before calling napi_complete(). If no next napi_schedule() is called, tx or rx would stop working. In order to avoid these situations, the followings solutions are applied. 1. prevent start_xmit() from calling napi_schedule() during runtime suspend or after napi_disable(). 2. re-schedule the napi for tx if it is necessary. 3. check if any rx is finished or not after napi_enable(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
hayeswang authored
Schedule the napi after napi_enable() for rx, if it is necessary. If the rx is completed when napi is disabled, the sheduling of napi would be lost. Then, no one handles the rx packet until next napi is scheduled. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
hayeswang authored
Re-schedule napi after napi_complete() for tx, if it is necessay. In r8152_poll(), if the tx is completed after tx_bottom() and before napi_complete(), the scheduling of napi would be lost. Then, no one handles the next tx until the next napi_schedule() is called. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
hayeswang authored
Stop the tx when the napi is disabled to prevent napi_schedule() is called. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
hayeswang authored
Adjust the setting of the flag of SELECTIVE_SUSPEND to prevent start_xmit() from calling napi_schedule() directly during runtime suspend. After calling napi_disable() or clearing the flag of WORK_ENABLE, scheduling the napi is useless. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 25 Jan, 2017 14 commits
-
-
Florian Fainelli authored
Commit 448b4482 ("net: dsa: Add lockdep class to tx queues to avoid lockdep splat") removed the netif_device_detach() call done in dsa_slave_suspend() which is necessary, and paired with a corresponding netif_device_attach(), bring it back. Fixes: 448b4482 ("net: dsa: Add lockdep class to tx queues to avoid lockdep splat") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Geert Uytterhoeven says: ==================== net: phy: leds: Fix truncated LED trigger names and crashes I started seeing crashes during s2ram and poweroff on all my ARM boards, like: Unable to handle kernel NULL pointer dereference at virtual address 00000000 ... [<c04116d4>] (__list_del_entry_valid) from [<c05e8948>] (led_trigger_unregister+0x34/0xcc) [<c05e8948>] (led_trigger_unregister) from [<c05336c4>] (phy_led_triggers_unregister+0x28/0x34) [<c05336c4>] (phy_led_triggers_unregister) from [<c0531d44>] (phy_detach+0x30/0x74) [<c0531d44>] (phy_detach) from [<c0538bdc>] (sh_eth_close+0x64/0x9c) [<c0538bdc>] (sh_eth_close) from [<c04d4ce0>] (dpm_run_callback+0x48/0xc8) or: list_del corruption. prev->next should be dede6540, but was 2e323931 ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:52! ... [<c02f6d70>] (__list_del_entry_valid) from [<c0425168>] (led_trigger_unregister+0x34/0xcc) [<c0425168>] (led_trigger_unregister) from [<c03a05a0>] (phy_led_triggers_unregister+0x28/0x34) [<c03a05a0>] (phy_led_triggers_unregister) from [<c039ec04>] (phy_detach+0x30/0x74) [<c039ec04>] (phy_detach) from [<c03a4fc0>] (sh_eth_close+0x6c/0xa4) [<c03a4fc0>] (sh_eth_close) from [<c0483234>] (__dev_close_many+0xac/0xd0) As the only clue was a kernel message like sh-eth ee700000.ethernet eth0: No phy led trigger registered for speed(100) I had to bisected this, leading to commit 4567d686 ("phy: increase size of MII_BUS_ID_SIZE and bus_id"). Reverting that commit fixed the issue. More investigation revealed the crashes are due to the combination of two things: - Truncated LED trigger names, leading to duplicate names, and registration failures, - Bad error handling in case of registration failures. Both are fixed by this patch series. Changes compared to v1: - Add Reviewed-by, - New patch "net: phy: leds: Break dependency of phy.h on phy_led_triggers.h", - Drop moving the include of <linux/phy_led_triggers.h>, as <linux/phy.h> no longer includes it, - #include <linux/phy.h> from <linux/phy_led_triggers.h>. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geert Uytterhoeven authored
Commit 4567d686 ("phy: increase size of MII_BUS_ID_SIZE and bus_id") increased the size of MII bus IDs, but forgot to update the private definition in <linux/phy_led_triggers.h>. This may cause: 1. Truncation of LED trigger names, 2. Duplicate LED trigger names, 3. Failures registering LED triggers, 4. Crashes due to bad error handling in the LED trigger failure path. To fix this, and prevent the definitions going out of sync again in the future, let the PHY LED trigger code use the existing MII_BUS_ID_SIZE definition. Example: - Before I had triggers "ee700000.etherne:01:100Mbps" and "ee700000.etherne:01:10Mbps", - After the increase of MII_BUS_ID_SIZE, both became "ee700000.ethernet-ffffffff:01:" => FAIL, - Now, the triggers are "ee700000.ethernet-ffffffff:01:100Mbps" and "ee700000.ethernet-ffffffff:01:10Mbps", which are unique again. Fixes: 4567d686 ("phy: increase size of MII_BUS_ID_SIZE and bus_id") Fixes: 2e0bc452 ("net: phy: leds: add support for led triggers on phy link state change") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geert Uytterhoeven authored
<linux/phy.h> includes <linux/phy_led_triggers.h>, which is not really needed. Drop the include from <linux/phy.h>, and add it to all users that didn't include it explicitly. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geert Uytterhoeven authored
phy_attach_direct() ignores errors returned by phy_led_triggers_register(). I think that's OK, as LED triggers can be considered a non-critical feature. However, this causes problems later: - phy_led_trigger_change_speed() will access the array phy_device.phy_led_triggers, which has been freed in the error path of phy_led_triggers_register(), which may lead to a crash. - phy_led_triggers_unregister() will access the same array, leading to crashes during s2ram or poweroff, like: Unable to handle kernel NULL pointer dereference at virtual address 00000000 ... [<c04116d4>] (__list_del_entry_valid) from [<c05e8948>] (led_trigger_unregister+0x34/0xcc) [<c05e8948>] (led_trigger_unregister) from [<c05336c4>] (phy_led_triggers_unregister+0x28/0x34) [<c05336c4>] (phy_led_triggers_unregister) from [<c0531d44>] (phy_detach+0x30/0x74) [<c0531d44>] (phy_detach) from [<c0538bdc>] (sh_eth_close+0x64/0x9c) [<c0538bdc>] (sh_eth_close) from [<c04d4ce0>] (dpm_run_callback+0x48/0xc8) or: list_del corruption. prev->next should be dede6540, but was 2e323931 ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:52! ... [<c02f6d70>] (__list_del_entry_valid) from [<c0425168>] (led_trigger_unregister+0x34/0xcc) [<c0425168>] (led_trigger_unregister) from [<c03a05a0>] (phy_led_triggers_unregister+0x28/0x34) [<c03a05a0>] (phy_led_triggers_unregister) from [<c039ec04>] (phy_detach+0x30/0x74) [<c039ec04>] (phy_detach) from [<c03a4fc0>] (sh_eth_close+0x6c/0xa4) [<c03a4fc0>] (sh_eth_close) from [<c0483234>] (__dev_close_many+0xac/0xd0) To fix this, clear phy_device.phy_num_led_triggers in the error path of phy_led_triggers_register() fails. Note that the "No phy led trigger registered for speed" message will still be printed on link speed changes, which is a good cue that something went wrong with the LED triggers. Fixes: 2e0bc452 ("net: phy: leds: add support for led triggers on phy link state change") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Crispin authored
When the binding was defined, I was not aware that mt2701 was an earlier version of the SoC. For sake of consistency, the ethernet driver should use mt2701 inside the compat string as this is the earliest SoC with the ethernet core. The ethernet driver is currently of no real use until we finish and upstream the DSA driver. There are no users of this binding yet. It should be safe to fix this now before it is too late and we need to provide backward compatibility for the mt7623-eth compat string. Reported-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
John Crispin authored
When the binding was defined, I was not aware that mt2701 was an earlier version of the SoC. For sake of consistency, the ethernet driver should use mt2701 inside the compat string as this is the earliest SoC with the ethernet core. The ethernet driver is currently of no real use until we finish and upstream the DSA driver. There are no users of this binding yet. It should be safe to fix this now before it is too late and we need to provide backward compatibility for the mt7623-eth compat string. Reported-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: John Crispin <john@phrozen.org> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Michael Chan says: ==================== bnxt_en: Fix RTNL lock usage in bnxt_sp_task(). There are 2 function calls from bnxt_sp_task() that have buggy RTNL usage. These 2 functions take RTNL lock under some conditions, but some callers (such as open, ethtool) have already taken RTNL. These 3 patches fix the issue by making it clear that callers must take RTNL. If the caller is bnxt_sp_task() which does not automatically take RTNL, we add a common scheme for bnxt_sp_task() to call these functions properly under RTNL. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
bnxt_get_port_module_status() calls bnxt_update_link() which expects RTNL to be held. In bnxt_sp_task() that does not hold RTNL, we need to call it with a prior call to bnxt_rtnl_lock_sp() and the call needs to be moved to the end of bnxt_sp_task(). Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
bnxt_update_link() is called from multiple code paths. Most callers, such as open, ethtool, already hold RTNL. Only the caller bnxt_sp_task() does not. So it is a bug to take RTNL inside bnxt_update_link(). Fix it by removing the RTNL inside bnxt_update_link(). The function now expects the caller to always hold RTNL. In bnxt_sp_task(), call bnxt_rtnl_lock_sp() before calling bnxt_update_link(). We also need to move the call to the end of bnxt_sp_task() since it will be clearing the BNXT_STATE_IN_SP_TASK bit. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
In bnxt_sp_task(), we set a bit BNXT_STATE_IN_SP_TASK so that bnxt_close() will synchronize and wait for bnxt_sp_task() to finish. Some functions in bnxt_sp_task() require us to clear BNXT_STATE_IN_SP_TASK and then acquire rtnl_lock() to prevent race conditions. There are some bugs related to this logic. This patch refactors the code to have common bnxt_rtnl_lock_sp() and bnxt_rtnl_unlock_sp() to handle the RTNL and the clearing/setting of the bit. Multiple functions will need the same logic. We also need to move bnxt_reset() to the end of bnxt_sp_task(). Functions that clear BNXT_STATE_IN_SP_TASK must be the last functions to be called in bnxt_sp_task(). The common scheme will handle the condition properly. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jason Baron authored
sock_reset_flag() maps to __clear_bit() not the atomic version clear_bit(). Thus, we need smp_mb(), smp_mb__after_atomic() is not sufficient. Fixes: 3c715127 ("tcp: add memory barriers to write space paths") Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jason Baron <jbaron@akamai.com> Acked-by: Eric Dumazet <edumazet@google.com> Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
Now sctp gso puts segments into skb's frag_list, then processes these segments in skb_segment. But skb_segment handles them only when gs is enabled, as it's in the same branch with skb's frags. Although almost all the NICs support sg other than some old ones, but since commit 1e16aa3d ("net: gso: use feature flag argument in all protocol gso handlers"), features &= skb->dev->hw_enc_features, and xfrm_output_gso call skb_segment with features = 0, which means sctp gso would call skb_segment with sg = 0, and skb_segment would not work as expected. This patch is to fix it by setting features param with NETIF_F_SG when calling skb_segment so that it can go the right branch to process the skb's frag_list. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
sctp_addr_id2transport is a function for sockopt to look up assoc by address. As the address is from userspace, it can be a v4-mapped v6 address. But in sctp protocol stack, it always handles a v4-mapped v6 address as a v4 address. So it's necessary to convert it to a v4 address before looking up assoc by address. This patch is to fix it by calling sctp_verify_addr in which it can do this conversion before calling sctp_endpoint_lookup_assoc, just like what sctp_sendmsg and __sctp_connect do for the address from users. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 24 Jan, 2017 14 commits
-
-
David S. Miller authored
Robert Shearman says: ==================== net: Fix oops on state free after lwt module unload An oops is seen in lwtstate_free after an lwt ops module has been unloaded. This patchset fixes this by preventing modules implementing lwtunnel ops from being unloaded whilst there's state alive using those ops. The first patch adds fills in a new owner field in all lwt ops and the second patch makes use of this to reference count the modules as state is built and destroyed using them. Changes in v3: - don't put module reference if try_module_get fails on building state Changes in v2: - specify module owner for all modules as suggested by DaveM - reference count all modules building lwt state, not just those ops implementing destroy_state, as also suggested by DaveM. - rebased on top of David Ahern's lwtunnel changes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Robert Shearman authored
When attempting to free lwtunnel state after the module for the encap has been unloaded an oops occurs: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: lwtstate_free+0x18/0x40 [..] task: ffff88003e372380 task.stack: ffffc900001fc000 RIP: 0010:lwtstate_free+0x18/0x40 RSP: 0018:ffff88003fd83e88 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88002bbb3380 RCX: ffff88000c91a300 [..] Call Trace: <IRQ> free_fib_info_rcu+0x195/0x1a0 ? rt_fibinfo_free+0x50/0x50 rcu_process_callbacks+0x2d3/0x850 ? rcu_process_callbacks+0x296/0x850 __do_softirq+0xe4/0x4cb irq_exit+0xb0/0xc0 smp_apic_timer_interrupt+0x3d/0x50 apic_timer_interrupt+0x93/0xa0 [..] Code: e8 6e c6 fc ff 89 d8 5b 5d c3 bb de ff ff ff eb f4 66 90 66 66 66 66 90 55 48 89 e5 53 0f b7 07 48 89 fb 48 8b 04 c5 00 81 d5 81 <48> 8b 40 08 48 85 c0 74 13 ff d0 48 8d 7b 20 be 20 00 00 00 e8 The problem is after the module for the encap can be unloaded the corresponding ops is removed and is thus NULL here. Modules implementing lwtunnel ops should not be allowed to unload while there is state alive using those ops, so grab the module reference for the ops on creating lwtunnel state and of course release the reference when freeing the state. Fixes: 1104d9ba ("lwtunnel: Add destroy state operation") Signed-off-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Robert Shearman authored
Modules implementing lwtunnel ops should not be allowed to unload while there is state alive using those ops, so specify the owning module for all lwtunnel ops. Signed-off-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Parthasarathy Bhuvaragan says: ==================== tipc: topology server fixes for nametable soft lockup In this series, we revert the commit 333f7962 ("tipc: fix a race condition leading to subscriber refcnt bug") and provide an alternate solution to fix the race conditions in commits 2-4. We have to do this as the above commit introduced a nametbl soft lockup at module exit as described by patch#4. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Parthasarathy Bhuvaragan authored
In tipc_server_stop(), we iterate over the connections with limiting factor as server's idr_in_use. We ignore the fact that this variable is decremented in tipc_close_conn(), leading to premature exit. In this commit, we iterate until the we have no connections left. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: John Thompson <thompa.atl@gmail.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Parthasarathy Bhuvaragan authored
In tipc_conn_sendmsg(), we first queue the request to the outqueue followed by the connection state check. If the connection is not connected, we should not queue this message. In this commit, we reject the messages if the connection state is not CF_CONNECTED. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: John Thompson <thompa.atl@gmail.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Parthasarathy Bhuvaragan authored
Commit 333f7962 ("tipc: fix a race condition leading to subscriber refcnt bug") reveals a soft lockup while acquiring nametbl_lock. Before commit 333f7962, we call tipc_conn_shutdown() from tipc_close_conn() in the context of tipc_topsrv_stop(). In that context, we are allowed to grab the nametbl_lock. Commit 333f7962, moved tipc_conn_release (renamed from tipc_conn_shutdown) to the connection refcount cleanup. This allows either tipc_nametbl_withdraw() or tipc_topsrv_stop() to the cleanup. Since tipc_exit_net() first calls tipc_topsrv_stop() and then tipc_nametble_withdraw() increases the chances for the later to perform the connection cleanup. The soft lockup occurs in the call chain of tipc_nametbl_withdraw(), when it performs the tipc_conn_kref_release() as it tries to grab nametbl_lock again while holding it already. tipc_nametbl_withdraw() grabs nametbl_lock tipc_nametbl_remove_publ() tipc_subscrp_report_overlap() tipc_subscrp_send_event() tipc_conn_sendmsg() << if (con->flags != CF_CONNECTED) we do conn_put(), triggering the cleanup as refcount=0. >> tipc_conn_kref_release tipc_sock_release tipc_conn_release tipc_subscrb_delete tipc_subscrp_delete tipc_nametbl_unsubscribe << Soft Lockup >> The previous changes in this series fixes the race conditions fixed by commit 333f7962. Hence we can now revert the commit. Fixes: 333f7962 ("tipc: fix a race condition leading to subscriber refcnt bug") Reported-and-Tested-by: John Thompson <thompa.atl@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Parthasarathy Bhuvaragan authored
Until now, the generic server framework maintains the connection id's per subscriber in server's conn_idr. At tipc_close_conn, we remove the connection id from the server list, but the connection is valid until we call the refcount cleanup. Hence we have a window where the server allocates the same connection to an new subscriber leading to inconsistent reference count. We have another refcount warning we grab the refcount in tipc_conn_lookup() for connections with flag with CF_CONNECTED not set. This usually occurs at shutdown when the we stop the topology server and withdraw TIPC_CFG_SRV publication thereby triggering a withdraw message to subscribers. In this commit, we: 1. remove the connection from the server list at recount cleanup. 2. grab the refcount for a connection only if CF_CONNECTED is set. Tested-by: John Thompson <thompa.atl@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Parthasarathy Bhuvaragan authored
Until now, the subscribers keep track of the subscriptions using reference count at subscriber level. At subscription cancel or subscriber delete, we delete the subscription only if the timer was pending for the subscription. This approach is incorrect as: 1. del_timer() is not SMP safe, if on CPU0 the check for pending timer returns true but CPU1 might schedule the timer callback thereby deleting the subscription. Thus when CPU0 is scheduled, it deletes an invalid subscription. 2. We export tipc_subscrp_report_overlap(), which accesses the subscription pointer multiple times. Meanwhile the subscription timer can expire thereby freeing the subscription and we might continue to access the subscription pointer leading to memory violations. In this commit, we introduce subscription refcount to avoid deleting an invalid subscription. Reported-and-Tested-by: John Thompson <thompa.atl@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Parthasarathy Bhuvaragan authored
We trigger a soft lockup as we grab nametbl_lock twice if the node has a pending node up/down or link up/down event while: - we process an incoming named message in tipc_named_rcv() and perform an tipc_update_nametbl(). - we have pending backlog items in the name distributor queue during a nametable update using tipc_nametbl_publish() or tipc_nametbl_withdraw(). The following are the call chain associated: tipc_named_rcv() Grabs nametbl_lock tipc_update_nametbl() (publish/withdraw) tipc_node_subscribe()/unsubscribe() tipc_node_write_unlock() << lockup occurs if an outstanding node/link event exits, as we grabs nametbl_lock again >> tipc_nametbl_withdraw() Grab nametbl_lock tipc_named_process_backlog() tipc_update_nametbl() << rest as above >> The function tipc_node_write_unlock(), in addition to releasing the lock processes the outstanding node/link up/down events. To do this, we need to grab the nametbl_lock again leading to the lockup. In this commit we fix the soft lockup by introducing a fast variant of node_unlock(), where we just release the lock. We adapt the node_subscribe()/node_unsubscribe() to use the fast variants. Reported-and-Tested-by: John Thompson <thompa.atl@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pablo Neira Ayuso authored
Add missing set->ndeact update on each deactivated element from the set flush path. Otherwise, sets with fixed size break after flush since accounting breaks. # nft add set x y { type ipv4_addr\; size 2\; } # nft add element x y { 1.1.1.1 } # nft add element x y { 1.1.1.2 } # nft flush set x y # nft add element x y { 1.1.1.1 } <cmdline>:1:1-28: Error: Could not process rule: Too many open files in system Fixes: 8411b644 ("netfilter: nf_tables: support for set flushing") Reported-by: Elise Lennion <elise.lennion@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Pablo Neira Ayuso authored
The flush operation needs to modify set and element objects, so let's deconstify this. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Pablo Neira Ayuso authored
If the element exists and no NLM_F_EXCL is specified, do not bump set->nelems, otherwise we leak one set element slot. This problem amplifies if the set is full since the abort path always decrements the counter for the -ENFILE case too, giving one spare extra slot. Fix this by moving set->nelems update to nft_add_set_elem() after successful element insertion. Moreover, remove the element if the set is full so there is no need to rely on the abort path to undo things anymore. Fixes: c016c7e4 ("netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Liping Zhang authored
First, log prefix will be truncated to NF_LOG_PREFIXLEN-1, i.e. 127, at nf_log_packet(), so the extra part is useless. Second, after adding a log rule with a very very long prefix, we will fail to dump the nft rules after this _special_ one, but acctually, they do exist. For example: # name_65000=$(printf "%0.sQ" {1..65000}) # nft add rule filter output log prefix "$name_65000" # nft add rule filter output counter # nft add rule filter output counter # nft list chain filter output table ip filter { chain output { type filter hook output priority 0; policy accept; } } So now, restrict the log prefix length to NF_LOG_PREFIXLEN-1. Fixes: 96518518 ("netfilter: add nftables") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-