- 28 Nov, 2018 1 commit
-
-
Taehee Yoo authored
There is no expression deactivation call from the rule replacement path, hence, chain counter is not decremented. A few steps to reproduce the problem: %nft add table ip filter %nft add chain ip filter c1 %nft add chain ip filter c1 %nft add rule ip filter c1 jump c2 %nft replace rule ip filter c1 handle 3 accept %nft flush ruleset <jump c2> expression means immediate NFT_JUMP to chain c2. Reference count of chain c2 is increased when the rule is added. When rule is deleted or replaced, the reference counter of c2 should be decreased via nft_rule_expr_deactivate() which calls nft_immediate_deactivate(). Splat looks like: [ 214.396453] WARNING: CPU: 1 PID: 21 at net/netfilter/nf_tables_api.c:1432 nf_tables_chain_destroy.isra.38+0x2f9/0x3a0 [nf_tables] [ 214.398983] Modules linked in: nf_tables nfnetlink [ 214.398983] CPU: 1 PID: 21 Comm: kworker/1:1 Not tainted 4.20.0-rc2+ #44 [ 214.398983] Workqueue: events nf_tables_trans_destroy_work [nf_tables] [ 214.398983] RIP: 0010:nf_tables_chain_destroy.isra.38+0x2f9/0x3a0 [nf_tables] [ 214.398983] Code: 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 8e 00 00 00 48 8b 7b 58 e8 e1 2c 4e c6 48 89 df e8 d9 2c 4e c6 eb 9a <0f> 0b eb 96 0f 0b e9 7e fe ff ff e8 a7 7e 4e c6 e9 a4 fe ff ff e8 [ 214.398983] RSP: 0018:ffff8881152874e8 EFLAGS: 00010202 [ 214.398983] RAX: 0000000000000001 RBX: ffff88810ef9fc28 RCX: ffff8881152876f0 [ 214.398983] RDX: dffffc0000000000 RSI: 1ffff11022a50ede RDI: ffff88810ef9fc78 [ 214.398983] RBP: 1ffff11022a50e9d R08: 0000000080000000 R09: 0000000000000000 [ 214.398983] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff11022a50eba [ 214.398983] R13: ffff888114446e08 R14: ffff8881152876f0 R15: ffffed1022a50ed6 [ 214.398983] FS: 0000000000000000(0000) GS:ffff888116400000(0000) knlGS:0000000000000000 [ 214.398983] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 214.398983] CR2: 00007fab9bb5f868 CR3: 000000012aa16000 CR4: 00000000001006e0 [ 214.398983] Call Trace: [ 214.398983] ? nf_tables_table_destroy.isra.37+0x100/0x100 [nf_tables] [ 214.398983] ? __kasan_slab_free+0x145/0x180 [ 214.398983] ? nf_tables_trans_destroy_work+0x439/0x830 [nf_tables] [ 214.398983] ? kfree+0xdb/0x280 [ 214.398983] nf_tables_trans_destroy_work+0x5f5/0x830 [nf_tables] [ ... ] Fixes: bb7b40ae ("netfilter: nf_tables: bogus EBUSY in chain deletions") Reported by: Christoph Anton Mitterer <calestyo@scientia.net> Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914505 Link: https://bugzilla.kernel.org/show_bug.cgi?id=201791Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 26 Nov, 2018 6 commits
-
-
Taehee Yoo authored
All lists that reach the tree_nodes_free() function have both zero counter and true dead flag. The reason for this is that lists to be release are selected by nf_conncount_gc_list() which already decrements the list counter and sets on the dead flag. Therefore, this if statement in tree_nodes_free() is unnecessary and wrong. Fixes: 31568ec0 ("netfilter: nf_conncount: fix list_del corruption in conn_free") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Taehee Yoo authored
There is a reference counter to ensure that masquerade modules register notifiers only once. However, the existing reference counter approach is not safe, test commands are: while : do modprobe ip6t_MASQUERADE & modprobe nft_masq_ipv6 & modprobe -rv ip6t_MASQUERADE & modprobe -rv nft_masq_ipv6 & done numbers below represent the reference counter. -------------------------------------------------------- CPU0 CPU1 CPU2 CPU3 CPU4 [insmod] [insmod] [rmmod] [rmmod] [insmod] -------------------------------------------------------- 0->1 register 1->2 returns 2->1 returns 1->0 0->1 register <-- unregister -------------------------------------------------------- The unregistation of CPU3 should be processed before the registration of CPU4. In order to fix this, use a mutex instead of reference counter. splat looks like: [ 323.869557] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:1381] [ 323.869574] Modules linked in: nf_tables(+) nf_nat_ipv6(-) nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 n] [ 323.869574] irq event stamp: 194074 [ 323.898930] hardirqs last enabled at (194073): [<ffffffff90004a0d>] trace_hardirqs_on_thunk+0x1a/0x1c [ 323.898930] hardirqs last disabled at (194074): [<ffffffff90004a29>] trace_hardirqs_off_thunk+0x1a/0x1c [ 323.898930] softirqs last enabled at (182132): [<ffffffff922006ec>] __do_softirq+0x6ec/0xa3b [ 323.898930] softirqs last disabled at (182109): [<ffffffff90193426>] irq_exit+0x1a6/0x1e0 [ 323.898930] CPU: 0 PID: 1381 Comm: modprobe Not tainted 4.20.0-rc2+ #27 [ 323.898930] RIP: 0010:raw_notifier_chain_register+0xea/0x240 [ 323.898930] Code: 3c 03 0f 8e f2 00 00 00 44 3b 6b 10 7f 4d 49 bc 00 00 00 00 00 fc ff df eb 22 48 8d 7b 10 488 [ 323.898930] RSP: 0018:ffff888101597218 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13 [ 323.898930] RAX: 0000000000000000 RBX: ffffffffc04361c0 RCX: 0000000000000000 [ 323.898930] RDX: 1ffffffff26132ae RSI: ffffffffc04aa3c0 RDI: ffffffffc04361d0 [ 323.898930] RBP: ffffffffc04361c8 R08: 0000000000000000 R09: 0000000000000001 [ 323.898930] R10: ffff8881015972b0 R11: fffffbfff26132c4 R12: dffffc0000000000 [ 323.898930] R13: 0000000000000000 R14: 1ffff110202b2e44 R15: ffffffffc04aa3c0 [ 323.898930] FS: 00007f813ed41540(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000 [ 323.898930] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 323.898930] CR2: 0000559bf2c9f120 CR3: 000000010bc80000 CR4: 00000000001006f0 [ 323.898930] Call Trace: [ 323.898930] ? atomic_notifier_chain_register+0x2d0/0x2d0 [ 323.898930] ? down_read+0x150/0x150 [ 323.898930] ? sched_clock_cpu+0x126/0x170 [ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables] [ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables] [ 323.898930] register_netdevice_notifier+0xbb/0x790 [ 323.898930] ? __dev_close_many+0x2d0/0x2d0 [ 323.898930] ? __mutex_unlock_slowpath+0x17f/0x740 [ 323.898930] ? wait_for_completion+0x710/0x710 [ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables] [ 323.898930] ? up_write+0x6c/0x210 [ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables] [ 324.127073] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables] [ 324.127073] nft_chain_filter_init+0x1e/0xe8a [nf_tables] [ 324.127073] nf_tables_module_init+0x37/0x92 [nf_tables] [ ... ] Fixes: 8dd33cc9 ("netfilter: nf_nat: generalize IPv4 masquerading support for nf_tables") Fixes: be6b635c ("netfilter: nf_nat: generalize IPv6 masquerading support for nf_tables") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Taehee Yoo authored
register_{netdevice/inetaddr/inet6addr}_notifier may return an error value, this patch adds the code to handle these error paths. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Alin Nastac authored
When ip6_route_me_harder is invoked, it resets outgoing interface of: - link-local scoped packets sent by neighbor discovery - multicast packets sent by MLD host - multicast packets send by MLD proxy daemon that sets outgoing interface through IPV6_PKTINFO ipi6_ifindex Link-local and multicast packets must keep their original oif after ip6_route_me_harder is called. Signed-off-by: Alin Nastac <alin.nastac@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
syzbot was able to trigger the WARN in cttimeout_default_get() by passing UDPLITE as l4protocol. Alias UDPLITE to UDP, both use same timeout values. Furthermore, also fetch GRE timeouts. GRE is a bit more complicated, as it still can be a module and its netns_proto_gre struct layout isn't visible outside of the gre module. Can't move timeouts around, it appears conntrack sysctl unregister assumes net_generic() returns nf_proto_net, so we get crash. Expose layout of netns_proto_gre instead. A followup nf-next patch could make gre tracker be built-in as well if needed, its not that large. Last, make the WARN() mention the missing protocol value in case anything else is missing. Reported-by: syzbot+2fae8fa157dd92618cae@syzkaller.appspotmail.com Fixes: 8866df92 ("netfilter: nfnetlink_cttimeout: pass default timeout policy to obj_to_nlattr") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Xin Long authored
ip_vs_dst_event is supposed to clean up all dst used in ipvs' destinations when a net dev is going down. But it works only when the dst's dev is the same as the dev from the event. Now with the same priority but late registration, ip_vs_dst_notifier is always called later than ipv6_dev_notf where the dst's dev is set to lo for NETDEV_DOWN event. As the dst's dev lo is not the same as the dev from the event in ip_vs_dst_event, ip_vs_dst_notifier doesn't actually work. Also as these dst have to wait for dest_trash_timer to clean them up. It would cause some non-permanent kernel warnings: unregister_netdevice: waiting for br0 to become free. Usage count = 3 To fix it, call ip_vs_dst_notifier earlier than ipv6_dev_notf by increasing its priority to ADDRCONF_NOTIFY_PRIORITY + 5. Note that for ipv4 route fib_netdev_notifier doesn't set dst's dev to lo in NETDEV_DOWN event, so this fix is only needed when IP_VS_IPV6 is defined. Fixes: 7a4f0761 ("IPVS: init and cleanup restructuring") Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 17 Nov, 2018 1 commit
-
-
Taehee Yoo authored
In the htable_create(), hinfo is allocated by vmalloc() So that if error occurred, hinfo should be freed. Fixes: 11d5f157 ("netfilter: xt_hashlimit: Create revision 2 to support higher pps rates") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 13 Nov, 2018 2 commits
-
-
Florian Westphal authored
nft_compat ops do not have static storage duration, unlike all other expressions. When nf_tables_expr_destroy() returns, expr->ops might have been free'd already, so we need to store next address before calling expression destructor. For same reason, we can't deref match pointer after nft_xt_put(). This can be easily reproduced by adding msleep() before nft_match_destroy() returns. Fixes: 0ca743a5 ("netfilter: nf_tables: add compatibility layer for x_tables") Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Taehee Yoo authored
xt_rateest_net_exit() was added to check whether rules are flushed successfully. but ->net_exit() callback is called earlier than ->destroy() callback. So that ->net_exit() callback can't check that. test commands: %ip netns add vm1 %ip netns exec vm1 iptables -t mangle -I PREROUTING -p udp \ --dport 1111 -j RATEEST --rateest-name ap \ --rateest-interval 250ms --rateest-ewma 0.5s %ip netns del vm1 splat looks like: [ 668.813518] WARNING: CPU: 0 PID: 87 at net/netfilter/xt_RATEEST.c:210 xt_rateest_net_exit+0x210/0x340 [xt_RATEEST] [ 668.813518] Modules linked in: xt_RATEEST xt_tcpudp iptable_mangle bpfilter ip_tables x_tables [ 668.813518] CPU: 0 PID: 87 Comm: kworker/u4:2 Not tainted 4.19.0-rc7+ #21 [ 668.813518] Workqueue: netns cleanup_net [ 668.813518] RIP: 0010:xt_rateest_net_exit+0x210/0x340 [xt_RATEEST] [ 668.813518] Code: 00 48 8b 85 30 ff ff ff 4c 8b 23 80 38 00 0f 85 24 01 00 00 48 8b 85 30 ff ff ff 4d 85 e4 4c 89 a5 58 ff ff ff c6 00 f8 74 b2 <0f> 0b 48 83 c3 08 4c 39 f3 75 b0 48 b8 00 00 00 00 00 fc ff df 49 [ 668.813518] RSP: 0018:ffff8801156c73f8 EFLAGS: 00010282 [ 668.813518] RAX: ffffed0022ad8e85 RBX: ffff880118928e98 RCX: 5db8012a00000000 [ 668.813518] RDX: ffff8801156c7428 RSI: 00000000cb1d185f RDI: ffff880115663b74 [ 668.813518] RBP: ffff8801156c74d0 R08: ffff8801156633c0 R09: 1ffff100236440be [ 668.813518] R10: 0000000000000001 R11: ffffed002367d852 R12: ffff880115142b08 [ 668.813518] R13: 1ffff10022ad8e81 R14: ffff880118928ea8 R15: dffffc0000000000 [ 668.813518] FS: 0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000 [ 668.813518] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 668.813518] CR2: 0000563aa69f4f28 CR3: 0000000105a16000 CR4: 00000000001006f0 [ 668.813518] Call Trace: [ 668.813518] ? unregister_netdevice_many+0xe0/0xe0 [ 668.813518] ? xt_rateest_net_init+0x2c0/0x2c0 [xt_RATEEST] [ 668.813518] ? default_device_exit+0x1ca/0x270 [ 668.813518] ? remove_proc_entry+0x1cd/0x390 [ 668.813518] ? dev_change_net_namespace+0xd00/0xd00 [ 668.813518] ? __init_waitqueue_head+0x130/0x130 [ 668.813518] ops_exit_list.isra.10+0x94/0x140 [ 668.813518] cleanup_net+0x45b/0x900 [ 668.813518] ? net_drop_ns+0x110/0x110 [ 668.813518] ? swapgs_restore_regs_and_return_to_usermode+0x3c/0x80 [ 668.813518] ? save_trace+0x300/0x300 [ 668.813518] ? lock_acquire+0x196/0x470 [ 668.813518] ? lock_acquire+0x196/0x470 [ 668.813518] ? process_one_work+0xb60/0x1de0 [ 668.813518] ? _raw_spin_unlock_irq+0x29/0x40 [ 668.813518] ? _raw_spin_unlock_irq+0x29/0x40 [ 668.813518] ? __lock_acquire+0x4500/0x4500 [ 668.813518] ? __lock_is_held+0xb4/0x140 [ 668.813518] process_one_work+0xc13/0x1de0 [ 668.813518] ? pwq_dec_nr_in_flight+0x3c0/0x3c0 [ 668.813518] ? set_load_weight+0x270/0x270 [ ... ] Fixes: 3427b2ab ("netfilter: make xt_rateest hash table per net") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 12 Nov, 2018 6 commits
-
-
Florian Westphal authored
Its possible to set both HANDLE and POSITION when replacing a rule. In this case, the rule at POSITION gets replaced using the userspace-provided handle. Rule handles are supposed to be generated by the kernel only. Duplicate handles should be harmless, however better disable this "feature" by only checking for the POSITION attribute on insert operations. Fixes: 5e948466 ("netfilter: nf_tables: add insert operation") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
Start flood ping for each cpu while loading/flushing rulesets to make sure we do not access already-free'd rules from nf_tables evaluation loop. Also add this to TARGETS so 'make run_tests' in selftest dir runs it automatically. This would have caught the bug fixed in previous change ("netfilter: nf_tables: do not skip inactive chains during generation update") sooner. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
There is no synchronization between packet path and the configuration plane. The packet path uses two arrays with rules, one contains the current (active) generation. The other either contains the last (obsolete) generation or the future one. Consider: cpu1 cpu2 nft_do_chain(c); delete c net->gen++; genbit = !!net->gen; rules = c->rg[genbit]; cpu1 ignores c when updating if c is not active anymore in the new generation. On cpu2, we now use rules from wrong generation, as c->rg[old] contains the rules matching 'c' whereas c->rg[new] was not updated and can even point to rules that have been free'd already, causing a crash. To fix this, make sure that 'current' to the 'next' generation are identical for chains that are going away so that c->rg[new] will just use the matching rules even if genbit was incremented already. Fixes: 0cbc06b3 ("netfilter: nf_tables: remove synchronize_rcu in commit phase") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Taehee Yoo authored
When list->count is 0, the list is deleted by GC. But list->count is never reached 0 because initial count value is 1 and it is increased when node is inserted. So that initial value of list->count should be 0. Originally GC always finds zero count list through deleting node and decreasing count. However, list may be left empty since node insertion may fail eg. allocaton problem. In order to solve this problem, GC routine also finds zero count list without deleting node. Fixes: cb2b36f5 ("netfilter: nf_conncount: Switch to plain list") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Taehee Yoo authored
nf_conncount_tuple is an element of nft_connlimit and that is deleted by conn_free(). Elements can be deleted by both GC routine and data path functions (nf_conncount_lookup, nf_conncount_add) and they call conn_free() to free elements. But conn_free() only protects lists, not each element. So that list_del corruption could occurred. The conn_free() doesn't check whether element is already deleted. In order to protect elements, dead flag is added. If an element is deleted, dead flag is set. The only conn_free() can delete elements so that both list lock and dead flag are enough to protect it. test commands: %nft add table ip filter %nft add chain ip filter input { type filter hook input priority 0\; } %nft add rule filter input meter test { ip id ct count over 2 } counter splat looks like: [ 1779.495778] list_del corruption, ffff8800b6e12008->prev is LIST_POISON2 (dead000000000200) [ 1779.505453] ------------[ cut here ]------------ [ 1779.506260] kernel BUG at lib/list_debug.c:50! [ 1779.515831] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 1779.516772] CPU: 0 PID: 33 Comm: kworker/0:2 Not tainted 4.19.0-rc6+ #22 [ 1779.516772] Workqueue: events_power_efficient nft_rhash_gc [nf_tables_set] [ 1779.516772] RIP: 0010:__list_del_entry_valid+0xd8/0x150 [ 1779.516772] Code: 39 48 83 c4 08 b8 01 00 00 00 5b 5d c3 48 89 ea 48 c7 c7 00 c3 5b 98 e8 0f dc 40 ff 0f 0b 48 c7 c7 60 c3 5b 98 e8 01 dc 40 ff <0f> 0b 48 c7 c7 c0 c3 5b 98 e8 f3 db 40 ff 0f 0b 48 c7 c7 20 c4 5b [ 1779.516772] RSP: 0018:ffff880119127420 EFLAGS: 00010286 [ 1779.516772] RAX: 000000000000004e RBX: dead000000000200 RCX: 0000000000000000 [ 1779.516772] RDX: 000000000000004e RSI: 0000000000000008 RDI: ffffed0023224e7a [ 1779.516772] RBP: ffff88011934bc10 R08: ffffed002367cea9 R09: ffffed002367cea9 [ 1779.516772] R10: 0000000000000001 R11: ffffed002367cea8 R12: ffff8800b6e12008 [ 1779.516772] R13: ffff8800b6e12010 R14: ffff88011934bc20 R15: ffff8800b6e12008 [ 1779.516772] FS: 0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000 [ 1779.516772] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1779.516772] CR2: 00007fc876534010 CR3: 000000010da16000 CR4: 00000000001006f0 [ 1779.516772] Call Trace: [ 1779.516772] conn_free+0x9f/0x2b0 [nf_conncount] [ 1779.516772] ? nf_ct_tmpl_alloc+0x2a0/0x2a0 [nf_conntrack] [ 1779.516772] ? nf_conncount_add+0x520/0x520 [nf_conncount] [ 1779.516772] ? do_raw_spin_trylock+0x1a0/0x1a0 [ 1779.516772] ? do_raw_spin_trylock+0x10/0x1a0 [ 1779.516772] find_or_evict+0xe5/0x150 [nf_conncount] [ 1779.516772] nf_conncount_gc_list+0x162/0x360 [nf_conncount] [ 1779.516772] ? nf_conncount_lookup+0xee0/0xee0 [nf_conncount] [ 1779.516772] ? _raw_spin_unlock_irqrestore+0x45/0x50 [ 1779.516772] ? trace_hardirqs_off+0x6b/0x220 [ 1779.516772] ? trace_hardirqs_on_caller+0x220/0x220 [ 1779.516772] nft_rhash_gc+0x16b/0x540 [nf_tables_set] [ ... ] Fixes: 5c789e13 ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Taehee Yoo authored
conn_free() holds lock with spin_lock() and it is called by both nf_conncount_lookup() and nf_conncount_gc_list(). nf_conncount_lookup() is called from bottom-half context and nf_conncount_gc_list() from process context. So that spin_lock() call is not safe. Hence conn_free() should use spin_lock_bh() instead of spin_lock(). test commands: %nft add table ip filter %nft add chain ip filter input { type filter hook input priority 0\; } %nft add rule filter input meter test { ip saddr ct count over 2 } \ counter splat looks like: [ 461.996507] ================================ [ 461.998999] WARNING: inconsistent lock state [ 461.998999] 4.19.0-rc6+ #22 Not tainted [ 461.998999] -------------------------------- [ 461.998999] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. [ 461.998999] kworker/0:2/134 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 461.998999] 00000000a71a559a (&(&list->list_lock)->rlock){+.?.}, at: conn_free+0x69/0x2b0 [nf_conncount] [ 461.998999] {IN-SOFTIRQ-W} state was registered at: [ 461.998999] _raw_spin_lock+0x30/0x70 [ 461.998999] nf_conncount_add+0x28a/0x520 [nf_conncount] [ 461.998999] nft_connlimit_eval+0x401/0x580 [nft_connlimit] [ 461.998999] nft_dynset_eval+0x32b/0x590 [nf_tables] [ 461.998999] nft_do_chain+0x497/0x1430 [nf_tables] [ 461.998999] nft_do_chain_ipv4+0x255/0x330 [nf_tables] [ 461.998999] nf_hook_slow+0xb1/0x160 [ ... ] [ 461.998999] other info that might help us debug this: [ 461.998999] Possible unsafe locking scenario: [ 461.998999] [ 461.998999] CPU0 [ 461.998999] ---- [ 461.998999] lock(&(&list->list_lock)->rlock); [ 461.998999] <Interrupt> [ 461.998999] lock(&(&list->list_lock)->rlock); [ 461.998999] [ 461.998999] *** DEADLOCK *** [ 461.998999] [ ... ] Fixes: 5c789e13 ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 11 Nov, 2018 18 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds authored
Pull networking fixes from David Miller: "One last pull request before heading to Vancouver for LPC, here we have: 1) Don't forget to free VSI contexts during ice driver unload, from Victor Raj. 2) Don't forget napi delete calls during device remove in ice driver, from Dave Ertman. 3) Don't request VLAN tag insertion of ibmvnic device when SKB doesn't have VLAN tags at all. 4) IPV4 frag handling code has to accomodate the situation where two threads try to insert the same fragment into the hash table at the same time. From Eric Dumazet. 5) Relatedly, don't flow separate on protocol ports for fragmented frames, also from Eric Dumazet. 6) Memory leaks in qed driver, from Denis Bolotin. 7) Correct valid MTU range in smsc95xx driver, from Stefan Wahren. 8) Validate cls_flower nested policies properly, from Jakub Kicinski. 9) Clearing of stats counters in mc88e6xxx driver doesn't retain important bits in the G1_STATS_OP register causing the chip to hang. Fix from Andrew Lunn" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits) act_mirred: clear skb->tstamp on redirect net: dsa: mv88e6xxx: Fix clearing of stats counters tipc: fix link re-establish failure net: sched: cls_flower: validate nested enc_opts_policy to avoid warning net: mvneta: correct typo flow_dissector: do not dissect l4 ports for fragments net: qualcomm: rmnet: Fix incorrect assignment of real_dev net: aquantia: allow rx checksum offload configuration net: aquantia: invalid checksumm offload implementation net: aquantia: fixed enable unicast on 32 macvlan net: aquantia: fix potential IOMMU fault after driver unbind net: aquantia: synchronized flow control between mac/phy net: smsc95xx: Fix MTU range net: stmmac: Fix RX packet size > 8191 qed: Fix potential memory corruption qed: Fix SPQ entries not returned to pool in error flows qed: Fix blocking/unlimited SPQ entries leak qed: Fix memory/entry leak in qed_init_sp_request() inet: frags: better deal with smp races net: hns3: bugfix for not checking return value ...
-
Linus Torvalds authored
Merge tag 'kbuild-fixes-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - fix build errors in binrpm-pkg and bindeb-pkg targets - fix false positive matches in merge_config.sh - fix build version mismatch in deb-pkg target - fix dtbs_install handling in (bin)deb-pkg target - revert a commit that allows setlocalversion to write to source tree * tag 'kbuild-fixes-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: builddeb: Fix inclusion of dtbs in debian package Revert "scripts/setlocalversion: git: Make -dirty check more robust" kbuild: deb-pkg: fix too low build version number kconfig: merge_config: avoid false positive matches from comment lines kbuild: deb-pkg: fix bindeb-pkg breakage when O= is used kbuild: rpm-pkg: fix binrpm-pkg breakage when O= is used
-
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds authored
Pull btrfs fixes from David Sterba: "Several fixes to recent release (4.19, fixes tagged for stable) and other fixes" * tag 'for-4.20-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix missing delayed iputs on unmount Btrfs: fix data corruption due to cloning of eof block Btrfs: fix infinite loop on inode eviction after deduplication of eof block Btrfs: fix deadlock on tree root leaf when finding free extent btrfs: avoid link error with CONFIG_NO_AUTO_INLINE btrfs: tree-checker: Fix misleading group system information Btrfs: fix missing data checksums after a ranged fsync (msync) btrfs: fix pinned underflow after transaction aborted Btrfs: fix cur_offset in the error case for nocow
-
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4Linus Torvalds authored
Pull ext4 fixes from Ted Ts'o: "A large number of ext4 bug fixes, mostly buffer and memory leaks on error return cleanup paths" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: missing !bh check in ext4_xattr_inode_write() ext4: fix buffer leak in __ext4_read_dirblock() on error path ext4: fix buffer leak in ext4_expand_extra_isize_ea() on error path ext4: fix buffer leak in ext4_xattr_move_to_block() on error path ext4: release bs.bh before re-using in ext4_xattr_block_find() ext4: fix buffer leak in ext4_xattr_get_block() on error path ext4: fix possible leak of s_journal_flag_rwsem in error path ext4: fix possible leak of sbi->s_group_desc_leak in error path ext4: remove unneeded brelse call in ext4_xattr_inode_update_ref() ext4: avoid possible double brelse() in add_new_gdb() on error path ext4: avoid buffer leak in ext4_orphan_add() after prior errors ext4: avoid buffer leak on shutdown in ext4_mark_iloc_dirty() ext4: fix possible inode leak in the retry loop of ext4_resize_fs() ext4: fix missing cleanup if ext4_alloc_flex_bg_array() fails while resizing ext4: add missing brelse() update_backups()'s error path ext4: add missing brelse() add_new_gdb_meta_bg()'s error path ext4: add missing brelse() in set_flexbg_block_bitmap()'s error path ext4: avoid potential extra brelse in setup_new_flex_group_blocks()
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fixes from Thomas Gleixner: "A set of x86 fixes: - Cure the LDT remapping to user space on 5 level paging which ended up in the KASLR space - Remove LDT mapping before freeing the LDT pages - Make NFIT MCE handling more robust - Unbreak the VSMP build by removing the dependency on paravirt ops - Support broken PIT emulation on Microsoft hyperV - Don't trace vmware_sched_clock() to avoid tracer recursion - Remove -pipe from KBUILD CFLAGS which breaks clang and is also slower on GCC - Trivial coding style and typo fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/vmware: Do not trace vmware_sched_clock() x86/vsmp: Remove dependency on pv_irq_ops x86/ldt: Remove unused variable in map_ldt_struct() x86/ldt: Unmap PTEs for the slot before freeing LDT pages x86/mm: Move LDT remap out of KASLR region on 5-level paging acpi/nfit, x86/mce: Validate a MCE's address before using it acpi/nfit, x86/mce: Handle only uncorrectable machine checks x86/build: Remove -pipe from KBUILD_CFLAGS x86/hyper-v: Fix indentation in hv_do_fast_hypercall16() Documentation/x86: Fix typo in zero-page.txt x86/hyper-v: Enable PIT shutdown quirk clockevents/drivers/i8253: Add support for PIT shutdown quirk
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull perf fixes from Thomas Gleixner: "A bunch of perf tooling fixes: - Make the Intel PT SQL viewer more robust - Make the Intel PT debug log more useful - Support weak groups in perf record so it's behaving the same way as perf stat - Display the LBR stats in callchain entries properly in perf top - Handle different PMu names with common prefix properlin in pert stat - Start syscall augmenting in perf trace. Preparation for architecture independent eBPF instrumentation of syscalls. - Fix build breakage in JVMTI perf lib - Fix arm64 tools build failure wrt smp_load_{acquire,release}" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tools: Do not zero sample_id_all for group members perf tools: Fix undefined symbol scnprintf in libperf-jvmti.so perf beauty: Use SRCARCH, ARCH=x86_64 must map to "x86" to find the headers perf intel-pt: Add MTC and CYC timestamps to debug log perf intel-pt: Add more event information to debug log perf scripts python: exported-sql-viewer.py: Fix table find when table re-ordered perf scripts python: exported-sql-viewer.py: Add help window perf scripts python: exported-sql-viewer.py: Add Selected branches report perf scripts python: exported-sql-viewer.py: Fall back to /usr/local/lib/libxed.so perf top: Display the LBR stats in callchain entry perf stat: Handle different PMU names with common prefix perf record: Support weak groups perf evlist: Move perf_evsel__reset_weak_group into evlist perf augmented_syscalls: Start collecting pathnames in the BPF program perf trace: Fix setting of augmented payload when using eBPF + raw_syscalls perf trace: When augmenting raw_syscalls plug raw_syscalls:sys_exit too perf examples bpf: Start augmenting raw_syscalls:sys_{start,exit} tools headers barrier: Fix arm64 tools build failure wrt smp_load_{acquire,release}
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull timer fix from Thomas Gleixner: "Just the removal of a redundant call into the sched deadline overrun check" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix-cpu-timers: Remove useless call to check_dl_overrun()
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull scheduler fixes from Thomas Gleixner: "Two small scheduler fixes: - Take hotplug lock in sched_init_smp(). Technically not really required, but lockdep will complain other. - Trivial comment fix in sched/fair" * 'sched/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Fix a comment in task_numa_fault() sched/core: Take the hotplug lock in sched_init_smp()
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull locking build fix from Thomas Gleixner: "A single fix for a build fail with CONFIG_PROFILE_ALL_BRANCHES=y in the qspinlock code" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/qspinlock: Fix compile error
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull core fixes from Thomas Gleixner: "A couple of fixlets for the core: - Kernel doc function documentation fixes - Missing prototypes for weak watchdog functions" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: resource/docs: Complete kernel-doc style function documentation watchdog/core: Add missing prototypes for weak functions resource/docs: Fix new kernel-doc warnings
-
Eric Dumazet authored
If sch_fq is used at ingress, skbs that might have been timestamped by net_timestamp_set() if a packet capture is requesting timestamps could be delayed by arbitrary amount of time, since sch_fq time base is MONOTONIC. Fix this problem by moving code from sch_netem.c to act_mirred.c. Fixes: fb420d5d ("tcp/fq: move back to CLOCK_MONOTONIC") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
The mv88e6161 would sometime fail to probe with a timeout waiting for the switch to complete an operation. This operation is supposed to clear the statistics counters. However, due to a read/modify/write, without the needed mask, the operation actually carried out was more random, with invalid parameters, resulting in the switch not responding. We need to preserve the histogram mode bits, so apply a mask to keep them. Reported-by: Chris Healy <Chris.Healy@zii.aero> Fixes: 40cff8fc ("net: dsa: mv88e6xxx: Fix stats histogram mode") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jon Maloy authored
When a link failure is detected locally, the link is reset, the flag link->in_session is set to false, and a RESET_MSG with the 'stopping' bit set is sent to the peer. The purpose of this bit is to inform the peer that this endpoint just is going down, and that the peer should handle the reception of this particular RESET message as a local failure. This forces the peer to accept another RESET or ACTIVATE message from this endpoint before it can re-establish the link. This again is necessary to ensure that link session numbers are properly exchanged before the link comes up again. If a failure is detected locally at the same time at the peer endpoint this will do the same, which is also a correct behavior. However, when receiving such messages, the endpoints will not distinguish between 'stopping' RESETs and ordinary ones when it comes to updating session numbers. Both endpoints will copy the received session number and set their 'in_session' flags to true at the reception, while they are still expecting another RESET from the peer before they can go ahead and re-establish. This is contradictory, since, after applying the validation check referred to below, the 'in_session' flag will cause rejection of all such messages, and the link will never come up again. We now fix this by not only handling received RESET/STOPPING messages as a local failure, but also by omitting to set a new session number and the 'in_session' flag in such cases. Fixes: 7ea817f4 ("tipc: check session number before accepting link protocol messages") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Rob Herring authored
Commit 37c8a5fa ("kbuild: consolidate Devicetree dtb build rules") moved the location of 'dtbs_install' target which caused dtbs to not be installed when building debian package with 'bindeb-pkg' target. Update the builddeb script to use the same logic that determines if there's a 'dtbs_install' target which is presence of the arch dts directory. Also, use CONFIG_OF_EARLY_FLATTREE instead of CONFIG_OF as that's a better indication of whether we are building dtbs. This commit will also have the side effect of installing dtbs on any arch that has dts files. Previously, it was dependent on whether the arch defined 'dtbs_install'. Fixes: 37c8a5fa ("kbuild: consolidate Devicetree dtb build rules") Reported-by: Nuno Gonçalves <nunojpg@gmail.com> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
-
Guenter Roeck authored
This reverts commit 6147b1cf. The reverted patch results in attempted write access to the source repository, even if that repository is mounted read-only. Output from "strace git status -uno --porcelain": getcwd("/tmp/linux-test", 129) = 16 open("/tmp/linux-test/.git/index.lock", O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = -1 EROFS (Read-only file system) While git appears to be able to handle this situation, a monitored build environment (such as the one used for Chrome OS kernel builds) may detect it and bail out with an access violation error. On top of that, the attempted write access suggests that git _will_ write to the file even if a build output directory is specified. Users may have the reasonable expectation that the source repository remains untouched in that situation. Fixes: 6147b1cf ("scripts/setlocalversion: git: Make -dirty check more robust" Cc: Genki Sky <sky@genki.is> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
-
Masahiro Yamada authored
Since commit b41d920a ("kbuild: deb-pkg: split generating packaging and build"), the build version of the kernel contained in a deb package is too low by 1. Prior to the bad commit, the kernel was built first, then the number in .version file was read out, and written into the debian control file. Now, the debian control file is created before the kernel is actually compiled, which is causing the version number mismatch. Let the mkdebian script pass KBUILD_BUILD_VERSION=${revision} to require the build system to use the specified version number. Fixes: b41d920a ("kbuild: deb-pkg: split generating packaging and build") Reported-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Tested-by: Doug Smythies <dsmythies@telus.net>
-
Masahiro Yamada authored
The current SED_CONFIG_EXP could match to comment lines in config fragment files, especially when CONFIG_PREFIX_ is empty. For example, Buildroot uses empty prefixing; starting symbols with BR2_ is just convention. Make the sed expression more robust against false positives from comment lines. The new sed expression matches to only valid patterns. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Petr Vorel <petr.vorel@gmail.com> Reviewed-by: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>
-
- 10 Nov, 2018 6 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/ttyLinus Torvalds authored
Pull tty/serial fixes from Greg KH: "Here are some small tty fixes for 4.20-rc2 One of these missed the original 4.19-final release, I missed that I hadn't done a pull request for it as it was in linux-next and my branch for a long time, that's my fault. The others are small, fixing some reported issues and finally fixing the termios mess for alpha so that glibc has a chance to implement some missing functionality that has been pending for many years now. All of these have been in linux-next with no reported issues" * tag 'tty-4.20-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: sh-sci: Fix could not remove dev_attr_rx_fifo_timeout arch/alpha, termios: implement BOTHER, IBSHIFT and termios2 termios, tty/tty_baudrate.c: fix buffer overrun vt: fix broken display when running aptitude serial: sh-sci: Fix receive on SCIFA/SCIFB variants with DMA
-
git://anongit.freedesktop.org/drm/drmLinus Torvalds authored
Pull drm fixes from Dave Airlie: "drm: i915, amdgpu, sun4i, exynos and etnaviv fixes: - amdgpu has some display fixes, KFD ioctl fixes and a Vega20 bios interaction fix. - sun4i has some NULL checks added - i915 has a 32-bit system fix, LPE audio oops, and HDMI2.0 clock fixes. - Exynos has a 3 regression fixes (one frame counter, fbdev missing, dsi->panel check) - Etnaviv has a single fencing fix for GPU recovery" * tag 'drm-fixes-2018-11-11' of git://anongit.freedesktop.org/drm/drm: (39 commits) drm/amd/amdgpu/dm: Fix dm_dp_create_fake_mst_encoder() drm/amd/display: Drop reusing drm connector for MST drm/amd/display: Cleanup MST non-atomic code workaround drm/amd/powerplay: always use fast UCLK switching when UCLK DPM enabled drm/amd/powerplay: set a default fclk/gfxclk ratio drm/amdgpu/display/dce11: only enable FBC when selected drm/amdgpu/display/dm: handle FBC dc feature parameter drm/amdgpu/display/dc: add FBC to dc_config drm/amdgpu: add DC feature mask module parameter drm/amdgpu/display: check if fbc is available in set_static_screen_control (v2) drm/amdgpu/vega20: add CLK base offset drm/amd/display: Stop leaking planes drm/amd/display: Fix misleading buffer information Revert "drm/amd/display: set backlight level limit to 1" drm/amd: Update atom_smu_info_v3_3 structure drm/i915: Fix ilk+ watermarks when disabling pipes drm/sun4i: tcon: prevent tcon->panel dereference if NULL drm/sun4i: tcon: fix check of tcon->panel null pointer drm/i915: Don't oops during modeset shutdown after lpe audio deinit drm/i915: Mark pin flags as u64 ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespaceLinus Torvalds authored
Pull namespace fixes from Eric Biederman: "I believe all of these are simple obviously correct bug fixes. These fall into two groups: - Fixing the implementation of MNT_LOCKED which prevents lesser privileged users from seeing unders mounts created by more privileged users. - Fixing the extended uid and group mapping in user namespaces. As well as ensuring the code looks correct I have spot tested these changes as well and in my testing the fixes are working" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: mount: Prevent MNT_DETACH from disconnecting locked mounts mount: Don't allow copying MNT_UNBINDABLE|MNT_LOCKED mounts mount: Retest MNT_LOCKED in do_umount userns: also map extents in the reverse map to kernel IDs
-
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linuxLinus Torvalds authored
Pull clk fixes from Stephen Boyd: "A small set of fixes for clk drivers. One to fix a DT refcount imbalance, two to mark some Amlogic clks as critical, and one final one that fixes a clk name for the Qualcomm driver merged this cycle" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: qcom: gcc: Fix board clock node name clk: meson: axg: mark fdiv2 and fdiv3 as critical clk: meson-gxbb: set fclk_div3 as CLK_IS_CRITICAL clk: fixed-factor: fix of_node_get-put imbalance
-
git://people.freedesktop.org/~agd5f/linuxDave Airlie authored
Fixes for 4.20: - DC MST fixes - DC FBC fix - Vega20 updates to support the latest vbios - KFD type fixes for ioctl headers Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181108035551.2904-1-alexander.deucher@amd.com
-
git://anongit.freedesktop.org/drm/drm-miscDave Airlie authored
- sun4i: tcon->panel NULL deref protections (Giulio) Cc: Giulio Benetti <giulio.benetti@micronovasrl.com> Signed-off-by: Dave Airlie <airlied@redhat.com> From: Sean Paul <sean@poorly.run> Link: https://patchwork.freedesktop.org/patch/msgid/20181107205051.GA27823@art_vandelay
-