1. 09 Feb, 2014 1 commit
    • David S. Miller's avatar
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · f41f0319
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/nftables/IPVS fixes for net
      
      The following patchset contains Netfilter/IPVS fixes, mostly nftables
      fixes, most relevantly they are:
      
      * Fix a crash in the h323 conntrack NAT helper due to expectation list
        corruption, from Alexey Dobriyan.
      
      * A couple of RCU race fixes for conntrack, one manifests by hitting BUG_ON
        in nf_nat_setup_info() and the destroy path, patches from Andrey Vagin and
        me.
      
      * Dump direction attribute in nft_ct only if it is set, from Arturo
        Borrero.
      
      * Fix IPVS bug in its own connection tracking system that may lead to
        copying only 4 bytes of the IPv6 address when initializing the
        ip_vs_conn object, from Michal Kubecek.
      
      * Fix -EBUSY errors in nftables when deleting the rules, chain and tables
        in a row due mixture of asynchronous and synchronous object releasing,
        from me.
      
      * Three fixes for the nf_tables set infrastructure when using intervals and
        mappings, from me.
      
      * Four patches to fixing the nf_tables log, reject and ct expressions from
        the new inet table, from Patrick McHardy.
      
      * Fix memory overrun in the map that is used to dynamically allocate names
        from anonymous sets, also from Patrick.
      
      * Fix a potential oops if you dump a set with NFPROTO_UNSPEC and a table
        name, from Patrick McHardy.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f41f0319
  2. 07 Feb, 2014 23 commits
  3. 06 Feb, 2014 6 commits
  4. 05 Feb, 2014 10 commits
    • Patrick McHardy's avatar
      netfilter: nf_tables: add AF specific expression support · 64d46806
      Patrick McHardy authored
      For the reject module, we need to add AF-specific implementations to
      get rid of incorrect module dependencies. Try to load an AF-specific
      module first and fall back to generic modules.
      Signed-off-by: default avatarPatrick McHardy <kaber@trash.net>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      64d46806
    • Patrick McHardy's avatar
      netfilter: nft_ct: fix missing NFT_CT_L3PROTOCOL key in validity checks · 51292c07
      Patrick McHardy authored
      The key was missing in the list of valid keys, add it.
      Signed-off-by: default avatarPatrick McHardy <kaber@trash.net>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      51292c07
    • Patrick McHardy's avatar
      netfilter: nf_tables: fix potential oops when dumping sets · ec2c9935
      Patrick McHardy authored
      Commit c9c8e485 (netfilter: nf_tables: dump sets in all existing families)
      changed nft_ctx_init_from_setattr() to only look up the address family if it
      is not NFPROTO_UNSPEC. However if it is NFPROTO_UNSPEC and a table attribute
      is given, nftables_afinfo_lookup() will dereference the NULL afi pointer.
      
      Fix by checking for non-NULL afi and also move a check added by that commit
      to the proper position.
      Signed-off-by: default avatarPatrick McHardy <kaber@trash.net>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      ec2c9935
    • Patrick McHardy's avatar
      netfilter: nf_tables: fix overrun in nf_tables_set_alloc_name() · 53b70287
      Patrick McHardy authored
      The map that is used to allocate anonymous sets is indeed
      BITS_PER_BYTE * PAGE_SIZE long.
      Signed-off-by: default avatarPatrick McHardy <kaber@trash.net>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      53b70287
    • Pablo Neira Ayuso's avatar
      netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt · e53376be
      Pablo Neira Ayuso authored
      With this patch, the conntrack refcount is initially set to zero and
      it is bumped once it is added to any of the list, so we fulfill
      Eric's golden rule which is that all released objects always have a
      refcount that equals zero.
      
      Andrey Vagin reports that nf_conntrack_free can't be called for a
      conntrack with non-zero ref-counter, because it can race with
      nf_conntrack_find_get().
      
      A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero
      ref-counter says that this conntrack is used. So when we release
      a conntrack with non-zero counter, we break this assumption.
      
      CPU1                                    CPU2
      ____nf_conntrack_find()
                                              nf_ct_put()
                                               destroy_conntrack()
                                              ...
                                              init_conntrack
                                               __nf_conntrack_alloc (set use = 1)
      atomic_inc_not_zero(&ct->use) (use = 2)
                                               if (!l4proto->new(ct, skb, dataoff, timeouts))
                                                nf_conntrack_free(ct); (use = 2 !!!)
                                              ...
                                              __nf_conntrack_alloc (set use = 1)
       if (!nf_ct_key_equal(h, tuple, zone))
        nf_ct_put(ct); (use = 0)
         destroy_conntrack()
                                              /* continue to work with CT */
      
      After applying the path "[PATCH] netfilter: nf_conntrack: fix RCU
      race in nf_conntrack_find_get" another bug was triggered in
      destroy_conntrack():
      
      <4>[67096.759334] ------------[ cut here ]------------
      <2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211!
      ...
      <4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G         C ---------------    2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB
      <4>[67096.759932] RIP: 0010:[<ffffffffa03d99ac>]  [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack]
      <4>[67096.760255] Call Trace:
      <4>[67096.760255]  [<ffffffff814844a7>] nf_conntrack_destroy+0x17/0x30
      <4>[67096.760255]  [<ffffffffa03d9bb5>] nf_conntrack_find_get+0x85/0x130 [nf_conntrack]
      <4>[67096.760255]  [<ffffffffa03d9fb2>] nf_conntrack_in+0x352/0xb60 [nf_conntrack]
      <4>[67096.760255]  [<ffffffffa048c771>] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4]
      <4>[67096.760255]  [<ffffffff81484419>] nf_iterate+0x69/0xb0
      <4>[67096.760255]  [<ffffffff814b5b00>] ? dst_output+0x0/0x20
      <4>[67096.760255]  [<ffffffff814845d4>] nf_hook_slow+0x74/0x110
      <4>[67096.760255]  [<ffffffff814b5b00>] ? dst_output+0x0/0x20
      <4>[67096.760255]  [<ffffffff814b66d5>] raw_sendmsg+0x775/0x910
      <4>[67096.760255]  [<ffffffff8104c5a8>] ? flush_tlb_others_ipi+0x128/0x130
      <4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
      <4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
      <4>[67096.760255]  [<ffffffff814c136a>] inet_sendmsg+0x4a/0xb0
      <4>[67096.760255]  [<ffffffff81444e93>] ? sock_sendmsg+0x13/0x140
      <4>[67096.760255]  [<ffffffff81444f97>] sock_sendmsg+0x117/0x140
      <4>[67096.760255]  [<ffffffff8102e299>] ? native_smp_send_reschedule+0x49/0x60
      <4>[67096.760255]  [<ffffffff81519beb>] ? _spin_unlock_bh+0x1b/0x20
      <4>[67096.760255]  [<ffffffff8109d930>] ? autoremove_wake_function+0x0/0x40
      <4>[67096.760255]  [<ffffffff814960f0>] ? do_ip_setsockopt+0x90/0xd80
      <4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
      <4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
      <4>[67096.760255]  [<ffffffff814457c9>] sys_sendto+0x139/0x190
      <4>[67096.760255]  [<ffffffff810efa77>] ? audit_syscall_entry+0x1d7/0x200
      <4>[67096.760255]  [<ffffffff810ef7c5>] ? __audit_syscall_exit+0x265/0x290
      <4>[67096.760255]  [<ffffffff81474daf>] compat_sys_socketcall+0x13f/0x210
      <4>[67096.760255]  [<ffffffff8104dea3>] ia32_sysret+0x0/0x5
      
      I have reused the original title for the RFC patch that Andrey posted and
      most of the original patch description.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Andrew Vagin <avagin@parallels.com>
      Cc: Florian Westphal <fw@strlen.de>
      Reported-by: default avatarAndrew Vagin <avagin@parallels.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarAndrew Vagin <avagin@parallels.com>
      e53376be
    • Alexey Dobriyan's avatar
      netfilter: nf_nat_h323: fix crash in nf_ct_unlink_expect_report() · 829d9315
      Alexey Dobriyan authored
      Similar bug fixed in SIP module in 3f509c68 ("netfilter: nf_nat_sip: fix
      incorrect handling of EBUSY for RTCP expectation").
      
      BUG: unable to handle kernel paging request at 00100104
      IP: [<f8214f07>] nf_ct_unlink_expect_report+0x57/0xf0 [nf_conntrack]
      ...
      Call Trace:
        [<c0244bd8>] ? del_timer+0x48/0x70
        [<f8215687>] nf_ct_remove_expectations+0x47/0x60 [nf_conntrack]
        [<f8211c99>] nf_ct_delete_from_lists+0x59/0x90 [nf_conntrack]
        [<f8212e5e>] death_by_timeout+0x14e/0x1c0 [nf_conntrack]
        [<f8212d10>] ? nf_conntrack_set_hashsize+0x190/0x190 [nf_conntrack]
        [<c024442d>] call_timer_fn+0x1d/0x80
        [<c024461e>] run_timer_softirq+0x18e/0x1a0
        [<f8212d10>] ? nf_conntrack_set_hashsize+0x190/0x190 [nf_conntrack]
        [<c023e6f3>] __do_softirq+0xa3/0x170
        [<c023e650>] ? __local_bh_enable+0x70/0x70
        <IRQ>
        [<c023e587>] ? irq_exit+0x67/0xa0
        [<c0202af6>] ? do_IRQ+0x46/0xb0
        [<c027ad05>] ? clockevents_notify+0x35/0x110
        [<c066ac6c>] ? common_interrupt+0x2c/0x40
        [<c056e3c1>] ? cpuidle_enter_state+0x41/0xf0
        [<c056e6fb>] ? cpuidle_idle_call+0x8b/0x100
        [<c02085f8>] ? arch_cpu_idle+0x8/0x30
        [<c027314b>] ? cpu_idle_loop+0x4b/0x140
        [<c0273258>] ? cpu_startup_entry+0x18/0x20
        [<c066056d>] ? rest_init+0x5d/0x70
        [<c0813ac8>] ? start_kernel+0x2ec/0x2f2
        [<c081364f>] ? repair_env_string+0x5b/0x5b
        [<c0813269>] ? i386_start_kernel+0x33/0x35
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      829d9315
    • Andrey Vagin's avatar
      netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get · c6825c09
      Andrey Vagin authored
      Lets look at destroy_conntrack:
      
      hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
      ...
      nf_conntrack_free(ct)
      	kmem_cache_free(net->ct.nf_conntrack_cachep, ct);
      
      net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU.
      
      The hash is protected by rcu, so readers look up conntracks without
      locks.
      A conntrack is removed from the hash, but in this moment a few readers
      still can use the conntrack. Then this conntrack is released and another
      thread creates conntrack with the same address and the equal tuple.
      After this a reader starts to validate the conntrack:
      * It's not dying, because a new conntrack was created
      * nf_ct_tuple_equal() returns true.
      
      But this conntrack is not initialized yet, so it can not be used by two
      threads concurrently. In this case BUG_ON may be triggered from
      nf_nat_setup_info().
      
      Florian Westphal suggested to check the confirm bit too. I think it's
      right.
      
      task 1			task 2			task 3
      			nf_conntrack_find_get
      			 ____nf_conntrack_find
      destroy_conntrack
       hlist_nulls_del_rcu
       nf_conntrack_free
       kmem_cache_free
      						__nf_conntrack_alloc
      						 kmem_cache_alloc
      						 memset(&ct->tuplehash[IP_CT_DIR_MAX],
      			 if (nf_ct_is_dying(ct))
      			 if (!nf_ct_tuple_equal()
      
      I'm not sure, that I have ever seen this race condition in a real life.
      Currently we are investigating a bug, which is reproduced on a few nodes.
      In our case one conntrack is initialized from a few tasks concurrently,
      we don't have any other explanation for this.
      
      <2>[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322!
      ...
      <4>[46267.083951] RIP: 0010:[<ffffffffa01e00a4>]  [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590 [nf_nat]
      ...
      <4>[46267.085549] Call Trace:
      <4>[46267.085622]  [<ffffffffa023421b>] alloc_null_binding+0x5b/0xa0 [iptable_nat]
      <4>[46267.085697]  [<ffffffffa02342bc>] nf_nat_rule_find+0x5c/0x80 [iptable_nat]
      <4>[46267.085770]  [<ffffffffa0234521>] nf_nat_fn+0x111/0x260 [iptable_nat]
      <4>[46267.085843]  [<ffffffffa0234798>] nf_nat_out+0x48/0xd0 [iptable_nat]
      <4>[46267.085919]  [<ffffffff814841b9>] nf_iterate+0x69/0xb0
      <4>[46267.085991]  [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
      <4>[46267.086063]  [<ffffffff81484374>] nf_hook_slow+0x74/0x110
      <4>[46267.086133]  [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
      <4>[46267.086207]  [<ffffffff814b5890>] ? dst_output+0x0/0x20
      <4>[46267.086277]  [<ffffffff81495204>] ip_output+0xa4/0xc0
      <4>[46267.086346]  [<ffffffff814b65a4>] raw_sendmsg+0x8b4/0x910
      <4>[46267.086419]  [<ffffffff814c10fa>] inet_sendmsg+0x4a/0xb0
      <4>[46267.086491]  [<ffffffff814459aa>] ? sock_update_classid+0x3a/0x50
      <4>[46267.086562]  [<ffffffff81444d67>] sock_sendmsg+0x117/0x140
      <4>[46267.086638]  [<ffffffff8151997b>] ? _spin_unlock_bh+0x1b/0x20
      <4>[46267.086712]  [<ffffffff8109d370>] ? autoremove_wake_function+0x0/0x40
      <4>[46267.086785]  [<ffffffff81495e80>] ? do_ip_setsockopt+0x90/0xd80
      <4>[46267.086858]  [<ffffffff8100be0e>] ? call_function_interrupt+0xe/0x20
      <4>[46267.086936]  [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90
      <4>[46267.087006]  [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90
      <4>[46267.087081]  [<ffffffff8118f2e8>] ? kmem_cache_alloc+0xd8/0x1e0
      <4>[46267.087151]  [<ffffffff81445599>] sys_sendto+0x139/0x190
      <4>[46267.087229]  [<ffffffff81448c0d>] ? sock_setsockopt+0x16d/0x6f0
      <4>[46267.087303]  [<ffffffff810efa47>] ? audit_syscall_entry+0x1d7/0x200
      <4>[46267.087378]  [<ffffffff810ef795>] ? __audit_syscall_exit+0x265/0x290
      <4>[46267.087454]  [<ffffffff81474885>] ? compat_sys_setsockopt+0x75/0x210
      <4>[46267.087531]  [<ffffffff81474b5f>] compat_sys_socketcall+0x13f/0x210
      <4>[46267.087607]  [<ffffffff8104dea3>] ia32_sysret+0x0/0x5
      <4>[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 <0f> 0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74
      <1>[46267.088023] RIP  [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: default avatarAndrey Vagin <avagin@openvz.org>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      c6825c09
    • Patrick McHardy's avatar
      netfilter: nf_tables: fix oops when deleting a chain with references · 3dd7279f
      Patrick McHardy authored
      The following commands trigger an oops:
      
       # nft -i
       nft> add table filter
       nft> add chain filter input { type filter hook input priority 0; }
       nft> add chain filter test
       nft> add rule filter input jump test
       nft> delete chain filter test
      
      We need to check the chain use counter before allowing destruction since
      we might have references from sets or jump rules.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=69341Reported-by: default avatarMatthew Ife <deleriux1@gmail.com>
      Tested-by: default avatarMatthew Ife <deleriux1@gmail.com>
      Signed-off-by: default avatarPatrick McHardy <kaber@trash.net>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      3dd7279f
    • Arturo Borrero's avatar
      netfilter: nft_ct: fix unconditional dump of 'dir' attr · 2a53bfb3
      Arturo Borrero authored
      We want to make sure that the information that we get from the kernel can
      be reinjected without troubles. The kernel shouldn't return an attribute
      that is not required, or even prohibited.
      
      Dumping unconditionally NFTA_CT_DIRECTION could lead an application in
      userspace to interpret that the attribute was originally set, while it
      was not.
      Signed-off-by: default avatarArturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      2a53bfb3
    • Andy Zhou's avatar
      openvswitch: Suppress error messages on megaflow updates · c14e0953
      Andy Zhou authored
      With subfacets, we'd expect megaflow updates message to carry
      the original micro flow. If not, EINVAL is returned and kernel
      logs an error message.  Now that the user space subfacet layer is
      removed, it is expected that flow updates can arrive with a
      micro flow other than the original. Change the return code to
      EEXIST and remove the kernel error log message.
      Reported-by: default avatarBen Pfaff <blp@nicira.com>
      Signed-off-by: default avatarAndy Zhou <azhou@nicira.com>
      Signed-off-by: default avatarJesse Gross <jesse@nicira.com>
      c14e0953