1. 18 Jul, 2018 10 commits
    • ipvs: drop conn templates under attack · 762c4007
      Julian Anastasov authored
      Until now, connection templates were ignored by the random
      dropentry procedure. But Michal Koutný suggests that we
      should add an exception for connections under SYN attack.
      He provided a patch that implements it for TCP:
      
      <quote>
      
      IPVS includes protection against filling the ip_vs_conn_tab by
      dropping 1/32 of feasible entries every second. The template
      entries (for persistent services) are never directly deleted by
      this mechanism, but when a picked TCP connection entry is dropped
      (1), the respective template entry is dropped too (realized by
      expiring it 60 seconds after the connection entry is dropped).
      
      There is another mechanism that removes connection entries when they
      time out (2); in this case the associated template entry is not deleted.
      Under a SYN flood, template entries would accumulate (due to their
      longer timeout).
      
      The accumulation also takes place with drop_entry enabled. Roughly
      15% ((31/32)^60) of SYN_RECV connections survive the dropping mechanism
      (1) and are removed by the timeout mechanism (2) (which defaults to
      60 seconds for SYN_RECV), so template entries would still accumulate.
      
      The patch ensures that when a connection entry times out, we also remove
      the template entry from the table. To prevent breaking persistent
      services (since the connection may time out in already established state)
      we add a new entry flag to protect templates that spawned at least one
      established TCP connection.
      
      </quote>
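      The sketch below gives a rough check of the ~15% figure. It assumes a simplified model, not the IPVS code: each one-second dropentry pass removes a given SYN_RECV entry independently with probability 1/32, and the default 60-second SYN_RECV timeout spans 60 such passes. The function name survive_fraction() is hypothetical.

```c
#include <assert.h>

/*
 * Expected fraction of entries that survive `rounds` drop passes when each
 * pass independently removes an entry with probability 1/drop_divisor.
 * Simplified probability model only; this is not IPVS code.
 */
static double survive_fraction(unsigned int drop_divisor, unsigned int rounds)
{
	double keep, frac = 1.0;

	assert(drop_divisor > 0);
	keep = 1.0 - 1.0 / (double)drop_divisor; /* 31/32 when divisor is 32 */
	for (unsigned int i = 0; i < rounds; i++)
		frac *= keep;
	return frac;
}
```

      Under these assumptions, survive_fraction(32, 60) evaluates to about 0.149. That is in line with the roughly 15% stated in the quote.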
      
      We already added the ASSURED flag for templates in the previous patch,
      so we can now use it to decide which connection templates should be
      dropped under attack. But some cases also need special handling.
      
      We modify the dropentry procedure as follows:
      
      - Linux timers currently use LIFO ordering, but we cannot rely on
      this to drop controlling connections. So, set cp->timeout to 0
      to indicate that the connection was dropped and that on expiration
      we should try to drop our controlling connections. As a result, we
      can now avoid the ip_vs_conn_expire_now call.
      
      - Move the cp->n_control check earlier, so that we avoid restarting
      the timer for controlling connections when it is not needed.
      
      - Drop unassured connection templates here if they are not referred
      to by any connections.
      
      On connection expiration: if the connection was dropped (cp->timeout
      is 0), try to drop our controlling connection, unless it is a
      template in the assured state.
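      The dropentry and expiration rules above can be modeled roughly as follows. This is a hypothetical, heavily simplified sketch. struct conn, dropentry_pick() and conn_expire() are illustrative stand-ins, not the real IPVS structures. Timers are not modeled, and the expired flag stands in for the connection actually being removed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for an IPVS connection. */
struct conn {
	bool is_template;
	bool assured;            /* template spawned an established connection */
	unsigned int n_control;  /* number of connections this one controls */
	unsigned long timeout;   /* 0 marks "dropped by dropentry" */
	struct conn *control;    /* controlling connection, or NULL */
	bool expired;            /* stands in for removal from the table */
};

/* dropentry decision for one picked entry; returns true if it is dropped. */
static bool dropentry_pick(struct conn *cp)
{
	/* Checked first: entries that still control others are left alone. */
	if (cp->n_control)
		return false;
	/* Unreferenced templates are dropped only if they are not assured. */
	if (cp->is_template && cp->assured)
		return false;
	/* Mark the drop instead of expiring now; the timer fires later. */
	cp->timeout = 0;
	return true;
}

/* Timer expiration for cp. */
static void conn_expire(struct conn *cp)
{
	struct conn *ct = cp->control;

	cp->expired = true;
	if (!ct)
		return;
	assert(ct->n_control > 0);
	ct->n_control--;
	cp->control = NULL;
	/* A dropped connection also tries to drop its controller, unless the
	 * controller is an assured template or still controls others. */
	if (cp->timeout == 0 && ct->n_control == 0 &&
	    !(ct->is_template && ct->assured)) {
		ct->timeout = 0;
		ct->expired = true;
	}
}
```

      In this model the zero timeout travels with the connection. The expiration path can therefore tell a drop from a normal timeout without depending on timer ordering.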
      
      In ip_vs_conn_flush, change the order of the ip_vs_conn_expire_now
      calls to follow the LIFO timer expiration order. This should work
      faster for controlling connections with a single controlled connection.
      Suggested-by: Michal Koutný <mkoutny@suse.com>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • ipvs: add assured state for conn templates · 27541143
      Julian Anastasov authored
      cp->state was not used for templates. Add support for state bits
      and for the first "assured" bit, which indicates that some
      connection controlled by this template was established or assured
      by the real server. In a follow-up patch we will use it to drop
      templates under SYN attack.
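      Below is a minimal sketch of a template state bit. It assumes templates keep a small bit set in their state field. TPL_STATE_ASSURED, struct tpl and the helper names are hypothetical and are not claimed to match the IPVS identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state bit for connection templates. */
#define TPL_STATE_ASSURED 0x1u

struct tpl {
	unsigned int state; /* bit set of TPL_STATE_* flags */
};

/* Called when a connection controlled by the template is established or
 * otherwise assured by the real server; setting it twice is harmless. */
static void tpl_mark_assured(struct tpl *t)
{
	assert(t);
	t->state |= TPL_STATE_ASSURED;
}

static bool tpl_is_assured(const struct tpl *t)
{
	return (t->state & TPL_STATE_ASSURED) != 0;
}
```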
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • ipvs: provide just conn to ip_vs_state_name · ec1b28ca
      Julian Anastasov authored
      In preparation for follow-up patches, pass just the cp
      pointer to ip_vs_state_name.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_conntrack: resolve clash for matching conntracks · ed07d9a0
      Martynas Pumputis authored
      This patch enables clash resolution for NAT (disabled in
      "590b52e1") if the clashing conntracks match (i.e. both tuples are
      equal) and the protocol allows it.
      
      The clash might happen for a connectionless protocol (e.g. UDP) when
      two threads write to the same socket in parallel and consecutive calls
      to "get_unique_tuple" return the same tuples (including reply tuples).
      
      In this case it is safe to perform the resolution, as the losing CT
      describes the same mangling as the winning CT, so no modifications to
      the packet are needed, and the result of rules traversal for the loser's
      packet stays valid.
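      The safety condition can be sketched as a predicate: resolve the clash only when both the original and the reply tuples are identical and the protocol opts in. The simplified struct tuple, struct ct and can_resolve_clash() below are hypothetical. The real nf_conntrack tuple and clash handling are more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified tuple; not struct nf_conntrack_tuple. */
struct tuple {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
	uint8_t proto;
};

struct ct {
	struct tuple orig;   /* original direction */
	struct tuple reply;  /* reply direction, after NAT mangling */
};

/* Compare field by field; memcmp() could also compare padding bytes. */
static bool tuple_equal(const struct tuple *a, const struct tuple *b)
{
	return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
	       a->src_port == b->src_port && a->dst_port == b->dst_port &&
	       a->proto == b->proto;
}

/* Resolving is safe only if both directions match exactly (so both
 * entries describe the same mangling) and the protocol allows it. */
static bool can_resolve_clash(const struct ct *loser, const struct ct *winner,
			      bool proto_allows)
{
	assert(loser && winner);
	return proto_allows &&
	       tuple_equal(&loser->orig, &winner->orig) &&
	       tuple_equal(&loser->reply, &winner->reply);
}
```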
      Signed-off-by: Martynas Pumputis <martynas@weave.works>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search · 5c789e13
      Yi-Hung Wei authored
      This patch is originally from Florian Westphal.
      
      This patch does the following three main tasks.
      
      1) Add a list lock to 'struct nf_conncount_list' so that we can
      alter the lists containing the individual connections without holding
      the main tree lock. This is useful when we only need to add to or
      remove from a list without allocating or removing a node in the tree.
      With this change, we update nft_connlimit accordingly, since we no
      longer need to maintain a list lock in nft_connlimit.
      
      2) Use RCU for the initial tree search to improve tree lookup performance.
      
      3) Add a garbage collection worker. This worker is scheduled when there
      are too many tree nodes that need to be recycled.
      
      Moreover, the rbnode reclaim logic is moved from the tree search path
      to the tree insertion path to avoid a race condition.
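      Items 1 and 3 can be sketched in userspace C under two stated assumptions. A C11 atomic_flag spinlock stands in for the kernel's per-list spinlock. A hypothetical pending-node counter with threshold GC_MAX_PENDING stands in for the decision to schedule the gc worker. RCU (item 2) is not modeled, and all names here are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Per-list lock: a C11 spinlock standing in for the kernel's spinlock_t. */
struct conn_list {
	atomic_flag lock;
	unsigned int count;      /* connections stored in this list */
};

static void conn_list_lock(struct conn_list *l)
{
	while (atomic_flag_test_and_set_explicit(&l->lock, memory_order_acquire))
		; /* spin until the holder releases the lock */
}

static void conn_list_unlock(struct conn_list *l)
{
	atomic_flag_clear_explicit(&l->lock, memory_order_release);
}

/* Adding to an existing list takes only that list's lock, no tree lock. */
static void conn_list_add(struct conn_list *l)
{
	assert(l);
	conn_list_lock(l);
	l->count++;
	conn_list_unlock(l);
}

/* Hypothetical gc trigger: schedule the worker once enough nodes pile up. */
#define GC_MAX_PENDING 64u

struct gc_state {
	unsigned int pending;    /* empty tree nodes waiting to be recycled */
	bool scheduled;          /* stands in for schedule_work() having run */
};

static void note_reclaimable_node(struct gc_state *gc)
{
	if (++gc->pending >= GC_MAX_PENDING && !gc->scheduled)
		gc->scheduled = true;
}
```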
      Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_conncount: Split insert and traversal · 34848d5c
      Yi-Hung Wei authored
      This patch is originally from Florian Westphal.
      
      When we have a very coarse grouping, e.g. by large subnets, zone id,
      etc., it's likely that we do not need to do tree rotation, because
      we'll find a node where we can attach the new entry. Based on this
      observation, we split tree traversal and insertion.
      
      Later on, we can make traversal lockless (tree protected
      by RCU), and add an extra lock in the individual nodes to protect list
      insertion/deletion, thereby allowing parallel insert/delete in different
      tree nodes.
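      Here is a sketch of splitting traversal from insertion. It uses a plain unbalanced binary search tree as a stand-in for the rbtree, so there is no rotation to show. The names tree_find(), tree_insert() and add_conn() are hypothetical. In the common case only the read-only tree_find() runs, and that is the part that could later become lockless.

```c
#include <assert.h>
#include <stdlib.h>

/* Unbalanced BST standing in for the rbtree; names are hypothetical. */
struct node {
	unsigned int key;        /* grouping key, e.g. a subnet or zone id */
	unsigned int count;      /* connections attached to this node */
	struct node *left, *right;
};

/* Read-only traversal: it never modifies the tree. */
static struct node *tree_find(struct node *root, unsigned int key)
{
	while (root) {
		if (key < root->key)
			root = root->left;
		else if (key > root->key)
			root = root->right;
		else
			return root;
	}
	return NULL;
}

/* Slow path: link a new node (an rbtree would also rebalance here). */
static struct node *tree_insert(struct node **rootp, unsigned int key)
{
	struct node **link = rootp;
	struct node *n;

	while (*link) {
		if (key < (*link)->key)
			link = &(*link)->left;
		else if (key > (*link)->key)
			link = &(*link)->right;
		else
			return *link; /* already present, reuse it */
	}
	n = calloc(1, sizeof(*n));
	if (!n)
		return NULL;
	n->key = key;
	*link = n;
	return n;
}

/* Returns the new per-key count, or 0 if allocation failed. */
static unsigned int add_conn(struct node **rootp, unsigned int key)
{
	struct node *n;

	assert(rootp);
	n = tree_find(*rootp, key);     /* common case: node already exists */
	if (!n)
		n = tree_insert(rootp, key);
	if (!n)
		return 0;
	return ++n->count;
}
```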
      Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_conncount: Move locking into count_tree() · 2ba39118
      Yi-Hung Wei authored
      This patch is originally from Florian Westphal.
      
      This is a preparation patch to allow lockless traversal
      of the tree via RCU.
      Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_conncount: Early exit in nf_conncount_lookup() and cleanup · 976afca1
      Yi-Hung Wei authored
      This patch is originally from Florian Westphal.
      
      This patch does the following three tasks.
      
      It applies the same early exit technique to nf_conncount_lookup().
      
      Since we now keep the number of connections in 'struct nf_conncount_list',
      we no longer need to return the count from nf_conncount_lookup().
      
      Moreover, we expose the garbage collection function nf_conncount_gc_list()
      for nft_connlimit.
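      Below is a minimal model of the first two points. For illustration, it assumes that "early exit" means the lookup stops at the first matching entry, and that the list structure carries its own count. struct conn_list, conn_list_lookup() and conn_list_add() are hypothetical names. The integer id stands in for a conntrack tuple.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct conn_entry {
	unsigned int id;          /* stand-in for a conntrack tuple */
	struct conn_entry *next;
};

struct conn_list {
	struct conn_entry *head;
	unsigned int count;       /* kept up to date on every add */
};

/* Reports only whether id is present and stops at the first match;
 * callers read list->count instead of getting a count from the walk. */
static bool conn_list_lookup(const struct conn_list *list, unsigned int id)
{
	for (const struct conn_entry *e = list->head; e; e = e->next)
		if (e->id == id)
			return true; /* early exit */
	return false;
}

static void conn_list_add(struct conn_list *list, struct conn_entry *e)
{
	assert(list && e);
	if (conn_list_lookup(list, e->id))
		return; /* already counted */
	e->next = list->head;
	list->head = e;
	list->count++;
}
```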
      Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_conncount: Switch to plain list · cb2b36f5
      Yi-Hung Wei authored
      Original patch is from Florian Westphal.
      
      This patch switches from an hlist to a plain list to store the list of
      connections with the same filtering key in nf_conncount. With the
      plain list, we can insert new connections at the tail, so over time
      the beginning of the list holds long-running and expired connections,
      while newly created ones are at the end.
      
      Later on, we could probably move checked ones to the end of the list,
      so the next run has a higher chance to reclaim stale entries at the front.
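      Tail insertion on a circular doubly linked list, modeled on the kernel's list_head, keeps the oldest entry at the front. The minimal list helpers below are userspace re-implementations written for this sketch. struct conn and oldest() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace re-implementation of a kernel-style circular list. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	assert(entry && head);
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

struct conn {
	unsigned int id;
	struct list_head node;
};

/* Recover the containing struct conn from its embedded list node. */
#define conn_of(ptr) \
	((struct conn *)((char *)(ptr) - offsetof(struct conn, node)))

/* The front of the list holds the oldest connection, or NULL if empty. */
static struct conn *oldest(struct list_head *head)
{
	return head->next == head ? NULL : conn_of(head->next);
}
```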
      Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_conncount: Early exit for garbage collection · 2a406e8a
      Yi-Hung Wei authored
      This patch is originally from Florian Westphal.
      
      We use an extra function with early exit for garbage collection.
      It is not necessary to traverse the full list for every node since
      it is enough to zap a couple of entries for garbage collection.
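      Here is a sketch of a garbage collection pass with early exit. It unlinks stale entries but stops once it has removed max of them, instead of always walking the whole list. struct entry, gc_list() and GC_MAX_COLLECT are hypothetical. The stale flag stands in for whatever check the real code uses to decide that a connection is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GC_MAX_COLLECT 8u /* hypothetical per-call bound */

struct entry {
	bool stale;              /* stands in for "connection has gone away" */
	struct entry *next;
};

/*
 * Unlink stale entries from the list at *headp, stopping early once `max`
 * entries have been removed. Returns the number removed; the caller owns
 * (and would free) the unlinked entries.
 */
static unsigned int gc_list(struct entry **headp, unsigned int max)
{
	struct entry **pp = headp;
	unsigned int collected = 0;

	assert(headp);
	while (*pp && collected < max) {
		if ((*pp)->stale) {
			*pp = (*pp)->next;   /* unlink the stale entry */
			collected++;
		} else {
			pp = &(*pp)->next;   /* keep it and move on */
		}
	}
	return collected;
}
```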
      Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  2. 17 Jul, 2018 2 commits
    • netfilter: Kconfig: Change select IPv6 dependencies · 5d400a49
      Máté Eckl authored
      ... from IPV6 to NF_TABLES_IPV6 and IP6_NF_IPTABLES.
      
      In some cases, module selects depend on IPV6, but this means that they
      select another module even if, e.g., NF_TABLES_IPV6 is not set, in which
      case the selected module is useless due to the lack of IPv6 nf_tables
      functionality.
      
      The same applies for IP6_NF_IPTABLES and iptables.
      
      Joint work with: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Máté Eckl <ecklm94@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: conntrack: remove l3proto abstraction · a0ae2562
      Florian Westphal authored
      This unifies ipv4 and ipv6 protocol trackers and removes the l3proto
      abstraction.
      
      This gets rid of all l3proto indirect calls and the need to do
      a lookup on the function to call for l3 demux.
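      The difference between the old and new dispatch can be sketched as follows. Both variants are hypothetical simplifications. The "before" path finds per-family ops in a table and calls through a function pointer. The "after" path switches on the address family and calls the IPv4 or IPv6 helper directly. The helpers only read the IPv4 Protocol field (byte 9) and the IPv6 Next Header field (byte 6, ignoring extension headers).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical per-family helpers: return the layer 4 protocol number. */
static int ipv4_get_l4proto(const uint8_t *pkt)
{
	return pkt[9]; /* IPv4 Protocol field */
}

static int ipv6_get_l4proto(const uint8_t *pkt)
{
	return pkt[6]; /* IPv6 Next Header; extension headers are ignored */
}

/* "Before": per-family ops looked up at runtime, called via a pointer. */
struct l3proto_ops {
	int family;
	int (*get_l4proto)(const uint8_t *pkt);
};

static const struct l3proto_ops l3protos[] = {
	{ AF_INET, ipv4_get_l4proto },
	{ AF_INET6, ipv6_get_l4proto },
};

static int get_l4proto_indirect(int family, const uint8_t *pkt)
{
	for (size_t i = 0; i < sizeof(l3protos) / sizeof(l3protos[0]); i++)
		if (l3protos[i].family == family)
			return l3protos[i].get_l4proto(pkt); /* indirect call */
	return -1;
}

/* "After": one module knows both families, so a switch calls directly. */
static int get_l4proto(int family, const uint8_t *pkt)
{
	assert(pkt);
	switch (family) {
	case AF_INET:
		return ipv4_get_l4proto(pkt);
	case AF_INET6:
		return ipv6_get_l4proto(pkt);
	default:
		return -1;
	}
}
```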
      
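      It increases the size of nf_conntrack.ko by only a small amount (about
      12 Kbyte, from 179K to 191K), yet the total size goes down, because
      nf_conntrack.ko is useless without either the nf_conntrack_ipv4 or the
      nf_conntrack_ipv6 module: together the three modules took 217K
      (19K + 19K + 179K), versus 191K for the merged nf_conntrack.ko.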
      
      before:
         text    data     bss     dec     hex filename
         7357    1088       0    8445    20fd nf_conntrack_ipv4.ko
         7405    1084       4    8493    212d nf_conntrack_ipv6.ko
        72614   13689     236   86539   1520b nf_conntrack.ko
       19K nf_conntrack_ipv4.ko
       19K nf_conntrack_ipv6.ko
      179K nf_conntrack.ko
      
      after:
         text    data     bss     dec     hex filename
        79277   13937     236   93450   16d0a nf_conntrack.ko
        191K nf_conntrack.ko
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  3. 16 Jul, 2018 28 commits