1. 09 May, 2012 25 commits
  2. 08 May, 2012 15 commits
    • David S. Miller's avatar
      9bb862be
    • netfilter: remove ip_queue support · d16cf20e
      Pablo Neira Ayuso authored
      This patch removes ip_queue support, which was marked as obsolete
      years ago. The nfnetlink_queue module provides a more advanced
      user-space packet queueing mechanism.
      
      This patch also removes capability code included in SELinux that
      refers to ip_queue. Otherwise, we break compilation.
      
      Several warnings about this have been sent to the mailing list
      in the past month, and nobody has raised a hand to stop it
      with a strong argument.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_conntrack: fix explicit helper attachment and NAT · 6714cf54
      Pablo Neira Ayuso authored
      Explicit helper attachment via the CT target is broken with NAT
      if non-standard ports are used. This problem was hidden behind
      the automatic helper assignment routine. Thus, it becomes more
      noticeable now that we can disable the automatic helper assignment
      with Eric Leblond's:
      
      9e8ac5a netfilter: nf_ct_helper: allow to disable automatic helper assignment
      
      Basically, nf_conntrack_alter_reply looks the helper up again
      if NAT is enabled. Unfortunately, we don't have the conntrack
      template at that point anymore.
      
      Since we don't want to rely on the automatic helper assignment,
      we can skip the second look-up and stick to the helper that was
      attached by iptables. With the CT target, the user is in full
      control of helper attachment, thus, the policy is to trust what
      the user explicitly configures via iptables (no automatic magic
      anymore).
      
      Interestingly, this bug was hidden by the automatic helper look-up
      code. But it can easily be triggered if you attach the helper on
      a non-standard port, e.g.
      
      iptables -I PREROUTING -t raw -p tcp --dport 8888 \
      	-j CT --helper ftp
      
      and you have disabled the automatic helper assignment.
      
      I added the IPS_HELPER_BIT, which allows us to differentiate between
      a helper that has been explicitly attached and one that has been
      automatically assigned. I didn't come up with a better solution
      (having backward compatibility in mind).
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
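      Illustrative sketch, not taken from the patch: the policy described
      above (keep an explicitly attached helper, and only fall back to the
      automatic look-up result otherwise) could be modelled as follows. All
      sketch_* names and the bit value are hypothetical; only the idea of a
      status bit marking explicit attachment comes from the commit message.

```c
#include <stddef.h>

/* Hypothetical stand-in for the status bit added by the patch (IPS_HELPER_BIT). */
#define SKETCH_HELPER_EXPLICIT (1UL << 0)

struct sketch_helper {
    const char *name;
};

struct sketch_conn {
    unsigned long status;               /* per-connection status bits */
    const struct sketch_helper *helper; /* currently attached helper, or NULL */
};

/* Explicit attachment, as configured by the user (e.g. -j CT --helper ftp). */
void sketch_attach_explicit(struct sketch_conn *ct,
                            const struct sketch_helper *helper)
{
    ct->helper = helper;
    ct->status |= SKETCH_HELPER_EXPLICIT;
}

/*
 * Second look-up after NAT changed the reply tuple. "looked_up" stands in
 * for the result of the automatic, port-based helper look-up.
 */
void sketch_alter_reply(struct sketch_conn *ct,
                        const struct sketch_helper *looked_up)
{
    if (ct->status & SKETCH_HELPER_EXPLICIT)
        return;             /* trust the explicit configuration */
    ct->helper = looked_up; /* otherwise take the automatic result */
}
```

      With the flag set, a look-up that finds nothing on port 8888 no
      longer clears the helper the user asked for.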
    • netfilter: nf_ct_expect: partially implement ctnetlink_change_expect · 9768e1ac
      Kelvie Wong authored
      This refreshes the "timeout" attribute in existing expectations if one is
      given.
      
      The use case for this would be for userspace helpers to extend the lifetime
      of the expectation when requested, as this is not possible right now
      without deleting/recreating the expectation.
      
      I use this specifically for forwarding DCERPC traffic through:
      
      DCERPC has a port mapper daemon that chooses a (seemingly) random port for
      future traffic to go to. We expect this traffic (with a reasonable
      timeout), but sometimes the port mapper will tell the client to continue
      using the same port. This allows us to extend the expectation accordingly.
      Signed-off-by: Kelvie Wong <kelvie@ieee.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
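      Illustrative sketch, not taken from the patch: refreshing the
      timeout of an existing expectation only when the update request
      carries one could look like this. The structure, the function name
      and the plain-seconds clock are assumptions made for the example.

```c
#include <stdbool.h>

struct sketch_expect {
    unsigned long expires; /* absolute expiry time, in seconds */
};

/*
 * Apply an update request to an existing expectation.
 * has_timeout says whether the request carried a "timeout" attribute.
 * Returns true if the expiry time was refreshed.
 */
bool sketch_change_expect(struct sketch_expect *exp, bool has_timeout,
                          unsigned long timeout_secs, unsigned long now)
{
    if (!has_timeout)
        return false;                  /* nothing to change */
    exp->expires = now + timeout_secs; /* extend the lifetime in place */
    return true;
}
```

      A userspace helper can then extend the lifetime without deleting
      and recreating the expectation.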
    • net: export sysctl_[r|w]mem_max symbols needed by ip_vs_sync · 6d8ebc8a
      Hans Schillstrom authored
      To build ip_vs as a module, sysctl_rmem_max and sysctl_wmem_max
      need to be exported.
      
      The dependency was added by the "ipvs: wakeup master thread" patch.
      Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • ipvs: ip_vs_proto: local functions should not be exposed globally · 068d5220
      H Hartley Sweeten authored
      Functions not referenced outside of a source file should be marked
      static to prevent them from being exposed globally.
      
      This quiets the sparse warning:
      
      warning: symbol '__ipvs_proto_data_get' was not declared. Should it be static?
      Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
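      Illustrative sketch, not taken from the patch: marking a file-local
      function static gives it internal linkage, so it is no longer visible
      to other translation units. The function names and the protocol
      check are hypothetical.

```c
/* Internal linkage: only code in this source file can call it. */
static int sketch_proto_lookup(int proto)
{
    return proto == 6 ? 1 : 0; /* 6 is the IP protocol number of TCP */
}

/* External linkage: the only symbol this file exposes globally. */
int sketch_proto_is_tcp(int proto)
{
    return sketch_proto_lookup(proto);
}
```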
    • ipvs: ip_vs_ftp: local functions should not be exposed globally · d5cce208
      H Hartley Sweeten authored
      Functions not referenced outside of a source file should be marked
      static to prevent them from being exposed globally.
      
      This quiets the sparse warning:
      
      warning: symbol 'ip_vs_ftp_init' was not declared. Should it be static?
      Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: optimize the use of flags in ip_vs_bind_dest · 6b324dbf
      Pablo Neira Ayuso authored
      	cp->flags is marked volatile, but ip_vs_bind_dest
      can safely modify the flags, so save some CPU cycles by
      using a temporary variable.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
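      Illustrative sketch, not taken from the patch: every access to a
      volatile field is a real load or store, so working on a local copy
      reduces a read-modify-write sequence to one read and one write. The
      structure, the function and the mask values are assumptions made for
      the example.

```c
/* Hypothetical flag layout for the example. */
#define SKETCH_F_FWD_MASK 0x0007u /* forwarding-method bits */
#define SKETCH_F_INACTIVE 0x0100u /* unrelated bit that must be preserved */

struct sketch_conn {
    volatile unsigned int flags;
};

/* Replace the forwarding-method bits with those of the destination. */
void sketch_bind_flags(struct sketch_conn *cp, unsigned int dest_flags)
{
    unsigned int flags = cp->flags; /* single volatile read */

    flags &= ~SKETCH_F_FWD_MASK;
    flags |= dest_flags & SKETCH_F_FWD_MASK;

    cp->flags = flags;              /* single volatile write */
}
```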
    • ipvs: add support for sync threads · f73181c8
      Pablo Neira Ayuso authored
      	Allow master and backup servers to use many threads
      for sync traffic. Add sysctl var "sync_ports" to define the
      number of threads. Every thread uses a single UDP port:
      thread 0 uses the default port 8848, while the last thread
      uses port 8848+sync_ports-1.
      
      	The sync traffic for connections is spread over many
      master threads based on the cp address, but one connection is
      always assigned to the same thread to avoid reordering of the
      sync messages.
      
      	Remove ip_vs_sync_switch_mode because this check
      for sync mode change is still risky. Instead, check for mode
      change under sync_buff_lock.
      
      	Make sure the backup socks do not block on reading.
      
      Special thanks to Aleksey Chudov for helping in all tests.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
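      Illustrative sketch, not taken from the patch: the port layout and
      the "same connection, same thread" rule described above could be
      expressed like this. The function names, the shift by 6 bits and the
      use of a modulo are assumptions; only the base port 8848 and the
      idea of deriving the thread from the cp address come from the commit
      message.

```c
#include <stdint.h>

#define SKETCH_SYNC_BASE_PORT 8848 /* default sync port named in the message */

/* UDP port of a sync thread: thread 0 uses the base port. */
int sketch_sync_port(int thread_id)
{
    return SKETCH_SYNC_BASE_PORT + thread_id;
}

/*
 * Pick a thread from the address of the connection structure. The low
 * bits are shifted away because allocations are aligned and carry little
 * information. The result depends only on the address, so one connection
 * always maps to the same thread. sync_ports must be at least 1.
 */
unsigned int sketch_select_thread(const void *cp, unsigned int sync_ports)
{
    uintptr_t addr = (uintptr_t)cp;

    return (unsigned int)((addr >> 6) % sync_ports);
}
```

      With sync_ports set to 4, the threads would use ports 8848 to 8851.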
    • ipvs: reduce sync rate with time thresholds · 749c42b6
      Julian Anastasov authored
      	Add two new sysctl vars to control the sync rate. The
      main idea is to reduce the rate for connection templates, because
      currently it depends on the packet rate of the controlled connections.
      This mechanism should also be useful for normal connections
      with high traffic.
      
      sync_refresh_period: in seconds, the difference in the reported
      	connection timer that triggers a new sync message. It can be
      	used to avoid sync messages for the specified period (or half
      	of the connection timeout if it is lower) if the connection
      	state has not changed since the last sync.
      
      sync_retries: integer, 0..3, defines sync retries with a period of
      	sync_refresh_period/8. Useful to protect against loss of
      	sync messages.
      
      	Allow sysctl_sync_threshold to be used with
      sysctl_sync_period=0, so that only a single sync message is sent
      if sync_refresh_period is also 0.
      
      	Add a new field "sync_endtime" to the connection structure to
      hold the reported time when the connection expires. The 2 lowest
      bits represent the retry count.
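      
      Illustrative sketch, not taken from the patch: storing a retry count
      of 0..3 in the 2 lowest bits of a time value could be done as shown
      here. The helper names are hypothetical, and the sketch assumes the
      time loses its two low bits of precision as a result.

```c
#define SKETCH_RETRY_MASK 3UL /* the 2 lowest bits */

/* Combine an expiry time and a retry count (0..3) into one word. */
unsigned long sketch_pack_endtime(unsigned long endtime, unsigned int retries)
{
    return (endtime & ~SKETCH_RETRY_MASK) | (retries & SKETCH_RETRY_MASK);
}

/* Extract the retry count from the packed word. */
unsigned int sketch_retries(unsigned long packed)
{
    return (unsigned int)(packed & SKETCH_RETRY_MASK);
}

/* Extract the expiry time, rounded down to a multiple of 4. */
unsigned long sketch_endtime(unsigned long packed)
{
    return packed & ~SKETCH_RETRY_MASK;
}
```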
      
      	As sysctl_sync_period can now be 0, use ACCESS_ONCE to
      avoid division by zero.
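      
      Illustrative sketch, not taken from the patch: when a tunable can be
      changed concurrently, reading it once into a local variable ensures
      that the value checked against zero is the same value used as the
      divisor. The macro is a userspace stand-in for the kernel's
      ACCESS_ONCE; the variable, the function and the exact sync condition
      are assumptions.

```c
/* Userspace stand-in for ACCESS_ONCE(): force exactly one read of an int. */
#define SKETCH_READ_ONCE(x) (*(volatile int *)&(x))

/* Hypothetical tunable that another thread may set to 0 at any time. */
int sketch_sync_period = 50;

/* Returns 1 if the packet with count "pkts" should trigger a sync. */
int sketch_should_sync(int pkts, int threshold)
{
    int period = SKETCH_READ_ONCE(sketch_sync_period); /* snapshot */

    if (period > 0)
        return pkts % period == threshold; /* divisor cannot be 0 here */
    return pkts == threshold;              /* period 0: sync only once */
}
```

      Without the snapshot, the compiler could reload the variable between
      the check and the modulo operation.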
      
      	Special thanks to Aleksey Chudov for being patient with me,
      for his extensive reports and helping in all tests.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: wakeup master thread · 1c003b15
      Pablo Neira Ayuso authored
      	A high rate of sync messages in the master can overflow
      the socket buffer and cause messages to be dropped.
      A fixed sleep of 1 second without wakeup events is not suitable
      for loaded masters.
      
      	Use delayed_work to schedule sending of queued messages
      and limit the delay to IPVS_SYNC_SEND_DELAY (20ms). This
      reduces the rate of wakeups, but to avoid sending long bursts we
      wake up the master thread after IPVS_SYNC_WAKEUP_RATE (8) messages.
      
      	Add a hard limit for the messages queued before sending
      by using the "sync_qlen_max" sysctl var. It defaults to 1/32 of
      the memory pages but actually represents a number of messages.
      It protects us from allocating large parts of memory
      when the sending rate is lower than the queuing rate.
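      
      Illustrative sketch, not taken from the patch: a bounded queue that
      refuses messages above a hard limit and signals the sender after
      every 8 queued messages could be modelled as follows. The structure
      and function are hypothetical, and the wakeup is represented by a
      counter instead of a real thread wakeup.

```c
#include <stdbool.h>

#define SKETCH_WAKEUP_RATE 8 /* mirrors IPVS_SYNC_WAKEUP_RATE from the text */

struct sketch_queue {
    unsigned int qlen;         /* messages currently queued */
    unsigned int qlen_max;     /* hard limit, in the spirit of sync_qlen_max */
    unsigned int since_wakeup; /* messages queued since the last wakeup */
    unsigned int wakeups;      /* stand-in for waking the master thread */
    unsigned int dropped;      /* messages refused because of the limit */
};

/* Queue one message. Returns false if the hard limit was reached. */
bool sketch_enqueue(struct sketch_queue *q)
{
    if (q->qlen >= q->qlen_max) {
        q->dropped++;
        return false;
    }
    q->qlen++;
    if (++q->since_wakeup >= SKETCH_WAKEUP_RATE) {
        q->since_wakeup = 0;
        q->wakeups++; /* a burst is ready, do not wait for the timer */
    }
    return true;
}
```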
      
      	As suggested by Pablo, add new sysctl var
      "sync_sock_size" to configure the SNDBUF (master) or
      RCVBUF (slave) socket limit. Default value is 0 (preserve
      system defaults).
      
      	Change the master thread to detect and block on
      SNDBUF overflow, so that we do not drop messages when
      the socket limit is low but the sync_qlen_max limit is
      not reached. On ENOBUFS or other errors just drop the
      messages.
      
      	Change master thread to enter TASK_INTERRUPTIBLE
      state early, so that we do not miss wakeups due to messages or
      kthread_should_stop event.
      
      Thanks to Pablo Neira Ayuso for his valuable feedback!
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: always update some of the flags bits in backup · cdcc5e90
      Julian Anastasov authored
      	As the goal is to mirror the inactconns/activeconns
      counters in the backup server, make sure cp->flags is
      updated even if cp is not yet bound to dest. If cp->flags
      is not updated, ip_vs_bind_dest will rely only on the initial
      flags when updating the counters. To avoid mistakes and
      complicated checks for protocol state, rely only on the
      IP_VS_CONN_F_INACTIVE bit when updating the counters.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: fix ip_vs_try_bind_dest to rebind app and transmitter · 882a844b
      Julian Anastasov authored
      	Initially, when the synced connection is created, we
      use the forwarding method provided by the master, but once we
      bind to the destination it can change. As a result, we must
      update the application and the transmitter.
      
      	As ip_vs_try_bind_dest is always called for connections
      that require dest binding, there is no need to validate the
      cp and dest pointers.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: remove check for IP_VS_CONN_F_SYNC from ip_vs_bind_dest · 06611f82
      Julian Anastasov authored
      	As the IP_VS_CONN_F_INACTIVE bit is properly set
      in cp->flags for all kinds of connections, we do not need to
      add special checks for synced connections when updating
      the activeconns/inactconns counters for the first time. The
      logic now looks just like that in ip_vs_unbind_dest.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
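      Illustrative sketch, not taken from the patch: counter updates that
      depend only on an "inactive" flag bit, with symmetric bind and unbind
      paths, could look like this. The structure, the functions and the
      flag value are assumptions made for the example.

```c
/* Hypothetical flag value standing in for IP_VS_CONN_F_INACTIVE. */
#define SKETCH_F_INACTIVE 0x0100u

struct sketch_dest {
    int activeconns;
    int inactconns;
};

/* Account a connection on bind, based only on the inactive bit. */
void sketch_bind_dest(struct sketch_dest *dest, unsigned int conn_flags)
{
    if (conn_flags & SKETCH_F_INACTIVE)
        dest->inactconns++;
    else
        dest->activeconns++;
}

/* Mirror image of the bind path: same test, opposite direction. */
void sketch_unbind_dest(struct sketch_dest *dest, unsigned int conn_flags)
{
    if (conn_flags & SKETCH_F_INACTIVE)
        dest->inactconns--;
    else
        dest->activeconns--;
}
```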
    • ipvs: ignore IP_VS_CONN_F_NOOUTPUT in backup server · 82cfc062
      Julian Anastasov authored
      	As IP_VS_CONN_F_NOOUTPUT is derived from the
      forwarding method, we should get it from conn_flags, just
      like we do for the IP_VS_CONN_F_FWD_MASK bits when binding
      to the real server.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>