1. 18 Sep, 2013 6 commits
    • ipvs: do not use dest after ip_vs_dest_put in LBLCR · 742617b1
      Julian Anastasov authored
      Commit c5549571 ("ipvs: convert lblcr scheduler to rcu")
      allows RCU readers to use dest after calling ip_vs_dest_put().
      In a corner case this can race with ip_vs_dest_trash_expire(),
      which can release the dest while it is being returned to the
      RCU readers as the scheduling result.
      
      To fix the problem, do not allow e->dest to be replaced and
      defer the ip_vs_dest_put() call by using an RCU callback. Now
      e->dest does not need to be an RCU pointer.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
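      The deferral described in this commit message can be illustrated with a
      small user-space model. This is only a sketch under stated assumptions,
      not the kernel patch: `struct dest`, `struct entry`, `entry_free_deferred()`
      and `grace_period_elapsed()` are hypothetical names, and the pending list
      stands in for an RCU callback queue. The point it shows is that the
      reference on the dest is dropped only after the simulated grace period, so
      a reader that fetched the entry's dest earlier never sees it released early.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct dest {
    int refcnt;
};

struct entry {
    struct dest *dest;          /* set once at creation, never replaced */
    struct entry *next_pending; /* link in the deferred-release list */
};

/* Entries waiting for the (simulated) grace period to end. */
static struct entry *pending;

/* Create an entry that holds one reference on the dest. */
static struct entry *entry_new(struct dest *d)
{
    struct entry *e = malloc(sizeof(*e));

    if (!e)
        return NULL;
    d->refcnt++;
    e->dest = d;
    e->next_pending = NULL;
    return e;
}

/* Stand-in for queueing an RCU callback: the entry is only queued,
 * the dest reference is NOT dropped yet. */
static void entry_free_deferred(struct entry *e)
{
    e->next_pending = pending;
    pending = e;
}

/* Stand-in for the end of a grace period: no reader can still hold
 * a queued entry, so the dest reference can be dropped now. */
static void grace_period_elapsed(void)
{
    while (pending) {
        struct entry *e = pending;

        pending = e->next_pending;
        e->dest->refcnt--;
        free(e);
    }
}
```

      In this model, calling `entry_free_deferred()` leaves `refcnt` untouched,
      and only `grace_period_elapsed()` drops it.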
    • ipvs: do not use dest after ip_vs_dest_put in LBLC · 2f3d771a
      Julian Anastasov authored
      Commit c2a4ffb7 ("ipvs: convert lblc scheduler to rcu")
      allows RCU readers to use dest after calling ip_vs_dest_put().
      In a corner case this can race with ip_vs_dest_trash_expire(),
      which can release the dest while it is being returned to the
      RCU readers as the scheduling result.
      
      To fix the problem, do not allow en->dest to be replaced and
      defer the ip_vs_dest_put() call by using an RCU callback. Now
      en->dest does not need to be an RCU pointer.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: make the service replacement more robust · bcbde4c0
      Julian Anastasov authored
      Commit 578bc3ef ("ipvs: reorganize dest trash") added the
      IP_VS_DEST_STATE_REMOVING flag and an RCU callback named
      ip_vs_dest_wait_readers() to keep dests and services after
      removal for at least an RCU grace period. But we have the
      following corner cases:
      
      - we cannot reuse the same dest if its service is removed
      while IP_VS_DEST_STATE_REMOVING is still set, because another dest
      removal in the first grace period cannot extend this period.
      It can happen when ipvsadm -C && ipvsadm -R is used.
      
      - dest->svc can be replaced but ip_vs_in_stats() and
      ip_vs_out_stats() have no explicit read memory barriers
      when accessing dest->svc. It can happen that dest->svc
      was just freed (replaced) while we use it to update
      the stats.
      
      We solve the problems as follows:
      
      - IP_VS_DEST_STATE_REMOVING is removed and we ensure a fixed
      idle period for the dest (IP_VS_DEST_TRASH_PERIOD). idle_start
      records the first time after deletion that we noticed
      dest->refcnt=0. Later, the connections can grab a reference
      while in an RCU grace period, but if refcnt becomes 0 we can
      safely free the dest and its svc.
      
      - dest->svc becomes an RCU pointer. As a result, we add explicit
      RCU locking in ip_vs_in_stats() and ip_vs_out_stats().
      
      - __ip_vs_unbind_svc is renamed to __ip_vs_svc_put(); it
      can now free the service immediately or after an RCU grace
      period. dest->svc is not set to NULL anymore.
      
      As a result, unlinked dests and their services are
      always freed after the IP_VS_DEST_TRASH_PERIOD period, and unused
      services are freed after an RCU grace period.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
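      The fixed idle period described in this commit message can be sketched
      as a small user-space model. Everything here is an assumption for
      illustration: `struct trashed_dest`, `trash_may_free()` and the value of
      `DEST_TRASH_PERIOD` are hypothetical, and time is a plain counter rather
      than kernel jiffies. The sketch only shows the rule stated in the message:
      record when the dest was first seen with a zero refcnt, then allow freeing
      once the period has passed and the refcnt is zero again.

```c
#include <stdbool.h>

/* Hypothetical idle period, in arbitrary time units. */
#define DEST_TRASH_PERIOD 120UL

/* Hypothetical, simplified model of a dest sitting in the trash. */
struct trashed_dest {
    int refcnt;
    unsigned long idle_start; /* 0 means "not yet seen idle" */
};

/* Periodic check: returns true when the dest may be freed at time 'now'. */
static bool trash_may_free(struct trashed_dest *d, unsigned long now)
{
    /* Still referenced, for example by a connection: keep it. */
    if (d->refcnt > 0)
        return false;

    /* First time we notice refcnt == 0 after deletion: remember when.
     * 0 is the "unset" marker, so never store it as a timestamp. */
    if (d->idle_start == 0) {
        d->idle_start = now ? now : 1UL;
        return false;
    }

    /* Free only after the fixed idle period has fully elapsed. */
    return now - d->idle_start >= DEST_TRASH_PERIOD;
}
```

      In this model a reference taken during the idle period only delays the
      release; it does not restart the period.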
    • ipvs: fix overflow on dest weight multiply · c16526a7
      Simon Kirby authored
      Schedulers such as lblc and lblcr require the weight to be as high as the
      maximum number of active connections. In commit b552f7e3
      ("ipvs: unify the formula to estimate the overhead of processing
      connections"), the consideration of inactconns and activeconns was cleaned
      up to always count activeconns as 256 times more important than inactconns.
      In cases where 3000 or more connections are expected, a weight of 3000 *
      256 * 3000 connections overflows the 32-bit signed result used to determine
      if rescheduling is required.
      
      On amd64, this merely changes the multiply and comparison instructions to
      64-bit. On x86, a 64-bit result is already present from imull, so only
      a few more comparison instructions are emitted.
      Signed-off-by: Simon Kirby <sim@hostway.ca>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
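      The arithmetic in this commit message can be checked with a short
      sketch: 3000 * 256 * 3000 = 2,304,000,000, which exceeds the 32-bit signed
      maximum of 2,147,483,647. The helper names below (`conn_overhead()`,
      `product_32()`, `busier_32()`, `busier_64()`) are hypothetical and are not
      the kernel functions; they only model a cross-multiplied load comparison
      done in 32 bits versus 64 bits. The 32-bit product is computed in unsigned
      arithmetic to avoid undefined behaviour, and converting it back to a signed
      value is assumed to wrap, as it does on common two's-complement compilers.

```c
#include <stdint.h>

/* Hypothetical overhead estimate: one active connection counts as
 * much as 256 inactive ones. */
static int conn_overhead(int activeconns, int inactconns)
{
    return activeconns * 256 + inactconns;
}

/* Models what a 32-bit signed product would hold after wrapping. */
static int32_t product_32(int overhead, int weight)
{
    uint32_t wrapped = (uint32_t)overhead * (uint32_t)weight;

    return (int32_t)wrapped;
}

/* The same product, widened to 64 bits before multiplying. */
static int64_t product_64(int overhead, int weight)
{
    return (int64_t)overhead * weight;
}

/* Is A busier than B relative to weight?  Compares
 * overhead_a / weight_a > overhead_b / weight_b by cross-multiplying. */
static int busier_32(int oh_a, int w_a, int oh_b, int w_b)
{
    return product_32(oh_a, w_b) > product_32(oh_b, w_a);
}

static int busier_64(int oh_a, int w_a, int oh_b, int w_b)
{
    return product_64(oh_a, w_b) > product_64(oh_b, w_a);
}
```

      With dest A at 3000 active connections and weight 1, and dest B at one
      active connection and weight 3000, `busier_64()` reports A as busier, while
      `busier_32()` does not, because the wrapped product is negative.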
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 61c5923a
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      The following patchset contains Netfilter fixes for your net tree,
      mostly targeted to ipset, they are:
      
      * Fix ICMPv6 NAT due to wrong comparison, code instead of type, from
        Phil Oester.
      
      * Fix RCU race in conntrack extensions release path, from Michal Kubecek.
      
      * Fix missing inversion in the userspace ipset test command match if
        the nomatch option is specified, from Jozsef Kadlecsik.
      
      * Skip layer 4 protocol matching in ipset in case of IPv6 fragments,
        also from Jozsef Kadlecsik.
      
      * Fix sequence adjustment in nfnetlink_queue due to using the netlink
        skb instead of the network skb, from Gao feng.
      
      * Make sure we cannot swap sets with different layer 3 family in
        ipset, from Jozsef Kadlecsik.
      
      * Fix possible bogus matching in ipset if hash sets with net elements
        are used, from Oliver Smith.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: Avoid creating fdb entry with NULL destination · 2936b6ab
      Sridhar Samudrala authored
      Commit afbd8bae ("vxlan: add implicit fdb entry for default
      destination") creates an implicit fdb entry for the default
      destination. This results in an invalid fdb entry if no default
      destination is specified.
      For example:
        ip link add vxlan1 type vxlan id 100
      creates the following fdb entry
        00:00:00:00:00:00 dev vxlan1 dst 0.0.0.0 self permanent
      
      This patch fixes this issue by creating an fdb entry only if a
      valid default destination is specified.
      Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
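      The guard described in this commit message can be sketched in user
      space as follows. This is an assumption-laden model, not the vxlan driver
      code: `struct fdb_table`, `fdb_add()` and `setup_default_fdb()` are
      hypothetical names, the table is a fixed array, and an IPv4 address of 0
      (0.0.0.0) is taken to mean that no default destination was specified.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FDB_MAX 8

/* Hypothetical, simplified forwarding database. */
struct fdb_entry {
    uint8_t mac[6];
    uint32_t dst; /* IPv4 destination, 0 means 0.0.0.0 */
};

struct fdb_table {
    struct fdb_entry entries[FDB_MAX];
    size_t count;
};

/* Append one entry; fails when the table is full. */
static bool fdb_add(struct fdb_table *t, const uint8_t mac[6], uint32_t dst)
{
    if (t->count >= FDB_MAX)
        return false;
    memcpy(t->entries[t->count].mac, mac, 6);
    t->entries[t->count].dst = dst;
    t->count++;
    return true;
}

/* Add the implicit all-zeros-MAC entry only when a default
 * destination was actually given. */
static bool setup_default_fdb(struct fdb_table *t, uint32_t default_dst)
{
    static const uint8_t all_zeros_mac[6] = { 0 };

    if (default_dst == 0)
        return true; /* nothing to add, and not an error */
    return fdb_add(t, all_zeros_mac, default_dst);
}
```

      In this model, setting up a device without a default destination leaves
      the table empty instead of adding an entry that points at 0.0.0.0.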
  2. 17 Sep, 2013 14 commits
  3. 16 Sep, 2013 20 commits