1. 29 Mar, 2015 18 commits
    • net: dsa: mv88e6xxx: Disable Message Port bit for CPU port · 366f0a0f
      Guenter Roeck authored
      Datasheet says that the Message Port bit should not be set for the CPU port.
      Having it set causes DSA tagged packets to be sent to the CPU port roughly
      every 30 seconds. Those packets are the same as real packets forwarded between
      switch ports if the switch is configured for switching between multiple ports.
      The packets are then bridged by the software bridge, resulting in duplicated
      packets on the network.
      Reported-by: Andrew Lunn <andrew@lunn.ch>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Tested-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: Provide function for common port initialization · d827e88a
      Guenter Roeck authored
      Provide mv88e6xxx_setup_port_common() for common port initialization.
      Currently only write Port 1 Control and VLAN configuration since
      this will be needed for hardware bridging. More can be added later
      if desired/needed.
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Tested-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: Factor out common initialization code · acdaffcc
      Guenter Roeck authored
      Code used and needed in mv88e6xxx.c should be initialized there as well,
      so factor it out from the individual initialization files.
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Tested-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hv_netvsc: remove vmbus_are_subchannels_present() in rndis_filter_device_add() · 5ce58c2f
      Haiyang Zhang authored
      The vmbus_are_subchannels_present() call also involves opening the channels,
      which may be too early at this point. Checking for subchannels is not
      necessary here, so this patch removes the call. Subchannels will be opened
      when offer messages arrive.
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: smc91x: make use of 4th parameter to devm_gpiod_get_index · cb6e0b36
      Uwe Kleine-König authored
      Since 39b2bbe3 (gpio: add flags argument to gpiod_get*() functions),
      which appeared in v3.17-rc1, the gpiod_get* functions take an additional
      parameter that allows specifying the direction and, for outputs, the
      initial value. Simplify accordingly.
      
      Moreover, use devm_gpiod_get_index_optional() for still simpler handling.
      Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hv_netvsc: Implement batching in send buffer · 7c3877f2
      Haiyang Zhang authored
      With this patch, we can send out multiple RNDIS data packets in one send buffer
      slot and one VMBus message. It reduces the overhead associated with VMBus messages.
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 4ef295e0
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      The following patchset contains Netfilter updates for your net-next tree.
      Basically, nf_tables updates to add the set extension infrastructure and finish
      the transaction for sets from Patrick McHardy. More specifically, they are:
      
      1) Move netns to basechain and use recently added possible_net_t, from
         Patrick McHardy.
      
      2) Use LOGLEVEL_<FOO> from nf_log infrastructure, from Joe Perches.
      
      3) Restore nf_log_trace that was accidentally removed during conflict
         resolution.
      
      4) nft_queue does not depend on NETFILTER_XTABLES, starting from here
         all patches from Patrick McHardy.
      
      5) Use raw_smp_processor_id() in nft_meta.
      
      Then, several patches to prepare ground for the new set extension
      infrastructure:
      
      6) Pass object length to the hash callback in rhashtable as needed by
         the new set extension infrastructure.
      
      7) Cleanup patch to restore struct nft_hash as a wrapper for struct
          rhashtable.
      
      8) Another small source code readability cleanup for nft_hash.
      
      9) Convert nft_hash to rhashtable callbacks.
      
      And finally...
      
      10) Add the new set extension infrastructure.
      
      11) Convert the nft_hash and nft_rbtree sets to use it.
      
      12) Batch set element release to avoid several RCU grace periods in a row,
          and add the new function nft_set_elem_destroy() to consolidate set
          element release.
      
      13) Return the set extension data area from nft_lookup.
      
      14) Refactor existing transaction code to add some helper functions
          and document it.
      
      15) Complete the set transaction support, using similar approach to what we
          already use, to activate/deactivate elements in an atomic fashion.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tipc-next' · ae7633c8
      David S. Miller authored
      Ying Xue says:
      
      ====================
      tipc: fix two corner issues
      
      The patch set aims at resolving the following two critical issues:
      
      Patch #1: Resolve a deadlock which happens while all links are reset
      Patch #2: Correct a mistaken usage of the RCU lock that protects the
                node list
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: involve reference counter for node structure · 8a0f6ebe
      Ying Xue authored
      The TIPC node hash table is protected with an RCU lock on the read
      side. tipc_node_find() is used to look up a node object by its node
      address while iterating over the hash table. As the entire traversal
      done by tipc_node_find() is guarded by the RCU read lock, that part
      is safe. However, when callers use the node object returned by
      tipc_node_find(), no RCU read lock is held, so this is unsafe for
      the callers of tipc_node_find().
      
      Now we introduce a reference counter for the node structure. Before
      tipc_node_find() returns a node object to its caller, it first
      increments the reference counter. Accordingly, after its caller has
      used the object, it decrements the counter again. This prevents a node
      being used by one thread from being freed by another thread.
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericson.com>
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
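      The get/put pattern this commit describes can be sketched in plain C11; the structure and function names below are illustrative stand-ins, not TIPC's actual code:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a TIPC node: only the reference-count
 * pattern from the commit message is modelled here. */
struct node {
    int addr;
    atomic_int refcount;   /* starts at 1: the hash table's reference */
    int freed;             /* stands in for the real kfree() */
};

/* The lookup takes a reference before handing the object out, so
 * another thread cannot free the node while the caller still uses it. */
static struct node *node_find(struct node *n, int addr)
{
    if (n && n->addr == addr) {
        atomic_fetch_add(&n->refcount, 1);
        return n;
    }
    return NULL;
}

/* Callers drop their reference when done; the last put releases. */
static void node_put(struct node *n)
{
    if (atomic_fetch_sub(&n->refcount, 1) == 1)
        n->freed = 1;
}
```

      The same rule appears throughout the kernel: any function that hands a pointer out of an RCU-protected structure must pin the object before the caller leaves the RCU read section.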
    • tipc: fix potential deadlock when all links are reset · b952b2be
      Ying Xue authored
      [   60.988363] ======================================================
      [   60.988754] [ INFO: possible circular locking dependency detected ]
      [   60.989152] 3.19.0+ #194 Not tainted
      [   60.989377] -------------------------------------------------------
      [   60.989781] swapper/3/0 is trying to acquire lock:
      [   60.990079]  (&(&n_ptr->lock)->rlock){+.-...}, at: [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
      [   60.990743]
      [   60.990743] but task is already holding lock:
      [   60.991106]  (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
      [   60.991738]
      [   60.991738] which lock already depends on the new lock.
      [   60.991738]
      [   60.992174]
      [   60.992174] the existing dependency chain (in reverse order) is:
      [   60.992174]
      -> #1 (&(&bclink->lock)->rlock){+.-...}:
      [   60.992174]        [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
      [   60.992174]        [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
      [   60.992174]        [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
      [   60.992174]        [<ffffffffa0000f57>] tipc_bclink_add_node+0x97/0xf0 [tipc]
      [   60.992174]        [<ffffffffa0011815>] tipc_node_link_up+0xf5/0x110 [tipc]
      [   60.992174]        [<ffffffffa0007783>] link_state_event+0x2b3/0x4f0 [tipc]
      [   60.992174]        [<ffffffffa00193c0>] tipc_link_proto_rcv+0x24c/0x418 [tipc]
      [   60.992174]        [<ffffffffa0008857>] tipc_rcv+0x827/0xac0 [tipc]
      [   60.992174]        [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
      [   60.992174]        [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
      [   60.992174]        [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
      [   60.992174]        [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
      [   60.992174]        [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
      [   60.992174]        [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
      [   60.992174]        [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
      [   60.992174]        [<ffffffff81647b60>] net_rx_action+0x150/0x360
      [   60.992174]        [<ffffffff8105ec43>] __do_softirq+0x123/0x360
      [   60.992174]        [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
      [   60.992174]        [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
      [   60.992174]        [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
      [   60.992174]        [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
      [   60.992174]        [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
      [   60.992174]        [<ffffffff81033cda>] start_secondary+0x13a/0x150
      [   60.992174]
      -> #0 (&(&n_ptr->lock)->rlock){+.-...}:
      [   60.992174]        [<ffffffff810a8f7d>] __lock_acquire+0x163d/0x1ca0
      [   60.992174]        [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
      [   60.992174]        [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
      [   60.992174]        [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
      [   60.992174]        [<ffffffffa0001e11>] tipc_bclink_rcv+0x611/0x640 [tipc]
      [   60.992174]        [<ffffffffa0008646>] tipc_rcv+0x616/0xac0 [tipc]
      [   60.992174]        [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
      [   60.992174]        [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
      [   60.992174]        [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
      [   60.992174]        [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
      [   60.992174]        [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
      [   60.992174]        [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
      [   60.992174]        [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
      [   60.992174]        [<ffffffff81647b60>] net_rx_action+0x150/0x360
      [   60.992174]        [<ffffffff8105ec43>] __do_softirq+0x123/0x360
      [   60.992174]        [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
      [   60.992174]        [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
      [   60.992174]        [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
      [   60.992174]        [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
      [   60.992174]        [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
      [   60.992174]        [<ffffffff81033cda>] start_secondary+0x13a/0x150
      [   60.992174]
      [   60.992174] other info that might help us debug this:
      [   60.992174]
      [   60.992174]  Possible unsafe locking scenario:
      [   60.992174]
      [   60.992174]        CPU0                    CPU1
      [   60.992174]        ----                    ----
      [   60.992174]   lock(&(&bclink->lock)->rlock);
      [   60.992174]                                lock(&(&n_ptr->lock)->rlock);
      [   60.992174]                                lock(&(&bclink->lock)->rlock);
      [   60.992174]   lock(&(&n_ptr->lock)->rlock);
      [   60.992174]
      [   60.992174]  *** DEADLOCK ***
      [   60.992174]
      [   60.992174] 3 locks held by swapper/3/0:
      [   60.992174]  #0:  (rcu_read_lock){......}, at: [<ffffffff81646791>] __netif_receive_skb_core+0x71/0x980
      [   60.992174]  #1:  (rcu_read_lock){......}, at: [<ffffffffa0002c35>] tipc_l2_rcv_msg+0x5/0xd0 [tipc]
      [   60.992174]  #2:  (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
      [   60.992174]
      
      The correct sequence for grabbing n_ptr->lock and bclink->lock is to
      hold the former first and then take the latter, which is exactly what
      happens on CPU1. But when retransmission on the broadcast link fails,
      bclink->lock is first held in tipc_bclink_rcv(), and n_ptr->lock is
      subsequently taken in link_retransmit_failure(), called by
      tipc_link_retransmit(), as demonstrated on CPU0. As a result, deadlock
      occurs.
      
      If the order of taking the two locks on CPU0 is reversed, the deadlock
      risk is removed. Therefore, the node lock originally taken in
      link_retransmit_failure() is moved to tipc_bclink_rcv() so that it is
      obtained before the bclink lock. The precondition for this adjustment
      is that responding to the bclink reset event must be moved from
      tipc_bclink_unlock() to tipc_node_unlock().
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
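      The ordering rule the fix enforces can be illustrated with a tiny lockdep-style model in C; locks are plain flags here, and every name is made up rather than taken from TIPC:

```c
#include <assert.h>

/* Rule after the fix: node_lock must always be taken before
 * bclink_lock. Acquiring them the other way round is the pre-fix
 * CPU0 pattern that closed the deadlock cycle. */
static int node_locked, bclink_locked, order_violations;

static void lock_node(void)
{
    if (bclink_locked)        /* node after bclink: wrong order */
        order_violations++;
    node_locked = 1;
}

static void lock_bclink(void)   { bclink_locked = 1; }
static void unlock_node(void)   { node_locked = 0; }
static void unlock_bclink(void) { bclink_locked = 0; }

/* Post-fix receive path: the node lock is taken in the caller
 * (tipc_bclink_rcv() in the real code) before the bclink lock, so the
 * retransmit-failure handling never re-acquires out of order. */
static void bclink_rcv(void)
{
    lock_node();
    lock_bclink();
    /* ... broadcast receive / retransmit-failure handling ... */
    unlock_bclink();
    unlock_node();
}
```

      Real lockdep does exactly this kind of bookkeeping per lock class, which is how it produced the splat quoted above.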
    • virtio: simplify the using of received in virtnet_poll · faadb05f
      Li RongQing authored
      received is initialized to 0, so there is no need to subtract from it
      or to use "+=" when reassigning it.
      Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'be2net-next' · 3556eaaa
      David S. Miller authored
      Sathya Perla says:
      
      ====================
      be2net: patch set
      
      Hi David, this patch set includes 2 feature additions to the be2net driver:
      
      Patch 1 sets up cpu affinity hints for be2net irqs using the
      cpumask_set_cpu_local_first() API that first picks the near numa cores
      and when they are exhausted, selects the far numa cores.
      
      Patch 2 sets up XPS queue mapping for be2net's TXQs to avoid,
      by default, TX lock contention.
      
      Patch 3 just bumps up the driver version.
      
      Pls consider applying this patch set to the net-next queue. Thanks!
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • be2net: setup xps queue mapping · 73f394e6
      Sathya Perla authored
      This patch sets up xps queue mapping on load, so that TX traffic is
      steered to the queue whose irqs are being processed by the current cpu.
      This helps in avoiding TX lock contention.
      Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • be2net: assign CPU affinity hints to be2net IRQs · d658d98a
      Padmanabh Ratnakar authored
      This patch provides hints to irqbalance to map be2net IRQs to
      specific CPU cores. cpumask_set_cpu_local_first() is used, which first
      maps IRQs to near NUMA cores; when those cores are exhausted, IRQs are
      mapped to far NUMA cores.
      Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@emulex.com>
      Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: tcp_syn_flood_action() can be static · 41d25fe0
      Eric Dumazet authored
      After commit 1fb6f159 ("tcp: add tcp_conn_request"),
      tcp_syn_flood_action() is no longer used from IPv6.
      
      We can make it static by moving it above tcp_conn_request().
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Octavian Purdila <octavian.purdila@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: fix boolreturn.cocci warnings · 1fb7cd4e
      Wu Fengguang authored
      drivers/net/ethernet/chelsio/cxgb4/cxgb4_fcoe.c:49:9-10: WARNING: return of 0/1 in function 'cxgb_fcoe_sof_eof_supported' with return type bool
      
       Return statements in functions returning bool should use
       true/false instead of 1/0.
      Generated by: scripts/coccinelle/misc/boolreturn.cocci
      
      CC: Varun Prakash <varun@chelsio.com>
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
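      The style rule the cocci warning enforces looks like this in a minimal C example; the function and the accepted SOF values are placeholders, not the actual cxgb4 FCoE logic:

```c
#include <assert.h>
#include <stdbool.h>

/* A bool-returning predicate should say true/false, not 1/0.
 * The two accepted byte values below are invented for illustration. */
static bool sof_supported(unsigned char sof)
{
    if (sof == 0x2d || sof == 0x35)
        return true;    /* not "return 1;" */
    return false;       /* not "return 0;" */
}
```

      The generated code is identical either way; the fix is purely about keeping bool semantics readable, which is why a semantic patch can produce it mechanically.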
    • fib6: install fib6 ops in the last step · 85b99092
      WANG Cong authored
      We should not commit the new ops until we finish all of the setup;
      otherwise we have to NULL it out on failure.
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
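      The commit-last idiom generalizes well beyond fib6. A minimal C sketch, with hypothetical names rather than the fib6 ones:

```c
#include <assert.h>
#include <stddef.h>

/* Publish the ops pointer only as the final step of registration, so a
 * failed setup never leaves a globally visible pointer that has to be
 * NULLed back out in an error path. */
struct fib_ops { int id; };

static struct fib_ops *installed_ops;   /* what the rest of the stack sees */

static int register_ops(struct fib_ops *candidate, int setup_ok)
{
    /* ... all allocation and initialization would happen here ... */
    if (!setup_ok)
        return -1;                  /* fail before publishing: no undo */
    installed_ops = candidate;      /* commit as the very last step */
    return 0;
}
```

      Ordering the publish last also means readers can never observe a half-initialized ops structure, which matters once the pointer is visible to other contexts.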
  2. 27 Mar, 2015 12 commits
  3. 26 Mar, 2015 4 commits
    • netfilter: nf_tables: implement set transaction support · cc02e457
      Patrick McHardy authored
      Set elements are the last object type lacking transaction support.
      Implement it similarly to the existing rule transactions:
      
      The global transaction counter keeps track of two generations, current
      and next. Each element contains a bitmask specifying in which generations
      it is inactive.
      
      New elements start out as inactive in the current generation and active
      in the next. On commit, the previous next generation becomes the current
      generation and the element becomes active. The bitmask is then cleared
      to indicate that the element is active in all future generations. If the
      transaction is aborted, the element is removed from the set before it
      becomes active.
      
      When removing an element, it gets marked as inactive in the next
      generation. On commit, the next generation becomes the current one and
      the element therefore becomes inactive. It is then taken out of the
      set and released. On abort, the element is marked as active for the
      next generation again.
      
      Lookups ignore elements not active in the current generation.
      
      The current set types (hash/rbtree) both use a field in the extension area
      to store the generation mask. This (currently) does not require any
      additional memory since we have some free space in there.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
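      The two-generation element mask described above can be modelled in a few lines of C. This is a simplified illustration of the scheme, not the nf_tables code:

```c
#include <assert.h>
#include <stdbool.h>

/* Bit g of inactive_mask set means the element is inactive in
 * generation g; the transaction counter alternates between two
 * generation bits. */
static unsigned int cur_genbit;         /* 0 or 1: current generation */

struct elem { unsigned char inactive_mask; };

/* New elements start inactive in the current generation only, so
 * lookups ignore them until the transaction commits. */
static void elem_add(struct elem *e)
{
    e->inactive_mask = 1u << cur_genbit;
}

static bool elem_active(const struct elem *e)
{
    return !(e->inactive_mask & (1u << cur_genbit));
}

/* Commit: the next generation becomes current; clearing the mask makes
 * the element active in all future generations. */
static void commit(struct elem *e)
{
    cur_genbit ^= 1u;
    e->inactive_mask = 0;
}
```

      The attractive property is that flipping one global generation bit activates or deactivates every pending element atomically, with no per-element work on the lookup fast path beyond one mask test.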
    • netfilter: nf_tables: add transaction helper functions · ea4bd995
      Patrick McHardy authored
      Add some helper functions for building the genmask as preparation for
      set transactions.
      
      Also add a little documentation on how this stuff actually works.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: return set extensions from ->lookup() · b2832dd6
      Patrick McHardy authored
      Return the extension area from the ->lookup() function to allow
      consolidating common actions.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: consolidate set element destruction · 61edafbb
      Patrick McHardy authored
      With the conversion to set extensions, it is now possible to consolidate
      the different set element destruction functions.
      
      The set implementations' ->remove() functions are changed to only take
      the element out of their internal data structures. Elements will be freed
      in a batched fashion after the global transaction's completion RCU grace
      period.
      
      This reduces the number of grace periods required for nft_hash from N
      to zero additional ones. Additionally, this guarantees that the set
      elements' extensions of all implementations can be used under RCU
      protection.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
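      The batching idea reduces to "unlink now, free later in one pass". A minimal C model with illustrative names (in the kernel, the single deferred pass runs after the transaction's RCU grace period):

```c
#include <assert.h>
#include <stddef.h>

/* ->remove() only unlinks an element from the set's data structure and
 * queues it; actual release is deferred and batched. */
struct elem {
    struct elem *next_free;
    int unlinked;
};

static struct elem *free_list;          /* elements awaiting release */

static void set_remove(struct elem *e)  /* unlink only, no free */
{
    e->unlinked = 1;
    e->next_free = free_list;
    free_list = e;
}

/* One pass after the (single) grace period releases everything. */
static int release_batch(void)
{
    int n = 0;
    while (free_list) {
        struct elem *e = free_list;
        free_list = e->next_free;
        /* the kernel would kfree(e) here; ours are caller-owned */
        n++;
    }
    return n;
}
```

      Deferring the frees to one batch is what turns N back-to-back grace-period waits into zero additional ones.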
  4. 25 Mar, 2015 6 commits
    • ipv6: hash net ptr into fragmentation bucket selection · 5a352dd0
      Hannes Frederic Sowa authored
      As namespaces are sometimes used with overlapping ip address ranges,
      we should also use the namespace as input to the hash to select the ip
      fragmentation counter bucket.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Flavio Leitner <fbl@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: hash net ptr into fragmentation bucket selection · b6a7719a
      Hannes Frederic Sowa authored
      As namespaces are sometimes used with overlapping ip address ranges,
      we should also use the namespace as input to the hash to select the ip
      fragmentation counter bucket.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Flavio Leitner <fbl@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
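      The idea in both fragmentation patches is the same: fold the net namespace pointer into the bucket hash so identical address pairs in different namespaces select different counter buckets. A sketch with arbitrary mixing constants (the real code uses jhash with a random seed; names here are invented):

```c
#include <assert.h>
#include <stdint.h>

#define NBUCKETS 64u   /* illustrative bucket count */

/* Mix the namespace pointer together with the addresses, so that
 * overlapping address ranges in two namespaces do not share one
 * fragmentation-counter bucket. */
static unsigned int frag_bucket(uint32_t saddr, uint32_t daddr,
                                const void *net)
{
    uint64_t h = (uint64_t)(uintptr_t)net;  /* namespace as hash input */
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;             /* a common 64-bit mixer */
    h ^= saddr * 2654435761u;               /* fold in the addresses */
    h ^= daddr * 2246822519u;
    return (unsigned int)(h % NBUCKETS);
}
```

      Without the namespace input, two containers reusing the same private addresses would advance one shared IP-ID counter, which is both an information leak and a correctness hazard.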
    • Merge branch 'tipc-next' · 8fa38a38
      David S. Miller authored
      Jon Maloy says:
      
      ====================
      tipc: some improvements and fixes
      
      We introduce a better algorithm for selecting when and which
      users should be subject to link congestion control, plus clean
      up some code for that mechanism.
      Commit #3 fixes another rare race condition during packet reception.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: eliminate race condition at dual link establishment · 8b4ed863
      Jon Paul Maloy authored
      Despite recent improvements, the establishment of dual parallel
      links still has a small glitch where messages can bypass each
      other. When the second link in a dual-link configuration is
      established, part of the first link's traffic will be steered over
      to the new link. Although we do have a mechanism to ensure that
      packets sent before and after the establishment of the new link
      arrive in sequence to the destination node, this is not enough.
      The arriving messages will still be delivered upwards in different
      threads, something entailing a risk of message disordering during
      the transition phase.
      
      To fix this, we introduce a synchronization mechanism between the
      two parallel links, so that traffic arriving on the new link cannot
      be added to its input queue until we are guaranteed that all
      pre-establishment messages have been delivered on the old, parallel
      link.
      
      This problem seems to always have been around, but its occurrence is
      so rare that it has not been noticed until recent intensive testing.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: clean up handling of link congestion · 3127a020
      Jon Paul Maloy authored
      After the recent changes in message importance handling it becomes
      possible to simplify handling of messages and sockets when we
      encounter link congestion.
      
      We merge the function tipc_link_cong() into link_schedule_user(),
      and simplify the code of the latter. The code should now be
      easier to follow, especially regarding return codes and handling
      of the message that caused the situation.
      
      In case the scheduling function is unable to pre-allocate a wakeup
      message buffer, it now returns -ENOBUFS, which is a more correct
      code than the previously used -EHOSTUNREACH.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: introduce starvation free send algorithm · 1f66d161
      Jon Paul Maloy authored
      Currently, we use only a single counter, the length of the backlog
      queue, to determine whether a message should be accepted to the queue
      or not. Each time a message is sent, the queue length is compared
      to a threshold value for the message's importance priority. If the queue
      length is beyond this threshold, the message is rejected. This algorithm
      implies a risk of starvation of low importance senders during very high
      load, because it may take a long time before the backlog queue has
      decreased enough to accept a lower level message.
      
      We now eliminate this risk by introducing a counter for each importance
      priority. When a message is sent, we check only the queue level for that
      particular message's priority. If that is ok, the message can be added
      to the backlog, irrespective of the queue level for other priorities.
      This way, each level is guaranteed a certain portion of the total
      bandwidth, and any risk of starvation is eliminated.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
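      The per-level admission check described above can be sketched in C; the number of levels and the limits below are invented for illustration, not TIPC's actual importance thresholds:

```c
#include <assert.h>
#include <stdbool.h>

#define LEVELS 4   /* illustrative importance levels */

static int backlog_len[LEVELS];
static const int backlog_limit[LEVELS] = { 16, 32, 64, 128 };

/* Admission checks only the sender's own priority level, so a full
 * low-priority backlog can no longer starve higher levels, and each
 * level keeps its own guaranteed share of the queue. */
static bool backlog_add(int prio)
{
    if (prio < 0 || prio >= LEVELS)
        return false;
    if (backlog_len[prio] >= backlog_limit[prio])
        return false;               /* reject only this level */
    backlog_len[prio]++;
    return true;
}
```

      Contrast this with the old scheme, where a single total-length counter was compared against a per-importance threshold: one busy level could keep the total high enough to reject everyone below it.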