1. 02 Aug, 2013 16 commits
  2. 01 Aug, 2013 24 commits
    • David S. Miller's avatar
      Merge branch 'bond_rcu' · a594e4f8
      David S. Miller authored
      Nikolay Aleksandrov says:
      
      ====================
       This patchset aims to lay the groundwork, and do the initial conversion to
      RCUism. I decided that it'll be much better to make the bonding RCU
      conversion gradual, so patches can be reviewed and tested better rather
      than having one huge patch (which I did in the beginning, before this).
      The first patch is straightforward: it converts the bonding to the
      standard list API, simplifying a lot of code, removing unnecessary local
      variables and allowing the rculist API to be used later. It also takes
      care of some minor styling issues (re-arranging local variables longest ->
      shortest, removing braces around single-statement if/else, leaving an
      empty line before return statements, etc.).
       The second patch simplifies the conversion by removing unnecessary
      read_lock(&bond->curr_slave_lock) calls in xmit paths that are to be
      converted later. There we only care whether the pointer is NULL or a
      slave, and since we already hold bond->lock the slave can't go away.
       The third patch simplifies the broadcast xmit function by removing
      the use of curr_active_slave and converting to standard list API. Also this
      design of the broadcast xmit function avoids a subtle double packet tx race
      when converted to RCU.
       The fourth patch factors out the code that transmits an skb through a
      slave with a given id (i.e. rr_tx_counter in rr mode, hashed value in xor
      mode) and simplifies the active-backup xmit path, because
      bond_dev_queue_xmit always consumes the skb. The new bond_xmit_slave_id
      function is currently used in rr and xor modes, but the plan is to use it
      in 3ad mode as well, so it's made global. I've left the function prototype
      at 81 chars so I wouldn't have to break it; if this is an issue I can
      always split it across more lines.
       The fifth patch introduces RCU by converting attach/detach and release to
      RCU. It also converts dereferencing of curr_active_slave to
      rcu_dereference; curr_active_slave is not fully converted to RCU yet, but
      this is needed for the converted xmit paths. It also converts the
      roundrobin, broadcast, xor and active-backup xmit paths to RCU. The 3ad
      and ALB/TLB modes still acquire read_lock(&bond->lock) to make sure that
      no slave will be removed and to sync properly with enslave and release,
      as before.
       This way, for the price of a little complexity, we'll be able to convert
      individual parts of the bonding to RCU and test them more easily in the
      process. If this patchset is accepted in some form, I'll post followups
      in the next weeks that gradually convert the bonding to RCU and remove the
      need for the rwlocks.
       For performance notes please refer to patch 5 (RCU conversion one).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a594e4f8
    • nikolay@redhat.com's avatar
      bonding: initial RCU conversion · 278b2083
      nikolay@redhat.com authored
      This patch does the initial bonding conversion to RCU. After it the
      following modes are protected by RCU alone: roundrobin, active-backup,
      broadcast and xor. Modes ALB/TLB and 3ad still acquire bond->lock for
      reading, and will be dealt with later. curr_active_slave needs to be
      dereferenced via rcu in the converted modes because the only thing
      protecting the slave after this patch is rcu_read_lock, so we need the
      proper barrier for weakly ordered archs and to make sure we don't read a
      stale pointer. It's not tagged with __rcu yet because there's still work
      to be done to remove the curr_slave_lock, so sparse will complain when
      rcu_assign_pointer and rcu_dereference are used; the alternative, using
      rcu_dereference_protected, would've created much more code churn, which
      is more difficult to test and review. That will be converted in time.
      
      1. Active-backup mode
       1.1 Perf recording while doing iperf -P 4
        - old bonding: iperf spent 0.55% in bonding, system spent 0.29% CPU
                       in bonding
        - new bonding: iperf spent 0.29% in bonding, system spent 0.15% CPU
                       in bonding
       1.2 Bandwidth measurements
        - old bonding: 16.1 gbps consistently
        - new bonding: 17.5 gbps consistently
      
      2. Round-robin mode
       2.1 Perf recording while doing iperf -P 4
        - old bonding: iperf spent 0.51% in bonding, system spent 0.24% CPU
                       in bonding
        - new bonding: iperf spent 0.16% in bonding, system spent 0.11% CPU
                       in bonding
       2.2 Bandwidth measurements
        - old bonding: 8 gbps (variable due to packet reorderings)
        - new bonding: 10 gbps (variable due to packet reorderings)
      
      Latency has also improved in all converted modes, including during
      enslave/release (since it no longer affects tx).
      
      Also I've stress tested all modes doing enslave/release in a loop while
      transmitting traffic.
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      278b2083
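The publish/read discipline that the commit message describes for curr_active_slave can be sketched in userspace with C11 atomics. This is only an illustration under stated assumptions: `toy_bond`, `toy_slave` and the `toy_*` helpers are hypothetical names, a release store stands in for rcu_assign_pointer, an acquire load stands in for rcu_dereference (which relies on weaker dependency ordering in the kernel), and grace periods are not modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for struct slave / struct bonding. */
struct toy_slave {
    int id;
    int link_up;
};

struct toy_bond {
    _Atomic(struct toy_slave *) curr_active_slave;
};

/* Writer side: the slave must be fully initialised before this call.
 * The release store plays the role of rcu_assign_pointer(). */
static void toy_publish_active(struct toy_bond *bond, struct toy_slave *slave)
{
    atomic_store_explicit(&bond->curr_active_slave, slave,
                          memory_order_release);
}

/* Reader side: load the pointer exactly once.
 * The acquire load plays the role of rcu_dereference(). */
static struct toy_slave *toy_read_active(struct toy_bond *bond)
{
    return atomic_load_explicit(&bond->curr_active_slave,
                                memory_order_acquire);
}

/* Active-backup style xmit: work on one snapshot of the pointer, so it
 * cannot change between the NULL check and the use.
 * Returns the id of the slave that would transmit, or -1 to drop. */
static int toy_xmit_active(struct toy_bond *bond)
{
    struct toy_slave *slave = toy_read_active(bond);

    if (slave == NULL || !slave->link_up)
        return -1;

    return slave->id;
}
```

In the kernel the reader would additionally sit inside rcu_read_lock()/rcu_read_unlock(), and the writer would wait for a grace period before freeing a released slave; both are left out here.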
    • Nikolay Aleksandrov's avatar
      bonding: factor out slave id tx code and simplify xmit paths · 15077228
      Nikolay Aleksandrov authored
      I factored out the tx code which relies on slave id into
      bond_xmit_slave_id. It is global because it can later also be used in
      3ad mode xmit. Unnecessary obvious comments are removed. Active-backup
      mode is simplified because bond_dev_queue_xmit always consumes the skb.
      bond_xmit_xor becomes one line because of bond_xmit_slave_id.
      bond_for_each_slave_from is not used in bond_xmit_slave_id because later,
      when RCU is used, we can avoid an important race condition by using the
      standard rculist routines.
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      15077228
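A minimal userspace sketch of selecting a transmit slave by id, as described above. It is not the kernel function: the slave list is modelled as a plain array, `toy_slave`, `toy_pick_slave_by_id` and `toy_rr_slave_id` are hypothetical names, and the wrap-around fallback to an earlier slave is an assumption about the intended behaviour when the slave at the requested position cannot transmit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct slave: only "can this slave tx?". */
struct toy_slave {
    int can_tx;
};

/* Pick the slave that should transmit for a given slave_id.
 * Returns its index, or -1 if no slave can transmit (frame dropped). */
static int toy_pick_slave_by_id(const struct toy_slave *slaves, size_t count,
                                size_t slave_id)
{
    size_t i;

    /* First pass: the slave at slave_id and everything after it. */
    for (i = slave_id; i < count; i++)
        if (slaves[i].can_tx)
            return (int)i;

    /* Second pass: wrap around to the slaves before slave_id. */
    for (i = 0; i < slave_id && i < count; i++)
        if (slaves[i].can_tx)
            return (int)i;

    return -1;
}

/* Round-robin id: a per-bond counter modulo the number of slaves.
 * In xor mode the id would come from a hash of the frame instead. */
static size_t toy_rr_slave_id(unsigned int *tx_counter, size_t count)
{
    assert(count > 0);
    return (*tx_counter)++ % count;
}
```

With three slaves where only the first and third can transmit, id 1 resolves to index 2; if the third also goes down, id 1 wraps around to index 0.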
    • Nikolay Aleksandrov's avatar
      bonding: simplify broadcast_xmit function · 78a646ce
      Nikolay Aleksandrov authored
      We don't need to start from the curr_active_slave as the frame will be
      sent to all eligible slaves anyway, so we remove the unnecessary local
      variables, checks and comments, and make it use the standard list API.
      This has the nice side-effect that later when it's converted to RCU
      a race condition will be avoided which could lead to double packet tx.
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      78a646ce
    • nikolay@redhat.com's avatar
      bonding: remove unnecessary read_locks of curr_slave_lock · 71bc3b2d
      nikolay@redhat.com authored
      In all these cases we already hold bond->lock for reading, so the slave
      can't go away and the != NULL check is sufficient. curr_active_slave
      can still change after curr_slave_lock is released and before the
      dereferenced value is used, so taking the lock there gains nothing. The
      pointer either contains a valid slave which we use (and which can't go
      away), or it is NULL, which is checked.
      In some places the read_lock of curr_slave_lock was left because we need
      the pointer not to change while performing some action (e.g. syncing the
      current active slave's addresses, sending ARP requests through the active
      slave); such cases will be dealt with individually while converting to
      RCU.
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      71bc3b2d
    • nikolay@redhat.com's avatar
      bonding: convert to list API and replace bond's custom list · dec1e90e
      nikolay@redhat.com authored
      This patch aims to remove struct bonding's first_slave and struct
      slave's next and prev pointers, and replace them with the standard Linux
      list API. The old macros are converted to the list API as well, and some
      new primitives are now available. The checks that used slave_cnt to test
      whether there are any slaves have been replaced by list_empty().
      Also a few small style fixes: ordering local variable declarations
      longest -> shortest, leaving an empty line before return and removing
      unnecessary braces.
      This is the first step to gradual RCU conversion.
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dec1e90e
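To illustrate the idea of replacing hand-rolled next/prev pointers and a slave counter with an embedded list node, here is a small userspace sketch. It only mimics the shape of the kernel list API: `init_list_head`, `list_is_empty`, `list_add_to_tail`, `list_unlink` and `list_entry_of` are hypothetical stand-ins for INIT_LIST_HEAD, list_empty, list_add_tail, list_del and list_entry, and `toy_slave`/`toy_bond` are simplified assumptions rather than the driver's real structures.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for a circular, doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* An empty list is a head that points back at itself. */
static int list_is_empty(const struct list_head *head)
{
    return head->next == head;
}

static void list_add_to_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

static void list_unlink(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = entry;
    entry->prev = entry;
}

/* Recover the containing structure from its embedded list node. */
#define list_entry_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed-down models of struct slave / struct bonding. */
struct toy_slave {
    int id;
    struct list_head list;   /* replaces custom next/prev pointers */
};

struct toy_bond {
    struct list_head slave_list;   /* replaces a first_slave pointer */
};

static int toy_bond_count_slaves(const struct toy_bond *bond)
{
    const struct list_head *pos;
    int n = 0;

    for (pos = bond->slave_list.next; pos != &bond->slave_list; pos = pos->next)
        n++;

    return n;
}

static struct toy_slave *toy_bond_first_slave(struct toy_bond *bond)
{
    if (list_is_empty(&bond->slave_list))
        return NULL;

    return list_entry_of(bond->slave_list.next, struct toy_slave, list);
}
```

The "are there any slaves?" question becomes an emptiness test on the list head, so no separate counter has to be kept in sync for that purpose.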
    • fan.du's avatar
      ipv6: bump genid when delete/add address · 439677d7
      fan.du authored
      Server           Client
      2001:1::803/64  <-> 2001:1::805/64
      2001:2::804/64  <-> 2001:2::806/64
      
      Server side fib binary tree looks like this:
      
                                         (2001:/64)
                                         /
                                        /
                         ffff88002103c380
                       /                 \
           (2)        /                   \
       (2001::803/128)                     ffff880037ac07c0
                                          /               \
                                         /                 \  (3)
                            ffff880037ac0640               (2001::806/128)
                             /             \
                   (1)      /               \
              (2001::804/128)               (2001::805/128)
      
      Deleting 2001::804/64 does not delete the prefix route, nor the rt in (3),
      which is destined to 2001::806 with source address 2001::804/64. That's
      because 2001::803/64 is still alive, which makes onlink=1 in
      ipv6_del_addr; this is the substantial difference between same-prefix and
      different-prefix configurations :) So packets are still transmitted to
      2001::806 with source address 2001::804/64.
      
      Bumping genid clears the rt in (3), and the upper-layer protocols will
      eventually find the right route for themselves.
      
      This problem arose from the discussion here:
      http://marc.info/?l=linux-netdev&m=137404469219410&w=4
      Signed-off-by: Fan Du <fan.du@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      439677d7
    • David S. Miller's avatar
      Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next · c1fc20aa
      David S. Miller authored
      Marc Kleine-Budde says:
      
      ====================
      This is a pull-request for net-next/master. It consists of two patches
      by Fabio Estevam. The first converts the flexcan driver to use
      devm_ioremap_resource(), the second adds return value checking for
      clk_prepare_enable().
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c1fc20aa
    • Yuval Mintz's avatar
      bnx2x: Revising locking scheme for MAC configuration · 8b09be5f
      Yuval Mintz authored
      On very rare occasions, repeated load/unload stress test in the presence of
      our storage driver (bnx2i/bnx2fc) causes a kernel panic in bnx2x code
      (NULL pointer dereference). Stack traces indicate the issue happens during MAC
      configuration; thorough code review showed that indeed several races exist
      in which one thread can iterate over the list of configured MACs while another
      deletes entries from the same list.
      
      This patch adds a variant of the single-writer/multiple-reader lock
      mechanism. It utilizes an already existing bottom-half lock, using it so
      that whenever a writer is unable to continue due to the existence of
      another writer/reader, it pends its request for future execution.
      The writer or the last reader will check for the existence of such
      requests and perform them instead of the original initiator.
      This prevents the writer from having to sleep while waiting for the lock
      to be accessible, which might cause deadlocks given the locks already
      held by the writer.
      
      Another result of this patch is that setting of Rx Mode is now done in
      sleepable context. Setting of Rx Mode is requested under a bottom-half
      lock, which was always nontrivial for the bnx2x driver, as the HW/FW
      configuration requires waiting for completions.
      Since sleeping was impossible (due to the sleepless context), various
      mechanisms were used to prevent the calling thread from sleeping, but the
      truth was that when the caller thread (i.e., the one calling
      ndo_set_rx_mode()) returned, the Rx mode was still not set in HW/FW.
      
      bnx2x_set_rx_mode() will now overtly schedule the Rx changes to be
      configured by the sp_rtnl_task, which holds the RTNL lock and runs in
      sleepable context.
      Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
      Signed-off-by: Ariel Elior <ariele@broadcom.com>
      Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8b09be5f
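The "pend the request and let the writer or last reader run it" scheme described above can be sketched as a single-threaded model. This is an assumption-laden illustration, not the driver's code: `toy_mac_list` and the `toy_*` functions are hypothetical, the bottom-half spinlock that would protect these fields is omitted, and only one pending request is kept (a later pended request overwrites an earlier one).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the lock state guarding a MAC list. */
struct toy_mac_list {
    int readers;         /* readers currently walking the list */
    bool writer;         /* a writer currently owns the list */
    bool has_pending;    /* a deferred write request is stored */
    int pending_value;   /* payload of the deferred request */
    int applied_value;   /* last write that actually took effect */
};

static void toy_apply(struct toy_mac_list *l, int value)
{
    l->applied_value = value;
}

/* Execute the deferred request, if any, on behalf of its initiator. */
static void toy_run_pending(struct toy_mac_list *l)
{
    if (l->has_pending) {
        l->has_pending = false;
        toy_apply(l, l->pending_value);
    }
}

/* Writer entry point. Never sleeps: if the list is busy, the request is
 * pended and false is returned; otherwise it is applied immediately. */
static bool toy_write(struct toy_mac_list *l, int value)
{
    if (l->readers > 0 || l->writer) {
        l->pending_value = value;
        l->has_pending = true;
        return false;
    }

    l->writer = true;
    toy_apply(l, value);
    toy_run_pending(l);   /* the writer checks for pended requests */
    l->writer = false;

    return true;
}

static void toy_read_lock(struct toy_mac_list *l)
{
    l->readers++;
}

/* The last reader to leave performs any pended write. */
static void toy_read_unlock(struct toy_mac_list *l)
{
    if (--l->readers == 0 && !l->writer)
        toy_run_pending(l);
}
```

A write issued while two readers are active returns immediately without taking effect; it is applied when the second reader calls toy_read_unlock().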
    • Nikolay Aleksandrov's avatar
      bonding: fix system hang due to fast igmp timer rescheduling · 4beac029
      Nikolay Aleksandrov authored
      After commit 4aa5dee4 ("net: convert resend IGMP to notifier event")
      we try to acquire rtnl in bond_resend_igmp_join_requests, but it can be
      scheduled with rtnl already held (e.g. when bond_change_active_slave is
      called with rtnl), causing a loop of immediate reschedules + calls because
      rtnl_trylock fails each time since rtnl is already held.
      For me this issue leads to system hangs very easily:
      modprobe bonding; ifconfig bond0 up; ifenslave bond0 eth0; rmmod bonding;
      
      The fix is to introduce a small (1 jiffy) delay which is enough for the
      sections holding rtnl to finish without putting any strain on the system.
      Also adjust the timer in bond_change_active_slave to be 1 jiffy, since
      most of the time it's called with rtnl already held.
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4beac029
    • Chris Metcalf's avatar
      tile: support multiple mPIPE shims in tilegx network driver · f3286a3a
      Chris Metcalf authored
      The initial driver support was for a single mPIPE shim on the chip
      (as is the case for the Gx36 hardware).  The Gx72 chip has two mPIPE
      shims, so we extend the driver to handle that case.
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f3286a3a
    • Chris Metcalf's avatar
      6ab4ae9a
    • Chris Metcalf's avatar
      tile: fix panic bug in napi support for tilegx network driver · 5e7a54a2
      Chris Metcalf authored
      The code used to call napi_disable() in an interrupt handler
      (from smp_call_function), which in turn could call msleep().
      Unfortunately you can't sleep in an interrupt context.
      
      Luckily it turns out all the NAPI support functions are
      just operating on data structures and not on any deeply
      per-cpu data, so we can arrange to set up and tear down all
      the NAPI state on the core driving the process, and just
      do the IRQ enable/disable as a smp_call_function thing.
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5e7a54a2
    • Chris Metcalf's avatar
      tile: avoid bug in tilepro net driver built with old hypervisor · 815d3bae
      Chris Metcalf authored
      Building against headers from an older Tilera hypervisor can cause
      the frags[] array to be overrun.  Don't enable TSO in that case.
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      815d3bae
    • Chris Metcalf's avatar
      tile: set hw_features and vlan_features in setup · a8eaed55
      Chris Metcalf authored
      This change allows the user to configure various features of the tile
      networking drivers on and off.  There is no change to the default
      initialization state of either the tilegx or tilepro drivers.
      
      Neither driver needs the ndo_fix_features or ndo_set_features callbacks,
      since the generic code already handles the dependencies for
      fix_features, and there is no hardware state to tweak in set_features.
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a8eaed55
    • Claudiu Manoil's avatar
      gianfar: Remove unused field grp_id from gfar_priv_grp · 84915c64
      Claudiu Manoil authored
      grp->grp_id is obsolete. It has no use in the current driver.
      Remove it from gfar_priv_grp and put the 'rstat' member
      in its place, in the 2nd cache line, as rstat needs fast access.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      84915c64