1. 13 May, 2015 12 commits
  2. 12 May, 2015 28 commits
    • Merge branch 'switchdev_spring_cleanup' · a62b70dd
      David S. Miller authored
      Scott Feldman says:
      
      ====================
      switchdev: spring cleanup
      
      v7:
      
      Address review comments:
      
       - [Jiri] split the br_setlink and br_dellink reverts into their own patches
       - [Jiri] some parameter cleanup of rocker's memory allocators
       - [Jiri] pass trans mode as formal parameter rather than hanging off of
           rocker_port.
      
      v6:
      
      Address review comments:
      
       - [Jiri] split a couple of patches into one-logical-change per patch
       - [Joe Perches] revert checkpatch -f changes for wrapped lines with long
           symbols.
      
      v5:
      
      Address review comments:
      
       - [Jiri] include Jiri's s/swdev/switchdev rename patches up front.
       - [Jiri] squash some patches.  Now setlink/dellink/getlink patches are in
           three parts: new implementation, convert drivers to new, delete old impl.
       - [Jiri] some minor variable renames
       - [Jiri] use BUG_ON rather than WARN when COMMIT phase fails when PREPARE
           phase said it was safe to come into the water.
       - [Simon] rocker: fix a few transaction prepare-commit cases that were wrong.
           This was the bulk of the changes in v5.
      
      v4:
      
      Well, it was a lot of work, but now prepare-commit transaction model is how
      davem advises: if prepare fails, abort the transaction.  The driver must do
      resource reservations up front in prepare phase and return those resources if
      aborting.  Commit phase would use reserved resources.  The good news is the
      driver code (for rocker) now handles resource allocation failures better by not
      leaving partial device or driver state.  This is a side-effect of the
      prepare phase where state isn't modified; only validation of inputs and
      resource reservations happen in the prepare phase.  Since we're supporting
      setting attrs and add objs across lower devs in the stacked case, we need to
      hold rtnl_lock (or ensure rtnl_lock is held) so lower devs don't move on us
      during the prepare-commit transaction.  DSA driver code skips the prepare phase
      and goes straight for the commit phase since no up-front allocations are done
      and no device failures (that could be detected in the prepare phase) can
      happen.
      
      Remove NETIF_F_HW_SWITCH_OFFLOAD from rocker and the swdev_attr_set/get
      wrappers.  DSA doesn't set NETIF_F_HW_SWITCH_OFFLOAD, so it can't be in
      swdev_attr_set/get.  rocker doesn't need it; or rather can't support
      NETIF_F_HW_SWITCH_OFFLOAD being set/cleared at run-time after the device
      port is already up and offloading L2/L3.  NETIF_F_HW_SWITCH_OFFLOAD is still
      left as a feature flag for drivers that can use it.
      
      Drop the renaming patch for netdev_switch_notifier.  Other renames are a
      result of moving to the attr get/set or obj add/del model.  Everything
      but the netdev_switch_notifier is still prefixed with "swdev_".
      
      v3:
      
      Move to two-phase prepare-commit transaction model for attr set and obj add.
      Driver gets a chance in prepare phase to NACK transaction if lack of resources
      or support in device.
      
      v2:
      
      Address review comments:
      
       - [Jiri] squash a few related patches
       - [Roopa] don't remove NETIF_F_HW_SWITCH_OFFLOAD
       - [Roopa] address VLAN setlink/dellink
       - [Ronen] print warning if attr set revert fails
      
      Not addressed:
      
       - Using something other than "swdev_" prefix
       - Vendor extensions
      
      The patch set grew a bit to not only support port attr get/set but also add
      support for port obj add/del.  Example of port objs are VLAN, FDB entries, and
      FIB entries.  The VLAN support now allows the swdev driver to get VLAN ranges
      and flags like PVID and "untagged".  Sridhar will be adding FDB obj support
      in follow-on patch.
      
      v1:
      
      The main theme of this patch set is to cleanup swdev in preparation for
      new features or fixes to be added soon.  We have a pretty good idea now how
      to handle stacked drivers in swdev, but there were some loose ends.  For
      example, if a set failed in the middle of walking the lower devs, we would
      leave the system in an undefined state...there was no way to recover back to
      the previous state.  Speaking of sets, also recognize a pattern that most
      swdev API accesses are gets or sets of port attributes, so go ahead and make
      port attr get/set the central swdev API, and convert everything that is
      set-ish/get-ish to this new API.
      
      Features/fixes that should follow from this cleanup:
      
       - solve the duplicate pkt forwarding issue
       - get/set bridge attrs, like ageing_time, from/to device
       - get/set more bridge port attrs from/to device
      
      There are some rename cleanups tagging along at the end, to give swdev
      consistent naming.
      
      And finally, some much needed updates to the switchdev.txt documentation to
      hopefully capture the state-of-the-art of swdev.  Hopefully, we can do a better
      job keeping this document up-to-date.
      
      Tested with rocker, of course, to make sure nothing functional broke.  There
      are a couple minor tweaks to DSA code for getting switch ID and setting STP
      updates to use new API, but not expecting any breakage there.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • switchdev: bring documentation up-to-date · 4ceec22d
      Scott Feldman authored
      Much-needed update of switchdev documentation to cover what's been
      implemented to date.  There are some XXX comments in the text for
      unimplemented or broken items.  I'd like to keep these in there (poor-man's
      TODO list) and update the document once each issue is resolved.
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rocker: make checkpatch -f clean · 4725ceb9
      Scott Feldman authored
      Well almost clean: ignore the CHECKs for space after cast operator and some
      longer-than-80 char cases where for readability it's better to keep as-is.
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • switchdev: remove NETIF_F_HW_SWITCH_OFFLOAD feature flag · 7889cbee
      Scott Feldman authored
      Roopa said remove the feature flag for this series and she'll work on
      bringing it back if needed at a later date.
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • switchdev: convert fib_ipv4_add/del over to switchdev_port_obj_add/del · 58c2cb16
      Scott Feldman authored
      The IPv4 FIB ops convert nicely to the switchdev objs and we're left with
      only four switchdev ops: port attr get/set and port obj add/del.  Other objs will
      follow, such as FDB.  So go ahead and convert IPv4 FIB over to switchdev
      obj for consistency, anticipating more objs to come.
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • switchdev: add new switchdev_port_bridge_getlink · 8793d0a6
      Scott Feldman authored
      Like bridge_setlink, add switchdev wrapper to handle bridge_getlink and
      call into port driver to get port attrs.  For now, only BR_LEARNING and
      BR_LEARNING_SYNC are returned.  To add more, we'll probably want to break
      away from ndo_dflt_bridge_getlink() and build the netlink skb directly in
      the switchdev code.
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: revert br_dellink change back to original · 8508025c
      Scott Feldman authored
      This is a revert of:
      
      commit 68e331c7 ("bridge: offload bridge port attributes to switch asic
      if feature flag set")
      
      Restore br_dellink back to original and don't call into SELF port driver.
      rtnetlink.c:bridge_dellink() already does a call into port driver for SELF.
      
      bridge vlan add/del cmd defaults to MASTER.  From man page for bridge vlan
      add/del cmd:
      
             self   the vlan is configured on the specified physical device.
                    Required if the device is the bridge device.
      
             master the vlan is configured on the software bridge (default).
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • switchdev: remove unused switchdev_port_bridge_dellink · 87a5dae5
      Scott Feldman authored
      Now we can remove old wrappers for dellink.
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • switchdev: cut over to new switchdev_port_bridge_dellink · 54ba5a0b
      Scott Feldman authored
      Rocker, bonding, and team switch over to the new
      switchdev_port_bridge_dellink to avoid duplicating code in each driver.
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • switchdev: add new switchdev_port_bridge_dellink · 5c34e022
      Scott Feldman authored
      Same change as setlink.  Provide the wrapper op for SELF ndo_bridge_dellink
      and call into the switchdev driver to delete afspec VLANs.
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: restore br_setlink back to original · 41c498b9
      Scott Feldman authored
      This is a revert of:
      
      commit 68e331c7 ("bridge: offload bridge port attributes to switch asic
      if feature flag set")
      
      Restore br_setlink back to original and don't call into SELF port driver.
      rtnetlink.c:bridge_setlink() already does a call into port driver for SELF.
      
      bridge set link cmd defaults to MASTER.  From man page for bridge link set
      cmd:
      
             self   link setting is configured on specified physical device
      
             master link setting is configured on the software bridge (default)
      
      The link setting has two values: the device-side value and the software
      bridge-side value.  These are independent and settable using the bridge
      link set cmd by specifying some combination of [master] | [self].
      Furthermore, the device-side and bridge-side settings have their own
      initial value, viewable from bridge -d link show cmd.
      
      Restoring br_setlink back to original makes rocker (the only in-kernel user
      of SELF link settings) work as first implemented: two-sided values.
      
      It's true that when both MASTER and SELF are specified from the command,
      two netlink notifications are generated, one for each side of the settings.
      The user-space app can distinguish between the two notifications by
      observing the MASTER or SELF flag.
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
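The two-sided behaviour described in the br_setlink commit message can be sketched in user-space C. This is only an illustration under stated assumptions: `FLAG_MASTER`, `FLAG_SELF`, `struct link_setting`, and `link_set()` are hypothetical names and values, not the kernel's definitions. It shows the bridge-side and device-side values being set independently, MASTER being the default when neither flag is given, and one notification being counted per side touched.

```c
/* Hypothetical flag values for this sketch; not the kernel's definitions. */
#define FLAG_MASTER 0x1u   /* apply to the software bridge side */
#define FLAG_SELF   0x2u   /* apply to the device (port driver) side */

/* One link setting with two independent values. */
struct link_setting {
	int bridge_side;
	int device_side;
};

/*
 * Apply val to the side(s) selected by flags.  With neither flag given,
 * default to MASTER, as the quoted man page text describes.
 * Returns the number of notifications that would be generated
 * (one per side that was set).
 */
int link_set(struct link_setting *s, unsigned int flags, int val)
{
	int notifications = 0;

	if (!(flags & (FLAG_MASTER | FLAG_SELF)))
		flags = FLAG_MASTER;

	if (flags & FLAG_MASTER) {
		s->bridge_side = val;
		notifications++;
	}
	if (flags & FLAG_SELF) {
		s->device_side = val;
		notifications++;
	}
	return notifications;
}
```

Calling `link_set(&s, FLAG_MASTER | FLAG_SELF, 1)` in this sketch updates both sides and returns 2, mirroring the two netlink notifications mentioned in the message.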
    • switchdev: remove old switchdev_port_bridge_setlink · e71f220b
      Scott Feldman authored
      New attr-based bridge_setlink can recurse lower devs and recover on err, so
      remove old wrapper (including ndo_dflt_switchdev_port_bridge_setlink).
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • switchdev: cut over to new switchdev_port_bridge_setlink · fc8f40d8
      Scott Feldman authored
      Rocker, bonding, and team can now use the switchdev bridge setlink to parse
      raw netlink; no need to duplicate this code in each driver.
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • switchdev: add new switchdev bridge setlink · 47f8328b
      Scott Feldman authored
      Add new switchdev_port_bridge_setlink that can be used by drivers
      implementing .ndo_bridge_setlink to set switchdev bridge attributes.
      Basically turn the raw rtnl_bridge_setlink netlink into switchdev attr
      sets.  Proper netlink attr policy checking is done on the protinfo part of
      the netlink msg.
      
      Currently, for protinfo, only bridge port attrs BR_LEARNING and
      BR_LEARNING_SYNC are parsed and passed to port driver.
      
      For afspec, VLAN objs are passed so switchdev driver can set VLANs assigned
      to SELF.  To illustrate with iproute2 cmd, we have:
      
      	bridge vlan add vid 10 dev sw1p1 self master
      
      To add VLAN 10 to port sw1p1 for both the bridge (master) and the device
      (self).
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • switchdev: add bridge port flags attr · 6004c867
      Scott Feldman authored
      rocker: use switchdev get/set attr for bridge port flags
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • switchdev: add port vlan obj · 6fc3016d
      Scott Feldman authored
      VLAN obj has flags (PVID and untagged) as well as start and end vid ranges.
      The switchdev driver can optimize programming the device using the ranges.
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
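A VLAN object carrying flags plus a start/end VID range might look roughly like the following user-space sketch. All names here (`struct vlan_obj`, `struct port_vlans`, `VLAN_FLAG_*`, `port_vlan_add()`) and the flag values are hypothetical, chosen for illustration only; the kernel's structures may differ. Rejecting PVID on a multi-VID range is an assumption of this sketch, not something the commit message states.

```c
#include <stdint.h>

/* Hypothetical flag values; assumptions for this sketch only. */
#define VLAN_FLAG_PVID     0x1u
#define VLAN_FLAG_UNTAGGED 0x2u

#define VLAN_N_VID 4096u

/* A VLAN object: flags plus an inclusive VID range. */
struct vlan_obj {
	uint16_t flags;
	uint16_t vid_start;
	uint16_t vid_end;
};

/* Per-port state standing in for device tables. */
struct port_vlans {
	uint8_t member[VLAN_N_VID / 8];   /* membership bitmap */
	uint16_t pvid;                    /* 0 means no PVID set */
};

/* Returns 1 if vid is a member on this port, else 0. */
int port_vlan_has(const struct port_vlans *p, unsigned int vid)
{
	if (vid >= VLAN_N_VID)
		return 0;
	return (p->member[vid / 8] >> (vid % 8)) & 1;
}

/*
 * Add a whole VID range in one call, so a driver can program the range
 * in one pass instead of being called once per VID.
 * Assumption: PVID only makes sense for a single-VID object.
 */
int port_vlan_add(struct port_vlans *p, const struct vlan_obj *v)
{
	unsigned int vid;

	if (v->vid_start == 0 || v->vid_start > v->vid_end ||
	    v->vid_end >= VLAN_N_VID - 1)
		return -1;
	if ((v->flags & VLAN_FLAG_PVID) && v->vid_start != v->vid_end)
		return -1;

	for (vid = v->vid_start; vid <= v->vid_end; vid++)
		p->member[vid / 8] |= (uint8_t)(1u << (vid % 8));

	if (v->flags & VLAN_FLAG_PVID)
		p->pvid = v->vid_start;
	return 0;
}
```

Validation happens before any state is touched, so a rejected object leaves the port's VLAN membership unchanged.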
    • switchdev: introduce switchdev add/del obj ops · 491d0f15
      Scott Feldman authored
      Like switchdev attr get/set, add new switchdev obj add/del.  switchdev objs
      will be things like VLANs or FIB entries, so add/del fits better for
      objects than get/set used for attributes.
      
      Use same two-phase prepare-commit transaction model as in attr set.
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • switchdev: convert STP update to switchdev attr set · 35636062
      Scott Feldman authored
      STP update is just a settable port attribute, so convert
      switchdev_port_stp_update to an attr set.
      
      For DSA, the prepare phase is skipped and STP updates are only done in the
      commit phase.  This is because currently the DSA drivers don't need to
      allocate any memory for STP updates and the STP update will not fail to HW
      (unless something horrible goes wrong on the MDIO bus, in which case the
      prepare phase wouldn't have been able to predict anyway).
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
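The "skip prepare, act in commit" pattern described for DSA in the STP commit can be illustrated with a small user-space sketch. The names `enum trans_phase`, `struct port`, and `port_stp_update()` are hypothetical and do not come from the DSA code; the assignment stands in for the hardware write.

```c
/* Hypothetical transaction phases for this sketch. */
enum trans_phase { TRANS_PREPARE, TRANS_COMMIT };

/* Stand-in for a switch port; stp_state models the device register. */
struct port {
	int stp_state;
};

/*
 * A handler that needs no up-front allocation and cannot predict
 * failures can simply succeed in the prepare phase and do all of
 * its work in the commit phase.
 */
int port_stp_update(struct port *p, int state, enum trans_phase phase)
{
	if (phase == TRANS_PREPARE)
		return 0;            /* nothing to reserve or validate */

	p->stp_state = state;        /* stand-in for the hardware write */
	return 0;
}
```

The prepare call leaves the port untouched, which keeps the handler compatible with callers that abort after a failed prepare elsewhere in the stack.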
    • rocker: support prepare-commit transaction model · c4f20321
      Scott Feldman authored
      For rocker, support prepare-commit transaction model for setting attributes
      (and for adding objects).  This requires rocker to preallocate memory
      needed for the commit up front in the prepare phase.  Since rtnl_lock is
      held between prepare-commit, store the allocated memory on a queue hanging
      off of the rocker_port.  Also, in prepare phase, do everything right up to
      calling into HW.  The same code paths are traversed in the driver for both
      prepare and commit phases.  In some cases, any state modified in the
      prepare phase must be reverted before returning so the commit phase makes
      the same decisions.
      
      As a consequence of holding rtnl_lock in process context for all attr sets
      (and obj adds), all memory is GFP_KERNEL allocated and we don't need to
      busy spin waiting for the device to complete the command.  So the bulk of
      this patch is simplifying the memory allocations to only use GFP_KERNEL and
      to remove the nowait flag and busy spin loop.
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
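The idea of preallocating in the prepare phase and consuming the same memory in the commit phase can be sketched in user-space C as a small FIFO of reserved blocks. This is not rocker's code: `trans_alloc()`, `trans_free()`, `trans_abort()`, `struct trans_queue`, and `union trans_hdr` are hypothetical names, and `malloc()` stands in for a GFP_KERNEL allocation. The sketch relies on the property stated in the commit message that both phases traverse the same code path, so commit requests arrive in the same order as the prepare requests.

```c
#include <stdlib.h>

/* Hypothetical transaction phases for this sketch. */
enum trans_phase { TRANS_PREPARE, TRANS_COMMIT };

/* Header placed in front of each reserved block; the extra members
 * only force a payload alignment suitable for common types. */
union trans_hdr {
	union trans_hdr *next;
	long double align_ld;
	long long align_ll;
};

/* FIFO of blocks reserved during prepare (would hang off the port). */
struct trans_queue {
	union trans_hdr *head;
	union trans_hdr *tail;
};

/*
 * PREPARE: allocate a block and queue it; the caller may use it but
 * must not free it.  COMMIT: hand back the next reserved block, so no
 * allocation (and no allocation failure) happens in this phase.
 */
void *trans_alloc(struct trans_queue *q, enum trans_phase phase, size_t size)
{
	union trans_hdr *hdr;

	if (phase == TRANS_PREPARE) {
		hdr = malloc(sizeof(*hdr) + size);
		if (!hdr)
			return NULL;
		hdr->next = NULL;
		if (q->tail)
			q->tail->next = hdr;
		else
			q->head = hdr;
		q->tail = hdr;
	} else {
		hdr = q->head;
		if (!hdr)
			return NULL;
		q->head = hdr->next;
		if (!q->head)
			q->tail = NULL;
	}
	return hdr + 1;
}

/* Free a block obtained in the commit phase. */
void trans_free(void *mem)
{
	if (mem)
		free((union trans_hdr *)mem - 1);
}

/* Abort: return every block still reserved on the queue. */
void trans_abort(struct trans_queue *q)
{
	while (q->head) {
		union trans_hdr *hdr = q->head;

		q->head = hdr->next;
		free(hdr);
	}
	q->tail = NULL;
}
```

A caller would run its operation once with `TRANS_PREPARE`, call `trans_abort()` if that fails, and otherwise run the same operation again with `TRANS_COMMIT`, receiving the blocks reserved earlier.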
    • switchdev: convert parent_id_get to switchdev attr get · f8e20a9f
      Scott Feldman authored
      Switch ID is just a gettable port attribute.  Convert switchdev op
      switchdev_parent_id_get to a switchdev attr.
      
      Note: for sysfs and netlink interfaces, SWITCHDEV_ATTR_PORT_PARENT_ID is
      called with SWITCHDEV_F_NO_RECURSE to limit switch ID user-visibility to only
      port netdevs.  So when a port is stacked under bond/bridge, the user can
      only query switch id via the switch ports, but not via the upper devices.
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • switchdev: introduce get/set attrs ops · 3094333d
      Scott Feldman authored
      Add two new swdev ops for get/set switch port attributes.  Most swdev
      interactions on a port are gets or sets on port attributes, so rather than
      adding ops for each attribute, let's define clean get/set ops for all
      attributes, and then we can have clear, consistent rules on how attributes
      propagate on stacked devs.
      
      Add the basic algorithms for get/set attr ops.  Use the same recursive algo
      to walk lower devs we've used for STP updates, for example.  For get,
      compare attr value for each lower dev and only return success if attr
      values match across all lower devs.  For sets, set the same attr value for
      all lower devs.  We'll use a two-phase prepare-commit transaction model for
      sets.  In the first phase, the driver(s) are asked if attr set is OK.  If
      all OK, then commit the attr set in the second phase.  A driver would NACK the
      prepare phase if it can't set the attr due to lack of resources or support,
      within its control.  RTNL lock must be held across both phases because
      we'll recurse all lower devs first in prepare phase, and then recurse all
      lower devs again in commit phase.  If any lower dev fails the prepare
      phase, we need to abort the transaction for all lower devs.
      
      If lower dev recursion isn't desired, allow a flag SWITCHDEV_F_NO_RECURSE to
      indicate get/set only work on port (lowest) device.
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
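The get/set rules in the commit message above can be sketched in user-space C. This is a simplified illustration, not the switchdev implementation: the lower devices are flattened into an array instead of being walked recursively, and `struct lower_dev`, `dev_attr_set()`, `port_attr_set()`, and `port_attr_get()` are hypothetical names. The set path runs a prepare pass over every device before any commit, and the get path succeeds only if all devices report the same value.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transaction phases for this sketch. */
enum trans_phase { TRANS_PREPARE, TRANS_COMMIT };

/* Stand-in for one lower device under a stacked (bond/bridge) device. */
struct lower_dev {
	int attr;       /* current attribute value */
	bool can_set;   /* whether this device would accept a set */
};

/* Per-device handler: prepare only validates, commit modifies state. */
int dev_attr_set(struct lower_dev *dev, int val, enum trans_phase phase)
{
	if (phase == TRANS_PREPARE)
		return dev->can_set ? 0 : -1;   /* NACK if unsupported */
	dev->attr = val;
	return 0;
}

/*
 * Set val on every lower dev, or on none of them: if any device NACKs
 * the prepare phase, abort before any state has been modified.
 */
int port_attr_set(struct lower_dev *devs, size_t n, int val)
{
	size_t i;
	int err;

	for (i = 0; i < n; i++) {
		err = dev_attr_set(&devs[i], val, TRANS_PREPARE);
		if (err)
			return err;
	}
	for (i = 0; i < n; i++)
		dev_attr_set(&devs[i], val, TRANS_COMMIT);
	return 0;
}

/* Get succeeds only if the value matches across all lower devs. */
int port_attr_get(const struct lower_dev *devs, size_t n, int *val)
{
	size_t i;

	if (n == 0)
		return -1;
	for (i = 1; i < n; i++)
		if (devs[i].attr != devs[0].attr)
			return -1;
	*val = devs[0].attr;
	return 0;
}
```

In the kernel both passes run under the RTNL lock so the set of lower devices cannot change between prepare and commit; the sketch has no equivalent of that locking.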
    • switchdev: s/swdev_/switchdev_/ · 9d47c0a2
      Jiri Pirko authored
      Turned out that "switchdev" sticks. So just unify all related terms to use
      this prefix.
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • switchdev: s/netdev_switch_/switchdev_/ and s/NETDEV_SWITCH_/SWITCHDEV_/ · ebb9a03a
      Jiri Pirko authored
      Turned out that "switchdev" sticks. So just unify all related terms to use
      this prefix.
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: gred: add TCA_GRED_LIMIT attribute · a3eb95f8
      David Ward authored
      In a GRED qdisc, if the default "virtual queue" (VQ) does not have drop
      parameters configured, then packets for the default VQ are not subjected
      to RED and are only dropped if the queue is larger than the net_device's
      tx_queue_len. This behavior is useful for WRED mode, since these packets
      will still influence the calculated average queue length and (therefore)
      the drop probability for all of the other VQs. However, for some drivers
      tx_queue_len is zero. In other cases the user may wish to make the limit
      the same for all VQs (including the default VQ with no drop parameters).
      
      This change adds a TCA_GRED_LIMIT attribute to set the GRED queue limit,
      in bytes, during qdisc setup. (This limit is in bytes to be consistent
      with the drop parameters.) The default limit is the same as for a bfifo
      queue (tx_queue_len * psched_mtu). If the drop parameters of any VQ are
      configured with a smaller limit than the GRED queue limit, that VQ will
      still observe the smaller limit instead.
      Signed-off-by: David Ward <david.ward@ll.mit.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
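The limit rules stated in the GRED commit message can be expressed as two small helpers. This is a sketch of the arithmetic only, not the qdisc code: `gred_default_limit()` and `gred_effective_limit()` are hypothetical names, `mtu_bytes` stands in for the value of `psched_mtu`, and treating a VQ limit of 0 as "no drop parameters configured" is an assumption of this sketch.

```c
#include <stdint.h>

/*
 * Default GRED queue limit in bytes when TCA_GRED_LIMIT is not given:
 * the same product a bfifo queue uses, tx_queue_len * psched_mtu.
 */
uint32_t gred_default_limit(uint32_t tx_queue_len, uint32_t mtu_bytes)
{
	return tx_queue_len * mtu_bytes;
}

/*
 * Limit actually observed by one virtual queue.  A VQ whose drop
 * parameters carry a smaller limit keeps that smaller limit;
 * otherwise the qdisc-wide limit applies.
 * Assumption: vq_limit == 0 means the VQ has no drop parameters.
 */
uint32_t gred_effective_limit(uint32_t qdisc_limit, uint32_t vq_limit)
{
	if (vq_limit != 0 && vq_limit < qdisc_limit)
		return vq_limit;
	return qdisc_limit;
}
```

With a `tx_queue_len` of zero the default product is zero, which is the case the commit message gives as motivation for letting the user set the limit explicitly.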
    • Merge branch 'netdev_page_frags' · 8df29145
      David S. Miller authored
      Alexander Duyck says:
      
      ====================
      Refactor netdev page frags and move them into mm/
      
      This patch series addresses several things.
      
      First I found an issue in the performance of the pfmemalloc check from
      build_skb.  To work around it I have provided a cached copy of pfmemalloc
      to be used in __netdev_alloc_skb and __napi_alloc_skb.
      
      Second I moved the page fragment allocation logic into the mm tree and
      added functionality for freeing page fragments.  I had to fix igb before I
      could do this as it was using a reference to NETDEV_FRAG_PAGE_MAX_SIZE
      incorrectly.
      
      Finally I went through and replaced all of the duplicate code that was
      calling put_page and replaced it with calls to skb_free_frag.
      
      With these changes in place a simple receive and drop test increased from a
      packet rate of 8.9Mpps to 9.8Mpps.  The gains break down as follows:
      
      8.9Mpps	Before			9.8Mpps	After
      ------------------------	------------------------
      7.8%	put_compound_page	9.1%	__free_page_frag
      3.9%	skb_free_head
      1.1%	put_page
      
      4.9%	build_skb		3.8%	__napi_alloc_skb
      2.5%	__alloc_rx_skb
      1.9%	__napi_alloc_skb
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>