1. 03 Oct, 2017 25 commits
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · af14827f
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      100GbE Intel Wired LAN Driver Updates 2017-10-03
      
      This series contains updates to fm10k only.
      
      Jake provides the majority of the changes in this series, starting with using
      fm10k_prepare_for_reset() if we lose PCIe link.  Before, we would detach
      the device and close the netdev, which left a lot of items still active,
      such as the Tx/Rx resources.  This could cause problems where register
      reads would return potentially invalid values and would result in unknown
      driver behavior, so call fm10k_prepare_for_reset() much like we do for
      suspend/resume cycles.  This will attempt to shut down as much as possible
      to prevent possible issues.  He then replaced the PCI specific legacy power
      management hooks with the new generic power management hooks for both
      suspend and hibernate, and introduced a workqueue item which monitors a
      queue of MAC and VLAN requests, since a large number of MAC address or
      VLAN updates at once can overload the mailbox with too many messages.
      He also fixed a cppcheck warning by properly declaring the min_rate and
      max_rate variables in the declaration and definition for .ndo_set_vf_bw,
      rather than using "unused" for the minimum rates.
      
      Joe Perches fixes the backward logic when using net_ratelimit().
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: decouple ifalias get/set from rtnl lock · 6c557001
      Florian Westphal authored
      Device alias can be set by either rtnetlink (rtnl is held) or sysfs.
      
      rtnetlink holds the rtnl mutex; sysfs acquires it for this purpose.
      Add an extra mutex for it and use rcu to protect concurrent accesses.
      
      This lets the sysfs path avoid taking rtnl, and would later allow
      dumping ifalias without holding it.
      
      Based on suggestion from Eric Dumazet.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: speed/duplex update at NETDEV_UP event · 4d2c0cda
      Mahesh Bandewar authored
      Some NIC drivers don't have correct speed/duplex settings at the
      time they send the NETDEV_UP notification, and that messes up the
      bonding state, especially in 802.3ad mode, which is very sensitive
      to these settings. In the current implementation we invoke
      bond_update_speed_duplex() when we receive NETDEV_UP but
      ignore the return value. If the values we get are invalid
      (UNKNOWN), then the slave gets removed from the aggregator with
      speed and duplex set to UNKNOWN while the link is still marked as UP.
      
      This patch fixes this scenario. Because only 802.3ad mode is sensitive to
      these conditions, the patch makes sure that the behavior of the other
      modes does not change.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Add missing error code on allocation failure · b5c7d4e5
      Dan Carpenter authored
      We accidentally return success if the kmalloc_array() call fails.
      
      Fixes: 0e14c777 ("mlxsw: spectrum: Add the multicast routing hardware logic")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Yotam Gigi <yotamg@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
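The class of bug fixed above can be illustrated with a minimal user-space sketch. This is not the mlxsw code: `struct route_table`, `route_table_init()`, `array_alloc_fn` and `always_fail_alloc()` are hypothetical names invented for illustration, and the allocator is passed in only so that the failure path can be exercised deterministically.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator hook with the same shape as calloc(). */
typedef void *(*array_alloc_fn)(size_t n, size_t size);

/* Hypothetical table type; not the mlxsw structure. */
struct route_table {
	int *entries;
	size_t count;
};

/* Test helper: an allocator that always fails. */
void *always_fail_alloc(size_t n, size_t size)
{
	(void)n;
	(void)size;
	return NULL;
}

/*
 * Returns 0 on success or a negative errno value on failure.
 * The error code must be assigned before jumping to the cleanup label;
 * if err were initialised to 0 and never updated, the caller would see
 * success even though the allocation failed.
 */
int route_table_init(struct route_table *table, size_t count,
		     array_alloc_fn alloc)
{
	int err;

	table->entries = alloc(count, sizeof(*table->entries));
	if (!table->entries) {
		err = -ENOMEM;	/* the assignment that is easy to forget */
		goto err_alloc;
	}
	table->count = count;
	return 0;

err_alloc:
	table->count = 0;
	return err;
}
```

Passing `calloc` as the allocator exercises the success path, while `always_fail_alloc` exercises the error path and should yield `-ENOMEM`.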
    • mlxsw: spectrum: Fix check for IS_ERR() instead of NULL · b508e0b6
      Dan Carpenter authored
      mlxsw_afa_block_create() doesn't return error pointers, it returns NULL
      on error.
      
      Fixes: 0e14c777 ("mlxsw: spectrum: Add the multicast routing hardware logic")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Yotam Gigi <yotamg@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
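Why an IS_ERR() check misses a NULL return can be shown with a small user-space imitation. The sketch below assumes the usual kernel convention that error pointers occupy the top 4095 addresses; `sketch_is_err()`, `block_create()` and the two `failure_detected_*` helpers are hypothetical stand-ins, not kernel or mlxsw functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/*
 * User-space imitation of an IS_ERR()-style test: true only for pointer
 * values in the last SKETCH_MAX_ERRNO addresses. NULL is not in that range.
 */
bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

static int dummy_block;

/* Hypothetical constructor that reports failure by returning NULL. */
void *block_create(bool fail)
{
	return fail ? NULL : &dummy_block;
}

/* Wrong check: a NULL return is not an error pointer, so it slips through. */
bool failure_detected_with_is_err(bool fail)
{
	return sketch_is_err(block_create(fail));
}

/* Right check for a function that returns NULL on error. */
bool failure_detected_with_null_check(bool fail)
{
	return block_create(fail) == NULL;
}
```

With a failing `block_create()`, only the NULL check reports the failure; the IS_ERR-style check lets the caller continue with a NULL pointer.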
    • net: dsa: mt7530: make functions mt7530_phy_write static · 360cc342
      Colin Ian King authored
      The function mt7530_phy_write is local to the source and does not need to
      be in global scope, so make it static.
      
      Cleans up sparse warnings:
      symbol 'mt7530_phy_write' was not declared. Should it be static?
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: lan9303: make functions lan9303_mdio_phy_{read|write} static · 161ae6b0
      Colin Ian King authored
      The functions lan9303_mdio_phy_write and lan9303_mdio_phy_read are local
      to the source and do not need to be in global scope, so make them static.
      
      Cleans up sparse warnings:
      symbol 'lan9303_mdio_phy_write' was not declared. Should it be static?
      symbol 'lan9303_mdio_phy_read' was not declared. Should it be static?
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlxsw-mc-route-offload' · da885b61
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      mlxsw: Add support for partial multicast route offload
      
      Yotam says:
      
      A previous patchset introduced support for offloading multicast MFC routes to
      the Spectrum hardware. As described in that patchset, no partial offloading
      is supported, i.e. if a route has one output interface which is not a valid
      offloadable device (e.g. pimreg device, dummy device, management NIC), the
      route is trapped to the CPU and the forwarding is done in slow-path.
      
      Add support for partial offloading of multicast routes, by letting the
      hardware forward the packet to all the in-hardware devices, while the
      kernel ipmr module will continue forwarding to all other interfaces.
      
      Similarly to the bridge, the kernel ipmr module will forward a marked
      packet to an interface only if the interface has a different parent ID than
      the packet's ingress interface.
      
      The first patch introduces the offload_mr_fwd_mark skb field, which can be
      used by offloading drivers to indicate that a packet had already gone
      through multicast forwarding in hardware, similarly to the offload_fwd_mark
      field that indicates that a packet had already gone through L2 forwarding
      in hardware.
      
      Patches 2 and 3 change the ipmr module to not forward packets that had
      already been forwarded by the hardware, i.e. packets that are marked with
      offload_mr_fwd_mark and the ingress VIF shares the same parent ID with the
      egress VIF.
      
      Patches 4, 5, 6 and 7 add the support in the mlxsw Spectrum driver for trap
      and forward routes, while marking the trapped packets with the
      offload_mr_fwd_mark.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: mr: Support trap-and-forward routes · f60c2549
      Yotam Gigi authored
      Add the support of trap-and-forward route action in the multicast routing
      offloading logic. A route will be set to trap-and-forward action if one (or
      more) of its output interfaces is not offload-able, i.e. does not have a
      valid Spectrum RIF.
      
      This way, a route with mixed output VIFs list, which contains both
      offload-able and un-offload-able devices can go through partial offloading
      in hardware, and the rest will be done in the kernel ipmr module.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: mr_tcam: Add trap-and-forward multicast route · 607feade
      Yotam Gigi authored
      In addition to the current multicast route actions, which include trap
      route action and a forward route action, add the trap-and-forward multicast
      route action, and implement it in the multicast routing hardware logic.
      
      To implement that, add a trap-and-forward ACL action as the last action in
      the route flexible action set. The trap used is the ACL2 trap, which marks
      the packets with offload_mr_fwd_mark, to prevent the packet from being
      forwarded again by the kernel.
      
      Note: At this stage the offloading logic does not support trap-and-forward
      multicast routes. This patch adds the support only in the hardware logic.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Add trap for multicast trap-and-forward routes · a0040c8c
      Yotam Gigi authored
      When a multicast route is configured with trap-and-forward action, the
      packets should be marked with skb->offload_mr_fwd_mark, in order to prevent
      the packets from being forwarded again by the kernel ipmr module.
      
      Due to this, it is not possible to use the already existing multicast trap
      (MLXSW_TRAP_ID_ACL1) as the packet should be marked differently. Add the
      MLXSW_TRAP_ID_ACL2 which is for trap-and-forward multicast routes, and set
      the offload_mr_fwd_mark skb field in its handler.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: acl: Introduce ACL trap and forward action · 26787243
      Yotam Gigi authored
      Use trap/discard flex action to implement trap and forward. The action will
      later be used for multicast routing, as the multicast routing mechanism is
      done using ACL flexible actions in Spectrum hardware. Using that action, it
      will be possible to implement a trap-and-forward route.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: ipmr: Don't forward packets already forwarded by hardware · a5bc9294
      Yotam Gigi authored
      Change the ipmr module to not forward packets if:
       - The packet is marked with the offload_mr_fwd_mark, and
       - Both input interface and output interface share the same parent ID.
      
      This way, a packet can go through partial multicast forwarding in the
      hardware, where it will be forwarded only to the devices that share the
      same parent ID (AKA, reside inside the same hardware). The kernel will
      forward the packet to all other interfaces.
      
      To do this, add the ipmr_offload_forward helper, which, given an skb, an
      ingress VIF and an egress VIF, returns whether the forwarding was offloaded
      to hardware. ipmr_queue_xmit frees the skb and does not forward it if the
      helper returns true.
      
      All the forwarding path code compiles out when CONFIG_NET_SWITCHDEV is
      not set.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
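The two-condition rule described in the commit above can be modelled in user space as follows. This is a simplified sketch, not the ipmr code: all `sketch_*` types and functions are hypothetical, and treating a VIF with an empty parent ID as "not offloaded" is an assumption made for the sketch.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_ID_MAX 32

/* Hypothetical stand-in for a switch parent ID (opaque bytes plus length). */
struct sketch_parent_id {
	unsigned char id[SKETCH_ID_MAX];
	unsigned char id_len;
};

/* Hypothetical stand-in for a multicast virtual interface. */
struct sketch_vif {
	struct sketch_parent_id parent_id;
};

/* Hypothetical stand-in for a packet carrying the multicast offload mark. */
struct sketch_pkt {
	bool offload_mr_fwd_mark;
};

static bool same_parent(const struct sketch_parent_id *a,
			const struct sketch_parent_id *b)
{
	return a->id_len == b->id_len &&
	       memcmp(a->id, b->id, a->id_len) == 0;
}

/*
 * Returns true when the kernel should skip software forwarding from
 * 'in' to 'out' because hardware has already done it:
 *  - the packet is marked, and
 *  - ingress and egress share the same parent ID.
 * Assumption: an interface without a parent ID is never considered offloaded.
 */
bool sketch_forward_offloaded(const struct sketch_pkt *pkt,
			      const struct sketch_vif *in,
			      const struct sketch_vif *out)
{
	if (!pkt->offload_mr_fwd_mark)
		return false;
	if (!in->parent_id.id_len || !out->parent_id.id_len)
		return false;
	return same_parent(&in->parent_id, &out->parent_id);
}
```

In this model a marked packet is still forwarded in software to, say, a management NIC with no parent ID, while forwarding to a second port of the same switch is skipped.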
    • ipv4: ipmr: Add the parent ID field to VIF struct · 5d8b3e69
      Yotam Gigi authored
      In order to allow the ipmr module to do partial multicast forwarding
      according to the device parent ID, add the device parent ID field to the
      VIF struct. This way, the forwarding path can use the parent ID field
      without invoking switchdev calls, which require the RTNL lock.
      
      When a new VIF is added, set the device parent ID field in it by invoking
      the switchdev_port_attr_get call.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • skbuff: Add the offload_mr_fwd_mark field · abf4bb6b
      Yotam Gigi authored
      Similarly to the offload_fwd_mark field, the offload_mr_fwd_mark field is
      used to allow partial offloading of MFC multicast routes.
      
      Switchdev drivers can offload MFC multicast routes to the hardware by
      registering to the FIB notification chain. When one of the route output
      interfaces is not offload-able, i.e. has a different parent ID, the route
      cannot be fully offloaded by the hardware. Examples of non-offload-able
      devices are a management NIC, dummy device, pimreg device, etc.
      
      A similar problem exists in the bridge module, as one bridge can hold
      interfaces with different parent IDs. At the bridge, the problem is solved
      by the offload_fwd_mark skb field.
      
      Currently, when a route cannot go through full offload, the only solution
      for a switchdev driver is not to offload it at all and let the packet go
      through slow path.
      
      Using the offload_mr_fwd_mark field, a driver can indicate that a packet
      was already forwarded by hardware to all the devices with the same parent
      ID as the input device. Further patches in this patch-set are going to
      enhance ipmr to skip multicast forwarding to devices with the same parent
      ID if a packet is marked with that field.
      
      The reason why the already existing "offload_fwd_mark" bit cannot be used
      is that a switchdev driver would want to make the distinction between a
      packet that has already gone through L2 forwarding but did not go through
      multicast forwarding, and a packet that has already gone through both L2
      and multicast forwarding.
      
      For example: when a packet is ingressing from a switchport enslaved to a
      bridge, which is configured with multicast forwarding, the following
      scenarios are possible:
       - The packet can be trapped to the CPU due to an exception during multicast
         forwarding (for example, MTU error). In that case, it had already gone
         through L2 forwarding in the hardware, thus a switchdev driver would
         want to set the skb->offload_fwd_mark and not the
         skb->offload_mr_fwd_mark.
       - The packet can also be trapped due to a pimreg/dummy device used as one
         of the output interfaces. In that case, it can go through both L2 and
         (partial) multicast forwarding inside the hardware, thus a switchdev
         driver would want to set both the skb->offload_fwd_mark and
         skb->offload_mr_fwd_mark.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellaox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: Update comment for min_mtu · a047fbae
      Arjun Vynipadath authored
      We lost a comment for the minimum MTU value set for the netdevice with
      commit d894be57 ("ethernet: use net core MTU range checking in
      more drivers"). Update it accordingly.
      Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fm10k: fix mis-ordered parameters in declaration for .ndo_set_vf_bw · 3e256ac5
      Jacob Keller authored
      We've had support for setting both a minimum and maximum bandwidth via
      .ndo_set_vf_bw since commit 883a9ccb ("fm10k: Add support for SR-IOV
      to driver", 2014-09-20).
      
      Likely because we do not support minimum rates, the declaration
      mis-ordered the "unused" parameter, which causes warnings when analyzed
      with cppcheck.
      
      Fix this warning by properly declaring the min_rate and max_rate
      variables in the declaration and definition (rather than using
      "unused"). Also rename "rate" to max_rate so as to clarify that we only
      support setting the maximum rate.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • fm10k: prefer %s and __func__ for diagnostic prints · 87be9892
      Jacob Keller authored
      Don't hard code the function names in the diagnostic output when these
      reset related routines fail. Instead, use %s and __func__ so that future
      refactors don't need to change the print outs.
      
      Additionally, while we are here, add missing function header comments
      for the new reset_prepare and reset_done function handlers.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
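As a rough illustration of the `%s` plus `__func__` pattern, the sketch below formats a failure message into a caller-supplied buffer instead of printing it. The function name `sketch_reset_done()` and the message text are hypothetical and are not taken from the fm10k driver.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical reset handler. On failure it formats a diagnostic that
 * names the function via __func__, so renaming the function later does
 * not require touching the format string.
 * Returns the number of characters snprintf() would have written, or 0
 * when there is no error to report.
 */
int sketch_reset_done(char *msg, size_t len, int err)
{
	if (err)
		return snprintf(msg, len, "%s failed: %d", __func__, err);

	if (len)
		msg[0] = '\0';
	return 0;
}
```

Calling `sketch_reset_done(buf, sizeof(buf), -5)` leaves `sketch_reset_done failed: -5` in `buf`; the function name never appears as a literal in the source.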
    • fm10k: Fix misuse of net_ratelimit() · c0ad8ef3
      Joe Perches authored
      Correct the backward logic using !net_ratelimit()
      
      Miscellanea:
      
      o Add a blank line before the error return label
      Signed-off-by: Joe Perches <joe@perches.com>
      Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
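The convention behind this fix is that net_ratelimit() returns nonzero when the caller is allowed to log, so a message guarded by `!net_ratelimit()` is emitted only when it should be suppressed. The user-space sketch below models that convention with a simple burst counter; `struct sketch_ratelimit` and both functions are hypothetical, and the time window of a real rate limiter is deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical limiter: allows the first 'burst' messages, then suppresses. */
struct sketch_ratelimit {
	int burst;
	int used;
};

/* Returns true when the caller MAY log (same polarity as net_ratelimit()). */
bool sketch_ratelimit_ok(struct sketch_ratelimit *rl)
{
	if (rl->used < rl->burst) {
		rl->used++;
		return true;
	}
	return false;
}

/*
 * Correct usage: emit the message only when the limiter permits it.
 * Returns how many of nr_errors messages would actually be emitted.
 */
int sketch_log_errors(struct sketch_ratelimit *rl, int nr_errors)
{
	int emitted = 0;

	for (int i = 0; i < nr_errors; i++) {
		if (sketch_ratelimit_ok(rl))
			emitted++;	/* a real driver would print here */
	}
	return emitted;
}
```

With a burst of 3 and 10 errors, 3 messages are emitted. Inverting the test would instead emit the 7 messages the limiter was trying to suppress.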
    • fm10k: use the MAC/VLAN queue for VF<->PF MAC/VLAN requests · 1f5c27e5
      Jacob Keller authored
      Now that we have a working MAC/VLAN queue for handling MAC/VLAN messages
      from the netdev, replace the default handler for the VF<->PF messages.
      This new handler is very similar to the default code, but uses the
      MAC/VLAN queue instead of sending the message directly. Unfortunately we
      can't easily re-use the default code, so we'll just replace the entire
      function.
      
      This ensures that a VF requesting a large number of VLANs or MAC
      addresses does not start a reset cycle, as explained in the commit which
      introduced the message queue.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Ngai-mint Kwan <ngai-mint.kwan@intel.com>
      Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • fm10k: introduce a message queue for MAC/VLAN messages · fc917368
      Jacob Keller authored
      Under some circumstances, when dealing with a large number of MAC
      address or VLAN updates at once, the fm10k driver, particularly the VFs,
      can overload the mailbox with too many messages.
      
      This results in a mailbox timeout, which causes the driver to initiate
      a reset. During the reset, we re-send all the same messages that
      originally caused the timeout. This results in a cycle of resets each
      triggering a future reset.
      
      To fix or avoid this, we introduce a workqueue item which monitors
      a queue of MAC and VLAN requests. These requests are queued to the end
      of the list, and we process them periodically in FIFO order.
      
      Initially we only handle requests for the netdev, but we do handle
      unicast MAC addresses, multicast MAC addresses, and update VLAN
      requests.
      
      A future patch will add support to use this queue for handling MAC
      update requests from the VF<->PF mailbox.
      
      The MAC/VLAN work item will keep checking to make sure that each request
      does not overflow the mailbox and cause a timeout. If it might, then the
      work item will reschedule itself a short time later. This avoids any
      reset cycle, since we never send the message if the mailbox is not
      ready.
      
      As an alternative, we tried increasing the mailbox message FIFO, but
      this just delays the problem and results in needless memory waste on the
      system. Our new message queue is dynamically allocated so only uses as
      much memory as it needs. Additionally, it need not be contiguous like
      the Tx and Rx FIFOs.
      
      Note that this patch chose to only create a queue for MAC and VLAN
      messages, since these are the only messages sent in a large enough
      volume to cause the reset loop. Other messages are very unlikely to
      overflow the mailbox Tx FIFO so easily.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
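The queueing idea described above can be sketched in user space as a linked-list FIFO that is drained only while the mailbox has room. Everything here is a hypothetical model: the `sketch_*` names, the use of a slot counter to represent mailbox capacity, and the boolean "reschedule" return value are assumptions for illustration, not the fm10k implementation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical queued MAC/VLAN request. */
struct sketch_request {
	int id;
	struct sketch_request *next;
};

/* Dynamically allocated FIFO: push at the tail, pop from the head. */
struct sketch_queue {
	struct sketch_request *head;
	struct sketch_request *tail;
};

/* Hypothetical mailbox model: a count of free message slots. */
struct sketch_mailbox {
	int free_slots;
	int sent;
};

/* Append a request to the end of the queue. Returns 0 or -ENOMEM. */
int sketch_queue_push(struct sketch_queue *q, int id)
{
	struct sketch_request *req = malloc(sizeof(*req));

	if (!req)
		return -ENOMEM;
	req->id = id;
	req->next = NULL;
	if (q->tail)
		q->tail->next = req;
	else
		q->head = req;
	q->tail = req;
	return 0;
}

/*
 * Send queued requests in FIFO order, but never when the mailbox is full.
 * Returns true if requests remain, i.e. the work item should run again
 * a short time later instead of overflowing the mailbox.
 */
bool sketch_queue_process(struct sketch_queue *q, struct sketch_mailbox *mbx)
{
	while (q->head) {
		struct sketch_request *req = q->head;

		if (mbx->free_slots == 0)
			return true;	/* mailbox not ready: reschedule */

		mbx->free_slots--;
		mbx->sent++;
		q->head = req->next;
		if (!q->head)
			q->tail = NULL;
		free(req);
	}
	return false;
}
```

Because a request is only dequeued once there is room for it, a burst of updates stays in the queue rather than timing out the mailbox, which is what breaks the reset cycle in this model.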
    • fm10k: use generic PM hooks instead of legacy PCIe power hooks · 8249c47c
      Jacob Keller authored
      Replace the PCI specific legacy power management hooks with the new
      generic power management hooks which work properly for both suspend and
      hibernate. The new generic system is better and properly handles the
      lower level PCIe power management rather than forcing the driver to
      handle it.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • fm10k: use spinlock to implement mailbox lock · b4fcd436
      Jacob Keller authored
      Let's not re-invent the locking wheel. Remove our bitlock and use
      a proper spinlock instead.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • fm10k: prepare_for_reset() when we lose PCIe Link · 0b40f457
      Jacob Keller authored
      If we lose PCIe link, such as when an unannounced PFLR event occurs, or
      when a device is surprise removed, we currently detach the device and
      close the netdev. This unfortunately leaves a lot of things still
      active, such as the msix_mbx_pf IRQ, and Tx/Rx resources.
      
      This can cause problems because the register reads will return
      potentially invalid values which may result in unknown driver behavior.
      
      Begin the process of resetting using fm10k_prepare_for_reset(), much in
      the same way as the suspend and resume cycle does. This will attempt to
      shut down as much as possible, in order to prevent possible issues.
      
      A naive implementation for this has issues, because there are now
      multiple flows calling the reset logic and setting a reset bit. This
      would cause problems, because the "re-attach" routine might call
      fm10k_handle_reset() prior to the reset actually finishing. Instead,
      we'll add state bits to indicate which flow actually initiated the
      reset.
      
      For the general reset flow, we'll assume that if someone else is
      resetting, we do not need to handle it at all, so it does not need
      its own state bit. For the suspend case, we will simply issue a warning
      indicating that we are attempting to recover from this case when
      resuming.
      
      For the detached subtask, we'll simply refuse to re-attach until we've
      actually initiated a reset as part of that flow.
      
      Finally, we'll stop attempting to manage the mailbox subtask when we're
      detached, since there's nothing we can do if we don't have a PCIe
      address.
      
      Overall this produces a much cleaner shutdown and recovery cycle for
      a PCIe surprise remove event.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  2. 02 Oct, 2017 15 commits