1. 07 Oct, 2021 39 commits
    • ice: switchdev slow path · f5396b8a
      Grzegorz Nitka authored
      Slow path means allowing a packet to go from the uplink to the
      representor and from the representor to the correct VF on the Rx
      side, and from the VF to the representor and on to the uplink on
      the Tx side.
      
      To accomplish this, the driver has to set the correct Tx descriptor.
      When a packet is sent from a representor to a VF, the destination
      should be set to the VF VSI. When a packet is sent from the uplink
      port, the destination should be the uplink, to bypass the switch
      infrastructure and send the packet outside.
      
      On the Rx side the driver should check the source VSI field from the
      Rx descriptor and, based on that, forward the packet to the correct
      netdev. To allow this there is a target netdevs table in the control
      plane VSI struct.
      Co-developed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      f5396b8a
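The Rx and Tx decisions described in this commit can be sketched in plain C. This is a minimal userspace illustration under stated assumptions, not the driver's code: `struct ctrl_vsi`, `MAX_VSI`, `rx_target_netdev()` and `tx_dst_vsi()` are hypothetical names, and `struct netdev` is only a stand-in for the kernel's `struct net_device`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VSI 16 /* hypothetical table size, chosen for the sketch */

/* Stand-in for struct net_device. */
struct netdev {
    const char *name;
};

/* Hypothetical control plane VSI: one target netdev slot per source VSI. */
struct ctrl_vsi {
    struct netdev *target_netdevs[MAX_VSI];
};

/*
 * Rx side: pick the netdev for a received packet from the source VSI
 * number found in the Rx descriptor. Returns NULL when the number is out
 * of range or no representor is registered for it.
 */
struct netdev *rx_target_netdev(const struct ctrl_vsi *vsi, uint16_t src_vsi)
{
    assert(vsi != NULL);
    if (src_vsi >= MAX_VSI)
        return NULL;
    return vsi->target_netdevs[src_vsi];
}

enum tx_origin { TX_FROM_REPR, TX_FROM_UPLINK };

/*
 * Tx side: a packet sent from a representor targets the VF VSI, a packet
 * sent from the uplink port targets the uplink itself.
 */
uint16_t tx_dst_vsi(enum tx_origin origin, uint16_t vf_vsi, uint16_t uplink_vsi)
{
    return origin == TX_FROM_REPR ? vf_vsi : uplink_vsi;
}
```

A table indexed by source VSI keeps the Rx lookup constant-time, which is presumably why the commit describes a table rather than a search.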
    • ice: rebuild switchdev when resetting all VFs · b3be918d
      Grzegorz Nitka authored
      As resetting all VFs behaves mostly like creating new VFs, the
      eswitch infrastructure also has to be recreated. The easiest way to
      do that is to rebuild eswitch after resetting VFs.
      
      Implement helper functions to start and stop all representor
      queues. These are used to disable traffic on port representors.
      
      In rebuild path:
      - NAPI has to be disabled
      - eswitch environment has to be set up
      - new port representors have to be created, because the old
        ones had pointers to VFs that no longer exist
      - new control plane VSI ring should be remapped
      - NAPI has to be enabled
      - rxdid has to be set to FLEX_NIC_2, because this descriptor id
        supports source_vsi, which is needed on control plane VSI queues
      - port representor queues have to be started
      Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      b3be918d
    • ice: enable/disable switchdev when managing VFs · 1c54c839
      Grzegorz Nitka authored
      The only way to enable switchdev is to create VFs when the eswitch
      mode is set to switchdev. Check if the correct mode is set and
      enable switchdev in the function that creates VFs.
      
      Disable switchdev when the user changes the number of VFs to 0.
      Changing eswitch mode back to legacy when VFs are created in
      switchdev mode isn't allowed.
      
      As switchdev takes care of managing filter rules, adding new
      rules on VF is blocked.
      
      When a VF is reset, the driver has to update the pointer in the
      ice_repr struct, because after reset VSI related things can change.
      Co-developed-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      1c54c839
    • ice: introduce new type of VSI for switchdev · f66756e0
      Grzegorz Nitka authored
      A new type of VSI has to be defined for the switchdev control plane
      VSI. The number of allocated Tx and Rx queues has to be equal to
      the number of VFs, because each port representor should have one
      Tx and one Rx queue.
      
      Also, to not increase the number of used irqs too much, the control
      plane VSI uses only one q_vector and handles all queues in one irq.
      To allow handling all queues in one irq, a new function to clean
      msix for eswitch was introduced. This function will schedule napi
      for each representor instead of scheduling it only for one like in
      the normal clean irq function.
      
      Only one additional msix has to be requested. Always try to request
      it in the ice_ena_msix_range function.
      Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      f66756e0
    • ice: set and release switchdev environment · 1a1c40df
      Grzegorz Nitka authored
      Switchdev environment has to be set up when the user creates VFs
      and eswitch mode is switchdev. Release is done when the user
      deletes all VFs.
      
      Data path in this implementation is based on control plane VSI.
      This VSI is used to pass traffic from port representors to
      corresponding VFs and vice versa. Default TX rule has to be
      added to forward packet to control plane VSI. This will redirect
      packets from VFs which don't match other rules to control plane
      VSI.
      
      On RX side default rule is added on uplink VSI to receive all
      traffic that doesn't match other rules. When setting switchdev
      environment all other rules from VFs should be removed. Packet to
      VFs will be forwarded by control plane VSI.
      
      As a VF without any MAC rules can't send any packets because of the
      antispoof mechanism, VSI antispoof should be turned off on each VF.
      
      To send packet from representor to correct VSI, destination VSI
      field in TX descriptor will have to be filled. Allow that by
      setting destination override bit in control plane VSI security config.
      
      Packets from VFs will be received on the control plane VSI. The
      driver should decide to which netdev to forward the packet. The
      decision is made based on the src_vsi field from the descriptor.
      There is a target netdev list in the control plane VSI struct from
      which the netdev is chosen based on the src_vsi number.
      Co-developed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      1a1c40df
    • ice: allow changing lan_en and lb_en on dflt rules · bd676b29
      Michal Swiatkowski authored
      There is no way to change the default lan_en and lb_en flags while
      adding a new rule. Add a function that allows changing these flags
      on the ICE_SW_LKUP_DFLT recipe and any rule id.
      
      lan_en allows a packet to go outside if the rule is matched.
      Clearing this bit blocks the packet from being sent outside.
      
      lb_en allows a packet to be forwarded to another VSI. Clearing
      this bit blocks the packet from being forwarded to another VSI.
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      bd676b29
    • ice: manage VSI antispoof and destination override · ff5411ef
      Michal Swiatkowski authored
      Implement functions to make setting VSI security config easier.
      The main function, ice_update_security, fills the security section
      field and checks for errors when updating the VSI. Reset functions
      are responsible for correctly filling the config according to user
      expectations.
      
      This helper is needed because destination override is located in
      this section. The driver has to set this bit to allow steering Tx
      packets on a VSI based on the value in Tx descriptors.
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      ff5411ef
    • ice: allow process VF opcodes in different ways · ac19e03e
      Michal Swiatkowski authored
      In switchdev mode the driver shouldn't add MAC, VLAN and promisc
      filters on iavf demand, but should return success to not
      break the normal iavf flow.
      
      Achieve that by creating a table of function pointers with the
      default functions used to parse iavf commands. While parsing an
      iavf command, call the correct function from the table instead of
      calling the function directly.
      
      When port representors are being created, change the functions
      in the table to new ones that behave correctly for switchdev
      purposes (ignoring new filters).
      
      Change back to default ops when representors are being
      removed.
      Co-developed-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      ac19e03e
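The function pointer table described in this commit follows a common C pattern. The sketch below is a minimal illustration of that pattern; every name in it (`struct vf_ops`, `add_mac_default()`, `repr_add()` and so on) is hypothetical and the table has a single entry, whereas the commit covers MAC, VLAN and promisc handlers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical VF state: counts filters that were really programmed. */
struct vf {
    int mac_filters;
};

/* Table of handlers for VF requests; one entry is enough for the sketch. */
struct vf_ops {
    int (*add_mac)(struct vf *vf);
};

/* Default handler: program the filter and report success. */
int add_mac_default(struct vf *vf)
{
    vf->mac_filters++;
    return 0;
}

/* switchdev handler: ignore the request but still report success. */
int add_mac_switchdev(struct vf *vf)
{
    (void)vf;
    return 0;
}

const struct vf_ops default_ops = { .add_mac = add_mac_default };
const struct vf_ops switchdev_ops = { .add_mac = add_mac_switchdev };

/* Hypothetical PF state holding the currently active table. */
struct pf {
    const struct vf_ops *ops;
};

/* Creating representors switches the table, removing them restores it. */
void repr_add(struct pf *pf) { pf->ops = &switchdev_ops; }
void repr_remove(struct pf *pf) { pf->ops = &default_ops; }

/* The command parser calls through the table instead of calling directly. */
int handle_add_mac(struct pf *pf, struct vf *vf)
{
    assert(pf != NULL && pf->ops != NULL);
    return pf->ops->add_mac(vf);
}
```

Because the caller always sees a success return, the VF-side flow is unaffected by which table is active.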
    • ice: introduce VF port representor · 37165e3f
      Michal Swiatkowski authored
      A port representor is used to manage a VF from the host side. To
      allow it, each created representor registers a netdevice with a
      random hw address. A devlink port is also created for all
      representors.
      
      The port representor name is created based on switch id, or managed
      by devlink core if the devlink port was registered successfully.
      
      Open and stop ndo ops are implemented to allow managing the VF
      link state. Link state is tracked in VF struct.
      
      Struct ice_netdev_priv is extended by pointer to representor
      field. This is needed to get correct representor from netdev
      struct mostly used in ndo calls.
      
      Implement helper functions to check if given netdev is netdev of
      port representor (ice_is_port_repr_netdev) and to get representor
      from netdev (ice_netdev_to_repr).
      
      As driver mostly will create or destroy port representors on all
      VFs instead of on single one, write functions to add and remove
      representor for each VF.
      
      Representor struct contains pointer to source VSI, which is VSI
      configured on VF, backpointer to VF, backpointer to netdev,
      q_vector pointer and metadata_dst which will be used in data path.
      Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      37165e3f
    • ice: Move devlink port to PF/VF struct · 2ae0aa47
      Wojciech Drewek authored
      Keeping the devlink port inside the VSI data structure causes some
      issues. Since the VF VSI is released during reset, we have to
      unregister the devlink port and register it again every time a reset
      is triggered. With the new changes in the devlink API this might
      cause deadlock issues. After calling
      devlink_port_register/devlink_port_unregister, the devlink API is
      going to lock rtnl_mutex. That is an issue when a VF reset is
      triggered in netlink operation context (like setting VF MAC address
      or VLAN), because rtnl_lock is already taken by netlink. Another
      call of rtnl_lock from the devlink API results in a deadlock.
      
      By moving devlink port to PF/VF we avoid creating/destroying it
      during reset. Since this patch, devlink ports are created during
      ice_probe, destroyed during ice_remove for PF and created during
      ice_repr_add, destroyed during ice_repr_rem for VF.
      Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      2ae0aa47
    • ice: support basic E-Switch mode control · 3ea9bd5d
      Michal Swiatkowski authored
      Write set and get eswitch mode functions used by devlink
      ops. Use new pf struct member eswitch_mode to track current
      eswitch mode in driver.
      
      Changing eswitch mode is only allowed when there are no
      VFs created.
      
      Create new file for eswitch related code.
      
      Add config flag ICE_SWITCHDEV to allow user to choose if
      switchdev support should be enabled or disabled.
      
      Use case examples:
      - show current eswitch mode ('legacy' is the default one)
      [root@localhost]# devlink dev eswitch show pci/0000:03:00.1
      pci/0000:03:00.1: mode legacy
      
      - move to 'switchdev' mode
      [root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode switchdev
      [root@localhost]# devlink dev eswitch show pci/0000:03:00.1
      pci/0000:03:00.1: mode switchdev
      
      - create 2 VFs
      [root@localhost]# echo 2 > /sys/class/net/ens4f1/device/sriov_numvfs
      
      - unsuccessful attempt to change eswitch mode while VFs are created
      [root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode legacy
      devlink answers: Operation not supported
      
      - destroy VFs
      [root@localhost]# echo 0 > /sys/class/net/ens4f1/device/sriov_numvfs
      
      - restore 'legacy' mode
      [root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode legacy
      [root@localhost]# devlink dev eswitch show pci/0000:03:00.1
      pci/0000:03:00.1: mode legacy
      Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      3ea9bd5d
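The rule that the eswitch mode may only change while no VFs exist can be sketched as a small setter and getter pair. This is an illustration under assumptions, not the driver's implementation: `struct pf`, `num_vfs` and the function names are hypothetical, and treating a request for the already active mode as a no-op is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum eswitch_mode { ESWITCH_LEGACY, ESWITCH_SWITCHDEV };

/* Hypothetical PF state: the tracked mode and the number of created VFs. */
struct pf {
    enum eswitch_mode eswitch_mode;
    int num_vfs;
};

/*
 * Set the mode. Refuses the change with -EOPNOTSUPP while VFs exist,
 * which corresponds to the "Operation not supported" answer shown in the
 * use case examples.
 */
int eswitch_mode_set(struct pf *pf, enum eswitch_mode mode)
{
    assert(pf != NULL);
    if (pf->eswitch_mode == mode)
        return 0; /* assumption: same mode is a no-op */
    if (pf->num_vfs > 0)
        return -EOPNOTSUPP;
    pf->eswitch_mode = mode;
    return 0;
}

/* Report the currently tracked mode. */
int eswitch_mode_get(const struct pf *pf, enum eswitch_mode *mode)
{
    assert(pf != NULL && mode != NULL);
    *mode = pf->eswitch_mode;
    return 0;
}
```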
    • ethernet: ti: cpts: Use devm_kcalloc() instead of devm_kzalloc() · c514fbb6
      Gustavo A. R. Silva authored
      Use 2-factor multiplication argument form devm_kcalloc() instead
      of devm_kzalloc().
      
      Link: https://github.com/KSPP/linux/issues/162
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Link: https://lore.kernel.org/r/20211006181115.GA913499@embeddedor
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c514fbb6
    • Gustavo A. R. Silva's avatar
      36371876
    • net: mana: Use kcalloc() instead of kzalloc() · 149ef7b2
      Gustavo A. R. Silva authored
      Use 2-factor multiplication argument form kcalloc() instead
      of kzalloc().
      
      Link: https://github.com/KSPP/linux/issues/162
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Reviewed-by: Dexuan Cui <decui@microsoft.com>
      Link: https://lore.kernel.org/r/20211006180927.GA913456@embeddedor
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      149ef7b2
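The benefit of the two-factor form is that the allocator, rather than the caller, multiplies the element count by the element size and can refuse a product that would overflow. The following is a rough userspace analogue that assumes nothing about the kernel implementation; `zalloc_array()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate a zeroed array of n elements of the given size. Returns NULL
 * when n * size would overflow size_t instead of silently allocating a
 * buffer that is too small. Free the result with free().
 */
void *zalloc_array(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;
    return calloc(n, size);
}
```

With a one-argument allocator the caller would write `n * size` inline, and an overflowing product would wrap around before the allocator ever sees it.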
    • Gustavo A. R. Silva's avatar
      2b8a0f15
    • Merge tag 'wireless-drivers-next-2021-10-07' of... · 44cc24b0
      David S. Miller authored
      Merge tag 'wireless-drivers-next-2021-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
      
      Kalle Valo says:
      
      ====================
      wireless-drivers-next patches for v5.16
      
      First set of patches for v5.16. ath11k is getting most of the new
      features this time. Other drivers also have a few new features, and
      of course the usual set of fixes and cleanups all over.
      
      Major changes:
      
      rtw88
      
      * support adaptivity for ETSI/JP DFS region
      
      * 8821c: support RFE type4 wifi NIC
      
      brcmfmac
      
      * DMI nvram filename quirk for Cyberbook T116 tablet
      
      ath9k
      
      * load calibration data and pci init values via nvmem subsystem
      
      ath11k
      
      * include channel rx and tx time in survey dump statistics
      
      * support for setting fixed Wi-Fi 6 rates from user space
      
      * support for 80P80 and 160 MHz bandwidths
      
      * spectral scan support for QCN9074
      
      * support for calibration data files per radio
      
      * support for calibration data via eeprom
      
      * support for rx decapsulation offload (data frames in 802.3 format)
      
      * support channel 2 in 6 GHz band
      
      ath10k
      
      * include frame time stamp in beacon and probe response frames
      
      wcn36xx
      
      * enable Idle Mode Power Save (IMPS) to reduce power consumption during idle
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      44cc24b0
    • Merge branch 'dev_addr-fw-helpers' · 5a98dcf5
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      net: add helpers for loading netdev->dev_addr from FW
      
      We're trying to make all writes to netdev->dev_addr go via helpers.
      A lot of places pass netdev->dev_addr to of_get_mac_address() and
      device_get_mac_address() so this set adds new functions which wrap
      the functionality.
      
      v2 performs suggested code moves, adds a couple additional clean ups
      on the device property side, and an extra patch converting drivers
      which can benefit from device_get_ethdev_address().
      
      v3 removes OF_NET and corrects kdoc.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5a98dcf5
    • ethernet: make more use of device_get_ethdev_address() · 894b0fb0
      Jakub Kicinski authored
      Convert a few drivers to device_get_ethdev_address(),
      saving a few LoC.
      
      The check if addr is valid in netsec is superfluous,
      device_get_ethdev_address() already checks that (in
      fwnode_get_mac_addr()).
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      894b0fb0
    • ethernet: use device_get_ethdev_address() · b8eeac56
      Jakub Kicinski authored
      Use the new device_get_ethdev_address() helper for the cases
      where dev->dev_addr is passed in directly as the destination.
      
        @@
        expression dev, np;
        @@
        - device_get_mac_address(np, dev->dev_addr, ETH_ALEN)
        + device_get_ethdev_address(np, dev)
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b8eeac56
    • eth: fwnode: add a helper for loading netdev->dev_addr · d9eb4490
      Jakub Kicinski authored
      Commit 406f42fa ("net-next: When a bond have a massive amount
      of VLANs...") introduced an rbtree for faster Ethernet address look
      up. To maintain netdev->dev_addr in this tree we need to make all
      the writes to it go through appropriate helpers.
      
      There is a handful of drivers which pass netdev->dev_addr as
      the destination buffer to device_get_mac_address(). Add a helper
      which takes a dev pointer instead, so it can call an appropriate
      helper.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d9eb4490
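The shape of such a helper can be sketched in userspace C: read the address into a local buffer, then hand it to the one function that is allowed to write `dev_addr`. All names here (`struct net_device_sketch`, `hw_addr_set()`, `fw_get_mac()`, `get_ethdev_address()`) are hypothetical stand-ins, and the firmware lookup is faked by passing the address in directly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

/* Stand-in for struct net_device; addr_writes counts sanctioned writes. */
struct net_device_sketch {
    unsigned char dev_addr[ETH_ALEN];
    int addr_writes;
};

/* Stand-in for the single helper that is allowed to write dev_addr. */
void hw_addr_set(struct net_device_sketch *dev, const unsigned char *addr)
{
    memcpy(dev->dev_addr, addr, ETH_ALEN);
    dev->addr_writes++;
}

/* Fake firmware lookup: copies fw_mac to addr, or fails when it is NULL. */
int fw_get_mac(const unsigned char *fw_mac, unsigned char *addr)
{
    if (fw_mac == NULL)
        return -ENOENT;
    memcpy(addr, fw_mac, ETH_ALEN);
    return 0;
}

/*
 * Takes the device instead of a destination buffer, so the write to
 * dev_addr always goes through hw_addr_set(). On failure the device
 * address is left untouched.
 */
int get_ethdev_address(const unsigned char *fw_mac, struct net_device_sketch *dev)
{
    unsigned char addr[ETH_ALEN];
    int ret;

    assert(dev != NULL);
    ret = fw_get_mac(fw_mac, addr);
    if (ret == 0)
        hw_addr_set(dev, addr);
    return ret;
}
```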
    • eth: fwnode: remove the addr len from mac helpers · 0a14501e
      Jakub Kicinski authored
      All callers pass in ETH_ALEN and the function itself
      will return -EINVAL for any other address length.
      Just assume it's ETH_ALEN like all other mac address
      helpers (nvm, of, platform).
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0a14501e
    • eth: fwnode: change the return type of mac address helpers · 8017c4d8
      Jakub Kicinski authored
      fwnode_get_mac_address() and device_get_mac_address()
      return a pointer to the buffer that was passed to them
      on success or NULL on failure. None of the callers
      care about the actual value, only if it's NULL or not.
      
      These semantics differ from of_get_mac_address() which
      returns an int so to avoid confusion make the device
      helpers return an errno.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8017c4d8
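The difference between the two return conventions can be shown side by side. Both functions below are hypothetical userspace stand-ins (`get_mac_ptr()` for the old pointer-or-NULL shape, `get_mac_errno()` for the errno shape), with the address source passed in directly instead of being read from firmware.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

/* Old shape: returns the caller's buffer on success, NULL on failure. */
unsigned char *get_mac_ptr(const unsigned char *src, unsigned char *buf)
{
    if (src == NULL)
        return NULL;
    memcpy(buf, src, ETH_ALEN);
    return buf;
}

/* New shape: returns 0 on success or a negative errno on failure. */
int get_mac_errno(const unsigned char *src, unsigned char *buf)
{
    if (src == NULL)
        return -ENOENT;
    memcpy(buf, src, ETH_ALEN);
    return 0;
}
```

Callers of the old shape can only test the result against NULL, so the pointer carries no more information than a success flag, while the errno shape can also say why the lookup failed.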
    • device property: move mac addr helpers to eth.c · 433baf07
      Jakub Kicinski authored
      Move the mac address helpers out, eth.c already contains
      a bunch of similar helpers.
      Suggested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      433baf07
    • ethernet: use of_get_ethdev_address() · 9ca01b25
      Jakub Kicinski authored
      Use the new of_get_ethdev_address() helper for the cases
      where dev->dev_addr is passed in directly as the destination.
      
        @@
        expression dev, np;
        @@
        - of_get_mac_address(np, dev->dev_addr)
        + of_get_ethdev_address(np, dev)
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9ca01b25
    • of: net: add a helper for loading netdev->dev_addr · d466effe
      Jakub Kicinski authored
      Commit 406f42fa ("net-next: When a bond have a massive amount
      of VLANs...") introduced an rbtree for faster Ethernet address look
      up. To maintain netdev->dev_addr in this tree we need to make all
      the writes to it go through appropriate helpers.
      
      There are roughly 40 places where netdev->dev_addr is passed
      as the destination to an of_get_mac_address() call. Add a helper
      which takes a dev pointer instead, so it can call an appropriate
      helper.
      
      Note that of_get_mac_address() already assumes the address is
      6 bytes long (ETH_ALEN) so use eth_hw_addr_set().
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d466effe
    • of: net: move of_net under net/ · e330fb14
      Jakub Kicinski authored
      Rob suggests moving of_net.c from under drivers/of/ to somewhere
      in the networking code.
      Suggested-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Rob Herring <robh@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e330fb14
    • Merge branch 'nfc-pn533-const' · 944b33ca
      David S. Miller authored
      Rikard Falkeborn says:
      
      ====================
      nfc: pn533: Constify ops-structs
      
      Constify a couple of ops-structs. This allows the compiler to put the
      static structs in read-only memory.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      944b33ca
    • nfc: pn533: Constify pn533_phy_ops · bc642817
      Rikard Falkeborn authored
      Neither the driver nor the core modifies the pn533_phy_ops structs,
      so make them const to allow the compiler to put the static structs
      in read-only memory.
      Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bc642817
    • nfc: pn533: Constify serdev_device_ops · be5f60d8
      Rikard Falkeborn authored
      The only usage of pn532_serdev_ops is to pass its address to
      serdev_device_set_client_ops(), which takes a pointer to const
      serdev_device_ops as argument. Make it const to allow the compiler to
      put it in read-only memory.
      Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      be5f60d8
    • Merge branch 'add-mdiobus_modify_changed-helper' · 6d99f85e
      Jakub Kicinski authored
      Russell King says:
      
      ====================
      Add mdiobus_modify_changed() helper
      
      Sean Anderson's recent patch series is introducing more read-write
      operations on the MDIO bus that only need to happen if a change is
      being made.
      
      We have similar logic in __mdiobus_modify_changed(), but we didn't
      add its corresponding locked variant mdiobus_modify_changed() as we
      had very few users. Now that we are getting more, let's add the
      helper.
      ====================
      
      Link: https://lore.kernel.org/r/YV2UIa2eU+UjmWaE@shell.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      6d99f85e
    • net: phylink: use mdiobus_modify_changed() helper · 078e0b53
      Russell King (Oracle) authored
      Use the mdiobus_modify_changed() helper in the C22 PCS advertisement
      helper.
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      078e0b53
    • net: mdio: add mdiobus_modify_changed() · 79365f36
      Russell King (Oracle) authored
      Add mdiobus_modify_changed() helper to reflect the phylib and similar
      equivalents. This will avoid this functionality being open-coded, as
      has already happened in phylink, and it looks like other users will be
      appearing soon.
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      79365f36
    • Merge branch 'ethtool-add-ability-to-control-transceiver-modules-power-mode' · 4c827082
      Jakub Kicinski authored
      Ido Schimmel says:
      
      ====================
      ethtool: Add ability to control transceiver modules' power mode
      
      This patchset extends the ethtool netlink API to allow user space to
      control transceiver modules. Two specific APIs are added, but the plan
      is to extend the interface with more APIs in the future (see "Future
      plans").
      
      This submission is a complete rework of a previous submission [1] that
      tried to achieve the same goal by allowing user space to write to the
      EEPROMs of these modules. It was rejected as it could have enabled user
      space binary blob drivers.
      
      However, the main issue is that by directly writing to some pages of
      these EEPROMs, we are interfering with the entity that is controlling
      the modules (kernel / device firmware). In addition, some functionality
      cannot be implemented solely by writing to the EEPROM, as it requires
      the assertion / de-assertion of hardware signals (e.g., "ResetL" pin in
      SFF-8636).
      
      Motivation
      ==========
      
      The kernel can currently dump the contents of module EEPROMs to user
      space via the ethtool legacy ioctl API or the new netlink API. These
      dumps can then be parsed by ethtool(8) according to the specification
      that defines the memory map of the EEPROM. For example, SFF-8636 [2] for
      QSFP and CMIS [3] for QSFP-DD.
      
      In addition to read-only elements, these specifications also define
      writeable elements that can be used to control the behavior of the
      module. For example, controlling whether the module is put in low or
      high power mode to limit its power consumption.
      
      The CMIS specification even defines a message exchange mechanism (CDB,
      Command Data Block) on top of the module's memory map. This allows the
      host to send various commands to the module. For example, to update its
      firmware.
      
      Implementation
      ==============
      
      The ethtool netlink API is extended with two new messages,
      'ETHTOOL_MSG_MODULE_SET' and 'ETHTOOL_MSG_MODULE_GET', that allow user
      space to set and get transceiver module parameters. Specifically, the
      'ETHTOOL_A_MODULE_POWER_MODE_POLICY' attribute allows user space to
      control the power mode policy of the module in order to limit its power
      consumption. See detailed description in patch #1.
      
      The user API is designed to be generic enough so that it could be used
      for modules with different memory maps (e.g., SFF-8636, CMIS).
      
      The only implementation of the device driver API in this series is for a
      MAC driver (mlxsw) where the module is controlled by the device's
      firmware, but it is designed to be generic enough so that it could also
      be used by implementations where the module is controlled by the kernel.
      
      Testing and introspection
      =========================
      
      See detailed description in patches #1 and #5.
      
      Patchset overview
      =================
      
      Patch #1 adds the initial infrastructure in ethtool along with the
      ability to control transceiver modules' power mode.
      
      Patches #2-#3 add required device registers in mlxsw.
      
      Patch #4 implements in mlxsw the ethtool operations added in patch #1.
      
      Patch #5 adds extended link states in order to allow user space to
      troubleshoot link down issues related to transceiver modules.
      
      Patch #6 adds support for these extended states in mlxsw.
      
      Future plans
      ============
      
      * Extend 'ETHTOOL_MSG_MODULE_SET' to control Tx output among other
      attributes.
      
      * Add new ethtool message(s) to update firmware on transceiver modules.
      
      * Extend ethtool(8) to parse more diagnostic information from CMIS
      modules. No kernel changes required.
      
      [1] https://lore.kernel.org/netdev/20210623075925.2610908-1-idosch@idosch.org/
      [2] https://members.snia.org/document/dl/26418
      [3] http://www.qsfp-dd.com/wp-content/uploads/2021/05/CMIS5p0.pdf
      
      Previous versions:
      [4] https://lore.kernel.org/netdev/20211003073219.1631064-1-idosch@idosch.org/
      [5] https://lore.kernel.org/netdev/20210824130344.1828076-1-idosch@idosch.org/
      [6] https://lore.kernel.org/netdev/20210818155202.1278177-1-idosch@idosch.org/
      [7] https://lore.kernel.org/netdev/20210809102152.719961-1-idosch@idosch.org/
      ====================
      
      Link: https://lore.kernel.org/r/20211006104647.2357115-1-idosch@idosch.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4c827082
    • Ido Schimmel's avatar
      mlxsw: Add support for transceiver module extended state · 235dbbec
      Ido Schimmel authored
      Add support for the transceiver module extended state and sub-state
      added in the previous patch. The extended state is meant to describe
      link issues related to transceiver modules.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      235dbbec
    • Ido Schimmel's avatar
      ethtool: Add transceiver module extended state · 3dfb5112
      Ido Schimmel authored
      Add an extended state and sub-state to describe link issues related to
      transceiver modules.
      
      The 'ETHTOOL_LINK_EXT_SUBSTATE_MODULE_CMIS_NOT_READY' extended sub-state
      tells user space that the port is unable to gain a carrier because the
      CMIS Module State Machine did not reach the ModuleReady (Fully
      Operational) state, for example because the module is stuck in the
      ModuleLowPwr or ModuleFault state. In the latter case, user space can
      read the fault reason from the module's EEPROM and potentially reset
      the module.
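
      As an illustration only, the rule described above can be sketched in
      Python. The function name, the string constants and the 'carrier'
      argument are hypothetical and do not come from the kernel sources; only
      the state names and the sub-state name are taken from this commit
      message.

```python
from typing import Optional

# State names as used in the commit message (CMIS Module State Machine).
MODULE_LOW_PWR = "ModuleLowPwr"
MODULE_READY = "ModuleReady"
MODULE_FAULT = "ModuleFault"

CMIS_NOT_READY = "ETHTOOL_LINK_EXT_SUBSTATE_MODULE_CMIS_NOT_READY"


def link_ext_substate(carrier: bool, module_state: str) -> Optional[str]:
    """Return the sub-state to report, or None if it does not apply.

    Hypothetical helper: the sub-state only makes sense while the port has
    no carrier and the module has not reached ModuleReady.
    """
    if carrier:
        return None  # Link is up, there is nothing to explain.
    if module_state != MODULE_READY:
        return CMIS_NOT_READY
    return None  # Module is ready, so the cause lies elsewhere.
```

      With this sketch, a port without carrier whose module is stuck in
      ModuleLowPwr or ModuleFault yields the sub-state, while a ModuleReady
      module does not.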
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3dfb5112
    • Ido Schimmel's avatar
      mlxsw: Add ability to control transceiver modules' power mode · 0455dc50
      Ido Schimmel authored
      Implement support for ethtool_ops::get_module_power_mode and
      ethtool_ops::set_module_power_mode.
      
      The get operation is implemented using the Management Cable IO and
      Notifications (MCION) register that reports the operational power mode
      of the module and its presence. In case a module is not present, its
      operational power mode is not reported to ethtool and user space. If not
      set before, the power mode policy is reported as "high", which is the
      default on Mellanox systems.
      
      The set operation is implemented using the Port Module Memory Map
      Properties (PMMP) register. The register instructs the device's firmware
      to transition a plugged-in module to / out of low power mode by writing
      to its memory map.
      
      When the power mode policy is set to 'auto', a module will not
      transition to low power mode as long as any ports using it are
      administratively up. Example:
      
       # devlink port split swp11 count 4
      
       # ethtool --set-module swp11s0 power-mode-policy auto
      
       $ ethtool --show-module swp11s0
       Module parameters for swp11s0:
       power-mode-policy auto
       power-mode low
      
       # ip link set dev swp11s0 up
      
       # ip link set dev swp11s1 up
      
       $ ethtool --show-module swp11s0
       Module parameters for swp11s0:
       power-mode-policy auto
       power-mode high
      
       # ip link set dev swp11s1 down
      
       $ ethtool --show-module swp11s0
       Module parameters for swp11s0:
       power-mode-policy auto
       power-mode high
      
       # ip link set dev swp11s0 down
      
       $ ethtool --show-module swp11s0
       Module parameters for swp11s0:
       power-mode-policy auto
       power-mode low
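
      The behavior shown in the transcript above can be modelled with a
      small Python sketch. The 'Module' class is hypothetical and only
      mirrors the policy semantics described in this commit message; it is
      not how mlxsw or the firmware implement the transition.

```python
class Module:
    """Hypothetical model of one transceiver module shared by split ports."""

    def __init__(self, policy: str = "high") -> None:
        self.policy = policy      # "high" or "auto"
        self.ports_up = set()     # ports that are administratively up

    def set_port(self, port: str, up: bool) -> None:
        """Record an administrative up/down change for one port."""
        if up:
            self.ports_up.add(port)
        else:
            self.ports_up.discard(port)

    @property
    def power_mode(self) -> str:
        """Operational power mode implied by the policy and port states."""
        if self.policy == "high":
            return "high"
        # 'auto': stay in high power mode while any port using the module
        # is administratively up.
        return "high" if self.ports_up else "low"


module = Module(policy="auto")
trace = [module.power_mode]
for port, up in [("swp11s0", True), ("swp11s1", True),
                 ("swp11s1", False), ("swp11s0", False)]:
    module.set_port(port, up)
    trace.append(module.power_mode)
print(trace)
```

      The module only returns to low power mode after the last of the two
      split ports is put administratively down, matching the sequence
      above.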
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0455dc50
    • Ido Schimmel's avatar
      mlxsw: reg: Add Management Cable IO and Notifications register · fc53f5fb
      Ido Schimmel authored
      Add the Management Cable IO and Notifications register. It will be used
      to retrieve the power mode status of a module in subsequent patches and
      whether a module is present in a cage or not.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fc53f5fb
    • Ido Schimmel's avatar
      mlxsw: reg: Add Port Module Memory Map Properties register · f10ba086
      Ido Schimmel authored
      Add the Port Module Memory Map Properties register. It will be used to
      set the power mode of a module in subsequent patches.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f10ba086
    • Ido Schimmel's avatar
      ethtool: Add ability to control transceiver modules' power mode · 353407d9
      Ido Schimmel authored
      Add a pair of new ethtool messages, 'ETHTOOL_MSG_MODULE_SET' and
      'ETHTOOL_MSG_MODULE_GET', that can be used to control transceiver
      module parameters and retrieve their status.
      
      The first parameter to control is the power mode of the module. It is
      only relevant for paged memory modules, as flat memory modules always
      operate in low power mode.
      
      When a paged memory module is in low power mode, its power consumption
      is reduced to the minimum, the management interface towards the host is
      available and the data path is deactivated.
      
      User space can choose to put modules that are not currently in use in
      low power mode and transition them to high power mode before putting the
      associated ports administratively up. This is useful for user space that
      favors reduced power consumption and lower temperatures over reduced
      link up times. In QSFP-DD modules the transition from low power mode to
      high power mode can take a few seconds and this transition is only
      expected to get longer with future / more complex modules.
      
      User space can control the power mode of the module via the power mode
      policy attribute ('ETHTOOL_A_MODULE_POWER_MODE_POLICY'). Possible
      values:
      
      * high: Module is always in high power mode.
      
      * auto: Module is transitioned by the host to high power mode when the
        first port using it is put administratively up and to low power mode
        when the last port using it is put administratively down.
      
      The operational power mode of the module is available to user space via
      the 'ETHTOOL_A_MODULE_POWER_MODE' attribute. The attribute is not
      reported to user space when a module is not plugged-in.
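
      A minimal Python sketch of the reporting rule described above,
      assuming a simplified reply represented as a dictionary. The
      function name and the use of strings as attribute values are
      assumptions made for illustration; the real netlink attributes carry
      numeric values.

```python
from typing import Dict, Optional

VALID_POLICIES = ("high", "auto")
VALID_MODES = ("low", "high")


def module_get_reply(policy: str, plugged_in: bool,
                     power_mode: Optional[str] = None) -> Dict[str, str]:
    """Build a simplified, hypothetical MODULE_GET reply."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown power mode policy: {policy!r}")
    reply = {"ETHTOOL_A_MODULE_POWER_MODE_POLICY": policy}
    # The operational power mode is only reported when a module is present.
    if plugged_in:
        if power_mode not in VALID_MODES:
            raise ValueError(f"unknown power mode: {power_mode!r}")
        reply["ETHTOOL_A_MODULE_POWER_MODE"] = power_mode
    return reply
```

      For an empty cage, the reply in this sketch carries only the policy
      attribute.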
      
      The user API is designed to be generic enough so that it could be used
      for modules with different memory maps (e.g., SFF-8636, CMIS).
      
      The only implementation of the device driver API in this series is for a
      MAC driver (mlxsw) where the module is controlled by the device's
      firmware, but it is designed to be generic enough so that it could also
      be used by implementations where the module is controlled by the CPU.
      
      CMIS testing
      ============
      
       # ethtool -m swp11
       Identifier                                : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
       ...
       Module State                              : 0x03 (ModuleReady)
       LowPwrAllowRequestHW                      : Off
       LowPwrRequestSW                           : Off
      
      The module is not in low power mode, as it is not forced by hardware
      (LowPwrAllowRequestHW is off) or by software (LowPwrRequestSW is off).
      
      The power mode can be queried from the kernel. In case
      LowPwrAllowRequestHW was on, the kernel would need to take into account
      the state of the LowPwrRequestHW signal, which is not visible to user
      space.
      
       $ ethtool --show-module swp11
       Module parameters for swp11:
       power-mode-policy high
       power-mode high
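
      The reasoning above about when a CMIS module is forced into low
      power mode can be written as a small Python sketch. The function and
      parameter names are hypothetical; the third argument stands for the
      LowPwrRequestHW signal, which is not visible to user space.

```python
def cmis_low_power_forced(low_pwr_request_sw: bool,
                          low_pwr_allow_request_hw: bool,
                          low_pwr_request_hw_signal: bool) -> bool:
    """Return True if the module is forced into low power mode.

    Hypothetical helper that mirrors the description in this commit
    message: software forces low power via LowPwrRequestSW, hardware
    forces it only when LowPwrAllowRequestHW is on and the
    LowPwrRequestHW signal is asserted.
    """
    forced_by_sw = low_pwr_request_sw
    forced_by_hw = low_pwr_allow_request_hw and low_pwr_request_hw_signal
    return forced_by_sw or forced_by_hw
```

      With both LowPwrAllowRequestHW and LowPwrRequestSW off, as in the
      EEPROM dump above, the result is False regardless of the hardware
      signal.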
      
      Change the power mode policy to 'auto':
      
       # ethtool --set-module swp11 power-mode-policy auto
      
      Query the power mode again:
      
       $ ethtool --show-module swp11
       Module parameters for swp11:
       power-mode-policy auto
       power-mode low
      
      Verify with the data read from the EEPROM:
      
       # ethtool -m swp11
       Identifier                                : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
       ...
       Module State                              : 0x01 (ModuleLowPwr)
       LowPwrAllowRequestHW                      : Off
       LowPwrRequestSW                           : On
      
      Put the associated port administratively up which will instruct the host
      to transition the module to high power mode:
      
       # ip link set dev swp11 up
      
      Query the power mode again:
      
       $ ethtool --show-module swp11
       Module parameters for swp11:
       power-mode-policy auto
       power-mode high
      
      Verify with the data read from the EEPROM:
      
       # ethtool -m swp11
       Identifier                                : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
       ...
       Module State                              : 0x03 (ModuleReady)
       LowPwrAllowRequestHW                      : Off
       LowPwrRequestSW                           : Off
      
      Put the associated port administratively down which will instruct the
      host to transition the module to low power mode:
      
       # ip link set dev swp11 down
      
      Query the power mode again:
      
       $ ethtool --show-module swp11
       Module parameters for swp11:
       power-mode-policy auto
       power-mode low
      
      Verify with the data read from the EEPROM:
      
       # ethtool -m swp11
       Identifier                                : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
       ...
       Module State                              : 0x01 (ModuleLowPwr)
       LowPwrAllowRequestHW                      : Off
       LowPwrRequestSW                           : On
      
      SFF-8636 testing
      ================
      
       # ethtool -m swp13
       Identifier                                : 0x11 (QSFP28)
       ...
       Extended identifier description           : 5.0W max. Power consumption,  High Power Class (> 3.5 W) enabled
       Power set                                 : Off
       Power override                            : On
       ...
       Transmit avg optical power (Channel 1)    : 0.7733 mW / -1.12 dBm
       Transmit avg optical power (Channel 2)    : 0.7649 mW / -1.16 dBm
       Transmit avg optical power (Channel 3)    : 0.7790 mW / -1.08 dBm
       Transmit avg optical power (Channel 4)    : 0.7837 mW / -1.06 dBm
       Rcvr signal avg optical power(Channel 1)  : 0.9302 mW / -0.31 dBm
       Rcvr signal avg optical power(Channel 2)  : 0.9079 mW / -0.42 dBm
       Rcvr signal avg optical power(Channel 3)  : 0.8993 mW / -0.46 dBm
       Rcvr signal avg optical power(Channel 4)  : 0.8778 mW / -0.57 dBm
      
      The module is not in low power mode, as it is not forced by hardware
      (Power override is on) or by software (Power set is off).
      
      The power mode can be queried from the kernel. In case Power override
      was off, the kernel would need to take into account the state of the
      LPMode signal, which is not visible to user space.
      
       $ ethtool --show-module swp13
       Module parameters for swp13:
       power-mode-policy high
       power-mode high
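
      The equivalent reasoning for SFF-8636 modules can be sketched in
      Python as follows. The function and parameter names are
      hypothetical; 'lpmode_signal' stands for the LPMode signal, which is
      not visible to user space.

```python
def sff8636_low_power_forced(power_override: bool, power_set: bool,
                             lpmode_signal: bool) -> bool:
    """Return True if the module is forced into low power mode.

    Hypothetical helper that mirrors the description in this commit
    message: with Power override on, software decides via Power set;
    with Power override off, the LPMode signal decides.
    """
    if power_override:
        return power_set
    return lpmode_signal
```

      With Power override on and Power set off, as in the EEPROM dump
      above, the result is False regardless of the LPMode signal.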
      
      Change the power mode policy to 'auto':
      
       # ethtool --set-module swp13 power-mode-policy auto
      
      Query the power mode again:
      
       $ ethtool --show-module swp13
       Module parameters for swp13:
       power-mode-policy auto
       power-mode low
      
      Verify with the data read from the EEPROM:
      
       # ethtool -m swp13
       Identifier                                : 0x11 (QSFP28)
       Extended identifier description           : 5.0W max. Power consumption,  High Power Class (> 3.5 W) not enabled
       Power set                                 : On
       Power override                            : On
       ...
       Transmit avg optical power (Channel 1)    : 0.0000 mW / -inf dBm
       Transmit avg optical power (Channel 2)    : 0.0000 mW / -inf dBm
       Transmit avg optical power (Channel 3)    : 0.0000 mW / -inf dBm
       Transmit avg optical power (Channel 4)    : 0.0000 mW / -inf dBm
       Rcvr signal avg optical power(Channel 1)  : 0.0000 mW / -inf dBm
       Rcvr signal avg optical power(Channel 2)  : 0.0000 mW / -inf dBm
       Rcvr signal avg optical power(Channel 3)  : 0.0000 mW / -inf dBm
       Rcvr signal avg optical power(Channel 4)  : 0.0000 mW / -inf dBm
      
      Put the associated port administratively up which will instruct the host
      to transition the module to high power mode:
      
       # ip link set dev swp13 up
      
      Query the power mode again:
      
       $ ethtool --show-module swp13
       Module parameters for swp13:
       power-mode-policy auto
       power-mode high
      
      Verify with the data read from the EEPROM:
      
       # ethtool -m swp13
       Identifier                                : 0x11 (QSFP28)
       ...
       Extended identifier description           : 5.0W max. Power consumption,  High Power Class (> 3.5 W) enabled
       Power set                                 : Off
       Power override                            : On
       ...
       Transmit avg optical power (Channel 1)    : 0.7934 mW / -1.01 dBm
       Transmit avg optical power (Channel 2)    : 0.7859 mW / -1.05 dBm
       Transmit avg optical power (Channel 3)    : 0.7885 mW / -1.03 dBm
       Transmit avg optical power (Channel 4)    : 0.7985 mW / -0.98 dBm
       Rcvr signal avg optical power(Channel 1)  : 0.9325 mW / -0.30 dBm
       Rcvr signal avg optical power(Channel 2)  : 0.9034 mW / -0.44 dBm
       Rcvr signal avg optical power(Channel 3)  : 0.9086 mW / -0.42 dBm
       Rcvr signal avg optical power(Channel 4)  : 0.8885 mW / -0.51 dBm
      
      Put the associated port administratively down which will instruct the
      host to transition the module to low power mode:
      
       # ip link set dev swp13 down
      
      Query the power mode again:
      
       $ ethtool --show-module swp13
       Module parameters for swp13:
       power-mode-policy auto
       power-mode low
      
      Verify with the data read from the EEPROM:
      
       # ethtool -m swp13
       Identifier                                : 0x11 (QSFP28)
       ...
       Extended identifier description           : 5.0W max. Power consumption,  High Power Class (> 3.5 W) not enabled
       Power set                                 : On
       Power override                            : On
       ...
       Transmit avg optical power (Channel 1)    : 0.0000 mW / -inf dBm
       Transmit avg optical power (Channel 2)    : 0.0000 mW / -inf dBm
       Transmit avg optical power (Channel 3)    : 0.0000 mW / -inf dBm
       Transmit avg optical power (Channel 4)    : 0.0000 mW / -inf dBm
       Rcvr signal avg optical power(Channel 1)  : 0.0000 mW / -inf dBm
       Rcvr signal avg optical power(Channel 2)  : 0.0000 mW / -inf dBm
       Rcvr signal avg optical power(Channel 3)  : 0.0000 mW / -inf dBm
       Rcvr signal avg optical power(Channel 4)  : 0.0000 mW / -inf dBm
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      353407d9
  2. 06 Oct, 2021 1 commit