1. 17 Jan, 2018 40 commits
    • Merge branch 'net-sched-allow-qdiscs-to-share-filter-block-instances' · ca46abd6
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      net: sched: allow qdiscs to share filter block instances
      
      Currently the filters added to qdiscs are independent. So, for example, if
      you have 2 netdevices, create an ingress qdisc on both and want to add
      identical filter rules to both, you need to add them twice. This patchset
      makes this easier and, mainly, saves resources by allowing all filters
      within a qdisc to be shared - I call this a "filter block". It also helps
      to save resources when offloading to hw, for example to expensive TCAM.
      
      So back to the example. First, we create 2 qdiscs. Both will share
      block number 22. "22" is just an identifier:
      $ tc qdisc add dev ens7 ingress_block 22 ingress
                              ^^^^^^^^^^^^^^^^
      $ tc qdisc add dev ens8 ingress_block 22 ingress
                              ^^^^^^^^^^^^^^^^
      
      If we don't specify the "ingress_block" option on the command line, no
      shared block is created:
      $ tc qdisc add dev ens9 ingress
      
      Now if we list the qdiscs, we will see the block index in the output:
      
      $ tc qdisc
      qdisc ingress ffff: dev ens7 parent ffff:fff1 ingress_block 22
      qdisc ingress ffff: dev ens8 parent ffff:fff1 ingress_block 22
      qdisc ingress ffff: dev ens9 parent ffff:fff1
      
      To make it more visual, the situation looks like this:
      
         ens7 ingress qdisc                 ens8 ingress qdisc
                |                                  |
                |                                  |
                +---------->  block 22  <----------+
      
      An unlimited number of qdiscs may share the same block.
      
      Note that this patchset also introduces block sharing support for the
      clsact qdisc:
      $ tc qdisc add dev ens10 ingress_block 23 egress_block 24 clsact
      $ tc qdisc show dev ens10
      qdisc clsact ffff: dev ens10 parent ffff:fff1 ingress_block 23 egress_block 24
      
      We can add a filter using the block index:
      
      $ tc filter add block 22 protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop
      
      Note that we cannot use the qdisc to manipulate the filters of a shared block:
      
      $ tc filter add dev ens8 ingress protocol ip pref 1 flower dst_ip 192.168.100.2 action drop
      Error: This filter block is shared. Please use the block index to manipulate the filters.
      
      We will see the same output if we list the filters for the ingress qdiscs
      of ens7 and ens8, as well as for block 22:
      
      $ tc filter show block 22
      filter block 22 protocol ip pref 25 flower chain 0
      filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
      ...
      
      $ tc filter show dev ens7 ingress
      filter block 22 protocol ip pref 25 flower chain 0
      filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
      ...
      
      $ tc filter show dev ens8 ingress
      filter block 22 protocol ip pref 25 flower chain 0
      filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
      ...
      
      ---
      v10->v11:
      - patch 2:
       - fixed error path when register_pernet_subsys fails, as pointed out by Cong
      - patch 9:
       - rebased on top of the current net-next
      
      v9->v10:
      - patch 7:
       - fixed ifindex magic in the patch description
      - userspace patches:
       - added manpages and patch descriptions
      
      v8->v9:
      - patch "net: sched: add rt netlink message type for block get" was
        removed; userspace checks filter existence using qdisc dump
      
      v7->v8:
      - patch 7:
       - added comment to ifindex block magic
      - patch 9:
       - new patch
      - patch 10:
       - base this on the patch that introduces qdisc-generic block index
         attributes parsing/dumping
      - patch 13:
       - rebased on top of current net-next
      
      v6->v7:
      - patch 1:
       - unsquashed shared block patch that was previously squashed by mistake
       - fixed error path in block create - freeing chain 0
      - patch 2:
       - new patch - split from the previous one as it got accidentally
         squashed in the rebasing process in the past
       - converted to idr extended
       - removed auto-generation of block indexes. Callers have to explicitly
         state that the block is shared by passing a non-zero block index
       - fixed error path in block get ext - freeing chain 0
      - patch 7:
       - changed extack message for block index handle as suggested by DaveA
       - added extack message when block index does not exist
       - the block ifindex magic is now in a define and was changed to
         0xffffffff as suggested by Jamal
      - patch 8:
       - new patch implementing RTM_GETBLOCK in order to query whether a block
         with a given index exists
      - patch 9:
       - adjust to the core changes and check block index attributes for being 0
      
      v5->v6:
      - added patch 6 that introduces block handle
      
      v4->v5:
      - patch 5:
       - added tracking of bound devs that are unable to offload, and check
         it before calling the block cbs.
      
      v3->v4:
      - patch 1:
       - rebased on top of the current net-next
       - added some extack strings
      - patch 3:
       - rebased on top of the current net-next
      - patch 5:
       - propagate netdev_ops->ndo_setup_tc error up to tcf_block_offload_bind
         caller
      - patch 7:
       - rebased on top of the current net-next
      
      v2->v3:
      - removed original patch 1, removing tp->q cls_bpf dependency. Fixed by
        Jakub in the meantime.
      - patch 1:
       - rebased on top of the current net-next
      - patch 5:
       - new patch
      - patch 8:
       - removed "p_" prefix from block index function args
      - patch 10:
       - add tc offload feature handling
      ====================
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_acl: Pass mlxsw_sp_port down to ruleset bind/unbind ops · 4b23258d
      Jiri Pirko authored
      No need to convert from mlxsw_sp_port to net_device and back again.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_acl: Implement TC block sharing · 3aaff323
      Jiri Pirko authored
      Benefit from the prepared TC and in-driver ACL infrastructure and
      introduce block sharing offload. For that, a new struct "block" is
      introduced in spectrum_acl in order to hold a list of specific
      block-port bindings.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_acl: Don't store netdev and ingress for ruleset unbind · 02caf499
      Jiri Pirko authored
      Instead, pass netdev and ingress flag to ruleset unbind op.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_acl: Reshuffle code around mlxsw_sp_acl_ruleset_create/destroy · 9fe5fdf2
      Jiri Pirko authored
      In order to prepare for follow-up changes, make the bind/unbind helpers
      very simple. That required moving the ht insertion/removal and the
      bind/unbind calls into mlxsw_sp_acl_ruleset_create/destroy.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: allow ingress and clsact qdiscs to share filter blocks · 51ab2994
      Jiri Pirko authored
      Benefit from the previously introduced shared filter blocks
      infrastructure and allow ingress and clsact qdisc instances to share
      filter blocks. The block index comes from userspace as a qdisc option.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: introduce ingress/egress block index attributes for qdisc · d47a6b0e
      Jiri Pirko authored
      Introduce two new attributes to be used for qdisc creation and dumping:
      one for the ingress block, one for the egress block. Introduce a set of
      ops that qdiscs supporting block sharing would implement.
      
      Passing block indexes in qdisc change is not supported yet; this is
      checked and forbidden.
      
      In the future, these attributes are to be reused for specifying block
      indexes for classes as well. As of this moment, however, that is not
      supported, so a check is in place to forbid it.
      Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: use block index as a handle instead of qdisc when block is shared · 7960d1da
      Jiri Pirko authored
      As a tcm_ifindex with the value TCM_IFINDEX_MAGIC_BLOCK is an invalid
      ifindex, use it to indicate that we work with a block instead of a qdisc.
      So if tcm_ifindex is set to TCM_IFINDEX_MAGIC_BLOCK, tcm_parent is used
      to carry block_index.
      
      If the block is shared between at least 2 qdiscs, it is forbidden to
      use the qdisc handle to add/delete filters. In that case, userspace has
      to pass block_index.
      
      Also, when dumping the filters of a block shared between at least
      2 qdiscs, each filter is dumped with tcm_ifindex set to
      TCM_IFINDEX_MAGIC_BLOCK and tcm_parent set to block_index. That gives
      the user a clear indication that the filter belongs to a shared block
      and not only to the one qdisc under which it is dumped.
      Suggested-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
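The addressing rule above can be sketched in userspace C. This is a minimal model, not kernel code: `struct tc_addr`, `resolve_target()` and `dump_addr()` are hypothetical names, only the two tcmsg fields involved are modelled, and the sentinel value 0xffffffff is taken from the v6->v7 notes in the cover letter. The number of qdiscs sharing a block is assumed to be looked up by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copy of the block sentinel (0xffffffff per the cover letter);
 * in the kernel this is TCM_IFINDEX_MAGIC_BLOCK. */
#define MAGIC_BLOCK_IFINDEX 0xFFFFFFFFu

/* Hypothetical, simplified view of the two tcmsg fields involved. */
struct tc_addr {
	uint32_t ifindex;	/* device ifindex, or the block sentinel */
	uint32_t parent;	/* qdisc handle, or the block index */
};

enum target { TARGET_BLOCK, TARGET_QDISC, TARGET_REJECTED };

/* Decide what a filter add/delete request addresses.
 * qdisc_block_users: how many qdiscs share the addressed qdisc's block. */
static enum target resolve_target(const struct tc_addr *req,
				  unsigned int qdisc_block_users,
				  uint32_t *block_index)
{
	if (req->ifindex == MAGIC_BLOCK_IFINDEX) {
		*block_index = req->parent;	/* parent carries block_index */
		return TARGET_BLOCK;
	}
	if (qdisc_block_users >= 2)
		return TARGET_REJECTED;		/* shared: must use the index */
	return TARGET_QDISC;
}

/* Address reported for a dumped filter: shared blocks are shown by index. */
static struct tc_addr dump_addr(bool shared, uint32_t block_index,
				uint32_t dev_ifindex, uint32_t qdisc_parent)
{
	struct tc_addr a;

	if (shared) {
		a.ifindex = MAGIC_BLOCK_IFINDEX;
		a.parent = block_index;
	} else {
		a.ifindex = dev_ifindex;
		a.parent = qdisc_parent;
	}
	return a;
}
```

With block 22 shared by ens7 and ens8, a request carrying the sentinel and parent 22 resolves to the block, while the same filter addressed through either device is refused, matching the error shown in the cover letter.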
    • net: sched: keep track of offloaded filters and check tc offload feature · caa72601
      Jiri Pirko authored
      During block bind, we need to check the tc offload feature. If it is
      disabled but the block still contains offloaded filters, forbid the
      bind. Also forbid registering a callback for a block that already
      contains offloaded filters, as playback is not supported now.
      To keep track of offloaded filters, a new counter is introduced, along
      with a couple of helpers called from cls_* code. These helpers set and
      clear the TCA_CLS_FLAGS_IN_HW flag.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
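A minimal userspace sketch of the counting scheme described above, assuming a per-block counter and a per-filter flag word. `FILTER_IN_HW` is a hypothetical stand-in for TCA_CLS_FLAGS_IN_HW, the helper names are made up for illustration, and plain -1 is returned where the kernel would use an errno value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit standing in for TCA_CLS_FLAGS_IN_HW. */
#define FILTER_IN_HW (1u << 2)

struct block {
	unsigned int offloadcnt;	/* filters currently offloaded to hw */
	bool has_cb;			/* a driver callback is registered */
};

/* Called by a classifier once the driver accepted the filter.
 * The flag makes the helper idempotent per filter. */
static void block_offload_inc(struct block *b, unsigned int *flags)
{
	if (*flags & FILTER_IN_HW)
		return;
	*flags |= FILTER_IN_HW;
	b->offloadcnt++;
}

/* Called when the filter is removed from hw. */
static void block_offload_dec(struct block *b, unsigned int *flags)
{
	if (!(*flags & FILTER_IN_HW))
		return;
	*flags &= ~FILTER_IN_HW;
	b->offloadcnt--;
}

/* A device with tc offload disabled may not bind to a block that
 * already holds offloaded filters. */
static int block_bind(const struct block *b, bool dev_tc_offload)
{
	if (!dev_tc_offload && b->offloadcnt)
		return -1;
	return 0;
}

/* Existing offloaded filters would not be replayed to a new callback,
 * so registration is refused while any are present. */
static int block_cb_register(struct block *b)
{
	if (b->offloadcnt)
		return -1;
	b->has_cb = true;
	return 0;
}
```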
    • net: sched: remove classid and q fields from tcf_proto · edf6711c
      Jiri Pirko authored
      Both are no longer used, so remove them.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: introduce block mechanism to handle netif_keep_dst calls · f36fe1c4
      Jiri Pirko authored
      A couple of classifiers call netif_keep_dst directly on q->dev. That is
      not possible for a shared block, where multiple qdiscs own the block.
      So introduce infrastructure to keep track of the block owners in a list
      and use this list to implement a block variant of netif_keep_dst.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
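The owner-list idea can be sketched as follows. All types and functions here are hypothetical simplifications (a device is reduced to a `keep_dst` flag). The sketch also assumes the block remembers the request so that a qdisc binding later inherits it; treat that detail as an assumption rather than a description of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a net_device: only the keep-dst state. */
struct dev { bool keep_dst; };

/* One entry per qdisc that owns (is bound to) the block. */
struct owner {
	struct dev *dev;
	struct owner *next;
};

struct block {
	struct owner *owners;	/* singly linked list of owners */
	bool keep_dst;		/* assumption: remembered for late owners */
};

static void block_owner_add(struct block *b, struct owner *o)
{
	o->next = b->owners;
	b->owners = o;
	if (b->keep_dst)
		o->dev->keep_dst = true;	/* inherit an earlier request */
}

/* Block variant of netif_keep_dst(): apply to every owning device
 * instead of a single q->dev. */
static void block_netif_keep_dst(struct block *b)
{
	b->keep_dst = true;
	for (struct owner *o = b->owners; o; o = o->next)
		o->dev->keep_dst = true;
}
```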
    • net: sched: avoid usage of tp->q in tcf_classify · 9d3aaff3
      Jiri Pirko authored
      Use block index in the messages instead.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: introduce shared filter blocks infrastructure · 48617387
      Jiri Pirko authored
      Allow qdiscs to share filter blocks among them. Each qdisc type has to
      use the extended block get/put variants that enable sharing.
      Shared blocks are tracked within each net namespace and identified
      by a u32 index. This index is passed from the user during qdisc creation.
      If the user passes an index that is not used by any other qdisc, a new
      block is created. If the user passes an index that is already used, the
      existing block is re-used.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
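The get-or-create behaviour can be sketched with a reference-counted registry. This is an illustrative model only: the cover letter says the kernel tracks shared blocks with an IDR, while this sketch uses a plain linked list, and `struct netns_blocks`, `block_get()` and `block_put()` are hypothetical names. Index 0 means "not shared", so such blocks are never entered into the registry.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct block {
	uint32_t index;		/* 0 means "not shared" */
	unsigned int refcnt;	/* qdiscs currently using the block */
	struct block *next;
};

/* Per-namespace registry; a list stands in for the kernel's IDR. */
struct netns_blocks { struct block *head; };

static struct block *block_lookup(struct netns_blocks *ns, uint32_t index)
{
	for (struct block *b = ns->head; b; b = b->next)
		if (b->index == index)
			return b;
	return NULL;
}

/* Re-use an existing shared block, or create a new one. */
static struct block *block_get(struct netns_blocks *ns, uint32_t index)
{
	struct block *b = index ? block_lookup(ns, index) : NULL;

	if (b) {
		b->refcnt++;
		return b;
	}
	b = calloc(1, sizeof(*b));
	if (!b)
		return NULL;
	b->index = index;
	b->refcnt = 1;
	if (index) {		/* only shared blocks are registered */
		b->next = ns->head;
		ns->head = b;
	}
	return b;
}

/* Drop a reference; the last user unregisters and frees the block. */
static void block_put(struct netns_blocks *ns, struct block *b)
{
	if (--b->refcnt)
		return;
	if (b->index) {
		struct block **pp = &ns->head;

		while (*pp != b)
			pp = &(*pp)->next;
		*pp = b->next;
	}
	free(b);
}
```

In the cover letter's example, the ens7 and ens8 qdiscs would both call `block_get(ns, 22)` and end up with the same block, while the ens9 qdisc (no index) would get a private one.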
    • net: sched: introduce support for multiple filter chain pointers registration · a9b19443
      Jiri Pirko authored
      So far, it was only possible to register a single filter chain pointer
      to block->chain[0]. However, once blocks become shareable, we need to
      allow registration of multiple filter chain pointers.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
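A rough sketch of registering several chain-head pointers: each sharing qdisc keeps its own copy of the chain 0 head, and every change of the head is written to all registered pointers. The fixed-size array, `MAX_HEAD_PTRS`, and both function names are hypothetical, and the assumption that a new registrant is immediately given the current head is this sketch's, not necessarily the patch's.

```c
#include <assert.h>
#include <stddef.h>

struct filter { int prio; struct filter *next; };

#define MAX_HEAD_PTRS 8		/* arbitrary bound for the sketch */

struct chain0 {
	struct filter *head;				/* current first filter */
	struct filter **head_ptrs[MAX_HEAD_PTRS];	/* one per qdisc */
	size_t n_ptrs;
};

/* Register a qdisc's copy of the chain head pointer. */
static int chain_head_ptr_add(struct chain0 *c, struct filter **p)
{
	if (c->n_ptrs == MAX_HEAD_PTRS)
		return -1;
	c->head_ptrs[c->n_ptrs++] = p;
	*p = c->head;	/* assumption: start from the current head */
	return 0;
}

/* Propagate a new chain head to every registered pointer. */
static void chain_head_change(struct chain0 *c, struct filter *new_head)
{
	c->head = new_head;
	for (size_t i = 0; i < c->n_ptrs; i++)
		*c->head_ptrs[i] = new_head;
}
```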
    • Merge branch 'bnxt_en-next' · c9a82421
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: Updates for net-next.
      
      First, we upgrade the firmware interface spec.  Due to a change in
      the toolchains, the auto-generated bnxt_hsi.h does not match the
      old bnxt_hsi.h and the patch is really big.  This should be just
      one-time.  Going forward, changes should be incremental.
      
      The next 10 patches implement a new scheme for the PF and VF drivers
      to allocate and reserve resources.  The new scheme is more flexible
      and allows dynamic and asymmetric distribution of resources, whereas
      the old scheme uses a static, even distribution.
      
      The last few patches add cacheline size setting, a couple of PCI IDs,
      better management of VF MAC address, and a better parent switchdev ID
      for dual-port devices.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: export a common switchdev PARENT_ID for all reps of an adapter · dd4ea1da
      Sathya Perla authored
      Currently the driver exports different switchdev PARENT_IDs for
      representors belonging to different SR-IOV PF-pools of an adapter.
      This is not correct, as the adapter can switch across all of its vports.
      This patch fixes this by exporting a common switchdev PARENT_ID for all
      reps of an adapter. The PCIe DSN is used as the ID.
      Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Add cache line size setting to optimize performance. · c3480a60
      Michael Chan authored
      The chip supports 64-byte and 128-byte cache line sizes; DMA performance
      is better when the setting matches the CPU cache line size.  The default
      is 64.  If the system uses a 128-byte cache line size, set it to 128.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Forward VF MAC address to the PF. · 91cdda40
      Vasundhara Volam authored
      Forward the hwrm_func_vf_cfg command from the VF to the PF driver, to
      store the VF MAC address in the PF's context.  This will allow "ip link
      show" to display all VF MAC addresses.
      
      Maintain 2 MAC address slots in the VF info structure: one for the
      PF-assigned MAC and one for the VF-assigned MAC.
      
      Display the VF-assigned MAC in "ip link show" only if the PF-assigned
      MAC is not valid.
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
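The display rule (prefer the PF-assigned MAC, fall back to the VF-assigned one) can be sketched like this. `struct vf_info` and `vf_mac_to_report()` are hypothetical. The validity test follows the rule of the kernel's `is_valid_ether_addr()` (not multicast, not all-zero), re-implemented locally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Same rule as the kernel's is_valid_ether_addr(): not multicast
 * (bit 0 of the first octet clear) and not all-zero. */
static bool mac_is_valid(const uint8_t *a)
{
	static const uint8_t zero[ETH_ALEN];

	return !(a[0] & 0x01) && memcmp(a, zero, ETH_ALEN) != 0;
}

/* Hypothetical slice of the PF's per-VF bookkeeping: two MAC slots. */
struct vf_info {
	uint8_t pf_mac[ETH_ALEN];	/* assigned by the PF administrator */
	uint8_t vf_mac[ETH_ALEN];	/* set by the VF, forwarded to the PF */
};

/* MAC reported by "ip link show": the PF-assigned one unless invalid. */
static const uint8_t *vf_mac_to_report(const struct vf_info *vf)
{
	return mac_is_valid(vf->pf_mac) ? vf->pf_mac : vf->vf_mac;
}
```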
    • bnxt_en: Expand bnxt_check_rings() to check all resources. · 8f23d638
      Michael Chan authored
      bnxt_check_rings() is called by ethtool, XDP setup, and ndo_setup_tc()
      to see if there are enough resources to support the new configuration.
      Expand the call to test all resources if the firmware supports the new
      API.  With the more flexible resource allocation scheme, this call must
      be made to check that all resources are available before committing to
      allocate the resources.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
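The "check everything before committing" step reduces to comparing each requested resource type against what is available, and allocating nothing if any single type falls short. A minimal sketch, assuming the resource types listed in the ring-reservation commit of this series (TX, RX and completion rings, ring groups, stats contexts, vnics); the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical resource types, following the list in this series. */
enum resc { RESC_TX, RESC_RX, RESC_CP, RESC_GRP, RESC_STAT, RESC_VNIC, RESC_NUM };

/* Return true only if every resource type fits; nothing is allocated,
 * so the caller can reject a new ethtool/XDP/TC config up front. */
static bool check_all_resources(const unsigned int need[RESC_NUM],
				const unsigned int avail[RESC_NUM])
{
	for (size_t i = 0; i < RESC_NUM; i++)
		if (need[i] > avail[i])
			return false;
	return true;
}
```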
    • bnxt_en: Implement new method for the PF to assign SRIOV resources. · 4673d664
      Michael Chan authored
      Instead of the old method of evenly dividing the resources among the
      VFs, use the new firmware API to specify min and max resources for each
      VF.  This gives each VF more flexibility to allocate more or fewer
      resources.
      
      The min is the absolute minimum for each VF to function.  The max is the
      global resources minus the resources used by the PF.  Each VF is
      guaranteed the min.  Up to max resources may be available for some VFs.
      
      The PF driver can use one of 2 strategies specified in NVRAM to assign
      the resources: the old legacy strategy of evenly dividing the resources,
      or the new flexible strategy.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
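The min/max arithmetic described above can be illustrated for a single resource type. This is a hypothetical model, not the driver's code. In particular, expressing the legacy strategy as min = max = an equal share is an assumption made for illustration, and `vf_limits_for()` and its parameters are made-up names.

```c
#include <assert.h>

/* Per-VF limits for one resource type (e.g. rings). */
struct vf_limits { unsigned int min, max; };

enum strategy { STRAT_LEGACY_EVEN, STRAT_FLEXIBLE };

static struct vf_limits vf_limits_for(enum strategy s, unsigned int global,
				      unsigned int pf_used,
				      unsigned int num_vfs,
				      unsigned int abs_min)
{
	/* What is left for VFs: global resources minus the PF's share. */
	unsigned int avail = global > pf_used ? global - pf_used : 0;
	struct vf_limits l;

	if (s == STRAT_LEGACY_EVEN) {
		/* Assumed encoding of the old scheme: a fixed equal slice. */
		l.min = l.max = num_vfs ? avail / num_vfs : 0;
	} else {
		/* Flexible scheme: guarantee the minimum, allow up to all
		 * remaining resources. */
		l.min = abs_min;
		l.max = avail;
	}
	return l;
}
```

For example, with 100 units globally, 20 used by the PF and 4 VFs, the even split gives each VF exactly 20, while the flexible scheme guarantees each VF its minimum and lets any of them grow towards 80.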
    • bnxt_en: Reserve resources for RFS. · 6a1eef5b
      Michael Chan authored
      In bnxt_rfs_capable(), add a call to reserve vnic resources to support
      NTUPLE.  Return true if we can successfully reserve enough vnics.
      Otherwise, reserve the minimum of 1 VNIC for normal operation without
      NTUPLE and return false.
      
      Also, suppress the warning message about insufficient resources for
      NTUPLE when only 1 RX ring is in use.  NTUPLE filters by definition
      require multiple RX rings.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
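The decision flow above is sketched below. Everything here is hypothetical: the firmware is replaced by a toy vnic pool, and the count of one vnic per RX ring plus a default vnic is an assumption, since the commit only says "enough vnics".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical firmware hook: true if n vnics could be reserved. */
typedef bool (*reserve_vnics_fn)(unsigned int n, void *ctx);

/* Toy firmware: grants a reservation if it fits in a fixed pool. */
struct toy_fw { unsigned int vnic_pool, reserved; };

static bool toy_reserve(unsigned int n, void *ctx)
{
	struct toy_fw *fw = ctx;

	if (n > fw->vnic_pool)
		return false;
	fw->reserved = n;
	return true;
}

/* Assumed vnic count for NTUPLE: 1 default + 1 per RX ring. */
static bool rfs_capable(unsigned int rx_rings, reserve_vnics_fn reserve,
			void *ctx, bool *warn)
{
	*warn = false;
	if (rx_rings > 1 && reserve(1 + rx_rings, ctx))
		return true;
	reserve(1, ctx);	/* fall back to 1 vnic for normal operation */
	*warn = rx_rings > 1;	/* no warning with a single RX ring */
	return false;
}
```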
    • bnxt_en: Implement new method to reserve rings. · 674f50a5
      Michael Chan authored
      The new method will call firmware to reserve the desired tx, rx, cmpl
      rings, ring groups, stats context, and vnic resources.  A second query
      call will check the actual resources that firmware is able to reserve.
      The driver will then trim and adjust based on the actual resources
      provided by firmware.  The driver will then reserve the final resources
      in use.
      
      This method is a more flexible way of using hardware resources.  The
      resources are not fixed and can be adjusted by firmware.  The driver
      adapts to the available resources that the firmware can reserve for
      the driver.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
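The reserve → query → trim → reserve-again sequence could look roughly like the sketch below. The toy firmware caps each ring type at `cap`. The trimming rule (TX and RX no larger than the completion ring count) is a hypothetical consistency rule chosen only to show the "trim and adjust" step; the driver's real rules depend on its mode.

```c
#include <assert.h>

struct rings { unsigned int tx, rx, cp; };

/* Toy firmware that can grant at most 'cap' of each ring type. */
struct toy_fw { unsigned int cap; struct rings reserved; };

static unsigned int umin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Steps 1 + 2: request 'want', then read back what was reserved. */
static struct rings fw_reserve_and_query(struct toy_fw *fw, struct rings want)
{
	fw->reserved.tx = umin(want.tx, fw->cap);
	fw->reserved.rx = umin(want.rx, fw->cap);
	fw->reserved.cp = umin(want.cp, fw->cap);
	return fw->reserved;
}

/* Steps 3 + 4: trim to a consistent set, then reserve the final numbers. */
static struct rings reserve_rings(struct toy_fw *fw, struct rings want)
{
	struct rings got = fw_reserve_and_query(fw, want);

	/* Hypothetical rule: no more TX/RX rings than completion rings. */
	got.tx = umin(got.tx, got.cp);
	got.rx = umin(got.rx, got.cp);
	return fw_reserve_and_query(fw, got);
}
```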
    • bnxt_en: Set initial default RX and TX ring numbers the same in combined mode. · 58ea801a
      Michael Chan authored
      In combined mode, the driver currently does not set the RX and TX ring
      numbers to be the same when firmware can allocate more RX than TX rings
      or vice versa.  This will confuse the user, as the ethtool convention
      assumes they are the same in combined mode.  Fix it by adding
      bnxt_trim_dflt_sh_rings() to trim the RX and TX ring numbers to the
      completion ring number in combined mode.
      
      Note that if TCs and/or XDP are enabled, the number of TX rings will not
      be the same as the number of RX rings in combined mode.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
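A minimal sketch of the trimming idea for the default case (no TCs, no XDP). The name `trim_dflt_sh_rings()` is a hypothetical stand-in for bnxt_trim_dflt_sh_rings(), and taking the minimum of all three counts (so RX and TX are never raised above what was granted) is an assumption of this sketch.

```c
#include <assert.h>

struct ring_cfg { unsigned int rx, tx, cp; };

/* Combined (shared) mode: make RX, TX and completion counts equal,
 * using the smallest of the three (assumption). */
static void trim_dflt_sh_rings(struct ring_cfg *c)
{
	unsigned int n = c->cp;

	if (c->rx < n)
		n = c->rx;
	if (c->tx < n)
		n = c->tx;
	c->cp = c->rx = c->tx = n;
}
```

For instance, if firmware allowed 8 RX, 6 TX and 8 completion rings, the default would become 6 combined channels, which is what ethtool users expect to see in combined mode.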
    • bnxt_en: Add the new firmware API to query hardware resources. · be0dd9c4
      Michael Chan authored
      The new API HWRM_FUNC_RESOURCE_QCAPS provides min and max hardware
      resources.  Use the new API when it is supported by firmware.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Refactor hardware resource data structures. · 6a4f2947
      Michael Chan authored
      In preparation for new firmware APIs to allocate hardware resources,
      add a new struct bnxt_hw_resc to hold various min, max and reserved
      resources.  This new structure is common for PFs and VFs.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Restore MSIX after disabling SRIOV. · 80fcaf46
      Michael Chan authored
      After SRIOV has been enabled and disabled, the MSIX vectors assigned to
      the VFs have to be re-initialized.  Otherwise they cannot be re-used by
      the PF.  For example, increasing the number of PF rings after disabling
      SRIOV may fail if the PF uses MSIX vectors previously assigned to the VFs.
      
      To fix this, we add logic in bnxt_restore_pf_fw_resources() to close the
      NIC, clear and re-init MSIX, and re-open the NIC.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Refactor bnxt_close_nic(). · 86e953db
      Michael Chan authored
      Add a new __bnxt_close_nic() function to do all the work previously done
      in bnxt_close_nic() except waiting for SRIOV configuration.  The new
      function will be used in the next patch as part of SRIOV cleanup.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Update firmware interface to 1.9.0. · 894aa69a
      Michael Chan authored
      This version has new firmware APIs to allocate PF/VF resources more
      flexibly.
      
      New toolchains were used to generate this file, resulting in a one-time
      large diffstat.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'dwmac-meson8b-clock-fixes-for-Meson8b' · ee81098e
      David S. Miller authored
      Martin Blumenstingl says:
      
      ====================
      dwmac-meson8b: clock fixes for Meson8b
      
      This series has now been tested successfully, so we think it's ready to
      be applied to your net-next tree.
      
      Emiliano reported [0] that he couldn't get dwmac-meson8b to work on his
      Odroid-C1. This is the (hopefully) final version of this series, which
      was successfully tested.
      
      Because the public S805/S905/S912 datasheets all seem to be outdated
      regarding the description of the PRG_ETH0 (also called
      PRG_ETHERNET_ADDR0) register, Linus Lüssing offered to help with testing
      using an oscilloscope and an Odroid-C1. I would like to say a HUGE thanks
      to him at this point, as he spent hours figuring out the effects of the
      bits that are (thought to be) relevant to get Ethernet working on the
      Odroid-C1.
      We tested three scenarios, all based on version 3 of this series:
      1) MPLL2 at ~500MHz, m250_div set to 1, bit 10 enabled:
         this resulted in a clock rate twice as high as expected at the RGMII TX
         clock pin (250MHz instead of 125MHz for Gbit connections and 50MHz
         instead of 25MHz for 100Mbit/s connections). It did not change the
         rate at the XTAL_IN pin of the PHY (which stayed consistently at 25MHz)
      2) MPLL2 at ~250MHz, m250_div set to 1, bit 10 disabled:
         the oscilloscope shows "no clock" for the RGMII TX clock pin at its
         highest resolution (and random rates at lower resolutions). XTAL_IN is
         still at 25MHz
      3) MPLL2 at ~250MHz, m250_div set to 1, bit 10 enabled:
         this resulted in a 125MHz signal at the RGMII TX clock pin for Gbit
         speeds and 25MHz for 100Mbit/s - both values are as expected. The rate
         on the XTAL_IN pin was at 25MHz
      -> boot-logs (with the PRG_ETH0 register value) and screenshots from the
      readings of the oscilloscope can be found at:
      https://metameute.de/~tux/linux/amlogic/odroidc1/ethernet/
      
      Version 4 of this series is based on the results from Linus Lüssing's
      help with the oscilloscope and Odroid-C1.
      Unfortunately I don't have any Meson8b boards with RGMII PHY so I could
      only partially test this. @Emiliano: Could you please give this version
      a try and let me know about the results (preferably with a "Tested-by"
      if it works)?
      You obviously still need your two "ARM: dts: meson8b" patches which
      - add the "amlogic,meson8b-dwmac" compatible to meson8b.dtsi
      - enable Ethernet on the Odroid-C1 (according to your last test a TX
        delay of 4ns is required to make it work properly)
      
      When testing on Meson8b this also needs a fix for the MPLL clock driver:
      "clk: meson: mpll: use 64-bit maths in params_from_rate", see:
      https://patchwork.kernel.org/patch/10131677/
      
      I have tested this myself on a Khadas VIM (GXL SoC, internal RMII PHY)
      and a Khadas VIM2 (GXM SoC, external RGMII PHY). Both are still working
      fine (so let's hope that this also fixes your Meson8b issue :)).
      
      changes since v4 at [4]:
      - dropped "RFT" status since Jerome tested this series successfully!
      - dropped PATCH #2 ("simplify generating the clock names"). I will
        improve the whole clock registration in a separate series. since that
        patch didn't really improve anything I dropped it for now
      - added Jerome's Acked-/Reviewed-/Tested-by's - many thanks!
      
      changes since v3 at [3]:
      - renamed the function in PATCH #1 from meson8b_init_rgmii_clk to
        meson8b_init_rgmii_tx_clk since we now know what the register bits
        mean
      - rewrote PATCH #3 because bit 10 is a gate clock and it seems that
        there is an internal fixed divide-by-2 clock. see the patch
        description for a detailed explanation
      - updated the description of PATCH #4 and #5 as the clock we're trying
        to fix is the "RGMII TX" clock (old version stated that this is the
        "RGMII clock" or "PHY reference clock"). also updated the numbers in
        the description now that we have the clock hierarchy right (at least
        we hope so)
      
      changes since v2 at [2]:
      - added PATCH #2 to make the following patch easier
      - Emiliano reported that there's currently another bug in the
        dwmac-meson8b driver which prevents it from working with RGMII PHYs on
        Meson8b: bit 10 of the PRG_ETH0 register configures a clock gate
        (instead of a divide by 5 or divide by 10 clock divider). This has not
        been visible on GXBB and later due to the input clock which always led
        to a selection of "divide by 10" (which is done internally in the IP
        block, but the bit actually means "enable RGMII clock output").
        PATCH #3 was added to address this issue.
      - the commit messages of PATCH #4 and #5 (formerly PATCH #2 and #3) were
        updated and the patch itself rebased because the m25_div clock was
        removed with the new PATCH #3 (so some of the statements were not
        valid anymore)
      
      changes since v1 at [1]:
      - changed the subject of the cover-letter to indicate that this is all
        about the RGMII clock
      - added PATCH #1 which ensures that we don't unnecessarily change the
        parent clocks in RMII mode (and also makes the code easier to
        understand)
      - changed subject of PATCH #2 (formerly PATCH #1) to state that this
        is about the RGMII clock
      - added Jerome's Reviewed-by to PATCH #2 (formerly PATCH #1)
      - replaced PATCH #3 (formerly PATCH #2) with one that sets
        CLK_SET_RATE_PARENT on the mux and thus re-configures the MPLL2 clock
        on Meson8b correctly
      
      [0] http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005596.html
      [1] http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005848.html
      [2] http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005861.html
      [3] http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005899.html
      [4] http://lists.infradead.org/pipermail/linux-amlogic/2018-January/006125.html
      ====================
      Tested-by: Emiliano Ingrassia <ingrassia@epigenesys.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: dwmac-meson8b: propagate rate changes to the parent clock · fb7d38a7
      Martin Blumenstingl authored
      On Meson8b the only valid input clock is MPLL2. The bootloader
      configures that to run at 500002394Hz, which cannot be divided evenly
      down to 125MHz using the m250_div clock. Currently the common clock
      framework chooses an m250_div of 2; with the internal fixed
      "divide by 2" this results in an RGMII TX clock of 125001197Hz (1197Hz
      above the requested 125MHz).
      
      Letting the common clock framework propagate the rate changes up to the
      parent of m250_mux allows us to get the best possible clock rate. With
      this patch the common clock framework calculates a rate of
      very-close-to-250MHz (249999701Hz to be exact) for the MPLL2 clock
      (which is the mux input). Dividing that by 2 (which is an internal,
      fixed divider for the RGMII TX clock) gives us an RGMII TX clock of
      124999850Hz (which is only 150Hz off the requested 125MHz, compared to
      1197Hz based on the MPLL2 rate set by u-boot and the Amlogic GPL kernel
      sources).
      
      SoCs from the Meson GX series are not affected by this change because
      the input clock is FCLK_DIV2 whose rate cannot be changed (which is fine
      since it's running at 1GHz, so it's already a multiple of 250MHz and
      125MHz).
      
      Fixes: 566e8251 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
      Suggested-by: Jerome Brunet <jbrunet@baylibre.com>
      Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
      Tested-by: Jerome Brunet <jbrunet@baylibre.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Martin Blumenstingl's avatar
      net: stmmac: dwmac-meson8b: fix setting the RGMII TX clock on Meson8b · 433c6cab
      Martin Blumenstingl authored
      Meson8b only supports MPLL2 as clock input. The rate of the MPLL2 clock
      set by Odroid-C1's u-boot is close to (but not exactly) 500MHz. The
      exact rate is 500002394Hz, which is calculated in
      drivers/clk/meson/clk-mpll.c using the following formula:
      DIV_ROUND_UP_ULL((u64)parent_rate * SDM_DEN, (SDM_DEN * n2) + sdm)
      Odroid-C1's u-boot configures MPLL2 with the following values:
      - SDM_DEN = 16384
      - SDM = 1638
      - N2 = 5
      
      The 250MHz clock (m250_div) inside dwmac-meson8b driver is derived from
      the MPLL2 clock. Due to MPLL2 running slightly faster than 500MHz the
      common clock framework chooses a divider which is too big to generate
      the 250MHz clock (a divider of 2 would be needed, but this is rounded up
      to a divider of 3). This breaks the RTL8211F RGMII PHY on Odroid-C1
      because it requires a (close to) 125MHz RGMII TX clock at Gbit speeds
      (the IP block internally divides that down to 25MHz on 100Mbit/s
      connections and 2.5MHz on 10Mbit/s connections; we don't need any
      special configuration for that).
      
      Round the divider to the closest value to prevent this issue on Meson8b.
      This means we'll now end up with a clock rate for the RGMII TX clock of
      125001197Hz (= 125MHz plus 1197Hz), which is close enough to 125MHz.
      This has no effect on the Meson GX SoCs since there the input clock is
      fclk_div2, which has a rate of 1000MHz (and thus is cleanly divisible
      to 250MHz and 125MHz).
      
      Fixes: 566e8251 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
      Reported-by: Emiliano Ingrassia <ingrassia@epigenesys.com>
      Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
      Tested-by: Jerome Brunet <jbrunet@baylibre.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Martin Blumenstingl's avatar
      net: stmmac: dwmac-meson8b: fix internal RGMII clock configuration · 4f6a71b8
      Martin Blumenstingl authored
      Tests (using an oscilloscope and an Odroid-C1 board with an RTL8211F
      RGMII PHY) have shown that the PRG_ETH0 register behaves as follows:
      - bit 4 is a mux to choose between two parent clocks. According to the
        public S805 datasheet the only supported parent clock is MPLL2 (this
        was not verified using the oscilloscope).
        The public S805/S905 datasheet claims that this bit is reserved.
      - bits 9:7 control a one-based divider (register value 1 means "divide
        by 1", etc.) for the input clock. We call this clock the "m250_div"
        clock because its value is always supposed to be (close to) 250MHz
        (see below for an explanation).
        The description in the public S805/S905 datasheet is a bit cryptic,
        but it comes down to "input clock = 250MHz * value" (which could also
        be expressed as "250MHz = input clock / value").
      - there seems to be an internal fixed divide-by-2 clock which takes the
        output from the m250_div and divides it by 2. This is not unusual on
        Amlogic SoCs, since the SDIO (MMC) driver also uses an internal fixed
        divide-by-2 clock.
        This is not documented in the public S805/S905 datasheet.
      - bit 10 controls a gate clock which enables or disables the RGMII TX
        clock (which is an output on the MAC/SoC and an input in the PHY). We
        call this the "rgmii_tx_en" clock. If this bit is set to "0" the RGMII
        TX clock output is close to 0.
        The description for this bit in the public S805/S905 datasheet is
        "Generate 25MHz clock for PHY". Based on these tests it's believed
        that this is wrong, and should probably read "Generate the 125MHz
        RGMII TX clock for the PHY".
      - the RGMII TX clock has to be set to 125MHz - the IP block adjusts the
        output (automatically) depending on the line speed (RGMII specifies
        that Gbit connections use a 125MHz clock, 100Mbit/s connections use a
        25MHz clock and 10Mbit/s connections use a 2.5MHz clock. Only Gbit and
        100Mbit/s were tested with an oscilloscope). Due to the requirement
        that this clock always has to be set to 125MHz and due to the fixed
        divide-by-2 parent clock this means that m250_div will always end up
        with a rate of (close to) 250MHz.
      - bits 6:5 are the TX delay, which is also named "clock phase" in some
        of Amlogic's older GPL kernel sources.
      
      The PHY also has an XTAL_IN pin where a 25MHz clock has to be provided.
      Tests with the oscilloscope have shown that this is routed to a crystal
      right next to the RTL8211F PHY. The same seems to be true on the Khadas
      VIM2 board (which uses a GXM SoC), although there the 25MHz crystal is
      on the other side of the PCB.
      
      This updates the clocks in the dwmac-meson8b driver by replacing the
      "m25_div" with the "rgmii_tx_en" clock and additionally introducing a
      fixed divide-by-2 clock between "m250_div" and "rgmii_tx_en".
      Now we also need to set a frequency of 125MHz on the RGMII clock
      (as opposed to the 25MHz we set before, with the non-existent
      divide-by-5-or-10 divider).
      
      Special thanks go to Linus Lüssing for testing the various bits and
      checking the results with an oscilloscope on his Odroid-C1!
      
      Fixes: 566e8251 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
      Reported-by: Emiliano Ingrassia <ingrassia@epigenesys.com>
      Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Acked-by: Jerome Brunet <jbrunet@baylibre.com>
      Tested-by: Jerome Brunet <jbrunet@baylibre.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Martin Blumenstingl's avatar
      net: stmmac: dwmac-meson8b: only configure the clocks in RGMII mode · 37512b42
      Martin Blumenstingl authored
      None of m25_div_clk, m250_div_clk or m250_mux_clk are used in RMII
      mode. The m25_div_clk output is routed to the RGMII PHY's "RGMII
      clock".
      This means that we don't need to configure the clocks in RMII mode. The
      driver however did so anyway, with no effect since the clocks are not
      routed to the PHY in RMII mode.
      
      While here also rename meson8b_init_clk to meson8b_init_rgmii_tx_clk to
      make it easier to understand the code.
      
      Fixes: 566e8251 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
      Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Tested-by: Jerome Brunet <jbrunet@baylibre.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jakub Kicinski's avatar
      net: sched: red: don't reset the backlog on every stat dump · 416ef9b1
      Jakub Kicinski authored
      Commit 0dfb33a0 ("sch_red: report backlog information") copied the
      child's backlog into RED's backlog.  Back then RED did not maintain
      its own backlog counts.  This has changed after commit 2ccccf5f
      ("net_sched: update hierarchical backlog too") and commit d7f4f332
      ("sch_red: update backlog as well").  Copying is no longer necessary.
      
      Tested:
      
      $ tc -s qdisc show dev veth0
      qdisc red 1: root refcnt 2 limit 400000b min 30000b max 30000b ecn
       Sent 20942 bytes 221 pkt (dropped 0, overlimits 0 requeues 0)
       backlog 1260b 14p requeues 14
        marked 0 early 0 pdrop 0 other 0
      qdisc tbf 2: parent 1: rate 1Kbit burst 15000b lat 3585.0s
       Sent 20942 bytes 221 pkt (dropped 0, overlimits 138 requeues 0)
       backlog 1260b 14p requeues 14
      
      Recently RED offload was added.  We need to make sure drivers don't
      depend on resetting the stats.  This means backlog should be treated
      like any other statistic:
      
        total_stat += new_hw_stat - prev_hw_stat;
      
      Adjust mlxsw.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Nogah Frankel <nogahf@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Saeed Mahameed's avatar
      net/mlx5: Fix build break · 2d83619d
      Saeed Mahameed authored
      The latest merge between net and net-next introduced a compiler assert in
      mlx5 driver.  In hca_cap_bits older fields are kept along with newer
      fields that should have replaced them.
      
      Fixes: c02b3741 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · c02b3741
      David S. Miller authored
      Overlapping changes all over.
      
      The mini-qdisc bits were a little bit tricky, however.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 7018d1b3
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2018-01-17
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      The main changes are:
      
      1) Add initial BPF map offloading for nfp driver. Previously only
         programs were supported, without being able to access maps.
         Offloaded programs are right now only allowed to perform map
         lookups, and control path is responsible for populating the
         maps. BPF core infrastructure along with nfp implementation is
         provided, from Jakub.
      
      2) Various follow-ups to Josef's BPF error injections. More
         specifically that includes: properly check whether the error
         injectable event is on function entry or not, remove the percpu
         bpf_kprobe_override and rather compare instruction pointer
         with original one, separate error-injection from kprobes since
         it's not limited to it, add injectable error types in order to
         specify what is the expected type of failure, and last but not
         least also support the kernel's fault injection framework, all
         from Masami.
      
      3) Various misc improvements and cleanups to the libbpf Makefile.
         That is, fix permissions when installing BPF header files, remove
         unused variables and functions, and also install the libbpf.h
         header, from Jesper.
      
      4) When offloading to the nfp JIT, reject BPF insns that the JIT does
         not support right at verification time. Also fix libbpf with
         regards to ELF section name matching by properly treating the
         program type as prefix. Both from Quentin.
      
      5) Add -DPACKAGE to bpftool when including bfd.h for the disassembler.
         This is needed, for example, when building libbfd from source, as
         bpftool doesn't supply a config.h for bfd.h. Fix from Jiong.
      
      6) xdp_convert_ctx_access() is simplified since it doesn't need to
         set target size during verification, from Jesper.
      
      7) Let bpftool properly recognize BPF_PROG_TYPE_CGROUP_DEVICE
         program types, from Roman.
      
      8) Make various functions in BPF cpumap static, from Wei.
      
      9) Fix a double semicolon in BPF samples, from Luis.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 8cbab92d
      Linus Torvalds authored
      Pull rdma fixes from Doug Ledford:
       "We had a few more items creep up over the last week. Given we are in
        -rc8, these are obviously limited to bugs that have a big downside and
        for which we are certain of the fix.
      
        The first is a straight-up oops bug: all you have to do is read
        the code to see it's a guaranteed 100% oops.
      
        The second is a use-after-free issue. We get lucky if the queue
        we are shutting down is empty, but if it isn't, we can end up oopsing.
        We really need to drain the queue before destroying it.
      
        The final one is an issue with bad user input causing us to access our
        port array out of bounds. While fixing the array out of bounds issue,
        it was noticed that the original code did the same thing twice (the
        call to rdma_ah_set_port_num()), so its removal is not balanced by a
        re-add elsewhere: it was already where it needed to be, in addition to
        where it didn't need to be.
      
        Summary:
      
         - Oops fix in hfi1 driver
      
         - use-after-free issue in iser-target
      
         - use of user supplied array index without proper checking"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
        RDMA/mlx5: Fix out-of-bound access while querying AH
        IB/hfi1: Prevent a NULL dereference
        iser-target: Fix possible use-after-free in connection establishment error
    • Daniel Borkmann's avatar
      Merge branch 'bpf-libbpf-cleanups' · e8a9d968
      Daniel Borkmann authored
      Jesper Dangaard Brouer says:
      
      ====================
      This patchset contains some small improvements and cleanup for
      the Makefile in tools/lib/bpf/.
      
      It worries me that the libbpf.so shared library is not versioned,
      but this is not addressed in this patchset.
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>