1. 03 Mar, 2022 40 commits
    • act_ct: Support GRE offload · fcb6aa86
      Toshiaki Makita authored
      Support GREv0 without NAT.
      Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
      Acked-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: flowtable: Support GRE · 4e8d9584
      Toshiaki Makita authored
      Support GREv0 without NAT.
      Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: Reject tables of unsupported family · f1082dd3
      Phil Sutter authored
      An nftables family is merely a hollow container, its family just a
      number and as such not reliant on compile-time options other than nftables
      support itself. Add an artificial check so attempts at using a family
      the kernel can't support fail as early as possible. This helps user
      space detect kernels which lack e.g. NFPROTO_INET.
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • Revert "netfilter: conntrack: mark UDP zero checksum as CHECKSUM_UNNECESSARY" · bbfbf7a5
      Florian Westphal authored
      This reverts commit 5bed9f3f.
      
      Gal Pressman says:
       this patch broke geneve tunnels, or possibly all udp tunnels?
       A simple test that creates two geneve tunnels and runs tcp iperf fails
       and results in checksum errors (TcpInCsumErrors).
      
      Original commit wanted to fix nf_reject with zero checksum,
      so it appears better to change nf reject infra instead.
      
      Fixes: 5bed9f3f ("netfilter: conntrack: mark UDP zero checksum as CHECKSUM_UNNECESSARY")
      Reported-by: Gal Pressman <gal@nvidia.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • Merge branch 'nfc-llcp-cleanups' · ef132dc4
      David S. Miller authored
      Krzysztof Kozlowski says:
      
      ====================
      nfc: llcp: few cleanups/improvements
      
      These are improvements, not fixing any experienced issue, just looking correct
      to me from the code point of view.
      
      Changes since v1
      ================
      1. Split from the fix.
      
      Testing
      =======
      Under QEMU only. The NFC/LLCP code was not really tested on a device.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfc: llcp: Revert "NFC: Keep socket alive until the DISC PDU is actually sent" · 44cd5765
      Krzysztof Kozlowski authored
      This reverts commit 17f7ae16.
      
      The commit brought a new socket state LLCP_DISCONNECTING, which was
      never set, only read, so the socket could never enter that state.
      
      Remove the dead code.
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfc: llcp: protect nfc_llcp_sock_unlink() calls · a06b8044
      Krzysztof Kozlowski authored
      nfc_llcp_sock_link() is called in all paths (bind/connect) as a last
      action, still protected with lock_sock().  When cleaning up in
      llcp_sock_release(), call nfc_llcp_sock_unlink() in a mirrored way:
      earlier and still under the lock_sock().
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfc: llcp: use test_bit() · a7364912
      Krzysztof Kozlowski authored
      Use test_bit() instead of open-coding it, just like in other places
      touching the bitmap.
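
      As a rough userspace illustration of the difference (not the kernel
      implementation), the sketch below contrasts an open-coded single-word
      bit test with a helper that indexes into a multi-word bitmap.
      sketch_test_bit(), open_coded_test() and SKETCH_BITS_PER_LONG are
      hypothetical names introduced only for this example.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in; the kernel provides its own test_bit(). */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Helper form: picks the right word of the bitmap, then the right bit. */
static bool sketch_test_bit(unsigned int nr, const unsigned long *addr)
{
	return (addr[nr / SKETCH_BITS_PER_LONG] >>
		(nr % SKETCH_BITS_PER_LONG)) & 1UL;
}

/* Open-coded form: only correct while the bitmap fits in a single word. */
static bool open_coded_test(unsigned int nr, unsigned long bitmap)
{
	return (bitmap & (1UL << nr)) != 0;
}
```

      Both forms agree for bits inside the first word; only the helper keeps
      working if the bitmap later grows past one word.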
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfc: llcp: use centralized exiting of bind on errors · 4dbbf673
      Krzysztof Kozlowski authored
      Coding style encourages centralized exiting of functions, so rewrite
      llcp_sock_bind() error paths to use such a pattern.  This reduces the
      duplicated cleanup code, makes the success path visually shorter and
      also cleans up on errors in the proper order (the reverse of
      initialization).
      
      No functional impact expected.
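
      The pattern can be sketched in plain C as follows. This is not the
      llcp_sock_bind() code: struct sketch_sock, sketch_bind() and the
      fail_step parameter are hypothetical and exist only to show labels
      that undo initialization in reverse order.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical socket-like object with two resources to set up. */
struct sketch_sock {
	char *service_name;
	int *ssap;
	int bound;
};

/*
 * fail_step: 0 = succeed, 1 = fail after the first allocation,
 * 2 = fail after the second allocation (used to exercise error paths).
 */
static int sketch_bind(struct sketch_sock *s, int fail_step)
{
	int ret;

	s->service_name = malloc(16);
	if (!s->service_name) {
		ret = -ENOMEM;
		goto out;
	}
	if (fail_step == 1) {
		ret = -EINVAL;
		goto err_free_name;
	}

	s->ssap = malloc(sizeof(*s->ssap));
	if (!s->ssap) {
		ret = -ENOMEM;
		goto err_free_name;
	}
	if (fail_step == 2) {
		ret = -EADDRINUSE;
		goto err_free_ssap;
	}

	s->bound = 1;
	return 0;

	/* Cleanup runs in the reverse order of initialization. */
err_free_ssap:
	free(s->ssap);
	s->ssap = NULL;
err_free_name:
	free(s->service_name);
	s->service_name = NULL;
out:
	return ret;
}
```

      Each failure site only names the label matching what has been set up
      so far, so the cleanup code exists once instead of being repeated at
      every early return.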
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfc: llcp: simplify llcp_sock_connect() error paths · ec10fd15
      Krzysztof Kozlowski authored
      The llcp_sock_connect() error paths were using a mixed way of central
      exit (goto) and cleanup
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfc: llcp: nullify llcp_sock->dev on connect() error paths · 13a3585b
      Krzysztof Kozlowski authored
      Nullify the llcp_sock->dev on llcp_sock_connect() error paths,
      symmetrically to the code in llcp_sock_bind().  The non-NULL value of
      llcp_sock->dev is used in a few places to check whether the socket is
      still valid.
      
      There was no particular issue observed with missing NULL assignment in
      connect() error path, however a similar case - in the bind() error path
      - was triggerable.  That one was fixed in commit 4ac06a1e ("nfc:
      fix NULL ptr dereference in llcp_sock_getname() after failed connect"),
      so the change here seems logical as well.
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-hw-counters-for-soft-devices' · ca0a53dc
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      HW counters for soft devices
      
      Petr says:
      
      Offloading switch device drivers may be able to collect statistics of the
      traffic taking place in the HW datapath that pertains to a certain soft
      netdevice, such as a VLAN. In this patch set, add the necessary
      infrastructure to allow exposing these statistics to the offloaded
      netdevice in question, and add mlxsw offload.
      
      Across HW platforms, the counter itself very likely constitutes a limited
      resource, and the act of counting may have a performance impact. Therefore
      this patch set makes the HW statistics collection opt-in and togglable from
      userspace on a per-netdevice basis.
      
      Additionally, HW devices may have various limiting conditions under which
      they can realize the counter. Therefore it is also possible to query
      whether the requested counter is realized by any driver. In TC parlance,
      which is to a degree reused in this patch set, two values are recognized:
      "request" tracks whether the user enabled collecting HW statistics, and
      "used" tracks whether any HW statistics are actually collected.
      
      In the past, this author has expressed the opinion that `a typical user
      doing "ip -s l sh", including various scripts, wants to see the full
      picture and not worry what's going on where'. While that would be nice,
      unfortunately it cannot work:
      
      - Packets that trap from the HW datapath to the SW datapath would be
        double counted.
      
        For a given netdevice, some traffic can be purely a SW artifact, and some
        may flow through the HW object corresponding to the netdevice. But some
        traffic can also get trapped to the SW datapath after bumping the HW
        counter. It is not clear how to make sure double-counting does not occur
        in the SW datapath in that case, while still making sure that possibly
        divergent SW forwarding path gets bumped as appropriate.
      
        So simply adding HW and SW stats may work roughly, most of the time, but
        there are scenarios where the result is nonsensical.
      
      - HW devices will have limitations as to what type of traffic they can
        count.
      
        In case of mlxsw, which is part of this patch set, there is no reasonable
        way to count all traffic going through a certain netdevice, such as a
        VLAN netdevice enslaved to a bridge. It is however very simple to count
        traffic flowing through an L3 object, such as a VLAN netdevice with an IP
        address.
      
        Similarly for physical netdevices, the L3 object at which the counter is
        installed is the subport carrying untagged traffic.
      
        These are not "just counters". It is important that the user understands
        what is being counted. It would be incorrect to conflate these statistics
        with another existing statistics suite.
      
      To that end, this patch set introduces a statistics suite called "L3
      stats". This label should make it easy to understand what is being counted,
      and to decide whether a given device can or cannot implement this suite for
      some type of netdevice. At the same time, the code is written to make
      future extensions easy, should a device pop up that can implement a
      different flavor of statistics suite (say L2, or an address-family-specific
      suite).
      
      For example, using a work-in-progress iproute2[1], to turn on and then list
      the counters on a VLAN netdevice:
      
          # ip stats set dev swp1.200 l3_stats on
          # ip stats show dev swp1.200 group offload subgroup l3_stats
          56: swp1.200: group offload subgroup l3_stats on used on
      	RX:  bytes packets errors dropped  missed   mcast
      		0       0      0       0       0       0
      	TX:  bytes packets errors dropped carrier collsns
      		0       0      0       0       0       0
      
      The patchset progresses as follows:
      
      - Patch #1 is a cleanup.
      
      - In patch #2, remove the assumption that all LINK_OFFLOAD_XSTATS are
        dev-backed.
      
        The only attribute defined under the nest is currently
        IFLA_OFFLOAD_XSTATS_CPU_HIT. L3_STATS differs from CPU_HIT in that the
        driver that supplies the statistics is not the same as the driver that
        implements the netdevice. Make the code compatible with this in patch #2.
      
      - In patch #3, add the possibility to filter inside nests.
      
        The filter_mask field of RTM_GETSTATS header determines which
        top-level attributes should be included in the netlink response. This
        saves processing time by only including the bits that the user cares
        about instead of always dumping everything. This is doubly important
        for HW-backed statistics that would typically require a trip to the
        device to fetch the stats. In this patch, the UAPI is extended to
        allow filtering inside IFLA_STATS_LINK_OFFLOAD_XSTATS in particular,
        but the scheme is easily extensible to other nests as well.
      
      - In patch #4, propagate extack where we need it.
        In patch #5, make it possible to propagate errors from drivers to the
        user.
      
      - In patch #6, add the in-kernel APIs for keeping track of the new stats
        suite, and the notifiers that the core uses to communicate with the
        drivers.
      
      - In patch #7, add UAPI for obtaining the new stats suite.
      
      - In patch #8, add a new UAPI message, RTM_SETSTATS, which will carry
        the message to toggle the newly-added stats suite.
        In patch #9, add the toggle itself.
      
      At this point the core is ready for drivers to add support for the new
      stats suite.
      
      - In patches #10, #11 and #12, apply small tweaks to mlxsw code.
      
      - In patch #13, add support for L3 stats, which are realized as RIF
        counters.
      
      - Finally in patch #14, a selftest is added to the net/forwarding
        directory. Technically this is a HW-specific test, in that without a HW
        implementing the counters, it just will not pass. But devices that
        support L3 statistics at all are likely to be able to reuse this
        selftest, so it seems appropriate to put it in the general forwarding
        directory.
      
      We also have a netdevsim implementation, and a corresponding selftest that
      verifies specifically some of the core code. We intend to contribute these
      later. Interested parties can take a look at the raw code at [2].
      
      [1] https://github.com/pmachata/iproute2/commits/soft_counters
      [2] https://github.com/pmachata/linux_mlxsw/commits/petrm_soft_counters_2
      
      v2:
      - Patch #3:
          - Do not declare strict_start_type at the new policies, since they are
            used with nla_parse_nested() (sans _deprecated).
          - Use NLA_POLICY_NESTED to declare what the nest contents should be
          - Use NLA_POLICY_MASK instead of BITFIELD32 for the filtering
            attribute.
      - Patch #6:
          - s/monotonous/monotonic/ in commit message
          - Use a newly-added struct rtnl_hw_stats64 for stats transfer
      - Patch #7:
          - Use a newly-added struct rtnl_hw_stats64 for stats transfer
      - Patch #8:
          - Do not declare strict_start_type at the new policies, since they are
            used with nla_parse_nested() (sans _deprecated).
      - Patch #13:
          - Use a newly-added struct rtnl_hw_stats64 for stats transfer
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: forwarding: hw_stats_l3: Add a new test · ba95e793
      Petr Machata authored
      Add a test that verifies operation of L3 HW statistics.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: Add support for IFLA_OFFLOAD_XSTATS_L3_STATS · 8d0f7d3a
      Petr Machata authored
      Spectrum machines support L3 stats by binding a counter to a RIF, a
      hardware object representing a router interface. Recognize the netdevice
      notifier events, NETDEV_OFFLOAD_XSTATS_*, to support enablement,
      disablement, and reporting back to core.
      
      As a netdevice gains a RIF, if L3 stats are enabled, install the counters,
      and ping the core so that a userspace notification can be emitted.
      
      Similarly, as a netdevice loses a RIF, push the as-yet-unreported
      statistics to the core, so that they are not lost, and ping the core to
      emit userspace notification.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: Extract classification of router-related events to a helper · c1de13f9
      Petr Machata authored
      Several more events are coming in the following patches, and extending the
      if statement is getting awkward. Instead, convert it to a switch.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Drop mlxsw_sp arg from counter alloc/free functions · 9834e246
      Petr Machata authored
      The mlxsw_sp reference is carried by the mlxsw_sp_rif object that is passed
      to these functions as well. Just deduce the former from the latter,
      and drop the explicit mlxsw_sp parameter. Adapt callers.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: reg: Fix packing of router interface counters · 8fe96f58
      Petr Machata authored
      The function mlxsw_reg_ritr_counter_pack() formats a register to configure
      a router interface (RIF) counter. The parameter `egress' determines whether
      an ingress or egress counter is to be configured. RITR, the register in
      question, has two sets of counter-related fields: one for ingress, one for
      egress. When setting values of the fields, the function sets the proper
      counter index field, but when setting the counter type, it always sets the
      egress field. Thus configuration of ingress counters is broken, and in fact
      an attempt to configure an ingress counter mangles a previously configured
      egress counter.
      
      This was never discovered, because there is currently no way to enable
      ingress counters on a router interface, only the egress one.
      
      Fix in an obvious way.
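
      The failure mode described above can be illustrated with a simplified
      model. The struct layout and the sketch_pack_*() helpers below are
      hypothetical; they do not reproduce the real RITR register format or
      the mlxsw_reg_ritr_counter_pack() signature, only the shape of the bug
      and of the fix.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the counter fields of RITR. */
struct sketch_ritr {
	uint32_t ingress_counter_index;
	uint8_t ingress_counter_type;
	uint32_t egress_counter_index;
	uint8_t egress_counter_type;
};

/* Pre-fix shape: the index follows the direction, the type does not. */
static void sketch_pack_buggy(struct sketch_ritr *r, uint32_t index,
			      uint8_t type, bool egress)
{
	r->egress_counter_type = type;	/* bug: written even for ingress */
	if (egress)
		r->egress_counter_index = index;
	else
		r->ingress_counter_index = index;
}

/* Post-fix shape: both fields follow the requested direction. */
static void sketch_pack_fixed(struct sketch_ritr *r, uint32_t index,
			      uint8_t type, bool egress)
{
	if (egress) {
		r->egress_counter_type = type;
		r->egress_counter_index = index;
	} else {
		r->ingress_counter_type = type;
		r->ingress_counter_index = index;
	}
}
```

      With the buggy variant, configuring an ingress counter after an egress
      one overwrites the egress type and leaves the ingress type unset,
      which matches the mangling the commit message describes.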
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rtnetlink: Add UAPI toggle for IFLA_OFFLOAD_XSTATS_L3_STATS · 5fd0b838
      Petr Machata authored
      The offloaded HW stats are designed to allow per-netdevice enablement and
      disablement. Add an attribute, IFLA_STATS_SET_OFFLOAD_XSTATS_L3_STATS,
      which should be carried by the RTM_SETSTATS message, and expresses a desire
      to toggle L3 offload xstats on or off.
      
      As part of the above, add an exported function rtnl_offload_xstats_notify()
      that drivers can use when they have installed or deinstalled the counters
      backing the HW stats.
      
      At this point, it is possible to enable, disable and query L3 offload
      xstats on netdevices. (However there is no driver actually implementing
      these.)
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rtnetlink: Add RTM_SETSTATS · 03ba3566
      Petr Machata authored
      The offloaded HW stats are designed to allow per-netdevice enablement and
      disablement. These stats are only accessible through RTM_GETSTATS, and
      therefore should be toggled by a RTM_SETSTATS message. Add it, and the
      necessary skeleton handler.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rtnetlink: Add UAPI for obtaining L3 offload xstats · 0e7788fd
      Petr Machata authored
      Add a new IFLA_STATS_LINK_OFFLOAD_XSTATS child attribute,
      IFLA_OFFLOAD_XSTATS_L3_STATS, to carry statistics for traffic that takes
      place in a HW router.
      
      The offloaded HW stats are designed to allow per-netdevice enablement and
      disablement. Additionally, as a netdevice is configured, it may become or
      cease being suitable for binding of a HW counter. Both of these aspects
      need to be communicated to the userspace. To that end, add another child
      attribute, IFLA_OFFLOAD_XSTATS_HW_S_INFO:
      
          - attr nest IFLA_OFFLOAD_XSTATS_HW_S_INFO
              - attr nest IFLA_OFFLOAD_XSTATS_L3_STATS
                  - attr IFLA_OFFLOAD_XSTATS_HW_S_INFO_REQUEST
                      - {0,1} as u8
                  - attr IFLA_OFFLOAD_XSTATS_HW_S_INFO_USED
                      - {0,1} as u8
      
      Thus this one attribute is a nest that can be used to carry information
      about various types of HW statistics, and indexing is very simply done by
      wrapping the information for a given statistics suite into the attribute
      that carries the suite in the RTM_GETSTATS query. At the same time, because
      _HW_S_INFO is nested directly below IFLA_STATS_LINK_OFFLOAD_XSTATS, it is
      possible through filtering to request only the metadata about individual
      statistics suites, without having to hit the HW to get the actual counters.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dev: Add hardware stats support · 9309f97a
      Petr Machata authored
      Offloading switch device drivers may be able to collect statistics of the
      traffic taking place in the HW datapath that pertains to a certain soft
      netdevice, such as VLAN. Add the necessary infrastructure to allow exposing
      these statistics to the offloaded netdevice in question. The API was shaped
      by the following considerations:
      
      - Collection of HW statistics is not free: there may be a finite number of
        counters, and the act of counting may have a performance impact. It is
        therefore necessary to allow toggling whether HW counting should be done
        for any particular SW netdevice.
      
      - As the drivers are loaded and removed, a particular device may get
        offloaded and unoffloaded again. At the same time, the statistics values
        need to stay monotonic (modulo the eventual 64-bit wraparound),
        increasing only to reflect traffic measured in the device.
      
        To that end, the netdevice keeps around a lazily-allocated copy of struct
        rtnl_link_stats64. Device drivers then contribute to the values kept
        therein at various points. Even as the driver goes away, the struct stays
        around to maintain the statistics values.
      
      - Different HW devices may be able to count different things. The
        motivation behind this patch in particular is exposure of HW counters on
        Nvidia Spectrum switches, where the only practical approach to counting
        traffic on offloaded soft netdevices currently is to use router interface
        counters, and count L3 traffic. Correspondingly that is the statistics
        suite added in this patch.
      
        Other devices may be able to measure different kinds of traffic, and for
        that reason, the APIs are built to allow uniform access to different
        statistics suites.
      
      - Because soft netdevices and offloading drivers are only loosely bound, a
        netdevice uses a notifier chain to communicate with the drivers. Several
        new notifiers, NETDEV_OFFLOAD_XSTATS_*, have been added to carry messages
        to the offloading drivers.
      
      - Devices can have various conditions for when a particular counter is
        available. As the device is configured and reconfigured, the device
        offload may become or cease being suitable for counter binding. A
        netdevice can use a notifier type NETDEV_OFFLOAD_XSTATS_REPORT_USED to
        ping offloading drivers and determine whether anyone currently implements
        a given statistics suite. This information can then be propagated to user
        space.
      
        When the driver decides to unoffload a netdevice, it can use a
        newly-added function, netdev_offload_xstats_report_delta(), to record
        outstanding collected statistics, before destroying the HW counter.
      
      This patch adds a helper, call_netdevice_notifiers_info_robust(), for
      dispatching a notifier with the possibility of unwind when one of the
      consumers bails. Given the wish to eventually get rid of the global
      notifier block altogether, this helper only invokes the per-netns notifier
      block.
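
      The monotonicity requirement from the list above can be sketched as
      follows: the netdevice keeps a saved copy of the statistics, a driver
      folds the outstanding HW counter values into it before destroying the
      counter, and a read returns the saved values plus whatever a live
      counter currently holds. The struct and both functions are
      hypothetical simplifications, not the kernel's
      netdev_offload_xstats_report_delta() API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-field stand-in for the real statistics structure. */
struct sketch_l3_stats {
	uint64_t rx_packets;
	uint64_t tx_packets;
};

/* Fold not-yet-reported HW values into the copy kept by the netdevice. */
static void sketch_report_delta(struct sketch_l3_stats *saved,
				const struct sketch_l3_stats *delta)
{
	saved->rx_packets += delta->rx_packets;
	saved->tx_packets += delta->tx_packets;
}

/* Value shown to the user: saved copy plus the live HW counter, if any. */
static struct sketch_l3_stats
sketch_get(const struct sketch_l3_stats *saved,
	   const struct sketch_l3_stats *live)
{
	struct sketch_l3_stats out = *saved;

	if (live)
		sketch_report_delta(&out, live);
	return out;
}
```

      Because the saved copy outlives any individual HW counter, the values
      a reader observes do not drop back to zero when a device is
      unoffloaded and later offloaded again.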
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rtnetlink: rtnl_fill_statsinfo(): Permit non-EMSGSIZE error returns · 216e6906
      Petr Machata authored
      Obtaining stats for the IFLA_STATS_LINK_OFFLOAD_XSTATS nest involves a HW
      access, and can fail for more reasons than just netlink message size
      exhaustion. Therefore do not always return -EMSGSIZE on the failure path,
      but respect the error code provided by the callee. Set the error explicitly
      where it is reasonable to assume -EMSGSIZE as the failure reason.
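
      A minimal sketch of the change in error handling, assuming a
      hypothetical callee that can fail either because the HW access failed
      or because the message ran out of room. None of the function names
      below exist in the kernel; they only model the before and after
      behaviour.

```c
#include <errno.h>

/*
 * Hypothetical callee: hw_err models a failed HW access (0 = no failure),
 * room models the space left in the netlink message.
 */
static int sketch_fill_offload_xstats(int hw_err, int room)
{
	if (hw_err)
		return hw_err;
	if (room < 8)
		return -EMSGSIZE;
	return 0;
}

/* Before: every failure is reported as message size exhaustion. */
static int sketch_fill_old(int hw_err, int room)
{
	if (sketch_fill_offload_xstats(hw_err, room))
		return -EMSGSIZE;	/* the real cause is lost here */
	return 0;
}

/* After: the error code provided by the callee is respected. */
static int sketch_fill_new(int hw_err, int room)
{
	int err = sketch_fill_offload_xstats(hw_err, room);

	if (err)
		return err;
	return 0;
}
```

      The distinction matters to a caller that reacts to -EMSGSIZE by
      retrying with a larger buffer, since that retry cannot help when the
      underlying failure was in the device.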
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rtnetlink: Propagate extack to rtnl_offload_xstats_fill() · 05415bcc
      Petr Machata authored
      Later patches add handlers for more HW-backed statistics. An extack will be
      useful when communicating HW / driver errors to the client. Add the
      arguments as appropriate.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rtnetlink: RTM_GETSTATS: Allow filtering inside nests · 46efc97b
      Petr Machata authored
      The filter_mask field of RTM_GETSTATS header determines which top-level
      attributes should be included in the netlink response. This saves
      processing time by only including the bits that the user cares about
      instead of always dumping everything. This is doubly important for
      HW-backed statistics that would typically require a trip to the device to
      fetch the stats.
      
      So far there was only one HW-backed stat suite per attribute. However,
      IFLA_STATS_LINK_OFFLOAD_XSTATS is a nest, and will gain a new stat suite in
      the following patches. It would therefore be advantageous to be able to
      filter within that nest, and select just one or the other HW-backed
      statistics suite.
      
      Extend rtnetlink so that RTM_GETSTATS permits attributes in the payload.
      The scheme is as follows:
      
          - RTM_GETSTATS
              - struct if_stats_msg
              - attr nest IFLA_STATS_GET_FILTERS
                  - attr IFLA_STATS_LINK_OFFLOAD_XSTATS
                      - u32 filter_mask
      
      This scheme reuses the existing enumerators by nesting them in a dedicated
      context attribute. This is covered by policies as usual, therefore a
      gradual opt-in is possible. Currently only IFLA_STATS_LINK_OFFLOAD_XSTATS
      nest has filtering enabled, because for the SW counters the issue does not
      seem to be that important.
      
      rtnl_offload_xstats_get_size() and _fill() are extended to observe the
      requested filters.
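
      The effect of a per-nest filter mask can be sketched as below. The
      enumerators, the bit macro and the result struct are hypothetical
      stand-ins chosen for the example; the sketch also assumes that only
      the L3 statistics child needs a trip to the device.

```c
#include <stdint.h>

/* Hypothetical child attributes of the offload xstats nest. */
enum {
	SKETCH_XSTATS_CPU_HIT = 1,
	SKETCH_XSTATS_HW_S_INFO,
	SKETCH_XSTATS_L3_STATS,
};

/* One filter bit per child attribute. */
#define SKETCH_FILTER_BIT(attr) (1u << ((attr) - 1))

struct sketch_fill_result {
	int filled_cpu_hit;
	int filled_hw_s_info;
	int filled_l3_stats;
	int hw_reads;		/* number of simulated trips to the device */
};

/* Fill only the children whose bit is set in the nest's filter mask. */
static struct sketch_fill_result sketch_offload_xstats_fill(uint32_t mask)
{
	struct sketch_fill_result r = { 0 };

	if (mask & SKETCH_FILTER_BIT(SKETCH_XSTATS_CPU_HIT))
		r.filled_cpu_hit = 1;
	if (mask & SKETCH_FILTER_BIT(SKETCH_XSTATS_HW_S_INFO))
		r.filled_hw_s_info = 1;	/* metadata only in this model */
	if (mask & SKETCH_FILTER_BIT(SKETCH_XSTATS_L3_STATS)) {
		r.filled_l3_stats = 1;
		r.hw_reads++;		/* assumed to require HW access */
	}
	return r;
}
```

      A request that sets only the metadata bit is answered without touching
      the device in this model, which is the saving the filtering scheme is
      after.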
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rtnetlink: Stop assuming that IFLA_OFFLOAD_XSTATS_* are dev-backed · f6e0fb81
      Petr Machata authored
      The IFLA_STATS_LINK_OFFLOAD_XSTATS attribute is a nest whose child
      attributes carry various special hardware statistics. The code that handles
      this nest was written with the idea that all these statistics would be
      exposed by the device driver of a physical netdevice.
      
      In the following patches, a new attribute is added to the abovementioned
      nest, which however can be defined for some soft netdevices. The NDO-based
      approach to querying these does not work, because it is not the soft
      netdevice driver that exposes these statistics, but an offloading NIC
      driver that does so.
      
      The current code does not scale well to this usage. Simply rewrite it back
      to the pattern seen in other fill-like and get_size-like functions
      elsewhere.
      
      Extract to helpers the code that is concerned with handling specifically
      NDO-backed statistics so that it can be easily reused should more such
      statistics be added.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rtnetlink: Namespace functions related to IFLA_OFFLOAD_XSTATS_* · 6b524a1d
      Petr Machata authored
      The currently used names rtnl_get_offload_stats() and
      rtnl_get_offload_stats_size() do not clearly show the namespace. The former
      function additionally seems to have been named this way in accordance with
      the NDO name, as opposed to the naming used in the rtnetlink.c file (and
      indeed elsewhere in the netlink handling code). As more and
      differently-flavored attributes are introduced, a common clear prefix is
      needed for all related functions.
      
      Rename the functions to follow the rtnl_offload_xstats_* naming scheme.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: validate and restrict untrusted VFs vlan promisc mode · cbcc44db
      Manish Chopra authored
      Today, when VFs are put in promiscuous mode, they can request the PF
      to configure the device so that they receive traffic from all VLANs,
      regardless of which VLAN the PF has configured for them (via ip link).
      The PF allows this request whether or not the VF is trusted.

      From a security point of view, when a VLAN is configured for a VF
      through the PF (via ip link), honour such requests only from VFs that
      are configured as trusted; otherwise, restrict the VF's VLAN promisc
      mode configuration.
      
      Cc: stable@vger.kernel.org
      Fixes: f990c82c ("qed*: Add support for ndo_set_vf_trust")
      Signed-off-by: Manish Chopra <manishc@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: display VF trust config · 4e6e6bec
      Manish Chopra authored
      The driver supports SR-IOV VF trust configuration, but
      it does not display it when queried via the ip link utility.
      
      Cc: stable@vger.kernel.org
      Fixes: f990c82c ("qed*: Add support for ndo_set_vf_trust")
      Signed-off-by: Manish Chopra <manishc@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'stmmac-SA8155p-ADP' · d52b4536
      David S. Miller authored
      Bhupesh Sharma says:
      
      ====================
      net: stmmac: Enable support for Qualcomm SA8155p-ADP board
      
      Changes since v1:
      -----------------
      - v1 can be seen here: https://lore.kernel.org/netdev/20220126221725.710167-1-bhupesh.sharma@linaro.org/t/
      - Fixed review comments from Bjorn - broke the v1 series into two
        separate series - one each for 'net' tree and 'arm clock/dts' tree
        - so as to ease review of the same from the respective maintainers.
      - This series is intended for the 'net' tree.
      
      The SA8155p-ADP board supports on-board ethernet (Gigabit Interface),
      with support for both RGMII and RMII buses.
      
      This patchset adds the support for the same.
      
      Note that this patchset is based on an earlier sent patchset
      for adding PDC controller support on SM8150 (see [1]).
      
      [1]. https://lore.kernel.org/linux-arm-msm/20220226184028.111566-1-bhupesh.sharma@linaro.org/T/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d52b4536
    • net: stmmac: dwmac-qcom-ethqos: Adjust rgmii loopback_en per platform · a7bf6d7c
      Bjorn Andersson authored
      Not all platforms should have RGMII_CONFIG_LOOPBACK_EN set; on those
      that should not, the result is about 50% packet loss on incoming
      messages. So make it possible to configure this per compatible and
      enable it for QCS404.
      
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a7bf6d7c
    • net: stmmac: Add support for SM8150 · d90b3120
      Vinod Koul authored
      This adds compatible, POR config & driver data for ethernet controller
      found in SM8150 SoC.
      
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      [bhsharma: Massage the commit log and other cosmetic changes]
      Signed-off-by: Bhupesh Sharma <bhupesh.sharma@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d90b3120
    • Merge branch 'page_pool-stats' · a8ff736d
      David S. Miller authored
      Joe Damato says:
      
      ====================
      page_pool: Add stats counters
      
      Greetings:
      
      Welcome to v9.
      
      This revision adds a commit which updates the page_pool documentation to
      describe the stats API, structures, and fields.
      
      Additionally, this revision contains a minor cosmetic change suggested by
      Saeed in page_pool_recycle_in_ring in commit 2: "page_pool: Add recycle
      stats", which removes an unnecessary #ifdef.
      
      There are no functional changes in this revision.
      
      Benchmark output from the v7 cover [1] is pasted below, as it is still
      relevant since no functional changes have been made in this revision:
      
      Benchmarks have been re-run. As always, results between runs are highly
      variable; you'll find results showing that stats disabled are both faster
      and slower than stats enabled in back to back benchmark runs.
      
      Raw benchmark output with stats off [2] and stats on [3] are available for
      examination.
      
      Test system:
      	- 2x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
      	- 2 NUMA zones, with 18 cores per zone and 2 threads per core
      
      bench_page_pool_simple results, loops=200000000
      test name			stats enabled		stats disabled
      				cycles	nanosec		cycles	nanosec
      
      for_loop			0	0.335		0	0.336
      atomic_inc 			14	6.106		13	6.022
      lock				30	13.365		32	13.968
      
      no-softirq-page_pool01		75	32.884		74	32.308
      no-softirq-page_pool02		79	34.696		74	32.302
      no-softirq-page_pool03		110	48.005		105	46.073
      
      tasklet_page_pool01_fast_path	14	6.156		14	6.211
      tasklet_page_pool02_ptr_ring	41	18.028		39	17.391
      tasklet_page_pool03_slow	107	46.646		105	46.123
      
      bench_page_pool_cross_cpu results, loops=20000000 returning_cpus=4:
      test name			stats enabled		stats disabled
      				cycles	nanosec		cycles	nanosec
      
      page_pool_cross_cpu CPU(0)	3973	1731.596	4015	1750.015
      page_pool_cross_cpu CPU(1)	3976	1733.217	4022	1752.864
      page_pool_cross_cpu CPU(2)	3973	1731.615	4016	1750.433
      page_pool_cross_cpu CPU(3)	3976	1733.218	4021	1752.806
      page_pool_cross_cpu CPU(4)	994	433.305		1005	438.217
      
      page_pool_cross_cpu average	3378	-		3415	-
      
      bench_page_pool_cross_cpu results, loops=20000000 returning_cpus=8:
      test name			stats enabled		stats disabled
      				cycles	nanosec		cycles	nanosec
      
      page_pool_cross_cpu CPU(0)	6969	3037.488	6909	3011.463
      page_pool_cross_cpu CPU(1)	6974	3039.469	6913	3012.961
      page_pool_cross_cpu CPU(2)	6969	3037.575	6910	3011.585
      page_pool_cross_cpu CPU(3)	6974	3039.415	6913	3012.961
      page_pool_cross_cpu CPU(4)	6969	3037.288	6909	3011.368
      page_pool_cross_cpu CPU(5)	6972	3038.732	6913	3012.920
      page_pool_cross_cpu CPU(6)	6969	3037.350	6909	3011.386
      page_pool_cross_cpu CPU(7)	6973	3039.356	6913	3012.921
      page_pool_cross_cpu CPU(8)	871	379.934		864	376.620
      
      page_pool_cross_cpu average	6293	-		6239	-
      
      Thanks.
      
      [1]: https://lore.kernel.org/all/1645810914-35485-1-git-send-email-jdamato@fastly.com/
      [2]: https://gist.githubusercontent.com/jdamato-fsly/d7c34b9fa7be1ce132a266b0f2b92aea/raw/327dcd71d11ece10238fbf19e0472afbcbf22fd4/v7_stats_disabled
      [3]: https://gist.githubusercontent.com/jdamato-fsly/d7c34b9fa7be1ce132a266b0f2b92aea/raw/327dcd71d11ece10238fbf19e0472afbcbf22fd4/v7_stats_enabled
      
      v8 -> v9:
      	- Add documentation about the page_pool_get_stats API, stats
      	  structures, and fields to Documentation/networking/page_pool.rst.
      	- Remove unnecessary #ifdef in page_pool_recycle_in_ring.
      
      v7 -> v8:
      	- Rename mlx5 ethtool stats so that users have a better idea of
      	  their meaning.
      
      v6 -> v7:
      	- stats split out into two structs: a single per-page-pool struct
      	  for allocation path stats and a per-cpu pointer for recycle
      	  path stats.
      	- page_pool_get_stats updated to use a wrapper struct to gather
      	  stats for allocation and recycle stats with a single argument.
      	- placement of structs adjusted
      	- mlx5 driver modified to use page_pool_get_stats API
      
      v5 -> v6:
      	- Per cpu page_pool_stats struct pointer is now marked as
      	  ____cacheline_aligned_in_smp. Placement of the field in the
      	  struct is unchanged; it is the last field.
      
      v4 -> v5:
      	- Fixed the description of the kernel option in Kconfig.
      	- Squashed commits 1-10 from v4 into a single commit for easier
      	  review.
      	- Changed the comment style of the comment for
      	  the this_cpu_inc_alloc_stat macro.
      	- Changed the return type of page_pool_get_stats from struct
      	  page_pool_stat * to bool.
      
      v3 -> v4:
      	- Restructured stats to be per-cpu per-pool.
      	- Global stats and proc file were removed.
      	- Exposed an API (page_pool_get_stats) for batching the pool stats.
      
      v2 -> v3:
      	- patch 8/10 ("Add stat tracking cache refill") fixed placement of
      	  counter increment.
      	- patch 10/10 ("net-procfs: Show page pool stats in proc") updated:
      		- fix unused label warning from kernel test robot,
      		- fixed page_pool_seq_show to only display the refill stat
      		  once,
      		- added a remove_proc_entry for page_pool_stat to
      		  dev_proc_net_exit.
      
      v1 -> v2:
      	- A new kernel config option has been added, which defaults to N,
      	   preventing this code from being compiled in by default
      	- The stats structure has been converted to a per-cpu structure
      	- The stats are now exported via proc (/proc/net/page_pool_stat)
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a8ff736d
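The "page_pool_cross_cpu average" rows in the cover letter's benchmark tables appear to be the truncated integer mean of the per-CPU cycle counts listed above them, including the final, much cheaper CPU row. The sketch below reproduces that arithmetic from the figures in the tables. The helper name is hypothetical, and the truncating division is an inference from the numbers, not something the cover letter states.

```python
def average_cycles(per_cpu_cycles):
    """Truncated integer mean of per-CPU cycle counts (inferred, see lead-in)."""
    return sum(per_cpu_cycles) // len(per_cpu_cycles)


# Cycle counts copied from the bench_page_pool_cross_cpu tables.
enabled_4 = [3973, 3976, 3973, 3976, 994]
disabled_4 = [4015, 4022, 4016, 4021, 1005]
enabled_8 = [6969, 6974, 6969, 6974, 6969, 6972, 6969, 6973, 871]
disabled_8 = [6909, 6913, 6910, 6913, 6909, 6913, 6909, 6913, 864]

if __name__ == "__main__":
    print(average_cycles(enabled_4), average_cycles(disabled_4))  # 3378 3415
    print(average_cycles(enabled_8), average_cycles(disabled_8))  # 6293 6239
```

With returning_cpus=4 the stats-enabled run is the cheaper one, and with returning_cpus=8 it is the more expensive one, which matches the cover letter's remark that results vary between runs in both directions.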
    • mlx5: add support for page_pool_get_stats · cc10e84b
      Joe Damato authored
      This change adds support for the page_pool_get_stats API to mlx5. If the
      user has enabled CONFIG_PAGE_POOL_STATS in their kernel, ethtool will
      output page pool stats.
      Signed-off-by: Joe Damato <jdamato@fastly.com>
      Acked-by: Saeed Mahameed <saeed@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cc10e84b
    • Documentation: update networking/page_pool.rst · a3dd9828
      Joe Damato authored
      Add the new stats API, kernel config parameter, and stats structure
      information to the page_pool documentation.
      Signed-off-by: Joe Damato <jdamato@fastly.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a3dd9828
    • page_pool: Add function to batch and return stats · 6b95e338
      Joe Damato authored
      Adds a function page_pool_get_stats which can be used by drivers to obtain
      stats for a specified page_pool.
      Signed-off-by: Joe Damato <jdamato@fastly.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6b95e338
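The shape of the stats API described in this series (one per-pool struct for allocation stats, one per-cpu struct for recycle stats, a wrapper struct filled in by a single call that returns a bool) can be modelled in a few lines. This is a userspace sketch and not the kernel implementation: the class names, `get_stats_model`, and the choice to copy allocation counters while summing recycle counters are assumptions made for illustration. Only the counter names come from the commit messages.

```python
from dataclasses import dataclass, field, fields, replace
from typing import List, Optional


@dataclass
class AllocStats:          # per-pool allocation path counters
    fast: int = 0
    slow: int = 0
    slow_high_order: int = 0
    empty: int = 0
    refill: int = 0
    waive: int = 0


@dataclass
class RecycleStats:        # one instance per CPU in this model
    cached: int = 0
    cache_full: int = 0
    ring: int = 0
    ring_full: int = 0
    released_refcnt: int = 0


@dataclass
class PoolStats:           # wrapper handed to the caller
    alloc_stats: AllocStats = field(default_factory=AllocStats)
    recycle_stats: RecycleStats = field(default_factory=RecycleStats)


@dataclass
class Pool:                # hypothetical stand-in for a page pool
    alloc_stats: AllocStats
    per_cpu_recycle_stats: List[RecycleStats]


def get_stats_model(pool: Pool, stats: Optional[PoolStats]) -> bool:
    """Fill `stats` from `pool`; return False if no output struct was given."""
    if stats is None:
        return False
    # Allocation counters live in a single struct, so a copy is enough.
    stats.alloc_stats = replace(pool.alloc_stats)
    # Recycle counters are per CPU, so they are summed across CPUs.
    for cpu_stats in pool.per_cpu_recycle_stats:
        for f in fields(RecycleStats):
            total = getattr(stats.recycle_stats, f.name) + getattr(cpu_stats, f.name)
            setattr(stats.recycle_stats, f.name, total)
    return True
```

A driver-like caller in this model would allocate one `PoolStats`, call `get_stats_model` once per pool, and report the combined counters.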
    • page_pool: Add recycle stats · ad6fa1e1
      Joe Damato authored
      Add per-cpu stats tracking page pool recycling events:
      	- cached: recycling placed page in the page pool cache
      	- cache_full: page pool cache was full
      	- ring: page placed into the ptr ring
      	- ring_full: page released from page pool because the ptr ring was full
      	- released_refcnt: page released (and not recycled) because refcnt > 1
      Signed-off-by: Joe Damato <jdamato@fastly.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ad6fa1e1
    • page_pool: Add allocation stats · 8610037e
      Joe Damato authored
      Add per-pool statistics counters for the allocation path of a page pool.
      These stats are incremented in softirq context, so no locking or per-cpu
      variables are needed.
      
      This code is disabled by default and a kernel config option is provided for
      users who wish to enable them.
      
      The statistics added are:
      	- fast: successful fast path allocations
      	- slow: slow path order-0 allocations
      	- slow_high_order: slow path high order allocations
      	- empty: ptr ring is empty, so a slow path allocation was forced.
      	- refill: an allocation which triggered a refill of the cache
      	- waive: pages obtained from the ptr ring that cannot be added to
      	  the cache due to a NUMA mismatch.
      Signed-off-by: Joe Damato <jdamato@fastly.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8610037e
    • tcp: Remove the unused api · 42f0c193
      Tao Chen authored
      Last tcp_write_queue_head() use was removed in commit
      114f39fe ("tcp: restore autocorking"), so remove it.
      Signed-off-by: Tao Chen <chentao3@hotmail.com>
      Link: https://lore.kernel.org/r/SYZP282MB33317DEE1253B37C0F57231E86029@SYZP282MB3331.AUSP282.PROD.OUTLOOK.COM
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      42f0c193
    • flow_dissector: Add support for HSR · bf08824a
      Kurt Kanzenbach authored
      Network drivers such as igb or igc call eth_get_headlen() to determine the
      header length for the skbs they construct in the receive path.
      
      When running HSR on top of these drivers, it results in triggering BUG_ON() in
      skb_pull(). The reason is the skb headlen is not sufficient for HSR to work
      correctly. skb_pull() notices that.
      
      For instance, eth_get_headlen() returns 14 bytes for TCP traffic over HSR which
      is not correct. The problem is, the flow dissection code does not take HSR into
      account. Therefore, add support for it.
      Reported-by: Anthony Harivel <anthony.harivel@linutronix.de>
      Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
      Link: https://lore.kernel.org/r/20220228195856.88187-1-kurt@linutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      bf08824a
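To illustrate why a dissector that does not know about HSR stops at 14 bytes, here is a deliberately simplified header-length calculation. It is a toy model, not the kernel's flow dissector: it handles only Ethernet, an optional HSR tag, IPv4, and TCP, and the function names are hypothetical. The HSR EtherType (0x892F) and the 6-byte tag length are stated as assumptions of the sketch.

```python
import struct

ETH_HLEN = 14          # destination MAC + source MAC + EtherType
ETH_P_IP = 0x0800
ETH_P_HSR = 0x892F     # assumed HSR EtherType
HSR_TAG_LEN = 6        # assumed: path/size (2) + sequence (2) + encap proto (2)
IPPROTO_TCP = 6


def header_length(frame: bytes, hsr_aware: bool) -> int:
    """Return the number of header bytes this toy dissector can account for."""
    offset = ETH_HLEN
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == ETH_P_HSR:
        if not hsr_aware:
            # Unknown protocol: dissection stops after the Ethernet header.
            return offset
        # The encapsulated protocol is the last 2 bytes of the HSR tag.
        (ethertype,) = struct.unpack_from("!H", frame, offset + 4)
        offset += HSR_TAG_LEN
    if ethertype != ETH_P_IP:
        return offset
    ihl = (frame[offset] & 0x0F) * 4          # IPv4 header length in bytes
    protocol = frame[offset + 9]
    offset += ihl
    if protocol != IPPROTO_TCP:
        return offset
    data_offset = (frame[offset + 12] >> 4) * 4  # TCP header length in bytes
    return offset + data_offset


def build_hsr_tcp_frame() -> bytes:
    """Build a minimal HSR-tagged IPv4/TCP frame with zeroed addresses."""
    eth = bytes(12) + struct.pack("!H", ETH_P_HSR)
    hsr = struct.pack("!HHH", 0, 0, ETH_P_IP)
    ip = bytearray(20)
    ip[0] = 0x45            # version 4, IHL 5 (20 bytes)
    ip[9] = IPPROTO_TCP
    tcp = bytearray(20)
    tcp[12] = 5 << 4        # data offset 5 (20 bytes)
    return eth + hsr + bytes(ip) + bytes(tcp)
```

For the frame built by `build_hsr_tcp_frame()`, the model returns 14 when `hsr_aware` is false and 60 (14 + 6 + 20 + 20) when it is true, which mirrors the too-short headlen described in the commit message.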
    • net: dsa: mv88e6xxx: support RMII cmode · 00202885
      Baruch Siach authored
      Add support for direct RMII MAC mode. This allows hardware with CPU port
      connected in direct 100M fixed link to work properly.
      Signed-off-by: Baruch Siach <baruch.siach@siklu.com>
      Link: https://lore.kernel.org/r/a962d1ccbeec42daa10dd8aff0e66e31f0faf1eb.1646050203.git.baruch@tkos.co.il
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      00202885