1. 01 Jul, 2024 5 commits
    • Merge tag 'nf-next-24-06-28' of... · 1c5fc27b
      David S. Miller authored
      Merge tag 'nf-next-24-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next into main
      
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS updates for net-next
      
      The following patchset contains Netfilter/IPVS updates for net-next:
      
      Patch #1 to #11 to shrink memory consumption for transaction objects:
      
        struct nft_trans_chain { /* size: 120 (-32), cachelines: 2, members: 10 */
        struct nft_trans_elem { /* size: 72 (-40), cachelines: 2, members: 4 */
        struct nft_trans_flowtable { /* size: 80 (-48), cachelines: 2, members: 5 */
        struct nft_trans_obj { /* size: 72 (-40), cachelines: 2, members: 4 */
        struct nft_trans_rule { /* size: 80 (-32), cachelines: 2, members: 6 */
        struct nft_trans_set { /* size: 96 (-24), cachelines: 2, members: 8 */
        struct nft_trans_table { /* size: 56 (-40), cachelines: 1, members: 2 */
      
        struct nft_trans_elem can now be allocated from kmalloc-96 instead of
        kmalloc-128 slab.
      
        Series from Florian Westphal. For the record, I have mangled patch #1
        to add nft_trans_container_*() and use it for every transaction object.
        I have also added BUILD_BUG_ON to ensure struct nft_trans always comes
        at the beginning of the container transaction object. And a few minor
        cleanups; any new bugs are my own.
      
      Patch #12 simplifies the check for SCTP GSO in IPVS, from Ismael Luceno.
      
      Patch #13 nf_conncount key length remains in the u32 bound, from Yunjian Wang.
      
      Patch #14 removes unnecessary check for CTA_TIMEOUT_L3PROTO when setting
                default conntrack timeouts via nfnetlink_cttimeout API, from
                Lin Ma.
      
      Patch #15 updates NFT_SECMARK_CTX_MAXLEN to 4096, SELinux could use
                larger secctx names than the existing 256 bytes length.
      
      Patch #16 adds a selftest to exercise nfnetlink_queue listeners leaving
                nfnetlink_queue, from Florian Westphal.
      
      Patch #17 increases hitcount from 255 to 65535 in xt_recent, from Phil Sutter.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp_metrics-netlink-specs' into main · a051091c
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      tcp_metrics: add netlink protocol spec in YAML
      
      Add a netlink protocol spec for the tcp_metrics generic netlink family.
      First patch adjusts the uAPI header guards to make it easier to build
      tools/ with non-system headers.
      
      v1: https://lore.kernel.org/all/20240626201133.2572487-1-kuba@kernel.org
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp_metrics: add netlink protocol spec in YAML · 85674625
      Jakub Kicinski authored
      Add a protocol spec for tcp_metrics, so that it's accessible via YNL.
      Useful at the very least for testing fixes.
      
      In this episode of "10,000 ways to complicate netlink" the metric
      nest has defines which are off by 1. iproute2 does:
      
              struct rtattr *m[TCP_METRIC_MAX + 1 + 1];
      
              parse_rtattr_nested(m, TCP_METRIC_MAX + 1, a);
      
              for (i = 0; i < TCP_METRIC_MAX + 1; i++) {
                      // ...
                      attr = m[i + 1];
      
      This is too weird to support in YNL, add a new set of defines
      with _correct_ values to the official kernel header.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp_metrics: add UAPI to the header guard · 7c811005
      Jakub Kicinski authored
      tcp_metrics' header lacks the customary _UAPI in the header guard.
      This makes YNL build rules work less seamlessly.
      We can easily fix that on YNL side, but this could also be
      problematic if we ever needed to create a kernel-only tcp_metrics.h.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: realtek: Add support for PHY LEDs on RTL8211F · 17784801
      Marek Vasut authored
      Realtek RTL8211F Ethernet PHY supports 3 LED pins which are used to
      indicate link status and activity. Add minimal LED controller driver
      supporting the most common uses with the 'netdev' trigger.
      Signed-off-by: Marek Vasut <marex@denx.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 29 Jun, 2024 18 commits
  3. 28 Jun, 2024 17 commits
    • netfilter: xt_recent: Lift restrictions on max hitcount value · f4ebd034
      Phil Sutter authored
      Support tracking of up to 65535 packets per table entry instead of just
      255 to better facilitate longer term tracking or higher throughput
      scenarios.
      
      Note how this aligns sizes of struct recent_entry's 'nstamps' and
      'index' fields when 'nstamps' was larger before. This is unnecessary as
      the value of 'nstamps' grows along with that of 'index' after being
      initialized to 1 (see recent_entry_update()). Its value will thus never
      exceed that of 'index' and therefore does not need to provide space for
      larger values.
      Requested-by: Fabio <pedretti.fabio@gmail.com>
      Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1745
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • selftests: netfilter: nft_queue.sh: add test for disappearing listener · 742ad979
      Florian Westphal authored
      If a userspace program exits while the queue it's subscribed to has
      packets, those need to be discarded.
      
      commit dc21c6cc ("netfilter: nfnetlink_queue: acquire rcu_read_lock()
      in instance_destroy_rcu()") fixed a (harmless) rcu splat that could be
      triggered in this case.
      
      Add a test case to cover this.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • Merge branch 'net-selftests-mirroring-cleanup' into main · 748e3bbf
      David S. Miller authored
      Petr Machata says:
      
      ====================
      selftest: Clean-up and stabilize mirroring tests
      
      The mirroring selftests work by sending ICMP traffic between two hosts.
      Along the way, this traffic is mirrored to a gretap netdevice, and counter
      taps are then installed strategically along the path of the mirrored
      traffic to verify the mirroring took place.
      
      The problem with this is that besides mirroring the primary traffic, any
      other service traffic is mirrored as well. At the same time, because the
      tests need to work in HW-offloaded scenarios, the ability of the device to
      do arbitrary packet inspection should not be taken for granted. Most tests
      therefore simply use matchall, one uses flower to match on IP address.
      As a result, the selftests are noisy.
      
      mirror_test() accommodated this noisiness by giving the counters an
      allowance of several packets. But that only works up to a point, and on
      busy systems won't always be enough.
      
      In this patch set, clean up and stabilize the mirroring selftests. The
      original intention was to port the tests over to UDP, but the logic of
      ICMP ends up being so entangled in the mirroring selftests that the
      changes feel overly invasive. Instead, ICMP is kept, but where possible,
      we match on ICMP message type, thus filtering out hits by other ICMP
      messages.
      
      Where this is not practical (where the counter tap is put on a device
      that carries encapsulated packets), switch the counter condition to _at
      least_ X observed packets. This is less robust, but barely so --
      probably the only scenario that this would not catch is something like
      erroneous packet duplication, which would hopefully get caught by the
      numerous other tests in this extensive suite.
      
      - Patches #1 to #3 clean up parameters at various helpers.
      
      - Patches #4 to #6 stabilize the mirroring selftests as described above.
      
      - Mirroring tests currently allow testing SW datapath even on HW
        netdevices by trapping traffic to the SW datapath. This complicates
        the tests a bit without a good reason: to test SW datapath, just run
        the selftests on the veth topology. Thus in patch #7, drop support for
        this dual SW/HW testing.
      
      - At this point, some cleanups were either made possible by the previous
        patches, or were always possible. In patches #8 to #11, realize these
        cleanups.
      
      - In patch #12, fix mlxsw mirror_gre selftest to respect setting TESTS.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mlxsw: mirror_gre: Obey TESTS · 098ba97d
      Petr Machata authored
      This test is unusual in that overriding TESTS does not change the tests to
      be run. Split the individual tests into several functions and invoke them
      through tests_run() as appropriate.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: libs: Drop unused functions · 06704a0d
      Petr Machata authored
      Nothing calls these.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: libs: Drop slow_path_trap_install()/_uninstall() · 4e9cd3d0
      Petr Machata authored
      These functions are not used anymore.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mirror_gre_lag_lacp: Drop unnecessary code · 95d33989
      Petr Machata authored
      The selftest does not use functions from mirror_gre_lib, ditch the import.
      
      It does not use arping either, so drop the require_command as well.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mlxsw: mirror_gre: Simplify · 388b2d98
      Petr Machata authored
      After the previous patch, the function test_span_failable() is always
      called with should_fail=1. Drop the argument and streamline the code.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mirror: Drop dual SW/HW testing · d361d78f
      Petr Machata authored
      The mirroring tests are currently run in a skip_hw and optionally a skip_sw
      mode. The former tests the SW datapath, the latter the HW datapath, if
      available. In order to be able to test SW datapath on HW loopbacks, traps
      are installed on ingress to get traffic from the HW datapath to the SW one.
      This adds unnecessary complexity when it would be much simpler to just
      use a veth-based topology to test the SW datapath. Thus drop all the code
      that supports this dual testing.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mirror: mirror_test(): Allow exact count of packets · a86e0df9
      Petr Machata authored
      The mirroring selftests work by sending ICMP traffic between two hosts.
      Along the way, this traffic is mirrored to a gretap netdevice, and counter
      taps are then installed strategically along the path of the mirrored
      traffic to verify the mirroring took place.
      
      The problem with this is that besides mirroring the primary traffic, any
      other service traffic is mirrored as well. At the same time, because the
      tests need to work in HW-offloaded scenarios, the ability of the device to
      do arbitrary packet inspection should not be taken for granted. Most tests
      therefore simply use matchall, one uses flower to match on IP address.
      
      As a result, the selftests are noisy, because besides the primary ICMP
      traffic, any amount of other service traffic is mirrored as well.
      
      mirror_test() accommodated this noisiness by giving the counters an
      allowance of several packets. But in the previous patch, where possible,
      counter taps were changed to match only on an exact ICMP message. At least
      in those cases, we can demand an exact number of packets to match.
      
      Where the tap is installed on a connective netdevice, the exact matching is
      not practical (though with u32, anything is possible). In those places,
      there should still be some leeway -- and probably bigger than before,
      because experience shows that these tests are very noisy.
      
      To that end, change mirror_test() so that it can be either called with an
      exact number to expect, or with an expression. Where leeway is needed,
      adjust callers to pass a ">= 10" instead of mere 10.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mirror: do_test_span_dir_ips(): Install accurate taps · 83341535
      Petr Machata authored
      The mirroring selftests work by sending ICMP traffic between two hosts.
      Along the way, this traffic is mirrored to a gretap netdevice, and counter
      taps are then installed strategically along the path of the mirrored
      traffic to verify the mirroring took place.
      
      The problem with this is that besides mirroring the primary traffic, any
      other service traffic is mirrored as well. At the same time, because the
      tests need to work in HW-offloaded scenarios, the ability of the device to
      do arbitrary packet inspection should not be taken for granted. Most tests
      therefore simply use matchall, one uses flower to match on IP address.
      
      As a result, the selftests are noisy, because besides the primary ICMP
      traffic, any amount of other service traffic is mirrored as well.
      
      However, often the counter tap is installed at the remote end of the gretap
      tunnel. Since this is a SW-datapath scenario anyway, we can make the filter
      arbitrarily accurate.
      
      Thus in this patch, add parameters forward_type and backward_type to
      several mirroring test helpers, as some other helpers already have. Then
      change do_test_span_dir_ips() to instead of installing one generic tap and
      using it for test in both directions, install the tap for each direction
      separately, matching on the ICMP type given by these parameters.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mirror_gre_lag_lacp: Check counters at tunnel · 95e7b860
      Petr Machata authored
      The test works by sending packets through a tunnel, whence they are
      forwarded to a LAG. One of the LAG children is removed from the LAG prior
      to the exercise, and the test then counts how many packets pass through the
      other one. The issue with this is that it counts all packets, not just the
      encapsulated ones.
      
      So instead add a second gretap endpoint to receive the sent packets, and
      check reception counters there.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: lib: tc_rule_stats_get(): Move default to argument definition · 9b5d5f27
      Petr Machata authored
      The argument $dir has a fallback value of "ingress". Move the fallback from
      the usage site to the argument definition block to make the fact clearer.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mirror: Drop direction argument from several functions · 28e67746
      Petr Machata authored
      The argument is not used by these functions except to propagate it for
      ultimately no purpose.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: libs: Expand "$@" where possible · d5fbb2eb
      Petr Machata authored
      In some functions, argument-forwarding through "$@" without listing the
      individual arguments explicitly is fundamental to the operation of a
      function. E.g. xfail_on_veth() should be able to run various tests in the
      fail-to-xfail regime, and usage of "$@" is appropriate as an abstraction
      mechanism. For functions such as simple_if_init(), $@ is a handy way to
      pass an array.
      
      In other functions, it's merely a mechanism to save some typing, which
      however ends up obscuring the real arguments and makes life hard for those
      that end up reading the code.
      
      This patch adds some of the implicit function arguments and correspondingly
      expands $@'s. In several cases this will come in handy as following patches
      adjust the parameter lists.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-flash-modees-firmware' into main · c977ac49
      David S. Miller authored
      Danielle Ratson says:
      
      ====================
      Add ability to flash modules' firmware
      
      CMIS compliant modules such as QSFP-DD might be running a firmware that
      can be updated in a vendor-neutral way by exchanging messages between
      the host and the module as described in section 7.2.2 of revision
      4.0 of the CMIS standard.
      
      According to the CMIS standard, the firmware update process is done
      using a sequence of CDB commands.
      
      CDB (Command Data Block Message Communication) reads and writes are
      performed on memory map pages 9Fh-AFh according to the CMIS standard,
      section 8.12 of revision 4.0.
      
      Add a pair of new ethtool messages that allow:
      
      * User space to trigger firmware update of transceiver modules
      
      * The kernel to notify user space about the progress of the process
      
      The user interface is designed to be asynchronous in order to avoid RTNL
      being held for too long and to allow several modules to be updated
      simultaneously. The interface is designed with CMIS compliant modules in
      mind, but kept generic enough to accommodate future use cases, if these
      arise.
      
      The kernel interface that will implement the firmware update using CDB
      commands will include two layers that will be added under ethtool:

      * The upper layer, which will be triggered from the module layer, is
        cmis_fw_update.
      * The lower one is cmis_cdb.
      
      In the future there might be more operations to implement using CDB
      commands. Therefore, the idea is to keep the cmis_cdb interface clean and
      the cmis_fw_update specific to the cdb commands handling it.
      
      The communication between the kernel and the driver will be done using
      two ethtool operations that enable reading and writing the transceiver
      module EEPROM.
      The operation ethtool_ops::get_module_eeprom_by_page, that is already
      implemented, will be used for reading from the EEPROM the CDB reply,
      e.g. reading module setting, state, etc.
      The operation ethtool_ops::set_module_eeprom_by_page, that is added in
      the current patchset, will be used for writing to the EEPROM the CDB
      command such as start firmware image, run firmware image, etc.
      
      Therefore in order for a driver to implement module flashing, that
      driver needs to implement the two functions mentioned above.
      
      Patchset overview:
      Patch #1-#2: Implement the EEPROM writing in mlxsw.
      Patch #3: Define the interface between the kernel and user space.
      Patch #4: Add ability to notify the flashing firmware progress.
      Patch #5: Veto operations during flashing.
      Patch #6: Add extended compliance codes.
      Patch #7: Add the cdb layer.
      Patch #8: Add the fw_update layer.
      Patch #9: Add ability to flash transceiver modules' firmware.
      
      v8:
      	Patch #7:
      	* In the ethtool_cmis_wait_for_cond() evaluate the condition once more
      	  to decide if the error code should be -ETIMEDOUT or something else.
      	* s/netdev_err/netdev_err_once.
      
      v7:
      	Patch #4:
      		* Return -ENOMEM instead of PTR_ERR(attr) on
      		  ethnl_module_fw_flash_ntf_put_err().
      	Patch #9:
      		* Fix warning for not unlocking the spin_lock in the error flow
      		  on module_flash_fw_work_list_add().
      		* Avoid the fall-through on ethnl_sock_priv_destroy().
      
      v6:
      	* Squash some of the last patch to patch #5 and patch #9.
      	Patch #3:
      		* Add paragraph in .rst file.
      	Patch #4:
      		* Reserve '1' more place on SKB for NUL terminator in
      		  the error message string.
      		* Add more prints on error flow, re-write the printing
      		  function and add ethnl_module_fw_flash_ntf_put_err().
      		* Change the communication method so notification will be
      		  sent in unicast instead of multicast.
      		* Add new 'struct ethnl_module_fw_flash_ntf_params' that holds
      		  the relevant info for unicast communication and use it to
      		  send notification to the specific socket.
      		* s/nla_put_u64_64bit/nla_put_uint/
      	Patch #7:
      		* In ethtool_cmis_cdb_init(), Use 'const' for the 'params'
      		  parameter.
      	Patch #8:
      		* Add a list field to struct ethtool_module_fw_flash for
      		  module_fw_flash_work_list that will be presented in the next
      		  patch.
      		* Move ethtool_cmis_fw_update() cleaning to a new function that
      		  will be represented in the next patch.
      		* Move some of the fields in struct ethtool_module_fw_flash to
      		  a separate struct, so ethtool_cmis_fw_update() will get only
      		  the relevant parameters for it.
      		* Edit the relevant functions to get the relevant params for
      		  them.
      		* s/CMIS_MODULE_READY_MAX_DURATION_USEC/CMIS_MODULE_READY_MAX_DURATION_MSEC
      	Patch #9:
      		* Add a paragraph in the commit message.
      		* Rename labels in module_flash_fw_schedule().
      		* Add info to genl_sk_priv_*() and implement the relevant
      		  callbacks, in order to handle properly a scenario of closing
      		  the socket from user space before the work item was ended.
      		* Add a list that holds all the ethtool_module_fw_flash structs
      		  that correspond to the in-progress work items.
      		* Add a new enum for the socket types.
      		* Use both above to identify a flashing socket, add it to the
      		  list and when closing socket affect only the flashing type.
      		* Create a new function that will get the work item instead of
      		  ethtool_cmis_fw_update().
      		* Edit the relevant functions to get the relevant params for
      		  them.
      		* The new function will call the old ethtool_cmis_fw_update(),
      		  and do the cleaning, so the existence of the list should be
      		  completely isolated in module.c.
      ===================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: Add ability to flash transceiver modules' firmware · 32b4c8b5
      Danielle Ratson authored
      Add the ability to flash the modules' firmware by implementing the
      interface between the user space and the kernel.
      
      Example from a succeeding implementation:
      
       # ethtool --flash-module-firmware swp40 file test.bin
      
       Transceiver module firmware flashing started for device swp40
       Transceiver module firmware flashing in progress for device swp40
       Progress: 99%
       Transceiver module firmware flashing completed for device swp40
      
      In addition, add infrastructure that allows modules to set socket-specific
      private data. This ensures that when a socket is closed from user space
      during the flashing process, the right socket halts sending notifications
      to user space until the work item is completed.
      Signed-off-by: Danielle Ratson <danieller@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>