1. 17 Nov, 2023 2 commits
  2. 16 Nov, 2023 38 commits
    • Merge branch 'phylink-sfp-linkmode' · 9e631101
      David S. Miller authored
      Russell King says:
      
      ====================
      net: Add linkmode_fill, use linkmode_*() in phylink/sfp code
      
      This small series adds a linkmode_fill() op, and uses it in phylink.
      The SFP code is also converted to use linkmode_*() ops.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sfp: use linkmode_*() rather than open coding · 466b97b1
      Russell King (Oracle) authored
      Use the linkmode_*() helpers rather than open coding the calls to the
      bitmap operators.
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phylink: use linkmode_fill() · ba50a8d4
      Russell King (Oracle) authored
      Use linkmode_fill() rather than open coding the bitmap operation.
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: linkmode: add linkmode_fill() helper · 96fa96e1
      Russell King (Oracle) authored
      Add a linkmode_fill() helper, which will allow us to convert phylink's
      open coded bitmap_fill() operations.
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
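The helper added by this commit can be pictured as a thin wrapper around a bitmap fill. What follows is a user-space sketch and not the kernel source: `LINK_MODE_NBITS`, `bitmap_fill_model()`, `linkmode_fill_model()` and `linkmode_test_bit_model()` are hypothetical stand-ins, and the bit count of 100 is arbitrary.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the number of link mode bits. */
#define LINK_MODE_NBITS 100

#define BITS_PER_LONG   (sizeof(unsigned long) * CHAR_BIT)
#define LINK_MODE_LONGS ((LINK_MODE_NBITS + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Set the first nbits bits of a bitmap stored as an array of longs. */
static inline void bitmap_fill_model(unsigned long *dst, unsigned int nbits)
{
    size_t full = nbits / BITS_PER_LONG;
    unsigned int rem = nbits % BITS_PER_LONG;

    assert(dst != NULL);
    memset(dst, 0xff, full * sizeof(unsigned long));
    if (rem)
        dst[full] = (1UL << rem) - 1; /* partial last word */
}

/* The wrapper: callers no longer need to pass the bit count themselves. */
static inline void linkmode_fill_model(unsigned long *dst)
{
    bitmap_fill_model(dst, LINK_MODE_NBITS);
}

/* Return 1 if bit nr is set in src, 0 otherwise. */
static inline int linkmode_test_bit_model(unsigned int nr,
                                          const unsigned long *src)
{
    return (int)((src[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL);
}
```

The point of such a wrapper is that call sites stop repeating the bitmap size, which is the kind of open coding the series removes from phylink and the SFP code.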
    • Merge branch 'tcp-change-reaction-to-ICMP' · 9a1f02f3
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      tcp: change reaction to ICMP messages
      
      ICMP[v6] messages received for a socket in TCP_SYN_SENT currently abort
      the connection attempt, in violation of standards.
      
      This series changes our stack to adhere to RFC 6069 and RFC 1122
      (4.2.3.9)
      
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: no longer abort SYN_SENT when receiving some ICMP · 0a8de364
      Eric Dumazet authored
      Currently, non fatal ICMP messages received on behalf
      of SYN_SENT sockets do call tcp_ld_RTO_revert()
      to implement RFC 6069, but immediately call tcp_done(),
      thus aborting the connect() attempt.
      
      This violates RFC 1122 following requirement:
      
      4.2.3.9  ICMP Messages
      ...
                o    Destination Unreachable -- codes 0, 1, 5
      
                       Since these Unreachable messages indicate soft error
                       conditions, TCP MUST NOT abort the connection, and it
                       SHOULD make the information available to the
                       application.
      
      This patch makes sure non 'fatal' ICMP[v6] messages do not
      abort the connection attempt.
      
      It enables RFC 6069 for SYN_SENT sockets as a result.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: David Morley <morleyd@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
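The RFC 1122 rule quoted in this commit amounts to a small classification of Destination Unreachable codes. Here is a minimal sketch of that classification, assuming a hypothetical `struct icmp_err_model`. It is not the kernel's ICMP error handling, which uses its own tables and also covers ICMPv6.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ICMP_TYPE_DEST_UNREACH 3 /* Destination Unreachable, RFC 792 */

/* Hypothetical, minimal view of a received ICMP error message. */
struct icmp_err_model {
    uint8_t type;
    uint8_t code;
};

/*
 * True for the soft errors listed in RFC 1122 section 4.2.3.9:
 * Destination Unreachable with code 0, 1 or 5. On these, TCP MUST NOT
 * abort the connection.
 */
static inline bool icmp_is_soft_unreach(const struct icmp_err_model *err)
{
    assert(err != NULL);
    if (err->type != ICMP_TYPE_DEST_UNREACH)
        return false;

    switch (err->code) {
    case 0: /* net unreachable */
    case 1: /* host unreachable */
    case 5: /* source route failed */
        return true;
    default:
        return false;
    }
}
```

In this model, a socket in SYN_SENT that receives a message for which `icmp_is_soft_unreach()` returns true would keep its connect() attempt alive rather than call the equivalent of tcp_done().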
    • tcp: use tp->total_rto to track number of linear timeouts in SYN_SENT state · 14dd92d0
      Eric Dumazet authored
      In commit ccce324d ("tcp: make the first N SYN RTO backoffs linear")
      David used icsk->icsk_backoff field to track the number of linear timeouts.
      
      Since then, tp->total_rto has been added.
      
      This commit uses tp->total_rto instead of icsk->icsk_backoff
      so that tcp_ld_RTO_revert() no longer can trigger an overflow
      in inet_csk_rto_backoff(). Other than the potential UBSAN
      report, there was no issue because receiving an ICMP message
      currently aborts the connect().
      
      In the following patch, we want to adhere to RFC 6069
      and RFC 1122 4.2.3.9.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: David Morley <morleyd@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: add support for flowlabel inherit · c6e9dba3
      Alce Lafranque authored
      By default, VXLAN encapsulation over IPv6 sets the flow label to 0, with
      an option for a fixed value. This commit adds the ability to inherit the
      flow label from the inner packet, like for other tunnel implementations.
      This enables devices using only L3 headers for ECMP to correctly balance
      VXLAN-encapsulated IPv6 packets.
      
      ```
      $ ./ip/ip link add dummy1 type dummy
      $ ./ip/ip addr add 2001:db8::2/64 dev dummy1
      $ ./ip/ip link set up dev dummy1
      $ ./ip/ip link add vxlan1 type vxlan id 100 flowlabel inherit remote 2001:db8::1 local 2001:db8::2
      $ ./ip/ip link set up dev vxlan1
      $ ./ip/ip addr add 2001:db8:1::2/64 dev vxlan1
      $ ./ip/ip link set arp off dev vxlan1
      $ ping -q 2001:db8:1::1 &
      $ tshark -d udp.port==8472,vxlan -Vpni dummy1 -c1
      [...]
      Internet Protocol Version 6, Src: 2001:db8::2, Dst: 2001:db8::1
          0110 .... = Version: 6
          .... 0000 0000 .... .... .... .... .... = Traffic Class: 0x00 (DSCP: CS0, ECN: Not-ECT)
              .... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0)
              .... .... ..00 .... .... .... .... .... = Explicit Congestion Notification: Not ECN-Capable Transport (0)
          .... 1011 0001 1010 1111 1011 = Flow Label: 0xb1afb
      [...]
      Virtual eXtensible Local Area Network
          Flags: 0x0800, VXLAN Network ID (VNI)
          Group Policy ID: 0
          VXLAN Network Identifier (VNI): 100
      [...]
      Internet Protocol Version 6, Src: 2001:db8:1::2, Dst: 2001:db8:1::1
          0110 .... = Version: 6
          .... 0000 0000 .... .... .... .... .... = Traffic Class: 0x00 (DSCP: CS0, ECN: Not-ECT)
              .... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0)
              .... .... ..00 .... .... .... .... .... = Explicit Congestion Notification: Not ECN-Capable Transport (0)
          .... 1011 0001 1010 1111 1011 = Flow Label: 0xb1afb
      ```
      Signed-off-by: Alce Lafranque <alce@lafranque.net>
      Co-developed-by: Vincent Bernat <vincent@bernat.ch>
      Signed-off-by: Vincent Bernat <vincent@bernat.ch>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
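The tshark output in this commit shows the same flow label, 0xb1afb, in both the outer and the inner IPv6 header. The flow label is the low 20 bits of the first 32-bit word of the IPv6 header, so inheriting it means copying those bits. The sketch below works on raw header bytes in user space. The helper names are hypothetical and this is not the vxlan driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Low 20 bits of the first IPv6 header word hold the flow label. */
#define FLOW_LABEL_MASK 0x000FFFFFu

/* Read a 32-bit big-endian (network order) value. */
static inline uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write a 32-bit value in big-endian (network order). */
static inline void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Extract the 20-bit flow label from the start of an IPv6 header. */
static inline uint32_t ipv6_flow_label(const uint8_t *hdr)
{
    assert(hdr != NULL);
    return load_be32(hdr) & FLOW_LABEL_MASK;
}

/* Copy the inner flow label into the outer header, keeping the outer
 * version and traffic class bits untouched. */
static inline void inherit_flow_label(uint8_t *outer, const uint8_t *inner)
{
    uint32_t word = load_be32(outer) & ~FLOW_LABEL_MASK;

    store_be32(outer, word | ipv6_flow_label(inner));
}
```

With an inner header starting with the bytes 60 0b 1a fb (version 6, traffic class 0, flow label 0xb1afb), the outer header ends up with the same first word, matching the capture above.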
    • selftests/net: synchronize udpgro tests' tx and rx connection · 3bdd9fd2
      Lucas Karpinski authored
      The sockets used by udpgso_bench_tx aren't always ready when
      udpgso_bench_tx transmits packets. This issue is more prevalent in -rt
      kernels, but can occur in both. Replace the hacky sleep calls with a
      function that checks whether the ports in the namespace are ready for
      use.
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Lucas Karpinski <lkarpins@redhat.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tc-testing-tdc-updates' · e47ef9eb
      David S. Miller authored
      Pedro Tammela says:
      
      ====================
      selftests: tc-testing: updates to tdc
      
      - Patch 1 removes an obscure feature from tdc
      - Patch 2 reworks the namespace and devices setup giving a nice speed
      boost
      - Patch 3 preloads all tc modules when running kselftests
      - Patch 4 turns on parallel testing in kselftests
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: tc-testing: use parallel tdc in kselftests · 04fd47bf
      Pedro Tammela authored
      Leverage parallel tests in kselftests using all the available cpus.
      We tested this in tuxsuite and locally extensively and it seems it's ready for prime time.
      Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: tc-testing: preload all modules in kselftests · bb9623c3
      Pedro Tammela authored
      While running tdc tests in parallel it can race over the module loading
      done by tc and fail the run with random errors.
      So avoid this by preloading all modules before running tdc in kselftests.
      Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: tc-testing: rework namespaces and devices setup · fa63d353
      Pedro Tammela authored
      As mentioned in the TC Workshop 0x17, our recent changes to tdc broke
      downstream CI systems like tuxsuite. The issue is the classic problem
      with rcu/workqueue objects where you can miss them if not enough wall time
      has passed. The time required depends on the system and kernel config:
      on my machine it could be nanoseconds, while on another it could be
      microseconds or more.
      
      In order to make the suite deterministic, poll for the existence
      of the objects in a reasonable manner. Talking netlink directly is the
      best solution in order to avoid paying the cost of multiple
      'fork()' calls, so introduce a netlink based setup routine using
      pyroute2. We leave the iproute2 one as a fallback when pyroute2 is not
      available.
      
      Also rework the iproute2 side to mimic the netlink routine where it
      creates DEV0 as the peer of DEV1 and moves DEV1 into the net namespace.
      This way when the namespace is deleted DEV0 is also deleted
      automatically, leaving no margin for resource leaks.
      
      Another bonus of this change is that our setup time sped up by a factor
      of 2 when using netlink.
      Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: tc-testing: drop '-N' argument from nsPlugin · 9ffa01ca
      Pedro Tammela authored
      This argument would bypass the net namespace creation and run the test in
      the root namespace, even if nsPlugin was specified.
      Drop it as it's the same as commenting out the nsPlugin from a test and adds
      additional complexity to the plugin code.
      Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-bindings: Document Marvell Aquantia PHY · 0fbe92b9
      Christian Marangi authored
      Document bindings for Marvell Aquantia PHY.

      The Marvell Aquantia PHY requires firmware to work correctly, and there
      are at least 3 ways to load this firmware.

      Describe all the different ways and document the binding "firmware-name"
      to load the PHY firmware from userspace.
      Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
      Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
      Acked-by: Rob Herring <robh@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: aquantia: add firmware load support · e93984eb
      Robert Marko authored
      Aquantia PHY-s require firmware to be loaded before they start operating.
      It can be automatically loaded in case when there is a SPI-NOR connected
      to Aquantia PHY-s or can be loaded from the host via MDIO.
      
      This patch adds support for loading the firmware via MDIO as in most cases
      there is no SPI-NOR being used to save on cost.
      Firmware loading code itself is ported from mainline U-boot with cleanups.
      
      The firmware has mixed values both in big and little endian.
      PHY core itself is big-endian but it expects values to be in little-endian.
      The firmware is little-endian but CRC-16 value for it is stored at the end
      of firmware in big-endian.
      
      It seems the PHY does the conversion internally from firmware that is
      little-endian to the PHY that is big-endian on using the mailbox
      but mailbox returns a big-endian CRC-16 to verify the written data
      integrity.
      Co-developed-by: Christian Marangi <ansuelsmth@gmail.com>
      Signed-off-by: Robert Marko <robimarko@gmail.com>
      Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
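The mixed endianness described in this commit (little-endian firmware words, big-endian CRC-16 stored at the end) can be illustrated with explicit byte-order loads. This is a user-space sketch under a simplified layout assumption: payload bytes followed directly by a 2-byte big-endian CRC. The real image layout, the CRC polynomial and the mailbox protocol are not modeled, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load a 32-bit little-endian word, as the firmware payload is stored. */
static inline uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Load a 16-bit big-endian value, as the trailing CRC-16 is stored. */
static inline uint16_t load_be16(const uint8_t *p)
{
    return (uint16_t)(((unsigned int)p[0] << 8) | p[1]);
}

/*
 * Fetch the CRC stored in the last two bytes of the image.
 * Returns 0 on success, -1 if the image is too short to hold a CRC.
 */
static inline int fw_stored_crc(const uint8_t *fw, size_t len, uint16_t *crc)
{
    assert(fw != NULL && crc != NULL);
    if (len < 2)
        return -1;
    *crc = load_be16(fw + len - 2);
    return 0;
}
```

Using byte-wise loads rather than casting the buffer to wider types keeps the result independent of the host's own endianness and alignment rules.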
    • net: phy: aquantia: move MMD_VEND define to header · e1fbfa4a
      Christian Marangi authored
      Move MMD_VEND define to header to clean things up and in preparation for
      firmware loading support, which requires some defines placed in
      aquantia_main.
      Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: aquantia: move to separate directory · d2213db3
      Christian Marangi authored
      Move aquantia PHY driver to separate directory in preparation for
      firmware loading support to keep things tidy.
      Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'octeon_ep-transmit-cleanups-and-optimizations' · 470f3669
      David S. Miller authored
      Shinas Rasheed says:
      
      ====================
      Cleanup and optimizations to transmit code
      
      Pad small packets to ETH_ZLEN before transmit, cleanup dma sync calls,
      add xmit_more functionality and then further remove atomic
      variable usage in the prior.
      
      Changes:
      V3:
        - Stop returning NETDEV_TX_BUSY when ring is full in xmit_patch.
          Change to inspect early if next packet can fit in ring instead of
          current packet, and stop queue if not.
        - Add smp_mb between stopping tx queue and checking if tx queue has
          free entries again, in queue full check function to let reflect
          IQ process completions that might have happened on other cpus.
        - Update small packet padding patch changelog to give more info.
      V2: https://lore.kernel.org/all/20231024145119.2366588-1-srasheed@marvell.com/
        - Added patch for padding small packets to ETH_ZLEN, part of
          optimization patches for transmit code missed out in V1
        - Updated changelog to provide more details for dma_sync remove patch
        - Updated changelog to use imperative tone in add xmit_more patch
      V1: https://lore.kernel.org/all/20231023114449.2362147-1-srasheed@marvell.com/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeon_ep: remove atomic variable usage in Tx data path · dc9c02b7
      Shinas Rasheed authored
      Replace atomic variable "instr_pending" which represents number of
      posted tx instructions pending completion, with host_write_idx and
      flush_index variables in the xmit and completion processing respectively.
      Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
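One way to see why a shared pending counter can be replaced by two indices is that the number of in-flight entries is derivable from the distance between them. The single-threaded model below assumes a power-of-two ring with one slot kept free. The struct and function names are hypothetical, only the field names `host_write_idx` and `flush_index` come from the commit message, and the memory barriers a real driver needs between CPUs are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct tx_ring_model {
    uint32_t size;           /* number of slots, must be a power of two */
    uint32_t host_write_idx; /* advanced only by the transmit path */
    uint32_t flush_index;    /* advanced only by completion processing */
};

/* Entries posted but not yet completed, derived from the two indices. */
static inline uint32_t ring_pending(const struct tx_ring_model *r)
{
    assert(r->size != 0 && (r->size & (r->size - 1)) == 0);
    return (r->host_write_idx - r->flush_index) & (r->size - 1);
}

/* True if at least `needed` more entries fit (one slot stays free so
 * that a full ring is distinguishable from an empty one). */
static inline bool ring_has_room(const struct tx_ring_model *r,
                                 uint32_t needed)
{
    return r->size - 1 - ring_pending(r) >= needed;
}

/* Transmit side: post one entry. */
static inline void ring_post(struct tx_ring_model *r)
{
    assert(ring_has_room(r, 1));
    r->host_write_idx = (r->host_write_idx + 1) & (r->size - 1);
}

/* Completion side: retire one entry. */
static inline void ring_complete(struct tx_ring_model *r)
{
    assert(ring_pending(r) > 0);
    r->flush_index = (r->flush_index + 1) & (r->size - 1);
}
```

In this model each index has a single writer, so neither update needs to be an atomic read-modify-write. A check such as `ring_has_room(r, 1)` after posting corresponds to the idea, described in the series cover letter, of inspecting early whether the next packet fits and stopping the queue if it does not.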
    • octeon_ep: implement xmit_more in transmit · 373d9a55
      Shinas Rasheed authored
      Add xmit_more handling in tx datapath for octeon_ep pf.
      Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeon_ep: remove dma sync in trasmit path · 2fba5069
      Shinas Rasheed authored
      Cleanup dma sync calls for scatter gather
      mappings, since they are coherent allocations
      and do not need explicit sync to be called.
      Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeon_ep: add padding for small packets · 5827fe2b
      Shinas Rasheed authored
      Pad small packets to ETH_ZLEN before transmit, as hardware
      cannot pad and requires software padding to ensure
      minimum ethernet frame length.
      Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
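Software padding of short frames is simple to illustrate on a plain byte buffer. The sketch below is not the driver's code, which operates on socket buffers. `pad_to_eth_zlen()` is a hypothetical helper, and `ETH_ZLEN` is defined locally as 60, the minimum Ethernet frame length without the FCS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_ZLEN 60 /* minimum frame length in octets, excluding FCS */

/*
 * Zero-pad a frame of `len` bytes up to ETH_ZLEN inside a buffer of
 * capacity `cap`. Returns the resulting frame length, or 0 if the
 * buffer cannot hold a minimum-size frame.
 */
static inline size_t pad_to_eth_zlen(unsigned char *buf, size_t len,
                                     size_t cap)
{
    assert(buf != NULL && len <= cap);
    if (len >= ETH_ZLEN)
        return len; /* already long enough, nothing to do */
    if (cap < ETH_ZLEN)
        return 0;   /* no room to pad */

    memset(buf + len, 0, ETH_ZLEN - len);
    return ETH_ZLEN;
}
```

Zeroing the padding bytes, rather than leaving whatever was in the buffer, avoids sending stale memory contents on the wire.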
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 56eddc3c
      Paolo Abeni authored
      Cross-merge networking fixes after downstream PR.

      No conflicts.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge tag 'net-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 7475e51b
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from BPF and netfilter.
      
        Current release - regressions:
      
         - core: fix undefined behavior in netdev name allocation
      
         - bpf: do not allocate percpu memory at init stage
      
         - netfilter: nf_tables: split async and sync catchall in two
           functions
      
         - mptcp: fix possible NULL pointer dereference on close
      
        Current release - new code bugs:
      
         - eth: ice: dpll: fix initial lock status of dpll
      
        Previous releases - regressions:
      
         - bpf: fix precision backtracking instruction iteration
      
         - af_unix: fix use-after-free in unix_stream_read_actor()
      
         - tipc: fix kernel-infoleak due to uninitialized TLV value
      
         - eth: bonding: stop the device in bond_setup_by_slave()
      
         - eth: mlx5:
            - fix double free of encap_header
            - avoid referencing skb after free-ing in drop path
      
         - eth: hns3: fix VF reset
      
         - eth: mvneta: fix calls to page_pool_get_stats
      
        Previous releases - always broken:
      
         - core: set SOCK_RCU_FREE before inserting socket into hashtable
      
         - bpf: fix control-flow graph checking in privileged mode
      
         - eth: ppp: limit MRU to 64K
      
         - eth: stmmac: avoid rx queue overrun
      
         - eth: icssg-prueth: fix error cleanup on failing initialization
      
         - eth: hns3: fix out-of-bounds access may occur when coalesce info is
           read via debugfs
      
         - eth: cortina: handle large frames
      
        Misc:
      
         - selftests: gso: support CONFIG_MAX_SKB_FRAGS up to 45"
      
      * tag 'net-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (78 commits)
        macvlan: Don't propagate promisc change to lower dev in passthru
        net: sched: do not offload flows with a helper in act_ct
        net/mlx5e: Check return value of snprintf writing to fw_version buffer for representors
        net/mlx5e: Check return value of snprintf writing to fw_version buffer
        net/mlx5e: Reduce the size of icosq_str
        net/mlx5: Increase size of irq name buffer
        net/mlx5e: Update doorbell for port timestamping CQ before the software counter
        net/mlx5e: Track xmit submission to PTP WQ after populating metadata map
        net/mlx5e: Avoid referencing skb after free-ing in drop path of mlx5e_sq_xmit_wqe
        net/mlx5e: Don't modify the peer sent-to-vport rules for IPSec offload
        net/mlx5e: Fix pedit endianness
        net/mlx5e: fix double free of encap_header in update funcs
        net/mlx5e: fix double free of encap_header
        net/mlx5: Decouple PHC .adjtime and .adjphase implementations
        net/mlx5: DR, Allow old devices to use multi destination FTE
        net/mlx5: Free used cpus mask when an IRQ is released
        Revert "net/mlx5: DR, Supporting inline WQE when possible"
        bpf: Do not allocate percpu memory at init stage
        net: Fix undefined behavior in netdev name allocation
        dt-bindings: net: ethernet-controller: Fix formatting error
        ...
    • Merge tag 'for-linus-6.7a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 6eb1acd9
      Linus Torvalds authored
      Pull xen updates from Juergen Gross:
      
       - A fix in the Xen events driver avoiding the use of RCU after
         the call to rcu_report_dead() when taking a cpu down
      
       - A fix for running as Xen dom0 to line up ACPI's idea of power
         management capabilities with the one of Xen
      
       - A cleanup eliminating several kernel-doc warnings in Xen related
         code
      
       - A cleanup series of the Xen events driver
      
      * tag 'for-linus-6.7a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/events: remove some info_for_irq() calls in pirq handling
        xen/events: modify internal [un]bind interfaces
        xen/events: drop xen_allocate_irqs_dynamic()
        xen/events: remove some simple helpers from events_base.c
        xen/events: reduce externally visible helper functions
        xen/events: remove unused functions
        xen/events: fix delayed eoi list handling
        xen/shbuf: eliminate 17 kernel-doc warnings
        acpi/processor: sanitize _OSC/_PDC capabilities for Xen dom0
        xen/events: avoid using info_for_irq() in xen_send_IPI_one()
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 372bed5f
      Linus Torvalds authored
      Pull virtio fixes from Michael Tsirkin:
       "Bugfixes all over the place"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        vhost-vdpa: fix use after free in vhost_vdpa_probe()
        virtio_pci: Switch away from deprecated irq_set_affinity_hint
        riscv, qemu_fw_cfg: Add support for RISC-V architecture
        vdpa_sim_blk: allocate the buffer zeroed
        virtio_pci: move structure to a header
    • indirect_call_wrapper: Fix typo in INDIRECT_CALL_$NR kerneldoc · 3185d57c
      Tobias Klauser authored
      Fix a small typo in the kerneldoc comment of the INDIRECT_CALL_$NR
      macro.
      Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20231114104202.4680-1-tklauser@distanz.ch
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge tag 'nf-23-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · cff088d9
      Paolo Abeni authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Remove unused variable causing compilation warning in nft_set_rbtree,
         from Yang Li. This unused variable is a left over from previous
         merge window.
      
      2) Possible return of uninitialized in nf_conntrack_bridge, from
         Linkui Xiao. This is there since nf_conntrack_bridge is available.
      
      3) Fix incorrect pointer math in nft_byteorder, from Dan Carpenter.
         Problem has been there since 2016.
      
      4) Fix bogus error in destroy set element command. Problem is there
         since this new destroy command was added.
      
      5) Fix race condition in ipset between swap and destroy commands and
         add/del/test control plane. This problem is there since ipset was
         merged.
      
      6) Split async and sync catchall GC in two function to fix unsafe
         iteration over RCU. This is a fix-for-fix that was included in
         the previous pull request.
      
      * tag 'nf-23-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: split async and sync catchall in two functions
        netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test
        netfilter: nf_tables: bogus ENOENT when destroying element which does not exist
        netfilter: nf_tables: fix pointer math issue in nft_byteorder_eval()
        netfilter: nf_conntrack_bridge: initialize err to 0
        netfilter: nft_set_rbtree: Remove unused variable nft_net
      ====================
      
      Link: https://lore.kernel.org/r/20231115184514.8965-1-pablo@netfilter.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • macvlan: Don't propagate promisc change to lower dev in passthru · 7e1caeac
      Vlad Buslov authored
      Macvlan device in passthru mode sets its lower device promiscuous mode
      according to its MACVLAN_FLAG_NOPROMISC flag instead of synchronizing it to
      its own promiscuity setting. However, macvlan_change_rx_flags() function
      doesn't check the mode before propagating such changes to the lower device
      which can cause net_device->promiscuity counter overflow as illustrated by
      reproduction example [0] and resulting dmesg log [1]. Fix the issue by
      first verifying the mode in macvlan_change_rx_flags() function before
      propagating promiscuous mode change to the lower device.
      
      [0]:
      ip link add macvlan1 link enp8s0f0 type macvlan mode passthru
      ip link set macvlan1 promisc on
      ip l set dev macvlan1 up
      ip link set macvlan1 promisc off
      ip l set dev macvlan1 down
      ip l set dev macvlan1 up
      
      [1]:
      [ 5156.281724] macvlan1: entered promiscuous mode
      [ 5156.285467] mlx5_core 0000:08:00.0 enp8s0f0: entered promiscuous mode
      [ 5156.287639] macvlan1: left promiscuous mode
      [ 5156.288339] mlx5_core 0000:08:00.0 enp8s0f0: left promiscuous mode
      [ 5156.290907] mlx5_core 0000:08:00.0 enp8s0f0: entered promiscuous mode
      [ 5156.317197] mlx5_core 0000:08:00.0 enp8s0f0: promiscuity touches roof, set promiscuity failed. promiscuity feature of device might be broken.
      
      Fixes: efdbd2b3 ("macvlan: Propagate promiscuity setting to lower devices.")
      Reviewed-by: Gal Pressman <gal@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20231114175915.1649154-1-vladbu@nvidia.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: sched: do not offload flows with a helper in act_ct · 7cd5af0e
      Xin Long authored
      There is no hardware supporting ct helper offload. However, prior to this
      patch, a flower filter with a helper in the ct action can be successfully
      set into the HW, for example (eth1 is a bnxt NIC):
      
        # tc qdisc add dev eth1 ingress_block 22 ingress
        # tc filter add block 22 proto ip flower skip_sw ip_proto tcp \
          dst_port 21 ct_state -trk action ct helper ipv4-tcp-ftp
        # tc filter show dev eth1 ingress
      
          filter block 22 protocol ip pref 49152 flower chain 0 handle 0x1
            eth_type ipv4
            ip_proto tcp
            dst_port 21
            ct_state -trk
            skip_sw
            in_hw in_hw_count 1   <----
              action order 1: ct zone 0 helper ipv4-tcp-ftp pipe
               index 2 ref 1 bind 1
              used_hw_stats delayed
      
      This might cause the flower filter not to work as expected in the HW.
      
      This patch avoids this problem by simply returning -EOPNOTSUPP in
      tcf_ct_offload_act_setup() to not allow to offload flows with a helper
      in act_ct.
      
      Fixes: a21b06e7 ("net: sched: add helper support in act_ct")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Link: https://lore.kernel.org/r/f8685ec7702c4a448a1371a8b34b43217b583b9d.1699898008.git.lucien.xin@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'mlx5-fixes-2023-11-13-manual' · bdc454fc
      Jakub Kicinski authored
      Saeed Mahameed says:

      ====================
      This series provides bug fixes to mlx5 driver.
      ====================

      Link: https://lore.kernel.org/r/20231114215846.5902-1-saeed@kernel.org/
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/mlx5e: Check return value of snprintf writing to fw_version buffer for representors · 1b2bd0c0
      Rahul Rameshbabu authored
      Treat the operation as an error case when the return value is equivalent to
      the size of the name buffer. Failed to write null terminator to the name
      buffer, making the string malformed and should not be used. Provide a
      string with only the firmware version when forming the string with the
      board id fails. This logic for representors is identical to normal flow
      with ethtool.
      
      Without check, will trigger -Wformat-truncation with W=1.
      
          drivers/net/ethernet/mellanox/mlx5/core/en_rep.c: In function 'mlx5e_rep_get_drvinfo':
          drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:78:31: warning: '%.16s' directive output may be truncated writing up to 16 bytes into a region of size between 13 and 22 [-Wformat-truncation=]
            78 |                  "%d.%d.%04d (%.16s)",
               |                               ^~~~~
          drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:77:9: note: 'snprintf' output between 12 and 37 bytes into a destination of size 32
            77 |         snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
               |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            78 |                  "%d.%d.%04d (%.16s)",
               |                  ~~~~~~~~~~~~~~~~~~~~~
            79 |                  fw_rev_maj(mdev), fw_rev_min(mdev),
               |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            80 |                  fw_rev_sub(mdev), mdev->board_id);
               |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
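      
      The fallback described above can be sketched in userspace C. This is an
      illustration only, not the driver's code: format_fw_version() and its
      parameters are hypothetical, the version fields are assumed to fit in 16
      bits, and the check uses the standard C rule that a return value greater
      than or equal to the buffer size means the output was truncated.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the driver: write
 * "maj.min.sub (board_id)" into buf, falling back to "maj.min.sub"
 * when the full string does not fit.
 * Returns 0 on success, -1 if even the short form is truncated.
 */
static int format_fw_version(char *buf, size_t size, unsigned short maj,
			     unsigned short min, unsigned short sub,
			     const char *board_id)
{
	int count;

	count = snprintf(buf, size, "%d.%d.%04d (%.16s)",
			 maj, min, sub, board_id);
	if (count >= 0 && (size_t)count < size)
		return 0;	/* whole string, including the NUL, fit */

	/* Truncated or failed: drop the board id and retry. */
	count = snprintf(buf, size, "%d.%d.%04d", maj, min, sub);
	return (count >= 0 && (size_t)count < size) ? 0 : -1;
}
```

      With a 32-byte buffer, a short board id is kept in the string, while
      five-digit version fields combined with a 16-character board id make the
      helper fall back to the version-only form.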
      
      Fixes: cf83c8fd ("net/mlx5e: Add missing ethtool driver info for representors")
      Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d4ab2e97dcfbcd748ae71761a9d8e5e41cc732c
      Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20231114215846.5902-16-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1b2bd0c0
    • Rahul Rameshbabu's avatar
      net/mlx5e: Check return value of snprintf writing to fw_version buffer · 41e63c2b
      Rahul Rameshbabu authored
      Treat the operation as an error case when the return value is equal to
      the size of the name buffer. In that case the null terminator could not
      be written to the buffer, so the string is malformed and should not be
      used. When forming the string with the board id fails, provide a string
      with only the firmware version.
      
      Without this check, the build triggers -Wformat-truncation with W=1.
      
          drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c: In function 'mlx5e_ethtool_get_drvinfo':
          drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:49:31: warning: '%.16s' directive output may be truncated writing up to 16 bytes into a region of size between 13 and 22 [-Wformat-truncation=]
            49 |                  "%d.%d.%04d (%.16s)",
               |                               ^~~~~
          drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:48:9: note: 'snprintf' output between 12 and 37 bytes into a destination of size 32
            48 |         snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
               |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            49 |                  "%d.%d.%04d (%.16s)",
               |                  ~~~~~~~~~~~~~~~~~~~~~
            50 |                  fw_rev_maj(mdev), fw_rev_min(mdev), fw_rev_sub(mdev),
               |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            51 |                  mdev->board_id);
               |                  ~~~~~~~~~~~~~~~
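      
      The "between 12 and 37 bytes" range in the note above can be reproduced
      by asking snprintf() for the required length without writing anything.
      This is a rough sketch: fw_version_len() is a hypothetical helper, and
      the 16-bit width of the version fields is an assumption that happens to
      be consistent with the upper bound GCC reports.

```c
#include <stdio.h>

/*
 * Hypothetical helper: number of characters the format would produce,
 * not counting the terminating NUL. Passing a NULL buffer with size 0
 * makes snprintf() only compute the length.
 */
static int fw_version_len(unsigned short maj, unsigned short min,
			  unsigned short sub, const char *board_id)
{
	return snprintf(NULL, 0, "%d.%d.%04d (%.16s)",
			maj, min, sub, board_id);
}
```

      Shortest case: 1 + 1 + 1 + 1 + 4 digits and dots plus " ()" gives 11
      characters, or 12 bytes with the NUL. Longest case: 5 + 1 + 5 + 1 + 5
      plus " (" plus 16 plus ")" gives 36 characters, or 37 bytes with the
      NUL, which does not fit in the 32-byte destination.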
      
      Fixes: 84e11edb ("net/mlx5e: Show board id in ethtool driver information")
      Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d4ab2e97dcfbcd748ae71761a9d8e5e41cc732c
      Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      41e63c2b
    • Saeed Mahameed's avatar
      net/mlx5e: Reduce the size of icosq_str · dce94142
      Saeed Mahameed authored
      icosq_str is unnecessarily large, which causes a -Wformat-truncation
      build warning with W=1. It doesn't need to be 255B, so this patch
      reduces the size to 32B, which is more than enough to hold the string
      "ICOSQ: 0x%x, ".
      
      While here, add a missing space in the formatted string.
      
      This fixes the following build warning:
      
      $ KCFLAGS='-Wall -Werror'
      $ make O=/tmp/kbuild/linux W=1 -s -j12 drivers/net/ethernet/mellanox/mlx5/core/
      
      drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c: In function 'mlx5e_reporter_rx_timeout':
      drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c:718:56:
      error: ', CQ: 0x' directive output may be truncated writing 8 bytes into a region of size between 0 and 255 [-Werror=format-truncation=]
        718 |                  "RX timeout on channel: %d, %sRQ: 0x%x, CQ: 0x%x",
            |                                                        ^~~~~~~~
      drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c:717:9: note: 'snprintf' output between 43 and 322 bytes into a destination of size 288
        717 |         snprintf(err_str, sizeof(err_str),
            |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        718 |                  "RX timeout on channel: %d, %sRQ: 0x%x, CQ: 0x%x",
            |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        719 |                  rq->ix, icosq_str, rq->rqn, rq->cq.mcq.cqn);
            |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
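      
      A minimal userspace sketch of the pattern, assuming a 32-bit unsigned
      int: the optional "ICOSQ: 0x%x, " prefix is built in a small buffer and
      then spliced into the message through "%s". format_rx_timeout() and its
      parameters are hypothetical and only mirror the format string shown in
      the warning.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns what the final snprintf() returns. */
static int format_rx_timeout(char *err_str, size_t size, int ix,
			     int has_icosq, unsigned int icosq_sqn,
			     unsigned int rqn, unsigned int cqn)
{
	/*
	 * "ICOSQ: 0x" (9) + at most 8 hex digits + ", " (2) + NUL
	 * is 20 bytes at most, so 32 bytes leaves headroom.
	 */
	char icosq_str[32] = "";

	if (has_icosq)
		snprintf(icosq_str, sizeof(icosq_str), "ICOSQ: 0x%x, ",
			 icosq_sqn);

	/* An empty prefix simply disappears from the output. */
	return snprintf(err_str, size,
			"RX timeout on channel: %d, %sRQ: 0x%x, CQ: 0x%x",
			ix, icosq_str, rqn, cqn);
}
```

      Because the prefix buffer is now bounded well below the size of the
      destination, the compiler can prove that the combined message fits.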
      
      Fixes: 521f31af ("net/mlx5e: Allow RQ outside of channel context")
      Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d4ab2e97dcfbcd748ae71761a9d8e5e41cc732c
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20231114215846.5902-14-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      dce94142
    • Rahul Rameshbabu's avatar
      net/mlx5: Increase size of irq name buffer · 3338bebf
      Rahul Rameshbabu authored
      Without the increased buffer size, the snprintf operation writing to the
      buffer triggers -Wformat-truncation with W=1.
      
          drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c: In function 'mlx5_irq_alloc':
          drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:296:7: error: '@pci:' directive output may be truncated writing 5 bytes into a region of size between 1 and 32 [-Werror=format-truncation=]
            296 |    "%s@pci:%s", name, pci_name(dev->pdev));
                |       ^~~~~
          drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:295:2: note: 'snprintf' output 6 or more bytes (assuming 37) into a destination of size 32
            295 |  snprintf(irq->name, MLX5_MAX_IRQ_NAME,
                |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            296 |    "%s@pci:%s", name, pci_name(dev->pdev));
                |    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
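      
      One way to avoid this class of warning is to size the destination from
      the bounds of its parts. The sketch below is a userspace illustration
      under assumed limits: the TOY_* constants and toy_irq_name() are
      hypothetical and are not the values or names used by the driver.

```c
#include <stdio.h>
#include <string.h>

#define TOY_IRQ_NAME_MAX	32	/* assumed bound on the base name, NUL included */
#define TOY_PCI_NAME_MAX	32	/* assumed bound on the PCI name, NUL included */
/* Both parts plus "@pci:"; sizeof("@pci:") already counts one NUL. */
#define TOY_IRQ_FULL_NAME_MAX \
	(TOY_IRQ_NAME_MAX + TOY_PCI_NAME_MAX + sizeof("@pci:"))

/* Hypothetical helper: returns 0 on success, -1 if the result would not fit. */
static int toy_irq_name(char *out, size_t size, const char *name,
			const char *pci_name)
{
	int n;

	if (strlen(name) >= TOY_IRQ_NAME_MAX ||
	    strlen(pci_name) >= TOY_PCI_NAME_MAX)
		return -1;

	n = snprintf(out, size, "%s@pci:%s", name, pci_name);
	return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

      A destination of TOY_IRQ_FULL_NAME_MAX bytes always has room for any
      pair of inputs that pass the length checks, whereas a 32-byte
      destination can be exhausted by the base name alone.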
      
      Fixes: ada9f5d0 ("IB/mlx5: Fix eq names to display nicely in /proc/interrupts")
      Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d4ab2e97dcfbcd748ae71761a9d8e5e41cc732c
      Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20231114215846.5902-13-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3338bebf
    • Rahul Rameshbabu's avatar
      net/mlx5e: Update doorbell for port timestamping CQ before the software counter · 92214be5
      Rahul Rameshbabu authored
      Previously, mlx5e_ptp_poll_ts_cq would update the device doorbell with the
      incremented consumer index after the relevant software counters in the
      kernel were updated. In the mlx5e_sq_xmit_wqe context, this would lead to
      either overrunning the device CQ or exceeding the expected software buffer
      size in the device CQ if the device CQ size was greater than the software
      buffer size. Update the relevant software counter only after updating the
      device CQ consumer index in the port timestamping napi_poll context.
      
      Log:
          mlx5_core 0000:08:00.0: cq_err_event_notifier:517:(pid 0): CQ error on CQN 0x487, syndrome 0x1
          mlx5_core 0000:08:00.0 eth2: mlx5e_cq_error_event: cqn=0x000487 event=0x04
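      
      Why the ordering matters can be shown with a single-threaded toy model.
      Everything here is hypothetical (struct toy_cq, toy_xmit(), toy_poll(),
      toy_run()) and heavily simplified: each submission is assumed to
      complete immediately, a concurrent transmit is simulated by calling
      toy_xmit() between the two updates, and the memory ordering a real
      driver would need is not modeled.

```c
#include <stdbool.h>

struct toy_cq {
	unsigned int size;	/* number of CQE slots in the device CQ */
	unsigned int hw_pi;	/* CQEs the device has written so far */
	unsigned int db_ci;	/* consumer index published via the doorbell */
	unsigned int sw_cc;	/* software counter the transmit path reads */
	bool overrun;		/* set when the device finds no free slot */
};

/* Transmit path: submit if software sees room; the device then writes a CQE. */
static void toy_xmit(struct toy_cq *cq)
{
	if (cq->hw_pi - cq->sw_cc >= cq->size)
		return;			/* software view: no room */
	if (cq->hw_pi - cq->db_ci >= cq->size)
		cq->overrun = true;	/* device view: CQ is still full */
	cq->hw_pi++;
}

/* Poll path: consume n CQEs, with a transmit racing between the two updates. */
static void toy_poll(struct toy_cq *cq, unsigned int n, bool doorbell_first)
{
	if (doorbell_first) {
		cq->db_ci += n;		/* tell the device first */
		toy_xmit(cq);
		cq->sw_cc += n;		/* then let software see the room */
	} else {
		cq->sw_cc += n;		/* software sees room too early */
		toy_xmit(cq);
		cq->db_ci += n;
	}
}

/* Fill a 4-entry CQ, poll one CQE, and report whether an overrun occurred. */
static bool toy_run(bool doorbell_first)
{
	struct toy_cq cq = { .size = 4 };

	for (int i = 0; i < 4; i++)
		toy_xmit(&cq);
	toy_poll(&cq, 1, doorbell_first);
	return cq.overrun;
}
```

      In this model, advancing the software counter first lets the racing
      transmit post into a CQ the device still considers full, while updating
      the doorbell first makes the same transmit back off.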
      
      Fixes: 1880bc4e ("net/mlx5e: Add TX port timestamp support")
      Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20231114215846.5902-12-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      92214be5
    • Rahul Rameshbabu's avatar
      net/mlx5e: Track xmit submission to PTP WQ after populating metadata map · 7e3f3ba9
      Rahul Rameshbabu authored
      Ensure the skb is available in metadata mapping to skbs before tracking the
      metadata index for detecting undelivered CQEs. If the metadata index is put
      in the tracking list before putting the skb in the map, the metadata index
      might be used for detecting undelivered CQEs before the relevant skb is
      available in the map, which can lead to a null-ptr-deref.
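      
      The publish-before-track ordering can be illustrated with a small toy
      model. All names below (struct toy_ptpsq, toy_scan_ok(), toy_submit())
      are hypothetical stand-ins rather than the driver's data structures, and
      the concurrent reader is simulated by running the scan between the two
      steps of a submission.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 4

struct toy_skb {
	int len;
};

struct toy_ptpsq {
	struct toy_skb *map[TOY_SLOTS];	/* metadata index -> skb */
	int tracked[TOY_SLOTS];		/* indices awaiting a CQE */
	int n_tracked;
};

/*
 * Stand-in for the undelivered-CQE check: it follows every tracked index
 * into the map. Returns false if it would dereference a NULL skb.
 */
static bool toy_scan_ok(const struct toy_ptpsq *sq)
{
	for (int i = 0; i < sq->n_tracked; i++)
		if (sq->map[sq->tracked[i]] == NULL)
			return false;
	return true;
}

/* Submit one skb and return what a reader racing with the submit observed. */
static bool toy_submit(struct toy_ptpsq *sq, int idx, struct toy_skb *skb,
		       bool map_first)
{
	bool ok;

	if (map_first) {
		sq->map[idx] = skb;			/* publish the skb */
		ok = toy_scan_ok(sq);
		sq->tracked[sq->n_tracked++] = idx;	/* then track the index */
	} else {
		sq->tracked[sq->n_tracked++] = idx;	/* index visible too early */
		ok = toy_scan_ok(sq);
		sq->map[idx] = skb;
	}
	return ok;
}
```

      When the index is tracked first, the simulated reader finds a tracked
      index whose map slot is still NULL; when the map is populated first, the
      reader never sees an index without its skb.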
      
      Log:
          general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] SMP KASAN
          KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
          CPU: 0 PID: 1243 Comm: kworker/0:2 Not tainted 6.6.0-rc4+ #108
          Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
          Workqueue: events mlx5e_rx_dim_work [mlx5_core]
          RIP: 0010:mlx5e_ptp_napi_poll+0x9a4/0x2290 [mlx5_core]
          Code: 8c 24 38 cc ff ff 4c 8d 3c c1 4c 89 f9 48 c1 e9 03 42 80 3c 31 00 0f 85 97 0f 00 00 4d 8b 3f 49 8d 7f 28 48 89 f9 48 c1 e9 03 <42> 80 3c 31 00 0f 85 8b 0f 00 00 49 8b 47 28 48 85 c0 0f 84 05 07
          RSP: 0018:ffff8884d3c09c88 EFLAGS: 00010206
          RAX: 0000000000000069 RBX: ffff8881160349d8 RCX: 0000000000000005
          RDX: ffffed10218f48cf RSI: 0000000000000004 RDI: 0000000000000028
          RBP: ffff888122707700 R08: 0000000000000001 R09: ffffed109a781383
          R10: 0000000000000003 R11: 0000000000000003 R12: ffff88810c7a7a40
          R13: ffff888122707700 R14: dffffc0000000000 R15: 0000000000000000
          FS:  0000000000000000(0000) GS:ffff8884d3c00000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 00007f4f878dd6e0 CR3: 000000014d108002 CR4: 0000000000370eb0
          DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
          Call Trace:
          <IRQ>
          ? die_addr+0x3c/0xa0
          ? exc_general_protection+0x144/0x210
          ? asm_exc_general_protection+0x22/0x30
          ? mlx5e_ptp_napi_poll+0x9a4/0x2290 [mlx5_core]
          ? mlx5e_ptp_napi_poll+0x8f6/0x2290 [mlx5_core]
          __napi_poll.constprop.0+0xa4/0x580
          net_rx_action+0x460/0xb80
          ? _raw_spin_unlock_irqrestore+0x32/0x60
          ? __napi_poll.constprop.0+0x580/0x580
          ? tasklet_action_common.isra.0+0x2ef/0x760
          __do_softirq+0x26c/0x827
          irq_exit_rcu+0xc2/0x100
          common_interrupt+0x7f/0xa0
          </IRQ>
          <TASK>
          asm_common_interrupt+0x22/0x40
          RIP: 0010:__kmem_cache_alloc_node+0xb/0x330
          Code: 41 5d 41 5e 41 5f c3 8b 44 24 14 8b 4c 24 10 09 c8 eb d5 e8 b7 43 ca 01 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 57 <41> 56 41 89 d6 41 55 41 89 f5 41 54 49 89 fc 53 48 83 e4 f0 48 83
          RSP: 0018:ffff88812c4079c0 EFLAGS: 00000246
          RAX: 1ffffffff083c7fe RBX: ffff888100042dc0 RCX: 0000000000000218
          RDX: 00000000ffffffff RSI: 0000000000000dc0 RDI: ffff888100042dc0
          RBP: ffff88812c4079c8 R08: ffffffffa0289f96 R09: ffffed1025880ea9
          R10: ffff888138839f80 R11: 0000000000000002 R12: 0000000000000dc0
          R13: 0000000000000100 R14: 000000000000008c R15: ffff8881271fc450
          ? cmd_exec+0x796/0x2200 [mlx5_core]
          kmalloc_trace+0x26/0xc0
          cmd_exec+0x796/0x2200 [mlx5_core]
          mlx5_cmd_do+0x22/0xc0 [mlx5_core]
          mlx5_cmd_exec+0x17/0x30 [mlx5_core]
          mlx5_core_modify_cq_moderation+0x139/0x1b0 [mlx5_core]
          ? mlx5_add_cq_to_tasklet+0x280/0x280 [mlx5_core]
          ? lockdep_set_lock_cmp_fn+0x190/0x190
          ? process_one_work+0x659/0x1220
          mlx5e_rx_dim_work+0x9d/0x100 [mlx5_core]
          process_one_work+0x730/0x1220
          ? lockdep_hardirqs_on_prepare+0x400/0x400
          ? max_active_store+0xf0/0xf0
          ? assign_work+0x168/0x240
          worker_thread+0x70f/0x12d0
          ? __kthread_parkme+0xd1/0x1d0
          ? process_one_work+0x1220/0x1220
          kthread+0x2d9/0x3b0
          ? kthread_complete_and_exit+0x20/0x20
          ret_from_fork+0x2d/0x70
          ? kthread_complete_and_exit+0x20/0x20
          ret_from_fork_asm+0x11/0x20
          </TASK>
          Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay mlx5_ib ib_uverbs ib_core zram zsmalloc mlx5_core fuse
          ---[ end trace 0000000000000000 ]---
      
      Fixes: 3178308a ("net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs")
      Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20231114215846.5902-11-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7e3f3ba9