1. 18 Apr, 2024 10 commits
  2. 17 Apr, 2024 5 commits
    • Gerd Bayer's avatar
      s390/ism: Properly fix receive message buffer allocation · 83781384
      Gerd Bayer authored
      Since [1], dma_alloc_coherent() does not accept requests for GFP_COMP
      anymore, even on archs that may be able to fulfill this. Functionality that
      relied on the receive buffer being a compound page broke at that point:
      The SMC-D protocol, that utilizes the ism device driver, passes receive
      buffers to the splice processor in a struct splice_pipe_desc with a
      single entry list of struct pages. As the buffer is no longer a compound
      page, the splice processor now rejects requests to handle more than a
      page worth of data.
      
      Replace dma_alloc_coherent() and allocate a buffer with folio_alloc and
      create a DMA map for it with dma_map_page(). Since only receive buffers
      on ISM devices use DMA, qualify the mapping as FROM_DEVICE.
      Since ISM devices are available on arch s390, only, and on that arch all
      DMA is coherent, there is no need to introduce and export some kind of
      dma_sync_to_cpu() method to be called by the SMC-D protocol layer.
      
      Analogously, replace dma_free_coherent by a two step dma_unmap_page,
      then folio_put to free the receive buffer.
      
      [1] https://lore.kernel.org/all/20221113163535.884299-1-hch@lst.de/
      
      Fixes: c08004ee ("s390/ism: don't pass bogus GFP_ flags to dma_alloc_coherent")
      Signed-off-by: default avatarGerd Bayer <gbayer@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      83781384
    • David S. Miller's avatar
      Merge branch 'mt7530-fixes' · cb178ccb
      David S. Miller authored
      Merge branch 'mr7530-fixes'
      
      Arınç ÜNAL says:
      
      ====================
      Fix port mirroring on MT7530 DSA subdriver
      
      This patch series fixes the frames received on the local port (monitor
      port) not being mirrored, and port mirroring for the MT7988 SoC switch.
      ====================
      Signed-off-by: default avatarArınç ÜNAL <arinc.unal@arinc9.com>
      cb178ccb
    • Arınç ÜNAL's avatar
      net: dsa: mt7530: fix port mirroring for MT7988 SoC switch · 2c606d13
      Arınç ÜNAL authored
      The "MT7988A Wi-Fi 7 Generation Router Platform: Datasheet (Open Version)
      v0.1" document shows bits 16 to 18 as the MIRROR_PORT field of the CPU
      forward control register. Currently, the MT7530 DSA subdriver configures
      bits 0 to 2 of the CPU forward control register which breaks the port
      mirroring feature for the MT7988 SoC switch.
      
      Fix this by using the MT7531_MIRROR_PORT_GET() and MT7531_MIRROR_PORT_SET()
      macros which utilise the correct bits.
      
      Fixes: 110c18bf ("net: dsa: mt7530: introduce driver for MT7988 built-in switch")
      Signed-off-by: default avatarArınç ÜNAL <arinc.unal@arinc9.com>
      Acked-by: default avatarDaniel Golle <daniel@makrotopia.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2c606d13
    • Arınç ÜNAL's avatar
      net: dsa: mt7530: fix mirroring frames received on local port · d59cf049
      Arınç ÜNAL authored
      This switch intellectual property provides a bit on the ARL global control
      register which controls allowing mirroring frames which are received on the
      local port (monitor port). This bit is unset after reset.
      
      This ability must be enabled to fully support the port mirroring feature on
      this switch intellectual property.
      
      Therefore, this patch fixes the traffic not being reflected on a port,
      which would be configured like below:
      
        tc qdisc add dev swp0 clsact
      
        tc filter add dev swp0 ingress matchall skip_sw \
        action mirred egress mirror dev swp0
      
      As a side note, this configuration provides the hairpinning feature for a
      single port.
      
      Fixes: 37feab60 ("net: dsa: mt7530: add support for port mirroring")
      Signed-off-by: default avatarArınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d59cf049
    • Lei Chen's avatar
      tun: limit printing rate when illegal packet received by tun dev · f8bbc07a
      Lei Chen authored
      vhost_worker will call tun call backs to receive packets. If too many
      illegal packets arrives, tun_do_read will keep dumping packet contents.
      When console is enabled, it will costs much more cpu time to dump
      packet and soft lockup will be detected.
      
      net_ratelimit mechanism can be used to limit the dumping rate.
      
      PID: 33036    TASK: ffff949da6f20000  CPU: 23   COMMAND: "vhost-32980"
       #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253
       #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3
       #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e
       #3 [fffffe00003fced0] do_nmi at ffffffff8922660d
       #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663
          [exception RIP: io_serial_in+20]
          RIP: ffffffff89792594  RSP: ffffa655314979e8  RFLAGS: 00000002
          RAX: ffffffff89792500  RBX: ffffffff8af428a0  RCX: 0000000000000000
          RDX: 00000000000003fd  RSI: 0000000000000005  RDI: ffffffff8af428a0
          RBP: 0000000000002710   R8: 0000000000000004   R9: 000000000000000f
          R10: 0000000000000000  R11: ffffffff8acbf64f  R12: 0000000000000020
          R13: ffffffff8acbf698  R14: 0000000000000058  R15: 0000000000000000
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #5 [ffffa655314979e8] io_serial_in at ffffffff89792594
       #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470
       #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6
       #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605
       #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558
       #10 [ffffa65531497ac8] console_unlock at ffffffff89316124
       #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07
       #12 [ffffa65531497b68] printk at ffffffff89318306
       #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765
       #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun]
       #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun]
       #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net]
       #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost]
       #18 [ffffa65531497f10] kthread at ffffffff892d2e72
       #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f
      
      Fixes: ef3db4a5 ("tun: avoid BUG, dump packet on GSO errors")
      Signed-off-by: default avatarLei Chen <lei.chen@smartx.com>
      Reviewed-by: default avatarWillem de Bruijn <willemb@google.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Link: https://lore.kernel.org/r/20240415020247.2207781-1-lei.chen@smartx.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      f8bbc07a
  3. 16 Apr, 2024 12 commits
    • Marcin Szycik's avatar
      ice: Fix checking for unsupported keys on non-tunnel device · 2cca35f5
      Marcin Szycik authored
      Add missing FLOW_DISSECTOR_KEY_ENC_* checks to TC flower filter parsing.
      Without these checks, it would be possible to add filters with tunnel
      options on non-tunnel devices. enc_* options are only valid for tunnel
      devices.
      
      Example:
        devlink dev eswitch set $PF1_PCI mode switchdev
        echo 1 > /sys/class/net/$PF1/device/sriov_numvfs
        tc qdisc add dev $VF1_PR ingress
        ethtool -K $PF1 hw-tc-offload on
        tc filter add dev $VF1_PR ingress flower enc_ttl 12 skip_sw action drop
      
      Fixes: 9e300987 ("ice: VXLAN and Geneve TC support")
      Reviewed-by: default avatarMichal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: default avatarMarcin Szycik <marcin.szycik@linux.intel.com>
      Reviewed-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Tested-by: default avatarSujai Buvaneswaran <sujai.buvaneswaran@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      2cca35f5
    • Michal Swiatkowski's avatar
      ice: tc: allow zero flags in parsing tc flower · 73278715
      Michal Swiatkowski authored
      The check for flags is done to not pass empty lookups to adding switch
      rule functions. Since metadata is always added to lookups there is no
      need to check against the flag.
      
      It is also fixing the problem with such rule:
      $ tc filter add dev gtp_dev ingress protocol ip prio 0 flower \
      	enc_dst_port 2123 action drop
      Switch block in case of GTP can't parse the destination port, because it
      should always be set to GTP specific value. The same with ethertype. The
      result is that there is no other matching criteria than GTP tunnel. In
      this case flags is 0, rule can't be added only because of defensive
      check against flags.
      
      Fixes: 9a225f81 ("ice: Support GTP-U and GTP-C offload in switchdev")
      Reviewed-by: default avatarWojciech Drewek <wojciech.drewek@intel.com>
      Signed-off-by: default avatarMichal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Tested-by: default avatarSujai Buvaneswaran <sujai.buvaneswaran@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      73278715
    • Michal Swiatkowski's avatar
      ice: tc: check src_vsi in case of traffic from VF · 42805160
      Michal Swiatkowski authored
      In case of traffic going from the VF (so ingress for port representor)
      source VSI should be consider during packet classification. It is
      needed for hardware to not match packets from different ports with
      filters added on other port.
      
      It is only for "from VF" traffic, because other traffic direction
      doesn't have source VSI.
      
      Set correct ::src_vsi in rule_info to pass it to the hardware filter.
      
      For example this rule should drop only ipv4 packets from eth10, not from
      the others VF PRs. It is needed to check source VSI in this case.
      $tc filter add dev eth10 ingress protocol ip flower skip_sw action drop
      
      Fixes: 0d08a441 ("ice: ndo_setup_tc implementation for PF")
      Reviewed-by: default avatarJedrzej Jagielski <jedrzej.jagielski@intel.com>
      Reviewed-by: default avatarSridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: default avatarMichal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Tested-by: default avatarSujai Buvaneswaran <sujai.buvaneswaran@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      42805160
    • Paolo Abeni's avatar
      Merge branch 'net-stmmac-fix-mac-capabilities-procedure' · e226eade
      Paolo Abeni authored
      Serge Semin says:
      
      ====================
      net: stmmac: Fix MAC-capabilities procedure
      
      The series got born as a result of the discussions around the recent
      Yanteng' series adding the Loongson LS7A1000, LS2K1000, LS7A2000, LS2K2000
      MACs support:
      Link: https://lore.kernel.org/netdev/fu3f6uoakylnb6eijllakeu5i4okcyqq7sfafhp5efaocbsrwe@w74xe7gb6x7p
      
      In particular the Yanteng' patchset needed to implement the Loongson
      MAC-specific constraints applied to the link speed and link duplex mode.
      As a result of the discussion with Russel the next preliminary patch was
      born:
      Link: https://lore.kernel.org/netdev/df31e8bcf74b3b4ddb7ddf5a1c371390f16a2ad5.1712917541.git.siyanteng@loongson.cn
      
      The patch above was a temporal solution utilized by Yanteng for further
      developments and to move on with the on-going review. This patchset is a
      refactored version of that single patch with formatting required for the
      fixes patches.
      
      In particular the series starts with fixing the half-duplex-less
      constraint currently applied for all IP-cores. In fact it's specific for
      the DW QoS Eth only (DW GMAC v4.x/v5.x).
      
      The next patch fixes the MAC-capabilities setting up during the active
      Tx/Rx queues re-initialization procedure. Particularly the procedure
      missed the max-speed limit thus possibly activating speeds prohibited on
      the respective platforms.
      
      Third patch fixes the incorrect MAC-capabilities initialization for DW
      MAC100, DW XGMAC and DW XLGMAC devices by moving the correct
      initialization to the IP-core specific setup() methods.
      
      That's it for now. Thanks for review and testing in advance.
      Signed-off-by: default avatarSerge Semin <fancer.lancer@gmail.com>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Cc: Simon Horman <horms@kernel.org>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Chen-Yu Tsai <wens@csie.org>
      Cc: Jernej Skrabec <jernej.skrabec@gmail.com>
      Cc: Samuel Holland <samuel@sholland.org>
      Cc: netdev@vger.kernel.org
      Cc: linux-stm32@st-md-mailman.stormreply.com
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-sunxi@lists.linux.dev
      Cc: linux-kernel@vger.kernel.org
      ====================
      
      Link: https://lore.kernel.org/r/20240412180340.7965-1-fancer.lancer@gmail.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      e226eade
    • Serge Semin's avatar
      net: stmmac: Fix IP-cores specific MAC capabilities · 9cb54af2
      Serge Semin authored
      Here is the list of the MAC capabilities specific to the particular DW MAC
      IP-cores currently supported by the driver:
      
      DW MAC100: MAC_ASYM_PAUSE | MAC_SYM_PAUSE |
      	   MAC_10 | MAC_100
      
      DW GMAC:  MAC_ASYM_PAUSE | MAC_SYM_PAUSE |
                MAC_10 | MAC_100 | MAC_1000
      
      Allwinner sun8i MAC: MAC_ASYM_PAUSE | MAC_SYM_PAUSE |
                           MAC_10 | MAC_100 | MAC_1000
      
      DW QoS Eth: MAC_ASYM_PAUSE | MAC_SYM_PAUSE |
                  MAC_10 | MAC_100 | MAC_1000 | MAC_2500FD
      if there is more than 1 active Tx/Rx queues:
      	   MAC_ASYM_PAUSE | MAC_SYM_PAUSE |
                 MAC_10FD | MAC_100FD | MAC_1000FD | MAC_2500FD
      
      DW XGMAC: MAC_ASYM_PAUSE | MAC_SYM_PAUSE |
                MAC_1000FD | MAC_2500FD | MAC_5000FD | MAC_10000FD
      
      DW XLGMAC: MAC_ASYM_PAUSE | MAC_SYM_PAUSE |
                MAC_1000FD | MAC_2500FD | MAC_5000FD | MAC_10000FD |
                MAC_25000FD | MAC_40000FD | MAC_50000FD | MAC_100000FD
      
      As you can see there are only two common capabilities:
      MAC_ASYM_PAUSE | MAC_SYM_PAUSE.
      Meanwhile what is currently implemented defines 10/100/1000 link speeds
      for all IP-cores, which is definitely incorrect for DW MAC100, DW XGMAC
      and DW XLGMAC devices.
      
      Seeing the flow-control is implemented as a callback for each MAC IP-core
      (see dwmac100_flow_ctrl(), dwmac1000_flow_ctrl(), sun8i_dwmac_flow_ctrl(),
      etc) and since the MAC-specific setup() method is supposed to be called
      for each available DW MAC-based device, the capabilities initialization
      can be freely moved to these setup() functions, thus correctly setting up
      the MAC-capabilities for each IP-core (including the Allwinner Sun8i). A
      new stmmac_link::caps field was specifically introduced for that so to
      have all link-specific info preserved in a single structure.
      
      Note the suggested change fixes three earlier commits at a time. The
      commit 5b0d7d7d ("net: stmmac: Add the missing speeds that XGMAC
      supports") permitted the 10-100 link speeds and 1G half-duplex mode for DW
      XGMAC IP-core even though it doesn't support them. The commit df7699c7
      ("net: stmmac: Do not cut down 1G modes") incorrectly added the MAC1000
      capability to the DW MAC100 IP-core. Similarly to the DW XGMAC the commit
      8a880936 ("net: stmmac: Add XLGMII support") incorrectly permitted the
      10-100 link speeds and 1G half-duplex mode for DW XLGMAC IP-core.
      
      Fixes: 5b0d7d7d ("net: stmmac: Add the missing speeds that XGMAC supports")
      Fixes: df7699c7 ("net: stmmac: Do not cut down 1G modes")
      Fixes: 8a880936 ("net: stmmac: Add XLGMII support")
      Suggested-by: default avatarRussell King (Oracle) <linux@armlinux.org.uk>
      Signed-off-by: default avatarSerge Semin <fancer.lancer@gmail.com>
      Reviewed-by: default avatarRomain Gantois <romain.gantois@bootlin.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      9cb54af2
    • Serge Semin's avatar
      net: stmmac: Fix max-speed being ignored on queue re-init · 59c3d6ca
      Serge Semin authored
      It's possible to have the maximum link speed being artificially limited on
      the platform-specific basis. It's done either by setting up the
      plat_stmmacenet_data::max_speed field or by specifying the "max-speed"
      DT-property. In such cases it's required that any specific
      MAC-capabilities re-initializations would take the limit into account. In
      particular the link speed capabilities may change during the number of
      active Tx/Rx queues re-initialization. But the currently implemented
      procedure doesn't take the speed limit into account.
      
      Fix that by calling phylink_limit_mac_speed() in the
      stmmac_reinit_queues() method if the speed limitation was required in the
      same way as it's done in the stmmac_phy_setup() function.
      
      Fixes: 95201f36 ("net: stmmac: update MAC capabilities when tx queues are updated")
      Signed-off-by: default avatarSerge Semin <fancer.lancer@gmail.com>
      Reviewed-by: default avatarRomain Gantois <romain.gantois@bootlin.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      59c3d6ca
    • Serge Semin's avatar
      net: stmmac: Apply half-duplex-less constraint for DW QoS Eth only · 0ebd96f5
      Serge Semin authored
      There are three DW MAC IP-cores which can have the multiple Tx/Rx queues
      enabled:
      DW GMAC v3.7+ with AV feature,
      DW QoS Eth v4.x/v5.x,
      DW XGMAC/XLGMAC
      Based on the respective HW databooks, only the DW QoS Eth IP-core doesn't
      support the half-duplex link mode in case if more than one queues enabled:
      
      "In multiple queue/channel configurations, for half-duplex operation,
      enable only the Q0/CH0 on Tx and Rx. For single queue/channel in
      full-duplex operation, any queue/channel can be enabled."
      
      The rest of the IP-cores don't have such constraint. Thus in order to have
      the constraint applied for the DW QoS Eth MACs only, let's move the it'
      implementation to the respective MAC-capabilities getter and make sure the
      getter is called in the queues re-init procedure.
      
      Fixes: b6cfffa7 ("stmmac: fix DMA channel hang in half-duplex mode")
      Signed-off-by: default avatarSerge Semin <fancer.lancer@gmail.com>
      Reviewed-by: default avatarRomain Gantois <romain.gantois@bootlin.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      0ebd96f5
    • Paolo Abeni's avatar
      Merge branch 'selftests-net-tcp_ao-a-bunch-of-fixes-for-tcp-ao-selftests' · 24f4c99e
      Paolo Abeni authored
      Dmitry Safonov via says:
      
      ====================
      selftests/net/tcp_ao: A bunch of fixes for TCP-AO selftests
      
      Started as addressing the flakiness issues in rst_ipv*, that affect
      netdev dashboard.
      Signed-off-by: default avatarDmitry Safonov <0x7f454c46@gmail.com>
      ====================
      
      Link: https://lore.kernel.org/r/20240413-tcp-ao-selftests-fixes-v1-0-f9c41c96949d@gmail.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      24f4c99e
    • Dmitry Safonov's avatar
      selftests/tcp_ao: Printing fixes to confirm with format-security · b476c936
      Dmitry Safonov authored
      On my new laptop with packages from nixos-unstable, gcc 12.3.0 produces
      > lib/setup.c: In function ‘__test_msg’:
      > lib/setup.c:20:9: error: format not a string literal and no format arguments [-Werror=format-security]
      >    20 |         ksft_print_msg(buf);
      >       |         ^~~~~~~~~~~~~~
      > lib/setup.c: In function ‘__test_ok’:
      > lib/setup.c:26:9: error: format not a string literal and no format arguments [-Werror=format-security]
      >    26 |         ksft_test_result_pass(buf);
      >       |         ^~~~~~~~~~~~~~~~~~~~~
      > lib/setup.c: In function ‘__test_fail’:
      > lib/setup.c:32:9: error: format not a string literal and no format arguments [-Werror=format-security]
      >    32 |         ksft_test_result_fail(buf);
      >       |         ^~~~~~~~~~~~~~~~~~~~~
      > lib/setup.c: In function ‘__test_xfail’:
      > lib/setup.c:38:9: error: format not a string literal and no format arguments [-Werror=format-security]
      >    38 |         ksft_test_result_xfail(buf);
      >       |         ^~~~~~~~~~~~~~~~~~~~~~
      > lib/setup.c: In function ‘__test_error’:
      > lib/setup.c:44:9: error: format not a string literal and no format arguments [-Werror=format-security]
      >    44 |         ksft_test_result_error(buf);
      >       |         ^~~~~~~~~~~~~~~~~~~~~~
      > lib/setup.c: In function ‘__test_skip’:
      > lib/setup.c:50:9: error: format not a string literal and no format arguments [-Werror=format-security]
      >    50 |         ksft_test_result_skip(buf);
      >       |         ^~~~~~~~~~~~~~~~~~~~~
      > cc1: some warnings being treated as errors
      
      As the buffer was already pre-printed into, print it as a string
      rather than a format-string.
      
      Fixes: cfbab37b ("selftests/net: Add TCP-AO library")
      Signed-off-by: default avatarDmitry Safonov <0x7f454c46@gmail.com>
      Reported-by: default avatarMuhammad Usama Anjum <usama.anjum@collabora.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      b476c936
    • Dmitry Safonov's avatar
      selftests/tcp_ao: Fix fscanf() call for format-security · beb78cd1
      Dmitry Safonov authored
      On my new laptop with packages from nixos-unstable, gcc 12.3.0 produces:
      > lib/proc.c: In function ‘netstat_read_type’:
      > lib/proc.c:89:9: error: format not a string literal and no format arguments [-Werror=format-security]
      >    89 |         if (fscanf(fnetstat, type->header_name) == EOF)
      >       |         ^~
      > cc1: some warnings being treated as errors
      
      Here the selftests lib parses header name, while expectes non-space word
      ending with a column.
      
      Fixes: cfbab37b ("selftests/net: Add TCP-AO library")
      Signed-off-by: default avatarDmitry Safonov <0x7f454c46@gmail.com>
      Reported-by: default avatarMuhammad Usama Anjum <usama.anjum@collabora.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      beb78cd1
    • Dmitry Safonov's avatar
      selftests/tcp_ao: Zero-init tcp_ao_info_opt · b089b3be
      Dmitry Safonov authored
      The structure is on the stack and has to be zero-initialized as
      the kernel checks for:
      >	if (in.reserved != 0 || in.reserved2 != 0)
      >		return -EINVAL;
      
      Fixes: b2666053 ("selftests/net: Add test for TCP-AO add setsockopt() command")
      Signed-off-by: default avatarDmitry Safonov <0x7f454c46@gmail.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      b089b3be
    • Dmitry Safonov's avatar
      selftests/tcp_ao: Make RST tests less flaky · 4225dfa4
      Dmitry Safonov authored
      Currently, "active reset" cases are flaky, because select() is called
      for 3 sockets, while only 2 are expected to receive RST.
      The idea of the third socket was to get into request_sock_queue,
      but the test mistakenly attempted to connect() after the listener
      socket was shut down.
      
      Repair this test, it's important to check the different kernel
      code-paths for signing RST TCP-AO segments.
      
      Fixes: c6df7b23 ("selftests/net: Add TCP-AO RST test")
      Reported-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDmitry Safonov <0x7f454c46@gmail.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      4225dfa4
  4. 15 Apr, 2024 2 commits
  5. 14 Apr, 2024 1 commit
    • Yuri Benditovich's avatar
      net: change maximum number of UDP segments to 128 · 1382e3b6
      Yuri Benditovich authored
      The commit fc8b2a61
      ("net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation")
      adds check of potential number of UDP segments vs
      UDP_MAX_SEGMENTS in linux/virtio_net.h.
      After this change certification test of USO guest-to-guest
      transmit on Windows driver for virtio-net device fails,
      for example with packet size of ~64K and mss of 536 bytes.
      In general the USO should not be more restrictive than TSO.
      Indeed, in case of unreasonably small mss a lot of segments
      can cause queue overflow and packet loss on the destination.
      Limit of 128 segments is good for any practical purpose,
      with minimal meaningful mss of 536 the maximal UDP packet will
      be divided to ~120 segments.
      The number of segments for UDP packets is validated vs
      UDP_MAX_SEGMENTS also in udp.c (v4,v6), this does not affect
      quest-to-guest path but does affect packets sent to host, for
      example.
      It is important to mention that UDP_MAX_SEGMENTS is kernel-only
      define and not available to user mode socket applications.
      In order to request MSS smaller than MTU the applications
      just uses setsockopt with SOL_UDP and UDP_SEGMENT and there is
      no limitations on socket API level.
      
      Fixes: fc8b2a61 ("net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation")
      Signed-off-by: default avatarYuri Benditovich <yuri.benditovich@daynix.com>
      Reviewed-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1382e3b6
  6. 13 Apr, 2024 10 commits
    • Jakub Kicinski's avatar
      Merge branch 'mlx5-fixes' · 72041e53
      Jakub Kicinski authored
      Tariq Toukan says:
      
      ====================
      mlx5 fixes
      
      This patchset provides bug fixes to mlx5 core and Eth drivers.
      ====================
      
      Link: https://lore.kernel.org/r/20240411115444.374475-1-tariqt@nvidia.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      72041e53
    • Carolina Jubran's avatar
      net/mlx5e: Prevent deadlock while disabling aRFS · fef96576
      Carolina Jubran authored
      When disabling aRFS under the `priv->state_lock`, any scheduled
      aRFS works are canceled using the `cancel_work_sync` function,
      which waits for the work to end if it has already started.
      However, while waiting for the work handler, the handler will
      try to acquire the `state_lock` which is already acquired.
      
      The worker acquires the lock to delete the rules if the state
      is down, which is not the worker's responsibility since
      disabling aRFS deletes the rules.
      
      Add an aRFS state variable, which indicates whether the aRFS is
      enabled and prevent adding rules when the aRFS is disabled.
      
      Kernel log:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      6.7.0-rc4_net_next_mlx5_5483eb2 #1 Tainted: G          I
      ------------------------------------------------------
      ethtool/386089 is trying to acquire lock:
      ffff88810f21ce68 ((work_completion)(&rule->arfs_work)){+.+.}-{0:0}, at: __flush_work+0x74/0x4e0
      
      but task is already holding lock:
      ffff8884a1808cc0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_ethtool_set_channels+0x53/0x200 [mlx5_core]
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (&priv->state_lock){+.+.}-{3:3}:
             __mutex_lock+0x80/0xc90
             arfs_handle_work+0x4b/0x3b0 [mlx5_core]
             process_one_work+0x1dc/0x4a0
             worker_thread+0x1bf/0x3c0
             kthread+0xd7/0x100
             ret_from_fork+0x2d/0x50
             ret_from_fork_asm+0x11/0x20
      
      -> #0 ((work_completion)(&rule->arfs_work)){+.+.}-{0:0}:
             __lock_acquire+0x17b4/0x2c80
             lock_acquire+0xd0/0x2b0
             __flush_work+0x7a/0x4e0
             __cancel_work_timer+0x131/0x1c0
             arfs_del_rules+0x143/0x1e0 [mlx5_core]
             mlx5e_arfs_disable+0x1b/0x30 [mlx5_core]
             mlx5e_ethtool_set_channels+0xcb/0x200 [mlx5_core]
             ethnl_set_channels+0x28f/0x3b0
             ethnl_default_set_doit+0xec/0x240
             genl_family_rcv_msg_doit+0xd0/0x120
             genl_rcv_msg+0x188/0x2c0
             netlink_rcv_skb+0x54/0x100
             genl_rcv+0x24/0x40
             netlink_unicast+0x1a1/0x270
             netlink_sendmsg+0x214/0x460
             __sock_sendmsg+0x38/0x60
             __sys_sendto+0x113/0x170
             __x64_sys_sendto+0x20/0x30
             do_syscall_64+0x40/0xe0
             entry_SYSCALL_64_after_hwframe+0x46/0x4e
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&priv->state_lock);
                                     lock((work_completion)(&rule->arfs_work));
                                     lock(&priv->state_lock);
        lock((work_completion)(&rule->arfs_work));
      
       *** DEADLOCK ***
      
      3 locks held by ethtool/386089:
       #0: ffffffff82ea7210 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40
       #1: ffffffff82e94c88 (rtnl_mutex){+.+.}-{3:3}, at: ethnl_default_set_doit+0xd3/0x240
       #2: ffff8884a1808cc0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_ethtool_set_channels+0x53/0x200 [mlx5_core]
      
      stack backtrace:
      CPU: 15 PID: 386089 Comm: ethtool Tainted: G          I        6.7.0-rc4_net_next_mlx5_5483eb2 #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x60/0xa0
       check_noncircular+0x144/0x160
       __lock_acquire+0x17b4/0x2c80
       lock_acquire+0xd0/0x2b0
       ? __flush_work+0x74/0x4e0
       ? save_trace+0x3e/0x360
       ? __flush_work+0x74/0x4e0
       __flush_work+0x7a/0x4e0
       ? __flush_work+0x74/0x4e0
       ? __lock_acquire+0xa78/0x2c80
       ? lock_acquire+0xd0/0x2b0
       ? mark_held_locks+0x49/0x70
       __cancel_work_timer+0x131/0x1c0
       ? mark_held_locks+0x49/0x70
       arfs_del_rules+0x143/0x1e0 [mlx5_core]
       mlx5e_arfs_disable+0x1b/0x30 [mlx5_core]
       mlx5e_ethtool_set_channels+0xcb/0x200 [mlx5_core]
       ethnl_set_channels+0x28f/0x3b0
       ethnl_default_set_doit+0xec/0x240
       genl_family_rcv_msg_doit+0xd0/0x120
       genl_rcv_msg+0x188/0x2c0
       ? ethnl_ops_begin+0xb0/0xb0
       ? genl_family_rcv_msg_dumpit+0xf0/0xf0
       netlink_rcv_skb+0x54/0x100
       genl_rcv+0x24/0x40
       netlink_unicast+0x1a1/0x270
       netlink_sendmsg+0x214/0x460
       __sock_sendmsg+0x38/0x60
       __sys_sendto+0x113/0x170
       ? do_user_addr_fault+0x53f/0x8f0
       __x64_sys_sendto+0x20/0x30
       do_syscall_64+0x40/0xe0
       entry_SYSCALL_64_after_hwframe+0x46/0x4e
       </TASK>
      
      Fixes: 45bf454a ("net/mlx5e: Enabling aRFS mechanism")
      Signed-off-by: default avatarCarolina Jubran <cjubran@nvidia.com>
      Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20240411115444.374475-7-tariqt@nvidia.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      fef96576
    • Carolina Jubran's avatar
      net/mlx5e: Acquire RTNL lock before RQs/SQs activation/deactivation · fdce06bd
      Carolina Jubran authored
      netif_queue_set_napi asserts whether RTNL lock is held if
      the netdev is initialized.
      
      Acquire the RTNL lock before activating or deactivating
      RQs/SQs if the lock has not been held before in the flow.
      
      Fixes: f25e7b82 ("net/mlx5e: link NAPI instances to queues and IRQs")
      Cc: Joe Damato <jdamato@fastly.com>
      Signed-off-by: default avatarCarolina Jubran <cjubran@nvidia.com>
      Reviewed-by: default avatarRahul Rameshbabu <rrameshbabu@nvidia.com>
      Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20240411115444.374475-6-tariqt@nvidia.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      fdce06bd
    • Rahul Rameshbabu's avatar
      net/mlx5e: Use channel mdev reference instead of global mdev instance for coalescing · 6c685bdb
      Rahul Rameshbabu authored
      Channels can potentially have independent mdev instances. Do not refer to
      the global mdev instance in the mlx5e_priv instance for channel FW
      operations related to coalescing. CQ numbers that would be valid on the
      channel's mdev instance may not be correctly referenced if using the
      mlx5e_priv instance.
      
      Fixes: 67936e13 ("net/mlx5e: Let channels be SD-aware")
      Signed-off-by: default avatarRahul Rameshbabu <rrameshbabu@nvidia.com>
      Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20240411115444.374475-5-tariqt@nvidia.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      6c685bdb
    • Shay Drory's avatar
      net/mlx5: Restore mistakenly dropped parts in register devlink flow · bf729988
      Shay Drory authored
      Code parts from cited commit were mistakenly dropped while rebasing
      before submission. Add them here.
      
      Fixes: c6e77aa9 ("net/mlx5: Register devlink first under devlink lock")
      Signed-off-by: default avatarShay Drory <shayd@nvidia.com>
      Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20240411115444.374475-4-tariqt@nvidia.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      bf729988
    • Tariq Toukan's avatar
      net/mlx5: SD, Handle possible devcom ERR_PTR · aa4ac90d
      Tariq Toukan authored
      Check if devcom holds an error pointer and return immediately.
      
      This fixes Smatch static checker warning:
      drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c:221 sd_register()
      error: 'devcom' dereferencing possible ERR_PTR()
      
      Enhance mlx5_devcom_register_component() so it stops returning NULL,
      making it easier for its callers.
      
      Fixes: d3d05766 ("net/mlx5: SD, Implement devcom communication and primary election")
      Reported-by: default avatarDan Carpenter <dan.carpenter@linaro.org>
      Link: https://lore.kernel.org/all/f09666c8-e604-41f6-958b-4cc55c73faf9@gmail.com/T/Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Reviewed-by: default avatarGal Pressman <gal@nvidia.com>
      Link: https://lore.kernel.org/r/20240411115444.374475-3-tariqt@nvidia.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      aa4ac90d
    • Shay Drory's avatar
      net/mlx5: Lag, restore buckets number to default after hash LAG deactivation · 37cc10da
      Shay Drory authored
      The cited patch introduces the concept of buckets in LAG in hash mode.
      However, the patch doesn't clear the number of buckets in the LAG
      deactivation. This results in using the wrong number of buckets in
      case user create a hash mode LAG and afterwards create a non-hash
      mode LAG.
      
      Hence, restore buckets number to default after hash mode LAG
      deactivation.
      
      Fixes: 352899f3 ("net/mlx5: Lag, use buckets in hash mode")
      Signed-off-by: default avatarShay Drory <shayd@nvidia.com>
      Reviewed-by: default avatarMaor Gottlieb <maorg@nvidia.com>
      Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20240411115444.374475-2-tariqt@nvidia.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      37cc10da
    • Asbjørn Sloth Tønnesen's avatar
      net: sparx5: flower: fix fragment flags handling · 68aba004
      Asbjørn Sloth Tønnesen authored
      I noticed that only 3 out of the 4 input bits were used,
      mt.key->flags & FLOW_DIS_IS_FRAGMENT was never checked.
      
      In order to avoid a complicated maze, I converted it to
      use a 16 byte mapping table.
      
      As shown in the table below the old heuristics doesn't
      always do the right thing, ie. when FLOW_DIS_IS_FRAGMENT=1/1
      then it used to only match follow-up fragment packets.
      
      Here are all the combinations, and their resulting new/old
      VCAP key/mask filter:
      
        /- FLOW_DIS_IS_FRAGMENT (key/mask)
        |    /- FLOW_DIS_FIRST_FRAG (key/mask)
        |    |    /-- new VCAP fragment (key/mask)
        v    v    v    v- old VCAP fragment (key/mask)
      
       0/0  0/0  -/-  -/-     impossible (due to entry cond. on mask)
       0/0  0/1  -/-  0/3 !!  invalid (can't match non-fragment + follow-up frag)
       0/0  1/0  -/-  -/-     impossible (key > mask)
       0/0  1/1  1/3  1/3     first fragment
      
       0/1  0/0  0/3  3/3 !!  not fragmented
       0/1  0/1  0/3  3/3 !!  not fragmented (+ not first fragment)
       0/1  1/0  -/-  -/-     impossible (key > mask)
       0/1  1/1  -/-  1/3 !!  invalid (non-fragment and first frag)
      
       1/0  0/0  -/-  -/-     impossible (key > mask)
       1/0  0/1  -/-  -/-     impossible (key > mask)
       1/0  1/0  -/-  -/-     impossible (key > mask)
       1/0  1/1  -/-  -/-     impossible (key > mask)
      
       1/1  0/0  1/1  3/3 !!  some fragment
       1/1  0/1  3/3  3/3     follow-up fragment
       1/1  1/0  -/-  -/-     impossible (key > mask)
       1/1  1/1  1/3  1/3     first fragment
      
      In the datasheet the VCAP fragment values are documented as:
       0 = no fragment
       1 = initial fragment
       2 = suspicious fragment
       3 = valid follow-up fragment
      
      Result: 3 combinations match the old behavior,
              3 combinations have been corrected,
              2 combinations are now invalid, and fail,
              8 combinations are impossible.
      
      It should now be aligned with how FLOW_DIS_IS_FRAGMENT
      and FLOW_DIS_FIRST_FRAG is set in __skb_flow_dissect() in
      net/core/flow_dissector.c
      
      Since the VCAP fragment values are not a bitfield, we have
      to ignore the suspicious fragment value, eg. when matching
      on any kind of fragment with FLOW_DIS_IS_FRAGMENT=1/1.
      
      Only compile tested, and logic tested in userspace, as I
      unfortunately don't have access to this switch chip (yet).
      
      Fixes: d6c2964d ("net: microchip: sparx5: Adding more tc flower keys for the IS2 VCAP")
      Signed-off-by: default avatarAsbjørn Sloth Tønnesen <ast@fiberby.net>
      Reviewed-by: default avatarSteen Hegelund <Steen.Hegelund@microchip.com>
      Tested-by: default avatarDaniel Machon <daniel.machon@microchip.com>
      Reviewed-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Link: https://lore.kernel.org/r/20240411111321.114095-1-ast@fiberby.netSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      68aba004
    • Jakub Kicinski's avatar
      Merge branch 'af_unix-fix-msg_oob-bugs-with-msg_peek' · 27f58f7f
      Jakub Kicinski authored
      Kuniyuki Iwashima says:
      
      ====================
      af_unix: Fix MSG_OOB bugs with MSG_PEEK.
      
      Currently, OOB data can be read without MSG_OOB accidentally
      in two cases, and this seris fixes the bugs.
      
      v1: https://lore.kernel.org/netdev/20240409225209.58102-1-kuniyu@amazon.com/
      ====================
      
      Link: https://lore.kernel.org/r/20240410171016.7621-1-kuniyu@amazon.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      27f58f7f
    • Kuniyuki Iwashima's avatar
      af_unix: Don't peek OOB data without MSG_OOB. · 22dd70eb
      Kuniyuki Iwashima authored
      Currently, we can read OOB data without MSG_OOB by using MSG_PEEK
      when OOB data is sitting on the front row, which is apparently
      wrong.
      
        >>> from socket import *
        >>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
        >>> c1.send(b'a', MSG_OOB)
        1
        >>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
        b'a'
      
      If manage_oob() is called when no data has been copied, we only
      check if the socket enables SO_OOBINLINE or MSG_PEEK is not used.
      Otherwise, the skb is returned as is.
      
      However, here we should return NULL if MSG_PEEK is set and no data
      has been copied.
      
      Also, in such a case, we should not jump to the redo label because
      we will be caught in the loop and hog the CPU until normal data
      comes in.
      
      Then, we need to handle skb == NULL case with the if-clause below
      the manage_oob() block.
      
      With this patch:
      
        >>> from socket import *
        >>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
        >>> c1.send(b'a', MSG_OOB)
        1
        >>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        BlockingIOError: [Errno 11] Resource temporarily unavailable
      
      Fixes: 314001f0 ("af_unix: Add OOB support")
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240410171016.7621-3-kuniyu@amazon.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      22dd70eb