1. 23 Apr, 2024 29 commits
    • af_unix: Don't access successor in unix_del_edges() during GC. · 1af2dfac
      Kuniyuki Iwashima authored
      syzbot reported use-after-free in unix_del_edges().  [0]
      
      What the repro does is basically repeat the following quickly.
      
        1. pass a fd of an AF_UNIX socket to itself
      
          socketpair(AF_UNIX, SOCK_DGRAM, 0, [3, 4]) = 0
          sendmsg(3, {..., msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET,
                                         cmsg_type=SCM_RIGHTS, cmsg_data=[4]}], ...}, 0) = 0
      
        2. pass other fds of AF_UNIX sockets to the socket above
      
          socketpair(AF_UNIX, SOCK_SEQPACKET, 0, [5, 6]) = 0
          sendmsg(3, {..., msg_control=[{cmsg_len=48, cmsg_level=SOL_SOCKET,
                                         cmsg_type=SCM_RIGHTS, cmsg_data=[5, 6]}], ...}, 0) = 0
      
        3. close all sockets
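
      Step 1 can be sketched from user space as follows. This is a minimal
      Python illustration, assuming Python 3.9+ on Linux for
      socket.send_fds() and socket.recv_fds(); the function name is
      hypothetical. It makes a single pass and receives the fd back, so it
      only shows how a socket ends up holding a reference to itself and does
      not attempt to reproduce the race.

```python
import os
import socket


def pass_fd_to_itself():
    # Equivalent of socketpair(AF_UNIX, SOCK_DGRAM, 0, [a, b]).
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        # Send b's own fd through a: the SCM_RIGHTS message lands in b's
        # receive queue, so b now holds an in-flight reference to itself.
        socket.send_fds(a, [b"x"], [b.fileno()])

        # Drain the queue again; the kernel installs a new fd that refers
        # to the same underlying socket as b.
        data, fds, _flags, _addr = socket.recv_fds(b, 16, 1)
        try:
            same = os.fstat(fds[0]).st_ino == os.fstat(b.fileno()).st_ino
        finally:
            for fd in fds:
                os.close(fd)
        return data, same
    finally:
        a.close()
        b.close()
```

      In the repro the message is never received; the sockets are closed
      while the fd is still in flight, which is what leaves the cycle for
      the GC to clean up.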
      
      Here, two skb are created, and every unix_edge->successor is the first
      socket.  Then, __unix_gc() will garbage-collect the two skb:
      
        (a) free skb with self-referencing fd
        (b) free skb holding other sockets
      
      After (a), the self-referencing socket will be scheduled to be freed
      later by the delayed_fput() task.
      
      syzbot repeated the sequences above (1. ~ 3.) quickly and triggered
      the task concurrently while GC was running.
      
      So, at (b), the socket was already freed, and accessing it was illegal.
      
      unix_del_edges() accesses the receiver socket as edge->successor to
      optimise GC.  However, we should not do it during GC.
      
      Garbage-collecting sockets does not change the shape of the rest
      of the graph, so we need not call unix_update_graph() to update
      unix_graph_grouped when we purge skb.
      
      However, if we clean up all loops in the unix_walk_scc_fast() path,
      unix_graph_maybe_cyclic remains unchanged (true), and __unix_gc()
      will call unix_walk_scc_fast() continuously even though there is no
      socket to garbage-collect.
      
      To keep that optimisation while fixing UAF, let's add the same
      updating logic of unix_graph_maybe_cyclic in unix_walk_scc_fast()
      as done in unix_walk_scc() and __unix_walk_scc().
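
      The flag update described above can be modelled roughly as below.
      This is a toy Python model, not the kernel code: the SCC
      representation, the "dead" marker and both function names are
      hypothetical. Only the rule "reset the flag, then set it again if any
      surviving SCC is still cyclic" is taken from the text.

```python
def scc_is_cyclic(scc):
    """An SCC is cyclic if it has 2+ vertices or a self-referencing vertex."""
    edges = scc["edges"]  # maps each vertex to the set of its successors
    if len(edges) > 1:
        return True
    (vertex, successors), = edges.items()
    return vertex in successors


def walk_scc_fast(sccs):
    """Collect dead SCCs and recompute the maybe-cyclic flag."""
    collected = []
    maybe_cyclic = False  # assume no cycle until a live cyclic SCC is seen
    for scc in sccs:
        if scc["dead"]:
            collected.append(scc)
        elif not maybe_cyclic:
            maybe_cyclic = scc_is_cyclic(scc)
    return collected, maybe_cyclic
```

      Once every cyclic SCC has been collected, the flag in this model drops
      to False, so a caller would no longer need to take the fast walk on
      every GC run.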
      
      Note that when unix_del_edges() is called from other places, the
      receiver socket is always alive:
      
        - sendmsg: the successor's sk_refcnt is bumped by sock_hold()
                   unix_find_other() for SOCK_DGRAM, connect() for SOCK_STREAM
      
        - recvmsg: the successor is the receiver, and its fd is alive
      
      [0]:
      BUG: KASAN: slab-use-after-free in unix_edge_successor net/unix/garbage.c:109 [inline]
      BUG: KASAN: slab-use-after-free in unix_del_edge net/unix/garbage.c:165 [inline]
      BUG: KASAN: slab-use-after-free in unix_del_edges+0x148/0x630 net/unix/garbage.c:237
      Read of size 8 at addr ffff888079c6e640 by task kworker/u8:6/1099
      
      CPU: 0 PID: 1099 Comm: kworker/u8:6 Not tainted 6.9.0-rc4-next-20240418-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
      Workqueue: events_unbound __unix_gc
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
       print_address_description mm/kasan/report.c:377 [inline]
       print_report+0x169/0x550 mm/kasan/report.c:488
       kasan_report+0x143/0x180 mm/kasan/report.c:601
       unix_edge_successor net/unix/garbage.c:109 [inline]
       unix_del_edge net/unix/garbage.c:165 [inline]
       unix_del_edges+0x148/0x630 net/unix/garbage.c:237
       unix_destroy_fpl+0x59/0x210 net/unix/garbage.c:298
       unix_detach_fds net/unix/af_unix.c:1811 [inline]
       unix_destruct_scm+0x13e/0x210 net/unix/af_unix.c:1826
       skb_release_head_state+0x100/0x250 net/core/skbuff.c:1127
       skb_release_all net/core/skbuff.c:1138 [inline]
       __kfree_skb net/core/skbuff.c:1154 [inline]
       kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1190
       __skb_queue_purge_reason include/linux/skbuff.h:3251 [inline]
       __skb_queue_purge include/linux/skbuff.h:3256 [inline]
       __unix_gc+0x1732/0x1830 net/unix/garbage.c:575
       process_one_work kernel/workqueue.c:3218 [inline]
       process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299
       worker_thread+0x86d/0xd70 kernel/workqueue.c:3380
       kthread+0x2f0/0x390 kernel/kthread.c:389
       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
       </TASK>
      
      Allocated by task 14427:
       kasan_save_stack mm/kasan/common.c:47 [inline]
       kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
       unpoison_slab_object mm/kasan/common.c:312 [inline]
       __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:338
       kasan_slab_alloc include/linux/kasan.h:201 [inline]
       slab_post_alloc_hook mm/slub.c:3897 [inline]
       slab_alloc_node mm/slub.c:3957 [inline]
       kmem_cache_alloc_noprof+0x135/0x290 mm/slub.c:3964
       sk_prot_alloc+0x58/0x210 net/core/sock.c:2074
       sk_alloc+0x38/0x370 net/core/sock.c:2133
       unix_create1+0xb4/0x770
       unix_create+0x14e/0x200 net/unix/af_unix.c:1034
       __sock_create+0x490/0x920 net/socket.c:1571
       sock_create net/socket.c:1622 [inline]
       __sys_socketpair+0x33e/0x720 net/socket.c:1773
       __do_sys_socketpair net/socket.c:1822 [inline]
       __se_sys_socketpair net/socket.c:1819 [inline]
       __x64_sys_socketpair+0x9b/0xb0 net/socket.c:1819
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      Freed by task 1805:
       kasan_save_stack mm/kasan/common.c:47 [inline]
       kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
       kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
       poison_slab_object+0xe0/0x150 mm/kasan/common.c:240
       __kasan_slab_free+0x37/0x60 mm/kasan/common.c:256
       kasan_slab_free include/linux/kasan.h:184 [inline]
       slab_free_hook mm/slub.c:2190 [inline]
       slab_free mm/slub.c:4393 [inline]
       kmem_cache_free+0x145/0x340 mm/slub.c:4468
       sk_prot_free net/core/sock.c:2114 [inline]
       __sk_destruct+0x467/0x5f0 net/core/sock.c:2208
       sock_put include/net/sock.h:1948 [inline]
       unix_release_sock+0xa8b/0xd20 net/unix/af_unix.c:665
       unix_release+0x91/0xc0 net/unix/af_unix.c:1049
       __sock_release net/socket.c:659 [inline]
       sock_close+0xbc/0x240 net/socket.c:1421
       __fput+0x406/0x8b0 fs/file_table.c:422
       delayed_fput+0x59/0x80 fs/file_table.c:445
       process_one_work kernel/workqueue.c:3218 [inline]
       process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299
       worker_thread+0x86d/0xd70 kernel/workqueue.c:3380
       kthread+0x2f0/0x390 kernel/kthread.c:389
       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
      
      The buggy address belongs to the object at ffff888079c6e000
       which belongs to the cache UNIX of size 1920
      The buggy address is located 1600 bytes inside of
       freed 1920-byte region [ffff888079c6e000, ffff888079c6e780)
      
      Reported-by: syzbot+f3f3eef1d2100200e593@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=f3f3eef1d2100200e593
      Fixes: 77e5593a ("af_unix: Skip GC if no cycle exists.")
      Fixes: fd863448 ("af_unix: Try not to hold unix_gc_lock during accept().")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240419235102.31707-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'net-ipa-eight-simple-cleanups' · 0ff1db48
      Paolo Abeni authored
      Alex Elder says:
      
      ====================
      net: ipa: eight simple cleanups
      
      This series contains a mix of cleanups, some dating back to
      December, 2022.  Version 1 was based on an older version of
      net-next/main; this version has simply been rebased.
      
      The first two make it so the IPA SUSPEND interrupt only gets enabled
      when necessary.  That makes it possible in the third patch to call
      device_init_wakeup() during an earlier phase of initialization, and
      remove two functions.
      
      The next patch removes IPA register definitions that are never used.
      The fifth patch makes ipa_table_hash_support() a real function, so
      the IPA structure only needs to be declared rather than defined when
      that file is parsed.
      
      The sixth patch fixes improper argument names in two function
      declarations.  The seventh removes the declaration for a function
      that does not exist, and makes ipa_cmd_init() actually get called.
      And the last one eliminates ipa_version_supported(), in favor of
      just deciding that if a device is probed because its compatible
      matches, that device is assumed to be supported.
      ====================
      
      Link: https://lore.kernel.org/r/20240419151800.2168903-1-elder@linaro.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ipa: kill ipa_version_supported() · dfdd70e2
      Alex Elder authored
      The only place ipa_version_supported() is called is in the probe
      function.  The version comes from the match data.  Rather than
      checking the version validity separately, just consider anything
      that has match data to be supported.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ipa: fix two minor ipa_cmd problems · 319b6d4e
      Alex Elder authored
      In "ipa_cmd.h", ipa_cmd_data_valid() is declared, but that function
      does not exist.  So delete that declaration.
      
      Also, for some reason ipa_cmd_init() never gets called.  It isn't
      really critical--it just validates that some memory offsets and a
      size can be represented in some register fields, and they won't fail
      with current data.  Regardless, call the function in ipa_probe().
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ipa: fix two bogus argument names · f2e4e9ea
      Alex Elder authored
      In "ipa_endpoint.h", two function declarations have bogus argument
      names.  Fix these.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ipa: make ipa_table_hash_support() a real function · b81565b7
      Alex Elder authored
      With the exception of ipa_table_hash_support(), nothing defined in
      "ipa_table.h" requires the full definition of the IPA structure.
      
      Change that function to be a "real" function rather than an inline,
      to avoid requiring the IPA structure to be defined.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ipa: remove unneeded FILT_ROUT_HASH_EN definitions · 5043d6b1
      Alex Elder authored
      The FILT_ROUT_HASH_EN register is only used for IPA v4.2.  There,
      routing and filter table hashing are not supported, and so the
      register must be written to disable the feature.  No other version
      uses this register, so its definition can be removed.  If we need to
      use these some day (for example, explicitly enable the feature) this
      commit can be reverted.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ipa: call device_init_wakeup() earlier · 19790951
      Alex Elder authored
      Currently, enabling wakeup for the IPA device doesn't occur until
      the setup phase of initialization (in ipa_power_setup()).
      
      There is no need to delay doing that, however.  We can conveniently
      do it during the config phase, in ipa_interrupt_config(), where we
      enable power management wakeup mode for the IPA interrupt.
      
      Moving the device_init_wakeup() out of ipa_power_setup() leaves that
      function empty, so it can just be eliminated.
      
      Similarly, rearrange all of the matching inverse calls, disabling
      device wakeup in ipa_interrupt_deconfig() and removing that function
      as well.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ipa: only enable the SUSPEND IPA interrupt when needed · 6f370026
      Alex Elder authored
      Only enable the SUSPEND IPA interrupt type when at least one
      endpoint has that interrupt enabled.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ipa: maintain bitmap of suspend-enabled endpoints · 2eca7344
      Alex Elder authored
      Keep track of which endpoints have the SUSPEND IPA interrupt enabled
      in a variable-length bitmap.  This will be used in the next patch to
      allow the SUSPEND interrupt type to be disabled except when needed.
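
      The bookkeeping described here can be sketched as follows. This is a
      toy Python model under stated assumptions, not the driver code: the
      class, its fields and the "irq_type_enabled" stand-in for the hardware
      state are hypothetical. It shows a per-endpoint bitmap and the rule
      that the SUSPEND interrupt type is only on while at least one bit is
      set.

```python
class SuspendTracker:
    """Tracks which endpoints have the SUSPEND interrupt enabled."""

    def __init__(self, endpoint_count):
        self.endpoint_count = endpoint_count
        self.bitmap = 0                 # bit N set => endpoint N enabled
        self.irq_type_enabled = False   # stands in for the hardware state

    def set_endpoint(self, endpoint_id, enable):
        if not 0 <= endpoint_id < self.endpoint_count:
            raise ValueError("endpoint out of range")
        was_empty = self.bitmap == 0
        if enable:
            self.bitmap |= 1 << endpoint_id
        else:
            self.bitmap &= ~(1 << endpoint_id)
        now_empty = self.bitmap == 0
        # Only touch the interrupt type on the empty <-> non-empty edges.
        if was_empty and not now_empty:
            self.irq_type_enabled = True
        elif now_empty and not was_empty:
            self.irq_type_enabled = False
```

      Acting only on the transitions means the interrupt type is switched
      once when the first endpoint is enabled and once when the last one is
      disabled, rather than on every endpoint change.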
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'net-stmmac-fix-mac-capabilities-procedure' · 57f15912
      Paolo Abeni authored
      Serge Semin says:
      
      ====================
      net: stmmac: Fix MAC-capabilities procedure
      
      The series was born as a result of the discussions around Yanteng's
      recent series adding support for the Loongson LS7A1000, LS2K1000,
      LS7A2000 and LS2K2000 MACs:
      Link: https://lore.kernel.org/netdev/fu3f6uoakylnb6eijllakeu5i4okcyqq7sfafhp5efaocbsrwe@w74xe7gb6x7p
      
      In particular, Yanteng's patchset needed to implement the Loongson
      MAC-specific constraints applied to the link speed and link duplex mode.
      As a result of the discussion with Russel, the following preliminary
      patch was born:
      Link: https://lore.kernel.org/netdev/df31e8bcf74b3b4ddb7ddf5a1c371390f16a2ad5.1712917541.git.siyanteng@loongson.cn
      
      The patch above was a temporary solution used by Yanteng for further
      development and to move on with the ongoing review. This patchset is a
      refactored version of that single patch, with the formatting required
      for fix patches.
      
      The main part of the series was already merged at the v1 stage. What is
      left are the cleanup patches, which rename the
      stmmac_ops::phylink_get_caps() callback to stmmac_ops::update_caps() and
      move the MAC-capabilities init/re-init to the phylink MAC-capabilities
      getter.
      
      Link: https://lore.kernel.org/netdev/20240412180340.7965-1-fancer.lancer@gmail.com/
      Changelog v2:
      - Add a new patch (Romain):
        [PATCH net-next v2 1/2] net: stmmac: Rename phylink_get_caps() callback to update_caps()
      - Resubmit the leftover patches to net-next tree (Paolo).
      
      Link: https://lore.kernel.org/netdev/20240417140013.12575-1-fancer.lancer@gmail.com/
      Changelog v3:
      - Just resubmit (Jakub).
      Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
      ====================
      
      Link: https://lore.kernel.org/r/20240419090357.5547-1-fancer.lancer@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: stmmac: Move MAC caps init to phylink MAC caps getter · f951a649
      Serge Semin authored
      After a set of recent fixes, the stmmac_phy_setup() and
      stmmac_reinit_queues() methods ended up with some duplicated code.
      Let's get rid of the duplication by moving the MAC-capabilities
      initialization to the PHYLINK MAC-capabilities getter. The getter is
      called during each network device interface open/close cycle, so the
      MAC-capabilities will be initialized in the generic device open
      procedure and on Tx/Rx queue re-initialization, as the original code
      semantics imply.
      Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
      Reviewed-by: Romain Gantois <romain.gantois@bootlin.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: stmmac: Rename phylink_get_caps() callback to update_caps() · dc144bae
      Serge Semin authored
      Since recent commits, the stmmac_ops::phylink_get_caps() callback is no
      longer responsible for getting the phylink MAC capabilities; it
      merely updates the MAC capabilities in the mac_device_info::link::caps
      field. Rename the callback to match what the method does now.
      Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
      Reviewed-by: Romain Gantois <romain.gantois@bootlin.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'enable-rx-hw-timestamp-for-ptp-packets-using-cpts-fifo' · 30b3fe06
      Paolo Abeni authored
      Chintan Vankar says:
      
      ====================
      Enable RX HW timestamp for PTP packets using CPTS FIFO
      
      The CPSW offers two mechanisms for communicating packet ingress timestamp
      information to the host.
      
      The first mechanism is via the CPTS Event FIFO which records timestamp
      when triggered by certain events. One such event is the reception of an
      Ethernet packet with a specified EtherType field. This is used to capture
      ingress timestamps for PTP packets. With this mechanism the host must
      read the timestamp (from the CPTS FIFO) separately from the packet payload
      which is delivered via DMA.
      
      In the second mechanism of timestamping, CPSW driver enables hardware
      timestamping for all received packets by setting the TSTAMP_EN bit in
      CPTS_CONTROL register, which directs the CPTS module to timestamp all
      received packets, followed by passing timestamp via DMA descriptors.
      This mechanism is responsible for triggering errata i2401:
      "CPSW: Host Timestamps Cause CPSW Port to Lock up."
      
      The errata affects all K3 SoCs. Link to errata for AM64x:
      https://www.ti.com/lit/er/sprz457h/sprz457h.pdf
      
      As a workaround, we can use the first mechanism to timestamp received
      packets.
      ====================
      
      Link: https://lore.kernel.org/r/20240419082626.57225-1-c-vankar@ti.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ethernet: ti: am65-cpsw/ethtool: Enable RX HW timestamp only for PTP packets · c03a6fd3
      Chintan Vankar authored
      In the current mechanism of timestamping, am65-cpsw-nuss driver
      enables hardware timestamping for all received packets by setting
      the TSTAMP_EN bit in CPTS_CONTROL register, which directs the CPTS
      module to timestamp all received packets, followed by passing
      timestamp via DMA descriptors. This mechanism causes CPSW Port to
      Lock up.
      
      To prevent port lock up, don't enable rx packet timestamping by
      setting TSTAMP_EN bit in CPTS_CONTROL register. The workaround for
      timestamping received packets is to utilize the CPTS Event FIFO
      that records timestamps corresponding to certain events. The CPTS
      module is configured to generate timestamps for Multicast Ethernet,
      UDP/IPv4 and UDP/IPv6 PTP packets.
      
      Update supported hwtstamp_rx_filters values for CPSW's timestamping
      capability.
      
      Fixes: b1f66a5b ("net: ethernet: ti: am65-cpsw-nuss: enable packet timestamping support")
      Signed-off-by: Chintan Vankar <c-vankar@ti.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ethernet: ti: am65-cpts: Enable RX HW timestamp for PTP packets using CPTS FIFO · c459f606
      Chintan Vankar authored
      Add a new function "am65_cpts_rx_timestamp()" which checks for PTP
      packets from header and timestamps them.
      
      Add another function "am65_cpts_find_rx_ts()" which finds CPTS FIFO
      Event to get the timestamp of received PTP packet.
      Signed-off-by: Chintan Vankar <c-vankar@ti.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'read-phy-address-of-switch-from-device-tree-on-mt7530-dsa-subdriver' · 9b9fd023
      Paolo Abeni authored
      Arınç ÜNAL says:
      
      ====================
      Read PHY address of switch from device tree on MT7530 DSA subdriver
      
      This patch series makes the driver read the PHY address the switch listens
      on from the device tree, which adds support for MT7530 switches listening
      on a PHY address other than 31. The series also simplifies the core
      operations.
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      ====================
      
      Link: https://lore.kernel.org/r/20240418-b4-for-netnext-mt7530-phy-addr-from-dt-and-simplify-core-ops-v3-0-3b5fb249b004@arinc9.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: dsa: mt7530: simplify core operations · 7c5e37d7
      Arınç ÜNAL authored
      The core_rmw() function calls core_read_mmd_indirect() to read the
      requested register, and then calls core_write_mmd_indirect() to write the
      requested value to the register. Because Clause 22 is used to access Clause
      45 registers, some operations on core_write_mmd_indirect() are
      unnecessarily run. Get rid of core_read_mmd_indirect() and
      core_write_mmd_indirect(), and run only the necessary operations on
      core_write() and core_rmw().
      Reviewed-by: Daniel Golle <daniel@makrotopia.org>
      Tested-by: Daniel Golle <daniel@makrotopia.org>
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: dsa: mt7530-mdio: read PHY address of switch from device tree · 868ff5f4
      Arınç ÜNAL authored
      Read the PHY address the switch listens on from the reg property of the
      switch node on the device tree. This change brings support for MT7530
      switches on boards with such bootstrapping configuration where the switch
      listens on a different PHY address than the hardcoded PHY address on the
      driver, 31.
      
      As described on the "MT7621 Programming Guide v0.4" document, the MT7530
      switch and its PHYs can be configured to listen on the range of 7-12,
      15-20, 23-28, and 31 and 0-4 PHY addresses.
      
      There are operations where the switch PHY registers are used. For the PHY
      address of the control PHY, transform the MT753X_CTRL_PHY_ADDR constant
      into a macro and use it. The PHY address for the control PHY is 0 when the
      switch listens on 31. In any other case, it is one greater than the PHY
      address the switch listens on.
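
      The control PHY address rule above can be written out as a small
      helper. This is an illustrative Python sketch only; the function name
      is hypothetical and the driver expresses the rule as a C macro. The
      5-bit mask reflects the 0-31 range of MDIO PHY addresses.

```python
def ctrl_phy_addr(switch_addr):
    """Control PHY address for a switch listening on switch_addr (0..31)."""
    if not 0 <= switch_addr <= 31:
        raise ValueError("MDIO PHY addresses are 5 bits wide")
    # One greater than the switch address, wrapping 31 around to 0.
    return (switch_addr + 1) & 0x1F
```

      A single masked increment covers both cases in the text: 31 wraps to
      0, and every other address maps to the next one up.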
      Reviewed-by: Daniel Golle <daniel@makrotopia.org>
      Tested-by: Daniel Golle <daniel@makrotopia.org>
      Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ethernet: mtk_eth_soc: flower: validate control flags · 077633af
      Asbjørn Sloth Tønnesen authored
      This driver currently doesn't support any control flags.
      
      Use flow_rule_has_control_flags() to check for control flags,
      such as can be set through `tc flower ... ip_flags frag`.
      
      In case any control flags are masked, flow_rule_has_control_flags()
      sets a NL extended error message, and we return -EOPNOTSUPP.
      
      Only compile-tested.
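
      The check described above amounts to rejecting any rule whose control
      flags mask is non-zero. A toy Python model of that behaviour is
      sketched below; the function name and the error text are
      illustrative assumptions, not the kernel helper or its actual
      message.

```python
import errno


def check_control_flags(control_mask):
    """Return (0, None) if no control flags are matched on, else an error."""
    if control_mask == 0:
        return 0, None
    # A driver without control-flag support refuses the rule and reports
    # why through an extended error message (text here is illustrative).
    message = f"unsupported match on control flags {control_mask:#x}"
    return -errno.EOPNOTSUPP, message
```

      Returning an error for a masked flag, instead of silently ignoring
      it, keeps the driver from accepting a rule it cannot fully honour.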
      Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240418161821.189263-1-ast@fiberby.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • dpaa2-switch: flower: validate control flags · af7dfa94
      Asbjørn Sloth Tønnesen authored
      This driver currently doesn't support any control flags.
      
      Use flow_rule_match_has_control_flags() to check for control flags,
      such as can be set through `tc flower ... ip_flags frag`.
      
      In case any control flags are masked, flow_rule_match_has_control_flags()
      sets a NL extended error message, and we return -EOPNOTSUPP.
      
      Only compile-tested.
      Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Link: https://lore.kernel.org/r/20240418161802.189247-1-ast@fiberby.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • cxgb4: flower: validate control flags · 93a8540a
      Asbjørn Sloth Tønnesen authored
      This driver currently doesn't support any control flags.
      
      Use flow_rule_match_has_control_flags() to check for control flags,
      such as can be set through `tc flower ... ip_flags frag`.
      
      In case any control flags are masked, flow_rule_match_has_control_flags()
      sets a NL extended error message, and we return -EOPNOTSUPP.
      
      Only compile-tested, no hardware available.
      Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240418161751.189226-1-ast@fiberby.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: openvswitch: Check vport netdev name · 2540088b
      Jun Gu authored
      Ensure that the provided netdev name is not one of its aliases to
      prevent unnecessary creation and destruction of the vport by
      ovs-vswitchd.
      Signed-off-by: Jun Gu <jun.gu@easystack.cn>
      Acked-by: Eelco Chaudron <echaudro@redhat.com>
      Link: https://lore.kernel.org/r/20240419061425.132723-1-jun.gu@easystack.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'netlink-add-nftables-spec-w-multi-messages' · 2557e2ec
      Jakub Kicinski authored
      Donald Hunter says:
      
      ====================
      netlink: Add nftables spec w/ multi messages
      
      This series adds a ynl spec for nftables and extends ynl with a --multi
      command line option that makes it possible to send transactional batches
      for nftables.
      
      This series includes a patch for nfnetlink which adds ACK processing for
      batch begin/end messages. If you'd prefer that to be sent separately to
      nf-next then I can do so, but I included it here so that it gets seen in
      context.
      
      An example of usage is:
      
      ./tools/net/ynl/cli.py \
       --spec Documentation/netlink/specs/nftables.yaml \
       --multi batch-begin '{"res-id": 10}' \
       --multi newtable '{"name": "test", "nfgen-family": 1}' \
       --multi newchain '{"name": "chain", "table": "test", "nfgen-family": 1}' \
       --multi batch-end '{"res-id": 10}'
      [None, None, None, None]
      
      It can also be used for bundling get requests:
      
      ./tools/net/ynl/cli.py \
       --spec Documentation/netlink/specs/nftables.yaml \
       --multi gettable '{"name": "test", "nfgen-family": 1}' \
       --multi getchain '{"name": "chain", "table": "test", "nfgen-family": 1}' \
       --output-json
      [{"name": "test", "use": 1, "handle": 1, "flags": [],
       "nfgen-family": 1, "version": 0, "res-id": 2},
       {"table": "test", "name": "chain", "handle": 1, "use": 0,
       "nfgen-family": 1, "version": 0, "res-id": 2}]
      
      There are 2 issues that may be worth resolving:
      
       - ynl reports errors by raising an NlError exception so only the first
         error gets reported. This could be changed to add errors to the list
         of responses so that multiple errors could be reported.
      
       - If any message does not get a response (e.g. batch-begin w/o patch 2)
         then ynl waits indefinitely. A recv timeout could be added which
         would allow ynl to terminate.
      ====================
      
      Link: https://lore.kernel.org/r/20240418104737.77914-1-donald.hunter@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netfilter: nfnetlink: Handle ACK flags for batch messages · bf2ac490
      Donald Hunter authored
      The NLM_F_ACK flag is ignored for nfnetlink batch begin and end
      messages. This is a problem for ynl which wants to receive an ack for
      every message it sends, not just the commands in between the begin/end
      messages.
      
      Add processing for ACKs for begin/end messages and provide responses
      when requested.
      
      I have checked that iproute2, pyroute2 and systemd are unaffected by
      this change since none of them use NLM_F_ACK for batch begin/end.
      Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
      Link: https://lore.kernel.org/r/20240418104737.77914-5-donald.hunter@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tools/net/ynl: Add multi message support to ynl · ba8be00f
      Donald Hunter authored
      Add a "--multi <do-op> <json>" command line option to ynl that makes it
      possible to add several operations to a single netlink request payload.
      The --multi command line option is repeated for each operation.
      
      This is used by the nftables family for transaction batches. For
      example:
      
      ./tools/net/ynl/cli.py \
       --spec Documentation/netlink/specs/nftables.yaml \
       --multi batch-begin '{"res-id": 10}' \
       --multi newtable '{"name": "test", "nfgen-family": 1}' \
       --multi newchain '{"name": "chain", "table": "test", "nfgen-family": 1}' \
       --multi batch-end '{"res-id": 10}'
      [None, None, None, None]
      
      It can also be used for bundling get requests:
      
      ./tools/net/ynl/cli.py \
       --spec Documentation/netlink/specs/nftables.yaml \
       --multi gettable '{"name": "test", "nfgen-family": 1}' \
       --multi getchain '{"name": "chain", "table": "test", "nfgen-family": 1}' \
       --output-json
      [{"name": "test", "use": 1, "handle": 1, "flags": [],
       "nfgen-family": 1, "version": 0, "res-id": 2},
       {"table": "test", "name": "chain", "handle": 1, "use": 0,
       "nfgen-family": 1, "version": 0, "res-id": 2}]
      Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
      Link: https://lore.kernel.org/r/20240418104737.77914-4-donald.hunter@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tools/net/ynl: Fix extack decoding for directional ops · 0a966d60
      Donald Hunter authored
      NetlinkProtocol.decode() was looking up ops by response value which breaks
      when it is used for extack decoding of directional ops. Instead, pass
      the op to decode().
      Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
      Link: https://lore.kernel.org/r/20240418104737.77914-3-donald.hunter@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • doc/netlink/specs: Add draft nftables spec · 1ee73168
      Donald Hunter authored
      Add a spec for nftables that has nearly complete coverage of the ops,
      but limited coverage of rule types and subexpressions.
      Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
      Link: https://lore.kernel.org/r/20240418104737.77914-2-donald.hunter@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'for-uring-ubufops' into HEAD · af046fd1
      Jakub Kicinski authored
      Pavel Begunkov says:
      
      ====================
      implement io_uring notification (ubuf_info) stacking (net part)
      
      To have per request buffer notifications each zerocopy io_uring send
      request allocates a new ubuf_info. However, as an skb can carry only
      one uarg, it may force the stack to create many small skbs hurting
      performance in many ways.
      
      The patchset implements notification, i.e. an io_uring's ubuf_info
      extension, stacking. It attempts to link ubuf_info's into a list,
      allowing to have multiple of them per skb.
      
      liburing/examples/send-zerocopy shows up to a 6 times performance
      improvement for TCP with 4KB per send, and puts it on par with
      MSG_ZEROCOPY. Without the patchset, much larger sends are required to
      utilise the full potential.
      
      bytes  | before | after (Kqps)
      1200   | 195    | 1023
      4000   | 193    | 1386
      8000   | 154    | 1058
      ====================
      
      Link: https://lore.kernel.org/all/cover.1713369317.git.asml.silence@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 22 Apr, 2024 11 commits