1. 11 Aug, 2024 22 commits
    • net: phylib: do not disable autoneg for fixed speeds >= 1G · 6ff3cddc
      Russell King (Oracle) authored
      We have an increasing number of drivers that are forcing
      auto-negotiation to be enabled for speeds of 1G or faster.
      
      It would appear that auto-negotiation is mandatory for speeds above
      100M. In 802.3, Annex 40C's state diagrams seem to imply that
      mr_autoneg_enable (BMCR AN ENABLE) doesn't affect whether or not the
      AN state machines work for 1000base-T, and some PHY datasheets (e.g.
      Marvell Alaska) state that disabling mr_autoneg_enable leaves AN
      enabled but forced to 1G full duplex.
      
      Other PHY datasheets imply that BMCR AN ENABLE should not be cleared
      for >= 1G.
      
      Thus, this should be handled in phylib rather than in each driver.
      
      Rather than erroring out, arrange to implement the Marvell Alaska
      solution but in software for all PHYs: generate an appropriate
      single-speed advertisement for the requested speed, and keep AN
      enabled to the PHY driver. However, to avoid userspace API breakage,
      continue to report to userspace that we have AN disabled.
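
      The behaviour described above can be sketched in plain C. This is a
      minimal userspace illustration, not the phylib code: `struct link_cfg`,
      the `ADV_*` bits, `single_mode_adv()` and `apply_forced_speed()` are
      all hypothetical names invented for this sketch, and the set of
      supported speeds is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical advertisement bits, invented for this sketch. */
#define ADV_1000_HALF   (1u << 0)
#define ADV_1000_FULL   (1u << 1)
#define ADV_10000_FULL  (1u << 2)

struct link_cfg {
    int      speed;          /* requested speed in Mbit/s */
    bool     full_duplex;
    bool     user_autoneg;   /* what userspace asked for and is told back */
    bool     phy_autoneg;    /* what is passed down to the PHY driver */
    uint32_t advertising;    /* advertisement handed to the PHY driver */
};

/* Map a forced speed/duplex to a single-mode advertisement, 0 if unknown. */
static uint32_t single_mode_adv(int speed, bool full_duplex)
{
    if (speed == 1000)
        return full_duplex ? ADV_1000_FULL : ADV_1000_HALF;
    if (speed == 10000 && full_duplex)
        return ADV_10000_FULL;
    return 0;
}

/*
 * For a fixed speed >= 1G, keep AN enabled towards the PHY driver but
 * advertise only the requested mode. user_autoneg is never modified, so
 * userspace still sees "AN disabled".
 */
static bool apply_forced_speed(struct link_cfg *cfg)
{
    uint32_t adv;

    if (cfg->user_autoneg) {
        cfg->phy_autoneg = true;    /* nothing is being forced */
        return true;
    }
    if (cfg->speed < 1000) {
        cfg->phy_autoneg = false;   /* below 1G, AN can really be off */
        return true;
    }
    adv = single_mode_adv(cfg->speed, cfg->full_duplex);
    if (!adv)
        return false;
    cfg->advertising = adv;
    cfg->phy_autoneg = true;
    return true;
}
```

      With a forced 1000/full request, the sketch leaves `user_autoneg`
      false while `phy_autoneg` becomes true and `advertising` holds only
      the 1000/full bit.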
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mii: constify advertising mask · aa9fbc5d
      Russell King (Oracle) authored
      Constify the advertising mask to linkmode functions that only read from
      the advertising mask.
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mvpp2-child-port-removal' · 4efee05f
      David S. Miller authored
      Javier Carrasco says:
      
      ====================
      net: mvpp2: rework child node/port removal handling
      
      These two patches used to be part of another series [1] that did not
      apply to the networking tree without conflicts. This is therefore just a
      partial resend with no code modifications, just rebased onto net/main.
      
      Link: https://lore.kernel.org/all/20240806181026.5fe7f777@kernel.org/ [1]
      ====================
      Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: use device_for_each_child_node() to access device child nodes · a7b32744
      Javier Carrasco authored
      The iterated nodes are direct children of the device node, and the
      `device_for_each_child_node()` macro accounts for child node
      availability.
      
      `fwnode_for_each_available_child_node()` is meant to access the child
      nodes of an fwnode, and therefore not direct child nodes of the device
      node.
      
      The child nodes within mvpp2_probe are not accessed outside the loops,
      and the scoped version of the macro can be used to automatically
      decrement the refcount on early exits.
      
      Use `device_for_each_child_node()` and its scoped variant to indicate
      that these are the device's direct child nodes.
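
      How a scoped iterator can drop a reference automatically on an early
      exit may be illustrated with the GCC/Clang `cleanup` attribute. This
      is a userspace sketch under stated assumptions, not the kernel macro:
      `struct node`, `next_child()`, `for_each_child_scoped()` and
      `find_child()` are hypothetical names invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted node, standing in for a firmware node handle. */
struct node {
    int refcount;
    int id;
};

static void node_put(struct node *n)
{
    if (n)
        n->refcount--;
}

/* Cleanup handler: receives a pointer to the scoped variable. */
static void node_put_scoped(struct node **np)
{
    node_put(*np);
}

/* Drop the reference on prev and take one on the following child, if any. */
static struct node *next_child(struct node *children, size_t count,
                               struct node *prev)
{
    size_t idx = prev ? (size_t)(prev - children) + 1 : 0;

    node_put(prev);
    if (idx >= count)
        return NULL;
    children[idx].refcount++;
    return &children[idx];
}

/* The cleanup attribute runs node_put_scoped() on every exit from the loop. */
#define for_each_child_scoped(children, count, child)                     \
    for (struct node *child __attribute__((cleanup(node_put_scoped))) =   \
             next_child(children, count, NULL);                           \
         child; child = next_child(children, count, child))

/* Returns the index of the first child with the given id, or -1. */
static int find_child(struct node *children, size_t count, int id)
{
    for_each_child_scoped(children, count, child) {
        if (child->id == id)
            return (int)(child - children);   /* reference dropped for us */
    }
    return -1;
}
```

      In this sketch every refcount is back at its starting value after
      `find_child()` returns, whether the loop ran to completion or
      returned from the middle.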
      Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: use port_count to remove ports · e81d00a6
      Javier Carrasco authored
      As discussed in [1], there is no need to iterate over child nodes to
      remove the list of ports. Instead, a loop up to `port_count` ports can
      be used, and is in fact more reliable in case the child node
      availability changes.
      
      The suggested approach removes the need for the `fwnode` and
      `port_fwnode` variables in mvpp2_remove() as well.
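
      The idea of removing ports by count rather than by walking child
      nodes can be sketched as follows. The struct layout and the
      `remove_ports()` helper are hypothetical simplifications for
      illustration, not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PORTS 4   /* hypothetical limit for this sketch */

struct port {
    bool registered;
};

struct priv {
    struct port *port_list[MAX_PORTS];
    size_t       port_count;   /* ports that probe actually set up */
};

/* Tear down exactly the ports recorded at probe time; returns how many. */
static size_t remove_ports(struct priv *priv)
{
    size_t removed = 0;

    for (size_t i = 0; i < priv->port_count; i++) {
        priv->port_list[i]->registered = false;
        removed++;
    }
    priv->port_count = 0;
    return removed;
}
```

      Because the loop depends only on what probe recorded, a child node
      that became unavailable in the meantime cannot cause a port to be
      skipped.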
      
      Link: https://lore.kernel.org/all/ZqdRgDkK1PzoI2Pf@shell.armlinux.org.uk/ [1]
      Suggested-by: Russell King <linux@armlinux.org.uk>
      Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bnxt_en-fix-queue-reset-when-queue-active' · 80d021bc
      David S. Miller authored
      David Wei says:
      
      ====================
      fix bnxt_en queue reset when queue is active
      
      The current bnxt_en queue API implementation is buggy when resetting a
      queue that has active traffic. The problem is that there is no FW
      involved to stop the flow of packets and relying on napi_disable() isn't
      enough.
      
      To fix this, call bnxt_hwrm_vnic_update() with MRU set to 0 for both the
      default and the ntuple vnic to stop the flow of packets. This works for
      any Rx queue and not only those that have ntuple rules since every Rx
      queue is either in the default or the ntuple vnic.
      
      For bnxt_hwrm_vnic_update() to work, proper flushing must be done by the
      FW. A FW flag is there to indicate support and queue_mgmt_ops is keyed
      behind this.
      
      The first three patches are from Michael Chan and add the prerequisite
      vnic functions and FW flags indicating that it will properly flush
      during vnic update.
      
      Tested on BCM957504 while iperf3 is active:
      
      1. Reset a queue that has an ntuple rule steering flow into it
      2. Reset all queues in order, one at a time
      
      In both cases the flow is not interrupted.
      
      Sending this to net-next as there is no in-tree kernel consumer of queue
      API just yet, and there is a patch that changes when the queue_mgmt_ops
      is registered.
      Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
      ---
      v3:
       - include patches from Michael Chan that add a FW flag for vnic flush
         capability
       - key support for queue_mgmt_ops behind this new flag
      
      v2:
       - split setting vnic->mru into a separate patch (Wojciech)
       - clarify why napi_enable()/disable() is removed
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: only set dev->queue_mgmt_ops if supported by FW · 97cbf3d0
      David Wei authored
      The queue API calls bnxt_hwrm_vnic_update() to stop/start the flow of
      packets, which can only properly flush the pipeline if FW indicates
      support.
      
      Add a macro BNXT_SUPPORTS_QUEUE_API that checks for the required flags
      and only set queue_mgmt_ops if true.
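
      Gating an ops table behind firmware capability flags might look like
      the sketch below. The flag names, `struct netdev`, and
      `SUPPORTS_QUEUE_API()` are hypothetical stand-ins chosen for this
      example; the flags the driver actually checks may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits; the driver's real flag names may differ. */
#define FW_CAP_VNIC_UPDATE  (1u << 0)
#define FW_CAP_VNIC_FLUSH   (1u << 1)
#define QUEUE_API_CAPS      (FW_CAP_VNIC_UPDATE | FW_CAP_VNIC_FLUSH)

struct queue_ops {
    int (*queue_stop)(int idx);
    int (*queue_start)(int idx);
};

struct netdev {
    uint32_t                fw_caps;
    const struct queue_ops *queue_mgmt_ops;   /* NULL means "not offered" */
};

/* True only when every required capability bit is present. */
#define SUPPORTS_QUEUE_API(dev) \
    (((dev)->fw_caps & QUEUE_API_CAPS) == QUEUE_API_CAPS)

static void register_queue_ops(struct netdev *dev, const struct queue_ops *ops)
{
    dev->queue_mgmt_ops = SUPPORTS_QUEUE_API(dev) ? ops : NULL;
}
```

      Comparing the masked value against the full mask, rather than
      testing for non-zero, ensures that a device with only some of the
      required flags is not treated as supported.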
      Signed-off-by: David Wei <dw@davidwei.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: stop packet flow during bnxt_queue_stop/start · b9d2956e
      David Wei authored
      The current implementation when resetting a queue while packets are
      flowing puts the queue into an inconsistent state.
      
      There needs to be some synchronisation with the FW. Add calls to
      bnxt_hwrm_vnic_update() to set the MRU for both the default and ntuple
      vnic during queue start/stop. When the MRU is set to 0, flow is stopped.
      Each Rx queue belongs to either the default or the ntuple vnic.
      
      With bnxt_hwrm_vnic_update() being called, the calls to napi_enable()
      and napi_disable() must be removed for reset to work on a queue that
      has active traffic flowing, e.g. iperf3.
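
      The stop/start sequence can be modelled in a few lines of userspace
      C. This is only a sketch of the idea that an MRU of 0 stops the flow:
      `fw_vnic_update()` stands in for the firmware call, and every type
      and function name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { VNIC_DEFAULT, VNIC_NTUPLE, VNIC_COUNT };

struct vnic {
    uint16_t mru;   /* maximum receive unit; 0 means "drop everything" */
};

struct nic {
    struct vnic vnics[VNIC_COUNT];
    uint16_t    configured_mru;
};

/* Stand-in for the firmware call; a real driver would send a command. */
static int fw_vnic_update(struct vnic *vnic, uint16_t mru)
{
    vnic->mru = mru;
    return 0;
}

static bool vnic_accepts(const struct vnic *vnic, uint16_t pkt_len)
{
    return vnic->mru != 0 && pkt_len <= vnic->mru;
}

/* Every Rx queue sits in one of the two vnics, so update both. */
static int set_all_mru(struct nic *nic, uint16_t mru)
{
    for (int i = 0; i < VNIC_COUNT; i++) {
        int rc = fw_vnic_update(&nic->vnics[i], mru);

        if (rc)
            return rc;
    }
    return 0;
}

static int queue_stop(struct nic *nic)
{
    return set_all_mru(nic, 0);
}

static int queue_start(struct nic *nic)
{
    return set_all_mru(nic, nic->configured_mru);
}
```

      Between `queue_stop()` and `queue_start()` neither vnic in the model
      accepts a packet of any length.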
      Co-developed-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: David Wei <dw@davidwei.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: set vnic->mru in bnxt_hwrm_vnic_cfg() · d41575f7
      David Wei authored
      Set the newly added vnic->mru field in bnxt_hwrm_vnic_cfg().
      Signed-off-by: David Wei <dw@davidwei.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Check the FW's VNIC flush capability · 6e360862
      Michael Chan authored
      Check the HWRM_VNIC_QCAPS FW response for the receive engine flush
      capability.  This capability indicates that we can reliably support
      RX ring restart when calling HWRM_VNIC_UPDATE with MRU set to 0.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David Wei <dw@davidwei.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Add support to call FW to update a VNIC · f2878cde
      Michael Chan authored
      Add the function bnxt_hwrm_vnic_update() to call FW to update
      a VNIC.  This call can be used when disabling and enabling a
      receive ring within a VNIC.  The mru, which is the maximum receive
      size of packets received by the VNIC, can be updated.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David Wei <dw@davidwei.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Update firmware interface to 1.10.3.68 · fbda8ee6
      Michael Chan authored
      The main changes are:
      
      1. HWRM_VNIC_UPDATE used to safely disable and enable an RX ring within
      the VNIC.
      
      2. New flag in HWRM_VNIC_QCAPS to indicate FW will do the proper flush
      during HWRM_VNIC_UPDATE.
      
      3. New flag in HWRM_FUNC_QCAPS to indicate that reservations for some
      resources such as VNIC can be reduced.
      
      4. New backing store memory types not used by the driver yet.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David Wei <dw@davidwei.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'l2tp-misc-improvements' · 969afb43
      David S. Miller authored
      James Chapman says:
      
      ====================
      l2tp: misc improvements
      
      This series makes several improvements to l2tp:
      
       * update documentation to be consistent with recent l2tp changes.
       * move l2tp_ip socket tables to per-net data.
       * fix handling of hash key collisions in l2tp_v3_session_get
       * implement and use get-next APIs for management and procfs/debugfs.
       * improve l2tp refcount helpers.
       * use per-cpu dev->tstats in l2tpeth devices.
       * fix a lockdep splat.
       * fix a race between l2tp_pre_exit_net and pppol2tp_release.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • l2tp: flush workqueue before draining it · c1b2e36b
      James Chapman authored
      syzbot exposes a race where a net used by l2tp is removed while an
      existing pppol2tp socket is closed. In l2tp_pre_exit_net, l2tp queues
      TUNNEL_DELETE work items to close each tunnel in the net. When these
      are run, new SESSION_DELETE work items are queued to delete each
      session in the tunnel. This all happens in drain_workqueue. However,
      drain_workqueue allows new work items only if they are queued by other
      work items which are already in the queue. If pppol2tp_release runs
      after drain_workqueue has started, it may queue a SESSION_DELETE work
      item, which results in the warning below in drain_workqueue.
      
      Address this by flushing the workqueue before drain_workqueue such
      that all queued TUNNEL_DELETE work items run before drain_workqueue is
      started. This will queue SESSION_DELETE work items for each session in
      the tunnel, hence pppol2tp_release or other API requests won't queue
      SESSION_DELETE requests once drain_workqueue is started.
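
      The ordering problem can be reproduced with a single-threaded toy
      model. Everything below is a simplification written for this
      explanation: `struct toy_wq` and the `toy_*` functions are
      hypothetical, the race is simulated by a flag rather than by real
      concurrency, and the model assumes one tunnel whose sessions are
      counted in `sessions_left`.

```c
#include <assert.h>
#include <stdbool.h>

enum work_type { TUNNEL_DELETE, SESSION_DELETE };

#define WQ_MAX 8

struct toy_wq {
    enum work_type items[WQ_MAX];
    int  head, tail;
    bool draining;        /* set while toy_drain() runs */
    bool in_work;         /* set while a work item executes */
    int  warnings;        /* queue attempts rejected during a drain */
    int  sessions_left;   /* sessions with no SESSION_DELETE queued yet */
};

static void toy_queue(struct toy_wq *wq, enum work_type type)
{
    /* While draining, only a running work item may queue more work. */
    if (wq->draining && !wq->in_work) {
        wq->warnings++;
        return;
    }
    assert(wq->tail < WQ_MAX);
    wq->items[wq->tail++] = type;
}

/* Schedule deletion of one session unless that has already happened. */
static void toy_session_delete(struct toy_wq *wq)
{
    if (wq->sessions_left > 0) {
        wq->sessions_left--;
        toy_queue(wq, SESSION_DELETE);
    }
}

static void toy_run_one(struct toy_wq *wq)
{
    enum work_type type = wq->items[wq->head++];

    wq->in_work = true;
    if (type == TUNNEL_DELETE)
        while (wq->sessions_left > 0)
            toy_session_delete(wq);
    wq->in_work = false;
}

/* Run the items that were queued when the flush began. */
static void toy_flush(struct toy_wq *wq)
{
    int stop = wq->tail;

    while (wq->head < stop)
        toy_run_one(wq);
}

/* Run until empty; racing_release models a socket close arriving mid-drain. */
static void toy_drain(struct toy_wq *wq, bool racing_release)
{
    wq->draining = true;
    if (racing_release)
        toy_session_delete(wq);   /* the caller is not a work item */
    while (wq->head < wq->tail)
        toy_run_one(wq);
    wq->draining = false;
}
```

      In the model, draining straight away with a racing release records
      one rejected queue attempt, while calling `toy_flush()` first leaves
      the racing release with nothing to queue.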
      
        WARNING: CPU: 1 PID: 5467 at kernel/workqueue.c:2259 __queue_work+0xcd3/0xf50 kernel/workqueue.c:2258
        Modules linked in:
        CPU: 1 UID: 0 PID: 5467 Comm: syz.3.43 Not tainted 6.11.0-rc1-syzkaller-00247-g3608d6ac #0
        Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024
        RIP: 0010:__queue_work+0xcd3/0xf50 kernel/workqueue.c:2258
        Code: ff e8 11 84 36 00 90 0f 0b 90 e9 1e fd ff ff e8 03 84 36 00 eb 13 e8 fc 83 36 00 eb 0c e8 f5 83 36 00 eb 05 e8 ee 83 36 00 90 <0f> 0b 90 48 83 c4 60 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc
        RSP: 0018:ffffc90004607b48 EFLAGS: 00010093
        RAX: ffffffff815ce274 RBX: ffff8880661fda00 RCX: ffff8880661fda00
        RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
        RBP: 0000000000000000 R08: ffffffff815cd6d4 R09: 0000000000000000
        R10: ffffc90004607c20 R11: fffff520008c0f85 R12: ffff88802ac33800
        R13: ffff88802ac339c0 R14: dffffc0000000000 R15: 0000000000000008
        FS:  00005555713eb500(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000008 CR3: 000000001eda6000 CR4: 00000000003506f0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         <TASK>
         queue_work_on+0x1c2/0x380 kernel/workqueue.c:2392
         pppol2tp_release+0x163/0x230 net/l2tp/l2tp_ppp.c:445
         __sock_release net/socket.c:659 [inline]
         sock_close+0xbc/0x240 net/socket.c:1421
         __fput+0x24a/0x8a0 fs/file_table.c:422
         task_work_run+0x24f/0x310 kernel/task_work.c:228
         resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
         exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
         exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
         __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
         syscall_exit_to_user_mode+0x168/0x370 kernel/entry/common.c:218
         do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89
         entry_SYSCALL_64_after_hwframe+0x77/0x7f
        RIP: 0033:0x7f061e9779f9
        Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
        RSP: 002b:00007ffff1c1fce8 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4
        RAX: 0000000000000000 RBX: 000000000001017d RCX: 00007f061e9779f9
        RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003
        RBP: 00007ffff1c1fdc0 R08: 0000000000000001 R09: 00007ffff1c1ffcf
        R10: 00007f061e800000 R11: 0000000000000246 R12: 0000000000000032
        R13: 00007ffff1c1fde0 R14: 00007ffff1c1fe00 R15: ffffffffffffffff
        </TASK>
      
      Fixes: fc7ec7f5 ("l2tp: delete sessions using work queue")
      Reported-by: syzbot+0e85b10481d2f5478053@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=0e85b10481d2f5478053
      Signed-off-by: James Chapman <jchapman@katalix.com>
      Signed-off-by: Tom Parkin <tparkin@katalix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • l2tp: l2tp_eth: use per-cpu counters from dev->tstats · dcc59d3e
      James Chapman authored
      l2tp_eth uses old-style dev->stats for fastpath packet/byte
      counters. Convert it to use dev->tstats per-cpu counters.
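
      The general shape of per-CPU counters is shown in the sketch below.
      It is a userspace illustration with hypothetical names
      (`struct toy_dev`, `rx_account()`, `read_stats()`) and a fixed,
      assumed CPU count; it ignores the synchronisation that real kernel
      code needs for consistent 64-bit reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_CPUS 4   /* hypothetical CPU count for this sketch */

struct cpu_stats {
    uint64_t rx_packets;
    uint64_t rx_bytes;
};

struct toy_dev {
    struct cpu_stats tstats[TOY_NR_CPUS];   /* one slot per CPU */
};

/* Fast path: touch only the slot of the CPU handling the packet. */
static void rx_account(struct toy_dev *dev, unsigned int cpu, size_t len)
{
    assert(cpu < TOY_NR_CPUS);
    dev->tstats[cpu].rx_packets++;
    dev->tstats[cpu].rx_bytes += len;
}

/* Slow path: sum all slots when statistics are read. */
static struct cpu_stats read_stats(const struct toy_dev *dev)
{
    struct cpu_stats total = { 0, 0 };

    for (unsigned int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
        total.rx_packets += dev->tstats[cpu].rx_packets;
        total.rx_bytes   += dev->tstats[cpu].rx_bytes;
    }
    return total;
}
```

      The fast path never writes to a slot owned by another CPU, and the
      cost of combining the slots is paid only when statistics are read.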
      Signed-off-by: James Chapman <jchapman@katalix.com>
      Signed-off-by: Tom Parkin <tparkin@katalix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • l2tp: improve tunnel/session refcount helpers · abe7a1a7
      James Chapman authored
      l2tp_tunnel_inc_refcount and l2tp_session_inc_refcount wrap
      refcount_inc. They add no value so just use the refcount APIs directly
      and drop l2tp's helpers. l2tp already uses refcount_inc_not_zero
      anyway.
      
      Rename l2tp_tunnel_dec_refcount and l2tp_session_dec_refcount to
      l2tp_tunnel_put and l2tp_session_put to better match their pairing
      with the various _get getters.
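
      The get/put pairing and the "increment unless zero" rule can be
      sketched with C11 atomics. This is an analogy in userspace, not the
      kernel refcount API: `struct toy_session`, `ref_inc_not_zero()`,
      `toy_session_get()` and `toy_session_put()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_session {
    atomic_int refcount;
    bool       freed;   /* stands in for actually freeing the object */
};

/* Take a reference unless the count has already dropped to zero. */
static bool ref_inc_not_zero(atomic_int *ref)
{
    int old = atomic_load(ref);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(ref, &old, old + 1))
            return true;
    }
    return false;
}

static struct toy_session *toy_session_get(struct toy_session *s)
{
    return ref_inc_not_zero(&s->refcount) ? s : NULL;
}

static void toy_session_put(struct toy_session *s)
{
    /* fetch_sub returns the previous value: 1 means this was the last ref. */
    if (atomic_fetch_sub(&s->refcount, 1) == 1)
        s->freed = true;
}
```

      Once the last reference has been put, a later `toy_session_get()`
      returns NULL instead of reviving the object.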
      Signed-off-by: James Chapman <jchapman@katalix.com>
      Signed-off-by: Tom Parkin <tparkin@katalix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • l2tp: use get_next APIs for management requests and procfs/debugfs · 1f4c3dce
      James Chapman authored
      l2tp netlink and procfs/debugfs iterate over tunnel and session lists
      to obtain data. They currently use very inefficient get_nth functions
      to do so. Replace these with get_next.
      
      For netlink, use nl cb->ctx[] for passing state instead of the
      obsolete cb->args[].
      
      l2tp_tunnel_get_nth and l2tp_session_get_nth are no longer used so
      they can be removed.
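
      A resumable dump built on a get_next-style lookup might look like
      the sketch below. The sorted array stands in for the real ID store,
      `*ctx` plays the role of the saved dump state, and `get_next()` and
      `dump_chunk()` are hypothetical helpers written for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sorted IDs stand in for the IDR; the lookup is linear only for brevity. */
static const uint32_t *get_next(const uint32_t *ids, size_t n, uint64_t start)
{
    for (size_t i = 0; i < n; i++)
        if (ids[i] >= start)
            return &ids[i];
    return NULL;
}

/*
 * Copy up to cap IDs into out, resuming from *ctx, and leave *ctx just
 * past the last ID returned so that the next call continues from there.
 */
static size_t dump_chunk(const uint32_t *ids, size_t n, uint64_t *ctx,
                         uint32_t *out, size_t cap)
{
    size_t count = 0;

    while (count < cap) {
        const uint32_t *id = get_next(ids, n, *ctx);

        if (!id)
            break;
        out[count++] = *id;
        *ctx = (uint64_t)*id + 1;   /* 64-bit cursor avoids wraparound */
    }
    return count;
}
```

      Resuming by key rather than by position means each call starts from
      the last ID seen, instead of counting n entries from the beginning
      every time.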
      Signed-off-by: James Chapman <jchapman@katalix.com>
      Signed-off-by: Tom Parkin <tparkin@katalix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • l2tp: add tunnel/session get_next helpers · aa92c1ce
      James Chapman authored
      l2tp management APIs and procfs/debugfs iterate over l2tp tunnel and
      session lists. Since these lists are now implemented using IDR, we can
      use IDR get_next APIs to iterate them. Add tunnel/session get_next
      functions to do so.
      
      The session get_next functions get the next session in a given tunnel
      and need to account for l2tpv2 and l2tpv3 differences:
      
       * l2tpv2 sessions are keyed by tunnel ID / session ID. Iteration for
         a given tunnel ID, TID, can therefore start with a key given by
         TID/0 and finish when the next entry's tunnel ID is not TID. This
         is possible only because the tunnel ID part of the key is the upper
         16 bits and the session ID part the lower 16 bits; when idr_next
         increments the key value, it therefore finds the next sessions of
         the current tunnel before those of the next tunnel. Entries with
         session ID 0 are always skipped because they are used internally by
         pppol2tp.
      
       * l2tpv3 sessions are keyed by session ID. Iteration starts at the
         first IDR entry and skips entries where the tunnel does not
         match. Iteration must also consider session ID collisions and walk
         the list of colliding sessions (if any) for one which matches the
         supplied tunnel.
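
      The l2tpv2 case described above can be worked through in a small
      sketch. The key layout (tunnel ID in the upper 16 bits, session ID
      in the lower 16) follows the text; the sorted array in place of the
      IDR and the helper names `v2_key()`, `key_get_next()`,
      `v2_session_get_next()` and `count_sessions()` are assumptions made
      for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* l2tpv2-style key: tunnel ID in the upper 16 bits, session ID in the lower. */
static uint32_t v2_key(uint16_t tunnel_id, uint16_t session_id)
{
    return ((uint32_t)tunnel_id << 16) | session_id;
}

/* Sorted keys stand in for the IDR: return the first key >= start. */
static const uint32_t *key_get_next(const uint32_t *keys, size_t n,
                                    uint64_t start)
{
    for (size_t i = 0; i < n; i++)
        if (keys[i] >= start)
            return &keys[i];
    return NULL;
}

/*
 * Find the next session of tunnel_id at or after *cursor. On success the
 * session ID is stored in *session_id and *cursor is advanced past it.
 */
static bool v2_session_get_next(const uint32_t *keys, size_t n,
                                uint16_t tunnel_id, uint64_t *cursor,
                                uint16_t *session_id)
{
    for (;;) {
        const uint32_t *key = key_get_next(keys, n, *cursor);

        if (!key || (*key >> 16) != tunnel_id)
            return false;          /* reached the next tunnel or the end */
        *cursor = (uint64_t)*key + 1;
        if ((*key & 0xffff) == 0)
            continue;              /* session ID 0 is reserved, skip it */
        *session_id = (uint16_t)(*key & 0xffff);
        return true;
    }
}

static size_t count_sessions(const uint32_t *keys, size_t n,
                             uint16_t tunnel_id)
{
    uint64_t cursor = v2_key(tunnel_id, 0);
    uint16_t sid;
    size_t count = 0;

    while (v2_session_get_next(keys, n, tunnel_id, &cursor, &sid))
        count++;
    return count;
}
```

      Because the tunnel ID occupies the high bits, all keys of one tunnel
      are contiguous in key order, which is what lets the loop stop at the
      first key whose upper half differs.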
      Signed-off-by: James Chapman <jchapman@katalix.com>
      Signed-off-by: Tom Parkin <tparkin@katalix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • l2tp: handle hash key collisions in l2tp_v3_session_get · b0a8deda
      James Chapman authored
      To handle colliding l2tpv3 session IDs, l2tp_v3_session_get searches a
      hashed list keyed by ID and sk. Although unlikely, if hash keys
      collide, it is possible that hash_for_each_possible loops over a
      session which doesn't have the ID that we are searching for. So check
      for session ID match when looping over possible hash key matches.
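
      Why the extra ID comparison matters can be seen in a toy hash table
      where two different session IDs land in the same bucket. The hash
      function here is deliberately weak and, like `struct toy_session`,
      `session_add()` and `session_get()`, is a hypothetical construction
      for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 4

struct toy_session {
    uint32_t            id;
    uintptr_t           sk;     /* stands in for the socket pointer */
    struct toy_session *next;   /* bucket chain */
};

/* Deliberately weak hash so that distinct IDs can share a bucket. */
static unsigned int hash_key(uint32_t id, uintptr_t sk)
{
    return (unsigned int)((id + sk) % NBUCKETS);
}

static void session_add(struct toy_session **table, struct toy_session *s)
{
    unsigned int b = hash_key(s->id, s->sk);

    s->next = table[b];
    table[b] = s;
}

static struct toy_session *session_get(struct toy_session **table,
                                       uint32_t id, uintptr_t sk)
{
    for (struct toy_session *s = table[hash_key(id, sk)]; s; s = s->next) {
        /* A bucket may hold other IDs, so compare the ID as well as sk. */
        if (s->id == id && s->sk == sk)
            return s;
    }
    return NULL;
}
```

      With IDs 1 and 5 on the same `sk` sharing a bucket, a lookup that
      compared only `sk` would return whichever entry came first in the
      chain; comparing the ID as well returns the right one.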
      Signed-off-by: James Chapman <jchapman@katalix.com>
      Signed-off-by: Tom Parkin <tparkin@katalix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • l2tp: move l2tp_ip and l2tp_ip6 data to pernet · ebed6606
      James Chapman authored
      l2tp_ip[6] have always used global socket tables. It is therefore not
      possible to create l2tpip sockets in different namespaces with the
      same socket address.
      
      To support this, move l2tpip socket tables to pernet data.
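
      The effect of moving a table from global to per-namespace scope can
      be shown with a minimal model. `struct toy_net` and `toy_bind()` are
      hypothetical, and addresses are reduced to a single integer for
      brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SOCKS 8

/* Per-namespace state: each net owns its own table of bound addresses. */
struct toy_net {
    uint32_t bound[MAX_SOCKS];
    size_t   count;
};

/* Bind fails only if the address is already in use in the same namespace. */
static bool toy_bind(struct toy_net *net, uint32_t addr)
{
    for (size_t i = 0; i < net->count; i++)
        if (net->bound[i] == addr)
            return false;
    if (net->count == MAX_SOCKS)
        return false;
    net->bound[net->count++] = addr;
    return true;
}
```

      Two namespaces can each bind the same address because the conflict
      check only ever looks at the caller's own table.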
      Signed-off-by: James Chapman <jchapman@katalix.com>
      Signed-off-by: Tom Parkin <tparkin@katalix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • l2tp: remove inline from functions in c sources · 168464c1
      James Chapman authored
      Update l2tp to remove the inline keyword from several functions in C
      sources, since this is now discouraged.
      Signed-off-by: James Chapman <jchapman@katalix.com>
      Signed-off-by: Tom Parkin <tparkin@katalix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • documentation/networking: update l2tp docs · e2b1762c
      James Chapman authored
      l2tp no longer uses sk_user_data in tunnel sockets and now manages
      tunnel/session lifetimes slightly differently. Update docs to cover
      this.
      
      CC: linux-doc@vger.kernel.org
      CC: corbet@lwn.net
      Signed-off-by: James Chapman <jchapman@katalix.com>
      Signed-off-by: Tom Parkin <tparkin@katalix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 10 Aug, 2024 18 commits