1. 15 Apr, 2024 5 commits
    • net: prestera: flower: validate control flags · f8a5ea8c
      Asbjørn Sloth Tønnesen authored
      Add check for unsupported control flags.
      
      Only compile-tested, no access to HW.
      Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: flower: fix check for unsupported control flags · e36245da
      Asbjørn Sloth Tønnesen authored
      Use flow_rule_is_supp_control_flags()
      
      Check the mask, not the key, for unsupported control flags.
      
      Only compile-tested, no access to HW
      Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
      Reviewed-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • flow_offload: add control flag checking helpers · d11e6311
      Asbjørn Sloth Tønnesen authored
      These helpers aim to help drivers check
      for the presence of unsupported control flags.
      
      For drivers supporting at least one control flag:
        flow_rule_is_supp_control_flags()
      
      For drivers using flow_rule_match_control(), but not using flags:
        flow_rule_has_control_flags()
      
      For drivers not using flow_rule_match_control():
        flow_rule_match_has_control_flags()
      
      While primarily aimed at FLOW_DISSECTOR_KEY_CONTROL
      and flow_rule_match_control(), the first two
      can also be used with FLOW_DISSECTOR_KEY_ENC_CONTROL
      and flow_rule_match_enc_control().
      
      These helpers mirror the existing check done in sfc:
        drivers/net/ethernet/sfc/tc.c +276
      
      Only compile-tested.
      Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
      Reviewed-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
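The check described in the two flower commits above can be modelled in user space. The following is a minimal sketch, not the kernel code: the `ex_` function names and the `EX_FLAG_*` bit values are hypothetical stand-ins chosen for illustration, and the error message goes to stderr instead of a netlink extack. It shows the core idea of rejecting a rule when its match mask contains any bit outside the driver's supported set.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative flag bits (assumption); the kernel's real control flag
 * definitions are not reproduced here. */
#define EX_FLAG_IS_FRAGMENT   (1u << 0)
#define EX_FLAG_FIRST_FRAG    (1u << 1)
#define EX_FLAG_ENCAPSULATION (1u << 2)

/* Hypothetical model of a "supported flags" check: returns true when
 * every bit set in the rule's match mask is in the supported set. */
static bool ex_is_supp_control_flags(uint32_t supp_flags, uint32_t mask_flags)
{
	if ((mask_flags & ~supp_flags) == 0)
		return true;

	fprintf(stderr, "Unsupported match on control.flags %#x\n",
		(unsigned int)mask_flags);
	return false;
}

/* Hypothetical model for a driver that supports no control flags:
 * returns true when the rule tries to match on any flag at all. */
static bool ex_has_control_flags(uint32_t mask_flags)
{
	return !ex_is_supp_control_flags(0, mask_flags);
}
```

The mask is what gets tested because it records which flags the rule cares about. A rule that matches "not a fragment" has the fragment bit set in the mask but cleared in the key, so a check against the key alone would miss it. A driver modelled this way would call `ex_is_supp_control_flags(EX_FLAG_IS_FRAGMENT, mask)` and return an error such as -EOPNOTSUPP when it yields false.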
    • net: dev_addr_lists: move locking out of init/exit in kunit · 3db3b629
      Jakub Kicinski authored
      We lock and unlock rtnl in init/exit for convenience,
      but this started causing problems when the exit is handled
      by a different thread. To avoid having to futz with
      disabling locking assertions, move the locking into
      the test cases. We don't use ASSERTs so it should
      be safe.
      
         ============= dev-addr-list-test (6 subtests) ==============
         [PASSED] dev_addr_test_basic
         [PASSED] dev_addr_test_sync_one
         [PASSED] dev_addr_test_add_del
         [PASSED] dev_addr_test_del_main
         [PASSED] dev_addr_test_add_set
         [PASSED] dev_addr_test_add_excl
         =============== [PASSED] dev-addr-list-test ================
      
      Link: https://lore.kernel.org/all/20240403131936.787234-7-linux@roeck-us.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drop_monitor: replace spin_lock by raw_spin_lock · f1e197a6
      Wander Lairson Costa authored
      trace_drop_common() is called with preemption disabled, and it acquires
      a spin_lock. This is problematic for RT kernels because spin_locks are
      sleeping locks in this configuration, which causes the following splat:
      
      BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
      in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 449, name: rcuc/47
      preempt_count: 1, expected: 0
      RCU nest depth: 2, expected: 2
      5 locks held by rcuc/47/449:
       #0: ff1100086ec30a60 ((softirq_ctrl.lock)){+.+.}-{2:2}, at: __local_bh_disable_ip+0x105/0x210
       #1: ffffffffb394a280 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0xbf/0x130
       #2: ffffffffb394a280 (rcu_read_lock){....}-{1:2}, at: __local_bh_disable_ip+0x11c/0x210
       #3: ffffffffb394a160 (rcu_callback){....}-{0:0}, at: rcu_do_batch+0x360/0xc70
       #4: ff1100086ee07520 (&data->lock){+.+.}-{2:2}, at: trace_drop_common.constprop.0+0xb5/0x290
      irq event stamp: 139909
      hardirqs last  enabled at (139908): [<ffffffffb1df2b33>] _raw_spin_unlock_irqrestore+0x63/0x80
      hardirqs last disabled at (139909): [<ffffffffb19bd03d>] trace_drop_common.constprop.0+0x26d/0x290
      softirqs last  enabled at (139892): [<ffffffffb07a1083>] __local_bh_enable_ip+0x103/0x170
      softirqs last disabled at (139898): [<ffffffffb0909b33>] rcu_cpu_kthread+0x93/0x1f0
      Preemption disabled at:
      [<ffffffffb1de786b>] rt_mutex_slowunlock+0xab/0x2e0
      CPU: 47 PID: 449 Comm: rcuc/47 Not tainted 6.9.0-rc2-rt1+ #7
      Hardware name: Dell Inc. PowerEdge R650/0Y2G81, BIOS 1.6.5 04/15/2022
      Call Trace:
       <TASK>
       dump_stack_lvl+0x8c/0xd0
       dump_stack+0x14/0x20
       __might_resched+0x21e/0x2f0
       rt_spin_lock+0x5e/0x130
       ? trace_drop_common.constprop.0+0xb5/0x290
       ? skb_queue_purge_reason.part.0+0x1bf/0x230
       trace_drop_common.constprop.0+0xb5/0x290
       ? preempt_count_sub+0x1c/0xd0
       ? _raw_spin_unlock_irqrestore+0x4a/0x80
       ? __pfx_trace_drop_common.constprop.0+0x10/0x10
       ? rt_mutex_slowunlock+0x26a/0x2e0
       ? skb_queue_purge_reason.part.0+0x1bf/0x230
       ? __pfx_rt_mutex_slowunlock+0x10/0x10
       ? skb_queue_purge_reason.part.0+0x1bf/0x230
       trace_kfree_skb_hit+0x15/0x20
       trace_kfree_skb+0xe9/0x150
       kfree_skb_reason+0x7b/0x110
       skb_queue_purge_reason.part.0+0x1bf/0x230
       ? __pfx_skb_queue_purge_reason.part.0+0x10/0x10
       ? mark_lock.part.0+0x8a/0x520
      ...
      
      trace_drop_common() also disables interrupts, but this is a minor issue
      because we could easily replace it with a local_lock.
      
      Replace the spin_lock with raw_spin_lock to avoid sleeping in atomic
      context.
      Signed-off-by: Wander Lairson Costa <wander@redhat.com>
      Reported-by: Hu Chunyu <chuhu@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 13 Apr, 2024 31 commits
  3. 12 Apr, 2024 4 commits
    • Merge branch 'nfp-minor-improvements' · 982a73c7
      David S. Miller authored
      Louis Peens says:
      
      ====================
      nfp: series of minor driver improvements
      
      This short series now only includes a small update to add a
      board part number to devlink. Previously some dim patches also formed
      part of this series; these were dropped in v5.
      
      Patch1: Add new define for devlink string "board.part_number"
      Patch2: Make use of this field in the nfp driver
      
      Changes since V4:
      - Dropped the dim patches, as there is a more significant rework in
        progress to make it more flexible, as mentioned in the V4 review:
        https://lore.kernel.org/all/1712547870-112976-2-git-send-email-hengqi@linux.alibaba.com/
      - Updated the devlink description of 'board.part_number'
      
      Changes since V3:
      - Fixed: Documentation/networking/devlink/devlink-info.rst:150:
          WARNING: Title underline too short.
      
      Changes since V2:
      - After some discussion on the previous series it was agreed that only
        the "board.part_number" field makes sense in the common code. The
        "board.model" field which was moved to devlink common code in V1 is
        now kept in the driver. The field is specific to the nfp driver,
        exposing the codename of the board.
      - In summary, add "board.part_number" to devlink, and populate it
        in the nfp driver.
      
      Changes since V1:
      - Move nfp local defines to devlink common code as it is quite generic.
      - Add new 'dim' profile instead of using driver local overrides, as this
        allows use of the 'dim' helpers.
      - This expanded 2 patches to 4, as the common code changes are split
        into separate patches.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: update devlink device info output · 8910f93b
      Fei Qin authored
      Newer NICs will introduce a new part number, so add it
      to the devlink device info.
      
      This patch also updates the information of "board.id" in
      nfp.rst to match the devlink-info.rst.
      Signed-off-by: Fei Qin <fei.qin@corigine.com>
      Signed-off-by: Louis Peens <louis.peens@corigine.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: add a new info version tag · 3bb946c9
      Fei Qin authored
      Add definition and documentation for the new generic
      info "board.part_number".
      
      The new one is for part number specific use, and board.id
      is modified to match the documentation in devlink-info.
      Signed-off-by: Fei Qin <fei.qin@corigine.com>
      Signed-off-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: increase the default TCP scaling ratio · 697a6c8c
      Hechao Li authored
      After commit dfa2f048 ("tcp: get rid of sysctl_tcp_adv_win_scale"),
      we noticed an application-level timeout due to reduced throughput.
      
      Before the commit, for a client that sets SO_RCVBUF to 65k, it takes
      around 22 seconds to transfer 10M data. After the commit, it takes 40
      seconds. Because our application has a 30-second timeout, this
      regression broke the application.
      
      The reason that it takes longer to transfer data is that
      tp->scaling_ratio is initialized to a value that results in ~0.25 of
      rcvbuf. In our case, SO_RCVBUF is set to 65536 by the application, which
      translates to 2 * 65536 = 131,072 bytes in rcvbuf and hence a ~28k
      initial receive window.
      
      Later, even though the scaling_ratio is updated to a more accurate
      skb->len/skb->truesize, which is ~0.66 in our environment, the window
      stays at ~0.25 * rcvbuf. This is because tp->window_clamp does not
      change together with the tp->scaling_ratio update when autotuning is
      disabled due to SO_RCVBUF. As a result, the window size is capped at the
      initial window_clamp, which is also ~0.25 * rcvbuf, and never grows
      bigger.
      
      Most modern applications let the kernel do autotuning, and benefit from
      the increased scaling_ratio. But there are applications such as Kafka
      that have a default setting of SO_RCVBUF=64k.
      
      This patch increases the initial scaling_ratio from ~25% to 50% in order
      to make it backward compatible with the original default
      sysctl_tcp_adv_win_scale for applications setting SO_RCVBUF.
      
      Fixes: dfa2f048 ("tcp: get rid of sysctl_tcp_adv_win_scale")
      Signed-off-by: Hechao Li <hli@netflix.com>
      Reviewed-by: Tycho Andersen <tycho@tycho.pizza>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/netdev/20240402215405.432863-1-hli@netflix.com/
      Signed-off-by: David S. Miller <davem@davemloft.net>
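The window arithmetic in the TCP commit above can be checked with a small user-space sketch. It assumes the ratio is stored as an 8-bit fixed-point fraction of 256. That encoding, the `EX_SCALE_SHIFT` constant, and the `ex_win_from_space()` name are illustrative assumptions, not taken from the commit text. The only inputs drawn from the commit are the 131072-byte rcvbuf and the ~25% and 50% ratios.

```c
#include <stdint.h>

/* Assumed fixed-point shift: a ratio of 256 would mean 100% of rcvbuf. */
#define EX_SCALE_SHIFT 8

/* Hypothetical helper: bytes of receive window obtained from `space`
 * bytes of buffer at the given fixed-point ratio. */
static uint32_t ex_win_from_space(uint32_t space, uint8_t ratio)
{
	return (uint32_t)(((uint64_t)space * ratio) >> EX_SCALE_SHIFT);
}
```

With SO_RCVBUF=65536 doubled to 131072 bytes, an exact 25% ratio (64/256) gives `ex_win_from_space(131072, 64)` = 32768 bytes, in the same range as the ~28k initial window the commit reports for its approximate ratio. At 50% (128/256) the same buffer gives 65536 bytes, which is why the window clamp for applications that pin SO_RCVBUF roughly doubles.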