1. 15 Jun, 2023 8 commits
    • MAINTAINERS: add reviewers for SMC Sockets · 7d03646d
      Jan Karcher authored
      Add three people from Alibaba as reviewers for SMC.
      They are currently working on improving SMC on architectures other
      than s390 and also help with reviewing patches.

      Thank you D. Wythe, Tony Lu and Wen Gu for your contributions and
      collaboration, and welcome on board as reviewers!
      Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
      Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
      Acked-by: Tony Lu <tonylu@linux.alibaba.com>
      Acked-by: Wen Gu <guwen@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/ism: Fix trying to free already-freed IRQ by repeated ism_dev_exit() · 78d0f949
      Julian Ruess authored
      This patch prevents the system from crashing when unloading the ISM module.
      
      How to reproduce: Attach an ISM device and execute 'rmmod ism'.
      
      Error-Log:
      - Trying to free already-free IRQ 0
      - WARNING: CPU: 1 PID: 966 at kernel/irq/manage.c:1890 free_irq+0x140/0x540
      
      After calling ism_dev_exit() for each ISM device in the exit routine,
      pci_unregister_driver() will execute ism_remove() for each ISM device.
      Because ism_remove() also calls ism_dev_exit(),
      free_irq(pci_irq_vector(pdev, 0), ism) is called twice for each ISM
      device. This results in a crash with the error
      'Trying to free already-free IRQ'.
      
      In the exit routine, it is enough to call pci_unregister_driver()
      because it ensures that ism_dev_exit() is called once per
      ISM device.
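
      The double teardown can be reproduced with a small user-space model.
      This is a sketch only: struct ism_dev_model and the model_* functions
      are hypothetical stand-ins for ism_dev_exit(), pci_unregister_driver()
      and the module exit routine, and the model assumes the remove callback
      calls the exit helper exactly once per bound device.

```c
#include <stdbool.h>

/* Hypothetical per-device state; not the kernel's struct ism_dev. */
struct ism_dev_model {
	bool irq_requested;  /* true while the device's IRQ is allocated */
	int double_frees;    /* times free_irq() would hit an already-freed IRQ */
};

/* Stand-in for ism_dev_exit(): releases the device's IRQ. */
void model_dev_exit(struct ism_dev_model *dev)
{
	if (!dev->irq_requested) {
		dev->double_frees++;  /* "Trying to free already-free IRQ" */
		return;
	}
	dev->irq_requested = false;
}

/* Stand-in for pci_unregister_driver(): invokes the remove callback,
 * which calls model_dev_exit(), once for every bound device. */
void model_unregister_driver(struct ism_dev_model *devs, int n)
{
	for (int i = 0; i < n; i++)
		model_dev_exit(&devs[i]);
}

/* Old exit routine: explicit per-device exit, then driver unregistration. */
void model_module_exit_old(struct ism_dev_model *devs, int n)
{
	for (int i = 0; i < n; i++)
		model_dev_exit(&devs[i]);
	model_unregister_driver(devs, n);
}

/* Fixed exit routine: unregistering alone tears down each device once. */
void model_module_exit_fixed(struct ism_dev_model *devs, int n)
{
	model_unregister_driver(devs, n);
}
```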
      
      Cc: <stable@vger.kernel.org> # 6.3+
      Fixes: 89e7d2ba ("net/ism: Add new API for client registration")
      Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
      Signed-off-by: Julian Ruess <julianr@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: felix: fix taprio guard band overflow at 10Mbps with jumbo frames · 6ac7a27a
      Vladimir Oltean authored
      The DEV_MAC_MAXLEN_CFG register contains a 16-bit value - up to 65535.
      Plus 2 * VLAN_HLEN (4), that is up to 65543.
      
      The picos_per_byte variable is the largest when "speed" is lowest -
      SPEED_10 = 10. In that case it is (1000000L * 8) / 10 = 800000.
      
      Their product, 52434400000, exceeds 32 bits, which is a problem because
      a multiplication between two 32-bit factors is evaluated in 32-bit
      arithmetic before the result is assigned to a 64-bit variable.
      In fact, it's a problem for any MTU value larger than 5368.
      
      Cast one of the factors of the multiplication to u64 to force the
      multiplication to take place on 64 bits.
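
      The overflow can be reproduced in user space with the numbers above.
      This is a sketch: the helper names are hypothetical, and it assumes a
      32-bit int, so that uint32_t operands are multiplied as unsigned int
      and wrap modulo 2^32.

```c
#include <stdint.h>

/* picos_per_byte as given in the commit message: (1000000L * 8) / speed.
 * At SPEED_10 this is 800000. */
uint32_t picos_per_byte(uint32_t speed_mbps)
{
	return (uint32_t)((1000000L * 8) / speed_mbps);
}

/* Buggy: both factors are 32-bit, so the product is computed (and wraps)
 * in 32-bit arithmetic before being widened to 64 bits. */
uint64_t max_frame_ps_buggy(uint32_t maxlen, uint32_t ppb)
{
	uint64_t ps = maxlen * ppb;
	return ps;
}

/* Fixed: casting one factor to a 64-bit type forces a 64-bit multiplication. */
uint64_t max_frame_ps_fixed(uint32_t maxlen, uint32_t ppb)
{
	uint64_t ps = (uint64_t)maxlen * ppb;
	return ps;
}
```

      With these numbers, the 32-bit product of 65543 and 800000 wraps to
      894792448 instead of 52434400000, and 5369 is the first length whose
      product no longer fits in 32 bits.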
      
      Issue found by Coverity.
      
      Fixes: 55a515b1 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230613170907.2413559-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/sched: cls_api: Fix lockup on flushing explicitly created chain · c9a82bec
      Vlad Buslov authored
      Mingshuai Ren reports:
      
      When a new chain is added by using tc, one soft lockup alarm will be
       generated after delete the prio 0 filter of the chain. To reproduce
       the problem, perform the following steps:
      (1) tc qdisc add dev eth0 root handle 1: htb default 1
      (2) tc chain add dev eth0
      (3) tc filter del dev eth0 chain 0 parent 1: prio 0
      (4) tc filter add dev eth0 chain 0 parent 1:
      
      Fix the issue by accounting for the additional reference held by chains
      that are explicitly created by an RTM_NEWCHAIN message, as opposed to
      implicitly by an RTM_NEWTFILTER message.
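
      The reference accounting can be illustrated with a minimal model. It
      assumes, as a simplification, that a chain carries a reference count
      plus a "flushing" flag that blocks insertion of new filters (see the
      commit in Fixes: below) and is cleared when the last relevant reference
      is dropped. struct chain_model and the chain_put_* functions are
      hypothetical, not the kernel's tcf_chain code.

```c
#include <stdbool.h>

/* Hypothetical, simplified chain model; not the kernel's struct tcf_chain. */
struct chain_model {
	unsigned int refcnt;      /* all references, including an explicit one */
	bool explicitly_created;  /* chain was created by RTM_NEWCHAIN */
	bool flushing;            /* set by a prio 0 (flush) filter delete */
};

/* Before: 'flushing' is cleared only when no references remain, which
 * never happens while the RTM_NEWCHAIN reference is still held. */
void chain_put_before(struct chain_model *c)
{
	if (--c->refcnt == 0)
		c->flushing = false;
}

/* After: the reference held by explicit creation is the baseline, so
 * dropping the last other reference ends the flush. */
void chain_put_after(struct chain_model *c)
{
	if (--c->refcnt == (c->explicitly_created ? 1u : 0u))
		c->flushing = false;
}
```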
      
      Fixes: 726d0612 ("net: sched: prevent insertion of new classifiers during chain flush")
      Reported-by: Mingshuai Ren <renmingshuai@huawei.com>
      Closes: https://lore.kernel.org/lkml/87legswvi3.fsf@nvidia.com/T/
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Link: https://lore.kernel.org/r/20230612093426.2867183-1-vladbu@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ice: Fix ice module unload · 24b454bc
      Jakub Buchocki authored
      Clearing the interrupt scheme before the PF reset (PFR) during the
      removal routine could cause hardware errors and possibly lead to a
      system reboot, as the PF reset can cause an interrupt to be generated.

      Place the PFR call inside ice_deinit_dev(), wait until the reset and
      all pending transactions are done, then call
      ice_clear_interrupt_scheme().
      
      This introduces a PFR reset to multiple error paths.
      
      Additionally, remove the call for the reset from
      ice_load() - it will be a part of ice_unload() now.
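
      The ordering requirement can be expressed as a check over a teardown
      sequence. This is a sketch: the step names are hypothetical labels for
      the actions described above, not driver symbols.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the teardown actions described above. */
enum teardown_step {
	STEP_PF_RESET,          /* issue the PF reset (PFR) */
	STEP_WAIT_RESET_DONE,   /* wait for the reset and pending transactions */
	STEP_CLEAR_IRQ_SCHEME,  /* the ice_clear_interrupt_scheme() step */
};

/* Returns true only if the interrupt scheme is cleared after a PF reset
 * has been issued and then waited for, i.e. the order the fix uses. */
bool teardown_order_ok(const enum teardown_step *steps, size_t n)
{
	bool reset_issued = false;
	bool reset_done = false;

	for (size_t i = 0; i < n; i++) {
		switch (steps[i]) {
		case STEP_PF_RESET:
			reset_issued = true;
			break;
		case STEP_WAIT_RESET_DONE:
			reset_done = reset_issued;
			break;
		case STEP_CLEAR_IRQ_SCHEME:
			return reset_done;
		}
	}
	return false;  /* the interrupt scheme was never cleared */
}
```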
      
      Error example:
      [   75.229328] ice 0000:ca:00.1: Failed to read Tx Scheduler Tree - User Selection data from flash
      [   77.571315] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
      [   77.571418] {1}[Hardware Error]: event severity: recoverable
      [   77.571459] {1}[Hardware Error]:  Error 0, type: recoverable
      [   77.571500] {1}[Hardware Error]:   section_type: PCIe error
      [   77.571540] {1}[Hardware Error]:   port_type: 4, root port
      [   77.571580] {1}[Hardware Error]:   version: 3.0
      [   77.571615] {1}[Hardware Error]:   command: 0x0547, status: 0x4010
      [   77.571661] {1}[Hardware Error]:   device_id: 0000:c9:02.0
      [   77.571703] {1}[Hardware Error]:   slot: 25
      [   77.571736] {1}[Hardware Error]:   secondary_bus: 0xca
      [   77.571773] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x347a
      [   77.571821] {1}[Hardware Error]:   class_code: 060400
      [   77.571858] {1}[Hardware Error]:   bridge: secondary_status: 0x2800, control: 0x0013
      [   77.572490] pcieport 0000:c9:02.0: AER: aer_status: 0x00200000, aer_mask: 0x00100020
      [   77.572870] pcieport 0000:c9:02.0:    [21] ACSViol                (First)
      [   77.573222] pcieport 0000:c9:02.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
      [   77.573554] pcieport 0000:c9:02.0: AER: aer_uncor_severity: 0x00463010
      [   77.691273] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
      [   77.691738] {2}[Hardware Error]: event severity: recoverable
      [   77.691971] {2}[Hardware Error]:  Error 0, type: recoverable
      [   77.692192] {2}[Hardware Error]:   section_type: PCIe error
      [   77.692403] {2}[Hardware Error]:   port_type: 4, root port
      [   77.692616] {2}[Hardware Error]:   version: 3.0
      [   77.692825] {2}[Hardware Error]:   command: 0x0547, status: 0x4010
      [   77.693032] {2}[Hardware Error]:   device_id: 0000:c9:02.0
      [   77.693238] {2}[Hardware Error]:   slot: 25
      [   77.693440] {2}[Hardware Error]:   secondary_bus: 0xca
      [   77.693641] {2}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x347a
      [   77.693853] {2}[Hardware Error]:   class_code: 060400
      [   77.694054] {2}[Hardware Error]:   bridge: secondary_status: 0x0800, control: 0x0013
      [   77.719115] pci 0000:ca:00.1: AER: can't recover (no error_detected callback)
      [   77.719140] pcieport 0000:c9:02.0: AER: device recovery failed
      [   77.719216] pcieport 0000:c9:02.0: AER: aer_status: 0x00200000, aer_mask: 0x00100020
      [   77.719390] pcieport 0000:c9:02.0:    [21] ACSViol                (First)
      [   77.719557] pcieport 0000:c9:02.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
      [   77.719723] pcieport 0000:c9:02.0: AER: aer_uncor_severity: 0x00463010
      
      Fixes: 5b246e53 ("ice: split probe into smaller functions")
      Signed-off-by: Jakub Buchocki <jakubx.buchocki@intel.com>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230612171421.21570-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · d6858e19
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-06-12 (igc, igb)
      
      This series contains updates to igc and igb drivers.
      
      Husaini clears Tx rings when the interface is brought down on igc.

      Vinicius disables PTM and PCI busmaster when removing the igc driver.

      Alex adds an error check and error path for NVM read errors on igb.
      
      * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        igb: fix nvm.ops.read() error handling
        igc: Fix possible system crash when loading module
        igc: Clean the TX buffer and TX descriptor ring
      ====================
      
      Link: https://lore.kernel.org/r/20230612205208.115292-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/handshake: remove fput() that causes use-after-free · 361b6889
      Lin Ma authored
      A reference underflow was found in the TLS handshake subsystem that
      causes a direct use-after-free. Part of the crash log is shown below:
      
      [    2.022114] ------------[ cut here ]------------
      [    2.022193] refcount_t: underflow; use-after-free.
      [    2.022288] WARNING: CPU: 0 PID: 60 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110
      [    2.022432] Modules linked in:
      [    2.022848] RIP: 0010:refcount_warn_saturate+0xbe/0x110
      [    2.023231] RSP: 0018:ffffc900001bfe18 EFLAGS: 00000286
      [    2.023325] RAX: 0000000000000000 RBX: 0000000000000007 RCX: 00000000ffffdfff
      [    2.023438] RDX: 0000000000000000 RSI: 00000000ffffffea RDI: 0000000000000001
      [    2.023555] RBP: ffff888004c20098 R08: ffffffff82b392c8 R09: 00000000ffffdfff
      [    2.023693] R10: ffffffff82a592e0 R11: ffffffff82b092e0 R12: ffff888004c200d8
      [    2.023813] R13: 0000000000000000 R14: ffff888004c20000 R15: ffffc90000013ca8
      [    2.023930] FS:  0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
      [    2.024062] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    2.024161] CR2: ffff888003601000 CR3: 0000000002a2e000 CR4: 00000000000006f0
      [    2.024275] Call Trace:
      [    2.024322]  <TASK>
      [    2.024367]  ? __warn+0x7f/0x130
      [    2.024430]  ? refcount_warn_saturate+0xbe/0x110
      [    2.024513]  ? report_bug+0x199/0x1b0
      [    2.024585]  ? handle_bug+0x3c/0x70
      [    2.024676]  ? exc_invalid_op+0x18/0x70
      [    2.024750]  ? asm_exc_invalid_op+0x1a/0x20
      [    2.024830]  ? refcount_warn_saturate+0xbe/0x110
      [    2.024916]  ? refcount_warn_saturate+0xbe/0x110
      [    2.024998]  __tcp_close+0x2f4/0x3d0
      [    2.025065]  ? __pfx_kunit_generic_run_threadfn_adapter+0x10/0x10
      [    2.025168]  tcp_close+0x1f/0x70
      [    2.025231]  inet_release+0x33/0x60
      [    2.025297]  sock_release+0x1f/0x80
      [    2.025361]  handshake_req_cancel_test2+0x100/0x2d0
      [    2.025457]  kunit_try_run_case+0x4c/0xa0
      [    2.025532]  kunit_generic_run_threadfn_adapter+0x15/0x20
      [    2.025644]  kthread+0xe1/0x110
      [    2.025708]  ? __pfx_kthread+0x10/0x10
      [    2.025780]  ret_from_fork+0x2c/0x50
      
      Enable CONFIG_NET_HANDSHAKE_KUNIT_TEST to reproduce the above crash.
      
      The root cause of this bug is that commit 1ce77c99
      ("net/handshake: Unpin sock->file if a handshake is cancelled") adds an
      additional fput() call. That patch claims the fput() is needed so that
      sock->file can be freed even when user space never calls DONE.
      
      However, it seems that the DONE routine never issues an additional
      fput() of sock->file. The two existing fput() calls only balance the
      reference taken in sockfd_lookup().
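
      The balance argument can be modeled with a toy counter in which every
      get must be matched by exactly one put. This is a sketch under stated
      assumptions: struct file_model and its helpers are hypothetical, and the
      sequence (owner reference, one reference taken by the lookup, its
      balancing put, final release) is a simplification of the real lifecycle.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a file's reference count (not struct file). */
struct file_model {
	int count;
	bool underflow;  /* set when a put finds no reference left */
};

void model_get(struct file_model *f)
{
	f->count++;
}

void model_put(struct file_model *f)
{
	if (f->count == 0) {
		f->underflow = true;  /* cf. "refcount_t: underflow; use-after-free." */
		return;
	}
	f->count--;
}

/* Simplified lifecycle: returns true if the sequence underflows. */
bool simulate_cancel(bool extra_fput)
{
	struct file_model f = { .count = 1, .underflow = false };  /* owner's reference */

	model_get(&f);      /* reference taken by the lookup (cf. sockfd_lookup()) */
	model_put(&f);      /* the put that balances the lookup */
	if (extra_fput)
		model_put(&f);  /* the additional fput() that the revert removes */
	model_put(&f);      /* final release by the socket's owner */
	return f.underflow;
}
```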
      
      This patch reverts the mentioned commit to avoid the use-after-free.
      The patched kernel successfully passes the KUnit tests and boots to a
      shell.
      
      [    0.733613]     # Subtest: Handshake API tests
      [    0.734029]     1..11
      [    0.734255]         KTAP version 1
      [    0.734542]         # Subtest: req_alloc API fuzzing
      [    0.736104]         ok 1 handshake_req_alloc NULL proto
      [    0.736114]         ok 2 handshake_req_alloc CLASS_NONE
      [    0.736559]         ok 3 handshake_req_alloc CLASS_MAX
      [    0.737020]         ok 4 handshake_req_alloc no callbacks
      [    0.737488]         ok 5 handshake_req_alloc no done callback
      [    0.737988]         ok 6 handshake_req_alloc excessive privsize
      [    0.738529]         ok 7 handshake_req_alloc all good
      [    0.739036]     # req_alloc API fuzzing: pass:7 fail:0 skip:0 total:7
      [    0.739444]     ok 1 req_alloc API fuzzing
      [    0.740065]     ok 2 req_submit NULL req arg
      [    0.740436]     ok 3 req_submit NULL sock arg
      [    0.740834]     ok 4 req_submit NULL sock->file
      [    0.741236]     ok 5 req_lookup works
      [    0.741621]     ok 6 req_submit max pending
      [    0.741974]     ok 7 req_submit multiple
      [    0.742382]     ok 8 req_cancel before accept
      [    0.742764]     ok 9 req_cancel after accept
      [    0.743151]     ok 10 req_cancel after done
      [    0.743510]     ok 11 req_destroy works
      [    0.743882] # Handshake API tests: pass:11 fail:0 skip:0 total:11
      [    0.744205] # Totals: pass:17 fail:0 skip:0 total:17
      Acked-by: Chuck Lever <chuck.lever@oracle.com>
      Fixes: 1ce77c99 ("net/handshake: Unpin sock->file if a handshake is cancelled")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Link: https://lore.kernel.org/r/20230613083204.633896-1-linma@zju.edu.cn
      Link: https://lore.kernel.org/r/20230614015249.987448-1-linma@zju.edu.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'wireless-2023-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless · 37cec6ed
      Jakub Kicinski authored
      Johannes Berg says:
      
      ====================
      A couple of straggler fixes, mostly in the stack:
       - fix fragmentation for multi-link related elements
       - fix callback copy/paste error
       - fix multi-link locking
       - remove double-locking of wiphy mutex
       - transmit only on active links, not all
       - activate links in the correct order
       - don't remove links that weren't added
       - disable soft-IRQs for LQ lock in iwlwifi
      
      * tag 'wireless-2023-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
        wifi: iwlwifi: mvm: spin_lock_bh() to fix lockdep regression
        wifi: mac80211: fragment per STA profile correctly
        wifi: mac80211: Use active_links instead of valid_links in Tx
        wifi: cfg80211: remove links only on AP
        wifi: mac80211: take lock before setting vif links
        wifi: cfg80211: fix link del callback to call correct handler
        wifi: mac80211: fix link activation settings order
        wifi: cfg80211: fix double lock bug in reg_wdev_chan_valid()
      ====================
      
      Link: https://lore.kernel.org/r/20230614075502.11765-1-johannes@sipsolutions.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 14 Jun, 2023 12 commits
    • selftests: forwarding: hw_stats_l3: Set addrgenmode in a separate step · bef68e20
      Danielle Ratson authored
      Setting the IPv6 address generation mode of a net device during its
      creation never worked, but after commit b0ad3c17 ("rtnetlink: call
      validate_linkmsg in rtnl_create_link") it explicitly fails [1]. The
      failure is caused by the fact that validate_linkmsg() is called before
      the net device is registered, when it still does not have an 'inet6_dev'.
      
      Likewise, bringing up the net device before setting the address
      generation mode is meaningless, because by the time the mode is set,
      the address has already been generated.
      
      Therefore, fix the test to first create the net device, then set its
      IPv6 address generation mode and finally bring it up.
      
      [1]
       # ip link add name mydev addrgenmode eui64 type dummy
       RTNETLINK answers: Address family not supported by protocol
      
      Fixes: ba95e793 ("selftests: forwarding: hw_stats_l3: Add a new test")
      Signed-off-by: Danielle Ratson <danieller@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Link: https://lore.kernel.org/r/f3b05d85b2bc0c3d6168fe8f7207c6c8365703db.1686580046.git.petrm@nvidia.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'net-sched-fix-race-conditions-in-mini_qdisc_pair_swap' · 3b0d2819
      Paolo Abeni authored
      Peilin Ye says:
      
      ====================
      net/sched: Fix race conditions in mini_qdisc_pair_swap()
      
      These 2 patches fix race conditions for ingress and clsact Qdiscs as
      reported [1] by syzbot, split out from another [2] series (last 2 patches
      of it).  Per-patch changelog omitted.
      
      Patch 1 hasn't been touched since the last version; I just included
      everybody's tags.

      Patch 2 is based on patch 6 of v1 of [2], with comments and commit log
      slightly changed.  We also need rtnl_dereference() to load ->qdisc_sleeping
      since commit d636fc5d ("net: sched: add rcu annotations around
      qdisc->qdisc_sleeping"), so I changed that; please take yet another look,
      thanks!
      
      Patch 2 has been tested with the new reproducer Pedro posted [3].
      
      [1] https://syzkaller.appspot.com/bug?extid=b53a9c0d1ea4ad62da8b
      [2] https://lore.kernel.org/r/cover.1684887977.git.peilin.ye@bytedance.com/
      [3] https://lore.kernel.org/r/7879f218-c712-e9cc-57ba-665990f5f4c9@mojatatu.com/
      ====================
      
      Link: https://lore.kernel.org/r/cover.1686355297.git.peilin.ye@bytedance.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting · 84ad0af0
      Peilin Ye authored
      mini_Qdisc_pair::p_miniq is a double pointer to mini_Qdisc, initialized
      in ingress_init() to point to net_device::miniq_ingress.  ingress Qdiscs
      access this per-net_device pointer in mini_qdisc_pair_swap().  The same
      applies to clsact Qdiscs and miniq_egress.
      
      Unfortunately, after introducing RTNL-unlocked RTM_{NEW,DEL,GET}TFILTER
      requests (thanks Hillf Danton for the hint), when replacing ingress or
      clsact Qdiscs, for example, the old Qdisc ("@old") could access the same
      miniq_{in,e}gress pointer(s) concurrently with the new Qdisc ("@new"),
      causing race conditions [1] including a use-after-free bug in
      mini_qdisc_pair_swap() reported by syzbot:
      
       BUG: KASAN: slab-use-after-free in mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
       Write of size 8 at addr ffff888045b31308 by task syz-executor690/14901
      ...
       Call Trace:
        <TASK>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
        print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:319
        print_report mm/kasan/report.c:430 [inline]
        kasan_report+0x11c/0x130 mm/kasan/report.c:536
        mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
        tcf_chain_head_change_item net/sched/cls_api.c:495 [inline]
        tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:509
        tcf_chain_tp_insert net/sched/cls_api.c:1826 [inline]
        tcf_chain_tp_insert_unique net/sched/cls_api.c:1875 [inline]
        tc_new_tfilter+0x1de6/0x2290 net/sched/cls_api.c:2266
      ...
      
      @old and @new should not affect each other.  In other words, @old should
      never modify miniq_{in,e}gress after @new, and @new should not update
      @old's RCU state.
      
      Fixing this without changing sch_api.c turned out to be difficult (please
      refer to Closes: for discussions).  Instead, make sure @new's first call
      always happens after @old's last call (in {ingress,clsact}_destroy()) has
      finished:
      
      In qdisc_graft(), return -EBUSY if @old has any ongoing filter requests,
      and call qdisc_destroy() for @old before grafting @new.
      
      Introduce qdisc_refcount_dec_if_one() as the counterpart of
      qdisc_refcount_inc_nz() used for filter requests.  Introduce a
      non-static version of qdisc_destroy() that does a TCQ_F_BUILTIN check,
      just like qdisc_put() etc.
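
      The two helpers named above follow the usual conditional
      reference-count patterns. The following is a user-space sketch with C11
      atomics, not the kernel helpers (which build on refcount_t and
      special-case some Qdisc flags); graft_old_qdisc() is a hypothetical
      stand-in for the -EBUSY check in qdisc_graft().

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if the object still has one (count != 0),
 * the pattern used for in-flight filter requests. */
bool ref_inc_nz(atomic_uint *ref)
{
	unsigned int old = atomic_load(ref);

	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak(ref, &old, old + 1));
	return true;
}

/* Drop the count from 1 to 0 only if the caller holds the sole reference. */
bool ref_dec_if_one(atomic_uint *ref)
{
	unsigned int expected = 1;

	return atomic_compare_exchange_strong(ref, &expected, 0);
}

/* Hypothetical graft-time check: refuse with -EBUSY while in-flight
 * filter requests still hold references to the old Qdisc. */
int graft_old_qdisc(atomic_uint *old_ref)
{
	return ref_dec_if_one(old_ref) ? 0 : -EBUSY;
}
```

      Once the count has dropped to zero, ref_inc_nz() fails, so no new filter
      request can pick up the old Qdisc after the graft succeeded.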
      
      Depends on patch "net/sched: Refactor qdisc_graft() for ingress and
      clsact Qdiscs".
      
      [1] To illustrate, the syzkaller reproducer adds ingress Qdiscs under
      TC_H_ROOT (no longer possible after commit c7cfbd11 ("net/sched:
      sch_ingress: Only create under TC_H_INGRESS")) on eth0 that has 8
      transmission queues:
      
        Thread 1 creates ingress Qdisc A (containing mini Qdisc a1 and a2),
        then adds a flower filter X to A.
      
        Thread 2 creates another ingress Qdisc B (containing mini Qdisc b1 and
        b2) to replace A, then adds a flower filter Y to B.
      
       Thread 1               A's refcnt   Thread 2
        RTM_NEWQDISC (A, RTNL-locked)
         qdisc_create(A)               1
         qdisc_graft(A)                9
      
        RTM_NEWTFILTER (X, RTNL-unlocked)
         __tcf_qdisc_find(A)          10
         tcf_chain0_head_change(A)
         mini_qdisc_pair_swap(A) (1st)
                  |
                  |                         RTM_NEWQDISC (B, RTNL-locked)
               RCU sync                2     qdisc_graft(B)
                  |                    1     notify_and_destroy(A)
                  |
         tcf_block_release(A)          0    RTM_NEWTFILTER (Y, RTNL-unlocked)
         qdisc_destroy(A)                    tcf_chain0_head_change(B)
         tcf_chain0_head_change_cb_del(A)    mini_qdisc_pair_swap(B) (2nd)
         mini_qdisc_pair_swap(A) (3rd)                |
                 ...                                 ...
      
      Here, B calls mini_qdisc_pair_swap(), pointing eth0->miniq_ingress to
      its mini Qdisc, b1.  Then, A calls mini_qdisc_pair_swap() again during
      ingress_destroy(), setting eth0->miniq_ingress to NULL, so ingress
      packets on eth0 will not find filter Y in sch_handle_ingress().
      
      This is just one of the possible consequences of concurrently accessing
      miniq_{in,e}gress pointers.
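
      The interleaving in the diagram boils down to two writers of one
      per-device pointer, where the last write wins. A sequential sketch with
      hypothetical names (the strings stand in for mini Qdiscs) shows why
      running @old's final swap before @new's first one leaves
      miniq_ingress pointing at b1:

```c
#include <stddef.h>

/* Hypothetical stand-in for net_device::miniq_ingress. */
struct netdev_model {
	const char *miniq_ingress;
};

/* Stand-in for mini_qdisc_pair_swap(): publish a mini Qdisc (or NULL). */
static void pair_swap(struct netdev_model *dev, const char *miniq)
{
	dev->miniq_ingress = miniq;
}

/* Order from the diagram: B installs b1, then A's destroy path clears it. */
const char *final_pointer_racy(void)
{
	struct netdev_model eth0 = { .miniq_ingress = "a1" };

	pair_swap(&eth0, "b1");  /* B: 2nd call, while adding filter Y */
	pair_swap(&eth0, NULL);  /* A: 3rd call, from ingress_destroy() */
	return eth0.miniq_ingress;
}

/* Order enforced by the fix: A is destroyed before B is grafted and used. */
const char *final_pointer_fixed(void)
{
	struct netdev_model eth0 = { .miniq_ingress = "a1" };

	pair_swap(&eth0, NULL);  /* A: last call, from ingress_destroy() */
	pair_swap(&eth0, "b1");  /* B: first call */
	return eth0.miniq_ingress;
}
```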
      
      Fixes: 7a096d57 ("net: sched: ingress: set 'unlocked' flag for Qdisc ops")
      Fixes: 87f37392 ("net: sched: ingress: set 'unlocked' flag for clsact Qdisc ops")
      Reported-by: syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com
      Closes: https://lore.kernel.org/r/0000000000006cf87705f79acf1a@google.com/
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net/sched: Refactor qdisc_graft() for ingress and clsact Qdiscs · 2d5f6a8d
      Peilin Ye authored
      Grafting ingress and clsact Qdiscs does not need a for-loop in
      qdisc_graft().  Refactor it.  No functional changes intended.
      Tested-by: Pedro Tammela <pctammela@mojatatu.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net/sched: act_ct: Fix promotion of offloaded unreplied tuple · 41f2c7c3
      Paul Blakey authored
      Currently, UNREPLIED and UNASSURED connections are added to the nf flow
      table. This causes subsequent packets of such connections to be
      processed by the flow table, which skips conntrack_in(), so these
      connections remain UNREPLIED and UNASSURED even if reply traffic is
      seen later. Moreover, the non-offloaded reply packets are the ones that
      trigger the hardware update from the new to the established state; if
      there are none to trigger an update, or a previous update was missed,
      hardware can get out of sync with software and keep marking packets
      as new.
      
      Fix the above by:
      1) Not skipping conntrack_in() for UNASSURED packets, but still
         refreshing them for hardware, as before the cited patch.
      2) Trying to force a refresh by reply-direction packets that update
         the hardware rules from the new to the established state.
      3) Removing any bidirectional flows that failed to update in
         hardware, so that they are re-inserted as bidirectional once any
         new packet arrives.
      
      Fixes: 6a9bad00 ("net/sched: act_ct: offload UDP NEW connections")
      Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Link: https://lore.kernel.org/r/1686313379-117663-1-git-send-email-paulb@nvidia.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • wifi: iwlwifi: mvm: spin_lock_bh() to fix lockdep regression · f1a0898b
      Hugh Dickins authored
      Lockdep on 6.4-rc on ThinkPad X1 Carbon 5th says
      =====================================================
      WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
      6.4.0-rc5 #1 Not tainted
      -----------------------------------------------------
      kworker/3:1/49 [HC0[0]:SC0[4]:HE1:SE0] is trying to acquire:
      ffff8881066fa368 (&mvm_sta->deflink.lq_sta.rs_drv.pers.lock){+.+.}-{2:2}, at: rs_drv_get_rate+0x46/0xe7
      
      and this task is already holding:
      ffff8881066f80a8 (&sta->rate_ctrl_lock){+.-.}-{2:2}, at: rate_control_get_rate+0xbd/0x126
      which would create a new lock dependency:
       (&sta->rate_ctrl_lock){+.-.}-{2:2} -> (&mvm_sta->deflink.lq_sta.rs_drv.pers.lock){+.+.}-{2:2}
      
      but this new dependency connects a SOFTIRQ-irq-safe lock:
       (&sta->rate_ctrl_lock){+.-.}-{2:2}
      etc. etc. etc.
      
      Changing the spin_lock() in rs_drv_get_rate() to spin_lock_bh() was not
      enough to pacify lockdep, but changing all the lock calls on pers.lock
      to the _bh variants has worked.
      
      Fixes: a8938bc8 ("wifi: iwlwifi: mvm: Add locking to the rate read flow")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Link: https://lore.kernel.org/r/79ffcc22-9775-cb6d-3ffd-1a517c40beef@google.com
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    • Merge branch 'fix-small-bugs-and-annoyances-in-tc-testing' · 07b1cc84
      Jakub Kicinski authored
      Vlad Buslov says:
      
      ====================
      Fix small bugs and annoyances in tc-testing
      ====================
      
      Link: https://lore.kernel.org/r/20230612075712.2861848-1-vladbu@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests/tc-testing: Remove configs that no longer exist · 11b8b2e7
      Vlad Buslov authored
      Some qdiscs and classifiers have recently been retired from the kernel.
      However, the tc-testing config is still cluttered with them, which
      causes noise when using the merge_config.sh script to update an
      existing config for tc-testing compatibility. Remove the config
      settings for the affected qdiscs and classifiers.
      
      Fixes: fb38306c ("net/sched: Retire ATM qdisc")
      Fixes: 051d4420 ("net/sched: Retire CBQ qdisc")
      Fixes: bbe77c14 ("net/sched: Retire dsmark qdisc")
      Fixes: 265b4da8 ("net/sched: Retire rsvp classifier")
      Fixes: 8c710f75 ("net/sched: Retire tcindex classifier")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests/tc-testing: Fix SFB db test · b39d8c41
      Vlad Buslov authored
      Setting a very small db value such as 10ms introduces rounding errors
      when converting to/from jiffies on some kernel configs. For example,
      with 250hz the actual value is set to 12ms, which causes the test to fail:
      
       # $ sudo ./tdc.py  -d eth2 -e 3410
       #  -- ns/SubPlugin.__init__
       # Test 3410: Create SFB with db setting
       #
       # All test results:
       #
       # 1..1
       # not ok 1 3410 - Create SFB with db setting
       #         Could not match regex pattern. Verify command output:
       # qdisc sfb 1: root refcnt 2 rehash 600s db 12ms limit 1000p max 25p target 20p increment 0.000503548 decrement 4.57771e-05 penalty_rate 10pps penalty_burst 20p
      
      Set the value to 100ms instead, which currently seems to work on 100hz,
      250hz, 300hz and 1000hz kernel configs.
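
      The rounding can be checked with a few lines of arithmetic. The sketch
      assumes that milliseconds are rounded up to whole jiffies and that the
      reported value is converted back by truncating division; the helpers
      are hypothetical stand-ins, not msecs_to_jiffies()/jiffies_to_msecs().

```c
/* Hypothetical helpers; assumption: ms -> jiffies rounds up,
 * jiffies -> ms truncates. */
unsigned long ms_to_jiffies_up(unsigned int ms, unsigned int hz)
{
	return ((unsigned long)ms * hz + 999) / 1000;
}

unsigned int jiffies_to_ms(unsigned long j, unsigned int hz)
{
	return (unsigned int)(j * 1000 / hz);
}

/* Value reported back after a round trip through jiffies. */
unsigned int roundtrip_ms(unsigned int ms, unsigned int hz)
{
	return jiffies_to_ms(ms_to_jiffies_up(ms, hz), hz);
}
```

      Under these assumptions, 10ms at 250hz is 2.5 jiffies, rounded up to 3,
      which reads back as 12ms, while 100ms is a whole number of jiffies at
      all four listed rates.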
      
      Fixes: 6ad92dc5 ("selftests/tc-testing: add selftests for sfb qdisc")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests/tc-testing: Fix Error: failed to find target LOG · b849c566
      Vlad Buslov authored
      Add missing netfilter config dependency.
      
      Fixes the following example error when running all XT tests via
      tdc.sh:
      
       # $ sudo ./tdc.py -d eth2 -e 2029
       # Test 2029: Add xt action with log-prefix
       # exit: 255
       # exit: 0
       #  failed to find target LOG
       #
       # bad action parsing
       # parse_action: bad value (7:xt)!
       # Illegal "action"
       #
       # -----> teardown stage *** Could not execute: "$TC actions flush action xt"
       #
       # -----> teardown stage *** Error message: "Error: Cannot flush unknown TC action.
       # We have an error flushing
       # "
       # returncode 1; expected [0]
       #
       # -----> teardown stage *** Aborting test run.
       #
       # <_io.BufferedReader name=3> *** stdout ***
       #
       # <_io.BufferedReader name=5> *** stderr ***
       # "-----> teardown stage" did not complete successfully
       # Exception <class '__main__.PluginMgrTestFail'> ('teardown', ' failed to find target LOG\n\nbad action parsing\nparse_action: bad value (7:xt)!\nIllegal "action"\n', '"-----> teardown stage" did not complete successfully') (caught in test_runner, running test 2 2029 Add xt action with log-prefix stage teardown)
       # ---------------
       # traceback
       #   File "/images/src/linux/tools/testing/selftests/tc-testing/./tdc.py", line 495, in test_runner
       #     res = run_one_test(pm, args, index, tidx)
       #   File "/images/src/linux/tools/testing/selftests/tc-testing/./tdc.py", line 434, in run_one_test
       #     prepare_env(args, pm, 'teardown', '-----> teardown stage', tidx['teardown'], procout)
       #   File "/images/src/linux/tools/testing/selftests/tc-testing/./tdc.py", line 245, in prepare_env
       #     raise PluginMgrTestFail(
       # ---------------
       # accumulated output for this test:
       #  failed to find target LOG
       #
       # bad action parsing
       # parse_action: bad value (7:xt)!
       # Illegal "action"
       #
       # ---------------
       #
       # All test results:
       #
       # 1..1
       # ok 1 2029 - Add xt action with log-prefix # skipped - "-----> teardown stage" did not complete successfully
      
      Fixes: 910d504b ("selftests/tc-testings: add selftests for xt action")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Vlad Buslov's avatar
      selftests/tc-testing: Fix Error: Specified qdisc kind is unknown. · aef6e908
      Vlad Buslov authored
      All TEQL tests assume that the sch_teql module is loaded. Load the module
      in tdc.sh before running the qdisc tests.
      
      Fixes the following example error, seen for all TEQL tests when running
      tests via tdc.sh:
      
       # $ sudo ./tdc.py -d eth2 -e 84a0
       #  -- ns/SubPlugin.__init__
       # Test 84a0: Create TEQL with default setting
       # exit: 2
       # exit: 0
       # Error: Specified qdisc kind is unknown.
       #
       # -----> teardown stage *** Could not execute: "$TC qdisc del dev $DUMMY handle 1: root"
       #
       # -----> teardown stage *** Error message: "Error: Invalid handle.
       # "
       # returncode 2; expected [0]
       #
       # -----> teardown stage *** Aborting test run.
       #
       # <_io.BufferedReader name=3> *** stdout ***
       #
       # <_io.BufferedReader name=5> *** stderr ***
       # "-----> teardown stage" did not complete successfully
       # Exception <class '__main__.PluginMgrTestFail'> ('teardown', 'Error: Specified qdisc kind is unknown.\n', '"-----> teardown stage" did not complete successfully') (caught in test_runner, running test 2 84a0 Create TEQL with default setting stage teardown)
       # ---------------
       # traceback
       #   File "/images/src/linux/tools/testing/selftests/tc-testing/./tdc.py", line 495, in test_runner
       #     res = run_one_test(pm, args, index, tidx)
       #   File "/images/src/linux/tools/testing/selftests/tc-testing/./tdc.py", line 434, in run_one_test
       #     prepare_env(args, pm, 'teardown', '-----> teardown stage', tidx['teardown'], procout)
       #   File "/images/src/linux/tools/testing/selftests/tc-testing/./tdc.py", line 245, in prepare_env
       #     raise PluginMgrTestFail(
       # ---------------
       # accumulated output for this test:
       # Error: Specified qdisc kind is unknown.
       #
       # ---------------
       #
       # All test results:
       #
       # 1..1
       # ok 1 84a0 - Create TEQL with default setting # skipped - "-----> teardown stage" did not complete successfully
      
      Fixes: cc62fbe1 ("selftests/tc-testing: add selftests for teql qdisc")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Victor Nogueira <victor@mojatatu.com>
      Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Dan Carpenter's avatar
      net: ethernet: ti: am65-cpsw: Call of_node_put() on error path · 374283a1
      Dan Carpenter authored
      On the error path, this code returns directly, but it should instead
      call of_node_put() to drop the reference counts it holds.
      
      Fixes: dab2b265 ("net: ethernet: ti: am65-cpsw: Add support for SERDES configuration")
      Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
      Reviewed-by: Roger Quadros <rogerq@kernel.org>
      Link: https://lore.kernel.org/r/e3012f0c-1621-40e6-bf7d-03c276f6e07f@kili.mountain
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 12 Jun, 2023 20 commits
    • Jakub Kicinski's avatar
      Merge branch 'mptcp-fixes' · fbf6f482
      Jakub Kicinski authored
      Matthieu Baerts says:
      
      ====================
      selftests: mptcp: skip tests not supported by old kernels (part 3)
      
      After a few years of increasing test coverage in the MPTCP selftests, we
      realised [1] that the latest version of the selftests is supposed to run
      on old kernels without issues.
      
      Supporting older versions is not that easy for this MPTCP case: these
      selftests often validate the internals by checking the packets that are
      exchanged, when some MIB counters are incremented after some actions,
      how connections are opened and closed in some cases, etc. In other
      words, they are not limited to the socket interface between userspace
      and kernelspace.
      
      In addition to that, the current MPTCP selftests run a lot of different
      sub-tests, but the TAP13 protocol used in the selftests doesn't support
      sub-tests: one failure in a sub-test implies that the whole selftest is
      seen as failed at the end because sub-tests are not tracked. It is then
      important to skip sub-tests not supported by old kernels.
      
      To minimise the modifications and reduce the complexity to support old
      versions, the idea is to look at external signs and skip the whole
      selftest or just some sub-tests before starting them. This cannot be
      applied in all cases.
      
      Similar to the second part, this third one focuses on marking different
      sub-tests as skipped if some MPTCP features are not supported. This
      time, only the "mptcp_join.sh" selftest, the remaining one, is modified.
      Several techniques are used here to achieve this:
      
      - Before starting some tests:
      
        - Check if a file (sysctl knob) is present: that's what patch 12/17 is
          doing for the userspace PM feature.
      
        - Check if a required kernel symbol is present in /proc/kallsyms:
          patches 9, 10, 14 and 15/17 are using this technique.
      
        - Check if it is possible to set up a particular network environment
          requiring Netfilter or TC: if the preparation step fails, the linked
          sub-test is marked as skipped. Patch 5/17 does that.
      
        - Check if a MIB counter is available: patches 7 and 13/17 do that.
      
        - Check if the kernel version is newer than a specific one: patch 1/17
          adds some helpers in mptcp_lib.sh to ease its use. That's not ideal
          and it is only used as a last resort but, as mentioned above, it is
          important to skip unsupported tests so that the whole selftest is
          not always marked as failed on old kernels. Patches 11 and 17/17
          check the kernel version. An alternative would be to ignore the
          results for some sub-tests, but that's not ideal either. Note that
          the SELFTESTS_MPTCP_LIB_NO_KVERSION_CHECK env var can be set to 1 to
          avoid skipping these tests if the running kernel doesn't have a
          supported version.
      
      - After having launched the tests:
      
        - Adapt the expectations depending on the presence of a kernel symbol
          (patch 6/17) or a kernel version (patch 8/17).
      
        - Check if a MIB counter is available and skip the verification if
          not. Patch 4/17 uses this technique.
      
      Before skipping tests, SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var
      value is checked: if it is set to 1, the test is marked as "failed"
      instead of "skipped". MPTCP public CI expects to have all features
      supported and it sets this env var to 1 to catch regressions in these
      new checks.
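The skip-or-fail rule above can be modelled in a few lines of Python. This is a minimal sketch only: the real helpers are shell functions in the selftests, and the function name `mark_unsupported` is hypothetical.

```python
import os
from typing import Mapping, Optional


def mark_unsupported(feature: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the result for a sub-test whose required feature is missing.

    Hypothetical helper: when SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES is "1"
    (as in the MPTCP public CI), a missing feature is treated as a regression
    and the sub-test fails; otherwise the sub-test is skipped.
    """
    env = os.environ if env is None else env
    if env.get("SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES") == "1":
        return f"FAIL: feature '{feature}' not supported"
    return f"SKIP: feature '{feature}' not supported"
```

Passing the environment explicitly keeps the rule easy to exercise without touching the real process environment.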
      
      Patch 2/17 uses 'iptables-legacy' if available because it might be
      needed when using an older kernel not supporting iptables-nft.
      
      Patch 3/17 adds some helpers used in the other patches mentioned to
      easily mark sub-tests as skipped.
      
      Patch 16/17 makes the MPTCP Join "listener" tests uniform: this code was
      imported from userspace_pm.sh without using the "code style" and the
      ways of using tools and printing messages of the MPTCP Join selftest.
      
      Link: https://lore.kernel.org/stable/CA+G9fYtDGpgT4dckXD-y-N92nqUxuvue_7AtDdBcHrbOMsDZLg@mail.gmail.com/ [1]
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      ====================
      
      Link: https://lore.kernel.org/r/20230609-upstream-net-20230610-mptcp-selftests-support-old-kernels-part-3-v1-0-2896fe2ee8a3@tessares.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Matthieu Baerts's avatar
      selftests: mptcp: join: skip mixed tests if not supported · 6673851b
      Matthieu Baerts authored
      Selftests are supposed to run on any kernel, including old ones not
      supporting all MPTCP features.
      
      One of them is the support of a mix of subflows in v4 and v6 by the
      in-kernel PM introduced by commit b9d69db8 ("mptcp: let the
      in-kernel PM use mixed IPv4 and IPv6 addresses").
      
      It looks like there is no external sign we can use to predict the
      expected behaviour. Instead of accepting different behaviours and thus
      not really checking for the expected behaviour, we are looking here for
      a specific kernel version. That's not ideal but it looks better than
      removing the test because it cannot support older kernel versions.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: ad349374 ("selftests: mptcp: add test-cases for mixed v4/v6 subflows")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Matthieu Baerts's avatar
      selftests: mptcp: join: uniform listener tests · 96b84195
      Matthieu Baerts authored
      The alignment was different from the other tests because tabs were used
      instead of spaces.
      
      While at it, also use 'echo' instead of 'printf' to print the result, to
      keep the same style as in the other sub-tests. And, even if it would be
      better to keep them, also remove 'stdbuf' and sed's '--unbuffered'
      option because they are not used in the other sub-tests and they are not
      available in a minimal environment with busybox.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: 178d0232 ("selftests: mptcp: listener test for in-kernel PM")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Matthieu Baerts's avatar
      selftests: mptcp: join: skip PM listener tests if not supported · 0471bb47
      Matthieu Baerts authored
      Selftests are supposed to run on any kernel, including old ones not
      supporting all MPTCP features.
      
      One of them is the support of PM listener events introduced by commit
      f8c9dfbd ("mptcp: add pm listener events").
      
      It is possible to look for "mptcp_event_pm_listener" in kallsyms to know
      in advance if the kernel supports this feature.
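A minimal Python sketch of such a kallsyms lookup, assuming the usual `/proc/kallsyms` line layout of "address type name [module]". The function names are hypothetical; the selftests themselves do this in shell. Comparing the whole name field, rather than a substring, avoids matching a longer symbol that merely shares the prefix.

```python
from typing import Iterable


def kallsyms_has(symbol: str, lines: Iterable[str]) -> bool:
    """Return True if `symbol` appears as an exact symbol name.

    Each line is expected to look like "<address> <type> <name> [module]";
    the third field is compared as a whole, not as a substring.
    """
    for line in lines:
        fields = line.split()
        if len(fields) >= 3 and fields[2] == symbol:
            return True
    return False


def kernel_has_symbol(symbol: str, path: str = "/proc/kallsyms") -> bool:
    """Hypothetical wrapper reading the running kernel's symbol table."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return kallsyms_has(symbol, f)
```

For example, `kernel_has_symbol("mptcp_event_pm_listener")` would decide whether the PM listener sub-tests can run. This assumes kallsyms is available at all, which is why a separate check for the kallsyms feature is useful.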
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: 178d0232 ("selftests: mptcp: listener test for in-kernel PM")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Matthieu Baerts's avatar
      selftests: mptcp: join: skip MPC backups tests if not supported · 632978f0
      Matthieu Baerts authored
      Selftests are supposed to run on any kernel, including old ones not
      supporting all MPTCP features.
      
      One of them is the support of sending an MP_PRIO signal for the initial
      subflow, introduced by commit c157bbe7 ("mptcp: allow the in kernel
      PM to set MPC subflow priority").
      
      It is possible to look for "mptcp_subflow_send_ack" in kallsyms because
      it was needed to introduce the mentioned feature. So we can know in
      advance if the feature is supported instead of trying and accepting any
      results.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: 914f6a59 ("selftests: mptcp: add MPC backup tests")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Matthieu Baerts's avatar
      selftests: mptcp: join: skip fail tests if not supported · ff8897b5
      Matthieu Baerts authored
      Selftests are supposed to run on any kernel, including old ones not
      supporting all MPTCP features.
      
      One of them is the support of the MP_FAIL / infinite mapping introduced
      by commit 1e39e5a3 ("mptcp: infinite mapping sending") and the
      following ones.
      
      It is possible to look for one of the infinite mapping counters to know
      in advance if this feature is available.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: b6e074e1 ("selftests: mptcp: add infinite map testcase")
      Cc: stable@vger.kernel.org
      Fixes: 2ba18161 ("selftests: mptcp: add MP_FAIL reset testcase")
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Matthieu Baerts's avatar
      selftests: mptcp: join: skip userspace PM tests if not supported · f2b492b0
      Matthieu Baerts authored
      Selftests are supposed to run on any kernel, including old ones not
      supporting all MPTCP features.
      
      One of them is the support of the userspace PM introduced by commit
      4638de5a ("mptcp: handle local addrs announced by userspace PMs")
      and the following ones.
      
      It is possible to look for the MPTCP pm_type's sysctl knob to know in
      advance if the userspace PM is available.
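As a sketch, assuming the knob is exposed as the `net.mptcp.pm_type` sysctl (i.e. `/proc/sys/net/mptcp/pm_type`), the check reduces to a file-existence test. The function name and the `procfs_root` parameter are hypothetical; the parameter only exists so the check can be pointed at a fake tree.

```python
import os


def userspace_pm_supported(procfs_root: str = "/proc") -> bool:
    """Return True if the MPTCP pm_type sysctl knob exists.

    Kernels without this knob cannot select the userspace PM, so the
    userspace PM sub-tests would be skipped (or failed, if all features
    are expected to be available).
    """
    knob = os.path.join(procfs_root, "sys", "net", "mptcp", "pm_type")
    return os.path.exists(knob)
```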
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: 5ac1d2d6 ("selftests: mptcp: Add tests for userspace PM type")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Matthieu Baerts's avatar
      selftests: mptcp: join: skip fullmesh flag tests if not supported · 9db34c42
      Matthieu Baerts authored
      Selftests are supposed to run on any kernel, including old ones not
      supporting all MPTCP features.
      
      One of them is the support of the fullmesh flag for the in-kernel PM
      introduced by commit 2843ff6f ("mptcp: remote addresses fullmesh")
      and commit 1a0d6136 ("mptcp: local addresses fullmesh").
      
      It looks like there is no easy external sign we can use to predict the
      expected behaviour. We could add the flag and then check if it has been
      added, but for that, and for each fullmesh test, we would need to set up a
      new environment, do the checks, clean it and then only start the test
      from yet another clean environment. To keep it simple and avoid
      introducing new issues, we look for a specific kernel version. That's
      not ideal but an acceptable solution for this case.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: 6a0653b9 ("selftests: mptcp: add fullmesh setting tests")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Matthieu Baerts's avatar
      selftests: mptcp: join: skip backup if set flag on ID not supported · 07216a3c
      Matthieu Baerts authored
      Selftests are supposed to run on any kernel, including old ones not
      supporting all MPTCP features.
      
      Commit bccefb76 ("selftests: mptcp: simplify pm_nl_change_endpoint")
      has simplified the way the backup flag is set on an endpoint. Instead of
      doing:
      
        ./pm_nl_ctl set 10.0.2.1 flags backup
      
      Now we do:
      
        ./pm_nl_ctl set id 1 flags backup
      
      The new way is easier to maintain, but it is also incompatible with older
      kernels lacking the implicit endpoints, which put in place the
      infrastructure to set flags per ID, hence the second Fixes tag.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: bccefb76 ("selftests: mptcp: simplify pm_nl_change_endpoint")
      Cc: stable@vger.kernel.org
      Fixes: 4cf86ae8 ("mptcp: strict local address ID selection")
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Matthieu Baerts's avatar
      selftests: mptcp: join: skip implicit tests if not supported · 36c4127a
      Matthieu Baerts authored
      Selftests are supposed to run on any kernel, including old ones not
      supporting all MPTCP features.
      
      One of them is the support of the implicit endpoints introduced by
      commit d045b9eb ("mptcp: introduce implicit endpoints").
      
      It is possible to look for "mptcp_subflow_send_ack" in kallsyms because
      it was needed to introduce the mentioned feature. So we can know in
      advance if the feature is supported instead of trying and accepting any
      results.
      
      Note that here and in the following commits, we redo the same check for
      each sub-test of the same function for a few reasons. The main one is
      not to break the ID assigned to each test, in order to be able to easily
      compare results between different kernel versions. Also, we can still
      run a specific test even if it is skipped. Another reason is that it
      makes it clear during the review that a specific sub-test will be
      skipped or not under certain conditions. In the end, it looks OK to call
      the exact same helper multiple times: it is not a critical path and it
      is the same code that is executed, so there are not really more cases to
      maintain.
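The point about stable IDs can be sketched as follows: every sub-test gets its number before the support check runs, so a skipped sub-test still consumes its ID and the following IDs do not shift between kernels. This is an illustrative Python model; `run_subtests` and the triple layout are hypothetical, not the selftest's real structure.

```python
from typing import Callable, List, Tuple

# (name, is_supported, run): both callables return bool.
SubTest = Tuple[str, Callable[[], bool], Callable[[], bool]]


def run_subtests(subtests: List[SubTest]) -> List[str]:
    """Run sub-tests in order and return one "ID name: RESULT" line each.

    enumerate() assigns the ID before the support check, so skipping a
    sub-test never changes the IDs of the sub-tests after it.
    """
    results = []
    for test_id, (name, is_supported, run) in enumerate(subtests, start=1):
        if not is_supported():
            results.append(f"{test_id} {name}: SKIP")
            continue
        results.append(f"{test_id} {name}: {'OK' if run() else 'FAIL'}")
    return results
```

Calling the support check once per sub-test, as the commit message describes, is what keeps each entry self-describing during review.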
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: 69c6ce7b ("selftests: mptcp: add implicit endpoint test case")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Matthieu Baerts's avatar
      selftests: mptcp: join: support RM_ADDR for used endpoints or not · 425ba803
      Matthieu Baerts authored
      Selftests are supposed to run on any kernel, including old ones not
      supporting all MPTCP features.
      
      At some point, a new feature caused internal behaviour changes that we
      verify in the selftests, see the Fixes tag below. It was not a UAPI
      change, but because these selftests check some internal behaviours, it
      is normal that we have to adapt them from time to time after adding
      some features.
      
      It looks like there is no external sign we can use to predict the
      expected behaviour. Instead of accepting different behaviours and thus
      not really checking for the expected behaviour, we are looking here for
      a specific kernel version. That's not ideal but it looks better than
      removing the test because it cannot support older kernel versions.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: 6fa0174a ("mptcp: more careful RM_ADDR generation")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Matthieu Baerts's avatar
      selftests: mptcp: join: skip Fastclose tests if not supported · ae947bb2
      Matthieu Baerts authored
      Selftests are supposed to run on any kernel, including old ones not
      supporting all MPTCP features.
      
      One of them is the support of MP_FASTCLOSE introduced in commit
      f284c0c7 ("mptcp: implement fastclose xmit path").
      
      If the MIB counter is not available, the test cannot be verified and the
      behaviour will not be the expected one. So we can skip the test if the
      counter is missing.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: 01542c9b ("selftests: mptcp: add fastclose testcase")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Matthieu Baerts's avatar
      selftests: mptcp: join: support local endpoint being tracked or not · d4c81bbb
      Matthieu Baerts authored
      Selftests are supposed to run on any kernel, including old ones not
      supporting all MPTCP features.
      
      At some point, a new feature caused internal behaviour changes that we
      verify in the selftests, see the Fixes tag below. It was not a uAPI
      change, but because these selftests check some internal behaviours, it
      is normal that we have to adapt them from time to time after adding
      some features.
      
      It is possible to look for "mptcp_pm_subflow_check_next" in kallsyms
      because it was needed to introduce the mentioned feature. So we can know
      in advance which behaviour to expect here instead of supporting both
      behaviours.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: 86e39e04 ("mptcp: keep track of local endpoint still available for each msk")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Matthieu Baerts's avatar
      selftests: mptcp: join: skip test if iptables/tc cmds fail · 4a0b866a
      Matthieu Baerts authored
      Selftests are supposed to run on any kernel, including old ones not
      supporting all MPTCP features.
      
      Some tests are using IPTables and/or TC commands to force some
      behaviours. If one of these commands fails -- likely because some
      features are not available due to missing kernel config -- we should
      intercept the error and skip the tests requiring these features.
      
      Note that if we expect to have these features available and if
      SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var is set to 1, the tests
      will be marked as failed instead of skipped.
      
      This patch also replaces 'exit 1' with 'return 1' so as not to stop the
      selftest in the middle, without the conclusion, if there is an issue
      with NF or TC.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: 8d014eaa ("selftests: mptcp: add ADD_ADDR timeout test case")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Matthieu Baerts's avatar
      selftests: mptcp: join: skip check if MIB counter not supported · 47867f0a
      Matthieu Baerts authored
      Selftests are supposed to run on any kernel, including old ones not
      supporting all MPTCP features.
      
      One of them is the MPTCP MIB counters, introduced in commit fc518953
      ("mptcp: add and use MIB counter infrastructure") with more added
      later. The MPTCP Join selftest heavily relies on these counters.
      
      If a counter is not supported by the kernel, it is not displayed when
      using 'nstat -z'. We can then detect that and skip the verification. A
      new helper (get_counter()) has been added to do the required checks and
      return an error if the counter is not available.
      
      Note that if we expect to have these features available and if
      SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES env var is set to 1, the tests
      will be marked as failed instead of skipped.
      
      This new helper also makes sure we get the exact counter we want, to
      avoid issues we had in the past, e.g. with MPTcpExtRmAddr and
      MPTcpExtRmAddrDrop sharing the same prefix. While at it, we make the way
      we fetch a MIB counter uniform.
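The exact-match lookup can be sketched in Python by parsing text in the `nstat -z` layout, where each data line starts with the counter name followed by its value. This models the shell helper's behaviour as described above rather than reproducing it; the Python `get_counter` is a hypothetical counterpart that returns None when the counter is absent, i.e. not supported by the running kernel.

```python
from typing import Optional


def get_counter(nstat_output: str, counter: str) -> Optional[int]:
    """Return the value of `counter` from `nstat -z` style output.

    The name must equal the first column exactly, so looking up
    MPTcpExtRmAddr never picks up MPTcpExtRmAddrDrop. Returns None if the
    counter is missing, which the caller can turn into a skip (or a failure
    when all features are expected).
    """
    for line in nstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == counter:
            return int(fields[1])
    return None
```

A prefix-based `grep` would happily return the first of two counters sharing a prefix; comparing whole fields removes that ambiguity.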
      
      Note for the backports: we rarely change these modified blocks, so if
      there are conflicts, it is very likely because a counter is not used
      in the older kernels and we don't need that chunk.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: b08fbf24 ("selftests: add test-cases for MPTCP MP_JOIN")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Matthieu Baerts's avatar
      selftests: mptcp: join: helpers to skip tests · cdb50525
      Matthieu Baerts authored
      Selftests are supposed to run on any kernel, including old ones not
      supporting all MPTCP features.
      
      Here are some helpers that will be used to mark subtests as skipped if a
      feature is not supported. Marking as a fix for the commit introducing
      this selftest to help with the backports.
      
      While at it, also check if kallsyms feature is available as it will also
      be used in the following commits to check if MPTCP features are
      available before starting a test.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: b08fbf24 ("selftests: add test-cases for MPTCP MP_JOIN")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Matthieu Baerts's avatar
      selftests: mptcp: join: use 'iptables-legacy' if available · 0c4cd3f8
      Matthieu Baerts authored
      IPTables commands using 'iptables-nft' fail on old kernels, at least on
      5.15, because they don't see the default IPTables chains:
      
        $ iptables -L
        iptables/1.8.2 Failed to initialize nft: Protocol not supported
      
      As a first step before switching to NFTables, we can use iptables-legacy
      if available.
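The selection logic amounts to "prefer the legacy binary when it is installed". A minimal Python sketch, using `shutil.which` to look the commands up in `PATH`; the function name is hypothetical, and the lookup function is injectable only so the logic can be exercised without the binaries installed.

```python
import shutil
from typing import Callable, Optional


def pick_iptables(which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return 'iptables-legacy' if available, else 'iptables', else None."""
    for cmd in ("iptables-legacy", "iptables"):
        if which(cmd):
            return cmd
    return None
```

The same pattern would apply to the IPv6 variant ('ip6tables-legacy' vs 'ip6tables'), under the same assumption that the legacy tool is simply a drop-in command name.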
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: 8d014eaa ("selftests: mptcp: add ADD_ADDR timeout test case")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Matthieu Baerts's avatar
      selftests: mptcp: lib: skip if not below kernel version · b1a6a38a
      Matthieu Baerts authored
      Selftests are supposed to run on any kernel, including old ones not
      supporting all MPTCP features.
      
      A new function is now available to easily detect if a feature is
      missing by looking at the kernel version. That's clearly not ideal and
      this kind of check should be avoided as soon as possible. But sometimes,
      there is no external sign that a "feature" is available or not:
      internal behaviours can change without modifying the uAPI and these
      selftests are verifying the internal behaviours. Sometimes, the only
      (easy) way to verify if the feature is present is to run the test but
      then the validation cannot determine if there is a failure with the
      feature or if the feature is missing. Then it looks better to check the
      kernel version instead of having tests that can never fail. In any case,
      we need a solution not to have a whole selftest being marked as failed
      just because one sub-test has failed.
      
      Note that the SELFTESTS_MPTCP_LIB_NO_KVERSION_CHECK env var can be set
      to 1 to skip such checks and run the linked sub-test anyway.
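A Python sketch of such a version gate, assuming the release string comes from `uname -r` (e.g. "6.4.0-rc5+") and only major.minor matters. The function names are hypothetical; the real helper lives in mptcp_lib.sh. Versions are compared as integer tuples so that 6.10 correctly sorts after 6.4, which a plain string comparison would get wrong.

```python
import os
import re
from typing import Mapping, Optional


def kversion_ge(release: str, major: int, minor: int) -> bool:
    """Return True if a `uname -r` style release is >= major.minor."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        # Assumption for this sketch: an unparsable release counts as too old.
        return False
    return (int(m.group(1)), int(m.group(2))) >= (major, minor)


def should_run(release: str, major: int, minor: int,
               env: Optional[Mapping[str, str]] = None) -> bool:
    """Run the sub-test if the kernel is recent enough, or if the
    SELFTESTS_MPTCP_LIB_NO_KVERSION_CHECK env var disables the check."""
    env = os.environ if env is None else env
    if env.get("SELFTESTS_MPTCP_LIB_NO_KVERSION_CHECK") == "1":
        return True
    return kversion_ge(release, major, minor)
```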
      
      This new helper is going to be used in the following commits. In order
      to ease the backport of such future patches, it would be good if this
      patch is backported up to the introduction of MPTCP selftests, hence the
      Fixes tag below: this type of check was supposed to be done from the
      beginning.
      
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
      Fixes: 048d19d4 ("mptcp: add basic kselftest for mptcp")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Jakub Kicinski's avatar
      Merge branch 'fixes-for-q-usgmii-speeds-and-autoneg' · 4d17beb6
      Jakub Kicinski authored
      Maxime Chevallier says:
      
      ====================
      fixes for Q-USGMII speeds and autoneg
      
      This is the second version of a small changeset for QUSGMII support,
      fixing inconsistencies in reported max speed and control word parsing.
      
      As reported here [1], there are some inconsistencies for the Q-USGMII
      mode speeds and configuration. The first patch in this fixup series
      makes sure that we correctly report the max speed of 1Gbps for this mode.
      
      The second patch uses a dedicated helper to decode the control word.
      This is necessary as although USGMII control words are close to USXGMII,
      they don't support the same speeds.
      
      [1] : https://lore.kernel.org/netdev/ZHnd+6FUO77XFJvQ@shell.armlinux.org.uk/
      ====================
      
      Link: https://lore.kernel.org/r/20230609080305.546028-1-maxime.chevallier@bootlin.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Maxime Chevallier's avatar
      net: phylink: use a dedicated helper to parse usgmii control word · 923454c0
      Maxime Chevallier authored
      Q-USGMII is a derivative of USGMII that uses a specific formatting for
      the control word. The layout is close to the USXGMII control word, but
      doesn't support speeds over 1Gbps. Use dedicated decoding logic for
      the USGMII control word, re-using the USXGMII definitions but only
      considering 10/100/1000Mbps speeds.
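To illustrate the idea, here is a Python model of a USGMII-only decoder. The bit values below are our reading of the USXGMII macros in include/uapi/linux/mdio.h and should be checked against the actual header; the function name and the return shape are hypothetical, and this is not the phylink implementation. Speed codes valid for USXGMII but not for USGMII (2.5G/5G/10G) are rejected.

```python
from typing import Optional, Tuple

# Assumed values, mirroring the USXGMII definitions in mdio.h.
MDIO_USXGMII_LINK = 0x8000
MDIO_USXGMII_FULL_DUPLEX = 0x1000
MDIO_USXGMII_SPD_MASK = 0x0E00
MDIO_USXGMII_10 = 0x0000
MDIO_USXGMII_100 = 0x0200
MDIO_USXGMII_1000 = 0x0400

# Only the speeds USGMII supports; any other speed code is invalid here.
_USGMII_SPEEDS = {
    MDIO_USXGMII_10: 10,
    MDIO_USXGMII_100: 100,
    MDIO_USXGMII_1000: 1000,
}


def decode_usgmii_word(word: int) -> Optional[Tuple[bool, int, str]]:
    """Return (link, speed_mbps, duplex), or None for an unsupported speed."""
    speed = _USGMII_SPEEDS.get(word & MDIO_USXGMII_SPD_MASK)
    if speed is None:
        return None
    link = bool(word & MDIO_USXGMII_LINK)
    duplex = "full" if word & MDIO_USXGMII_FULL_DUPLEX else "half"
    return (link, speed, duplex)
```

Reusing a generic USXGMII decoder would accept the 2.5G/5G/10G codes; a dedicated table keeps the accepted set limited to what Q-USGMII can actually signal.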
      
      Fixes: 5e61fe15 ("net: phy: Introduce QUSGMII PHY mode")
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>