1. 29 Jul, 2023 7 commits
    • net: annotate data-races around sk->sk_reserved_mem · fe11fdcb
      Eric Dumazet authored
      sk_getsockopt() runs locklessly. This means sk->sk_reserved_mem
      can be read while other threads are changing its value.
      
      Add missing annotations where they are needed.
      
      Fixes: 2bb2f5fb ("net: add new socket option SO_RESERVE_MEM")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
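      The annotation pattern this commit message refers to can be sketched in
      userspace C. This is only an illustration: the two macros are simplified
      stand-ins for the kernel's READ_ONCE()/WRITE_ONCE() (they rely on the GNU C
      `__typeof__` extension), and `struct demo_sock` and both helper functions are
      hypothetical names, not the kernel's.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Simplified userspace stand-ins for the kernel macros (GNU C __typeof__). */
      #define READ_ONCE(x)      (*(const volatile __typeof__(x) *)&(x))
      #define WRITE_ONCE(x, v)  (*(volatile __typeof__(x) *)&(x) = (v))

      /* Hypothetical cut-down socket holding only the field of interest. */
      struct demo_sock {
          uint32_t reserved_mem;
      };

      /* Writer side: publish the new value with a single marked store. */
      void demo_set_reserved_mem(struct demo_sock *sk, uint32_t val)
      {
          assert(sk);
          WRITE_ONCE(sk->reserved_mem, val);
      }

      /* Lockless reader side: one marked load, no lock taken. */
      uint32_t demo_get_reserved_mem(const struct demo_sock *sk)
      {
          assert(sk);
          return READ_ONCE(sk->reserved_mem);
      }
      ```

      The marked accesses tell the compiler not to tear, fuse, or repeat the load
      and store, which is the property a lockless reader depends on.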
    • net: gro: fix misuse of CB in udp socket lookup · 7938cd15
      Richard Gobert authored
      This patch fixes a misuse of IP{6}CB(skb) in GRO when calling
      `udp6_lib_lookup2` while handling UDP tunnels. `udp6_lib_lookup2` fetches the
      device from CB. The fix changes it to fetch the device from `skb->dev`.
      The l3mdev case requires special attention since it has a master and a slave
      device.
      
      Fixes: a6024562 ("udp: Add GRO functions to UDP socket")
      Reported-by: Gal Pressman <gal@nvidia.com>
      Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Fix scheduling in a tasklet while getting stats · e346e231
      Konstantin Khorenko authored
      A tasklet ended up calling usleep_range() in the PTT acquire logic,
      which triggers the "scheduling while atomic" BUG():
      
        BUG: scheduling while atomic: swapper/24/0/0x00000100
      
         [<ffffffffb41c6199>] schedule+0x29/0x70
         [<ffffffffb41c5512>] schedule_hrtimeout_range_clock+0xb2/0x150
         [<ffffffffb41c55c3>] schedule_hrtimeout_range+0x13/0x20
         [<ffffffffb41c3bcf>] usleep_range+0x4f/0x70
         [<ffffffffc08d3e58>] qed_ptt_acquire+0x38/0x100 [qed]
         [<ffffffffc08eac48>] _qed_get_vport_stats+0x458/0x580 [qed]
         [<ffffffffc08ead8c>] qed_get_vport_stats+0x1c/0xd0 [qed]
         [<ffffffffc08dffd3>] qed_get_protocol_stats+0x93/0x100 [qed]
                              qed_mcp_send_protocol_stats
                  case MFW_DRV_MSG_GET_LAN_STATS:
                  case MFW_DRV_MSG_GET_FCOE_STATS:
                  case MFW_DRV_MSG_GET_ISCSI_STATS:
                  case MFW_DRV_MSG_GET_RDMA_STATS:
         [<ffffffffc08e36d8>] qed_mcp_handle_events+0x2d8/0x890 [qed]
                              qed_int_assertion
                              qed_int_attentions
         [<ffffffffc08d9490>] qed_int_sp_dpc+0xa50/0xdc0 [qed]
         [<ffffffffb3aa7623>] tasklet_action+0x83/0x140
         [<ffffffffb41d9125>] __do_softirq+0x125/0x2bb
         [<ffffffffb41d560c>] call_softirq+0x1c/0x30
         [<ffffffffb3a30645>] do_softirq+0x65/0xa0
         [<ffffffffb3aa78d5>] irq_exit+0x105/0x110
         [<ffffffffb41d8996>] do_IRQ+0x56/0xf0
      
      Fix this by making the caller state whether it may be running in
      atomic context when getting stats from the QED driver.
      Based on the context provided, the QED driver decides whether or not
      to schedule out when acquiring the PTT BAR window.

      We hit the BUG_ON() while getting vport stats, but according to the
      code the same issue could happen for FCoE and iSCSI statistics as well,
      so fix them too.
      
      Fixes: 6c754246 ("qed: Add support for NCSI statistics.")
      Fixes: 1e128c81 ("qed: Add support for hardware offloaded FCoE.")
      Fixes: 2f2b2614 ("qed: Provide iSCSI statistics to management")
      Cc: Sudarsana Kalluru <skalluru@marvell.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Manish Chopra <manishc@marvell.com>
      Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
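      The idea of letting the caller pass its context down to the acquire path
      can be sketched in userspace C. Everything here is hypothetical
      (`struct demo_ptt_pool`, `demo_ptt_acquire()`, the counters, the retry
      policy); it only illustrates "sleep when allowed, busy-wait when atomic" and
      is not the driver's actual code.

      ```c
      #ifndef _POSIX_C_SOURCE
      #define _POSIX_C_SOURCE 199309L   /* for nanosleep() */
      #endif
      #include <assert.h>
      #include <stdbool.h>
      #include <time.h>

      /* Hypothetical stand-in for the PTT pool; counters exist only for the demo. */
      struct demo_ptt_pool {
          int free_windows;
          int sleeps;   /* retries that yielded the CPU */
          int spins;    /* retries that busy-waited */
      };

      /* Short delay that never sleeps, usable when the caller is atomic. */
      static void demo_busy_wait(void)
      {
          for (volatile int i = 0; i < 1000; i++)
              ;
      }

      /* Returns 0 on success, -1 if no window became free within max_tries. */
      int demo_ptt_acquire(struct demo_ptt_pool *pool, bool is_atomic, int max_tries)
      {
          assert(pool && max_tries > 0);

          for (int i = 0; i < max_tries; i++) {
              if (pool->free_windows > 0) {
                  pool->free_windows--;
                  return 0;
              }
              if (is_atomic) {
                  /* Atomic caller: must not schedule, so spin instead. */
                  demo_busy_wait();
                  pool->spins++;
              } else {
                  /* Process context: sleeping between retries is fine. */
                  struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 };

                  nanosleep(&ts, NULL);
                  pool->sleeps++;
              }
          }
          return -1;
      }
      ```

      A caller running from a tasklet-like path would pass `is_atomic = true`, so
      the retry loop never reaches the sleeping branch.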
    • net: dsa: microchip: KSZ9477 register regmap alignment to 32 bit boundaries · 8d7ae22a
      Lukasz Majewski authored
      Commit 5c844d57 added code to apply the "Module 6: Certain PHY registers
      must be written as pairs instead of singly" erratum for KSZ9477, because
      on this chip certain PHY registers (0xN120 to 0xN13F, N=1,2,3,4,5) must be
      accessed as 32-bit words instead of with 16- or 8-bit accesses.
      Otherwise, adjacent registers (no matter if reserved or not) are
      overwritten with 0x0.
      
      Without this patch, some registers (e.g. 0x113c or 0x1134) that need
      32-bit access are outside the valid regmap ranges.

      As a result, the following errors are observed and the KSZ9477 is not
      properly configured:
      
      ksz-switch spi1.0: can't rmw 32bit reg 0x113c: -EIO
      ksz-switch spi1.0: can't rmw 32bit reg 0x1134: -EIO
      ksz-switch spi1.0 lan1 (uninitialized): failed to connect to PHY: -EIO
      ksz-switch spi1.0 lan1 (uninitialized): error -5 setting up PHY for tree 0, switch 0, port 0
      
      The solution is to modify regmap_reg_range to allow accesses on 4-byte
      boundaries.

      Signed-off-by: Lukasz Majewski <lukma@denx.de>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: tegra: Properly allocate clock bulk data · a0b1b205
      Thierry Reding authored
      The clock data is an array of struct clk_bulk_data, so make sure to
      allocate enough memory.
      
      Fixes: d8ca1137 ("net: stmmac: tegra: Add MGBE support")
      Signed-off-by: Thierry Reding <treding@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
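      Sizing an allocation for an array of structs, as this commit message asks
      for, usually comes down to "element count times element size". The sketch
      below shows that idiom in userspace C; `struct demo_clk_bulk_data` and
      `demo_alloc_clks()` are hypothetical stand-ins, and the code makes no claim
      about what the driver allocated before the fix.

      ```c
      #include <assert.h>
      #include <stdlib.h>

      /* Hypothetical stand-in for struct clk_bulk_data (name plus clock handle). */
      struct demo_clk_bulk_data {
          const char *id;
          void *clk;
      };

      /* Allocate one zeroed entry per clock name; the caller frees the result. */
      struct demo_clk_bulk_data *demo_alloc_clks(const char *const *names, size_t n)
      {
          struct demo_clk_bulk_data *clks;

          assert(names && n > 0);

          /* sizeof(*clks) is the element size, so there is room for n structs. */
          clks = calloc(n, sizeof(*clks));
          if (!clks)
              return NULL;

          for (size_t i = 0; i < n; i++)
              clks[i].id = names[i];

          return clks;
      }
      ```

      Writing the size as `sizeof(*clks)` keeps the allocation tied to the
      pointer's element type even if that type changes later.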
    • mISDN: hfcpci: Fix potential deadlock on &hc->lock · 56c6be35
      Chengfeng Ye authored
      As &hc->lock is acquired by both the timer _hfcpci_softirq() and the hardirq
      hfcpci_int(), the timer should disable irqs before acquiring the lock;
      otherwise a deadlock could happen if the timer is preempted by the hard irq.
      
      Possible deadlock scenario:
      hfcpci_softirq() (timer)
          -> _hfcpci_softirq()
          -> spin_lock(&hc->lock);
              <irq interruption>
              -> hfcpci_int()
              -> spin_lock(&hc->lock); (deadlock here)
      
      This flaw was found by an experimental static analysis tool I am developing
      for irq-related deadlock.
      
      The tentative patch fixes the potential deadlock by using spin_lock_irq()
      in the timer.
      
      Fixes: b36b654a ("mISDN: Create /sys/class/mISDN")
      Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
      Link: https://lore.kernel.org/r/20230727085619.7419-1-dg573847474@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: sched: cls_u32: Fix match key mis-addressing · e68409db
      Jamal Hadi Salim authored
      A match entry is uniquely identified with an "address" or "path" in the
      form of: hashtable ID(12b):bucketid(8b):nodeid(12b).
      
      When creating table match entries, the hash table id, bucket id and
      node (match entry) id must all either be specified by the user or be
      filled in with reasonable in-kernel defaults. The in-kernel default for a table id is
      0x800 (the omnipresent root table); for bucketid it is 0x0. Prior to this fix there
      was none for a nodeid, i.e. the code assumed that the user passed the correct
      nodeid, and if the user passed a nodeid of 0 (as Mingi Cho did) then that is what
      was used. But a nodeid of 0 is reserved for identifying the table. This is not
      a problem until we dump. The dump code notices that the nodeid is zero,
      assumes it is referencing a table, and therefore references the table struct
      tc_u_hnode instead of what was created, i.e. the match entry struct tc_u_knode.
      
      Mingi does the equivalent of:
      tc filter add dev dummy0 parent 10: prio 1 handle 0x1000 \
      protocol ip u32 match ip src 10.0.0.1/32 classid 10:1 action ok
      
      This essentially specifies a table id of 0, a bucketid of 1 and a nodeid of zero.
      Tableid 0 is remapped to the default of 0x800.
      Bucketid 1 is ignored and defaults to 0x00.
      Nodeid was assumed to be what Mingi passed: 0x000.
      
      Dumping before the fix shows:
      ~$ tc filter ls dev dummy0 parent 10:
      filter protocol ip pref 1 u32 chain 0
      filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1
      filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor -30591
      
      Note that the last line reports a table instead of a match entry
      (you can tell this because it says "ht divisor...").
      As a result of reporting the wrong data type (misinterpreting struct
      tc_u_knode as struct tc_u_hnode) the divisor is reported with a value
      of -30591. Mingi identified this as part of the heap address
      (physmap_base is 0xffff8880 (-30591 - 1)).
      
      The fix is to ensure that when table entry matches are added and no
      nodeid is specified (i.e nodeid == 0) then we get the next available
      nodeid from the table's pool.
      
      After the fix, this is what the dump shows:
      $ tc filter ls dev dummy0 parent 10:
      filter protocol ip pref 1 u32 chain 0
      filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1
      filter protocol ip pref 1 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 10:1 not_in_hw
        match 0a000001/ffffffff at 12
      	action order 1: gact action pass
      	 random type none pass val 0
      	 index 1 ref 1 bind 1

      Reported-by: Mingi Cho <mgcho.minic@gmail.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Link: https://lore.kernel.org/r/20230726135151.416917-1-jhs@mojatatu.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
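      The hashtable ID(12b):bucketid(8b):nodeid(12b) layout described in this
      commit message can be worked through with a little bit arithmetic. The
      helpers below are hypothetical (they are not the kernel's macros) and assume
      the three fields are packed from the most significant bits down, which
      matches the 0x1000 example in the message.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* The three parts of a u32 filter "address", as described in the message. */
      struct demo_u32_addr {
          uint32_t htid;    /* 12-bit hash table ID */
          uint32_t bucket;  /* 8-bit bucket ID */
          uint32_t node;    /* 12-bit node (match entry) ID */
      };

      /* Split a 32-bit handle into table, bucket and node IDs. */
      struct demo_u32_addr demo_u32_split(uint32_t handle)
      {
          struct demo_u32_addr a = {
              .htid   = (handle >> 20) & 0xfff,
              .bucket = (handle >> 12) & 0xff,
              .node   = handle & 0xfff,
          };
          return a;
      }

      /* Pack the three IDs back into a 32-bit handle. */
      uint32_t demo_u32_join(uint32_t htid, uint32_t bucket, uint32_t node)
      {
          assert(htid <= 0xfff && bucket <= 0xff && node <= 0xfff);
          return (htid << 20) | (bucket << 12) | node;
      }

      /* Node ID 0 is reserved for the table itself, not for a match entry. */
      int demo_u32_is_table(uint32_t handle)
      {
          return demo_u32_split(handle).node == 0;
      }
      ```

      With these helpers, handle 0x1000 splits into table 0, bucket 1, node 0,
      which is why the dump code took the entry for a table. The `fh 800::800`
      entry in the fixed dump corresponds to table 0x800, bucket 0, node 0x800.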
  2. 28 Jul, 2023 7 commits
    • dt-bindings: net: rockchip-dwmac: fix {tx|rx}-delay defaults/range in schema · 5416d792
      Eugen Hristev authored
      The range and the defaults are specified in the description instead of
      being specified in the schema.
      Fix it by adding the default value in the `default` field and specifying
      the range as `minimum` and `maximum`.
      
      Fixes: b331b8ef ("dt-bindings: net: convert rockchip-dwmac to json-schema")
      Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-fixes-2023-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 4a082260
      Jakub Kicinski authored
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes 2023-07-26
      
      This series provides bug fixes to the mlx5 driver.
      
      * tag 'mlx5-fixes-2023-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
        net/mlx5: Unregister devlink params in case interface is down
        net/mlx5: DR, Fix peer domain namespace setting
        net/mlx5: fs_chains: Fix ft prio if ignore_flow_level is not supported
        net/mlx5e: kTLS, Fix protection domain in use syndrome when devlink reload
        net/mlx5: Bridge, set debugfs access right to root-only
        net/mlx5e: xsk: Fix crash on regular rq reactivation
        net/mlx5e: xsk: Fix invalid buffer access for legacy rq
        net/mlx5e: Move representor neigh cleanup to profile cleanup_tx
        net/mlx5e: Fix crash moving to switchdev mode when ntuple offload is set
        net/mlx5e: Don't hold encap tbl lock if there is no encap action
        net/mlx5: Honor user input for migratable port fn attr
        net/mlx5e: fix return value check in mlx5e_ipsec_remove_trailer()
        net/mlx5: fix potential memory leak in mlx5e_init_rep_rx
        net/mlx5: DR, fix memory leak in mlx5dr_cmd_create_reformat_ctx
        net/mlx5e: fix double free in macsec_fs_tx_create_crypto_table_groups
      ====================
      
      Link: https://lore.kernel.org/r/20230726213206.47022-1-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: fix value check in bcm_sf2_sw_probe() · dadc5b86
      Yuanjun Gong authored
      In bcm_sf2_sw_probe(), check the return value of clk_prepare_enable()
      and return the error code if clk_prepare_enable() returns an
      error.
      
      Fixes: e9ec5c3b ("net: dsa: bcm_sf2: request and handle clocks")
      Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
      Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Link: https://lore.kernel.org/r/20230726170506.16547-1-ruc_gongyuanjun@163.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: flower: fix stack-out-of-bounds in fl_set_key_cfm() · 4d50e500
      Eric Dumazet authored
      Typical misuse of
      
      	nla_parse_nested(array, XXX_MAX, ...);
      
      array must be declared as
      
      	struct nlattr *array[XXX_MAX + 1];
      
      v2: Based on feedback from Ido Schimmel and Zahari Doychev,
      I also changed the TCA_FLOWER_KEY_CFM_OPT_MAX and cfm_opt_policy
      definitions.
      
      syzbot reported:
      
      BUG: KASAN: stack-out-of-bounds in __nla_validate_parse+0x136/0x2bd0 lib/nlattr.c:588
      Write of size 32 at addr ffffc90003a0ee20 by task syz-executor296/5014
      
      CPU: 0 PID: 5014 Comm: syz-executor296 Not tainted 6.5.0-rc2-syzkaller-00307-gd192f538 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2023
      Call Trace:
      <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
      print_address_description mm/kasan/report.c:364 [inline]
      print_report+0x163/0x540 mm/kasan/report.c:475
      kasan_report+0x175/0x1b0 mm/kasan/report.c:588
      kasan_check_range+0x27e/0x290 mm/kasan/generic.c:187
      __asan_memset+0x23/0x40 mm/kasan/shadow.c:84
      __nla_validate_parse+0x136/0x2bd0 lib/nlattr.c:588
      __nla_parse+0x40/0x50 lib/nlattr.c:700
      nla_parse_nested include/net/netlink.h:1262 [inline]
      fl_set_key_cfm+0x1e3/0x440 net/sched/cls_flower.c:1718
      fl_set_key+0x2168/0x6620 net/sched/cls_flower.c:1884
      fl_tmplt_create+0x1fe/0x510 net/sched/cls_flower.c:2666
      tc_chain_tmplt_add net/sched/cls_api.c:2959 [inline]
      tc_ctl_chain+0x131d/0x1ac0 net/sched/cls_api.c:3068
      rtnetlink_rcv_msg+0x82b/0xf50 net/core/rtnetlink.c:6424
      netlink_rcv_skb+0x1df/0x430 net/netlink/af_netlink.c:2549
      netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
      netlink_unicast+0x7c3/0x990 net/netlink/af_netlink.c:1365
      netlink_sendmsg+0xa2a/0xd60 net/netlink/af_netlink.c:1914
      sock_sendmsg_nosec net/socket.c:725 [inline]
      sock_sendmsg net/socket.c:748 [inline]
      ____sys_sendmsg+0x592/0x890 net/socket.c:2494
      ___sys_sendmsg net/socket.c:2548 [inline]
      __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2577
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7f54c6150759
      Code: 48 83 c4 28 c3 e8 d7 19 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007ffe06c30578 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f54c619902d RCX: 00007f54c6150759
      RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000003
      RBP: 00007ffe06c30590 R08: 0000000000000000 R09: 00007ffe06c305f0
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f54c61c35f0
      R13: 00007ffe06c30778 R14: 0000000000000001 R15: 0000000000000001
      </TASK>
      
      The buggy address belongs to stack of task syz-executor296/5014
      and is located at offset 32 in frame:
      fl_set_key_cfm+0x0/0x440 net/sched/cls_flower.c:374
      
      This frame has 1 object:
      [32, 56) 'nla_cfm_opt'
      
      The buggy address belongs to the virtual mapping at
      [ffffc90003a08000, ffffc90003a11000) created by:
      copy_process+0x5c8/0x4290 kernel/fork.c:2330
      
      Fixes: 7cfffd5f ("net: flower: add support for matching cfm fields")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Zahari Doychev <zdoychev@maxlinear.com>
      Link: https://lore.kernel.org/r/20230726145815.943910-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
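      Why the table passed to a parser of this kind needs XXX_MAX + 1 slots can
      be shown with a toy userspace parser. All names below (`demo_attr`,
      `demo_parse()`, the `DEMO_ATTR_*` values) are hypothetical; the sketch
      only mirrors the convention that the table is cleared over 0..maxtype and
      then indexed directly by attribute type.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <string.h>

      /* Hypothetical attribute numbering: valid types are 1..DEMO_ATTR_MAX. */
      enum {
          DEMO_ATTR_UNSPEC,
          DEMO_ATTR_A,
          DEMO_ATTR_B,
          DEMO_ATTR_END_,
      };
      #define DEMO_ATTR_MAX (DEMO_ATTR_END_ - 1)

      struct demo_attr {
          int type;
          int value;
      };

      /*
       * Toy parser: clears tb[0..maxtype], then stores each known attribute at
       * tb[type]. The caller must therefore provide maxtype + 1 slots.
       */
      void demo_parse(const struct demo_attr **tb, int maxtype,
                      const struct demo_attr *attrs, size_t n)
      {
          assert(tb && maxtype >= 0);
          memset(tb, 0, sizeof(*tb) * (size_t)(maxtype + 1));

          for (size_t i = 0; i < n; i++) {
              int type = attrs[i].type;

              /* Unknown or out-of-range types are skipped. */
              if (type > 0 && type <= maxtype)
                  tb[type] = &attrs[i];
          }
      }
      ```

      A caller would declare `const struct demo_attr *tb[DEMO_ATTR_MAX + 1];`.
      Declaring it with only `DEMO_ATTR_MAX` slots would make both the clearing
      step and `tb[DEMO_ATTR_MAX]` write past the end of the array.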
    • MAINTAINERS: stmmac: retire Giuseppe Cavallaro · fa467226
      Jakub Kicinski authored
      I tried to get stmmac maintainers to be more active by agreeing with
      them off-list on a review rotation. I pinged Peppe 3 times over 2 weeks
      during his "shift month", no reviews are flowing.
      
      All the contributions are much appreciated! But stmmac is quite
      active, we need participating maintainers :(

      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230726151120.1649474-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: fix older DSA drivers using phylink · 9945c1fb
      Russell King (Oracle) authored
      Older DSA drivers that do not provide a dsa_ops adjust_link method end
      up using phylink. Unfortunately, a recent phylink change that requires
      its supported_interfaces bitmap to be filled breaks these drivers
      because the bitmap remains empty.
      
      Rather than fixing each driver individually, fix it in the core code so
      we have a sensible set of defaults.

      Reported-by: Sergei Antonov <saproj@gmail.com>
      Fixes: de5c9bf4 ("net: phylink: require supported_interfaces to be filled")
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Tested-by: Vladimir Oltean <olteanv@gmail.com> # dsa_loop
      Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Link: https://lore.kernel.org/r/E1qOflM-001AEz-D3@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length · d73ef2d6
      Lin Ma authored
      There are 9 ndo_bridge_setlink handlers in total in the current kernel:
      1) bnxt_bridge_setlink, 2) be_ndo_bridge_setlink, 3)
      i40e_ndo_bridge_setlink, 4) ice_bridge_setlink, 5)
      ixgbe_ndo_bridge_setlink, 6) mlx5e_bridge_setlink, 7)
      nfp_net_bridge_setlink, 8) qeth_l2_bridge_setlink, 9) br_setlink.

      Investigating the code, we find that 1-7 parse and use the nlattr
      IFLA_BRIDGE_MODE, but 3 and 4 forget to do the nla_len check. This can
      lead to an out-of-attribute read and allow a malformed nlattr (e.g.,
      length 0) to be viewed as a 2-byte integer.

      To avoid such issues, including in future ndo_bridge_setlink handlers,
      this patch adds the nla_len check in rtnl_bridge_setlink and
      does an early error return if the length mismatches. To make this work, the
      break is removed from the parsing of IFLA_BRIDGE_FLAGS so that
      nla_for_each_nested iterates over every attribute.
      
      Fixes: b1edc14a ("ice: Implement ice_bridge_getlink and ice_bridge_setlink")
      Fixes: 51616018 ("i40e: Add support for getlink, setlink ndo ops")
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
      Link: https://lore.kernel.org/r/20230726075314.1059224-1-linma@zju.edu.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
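      The length check described in this commit message guards against reading
      a 2-byte value out of an attribute whose payload is shorter than that. A
      minimal userspace sketch follows; `struct demo_nlattr` and
      `demo_get_u16()` are hypothetical simplifications and do not reproduce the
      real netlink attribute layout or the kernel's helpers.

      ```c
      #include <assert.h>
      #include <stdint.h>
      #include <string.h>

      /* Hypothetical, simplified attribute: payload length plus payload bytes. */
      struct demo_nlattr {
          uint16_t payload_len;
          uint8_t payload[8];
      };

      /*
       * Returns 0 and stores the value in *out, or -1 (leaving *out untouched)
       * if the payload is too short to hold a 16-bit integer.
       */
      int demo_get_u16(const struct demo_nlattr *attr, uint16_t *out)
      {
          assert(attr && out);

          /* Reject malformed attributes before touching the payload. */
          if (attr->payload_len < sizeof(*out))
              return -1;

          memcpy(out, attr->payload, sizeof(*out));
          return 0;
      }
      ```

      Doing the check once in the common caller, as the commit does in
      rtnl_bridge_setlink, means individual handlers cannot forget it.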
  3. 27 Jul, 2023 15 commits
  4. 26 Jul, 2023 11 commits
    • Merge branch 'mptcp-more-fixes-for-6-5' · 2e3c5df2
      Jakub Kicinski authored
      Mat Martineau says:
      
      ====================
      mptcp: More fixes for 6.5
      
      Patch 1: Better detection of ip6tables vs ip6tables-legacy tools for
      self tests. Fix for 6.4 and newer.
      
      Patch 2: Only generate "new listener" event if listen operation
      succeeds. Fix for 6.2 and newer.
      ====================
      
      Link: https://lore.kernel.org/r/20230725-send-net-20230725-v1-0-6f60fe7137a9@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: more accurate NL event generation · 21d9b73a
      Paolo Abeni authored
      Currently the mptcp code generates a "new listener" event even
      if the actual listen() syscall fails. Address the issue by moving
      the event generation call under the successful branch.
      
      Cc: stable@vger.kernel.org
      Fixes: f8c9dfbd ("mptcp: add pm listener events")
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <martineau@kernel.org>
      Link: https://lore.kernel.org/r/20230725-send-net-20230725-v1-2-6f60fe7137a9@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: mptcp: join: only check for ip6tables if needed · 016e7ba4
      Matthieu Baerts authored
      If 'iptables-legacy' is available, the 'ip6tables-legacy' command will be
      used instead of 'ip6tables', so there is no need to check whether
      'ip6tables' is available in this case.
      
      Cc: stable@vger.kernel.org
      Fixes: 0c4cd3f8 ("selftests: mptcp: join: use 'iptables-legacy' if available")
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Mat Martineau <martineau@kernel.org>
      Link: https://lore.kernel.org/r/20230725-send-net-20230725-v1-1-6f60fe7137a9@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/mlx5: Unregister devlink params in case interface is down · 53d737df
      Shay Drory authored
      Currently, in case an interface is down, the mlx5 driver doesn't
      unregister its devlink params, which leads to this WARN [1].
      Fix it by unregistering devlink params in that case as well.
      
      [1]
      [  295.244769 ] WARNING: CPU: 15 PID: 1 at net/core/devlink.c:9042 devlink_free+0x174/0x1fc
      [  295.488379 ] CPU: 15 PID: 1 Comm: shutdown Tainted: G S         OE 5.15.0-1017.19.3.g0677e61-bluefield #g0677e61
      [  295.509330 ] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.2.0.12761 Jun  6 2023
      [  295.543096 ] pc : devlink_free+0x174/0x1fc
      [  295.551104 ] lr : mlx5_devlink_free+0x18/0x2c [mlx5_core]
      [  295.561816 ] sp : ffff80000809b850
      [  295.711155 ] Call trace:
      [  295.716030 ]  devlink_free+0x174/0x1fc
      [  295.723346 ]  mlx5_devlink_free+0x18/0x2c [mlx5_core]
      [  295.733351 ]  mlx5_sf_dev_remove+0x98/0xb0 [mlx5_core]
      [  295.743534 ]  auxiliary_bus_remove+0x2c/0x50
      [  295.751893 ]  __device_release_driver+0x19c/0x280
      [  295.761120 ]  device_release_driver+0x34/0x50
      [  295.769649 ]  bus_remove_device+0xdc/0x170
      [  295.777656 ]  device_del+0x17c/0x3a4
      [  295.784620 ]  mlx5_sf_dev_remove+0x28/0xf0 [mlx5_core]
      [  295.794800 ]  mlx5_sf_dev_table_destroy+0x98/0x110 [mlx5_core]
      [  295.806375 ]  mlx5_unload+0x34/0xd0 [mlx5_core]
      [  295.815339 ]  mlx5_unload_one+0x70/0xe4 [mlx5_core]
      [  295.824998 ]  shutdown+0xb0/0xd8 [mlx5_core]
      [  295.833439 ]  pci_device_shutdown+0x3c/0xa0
      [  295.841651 ]  device_shutdown+0x170/0x340
      [  295.849486 ]  __do_sys_reboot+0x1f4/0x2a0
      [  295.857322 ]  __arm64_sys_reboot+0x2c/0x40
      [  295.865329 ]  invoke_syscall+0x78/0x100
      [  295.872817 ]  el0_svc_common.constprop.0+0x54/0x184
      [  295.882392 ]  do_el0_svc+0x30/0xac
      [  295.889008 ]  el0_svc+0x48/0x160
      [  295.895278 ]  el0t_64_sync_handler+0xa4/0x130
      [  295.903807 ]  el0t_64_sync+0x1a4/0x1a8
      [  295.911120 ] ---[ end trace 4f1d2381d00d9dce  ]---
      
      Fixes: fe578cbb ("net/mlx5: Move devlink registration before mlx5_load")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: DR, Fix peer domain namespace setting · 62752c0b
      Shay Drory authored
      The offending patch is based on the assumption that for PFs,
      mlx5_get_dev_index() is the same as vhca_id. However, this assumption
      is wrong in case of DPU (ECPF).
      Fix it by using vhca_id directly, and switch the array of peers to
      xarray.
      
      Fixes: 6d5b7321 ("net/mlx5: DR, handle more than one peer domain")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: fs_chains: Fix ft prio if ignore_flow_level is not supported · 61eab651
      Chris Mi authored
      The cited commit sets ft prio to fs_base_prio. But if
      ignore_flow_level is not supported, ft prio must be set based on
      the tc filter prio. Otherwise, all ft prios on the same chain are
      the same, which is invalid if ignore_flow_level is not supported.
      
      Fix it by setting ft prio based on tc filter prio and setting
      fs_base_prio to 0 for fdb.
      
      Fixes: 8e80e564 ("net/mlx5: fs_chains: Refactor to detach chains from tc usage")
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Paul Blakey <paulb@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: kTLS, Fix protection domain in use syndrome when devlink reload · 3e4cf1dd
      Jianbo Liu authored
      There are DEK objects cached in DEK pool after kTLS is used, and they
      are freed only in mlx5e_ktls_cleanup().
      
      mlx5e_destroy_mdev_resources() is called in mlx5e_suspend() to
      free mdev resources, including protection domain (PD). However, PD is
      still referenced by the cached DEK objects in this case, because
      profile->cleanup() (and therefore mlx5e_ktls_cleanup()) is called
      after mlx5e_suspend() during devlink reload. So the following FW
      syndrome is generated:
      
       mlx5_cmd_out_err:803:(pid 12948): DEALLOC_PD(0x801) op_mod(0x0) failed,
          status bad resource state(0x9), syndrome (0xef0c8a), err(-22)
      
      To avoid this syndrome, move DEK pool destruction to
      mlx5e_ktls_cleanup_tx(), which is called by profile->cleanup_tx(). And
      move pool creation to mlx5e_ktls_init_tx() for symmetry.
      
      Fixes: f741db1a ("net/mlx5e: kTLS, Improve connection rate by using fast update encryption key")
      Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Bridge, set debugfs access right to root-only · eb02b93a
      Vlad Buslov authored
      As suggested during code review, set the access rights for the bridge 'fdb'
      debugfs file to root-only.
      
      Fixes: 791eb782 ("net/mlx5: Bridge, expose FDB state via debugfs")
      Reported-by: Jakub Kicinski <kuba@kernel.org>
      Link: https://lore.kernel.org/netdev/20230619120515.5045132a@kernel.org/
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Gal Pressman <gal@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: xsk: Fix crash on regular rq reactivation · 39646d9b
      Dragos Tatulea authored
      When the regular rq is reactivated after the XSK socket is closed,
      it could be reading stale cqes, which eventually corrupts the rq.
      This leads to no more traffic being received on the regular rq and a
      crash on the next close or deactivation of the rq.

      Kal Cutter Conley reported this issue as a crash on the release
      path when the xdpsock sample program is stopped (killed) and restarted
      in sequence while traffic is running.

      This patch flushes all cqes during the rq flush. The cqe flushing
      is done in the reset state of the rq. The mlx5e_rq_to_ready code is moved
      into the flush function to allow for this.
      
      Fixes: 082a9edf ("net/mlx5e: xsk: Flush RQ on XSK activation to save memory")
      Reported-by: Kal Cutter Conley <kal.conley@dectris.com>
      Closes: https://lore.kernel.org/xdp-newbies/CAHApi-nUAs4TeFWUDV915CZJo07XVg2Vp63-no7UDfj6wur9nQ@mail.gmail.com
      Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: xsk: Fix invalid buffer access for legacy rq · e0f52298
      Dragos Tatulea authored
      The below crash can be encountered when using xdpsock in rx mode for
      legacy rq: the buffer gets released in the XDP_REDIRECT path, and then
      once again in the driver. This fix sets the flag to avoid releasing on
      the driver side.
      
      XSK handling of buffers for legacy rq was relying on the caller to set
      the skip release flag. But the referenced fix started using fragment
      counts for pages instead of the skip flag.
      
      Crash log:
       general protection fault, probably for non-canonical address 0xffff8881217e3a: 0000 [#1] SMP
       CPU: 0 PID: 14 Comm: ksoftirqd/0 Not tainted 6.5.0-rc1+ #31
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       RIP: 0010:bpf_prog_03b13f331978c78c+0xf/0x28
       Code:  ...
       RSP: 0018:ffff88810082fc98 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: ffff888138404901 RCX: c0ffffc900027cbc
       RDX: ffffffffa000b514 RSI: 00ffff8881217e32 RDI: ffff888138404901
       RBP: ffff88810082fc98 R08: 0000000000091100 R09: 0000000000000006
       R10: 0000000000000800 R11: 0000000000000800 R12: ffffc9000027a000
       R13: ffff8881217e2dc0 R14: ffff8881217e2910 R15: ffff8881217e2f00
       FS:  0000000000000000(0000) GS:ffff88852c800000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000564cb2e2cde0 CR3: 000000010e603004 CR4: 0000000000370eb0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        <TASK>
        ? die_addr+0x32/0x80
        ? exc_general_protection+0x192/0x390
        ? asm_exc_general_protection+0x22/0x30
        ? 0xffffffffa000b514
        ? bpf_prog_03b13f331978c78c+0xf/0x28
        mlx5e_xdp_handle+0x48/0x670 [mlx5_core]
        ? dev_gro_receive+0x3b5/0x6e0
        mlx5e_xsk_skb_from_cqe_linear+0x6e/0x90 [mlx5_core]
        mlx5e_handle_rx_cqe+0x55/0x100 [mlx5_core]
        mlx5e_poll_rx_cq+0x87/0x6e0 [mlx5_core]
        mlx5e_napi_poll+0x45e/0x6b0 [mlx5_core]
        __napi_poll+0x25/0x1a0
        net_rx_action+0x28a/0x300
        __do_softirq+0xcd/0x279
        ? sort_range+0x20/0x20
        run_ksoftirqd+0x1a/0x20
        smpboot_thread_fn+0xa2/0x130
        kthread+0xc9/0xf0
        ? kthread_complete_and_exit+0x20/0x20
        ret_from_fork+0x1f/0x30
        </TASK>
       Modules linked in: mlx5_ib mlx5_core rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter overlay zram zsmalloc fuse [last unloaded: mlx5_core]
       ---[ end trace 0000000000000000 ]---
      
      Fixes: 7abd955a ("net/mlx5e: RX, Fix page_pool page fragment tracking for XDP")
      Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      e0f52298
    • Jianbo Liu's avatar
      net/mlx5e: Move representor neigh cleanup to profile cleanup_tx · d03b6e6f
      Jianbo Liu authored
      For IP tunnel encapsulation in ECMP (Equal-Cost Multipath) mode, as
      the flow is duplicated to the peer eswitch, the related neighbour
      information on the peer uplink representor is created as well.
      
      In the cited commit, eswitch devcom unpair is moved to the uplink unload
      API, specifically profile->cleanup_tx. If there is an encap rule
      offloaded in ECMP mode when one eswitch unpairs (because of
      unloading the driver, for instance), and the peer rule from the peer
      eswitch is about to be deleted, a use-after-free is triggered
      while accessing neigh info, as it has already been cleaned up in the
      uplink's profile->disable, which runs before its profile->cleanup_tx.
      
      To fix this issue, move the neigh cleanup to the profile's cleanup_tx
      callback, after mlx5e_cleanup_uplink_rep_tx is called. The neigh
      init is moved to init_tx for symmetry.
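
      The ordering problem can be illustrated with a small, self-contained
      C model. It is a sketch only: every demo_* name is hypothetical, the
      two flags stand in for the real neigh state, and the only thing taken
      from the description above is that disable runs before cleanup_tx and
      that the uplink tx cleanup still reads neigh info.

```c
#include <stdbool.h>

/* Hypothetical model of a representor; not an mlx5e structure. */
struct demo_rep {
	bool neigh_alive; /* neigh info is still allocated */
	bool uaf;         /* neigh info was accessed after being cleaned up */
};

static void demo_neigh_cleanup(struct demo_rep *r)
{
	r->neigh_alive = false;
}

/* Stands in for the uplink tx cleanup: deleting peer rules reads neigh info. */
static void demo_uplink_tx_cleanup(struct demo_rep *r)
{
	if (!r->neigh_alive)
		r->uaf = true;
}

/* Old ordering: neigh cleanup lives in disable, which runs first. */
static void demo_disable_old(struct demo_rep *r)
{
	demo_neigh_cleanup(r);
}

static void demo_cleanup_tx_old(struct demo_rep *r)
{
	demo_uplink_tx_cleanup(r);
}

/* New ordering: disable leaves neigh info alone ... */
static void demo_disable_new(struct demo_rep *r)
{
	(void)r;
}

/* ... and cleanup_tx releases it only after the uplink tx cleanup. */
static void demo_cleanup_tx_new(struct demo_rep *r)
{
	demo_uplink_tx_cleanup(r);
	demo_neigh_cleanup(r);
}

/* Runs disable followed by cleanup_tx; returns true on use-after-free. */
static bool demo_unload(bool fixed)
{
	struct demo_rep r = { .neigh_alive = true, .uaf = false };

	if (fixed) {
		demo_disable_new(&r);
		demo_cleanup_tx_new(&r);
	} else {
		demo_disable_old(&r);
		demo_cleanup_tx_old(&r);
	}
	return r.uaf;
}
```

      In this model, demo_unload(false) reports the stale access and
      demo_unload(true) does not, because the neigh state outlives the last
      reader.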
      
      [ 2453.376299] BUG: KASAN: slab-use-after-free in mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core]
      [ 2453.379125] Read of size 4 at addr ffff888127af9008 by task modprobe/2496
      
      [ 2453.381542] CPU: 7 PID: 2496 Comm: modprobe Tainted: G    B              6.4.0-rc7+ #15
      [ 2453.383386] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [ 2453.384335] Call Trace:
      [ 2453.384625]  <TASK>
      [ 2453.384891]  dump_stack_lvl+0x33/0x50
      [ 2453.385285]  print_report+0xc2/0x610
      [ 2453.385667]  ? __virt_addr_valid+0xb1/0x130
      [ 2453.386091]  ? mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core]
      [ 2453.386757]  kasan_report+0xae/0xe0
      [ 2453.387123]  ? mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core]
      [ 2453.387798]  mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core]
      [ 2453.388465]  mlx5e_rep_encap_entry_detach+0xa6/0xe0 [mlx5_core]
      [ 2453.389111]  mlx5e_encap_dealloc+0xa7/0x100 [mlx5_core]
      [ 2453.389706]  mlx5e_tc_tun_encap_dests_unset+0x61/0xb0 [mlx5_core]
      [ 2453.390361]  mlx5_free_flow_attr_actions+0x11e/0x340 [mlx5_core]
      [ 2453.391015]  ? complete_all+0x43/0xd0
      [ 2453.391398]  ? free_flow_post_acts+0x38/0x120 [mlx5_core]
      [ 2453.392004]  mlx5e_tc_del_fdb_flow+0x4ae/0x690 [mlx5_core]
      [ 2453.392618]  mlx5e_tc_del_fdb_peers_flow+0x308/0x370 [mlx5_core]
      [ 2453.393276]  mlx5e_tc_clean_fdb_peer_flows+0xf5/0x140 [mlx5_core]
      [ 2453.393925]  mlx5_esw_offloads_unpair+0x86/0x540 [mlx5_core]
      [ 2453.394546]  ? mlx5_esw_offloads_set_ns_peer.isra.0+0x180/0x180 [mlx5_core]
      [ 2453.395268]  ? down_write+0xaa/0x100
      [ 2453.395652]  mlx5_esw_offloads_devcom_event+0x203/0x530 [mlx5_core]
      [ 2453.396317]  mlx5_devcom_send_event+0xbb/0x190 [mlx5_core]
      [ 2453.396917]  mlx5_esw_offloads_devcom_cleanup+0xb0/0xd0 [mlx5_core]
      [ 2453.397582]  mlx5e_tc_esw_cleanup+0x42/0x120 [mlx5_core]
      [ 2453.398182]  mlx5e_rep_tc_cleanup+0x15/0x30 [mlx5_core]
      [ 2453.398768]  mlx5e_cleanup_rep_tx+0x6c/0x80 [mlx5_core]
      [ 2453.399367]  mlx5e_detach_netdev+0xee/0x120 [mlx5_core]
      [ 2453.399957]  mlx5e_netdev_change_profile+0x84/0x170 [mlx5_core]
      [ 2453.400598]  mlx5e_vport_rep_unload+0xe0/0xf0 [mlx5_core]
      [ 2453.403781]  mlx5_eswitch_unregister_vport_reps+0x15e/0x190 [mlx5_core]
      [ 2453.404479]  ? mlx5_eswitch_register_vport_reps+0x200/0x200 [mlx5_core]
      [ 2453.405170]  ? up_write+0x39/0x60
      [ 2453.405529]  ? kernfs_remove_by_name_ns+0xb7/0xe0
      [ 2453.405985]  auxiliary_bus_remove+0x2e/0x40
      [ 2453.406405]  device_release_driver_internal+0x243/0x2d0
      [ 2453.406900]  ? kobject_put+0x42/0x2d0
      [ 2453.407284]  bus_remove_device+0x128/0x1d0
      [ 2453.407687]  device_del+0x240/0x550
      [ 2453.408053]  ? waiting_for_supplier_show+0xe0/0xe0
      [ 2453.408511]  ? kobject_put+0xfa/0x2d0
      [ 2453.408889]  ? __kmem_cache_free+0x14d/0x280
      [ 2453.409310]  mlx5_rescan_drivers_locked.part.0+0xcd/0x2b0 [mlx5_core]
      [ 2453.409973]  mlx5_unregister_device+0x40/0x50 [mlx5_core]
      [ 2453.410561]  mlx5_uninit_one+0x3d/0x110 [mlx5_core]
      [ 2453.411111]  remove_one+0x89/0x130 [mlx5_core]
      [ 2453.411628]  pci_device_remove+0x59/0xf0
      [ 2453.412026]  device_release_driver_internal+0x243/0x2d0
      [ 2453.412511]  ? parse_option_str+0x14/0x90
      [ 2453.412915]  driver_detach+0x7b/0xf0
      [ 2453.413289]  bus_remove_driver+0xb5/0x160
      [ 2453.413685]  pci_unregister_driver+0x3f/0xf0
      [ 2453.414104]  mlx5_cleanup+0xc/0x20 [mlx5_core]
      
      Fixes: 2be5bd42 ("net/mlx5: Handle pairing of E-switch via uplink un/load APIs")
      Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      d03b6e6f