1. 23 Nov, 2022 16 commits
  2. 22 Nov, 2022 24 commits
    • tsnep: Fix rotten packets · 2dc4ac91
      Gerhard Engleder authored
      If PTP synchronisation is done every second, then the interval is
      sporadically longer than one second:
      
      ptp4l[696.582]: master offset        -17 s2 freq   -1891 path delay 573
      ptp4l[697.582]: master offset        -22 s2 freq   -1901 path delay 573
      ptp4l[699.368]: master offset         -1 s2 freq   -1887 path delay 573
            ^^^^^^^ Should be 698.582!
      
      This problem is caused by rotten packets, which are received after
      polling but before interrupts are enabled again. This can be fixed by
      checking for pending work and rescheduling if necessary after interrupts
      have been enabled again.
      
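      The re-check described above can be sketched in plain C. This is a minimal model, not the tsnep driver's code: `demo_queue`, `demo_process()` and `demo_poll()` are hypothetical names, and the `late_arrivals` parameter only simulates packets that land in the window between the end of polling and the interrupt re-enable.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one device queue; not the tsnep driver's structures. */
struct demo_queue {
    int pending;        /* packets waiting in the ring */
    bool irq_enabled;   /* interrupt enable state */
    bool rescheduled;   /* set when the poll routine must run again */
};

/* Consume up to budget packets and return how many were handled. */
static int demo_process(struct demo_queue *q, int budget)
{
    int done = q->pending < budget ? q->pending : budget;

    q->pending -= done;
    return done;
}

/*
 * Poll routine. late_arrivals models packets that arrive after
 * processing has finished but before the interrupt is enabled again.
 */
static int demo_poll(struct demo_queue *q, int budget, int late_arrivals)
{
    int done = demo_process(q, budget);

    q->rescheduled = false;
    if (done == budget) {
        /* Budget used up: keep polling, the interrupt stays masked. */
        q->rescheduled = true;
        return done;
    }

    q->pending += late_arrivals;   /* the race window */
    q->irq_enabled = true;

    /* Re-check after enabling: work that slipped in raises no new interrupt. */
    if (q->pending > 0) {
        q->irq_enabled = false;
        q->rescheduled = true;
    }
    return done;
}
```

      Without the final `pending` check, a late packet would stay in the ring until some later interrupt fires, which is consistent with the delayed ptp4l sample shown above.
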
      Fixes: 403f69bb ("tsnep: Add TSN endpoint Ethernet MAC driver")
      Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
      Link: https://lore.kernel.org/r/20221119211825.81805-1-gerhard@engleder-embedded.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • octeontx2-pf: Remove duplicate MACSEC setting · bb3cfbaf
      Zheng Bin authored
      Commit 4581dd48 ("net: octeontx2-pf: mcs: consider MACSEC setting")
      has already added "depends on MACSEC || !MACSEC", so remove it.
      Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
      Acked-by: Randy Dunlap <rdunlap@infradead.org>
      Link: https://lore.kernel.org/r/20221119133616.3583538-1-zhengbin13@huawei.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • bnx2x: fix pci device refcount leak in bnx2x_vf_is_pcie_pending() · 3637a29c
      Yang Yingliang authored
      As the comment of pci_get_domain_bus_and_slot() says, it returns
      a PCI device with its refcount incremented. When the caller has
      finished using it, the caller must decrement the reference count by
      calling pci_dev_put(). Call pci_dev_put() before returning from
      bnx2x_vf_is_pcie_pending() to avoid a refcount leak.
      
      Fixes: b56e9670 ("bnx2x: Prepare device and initialize VF database")
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20221119070202.1407648-1-yangyingliang@huawei.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • NFC: nci: fix memory leak in nci_rx_data_packet() · 53270fb0
      Liu Shixin authored
      Syzbot reported a memory leak about skb:
      
      unreferenced object 0xffff88810e144e00 (size 240):
        comm "syz-executor284", pid 3701, jiffies 4294952403 (age 12.620s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff83ab79a9>] __alloc_skb+0x1f9/0x270 net/core/skbuff.c:497
          [<ffffffff82a5cf64>] alloc_skb include/linux/skbuff.h:1267 [inline]
          [<ffffffff82a5cf64>] virtual_ncidev_write+0x24/0xe0 drivers/nfc/virtual_ncidev.c:116
          [<ffffffff815f6503>] do_loop_readv_writev fs/read_write.c:759 [inline]
          [<ffffffff815f6503>] do_loop_readv_writev fs/read_write.c:743 [inline]
          [<ffffffff815f6503>] do_iter_write+0x253/0x300 fs/read_write.c:863
          [<ffffffff815f66ed>] vfs_writev+0xdd/0x240 fs/read_write.c:934
          [<ffffffff815f68f6>] do_writev+0xa6/0x1c0 fs/read_write.c:977
          [<ffffffff848802d5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          [<ffffffff848802d5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
          [<ffffffff84a00087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      In nci_rx_data_packet(), if we don't get a valid conn_info, we will return
      directly but forget to release the skb.
      
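      The leak pattern can be illustrated with a small userspace model. This is only a sketch under assumptions: `demo_buf_alloc()`, `demo_buf_free()` and `demo_rx()` are hypothetical helpers standing in for skb allocation, skb release and the receive handler; none of them are NCI functions.

```c
#include <stdbool.h>
#include <stdlib.h>

static int demo_live_buffers;   /* number of buffers not yet released */

static void *demo_buf_alloc(void)
{
    void *buf = malloc(16);

    if (buf)
        demo_live_buffers++;
    return buf;
}

static void demo_buf_free(void *buf)
{
    free(buf);
    demo_live_buffers--;
}

/* The handler owns buf: every exit path has to release it. */
static int demo_rx(void *buf, bool have_conn)
{
    if (!have_conn) {
        demo_buf_free(buf);   /* the release an early return can forget */
        return -1;
    }

    /* Normal path: the data would be consumed here, then released. */
    demo_buf_free(buf);
    return 0;
}
```

      If the `demo_buf_free()` call in the early-return branch is dropped, `demo_live_buffers` stays at one after the call, which is the kind of unreferenced object the report above shows.
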
      Reported-by: syzbot+cdb9a427d1bc08815104@syzkaller.appspotmail.com
      Fixes: 4aeee687 ("NFC: nci: Add dynamic logical connections support")
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Link: https://lore.kernel.org/r/20221118082419.239475-1-liushixin2@huawei.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: sched: allow act_ct to be built without NF_NAT · 8427fd10
      Xin Long authored
      Commit f11fe1da ("net/sched: Make NET_ACT_CT depends on NF_NAT")
      fixed the build failure when NF_NAT is m and NET_ACT_CT is y by
      adding "depends on NF_NAT" for NET_ACT_CT. However, it also means
      that NET_ACT_CT cannot be built without NF_NAT, which is not expected.
      This patch fixes it by using "(!NF_NAT || NF_NAT)" as the dependency.
      
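      Why "(!NF_NAT || NF_NAT)" is not a no-op follows from Kconfig's tristate arithmetic, where n, m and y are ordered values, "!" maps a value to 2 minus that value, and "||" takes the maximum. The sketch below works through the three cases; the enum and function names are hypothetical and exist only for this illustration.

```c
/* Kconfig tristate values, ordered n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Kconfig "!": n becomes y, y becomes n, m stays m. */
static enum tristate tri_not(enum tristate a)
{
    return (enum tristate)(2 - a);
}

/* Kconfig "||": the larger of the two values. */
static enum tristate tri_or(enum tristate a, enum tristate b)
{
    return a > b ? a : b;
}

/* Upper limit that "depends on (!NF_NAT || NF_NAT)" places on NET_ACT_CT. */
static enum tristate act_ct_limit(enum tristate nf_nat)
{
    return tri_or(tri_not(nf_nat), nf_nat);
}
```

      With NF_NAT=n the limit is y, so NET_ACT_CT can be built without NF_NAT. With NF_NAT=m the limit is m, which still rules out the NF_NAT=m, NET_ACT_CT=y combination that broke the build. A plain "depends on NF_NAT" would give a limit of n in the first case.
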
      Fixes: f11fe1da ("net/sched: Make NET_ACT_CT depends on NF_NAT")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Link: https://lore.kernel.org/r/b6386f28d1ba34721795fb776a91cbdabb203447.1668807183.git.lucien.xin@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: sparx5: fix error handling in sparx5_port_open() · 4305fe23
      Liu Jian authored
      If phylink_of_phy_connect() fails, the port should be disabled.
      If sparx5_serdes_set()/phy_power_on() fails, the port should be
      disabled and the phylink should be stopped and disconnected.
      
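      The unwinding order described above is commonly written with goto labels, each of which undoes the steps completed so far. The following is a minimal sketch, not the sparx5 code: `demo_port` and `demo_port_open()` are hypothetical, and `fail_step` merely injects a failure at the chosen step.

```c
#include <stdbool.h>

/* Hypothetical port state; the field names are illustrative only. */
struct demo_port {
    bool enabled;
    bool phylink_connected;
    bool phylink_started;
};

/* Open the port; fail_step selects which step fails (0 means none). */
static int demo_port_open(struct demo_port *p, int fail_step)
{
    int err;

    p->enabled = true;                 /* enable the port first */

    err = (fail_step == 1) ? -1 : 0;   /* step 1: connect the PHY */
    if (err)
        goto err_disable;
    p->phylink_connected = true;

    p->phylink_started = true;         /* phylink is running from here on */

    err = (fail_step == 2) ? -1 : 0;   /* step 2: serdes setup or PHY power-on */
    if (err)
        goto err_stop;

    return 0;

err_stop:
    /* Stop and disconnect phylink, then fall through to disable the port. */
    p->phylink_started = false;
    p->phylink_connected = false;
err_disable:
    p->enabled = false;
    return err;
}
```

      A failure in step 1 only disables the port, while a failure in step 2 also stops and disconnects phylink, mirroring the two cases in the message.
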
      Fixes: 946e7fd5 ("net: sparx5: add port module support")
      Fixes: f3cad261 ("net: sparx5: add hostmode with phylink support")
      Signed-off-by: Liu Jian <liujian56@huawei.com>
      Tested-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
      Reviewed-by: Steen Hegelund <steen.hegelund@microchip.com>
      Link: https://lore.kernel.org/r/20221117125918.203997-1-liujian56@huawei.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • sfc: fix potential memleak in __ef100_hard_start_xmit() · aad98abd
      Zhang Changzhong authored
      __ef100_hard_start_xmit() returns NETDEV_TX_OK without freeing the skb
      in the error handling case. Add dev_kfree_skb_any() to fix it.
      
      Fixes: 51b35a45 ("sfc: skeleton EF100 PF driver")
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
      Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/1668671409-10909-1-git-send-email-zhangchangzhong@huawei.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: wwan: iosm: use ACPI_FREE() but not kfree() in ipc_pcie_read_bios_cfg() · e541dd77
      Wang ShaoBo authored
      acpi_evaluate_dsm() should be coupled with ACPI_FREE() to free the ACPI
      memory, because the allocation of the acpi_object needs to be tracked
      when ACPI_DBG_TRACK_ALLOCATIONS is enabled. Use ACPI_FREE() instead of
      kfree().
      
      Fixes: d38a648d ("net: wwan: iosm: fix memory leak in ipc_pcie_read_bios_cfg")
      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Link: https://lore.kernel.org/r/20221118062447.2324881-1-bobo.shaobowang@huawei.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ice: fix handling of burst Tx timestamps · 30f15874
      Jacob Keller authored
      Commit 1229b339 ("ice: Add low latency Tx timestamp read") refactored
      PTP timestamping logic to use a threaded IRQ instead of a separate kthread.
      
      This implementation introduced ice_misc_intr_thread_fn and redefined the
      ice_ptp_process_ts function interface to return a value of whether or not
      the timestamp processing was complete.
      
      ice_misc_intr_thread_fn would take the return value from ice_ptp_process_ts
      and convert it into either IRQ_HANDLED if there were no more timestamps to
      be processed, or IRQ_WAKE_THREAD if the thread should continue processing.
      
      This is not correct, as the kernel does not re-schedule threaded IRQ
      functions automatically. IRQ_WAKE_THREAD can only be used by the main IRQ
      function.
      
      This results in the ice_ptp_process_ts function (and in turn the
      ice_ptp_tx_tstamp function) being called only once per interrupt.
      
      If an application sends a burst of Tx timestamps without waiting for a
      response, the interrupt will trigger for the first timestamp. However,
      later timestamps may not have arrived yet. This can result in dropped or
      discarded timestamps. Worse, on E822 hardware this results in the interrupt
      logic getting stuck such that no future interrupts will be triggered. The
      result is complete loss of Tx timestamp functionality.
      
      Fix this by modifying the ice_misc_intr_thread_fn to perform its own
      polling of the ice_ptp_process_ts function. We sleep for a few microseconds
      between attempts to avoid wasting significant CPU time. The value was
      chosen to allow time for the Tx timestamps to complete without wasting so
      much time that we overrun application wait budgets in the worst case.
      
      The ice_ptp_process_ts function also currently returns false in the event
      that the Tx tracker is not initialized. This would result in the threaded
      IRQ handler never exiting if it gets started while the tracker is not
      initialized.
      
      Fix the function to appropriately return true when the tracker is not
      initialized.
      
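      The two fixes can be modelled in a few lines of userspace C. This is a sketch under stated assumptions rather than the ice driver's code: `demo_tracker`, `demo_process_ts()` and `demo_thread_fn()` are hypothetical, and the short sleep between attempts is replaced by a step that lets one more timestamp complete per pass.

```c
#include <stdbool.h>

/* Hypothetical tracker; it mirrors the description, not the ice driver's types. */
struct demo_tracker {
    bool initialized;
    int outstanding;   /* timestamps requested but not yet read */
    int ready;         /* timestamps the hardware has completed */
};

/* Returns true when nothing is left to process, false if more work remains. */
static bool demo_process_ts(struct demo_tracker *t)
{
    if (!t->initialized)
        return true;   /* nothing to wait for, so the caller's loop can end */

    t->outstanding -= t->ready;
    t->ready = 0;
    return t->outstanding == 0;
}

/* Threaded handler body: poll until done. Returns the number of passes made. */
static int demo_thread_fn(struct demo_tracker *t)
{
    int passes = 1;

    while (!demo_process_ts(t)) {
        /* A driver would sleep briefly here; this model simply lets
         * one more timestamp complete before the next pass. */
        t->ready = 1;
        passes++;
    }
    return passes;
}
```

      Returning false for an uninitialized tracker would make the `while` loop spin forever, which is why the second fix matters once the handler polls on its own.
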
      Note that this will not reproduce with default ptp4l behavior, as the
      program always synchronously waits for a timestamp response before sending
      another timestamp request.
      Reported-by: Siddaraju DH <siddaraju.dh@intel.com>
      Fixes: 1229b339 ("ice: Add low latency Tx timestamp read")
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20221118222729.1565317-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tipc: check skb_linearize() return value in tipc_disc_rcv() · cd0f6421
      YueHaibing authored
      If skb_linearize() fails in tipc_disc_rcv(), we need to free the skb instead of
      handling it.
      
      Fixes: 25b0b9c4 ("tipc: handle collisions of 32-bit node address hash values")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Link: https://lore.kernel.org/r/20221119072832.7896-1-yuehaibing@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 5916380c
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2022-11-18 (iavf)
      
      Ivan Vecera resolves issues related to reset by adding back call to
      netif_tx_stop_all_queues() and adding calls to dev_close() to ensure
      device is properly closed during reset.
      
      Stefan Assmann removes waiting for setting of MAC address as this breaks
      ARP.
      
      Slawomir adds setting of __IAVF_IN_REMOVE_TASK bit to prevent deadlock
      between remove and shutdown.
      
      * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        iavf: Fix race condition between iavf_shutdown and iavf_remove
        iavf: remove INITIAL_MAC_SET to allow gARP to work properly
        iavf: Do not restart Tx queues after reset task failure
        iavf: Fix a crash during reset task
      ====================
      
      Link: https://lore.kernel.org/r/20221118222439.1565245-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'tipc-fix-two-race-issues-in-tipc_conn_alloc' · 3349c272
      Jakub Kicinski authored
      Xin Long says:
      
      ====================
      tipc: fix two race issues in tipc_conn_alloc
      
      The race exists between tipc_topsrv_accept() and tipc_conn_close():
      one is allocating the con while the other is freeing it, and there
      is no proper lock protecting it. Therefore, a null-pointer dereference
      and a use-after-free may be triggered; see the details in each patch.
      ====================
      
      Link: https://lore.kernel.org/r/cover.1668807842.git.lucien.xin@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tipc: add an extra conn_get in tipc_conn_alloc · a7b42969
      Xin Long authored
      One extra conn_get() is needed in tipc_conn_alloc(), as after
      tipc_conn_alloc() is called, tipc_conn_close() may free this
      con before it is dereferenced in tipc_topsrv_accept():
      
         tipc_conn_alloc();
         newsk = newsock->sk;
                                       <---- tipc_conn_close();
         write_lock_bh(&sk->sk_callback_lock);
         newsk->sk_data_ready = tipc_conn_data_ready;
      
      Then a use-after-free can be triggered:
      
        BUG: KASAN: use-after-free in tipc_topsrv_accept+0x1e7/0x370 [tipc]
        Call Trace:
         <TASK>
         dump_stack_lvl+0x33/0x46
         print_report+0x178/0x4b0
         kasan_report+0x8c/0x100
         kasan_check_range+0x179/0x1e0
         tipc_topsrv_accept+0x1e7/0x370 [tipc]
         process_one_work+0x6a3/0x1030
         worker_thread+0x8a/0xdf0
      
      This patch fixes it by holding a reference in tipc_conn_alloc() and
      releasing it after all accesses in tipc_topsrv_accept(). Note that when
      doing this in tipc_topsrv_kern_subscr(), tipc_conn_rcv_sub() returns
      0 or -1 only, so we don't need to check for "> 0".
      
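      The effect of the extra reference can be shown with a small single-threaded model. It is only a sketch: `demo_conn`, `demo_conn_alloc()`, `demo_conn_close()` and `demo_conn_put()` are hypothetical names, and the plain integer counter stands in for the kernel's reference counting without any of its atomicity.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical connection object with a plain (non-atomic) reference counter. */
struct demo_conn {
    int refs;
    bool *freed;   /* lets the caller observe the final put */
};

/* Drop one reference and free the object when the last one goes away. */
static void demo_conn_put(struct demo_conn *c)
{
    if (--c->refs == 0) {
        *c->freed = true;
        free(c);
    }
}

/* Allocate with two references: one owned by the lookup table,
 * one held on behalf of the caller. */
static struct demo_conn *demo_conn_alloc(bool *freed)
{
    struct demo_conn *c = malloc(sizeof(*c));

    if (!c)
        return NULL;
    c->refs = 2;
    c->freed = freed;
    *freed = false;
    return c;
}

/* Close drops only the table's reference. */
static void demo_conn_close(struct demo_conn *c)
{
    demo_conn_put(c);
}
```

      With a single initial reference, a close that runs right after the allocation would free the object while the accepting path still uses it. With the extra reference, the object survives until the caller's own `demo_conn_put()`.
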
      Fixes: c5fa7b3c ("tipc: introduce new TIPC server infrastructure")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tipc: set con sock in tipc_conn_alloc · 0e5d56c6
      Xin Long authored
      A crash was reported by Wei Chen:
      
        BUG: kernel NULL pointer dereference, address: 0000000000000018
        RIP: 0010:tipc_conn_close+0x12/0x100
        Call Trace:
         tipc_topsrv_exit_net+0x139/0x320
         ops_exit_list.isra.9+0x49/0x80
         cleanup_net+0x31a/0x540
         process_one_work+0x3fa/0x9f0
         worker_thread+0x42/0x5c0
      
      It was caused by !con->sock in tipc_conn_close(). In tipc_topsrv_accept(),
      con is allocated in conn_idr then its sock is set:
      
        con = tipc_conn_alloc();
        ...                    <----[1]
        con->sock = newsock;
      
      If tipc_conn_close() is called at any time during [1], the null-pointer
      dereference is triggered by con->sock->sk because con->sock is not yet set.
      
      This patch fixes it by moving the con->sock setting to tipc_conn_alloc()
      under s->idr_lock, so that con->sock can never be NULL when getting the
      con from s->conn_idr. It is also safer to move the con->server and
      CF_CONNECTED flag settings under s->idr_lock, as they should all be set
      before tipc_conn_alloc() is called.
      
      Fixes: c5fa7b3c ("tipc: introduce new TIPC server infrastructure")
      Reported-by: Wei Chen <harperchen1110@gmail.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: phy: at803x: fix error return code in at803x_probe() · 1f0dd412
      Wei Yongjun authored
      Fix to return a negative error code from the ccr read error handling
      case instead of 0, as done elsewhere in this function.
      
      Fixes: 3265f421 ("net: phy: at803x: add fiber support")
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20221118103635.254256-1-weiyongjun@huaweicloud.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/mlx5e: Fix possible race condition in macsec extended packet number update routine · 8514e325
      Emeel Hakim authored
      Currently the extended packet number (EPN) update routine accesses the
      macsec object without holding the general macsec lock, so a race
      condition is possible when an EPN update occurs while the SA is being
      updated or deleted.
      Fix by holding the general macsec lock before accessing the object.
      
      Fixes: 4411a6c0 ("net/mlx5e: Support MACsec offload extended packet number (EPN)")
      Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix MACsec update SecY · 94ffd6e0
      Emeel Hakim authored
      Currently, updating SecY destroys and re-creates the RX SA objects.
      The re-created RX SA objects are not identical to the destroyed
      objects: they disagree on the encryption enabled property, which
      holds the value false after re-creation. This value is not
      supported with offload, which leads to no traffic after an update.
      Fix by re-creating identical objects.
      
      Fixes: 5a39816a ("net/mlx5e: Add MACsec offload SecY support")
      Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
      Reviewed-by: Raed Salem <raeds@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix MACsec SA initialization routine · d20a56b0
      Emeel Hakim authored
      Currently, as part of the MACsec SA initialization routine, the
      extended packet number (EPN) object attribute is always
      set without checking whether EPN is actually enabled,
      which could lead to a NULL dereference.
      Fix by adding such a check.
      
      Fixes: 4411a6c0 ("net/mlx5e: Support MACsec offload extended packet number (EPN)")
      Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
      Reviewed-by: Raed Salem <raeds@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Remove leftovers from old XSK queues enumeration · 11abca03
      Tariq Toukan authored
      Before the cited commit, for N channels, a dedicated set of N queues was
      created to support XSK, in indices [N, 2N-1], doubling the number of
      queues.
      
      In addition, changing the number of channels was prohibited, as it would
      shift the indices.
      
      Remove these two leftovers, as we moved XSK to a new queueing scheme,
      starting from index 0.
      
      Fixes: 3db4c85c ("net/mlx5e: xsk: Use queue indices starting from 0 for XSK queues")
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Gal Pressman <gal@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Offload rule only when all encaps are valid · f3774220
      Chris Mi authored
      The cited commit adds a for loop to support multiple encapsulations.
      But it only checks if the last encap is valid.
      
      Fix it by setting the slow path flag when any of the encaps is invalid.
      
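      The difference between checking only the last entry and checking every entry can be sketched as follows. Both functions are hypothetical models of the loop shape described above, not the mlx5e code, and `encap_valid` is an assumed array of per-destination validity results.

```c
#include <stdbool.h>
#include <stddef.h>

/* Buggy shape: the flag is overwritten on each pass, so only the last entry counts. */
static bool slow_path_last_only(const bool *encap_valid, size_t n)
{
    bool slow_path = false;

    for (size_t i = 0; i < n; i++)
        slow_path = !encap_valid[i];
    return slow_path;
}

/* Fixed shape: the flag is sticky once any entry is invalid. */
static bool slow_path_any_invalid(const bool *encap_valid, size_t n)
{
    bool slow_path = false;

    for (size_t i = 0; i < n; i++)
        if (!encap_valid[i])
            slow_path = true;
    return slow_path;
}
```

      For the input { invalid, valid }, the first function reports that the rule can be offloaded while the second keeps it on the slow path.
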
      Fixes: f493f155 ("net/mlx5e: Move flow attr reformat action bit to per dest flags")
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix missing alignment in size of MTT/KLM entries · 3e874cb1
      Tariq Toukan authored
      In the cited patch, an alignment required by the HW spec was mistakenly
      dropped. Bring it back to fix error completions like the below:
      
      mlx5_core 0000:00:08.0 eth2: Error cqe on cqn 0x40b, ci 0x0, qn 0x104f, opcode 0xd, syndrome 0x2, vendor syndrome 0x68
      00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      00000030: 00 00 00 00 86 00 68 02 25 00 10 4f 00 00 bb d2
      WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0x0, len: 192
      00000000: 00 00 00 25 00 10 4f 0c 00 00 00 00 00 18 2e 00
      00000010: 90 00 00 00 00 02 00 00 00 00 00 00 20 00 00 00
      00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      00000080: 08 00 00 00 48 6a 00 02 08 00 00 00 0e 10 00 02
      00000090: 08 00 00 00 0c db 00 02 08 00 00 00 0e 82 00 02
      000000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      
      Fixes: 9f123f74 ("net/mlx5e: Improve MTT/KSM alignment")
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Gal Pressman <gal@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Fix sync reset event handler error flow · e1ad07b9
      Moshe Shemesh authored
      When sync reset now event handling fails on mlx5_pci_link_toggle() then
      no reset was done. However, since mlx5_cmd_fast_teardown_hca() was
      already done, the firmware function is closed and the driver is left
      without firmware functionality.
      
      Fix it by setting the device error state and reopening the firmware resources.
      Reopening is done by the thread that was called for devlink reload
      fw_activate as it already holds the devlink lock.
      
      Fixes: 5ec69744 ("net/mlx5: Add support for devlink reload action fw activate")
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Aya Levin <ayal@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: E-Switch, Set correctly vport destination · 6d942e40
      Roi Dayan authored
      The cited commit moved from using the reformat_id integer to the
      packet_reformat pointer, which introduced the possibility of a null
      pointer dereference.
      When the packet reformat flag is set, the pkt_reformat pointer must
      exist, so checking MLX5_ESW_DEST_ENCAP is not enough; we need
      to make sure pkt_reformat is valid by checking MLX5_ESW_DEST_ENCAP_VALID.
      If the dest encap valid flag does not exist, then pkt_reformat can be
      either an invalid address or null.
      Also, to make sure we don't try to access an invalid pkt_reformat, set it
      to null when invalidated, and invalidate it before calling the add flow
      code, as this is logically more correct and safer.
      
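      The flag-plus-pointer rule can be illustrated with a short model. Everything here is hypothetical: `DEMO_DEST_ENCAP` and `DEMO_DEST_ENCAP_VALID` stand in for the two flags named above with assumed bit values, and `demo_dest` is not the driver's destination structure.

```c
#include <stddef.h>

#define DEMO_DEST_ENCAP        (1u << 0)   /* hypothetical bit values */
#define DEMO_DEST_ENCAP_VALID  (1u << 1)

struct demo_dest {
    unsigned int flags;
    const int *pkt_reformat;   /* may be NULL or stale unless VALID is set */
};

/* Returns the reformat pointer to use, or NULL when it must not be touched. */
static const int *demo_reformat_for(const struct demo_dest *d)
{
    if ((d->flags & DEMO_DEST_ENCAP) && (d->flags & DEMO_DEST_ENCAP_VALID))
        return d->pkt_reformat;
    return NULL;
}

/* Invalidate: clear the valid flag and the pointer together. */
static void demo_invalidate(struct demo_dest *d)
{
    d->flags &= ~DEMO_DEST_ENCAP_VALID;
    d->pkt_reformat = NULL;
}
```

      Clearing the pointer together with the flag means that a caller which checks only the encap flag reads NULL instead of a stale address.
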
      Fixes: 2b688ea5 ("net/mlx5: Add flow steering actions to fs_cmd shim layer")
      Signed-off-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Chris Mi <cmi@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Lag, avoid lockdep warnings · 0d4e8ed1
      Eli Cohen authored
      ldev->lock is used to serialize lag change operations. Since multiport
      eswitch functionality was added, we now change the mode dynamically.
      However, acquiring ldev->lock is not allowed as it could possibly lead
      to a deadlock as reported by the lockdep mechanism.
      
      [  836.154963] WARNING: possible circular locking dependency detected
      [  836.155850] 5.19.0-rc5_net_56b7df2 #1 Not tainted
      [  836.156549] ------------------------------------------------------
      [  836.157418] handler1/12198 is trying to acquire lock:
      [  836.158178] ffff888187d52b58 (&ldev->lock){+.+.}-{3:3}, at: mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core]
      [  836.159575]
      [  836.159575] but task is already holding lock:
      [  836.160474] ffff8881d4de2930 (&block->cb_lock){++++}-{3:3}, at: tc_setup_cb_add+0x5b/0x200
      [  836.161669] which lock already depends on the new lock.
      [  836.162905]
      [  836.162905] the existing dependency chain (in reverse order) is:
      [  836.164008] -> #3 (&block->cb_lock){++++}-{3:3}:
      [  836.164946]        down_write+0x25/0x60
      [  836.165548]        tcf_block_get_ext+0x1c6/0x5d0
      [  836.166253]        ingress_init+0x74/0xa0 [sch_ingress]
      [  836.167028]        qdisc_create.constprop.0+0x130/0x5e0
      [  836.167805]        tc_modify_qdisc+0x481/0x9f0
      [  836.168490]        rtnetlink_rcv_msg+0x16e/0x5a0
      [  836.169189]        netlink_rcv_skb+0x4e/0xf0
      [  836.169861]        netlink_unicast+0x190/0x250
      [  836.170543]        netlink_sendmsg+0x243/0x4b0
      [  836.171226]        sock_sendmsg+0x33/0x40
      [  836.171860]        ____sys_sendmsg+0x1d1/0x1f0
      [  836.172535]        ___sys_sendmsg+0xab/0xf0
      [  836.173183]        __sys_sendmsg+0x51/0x90
      [  836.173836]        do_syscall_64+0x3d/0x90
      [  836.174471]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [  836.175282]
      
      [  836.175282] -> #2 (rtnl_mutex){+.+.}-{3:3}:
      [  836.176190]        __mutex_lock+0x6b/0xf80
      [  836.176830]        register_netdevice_notifier+0x21/0x120
      [  836.177631]        rtnetlink_init+0x2d/0x1e9
      [  836.178289]        netlink_proto_init+0x163/0x179
      [  836.178994]        do_one_initcall+0x63/0x300
      [  836.179672]        kernel_init_freeable+0x2cb/0x31b
      [  836.180403]        kernel_init+0x17/0x140
      [  836.181035]        ret_from_fork+0x1f/0x30
      
      [  836.181687] -> #1 (pernet_ops_rwsem){+.+.}-{3:3}:
      [  836.182628]        down_write+0x25/0x60
      [  836.183235]        unregister_netdevice_notifier+0x1c/0xb0
      [  836.184029]        mlx5_ib_roce_cleanup+0x94/0x120 [mlx5_ib]
      [  836.184855]        __mlx5_ib_remove+0x35/0x60 [mlx5_ib]
      [  836.185637]        mlx5_eswitch_unregister_vport_reps+0x22f/0x440 [mlx5_core]
      [  836.186698]        auxiliary_bus_remove+0x18/0x30
      [  836.187409]        device_release_driver_internal+0x1f6/0x270
      [  836.188253]        bus_remove_device+0xef/0x160
      [  836.188939]        device_del+0x18b/0x3f0
      [  836.189562]        mlx5_rescan_drivers_locked+0xd6/0x2d0 [mlx5_core]
      [  836.190516]        mlx5_lag_remove_devices+0x69/0xe0 [mlx5_core]
      [  836.191414]        mlx5_do_bond_work+0x441/0x620 [mlx5_core]
      [  836.192278]        process_one_work+0x25c/0x590
      [  836.192963]        worker_thread+0x4f/0x3d0
      [  836.193609]        kthread+0xcb/0xf0
      [  836.194189]        ret_from_fork+0x1f/0x30
      
      [  836.194826] -> #0 (&ldev->lock){+.+.}-{3:3}:
      [  836.195734]        __lock_acquire+0x15b8/0x2a10
      [  836.196426]        lock_acquire+0xce/0x2d0
      [  836.197057]        __mutex_lock+0x6b/0xf80
      [  836.197708]        mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core]
      [  836.198575]        tc_act_parse_mirred+0x25b/0x800 [mlx5_core]
      [  836.199467]        parse_tc_actions+0x168/0x5a0 [mlx5_core]
      [  836.200340]        __mlx5e_add_fdb_flow+0x263/0x480 [mlx5_core]
      [  836.201241]        mlx5e_configure_flower+0x8a0/0x1820 [mlx5_core]
      [  836.202187]        tc_setup_cb_add+0xd7/0x200
      [  836.202856]        fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
      [  836.203739]        fl_change+0xbbe/0x1730 [cls_flower]
      [  836.204501]        tc_new_tfilter+0x407/0xd90
      [  836.205168]        rtnetlink_rcv_msg+0x406/0x5a0
      [  836.205877]        netlink_rcv_skb+0x4e/0xf0
      [  836.206535]        netlink_unicast+0x190/0x250
      [  836.207217]        netlink_sendmsg+0x243/0x4b0
      [  836.207915]        sock_sendmsg+0x33/0x40
      [  836.208538]        ____sys_sendmsg+0x1d1/0x1f0
      [  836.209219]        ___sys_sendmsg+0xab/0xf0
      [  836.209878]        __sys_sendmsg+0x51/0x90
      [  836.210510]        do_syscall_64+0x3d/0x90
      [  836.211137]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      [  836.211954] other info that might help us debug this:
      [  836.213174] Chain exists of:
      [  836.213174]   &ldev->lock --> rtnl_mutex --> &block->cb_lock
      [  836.214650]  Possible unsafe locking scenario:
      [  836.214650]
      [  836.215574]        CPU0                    CPU1
      [  836.216255]        ----                    ----
      [  836.216943]   lock(&block->cb_lock);
      [  836.217518]                                lock(rtnl_mutex);
      [  836.218348]                                lock(&block->cb_lock);
      [  836.219212]   lock(&ldev->lock);
      [  836.219758]
      [  836.219758]  *** DEADLOCK ***
      [  836.219758]
      [  836.220747] 2 locks held by handler1/12198:
      [  836.221390]  #0: ffff8881d4de2930 (&block->cb_lock){++++}-{3:3}, at: tc_setup_cb_add+0x5b/0x200
      [  836.222646]  #1: ffff88810c9a92c0 (&esw->mode_lock){++++}-{3:3}, at: mlx5_esw_hold+0x39/0x50 [mlx5_core]
      
      [  836.224063] stack backtrace:
      [  836.224799] CPU: 6 PID: 12198 Comm: handler1 Not tainted 5.19.0-rc5_net_56b7df2 #1
      [  836.225923] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [  836.227476] Call Trace:
      [  836.227929]  <TASK>
      [  836.228332]  dump_stack_lvl+0x57/0x7d
      [  836.228924]  check_noncircular+0x104/0x120
      [  836.229562]  __lock_acquire+0x15b8/0x2a10
      [  836.230201]  lock_acquire+0xce/0x2d0
      [  836.230776]  ? mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core]
      [  836.231614]  ? find_held_lock+0x2b/0x80
      [  836.232221]  __mutex_lock+0x6b/0xf80
      [  836.232799]  ? mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core]
      [  836.233636]  ? mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core]
      [  836.234451]  ? xa_load+0xc3/0x190
      [  836.234995]  mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core]
      [  836.235803]  tc_act_parse_mirred+0x25b/0x800 [mlx5_core]
      [  836.236636]  ? tc_act_can_offload_mirred+0x135/0x210 [mlx5_core]
      [  836.237550]  parse_tc_actions+0x168/0x5a0 [mlx5_core]
      [  836.238364]  __mlx5e_add_fdb_flow+0x263/0x480 [mlx5_core]
      [  836.239202]  mlx5e_configure_flower+0x8a0/0x1820 [mlx5_core]
      [  836.240076]  ? lock_acquire+0xce/0x2d0
      [  836.240668]  ? tc_setup_cb_add+0x5b/0x200
      [  836.241294]  tc_setup_cb_add+0xd7/0x200
      [  836.241917]  fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
      [  836.242709]  fl_change+0xbbe/0x1730 [cls_flower]
      [  836.243408]  tc_new_tfilter+0x407/0xd90
      [  836.244043]  ? tc_del_tfilter+0x880/0x880
      [  836.244672]  rtnetlink_rcv_msg+0x406/0x5a0
      [  836.245310]  ? netlink_deliver_tap+0x7a/0x4b0
      [  836.245991]  ? if_nlmsg_stats_size+0x2b0/0x2b0
      [  836.246675]  netlink_rcv_skb+0x4e/0xf0
      [  836.258046]  netlink_unicast+0x190/0x250
      [  836.258669]  netlink_sendmsg+0x243/0x4b0
      [  836.259288]  sock_sendmsg+0x33/0x40
      [  836.259857]  ____sys_sendmsg+0x1d1/0x1f0
      [  836.260473]  ___sys_sendmsg+0xab/0xf0
      [  836.261064]  ? lock_acquire+0xce/0x2d0
      [  836.261669]  ? find_held_lock+0x2b/0x80
      [  836.262272]  ? __fget_files+0xb9/0x190
      [  836.262871]  ? __fget_files+0xd3/0x190
      [  836.263462]  __sys_sendmsg+0x51/0x90
      [  836.264064]  do_syscall_64+0x3d/0x90
      [  836.264652]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [  836.265425] RIP: 0033:0x7fdbe5e2677d
      
      [  836.266012] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 ba ee
      ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f
      05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 ee ee ff ff 48
      [  836.268485] RSP: 002b:00007fdbe48a75a0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
      [  836.269598] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fdbe5e2677d
      [  836.270576] RDX: 0000000000000000 RSI: 00007fdbe48a7640 RDI: 000000000000003c
      [  836.271565] RBP: 00007fdbe48a8368 R08: 0000000000000000 R09: 0000000000000000
      [  836.272546] R10: 00007fdbe48a84b0 R11: 0000000000000293 R12: 0000557bd17dc860
      [  836.273527] R13: 0000000000000000 R14: 0000557bd17dc860 R15: 00007fdbe48a7640
      
      [  836.274521]  </TASK>
      
      To avoid changing the mode while holding ldev->lock in the configure flow,
      we queue a work item on the lag workqueue and wait on a completion object.
      
      In addition, we remove the lock from mlx5_lag_do_mirred() since it is
      not really protecting anything.
      
      It should be noted that an actual deadlock has not been observed.
      Signed-off-by: Eli Cohen <elic@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>