Commit a4bd8a5a authored by Guilherme G. Piccoli, committed by Greg Kroah-Hartman

bnx2x: Improve reliability in case of nested PCI errors


[ Upstream commit f7084059 ]

During recovery from a PCI error (called EEH on the PowerPC arch),
another PCI transaction can be corrupted, leading to nested PCI
errors. This scenario can also be reproduced with error injection
mechanisms (for debugging purposes).

We observed that, in the case of nested PCI errors, bnx2x might attempt
to initialize its shmem and crash the kernel because of bad addresses
read from the MCP. Several different stack traces were observed,
depending on the point at which the second PCI error happens.

This patch avoids the crashes by:

 * failing PCI recovery in case of nested errors (since multiple
 PCI errors in a row are not expected to lead to a functional
 adapter anyway), and by,

 * preventing access to adapter FW when MCP is failed (we mark it as
 failed when shmem cannot get initialized properly).

Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Acked-by: Shahed Shaikh <Shahed.Shaikh@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
parent d1f8a5e0
@@ -3052,7 +3052,7 @@ int bnx2x_nic_unload(struct bnx2x *bp, int unload_mode, bool keep_link)
 	del_timer_sync(&bp->timer);
-	if (IS_PF(bp)) {
+	if (IS_PF(bp) && !BP_NOMCP(bp)) {
 		/* Set ALWAYS_ALIVE bit in shmem */
 		bp->fw_drv_pulse_wr_seq |= DRV_PULSE_ALWAYS_ALIVE;
 		bnx2x_drv_pulse(bp);
@@ -3134,7 +3134,7 @@ int bnx2x_nic_unload(struct bnx2x *bp, int unload_mode, bool keep_link)
 	bp->cnic_loaded = false;
 	/* Clear driver version indication in shmem */
-	if (IS_PF(bp))
+	if (IS_PF(bp) && !BP_NOMCP(bp))
 		bnx2x_update_mng_version(bp);
 	/* Check if there are pending parity attentions. If there are - set
......
@@ -9570,6 +9570,15 @@ static int bnx2x_init_shmem(struct bnx2x *bp)
 	do {
 		bp->common.shmem_base = REG_RD(bp, MISC_REG_SHARED_MEM_ADDR);
+		/* If we read all 0xFFs, means we are in PCI error state and
+		 * should bail out to avoid crashes on adapter's FW reads.
+		 */
+		if (bp->common.shmem_base == 0xFFFFFFFF) {
+			bp->flags |= NO_MCP_FLAG;
+			return -ENODEV;
+		}
 		if (bp->common.shmem_base) {
 			val = SHMEM_RD(bp, validity_map[BP_PORT(bp)]);
 			if (val & SHR_MEM_VALIDITY_MB)
@@ -14214,7 +14223,10 @@ static pci_ers_result_t bnx2x_io_slot_reset(struct pci_dev *pdev)
 		BNX2X_ERR("IO slot reset --> driver unload\n");
 		/* MCP should have been reset; Need to wait for validity */
-		bnx2x_init_shmem(bp);
+		if (bnx2x_init_shmem(bp)) {
+			rtnl_unlock();
+			return PCI_ERS_RESULT_DISCONNECT;
+		}
 		if (IS_PF(bp) && SHMEM2_HAS(bp, drv_capabilities_flag)) {
 			u32 v;
......
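
The two measures listed in the commit message can be modeled outside the
kernel. The sketch below is a user-space illustration only, not driver
code: struct model_adapter, the MODEL_* constants and every model_*
function are hypothetical stand-ins for struct bnx2x, NO_MCP_FLAG,
BP_NOMCP(), bnx2x_init_shmem(), bnx2x_nic_unload() and
bnx2x_io_slot_reset(). The register read is simulated by a plain field,
and the validity polling loop of the real bnx2x_init_shmem() is left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not the driver's real definitions. */
#define MODEL_NO_MCP_FLAG 0x1u
#define MODEL_ALL_ONES    0xFFFFFFFFu

enum model_ers_result { MODEL_RECOVERED, MODEL_DISCONNECT };

struct model_adapter {
	uint32_t flags;       /* MODEL_NO_MCP_FLAG is set once the MCP is marked failed */
	uint32_t shmem_base;  /* last value read from the simulated register */
	uint32_t reg_value;   /* value the simulated register read returns */
	int fw_accesses;      /* counts simulated firmware (shmem) accesses */
};

/* Stand-in for BP_NOMCP(): true once the MCP has been marked failed. */
static bool model_nomcp(const struct model_adapter *a)
{
	return (a->flags & MODEL_NO_MCP_FLAG) != 0;
}

/* Models the check added to bnx2x_init_shmem(): an all-ones read is taken
 * to mean the device is in a PCI error state, so the MCP is marked failed
 * and the caller gets -ENODEV instead of a bogus shmem address.
 */
static int model_init_shmem(struct model_adapter *a)
{
	a->shmem_base = a->reg_value;
	if (a->shmem_base == MODEL_ALL_ONES) {
		a->flags |= MODEL_NO_MCP_FLAG;
		return -ENODEV;
	}
	return 0;
}

/* Models the guards added to bnx2x_nic_unload(): firmware is only
 * touched while the MCP is not marked failed.
 */
static void model_nic_unload(struct model_adapter *a)
{
	if (!model_nomcp(a))
		a->fw_accesses++;
}

/* Models the change to bnx2x_io_slot_reset(): a failed shmem init makes
 * the recovery fail instead of continuing with bad addresses.
 */
static enum model_ers_result model_io_slot_reset(struct model_adapter *a)
{
	if (model_init_shmem(a))
		return MODEL_DISCONNECT;
	return MODEL_RECOVERED;
}
```

In this model, an adapter whose reg_value is 0xFFFFFFFF makes
model_io_slot_reset() return MODEL_DISCONNECT, and a later
model_nic_unload() leaves fw_accesses unchanged. Any other reg_value lets
recovery proceed and the unload path reach the firmware as before.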