1. 29 Mar, 2018 37 commits
  2. 28 Mar, 2018 3 commits
    • net/mlx5e: Recover Send Queue (SQ) from error state · db75373c
      Eran Ben Elisha authored
      An error TX completion (CQE) which arrived on a specific SQ indicates
      that this SQ got moved by the hardware to error state, which means all
      pending and incoming TX requests are dropped or will be dropped and no
      further "Good" CQEs will be generated for that SQ.
      
      Before this patch, TX error completions (CQEs) were not monitored and
      were handled as regular CQEs. This left the SQ in the error state,
      making it useless for transmitting new packets.
      
      Mitigation plan:
      In case of an error completion, schedule a recovery work which would do
      the following:
      - Mark the TXQ as DRV_XOFF to stop new packets from arriving from the
        stack.
      - Let NAPI flush all pending SQ WQEs (via the flush_in_error_en bit) to
        release SW and HW resources (SKB, DMA, etc.) and to sync the SQ and CQ
        consumer/producer indices.
      - Modify the SQ state ERR -> RST -> RDY (restart the SQ).
      - Reactivate the SQ and reset SQ cc and pc
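
      The recovery steps above can be sketched as a small user-space model.
      This is an illustration only, not the driver code: the struct, enum and
      function names are hypothetical, and the flush is modelled simply as the
      consumer counter catching up with the producer counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a send queue; names are illustrative only. */
enum sq_state { SQ_RST, SQ_RDY, SQ_ERR };

struct sq_model {
    enum sq_state state;
    bool txq_stopped;   /* models DRV_XOFF on the TXQ */
    uint16_t pc;        /* producer counter */
    uint16_t cc;        /* consumer counter */
};

/* Runs the recovery sequence; returns true if the SQ was restarted. */
static bool sq_model_recover(struct sq_model *sq)
{
    if (sq->state != SQ_ERR)
        return false;           /* nothing to recover */

    sq->txq_stopped = true;     /* stop new packets from the stack */
    sq->cc = sq->pc;            /* flush: all pending WQEs are completed */

    sq->state = SQ_RST;         /* ERR -> RST */
    sq->state = SQ_RDY;         /* RST -> RDY */

    sq->pc = 0;                 /* reset producer and consumer counters */
    sq->cc = 0;
    sq->txq_stopped = false;    /* reactivate the queue */
    return true;
}
```

      In this model, calling sq_model_recover() on a queue in SQ_ERR leaves
      it in SQ_RDY with both counters at zero, and a second call is a no-op.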
      
      If two consecutive SQ recovery requests arrive less than 500 msecs
      apart, drop the later request to avoid CPU overload, as this scenario
      is most likely caused by a severe, repeating bug.
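
      A minimal sketch of the 500 msec guard, written as user-space C rather
      than driver code. The limiter struct and function name are hypothetical.
      The sketch assumes a monotonic millisecond clock and assumes that only
      accepted requests update the timestamp.

```c
#include <stdbool.h>
#include <stdint.h>

#define RECOVER_MIN_INTERVAL_MS 500 /* interval taken from the commit message */

/* Hypothetical per-SQ limiter state. */
struct recover_limiter {
    bool     has_last;  /* false until the first accepted request */
    uint64_t last_ms;   /* time of the last accepted request */
};

/*
 * Returns true if a recovery request at now_ms should run, false if it
 * should be dropped because the previous accepted request was too recent.
 * Assumes now_ms never goes backwards.
 */
static bool recover_allowed(struct recover_limiter *rl, uint64_t now_ms)
{
    if (rl->has_last && now_ms - rl->last_ms < RECOVER_MIN_INTERVAL_MS)
        return false;   /* too soon: drop the request */

    rl->has_last = true;
    rl->last_ms = now_ms;
    return true;
}
```

      With a zero-initialized limiter, a request at t=1000 is accepted,
      requests at t=1200 and t=1499 are dropped, and one at t=1500 is
      accepted again.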
      
      In addition, add SQ recover SW counter to monitor successful recoveries.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Dump xmit error completions · 16cc14d8
      Eran Ben Elisha authored
      Monitor and dump xmit error completions. In addition, add an err_cqe
      counter to track the number of error completions per send queue.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • mlx5: Move dump error CQE function out of mlx5_ib for code sharing · 1acae6b0
      Eran Ben Elisha authored
      Move the mlx5_ib dump error CQE implementation to the mlx5 CQ header
      file so that a downstream mlx5e patch can use it.
      
      In addition, use print_hex_dump instead of dumping the buffer manually.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>