1. 26 Oct, 2021 30 commits
  2. 25 Oct, 2021 10 commits
    • Parav Pandit's avatar
      net/mlx5: SF_DEV Add SF device trace points · d67ab0a8
      Parav Pandit authored
      Add SF device add and delete specific trace points.
      
      echo mlx5:mlx5_sf_dev_add >> /sys/kernel/debug/tracing/set_event
      echo mlx5:mlx5_sf_dev_del >> /sys/kernel/debug/tracing/set_event
      echo mlx5:mlx5_sf_vhca_event >> /sys/kernel/debug/tracing/set_event
      Signed-off-by: default avatarParav Pandit <parav@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      d67ab0a8
    • Parav Pandit's avatar
      net/mlx5: SF, Add SF trace points · b3ccada6
      Parav Pandit authored
      Add support for trace events for SFs to improve debugging.
      This covers
      (a) port add and free trace points
      (b) device level trace points
      (c) SF hardware context add, free trace points.
      (d) SF function activate/deacticate and state trace points
      
      SF events examples:
      echo mlx5:mlx5_sf_add >> /sys/kernel/debug/tracing/set_event
      echo mlx5:mlx5_sf_free >> /sys/kernel/debug/tracing/set_event
      echo mlx5:mlx5_sf_hwc_alloc >> /sys/kernel/debug/tracing/set_event
      echo mlx5:mlx5_sf_hwc_free >> /sys/kernel/debug/tracing/set_event
      echo mlx5:mlx5_sf_hwc_deferred_free >> /sys/kernel/debug/tracing/set_event
      echo mlx5:mlx5_sf_update_state >> /sys/kernel/debug/tracing/set_event
      echo mlx5:mlx5_sf_activate >> /sys/kernel/debug/tracing/set_event
      echo mlx5:mlx5_sf_deactivate >> /sys/kernel/debug/tracing/set_event
      Signed-off-by: default avatarParav Pandit <parav@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      b3ccada6
    • Shay Drory's avatar
      net/mlx5: Let user configure max_macs param · 55460406
      Shay Drory authored
      Currently, max_macs is taking 70Kbytes of memory per function. This
      size is not needed in all use cases, and is critical with large scale.
      Hence, allow user to configure the number of max_macs.
      
      For example, to reduce the number of max_macs to 1, execute::
      $ devlink dev param set pci/0000:00:0b.0 name max_macs value 1 \
                    cmode driverinit
      $ devlink dev reload pci/0000:00:0b.0
      Signed-off-by: default avatarShay Drory <shayd@nvidia.com>
      Reviewed-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Reviewed-by: default avatarParav Pandit <parav@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      55460406
    • Shay Drory's avatar
      net/mlx5: Let user configure event_eq_size param · a6cb08da
      Shay Drory authored
      Event EQ is an EQ which received the notification of almost all the
      events generated by the NIC.
      Currently, each event EQ is taking 512KB of memory. This size is not
      needed in most use cases, and is critical with large scale. Hence,
      allow user to configure the size of the event EQ.
      
      For example to reduce event EQ size to 64, execute::
      $ devlink resource set pci/0000:00:0b.0 path /event_eq_size/ size 64
      $ devlink dev reload pci/0000:00:0b.0
      Signed-off-by: default avatarShay Drory <shayd@nvidia.com>
      Reviewed-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Reviewed-by: default avatarParav Pandit <parav@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      a6cb08da
    • Shay Drory's avatar
      net/mlx5: Let user configure io_eq_size param · 46ae40b9
      Shay Drory authored
      Currently, each I/O EQ is taking 128KB of memory. This size
      is not needed in all use cases, and is critical with large scale.
      Hence, allow user to configure the size of I/O EQs.
      
      For example, to reduce I/O EQ size to 64, execute:
      $ devlink resource set pci/0000:00:0b.0 path /io_eq_size/ size 64
      $ devlink dev reload pci/0000:00:0b.0
      Signed-off-by: default avatarShay Drory <shayd@nvidia.com>
      Reviewed-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Reviewed-by: default avatarParav Pandit <parav@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      46ae40b9
    • Vlad Buslov's avatar
      net/mlx5: Bridge, support replacing existing FDB entry · 3518c83f
      Vlad Buslov authored
      The SWITCHDEV_FDB_ADD_TO_DEVICE is used for both adding new and replacing
      existing entry. Implement support for replacing existing FDB entries in
      mlx5 offload code.
      Signed-off-by: default avatarVlad Buslov <vladbu@nvidia.com>
      Reviewed-by: default avatarPaul Blakey <paulb@nvidia.com>
      Reviewed-by: default avatarRoi Dayan <roid@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      3518c83f
    • Vlad Buslov's avatar
      net/mlx5: Bridge, extract code to lookup and del/notify entry · 2deda2f1
      Vlad Buslov authored
      Following two patterns in bridge code are used in multiple places where
      similar code is duplicated:
      
      - Lookup FDB entry from hashtable by address+vid pair.
      
      - Notify software bridge and then delete existing FDB entry.
      
      In order to improve code quality and prepare for following patch series
      that also uses described patterns, extract the codes to dedicated helper
      functions.
      
      This commit doesn't change functionality.
      Signed-off-by: default avatarVlad Buslov <vladbu@nvidia.com>
      Reviewed-by: default avatarPaul Blakey <paulb@nvidia.com>
      Reviewed-by: default avatarRoi Dayan <roid@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      2deda2f1
    • Aya Levin's avatar
      net/mlx5: Add periodic update of host time to firmware · 5a1023de
      Aya Levin authored
      Firmware logs its asserts also to non-volatile memory. In order to
      reduce drift between the NIC and the host, the driver sets the host
      epoch-time to the firmware every hour.
      Signed-off-by: default avatarAya Levin <ayal@nvidia.com>
      Reviewed-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      5a1023de
    • Aya Levin's avatar
      net/mlx5: Print health buffer by log level · b87ef75c
      Aya Levin authored
      Add log macro which gets log level as a parameter. Use the severity
      read from the health buffer and the new log macro to log the health buffer
      with severity as log level.  Prior to this patch, health buffer was
      printed in error log level regardless of its severity. Now the user may
      filter dmesg (--level) or change kernel log level to focus on different
      severity levels of firmware errors.
      Signed-off-by: default avatarAya Levin <ayal@nvidia.com>
      Reviewed-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      b87ef75c
    • Aya Levin's avatar
      net/mlx5: Extend health buffer dump · cb464ba5
      Aya Levin authored
      Enhance health buffer to include:
       - assert_var5: expose the 6'th assert variable.
       - time: error's time-stamp in seconds (epoch time).
       - rfr: Recovery Flow Requiered. When set, indicates that the error
              cannot be recovered without flow involving reset.
       - severity: error's severity value, ranging from emergency to debug.
      Expose them in the health buffer dump (dmesg and devlink fw reporter).
      
      Health buffer in dmesg:
      mlx5_core 0000:08:00.0: print_health_info:425:(pid 912): Health issue observed, firmware internal error, severity(3) ERROR:
      mlx5_core 0000:08:00.0: print_health_info:429:(pid 912): assert_var[0] 0x08040700
      mlx5_core 0000:08:00.0: print_health_info:429:(pid 912): assert_var[1] 0x00000000
      mlx5_core 0000:08:00.0: print_health_info:429:(pid 912): assert_var[2] 0x00000000
      mlx5_core 0000:08:00.0: print_health_info:429:(pid 912): assert_var[3] 0x00000000
      mlx5_core 0000:08:00.0: print_health_info:429:(pid 912): assert_var[4] 0x00000000
      mlx5_core 0000:08:00.0: print_health_info:429:(pid 912): assert_var[5] 0x00000000
      mlx5_core 0000:08:00.0: print_health_info:432:(pid 912): assert_exit_ptr 0x00aaf800
      mlx5_core 0000:08:00.0: print_health_info:434:(pid 912): assert_callra 0x00aaf70c
      mlx5_core 0000:08:00.0: print_health_info:436:(pid 912): fw_ver 16.32.492
      mlx5_core 0000:08:00.0: print_health_info:437:(pid 912): time 1634819758
      mlx5_core 0000:08:00.0: print_health_info:438:(pid 912): hw_id 0x0000020d
      mlx5_core 0000:08:00.0: print_health_info:439:(pid 912): rfr 0
      mlx5_core 0000:08:00.0: print_health_info:440:(pid 912): severity 3 (ERROR)
      mlx5_core 0000:08:00.0: print_health_info:441:(pid 912): irisc_index 9
      mlx5_core 0000:08:00.0: print_health_info:442:(pid 912): synd 0x1: firmware internal error
      mlx5_core 0000:08:00.0: print_health_info:444:(pid 912): ext_synd 0x802b
      mlx5_core 0000:08:00.0: print_health_info:445:(pid 912): raw fw_ver 0x102001ec
      Signed-off-by: default avatarAya Levin <ayal@nvidia.com>
      Reviewed-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      cb464ba5