- 20 Mar, 2023 12 commits
-
-
Ofir Bitton authored
We expose this in order for user applications to know how much dram is reserved for internal use. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Tomer Tayar authored
Remove all '\n' from strings which are passed as arguments to gaudi2_print_event(), because the newline character is added internally in this function. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Koby Elbaz authored
Now that CQ-completion based jobs do not trigger a reset upon failure, failure of such jobs (e.g., MMU cache invalidation) should be handled by the caller itself depending on the error code returned to it. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Oded Gabbay authored
Don't use padX for actual reservedX fields. Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Ofir Bitton <obitton@habana.ai>
-
Dafna Hirschfeld authored
Since the err_cause register is unprivileged, we should read it from the driver instead of using the param that came from the FW. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Dafna Hirschfeld authored
- remove reset_sleep_ms arg from functions that don't use it. - move the call msleep(reset_sleep_ms) from btm poll to gaudi2_hw_fini as it is called from there already for other flow. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Dafna Hirschfeld authored
In hw_fini callback, we use either the cpucp packet method or polling a register. Currently we return error only in the case of cpucp packet failure. In this patch we also return error if polling timed out. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Ofir Bitton authored
Due to a firmware bug we need to increase reset poll timeout or else we will timeout in secured environments. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Koby Elbaz authored
Engines idle state can't always be verified between changes of engine modes (e.g., stall/halt). For example, if a CS is inflight when altering engine's mode, idle state will return NOT idle, always. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Oded Gabbay authored
Copy the most up-to-date interface files to the firmware. Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Ofir Bitton <obitton@habana.ai>
-
Colin Ian King authored
There is a spelling mistake in a dev_err message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Oded Gabbay authored
This function is only called inside gaudi2.c file. Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202303071320.X5ouBlNY-lkp@intel.com/Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Ofir Bitton <obitton@habana.ai>
-
- 15 Mar, 2023 28 commits
-
-
Bjorn Helgaas authored
pci_enable_pcie_error_reporting() enables the device to send ERR_* Messages. Since commit f26e58bf ("PCI/AER: Enable error reporting when AER is native"), the PCI core does this for all devices during enumeration, so the driver doesn't need to do it itself. Remove the redundant pci_enable_pcie_error_reporting() call from the driver. Also remove the corresponding pci_disable_pcie_error_reporting() from the driver .remove() path. Note that this only controls ERR_* Messages from the device. An ERR_* Message may cause the Root Port to generate an interrupt, depending on the AER Root Error Command register managed by the AER service driver. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Tomer Tayar authored
The memory manager IDR is currently destroyed when user releases the file descriptor. However, at this point the user context might be still held, and memory buffers might be still in use. Later on, calls to release those buffers will fail due to not finding their handles in the IDR, leading to a memory leak. To avoid this leak, split the IDR destruction from the memory manager fini, and postpone it to hpriv_release() when there is no user context and no buffers are used. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Dafna Hirschfeld authored
We plan to do soft-reset either by mmio or by using cpucp packet depending on the FW version. We don't want to check FW version in two different places for that (execute soft-reset and wait to soft-reset) so move the waiting to gaudi2_execute_soft_reset. This also makes sense because the cpucp also does the waiting. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Koby Elbaz authored
The user might want to stall/resume engines to perform power testing for various scenarios. Because our current HL_CS_FLAGS_ENGINE_CORE_COMMAND command only handles the engines' cores, we need to add another opcode for handling entire engine and not just its core. The user supplies an array, where each entry holds the engine's ID and the command to send to the engine. The size of the array is limited by the number of engines in the ASIC (only Gaudi2 is currently supported). Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Tomer Tayar authored
compose_device_in_use_info() was added to handle the snprintf() return value in a single place. However, the buffer size in print_device_in_use_info() is set such that it would be enough for the max possible print, so compose_device_in_use_info() is not really needed. Moreover, scnprintf() can be used instead of snprintf(), to save the check if the return value larger than the given size. Cc: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
-
Dafna Hirschfeld authored
print more informative message when failing in dirty state Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
-
Koby Elbaz authored
There are two reasons why mutex is better here: 1. There's a critical section relatively long, where in certain scenarios (e.g., multiple VM allocations) taking a spinlock might cause noticeable performance degradation. 2. It will remove the incorrect usage of mutex under spin_lock (where preemption is disabled). Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Dafna Hirschfeld authored
We can allow userspace to query the dram usage during soft-reset. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Koby Elbaz authored
The PDMA/EDMA is_idle routines didn't check the correct CORE register in order to get the accurate idle state. Moreover, it's better to make the is_idle routine more robust by adding additional checks (IS_HALTED) before announcing that the core is idle. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Koby Elbaz authored
Is appears that the flag - DCORE0_TPC0_CFG_STATUS_VECTOR_PIPE_EMPTY_MASK, has no actual use when it comes to querying TPC idleness, since this flag's corresponding bit turns-off after stalling the engine, and turns back on after resuming it. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
farah kassabri authored
Run spell checker on the code and fix accordingly. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Koby Elbaz authored
In case the KDMA fails scrubbing the DCCMs (following a soft-reset upon device release), the driver will only print failure until reset flow ends, rather than escalating it into a hard-reset. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Tomer Tayar authored
Add notifications to user in case of decoder abnormal interrupts, and use the graceful reset mechanism if reset is required. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Dafna Hirschfeld authored
Since hw_fini return error code for failure indication, we should check its return value. Currently it might only fail upon soft-reset from hl_device_reset. Later patch will add hw_fini failure in case of polling timeout in hard-reset. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Koby Elbaz authored
is_idle() was too long, so break it up for readability. In addition, we can now use the new sub-routines from other places. Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Sagiv Ozeri authored
Compute driver threads names will start with hlX-*, when X is the device id. This will help distinguish them from the NIC thread names. Signed-off-by: Sagiv Ozeri <sozeri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Tomer Tayar authored
Add a helper function to search the vm hash for a node with a given virtual address. As opposed to the current code, this function explicitly returns NULL when no node is found, instead of basing on the loop cursor object's value. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Tomer Tayar authored
'irq_handler' in gaudi2_enable_msix(), is just assigned with a function name and then used when calling request_threaded_irq(). Remove the variable and use the function name directly as an argument. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
-
Dafna Hirschfeld authored
We later use cpucp packet for soft reset which might fail so we should be able propagate the failure case. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
-
Tomer Tayar authored
Remove leading zeroes when printing the idle mask to make it clearer. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
-
Sagiv Ozeri authored
Make the comments align with the order of the fields in the structure Signed-off-by: Sagiv Ozeri <sozeri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
-
Tom Rix authored
smatch reports drivers/accel/habanalabs/common/device.c:2619:6: warning: symbol 'hl_capture_hw_err' was not declared. Should it be static? drivers/accel/habanalabs/common/device.c:2641:6: warning: symbol 'hl_capture_fw_err' was not declared. Should it be static? both are only used in device.c, so they should be static Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Tom Rix authored
Building with clang W=2 has several similar warnings drivers/accel/habanalabs/common/decoder.c:46:51: error: declaration shadows a variable in the global scope [-Werror,-Wshadow] static void dec_error_intr_work(struct hl_device *hdev, u32 base_addr, u32 core_id) ^ drivers/accel/habanalabs/common/security.h:13:26: note: previous declaration is here extern struct hl_device *hdev; ^ There is no global definition of hdev, so the extern is not needed. Searched with grep -r '^struct' . | grep hl_dev Change to an forward decl to resolve these issues drivers/accel/habanalabs/common/mmu/../security.h:133:40: error: ‘struct hl_device’ declared inside parameter list will not be visible outside of this definition or declaration [-Werror] 133 | bool (*skip_block_hook)(struct hl_device *hdev, | ^~~~~~~~~ Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
-
Dafna Hirschfeld authored
The cpu accessible dma allocations use the gen_pool api which actually does not allocate new memory from the system but manages memory already allocated before. When tracing this together with real dma allocation/free it cause confusing logs like a '0' dma address and a cpu address appearing twice etc. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
-
Dafna Hirschfeld authored
in the out_err flow, combine the two cases of soft-reset since they have mostly common code. In addition unlock reset_info.lock after touching reset count. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
-
Dafna Hirschfeld authored
Because this field is only used for debug print, we can do more precise debug directly instead. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
-
Koby Elbaz authored
To match their description above the function Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
-
Dafna Hirschfeld authored
Align assignment of reset_upon_device_release to the convention used in this function. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
-