An error occurred fetching the project authors.
- 25 May, 2020 2 commits
-
-
Omer Shpigelman authored
MMU cache invalidation timeout indicates that the device is unstable and therefore unusable. Hence in such case do hard reset and return an error to the user if was called from ioctl. In addition, change the print to error level and rephrase its text. Signed-off-by:
Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Oded Gabbay authored
GAUDI does not support soft-reset as it leaves the NIC ports in an awkward state, where their QMANs were reset but the NIC itself is still working. In addition, there is not much sense in doing soft-reset when training is done on multiple GAUDIs. Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by:
Tomer Tayar <ttayar@habana.ai>
-
- 19 May, 2020 12 commits
-
-
Ofir Bitton authored
Instead of writing similar event handling code for each ASIC, move the code to the common firmware file. This code will be used for GAUDI and all future ASICs. In addition, add two new fields to the auto-generated events file: valid and description. This will save the need to manually write the events description in the source code and simplify the code. Signed-off-by:
Ofir Bitton <obitton@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Oded Gabbay authored
Add the ASIC-dependent code for GAUDI. Supply (almost) all of the function callbacks that the driver's common code need to initialize, finalize and submit workloads to the GAUDI ASIC. It also contains the code to initialize the F/W of the GAUDI ASIC and to receive events from the F/W. Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Oded Gabbay authored
In Gaudi there is a feature of clock gating certain engines. Therefore, add this property to the device structure. In addition, due to a limitation of this feature, the driver needs to dynamically enable or disable this feature during run-time. Therefore, add ASIC interface functions to enable/disable this function from the common code. Moreover, this feature must be turned off when the user wishes to debug the ASIC by reading/writing registers and/or memory through the driver's debugfs. Therefore, add an option to enable/disable clock gating via the debugfs interface. Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Omer Shpigelman authored
Gaudi requires longer waiting during reset due to closing of network ports. Add this explanation to the relevant comment in the code and add a dedicated define for this reset timeout period, instead of multiplying another define. Signed-off-by:
Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Omer Shpigelman authored
Coresight is not supported on simulator, therefore add a boolean for checking that (currently used by un-upstreamed code). Signed-off-by:
Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Omer Shpigelman authored
Add the following two operations to the CS IOCTL: Signal: The signal operation is basically a command submission, that is created by the driver upon user request. It will be implemented using a dedicated PQE that will increment a specific SOB. There will be a new flag: HL_CS_FLAGS_SIGNAL. When the user set this flag in the CS IOCTL structure, the driver will execute a dedicated code path that will prepare this special PQE and submit it. The user only needs to provide a queue index on which to put the signal. Wait: The wait operation is also a command submission that is created by the driver upon user request. It will be implemented using a dedicated PQE that will contain packets of "ARM a monitor" + FENCE packet. There will be a new flag: HL_CS_FLAGS_WAIT. When the user set this flag in the CS structure, the driver will execute a dedicated code path that will prepare this special PQE and submit it. The user needs to provide the following parameters: 1. queue ID 2. an array of signal_seq numbers and the number of signals to wait on (the length of signal_seq_arr). The IOCTL will return the CS sequence number of the wait it put on the queue ID. Currently, the code supports signal_seq_nr==1. But this API definition will allow us to put a single PQE that waits on multiple signals. To correctly configure the monitor and fence, the driver will need to retrieve the specified signal CS object that contains the relevant SOB and its expected value. In case the signal CS has already been completed, there is no point of adding a wait operation. In this case, the driver will return to the user *without* putting anything on the PQ. The return code should reflect to the user that the signal was completed, as we won't return a CS sequence number for this wait. Signed-off-by:
Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Omer Shpigelman authored
Define a structure representing the h/w sync object (SOB). a SOB can contain up to 2^15 values. Each signal CS will increment the SOB by 1, so after some time we will reach the maximum number the SOB can represent. When that happens, the driver needs to move to a different SOB for the signal operation. A SOB can be in 1 of 4 states: 1. Working state with value < 2^15 2. We reached a value of 2^15, but the signal operations weren't completed yet OR there are pending waits on this signal. For the next submission, the driver will move to another SOB. 3. ALL the signal operations on the SOB have finished AND there are no more pending waits on the SOB AND we reached a value of 2^15 (This basically means the refcnt of the SOB is 0 - see explanation below). When that happens, the driver can clear the SOB by simply doing WREG32 0 to it and set the refcnt back to 1. 4. The SOB is cleared and can be used next time by the driver when it needs to reuse an SOB. Per SOB, the driver will maintain a single refcnt, that will be initialized to 1. When a signal or wait operation on this SOB is submitted to the PQ, the refcnt will be incremented. When a signal or wait operation on this SOB completes, the refcnt will be decremented. After the submission of the signal operation that increments the SOB to a value of 2^15, the refcnt is also decremented. Signed-off-by:
Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Omer Shpigelman authored
This feature requires handling h/w resources which are a bit different from one ASIC to the other. Therefore, we need to define a set of interfaces the ASIC code provides to the common code to signal, wait, reset sync object and to reset and init a queue. As this feature is not supported in Goya, provide an empty implementation of those functions. Signed-off-by:
Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Omer Shpigelman authored
This is a pre-requisite to upstreaming GAUDI support. Signal/wait operations are done by the user to perform sync between two Primary Queues (PQs). The sync is done using the sync manager and it is usually resolved inside the device, but sometimes it can be resolved in the host, i.e. the user should be able to wait in the host until a signal has been completed. The mechanism to define signal and wait operations is done by the driver because it needs atomicity and serialization, which is already done in the driver when submitting work to the different queues. To implement this feature, the driver "takes" a couple of h/w resources, and this is reflected by the defines added to the uapi file. The signal/wait operations are done via the existing CS IOCTL, and they use the same data structure. There is a difference in the meaning of some of the parameters, and for that we added unions to make the code more readable. Signed-off-by:
Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Ofir Bitton authored
Load CPU device boot loader during driver boot time in order to avoid flash write for every boot loader update. To preserve backward-compatibility, skip the device boot load if the device doesn't request it. Signed-off-by:
Ofir Bitton <obitton@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Christine Gharzuzi authored
Support hwmon_temp_reset_histroy, hwmon_in_reset_history and hwmon_curr_reset attribute which resets the historical highest value. Signed-off-by:
Christine Gharzuzi <cgharzuzi@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Tomer Tayar authored
Add a new opcode to the INFO IOCTL that retrieves the device time alongside the host time, to allow a user application that want to measure device time together with host time (such as a profiler) to synchronize these times. Signed-off-by:
Tomer Tayar <ttayar@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
- 17 May, 2020 6 commits
-
-
Oded Gabbay authored
When we have DMA QMAN with multiple streams, we need to know whether the command buffer contains at least one DMA packet in order to configure the barriers correctly when adding the 2xMSG_PROT at the end of the JOB. If there is no DMA packet, then there is no need to put engine barrier. This is relevant only for GAUDI as GOYA doesn't have streams so the engine can't be busy by another stream. Reviewed-by:
Tomer Tayar <ttayar@habana.ai> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Oded Gabbay authored
Retrieve from the firmware the DMA mask value we need to set according to the device's PCI controller configuration. This is needed when working on POWER9 machines, as the device's PCI controller is configured in a different way in those machines. Reviewed-by:
Tomer Tayar <ttayar@habana.ai> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Oded Gabbay authored
When doing training, the DL framework (e.g. tensorflow) performs hundreds of thousands of memory allocations and mappings. In case the driver needs to perform hard-reset during training, the driver kills the application and unmaps all those memory allocations. Unfortunately, because of that large amount of mappings, the driver isn't able to do that in the current timeout (5 seconds). Therefore, increase the timeout significantly to 30 seconds to avoid situation where the driver resets the device with active mappings, which sometime can cause a kernel bug. BTW, it doesn't mean we will spend all the 30 seconds because the reset thread checks every one second if the unmap operation is done. Reviewed-by:
Omer Shpigelman <oshpigelman@habana.ai> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Oded Gabbay authored
Move the code of device CPU initialization from being ASIC-Dependent to common code. In addition, add support for the new error reporting feature of the firmware boot code. Reviewed-by:
Omer Shpigelman <oshpigelman@habana.ai> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Omer Shpigelman authored
We want to remove the following restrictions/assumptions in our driver: 1. The H/W queue index is also the completion queue index. 2. The H/W queue index is also the IRQ number of the completion queue. 3. All queues of the same type have consecutive indexes. Therefore we add the support for H/W queues of the same type with nonconsecutive indexes and completion queue index and IRQ number different than the H/W queue index. Signed-off-by:
Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Omer Shpigelman authored
Stop-on-error mode in DMA is useful as it stops the transaction immediately upon error e.g. page fault. But it may cause the next command submission to fail as is leaves the DMA in unstable state. Therefore we remove the stop-on-error configuration from the DMA. Stop-on-err is still available for debug. Signed-off-by:
Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
- 24 Mar, 2020 7 commits
-
-
Oded Gabbay authored
If a GAUDI device is present in the system, display an error message that it is not supported by the current kernel. Reviewed-by:
Omer Shpigelman <oshpigelman@habana.ai> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Moti Haimovski authored
The hl read and write routines implement the hwmon_ops read and write interface routines respectively. These routines are expected to return a completion status when called, which was not the case until this commit. This commit modifies these routines to return 0 upon success and a negative error value upon failure. Signed-off-by:
Moti Haimovski <mhaimovski@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Moti Haimovski authored
This commit adds support for offsetting the temperatures reading by a specified value as defined in https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface using the standard sysfs defined for hwmon. This is required by system administrators to inject errors to test their monitoring applications in data centers. Signed-off-by:
Moti Haimovski <mhaimovski@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Moti Haimovski authored
Allow debug user to write/read 64-bit data through debugfs. This will expedite the dump process of the (large) internal memories of the device done during debug. Signed-off-by:
Moti Haimovski <mhaimovski@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Tomer Tayar authored
As HL_MAX_JOBS_PER_CS is 512, it is possible that more than 255 CS jobs will be submitted for a certain queue. Hence, modify the "jobs_in_queue_cnt" parameter of the "hl_cs" structure to be u16 instead of u8. Signed-off-by:
Tomer Tayar <ttayar@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Omer Shpigelman authored
Host memory may be allocated with huge pages. A different virtual range may be used for mapping in this case. Add Huge PCI MMU (HPMMU) properties to support it. This patch is a prerequisite for future ASICs support and has no effect on Goya ASIC as currently a single virtual host range is used for all page sizes. Signed-off-by:
Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Pawel Piskorski authored
Optimize hl_mmu_map and hl_mmu_unmap by not calling flush(ctx) within per-page loop. Signed-off-by:
Pawel Piskorski <ppiskorski@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
- 21 Nov, 2019 10 commits
-
-
Omer Shpigelman authored
Split the properties used for MMU mappings to DRAM and PCI (host) types. This is a prerequisite for future ASICs support. Note that in Goya ASIC, the PMMU and DMMU are the same (except of page sizes) as only one MMU mechanism is used for both of the mapping types. Hence this patch should not have any effect on current behavior. Signed-off-by:
Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Omer Shpigelman authored
Add the ability to invalidate the necessary MMU cache only. This ability is a prerequisite for future ASICs support. Note that in Goya ASIC, a single cache is used for both host/DRAM mappings and hence this patch should not have any effect on current behavior. Signed-off-by:
Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Omer Shpigelman authored
Some of the functions in the memory module code were too long and/or contained multiple operations that are not always done together. Re-factor the code by dividing those functions to smaller functions which are more readable and maintainable. Signed-off-by:
Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Oded Gabbay authored
The two defines that control the maximum size of a command buffer and the maximum number of JOBS per CS need to be exported to the user as they are part of the API towards user-space. Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by:
Omer Shpigelman <oshpigelman@habana.ai>
-
Oded Gabbay authored
In training, there is a need for a large amount of patching to the recipe. This results in many command buffers contains a lot of DMA packets. The number of command buffers per CS is larger than the current maximum of 64, which is an arbitrary number that is enough for inference, but it has no real affect on the code and/or resources of the host machine. Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by:
Omer Shpigelman <oshpigelman@habana.ai>
-
Oded Gabbay authored
Add a new opcode to the INFO IOCTL to allow the user application to retrieve the ASIC's current and maximum clock rate. The rate is returned in MHz. Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by:
Tomer Tayar <ttayar@habana.ai>
-
Oded Gabbay authored
Reduce latency to memory during TPC kernel execution. Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by:
Tomer Tayar <ttayar@habana.ai>
-
Tomer Tayar authored
This patch adds a support for a new H/W queue type. This type of queue is for DMA and compute engines jobs, for which completion notification are sent by H/W. Command buffer for this queue can be created either through the CB IOCTL and using the retrieved CB handle, or by preparing a buffer on the host or device SRAM/DRAM, and using the device address to that buffer. The patch includes the handling of the 2 options, as well as the initialization of the H/W queue and its jobs scheduling. Signed-off-by:
Tomer Tayar <ttayar@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Tomer Tayar authored
Jobs on some queues must be provided with a handle to a driver command buffer object, while for other queues, jobs must be provided with an address to a command buffer. Currently the distinction is done based on the queue type, which is less flexible if the same queue type behaves differently on different types of ASICs. This patch adds a new queue property for this target, which is configured per queue type per ASIC type. Signed-off-by:
Tomer Tayar <ttayar@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
Tomer Tayar authored
s/paerser/parser/ s/requeusted/requested/ s/an JOB/a JOB/ Signed-off-by:
Tomer Tayar <ttayar@habana.ai> Reviewed-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com>
-
- 05 Sep, 2019 3 commits
-
-
Oded Gabbay authored
When using the macro le32_to_cpu(x), we need to correctly convert x to be __le32 in case it is defined as u32 variable. Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by:
Tomer Tayar <ttayar@habana.ai>
-
Oded Gabbay authored
We want to stop using the acronym KMD. Therefore, replace all locations (except for register names we can't modify) where KMD is written to other terms such as "Linux kernel driver" or "Host kernel driver", etc. Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by:
Omer Shpigelman <oshpigelman@habana.ai>
-
Oded Gabbay authored
Add a new opcode to INFO IOCTL to retrieve aggregate H/W events. i.e. the events counters are NOT cleared upon device reset, but count from the loading of the driver. Add the code to support it in the device event handling function. Signed-off-by:
Oded Gabbay <oded.gabbay@gmail.com> Reviewed-by:
Omer Shpigelman <oshpigelman@habana.ai>
-