Wendelin needs the path to the system-wide installed CUDA libraries, because CUDA wrapper libraries such as TensorFlow and/or Keras need them.
The commit itself might be too hackish, so I am open to suggestions for a proper way (and place) to do this.
With this change, does it mean that we always have to install CUDA there... by hand? It seems just wrong for many reasons...
Weird. A quicker hack would be to use distribution packages (nvidia-cuda-dev on Debian ?), and the path would be different. If you manage to install it by hand, why not add it to SlapOS ?
IIRC, we avoid setting LD_LIBRARY_PATH and instead rely on in-binary RPATH. This is because environment variables can be accidentally mangled or not exported by bad wrapper scripts. In particular, linker-related variables get stripped whenever there is any privilege change (sudo, su, suid binary, maybe capabilities mask change too), which, although not standard practice in slapos, would make debugging more painful. @kazuhiko may have specific recommendations in this area.
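To illustrate the fragility described above, here is a minimal Python sketch. It assumes a hypothetical CUDA directory (`/usr/local/cuda/lib64`) and simulates a wrapper that rebuilds the environment without passing LD_LIBRARY_PATH on. The child process then no longer sees the variable, whereas a search path embedded in the binary as RPATH would not depend on the environment at all.

```python
import os
import subprocess
import sys

# Hypothetical location of system-wide CUDA libraries (an assumption).
CUDA_LIB_DIR = "/usr/local/cuda/lib64"


def child_ld_library_path(env):
    """Return LD_LIBRARY_PATH as seen by a child process started with `env`."""
    code = "import os; print(os.environ.get('LD_LIBRARY_PATH', ''))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def with_cuda_path(base_env):
    """Copy `base_env` and prepend the CUDA directory to LD_LIBRARY_PATH."""
    env = dict(base_env)
    existing = env.get("LD_LIBRARY_PATH")
    env["LD_LIBRARY_PATH"] = CUDA_LIB_DIR + (os.pathsep + existing if existing else "")
    return env


def careless_wrapper_env(env):
    """Simulate a wrapper script that forgets to export LD_LIBRARY_PATH."""
    return {k: v for k, v in env.items() if k != "LD_LIBRARY_PATH"}


if __name__ == "__main__":
    good = with_cuda_path(os.environ)
    print("direct child :", child_ld_library_path(good))
    print("via wrapper  :", child_ld_library_path(careless_wrapper_env(good)))
```

Nothing in the second case reports an error; the variable is simply gone, which is what makes this kind of failure tedious to debug.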
From a slapos perspective, pointing at foreign stuff is of course bad, so I do not think we want this in the generic SR. It would be much better, as Julien said, if we can build it and hence set an internal relationship.
Then it opens the topic of hardware feature reliance: I am not too familiar with CUDA, how does it behave when there is no hardware support ? Is there a device enumeration API in the library, so that applications can see they link successfully but still cannot use the library ? If this is the case, then I'm ok with adding CUDA libraries along with software able to use them. If it is not the case, then it means CUDA libraries should only be made available when the hardware is known to support them, which means no support in the generic ERP5 SR.
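On the enumeration question: the CUDA driver API does expose `cuInit` and `cuDeviceGetCount` (the runtime API counterpart is `cudaGetDeviceCount`). The following is only a sketch of how an application could probe for usable devices from Python via `ctypes`. The library names tried and the choice to treat every failure as "zero devices" are assumptions made for this illustration, not something taken from the commit under review.

```python
import ctypes


def cuda_device_count():
    """Return the number of CUDA devices, or 0 if the driver library
    cannot be loaded or reports an error (no driver, no hardware)."""
    lib = None
    # Assumed sonames of the NVIDIA driver library on Linux.
    for name in ("libcuda.so.1", "libcuda.so"):
        try:
            lib = ctypes.CDLL(name)
            break
        except OSError:
            continue
    if lib is None:
        return 0  # library not installed or not on the search path

    # cuInit must succeed before any other driver API call; 0 is CUDA_SUCCESS.
    if lib.cuInit(0) != 0:
        return 0

    count = ctypes.c_int(0)
    if lib.cuDeviceGetCount(ctypes.byref(count)) != 0:
        return 0
    return count.value


if __name__ == "__main__":
    print("CUDA devices found:", cuda_device_count())
```

With such a probe, linking against the library and being able to use it are two separate checks: the first can succeed on any machine, while the second depends on the hardware and driver actually present.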
Thank you, some notes:
- Even if CUDA is not installed and these paths do not exist, nothing breaks (apart from this being an ugly hack, as I already admitted).
- We cannot build CUDA ourselves, because we do not have the source code for all of its parts (something I forgot to mention). And sometimes we need even more CUDA libraries, and the only way to get them is a system-wide install.
- Most wrapper libraries like Keras (and TensorFlow underneath) are smart enough to run on the CPU when a GPU is unavailable, so it is they who provide hardware transparency. For us this means that code developed for Wendelin and using Keras will simply run much slower when no GPU is there.
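The CPU fallback mentioned in the last point can be observed from user code. Below is a minimal sketch, assuming TensorFlow 2.1 or later (where `tf.config.list_physical_devices` is available); the helper names `visible_gpus` and `describe_backend` are hypothetical and exist only for this example.

```python
def visible_gpus():
    """Return the GPUs TensorFlow can see, or an empty list if
    TensorFlow is not installed at all."""
    try:
        import tensorflow as tf  # assumed: TensorFlow >= 2.1
    except ImportError:
        return []
    return list(tf.config.list_physical_devices("GPU"))


def describe_backend():
    """Report which kind of device Keras/TensorFlow code would run on."""
    return "GPU" if visible_gpus() else "CPU"


if __name__ == "__main__":
    # The same model code runs in both cases; only the speed differs.
    print("Computations will run on:", describe_backend())
```

The application code does not need to change between the two cases, which is the hardware transparency described above.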
Maybe the proper way would be to have this change in https://lab.nexedi.com/nexedi/slapos/blob/master/software/wendelin/software-kerastensorflow.cfg
A quicker hack would be to use distrib packages [...] and the path would be different.
(I just realized that in this case, the libraries would be found without adding paths to LD_LIBRARY_PATH or RPATH.)
- We cannot build CUDA ourselves, because we do not have the source code for all of its parts (something I forgot to mention). And sometimes we need even more CUDA libraries, and the only way to get them is a system-wide install.
I do not understand the two arguments in this point.
Does the first mean basically CUDA has components which are proprietary ? (I would guess to interface with Nvidia driver, for example)
And the second mentions "more" CUDA libraries... More versions of the CUDA library ? Or components of the library ?
@vpelletier, correct.
Nvidia has closed-source libraries, closed-source drivers, etc.
See list of GPU libraries here: https://developer.nvidia.com/gpu-accelerated-libraries