Lighter processing for OCR activities
When running OCR, we sometimes have issues because processing is "too heavy":
- it uses 2 or 3 GB of disk space for a one-page PDF created by erp5_document_scanner, because we convert pdf -> png -> tiff before sending the result to tesseract. Modern Ghostscript supports running tesseract directly, so we use it when it is available.
- it uses 300% of CPU. This is fixed by setting OMP_THREAD_LIMIT when running tesseract. This only applies when running OCR on images; OCR embedded in Ghostscript does not seem to need it.
- ... and it often crashes, so it is restarted. This is fixed by updating tesseract.
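
As an illustration of the CPU fix, here is a minimal sketch of running tesseract on an image with OMP_THREAD_LIMIT set. The helper names `tesseract_env` and `ocr_image`, the default limit of 1 and the default language are assumptions made for this example, not the code in this merge request.

```python
import os
import subprocess


def tesseract_env(thread_limit=1, base_env=None):
    """Return a copy of the environment with OMP_THREAD_LIMIT set.

    OMP_THREAD_LIMIT is a standard OpenMP variable; it caps the number of
    threads an OpenMP-enabled tesseract build may use.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_THREAD_LIMIT"] = str(thread_limit)
    return env


def ocr_image(image_path, language="eng", thread_limit=1):
    """Run the tesseract CLI on one image and return the recognized text.

    Hypothetical helper: `tesseract IMAGE stdout -l LANG` writes the text
    to standard output instead of a file.
    """
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "-l", language],
        env=tesseract_env(thread_limit),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout.decode("utf-8")
```

The limit is passed through the child process environment only, so the calling process is left unchanged.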
Updates of ghostscript and tesseract are part of slapos!985 (merged)