OCR: Tesseract 4.1.1 / Ghostscript 9.54.0 (nexedi/slapos!985)

Status: merged. Jérome Perrin requested to merge feat/tesseract-version-up into master on May 20, 2021.

With Tesseract v4.0.0-beta.3 we often observe crashes with this assertion failure:

contains_unichar_id(unichar_id):Error:Assert failed:in file ../../src/ccutil/unicharset.h, line 511

This seems to have been fixed upstream by https://github.com/tesseract-ocr/tesseract/pull/1954, so we update Tesseract to 4.1.1.

Still, even with Tesseract 4.1.1, text recognition from PDFs in ERP5 is too expensive. We therefore also update Ghostscript to 9.54.0: this version has built-in OCR, so text can be extracted directly from the PDF instead of first converting it to PNG and then to TIFF, as ERP5 currently does.
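The difference between the two approaches can be sketched as the command lines each one runs. This is a minimal sketch under assumptions: the legacy pipeline's tools (Ghostscript's png16m device, ImageMagick's convert, the tesseract CLI), the file names and the helper names legacy_ocr_commands, ghostscript_ocr_command and run are hypothetical stand-ins, not ERP5's actual code. The single-call variant uses Ghostscript's ocr text device with -sOCRLanguage, which requires a Ghostscript built with OCR support and the matching Tesseract language data.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def legacy_ocr_commands(pdf: str, workdir: str, resolution: int = 300,
                        language: str = "eng") -> List[List[str]]:
    """Hypothetical three-step pipeline: PDF -> PNG -> TIFF -> text.

    The tool choice is an assumption for illustration; ERP5's real
    conversion code may differ. Only page 1 is rendered to keep it short.
    """
    png = str(Path(workdir) / "page.png")
    tiff = str(Path(workdir) / "page.tiff")
    out_base = str(Path(workdir) / "page")  # tesseract appends ".txt"
    return [
        # 1. Rasterize the PDF page to PNG with Ghostscript.
        ["gs", "-q", "-dNOPAUSE", "-dBATCH", "-dFirstPage=1", "-dLastPage=1",
         "-sDEVICE=png16m", f"-r{resolution}", f"-sOutputFile={png}", pdf],
        # 2. Convert PNG to TIFF (ImageMagick, assumed).
        ["convert", png, tiff],
        # 3. Run Tesseract on the TIFF.
        ["tesseract", tiff, out_base, "-l", language],
    ]


def ghostscript_ocr_command(pdf: str, txt: str, resolution: int = 300,
                            language: str = "eng") -> List[str]:
    """One Ghostscript call using its built-in OCR text device."""
    return ["gs", "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=ocr",
            f"-r{resolution}", f"-sOCRLanguage={language}",
            f"-sOutputFile={txt}", pdf]


def run(commands: List[List[str]]) -> None:
    """Run each command in order; raise if a tool is missing or fails."""
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)
```

On a machine with the tools installed, run([ghostscript_ocr_command("in.pdf", "out.txt")]) replaces three subprocesses and two intermediate image files with a single process. The builders return argv lists rather than shell strings so they can be inspected or logged without running anything and avoid shell quoting issues.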

Edited May 25, 2021 by Jérome Perrin