Commit 5f616dd7 authored by Jérome Perrin

component/tesseract: version up 4.1.1

Also enable shared parts and provide new languages:
Simplified Chinese, Japanese and French
parent 4dbc46ff
......@@ -10,43 +10,34 @@ extends =
  ../fontconfig/buildout.cfg
  ../lcms/buildout.cfg
  ../pkgconfig/buildout.cfg
  ./buildout.hash.cfg
parts =
  tesseract
  tesseract-traineddata
  tesseract-eng-traineddata
  tesseract-osd-traineddata
[tesseract]
recipe = slapos.recipe.cmmi
url = https://github.com/tesseract-ocr/tesseract/archive/6b250b58121a9858d3e3019a78a6f7d421bd0fc7.tar.gz
md5sum = fdc38148ad8eb1bd0485a217503dd6d5
shared = true
url = https://github.com/tesseract-ocr/tesseract/archive/refs/tags/4.1.1.tar.gz
md5sum = 51fe2bcbff1bbce77a25d180fd247f7d
pkg_config_depends = ${leptonica:location}/lib/pkgconfig:${fontconfig:location}/lib/pkgconfig:${fontconfig:pkg_config_depends}:${lcms2:location}/lib/pkgconfig:${xz-utils:location}/lib/pkgconfig
pre-configure =
  autoreconf -ivf -I${pkgconfig:location}/share/aclocal -I${libtool:location}/share/aclocal -Wno-portability
configure-options =
  --disable-static
  --datarootdir=${tesseract-traineddata:location}
# XXX: tesseract seems not easily configurable at runtime about where to find
# its trained data, so we set its datarootdir above to a controlled location
  • @jerome is there any reason to remove datarootdir? Does it no longer work on 4.1.1?

    The tiff_to_text conversion no longer works because tesseract fails to load its training data:

    Error Type: EnvironmentError
    Error Value: Command ('/srv/slapgrid/slappart49/srv/runner/instance/slappart5/bin/tesseract', '/srv/slapgrid/slappart49/srv/runner/instance/slappart5/tmp/tmp3WrN7h/input.tiff', '/srv/slapgrid/slappart49/srv/runner/instance/slappart5/tmp/tmp3WrN7h/output') exited with status 1. Command output: Error opening data file /srv/slapgrid/slappart49/srv/tessdata/eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language 'eng' Tesseract couldn't load any languages! Could not initialize tesseract.
  • I tried a bit: the datarootdir option still works, I'll fix it.

  • @xiaowu.zhang the reason was to make it possible to install with shared = true.

    But the problem might be different. I had similar errors during development and I thought I had fixed everything. There's a post-make-hook which should copy the files to the default location, so "it should work". If I remember correctly, these tests are using tesseract from tiff_to_text.

    There was then another problem when installing in "root slapos" with different users, fixed in 7df9bc95, but from the path you have, this looks like a web runner.

    Are you using software/erp5 or do you have more customization? If I can access this machine (send me an email if you want), I can take a look, but I think we should first check for a problem in the software installation, because I believe this case is covered by a test.

  • ( post make hook is in this patch, it's this part:

    [tesseract-download-traineddata]
    post-make-hook = ${:_profile_base_location_}/${download-tessdata.py:filename}#${download-tessdata.py:md5sum}:post_make_hook

    )

  • For reference, it was a different problem, caused by slaprunner double slashes: !1040 (merged)
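
The error quoted in this thread names the `TESSDATA_PREFIX` environment variable and a missing `eng.traineddata`. A minimal diagnostic sketch, not part of this commit, for checking whether a given tessdata directory holds the expected files and for building an environment that points tesseract at it. The helper names `missing_traineddata` and `env_with_tessdata_prefix` are hypothetical, and the directory passed in is whatever path the installation actually uses.

```python
import os


def missing_traineddata(tessdata_dir, languages):
    """Return the languages whose <lang>.traineddata file is absent.

    Hypothetical helper: it only checks for files on disk and does not
    run tesseract.
    """
    return [
        lang for lang in languages
        if not os.path.isfile(os.path.join(tessdata_dir, lang + ".traineddata"))
    ]


def env_with_tessdata_prefix(tessdata_dir, base_env=None):
    """Return a copy of the environment with TESSDATA_PREFIX set.

    The error message asks for the variable to point at the "tessdata"
    directory itself, so the directory is passed through unchanged.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TESSDATA_PREFIX"] = tessdata_dir
    return env
```

The returned dictionary could be passed as `env=` to `subprocess.run` when invoking the tesseract binary by hand, which helps separate "the data was not installed" from "the binary looks in the wrong place".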

environment =
  PATH=${pkgconfig:location}/bin:${autoconf:location}/bin:${automake:location}/bin:${libtool:location}/bin:${m4:location}/bin:${patch:location}/bin:%(PATH)s
  PKG_CONFIG_PATH=${:pkg_config_depends}
  LDFLAGS=-L${leptonica:location}/lib -Wl,-rpath=${leptonica:location}/lib -L${jbigkit:location}/lib -Wl,-rpath=${jbigkit:location}/lib -L${zlib:location}/lib -Wl,-rpath=${zlib:location}/lib
[tesseract-traineddata]
location = ${buildout:parts-directory}/${:_buildout_section_name_}
post-make-hook = ${tesseract-download-traineddata:post-make-hook}
tessdata-urls = ${tesseract-download-traineddata:urls}
tessdata-location = @@LOCATION@@/share/tessdata/
[tesseract-eng-traineddata]
recipe = slapos.recipe.build:download
destination = ${tesseract-traineddata:location}/tessdata/eng.traineddata
url = https://github.com/tesseract-ocr/tessdata/raw/590567f20dc044f6948a8e2c61afc714c360ad0e/eng.traineddata
md5sum = 57e0df3d84fed9fbf8c7a8e589f8f012
[tesseract-osd-traineddata]
recipe = slapos.recipe.build:download
destination = ${tesseract-traineddata:location}/tessdata/osd.traineddata
url = https://github.com/tesseract-ocr/tessdata/raw/590567f20dc044f6948a8e2c61afc714c360ad0e/osd.traineddata
md5sum = 7611737524efd1ce2dde67eff629bbcf
[tesseract-download-traineddata]
post-make-hook = ${:_profile_base_location_}/${download-tessdata.py:filename}#${download-tessdata.py:md5sum}:post_make_hook
urls =
  https://raw.githubusercontent.com/tesseract-ocr/tessdata/4.1.0/eng.traineddata#57e0df3d84fed9fbf8c7a8e589f8f012
  https://raw.githubusercontent.com/tesseract-ocr/tessdata/4.1.0/osd.traineddata#7611737524efd1ce2dde67eff629bbcf
  https://raw.githubusercontent.com/tesseract-ocr/tessdata/4.1.0/fra.traineddata#a73e70c872f262895d93976febeb1638
  https://raw.githubusercontent.com/tesseract-ocr/tessdata/4.1.0/jpn.traineddata#af3a30a9bec904e106aa8521e7caaeca
  https://raw.githubusercontent.com/tesseract-ocr/tessdata/4.1.0/chi_sim.traineddata#6965cb3213edd961cb16264e2ea45f5c
[download-tessdata.py]
filename = download-tessdata.py
md5sum = 02960cbbe84f484a532ee4e279b87150
# This is a post-make hook script to download tesseract training data.
#
# This script uses the following buildout options:
# - tessdata-urls: list of URLs and their expected md5sum as URL fragments
# - tessdata-location: path where to install the data.
import zc.buildout
import os
def post_make_hook(options, buildout, env):
    os.makedirs(options['tessdata-location'])
    download = zc.buildout.download.Download(
        buildout['buildout'],
        hash_name=True,
    )
    for url in options['tessdata-urls'].splitlines():
        url, _, md5sum = url.partition('#')
        if url:
            download(
                url,
                md5sum=md5sum,
                path=os.path.join(options['tessdata-location'],
                                  os.path.basename(url)),
            )
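
The hook above reads each line of `tessdata-urls` as `URL#md5sum` and installs the file under its URL basename. A standalone sketch of just that parsing step, with no buildout dependency, may make the convention easier to see. The function name `parse_tessdata_urls` is hypothetical, and the `strip()` call is an addition for hand-written input that the hook itself does not perform.

```python
import os


def parse_tessdata_urls(option_value):
    """Split a multi-line option into (url, md5sum, filename) tuples.

    Mirrors the loop in post_make_hook: the URL fragment carries the
    expected md5sum and blank lines are skipped. Hypothetical helper,
    not part of the commit.
    """
    entries = []
    for line in option_value.splitlines():
        url, _, md5sum = line.strip().partition('#')
        if url:
            entries.append((url, md5sum, os.path.basename(url)))
    return entries


# Two of the entries from the [tesseract-download-traineddata] section.
sample = """
https://raw.githubusercontent.com/tesseract-ocr/tessdata/4.1.0/eng.traineddata#57e0df3d84fed9fbf8c7a8e589f8f012
https://raw.githubusercontent.com/tesseract-ocr/tessdata/4.1.0/osd.traineddata#7611737524efd1ce2dde67eff629bbcf
"""

for url, md5sum, filename in parse_tessdata_urls(sample):
    print(filename, md5sum)
```

Carrying the checksum in the fragment keeps each URL and its md5sum on one line, so adding a language is a one-line change to the `urls` option.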
......@@ -65,8 +65,6 @@ parts +=
  slapos-cookbook
  mroonga-mariadb
  tesseract
  tesseract-eng-traineddata
  tesseract-osd-traineddata
  zabbix-agent
# Buildoutish
......