1. 04 Jun, 2021 1 commit
    • OCR: Tesseract 4.1.1 / Ghostscript 9.54.0 · ec129b70
      Jérome Perrin authored
      With tesseract v4.0.0-beta.3 we often observe crashes with:
      
      ```
      contains_unichar_id(unichar_id):Error:Assert failed:in file ../../src/ccutil/unicharset.h, line 511
      ```
      
      This seems to have been fixed by https://github.com/tesseract-ocr/tesseract/pull/1954
      
      Still, even after updating to 4.1.1, text recognition from PDF in ERP5 is too expensive. We therefore also update Ghostscript to 9.54.0, because this version has built-in OCR and so avoids converting the PDF to PNG and then to TIFF, as we currently do in ERP5.
      
      See merge request nexedi/slapos!985
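As a rough illustration of the built-in OCR path mentioned above, the sketch below only builds a Ghostscript command line that reads a PDF directly and writes recognized text. The `ocr` device and the `-sOCRLanguage` option come from Ghostscript's OCR devices; the helper name `gs_ocr_command`, the file names and the 300 dpi default are assumptions made for this example, not what ERP5 actually runs.

```python
import shlex


def gs_ocr_command(pdf_path, text_path, language="eng", dpi=300):
    """Build an argv list asking Ghostscript to OCR a PDF straight to text.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    return [
        "gs",
        "-dBATCH",                     # exit after processing the input
        "-dNOPAUSE",                   # do not prompt between pages
        "-sDEVICE=ocr",                # Ghostscript's built-in OCR text device
        f"-sOCRLanguage={language}",   # traineddata language to load
        f"-r{dpi}",                    # rendering resolution (assumed value)
        f"-sOutputFile={text_path}",
        pdf_path,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without gs.
    print(shlex.join(gs_ocr_command("input.pdf", "output.txt")))
```

The list can be passed to `subprocess.run` on a machine where this Ghostscript build and the matching tessdata are installed.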
  2. 03 Jun, 2021 2 commits
    • Thomas Gambier's avatar
      582b0b03
    • component/ghostscript: Workaround for slaprunner paths with double slashes · 1b291415
      Jérome Perrin authored
      This tessdata path is embedded in the C++ code through preprocessor macros:
      https://github.com/ArtifexSoftware/ghostpdl/blob/gs9.54.0/base/tessocr.cpp#L188-L193
      Because // starts a comment in C++ and, as documented in
      https://gcc.gnu.org/onlinedocs/cpp/Stringizing.html, "Comments are replaced by
      whitespace long before stringizing happens, so they never appear in stringized
      text", the STRINGIFY/STRINGIFY2 approach to including a path does not work
      when the path contains //: everything after the // is treated as a comment
      and dropped. This causes errors like the following when using ghostscript
      with OCR in webrunner:
      
          $ strace -e open -o open.strace /srv/slapgrid/slappart42/srv/runner/shared/ghostscript/4387fe7a8d2034ac5691d43b58134248/bin/gs -sDEVICE=ocr
          GPL Ghostscript 9.54.0 (2021-03-30)
          Copyright (C) 2021 Artifex Software, Inc.  All rights reserved.
          This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
          see the file COPYING for details.
          Error opening data file ./eng.traineddata
          Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
          Failed loading language 'eng'
          Tesseract couldn't load any languages!
          **** Unable to open the initial device, quitting.
          $ grep eng open.strace
          open("./eng.traineddata", O_RDONLY)     = -1 ENOENT (No such file or directory)
          open("/srv/slapgrid/slappart42/srv/eng.traineddata", O_RDONLY) = -1 ENOENT (No such file or directory)
          open("eng.traineddata", O_RDONLY)       = -1 ENOENT (No such file or directory)
      
      eng.traineddata is looked up in /srv/slapgrid/slappart42/srv/ because
      ghostscript was configured with:
      
         --with-tessdata=/srv/slapgrid/slappart42/srv//runner//shared/ghostscript/4387fe7a8d2034ac5691d43b58134248/share/tessdata/
      
      and everything after // was stripped.
      
      This was reported upstream as https://bugs.ghostscript.com/show_bug.cgi?id=703905
      
      More about the // in slaprunner paths can be found in commit eb544196
      (slparunner: document the reasons why we keep srv//slaprunner, 2019-10-10)
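The truncation described in this commit message can be modelled in a few lines. This is only a sketch: `stringified_by_cpp` approximates what the preprocessor does to a path containing // (it ignores every other preprocessor rule), and `collapse_slashes` is one possible way to avoid the problem, not necessarily the workaround this commit applies. Both function names are made up for the illustration.

```python
import re


def stringified_by_cpp(path):
    """Approximate STRINGIFY(path): text from the first // onward is a comment."""
    return path.split("//", 1)[0].rstrip()


def collapse_slashes(path):
    """Collapse runs of slashes so that the path no longer contains //."""
    return re.sub(r"/{2,}", "/", path)


# The --with-tessdata value quoted in the commit message.
configured = (
    "/srv/slapgrid/slappart42/srv//runner//shared/ghostscript/"
    "4387fe7a8d2034ac5691d43b58134248/share/tessdata/"
)

if __name__ == "__main__":
    # Truncated prefix, matching the directory seen in the strace output.
    print(stringified_by_cpp(configured))
    # With the slashes collapsed, the whole path survives.
    print(stringified_by_cpp(collapse_slashes(configured)))
```

The first call returns the same directory in which the strace output shows eng.traineddata being looked up, which is consistent with the explanation above.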
  3. 02 Jun, 2021 4 commits
  4. 31 May, 2021 7 commits
  5. 28 May, 2021 2 commits
  6. 27 May, 2021 6 commits
  7. 26 May, 2021 4 commits
  8. 25 May, 2021 2 commits
  9. 24 May, 2021 1 commit
  10. 21 May, 2021 3 commits
  11. 20 May, 2021 2 commits
  12. 19 May, 2021 3 commits
  13. 17 May, 2021 1 commit
  14. 16 May, 2021 1 commit
    • ERP5: notebook 4.4.1 does not need argon2-cffi · 12eab3c1
      Julien Muchembled authored
      This fixes:
      
        Installing jupyter.
        While:
          Installing jupyter.
          Base installation request: ...
          Getting distribution for 'argon2-cffi'.
        Error: Picked: argon2-cffi = 20.1.0
      
      argon2-cffi was added in commit 7d1ea024
      as a last-minute change to fix the jupyter SR.
  15. 15 May, 2021 1 commit