- 09 Jun, 2021 33 commits
-
-
Jérome Perrin authored
-
Jérome Perrin authored
For historical reasons, PDF to text conversion first converted the PDF to png, then the png to tiff, and the tiff was sent to tesseract. This works, but it consumes a lot of resources with large PDFs, especially because the intermediate png/tiff files are created with a resolution of 300 DPI, which easily needs several GB of RAM and temporary disk space. This was observed with the PDFs created by erp5_document_scanner, which are usually high quality (1 or 2 MB per page); even a one-page PDF sometimes took more than one minute to OCR.

Since 9.53, ghostscript integrates the tesseract engine directly, so we don't need to prepare a tiff beforehand: we can send the PDF data directly to ghostscript.

This change uses ghostscript if available and otherwise falls back to the same pipeline as before. This allows a transition until all ERP5 instances are running a recent enough SlapOS with ghostscript 9.54. Fortunately, before SlapOS included ghostscript 9.54, the ERP5 software release did not have ghostscript in $PATH, so we don't have to check the ghostscript version: we assume that if gs is in $PATH, we have a recent enough SlapOS.

This new approach is less tolerant of broken/password-protected PDFs, so we perform a new check that the PDF is valid and not encrypted before trying to use OCR.
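The selection rule described above can be sketched as follows. This is a minimal illustration, not the ERP5 implementation: the function names `choose_ocr_pipeline` and `build_gs_ocr_command`, the `-r300` resolution and the default language are assumptions made for the example. The `ocr` device and the `-sOCRLanguage` option are the ones ghostscript provides since 9.53.

```python
import shutil


def choose_ocr_pipeline(which=shutil.which):
    """Return "ghostscript" when a gs binary is on $PATH, else "legacy".

    No version check is done: per the commit message, finding gs on $PATH
    is taken to mean that SlapOS ships a recent enough ghostscript.
    """
    return "ghostscript" if which("gs") else "legacy"


def build_gs_ocr_command(pdf_path, language="eng", gs_binary="gs"):
    """Build a ghostscript command line that OCRs a PDF to text on stdout.

    Hypothetical helper: the resolution and language are illustrative.
    """
    return [
        gs_binary,
        "-q", "-dBATCH", "-dNOPAUSE", "-dSAFER",
        "-sDEVICE=ocr",                # text output device backed by tesseract
        "-sOCRLanguage=" + language,
        "-r300",                       # assumed rendering resolution
        "-sOutputFile=-",              # write the recognised text to stdout
        pdf_path,
    ]


if __name__ == "__main__":
    print(choose_ocr_pipeline())
    print(" ".join(build_gs_ocr_command("/tmp/input.pdf")))
```

Passing `which` as a parameter keeps the decision testable without a real ghostscript installation.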
-
Jérome Perrin authored
-
Jérome Perrin authored
Also enable more plugins:
  - https://ckeditor.com/cke4/addon/autolink
  - https://ckeditor.com/cke4/addon/pastebase64
-
Kirill Smelkov authored
Wendelin.core is now an integral part of ERP5 (see [1,2]), but nothing inside ERP5 currently uses it. And even though wendelin.core has its own testsuite, integration problems are always possible.

-> Add test to erp5_core_test that minimally makes sure that basic wendelin.core operations work.

This test currently passes with wendelin.core 1, which is the default. It also passes as live test with wendelin.core 2. However with wendelin.core 2 it currently fails on testnodes like e.g.

    ValueError: ZODB.MappingStorage.MappingStorage is in-RAM storage
    in-RAM storages are not supported: a zurl pointing to in-RAM storage in one process would lead to another in-RAM storage in WCFS process.

and

    RuntimeError: wcfs: join file:///srv/slapgrid/slappart8/srv/testnode/djk/test_suite/unit_test.2/var/Data.fs: server not started

(https://nexedijs.erp5.net/#/test_result_module/20210530-92EF3124/102)

because we need to amend the ERP5 test driver

  1) to run tests on a real storage instead of in-RAM Mapping Storage(*), and
  2) to spawn a WCFS server for each such storage.

I will try to address those points in a later patch. In the meantime there should be no reason not to merge this, because we do not use wendelin.core 2 yet, and solving "1" and "2" first are preconditions to begin such a usage.

/cc @rafael, @tomo, @seb, @jerome, @romain, @vpelletier, @Tyagov, @klaus, @jp

(*) Combining Zope and WCFS working together requires data to be on a real storage, not on in-RAM MappingStorage inside Zope's Python process.

[1] slapos@7f877621
[2] slapos!874 (comment 122339)
-
Jérome Perrin authored
For the SEPA file "End to End ID", we need a reference on payment transactions. In the first prototype, the reference was set when generating the file, but generating references while producing reports is unusual behaviour, so we now ensure that the payment has a reference when it is added to the group.
-
Jérome Perrin authored
Instead of using only setAggregate(), specify a portal type to consider only payment transaction groups.
-
Jérome Perrin authored
Because of a wrong TALES expression, jump and select were not working on the aggregate relation field on accounting transaction line.
-
Jérome Perrin authored
Support generating pain.001.001.02 credit transfer from payment transaction groups.
-
Jérome Perrin authored
-
Jérome Perrin authored
Having a portal type is required to use the listbox as "Proxy Listbox ID" in a relation field.
-
Jérome Perrin authored
Follow the same rule that allows other organisations from the same group to use the bank account from the main section when selecting the payments and displaying the payments from the group.
-
Jérome Perrin authored
This constraint will have to be enabled by configurator/upgrader.
-
Jérome Perrin authored
-
Jérome Perrin authored
Instead of selecting already stopped payments, introduce a new "mode" field in the dialog. The user can choose either the previous behavior of selecting stopped payments, or a new behavior of selecting planned or confirmed payments; in the latter case the payments are automatically stopped before being added to the group.
-
Jérome Perrin authored
Functional test for erp5_payment_mean with ERP5JS
-
Jérome Perrin authored
Functional test for erp5_payment_mean
-
Jérome Perrin authored
Rename the field, because on xhtml_style, submitting the field with an empty limit now causes an error like:

    ValueError: invalid literal for int() with base 10: ''

and on ERP5JS it is ignored.

Also, make the field required and set a small default value and a range, to prevent accidentally displaying/selecting too much when there is a lot of matching data.
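The error quoted above is what `int('')` raises in Python. A minimal sketch of a tolerant parser with a default value and a range, in the spirit of the field change, could look like this; `parse_limit` and its default and bounds are hypothetical and are not the values used in the ERP5 form.

```python
def parse_limit(raw, default=20, minimum=1, maximum=1000):
    """Parse a submitted "limit" value, tolerating an empty submission.

    Hypothetical helper: default, minimum and maximum are illustrative.
    """
    text = "" if raw is None else str(raw).strip()
    if not text:
        # int('') would raise:
        # ValueError: invalid literal for int() with base 10: ''
        return default
    value = int(text)  # still raises ValueError for non-numeric input
    # Clamp to the allowed range to avoid selecting too much data.
    return max(minimum, min(maximum, value))


if __name__ == "__main__":
    for submitted in ("", "5", "99999"):
        print(repr(submitted), "->", parse_limit(submitted))
```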
-
Jérome Perrin authored
-
Jérome Perrin authored
-
Jérome Perrin authored
We don't want to show the ID, which has a prefix, but the reference.
-
Jérome Perrin authored
By default, tesseract runs on 4 CPUs; this can be controlled with OMP_THREAD_LIMIT=1 to run on only one CPU (as documented on https://tesseract-ocr.github.io/tessdoc/FAQ.html).

In ERP5, we tend to use one zope node per CPU, so we don't want each of these zope nodes to spawn a process which will run on 4 CPUs.

In a quick benchmark it's not slower, even a bit faster, to disable threads:

    ## a big French image (a picture of an invoice)

    $ time ./bin/tesseract /tmp/input.tiff /tmp/out.txt
    Tesseract Open Source OCR Engine v4.1.1 with Leptonica
    Page 1
    Error in pixClipBoxToForeground: box not within image
    Error in pixClipBoxToForeground: box not within image
    ________________________________________________________
    Executed in   14.41 secs    fish           external
       usr time   27.88 secs  1002.00 micros   27.88 secs
       sys time    0.74 secs     0.00 micros    0.74 secs

    $ time OMP_THREAD_LIMIT=1 ./bin/tesseract /tmp/input.tiff /tmp/out.txt
    Tesseract Open Source OCR Engine v4.1.1 with Leptonica
    Page 1
    Error in pixClipBoxToForeground: box not within image
    Error in pixClipBoxToForeground: box not within image
    ________________________________________________________
    Executed in   12.58 secs    fish           external
       usr time   11.84 secs   955.00 micros   11.84 secs
       sys time    0.52 secs   503.00 micros    0.52 secs

    ## a small japanese image

    $ time ./tesseract -l jpn+eng /tmp/inputjp.tiff /tmp/out.txt
    Tesseract Open Source OCR Engine v4.1.1 with Leptonica
    Page 1
    ________________________________________________________
    Executed in    2.16 secs    fish           external
       usr time    3.77 secs   590.00 micros    3.77 secs
       sys time    0.27 secs   209.00 micros    0.27 secs

    $ time OMP_THREAD_LIMIT=1 ./tesseract -l jpn+eng /tmp/inputjp.tiff /tmp/out.txt
    Tesseract Open Source OCR Engine v4.1.1 with Leptonica
    Page 1
    ________________________________________________________
    Executed in    2.02 secs      fish           external
       usr time 1766.07 millis 1437.00 micros 1764.63 millis
       sys time  214.06 millis  522.00 micros  213.54 millis
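When tesseract is spawned from Python, the same limit can be applied through the child process environment. The sketch below is an illustration only: `single_thread_env` and `run_tesseract` are hypothetical helper names and this is not the code of the commit.

```python
import os
import subprocess


def single_thread_env(base_env=None):
    """Return a copy of the environment with OMP_THREAD_LIMIT set to 1."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_THREAD_LIMIT"] = "1"  # keep tesseract on a single CPU
    return env


def run_tesseract(image_path, output_base, tesseract_binary="tesseract"):
    """Run tesseract on one image with threading disabled (hypothetical helper)."""
    return subprocess.run(
        [tesseract_binary, image_path, output_base],
        env=single_thread_env(),
        check=True,
    )


if __name__ == "__main__":
    print(single_thread_env({"PATH": "/usr/bin"}))
```

Copying the environment rather than modifying `os.environ` keeps the limit local to the spawned process.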
-
Jérome Perrin authored
-
Jérome Perrin authored
Because DMS extends image portal types with interaction workflows etc., it's better to also cover the case where DMS is installed.
-
Jérome Perrin authored
-
Jérome Perrin authored
-
Jérome Perrin authored
This fixes the problem that some formats, such as tiff, were not supported.
-
Jérome Perrin authored
testSQLCachedWorklist is now part of a dedicated erp5_worklist_sql_test business template.
-
Jérome Perrin authored
We had two mimetypes entries, which caused inconsistencies depending on whether the lookup was done by mimetype, by glob or by extension.

We had:

  - name: "Windows BMP image"
  - mimetypes: image/bmp image/x-bmp image/x-MS-bmp
  - extensions:
  - globs: *.bmp

and

  - name: "image/x-ms-bmp"
  - mimetypes: image/x-ms-bmp
  - extensions: bmp
  - globs:

With this commit they are merged into one:

  - name: "Windows BMP image"
  - mimetypes: image/x-ms-bmp image/bmp image/x-bmp image/x-MS-bmp
  - extensions: bmp
  - globs: *.bmp

This way we only have one consistent mimetype.

For compatibility with extension lookups (that are done in the Document_guessMimeType interaction workflow from DMS), image/x-ms-bmp is kept as default. This might not be the best choice, according to https://www.iana.org/assignments/media-types/media-types.xhtml
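To illustrate why a single merged entry gives consistent results, here is a small self-contained sketch. It is a toy registry, not the mimetypes registry API used by ERP5: the data structure and the `lookup_by_*` and `default_mimetype` functions are assumptions made for this example, while the entry values are the merged ones listed above.

```python
import fnmatch

# The merged entry from the commit message, as a plain dictionary.
BMP_ENTRY = {
    "name": "Windows BMP image",
    "mimetypes": ("image/x-ms-bmp", "image/bmp", "image/x-bmp", "image/x-MS-bmp"),
    "extensions": ("bmp",),
    "globs": ("*.bmp",),
}
ENTRIES = [BMP_ENTRY]


def lookup_by_mimetype(mimetype):
    """Return the first entry declaring this mimetype, or None."""
    return next((e for e in ENTRIES if mimetype in e["mimetypes"]), None)


def lookup_by_extension(extension):
    """Return the first entry declaring this extension, or None."""
    return next((e for e in ENTRIES if extension in e["extensions"]), None)


def lookup_by_glob(filename):
    """Return the first entry with a glob matching the file name, or None."""
    name = filename.lower()
    for entry in ENTRIES:
        if any(fnmatch.fnmatchcase(name, glob) for glob in entry["globs"]):
            return entry
    return None


def default_mimetype(entry):
    """The first listed mimetype is treated as the default."""
    return entry["mimetypes"][0]


if __name__ == "__main__":
    found = (
        lookup_by_mimetype("image/bmp"),
        lookup_by_extension("bmp"),
        lookup_by_glob("Logo.BMP"),
    )
    print([default_mimetype(entry) for entry in found])
```

With two separate entries, the extension lookup and the glob lookup would have returned different entries, which is the inconsistency the commit removes.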
-
Jérome Perrin authored
This script creates Web Message, not Mail Message
-
Jérome Perrin authored
When updating ghostscript, the rendering of images seems slightly different, as we can observe on the logos.
-
Roque authored
-
Roque authored
-
- 07 Jun, 2021 1 commit
-
-
Aurel authored
-
- 03 Jun, 2021 1 commit
-
-
Aurel authored
-
- 02 Jun, 2021 2 commits
- 26 May, 2021 3 commits
-
-
Aurel authored
-
Aurel authored
-
Jérome Perrin authored
Fixes [#20210517-960A47](https://erp5js.nexedi.net/#/bug_module/20210517-960A47)

The most important changes are:
  - coding style is enabled again for workflow scripts and starts to be enabled for ERP5 Python scripts
  - monaco editor support for workflow scripts, SQL methods and .less
  - small fixes for python/workflow scripts forms and ZMI

See merge request nexedi/erp5!1422
-