francois authored
This commit contains the business template that takes a receipt image as a source, binarizes it, segments it, and applies OCR to it. It then extracts the meaning with regular expressions. The image must already be loaded into the image module before it can be read.

The business template contains:
* The receipt recognition module
* An extension containing the code that binarizes, crops and segments the image, then analyzes it
* A new type "Receipt" that contains a source image and a field holding the "total" value
* A portal skin folder containing the extension externalMethods as well as the conversion script that calls the recognition and updates the Receipt "total" field

Improvements (not limited to this list):
- Easier loading of pictures: directly from the receipt page.
- Easier loading of pictures 2: from a phone with an OfficeJS (or any renderJS) application?
- Detect when images are sideways and rotate them upright
- Better "boxing" and segmentation: some lines are deleted from the original image during segmentation when they are too close to other lines
- Modify the neural network (LSTM) to increase the weight of signs like $, euro, / and numbers
- Use a faster/smaller neural network: most of the time is spent loading the neural network
- Cache the neural network: see the previous item.
- Extract the currency, date and receipt issuer.
- Use a neural network for the meaning extraction?
4ba30106
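
Note: the meaning-extraction step described in the commit message (pulling the "total" out of the OCR text with regular expressions) could look roughly like the sketch below. This is not the code from the commit: the pattern `TOTAL_RE` and the function `extract_total` are hypothetical names, and the pattern assumes simple amounts with an optional two-digit decimal part and no thousands separators.

```python
import re
from decimal import Decimal

# Hypothetical pattern (not taken from the commit): the standalone word
# "total", then any non-digit characters on the same line (spaces, a
# currency code or sign), then an amount such as "12", "12.50" or "12,50".
TOTAL_RE = re.compile(
    r"\btotal\b[^\d\n]*(\d+(?:[.,]\d{2})?)",
    re.IGNORECASE,
)


def extract_total(ocr_text):
    """Return the last amount following the word 'total', or None."""
    matches = TOTAL_RE.findall(ocr_text)
    if not matches:
        return None
    # Assumption: the final "total" line on a receipt is the amount due.
    # Normalise a decimal comma to a decimal point before parsing.
    return Decimal(matches[-1].replace(",", "."))


if __name__ == "__main__":
    sample = "MILK 2.50\nBREAD 1,20\nSubtotal 3.70\nTOTAL EUR 3,70\nCASH 5.00"
    print(extract_total(sample))
```

The word boundary keeps "Subtotal" from matching, and taking the last match is only a heuristic; real OCR output is noisier, so a production pattern would likely need to tolerate misread characters.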