BluePrintOCR – SahanaEden

Context Navigation

Version 8 (modified by Michael Howden, 15 years ago) ( diff )
--

Blueprint for Optical Character Recognition

Functionality

Be able to scan in a paper-based form to populate the database

This would be useful if Sahana is being used to generate forms which are printed, and filled out by hand, then can be scanned back, directly, into the database.
It may be impractical to get people to fill out forms in handwriting which can be "recognized".

Being able to identify check-boxes being checked - and design forms which rely heavily on check boxes.
Being able to copy blocks of test out of a hand written form, and display it on screen, next to an editable text box, where the text can be "recognized" and entered manually.

Technology

The C++ code written for SahanaPHP (during GSoC 2007) can almost-certainly be tweaked to work with SahanaPy:

http://sahana.cvs.sourceforge.net/viewvc/sahana/sahana-phase2/bin/ocr/?pathrev=rel_gsoc_2007

This version uses OpenCV & FANN

A Firefox add-on to enable a nice workflow for users is being developed for SahanaPHP as part of GSoC 2009:

http://sahana.cvs.sourceforge.net/viewvc/sahana/sahana-phase2/bin/ocr/?pathrev=gsoc_2009

This will access the Scanner (e.g. using TWAIN or SANE) and read the Image. The acquired image will be passed to the OCR library & the result will be posted into the web form.
Again, this should be easy to tweak to get working with Py.

Possibility of using pytesser ( http://code.google.com/p/pytesser/ ) with cross platform tesseract-ocr ( http://code.google.com/p/tesseract-ocr/ )

Plone uses Tesseract: http://plone.org/documentation/tutorial/ocr-in-plone-using-tesseract-ocr

BluePrints

Attachments (1)

xforms_ocr.png (25.0 KB ) - added by Dominic König 14 years ago.

Download all attachments as: .zip

Note: See TracWiki for help on using the wiki.

Download in other formats:

Plain Text