
Version 8 (modified by Michael Howden, 12 years ago)


Blueprint for Optical Character Recognition

Functionality

Be able to scan a paper-based form and use its contents to populate the database.

This would be useful if Sahana is used to generate forms which are printed, filled out by hand, and then scanned directly back into the database.
It may be impractical to get people to fill out forms in handwriting which can be reliably "recognized". Two more practical alternatives are:

  • Being able to identify which check-boxes have been checked, and designing forms which rely heavily on check-boxes.
  • Being able to copy blocks of text out of a handwritten form and display them on screen next to an editable text box, so that a person can "recognize" the text and enter it manually.
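
The two ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GSoC code: the scanned page is assumed to be already loaded as a list of rows of grayscale values (0 = black, 255 = white), and the box coordinates, the function names and the thresholds are all hypothetical values that a real form layout would have to supply.

```python
# Minimal sketch (assumptions: grayscale image as a list of rows,
# hypothetical box coordinates taken from the printed form layout).

def crop(image, box):
    """Cut a rectangular block out of the page, e.g. a handwritten field
    to be shown on screen next to an editable text box."""
    left, top, width, height = box
    return [row[left:left + width] for row in image[top:top + height]]


def dark_fraction(image, box, dark_below=128):
    """Fraction of pixels inside the box that are darker than `dark_below`."""
    region = crop(image, box)
    total = sum(len(row) for row in region)
    if total == 0:
        return 0.0
    dark = sum(1 for row in region for value in row if value < dark_below)
    return dark / float(total)


def is_checked(image, box, min_dark_fraction=0.2):
    """Treat a check-box as checked when enough of its area is inked.
    The 0.2 threshold is an arbitrary illustrative value."""
    return dark_fraction(image, box) >= min_dark_fraction


# A blank 10x10 page, and a copy with an "X" drawn inside the box.
blank = [[255] * 10 for _ in range(10)]
ticked = [row[:] for row in blank]
for i in range(2, 8):
    ticked[i][i] = 0
    ticked[i][9 - i] = 0

checkbox = (2, 2, 6, 6)  # hypothetical (left, top, width, height)
blank_result = is_checked(blank, checkbox)
ticked_result = is_checked(ticked, checkbox)
```

A real implementation would first need to deskew and align the scan so that the box coordinates match the printed form; that step is left out here.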

Technology

The C++ code written for SahanaPHP (during GSoC 2007) can almost certainly be tweaked to work with SahanaPy:

This version uses OpenCV & FANN

A Firefox add-on to enable a nice workflow for users is being developed for SahanaPHP as part of GSoC 2009:

The add-on will access the scanner (e.g. using TWAIN or SANE) and read the image. The acquired image will be passed to the OCR library, and the result will be posted into the web form.
Again, this should be easy to tweak to get working with SahanaPy.
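
The last step of that workflow, posting the recognized values into a web form, might look roughly like this from Python. It is a sketch only: the form URL and the field names are hypothetical, and it says nothing about how the Firefox add-on itself does it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_post(form_url, recognized_fields):
    """Build (but do not send) a POST request carrying the OCR results
    as ordinary URL-encoded form fields."""
    body = urlencode(recognized_fields).encode("utf-8")
    return Request(
        form_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical URL and field names, for illustration only.
request = build_form_post(
    "http://localhost:8000/sahana/example/form",
    {"first_name": "Ana", "injured": "on"},
)
```

Passing the request to `urllib.request.urlopen` would actually submit it; a real deployment would also have to deal with authentication and with whatever form tokens the application expects.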

Another possibility is to use pytesser ( http://code.google.com/p/pytesser/ ) with the cross-platform tesseract-ocr ( http://code.google.com/p/tesseract-ocr/ ).
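
As a rough idea of what the Tesseract route involves, the sketch below drives the `tesseract` command-line tool from Python. It does not use pytesser itself. It assumes the `tesseract` executable is installed and on the PATH, and the helper names are made up for this example.

```python
import os
import shutil
import subprocess
import tempfile


def tesseract_command(image_path, output_base):
    """Command line for `tesseract <image> <output base>`;
    tesseract writes its text to "<output base>.txt"."""
    return ["tesseract", image_path, output_base]


def ocr_image(image_path):
    """Run tesseract on one image file and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    workdir = tempfile.mkdtemp()
    try:
        output_base = os.path.join(workdir, "ocr_result")
        subprocess.check_call(tesseract_command(image_path, output_base))
        with open(output_base + ".txt", encoding="utf-8") as handle:
            return handle.read()
    finally:
        # Remove the temporary directory whether or not OCR succeeded.
        shutil.rmtree(workdir, ignore_errors=True)
```

Calling `ocr_image("scanned_form.png")` on a cropped field would return whatever text Tesseract finds; as noted above, results on handwriting are likely to be poor, so this fits printed text better.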

Plone uses Tesseract: http://plone.org/documentation/tutorial/ocr-in-plone-using-tesseract-ocr


