== Blueprint for Optical Character Recognition ==
=== Functionality ===
Be able to scan a paper-based form in order to populate the database.

 * http://wiki.sahana.lk/doku.php/sahanaocr
 * http://wiki.sahana.lk/doku.php?id=dev:sahana_xform
 * http://humanitariantech.com/2009/11/16/talking-papers-a-world-without-data-entry/

This would be useful if Sahana is used to generate forms that are printed, filled out by hand, and then scanned directly back into the database.[[BR]]
However, it may be impractical to get people to fill out forms in handwriting that can be reliably "recognized". Alternatives include:
 * Identifying which check boxes have been ticked, and designing forms that rely heavily on check boxes.
 * Copying blocks of text out of a handwritten form and displaying them on screen next to an editable text box, where a person can read the text and enter it manually.
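Both alternatives can be sketched in a few lines of Python. This is a minimal illustration, not code from the Sahana OCR project: it assumes the scanned page is already aligned and available as rows of 8-bit grayscale values (0 = black, 255 = white), and that box positions come from the form template. The names `dark_fraction`, `is_checked` and `crop`, and the threshold values, are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representations: rows of grayscale pixel values, and a
# region given as (top, left, height, width) in pixels.
Image = List[List[int]]
Box = Tuple[int, int, int, int]


def dark_fraction(image: Image, box: Box, dark_threshold: int = 128) -> float:
    """Return the fraction of pixels inside `box` darker than `dark_threshold`.

    Pass the interior of the check box, excluding its printed border,
    otherwise an empty box already counts as partly dark.
    """
    top, left, height, width = box
    dark = 0
    for row in image[top:top + height]:
        for value in row[left:left + width]:
            if value < dark_threshold:
                dark += 1
    return dark / float(height * width)


def is_checked(image: Image, box: Box, fill_ratio: float = 0.2) -> bool:
    """Treat the box as ticked if at least `fill_ratio` of it is dark (assumed cut-off)."""
    return dark_fraction(image, box) >= fill_ratio


def crop(image: Image, box: Box) -> Image:
    """Cut out a region, e.g. a handwritten text field to show beside an edit box."""
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]
```

For example, on a blank 4x4 page with two black pixels at (1, 1) and (2, 2), `is_checked(page, (1, 1, 2, 2))` is true because half the region is dark. A fixed dark-pixel ratio is the simplest possible rule; real scans would need tuning for noise and pen pressure.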

=== Technology ===

The C++ code written for SahanaPHP during GSoC 2007 can almost certainly be adapted to work with !SahanaPy:
 * http://sahana.cvs.sourceforge.net/viewvc/sahana/sahana-phase2/bin/ocr/?pathrev=rel_gsoc_2007
This version uses [http://opencv.willowgarage.com/wiki OpenCV] (computer vision library) and [http://leenissen.dk/fann FANN] (Fast Artificial Neural Network library).

A Firefox add-on that provides a smoother workflow for users is being developed for SahanaPHP as part of GSoC 2009:
 * http://sahana.cvs.sourceforge.net/viewvc/sahana/sahana-phase2/bin/ocr/?pathrev=gsoc_2009
This will access the scanner (e.g. using TWAIN or SANE) and read the image. The acquired image will be passed to the OCR library and the result will be posted into the web form.[[BR]]
Again, this should be easy to adapt to work with !SahanaPy.
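The final step of that workflow, getting recognized values into a form, can be sketched with the Python standard library. Note this swaps in a different technique from the add-on: instead of filling fields in the browser, it builds a direct HTTP POST. The URL and field names are hypothetical, not real Sahana endpoints, and a real web form may also require hidden anti-forgery fields that this sketch does not handle.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_post(form_url, recognized_fields):
    """Build (but do not send) a POST request carrying the OCR results."""
    # URL-encode the field values exactly as a browser would for a plain HTML form.
    body = urlencode(recognized_fields).encode("utf-8")
    return Request(
        form_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical URL and field names, for illustration only.
request = build_form_post(
    "http://localhost:8000/sahana/pr/person/create",
    {"first_name": "Jane", "injured": "on"},
)
# urllib.request.urlopen(request) would submit it to a running server.
```

Building the request separately from sending it keeps the OCR-to-field mapping testable without a live server.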

Another possibility is to use pytesser (http://code.google.com/p/pytesser/) with the cross-platform tesseract-ocr engine (http://code.google.com/p/tesseract-ocr/).
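A wrapper in the style of pytesser essentially runs the `tesseract` command-line program on an image and reads back the text. The sketch below is a hedged illustration, not pytesser's own API: it assumes a `tesseract` binary on the PATH, recent enough to accept `stdout` as the output base (older releases need a file name and write `<outputbase>.txt` instead), with English (`eng`) language data installed. The function names are hypothetical.

```python
import subprocess


def tesseract_command(image_path, lang="eng"):
    """Build the argument list: tesseract <image> stdout -l <lang>."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def image_to_text(image_path, lang="eng"):
    """Run tesseract on `image_path` and return the recognized text."""
    result = subprocess.run(
        tesseract_command(image_path, lang),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if tesseract exits with an error
    )
    return result.stdout.decode("utf-8")
```

Usage would be along the lines of `image_to_text("scanned_form.png")` (hypothetical file name). Calling the executable rather than linking the C++ library keeps the Python side simple and cross-platform, at the cost of one process launch per image.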

Plone uses Tesseract: http://plone.org/documentation/tutorial/ocr-in-plone-using-tesseract-ocr

----
BluePrints