
Sahana Eden OCR Integration

The Optical Character Recognition (OCR) module of Sahana Eden has some additional dependencies and can be configured to suit the needs of a deployment.

If the OCR module is not enabled, enable it by uncommenting the ocr block in models/000_config.py in the eden directory.

Dependencies

python modules

  1. python-lxml
  2. python-imaging (PIL)
  3. python-reportlab
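Before enabling the module, it can help to confirm that these Python modules can be imported by the interpreter that runs web2py. The sketch below assumes the usual import names `lxml.etree`, `PIL.Image` and `reportlab` for the three packages; the helper name `check_python_deps` is hypothetical.

```python
import importlib

# Import names assumed for the packages listed above
REQUIRED_MODULES = {
    "python-lxml": "lxml.etree",
    "python-imaging (PIL)": "PIL.Image",
    "python-reportlab": "reportlab",
}


def check_python_deps(required=REQUIRED_MODULES):
    """Return {package: True/False} depending on whether its module imports."""
    status = {}
    for package, module_name in required.items():
        try:
            importlib.import_module(module_name)
            status[package] = True
        except ImportError:
            status[package] = False
    return status


if __name__ == "__main__":
    for package, ok in check_python_deps().items():
        print("%-22s %s" % (package, "OK" if ok else "MISSING"))
```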

command-line tools

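  1. ImageMagick 'convert'
  2. Tesseract 3.00-1 or later (the commands below build Tesseract 3.02.02 and Leptonica 1.69 from source)
    # ImageMagick provides the 'convert' tool
    apt-get install -y imagemagick
    # Older Tesseract versions could be installed from distribution packages instead:
    #apt-get install -y libleptonica-dev tesseract-ocr
    # Build Leptonica, which Tesseract 3.02 depends on
    wget http://www.leptonica.com/source/leptonica-1.69.tar.gz
    tar zxvf leptonica-1.69.tar.gz
    cd leptonica-1.69
    ./configure
    make
    make install
    cd ..
    # Build Tesseract 3.02.02
    wget http://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.02.tar.gz
    tar zxvf tesseract-ocr-3.02.02.tar.gz
    cd tesseract-ocr
    ./configure
    make
    make install
    # Refresh the shared-library cache so the newly installed libraries are found
    ldconfig
    cd ..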
    
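After installation, a quick way to confirm that both tools are on the PATH is sketched below. It only looks up the `convert` and `tesseract` executables with `shutil.which` (Python 3.3+) and does not check the Tesseract version; the helper name `missing_tools` is hypothetical.

```python
import shutil

# Executables the OCR workflow relies on (see the list above)
REQUIRED_TOOLS = ("convert", "tesseract")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing command-line tools: " + ", ".join(missing))
    else:
        print("All OCR command-line tools found")
```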

Configuration

Exclude Component Tables

Each resource table in Sahana Eden can have several component tables. When generating a paper-based PDF form for a resource, including some of these components often makes little sense.

For example, it makes little sense to include the staff component table in the hospital registry form: a hospital has many staff members, so a form with room for a single staff entry is of little use. It is better to exclude that component and generate a separate form for the component table.

The components to exclude for each resource are defined in the get_pdf_excluded_fields method in modules/s3/s3cfg.py; s3pdf.py reads this configuration before generating a PDF form.
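Conceptually, the lookup lets the form generator drop the listed components before laying out the form. The sketch below illustrates that idea only; it is not the code in s3pdf.py, the function `components_for_pdf` is hypothetical, and the component name `hms_bed_capacity` in the usage is illustrative.

```python
def components_for_pdf(resourcename, component_tables, get_excluded):
    """
    Return the component tables to include in the PDF form (hypothetical helper).

    get_excluded is any callable with the same contract as
    get_pdf_excluded_fields: resource name in, list of excluded tables out.
    """
    excluded = set(get_excluded(resourcename))
    # Keep the original order of the remaining components
    return [table for table in component_tables if table not in excluded]


if __name__ == "__main__":
    exclusions = {"hms_hospital": ["hrm_human_resource"]}
    print(components_for_pdf("hms_hospital",
                             ["hms_bed_capacity", "hrm_human_resource"],
                             lambda name: exclusions.get(name, [])))
    # prints ['hms_bed_capacity']
```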

Example Configuration:

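    def get_pdf_excluded_fields(self, resourcename):
        """
            Return the component tables to leave out of the PDF form
            generated for the given resource (empty list if none)
        """
        # Map each resource table name to the component tables to exclude
        excluded_fields_dict = {
            "hms_hospital": [
                "hrm_human_resource",
            ],
            "pr_group": [
                "pr_group_membership",
            ],
        }
        # Resources without an entry have no excluded components
        excluded_fields = excluded_fields_dict.get(resourcename, [])
        return excluded_fields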

In the above configuration, the hrm_human_resource component of hms_hospital and the pr_group_membership component of pr_group are excluded.
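To see what the lookup returns, the method can be exercised outside Eden on a small stand-in class. `PDFSettings` is a hypothetical placeholder for the settings class in modules/s3/s3cfg.py; the mapping is the one from the example configuration.

```python
class PDFSettings(object):
    """Hypothetical stand-in for the settings class in modules/s3/s3cfg.py."""

    def get_pdf_excluded_fields(self, resourcename):
        # Same mapping as the example configuration
        excluded_fields_dict = {
            "hms_hospital": ["hrm_human_resource"],
            "pr_group": ["pr_group_membership"],
        }
        return excluded_fields_dict.get(resourcename, [])


settings = PDFSettings()
print(settings.get_pdf_excluded_fields("hms_hospital"))  # ['hrm_human_resource']
print(settings.get_pdf_excluded_fields("pr_group"))      # ['pr_group_membership']
print(settings.get_pdf_excluded_fields("pr_person"))     # []
```

Because unknown resources fall back to an empty list, every component is included unless it is listed explicitly.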

Workflow Diagrams

Generating PDF Forms

http://eden.sahanafoundation.org/raw-attachment/wiki/BluePrint/OCRIntegration/generated.png

Data Import from Image to Text

http://eden.sahanafoundation.org/raw-attachment/wiki/BluePrint/OCRIntegration/importflow.png
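The recognition step in this flow relies on the two command-line tools listed under Dependencies. The sketch below shows one way to drive them from Python with subprocess: ImageMagick converts the scanned page to TIFF, then Tesseract 3.x writes the recognised text to `<outbase>.txt`. It is an illustration under those assumptions, not the code Eden uses; the helpers `ocr_commands` and `image_to_text` and all file names are hypothetical.

```python
import subprocess


def ocr_commands(image_path, outbase):
    """Build the argv lists for converting an image and running Tesseract on it."""
    tiff_path = outbase + ".tif"
    return [
        # ImageMagick: convert the scanned image to TIFF
        ["convert", image_path, tiff_path],
        # Tesseract 3.x: read the TIFF, write recognised text to <outbase>.txt
        ["tesseract", tiff_path, outbase],
    ]


def image_to_text(image_path, outbase):
    """Run both commands in order and return the recognised text."""
    for argv in ocr_commands(image_path, outbase):
        subprocess.check_call(argv)
    with open(outbase + ".txt", encoding="utf-8") as text_file:
        return text_file.read()
```

For example, `image_to_text("scanned_form.jpg", "scanned_form")` would produce scanned_form.tif and scanned_form.txt and return the contents of the latter; `check_call` raises `CalledProcessError` if either tool fails.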

Review User Interface

http://eden.sahanafoundation.org/raw-attachment/wiki/BluePrint/OCRIntegration/reviewUI.png

Last modified on 01/28/13 18:50:26
