Version 9 (modified by Fran Boon, 14 years ago) ( diff )



Spreadsheet Importer

We have access to a lot of relatively unstructured data in Spreadsheets which we need to import into Sahana. Currently this is done via a laborious Data Entry process.

We really need to be able to do Bulk Imports.

Example Data sources:


CSV Import

Here is a process that works, although is a bit cumbersome, so we should look to automate as much as possible & improve:

  1. Get a spreadsheet of good Data together (e.g. using Google Docs)
  2. Save as CSV
  3. Open in Open Office (MS Excel destroys data - do NOT use!)
  4. Export copies of the tables you wish to import to in CSV format, e.g.:
    • /gis/location.csv
    • /hms/hospital.csv
  5. Plan how you are going to deal with duplicates between these datasets.
    • e.g. copy exported data into Spreadsheet & then clear table: db.module_resource.truncate()
  6. Plan the order of imports:
    • Need to import referenced tables before those which refer to them, e.g.:
      • pr_person, gis_location, or_organisation, or_office, or_contact
  7. Copy the data from the datasheet into the one with the column headers
    • NB All column headers need to be present, named correctly, in the correct order
    • No blank rows in the sheet
  8. Upload the file to the server: /home/haiti/web2py
  9. Do a Test Import on Dev
    • Check that data looks good
  10. Repeat on Prod:
    vim /etc/crontab
    #0-59/1 * * * * www-data cd /home/haiti/web2py/ && python -C -D 1 >> /tmp/cron.output 2>&1
    vim /home/haiti/prod/models/
    rm /etc/apache2/sites-enabled/prod-ssl
    /etc/init.d/apache2 force-reload
    rows = db(>0).select()
    for row in rows:
        if row.lon and and not row.wkt:
            db.gis_location[] = dict(wkt = 'POINT(%f %f)' % (row.lon,
        if not row.deleted:
            db.gis_location[] = dict(deleted = False)
        if not row.created_on:
            db.gis_location[] = dict(created_on = request.utcnow)
        if not row.modified_on:
            db.gis_location[] = dict(modified_on = request.utcnow)
    Set the initial ID in import sheet to be the last+1 & then continue on
    rows = db(>0).select()
    for row in rows:
        if not row.deleted:
            db.hms_hospital[] = dict(deleted = False)
        if not row.created_on:
            db.hms_hospital[] = dict(created_on = request.utcnow)
        if not row.modified_on:
            db.hms_hospital[] = dict(modified_on = request.utcnow)
    ln -s /etc/apache2/sites-available/prod-ssl /etc/apache2/sites-enabled
    /etc/init.d/apache2 force-reload
    vim /home/haiti/prod/models/
    vim /etc/crontab
    0-59/1 * * * * www-data cd /home/haiti/web2py/ && python -C -D 1 >> /tmp/cron.output 2>&1


  • Do data-de-duplication
    • hook into the Model's validators
  • Provide a Controller to allow web-based access at: /admin/import_data (The current implementation broke at some point)
  • Use functions imported from a new modules/ for the WKT-creation

Google Spreadsheets API




Note: See TracWiki for help on using the wiki.