SpreadsheetImporter – SahanaEden

Context Navigation

Version 9 (modified by Fran Boon, 16 years ago) ( diff )
--

Haiti

Spreadsheet Importer

We have access to a lot of relatively unstructured data in Spreadsheets which we need to import into Sahana. Currently this is done via a laborious Data Entry process.

We really need to be able to do Bulk Imports.

Example Data sources:

http://spreadsheets.google.com/ccc?key=0Aq_3OYelM4ZUdHJyc0lUekZLUmEwa3ZSU2dJS3YyWWc&hl=en_GB (Data Entry tab)
- The OCHA contacts list is a task we need to do regularly, so need to be able to handle duplicates effectively. We can influence them to standardise their format to make it easier.

Options:

CSV Import

Here is a process that works, although is a bit cumbersome, so we should look to automate as much as possible & improve:

Get a spreadsheet of good Data together (e.g. using Google Docs)
Save as CSV
Open in Open Office (MS Excel destroys data - do NOT use!)
Export copies of the tables you wish to import to in CSV format, e.g.:
- /gis/location.csv
- /hms/hospital.csv
Plan how you are going to deal with duplicates between these datasets.
- e.g. copy exported data into Spreadsheet & then clear table: db.module_resource.truncate()
Plan the order of imports:
- Need to import referenced tables before those which refer to them, e.g.:
  - pr_person, gis_location, or_organisation, or_office, or_contact
Copy the data from the datasheet into the one with the column headers
- NB All column headers need to be present, named correctly, in the correct order
- No blank rows in the sheet
Upload the file to the server: /home/haiti/web2py
Do a Test Import on Dev
- Check that data looks good

Repeat on Prod:

vim /etc/crontab
#0-59/1 * * * * www-data cd /home/haiti/web2py/ && python web2py.py -C -D 1 >> /tmp/cron.output 2>&1
vim /home/haiti/prod/models/00_db.py
 migrate=True
rm /etc/apache2/sites-enabled/prod-ssl
/etc/init.d/apache2 force-reload
/home/haiti/update-prod
/home/haiti/shell-prod
db(db.gis_location.id>0).select().last().id
db.gis_location.import_from_csv_file(open('hospital_locations.csv','rb'))
rows = db(db.gis_location.id>0).select()
for row in rows:
    if row.lon and row.lat and not row.wkt:
        db.gis_location[row.id] = dict(wkt = 'POINT(%f %f)' % (row.lon, row.lat))
    if not row.deleted:
        db.gis_location[row.id] = dict(deleted = False)
    if not row.created_on:
        db.gis_location[row.id] = dict(created_on = request.utcnow)
    if not row.modified_on:
        db.gis_location[row.id] = dict(modified_on = request.utcnow)

db.commit()
Ctrl+D
Set the initial ID in import sheet to be the last+1 & then continue on
/home/haiti/shell-prod
db.hms_hospital.import_from_csv_file(open('hospitals_import.csv','rb'))
rows = db(db.hms_hospital.id>0).select()
for row in rows:
    if not row.deleted:
        db.hms_hospital[row.id] = dict(deleted = False)
    if not row.created_on:
        db.hms_hospital[row.id] = dict(created_on = request.utcnow)
    if not row.modified_on:
        db.hms_hospital[row.id] = dict(modified_on = request.utcnow)

db.commit()
Ctrl+D
ln -s /etc/apache2/sites-available/prod-ssl /etc/apache2/sites-enabled
/etc/init.d/apache2 force-reload
vim /home/haiti/prod/models/00_db.py
 migrate=False
vim /etc/crontab
0-59/1 * * * * www-data cd /home/haiti/web2py/ && python web2py.py -C -D 1 >> /tmp/cron.output 2>&1

ToDo

Do data-de-duplication
- hook into the Model's validators
Provide a Controller to allow web-based access at: /admin/import_data (The current implementation broke at some point)
Use functions imported from a new modules/s3gis.py for the WKT-creation

Note: See TracWiki for help on using the wiki.

Download in other formats:

Plain Text

Context Navigation

Spreadsheet Importer

CSV Import

ToDo

Google Spreadsheets API

Excel

Other

Download in other formats: