wiki:UserGuidelines/Importer

Version 42 (modified by Fran Boon, 12 years ago) ( diff )

--

Importer

Spreadsheets

3 options available:

  • CSV (preferred)
  • Open Office
  • Excel via interactive UI (incomplete)

CSV

Can import data from spreadsheets in CSV (Comma Separated Values) format as long as it is formatted properly.

This can be done to PrePopulate an instance.

After 1st run, there are menu links for many resources, but any REST controller can have the 'import' method invoked if there is a suitable .xsl file as static/formats/s3csv/module/resource.xsl:

http://host.domain/module/resource/import

Files & Stylesheets can be loaded from alternate sources, such as:

http://host.domain/eden/hrm/person.s3csv/create?fetchurl=http://my.host/HR.csv&transform=/home/web2py/applications/eden/static/formats/s3csv/hrm_import.xsl

Alternately these can be done using CLI scripts:

Old: Web2Py native

This is 1 sheet per-table, not per-resource & foreign keys need to be as numeric IDs, so there is a fair amount of setup.

Can do this via CLI (put file 'volunteers.csv' into private/import):

import_file = os.path.join(request.folder, "private", "import", "volunteers.csv")
table.import_from_csv_file(open(import_file, "r"))

or Web UI (to be checked):

Open Office

Can import data from spreadsheets in .fods format (e.g. created by Open Office).

This is best as it handles complex resources.

via web:

(1st is path on local hard disk. If this doesn't work (e.g. 'failed to load external entity' then try loading onto a webserver)

or from Web2Py CLI:

resource = s3mgr.define_resource("hms", "hospital")
template = os.path.join(request.folder, resource.XSLT_PATH, "fods", "import.xsl")
resource.import_xml("myhospitals.fods", template=template)

Structure of the file should be as follows:

  • Sheet name = resource name
  • First row = attribute/field names
  • Other rows = attribute/field values

References

  1. Insert another sheet with the entries of the referenced table, this sheet must have the same name as the referenced table and a column "uuid"
  2. Insert a column in the main sheet with a column title like:
    reference:<fieldname>:<referenced_table>
    

e.g. "reference:organisation_id:org_organisation"

  1. fill in this column with the UUIDs of the respective entries in the referenced table

Components:

  1. Insert another sheet with the entries of the component table, this sheet must have a name like:
    <main-table-name>+<component-table-name>
    

e.g. "hms_hospital+hms_hactivity"

  1. Insert a column into the component sheet with a name like:
    <main-table-name>.<key-field>
    

e.g. "hms_hospital.gov_uuid"
=> the key field must be present in the main table
=> the key field must be unique (primary key) in the main table

  1. Fill in this column with the respective values of the key field in the parent entry of the main table

Excel via interactive UI

  • GSoC project - merged with Trunk, but disabled by default as not fully working

Outline

  • Spreadsheet importer, in its current form, can be used to import data in .XLS files to the Sahana Eden database
  • It has a spreadsheet like interface to enable users to edit spreadsheets in the browser itself.
  • Referenced resources, like pr_person in org_staff, can be imported as well.
  • Records, which could not be validated, are not inserted into the database. These records are displayed to the users in a spreadsheet interface. Records have at least one field in red. This is the field with an incorrect value. The format in which errors are displayed is : <field-name>: <error> . You entered <value entered by user> . Records can then be corrected and re-imported.
  • The spreadsheet importer uses the Jaro-Winkler metric to determine the similarity of two rows. Similarity between rows is checked two at a time. Similar rows are displayed consecutively in another spreadsheet like interface. The user can choose to edit, delete and then import these rows to the database. Errors in these rows are displayed in the same form as described previously.

Steps

Video Demo

  1. Import spreadsheet
    • The importer can be accessed at http://<web address>/<application name>/importer . This is the landing page for the importer.
    • Click on spreadsheet.
    • Upload a spreadsheet. Add a name to the spreadsheet record, and a comment if you wish.
    • Tips:
      • Keep a single worksheet in the spreadsheet.
      • Keep the number of rows around 100. More rows require more rendering and processing,therefore more time. This could also cause server requests to time out. (Please note, 100 rows is by no means a limitation, it is a simply a safe number of rows to render and import, keeping in mind the strength of internet connection. The number of rows isn't really a problem in an offline client of course)
  1. Edit spreadsheet
    • You can make changes to the spreadsheet in this step. You can modify cells, delete and add rows.
    • You can search for data. Search returns the first occurrence of a the string in the spreadsheet. Substrings in cells are also matched.
    • Rows can be deleted by either clicking the cross on the right end of each row.
    • A checkbox on the left of each row selects that row. You can click the delete button in the tool bar to delete the select row. You can only select one row at a time.
    • Prior to clicking import, select a row which has titles for each column in the spreadsheet. You will be prompted to select one if you haven't done so already.
  1. Select module and resource
    • This step of the spreadsheet importer wizard has two sections, so to speak.
    • On the left is a list of modules in the Sahana Eden installation. You can select a single module from this list.
    • As soon as a selection is made from the modules list, the panel on the right updates to show which resources are present in that module. You can select one resource.
    • When you click the next button, the module fetches resources and fields from the server. Depending upon server speed, this process could take some time. Please be patient during this step.
  1. Mapping columns to fields
    • The column titles selected in the second step are displayed next to dropdowns in this step.
    • The drop down lists are populated with the fields present in selected resource. There are two types of fields.
      • Normal fields.
      • Reference fields. These fields are are actually references to different tables in the Eden schema. In the drop down list, they are represented as <field-name> --> <resource-name> --> <field in the nested resource>. Therefore, reference to a person in org_staff is represented in the drop down as person_id --> pr_person --> first_name etc.
    • All columns must be matched to fields. Otherwise, you cannot proceed.
    • The data is extracted from the spreadsheet and then processed for import to the database.
  1. Edit invalid rows
    • The database only accepts those records/rows which satisfy validation rules.
    • In case some rows are present which do not conform to the validation enforced on the database, they are rejected. Rows in which all fields have valid values are accepted.
    • The spreadsheet interface shows only those rows which returned an error. The fields which could not be validated are shown in red. Each row will have at least one red field.
    • Errors are represented as <field-name> : <error message>. You entered <your value>. The text in the entire field has to be replaced with the correct value.
    • If you do not wish to import a row, make sure that it is deleted.
    • Click on import to move to the next step.
  1. Edit similar rows
    • In large spreadsheets, it is possible that some rows may be similar. The spreadsheet importer determines such rows.
    • These rows are represented in a spreadsheet format are displayed. Two consecutive rows (like 1 & 2, 3 & 4) are similar.
    • You can edit these rows as you wish. You do not have to select a row with column titles.
    • Click the import button to import these rows.
  1. Edit invalid rows (only those which were flagged as similar)
    • In case there are some errors in the fields in the similar rows, they are displayed in the same format as in step 5.
    • Operations that can be performed in this step are identical to step 5.

Limitation:

  • You cannot import the same spreadsheet again. Causes duplicacy and outright rejection of some records from the database.
  • Bugs, listed below

Some bugs and improvements:

  • Need to remove console.log in the JS code. These are very helpful when debugging, so I'll remove them when the project is merged. Also, redirection is currently commented out, will be removed when importer is merged.
  • Mapping to organisation_id --> org_organisation --> comments works in some cases and does not work in others.
  • Better representation of nested resources.
  • The UI of the stages in the wizard may be modified suitably.

OpenStreetMap

XSLT available to import Hospitals & Locations into Sahana Eden

ToDo: Provide a UI to select which levels the admin boundary levels correspond to:

Ushahidi

XSLT available to import Ushahidi reports into Sahana Eden's Incident Reporting system

See Also


UserGuidelines

Attachments (2)

Download all attachments as: .zip

Note: See TracWiki for help on using the wiki.