Changes between Initial Version and Version 1 of UserGuidelines/Importer/Old


Ignore:
Timestamp:
09/24/13 12:21:13 (11 years ago)
Author:
Fran Boon
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • UserGuidelines/Importer/Old

    v1 v1  
     1= Archived information =
     2[[TOC]]
     3
     4= Old CSV: Web2Py native ==
     5This is 1 sheet per-table, not per-resource & foreign keys need to be as numeric IDs, so there is a fair amount of setup.
     6
     7Can do this via CLI (put file 'volunteers.csv' into private/import):
     8{{{
     9import_file = os.path.join(request.folder, "private", "import", "volunteers.csv")
     10table.import_from_csv_file(open(import_file, "r"))
     11}}}
     12
     13or Web UI (''to be checked''):
     14 * http://myhost.com/eden/vol/person.csv/create?filename=myvols.csv
     15
     16= Open Office =
     17Can import data from spreadsheets in .fods format (e.g. created by Open Office).
     18
     19This is best as it handles complex resources.
     20
     21via web:
     22 * http://myhost.com/eden/hms/hospital.fods/create?filename=myhospitals.fods
     23 * http://myhost.com/eden/hms/hospital.fods/create?fetchurl=http://host.domain/myhospitals.fods
     24(1st is path on local hard disk. If this doesn't work (e.g. 'failed to load external entity' then try loading onto a webserver)
     25
     26or from Web2Py CLI:
     27{{{
     28resource = s3db.resource("hms_hospital")
     29template = os.path.join(request.folder, resource.XSLT_PATH, "fods", "import.xsl")
     30resource.import_xml("myhospitals.fods", template=template)
     31}}}
     32
     33Structure of the file should be as follows:
     34 * Sheet name = resource name
     35 * First row = attribute/field names
     36 * Other rows = attribute/field values
     37
     38== References ==
     39 1. Insert another sheet with the entries of the referenced table, this sheet must have the same name as the referenced table and a column "uuid"
     40 2. Insert a column in the main sheet with a column title like:
     41{{{
     42reference:<fieldname>:<referenced_table>
     43}}}
     44e.g. "reference:organisation_id:org_organisation"
     45 3. fill in this column with the UUIDs of the respective entries
     46   in the referenced table
     47
     48== Components: ==
     49 1. Insert another sheet with the entries of the component table, this sheet must have a name like:
     50{{{
     51<main-table-name>+<component-table-name>
     52}}}
     53e.g. "hms_hospital+hms_hactivity"
     54 2. Insert a column into the component sheet with a name like:
     55{{{
     56<main-table-name>.<key-field>
     57}}}
     58e.g. "hms_hospital.gov_uuid"[[BR]]
     59=> the key field must be present in the main table[[BR]]
     60=> the key field must be unique (primary key) in the main table[[BR]]
     61 3. Fill in this column with the respective values of the key field in the parent entry of the main table
     62
     63
     64= Excel via interactive UI =
     65 * [http://wiki.sahanafoundation.org/doku.php/foundation:gsoc_kohli GSoC project] - merged with Trunk, but disabled by default as not fully working
     66
     67== Outline ==
     68  * Spreadsheet importer, in its current form, can be used to import data in .XLS files to the Sahana Eden database
     69  * It has a spreadsheet like interface to enable users to edit spreadsheets in the browser itself.
     70  * Referenced resources, like pr_person in org_staff, can be imported as well.
     71  * Records, which could not be validated, are not inserted into the database. These records are displayed to the users in a spreadsheet interface. Records have at least one field in red.
     72   This is the field with an incorrect value. The format in which errors are displayed is :  ''' <field-name>: <error> . You entered <value entered by user> '''. Records can then be corrected and re-imported.
     73  * The spreadsheet importer uses the Jaro-Winkler metric to determine the similarity of two rows. Similarity between rows is checked two at a time. Similar rows are displayed consecutively in another spreadsheet like interface. The user can choose to edit, delete and then import these rows to the database. Errors in these rows are displayed in the same form as described previously.
     74
     75== Steps ==
     76[http://www.youtube.com/watch?v=qOFy-xG4Exk Video Demo]
     77
     781. Import spreadsheet
     79   * The importer can be accessed at {{{http://<web address>/<application name>/importer}}} . This is the landing page for the importer.
     80   * Click on spreadsheet.
     81   * Upload a spreadsheet. Add a name to the spreadsheet record, and a comment if you wish.
     82   * Tips:
     83     * Keep a single worksheet in the spreadsheet.
     84     * Keep the number of rows around 100. More rows require more rendering and processing,therefore more time. This could also cause server requests to time out. (Please note, 100 rows is by no means a limitation, it is a simply a safe number of rows to render and import, keeping in mind the strength of internet connection. The number of rows isn't really a problem in an offline client of course)
     85
     862. Edit spreadsheet
     87   * You can make changes to the spreadsheet in this step. You can modify cells, delete and add rows.
     88   * You can search for data. Search returns the first occurrence of a the string in the spreadsheet. Substrings in cells are also matched.
     89   * Rows can be deleted by either clicking the cross on the right end of each row.
     90   * A checkbox on the left of each row selects that row. You can click the delete button in the tool bar to delete the select row. You can only select one row at a time.
     91   * Prior to clicking import, select a row which has titles for each column in the spreadsheet. You will be prompted to select one if you haven't done so already.
     92
     933. Select module and resource
     94   * This step of the spreadsheet importer wizard has two sections, so to speak.
     95   * On the left is a list of modules in the Sahana Eden installation. You can select a single module from this list.
     96   * As soon as a selection is  made from the modules list, the panel on the right updates to show which resources are present in that module. You can select one resource.
     97   * When you click the next button, the module fetches resources and fields from the server. Depending upon server speed, this process could take some time. Please be patient during this step.
     98
     994. Mapping columns to fields
     100  * The column titles selected in the second step are displayed next to dropdowns in this step.
     101  * The drop down lists are populated with the fields present in selected resource. There are two types of fields.
     102    * Normal fields.
     103    * Reference fields. These fields are are actually references to different tables in the Eden schema. In the drop down list, they are represented as <field-name> --> <resource-name> --> <field in the nested resource>.
     104      Therefore, reference to a person in org_staff is represented in the drop down as person_id --> pr_person --> first_name etc.
     105  * All columns '''must''' be matched to fields. Otherwise, you cannot proceed.
     106  * The data is extracted from the spreadsheet and then processed for import to the database.
     107
     1085. Edit invalid rows
     109  * The database only accepts those records/rows which satisfy validation rules.
     110  * In case some rows are present which do not conform to the validation enforced on the database, they are rejected. Rows in which all fields have valid values are accepted.
     111  * The spreadsheet interface shows only those rows which returned an error. The fields which could not be validated are shown in red. Each row will have '''at least one''' red field.
     112  * Errors are represented as <field-name> : <error message>. You entered <your value>. The text in the entire field has to be replaced with the '''correct''' value.
     113  * If you do not wish to import a row, make sure that it is deleted.
     114  * Click on import to move to the next step.
     115
     1166. Edit similar rows
     117  * In large spreadsheets, it is possible that some rows may be similar. The spreadsheet importer determines such rows.
     118  * These rows are represented in a spreadsheet format are displayed. Two consecutive rows (like 1 & 2, 3 & 4) are similar.
     119  * You can edit these rows as you wish. You '''do not''' have to select a row with column titles.
     120  * Click the import button to import these rows.
     121
     1227. Edit invalid rows (__only those which were flagged as similar__)
     123  * In case there are some errors in the fields in the similar rows, they are displayed in the same format as in step 5.
     124  * Operations that can be performed in this step are identical to step 5.
     125
     126Limitation:
     127  * You cannot import the same spreadsheet again. Causes duplicacy and outright rejection of some records from the database.
     128  * Bugs, listed below
     129
     130Some bugs and improvements:
     131  * Need to remove console.log in the JS code. These are very helpful when debugging, so I'll remove them when the project is merged. Also, redirection is currently commented out, will be removed when importer is merged.
     132  * Mapping to organisation_id --> org_organisation --> comments works in some cases and does not work in others.
     133  * Better representation of nested resources.
     134  * The UI of the stages in the wizard may be modified suitably.