Changes between Version 44 and Version 45 of UserGuidelines/Importer


Ignore:
Timestamp:
09/24/13 12:19:39 (8 years ago)
Author:
Fran Boon
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • UserGuidelines/Importer

    v44 v45  
    11= Importer =
    22[[TOC]]
     3
    34== Spreadsheets ==
     5Data is normally imported from spreadsheets.
    46
    5 3 options available:
    6  * CSV (preferred)
    7  * Open Office
    8  * Excel via interactive UI (incomplete)
     7Currently these need to be converted to CSV format, however there is now support for building .xls[x] templates to wrap around the CSV:
     8*
    99
    10 === CSV ===
    11 Can import data from spreadsheets in CSV (Comma Separated Values) format as long as it is formatted properly.
     10Process:
     11* Data needs putting into the correct columns
     12* Date formats need to be correct
     13
    1214
    1315This can be done to [DeveloperGuidelines/PrePopulate PrePopulate] an instance.
     
    2527Alternately these can be done using CLI scripts:
    2628* [wiki:S3/DataImportCLI DataImportCLI]
    27 
    28 ==== Old: Web2Py native ====
    29 This is 1 sheet per-table, not per-resource & foreign keys need to be as numeric IDs, so there is a fair amount of setup.
    30 
    31 Can do this via CLI (put file 'volunteers.csv' into private/import):
    32 {{{
    33 import_file = os.path.join(request.folder, "private", "import", "volunteers.csv")
    34 table.import_from_csv_file(open(import_file, "r"))
    35 }}}
    36 
    37 or Web UI (''to be checked''):
    38  * http://myhost.com/eden/vol/person.csv/create?filename=myvols.csv
    39 
    40 === Open Office ===
    41 Can import data from spreadsheets in .fods format (e.g. created by Open Office).
    42 
    43 This is best as it handles complex resources.
    44 
    45 via web:
    46  * http://myhost.com/eden/hms/hospital.fods/create?filename=myhospitals.fods
    47  * http://myhost.com/eden/hms/hospital.fods/create?fetchurl=http://host.domain/myhospitals.fods
    48 (1st is path on local hard disk. If this doesn't work (e.g. 'failed to load external entity' then try loading onto a webserver)
    49 
    50 or from Web2Py CLI:
    51 {{{
    52 resource = s3db.resource("hms_hospital")
    53 template = os.path.join(request.folder, resource.XSLT_PATH, "fods", "import.xsl")
    54 resource.import_xml("myhospitals.fods", template=template)
    55 }}}
    56 
    57 Structure of the file should be as follows:
    58  * Sheet name = resource name
    59  * First row = attribute/field names
    60  * Other rows = attribute/field values
    61 
    62 ==== References ====
    63  1. Insert another sheet with the entries of the referenced table, this sheet must have the same name as the referenced table and a column "uuid"
    64  2. Insert a column in the main sheet with a column title like:
    65 {{{
    66 reference:<fieldname>:<referenced_table>
    67 }}}
    68 e.g. "reference:organisation_id:org_organisation"
    69  3. fill in this column with the UUIDs of the respective entries
    70    in the referenced table
    71 
    72 ==== Components: ====
    73  1. Insert another sheet with the entries of the component table, this sheet must have a name like:
    74 {{{
    75 <main-table-name>+<component-table-name>
    76 }}}
    77 e.g. "hms_hospital+hms_hactivity"
    78  2. Insert a column into the component sheet with a name like:
    79 {{{
    80 <main-table-name>.<key-field>
    81 }}}
    82 e.g. "hms_hospital.gov_uuid"[[BR]]
    83 => the key field must be present in the main table[[BR]]
    84 => the key field must be unique (primary key) in the main table[[BR]]
    85  3. Fill in this column with the respective values of the key field in the parent entry of the main table
    86 
    87 
    88 === Excel via interactive UI ===
    89  * [http://wiki.sahanafoundation.org/doku.php/foundation:gsoc_kohli GSoC project] - merged with Trunk, but disabled by default as not fully working
    90 
    91 ==== Outline ====
    92   * Spreadsheet importer, in its current form, can be used to import data in .XLS files to the Sahana Eden database
    93   * It has a spreadsheet like interface to enable users to edit spreadsheets in the browser itself.
    94   * Referenced resources, like pr_person in org_staff, can be imported as well.
    95   * Records, which could not be validated, are not inserted into the database. These records are displayed to the users in a spreadsheet interface. Records have at least one field in red.
    96    This is the field with an incorrect value. The format in which errors are displayed is :  ''' <field-name>: <error> . You entered <value entered by user> '''. Records can then be corrected and re-imported.
    97   * The spreadsheet importer uses the Jaro-Winkler metric to determine the similarity of two rows. Similarity between rows is checked two at a time. Similar rows are displayed consecutively in another spreadsheet like interface. The user can choose to edit, delete and then import these rows to the database. Errors in these rows are displayed in the same form as described previously.
    98 
    99 ==== Steps ====
    100 [http://www.youtube.com/watch?v=qOFy-xG4Exk Video Demo]
    101 
    102 1. Import spreadsheet
    103    * The importer can be accessed at {{{http://<web address>/<application name>/importer}}} . This is the landing page for the importer.
    104    * Click on spreadsheet.
    105    * Upload a spreadsheet. Add a name to the spreadsheet record, and a comment if you wish.
    106    * Tips:
    107      * Keep a single worksheet in the spreadsheet.
    108      * Keep the number of rows around 100. More rows require more rendering and processing,therefore more time. This could also cause server requests to time out. (Please note, 100 rows is by no means a limitation, it is a simply a safe number of rows to render and import, keeping in mind the strength of internet connection. The number of rows isn't really a problem in an offline client of course)
    109 
    110 2. Edit spreadsheet
    111    * You can make changes to the spreadsheet in this step. You can modify cells, delete and add rows.
    112    * You can search for data. Search returns the first occurrence of a the string in the spreadsheet. Substrings in cells are also matched.
    113    * Rows can be deleted by either clicking the cross on the right end of each row.
    114    * A checkbox on the left of each row selects that row. You can click the delete button in the tool bar to delete the select row. You can only select one row at a time.
    115    * Prior to clicking import, select a row which has titles for each column in the spreadsheet. You will be prompted to select one if you haven't done so already.
    116 
    117 3. Select module and resource
    118    * This step of the spreadsheet importer wizard has two sections, so to speak.
    119    * On the left is a list of modules in the Sahana Eden installation. You can select a single module from this list.
    120    * As soon as a selection is  made from the modules list, the panel on the right updates to show which resources are present in that module. You can select one resource.
    121    * When you click the next button, the module fetches resources and fields from the server. Depending upon server speed, this process could take some time. Please be patient during this step.
    122 
    123 4. Mapping columns to fields
    124   * The column titles selected in the second step are displayed next to dropdowns in this step.
    125   * The drop down lists are populated with the fields present in selected resource. There are two types of fields.
    126     * Normal fields.
    127     * Reference fields. These fields are are actually references to different tables in the Eden schema. In the drop down list, they are represented as <field-name> --> <resource-name> --> <field in the nested resource>.
    128       Therefore, reference to a person in org_staff is represented in the drop down as person_id --> pr_person --> first_name etc.
    129   * All columns '''must''' be matched to fields. Otherwise, you cannot proceed.
    130   * The data is extracted from the spreadsheet and then processed for import to the database.
    131 
    132 5. Edit invalid rows
    133   * The database only accepts those records/rows which satisfy validation rules.
    134   * In case some rows are present which do not conform to the validation enforced on the database, they are rejected. Rows in which all fields have valid values are accepted.
    135   * The spreadsheet interface shows only those rows which returned an error. The fields which could not be validated are shown in red. Each row will have '''at least one''' red field.
    136   * Errors are represented as <field-name> : <error message>. You entered <your value>. The text in the entire field has to be replaced with the '''correct''' value.
    137   * If you do not wish to import a row, make sure that it is deleted.
    138   * Click on import to move to the next step.
    139 
    140 6. Edit similar rows
    141   * In large spreadsheets, it is possible that some rows may be similar. The spreadsheet importer determines such rows.
    142   * These rows are represented in a spreadsheet format are displayed. Two consecutive rows (like 1 & 2, 3 & 4) are similar.
    143   * You can edit these rows as you wish. You '''do not''' have to select a row with column titles.
    144   * Click the import button to import these rows.
    145 
    146 7. Edit invalid rows (__only those which were flagged as similar__)
    147   * In case there are some errors in the fields in the similar rows, they are displayed in the same format as in step 5.
    148   * Operations that can be performed in this step are identical to step 5.
    149 
    150 Limitation:
    151   * You cannot import the same spreadsheet again. Causes duplicacy and outright rejection of some records from the database.
    152   * Bugs, listed below
    153 
    154 Some bugs and improvements:
    155   * Need to remove console.log in the JS code. These are very helpful when debugging, so I'll remove them when the project is merged. Also, redirection is currently commented out, will be removed when importer is merged.
    156   * Mapping to organisation_id --> org_organisation --> comments works in some cases and does not work in others.
    157   * Better representation of nested resources.
    158   * The UI of the stages in the wizard may be modified suitably.
    15929
    16030== !OpenStreetMap ==