wiki:BluePrint/Importer/Excel

Version 9 (modified by Dominic König, 11 years ago) ( diff )

--

Import Excel

We have an Importer which uses CSV files and works very well when the correct data is entered.

The limitations of CSV, however, are that there is no client-side Validation or Online Help, so it is very easy for users to input incorrect data (e.g. Dates).

To overcome that, we're looking for a pre-processor that converts XLS (MS Excel) sheets into CSV before importing.

NB This should also help us with ensuring that imported data is properly UTF-8. MS Excel tends to save CSVs as ANSI whilst it defaults to UTF-8 from 97+ & provides the encoding sed for older versions which can be picked up by xlrd:

Implementation

Interactive Importer

Besides .csv files, the interactive importer now also accepts .xls/.xlsx (Excel) files.

If such a file is uploaded, the importer will search for a sheet with the name "SahanaData" (or fall back to the first sheet, of "SahanaData" is not present), and import it under the same formatting rules as the corresponding CSV format.

The sheet is read from the top left, and the first row interpreted as column names. Else, the sheet is parsed, transformed and imported just like the corresponding CSV format.

The XLS workbook can contain other, more user-friendly sheets with online help, tooltips andor/ validation, and have the "SahanaData" sheet hidden and populated by macros.

xls2tree

S3XML.xls2tree() is the back-end method to parse Excel sheets.

Note: unlike originally planned, this is not a pre-processor to convert XLS into CSV and then pass into the CSV importer, but just like csv2tree (the CSV parser) it converts the Excel sheet into an ElementTree for import. However, this ElementTree is then transformed with the same XSLT stylesheets as the CSV format.

The xls2tree method is much more flexible than the current importer front-end suggests - it can read any sheet in the workbook and a range of rows/columns anywhere on the sheet (i.e. not only top left). It also accepts a column mapping as parameter which allows to specify which column corresponds to which field instead of reading the column headers from the sheet.

This flexibility could be utilized to implement a front-end that allows the user to import from any XLS document even if there is no "SahanaData" sheet in CSV format. These "XLS format configurations" (sheet name, cell range, column mapping) could be saved ("Saved XLS Formats"), and then re-used for other documents - which allows to quickly adapt to various document sources.

Xls2tree can also work from open work books, which can be used to implement an XLS bulk importer to import multiple resources and/or multiple sheets from the same XLS work book in one go.

Templates

We would like default Excel Workbooks for all of our major resources which have protected fields for the core data inputs (like those used for feeding Assessments to the Survey module).

We would like some Excel routines for:

  • Date[Time] Validation
  • Selection from pre-populated dropdowns
  • Selection of Location Hierarchy
  • Online Help

Deployers can then customise the default Excel workbooks for local contexts, such as providing localised names for the Location Hierarchy rather than 'L1', 'L2', 'L3', etc and prepopulating the spreadsheet with a valid lookup lists.

Extensions

The link to download the template should check for the existence of an .xls in static/formats/s3csv & use that instead of the .csv one where-present.

Note: See TracWiki for help on using the wiki.