Changes between Version 7 and Version 8 of BluePrint/Importer/Excel

09/19/13 10:55:21 (11 years ago)
Dominic König



  • BluePrint/Importer/Excel

    v7 v8  
    55The limitations of CSV, however, are that there is no client-side Validation or Online Help, so it is very easy for users to input incorrect data (e.g. Dates).
    7 We have now got a pre-processor for the Importer back-end to allow .xls[x] files to be uploaded & converting them to CSV before passing them to the current CSV importer.
     7To overcome that, we're looking for a pre-processor that converts XLS (MS Excel) sheets into CSV before importing.
    99NB This should also help us with ensuring that imported data is properly UTF-8. MS Excel tends to save CSVs as ANSI whilst it defaults to UTF-8 from 97+ & provides the encoding sed for older versions which can be picked up by xlrd:
    10 *
     11  *
     13== Implementation ==
     15=== Interactive Importer ===
     17Besides .csv files, the interactive importer now also accepts .xls/.xlsx (Excel) files.
     19If such a file is uploaded, the importer will search for a sheet with the name "SahanaData" (or fall back to the first sheet, of "SahanaData" is not present), and import it under the same formatting rules as the corresponding CSV format.
     21The sheet is read from the top left, and the first row interpreted as column names. Else, the sheet is parsed, transformed and imported just like the corresponding CSV format.
     23The XLS workbook can contain other, more user-friendly sheets with online help, tooltips andor/ validation, and have the "SahanaData" sheet hidden and populated by macros.
     25=== xls2tree ===
     27Xls2tree is the back-end method to parse Excel sheets.
     29Note: unlike originally planned, this is not a pre-processor to convert XLS into CSV and then pass into the CSV importer, but just like csv2tree (the CSV parser) it converts the Excel sheet into an ElementTree for import. However, this ElementTree is transformed with the same XSLT stylesheets as the CSV format.
     31The xls2tree method is much more flexible than the current importer front-end suggests - it can read any sheet in the workbook and a range of rows/columns anywhere on the sheet (i.e. not only top left). It also accepts a column mapping as parameter which allows to specify which column corresponds to which field instead of reading the column headers from the sheet.
     33This flexibility could be utilized to implement a front-end that allows the user to import from any XLS document even if there is no "SahanaData" sheet in CSV format. These "XLS format configurations" (sheet name, cell range, column mapping) could be saved ("Saved XLS Formats"), and then re-used for other documents - which allows to quickly adapt to various document sources.
     35Xls2tree can also work from open work books, which can be used to implement an XLS bulk importer to import multiple resources and/or multiple sheets from the same XLS work book in one go.
    1237== Templates ==
    1339We would like default Excel Workbooks for all of our major resources which have protected fields for the core data inputs (like those used for feeding Assessments to the Survey module).
    2349== Extensions ==
    2451The link to download the template should check for the existence of an .xls in static/formats/s3csv & use that instead of the .csv one where-present.