Changes between Version 11 and Version 12 of SpreadsheetImporter


Timestamp: 01/23/10 17:16:01 (15 years ago)
Author: mattb
Comment: add a proposal for how to deal with importing csv data via the web ui

  • SpreadsheetImporter

    v11 v12  
    84 84 * Provide a Controller to allow web-based access at: /admin/import_data (The current implementation broke at some point)
    85 85 * Use functions imported from a new {{{modules/s3gis.py}}} for WKT creation
     86
     87=== Proposed Automation for Spreadsheet (csv) Imports ===
     88
     89'''Requirements (as per !ToDo above)'''
     90 * Allow admins to import data for a model via the Web interface
     91 * Hook into the Model's validations
     92 * Provide ability to de-duplicate (phase 2)
     93 * Keep track of when and where data was imported from (maybe just for a few days to help with conflict resolution, etc).
     94
     95'''Outline'''
     96 * Allow an admin to start an import 'Job' for a particular Model
     97 * Jobs are described by a free-form description field and an uploaded .csv file.
     98 * Extract csv column names, store in job. Set job state to 'new'
     99 * Attempt to match csv column names to model column names
     100 * Present list to user, allow them to change csv column <-> model column mapping
     101 * Once user confirms column mappings, set job state to 'processing'
     102 * Background cronjob picks up all import jobs in state 'processing', for each data row:
     103   * Parse the row into a dictionary of {'db_column': value, ...} using the column-name mapping stored in the Job record
     104   * Validate the data using the SQLFORM accepts method with dbio=False (no database activity); see the validation sketch below this list
     105   * Store the row in the !ImportLine model (see the table sketch below this list):
     106     * job_id
     107     * line_no
     108     * valid (true, false)
     109     * status (ignore, import, imported)
     110     * data (pickled dictionary from above)
     111 * When done, the background job sets the job state to 'processed'
     112 * Ajax keeps the job page updated with the job state until processing has completed
     113 * Display a table of rows and their validation status to the user; default the status to 'ignore' for rows that fail validation and to 'import' for rows that pass.
     114   * Future enhancements will most likely occur at this step
     115     * Allow user to edit/fix fields that failed validation and re-validate
     116     * Present list of likely duplicate entries alongside each imported entry
     117 * The user can de-select any row to ignore it; its !ImportLine status is set to 'ignore'
     118 * Once the user has checked all rows and clicked 'Import', the job state is set to 'import'
     119 * Background cronjob picks up all import jobs in state 'import' and, for each !ImportLine with status=import (see the import sketch below this list):
     120   * Insert the data via the SQLFORM.accepts method (dbio defaults to True this time, so the record is written to the DB)
     121     * Pass a special onvalidation method to accepts, to convert lat/lon to WKT as shown above.
     122   * Update the !ImportLine status to 'imported'.
     123 * When done, the background job sets the job state to 'completed'
     124
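'''Illustrative code sketches''' (hypothetical, not part of the current codebase)

The validation sketch below shows how the first cronjob pass might run a model's validations against one parsed csv row without touching the database, using SQLFORM's {{{accepts}}} with {{{dbio=False}}} as described in the outline. The helper names ({{{parse_row}}}, {{{validate_row}}}) and the shape of the column mapping are assumptions for illustration only.

{{{#!python
# Runs inside the web2py environment, so SQLFORM and db are already defined.

def parse_row(csv_row, mapping):
    """Map one csv row (a dict keyed by csv column name) onto db column
    names, using the csv_column -> db_column mapping stored in the Job."""
    return dict((db_col, csv_row[csv_col])
                for csv_col, db_col in mapping.items() if db_col)

def validate_row(table, row_dict):
    """Run the model's validators via SQLFORM.accepts with dbio=False,
    so nothing is written to the database. Returns (valid, errors)."""
    form = SQLFORM(table)
    valid = form.accepts(row_dict, formname=None, dbio=False)
    return valid, dict(form.errors)
}}}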
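The table sketch below is one possible web2py definition of the Job and !ImportLine records described in the outline; every table and field name is illustrative rather than an agreed schema.

{{{#!python
db.define_table('import_job',
                Field('description', 'text'),
                Field('csv_file', 'upload'),
                Field('column_map', 'text'),   # pickled csv_column -> db_column mapping
                Field('state',
                      requires=IS_IN_SET(('new', 'processing', 'processed',
                                          'import', 'completed'))),
                Field('created_on', 'datetime', default=request.now))

db.define_table('import_line',
                Field('job_id', db.import_job),
                Field('line_no', 'integer'),
                Field('valid', 'boolean'),
                Field('status',
                      requires=IS_IN_SET(('ignore', 'import', 'imported'))),
                Field('data', 'text'))         # pickled {'db_column': value} dict
}}}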
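Finally, the import sketch below covers the second cronjob pass, which inserts the rows the user approved. The {{{latlon_to_wkt}}} callback is only a stand-in: the real conversion would use the WKT helpers in {{{modules/s3gis.py}}} referred to earlier, and the field names {{{lat}}}, {{{lon}}} and {{{wkt}}} are assumptions.

{{{#!python
import cPickle as pickle

def latlon_to_wkt(form):
    """onvalidation callback: build a WKT point from lat/lon before insert."""
    lat, lon = form.vars.get('lat'), form.vars.get('lon')
    if lat is not None and lon is not None:
        form.vars.wkt = 'POINT(%s %s)' % (lon, lat)

def run_import(job, table):
    """Insert every ImportLine the user approved, then mark the job done."""
    lines = db((db.import_line.job_id == job.id) &
               (db.import_line.status == 'import')).select()
    for line in lines:
        row_dict = pickle.loads(str(line.data))
        form = SQLFORM(table)
        # dbio defaults to True, so a successful accepts() writes the record
        if form.accepts(row_dict, formname=None, onvalidation=latlon_to_wkt):
            line.update_record(status='imported')
    job.update_record(state='completed')
}}}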
    86 125 === Google Spreadsheets API ===
    87 126 * http://code.google.com/apis/spreadsheets/