Changes between Version 19 and Version 20 of BluePrint/Importer


Timestamp: 05/19/10 14:01:51
Author: Fran Boon

  • BluePrint/Importer

    v19 v20
      1   1   BluePrints
      2   2   ----
          3 + == Importer Blueprint ==
      3   4
      4   5   Integrating/Developing a framework to extract structured data from web sources with a simple query language.
    ...
     29  30    * Spreadsheets with multiple sheets
     30  31    * Methods of automatically (or with a user friendly interface) cleaning data (removing duplicate values with variations due to typos) - for example:
     31     -   * If there were a list of countries which contained Indonesia, Spain, India, Indonesiasia, New Zealand, NZ, France, UK, Indonsia - the import may be able to identify whcih fields were duplicates, rather than adding 2 incorrect spellings for Indonesia.
         32 +   * If there were a list of countries which contained Indonesia, Spain, India, Indonesiasia, New Zealand, NZ, France, UK, Indonsia - the import may be able to identify which fields were duplicates, rather than adding 2 incorrect spellings for Indonesia.
     32  33     * Also important for catching things like different spelling, punctuation or orders of words.
     33  34   Ideally different templates will be able to be designed (by users) for importing different types of data. Machine learning algorithms with (multiple?) human verification could try parsing new data formats based on previous templates used.
    ...
     36  37
     37  38   Some links that might be useful:
     38     -   * Karma: a system for doing the Import/Clean/Integrate/Publish workflow through a UI paradigm of 'Programming by Example' (instead of via Widgets):
         39 +   * Karma: a system for doing the Import/Clean/Integrate/Publish workflow through a UI paradigm of 'Programming by Demonstration' (instead of via Widgets):
     39  40      * [ftp://ftp.umiacs.umd.edu/pub/louiqa/PUB2010/GeoNets_Shubham.pdf Presentation from ISCRAM 2010]
     40  41      * [http://isi.edu/integration/videos/mashup_building.mp4 Video]
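
The data-cleaning bullet in the blueprint (spotting that Indonesiasia and Indonsia are misspellings of Indonesia) could be sketched roughly as follows. This is only an illustration, not the importer's design: it uses Python's standard `difflib.SequenceMatcher`. The helper names `similarity` and `group_variants`, the 0.8 threshold, and the choice to treat the first-seen spelling as canonical are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score, ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def group_variants(values, threshold=0.8):
    """Greedily group each value under the first-seen spelling it resembles.

    Returns a dict mapping each canonical (first-seen) value to the list of
    later values judged to be likely variants of it. The threshold is a
    hypothetical starting point and would need tuning on real data.
    """
    groups = {}
    for value in values:
        best, best_score = None, 0.0
        for canonical in groups:
            score = similarity(value, canonical)
            if score > best_score:
                best, best_score = canonical, score
        if best is not None and best_score >= threshold:
            groups[best].append(value)  # candidate duplicate, for human review
        else:
            groups[value] = []  # no close match: treat as a distinct value
    return groups


# The country list from the blueprint's example.
countries = ["Indonesia", "Spain", "India", "Indonesiasia", "New Zealand",
             "NZ", "France", "UK", "Indonsia"]
groups = group_variants(countries)
```

With these assumptions, Indonesiasia and Indonsia are grouped under Indonesia, while India stays separate because its score against Indonesia (roughly 0.71) falls below the threshold. NZ is not matched to New Zealand: abbreviations are not close in spelling, so they would need something like an alias table, and any automatic grouping would still benefit from the human verification the blueprint mentions.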