Changes between Version 19 and Version 20 of BluePrint/Importer
- Timestamp: 05/19/10 14:01:51
BluePrint/Importer
  v19 v20
    1   1   BluePrints
    2   2   ----
+       3   == Importer Blueprint ==
    3   4
    4   5   Integrating/Developing a framework to extract structured data from web sources with a simple query language.
    …   …
   29  30   * Spreadsheets with multiple sheets
   30  31   * Methods of automatically (or with a user friendly interface) cleaning data (removing duplicate values with variations due to typos) - for example:
-  31       * If there were a list of countries which contained Indonesia, Spain, India, Indonesiasia, New Zealand, NZ, France, UK, Indonsia - the import may be able to identify wh cih fields were duplicates, rather than adding 2 incorrect spellings for Indonesia.
+      32   * If there were a list of countries which contained Indonesia, Spain, India, Indonesiasia, New Zealand, NZ, France, UK, Indonsia - the import may be able to identify which fields were duplicates, rather than adding 2 incorrect spellings for Indonesia.
   32  33   * Also important for catching things like different spelling, punctuation or orders of words.
   33  34   Ideally different templates will be able to be designed (by users) for importing different types of data. Machine learning algorithms with (multiple?) human verification could try parsing new data formats based on previous templates used.
    …   …
   36  37
   37  38   Some links that might be useful:
-  38       * Karma: a system for doing the Import/Clean/Integrate/Publish workflow through a UI paradigm of 'Programming by Example' (instead of via Widgets):
+      39   * Karma: a system for doing the Import/Clean/Integrate/Publish workflow through a UI paradigm of 'Programming by Demonstration' (instead of via Widgets):
   39  40   * [ftp://ftp.umiacs.umd.edu/pub/louiqa/PUB2010/GeoNets_Shubham.pdf Presentation from ISCRAM 2010]
   40  41   * [http://isi.edu/integration/videos/mashup_building.mp4 Video]
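
The duplicate-cleaning bullet above (the country list with Indonesia, Indonesiasia and Indonsia) could be approached roughly as follows. This is only a minimal sketch and not part of the blueprint: the function name `group_near_duplicates` and the `cutoff` value of 0.8 are assumptions made for illustration, and the similarity measure is the one from Python's standard `difflib` module. It treats the first spelling seen as the canonical one, which a real importer would probably leave to a human to confirm.

```python
from difflib import SequenceMatcher


def group_near_duplicates(values, cutoff=0.8):
    """Group values whose spelling is close to an earlier value.

    Hypothetical helper: the first spelling seen becomes the canonical
    key, and later values join the most similar canonical key if the
    similarity ratio is at least `cutoff` (an assumed threshold).
    """
    groups = {}  # canonical spelling -> list of variants (including itself)
    for value in values:
        best, best_score = None, 0.0
        for canonical in groups:
            # Case-insensitive similarity in the range 0.0 to 1.0.
            score = SequenceMatcher(None, canonical.lower(), value.lower()).ratio()
            if score > best_score:
                best, best_score = canonical, score
        if best is not None and best_score >= cutoff:
            groups[best].append(value)
        else:
            groups[value] = [value]
    return groups


countries = ["Indonesia", "Spain", "India", "Indonesiasia", "New Zealand",
             "NZ", "France", "UK", "Indonsia"]
groups = group_near_duplicates(countries)
# Candidate duplicates to show to a user for verification.
suspects = {name: variants for name, variants in groups.items() if len(variants) > 1}
```

With this threshold the two misspellings are grouped under Indonesia while India stays separate. Abbreviations such as NZ are not caught by spelling similarity and would need a separate alias table or human review, which fits the human-verification idea mentioned in the blueprint.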