Changes between Version 17 and Version 18 of BluePrint/Importer
- Timestamp:
- 05/19/10 10:12:47
BluePrint/Importer
The SpreadsheetImporter will be a component of this.

But it would also be good to be able to import from the following formats:
 * PDF
 * HTML (File/URL)
 * DOC
 * XML formats (not matching our data schema) via [wiki:S3XRC S3XRC], such as:
   * RSS
   * Ushahidi
 * News feeds
 * Incoming SMS

Some of these formats can be parsed and imported; others may be unstructured and saved as a "New Feed".

Some of the data may be tabular, or just a single record.

…

Some links that might be useful:
 * Karma: a system for the Import/Clean/Integrate/Publish workflow that uses a 'Programming by Example' UI paradigm (instead of widgets):
   * [ftp://ftp.umiacs.umd.edu/pub/louiqa/PUB2010/GeoNets_Shubham.pdf Presentation from ISCRAM 2010]
   * [http://isi.edu/integration/videos/mashup_building.mp4 Video]
   * [http://content.digitalwell.washington.edu/msr/external_release_talks_12_05_2005/16012/lecture.htm Presentation]
 * [http://mashmaker.intel.com Intel MashMaker]: a Firefox extension to ease widget-based, HTML-based mashups
 * http://wiki.github.com/fizx/parsley/
 * http://developer.yahoo.com/yql/guide/
 * [http://www.unixuser.org/~euske/python/pdfminer/ PDFMiner] is an !OpenSource tool to convert PDF docs into text.

 * Code snippet to extract hyperlinks from HTML docs.
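The following is a minimal sketch of extracting hyperlinks from an HTML document, which an HTML (File/URL) importer could use. It relies only on Python's standard library (`html.parser.HTMLParser` and `urllib.parse.urljoin`). It is not necessarily the snippet this page refers to. The names `LinkExtractor` and `extract_links` are hypothetical, and so are the example URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags, resolved against a base URL.

    Hypothetical helper class for illustration only.
    """

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names already lowercased,
        # so <A HREF="..."> is handled the same as <a href="...">.
        if tag != "a":
            return
        for name, value in attrs:
            # Attributes without a value (e.g. <a href>) arrive as None; skip them.
            if name == "href" and value:
                # Resolve relative links such as "/wiki/Foo" against the page URL.
                self.links.append(urljoin(self.base_url, value.strip()))


def extract_links(html, base_url=""):
    """Return the hyperlinks found in an HTML document, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Example page and base URL are made up for illustration.
    page = ('<p><a href="/wiki/BluePrint">BluePrint</a> '
            '<a name="top">no href</a> '
            '<A HREF="http://example.org/">external</A></p>')
    print(extract_links(page, "http://example.com/trac/"))
    # prints ['http://example.com/wiki/BluePrint', 'http://example.org/']
```

Anchors without an `href` are skipped. Passing the page's own URL as `base_url` makes relative links usable for follow-up fetches. A more forgiving third-party parser could be swapped in for badly broken HTML.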