Changes between Version 17 and Version 18 of BluePrint/Importer

Timestamp: 05/19/10 10:12:47 (15 years ago)
Author: Fran Boon
Comment: --

BluePrint/Importer

-              v17
+              v18
 The SpreadsheetImporter will be a component of this.
 But it would also be good to be able to import from the following formats:
  * PDF
  * HTML (File/URL)
  * DOC
+ * XML (Not matching our data schema)
+ * RSS
+ * XML formats (not matching our data schema) via [wiki:S3XRC S3XRC], such as:
+  * RSS
+  * Ushahidi
  * News feeds
- * Ushahidi
  * Incoming SMS
+Some of these formats will be able to be parsed and imported, others may be unstructured and saved as a "New Feed".[[BR]]
+Some of these formats will be able to be parsed and imported, others may be unstructured and saved as a "New Feed".
 Some of the data may be tabular, or just single record.
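
As an illustration of the "parsed and imported" case for a feed format such as RSS, the following is a minimal sketch using only the Python 3 standard library. It is not the S3XRC implementation: the function name `parse_rss_items`, the output field names, and the sample feed are all assumptions made for this example, and it only handles plain RSS 2.0 without namespaces.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(rss_text):
    """Return one dict per <item> in an RSS 2.0 document.

    Hypothetical helper: the keys below are illustrative and do not
    correspond to any existing importer schema.
    """
    root = ET.fromstring(rss_text)
    records = []
    # RSS 2.0 nests items as <rss><channel><item>...</item></channel></rss>
    for item in root.findall("./channel/item"):
        records.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return records


# Made-up sample feed, used only to exercise the function.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Road closed</title>
      <link>http://example.org/reports/1</link>
      <description>Bridge damaged</description>
    </item>
    <item>
      <title>Shelter opened</title>
      <link>http://example.org/reports/2</link>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for record in parse_rss_items(SAMPLE_RSS):
        print(record)
```

Each feed item becomes a single record, which fits the "just single record" case; a feed that cannot be mapped this way would fall into the unstructured category described above.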
 …
 Some links that might be useful:
+  * Karma: a system for doing the Import/Clean/Integrate/Publish workflow through a UI paradigm of 'Programming by Example' (instead of via Widgets):
+   * [ftp://ftp.umiacs.umd.edu/pub/louiqa/PUB2010/GeoNets_Shubham.pdf Presentation from ISCRAM 2010]
+   * [http://isi.edu/integration/videos/mashup_building.mp4 Video]
+   * [http://content.digitalwell.washington.edu/msr/external_release_talks_12_05_2005/16012/lecture.htm Presentation]
+  * [http://mashmaker.intel.com Intel MashMaker]: Firefox extension to ease widget-based HTML-based mashups
   * http://wiki.github.com/fizx/parsley/
   * http://developer.yahoo.com/yql/guide/
   * PDFminer is a tool to convert pdf docs into text, it is open source [http://www.unixuser.org/~euske/python/pdfminer/index.html#license (Licence)].
+  * [http://www.unixuser.org/~euske/python/pdfminer/ PDFMiner] is an !OpenSource tool to convert PDF docs into text.
   * Code snippet to extract hyperlinks from HTML docs.
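
The snippet referred to in the last item is not reproduced on this page. As a stand-in, here is a minimal sketch of hyperlink extraction using Python 3's standard `html.parser` and `urllib.parse` modules. The names `LinkExtractor` and `extract_links` and the optional `base_url` parameter are assumptions made for this example, not part of the original snippet.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag fed to the parser (hypothetical helper)."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the base URL, if one was given.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html_text, base_url=""):
    """Return the hyperlinks found in html_text, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = '<p><a href="a.html">A</a> <a name="x">anchor</a> <a href="http://example.com/b">B</a></p>'
    print(extract_links(sample, "http://example.org/docs/"))
```

Anchors without an `href` attribute are skipped, and relative links are only resolved when a base URL is supplied, so the same function works for both uploaded files and pages fetched by URL.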