Changes between Version 17 and Version 18 of BluePrint/Importer


Timestamp:
05/19/10 10:12:47 (15 years ago)
Author:
Fran Boon
Comment:

--

  • BluePrint/Importer

    v17 v18
      5   5
      6   6 The SpreadsheetImporter will be a component of this.
          7
      7   8 But it would also be good to be able to import from the following formats:
      8   9  * PDF
      9  10  * HTML (File/URL)
     10  11  * DOC
     11      * XML (Not matching out data schema)
     12      * RSS
         12  * XML formats (not matching out data schema) via [wiki:S3XRC S3XRC], such as:
         13   * RSS
         14   * Ushahidi
     13  15  * News feeds
     14      * Ushahidi
     15  16  * Incoming SMS
     16     Some of these formats will be able to be parsed and imported, others may be unstructured and saved as a "New Feed".[[BR]]
         17
         18 Some of these formats will be able to be parsed and imported, others may be unstructured and saved as a "New Feed".
         19
     17  20 Some of the data may be tabular, or just single record.
     18  21

     28  31
     29  32 Some links that might be useful:
         33   * Karma: a system for doing the Import/Clean/Integrate/Publish workflow through a UI paradigm of 'Programming by Example' (instead of via Widgets):
         34    * [ftp://ftp.umiacs.umd.edu/pub/louiqa/PUB2010/GeoNets_Shubham.pdf Presentation from ISCRAM 2010]
         35    * [http://isi.edu/integration/videos/mashup_building.mp4 Video]
         36    * [http://content.digitalwell.washington.edu/msr/external_release_talks_12_05_2005/16012/lecture.htm Presentation]
         37   * [http://mashmaker.intel.com Intel MashMaker]: Firefox extension to ease widget-based HTML-based mashups
     30  38   * http://wiki.github.com/fizx/parsley/
     31  39   * http://developer.yahoo.com/yql/guide/
     32       * PDFminer is a tool to convert pdf docs into text, it is open source [http://www.unixuser.org/~euske/python/pdfminer/index.html#license (Licence)].
         40   * [http://www.unixuser.org/~euske/python/pdfminer/ PDFMiner] is an !OpenSource tool to convert PDF docs into text.
     33  41
     34  42   * Code snippet to extract hyperlinks from HTML docs.
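
For illustration only, hyperlink extraction from an HTML document could be sketched with nothing but the Python standard library. This is a minimal sketch, not the snippet referenced by the wiki page: the names `LinkExtractor` and `extract_links` and the `base_url` parameter are assumptions introduced here, and nothing below is part of the importer.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect href targets of <a> tags (hypothetical helper, not importer code)."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url  # assumed: URL the document was fetched from
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the base URL, if one was given
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url=""):
    """Return all hyperlink targets found in an HTML string, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Called as `extract_links(page_source, "http://example.org/docs/")`, this returns absolute URLs for relative links and leaves already-absolute ones untouched. Anchors without an `href` (for example named anchors) are skipped.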