Version 6 (modified by Michael Howden, 14 years ago)


Integrating/Developing a framework to extract structured data from web sources with a simple query language.

Some links that might be useful:

The Spreadsheet Importer will be a component of this.

But it would also be good to be able to import from the following formats:

  • PDF
  • HTML (File/URL)
  • DOC
  • XML (not matching our data schema)
  • RSS
  • News feeds
  • Ushahidi

Some of these formats can be parsed and imported; others may be unstructured and will be saved as a "New Feed".
Ideally, users will be able to design different templates for importing different types of data.
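
One way to picture the "parse if possible, otherwise keep the raw content" idea together with user-designed templates is the minimal Python sketch below. Everything in it is an assumption made for illustration and not part of any existing importer: the `PARSERS` registry, the `register` decorator, `apply_template`, `import_source`, and the shape of the returned dictionary are all hypothetical names. A template is assumed here to be a plain mapping from source field names to target field names.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical registry: format name -> parser function (an assumption for this sketch).
PARSERS = {}


def register(fmt):
    """Decorator that records a parser function under a format name."""
    def wrap(func):
        PARSERS[fmt] = func
        return func
    return wrap


@register("rss")
def parse_rss(text):
    """Return one dict per <item> element, keyed by child tag name."""
    root = ET.fromstring(text)
    return [
        {child.tag: (child.text or "").strip() for child in item}
        for item in root.iter("item")
    ]


@register("csv")
def parse_csv(text):
    """Return one dict per CSV row, keyed by the header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def apply_template(records, template):
    """Rename source fields to target fields; fields not in the template are dropped."""
    return [
        {target: record.get(source, "") for source, target in template.items()}
        for record in records
    ]


def import_source(text, fmt, template=None):
    """Parse text with the registered parser, or fall back to storing it raw."""
    unstructured = {"structured": False, "records": [{"raw": text}]}
    parser = PARSERS.get(fmt)
    if parser is None:
        # No parser for this format (for example PDF or DOC in this sketch).
        return unstructured
    try:
        records = parser(text)
    except (ET.ParseError, csv.Error):
        # Malformed input is kept as raw text instead of being discarded.
        return unstructured
    if template:
        records = apply_template(records, template)
    return {"structured": True, "records": records}
```

With this sketch, calling `import_source(rss_text, "rss", {"title": "name", "link": "url"})` would return structured records with the fields renamed by the template, while a format with no registered parser, such as `"pdf"` here, would come back flagged as unstructured with the raw text preserved. A registry keeps each format's parser independent, so new formats can be added without touching the dispatch logic.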
