Integrating or developing a framework to extract structured data from web sources with a simple query language.

Some links that might be useful:
 * http://wiki.github.com/fizx/parsley/
 * http://developer.yahoo.com/yql/guide/
 * PDFminer is an open-source tool for converting PDF documents into text [http://www.unixuser.org/~euske/python/pdfminer/index.html#license (Licence)]. Hacking on its source code is a good option for building the importing tool.

The [http://trac.sahanapy.org/wiki/SpreadsheetImporter Spreadsheet Importer] will be a component of this.
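
To make the idea of a "simple query language" concrete, here is a minimal sketch in Python using only the standard library. It assumes a query is a mapping from field names to selectors of the form `tag` or `tag.class`. The names `extract` and `_Collector`, and the selector syntax itself, are hypothetical and chosen for illustration; this is not the syntax of parsley or YQL.

```python
from html.parser import HTMLParser


class _Collector(HTMLParser):
    """Collect the text of every element matching 'tag' or 'tag.class'.

    Hypothetical helper for illustration only.
    """

    def __init__(self, selector):
        super().__init__()
        tag, _, cls = selector.partition(".")
        self.tag = tag
        self.cls = cls or None   # None means "any class"
        self.depth = 0           # > 0 while inside a matching element
        self.buffer = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested elements with the same tag so the right end tag closes the match.
            if tag == self.tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and (self.cls is None or self.cls in classes):
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract(html, query):
    """Apply a {field: selector} query to an HTML string.

    Returns {field: [text, ...]} with one entry per matching element.
    """
    record = {}
    for field, selector in query.items():
        collector = _Collector(selector)
        collector.feed(html)
        collector.close()
        record[field] = collector.results
    return record


page = (
    '<h1>Shelters</h1>'
    '<ul><li class="org">Red Cross</li><li class="org">UNICEF</li></ul>'
)
print(extract(page, {"title": "h1", "orgs": "li.org"}))
```

A real framework would need a richer selector language (for example XPath or CSS selectors) and fetching of remote pages; the sketch only shows how a declarative query can be turned into a structured record.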