| 2 | |
| 3 | The [http://trac.sahanapy.org/wiki/SpreadsheetImporter Spreadsheet Importer] will be a component of this. |
| 4 | It would also be good to be able to import from the following formats and sources: |
| 5 | * PDF |
| 6 | * HTML (File/URL) |
| 7 | * DOC |
| 8 | * XML (not matching our data schema) |
| 9 | * RSS |
| 10 | * News feeds |
| 11 | * Ushahidi |
| 12 | * Incoming SMS |
| 13 | Some of these formats can be parsed and imported; others may be unstructured and saved as a "News Feed" item.[[BR]] |
| 14 | Some of the data may be tabular; some may be just a single record.[[BR]] |
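The split between "parsed and imported" and "saved to the news feed" could be sketched as a small dispatcher. This is only an illustration under stated assumptions: the names `PARSERS`, `parse_csv`, `parse_rss` and `import_source` are hypothetical and not part of any existing Sahana module, and only two formats are handled, using the Python standard library.

```python
import csv
import io
import xml.etree.ElementTree as ET


def parse_csv(text):
    """Parse tabular text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_rss(text):
    """Extract title and link from each <item> of an RSS document."""
    root = ET.fromstring(text)
    return [
        {"title": item.findtext("title", default=""),
         "link": item.findtext("link", default="")}
        for item in root.iter("item")
    ]


# Hypothetical registry: format name -> parser function.
PARSERS = {"csv": parse_csv, "rss": parse_rss}


def import_source(fmt, text):
    """Return ("records", rows) if the input parses, else ("news_feed", [item])."""
    parser = PARSERS.get(fmt)
    if parser is not None:
        try:
            rows = parser(text)
            if rows:
                return "records", rows
        except (csv.Error, ET.ParseError):
            pass  # Fall through and keep the raw text instead of failing.
    # Unknown or unparseable input is kept as an unstructured feed item.
    return "news_feed", [{"format": fmt, "body": text}]


kind_csv, rows_csv = import_source("csv", "name,phone\nAlice,123\n")
kind_sms, rows_sms = import_source("sms", "Need water at camp 4")
```

Here the CSV input ends up as structured records, while the SMS text, for which no parser is registered, is stored unchanged as a feed item.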
| 15 | |
| 16 | The goal is a generic importing tool that allows data to be imported from various sources automatically. The data could be parsed and fitted into our data model, or it may just be added to a news feed aggregator. This project could include: |
| 17 | * A user-friendly interface for matching fields when parsing the data |
| 18 | * Importing from "flat" tables to linked tables |
| 19 | * Methods of cleaning data automatically (or through a user-friendly interface), e.g. merging duplicate values that differ only because of typos |
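The last two items above (flat tables to linked tables, and merging values that differ only by typos) can be sketched together. This is a minimal illustration, assuming that a simple string-similarity ratio from Python's standard `difflib` is good enough to spot typos; the function names, the sample data and the 0.85 threshold are all hypothetical, and a real importer would probably ask a user to confirm each merge.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.85):
    """Return True when two strings are near-duplicates (case-insensitive)."""
    ratio = SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()
    return ratio >= threshold


def flat_to_linked(rows, field, threshold=0.85):
    """Move one column of a flat table into a linked lookup table.

    Returns (lookup, linked_rows): lookup maps id -> canonical value, and
    each linked row carries "<field>_id" instead of the raw value.
    """
    lookup = {}
    linked_rows = []
    for row in rows:
        value = row[field]
        # Reuse an existing entry when the value looks like a typo of it.
        match_id = next(
            (i for i, known in lookup.items() if similar(known, value, threshold)),
            None,
        )
        if match_id is None:
            match_id = len(lookup) + 1
            lookup[match_id] = value.strip()
        linked = {k: v for k, v in row.items() if k != field}
        linked[field + "_id"] = match_id
        linked_rows.append(linked)
    return lookup, linked_rows


rows = [
    {"name": "Shelter A", "organisation": "Red Cross"},
    {"name": "Shelter B", "organisation": "Red Crss"},
    {"name": "Clinic C", "organisation": "Oxfam"},
]
lookup, linked_rows = flat_to_linked(rows, "organisation")
```

In this sample the misspelt "Red Crss" is linked to the same lookup entry as "Red Cross", so the lookup table ends up with two organisations instead of three. The first spelling seen is kept as the canonical one, which is a simplification.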
| 20 | Ideally, users will be able to design different templates for importing different types of data. Machine learning algorithms with (multiple?) human verification could try parsing new data formats based on previously used templates. |