wiki:BluePrint/DataRepository

Version 2 (modified by devin, 7 years ago)

Data repository tools such as CKAN are becoming popular within the humanitarian aid space, as evidenced by projects like HDX and Data.gov's disaster portal.

These tools allow users to publish datasets and attach metadata that makes them easy for others to find. This is particularly useful for organizations that receive and produce many raw and refined datasets. Many of these organizations also collect datasets that they later integrate into their own information management systems. Sometimes the data held in those systems is also data they want to make available in raw form through a data repository.

Since Sahana produces the kind of information management system into which people want to integrate the data they collect, it makes sense for Sahana to provide data repository functionality that lets users publish datasets with metadata that follows the DCAT standard and is accessible via an API.

It's likely this data would fall into a few categories:

  • raw collected datasets (ex. information about medical clinics collected by workers in the field)
  • polished datasets (ex. medical clinics from WHO)
  • datasets produced by the Sahana system (ex. all medical facilities being managed in the Sahana system)
  • documents and reports (ex. PDF of reports and supplemental spreadsheet information)

The basic idea is to create a "data repository module" that would perform some key functions:

  • Publish Data
    • registered users can publish data via link or file upload
    • they can add metadata that conforms to DCAT standards
    • they can set permissions for that data, starting with three levels: public, metadata view only, private
  • Manage Data
    • users can become the manager of a specific dataset
    • they can change its status (new, processing, processed up-to-date, outdated)
    • they can edit its metadata information
    • they can delete it
  • Find Data
    • users can filter and search through data
    • they can download data in multiple formats (if available)
    • they can add comments/notes to the data
    • they can access metadata information via API
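
The three starting permission levels and the filter/search step could be enforced at read time roughly as follows. This is a minimal sketch, not Sahana Eden code: `Permission`, `Dataset`, `view_for` and `find` are hypothetical names, the sample catalog and its URLs are made up, and letting only the manager see a private dataset is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Permission(Enum):
    PUBLIC = "public"                # metadata and data visible to everyone
    METADATA_ONLY = "metadata_only"  # metadata visible, data restricted
    PRIVATE = "private"              # nothing visible to other users


@dataclass
class Dataset:
    title: str
    manager: str
    permission: Permission
    formats: List[str] = field(default_factory=list)
    download_url: Optional[str] = None


def view_for(dataset: Dataset, user: Optional[str]) -> Optional[dict]:
    """Return what `user` may see of `dataset`, or None if nothing."""
    # Assumption: the manager always has full access to their dataset.
    is_manager = user is not None and user == dataset.manager
    if dataset.permission is Permission.PRIVATE and not is_manager:
        return None
    view = {"title": dataset.title, "formats": list(dataset.formats)}
    if dataset.permission is Permission.PUBLIC or is_manager:
        view["download_url"] = dataset.download_url
    return view


def find(datasets, user, text="", data_format=None):
    """Filter by title substring and format, honouring permissions."""
    results = []
    for ds in datasets:
        view = view_for(ds, user)
        if view is None:
            continue
        if text.lower() not in ds.title.lower():
            continue
        if data_format is not None and data_format not in ds.formats:
            continue
        results.append(view)
    return results


# Made-up sample data for illustration only.
catalog = [
    Dataset("WHO medical clinics", "amina", Permission.PUBLIC,
            ["csv"], "https://example.org/who.csv"),
    Dataset("Field clinic survey", "amina", Permission.METADATA_ONLY,
            ["xlsx"], "https://example.org/survey.xlsx"),
    Dataset("Draft clinic notes", "amina", Permission.PRIVATE,
            ["pdf"], "https://example.org/notes.pdf"),
]

anonymous_hits = find(catalog, None, text="clinic")
manager_hits = find(catalog, "amina", text="clinic")
```

Here an anonymous search sees the public dataset in full and the metadata-only dataset without a download link, while the manager sees all three. Filtering on the already-reduced view keeps the permission check in one place, so an API endpoint for metadata could reuse `view_for` unchanged.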

Potential Schema:

  • Title
  • Data Formats
  • Original Author (individual, organization or group)
  • Date/Time Submitted
  • Submitted through (channel)
  • Date/Time Updated
  • Updated By (individual, organization or group)
  • Purpose
  • Permissions: Public, View Metadata, Private
  • Status: New, Processing (+manager), Integrate (+reference_link, +note)
  • Manager (Sahana user managing this data set)
  • Accessibility Notes
  • General Notes
  • Change Log
  • Comments
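
As an illustration, the schema above could be modelled as a single record type with a DCAT-style export. This is a sketch under stated assumptions rather than a proposed implementation: `DatasetRecord`, `set_status` and `to_dcat` are hypothetical names, the mapping of schema fields to DCAT and Dublin Core terms (for example Purpose to `dct:description`) is a guess, and the sample values are made up.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

STATUSES = ("New", "Processing", "Integrate")
PERMISSIONS = ("Public", "View Metadata", "Private")


@dataclass
class DatasetRecord:
    title: str
    data_formats: List[str]
    original_author: str             # individual, organization or group
    submitted_at: datetime
    submitted_through: str           # channel
    purpose: str = ""
    permission: str = "Public"       # one of PERMISSIONS
    status: str = "New"              # one of STATUSES
    manager: Optional[str] = None    # Sahana user managing this dataset
    reference_link: Optional[str] = None
    status_note: str = ""
    updated_at: Optional[datetime] = None
    updated_by: Optional[str] = None
    accessibility_notes: str = ""
    general_notes: str = ""
    change_log: List[str] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)

    def set_status(self, status, by, manager=None, reference_link=None,
                   note="", when=None):
        """Change status, store the extra fields, and log the change."""
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status}")
        when = when or datetime.now(timezone.utc)
        self.change_log.append(
            f"{when.isoformat()} {by}: {self.status} -> {status}")
        self.status = status
        if manager is not None:
            self.manager = manager
        if reference_link is not None:
            self.reference_link = reference_link
        self.status_note = note
        self.updated_at, self.updated_by = when, by

    def to_dcat(self) -> dict:
        """Export public metadata using DCAT / Dublin Core term names."""
        modified = self.updated_at or self.submitted_at
        return {
            "@type": "dcat:Dataset",
            "dct:title": self.title,
            "dct:description": self.purpose,   # assumed mapping
            "dct:creator": self.original_author,
            "dct:issued": self.submitted_at.isoformat(),
            "dct:modified": modified.isoformat(),
            "dcat:distribution": [
                {"@type": "dcat:Distribution", "dct:format": fmt}
                for fmt in self.data_formats
            ],
        }


# Made-up sample record for illustration only.
record = DatasetRecord(
    title="Medical clinics (field survey)",
    data_formats=["csv", "xlsx"],
    original_author="Field team A",
    submitted_at=datetime(2014, 5, 1, 12, 0, tzinfo=timezone.utc),
    submitted_through="web upload",
    purpose="Locations of clinics collected by field workers",
)
record.set_status("Processing", by="devin", manager="devin")
metadata = record.to_dcat()
```

Keeping Change Log and Comments as plain lists is the simplest choice for a sketch; in a real module they would more likely be separate related tables.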