UUID conventions
General
UUIDs (Universally Unique Identifiers) in Eden follow the Python uuid implementation, which is specified in RFC 4122.
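As a minimal sketch of what the Python uuid module provides: the snippet below generates a random (version 4) UUID and parses an existing one from its canonical string form. Which UUID version Eden actually generates is not stated on this page, so the choice of `uuid4()` here is an assumption for illustration only.

```python
import uuid

# Generate a random (version 4) UUID. The UUID version used by Eden
# is an assumption here; this page does not specify it.
record_uuid = uuid.uuid4()
print(str(record_uuid))  # canonical 8-4-4-4-12 hexadecimal form

# Parsing an existing string raises ValueError if it is malformed.
parsed = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(str(parsed))
```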
Other systems do not necessarily follow this convention (especially data sets outside of IT systems), but we must still be able to identify such data resources.
Therefore, in every shared resource (XML+JSON), the UUID shall be prefixed by the domain name of the originating instance (authoritative domain) plus a slash ("/"), e.g.
haiti.sahanafoundation.org/12345678-1234-5678-1234-567812345678
Generally, a "domain name" here can be any arbitrary XML name, except that it must not contain any slashes. The domain name must be unique, however, so it is recommended to use the internet domain name of the current instance. This in turn requires that all applications which share that domain name and take part in the data exchange adhere to the same UUID convention (UUIDs have to be unique at least within the same domain).
This convention is adopted from the person_record_id convention of PFIF.
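The prefixing rule can be sketched in a few lines of Python. The function names `make_shared_uuid` and `split_shared_uuid` are hypothetical and not part of Eden; the sketch only relies on the rule that the domain name contains no slashes, so the first slash always separates the prefix from the UUID.

```python
from typing import Optional, Tuple


def make_shared_uuid(domain: str, uid: str) -> str:
    """Prefix a bare UUID with the authoritative domain and a slash.

    Hypothetical helper, not an Eden API.
    """
    if not domain or "/" in domain:
        raise ValueError("domain name must be non-empty and contain no slashes")
    return f"{domain}/{uid}"


def split_shared_uuid(value: str) -> Tuple[Optional[str], str]:
    """Split 'domain/uuid' into (domain, uuid).

    The domain is None when the value carries no prefix.
    """
    domain, sep, uid = value.partition("/")
    if not sep:
        return None, value
    return domain, uid


shared = make_shared_uuid(
    "haiti.sahanafoundation.org",
    "12345678-1234-5678-1234-567812345678",
)
print(shared)
print(split_shared_uuid(shared))
```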
Implementation Guideline
At import, when the system receives a resource with a prefixed UUID and the prefix matches the domain of the current instance, the prefix is removed before the resource is looked up in the database. If the prefix differs from the domain of the current instance, the prefix is retained in the database record.
At export, all unprefixed UUIDs from the database get prefixed by the domain name of the current instance. In those UUIDs which already have a prefix (because they have been imported from another domain), the prefix is retained.
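The import and export rules above could look roughly like the following sketch. `OWN_DOMAIN`, `import_uuid` and `export_uuid` are illustrative names, not Eden's actual implementation, and the domain value is only an example.

```python
# Assumed domain of the current instance (example value only).
OWN_DOMAIN = "haiti.sahanafoundation.org"


def import_uuid(value: str, own_domain: str = OWN_DOMAIN) -> str:
    """Return the UUID as it would be stored in the local database."""
    domain, sep, uid = value.partition("/")
    if sep and domain == own_domain:
        # Our own prefix: strip it to identify the local record.
        return uid
    # Foreign prefix (or no prefix at all): keep the value unchanged.
    return value


def export_uuid(value: str, own_domain: str = OWN_DOMAIN) -> str:
    """Return the UUID as it would appear in a shared resource."""
    if "/" in value:
        # Already prefixed, i.e. imported from another domain: retain it.
        return value
    return f"{own_domain}/{value}"


uid = "12345678-1234-5678-1234-567812345678"
print(import_uuid(f"{OWN_DOMAIN}/{uid}"))   # stored without prefix
print(import_uuid(f"other.example.org/{uid}"))  # stored with foreign prefix
print(export_uuid(uid))                     # exported with our prefix
```

With these two rules, a locally created record keeps the same external identifier across an export followed by a re-import, while records from other domains are never relabelled.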
URNs instead of UUIDs
Eden has just moved from UUIDs to URNs in order to enhance interoperability in multi-application scenarios like Haiti or Pakistan.
From experience we know that data exchange in the field can involve a variety of applications other than Eden, each implementing its own identifier scheme, as well as data sets that use officially assigned identifiers instead of application-specific IDs (e.g. PAHO IDs for health facilities in Haiti). URNs will add support both for multiple identifier schemes and for cross-application common namespaces and ID schemas (useful, for example, for geolocations or personal data).
In practice, that means:
- there should be a common namespace for Sahana applications, ideally "sahana"
- uuid="eden.sahanafoundation.org/XXXX-YYYY" would become something like uuid="urn:sahana:eden.sahanafoundation.org/XXXX-YYYY"
- Eden can support other namespaces, by making the namespace a configurable attribute of the "uuidstamp" reusable field
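A small sketch of the URN form described in the list above, with the namespace as a configurable parameter. The helper names `to_urn` and `from_urn` are hypothetical, and the parsing is deliberately simplistic: it splits on the first two colons only and does not validate the full URN syntax.

```python
from typing import Tuple

DEFAULT_NAMESPACE = "sahana"  # common namespace proposed above


def to_urn(shared_uuid: str, namespace: str = DEFAULT_NAMESPACE) -> str:
    """Wrap a domain-prefixed UUID in a URN with the given namespace."""
    return f"urn:{namespace}:{shared_uuid}"


def from_urn(value: str) -> Tuple[str, str]:
    """Split 'urn:<namespace>:<identifier>' into (namespace, identifier)."""
    parts = value.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn" or not parts[1] or not parts[2]:
        raise ValueError(f"not a URN: {value!r}")
    return parts[1], parts[2]


urn = to_urn("eden.sahanafoundation.org/XXXX-YYYY")
print(urn)
print(from_urn(urn))
```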
Mapping
We need an agreed set of UUIDs for GIS data so that we can share data more easily across systems, such as the current Pakistan data.
- OpenStreetMap IDs can change over time if records are deleted/recreated
  - They can hold additional uid/uuid fields though
- Geonames data isn't free enough for OSM: http://wiki.openstreetmap.org/wiki/Geonames
  - Geonet can be: http://wiki.openstreetmap.org/wiki/GEOnet_Names_Server
- Yahoo WoE data is in the public domain:
- Ushahidi doesn't have a common set of IDs across instances
  - No space for a UUID either?
- Longer-term we need a common central repository of UUIDs that is held for the common good.
  - Propose that Sahana start this off & then give up ownership/branding later:
    - Sahana can use UUIDs of format: http://geo.sahanafoundation.org/<ID>
    - These are associated with an OSM ID & Geonames ID for cross-correlation
    - The Source ID field shows the source, so that OSM export can filter out sources like Geonames
    - IDs from Ushahidi instances can be appended to the comments field in Sahana
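To make the proposal more concrete, a record in such a repository might be modelled as below. This is only a sketch: the class `GeoRecord`, its field names, and the source labels "osm" and "geonames" are assumptions for illustration, not an existing Sahana schema.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

GEO_BASE = "http://geo.sahanafoundation.org/"  # format proposed above


@dataclass
class GeoRecord:
    """Hypothetical repository entry; all field names are illustrative."""
    local_id: str                      # the <ID> part of the identifier
    source: str                        # where the record came from
    osm_id: Optional[int] = None       # cross-reference to OpenStreetMap
    geonames_id: Optional[int] = None  # cross-reference to Geonames
    comments: str = ""                 # e.g. IDs from Ushahidi instances

    @property
    def uuid(self) -> str:
        return GEO_BASE + self.local_id


def osm_exportable(records: Iterable[GeoRecord],
                   excluded_sources=("geonames",)) -> List[GeoRecord]:
    """Keep only records whose source may be exported to OSM."""
    return [r for r in records if r.source not in excluded_sources]


records = [
    GeoRecord("1001", source="osm", osm_id=42),
    GeoRecord("1002", source="geonames", geonames_id=7),
]
print([r.uuid for r in osm_exportable(records)])
```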