Version 14 (modified by Dominic König, 13 years ago)

UUID conventions

General

All records in Sahana Eden that are to be shared with other instances or applications must have a universally unique identifier (UUID).

In Sahana Eden, these record UUIDs must be ASCII strings.

Eden automatically generates a UUID for each record according to RFC 4122:

http://www.faqs.org/rfcs/rfc4122.html

All record UUIDs generated by Sahana Eden are strings in URN notation:

urn:uuid:a46c1b3b-35c0-44c6-9142-bbe3a013039a
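As a minimal sketch (not Eden's actual code), Python's standard `uuid` module can produce an RFC 4122 identifier in this URN notation. The helper name `new_record_uuid` is hypothetical, and the choice of a random (version 4) UUID is an assumption based on the example above:

```python
import uuid


def new_record_uuid():
    """Return a random (version 4) RFC 4122 UUID as a URN string."""
    # UUID.urn yields the "urn:uuid:<hex-with-dashes>" form
    return uuid.uuid4().urn


record_uuid = new_record_uuid()
print(record_uuid)  # a string of the form urn:uuid:xxxxxxxx-xxxx-4xxx-xxxx-xxxxxxxxxxxx
```

The result is a plain ASCII string, so it can be stored in an ordinary string field.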

If an imported record already has a UUID, Eden retains that UUID as-is, without any syntax validation (i.e. no specific UUID schema is required). The one exception is a local domain prefix, described below. For consistency, however, we recommend using URNs.

Some data formats (e.g. PFIF) may require a domain prefix for a UUID, followed by a slash:

sahanafoundation.org/urn:uuid:a46c1b3b-35c0-44c6-9142-bbe3a013039a

Where such a domain prefix is used and it matches the local domain of the importing Sahana instance, it will be removed during import.

Implementation Guideline

At import, when the system receives a resource with a prefixed UUID and the prefix matches the domain of the current instance, the prefix is removed so that the resource can be identified in the database by its unprefixed UUID. If the prefix differs from our domain, the prefix is retained in the database record.

At export, all unprefixed UUIDs from the database get prefixed by the domain name of the current instance. In those UUIDs which already have a prefix (because they have been imported from another domain), the prefix is retained.
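The import and export rules above can be sketched as two small functions. This is an illustration only, not Eden's implementation: the names `import_uuid` and `export_uuid` are hypothetical, the domain comparison is assumed to be an exact string match, and a UUID is assumed to carry a prefix whenever it contains a slash (which may not hold for arbitrary imported identifiers).

```python
def import_uuid(uid, local_domain):
    """Strip the domain prefix if it matches the local domain, else keep it."""
    prefix = local_domain + "/"
    if uid.startswith(prefix):
        return uid[len(prefix):]
    return uid


def export_uuid(uid, local_domain):
    """Prefix unprefixed UUIDs with the local domain; keep existing prefixes."""
    # Assumption: any slash in the identifier marks an existing domain prefix
    if "/" in uid:
        return uid
    return "%s/%s" % (local_domain, uid)


LOCAL = "sahanafoundation.org"
URN = "urn:uuid:a46c1b3b-35c0-44c6-9142-bbe3a013039a"

print(import_uuid(LOCAL + "/" + URN, LOCAL))         # local prefix removed
print(import_uuid("example.org/" + URN, LOCAL))      # foreign prefix retained
print(export_uuid(URN, LOCAL))                       # local prefix added
print(export_uuid("example.org/" + URN, LOCAL))      # foreign prefix retained
```

Here `example.org` stands in for any other instance's domain. With these rules, a record exported and re-imported by the same instance ends up with its original unprefixed UUID.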

URNs instead of UUIDs

Eden has just moved from UUIDs to URNs in order to enhance interoperability in multi-application scenarios like Haiti or Pakistan.

From experience we know that data exchange in the field can involve a variety of applications other than Eden, each implementing its own identifier scheme, as well as data sets that use officially assigned identifiers instead of application-specific IDs (e.g. PAHO IDs for health facilities in Haiti). Implementing URNs adds support both for multiple identifier schemes and for cross-application common namespaces and ID schemas (which are desirable e.g. for geolocations or personal data).

In practice, that means:

there should be a common namespace for Sahana applications, ideally "sahana"
uuid="eden.sahanafoundation.org/XXXX-YYYY" would become something like uuid="urn:sahana:eden.sahanafoundation.org/XXXX-YYYY"
Eden can support other namespaces by making the namespace a configurable attribute of the "uuidstamp" reusable field
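The points above might look roughly like this in code. It is a sketch under assumptions: `make_urn` is a hypothetical helper, not part of the "uuidstamp" field, and the exact URN layout is taken from the "something like" example above rather than from a settled specification.

```python
def make_urn(local_id, domain, namespace="sahana"):
    """Build a namespaced URN from a domain and a local identifier.

    The namespace defaults to "sahana" but is configurable, so that
    other identifier schemes can be supported.
    """
    return "urn:%s:%s/%s" % (namespace, domain, local_id)


print(make_urn("XXXX-YYYY", "eden.sahanafoundation.org"))
# urn:sahana:eden.sahanafoundation.org/XXXX-YYYY
```

Passing a different `namespace` argument is how another scheme would be selected in this sketch.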

Mapping

We need an agreed set of UUIDs for GIS data so that we can share data more easily across systems, such as the current Pakistan data.

OpenStreetMap IDs can change over time if records are deleted/recreated
- They can hold additional uid/uuid fields though
Geonames data isn't free enough for OSM: http://wiki.openstreetmap.org/wiki/Geonames
- Geonet can be: http://wiki.openstreetmap.org/wiki/GEOnet_Names_Server
Yahoo WoE data is in the public domain:
- XML: http://code.flickr.com/blog/2009/05/21/flickr-shapefiles-public-dataset-10/
- Tab: http://developer.yahoo.com/geo/geoplanet/data/
Ushahidi doesn't have a common set of IDs across instances
- No space for a UUID either?
Longer-term we need a common central repository of UUIDs that is held for the common good.
- Propose that Sahana start this off & then give up ownership/branding later:
- Sahana can use UUIDs of format: http://geo.sahanafoundation.org/<ID>
- These are associated with an OSM ID & Geonames ID for cross-correlation
- The Source ID field shows the source, so that OSM export can filter out sources like Geonames
- IDs from Ushahidi instances can be appended to the comments field in Sahana
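The proposed cross-correlation could be modelled along the following lines. This is purely illustrative: the class `GeoRecord`, its field names, the source labels "osm" and "geonames", and the numeric IDs are all hypothetical and not an agreed schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeoRecord:
    uid: str                           # e.g. http://geo.sahanafoundation.org/<ID>
    source: str                        # Source ID field: where the record came from
    osm_id: Optional[int] = None       # associated OpenStreetMap ID, if any
    geonames_id: Optional[int] = None  # associated Geonames ID, if any
    comments: str = ""                 # could hold IDs from Ushahidi instances


def osm_exportable(records, excluded_sources=("geonames",)):
    """Return only the records whose source is acceptable for OSM export."""
    return [r for r in records if r.source not in excluded_sources]


records = [
    GeoRecord("http://geo.sahanafoundation.org/1", "osm", osm_id=123),
    GeoRecord("http://geo.sahanafoundation.org/2", "geonames", geonames_id=456),
]
print([r.uid for r in osm_exportable(records)])
# ['http://geo.sahanafoundation.org/1']
```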