Changes between Version 38 and Version 39 of BluePrint/Importer


Timestamp: 01/22/11 13:06:25
Author: Pat Tressel

Having the data parsed into an !ElementTree allows [wiki:S3XRC S3XRC] to handle all the database integrity & framework rules.

> (Pat:) Q: Is this correct? !ElementTree, without pointers between separate trees, does not seem to have a way to encode a directed acyclic graph (DAG). A general database schema is a DAG plus self-loops (references from a table to itself, so long as the relations among individual records are not cyclic). For instance, consider volunteers. They have components via pe_id. They also have references to zero or more records in the volunteer skills table, and other volunteers point to those same skill records. Thus there are multiple roots (the skills) to the tree of volunteers. The same structure occurs in inventory, where catalog items are referenced by multiple order items, but order items are also components of orders. In these cases, there isn't a clean way to pick one root for a tree. If we added a skill category table, we would have diamond-shaped DAGs: a volunteer could point to several skills, and those skills could point to a common category. For output, this is not an issue, because the records will have their primary keys and foreign keys available. It only matters when creating a collection of DAG-structured data, since no actual keys have been assigned yet. This is not hard to overcome: it just means adding placeholder keys to represent the links between records in separate !ElementTrees. There are examples of DAG representations and algorithms; a search for "xml directed acyclic graph" will turn them up.

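A minimal sketch of the placeholder-key idea from the question above, using Python's standard `xml.etree.ElementTree`. The element names (`skill`, `volunteer`, `skill_ref`) and the `key`/`ref` attributes are hypothetical and are not the S3XML format; the sketch only shows that shared records can live in one tree and be referenced from another tree by placeholder keys, which recovers the multiple-parent (DAG) structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout (NOT the S3XML schema): shared records live in their own
# tree and carry a placeholder "key"; other trees point at them via "ref".
SKILLS_XML = """<skills>
  <skill key="s1" name="First aid"/>
  <skill key="s2" name="Logistics"/>
</skills>"""

VOLUNTEERS_XML = """<volunteers>
  <volunteer key="v1" name="Ann"><skill_ref ref="s1"/><skill_ref ref="s2"/></volunteer>
  <volunteer key="v2" name="Bob"><skill_ref ref="s1"/></volunteer>
</volunteers>"""


def link_trees(*roots):
    """Index keyed elements across all trees, then resolve placeholder refs.

    Returns (nodes, edges): nodes maps placeholder key -> element, and edges
    maps a referring element's key -> list of referenced keys.
    """
    nodes = {}
    for root in roots:
        for elem in root.iter():
            key = elem.get("key")
            if key is not None:
                if key in nodes:
                    raise ValueError(f"duplicate placeholder key {key!r}")
                nodes[key] = elem
    edges = {}
    for root in roots:
        for owner in root.iter():
            owner_key = owner.get("key")
            if owner_key is None:
                continue
            # Direct children carrying a "ref" are links to other records.
            refs = [child.get("ref") for child in owner if child.get("ref") is not None]
            for ref in refs:
                if ref not in nodes:
                    raise KeyError(f"dangling reference {ref!r} in {owner_key!r}")
            if refs:
                edges[owner_key] = refs
    return nodes, edges


skills = ET.fromstring(SKILLS_XML)
volunteers = ET.fromstring(VOLUNTEERS_XML)
nodes, edges = link_trees(skills, volunteers)

# Invert the edges to list the records that point at each skill.
parents = {}
for src, targets in edges.items():
    for dst in targets:
        parents.setdefault(dst, []).append(src)
print(parents)  # {'s1': ['v1', 'v2'], 's2': ['v1']}
```

Here `s1` ends up with two parents (`v1` and `v2`), which a single tree could only express by duplicating the skill record.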
> (Pat:) Dominic and I discussed this in #sahana-eden. Here's the result:

>> (Dominic:) S3XML supports DAGs via UIDs. Referenced <resource>s can be placed anywhere in the source as <resource name="tablename" uuid="XXX">, and then be referenced by <reference resource="tablename" uuid="XXX">. We're using UIDs here to facilitate identification of records (e.g. for updates), and we do accept foreign-generated UIDs for that. We could perhaps additionally introduce temporary reference IDs ("tuid") simply to establish the reference structure within the source, without requiring the generator to produce unique IDs (tuids must be unique only inside the source document, not universally). These tuids would then be replaced by UIDs during import. However, tuids cannot facilitate record identification, and therefore cannot be used for updates.

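The following sketches the temporary-reference-ID ("tuid") step Dominic proposes, again with `xml.etree.ElementTree`. The `<resource>`/`<reference>` elements and the `name`/`resource`/`uuid` attributes follow the description above; the `<s3xml>` root, the `<data>` and `field` markup, the `tuid` attribute name, the table names and the use of plain `uuid4()` strings as UIDs are assumptions for illustration, not the actual S3XML importer.

```python
import uuid
import xml.etree.ElementTree as ET

# Illustrative source document. Root, <data>, "field" and "tuid" are
# assumptions; only <resource>/<reference> + uuid come from the discussion.
SOURCE = """<s3xml>
  <resource name="hrm_skill" tuid="skill-1">
    <data field="name">First aid</data>
  </resource>
  <resource name="vol_volunteer">
    <reference field="skill_id" resource="hrm_skill" tuid="skill-1"/>
  </resource>
  <resource name="vol_volunteer">
    <reference field="skill_id" resource="hrm_skill" tuid="skill-1"/>
  </resource>
</s3xml>"""


def replace_tuids(root):
    """Replace document-local tuids with freshly generated UIDs.

    Returns a mapping tuid -> (table name, new UID). The mapping only lives
    for this import, which is why tuids cannot identify records for updates.
    """
    mapping = {}
    # Pass 1: give every tuid-bearing resource a real UID.
    for res in root.iter("resource"):
        tuid = res.get("tuid")
        if tuid is None:
            continue
        if tuid in mapping:
            raise ValueError(f"tuid {tuid!r} is not unique in this document")
        mapping[tuid] = (res.get("name"), str(uuid.uuid4()))
        res.set("uuid", mapping[tuid][1])
        del res.attrib["tuid"]
    # Pass 2: rewrite references so they point at the new UIDs.
    for ref in root.iter("reference"):
        tuid = ref.get("tuid")
        if tuid is None:
            continue
        if tuid not in mapping:
            raise KeyError(f"reference to undefined tuid {tuid!r}")
        table, uid = mapping[tuid]
        if ref.get("resource") != table:
            raise ValueError(f"tuid {tuid!r} belongs to {table!r}, not {ref.get('resource')!r}")
        ref.set("uuid", uid)
        del ref.attrib["tuid"]
    return mapping


root = ET.fromstring(SOURCE)
mapping = replace_tuids(root)
print(ET.tostring(root, encoding="unicode"))
```

Two passes are used because a referenced <resource> may appear anywhere in the source, including after the references to it. Because fresh UIDs are generated on every run, importing the same document twice would create new records rather than update the earlier ones, which is the limitation noted above.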
>>> (Pat:) One takeaway for the CSV importer: since all externally visible Eden records have UUIDs, and the importer is unpacking the records, it can simply create UUIDs for them to use as reference keys. The XML importer (which will be called by the CSV importer) will use these as the actual UUIDs, so the (mildly expensive) random-number call isn't wasted.

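A sketch of Pat's takeaway: a CSV importer that unpacks flat rows into separate linked records can generate the UUIDs itself and use them as the reference keys in the XML it hands to the XML importer. The CSV columns, table names (`hrm_skill`, `vol_volunteer`), field names and XML layout are hypothetical and only loosely modelled on the S3XML elements mentioned above.

```python
import csv
import io
import uuid
import xml.etree.ElementTree as ET

# Hypothetical CSV layout: one row per volunteer, with the skill named inline.
CSV_TEXT = """name,skill
Ann,First aid
Bob,First aid
Cid,Logistics
"""


def csv_to_xml(text):
    """Unpack flat CSV rows into separate resources linked by UUIDs.

    Each distinct skill becomes one resource; every volunteer references it by
    the UUID generated here, which the XML importer would then keep as the
    record's real UUID. Table and field names are illustrative assumptions.
    """
    root = ET.Element("s3xml")
    skill_uids = {}  # skill name -> generated UUID (dedupes shared records)
    for row in csv.DictReader(io.StringIO(text)):
        skill = row["skill"]
        if skill not in skill_uids:
            skill_uids[skill] = str(uuid.uuid4())
            res = ET.SubElement(root, "resource", name="hrm_skill", uuid=skill_uids[skill])
            ET.SubElement(res, "data", field="name").text = skill
        vol = ET.SubElement(root, "resource", name="vol_volunteer", uuid=str(uuid.uuid4()))
        ET.SubElement(vol, "data", field="name").text = row["name"]
        # The reference key is the UUID generated above, not a database id.
        ET.SubElement(vol, "reference", field="skill_id", resource="hrm_skill",
                      uuid=skill_uids[skill])
    return root


root = csv_to_xml(CSV_TEXT)
print(ET.tostring(root, encoding="unicode"))
```

Ann and Bob share one skill resource, referenced by the same generated UUID, so the skill is emitted once rather than duplicated per row.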
- This also allows Eden's Importer tool to be used as a mashup handler for other systems (such as Agasti), by posting the data back out.