Changes between Version 27 and Version 28 of BluePrint/Importer

Timestamp:: 01/22/11 11:28:09
Author:: Pat Tressel
Comment:: --

BluePrint/Importer

-              v27
+              v28
 }}}
+== CSV import ==
+What sources and forms of data should we support?
+Source use cases:
+A repository or other source that has its own data format that does not match ours.
+A means of mapping from the source's schema to ours will be needed.
+It is likely, for an existing source, that we will produce the schema mapping.
+We would want to detect schema changes, or be notified of them.
+A user who wants to upload data, and is willing to format it to our specification.
+In this case the data can be processed without a schema mapping.
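A minimal sketch of the first use case, assuming the schema mapping is a plain column-rename table and that a schema change shows up as a changed CSV header. The names `SCHEMA_MAPPING`, `check_header` and `map_rows`, and all column names, are hypothetical and not part of Eden.

```python
import csv
import io

# Hypothetical mapping from a source's column names to ours.
SCHEMA_MAPPING = {
    "Name": "name",
    "Lat": "latitude",
    "Lon": "longitude",
}


def check_header(header, mapping):
    """Return (missing, unexpected) column names relative to the mapping."""
    expected = set(mapping)
    found = set(header)
    return sorted(expected - found), sorted(found - expected)


def map_rows(csv_text, mapping):
    """Yield rows renamed to our schema; raise if the source header has drifted."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing, unexpected = check_header(reader.fieldnames or [], mapping)
    if missing or unexpected:
        raise ValueError(
            f"schema changed: missing={missing}, unexpected={unexpected}"
        )
    for row in reader:
        yield {mapping[col]: value for col, value in row.items()}


source = "Name,Lat,Lon\nShelter A,12.5,-70.1\n"
rows = list(map_rows(source, SCHEMA_MAPPING))
```

A user who formats data to our specification corresponds to the case where no renaming is needed, so only the header check would apply.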
+Source data structure use cases (not yet discussing mapping -- only the source schema):
+A flat table.
+Multiple tables, but with 1-1 or 1-to-at-most-1 relationships -- much the same as a flat table.
+1-N relationships (such as are represented by the dependent table having a foreign key (fk) reference to the primary table).
+M-N relationships (typically represented by a relationship table).
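The 1-N and M-N cases above can be sketched with small in-memory tables. This is only an illustration: the table names, column names and data are invented, and the grouping helper is one possible way to resolve the references.

```python
import csv
import io
from collections import defaultdict

# Illustrative tables; every table and column name here is an assumption.
ORGS = "id,name\n1,Org A\n2,Org B\n"
# 1-N: each office row carries an fk (org_id) to its organisation.
OFFICES = "id,org_id,city\n10,1,City X\n11,1,City Y\n12,2,City Z\n"
SECTORS = "id,name\n100,Health\n101,Water\n"
# M-N: a relationship table holding one row per (org, sector) pair.
ORG_SECTOR = "org_id,sector_id\n1,100\n2,100\n2,101\n"


def read(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def group_by(rows, key):
    """Group rows by the value of one column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups


# 1-N: collect dependent rows under the primary key they reference.
offices_by_org = group_by(read(OFFICES), "org_id")

# M-N: follow the relationship table to the other side.
sector_names = {s["id"]: s["name"] for s in read(SECTORS)}
sectors_by_org = defaultdict(list)
for link in read(ORG_SECTOR):
    sectors_by_org[link["org_id"]].append(sector_names[link["sector_id"]])
```

The 1-1 and 1-to-at-most-1 cases need no grouping, since each key appears at most once on the dependent side.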
+Possible CSV representations:
+Separate files per table with key references to link entries across tables.
+The keys can be existing Eden database keys (for updating existing
+records), scratch keys (not stored as ids in any other database, only
+used to associate dependent records for this upload), or external database
+keys (i.e. actual keys in the source database, which we might want to
+preserve for future updates).  This can easily represent any valence of
+relationship.
+One file with separate sections, equivalent to concatenating the separate
+files above.
+A single file with an outer join of all the tables.  For 1-N, the data on the
+1-side is repeated in each row along with the separate records of the
+N-side.  For M-N, either side may be replicated across multiple lines in the
+file, as needed.  For a deeper hierarchy, the common records are repeated
+as needed.  This is just a standard outer join.  If there is a large fanout
+(1-lots of records) then we could "compress" records by including one full copy
+of a record, then, in subsequent rows, just its key field with non-key fields
+left empty.  This can
+represent any valence of relationship at the expense of some extra storage.
+It has the advantage that related pieces are easy to identify, and it's not
+necessary for them to be in any specific order, except that if the above
+compression is used and some fields are required to be non-null, then
+it's simpler if the complete record is available before the partial records.
+A flat file with embedded structure -- that is, cells that contain records,
+or multiple items or records.  A simple example is a cell that contains a list
+of strings, or a collection of key=value pairs.  Or even XML...
+Any combination of the above.
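As one illustration of a combination, the sketch below reads a single outer-joined file that uses the "compressed" form for the 1-side and also has a cell with embedded key=value pairs. Everything specific here is an assumption: the column names, the `;` and `=` separators, and the choice to reject a partial record that arrives before its full record (the simpler ordering noted above).

```python
import csv
import io

# Illustrative joined file: org_* columns are the 1-side, office_* the N-side.
# The second row is compressed: only the org key is given, org_name is empty.
# office_tags embeds key=value pairs separated by ';' (an assumed convention).
JOINED = """org_key,org_name,office_city,office_tags
A,Org A,City X,type=hq;staff=12
A,,City Y,type=field
B,Org B,City Z,
"""


def parse_pairs(cell, item_sep=";", kv_sep="="):
    """Parse 'k1=v1;k2=v2' into a dict; an empty cell gives an empty dict."""
    pairs = {}
    for item in filter(None, (part.strip() for part in cell.split(item_sep))):
        key, _, value = item.partition(kv_sep)
        pairs[key.strip()] = value.strip()
    return pairs


def split_joined(text):
    """Split joined rows into parent records and their dependent records."""
    orgs = {}     # org_key -> parent record, taken from the first full copy
    offices = []  # dependent records, each carrying its parent's key
    for row in csv.DictReader(io.StringIO(text)):
        key = row["org_key"]
        if row["org_name"]:
            orgs.setdefault(key, {"name": row["org_name"]})
        elif key not in orgs:
            # Simplification: require the full record before partial ones.
            raise ValueError(f"partial record for {key!r} before its full record")
        offices.append({
            "org_key": key,
            "city": row["office_city"],
            "tags": parse_pairs(row["office_tags"]),
        })
    return orgs, offices


orgs, offices = split_joined(JOINED)
```

Allowing partial records to come first would mean buffering dependent rows until the full parent record is seen, which is the extra complexity the ordering requirement avoids.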
 ----
 BluePrints