Changes between Version 27 and Version 28 of BluePrint/Importer


Timestamp:
01/22/11 11:28:09 (14 years ago)
Author:
Pat Tressel

  • BluePrint/Importer

    v27 v28  
    111111}}}
    112112
     113== CSV import ==
     114
     115What sources and forms of data should we support?
     116
     117Source use cases:
     118
     119A repository or other source that has its own data format that does not match ours.
     120A means of mapping from the source's schema to ours will be needed.
     121It is likely, for an existing source, that we will produce the schema mapping.
     122We would want to detect schema changes, or be notified of them.
     123
     124A user who wants to upload data, and is willing to format it to our specification.
     125In this case the data can be processed without a schema mapping.
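
The first use case above could be sketched roughly as follows. This is only an illustration, assuming a hand-written mapping from source column names to ours; `SOURCE_TO_OURS`, `check_header` and `map_rows` are hypothetical names, not existing Eden code. Comparing the CSV header against the mapping is one simple way to detect a schema change.

```python
import csv
import io

# Hypothetical mapping from a source's column names to our field names.
SOURCE_TO_OURS = {"Name": "name", "Lat": "latitude", "Lon": "longitude"}

def check_header(header, mapping):
    """Return (missing, unexpected) source columns relative to the mapping."""
    expected = set(mapping)
    seen = set(header)
    return sorted(expected - seen), sorted(seen - expected)

def map_rows(text, mapping):
    """Yield rows renamed to our field names; fail if the source schema changed."""
    reader = csv.DictReader(io.StringIO(text))
    missing, unexpected = check_header(reader.fieldnames or [], mapping)
    if missing or unexpected:
        raise ValueError(
            f"schema changed: missing={missing}, unexpected={unexpected}"
        )
    for row in reader:
        yield {ours: row[theirs] for theirs, ours in mapping.items()}
```

A user who formats data to our specification (the second use case) would skip the mapping step and could be read with `csv.DictReader` directly.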
     126
     127Source data structure use cases (not yet discussing mapping -- only the source schema):
     128
     129A flat table.
     130
     131Multiple tables with 1-1 or 1-to-at-most-1 relationships -- much the same as a flat table.
     132
     1331-N relationships (such as are represented by the dependent table having a foreign key reference to the primary table).
     134
     135M-N relationships (typically represented by a relationship table).
     136
     137Possible CSV representations:
     138
     139Separate files per table with key references to link entries across tables.
     140The keys can be existing Eden database keys (for updating existing
     141records), scratch keys (not stored as ids in any other database, only
     142used to associate dependent records for this upload), or external database
     143keys (i.e. actual keys in the source database, which we might want to
     144preserve for future updates).  This can easily represent any valence of
     145relationship.
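
A minimal sketch of the separate-files representation, assuming scratch keys that exist only within one upload. The function name `link_by_key` and the column names `key` and `parent_key` are made up for illustration and are not part of any specification.

```python
import csv
import io

def link_by_key(parent_csv, child_csv, parent_key="key", child_fk="parent_key"):
    """Attach each dependent row to its primary row via a shared key column."""
    parents = {}
    for row in csv.DictReader(io.StringIO(parent_csv)):
        # Index primary records by their (scratch, Eden or external) key.
        parents[row[parent_key]] = {**row, "children": []}
    for row in csv.DictReader(io.StringIO(child_csv)):
        fk = row[child_fk]
        if fk not in parents:
            raise KeyError(f"dependent row refers to unknown key {fk!r}")
        parents[fk]["children"].append(row)
    return parents
```

An M-N relationship could be handled the same way with a third file of key pairs acting as the relationship table.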
     146
     147One file with separate sections, equivalent to concatenating the separate
     148files above.
     149
     150A single file with an outer join of all the tables.  For 1-N, the data on the
     1511-side is repeated in each row along with the separate records of the
     152N-side.  For M-N, either side may be replicated across multiple lines in the
     153file, as needed.  For a deeper hierarchy, the common records are repeated
     154as needed.  This is just a standard outer join.  If there is a large fanout
     155(1-lots of records) then we could "compress" records by including one full copy
     156of a record, followed by rows that carry just its key field with non-key fields left empty.  This can
     157represent any valence of relationship at the expense of some extra storage.
     158It has the advantage that related pieces are easy to identify, and it's not
     159necessary for them to be in any specific order, except that if the above
     160compression is used and some fields are required to be non-null, then
     161it's simpler if the complete record is available before the partial records.
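
The "compressed" outer join could be expanded back into a full outer join along these lines. This is a sketch under the assumption that the full copy of each 1-side record appears before its key-only rows; `expand_compressed` and its parameters are hypothetical names.

```python
import csv
import io

def expand_compressed(text, key_field, parent_fields):
    """Fill in the empty 1-side fields of key-only rows from the full record."""
    full = {}  # key -> non-key fields of the 1-side record, once seen in full
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        key = row[key_field]
        if any(row[f] for f in parent_fields):
            # A full copy of the 1-side record: remember its non-key fields.
            full[key] = {f: row[f] for f in parent_fields}
        elif key in full:
            # A key-only row: restore the fields that were left empty.
            row.update(full[key])
        else:
            raise ValueError(f"key {key!r} used before its full record")
        rows.append(row)
    return rows
```

Accepting partial rows before the full record would need a second pass, which is why the ordering assumption keeps this simple.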
     162
     163A flat file with embedded structure -- that is, cells that contain records,
     164or multiple items or records.  A simple example is a cell that contains a list
     165of strings, or a collection of key=value pairs.  Or even XML.
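
For the embedded-structure case, a cell holding a list or a set of key=value pairs might be unpacked as below. The `;` separator and the function names are assumptions made for this sketch; the page does not specify a cell syntax.

```python
def parse_list_cell(cell, sep=";"):
    """Split a cell such as 'water; food; shelter' into a list of strings."""
    return [item.strip() for item in cell.split(sep) if item.strip()]

def parse_pairs_cell(cell, sep=";"):
    """Split a cell such as 'beds=4; staff=2' into a dict of strings."""
    pairs = {}
    for item in parse_list_cell(cell, sep):
        key, eq, value = item.partition("=")
        if not eq:
            raise ValueError(f"expected key=value, got {item!r}")
        pairs[key.strip()] = value.strip()
    return pairs
```

Values are kept as strings here; type conversion would belong to whatever validates the target schema.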
     166
     167Any combination of the above.
     168
     169
     170
     171
    113172----
    114173BluePrints