Changes between Version 27 and Version 28 of BluePrint/Importer

01/22/11 11:28:09 (13 years ago)
Pat Tressel



    v27 v28  
     113== CSV import ==
     115What sources and forms of data should we support?
     117Source use cases:
     119A repository or other source that has its own data format that does not match ours.
     120A means of mapping from the source's schema to ours will be needed.
     121For an existing source, it is likely that we will produce the schema mapping.
     122We would want to detect schema changes, or be notified of them.
     124A user who wants to upload data, and is willing to format it to our specification.
     125In this case the data can be processed without a schema mapping.
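
The first use case above (a source with its own format, plus detection of schema changes) can be sketched roughly as follows. This is only an illustration under stated assumptions: the column names, the `SOURCE_TO_OURS` mapping, and the helpers `check_schema` and `map_rows` are hypothetical and are not part of Eden.

```python
import csv
import io

# Hypothetical mapping from a source's column names to our field names.
SOURCE_TO_OURS = {
    "Name": "name",
    "Lat": "latitude",
    "Lon": "longitude",
}


def check_schema(header, mapping):
    """Return (missing, unexpected) column names relative to the mapping."""
    expected = set(mapping)
    found = set(header)
    return sorted(expected - found), sorted(found - expected)


def map_rows(csv_text, mapping):
    """Yield rows renamed to our field names; raise if the schema changed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing, unexpected = check_schema(reader.fieldnames or [], mapping)
    if missing or unexpected:
        raise ValueError(
            "source schema changed: missing=%s unexpected=%s"
            % (missing, unexpected)
        )
    for row in reader:
        yield {ours: row[theirs] for theirs, ours in mapping.items()}
```

A user who formats data to our specification would, in this sketch, simply skip the mapping step; comparing the header against the expected columns is one cheap way to notice a schema change when the source does not send notifications.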
     127Source data structure use cases (not yet discussing mapping -- only the source schema):
     129A flat table.
     131Multiple tables, but related 1-1 or 1-to-at-most-1 -- much the same as a flat table.
     1331-N relationships (for example, represented by the dependent table having a foreign key reference to the primary table).
     135M-N relationships (typically represented by a relationship table).
     137Possible CSV representations:
     139Separate files per table with key references to link entries across tables.
     140The keys can be existing Eden database keys (for updating existing
     141records), scratch keys (not stored as ids in any other database, only
     142used to associate dependent records for this upload), or external database
     143keys (i.e. actual keys in the source database, which we might want to
     144preserve for future updates).  This can easily represent any valence of relationship.
     147One file with separate sections, equivalent to concatenating the separate
     148files above.
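
As a rough sketch of the per-table representation with scratch keys, the following loads a parent file and a dependent file and swaps the scratch keys for newly assigned ids. The file contents, the column names (`key`, `org_key`), and the function `load_with_scratch_keys` are assumptions made for illustration; a real import would insert into the database rather than into lists.

```python
import csv
import io

# Hypothetical upload: two per-table CSV files linked by scratch keys.
ORGS_CSV = "key,name\norg1,Red Cross\norg2,Oxfam\n"
OFFICES_CSV = (
    "key,org_key,city\n"
    "off1,org1,Geneva\n"
    "off2,org1,Nairobi\n"
    "off3,org2,Oxford\n"
)


def load_with_scratch_keys(parent_csv, child_csv, fk_field):
    """Load parents, then children, swapping scratch keys for new ids."""
    parents, children = [], []
    scratch_to_id = {}
    for row in csv.DictReader(io.StringIO(parent_csv)):
        new_id = len(parents) + 1  # stand-in for a database insert
        scratch_to_id[row.pop("key")] = new_id
        parents.append(dict(row, id=new_id))
    for row in csv.DictReader(io.StringIO(child_csv)):
        row.pop("key")  # the child's own scratch key is not stored
        parent_key = row.pop(fk_field)
        if parent_key not in scratch_to_id:
            raise KeyError("unknown parent key: %s" % parent_key)
        new_id = len(children) + 1
        children.append(
            dict(row, id=new_id, parent_id=scratch_to_id[parent_key])
        )
    return parents, children
```

The scratch keys exist only for the duration of the upload, which matches the description above; existing Eden keys or external database keys would need a lookup against stored records instead of the in-memory dictionary.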
     150A single file with an outer join of all the tables.  For 1-N, the data on the 1-side
     151is repeated in each row along with the separate records of the N-side.
     152For M-N, either side may be replicated across multiple lines in the
     153file, as needed.  For a deeper hierarchy, the common records are repeated
     154as needed.  This is just a standard outer join.  If there is a large fanout
     155(1 to very many records), the file could be "compressed" by including one full copy
     156of a record, followed by rows that carry just its key field with non-key fields left empty.  This can
     157represent any valence of relationship at the expense of some extra storage.
     158It has the advantage that related pieces are easy to identify, and it's not
     159necessary for them to be in any specific order, except that if the above
     160compression is used and some fields are required to be non-null, then
     161it's simpler if the complete record is available before the partial records.
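
The outer-join representation, including the "compressed" form, might be unpacked along these lines. This is a sketch under assumptions: the column names and the function `split_joined` are hypothetical, and it takes the simpler route mentioned above by requiring the complete record to appear before any partial record with the same key.

```python
import csv
import io

# Hypothetical outer-join file: organisation columns repeated per office.
# The second data row is "compressed": only the key of the 1-side is given.
JOINED_CSV = (
    "org_key,org_name,org_country,office_city\n"
    "org1,Red Cross,CH,Geneva\n"
    "org1,,,Nairobi\n"
    "org2,Oxfam,GB,Oxford\n"
)


def split_joined(csv_text, key_field, parent_fields, child_fields):
    """Split outer-join rows into unique parents and their child records."""
    parents = {}   # key -> parent record, taken from the first full row
    children = []  # (parent key, child record) pairs
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row[key_field]
        values = {f: row[f] for f in parent_fields}
        if any(values.values()):
            # A full (or repeated) copy of the parent; keep the first seen.
            parents.setdefault(key, values)
        elif key not in parents:
            # Compressed row with no full copy before it.
            raise ValueError("partial record before full record: %s" % key)
        child = {f: row[f] for f in child_fields}
        if any(child.values()):
            # Rows with empty child fields are parents without children.
            children.append((key, child))
    return parents, children
```

Supporting partial records that arrive before the full record would require buffering the children until the parent shows up, which is the extra complexity the ordering constraint avoids.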
     163A flat file with embedded structure -- that is, cells that contain records,
     164or multiple items or records.  A simple example is a cell that contains a list
     165of strings, or a collection of key=value pairs, or even XML.
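
For the embedded-structure case, a cell holding a list of strings or key=value pairs could be parsed as sketched here. The `;` separator and the function names `parse_list_cell` and `parse_pairs_cell` are assumptions for illustration; the notes above do not fix a delimiter.

```python
def parse_list_cell(cell, sep=";"):
    """Split a cell such as 'water; food; shelter' into a list of strings."""
    return [item.strip() for item in cell.split(sep) if item.strip()]


def parse_pairs_cell(cell, sep=";"):
    """Parse a cell such as 'beds=20; staff=5' into a dict of strings."""
    pairs = {}
    for item in parse_list_cell(cell, sep):
        key, eq, value = item.partition("=")
        if not eq:
            raise ValueError("expected key=value, got: %r" % item)
        pairs[key.strip()] = value.strip()
    return pairs
```

Values are left as strings in this sketch; converting them to typed fields would belong to the schema mapping step rather than to cell parsing.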
     167Any combination of the above.