Changes between Version 28 and Version 29 of BluePrint/Importer


Timestamp:
01/22/11 11:46:21 (11 years ago)
Author:
Pat Tressel
Comment:

--

  • BluePrint/Importer

    v28 v29  
    113113== CSV import ==
    114114
    115 What sources and forms of data should we support?
     115=== What sources and forms of data should we support? ===
    116116
    117 Sources use cases:
     117==== Source use cases: ====
    118118
    119 A repository or other source that has its own data format that does not match ours.
     119- A repository or other source that has its own data format that does not match ours.
    120120A means of mapping from the source's schema to ours will be needed.
    121121It is likely, for an existing source, that we will produce the schema mapping.
    122122We would want to detect schema changes, or get notification of them.
    123123
    124 A user who wants to upload data, and is willing to format it to our specification.
     124- A user who wants to upload data, and is willing to format it to our specification.
    125125In this case the data can be processed without a schema mapping.
    126126
    127 Source data structure uses cases (not yet discussing mapping -- only the source schema):
     127==== Data structure use cases: ====
    128128
    129 A flat table.
     129(This is only about the source schema, not the CSV representation.)
    130130
    131 Multiple tables but 1-1or 1-at most 1 -- much the same as a flat table.
     131- A flat table -- one resource with no components or foreign key references.
    132132
    133 1-N relationships (such as are represented by the dependent table having an fk ref to the primary).
     133- Multiple tables but 1-1 or 1-(at most 1) -- a structure that could be a flat table.
    134134
    135 M-N relationships (typically represented by a relationship table).
     135- 1-N relationships (such as are represented by the dependent table having an fk ref to the primary).
    136136
    137 Possible CSV representations:
     137- M-N relationships (typically represented by a relationship table).
    138138
    139 Separate files per table with key references to link entries across tables.
    140 The keys can either be existing Eden database keys (for updating existing
    141 records), or scratch keys (not stored as ids in any other database, only
    142 used to associate dependent records for this upload, or external database
    143 keys (i.e. actual keys in the source database, which we might want to
    144 preserve for future updates.)  This can easily represent any valence of
    145 relationship.
     139=== Possible CSV formats we might receive ===
    146140
    147 One file with separate sections, equivalent to concatenating the separate
    148 files above.
     141- Separate files per table with key references to link entries across tables.
     142  This can easily represent any valence of relationship, and is much like a
     143  spreadsheet with multiple linked sheets.
     144  The keys might be:
     145 - Existing Eden database keys (for updating existing records).
     146 - The external source's keys (i.e. actual keys in the source database, which
     147   we might want to preserve for future updates.)
     148 - Scratch keys that the source includes to describe the structure
     149   (i.e. not stored as keys in their database, only used to associate related
     150   records for this upload).
    149151
    150 A single file with an outer join of all the tables.  For 1-N, the data on the 1-
    151 side is reapeated in each row along with the separate records of the -N
    152 side.  For M-N, either side may be replicated across multiple lines in the
    153 file, as needed.  For a deeper hierarchy, the common records are repeated
    154 as needed.  This is just a standard outer join.  If there is a large fanout
    155 (1-lots of records) then could "compress' records by including one full copy
    156 of a record, then just its key field with non-key fields left empty.  This can
    157 represent any valence of relationship at the expense of some extra storage.
    158 It has the advantage that related pieces are easy to identify, and it's not
    159 necessary for them to be in any specific order, except that if the above
    160 compression is used and some fields are required to be non-null, then
    161 it's simpler if the complete record is available before the partial records.
     152- One file with separate sections, equivalent to concatenating the separate
     153  files above.
    162154
    163 A flat file with embedded structure -- that is, cells that contain records,
    164 or multiple items or records.  A simple example is a cell that contains a list
    165 of strings, or a collection of key=value pairs.  Or even xml...
     155- A single file with a recursive outer join of all the tables.
     156  For 1-N, the data on the "1-"
     157  side is repeated in each row along with the separate records of the -N
     158  side.  For M-N, either side may be replicated across multiple lines in the
     159  file, as needed.  For a deeper hierarchy, the common records are repeated
     160  as needed.  This is just a standard outer join, so is easy for the remote
     161  source to produce if they have their data in a relational database. 
     162  (If there is a large fanout, i.e.
     163  1-(lots of records), then we could "compress" records by including one full copy
     164  of a record, then just its key field with non-key fields left empty.)  This can
     165  represent any valence of relationship at the expense of some extra storage.
     166  It has the advantage that related pieces are easy to identify, and it's not
     167  necessary for them to be in any specific order, except that if the above
     168  compression is used and some fields are required to be non-null, then
     169  it's simpler if the complete record is available before the partial records.
    166170
    167 Any combination of the above.
     171- A flat file with embedded structure -- that is, cells that contain records,
     172  or multiple items or records.  A simple example is a cell that contains a list
     173  of strings, or a collection of key=value pairs.  Or even xml...
     174
     175- Any combination of the above.
    168176
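As an illustration of the single-file outer-join format described above, here is a minimal sketch of how an importer might split such a file back into its 1-N structure. It is not the Eden importer's code. The function name `split_outer_join`, the column names, and the sample rows are all hypothetical. It also handles the "compressed" variant, where repeated rows carry only the key of the "1-" side and leave its other fields empty.

```python
import csv
import io


def split_outer_join(rows, parent_key, parent_fields, child_fields):
    """Rebuild a 1-N structure from outer-join rows.

    rows: iterable of dicts, e.g. from csv.DictReader.
    Returns (parents, children): parents maps key -> dict of parent
    fields; children is a list of dicts, each carrying the parent key.
    """
    parents = {}
    children = []
    for row in rows:
        key = row[parent_key]
        seen = parents.setdefault(key, {})
        for field in parent_fields:
            value = row.get(field, "")
            # "Compressed" rows leave non-key parent fields empty, so keep
            # the first non-empty value seen for this key, in any row order.
            if value and not seen.get(field):
                seen[field] = value
        child = {f: row.get(f, "") for f in child_fields}
        # An outer join yields all-empty child columns for a parent
        # that has no dependent records; skip those.
        if any(child.values()):
            child[parent_key] = key
            children.append(child)
    return parents, children


# Hypothetical sample: org 1 has two offices (second row compressed),
# org 2 has none.
sample = """org_id,org_name,office_name,office_city
1,Org A,Main Office,City X
1,,Field Office,City Y
2,Org B,,
"""

parents, children = split_outer_join(
    csv.DictReader(io.StringIO(sample)),
    parent_key="org_id",
    parent_fields=["org_name"],
    child_fields=["office_name", "office_city"],
)
```

Because the parent fields are merged per key rather than read from the first row only, the sketch does not depend on the full record appearing before the partial ones. An importer that validates required non-null fields row by row would still find that ordering simpler.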
    169177
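For the "flat file with embedded structure" case, a cell might hold a list of strings or a collection of key=value pairs. The sketch below shows one way such a cell could be unpacked. The separators (`;` between items, `=` between key and value) and the function name `parse_cell` are assumptions for illustration; the page does not specify a cell syntax.

```python
def parse_cell(cell, item_sep=";", kv_sep="="):
    """Parse one CSV cell that embeds a list or key=value pairs.

    Returns a dict if every item contains kv_sep, otherwise a list
    of stripped strings. Separators are assumed, not specified.
    """
    items = [part.strip() for part in cell.split(item_sep) if part.strip()]
    if items and all(kv_sep in part for part in items):
        pairs = (part.split(kv_sep, 1) for part in items)
        return {k.strip(): v.strip() for k, v in pairs}
    return items
```

A cell such as `water; food ;shelter` would come back as a list of three strings, and `beds=20; staff=5` as a dict with string values. Type conversion and nested records (or XML) are left out of this sketch.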