== CSV import ==

What sources and forms of data should we support?

Source use cases:

A repository or other source that has its own data format, which does not match ours.
A means of mapping from the source's schema to ours will be needed.
For an existing source, it is likely that we will produce the schema mapping ourselves.
We would want to detect changes to the source schema, or be notified of them.

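A minimal sketch of such a mapping, assuming it is just a rename table from source column names to ours (the `SOURCE_TO_EDEN` dictionary and every column name in it are hypothetical). The same table doubles as a record of the schema we expect, so comparing it with the incoming header gives a cheap way to notice that the source schema has changed:

```python
import csv
import io

# Hypothetical mapping from a source's column names to our field names.
SOURCE_TO_EDEN = {
    "OrgName": "name",
    "Phone": "phone",
    "Town": "city",
}


def check_source_schema(header, mapping):
    """Compare a source header with the columns the mapping expects.

    Returns (missing, unexpected) as sets of column names.
    """
    expected = set(mapping)
    actual = set(header)
    return expected - actual, actual - expected


def map_rows(csv_text, mapping):
    """Yield source rows renamed to our field names.

    Raises ValueError (when the first row is requested) if the source header
    no longer matches the mapping, i.e. the source schema appears to have changed.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing, unexpected = check_source_schema(reader.fieldnames or [], mapping)
    if missing or unexpected:
        raise ValueError(
            f"source schema changed: missing={sorted(missing)}, "
            f"unexpected={sorted(unexpected)}"
        )
    for row in reader:
        # Read each mapped source column and store it under our field name.
        yield {ours: row[theirs] for theirs, ours in mapping.items()}
```

Here any difference between the header and the mapping stops the import; a real importer might only warn about new columns, since extra columns do not prevent the mapped fields from being read.
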
A user who wants to upload data and is willing to format it to our specification.
In this case the data can be processed without a schema mapping.

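For data already formatted to our specification, the import can skip mapping and only validate the header and required values. A minimal sketch, where `REQUIRED_FIELDS` and `OPTIONAL_FIELDS` are hypothetical stand-ins for whatever our specification ends up defining:

```python
import csv
import io

# Hypothetical field lists standing in for our published CSV specification.
REQUIRED_FIELDS = {"name"}
OPTIONAL_FIELDS = {"phone", "city"}


def read_spec_csv(csv_text):
    """Parse a CSV that follows our specification directly (no mapping).

    Raises ValueError if required columns are absent, unknown columns
    appear, or a required field is empty in some row.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])
    missing = REQUIRED_FIELDS - header
    unknown = header - REQUIRED_FIELDS - OPTIONAL_FIELDS
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    rows = []
    for row in reader:
        empty = [f for f in REQUIRED_FIELDS if not (row[f] or "").strip()]
        if empty:
            # line_num is the source line on which this row ended.
            raise ValueError(f"line {reader.line_num}: empty required field(s) {sorted(empty)}")
        rows.append(row)
    return rows
```

Rejecting unknown columns catches misspelled headers early; relaxing that check would let users carry extra columns that we simply ignore.
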
Source data structure use cases (not yet discussing mapping, only the source schema):

A flat table.

Multiple tables, but related 1-1 or 1-to-at-most-1: much the same as a flat table.

1-N relationships (typically represented by the dependent table holding a foreign key reference to the primary table).

M-N relationships (typically represented by a relationship table).

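One way an importer might tell the 1-to-at-most-1 case (which can be merged into a flat table) from the 1-N case is to inspect the key data itself. A sketch, assuming the keys have already been read into Python lists; `classify_fk` and its return labels are hypothetical:

```python
from collections import Counter


def classify_fk(parent_keys, child_fks):
    """Classify a primary/dependent table pair from its key data.

    parent_keys: the primary table's keys.
    child_fks:   the dependent table's foreign key values, one per row.
    Returns "1-0..1" if no primary row has more than one dependent row
    (the pair can be merged into one flat row per primary), else "1-N".
    Raises ValueError if any foreign key has no matching primary key.
    """
    parents = set(parent_keys)
    dangling = {fk for fk in child_fks if fk not in parents}
    if dangling:
        raise ValueError(f"dangling foreign keys: {sorted(dangling, key=str)}")
    per_parent = Counter(child_fks)
    return "1-N" if any(n > 1 for n in per_parent.values()) else "1-0..1"
```

An M-N relationship table is just two foreign key columns, each of which could be checked against its own primary table in the same way.
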
Possible CSV representations:

Separate files per table, with key references to link entries across tables.
The keys can be existing Eden database keys (for updating existing
records), scratch keys (not stored as ids in any database, used only
to associate dependent records within this upload), or external database
keys (i.e. actual keys in the source database, which we might want to
preserve for future updates). This can easily represent a relationship
of any cardinality.

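A sketch of the scratch-key case only, assuming two hypothetical files: a parent table with a `key` column and a child table whose `ref_column` holds the parent's scratch key. Each parent row receives a new id (simulated here by a counter rather than a real database insert), and child references are rewritten to those ids. Existing Eden keys and external keys would need a lookup rather than fresh ids, which is not shown:

```python
import csv
import io
from itertools import count


def import_linked(parent_csv, child_csv, ref_column, fk_field, next_id=None):
    """Import a parent table, then a child table linked to it by scratch keys.

    parent_csv: CSV text with a "key" column holding scratch keys.
    child_csv:  CSV text whose ref_column holds a parent's scratch key.
    fk_field:   name of the foreign key field stored in each child record.
    next_id:    callable returning a new record id; a counter stands in for
                database-assigned ids when none is given.
    Scratch keys are used only for linking and are not stored.
    """
    if next_id is None:
        next_id = count(1).__next__
    key_to_id = {}
    parents = []
    for row in csv.DictReader(io.StringIO(parent_csv)):
        scratch = row.pop("key")
        if scratch in key_to_id:
            raise ValueError(f"duplicate scratch key {scratch!r}")
        new_id = next_id()
        key_to_id[scratch] = new_id
        parents.append({"id": new_id, **row})
    children = []
    for row in csv.DictReader(io.StringIO(child_csv)):
        scratch = row.pop(ref_column)
        if scratch not in key_to_id:
            raise ValueError(f"unresolved reference {scratch!r} in {ref_column}")
        row.pop("key", None)  # a child's own scratch key, if any, is dropped too
        children.append({"id": next_id(), fk_field: key_to_id[scratch], **row})
    return parents, children
```

Importing parents before children is what lets every reference be resolved in one pass; a deeper hierarchy would record each level's scratch keys the same way before importing the next level.
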
One file with separate sections, equivalent to concatenating the separate
files above.

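Splitting such a file back into per-table sections is straightforward once a section marker is agreed. This sketch assumes a hypothetical convention in which a row whose first cell is `[table_name]` (other cells empty, as a spreadsheet export would produce) starts a section, and the following row is that section's header:

```python
import csv
import io


def split_sections(csv_text):
    """Split a multi-section CSV file into {table_name: [row dicts]}.

    Hypothetical convention: a row whose first cell is "[name]" and whose
    other cells are empty starts a section; the next non-blank row is that
    section's header; blank rows are ignored.
    """
    sections = {}
    name = header = None
    for cells in csv.reader(io.StringIO(csv_text)):
        if not any(c.strip() for c in cells):
            continue  # blank separator row
        first = cells[0].strip()
        rest_empty = not any(c.strip() for c in cells[1:])
        if first.startswith("[") and first.endswith("]") and rest_empty:
            name, header = first[1:-1], None
            sections[name] = []
        elif name is None:
            raise ValueError("data found before the first section marker")
        elif header is None:
            header = cells
        else:
            sections[name].append(dict(zip(header, cells)))
    return sections
```

Each section can then be processed exactly as if it had arrived as a separate file.
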
A single file with an outer join of all the tables. For 1-N, the data on the 1-side
is repeated in each row, along with the separate records of the N-side.
For M-N, either side may be replicated across multiple lines in the
file, as needed. For a deeper hierarchy, the common records are repeated
as needed. This is just a standard outer join. If there is a large fanout
(one record related to very many), then we could "compress" records by including
one full copy of a record, followed by rows containing just its key field with the
non-key fields left empty. This can represent a relationship of any cardinality
at the expense of some extra storage. It has the advantage that related pieces
are easy to identify, and they need not be in any specific order, except that
if the above compression is used and some fields are required to be non-null,
it is simpler if the complete record appears before the partial records.

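A sketch of reading a two-level (1-N) joined file back into separate parent and child records, assuming a hypothetical column-naming convention (`org.` and `office.` prefixes, with `org.key` as the parent key). Parent fields are merged across all rows that share a key, so compressed rows with empty non-key fields are tolerated, and conflicting non-empty values are reported:

```python
import csv
import io


def _columns(row, prefix):
    """Return the row's columns that start with prefix, with prefix removed."""
    return {k[len(prefix):]: v for k, v in row.items() if k and k.startswith(prefix)}


def split_joined(csv_text, parent_prefix="org.", child_prefix="office."):
    """Split a two-level outer-joined CSV into parent and child records.

    Parent columns start with parent_prefix and include "<parent_prefix>key";
    child columns start with child_prefix. Repeated parent rows may be
    "compressed" by leaving non-key parent fields empty. A row whose child
    fields are all empty is a parent with no children (the outer-join case).
    Returns (parents keyed by parent key, list of child records).
    """
    key_col = parent_prefix + "key"
    parents = {}
    children = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        pkey = row[key_col]
        stored = parents.setdefault(pkey, {})
        for field, value in _columns(row, parent_prefix).items():
            if not value:
                continue  # empty field in a compressed repeat
            if stored.get(field, value) != value:
                raise ValueError(f"conflicting {field!r} for parent {pkey!r}")
            stored[field] = value
        child = _columns(row, child_prefix)
        if any(child.values()):
            children.append({**child, "parent_key": pkey})
    return parents, children
```

Because parent fields are merged over the whole file, a required-field check could run after the file has been read, so this approach does not depend on the complete record appearing first; the cost is holding all parent records in memory until the end.
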
A flat file with embedded structure, that is, cells that contain a record,
or multiple items or records. A simple example is a cell that contains a list
of strings, or a collection of key=value pairs. Cells could even contain XML.

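For the simple cases, a cell can be parsed with a second, inner delimiter. This sketch assumes hypothetical conventions of `;` between items and `=` between key and value, and reuses the `csv` module so that an item containing `;` can be quoted:

```python
import csv


def parse_list_cell(cell, sep=";"):
    """Split a cell holding several items, e.g. "food;water;shelter".

    The csv module handles quoting, so '"a;b";c' yields ["a;b", "c"].
    """
    if not cell.strip():
        return []
    items = next(csv.reader([cell], delimiter=sep, skipinitialspace=True))
    return [item.strip() for item in items]


def parse_kv_cell(cell, sep=";", kv_sep="="):
    """Parse a cell of key=value pairs, e.g. "beds=40;staff=6"."""
    result = {}
    for item in parse_list_cell(cell, sep):
        key, found, value = item.partition(kv_sep)
        if not found:
            raise ValueError(f"expected key{kv_sep}value, got {item!r}")
        result[key.strip()] = value.strip()
    return result
```

XML content in a cell could likewise be parsed per cell, for instance with the standard library's `xml.etree.ElementTree.fromstring`.
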
Any combination of the above.