| 207 | //Under Construction -- need to enumerate the options and check what the spreadsheet |
| 208 | importer is doing.// |
| 209 | |
| 210 | There are two main categories of representation: |
| 211 | |
| 212 | - Formatting, such as which of the file layouts is used, what the separator character |
| 213 | is, how the text is escaped, which cells are structured... This is the "parsing" |
| 214 | aspect of the representation. |
| 215 | |
| 216 | - The actual mapping of the source schema to our schema, that is, once we have their |
| 217 | structured objects read in, how do we create our objects out of theirs? |
| 218 | |
| 219 | We should distinguish the external specification that a user would submit |
| 220 | with their files, or produce via a UI, from the importer's internal representation. |
| 221 | We want the external specification to be easy for a person to construct rather than |
| 222 | easy for the importer to use. The importer can produce from that an internal |
| 223 | representation that is convenient for running the data conversion. |
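
As a rough illustration of the distinction above, a user-facing specification might be a small, readable structure that the importer compiles into something more convenient for conversion. Everything in this sketch is hypothetical: the key names (`format`, `mapping`, `separator`, `quote`, `header_row`), the `InternalSpec` class, the `compile_spec` function, and the example columns are assumptions made for illustration, not the importer's actual design.

```python
from dataclasses import dataclass

# Hypothetical user-facing specification: meant to be easy for a person to write.
# "format" covers the parsing aspect; "mapping" covers source schema -> our schema.
USER_SPEC = {
    "format": {"separator": ";", "quote": '"', "header_row": True},
    "mapping": {"Name": "person.name", "Tel": "person.phone"},
}


@dataclass(frozen=True)
class InternalSpec:
    """Hypothetical internal form: convenient for running the conversion."""
    reader_kwargs: dict   # keyword arguments suitable for csv.reader / csv.DictReader
    has_header: bool
    field_map: tuple      # (source column, target table, target field) triples


def compile_spec(user_spec):
    """Turn the person-friendly spec into the importer-friendly one."""
    fmt = user_spec.get("format", {})
    reader_kwargs = {
        "delimiter": fmt.get("separator", ","),
        "quotechar": fmt.get("quote", '"'),
    }
    # Split "table.field" targets once, up front, so conversion need not re-parse them.
    field_map = tuple(
        (source, *target.split(".", 1))
        for source, target in user_spec["mapping"].items()
    )
    return InternalSpec(reader_kwargs, fmt.get("header_row", True), field_map)


INTERNAL = compile_spec(USER_SPEC)
```

The point of the split is that `USER_SPEC` can stay stable and readable even if the internal form is later reorganised for speed or convenience.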
| 224 | |
209 | | - If the data uses a format we specify, we don't need a schema mapping -- we just need |
210 | | to be told it's our formatting. |
211 | | |
212 | | - If the source has a schema that does not match ours, a means of mapping from the |
213 | | source's schema to ours will be needed. |
214 | | For an existing major source, it is likely that we would write the schema mapping. |
215 | | (But for a source we draw on regularly, there may be better means of pulling data |
216 | | than CSV files...) |
| 227 | - The file format (the options described above) seems to be largely independent of the |
| 228 | schema mapping. Let's try specifying them separately. |
| 229 | |
| 230 | - If the data uses a format and schema we specify, we don't need a format or mapping |
| 231 | supplied -- we just need to be told it's our native format and schema. |
| 232 | |
| 233 | - For an existing major source, it is likely that we would write the schema mapping. |
| 234 | But for a source we draw on regularly, there may be better means of pulling data |
| 235 | than CSV files... |
234 | | - In any case, by the time the back end is called, we should have a schema mapping. |
235 | | |
236 | | ==== Options for format and schema mapping representations: ==== |
237 | | |
238 | | //Under Construction -- need to enumerate the options and check what the spreadsheet |
239 | | importer is doing.// |
240 | | |
241 | | We want the representation to be easy for a person to construct rather than easy for |
242 | | the importer to use. The importer can always produce an internal representation that |
243 | | is convenient for running the data conversion. |
244 | | |
245 | | There are two main categories of representation: |
246 | | |
247 | | - Formatting, such as which of the file layouts is used, what the separator character |
248 | | is, how the text is escaped, which cells are structured... This is the "parsing" |
249 | | aspect of the representation. |
250 | | |
251 | | - The actual mapping of the source schema to our schema, that is, once we have their |
252 | | structured objects read in, how do we create our objects out of theirs? |
| 254 | - If the user specification and internal specification differ, the conversion can be |
| 255 | done as a preliminary step. For prominent sources, we might save either or both of |
| 256 | the user and internal representations. Either specification may need to |
| 257 | change because of a change in the source schema for the file format they use, |
| 258 | or because of a change in our schema. |
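
The bullets above suggest that the file format and the schema mapping can be supplied independently, and that data already in our native format and schema needs neither. A minimal sketch of that idea follows, using Python's standard `csv` module. The `convert` function, the `NATIVE_FORMAT` defaults, and the column names are hypothetical and stand in for whatever the importer actually uses.

```python
import csv
import io

# Hypothetical native defaults, used when the submitter says "it's your format".
NATIVE_FORMAT = {"delimiter": ",", "quotechar": '"'}


def convert(text, fmt=None, mapping=None):
    """Parse `text` using `fmt`, then rename columns using `mapping`.

    fmt=None means the native format. mapping=None means the native schema,
    so column names are already ours and pass through unchanged.
    """
    reader = csv.DictReader(io.StringIO(text), **(fmt or NATIVE_FORMAT))
    for row in reader:
        if mapping is None:
            yield dict(row)
        else:
            # Keep only mapped columns, renamed from their names to ours.
            yield {ours: row[theirs] for theirs, ours in mapping.items()}


# A foreign source: both a format and a mapping are supplied, separately.
foreign = "Name;Tel\nAda;555-0100\n"
records = list(convert(foreign,
                       fmt={"delimiter": ";", "quotechar": '"'},
                       mapping={"Name": "name", "Tel": "phone"}))

# Native data: nothing needs to be supplied.
native = list(convert("name,phone\nAda,555-0100\n"))
```

Because the two arguments are independent, a source that uses an unusual separator but our own column names would pass only `fmt`, and one that uses standard CSV with foreign column names would pass only `mapping`.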
| 259 | |
| 260 | ==== File format specification: ==== |
| 261 | |
| 262 | ==== Schema mapping specification: ==== |