Changes between Version 51 and Version 52 of BluePrint/Importer
- Timestamp: 01/23/11 08:27:41 (12 years ago)
BluePrint/Importer
v51  v52
205  205  ==== Observed file formats and sample data: ====
206  206
207
208  207  - Outer join has been observed "in the wild", e.g. the
209  208  [http://haiti.resource-finder.appspot.com/export?subject_type=hospital Google hospital data for Brazil]
 …    …
243  242  importer is doing.//
244  243
245       There are two main categories of representation:
     244  There are two main categories of specification:
246  245
247  246  - Formatting, such as which of the file layouts is used, what the separator character
248  247  is, how the text is escaped, which cells are structured... This is the "parsing"
249       aspect of the representation.
     248  aspect of the specification.
250  249
251  250  - The actual mapping of the source schema to our schema, that is, once we have their
 …    …
295  294  ==== File format specification: ====
296  295
     296  - File paths.
     297
     298  - One table per file, or concatenated tables, or a flat outer-join file?
     299
     300  - Character set?
     301
     302  - For concatenated tables, what is the table separator?
     303
     304  - Is there a row with column names in the file, or are column names supplied
     305  separately?
     306
     307  - Column separator? (Popular separators other than commas are tabs, semicolons,
     308  and vertical bars.) Or are columns specified by width?
     309
     310  - Comment character? or rows to ignore? (Some files have titles or explanatory
     311  comments included in the file.)
     312
     313  - String quote characters?
     314
     315  - Format of embedded lists, i.e. are they quoted? what is the separator?
     316
     317  - Format of embedded objects?
     318
     319  - Are there links to other documents that should be uploaded? and how are they
     320  distinguished from URLs that are just data?
     321
297  322  ==== Schema mapping specification: ====
298  323
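
The file format questions added in version 52 could be captured as a small settings object that drives a parser. The following is only a rough sketch under stated assumptions: the names `FileFormatSpec` and `read_rows` are hypothetical and do not come from the importer, the input is a single delimited table (not concatenated tables, fixed-width columns, or embedded objects), and Python's standard `csv` module stands in for whatever parsing the importer actually uses.

```python
import csv
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass
class FileFormatSpec:
    """Hypothetical holder for answers to the file format questions."""
    encoding: str = "utf-8"                    # character set
    column_separator: str = ","                # column separator
    quote_char: str = '"'                      # string quote character
    comment_char: Optional[str] = None         # lines starting with this are ignored
    skip_rows: int = 0                         # leading title rows to ignore
    has_header_row: bool = True                # is there a row of column names?
    column_names: Optional[List[str]] = None   # names supplied separately
    list_separator: Optional[str] = None       # separator inside embedded lists


def read_rows(data: bytes, spec: FileFormatSpec) -> Iterator[Dict[str, object]]:
    """Yield one dict per data row, interpreted according to spec.

    Simplification: the text is split into lines before parsing, so quoted
    cells that contain line breaks are not handled.
    """
    lines = data.decode(spec.encoding).splitlines()[spec.skip_rows:]
    if spec.comment_char:
        lines = [ln for ln in lines if not ln.startswith(spec.comment_char)]

    reader = csv.reader(lines, delimiter=spec.column_separator,
                        quotechar=spec.quote_char)
    rows = (row for row in reader if row)      # drop blank lines

    names = spec.column_names
    if spec.has_header_row:
        header = next(rows, None)
        if header is None:
            return                             # nothing but comments or blanks
        if names is None:
            names = header                     # take the names from the file
    if names is None:
        raise ValueError("column names must come from the file or the spec")

    for row in rows:
        record: Dict[str, object] = {}
        for name, value in zip(names, row):
            # A cell becomes a list only if it contains the list separator.
            if spec.list_separator and spec.list_separator in value:
                record[name] = [v.strip() for v in value.split(spec.list_separator)]
            else:
                record[name] = value
        yield record


# Usage with made-up sample data: vertical-bar columns, "#" comments,
# and ";" inside cells that hold embedded lists.
sample = b"# Hospital export\nname|services\nClinic A|surgery; x-ray\nClinic B|triage\n"
spec = FileFormatSpec(column_separator="|", comment_char="#", list_separator=";")
for record in read_rows(sample, spec):
    print(record)
```

Keeping the answers in one object separates the "parsing" aspect from the schema mapping, so the same mapping could in principle be reused across sources that differ only in layout.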