Version 50 (modified by Dominic König, 14 years ago)

S3XML

S3XML is a generic RESTful data exchange interface for the S3 framework.

It comes with its own native XML data format, but also provides built-in data format conversion and transformation to support a variety of custom XML, JSON and CSV formats and schemas.

Minimum Requirements

Clients

Interfaces which want to exchange data with S3XML interfaces must implement the following:

an HTTP client which can perform GET and POST requests
the native S3XML data format

Note:

Where the target interface has built-in support for data format conversion/transformation (as in S3), it is sufficient if the client implements an S3XML-compatible data format (XML, JSON or CSV).
S3 comes with a number of built-in transformation stylesheets for some standard data formats. Where other formats shall be used, clients can also provide their own XSLT transformation stylesheets.

Servers

Interfaces which want to provide S3XML server capabilities (e.g. for Synchronization) must implement the following:

an HTTP server interface accepting and performing GET, PUT and POST requests
the RESTful API as described in this document
the native S3XML data format

Optionally they can provide:

JSON/CSV to S3XML conversion
S3XML to JSON/CSV conversion
XSLT-1.0 transformation

Conventions

Name Space

Where a name space identifier for the native S3XML format is to be used, it shall be:

http://eden.sahanafoundation.org/wiki/S3XML

In the current implementation of S3, no name space identifier shall be used. This is, however, subject to change in future versions.

Character Encoding

XML documents to be used for S3XML can specify their character encoding in the XML header.

Where JSON or CSV formats are used, they are expected to be UTF-8 encoded. S3XML interfaces can support other encodings for JSON/CSV, but this is not a requirement.

All exported data are always UTF-8 encoded.

URL format

Data format extensions in URLs must be all-lowercase. Where uppercase characters are used, they are converted into lowercase.
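
The lowercase rule can be illustrated with a small client-side helper. This is only a sketch: the function name normalize_format_extension is hypothetical and not part of S3, and it assumes the data format extension is the suffix of the last path segment.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def normalize_format_extension(url: str) -> str:
    """Lowercase the data format extension of the last path segment."""
    parts = urlsplit(url)
    root, ext = posixpath.splitext(parts.path)
    # Only the extension is lowercased; the rest of the URL is kept as-is
    return urlunsplit(parts._replace(path=root + ext.lower()))


print(normalize_format_extension("http://localhost:8000/eden/pr/person.XML"))
```

Normalizing on the client is optional, since the document states that uppercase characters are converted anyway.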

Interface

S3XML implements the S3 RESTful API and its URL format to address resources.

RESTful Methods

The following methods are supported:

Method	Action
GET <resource>	returns the contents of the specified resource
GET <resource>/fields	returns a schema document for the resource (without components)
GET <resource>/options	returns a field options document for the resource
GET <resource>/create without source	returns a schema document for the resource
GET <resource>/create with source	analyzes the source and creates an import job (both creating new and updating existing records), returns a view of the job
GET <resource>/create with job ID	returns a view of the specified import job
POST <resource>/create with job ID	updates or deletes the specified job
GET <resource>/update without source	returns a schema document for the resource and all of its components
GET <resource>/update with source	analyzes the source and creates an import job (only updating existing records), returns a view of the job
GET <resource>/update with job ID	returns a view of the specified import job
POST <resource>/update with job ID	updates or deletes the specified job
POST <resource> with job ID	commits the specified job to the database
POST <resource> with source	analyzes the source, creates an import job and commits the job to the database
PUT <resource> with job ID	commits the specified job to the database
PUT <resource> with source	analyzes the source, creates an import job and commits the job to the database

Source Submission

There are multiple ways to submit source files:

Files on the Server

A source file in the server file system can be specified using the filename URL variable:

PUT http://<server>/<controller>/<resource>?filename=<path>

Multiple files can be specified as a list of comma-separated pathnames:

PUT http://<server>/<controller>/<resource>?filename=<path>,<path>,<path>

URLs

A source file can be specified by its URL using the fetchurl URL variable:

PUT http://<server>/<controller>/<resource>?fetchurl=<url>

Multiple files can be specified as a list of comma-separated URLs:

PUT http://<server>/<controller>/<resource>?fetchurl=<url>,<url>,<url>

Supported URL protocols are http, ftp and file, where file is interpreted in the server file system context. URLs of different protocols can be mixed.

The specified URLs must be accessible either without authentication, or (if you specify credentials in the URLs) they must support unsolicited HTTP basic authentication - HTTP 403 retries are not handled by the interface. The URLs must be properly quoted, and must not contain commas.
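
A client could assemble such a request URL roughly as follows. This is a sketch under assumptions: the helper name build_fetchurl_request is hypothetical, and it assumes that "properly quoted" means percent-encoding each source URL so that the server can decode the fetchurl variable once and then split it at the commas.

```python
from urllib.parse import quote


def build_fetchurl_request(base_url: str, source_urls: list[str]) -> str:
    """Build a request URL with a comma-separated fetchurl variable."""
    if not source_urls:
        raise ValueError("at least one source URL is required")
    quoted = []
    for url in source_urls:
        if "," in url:
            # Commas separate the list items, so they cannot appear in a URL
            raise ValueError(f"source URL must not contain commas: {url!r}")
        # Percent-encode all reserved characters of the source URL
        quoted.append(quote(url, safe=""))
    return f"{base_url}?fetchurl={','.join(quoted)}"


print(build_fetchurl_request(
    "http://localhost:8000/eden/pr/person.xml",
    ["http://example.com/a.xml", "file:///tmp/b.xml"],
))
```

The resulting URL would then be sent with PUT or POST, as listed under RESTful Methods.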

Request Attachments

Source files can also be attached to a multipart-request. In this case the file extension of the source file must match the request URL file extension. Multiple files can be attached.

Multiple Sources

Where multiple sources are specified or attached, they are first converted and transformed one-by-one and then combined into a single element tree before import.
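
The combination step might look like the following sketch. It assumes that every source has already been converted to native S3XML, and that combining simply means collecting all top-level resource elements under one root; the interface's actual merge procedure is not described here, and the function name combine_sources is hypothetical.

```python
import xml.etree.ElementTree as ET


def combine_sources(documents: list[str]) -> ET.Element:
    """Combine the <resource> elements of several S3XML documents."""
    combined = ET.Element("s3xml")
    for text in documents:
        root = ET.fromstring(text)
        if root.tag != "s3xml":
            raise ValueError(f"unexpected root element: {root.tag}")
        # Move the top-level elements under the common root
        combined.extend(list(root))
    return combined


tree = combine_sources([
    '<s3xml><resource name="pr_person"/></s3xml>',
    '<s3xml><resource name="pr_address"/></s3xml>',
])
print([resource.get("name") for resource in tree])
```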

Duplicate Resolution

In the current S3 implementation, the interface does not handle duplicates within the same request. This is because the order of elements in the resulting element tree is not defined, and the last update time attribute is optional in source elements, so that there is no predictable rule of precedence.

Records in the source must not be split across several elements; each record must be submitted in one element. Fragments of a record will not be merged by the interface, and which of the fragments would finally be imported is unpredictable.

Source elements using unique keys are automatically matched with existing records. Where matches are ambiguous (e.g. a set of keys matching multiple existing records), the import element will be rejected as invalid. For certain resources, the server may have additional duplicate finders and resolvers configured. How duplicates are handled by these resolvers can differ from resource to resource.

The default behavior for duplicate resolution in standard import mode is to update the existing records with the values from the source record. In synchronization mode, though, the default is to accept/keep the newest data (and the last update time attribute is mandatory).
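
The two default rules can be summarized in a few lines. This is an illustrative sketch only: the function name resolve_duplicate and the return values are hypothetical, and keeping the existing record when both timestamps are equal is an assumption.

```python
from datetime import datetime
from typing import Optional


def resolve_duplicate(existing_modified_on: datetime,
                      source_modified_on: Optional[datetime],
                      synchronization: bool = False) -> str:
    """Return "update" to apply the source record, "keep" to keep the existing one."""
    if not synchronization:
        # Standard import: the source values always update the existing record
        return "update"
    if source_modified_on is None:
        raise ValueError("modified_on is mandatory in synchronization mode")
    # Synchronization: the newest data wins (ties keep the existing record)
    return "update" if source_modified_on > existing_modified_on else "keep"


print(resolve_duplicate(datetime(2009, 10, 2, 8, 56, 3),
                        datetime(2009, 10, 2, 8, 55, 11),
                        synchronization=True))
```

Resource-specific resolvers configured on the server may override these defaults.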

Data Format Conversion and Transformation

S3XML interfaces may provide built-in codecs to convert and transform the input or output data from/to various data formats.

The current S3 implementation provides built-in codecs for CSV, XML and JSON formats (PDF with OCR to come).

XSLT stylesheets for the format transformations can be built-in on the server (found by the request URL file extension), or can be specified by the client. The client can use the transform URL variable to specify the path (on the server file system) or URL of the XSLT stylesheet:

GET http://<server>/<controller>/<resource>.<extension>?transform=<path_or_url>

Alternatively, the client can attach the stylesheet to the request body. In this case the stylesheet's file name must be: <resource>.xslt.

The transform variable overrides any attached or built-in stylesheets, and attached stylesheets override built-in stylesheets. The .xml request URL extension is reserved for the native S3XML format, and must not use or accept any stylesheets.
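
The precedence rules for stylesheets can be written down as a small selection function. The function name select_stylesheet and its parameters are hypothetical; the sketch only mirrors the order described above.

```python
from typing import Optional


def select_stylesheet(extension: str,
                      transform: Optional[str] = None,
                      attached: Optional[str] = None,
                      builtin: Optional[str] = None) -> Optional[str]:
    """Pick the stylesheet to apply for a request, or None for no transformation."""
    if extension.lower() == "xml":
        # .xml is reserved for the native format: never use a stylesheet
        return None
    # transform variable > attached stylesheet > built-in stylesheet
    return transform or attached or builtin


print(select_stylesheet("kml", attached="location.xslt", builtin="kml/export.xsl"))
```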

Error Handling

The HTTP status code in the response indicates the success or failure of a request:

Status Code	Causes	Response Body
200 OK	Success	results or JSON message
400 BAD REQUEST	Syntax error or method not supported	JSON message
401 UNAUTHORIZED	Authorization required	Clear text error
403 FORBIDDEN	Insufficient permissions	Clear text error
404 NOT FOUND	Non-existent Resource	Clear text error
50x	Unrecoverable internal error	Clear text error

Where a JSON message is returned, it has the following structure:

  {
    "success": "True" | "False",
    "statuscode": "XXX",
    "message": "clear text error message",
    "tree": {
      /* element tree */
    }
  }

If there was an input element tree and it contained any errors, a subtree with the invalid elements will be added to the JSON message ("tree"). This subtree is expressed in JSON format. Invalid elements will have an additional @error attribute containing a clear-text error description.

Skipping invalid records at import:

By default, an import request is rolled back completely and an HTTP 400 BAD REQUEST error is raised if the source contains any invalid data. You can override this behavior with the ignore_errors URL variable (set to any non-empty string, e.g. ?ignore_errors=True): invalid records are then skipped, the valid ones are committed to the database, and the request returns an HTTP 200 OK. The JSON message will, however, still contain the error message and the element tree. Note that ignore_errors applies to validation errors only. Any other error (e.g. an XML syntax error) is handled as usual (rollback and error message).

The ignore_errors option is meant for "dirty" data, e.g. cases where you need to import from a source but do not have permission and/or means to clean it up before import - where possible, you should avoid ignore_errors and rather sanitize the source.
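
A client can locate the invalid elements by walking the "tree" part of the JSON message and looking for @error attributes. The following is a sketch: the function name collect_errors is hypothetical, and the sample response body is invented for illustration (it only follows the message structure and the prefix conventions of the JSON format).

```python
import json


def collect_errors(node, path=""):
    """Recursively collect (path, error text) pairs from a JSON subtree."""
    errors = []
    if isinstance(node, dict):
        if "@error" in node:
            errors.append((path, node["@error"]))
        for key, value in node.items():
            errors.extend(collect_errors(value, f"{path}/{key}"))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            errors.extend(collect_errors(item, f"{path}[{index}]"))
    return errors


# Invented sample message, not captured from a real server
response_body = (
    '{"success": "False", "statuscode": "400", "message": "Validation error", '
    '"tree": {"$_pr_person": [{"first_name": {"@error": "Invalid value", "$": ""}}]}}'
)
message = json.loads(response_body)
for location, error in collect_errors(message.get("tree", {})):
    print(location, error)
```

With ignore_errors set, the same walk applies to a 200 OK response, since the message still carries the element tree.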

XML Format

Document Types and Structure

S3XML defines 3 types of documents:

Schema Documents

Schema documents describe the data schema for a resource. Clients can use these documents e.g. for automatic generation of forms.

Document Tree:

<s3xml>
  <resource>
    <field>
    ...
    <resource>
      <field>
      ...
    </resource>
  </resource>
</s3xml>

or (if requested with the fields URL method):

<fields resource="name">
  <field/>
  <field/>
  <field/>
  ...
</fields>

Note:

In the current S3 implementation, these documents can only be requested (GET). Future versions may also accept submissions of such documents to update the data schema.

Field Option Documents

Field option documents describe the currently acceptable options for fields in a resource. Clients can use these documents e.g. for automatic generation and/or client-side validation of forms.

Document Tree:

<options>
  <select>
    <option>
    <option>
    <option>
    ...
  </select>
  <select>
    ...
  </select>
  ...
</options>

Note:

If the field URL variable is used to specify a particular field in the resource, the enclosing <options> element is omitted (i.e. <select> becomes the root element).
In the current S3 implementation, transformation of field option documents is not supported. JSON conversion is possible, though.
Field option documents can only be requested (GET). Future versions may also accept submissions of such documents to update the data schema.

Data Documents

Data documents provide the current contents (data) of resources.

Document Tree:

<s3xml>
  <resource> <!-- primary resource element -->
    <data> <!-- field data -->
    <data>
    ...
    <resource> <!-- component resource inside the primary resource -->
      <data>
      <data>
      <reference/> <!-- reference -->
      ...
    </resource>
    <reference/> <!-- reference -->
    <reference> <!-- reference with embedded resource element -->
       <resource>
         <data>
         ...
       </resource>
    </reference>
  </resource>
</s3xml>
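
A data document with this structure can be read with any XML parser. The sketch below uses Python's standard library and a shortened sample document; the function name summarize is hypothetical.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """
<s3xml>
  <resource name="pr_person" uuid="6e6e76dc-8ed7-408c-bb09-54476e3944ae">
    <data field="first_name">Dominic</data>
    <reference field="pr_pe_id" resource="pr_pentity"
               uuid="6e6e76dc-8ed7-408c-bb09-54476e3944ae"/>
    <resource name="pr_address">
      <data field="city">Läckeby</data>
    </resource>
  </resource>
</s3xml>
"""


def summarize(document: str) -> list[dict]:
    """List fields, references and components of each primary resource."""
    root = ET.fromstring(document)
    summary = []
    # Direct children of the root are the primary resource elements
    for resource in root.findall("resource"):
        summary.append({
            "name": resource.get("name"),
            "fields": [d.get("field") for d in resource.findall("data")],
            "references": [r.get("field") for r in resource.findall("reference")],
            "components": [c.get("name") for c in resource.findall("resource")],
        })
    return summary


print(summarize(DOCUMENT))
```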

Components

Component resources are <resource> elements inside of their primary <resource> element. Component records will be automatically imported and the required key references be added (=no explicit reference-element required).

Foreign key references of component records to their primary record will not be exported, and where they appear in import sources, they will be ignored.

Components of components are not allowed (maximum depth 1), and where they appear in import sources, they will be ignored.

Where components use link-tables and the component record can be linked to multiple parent records (many-to-many) or where the link table entry can carry data (attributed link), the respective link-table record is exported as component <resource> with a forward <reference> to the actual component record, while the component record itself is represented by a separate <resource> element (outside the primary resource).

References

Foreign key references (except those linking components to their primary record) are represented by <reference> elements.

Foreign keys can be importable UIDs (uuid-attribute, which will be both imported and used to find and/or link to existing records in the DB) or temporary UIDs (tuid-attribute, which will not be imported but only used to find records within the current tree). If a <resource> element with a matching UID key attribute is found in the same tree, it will be automatically imported.

References inside referenced elements will be resolved (unlimited depth) and also be imported. Circular references will be detected and properly resolved.

Multi-references (list:reference type in web2py) use a list of UID keys separated by vertical bars, like uuid=|uid1|uid2|uid3|. The leading and trailing vertical bars must be present.

If a <resource> element is embedded inside the <reference>, either or both of the UID keys can be omitted. Where both keys are used, however, they must match. Multiple embedded <resource> elements are allowed for multi-references.
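
The multi-reference notation is easy to parse and produce. The helper names below are hypothetical; the sketch only implements the delimiter rule stated above.

```python
def parse_multi_reference(value: str) -> list[str]:
    """Split a multi-reference like |uid1|uid2|uid3| into its UID keys."""
    if len(value) < 2 or not (value.startswith("|") and value.endswith("|")):
        raise ValueError("leading and trailing vertical bars are required")
    inner = value[1:-1]
    return inner.split("|") if inner else []


def format_multi_reference(uids: list[str]) -> str:
    """Join UID keys into the multi-reference notation."""
    return "|" + "|".join(uids) + "|"


keys = parse_multi_reference("|uid1|uid2|uid3|")
print(keys, format_multi_reference(keys))
```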

Element Descriptions

s3xml

The root element (in schema and data documents).

<s3xml success="true" results="2" domain="mycomputer" url="http://127.0.0.1:8000/eden" latmin="-90.0" latmax="90.0" lonmin="-180.0" lonmax="180.0">
   ...
</s3xml>

Parent elements	none (root element)
Child elements	resource
Contents	empty

Attributes:

Name	Type	Description	mandatory?
domain	string	the domain name of the data repository	no
url	string	the URL of the data repository	no
success	boolean	true if the page contains any records, otherwise false	no
results	integer	the total number of records matching the request	no
start	integer	the index of the first record returned (in paginated requests)	no
limit	integer	the maximum number of records returned (in paginated requests)	no
latmin, latmax, lonmin, lonmax	float	geo-location boundary box of the results	no

resource

Represents a record (in data documents) or a database table (in schema documents).

<s3xml>
  <resource name="xxx_yyy">
     ...
  </resource>
</s3xml>

Parent elements	s3xml, resource, reference
Child elements	resource, data, field
Contents	empty

Attributes:

Name	Type	Description	mandatory?
name	string	the name of the resource, usually the DB table name	yes
uuid	string	a unique identifier for the record	no (*)
tuid	string	a temporary unique identifier for the record	no (*)
created_on	datetime (**)	date and time when the record was created	no
modified_on	datetime (**)	date and time when the record was last updated	no, default: request date/time (***)
created_by	string	username (email-address) of the user who created the record	no
modified_by	string	username (email-address) of the user who last updated the record	no
mci	integer	master-copy-index (****)	no, default: 2 (***)

(*) Records will be identified within the input file by their uuid, or, if no uuid is specified, by their tuid.
(**) Format: YYYY-MM-DDTHH:mm:ssZ, always UTC.
(***) The last update date/time and the mci are required in synchronization.
(****) The master copy index specifies how often a record has been copied across sites, see below.

The uuid will be stored in the database together with the record. If uuid is present and matches an existing record in the database, then this record will be updated. If there's no match or no uuid specified in the resource element, then the importer will create a new record in the database (and automatically generate a uuid if required).

The mci - master-copy-index - indicates how often this record has been copied across sites:

when importing a new record the mci value is always *imported* as-is from the source
when updating a record, the mci of the database record remains unchanged
the mci of a record is *exported* as its current database value + 1.
the repository first creating a record sets mci=0 in the database record, which appears as mci=1 in the exported XML.
a copying site then imports mci=1 into its database, which appears as mci=2 in its export XML, and so forth...

The mci can be used to filter records for whether they have been originated at a repository or not. If there's a fixed set of synchronization paths between a number of S3 instances, the mci can be used for conflict resolution. If the mci is not specified, it defaults to 2.

MCI handling is optional for non-synchronizing interfaces.
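
The mci rules can be traced through a chain of copying sites with two small functions. The function names are hypothetical; the logic follows the list above (export as database value + 1, import as-is, unchanged on update, default 2).

```python
from typing import Optional


def exported_mci(database_mci: int) -> int:
    """The mci is exported as the current database value + 1."""
    return database_mci + 1


def imported_mci(source_mci: Optional[int] = None,
                 existing_mci: Optional[int] = None) -> int:
    """Return the mci stored in the database after an import."""
    if existing_mci is not None:
        # Updating a record leaves its database mci unchanged
        return existing_mci
    # New record: import as-is, defaulting to 2 if not specified
    return 2 if source_mci is None else source_mci


origin_db = 0                              # the originating repository stores mci=0
site_b_db = imported_mci(exported_mci(origin_db))
site_c_db = imported_mci(exported_mci(site_b_db))
print(origin_db, site_b_db, site_c_db)
```

A repository can thus recognize its own original records by a database mci of 0.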

data

Parent elements	resource
Child elements	none (leaf element)
Contents	Text

Represents the value of a single field in the record.

Attributes:

Name	Type	Description	mandatory?
field	string	the field name in the record	yes
value	JSON value	the native field value	no
url	URL	the URL to download the contents from*	no
filename	filename	the filename of the attached contents*	no

The text node in the data element provides a human-readable representation of the field value. If this representation is different from the original value in the database, then the original value must be provided by the value attribute.

(*) If the field is for file upload, a url attribute should be provided to specify the location of the file. The importer will try to download and store the file (file transfer) from that URL (pull). It is also possible to send the file with the HTTP request - in this case the filename must be specified instead of url (push). The push variant for uploads is meant for peers which do not support pulling for some reason (e.g. mobile phones). Normal servers would always provide a URL for download in order to allow the consuming site to decide which files to download and when (saves bandwidth).
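
Reading a field value from a data element could be sketched as follows. The function name field_value is hypothetical. The sketch assumes that the value attribute holds a JSON-encoded value (as the attribute table says) and falls back to the raw attribute string if it cannot be decoded; that fallback is an assumption, not documented behavior.

```python
import json
import xml.etree.ElementTree as ET


def field_value(data: ET.Element):
    """Return the native value of a <data> element."""
    raw = data.get("value")
    if raw is None:
        # No value attribute: the text node is the value (None if empty)
        return data.text
    try:
        return json.loads(raw)
    except ValueError:
        # Assumption: treat undecodable attribute values as plain strings
        return raw


gender = ET.fromstring('<data field="opt_pr_gender" value="3">male</data>')
name = ET.fromstring('<data field="first_name">Dominic</data>')
print(field_value(gender), field_value(name))
```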

reference

Parent elements	resource
Child elements	resource
Contents	Text

Represents a foreign key reference.

Attributes:

Name	Type	Description	mandatory?
field	string	the field name in the record	yes
resource	string	the name of the referenced resource, usually the tablename	yes
uuid	string	the unique identifier of the referenced record (foreign key)*	(yes)**
tuid	string	a temporary identifier for a referenced record (foreign key)*	(yes)**

(*) Referenced records would always be exported in the same output file. If a referenced record is found in the same input file, then it will be automatically imported.

(**) Records will be identified within the input file by their uuid, or, if no uuid is specified, by their tuid.

If the referenced record is enclosed in the reference element, then uuid and tuid can be omitted:

<s3xml>
   <resource name="xxxyyy">
       <reference field="xy" resource="aaabbb">   <!-- the reference element, uuid/tuid can be omitted if -->
          <resource name="aaabbb">                <!-- the referenced record is enclosed in the reference -->
          </resource>
       </reference>
   </resource>
</s3xml>

JSON Format

CSV Format

Examples

XML Format

Note: the <-- ... --> annotations in the following example are explanatory only; they are not valid XML comments and are not part of the actual document.

<s3xml>

  <resource                                                 <-- a record in the database -->
      created_on="2009-10-02 08:55:11"                      <-- date/time when the record was created -->
      modified_on="2009-10-02 08:56:03"                     <-- date/time when the record was last modified -->
      uuid="6e6e76dc-8ed7-408c-bb09-54476e3944ae"           <-- UUID of the record (if present in DB) -->
      created_by="None"                                     <-- Author -->
      modified_by="Dominic"                                 <-- Last Author -->
      name="pr_person">                                     <-- Resource Name -->

    <reference                                              <-- Reference Field (foreign key) in the record -->
      field="pr_pe_id"                                      <-- Field name -->
      resource="pr_pentity"                                 <-- Name of the referenced resource -->
      uuid="6e6e76dc-8ed7-408c-bb09-54476e3944ae"/>         <-- UUID of the referenced entry -->

    <data field="pr_pe_label">730421</data>                 <-- A field in the record -->
    <data field="first_name">Dominic</data>
    <data field="middle_name"/>
    <data field="last_name">König</data>
    <data field="preferred_name"/>
    <data field="local_name"/>
    <data field="opt_pr_gender" value="3">male</data>
    <data field="opt_pr_age_group" value="5">Adult (21-50)</data>
    <data field="email">dominic@nursix.org</data>
    <data field="mobile_phone"/>
    <data field="date_of_birth">1973-04-21</data>
    <data field="opt_pr_nationality" value="65">Germany</data>
    <data field="opt_pr_country" value="167">Sweden</data>
    <data field="opt_pr_religion" value="1">none</data>
    <data field="opt_pr_marital_status" value="3">married</data>
    <data field="occupation">Nurse</data>
    <data field="comment"/>

    <resource                                               <-- A sub-resource (component) of the record -->
      created_on="2009-10-02 11:34:34"
      modified_on="2009-10-02 11:34:34"
      uuid="89217054-3c10-4f5d-959a-420254243498"
      name="pr_address">

      <data
        field="opt_pr_address_type"                         <-- field name -->
        value="1">                                          <-- original value in the database -->
          Home Address                                      <-- value represented for human readability -->
      </data>
      <data field="co_name"/>
      <data field="street1">Lundgatan</data>
      <data field="street2"/>
      <data field="postcode">38031</data>
      <data field="city">Läckeby</data>
      <data field="state"/>
      <data field="opt_pr_country" value="167">Sweden</data>
      <data field="lat">56.78042</data>
      <data field="lon">16.27914</data>
      <data field="comment"/>
    </resource>
  </resource>
</s3xml>

UUID - how we handle Unique IDs for records across heterogeneous systems

JSON Format

The data structure of the native S3JSON format is equivalent to the XML format (=element trees) - except that markup elements are represented by prefixes:

{
    "@domain": "yana",                                             // Server name
    "@url": "http://127.0.0.1:8000/eden",                          // Server URL
    "$_pr_person": {                                               // Resource, prefix: $_
        "@uuid": "44fc762e-02df-44e0-8bd1-9b58e3132894",           // Resource attribute, prefix: @
        "@url": "http://127.0.0.1:8000/eden/pr/person/1",
        "@created_on": "2009-11-16 22:33:35",
        "@created_by": "None",
        "@modified_on": "2009-11-19 21:32:19",
        "@modified_by": "Dominic",
        "first_name": "Dominic",                                   // Data field, no prefix
        "last_name": "K\u00f6nig",
        "email": "dominic@nursix.org",
        "opt_pr_age_group": {"@value": "1", "$": "unknown"},       // Data field with textual representation:
        "opt_pr_religion": {"@value": "1", "$": "none"},           // @value=Value, $=TextualRepresentation
        "opt_pr_gender": {"@value": "1", "$": "unknown"},
        "opt_pr_nationality": {"@value": "999", "$": "unknown"},
        "opt_pr_country": {"@value": "999", "$": "unknown"},
        "opt_pr_marital_status": {"@value": "1", "$": "unknown"},
        "$k_pr_pe_id": {                                           // External Reference (Key), prefix: $k_
            "@resource": "pr_pentity",                             // Key resource name
            "@uuid": "a2a945bd-4f43-41da-bcdb-e2e638a987ea",       // UUID of the key record
            "$": "Dominic K\u00f6nig [no label] (Person)"          // Textual representation of the reference
        },
        "$_pr_presence": {                                         // Sub-resource (Component):
            "@uuid": "14af2751-7277-4e90-b42b-0d0430684561",       // appears as component within the resource
            "@created_on": "2009-11-19 19:42:46",
            "@modified_on": "2009-11-19 19:42:46",
            "@url": "http://127.0.0.1:8000/eden/pr/person/1/presence/1",
            "opt_pr_presence_condition": {"@value": "4", "$": "Found"},
            "time": {"@value": "2009-11-19 18:42:00 +0000", "$": "2009-11-19 20:42:00"},
            "$k_reporter": {
                "@resource": "pr_person",
                "@uuid": "44fc762e-02df-44e0-8bd1-9b58e3132894",
                "$": "Dominic K\u00f6nig"
            }
        }
    }
}

JSON format characteristics:

The JSON output contains no whitespace between elements and no comments; both are added here by hand for better readability

The outermost structure is always a JSON object (not a list)
All data is represented as strings (for security reasons)

If @value is sent for a field, it overrides the element text ($) at import
However, the use of @value is not mandatory; the data can simply be sent as the element text instead
Note that there is no automatic data encoding: data must be sent in DB-encoded format
@resource, @name and @uuid attributes are mandatory at input, other attributes can be omitted

Multiple records of the same resource will be aggregated as lists like:

{
    "$_my_resource": [
        {
            // record1 of my_resource
        },
        {
            // record2 of my_resource
        }
    ]
}
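
When reading S3JSON, a client has to interpret the key prefixes and cope with both single records and lists of records. The sketch below shows one way to do this; the function names classify_key and records are hypothetical.

```python
def classify_key(key: str) -> tuple[str, str]:
    """Map an S3JSON key to (kind, name) according to its prefix."""
    if key.startswith("$k_"):
        return ("reference", key[3:])
    if key.startswith("$_"):
        return ("resource", key[2:])
    if key == "$":
        return ("text", "")
    if key.startswith("@"):
        return ("attribute", key[1:])
    return ("data", key)


def records(value) -> list:
    """Normalize a single record or a list of records to a list."""
    return value if isinstance(value, list) else [value]


document = {"$_pr_person": {"@uuid": "44fc762e-02df-44e0-8bd1-9b58e3132894",
                            "first_name": "Dominic"}}
for key, value in document.items():
    kind, name = classify_key(key)
    if kind == "resource":
        for record in records(value):
            print(name, [classify_key(k) for k in record])
```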
