[[TOC]]

= S3XML =

S3XML is a generic RESTful data exchange interface for the S3 framework. It comes with its own native XML data format. It also provides built-in data format conversion and transformation to support a variety of custom XML, JSON and CSV formats and schemas.

== Minimum Requirements ==

=== Clients ===

Interfaces which want to exchange data with S3XML interfaces must implement the following:

 - an HTTP client which can perform GET and POST requests
 - the native S3XML data format

''Note:''
 - Where the target interface has built-in support for data format conversion/transformation (as in S3), it is sufficient if the client implements an S3XML-compatible data format (XML, JSON or CSV).
 - S3 comes with a number of built-in transformation stylesheets for some standard data formats. Where other formats are to be used, clients can also provide their own XSLT transformation stylesheets.

=== Servers ===

Interfaces which want to provide S3XML server capabilities (e.g. for Synchronization) must implement the following:

 - an HTTP server interface accepting and performing GET, PUT and POST requests
 - the RESTful API as described in this document
 - the native S3XML data format

Optionally they can provide:

 - JSON/CSV to S3XML conversion
 - S3XML to JSON/CSV conversion
 - XSLT 1.0 transformation

== Conventions ==

=== Name Space ===

Where a name space identifier for the native S3XML format is to be used, it shall be:

 - http://eden.sahanafoundation.org/wiki/S3XML

In the current implementation of S3, no name space identifier shall be used. This may, however, change in future versions.

=== Character Encoding ===

XML documents to be used with S3XML can specify their character encoding in the XML header. Where JSON or CSV formats are used, they are expected to be UTF-8 encoded. S3XML interfaces can support other encodings for JSON/CSV, but this is not a requirement.

All exported data are always UTF-8 encoded.
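To illustrate the encoding rules above, a client written in Python might prepare its sources as follows. This is a minimal sketch, not part of S3: the function names `to_utf8` and `parse_xml_source` are hypothetical, and the sample data is made up. JSON/CSV payloads are transcoded to UTF-8, while XML documents are left untouched, because the XML parser honours the encoding declared in the XML header.

```python
import xml.etree.ElementTree as ET


def to_utf8(payload: bytes, source_encoding: str) -> bytes:
    """Re-encode a JSON or CSV payload as UTF-8 (hypothetical helper).

    JSON/CSV sources are expected to be UTF-8 encoded, so a client holding
    data in another encoding converts it before submission.
    """
    return payload.decode(source_encoding).encode("utf-8")


def parse_xml_source(payload: bytes) -> ET.Element:
    """Parse an XML source; the parser reads the encoding declared in the
    XML header, so XML documents need no transcoding."""
    return ET.fromstring(payload)


# A CSV source held in Latin-1, converted to UTF-8 before upload
csv_latin1 = "name,city\nJosé,São Paulo\n".encode("latin-1")
csv_utf8 = to_utf8(csv_latin1, "latin-1")

# An XML document (hypothetical element name) declaring a non-UTF-8 encoding
xml_latin1 = '<?xml version="1.0" encoding="ISO-8859-1"?><root name="José"/>'.encode("latin-1")
root = parse_xml_source(xml_latin1)
```

Since all exported data are UTF-8 encoded, the same client can decode responses with `bytes.decode("utf-8")` without inspecting them first.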
=== URL Format ===

Data format extensions in URLs must be all-lowercase. Where uppercase characters are used, they are converted to lowercase.

== Interface ==

S3XML implements the [wiki:S3XRC/RESTfulAPI S3 RESTful API] and its [wiki:S3XRC/RESTfulAPI/URLFormat URL format] to address resources.

=== RESTful Methods ===

The following methods are supported:

||'''Method'''||'''Action'''||
||GET||returns the contents of the specified resource||
||GET /fields||returns a ''schema'' document for the resource (without components)||
||GET /options||returns a field options document for the resource||
||GET /create ''without source''||returns a ''schema'' document for the resource||
||GET /create ''with source''||analyzes the source, creates an import job (both creating new and updating existing records) and returns a view of the job||
||GET /create ''with job ID''||returns a view of the specified import job||
||POST /create ''with job ID''||updates or deletes the specified job||
||GET /update ''without source''||returns a ''schema'' document for the resource and all of its components||
||GET /update ''with source''||analyzes the source, creates an import job (only updating existing records) and returns a view of the job||
||GET /update ''with job ID''||returns a view of the specified import job||
||POST /update ''with job ID''||updates or deletes the specified job||
||POST ''with job ID''||commits the specified job to the database||
||POST ''with source''||analyzes the source, creates an import job and commits the job to the database||
||PUT ''with job ID''||commits the specified job to the database||
||PUT ''with source''||analyzes the source, creates an import job and commits the job to the database||

=== Source Submission ===

There are multiple ways to submit source files:

==== Files on the Server ====

A source file in the server file system can be specified using the ''filename'' URL variable:

{{{
PUT http:////?filename=
}}}

Multiple files can be specified as a list of comma-separated pathnames:

{{{
PUT http:////?filename=,,
}}}

==== URLs ====

A source file can be specified by its URL using the ''fetchurl'' URL variable:

{{{
PUT http:////?fetchurl=
}}}

Multiple files can be specified as a list of comma-separated URLs:

{{{
PUT http:////?fetchurl=,,
}}}

Supported URL protocols are http, ftp and file, where file is interpreted in the server file system context. URLs of different protocols can be mixed.

The specified URLs must be accessible either without authentication or, if you specify credentials in the URLs, they must support unsolicited HTTP basic authentication: HTTP 401 retries are not handled by the interface.

The URLs must be properly [http://www.w3schools.com/tags/ref_urlencode.asp quoted], and must not contain commas.

==== Request Attachments ====

Source files can also be attached to a multipart request. In this case the file extension of the source file must match the request URL file extension. Multiple files can be attached.

==== Multiple Sources ====

Where multiple sources are specified or attached, they are first converted and transformed one by one, and then combined into a single element tree before import.

==== Duplicate Resolution ====

In the current S3 implementation, the interface does not handle duplicates within the same request. This is because the order of elements in the resulting element tree is not defined, and the last update time attribute is optional in source elements, so there is no predictable rule of precedence.

Records in the source must not be split into fragments, but must be submitted as a single element. Fragments of a record will not be merged by the interface, and which of the fragments would finally be imported is unpredictable.

Source elements using unique keys are automatically matched with existing records. Where matches are ambiguous (e.g. a set of keys matching multiple existing records), the import element will be rejected as invalid.
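Because the interface neither resolves duplicates nor merges record fragments within one request, a client may want to check its source before submitting it. The following Python sketch does this. It assumes, for illustration only, that each record is a `resource` element carrying a `name` attribute and a unique-key attribute (here `uuid`), with field values in `data` elements. The function name `find_duplicate_records` and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def find_duplicate_records(source: str, key: str = "uuid") -> list[tuple[str, str]]:
    """Return (resource name, key value) pairs occurring more than once.

    Such pairs are either duplicates or fragments of the same record,
    neither of which the interface resolves within a single request.
    """
    root = ET.fromstring(source)
    counts = Counter(
        (element.get("name"), element.get(key))
        for element in root.iter("resource")
        if element.get(key)  # records without the key cannot be checked here
    )
    return [pair for pair, count in counts.items() if count > 1]


# Hypothetical source: record urn:uuid:1 is split into two fragments
source = """<s3xml>
  <resource name="pr_person" uuid="urn:uuid:1"><data field="first_name">Anna</data></resource>
  <resource name="pr_person" uuid="urn:uuid:2"><data field="first_name">Ben</data></resource>
  <resource name="pr_person" uuid="urn:uuid:1"><data field="last_name">Smith</data></resource>
</s3xml>"""

print(find_duplicate_records(source))  # [('pr_person', 'urn:uuid:1')]
```

Fragments reported this way should be merged into one element on the client side before the source is submitted.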
For certain resources, the server may have additional duplicate finders and resolvers configured. How duplicates are handled by these resolvers can differ from resource to resource.

The default behavior for duplicate resolution in standard import mode is to update the existing records with the values from the source record. In synchronization mode, though, the default is to accept/keep the newest data (and the last update time attribute is mandatory).

=== Data Format Conversion and Transformation ===

S3XML interfaces may provide built-in codecs to convert and transform the input or output data from/to various data formats:

[[Image(ConversionTransformation.png)]]

The current S3 implementation provides built-in codecs for CSV, XML and JSON formats (PDF with OCR to come).

XSLT stylesheets for the format transformations can be built into the server (selected by the request URL file extension), or can be specified by the client.

The client can use the ''transform'' URL variable to specify the path (on the server file system) or URL of the XSLT stylesheet:

{{{
GET http:////.?transform=
}}}

Alternatively, the client can attach the stylesheet to the request body. In this case the stylesheet's file name must be: .xslt.

The ''transform'' variable overrides any attached or built-in stylesheets, and attached stylesheets override built-in stylesheets.

The ''.xml'' request URL extension is reserved for the native S3XML format, and must not use or accept any stylesheets.
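The stylesheet precedence rules above can be summarized in a few lines of Python. This is a hypothetical helper (`select_stylesheet`), not the S3 implementation. Stylesheets are represented by plain path strings, and rejecting client-supplied stylesheets for ''.xml'' requests with a `ValueError` is an assumption about how "must not accept" might be enforced.

```python
from typing import Optional


def select_stylesheet(extension: str,
                      transform: Optional[str] = None,
                      attached: Optional[str] = None,
                      builtin: Optional[str] = None) -> Optional[str]:
    """Return the XSLT stylesheet to apply, or None for no transformation.

    Precedence: transform URL variable > attached stylesheet > built-in one.
    """
    extension = extension.lower()  # URL extensions are converted to lowercase
    if extension == "xml":
        # Native S3XML never uses a stylesheet; raising here is an
        # assumption about how the "must not accept" rule is enforced
        if transform or attached:
            raise ValueError(".xml requests do not accept stylesheets")
        return None
    for candidate in (transform, attached, builtin):
        if candidate:
            return candidate
    return None
```

For example, a ''.csv'' request with both an attached and a built-in stylesheet gets the attached one, unless the ''transform'' variable names yet another stylesheet.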
=== Error Handling ===

The HTTP status code in the response indicates the success or failure of a request:

||'''Status Code'''||'''Causes'''||'''Response Body'''||
||200 OK||Success||results or JSON message||
||400 BAD REQUEST||Syntax error or method not supported||JSON message||
||401 UNAUTHORIZED||Authorization required||Clear text error||
||403 FORBIDDEN||Insufficient permissions||Clear text error||
||404 NOT FOUND||Non-existent resource||Clear text error||
||50x||Unrecoverable internal error||Clear text error||

Where a JSON message is returned, it has the following structure:

{{{
{
    "success": "True" | "False",
    "statuscode": "XXX",
    "message": "clear text error message",
    "tree": { /* element tree */ }
}
}}}

If there was an input element tree and it contained any errors, a subtree with the invalid elements is added to the JSON message ("tree"). This subtree is expressed in [#JSONFormat1 JSON Format]. Invalid elements have an additional ''@error'' attribute containing a clear-text error description.

'''Skipping invalid records at import:'''

By default, if the source contains any invalid data, an import request is rolled back completely and an HTTP 400 BAD REQUEST error is raised.

You can override this behavior by using the ''ignore_errors'' URL variable (with any non-empty string, e.g. {{{?ignore_errors=True}}}). Invalid records are then skipped, the valid ones are committed to the database, and the request returns HTTP 200 OK. The JSON message will, however, still contain the error message and the element tree.

Note that ''ignore_errors'' applies to validation errors only. Any other error (e.g. an XML syntax error) is handled as usual (i.e. rollback and error message).

The ''ignore_errors'' option is meant for "dirty" data, e.g. cases where you need to import from a source but do not have the permission and/or the means to clean it up before import. Where possible, you should avoid ''ignore_errors'' and sanitize the source instead.
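A client might handle the responses described above along these lines. This is a minimal Python sketch: `import_url` and `interpret_response` are hypothetical helper names, and only the status code and the JSON message keys shown above are inspected. With ''ignore_errors'', a 200 OK response can still carry an error message and a tree of invalid elements, so the sketch returns both regardless of the status code.

```python
import json
from urllib.parse import urlencode


def import_url(base_url: str, ignore_errors: bool = False) -> str:
    """Append ignore_errors=True when invalid records should be skipped
    instead of rolling back the whole import."""
    if not ignore_errors:
        return base_url
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode({"ignore_errors": "True"})


def interpret_response(status: int, body: str) -> dict:
    """Classify an S3XML response by its HTTP status code."""
    result = {"ok": status == 200, "status": status,
              "message": None, "tree": None}
    if status in (200, 400):
        # 200 carries results or a JSON message, 400 a JSON message
        try:
            parsed = json.loads(body)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            parsed = None
        if isinstance(parsed, dict) and "message" in parsed:
            result["message"] = parsed["message"]
            # Subtree of invalid elements, each with an @error attribute
            result["tree"] = parsed.get("tree")
        elif status == 400:
            result["message"] = body
        return result
    # 401, 403, 404 and 50x responses carry a clear-text error
    result["message"] = body
    return result
```

For example, `import_url("http://example.org/eden/pr/person.xml", ignore_errors=True)` (a made-up URL) yields `http://example.org/eden/pr/person.xml?ignore_errors=True`.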
== XML Format ==

=== Document Types and Structure ===

S3XML defines 3 types of documents:

==== Schema Documents ====

'''Schema documents''' describe the data schema for a resource. Clients can use these documents e.g. for automatic generation of forms.

Document Tree:

{{{
...
...
}}}

or (if requested with the ''fields'' URL method):

{{{
...
}}}

''Note:''
 - In the current S3 implementation, these documents can only be requested (GET). Future versions may also accept submissions of such documents to update the data schema.

==== Field Option Documents ====

'''Field option documents''' describe the currently acceptable options for fields in a resource. Clients can use these documents e.g. for automatic generation and/or client-side validation of forms.

Document Tree:

{{{
...
}}}

''Note:''
 - if the ''field'' URL variable is used to specify a particular field in the resource, the enclosing element is omitted (i.e.