wiki:BluePrint/Synchronisation

Version 29 (modified by Hasanat Kazmi, 12 years ago)


WEB SERVICES API

(This API is also used by daemonX which drives automatic p2p syncing between clients)

Supported web services:

JSON
JSON-RPC

To do (partial work has been done):

XML
XML-RPC

Service Proxy:

JSON: http://localhost:8000/sahana/admin/call/json

e.g. http://localhost:8000/sahana/admin/call/json/getdata?timestamp=0&.........

JSON-RPC: http://server's-ip:port/sahana/admin/call/jsonrpc

For XML, replace json and jsonrpc with xml and xmlrpc respectively.
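
As a rough illustration of the URL scheme above, the endpoint for each service type could be assembled as follows. The helper name `service_url` and its parameters are hypothetical and not part of Sahana; only the path layout is taken from the examples above, and the xml and xmlrpc services are still listed as to do.

```python
# Hypothetical helper: builds the service proxy URL for a given service type.
# Only the "/sahana/admin/call/<service>" layout comes from the text above.
SERVICES = ("json", "jsonrpc", "xml", "xmlrpc")


def service_url(host, port, service, function=None):
    """Return the proxy URL for `service`, optionally followed by a function name."""
    if service not in SERVICES:
        raise ValueError("unknown service type: %r" % (service,))
    url = "http://%s:%d/sahana/admin/call/%s" % (host, port, service)
    if function is not None:
        # Plain JSON calls put the function name in the path, e.g. .../json/getdata
        url += "/" + function
    return url


print(service_url("localhost", 8000, "jsonrpc"))
print(service_url("localhost", 8000, "json", "getdata"))
```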

Available functions:
putdata(uuid, username, password, nicedbdump):

This function inserts data into the system. Args:

uuid (required):

uuid of the calling machine. A machine's uuid is a unique 16-character string. When the web services client is also a Sahana instance, the uuid is generated and stored by daemonX.

Username, password (required):

Used for authentication. Both are strings. The user must be registered on the host machine with data alteration privileges, e.g. the Administrator of the system can put data into the system.

nicedbdump (required):

nicedbdump is best illustrated with a diagram. If [] represents a Python list, then:

nicedbdump = [    (each element of this list is another list representing a database table)
    [
        table name,
        [ table attributes, comma separated, each as a string ],
        [    (each element of this list is a list representing a row of the table)
            [ comma separated row values ],
            [ ... ],
            ...
        ]
    ],
    [ ... ],
    [ ... ],
    ...
]
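
For concreteness, a nicedbdump carrying one table with two rows might look like the following in Python. The table name `person`, its `name` column, the value types and the uuid values are made up for illustration; `id`, `uuid` and `modified_on` are the attributes the service requires of every table.

```python
# Illustrative nicedbdump: a list of tables, each [name, attributes, rows].
# "person" and "name" are hypothetical; the uuid values are placeholders.
nicedbdump = [
    [
        "person",                                # table name
        ["id", "uuid", "modified_on", "name"],   # table attributes
        [                                        # rows, one list per row
            [1, "row-uuid-0001", "2009-07-01 10:15:00", "Alice"],
            [2, "row-uuid-0002", "2009-07-02 08:30:00", "Bob"],
        ],
    ],
]

for table_name, attributes, rows in nicedbdump:
    for row in rows:
        # Pair each value with its attribute name to get a readable record.
        print(table_name, dict(zip(attributes, row)))
```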

Note that if nicedbdump contains a table which is not present in the database, that table is simply ignored. If nicedbdump is not formatted properly, an error string is returned. The following situations raise an error:

  • nicedbdump is not a list.
  • nicedbdump is not a list of lists. Each list in nicedbdump represents a table, say n:
    • n does not have exactly 3 elements.
    • The first of these three elements is not a string.
    • The second of these three elements, say s, is not a list, or an element of s is not a string.
    • The third of these three elements, say t, is not a list, or an element of t, say r, is not a list, or the number of elements in r is not equal to the number of elements in s.
    • The table attributes (s above) do not include 'id', 'uuid' and 'modified_on'.

'id' is the unique id of each row in the table. This 'uuid' is the row uuid; it is different from the machine 'uuid' which daemonX maintains. 'modified_on' is the last time the row was modified (or created, if it has not been altered since creation).

Note that a row (referenced by its row uuid) is added only if it is newer than the one in the database. If that uuid is absent from the database, the row is added.

return:

If the user is authenticated and nicedbdump is parsed successfully, the data is added to the database and True is returned. In case of an error, the error message is returned as a string.
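
The format checks that putdata applies to nicedbdump can be sketched roughly as below. This is not the server's actual code: the function name `validate_nicedbdump` and the wording of the error messages are assumptions, and only the conditions themselves come from the description above.

```python
REQUIRED_ATTRIBUTES = ("id", "uuid", "modified_on")


def validate_nicedbdump(dump):
    """Return None if `dump` is well formed, otherwise an error message string."""
    if not isinstance(dump, list):
        return "nicedbdump is not a list"
    for n in dump:
        if not isinstance(n, list):
            return "nicedbdump is not a list of lists"
        if len(n) != 3:
            return "a table does not have exactly 3 elements"
        name, s, t = n
        if not isinstance(name, str):
            return "table name is not a string"
        if not isinstance(s, list) or not all(isinstance(a, str) for a in s):
            return "table attributes are not a list of strings"
        if not isinstance(t, list):
            return "table rows are not a list"
        for r in t:
            if not isinstance(r, list) or len(r) != len(s):
                return "a row does not match the table attributes"
        if any(a not in s for a in REQUIRED_ATTRIBUTES):
            return "table lacks 'id', 'uuid' or 'modified_on'"
    return None


print(validate_nicedbdump([["person", ["id", "uuid", "modified_on"], []]]))
print(validate_nicedbdump("not a dump"))
```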

getdata(uuid, username, password, timestamp = None):

Returns data as a nicedbdump, as defined in putdata. Data added after the given timestamp is returned. If None is passed as timestamp, the data added to the system since the last getdata call from uuid is returned. Args:

uuid (required):

uuid of the calling machine. A machine's uuid is a unique 16-character string. When the web services client is also a Sahana instance, the uuid is generated and stored by daemonX.

username, password (required except for local machine):

Both are strings. Used for authentication; the user must have privileges for reading the database. If the service is called from the local machine (i.e. from IP 127.0.0.1), username and password are ignored and access is granted, e.g. daemonX calls this function locally without providing a username and password.

Timestamp (optional):

timestamp is a string of the form “YYYY-MM-DD HH:MM:SS”. If timestamp = null is passed, the system automatically returns the data added since the last getdata operation between uuid and this machine. daemonX uses this setting.

return:

In case of an error (such as a failure to authenticate), the error message is returned as a string. On success, a nicedbdump as described above is returned.
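
A timestamp string in the form getdata expects can be produced with the standard library, as in this small sketch. The helper names `to_timestamp` and `from_timestamp` are hypothetical; the format string is simply the strftime equivalent of “YYYY-MM-DD HH:MM:SS”.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # strftime form of "YYYY-MM-DD HH:MM:SS"


def to_timestamp(moment):
    """Format a datetime as the string form described for getdata."""
    return moment.strftime(TIMESTAMP_FORMAT)


def from_timestamp(text):
    """Parse a "YYYY-MM-DD HH:MM:SS" string back into a datetime."""
    return datetime.strptime(text, TIMESTAMP_FORMAT)


print(to_timestamp(datetime(2009, 7, 1, 10, 15, 0)))
```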

Example code:
Python:

from jsonrpc import ServiceProxy, JSONRPCException
# jsonrpc needs simplejson, which has to be installed first
s = ServiceProxy("http://localhost:8000/sahana/admin/call/jsonrpc")
try:
    result = s.getdata("machinename12345", "email@lums.edu.pk", "myPassword")
    if not isinstance(result, list):
        # getdata returned an error message instead of a nicedbdump
        pass
    else:
        # result is a list in nicedbdump format
        nicedbdump = result
        putit = s.putdata("machinename12345", "email@lums.edu.pk", "myPassword", nicedbdump)
        if putit is True:
            # data was sent, parsed and processed by the server (in this case no data
            # is added, because the data was just queried and sent straight back)
            pass
        else:
            # putit holds the error message
            pass
except JSONRPCException as e:
    print(repr(e.error))

Note:

You can write a client in the language of your choice.
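
A client in another language mainly needs to reproduce the request body that the Python ServiceProxy sends. The sketch below assumes the usual JSON-RPC envelope (a `method` name, a `params` list and an `id`) posted to the jsonrpc URL; that envelope follows the general JSON-RPC convention and is an assumption, not something this page specifies, and `build_request` is a hypothetical helper.

```python
import json


def build_request(method, params, request_id=1):
    """Serialise a JSON-RPC style call: method name, positional params and an id."""
    return json.dumps({"method": method, "params": list(params), "id": request_id})


# getdata with timestamp left as None (null in JSON), as daemonX is described to do.
body = build_request(
    "getdata", ["machinename12345", "email@lums.edu.pk", "myPassword", None]
)
print(body)
```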

Help:

IRC: #sahana at freenode

email: <Fran Boon> francisboon at googlemail dot com or hasanatkazmi at gmail dot com

Choosing ZeroConf for Network discovery

Automatic synchronization between servers requires automatic service discovery. We had two major options to choose from: 1) ZeroConf 2) Mesh4x

ZeroConf & Mesh4x solve different problems. They don't overlap in functionality at all: ZeroConf provides a solution to automatic discovery. Mesh4x provides a solution to the data sync.

We were more interested in ZeroConf because:

  • 1) ZeroConf has a Python library but Mesh4x doesn't, which would have meant double the work had we gone with Mesh4x.
  • 2) We only needed automatic discovery of the service, because we wanted to use web services so that foreign developers can also use the RESTful API.
  • 3) Mesh4x requires a Java daemon, which would mean adding a JRE to the package and doubling the Sahana package size.

daemonX: Daemon which runs automatic synchronization

We created a daemon which calls the web services listed above. daemonX uses the ZeroConf library available at http://www.amk.ca/python/zeroconf (note that this library has not been maintained since Dec 2006). daemonX also requires the jsonrpc library from http://json-rpc.org/wiki/python-json-rpc for processing JSON.

Very early tests of daemonX using ZeroConf are below expectations. With a GPRS modem as the network source, the ZeroConf library has thrown errors. More testing needs to be done before making any final statement.

Blueprint for Synchronization

We need to implement a system that performs automatic Sahana synchronization. This synchronization will be between any Sahana servers (PHP and Py). Our focus should be on SahanaPy, but it should be compatible with SahanaPHP. Currently the SahanaPy data exporting module exports data as CSV (web based: not for an autonomous process). We can add support for XML and JSON. XML exporting will ensure compatibility with the PHP version of Sahana. JSON is modern, lightweight over HTTP and reliable, so using JSON for data synchronization looks promising. These formats should strictly adhere to the XSD standards approved by the W3C.

Another important point is to record UUID information for each sync activity. We must include the UUID when exporting data. The current UUID in the export module of the PHP version includes an 'Instance ID' to make clear which install instance it belongs to. A similar approach should be adopted in SahanaPy.

Automatic synchronization is different from the manual data export / import module present in Sahana. The automatic process should run continuously as a daemon.

Currently we use a database dump for exporting, which is definitely not the optimal way to synchronize databases. A paper written by Leslie Klieb ( http://hasanatkazmi.googlepages.com/DistributedDisconnectedDatabases.pdf ) discusses various ways of doing this. In the light of this research, we can implement synchronization as follows:

  • We need to add a time stamp as an additional attribute to each table of the database (tables which hold data such as names of missing people etc.; we do not need to sync the internal tables which a Sahana installation uses for saving internal information). This time stamp and the UUID of the Sahana instance together can represent a unique attribute. This time stamp attribute MUST be added to SahanaPHP to make intelligent database synchronization possible.

There is already a desire for data deleted from Sahana to stay available but with a deleted flag. Such data would not be visible during normal DB queries, but would be accessible for audit purposes if required. We can make this a reusable field in models/00_db.py and then add it to each table definition (well, all real, syncable data; no need for internal settings). For this to be accomplished in SahanaPHP, we MUST add another attribute: a delete flag (alongside the time stamp). The delete flag is a Boolean representing whether the tuple has been deleted or not.
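
The deleted-flag idea can be tried out with the standard sqlite3 module, as in the sketch below. It deliberately does not use web2py's DAL or models/00_db.py; the `person` table, its columns and the helper names are hypothetical and only illustrate a row being flagged rather than removed.

```python
import sqlite3
from datetime import datetime


def now():
    """Current time as a "YYYY-MM-DD HH:MM:SS" string."""
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")


conn = sqlite3.connect(":memory:")
# Hypothetical syncable table with the two extra attributes discussed above.
conn.execute(
    "CREATE TABLE person ("
    " id INTEGER PRIMARY KEY, uuid TEXT UNIQUE, name TEXT,"
    " modified_on TEXT NOT NULL, deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.execute(
    "INSERT INTO person (uuid, name, modified_on) VALUES (?, ?, ?)",
    ("row-uuid-0001", "Alice", now()),
)


def soft_delete(conn, row_uuid):
    """Flag the row as deleted and refresh its time stamp; the row itself stays."""
    conn.execute(
        "UPDATE person SET deleted = 1, modified_on = ? WHERE uuid = ?",
        (now(), row_uuid),
    )


soft_delete(conn, "row-uuid-0001")
# Normal queries filter on the flag; audit queries do not.
visible = conn.execute("SELECT name FROM person WHERE deleted = 0").fetchall()
audit = conn.execute("SELECT name, deleted FROM person").fetchall()
print(visible, audit)
```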

When a new tuple is added, the current date is entered; when a tuple is updated, the date is changed to the present one. If a tuple is deleted, we set its delete flag to true (and do not delete it for real).

Now take two instances of Sahana, A and B. A calls JSON-RPC (or XML-RPC) on B, passing its (A's) UUID. B looks in its synchronization table (in B's database) for the last time data was sent from B to A, then B creates JSON/XML of only those entries/tuples which were modified after that date and returns them to A. It also sends the tuples deleted after that date. B then immediately asks A, and the same process is repeated for A. Each machine then either updates existing tuples or inserts new tuples in the specific tables. It also deletes all tuples which the other machine has deleted, if and only if it has not updated that tuple in its own database after the deletion on the other machine.
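
One direction of the exchange described above can be sketched in plain Python. This is an illustration under assumptions, not the implemented algorithm: rows are modelled as dicts keyed by row uuid, `changes_since` and `merge` are hypothetical names, and time stamps are compared as “YYYY-MM-DD HH:MM:SS” strings, which sort chronologically.

```python
def changes_since(rows, last_sync):
    """Rows modified after `last_sync`, including ones flagged as deleted."""
    return [row for row in rows.values() if row["modified_on"] > last_sync]


def merge(local, incoming):
    """Apply rows received from the other instance to `local` (a dict keyed by uuid)."""
    for row in incoming:
        mine = local.get(row["uuid"])
        # Insert unknown rows; otherwise the newer version wins. This also covers
        # deletions: a remote delete is ignored if the local row was updated later.
        if mine is None or row["modified_on"] > mine["modified_on"]:
            local[row["uuid"]] = dict(row)
    return local


a = {"u1": {"uuid": "u1", "name": "Alice", "modified_on": "2009-07-01 10:00:00", "deleted": False}}
b = {
    "u1": {"uuid": "u1", "name": "Alice", "modified_on": "2009-07-02 09:00:00", "deleted": True},
    "u2": {"uuid": "u2", "name": "Bob", "modified_on": "2009-07-02 09:30:00", "deleted": False},
}
merge(a, changes_since(b, "2009-07-01 00:00:00"))
print(sorted(a), a["u1"]["deleted"])
```

The sketch ignores clock differences between machines, which a real deployment would have to consider.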

An important outcome of this implementation is that it can also be used in the manual data exporting modules of Sahana (both versions). We can let users select the age of the data they want to export (i.e. export data from a start date a to an end date b). Moreover, we can easily set these modules to call the exposed web services rather than communicating directly with the database layer.

As should be clear from the previous paragraphs, this cannot be accomplished within a standard web site based architecture, so we need a daemon (or service) which runs continuously in the background, basically doing two tasks:

  • 1) It must find (in a processing loop) other Sahana servers in the network which have some data
  • 2) It must expose a service to the network, telling servers as they enter the network that it has some new data

This process needs to be autonomous, and servers must be able to find each other without specifying an IP. This can be accomplished by using ZeroConf. So we need to step outside the domain of web2py for this task. We can definitely hook our software into the web2py execution sequence so that this service starts automatically as the server goes online.
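
The first of the two daemon tasks (looping over discovered servers) might be structured roughly as follows. The sketch is independent of any ZeroConf library: `discover` and `sync` are stand-in callables supplied by the caller, and `run_daemon`, `interval` and `max_rounds` are hypothetical names. Advertising the local service on the network (the second task) is not covered.

```python
import time


def run_daemon(discover, sync, interval=60.0, max_rounds=None, sleep=time.sleep):
    """Repeatedly discover peers and sync with each one.

    `discover` returns an iterable of peer addresses; `sync` is called per peer.
    `max_rounds` exists only so the loop can be stopped in examples and tests.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for peer in discover():
            try:
                sync(peer)
            except Exception as error:  # one failing peer must not stop the daemon
                print("sync with %s failed: %s" % (peer, error))
        rounds += 1
        sleep(interval)


# Stand-ins for ZeroConf discovery and the web service calls.
synced = []
run_daemon(lambda: ["192.168.1.10:8000"], synced.append, interval=0, max_rounds=2)
print(synced)
```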

For this to work with the PHP version, we MUST port this software to the PHP version and we MUST expose web services in the PHP version for doing sync. We must find someone among the PHP developers who can do it.

We can always ship this with PortablePython, eliminating the need to install Python on end machines (like what XAMPP does for PHP and MySQL).

Reference:

Old Blueprint for Synchronization

The module as present now:

All tables have UUID fields: DeveloperGuidelinesDatabaseSynchronization

We can Export the tables - CSV is best-supported within Web2Py currently
"complete database backup/restore with db.export_to_csv_file(..),db.import_from_csv_file(...),
reimporting optionally fixes references without need for uuid"

This can be done using appadmin, but we have started work on a user-friendly way of dumping all relevant tables:

  • controllers/default.py
  • views/default/export_data.html
  • views/default/import_data.html

This works well, but needs some enhancements:

  • Define how to deal with duplicates (currently, if a UUID is duplicated then the CSV file Updates the record, if a UUID isn't present then it is Created)
  • Download all
  • Download all for a Module
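
The current duplicate handling mentioned in the list above (update when the UUID already exists, create otherwise) can be sketched with the standard csv module. The in-memory `table` dict, the `name` column and the function `import_csv` are hypothetical and stand in for the real Web2Py import.

```python
import csv
import io


def import_csv(table, csv_text):
    """Merge CSV rows into `table` (a dict keyed by uuid); return (created, updated)."""
    created = updated = 0
    for record in csv.DictReader(io.StringIO(csv_text)):
        if record["uuid"] in table:
            table[record["uuid"]].update(record)  # duplicate UUID: update the record
            updated += 1
        else:
            table[record["uuid"]] = dict(record)  # unknown UUID: create the record
            created += 1
    return created, updated


table = {"row-uuid-0001": {"uuid": "row-uuid-0001", "name": "Alice"}}
data = "uuid,name\nrow-uuid-0001,Alicia\nrow-uuid-0002,Bob\n"
print(import_csv(table, data))
```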

Need clear list of which tables to include:

  • not lookup lists which are the same across sites
    • e.g. OpenStreetMap/Google Layers (but WMS/SOS Layers Yes. Shapefiles Layers Yes if uploads copied across as well)
  • not site-specific stuff such as system_config, gis_keys, etc

Create an index manually to make the search by uuid faster.
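
As a small illustration, such an index could be created like this with the standard sqlite3 module. The `person` table and the index name `person_uuid_idx` are hypothetical, and the exact statement may differ on the database actually used by Sahana.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical syncable table; only the uuid column matters here.
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, uuid TEXT, name TEXT)")
# The manual index: lookups by uuid no longer need to scan the whole table.
conn.execute("CREATE INDEX person_uuid_idx ON person (uuid)")

indexes = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'person'"
    )
]
print(indexes)
```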

other related threads:

There is a simple 1-table example appliance which has the ability to do syncs via XML-RPC:

In Sahana2 the record ids are UUIDs built from each instance's 'base_uuid'

There is a sync_instance table:

CREATE TABLE sync_instance (
    base_uuid VARCHAR(4) NOT NULL, -- Instance id
    owner VARCHAR(100), -- Instance owner's name
    contact TEXT, -- Contact details of the instance owner
    url VARCHAR(100) DEFAULT NULL, -- Server url if exists
    last_update TIMESTAMP NOT NULL, -- Last Time sync with the instance
    sync_count INT DEFAULT 0, -- Number of times synchronized
    PRIMARY KEY(base_uuid)
);
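
How such a table might be maintained after each sync is sketched below with the standard sqlite3 module. The column definitions are copied from the table above; the function `record_sync` and the update logic are assumptions, since this page does not describe how Sahana2 actually updates the table.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
# The table definition from above, unchanged apart from dropping the comments.
conn.execute(
    "CREATE TABLE sync_instance ("
    " base_uuid VARCHAR(4) NOT NULL, owner VARCHAR(100), contact TEXT,"
    " url VARCHAR(100) DEFAULT NULL, last_update TIMESTAMP NOT NULL,"
    " sync_count INT DEFAULT 0, PRIMARY KEY(base_uuid))"
)


def record_sync(conn, base_uuid, when=None):
    """Note a completed sync with instance `base_uuid`, creating its row if needed."""
    stamp = (when or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    cursor = conn.execute(
        "UPDATE sync_instance SET last_update = ?, sync_count = sync_count + 1"
        " WHERE base_uuid = ?",
        (stamp, base_uuid),
    )
    if cursor.rowcount == 0:
        # First sync with this instance: insert its row with a count of 1.
        conn.execute(
            "INSERT INTO sync_instance (base_uuid, last_update, sync_count)"
            " VALUES (?, ?, 1)",
            (base_uuid, stamp),
        )


record_sync(conn, "ab12")
record_sync(conn, "ab12")
print(conn.execute("SELECT base_uuid, sync_count FROM sync_instance").fetchall())
```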