Changes between Version 17 and Version 18 of BluePrint/Synchronisation


Timestamp:
05/08/09 08:02:24 (16 years ago)
Author:
Hasanat Kazmi
Comment:

GSoC proposal ideas

  • BluePrint/Synchronisation

    v17 v18  
    1 = Blueprint for Synchronisation =
     1= Blueprint for Synchronization =
    22
    3 We need to be able to support synchronising data between instances:
     3We need to implement a system that performs automatic Sahana synchronization. This synchronization will run between any Sahana servers (PHP and Py). Our focus should be on SahanaPy, but it must remain compatible with SahanaPHP.
     4Currently the SahanaPy data export module exports data as CSV only (it is web based and not suitable for an autonomous process). We can add support for XML and JSON.
     5XML export will ensure compatibility with the PHP version of Sahana. JSON is lightweight over HTTP and widely supported, so using JSON for data synchronization looks promising. The XML export should strictly adhere to an XSD schema, following the XML Schema standard approved by the W3C.
     6
     7Another important point is to use UUID information in every sync activity. We must include the UUID of each record when exporting data. The UUID currently produced by the export module of the PHP version includes an 'Instance ID' to make clear which installed instance a record belongs to. A similar approach should be adopted in SahanaPy.
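
A minimal sketch of the UUID idea above, assuming a record identifier built as "<instance id>/<random UUID>". The exact format used by the PHP export module is not given here, so the separator, the `INSTANCE_ID` value and the helper names are all hypothetical.

```python
import json
import uuid

# Hypothetical identifier for this installation (assumption): a real
# deployment would generate it once at install time and store it.
INSTANCE_ID = "sahanapy-demo-01"


def new_record_uuid(instance_id=INSTANCE_ID):
    """Return a record identifier of the form '<instance id>/<random UUID>'."""
    return "%s/%s" % (instance_id, uuid.uuid4())


def export_records(records):
    """Serialise records to JSON, refusing any record that lacks a UUID."""
    for record in records:
        if not record.get("uuid"):
            raise ValueError("record without UUID cannot be exported")
    return json.dumps({"instance": INSTANCE_ID, "records": records},
                      sort_keys=True)


person = {"uuid": new_record_uuid(), "name": "Example Person"}
payload = export_records([person])
```

Prefixing the instance ID lets a receiving server tell which installation created a record without a lookup, which is the property the PHP export is described as having.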
     8
     9Automatic synchronization is different from the manual data export / import module present in Sahana. The automatic process should run continuously as a daemon.
     10
     11Currently we use a database dump for exporting, which is definitely not an optimal way to synchronize databases. A paper written by Leslie Klieb ( http://hasanatkazmi.googlepages.com/DistributedDisconnectedDatabases.pdf ) discusses various approaches. In the light of this research, we can implement synchronization as follows:
     12 * We need to add a time stamp as an additional attribute to each syncable table of the database (tables which hold data such as names of missing people; we do not need to sync the tables that a Sahana installation uses for saving its internal information). This time stamp and the UUID of the Sahana instance together can serve as a unique attribute. This time stamp attribute MUST also be added to SahanaPHP to make intelligent database synchronization possible.
     13
     14There is already a desire for data deleted from Sahana to stay available but marked with a deleted flag. Such data would not be visible during normal DB queries, but would remain accessible for audit purposes if required. We can make this a reusable field in models/_ _db.py and then add it to each table definition (that is, all real, syncable data; there is no need for it on internal settings). For this to be accomplished in SahanaPHP, we MUST add another attribute: a delete flag (alongside the time stamp). The delete flag will be a Boolean representing whether the tuple has been deleted or not.
     15
     16When a new tuple is added, the current date is entered as its time stamp; when a tuple is updated, the time stamp is changed to the present date. If a tuple is deleted, we set its delete flag to true (and do not actually delete it).
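
The time stamp and delete flag rules can be sketched with the standard library `sqlite3` module rather than the web2py DAL, to keep the example self-contained. The table name `person` and the column names `modified_on` and `deleted` are assumptions, and so is the choice to refresh the time stamp when a tuple is flagged as deleted (so that a peer can later ask for deletions after a given date).

```python
import sqlite3
from datetime import datetime, timezone


def now():
    """Current UTC time as an ISO 8601 string (sorts chronologically)."""
    return datetime.now(timezone.utc).isoformat()


def create_table(db):
    # Every syncable table carries a time stamp and a delete flag.
    db.execute("""CREATE TABLE person (
        uuid TEXT PRIMARY KEY,
        name TEXT,
        modified_on TEXT NOT NULL,
        deleted INTEGER NOT NULL DEFAULT 0)""")


def insert(db, uuid, name):
    db.execute("INSERT INTO person (uuid, name, modified_on) VALUES (?, ?, ?)",
               (uuid, name, now()))


def update(db, uuid, name):
    db.execute("UPDATE person SET name = ?, modified_on = ? WHERE uuid = ?",
               (name, now(), uuid))


def soft_delete(db, uuid):
    # Assumption: flagging a tuple as deleted also refreshes its time stamp.
    db.execute("UPDATE person SET deleted = 1, modified_on = ? WHERE uuid = ?",
               (now(), uuid))


def visible(db):
    """Rows seen by normal queries: flagged tuples are filtered out."""
    return db.execute(
        "SELECT uuid, name FROM person WHERE deleted = 0 ORDER BY uuid"
    ).fetchall()


db = sqlite3.connect(":memory:")
create_table(db)
insert(db, "a/1", "Alice")
insert(db, "a/2", "Bob")
soft_delete(db, "a/2")
```

After the soft delete, `visible(db)` no longer returns the flagged tuple, but the row is still in the table for audit queries and for export to other instances.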
     17 
     18Now take two instances of Sahana, A and B. A calls B over JSON-RPC (or XML-RPC), passing its own (A's) UUID. B looks in its synchronization table (in B's database) for the last time data was sent from B to A, then B creates JSON/XML containing only those entries/tuples which changed after that date and returns them to A. It also sends the tuples deleted after that date.
     19B then immediately asks A, and the same process is repeated in the other direction.
     20Each machine then either updates existing tuples or inserts new tuples in the relevant tables. It also deletes every tuple which the other machine has deleted if and only if it has not updated that tuple in its own database after the deletion on the other machine.
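
The exchange between A and B might look roughly like the in-memory sketch below. It replaces the RPC call with a direct method call and uses integers as time stamps. The class and method names are hypothetical. Resolving conflicting updates by "newest time stamp wins" is an assumption: the text above only specifies that rule for deletions.

```python
class Instance:
    """One Sahana installation, reduced to what the sync exchange needs."""

    def __init__(self, instance_id):
        self.instance_id = instance_id
        self.rows = {}       # uuid -> {"value", "modified_on", "deleted"}
        self.last_sent = {}  # peer id -> time of last send (the sync table)

    def put(self, uuid, value, ts):
        self.rows[uuid] = {"value": value, "modified_on": ts, "deleted": False}

    def delete(self, uuid, ts):
        # Soft delete: keep the tuple, set the flag, record when it happened.
        self.rows[uuid]["deleted"] = True
        self.rows[uuid]["modified_on"] = ts

    def changes_for(self, peer_id, now):
        """Everything changed or deleted since we last sent data to peer_id."""
        since = self.last_sent.get(peer_id, 0)
        changes = {u: dict(r) for u, r in self.rows.items()
                   if r["modified_on"] > since}
        self.last_sent[peer_id] = now
        return changes

    def apply(self, changes):
        for uuid, incoming in changes.items():
            local = self.rows.get(uuid)
            # A remote change (including a deletion) is applied only if the
            # local tuple was not modified after it.
            if local is None or incoming["modified_on"] > local["modified_on"]:
                self.rows[uuid] = dict(incoming)


def synchronise(a, b, now):
    a.apply(b.changes_for(a.instance_id, now))  # A asks B
    b.apply(a.changes_for(b.instance_id, now))  # B immediately asks A


a, b = Instance("A"), Instance("B")
a.put("a/1", "Alice", ts=1)
b.put("b/1", "Bob", ts=2)
synchronise(a, b, now=3)

a.delete("b/1", ts=4)             # A deletes Bob ...
b.put("b/1", "Bob updated", ts=5) # ... but B updates Bob afterwards
b.delete("a/1", ts=6)             # B deletes Alice, A never touches her again
synchronise(a, b, now=7)
```

In this run the deletion of `a/1` propagates to A, while the deletion of `b/1` is overridden on both sides because B updated that tuple after A deleted it.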
     21
     22An important outcome is that this implementation can also be used in the manual data export modules of Sahana (both versions). We can let users select the age of the data they want to export (i.e. export data from a start date to an end date). Moreover, we can easily make these modules call the instance's own exposed web services rather than communicating directly with the database layer.
     23
     24As the previous paragraphs make clear, this cannot be accomplished with a standard web site based architecture, so we need to build a daemon (or service) which will run continuously in the background, performing two basic tasks:
     25 * 1) It must find (by polling in a loop) other Sahana servers in the network that have some data
     26 * 2) It must expose a service to the network, telling servers as they enter the network that it has some new data
     27
     28This process needs to be autonomous, and servers must be able to find each other without anyone specifying an IP address. This can be accomplished by using Zeroconf (zero-configuration networking).
     29So we need to step outside the domain of web2py for this task. We can certainly hook our software into the web2py execution sequence so that this service starts automatically as the server goes online.
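
A rough skeleton of such a background service, using only the standard library. Peer discovery is passed in as a callable, so the sketch does not depend on any particular Zeroconf library. `SyncDaemon`, `discover` and `sync_with` are hypothetical names, and the peer addresses in the demo are placeholders.

```python
import threading


class SyncDaemon(threading.Thread):
    """Background loop: discover peers, then synchronise with each of them."""

    def __init__(self, discover, sync_with, interval=30.0):
        super().__init__(daemon=True)
        self.discover = discover    # callable returning an iterable of peers
        self.sync_with = sync_with  # callable performing one sync with a peer
        self.interval = interval    # seconds to wait between rounds
        self._stop_event = threading.Event()

    def run_once(self):
        """One discovery-and-sync round; returns the peers that were found."""
        peers = list(self.discover())
        for peer in peers:
            self.sync_with(peer)
        return peers

    def run(self):
        while not self._stop_event.is_set():
            self.run_once()
            # wait() returns early if stop() is called during the pause.
            self._stop_event.wait(self.interval)

    def stop(self):
        self._stop_event.set()


synced = []
daemon = SyncDaemon(
    discover=lambda: ["192.0.2.10:8000", "192.0.2.11:8000"],  # placeholders
    sync_with=synced.append,
)
found = daemon.run_once()
```

Calling `daemon.start()` runs the same round repeatedly in a background thread until `daemon.stop()` is called. A hook in the web2py start-up sequence could be the place to make that call.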
     30
     31For this to work with the PHP version, we MUST port this software to the PHP version and we MUST expose web services in the PHP version for doing sync. We must find someone among the PHP developers who can do it.
     32
     33We can always ship this with PortablePython, eliminating the need to install Python on end machines (as XAMPP does for PHP and MySQL).
     34 
     35Reference:
     36 * Diagram of service as exposed to the network: http://hasanatkazmi.googlepages.com/sahana-rough.jpg
     37 * Initial proposal: http://hasanatkazmi.blogspot.com/2009/04/sahana-proposal.html
     38
     39= Old Blueprint for Synchronization =
     40
     41The module as it exists now:
    442 * http://wiki.sahana.lk/doku.php?id=doc:sync:english
    543