We need to be able to support synchronising data between instances:
We need to implement a system performing automatic Sahana synchronization. This synchronization will be between any Sahana servers (PHP and Python). Our focus should be on SahanaPy, but it should stay compatible with SahanaPHP.
Currently the SahanaPy data export module exports data only as CSV (web based, not suited to an autonomous process). We can add support for XML and JSON.
XML export will ensure compatibility with the PHP version of Sahana. JSON is lightweight over HTTP and reliable, so using JSON for data synchronization looks promising. The XML output should strictly adhere to the W3C-approved XML Schema (XSD) standards.
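As a rough sketch of what a JSON export payload could look like (the field names `uuid` and `modified_on`, and the payload shape, are illustrative assumptions, not the real SahanaPy schema):

```python
import json

# Hypothetical sketch: serialising exported rows as JSON instead of CSV.
# Field names ("uuid", "modified_on", ...) are assumptions for illustration.
def rows_to_json(table_name, rows):
    """Wrap a list of row dicts in a JSON export payload."""
    return json.dumps({"table": table_name, "records": rows},
                      default=str, sort_keys=True)

payload = rows_to_json("person", [
    {"uuid": "inst-A/42", "first_name": "Ali",
     "modified_on": "2009-05-01T12:00:00"},
])
```

An XML serialiser could follow the same shape, validated against an agreed XSD.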

Another important point is to attach UUID information to each sync activity; we must include UUIDs when exporting data. The UUIDs produced by the current PHP export module include an 'Instance ID' that makes clear which installed instance a record belongs to. A similar approach should be adopted in SahanaPy.
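One way the instance-prefixed UUID scheme could look in SahanaPy (the `INSTANCE_ID` value and the `/` separator are assumptions, not the scheme SahanaPHP actually uses):

```python
import uuid

# Sketch of instance-prefixed record UUIDs, mirroring the PHP export module's
# reported approach. INSTANCE_ID and the "/" separator are assumptions.
INSTANCE_ID = "sahana-demo"

def make_record_uuid(instance_id=INSTANCE_ID):
    """Return a record UUID that also identifies the originating instance."""
    return "%s/%s" % (instance_id, uuid.uuid4())

rid = make_record_uuid()
```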

Automatic synchronization is different from the manual data export/import module already present in Sahana: the automatic process should run continuously as a daemon.

Currently we use a database dump for exporting, which is definitely not an optimal way to synchronize databases. A paper by Leslie Klieb ( http://hasanatkazmi.googlepages.com/DistributedDisconnectedDatabases.pdf ) discusses various approaches. In the light of this research, we can implement synchronization as follows:
* We need to add a timestamp as an additional attribute to each data table (tables that hold real data such as names of missing people; we do not need to sync the internal tables a Sahana installation uses to store its own bookkeeping). This timestamp together with the UUID of the Sahana instance can represent a unique attribute. The timestamp attribute MUST also be added to SahanaPHP to make intelligent database synchronization possible.
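The uniqueness claim above can be sketched as follows (column names are assumptions, not the real Sahana schema):

```python
import datetime

# Sketch: the (instance UUID, timestamp) pair as the unique sync key.
# Two instances can touch rows at the same instant yet still get
# distinct keys, because the instance UUID differs.
def sync_key(row):
    return (row["instance_uuid"], row["modified_on"])

same_time = datetime.datetime(2009, 5, 1, 12, 0)
r1 = {"instance_uuid": "inst-A", "modified_on": same_time}
r2 = {"instance_uuid": "inst-B", "modified_on": same_time}
```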

There is already a desire for data deleted from Sahana to stay available, but with a deleted flag. Such rows would then not be visible in normal DB queries, but would remain accessible for audit purposes if required. We can make this a reusable field in models/__db.py and then add it to each table definition (well, all real, syncable data; no need for internal settings). For this to be accomplished in SahanaPHP, we MUST add another attribute there as well: a delete flag (alongside the timestamp). The delete flag will be a Boolean representing whether the tuple has been deleted.

When a new tuple is added, the current date is entered; when a tuple is updated, the date is set to the present one; when a tuple is deleted, we set the delete flag to true for that tuple (and do not delete it for real).
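The tuple lifecycle above can be sketched minimally, using a plain dict in place of a real table (`modified_on` and `deleted` are the assumed extra columns):

```python
import datetime

# Minimal sketch of the lifecycle: insert stamps a date, update refreshes
# it, delete only flags the row. All structure here is illustrative.
def insert_tuple(table, key, data, now):
    table[key] = dict(data, modified_on=now, deleted=False)

def update_tuple(table, key, data, now):
    table[key].update(data, modified_on=now)

def delete_tuple(table, key, now):
    # Soft delete: keep the row for audit and sync purposes, just flag it.
    table[key].update(modified_on=now, deleted=True)

db = {}
insert_tuple(db, "inst-A/1", {"name": "Ali"}, datetime.datetime(2009, 5, 1))
delete_tuple(db, "inst-A/1", datetime.datetime(2009, 5, 2))
```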
| 17 | |
| 18 | Now take two instances of Sahana A & B. Now A calls JSON-RPC (or XML-RPC) passing his (A's) UUID, now B looks into synchronization table (in B's database) for the last time data was sent from B to A, then B create JSON/XML of only those entries/tuples which are after that date and return then to A. It also sends in deleted tuples after the asked date. |
| 19 | Now B immediately asks A and same process is repeated for A. |
| 20 | Now each machine either updates or puts new tuples in specific tables. It also deletes all tuples which the other machine has deleted IF and only if it hadn't updated that tuple in its own database after the deletion on other machine. |
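One direction of this exchange could be sketched as follows, under assumed structures (rows are dicts keyed by `uuid` with `modified_on` and `deleted`; `last_sent` maps a peer's instance UUID to the timestamp of the last send to it). Last-writer-wins on `modified_on` implements the "delete only if not updated since" rule:

```python
import datetime

# Sketch only: data structures and names here are invented for illustration.
def changes_since(rows, last_sent, peer_uuid):
    """Rows modified (including soft-deleted) after the last send to peer."""
    cutoff = last_sent.get(peer_uuid, datetime.datetime.min)
    return [r for r in rows.values() if r["modified_on"] > cutoff]

def apply_changes(local, incoming):
    """Merge incoming rows; a remote delete wins only if we have not
    updated the same tuple after the remote deletion."""
    for r in incoming:
        mine = local.get(r["uuid"])
        if mine is None or r["modified_on"] > mine["modified_on"]:
            local[r["uuid"]] = dict(r)

# B deleted u1 on May 3; A last saw B's data on May 2 and has an older copy.
b_rows = {"u1": {"uuid": "u1", "name": "Ali", "deleted": True,
                 "modified_on": datetime.datetime(2009, 5, 3)}}
a_rows = {"u1": {"uuid": "u1", "name": "Ali", "deleted": False,
                 "modified_on": datetime.datetime(2009, 5, 1)}}
last_sent_to_a = {"inst-A": datetime.datetime(2009, 5, 2)}
apply_changes(a_rows, changes_since(b_rows, last_sent_to_a, "inst-A"))
```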

An important outcome of this implementation can also be used in the manual data export modules of Sahana (both versions): we can let the user select the age of the data they want to export (i.e. export data from date a to date b). Moreover, we can easily have these export modules call the instance's own exposed web service rather than communicating with the database layer directly.
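A hypothetical helper for the manual-export idea, selecting only tuples whose last modification falls inside a user-chosen window:

```python
import datetime

# Sketch: filter rows by modification date for a user-selected export range.
# Row structure ("uuid", "modified_on") is assumed for illustration.
def export_range(rows, start, end):
    return [r for r in rows if start <= r["modified_on"] <= end]

rows = [
    {"uuid": "u1", "modified_on": datetime.datetime(2009, 4, 1)},
    {"uuid": "u2", "modified_on": datetime.datetime(2009, 5, 1)},
]
selected = export_range(rows,
                        datetime.datetime(2009, 4, 15),
                        datetime.datetime(2009, 5, 15))
```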

As the last paragraph makes clear, this cannot be accomplished within a standard web-site architecture, so we need to build a daemon (or service) which will continuously run in the background, doing two basic tasks:
* 1) It must find (processing in a loop) other Sahana servers on the network that have some data
* 2) It must expose a service to the network, telling servers as they enter the network that it has some new data

This process needs to be autonomous, and servers must be able to find each other without specifying IP addresses. This can be accomplished by using Zeroconf.
So we need to step outside the domain of web2py for this task. We can definitely hook our software into the web2py execution sequence so that this service starts automatically as the server goes online.
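The daemon's two tasks can be modelled as a toy sketch, with an in-memory registry standing in for the real Zeroconf announce/browse machinery; every name and structure below is invented for illustration:

```python
# Toy model only: Network stands in for the LAN-wide service registry that
# Zeroconf would provide; no real networking happens here.
class Network:
    def __init__(self):
        self.services = {}  # instance UUID -> address

class SyncDaemon:
    def __init__(self, uuid, address, network):
        self.uuid, self.address, self.network = uuid, address, network

    def announce(self):
        # Task 2: expose a service so peers joining the network can find us.
        self.network.services[self.uuid] = self.address

    def discover_peers(self):
        # Task 1: find the other Sahana servers currently on the network.
        return [(u, a) for u, a in self.network.services.items()
                if u != self.uuid]

net = Network()
a = SyncDaemon("inst-A", "10.0.0.1:8000", net)
b = SyncDaemon("inst-B", "10.0.0.2:8000", net)
a.announce()
b.announce()
```

In a real implementation, `announce` and `discover_peers` would register and browse a service type over multicast DNS instead of touching a shared dict.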

For this to work with the PHP version, we MUST port this software to the PHP version, and we MUST expose web services in the PHP version for doing sync. We must find someone among the PHP developers who can do this.

We can always ship this with PortablePython, eliminating the need to install Python on end machines (much as XAMPP does for PHP and MySQL).

References:
* Diagram of the service as exposed to the network: http://hasanatkazmi.googlepages.com/sahana-rough.jpg
* Initial proposal: http://hasanatkazmi.blogspot.com/2009/04/sahana-proposal.html

= Old Blueprint for Synchronization =

The module as present now: