Version 76 (modified by Dominic König, 12 years ago)

Synchronization

Purpose

The Synchronization module allows the synchronization of data resources between Sahana Eden instances. Synchronization jobs can be configured to be run automatically in the background and at regular intervals, without disrupting the current operation of the sites.

This module is part of the site administration module, and requires administrator privileges to view or modify its configuration.

The synchronization module requires web2py revision 3927 (1.99.2) or newer.

Method

Overview

The synchronization process is controlled entirely by the local Sahana Eden instance.

The local Eden instance runs the scheduler process, and initiates the update requests when due, while the remote repository merely responds to these requests.

Synchronization Overview

The local Eden instance first downloads the available updates from the remote repository (pull) and imports them into the local database, and then uploads all available updates from the local database to the remote repository (push).

Pull and push are each a RESTful HTTP request, using S3XML as the data format.
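The pull-then-push order described above can be illustrated with a minimal in-memory sketch. It is not Eden's implementation: the dictionaries, the function names and the integer timestamps are hypothetical stand-ins for the databases and the HTTP requests, and no update or conflict policy is applied.

```python
# Hypothetical in-memory model; real synchronization exchanges S3XML over HTTP.
# Each repository maps a record UUID to a (modified_on, data) tuple.

def changed_since(repo, since):
    """Return the records modified after the given time."""
    return {uid: rec for uid, rec in repo.items() if rec[0] > since}

def sync_cycle(local, remote, last_sync):
    """Run one cycle, driven by the local site: pull first, then push."""
    # Pull: download the remote updates and import them locally
    pulled = changed_since(remote, last_sync)
    local.update(pulled)
    # Push: upload the local updates to the remote repository.
    # Records that were just pulled are skipped (a simplification of this sketch).
    pushed = {uid: rec for uid, rec in changed_since(local, last_sync).items()
              if uid not in pulled}
    remote.update(pushed)
    return sorted(pulled), sorted(pushed)

local = {"a": (5, "local edit")}
remote = {"b": (7, "remote edit")}
pulled, pushed = sync_cycle(local, remote, last_sync=3)
print(pulled, pushed)
```

After the cycle, both dictionaries hold the same two records, which is the intended end state of a pull followed by a push.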

Configuration

Checklist

Follow this checklist to configure synchronization:

  1. Check the Prerequisites
  2. Make sure the remote site is up and running, and reachable over the network
  3. Log in as administrator at the local site and
    1. Configure the default proxy server if needed
    2. Register the remote site
    3. Configure the resources to synchronize
    4. Set up the Synchronization Schedule
  4. Start the worker process at the local site

Prerequisites

Both sites must have Sahana Eden installed and running. To avoid problems with different database structures, both Sahana Eden instances should always use the same version of the software.

Decide which site is the local one and which is the remote one. The remote site is typically a permanently and publicly accessible Sahana Eden instance, while the local site could be a protected site (e.g. behind a firewall) or one with only temporary network access (e.g. a notebook). See Synchronization Overview to understand the roles, and consider noting down for yourself and your co-workers which instance is which.

While performing synchronization jobs, the local site must be able to establish a connection to the remote site over the network using HTTP.

If a proxy server is to be used for the HTTP connection, this can be configured in the Synchronization Settings (proxy authentication is currently not supported).

Check that both instances have the synchronization module enabled: for each site, open the URL http://yoursite/eden/sync in a browser, where yoursite is the host name of the site's Eden server. If the page shows a description of the sync module, then the module is enabled. If the request returns a 404 error, then it is not enabled.
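The same check can be scripted. The following is a minimal sketch using only the Python standard library; the function name and the `fetch` parameter are hypothetical (the latter exists only so that the function can be tried without network access), and the URL pattern is the one given above.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

def sync_module_enabled(site, app="eden", fetch=urlopen):
    """Return True if http://<site>/<app>/sync responds, False on HTTP 404."""
    url = "http://%s/%s/sync" % (site, app)
    try:
        fetch(url).close()
    except HTTPError as error:
        if error.code == 404:
            return False  # the module is not enabled
        raise  # any other HTTP error needs a closer look
    return True
```

Calling `sync_module_enabled("yoursite")` once for each site covers both instances; replace "eden" with the name of your application if necessary.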

If sync is not enabled, then add it in models/000_config.py after the comment that says "# Enable Additional Module(s)":

settings.modules["sync"] = Storage(
    name_nice = T("Synchronization"),
    #description = "Synchronization",
    restricted = True,
    access = "|1|",     # Only Administrators can see this module in the default menu & access the controller
    module_type = None  # This item is handled separately for the menu
)

It is important that the system clocks in both sites are synchronized with each other, which can best be achieved by synchronizing both sites with the same NTP service:

apt-get install -y ntpdate
ntpdate 0.us.pool.ntp.org

Synchronization Homepage

Log in as administrator and open the Administration menu. In the left menu, you will find the following entries:

Synchronization Menu

Click on Synchronization here to open the homepage of the Synchronization Module:

Synchronization Homepage

Synchronization Settings

Go to the Synchronization Homepage and click Settings to open this page:

Synchronization Settings

This page shows you the UUID (universally unique identifier) of this repository. You will need this identifier to register the repository at the peer site (the local UUID to register at the remote site, and the remote UUID to register at the local site). The UUID is created during the first run of the Sahana Eden instance, and cannot be changed.
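Since the UUID has to be copied between the sites by hand, it can be worth checking that a pasted value is well-formed before registering it. The helper below is a hypothetical sketch based on Python's standard `uuid` module, which accepts identifiers with or without a `urn:uuid:` prefix. The canonical form it returns is only meant for comparing two values; in the registration form, enter the identifier exactly as it is shown on the Settings page.

```python
import uuid

def normalize_uuid(text):
    """Return the canonical form of a pasted UUID, or None if it is malformed."""
    try:
        return str(uuid.UUID(text.strip()))
    except ValueError:
        return None

# Two spellings of the same identifier compare equal after normalization
a = normalize_uuid("urn:uuid:12345678-1234-5678-1234-567812345678")
b = normalize_uuid(" 12345678-1234-5678-1234-567812345678 ")
print(a == b)
```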

If needed, enter the complete URL of the proxy server (including port number if not 80) that is to be used when connecting to the remote site (this is only necessary at the local site). Click Save to update the configuration.
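To illustrate what a complete proxy URL looks like and how an HTTP client makes use of it, here is a minimal sketch based on Python's standard `urllib`. It does not show Eden's internal code; the function name and the proxy address are hypothetical.

```python
import urllib.request

def opener_with_proxy(proxy_url=None):
    """Build an opener that routes HTTP requests through the proxy, if given."""
    handlers = []
    if proxy_url:
        # Complete URL including the port number, e.g. http://host:3128
        handlers.append(urllib.request.ProxyHandler({"http": proxy_url}))
    return urllib.request.build_opener(*handlers)

# Hypothetical proxy address, for illustration only
opener = opener_with_proxy("http://proxy.example.org:3128")
```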

Repository Configuration

Go to the Synchronization Homepage and click Repositories. This will show you a list of all configured repositories:

Repository Registry

To view and/or modify the configuration for a repository, click the Open button in the respective row in the list.

By clicking Add Repository, you can register a new repository:

Repository Registration

Fill in the fields as follows:

| Field | Instructions | registering the remote repository at the local site | registering the local repository at the remote site |
|---|---|---|---|
| Name | Enter a name for the repository (for your own reference) | required | required |
| URL | Enter the URL of the repository (base URL of the Sahana Eden instance) | required | - |
| Username | Enter the username to authenticate at the repository | required | - |
| Password | Enter the password to authenticate at the repository | required | - |
| Proxy Server | Enter the URL of a proxy server to connect to the repository, if different from the Synchronization Settings | fill in as needed | - |
| Accept Pushes | Check this if the repository is allowed to push updates | - | set as needed |
| UUID | Enter the UUID from the Synchronization Settings of the repository | required | required |

Normally, you only have to register the remote repository at the local site. This will automatically send a request to the remote site to register the local repository. Please make sure that the remote repository is up and running and reachable over the network. If this registration request fails, you will see a warning message requesting you to manually register the local repository at the remote site. Otherwise, you will find an entry in the synchronization log confirming that the registration was successful.

Resource Configuration

Go to the Synchronization Homepage, click Repositories, then Open the repository you want to configure a resource for, and change to the Resources tab:

Resource Configuration

Fill in the fields as follows:

| Field | Instructions | Example |
|---|---|---|
| Resource Name | Fill in the name of the master table of the resource. Details can be found in the documentation for the data model of your Sahana Eden application | req_req |
| Mode | Select the synchronization mode you wish to activate - pull, push or both. See Synchronization Overview to understand the mode | pull and push |
| Strategy | Choose the import methods you wish to allow for the synchronization of this resource | create, update, delete |
| Update Policy | Choose in which situation records shall be updated, see explanations below | NEWER |
| Conflict Policy | Choose in which situation records shall be updated in case of conflicts, see explanations below | NEWER |
| Filters | See section Filters below | |

Update Policy

If a record has been modified in one of the repositories, then the synchronization process has to decide whether to update the other repository with the new data or not. For this decision you can define a policy:

| Policy | Meaning |
|---|---|
| THIS | Always update the remote repository with the local version of the record (overwrite remote updates) |
| NEWER | Update both repositories to the newest version of the record (keep the newer data) |
| MASTER | Update the record on either side only if the other side has originated the record (keep the master data) |
| OTHER | Always update the local repository with the remote version of the record (overwrite local updates) |

Usually, you would choose "NEWER" here unless you have a good reason to do otherwise.
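The four policies can be read as a decision about whether the local copy of a record gets overwritten by the remote version. The following sketch expresses that reading in Python; it is an interpretation of the policy descriptions above, not Eden's code, and the function and parameter names are hypothetical.

```python
def update_local(policy, local_mtime, remote_mtime, remote_is_origin):
    """Decide whether the remote version overwrites the local record."""
    if policy == "THIS":
        return False  # the local version always wins
    if policy == "OTHER":
        return True  # the remote version always wins
    if policy == "NEWER":
        return remote_mtime > local_mtime  # keep the newer data
    if policy == "MASTER":
        return remote_is_origin  # only the originating side may update
    raise ValueError("unknown policy: %r" % policy)

# A remote edit at time 7 against a local edit at time 5
print(update_local("NEWER", 5, 7, remote_is_origin=False))
```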

Conflict Policy

If a record has been modified both in the local repository and the remote repository since the last synchronization time, then this is called a conflict situation, in which two concurrent record updates are available at the same time. You can define a policy for which of the updates to apply, similar to the Update Policy.

If you don't know what to select here, it is reasonable to choose the same option as for the Update Policy.
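The definition of a conflict situation can be written as a simple condition. This is a sketch under the assumption that each record carries a modification time on both sides; the function name and the example times are hypothetical.

```python
from datetime import datetime

def is_conflict(local_mtime, remote_mtime, last_sync):
    """True if the record changed on both sides since the last synchronization."""
    return local_mtime > last_sync and remote_mtime > last_sync

last_sync = datetime(2011, 9, 21, 8, 30)
local_edit = datetime(2011, 9, 21, 8, 32)
remote_edit = datetime(2011, 9, 21, 8, 33)
print(is_conflict(local_edit, remote_edit, last_sync))
```

If only one side changed after the last synchronization, the condition is false and the Update Policy alone decides.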

Policy Transfer

In most situations, you would want both repositories to apply the same policies. This is the default behavior - the policies from the local site are reported to the remote site during the synchronization, and are applied there as well (THIS and OTHER are replaced by the respective opposite at the remote site, of course).
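The replacement of THIS and OTHER during policy transfer amounts to a small mapping, sketched below. The names are hypothetical; the sketch only illustrates the rule stated above.

```python
# THIS and OTHER swap when seen from the remote site; other policies are unchanged
OPPOSITE = {"THIS": "OTHER", "OTHER": "THIS"}

def policy_at_remote(local_policy):
    """Return the policy the remote site applies for a given local policy."""
    return OPPOSITE.get(local_policy, local_policy)

for policy in ("THIS", "NEWER", "MASTER", "OTHER"):
    print(policy, "->", policy_at_remote(policy))
```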

If for some reason you need to define different policies at the remote site, then you have to configure the same resource at the remote site as well, and choose the policies explicitly.

Filters

Sometimes not all records in a table shall be synchronized - use the "Filters" subform to define any number of filters to determine which records shall be synchronized.

Each filter is a URL query string, and is applied to the specified table. Usually, the table would be the same as the master table of the resource, but you can also specify filters which only apply to a specific component or referenced table (at any reference level).

You can use the tilde ~ as shortcut for the master table, both in the "Table" field and in the "Filter" string.

Example: export only those project_project records which are linked to the DRR Sector:

  • Resource: project_project
  • Tablename: ~
  • Filter: sector.name=DRR

Remember that URL filter strings must always be prefixed with the component alias (or with ~ for the master table).

Note that filters are not global: they apply only for this particular synchronization task and for this particular peer repository.
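The example above can be checked with a short sketch that expands the ~ shortcut in the "Table" field and parses the filter as a URL query string. The helper names are hypothetical, and the sketch does not evaluate the filter against any data.

```python
from urllib.parse import parse_qsl

def resolve_table(resource, tablename):
    """Expand the ~ shortcut to the master table of the resource."""
    return resource if tablename == "~" else tablename

def parse_filter(filter_string):
    """Split a URL query string into (selector, value) pairs."""
    # strict_parsing raises ValueError for malformed query strings
    return parse_qsl(filter_string, strict_parsing=True)

table = resolve_table("project_project", "~")
pairs = parse_filter("sector.name=DRR")
print(table, pairs)
```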

Synchronization Schedule

Go to the Synchronization Homepage, click Repositories, then Open the repository configuration you want to schedule a synchronization job for and change to the Schedule tab. If there are already jobs configured for this repository, you will see a list of those jobs. Otherwise (or by clicking Add Job), you get to this form:

Synchronization Schedule

With every Job, all resources configured for this repository will be synchronized.

Fill in the fields as follows:

| Field | Instructions | Example |
|---|---|---|
| Enabled | Set to True if the job shall actually be run, or set False to disable the job | True |
| Start Time | Select date and time for the first run of this job (UTC) | 2011-09-21 08:30 |
| End Time | Select date and time after which the job shall not be run anymore (UTC) | 2012-09-21 08:30 |
| Repeat n times | Select how often the job shall be run, set to 0 to set no limit | 0 |
| Run every | Select the time interval after which to repeat the job | 5 minutes |
| Timeout | Set a maximum time after which to abort the action | 600 seconds |

If you need to switch between jobs (e.g. for maintenance periods, low-traffic periods), you can set up multiple schedules, and disable/enable them as needed.
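To get a feeling for how the schedule fields interact, the following sketch lists the first planned run times of a job. It is a simplified model, not the scheduler's actual logic: the function name is hypothetical, and it assumes that "Repeat n times" counts the total number of runs, with 0 meaning no limit.

```python
from datetime import datetime, timedelta

def run_times(start, every, repeats=0, end=None, limit=5):
    """List the first planned run times of a job (at most `limit` entries)."""
    times = []
    current = start
    while len(times) < limit:
        if end is not None and current > end:
            break  # no runs after the end time
        if repeats and len(times) >= repeats:
            break  # repeat limit reached (0 means no limit)
        times.append(current)
        current += every
    return times

# Values from the example column: start 2011-09-21 08:30 (UTC), every 5 minutes
start = datetime(2011, 9, 21, 8, 30)
for planned in run_times(start, timedelta(minutes=5), limit=3):
    print(planned)
```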

To consider:

You should choose meaningful time interval and timeout settings: the more resources are to be synchronized, the longer it will take (in this regard, also note that THIS- and OTHER-policies will always exchange all records in a resource, thus taking significantly longer).

How many records have to be exchanged per run depends on the average update frequency and the time interval between synchronizations: e.g. if there are on average 100 record updates per minute, and you set a 2-minute interval, then there would be 200 records on average to be transmitted every run. The import rate on a small server has been tested at 18 records/second on average, which means that the synchronization process would take around 11 seconds in this case. To be on the safe side, choose a timeout value at least 10 times as high as that - e.g. 120 seconds.
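The estimate above can be reproduced for your own numbers with a few lines of Python. The function name and its parameters are hypothetical; the default import rate of 18 records/second is the figure quoted above for a small server and may differ considerably on other hardware.

```python
def estimate(updates_per_minute, interval_minutes, import_rate=18.0, safety=10):
    """Return (records per run, import time in seconds, minimum timeout)."""
    records = updates_per_minute * interval_minutes
    seconds = records / import_rate
    return records, seconds, seconds * safety

records, seconds, timeout = estimate(100, 2)
print(records, round(seconds, 1), round(timeout))
# prints: 200 11.1 111
```

Rounding the minimum of about 111 seconds up to a convenient value gives the 120 seconds suggested in the text.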

Note that the network traffic arising from synchronization depends mainly on the record update rate at the sites, not on the frequency of synchronization. Shorter synchronization intervals would increase the traffic only slightly, but reduce the rate of conflicts and the risk of network-related problems. However, intervals that are too short (shorter than the typical time between record updates at the site) may cause unnecessary network traffic with just empty transmissions.

Worker

The scheduled synchronization jobs are performed by a separate asynchronous web2py worker process at the local site.

To start the worker process, open a shell on the local server, change into the web2py home directory and run:

python web2py.py -K eden -Q

(replace "eden" with the name of your Sahana Eden application if necessary)

In more advanced configurations you may want to run this command as a daemon process, e.g. under Linux by:

nohup python web2py.py -K eden -Q >/dev/null 2>&1 &

Synchronization Log

Go to the Synchronization Homepage and click Log. This shows you a list of all prior log entries for all repositories.

If you instead want to see the log entries only for a particular repository, go to the Synchronization Homepage, click Repositories, then Open the respective repository configuration and go to the Log tab:

Repository Log

Note: the newest entries are shown on top of the list.

Click on Details for a log entry to see the complete entry:

Repository Log Entry

Read the entries as follows:

| Item | Explanation |
|---|---|
| Date/Time | Date and time of the transaction |
| Repository | Name of the repository synchronized with |
| Resource Name | Name of the resource synchronized |
| Mode | Transaction mode (pull or push) and direction of transmission (incoming or outgoing) |
| Action | Action performed to resolve problems (if any) |
| Result | Result of the transaction |
| Remote Error | True if there was an error at the remote site |
| Message | The log message |
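The per-repository view of the log, with the newest entries on top, corresponds to a filter followed by a descending sort on the date. The sketch below models this with plain Python tuples; the `LogEntry` type, its field names and the sample entries are hypothetical and do not reflect the actual database table.

```python
from collections import namedtuple

# Hypothetical record type mirroring some of the items listed above
LogEntry = namedtuple("LogEntry", "time repository resource mode result message")

def repository_log(entries, repository):
    """Return the entries for one repository, newest first."""
    selected = [e for e in entries if e.repository == repository]
    return sorted(selected, key=lambda e: e.time, reverse=True)

entries = [
    LogEntry("2011-09-21 08:30", "Main Site", "req_req", "pull", "success", ""),
    LogEntry("2011-09-21 08:35", "Main Site", "req_req", "push", "success", ""),
    LogEntry("2011-09-21 08:32", "Other Site", "req_req", "pull", "success", ""),
]
for entry in repository_log(entries, "Main Site"):
    print(entry.time, entry.mode, entry.result)
```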

See Also


UserGuidelines/Admin
