Version 13 (modified by Fran Boon, 12 years ago)


GSoC 2012, Database Migration

Current Status

modules/ has functionality to:

  • Backup the running database to a local sqlite copy so that we can copy data from renamed fields across or even rollback a failed migration.
  • Remove the 'unique' flag on Postgres Databases
  • Remove the 'foreign key' constraints on MySQL Databases

It is already used to assist with migrations of live sites running PostgreSQL & MySQL. Soon it will be used to migrate SQLite offline clients in a failsafe manner.

It is envisaged to grow as/when specific migrations require additional functionality.

Work Flow

Primary: SysAdmin of Server

The primary focus should be on supporting the SysAdmin whose job it is to keep production servers running.

  • we can normally assume Linux servers, but the core should be as agnostic as possible (only wrapper scripts should make assumptions)

They have a live system with live data & new code whose impact on the live data they are unaware of (we assume that the actual code has been tested separately with new data).

They cannot trust that developers have done the right thing & given them all the correct hints (although would like to be able to make use of any which are provided).

Priority tasks for them:

  1. They want a report on the potential data migration issues they may face
  2. If it cannot be migrated automatically, then they would like assistance with building a migration script
    • The boilerplate should be in-place and the tables with issues should have sections with as much automated migration code done as possible.
  3. Once the script has been written, they'd like to run this on a copy of their live database to check that the migration goes smoothly & then they can run user-testing on the test system (with new code/migrated data) before upgrading the live production system.
    • If this can be run automatically then so much the better

One way to achieve this would be to:

  • run the script in the context of the current live app to pick up some configuration automatically, with the rest being passed in as either environment variables or args from the wrapper script:
    export option1
    python -S <liveapp> -M -R applications/liveapp/static/scripts/tool/ -A arg1 arg2
  • create a new web2py application folder (e.g. web2py/applications/test)
  • copy the old code/settings into this new folder
  • modify the settings in the new folder to change the DB name (e.g. '%supgradetest' % livedbname)
  • create the new DB in the OS
    • assume that localhost postgres sites can sudo postgres
    • assume that MySQL sites have a .my.cnf setup for OS-level root access
  • dump the live DB
  • import into the test DB
  • copy the .table files
    • rename them for MySQL/PostgreSQL
  • checkout the new code into the new application
  • calculate the ability of the system to do an automatic migration
  • if OK, then do the migration & let the users test
    • still provide a report on all schema changes for review, as this allows the testing to focus on these issues
  • if not OK, then build a migration script
  • if confident that the migration script is complete, then do the migration & let the users test
  • if not confident, then let the SysAdmin complete the migration script, which they then run manually
  • when the SysAdmin is confident that the migration has been tested sufficiently, they run the script with a different command line option to migrate the live system

Secondary: Offline Client

We have remote offline clients, primarily running on win32.

We'd like to be able to ship them an update .exe which includes the new code & the tested(!) script to migrate the database (this will be between known, not arbitrary, revisions).

The migration script will be built by the release manager, so it just needs to be run on the remote. However, there is no tech support available locally, so it needs to be failsafe: it must test the migration & enable easy rollback of both DB & code if it encounters any unexpected errors.

Secondary: Developer

A secondary focus would be on supporting developers to provide hints to the SysAdmin about how the migration should be done:

Developer who makes the changes in their branch

There would be a view that shows the Database Migration Screen (as explained below). A model in models/ would be loaded only when this view is called. This model would contain the comparison script that tracks schema changes (as explained below). The script would compare the current database (the version which the merger is about to push or merge to trunk) with the serialized DAL instance (which will be made part of trunk).

Thus whenever the merger runs this view, a migration script is generated and stored in the Database folder, along with the current serialized DAL instance, which is used for future merges.

Two extra things will therefore be stored in the Database folder in trunk as well. Whenever somebody pulls from trunk, the DAL instance they pulled is used as the baseline against which their future changes are compared.

Merger to Trunk

This is active at the time the pull request is handled by the sysadmin who merges with trunk, who needs to travel from commit t to commit k. A git hook attached to "git pull" could run the migration script present in the revision (provided by the developer). If any discrepancies are found between the schema changes and the migration script suggested by the developer, so that the data cannot be migrated, we inform the sysadmin of the problem and present them with the migration script to customise as required. This can be done all the way from commit t to commit k.

Tracking Schema Changes

I will create Python code which uses two instances of database variables initialized using the DAL. The current instance of Eden will also store the serialized DAL instance from the previous revision (the last commit). One database variable points to the database files of the current instance, while the other is obtained by de-serializing the stored DAL instance. From each database variable we extract the table names, their fields and the constraints (including the foreign key constraints) in the form of a Python dict. This code will be merged with the existing tooling in static/scripts/tools to arrange the extracted database tables in topological order, which aids the comparison of the two database dicts representing the schemas of the two revisions.

I have created a running prototype of the code that extracts the Database Dict from the current instance.
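The prototype itself is not shown on this page; the following is a minimal self-contained sketch of the idea. It uses sqlite3's introspection in place of a real DAL instance (a real version would walk the DAL's tables and fields); the table and field names are hypothetical.

```python
# Sketch: extract a {table: {field: type}} "Database Dict" from a schema.
# sqlite3 stands in for the DAL so the example is runnable as-is.
import sqlite3

def extract_schema_dict(conn):
    """Return {table_name: {field_name: declared_type}} for all tables."""
    schema = {}
    cur = conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
    for (table,) in cur.fetchall():
        fields = {}
        # PRAGMA table_info yields (cid, name, type, notnull, dflt, pk)
        for _, name, ftype, *_ in conn.execute("PRAGMA table_info(%s)" % table):
            fields[name] = ftype
        schema[table] = fields
    return schema

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, name TEXT)")
print(extract_schema_dict(conn))
# {'person': {'id': 'INTEGER', 'name': 'TEXT'}}
```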

The comparison will produce the changes in the form of a dict with two keys, "appeared" and "disappeared", corresponding respectively to the fields or tables which are new to the database and the fields which are no longer in the database,
e.g. dbChanges["appeared"]["<Table name>_<Field name>"]
The details of these fields can be obtained from the respective Database Dict. These changes can then be used to infer the intended changes to the schema.
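The comparison step described above can be sketched as follows; the two schema dicts and the example field names are hypothetical inputs.

```python
# Sketch: compare two {table: {field: type}} dicts (old and new revision)
# and build the dbChanges dict with "appeared" and "disappeared" keys,
# keyed as "<table>_<field>" as in the text above.
def compare_schemas(old, new):
    changes = {"appeared": {}, "disappeared": {}}
    for table, fields in new.items():
        for field, ftype in fields.items():
            if field not in old.get(table, {}):
                changes["appeared"]["%s_%s" % (table, field)] = ftype
    for table, fields in old.items():
        for field, ftype in fields.items():
            if field not in new.get(table, {}):
                changes["disappeared"]["%s_%s" % (table, field)] = ftype
    return changes

old = {"person": {"id": "integer", "fullname": "string"}}
new = {"person": {"id": "integer", "name": "string"}}
print(compare_schemas(old, new))
# {'appeared': {'person_name': 'string'}, 'disappeared': {'person_fullname': 'string'}}
```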




  • A field has appeared in a particular table, and a field has disappeared from the same table matching the type of the appeared field: if only one such pair is detected per table, we can infer that the field was renamed.
  • A field has appeared in a particular table, and a field with the same name but a different type has disappeared from it: the type of the field has changed.
  • A field has appeared in a particular table, and no field has disappeared from that table: a field has been added.
  • No field has appeared in a particular table, but a field has disappeared from it: a field has been deleted.
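These inference rules can be sketched per table like this; the field names and types in the example are hypothetical.

```python
# Sketch: infer the intended schema change for one table from its
# appeared and disappeared fields, following the rules above.
def infer_change(appeared, disappeared):
    """appeared/disappeared: {field_name: type} for a single table."""
    if len(appeared) == 1 and len(disappeared) == 1:
        (a_name, a_type), = appeared.items()
        (d_name, d_type), = disappeared.items()
        if a_name == d_name and a_type != d_type:
            return "type changed"
        if a_type == d_type:
            return "renamed"
    if appeared and not disappeared:
        return "field added"
    if disappeared and not appeared:
        return "field deleted"
    return "unknown"

print(infer_change({"name": "string"}, {"fullname": "string"}))  # renamed
```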


We can also infer the intentions of the developers from the appeared and disappeared fields. These can be used to provide the default match-ups for the Database Migration Screen.

Using the DAL for tracking also allows us to get the changes from different database management systems such as PostgreSQL, MySQL, SQLite etc.

Scripts to be generated for different scenarios


  • Rename of any Field

This kind of change is the most common in the database schema. A select query is performed on the previous database variable, and the dict of each row is obtained from the select query in the DAL.

The key of the dict (row) that corresponds to the renamed field is changed to the new name. The new database is then checked: if that row or tuple is already present it is not inserted; if it is not there, an insert query is performed.
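The rename step can be sketched as follows, with sqlite3 connections standing in for the two DAL instances; the table and field names are hypothetical.

```python
# Sketch: migrate a renamed field by selecting each row from the old DB,
# renaming the key in the row dict, and inserting into the new DB only
# if that row is not already present.
import sqlite3

def migrate_renamed_field(old_conn, new_conn, table, old_field, new_field):
    old_conn.row_factory = sqlite3.Row
    for row in old_conn.execute("SELECT * FROM %s" % table):
        rec = dict(row)
        rec[new_field] = rec.pop(old_field)          # rename the key
        cols = ", ".join(rec)
        marks = ", ".join("?" * len(rec))
        # skip rows that are already present in the new database
        exists = new_conn.execute(
            "SELECT 1 FROM %s WHERE %s" % (
                table, " AND ".join("%s=?" % c for c in rec)),
            list(rec.values())).fetchone()
        if not exists:
            new_conn.execute(
                "INSERT INTO %s (%s) VALUES (%s)" % (table, cols, marks),
                list(rec.values()))

old = sqlite3.connect(":memory:")
old.execute("CREATE TABLE person (id INTEGER, fullname TEXT)")
old.execute("INSERT INTO person VALUES (1, 'Ada')")
new = sqlite3.connect(":memory:")
new.execute("CREATE TABLE person (id INTEGER, name TEXT)")
migrate_renamed_field(old, new, "person", "fullname", "name")
print(new.execute("SELECT id, name FROM person").fetchall())  # [(1, 'Ada')]
```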


  • Changing Type of the field

If any change in the type of a field (such as from string to integer) is detected, we will ask the developer to map the values from the previous data field to the new values that they intend to put in the database. These mappings will be requested via the Database Migration Screen.

e.g. {1: T("First option"), 2: T("Second option")}

Now a select query is performed on the previous database to get the tuples of the table. The value of the particular field is then changed according to the mapping function, and the tuples are inserted into the new database using the DAL. Values not mentioned in the mapping function are explicitly type cast.

We can also use mappings for cases in which the schema is unchanged but the values within a field need to be revised according to present needs. This is discussed under the Database Migration Screen.
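The mapping step can be sketched like this; the rows, the mapping dict and the fallback cast are hypothetical examples, with the mapping assumed to come from the Database Migration Screen.

```python
# Sketch: apply a developer-supplied value mapping to one field of each
# row; values missing from the mapping fall back to an explicit cast.
def remap_field(rows, field, mapping, cast=int):
    migrated = []
    for row in rows:
        row = dict(row)
        value = row[field]
        if value in mapping:
            row[field] = mapping[value]
        else:
            # not mentioned in the mapping: explicitly type cast
            row[field] = cast(value)
        migrated.append(row)
    return migrated

rows = [{"id": 1, "option": "first"}, {"id": 2, "option": "3"}]
mapping = {"first": 1, "second": 2}
print(remap_field(rows, "option", mapping))
# [{'id': 1, 'option': 1}, {'id': 2, 'option': 3}]
```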


  • Adding fields:

I will look for the default value in the models for that field. If a default value is found, it can be added to all existing tuples. If no default value is found, we add NULL to the newly formed field.

However, this approach will not work well with the following constraints.

  • Field is not null:

This constraint creates a problem if no default value is found in the models. In this case we will ask the developer to specify a default value.

  • Field is unique and not null:

In this case we will copy an existing field of the same type as the appeared field into the new field, and then inform the developer of this rule.
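The rules for filling a newly added field can be sketched as follows; the row, field names and flags are hypothetical, and a real version would read the default and constraints from the model definition.

```python
# Sketch: choose the value for a newly added field, following the rules
# above: model default if present; copy a same-typed field for
# UNIQUE + NOT NULL; ask the developer for NOT NULL; else NULL.
def value_for_new_field(row, default=None, notnull=False, unique=False,
                        copy_from=None):
    if default is not None:
        return default                      # default found in the models
    if notnull and unique:
        return row[copy_from]               # copy an existing same-typed field
    if notnull:
        raise ValueError("no default found: ask the developer for one")
    return None                             # plain nullable field: NULL

row = {"id": 7, "name": "Ada"}
print(value_for_new_field(row, default="unknown"))                    # unknown
print(value_for_new_field(row, notnull=True, unique=True,
                          copy_from="name"))                          # Ada
```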

  • Removing a Field:

This can be tackled by obtaining the tuples of the previous database instance through a select query and omitting the field when writing the insert query in the DAL. An error may still be raised if the removed field is referenced elsewhere, e.g. by a foreign key constraint.
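The field-removal step is a simple projection; a minimal sketch, with hypothetical row dicts standing in for the DAL select results:

```python
# Sketch: drop the removed field from each row dict before building the
# insert queries for the new database.
def drop_field(rows, field):
    return [{k: v for k, v in row.items() if k != field} for row in rows]

rows = [{"id": 1, "legacy_code": "X"}, {"id": 2, "legacy_code": "Y"}]
print(drop_field(rows, "legacy_code"))  # [{'id': 1}, {'id': 2}]
```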


  • (Removing a Table) Merging n Tables into 1 Table:

Though this case is rare, it could still be troublesome. In this scenario I can take the natural join of the tuples in the tables of the previous database.
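A minimal sketch of the natural-join merge for two tables, with row dicts standing in for DAL rows and "id" as an assumed shared join column:

```python
# Sketch: merge two tables by joining rows that share the same key,
# producing one combined row per match (a simple natural join).
def natural_join(rows_a, rows_b, key="id"):
    index = {r[key]: r for r in rows_b}
    merged = []
    for r in rows_a:
        if r[key] in index:
            combined = dict(r)
            combined.update(index[r[key]])
            merged.append(combined)
    return merged

a = [{"id": 1, "name": "Ada"}]
b = [{"id": 1, "org": "UN"}]
print(natural_join(a, b))  # [{'id': 1, 'name': 'Ada', 'org': 'UN'}]
```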


  • (Adding a Table) Splitting 1 Table into n Tables:

I will break the tuples into two dicts according to their new tables and then eliminate duplicate rows in both. After this I will insert the tuples into their respective tables.
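The split step can be sketched as follows; the column lists for the two new tables are assumed inputs (in practice they would come from the new schema):

```python
# Sketch: split one table's rows into two new tables by partitioning the
# columns, de-duplicating rows in each new table before insertion.
def split_table(rows, cols_a, cols_b):
    seen_a, seen_b = set(), set()
    table_a, table_b = [], []
    for row in rows:
        a = tuple(row[c] for c in cols_a)
        b = tuple(row[c] for c in cols_b)
        if a not in seen_a:
            seen_a.add(a)
            table_a.append(dict(zip(cols_a, a)))
        if b not in seen_b:
            seen_b.add(b)
            table_b.append(dict(zip(cols_b, b)))
    return table_a, table_b

rows = [{"id": 1, "org": "UN"}, {"id": 2, "org": "UN"}]
a, b = split_table(rows, ["id"], ["org"])
print(a)  # [{'id': 1}, {'id': 2}]
print(b)  # [{'org': 'UN'}]
```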


Database Migration Screen

I have created a wireframe for this section:


This screen will list all the appeared fields in one column, while all the removed fields will form a drop-down menu for each appeared field.

This allows the developer to match up the fields if they have renamed a field in a table. If any appeared field is not matched to a disappeared field, this tells us that they intended to make a new field; similarly, if any disappeared field is not matched to an appeared field, this indicates that a field has been removed.

Also, if any change in the type of a field is detected, we will ask the developer to map the values as explained in the previous script.


It will also allow values from the previous data field to be mapped to new values even when the name and type of a field are unchanged but the functionality of that field has changed; for this, the developer selects the tables and fields from a drop-down menu and adds their mapping function.

After this screen a script is generated for the developer, in which they can make any changes they require and execute the script. They can also ask us to execute the script, in which case we run it.

