wiki:BluePrint/DatabaseMigration


GSoC 2012: Database Migration

Workflow

Merger

There will be a view that shows the Database Migration Screen (as explained below). There will be a folder in models which is loaded only when this view is called. This model will contain the comparison script that tracks schema changes (as explained below). The script will compare the current database (the version which the merger is about to push or merge to the trunk) with the serialized DAL instance (which will be made part of the trunk).

Thus, whenever the merger runs this view, a migration script is generated and stored in the Database folder, along with the current DAL instance, serialized, which is used for future merges. Two extra things will therefore also be stored in the Database folder in the trunk. Whenever somebody pulls from the trunk, the DAL instance they pulled will be used to compare against the changes they make in the future.
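A minimal sketch of this step; the pickle-based serialization and the file name are assumptions (in practice the schema can first be extracted into a plain dict, as described under Tracking Schema Changes below):

{{{#!python
# Minimal sketch: persist the schema of the last revision for future
# merges. The path and file name are assumptions.
import cPickle as pickle

def save_schema(schema, path="databases/schema_last_revision.pkl"):
    with open(path, "wb") as f:
        pickle.dump(schema, f)

def load_schema(path="databases/schema_last_revision.pkl"):
    with open(path, "rb") as f:
        return pickle.load(f)
}}}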


SysAdmin

This is active at the time the pull request is handled by the sysadmin who merges it with the trunk. The sysadmin then needs to travel from commit t to commit k. A git hook can be attached to "git pull" which runs the migration script present in each revision (provided by the developer). If any discrepancies are found between the schema changes and the migration script suggested by the developer, so that the data cannot be migrated, we inform the sysadmin about the problem. The sysadmin is then presented with the migration script to customize to the requirements. This can be done all the way from commit t to commit k, as sketched below.
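A minimal sketch of this step, assuming one migration script per revision under a hypothetical databases/migrations/ directory; the layout and the one-script-per-commit naming are assumptions:

{{{#!python
# Minimal sketch: run the migration script of every revision between
# commit t and commit k, oldest first. The directory layout and the
# file naming (one <revision>.py per commit) are assumptions.
import os
import subprocess

def migrate(from_commit, to_commit, scripts_dir="databases/migrations"):
    # list the revisions after from_commit up to to_commit, oldest first
    revisions = subprocess.check_output(
        ["git", "rev-list", "--reverse",
         "%s..%s" % (from_commit, to_commit)]).split()
    for rev in revisions:
        script = os.path.join(scripts_dir, "%s.py" % rev)
        if os.path.exists(script):
            subprocess.check_call(["python", script])
}}}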

Tracking Schema Changes

I will create Python code which uses two instances of database variables, initialized using the DAL. The current instance of Eden will also store the (serialized) DAL instance from the previous revision (the last commit). One database variable points to the database files of the current instance, while the other is obtained by de-serializing the stored DAL instance. From each database variable we will extract the table names, their fields and the constraints, including the foreign key constraints, in the form of a Python dict. This code will be merged with topo.py (in static\scripts\tools) to arrange the extracted database tables in topological order, which will aid the comparison of the two database dicts representing the schemas of the two revisions.

I have created a running prototype of the code that extracts the database dict from the current instance: http://paste.pocoo.org/show/573397/
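In the same spirit as that prototype, a minimal sketch of the extraction; exactly which field attributes to record is an assumption:

{{{#!python
# Minimal sketch: extract a schema dict from a DAL instance.
def extract_schema(db):
    schema = {}
    for tablename in db.tables:
        fields = {}
        for field in db[tablename]:
            fields[field.name] = {"type": field.type,
                                  "unique": field.unique,
                                  "notnull": field.notnull}
        schema[tablename] = fields
    return schema
}}}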


This comparison will yield the changes in the form of a dict with two keys, “appeared” and “disappeared”, corresponding respectively to the fields or tables which are new to the database and the fields which are no longer in the database,
e.g. dbChanges[“appeared”][“<Table name>_<Field name>”]
The details of these fields can be obtained from the respective database dict. These changes can then be used to infer the intended schema changes, as sketched below.
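A minimal sketch of the comparison, assuming schema dicts of the shape produced by the extraction sketch above:

{{{#!python
# Minimal sketch: compare two schema dicts into the dbChanges dict
# described above, keyed as "<Table name>_<Field name>".
def compare_schemas(old_schema, new_schema):
    changes = {"appeared": {}, "disappeared": {}}
    for table, fields in new_schema.items():
        for field, spec in fields.items():
            if field not in old_schema.get(table, {}):
                changes["appeared"]["%s_%s" % (table, field)] = spec
    for table, fields in old_schema.items():
        for field, spec in fields.items():
            if field not in new_schema.get(table, {}):
                changes["disappeared"]["%s_%s" % (table, field)] = spec
    return changes
}}}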


|| NEW CHANGE IN THE DATABASE || DISAPPEARED CHANGE IN THE OLD DATABASE || INFERENCE ||
|| A field has appeared in a particular table || A field has disappeared from the same table, matching the type of the appeared field || If only one such pair of changes is detected per table, we can infer the field was renamed ||
|| A field has appeared in a particular table || A field has disappeared from the same table with the same name but a different type || The type of the field has changed ||
|| A field has appeared in a particular table || No field has disappeared from that table || A field has been added ||
|| No field has appeared in a particular table || A field has disappeared from that table || A field has been deleted ||

We can also infer the intentions of the developers using the appeared and disappeared fields. These can be used to get the default match-ups for the Database Migration Screen.

Using the DAL for tracking also allows us to get the changes from different database management systems such as PostgreSQL, MySQL, SQLite, etc.



Scripts to be generated for different scenarios


  • Rename of any Field

This kind of change is the most common in a database schema. A select query will be performed on the previous database variable, and the dict of each row will be obtained from the select query in the DAL.

The name of the key of the dict (row) that corresponds to the renamed field will be changed. The new database is then checked: if that row or tuple is already present, it will not be inserted; if it is not there, an insert query will be performed, as sketched below.
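A minimal sketch, assuming two DAL connections old_db and new_db (previous and current database) and a hypothetical table "person" whose field "fullname" was renamed to "name":

{{{#!python
# Minimal sketch: migrate data across a renamed field. Table and field
# names are assumptions for illustration.
for row in old_db(old_db.person.id > 0).select():
    values = row.as_dict()
    values["name"] = values.pop("fullname")   # rename the key in the row dict
    values.pop("id", None)                    # let the new database assign ids
    # insert only if an equivalent row is not already present
    if new_db(new_db.person.name == values["name"]).count() == 0:
        new_db.person.insert(**values)
new_db.commit()
}}}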


  • Changing the Type of a Field

If any change in the type of a field (say from string to integer) is detected, we will ask the developer to map the values from the previous data field to the new values he intends to put in the database. These mappings will be requested using the Database Migration Screen.

e.g. {1: T("First option"), 2: T("Second option")}

A select query will then be performed on the previous database to get the tuples of the table. After that, the value of the particular field can be changed according to the mapping function. These tuples can then be inserted into the new database using the DAL. Any values not mentioned in the mapping function can be explicitly type cast.

We can also use mappings for cases in which the schema is unchanged but the values within a field need to be revised according to present needs. This is discussed further under the Database Migration Screen, and sketched below.
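A minimal sketch, reusing the old_db/new_db connections from the rename sketch, with a hypothetical table "task" whose field "status" changed from integer codes to strings:

{{{#!python
# Minimal sketch: migrate a field whose type changed, applying the
# developer-supplied mapping and casting any unmapped values.
mapping = {1: "First option", 2: "Second option"}

for row in old_db(old_db.task.id > 0).select():
    values = row.as_dict()
    values.pop("id", None)
    old_value = values["status"]
    values["status"] = mapping.get(old_value, str(old_value))
    new_db.task.insert(**values)
new_db.commit()
}}}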


  • Adding fields:

I will look for a default value in the models for that field. If a default value is found, I can add that value in all the tuples. If no default value is found, we will add null in the newly formed field (see the sketch after the constraints below).

However, this approach will not work well in the presence of the following constraints.

o   Field is not null:

This constraint creates a problem if no default value is found in the models. In this case we will ask the developer to specify a default value.

o   Field is unique and not null:

In this case we will copy one of the existing fields of the same type as the appeared field into the new field, and then inform the developer about this rule.
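A minimal sketch of the add-field scenario, assuming a hypothetical new field "priority" on table "task":

{{{#!python
# Minimal sketch: populate a newly added field, using the default
# declared in the model when one exists and NULL otherwise.
field = new_db.task.priority
for row in old_db(old_db.task.id > 0).select():
    values = row.as_dict()
    values.pop("id", None)
    values["priority"] = field.default   # None is inserted as NULL
    new_db.task.insert(**values)
new_db.commit()
}}}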

  • Removing a Field:

This can be tackled by obtaining the tuples of the previous database instance through a select query and omitting the field while writing the insert query in the DAL, as sketched below. An error will be raised, though, if the removed field is still referenced elsewhere, for example through a foreign key constraint.
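A minimal sketch, assuming a hypothetical field "legacy_code" that was dropped from table "task":

{{{#!python
# Minimal sketch: migrate a table while omitting a removed field.
for row in old_db(old_db.task.id > 0).select():
    values = row.as_dict()
    values.pop("id", None)
    values.pop("legacy_code", None)   # leave out the removed field
    new_db.task.insert(**values)
new_db.commit()
}}}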


  • (Removing a Table) Merging n Tables into 1 Table:

Though this case is rare, it could still be troublesome. In this scenario I can take the natural join of the tuples in the tables of the previous database, as sketched below.
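A minimal sketch, assuming hypothetical tables "person" and "contact" (joined on contact.person_id) merged into one new table "person_contact":

{{{#!python
# Minimal sketch: merge two tables via a join on the old database.
rows = old_db(old_db.person.id == old_db.contact.person_id).select()
for row in rows:
    values = row.person.as_dict()
    values.update(row.contact.as_dict())
    values.pop("id", None)          # drop ids carried over from the join
    values.pop("person_id", None)   # the join key is no longer needed
    new_db.person_contact.insert(**values)
new_db.commit()
}}}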


  • (Adding a Table) Splitting 1 Table into n Tables:

I will break the tuples into separate dicts according to their new tables and then eliminate duplicate rows in each. After this I will insert the tuples into their respective tables, as sketched below.
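A minimal sketch, assuming a hypothetical table "person" split into new tables "name" (first, last) and "address" (street, city):

{{{#!python
# Minimal sketch: split one table into two, eliminating duplicate rows
# in each of the new tables.
seen_names, seen_addresses = set(), set()
for row in old_db(old_db.person.id > 0).select():
    name_values = {"first": row.first, "last": row.last}
    addr_values = {"street": row.street, "city": row.city}
    name_key = tuple(sorted(name_values.items()))
    addr_key = tuple(sorted(addr_values.items()))
    if name_key not in seen_names:
        seen_names.add(name_key)
        new_db.name.insert(**name_values)
    if addr_key not in seen_addresses:
        seen_addresses.add(addr_key)
        new_db.address.insert(**addr_values)
new_db.commit()
}}}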


Database Migration Screen

I have created a wireframe for this section: http://www.freeimagehosting.net/newuploads/jtzba.png


This screen will list all the appeared fields in one column, while all the removed fields will form a drop-down menu for each appeared field.

This will allow the developer to match up the fields if he has renamed a field in a table. If any appeared field is not matched to a disappeared field, this tells us that he intended to make a new field; similarly, if any disappeared field is not matched to an appeared field, this indicates that a field has been deleted. A sketch of computing the default match-ups follows.
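A minimal sketch of the default match-ups, assuming the dbChanges dict produced by the comparison sketch earlier; the pairing rule follows the inference table above:

{{{#!python
# Minimal sketch: pair each appeared field with a disappeared field of
# the same table and type, only when the pairing is unambiguous.
def default_matchups(changes):
    matchups = {}
    for key, spec in changes["appeared"].items():
        table = key.split("_", 1)[0]
        candidates = [k for k, s in changes["disappeared"].items()
                      if k.split("_", 1)[0] == table
                      and s["type"] == spec["type"]]
        if len(candidates) == 1:          # one pair per table: a rename
            matchups[key] = candidates[0]
    return matchups
}}}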

Also, if any change in the type of a field is detected, we will ask the developer to map the values as explained in the previous script.


It will also allow matching the values from the previous data field to new values even if the name and the type of a field are unchanged but the functionality of that field has changed; for this, the developer will have to select the tables and the fields from a drop-down menu and also add their mapping function.

After this screen, a script will be generated for the developer, in which he can make any changes he requires and then execute it. He can also tell us to execute the script, in which case we will run it.


Project Goals and Timelines

Second Trimester (22 May - 9 July)

2 June - 15 June: Complete the Detecting Mechanism

 1. Build the script that extracts the database schema and arranges the tables in topological order.
 2. Build a script that compares the database schema of the current instance with that of the previous revision.
 3. Generate reports using the dbChanges dict.

Measure: this can be tested by generating reports after making some changes in the database schema.

16 June - 2 July: Create Scripts for different scenarios

 1. Create Python code for renaming a field, adding a field and removing a field.
 2. For changing the type, the mapping function will be taken from a file, which can be changed for testing.
 3. Also create code for merging and splitting tables.

Measure: this portion of the code can be tested by choosing the script specific to a scenario created by changing the schema, and then applying it.

3 July - 9 July: Before the Mid-Term evaluation I would like to concentrate on:

 1. Debugging the code.
 2. Updating documentation, developer guidelines and functional specifications.

Third Trimester (10 July - 13 August)

10 July - 25 July: Build the Database Migration Screen

 1. Show the appeared and disappeared fields for match-ups that indicate renaming.
 2. Allow writing mapping functions for type changes or for any other field.
 3. Unify the user interface with the scripts: choose the script accordingly and feed the values of tables and fields to the script.

26 July - 6 August:

 1. Allow the user to customize the script, by copying the script and telling the developer about its location.
 2. Make the script iterable and able to run between two arbitrary commits.

Measure: final testing can be performed by changing any field and observing the results.

7 August - 13 August: Before the End-Term evaluation I would like to concentrate on:

 1. Debugging the code for errors.
 2. Updating the documentation for the code, and writing user guidelines.
