Version 9 (modified by 14 years ago) ( diff ) | ,
---|
Table of Contents
BluePrint for automated database migration across schema changes
Caution to GSoC students considering this project
This is a difficult project, involving raw database operations on live user data, where the results must be carefully validated to avoid unrecoverable data loss. It will require knowledge of SQL and database administration, code generation or an equivalent means of specifying the process of converting a database, providing a UI when human assistance is needed to specify the conversion. It will involve dealing with source control tools, continuous integration, and automated testing. It is thus recommended for advanced students. We will limit the scope of the project to what can be done during the summer, and possibly split the project so that two students can participate.
Motivation
Far too often, people make changes to the database schema or to the data with which the database is initially populated. This causes pain and suffering for anyone with a live Eden instance who needs to upgrade their Eden software.
Web2py handles some simple schema changes, but the ones that cause problems are precisely the ones it can't handle, such as renaming a field or table, or adding a required non-null field, or requiring the creation of new records from current ones (e.g. if a single foreign key reference is removed from a table and is placed in a new relationship table), or where the data themselves are modified.
We should provide a way of easing the process of migrating a database to a new revision.
Proposed project
- Identify the possible types of database changes, and make appropriate test cases for each.
- Track database schema and initial database content changes in Eden on each commit to trunk. (This could be done as part of the continuous integration tests that are launched on each commit: Run Eden from scratch, which creates the database and adds initial data, then compare this with the result of the run from the last commit.)
- Identify the differences between the old and new schemas, and old and new initial contents.
- Determine which changes will be handled by Web2py without assistance.
- For more significant schema changes, attempt to figure out how to convert from old to new schema. For instance:
- Attempt to match renamed tables by their fields, and renamed fields by their characteristics. Attempt to match fields moved out to new tables by names and characteristics.
- Determine if type changes might be automated, or partly automated (e.g. if a field's type changes from text to datetime).
- Catalog differences in the initial contents of the database.
- Attempt to identify which records are no longer included, which are new, and which are modified.
- Determine if there will be changes that must be performed or verified by a human after conversion. (E.g. in the datetime conversion example, it is possible that some conversions will fail -- these will likely need to be fixed up by hand.)
- Provide a gui that shows the schema changes and suggested fixups. Allow the user to:
- Confirm or alter matching of renamed elements.
- Say how new fields should be populated from old ones.
- Say how data with type changes should be transformed.
- Provide a migration script for more complex conversions.
- Devise a way to record the operations needed to convert a database from one revision to the next.
- Provide the mechanism for actually converting a database from one revision to the next. For instance:
- Dump or back up the current database as a precaution.
- Make a dump to use in the conversion process.
- Determine if there is data that is affected by the changes.
- Is there data in the altered tables?
- Does anything depend on the altered initial contents?
- Determine which changes are better made by editing the dump, and which are better made with database operations. (Note that for each revision, it is likely that this will need to be done entirely by editing the dump, or entirely by database operations, else one would have to construct an intermediate schema to load the altered dump into, to continue conversion with database commands.)
- If editing the dump, do so, create a new database with the new schema, and load the modified contents.
- If using database operations, create a new database with the old schema, load the existing dump, and alter the database.
- Provide a way for a person to do any fixups or verification needed.
- Determine how the result of a conversion will be verified and tested.
- Provide a way for a sysadmin who has updated their copy of Eden to run the set of conversions from their previous revision to the new revision.
Existing Tools
We use dbstruct to handle this:
- MySQL: http://eden.sahanafoundation.org/browser/static/scripts/tools/dbstruct_mysql.py
- PostgreSQL: http://eden.sahanafoundation.org/browser/static/scripts/tools/dbstruct_postgresql.py
This is normally called from fabfile: