Haiti Example of Data Migration
Note: As of January 2012, BZR/Launchpad info is deprecated. Please use GitHub. Thanks.
Below are some specific examples which were used for the Haiti instance:
We want to be able to migrate data from Prod to Dev so that Testers / Documenters have real data to play with.
For sqlite, this is easy:
/etc/init.d/apache2 stop rm -rf /home/haiti/dev/databases cp -ar /home/haiti/prod/databases /home/haiti/dev /etc/init.d/apache2 start
Maintaining this should be possible via Web Services
We often need to use this same trick to deal with database consistency issues.
Note
- The databases directory contains what web2py knows about your database
- You need to set Migrate=True in 000_config.py before you update the code and switch back to Migrate=False later on.
- In case of database mismatches - check the databases directory for the appropriate table ( cat the file ) and change accordingly.
CSV
Beware DB locks!
- Don't keep Web2Py shell open longer than necessary
- Consider going into Maintenance Mode on the Webserver & disabling the Web2Py cron in
/etc/crontab
cd /home/haiti/web2py python web2py.py -S prod -M -N db.export_to_csv_file(open('db.csv','wb')) Ctrl+D
Remove reference Tables from CSV file (don't want to import duplicates!)
NB Do CSV file edits in a text editor, not MS Excel (MS Excel makes the dumps unusable!)
NB Be careful of file permissions!
- All files in the databases folder should be
chown www-data:www-data
- If the 1st run of the model happens in the Python shell, then they will be owned by root...
cd /home/haiti/web2py python web2py.py -S dev -M -N db.import_from_csv_file(open('db.csv','rb')) db.commit() Ctrl+D
Changing a Column type in Sqlite live
sqlite handles live migrations pretty well, e.g. changing from unique=True just requires migrate=True
However if there is existing data then a conversion from string to integer can cause problems.
For a development system, the easiest solution is simply to remove all databases/*
& restart, however this isn't possible for a live system:
vim /home/haiti/prod/models/00_db.py migrate = True cd /home/haiti/prod bzr pull cd /home/haiti/web2py python web2py.py -S prod -M db(db.or_organisation.id > 0).select().export_to_csv_file(open('orgs.csv','wb')) db.or_organisation.drop() db.commit() Ctrl+D python web2py.py -S prod -M db.or_organisation.import_from_csv_file(open('orgs.csv','rb')) db.commit() Ctrl+D /etc/init.d/apache2 force-reload vim /home/haiti/prod/models/00_db.py migrate = False
If the system is being heavily used, need to put up a holding page during this time to prevent data entry. For switching on maintenance.
cd /home/haiti ./maintenance.on
For switching off maintenance.
cd /home/haiti ./maintenance.off
Changing a set of Lookup Options live
We can easily amend the module_resource_opts = {} live in the code (usually models/module.py) however we need to migrate existing data (after backing it up, of course):
cd /home/haiti/prod bzr pull cd /home/haiti/web2py python web2py.py -S prod -M db(db.or_organisation.id > 0).select().export_to_csv_file(open('orgs.csv','wb')) db(db.or_organisation.type == 4).update(type='') db(db.or_organisation.type == 5).update(type='') db.commit() Ctrl+D /etc/init.d/apache2 force-reload
If the system is being heavily used, need to put up a holding page during this time to prevent data entry. For switching on maintenance.
cd /home/haiti ./maintenance.on
For switching off maintenance.
cd /home/haiti ./maintenance.off
Dealing with database corruption
- We had problems with the or_organisation table - queries wouldn't complete.
The tracebacks included: ValueError: invalid literal for int() with base 10: '01/14/10 1'
By commenting out each field in turn in the model, discovered that this was the timestamp column which included this data.
(Note the file was also chown root:root
!)
To fix this we needed to:
- Delete column from model
vim /home/haiti/prod/models/05_or.py resource = 'organisation' table = module + '_' + resource db.define_table(table, timestamp, deletion_status, # uuidstamp
- Export as CSV
cd /home/haiti/web2py python web2py.py -S prod -M -N db.or_organisation.export_to_csv_file(open('prod_orgs.csv','wb')) Ctrl+D
- Drop table
cd /home/haiti/web2py python web2py.py -S prod -M -N db.or_organisation.drop() db.commit() Ctrl+D
- Rebuild Table
- Ensuring that table is owned by www-data
- Reimport from CSV
cd /home/haiti/web2py python web2py.py -S prod -M -N db.or_organisation.import_from_csv_file(open('prod_orgs.csv','rb')) db.commit() Ctrl+D
- rms_sms_request table couldn't be recreated
Since this part of the system is under active development, it was necessary to drop the table as part of the release process.
- Although normally we can re-populate the tables from their source feeds simply using:
cd /home/haiti/web2py python web2py.py -S prod -M -N feature_class_id = db(db.gis_feature_class.name == 'SMS').select().first().id db(db.gis_location.feature_class_id == feature_class_id).delete() db.rms_tweet_request.truncate() db.rms_sms_request.truncate() db.commit() Ctrl+D
However this didn't complete fully as there was still an associated view file in databases/
& hence when model tries to create:
SyntaxError: invalid table name: rms_sms_request
To resolve this:
- Take site down (I did this the ugly way!)
vim /etc/crontab /etc/init.d/apache2 stop
- Backup all Data
cd /home/haiti/prod mkdir databases2 cp -ar databases/* databases2 rm -rf databases
- Copy data down locally to be able to analyse easily, export as CSV
tar cvf prod_db.tar databases2 gzip -9 prod_db.tar sz prod_db.tar.gz
- Allow model to recreate databases (owned as www-data)
- Restore Data
- Edit CSV file using a text editor (not MS Excel!)
- remove reference tables & intiial records which are created by
models/zzz_1st_run.py
(NB This didn't seem to work properly on this occasion!)
- remove reference tables & intiial records which are created by
- Import CSV segments to see which parts work:
cd /home/haiti/web2py python web2py.py -S prod -M -N db.import_from_csv_file(open('prod_part1.csv','rb')) db.commit() Ctrl+D
- Edit CSV file using a text editor (not MS Excel!)
Migration Error recovery
- Error: "Unknown column \'gis_location.lon_min\' in \'field list\'"
- Cause: MySQL didn't have the field, model did & the migration log in databases\ said that the migration had happened.
To recover:
vim /etc/crontab #0-59/1 * * * * www-data cd /home/haiti/web2py/ && python web2py.py -C -D 1 >> /tmp/cron.output 2>&1 ln -sf /etc/apache2/sites-available/test-maintain /etc/apache2/sites-enabled/test /etc/init.d/apache2 force-reload http://haiti-test.sahanafoundation.org/phpmyadmin/index.php?db=haititest&table=gis_location Export as SQL rm -f /home/haiti/test/databases/a54f35709b0d09d927454f2b745571db_gis_location.table mysql \r haititest drop TABLE gis_location; \q vim /home/haiti/test/models/00_db.py migrate=True ln -sf /etc/apache2/sites-available/uat /etc/apache2/sites-enabled/test /etc/init.d/apache2 force-reload http://haiti-test.sahanafoundation.org/test/ http://haiti-test.sahanafoundation.org/phpmyadmin/index.php?db=haititest&table=gis_location Import SQL vim /etc/crontab 0-59/1 * * * * www-data cd /home/haiti/web2py/ && python web2py.py -C -D 1 >> /tmp/cron.output 2>&1 vim /home/haiti/test/models/00_db.py migrate=False
Web Services
Better to avoid locks & do without SSH login
- Currently needs to be done resource by resource (JOINed tables can be done together as Components of main resource)
- need to do referenced tables before the tables which refer to them
- migrate the locations first (using the xml interface), you can later migrate the orgs and offices together (through Component JOIN)
- http://localhost:8000/sahana/gis/location/create?format=xml&fetchurl=http://haiti.sahanafoundation.org/prod/gis/location.xml
- http://localhost:8000/sahana/pr/person/create?format=xml&fetchurl=http://haiti.sahanafoundation.org/prod/pr/person.xml
- http://localhost:8000/sahana/or/organisation/create?format=xml&fetchurl=http://haiti.sahanafoundation.org/prod/or/organisation.xml
- This should pull in all components, so no need to do:
- http://localhost:8000/sahana/or/office/create?format=xml&fetchurl=http://haiti.sahanafoundation.org/prod/or/office.xml
- http://localhost:8000/sahana/or/contact/create?format=xml&fetchurl=http://haiti.sahanafoundation.org/prod/or/contact.xml
- BluePrintSynchronisation
- Ideally we want to develop a way of having all tables loaded together into RAM, refs fixed-up & then imported to db together
Notes
1) sqlite does seems to treat emails with different capitalizations as different entities. MySQL does not and can cause a problem during import if data is not clean.
2) SQLite doesn't support FOREIGN KEY constraints, neither does MySQL with MyISAM tables.
Migration plan
Done
1) Put the site in read only mode -> Disable Login, Kill existing sessions
2) Export the CSV from web2py shell
3) Create a MyISAM database(say name = sahana) in mysql
4) Disable zzz_1st_run.py and zzz_last.py
5) Switch the settings in 00_db.py to mysql and do a
python web2py.py -S sahana -M -N db.commit() CTRL+Z
6) Drop the tweets and sms data and their corresponding GIS [ Check for any pledges ]
7) Split the data into the 5 csv's as follows(say db1.csv, db2.csv,..) - Deleting other tables.
TABLE auth_event TABLE admin_theme TABLE s3_audit TABLE s3_setting TABLE s3_source TABLE appadmin_setting =========================== TABLE pr_setting TABLE pr_pentity TABLE pr_person TABLE pr_group =========================== TABLE gis_setting TABLE gis_marker TABLE gis_projection TABLE gis_symbology TABLE gis_config TABLE gis_feature_class TABLE gis_symbology_to_feature_class TABLE gis_location TABLE gis_feature_group TABLE gis_location_to_feature_group TABLE gis_feature_class_to_feature_group TABLE gis_apikey TABLE gis_track TABLE gis_layer_openstreetmap TABLE gis_layer_google TABLE gis_layer_yahoo TABLE gis_layer_gpx TABLE gis_layer_georss TABLE gis_layer_kml TABLE gis_layer_wms TABLE gis_layer_tms TABLE gis_layer_xyz TABLE gis_layer_js ============================ TABLE pr_contact TABLE or_setting TABLE or_sector TABLE or_organisation TABLE or_office TABLE or_contact TABLE or_project TABLE or_activity TABLE lms_setting TABLE lms_catalog TABLE rms_setting TABLE admin_setting TABLE budget_setting TABLE cr_setting TABLE dvi_setting TABLE dvr_setting TABLE hrm_setting TABLE mobile_setting TABLE mpr_setting TABLE msg_setting TABLE sync_setting TABLE vol_setting
8) Remove duplicates in the auth_user and pr_person
9) Clean up gis_layer_js -> make multi lined records single lined
10) In or_organisation -> remove "Medecins du Monde France" entry as "Médecins du Monde France" is treated as the same in MySQL
11) Clean up multi lined records in or_organisation.
12) Test import - multilined entries give a "raise SyntaxError, 'invalid file format'
13) or_sector does not have entries 14,15,16 - which are in the database associated with some organisations - Fix them
14) Check that the site is completely functional.
15) Go to phpmyadmin and export the entire db as sql.
16) Replace all MyISAM lines in the sql with INNODB
17) Drop the database using phpmyadmin and import the previously created sql
18) Test everything once.
19) Switch this to production
Estimated time for migration: 1 hour