== Haiti Example of Data Migration == '''Note: As of January 2012, BZR/Launchpad info is deprecated. Please use GitHub. Thanks.'''[[BR]] Below are some specific examples which were used for the [wiki:Haiti] instance: We want to be able to migrate data from Prod to Dev so that Testers / Documenters have real data to play with. For sqlite, this is easy: {{{ /etc/init.d/apache2 stop rm -rf /home/haiti/dev/databases cp -ar /home/haiti/prod/databases /home/haiti/dev /etc/init.d/apache2 start }}} Maintaining this should be possible via [wiki:DataMigration#WebServices Web Services] We often need to use this same trick to deal with database consistency issues.[[BR]] === Note === 1. The databases directory contains what web2py knows about your database 2. You need to set Migrate=True in 000_config.py before you update the code and switch back to Migrate=False later on. 3. In case of database mismatches - check the databases directory for the appropriate table ( cat the file ) and change accordingly. === CSV === SpreadsheetImporter Beware DB locks! * Don't keep Web2Py shell open longer than necessary * Consider going into Maintenance Mode on the Webserver & disabling the Web2Py cron in {{{/etc/crontab}}} {{{ cd /home/haiti/web2py python web2py.py -S prod -M -N db.export_to_csv_file(open('db.csv','wb')) Ctrl+D }}} Remove reference Tables from CSV file (don't want to import duplicates!) NB Do CSV file edits in a text editor, not MS Excel (MS Excel makes the dumps unusable!) NB Be careful of file permissions! * All files in the databases folder should be {{{chown www-data:www-data}}} * If the 1st run of the model happens in the Python shell, then they will be owned by root... {{{ cd /home/haiti/web2py python web2py.py -S dev -M -N db.import_from_csv_file(open('db.csv','rb')) db.commit() Ctrl+D }}} ==== Changing a Column type in Sqlite live ==== sqlite handles live migrations pretty well, e.g. changing from unique=True just requires migrate=True However if there is existing data then a conversion from string to integer can cause problems. For a development system, the easiest solution is simply to remove all {{{databases/*}}} & restart, however this isn't possible for a live system: {{{ vim /home/haiti/prod/models/00_db.py migrate = True cd /home/haiti/prod bzr pull cd /home/haiti/web2py python web2py.py -S prod -M db(db.or_organisation.id > 0).select().export_to_csv_file(open('orgs.csv','wb')) db.or_organisation.drop() db.commit() Ctrl+D python web2py.py -S prod -M db.or_organisation.import_from_csv_file(open('orgs.csv','rb')) db.commit() Ctrl+D /etc/init.d/apache2 force-reload vim /home/haiti/prod/models/00_db.py migrate = False }}} If the system is being heavily used, need to put up a holding page during this time to prevent data entry. For switching on maintenance. {{{ cd /home/haiti ./maintenance.on }}} For switching off maintenance. {{{ cd /home/haiti ./maintenance.off }}} ==== Changing a set of Lookup Options live ==== We can easily amend the module_resource_opts = {} live in the code (usually models/module.py) however we need to migrate existing data (after backing it up, of course): {{{ cd /home/haiti/prod bzr pull cd /home/haiti/web2py python web2py.py -S prod -M db(db.or_organisation.id > 0).select().export_to_csv_file(open('orgs.csv','wb')) db(db.or_organisation.type == 4).update(type='') db(db.or_organisation.type == 5).update(type='') db.commit() Ctrl+D /etc/init.d/apache2 force-reload }}} If the system is being heavily used, need to put up a holding page during this time to prevent data entry. For switching on maintenance. {{{ cd /home/haiti ./maintenance.on }}} For switching off maintenance. {{{ cd /home/haiti ./maintenance.off }}} ==== Dealing with database corruption ==== 1. We had problems with the or_organisation table - queries wouldn't complete. The tracebacks included: {{{ValueError: invalid literal for int() with base 10: '01/14/10 1'}}} By commenting out each field in turn in the model, discovered that this was the timestamp column which included this data. (Note the file was also {{{chown root:root}}}!) To fix this we needed to: * Delete column from model {{{ vim /home/haiti/prod/models/05_or.py resource = 'organisation' table = module + '_' + resource db.define_table(table, timestamp, deletion_status, # uuidstamp }}} * Export as CSV {{{ cd /home/haiti/web2py python web2py.py -S prod -M -N db.or_organisation.export_to_csv_file(open('prod_orgs.csv','wb')) Ctrl+D }}} * Drop table {{{ cd /home/haiti/web2py python web2py.py -S prod -M -N db.or_organisation.drop() db.commit() Ctrl+D }}} * Rebuild Table * Ensuring that table is owned by www-data * http://haiti.sahanafoundation.org * Reimport from CSV {{{ cd /home/haiti/web2py python web2py.py -S prod -M -N db.or_organisation.import_from_csv_file(open('prod_orgs.csv','rb')) db.commit() Ctrl+D }}} 2. rms_sms_request table couldn't be recreated Since this part of the system is under active development, it was necessary to drop the table as part of the [wiki:ReleaseManagement release process]. * Although normally we can re-populate the tables from their source feeds simply using: {{{ cd /home/haiti/web2py python web2py.py -S prod -M -N feature_class_id = db(db.gis_feature_class.name == 'SMS').select().first().id db(db.gis_location.feature_class_id == feature_class_id).delete() db.rms_tweet_request.truncate() db.rms_sms_request.truncate() db.commit() Ctrl+D }}} However this didn't complete fully as there was still an associated view file in {{{databases/}}} & hence when model tries to create: * {{{SyntaxError: invalid table name: rms_sms_request}}} To resolve this: * Take site down (I did this the ugly way!) {{{ vim /etc/crontab /etc/init.d/apache2 stop }}} * Backup all Data {{{ cd /home/haiti/prod mkdir databases2 cp -ar databases/* databases2 rm -rf databases }}} * Copy data down locally to be able to analyse easily, export as CSV {{{ tar cvf prod_db.tar databases2 gzip -9 prod_db.tar sz prod_db.tar.gz }}} * Allow model to recreate databases (owned as www-data) * http://haiti.sahanafoundation.org * Restore Data * Edit CSV file using a text editor (not MS Excel!) * remove reference tables & intiial records which are created by {{{models/zzz_1st_run.py}}} (NB This didn't seem to work properly on this occasion!) * Import CSV segments to see which parts work: {{{ cd /home/haiti/web2py python web2py.py -S prod -M -N db.import_from_csv_file(open('prod_part1.csv','rb')) db.commit() Ctrl+D }}} ==== Migration Error recovery ==== * Error: "Unknown column \'gis_location.lon_min\' in \'field list\'" * Cause: MySQL didn't have the field, model did & the migration log in databases\ said that the migration had happened. To recover: {{{ vim /etc/crontab #0-59/1 * * * * www-data cd /home/haiti/web2py/ && python web2py.py -C -D 1 >> /tmp/cron.output 2>&1 ln -sf /etc/apache2/sites-available/test-maintain /etc/apache2/sites-enabled/test /etc/init.d/apache2 force-reload http://haiti-test.sahanafoundation.org/phpmyadmin/index.php?db=haititest&table=gis_location Export as SQL rm -f /home/haiti/test/databases/a54f35709b0d09d927454f2b745571db_gis_location.table mysql \r haititest drop TABLE gis_location; \q vim /home/haiti/test/models/00_db.py migrate=True ln -sf /etc/apache2/sites-available/uat /etc/apache2/sites-enabled/test /etc/init.d/apache2 force-reload http://haiti-test.sahanafoundation.org/test/ http://haiti-test.sahanafoundation.org/phpmyadmin/index.php?db=haititest&table=gis_location Import SQL vim /etc/crontab 0-59/1 * * * * www-data cd /home/haiti/web2py/ && python web2py.py -C -D 1 >> /tmp/cron.output 2>&1 vim /home/haiti/test/models/00_db.py migrate=False }}} === Web Services === Better to avoid locks & do without SSH login * Currently needs to be done resource by resource (JOINed tables can be done together as Components of main resource) * need to do referenced tables before the tables which refer to them * migrate the locations first (using the xml interface), you can later migrate the orgs and offices together (through Component JOIN) * http://localhost:8000/sahana/gis/location/create?format=xml&fetchurl=http://haiti.sahanafoundation.org/prod/gis/location.xml * http://localhost:8000/sahana/pr/person/create?format=xml&fetchurl=http://haiti.sahanafoundation.org/prod/pr/person.xml * http://localhost:8000/sahana/or/organisation/create?format=xml&fetchurl=http://haiti.sahanafoundation.org/prod/or/organisation.xml * This should pull in all components, so no need to do: * http://localhost:8000/sahana/or/office/create?format=xml&fetchurl=http://haiti.sahanafoundation.org/prod/or/office.xml * http://localhost:8000/sahana/or/contact/create?format=xml&fetchurl=http://haiti.sahanafoundation.org/prod/or/contact.xml * BluePrintSynchronisation * Ideally we want to develop a way of having all tables loaded together into RAM, refs fixed-up & then imported to db together === Notes === 1) sqlite does seems to treat emails with different capitalizations as different entities. MySQL does not and can cause a problem during import if data is not clean.[[BR]] 2) SQLite doesn't support FOREIGN KEY constraints, neither does MySQL with MyISAM tables. === Migration plan === '''Done'''[[BR]] 1) Put the site in read only mode -> Disable Login, Kill existing sessions[[BR]] 2) Export the CSV from web2py shell[[BR]] 3) Create a MyISAM database(say name = sahana) in mysql[[BR]] 4) Disable zzz_1st_run.py and zzz_last.py[[BR]] 5) Switch the settings in 00_db.py to mysql and do a [[BR]] {{{ python web2py.py -S sahana -M -N db.commit() CTRL+Z }}} 6) Drop the tweets and sms data and their corresponding GIS [ Check for any pledges ][[BR]] 7) Split the data into the 5 csv's as follows(say db1.csv, db2.csv,..) - Deleting other tables.[[BR]] {{{ TABLE auth_event TABLE admin_theme TABLE s3_audit TABLE s3_setting TABLE s3_source TABLE appadmin_setting =========================== TABLE pr_setting TABLE pr_pentity TABLE pr_person TABLE pr_group =========================== TABLE gis_setting TABLE gis_marker TABLE gis_projection TABLE gis_symbology TABLE gis_config TABLE gis_feature_class TABLE gis_symbology_to_feature_class TABLE gis_location TABLE gis_feature_group TABLE gis_location_to_feature_group TABLE gis_feature_class_to_feature_group TABLE gis_apikey TABLE gis_track TABLE gis_layer_openstreetmap TABLE gis_layer_google TABLE gis_layer_yahoo TABLE gis_layer_gpx TABLE gis_layer_georss TABLE gis_layer_kml TABLE gis_layer_wms TABLE gis_layer_tms TABLE gis_layer_xyz TABLE gis_layer_js ============================ TABLE pr_contact TABLE or_setting TABLE or_sector TABLE or_organisation TABLE or_office TABLE or_contact TABLE or_project TABLE or_activity TABLE lms_setting TABLE lms_catalog TABLE rms_setting TABLE admin_setting TABLE budget_setting TABLE cr_setting TABLE dvi_setting TABLE dvr_setting TABLE hrm_setting TABLE mobile_setting TABLE mpr_setting TABLE msg_setting TABLE sync_setting TABLE vol_setting }}} 8) Remove duplicates in the auth_user and pr_person[[BR]] 9) Clean up gis_layer_js -> make multi lined records single lined[[BR]] 10) In or_organisation -> remove "Medecins du Monde France" entry as "Médecins du Monde France" is treated as the same in MySQL[[BR]] 11) Clean up multi lined records in or_organisation.[[BR]] 12) Test import - multilined entries give a "raise SyntaxError, 'invalid file format' [[BR]] 13) or_sector does not have entries 14,15,16 - which are in the database associated with some organisations - Fix them[[BR]] 14) Check that the site is completely functional.[[BR]] 15) Go to phpmyadmin and export the entire db as sql.[[BR]] 16) Replace all MyISAM lines in the sql with INNODB[[BR]] 17) Drop the database using phpmyadmin and import the previously created sql[[BR]] 18) Test everything once.[[BR]] 19) Switch this to production[[BR]] Estimated time for migration: 1 hour ---- [wiki:Haiti]