This page looks at ways we can extend the Internationalisation options within Sahana Eden.
The options available in production are described in DeveloperGuidelines/Internationalisation.
The user perspective is covered in UserGuidelines/Localisation.
Below are some tasks that can help in improving the existing translation functionality.
- Pootle integration
- Deprecated strings must not be merged back into ".py" language files when merging from Pootle.
- Setting up a separate sub-project on Pootle for a deployment
- Upload .po file
- from a URL as well as from a file (e.g. http://pootle.sahanafoundation.org/pootle/export/eden/fr/fr.po)
- Add ability to enable/disable menu options (make this a DB table rather than a deployment_setting?)
- Work out how uploaded files can avoid conflicting with updates from version control (currently, uploaded updates are wiped during upgrades)
- Exclude all templates other than current one by default (option to include all templates, defaulting to off)
- Default list of modules to the ones which are active in the running template
- List of modules shouldn't come from list of controllers (e.g. misses translate itself!)
- Exclude the Unit Tests folder
- Exclude all full paths (from the 2nd occurrence of a string onwards, the full path is given)
- Include certain prepop CSV columns (for T(record.field))
- Don't include variables, e.g. T(r.name) shouldn't add "r.name" to the translation file
- Rewrite admin.py translate() so that only opt3 is a REST controller for translate_language (no need for an opt)
- All other opts should be separate controllers
- Online help to explain that the local languages/code.py will be updated & that uploaded files will be merged
- Online help to explain 'core files'
- Copy code from TranslateToolkit internally to avoid having external dependencies & launching a shell
- Allow exclusion of OCR strings: http://eden.sahanafoundation.org/ticket/1632
- Include strings from custom menus.py
- Include strings from 'label' deployment_settings
- Extend web2py2po/po2web2py to support translator comments
```python
def translate(self, message, symbols):
    """
    Use ## to add a comment into a translation string.
    The comment can be useful to discriminate between different possible
    translations of the same string (for example, different locations):

    T(' hello world ')           -> ' hello world '
    T(' hello world ## token')   -> 'hello world'
    T('hello ## world ## token') -> 'hello ## world'
    """
```
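The comment rule in the docstring above can be expressed as a small standalone function. This is a sketch that reproduces the documented behaviour, not web2py's own code, and the name `strip_translator_comment` is hypothetical. If the message contains `##`, everything from the last `##` onwards is dropped and the remainder is stripped of surrounding whitespace; otherwise the message is returned unchanged.

```python
def strip_translator_comment(message):
    """Remove a trailing '## comment' from a translatable string.

    Hypothetical helper mirroring the translate() docstring examples;
    it is not web2py's implementation.
    """
    if "##" not in message:
        # No comment marker: the string is returned untouched
        return message
    # Split on the *last* marker only, so 'hello ## world ## token'
    # keeps the first '##' as part of the text
    text, _comment = message.rsplit("##", 1)
    return text.strip()


if __name__ == "__main__":
    for msg in (" hello world ", " hello world ## token", "hello ## world ## token"):
        print(repr(msg), "->", repr(strip_translator_comment(msg)))
```

Splitting on the last marker means a literal `##` can still appear in the text, provided a comment follows it.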
- Babel: a good toolkit to combine with GNU gettext
- LaunchPad Translations - access to Ubuntu community
- GoogleTranslate can be used to help translators get started, but needs humans to make cultural and linguistic refinements
- Google ta3reeb - Arabic 'keyboard' using Latin characters
- MS Localisation Design Pattern: http://msdn.microsoft.com/en-us/library/dd129504%28v=VS.85%29.aspx
The Todo List above was already present before I started working on this blueprint; I have made some changes to it and completed some of its tasks, so it is a good reference for quickly noting the issues. My blueprint starts here.
This blueprint presents the development of the translation functionality of Sahana-Eden. The current translation functionality has many features that ease the translation process during disasters. However, some of these features have issues, and the functionality can be improved further. The purpose of this blueprint is to address those issues and propose solutions for them, so that Eden can have a more robust and efficient translation system.
- There is no integration with Pootle
- The size of the .py language files will keep growing
- All strings are selected, when only the few corresponding to the modules in the active template are required
- Conflicting strings arise from pull requests
- The current version makes system calls (external dependencies)
- Prepop CSV files are not included
Benefit to Sahana
- There can be a scenario where the translated strings received through a pull request conflict with what’s already in the repository. The project aims to prevent such merge conflicts.
- There are external dependencies in the current code as it makes system calls, and these will be avoided in the new version.
- Certain strings get deprecated with time, as the source code is changed and new ones are added. These deprecated strings will be removed and new ones will be added from time to time.
- Many strings are selected for translation even though some of them are not required for a particular deployment. So it is important to select only the strings present in the active modules. Currently, the translator doesn’t know which modules are active in the current template. The plan is to select these modules by default, to save translators time and effort.
- Pootle integration is missing. As some translators prefer using Pootle, integrating it will give translators more options.
As a translator:
I would not want to translate deprecated strings
As a system admin I would want to:
Keep the size of .py files small
Allow integration with Pootle
Provide strings only from the active template, to save translators' time
Avoid conflicts when updating
The current translation functionality in Sahana-Eden does the following (most of it is in the s3translate.py file):
a) Provides a menu to select the modules from which strings are to be translated (it doesn’t default to the modules of the active template)
b) Extracts strings from the selected modules using a parse-tree approach. It also extracts the strings of deployment.settings variables (but not database variables)
c) Strings can be exported in xls and po formats
d) Merges uploaded translations (in CSV) with the existing .py language file (doesn’t overwrite)
e) Pootle translations are not currently synced
f) Doesn’t account for conflicts due to pulls and pomerge
g) Has external dependencies due to calls to methods in the Translate Toolkit
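Item b) can be illustrated with a short sketch. It uses Python's standard `ast` module, which may differ from the parse-tree code in s3translate.py, and the function name `extract_t_strings` is hypothetical. Only calls whose first argument is a string literal are collected, so `T(r.name)` contributes nothing, which is what the todo item about variables asks for.

```python
import ast


def extract_t_strings(source):
    """Return the string literals passed as first argument to T() calls.

    Calls such as T(r.name), whose argument is a variable or expression
    rather than a literal, are skipped. Requires Python 3.8+ (ast.Constant).
    """
    strings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "T"
            and node.args
        ):
            first = node.args[0]
            if isinstance(first, ast.Constant) and isinstance(first.value, str):
                strings.append(first.value)
    return strings
```

The source is only parsed, never executed, so undefined names in a module do not matter. Calls made through an attribute, such as `current.T(...)`, would need an extra check for `ast.Attribute`.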
Name : Nikhil Goyal
github : nownikhil
IRC Handle : nownikhil
- Removal of Deprecated Strings: The size of the “.py” files will keep increasing if new strings are always merged with the existing ones. As the code changes, some strings become deprecated while new ones are introduced. So we can periodically run the extraction with all modules selected and replace the existing files. This removes all the deprecated strings and makes the new strings available for translation. This can be done using the “-o” option in the existing translation module, which overwrites the existing “.py” files instead of merging with them. This can be made into a scheduler job that runs periodically, or it can be triggered manually by the admin as and when appropriate.
- Retrieval of strings from the currently active template: Currently, there is no way to tell which strings belong to the active template. This can be done as follows:
- Use the parse tree approach to parse out the currently active template from 000_config.py
- Next, we parse the eden/modules/templates/<current-template>/config.py to get the active modules of that template
- So, only these modules will be checked by default (when showing the module selection page)
- Hence, we know which modules correspond to the current template and this can be used to extract only the relevant strings.
- Including database variables: We need to extract the strings stored in database variables so that they too can be translated. Currently, these variables are excluded from translation. One approach to extract these strings is as follows:
- Use the prepop csv files in modules/templates/<current-template> and mark them to be considered for translation.
- Provide a “Select all templates” option on the module selection page (similar to select all modules) to specify if all prepop files are to be considered. This option will be helpful when introducing new variables and discarding deprecated strings using the overwrite option as mentioned earlier.
- Pootle Integration: We need to make sure that the translations in Pootle are kept in sync with those in the “.py” language files. Below are a few points to help achieve this:
- Whenever the overwrite option is used to remove deprecated strings (as explained earlier), reflect these changes in Pootle too. This ensures that Pootle holds no old strings and that new strings are also added.
- When merging from Pootle, we might encounter conflicts (just as with pull requests). One possible solution is a script that identifies all such conflicts and stores them in a file, which translators can then handle manually.
- Also, an option for uploading ".po" files will be provided (apart from the current ".csv" files). The conflicts arising when merging this can be handled as mentioned before.
Hence, the translations in Pootle and web2py will be consistent.
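The overwrite mode proposed for removing deprecated strings amounts to rebuilding the language dict from freshly extracted strings instead of merging into it. A minimal sketch, assuming the language file has already been loaded into a Python dict (web2py language files are a dict literal) and that untranslated strings are stored mapped to themselves; `rebuild_language_dict` is a hypothetical name, not the code behind the “-o” option.

```python
def rebuild_language_dict(current_strings, existing):
    """Rebuild a language dict from freshly extracted strings.

    current_strings: every string found with all modules selected
    existing: the dict loaded from the current languages/<code>.py file

    Returns (rebuilt, deprecated). Strings that still exist keep their
    translation, new strings map to themselves (untranslated), and
    strings no longer found in the code are reported as deprecated.
    """
    rebuilt = {s: existing.get(s, s) for s in sorted(set(current_strings))}
    deprecated = sorted(set(existing) - set(rebuilt))
    return rebuilt, deprecated
```

Returning the deprecated list separately makes it possible to apply the same removals to Pootle.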
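Detecting the active template and its modules can be sketched with the `ast` module, so neither file is executed. The attribute paths are assumptions about how the files are laid out: `settings.base.template = "..."` in 000_config.py and `settings.modules = OrderedDict([(name, ...), ...])` in the template's config.py. The function names are hypothetical.

```python
import ast


def find_template(config_source):
    """Return the template name from 000_config.py source, or None.

    Assumes an assignment of the form  settings.base.template = "name".
    """
    for node in ast.walk(ast.parse(config_source)):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            for target in node.targets:
                if (
                    isinstance(target, ast.Attribute)
                    and target.attr == "template"
                    and isinstance(target.value, ast.Attribute)
                    and target.value.attr == "base"
                ):
                    return node.value.value
    return None


def find_modules(template_config_source):
    """Return the module names from a template's config.py source.

    Assumes  settings.modules = OrderedDict([("name", Storage(...)), ...]).
    """
    for node in ast.walk(ast.parse(template_config_source)):
        if (
            isinstance(node, ast.Assign)
            and any(isinstance(t, ast.Attribute) and t.attr == "modules"
                    for t in node.targets)
            and isinstance(node.value, ast.Call)
            and node.value.args
            and isinstance(node.value.args[0], ast.List)
        ):
            # Each list element is a (name, Storage(...)) tuple
            return [
                elt.elts[0].value
                for elt in node.value.args[0].elts
                if isinstance(elt, ast.Tuple)
                and elt.elts
                and isinstance(elt.elts[0], ast.Constant)
            ]
    return []
```

The list returned by `find_modules` would be the set of modules pre-selected on the module selection page.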
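Collecting translatable strings from prepop CSV files can be sketched with the standard `csv` module. `strings_from_prepop` and the column names in the example are hypothetical; deciding which columns of each prepop file are displayed through `T(record.field)` is left to the caller.

```python
import csv
import io


def strings_from_prepop(csv_text, columns):
    """Return distinct, non-empty values of the given columns, in file order."""
    found = []
    seen = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column in columns:
            # Missing columns and empty cells are skipped
            value = (row.get(column) or "").strip()
            if value and value not in seen:
                seen.add(value)
                found.append(value)
    return found


if __name__ == "__main__":
    sample = "Name,Comments,Code\nFlood,,FL\nEarthquake,Major,EQ\nFlood,,FL\n"
    print(strings_from_prepop(sample, ["Name", "Comments"]))
```

Codes such as the `Code` column are left out simply by not listing them.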
- Version Control : There can be a scenario where the translated strings received through pull request conflict with what’s already in the repository. Hence, we have to prevent this merge conflict. This can be handled in the same way as for Pootle (manual intervention).
- Avoiding External Dependencies: The current code makes system calls to csv2po and po2web2py, but we want to remove these external dependencies. The link below gives a good reason why we should avoid this.
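As an illustration of replacing a shell call with in-process code, here is a minimal .po writer in pure Python. It is a sketch, not a port of the Translate Toolkit: it writes each entry on a single line with basic escaping, ignores plural forms and comments, and treats a translation identical to its source as untranslated (an assumption that would also blank genuinely identical translations such as "OK"). `write_po` and `po_escape` are hypothetical names.

```python
def po_escape(text):
    """Escape a string for use inside a double-quoted PO field."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\t", "\\t")
    )


def write_po(translations):
    """Return the text of a .po file for a {source: translation} dict."""
    lines = [
        # Header entry declaring the encoding
        'msgid ""',
        'msgstr ""',
        '"Content-Type: text/plain; charset=UTF-8\\n"',
        "",
    ]
    for source, target in sorted(translations.items()):
        if target == source:
            # Untranslated (mapped to itself): leave msgstr empty
            target = ""
        lines.append('msgid "%s"' % po_escape(source))
        lines.append('msgstr "%s"' % po_escape(target))
        lines.append("")
    return "\n".join(lines)
```

Keeping the writer in-process avoids launching a shell and lets the same code run inside a scheduler task.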
Different Sources of translated strings
There are three main sources of translated strings:
1) Uploading CSV/PO file : In this case, the existing ".py" language file is merged with the translations from the uploaded files. Currently only upload of csv files is supported.
2) Through pull requests: Translated strings are received through pull requests. (One issue is that updates from version control wipe out uploaded translations.) A string might then end up with two different translations. To resolve such conflicts, these strings are placed in a separate file so that a translator can verify them later.
3) Through Pootle: This relates to the Pootle integration of Sahana Eden. The Pootle and web2py language files need to be kept in sync with respect to the strings, so merging with Pootle must not introduce conflicts.
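The conflict handling described for pull requests and Pootle can be sketched as a merge that never overwrites an existing translation and sets disagreements aside in a separate CSV file for a translator to review. The function names and the CSV layout are hypothetical; incoming translations are assumed to have already been read into a `{source: translation}` dict.

```python
import csv
import io


def merge_translations(existing, incoming):
    """Merge incoming {source: translation} pairs into an existing dict.

    Existing translations are never overwritten. When an incoming
    translation disagrees with an existing one, the existing value is
    kept and the pair is returned as a conflict for manual review.
    """
    merged = dict(existing)
    conflicts = []
    for source, new in incoming.items():
        if not new or new == source:
            # Incoming entry is untranslated: nothing to merge
            continue
        old = merged.get(source)
        if old is None or old == source:
            # New string, or previously untranslated: accept the upload
            merged[source] = new
        elif old != new:
            conflicts.append((source, old, new))
    return merged, conflicts


def conflicts_to_csv(conflicts):
    """Serialise (string, existing, incoming) triples as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["String", "Existing translation", "Incoming translation"])
    writer.writerows(conflicts)
    return buffer.getvalue()
```

Keeping the existing value on conflict matches the current non-overwriting merge; the translator then decides which version wins.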
- If you need to handle alternate word order with dynamic strings, wrap them in XML():
- Databases may store Unicode characters as 2+ bytes, so a string field with length=20 may hold just 10 characters:
- UTF-8 encoding in Controllers:
- Date fields:
- Working across Timezones:
- Paragraph Translations:
- Currency Formatting:
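Two of the points above can be illustrated with plain Python, independently of web2py's T() and XML() helpers. Named placeholders let a translation reorder the dynamic parts of a sentence, and a string's UTF-8 byte length can be several times its character count, which matters if a column length is counted in bytes. The sample strings are hypothetical.

```python
# Named placeholders: each value is looked up by name, so a translated
# template may place the values in a different order from the English one.
english = "%(count)s new messages for %(name)s"
reordered = "%(name)s: %(count)s new messages"  # hypothetical translation with a different order
values = {"name": "Asha", "count": 3}

print(english % values)    # 3 new messages for Asha
print(reordered % values)  # Asha: 3 new messages

# Character count versus UTF-8 byte count: each Devanagari code point
# below takes 3 bytes in UTF-8
word = "नमस्ते"
print(len(word), len(word.encode("utf-8")))  # 6 18
```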
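For dates and time zones, a common approach is to store timestamps in UTC and convert them only for display. The sketch below uses a fixed +05:30 offset as a stand-in for the viewer's zone; where a real deployment would get that offset (user profile, browser) is an assumption outside this example.

```python
from datetime import datetime, timedelta, timezone

# Stored value: always UTC
stored = datetime(2013, 4, 19, 13, 0, tzinfo=timezone.utc)

# Display value: converted to the viewer's offset (hypothetical +05:30)
viewer_tz = timezone(timedelta(hours=5, minutes=30))
local = stored.astimezone(viewer_tz)

print(local.isoformat())                   # 2013-04-19T18:30:00+05:30
print(local.strftime("%d/%m/%Y %H:%M"))   # 19/04/2013 18:30
```

The display format string is also locale-dependent, so it belongs with the other per-language settings rather than hard-coded in views.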
- 18:48 onwards has relevant discussion...
- http://logs.sahanafoundation.org/sahana-eden/2013-04-19.txt (13:00 onwards)
- http://logs.sahanafoundation.org/sahana-eden/2013-04-22.txt (13:17 onwards)