Upgrade Translation Functionality: GSoC Project 2012
Weekly Meeting Schedule : Saturday, 9:00 GMT
Personal Details
Student
Name : Vivek Hamirwasia
Country : India
Timezone: GMT + 5:30
Email : vivsmart[at]gmail[dot]com
IRC Nick: vivek_h
Mentor
Name : Graeme Foster
Email : foster[dot]graeme[at]gmail[dot]com
IRC Nick : graemef
Project Abstract
Sahana Eden currently uses web2py's translation feature to translate the application into different languages. With the current system, translators see only the original strings and the translated strings, which is not always enough to translate a string with its proper meaning. The objective of the project is to improve the translation process so that translators have more information, such as the file name, the line number and comments written for them. For example, translators will know which module a string belongs to, which will help them translate it more appropriately. Further, the T() function currently used to identify strings for translation will be improved so that developers can add a comment for the translators. Finally, a GUI (a web page) embedded in Eden will be implemented to translate on the fly and to show the progress of the translations.
Project Plan
The detailed project plan for the Upgrade Translation Functionality project is given below.
Project Deliverables :- The idea of the project is to automate the entire process of selective translation by providing a tool that lets translators translate only the strings relevant to the deployed code. When the code changes, some strings may no longer need translation and new strings may be added; the tool accounts for such changes so that translations stay consistent. A GUI will be developed to present the translation status for each active module. Adding comments to the T(...) function must also be supported. Finally, the tool will ensure that the translations made by the translators are integrated back into the main code of Eden.
1) Retrieval and storage of all strings based on modules :- Initially, a Python script will be run to collect all the strings in the Eden system for the input module. The collected strings will be stored in a separate file (different from the language files) in which each row contains the original (untranslated) string, its location (pathname and line number), comments, and a flag indicating whether it has been translated (initially all flags are unset). Note that the same string appearing in two different files will be placed in two different rows. Currently, we are using the parse tree generated by the Python parser library to get the strings from the ".py" files. File dependencies and structure are also studied to discover which strings belong to a given module. The active modules will be passed as parameters to the function; later they can be taken as input from the developer through GUI checkboxes. Settings such as "deployment.setting.variable_name" (inside T(...)) will initially be extracted as they are, and their values will be retrieved later. (The strings inside both T(...) and M(...) will be retrieved.) For HTML and a few JS files, we are using regular expressions to extract the strings inside T().
2) Building a spreadsheet for translators :- The strings retrieved in the previous step will be converted to .csv format. This spreadsheet will then be made available to translators (along with the location and comments for each string, if any). To do this, the translate-toolkit will first be used to study the required format of the .csv file, and the spreadsheet will then be created using the xlwt library for Python. The current web2py language file will be checked for existing translations. If a string is already translated, its translation will be copied into the spreadsheet so that the translator need not translate it again. This also gives the translator the option to overwrite the existing translation with something more appropriate. (Instead of the spreadsheet, the user can also choose to download a Pootle file containing the same information.)
3) Converting the spreadsheet back into web2py format :- Once the translations are made, the .csv files containing the translations are to be merged back into web2py format. Two options will be available: i) merge the new translations with the existing ones, or ii) replace the old .py file with the new one. The translations may also be spread over several .csv files, so these files need to be merged before they are converted into web2py format.
4) Updating strings due to modification of code :- The code may change from time to time, so the language file needs to be updated accordingly. The frequency of updates can be set manually. Two cases need to be considered: new strings are added, and existing strings are deleted. Hence, when updating, we run step 1) again. This procedure ensures that strings already translated earlier are not selected again for translation. This completes the update of strings and accounts for any modification of the code.
5) Integrate the above functionality into an Eden module :- The translation functionality described above is to be made an Eden module so that it can be run from within Eden. The current plan is to add this functionality to the "admin" module. A translate controller is therefore defined to take input through forms and process it. The code for 1) to 4) above will be moved to modules/s3/s3translate.py. The controller will then call the required methods from this file and display the results.
6) GUI for tracking status: The status of translations for each module must be available on a UI. For this, it was decided that a "master" file containing all the strings will be maintained. The strings in the web2py language files will be checked against this master file to determine the percentage of translation - for the complete file and the module-wise breakdown. An update option will be provided to update the strings in the master file periodically.
7) User supplied strings : The user must be able to upload a text file containing some strings which will also be considered for translation. These strings will appear in the spreadsheet created and will be marked as "user-supplied".
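As an illustration of step 1), the extraction of T(...) and M(...) strings from a ".py" file might be sketched as follows. This is a minimal sketch and not the project's code: it uses Python's standard ast module in place of the parser library mentioned above, the function names extract_strings and merge_locations and the sample path are hypothetical, and only literal string arguments are handled (variables such as deployment settings are skipped).

```python
import ast


def extract_strings(source, filename, markers=("T", "M")):
    """Collect (string, filename, line) rows for literal T(...)/M(...) arguments."""
    rows = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        # Only plain calls such as T("...") or M("..."), not obj.T("...")
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        if node.func.id not in markers or not node.args:
            continue
        first = node.args[0]
        # Skip non-literal arguments, e.g. T(variable)
        if isinstance(first, ast.Constant) and isinstance(first.value, str):
            rows.append((first.value, filename, node.lineno))
    # ast.walk does not promise source order, so sort by line number
    return sorted(rows, key=lambda row: row[2])


def merge_locations(rows):
    """Collapse duplicate strings into one entry with semicolon-separated locations."""
    merged = {}
    for text, filename, lineno in rows:
        merged.setdefault(text, []).append("%s:%d" % (filename, lineno))
    return {text: ";".join(locations) for text, locations in merged.items()}


# Hypothetical sample input standing in for an Eden controller file
sample = (
    'title = T("Add Person")\n'
    'menu = M("Home")\n'
    'again = T("Add Person")\n'
    'skipped = T(variable)\n'
)
rows = extract_strings(sample, "controllers/pr.py")
merged = merge_locations(rows)
```

Here rows keeps one entry per occurrence, matching the one-row-per-location storage described in step 1), while merged corresponds to the later goal of a list of unique strings with semicolon-separated locations.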
Future enhancements
1) Commented strings in JS : Currently, even commented-out strings inside T(...) in JavaScript files are extracted, because regular expressions are being used. The functionality can be improved by skipping such strings.
2) Strings inside MM() : The data inside MM() (in eden/modules/menus.py) can also be retrieved to obtain a few more strings.
3) Values of variables : Many variables inside T(...) take their strings from the database. These strings can be fetched by using the information about the variable.
4) Module and location of user strings : The user can manually specify the module and the location of the strings when adding the extra strings through the text file. Hence, only the relevant strings will be fetched into the spreadsheet.
5) Pootle integration : If time permits, we can integrate the Pootle tool to translate the strings instead of using the spreadsheet.
6) Allow comments : We want comments to be an optional parameter of the T(...) function, so that it becomes T(<string> , <comments>). To do this, we could create a new T(...) function that overrides the built-in web2py T(...) function. The new function would contain most of the code of the built-in function, except that it would allow comments to be passed as a parameter.
7) Semi-automatic translation : We can use existing translation tools (such as Google Translate) to translate the untranslated strings and then present the list to the translator. This will greatly reduce the translator's time and effort, as only the incorrect translations need to be modified.
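The "Allow comments" enhancement in 6) could look roughly like the sketch below. It is only an illustration under stated assumptions: instead of copying and modifying web2py's translator code as proposed above, it wraps an existing translate callable; the class name CommentedT is hypothetical; and the stand-in translator is a plain lookup function, not web2py's T, whose additional arguments are not modelled here.

```python
class CommentedT:
    """Hypothetical T(<string>, <comments>) wrapper around an existing translator."""

    def __init__(self, translate):
        self._translate = translate  # any callable: original string -> translation
        self.comments = {}           # original string -> comment for translators

    def __call__(self, message, comment=None):
        # Record the comment so that it can be exported alongside the string;
        # the first comment seen for a string is kept.
        if comment is not None:
            self.comments.setdefault(message, comment)
        return self._translate(message)


# Stand-in translator for the example (an assumption, not web2py's T)
fake_language_file = {"Save": "Enregistrer"}
T = CommentedT(lambda text: fake_language_file.get(text, text))

saved = T("Save", "Label of the form submit button")
cancelled = T("Cancel")
```

Wrapping keeps existing T("...") calls working unchanged, and the collected comments dictionary could feed the comments column of the translators' spreadsheet.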
Project Goals and Timeline
Due Date | SMART goal | Measure | Status |
---|---|---|---|
First trimester (24 April - 20 May) | |||
17 May | Work on retrieval of all the relevant strings inside T(...) and M(...) from the ".py" files and store the result in a file with complete location information (file name and line number). | The required strings are correctly retrieved when tested on Eden Python files | Completed |
Second trimester (21 May - 9 July) | |||
28 May | Identify the categorization of modules in Eden by studying the file structure and dependencies | Retrieved strings can be appropriately assigned to module(s). | Completed |
4 June | Group the retrieved strings by modules and select the strings in those modules which are currently active | The relevant strings are selected and displayed | Completed |
12 June | Testing the code using unit tests and proper documentation | The code passes all tests and comments are provided to explain the code | Completed |
14 June | Using translate-toolkit to study the steps involved in converting language files from web2py -> po -> csv format | The corresponding .csv file is formed | Completed |
16 June | Converting the retrieved strings directly into spreadsheets to be presented to the translator, using the Python xlwt library | The spreadsheet formed is in the same format as that formed by translate-toolkit in the step above | Completed |
18 June | Merge the list of strings obtained from different modules by handling duplicates before converting it to spreadsheet | A list of unique strings with semicolon separated locations is obtained | Completed |
24 June | Import the spreadsheet filled in by the translators into web2py format. Existing web2py language file is used to avoid conflicts in translation and make the work of translators easy | A valid ".py" language file containing translations is formed containing all the required strings | Completed |
26 June | Wrap the code into classes wherever necessary and provide comments for the previous week's code | All class functions work as expected | Completed |
30 June | Run a script from within Eden to obtain the values of deployment_setting variables which are inside 'T(...)' | The values are collected and passed on for translation with location details | Completed |
8 July | Understand the working of Eden modules and controllers and start moving the code to fit into an Eden module. | The translation functionality is integrated as a module | Completed |
Third trimester (10 July - 13 August) | |||
15 July | Implement two workflows - a) Take a list of modules as input from a form and use the previous code to extract the strings. Present the spreadsheet for translation. b) Provide a form for uploading the translated file and use the previous code to merge it with the web2py language file. Create a table to store the csv files | The implemented workflows function as desired | Completed |
20 July | 1) Code cleanup, comments, fix a few bugs. 2) Obtain a list of variables used inside T(...) | The variables inside T() are identified | Completed |
27 July | Retrieve strings from HTML and JavaScript files | All the strings from HTML and JS code are extracted and appear on the translation spreadsheet | Completed |
3 August | Display translation status per module and for the whole language file | Accurate percentages of translations are reported | Completed |
6 August | 1) Drop-down for language code selection in all forms. 2) Add comments and docstrings. 3) Fix a few bugs in the UI | All workflows work correctly | Completed |
8 August | Provide export to ".po" option | Pootle file is created | Completed |
13 August | Provide an option to add user-supplied strings through a text file | The exceptional strings are displayed in spreadsheet for translation | Completed |
20 August | Testing, debugging, documentation | All features of the project are fully functional and error-free | Completed |
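The percentage measure used for the 3 August goal (translation status per module and for the whole language file) could be computed along these lines. This is a hedged sketch rather than the project's implementation: the function name translation_status and the data shapes are hypothetical, and treating a string as untranslated when its entry is missing, empty or identical to the original is an assumption about how the language file marks untranslated strings.

```python
def translation_status(master, translations):
    """Return (overall_percent, {module: percent}).

    master:       {module name: list of original strings}  (the "master" file)
    translations: {original string: translated string}     (the language file)
    """
    def is_translated(text):
        translated = translations.get(text, "")
        # Assumption: an empty or unchanged entry counts as untranslated
        return bool(translated) and translated != text

    def percent(strings):
        if not strings:
            return 100.0
        done = sum(1 for text in strings if is_translated(text))
        return 100.0 * done / len(strings)

    per_module = {module: percent(set(strings)) for module, strings in master.items()}
    # A string shared by several modules is counted once in the overall figure
    all_strings = set()
    for strings in master.values():
        all_strings.update(strings)
    return percent(all_strings), per_module


# Hypothetical sample data
master = {"pr": ["Add Person", "Home"], "org": ["Home", "Add Organization"]}
translations = {"Add Person": "Ajouter une personne", "Home": "Home"}
overall, per_module = translation_status(master, translations)
```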