Blueprint: Transliteration
Purpose
As Sahana Eden is used in countries that do not use Roman letters (e.g. Sri Lanka), a transliteration feature that converts Roman letter combinations into their equivalents in the local script (such as Tamil letters) would be useful. For now we only need a form input.
Possible use-case
The CAP broker, which is currently a work in progress, supports messages in different languages. Helpers might want to share messages in country-specific scripts.
Technical side
The best API is offered by Google AJAX Transliteration. Unfortunately, it requires a permanent Internet connection. In addition, Google deprecated it in summer 2011, which means that it is no longer supported and will be shut down within the next three years, so it is not a good long-term solution. There are a few alternative transliteration services. None of them has an adoptable GUI; a few have APIs, which again require a permanent Internet connection. We could ask them whether we are allowed to download and use their databases.
List of transliteration services
- Yamli (Arabic)
- University of Colombo (Sinhala, Tamil (phonetic))
- Translit.cc (Cyrillic languages)
- Eki.ee (Downloadable Character Maps, most Cyrillic, some Arabic and one language using Hebrew letters)
- OK-Board (Languages used in Europe/North Africa/West Asia)
- Lingua systems (Cyrillic languages) - OpenSource + downloadable
- XLIT web (Indian languages, Hindi and Marathi available at their webpage, more can be requested)
Implementation
If we decide to develop the transliteration input ourselves, the recommended approach is an S3Widget, as it fits easily into the existing structures of Eden (setting widget=TransliterationTextarea in the model is the only change needed to enable transliteration for a text value). As everything has to happen in real time, the widget should fire an AJAX request.
The problem is the data source. Even if the websites above allowed us to use their data, we would have to bring it into a consistent format such as XML or JSON. Furthermore, everything needs to be tested very thoroughly, as the data might not be 100% correct. Lives may depend on it: if a CAP message, for instance, is incorrect because the transliteration engine made a mistake, we have a serious problem. Because of this, we need native speakers of these languages to test the mappings. The last aspect is the work required to maintain everything. If one of the transliteration services publishes an update that requires us to rebuild our JSON/XML files, our data may become outdated.
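To make the "consistent format" idea concrete, here is a minimal sketch of what a JSON mapping table and a server-side lookup could look like. Everything in it is an assumption for illustration: the names TABLE_JSON, load_table and transliterate are hypothetical, the handful of Roman-to-Cyrillic pairs is a tiny sample and not a vetted table, and greedy longest-match is only one possible strategy, not something the services listed above are known to use.

```python
import json

# Hypothetical mapping file content. The pairs are a tiny illustrative
# sample, not a complete or validated transliteration table.
TABLE_JSON = """
{
  "a": "а", "k": "к", "l": "л", "o": "о",
  "s": "с", "h": "х", "sh": "ш", "shch": "щ"
}
"""


def load_table(text):
    """Parse a JSON object mapping Roman keys to target-script strings."""
    table = json.loads(text)
    for key, value in table.items():
        if not key or not isinstance(value, str):
            raise ValueError("table must map non-empty keys to strings")
    return table


def transliterate(text, table):
    """Greedy longest-match transliteration; unmapped characters pass through."""
    longest = max(map(len, table), default=0)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible Roman chunk first, then shorter ones.
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size].lower()
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            out.append(text[i])  # no mapping found: keep the character as-is
            i += 1
    return "".join(out)


table = load_table(TABLE_JSON)
print(transliterate("shkola", table))  # prints: школа
```

Matching the longest key first is what lets "sh" become one letter instead of two. It also shows why native-speaker testing matters: a greedy rule will always pick "ш" for "sh", even in words where "сх" is meant.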
Another point is the GUI. We need a textarea with functions similar to Eclipse's autocomplete or Visual Studio's IntelliSense: a dropdown listing the possible meanings a letter combination can have. The only suitable solution is available here: a jQuery plugin that needs a bit of styling but supports JSON as a data source.
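As a rough sketch of the data such a dropdown could consume, the following function returns the candidate mappings for the letters typed so far and serialises them as JSON. The function name suggest, the field names "roman" and "target", and the sample entries are all hypothetical; the format the jQuery plugin actually expects would need to be checked against its documentation.

```python
import json

# Hypothetical sample entries; a real table would come from a vetted source.
TABLE = {"s": "с", "sh": "ш", "shch": "щ", "k": "к"}


def suggest(typed, table, limit=10):
    """Return candidate mappings whose Roman key starts with the typed letters."""
    typed = typed.lower()
    if not typed:
        return []
    # Shortest keys first, so the closest match is at the top of the dropdown.
    keys = sorted((k for k in table if k.startswith(typed)),
                  key=lambda k: (len(k), k))
    return [{"roman": k, "target": table[k]} for k in keys[:limit]]


# Serialised the way an AJAX endpoint might return it to the widget.
print(json.dumps(suggest("sh", TABLE), ensure_ascii=False))
```

Returning a list of alternatives rather than a single result leaves the final choice to the user, which fits the dropdown idea described above.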
Suggestion
Given the points above, a transliteration engine is a lot of work, as we have no consistent database of Roman -> Indian, Chinese, Cyrillic characters which we could use. As long as we cannot find one, implementing a transliteration engine is not recommended. If we find a professional and consistent database, we could start to implement the feature using the jQuery plugin described above.