wiki:BluePrintTransliteration

Version 11 (modified by Samsruti Dash, 8 years ago)

--

BLUEPRINT TO ADD TRANSLITERATION TO TEXT ENTRY CONTROLS

Aim:

Sahana Eden is used in many countries, including ones whose languages are not written in the Roman script. Transliteration is the conversion of text from one script to another. For local-language text entry, a user can either use a native-language keyboard with the native Unicode character set or use transliteration to enter native words by typing Roman letters. An example of a transliteration tool is “Google Transliteration Beta”.

Where It Is Used:

Transliteration would mainly be used in the CAP Broker GUI and in CAP templates. A CAP Broker has to provide messages in different local languages. Users in Sri Lanka, for example, would require a CAP broker whose messages carry the <cap.alert.info> section in Sinhala or Tamil. The person creating a message has to type the <cap.alert.info.description> in both languages. For example, when the term “Earthquake” (pronounced ûrth'kwāk') is typed into the message, the output should be "භූකම්පනය" for Sinhalese readers and “நிலநடுக்கம்” for Tamil readers.

Technical side of Using This Feature

Google offered the “Google AJAX Transliteration API”. Using that API requires an Internet connection, as the transliteration is performed online. The getting-started guide is at https://developers.google.com/transliterate/v1/getting_started#usingApis . Google also provides offline services. However, the API has been deprecated since 2011. A newer option, the Bing Translation APIs, has been introduced by Bing; a developer can get started with it by referring to http://www.microsoft.com/web/post/using-the-free-bing-translation-apis

How to Use It in Eden?

A developer who decides to build a transliteration input service should start with the S3Widget.py file, which makes it easier to work within the existing structure of Eden. A widget named TransliterationTextarea, set in the model, is the main thing we need to change to activate transliteration for a text value. The widget should use an AJAX request or a Bootstrap request. The main problem is the source of the data. Even if the Eden website is given permission to use a service's data source, we still need to convert the data into XML or JSON. We cannot be sure the data is fully correct, so it needs to be tested. The work also needs to be kept up to date: if any one of the transliteration services is updated, the data derived from it may become outdated.

The user interface is the other main point of transliteration. We will need a text area for the input with an autocomplete function like the one commonly found in Visual Studio or Adobe Dreamweaver: a dropdown box listing the possible meanings a letter combination can have. jQueryUI ListBuilder can be used to provide this UI.

SUGGESTION

According to the points explained above, a transliteration engine is quite a lot of work, as we have no consistent database of Roman -> Indian, Chinese, or Cyrillic characters which we could use. As long as we are not able to find one, implementing a transliteration engine is not recommended. If we could find a professional and consistent database, we could start to implement the feature using the jQuery plugin explained above.

Purpose

As Sahana Eden is used in countries which do not use Roman letters (e.g. Sri Lanka), a transliteration feature that transforms Roman letter combinations into their transliterated equivalents (such as Tamil letters) would be useful. For now we just need a form input.

Possible use-cases

The CAP broker, which is currently a work in progress, supports messages in different languages. A CAP Profile for Sri Lanka would require that the <cap.alert.info> segment of a message carry information in Sinhala, Tamil, and English. Message creators will need to type the <cap.alert.info.description> for all three languages using the language-specific characters. For example, the hazard event "tsunami", pronounced su-na-mi, would, when typed in this manner, result in the Sinhalese equivalent "සුනාමි".
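To make the multi-language requirement concrete, here is a minimal sketch of a message with one info block per language, built with Python's standard xml.etree module. It assumes the CAP 1.2 namespace and reads <cap.alert.info.description> as the path alert/info/description. The function name build_alert is hypothetical, the Tamil text is a placeholder, and the mandatory CAP elements (identifier, sender, category, urgency and so on) are left out, so the result is not a valid CAP message.

```python
import xml.etree.ElementTree as ET

# CAP 1.2 namespace; element names follow <cap.alert.info.description>.
CAP_NS = "urn:oasis:names:tc:emergency:cap:1.2"


def build_alert(descriptions):
    """Build an <alert> with one <info> block per language.

    Illustration only: mandatory CAP elements are omitted.
    """
    ET.register_namespace("cap", CAP_NS)
    alert = ET.Element("{%s}alert" % CAP_NS)
    for language, description in descriptions.items():
        info = ET.SubElement(alert, "{%s}info" % CAP_NS)
        ET.SubElement(info, "{%s}language" % CAP_NS).text = language
        ET.SubElement(info, "{%s}description" % CAP_NS).text = description
    return alert


alert = build_alert({
    "en": "tsunami",
    "si": "සුනාමි",  # Sinhala example taken from this page
    "ta": "(Tamil text, to be entered by a native speaker)",  # placeholder
})
print(ET.tostring(alert, encoding="unicode"))
```

Each description would be typed by the message creator, which is where a transliteration input would come in.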

Another use case: an emergency coordinator may want to search for a person's name or a product name written in the localized script. For example, the medical proper noun "Aspirin" would be pronounced the same way in Arabic but may be written as "أسبرين". Hence, the user should be able to switch to Arabic transliteration to try that option in a search for medical supplies.

Technical side

The best API is the Google AJAX Transliteration API. Unfortunately, it requires a permanent Internet connection. In addition, Google deprecated it in summer 2011, which means that it is no longer supported and is going to be shut down within the next three years. It is therefore not a good long-term solution. There are a few alternative transliteration services. None has an adoptable GUI, and only a few have APIs, which again require a permanent Internet connection. We could ask these services whether we are allowed to download and use their databases.

List of transliteration services

Implementation

If we decided to develop the transliteration input ourselves, the recommended approach is an S3Widget, as it is easy to integrate into the existing structures of Eden (a simple widget=TransliterationTextarea in the model is the only thing we would need to change to enable transliteration for a text value). As everything has to happen in real time, the widget should fire an AJAX request. The problem is the data source. Even if the websites above allowed us to use their data, we would have to bring it into a consistent format such as XML or JSON. Furthermore, everything needs to be tested extensively, as the data might not be 100% correct. Lives may depend on it: if a CAP message, for instance, is incorrect because the transliteration engine made a mistake, we have a serious problem. Because of this we need native speakers of these languages to test them. The last aspect that needs to be mentioned is the work required to maintain everything. If one of the transliteration services publishes an update that requires us to rebuild our JSON/XML files, everything might become outdated.
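As a rough illustration of what the server side of such an AJAX request might do, the sketch below loads a mapping table from JSON and applies a greedy longest-match replacement. This is only one possible technique, not a decided design. The table is hypothetical and covers just the "sunami" example from the use cases; real data would have to come from a verified source and be checked by native speakers. The names load_table and transliterate are assumptions as well.

```python
import json

# Hypothetical mapping data, covering only the "sunami" example.
TABLE_JSON = """
{
  "su": "සු",
  "na": "නා",
  "mi": "මි"
}
"""


def load_table(text):
    """Parse a JSON object mapping Roman letter combinations to target text."""
    return json.loads(text)


def transliterate(text, table):
    """Greedy longest-match transliteration; unmapped characters pass through."""
    if not table:
        return text
    longest = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible letter combination first.
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No mapping starts here: keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


table = load_table(TABLE_JSON)
print(transliterate("sunami", table))
```

Trying the longest combination first matters once the table contains keys that are prefixes of other keys. Characters without a mapping are passed through unchanged, so a gap in the data stays visible to the tester instead of being silently dropped.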

Another point is the GUI. We need a textarea with functions similar to Eclipse's autocomplete or Visual Studio's IntelliSense: a dropdown listing the possible meanings a letter combination can have. The only suitable solution found so far is jQueryUI ListBuilder. It needs a bit of styling, but supports JSON as a data source.
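The data behind such a dropdown could be produced by something like the following sketch. It assumes a table that maps each letter combination to a list of candidate spellings, and it returns label/value pairs as JSON. That shape is an assumption about what the client-side widget would consume. The function name suggest is hypothetical, and the two Sinhala candidates are unverified illustrations.

```python
import json

# Hypothetical candidate table: one letter combination, several possible targets.
CANDIDATES = {
    "na": ["න", "නා"],
    "naa": ["නා"],
    "mi": ["මි"],
}


def suggest(typed, candidates, limit=10):
    """Return dropdown options for the letters typed so far."""
    if not typed:
        return []
    options = []
    for key in sorted(candidates):
        if key.startswith(typed):
            for target in candidates[key]:
                # The label shows the target text and the keys that produce it.
                options.append({"label": "%s (%s)" % (target, key),
                                "value": target})
    return options[:limit]


# ensure_ascii=False keeps the native characters readable in the response.
print(json.dumps(suggest("na", CANDIDATES), ensure_ascii=False))
```

Sorting the keys keeps the order of the options stable between requests, which makes the dropdown less confusing while the user is still typing.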

Thoughts about server-side implementation

Another idea that came up is that it would be nice if users could search for something using the letters of their own language. For example, a user wants to search for "Aspirin" in Arabic letters. Currently, the search finds nothing, as the encodings of Roman letters differ from those of the Arabic letters ("Aspirin" == "أسبرين" would fail). The idea is to reverse-transliterate the non-Roman string into Roman letters on the server side and perform the search with the reverse-transliterated string. That would require modifying the search engine. The transliteration database could be the same as for the client side (again, XML or JSON is recommended).
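A minimal sketch of this idea follows, using a hypothetical letter-by-letter table for the six Arabic letters in the example. With this table, reverse transliteration gives "asbrin" rather than "aspirin", because the Arabic spelling has no letter for "p" and leaves out a vowel. The sketch therefore adds approximate matching with Python's difflib. That step is an assumption added here, not part of the proposal above, and the names reverse_transliterate and find are hypothetical.

```python
import difflib

# Hypothetical reverse table (Arabic letter -> Roman letter), example only.
REVERSE_TABLE = {
    "\u0623": "a",  # alef with hamza above
    "\u0633": "s",  # seen
    "\u0628": "b",  # beh
    "\u0631": "r",  # reh
    "\u064a": "i",  # yeh
    "\u0646": "n",  # noon
}


def reverse_transliterate(text, table=REVERSE_TABLE):
    """Map each character to Roman letters; unknown characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)


def find(query, names, cutoff=0.6):
    """Search names with the reverse-transliterated query, allowing near misses."""
    key = reverse_transliterate(query).casefold()
    by_key = {name.casefold(): name for name in names}
    hits = difflib.get_close_matches(key, list(by_key), n=5, cutoff=cutoff)
    return [by_key[hit] for hit in hits]


query = "\u0623\u0633\u0628\u0631\u064a\u0646"  # the Arabic spelling from the example
print(reverse_transliterate(query))
print(find(query, ["Aspirin", "Bandage"]))
```

The cutoff of 0.6 is simply difflib's default. A real search would need a threshold tuned with native speakers, since a loose match on a medical supply name can be as harmful as no match.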
