Community Wishlist/Wishes/Language Converter should support word-to-word conversion and affix-to-affix conversion/en
Description
Currently, LanguageConverter only supports letter-to-letter conversion. For example, the Chinese language converter converts "历史" (zh-cn, written in Simplified Chinese script) to "歷史" (zh-tw, written in Traditional Chinese script), and the Uzbek language converter converts "Tarix" (uz-Latn, written in Latin script) to "Тарих" (uz-Cyrl, written in Cyrillic script). In these cases, each letter of the first script corresponds to exactly one letter of the second script.
However, LanguageConverter does not support languages whose scripts cannot be converted uniformly on a letter-to-letter basis. For example, in Malay, "wabak" (ms-Latn, written in Latin script) corresponds to "وابق" (ms-Arab, written in Jawi script), and "bahasa" (ms-Latn) corresponds to "بهاس" (ms-Arab). To address this, such languages need a converter that operates on a word-to-word basis.
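As a rough illustration of what word-to-word conversion means, the following Python sketch looks up whole words in a table instead of mapping individual letters. It is not LanguageConverter code and not the T261507 prototype; the names `WORD_TABLE` and `convert_words` are hypothetical, and the table holds only the two example pairs given above.

```python
import re

# Hypothetical word table (ms-Latn -> ms-Arab); only the two examples from the text.
WORD_TABLE = {
    "wabak": "وابق",
    "bahasa": "بهاس",
}

# Matches runs of letters, so punctuation and spaces are left untouched.
WORD_RE = re.compile(r"[^\W\d_]+")


def convert_words(text, table=WORD_TABLE):
    """Replace each whole word found in the table; leave unknown words as they are."""
    def replace(match):
        word = match.group(0)
        # Lookup is case-insensitive because the Jawi script has no letter case.
        return table.get(word.lower(), word)

    return WORD_RE.sub(replace, text)


if __name__ == "__main__":
    print(convert_words("bahasa"))
    print(convert_words("wabak"))
```

Words that are missing from the table pass through unchanged in this sketch; a real converter would need a policy for them, such as falling back to letter-level rules.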
LanguageConverter should also support affix-to-affix conversion. For instance, the Malay word "memberikan" (ms-Latn) consists of the prefix "mem", the root word "beri", and the suffix "kan". LanguageConverter should be able to convert each of these elements into the corresponding prefix "مم" (ms-Arab), root word "بري" (ms-Arab), and suffix "کن" (ms-Arab), and then combine them to form the word "ممبريکن" (ms-Arab).
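The affix-to-affix idea can be sketched in the same way: split a word into an optional prefix, a root, and an optional suffix, convert each part through its own table, and join the results. This Python sketch is only an illustration under simplifying assumptions. The tables `PREFIXES`, `STEMS`, and `SUFFIXES` and the function `convert_affixed` are hypothetical names, they contain only the "memberikan" example, and spelling changes at the boundary between affix and root are ignored.

```python
# Hypothetical affix and root tables (ms-Latn -> ms-Arab), taken from the example above.
PREFIXES = {"mem": "مم"}
STEMS = {"beri": "بري"}
SUFFIXES = {"kan": "کن"}


def convert_affixed(word):
    """Return the converted word, or None if no prefix + root + suffix split is known."""
    # Try "no prefix" first, then known prefixes from longest to shortest.
    for prefix in [""] + sorted(PREFIXES, key=len, reverse=True):
        if not word.startswith(prefix):
            continue
        # Same strategy for suffixes.
        for suffix in [""] + sorted(SUFFIXES, key=len, reverse=True):
            if suffix and not word.endswith(suffix):
                continue
            stem = word[len(prefix):len(word) - len(suffix)]
            if stem in STEMS:
                # Convert each element separately, then combine them.
                return PREFIXES.get(prefix, "") + STEMS[stem] + SUFFIXES.get(suffix, "")
    return None


if __name__ == "__main__":
    print(convert_affixed("memberikan"))
```

For "memberikan" the function returns the converted prefix, root, and suffix joined in that order. Returning `None` for unknown words lets a caller fall back to another strategy, for example a whole-word table.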
Prototype code for word-to-word and affix-to-affix conversion is described in T261507.
In conclusion, LanguageConverter should support not only letter-to-letter conversion (as it currently does), but also word-to-word conversion (since the letters of two different scripts might not match on a one-to-one basis) and affix-to-affix conversion. At the same time, LanguageConverter should remain maintainable by successive generations of contributors, with detailed documentation on either Meta-Wiki or Wikitech. LanguageConverter is currently not documented in detail, as noted in T370584 and T21044, which raises concerns about its long-term maintainability.
Assigned focus area
Unassigned.
Type of wish
Feature request
Related projects
All projects
Affected users
Users using languages with variant conversions