Language Converter should support word-to-word conversion and affix-to-affix conversion

Status: Submitted

Description

Currently, LanguageConverter supports only letter-to-letter conversion. For example, the Chinese language converter converts "历史" (zh-cn, Simplified Chinese script) to "歷史" (zh-tw, Traditional Chinese script), and the Uzbek language converter converts "Tarix" (uz-Latn, Latin script) to "Тарих" (uz-Cyrl, Cyrillic script). In these cases, each letter of the first script corresponds to exactly one letter of the second script.
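To make the letter-to-letter model concrete, here is a minimal Python sketch of such a converter for the Uzbek example. It is an illustration only, not LanguageConverter's actual code: the names UZ_LATN_TO_CYRL and convert_letters are hypothetical, and the table holds just the five letters needed for "Tarix".

```python
# Hypothetical table: only the letters needed for the "Tarix" example.
UZ_LATN_TO_CYRL = {"T": "Т", "a": "а", "r": "р", "i": "и", "x": "х"}


def convert_letters(text, table):
    """Replace each character independently; unknown characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)


print(convert_letters("Tarix", UZ_LATN_TO_CYRL))
```

Because every character is handled in isolation, this approach works only when the two scripts line up one letter at a time.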

However, LanguageConverter does not support languages whose scripts cannot be converted uniformly on a letter-to-letter basis. For example, in Malay, "wabak" (ms-Latn, Latin script) corresponds to "وابق" (ms-Arab, Jawi script), and "bahasa" (ms-Latn, Latin script) corresponds to "بهاس" (ms-Arab, Jawi script). In both words the Latin spelling has more letters than the Jawi spelling, so the letters cannot be matched one to one. To address this issue, such languages need a converter that operates on a word-to-word basis.
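A word-to-word converter can be sketched as a dictionary lookup over whole words. The Python below is a minimal illustration under stated assumptions, not the prototype from Phabricator: MS_LATN_TO_ARAB and convert_words are hypothetical names, the table contains only the two words quoted above, and words missing from the table are left unchanged.

```python
import re

# Hypothetical word table: only the two examples from the text.
MS_LATN_TO_ARAB = {"wabak": "وابق", "bahasa": "بهاس"}


def convert_words(text, table):
    """Convert each Latin-script word found in the table; keep the rest as-is."""
    def replace(match):
        word = match.group(0)
        # Look up case-insensitively; fall back to the original word.
        return table.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", replace, text)


print(convert_words("bahasa", MS_LATN_TO_ARAB))
```

A real table would need one entry per word form, which is part of the motivation for also handling affixes separately.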

Another important feature is affix-to-affix conversion. For instance, in Malay, "memberikan" (ms-Latn) consists of the prefix "mem" (ms-Latn), the root word "beri" (ms-Latn), and the suffix "kan" (ms-Latn). LanguageConverter should be able to convert each of these elements into the corresponding prefix "مم" (ms-Arab), root word "بري" (ms-Arab), and suffix "کن" (ms-Arab), and then combine them to form the word "ممبريکن" (ms-Arab).
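The affix-to-affix idea can be sketched as: split a word into prefix, root, and suffix, convert each part from its own table, and join the results. The Python below is a simplified illustration, not the prototype from Phabricator. The names PREFIXES, SUFFIXES, ROOTS, and convert_affixed are hypothetical, the tables contain only the "memberikan" example, and the sketch ignores real Malay morphology such as the different forms the prefix can take.

```python
# Hypothetical tables: the empty string lets a word have no prefix or suffix.
PREFIXES = {"": "", "mem": "مم"}
SUFFIXES = {"": "", "kan": "کن"}
ROOTS = {"beri": "بري"}


def convert_affixed(word):
    """Try every prefix/suffix split; return the converted word or None."""
    for prefix, jawi_prefix in PREFIXES.items():
        for suffix, jawi_suffix in SUFFIXES.items():
            if not (word.startswith(prefix) and word.endswith(suffix)):
                continue
            # What remains between the affixes must be a known root.
            root = word[len(prefix):len(word) - len(suffix)]
            if root in ROOTS:
                return jawi_prefix + ROOTS[root] + jawi_suffix
    return None


print(convert_affixed("memberikan"))
```

With this split, one root entry covers "beri", "berikan", "memberi", and "memberikan", instead of requiring four separate entries in a word table.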

Prototype code for word-to-word conversion and affix-to-affix conversion is described in T261507.

In conclusion, LanguageConverter should support not only letter-to-letter conversion (as it currently does), but also word-to-word conversion (since the letters of two different scripts might not match on a one-to-one basis) and affix-to-affix conversion. At the same time, LanguageConverter should be sustainably maintainable by successive generations of contributors, with detailed documentation on either Meta Wiki or Wikitech. LanguageConverter is currently not documented in detail, as noted in T370584 and T21044, which raises concerns about its long-term maintainability.

Assigned focus area

Unassigned.

Type of wish

Feature request

All projects

Affected users

Users using languages with variant conversions 

Phabricator tasks

T261507

Other details

  • Created: 11:04, 3 August 2024 (UTC)
  • Last updated: 16:28, 14 August 2024 (UTC)
  • Author: Hakimi97 (talk)