How Open/Free content and Open/Free software communities can help each other
Introduction
There are many communities, each doing its own thing and doing it quite well, that could benefit greatly from cooperation. Many projects are not outward-looking. Wiktionary, the wiki dictionary, is one such project. Wiktionary is a “one to many” dictionary: it aims to include all words in all languages, with one language as its base. There are such Wiktionaries in many languages, each with its own community, and they do not benefit from each other’s work. A mistake corrected in one project may still exist in another.
Ultimate Wiktionary
To remedy this lack of cooperation, we are working on a project called “Ultimate Wiktionary”. It is even more ambitious than Wiktionary because it will be an “any to any” dictionary: it aims to include all words in all languages and to make them accessible through a user interface in every language. The basis for this new project will be a genuine database.
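To illustrate why a genuine database makes “any to any” lookups possible, here is a minimal sketch in Python using SQLite. The schema (the tables `expression`, `meaning` and `expression_meaning`) and the helpers `add_word`, `translate` and `demo` are hypothetical and are not the actual Ultimate Wiktionary design. The idea shown is that words in every language link to a shared, language-independent meaning, so a single query can translate between any pair of languages without a base language.

```python
import sqlite3

# Hypothetical schema: every word ("expression") is linked to a
# language-independent meaning. Words that share a meaning translate
# each other, whatever their languages.
SCHEMA = """
CREATE TABLE expression (
    id       INTEGER PRIMARY KEY,
    spelling TEXT NOT NULL,
    language TEXT NOT NULL          -- e.g. an ISO 639 code
);
CREATE TABLE meaning (
    id         INTEGER PRIMARY KEY,
    definition TEXT NOT NULL
);
CREATE TABLE expression_meaning (
    expression_id INTEGER REFERENCES expression(id),
    meaning_id    INTEGER REFERENCES meaning(id),
    PRIMARY KEY (expression_id, meaning_id)
);
"""


def add_word(db, spelling, language, meaning_id):
    """Insert a word and link it to an existing meaning."""
    cur = db.execute(
        "INSERT INTO expression (spelling, language) VALUES (?, ?)",
        (spelling, language),
    )
    db.execute(
        "INSERT INTO expression_meaning (expression_id, meaning_id) VALUES (?, ?)",
        (cur.lastrowid, meaning_id),
    )


def translate(db, spelling, source, target):
    """Return target-language words that share a meaning with the source word."""
    rows = db.execute(
        """
        SELECT DISTINCT t.spelling
        FROM expression s
        JOIN expression_meaning sm ON sm.expression_id = s.id
        JOIN expression_meaning tm ON tm.meaning_id = sm.meaning_id
        JOIN expression t ON t.id = tm.expression_id
        WHERE s.spelling = ? AND s.language = ? AND t.language = ?
        ORDER BY t.spelling
        """,
        (spelling, source, target),
    )
    return [row[0] for row in rows]


def demo():
    """Build an in-memory database with one meaning in three languages."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO meaning (id, definition) VALUES (1, 'domesticated feline')")
    for spelling, lang in [("cat", "en"), ("kat", "nl"), ("gatto", "it")]:
        add_word(db, spelling, lang, 1)
    return db
```

With this data, `translate(demo(), "kat", "nl", "it")` returns `["gatto"]`. Adding a new language only means adding rows, not a new dictionary, which is the difference between “one to many” and “any to any”.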
The idea of Ultimate Wiktionary grew out of the cooperation between two Wiktionaries, the Italian and the Dutch. Sabine Cretella, a professional translator, was very enthusiastic about opening up the content for other purposes. TBX and LISA were among her suggestions, GerardM came up with OpenOffice.org, and the Wiktionary mailing list mentioned .dict and RFC 2229, the DICT server protocol. Thinking outside our box proved quite interesting; it considerably expanded the implications of what we may be doing.
Over time we have identified several organizations and communities that we should cooperate with:
- Sun Microsystems, because of OpenOffice.org
- OmegaT, the best Free CAT (computer-assisted translation) tool around, and related tools such as omega t+
- LISA, because of TBX, TMX, and its expertise in localization
- Lucene, because of the Open Source language technology it incorporates
This list is not exclusive, because the secret of our success is the community. With the cooperation of more people and organizations we will be even more successful.
OpenOffice.org
For many computer users, the most important tools are the “office” tools: the word processor, the spreadsheet, the presentation tool and, in addition, the database. The great thing about OpenOffice.org (OOo) is that its developers are very active in localizing its component tools. This localization makes computers usable in the developing world: “Give a man a net and he can eat tomorrow”. A word processor has a spellchecker, and a spellchecker needs a list of correctly spelled words. These lists can become a major resource for the Ultimate Wiktionary. The people who collaborate on the localizations and the wordlists could help to enrich the content of the Ultimate Wiktionary.
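As a rough illustration of how a spellchecker wordlist could be harvested, here is a minimal Python sketch. It assumes the MySpell/Hunspell-style `.dic` layout used for OpenOffice.org dictionaries: a first line with an approximate word count, then one stem per line, optionally followed by `/` and affix flags (the separate `.aff` file, not handled here, defines how those flags expand into inflected forms). The function name `read_dic_wordlist` is hypothetical, and escaped slashes and other format details are deliberately ignored.

```python
def read_dic_wordlist(lines):
    """Extract plain stems from the lines of a MySpell/Hunspell-style .dic file.

    Assumed layout (simplified): the first line is an approximate word
    count; each following line is a word, optionally followed by "/" and
    affix flags, and possibly by whitespace-separated extra fields.
    """
    words = []
    seen = set()
    for line in list(lines)[1:]:              # skip the word-count header
        entry = line.strip()
        if not entry:
            continue                          # ignore blank lines
        head = entry.split("/", 1)[0].split() # drop affix flags and extra fields
        if head and head[0] not in seen:      # keep first occurrence only
            seen.add(head[0])
            words.append(head[0])
    return words
```

For example, the lines `["3", "cat/S", "dog/SM", "house"]` yield `["cat", "dog", "house"]`: only the stems are kept, since generating inflected forms would require the affix rules.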
Translators use specialized translation tools. OmegaT is published under the GPL and, like OOo, it is platform independent. It can and does compete with proprietary tools; its ability to translate websites makes it very useful. One feature that needs improvement is the ability to build glossaries from within OmegaT itself. Currently, glossaries are tab-delimited text files (see the OmegaT User Manual for details). If OmegaT could use the contents of the Ultimate Wiktionary, the Ultimate Wiktionary’s content would in return grow quickly as more words are translated. A mechanism that feeds translated words back would benefit both Ultimate Wiktionary and OmegaT. The current version (as of 3 January 2006), 1.6RC5, is a very stable release candidate.
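The exchange described above could start as simply as writing and reading tab-delimited files. The Python sketch below exports word pairs (for instance, taken from Ultimate Wiktionary) as glossary lines and parses such text back into triples, which is the raw material a feedback mechanism would need. It assumes one entry per line with the source term, the target term and an optional comment separated by tabs; check the OmegaT User Manual for the exact layout, file extension and encoding your version expects. The function names are hypothetical.

```python
def to_glossary_line(source, target, comment=""):
    """Format one entry as tab-delimited text: source, target, optional comment."""
    for field in (source, target, comment):
        if "\t" in field or "\n" in field:
            # Tabs and newlines would break the one-entry-per-line layout.
            raise ValueError(f"field may not contain tabs or newlines: {field!r}")
    fields = [source, target] + ([comment] if comment else [])
    return "\t".join(fields)


def write_glossary(entries):
    """Serialize (source, target) or (source, target, comment) tuples."""
    return "".join(to_glossary_line(*entry) + "\n" for entry in entries)


def read_glossary(text):
    """Parse tab-delimited glossary text into (source, target, comment) triples."""
    entries = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or not fields[0] or not fields[1]:
            continue  # skip blank or malformed lines
        comment = fields[2] if len(fields) > 2 else ""
        entries.append((fields[0], fields[1], comment))
    return entries
```

Writing `[("house", "casa"), ("bank", "banca", "financial institution")]` produces `"house\tcasa\nbank\tbanca\tfinancial institution\n"`, and reading that text back returns the same entries with an empty comment for `house`.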
omega t+ is a suite of translation tools. It includes omegat, a fork of OmegaT originally aimed at providing improved functionality and more cross-platform installation packages than comparable OmegaT versions. However, omegat has not been updated since it was forked, whereas the original OmegaT has been continuously improved. Among the other programs in the suite is bitext2tmx, which is used to manually align old translations and convert them into TMX for use in applications that support translation memory. More applications are planned for the suite.
The “Localization Industry Standards Association”, or LISA, is the organization that publishes TBX, TMX, and a few other standards. TBX, or “TermBase eXchange”, is an open standard for the exchange of terminological data. TMX (Translation Memory eXchange) is the standard for the exchange of translation memory data. Once Ultimate Wiktionary can read and write these open standard formats, translators using free and open software such as OmegaT will be able to use it. With open standards, open software and open content together, little can stand in the way of open communication.
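To give a feel for what “talking” TMX involves, here is a minimal Python sketch that turns translated pairs into a TMX 1.4 document with the standard library's `xml.etree.ElementTree`. It assumes the usual TMX structure (a `header` with its required attributes, then a `body` of `tu` translation units, each holding one `tuv` per language with a `seg`); the tool name and version in the header are hypothetical placeholders, and a real exporter should be checked against the LISA TMX specification.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes attributes in the XML namespace as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, srclang, tgtlang):
    """Build a minimal TMX 1.4 document from (source, target) segment pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "uw-export-sketch",   # hypothetical tool name
        "creationtoolversion": "0.1",         # hypothetical version
        "segtype": "phrase",                  # entries are words/phrases
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")        # one translation unit per pair
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

Calling `build_tmx([("house", "huis")], "en", "nl")` yields a `tmx` element with one `tu` containing an English and a Dutch `tuv`; a TMX-capable tool could then import it as translation memory.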
Lucene
The community behind the Lucene search engine has produced a significant amount of language-related technology that could add functionality to translation software or to Ultimate Wiktionary. An Ultimate Wiktionary that grows rapidly will be of increasing academic interest, because research is hampered by the lack of free-content wordlists of sufficient size.