Talk:Machine translation
from Dec 2003
I teach translation at the University of Hawaii. I have native speakers of Japanese, various varieties of Chinese, Khmer (Cambodian), Korean and Pingelap (Marshall Islands). I cannot figure out from all the information on Wikipedia whether there is any way my students could participate, either by translating from English or in some other way.
We would not be much interested in machine translation.
Dave Ashworth Univ of Hawaii (from Dec 2003)
- This sounds great! I wish you had left us some way to contact you. The easiest way for your students to participate is to create accounts on en: (the English wiki) and on the wiki of the appropriate target language (Chinese, Khmer, Korean, etc.) wikipedia... and then to pick a topic area and start translating! We have had groups of students choose from a list of topics that needed research; you could certainly do the same for areas that need translation. Sj 22:47, 16 May 2004 (UTC)
- His email address is listed here: http://cits.hawaii.edu/ashworth/Default.htm
from Jan 2004
Wouldn't it be a good starting point to use the various dictionaries that are notably used with OpenOffice (i.e. it_IT.dic, fr_FR.dic, en_EN.dic), which are also under the GNU License? As a further step, the entries in each language's dictionary could link to the English dictionary. This would allow word-to-word translation from any language to any other. A dictionary along the same lines could then be added for expressions. Paul
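To make Paul's pivot idea concrete, here is a minimal Python sketch of word-to-word translation that goes through English. The two toy dictionaries and the function name are hypothetical stand-ins: the OpenOffice .dic files are spell-checking word lists, so they would first need the links to English that the comment proposes.

```python
# Hypothetical toy dictionaries: each maps a word to its English equivalent,
# i.e. every entry "links to the English dictionary".
TO_ENGLISH = {
    "it": {"il": "the", "gatto": "cat", "nero": "black"},
    "fr": {"le": "the", "chat": "cat", "noir": "black"},
}

# Reverse lookups (English -> word) built from the same tables.
FROM_ENGLISH = {
    lang: {english: word for word, english in table.items()}
    for lang, table in TO_ENGLISH.items()
}


def pivot_translate(words, source, target):
    """Translate word by word from `source` to `target`, going through English.

    Words without an entry are kept and wrapped in brackets for a human to check.
    """
    result = []
    for word in words:
        english = TO_ENGLISH[source].get(word)
        translated = FROM_ENGLISH[target].get(english) if english else None
        result.append(translated if translated else f"[{word}]")
    return result


print(pivot_translate(["il", "gatto", "nero"], "it", "fr"))  # ['le', 'chat', 'noir']
```

Even with complete dictionaries, this ignores word order, agreement and words with several meanings, which is why the disambiguation ideas further down the page matter.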
I suggest creating links in Wikipedia articles to their translations, showing in red when the translated page has not been created yet.
For example, I could create a link [[es:Disco duro]] in the Wikipedia article hard disk. If there were no Spanish article "Disco duro", the link would appear in red; if there were one, the link would appear in blue.
I think it is an excellent idea. Along the same lines, there could be a different colour indicating for example if the article corresponding to the link exists and is an automatic translation of an English article (i.e. it stands in need of revision by a native writer of the corresponding language).
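A minimal sketch of the three-colour scheme proposed above, assuming the wiki can answer two questions for each interlanguage link: does the target article exist, and is it flagged as an unreviewed machine translation. The function name, the sets passed in, and the choice of orange for the third state are all hypothetical.

```python
import re
from enum import Enum

# [[xx:Title]] where xx is a lowercase language code
INTERLANG = re.compile(r"\[\[([a-z-]+):([^\]|]+)\]\]")


class LinkColour(Enum):
    RED = "red"        # no article in that language yet
    BLUE = "blue"      # article exists
    ORANGE = "orange"  # hypothetical: exists, but is an unreviewed machine translation


def colour_links(wikitext, existing, machine_translated):
    """Return (language, title, colour) for each interlanguage link in `wikitext`.

    `existing` and `machine_translated` are sets of (language, title) pairs;
    a real wiki would answer these with database lookups.
    """
    result = []
    for lang, title in INTERLANG.findall(wikitext):
        key = (lang, title)
        if key not in existing:
            colour = LinkColour.RED
        elif key in machine_translated:
            colour = LinkColour.ORANGE
        else:
            colour = LinkColour.BLUE
        result.append((lang, title, colour))
    return result


print(colour_links("[[es:Disco duro]]", existing=set(), machine_translated=set()))
```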
Current
Wikipedia Perl modules
Hello. I have started writing Wikipedia Perl modules that in future should also be suitable for computer-assisted Wikipedia translation. I will write the English-to-Czech translation part. See the English project page (done, ToDo, ...). My English is bad, therefore I need your help (docs, mailing list, code, comments and ideas, SourceForge project, ...). Note: Please correct my English. I will study my mistakes. Thanks. --Michal Jurosz 10:00, 30 Jul 2004 (UTC)
Moved from Discussion
Translate or Write from scratch?
Are you sure the other-language Wikipedias would rather translate text than write it themselves? It seems to me that it's almost more effort to translate text than to write an article yourself. For instance, I run the Esperanto Wikipedia (eo:) and I think we appreciate the international nature of our articles. I was wondering if other second-language Wikipedias would feel the same way. I mean, do the other-language Wikipedias want it?
- Initially, it's better to have something than nothing. So I prefer a translation if there is no original Esperanto article. Remember, you can later wiki (modify) the translated article.
- I strongly agree: it is better to have a translated article to refer to, than no article, and it is usually easier to correct a couple of errors at a time than start writing a new article. Now, what will happen when a translated article is corrected and then the original is modified? pgan002 2004-06-25 15:53 EST
T14N, I18N, L10N
- I'm an active contributor/translator for the Vietnamese Wikipedia, and I find that it's okay to simply translate most articles to Vietnamese, but with articles that relate more to Vietnamese-speakers (such as articles on Vietnam, Vietnamese, Vietnamese-Americans, etc.), it's better to write an article (almost) from scratch. Because the international Wikipedias are more than internationalization – they're also about localization. – Minh Nguyễn 22:50, 15 Apr 2004 (UTC)
- Then do you go back and contribute your advanced knowledge of Vietnamese language and culture to the English (or other existing) version? 59.104.85.53 00:49, 24 July 2005 (UTC)
It's true that the effort to write articles is almost the same as the effort to translate. But there are some exceptions. If you are not an expert in the topic, it's easier to translate than to write. On the other hand, it's clear to me that the number of contributors to the Portuguese encyclopedia is very small. You must consider the fact that Portuguese is not a second language, but the first language of millions of people. Unfortunately very few of those millions have access to the Internet and/or have an education. A free encyclopedia would be an extraordinary resource for those people, so every effort to speed up the creation of the Portuguese version is welcome. Of course, people who write for a second-language Wikipedia like Vikipedio have different purposes, do it for fun and are not interested in machine translation.
PS: You may be interested to know that the Traduki project uses Esperanto for the deeper word representation to achieve machine translation. user:joao.
- I fail to understand why this would be a good thing. Esperanto is yet another language. Why would it be superior, and if it is superior in some aspects, would that not be it: just some aspects? GerardM 08:06, 10 March 2006 (UTC)
Point well made. It would be especially good for the minority languages. I was aware of the Traduki project and it looks interesting, although it looks like nothing has happened on the project lately... maybe I'm wrong. I actually looked at the pages again yesterday. ...and since I'm going to start learning Portuguese soon (I plan to visit Brazil next August), I'll probably take a closer look at it later. I now know Sim, Não and Obrigado. :)
Make Auto-translations available?
Now that I think more about it, I'd like to see auto-translation so I can get rough translations of articles in non-English, non-Esperanto Wikipedias. In a future version we could have a drop-down list on each page that translates the page for us and also gives a link to the article on another language's Wikipedia if it exists. I think there are already free services that do this; does anyone know? --Chuck Smith
There are some links to such services above under "Free translations on the web" Joao
Has anyone seen Google Translate at http://translate.google.com/translate_t ?
Would an automatic translation script be run only once for each article, multiple times at an interval, immediately when changes are made, or immediately on demand by a reader? If only once or at an interval, how would article conflicts be handled? 24.198.63.192 03:52 Oct 18, 2002 (UTC)
Machine translation can give the best of both worlds:
- point your machine translator at the English version (probably) and get a local-language version with the most content, with a slight US/European bias.
- point your normal browser at your own language version, and get (sometimes) less content, but more readable, and with local bias, as well as local-specific content. User:Willsmith
To translate an article into another language is a great thing, especially because not the whole world understands English. But in my opinion a static translation would decrease the quality of Wikipedia. Auto-translated text is almost always of really bad quality. Instead, I would give users the option to view an English page in their native language by simply getting a translation, without adding an article to the Wikipedia in that language. That way the user definitely knows that this is just an auto-translated article and has to be read very carefully because of translation mistakes, the same way Google can translate a page online. User:GarciaB
...or Not
- How about never, because machine translation produces utterly crap results? There are plenty of crap online translators out there that people can feed pages into if they simply must; I don't think we should encourage it that much. --Brion VIBBER 06:01 Oct 18, 2002 (UTC)
- Seconded! -- Tarquin
Yeah, I'm not so big on the idea anymore either. I do think it's interesting as an extremely long term project, though.
Agreed here too, though I think the project has merits in its own right. Even if it's no more helpful to them than what's currently available, a great deal of information can be collected from translators who use the tool as a starting point. It will take someone to write the right kind of program though, one that can be extended by non-technical users. That's where the wiki in WikiTran gets its power. For instance, if it draws upon the Wiktionary, then efforts to build a library of translations there won't be duplicated. 59.104.85.53 01:35, 24 July 2005 (UTC)
I've noticed that machine translation is adequate for getting the general meaning across but isn't very pleasing to the eye. If it has to be used, it'd probably best be used to populate a blank page so that native speakers of the language can clean it up in normal wikiwiki style. -- Daniel Thomas
Have any of you heard of Knowledge Based Translation Systems? This is a hybrid of machine translation, translation memories and human translation. It is offered by a company called SDL and reduces the translation process massively, but gives a quality that is indistinguishable from full human translation. It is generally more appropriate for technical writing as opposed to 'flowery' marketing text and literary works. SDL [1] (they're the people behind www.FreeTranslation.com [2])
I only regret that the whole discussion is in English and that the participants start from the premise that the English-language Wikipedia is the main source of culture. It would indeed be worthwhile to translate existing articles, but in all directions (not necessarily only from English). And so far my experience is that automatic translators give dreadful results. Arno Lagrange
Fixing Auto-translators
When I use automated translation, I usually observe two problems:
- The sentence structure cannot be parsed correctly, and
- the meaning of certain words is misunderstood by the translation program.
Both seem to be caused by ambiguities.
So, my idea is:
- Run the texts through a parser that displays all ambiguities.
- Create a disambiguated version (maybe using an artificially enhanced grammar) and recheck it using the parser.
- Then, automatically translate the disambiguated text into every other language.
- Finally, for each language, merge the translated text with already existing text (and maybe correct some oddities).
This would require adding, for each language, two additional wikis to the 'presentable' version: one for disambiguated texts, and one as a collection pool for raw translations. Sloyment 12:47, 22 Oct 2003 (UTC)
Some examples how the above procedure could work:
- The German sentence "Die Katze hatte Klaus bereits verspeist" can mean both "Klaus had already eaten the cat" and "The cat had already eaten Klaus". So, in this case, the parser would say: "In the sentence blahblah, I don't understand who eats whom." For these situations, there could be case marks, like {1}: nominative, {2}: genitive, {3}: dative, {4}: accusative. So if we change the sentence to "Die Katze{4} hatte Klaus{1} bereits verspeist", it will be clear that the cat gets eaten. (BTW: Google translates "verspeisen" as "feed", which is wrong.)
- In some cases, it might be necessary only to disambiguate the hierarchy within complex expressions. The structure could be disambiguated using brackets, e.g. "{{{parallel port} {{flat bed} scanner}} {{reverse engineering} howto}}".
- Words with several meanings (e.g. "port") could be clarified in a definition section.
The assumption behind this idea is that it would be easier to disambiguate a text than to translate it, and that it is easier to correct an automated translation with only a few mistakes in it than to correct the rubbish that current translation programs produce. Sloyment 14:59, 22 Oct 2003 (UTC)
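The two kinds of markup in Sloyment's examples are simple enough to read mechanically. This is a hedged Python sketch, assuming the {1} to {4} case marks and the nested-brace grouping exactly as written above; the function names are made up for illustration, and a real system would also need to check the marks against the grammar.

```python
import re

CASES = {"1": "nominative", "2": "genitive", "3": "dative", "4": "accusative"}
CASE_MARK = re.compile(r"^(\w+)\{([1-4])\}$")


def read_case_marks(sentence):
    """Split a sentence into (word, case) pairs; case is None for unmarked words."""
    pairs = []
    for token in sentence.split():
        match = CASE_MARK.match(token)
        if match:
            pairs.append((match.group(1), CASES[match.group(2)]))
        else:
            pairs.append((token, None))
    return pairs


def parse_groups(text):
    """Turn brace grouping such as "{{flat bed} scanner}" into nested lists."""
    stack = [[]]
    for token in re.findall(r"[{}]|[^{}\s]+", text):
        if token == "{":
            stack.append([])          # open a new group
        elif token == "}":
            if len(stack) == 1:
                raise ValueError("unmatched '}'")
            group = stack.pop()       # close the group ...
            stack[-1].append(group)   # ... and attach it to its parent
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unmatched '{'")
    return stack[0]


print(read_case_marks("Die Katze{4} hatte Klaus{1} bereits verspeist"))
print(parse_groups("{{flat bed} scanner}"))  # [[['flat', 'bed'], 'scanner']]
```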
There are other problems. Some languages may not have words or phrases for certain technical concepts because no native speaker has ever needed them before. This is particularly true of languages with small numbers of native speakers in rural settings. It may be difficult to automatically translate an article on co-routines, for instance, because ideas like subroutine, co-routine, time-sharing and multi-tasking have never been put into words in that particular language before. A human translator can normally use a bit of imagination to invent a new term or reuse a term previously used for an analogous existing concept and if the translator is any good, the result will fit into the language reasonably well. However a machine can do little better than to leave the untranslatable term untranslated and mark it for human attention. -- Derek Ross 16:05, 26 Mar 2004 (UTC)
Good idea! The disambiguated text could be marked with symbols that appear when you click "edit this page" but are invisible in the displayed page (like comments).
Two other types of ambiguity that could be marked:
How words are grouped:
- I put it in (the oven with the glass door).
- I put it (in the oven) with the oven mitts.
Which part of speech a word is (verb, noun, preposition, adjective):
- Time flies(v) like(p) an arrow.
- Fruit flies(n) like(v) a banana.
People could fairly easily learn how to disambiguate text: much more easily than learning another language, I think. Maybe the disambiguated text would be easy (well, comparatively easy) for a machine to translate into another language. Then a native speaker of the other language would probably have to fix it up a bit, but they could probably do that without having to know the language of the original. They could do things like change "I washed my teeth with a brush" to "I cleaned my teeth with a toothbrush" or "I brushed my teeth". That would fit in well with how wikis work.
If a piece of text has not been disambiguated, and if the same ambiguity doesn't carry over into the target language, perhaps the machine translator could present several possibilities, e.g. "Klaus ate the cat / The cat ate Klaus", and someone could specify which is correct. Perhaps the machine should know good ways to ask someone who speaks German well, but doesn't know the disambiguation symbols, who ate whom. "Hat Klaus die Katze verspeist?" is not ambiguous, I think. Or it could present two possibilities, "Klaus hat die Katze..." or "Die Katze hat Klaus...", and ask the writer to specify which. This would be easier for the machine to do, I think, but doesn't really solve the problem; the writer might see both as correct or not know which means which.
Is there machine translation discussion going on somewhere else? --Coppertwig 04:42, 11 December 2006 (UTC)
I agree that the above is a great idea. Links already give the text more context; it's already easy to disambiguate nouns because they usually have pages (fruit flies, oven mitts in the above examples). Perhaps Wikipedia pages could be linked to dictionary definitions to disambiguate ALL words (e.g. orange(color) vs orange(fruit)). Maybe a convenient disambiguation syntax (short of writing a full link for "orange") could be devised?
I really like the idea of grouping hints.
Additional motivation for grouping & disambiguation hints would be: (i) improved Text To Speech quality, (ii) and improved utility of wikipedia data for AI (e.g. the way it was used by Watson) Fmadd (talk) 18:56, 15 April 2016 (UTC)
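Inline hints such as flies(v) or orange(fruit) can be read back out with a single regular expression. This sketch assumes a tag is written in parentheses directly after the word, with no space in between; the pattern and function name are illustrative, not an agreed syntax.

```python
import re

# A word optionally followed, with no space, by "(tag)": flies(v), orange(fruit)
TAGGED_WORD = re.compile(r"(\w+)(?:\(([^()\s]+)\))?")


def read_tags(sentence):
    """Return (word, tag) pairs; tag is None where the writer added no hint."""
    return [(word, tag or None) for word, tag in TAGGED_WORD.findall(sentence)]


for s in ("Time flies(v) like(p) an arrow.", "Fruit flies(n) like(v) a banana."):
    print(read_tags(s))
```

Part-of-speech hints and sense hints come out the same way, so a translator (or a text-to-speech engine) could decide per tag how to use them.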
Other wondering
Three main things I'm wondering about.
- I don't remember where it was, but I know for sure that I recently saw an article on using neural networks so that "computers can learn languages" and translate between them; the original purpose was translation from/to minority or ancient languages, for which few people see the point of making a translation program.
- Any viable MT project would do itself much good to include the Reverso method in some way. (Reverso method = find a translation that will back-translate as close as possible to the original)
- I am a member of the UNDL foundation or whatever the hell they call it. I have access to all their crap and everything. Really a very nice thing. If anybody wants the documents, I think I'm allowed to give them out provided you promise to use them for personal/not-for-profit purposes only. I don't think we'd be allowed to use the Enconverters/Deconverters, but the programming for them should be fairly easy; the main thing the people on that project are working on is the enconversion/deconversion RULES for different languages. I think that an open-source program which incorporated UNL would be perfectly legal. If anybody is interested, though, I'll have to check all the licensing crap they sent me.
So essentially, if I knew any programming language other than HTML (hey, I'm only 14, though I am going to begin taking CC courses in C or some crap like that over the summer) and I were to make MT software, it would incorporate all 3 of these. I think that a lot of the programming behind neural networks is available for free online to plug into whatever you want, so that (afaik) wouldn't be very hard, except maybe the customization part.
UNL, at its best, claims a 99% accuracy rate. I have seen UNL at work. The English deconversions are fantastic, though they do leave something to be desired. As far as I can tell from what others have told me, though, the deconversions for languages such as Russian and Italian are - though one can get what they say - totally ungrammatical.--Node_ue 03:11, 7 Apr 2004 (UTC)
- Then imagine what deconversions to Asian languages would be like… :) – Minh Nguyễn
- Yes, I have seen the results of Japanese and Chinese deconversions. They're actually OK (in the case of Chinese; the Japanese ones are grammatically OK but most of the vocabulary seems to be missing), but the emphasis here is that UNL is a work-in-progress and should not be judged in its present form. It may be at least semi-sucky at the stage it is in right now, but hopefully it will improve. Also, I've noticed lots of users on this page say stuff about how MT produces sucky results. Well, if you think about the advance of MT technology like the advance of computers, then it makes more sense. If you just look up each word in a dictionary, well, that is the original method of machine translation. When you start to add some grammar, that is what comes next. When you start to add more grammar and even some context-sensitivity, that's even better. Then there come the more advanced things: the "reverso method" (trying to get a translation so that the back translation matches the original as closely as possible), neural networks, UNL, etc... These methods produce much better results than those before them, and those before them produce much better results than those before THEM... ASASF... Node
- My project www.babelcode.org has the same goal as UNL, but has better underlying theory and better development strategy. bootedcat
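The "Reverso method" mentioned in this thread (pick the translation whose back-translation is closest to the original) can be sketched as follows. Word-level similarity from Python's difflib stands in for whatever scoring a real system would use, and the back-translator is a hypothetical lookup table, not a real MT engine.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Word-level similarity between two sentences, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def pick_by_back_translation(original, candidates, back_translate):
    """Return the candidate whose back-translation is closest to `original`."""
    return max(candidates, key=lambda c: similarity(original, back_translate(c)))


# Hypothetical back-translator: a lookup table standing in for a real MT engine.
BACK_TO_ENGLISH = {
    "Klaus a mangé le chat": "Klaus ate the cat",
    "Le chat a mangé Klaus": "The cat ate Klaus",
}

best = pick_by_back_translation("The cat ate Klaus", list(BACK_TO_ENGLISH), BACK_TO_ENGLISH.get)
print(best)  # Le chat a mangé Klaus
```

A close back-translation is only a hint: a translation can round-trip perfectly and still read badly in the target language.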
So What?
Excuse me, folks, I've just read the whole page for the first time and felt like adding my little contribution: I'm French and have written some articles for the French Wikipedia, but since I couldn't stop thinking about this translation question, I decided to stop writing and go to the English Wikipedia in search of some kind of answer (since you English-speaking people are a lot more numerous than us). So I came here and... gee, what a mess! Can't anyone here try to clean up this page a little? (I didn't dare to do it myself.) This isn't a forum; think a little of all the people like me who come here because they're looking for ideas or explanations. I don't mean you don't have any ideas; this long discussion is full of proposals and ideas, but could someone make it simpler, shorter and more understandable? Instead of filling the whole page with (interesting) conversations, wouldn't it be better to collect in a single section (as a list, for instance) the latest proposals and ways of solving the problem of translation, with their advantages and disadvantages, for the stupid international ones like me who can't follow such impressive conferences? Thank you all for your attention. (I hope this contribution will be deleted soon when the page is recast.)
A french friend of Wikipedia, user:persivre
- Okay, now we're on a talk page. And it is messy, isn't it? I was surprised when I arrived here that the talk pages rely on the same technology as the encyclopedia, rather than using standard forums. Edit wars of discussions and even votes are the result. 59.104.85.53 01:27, 24 July 2005 (UTC)
What's the status of this project?
I got involved in Wikimedia a couple of days ago, came across the WikiProject_Translation project [3] and thought: hey, that's something that interests me! It also looked like a fairly new project that would be active. I decided to look into the tooling side of translating and eventually arrived here.
My impression is that - apart from exchanging ideas - nothing much has happened in this project in two years. If anyone's still watching this page, maybe you could confirm this (or give a short project status update).
Is anybody still interested in cooperating to make some MT and/or TM tools available (together with the WikiProject_Translation project)?
At the moment, the idea of developing (coding) our own translation tool doesn't seem at all feasible to me. I'd much prefer to investigate the possibilities of making existing translation tools available (through discounts, sponsorship etc.). IMHO, if we are to use translation tools, then they have to be reliable, easy-to-use, tried and tested etc. It would also be nice if we could make some progress within a few months rather than within 10 years.
There are a number of people at Meta who have professional translating experience. I think that we should try to involve some of these people in drawing up the key requirements that translation tools should meet, drawing up a short-list of potentially useful tools, evaluating them and making recommendations to the wider translation community. As far as licences go, we don't need to have a licence to change or develop the code. We just need a licence to be able to use the tool. If we could organize the translation process and resources better (which is the aim of the Wikiproject_translation project) then we would only need a limited number of licences for Wikipedians who provide translation services to other Wikipedians. --Idaho2000 06:52, 28 August 2005 (UTC)
Translation between other languages
Much has been said here about bias, and that is a real problem. In an article on Vietnam in the Vietnamese Wikipedia, you can skip some basic things, since the readers know more, but you must also say more, since readers expect more than English readers would. Machine translation is more appropriate for articles that are less likely to vary across cultures. Also, there will always be a need for human post-editing, both linguistically and content-wise.
But translating from English into all other languages need not be the optimal solution, for both cultural and linguistic reasons. Culturally, it could be wiser to translate from a neighbouring language that shares more cultural knowledge than one does with English (say, from Russian to Ukrainian or Belorussian), and linguistically, machine translation in small steps, between closely related languages, will give better results. Moreover, commercial MT projects tend to concentrate on the X-to-English direction, i.e., the opposite of what we want here.
For Norwegian we are looking into translating from Bokmål Norwegian (39,000 articles) to Nynorsk Norwegian (11,000 articles). We are considering using an existing MT module for this specific language pair and having it adjusted to the wiki format, e.g. by paying special attention to references and categories, including a no: reference (which naturally is not part of the original Bokmål article), etc. The plan is to make this option available in the editing process, and then to do manual post-editing. More often than not, such post-editing also improves the original article, as anyone who has tried translating articles from one Wikipedia to another can confirm. Another such pair is Castilian Spanish (70,000 articles) to Catalan (18,000 articles); in Barcelona, newspapers are published bilingually on a daily basis with the help of MT. Trondtr 11:37, 15 October 2005 (UTC).
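One way to pay special attention to references and categories is to hide them from the MT engine and put them back afterwards. The sketch below assumes that <ref> blocks, non-nested {{templates}} and [[Kategori:...]] links should pass through unchanged; the toy word-swap translator (ikke to ikkje, jeg to eg, hva to kva) is only a stand-in for a real Bokmål-Nynorsk module, and all names are hypothetical.

```python
import re

# Markup assumed to need protection from the MT engine: self-closing and
# paired <ref> tags, {{templates}} (non-nested only) and category links.
PROTECTED = re.compile(
    r"<ref[^>]*/>|<ref[^>]*>.*?</ref>|\{\{.*?\}\}|\[\[Kategori:[^\]]*\]\]", re.S
)


def translate_wikitext(text, translate):
    """Replace protected markup with placeholders, translate, then restore it."""
    saved = []

    def stash(match):
        saved.append(match.group(0))
        return f"\x00{len(saved) - 1}\x00"

    shielded = PROTECTED.sub(stash, text)
    translated = translate(shielded)
    return re.sub(r"\x00(\d+)\x00", lambda m: saved[int(m.group(1))], translated)


# Hypothetical toy "translator": swaps a few Bokmål words for Nynorsk ones.
TOY_NB_NN = {"ikke": "ikkje", "jeg": "eg", "hva": "kva"}


def toy_translate(text):
    return re.sub(r"\w+", lambda m: TOY_NB_NN.get(m.group(0), m.group(0)), text)


print(translate_wikitext("Han vet ikke hva det er.<ref>ikke oversatt</ref> [[Kategori:Språk]]", toy_translate))
```

Whether reference text should stay untranslated is a policy choice; the point of the placeholders is that the markup itself survives a real MT engine, which might otherwise reorder or mangle it.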
Wikis for results
There could be special wikis into which bots could dump their “translations”. For example, en-de.wikipedia.org would be a German translation of the English Wikipedia. Bots might have different strategies for the order in which to translate the articles. To avoid duplicated work, the date of the original version and the translation method could be noted at the top of each translation. -- Sloyment 17:57, 15 December 2005 (UTC)
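A minimal sketch of the header suggested above. Recording which revision of the original was translated would also answer the earlier question of what happens when the original is modified after translation: if the recorded revision no longer matches the current one, the translation is stale. The field names and the comment format are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TranslationHeader:
    """Hypothetical metadata a bot could write at the top of each translation."""
    source_title: str
    source_revision: int   # revision ID of the original that was translated
    source_date: date
    method: str            # e.g. the name of the bot or MT engine used

    def render(self):
        return (f"<!-- translated from {self.source_title}, "
                f"revision {self.source_revision} ({self.source_date.isoformat()}), "
                f"method: {self.method} -->")

    def is_stale(self, current_revision):
        """True if the original has been edited since it was translated."""
        return current_revision != self.source_revision


header = TranslationHeader("Hard disk", 1234, date(2005, 12, 15), "toy-bot")
print(header.render())
```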
Getting ready: starting small to grow big
Human language translation is quite complex. Whatever is to be done it will require significant linguistic information to be obtained from whatever sources. This information would also have to be maintained and this maintenance could be done in much the same way Wikipedia articles are maintained.
There is a lot of work that can be done before any translation features are made: there is a severe lack of information specifying languages, pronunciations, dialects, alphabets, quotes, etc., i.e. lots of markup. For example, although the English Wikipedia is predominantly in English, it frequently features quotes in other languages in a way that an automatic translator cannot recognize. This is just one issue out of many. Eliminating them can only help a translation project.
Then there can be a proving ground. There are many side effects of integrating translation that need to be considered, some easily foreseeable and perhaps some not. A proving ground can help weed out unwanted effects or at least identify them early. Picking and doing an easy translation project first can therefore be beneficial, and such projects exist, for example translation between similar dialects of the same language. Although it may appear unnecessary (as dialects are mutually intelligible and generally understood by all speakers of the language regardless of their native dialect), it becomes important when articles are to be found, whether by using the Wikipedia search engine or an external one such as Google. For example, the word "milk" in my language is "mleko", "mlijeko", "mliko" and sometimes also "mljeko". This does not confuse any listener as they all sound similar, but if the "wrong one" is entered into the search field, the article won't be found...
I've written a lot about this and similar issues that various language Wikipedias are facing, specifically focusing on Serbian Wikipedia's Challenges (written in English). Please read when you have time - many ideas are collected there that would be useful for many languages covering possibly billions of readers...
--Aleksandar Šušnjar 21:22, 9 March 2006 (UTC)
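A hedged sketch of dialect-aware search for the "milk" example above: every query word is expanded to all of its known variant spellings before matching. The variant table is hypothetical and hand-written here; in practice it would come from a dictionary of the kind the follow-up comment describes.

```python
# Hypothetical, hand-written variant table: each set holds the dialect forms
# of one word (here the four forms of "milk" from the comment above).
VARIANT_GROUPS = [
    {"mleko", "mlijeko", "mliko", "mljeko"},
]

# word form -> the whole set of its variants
VARIANTS = {form: group for group in VARIANT_GROUPS for form in group}


def expand_query(query):
    """For each query word, list every known dialect form (or just the word itself)."""
    return [sorted(VARIANTS.get(word, {word})) for word in query.lower().split()]


def matches(query, text):
    """True if every query word occurs in the text in any of its dialect forms."""
    words = set(text.lower().split())
    return all(words & set(forms) for forms in expand_query(query))


print(expand_query("mliko"))
print(matches("mliko", "Mlijeko je bijelo"))  # True
```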
Regarding Apertium, posted below as a continuation of this post: it would be nice to try that. I mentioned Serbian dialects as extremely simple; essentially there is nothing to translate, since the differences stem from different pronunciations of a once-existing letter, Yat. That's all, but it requires a dictionary. Coupled with the availability of two alphabets (one "lossy"), dialect conversion plus transliteration can prove to be an interesting exercise. It has already been running on the official Serbian Wikipedia for about a week now, and we're seeing the initial onset of problems to learn from...
--Aleksandar Šušnjar 04:51, 10 March 2006 (UTC)
- Document! document! document! :))
- When I heard about it first my first thoughts were for a Macedonian <-> Bulgarian adaptation, but that would be much more work. - FrancisTyers 11:25, 10 March 2006 (UTC)
Here's the document :)) --64.119.118.231 13:23, 10 March 2006 (UTC)
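For the transliteration half of the Serbian experiment described above, the Cyrillic-to-Latin direction is a fixed letter-by-letter mapping, sketched below in Python. It is deterministic but not reversible: њ and н+ј both become nj (e.g. коњ and конјункција), so going back from Latin to Cyrillic needs a word list. The function name is hypothetical.

```python
# Serbian Cyrillic -> Latin. Each Cyrillic letter has exactly one Latin
# spelling, but "lj", "nj" and "dž" are ambiguous on the way back.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}


def to_latin(text):
    out = []
    for ch in text:
        lat = CYR_TO_LAT.get(ch.lower())
        if lat is None:
            out.append(ch)                # not Serbian Cyrillic: keep as is
        elif ch.isupper():
            out.append(lat.capitalize())  # Љ -> Lj (all-caps text would need "LJ")
        else:
            out.append(lat)
    return "".join(out)


print(to_latin("Млијеко"))
```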
Apertium
Just to mention it here too, en:Apertium is an open-source project to translate between related languages. When translating a short article from es:Privatización to ca:Privatització using the online version [4], it seemed to make only two mistakes [5]. It would be interesting to test this out on a larger scale. - FrancisTyers 22:48, 9 March 2006 (UTC)
Languages that can benefit from this technology (pages as Wikipedia articles):
- Spanish (268,547 pages) <--> Catalan (71,407 pages)
- Spanish (268,547 pages) <--> Galician (28,482 pages)
Languages that can potentially benefit from this technology:
- Bulgarian (80,520 pages) <--> Macedonian (12,637 pages)
- Persian (28,444 pages) <--> Tajik (30 pages)
- Indonesian (55,963 pages) <--> Malaysian (25,384 pages)
- Hindi (3,800 pages) <--> Urdu (3,379 pages)
OmegaT
It is a CAT-Tool ... GPL ... and it could easily be adapted to read from a wiki page, have text translated on the local computer and store the translated text to a wiki page - plans on that end already exist. See also en:OmegaT --Sabine 13:19, 21 April 2006 (UTC)
Machine translation for contents creation
Machine translation for fast content creation only makes sense if the text is afterwards proofread by a native speaker. It will work only for certain languages; for most language combinations machine translation creates a huge mess, and cleaning up takes longer than working normally with a CAT tool (CAT = Computer Assisted Translation). I would very much like to go deeper into this topic; being a translator, it is one of the things that interest me, but time is short, so I will come and have a look every now and then. If you have specific questions, also on OmegaT, please contact me. Best, --Sabine 13:22, 21 April 2006 (UTC)
- I generally agree for most machine translation. However, as mentioned above, for translating between closely related languages, it might not hold. - FrancisTyers 14:06, 6 May 2006 (UTC)
wixi: chunk translation interface
wixi interface can help users create translation data useful to machine translation.. wixi will enable translators to easily format text for language learners (and when working with CAT tools, provide instructive feedback to MT sources).. basic xcroll interface includes two input plus one preview window formatting text with twext..
system promises to build bridges between a wide variety of languages; especially useful with songs, pithy quotes, proverbs, sayings, poems, etc.. System could integrate with variety of softwares, including multilingual wikis and mediawiki.. http://twext.cc/license is GPL/CCL, with flexibility to host variable licenses by permission.. system focuses neither on grammar study (boring), nor "immersion" (stressful), but rather on "comprehensible input" http://www.sk.com.br/sk-krash.html
OLPC showed some interest in project.. http://www.gelbukh.com/ shows interest in mentoring development of wixi for inclusion in Google's Summer of Code for 2008 and may convince students to begin development in the academic year of 2007-2008..
http://wixi.cc/lucha-libre challenges computer science students in México and all over to deliver a simple twexter, potentially useful to a large number of language learners around the world, including millions of kids using the OLPC.
Considerations on the complexity of machine translation.
I am Italian and would like to set out, without any claim to originality, some thoughts on a possible way to simplify translation between different languages and make it more precise.
1) As the number of languages grows, the complexity of translating between them increases according to a quadratic law, similar to the one that governs the number of diagonals of a polygon.
2) All languages have evolved from historically earlier languages. In many cases several languages derive from a historically well-established language (one rich in literature), such as Latin or Greek: e.g. Italian, Spanish, Portuguese, French...
3) If an ancestor language were used as the intermediary, one could expect a considerable simplification: for a group of languages derived from the same ancestor, using that ancestor as the intermediate language would make the growth in translation complexity linear, like the number of segments joining one point to the vertices of a polygon.
4) One could therefore envisage a project to recover the glossaries of the ancestor languages from the available literature, build machine translators from the ancestor languages to the modern ones and vice versa, and then translate between two modern languages by means of two translation machines that use the ancestor language as a bridge. Does a project of this kind already exist?
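The counting argument in points 1) and 3) can be checked with a few lines of arithmetic. This is a minimal sketch, assuming one translation system per direction and a pivot (here, the ancestor language) that is not itself one of the n modern languages; the function names are illustrative only. The number of unordered language pairs, n(n-1)/2, equals the sides plus the diagonals of an n-sided polygon.

```python
def language_pairs(n: int) -> int:
    """Unordered language pairs: n(n-1)/2, the sides plus diagonals of an n-gon."""
    return n * (n - 1) // 2


def direct_systems(n: int) -> int:
    """Directed translators needed if every language is translated straight
    into every other language: grows quadratically."""
    return n * (n - 1)


def pivot_systems(n: int) -> int:
    """Directed translators needed if each language is only translated to and
    from one shared pivot language outside the group: grows linearly."""
    return 2 * n


if __name__ == "__main__":
    for n in (2, 3, 5, 10, 20):
        print(f"{n:>3} languages: {direct_systems(n):>4} direct, "
              f"{pivot_systems(n):>3} via pivot")
```

For 10 languages this gives 90 direct systems against 20 via a pivot, while for 3 languages the two are equal (6 each), so the saving only appears as the group grows. The trade-off is that every translation passes through two systems, so errors from the first step can carry into the second.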
Idea for a very basic translation aid
This is an idea for an aid for translators which I think would be easy to program and implement on Wikipedia.
We already have Wiktionary; it is already an aid to translators, with lists of translations of words. This idea would go one step further.
Instead of having to look up words one at a time in Wiktionary, the human translator could essentially look up many words with one click of the mouse. This would be similar to the automatic translators available on the Internet, but might cover many more languages (the languages covered would depend on user input, just as Wiktionary's do).
Input: "I want to write a report." Click on "look up French words". The output could look like this:
- I: je
- want: vouloir (veux,voudrai,voulu); préférer
- to: à
- write: écrire (écris,écrirai,écrit); rédiger
- a: un, une
- report: rapport(m); dossier(m)
Description of output: a brief output for each word of input. Perhaps the output is always the same for a given input word. The output gives one word or a choice of several words. It gives lexical information such as the gender of nouns, and a few forms of irregular verbs, not so many as to take up a lot of space, but enough to allow someone familiar with the language to produce many verb forms even if they are not familiar with the particular verb or need to be reminded about it; each word would also be a link to the Wiktionary entry or something so the person could get more complete information if desired.
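A minimal sketch of the one-click lookup described above, assuming a small hypothetical in-memory English-to-French glossary. `Entry`, `GLOSSARY` and `look_up` are illustrative names, not an existing Wiktionary API; in the proposal the data would come from user-supplied entries. Each input word maps to one or more candidates with optional short notes (gender, a few irregular verb forms), and unknown words are flagged with `?`.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Candidate translations for one source word, plus optional short notes."""
    words: list[str]
    notes: dict[str, str] = field(default_factory=dict)  # candidate -> gender/forms


# Hypothetical English -> French glossary mirroring the example above.
GLOSSARY = {
    "i": Entry(["je"]),
    "want": Entry(["vouloir", "préférer"], {"vouloir": "veux, voudrai, voulu"}),
    "to": Entry(["à"]),  # context-free lookup, so "to" always becomes "à"
    "write": Entry(["écrire", "rédiger"], {"écrire": "écris, écrirai, écrit"}),
    "a": Entry(["un", "une"]),
    "report": Entry(["rapport", "dossier"], {"rapport": "m", "dossier": "m"}),
}


def look_up(sentence, glossary):
    """Return one output line per input word; unknown words are marked '?'."""
    lines = []
    for token in re.findall(r"[\w']+", sentence):
        entry = glossary.get(token.lower())
        if entry is None:
            lines.append(f"- {token}: ?")
            continue
        shown = [f"{w} ({entry.notes[w]})" if w in entry.notes else w
                 for w in entry.words]
        lines.append(f"- {token}: " + "; ".join(shown))
    return lines


if __name__ == "__main__":
    print("\n".join(look_up("I want to write a report.", GLOSSARY)))
```

Run on the example sentence, this prints lines such as `- want: vouloir (veux, voudrai, voulu); préférer`. Linking each candidate to its Wiktionary entry would be a presentation step on top of this.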
Advantages over Wiktionary: The person could get many words looked up at once with a single click. The person would not have to skim a whole Wiktionary page (pronunciation, meaning, etymology, translations into languages other than the target language, etc.) to find the one line of information they want.
Advantages over machine translations as translation aids: Possibly available in more languages (depends on users entering words into Wiktionary or similar database). Gives more than one suggestion for each word, so the translating person can choose the one they find most appropriate.
Limitations: the user needs to ignore "to" being translated into "à", realizing that this is just the computer being stupid. The person still needs to know the grammar of the language they're translating into; they still need to have some knowledge of both languages.
Uses: I would find this useful as an aid to translating when either the language I'm translating from or the one I'm translating into is a language whose grammar I know reasonably well but in which I don't have a big vocabulary. I would also find it useful when translating between languages I know well, especially if I'm tired, as I'll sometimes temporarily forget a word even in my native language when I've just been looking at another language; I think this is a common human trait related to the part of the brain that lets us use one language without frequently throwing in words from other languages by mistake. I would glance at the list of words and it would remind me what the words are, or perhaps suggest alternative translations such as rédiger, which I might not think of on my own but might recognize as a good word to use when I see it in the list.
Development: This would require some simple software, some sort of interface to make it available on one of the Wiki projects, and users to supply the single-word translations. Perhaps large numbers of these could easily be extracted from Wiktionary pages (though perhaps without the lexical information, which could be added later). --Coppertwig 04:47, 10 December 2006 (UTC)
Phrase translation database
I think the Wiktionaries are for single words only; am I wrong about that? I think we need to build an extensive database of translated phrases, e.g. "conflict of interest" defined and translated into many languages, just as we do for words: either within the existing Wiktionaries or, if the Wiktionaries don't want that, in separate projects for phrases. --Coppertwig 04:53, 10 December 2006 (UTC)
Phrase translating tool
I've already started something like this for my own use in translating from the English to the Afrikaans Wikipedia, and it's easily extendable to other languages (not only from English to X-lang, but from X-lang to Y-lang). My aim was to write a tool that could support translators' efforts to populate all the South African language Wikipedias. I've found it remarkably useful in speeding up my work. I have used it mainly for translating things like country templates, not the actual body of articles, which is a different matter entirely.
It works like this:
- copy the template from the English edit page
- paste into a textarea on a webpage and click translate
- the new version appears, and can be pasted into the new Afrikaans article.
Current status
- there are currently 165 strings in my tiny little database
- the database consists, basically, of a table with phrases, and an English and Afrikaans column.
- strings can be single words, such as French (translated to Frans) or phrases such as French language, translated to Franse taal. You can see the advantage in using a phrase translator.
- For obvious reasons, the translation tool works from the longest string (in English) to the shortest, so the phrase French language is translated before French.
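The longest-first rule can be sketched as follows. This is a hypothetical Python reconstruction, not the actual code of the tool described here; it assumes case-sensitive, whole-word matching and a single left-to-right pass, so text that has already been translated is never matched again. The two table entries are the examples from the list above.

```python
import re

# The two examples from the list above (English -> Afrikaans).
PHRASES = {
    "French": "Frans",
    "French language": "Franse taal",
}


def translate(text, table):
    """Replace every known phrase in one pass, preferring the longest match.

    Assumes keys start and end with word characters, so \\b marks whole words.
    """
    if not table:
        return text
    # Python's regex engine tries alternatives left to right, so listing longer
    # keys first makes "French language" win over "French" at the same position.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(r"\b" + re.escape(k) + r"\b" for k in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)


if __name__ == "__main__":
    print(translate("French language, French", PHRASES))
```

Doing all replacements in one regex pass, rather than one string replace per phrase, avoids a shorter phrase accidentally matching inside text that a longer phrase has just produced.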
The future
- I envisage extending the tool to include other languages. This can easily be done by adding a new table column in the database, and populating these.
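Adding a language as a new column can be sketched with Python's built-in `sqlite3` module. The schema (a `phrases` table with one column per language code) and the helper names are assumptions for illustration. A Zulu column `zu` can be added empty, which shows that a new language starts with no pairs until someone fills it in. Because SQL column names cannot be passed as query parameters, the helpers validate them and quote them before building the query.

```python
import re
import sqlite3


def open_db():
    """In-memory phrase table with one column per language code (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE phrases ("en" TEXT, "af" TEXT)')
    conn.executemany(
        'INSERT INTO phrases ("en", "af") VALUES (?, ?)',
        [("French", "Frans"), ("French language", "Franse taal")],
    )
    return conn


def columns(conn):
    # PRAGMA table_info yields (cid, name, type, notnull, default, pk) rows.
    return {row[1] for row in conn.execute("PRAGMA table_info(phrases)")}


def add_language(conn, code):
    """Add a language by adding a column; existing rows get NULL for it."""
    if not re.fullmatch(r"[a-z]{2,3}", code) or code in columns(conn):
        raise ValueError(f"bad or duplicate language code: {code!r}")
    conn.execute(f'ALTER TABLE phrases ADD COLUMN "{code}" TEXT')


def pair_table(conn, src, tgt):
    """Map src -> tgt for every row where both columns are filled in."""
    if not {src, tgt} <= columns(conn):
        raise ValueError("unknown language column")
    rows = conn.execute(
        f'SELECT "{src}", "{tgt}" FROM phrases '
        f'WHERE "{src}" IS NOT NULL AND "{tgt}" IS NOT NULL'
    )
    return dict(rows.fetchall())
```

Usage: `pair_table(open_db(), "af", "en")` returns the Afrikaans-to-English direction from the same rows, which is one way the X-lang to Y-lang translation mentioned earlier could reuse a single table.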
- I also see people being able to submit sample translation phrases online to rapidly grow the database. Potentially, phrases could also be stored per user, so that if they're unhappy with the 'central' translation, they can substitute their own and not be slowed down in their editing.
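Per-user phrases could sit in front of the central table, so that a user's own choice wins without changing anyone else's. A minimal sketch using Python's `collections.ChainMap`; the override shown is hypothetical, and `effective_table` is an illustrative name.

```python
from collections import ChainMap

# Central (shared) phrase table, using the examples given earlier.
central = {"French": "Frans", "French language": "Franse taal"}

# Hypothetical: one user prefers a different rendering of one phrase.
user_overrides = {"French language": "Frans"}


def effective_table(user, shared):
    """Look phrases up in the user's own table first, then in the shared one.

    Writes to the returned ChainMap go to the user's table only, so the
    central table stays untouched.
    """
    return ChainMap(user, shared)
```

A lookup such as `effective_table(user_overrides, central)["French language"]` returns the user's choice, while phrases the user has not overridden fall through to the central table.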
The point is that it's not meant to be a complete or 100% accurate translation, but it greatly simplifies the work of an editor. The editor will of course go through the resulting page manually, translating any missing strings and sorting out any grammatical or other oddities that arise.
I know there are existing dictionaries and perhaps tools, but I was frustrated trying to use them, and decided to whip something up quickly for my own use.
It's embarrassingly ugly at present, but I'll post the URL if anyone shows interest! You can also email me via my user page. Greenman 19:21, 9 January 2007 (UTC)
- Ok, here it is :) Feel free to post me suggested languages, or suggested phrases for me to manually add for now, until I develop the more streamlined method, populate more thoroughly etc. Greenman 21:33, 9 January 2007 (UTC)
- That's very good. Now, can you go one more step and set it up as a wiki, so that many people can easily add phrases? That will be much more useful. Another thing someone might do is automatically load in huge numbers of translations of individual words from Wiktionary.
- What I would really like to see is: a machine translation program for which the program itself is relatively simple, and the program reads grammar and vocabulary which have been written by many people just the way people are writing Wiktionary, except that it has just a little more information about how the words fit together in the language.
- When translating an article, I would like to be able to edit the Wiktionary pages for the words in the article and immediately be able to run the automatic translator using the new information I just added. That way I could check that the information works correctly, and the next person translating an article with similar words and phrases might see them working correctly for them. --Coppertwig 22:45, 14 January 2007 (UTC)
- I'm busy messing around with Pootle, as I think that may be the most convenient way to contribute words. Failing that, I'll put up a wiki soon. Greenman 23:43, 14 January 2007 (UTC)
- I think putting up a wiki may be a useful step. Interfacing with Wiktionary may be even more useful. See also User:Coppertwig/Wikiglotbot for a vision of how machine translation may work using wikis. --Coppertwig 02:33, 16 January 2007 (UTC)
- A wiki is up, linked to off the main translate tool. It's PHPWiki, so not as easy to use, but can be a start. Feel free to add any languages there. I'll need to manually sync the wiki with the translation tool for now, but I will try and do this regularly while I work on the automation, imports etc. Greenman 13:44, 17 January 2007 (UTC)
Is this project active or has this sort of work relocated? (edmund.huber - Fri Jul 17 15:47:14 EDT 2009)
New page
I have updated the page (after years) with some ideas, drawing heavily on the Apertium Wiki. Please comment; I would really like to know your ideas, people! :) Tresoldi 16:59, 13 March 2010 (UTC)
Verifiability - Machine Translation - Request for Comments
- Comments are requested from all interested editors at a discussion to amend WP:V. Please participate. Do you support the proposal to amend the guidance in WP:NONENG regarding the use of machine translations, as given below? Please note that the scope of WP:NONENG is limited to the translation of non-English sources for use in English Wikipedia.
The proposal is to replace this sentence in WP:NONENG :
- Translations published by reliable sources are preferred over translations by Wikipedians, but translations by Wikipedians are preferred over machine translations.
with the following :
- Translations published by reliable sources are preferred over translations by Wikipedians, and should always be attributed. A machine translation may be used in the text of the article only if the Wikipedian speaks the source language and confirms the accuracy of the translation.
- Footnote: Attributions and confirmations may be provided on the talk page or in the edit summary.
Please add your comments at WP:V:talk and not here. Thanks. Rubywine 02:26, 17 August 2011 (UTC)
Do we know of any past system, other than mw:Content translation, which provided CAT/MT within a rich text editor? In my past research I didn't encounter any. --Nemo 15:08, 28 February 2015 (UTC)