IRC office hours/Office hours 2014-01-08
Time: 17:00-18:00 UTC
Channel: #wikimedia-office
Timestamps are in UTC.
[17:01:10] <arrbee> === Wikimedia Language Engineering Office Hour - Start ===
[17:01:31] <arrbee> Hello, Welcome to the monthly office hour of the Wikimedia Language Engineering team. This is the first office hour of 2014.
[17:01:47] <arrbee> My name is Runa and I am the Outreach and QA coordinator for the team.
[17:02:00] <arrbee> Our last office hour was held on December 11, 2013
[17:02:17] <arrbee> The logs are at https://meta.wikimedia.org/wiki/IRC_office_hours/Office_hours_2013-12-11
[17:02:35] <arrbee> Please note, as the topic of the channel suggests the chat today will be logged and publicly posted.
[17:03:07] <arrbee> For those who haven't met us before, we are the Wikimedia Language Engineering team and we work on enabling support for all the languages of the world on MediaWiki and Wikimedia projects.
[17:03:38] <arrbee> Currently, we provide tools and features for nearly 300 languages. Using these tools users around the world can read and edit Wikimedia websites in their own languages.
[17:03:43] <marktraceur> *all* activity in this channel is logged and publicly posted. Wikimedia cannot be held liable for lost or stolen property while in this channel. Keep your arms and hands inside the vehicle at all times.
[17:04:04] <arrbee> Thanks marktraceur :)
[17:04:40] <arrbee> The other members of our team present today are: aharoni Nikerabbit kart_ siebrand and pginer
[17:04:50] <Nikerabbit> hi
[17:04:55] <siebrand> hi
[17:05:03] <arrbee> Joining us today for the first time is our newest team member - David Chan aka divec
[17:05:09] <divec> Hello
[17:05:09] <siebrand> wee!
[17:05:30] <arrbee> Some of you may have already met him. He is currently sharing his time between VisualEditor and Language Engineering.
[17:05:51] <arrbee> With David joining the team, we hope to work more closely with the VisualEditor team to bring internationalization features to VisualEditor.
[17:06:14] <arrbee> Today we will briefly talk about our recent work.
[17:06:22] <arrbee> After that we are open for questions.
[17:06:35] <arrbee> If you'd like to send in your questions before the Q&A, please send them to me as a private message.
[17:06:45] <arrbee> === Project Updates ===
[17:07:03] <arrbee> === Completion of the new translatewiki.net main page ===
[17:07:18] <arrbee> Development of the TwnMainPage extension was completed.
[17:07:29] <arrbee> translatewiki.net now has a new user registration process for new users.
[17:07:49] <arrbee> New users have to provide test translations, which are reviewed by administrators. If the test translations look OK, the administrators promote the new users to translator role.
[17:08:06] <arrbee> Logged-in users have a new dashboard on the main page that provides insight into their activity compared to that of other users.
[17:08:29] <arrbee> You can see this by visiting https://translatewiki.net/
[17:09:03] <arrbee> Nikerabbit: anything to add here before we move to the next topic?
[17:10:08] <arrbee> Probably not :)
[17:10:12] <arrbee> === Changes to Plural Rules ===
[17:10:14] <Nikerabbit> arrbee: I don't have stats but tens of translators have gone through that process already
[17:10:45] <arrbee> Nikerabbit: That's impressive. Any plans to have the stats collected sometime?
[17:11:08] <siebrand> See https://translatewiki.net/wiki/Special:Log/newusers
[17:11:27] <Nikerabbit> we are currently relying on that + the graphical version of that
[17:11:47] <arrbee> That's a long list.
[17:11:54] <arrbee> Nice!
[17:12:01] <arrbee> Thanks Nikerabbit siebrand
[17:12:16] <arrbee> Moving on
[17:12:23] <arrbee> === Changes to Plural Rules ===
[17:12:33] <arrbee> MediaWiki now supports plural rules according to CLDR 24.
[17:12:48] <arrbee> This removes local customizations in MediaWiki, which are not necessary any more, and makes the plural rules support more robust and comprehensive.
[17:13:13] <arrbee> To make it possible, over a thousand current translations in Belarusian, Russian, Serbian, Ukrainian, and languages that have Russian as fallback were updated.
[17:13:30] <arrbee> This was done automatically at first, and the translators for these languages are making further corrections as needed.
[17:13:56] <arrbee> Details are described at http://lists.wikimedia.org/pipermail/mediawiki-i18n/2014-January/000797.html
[17:14:16] <arrbee> In a recent blog post, Niklas (Nikerabbit) has described the use of CLDR data for plural use in MediaWiki i18n development: http://laxstrom.name/blag/2014/01/05/mediawiki-i18n-explained-plural/
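To illustrate what a plural rule decides, here is a minimal sketch of the CLDR categories for Russian, restricted to non-negative integers (fractions, which fall under "other", are left out). The function names and the sample word forms are hypothetical and chosen for illustration; this is not MediaWiki's implementation.

```python
def russian_plural_category(n: int) -> str:
    """Return the CLDR plural category for a non-negative integer in Russian."""
    mod10, mod100 = n % 10, n % 100
    # "one": ends in 1, except numbers ending in 11 (1, 21, 101, but not 11).
    if mod10 == 1 and mod100 != 11:
        return "one"
    # "few": ends in 2-4, except numbers ending in 12-14 (2, 23, but not 12).
    if 2 <= mod10 <= 4 and not 12 <= mod100 <= 14:
        return "few"
    # "many": every remaining integer (0, 5-20, 25-30, 111, ...).
    return "many"


def choose_form(n: int, forms: list[str]) -> str:
    """Pick the word form for n from forms ordered as [one, few, many]."""
    order = ["one", "few", "many"]
    index = order.index(russian_plural_category(n))
    # If a translator supplied fewer forms, fall back to the last one given.
    return forms[min(index, len(forms) - 1)]


# Hypothetical sample: the Russian word for "file" in its three forms.
FILE_FORMS = ["файл", "файла", "файлов"]
for count in (1, 3, 5, 11, 21):
    print(count, choose_form(count, FILE_FORMS))
```

A translation that provides only two forms would hit the fallback branch, which is the kind of message that needs a translator's review after a rules change.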
[17:15:11] <arrbee> aharoni: would you like to add anything about the changes that had to be made for the languages that we mentioned?
[17:15:21] <aharoni> Yes...
[17:15:33] <aharoni> The update was surprisingly smooth and simple,
[17:15:56] <aharoni> Even though the backend changes, mostly done by santhosh and Tim Starling, were quite complicated,
[17:16:02] <aharoni> and a lot of translations had to be updated.
[17:16:22] <arrbee> Would you know how many… approximately?
[17:16:52] <aharoni> Much more than a thousand. siebrand may have a more precise number.
[17:17:12] <aharoni> We sincerely thank the translators to Belarusian, Russian, Serbian, Ukrainian and other languages who are working to fix the last glitches there.
[17:17:21] <arrbee> :)
[17:17:21] <siebrand> About 4500 translations were potentially affected.
[17:17:40] <siebrand> Eventually some 2000 were updated, and I think 3000 were marked as outdated and needing review.
[17:17:53] <aharoni> I know Russian, so I contributed there a bit, as well :)
[17:18:06] <arrbee> Okay. That's still a lot.
[17:18:12] <arrbee> Thanks aharoni siebrand
[17:18:27] <siebrand> I know people who can write regular expressions (/me winks at legoktm and Nikerabbit) and I can operate Pywikibot, so I contributed as well :)
[17:18:28] <arrbee> Next topic
[17:18:43] <arrbee> === Support for localization files in JSON ===
[17:18:46] <siebrand> (or actually, SieBOt did)
[17:19:04] <arrbee> MediaWiki always stored translations in PHP files.
[17:19:27] <arrbee> This is not portable, because it's unique to MediaWiki, and it's also not adequately secure.
[17:19:43] <arrbee> JSON is a simple and secure portable format that can be processed in any programming language.
[17:20:02] <arrbee> Most importantly, the format can be shared between backend PHP and frontend JavaScript.
[17:20:17] <arrbee> Translations can now be stored either in the old PHP format or in JSON format.
[17:20:44] <arrbee> Roan Kattouw from the VisualEditor team developed a script that allows easy conversion of an extension from PHP translations to JSON translations.
[17:21:00] <arrbee> It will output code which makes the extension fully backward compatible with older MediaWiki versions all the way to 1.16.
[17:21:13] <arrbee> Both formats are currently supported. The plan is to remove support for the PHP format by the end of 2014.
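As a rough illustration of the portability point, the sketch below reads two small JSON message files from a language other than PHP or JavaScript. The layout (one flat object per language, with an "@metadata" entry next to the message keys, and "$1"-style parameters) is assumed here for the example, and the message keys, texts, and helper names are hypothetical. Parsing the file only produces data; nothing in it is executed.

```python
import json

# Hypothetical contents of an English and a Finnish message file.
EN_JSON = '{"@metadata": {"authors": ["Example"]}, "greeting": "Hello, $1!", "farewell": "Goodbye"}'
FI_JSON = '{"@metadata": {"authors": ["Example"]}, "greeting": "Hei, $1!"}'


def load_messages(text: str) -> dict[str, str]:
    """Parse a JSON message file and drop non-message keys such as @metadata."""
    data = json.loads(text)
    return {key: value for key, value in data.items() if not key.startswith("@")}


def get_message(key: str, messages: dict[str, str],
                fallback: dict[str, str], *params: object) -> str:
    """Look up a message, fall back to another language, then fill in $1, $2, ..."""
    template = messages.get(key, fallback.get(key, f"<{key}>"))
    # Substitute higher-numbered parameters first so "$1" never clobbers "$10".
    for number in range(len(params), 0, -1):
        template = template.replace(f"${number}", str(params[number - 1]))
    return template


en = load_messages(EN_JSON)
fi = load_messages(FI_JSON)
print(get_message("greeting", fi, en, "Runa"))  # translated in Finnish
print(get_message("farewell", fi, en))          # missing in Finnish, English fallback
```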
[17:21:41] <arrbee> Nikerabbit: any further comments about this?
[17:21:52] <protonk> Ironholds: ^
[17:22:07] <Nikerabbit> arrbee: yes
[17:22:21] <Nikerabbit> only a few extensions have been converted so far
[17:22:42] <Nikerabbit> more extensions will be converted as well as core once we have full json support in localisation update
[17:23:51] <aharoni> ("localisation update" is a piece of software that automatically updates the translations in all the projects a few days after they are made in translatewiki.net)
[17:24:09] <arrbee> Thanks Nikerabbit . We may see more questions about this.
[17:24:28] <arrbee> === Upcoming Projects ===
[17:24:39] <arrbee> Yesterday we started development on the Content Translation project.
[17:24:59] <arrbee> This will result in a tool to make it easy to translate pages from one Wikipedia language version to another one.
[17:25:13] <arrbee> The project page is at https://www.mediawiki.org/wiki/Content_translation
[17:25:30] <arrbee> A prototype can be viewed here: http://pauginer.github.io/prototype-translate/translation-center.html
[17:26:07] <arrbee> The progress of this story can be tracked at: https://wikimedia.mingle.thoughtworks.com/projects/language_engineering/cards/3623
[17:26:23] <arrbee> We will have more updates and something real to show on this project in our next office hour.
[17:26:48] <arrbee> siebrand: would you like to say something more about the plans for this project?
[17:26:59] <siebrand> Sure.
[17:27:05] <siebrand> It's our latest plot to take over the world.
[17:27:24] <siebrand> It's in its infancy. We're starting small, and plan to launch it as soon as possible as a beta feature.
[17:27:34] <siebrand> It will not be able to do much in the beginning.
[17:27:51] <siebrand> We have many, many, many, many, many, many features that can be added to it over time.
[17:28:07] <siebrand> We'll probably draw a line somewhere, and launch it as "done".
[17:28:16] <siebrand> It's not really clear, when that is.
[17:28:34] <siebrand> I think it'll be a project that keeps our team busy for a while.
[17:28:37] <siebrand> The potential is huge.
[17:28:48] <siebrand> There have been tools before, that haven't been used a lot.
[17:29:13] <siebrand> We think that developing it as Wikimedia will increase trust and integration, as well as control over the development direction.
[17:29:21] <siebrand> Stay tuned; we think this can be huge.
[17:29:35] <arrbee> Indeed :)
[17:29:42] <arrbee> Thanks siebrand
[17:29:46] <alolita> siebrand: can you clarify control over development direction
[17:29:55] <alolita> is that feature creep
[17:30:19] <alolita> or defining scope better
[17:30:31] <siebrand> If someone other than Wikimedia Foundation develops a feature, that party has control over how it's being developed, and final say over what is in and out.
[17:30:41] <siebrand> If WMF has the budget, it's obviously WMF.
[17:30:46] <siebrand> Hence, more control over development.
[17:31:35] <alolita> thanks! do we envision APIs for 3rd party use (many years later)
[17:33:37] <arrbee> siebrand: ^^
[17:33:54] <siebrand> arrbee: Oh, was that a question?
[17:34:09] <siebrand> I read it as "we do envision APIs for 3rd party use (many years later)"
[17:34:30] <siebrand> The answer is: I don't know what 3rd party APIs are meant here. It's a maybe.
[17:34:52] <siebrand> All of the code we develop has open licenses, but this feature is primarily developed for use in Wikimedia wikis.
[17:34:52] <Nemo_bis> siebrand: what do you mean by "develops a feature"?
[17:35:40] <siebrand> Our target project for now is Wikipedia, and possibly later Wikivoyage because of the similarities in structure of the main namespace.
[17:36:40] <alolita> do we plan to use open source tools for MT on the backend?
[17:37:00] <siebrand> Nemo_bis: A feature is a distinguishing characteristic of a software item (according to IEEE 829)
[17:37:51] <siebrand> alolita: We plan to allow any MT or TM backend supported to be used.
[17:38:02] <siebrand> hmm, that's a weird sentence.
[17:38:08] <siebrand> Let's rephrase that.
[17:38:30] <siebrand> There are two or three major machine translation parties: Google, Microsoft and Yandex.
[17:38:37] <siebrand> Google has a fully paid API.
[17:38:57] <Nemo_bis> siebrand: so anyone developing one fitting that definition will be able to get it included?
[17:39:07] <siebrand> We are talking to them to ask them if Wikimedia can use the Google Translate API at no cost.
[17:39:11] <siebrand> That's in progress.
[17:39:22] <Nikerabbit> there is also Apertium and Moses, which might do better in limited contexts
[17:39:34] <siebrand> Microsoft also has a paid API when more than 2 million characters per month are to be translated.
[17:39:34] <alolita> There are open MTs such as Moses and Apertium
[17:39:39] <siebrand> Wikimedia has such a requirement.
[17:39:45] <siebrand> So we'll have to talk to them, too.
[17:39:50] <alolita> are we planning to evaluate these for specific language pairs
[17:39:53] <siebrand> Then there's Yandex; same thing.
[17:40:16] <alolita> having non-open MTs can be an issue for us
[17:40:23] <alolita> as you know
[17:40:24] <siebrand> There are open source machine translation ENGINES, that's a bit different from the multi language backends I've mentioned just before this.
[17:40:27] <siebrand> Apertium is one of those.
[17:40:49] <siebrand> Apertium instances mostly support a single or a very few language pairs.
[17:41:10] <siebrand> I plan to prioritise support for those kinds of backends as a secondary feature.
[17:41:18] <divec> I suspect there will always be language combinations where there's enough data available to build an MT engine, but it's not commercially viable enough for Google, Microsoft etc to support.
[17:41:31] <siebrand> I really want to have it, but it will not supply the bulk of the content, so we'll do it later, rather than sooner.
[17:41:35] <alolita> and most commercial MTs are also pretty poor at suggestions where there is not much data for language pairs (such as Indic languages)
[17:42:05] <siebrand> Language Engineering itself has no plans to work on machine translation engines.
[17:42:20] <alolita> from a product perspective perhaps
[17:42:20] <siebrand> This is only about supporting APIs of MT engines as data supplier.
[17:42:21] <Nikerabbit> Something like Finnish -> Sami could be a special pilot project that could use Apertium (just throwing ideas)
[17:42:29] <siebrand> Does that answer the questions? If not, which are open?
[17:42:51] <alolita> Nikerabbit: yes
[17:43:45] <siebrand> Nemo_bis asked a question earlier: Will we accept classes that can interface with up to then unsupported MT backends?
[17:43:48] <siebrand> The answer is: yes.
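One way such pluggable backend classes could look is sketched below. Every name here (MTBackend, FixedPairBackend, BackendRegistry) is hypothetical and not taken from the project; the stand-in backend makes no network calls and only tags the text so the example can run on its own.

```python
from abc import ABC, abstractmethod
from typing import Optional


class MTBackend(ABC):
    """Hypothetical interface that a contributed backend class would implement."""

    @abstractmethod
    def supports(self, source: str, target: str) -> bool:
        """Say whether this backend can handle the language pair."""

    @abstractmethod
    def translate(self, text: str, source: str, target: str) -> str:
        """Return a draft translation for a human editor to correct."""


class FixedPairBackend(MTBackend):
    """Stand-in for an engine that serves only a few language pairs."""

    def __init__(self, name: str, pairs: set[tuple[str, str]]) -> None:
        self.name = name
        self.pairs = pairs

    def supports(self, source: str, target: str) -> bool:
        return (source, target) in self.pairs

    def translate(self, text: str, source: str, target: str) -> str:
        # A real class would call the engine's API here; this only tags the text.
        return f"[{self.name} {source}->{target}] {text}"


class BackendRegistry:
    """Tries the registered backends in order and uses the first that fits."""

    def __init__(self) -> None:
        self._backends: list[MTBackend] = []

    def register(self, backend: MTBackend) -> None:
        self._backends.append(backend)

    def suggest(self, text: str, source: str, target: str) -> Optional[str]:
        for backend in self._backends:
            if backend.supports(source, target):
                return backend.translate(text, source, target)
        return None  # no backend for this pair; the editor starts from scratch


registry = BackendRegistry()
registry.register(FixedPairBackend("pilot", {("fi", "se")}))
print(registry.suggest("Hei", "fi", "se"))
print(registry.suggest("Hei", "fi", "hi"))
```

Returning None for an unsupported pair keeps the editing workflow usable without a machine-generated draft, in line with the stated aim of bootstrapping human work, not replacing it.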
[17:44:14] <alolita> are we looking at scoping the language pairs we plan to support initially
[17:44:35] <alolita> do we have data on what language pairs are even possibilities beyond the traditional top 10 Latin based languages
[17:44:46] <Nikerabbit> code alone is not enough though, we need stable service to be able to use it at WMF
[17:45:48] <alolita> nikerabbit: agreed
[17:45:57] <siebrand> Google Translate currently supports these languages:
[17:46:07] <siebrand> http://translate.google.com/about/intl/en_ALL/
[17:46:14] <siebrand> That's a list of MANY languages.
[17:46:30] <siebrand> There is no question that we'll focus on "traditional top 10 Latin based languages"
[17:46:44] <Nikerabbit> "support" does not equal to "is helpful for translation task"
[17:48:01] <siebrand> The idea is that a translation is bootstrapped to make human copy editing easier.
[17:48:29] <siebrand> The aim is not to have 100% MT. Other websites than Wikimedia already do that.
[17:48:40] <siebrand> (including the aforementioned MT suppliers)
[17:48:44] <alolita> are the top 10 Latin languages where we need to provide translation for helping editors create content
[17:49:09] <alolita> or are we supporting high growth non Latin languages
[17:49:11] <siebrand> alolita: What do you mean by that comment?
[17:49:43] <alolita> based on those factors - the effectiveness of any MT drops significantly for non-Latin languages
[17:50:03] <siebrand> alolita: Which factors are you talking about?
[17:50:21] <arrbee> Quick time check: We are down to the last 10 minutes now.
[17:50:42] <alolita> siebrand: useful results for non Latin languages
[17:50:52] <alolita> from any MT or TM
[17:51:05] <siebrand> alolita: Can you define useful results?
[17:51:17] <siebrand> alolita: I think it very much depends on the language pair that is being requested.
[17:51:25] <alolita> siebrand: agreed
[17:51:32] <siebrand> alolita: regardless of the languages, some pairs will render better results than others.
[17:52:30] <alolita> siebrand: yes - better is very context driven too for each language pair
[17:53:30] <Nikerabbit> does anyone have questions for other topics?
[17:54:10] <arrbee> I was hoping if sucheta and Niharika could give us a quick update of the OPW project
[17:54:41] <Niharika> Hello everyone.
[17:54:44] <pavanaja> What is the status of integrating ULS with VE?
[17:55:05] <Niharika> Here's my progress report on the project: https://www.mediawiki.org/wiki/User:Niharika/Project_Progress_Report
[17:55:26] <sucheta> Hi all :) Niharika, our OPW intern has been working on Compacting interlanguage links as a beta feature. The latest patchset on her work on this, which is under review is: https://gerrit.wikimedia.org/r/#/c/104793/ .
[17:55:37] <alolita> Niharika: hi!
[17:55:42] <Niharika> Hi. :)
[17:55:53] <sucheta> The next steps would include improving the current code and eventually having it as a beta feature.
[17:56:08] <sucheta> She has been maintaining her blog here: http://niharika29.roon.io/. Also, the weekly updates from her can be tracked at https://www.mediawiki.org/wiki/User:Niharika/Project_Progress_Report
[17:56:35] <arrbee> sucheta: Niharika : Do you have a time when you would be ready to showcase it as a beta feature?
[17:57:06] <sucheta> arrbee, That would complete the project, technically :)
[17:57:15] <Nikerabbit> pavanaja: what kind of integration do you have in mind?
[17:57:17] <arrbee> sucheta: aha
[17:57:42] <aharoni> pavanaja: are you talking about keyboard support?
[17:57:53] <pavanaja> @Nikerabbit - at present ULS is not available in VE
[17:58:20] <pavanaja> @aharoni - Yes.
[17:58:38] <aharoni> divec, santhosh - can you answer pavanaja?
[17:58:52] <arrbee> (This will have to be the last question before we wrap up in another 2 minutes)
[17:58:57] <aharoni> I think that you know better than I would.
[17:58:59] <divec> pavanaja, aharoni: yes certainly
[17:59:36] <divec> There are various event handling issues blocking a number of IMEs from working, including jQuery.IME
[17:59:50] <divec> I'm working on these, e.g. https://gerrit.wikimedia.org/r/#/c/105231/ and https://gerrit.wikimedia.org/r/#/c/105172/
[18:00:00] <divec> It's obviously a top priority :-)
[18:00:37] <divec> That's about all I can explain in two minutes, but feel free to ask me more elsewhere!
[18:00:45] <arrbee> Thanks divec aharoni
[18:00:55] <alolita> pavanaja: please feel free to add your feedback to bugzilla where language support for VE is being tracked
[18:01:06] <arrbee> pavanaja: sorry I will have to cut you short. You can catch divec on #mediawiki-i18n
[18:01:19] <pavanaja> @alolita - done it long ago
[18:01:45] <arrbee> To wrap up..
[18:01:57] <arrbee> If nothing changes, our next office hour will be on 12th February 2014
[18:02:03] <alolita> pavanaja: thanks!
[18:02:12] <arrbee> Our office hours are held every 2nd Wednesday of the month.
[18:02:16] <arrbee> 1700-1800 UTC
[18:02:34] <arrbee> I'll be posting the logs later tonight
[18:02:58] <arrbee> Thanks everyone! See you next month.
[18:02:59] <kart_> arrbee: thanks!
[18:03:01] <arrbee> === Wikimedia Language Engineering Office Hour - End ===