IRC office hours/Office hours 2015-05-05
Log
Time: 14:30-15:30 UTC
Channel: #wikimedia-office
Timestamps are in IST.
20:00 arrbee: #startmeeting Language Engineering monthly office hour - May 2015
20:00 arrbee: hmm.. no meetbot
20:00 arrbee: anyways
20:01 arrbee: Hello and welcome to the monthly office hour of the Wikimedia Language Engineering team
20:01 arrbee is Runa
20:01 Niharika: Hi arrbee.
20:01 arrbee: with me today are my team mates aharoni santhosh pginer Nikerabbit
20:01 arrbee: kart_ would be joining soon
20:01 arrbee: hey Niharika
20:02 Niharika: Congratulations to the team for the awesome work on CX project. \o/
20:02 arrbee: Before we begin, please note that this chat will be publicly logged
20:02 arrbee: Niharika: Thank you :D
20:02 aharoni: Haumьhьƣьđ
20:03 arrbee waits to see if aharoni has a special greeting for today
20:03 aharoni: Of course I do.
20:03 arrbee: :D
20:03 arrbee: So our last office hour was on February 18th
20:03 aharoni: That's "hello" in Bashkir (the old https://en.wikipedia.org/wiki/Ya%C3%B1alif orthography).
20:04 arrbee: logs are at: https://meta.wikimedia.org/wiki/IRC_office_hours/Office_hours_2015-02-18
20:04 arrbee: (in case you missed them)
20:04 froskos: hello everybody!
20:04 arrbee: hello froskos
20:05 arrbee waves around to people who I am assuming are here for the office hour today - bachounda TarLocesilion :)
20:05 TarLocesilion: hey
20:06 arrbee: I hope this time is more convenient than what we were using before
20:06 arrbee: So we could not follow up with the March and April meetings due to too many things
20:06 arrbee: We also had our quarterly review
20:07 arrbee: http://meta.wikimedia.org/wiki/WMF_Metrics_and_activities_meetings/Quarterly_reviews/Editing,_Collaboration_and_Language_Engineering,_April_2015
20:07 arrbee: You can find the slides and minutes on the page linked above
20:08 arrbee: Also, if you haven't seen the announcement already, the Language Engineering team is now part of the bigger Editing team in WMF
20:08 arrbee: More details here on the FAQ: https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Engineering_reorganization_FAQ
20:09 arrbee: At the moment there are no major changes in the way we are working
20:09 arrbee: We are still available on the same IRC channels, mailing lists, talk pages, phabricator etc.
20:09 arrbee: Our main focus for this month and the next is still Content Translation
20:10 kart_: hello.
20:10 arrbee: hi kart_
20:10 arrbee: In our last published blog post, we had posted some figures in terms of how Content Translation was being adopted
20:11 arrbee: here is the link http://blog.wikimedia.org/2015/04/08/the-new-content-translation-tool/
20:11 arrbee: Since then we have observed an exponential growth in terms of users adopting the tool to publish articles
20:12 arrbee: For instance, between January and March around 700 articles were published
20:12 arrbee: and from April 1 to today it's more than 1500
20:12 arrbee: which is double of what we had seen in the first 3 months
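Editorial note: the two figures quoted above cover periods of different lengths, so a per-day comparison may be clearer. The sketch below assumes that "today" is the meeting date (2015-05-05) and treats "around 700" and "more than 1500" as exact counts purely for illustration; the variable names are not from the log.

```python
from datetime import date

# Figures quoted in the log, treated as exact counts for illustration only.
jan_mar_articles = 700    # "around 700" published January to March
apr_may_articles = 1500   # "more than 1500" from April 1 to the meeting date

# Period lengths in days (assumption: "today" is 2015-05-05, counted inclusively).
jan_mar_days = (date(2015, 4, 1) - date(2015, 1, 1)).days
apr_may_days = (date(2015, 5, 5) - date(2015, 4, 1)).days + 1

jan_mar_rate = jan_mar_articles / jan_mar_days
apr_may_rate = apr_may_articles / apr_may_days

print(f"January-March: {jan_mar_rate:.1f} articles/day over {jan_mar_days} days")
print(f"April-May:     {apr_may_rate:.1f} articles/day over {apr_may_days} days")
print(f"Ratio of daily rates: {apr_may_rate / jan_mar_rate:.1f}x")
```

Under these assumptions the total roughly doubled, as stated in the log, while the daily publishing rate rose by a larger factor because the second period is much shorter.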
20:13 arrbee: It's quite overwhelming for us and we are constantly exploring new ideas on how this can be made more efficient and reach more people at the same time
20:14 arrbee: Content Translation aka CX is now deployed on 44 Wikipedias... as an opt-in beta-feature
20:14 arrbee: And like with most things, we are also getting new bugs or usage issues being reported
20:15 arrbee: for instance, recently we have seen quite a few problems related to publishing or saving of articles
20:15 arrbee: The reasons have been varied but the reports have been very helpful in finding the causes
20:16 arrbee: Some of these errors are still being investigated
20:16 arrbee: santhosh: would you like to add more here?
20:17 santhosh_: nope
20:17 arrbee: okay
20:17 aharoni: I shall add that we improved our logging, so we can now investigate the failures better.
20:17 hasharAway: legoktm: no ci meeting indeed sorry :
20:18 aharoni: and we are thankful for each report from our users.
20:18 arrbee: Generally we get to know about these errors through comments on the talk page and phabricator
20:18 hashar: legoktm: I had an appointment. I screwed up and should have sent the ci meeting minutes ages ago as well as a reminder message that today's meeting was cancelled
20:18 hashar: legoktm: sorry you potentially had to wake up earlier :/
20:18 arrbee: And yes, it's really very helpful like aharoni said
20:19 arrbee wonders if anyone is here who faced some of these publishing errors
20:19 kart_: hashar: welcome to the Language Engineering Office hour! :)
20:21 hashar: kart_: sorry :/
20:21 arrbee: okay, so earlier during the day there were some queries coming in from the Polish Wikipedia community
20:21 arrbee: hashar: lol.. no worries
20:21 TarLocesilion: yep, we have some ideas of improvement
20:22 arrbee: TarLocesilion: yay.. please go ahead
20:22 aharoni: TarLocesilion: you may be happy to hear that we fixed the <Ś> letter issue earlier today ;)
20:23 arrbee: for context
20:23 TarLocesilion: aharoni: yes, I know :D
20:23 arrbee: The Alt+s shortcut currently gets in the way of typing the character ś
20:23 TarLocesilion: https://phabricator.wikimedia.org/T98126
20:23 TarLocesilion: and
20:23 TarLocesilion: https://phabricator.wikimedia.org/T98153
20:23 arrbee clicks
20:23 arrbee: the first is : Add "Translated page" template
20:24 TarLocesilion: first, many wikis have templates which inform users: "this page contains translation"
20:24 TarLocesilion: it'd be really, really nice if CX added such templates automatically.
20:25 arrbee: santhosh: aharoni ^^
20:25 TarLocesilion: such templates help power users to control all pages translated from a specific language
20:27 TarLocesilion: and ofc, are important for those readers who dare to read talkpages. #copyright etc.
20:27 aharoni: TarLocesilion: it's probably similar to https://phabricator.wikimedia.org/T96935
20:27 aharoni: copyright is handled through the edit summary (it was actually approved by the legal team ;) )
20:27 aharoni: but I guess that it's OK to add the template, too
20:28 Pavanaja: When will we have any dictionary or corpus support for the Content Translation tool?
20:28 aharoni: A much better solution would be to have true page metadata, something that a lot of projects have been requesting for a long time, but till then we can probably add templates.
20:29 TarLocesilion: aharoni: yep, but you know, info only in edit summary is kinda minimalism ;)
20:29 TarLocesilion: true.
20:29 santhosh_: Content Translation has dictionary support. But it is difficult to find a freely licensed, good-quality dictionary for every language pair. If we find one, we would be happy to add it.
20:29 aharoni: Another issue is that such templates work differently in different languages, but we can try to cope with that, too. (I'd love to see true cross-project collaboration between the editors in different languages here :) )
20:30 aharoni: Pavanaja: as santhosh_ says, the technical side is already implemented, but we need the data for every language.
20:30 santhosh_: We have it for es-ca , ca-es, ca-en pairs
20:30 arrbee: aharoni: okay.. i am kind of trolling here about this template so please feel free to shoo me away... what happens if an article with that template is translated through CX?
20:30 santhosh_: I mean we have dictionary support for above pairs
20:30 aharoni: arrbee: it's intended for the talk page, IIUC
20:30 arrbee: ahh ok
20:31 arrbee: yes, now I recall TarLocesilion saying so
20:31 aharoni: If I'm not mistaken, we have a dictionary for Spanish-Portuguese, too.
20:31 aharoni: In any case, we need the data.
20:32 aharoni: Pavanaja: if you have any connections with academic institutions or other projects that provide dictionary files, we'd love to get in touch with them and integrate them.
20:32 santhosh_: We can probably create a provision to add such a template, but it must be opt-in and configurable per wiki. We have a provision to add a category if we see too much machine translation.
20:32 santhosh_: Currently not used in any wiki
20:33 arrbee: Pavanaja: We saw the updates about Konkani Wikipedia. Do you think having CX in there would be helpful? Are they translating the articles from any other languages?
20:33 arrbee: TarLocesilion: We still have one more phab ticket of yours
20:34 TarLocesilion: yes, https://phabricator.wikimedia.org/T98153 -- a hidden category
20:35 kart_: TarLocesilion: contenttranslation tag may be useful too :)
20:35 kart_: which will show all articles created by CX
20:35 TarLocesilion: kart_ gotya. no.
20:35 TarLocesilion: not all. only recent.
20:36 kart_: TarLocesilion: oh that's true. thanks for correcting.
20:36 TarLocesilion: and that's very important.
20:37 TarLocesilion: because when you have a lot of translations (and apparently that's the upcoming fact on many wikis -- see stats) you can find only a few articles translated with CX.
20:37 arrbee: santhosh_: thoughts?
20:38 santhosh_: There are two ways: The tag filter page for example https://ca.wikipedia.org/w/index.php?title=Especial:Canvis_recents&tagfilter=contenttranslation allows you to choose how many to see and days
20:38 TarLocesilion: and here comes the traditional question asked by established communities who forgot the rocket period -- "what about quality"?
20:38 santhosh_: Second way is, we provide an API to list all published articles across any languages with more details
20:39 santhosh_: That is documented at https://www.mediawiki.org/wiki/Content_translation/Published_translations
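Editorial note: the first approach santhosh_ mentions (the tag filter on the recent changes page) can be sketched as a small URL builder. This is a minimal sketch, not part of the log: the function name is hypothetical, and it assumes that "how many to see and days" corresponds to the `limit` and `days` query parameters of the recent changes page. Only `title` and `tagfilter` appear in the URL quoted above; the values 500 and 30 are arbitrary examples.

```python
from urllib.parse import urlencode

def tagged_recent_changes_url(index_php, special_page,
                              tag="contenttranslation", limit=500, days=30):
    """Build a recent-changes URL filtered by a change tag.

    `limit` and `days` are assumed parameter names (see the note above);
    `title` and `tagfilter` are taken from the URL quoted in the log.
    """
    query = urlencode(
        {"title": special_page, "tagfilter": tag, "limit": limit, "days": days},
        safe=":",  # keep the namespace colon readable in the page title
    )
    return f"{index_php}?{query}"

url = tagged_recent_changes_url(
    "https://ca.wikipedia.org/w/index.php", "Especial:Canvis_recents"
)
print(url)
```

As TarLocesilion points out in the log, this view covers only recent edits, so it does not list every article ever created with CX.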
20:39 kart_: TarLocesilion: Tricky question, isn't it? :)
20:39 TarLocesilion: that's why we care about ALL input by newbies.
20:40 TarLocesilion: and it's not hard to imagine how easily CX may be used to autotranslate random pieces of content.
20:40 santhosh_: https://www.mediawiki.org/wiki/Content_translation/Abuse_prevention has notes about our approaches to quality
20:41 TarLocesilion: and that's also great.
20:41 santhosh_: From our analytics so far, such abuse or low quality articles that are created and then deleted are relatively few
20:42 aharoni: TarLocesilion: To be more precise: I checked three days ago, and of 2000 articles that were created using ContentTranslation, only about 60 were deleted as "bad translation" or "vandalism".
20:42 aharoni: (In all languages.)
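Editorial note: as a quick check of the figures aharoni quotes, the share of deleted articles can be worked out as below. The counts are the approximate ones from the log ("about 60" of 2000), treated as exact for illustration; the variable names are not from the log.

```python
# Approximate counts quoted in the log, across all languages.
created_with_cx = 2000
deleted_as_bad = 60   # deleted as "bad translation" or "vandalism"

deletion_rate = deleted_as_bad / created_with_cx
print(f"Deleted: {deletion_rate:.1%} of articles created with ContentTranslation")
# prints "Deleted: 3.0% of articles created with ContentTranslation"
```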
20:43 matanya: that is a very good rate
20:44 TarLocesilion: it highly depends on inclusionism/deletionism. my community doesn't only aim to control "new articles to be speedy deleted", but all of them.
20:44 TarLocesilion: 60 - yep, it's great.
20:45 arrbee: TarLocesilion: you mean delete anything - old or new, that looks sub-standard?
20:45 TarLocesilion: delete. move to sandbox. etc.
20:45 arrbee: okay
20:47 arrbee: Essentially, in terms of vandalism CX doesn't add any capability that is different or more than how a new article can be created
20:47 arrbee: but yeah, these are valid concerns about how individual communities operate
20:47 aharoni: TarLocesilion: well, that makes sense - it can happen in any language that a bad article hides in the corner for years :)
20:48 TarLocesilion: we are quite similar to dewiki in that point. for instance, eswiki allows users to create articles without sources, or only with a list of sources.
20:48 Pavanaja: @arrbee: I am not aware of Konkani people translating from other language Wikipedia. I guess their articles are based on the Konkani Encyclopaedia released under CC by Goa University
20:48 arrbee recently heard about a fictitious article on enwiki about some event that never happened and it went on to become a featured article
20:48 arrbee: I need to find out about that
20:49 arrbee: Pavanaja: oh ok,
20:50 Pavanaja: @aharoni: I am a member of the Kannada Software Commitee of Govt of Karnataka. Creating a corpus is one of our agendas. We can definitely work together
20:50 bachounda: hello again
20:50 TarLocesilion: aharoni: I'm in a difficult position right now, because I'm here to show the state of a restrictive community with very restrictive barriers for article creation.
20:50 kart_: arrbee: War happen in Goa that never happened :)
20:51 arrbee: Pavanaja: CX is currently available on Kannada, Gujarati and Punjabi Wikipedia. If you come across information related to resources for these languages we would really appreciate it.
20:51 Pavanaja: @arrbee: I am aware of the availability of CX for KN WP
20:52 arrbee: Pavanaja: Any feedback for us? :)
20:52 kart_: Pavanaja: also, try CX :)
20:52 aharoni: Pavanaja: awesome
20:53 Pavanaja: CX currently works only for totally new article creation by translation. There are many articles in KN WP which are more like stubs, having the same article in EN WP. But we can't use CX to improve those articles
20:53 arrbee: lol
20:53 arrbee: yes, that's a feature we all want to see soon
20:53 arrbee: but it's not planned yet
20:53 santhosh_: Pavanaja: technically you can. Currently you can overwrite those one-line articles with bigger articles
20:54 aharoni: TarLocesilion: I imagine - I am very much a Wikipedian myself, with a lot of experience in English, Hebrew, and Russian, and I know about the hard control of new articles. You know, you can create bad new articles without ContentTranslation :)
20:54 santhosh_: CX gives a warning, but that does not stop
20:54 froskos_: yep , it suggests changing the name of the article
20:54 aharoni: If anything, I think that creating good new articles is easier and more likely with ContentTranslation.
20:54 aharoni: That's at least what we see from the data from other languages.
20:55 arrbee: Quick timecheck.. we have about 5 mins left of the hour
20:56 arrbee: TarLocesilion: We will keep updating you through the phab tickets. Thanks a lot for filing them. It really helps us!!
20:57 bachounda: aharoni: hi i search farmer
20:57 arrbee: let's wrap up from here now, I am not sure if there is any other meeting scheduled on the channel :)
20:57 aharoni: bachounda: I remember you! :)
20:58 arrbee: Thanks a lot everyone for coming today.
20:58 bachounda: aharoni: bachounda from algeria wikimania london
20:58 kart_: good night!
20:58 arrbee: Our next office hour is planned for June 10th, same time and same place
20:59 arrbee: #endmeeting