IRC office hours/Office hours 2011-06-16

[11:00am] StevenW: Hey folks
[11:01am] Fajro joined the chat room.
[11:01am] matanya: hi StevenW
[11:01am] ChristineM: howdee
[11:01am] jorm left the chat room. (Ping timeout: 276 seconds)
[11:01am] jorm_ is now known as jorm.
[11:01am] You were granted voice by ChanServ.
[11:01am] DarTar joined the chat room.
[11:01am] Eloquence was granted voice by ChanServ.
[11:02am] RoanKattouw_away is now known as RoanKattouw.
[11:02am] You were promoted to operator by ChanServ.
[11:02am] Theo10011: hoi
[11:02am] killiondude: Land ahoy.
[11:03am] StevenW: Hoi Theo
[11:03am] WereSpielChequer: Good evening Wikimedia!
[11:03am] Thehelpfulone: office hours soon?
[11:03am] Eloquence: hi folks
[11:03am] Eloquence: yep, let's get started
[11:03am] StevenW: Yeah
[11:03am] jorm: hey guys!
[11:03am] JoeGazz84: Let's start ;)
[11:03am] StevenW: So is everyone aware of the topic?
[11:03am] StevenW: Article feedback tool :)
[11:03am] JoeGazz84: Yes
[11:03am] StevenW: Awesome
[11:03am] Topic changed to "IRC office channel for Wikimedia Foundation | For next scheduled office hours, see http://meta.wikimedia.org/wiki/IRC_office_hours | Office Hours: Thursday June 16th, 18:00 UTC" by Thehelpfulone.
[11:03am] DarTar: http://www.mediawiki.org/wiki/Article_feedback
[11:04am] Eloquence: See http://www.mediawiki.org/wiki/Article_feedback for background, and http://en.wikipedia.org/wiki/5_centimeters for a live example
[11:04am] Topic changed to "IRC office channel for Wikimedia Foundation | For next scheduled office hours, see http://meta.wikimedia.org/wiki/IRC_office_hours | Office Hours: Thursday June 16th, 18:00 UTC - Topic: Article feedback tool" by Thehelpfulone.
[11:04am] Alpha_Quadrant joined the chat room.
[11:04am] Eloquence: Currently it's deployed to 100,000 articles on en.wp
[11:04am] Eloquence: We're planning to gradually ramp up through this month in order to allow for responding to its real world performance characteristics
[11:05am] Olipro joined the chat room.
[11:05am] shimgray: is the goal to roll it out on all articles?
[11:05am] Eloquence: yeah
[11:05am] matanya: and all wikis?
[11:05am] Eloquence: initially we're just doing en.wp and responding to bugzilla requests as they are made
[11:05am] closedmouth joined the chat room.
[11:05am] Eloquence: there's one to enable it on portuguese, as far as I've seen
[11:05am] jorm: Spanish, too.
[11:05am] killiondude: Does it still show up on redirects and disambig pages?
[11:05am] matanya: such as en.wikibooks?
[11:06am] jorm: We fixed a bug on that.
[11:06am] StevenW: killiondude: nope.
[11:06am] killiondude: Okay.
[11:06am] WereSpielChequer: What happened to the idea of testing whether it was encouraging or deterring editing before we took a decision on rollout?
[11:06am] shimgray: do we have any idea what the pageview:rating ratio is?
[11:06am] jorm: Redirects is fixed; disambiguation not yet.
[11:06am] DarTar: here's the public log of changes: http://www.mediawiki.org/wiki/Article_feedback/Log
[11:06am] Eloquence: killiondude, redirects are fixed, disambiguation pages require individual blacklisting of pages via a blacklist category in the current implementation.
[11:06am] shimgray: (I mean, 1,000 views per rating action, etc)
[11:06am] DarTar: shimgray, have you checked out the research page?
[11:06am] killiondude: Thanks, Erik.
[11:07am] DarTar: http://www.mediawiki.org/wiki/Article_feedback/Research
[11:07am] Eloquence: WereSpielChequer, let me take a crack at answering that
[11:07am] closedmouth left the chat room.
[11:07am] Eloquence: WereSpielChequer, our primary hypothesis is that AFT is an opportunity to do the opposite: to create a low-level engagement opportunity on the site, and then convert people who engage to do more.
[11:08am] Eloquence: the data we've seen so far is strongly supportive of that hypothesis.
[11:08am] DarTar: shimgray: the volume of ratings depends on the number of views an article gets, plus a number of other factors (such as its length), but you can look at these figures for a controlled sample of 380 articles: http://www.mediawiki.org/wiki/Article_feedback/Research#Decreased_rating_volume:_effects_of_the_expertise_checkbox
[11:08am] Eloquence: so, for example, 1) the number of rating actions on pages with AFT vastly outnumbers the number of edits, 2) the call-to-action data from calls like "Make an account" or "Take a survey" suggests very strong conversion into other actions.
[11:08am] RoanKattouw: Also, I am working on making all rating data publicly available as we speak
[11:08am] RoanKattouw: It should be done before the end of this office hours session
[11:08am] jorm: Hey Roan!
[11:09am] Eloquence: So, for example, on the call-to-action that we've implemented as a post-rating action to take a survey, we see 40% conversion. that's amazing.
[11:09am] shimgray: DarTar: awesome! hadn't realised we had more recent data than the initial tests
[11:09am] Eloquence: And we have a call-to-action implemented right now "Did you know you can edit this page?"
[11:09am] Thehelpfulone: question, I don't know if it's been asked yet but how did we decide which articles to add it to Eloquence?
[11:09am] Eloquence: It's not an ideal implementation for a number of reasons, but it's still getting low two digit activation right now.
[11:09am] howief: to be realistic, I don't anticipate the conversion rate will remain that high when the tool is rolled out to more articles, but it's still a significant number
[11:09am] DarTar: ...which is currently producing a 2.6%-3.2% conversion rate
[11:09am] jorm: Thehelpfulone: that's a good question with a boring answer.
[11:10am] effeietsanders: Eloquence: as a call to action, did you also consider pointing to the talkpage "Suggest how to improve it" (which might be lower barrier)
[11:10am] jorm: Originally, the articles chosen were all within the Public Policy Initiative.
[11:10am] Eloquence: effeietsanders, yes, we did.
[11:10am] jorm: after that, we had a handful of specialized articles that we added it to manually, chosen because we knew that they would undergo fundamental change within a short period
[11:10am] RoanKattouw: howief: Yeah, from operational experience I know we got more ratings/article on the PPI stuff than on the ones from the extended roll-out
[11:10am] Eloquence: effeietsanders, right now our thinking is that we don't want to encourage direct talk page response based on the signal/noise ratio we're seeing in survey responses
[11:10am] jorm: (like "True Grit" or "Arsenic Based Life")
[11:10am] DarTar: we actually don't have a way of tracking those edits that result from the call to action, but it would be a nice to have feature
[11:10am] howief: thanks RoanKattouw
[11:10am] WereSpielChequer: That was 40% to click on a survey - editing is less but if positive that would be good news
[11:11am] jorm: after that, for the 100,000 stage roll-out, they were randomly chosen.
[11:11am] Eloquence: i.e. while there are lots and lots of good and helpful survey responses, the free text fields have too many instances of "adsjfasdkfj" "WIKIPEDIA ROCKS" "JOSH IS GAY" etc.
[11:11am] jorm: which is why the tool appears on disambigs and so forth.
[11:11am] Eloquence: so, if we want to drive raters to comment, we probably need to build a collaborative filtering tool in order to triage useful comments
[11:11am] RoanKattouw: To elaborate on 'randomly chosen': it's based on page IDs, we selected 0,1,2,...,28,1000,1001,...,1028,1100,...,1128,etc
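(Annotation: a minimal sketch of ID-based sampling of the kind Roan describes. The sequence quoted above does not fully pin down the rule, so the modulus of 1000 and the cutoff of 29 are assumptions chosen to reproduce the 0..28 and 1000..1028 runs. `in_sample` is a hypothetical name, not the extension's actual code.)

```python
def in_sample(page_id: int, modulus: int = 1000, cutoff: int = 29) -> bool:
    """Return True if a page ID falls into the sampled bucket.

    Assumption: a page is selected when page_id % modulus < cutoff.
    The real selection rule may differ.
    """
    return page_id % modulus < cutoff


# With these assumed defaults, IDs 0..28, 1000..1028, 2000..2028 are selected.
selected = [pid for pid in range(0, 2100) if in_sample(pid)]
```

(Under these assumed values about 2.9% of page IDs are selected. Because the rule looks only at the ID and not at the page type, it is consistent with jorm's remark that the tool ended up on disambiguation pages.)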
[11:12am] Eloquence: I'd like to encourage y'all, if you haven't, to take a look at the http://www.mediawiki.org/wiki/Article_feedback/Extended_review page
[11:12am] effeietsanders: yeah, makes sense :)
[11:12am] shimgray: RoanKattouw: in part, presumably, because the PPI articles all had a reasonably high traffic rate - currently it's on some pages with ~2-3 views per day, which probably sends the averages way down!
[11:12am] DarTar: shimgray: that's correct
[11:12am] RoanKattouw: Oh yeah that too probably
[11:12am] Thehelpfulone: jorm: and now we are going to go for all 3 million + articles - what about new articles that are just created, do we do it for those too?
[11:12am] RoanKattouw: I agree you need to look at page views too
[11:12am] DarTar: PPI are definitely not a random sample of enwiki articles
[11:12am] WereSpielChequer: If we encourage raters to comment we need to gear up to handle those comments. That would be difficult
[11:12am] Eloquence: The extended review page is a spec that Guillaume and I have worked on for a potential next generation iteration of article feedback to incorporate comments, rating classifications, rater metadata, etc., in order to make the ratings more useful.
[11:12am] jorm: well, if we go full roll-out, it will just be on all articles, including new ones.
[11:12am] shimgray: if the 100k are a representative sample, the "rating rate" should remain sort of the same from here on out, barring a change in the way people respond
[11:13am] jorm: now, i think we should apply a small filter based on size and age.
[11:13am] howief: shimgray: i think that's right
[11:13am] DarTar: if you look at the AFT dashboard you'll see what articles have a high volume of daily ratings
[11:13am] jorm: so that we don't have it appear on super-small stubs that have just been created.
[11:13am] jorm: (but it would appear on longer articles that are freshly created)
[11:13am] DarTar: shimgray: yes, 100K is representative of the whole enwiki
[11:13am] JoeGazz84: So, to get this straight, no tool to "review" survey comments is released yet, but one will be released in the future? Special Usergroup needed to access?
[11:13am] StevenW: https://secure.wikimedia.org/wikipedia/en/wiki/Special:ArticleFeedback is the dashboard
[11:14am] Eloquence: JoeGazz84, the current survey implementation was just a test, and we're likely to disable it.
[11:14am] Thehelpfulone: jorm: and if an article is deleted, where do the article ratings go? I mean if we were to undelete the article, would we undelete the ratings?
[11:14am] JoeGazz84: Okay
[11:14am] JoeGazz84: Thank you Eloquence
[11:14am] RoanKattouw: Thehelpfulone: I'll answer that
[11:14am] jorm: go roan.
[11:14am] Thehelpfulone: okay :)
[11:15am] RoanKattouw: Thehelpfulone: This is something we actually didn't consider in the implementation, shame on us. Ratings are attached to page IDs, so they don't come back when an article is undeleted. Maybe they should. They're also not garbage-collected though, I guess that's bad too
[11:16am] Ziko_ left the chat room. (Ping timeout: 252 seconds)
[11:16am] RoanKattouw: (By which I mean that if you delete an article, its ratings become inaccessible, but they're not actually deleted in the database, so we're stuck with those useless(?) ratings forever)
[11:16am] Eloquence: jorm, I'm not sure about a size filter per my response here http://www.mediawiki.org/wiki/Thread:Talk:Article_feedback/Please_exempt_redirects_and_short_articles/reply_(4)
[11:16am] Thehelpfulone: RoanKattouw: by page IDs do you mean page diffs - as when we undelete, don't the diffs stay the same?
[11:16am] StevenW: Yeah speaking of filters, I actually thought it was fun to see it appear on an article I'd just made.
[11:16am] effeietsanders: RoanKattouw: is there a way to "expire" ratings then?
[11:17am] RoanKattouw: Thehelpfulone: No, page IDs. Numerical identifiers for pages. Also known as 'curid's
[11:17am] Thehelpfulone: okay
[11:17am] shimgray: Eloquence: using it as a way to get non-editors involved in triaging new pages is an interesting opportunity, there
[11:17am] RoanKattouw: effeietsanders: Yes, there are notions of rating staleness, but the data sticks around
[11:17am] Thehelpfulone: Also, can we lock an article from being rated? Take http://en.wikipedia.org/wiki/Special:ArticleFeedback for example, Justin Bieber (3.13, 2.10, 2.53, 2.95, 2.68) has a low rating because most people seem to dislike him, this may not necessarily be because of the actual content of the article, just the subject that it is about.
[11:17am] RoanKattouw: Every rating is counted towards averages etc unless there's a more recent rating by the same user
[11:18am] RoanKattouw: Thehelpfulone: We've implemented blacklist categories so you can blacklist disambig pages etc, but they could be (ab)used for Justin Bieber as well
[11:18am] shimgray: "user" is defined by IP?
[11:18am] RoanKattouw: Either username if logged in, or a cookie if logged out
[11:18am] effeietsanders: Thehelpfulone: why lock it from rating? You can also decide not to do anything with it :)
[11:18am] RoanKattouw: Because we didn't want IP contamination
[11:18am] shimgray: okay, good, so we don't have one rating for all of Thailand :-)
[11:18am] RoanKattouw: Exactly
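(Annotation: a minimal sketch of the "most recent rating per rater" rule discussed above, where a rater is identified by username when logged in and by a cookie otherwise. The `Rating` fields and function names are hypothetical and do not reflect the extension's database schema.)

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Rating:
    page_id: int
    username: Optional[str]      # set when the rater is logged in (assumed field)
    cookie_token: Optional[str]  # set when the rater is logged out (assumed field)
    timestamp: int
    score: int


def rater_key(rating: Rating) -> Tuple[str, Optional[str]]:
    """Identify the rater by username if present, otherwise by cookie."""
    if rating.username is not None:
        return ("user", rating.username)
    return ("anon", rating.cookie_token)


def latest_per_rater(ratings: Iterable[Rating]) -> List[Rating]:
    """Keep only the newest rating for each (page, rater) pair."""
    latest = {}
    for rating in ratings:
        key = (rating.page_id, rater_key(rating))
        if key not in latest or rating.timestamp > latest[key].timestamp:
            latest[key] = rating
    return list(latest.values())
```

(Keying on a cookie rather than an IP address means two logged-out readers behind the same address are counted as separate raters, which is the point shimgray raises.)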
[11:18am] Eloquence: fundamentally, this is the first time we're asking our readers to DO STUFF beyond having a little "edit" link on the pages (which many folks find daunting, or do not understand) -- and IMO that's a very big deal and a very important step for us to be taking. we should be thinking about how we can best build a funnel for helping everyone to make the best possible contribution they can to wikimedia projects.
[11:19am] Thehelpfulone: effeietsanders: so you can disable page rating on particular pages?
[11:19am] StevenW: But why do that when you can just ignore the results if you think it's not useful?
[11:19am] effeietsanders: Thehelpfulone: why would you want that? I mean, the fact that you disagree with a rating is a bit fuzzy reason to me :)
[11:19am] jorm: To be honest, the idea of disabling on specific pages strikes me as antithetical to the wikipedian ideal.
[11:19am] Thehelpfulone: okay
[11:20am] Eloquence: I think we'll need to analyze the data a bit more before we take such steps, anyway
[11:20am] Eloquence: Roan just made the data available in case people didn't catch that
[11:20am] effeietsanders: Eloquence: talking about analysis... who/when/where can you analyze?
[11:20am] RoanKattouw: Yeah data for the last week or so
[11:20am] StevenW: Show. me. the. data!
[11:20am] Eloquence: http://toolserver.org/~catrope/articlefeedback/ is the current rating data from the last week
[11:20am] RoanKattouw: I'm working on getting all data since March
[11:20am] DarTar: If you have other ideas about CTA you think we should experiment with, please drop us a line on the AFT talk page
[11:20am] Moonriddengirl: jorm: we lock articles temporarily to prevent vandalism. It sounds to me as though what Thehelpfulone is talking about is a very similar concept: deliberate abuse of the system.
[11:20am] Thehelpfulone: StevenW / effeietsanders: I'm just thinking, what if we have a featured article and some vandals (ED, GNAA, 4chan, who knows) decide to simply attack the page by rating it low? Not that these ratings matter much, I'm just thinking of any potential abuse
[11:20am] RoanKattouw: For click-through rates on calls-to-action, see the clicktracking file. It's really small
[11:20am] Eloquence: So I'd love it if folks here could dig into e.g. the Justin Bieber data and see if we're getting useful value out of it
[11:21am] • RoanKattouw • considers writing a README file for these stats
[11:21am] Thehelpfulone: Moonriddengirl: exactly
[11:21am] Eloquence: I don't want to by any means imply that we're certain that all rating data is going to be useful, or even that 4 quantitative ratings per article are the right approach.
[11:21am] jorm: Ah, that makes more sense. I have gotten a lot of requests about "selective removal from pages", which is what I confused it with.
[11:22am] shimgray: also, we know from vandalism that it happens to the weirdest and most improbable pages. spurious ratings are likely to be a problem everywhere and not just the high-traffic pages - it might be better to figure out a way to identify the hallmarks of spurious reviews rather than simply closing off the bi targets
[11:22am] shimgray: *big
[11:22am] StevenW: Thehelpfulone: as an alternative, what about the ability to clear the ratings?
[11:22am] Eloquence: RoanKattouw, do you know if the AFT tables are already replicated to the toolserver, BTW? I'm assuming that's how you're pulling the data in the first place, right?
[11:22am] WereSpielChequer: the risk is that people will run a campaign to downrate someone they don't like. So for BLP reasons we would probably have the option to remove articles from the rating system
[11:22am] YairRand joined the chat room.
[11:22am] RoanKattouw: Eloquence: Let me check.
[11:22am] DarTar: Thehelpfulone: see - abuse and gaming is something you can control when you display these ratings, there's no need to restrict this upfront, we need to think of good algorithms to control gaming when we display them in an aggregate form
[11:22am] Thehelpfulone: StevenW: that could be a good idea, but what about in cases of attacks on an article, we don't want to remove the positive ratings as well as the negative ones...
[11:22am] killiondude: They said there is a blacklist category, WereSpielChequer :-)
[11:23am] RoanKattouw: Eloquence: No I dump them on the cluster and re-import them on the toolserver. Horrible hack
[11:23am] Eloquence: ah :)
[11:23am] RoanKattouw: Nope, no AFT tables on TS
[11:23am] shimgray: WereSpielChequer: but then, "flagging problems" on BLP articles is potentially something this is very useful for. "no, this article is not neutral", etc.
[11:23am] howief: it seems to me the solution to the vandalism problem should be to figure out a way to prevent vandalism to begin with, if we can detect it
[11:23am] effeietsanders: RoanKattouw, Eloquence: will the analysis in the end be built in and accessible on an ongoing basis to the community? (ideally as detailed as possible without releasing personally identifiable information)
[11:23am] RoanKattouw: Some of them would need to be censored, too
[11:23am] howief: e.g., consecutive ratings of 1,1,1,1
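(Annotation: a minimal sketch of the kind of pattern check howief mentions. It assumes each submission carries one score per rating category and flags runs of submissions where every category got the lowest score. The function names and the idea of measuring run length are illustrative assumptions, not a described feature of the tool.)

```python
from typing import Iterable, Sequence


def is_all_minimum(scores: Sequence[int], minimum: int = 1) -> bool:
    """True if a submission gave the lowest score in every category."""
    return len(scores) > 0 and all(score == minimum for score in scores)


def longest_low_run(submissions: Iterable[Sequence[int]], minimum: int = 1) -> int:
    """Length of the longest run of consecutive all-minimum submissions."""
    longest = 0
    current = 0
    for scores in submissions:
        current = current + 1 if is_all_minimum(scores, minimum) else 0
        longest = max(longest, current)
    return longest
```

(A long run is only a hint that something may be off. Good-faith readers can also dislike an article, so a check like this would feed review or delayed display rather than automatic removal.)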
[11:23am] Thehelpfulone: DarTar: okay, so the system can be programmed such that if there is a sudden influx of negative or positive ratings it can do something with them (either wait to display them or something similar)
[11:23am] Eloquence: my sense is that we're going to be able to use AFT data to surface _potential_ problems and opportunities, but that it's never going to be a slam dunk because of issues like rating spamming or rating bias.
[11:23am] killiondude: It might be nice to have a MediaWiki page instead of a category. Ne'erdowells might figure out the category thingy. :-)
[11:23am] RoanKattouw: effeietsanders: That's what I am working on right this minute
[11:24am] RoanKattouw: Eloquence and howief et al really wanted it done before this session started so they could show it off, but I didn't quite make that
[11:24am] Ziko_ joined the chat room.
[11:24am] Eloquence: Roan will correct me if I'm wrong, but I believe right now it's doing a uniqueness filter on IP addresses, which is one way to mitigate rating spam
[11:24am] DarTar: Thehelpfulone: that's what I would do, after all the only place where ratings matter is where you display them or use them for ranking articles, flagging problematic articles etc.
[11:24am] Thehelpfulone: where are the ratings currently stored? On an editable page or in a database somewhere?
[11:25am] RoanKattouw: Eloquence: Ahhhh, no?
[11:25am] RoanKattouw: Uniqueness filter? I don't know what you're talking about
[11:25am] WereSpielChequer: Clearing the ratings would only work if you were sure an attack was over. Better in my view to make it a protection option that admins can apply
[11:25am] howief: there can only be one rating per IP per article at a given time
[11:25am] jorm: database.
[11:25am] RoanKattouw: Thehelpfulone: Database
[11:25am] Thehelpfulone: DarTar: is it possible to implement a way of reviewing the feedback on articles? i.e. put it in dare I say a "pending feedback" system or something similar
[11:25am] Thehelpfulone: okay
[11:26am] DarTar: Thehelpfulone: yes that was one of the plans we were considering
[11:26am] shimgray: as a quick note, if you think the ratings have been gamed by a single attack of spurious reviews, as currently designed you can probably clear them by rapid editing of the article
[11:27am] shimgray: if it's an ongoing problem, that won't help, but.
[11:27am] RoanKattouw: Not really
[11:27am] RoanKattouw: Ratings that go stale still count
[11:27am] DarTar: WereSpielChequer: I disagree, if we block ratings upfront we can actually hit potentially good faith raters, while we can process and filter ratings if we have evidence that they come from potential vandals or gamers
[11:27am] shimgray: (at least, that's my interpretation of the "30 edits average" thing)
[11:27am] shimgray: ah, right, I get you
[11:28am] RoanKattouw: Or.. wait
[11:28am] RoanKattouw: I'm confused at my own code now
[11:28am] StevenW: Heh.
[11:28am] Thehelpfulone: DarTar: this processing/filtering of ratings would be added to a new user group? or an existing user group? or to all autoconfirmed accounts?
[11:28am] RoanKattouw: You're completely right
[11:29am] RoanKattouw: The average is over the last 30 revs
[11:29am] RoanKattouw: howief: I lied in yesterday's meeting, stale ratings DO NOT count
[11:29am] howief: stale ratings do not count toward the averages, correct
[11:29am] RoanKattouw: I guess I should expose the per-revision data in addition to the per-page data
[11:29am] RoanKattouw: I thought they did count but I was wrong
[11:30am] Eloquence: howief, are we documenting the staleness behavior on http://www.mediawiki.org/wiki/Article_feedback or related pages? if not, could you add a summary there?
[11:30am] Fenix2 joined the chat room.
[11:30am] DarTar: Thehelpfulone: it'll have to be part of the algorithm that processes the entire ratings to generate reports (as in the Dashboard), we can spot gaming if we look at suspicious patterns in the data we collect
[11:30am] WereSpielChequer: hence my suggestions on the talkpage - vandalisms and reverts shouldn't count towards the 30 edits
[11:30am] shimgray: Eloquence: it's in the FAQ, I think
[11:30am] Moonriddengirl: It seems best that they not count; an article can be completely rewritten from one day to the next.
[11:30am] howief: Eloquence: yes, we can add a summary
[11:30am] shimgray: "Average ratings for each article are calculated based on an arithmetic average of all the ratings submitted against the last 30 revisions of the article (i.e., all "unexpired" ratings)"
[11:30am] RoanKattouw: WereSpielChequer: In practice that's very difficult to implement
[11:30am] shimgray: http://www.mediawiki.org/wiki/Article_feedback/Public_Policy_Pilot/FAQs#How_are_the_averages_calculated.3F
[11:30am] jorm: the term "stale" was only used in the first version.
[11:31am] Eloquence: thanks shim
[11:31am] jorm: we have deprecated that, and introduced "expired", which makes more sense.
[11:31am] DarTar: here's more on expired ratings (!= stale): http://www.mediawiki.org/wiki/Article_feedback/FAQ#How_will_out-of-date_ratings_be_handled.3F
[11:31am] jorm: *expired* ratings *do not* count to the averages.
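(Annotation: a minimal sketch of the averaging rule quoted from the FAQ, an arithmetic mean over ratings submitted against the last 30 revisions, with older ratings treated as expired. The input shapes and the name `average_rating` are assumptions for illustration.)

```python
from typing import Iterable, Optional, Sequence, Tuple


def average_rating(
    ratings: Iterable[Tuple[int, int]],
    revision_ids: Sequence[int],
    window: int = 30,
) -> Optional[float]:
    """Mean score of unexpired ratings for one page.

    ratings: (revision_id, score) pairs.
    revision_ids: the page's revision IDs, ordered oldest to newest.
    A rating is unexpired if its revision is among the last `window` revisions.
    """
    recent = set(revision_ids[-window:])
    live = [score for rev_id, score in ratings if rev_id in recent]
    if not live:
        return None
    return sum(live) / len(live)
```

(Under this rule, every new revision shifts the window forward by one, so a rating drops out of the average once 30 later revisions exist.)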
[11:31am] shimgray: excellent. so we do have an undocumented and hacky purge option if needed :-)
[11:32am] Fenix2: shimgray: that'd be awfully corrupt way to go about it
[11:32am] WereSpielChequer: If ignoring vandalisms and reverts is difficult how about altering the 30 limit per article?
[11:32am] Eloquence: I'd like to step back a bit in the time we have left and would love to get your ideas of where this could go
[11:32am] Eloquence: for one thing, in the article feedback tool we have the option to implement arbitrary "calls to action"
[11:32am] Eloquence: we can essentially very easily have any text/invitation show up after a user submits a rating
[11:32am] StevenW: Like the survey ask, or ask to edit, etc.
[11:32am] Eloquence: and we can very easily measure how many users complete that action.
[11:33am] jorm: I'd like to see it help create a "to-do" list for the article. "Needs more photos."
[11:33am] Eloquence: are there other calls to action that would make sense for a reader? we've tested 1) create account, 2) take survey, 3) edit
[11:33am] WereSpielChequer: If the call to action is "so fix it" that's good.
[11:33am] DarTar: WereSpielChequer: we are going to experiment with different thresholds (both on the AFT ratings summary and in the Dashboard), the current criteria of min 10 ratings/24h in the Dashboard, for example, are producing results that are not tremendously useful
[11:33am] effeietsanders: Eloquence: follow a workshop nearby
[11:33am] Eloquence: WereSpielChequer, we've also considered connecting calls to action to the actual ratings -- i.e. if you give low ratings, have a more obvious Template:Sofixit invitation
[11:34am] Thehelpfulone: ok that sounds good - jorm's idea is a good one, but the survey shouldn't be too long - just a simple to-do, not what country? what's your age? sex? etc
[11:34am] StevenW: You mean like an IRL event effeietsanders?
[11:34am] effeietsanders: not sure if that would scale through the world, but if you could make it geotargeted...
[11:34am] effeietsanders: StevenW: yes
[11:34am] jorm: 140 characters!
[11:34am] Thehelpfulone: or less ;)
[11:34am] Eloquence: effeietsanders, geo-information would definitely be interesting, especially where available
[11:34am] StevenW: What about join/view/something the related WikiProject?
[11:34am] Eloquence: one thing to keep in mind when thinking about these calls
[11:34am] killiondude: "You might be interested in these articles:"
[11:34am] Eloquence: is that the stuff that tends to work well is stuff that connects directly to the mindset of the rater
[11:34am] jorm: to be honest, i don't think that collecting the demographics of people who rate the articles is going to give us a lot of value in the long term.
[11:35am] WereSpielChequer: Have we tested how much this slows page loading for readers?
[11:35am] effeietsanders: Eloquence: I was thinking along the line of "learn how to improve it myself"
[11:35am] jorm: so yeah, we should just be asking "what does this article need"?
[11:35am] howief: jorm: it would be if we had something to compare the data to
[11:35am] StevenW: Yeah, that is good killiondude. Suggested similar articles.
[11:35am] jorm: Roan, you want to take the speed question?
[11:35am] howief: (e.g., demographics of readers)
[11:35am] shimgray: Eloquence: would "report problem" be workable as a call to action?
[11:35am] Eloquence: so, the mindset of the rater is "I have an opinion on this article". we'd probably get highest activation on "add a comment" (with the aforementioned signal/noise issues).
[11:36am] RoanKattouw: jorm: Which one?
[11:36am] shimgray: if you rated significantly low, a little tickbox saying "why" and an automated note dropped on the talkpage or something
[11:36am] jorm: WereSpielChequer's "slow loading" question above.
[11:36am] Eloquence: shimgray, I think so. per http://upload.wikimedia.org/wikipedia/commons/e/e7/Article_feedback_extended_review_-_Review_list.svg for a triaging interface, I think it would definitely make sense to have a category/flag for abuse reports.
[11:37am] Thehelpfulone: shimgray: yes, that could be done with some javascript or something similar
[11:37am] DarTar: re: effects on page loading time, the tool is not actually loaded until it's visible on the screen, correct RoanKattouw?
[11:37am] RoanKattouw: WereSpielChequer: We took steps to minimize that. First, the widget is only 'drawn' after the page has loaded. And even then, we don't load data into it until you scroll it into the visible range
[11:37am] effeietsanders: Eloquence: add a photo?
[11:37am] effeietsanders: (tricky stretch perhaps)
[11:37am] shimgray: (including a "this should not be here" option, which would log it somewhere for people to skim through for deletion listings)
[11:38am] Eloquence: effeietsanders, yeah, we could at least do an activation test on that, although the experience is still pretty horrible
[11:38am] shimgray: Eloquence: yeah, the flags there are nice.
[11:38am] effeietsanders: Eloquence: we've had pretty decent experiences with WikiPortrait even though it has a horrible user interface
[11:38am] WereSpielChequer: That's reassuring re speed. I'm on a UK broadband so I get reasonable access. but I was talking yesterday to an editor in Thailand and WP is very slow there
[11:39am] effeietsanders: it does give quite some noise though - so you need to build in a filter
[11:39am] Eloquence: if we want to think out of the box a bit, we could try to evaluate categories on the article -- needs reference, POV dispute etc., for targeted calls to fix
[11:39am] shimgray: Eloquence: "This article is currently marked as needing work for A, B, C. Do you think any of these are okay now?"
[11:39am] WereSpielChequer: With the call to action could we ask people to give a description for photos without alt text?
[11:40am] Eloquence: RoanKattouw, sorry, did you have a chance to poke at the toolserver status / did I miss your response on that?
[11:40am] Fenix2: it's good that wikipedia doesn't collect and analyze link transit data..
[11:40am] Fenix2: that'd be unfair
[11:40am] RoanKattouw: Eloquence: TS status?
[11:40am] Eloquence: RoanKattouw, whether the tables are replicated
[11:40am] RoanKattouw: They're not
[11:40am] James_F|Away left the chat room. (Read error: Connection reset by peer)
[11:42am] Eloquence: ok
[11:42am] RoanKattouw: Some could easily be
[11:42am] Eloquence: we'll prod river/daniel about that
[11:42am] RoanKattouw: Like the aggregate tables for pages and revs
[11:42am] RoanKattouw: Yeah file a JIRA ticket
[11:42am] shimgray: since we have the known problem of cleanup tags hanging around a long time after the problem's been solved and no-one ever thinking to fix them
[11:42am] RoanKattouw: The main table needs censoring
[11:42am] RoanKattouw: For private data (IPs and user names)
[11:42am] Eloquence: right
[11:42am] shimgray: one other issue that would be very interesting to know is the opinion of BLP subjects on quality of their articles - from OTRS correspondence about specific issues, they often rank an article relatively highly when from our perspective it's not very good (referencing problems, etc)
[11:42am] Eloquence: the raw dumps will be a great start for anyone wanting to dig into the data, but the toolserver availability will allow for creation of real-time tools to aid page patrollers etc.
[11:42am] shimgray: but I don't think we can easily gather that data without people being silly with it :-)
[11:42am] Fajro: Fenix2: I think Wikimedia should collect and analyze data, but as a new separate project.
[11:42am] DarTar: incidentally, we're planning to have AFT data accessible via the toolserver so as to allow devs to build cool things and useful tools for the community
[11:42am] James_F|Busy joined the chat room.
[11:42am] DarTar: I have some scripts that track the evolution of an article in length, number of references, number of "citation needed"-like templates and ratings and I'll make them available on the toolserver
[11:42am] WereSpielChequer: Will blocked IPs be blocked from rating etc?
[11:43am] Eloquence: shimgray, yeah, in the http://www.mediawiki.org/wiki/Article_feedback/Extended_review we're suggesting to have an "I am the subject of this article" rater metadata field for that.
[11:43am] RoanKattouw: WereSpielChequer: They're currently not, but they could easily be
[11:43am] RoanKattouw: Ditto for blocked users
[11:43am] shimgray: Eloquence: tough to stop it being gamed, though! but for low-traffic articles with no signs of dodgy rating behaviour I guess we could trust it.
[11:43am] shimgray: oh, wait, it's a different system. ignore me :-)
[11:44am] WereSpielChequer: Seems like a sensible precaution for this simply not to appear to blocked users
[11:44am] effeietsanders: RoanKattouw: is there some kind of a filter "hey, you did three ratings last minute - cool down fellow"? :P
[11:44am] RoanKattouw: [ANNOUNCE] OK so I've got full statistics up now. Weekly .csv.gz files starting Mar 14 ending Jun 13
[11:44am] RoanKattouw: http://toolserver.org/~catrope/articlefeedback/
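A minimal sketch of how one of the weekly .csv.gz files mentioned above could be read in Python. The log does not describe the column layout of the dumps, so the column names below (page_title, rating_dimension, rating_value) and the sample rows are hypothetical; check the header of a real file before relying on them.

```python
import csv
import gzip
import io


def read_gzipped_csv(raw_bytes):
    """Decompress gzipped CSV bytes and return the rows as a list of dicts."""
    with gzip.open(io.BytesIO(raw_bytes), mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


# Hypothetical sample standing in for a downloaded weekly dump.
# The real dump's columns are not described in this log.
sample_csv = (
    "page_title,rating_dimension,rating_value\n"
    "5 centimeters,complete,4\n"
    "5 centimeters,well-written,3\n"
)
sample_bytes = gzip.compress(sample_csv.encode("utf-8"))

rows = read_gzipped_csv(sample_bytes)
for row in rows:
    print(row["page_title"], row["rating_dimension"], row["rating_value"])
```

For a file saved to disk, the same function can be fed the result of reading the file in binary mode.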
[11:44am] Eloquence: RoanKattouw, \o/
[11:44am] RoanKattouw: effeietsanders: No. But if you rate the same page 3x nothing happens
[11:44am] StevenW: So in the time we have left I think howief has a question.
[11:44am] RoanKattouw: Your new ratings overwrite the old ones
[11:44am] effeietsanders: RoanKattouw: i mean different pages :)
[11:44am] howief: I'd like to spend some time to talk about the usefulness of this information for our editors
[11:44am] RoanKattouw: No, no throttle on that
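The behaviour RoanKattouw describes (a new rating from the same rater on the same page overwrites the old one, with no throttle across different pages) can be illustrated with a small sketch. This is not the extension's actual code; the RatingStore class, its method names, and the rater and page identifiers are all hypothetical.

```python
class RatingStore:
    """Hypothetical in-memory store keeping one rating per (rater, page) pair."""

    def __init__(self):
        self._ratings = {}

    def rate(self, rater, page, value):
        # Re-rating the same page replaces the earlier value instead of adding a row.
        self._ratings[(rater, page)] = value

    def ratings_for(self, page):
        """Return every stored rating value for the given page."""
        return [value for (_, rated_page), value in self._ratings.items() if rated_page == page]


store = RatingStore()
# The same rater rates the same page three times; only the last value survives.
for value in (2, 5, 4):
    store.rate("reader1", "5 centimeters", value)
# Rating a different page is not throttled in this sketch either.
store.rate("reader1", "Other page", 3)

print(store.ratings_for("5 centimeters"))
```

Keying on the (rater, page) pair is what makes repeated ratings harmless: the count of stored ratings per page cannot be inflated by one rater.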
[11:45am] howief: What types of feedback from readers do you guys think would be more useful?
[11:45am] shimgray: completeness
[11:45am] howief: E.g., would be helpful in the further development of articles?
[11:45am] shimgray: "did this answer my question, are there holes in it, etc"
[11:45am] • Logan_ • waves to everyone
[11:45am] howief: shimgray: so input from readers on, for example, what's missing in the article
[11:46am] effeietsanders: Yes, missing sounds great
[11:46am] shimgray: howief: bingo. articles written by nonspecialists often end up with odd gaps which you simply don't notice unless you know the topic or unless you're looking for them
[11:46am] effeietsanders: it is hard to find holes
[11:46am] DarTar: RoanKattouw: good job with the dumps
[11:46am] howief: got it
[11:46am] howief: what else would be helpful?
[11:47am] effeietsanders: I would like to know if the language was understandable
[11:47am] effeietsanders: (although I realize we should very much limit the number of questions)
[11:47am] howief: effeietsanders: what type of input around language would be meaningful?
[11:47am] RoanKattouw: I see where you're going with that, i.e. every physics article past the first section
[11:47am] effeietsanders: howief: especially if a non-native can understand it
[11:48am] effeietsanders: or indeed the famous mathematical articles
[11:48am] WereSpielChequer: What academic level do you think this has been written for?
[11:48am] DarTar: those of you interested in the self-identified expertise of raters, check out this page: http://www.mediawiki.org/wiki/Article_feedback/Research/Rater_expertise
[11:48am] effeietsanders: WereSpielChequer: that is a tough one if you want to scale it world wide
[11:48am] Fajro: DAE dislike the stars for feedback? Sometimes people's answers are "I don't know" or "I'm not sure". You can't say that with the stars.
[11:49am] effeietsanders: it is easier to make it personal "was it too hard for you"
[11:49am] Eloquence: There was a request to add at least labels when you mouse over to the stars, like netflix does, which I think is a great idea.
[11:50am] Eloquence: i.e. when you rate a movie on netflix, it says "Really didn't like it" when you mouse over the first star, "Really loved it" when you mouse over the fifth star
[11:50am] shimgray: effeietsanders: perhaps "Was the writing in this article too simple or too complex? [high] [low] [just fine]"
[11:50am] Eloquence: that helps to contextualize/baseline
[11:50am] DarTar: beside that, you can skip a rating dimension if you are not comfortable with that
[11:50am] effeietsanders: shimgray: yeah, I'll leave the wording to the experts, but something along those lines
[11:50am] shimgray: Eloquence: yes! because one person's three is average, and one person's three is "fairly bad but not given up on it yet"
[11:50am] shimgray: that would be great.
[11:50am] WereSpielChequer: Graduate and High school are widely understood and fairly universal. "Too hard for you" would get a very different response from an 8 year old and a 28 year old with a relevant PhD
[11:51am] effeietsanders: WereSpielChequer: I would already be confused by "graduate"
[11:51am] effeietsanders: and high school is even within the Netherlands widely varied
[11:51am] shimgray: WereSpielChequer: sure, but if we get a mix of "too hard, too easy", it's probably fine. if we get all "too hard", odds are it's not just eight-year-olds.
[11:52am] StevenW: Right
[11:52am] Beria left the chat room. (Read error: Connection reset by peer)
[11:52am] WereSpielChequer: High School, University degree, relevant university degree?
[11:53am] effeietsanders: My guess would be in NL that the difference between two types of high school is bigger than between high school and university :P
[11:53am] Beria joined the chat room.
[11:53am] effeietsanders: (when it comes to reading skills)
[11:53am] RoanKattouw: Yes, probably
[11:53am] RoanKattouw: We have a three-tier high school system
[11:54am] Eloquence: btw, there are lots of other rating systems we've looked at and that we can learn from -- TED has an interesting model where the rater has to allocate points to dimensions like "insightful", "funny" etc.
[11:54am] effeietsanders: I don't want to go into details there, but I can imagine every country has its specifics :)
[11:54am] effeietsanders: but Eloquence , how does Khan Academy do this?
[11:54am] effeietsanders: because they are perhaps a good role model for information we want?
[11:54am] YairRand left the chat room. (Quit: Page closed)
[11:54am] Eloquence: effeietsanders, not very familiar with it
[11:54am] Fajro: I prefer something like TEDs tags
[11:55am] howief: would the TED tags be useful for editors?
[11:55am] DarTar: in the future we may also want to complement ratings-based metrics with automatically generated metrics, like readability index or other metrics extracted from the content of the article
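As one example of an automatically generated metric of the kind DarTar mentions, the Flesch Reading Ease score is 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words). The log does not say which readability index was under consideration, so choosing Flesch here is an assumption, and the syllable counter below is a rough vowel-group heuristic rather than a dictionary lookup.

```python
import re


def count_syllables(word):
    """Rough heuristic: count runs of vowels, with a minimum of one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Return the Flesch Reading Ease score; higher means easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(word) for word in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))


easy_text = "The cat sat. The dog ran."
hard_text = "Thermodynamic equilibrium characterizes macroscopic systems."

print(round(flesch_reading_ease(easy_text), 2))
print(round(flesch_reading_ease(hard_text), 2))
```

Scores like this could be computed per article revision and compared against reader ratings, though the heuristic would need tuning before being used on real wikitext.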
[11:55am] Eloquence: the challenge with TEDs vs. quantitative ratings is that change-over-time comparisons get quite a bit harder. the advantage is that you can surface predominant characteristics of an article with higher specificity.
[11:56am] StevenW: I feel like the TED tags are great for videos, but too fuzzy for encyclopedia articles. Personally.
[11:56am] Fajro: howie: we just should use different tags.
[11:56am] effeietsanders: Ah, Khan uses Youtube style up/down
[11:56am] DarTar: yeah TED-like tags make only sense if you can use them to discover topics
[11:56am] effeietsanders: but for comments
[11:56am] Fenix2 left the chat room.
[11:56am] WereSpielChequer: A database report of the thousand least readable articles would be useful
[11:57am] Eloquence: you could have a number of quality tags mirroring the ones used for templating articles, e.g. "needs citations", "needs a picture", "too technical" etc.
[11:57am] Eloquence: which might be more manageable than sifting through free-text comments to surface that specificity.
[11:57am] StevenW: True.
[11:57am] You left the chat by being disconnected from the server.
[11:59am] howief: WereSpielChequer: yes, we should include something like that in the dashboard
[11:59am] Eloquence: we're nearing the end of this discussion, but let's continue it on mediawiki.org asynchronously
[11:59am] WereSpielChequer: We already have a problem with too much tagging for our editors to keep up with, do we really want to encourage more?
[11:59am] effeietsanders: Eloquence: just a wild thought about the call for action alternatives... why not a "did you know" type? "did you know you can edit this page" "did you know this was written by volunteers" etc
[11:59am] effeietsanders: and then "more info"
[11:59am] Fajro: too much tagging?
[11:59am] Eloquence: just folks please keep in mind that we're continuing to try to learn what the best/most effective tools are, and we really need folks to dig into the data that we've made available to-date to help us surface what can/can't be useful in what we have today.
[12:00pm] RoanKattouw is now known as RoanKattouw_away.
[12:00pm] StevenW: Thanks for coming everyone.
[12:00pm] Eloquence: so, again, the current raw data is at http://toolserver.org/~catrope/articlefeedback/
[12:00pm] StevenW: This has been fruitful.
[12:00pm] killiondude: fotl.
[12:00pm] StevenW: :)
[12:00pm] Eloquence: and we're going to ask for full toolserver replication for toolserver hackers
[12:01pm] Eloquence: thanks all for your comments - lots of meat
[12:01pm] effeietsanders: now we just need to grow some bones to support it :P
[12:01pm] Eloquence: :-)
[12:01pm] Eloquence: y'all have a good day/night.
[12:01pm] StevenW: Adios