IRC office hours/Office hours 2012-02-02

<poem style="font-family:monospace,Courier;background:#F2F2F2">
[21:07] <Ironholds> hey SteveMobile, halfak, tommorris, mabdul|dog :)
[21:07] <Ironholds> so, we're going to talk about the data analysis from all the hand-coding and surveys
[21:07] <tommorris> for those who didn't see, this is the BEST thing I've seen thanks to the AFT: https://gist.github.com/1724508
[21:07] <Ironholds> and then, if we have time, we'll look at the feedback page :)
[21:07] <Ironholds> fabriceflorin_, do you want to open?
[21:08] * tommorris suffered a mortal copy-paste fail on GitHub
[21:08] * mabdul|dog is now known as mabdul|busy
[21:08] <mabdul|busy> but still here XD
[21:08] <fabriceflorin_> Hi everyone! today, we would like to share some PRELIMINARY findings of our work so far on Article Feedback v5.
[21:09] <fabriceflorin_> We have prepared a first set of slides, as well as an interim report about key findings so far.
[21:09] <SteveMobile> Ok
[21:09] <fabriceflorin_> I want to stress that our research work is not complete, so these are PRELIMINARY findings only, and more data will keep coming in over the coming weeks.
[21:09] * howief (~howiefung@216.38.130.165) has joined #wikimedia-office
[21:10] <Ironholds> and I want to stress how incredibly grateful we are to all of you for your hard work :)
[21:10] <fabriceflorin_> But there is enough there already for us to think about some of these early findings and discuss their implications for our work.
[21:10] <Ironholds> except you, tommorris. you haven't finished hand-coding. BAD.
[21:10] * Utar (50fa04ac@gateway/web/freenode/ip.80.250.4.172) has joined #wikimedia-office
[21:10] <Ironholds> Utar! Great to have you here :)
[21:10] <tommorris> Yes, I've been a bit busy with a little thing called life.
[21:10] <fabriceflorin_> Why, thank you Ironholds ;o)
[21:10] <Ironholds> we're just starting to discuss the research results
[21:10] <Ironholds> tommorris: I am not familiar with that term
[21:10] <fabriceflorin_> Here is a link to our PRELIMINARY slides: http://commons.wikimedia.org/wiki/File:Article-Feedback-Report-Slides-02-02-PRELIMINARY.pdf
[21:10] <howief> hey everyone
[21:10] <howief> sorry I'm late
[21:10] <fabriceflorin_> Hey Howie!
[21:11] <SteveMobile> Oh so he is howie
[21:11] <howief> hey!
[21:11] <fabriceflorin_> Hello SteveMobile.
[21:11] * SteveMobile waves to howief
[21:11] <Utar> Ironholds: I was reading through Wikimania text and just found your text.
[21:11] <howief> hey SteveMobile
[21:11] <SteveMobile> Ironholds told me a bit about you :p
[21:11] <howief> uh oh
[21:11] <fabriceflorin_> If you turn to page 2, here's an overview of our goals for this project.
[21:12] <SteveMobile> :p
[21:12] <fabriceflorin_> We have 2 primary goals: 1) engage readers to contribute more and 2) help editors improve articles.
[21:12] <mabdul|busy> fabriceflorin_: you only want to blame Ironholds ^^
[21:12] <Ironholds> mabdul|busy: in my experience, most things are my fault
[21:12] <mabdul|busy> +1
[21:12] <mabdul|busy> ^^
[21:12] <Ironholds> maybe we should have that as a WMF rule? "something went wrong" "Oliver did that thing"
[21:13] <fabriceflorin_> These two goals have led us to try inviting readers to provide article feedback, and try 3 different forms, and research them together. Today, we can look at some of the first results of our investigation.
[21:13] <SteveMobile> Lol
[21:13] <SteveMobile> WP:Ironholds
[21:13] <fabriceflorin_> If you turn to page 3, some highlights give you a sense of what we've done so far, and what we plan to do next.
[21:14] <fabriceflorin_> Any questions so far?
[21:14] <fabriceflorin_> Just wanted to set the context before getting started ;o)
[21:14] <mabdul|busy> SteveMobile: redirecting to WP:5EVIL?
[21:14] <fabriceflorin_> So let's go to page 5 for a look at overall findings: http://commons.wikimedia.org/w/index.php?title=File:Article-Feedback-Report-Slides-02-02-PRELIMINARY.pdf&page=5
[21:14] <Narodnik> peer review/assessment of quality is not a goal in itself, it appears
[21:15] <SteveMobile> Hm
[21:15] <mabdul|busy> fabriceflorin_: is 30k much?
[21:15] <SteveMobile> Looks good
[21:15] <fabriceflorin_> We have collected over 30,000 feedback posts so far, and about 73% of them had comments.
[21:15] <halfak> That 45% figure isn't "at least two" but rather "all editor". It just so happens we had two editors look.
[21:15] <halfak> *all editors
[21:15] <howief> Narodnik: quality assessment isn't a primary goal
[21:15] <Ironholds> Narodnik: it kinda follows on from "reader input"
[21:15] <fabriceflorin_> 98% were from anonymous users.
[21:15] <howief> but may fall out of the feature
[21:15] <tommorris> holy crap, 45% useful?
[21:16] <howief> (e.g., % found what they're looking for could be interpreted as quality assessment)
[21:16] <Narodnik> "anonymous users" means not registered editors or "didn't leave their name"?
[21:16] <Ironholds> tommorris: that's an average.
[21:16] <Narodnik> thanks howief
[21:16] <Ironholds> Narodnik: the former
[21:16] <halfak> tommorris: yes. As judged by two Wikipedians independently.
[21:16] <howief> "anon" = not logged in when they left post
[21:16] * tommorris understands, just surprised that it was that high.
[21:16] <Ironholds> note that this is before we implement the abuse filter, spam blacklist, etc.
[21:16] <fabriceflorin_> Well, let's qualify the 45% useful, tommorris. As Ironholds will explain in the second half, there are many ways to assess usefulness. Here we look at situations where 2 editors agreed the post was useful.
[21:17] * Keos (~keos-soph@a83-163-58-132.adsl.xs4all.nl) has joined #wikimedia-office
[21:17] <mabdul|busy> Ironholds: is 35k much for such a study?
[21:17] <Narodnik> the "finds useful" input is gathered through the FeedbackDashboard?
[21:17] <fabriceflorin_> But it is encouraging to note that both readers and editors find the process useful in general. This means both of our goals are being addressed by this tool so far.
[21:18] * shim-away (~andrew@wikimedia/Shimgray) has joined #wikimedia-office
[21:18] <Ironholds> mabdul|busy: halfak should answer that :)
[21:18] <mabdul|busy> Ironholds: that doesn't sound like much for the 8th biggest website on the web...
[21:18] * shim-away is now known as shimgray
[21:18] <halfak> Narodnik: It was gathered using an interface designed for coding feedback. See: http://en.wikipedia.org/wiki/Wikipedia:Article_Feedback_Tool/Version_5/Feedback_evaluation
[21:18] <Narodnik> thanks halfak
[21:18] <halfak> By "coding" I mean categorizing BTW.
[21:18] <Ironholds> mabdul|busy: it *was* deployed on 0.6 percent of the articles, though
[21:18] <tommorris> I think another feature of Option 1 that could be added is that it is VERY similar to what people are used to on a lot of websites. e.g. both Microsoft and Apple have those on tech support/developer pages.
[21:18] <fabriceflorin_> So let's take a look at the results for each feedback form. For those who are new to this project, we tested 3 feedback forms, which are illustrated on pages 6 to 9.
[21:18] <Ironholds> tommorris: that's a great point. Can you link me to the examples after the chat?
[21:19] <tommorris> Ironholds: yep
[21:19] <Ironholds> ta
[21:19] <Narodnik> what's the followthrough on reader-finds-it-useful-so-edits and editor-finds-it-useful-so-edits?
[21:19] <fabriceflorin_> All of these feedback forms enabled users to leave comments, but the first one also asked a yes/no question, while the second one offered a range of feedback types and the third one allowed users to rate the article.
[21:19] <Ironholds> Narodnik: the form itself, or the comments that are left?
[21:20] <Ironholds> so, we don't know yet. Our research into that is in its infancy :)
[21:20] <Ironholds> but personally I can't wait to see what the impact is on editing
[21:20] <Narodnik> are you tracking edits to pages by people who rate pages or rate ratings?
[21:20] <fabriceflorin_> If we jump to page 10, we get to see a PRELIMINARY comparison of the data we collected from four different studies. http://commons.wikimedia.org/w/index.php?title=File:Article-Feedback-Report-Slides-02-02-PRELIMINARY.pdf&page=10
[21:20] <Ironholds> Narodnik: an excellent question; I don't know precisely how we're working it
[21:20] <Ironholds> do you want me to email dario and find out?
[21:21] <howief> Narodnik: we can run that type of analysis for registered editors
[21:21] <howief> if the reader is registered, we can figure out what the impact is on subsequent editing behavior
[21:21] <Narodnik> howief: any insight on that front?
[21:21] <howief> not yet
[21:21] <howief> we unfortunately can't do meaningful analysis on anons
[21:21] <Narodnik> Ironholds: cheers, but don't trouble yourself on my account
[21:21] <SteveMobile> The current version looks good
[21:21] <howief> because of the multiple user to single ip issue
[21:22] <Utar> Ironholds: impact of AFT5 on editing?
[21:22] <Ironholds> Narodnik: I want to know the answer anyway ;p
[21:22] <fabriceflorin_> Based on overall volume, it appears that we received a lot more feedback for option 1 throughout this experiment.
[21:22] <Ironholds> Utar: sorry?
[21:22] <Utar> Ironholds: I repaired at least two articles while going through FES
[21:22] <tommorris> Ironholds: I have now finished my feedback. ;-)
[21:22] <Ironholds> Utar: awesome! :D
[21:22] <Ironholds> tommorris: thanks :)
[21:22] * Jamietw (~Jamietw@wikimedia/Jamietw) has joined #wikimedia-office
[21:23] <Narodnik> howief: you can track edits by IP though, for whatever that's worth?
[21:23] <Utar> tommorris: at last :D
[21:23] <howief> Narodnik: yeah you can, but we can't be sure it's the same individual
[21:23] <halfak> tommorris: Thanks!
[21:23] <Narodnik> gotcha
[21:23] <Ironholds> so, halfak has some info about tracking, apparently :)
[21:23] <Ironholds> halfak, over to you
[21:23] <Keos> My apologies if I'm interrupting, but is there anybody here I could talk to about graduate research regarding helping people make credibility judgments on Wikipedia? It'll be a literature study of the AFT as well as a lab experiment, and since it's still in the startup phase, input from Wikimedia is valued.
[21:23] <fabriceflorin_> We are not sure why that is yet. But the data is very clear on this point: Option 1 generated the most volume of posts, and Option 3 generated the least.
[21:24] <halfak> The problem is privacy. We want to track individuals who visit the page at all and their conversion rates... not just the ones who submit feedback.
[21:24] <howief> Keos: you can talk to dario
[21:24] <Ironholds> Keos: not immediately here, but I can suggest someone to email?
[21:24] <Narodnik> fabriceflorin_: are you chaps moving ahead with an Option 1-based model then? or more testing?
[21:24] <halfak> However, we *shouldn't* be tracking readers and their activities.
[21:24] <Keos> @Ironholds: that would be great!
[21:24] <fabriceflorin_> The next line in our comparison chart shows the percentages of editors who felt that the feedback posts they evaluated were useful.
[21:24] <Ironholds> Keos: send a proposal/question to dtaraborelli@wikimedia.org. He's our research guy :)
[21:24] <halfak> So I've been looking for ways to answer the question with aggregated data with DarTar.
[21:25] <tommorris> I have to say, given the quality of the feedback - which we now know both qualitatively and quantitatively, I hope that the feedback given during the trial will be loaded into the database at launch.
[21:25] <Keos> Thanks, Ironholds!
[21:25] <fabriceflorin_> These percentages are based on cases where 2 editors evaluated the same feedback posts. On that basis, there seemed to be a slightly higher level of utility for option 3.
[21:26] <Ironholds> Keos: no problem :)
[21:26] <Utar> just a question:
[21:26] <Ironholds> Utar: sure :)
[21:26] <Narodnik> tommorris: there aren't issues with mixing together data derived in different ways?
[21:26] <Ironholds> Narodnik: well, we're not using it for research
[21:26] <fabriceflorin_> But as you can tell from Aaron's report, which we will share with you shortly, there are other ways of assessing editor utility. Overall, though, we couldn't find any strong differences between the options.
[21:26] <Ironholds> so the source isn't as relevant
[21:26] <Ironholds> plus, it's already in the database :)
[21:26] <Utar> how many edits were marked as both USEFUL and NOT USEFUL?
[21:27] <tommorris> Narodnik: I don't see why we can't just dump all the feedback into the system. There's some valuable feedback in there about how to improve articles.
[21:27] <fabriceflorin_> The third line shows how much the readers who used the tool liked this article feedback tool.
[21:27] <Narodnik> Ironholds: sorry, I thought you were voiding the data from the earlier stages?
[21:27] <tommorris> It'd do a disservice to readers to solicit their feedback and then not act on it.
[21:27] <Ironholds> Narodnik: we shouldn't be!
[21:27] <Narodnik> tommorris: I agree wholeheartedly, just wondering
[21:27] <Ironholds> tommorris: exactly
[21:27] <Ironholds> but I think we need to consider the impact on high-profile pages
[21:27] <Utar> I mean how close were the ratings of different hand-coders
[21:28] <Ironholds> there are some articles where the feedback is utter garbage. We should take that into account when loading things in.
[21:28] <Ironholds> Utar: that's a question for halfak :). Aaron?
[21:28] <Narodnik> fabriceflorin_: are we talking about page 12 now or?
[21:28] <fabriceflorin_> In this case, there seems to be about the same level of reader satisfaction between option 1 and option 3.
[21:28] <tommorris> Ironholds: well, you can automatically hide the ones that we've marked as either abuse or irrelevant
[21:28] <Ironholds> tommorris: that's a great point
[21:28] <tommorris> and the stuff that's marked as useful, suggestion etc., you +n that, for some value of n
[21:29] <fabriceflorin_> Lastly, we also asked the team that helped develop and advised on this project for their impressions. In this case, it was pretty clear that the team's favorite was Option 1, by far.
[21:29] <halfak> Utar: We'll get to that in the quality analysis, but I should say that an overwhelming majority of hand coder ratings agreed.
[21:29] <Utar> halfak: ok, that's enough
[21:29] <Ironholds> tommorris: that. is genius.
[21:29] <halfak> Where they didn't agree, one or both hand coders marked their evaluation as "unsure"
[21:30] <fabriceflorin_> So it's still too soon for us to draw definitive conclusions based on this preliminary data, but Option 1 is looking like a good candidate to optimize for, if we all think that's a good direction to explore.
[21:30] <tommorris> personally, I prefer #2, but if the data says #1... do what the data says.
[21:30] <tommorris> Go Science!
[21:30] <Utar> halfak: so it can be said, mostly only unsure comments were marked both usable and unusable, can't it?
[21:30] <howief> Science!
[21:30] <fabriceflorin_> But we are now looking for your impressions, overall.
[21:30] <SteveMobile> Personally
[21:30] <SteveMobile> I wouldn't use the feedback tool
[21:31] <SteveMobile> I'd leave a comment on the talk page
[21:31] * tommorris might use the feedback tool on, say, mobile.
[21:31] <halfak> Utar: I want to say that, but not be quoted until I go back to the dataset.
[21:31] <howief> that's why you're a wikipedia editor :)
[21:31] <Ironholds> SteveMobile: sure, you're not a reader :)
[21:31] <Utar> tommorris: there are millions of people who don't know the Talk page exists
[21:31] <tommorris> but then, I might just protect the page and block everyone who edits it, just because I can. ;-)
[21:31] <SteveMobile> Lol I guess :p
[21:31] <howief> tommorris: mobile is interesting. I'd be very curious to see the s/n ratio for comments from mobile
[21:31] <Utar> eh, it should be to tommorris
[21:31] <fabriceflorin_> The next few slides show more data on volume, with a helpful timeline from Dario Taraborelli's report: http://commons.wikimedia.org/w/index.php?title=File:Article-Feedback-Report-Slides-02-02-PRELIMINARY.pdf&page=12
[21:31] <Utar> no tommorris, SteveMobile
[21:32] <Utar> damn, ctrlC,ctrlV
[21:32] <fabriceflorin_> That chart confirms that Option 1 (shown in green) received more volume of feedback very consistently throughout our investigation.
[21:32] <Utar> definitely to SteveMobile
[21:32] <halfak> Utar: If you are still curious about getting a good answer, would you drop me a note @ User:EpochFail and I'll get back to you on the specifics?
[21:32] <fabriceflorin_> The big dip, of course, is the SOPA blackout ;o)
[21:33] <Utar> halfak: no, only "most" "about half" "they check the same thing hardly anywhere" was good enough
[21:33] <fabriceflorin_> The next slide shows the relative utility agreement from the editor evaluations: http://commons.wikimedia.org/w/index.php?title=File:Article-Feedback-Report-Slides-02-02-PRELIMINARY.pdf&page=13
[21:33] <SteveMobile> Me what :p
[21:33] <halfak> Utar: OK :)
[21:33] <Utar> so I will go with "most"
[21:34] <fabriceflorin_> EpochFail did this fabulous research and can explain this chart better than I can. All yours, Aaron!
[21:34] <halfak> <--- EpochFail
[21:34] * Jamietw (~Jamietw@wikimedia/Jamietw) Quit (Remote host closed the connection)
[21:35] * Keos (~keos-soph@a83-163-58-132.adsl.xs4all.nl) has left #wikimedia-office
[21:35] <halfak> This chat will answer Utar's question indirectly. The utility is measured by three different ways of combining Wikipedians' evaluations.
[21:35] <halfak> *chart
[21:35] <Utar> ironholds: for readability issues, try to make the text on page 13 bigger
[21:35] <fabriceflorin_> My apologies, I meant to say halfak, which is the username that Aaron is currently using ;o)
[21:35] <mabdul|busy> fabriceflorin_: so, I don't want to be rude, but why this study? what does this study show, or what was the goal?
[21:35] <halfak> On the left is "someone" where at least one Wikipedian marked the feedback as useful.
[21:36] <halfak> In the middle is "everyone" where both Wikipedians marked the feedback as useful.
[21:36] <Ironholds> Utar: that would certainly help
[21:36] <halfak> On the right is "strict" (the proportion we like to report) where both Wikipedians marked the feedback as useful and neither signified they were unsure.
[21:36] <Utar> mabdul|busy: we are getting there, be patient
[21:37] <Ironholds> mabdul|busy: it was to see which of the three forms works best
[21:37] <mabdul|busy> Utar: yeah, no problem (me is following the log, but now I have time XD)
[21:37] <halfak> The error bars are based on the standard error from the normal approximation to a binomial distribution.
[21:37] <Ironholds> so, which one produces the most feedback, the most useful feedback, etc
[21:37] <fabriceflorin_> Hi mabdul|busy, you might like to read up about AFT5 at this project page, which will answer your question: http://www.mediawiki.org/wiki/Article_feedback/Version_5
[21:37] <halfak> Essentially, if the bars overlap, we aren't confident that the difference isn't just due to sampling error (randomness).
[21:38] <Utar> mabdul|busy: basic goal is to choose among options 1, 2 and 3
[21:38] <mabdul|busy> fabriceflorin_: yeah I did, but if we get feedback, what does this feedback change in the actual content?!
[21:39] <fabriceflorin_> Thanks, halfak, this is very helpful. Our next slide shows overall responses from over 1,400 readers who used the article feedback tool. http://commons.wikimedia.org/w/index.php?title=File:Article-Feedback-Report-Slides-02-02-PRELIMINARY.pdf&page=14
[21:39] * Jamietw (~Jamietw@wikimedia/Jamietw) has joined #wikimedia-office
[21:39] <Utar> halfak: I see you sent to FES only a random part of all the comments given?]
[21:39] <halfak> Utar: This is correct.
[21:39] <Utar> oh]
[21:39] <fabriceflorin_> These readers were given an opportunity to take a short survey right after they posted their feedback, which asked them if they liked this feedback form.
[21:39] <halfak> And each volunteer received a random sample(s) of that.
[21:40] <Utar> ironholds: that's why GF and BEnsin could be successful in making FES dry
[21:40] <halfak> (It's random all the way down).
[21:40] <Ironholds> Utar: I know, right! It was crazy!
[21:40] <Utar> how random is random? >:D
[21:40] <Ironholds> although halfak informs me I actually coded the most :D
[21:40] <fabriceflorin_> Again, we see no drastic differences between these forms, though there might be a slight advantage for Option 1, particularly if you subtract the number of dislikes from the number of likes.
[21:40] <halfak> Ironholds cheated though.
[21:40] <Utar> Ironholds: we know, B and GW are puppets :D
[21:40] <Ironholds> halfak: this is true
[21:41] <Ironholds> I used console commands
[21:41] <Ironholds> player.additem patrols 90000
[21:41] <fabriceflorin_> But the good news is that the general response is favorable. Readers like being given a chance to provide some suggestions on how to improve the articles.
[21:41] <Utar> first round was fast, second one slower, it needed 3 clicks instead of 2
[21:42] <fabriceflorin_> Lastly, we also asked our extended team for their impressions about this tool. The general feeling is that Option 1 would be a helpful tool for this purpose.
[21:42] <tommorris> how about editors? have there been any dramatic ANI-esque rumblings or plans to reject the project, murder all those involved and so on? Because that would suck.
[21:42] <Ironholds> (if anyone gets where that console command comes from: you rock)
[21:42] <Ironholds> tommorris: actually, no
[21:42] <Utar> [22:40] <fabriceflorin_> that's because of more comments too
[21:43] <Ironholds> I'm hoping it's because the project rocks
[21:43] <fabriceflorin_> So that's it for the overall highlights on these preliminary findings. I would like to turn this over now to Ironholds about the overview report he has compiled, based on halfak and DarTar's detailed reports.
[21:43] <FooBarMartijn> Ironholds, It's not just "it's pretty good" is it?
[21:43] * vvv (vvv@mediawiki/VasilievVV) has joined #wikimedia-office
[21:44] <Ironholds> FooBarMartijn: that would also be nice
[21:44] <Ironholds> the alternative is editors finally hate us too much to even bother complaining ;p
[21:44] <Ironholds> anyway!
[21:44] <Ironholds> okay, so I have a slightly more detailed report
[21:44] <fabriceflorin_> In the meantime, we will seek more comments from community members like you about your recommendations based on these preliminary findings. The goal is to select a feedback form to optimize for next week, then run more A/B tests for that selected form for the next few weeks, before deploying a final candidate more widely. Sound good?
[21:45] <Utar> Ironholds: another way is puppets of devs made thousands of positive comments :D who knows?
[21:45] <Ironholds> the report is at http://meta.wikimedia.org/wiki/Research:Article_feedback/Interim_report, if everyone wants to grab a copy
[21:45] <Ironholds> print it off! Bring it to Wikimania! Get it autographed!
[21:45] <Ironholds> pose for a picture with the author!
[21:45] <SteveMobile> Sign it for me
[21:45] <Ironholds> </joke>
[21:45] <Ironholds> if you all want to take a look at that, it's a brief summary of all the different research
[21:46] <Utar> Ironholds: I am trying to find opening tag
[21:46] <Ironholds> Utar: opening tag?
[21:46] <fabriceflorin_> Hehe, Ironholds. ;o)
[21:46] <Utar> Ironholds: was it posted before I joined this channel?
[21:46] <Ironholds> Utar: opening what, sorry?
[21:46] <mabdul|busy> Ironholds: I will tag it as [citation needed] and Template:Unreliable and Template:Primarysource ^^
[21:46] <Ironholds> so, the crucial bit is http://meta.wikimedia.org/wiki/Research:Article_feedback/Interim_report#Comparisons - that's the breakdown of each type of data we gathered
[21:46] <Utar> <XXX> is tag XXX
[21:46] <Ironholds> mabdul|busy: it is all of those things
[21:48] <Ironholds> anyway!
[21:48] <mabdul|busy> Ironholds: I will check your draft later ;) I have to do stats and discussion about them for a psychology study :/
[21:48] <Ironholds> so, the form that performed best in each category is bolded
[21:48] <Ironholds> mabdul|busy: ooh, fun!
[21:48] <Utar> Number of posts is inaccurate
[21:49] <Utar> you used a random sample of the whole sack
[21:49] <Ironholds> Utar: what do you mean?
[21:49] <Ironholds> the number of posts is all the posts
[21:49] <Utar> or was it some defined percentage of whole group of comments given?
[21:49] <Ironholds> but the posts we evaluated through FES, that's a random sample
[21:50] <Ironholds> Utar: the methodology section makes clear they're gathered in different ways :)
[21:50] <Ironholds> but halfak can show you the full report if you want?
[21:50] <Utar> Ironholds: I am just a bit curious about "number of posts" if it is some (random) part of whole
[21:50] <fabriceflorin_> Utar, we used different samples for different studies, which explains some of the discrepancies.
[21:50] <Ironholds> Utar: oh, right!
[21:50] <Ironholds> no, "Number of posts" is "all the posts"
[21:51] <halfak> Utar: Full report on samples in FES: http://meta.wikimedia.org/wiki/Research:Article_feedback/Quality_assessment
[21:51] <Ironholds> between 9th January and 24th January
[21:51] <Utar> 2,630? in presentation was 10k
[21:51] <Ironholds> Utar: different metrics
[21:51] <fabriceflorin_> But there seems to be a general convergence between studies on a number of points, which is helpful for us as developers.
[21:51] <Utar> oh
[21:51] <Ironholds> (there was a small bug with option 3 on like, 4th January or something, so the actual research reports use data from the 9th, when a patch was applied)
[21:51] <Utar> that's it
[21:51] <Utar> :[
[21:52] <Ironholds> Utar: sorry :(
[21:52] <Utar> presentation Preliminary: option one, total posts 10540
[21:52] <Ironholds> Utar: ahh, gotcha
[21:53] <Ironholds> okay, so "total posts" is all the posts
[21:53] <Ironholds> but "number of posts" in my report is "all the posts from 10th January", because of the bug
[21:53] <Utar> oh
[21:53] <Utar> that's pissing the credibility of that document a bit, don't you think :D?
[21:54] <Utar> my lawyers will look through it, just continue :D
[21:54] <Ironholds> Utar: yeah. the results for both samples were fairly similar, though. Whichever set of numbers we use, option 1 remains in the lead for volume
[21:54] <Ironholds> hahaha
[21:55] <fabriceflorin_> I would like to clarify that the volume percentages on page 10 are based on the total number of feedback posts that included comments, using a larger data set than the one in DarTar's study report, which is detailed on page 11. Sorry for any confusion. The good news is that the relative order between options doesn't change, no matter which data set we used.
[21:56] <mabdul|busy> -.- I hate statistics XD
[21:56] <Utar> fabriceflorin_: ok, just try to clarify that in those documents to not confuse more people
[21:56] <Ironholds> mabdul|busy: oh god, so do I
[21:56] <Ironholds> but I pretend otherwise so halfak will love me
[21:56] <halfak> mabdul|busy & Ironholds: I hate stats too.
[21:56] <mabdul|busy> Ironholds: but I can do this stuff somehow... dunno why. I have a good hand for this...
[21:56] <halfak> It's just so useful.
[21:56] <halfak> :\
[21:56] <fabriceflorin_> Thanks for keeping us honest, Utar!
[21:57] <mabdul|busy> halfak: [citation needed]
[21:57] <Ironholds> statistics have an 83.55 percent chance of being useful
[21:57] <mabdul|busy> heh
[21:57] <fabriceflorin_> DarTar's report on Article Feedback volume is here: http://meta.wikimedia.org/wiki/Research:Article_feedback/Volume
[21:57] <Utar> said who?
[21:57] <Utar> eh, who said that, I mean?
[21:57] <FooBarMartijn> within a 95 percent confidence bracket of 7.03 percent point
[21:57] <fabriceflorin_> Lots to chew on … DarTar is on vacation this week, but can give us a full overview when he returns.
[21:58] <Ironholds> FooBarMartijn: you fool! it's 7.02 percent! The project is doomed!
[21:58] <mabdul|busy> fabriceflorin_: uuugh, why are the graphs in that pdf so badly pixelated? use svg/some useful tools XD
[21:58] * RoanKattouw_away (~chatzilla@mediawiki/Catrope) Quit (Quit: zzzzzzzzzzzzzzzzzzz)
[21:58] <Utar> does "confidence bracket" mean "plus or minus"?
[21:59] <halfak> http://en.wikipedia.org/wiki/Confidence_interval?
[21:59] <fabriceflorin_> Hey mabdul, if you click on the PDF image, it takes you to the full, non-pixelated version of the PDF, at this URL: http://upload.wikimedia.org/wikipedia/commons/8/8e/Article-Feedback-Report-Slides-02-02-PRELIMINARY.pdf
[21:59] <FooBarMartijn> yeah, that's what I meant ;)
[21:59] <FooBarMartijn> Ironholds, percent, or percent point?
[22:00] <Ironholds> FooBarMartijn: percent points
[22:00] <mabdul|busy> fabriceflorin_: I was talking about http://meta.wikimedia.org/wiki/Research:Article_feedback/Volume, but yes, it seems a mw problem XD
[22:01] <Utar> FooBarMartijn: from 87.97 to 102.3 percent chance of being useful ???
[22:01] <Utar> lol
[22:01] <fabriceflorin_> Hey everybody, do you have any general observations about phase 1 of the AFT5 project, based on this preliminary data?
[22:01] <Utar> File:Article-Feedback-Report-Slides-02-02-PRELIMINARY.pdf
[22:02] <Utar> what are those letters on page 12?
[22:02] <tommorris> fabriceflorin_: in general, it's rocking pretty hard.
[22:02] * pgehres (~pgehres@wikimedia/Pgehres-WMF) has joined #wikimedia-office
[22:02] <mabdul|busy> ("Rate this article", orange) – under Feedback by design ("boxplot") is really bad --> It's missing a legend
[22:02] <mabdul|busy> Utar: how can something be reported 140 (%?)?
[22:03] <fabriceflorin_> Thanks, tommorris, really appreciate the encouragement. We are certainly giving it our absolute best, because we think it's an important opportunity to engage more users, with an overall focus on quality.
[22:03] <Utar> mabdul|busy: it is the number of comments, not some percentage
[22:03] <mabdul|busy> oooh
[22:03] <Ironholds> Utar: investigating :)
[22:03] <Utar> Ironholds: another Da Vinci Code?
[22:03] <mabdul|busy> no really, only 100 for option3 (rate the article)?
[22:04] <fabriceflorin_> Utar, the letters mark significant events that influenced the project.
[22:04] <Ironholds> Utar: heh. they mark significant events that distort things
[22:04] <Ironholds> so if you see D, for example, that's the SOPA blackout
[22:04] <Utar> I just got to that too
[22:04] <Utar> interesting there are still some
[22:04] <howief> here's the actual report: http://toolserver.org/~dartar/aft5/
[22:04] <Utar> I saw latest changes that day
[22:04] <howief> you can hover over the letters to see what they represent
[22:05] <Utar> and only HELP file for SOPA was under construction
[22:05] <Utar> with even TWO hours between two edits
[22:05] <fabriceflorin_> Check out DarTar's report to learn more: http://meta.wikimedia.org/wiki/Research:Article_feedback/Volume
[22:05] <Utar> that was English Wikipedia rush hour :D
[22:05] * Theo10011 (~Theo10011@wikimedia/Theo10011) has joined #wikimedia-office
[22:05] <mabdul|busy> fabriceflorin_: I do, answer my last question XD
[22:06] * gwicke is now known as gwicke_away
[22:06] <Utar> howief: my hovering is not doing anything
[22:06] <Ironholds> mabdul|busy: oh!
[22:06] <Ironholds> you mean how the numbers all hover around 100?
[22:06] <mabdul|busy> Utar: turn on javascript ^^
[22:06] <fabriceflorin_> Note that the best data for our purposes starts after the black-out, when we collected the most reliable data, and it was outside the holiday period, which is always a bit unusual. So the next few weeks will offer more findings.
[22:07] <mabdul|busy> Ironholds: http://meta.wikimedia.org/wiki/Research:Article_feedback/Volume#Feedback_by_design
[22:07] <mabdul|busy> the boxplots
[22:07] <Ironholds> mabdul|busy: oh, gotcha
[22:07] <Utar> mabdul|busy: I know something called readability and accessibility
[22:07] <Utar> what about poor people without javascript?
[22:07] <Ironholds> yes; option 3 got substantially less feedback than the other two
[22:07] <fabriceflorin_> halfak, what are you working on next as part of your research? Also, what are your key observations so far?
[22:07] <Ironholds> it's noted in my interim report :) [22:07] <mabdul|busy> Utar: (I have it turned of by default except some sites liek the wmf servers XD) [22:07] <Ironholds> so, I'll be back in two minutes [22:07] <Ironholds> I have to go to the little contractor's room [22:08] <mabdul|busy> Ironholds: and what is the Y-achis? [22:08] <mabdul|busy> *axis [22:08] <Ironholds> which is two jokes, in fact, because I'm a midget [22:08] <fabriceflorin_> Hehe Oliver, [22:08] <tommorris> it's like going to the toilet, but you charge for a whole hour at the end of it. [22:08] <Ironholds> mabdul|busy: I believe it's feedback posts, but I'm not entirely sure. Do you want me to email dario and find out? [22:08] <Utar> goes to find a dictionary with midget [22:08] <Ironholds> tommorris: no, I only do that for lunch and naps [22:08] <mabdul|busy> at least he is not smoking... [22:08] <fabriceflorin_> Very funny, TomMorris. [22:08] <Utar> found that [22:08] * howief (~howiefung@216.38.130.165) Quit (Quit: howief) [22:09] <halfak> Overall, I've learned from the analysis of quality that the minor differences between the three interfaces don't substantially affect the quality of feedback that we receive from them. With this in mind, I think it is more important that we focus on maximizing the amount of feedback and capturing information from users that Wikipedians will actually find useful (e.g. Proportion of feedback marked "Did you find what you were looking for?") [22:09] <Ironholds> back! [22:10] <Utar> reads it once again [22:10] <halfak> Next, I'll be looking at editor conversion with DarTar. On a related note, we've got some plans to examine editor retention and use of MoodBar. 
[22:10] <Utar> finnaly got through [22:10] * sgardner (~sgardner@216.38.130.163) Quit (Ping timeout: 272 seconds) [22:11] <tommorris> oh yeah, one little thing: next time, please, please, if there's ever a "help us churn through lots of data" tool - please make it keyboard accessible [22:11] <jorm> i'd love to have some serious numbers about moodbar and editor retention. [22:11] <jorm> also feedback dashboard responses. [22:11] <Utar> somebody: how it looks with positioning of link to comment and links to page witrh got comments? [22:11] <halfak> tommorris: I wish you would have said so earlier. I respond quickly to feature requests! [22:12] <mabdul|busy> halfak: sry, but only for a conclusion for me: why did the wmf send money(?)/manpower/whatever on this stuy(ies?) to get results on a design with no improvement on the content? I mean what does the feedback change the product now? [22:12] <tommorris> halfak: have you seen Freebase's Genderizer? I could spend all day on that thing [22:12] * howief (~howiefung@216.38.130.165) has joined #wikimedia-office [22:12] <Utar> tommorris: good point, my mouse had to go on vacation for a week after a night with FES [22:12] <Ironholds> Utar: hahah [22:12] <Ironholds> mine went to maui [22:13] <Utar> Ironholds: mine told me later that she met yours on the beach [22:13] * Jamietw (~Jamietw@wikimedia/Jamietw) Quit (Quit: Leaving) [22:13] <halfak> mabdul|busy: Now we *know* that the changes to the design didn't have a substantial effect on *quality* but we *do* know that it has an effect on *quantity* [22:13] <Ironholds> Utar: yeah, he sent me a load of photos [22:13] <Ironholds> they look happy together [22:14] <Utar> Ironholds: i had to reserve whole one wall in my room for them [22:14] <Utar> Ironholds: we should not stand in the way of thir happiness [22:14] <mabdul|busy> halfak: so with these feedbacks of the (mostly anon) users the content changes? how? in which kind? 
[22:14] <Ironholds> Utar: hahaha [22:14] <Utar> i, at least, hope it is only friendship and notjing more [22:14] <mabdul|busy> Ironholds: with cable? XD [22:14] <Utar> : [22:14] <Utar> D [22:15] <Ironholds> mabdul|busy: so, could you clarify your question, please? [22:15] <Utar> Ironholds: which connections are between people editing page and people commenting page [22:15] <Ironholds> Utar: oh! [22:15] <mabdul|busy> "[23:14:07] <halfak> mabdul|busy: Now we *know* that the changes to the design didn't have a substantial effect on *quality* but we *do* know that it has an effect on *quantity*" how? [22:16] <mabdul|busy> how does it affect? [22:16] <Ironholds> mabdul|busy: we don't know which elements make a difference in quantity [22:16] <Ironholds> just that option 1 gets more hits than 2 or 3. [22:16] <mabdul|busy> yes, there is the gap! [22:16] <Ironholds> it's like saying "why does gravity happen" [22:16] <halfak> We just know that option #1 fares better. [22:16] <Ironholds> we don't know why, just that it does [22:16] <halfak> *better = more [22:16] <Utar> Ironholds: maybe option 1 got "good" articles? [22:16] <Ironholds> well, actually we do know why gravity happens [22:16] <mabdul|busy> so we have a study on design which makes no matter (ok, a conclusion) [22:17] <Utar> [23:16] <Ironholds>: it is MAGIC [22:17] <mabdul|busy> better = more, but no content change, or? [22:17] <Ironholds> Utar: well, no. all of them were deployed on the same articles [22:17] <Ironholds> but each got a third of users :) [22:17] <Utar> so it got good users [22:17] <Utar> easy [22:17] <Ironholds> Utar: haha. possible! [22:18] <fabriceflorin_> Hey guys, thanks as always for all your good observations and questions. Your insights are always much, much appreciated. You guys keep us honest, and your input is hugely valuable in planning our next steps. [22:18] <mabdul|busy> so again: with that feedback, will it ever have an impact on the content? or is it just for the archives? 
[22:18] <jorm> $deity, i hope it impacts content. [22:18] <Utar> is it possible to find there were, say tottaly 10 IP addresses which commented more than 100 pages [22:18] <Utar> ? [22:19] <fabriceflorin_> I have to go now, but look forward to continuing our discussion in coming weeks. Onward! [22:19] <Ironholds> mabdul|busy: it will hopefully have an impact [22:19] * SarahBusy (~SarahStie@wikipedia/SarahStierch) Quit (Quit: Ta-ta!) [22:19] <Utar> fabriceflorin_: that's why we have been swaged [22:19] <Ironholds> but that's up to editors [22:19] <jorm> my vision is to have it accumulate so that we can further refine the "what can be done to improve this" bit, into a set of common bullets, and use them to generate work queues. [22:19] <shimgray> work queues would be great [22:19] <mabdul|busy> Ironholds: how? [22:19] <Ironholds> so, Utar tells me he made two tweaks based on the feedback he got just from doing the analysis :) [22:19] <Utar> those good-commenter could be good material for a wikipedian [22:19] <shimgray> especially if we can go the other way as well, and absorb article tags [22:19] <Ironholds> mabdul|busy: all comments will be left on a feedback page that editors can view, and then make edits accordingly [22:19] <halfak> mabdul: I'm not quite sure what you are getting at and I'm sorry to say I'll have to run, but if you want to talk more, I'm watching the talkpage here: http://meta.wikimedia.org/wiki/Research:Article_feedback/Quality_assessment and you can always message me here: http://en.wikipedia.org/wiki/User:EpochFail [22:20] <halfak> Take care guys! [22:20] <mabdul|busy> halfak: i will if Ironholds can't answer later XD [22:20] <fabriceflorin_> Thanks, Utar, it's really cool to be working with you over such a long stretch of time. You have made a big difference already! 
(even if we don't always use everyone of your good ideas ;o) [22:20] * halfak (halfak@tako.cs.umn.edu) has left #wikimedia-office [22:20] <Utar> it were those: [22:20] <Utar> http://en.wikipedia.org/w/index.php?title=January_17&diff=prev&oldid=473323623 [22:20] <fabriceflorin_> OK, bye y'all! Talk to you next week. Thanks again for everything … [22:20] <Utar> http://en.wikipedia.org/w/index.php?title=January_17&diff=prev&oldid=473323623 [22:21] <Utar> dman [22:21] <Utar> http://en.wikipedia.org/w/index.php?title=Johnny_Gildea&diff=prev&oldid=473323647 [22:21] <fabriceflorin_> Yes, I know, Utar. We haven't forgotten. [22:21] <Utar> anddcan someone answer my last uestion? [22:21] <Utar> about really active commentrers? [22:21] * tommorris wonders if WMF designers have ever read Yahoo's excellent Social Design Patterns website - http://developer.yahoo.com/ypatterns/social/ [22:22] <Ironholds> tommorris: ask jorm :) [22:22] <Utar> fabriceflorin_: by fab, see you soon again [22:22] <jorm> i have. [22:22] <jorm> i look at anything with "yahoo" and "patterns" with extreme suspicion. [22:22] <Ironholds> okay, so it's Oliver Plausible Deniability Keyes, here to answer your miscellaneous questions [22:22] <Ironholds> anyone with questions that weren't answered during office hours...shoot! [22:23] <Utar> mine [22:23] <tommorris> some good people wrote it, even if they did work at Yahoo [22:23] <Ironholds> Utar: yes! Remind me? [22:23] <Utar> Ironholds: is it possible to find there were, say tottaly 10 IP addresses which commented more than 100 pages [22:23] <tommorris> note the word 'did'. for many, that is the appropriate term. 
[22:23] <Ironholds> Utar: it would be possible, yeah; all feedback actions should be logged [22:23] <Ironholds> so theoretically you'd be able to get them from the API, say [22:23] <Utar> those good-commenter could be good material for a wikipedian, [22:23] <Ironholds> but I'm not sure if it would tell us anything useful [22:23] <Ironholds> the entire nation of quatar shares one email address, say ;p [22:23] <Ironholds> *IP address [22:23] <Ironholds> sorry, brainfuck [22:24] <Utar> and I would advice to some tema to conatct them [22:24] <Ironholds> *nods* [22:24] <FooBarMartijn> jorm, the social design, but also the perfomence guidelines by Yahoo are excellent [22:24] <Ironholds> that's something we should totally look into after we get things working [22:25] <jorm> oh, i'm not saying it's not worth looking at. [22:25] <jorm> but these are the same people who made yahoo answers. [22:25] <Utar> and i also didnt saw responce for my previous answer [22:25] <FooBarMartijn> true :) [22:25] * fabriceflorin_ (~fabricefl@216.38.130.165) Quit (Read error: Connection reset by peer) [22:25] <Ironholds> Utar: sorry? [22:26] <Utar> looking for it [22:26] * sgardner (~sgardner@216.38.130.166) has joined #wikimedia-office [22:26] <Utar> [23:11] <Utar> somebody: how it looks with positioning of link to comment and links to page witrh got comments? [22:26] <Ironholds> sgardner: you're coming to my office hours session too? ;p [22:26] <Ironholds> Utar: uhh. could you clarify? [22:27] <Ironholds> I'm having trouble parsing that [22:27] <Utar> Ironholds: sure [22:27] <Utar> are there links to page if comments gained alreay? [22:27] <Utar> if = with [22:27] <Ironholds> Utar: oh, right! [22:27] <Ironholds> as in, links to the feedback page? [22:27] <Utar> yup [22:27] <Ironholds> we have the links. Fabrice has told me I can't show it to you until next week :S [22:28] <Utar> I like to make new teminus technicuses. [22:28] <Ironholds> which sucks. 
Up until 20 minutes before office hours we were meant to distribute today [22:28] <Utar> Ironholds: sad, but satysfaing [22:28] <Ironholds> sorry :( [22:28] <Utar> second part: [22:28] <Utar> how with links do FEEEDBACK THIS PAGE? [22:28] <Utar> positioning problems [22:29] <Utar> up, left, right, corner , floating, A,B,C,D,E,F,G,H...... [22:29] <Ironholds> oh, gotcha! [22:29] <Utar> [feedback] and so [22:29] <Utar> :D [22:29] <Ironholds> so, what positioning are we planning to test next? [22:29] <Ironholds> we're scrapping the annoying thing on the bottom right :D [22:30] <Ironholds> I think the one we're testing after that is a teeny tiny "provide feedback" link under the article title [22:30] <mabdul|busy> -.- another useless study XD [22:31] <Ironholds> mabdul|busy: how's it useless? [22:31] <Utar> bytheway, do you have some ryules of graphic of navboxes? [22:31] <Utar> Template:Pittsburgh Steelers starting quarterback navbox is iching my eyes [22:31] <Ironholds> uhh. nooo idea, I'm afraid :) [22:31] <Ironholds> I'm not a templates person [22:31] <mabdul|busy> Ironholds: uugh [22:32] <mabdul|busy> Ironholds: content related [22:32] <Utar> Ironholds: ok, that means we are a furlong in front [22:32] <Ironholds> mabdul|busy: well, it's basic a/b testing [22:32] <Utar> i should really do the transltion [22:32] <Ironholds> we want to work out which form produces the most useless feedback and the least sucky feedback [22:32] <Ironholds> sorry, most useful [22:32] <mabdul|busy> Ironholds: with the wrong basis... [22:32] <Ironholds> *useful*! :D [22:32] <Ironholds> mabdul|busy: howso? 
[22:32] * SteveMobile (~SteveMobi@wikimedia/Steven-Zhang) Quit (Quit: Colloquy for iPhone - http://colloquy.mobi) [22:33] <Utar> Ironholds: you should also mention tha tlater studied position will proably got more hits as AFT goes to people minds [22:33] <Ironholds> Utar: agreed :) [22:33] <Ironholds> it's going to be interesting to see what happens when people get used to the form [22:33] <Utar> or less, if they find it disgusting :D [22:33] <Ironholds> like, at the moment we're getting lots of "TEST!" style feedback [22:34] <Utar> yeah [22:34] <Utar> i saw it [22:34] <Ironholds> okay; I'm afraid I have to go :( [22:34] <Ironholds> the other project I'm working on - the new Special:NewPages interface? I have to publish a complete community engagement plan by Monday [22:34] <Utar> Ironholds: some talkings about exact feedback can be uuseful too [22:34] <Ironholds> or on monday. one of the two. I'll find out! [22:34] <Utar> like "you marked it so , you other way - why?" [22:35] <Ironholds> Utar: that would be really interesting to investigate. I'll talk to Aaron [22:35] <Utar> or "what they intended by posting it"? [22:35] <mabdul|busy> Ironholds: you should concentrate a/b testing on something you want to improve something [22:35] <Ironholds> mabdul|busy: we do want to improve it? [22:35] <Ironholds> otherwise we wouldn't be testing on it [22:35] <Ironholds> we don't just design testing infrastructure for shits and giggles. [22:35] <Utar> Ironholds: I for exapmple got at least two nonEnglish comments [22:35] <mabdul|busy> Ironholds: yeah, what do you want to impove? [22:35] <Ironholds> mabdul|busy: the Article Feedback Tool? [22:36] <Utar> mabdul|busy> Wikipedia? [22:36] <Ironholds> which is why we're a/b testing tweaks to it? [22:36] <Utar> World? [22:36] <Ironholds> anyway; lovely talking to y'all, to steal fabrice's word. Drop me an email if you have any additional questions - okeyes@wikimedia.org [22:36] <Utar> It> Session Close: Thu Feb 02 22:36:22 2012