Research:Newsletter/2011/November

Vol: 1 • Issue: 5 • November 2011

Quantifying quality collaboration patterns, systemic bias, POV pushing, the impact of news events, and editors' reputation

With contributions by: Tbayer, Hfordsa, DarTar and Romanesco

Collaboration pattern analysis: Editor experience more important than "many eyes"

One of the motifs indicating article quality: One editor (top) having worked on several related articles (bottom)

A paper titled "Characterizing Wikipedia Pages using Edit Network Motif Profiles"^[1] by three researchers from University College Dublin indicates that the quality of a Wikipedia article can be predicted from characteristics of its "edit network" – a graph derived from the collaboration of Wikipedians in that area. Network motifs are small graphs which occur particularly frequently as sub-graphs of networks of a certain kind, and can be regarded as their building blocks in some sense. (The concept is popular in bioinformatics, where it is applied to gene regulatory networks.) In this paper, the authors use graphs with at most five nodes consisting of users and articles, which are connected by an edge if the user has edited the article – giving 17 possible "Wikipedia network motifs". (Anonymous users are disregarded.) For a Wikipedia article, the researchers form an "ego network" consisting of that article, articles which link to it (and have been edited by at least one of the users who edited the core article), and the users who edited them. For a sample of around 2000 articles from the History and United States categories, the frequencies of the 17 "Wikipedia network motifs" in those articles' "ego networks" were calculated.
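The construction described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the dictionary-based data layout and the toy data are assumptions made for illustration, and only one family of motifs (stars with an editor at the centre) is counted rather than all 17.

```python
from math import comb

def ego_network(core, editors_of, linking_articles):
    """Return {article: set of editors} for the ego network of `core`.

    editors_of: dict mapping an article title to the set of registered
    users who edited it (anonymous users are assumed to be filtered out).
    linking_articles: iterable of articles that link to `core`.
    """
    core_editors = editors_of.get(core, set())
    network = {core: set(core_editors)}
    for article in linking_articles:
        article_editors = editors_of.get(article, set())
        # Keep a linking article only if it shares an editor with the core article.
        if article_editors & core_editors:
            network[article] = set(article_editors)
    return network

def count_editor_stars(network, k):
    """Count star motifs made of one editor connected to k distinct articles."""
    articles_per_editor = {}
    for editors in network.values():
        for editor in editors:
            articles_per_editor[editor] = articles_per_editor.get(editor, 0) + 1
    # An editor who touched n articles is the centre of C(n, k) such stars.
    return sum(comb(n, k) for n in articles_per_editor.values())

# Hypothetical toy data, not taken from the paper.
editors_of = {
    "A": {"u1", "u2"},
    "B": {"u1"},
    "C": {"u3"},
    "D": {"u1", "u2"},
}
network = ego_network("A", editors_of, ["B", "C", "D"])
print(sorted(network))                 # articles kept in the ego network
print(count_editor_stars(network, 2))  # editor-centred stars with two articles
```

Because users are only ever connected to articles, never to other users or articles to articles, counting these stars reduces to a binomial coefficient per editor. A full motif profile would repeat a count of this kind for each of the 17 shapes.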

Using machine learning techniques, the researchers are able to distinguish with some certainty articles of basic quality (defined as having been assessed as Start class by Wikipedians) from those of good quality (defined as Featured or B class), solely based on this set of motif frequencies in the article's edit network. Looking at the impact of each of the 17 types separately, they found that "all network motifs have some potential to discriminate between good and basic Wikipedia articles" in the sample, but that among the four best predicting motifs, three are "stars with editors at their centre":

"This is interesting because it shows that many eyes is not really the defining characteristic of quality; instead experience is important – the editors should have worked on many other articles."

Another section of the paper constructs spatializations of the sample (i.e. a two-dimensional mapping where articles with similar motif frequency are close to each other). For the history articles sample, this visualization clearly separated B class and Start class articles, but Featured articles are "more spread out", with two clusters on opposite sides of the diagram. The researchers made the interesting discovery that this seems related to the assessed importance of the articles:

"It transpires that the Featured Articles on the left are inclined to be low or mid importance compared to high or top importance articles on the right. This niche characteristic is emphasized by the fact that these articles are inclined not to have been featured on the Wikipedia main page. We conclude from this that, at least in edit network terms, some low importance Featured Articles look like more ordinary articles. ... It seems that articles on niche topics can reach Featured Article status without a huge amount of collaboration."

Systemic bias quantified for twenty language Wikipedias

A paper titled "Cultural Configuration of Wikipedia: Measuring Autoreferentiality in Different Languages"^[2] by two researchers from the Universitat Politècnica de Catalunya, published in the proceedings of the "Recent Advances in Natural Language Processing" conference and apparently based on the first author's master's thesis,^[3] attempts to test the hypothesis that "contributing for the visibility of the own national or language related content" is among the motivations to participate in Wikipedia. According to the authors, "some informal surveys in Catalan WP association ‘Amical Viquipèdia’ showed how the national topics were a focus of interest for writing and conflict". They propose the concept of "autoreferentiality" "to describe the interest of a culture on itself, which in WP translates to the interest of editors for their own local content in a WP language edition", and set out to measure it by various quantitative features, which are first defined on the article level and then tested on a selection of articles that are assumed to be "local content", using the Java-based WikAPIdia tool. (This set is formed by starting with a few keywords clearly pertaining to the local language, and then including articles which share categories. As examples from their own language, the authors list "“catalunya”, “català”, and also “valencia” or “mallorquí”" as start words, which "would retrieve titles in articles and categories like “escriptors de catalunya” or “dret català”, referring to writers and law".) Among the tested quantitative features are:

"Isolation", based on the number of interwiki links
"Effort", based on the size of the article and the number of internal links it contains
"Prominence", based on the number of incoming wikilinks, the number of categories where the article is a member, and its PageRank
"Edition", a measure of how diverse the authorship of an article is, specified as the smallest number of editors who together contribute 80% of the page's edits (assumed to be lower for local content because it is edited by "highly motivated users")
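The "Edition" feature in the list above is concrete enough to sketch in Python. This is an illustration only, assuming per-editor edit counts are already available as a dictionary; the function name `edition` and the sample counts are hypothetical and not taken from the paper.

```python
def edition(edit_counts, share=0.8):
    """Smallest number of editors who together made at least `share` of the edits.

    edit_counts: dict mapping an editor name to that editor's number of
    edits on one page (hypothetical input format).
    """
    total = sum(edit_counts.values())
    if total == 0:
        return 0
    covered = 0
    # Taking the most active editors first yields the smallest possible group.
    ranked = sorted(edit_counts.values(), reverse=True)
    for n, count in enumerate(ranked, start=1):
        covered += count
        if covered >= share * total:
            return n
    return len(ranked)

# A page dominated by two editors versus a page with evenly spread edits.
print(edition({"a": 60, "b": 25, "c": 10, "d": 5}))
print(edition({"a": 1, "b": 1, "c": 1}))
```

A low value means a handful of editors account for most of the page's history, which is the pattern the authors expect for local content.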

The paper applies the eventual formula to Wikipedias in twenty languages – the English language edition is excluded "due to its size and difficulties in processing in all dimensions", and the second and third largest Wikipedias (German and French) are missing as well. In the final "autoreferentiality index", the Icelandic, Japanese and Swahili Wikipedias come out as the most local-focused among these twenty, while, curiously, the Catalan edition which prompted the research question has the lowest autoreferentiality value.

Does "In the news"-like attention have a positive effect on article quality?

A five-page paper^[4] by a Ph.D. student in Computer Science at the University of Iowa examines "The Impact of Heavy Editorial Events on Wikipedia Page Quality" – for example the flurry of edits to the article Elizabeth Taylor after the actor's death in March 2011. To measure quality, the approach of an earlier paper^[5] is used, which assigns article contributors a reputation value depending on how many of their earlier contributions have been deleted, and by whom, and also takes into account whether the article revision in question was reverted later. The resulting formula was applied to "high editorial events" in 100 articles of the English Wikipedia, from the start of Wikipedia in 2001 until the beginning of 2010. As expected, the data supported the hypothesis that "high editorial events would contribute positively to a page's quality". The five articles impacted most positively among the studied sample (biased toward the beginning of the alphabet) were art, Allen Ginsberg, anarcho-capitalism, chiropractic and death. The paper also found that a higher increase in the edit rate was associated with a higher quality increase, but it does not address the question of whether the relation could be explained by the mere number of edits (i.e. whether the same number of edits over a longer time might have had the same effect).

Detecting POV pushing editors

A working paper posted this month to arXiv with the title "Pushing Your Point of View: Behavioral Measures of Manipulation in Wikipedia" presents a method to score the neutrality of Wikipedia contributors and to "detect potential POV pushing behavior".^[6] The authors propose two metrics to quantify an editor's involvement in controversial topics. The first metric (Controversy score or C-score) measures the amount of attention spent by an individual editor on controversial articles, where controversiality is defined on the basis of several quantitative factors previously established in the literature. The second metric (Clustered Controversy score or CC-score) quantifies the focus of an editor's attention on controversial articles on the same topic or very similar topics: the purpose of this metric is to tease apart editors involved in genuine controversy resolution (such as administrators who are likely to participate in a broad range of discussions on controversial topics) from "potentially manipulative users" who focus their attention on a narrow set of controversial topics. To assess the validity of the above metrics, the authors test their discriminatory power at identifying which editors are blocked and which are regular users who were never blocked. The remainder of the paper examines the breakdown of edits by administrators immediately after a successful Request for Adminship. The results, based on qualitative coding by a single reviewer, suggest that some topical areas in the English Wikipedia (such as politics and media) are more likely to be frequently edited by administrators with a high C-score and CC-score than any other topical categories.

Historian of encyclopedias reviews Good Faith Collaboration

The most recent issue of Annals of Science (a scholarly journal about the history of science and technology, founded in 1936) contains a four-page review^[7] of Joseph Reagle's book Good Faith Collaboration: The Culture of Wikipedia (published in 2010 and recently released on the Web under a CC-BY-NC-SA license). The reviewer Jeff Loveland, who has written extensively about the early history of encyclopedias, criticizes the book for having "one major weakness, namely in historical contextualization" (he mentions two 18th-century precedents which should have been given more attention, as they, like Wikipedia, intended to include contributions from the public: Vincenzo Coronelli's Biblioteca Universale and Zedler's Universal-Lexicon) – and rejects Reagle's claim that "historically, reference works have made few claims about neutrality as a stance of collaboration, or as an end result": "References to such values as impartiality, unbiasedness and objectivity are frequent in the prefaces of encyclopaedias over the last three hundred years". On the other hand, the reviewer praises the book for "com[ing] close to offering" a comprehensive introduction to Wikipedia, "touching as it does on nearly all aspects of the encyclopaedia" and he commends the author's writing style as "informal, energetic and appropriately paced". The "insightful and worthwhile" ethnography of Wikipedia is highlighted as the second success of the book.

Regarding chapter 3 of the book, which postulates Neutral Point of View and Assume Good Faith as the two principles at "the heart of Wikipedia collaboration", the review recommends "Anne Goldgar’s study of conduct as a force binding together the early modern Republic of Letters in Impolite Learning (1995) [as] an interesting point of comparison" regarding "the historical connection between knowledge and civility". Commenting on chapter 7, which examines criticism of Wikipedia, Loveland observes that "the portrayal by critics of a possible Wikipedian collective intelligence as anti-individualistic, or anti-rationalistic seems opportunistic and off-the-mark. Meanwhile, Wikipedia now bears the brunt of a refurbished but centuries-old accusation against encyclopaedias, namely that they trivialize and fragment knowledge."

Briefly

Automatically assessing editors' reputations: Wöhner, Köhler and Peters present new metrics for automatic reputation assessment of Wikipedia editors. They evaluate seven potential metrics for reputation assessment, including editing frequency and contribution to high-quality articles, plus new metrics that they conceived, including 'efficiency', which they define as 'the portion of an author's contribution that is persistent and quantifies the acceptance of the author within the Wikipedia community.' They evaluated the metrics using a database of the German Wikipedia from January 2008 and tested them against Wikipedia's internal user classification of blocked users, administrators and anonymous users. They conclude that editing efficiency is most significant for reputation assessment, since it was able to distinguish between blocked and regular authors with an accuracy rate of 86%.^[8]
Students reflect on Wikipedia assignments: Chen and Reber present the results of a pilot study where students from a Norwegian and a German university were asked to reflect on their experience writing a Wikipedia article as a course assignment. They provide the results of two independent judges who analyzed the written reflections of students on ten dimensions including: relevance for society, learning outcome and difficulty, among others. The authors conclude that students were highly motivated by the task and 'have learned much about the topic that they wrote about'.^[9]
Too few newbies?: A paper titled "Too Few New Wikipedians? Modelling Effort and Participation in Wikipedia"^[10] evaluates "the efficiency of the Wikipedia projects in different languages in transforming inputs (people using the Internet) into outputs (articles). We find a decreasing return to scale in the biggest projects, but the size or the age of the projects are not the main explanation for the variations in efficiency we see."
How student editors use sources: synthesis vs plagiarism: Sormunen and Lehtio report the findings of a pilot study on how Finnish secondary-school students use sources when they are required to contribute to Wikipedia as part of their coursework. They interviewed, observed, and analyzed the work of 11 groups of students, and found that: (1) the students relied almost exclusively on Web sources, (2) a sizable fraction (33%) of their work was copied verbatim (or very lightly edited) from their sources, (3) 30% of sources used were not cited at all. The dataset in this study is extremely small and the sample was not designed to be at all representative. Still, the conclusions are disconcerting, especially considering the recent controversy over student plagiarism in a related Wikipedia writing program (Signpost coverage: "A post-mortem on the Indian Education Program pilot"). The interviews with the students could potentially provide insight into why student plagiarism occurs.^[11]
Tracking changes in Wikipedia: A student thesis in Computer Science at the University of Dresden^[12] describes a software prototype that tracks and categorizes edits on Wikipedia – trying, among other things, to detect articles that are being affected by external events. In a test sample containing articles that had been subject to major news events in recent years, such as Fukushima Daiichi Nuclear Power Plant or Dominique Strauss-Kahn, "about 74% of the events ... have been detected and about 68% of these detected events (74%) are recognized correctly."
Gendered language on Wikipedia: Several Wikimedians have announced a study titled "Mind the Gap(s)! Writing Styles of Female Editors on Wikipedia",^[13] applying algorithms that try to classify a text as "male" or "female" (based on the frequency of "male keywords" and "female keywords") to text contributions by editors who state their gender on their user page (1,119 females and 722 males). Among the conclusions: "While the data is insufficient to reach the conclusion that Wikipedia attracts females who code their language usage as male in all circumstances on-wiki and off-wiki, we have shown that females use a more male style of writing when writing for Wikipedia."
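The 'efficiency' metric mentioned in the first item above (the portion of an author's contribution that is persistent) can be sketched as a simple ratio. The Python below is only an illustration under stated assumptions: contributions are measured in words, the input is a list of (words added, words still present later) pairs, and the function name is hypothetical. The paper's actual measurement procedure may differ.

```python
def efficiency(contributions):
    """Share of an author's added text that persisted.

    contributions: list of (added, persisted) word counts, one pair per
    edit by the author (hypothetical input format).
    """
    added = sum(a for a, _ in contributions)
    persisted = sum(p for _, p in contributions)
    if added == 0:
        return 0.0  # no contributions to judge
    return persisted / added

# An author whose text mostly survives versus one who is mostly reverted.
print(efficiency([(100, 90), (50, 45)]))
print(efficiency([(100, 10), (100, 0)]))
```

A classifier such as the one reported in the paper would then compare a score like this against a threshold to separate blocked from regular authors. No threshold is given here because the summary above does not state one.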

References

[1] Wu, Guangyu, Martin Harrigan, and Pádraig Cunningham (2011). Characterizing Wikipedia Pages using Edit Network Motif Profiles. In Proceedings of the 3rd International Workshop on Search and Mining User-Generated Contents (SMUC'11) at the 20th ACM Conference on Information and Knowledge Management (CIKM'11), ACM Press, October 28, 2011. DOI • PDF
[2] Ribé, Marc Miquel, and Horacio Rodríguez (2011). Cultural Configuration of Wikipedia: Measuring Autoreferentiality in Different Languages. In Proceedings of Recent Advances in Natural Language Processing, 316-322. Hissar, Bulgaria. PDF
[3] Ribé, Marc Miquel (2011). Cultural configuration of Wikipedia: Measuring autoreferentiality in different languages. Universitat Politècnica de Catalunya. PDF
[4] Oliver, Corey (2011). The Impact of Heavy Editorial Events on Wikipedia Page Quality. PDF
[5] Javanmardi and C. Lopes. Statistical Measure of Quality in Wikipedia. In: 1st Workshop on Social Media Analytics (SOMA ’10), July 2010. PDF
[6] Das, Sanmay, Allen Lavoie, and Malik Magdon-Ismail (2011). Pushing Your Point of View: Behavioral Measures of Manipulation in Wikipedia. arXiv, November 8, 2011. PDF
[7] Loveland, Jeff (2011). Review of: Good Faith Collaboration: The Culture of Wikipedia. Annals of Science 68 (4) (October) 555-558. DOI
[8] Wöhner, Thomas, Sebastian Köhler, and Ralf Peters (2011). Automatic Reputation Assessment in Wikipedia. In ICIS 2011 Proceedings. HTML
[9] Chen, Weiqin, and Rolf Reber (2011). Writing Wikipedia Articles as Course Assignment. In Proceedings of the 19th International Conference on Computers in Education, T. Hirashima et al. (Eds). Chiang Mai, Thailand. PDF
[10] Crowston, Kevin, Nicolas Jullien, and Felipe Ortega (2011). Too Few New Wikipedians? Modelling Effort and Participation in Wikipedia. SSRN eLibrary. PDF
[11] Sormunen, Eero, and Leeni Lehtio (2011). Authoring Wikipedia articles as an information literacy assignment – copy-pasting or expressing new understanding in one's own words? PDF
[12] Deng, Yihan (2011). Change Tracking in Wikipedia. Master Thesis, PDF
[13] LauraHale, Hawkeye7, Pine and others, Mind the Gap(s)! Writing Styles of Female Editors on Wikipedia.
