Research:Natural experiments

This page documents known natural experiments that have affected readers, editors, or content on the Wikimedia projects (though many are specific to Wikipedia, given its prominence).

Background

While a wealth of observational data about the Wikimedia projects allows researchers to study many interesting dynamics and correlations, it can still be difficult to establish the causal impact of an intervention on, e.g., editor retention or readership. Determining causation generally requires some form of randomized controlled trial, which may be unethical to conduct or logistically challenging (e.g., recruiting sufficient participants). As such, many researchers turn to natural experiments -- i.e., (sudden) events that affect a substantial number of readers, editors, or articles in a way that resembles a randomized controlled trial. Often the most interesting of these natural experiments involve the sudden removal (or construction) of a structural barrier to access (see the Taxonomy of Knowledge Gaps for a summary of these barriers). Common examples include regulatory events such as censorship or new copyright laws, technical events such as tool or site outages, and social events such as wars or epidemics that quickly change people's behavior.
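
As a concrete illustration of the kind of analysis such events enable, below is a minimal difference-in-differences sketch in Python -- a standard estimator for natural experiments, not one prescribed by this page. The `pageviews` data frame and all of its values are hypothetical, standing in for counts that would in practice be built from the pageview dumps:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical daily pageview counts. `treated` marks the group affected by
# the event (e.g., a censored language edition); `post` marks observations
# after the event date. Real analyses would build this from pageview dumps.
pageviews = pd.DataFrame({
    "views":   [100, 110, 105, 40, 42, 38, 95, 100, 98, 97, 99, 101],
    "treated": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "post":    [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
})

# The coefficient on treated:post is the difference-in-differences estimate
# of the event's effect, valid under the parallel-trends assumption.
model = smf.ols("views ~ treated * post", data=pageviews).fit()
print(model.summary().tables[1])
```

The key identifying assumption is parallel trends: absent the event, the treated and control series would have moved together, so the control group's before/after change nets out seasonality and other shared shocks.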

List of Natural Experiments

Below is a list of known natural experiments; please add to it and provide whatever context is useful. The list is currently unordered, but more structure can be added if it grows longer.

  • Censorship -- many examples, see en:Censorship of Wikipedia and Clark et al.[1] for more details, and Zhang and Zhu[2] and Zhang et al.[3] for illustrative analyses. Note that both the start and end of censorship can be considered natural experiments.
  • Cost of internet data -- Wikipedia Zero was a series of partnerships between 2012 and 2018 that eliminated the cost of mobile data for accessing Wikipedia in 72 countries (estimated to have affected 800 million people). In some countries, this intervention had a substantial impact on readership.[4] Again, both the start and end of zero-rating access to Wikipedia can be considered natural experiments.
  • Page latency -- as new data centers are added (or have outages), readers in certain parts of the world often see a dramatic shift in how fast Wikipedia articles load. See task T222078 for an example analysis of the impact of latency on readership with the addition of a new data center in Singapore in 2018.
  • Tool outages -- important tools or bots on the wikis sometimes go down, dramatically affecting contribution patterns. A good example is ClueBot NG (an anti-vandalism bot on English Wikipedia), whose periodic outages have enabled analyses of the role of bots in patrolling -- e.g., Geiger and Halfaker.[5] An interrupted-time-series sketch for outage-style events appears after this list.
  • Regulatory changes -- changes in laws in a given country can have dramatic effects on the Wikimedia projects. A recurring example is Public Domain Day (when copyright expires for many works of art, literature, etc.) but other examples could include changes in Freedom of Panorama laws or laws that stifle free speech.
  • Major social events -- sudden, massive changes in people's behavior, such as the lockdowns associated with COVID-19 that left many people stuck at home (sometimes without work), wars that displace large populations (and might lead to a large shift from desktop to mobile readership), or the chilling effects observed following the Edward Snowden revelations.[6]
  • Campaigns -- external efforts to increase awareness of and access to Wikimedia project content can affect readership and contributions, as explored for a promotional video campaign on Hindi Wikipedia,[7] feminist edit-a-thons,[8] and Wiki Education campaigns.[9]
  • Algorithmic triggers -- arbitrary discontinuous changes in user-experience features can be used to draw causal inferences. For example, when the ORES "damaging" score crosses a threshold that triggers flags in Recent changes, comparing edits that were very nearly flagged to those that were just barely flagged yields an estimate of the causal effect of being flagged (a regression discontinuity design; see the sketch after this list).[10]
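
For outage-style events (tool outages, latency shifts), a simple interrupted-time-series regression is one common way to estimate the effect. The sketch below uses hypothetical daily revert counts and an assumed five-day outage window; it is an illustration of the general technique, not the analysis in Geiger and Halfaker.[5]

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical daily counts of bot reverts; days 10-14 are an outage window.
reverts = np.array([50, 52, 49, 51, 50, 53, 48, 50, 52, 51,
                    30, 28, 32, 29, 31, 50, 49, 52, 51, 50])
days = np.arange(len(reverts))
outage = ((days >= 10) & (days <= 14)).astype(int)

# Regress the series on a time trend plus an outage indicator; the outage
# coefficient estimates the level shift in reverts during the downtime.
X = sm.add_constant(np.column_stack([days, outage]))
fit = sm.OLS(reverts, X).fit()
print(fit.params)  # [intercept, time trend, outage effect]
```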
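
The algorithmic-triggers item above is a regression discontinuity design: edits just below and just above the flagging threshold are assumed to be otherwise comparable. Here is a minimal local-linear sketch in the spirit of that analysis (not a reproduction of it); the threshold value, the bandwidth, and the simulated `score`/`reverted` data are all hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulated edits: a "damaging" score in [0, 1] and a reverted outcome.
# Edits at or above THRESHOLD are flagged in Recent changes.
THRESHOLD, BANDWIDTH = 0.6, 0.1
edits = pd.DataFrame({"score": rng.uniform(0, 1, 5000)})
edits["flagged"] = (edits["score"] >= THRESHOLD).astype(int)
edits["reverted"] = rng.binomial(
    1, 0.1 + 0.3 * edits["score"] + 0.15 * edits["flagged"])

# Local linear regression within a narrow bandwidth of the threshold; the
# `flagged` coefficient estimates the causal effect of being flagged on the
# probability that an edit is reverted.
window = edits[(edits["score"] - THRESHOLD).abs() <= BANDWIDTH].copy()
window["centered"] = window["score"] - THRESHOLD
rdd = smf.ols("reverted ~ flagged + centered + flagged:centered",
              data=window).fit()
print(rdd.params["flagged"])
```

The bandwidth choice trades off bias (a wider window includes less comparable edits) against variance (a narrower window leaves fewer observations); in practice it is usually chosen by a data-driven procedure or reported across several values.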

References

  1. Clark, Justin; Faris, Robert; Heacock Jones, Rebekah (2017-05-01). "Analyzing Accessibility of Wikipedia Projects Around the World". Rochester, NY. 
  2. Zhang, Xiaoquan (Michael); Zhu, Feng (2011). "Group Size and Incentives to Contribute: A Natural Experiment at Chinese Wikipedia". American Economic Review 101 (4): 1601–1615. ISSN 0002-8282. doi:10.1257/aer.101.4.1601. 
  3. Zhang, Ark Fangzhou; Livneh, Danielle; Budak, Ceren; Robert, Lionel; Romero, Daniel (2017). "Shocking the crowd: The effect of censorship shocks on Chinese Wikipedia" 11 (1). pp. 367–376. 
  4. Wikimedia Foundation Audiences Metrics & Insights Q1 2018-19
  5. Geiger, R. Stuart; Halfaker, Aaron (2013-08-05). "When the levee breaks: without bots, what happens to Wikipedia's quality control processes?". Proceedings of the 9th International Symposium on Open Collaboration. WikiSym '13 (New York, NY, USA: Association for Computing Machinery): 1–6. ISBN 978-1-4503-1852-5. doi:10.1145/2491055.2491061. 
  6. Penney, Jonathon W. (2016). "Chilling Effects: Online Surveillance and Wikipedia Use". Berkeley Technology Law Journal. doi:10.15779/z38ss13. Retrieved 2019-08-20. 
  7. Chelsy Xie, Xiaoxi; Johnson, Isaac; Gomez, Anne (2019). "Detecting and gauging impact on Wikipedia page views". pp. 1254–1261. 
  8. Langrock, Isabelle; González-Bailón, Sandra (2020). "The Gender Divide in Wikipedia: Quantifying and Assessing the Impact of Two Feminist Interventions". Available at SSRN 3739176. 
  9. Zhu, Kai; Walker, Dylan; Muchnik, Lev (2020). "Content growth and attention contagion in information networks: Addressing information poverty on Wikipedia". Information Systems Research (INFORMS) 31 (2): 491–509. 
  10. TeBlunthuis, Nathan; Hill, Benjamin Mako; Halfaker, Aaron (2021-04-22). "Effects of Algorithmic Flagging on Fairness: Quasi-experimental Evidence from Wikipedia". Proceedings of the ACM on Human-Computer Interaction 5 (CSCW1): 56–1–56:27. doi:10.1145/3449130. Retrieved 2021-09-21.