Research:Wikimedia Research Best Practices Around Privacy Whitepaper

Tracked in Phabricator:
Task T337883
Created
16:35, 17 January 2024 (UTC)
Collaborators
Michael Zimmer
Duration:  2023-October – 2024-March

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


The goal and anticipated output of this project is a white paper that creates a shared understanding of how researchers should conduct research on or about the Wikimedia projects, in light of how Wikimedia communities value privacy.

Overview

Requested by English Wikipedia's Arbitration Committee, this white paper is intended to convey "[...] to researchers the principles of our movement and give specific recommendation for researchers on how to study and write about Wikipedians and their personal information in a way that respects our principles."

Executive summary

Readers access Wikipedia articles more than 15 billion times every month, but the usage of Wikipedia does not stop there. Researchers frequently use Wikipedia-related data to develop models, to generate insights, and as part of research and development workflows. On average, every year researchers use or refer to Wikipedia in more than 130,000 articles and publish at least about 500 articles about Wikipedia itself.[1] The amount and diversity of Wikipedia usage in research projects has resulted in significant insights and improvements in Wikipedia itself as well as in other aspects of our lives (e.g., through machine translation). However, conducting research using Wikipedia has its own challenges for researchers, Wikipedia community members, and the Wikimedia Foundation alike. In this paper we focus on one of the topics we most frequently observe the Wikipedia community and researchers grappling with: privacy. Our aim with this work is to help researchers and Wikipedia contributors see the challenges each group faces in its work. We further offer recommendations about what to pay attention to and how to navigate some of the questions we expect Wikipedia contributors and researchers to face when interfacing with or conducting research projects.

Read the latest version of the privacy white paper

(placeholder section for arXiv.org link, which we anticipate should be available soon)

Updates

January 2025: We have completed a number of iterations and improvements. We have also worked with the English Wikipedia Arbitration Committee to review the most recent draft, and have made some additional revisions based on that feedback. We now assess that the paper is "safe to try": it is in a good place to be shared publicly, so that Wikipedia researchers and community members can start using it and tell us concretely what we may need to improve to make it more actionable for them. We are therefore in the process of publishing the paper on arXiv.org so that it can receive a DOI and be referenced in research publications. The link will be provided as soon as we receive it.

Next steps:

  • We will treat January to July 2025 as a 'testing period' and raise awareness about the paper's availability. During this period, we will update the text of the paper only if an emergency update request comes in.
  • We may publish the work in additional venues immediately to increase its reach and make sure more researchers are aware of it. Most likely, however, we will keep this push for after the testing period. At that point, we can write comment pieces for journals, etc.
  • WMF will implement changes as part of the Research Fund so that Research Fund applicants start testing the paper.
  • We will reach out to specific researchers in our networks who we know are facing relevant questions and will ask them to use the recommendations and give us feedback.
  • In July 2025, we will compile the feedback received from testing the paper in action, iterate on and improve the paper, and then consider it “stable”. After that point, we will revisit the paper every few years to make sure it’s still up to date, but we won’t update it frequently.

Initial draft notes and process (outdated)

How to provide feedback and comments

We're gathering feedback on the Research Ethics Privacy White Paper until 30 April 2024. We encourage you to provide your feedback on the corresponding talk page. If you prefer to share it privately, you can do so by sending an email to research-feedback@wikimedia.org with "privacy white paper" in the subject line.

  • We encourage you to use the talk page/discussion feature to provide your input. (Please don't directly edit the draft.)
  • As we are still drafting and revising (hence the notes you see throughout the draft), content-oriented feedback is the most helpful. Copy-editing and similar feedback is less helpful for the moment.
  • Please add new topics or comment on existing topics to help us keep feedback organized.
  • The talk page also includes a few prompts for specific groups from whom we're hoping to receive feedback.
  • We will be monitoring the talk page until 30 April 2024, but won't be able to respond directly to comments. However, all comments will be reviewed and considered in the ongoing drafting and revising process.
  • If you are more comfortable leaving comments in a language other than English, please feel welcome to do so. Please note that we may utilize machine translation in reviewing non-English content.
  • Join us for a Conversation Hour on 23 April 2024 at 15:00 UTC. This conversation will be guided by some questions to encourage actionable feedback. Join via Google Meet.

Initial working outline

After going through a feedback process on the outline with the original requesters, English Wikipedia's Arbitration Committee, we have been drafting the white paper based on the following outline, which has been considered stable since February 2024.

  • Introduction: What is the problem, why is it important, what has been tried before, and what is the goal of the white paper?
  • Related work: A review of related work, including privacy risks and adaptation on Wikipedia, ethical judgments for researchers, naming/referencing research participants, and existing related Wikipedia policies and guidelines.
  • Exploring key questions: Understanding key values of Wikipedians, policies around doxxing, understanding parameters of variation for different language versions of Wikipedia, understanding researchers, among other topics.
  • Recommendations: Recommendations for researchers and Wikipedians.

Notes

  1. As measured by the number of search results in https://scholar.google.com/ when searching for articles with the word “Wikipedia” in their title.