Grants:Programs/Wikimedia Research Fund/Cover Women

status: funded
Cover Women
Grant ID: G-RS-2402-15223
start and end dates: June 1, 2024 – June 30, 2025
budget (USD): 32000 USD
fiscal year: 2023-24
keywords: gender identities, intersectionalities, bias, gaps, systemic problem, Wikipedia, Wikidata, Spanish, English, Catalan, German, French, Italian, Portuguese, interviews, content analysis, linked open data, open refine, semantic web
applicant(s): Núria Ferran-Ferrer
organization (if applicable): Universitat de Barcelona

Title

Cover Women: A comparative study of Wikipedia’s front page content from a gender and intersectional perspective with volunteer-driven insights and their newsroom guidelines

Overview

Applicant(s)

Núria Ferran-Ferrer

Organization (if applicable)

Universitat de Barcelona

Project title

Cover Women

Budget

32000 USD

Introduction

This proposal presents a research project on Wikipedia's most popular page. The main page, or front page from a communication perspective, is analyzed across the seven longest-standing Wikipedia editions: English, German, Catalan, French, Portuguese, Italian, and Spanish. Grounded in a gender and intersectional perspective, this study examines the daily content, the newsroom guidelines (principles and standards that guide the dissemination of information), and volunteer community insights. The examination employs communication theories such as gatekeeping and agenda-setting. Beyond academic research, our goal is to actively contribute to editing communities by addressing the daily challenges and needs in managing front-page content.

We have selected the most popular Wikipedia page for analysis. This page, commonly referred to as the main page, or front page from a communication perspective, is accessible in all language editions of the global encyclopedia. We are researching possible gender and intersectional bias in its daily content, in its newsroom guidelines (principles and standards that govern the dissemination of information), and in the insights of the volunteer community that decides which information is disseminated to the public on the main page. This research draws on two communication theories: gatekeeping, which examines the process by which information is filtered, selected, and ultimately presented to the public (Barzilai-Nahon, 2009), and agenda-setting (McCombs and Shaw, 1972), which studies the effect that this selection has on what audiences regard as important.

To test the feasibility of this proposal across seven language editions of Wikipedia, we have already conducted a micro-project with a sample of the English and Spanish Wikipedia. It assessed: a) whether there are open and formalized recommendations and guidelines that determine which contents are published on the main page, and whether the publication criteria can be analyzed; b) whether, using data wrangling techniques, we could work with the biographies published on the Wikipedia main pages and analyze them from a gender and intersectional perspective using the properties of Wikidata; and c) how easily the community that performs gatekeeping tasks can be contacted. Contact proved easy, and we have begun to prepare the relevant questions to understand the decision-making process and editorial practices and to identify the issues that may be relevant to understanding the phenomenon. The results of this trial with two language editions will be published soon (Ferran-Ferrer et al., 2024).

This study goes beyond affirming Wikipedia's reflection of reality to delve into its systemic challenges (Ford and Wajcman, 2017). It analyzes not only main page content selection but also newsroom guidelines, including interviews with gatekeepers, to enhance understanding and address systemic issues.

Literature review

Wikipedia, as a key player in the public sphere, transforms information dissemination. Still, Wikipedia grapples with persistent gender bias in both editing and content (Antin et al., 2011; Bear and Collier, 2016; Wagner et al., 2016; Hinnosaar, 2019; Minguillón et al., 2021; Ferran-Ferrer, Boté-Vericad, et al., 2023), alongside additional prejudices (Redi et al., 2021; Beytía et al., 2022). Scholars highlight the need for a comprehensive understanding of Wikipedia's knowledge production culture to address these biases and make Wikipedia more robust, reliable, and transparent (Menking and Erickson, 2015). Reducing gender and other intersectional biases requires more than acknowledging Wikipedia as a mirror of societal biases: it involves addressing the platform's deeper logic embedded in its techno-scientific project (Ford and Wajcman, 2017).

Bias in contributions perpetuates imbalances in content coverage and discourages diversity, which further exacerbates the issue (Worku et al., 2020). To address this, scholars stress the importance of understanding Wikipedia's knowledge production culture to tackle its gender gap (Menking and Erickson, 2015). Addressing this issue requires delving into the foundational principles driving the platform's techno-scientific project (Ford and Wajcman, 2017; Geiger, 2017), necessitating the recognition and dismantling of exclusionary practices (Menking and Rosenberg, 2021). Communication theories like gatekeeping and agenda-setting provide valuable frameworks for understanding Wikipedia's potential biases. Gatekeeping theory, focusing on information filtering processes, is applied to scrutinize stories selected for the Front Page, which attracts millions of readers monthly (Barzilai-Nahon, 2009; Wikimedia, 2023). Gatekeeping theory has previously been applied to Wikipedia by researchers to further understand biases in content selection and presentation (Li and Farzan, 2020) and to advocate for a reorganization of online spaces to democratize content and encourage dialectical gatekeeping that could reduce racial and other disparities (Ezell, 2021). Additionally, drawing from agenda-setting theory, we examine how Wikipedia's main page influences viewers and shapes news hierarchy, including its agenda-building power (McCombs and Shaw, 1972; Ren and Xu, 2023). Agenda setting can impact the choices of frames and sentiment adopted by Wikipedia pages regarding a particular issue or event (Lee, 2018) and it can play a role in shaping the focus and intensity of user edit activity in Wikipedia (Mahabir et al., 2018).

Research questions

  • RQ1: What insights do interviews with volunteer gatekeepers (editors of the main page of Wikipedia) provide on decision-making, biases, and strategies affecting the visibility of gender and intersectionality-related content on Wikipedia’s front page, particularly regarding how their preferences and interests shape the topics featured?
  • RQ2: How does gatekeeping impact gender gaps in content representation on digital platforms, specifically in the peer production of knowledge (the decision-making system that determines which content is suitable and which is not) within newsrooms or editorial policies, and why is understanding this phenomenon crucial for addressing gender disparities?
  • RQ3: How does agenda setting influence the selection of frames and sentiment adopted by Wikipedia pages concerning specific issues or events, and how does it shape the focus and intensity of user edit activity within Wikipedia?
  • RQ4: How prevalent is gender and intersectional bias in the content featured on Wikipedia’s front pages? This research is necessary to draw further attention to the need for systemic change within the platform’s newsroom/editorial practices to address disparities in gender and diversity representation in online knowledge and foster a more inclusive and diverse digital information landscape.

Methods and project stages

This research proposal outlines a study on gender representation and biases on Wikipedia’s main page, the most visited Wikipedia page. The main page (or front page from a communication perspective) received 46.8 billion visits last November on the English edition (Wikimedia, 2023). We are conducting a comparative analysis across seven of the longest-standing Wikipedia editions—English, German, Catalan, French, Portuguese, Italian, and Spanish—all launched in 2001, employing a mixed-methods approach. Grounded in gender and intersectionality, the study analyzes daily content, editorial/newsroom guidelines, and insights from volunteer communities using communication theories like gatekeeping (Barzilai-Nahon, 2009) and agenda-setting (McCombs and Shaw, 1972). Our aim is not only academic research but also active contribution to editing communities by addressing daily challenges in curating front-page content. Therefore, we have already included seven working groups of Wikipedia users involved in gender issues for each language edition and the chapters of all Wikipedias analyzed in this project.

The different stages of the project are:

a) Conducting a scoping review, a systematic literature review using the SALSA Framework (Grant and Booth, 2009), to analyze academic publications from 2001 to 2024. This review focuses on examining Wikipedia within the framework of a communication ecosystem, followed by a triangulation methodology.

b) In-depth interviews with voluntary editors of the front page from all seven Wikipedia editions to ascertain decision-making processes, biases, and strategies that influence content visibility related to gender and other intersectionalities. The interviews are conducted in person or online and in the native languages of the volunteer participants. We plan to conduct around five interviews per language edition. Contacts with the volunteers are obtained through discussion pages related to editing the main page, as well as from user groups participating in the project, including calls from the chapters to their networks. The interview transcriptions are coded and analyzed using qualitative data analysis software, and a specific codebook is generated to facilitate the coding. This methodological approach addresses RQ1 and RQ3.

c) Newsroom guidelines: We apply content analysis to main-page, or front-page, editorial guidelines for each language edition, exploring the decision-making processes of the gatekeepers who determine story prominence. The content of these guidelines is coded and analyzed using qualitative data analysis software, and a specific codebook is generated to facilitate the coding. This research strategy addresses RQ2. The analysis of the qualitative approach to agenda-setting and gatekeeping practices (RQ1-3) is conducted independently with two codebooks—one for the interviews and one for the editorial policies. However, each codebook encodes elements specific to gatekeeping and agenda-setting to obtain evidence corresponding to the theoretical framework.

d) Main-page content quantitative analysis: We scrutinize the content (biographies) on the front page in each of the seven language editions over ten years, using data wrangling. First, we identify the sections of the main page that are consistently present across all Wikipedias and are easily comparable. Wikipedia’s front pages regularly feature changing content, offering snapshots of current events, featured articles, and useful links. It is important to note that volunteers maintain these main pages, and they may evolve in format and content over time. For each language edition, we employ a method specific to that edition to retrieve the content and data of its main page from the past ten years, as the URLs of previous main pages cannot be obtained from the dumps. Quantitative analysis begins by scraping data and using the open-source tool OpenRefine to reconcile the URIs found in the sections of the Wikipedia front pages in each language edition. This process enriches them with values from Wikidata for the properties selected for study, such as P21 (sex or gender), P106 (occupation), P172 (ethnic group), P103 (native language), and others. OpenRefine, utilized in various contexts and applications, is essential for this research as it enables the preparation and analysis of vast amounts of data. This method addresses RQ4.
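As an illustration of the enrichment step in stage d), the following is a minimal Python sketch and not the project's actual pipeline, which relies on OpenRefine reconciliation. It assumes the Wikidata item IDs (QIDs) of the biographies have already been obtained, builds a SPARQL query for the four properties named above, and tallies the share of biographies per gender label from results in the standard SPARQL JSON bindings format. The function names `build_query` and `gender_shares`, the variable names, and the sample bindings are hypothetical and hand-written for illustration; no request is sent to any endpoint.

```python
from collections import Counter

# Wikidata properties named in the proposal; the variable names are
# illustrative choices, not part of Wikidata.
PROPERTIES = {
    "P21": "sex_or_gender",
    "P106": "occupation",
    "P172": "ethnic_group",
    "P103": "native_language",
}


def build_query(qids, properties=PROPERTIES):
    """Return a SPARQL query that fetches the selected properties per item."""
    values = " ".join(f"wd:{qid}" for qid in qids)
    select_vars = " ".join(f"?{name}Label" for name in properties.values())
    # OPTIONAL keeps items that lack a value for a given property.
    optionals = "\n".join(
        f"  OPTIONAL {{ ?item wdt:{pid} ?{name} . }}"
        for pid, name in properties.items()
    )
    return (
        f"SELECT ?item {select_vars} WHERE {{\n"
        f"  VALUES ?item {{ {values} }}\n"
        f"{optionals}\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}"
    )


def gender_shares(bindings):
    """Share of distinct items per gender label, from SPARQL JSON bindings."""
    gender_by_item = {}
    for row in bindings:
        item = row["item"]["value"]
        label = row.get("sex_or_genderLabel", {}).get("value", "unknown")
        # An item can appear in several rows (e.g. several occupations);
        # count it once, using the first gender label seen.
        gender_by_item.setdefault(item, label)
    counts = Counter(gender_by_item.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Hand-written sample in the shape of SPARQL JSON results (not real output).
SAMPLE_BINDINGS = [
    {"item": {"value": "http://www.wikidata.org/entity/Q7186"},
     "sex_or_genderLabel": {"value": "female"}},
    {"item": {"value": "http://www.wikidata.org/entity/Q7186"},
     "sex_or_genderLabel": {"value": "female"}},
    {"item": {"value": "http://www.wikidata.org/entity/Q937"},
     "sex_or_genderLabel": {"value": "male"}},
    {"item": {"value": "http://www.wikidata.org/entity/Q7259"},
     "sex_or_genderLabel": {"value": "female"}},
]

query = build_query(["Q7186", "Q937", "Q7259"])
shares = gender_shares(SAMPLE_BINDINGS)
```

Counting each item once matters because an item with several occupations or languages yields several result rows, which would otherwise inflate its weight in the gender tally. Intersectional breakdowns (for example gender by occupation) would need a different aggregation that this sketch does not cover.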

Expected outputs

The specific research outputs that we envision for our proposed project include, but are not limited to:

  • Scientific publications: We will draft scientific publications for each research question and assess whether the approach is comparative across all editions or if it is better to separate them by smaller communities, editorial process typologies, etc. This will be determined once the study is completed to decide on the best dissemination approach.
  • The data set emerging from RQ4 will be made available as downloadable dumps and will be accessible via public APIs and a SPARQL endpoint.
  • Participation in at least these conferences:
    • Wiki Workshop
    • Wikimania
    • WikiWomenCamp
    • Each user group and chapter will participate in national or regional events with Cover Women results.
  • Tools to support the editorial tasks of gatekeeping, namely:
    • Guidelines for content selection on front pages that are attuned to intersectionality and gender diversity;
    • Bots and AI assistants that facilitate the content selection process for front pages, with a focus on acknowledging intersectionality and gender differences. Both tools will be developed with a focus on considering the collaborative environment and consensus-driven approach characteristic of Wikipedia.
  • Resources aimed at enhancing the archiving and curation of main-page content across all Wikipedia editions outlined in this proposal.

References

  • Barzilai‐Nahon, K. (2009). Gatekeeping: a critical review. Annual Review of Information Science and Technology, 43(1), 1–79. https://doi.org/10.1002/aris.2009.1440430117
  • Bear, J. B., & Collier, B. (2016). Where are the Women in Wikipedia? Understanding the Different Psychological Experiences of Men and Women in Wikipedia. Sex Roles, 74(5), 254–265. https://doi.org/10.1007/s11199-015-0573-y
  • Beytía, P., Agarwal, P., Redi, M., & Singh, V. K. (2022). Visual gender biases in Wikipedia: a systematic evaluation across the ten most spoken languages. Proceedings of the International AAAI Conference on Web and Social Media, 16, 43–54. https://ojs.aaai.org/index.php/ICWSM/article/view/19271
  • Ferran-Ferrer, N., Fernández, L., & Centelles, M. (2024). Wikipedia’s Front page 10 years evolution: analysis of gender and intersectionalities on content, newsroom guidelines and volunteer-driven insights in the Spanish and English editions. [Forthcoming].
  • Ford, H., & Wajcman, J. (2017). Anyone can edit, not everyone does: Wikipedia’s infrastructure and the gender gap. Social Studies of Science, 47(4), 511–527. https://doi.org/10.1177/0306312717692172
  • Hinnosaar, M. (2019). Gender inequality in new media: Evidence from Wikipedia. Journal of Economic Behavior & Organization, 163, 262–276. https://doi.org/10.1016/j.jebo.2019.04.020
  • McCombs, M. E., & Shaw, D. L. (1972). The agenda-setting function of mass media. Public Opinion Quarterly, 36(2), 176–187.
  • Menking, A., & Erickson, I. (2015). The elephant in the room: Gender and Wikipedia. In Proceedings of the 2015 ACM conference on computer supported cooperative work & social computing (pp. 100-110). https://doi.org/10.1145/2675133.2675160
  • Minguillón, J., Meneses, J., Aibar, E., Ferran-Ferrer, N., & Fàbregues, S. (2021). Exploring the gender gap in the Spanish Wikipedia: differences in engagement and editing practices. PLOS ONE, 16(2), e0246702. https://doi.org/10.1371/journal.pone.0246702
  • Redi, M., Gerlach, M., Johnson, I., Morgan, J., & Zia, L. (2021). A taxonomy of knowledge gaps for Wikimedia projects (Second Draft) (arXiv:2008.12314). arXiv. http://arxiv.org/abs/2008.12314
  • Wagner, C., Graells-Garrido, E., Garcia, D., & Menczer, F. (2016). Women through the glass ceiling: Gender asymmetries in Wikipedia. EPJ Data Science, 5(1), Article 1. https://doi.org/10.1140/epjds/s13688-016-0066-4
  • Wikimedia. (2023). Wikimedia statistics: all wikis. https://stats.wikimedia.org/#/all-projects
  • Worku, Z., Bipat, T., McDonald, D. W., & Zachry, M. (2020). Exploring Systematic Bias through Article Deletions on Wikipedia from a Behavioral Perspective. ACM International Conference Proceeding Series. https://doi.org/10.1145/3412569.3412573

Workshop proposal

Front page editing guidelines for bots: first version of a speculative improvement exercise

Wikipedia is a key player in the digital sphere and open knowledge, as well as in the dissemination of current information. From a communicative perspective, Wikipedia’s main page, one of the most visited on the platform and thus on the internet, plays a crucial role in access to the platform. It can also contribute to public opinion formation and, by extension, to shaping the news agenda.

This workshop aims to support editing communities by addressing the daily challenges and coordination needs involved in curating the front-page content. Taking a speculative approach, the goal is to develop a structured content guide to continue improving existing editing guides...

We imagine a future scenario in which a bot or other tools, using large language models (LLMs), would automatically manage the editorial tasks of all sections on Wikipedia's front page (article of the day, featured project, trends, on this day, etc.). Although this scenario is not yet a reality, the increasing use of such technologies on Wikipedia invites us to collaboratively rethink front-page editing. Some questions we will address include: How would an AI-driven bot be guided in editing the front page? How could rigor and relevance be ensured while avoiding any kind of bias? How could editorial guidelines be synthesized, expanded, or improved through collaborative approaches?

This co-creation workshop, proposed by members of the Women & Wikipedia research team (University of Barcelona), is open to any editor, not only those contributing to the front page, as well as to readers interested in this popular section. Based on previous research, interviews, and digital communication theories, we propose dynamically exploring a new approach to the editorial guidelines of Wikipedia's front page with the aim of strengthening them, making them more accessible, and enhancing their impact as a medium for knowledge and dissemination.

Research proposal

Stage 1 Submission PDF on OpenReview

Stage 2 Submission PDF on OpenReview

Full proposal on OpenReview

Community feedback

Please add any feedback or endorsements to the grant discussion page.